Monday, 2021-02-01

*** baojg has joined #openstack-swift		02:03
*** rcernin has quit IRC		02:21
*** rcernin has joined #openstack-swift		02:36
alenai	Hello everyone. I'm here to ask for feedback about this patch: https://review.opendev.org/c/openstack/swift/+/727876. Has anyone running a production-scale deployment (>10mln objects per container, db files on ssd and ~50+ puts/s with ~50+ deletes/s on hot containers) seen a reduction in 500 response codes after applying it?	02:41
*** rcernin has quit IRC		02:45
*** rcernin has joined #openstack-swift		02:45
alenai	I'm testing 2.25.1 version of swift before deploying it to production (from 2.25.0) and, for some reason, see minor improvements. Test setup: db file > 5.3GB, 5+mln objects, db on ssd, 3 replicas, replicator with 5 minute interval to accumulate reclaims, 3 day reclaim_age.	02:48
alenai	https://paste.pics/e048c562d7795c3228aadde12c39b0a7 for latency metrics (each spike is a replicator that arrived) and https://paste.pics/8a4ae6f744d3cb5a8f6598569570b5fe (each dip is a replicator that arrived).	02:52
alenai	I tried to insert time.sleep(0.05) between batch deletes and it helped a lot, but it introduced another problem - replication run times skyrocketed. And I cannot allow this in production. Because it would take days to complete 1 replication run.	02:55
alenai	Maybe I'm naive to rely on this patch with my load profile? Window between batch deletes are only for single digit requests? So I can rely on 30-40 quantile latency reduction (seen on dashboards during some replication runs).....	02:59
alenai	P.S: proxy server timeout regarding container server http requests here is 10 seconds. Current replication run takes about 1 hour in production (300k+ containers on 6 nodes with 8 ssd on each)	03:01
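
The experiment alenai describes (a pause between batch deletes) can be sketched roughly as follows. This is a minimal illustration and not Swift's actual reclaim code: the simplified `object` table, the function names, the default batch size of 1000, and the `pause` parameter are all assumptions made for the example.

```python
import sqlite3
import time


def reclaim_in_batches(conn, age_timestamp, batch_size=1000, pause=0.0):
    """Delete tombstone rows older than age_timestamp in small batches.

    Hypothetical helper: commits after every batch so the write lock is
    released, then optionally sleeps before starting the next batch.
    """
    total = 0
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM object WHERE rowid IN ("
            " SELECT rowid FROM object"
            " WHERE deleted = 1 AND created_at < ? LIMIT ?)",
            (age_timestamp, batch_size),
        )
        conn.commit()  # end the transaction so other writers can get in
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        batches += 1
        time.sleep(pause)  # the window in which PUTs/DELETEs can land
    return total, batches


def make_demo_db(live=50, tombstones=2500):
    """Build an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object (name TEXT, created_at REAL, deleted INTEGER)")
    rows = [("live-%d" % i, 100.0, 0) for i in range(live)]
    rows += [("dead-%d" % i, 100.0, 1) for i in range(tombstones)]
    conn.executemany("INSERT INTO object VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = make_demo_db()
total, batches = reclaim_in_batches(conn, 200.0, batch_size=1000, pause=0.0)
print(total, batches)  # 2500 3
```

The pause is paid once per batch, so its cost scales with the number of tombstones. Under the assumed batch size of 1000, a database with 10 million reclaimable rows would need 10,000 batches, and a 0.05 s pause would add roughly 500 s for that one database, which may be one reason the replication run times grew so sharply.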
*** rcernin has quit IRC		04:00
*** fingo has quit IRC		04:02
*** rcernin has joined #openstack-swift		04:02
*** rcernin has quit IRC		04:27
*** rcernin has joined #openstack-swift		04:35
*** evrardjp has quit IRC		05:33
*** evrardjp has joined #openstack-swift		05:33
*** m75abrams has joined #openstack-swift		05:46
openstackgerrit	Merged openstack/swift master: relinker: Improve logging https://review.opendev.org/c/openstack/swift/+/769632	06:31
*** alenai has quit IRC		06:36
*** gmann has quit IRC		06:45
*** gmann has joined #openstack-swift		06:47
*** rcernin has quit IRC		07:25
*** rcernin has joined #openstack-swift		08:07
*** rpittau|afk is now known as rpittau		08:11
*** rcernin has quit IRC		08:24
*** rcernin has joined #openstack-swift		08:26
*** rcernin has quit IRC		08:31
*** alenai has joined #openstack-swift		08:32
*** rcernin has joined #openstack-swift		08:37
*** cschwede has joined #openstack-swift		08:57
*** ChanServ sets mode: +v cschwede		08:57
*** rcernin has quit IRC		09:27
*** benj_ has quit IRC		09:28
*** benj_ has joined #openstack-swift		09:30
*** rcernin has joined #openstack-swift		09:39
*** rcernin has quit IRC		10:06
*** baojg has quit IRC		10:09
*** baojg has joined #openstack-swift		10:10
*** baojg has quit IRC		10:10
*** baojg has joined #openstack-swift		10:10
*** alenai has quit IRC		10:11
*** baojg has quit IRC		10:11
*** baojg has joined #openstack-swift		10:11
*** baojg has quit IRC		10:11
*** baojg has joined #openstack-swift		10:12
*** baojg has quit IRC		10:12
*** baojg has joined #openstack-swift		10:13
*** baojg has quit IRC		10:13
*** baojg has joined #openstack-swift		10:13
*** baojg has quit IRC		10:14
*** baojg has joined #openstack-swift		10:14
*** baojg has quit IRC		10:15
*** baojg has joined #openstack-swift		10:15
*** baojg has quit IRC		10:15
*** baojg has joined #openstack-swift		10:16
*** baojg has quit IRC		10:16
*** cschwede has quit IRC		10:47
*** rcernin has joined #openstack-swift		11:16
*** rcernin has quit IRC		12:33
*** rcernin has joined #openstack-swift		12:47
*** rcernin has quit IRC		13:38
openstackgerrit	Clay Gerrard proposed openstack/swift master: relinker: Add option to drop privileges https://review.opendev.org/c/openstack/swift/+/772419	14:59
clayg	alenai: yes we saw a reduction in 500s from container servers when we rolled out that patch - large dbs took a long time to reclaim and the patch seemed to allow more index updates with fewer timeouts.	15:01
clayg	alenai: "Window between batch deletes are only for single digit requests?" - I'm not sure what you mean by this specifically	15:05
clayg	alenai: are the containers default 3x replica? 1hr db replication cycle in a large cluster is great; replication cycle <=24hrs is probably manageable.	15:07
clayg	10M rows per container should still be manageable - if you have 10M objects and 800M tombstone rows (deleted = 1) reclaim can be very challenging. You'll want to consider https://docs.openstack.org/swift/latest/overview_container_sharding.html	15:09
openstackgerrit	Clay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async https://review.opendev.org/c/openstack/swift/+/648263	15:38
*** zaitcev has joined #openstack-swift		15:57
*** ChanServ sets mode: +v zaitcev		15:57
*** alenai has joined #openstack-swift		16:11
*** m75abrams has quit IRC		16:12
alenai	clayg: "- I'm not sure what you mean by this specifically" - I'm trying to say, that with container that has nonstop 50+puts/sec and 50+ deletes/sec 24/7/365 and 1hour replication run on average, you have always many reclaimable objects and window between batch deletes is too small to squeeze all those requests (50 puts and 50 deletes) in db file.	16:24
alenai	So, we still see 500s and http timeouts.	16:24
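
A back-of-the-envelope estimate of the tombstone load alenai describes, using only numbers quoted in this log (50 deletes/s, 3 day reclaim_age, roughly 1 hour replication cycle). The function name is hypothetical, and the calculation assumes a perfectly steady delete rate.

```python
def tombstone_estimates(deletes_per_sec, reclaim_age_days, cycle_hours):
    """Return (rows retained until reclaim_age, rows reclaimable per cycle)."""
    seconds_per_day = 86400
    # Tombstones must be kept for reclaim_age before they can be removed.
    retained = deletes_per_sec * reclaim_age_days * seconds_per_day
    # Each replication pass finds one cycle's worth of newly expired rows.
    per_cycle = deletes_per_sec * cycle_hours * 3600
    return retained, per_cycle


retained, per_cycle = tombstone_estimates(50, 3, 1)
print(retained)   # 12960000
print(per_cycle)  # 180000
```

Under these assumptions a hot container always carries about 13 million tombstone rows, and every replication pass has about 180,000 of them to reclaim, so there is never a pass with nothing to delete.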
alenai	I'm still scared to use sharding in production. (Anyways, there are only a couple of containers that have 10mln+ tombstone rows).	16:24
clayg	"you have always many reclaimable objects" makes sense - a single db is really only good for about 100 req/s - if you're sustaining that and also managing to get your replication and reclaims in... that's probably about as good as it's going to get without sharding I think	16:26
clayg	but I imagine patch 727876 still would have HELPED, no? that was basically the issue - I think we could try adding an eventlet.sleep(0) or making the reclaim size configurable (more smaller maybe better in your use-case)	16:27
clayg	but I think long term the solution is "for busy containers we want to scale them with sharding" 🤷‍♂️	16:28
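
The idea behind yielding between batches (the eventlet.sleep(0) suggestion above) can be illustrated with a toy cooperative scheduler. This is only a conceptual sketch using plain Python generators, not eventlet and not Swift code: the task names and the round-robin loop are assumptions made for the example.

```python
from collections import deque


def reclaim(total_rows, batch_size, log):
    """Reclaim job that gives up control after every batch."""
    done = 0
    while done < total_rows:
        step = min(batch_size, total_rows - done)
        done += step
        log.append("reclaim %d" % step)
        yield  # stand-in for a cooperative yield between batches


def request(name, log):
    """A waiting PUT or DELETE that needs one turn to complete."""
    log.append(name)
    yield


def run_round_robin(tasks):
    """Run each task until its next yield, then move on to the next one."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        queue.append(task)


log = []
run_round_robin([
    reclaim(3000, 1000, log),
    request("PUT a", log),
    request("DELETE b", log),
])
print(log)
# ['reclaim 1000', 'PUT a', 'DELETE b', 'reclaim 1000', 'reclaim 1000']
```

In this model the waiting requests run as soon as the first batch finishes, so a smaller batch size shortens the longest stretch during which they are blocked, at the price of more switches overall.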
alenai	yeah, you are right. Maybe I should start testing/staging sharding.... I hoped to postpone this as long as possible... hehe	16:30
clayg	it's going to be SO GREAT! we LOVE sharding!	16:32
clayg	shrinking OTOH 😡	16:32
clayg	but it's getting better 🤞	16:32
alenai	"Note	16:35
alenai	Container sharding is currently an experimental feature."	16:35
alenai	you know.... production and experimental ... in one sentence... Maybe there should be an update in the documentation to cheer up conservative guys like me.	16:35
DHE	it's my understanding that automatic sharding is experimental, but using sharding manually is OK	16:49
DHE	I hope so. my biggest container is using it	16:49
clayg	yes, we've had a todo to update docs to clarify the status of sharding for a while - maybe we can do that before the next major release - but we've been using it reliably for a long time (a year?)	16:55
clayg	all of our big db's are sharded and it's working well for our use-cases	16:55
*** rpittau is now known as rpittau|afk		17:21
*** alenai has quit IRC		17:44
*** alenai has joined #openstack-swift		19:41
*** alenai has quit IRC		19:58
*** clayg_ has joined #openstack-swift		20:17
*** ChanServ sets mode: +v clayg_		20:17
*** jrosser_ has joined #openstack-swift		20:18
*** fyx_ has joined #openstack-swift		20:18
*** f0o|away has joined #openstack-swift		20:25
*** jrosser has quit IRC		20:26
*** clayg has quit IRC		20:26
*** fyx has quit IRC		20:26
*** f0o has quit IRC		20:26
*** zigo has quit IRC		20:26
*** sorrison has quit IRC		20:26
*** clayg_ is now known as clayg		20:26
*** f0o|away is now known as f0o		20:26
*** jrosser_ is now known as jrosser		20:26
*** fyx_ is now known as fyx		20:26
*** zigo has joined #openstack-swift		20:33
*** gyee has joined #openstack-swift		21:16
openstackgerrit	Tim Burke proposed openstack/swift master: Run flake8 on bin/ files https://review.opendev.org/c/openstack/swift/+/773485	21:27
*** zaitcev has quit IRC		22:12
*** rcernin has joined #openstack-swift		22:20
*** Underknowledge has joined #openstack-swift		22:47
*** zaitcev has joined #openstack-swift		23:13
*** ChanServ sets mode: +v zaitcev		23:13
