*** baojg has joined #openstack-swift | 02:03 | |
*** rcernin has quit IRC | 02:21 | |
*** rcernin has joined #openstack-swift | 02:36 | |
alenai | Hello everyone. I'm here to ask for feedback about this patch https://review.opendev.org/c/openstack/swift/+/727876. Did anyone running a production-scale deployment (>10mln objects per container, db files on ssd and ~50+ puts/s with ~50+ deletes/s on hot containers) see any reduction in 500 response codes? | 02:41 |
*** rcernin has quit IRC | 02:45 | |
*** rcernin has joined #openstack-swift | 02:45 | |
alenai | I'm testing version 2.25.1 of swift before deploying it to production (from 2.25.0) and, for some reason, see only minor improvements. Test setup: db file > 5.3GB, 5+mln objects, db on ssd, 3 replicas, replicator with 5 minute interval to accumulate reclaims, 3 day reclaim_age. | 02:48 |
alenai | https://paste.pics/e048c562d7795c3228aadde12c39b0a7 for latency metrics (each spike is a replicator that arrived) and https://paste.pics/8a4ae6f744d3cb5a8f6598569570b5fe (each dip is a replicator that arrived). | 02:52 |
alenai | I tried to insert time.sleep(0.05) between batch deletes and it helped a lot, but it introduced another problem - replication run times skyrocketed. I cannot allow this in production, because it would take days to complete 1 replication run. | 02:55 |
alenai | Maybe I'm naive to rely on this patch with my load profile? Window between batch deletes are only for single digit requests? So I can rely on 30-40 quantile latency reduction (seen on dashboards during some replication runs)..... | 02:59 |
alenai | P.S: proxy server timeout regarding container server http requests here is 10 seconds. Current replication run takes about 1 hour in production (300k+ containers on 6 nodes with 8 ssd on each) | 03:01 |
*** rcernin has quit IRC | 04:00 | |
*** fingo has quit IRC | 04:02 | |
*** rcernin has joined #openstack-swift | 04:02 | |
*** rcernin has quit IRC | 04:27 | |
*** rcernin has joined #openstack-swift | 04:35 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-swift | 05:33 | |
*** m75abrams has joined #openstack-swift | 05:46 | |
openstackgerrit | Merged openstack/swift master: relinker: Improve logging https://review.opendev.org/c/openstack/swift/+/769632 | 06:31 |
*** alenai has quit IRC | 06:36 | |
*** gmann has quit IRC | 06:45 | |
*** gmann has joined #openstack-swift | 06:47 | |
*** rcernin has quit IRC | 07:25 | |
*** rcernin has joined #openstack-swift | 08:07 | |
*** rpittau|afk is now known as rpittau | 08:11 | |
*** rcernin has quit IRC | 08:24 | |
*** rcernin has joined #openstack-swift | 08:26 | |
*** rcernin has quit IRC | 08:31 | |
*** alenai has joined #openstack-swift | 08:32 | |
*** rcernin has joined #openstack-swift | 08:37 | |
*** cschwede has joined #openstack-swift | 08:57 | |
*** ChanServ sets mode: +v cschwede | 08:57 | |
*** rcernin has quit IRC | 09:27 | |
*** benj_ has quit IRC | 09:28 | |
*** benj_ has joined #openstack-swift | 09:30 | |
*** rcernin has joined #openstack-swift | 09:39 | |
*** rcernin has quit IRC | 10:06 | |
*** baojg has quit IRC | 10:09 | |
*** baojg has joined #openstack-swift | 10:10 | |
*** baojg has quit IRC | 10:10 | |
*** baojg has joined #openstack-swift | 10:10 | |
*** alenai has quit IRC | 10:11 | |
*** baojg has quit IRC | 10:11 | |
*** baojg has joined #openstack-swift | 10:11 | |
*** baojg has quit IRC | 10:11 | |
*** baojg has joined #openstack-swift | 10:12 | |
*** baojg has quit IRC | 10:12 | |
*** baojg has joined #openstack-swift | 10:13 | |
*** baojg has quit IRC | 10:13 | |
*** baojg has joined #openstack-swift | 10:13 | |
*** baojg has quit IRC | 10:14 | |
*** baojg has joined #openstack-swift | 10:14 | |
*** baojg has quit IRC | 10:15 | |
*** baojg has joined #openstack-swift | 10:15 | |
*** baojg has quit IRC | 10:15 | |
*** baojg has joined #openstack-swift | 10:16 | |
*** baojg has quit IRC | 10:16 | |
*** cschwede has quit IRC | 10:47 | |
*** rcernin has joined #openstack-swift | 11:16 | |
*** rcernin has quit IRC | 12:33 | |
*** rcernin has joined #openstack-swift | 12:47 | |
*** rcernin has quit IRC | 13:38 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: relinker: Add option to drop privileges https://review.opendev.org/c/openstack/swift/+/772419 | 14:59 |
clayg | alenai: yes we saw a reduction in 500s from container servers when we rolled out that patch - large dbs took a long time to reclaim and the patch seemed to allow more index updates with fewer timeouts. | 15:01 |
clayg | alenai: "Window between batch deletes are only for single digit requests?" - I'm not sure what you mean by this specifically | 15:05 |
clayg | alenai: are the containers default 3x replica? 1hr db replication cycle in a large cluster is great; replication cycle <=24hrs is probably manageable. | 15:07 |
clayg | 10M *rows* per container should still be manageable - if you have 10M objects and 800M tombstone rows (deleted = 1) reclaim can be very challenging. You'll want to consider https://docs.openstack.org/swift/latest/overview_container_sharding.html | 15:09 |
openstackgerrit | Clay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async https://review.opendev.org/c/openstack/swift/+/648263 | 15:38 |
*** zaitcev has joined #openstack-swift | 15:57 | |
*** ChanServ sets mode: +v zaitcev | 15:57 | |
*** alenai has joined #openstack-swift | 16:11 | |
*** m75abrams has quit IRC | 16:12 | |
alenai | clayg: "- I'm not sure what you mean by this specifically" - I'm trying to say that with a container that has nonstop 50+ puts/sec and 50+ deletes/sec 24/7/365 and a 1 hour replication run on average, you have always many reclaimable objects and the window between batch deletes is too small to squeeze all those requests (50 puts and 50 deletes) into the db file. | 16:24 |
alenai | So, we still see 500s and http timeouts. | 16:24 |
alenai | I'm still scared to use sharding in production. (Anyways, there are only a couple of containers that have 10mln+ tombstone rows). | 16:24 |
clayg | "you have always many reclaimable objects" makes sense - a single db is really only good for about 100 req/s - if you're sustaining that and also managing to get your replication and reclaims in... that's probably about as good as it's going to get without sharding I think | 16:26 |
clayg | but I imagine patch 727876 still would have HELPED, no? that was basically the issue - I think we could try adding an eventlet.sleep(0) or making the reclaim size configurable (more smaller maybe better in your use-case) | 16:27 |
clayg | but I think long term the solution is "for busy containers we want to scale them with sharding" 🤷♂️ | 16:28 |
alenai | yeah, you are right. Maybe I should start testing/staging sharding.... I hoped to postpone this as long as possible... hehe | 16:30 |
clayg | it's going to be SO GREAT! we *LOVE* sharding! | 16:32 |
clayg | shrinking OTOH 😡 | 16:32 |
clayg | but it's getting better 🤞 | 16:32 |
alenai | "Note | 16:35 |
alenai | Container sharding is currently an experimental feature." | 16:35 |
alenai | you know.... production and experimental ... in one sentence... Maybe there should be an update in documentation to cheer up conservative guys like me. | 16:35 |
DHE | it's my understanding that automatic sharding is experimental, but using sharding manually is OK | 16:49 |
DHE | I hope so. my biggest container is using it | 16:49 |
clayg | yes, we've had a todo to update docs to clarify the status of sharding for a while - maybe we can do that before the next major release - but we've been using it reliably for a long time (a year?) | 16:55 |
clayg | all of our big db's are sharded and it's working well for our use-cases | 16:55 |
*** rpittau is now known as rpittau|afk | 17:21 | |
*** alenai has quit IRC | 17:44 | |
*** alenai has joined #openstack-swift | 19:41 | |
*** alenai has quit IRC | 19:58 | |
*** clayg_ has joined #openstack-swift | 20:17 | |
*** ChanServ sets mode: +v clayg_ | 20:17 | |
*** jrosser_ has joined #openstack-swift | 20:18 | |
*** fyx_ has joined #openstack-swift | 20:18 | |
*** f0o|away has joined #openstack-swift | 20:25 | |
*** jrosser has quit IRC | 20:26 | |
*** clayg has quit IRC | 20:26 | |
*** fyx has quit IRC | 20:26 | |
*** f0o has quit IRC | 20:26 | |
*** zigo has quit IRC | 20:26 | |
*** sorrison has quit IRC | 20:26 | |
*** clayg_ is now known as clayg | 20:26 | |
*** f0o|away is now known as f0o | 20:26 | |
*** jrosser_ is now known as jrosser | 20:26 | |
*** fyx_ is now known as fyx | 20:26 | |
*** zigo has joined #openstack-swift | 20:33 | |
*** gyee has joined #openstack-swift | 21:16 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Run flake8 on bin/ files https://review.opendev.org/c/openstack/swift/+/773485 | 21:27 |
*** zaitcev has quit IRC | 22:12 | |
*** rcernin has joined #openstack-swift | 22:20 | |
*** Underknowledge has joined #openstack-swift | 22:47 | |
*** zaitcev has joined #openstack-swift | 23:13 | |
*** ChanServ sets mode: +v zaitcev | 23:13 |
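
Editor's note on the reclaim discussion above (alenai's time.sleep(0.05) experiment and clayg's suggestion of a yield between batches or a configurable reclaim size): the trade-off can be sketched with plain sqlite3. This is a minimal illustration, not Swift's reclaim code. The `object` table layout, the function names, and the `batch_size`, `pause` and `sleep` parameters are all assumptions made for the example. The idea is that each small batch is committed on its own, so the write lock is released and queued PUT/DELETE requests can get in, at the cost of a longer total run when `pause` is non-zero.

```python
import sqlite3
import time


def make_demo_db(tombstones, live):
    """Build a hypothetical in-memory container db (not Swift's real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object (name TEXT, created_at REAL, deleted INTEGER)")
    rows = [("t%d" % i, 100.0, 1) for i in range(tombstones)]
    rows += [("o%d" % i, 100.0, 0) for i in range(live)]
    conn.executemany("INSERT INTO object VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def reclaim_in_batches(conn, age_timestamp, batch_size=1000, pause=0.0,
                       sleep=time.sleep):
    """Delete old tombstone rows in small batches, committing between them.

    Returns the total number of rows removed. `sleep` is injectable so a
    cooperative yield (or a test double) can be swapped in.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM object WHERE rowid IN ("
            "  SELECT rowid FROM object"
            "  WHERE deleted = 1 AND created_at < ?"
            "  LIMIT ?)",
            (age_timestamp, batch_size),
        )
        # Commit per batch: the write lock is held only for one batch.
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            # A short (or empty) batch means nothing reclaimable is left.
            return total
        # Window in which other writers can reach the db file.
        sleep(pause)
```

A smaller `batch_size` shortens each lock hold but adds more commits, and every non-zero `pause` is paid once per full batch, which is why a fixed sleep stretches the replication run on containers with many tombstones.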