Wednesday, 2025-01-29

*** tkajinam is now known as Guest7334		09:05
opendevreview	Alistair Coles proposed openstack/swift master: Refactor some file-like iters as utils.InputProxy subclasses https://review.opendev.org/c/openstack/swift/+/940059	10:20
rcmgleite	Hey folks, have a quick question here: we've been having issues with the object-replicator recently. It's running too frequently and causing slowness in client requests. We use the default interval config of 30 seconds. I wanted to ask what's the right way to think about this configuration, because we want to make it something larger, like 30 minutes maybe. But I'm not really sure I understand the side effects of a change like that	13:15
opendevreview	Shreeya Deshpande proposed openstack/swift master: Add labeled metrics to s3api https://review.opendev.org/c/openstack/swift/+/939481	17:44
opendevreview	Shreeya Deshpande proposed openstack/swift master: Add labeled metrics to s3api https://review.opendev.org/c/openstack/swift/+/939481	19:10
mattoliver	Morning	21:02
rcmgleite	hello hello!	21:04
rcmgleite	@mattoliver - are we having the sync this week? Or is it bi-weekly?	21:05
mattoliver	rcmgleite: I assume you have a replication network. You can also set the ionice for the replicator. I think you can also turn down the speed. I'll need to check when I get to my desk (currently on my phone), school drop off this morning.	21:06
mattoliver	we're supposed to have it..	21:06
mattoliver	Maybe timburke_ is distracted, or called out. let me check	21:07
mattoliver	I've pinged him.	21:08
rcmgleite	Cool!	21:09
mattoliver	Today is the first day back at school for my kiddos.. so at 1/2 past the hour I need to drop. Otherwise I'd just start the meeting and get started.	21:10
timburke_	oh, right! sorry	21:11
mattoliver	nps	21:11
timburke_	mattoliver's kids are back in school, mine are out for lunar new year :P	21:12
timburke_	i don't actually have all that much to talk about, though	21:13
mattoliver	That's ok. I only have 15 minutes before I need to drop. So happy to do whatever. Either a quick open floor (not that I have anything to say) or skip until next week is also fine by me	21:13
mattoliver	rcmgleite: is there anything you want to chat about?	21:14
timburke_	i think the main interesting thing is that recent boto3/botocore/s3transfer releases started using some aws-chunked protocol by default, and we don't currently support it	21:14
rcmgleite	I have one for you @timburke_ - how can I help with the chunked encoding uploads PR? I saw a few TODOs but not sure	21:14
rcmgleite	ok so we're on the same page :p	21:14
mattoliver	oh yeah	21:14
timburke_	https://review.opendev.org/c/openstack/swift/+/836755	21:14
timburke_	i'm working on getting some more cross-compat tests up that would exercise the hmac-sha256 signing	21:15
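For readers unfamiliar with the aws-chunked encoding discussed here, below is a minimal Python sketch of the signed (STREAMING-AWS4-HMAC-SHA256-PAYLOAD) framing as described in AWS's SigV4 streaming documentation: each chunk is sent as `hex(size);chunk-signature=<sig>\r\n<data>\r\n`, each chunk signature is an HMAC-SHA256 chained from the previous one (starting from the request's seed signature), and a signed zero-length chunk ends the body. This is not the code in the review above; the helper names, the chunk size, and the seed signature used in testing are hypothetical, and trailing checksums are ignored entirely.

```python
import hashlib
import hmac

# SHA-256 of the empty string, which appears in every chunk's string-to-sign.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()
SIG_PREFIX = "chunk-signature="


def signing_key(secret, date, region, service="s3"):
    """Derive the SigV4 signing key: an HMAC chain over date, region, service."""
    key = hmac.new(("AWS4" + secret).encode(), date.encode(), hashlib.sha256).digest()
    for part in (region, service, "aws4_request"):
        key = hmac.new(key, part.encode(), hashlib.sha256).digest()
    return key


def chunk_signature(key, timestamp, scope, prev_sig, data):
    """Each chunk's signature covers the previous signature and this chunk's hash."""
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256-PAYLOAD",
        timestamp,
        scope,
        prev_sig,
        EMPTY_SHA256,
        hashlib.sha256(data).hexdigest(),
    ])
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()


def encode_aws_chunked(data, chunk_size, key, timestamp, scope, seed_sig):
    """Client side: frame `data` as a signed aws-chunked body."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    pieces.append(b"")  # the terminating zero-length chunk is signed too
    out, prev = [], seed_sig
    for piece in pieces:
        sig = chunk_signature(key, timestamp, scope, prev, piece)
        out.append(b"%x;%s%s\r\n" % (len(piece), SIG_PREFIX.encode(), sig.encode()))
        out.append(piece + b"\r\n")
        prev = sig
    return b"".join(out)


def decode_aws_chunked(body, key, timestamp, scope, seed_sig):
    """Server side: unframe the body, verifying each chunk signature in order."""
    data, prev, pos = [], seed_sig, 0
    while True:
        eol = body.index(b"\r\n", pos)  # ValueError if the header line is truncated
        size_hex, _, sig_part = body[pos:eol].decode("ascii").partition(";")
        if not sig_part.startswith(SIG_PREFIX):
            raise ValueError("missing chunk-signature")
        claimed = sig_part[len(SIG_PREFIX):]
        size = int(size_hex, 16)
        start, end = eol + 2, eol + 2 + size
        chunk = body[start:end]
        if len(chunk) != size or body[end:end + 2] != b"\r\n":
            raise ValueError("malformed chunk framing")
        expected = chunk_signature(key, timestamp, scope, prev, chunk)
        if not hmac.compare_digest(expected, claimed):
            raise ValueError("chunk signature mismatch")
        if size == 0:
            return b"".join(data)
        data.append(chunk)
        prev, pos = claimed, end + 2
```

The chaining is the reason a server can't just strip the framing: every chunk must be checked against the signature of the one before it, so a tampered or reordered chunk fails verification.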
timburke_	reviews for what could use some clarification/rewording/refactoring would probably be a good idea. i'm nervous about merging it if i'm the only one that understands how it works, especially since i don't feel like i understand it all ;-)	21:16
timburke_	i know acoles has been looking at it recently, too -- need to actually address comments	21:17
rcmgleite	haha! Cool. I'll block some time tomorrow to go over it as well	21:17
mattoliver	And you've pulled it away from the extra checksum patches, so it'll just ignore extra checksums for now, in the hope of getting this fix landed quicker?	21:17
mattoliver	I'll attempt to look at it today after school drop off.	21:18
mattoliver	not that I'm a s3api expert.. but I can read code and the api docs :)	21:18
timburke_	yeah, that seemed to be the consensus take, since client upgrades are forcing our hand a bit	21:18
mattoliver	kk, makes sense. We still plan to have the extra checksum support; so long as it lands before the next official release no one will notice (except us who run master++)	21:19
rcmgleite	Aside from chunked encoding, I remember you mentioning a possible rework of multi-part uploads. Would you be able to explain what is the new version going to address?	21:22
mattoliver	The new MPU stuff will be more atomic. Currently MPUs use SLOs, which were deliberately designed to give users access to both the manifest (the joined object) and the segments, so they can use them in more places.	21:23
rcmgleite	We've started benchmarking s3 vs swift recently and we've seen a lot of difference, especially in TTFB (swift being slower), which could 100% be our infrastructure... But I also wonder about all the overhead that things like versioning and MPU add. I'll likely share some of our results here just for the sake of discussion	21:24
mattoliver	the new mpu will keep the segments hidden for the user, so a delete of the mpu object can also clean the segments. makes them a 1:1 mapping.	21:24
mattoliver	could be interesting to test a later version of swift; we've made some speed improvements	21:25
rcmgleite	@mattoliver - so it's not a performance benefit that we are seeking here?	21:25
mattoliver	Anyway, I gotta run	21:25
rcmgleite	We are on our way to catching up... currently still on Zed :/	21:25
rcmgleite	Thanks matt! See you!	21:25
timburke_	rcmgleite, no, it's more about correctness -- making sure users can't delete segments out from under the mpu, or overwrite the mpu without deleting the segments	21:26
timburke_	see https://bugs.launchpad.net/swift/+bug/1813202 for example	21:26
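To make the correctness point concrete, here is a deliberately tiny, hypothetical in-memory Python model (not Swift's code; the class and method names are invented for illustration). It contrasts SLO-style MPUs, where segments are ordinary user-visible objects, with the hidden-segment design described above, where segments are reachable only through their MPU and are cleaned up when the MPU is overwritten or deleted.

```python
class ToyBucket:
    """Hypothetical in-memory model of MPU storage; not Swift's actual code.

    hidden_segments=False: SLO-style, segments are ordinary user-visible objects.
    hidden_segments=True: segments are only reachable through their MPU.
    """

    def __init__(self, hidden_segments):
        self.hidden_segments = hidden_segments
        self.manifests = {}  # MPU name -> list of segment names
        self.segments = {}   # segment name -> bytes
        self._uploads = 0

    def complete_mpu(self, name, parts):
        self._uploads += 1
        new_segs = ["%s/upload-%d/%d" % (name, self._uploads, i) for i in range(len(parts))]
        for seg, part in zip(new_segs, parts):
            self.segments[seg] = part
        old_segs = self.manifests.get(name, [])
        self.manifests[name] = new_segs  # swap in the new manifest first...
        if self.hidden_segments:
            self._drop(old_segs)  # ...then clean up the segments it replaced

    def delete(self, name):
        segs = self.manifests.pop(name, [])
        if self.hidden_segments:
            self._drop(segs)  # 1:1 mapping, so the segments go with the MPU

    def delete_segment(self, seg):
        if self.hidden_segments:
            raise PermissionError("segments are not user-addressable")
        self.segments.pop(seg, None)

    def get(self, name):
        # Raises KeyError if a segment was deleted out from under the MPU.
        return b"".join(self.segments[s] for s in self.manifests[name])

    def orphaned(self):
        referenced = {s for segs in self.manifests.values() for s in segs}
        return set(self.segments) - referenced

    def _drop(self, segs):
        for seg in segs:
            self.segments.pop(seg, None)
```

In the SLO-style model an overwrite leaves the old segments orphaned and a user can break the MPU by deleting one of its segments; in the hidden-segment model neither can happen.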
rcmgleite	Cool! Thanks	21:28
rcmgleite	A last question that I'm hoping to get some knowledge on from ppl who have been running swift in production for a while: the default interval for replicators is 30 seconds. Is this really the config you guys usually use?	21:29
rcmgleite	this seems to be causing perf issues for us (db locking mostly but very early in the investigation). And I'm wondering if there's a more sensible number we should be using there	21:30
timburke_	no, definitely not. something on the order of hours is perfectly reasonable. though you might get further by turning down databases_per_second/files_per_second and/or setting ionice_class to something like IOPRIO_CLASS_IDLE	21:32
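A rough way to reason about these knobs: if a pass is rate-limited by `databases_per_second`, it takes about `num_dbs / databases_per_second` seconds, and a 30-second `interval` only matters if passes finish faster than that. The Python sketch below parses an example `[container-replicator]` section with the options Tim mentions and applies that arithmetic. The example values and the 3,000-database figure are hypothetical, the fallback of 50 for `databases_per_second` is believed to match upstream but should be checked against your release's sample configs, and the model assumes the daemon sleeps only for whatever is left of `interval` after a pass.

```python
import configparser

# Hypothetical example values; check your Swift release's sample configs.
EXAMPLE_CONF = """
[container-replicator]
interval = 3600
databases_per_second = 10
ionice_class = IOPRIO_CLASS_IDLE
"""


def load_replicator_settings(text, section="container-replicator"):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser[section]
    return {
        "interval": sec.getfloat("interval", fallback=30.0),
        # Assumed upstream default; verify before relying on it.
        "databases_per_second": sec.getfloat("databases_per_second", fallback=50.0),
        "ionice_class": sec.get("ionice_class", fallback=None),
    }


def estimate_cycle(num_dbs, databases_per_second, interval):
    """Rough model only: assumes rate limiting is the bottleneck and that the
    daemon sleeps just for the remainder of `interval` after each pass."""
    pass_seconds = num_dbs / databases_per_second
    sleep_seconds = max(0.0, interval - pass_seconds)
    cycle = pass_seconds + sleep_seconds
    busy_fraction = pass_seconds / cycle if cycle else 0.0
    return pass_seconds, sleep_seconds, busy_fraction
```

Under this model, 3,000 databases at 50/s take 60 seconds per pass, so a 30-second interval yields no idle time at all, while 10/s with an hourly interval keeps the replicator busy for roughly 8% of the time.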
timburke_	we should look at changing those defaults	21:32
rcmgleite	The side effect of making that larger is likely a bigger inconsistency window right?	21:33
rcmgleite	And on the same note, this makes me wonder about other defaults that could be better configured. We run almost 99% default configs, and are now actually trying to go through all of them and see if they need changing. Are there any resources on better defaults that I can dig up somewhere?	21:34
timburke_	yeah, it may cause longer inconsistencies -- though at the DB layer i feel like the replicator is more of a backstop; the object-updater is the main thing to watch when looking for listing consistency	21:36
rcmgleite	Interesting. So the object/account/container replicators could all have much longer cycles.	21:37
timburke_	well, for object-level consistency, the replicator/reconstructor definitely help. if an overwrite ends up on a mix of primaries and handoffs, any primary that missed the overwrite will continue serving old data until a replicator gets the new data to it	21:40
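The stale-primary scenario can be illustrated with a toy last-write-wins model in Python. This is a hypothetical sketch, not Swift's replicator: it ignores partitions, tombstones, rsync and hashes, and the node names `p1`..`p3` (primaries) and `h1` (handoff) are invented. An overwrite lands on two primaries plus a handoff; the primary that missed it keeps serving old data until a replication pass pushes the newer version to it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Nodes = Dict[str, Optional["Version"]]


@dataclass(frozen=True)
class Version:
    timestamp: float
    data: bytes


def write(nodes: Nodes, targets: List[str], version: Version) -> None:
    """A PUT lands on whichever nodes the proxy could reach (primaries or handoffs)."""
    for name in targets:
        nodes[name] = version


def read(nodes: Nodes, name: str) -> Optional[bytes]:
    """A GET served by one node returns whatever that node has, stale or not."""
    version = nodes.get(name)
    return None if version is None else version.data


def replicator_pass(nodes: Nodes, primaries: List[str], handoffs: List[str]) -> None:
    """Push the newest version to every primary, then drop handoff copies."""
    versions = [v for v in nodes.values() if v is not None]
    if not versions:
        return
    newest = max(versions, key=lambda v: v.timestamp)
    for name in primaries:
        current = nodes.get(name)
        if current is None or current.timestamp < newest.timestamp:
            nodes[name] = newest
    for name in handoffs:
        nodes[name] = None
```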
timburke_	but with listings, a missed update gets written down as an async pending for the updater to resolve, and when it does it has to talk to all db replicas	21:41
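For listings, the async-pending flow Tim describes can be sketched as a retry loop. This is a hypothetical Python model, not the object-updater's code: the record type, the sweep function, and the assumption that a pending remembers which DB replicas already acknowledged it are all illustrative. The key property is that a pending is only discarded once every container DB replica has the update.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AsyncPending:
    """Hypothetical record of a container update that missed some DB replicas."""
    obj_name: str
    done: Set[str] = field(default_factory=set)  # replicas that acknowledged


def updater_sweep(pendings: List[AsyncPending], db_replicas: List[str],
                  reachable: Set[str], listings: Dict[str, Set[str]]) -> List[AsyncPending]:
    """One hypothetical sweep: send each pending to every DB replica it hasn't
    reached yet; keep it for the next sweep unless all replicas now have it."""
    remaining = []
    for pending in pendings:
        for replica in db_replicas:
            if replica in pending.done or replica not in reachable:
                continue
            listings[replica].add(pending.obj_name)
            pending.done.add(replica)
        if pending.done != set(db_replicas):
            remaining.append(pending)
    return remaining
```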
rcmgleite	fair enough. But for any metadata updates on the container / account dbs the updater is the one that matters the most then?	21:43
timburke_	true	21:43
rcmgleite	that's really helpful, thanks tim. We are just getting started on all these control plane daemons.. our understanding is much weaker than it should be	21:43
timburke_	or rather, the replicators are still significant for getting metadata in sync	21:43
rcmgleite	yup, makes sense	21:44
rcmgleite	and as for other defaults, guess we will have to go over them and just choose what makes sense right? Just wonder if there should be some effort in making the defaults the most "sane" configs for most cases	21:44
timburke_	yeah, i like that idea a lot. i know we sometimes get nervous about changing defaults unexpectedly, but surely we can fix some of the worst offenders (such as interval)	21:49
rcmgleite	cool, we will start playing with these values and whatever other things we believe are off I'll bring here and we can discuss then!	21:50
timburke_	thanks! sounds like a good plan	21:50
rcmgleite	Thanks a lot again Tim! Hope you have a great rest of the week!	21:51
mattoliver	I remember at a past PTG we wanted to revisit the current defaults and make them more useful for modern deployments.	22:14
mattoliver	So yeah, revisiting defaults is something we should look at!	22:14