Wednesday, 2025-01-29

*** tkajinam is now known as Guest7334 [09:05]
opendevreview: Alistair Coles proposed openstack/swift master: Refactor some file-like iters as utils.InputProxy subclasses  https://review.opendev.org/c/openstack/swift/+/940059 [10:20]
rcmgleite: Hey folks, have a quick question here: we've been having issues with the object-replicator recently. It's running too frequently and causing slowness in client requests. We use the default interval config of 30 seconds. I wanted to ask what's the right way to think about this configuration because we wanted to make it something larger like 30 minutes maybe. But I'm not really sure I understand the side effects of a change like that [13:15]
opendevreview: Shreeya Deshpande proposed openstack/swift master: Add labeled metrics to s3api  https://review.opendev.org/c/openstack/swift/+/939481 [17:44]
opendevreview: Shreeya Deshpande proposed openstack/swift master: Add labeled metrics to s3api  https://review.opendev.org/c/openstack/swift/+/939481 [19:10]
mattoliver: Morning [21:02]
rcmgleite: hello hello! [21:04]
rcmgleite: @mattoliver - are we having the sync this week? Or is it bi-weekly? [21:05]
mattoliver: rcmgleite: I assume you have a replication network. You can also set the ionice for the replicator. I think you can also turn down the speed. I'll need to check when I get to my desk (currently on my phone), school drop off this morning. [21:06]
mattoliver: we're supposed to have it.. [21:06]
mattoliver: Maybe timburke_ is distracted, or called out. let me check [21:07]
mattoliver: I've pinged him. [21:08]
rcmgleite: Cool! [21:09]
mattoliver: Today is the first day back at school for my kiddos.. so at 1/2 past the hour I need to drop. Otherwise I'd just start the meeting and get started. [21:10]
timburke_: oh, right! sorry [21:11]
mattoliver: nps [21:11]
timburke_: mattoliver's kids are back in school, mine are out for lunar new year :P [21:12]
timburke_: i don't actually have all that much to talk about, though [21:13]
mattoliver: That's ok. I only have 15 minutes before I need to drop. So happy to do whatever. Either a quick open floor (not that I have anything to say) or skip until next week is also fine by me [21:13]
mattoliver: rcmgleite: is there anything you want to chat about? [21:14]
timburke_: i think the main interesting thing is that recent boto3/botocore/s3transfer releases started using some aws-chunked protocol by default, and we don't currently support it [21:14]
rcmgleite: I have one for you @timburke_ - how can I help with the chunked encoding uploads PR? I saw a few TODOs but not sure [21:14]
rcmgleite: ok so we're on the same page :p [21:14]
mattoliver: oh yeah [21:14]
timburke_: https://review.opendev.org/c/openstack/swift/+/836755 [21:14]
timburke_: i'm working on getting some more cross-compat tests up that would exercise the hmac-sha256 signing [21:15]
timburke_: reviews for what could use some clarification/rewording/refactoring would probably be a good idea. i'm nervous about merging it if i'm the only one that understands how it works, especially since i don't feel like i understand it all ;-) [21:16]
timburke_: i know acoles has been looking at it recently, too -- need to actually address comments [21:17]
rcmgleite: haha! Cool. I'll block some time tomorrow to go over it as well [21:17]
mattoliver: And you've pulled it away from the extra checksum patches, so it'll just ignore extra checksums for now in the hopes of getting this fix landed quicker? [21:17]
mattoliver: I'll attempt to look at it today after school drop off. [21:18]
mattoliver: not that I'm an s3api expert.. but I can read code and the api docs :) [21:18]
timburke_: yeah, that seemed to be the consensus take, since client upgrades are forcing our hand a bit [21:18]
mattoliver: kk, makes sense. We still plan to have the extra checksum support, so long as it lands before the next official release no one will notice (except us who run master++) [21:19]
rcmgleite: Aside from chunked encoding, I remember you mentioning a possible rework of multi-part uploads. Would you be able to explain what the new version is going to address? [21:22]
mattoliver: The new MPU stuff will be more atomic. Currently they use SLOs, which were designed on purpose for users to have access to the manifest (joined object) and the segments. So they can use them in more places. [21:23]
rcmgleite: We've started benchmarking s3 vs swift recently and we've seen a lot of difference especially in ttfb - swift being slower - which could 100% be our infrastructure.. But I also wonder about all the overhead things like versioning and mpu add. I'll likely share some of our results here just for the sake of discussion [21:24]
mattoliver: the new mpu will keep the segments hidden from the user, so a delete of the mpu object can also clean the segments. makes them a 1:1 mapping. [21:24]
mattoliver: could be interesting to test the later version of swift, we've done some speed improvements [21:25]
rcmgleite: @mattoliver - so it's not a performance benefit that we are seeking here? [21:25]
mattoliver: Anyway, I gotta run [21:25]
rcmgleite: We are on our way to catch up... currently at ZED still :/ [21:25]
rcmgleite: Thanks matt! See you! [21:25]
timburke_: rcmgleite, no, it's more about correctness -- making sure users can't delete segments out from under the mpu, or overwrite the mpu without deleting the segments [21:26]
timburke_: see https://bugs.launchpad.net/swift/+bug/1813202 for example [21:26]
rcmgleite: Cool! Thanks [21:28]
rcmgleite: A last question that I'm hoping to get some knowledge on from ppl that have been running swift in production for a while: The default interval for replicators is 30 seconds. Is this really the config you guys usually use? [21:29]
rcmgleite: this seems to be causing perf issues for us (db locking mostly but very early in the investigation). And I'm wondering if there's a more sensible number we should be using there [21:30]
timburke_: no, definitely not. something on the order of hours is perfectly reasonable. though you might get further by turning down databases_per_second/files_per_second and/or setting ionice_class to something like IOPRIO_CLASS_IDLE [21:32]
timburke_: we should look at changing those defaults [21:32]
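[Editor's note: a minimal sketch of the kind of tuning suggested above, not a recommendation. The option names interval, databases_per_second and ionice_class are the ones mentioned in the discussion; putting them in a [container-replicator] section and the specific values (one hour, 10 databases per second) are assumptions for illustration only. The Python wrapper just parses the fragment with the standard-library configparser so the values can be sanity-checked before deploying.]

```python
import configparser

# Illustrative values only (assumptions, not tested recommendations).
SAMPLE_CONF = """
[container-replicator]
# Seconds between replication passes; the default discussed above is 30.
interval = 3600
# Upper bound on databases processed per second in a pass.
databases_per_second = 10
# Ask the kernel to schedule this daemon's disk I/O at idle priority.
ionice_class = IOPRIO_CLASS_IDLE
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)
section = parser["container-replicator"]

# Convert to numbers so a typo in the file fails loudly here.
interval_seconds = section.getfloat("interval")
db_rate = section.getfloat("databases_per_second")
ionice_class = section["ionice_class"]

print(f"pass interval: {interval_seconds / 3600:.1f} h")
print(f"databases per second: {db_rate}")
print(f"ionice class: {ionice_class}")
```

[A longer interval trades replication load for a wider window in which replicas can disagree, which is the side effect discussed in the lines that follow.]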
rcmgleite: The side effect of making that larger is likely a bigger inconsistency window right? [21:33]
rcmgleite: And on the same page - this makes me wonder about other defaults that could be better configured - We run on almost 99% of the default configs.. and are now actually trying to go through all of them and see if they need changing. Are there any resources on better defaults that I can dig up somewhere? [21:34]
timburke_: yeah, it may cause longer inconsistencies -- though at the DB layer i feel like the replicator is more of a backstop; the object-updater is the main thing to watch when looking for listing consistency [21:36]
rcmgleite: Interesting. So object/account/container replicators could all have much longer cycles. That's interesting [21:37]
timburke_: well, for object-level consistency, the replicator/reconstructor definitely help. if an overwrite ends up on a mix of primaries and handoffs, any primary that missed the overwrite will continue serving old data until a replicator gets the new data to it [21:40]
timburke_: but with listings, a missed update gets written down as an async pending for the updater to resolve, and when it does it has to talk to all db replicas [21:41]
rcmgleite: fair enough. But for any metadata updates on the container / account dbs the updater is the one that matters the most then? [21:43]
timburke_: true [21:43]
rcmgleite: that's really helpful, thanks tim. We are just getting started on all these control plane daemons.. our understanding is much weaker than it should be [21:43]
timburke_: or rather, the replicators are still significant for getting metadata in sync [21:43]
rcmgleite: yup, makes sense [21:44]
rcmgleite: and as for other defaults, guess we will have to go over them and just choose what makes sense right? Just wonder if there should be some effort in making the defaults the most "sane" configs for most cases [21:44]
timburke_: yeah, i like that idea a lot. i know we sometimes get nervous about changing defaults unexpectedly, but surely we can fix some of the worst offenders (such as interval) [21:49]
rcmgleite: cool, we will start playing with these values and whatever other things we believe are off I'll bring here and we can discuss then! [21:50]
timburke_: thanks! sounds like a good plan [21:50]
rcmgleite: Thanks a lot again Tim! Hope you have a great rest of the week! [21:51]
mattoliver: I remember in a past ptg we wanted to revisit the current defaults, and make them more useful to modern deployments. [22:14]
mattoliver: So yeah, revisiting defaults is something we should look at! [22:14]
