21:00:42 <timburke> #startmeeting swift
21:00:42 <opendevmeet> Meeting started Wed Sep 28 21:00:42 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:42 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:42 <opendevmeet> The meeting name has been set to 'swift'
21:00:48 <timburke> who's here for the swift meeting?
21:01:06 <mattoliver> o/
21:01:07 <acoles> o/
21:02:15 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:30 <timburke> it's a little sparse, mostly owing to me being out last week
21:02:40 <timburke> first up
21:02:50 <timburke> #topic PTG
21:02:58 <timburke> it's soon! a little over two weeks
21:03:12 <mattoliver> Oh wow, that is soon
21:03:30 <timburke> sorry for the short notice, but i did get a doodle poll up so i can block off some meeting times
21:03:40 <timburke> #link https://doodle.com/meeting/organize/id/b4RryJJe?authToken=dGltLmJ1cmtlQGdtYWlsLmNvbTtUaW0gQnVya2U%3D.VKpXl57xMo1KNayeLq
21:04:26 <zaitcev> I still haven't registered.
21:04:30 <timburke> ...and i maybe shouldn't have included the token in the query... oh well
21:04:36 <zaitcev> Maybe I should just go on my own.
21:04:45 <timburke> zaitcev, it's all-virtual!
21:05:01 <zaitcev> Oh. I thought it was in Columbus, Ohio.
21:05:09 <timburke> where'd i put that ML update...
21:05:11 <zaitcev> Booooring.
21:05:53 <timburke> #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029879.html
21:06:14 <timburke> caught me off guard, too -- part of why i'm racing to catch up
21:06:48 <acoles> Boring is in Oregon
21:08:00 <acoles> #link https://doodle.com/meeting/participate/id/b4RryJJe
21:08:05 <timburke> yeah, i'd still prefer an in-person meetup, too. but even when we thought it'd be in person, we needed someone to foot the airfare for acoles and mattoliver...
21:08:47 <timburke> anyway, i also created an etherpad to gather discussion topics
21:08:51 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-antelope
21:09:06 <acoles> are the doodle times UTC?
21:09:14 <timburke> should be
21:09:44 <timburke> i feel like the doodle interface got worse :-( i thought you could switch timezones fairly easily before...
21:11:41 <timburke> oh -- it should give you times in your local TZ i think -- over on the left in a private tab i see "United States - Los Angeles, San Diego, San Jose, San Francisco"
21:12:24 <timburke> and if you haven't registered yet, please do
21:12:26 <timburke> #link https://openinfra.dev/ptg/
21:13:56 <timburke> that's all i've got for the PTG -- hope to see you all there! i'll also bring it up with a few other nvidians that aren't in IRC much
21:14:19 <timburke> #topic ring v2 - replication improvements
21:15:06 <acoles> I'm registered and I've booked all the travel I need ;-)
21:15:38 <timburke> so i've continued building on the ring v2 work -- matt added the last primaries table, and i made the proxy smart enough to use it
21:16:01 <timburke> next i wanted to improve the replicator/reconstructor to use it, too
21:16:20 <timburke> mattoliver already had a patch to update the reconstructor
21:16:34 <timburke> #link https://review.opendev.org/c/openstack/swift/+/835001
21:16:50 <mattoliver> Oh nice!
21:17:07 <mattoliver> I'll give them a review and a play
21:17:24 <timburke> but i realized it wouldn't work for the replicator -- the approach was fairly deep in the ssync protocol, and worked on an individual diskfile at a time
21:17:55 <timburke> so i took a stab at updating the replicator at the suffix-comparison level
21:18:13 <timburke> #link https://review.opendev.org/c/openstack/swift/+/859349
21:19:09 <timburke> the idea is to gather a bunch of remote suffixes (including from old primaries) before starting to rsync anything
21:20:22 <mattoliver> oh interesting
21:21:11 <timburke> and based on those suffixes, apply some heuristics to decide whether a node (including the local node!) is a mostly-up-to-date primary, a mostly-full old primary, or a mostly-empty new primary
21:21:56 <timburke> if the local node seems to be filling, bail early -- like, before we even do local rehashing
21:22:49 <timburke> if one of the remotes seems to be filling and there's a fairly full old-primary, skip replicating to that specific remote
21:24:48 <timburke> while working through this and running some experiments at home, i realized we currently do way too much rehashing on those filling primaries
21:25:01 <mattoliver> sounds really interesting, being able to make some smart determinations based off the suffixs in the part sounds really clever.
21:26:35 <timburke> it's a problem because rsync will pre-create a bunch of dirs before starting to fill them -- and when we rehash during an inbound transfer, we clean them up, and the remote rsync hits a failure later
21:27:41 <timburke> so to avoid that, i also added a new REPLICATE api to just consolidate hashes, without rehashing the invalid suffixes
21:27:49 <timburke> #link https://review.opendev.org/c/openstack/swift/+/859348
21:27:52 <mattoliver> can that confuse that suffix dirs are on a remote node if another comes and rsyncs at the same time?
21:28:01 <mattoliver> *what
21:28:27 <mattoliver> and confuse your new heuristics
21:29:35 <mattoliver> re: rsync creating a bunch of dirs before filling them up. Or is it more one suffix at a time as it's rsyncing.
21:29:36 <timburke> the heuristics are all based on "knowledge" of a suffix -- they don't actually look at the hash value
21:30:22 <mattoliver> oh yeah, so long as we don't rehash, we wont know of the "rsync dirs"
21:31:18 <timburke> my hope is to get far enough along this path that i can run some more experiments with it at home and have some pretty graphs of improvements to show for the PTG
21:31:38 <mattoliver> Nice, sounds really cool
21:31:48 <timburke> anyway, that's all i've got
21:31:53 <timburke> #topic open discussion
21:31:55 <acoles> yeah, lots of progress
21:32:03 <timburke> what else should we talk about this week?
21:33:00 <timburke> acoles, i saw a bunch of shard range patches earlier today
21:33:49 <acoles> yes I have a chain of patches aimed at improving when shards update their sub-shards
21:34:14 <acoles> all motivated by our painful experience with some shards that got stuck while sharding
21:34:58 <acoles> briefly, the shards had an incomplete set of sub-shards to which they could cleave, so could not complete sharding...
21:35:49 <acoles> ...the *root* had a complete set of shard ranges but the shard never merges what the root has (even though the root ranges are fetched during audit)...so shard remains stuck
21:36:58 <timburke> is it mostly a matter of needing reviews, or is there more discussion that would be useful?
21:37:02 <acoles> the fix allows shard ranges from the root to be merged into the shard, but only if the result appears to be a valid set of shard ranges
21:37:47 <acoles> timburke: I have one last tweak to make on the last patch (I have a comment to that effect), but the first 2 are I hope good to go
21:38:01 <acoles> https://review.opendev.org/c/openstack/swift/+/857718/7
21:38:15 <timburke> 👍
21:38:19 <mattoliver> ^ and the v2 patches and follow ups, I've really been meaning to review and get into, but been distracted with a work priority.
21:38:40 <acoles> https://review.opendev.org/c/openstack/swift/+/858398/
21:38:41 <mattoliver> Will try and get to them all over the next week.
21:39:11 <mattoliver> On another note, I haven't been too idle. For those who want to try out the new OpenTelemetry version of Tracing, I've got a branch and a PR on the VSAIO repo that you can use. Seems to be working. Still might be some OpenTracing bits I missed, but I think I got them all. https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/134
21:39:36 <acoles> mattoliver: nice!
21:39:44 <timburke> mattoliver, thanks! don't worry too much :-) especially those later patches in the chain, there's a lot of testing gaps...
21:39:51 <mattoliver> Oh and I've also been migrating the OpenTracing POC over to OpenTelemetry
21:40:07 <mattoliver> and cleaning it up quite a bit at the same time.
21:40:43 <mattoliver> #link https://review.opendev.org/c/openstack/swift/+/857559
21:40:48 <mattoliver> is the OTel version
21:41:09 <clarkb> Zuul is adding tracing support with testing using jaeger iirc. There might be useful bits there to copy over to swift if you end up wanting to test this
21:41:15 <mattoliver> Also found a better way of getting the spans into eventlet pools and piles. So the code is much cleaner
21:41:35 <mattoliver> oh nice! clarkb
21:41:41 <mattoliver> I'll take a look!
21:42:32 <mattoliver> Working on an in memory exporter version of the tracer that we can interrogate, in essence a tracer for unittests
21:43:05 <timburke> nice!
21:43:18 <mattoliver> or rather to unit test the tracing code in swift, not trace unittests :P
21:44:00 <timburke> that's how i'd interpreted it ;-)
21:44:15 <mattoliver> great :P
21:45:56 <acoles> I need to drop, 👋
21:46:17 <timburke> seems like we're winding down anyway
21:46:30 <timburke> thank you all for coming, and thank you for working on swift!
21:46:34 <timburke> #endmeeting
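
The replicator heuristic timburke outlined under the ring v2 topic (classify each node from the suffixes it reports, bail early if the local node is still filling, skip a filling remote when a fairly full old primary exists) could be sketched roughly as below. This is only an illustration of the idea as described in the meeting, not the code in the linked review: the names `NodeState`, `classify` and `plan_sync`, the two thresholds, and the simplification that the local node is always a current primary are all assumptions made for the example.

```python
from enum import Enum


class NodeState(Enum):
    UP_TO_DATE = "mostly-up-to-date primary"
    FULL_OLD = "mostly-full old primary"
    FILLING = "mostly-empty new primary"
    UNKNOWN = "unclassified"


# Hypothetical thresholds, chosen only for illustration.
FULL_FRACTION = 0.8
EMPTY_FRACTION = 0.2


def classify(suffixes, known, is_old_primary):
    """Classify a node by the share of known suffixes it reports.

    Only the presence of a suffix matters; hash values are ignored.
    """
    if not known:
        return NodeState.UNKNOWN
    fraction = len(suffixes & known) / len(known)
    if is_old_primary:
        if fraction >= FULL_FRACTION:
            return NodeState.FULL_OLD
        return NodeState.UNKNOWN
    if fraction >= FULL_FRACTION:
        return NodeState.UP_TO_DATE
    if fraction <= EMPTY_FRACTION:
        return NodeState.FILLING
    return NodeState.UNKNOWN


def plan_sync(local, primaries, old_primaries):
    """Return the remote primaries worth syncing to, or [] to bail early.

    local: set of suffixes on the local node (assumed to be a primary)
    primaries: dict of remote primary name -> set of suffixes
    old_primaries: dict of old primary name -> set of suffixes
    """
    # Gather every suffix any node knows about before syncing anything.
    known = set(local)
    for suffixes in list(primaries.values()) + list(old_primaries.values()):
        known |= suffixes
    # If the local node looks like it is still filling, do nothing yet.
    if classify(local, known, False) is NodeState.FILLING:
        return []
    has_full_old = any(
        classify(suffixes, known, True) is NodeState.FULL_OLD
        for suffixes in old_primaries.values())
    targets = []
    for name, suffixes in sorted(primaries.items()):
        filling = classify(suffixes, known, False) is NodeState.FILLING
        # Leave a filling remote to the fairly full old primary.
        if filling and has_full_old:
            continue
        targets.append(name)
    return targets
```

For example, `plan_sync(full, {"p1": full, "p2": set()}, {"old": full})` with a non-empty set `full` would skip `p2`, on the assumption that the old primary is better placed to fill it.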