21:00:42 #startmeeting swift
21:00:42 Meeting started Wed Sep 28 21:00:42 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:42 The meeting name has been set to 'swift'
21:00:48 who's here for the swift meeting?
21:01:06 o/
21:01:07 o/
21:02:15 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:30 it's a little sparse, mostly owing to me being out last week
21:02:40 first up
21:02:50 #topic PTG
21:02:58 it's soon! a little over two weeks
21:03:12 Oh wow, that is soon
21:03:30 sorry for the short notice, but i did get a doodle poll up so i can block off some meeting times
21:03:40 #link https://doodle.com/meeting/organize/id/b4RryJJe?authToken=dGltLmJ1cmtlQGdtYWlsLmNvbTtUaW0gQnVya2U%3D.VKpXl57xMo1KNayeLq
21:04:26 I still haven't registered.
21:04:30 ...and i maybe shouldn't have included the token in the query... oh well
21:04:36 Maybe I should just go on my own.
21:04:45 zaitcev, it's all-virtual!
21:05:01 Oh. I thought it was in Columbus, Ohio.
21:05:09 where'd i put that ML update...
21:05:11 Booooring.
21:05:53 #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029879.html
21:06:14 caught me off guard, too -- part of why i'm racing to catch up
21:06:48 Boring is in Oregon
21:08:00 #link https://doodle.com/meeting/participate/id/b4RryJJe
21:08:05 yeah, i'd still prefer an in-person meetup, too. but even when we thought it'd be in person, we needed someone to foot the airfare for acoles and mattoliver...
21:08:47 anyway, i also created an etherpad to gather discussion topics
21:08:51 #link https://etherpad.opendev.org/p/swift-ptg-antelope
21:09:06 are the doodle times UTC?
21:09:14 should be
21:09:44 i feel like the doodle interface got worse :-( i thought you could switch timezones fairly easily before...
21:11:41 oh -- it should give you times in your local TZ i think -- over on the left in a private tab i see "United States - Los Angeles, San Diego, San Jose, San Francisco"
21:12:24 and if you haven't registered yet, please do
21:12:26 #link https://openinfra.dev/ptg/
21:13:56 that's all i've got for the PTG -- hope to see you all there! i'll also bring it up with a few other nvidians that aren't in IRC much
21:14:19 #topic ring v2 - replication improvements
21:15:06 I'm registered and I've booked all the travel I need ;-)
21:15:38 so i've continued building on the ring v2 work -- matt added the last primaries table, and i made the proxy smart enough to use it
21:16:01 next i wanted to improve the replicator/reconstructor to use it, too
21:16:20 mattoliver already had a patch to update the reconstructor
21:16:34 #link https://review.opendev.org/c/openstack/swift/+/835001
21:16:50 Oh nice!
21:17:07 I'll give them a review and a play
21:17:24 but i realized it wouldn't work for the replicator -- the approach was fairly deep in the ssync protocol, and worked on an individual diskfile at a time
21:17:55 so i took a stab at updating the replicator at the suffix-comparison level
21:18:13 #link https://review.opendev.org/c/openstack/swift/+/859349
21:19:09 the idea is to gather a bunch of remote suffixes (including from old primaries) before starting to rsync anything
21:20:22 oh interesting
21:21:11 and based on those suffixes, apply some heuristics to decide whether a node (including the local node!) is a mostly-up-to-date primary, a mostly-full old primary, or a mostly-empty new primary
21:21:56 if the local node seems to be filling, bail early -- like, before we even do local rehashing
21:22:49 if one of the remotes seems to be filling and there's a fairly full old-primary, skip replicating to that specific remote
21:24:48 while working through this and running some experiments at home, i realized we currently do way too much rehashing on those filling primaries
21:25:01 sounds really interesting, being able to make some smart determinations based off the suffixs in the part sounds really clever.
21:26:35 it's a problem because rsync will pre-create a bunch of dirs before starting to fill them -- and when we rehash during an inbound transfer, we clean them up, and the remote rsync hits a failure later
21:27:41 so to avoid that, i also added a new REPLICATE api to just consolidate hashes, without rehashing the invalid suffixes
21:27:49 #link https://review.opendev.org/c/openstack/swift/+/859348
21:27:52 can that confuse that suffix dirs are on a remote node if another comes and rsyncs at the same time?
21:28:01 *what
21:28:27 and confuse your new heuristics
21:29:35 re: rsync creating a bunch of dirs before filling them up. Or is it more one suffix at a time as it's rsyncing.
21:29:36 the heuristics are all based on "knowledge" of a suffix -- they don't actually look at the hash value
21:30:22 oh yeah, so long as we don't rehash, we wont know of the "rsync dirs"
21:31:18 my hope is to get far enough along this path that i can run some more experiments with it at home and have some pretty graphs of improvements to show for the PTG
21:31:38 Nice, sounds really cool
21:31:48 anyway, that's all i've got
21:31:53 #topic open discussion
21:31:55 yeah, lots of progress
21:32:03 what else should we talk about this week?
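[Editor's sketch, not from the meeting] The suffix heuristics described under the ring v2 topic above could look roughly like the following. This is only an illustration under stated assumptions, not the code in the linked patches: the names `Role`, `classify`, and `plan_replication`, the extra `OTHER` role, and the 0.5 coverage threshold are all hypothetical. It follows what was said in the meeting only in that it compares which suffixes each node knows about (never the hash values), bails early when the local node looks like it is filling, and skips a filling remote when a fairly full old primary exists.

```python
from enum import Enum


class Role(Enum):
    UP_TO_DATE = "mostly-up-to-date primary"
    FULL_OLD = "mostly-full old primary"
    FILLING = "mostly-empty new primary"
    OTHER = "unclassified"  # assumption: e.g. an old primary that holds little


def classify(known, everything, is_old_primary, threshold=0.5):
    """Classify a node by the fraction of all known suffixes it reports.

    Only suffix names are compared; hash values are never inspected.
    The threshold is an arbitrary illustrative value.
    """
    coverage = len(known & everything) / len(everything) if everything else 1.0
    if is_old_primary:
        return Role.FULL_OLD if coverage >= threshold else Role.OTHER
    return Role.UP_TO_DATE if coverage >= threshold else Role.FILLING


def plan_replication(local, suffixes_by_node, old_primaries, threshold=0.5):
    """Return (bail_early, targets) for one partition.

    suffixes_by_node maps a node id to the set of suffixes it reported and
    must include the local node. Old primaries are never chosen as targets.
    """
    everything = set().union(*suffixes_by_node.values())
    roles = {
        node: classify(sfx, everything, node in old_primaries, threshold)
        for node, sfx in suffixes_by_node.items()
    }
    # A filling local node has nothing useful to push, so stop before any
    # local rehashing would happen.
    if roles[local] is Role.FILLING:
        return True, []
    has_full_old = Role.FULL_OLD in roles.values()
    targets = [
        node
        for node in sorted(suffixes_by_node)
        if node != local
        and node not in old_primaries
        # Leave a filling remote to the fairly full old primary.
        and not (roles[node] is Role.FILLING and has_full_old)
    ]
    return False, targets
```

With four suffixes on the local node, a peer, and an old primary, and a single suffix on a new primary, `plan_replication` would leave the new primary to the old primary and only target the peer; without any full old primary, the new primary stays in the target list.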
21:33:00 acoles, i saw a bunch of shard range patches earlier today
21:33:49 yes I have a chain of patches aimed at improving when shards update their sub-shards
21:34:14 all motivated by our painful experience with some shards that got stuck while sharding
21:34:58 briefly, the shards had an incomplete set of sub-shards to which they could cleave, so could not complete sharding...
21:35:49 ...the *root* had a complete set of shard ranges but the shard never merges what the root has (even though the root ranges are fetched during audit)...so shard remains stuck
21:36:58 is it mostly a matter of needing reviews, or is there more discussion that would be useful?
21:37:02 the fix allows shard ranges from the root to be merged into the shard, but only if the result appears to be a valid set of shard ranges
21:37:47 timburke: I have one last tweak to make on the last patch (I have a comment to that effect), but the first 2 are I hope good to go
21:38:01 https://review.opendev.org/c/openstack/swift/+/857718/7
21:38:15 👍
21:38:19 ^ and the v2 patches and follow ups, I've really been meaning to review and get into, but been distracted with a work priority.
21:38:40 https://review.opendev.org/c/openstack/swift/+/858398/
21:38:41 Will try and get to them all over the next week.
21:39:11 On another note, I haven't been too idle. For those who want to try out the new OpenTelemetry version of Tracing, I've got a branch and a PR on the VSAIO repo that you can use. Seems to be working. Still might be some OpenTracing bits I missed, but I think I got them all. https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/134
21:39:36 mattoliver: nice!
21:39:44 mattoliver, thanks! don't worry too much :-) especially those later patches in the chain, there's a lot of testing gaps...
21:39:51 Oh and I've also been migrating the OpenTracing POC over to OpenTelemetry
21:40:07 and cleaning it up quite a bit at the same time.
21:40:43 #link https://review.opendev.org/c/openstack/swift/+/857559
21:40:48 is the OTel version
21:41:09 Zuul is adding tracing support with testing using jaeger iirc. There might be useful bits there to copy over to swift if you end up wanting to test this
21:41:15 Also found a better way of getting the spans into eventlet pools and piles. So the code is much cleaner
21:41:35 oh nice! clarkb
21:41:41 I'll take a look!
21:42:32 Working on an in memory exporter version of the tracer that we can interrogate, in essence a tracer for unittests
21:43:05 nice!
21:43:18 or rather to unit test the tracing code in swift, not trace unittests :P
21:44:00 that's how i'd interpreted it ;-)
21:44:15 great :P
21:45:56 I need to drop, 👋
21:46:17 seems like we're winding down anyway
21:46:30 thank you all for coming, and thank you for working on swift!
21:46:34 #endmeeting