21:00:42 <timburke> #startmeeting swift
21:00:42 <opendevmeet> Meeting started Wed Sep 28 21:00:42 2022 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:42 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:42 <opendevmeet> The meeting name has been set to 'swift'
21:00:48 <timburke> who's here for the swift meeting?
21:01:06 <mattoliver> o/
21:01:07 <acoles> o/
21:02:15 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:30 <timburke> it's a little sparse, mostly owing to me being out last week
21:02:40 <timburke> first up
21:02:50 <timburke> #topic PTG
21:02:58 <timburke> it's soon! a little over two weeks
21:03:12 <mattoliver> Oh wow, that is soon
21:03:30 <timburke> sorry for the short notice, but i did get a doodle poll up so i can block off some meeting times
21:03:40 <timburke> #link https://doodle.com/meeting/organize/id/b4RryJJe?authToken=dGltLmJ1cmtlQGdtYWlsLmNvbTtUaW0gQnVya2U%3D.VKpXl57xMo1KNayeLq
21:04:26 <zaitcev> I still haven't registered.
21:04:30 <timburke> ...and i maybe shouldn't have included the token in the query... oh well
21:04:36 <zaitcev> Maybe I should just go on my own.
21:04:45 <timburke> zaitcev, it's all-virtual!
21:05:01 <zaitcev> Oh. I thought it was in Columbus, Ohio.
21:05:09 <timburke> where'd i put that ML update...
21:05:11 <zaitcev> Booooring.
21:05:53 <timburke> #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029879.html
21:06:14 <timburke> caught me off guard, too -- part of why i'm racing to catch up
21:06:48 <acoles> Boring is in Oregon
21:08:00 <acoles> #link https://doodle.com/meeting/participate/id/b4RryJJe
21:08:05 <timburke> yeah, i'd still prefer an in-person meetup, too. but even when we thought it'd be in person, we needed someone to foot the airfare for acoles and mattoliver...
21:08:47 <timburke> anyway, i also created an etherpad to gather discussion topics
21:08:51 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-antelope
21:09:06 <acoles> are the doodle times UTC?
21:09:14 <timburke> should be
21:09:44 <timburke> i feel like the doodle interface got worse :-( i thought you could switch timezones fairly easily before...
21:11:41 <timburke> oh -- it should give you times in your local TZ i think -- over on the left in a private tab i see "United States - Los Angeles, San Diego, San Jose, San Francisco"
21:12:24 <timburke> and if you haven't registered yet, please do
21:12:26 <timburke> #link https://openinfra.dev/ptg/
21:13:56 <timburke> that's all i've got for the PTG -- hope to see you all there! i'll also bring it up with a few other nvidians that aren't in IRC much
21:14:19 <timburke> #topic ring v2 - replication improvements
21:15:06 <acoles> I'm registered and I've booked all the travel I need ;-)
21:15:38 <timburke> so i've continued building on the ring v2 work -- matt added the last primaries table, and i made the proxy smart enough to use it
21:16:01 <timburke> next i wanted to improve the replicator/reconstructor to use it, too
21:16:20 <timburke> mattoliver already had a patch to update the reconstructor
21:16:34 <timburke> #link https://review.opendev.org/c/openstack/swift/+/835001
21:16:50 <mattoliver> Oh nice!
21:17:07 <mattoliver> I'll give them a review and a play
21:17:24 <timburke> but i realized it wouldn't work for the replicator -- the approach was fairly deep in the ssync protocol, and worked on an individual diskfile at a time
21:17:55 <timburke> so i took a stab at updating the replicator at the suffix-comparison level
21:18:13 <timburke> #link https://review.opendev.org/c/openstack/swift/+/859349
21:19:09 <timburke> the idea is to gather a bunch of remote suffixes (including from old primaries) before starting to rsync anything
21:20:22 <mattoliver> oh interesting
21:21:11 <timburke> and based on those suffixes, apply some heuristics to decide whether a node (including the local node!) is a mostly-up-to-date primary, a mostly-full old primary, or a mostly-empty new primary
21:21:56 <timburke> if the local node seems to be filling, bail early -- like, before we even do local rehashing
21:22:49 <timburke> if one of the remotes seems to be filling and there's a fairly full old-primary, skip replicating to that specific remote
21:24:48 <timburke> while working through this and running some experiments at home, i realized we currently do way too much rehashing on those filling primaries
21:25:01 <mattoliver> sounds really interesting, being able to make some smart determinations based off the suffixes in the part sounds really clever.
21:26:35 <timburke> it's a problem because rsync will pre-create a bunch of dirs before starting to fill them -- and when we rehash during an inbound transfer, we clean them up, and the remote rsync hits a failure later
21:27:41 <timburke> so to avoid that, i also added a new REPLICATE api to just consolidate hashes, without rehashing the invalid suffixes
21:27:49 <timburke> #link https://review.opendev.org/c/openstack/swift/+/859348
21:27:52 <mattoliver> can that confuse that suffix dirs are on a remote node if another comes and rsyncs at the same time?
21:28:01 <mattoliver> *what
21:28:27 <mattoliver> and confuse your new heuristics
21:29:35 <mattoliver> re: rsync creating a bunch of dirs before filling them up. Or is it more one suffix at a time as it's rsyncing?
21:29:36 <timburke> the heuristics are all based on "knowledge" of a suffix -- they don't actually look at the hash value
21:30:22 <mattoliver> oh yeah, so long as we don't rehash, we won't know of the "rsync dirs"
21:31:18 <timburke> my hope is to get far enough along this path that i can run some more experiments with it at home and have some pretty graphs of improvements to show for the PTG
21:31:38 <mattoliver> Nice, sounds really cool
21:31:48 <timburke> anyway, that's all i've got
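The suffix-based heuristics described in this topic can be illustrated with a minimal sketch. It is not the code in the linked patches: the function names, role labels, and the `full_ratio`/`empty_ratio` thresholds are all assumptions made up for illustration. Like the description above, it only looks at which suffixes each node knows about, never at the hash values.

```python
def classify_nodes(known_suffixes, current_primaries,
                   full_ratio=0.9, empty_ratio=0.1):
    """Guess each node's role from the suffixes it knows about.

    known_suffixes: dict of node id -> set of suffix names (hypothetical shape)
    current_primaries: set of node ids in the current ring assignment
    The two ratio thresholds are illustrative assumptions.
    """
    all_suffixes = set().union(*known_suffixes.values())
    total = len(all_suffixes)
    roles = {}
    for node, suffixes in known_suffixes.items():
        coverage = len(suffixes) / total if total else 1.0
        if node in current_primaries:
            # A current primary that knows almost nothing is still filling.
            roles[node] = 'filling' if coverage < empty_ratio else 'up_to_date'
        else:
            # A former primary that still holds most suffixes.
            roles[node] = 'old_full' if coverage >= full_ratio else 'old_partial'
    return roles


def plan_sync(local, roles):
    """Return the remote primaries the local node should replicate to."""
    if roles.get(local) == 'filling':
        # Local node is still being filled: bail before any local rehashing.
        return []
    has_full_old = any(role == 'old_full' for role in roles.values())
    targets = []
    for node, role in roles.items():
        if node == local or role not in ('up_to_date', 'filling'):
            continue
        if role == 'filling' and has_full_old:
            # Leave the filling remote to the fairly full old primary.
            continue
        targets.append(node)
    return targets
```

With three primaries `a`, `b`, `c` where `c` knows no suffixes, plus a fully populated old primary, `plan_sync('a', roles)` skips `c` and `plan_sync('c', roles)` returns an empty list.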
21:31:53 <timburke> #topic open discussion
21:31:55 <acoles> yeah, lots of progress
21:32:03 <timburke> what else should we talk about this week?
21:33:00 <timburke> acoles, i saw a bunch of shard range patches earlier today
21:33:49 <acoles> yes I have a chain of patches aimed at improving when shards update their sub-shards
21:34:14 <acoles> all motivated by our painful experience with some shards that got stuck while sharding
21:34:58 <acoles> briefly, the shards had an incomplete set of sub-shards to which they could cleave, so could not complete sharding...
21:35:49 <acoles> ...the *root* had a complete set of shard ranges but the shard never merges what the root has (even though the root ranges are fetched during audit)...so shard remains stuck
21:36:58 <timburke> is it mostly a matter of needing reviews, or is there more discussion that would be useful?
21:37:02 <acoles> the fix allows shard ranges from the root to be merged into the shard, but only if the result appears to be a valid set of shard ranges
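As a rough illustration of "merge only if the result looks valid", the sketch below treats a set of shard ranges as valid when it tiles the shard's namespace with no gaps or overlaps. That definition of validity, the `(lower, upper)` tuple representation, and both function names are assumptions for illustration; the linked patches may apply different checks, and real shard ranges also carry timestamps and states that are ignored here.

```python
def is_contiguous_cover(ranges, lower, upper):
    """True if the (lower, upper) tuples tile the namespace from lower to upper."""
    cursor = lower
    for lo, hi in sorted(ranges):
        if lo != cursor or hi <= lo:
            return False  # gap, overlap, or empty range
        cursor = hi
    return cursor == upper


def merge_from_root(own_ranges, root_ranges, lower, upper):
    """Adopt root's view of the sub-shards only if the merged set is valid."""
    merged = set(own_ranges) | set(root_ranges)
    # Keep only ranges that fall inside this shard's own namespace.
    candidates = [r for r in merged if lower <= r[0] and r[1] <= upper]
    if is_contiguous_cover(candidates, lower, upper):
        return sorted(candidates)
    return sorted(own_ranges)  # otherwise leave the shard's ranges untouched
```

For a shard covering `'a'` to `'m'` that only knows `('a', 'f')`, a root holding `('a', 'f')`, `('f', 'm')`, `('m', 'z')` yields the complete pair of sub-shards, while a root missing `('f', 'm')` leaves the shard's ranges unchanged.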
21:37:47 <acoles> timburke: I have one last tweak to make on the last patch (I have a comment to that effect), but the first 2 are I hope good to go
21:38:01 <acoles> https://review.opendev.org/c/openstack/swift/+/857718/7
21:38:15 <timburke> 👍
21:38:19 <mattoliver> ^ and the v2 patches and follow ups, I've really been meaning to review and get into, but been distracted with a work priority.
21:38:40 <acoles> https://review.opendev.org/c/openstack/swift/+/858398/
21:38:41 <mattoliver> Will try and get to them all over the next week.
21:39:11 <mattoliver> On another note, I haven't been too idle. For those who want to try out the new OpenTelemetry version of Tracing, I've got a branch and a PR on the VSAIO repo that you can use. Seems to be working. Still might be some OpenTracing bits I missed, but I think I got them all.  https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/134
21:39:36 <acoles> mattoliver: nice!
21:39:44 <timburke> mattoliver, thanks! don't worry too much :-) especially those later patches in the chain, there are a lot of testing gaps...
21:39:51 <mattoliver> Oh and I've also been migrating the OpenTracing POC over to OpenTelemetry
21:40:07 <mattoliver> and cleaning it up quite a bit at the same time.
21:40:43 <mattoliver> #link https://review.opendev.org/c/openstack/swift/+/857559
21:40:48 <mattoliver> is the OTel version
21:41:09 <clarkb> Zuul is adding tracing support with testing using jaeger iirc. There might be useful bits there to copy over to swift if you end up wanting to test this
21:41:15 <mattoliver> Also found a better way of getting the spans into eventlet pools and piles. So the code is much cleaner
21:41:35 <mattoliver> oh nice! clarkb
21:41:41 <mattoliver> I'll take a look!
21:42:32 <mattoliver> Working on an in-memory exporter version of the tracer that we can interrogate, in essence a tracer for unittests
21:43:05 <timburke> nice!
21:43:18 <mattoliver> or rather to unit test the tracing code in swift, not trace unittests :P
21:44:00 <timburke> that's how i'd interpreted it ;-)
21:44:15 <mattoliver> great :P
21:45:56 <acoles> I need to drop, 👋
21:46:17 <timburke> seems like we're winding down anyway
21:46:30 <timburke> thank you all for coming, and thank you for working on swift!
21:46:34 <timburke> #endmeeting