21:09:38 <timburke> #startmeeting swift
21:09:38 <opendevmeet> Meeting started Wed Aug 31 21:09:38 2022 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:09:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:09:38 <opendevmeet> The meeting name has been set to 'swift'
21:09:44 <timburke> who's here for the swift meeting?
21:09:57 <kota> o/
21:10:03 <mattoliver> o/
21:11:25 <timburke> just want to go through a few patches this week, first up
21:11:35 <timburke> #topic get info memcache skipping
21:11:54 <timburke> mattoliver, thanks again for pushing up some fixes to https://review.opendev.org/c/openstack/swift/+/850954
21:12:50 <mattoliver> Nps, I like it :)
21:13:04 <timburke> the idea there is to randomly add a memcache miss to get_account_info and get_container_info calls, similar to what we're already doing for shard ranges
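[A minimal sketch of that idea, assuming a hypothetical skip-percentage option; the patch's actual option names and call sites may differ:]

    import random

    def should_skip_cache(skip_pct):
        # With probability skip_pct/100, behave as if memcache missed, so the
        # info gets re-fetched from the backend and the cached value refreshed.
        return skip_pct > 0 and random.random() * 100 < skip_pct

    # illustrative call site, not the patch's actual code:
    # info = None
    # if memcache and not should_skip_cache(skip_pct=0.1):
    #     info = memcache.get(cache_key)
    # if info is None:
    #     info = _get_info_from_backend(app, env, account, container)
    #     if memcache:
    #         memcache.set(cache_key, info, time=cache_time)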
21:14:08 <timburke> the updates looked good to me, too -- do we want to go ahead and merge it, or wait until we've run it in prod for a bit first?
21:15:56 <mattoliver> I guess it doesn't hurt to wait a week. But on the other hand it is disabled by default.
21:17:00 <timburke> we can wait -- i'll make sure there's a ticket for our ops to try it out post-upgrade, and plan on letting y'all know how it goes next week
21:17:16 <mattoliver> Seeing as we are carrying it in prod from this week, maybe we take advantage of that and see if it works :)
21:17:18 <mattoliver> Kk
21:17:25 <timburke> next up
21:17:38 <timburke> #topic object metadata validation
21:19:02 <timburke> one of our recent hires at nvidia took a look at a bug we were seeing where we had a healthcheck that talked directly to object-servers to verify that we can PUT/GET/DELETE on every disk in the cluster
21:20:24 <timburke> unfortunately, the healthcheck would write the bare minimum to get a 201, resulting in the reconstructor blowing up if the DELETE didn't go through (or if there was a race)
21:20:42 <timburke> end result was a patch i'm liking
21:20:44 <timburke> #link https://review.opendev.org/c/openstack/swift/+/853321
21:21:57 <timburke> though i kind of want to go a little farther and add some sanity checks for replicated policies, too, as well as using the new validation hook in the auditor
21:22:07 <timburke> #link https://review.opendev.org/c/openstack/swift/+/855296
21:22:35 <zaitcev> Interesting.
21:23:46 <timburke> (fwiw, the specific bug we were seeing stemmed from us not including an X-Object-Sysmeta-Ec-Etag header in the PUT part of the healthcheck -- we'd include frag index, but not the client-facing etag)
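[For illustration, a sketch of such a healthcheck PUT straight to an object-server with the missing header included. The host, port, device, partition, policy index, and object path below are all made up; the real healthcheck is internal tooling, not this code:]

    import hashlib
    import http.client
    import time

    frag_body = b'healthcheck-frag'
    client_etag = hashlib.md5(b'').hexdigest()  # etag of the (empty) client-visible body

    conn = http.client.HTTPConnection('127.0.0.1', 6200)
    conn.request('PUT', '/sdb1/123/AUTH_health/check/obj', body=frag_body, headers={
        'X-Timestamp': '%.5f' % time.time(),
        'Content-Type': 'application/octet-stream',
        'Content-Length': str(len(frag_body)),
        'X-Backend-Storage-Policy-Index': '1',
        'X-Object-Sysmeta-Ec-Frag-Index': '0',
        # the piece the buggy healthcheck left out:
        'X-Object-Sysmeta-Ec-Etag': client_etag,
    })
    assert conn.getresponse().status == 201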
21:25:24 <timburke> just wanted to call attention to them -- i don't think there's much discussion that needs to happen around them (except maybe thinking of further checks we'd like to make)
21:26:39 <timburke> next up
21:26:44 <timburke> #topic ring v2
21:27:14 <timburke> #link https://review.opendev.org/c/openstack/swift/+/834261
21:27:34 <timburke> i've had a chance to play more with the patch -- found a few more rough edges, but the core of it still seems solid
21:29:23 <timburke> i also started putting together some benchmarking using some pretty big rings from prod (20k+ devices, part power like 20 or something like that -- 5MB or so in size)
21:29:23 <mattoliver> K, I haven't looked at it for a few weeks, so I'll revisit it today.
21:29:40 <mattoliver> Oh good idea
21:30:22 <timburke> long and short of it is that v1 and v2 seem to be within a few percentage points of each other, which doesn't seem too surprising given that the formats are largely related
21:31:29 <timburke> interestingly, i haven't seen the performance improvement i was expecting in going from v0 to v1 -- i remember that seemed to be the driving force during my format-history research
21:32:44 <timburke> but then, i also discovered that we didn't specify the pickle protocol version when serializing v0 rings in the new patch, so maybe there were some protocol-level improvements
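[A quick illustration of why that matters, using toy data rather than real RingData structures: if no protocol is passed, pickle uses the interpreter's default, which has increased across Python releases, so an unspecified-protocol dump can differ in size and load speed from an explicitly pinned older protocol.]

    import pickle

    ring_dict = {'devs': [{'id': 0, 'ip': '127.0.0.1', 'port': 6200, 'weight': 100.0}],
                 'part_shift': 24,
                 'replica2part2dev_id': [[0, 0], [0, 0]]}

    implicit = pickle.dumps(ring_dict)              # interpreter's default protocol
    explicit = pickle.dumps(ring_dict, protocol=2)  # explicitly pinned older protocol
    print(len(implicit), len(explicit))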
21:32:51 <timburke> i'll dig into it a little more
21:33:43 <mattoliver> Nice work. Yeah, performance testing is kinda what we need on this -- how well does it work?
21:33:44 <timburke> https://bugs.launchpad.net/swift/+bug/1031954 for the old performance bug
21:34:14 <mattoliver> And memory performance, i.e. we can load only what we want out of the format too.
21:35:33 <timburke> for sure -- though it becomes more noticeable if/when we merge the ring and builder files
21:36:00 <mattoliver> Yeah +1
21:36:24 <mattoliver> But metadata-only loading won't include devices, which will help :)
21:39:02 <timburke> #topic sync_batches_per_revert
21:39:42 <timburke> i also did some quasi-benchmarks for this patch -- basically doing an A/B test in my home cluster while rebalancing a few terabytes
21:39:45 <timburke> #link https://review.opendev.org/c/openstack/swift/+/839649
21:39:53 <mattoliver> Oh, another good patch I kinda forgot about
21:40:49 <mattoliver> This also comes into its own when wanting to make progress on bad disks, right? Or am I thinking of another patch?
21:41:41 <timburke> yes -- but i'm definitely seeing it being a good/useful thing in healthy (but fairly full) clusters
21:42:59 <mattoliver> Oh nice
21:43:19 <timburke> so i've got three object nodes -- i drained one completely so i could simulate a large expansion. the other two nodes, i ran with handoffs_first -- and one tried the old behavior of a single big rsync, while the other tried batches of 20
21:45:24 <timburke> then i watched the disk usage rate while they both rebalanced to the "new" guy. the single rsync per partition would see long periods where disk usage wouldn't move much, with periodic spikes that would go up to like -200, -400, -600MB/s
21:47:27 <timburke> the node that broke it up into batches would fairly consistently be going something like -50MB/s, occasionally jumping up to -100MB/s -- which matched the rate of rsync transfers *much* better
21:48:51 <timburke> i don't have log aggregation set up at home (yet), but i also got the sense that there were a lot more rsync errors with the single-rsync transfers
21:49:20 <mattoliver> Oh cool, interesting. Less of a list to build and compare at both ends before starting, with things that might change?
21:49:39 <mattoliver> Per rsync
21:50:48 <timburke> less that -- i've got a fairly static cluster; now and then some writes, but mostly pretty read heavy. the bigger win was that we didn't need to wait for a whole partition to transfer before we could start deleting
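[A rough sketch of the shape of that behavior -- helper names and structure are illustrative, not the patch's actual code: sync a batch of suffix dirs at a time and remove each batch as soon as its transfer succeeds, so space on the handoff disk comes back incrementally instead of only after the whole partition lands.]

    def revert_partition(suffixes, batch_size, sync_batch, delete_batch):
        # sync_batch: e.g. an rsync of just these suffix dirs to the primary
        # delete_batch: remove the local copies once the transfer succeeded
        for start in range(0, len(suffixes), batch_size):
            batch = suffixes[start:start + batch_size]
            if sync_batch(batch):
                delete_batch(batch)  # reclaim space without waiting for the rest
            # on failure, leave the batch in place for the next replication pass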
21:51:18 <mattoliver> Oh yeah, that's huge actually.
21:53:15 <timburke> so my proposition is that this would help a cluster-full situation: you might be able to get to 90% full, finally get your expansion in, and not need to wait a day or two for the initially-failed rsyncs to get retried a few times to start bringing down usage on the old disks
21:54:20 <mattoliver> Yeah, +1, that's pretty great
21:54:27 <kota> +1
21:55:22 <timburke> bonus: the batched node finished its rebalance several hours earlier (again, likely due to the lowered likelihood of rsync errors)
21:55:47 <timburke> all right, sorry -- i took a bunch of time with that. it was kinda fun, though :-)
21:55:53 <timburke> #topic open discussion
21:56:01 <timburke> what else should we talk about this week?
21:56:24 <zaitcev> Well, it's been almost a full hour.
21:56:28 <mattoliver> Sounds awesome! Let's get it in
21:58:33 <timburke> all right, i'll call it
21:58:41 <timburke> thank you all for coming, and thank you for working on swift!
21:58:45 <timburke> #endmeeting