21:09:38 <timburke> #startmeeting swift
21:09:38 <opendevmeet> Meeting started Wed Aug 31 21:09:38 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:09:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:09:38 <opendevmeet> The meeting name has been set to 'swift'
21:09:44 <timburke> who's here for the swift meeting?
21:09:57 <kota> o/
21:10:03 <mattoliver> o/
21:11:25 <timburke> just want to go through a few patches this week, first up
21:11:35 <timburke> #topic get info memcache skipping
21:11:54 <timburke> mattoliver, thanks again for pushing up some fixes to https://review.opendev.org/c/openstack/swift/+/850954
21:12:50 <mattoliver> Nps, I like it :)
21:13:04 <timburke> the idea there is to randomly add a memcache miss to get_account_info and get_container_info calls, similar to what we're already doing for shard ranges
21:14:08 <timburke> the updates looked good to me, too -- do we want to go ahead and merge it, or wait until we've run it in prod for a bit first?
21:15:56 <mattoliver> I guess it doesn't hurt to wait a week. But on the other hand it is disabled by default.
21:17:00 <timburke> we can wait -- i'll make sure there's a ticket for our ops to try it out post-upgrade, and plan on letting y'all know how it goes next week
21:17:16 <mattoliver> Seeing as we are carrying it in prod from this week, maybe we take advantage of that and see if it works :)
21:17:18 <mattoliver> Kk
21:17:25 <timburke> next up
21:17:38 <timburke> #topic object metadata validation
21:19:02 <timburke> one of our recent hires at nvidia took a look at a bug we were seeing where we had a healthcheck that talked directly to object-servers to verify that we can PUT/GET/DELETE on every disk in the cluster
21:20:24 <timburke> unfortunately, the healthcheck would write the bare minimum to get a 201, resulting in the reconstructor blowing up if the DELETE didn't go through (or if there was a race)
21:20:42 <timburke> end result was a patch i'm liking
21:20:44 <timburke> #link https://review.opendev.org/c/openstack/swift/+/853321
21:21:57 <timburke> though i kind of want to go a little farther and add some sanity checks for replicated policies, too, as well as using the new validation hook in the auditor
21:22:07 <timburke> #link https://review.opendev.org/c/openstack/swift/+/855296
21:22:35 <zaitcev> Interesting.
21:23:46 <timburke> (fwiw, the specific bug we were seeing stemmed from us not including an X-Object-Sysmeta-Ec-Etag header in the PUT part of the healthcheck -- we'd include the frag index, but not the client-facing etag)
21:25:24 <timburke> just wanted to call attention to them -- i don't think there's much discussion that needs to happen around them (except maybe thinking of further checks we'd like to make)
21:26:39 <timburke> next up
21:26:44 <timburke> #topic ring v2
21:27:14 <timburke> #link https://review.opendev.org/c/openstack/swift/+/834261
21:27:34 <timburke> i've had a chance to play more with the patch -- found a few more rough edges, but the core of it still seems solid
21:29:23 <timburke> i also started putting together some benchmarking using some pretty big rings from prod (20k+ devices, part power like 20 or something like that -- 5MB or so in size)
21:29:23 <mattoliver> K, I haven't looked at it for a few weeks, so I'll revisit it today.
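
A minimal sketch of the "randomly add a memcache miss" idea from the get-info discussion above; this is not the proxy's actual code, and cache, backend_fetch, and skip_chance are illustrative names.

import random

def get_info_cached(cache, key, backend_fetch, skip_chance=0.0):
    # With probability skip_chance, treat the lookup as a miss even when the
    # entry is cached, so a small fraction of requests go to the backend and
    # refresh the cached value before it goes stale everywhere at once.
    if cache is not None and random.random() >= skip_chance:
        cached = cache.get(key)
        if cached is not None:
            return cached
    info = backend_fetch(key)
    if cache is not None:
        cache.set(key, info)
    return info

With skip_chance=0.0 (the disabled-by-default case mentioned above) the behavior is an ordinary read-through cache.
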
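
A rough sketch of the kind of load-time comparison being described, assuming a checkout with the ring v2 patch applied so that swift.common.ring.RingData.load can read all three formats; the ring filenames are placeholders.

import timeit

from swift.common.ring import RingData

RING_FILES = {
    'v0 (pickle)': 'object-v0.ring.gz',
    'v1': 'object-v1.ring.gz',
    'v2': 'object-v2.ring.gz',
}

for label, path in RING_FILES.items():
    # Load each ring a few times and report the best wall-clock time,
    # which is roughly what matters for proxy/object-server startup.
    best = min(timeit.repeat(lambda: RingData.load(path), number=1, repeat=5))
    print('%-12s %.3fs' % (label, best))
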
21:29:40 <mattoliver> Oh good idea
21:30:22 <timburke> long and short of it is that v1 and v2 seem to be within a few percentage points of each other, which doesn't seem too surprising given that the formats are largely related
21:31:29 <timburke> interestingly, i haven't seen the performance improvement i was expecting in going from v0 to v1 -- i remember that seemed to be the driving force during my format-history research
21:32:44 <timburke> but then, i also discovered that we didn't specify the pickle protocol version when serializing v0 rings in the new patch, so maybe there were some protocol-level improvements
21:32:51 <timburke> i'll dig into it a little more
21:33:43 <mattoliver> Nice work. Yeah, performance testing is kinda what we need on this. How well does it work?
21:33:44 <timburke> https://bugs.launchpad.net/swift/+bug/1031954 for the old performance bug
21:34:14 <mattoliver> And memory performance, i.e. we can load only what we want out of the format too.
21:35:33 <timburke> for sure -- though it becomes more noticeable if/when we merge the ring and builder files
21:36:00 <mattoliver> Yeah +1
21:36:24 <mattoliver> But metadata-only won't include devices, which will help :)
21:39:02 <timburke> #topic sync_batches_per_revert
21:39:42 <timburke> i also did some quasi-benchmarks for this patch -- basically doing an A/B test in my home cluster while rebalancing a few terabytes
21:39:45 <timburke> #link https://review.opendev.org/c/openstack/swift/+/839649
21:39:53 <mattoliver> Oh, another good patch I kinda forgot about
21:40:49 <mattoliver> This also comes into its own when wanting to make progress on bad disks, right? Or am I thinking of another patch?
21:41:41 <timburke> yes -- but i'm definitely seeing it being a good/useful thing in healthy (but fairly full) clusters
21:42:59 <mattoliver> Oh nice
21:43:19 <timburke> so i've got three object nodes -- i drained one completely so i could simulate a large expansion. the other two nodes i ran with handoffs_first -- one tried the old behavior of a single big rsync, while the other tried batches of 20
21:45:24 <timburke> then i watched the disk usage rate while they both rebalanced to the "new" guy. the single rsync per partition would see long periods where disk usage wouldn't move much, with periodic spikes that would go up to like -200, -400, -600MB/s
21:47:27 <timburke> the node that broke it up into batches would fairly consistently be going something like -50MB/s, occasionally jumping up to -100MB/s -- which matched the rate of rsync transfers *much* better
21:48:51 <timburke> i don't have log aggregation set up at home (yet), but i also got the sense that there were a lot more rsync errors with the single-rsync transfers
21:49:20 <mattoliver> Oh cool, interesting. Less of a list to build and compare at both ends before starting, with things that might change?
21:49:39 <mattoliver> Per rsync
21:50:48 <timburke> less that -- i've got a fairly static cluster; now and then some writes, but mostly pretty read heavy. the bigger win was that we didn't need to wait for a whole partition to transfer before we could start deleting
21:51:18 <mattoliver> Oh yeah, that's huge actually.
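
A minimal sketch of the batching idea being discussed, not the actual patch: rsync_suffixes and delete_local are hypothetical stand-ins for the replicator's rsync and cleanup steps, and the default batch_size mirrors the batches-of-20 test above.

def revert_partition(suffixes, rsync_suffixes, delete_local, batch_size=20):
    # Rather than one rsync covering every suffix directory in a handoff
    # partition, push small batches and remove each batch locally as soon
    # as the remote end has it, so space on the old/full disk starts
    # coming back before the whole partition has transferred.
    for start in range(0, len(suffixes), batch_size):
        batch = suffixes[start:start + batch_size]
        if rsync_suffixes(batch):  # hypothetical helper; True on success
            delete_local(batch)    # hypothetical helper; frees space early
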
21:53:15 <timburke> so my proposition is that this would help a cluster-full situation: you might be able to get to 90% full, finally get your expansion in, and not need to wait a day or two for the initially-failed rsyncs to get retried a few times to start bringing down usage on the old disks
21:54:20 <mattoliver> Yeah, +1, that's pretty great
21:54:27 <kota> +1
21:55:22 <timburke> bonus: the batched node finished its rebalance several hours earlier (again, likely due to the lowered likelihood of rsync errors)
21:55:47 <timburke> all right, sorry -- i took a bunch of time with that. it was kinda fun, though it took a bit :-)
21:55:53 <timburke> #topic open discussion
21:56:01 <timburke> what else should we talk about this week?
21:56:24 <zaitcev> Well, it's been almost a full hour.
21:56:28 <mattoliver> Sounds awesome! Let's get it in
21:58:33 <timburke> all right, i'll call it
21:58:41 <timburke> thank you all for coming, and thank you for working on swift!
21:58:45 <timburke> #endmeeting