21:00:31 <timburke> #startmeeting swift 21:00:31 <opendevmeet> Meeting started Wed Aug 4 21:00:31 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:00:31 <opendevmeet> The meeting name has been set to 'swift' 21:00:40 <timburke> who's here for the swift meeting? 21:00:47 <kota> o/ 21:00:54 <acoles> o/ 21:01:02 <mattoliver> o/ 21:01:31 <timburke> as usual, the agenda's at 21:01:33 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift 21:01:45 <timburke> sorry, i only *just* got around to updating it 21:01:56 <timburke> #topic PTG 21:02:26 <timburke> just a reminder to start populating the etherpad with topics 21:02:28 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-yoga 21:03:03 <timburke> i did get around to booking rooms, but i still need to put add them to the etherpad, too 21:03:49 <timburke> i decided to split times again, try to make sure everyone has some time where they're likely to be well-rested ;-) 21:04:16 <timburke> any questions on the PTG? 21:05:18 <mattoliver> Not yet, let's fill out the etherpad and have some good discussions :) 21:05:32 <timburke> agreed :-) 21:06:09 <timburke> #topic expirer can delete data with inconsistent x-delete-at values 21:07:02 <timburke> so i've got some users that are using the expirer pretty heavily, and i *think* i've seen an old bug 21:07:10 <timburke> #link https://bugs.launchpad.net/swift/+bug/1182628 21:09:19 <timburke> basically, there's a POST at t1 to mark an object to expire at t5, then another POST at t2 to have it expire at t10. if replication doesn't get rid of all the t1 .metas, and the t5 expirer queue entry is still hanging around, we'll delete data despite getting back 412s 21:10:26 <timburke> on top of that, since the data got deleted, the t10 expiration fails with 412s and hangs around until a reclaim_age passes 21:10:30 <clayg> timburke: do you have any evidence we've hit expiry times getting [412, 412, 204] because it expired before the updated POST was fully replicated? 21:11:13 <zaitcev> Ugh 21:11:17 <clayg> any chance we could reap the t10 delete row based on the x-timestamp coming off the 404/412 response? 21:11:46 <zaitcev> I can see telling them not to count on POST doing the job, but the internal inconsistency is just bad no matter what. 21:11:53 <clayg> also I think there's an inline attempt (maybe even going to async) to clean up the t5 if the post at t2 happens to notice it 21:12:48 <timburke> i've seen the .ts as of somewhere around the start of July, and the expirer kicking back 412s starting around mid-July. i haven't dug into the logs enough to see *exactly* what happened when, but knowing my users it seems likely that they wanted the later expiration time 21:14:09 <timburke> clayg, there is, but only if the t1 .meta is present on whichever servers get the t2 POST 21:14:42 <clayg> 👍 21:16:35 <timburke> i *think* it'd be reasonable to reap the t10 queue entry based on the t5 tombstone being newer than the t2 enqueue-time. but it also seems preferable to avoid the delete until we actually know that we want to delete it 21:17:39 <timburke> 'cause i *also* think it'd be reasonable to reap the t5 queue entry based on a 412 that indicates the presence of a t2 .meta (since it's greater than the t1 enqueue-time) 21:18:05 <timburke> anyway, i've got a failing probe test at 21:18:08 <timburke> #link https://review.opendev.org/c/openstack/swift/+/803406 21:19:17 <mattoliver> Great start 21:20:06 <timburke> it gets a pretty-good split brain going on, with 4 replicas of an object, two with one delete-at time, two with another, and queue entries for both 21:21:18 <clayg> noice 21:21:40 <clayg> love the idea of getting those queue entries cleaned up if we can do it in a way that makes sense 👍 21:21:41 <timburke> i'm also starting to work on a fix for it that makes DELETE with X-If-Delete-At look a lot like a PUT with If-None-Match -- but it seems like it may get hairy. will keep y'all updated 21:22:02 <clayg> timburke: just do the HEAD and DELETE to start - then make it fancy 21:22:08 <clayg> "someday" 21:22:17 <clayg> (also consider maybe not making it fancy if we can avoid it) 21:23:29 <timburke> it'd have to be a HEAD with X-Newest, though -- which seems like a pretty sizable request amplification for the expirer :-( 21:24:14 <clayg> it doesn't *have* to be x-newest - you could just use direct client and get all the primaries in concert 21:24:32 <clayg> the idea is you can't make the delete unless everyone already has a matching x-delete-if-match 21:26:07 <timburke> i'll think about it -- seems like i'd have to reinvent a decent bit of best_response, though 21:26:30 <timburke> next up 21:26:49 <timburke> #topic last primary table 21:27:21 <timburke> this came up at the last PTG, and i already see it's a topic for the next one (thanks mattoliver!) 21:27:38 <timburke> mattoliver even already put a patch together 21:27:41 <timburke> #link https://review.opendev.org/c/openstack/swift/+/790550 21:28:47 <mattoliver> Thanks for the reviews lately timburke 21:28:57 <timburke> i just wanted to say that i'm excited about this idea -- it seems like the sort of thing that can improve both client-observable behaviors and replication 21:29:16 <zaitcev> Very interesting. That for_read, is it something proxy can use too? 21:29:29 <zaitcev> Right, I see. Both. 21:30:08 <zaitcev> So, where's the catch? How big is that array for a 18-bit ring with 260,000 partitions? 21:30:30 <mattoliver> In my limited testing and as you can see in its follow ups it makes a post rebalance reconstruction faster and less CPU bound. 21:30:32 <timburke> basically, it's an extra replica's worth of storage 21:30:45 <timburke> /ram 21:30:46 <mattoliver> Yeah, so you ring grows an extra replica basically. 21:31:45 <kota> interesting 21:32:16 <timburke> and with proxy plumbing, you can rebalance 2-replica (or even 1-replica!) policies without so much risk of unavailability 21:34:32 <mattoliver> Also means on post rebalance we can take last primaries into account and get built in handoffs first (or at least for last primaries). Which is why I'm playing with the reconstructor as a follow-up. 21:35:02 <timburke> 👍 21:35:12 <timburke> #topic open discussion 21:35:24 <timburke> those were the main things i wanted to bring up; what else should we talk about this week? 21:36:51 <clayg> There was that ML thread about the x-delete-at in the past breaking EC rebalance because of old swift bugs. 21:37:28 <clayg> Moral: upgrade! 21:41:17 <timburke> zaitcev, looks like you took the DNM off https://review.opendev.org/c/openstack/swift/+/802138 -- want me to find some time to review it? 21:42:47 <mattoliver> Just an update, I had a meeting with our SREs re the tracing request patches, they gave some good improvements they'd find useful.. next I plan to do some bench marks to see if it effects anything before I move forward on it. 21:42:48 <timburke> mattoliver and acoles, how are we feeling about the shard/storage-policy-index patch? https://review.opendev.org/c/openstack/swift/+/800748 21:43:09 <timburke> nice! 21:44:14 <zaitcev> timburke: I think it should be okay. 21:45:00 <acoles> timburke: I need to look at the policy index patch again since mattoliver last updated it 21:45:03 <zaitcev> timburke: Some of the "zero" tests belong into system reader. I was thinking about factoring them out, but haven't done that. It's not hurting anything except my sense of symmetry. 21:45:11 <mattoliver> I moved the migration into the replicatior, so there is a bit of new code there, but means we can migrate shards spi before enqueued reconiler (but it still happens in the sharper too). So take a look and we can decide where to do it.. or maybe both. 21:46:08 <zaitcev> I wish people looked at this though... it's the first step of the mitigation for stuck updates: https://review.opendev.org/c/openstack/swift/+/743797 21:48:16 <zaitcev> I'll take a look at 790550 and 803406. 21:49:18 <mattoliver> There is then a follow up to the shard spi migration and that's to get shard containers to respond to GET with the policy supplied to it (if one is supplied) so a longer tail spi migration doesn't effect root container GETS. A shard is an extension of its root, and happily takes objects with a different spi (supplied by the root) so makes sense it should return them on get too. 21:50:01 <acoles> mattoliver: does that need to be a follow-up? does it depend on the other change? 21:50:33 <mattoliver> No it doesn't.. but wanted to test it with clays probe test he wrote :) 21:51:16 <acoles> oic 21:51:29 <mattoliver> So ca n move it off there :) the follow up still needs tests so will do that today. I could always steal clays probe test and change it for this case :) 21:51:40 <acoles> I'll try to catch up on those patches tomorrow 21:51:43 <mattoliver> Just was useful while writing :) 21:52:04 <acoles> clayg has a habit of being useful :) 21:53:10 <clayg> 😊 21:53:11 <timburke> zaitcev, i'll take a look at https://review.opendev.org/c/openstack/swift/+/743797 -- you're probably right that we're in no worse situation than we already were. i might push up a follow-up to quarantine dbs with hash mismatches 21:55:41 <timburke> all right, we're about at time 21:55:52 <timburke> thank you all for coming, and thank you for working on swift! 21:55:57 <timburke> #endmeeting