21:00:31 <timburke> #startmeeting swift
21:00:31 <opendevmeet> Meeting started Wed Aug  4 21:00:31 2021 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:31 <opendevmeet> The meeting name has been set to 'swift'
21:00:40 <timburke> who's here for the swift meeting?
21:00:47 <kota> o/
21:00:54 <acoles> o/
21:01:02 <mattoliver> o/
21:01:31 <timburke> as usual, the agenda's at
21:01:33 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01:45 <timburke> sorry, i only *just* got around to updating it
21:01:56 <timburke> #topic PTG
21:02:26 <timburke> just a reminder to start populating the etherpad with topics
21:02:28 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-yoga
21:03:03 <timburke> i did get around to booking rooms, but i still need to add them to the etherpad, too
21:03:49 <timburke> i decided to split times again, to try to make sure everyone has some time where they're likely to be well-rested ;-)
21:04:16 <timburke> any questions on the PTG?
21:05:18 <mattoliver> Not yet, let's fill out the etherpad and have some good discussions :)
21:05:32 <timburke> agreed :-)
21:06:09 <timburke> #topic expirer can delete data with inconsistent x-delete-at values
21:07:02 <timburke> so i've got some users that are using the expirer pretty heavily, and i *think* i've seen an old bug
21:07:10 <timburke> #link https://bugs.launchpad.net/swift/+bug/1182628
21:09:19 <timburke> basically, there's a POST at t1 to mark an object to expire at t5, then another POST at t2 to have it expire at t10. if replication doesn't get rid of all the t1 .metas, and the t5 expirer queue entry is still hanging around, we'll delete data despite getting back 412s
21:10:26 <timburke> on top of that, since the data got deleted, the t10 expiration fails with 412s and hangs around until a reclaim_age passes
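The client-visible sequence behind this is just two metadata updates; a minimal sketch with python-swiftclient (credentials and names here are made up, and the split brain only arises if the second POST misses some of the primaries that saw the first one):

    import time
    from swiftclient.client import Connection

    # illustrative SAIO-style credentials, not from the meeting
    conn = Connection(authurl='http://saio:8080/auth/v1.0',
                      user='test:tester', key='testing')

    t5 = int(time.time()) + 300    # first expiration target
    t10 = int(time.time()) + 600   # later expiration target

    # t1: expire at t5 -- writes a .meta and enqueues a t5 expirer entry
    conn.post_object('c', 'o', headers={'X-Delete-At': str(t5)})

    # t2: push expiration out to t10 -- writes a newer .meta and a t10
    # entry; the t5 entry is only cleaned up on primaries that saw the
    # t1 .meta, so if replication lags, the expirer's later DELETE with
    # X-If-Delete-At: t5 gets 412s from nodes holding the t2 .meta but
    # still lands on nodes that only have the t1 .meta
    conn.post_object('c', 'o', headers={'X-Delete-At': str(t10)})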
21:10:30 <clayg> timburke: do you have any evidence we've hit expiry times getting [412, 412, 204] because it expired before the updated POST was fully replicated?
21:11:13 <zaitcev> Ugh
21:11:17 <clayg> any chance we could reap the t10 delete row based on the x-timestamp coming off the 404/412 response?
21:11:46 <zaitcev> I can see telling them not to count on POST doing the job, but the internal inconsistency is just bad no matter what.
21:11:53 <clayg> also I think there's an inline attempt (maybe even going to async) to clean up the t5 if the post at t2 happens to notice it
21:12:48 <timburke> i've seen .ts files dating from around the start of July, and the expirer kicking back 412s starting around mid-July. i haven't dug into the logs enough to see *exactly* what happened when, but knowing my users it seems likely that they wanted the later expiration time
21:14:09 <timburke> clayg, there is, but only if the t1 .meta is present on whichever servers get the t2 POST
21:14:42 <clayg> 👍
21:16:35 <timburke> i *think* it'd be reasonable to reap the t10 queue entry based on the t5 tombstone being newer than the t2 enqueue-time. but it also seems preferable to avoid the delete until we actually know that we want to delete it
21:17:39 <timburke> 'cause i *also* think it'd be reasonable to reap the t5 queue entry based on a 412 that indicates the presence of a t2 .meta (since it's greater than the t1 enqueue-time)
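Both reap checks come down to comparing the queue entry's enqueue time against the timestamp the object server reports on the 404/412; a hypothetical helper along these lines (the name and exact plumbing are made up, not part of the expirer today):

    from swift.common.utils import Timestamp

    def should_reap_queue_entry(enqueue_ts, backend_ts):
        # enqueue_ts: X-Timestamp of the POST that created this queue
        # entry (t1 for the t5 entry, t2 for the t10 entry)
        # backend_ts: X-Backend-Timestamp returned on the 404/412
        # If the backend already holds something newer than the POST
        # that enqueued this entry -- a newer .meta (412) or a
        # tombstone (404) -- the entry can never succeed, so reap it.
        return Timestamp(backend_ts) > Timestamp(enqueue_ts)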
21:18:05 <timburke> anyway, i've got a failing probe test at
21:18:08 <timburke> #link https://review.opendev.org/c/openstack/swift/+/803406
21:19:17 <mattoliver> Great start
21:20:06 <timburke> it gets a pretty-good split brain going on, with 4 replicas of an object, two with one delete-at time, two with another, and queue entries for both
21:21:18 <clayg> noice
21:21:40 <clayg> love the idea of getting those queue entries cleaned up if we can do it in a way that makes sense 👍
21:21:41 <timburke> i'm also starting to work on a fix for it that makes DELETE with X-If-Delete-At look a lot like a PUT with If-None-Match -- but it seems like it may get hairy. will keep y'all updated
21:22:02 <clayg> timburke: just do the HEAD and DELETE to start - then make it fancy
21:22:08 <clayg> "someday"
21:22:17 <clayg> (also consider maybe not making it fancy if we can avoid it)
21:23:29 <timburke> it'd have to be a HEAD with X-Newest, though -- which seems like a pretty sizable request amplification for the expirer :-(
21:24:14 <clayg> it doesn't *have* to be x-newest - you could just use direct client and get all the primaries in concert
21:24:32 <clayg> the idea is you can't make the delete unless everyone already has a matching x-delete-at
21:26:07 <timburke> i'll think about it -- seems like i'd have to reinvent a decent bit of best_response, though
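clayg's alternative, roughly: skip the proxy and X-Newest, HEAD every primary with direct_client, and only issue the DELETE once they all report the expected X-Delete-At. A sketch under those assumptions (the helper name is invented; real code would also need some of best_response's error handling, as noted above):

    from swift.common import direct_client
    from swift.common.exceptions import ClientException

    def all_primaries_agree(object_ring, account, container, obj, delete_at):
        part, nodes = object_ring.get_nodes(account, container, obj)
        for node in nodes:
            try:
                headers = direct_client.direct_head_object(
                    node, part, account, container, obj)
            except ClientException:
                # errors or 404s: can't be sure it's safe, try again later
                return False
            if headers.get('X-Delete-At') != str(delete_at):
                return False
        # every primary has the expected x-delete-at, so the DELETE
        # can't destroy data a newer POST meant to keep
        return True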
21:26:30 <timburke> next up
21:26:49 <timburke> #topic last primary table
21:27:21 <timburke> this came up at the last PTG, and i already see it's a topic for the next one (thanks mattoliver!)
21:27:38 <timburke> mattoliver even already put a patch together
21:27:41 <timburke> #link https://review.opendev.org/c/openstack/swift/+/790550
21:28:47 <mattoliver> Thanks for the reviews lately timburke
21:28:57 <timburke> i just wanted to say that i'm excited about this idea -- it seems like the sort of thing that can improve both client-observable behaviors and replication
21:29:16 <zaitcev> Very interesting. That for_read, is it something proxy can use too?
21:29:29 <zaitcev> Right, I see. Both.
21:30:08 <zaitcev> So, where's the catch? How big is that array for an 18-bit ring with 260,000 partitions?
21:30:30 <mattoliver> In my limited testing, and as you can see in its follow-ups, it makes post-rebalance reconstruction faster and less CPU-bound.
21:30:32 <timburke> basically, it's an extra replica's worth of storage
21:30:45 <timburke> /ram
21:30:46 <mattoliver> Yeah, so your ring grows an extra replica, basically.
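To put a number on the 18-bit case: the ring's replica-to-partition-to-device rows are stored as arrays of 2-byte device ids, so (assuming the last-primary table uses the same representation) the extra row works out to about half a megabyte:

    part_power = 18
    partitions = 2 ** part_power        # 262,144
    bytes_per_dev_id = 2                # array('H') entries in the ring
    extra_row_bytes = partitions * bytes_per_dev_id
    print(extra_row_bytes / 1024.0)     # ~512 KiB per extra replica row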
21:31:45 <kota> interesting
21:32:16 <timburke> and with proxy plumbing, you can rebalance 2-replica (or even 1-replica!) policies without so much risk of unavailability
21:34:32 <mattoliver> Also means that post-rebalance we can take last primaries into account and get handoffs-first behaviour built in (or at least for last primaries). Which is why I'm playing with the reconstructor as a follow-up.
21:35:02 <timburke> 👍
21:35:12 <timburke> #topic open discussion
21:35:24 <timburke> those were the main things i wanted to bring up; what else should we talk about this week?
21:36:51 <clayg> There was that ML thread about x-delete-at values in the past breaking EC rebalance because of old swift bugs.
21:37:28 <clayg> Moral: upgrade!
21:41:17 <timburke> zaitcev, looks like you took the DNM off https://review.opendev.org/c/openstack/swift/+/802138 -- want me to find some time to review it?
21:42:47 <mattoliver> Just an update: I had a meeting with our SREs re the request-tracing patches, and they suggested some improvements they'd find useful. Next I plan to do some benchmarks to see whether it affects anything before I move forward on it.
21:42:48 <timburke> mattoliver and acoles, how are we feeling about the shard/storage-policy-index patch? https://review.opendev.org/c/openstack/swift/+/800748
21:43:09 <timburke> nice!
21:44:14 <zaitcev> timburke: I think it should be okay.
21:45:00 <acoles> timburke: I need to look at the policy index patch again since mattoliver last updated it
21:45:03 <zaitcev> timburke: Some of the "zero" tests belong in system reader. I was thinking about factoring them out, but haven't done that. It's not hurting anything except my sense of symmetry.
21:45:11 <mattoliver> I moved the migration into the replicator, so there is a bit of new code there, but it means we can migrate shard SPIs before the enqueued reconciler runs (it still happens in the sharder too). So take a look and we can decide where to do it... or maybe both.
21:46:08 <zaitcev> I wish people looked at this though... it's the first step of the mitigation for stuck updates: https://review.opendev.org/c/openstack/swift/+/743797
21:48:16 <zaitcev> I'll take a look at 790550 and 803406.
21:49:18 <mattoliver> There is then a follow-up to the shard SPI migration, and that's to get shard containers to respond to GETs with the policy supplied to them (if one is supplied), so a longer-tail SPI migration doesn't affect root container GETs. A shard is an extension of its root, and happily takes objects with a different SPI (supplied by the root), so it makes sense it should return them on GET too.
21:50:01 <acoles> mattoliver: does that need to be a follow-up? does it depend on the other change?
21:50:33 <mattoliver> No it doesn't... but I wanted to test it with the probe test clayg wrote :)
21:51:16 <acoles> oic
21:51:29 <mattoliver> So I can move it off there :) The follow-up still needs tests, so I'll do that today. I could always steal clayg's probe test and adapt it for this case :)
21:51:40 <acoles> I'll try to catch up on those patches tomorrow
21:51:43 <mattoliver> It was just useful while writing :)
21:52:04 <acoles> clayg has a habit of being useful :)
21:53:10 <clayg> 😊
21:53:11 <timburke> zaitcev, i'll take a look at https://review.opendev.org/c/openstack/swift/+/743797 -- you're probably right that we're in no worse situation than we already were. i might push up a follow-up to quarantine dbs with hash mismatches
21:55:41 <timburke> all right, we're about at time
21:55:52 <timburke> thank you all for coming, and thank you for working on swift!
21:55:57 <timburke> #endmeeting