manuvakery1 | timburke_: we are running the following version of swift and it is consistent across all nodes. | 03:10 |
manuvakery1 | openstack-swift-account.noarch 2.23.1-1.el7 @centos-openstack-train | 03:10 |
manuvakery1 | openstack-swift-container.noarch 2.23.1-1.el7 @centos-openstack-train | 03:10 |
manuvakery1 | openstack-swift-object.noarch 2.23.1-1.el7 @centos-openstack-train | 03:10 |
manuvakery1 | timburke_: sorry my mistake. we are running swift==2.25.0 | 03:22 |
opendevreview | Pete Zaitcev proposed openstack/swift master: Make the dark data watcher work with sharded containers https://review.opendev.org/c/openstack/swift/+/787656 | 04:33 |
manuvakery1 | timburke_: you are right, one of the storage nodes was running an older version even though we upgraded to swift==2.25.0 via pip; a cleanup and reinstall fixed the issue, thanks for the help | 05:25 |
opendevreview | Matthew Oliver proposed openstack/swift master: container-server: return objects of a given policy https://review.opendev.org/c/openstack/swift/+/803423 | 07:21 |
mattoliver | ^ that still needs unit tests... but clayg's probe test looks better :P | 07:23 |
*** mabrams is now known as Guest3279 | 08:58 | |
*** mabrams1 is now known as mabrams | 08:58 | |
*** diablo_rojo is now known as Guest3281 | 09:49 | |
*** Guest3281 is now known as diablo_rojo | 10:09 | |
*** diablo_rojo is now known as Guest3316 | 18:34 | |
*** Guest3316 is now known as diablo_rojo | 18:35 | |
timburke_ | almost meeting time! | 20:57 |
*** timburke_ is now known as timburke | 20:57 | |
clayg | 🥳 | 20:59 |
zaitcev | Sounds good. | 20:59 |
kota | good morning | 20:59 |
mattoliver | Morning | 21:00 |
acoles | good evening :) | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Aug 4 21:00:31 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
kota | o/ | 21:00 |
acoles | o/ | 21:00 |
mattoliver | o/ | 21:01 |
timburke | as usual, the agenda's at | 21:01 |
timburke | #link https://wiki.openstack.org/wiki/Meetings/Swift | 21:01 |
timburke | sorry, i only *just* got around to updating it | 21:01 |
timburke | #topic PTG | 21:01 |
timburke | just a reminder to start populating the etherpad with topics | 21:02 |
timburke | #link https://etherpad.opendev.org/p/swift-ptg-yoga | 21:02 |
timburke | i did get around to booking rooms, but i still need to add them to the etherpad, too | 21:03 |
timburke | i decided to split times again, to try to make sure everyone has some time where they're likely to be well-rested ;-) | 21:03 |
timburke | any questions on the PTG? | 21:04 |
mattoliver | Not yet, let's fill out the etherpad and have some good discussions :) | 21:05 |
timburke | agreed :-) | 21:05 |
timburke | #topic expirer can delete data with inconsistent x-delete-at values | 21:06 |
timburke | so i've got some users that are using the expirer pretty heavily, and i *think* i've seen an old bug | 21:07 |
timburke | #link https://bugs.launchpad.net/swift/+bug/1182628 | 21:07 |
timburke | basically, there's a POST at t1 to mark an object to expire at t5, then another POST at t2 to have it expire at t10. if replication doesn't get rid of all the t1 .metas, and the t5 expirer queue entry is still hanging around, we'll delete data despite getting back 412s | 21:09 |
timburke | on top of that, since the data got deleted, the t10 expiration fails with 412s and hangs around until a reclaim_age passes | 21:10 |
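To make the failure mode concrete, here is a minimal sketch (plain Python, not Swift's object-server code; the timestamps are just small integers) of the three-node split-brain described above: each object server compares X-If-Delete-At only against its own local .meta, so a node that missed the t2 POST still honours the stale t5 queue entry and removes its replica.

```python
# Minimal sketch of the split-brain, assuming three replicas of one object.
def object_server_delete(local_delete_at, if_delete_at):
    """Roughly what each object server decides for an expirer DELETE."""
    if local_delete_at != if_delete_at:
        return 412   # Precondition Failed: expiry on disk doesn't match
    return 204       # data removed on this node

t1_delete_at, t2_delete_at = 5, 10   # from the POST at t1 and the POST at t2

# Two nodes got the t2 POST; one still has only the stale t1 .meta.
responses = [object_server_delete(local, if_delete_at=t1_delete_at)
             for local in (t2_delete_at, t2_delete_at, t1_delete_at)]
print(responses)   # [412, 412, 204] -- one replica is gone, and replication
                   # then spreads that tombstone to the other nodes
```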
clayg | timburke: do you have any evidence we've hit expiry times getting [412, 412, 204] because it expired before the updated POST was fully replicated? | 21:10 |
zaitcev | Ugh | 21:11 |
clayg | any chance we could reap the t10 delete row based on the x-timestamp coming off the 404/412 response? | 21:11 |
zaitcev | I can see telling them not to count on POST doing the job, but the internal inconsistency is just bad no matter what. | 21:11 |
clayg | also I think there's an inline attempt (maybe even going to async) to clean up the t5 if the post at t2 happens to notice it | 21:11 |
timburke | i've seen the .ts as of somewhere around the start of July, and the expirer kicking back 412s starting around mid-July. i haven't dug into the logs enough to see *exactly* what happened when, but knowing my users it seems likely that they wanted the later expiration time | 21:12 |
timburke | clayg, there is, but only if the t1 .meta is present on whichever servers get the t2 POST | 21:14 |
clayg | 👍 | 21:14 |
timburke | i *think* it'd be reasonable to reap the t10 queue entry based on the t5 tombstone being newer than the t2 enqueue-time. but it also seems preferable to avoid the delete until we actually know that we want to delete it | 21:16 |
timburke | 'cause i *also* think it'd be reasonable to reap the t5 queue entry based on a 412 that indicates the presence of a t2 .meta (since it's greater than the t1 enqueue-time) | 21:17 |
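A hedged sketch of the reaping rule being floated here (the function and its arguments are made up for illustration, not an existing expirer hook): a queue entry becomes safe to drop once the backend response proves the object's state is newer than the POST that enqueued the entry.

```python
# Illustrative reaping check; timestamps are plain numbers for clarity.
def can_reap_queue_entry(enqueue_time, status, response_timestamp):
    """Drop the entry if a 404 (tombstone) or 412 (newer .meta) carries a
    timestamp later than the POST that created this queue entry."""
    return status in (404, 412) and response_timestamp > enqueue_time

# the t10 entry (enqueued at t2) vs. the t5 tombstone: reap it
print(can_reap_queue_entry(enqueue_time=2, status=404, response_timestamp=5))
# the t5 entry (enqueued at t1) vs. a 412 backed by the t2 .meta: reap it too
print(can_reap_queue_entry(enqueue_time=1, status=412, response_timestamp=2))
```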
timburke | anyway, i've got a failing probe test at | 21:18 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/803406 | 21:18 |
mattoliver | Great start | 21:19 |
timburke | it gets a pretty-good split brain going on, with 4 replicas of an object, two with one delete-at time, two with another, and queue entries for both | 21:20 |
clayg | noice | 21:21 |
clayg | love the idea of getting those queue entries cleaned up if we can do it in a way that makes sense 👍 | 21:21 |
timburke | i'm also starting to work on a fix for it that makes DELETE with X-If-Delete-At look a lot like a PUT with If-None-Match -- but it seems like it may get hairy. will keep y'all updated | 21:21 |
clayg | timburke: just do the HEAD and DELETE to start - then make it fancy | 21:22 |
clayg | "someday" | 21:22 |
clayg | (also consider maybe not making it fancy if we can avoid it) | 21:22 |
timburke | it'd have to be a HEAD with X-Newest, though -- which seems like a pretty sizable request amplification for the expirer :-( | 21:23 |
clayg | it doesn't *have* to be x-newest - you could just use direct client and get all the primaries in concert | 21:24 |
clayg | the idea is you can't make the delete unless everyone already has a matching x-delete-if-match | 21:24 |
timburke | i'll think about it -- seems like i'd have to reinvent a decent bit of best_response, though | 21:26 |
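For reference, a rough sketch of the direct-client approach clayg suggests (the Swift module paths are real internals, but this helper and its use by the expirer are only an assumption): HEAD every primary directly and only let the DELETE go ahead if all of them already report the X-Delete-At being acted on.

```python
from swift.common.ring import Ring
from swift.common.exceptions import ClientException
from swift.common.direct_client import direct_head_object

def all_primaries_agree(swift_dir, account, container, obj, expected_delete_at):
    """Return True only if every primary reports the expected X-Delete-At."""
    ring = Ring(swift_dir, ring_name='object')
    part, nodes = ring.get_nodes(account, container, obj)
    for node in nodes:
        try:
            headers = direct_head_object(node, part, account, container, obj)
        except ClientException:
            return False   # missing or erroring replica: don't risk the DELETE
        if headers.get('X-Delete-At') != str(expected_delete_at):
            return False   # this replica has a different (or no) expiry
    return True
```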
timburke | next up | 21:26 |
timburke | #topic last primary table | 21:26 |
timburke | this came up at the last PTG, and i already see it's a topic for the next one (thanks mattoliver!) | 21:27 |
timburke | mattoliver even already put a patch together | 21:27 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/790550 | 21:27 |
mattoliver | Thanks for the reviews lately timburke | 21:28 |
timburke | i just wanted to say that i'm excited about this idea -- it seems like the sort of thing that can improve both client-observable behaviors and replication | 21:28 |
zaitcev | Very interesting. That for_read, is it something proxy can use too? | 21:29 |
zaitcev | Right, I see. Both. | 21:29 |
zaitcev | So, where's the catch? How big is that array for an 18-bit ring with 260,000 partitions? | 21:30 |
mattoliver | In my limited testing, and as you can see in its follow-ups, it makes post-rebalance reconstruction faster and less CPU-bound. | 21:30 |
timburke | basically, it's an extra replica's worth of storage | 21:30 |
timburke | /ram | 21:30 |
mattoliver | Yeah, so your ring grows by an extra replica, basically. | 21:30 |
kota | interesting | 21:31 |
timburke | and with proxy plumbing, you can rebalance 2-replica (or even 1-replica!) policies without so much risk of unavailability | 21:32 |
mattoliver | It also means that after a rebalance we can take last primaries into account and get built-in handoffs-first behaviour (or at least for the last primaries), which is why I'm playing with the reconstructor as a follow-up. | 21:34 |
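For anyone who hasn't read the patch, a toy sketch of the concept (the names and data shapes here are assumptions, not the API in https://review.opendev.org/c/openstack/swift/+/790550): keep roughly one extra replica's worth of part-to-device assignments recording where each partition lived before the last rebalance, so reads and the reconstructor can try that last primary as a preferred handoff.

```python
import array

def nodes_for_read(part, replica2part2dev, last_primary2dev, devs):
    """Current primaries first, then wherever this partition lived last."""
    current = [devs[r2p2d[part]] for r2p2d in replica2part2dev]
    previous = devs[last_primary2dev[part]]
    # only partitions that actually moved contribute an extra node
    return current + ([previous] if previous not in current else [])

devs = [{'id': 0, 'ip': '10.0.0.1'}, {'id': 1, 'ip': '10.0.0.2'},
        {'id': 2, 'ip': '10.0.0.3'}]
replica2part2dev = [array.array('H', [1]), array.array('H', [2])]  # 2 replicas, 1 partition
last_primary2dev = array.array('H', [0])  # partition 0 used to live on dev 0
print([d['ip'] for d in nodes_for_read(0, replica2part2dev, last_primary2dev, devs)])
# ['10.0.0.2', '10.0.0.3', '10.0.0.1']
```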
timburke | 👍 | 21:35 |
timburke | #topic open discussion | 21:35 |
timburke | those were the main things i wanted to bring up; what else should we talk about this week? | 21:35 |
clayg | There was that ML thread about the x-delete-at in the past breaking EC rebalance because of old swift bugs. | 21:36 |
clayg | Moral: upgrade! | 21:37 |
timburke | zaitcev, looks like you took the DNM off https://review.opendev.org/c/openstack/swift/+/802138 -- want me to find some time to review it? | 21:41 |
mattoliver | Just an update: I had a meeting with our SREs re the request-tracing patches; they suggested some improvements they'd find useful. Next I plan to do some benchmarks to see if it affects anything before I move forward on it. | 21:42 |
timburke | mattoliver and acoles, how are we feeling about the shard/storage-policy-index patch? https://review.opendev.org/c/openstack/swift/+/800748 | 21:42 |
timburke | nice! | 21:43 |
zaitcev | timburke: I think it should be okay. | 21:44 |
acoles | timburke: I need to look at the policy index patch again since mattoliver last updated it | 21:45 |
zaitcev | timburke: Some of the "zero" tests belong in system reader. I was thinking about factoring them out, but haven't done that. It's not hurting anything except my sense of symmetry. | 21:45 |
mattoliver | I moved the migration into the replicator, so there is a bit of new code there, but it means we can migrate a shard's SPI before enqueuing to the reconciler (it still happens in the sharder too). So take a look and we can decide where to do it... or maybe both. | 21:45 |
zaitcev | I wish people looked at this though... it's the first step of the mitigation for stuck updates: https://review.opendev.org/c/openstack/swift/+/743797 | 21:46 |
zaitcev | I'll take a look at 790550 and 803406. | 21:48 |
mattoliver | There is then a follow-up to the shard SPI migration: get shard containers to respond to GET with the policy supplied to them (if one is supplied), so a longer-tail SPI migration doesn't affect root container GETs. A shard is an extension of its root, and happily takes objects with a different SPI (supplied by the root), so it makes sense that it should return them on GET too. | 21:49 |
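A hedged illustration of that follow-up (the X-Backend-Storage-Policy-Index header is Swift's real backend header, but this direct request to a shard's container server is purely illustrative, not the patch's code): the root can pass its own policy index down when listing a shard, and the shard answers with the rows for that policy even if its own default SPI hasn't been migrated yet.

```python
import requests  # plain HTTP for illustration; hostnames/ports are made up

def list_shard(node_ip, port, device, part, account, shard_container, policy_index):
    """GET a shard container listing for a specific storage policy index."""
    url = 'http://%s:%d/%s/%d/%s/%s' % (
        node_ip, port, device, part, account, shard_container)
    resp = requests.get(
        url, headers={'X-Backend-Storage-Policy-Index': str(policy_index)})
    resp.raise_for_status()
    return resp.text.splitlines()   # one object name per line
```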
acoles | mattoliver: does that need to be a follow-up? does it depend on the other change? | 21:50 |
mattoliver | No it doesn't... but I wanted to test it with the probe test clayg wrote :) | 21:50 |
acoles | oic | 21:51 |
mattoliver | So I can move it off there :) the follow-up still needs tests so I will do that today. I could always steal clayg's probe test and change it for this case :) | 21:51 |
acoles | I'll try to catch up on those patches tomorrow | 21:51 |
mattoliver | Just was useful while writing :) | 21:51 |
acoles | clayg has a habit of being useful :) | 21:52 |
clayg | 😊 | 21:53 |
timburke | zaitcev, i'll take a look at https://review.opendev.org/c/openstack/swift/+/743797 -- you're probably right that we're in no worse situation than we already were. i might push up a follow-up to quarantine dbs with hash mismatches | 21:53 |
timburke | all right, we're about at time | 21:55 |
timburke | thank you all for coming, and thank you for working on swift! | 21:55 |
timburke | #endmeeting | 21:55 |
opendevmeet | Meeting ended Wed Aug 4 21:55:57 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:55 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.html | 21:55 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.txt | 21:55 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2021/swift.2021-08-04-21.00.log.html | 21:55 |