21:00:30 <timburke> #startmeeting swift
21:00:31 <openstack> Meeting started Wed May 12 21:00:30 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:34 <openstack> The meeting name has been set to 'swift'
21:00:37 <timburke> who's here for the swift meeting?
21:00:49 <kota_> o/
21:01:11 <mattoliverau> o/
21:01:57 <seongsoocho> o/
21:02:09 <acoles> o/
21:02:16 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:31 <timburke> though i only just updated it ;-)
21:03:06 <timburke> #topic Python 3.10 beta
21:04:01 <timburke> i just wanted to call attention to this -- it seems like there may be some work to get tests running under py310, and i don't really want it to be a fire-drill come october
21:05:31 <timburke> i've started playing around with it. eventlet's not working yet, nose is busted (and at this point, unlikely to be fixed)
21:05:42 <mattoliverau> good thinking. might need to see if I can create a venv of it.
21:05:55 <mattoliverau> oh wow
21:05:58 <mattoliverau> awesome
21:06:04 <timburke> python-swiftclient's fine, though -- as long as you've got some other test runner
21:06:50 <timburke> good news is that distros seemed to be pretty quick on the packaging front; i got it fine on fedora and ubuntu (via deadsnakes ppa, looks like)
21:07:33 <timburke> pyeclib needed an update, but it was tame enough that i went ahead and merged it when i confirmed the gate was happy with it
21:08:11 <timburke> https://review.opendev.org/c/openstack/pyeclib/+/790537 - Use Py_ssize_t when calling PyArg_Parse
21:09:00 <mattoliverau> if we won't be able to use nose anymore, what's the alternative.. will we need to migrate to something like pytest.. or wait and hope someone fixes nose. or is {o,s}testr still a supported thing?
21:10:04 <timburke> stestr seems to still be the "preferred" way in openstack, as best i can tell. it *also* doesn't work with py310 right now, though -- but i think the next testtools release should fix it
21:10:33 <timburke> personally, i kinda like pytest -- and it's working *today*
21:10:51 <mattoliverau> kk
21:11:37 <tosky> stestr is definitely alive (ostestr is totally deprecated and not used anymore)
21:11:53 <timburke> tl;dr: py310 is coming, and it's likely going to require some changes which we should be on the lookout for. if anyone's interested in staying on the bleeding edge, try it out and see what's broken ;-)
21:12:27 <timburke> any questions or comments?
21:12:41 <acoles> timburke: thanks for the heads-up
21:12:45 <mattoliverau> tosky: thanks!
21:13:40 <timburke> all right, on to updates!
21:13:45 <timburke> #topic sharding
21:14:09 <timburke> so we've merged the tombstone counting
21:14:41 <timburke> what are the next steps? acoles, mattoliverau
21:15:57 <acoles> one last known piece of the shrinking jigsaw is getting a better handle on a shard being sharded enough before shrinking, which mattoliverau has been working on
21:16:09 <mattoliverau> I've been playing more with the extra ACTIVE steps based on discussions from PTG.
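
For anyone who wants to try the py310 suggestion above, a minimal sketch of driving the unit tests with pytest instead of nose; the test path and flags are illustrative choices, not an agreed-upon project convention.

    # Minimal sketch: run the unit tests with pytest under whatever interpreter
    # launched this script (e.g. a python3.10 venv with pytest installed).
    # The "test/unit" path and "-q" flag are illustrative, not project policy.
    import sys
    import pytest

    if __name__ == "__main__":
        sys.exit(pytest.main(["test/unit", "-q"]))
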
21:16:57 <mattoliverau> the chain starts at https://review.opendev.org/c/openstack/swift/+/789471
21:17:33 <acoles> I'm hopeful that the latest idea of looking at rows cleaved will work out
21:17:40 <mattoliverau> it starts with adding rows_cleaved to the CleaveContexts so we can tell if a context is a smaller or larger one
21:18:00 <mattoliverau> yeah, thanks acoles me too :)
21:18:54 <mattoliverau> I've been attempting to see if adding an extra CLEAVED state will help with the edge case where fast-cleaving shards become CLEAVED and therefore responsible for listings
21:19:05 <mattoliverau> producing holes until others shard.
21:19:30 <mattoliverau> it's the last in the chain.. and it's still very much a WIP and still trying it out.
21:20:50 <timburke> sounds good
21:20:53 <mattoliverau> I also noticed acoles rebased a bunch of follow-up sharding patches based around unification of the sharder and s-m-s-r tool
21:20:59 <acoles> I have a couple of patches around sharder config, the first is to share config loading between sharder and s-m-s-r
21:21:00 <mattoliverau> so I want to take a look at those
21:21:00 <acoles> https://review.opendev.org/c/openstack/swift/+/778989
21:21:16 <mattoliverau> lol, same wavelength :)
21:21:43 <acoles> and based on that, a proposal to add absolute config values for shrink threshold etc https://review.opendev.org/c/openstack/swift/+/778990
21:22:17 <timburke> yeah, that seems like it ought to reduce confusion for ops -- good plan
21:22:27 <mattoliverau> +1
21:22:42 <acoles> the percent-based config is convenient but I am concerned that it is harder to explain to ops the consequences of config changes that have knock-on effects
21:23:25 <timburke> next up
21:23:28 <timburke> #topic relinker
21:23:43 <acoles> it keeps getting better!
21:24:14 <timburke> i put up a change to clean up replication locks, which were preventing the old parts from getting cleaned up properly for EC policies
21:24:20 <timburke> https://review.opendev.org/c/openstack/swift/+/790305
21:25:32 <timburke> and while investigating that, i realized there was a bug that would let two different processes hold partition locks if we're also trying to delete those lock files
21:25:35 <timburke> https://review.opendev.org/c/openstack/swift/+/791022
21:26:39 <acoles> were we already deleting locks? or is this a future-bug-waiting-to-happen?
21:27:13 <acoles> I guess we already are in relinker cleanup?
21:27:26 <acoles> it's just the replication locks that we're adding
21:27:51 <acoles> but, any other places locks get deleted?
21:28:43 <timburke> we're already deleting locks -- but just during relinking
21:28:54 <acoles> ok
21:30:53 <timburke> after that, i think i'd prioritize https://review.opendev.org/c/openstack/swift/+/788413 - relinker: Log and recon on SIGTERM signal
21:31:19 <mattoliverau> replication "should" be on hold on the policy that's being increased.. so it shouldn't be taking locks?
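
As an aside on the partition-lock bug timburke describes above (two processes each holding a partition lock while lock files are being deleted), here is a minimal, generic sketch of that kind of race and a lock-then-verify pattern that avoids it; this is illustrative only and is not the code in swift.common.utils or in the patch.

    # Illustrative only: a generic flock-based lock file, NOT swift's actual
    # locking code or the fix in the patch linked above. The race: if one
    # process unlinks a lock file while another still holds a lock on it, a
    # third process can recreate the path and lock the new inode, leaving two
    # processes convinced they each hold "the" lock.
    import errno
    import fcntl
    import os


    def acquire_lock(path):
        """Open and flock path, retrying if the file was unlinked or replaced
        between the open() and the flock()."""
        while True:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            fcntl.flock(fd, fcntl.LOCK_EX)
            try:
                if os.fstat(fd).st_ino == os.stat(path).st_ino:
                    return fd  # the locked fd is still the file on disk
            except OSError as err:
                if err.errno != errno.ENOENT:
                    os.close(fd)
                    raise
            os.close(fd)  # file vanished or was replaced under us; retry


    def remove_lock(path):
        """Only unlink a lock file while holding its lock, so no one can be
        left holding a lock on an already-unlinked inode."""
        fd = acquire_lock(path)
        try:
            os.unlink(path)
        finally:
            os.close(fd)
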
21:31:37 <acoles> oh yeah, mattoliverau's relinker recon patch merged, which is really useful :)
21:32:08 <timburke> the partition lock is still taken during object writes, so we can update hashes.invalid
21:32:14 <mattoliverau> that ended up being more patchsets than I expected :P
21:32:23 <acoles> hehe
21:33:01 <timburke> i'm not as worried about the replication lock (for the reason mattoliverau pointed out), but i updated the cleanup patch to take that lock before deleting it as well
21:33:10 <acoles> mattoliverau: replication locks may not be being taken but the lock files were preventing the partition rmdir
21:34:10 <timburke> next up
21:34:17 <timburke> #topic stale EC frags
21:34:45 <timburke> we landed https://review.opendev.org/c/openstack/swift/+/788833 - Quarantine stale EC fragments after checking handoffs
21:34:58 <timburke> did we have any noteworthy follow-ups that shook out of that?
21:35:16 <mattoliverau> \o/ nice one acoles
21:35:41 <zaitcev> I'm curious too.
21:35:58 <acoles> I don't recall any follow-up patches
21:36:30 <acoles> ah, did someone take a crack at the TODO re. finding the actual wanted frag?
21:36:59 <acoles> but that's more another topic than a follow-up
21:37:00 <timburke> oh yeah, mattoliverau did https://review.opendev.org/c/openstack/swift/+/790374 - WIP: Single frag iter hack
21:37:29 <timburke> looked fairly sane, and not as much code as i'd feared
21:37:33 <mattoliverau> not really related to stale EC frags, but I have been nerd sniped and playing with trying to use a fragment if we find it (i.e. duplication), which has led to adding tracking of the last primaries in the ring, so rebalances in the reconstructor could be faster because we'd actually check the old primary for the fragment.
21:38:19 <mattoliverau> ring last primary tracking: https://review.opendev.org/c/openstack/swift/+/790550
21:38:26 <acoles> we'll be trying out the stale EC frag quarantine this week and hoping to see a bunch of lonely old frags (and annoying error logs) disappear
21:39:17 <timburke> and that previous-primary tracking seems likely to be useful in other contexts
21:40:01 <timburke> we'd be able to have single-replica policies where a rebalance doesn't automatically cause availability issues ;-)
21:40:37 <timburke> all right; next up
21:40:45 <timburke> #topic dark data watcher
21:41:25 <timburke> i've still not gotten around to looking at this problem yet. but i wanted to keep the topic as a reminder that i should
21:42:11 <zaitcev> Well, we have 2 problems, AFAIK.
21:42:40 <timburke> true
21:42:45 <zaitcev> #1 is incompatibility with the sharded containers, which you tackled and then Alistair didn't like, and I don't understand what exactly he didn't like.
21:43:02 <zaitcev> #2 is the conflict with async pendings, and that one I am fixing
21:43:33 <zaitcev> Or, fixed. But I broke probe tests, so I tried to run baseline tests, and it turned out that they no longer work for me.
21:43:53 <zaitcev> I dealt with that just today, and I'm getting to fixing up the tests.
21:44:08 <zaitcev> Oh, and thanks for landing my antisudo thing
21:44:19 <timburke> cool! there was more progress on this than i'd thought :D
21:44:31 <timburke> thanks zaitcev
21:45:12 <zaitcev> BTW. In a couple of places we're running swift-manage-shard-ranges, which turned out to no longer be possible for me, because I do not run an installed cluster. My code is in a git repo; it's not in /usr/local. I use PYTHONPATH to make it work.
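
On the previous-primary tracking mattoliverau mentions under the stale EC frags topic, a hypothetical sketch of how a consumer such as the reconstructor might use it: get_part_nodes and get_more_nodes are existing Ring methods, but previous_primaries and the overall shape are assumptions for illustration, not the API proposed in the patch.

    # Hypothetical sketch of "check the old primaries first" after a rebalance.
    # ring.get_part_nodes() / ring.get_more_nodes() are real Ring methods;
    # previous_primaries (a mapping of partition -> former primary node dicts)
    # is an assumed structure, not the interface added by the patch above.
    def candidate_nodes(ring, part, previous_primaries):
        """Yield current primaries, then former primaries, then handoffs."""
        seen = set()
        for node in ring.get_part_nodes(part):
            seen.add(node['id'])
            yield node
        for node in previous_primaries.get(part, []):
            if node['id'] not in seen:
                seen.add(node['id'])
                yield node
        for node in ring.get_more_nodes(part):
            if node['id'] not in seen:
                seen.add(node['id'])
                yield node
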
21:45:39 <zaitcev> I scratched my head a little and threw a stub into ~/bin, moved on with my life.
21:46:10 <zaitcev> I have a feeling you guys always run probe tests in some kind of VM or at least a container these days.
21:46:22 <zaitcev> Maybe I should do it too.
21:46:44 <zaitcev> That's it, thanks for listening.
21:47:16 <mattoliverau> yeah, I run it in an SAIO which installs swift (pip install -e .).. so maybe that's why I've never had the problem
21:47:18 <acoles> zaitcev: I have always run probe tests in a VM, these days using vagrant-SAIO
21:47:40 <timburke> all right
21:47:42 <zaitcev> Makes sense, thanks guys
21:47:46 <timburke> #topic open discussion
21:47:56 <timburke> anything else we should bring up this week?
21:48:02 <zaitcev> Well
21:48:08 <zaitcev> Has anyone heard of Minio?
21:48:24 <timburke> heard of it, though i haven't played with it
21:48:38 <zaitcev> A friend of mine wanted an S3, but he installed that thing instead of Swift. Me sad.
21:49:02 <zaitcev> Apparently it's way easier to install on modern servers, it comes as a ready-to-go docker image.
21:49:53 <zaitcev> It may not be something NVIDIA cares about, but I think we ought to think about some kind of canned single-node.
21:49:58 <zaitcev> Better than SAIO
21:50:31 <zaitcev> BTW, I made a post https://zaitcev.livejournal.com/262542.html
21:51:14 <zaitcev> Then, a(nother) friend of mine wrote me and wagged his finger about how NVIDIA is uppercase now, tsk tsk.
21:51:30 <zaitcev> I had to change all references.
21:52:11 <zaitcev> So, anyway, do we have some kind of ready-made Dockerfile? I would not be surprised if it's in the tree already, I'm just not aware.
21:52:59 <acoles> we do, but I have never looked at it until right now!
21:53:32 <timburke> yeah, we should -- https://github.com/openstack/swift/blob/master/Dockerfile and https://github.com/openstack/swift/blob/master/Dockerfile-py3
21:54:03 <acoles> tdasilva gets the blame :D
21:55:00 <timburke> iirc it was useful as a test target for 1space
21:55:28 <mattoliverau> don't we have https://hub.docker.com/repository/docker/openstackswift/saio
21:55:40 <mattoliverau> that's being build by zuul
21:55:46 <mattoliverau> *built
21:56:06 <mattoliverau> tdasilva did that, he's pretty awesome
21:56:43 <timburke> yup! again, not something i've really played with, though :-(
21:56:48 <timburke> i keep meaning to
21:58:09 <timburke> all right -- i'm gonna call it
21:58:21 <timburke> thank you all for coming, and thank you for working on swift!
21:58:30 <timburke> #endmeeting
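
On the canned single-node idea from open discussion, a minimal sketch of starting the openstackswift/saio image with the docker SDK for Python; the tag, the exposed proxy port, and the assumption that the image runs a ready-to-go all-in-one are guesses here, not documented behaviour of that image.

    # Minimal sketch, assuming the docker SDK for Python (pip install docker)
    # and that the openstackswift/saio image runs an all-in-one Swift with the
    # proxy listening on 8080 -- the tag and port mapping are assumptions.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        "openstackswift/saio:latest",
        detach=True,
        name="swift-saio",
        ports={"8080/tcp": 8080},
    )
    print(container.name, container.status)
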