21:00:24 <timburke> #startmeeting swift
21:00:25 <openstack> Meeting started Wed Oct 14 21:00:24 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:28 <openstack> The meeting name has been set to 'swift'
21:00:31 <timburke> who's here for the swift meeting?
21:00:51 <kota_> hello
21:00:56 <seongsoocho> o/
21:01:07 <rledisez> hi o/
21:01:50 <zaitcev> 07
21:02:27 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:43 <timburke> first up, though, something i forgot to put on there ;-)
21:02:49 <timburke> #topic releases
21:02:57 <timburke> victoria was released today!
21:03:09 <timburke> thanks everyone for making this another great cycle
21:03:34 <kota_> congrats!
21:03:37 <seongsoocho> yay ~~~ congrats!!! 🎉
21:03:39 <timburke> we also have new stable releases for ussuri (2.25.1) and train (2.23.2)
21:05:35 <mattoliverau> o/ (sorry I'm late)
21:05:36 <timburke> they include a lot of py3 bug fixes, including the recent crypto patch where you'll need to upgrade while still on py2 before transitioning to py3
21:05:57 <timburke> (as much as anything, just making sure people are aware of them)
21:06:56 <timburke> and i'll probably try to get a swiftclient release out in the near future, as i think it's blocking my current attempt at getting py3 probe tests in the gate
21:07:11 <timburke> #topic PTG
21:07:28 <timburke> just a week and a half away!
21:08:00 <mattoliverau> \o/
21:08:01 <timburke> i added a rough design for ALOs to the etherpad
21:08:03 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-wallaby
21:08:13 <mattoliverau> Oh nice
21:08:15 <kota_> actually, Summit will start on the next Monday.
21:08:48 <timburke> also true!
21:09:00 <timburke> i should see what sessions look interesting...
21:09:16 <timburke> if anyone has recommendations, i'd love to hear them (now or in -swift)
21:10:15 <timburke> the ptg schedule is up at https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/PTG2-Oct26-30-2020-Schedule-1.pdf and all the meeting times on the etherpad are accurate
21:10:17 * kota_ did NOT catch up the session schedule yet :P
21:11:54 <timburke> summit schedule's also available, at https://www.openstack.org/summit/2020/summit-schedule/
21:13:17 <timburke> i can't wait to see you all again and do some hacking! :-)
21:13:33 <timburke> speaking of... let's talk patches!
21:13:46 <timburke> #topic audit watchers
21:13:51 <timburke> #link https://review.opendev.org/#/c/706653/
21:13:52 <patchbot> patch 706653 - swift - Let developers/operators add watchers to object au... - 37 patch sets
21:14:30 <timburke> zaitcev, i saw you added a +2 -- does that mean you and david are ready for the rest of us to do some reviews?
21:14:37 <zaitcev> Yes
21:14:58 <timburke> cool
21:15:00 <timburke> !
21:15:22 <zaitcev> dsariel (not in the meeting) was already looking for another project, this may bet float up.
21:15:54 <timburke> i'll be sure to make some time for a look, probably rebase https://review.opendev.org/#/c/744078/1
21:15:55 <patchbot> patch 744078 - swift - watchers: Add EC stat gatherer - 1 patch set
21:16:08 <zaitcev> Oh, nice. I forgot you had that.
21:16:39 <timburke> and see what other fun ideas for watchers i can come up with :-)
21:17:35 <timburke> anything else you need there besides reviews? anything we should know digging into it?
21:17:47 <zaitcev> But I just want a claim post in the ground -- even if the EC plugin ends requiring changes to the API. I resigned myself to APIs that need to be changed. I think this is what Sam meant when he wrote that all arguments are passed by name. Note that it requires implementations to always add **kwargs, but beyond that a flexibility is built in.
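The by-name convention zaitcev describes can be sketched in Python. This is a minimal illustration, not the API from patch 706653: the class name `CountingWatcher`, the method names, and the `run_audit` driver are all hypothetical. It only shows why accepting `**kwargs` lets a caller add arguments later without breaking existing watchers.

```python
class CountingWatcher:
    """Hypothetical watcher: counts the objects the auditor shows it."""

    def __init__(self, conf, logger=None, **kwargs):
        self.conf = conf
        self.logger = logger
        self.seen = 0

    def start(self, audit_type, **kwargs):
        # Reset state at the beginning of each audit pass.
        self.seen = 0

    def see_object(self, object_metadata, **kwargs):
        # Keyword arguments this watcher does not know about are ignored.
        self.seen += 1

    def end(self, **kwargs):
        return self.seen


def run_audit(watcher, objects):
    """Toy driver standing in for the auditor; passes everything by name."""
    watcher.start(audit_type="ALL")
    for meta, path in objects:
        # data_file_path plays the role of an argument added after the
        # watcher was written; the watcher still works because of **kwargs.
        watcher.see_object(object_metadata=meta, data_file_path=path)
    return watcher.end()
```

Because every call is by keyword, a later release could pass an extra argument to `see_object` and `CountingWatcher` would keep working unchanged.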
21:18:46 <timburke> yeah, and we could surely have a few releases where we label the feature as being experimental and subject to change if we really need
21:18:53 <zaitcev> I'm glad that I talked you all into the reduced isolation, because separate processes produced nasty problems with not keeping up and logging.
21:19:23 <zaitcev> If you can accept in-process model, I'm okay with anything. Even "except Exception" which I always -1 elsewhere.
21:20:19 <zaitcev> That is all.
21:20:43 <timburke> #topic replication locking
21:20:46 <zaitcev> If you look at it and like it, we don't really need to waste PTG time on it. Feel free to overstrike after you review.
21:20:58 <timburke> #link https://review.opendev.org/#/c/754242/
21:20:58 <patchbot> patch 754242 - swift - Fix a race condition in case of cross-replication - 5 patch sets
21:21:05 <zaitcev> Tsk
21:21:13 <zaitcev> I promised to review it but didn't.
21:21:27 <timburke> rledisez, sorry, i still haven't gotten to rebuilding an env that would be amenable to trying to repro :-(
21:22:06 <timburke> i don't want to drop it from the agenda, though, because i hope to shame myself into actually doing what i said i'd do ;-)
21:22:12 <rledisez> So patchset 5 passed the production test, I was unable to lose datafiles using this patch. I added a comment on timeout at the end. I'm in favor of hardcoding a small value, but you may have different opinions
21:22:52 <timburke> that seems perfectly reasonable to me. if we can't get the lock quickly, skip it and try again later; makes sense
21:23:44 <rledisez> ok. so I'll update the review with a hardcoded timeout of 0.2; It's working fine in our deployments
21:24:48 <timburke> anything else you need there? would it be useful for us to think some more about how to handle locking for rsync?
21:25:01 <rledisez> I still have to work on the rsync fix, I have to take some time to think about it. I'm afraid it would incur a major change in SSYNC (splitting "negotiation" and "data transfer"). But it may be for the best actually
21:25:52 <rledisez> I still think the best option is to wrap rsync with a ssync call
21:27:35 <rledisez> that's all for me on that topic
21:27:38 <zaitcev> Interesting.
21:28:40 <timburke> #topic async slo cleanup
21:28:44 <timburke> #link https://review.opendev.org/#/c/733026/
21:28:44 <patchbot> patch 733026 - swift - Add a new URL parameter to allow for async cleanup... - 12 patch sets
21:29:07 <timburke> so i think clayg's getting skittish about turning this on by default
21:30:38 <timburke> i might end up having it be opt-in, at least for an initial release. i know the main cluster i care about doesn't make heavy use of the expirer yet, and this would presumably change that
21:31:53 <timburke> meanwhile, i don't have a great handle on how best to monitor my expirers -- i know clayg has a little script that can check how much is in the queue and how much of it is ready for reaping, though
21:32:14 <mattoliverau> I guess opt in initially is probably a safe way to go forward.
21:32:40 <timburke> i'd also had an idea a while back (based on rledisez's https://review.opendev.org/#/c/715580/) to add a "lag" metric for the expirer
21:32:41 <patchbot> patch 715580 - swift - obj-updater: add metric on lag of containers listing - 1 patch set
21:32:52 <clayg> yeah SRE just deployed the expirer monitor on like one random node - so we have stats now (as long as that node is up 🙄 )
21:33:10 <mattoliverau> Should we get expirers to emit something, somehow? Have to think about it.
21:34:09 <timburke> we've got some stats already, at least; iirc, mostly just success/failure counts
21:34:35 <zaitcev> I looked at that patch and it seemed okay, but then Clay came in and he had some fundamental comments
21:35:30 <timburke> (fwiw, the expirer patch is https://review.opendev.org/#/c/735271/)
21:35:30 <patchbot> patch 735271 - swift - metrics: Add lag metric to expirer - 1 patch set
21:35:53 <mattoliverau> Can you pull general task queue account stats to get a rough idea on size? With an account HEAD. I should play around with general task queue some more.
21:36:33 <mattoliverau> Or did we shard the accounts too.
21:36:51 * mattoliverau is just thinking out loud, and isn't at his computer
21:37:47 <timburke> yeah, pretty sure that's the idea with https://gist.github.com/clayg/7f66eab2a61c77869e1e84ac4ed6f1df
21:38:55 <timburke> oh, but with https://review.opendev.org/#/c/517389/ that might get more complicated
21:38:56 <patchbot> patch 517389 - swift - Add object-expirer new mode to execute tasks from ... - 46 patch sets
21:41:17 <timburke> anyway, mainly just wanted to call attention to that default change -- i should have a fresh patchset up soon
21:41:24 <timburke> #topic open discussion
21:41:32 <timburke> what else should we bring up today?
21:42:41 <zaitcev> I recommended dsariel to look into sharding, help Matt along.
21:42:51 <zaitcev> Not sure if it's going to work.
21:43:11 <mattoliverau> Thanks!
21:45:46 <timburke> cool! speaking of, i should get my probe test at https://review.opendev.org/#/c/744256/ to a point that it passes consistently :-/
21:45:46 <patchbot> patch 744256 - swift - sharding: probe test to exercise manual shrinking - 3 patch sets
21:46:53 <mattoliverau> I've been writing some tests for my poc rangescanner that will hopefully fsch root shard ranges
21:47:25 <mattoliverau> *fsck/scan and attempt to fix
21:47:54 <timburke> i'm really coming around on the idea that the way to "solve" autosharding is to get an automated recovery from overlapping shard ranges
21:49:00 <mattoliverau> Yeah, I want to do leader election properly, but in any case being able to fix is more important initially
21:49:37 <mattoliverau> The scanner currently deals with overlaps and can rebuild fragmented paths.
21:50:24 <mattoliverau> It's still a WIP but feel free to have a look. I also have a new doc/braindump
21:50:40 <mattoliverau> Which I hope to go over at the ptg.
21:51:54 <timburke> oh! there's a patch i keep meaning to bring up during meetings but never quite get around to: https://review.opendev.org/#/c/751966/
21:51:55 <patchbot> patch 751966 - swift - replace md5 with swift utils version - 11 patch sets
21:52:14 <zaitcev> wait, what
21:52:28 <timburke> someone's interested in running swift with FIPS mode enabled, which means we'd need to annotate all uses of md5
21:53:14 <timburke> ...which understandably means that there'd be a decent number of conflicts if/when we merge it
21:53:34 <zaitcev> Russian anecdote: One guy said "What do you know about security through obscurity? On my last job, I replaced MD5 with SHA256 only trimmed to fit."
21:54:08 <timburke> heh
21:54:12 <mattoliverau> Lol
21:55:30 <timburke> (honestly, it makes me think a bit too much of https://tools.ietf.org/html/rfc3514, but w/e...)
21:56:11 <zaitcev> 1 April 2003 - nice tray
21:57:37 <timburke> does anyone have an interest in trying to review this? (i'll probably get to it eventually, but it's very much a "when i get around to it" sort of endeavor)
21:58:07 <zaitcev> I could, I suppose. Just after Romain's race condition thing.
21:58:20 <timburke> i like that prioritization :-)
22:00:07 <timburke> all right, we're about at time. thank you all for coming, and thank you for working on swift!
22:00:13 <timburke> #endmeeting
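For readers unfamiliar with what "annotate all uses of md5" means for FIPS mode, here is a minimal Python sketch. It assumes the approach is a single wrapper that marks each call as security-relevant or not; the `md5` and `etag` helpers below are hypothetical and are not taken from patch 751966. The only library feature relied on is the `usedforsecurity` keyword that `hashlib` constructors accept in Python 3.9 and later.

```python
import hashlib


def md5(data=b"", usedforsecurity=True):
    """Hypothetical wrapper: one place to annotate why MD5 is being used."""
    try:
        # Python 3.9+ accepts the usedforsecurity keyword.
        return hashlib.md5(data, usedforsecurity=usedforsecurity)
    except TypeError:
        # Older interpreters do not know the keyword; fall back.
        return hashlib.md5(data)


def etag(body):
    # An ETag is a content checksum, not a security control, so the
    # call is annotated accordingly.
    return md5(body, usedforsecurity=False).hexdigest()
```

Routing every call through one helper is also why such a change touches many files at once, which matches the merge-conflict concern raised in the meeting.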