21:00:24 #startmeeting swift
21:00:25 Meeting started Wed Oct 14 21:00:24 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:28 The meeting name has been set to 'swift'
21:00:31 who's here for the swift meeting?
21:00:51 hello
21:00:56 o/
21:01:07 hi o/
21:01:50 07
21:02:27 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:43 first up, though, something i forgot to put on there ;-)
21:02:49 #topic releases
21:02:57 victoria was released today!
21:03:09 thanks everyone for making this another great cycle
21:03:34 congrats!
21:03:37 yay ~~~ congrats!!! 🎉
21:03:39 we also have new stable releases for ussuri (2.25.1) and train (2.23.2)
21:05:35 o/ (sorry I'm late)
21:05:36 they include a lot of py3 bug fixes, including the recent crypto patch where you'll need to upgrade while still on py2 before transitioning to py3
21:05:57 (as much as anything, just making sure people are aware of them)
21:06:56 and i'll probably try to get a swiftclient release out in the near future, as i think it's blocking my current attempt at getting py3 probe tests in the gate
21:07:11 #topic PTG
21:07:28 just a week and a half away!
21:08:00 \o/
21:08:01 i added a rough design for ALOs to the etherpad
21:08:03 #link https://etherpad.opendev.org/p/swift-ptg-wallaby
21:08:13 Oh nice
21:08:15 actually, Summit will start on the next Monday.
21:08:48 also true!
21:09:00 i should see what sessions look interesting...
21:09:16 if anyone has recommendations, i'd love to hear them (now or in -swift)
21:10:15 the ptg schedule is up at https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/Uploads/PTG2-Oct26-30-2020-Schedule-1.pdf and all the meeting times on the etherpad are accurate
21:10:17 * kota_ did NOT catch up on the session schedule yet :P
21:11:54 summit schedule's also available, at https://www.openstack.org/summit/2020/summit-schedule/
21:13:17 i can't wait to see you all again and do some hacking! :-)
21:13:33 speaking of... let's talk patches!
21:13:46 #topic audit watchers
21:13:51 #link https://review.opendev.org/#/c/706653/
21:13:52 patch 706653 - swift - Let developers/operators add watchers to object au... - 37 patch sets
21:14:30 zaitcev, i saw you added a +2 -- does that mean you and david are ready for the rest of us to do some reviews?
21:14:37 Yes
21:14:58 cool
21:15:00 !
21:15:22 dsariel (not in the meeting) was already looking for another project, this may bet float up.
21:15:54 i'll be sure to make some time for a look, probably rebase https://review.opendev.org/#/c/744078/1
21:15:55 patch 744078 - swift - watchers: Add EC stat gatherer - 1 patch set
21:16:08 Oh, nice. I forgot you had that.
21:16:39 and see what other fun ideas for watchers i can come up with :-)
21:17:35 anything else you need there besides reviews? anything we should know digging into it?
21:17:47 But I just want a claim post in the ground -- even if the EC plugin ends up requiring changes to the API. I resigned myself to APIs that need to be changed. I think this is what Sam meant when he wrote that all arguments are passed by name. Note that it requires implementations to always add **kwargs, but beyond that, flexibility is built in.
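A minimal sketch of the by-name calling convention mentioned at 21:17:47, to show why `**kwargs` keeps older watchers working when the API grows. The names `OldWatcher`, `see_object`, `run_watchers`, `object_metadata`, and `data_file_path` are illustrative assumptions and are not taken from patch 706653.

```python
class OldWatcher:
    """Hypothetical watcher written against an early revision of the API."""

    def __init__(self):
        self.seen = []

    def see_object(self, object_metadata, **kwargs):
        # **kwargs absorbs any arguments that later API revisions add,
        # so this watcher does not break when the caller passes more.
        self.seen.append(object_metadata["name"])


def run_watchers(watchers, object_metadata, data_file_path):
    """Hypothetical caller: every argument is passed by name."""
    for watcher in watchers:
        watcher.see_object(
            object_metadata=object_metadata,
            data_file_path=data_file_path,  # argument unknown to OldWatcher
        )


watcher = OldWatcher()
run_watchers([watcher], {"name": "/a/c/o"}, "/srv/node/d1/objects/x.data")
```

Without `**kwargs` in `see_object`, the call above would raise `TypeError` as soon as the caller started passing `data_file_path`.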
21:18:46 yeah, and we could surely have a few releases where we label the feature as being experimental and subject to change if we really need to
21:18:53 I'm glad that I talked you all into the reduced isolation, because separate processes produced nasty problems with not keeping up and logging.
21:19:23 If you can accept the in-process model, I'm okay with anything. Even "except Exception" which I always -1 elsewhere.
21:20:19 That is all.
21:20:43 #topic replication locking
21:20:46 If you look at it and like it, we don't really need to waste PTG time on it. Feel free to overstrike after you review.
21:20:58 #link https://review.opendev.org/#/c/754242/
21:20:58 patch 754242 - swift - Fix a race condition in case of cross-replication - 5 patch sets
21:21:05 Tsk
21:21:13 I promised to review it but didn't.
21:21:27 rledisez, sorry, i still haven't gotten to rebuilding an env that would be amenable to trying to repro :-(
21:22:06 i don't want to drop it from the agenda, though, because i hope to shame myself into actually doing what i said i'd do ;-)
21:22:12 So patchset 5 passed the production test, I was unable to lose datafiles using this patch. I added a comment on timeout at the end. I'm in favor of hardcoding a small value, but you may have different opinions
21:22:52 that seems perfectly reasonable to me. if we can't get the lock quickly, skip it and try again later; makes sense
21:23:44 ok. so I'll update the review with a hardcoded timeout of 0.2; It's working fine in our deployments
21:24:48 anything else you need there? would it be useful for us to think some more about how to handle locking for rsync?
21:25:01 I still have to work on the rsync fix, I have to take some time to think about it. I'm afraid it would incur a major change in SSYNC (splitting "negotiation" and "data transfer"). But it may be for the best actually
21:25:52 I still think the best option is to wrap rsync with an ssync call
21:27:35 that's all for me on that topic
21:27:38 Interesting.
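A rough sketch of the "if we can't get the lock quickly, skip it and try again later" idea from the replication locking topic, using only the standard library. This is not the code in patch 754242: `lock_dir`, `LockTimeout`, and `try_job` are hypothetical names, the 0.2 second default only mirrors the value mentioned in the meeting, and the sketch assumes a POSIX system where `flock` is available.

```python
import errno
import fcntl
import os
import time
from contextlib import contextmanager


class LockTimeout(Exception):
    """Raised when the lock could not be taken within the timeout."""


@contextmanager
def lock_dir(path, timeout=0.2, poll=0.01):
    """Hold an exclusive flock on path, giving up after timeout seconds."""
    fd = os.open(path, os.O_RDONLY)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                # Non-blocking attempt; fails at once if someone holds it.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except OSError as err:
                if err.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                if time.monotonic() >= deadline:
                    raise LockTimeout(path)
                time.sleep(poll)
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def try_job(path, job):
    """Run job under the lock; return False if the lock was busy."""
    try:
        with lock_dir(path):
            job()
        return True
    except LockTimeout:
        return False  # skip for now, the next pass can retry
```

A short, fixed timeout keeps a busy partition from stalling the whole pass; the cost is that a skipped partition waits until the next cycle.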
21:28:40 #topic async slo cleanup
21:28:44 #link https://review.opendev.org/#/c/733026/
21:28:44 patch 733026 - swift - Add a new URL parameter to allow for async cleanup... - 12 patch sets
21:29:07 so i think clayg's getting skittish about turning this on by default
21:30:38 i might end up having it be opt-in, at least for an initial release. i know the main cluster i care about doesn't make heavy use of the expirer yet, and this would presumably change that
21:31:53 meanwhile, i don't have a great handle on how best to monitor my expirers -- i know clayg has a little script that can check how much is in the queue and how much of it is ready for reaping, though
21:32:14 I guess opt in initially is probably a safe way to go forward.
21:32:40 i'd also had an idea a while back (based on rledisez's https://review.opendev.org/#/c/715580/) to add a "lag" metric for the expirer
21:32:41 patch 715580 - swift - obj-updater: add metric on lag of containers listing - 1 patch set
21:32:52 yeah SRE just deployed the expirer monitor on like one random node - so we have stats now (as long as that node is up 🙄 )
21:33:10 Should we get expirers to emit something, somehow? Have to think about it.
21:34:09 we've got some stats already, at least; iirc, mostly just success/failure counts
21:34:35 I looked at that patch and it seemed okay, but then Clay came in and he had some fundamental comments
21:35:30 (fwiw, the expirer patch is https://review.opendev.org/#/c/735271/)
21:35:30 patch 735271 - swift - metrics: Add lag metric to expirer - 1 patch set
21:35:53 Can you pull general task queue account stats to get a rough idea on size? With an account HEAD. I should play around with general task queue some more.
21:36:33 Or did we shard the accounts too.
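One way to picture the "how much is in the queue, how much is ready, and how far behind are we" numbers discussed above. This is a sketch only: `queue_stats` is a hypothetical helper, it is not the logic of patch 735271 or of clayg's script, and it assumes each queued task name starts with its due time as `<timestamp>-<rest>`.

```python
import time


def queue_stats(task_names, now=None):
    """Summarize a list of task names of the assumed form '<timestamp>-<rest>'."""
    now = time.time() if now is None else now
    # The due time is everything before the first dash (an assumption).
    timestamps = [float(name.split("-", 1)[0]) for name in task_names]
    ready = [ts for ts in timestamps if ts <= now]
    # Lag: how long the oldest ready task has been waiting to be reaped.
    lag = now - min(ready) if ready else 0.0
    return {"queued": len(timestamps), "ready": len(ready), "lag": lag}


stats = queue_stats(["100-a/c/o1", "150-a/c/o2", "300-a/c/o3"], now=200)
```

In this example two of the three tasks are already due, and the oldest of them has been waiting 100 seconds.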
21:36:51 * mattoliverau is just thinking out loud, and isn't at his computer
21:37:47 yeah, pretty sure that's the idea with https://gist.github.com/clayg/7f66eab2a61c77869e1e84ac4ed6f1df
21:38:55 oh, but with https://review.opendev.org/#/c/517389/ that might get more complicated
21:38:56 patch 517389 - swift - Add object-expirer new mode to execute tasks from ... - 46 patch sets
21:41:17 anyway, mainly just wanted to call attention to that default change -- i should have a fresh patchset up soon
21:41:24 #topic open discussion
21:41:32 what else should we bring up today?
21:42:41 I recommended dsariel to look into sharding, help Matt along.
21:42:51 Not sure if it's going to work.
21:43:11 Thanks!
21:45:46 cool! speaking of, i should get my probe test at https://review.opendev.org/#/c/744256/ to a point that it passes consistently :-/
21:45:46 patch 744256 - swift - sharding: probe test to exercise manual shrinking - 3 patch sets
21:46:53 I've been writing some tests for my poc rangescanner that will hopefully fsch root shard ranges
21:47:25 *fsck/scan and attempt to fix
21:47:54 i'm really coming around on the idea that the way to "solve" autosharding is to get an automated recovery from overlapping shard ranges
21:49:00 Yeah, I want to do leader election properly, but in any case being able to fix is more important initially
21:49:37 The scanner currently deals with overlaps and can rebuild fragmented paths.
21:50:24 It's still a WIP but feel free to have a look. I also have a new doc/braindump
21:50:40 Which I hope to go over at the ptg.
21:51:54 oh! there's a patch i keep meaning to bring up during meetings but never quite get around to: https://review.opendev.org/#/c/751966/
21:51:55 patch 751966 - swift - replace md5 with swift utils version - 11 patch sets
21:52:14 wait, what
21:52:28 someone's interested in running swift with FIPS mode enabled, which means we'd need to annotate all uses of md5
21:53:14 ...which understandably means that there'd be a decent number of conflicts if/when we merge it
21:53:34 Russian anecdote: One guy said "What do you know about security through obscurity? On my last job, I replaced MD5 with SHA256 only trimmed to fit."
21:54:08 heh
21:54:12 Lol
21:55:30 (honestly, it makes me think a bit too much of https://tools.ietf.org/html/rfc3514, but w/e...)
21:56:11 1 April 2003 - nice try
21:57:37 does anyone have an interest in trying to review this? (i'll probably get to it eventually, but it's very much a "when i get around to it" sort of endeavor)
21:58:07 I could, I suppose. Just after Romain's race condition thing.
21:58:20 i like that prioritization :-)
22:00:07 all right, we're about at time. thank you all for coming, and thank you for working on swift!
22:00:13 #endmeeting
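For readers wondering what "annotating" md5 use for FIPS might look like, here is a minimal sketch. It is not the helper from patch 751966: the wrapper name and signature are assumptions. The one real API it relies on is the `usedforsecurity` keyword that `hashlib` constructors accept from Python 3.9 onward.

```python
import hashlib


def md5(data=b"", usedforsecurity=True):
    """Hypothetical wrapper that states whether a digest is security-relevant."""
    try:
        # Python 3.9+ accepts the keyword; a FIPS-enabled build can use it
        # to permit md5 for non-security purposes such as checksums.
        return hashlib.md5(data, usedforsecurity=usedforsecurity)
    except TypeError:
        # Older interpreters do not know the keyword; fall back to plain md5.
        return hashlib.md5(data)


# A checksum-style digest is not a security use, so it is annotated as such.
etag = md5(b"", usedforsecurity=False).hexdigest()
```

Routing every call through one wrapper means each call site has to state its intent, which is also why such a change touches many files and conflicts with other patches.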