21:00:06 <timburke> #startmeeting swift
21:00:07 <openstack> Meeting started Wed Nov 4 21:00:06 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:10 <openstack> The meeting name has been set to 'swift'
21:00:15 <timburke> who's here for the swift meeting?
21:00:26 <kota_> hi
21:00:34 <seongsoocho> o/ Hi !
21:00:44 <mattoliverau> o/
21:00:53 <rledisez> hi o/
21:01:05 <timburke> rledisez! we missed you last week
21:01:15 <timburke> good to see you again :-)
21:02:04 <rledisez> yeah, sorry I missed that :( i hope you had great talks
21:02:29 <zaitcev> o/
21:03:05 <zaitcev> rledisez: I thought it was going to be all ex-OpenIO at OVH from now on, with Swift and Ceph commiserating in the dustbin.
21:03:39 <clayg> ugh, *already*
21:03:39 <zaitcev> rledisez: Because both you and Alex missed it, it was a bit more than just one person who could not make it.
21:03:41 <acoles> o/
21:04:03 <timburke> all right -- first up
21:04:07 <timburke> #topic PTG
21:05:07 <timburke> thanks to those that came to the ptg last week, and those that didn't, we all missed you
21:05:28 <timburke> notmyname even made a cameo appearance or two :-)
21:05:53 <clayg> #throwback
21:06:12 <timburke> we covered a bunch of great topics, and tried to take notes in the etherpad as we did so
21:06:18 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-wallaby
21:06:50 <timburke> a quick overview:
21:07:19 <timburke> clayg was interested in having a new handoff table in the ring, used only on reads
21:08:00 <timburke> people generally seemed enthusiastic about the idea, and we'll see what the code ends up looking like
21:08:29 <clayg> "or if it gets written" 🤷♂️
21:08:59 <timburke> i was looking at memcached error limiting, particularly when you only have a single memcached server configured in the proxy
21:09:35 <timburke> i wound up writing a new patch to make the existing, hard-coded error limiting more tunable
21:09:37 <timburke> #link https://review.opendev.org/#/c/761029/
21:09:38 <patchbot> patch 761029 - swift - memcache: Make error-limiting values configurable - 1 patch set
21:10:32 <timburke> ... which i think will satisfy my needs without going the more-complicated route in the chain starting at https://review.opendev.org/759183
21:10:32 <patchbot> patch 759183 - swift - memcache: Refuse to error limit the last available... - 2 patch sets
21:11:11 <timburke> zaitcev and dsariel continue working on audit watchers
21:11:17 <timburke> #link https://review.opendev.org/#/c/706653/
21:11:18 <patchbot> patch 706653 - swift - Let developers/operators add watchers to object au... - 38 patch sets
21:11:35 <timburke> i think it's getting pretty darn close to merging
21:11:54 <zaitcev> No, actually, it's done. I addressed the comments that came up at PTG like moving to a separate directory.
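As a rough illustration of the memcached error limiting timburke mentions above, here is a minimal, hypothetical Python sketch of the general technique: count errors per server over a sliding window, and skip a server for a while once it trips the limit. The class name, parameter names, and defaults are assumptions chosen for illustration; they are not the option names or values from patch 761029 or from Swift's memcache client. The usage at the bottom shows the single-server concern: once the only configured server is error limited, the caller has no cache server at all until the suppression window expires.

```python
import time


class ErrorLimiter:
    """Hypothetical per-server error limiter (illustrative, not Swift's code).

    A server that records ``error_limit`` errors within ``error_interval``
    seconds is skipped for the next ``suppression_time`` seconds.
    """

    def __init__(self, error_limit=10, error_interval=60.0,
                 suppression_time=60.0, clock=time.monotonic):
        self.error_limit = error_limit
        self.error_interval = error_interval
        self.suppression_time = suppression_time
        self.clock = clock            # injectable so the example is deterministic
        self._errors = {}             # server -> timestamps of recent errors
        self._limited_until = {}      # server -> time when suppression ends

    def record_error(self, server):
        now = self.clock()
        # Drop errors that have aged out of the sliding window.
        recent = [t for t in self._errors.get(server, [])
                  if now - t < self.error_interval]
        recent.append(now)
        if len(recent) >= self.error_limit:
            # Too many recent errors: stop using this server for a while
            # and start counting afresh.
            self._limited_until[server] = now + self.suppression_time
            recent = []
        self._errors[server] = recent

    def is_limited(self, server):
        return self.clock() < self._limited_until.get(server, float("-inf"))

    def usable_servers(self, servers):
        # Servers that are not currently error limited, in original order.
        return [s for s in servers if not self.is_limited(s)]


# Usage with a fake clock and a single configured server.
fake_now = [0.0]
limiter = ErrorLimiter(error_limit=3, error_interval=10.0,
                       suppression_time=30.0, clock=lambda: fake_now[0])
for _ in range(3):
    limiter.record_error("127.0.0.1:11211")
print(limiter.usable_servers(["127.0.0.1:11211"]))  # [] (no cache at all)
fake_now[0] = 31.0
print(limiter.usable_servers(["127.0.0.1:11211"]))  # ['127.0.0.1:11211']
```

Taking the limit, window, and suppression time as constructor arguments rather than module constants mirrors the idea of turning hard-coded values into tunables; how Swift actually exposes them should be checked against the patch itself.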
21:12:28 <zaitcev> Although leaving a debugging logger.info in there was pretty embarrassing.
21:12:30 <timburke> yeah, i suppose i should say, "i need to go have another look and merge it" :-)
21:12:56 <timburke> we talked a bit about the default and recommended configs, and came away with a few different things we wanted to do
21:13:55 <timburke> cschwede will look at improving our recommendations
21:13:58 <acoles> zaitcev: not as embarrassing as me leaving a debugging print !
21:14:48 <timburke> mattoliverau will look at trimming our manpages (to mostly just point to online docs iirc)
21:15:28 <timburke> and acoles will pull the long tables in the deployment guide out to separate pages (hopefully making the whole thing a bit more readable)
21:16:56 <mattoliverau> zaitcev: I'll try and get back to audit watchers this week. Just been distracted.
21:17:18 <timburke> clayg is excited about some of the recent proxy-logging changes and how they let you slice your metrics; he'll likely propose a change to the sample configs to have two separately-named proxy-logging middlewares, each with their own namespacing
21:18:08 <clayg> yeah, i really should do that - and also fix whatever is wrong with the byte-enforcing code
21:18:11 <timburke> and there was a marathon ops feedback session with a lot of good commentary from ormandj
21:19:28 <timburke> (i feel a little bad that that took up so much of our time, but i also feel like it's always one of the most-valuable things we can do when we're all together)
21:19:44 <timburke> for more on that, see
21:19:47 <timburke> #link https://etherpad.opendev.org/p/swift-wallaby-ops-feedback
21:19:54 <mattoliverau> +1 ormandj and the marathon ops feedback was awesome.
21:20:18 <clayg> for sure all good swift ❤️ ops
21:20:39 <timburke> that's my quick recap of the ptg; did i miss (or misrepresent) anything major?
21:21:27 <zaitcev> ALO?
21:22:07 <acoles> mattoliverau talked us through all his great work on eliminating overlapping shard ranges
21:22:58 <timburke> oh yeah -- i keep writing it off since i haven't actually written any code for it yet ;-) but hopefully people have a better feel for the problems we've seen with trying to use SLOs for s3 MPUs, and why a new type of large object might be useful/necessary
21:24:28 <timburke> all right, moving on
21:24:34 <timburke> #topic gate failures
21:25:07 <timburke> so lately, i've had this feeling like our gate has been in particularly bad shape
21:25:16 <mattoliverau> And acoles came up with a great alternative shard audit with gaps algorithm I want to now write into code.
21:26:16 <timburke> i think the guy that finally pushed me over the edge was https://review.opendev.org/#/c/759790/ -- 10 rechecks for a one-line change to drop an unused package from lower-constraints
21:26:16 <patchbot> patch 759790 - swift - Remove the unused coding style modules - 1 patch set
21:27:16 <mattoliverau> Only 10 rechecks :p
21:27:46 <timburke> so i started writing some tooling to get build info from zuul, pull down logs or subunit results, and parse out failures, looking for which jobs (and which individual *tests*) fail most often
21:28:34 <zaitcev> Suspense intensifies
21:28:51 <timburke> i don't have parsing for all job types yet, but it's already been able to help me find some particularly bad/annoying patterns
21:29:44 <mattoliverau> Nice
21:30:06 <timburke> for instance, of 258 individual probe test failures, 149 of them were the result of resetswift failing because the loopback device was busy
21:30:30 <acoles> :'(
21:30:35 <timburke> hopefully that failure mode will go away with https://review.opendev.org/#/c/761439/
21:30:35 <patchbot> patch 761439 - swift - saio: Stop processes more forcefully in resetswift - 1 patch set
21:31:35 <timburke> i also found that the func tests were pretty nice to deal with, since they emit testrepository.subunit files
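The tooling timburke describes above isn't included in the log. As a hypothetical sketch of only its last step, ranking which jobs and individual tests fail most often and counting how many failures share a known cause (as with the 149 of 258 probe failures traced to resetswift), the snippet below assumes failures have already been pulled from Zuul and parsed into (job, test id, failure text) tuples. The job names, test ids, failure texts, and the regular expression are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical classifier for one known failure mode. The regex is an
# assumption about how a busy-device error might be worded in job output.
KNOWN_CAUSES = {
    "loopback device busy": re.compile(r"(device|target) is busy"),
}


def classify(text):
    """Return the first known cause whose pattern matches, else 'unclassified'."""
    for cause, pattern in KNOWN_CAUSES.items():
        if pattern.search(text):
            return cause
    return "unclassified"


def summarize(failures, top=3):
    """Rank jobs and tests by failure count and tally failures by cause.

    ``failures`` is an iterable of (job_name, test_id, failure_text) tuples
    that some earlier step has already extracted from logs or subunit files.
    """
    failures = list(failures)
    by_job = Counter(job for job, _, _ in failures)
    by_test = Counter(test for _, test, _ in failures)
    by_cause = Counter(classify(text) for _, _, text in failures)
    return by_job.most_common(top), by_test.most_common(top), by_cause


# Made-up sample records, purely for illustration.
sample = [
    ("probe-tests", "test.probe.test_example.TestExample.test_one",
     "umount: /mnt/sdb1: target is busy."),
    ("probe-tests", "test.probe.test_example.TestExample.test_two",
     "umount: /mnt/sdb1: target is busy."),
    ("probe-tests", "test.probe.test_example.TestExample.test_one",
     "AssertionError: 404 != 200"),
    ("func-tests", "test.functional.test_example.TestExample.test_three",
     "Timeout after 60s"),
]

jobs, tests, causes = summarize(sample)
print(jobs)  # [('probe-tests', 3), ('func-tests', 1)]
print(causes["loopback device busy"], "of", sum(causes.values()))  # 2 of 4
```

Classifying by regex over failure text is a deliberately simple choice; a real tool would more likely key off whatever structure the subunit or log output already provides.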
21:32:37 <timburke> so i figured i'd try switching probe tests to use ostestr, too: https://review.opendev.org/#/c/761459/ (and we'll just see whether the file shows up; it's all magic to me :-/)
21:32:37 <patchbot> patch 761459 - swift - probe: Use ostestr as test runner - 1 patch set
21:33:23 <mattoliverau> Nice work timburke
21:33:39 <clayg> mattoliverau: +1 timburke is a gate hero!
21:33:42 <timburke> there will probably be more information i glean from all of this (and more patches i write as a result), but wanted to share what i've been working on so far with it
21:34:21 <timburke> because i'm *so* tired of starting (and often ending) my day with a slew of rechecks :-(
21:34:44 <timburke> any questions or comments?
21:34:52 <acoles> good stuff timburke
21:35:37 <tosky> timburke: if I may - at this point you may try stestr
21:36:05 <tosky> ostestr is meant to be deprecated
21:36:30 <tosky> not sure it was already considered and tried in the past, maybe it was, so feel free to ignore me on this :)
21:36:43 <timburke> so i *did* try that originally! but for some unknown reason it caused the partition numbers used in the probe tests' rings to come out different -- no idea why
21:37:24 <tosky> uhm
21:38:29 <timburke> i'm certainly interested in making sure we have maintained software for the test runner, but i figured i'd start with just seeing whether using the same runner that we do in func tests gets me the test artifacts i'm looking for
21:39:19 <timburke> it's very strange, though. i'll keep digging, see if i can get some kind of repro/explanation/bug report for it
21:40:07 <timburke> #topic replication lock
21:40:10 <timburke> #link https://review.opendev.org/#/c/754242/
21:40:10 <patchbot> patch 754242 - swift - Fix a race condition in case of cross-replication - 6 patch sets
21:40:22 <timburke> rledisez, i'm sorry to say, i still haven't reviewed it :-(
21:40:34 <timburke> i even promised i would, too. sorry
21:41:22 <rledisez> that's ok, everybody is busy. it will be reviewed eventually ;)
21:43:07 <timburke> i'd still like to get a test env up so that i can actually repro the problem and see the patch fix it, but at the same time, (1) you're already running it in prod, (2) it's definitely making your clusters better, and (3) i want *everybody's* clusters to run as well as rledisez's
21:43:48 <timburke> so maybe i should just run it through my mental python parser (clayg always tells me it's a pretty good one)
21:44:35 <mattoliverau> So that's your secret :)
21:44:37 <clayg> timburke: your brain is amazing
21:44:57 <clayg> mattoliverau: and he can do py2 and py3 at the same time!!!
21:45:43 <clayg> rledisez: did the rsync fix get squashed into the ssync change? I was pretty happy with the strategy for locking the REPLICATION requests; but never quite followed what you were thinking for rsync?
21:45:44 <timburke> i'll try again this week; we've got a pretty big rebalance coming up, it'd probably be good for us to have that patch
21:46:03 <timburke> i don't think we have an rsync fix yet
21:46:44 <rledisez> clayg: no rsync fix. i have the idea pretty clear but I lack time
21:47:19 <timburke> all right, that's all i've got for the agenda
21:47:26 <timburke> #topic open discussion
21:48:23 <timburke> speaking of replication, rledisez, i'd be curious about your take on https://review.opendev.org/#/c/758636/
21:48:23 <patchbot> patch 758636 - swift - Add option to REPLICATE to just invalidate hashes - 5 patch sets
21:49:24 <timburke> (though you might be more interested in just ripping out post-(s)sync replicate calls; see https://bugs.launchpad.net/swift/+bug/1818709)
21:49:25 <openstack> Launchpad bug 1818709 in OpenStack Object Storage (swift) "object replicator update_deleted post ssync REPLICATE request considered harmful" [Undecided,New]
21:50:17 <rledisez> yeah, I sometimes disable it (like after a relink, it helps a lot). i'll have a look at the patch
21:50:51 <timburke> thanks
21:51:02 <tosky> I have a quick note about some legacy jobs (I'm the coordinator for the "no legacy jobs" community goal)
21:51:55 <tosky> I've just noticed a few legacy jobs I originally missed (thanks for porting basically all of them a long time ago!)
21:52:09 <tosky> they don't use devstack-gate, so they are not so problematic, but still: they are in pyeclib
21:52:46 <tosky> so if you could convert those as well, and backport them to stable/victoria (and if you want also to older branches), that would be nice!
21:53:23 <timburke> tosky, thanks for the reminder! i'll look into it (hopefully this week?)
21:53:26 <tosky> oh, I see now, it doesn't have the openstack stable branches, so I guess master is fine
21:53:44 <tosky> thank you!
21:53:52 <mattoliverau> Yay, already easier :)
21:53:56 <timburke> i suspect they could largely be switched to the openstack-tox-... jobs
21:54:07 <tosky> most likely
21:54:45 <tosky> the official gerrit topic is "native-zuulv3-migration"
21:55:19 <timburke> 👍
21:55:22 <tosky> sorry for the late ping, I totally missed them originally; the devstack-gate jobs had (and have, the few left) a higher priority
21:56:08 <timburke> makes sense -- and pyeclib sees few enough patches, it likely wouldn't show up on any list of recently-run jobs
21:57:08 <timburke> all right
21:57:15 <timburke> thank you all for coming, and thank you for working on swift!
21:57:20 <timburke> #endmeeting