21:00:40 <timburke> #startmeeting swift
21:00:41 <openstack> Meeting started Wed Aug 12 21:00:40 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:44 <openstack> The meeting name has been set to 'swift'
21:00:47 <timburke> who's here for the swift meeting?
21:00:54 <mattoliverau> o/
21:01:14 <clayg> o/
21:01:27 <kota_> hi
21:01:46 <rledisez> o/
21:02:46 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:03:04 <timburke> #topic summit and ptg
21:03:27 <timburke> looks like the dates are set for the ptg
21:03:28 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016424.html
21:03:41 <timburke> it will immediately follow the summit
21:04:06 <timburke> so, summit will be oct 19-23, then ptg oct 26-30
21:04:24 <clayg> it's all virtual like last time? that seemed to work ok...
21:04:30 <timburke> registration's already open, and both virtual events are free
21:04:38 <timburke> #link https://www.eventbrite.com/e/open-infrastructure-summit-2020-tickets-96967218561
21:04:44 <timburke> #link https://www.eventbrite.com/e/project-teams-gathering-october-2020-tickets-116136313841
21:05:16 <kota_> good info. thx.
21:05:18 <timburke> i'll keep an eye out for what we need to do to sign up for space and such, and try to be more on it than last time
21:05:48 <clayg> timburke: or just let mattoliverau do it again - things worked great!?
21:05:53 <timburke> though mattoliverau did a great job getting everything organized!
21:06:49 <timburke> that's all i had by way of announcements; any questions or comments?
21:06:50 <mattoliverau> Lol
21:07:51 <timburke> all right
21:07:55 <clayg> timburke: great update! especially the part about mattoliverau being awesome 👍
21:08:07 <timburke> #topic s3api subrequest logging
21:08:15 <timburke> clayg, i think you added this, yeah?
21:08:25 <clayg> oh no 😬
21:08:28 <clayg> did I add a patch?
21:08:35 <timburke> two of them!
21:08:46 <timburke> and now one's landed :-)
21:08:58 <clayg> why is zuul so mad at p 735221 😡
21:08:58 <patchbot> https://review.opendev.org/#/c/735221/ - swift - s3api: Use swift.backend_path to proxy-log s3api r... - 2 patch sets
21:09:27 <timburke> er, approved -- i'll keep poking at the gate
21:09:55 <clayg> so if we CAN get that one landed I'm pretty much fine with p 735220 - in fact I'd like to do some follow-on to update the example configs
21:09:55 <patchbot> https://review.opendev.org/#/c/735220/ - swift - proxy-logging: Be able to configure log_route - 3 patch sets
21:10:34 <clayg> rledisez: do you run with two proxy-logging middlewares in your pipeline? we got SUPER confused having statsd metrics for client facing responses mixed in with subrequests
21:11:53 <clayg> we're splitting everything up right now - internal_client.conf is going to get its own metrics namespace too - i'll let you know how it works out; but we expect it to be pretty great
21:11:55 <rledisez> clayg: yes, we use the 2 proxy-logging instances
21:12:53 <rledisez> clayg: I thought statsd metrics were emitted by the left instance of proxy-logging, which is not in the path of subrequests
21:13:09 <rledisez> I guess I assumed wrong :)
21:13:10 <clayg> timburke: ok, I think i got what I need - the first one will merge, no one is going to have much input on "fixing" prefix if you want it...
21:13:30 <clayg> so if there's any discussion to have around "best practices" it should happen on the follow-on patch to update example configs
21:13:44 <timburke> they'll *both* emit statsd metrics
21:13:58 <clayg> rledisez: so we fixed something that made s3 requests parse a, c, o and NOW they emit metrics
21:14:10 <clayg> i mean they always both emitted... it's a disaster
21:14:24 <timburke> but you're probably already getting doubled-up metrics from things like SLO
21:14:36 <clayg> ^ 👍
21:14:43 <clayg> 😬
21:15:17 <clayg> like transfer stats... the extra response codes and GET timings don't quite look so weird...
21:15:31 <clayg> well - depending on what you think they mean
21:15:42 <clayg> cause it wasn't too hard for us to get confused - so we're gunna break it up!
21:17:52 <timburke> all right, sounds like we know what's happening with the two linked patches, and we'll expect more discussion to happen on a follow-up (and possibly at the ptg)
21:18:09 <timburke> #topic py3 encryption bug and upgrades
21:18:14 <timburke> #link https://review.opendev.org/#/c/742033/
21:18:15 <patchbot> patch 742033 - swift - py3: Work with proper native string paths in crypt... - 4 patch sets
21:18:52 <timburke> so that patch now has the config option from p 742756 squashed-in
21:18:53 <patchbot> https://review.opendev.org/#/c/742756/ - swift - crypto: Add config option to support rolling upgrades (ABANDONED) - 3 patch sets
21:19:26 <timburke> i think the UpgradeImpact has reasonably useful/actionable information
21:20:55 <timburke> i've still got some concerns about an effective checkpoint release for switching from py2 to py3, and what that'll mean once the fix gets backported to train and ussuri
21:21:12 <timburke> but maybe those fears are misplaced?
21:21:29 <clayg> timburke: this one is definitely on my list, i'm onboard with the direction - if anyone wants to merge it for me it'd be a big help!
21:21:55 <timburke> i was just about to ask who could review, so i could get the backport ball rolling :-)
21:22:02 <clayg> timburke: oh right; you left a comment about that?
21:23:08 <clayg> nm, I don't see it - so I don't think I understand your concern
21:23:21 <clayg> do you want to add it to the review; or want to see if I notice the same thing when reviewing?
21:24:44 <timburke> just in general, i'm worried about the added complexity for operators in needing to upgrade swift while still on py2, then upgrade to py3. if someone's following ubuntu's repos for example, i'm not sure they're gonna have a good time
21:24:49 <clayg> again; if anyone else besides timburke has this loaded in their head (seongsoocho?) i'd love some help shaking out anything else needed
21:25:39 <timburke> we might want to divvy up upgrade testing, too, to make sure we're covering all the cases we care about
21:25:55 <clayg> timburke: so the issue is for py3 rollings to go smoothly py3 by default writes v2 broken? so py2 reads will fail - unless everyone is on v3 before upgrading.
21:27:12 <clayg> timburke: i only really have bandwidth enough to functionally test for py2 upgrade case - if someone wants to do more than that BEFORE we +A I'd be happy to hold off
21:27:15 <timburke> yup. and there's no way (currently; i suppose we could add yet another option...) to configure new py3 to write something that old py2 knows how to read
21:27:36 <clayg> do we have a community deployment that's waiting on this fix? or everyone is happily in the python version island with v2 and working fine?
21:28:34 <timburke> ormandj first reported the bug; i think he's running on a patched ussuri to pick up the fix
21:29:06 <clayg> timburke: ok so realistically you can't upgrade from a current version of swift running py2 to a newer version of swift running py3 even with this patch... so... what are we doing? 🤣
21:29:38 <timburke> chasing my tail, i think
21:30:16 <clayg> for US writing v3 is a blocker to migration onto py3 - so we want this patch, the rolling-upgrade behavior is seamless and all we have to do is flip to v3 asap so when we finally upgrade to py3 it's a non issue
21:31:08 <clayg> and people already on py3 are happy
21:31:23 <clayg> they also have a similar; easy upgrade; switch config afterwards story
21:32:11 <clayg> ok, so I think we'll just have to say "When switching from Python 2 to Python 3, first upgrade Swift while on Python 2, then upgrade to Python 3." is reasonable and also the best we can do.
21:32:29 <timburke> cool -- so i probably *was* overly concerned and we just need to review it, land it, and backport it
21:33:22 <timburke> i think i've got what i need then. thanks for offering to review, clayg! again, if anyone else has a chance to review/test it, that'd be much appreciated
21:33:45 <timburke> #topic shrinking and overlapping shard ranges
21:34:25 <clayg> thanks to the awesome mattoliverau for spotting bugs (i haven't written the unittest yet; pls help)
21:34:29 <timburke> so when i originally added this, i wanted to make sure we had some consensus about which path forward we ought to take
21:34:56 <timburke> but i think it's settled pretty well onto clayg's p 741721
21:34:56 <patchbot> https://review.opendev.org/#/c/741721/ - swift - add swift-manage-shard-ranges shink command - 4 patch sets
21:35:32 <timburke> and i'm going to continue working with getting a probe test to exercise it in p 744256
21:35:33 <patchbot> https://review.opendev.org/#/c/744256/ - swift - sharding: probe test to exercise manual shrinking - 1 patch set
21:35:56 <timburke> (mostly extracted from p 738149)
21:35:57 <patchbot> https://review.opendev.org/#/c/738149/ - swift - Have shrinking and sharded shards save all ranges ... - 5 patch sets
21:36:32 <mattoliverau> thanks guys for looking into it, and breaking things, be it shrinking in general or related to the manual patch, it's good to start exercising the shrinking code.
21:37:09 <clayg> 👍 gunna be so great
21:37:14 <timburke> i'm not actually sure we need much more discussion, just more review bandwidth :-D
21:37:33 <clayg> REVEWAZ!!!
21:38:02 <clayg> mattoliverau: did revewaz; mattoliverau is awesome; be like mattoliverau
21:38:15 <timburke> so, just one more last-minute topic
21:38:16 <timburke> #topic libec upgrades
21:38:17 <clayg> also mattoliverau - do more reviews 😁
21:38:27 <mattoliverau> I might try and progress the shard range audit stuff to fix gaps, split brains and overlaps so we can more easily recover from these kinda issues. (even if it starts off manual).
21:38:54 <mattoliverau> lol, I'll find more reviewing time :)
21:38:56 <timburke> zaitcev's taken a look already, but i think it's probably worth a second set of eyes
21:39:10 <timburke> p 738959 and p 739164
21:39:11 <patchbot> https://review.opendev.org/#/c/738959/ - liberasurecode - Be willing to write fragments with legacy crc - 2 patch sets
21:39:13 <patchbot> https://review.opendev.org/#/c/739164/ - swift - ec: Add an option to write fragments with legacy crc - 1 patch set
21:39:13 <clayg> mattoliverau: I love that - if we can get where we automatically and confidently fix overlapping shard ranges it's gunna be WAY easier to start electing leaders to automatically shard 😍
21:39:50 <mattoliverau> +100
21:40:19 <timburke> heck, we might even get to the point where we say, yeah, replica 0's probably good enough :-)
21:40:39 <clayg> i'm still scared of this patch; kota_ has always been stronger at libec than I've been - for me ideally this can merge without me thinking about it anymore
21:40:54 <clayg> it's supposed to be a similar "upgrade nothing changes; then flip the switch" change?
21:41:13 <kota_> i may take time to review on ec patch.
21:41:15 <clayg> only caveat is you also have to repackage a new libec/pyeclib
21:41:26 <kota_> is that ok if I will do that in this week?
21:41:43 <timburke> kota_, that'd be great, thanks!
21:41:44 <clayg> kota_: I don't have any timeline
21:42:02 <kota_> ok. will try it.
21:42:03 <timburke> clayg, no pyeclib changes necessary, given the current state of the patch
21:42:07 <clayg> agree; if we have update/progress to show by next meeting that's a huge win
21:42:26 <clayg> timburke: i'm not sure WE can release a new libec w/o a new pyeclib - but that could easily just be our fault 🤷
21:42:47 <clayg> and I could be wrong - i haven't looked at the dependency chain
21:43:03 <clayg> they may already be independent/orthogonal
21:43:10 <timburke> there is a bit of a detection problem for upgrades: you *may* need to tell new swift to write legacy fragments... or that may be the exact wrong thing to do :-(
21:43:39 <clayg> the default isn't rolling-upgrade compatible?
21:44:15 <timburke> ... it is so long as you aren't touching libec...
21:44:33 * clayg tries again to not think about this anymore
21:45:37 <timburke> the trouble is, we don't know what the upgrade-compatible behavior will be. there's a chance (depending on library load order) that you were already writing zlib crcs
21:45:50 <clayg> oh RIGHT! 🤮
21:46:30 <timburke> fwiw, i wrote up p 744078 to try to help operators figure out what the right thing to do is (and to try out the audit-watchers interface)
21:46:30 <clayg> timburke: you go LOOKING for these bugs 😡 you have a nose for trouble
21:46:30 <patchbot> https://review.opendev.org/#/c/744078/ - swift - watchers: Add EC stat gatherer - 1 patch set
21:47:03 <clayg> except we don't have audit watchers 🤷♂️
21:48:14 <timburke> well, it's all based on the tool i added to https://bugs.launchpad.net/swift/+bug/1886088, so you could still just use that :P
21:48:15 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:49:08 <clayg> 👍 and it sounds like "figuring out which version you're on" needs to happen before you upgrade so you can set the default for the new swift version
21:49:21 <timburke> yup
21:49:28 <timburke> all right, those are the major things i wanted to cover
21:49:33 <timburke> #topic open discussion
21:49:48 <timburke> what else should we discuss?
21:52:05 <clayg> FYI: we're going to push waterfall-ec into the pipeline so we can do more testing in staging and production
21:52:17 <clayg> thanks for everyone's help with the design; i'm happy with where it ended up
21:52:37 <clayg> timburke: did a bunch of cleanup I was able to squash in and mattoliverau found a few more splinters
21:52:52 <timburke> it's gonna be so great
21:52:54 <clayg> but ultimately I think the results look good and the code is going to be fine
21:53:11 <mattoliverau> thanks for all your hard work on it clayg!
21:53:41 <clayg> we'll probably start to push to get some of it landed and I think that's also going to be fine - I hope folks try it out after their next upgrade!
21:53:54 <clayg> timburke: do you have the socket/accept/worker patch handy?
21:54:02 <timburke> not sure if anyone else has seen this, but we noticed some lumpy request distribution between proxy-server workers -- resolution seems to be to bind a listen socket per-worker and let the kernel do a better job distributing work
21:54:09 <timburke> i was just about to mention it!
21:54:13 <clayg> 😍
21:54:22 <timburke> #link https://review.opendev.org/#/c/745603/
21:54:22 <patchbot> patch 745603 - swift - Bind a new socket per-worker - 3 patch sets
21:54:42 <rledisez> timburke: yes, we observed that. it's especially important with EC as the process gets CPU bound
21:54:50 <clayg> ^^^ so much this
21:56:29 <timburke> i still haven't gotten around to testing it not-in-my-dev-VM, though -- debating about applying it on just one machine in prod to see that the distribution gets fixed
21:57:59 <clayg> 🤔
21:58:45 <timburke> all right, we're about out of time
21:58:55 <timburke> thank you all for coming, and thank you for working on swift!
21:58:55 <clayg> 👋
21:59:02 <timburke> #endmeeting
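The per-worker listen socket idea discussed at 21:54 can be illustrated with the Linux `SO_REUSEPORT` socket option. The following is a minimal, hypothetical sketch, not the code from patch 745603 (the meeting does not say which mechanism that patch uses). It assumes Linux 3.9 or later, where every socket that sets `SO_REUSEPORT` before binding to the same address and port joins one group and the kernel spreads new connections across that group. The helper name `bind_worker_sockets` is invented for illustration, and for simplicity it opens all sockets in one process, whereas a real server would have each forked worker open its own.

```python
import socket


def bind_worker_sockets(num_workers, host="127.0.0.1", port=0):
    """Hypothetical helper: open one listening socket per worker, all bound
    to the same (host, port), so the kernel can balance new connections
    across them (assumes Linux SO_REUSEPORT semantics)."""
    sockets = []
    for _ in range(num_workers):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # SO_REUSEPORT must be set on every socket *before* bind()
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(128)
        # If an ephemeral port (0) was requested, reuse whatever port the
        # first socket received so that all workers share the same port.
        port = sock.getsockname()[1]
        sockets.append(sock)
    return sockets
```

With a single shared listen socket, every worker waits in `accept()` on the same socket and which one picks up a connection depends on wakeup timing; with one `SO_REUSEPORT` socket per worker, the kernel assigns each incoming connection to one socket in the group (by hashing the connection's address tuple), which matches the "let the kernel do a better job distributing work" idea from the log.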