21:00:40 <timburke> #startmeeting swift
21:00:41 <openstack> Meeting started Wed Aug 12 21:00:40 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:44 <openstack> The meeting name has been set to 'swift'
21:00:47 <timburke> who's here for the swift meeting?
21:00:54 <mattoliverau> o/
21:01:14 <clayg> o/
21:01:27 <kota_> hi
21:01:46 <rledisez> o/
21:02:46 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:03:04 <timburke> #topic summit and ptg
21:03:27 <timburke> looks like the dates are set for the ptg
21:03:28 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016424.html
21:03:41 <timburke> it will immediately follow the summit
21:04:06 <timburke> so, summit will be oct 19-23, then ptg oct 26-30
21:04:24 <clayg> it's all virtual like last time?  that seemed to work ok...
21:04:30 <timburke> registration's already open, and both virtual events are free
21:04:38 <timburke> #link https://www.eventbrite.com/e/open-infrastructure-summit-2020-tickets-96967218561
21:04:44 <timburke> #link https://www.eventbrite.com/e/project-teams-gathering-october-2020-tickets-116136313841
21:05:16 <kota_> good info. thx.
21:05:18 <timburke> i'll keep an eye out for what we need to do to sign up for space and such, and try to be more on it than last time
21:05:48 <clayg> timburke: or just let mattoliverau do it again - things worked great!?
21:05:53 <timburke> though mattoliverau did a great job getting everything organized!
21:06:49 <timburke> that's all i had by way of announcements; any questions or comments?
21:06:50 <mattoliverau> Lol
21:07:51 <timburke> all right
21:07:55 <clayg> timburke: great update!  especially the part about mattoliverau being awesome 👍
21:08:07 <timburke> #topic s3api subrequest logging
21:08:15 <timburke> clayg, i think you added this, yeah?
21:08:25 <clayg> oh no 😬
21:08:28 <clayg> did I add a patch?
21:08:35 <timburke> two of them!
21:08:46 <timburke> and now one's landed :-)
21:08:58 <clayg> why is zuul so mad at p 735221 😡
21:08:58 <patchbot> https://review.opendev.org/#/c/735221/ - swift - s3api: Use swift.backend_path to proxy-log s3api r... - 2 patch sets
21:09:27 <timburke> er, approved -- i'll keep poking at the gate
21:09:55 <clayg> so if we CAN get that one landed I'm pretty much fine with p 735220 - in fact I'd like to do some follow-on to update the example configs
21:09:55 <patchbot> https://review.opendev.org/#/c/735220/ - swift - proxy-logging: Be able to configure log_route - 3 patch sets
21:10:34 <clayg> rledisez: do you run with two proxy-logging middlewares in your pipeline?  we got SUPER confused having statsd metrics for client facing responses mixed in with subrequests
21:11:53 <clayg> we're splitting everything up right now - internal_client.conf is going to get its own metrics namespace too - i'll let you know how it works out; but we expect it to be pretty great
21:11:55 <rledisez> clayg: yes, we use the 2 proxy-logging instances
21:12:53 <rledisez> clayg: I thought statsd metrics were emitted by the left instance of proxy-logging, which is not in the path of subrequests
21:13:09 <rledisez> I guess I assumed wrong :)
21:13:10 <clayg> timburke: ok, I think i got what I need - the first one will merge, no one is going to have much input on "fixing" prefix if you want it...
21:13:30 <clayg> so if there's any discussion to have around "best practices" it should happen on the follow-on patch to update example configs
21:13:44 <timburke> they'll *both* emit statsd metrics
21:13:58 <clayg> rledisez: so we fixed something that made s3 requests parse a, c, o and NOW they emit metrics
21:14:10 <clayg> i mean they always both emitted... it's a disaster
21:14:24 <timburke> but you're probably already getting doubled-up metrics from things like SLO
21:14:36 <clayg> ^ 👍
21:14:43 <clayg> 😬
21:15:17 <clayg> like transfer stats... the extra response codes and GET timings don't quite look so weird...
21:15:31 <clayg> well - depending on what you think they mean
21:15:42 <clayg> cause it wasn't too hard for us to get confused - so we're gunna break it up!
21:17:52 <timburke> all right, sounds like we know what's happening with the two linked patches, and we'll expect more discussion to happen on a follow-up (and possibly at the ptg)
21:18:09 <timburke> #topic py3 encryption bug and upgrades
21:18:14 <timburke> #link https://review.opendev.org/#/c/742033/
21:18:15 <patchbot> patch 742033 - swift - py3: Work with proper native string paths in crypt... - 4 patch sets
21:18:52 <timburke> so that patch now has the config option from p 742756 squashed-in
21:18:53 <patchbot> https://review.opendev.org/#/c/742756/ - swift - crypto: Add config option to support rolling upgrades (ABANDONED) - 3 patch sets
21:19:26 <timburke> i think the UpgradeImpact has reasonably useful/actionable information
21:20:55 <timburke> i've still got some concerns about an effective checkpoint release for switching from py2 to py3, and what that'll mean once the fix gets backported to train and ussuri
21:21:12 <timburke> but maybe those fears are misplaced?
21:21:29 <clayg> timburke: this one is definitely on my list, i'm onboard with the direction - if anyone wants to merge it for me it'd be a big help!
21:21:55 <timburke> i was just about to ask who could review, so i could get the backport ball rolling :-)
21:22:02 <clayg> timburke: oh right; you left a comment about that?
21:23:08 <clayg> nm, I don't see it - so I don't think I understand your concern
21:23:21 <clayg> do you want to add it to the review; or want to see if I notice the same thing when reviewing?
21:24:44 <timburke> just in general, i'm worried about the added complexity for operators in needing to upgrade swift while still on py2, then upgrade to py3. if someone's following ubuntu's repos for example, i'm not sure they're gonna have a good time
21:24:49 <clayg> again; if anyone else besides timburke has this loaded in their head (seongsoocho?) i'd love some help shaking out anything else needed
21:25:39 <timburke> we might want to divvy up upgrade testing, too, to make sure we're covering all the cases we care about
21:25:55 <clayg> timburke: so the issue is for py3 rollings to go smoothly py3 by default writes v2 broken?  so py2 reads will fail - unless everyone is on v3 before upgrading.
21:27:12 <clayg> timburke: i only really have bandwidth enough to functionally test for py2 upgrade case - if someone wants to do more than that BEFORE we +A I'd be happy to hold off
21:27:15 <timburke> yup. and there's no way (currently; i suppose we could add yet another option...) to configure new py3 to write something that old py2 knows how to read
21:27:36 <clayg> do we have a community deployment that's waiting on this fix?  or everyone is happily in the python version island with v2 and working fine?
21:28:34 <timburke> ormandj first reported the bug; i think he's running on a patched ussuri to pick up the fix
21:29:06 <clayg> timburke: ok so realistically you can't upgrade from a current version of swift running py2 to a newer version of swift running py3 even with this patch... so... what are we doing?  🤣
21:29:38 <timburke> chasing my tail, i think
21:30:16 <clayg> for US writing v3 is a blocker to migration onto py3 - so we want this patch, the rolling-upgrade behavior is seamless and all we have to do is flip to v3 asap so when we finally upgrade to py3 it's a non issue
21:31:08 <clayg> and people already on py3 are happy
21:31:23 <clayg> they also have a similar; easy upgrade; switch config afterwards story
21:32:11 <clayg> ok, so I think we'll just have to say "When switching from Python 2 to Python 3, first upgrade Swift while on Python 2, then upgrade to Python 3." is reasonable and also the best we can do.
21:32:29 <timburke> cool -- so i probably *was* overly concerned and we just need to review it, land it, and backport it
21:33:22 <timburke> i think i've got what i need then. thanks for offering to review, clayg! again, if anyone else has a chance to review/test it, that'd be much appreciated
21:33:45 <timburke> #topic shrinking and overlapping shard ranges
21:34:25 <clayg> thanks to the awesome mattoliverau for spotting bugs (i haven't written the unittest yet; pls help)
21:34:29 <timburke> so when i originally added this, i wanted to make sure we had some consensus about which path forward we ought to take
21:34:56 <timburke> but i think it's settled pretty well onto clayg's p 741721
21:34:56 <patchbot> https://review.opendev.org/#/c/741721/ - swift - add swift-manage-shard-ranges shink command - 4 patch sets
21:35:32 <timburke> and i'm going to continue working with getting a probe test to exercise it in p 744256
21:35:33 <patchbot> https://review.opendev.org/#/c/744256/ - swift - sharding: probe test to exercise manual shrinking - 1 patch set
21:35:56 <timburke> (mostly extracted from p 738149)
21:35:57 <patchbot> https://review.opendev.org/#/c/738149/ - swift - Have shrinking and sharded shards save all ranges ... - 5 patch sets
21:36:32 <mattoliverau> thanks guys for looking into it, and breaking things, be it shrinking in general or related to the manual patch, it's good to start exercising the shrinking code.
21:37:09 <clayg> 👍 gunna be so great
21:37:14 <timburke> i'm not actually sure we need much more discussion, just more review bandwidth :-D
21:37:33 <clayg> REVEWAZ!!!
21:38:02 <clayg> mattoliverau: did revewaz; mattoliverau is awesome; be like mattoliverau
21:38:15 <timburke> so, just one more last-minute topic
21:38:16 <timburke> #topic libec upgrades
21:38:17 <clayg> also mattoliverau - do more reviews 😁
21:38:27 <mattoliverau> I might try and progress the shard range audit stuff to fix gaps, split brains and overlaps so we can more easily recover from these kinda issues. (even if it starts off manual).
21:38:54 <mattoliverau> lol, I'll find more reviewing time :)
21:38:56 <timburke> zaitcev's taken a look already, but i think it's probably worth a second set of eyes
21:39:10 <timburke> p 738959 and p 739164
21:39:11 <patchbot> https://review.opendev.org/#/c/738959/ - liberasurecode - Be willing to write fragments with legacy crc - 2 patch sets
21:39:13 <patchbot> https://review.opendev.org/#/c/739164/ - swift - ec: Add an option to write fragments with legacy crc - 1 patch set
21:39:13 <clayg> mattoliverau: I love that - if we can get where we automatically and confidently fix overlapping shard ranges it's gunna be WAY easier to start electing leaders to automatically shard 😍
21:39:50 <mattoliverau> +100
21:40:19 <timburke> heck, we might even get to the point where we say, yeah, replica 0's probably good enough :-)
21:40:39 <clayg> i'm still scared of this patch; kota_ has always been stronger at libec than I've been - for me ideally this can merge without me thinking about it anymore
21:40:54 <clayg> it's supposed to be a similar "upgrade nothing changes; then flip the switch" change?
21:41:13 <kota_> i may take time to review on ec patch.
21:41:15 <clayg> only caveat is you also have to repackage a new libec/pyeclib
21:41:26 <kota_> is it ok if I do that this week?
21:41:43 <timburke> kota_, that'd be great, thanks!
21:41:44 <clayg> kota_: I don't have any timeline
21:42:02 <kota_> ok. will try it.
21:42:03 <timburke> clayg, no pyeclib changes necessary, given the current state of the patch
21:42:07 <clayg> agree; if we have update/progress to show by next meeting that's a huge win
21:42:26 <clayg> timburke: i'm not sure WE can release a new libec w/o a new pyeclib - but that could easily just be our fault 🤷
21:42:47 <clayg> and I could be wrong - i haven't looked at the dependency chain
21:43:03 <clayg> they may already be independent/orthogonal
21:43:10 <timburke> there is a bit of a detection problem for upgrades: you *may* need to tell new swift to write legacy fragments... or that may be the exact wrong thing to do :-(
21:43:39 <clayg> the default isn't rolling-upgrade compatible?
21:44:15 <timburke> ... it is so long as you aren't touching libec...
21:44:33 * clayg tries again to not think about this anymore
21:45:37 <timburke> the trouble is, we don't know what the upgrade-compatible behavior will be. there's a chance (depending on library load order) that you were already writing zlib crcs
21:45:50 <clayg> oh RIGHT!  🤮
21:46:30 <timburke> fwiw, i wrote up p 744078 to try to help operators figure out what the right thing to do is (and to try out the audit-watchers interface)
21:46:30 <clayg> timburke: you go LOOKING for these bugs 😡 you have a nose for trouble
21:46:30 <patchbot> https://review.opendev.org/#/c/744078/ - swift - watchers: Add EC stat gatherer - 1 patch set
21:47:03 <clayg> except we don't have audit watchers 🤷‍♂️
21:48:14 <timburke> well, it's all based on the tool i added to https://bugs.launchpad.net/swift/+bug/1886088, so you could still just use that :P
21:48:15 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:49:08 <clayg> 👍 and it sounds like "figuring out which version you're on" needs to happen before you upgrade so you can set the default for the new swift version
21:49:21 <timburke> yup
21:49:28 <timburke> all right, those are the major things i wanted to cover
21:49:33 <timburke> #topic open discussion
21:49:48 <timburke> what else should we discuss?
21:52:05 <clayg> FYI: we're going to push waterfall-ec into the pipeline so we can do more testing in staging and production
21:52:17 <clayg> thanks for everyone's help with the design; i'm happy with where it ended up
21:52:37 <clayg> timburke: did a bunch of cleanup I was able to squash in and mattoliverau found a few more splinters
21:52:52 <timburke> it's gonna be so great
21:52:54 <clayg> but ultimately I think the results look good and the code is going to be fine
21:53:11 <mattoliverau> thanks for all your hard work on it clayg!
21:53:41 <clayg> we'll probably start to push to get some of it landed and I think that's also going to be fine - I hope folks try it out after their next upgrade!
21:53:54 <clayg> timburke: do you have the socket/accept/worker patch handy?
21:54:02 <timburke> not sure if anyone else has seen this, but we noticed some lumpy request distribution between proxy-server workers -- resolution seems to be to bind a listen socket per-worker and let the kernel do a better job distributing work
21:54:09 <timburke> i was just about to mention it!
21:54:13 <clayg> 😍
21:54:22 <timburke> #link https://review.opendev.org/#/c/745603/
21:54:22 <patchbot> patch 745603 - swift - Bind a new socket per-worker - 3 patch sets
21:54:42 <rledisez> timburke: yes, we observed that. it's especially important with EC as the process gets CPU bound
21:54:50 <clayg> ^^^ so much this
21:56:29 <timburke> i still haven't gotten around to testing it not-in-my-dev-VM, though -- debating about applying it on just one machine in prod to see that the distribution gets fixed
21:57:59 <clayg> 🤔
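A minimal sketch of the per-worker listen socket idea discussed above, not the code from patch 745603. It assumes a Linux host where `SO_REUSEPORT` is available, and the helper name `bind_worker_socket` is hypothetical. Each worker binds its own socket to the same address, so the kernel can spread incoming connections across the workers rather than having them all accept from one shared socket.

```python
import socket


def bind_worker_socket(host, port, backlog=128):
    """Create a listening TCP socket that can share (host, port) with
    other sockets that also set SO_REUSEPORT.

    Hypothetical helper for illustration; assumes the platform exposes
    socket.SO_REUSEPORT (Linux does).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick rebinding after a restart.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Let several sockets bind the same address; the kernel then
    # distributes new connections among them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    # Stand-in for two workers: both listen on the same port.
    first = bind_worker_socket("127.0.0.1", 0)  # port 0: kernel picks one
    port = first.getsockname()[1]
    second = bind_worker_socket("127.0.0.1", port)
    print("two listen sockets share port", port)
    first.close()
    second.close()
```

In a real server each worker process would call the helper after it starts and then run its own accept loop on the socket it owns.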
21:58:45 <timburke> all right, we're about out of time
21:58:55 <timburke> thank you all for coming, and thank you for working on swift!
21:58:55 <clayg> 👋
21:59:02 <timburke> #endmeeting