#openstack-meeting log

21:03:18 <timburke> #startmeeting swift
21:03:20 <openstack> Meeting started Wed Jul 29 21:03:18 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:03:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:03:23 <openstack> The meeting name has been set to 'swift'
21:03:25 <timburke> who's here for the swift meeting?
21:03:31 <mattoliverau> o/
21:03:44 <seongsoocho> o/
21:04:36 <kota_> o/
21:04:40 <rledisez> hi o/
21:05:38 <timburke> clayg, zaitcev, tdasilva?
21:05:43 <clayg> i'm here!
21:05:52 <clayg> thanks
21:06:07 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:06:21 <timburke> first, a couple announcements
21:06:25 <timburke> #topic ptg
21:06:28 <zaitcev> It's a storm here but I'm here thus far.
21:06:48 <timburke> the next ptg will be all-online, like the last one
21:06:59 <timburke> and there's a poll up for *when* exactly it should be
21:07:02 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016098.html
21:07:15 <clayg> oh, thanks
21:07:50 <zaitcev> Maybe we ought to run our own hackathon in person. Of course, masks, hand sanitizer all around.
21:08:04 <clayg> small pods
21:08:31 <timburke> zaitcev, that is *so* tempting for me
21:08:54 <mattoliverau> Tho that only works for peeps in the same country
21:09:28 <timburke> and even then, it'll be harder than usual to get employer buy-in
21:10:03 <timburke> so i think the plan for now should be: do a virtual ptg again
21:10:46 <timburke> though i fully acknowledge that there's something lost in doing it that way
21:11:14 <timburke> #topic London OpenInfra virtual meetup
21:11:44 <timburke> there's a meetup thing going on tomorrow! seems like it might be worth checking out if anyone has time
21:11:46 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016109.html
21:13:00 <timburke> that's all i've got for announcements... on to more swifty things
21:13:08 <timburke> #topic py3 crypto bug
21:13:30 <timburke> #link https://launchpad.net/bugs/1888037
21:13:33 <openstack> Launchpad bug 1888037 in OpenStack Object Storage (swift) "Encryption writes different paths for key_id on py2 and py3" [High,In progress]
21:14:07 <timburke> currently, there's an availability issue when upgrading from swift-on-py2 to swift-on-py3
21:14:45 <zaitcev> \u00f0\u009f\u008c\u00b4 is a WSGI string, isn't it
21:14:56 <timburke> specifically, any data that was encrypted for a path that included any non-ascii characters will cause 500s
21:15:09 <timburke> zaitcev, yup :-(
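
(Aside, for readers unfamiliar with the term: a "WSGI string" on py3 is a str whose code points are really the bytes of the UTF-8 path, decoded as latin-1. The minimal sketch below reproduces the string zaitcev pasted; the helper name wsgi_to_native is hypothetical and is not taken from the swift codebase.)

```python
# Illustrative only: how a py3 "WSGI string" arises from a UTF-8 path.
palm = "\U0001F334"                      # one real code point, U+1F334
utf8_bytes = palm.encode("utf-8")        # the four bytes sent on the wire
wsgi_str = utf8_bytes.decode("latin-1")  # four code points, one per byte


def wsgi_to_native(s):
    """Hypothetical helper: undo the latin-1 decoding to get the real path."""
    return s.encode("latin-1").decode("utf-8")


print(repr(wsgi_str))
print(wsgi_to_native(wsgi_str) == palm)
```
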
21:15:32 <timburke> good news is, we've got a fix: https://review.opendev.org/#/c/742033/
21:15:32 <patchbot> patch 742033 - swift - py3: Work with proper native string paths in crypt... - 3 patch sets
21:16:51 <timburke> bad news is, it currently causes a rolling upgrade issue (because it bumps the version number stored in crypto metadata, so anything written by an upgraded proxy won't be able to be read by an old proxy)
21:17:00 <clayg> join the debate!  https://etherpad.opendev.org/p/crypto-meta-version-3
21:17:33 <timburke> we've done this once before... for https://launchpad.net/bugs/1813725 about a year and a half ago
21:17:34 <openstack> Launchpad bug 1813725 in OpenStack Object Storage (swift) "Object encryption key truncated sometimes when used with Barbican" [Medium,Confirmed]
21:17:45 <timburke> but that doesn't mean it was a good thing
21:18:03 <mattoliverau> wow, it's like py2 and py3 strings bite us again. Just when you think it's safe :P
21:19:47 <clayg> yeah the py2 works with py2 and py3 works with py3 thing in v2 is annoying
21:19:51 <timburke> i also did a patch to add an option to continue doing things the old way(s): https://review.opendev.org/#/c/742756/
21:19:52 <patchbot> patch 742756 - swift - crypto: Add config option to support rolling upgrades - 3 patch sets
21:20:02 <rledisez> is it not another argument for checkpoint release?
21:20:10 <clayg> but the problem of a py2 upgrading to py3 and expecting it in the py3-v2 format is still bad
21:20:18 <timburke> but i'm realizing that even that won't work well for a rolling py2->py3 upgrade
21:20:26 <clayg> rledisez: I don't think we have ever done anything we'd call a checkpoint release
21:21:03 <rledisez> clayg: my point is maybe we should. it would help us a lot when managing legacy code & co. we also know at some point we can drop old code
21:21:30 <rledisez> (when it's not about data written on disk, of course)
21:21:48 <clayg> i agree, having a checkpoint release process that's robust and available would be super useful
21:22:08 <clayg> I think we've mostly just done like a 4 year deprecation cycle or.. like "never" also works
21:23:00 <timburke> heh -- reminds me of https://review.opendev.org/#/c/736787/
21:23:00 <patchbot> patch 736787 - swift - Rip out pickle support in our memcached client - 1 patch set
21:24:31 <zaitcev> That's different. It only takes 1 reboot to make sure that old format is no more. But drives can be around for 10 years.
21:25:19 <timburke> i meant more the 'or.. like "never" also works' ;-)
21:25:32 <rledisez> zaitcev: in this situation, we could "force" the operator to upgrade to a version that supports both v2 and v3 format, before letting him move to a version that uses v3 by default
21:26:03 <clayg> timburke: 🤣
21:27:04 <clayg> we probably *could* drop "reading the pickle format from memcache" support - see!  eventually it'll seem obvious that we don't have to write v2!
21:28:00 <timburke> so, back to the topic at hand: how can we make sure a rolling upgrade is still successful? i guess, more config options that we hope operators never actually need to use?
21:28:07 <timburke> what should the defaults be?
21:28:14 <clayg> rledisez: you betcha!  a legit checkpoint!  there's just the code needed to support the mechanics and the process needed to make sure all the clusters we care about upgrade like we want
21:29:37 <rledisez> timburke: default should be v2. The operator knows when he upgraded all the cluster and so decides to switch to v3 when he's ready. but we also have to assume that some won't do it and will get bitten one day when v3 becomes the default
21:29:41 <clayg> I'm 100% sure continuing to write in the current format is the correct thing to do for current clusters - py2 clusters could even skip the latin-1 shiz; py3 tho won't know if it has py2/v2 or py3/v2 so it'll have to do the extra work
21:30:17 <clayg> rledisez: as long as they have code that can READ v3 we can start writing it
21:30:27 <clayg> rledisez: so... like 2 years or whatever... it'll be fine
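
(Aside: the "keep writing v2 until everything can READ v3" rule can be sketched as a small check over a mixed fleet. This is a simplified illustration, not swift code: the proxy names and the unreadable_versions function are hypothetical, and it treats v2 as a single format even though the py2 and py3 flavors of v2 differ.)

```python
def unreadable_versions(written_versions, proxy_readable):
    """Return crypto-meta versions on disk that at least one proxy cannot read."""
    return sorted(
        v for v in written_versions
        if any(v not in readable for readable in proxy_readable.values())
    )


# Hypothetical fleet in the middle of a rolling upgrade.
mixed_fleet = {"old-proxy": {1, 2}, "new-proxy": {1, 2, 3}}

# New proxies keep the v2 default: every object stays readable everywhere.
safe = unreadable_versions({1, 2}, mixed_fleet)

# New proxies write v3 too early: old proxies fail on those objects.
risky = unreadable_versions({1, 2, 3}, mixed_fleet)

# Once every proxy is upgraded, the operator can switch to writing v3.
upgraded_fleet = {"new-proxy-a": {1, 2, 3}, "new-proxy-b": {1, 2, 3}}
after_upgrade = unreadable_versions({1, 2, 3}, upgraded_fleet)

print(safe, risky, after_upgrade)
```
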
21:30:38 <timburke> rledisez, *which* v2? the v2 that py2 would've written down (which is essentially identical to v3), or the v2 that py3 would've written down?
21:30:47 <clayg> the real suck is if we have a v4 by then - then we have to think about upgrading from something that can't read v3 😞
21:31:42 <clayg> timburke: v2 will have to continue to be different on py2 vs py3 until after they've upgraded AFAICT
21:31:44 <timburke> and actually -- i kinda feel like it's worth thinking about whether the default *should be* v1
21:32:31 <clayg> timburke: I think v1 had a BUG tho - loss of information?  and we've made it past the upgrade already?  why do we want to go back?
21:32:32 <timburke> clayg, so to do a py2->py3 transition, you need to go old-swift-on-py2 -> new-swift-on-py2 -> new-swift-on-py3?
21:33:27 <clayg> i think old-swift-on-py2 to new-swift-on-py3 should be fine as long as they both know how to read the right formats (e.g. current swift py3 can't read old-swift-py2 format)
21:34:25 <clayg> but once we've cut new code such that new-swift-py3 can read old-swift-py2 format (even if that old-swift-py2 is still writing v2); we should be fine?
21:34:26 <timburke> clayg, you and i have, certainly. how many clusters are still out there from rocky or earlier?
21:35:28 <clayg> ok, so there's clusters still writing v1 that haven't upgraded to a swift that's now writing v2 so their yet-to-be-done rolling-upgrade WILL cry when old proxies read v2/v3 for the first time
21:35:41 <timburke> clayg, the way the patches are currently, new-swift-on-py3 won't be able to write a v2 that old-swift-on-py2 could read
21:36:00 <clayg> so this is sort of the "v4 requires code that knows how to read v3 to upgrade" problem... but earlier
21:37:03 <clayg> timburke: but a new-swift-on-py3 *COULD* write a *v3* that new-swift-on-py2 could read so maybe there's a "min swift version prior to upgrade to py3"
21:37:17 <clayg> unless you already have... in which case "thanks for helping us find all these bugs!"
21:38:39 <timburke> i've got this nagging feeling like that version's going to be ever-increasing until we drop support for py2...
21:38:53 <clayg> timburke: 🤗
21:39:17 <clayg> seongsoocho: rledisez: anyone else want to try and jump in?  Any questions about the bug report, the wip patch, or the etherpad
21:40:11 <rledisez> nah, i agree with last comment from timburke (so maybe there's a "min swift version prior to upgrade to py3")
21:41:33 <timburke> should i squash the two patches together, so there's a nice spot to write a fairly complete UpgradeImpact?
21:41:41 <clayg> we could try a "checkpoint process" that's mostly convention w/o code to enforce it
21:42:29 <clayg> but i'm not going to exercise it; cause as soon as we upgrade we'll turn on v3 so if the default changes later it won't affect us.
21:42:50 <seongsoocho> I'm still trying to reproduce this bug in my dev cluster .  :-(
21:43:25 <clayg> you have to write py2 crypto - then read it py3 to see the bug really
21:43:39 <timburke> seongsoocho, so you're in a fairly unique position (to my knowledge): your cluster's been py3 from the beginning, right?
21:44:45 <clayg> py2/v2 is unicode from utf8 - py3/v2 is unicode from latin-1
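
(Aside: a minimal sketch of why the two interpretations of the same path matter for encryption. It assumes, purely for illustration, that an object key is derived from the path with HMAC-SHA256; ROOT_SECRET, derive_key and the example path are hypothetical and not taken from the swift keymaster.)

```python
import hashlib
import hmac

ROOT_SECRET = b"not-a-real-secret"  # hypothetical root secret


def derive_key(path):
    """Hypothetical derivation: HMAC-SHA256 over the UTF-8 encoded path."""
    return hmac.new(ROOT_SECRET, path.encode("utf-8"), hashlib.sha256).digest()


native_path = "/a/c/\U0001F334"                            # unicode from utf8
wsgi_path = native_path.encode("utf-8").decode("latin-1")  # unicode from latin-1

key_from_native = derive_key(native_path)
key_from_wsgi = derive_key(wsgi_path)

# Same object, different path strings, so different keys: data encrypted
# under one key cannot be decrypted with the other.
print(key_from_native == key_from_wsgi)

# Converting the WSGI string back to a native string recovers the first key.
recovered = derive_key(wsgi_path.encode("latin-1").decode("utf-8"))
print(recovered == key_from_native)
```

For pure-ASCII paths the two strings are identical, which matches timburke's note that only paths with non-ascii characters are affected.
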
21:45:00 <seongsoocho> timburke:   I have 3 clusters and the 3rd cluster is py3 from the beginning.
21:45:18 <timburke> ah, got it. thanks; good to know
21:45:46 <timburke> do you run with encryption enabled?
21:45:47 <clayg> oh neat!  so you'll definitely want this fixed so you can upgrade the other clusters to py3!!!  💪
21:46:14 <clayg> oh... good question... why did I think seongsoocho's cluster was encrypted 🤔
21:47:23 <seongsoocho> yes .. actually my cluster doesn't use encryption. but now i'm building a new cluster (with object encryption support) with py3.
21:49:01 <seongsoocho> Rather than encrypting object on the server, I'm telling our customer to encrypt it on the client.
21:49:14 <clayg> seongsoocho: excellent recommendation
21:49:25 <timburke> always a good stance to take :-)
21:49:29 <seongsoocho> But, new cluster supports server-side object encryption..
21:52:14 <timburke> ok, i'm running out of steam on this. i'm still not sure what i should be coding
21:52:35 <clayg> v2 default so we can upgrade w/o having to push a config first!
21:52:53 <clayg> after we upgrade we'll turn on v3 and we don't have to think about this dumb problem for a little while 🤣
21:53:30 <timburke> and release note that you need to upgrade to swift 2.26.0 (or whatever) before switching from py2 to py3?
21:53:56 <timburke> i can do that. nobody else worried about the v1 upgrade issue?
21:55:04 <timburke> i guess not ;-)
21:55:10 <timburke> ok, last few minutes
21:55:14 <timburke> #topic open discussion
21:55:20 <clayg> upgrade *and* be writing v3 I think - unless new-py2 will be able to read the old-py3-default 🤔
21:55:22 <timburke> anything else to bring up?
21:55:38 <clayg> I wanted to tell people I'm trying to simply the config for concurrent gets with ec!
21:55:56 <clayg> https://review.opendev.org/#/c/737096/
21:55:56 <patchbot> patch 737096 - swift - Add concurrent_ec_extra_requests - 4 patch sets
21:56:17 <seongsoocho> https://bugs.launchpad.net/swift/+bug/1889386   <- Does anyone know about this bug(??)??
21:56:18 <openstack> Launchpad bug 1889386 in OpenStack Object Storage (swift) "[ s3 api ] The CreationDate of listing bucket always return '2009-02-03T16:45:09.000Z'" [Undecided,New]
21:56:27 <zaitcev> to simply? Was it "to simplify" perchance?
21:56:28 <clayg> now that it doesn't have all that "per replica" crap I'm thinking about squishing the concurrent_ec_extra_requests option into p 711342
21:56:28 <patchbot> https://review.opendev.org/#/c/711342/ - swift - Add concurrent_gets to EC GET requests - 14 patch sets
21:56:36 <timburke> clayg, yeah, new-py2 should be fine to deal with old-py3
21:56:57 <clayg> timburke: well, people should still write v3 😁
21:57:09 <timburke> oh, for sure
21:57:59 <clayg> zaitcev: simplify!  yes, thank you!
21:58:00 <timburke> seongsoocho, yeah... that's a sad confusing mess, isn't it...
21:58:22 <clayg> do the account db's tables not have any kind of date?  modified or something?
21:58:37 <seongsoocho> timburke:  yes... I have no idea how to fix this problem.
21:58:56 <timburke> i think we could address it -- add a new key to the json responses the account server sends back, have s3api look for that
21:59:06 <clayg> seongsoocho: i'd be onboard with a schema update that includes pushing new info up to the listings from the container-updater
21:59:26 <clayg> anything that makes us more like s3 is helpful!  (it's a big patch tho, I don't know how to cheat)
21:59:38 <timburke> no one's felt enough pressure yet to get it fixed. but i'd be happy to review any patches for it!
21:59:46 <clayg> ❤️
22:00:01 <clayg> timburke: is it a datamodel change too - or just api?
22:00:32 <timburke> i *think* mostly just api. pretty sure we've already got the create date in the db table, though i ought to double-check
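
(Aside: a rough sketch of the API-side idea timburke describes, where the account listing carries a creation timestamp and s3api uses it when present. The key name "created_at" and the function bucket_creation_date are hypothetical; the placeholder date is the one quoted in the bug title. Sub-second precision is dropped in this sketch.)

```python
from datetime import datetime, timezone

# The fixed value reported in bug 1889386.
PLACEHOLDER = "2009-02-03T16:45:09.000Z"


def bucket_creation_date(listing_entry, key="created_at"):
    """Format a CreationDate from a listing entry, if the (hypothetical) key exists."""
    ts = listing_entry.get(key)
    if ts is None:
        # Older account servers would not send the key; keep current behavior.
        return PLACEHOLDER
    when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%S.000Z")


print(bucket_creation_date({"name": "bucket"}))
print(bucket_creation_date({"name": "bucket", "created_at": "0"}))
```
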
22:01:02 <clayg> zaitcev:  I think the concurrent_ec_extra_requests option will allow for most of the interesting things we might have tried with per-replica-timeouts and will be much easier to configure
22:01:10 <clayg> oh... shoot - that's time!
22:01:29 <timburke> i put it up there with ?partNumber=<N> support -- it shouldn't actually be too bad... mostly just a matter of prioritization
22:01:34 <timburke> so it is
22:01:44 <timburke> thank you all for coming, and thank you for working on swift!
22:01:49 <timburke> #endmeeting