21:00:13 <timburke> #startmeeting swift
21:00:14 <openstack> Meeting started Wed Jul 31 21:00:13 2019 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:17 <openstack> The meeting name has been set to 'swift'
21:00:21 <timburke> who's here for the swift meeting?
21:00:26 <kota_> o/
21:00:34 <mattoliverau> o/
21:00:35 <tdasilva> o/
21:01:33 <rledisez> o/
21:01:38 <clayg> ohai
21:01:55 <timburke> agenda hasn't changed too much
21:01:59 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:13 <timburke> #topic 404s from handoffs
21:02:48 <timburke> i mentioned https://review.opendev.org/#/c/672186/ last week and asked that people start to think about it and form some opinions
21:02:49 <patchbot> patch 672186 - swift - Ignore 404s from handoffs for objects when calcula... - 7 patch sets
21:03:09 <timburke> rledisez helpfully suggested that i make the new behavior configurable
21:03:30 <rledisez> *for a few versions, to help transition
21:03:56 <timburke> ...but as i tried to write up an explanation for why you'd enable the new option... i'm having some doubts...
21:04:54 <timburke> regardless, i think we've got a bit of time -- clayg pointed out that it really ought to have some unit tests to cover the part of the change that only affects replicated
21:05:06 <clayg> oh dear, what did I do :'(
21:05:11 <timburke> but i'll try to get that done sooner rather than later
21:05:38 <clayg> oh, right - I don't like that knob at all
21:06:07 <clayg> I don't see any comments on the review that explain the change, so I'm guessing there was a discussion in irc that I missed?
21:06:38 <timburke> rledisez, what do we really expect operators to *do* with the option? ok, we upgraded -- crap, there's a bunch more 503s... twiddle the knob and make them 404s?
21:06:43 <clayg> rledisez: am I correct that the primary concern is some graph that tracks 5XX errors going up under load/failure/rebalance?
21:07:09 <rledisez> I think it should basically say "the new behavior might break your clients because the error code changed. to keep the old behavior while you upgrade your clients, set foo = false. the old behavior will be removed in version 2.Y"
21:07:09 <rledisez> the point is not about the metrics. i just won't get my bonus this year ;)
21:07:10 <timburke> what remediation would allow them to turn it back *off*
21:08:16 <clayg> but I want rledisez to get his bonus!  he buys me beers sometimes!
21:08:45 <clayg> I don't think a client that was "correctly" handling a 404 with a retry would be surprised by needing to retry a 503
21:09:23 <clayg> maybe there's a more subtle status code change I'm not considering with proper reference - "might break clients" is a good reason *not* to change it at all, but I'm not sure I understand the risk?
21:09:35 <timburke> yeah... idk -- clients have to be able to handle 5xx, even now...
21:09:55 <rledisez> I know some internal users that retry on 404 because I taught them about eventual consistency. I'm not sure what will happen on 503
21:10:08 <clayg> 🤔
21:10:36 <rledisez> (not saying they are right to retry, but that's another discussion)
21:10:53 <timburke> rledisez, what sdk do they use? or is it pretty hand-rolled?
21:11:19 <rledisez> mostly homemade from what I know
21:11:40 <rledisez> languages are Go, Perl and Java
21:11:48 <timburke> yeah... that does make it tricky...
21:12:05 <notmyname> rledisez: but I think we all taught the clients incorrectly! a 404 should only maybe retry. most of the time, 404 should be the right answer if it's what's given. swift was doing the wrong thing by giving 404 instead of 503
21:13:04 <timburke> kota_, mattoliverau: have you had a chance to think about the issue?
21:13:04 <rledisez> notmyname: i totally agree with that
21:13:05 <notmyname> but if they have been taught to retry on 404 (and we always said retry on 5xx), and we change 404->503, aren't they even more likely to retry if there's a failure?
21:13:44 <kota_> hmm... interesting
21:13:47 <notmyname> so even if clients today have been retrying on 404, they will still work after the 404->503 change (because they better already be handling 5xx anyway)
21:13:55 <mattoliverau> I won't lie, I forgot to look
21:13:56 <clayg> well, we certainly could have been more specific when we want to say "i couldn't find your object, but also might be relevant I couldn't talk to anyone that might have been authoritative"
21:13:57 <timburke> fwiw, i'd written up https://bugs.launchpad.net/swift/+bug/1837819 to try to describe the issue
21:13:58 <openstack> Launchpad bug 1837819 in OpenStack Object Storage (swift) "Overloaded object primaries cause 404s on GET" [Medium,New] - Assigned to Tim Burke (1-tim-z)
21:14:23 <clayg> we want the client to retry, and a 5XX to me is a more RESTful way to indicate to *most* clients what they should do next...
21:14:35 <rledisez> as I said, I don't think they should retry, but the most common case I see is people doing an upload, and checking it's there right after (even if they got a 2xx). I tried to explain "trust swift", but no, they don't trust
21:15:16 <kota_> IMHO, we could retry in both cases. if the user knows swift's error responses, absolutely retry on 503; not sure about 404.
21:15:40 <timburke> rledisez, we recently had a customer doing the same thing, but looking for the object in *listings* 🤦
21:15:43 <notmyname> rledisez: yeah. unfortunately I haven't been able to figure out how to fix users yet ;-)
21:15:49 <kota_> imo, it's not 5xx, just 503
21:16:21 <kota_> because we should not retry on 500 Internal Server Error, that wouldn't be fixed in the near future.
21:16:49 <mattoliverau> The users surely just need a sleep(10) or setting :p
21:16:59 <mattoliverau> *Something
21:17:04 <rledisez> maybe i'm just too cautious, because I don't plan to enable that flag in my clusters. i'll do a proper communication before upgrading. but i'm thinking of somebody who would want to rollback quickly
21:17:47 <clayg> rledisez: yeah... if someone was already running on the edge and they didn't get enough communication before the upgrade they might be really confused/nervous about what the new status code is telling them
21:17:58 <clayg> "everything was working FINE!!!" - yeah... no... it wasn't.
21:18:59 <timburke> this is highlighting for me that this probably ought to have an UpgradeImpact regardless of whether we keep the config option
21:19:18 <kota_> sounds reasonable
21:19:59 <rledisez> agree
21:20:12 <clayg> honestly this change shouldn't even be as controversial as p 667235 (in that case the proxy could really be doing an inline retry)
21:20:13 <patchbot> https://review.opendev.org/#/c/667235/ - swift - Don't handle object without container (MERGED) - 1 patch set
21:21:14 <clayg> well, I don't want the config option - but won't -2 it or anything w/o it - but it might be the ONLY CONFIG OPTION EVAR that I set a reminder to make sure we deprecate it in the next release 😉
21:21:23 <timburke> ...and *that* makes me wonder if maybe the guy that wrote the bug shouldn't have been the one clicking +A...
21:21:59 <timburke> (well, insofar as that guy was *me*)
21:22:12 <clayg> timburke: you're PTL you can +A whenever you want ;)
21:22:38 <kota_> lol
21:23:03 <notmyname> lol, not it!
21:23:30 <mattoliverau> Lol
21:24:22 <timburke> we've got some precedent for adding known-terrible-idea config options: https://github.com/openstack/swift/commit/94bac4a
21:24:56 <timburke> i'll keep thinking on it, but i'm kinda leaning toward clayg's position personally
21:25:06 <mattoliverau> lol
21:26:07 <timburke> let's keep moving
21:26:15 <timburke> #topic py3
21:26:19 <kota_> config opt_out/in might be terrible
21:26:26 <timburke> not much to report
21:26:29 <kota_> i'm imagine s3acl...
21:26:31 <mattoliverau> I need to look closer at it, and sorry I didn't. Like kota_ says, 503 should be retry. But it's the contract changing that is the impact. And one of our devs and ops wants an escape clause, for a release; I think I'm ok with that.
21:26:38 <kota_> imaging
21:26:42 <mattoliverau> anyway, move on :)
21:26:46 <kota_> ok
21:27:04 <timburke> ugh, yeah... s3_acl...
21:27:28 <timburke> https://review.opendev.org/#/c/672610/ landed! hopefully zaitcev's cluster will be happier now :-)
21:27:29 <patchbot> patch 672610 - swift - py3: fix non-ascii metadata handling in account-se... (MERGED) - 2 patch sets
21:27:37 <mattoliverau> \o/
21:28:11 <timburke> as did https://review.opendev.org/#/c/672803/ -- there may be a bit of a long tail of patches like that :-/
21:28:11 <patchbot> patch 672803 - swift - py3: Fix title-casing in HeaderKeyDict (MERGED) - 3 patch sets
21:30:16 <timburke> it'll be nice to get https://review.opendev.org/#/c/671333/ so i can run probe tests locally again :-P
21:30:17 <patchbot> patch 671333 - swift - py3: (mostly) port probe tests - 2 patch sets
21:30:32 <timburke> that's about it
21:30:41 <timburke> #topic lots of small files
21:31:08 <timburke> rledisez, kota_ i haven't seen too much lately on the branch (but that's ok)
21:31:21 <kota_> ya, sorry
21:31:41 <timburke> like i said, that's ok! no need to apologize
21:31:53 <clayg> tdasilva: is going to be all up in the losf pretty soon!
21:31:56 <rledisez> yeah. alecuyer is off so don't expect a lot from OVH for a few weeks
21:31:58 <timburke> i know people here at swiftstack have been getting increasingly interested -- i think tdasilva has been taking a look recently?
21:32:04 <kota_> nice, tdasilva!!!
21:32:52 <timburke> fwiw, i feel like this ought to be the next big item of work that we're all focused on
21:33:15 <kota_> +1
21:33:38 <kota_> oh, tdasilva has left.
21:33:46 <clayg> +2
21:34:19 <timburke> bah. i was just about to ask him if there was anything he'd like to bring up about it... oh well
21:34:46 <timburke> #topic sharding
21:35:11 <timburke> mattoliverau, i know you've got a few patches up now -- anything we ought to be doing besides reviewing them?
21:35:35 <mattoliverau> Nah just reviewing them and point out the obvious flaws and edge cases :)
21:35:46 <timburke> 👍
21:36:02 <timburke> #topic symlinks and versioning
21:36:23 <mattoliverau> The last patch is a complete POC where we send the ranges from the scanner via UPDATE to do a reverse rollback strategy. Not sure I like it, but was an idea I had.
21:36:31 <mattoliverau> that's all
21:36:40 <timburke> thanks
21:37:22 <clayg> on the hardlinks I was looking at some comments this morning, and adding some more tests for behaviors that we want to better specify - I think we're still not 100% clear on how hardlinks to manifests/symlinks should look
21:37:44 <clayg> I think timburke had the most experience/insight - so i'm hoping to get his feedback on some of the new tests I drafted
21:38:01 <timburke> i'll be sure to take a look :-)
21:38:44 <timburke> any other blockers for you, or places that you need more input?
21:39:07 <clayg> i also managed to get up the s3 versioning patch at the end of the chain, p 673682
21:39:08 <patchbot> https://review.opendev.org/#/c/673682/ - swift - s3api: Implement versioning status API - 1 patch set
21:39:37 <clayg> one thing that starts to shine through on that one is how much it just assumes it knows how versioning works and does what it needs to implement the aws api
21:40:11 <clayg> I think in a more perfect world we'd have looked at the s3 versioning features and added them to versioned writes - then s3api is just doing *translation* from aws apis to swift apis
21:40:46 <kota_> +1
21:40:50 <clayg> but since we don't have spellings for ... e.g. "copy version_id XYZ"  we just "do it" in s3api
21:41:37 <clayg> but I don't think moving forward with something that works really prevents us in any meaningful way from doing that work later (except that maybe we'd have less motivation to do so)
21:42:32 <clayg> OTOH, I'm not sure how much moving to symlink versioning is really going to throw off clients that sort of had to learn how stack & history versioning worked already so they could do things like "restore version X" or "delete version Y"
21:43:11 <clayg> had we had an API for that all along it would make less of a difference to clients when we decide to change the underlying implementation
21:43:59 <timburke> makes sense
21:44:05 <clayg> anyway, it is what it is... something I'll be a little more aware of as we flesh out more of the s3api matrix down the road...
21:44:35 <clayg> I don't think i'm blocked right now and I feel like i'm making progress
21:44:42 <clayg> all feedback is appreciated!
21:45:04 <timburke> 👍
21:45:17 <timburke> #topic shanghai
21:45:47 <timburke> i know i dropped it from the agenda, but i've been looking at the order in which i'll have to do things and i thought i ought to share
21:46:38 <timburke> keeping in mind that it's a bit US-centric, but may be more-or-less applicable for you, too mattoliverau, kota_, and rledisez
21:46:58 <kota_> something like VISA?
21:47:39 <mattoliverau> it's probably pretty similar I suspect. Ie get a letter, get visa, etc.
21:49:30 <timburke> looks like to get the visa (http://www.china-embassy.org/eng/visas/hrsq/#M, $140) i need the invitation letter (https://openstackfoundation.formstack.com/forms/visa_form_shanghai_summit), and to get *that* i need to register (https://app.eventxtra.link/registrations/6640a923-98d7-44c7-a623-1e2c9132b402, $161 after the contributor discount)
21:50:14 <timburke> as far as i can tell, airfare and hotel can be done at any point along there
21:50:30 <kota_> that registration with early bird will close around... 14 (maybe?) Aug.
21:50:59 <rledisez> kota_: I was looking for that information earlier. do you have a link?
21:51:06 <kota_> I'm not sure the contribution discount will keep the $161
21:51:22 <timburke> https://www.openstack.org/summit/shanghai-2019/ says "Summit registration is open - get your tickets before prices increase on August 14 at 11:59pm PT! "
21:51:31 <kota_> https://www.openstack.org/summit/shanghai-2019/
21:51:34 <kota_> same URL
21:51:38 <kota_> #link https://www.openstack.org/summit/shanghai-2019/
21:51:45 <rledisez> perfect. thx. how did i miss that :)
21:51:55 <kota_> no information about the standard price after early bird.
21:53:40 <timburke> that's about it. i did send something to the mailing list to call out our etherpad (http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008156.html) -- it was a bit behind when most projects did theirs
21:54:07 <kota_> 你好!
21:54:08 <timburke> we'll see who else puts their name on https://etherpad.openstack.org/p/swift-ptg-shanghai :-D
21:54:54 <timburke> that's all i've got
21:54:59 <timburke> #topic open discussion
21:55:09 <clayg> i gotta bounce, ya'll be good
21:55:14 <timburke> anyone have anything else to bring up in the last five minutes?
21:57:03 <timburke> all right. thank you all for coming, and thank you for working on swift!
21:57:08 <timburke> #endmeeting