21:00:13 <timburke> #startmeeting swift
21:00:14 <openstack> Meeting started Wed Jul 31 21:00:13 2019 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:17 <openstack> The meeting name has been set to 'swift'
21:00:21 <timburke> who's here for the swift meeting?
21:00:26 <kota_> o/
21:00:34 <mattoliverau> o/
21:00:35 <tdasilva> o/
21:01:33 <rledisez> o/
21:01:38 <clayg> ohai
21:01:55 <timburke> agenda hasn't changed too much
21:01:59 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:13 <timburke> #topic 404s from handoffs
21:02:48 <timburke> i mentioned https://review.opendev.org/#/c/672186/ last week and asked that people start to think about it and form some opinions
21:02:49 <patchbot> patch 672186 - swift - Ignore 404s from handoffs for objects when calcula... - 7 patch sets
21:03:09 <timburke> rledisez helpfully suggested that i make the new behavior configurable
21:03:30 <rledisez> *for few versions, to help transition
21:03:56 <timburke> ...but as i tried to write up an explanation for why you'd enable the new option... i'm having some doubts...
21:04:54 <timburke> regardless, i think we've got a bit of time -- clayg pointed out that it really ought to have some unit tests to cover the part of the change that only affects replicated
21:05:06 <clayg> oh dear, what did I do :'(
21:05:11 <timburke> but i'll try to get that done sooner rather than later
21:05:38 <clayg> oh, right - I don't like that knob at all
21:06:07 <clayg> I don't see any comments on the review that explain the change, so I'm guessing there was a discussion in irc that I missed?
21:06:38 <timburke> rledisez, what do we really expect operators to *do* with the option? ok, we upgraded -- crap, there's a bunch more 503s... twiddle the knob and make them 404s?
21:06:43 <clayg> rledisez: am I correct that the primary concern is some graph that tracks 5XX errors going up under load/failure/rebalance?
21:07:09 <rledisez> I think it should basically say "the new behavior might break your clients because the error code changed. to keep the old behavior while you upgrade your clients, set foo = false. the old behavior will be removed in version 2.Y"
21:07:09 <rledisez> the point is not about the metrics. i just won't get my bonus this year ;)
21:07:10 <timburke> what remediation would allow them to turn it back *off*
21:08:16 <clayg> but I want rledisez to get his bonus! he buys me beers sometimes!
21:08:45 <clayg> I don't think a client that was "correctly" handling a 404 with a retry would be surprised by needing to retry a 503
21:09:23 <clayg> maybe there's a more subtle status code change I'm not considering with proper reference - "might break clients" is a good reason *not* to change it at all, but I'm not sure I understand the risk?
21:09:35 <timburke> yeah... idk -- clients have to be able to handle 5xx, even now...
21:09:55 <rledisez> I know some internal users that retry on 404 because I taught them about eventual consistency. I'm not sure what will happen on 503
21:10:08 <clayg> 🤔
21:10:36 <rledisez> (not saying they are right to retry, but that's another discussion)
21:10:53 <timburke> rledisez, what sdk do they use? or is it pretty hand-rolled?
21:11:19 <rledisez> mostly homemade from what I know
21:11:40 <rledisez> languages are Go, Perl and Java
21:11:48 <timburke> yeah... that does make it tricky...
21:12:05 <notmyname> rledisez: but I think we all taught the clients incorrectly! a 404 should only maybe retry. most of the time, 404 should be the right answer if it's what's given. swift was doing the wrong thing by giving 404 instead of 503
21:13:04 <timburke> kota_, mattoliverau: have you had a chance to think about the issue?
21:13:04 <rledisez> notmyname: i totally agree with that
21:13:05 <notmyname> but if they have been taught to retry on 404 (and we always said retry on 5xx), and we change 404->503, aren't they even more likely to retry if there's a failure?
21:13:44 <kota_> hmm... interesting
21:13:47 <notmyname> so even if clients today have been retrying on 404, they will still work after the 404->503 change (because they better already be handling 5xx anyway)
21:13:55 <mattoliverau> I won't lie, I forgot to look
21:13:56 <clayg> well, we certainly could have been more specific when we want to say "i couldn't find your object, but also might be relevant I couldn't talk to anyone that might have been authoritative"
21:13:57 <timburke> fwiw, i'd written up https://bugs.launchpad.net/swift/+bug/1837819 to try to describe the issue
21:13:58 <openstack> Launchpad bug 1837819 in OpenStack Object Storage (swift) "Overloaded object primaries cause 404s on GET" [Medium,New] - Assigned to Tim Burke (1-tim-z)
21:14:23 <clayg> we want the client to retry, and a 5XX to me is a more RESTful way to indicate to *most* clients what they should do next...
21:14:35 <rledisez> as I said, I don't think they should retry, but the most common case I see is people doing an upload, and checking it's there right after (even if they got a 2xx). I tried to explain "trust swift", but no, they don't trust
21:15:16 <kota_> IMHO, we could retry on both cases. if the users know swift error statement absolutely retry with 503, not sure on 404.
21:15:40 <timburke> rledisez, we recently had a customer doing the same thing, but looking for the object in *listings* 🤦
21:15:43 <notmyname> rledisez: yeah. unfortunately I haven't been able to figure out how to fix users yet ;-)
21:15:49 <kota_> imo, it's not 5xx, just 503
21:16:21 <kota_> because we should not retry on 500 Internal Server Error, that wouldn't be fixed in the near future.
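A minimal sketch of the client-side retry policy described in the discussion above: retry on 503 with backoff, never retry on 500, and treat a 404 as final unless the caller opts in. The `fetch` callable, the parameter names, and the defaults are assumptions made for illustration; nothing here is a Swift or SDK API.

```python
import time

# Per the discussion: 503 is worth retrying, 500 is not.
RETRYABLE = {503}


def get_with_retry(fetch, max_attempts=3, base_delay=0.5,
                   retry_404=False, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status.

    fetch is a hypothetical zero-argument callable returning
    (status_code, body); it stands in for whatever HTTP client is in use.
    """
    retryable = RETRYABLE | ({404} if retry_404 else set())
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        # Stop on any final answer, or once the attempt budget is spent.
        if status not in retryable or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

A client that was taught to retry on 404 could pass `retry_404=True` during a transition; after the 404->503 change the default policy already covers the overloaded-primaries case.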
21:16:49 <mattoliverau> The users surely just need a sleep(10) or setting :p
21:16:59 <mattoliverau> *Something
21:17:04 <rledisez> maybe i'm just too cautious, because I don't think to enable that flag in my clusters. i'll do a proper communication before upgrading. but i'm thinking of somebody who would want to rollback quickly
21:17:47 <clayg> rledisez: yeah... if someone was already running on the edge and they didn't get enough communication before the upgrade they might be really confused/nervous about what the new status code is telling them
21:17:58 <clayg> "everything was working FINE!!!" - yeah... no... it wasn't.
21:18:59 <timburke> this is highlighting for me that this probably ought to have an UpgradeImpact regardless of whether we keep the config option
21:19:18 <kota_> sounds reasonable
21:19:59 <rledisez> agree
21:20:12 <clayg> honestly this change shouldn't even be as controversial as p 667235 (in that case the proxy could really be doing an inline retry)
21:20:13 <patchbot> https://review.opendev.org/#/c/667235/ - swift - Don't handle object without container (MERGED) - 1 patch set
21:21:14 <clayg> well, I don't want the config option - but won't -2 it or anything w/o it - but it might be the ONLY CONFIG OPTION EVAR that I set a reminder to make sure we deprecate it in the next release 😉
21:21:23 <timburke> ...and *that* makes me wonder if maybe the guy that wrote the bug shouldn't have been the one clicking +A...
21:21:59 <timburke> (well, insofar as that guy was *me*)
21:22:12 <clayg> timburke: you're PTL you can +A whenever you want ;)
21:22:38 <kota_> lol
21:23:03 <notmyname> lol, not it!
21:23:30 <mattoliverau> Lol
21:24:22 <timburke> we've got some precedent for adding known-terrible-idea config options: https://github.com/openstack/swift/commit/94bac4a
21:24:56 <timburke> i'll keep thinking on it, but i'm kinda leaning toward clayg's position personally
21:25:06 <mattoliverau> lol
21:26:07 <timburke> let's keep moving
21:26:15 <timburke> #topic py3
21:26:19 <kota_> config opt_out/in might be terrible
21:26:26 <timburke> not much to report
21:26:29 <kota_> i'm imagine s3acl...
21:26:31 <mattoliverau> I need to look closer at it, and sorry I didn't. Like kota_ says, 503 should be retry. But it's the contract changing that is the impact. And one of our dev and Ops wants an escape clause, for a release, I think I'm ok with that.
21:26:38 <kota_> imaging
21:26:42 <mattoliverau> anyway, move on :)
21:26:46 <kota_> ok
21:27:04 <timburke> ugh, yeah... s3_acl...
21:27:28 <timburke> https://review.opendev.org/#/c/672610/ landed! hopefully zaitcev's cluster will be happier now :-)
21:27:29 <patchbot> patch 672610 - swift - py3: fix non-ascii metadata handling in account-se... (MERGED) - 2 patch sets
21:27:37 <mattoliverau> \o/
21:28:11 <timburke> as did https://review.opendev.org/#/c/672803/ -- there may be a bit of a long tail of patches like that :-/
21:28:11 <patchbot> patch 672803 - swift - py3: Fix title-casing in HeaderKeyDict (MERGED) - 3 patch sets
21:30:16 <timburke> it'll be nice to get https://review.opendev.org/#/c/671333/ so i can run probe tests locally again :-P
21:30:17 <patchbot> patch 671333 - swift - py3: (mostly) port probe tests - 2 patch sets
21:30:32 <timburke> that's about it
21:30:41 <timburke> #topic lots of small files
21:31:08 <timburke> rledisez, kota_ i haven't seen too much lately on the branch (but that's ok)
21:31:21 <kota_> ya, sorry
21:31:41 <timburke> like i said, that's ok! no need to apologize
21:31:53 <clayg> tdasilva: is going to be all up in the losf pretty soon!
21:31:56 <rledisez> yeah. alecuyer is off so don't expect a lot from OVH for a few weeks
21:31:58 <timburke> i know people here at swiftstack have been getting increasingly interested -- i think tdasilva has been taking a look recently?
21:32:04 <kota_> nice, tdasilva!!!
21:32:52 <timburke> fwiw, i feel like this ought to be the next big item of work that we're all focused on
21:33:15 <kota_> +1
21:33:38 <kota_> oh, tdasilva has left.
21:33:46 <clayg> +2
21:34:19 <timburke> bah. i was just about to ask him if there was anything he'd like to bring up about it... oh well
21:34:46 <timburke> #topic sharding
21:35:11 <timburke> mattoliverau, i know you've got a few patches up now -- anything we ought to be doing besides reviewing them?
21:35:35 <mattoliverau> Nah just reviewing them and point out the obvious flaws and edge cases :)
21:35:46 <timburke> 👍
21:36:02 <timburke> #topic symlinks and versioning
21:36:23 <mattoliverau> My last patch is a complete POC where we send the ranges from the scanner via UPDATE to do a reverse rollback strategy. Not sure I like it, but was an idea I had.
21:36:31 <mattoliverau> that's all
21:36:40 <timburke> thanks
21:37:22 <clayg> on the hardlinks I was looking at some comments this morning, and adding some more tests for behaviors that we want to better specify - I think we're still not 100% clear on how hardlinks to manifests/symlinks should look
21:37:44 <clayg> I think timburke had the most experience/insight - so i'm hoping to get his feedback on some of the new tests I drafted
21:38:01 <timburke> i'll be sure to take a look :-)
21:38:44 <timburke> any other blockers for you, or places that you need more input?
21:39:07 <clayg> i also managed to get up the s3 versioning patch at the end of the chain, p 673682
21:39:08 <patchbot> https://review.opendev.org/#/c/673682/ - swift - s3api: Implement versioning status API - 1 patch set
21:39:37 <clayg> one thing that starts to shine through on that one is how much it just assumes it knows how versioning works and does what it needs to implement the aws api
21:40:11 <clayg> I think in a more perfect world we'd have looked at the s3 versioning features and added them to versioned writes - then s3api is just doing *translation* from aws apis to swift apis
21:40:46 <kota_> +1
21:40:50 <clayg> but since we don't have spellings for ... e.g. "copy version_id XYZ" we just "do it" in s3api
21:41:37 <clayg> but I don't think moving forward with something that works really prevents us in any meaningful way from doing that work later (except that maybe we'd have less motivation to do so)
21:42:32 <clayg> OTOH, I'm not sure much moving to symlink versioning is really going to throw off clients that sort of had to learn how stack & history versioning worked already so they could do things like "restore version X" or "delete version Y"
21:43:11 <clayg> had we had an API for that all along it would make less of a difference to clients when we decide to change the underlying implementation
21:43:59 <timburke> makes sense
21:44:05 <clayg> anyway, is what is... something I'll be a little more aware of as we flesh more of the s3api matrix down the road...
21:44:35 <clayg> I don't think i'm blocked right now and I feel like i'm making progress
21:44:42 <clayg> all feedback is appreciated!
21:45:04 <timburke> 👍
21:45:17 <timburke> #topic shanghai
21:45:47 <timburke> i know i dropped it from the agenda, but i've been looking at the order in which i'll have to do things and i thought i ought to share
21:46:38 <timburke> keeping in mind that it's a bit US-centric, but may be more-or-less applicable for you, too mattoliverau, kota_, and rledisez
21:46:58 <kota_> something like VISA?
21:47:39 <mattoliverau> it's probably pretty similar I suspect. Ie get a letter, get visa, etc.
21:49:30 <timburke> looks like to get the visa (http://www.china-embassy.org/eng/visas/hrsq/#M, $140) i need the invitation letter (https://openstackfoundation.formstack.com/forms/visa_form_shanghai_summit), and to get *that* i need to register (https://app.eventxtra.link/registrations/6640a923-98d7-44c7-a623-1e2c9132b402, $161 after the contributor discount)
21:50:14 <timburke> as far as i can tell, airfare and hotel can be done at any point along there
21:50:30 <kota_> that registration with early bird will close around... 14 (maybe?) Aug.
21:50:59 <rledisez> kota_: I was looking for that information earlier. do you have a link?
21:51:06 <kota_> I'm not sure the contribution discount will keep the $161
21:51:22 <timburke> https://www.openstack.org/summit/shanghai-2019/ says "Summit registration is open - get your tickets before prices increase on August 14 at 11:59pm PT! "
21:51:31 <kota_> https://www.openstack.org/summit/shanghai-2019/
21:51:34 <kota_> same URL
21:51:38 <kota_> #link https://www.openstack.org/summit/shanghai-2019/
21:51:45 <rledisez> perfect. thx. how did i miss that :)
21:51:55 <kota_> no information about the standard price after early bird.
21:53:40 <timburke> that's about it. i did send something to the mailing list to call out our etherpad (http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008156.html) -- it was a bit behind when most projects did theirs
21:54:07 <kota_> 你好!
21:54:08 <timburke> we'll see who else puts their name on https://etherpad.openstack.org/p/swift-ptg-shanghai :-D
21:54:54 <timburke> that's all i've got
21:54:59 <timburke> #topic open discussion
21:55:09 <clayg> i gotta bounce, ya'll be good
21:55:14 <timburke> anyone have anything else to bring up in the last five minutes?
21:57:03 <timburke> all right. thank you all for coming, and thank you for working on swift!
21:57:08 <timburke> #endmeeting