21:00:03 <timburke> #startmeeting swift
21:00:03 <openstack> Meeting started Wed Jun 17 21:00:03 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:06 <openstack> The meeting name has been set to 'swift'
21:00:10 <timburke> who's here for the swift meeting?
21:00:26 <kota__> hi
21:01:12 <tdasilva> half here
21:01:30 <rledisez> o/
21:01:34 <alecuyer> o/
21:02:18 <mattoliverau> o/
21:02:43 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:53 <timburke> first up
21:02:59 <timburke> #topic gate
21:03:19 <timburke> you may have noticed that nothing was passing the last couple days
21:03:41 <timburke> i think it's all resolved now, but i wanted to give an overview of the issues
21:03:53 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015432.html
21:04:19 <timburke> there was an issue with uwsgi that broke our grenade job (along with *everyone else*)
21:04:47 <timburke> the qa team's been all over it, and the resolution merged last night
21:05:35 <timburke> then there was another issue with our probe tests (most visibly; also affected the ceph s3 tests and rolling upgrade tests)
21:06:05 <timburke> pretty sure it was the result of pip no longer being available in the base images
21:06:06 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015425.html
21:06:08 <clayg> o/
21:06:43 <timburke> the fix there did require a change to our tooling, but that merged this morning
21:06:44 <timburke> https://review.opendev.org/#/c/735992
21:06:44 <patchbot> patch 735992 - swift - Use ensure-pip role (MERGED) - 5 patch sets
21:08:29 <timburke> i rechecked a bunch of changes about three hours ago, but everything's all backed up so none of those have actually posted new results yet
21:08:31 <clayg> thanks for fixing the gate timburke !
21:08:51 <timburke> if anyone sees more issues, holler!
21:09:14 <timburke> #topic memcache and container failures
21:10:38 <timburke> so last week i had all replicas of a container get overloaded
21:10:57 <clayg> yeah that was pretty cool
21:11:22 <clayg> actually I wasn't there - it SOUNDED cool (after the fact)
21:11:42 <timburke> which led me to notice that when the proxy hands back a 503 (because we got timeout, timeout, timeout, 404, 404, 404), we go evict memcache
21:11:55 <timburke> #link https://bugs.launchpad.net/swift/+bug/1883211
21:11:55 <openstack> Launchpad bug 1883211 in OpenStack Object Storage (swift) "get_container_info 503s shouldn't try to clear memcache" [Undecided,In progress]
21:13:40 <timburke> which meant that once info fell out of cache while there were hundreds of concurrent requests trying to do things in the container, it couldn't *stay in cache* even when some of the HEADs sent to repopulate it managed to get back to the proxy
21:14:47 <timburke> i proposed https://review.opendev.org/#/c/735359/ to fix it (basically, follow what the docstring said to do in set_info_cache), but i was wondering if anyone else has seen similar behavior
21:14:48 <patchbot> patch 735359 - swift - proxy: Stop killing memcache entries on 5xx responses - 4 patch sets
21:15:07 <clayg> moral of the story: don't let your primaries get overloaded - but when you do!  you know... be better swift
21:15:08 <zaitcev> I haven't but it sounds persuasive.
21:16:14 <timburke> note that prior to https://review.opendev.org/#/c/667411/ (from about a year ago), we would've been caching a 404
21:16:19 <patchbot> patch 667411 - swift - Return 503 when primary containers can't respond (MERGED) - 2 patch sets
21:16:48 <clayg> I was reluctant to go mucking with such old code; but once I realized we're a few iterations away from untangling all the things that could possibly lead to clients+sharder overwhelming a root db... I loaded it in my head and it makes sense to me
21:17:25 <timburke> (funny enough, it was definitely the same cluster and quite possibly the same container that prompted that change, too)
21:17:29 <clayg> I'm not even sure we really *intended* to clear the cache on error - the history reads more like it just happened by accident as the code evolved
21:18:12 <clayg> certainly all the primaries being overloaded isn't something that comes up often - it's possible it was just never bad enough (or when it got that bad there were like OTHER things that were ALSO bad - like... idk... not enough ratelimiting)
21:18:36 <timburke> yeah, it sure *seemed like* https://review.opendev.org/#/c/30481/ didn't mean to change behavior like that
21:18:37 <patchbot> patch 30481 - swift - get_info - removes duplicate code (Take 3) (MERGED) - 17 patch sets
21:18:41 <clayg> anyway - even if I'm wrong and someone thought they had a good reason to flush cache on error... I can't convince myself anymore it's a good idea
21:19:11 <clayg> when the backend service is saying "please back off" - GO HARDER - is rarely going to be the BEST plan 😁
21:20:01 <clayg> anyway; we're shipping it - and at least two cores like the change - so it'll probably merge eventually, but it's fairly fresh and we're open to better ideas!
21:20:41 <zaitcev> The problem is usually the cache being stale. If the error is indicative of the main storage being changed without the cache being flushed, then the cache needs to be flushed. I'm not sure a 503 is such a case. A 409 seems like a candidate for suspicion.
21:22:04 <timburke> *nod* i'm not sure that the container server can send back a 409 on GET or HEAD, but good thinking. will check
21:22:34 <clayg> which 409?  timburke the 404 cache is so weird... to think of that as a "remediation" I mean... maybe a client does a PUT and ends up on handoffs!?  I don't think that behavior was any more desirable really.
21:22:49 <clayg> I'm most happy about the tests - it's now defined behavior - we're saying on 503 we don't want to flush the cache
21:23:14 <clayg> if we change our minds later at least we have tests that can express what we want - and we won't accidentally forget to think about it next time we're working in there
21:23:33 <clayg> i'm gunna go +A it right now - I'm totally talking myself into it!!! 😁
21:23:44 <timburke> lol
21:24:25 <timburke> so as clayg mentioned, the trouble seemed to come from the shard stat reporting. fortunately, we've already landed a latch for that
21:24:49 <clayg> timburke: so you're saying before p 30481 you think we'd leave the cache alone on 503?  Or just that was so old ago who KNOWS what would have happened?
21:24:50 <timburke> unfortunately, we hadn't gotten that fix out to our cluster yet
21:24:50 <patchbot> https://review.opendev.org/#/c/30481/ - swift - get_info - removes duplicate code (Take 3) (MERGED) - 17 patch sets
21:25:13 <timburke> clayg, yeah, pretty sure it would've been left alone
21:26:23 <clayg> ok, so ... mostly just a heads up for folks I guess - the patch is new; but good.  If anyone else had noticed the behavior before that'd be cool - but it's ok if not either.
21:26:35 <timburke> while we were trying to stop those shard stats from reporting, we were sad to see that we couldn't just stop the replication servers to stop the background traffic
21:26:55 <timburke> #topic replication network and background daemons
21:27:23 <timburke> i wrote up https://launchpad.net/bugs/1883302 and https://review.opendev.org/#/c/735751/ for that particular issue
21:27:23 <openstack> Launchpad bug 1883302 in OpenStack Object Storage (swift) "container-sharder should send stat updates using replication network" [Undecided,In progress]
21:27:23 <patchbot> patch 735751 - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:27:23 <clayg> oh yeah, this one's heavy - timburke wants to go full on
21:28:13 <alecuyer> clayg: (sorry, lagging), we have not seen it, but I can't say it hasn't happened either
21:28:22 <clayg> hrm... I know that p 735751 is slightly more targeted to the bug - but really the issue and the fix are much more pervasive than we realized originally
21:28:22 <patchbot> https://review.opendev.org/#/c/735751/ - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:29:15 <clayg> timburke: I'd argue we reword the bug to at least "sharder and reconciler don't always use replication" and attempt to move forward with p 735991 which is bigger but WAY better
21:29:16 <patchbot> https://review.opendev.org/#/c/735991/ - swift - Add X-Backend-Use-Replication-Network header - 1 patch set
21:29:54 <timburke> yeah -- so the writes go over replication, but the sharder still does reads over the client-traffic interface -- but it was harder to fix since it uses internal_client for that
21:30:07 <timburke> it's got me wondering: which interface should our background daemons be using?
21:30:09 <clayg> it's like a unified way to make all our different little client interfaces use replication networks like they probably all should have been doing forever; but we never had an interface for 'em before
21:31:06 <mattoliverau> oh yeah interesting.
21:31:18 <timburke> the way i've got that second patch at the moment, callers have to opt-in to using the replication network. but i wonder if we could/should use it by default
21:32:26 <clayg> timburke: I think i'd be willing to say anything besides the proxy connecting to node[ip] when a node[replication_ip] is available is a bug?  like not a design choice, or operator choice - a bug
21:32:30 <mattoliverau> if a direct client or internal client is ever used inline from a customer request then client traffic else replication network.
21:32:44 <timburke> clayg says we (nvidia nee swiftstack) have at least one utility we've written that *would* want the client-traffic network; i wonder what other people have written and which interface they'd prefer
21:33:11 <clayg> that's a fairly strong stance, but personally having a separate storage server for background work (that I can turn off when needed) has been a HUGE QOL improvement for me over the years
21:33:57 <clayg> mattoliverau: I don't think internally we ever use direct/internal client from inside the proxy (i.e. related to a user request)
21:34:07 <clayg> timburke: do some of the new UPDATE requests use direct client?
21:34:30 <clayg> "new" - i'm not sure there's anything landed that does that... and IIRC they just call req.get_resp(app)?
21:34:32 <mattoliverau> yeah, trying to decide if we use it anywhere
21:34:39 <timburke> nope, it's plumbed through the proxy-server app
21:35:36 <timburke> well, maybe i keep it opt-in on that patch and propose another to change the default while people think through what they've got and what the upgrade impact would be like
21:35:57 <clayg> so internal-client and "the proxy-server app" are VERY similar - but Tim found a place between InternalClient and the app itself where we can plumb this header through (and then way down near where we make_connection(node, ...) we get to look at headers to pick node[ip] or node[replication_ip])
21:36:43 <clayg> it's really sort of slick - and sexy because it works uniformly across both interfaces (because both interfaces already take headers and can set backend defaults)
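Roughly the shape of what that plumbing enables, as a hedged sketch (the header name comes from the patch title; the helper and its exact behavior are assumptions, not Swift's real code): the caller sets a backend header, and the code choosing the endpoint for a backend request consults it.

    USE_REPLICATION_NETWORK_HEADER = 'X-Backend-Use-Replication-Network'

    def select_node_endpoint(node, headers):
        # `node` is a ring device dict carrying both the client-facing and
        # replication addresses; `headers` are the backend request headers.
        use_repl = str(headers.get(USE_REPLICATION_NETWORK_HEADER, '')
                       ).lower() in ('1', 'true', 'yes', 'on')
        if use_repl:
            return (node.get('replication_ip', node['ip']),
                    node.get('replication_port', node['port']))
        return node['ip'], node['port']

    # A daemon like the sharder would then set
    # {USE_REPLICATION_NETWORK_HEADER: 'true'} as a default header on its
    # InternalClient/direct_client requests so its background traffic stays
    # off the client-facing network.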
21:36:53 <zaitcev> Direct client going straight to the replication network sounds unexpected to me. I thought that proxies might not even have that network.
21:37:36 <clayg> zaitcev: that's good feedback!  proxies don't use direct client - but anything "defaulting" to the backend network might be "surprising" to some
21:38:37 <mattoliverau> a quick grep, yeah no direct client in proxy or middlewares.
21:38:42 <clayg> and I hadn't considered access/topology - if someone deploys anything that uses either of these interfaces ON a node that can use the replication network, that could be a big surprise 😞
21:40:25 <mattoliverau> Well, things like the reconciler and sharder are part of the consistency engine; the sharder is just a type of replicator (in a way). So yeah, it totally should do its work over the replication network.
21:40:45 <clayg> timburke: I would encourage you to drop p 735751 now that p 735991 is on everyone's radar - to me, it's not so much about "fixing ALL THE THINGS" as "fixing it RIGHT"
21:40:45 <patchbot> https://review.opendev.org/#/c/735751/ - swift - sharder: Use replication network to send shard ranges - 1 patch set
21:40:47 <patchbot> https://review.opendev.org/#/c/735991/ - swift - Add X-Backend-Use-Replication-Network header - 1 patch set
21:41:11 <timburke> 👍 thanks for the feedback, everyone!
21:41:24 <timburke> on to updates
21:41:31 <timburke> #topic waterfall EC
21:41:37 <timburke> clayg, how's it going?
21:41:50 <clayg> mattoliverau: I'm glad to hear you say that!  I think having internal and direct client grow these new interfaces will make it much easier to get it right out of the gate for new daemons
21:41:58 <clayg> timburke: a little better-ish, or maybe?
21:42:01 <clayg> I like the feeder!
21:42:35 <clayg> https://review.opendev.org/#/c/711342/8 phew - too many links open
21:42:35 <patchbot> patch 711342 - swift - wip: asyc concurrent ecfragfetcher - 8 patch sets
21:42:47 <clayg> I'm still waffling about the code duplication
21:44:02 <clayg> i don't know exactly how to describe the experience of pulling them apart - it's like I'm starting to see the tear lines and I can't help but try and imagine a few abstractions that could MAYBE cut through them 😞
21:44:12 <clayg> I mostly try not to think about it while I make waterfall-ec awesome
21:44:17 <clayg> which it *totally* is
21:45:33 <clayg> or at least I can see how it will be - once I add a follow to configure the feeder with per-policy settings and the stair-step configuration that alecuyer talked about at the PTG
21:45:50 <alecuyer> nice!
21:46:01 <clayg> I'm much more excited about working on that code than wading through the mess of cutting up GETorHEADHandler and  ECFragGetter
21:46:33 <clayg> at some level I want to just leave the messy turd there, finish the stuff I care about, and then try to re-evaluate when I feel less pressure to FIX THE DAMN BUG
21:47:17 <clayg> but I sort of know a new priority will come along, and even though I'll probably put up a patch out of pure guilt - it's not obvious to me "here's a 1000 line diff that doesn't change anything" is gunna get merged if I'm not complaining about it
21:47:52 <clayg> ALSO!  I need to chat with folks about extra requests for non-durables - or at least... the existing behavior is obviously wrong and the correct behavior is not obvious
21:48:23 <clayg> I picked something... and it's... better - but what if Y'ALL have an even BETTER idea!!!
21:48:58 <zaitcev> Little hope of that I'm afraid.
21:49:10 <zaitcev> Also
21:49:13 <clayg> i dunno if we can wait til the next PTG to go over it...
21:49:49 <timburke> should we read what you've done so far to try to get our heads around the problem, or should we sum it up now?
21:50:23 <clayg> I think it's a complex enough change (I'm really trying to SIMPLIFY) that it's worth a read by anyone who can handle it
21:50:34 <clayg> I've been trying to drop comments around the interesting bits
21:50:41 <timburke> we could schedule a video chat if you think something closer to "in person" would be best
21:51:05 <mattoliverau> etherpad braindump of the current problem, then video chat to talk through it?
21:51:14 <mattoliverau> plus time to look at code :)
21:51:21 <alecuyer> yes, waiting for the next PTG is too far off if clay is working on it *now*?
21:51:34 <clayg> for the non-durable extra requests - yeah I would like to go high-bandwidth (that was very helpful at the PTG); I would definitely try and prepare if there was something scheduled.
21:52:48 <mattoliverau> let's do something then :) We could zoom or jitsi and announce it in channel so anyone can attend. (keeping it open).
21:52:48 <clayg> ok, well no one is screaming about the code duplication - that gives me some confidence that I've built it up enough no one is going to go to review and be like "WTF is this!?  you can't do this!"
21:53:12 <clayg> so I'll leave the turd there and move on down the line to the follow on configuration stuff (which will be SUPER sexy)
21:53:47 <clayg> then we're just left with non-durable extra requests - which I can write up ASAP and Tim will help me with a zoom thing
21:54:02 <timburke> 👍
21:54:08 <mattoliverau> clayg: like you said at the PTG code dup is ok, so long as we all know, it's documented, and it makes it easier to grok and understand ;)
21:54:24 <clayg> mattoliverau: ❤️ you guys are the best
21:54:35 <timburke> all right
21:54:38 <mattoliverau> thanks for polishing the turd :)
21:54:59 <timburke> sorry rledisez, alecuyer: i forgot to drop losf from the agenda like i'd promised to last week
21:55:06 <timburke> so
21:55:11 <timburke> #topic open discussion
21:55:24 <timburke> anything else we should talk about in the last five minutes?
21:55:27 <alecuyer> well I'll just post a link for clay ;) wrt a PTG question
21:55:27 <clayg> https://review.opendev.org/#/c/733919/
21:55:28 <patchbot> patch 733919 - swift - s3api: Allow CompleteMultipartUpload requests to b... - 3 patch sets
21:55:28 <alecuyer> https://docs.python.org/3/library/multiprocessing.shared_memory.html
21:55:50 <clayg> alecuyer: YAS!!
21:55:55 <alecuyer> 3.8 only tho - but a nice interface to use shared memory; switch the ring to use numpy?
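A hedged sketch of the idea alecuyer is floating (nothing in Swift does this today; the segment name, table shape, and dtype are assumptions): the ring's replica-to-part-to-device table is just a big array of device ids, so one process can publish it in shared memory and other workers can map it without copying. Requires Python >= 3.8.

    from multiprocessing import shared_memory
    import numpy as np

    replicas, parts = 3, 2 ** 16     # e.g. a ring with part_power = 16
    table = np.random.randint(0, 1024, size=(replicas, parts), dtype=np.uint16)

    # publisher: copy the table into a named shared memory segment
    shm = shared_memory.SharedMemory(create=True, size=table.nbytes,
                                     name='swift-ring-demo')
    shared = np.ndarray(table.shape, dtype=table.dtype, buffer=shm.buf)
    shared[:] = table[:]

    # a worker process would attach by name and read without copying
    peer = shared_memory.SharedMemory(name='swift-ring-demo')
    view = np.ndarray(table.shape, dtype=np.uint16, buffer=peer.buf)
    assert view[0, 0] == table[0, 0]

    peer.close()
    shm.close()
    shm.unlink()   # remove the segment once the publisher is done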
21:56:05 <clayg> timburke: my complete multi-part retry has been going for 3.5 hours - and it's still working
21:56:07 <timburke> alecuyer, that also makes me think of something DHE mentioned earlier today...
21:56:19 <alecuyer> didn't think about it but thought i'd share the link, and sorry if you're all aware of that
21:56:34 <timburke> clayg, wow! 5 min seems *way* too short then -- maybe it should work indefinitely
21:56:45 <clayg> dunno 😞
21:56:58 <clayg> also i haven't tried abort - or... what were the other calls you were interested in?
21:57:34 <timburke> abort after complete and complete after abort are the two sequences i'm a little worried about
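For anyone following along with patch 733919, this is the client-side shape of the retry being discussed, as a hedged boto3 sketch (the endpoint, bucket, and key are made up): the second CompleteMultipartUpload repeats the same UploadId and parts manifest.

    import boto3

    s3 = boto3.client('s3', endpoint_url='http://swift-proxy:8080')
    mpu = s3.create_multipart_upload(Bucket='demo', Key='big-object')
    part = s3.upload_part(Bucket='demo', Key='big-object', PartNumber=1,
                          UploadId=mpu['UploadId'], Body=b'x' * (5 * 2 ** 20))
    manifest = {'Parts': [{'ETag': part['ETag'], 'PartNumber': 1}]}

    for attempt in range(2):   # the second pass is the retry under discussion
        s3.complete_multipart_upload(Bucket='demo', Key='big-object',
                                     UploadId=mpu['UploadId'],
                                     MultipartUpload=manifest)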
21:57:54 <clayg> alecuyer: I remember thinking "oh it's only arrays?  pfhfhfhfh" - but now that you mention it - what is the ring except a big array!?  😁
21:58:02 <clayg> there is the error limiting stuff 🤔
21:58:33 <timburke> zaitcev, thanks for the review on https://review.opendev.org/#/c/734721/ !
21:58:33 <patchbot> patch 734721 - swift - py3: (Better) fix percentages in configs - 4 patch sets
21:58:35 <clayg> abort after complete - so i'm in that state now... but if that works I could try to complete it again too!  🤔
21:59:00 <kota__> error limiting stuff on shared memory seems like a good idea
21:59:01 <zaitcev> So, are we trying to load rings into SysV shm?
21:59:31 <zaitcev> I'd be more comfortable with an mmap() of some temp file into which the json or pickle is dumped first.
21:59:44 <clayg> kota__: yes!  alecuyer will figure out how to make it work :P
22:00:11 <kota__> it seems it's not py3.8 only, but anything greater than or equal to 3.8?
22:00:20 <kota__> 3.9 isn't released yet
22:00:44 <alecuyer> kota__: right
22:00:48 <kota__> good
22:01:01 <timburke> all right, we're about out of time
22:01:04 <clayg> kota__: yeah and like zaitcev said, it's maybe not even a full solution on its own even if we did want to go >= 3.8 only (which by the time it's done might seem reasonable)
22:01:33 <timburke> thank you all for coming! i feel like we had some really good discussions today :-)
22:01:35 <kota__> clayg: true, got it.
22:01:55 <timburke> thank you all for coming, and thank you for working on swift!
22:01:59 <timburke> #endmeeting