21:01:17 <timburke_> #startmeeting swift
21:01:17 <opendevmeet> Meeting started Wed May  4 21:01:17 2022 UTC and is due to finish in 60 minutes.  The chair is timburke_. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:17 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:17 <opendevmeet> The meeting name has been set to 'swift'
21:01:23 <acoles> o/
21:01:24 <timburke_> who's here for the swift meeting?
21:01:53 <mattoliver> o/
21:03:07 <timburke_> as usual, the agenda's at
21:03:09 <timburke_> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:04:23 <timburke_> but the overall thing i've noticed is: we've got three or four major threads we've been pulling on, and i feel like we need to actually commit to getting some landed or otherwise resolved
21:05:46 <timburke_> i think i'll do things a little out of order
21:06:04 <timburke_> #topic eventlet and locking
21:06:40 <timburke_> a quick overview of why we're even looking at this, though i think both acoles and mattoliver are in the loop already
21:07:17 <timburke_> it all started with wanting to remove some import side-effects with https://review.opendev.org/c/openstack/swift/+/457110 -- specifically, to stop monkey-patching as part of import
21:08:10 <timburke_> the more we dug into it, the more we started to freak out at how broken eventlet monkey-patching of locks is on py3 (see https://github.com/eventlet/eventlet/issues/546)
21:09:32 <timburke_> which led to much hand-wringing and staring at code for a while, and a realization that all object replicators (for example) serialize on logging since the PipeMutex works across processes
21:09:50 <zaitcev> Hah!
21:10:47 <timburke_> until we decided that maybe we *don't* need those logging locks *at all*! so we're going to see how https://review.opendev.org/c/openstack/swift/+/840232 behaves and whether it seems to garble our logs
21:11:48 <acoles> 🤞
21:12:58 <timburke_> note that the fact that the PipeMutex works across processes *also* presents some risk of deadlocks -- if a worker dies or is killed while holding the ThreadSafeSysLogHandler's lock, everybody in the process group deadlocks
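A simplified sketch of the pipe-based mutex idea (not Swift's actual PipeMutex code) helps show why forked workers serialize on it and why a worker dying while holding it can wedge the rest:

    import os

    class SimplePipeMutex(object):
        """Toy cross-process mutex: one token byte lives in a pipe.

        Acquire = read the byte (blocks until someone writes it back);
        release = write it back. Because the pipe is inherited across
        fork(), every worker contends on the same token, and if a holder
        dies without releasing, the byte is never written back and all
        other workers block forever.
        """
        def __init__(self):
            self.rfd, self.wfd = os.pipe()
            os.write(self.wfd, b'-')  # seed the single token

        def acquire(self):
            os.read(self.rfd, 1)      # take the token, or block waiting for it

        def release(self):
            os.write(self.wfd, b'-')  # return the token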
21:16:01 <timburke_> so, hopefully that all works out, we drop the locking *entirely* in ThreadSafeSysLogHandler, and hope that there aren't any other CPython RLocks that might actually matter. sounds like mattoliver is planning on doing some load testing to verify our assumptions about how UDP logging (whether via the network or UDS) works
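The kind of change being tried looks roughly like the following sketch (hypothetical handler name, not the actual patch): CPython's logging.Handler skips locking entirely when self.lock is None, so overriding createLock is enough.

    import logging
    from logging.handlers import SysLogHandler

    class LockFreeSysLogHandler(SysLogHandler):
        """Hypothetical handler that skips logging's per-handler lock.

        Handler.acquire()/release() are no-ops when self.lock is None, so
        emit() goes straight to the single socket send for each record.
        """
        def createLock(self):
            self.lock = None

    # e.g. LockFreeSysLogHandler(address='/dev/log')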
21:16:51 <mattoliver> Yup, will load test it some today. See if I can break it
21:18:20 <acoles> nice - try your hardest @mattoliver :)
21:18:26 <timburke_> next up
21:18:32 <timburke_> #topic ring v2
21:18:34 <zaitcev> I assume one log message is one large UDP datagram. The protocol does the segmentation into packets for you, and it will not deliver a partially assembled datagram. I think.
21:18:57 <opendevreview> Alistair Coles proposed openstack/swift master: backend ratelimit: support per-method rate limits  https://review.opendev.org/c/openstack/swift/+/840542
21:19:00 <timburke_> zaitcev, yeah, that was my understanding too
21:19:16 <timburke_> and if it's via a domain socket, even better
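Roughly what that relies on, as a toy illustration (assuming a syslog listener on 127.0.0.1:514; not from any patch): SysLogHandler over a datagram socket hands each formatted record to a single sendto(), so concurrent processes can reorder messages but never interleave them mid-record.

    import logging
    from logging.handlers import SysLogHandler

    # With a (host, port) address SysLogHandler uses a SOCK_DGRAM socket, and
    # each emit() is one sendto() of the whole formatted record: records from
    # separate processes may be reordered but never interleaved mid-message,
    # and a datagram that can't be fully delivered is dropped, not truncated.
    handler = SysLogHandler(address=('127.0.0.1', 514),
                            facility=SysLogHandler.LOG_LOCAL0)
    log = logging.getLogger('swift-test')
    log.addHandler(handler)
    log.warning('one log record, one datagram')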
21:20:38 <timburke_> we've talked about ring v2 for a couple PTGs now, we've tried implementing it in a couple different patch chains -- but we haven't actually landed much (any?) code for it
21:22:29 <timburke_> so my two questions are: does the current implementation seem like the right direction? and if so, can we land some of it before *all* of the work's ready?
21:23:43 <mattoliver> I think the current index approach is correct. And I've been able to extend it to include the builder in my WIP chain. And it's what we discussed at PTGs. So I think it's a win
21:24:37 <mattoliver> The new RingReader/RingWriter approach encapsulates it really well. Kudos timburke_
21:26:49 <mattoliver> So for me I'm +1 on thinking it's the right direction, but I might be biased because I've been looking at it and have been more closely involved with it than most.
21:27:07 <timburke_> my feeling, too :-)
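To make the "index approach" concrete, a very rough hypothetical sketch (names and layout invented here, not the format in the patch): a v2 ring file carries named sections plus an index of byte offsets, so a reader can seek straight to the section it needs without loading everything else.

    import json
    import struct

    def read_section(path, name):
        """Hypothetical reader for an indexed ring-like file.

        Assumed toy layout: a 4-byte length-prefixed JSON index mapping
        section names to (offset, length), followed by the section blobs.
        The real format lives in the ring v2 patch chain.
        """
        with open(path, 'rb') as fp:
            index_len = struct.unpack('!I', fp.read(4))[0]
            index = json.loads(fp.read(index_len))
            offset, length = index[name]
            fp.seek(offset)
            return fp.read(length)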
21:28:54 <timburke_> so should we commit to reviewing and landing https://review.opendev.org/c/openstack/swift/+/834261? mattoliver, do you have any concerns that some of the follow-ups should get squashed in before landing?
21:32:24 <mattoliver> Nah I think it's a great start. And so long as we double check before next release, I'm fine with it.
21:32:44 <mattoliver> i.e. if any follow-ups are required.
21:33:14 <mattoliver> There was an issue in the RingWriter I think but you fixed that, so cool.
21:34:24 <timburke_> all right! sounds good
21:34:28 <timburke_> next up
21:34:31 <timburke_> #topic backend rate-limiting
21:35:12 <timburke_> acoles, i feel like your chain's getting a little long :-)
21:35:32 <acoles> yeah I need to rate limit myself
21:36:10 <acoles> some may get squashed but I am working little by little because we want to get *something* into prod soon
21:36:29 <mattoliver> lol
21:36:32 <acoles> the first couple of patches are refactor and minor fixup
21:37:59 <acoles> the goal is to provide a 'backstop' ratelimiter that will allow backend servers to shed load rather than have huge queues build up
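As a rough illustration of the "backstop" idea (a toy sketch, not the code in the patch chain): cap requests per device per second and reject the excess immediately rather than letting it queue.

    import time
    from collections import defaultdict

    class BackstopRateLimiter(object):
        """Toy fixed-window limiter: shed excess load instead of queueing it."""

        def __init__(self, max_per_second):
            self.max_per_second = max_per_second
            self.counts = defaultdict(int)  # (device, window) -> request count

        def allow(self, device):
            window = int(time.time())
            self.counts[(device, window)] += 1
            # A real limiter would also prune old windows; this toy one doesn't.
            return self.counts[(device, window)] <= self.max_per_second

    # If allow() returns False, the backend would answer with an over-limit
    # status (see the 529 discussion below) instead of queueing the request.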
21:39:45 <timburke_> anything we should be aware of as we review it? it looks like there may be some subtle interactions with proxy-server error limiting
21:43:05 <acoles> yes, so this patch could use some careful review https://review.opendev.org/c/openstack/swift/+/839088 - on master any 5xx response will cause the proxy to error limit (or increment the error limit counter), but we don't want that to happen when the backend is rate limiting (that would be an unfortunate amplification of the rate limiting)
21:43:51 <acoles> so the patch adds special handling for 529 response codes (at PTG IIRC we decided 529 could be used as 'too many backend requests')
21:44:31 <mattoliver> oh yeah, that could be bad, don't want rate limiting to turn into error limiting
21:44:34 <acoles> I chose to NOT log these in the proxy since there could be a lot and they won't stop because the proxy won't error limit the node
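A minimal sketch of that special-casing (hypothetical helper and names, not the actual diff): treat 529 as "skip this node for now" without feeding the error-limiting counters or the error log.

    def handle_backend_response(app, node, resp_status):
        """Hypothetical proxy-side routing of backend status codes."""
        if resp_status == 529:
            # Backend is rate limiting: try another node, but don't count this
            # against the node's error limit (that would amplify rate limiting
            # into error limiting) and don't log an error for every shed request.
            return 'skip_node'
        if resp_status // 100 == 5:
            app.error_limit(node, 'backend server error')  # assumed helper
            return 'skip_node'
        return 'use_response'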
21:46:06 <timburke_> seems reasonable -- the 529s will still show up in log lines like `Object returning 503 for [...]`, though, yeah?
21:47:43 <acoles> yes I checked that
21:48:53 <timburke_> https://review.opendev.org/c/openstack/swift/+/840531 seems like a great opportunity to allow for config reloading similar to what we do with rings
21:50:02 <acoles> haha, yes that's exactly where I am heading with that! actually, clayg's idea
21:50:06 <mattoliver> oh yeah, that'll be cool.
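For the record, the "reload like rings" idea amounts to something like this sketch (invented names, assuming the per-method limits live in a simple config file): periodically stat the file and re-read it when the mtime changes, much as proxies pick up new rings without a restart.

    import os
    import time

    class ReloadingConfig(object):
        """Toy mtime-based config reloader (hypothetical, not the patch)."""

        def __init__(self, path, loader, check_interval=30):
            self.path = path
            self.loader = loader              # callable: path -> options dict
            self.check_interval = check_interval
            self._next_check = 0
            self._mtime = None
            self.options = self._load()

        def _load(self):
            self._mtime = os.path.getmtime(self.path)
            return self.loader(self.path)

        def get(self):
            now = time.time()
            if now >= self._next_check:
                self._next_check = now + self.check_interval
                if os.path.getmtime(self.path) != self._mtime:
                    self.options = self._load()
            return self.options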
21:51:06 <acoles> but, one thing at a time, I think the finer grained per-method ratelimiting may be higher priority?? we'll see
21:51:19 <timburke_> sounds good
21:51:45 <timburke_> ok, we're getting toward the end of our time -- maybe i'll skip the memcache stuff for now, though i'd encourage people to take a look
21:51:51 <timburke_> #topic open discussion
21:52:01 <timburke_> what else should we bring up this week?
21:55:27 <mattoliver> nothing really - though for those who use it, I've been updating VSAIO to use jammy because I got sick of deprecation warnings and older versions of things: https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/126
21:56:12 <mattoliver> timburke_: you were interested in getting people to look at signal handling, i.e. https://review.opendev.org/c/openstack/swift/+/840154 - I'll take a better look at it today.
21:56:13 <timburke_> 🎉 edge of technology, man!
21:56:15 <clarkb> we've got jammy ci nodes coming up now. There have been a few minor bumps, but they should mostly be working
21:56:16 <zaitcev> Personally I don't believe in that limiting thing unless it's global.
21:56:39 <zaitcev> Election, master, and algorithm that considers a snapshot of load.
21:56:49 <zaitcev> Buuuut
21:57:16 <zaitcev> But it's Alistair so it's obviously good, so I dunno.
21:57:55 <mattoliver> cool thanks clarkb !
21:58:09 <zaitcev> Maybe it will be some kind of thing that can dampen itself and never amplify the congestion.
21:58:51 <acoles> zaitcev: it is admittedly a fairly blunt tool
22:00:08 <timburke_> mostly depends on the client response to being told to back off, i suppose -- i have a hard time believing it'd be worse than what we've got now though
22:00:47 <timburke_> (which is to say, backend servers backing up so much that proxies have already timed out by the time they've read the request that was sent)
22:01:28 <timburke_> all right -- i've kept you all long enough
22:01:37 <timburke_> thank you all for coming, and thank you for working on swift!
22:01:42 <timburke_> #endmeeting