21:01:17 <timburke_> #startmeeting swift
21:01:17 <opendevmeet> Meeting started Wed May 4 21:01:17 2022 UTC and is due to finish in 60 minutes. The chair is timburke_. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:17 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:17 <opendevmeet> The meeting name has been set to 'swift'
21:01:23 <acoles> o/
21:01:24 <timburke_> who's here for the swift meeting?
21:01:53 <mattoliver> o/
21:03:07 <timburke_> as usual, the agenda's at
21:03:09 <timburke_> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:04:23 <timburke_> but the overall thing i've noticed is: we've got three or four major threads we've been pulling on, and i feel like we need to actually commit to getting some landed or otherwise resolved
21:05:46 <timburke_> i think i'll do things a little out of order
21:06:04 <timburke_> #topic eventlet and locking
21:06:40 <timburke_> a quick overview of why we're even looking at this, though i think both acoles and mattoliver are in the loop already
21:07:17 <timburke_> it all started with wanting to remove some import side-effects with https://review.opendev.org/c/openstack/swift/+/457110 -- specifically, to stop monkey-patching as part of import
21:08:10 <timburke_> the more we dug into it, the more we started to freak out at how broken eventlet monkey-patching of locks is on py3 (see https://github.com/eventlet/eventlet/issues/546)
21:09:32 <timburke_> which led to much hand-wringing and staring at code for a while, and a realization that all object replicators (for example) serialize on logging since the PipeMutex works across processes
21:09:50 <zaitcev> Hah!
21:10:47 <timburke_> until we decided that maybe we *don't* need those logging locks *at all*! so we're going to see how https://review.opendev.org/c/openstack/swift/+/840232 behaves and whether it seems to garble our logs
21:11:48 <acoles> 🤞
21:12:58 <timburke_> note that the fact that the PipeMutex works across processes *also* presents some risk of deadlocks -- if a worker dies or is killed while holding the ThreadSafeSysLogHandler's lock, everybody in the process group deadlocks
21:16:01 <timburke_> so, hopefully that all works out, we drop the locking *entirely* in ThreadSafeSysLogHandler, and hope that there aren't any other CPython RLocks that might actually matter. sounds like mattoliver is planning on doing some load testing to verify our assumptions about how UDP logging (whether via the network or UDS) works
21:16:51 <mattoliver> Yup, will load test it some today. See if I can break it
21:18:20 <acoles> nice - try your hardest @mattoliver :)
21:18:26 <timburke_> next up
21:18:32 <timburke_> #topic ring v2
21:18:34 <zaitcev> I assume one log message is one large UDP datagram. The protocol does the segmentation into packets for you, and it will not deliver a partially assembled datagram. I think.
21:18:57 <opendevreview> Alistair Coles proposed openstack/swift master: backend ratelimit: support per-method rate limits https://review.opendev.org/c/openstack/swift/+/840542
21:19:00 <timburke_> zaitcev, yeah, that was my understanding too
21:19:16 <timburke_> and if it's via a domain socket, even better
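The lock-free logging idea discussed above comes down to something like the sketch below. It is a minimal illustration, not the actual change in https://review.opendev.org/c/openstack/swift/+/840232, and the NoLockSysLogHandler name is made up for the example. Because syslog over UDP or a unix datagram socket sends each formatted record as a single datagram, records from different processes can't interleave mid-message even without a handler lock.

```python
# A minimal sketch, assuming the approach described in the meeting (not the
# actual patch): drop the handler-level lock entirely and rely on each log
# record going out as one datagram that is delivered whole or not at all.
import logging
import logging.handlers


class NoLockSysLogHandler(logging.handlers.SysLogHandler):
    """Hypothetical stand-in for a lock-free ThreadSafeSysLogHandler."""

    def createLock(self):
        # logging.Handler.acquire()/release() are no-ops when self.lock is
        # None, so emit() runs without any mutex: no cross-process PipeMutex
        # to serialize every replicator worker on, and nothing to deadlock
        # on if a worker dies or is killed mid-emit.
        self.lock = None


if __name__ == '__main__':
    # UDP syslog; a unix domain socket (e.g. '/dev/log') behaves similarly.
    handler = NoLockSysLogHandler(address=('localhost', 514))
    logger = logging.getLogger('lock-free-logging-demo')
    logger.addHandler(handler)
    logger.warning('each record is sent as a single datagram')
```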
21:20:38 <timburke_> we've talked about ring v2 for a couple PTGs now, we've tried implementing it in a couple different patch chains -- but we haven't actually landed much (any?) code for it
21:22:29 <timburke_> so my two questions are: does the current implementation seem like the right direction? and if so, can we land some of it before *all* of the work's ready?
21:23:43 <mattoliver> I think the current index approach is correct. And I've been able to extend it to include the builder in my WIP chain. And it's what we discussed at PTGs. So I think it's a win
21:24:37 <mattoliver> The new RingReader/RingWriter approach encapsulates it really well. Kudos timburke_
21:26:49 <mattoliver> So for me I'm +1 on thinking it's the right direction, but I might be biased because I've been more closely involved with it than most.
21:27:07 <timburke_> my feeling, too :-)
21:28:54 <timburke_> so should we commit to reviewing and landing https://review.opendev.org/c/openstack/swift/+/834261? mattoliver, do you have any concerns that some of the follow-ups should get squashed in before landing?
21:32:24 <mattoliver> Nah I think it's a great start. And so long as we double-check before the next release, I'm fine with it.
21:32:44 <mattoliver> ie if any follow-ups are required.
21:33:14 <mattoliver> There was an issue in the RingWriter I think, but you fixed that, so cool.
21:34:24 <timburke_> all right! sounds good
21:34:28 <timburke_> next up
21:34:31 <timburke_> #topic backend rate-limiting
21:35:12 <timburke_> acoles, i feel like your chain's getting a little long :-)
21:35:32 <acoles> yeah I need to rate limit myself
21:36:10 <acoles> some may get squashed but I am working little by little because we want to get *something* into prod soon
21:36:29 <mattoliver> lol
21:36:32 <acoles> the first couple of patches are refactor and minor fixup
21:37:59 <acoles> the goal is to provide a 'backstop' ratelimiter that will allow backend servers to shed load rather than have huge queues build up
21:39:45 <timburke_> anything we should be aware of as we review it? it looks like there may be some subtle interactions with proxy-server error limiting
21:43:05 <acoles> yes, so this patch could use some careful review https://review.opendev.org/c/openstack/swift/+/839088 - on master any 5xx response will cause the proxy to error limit (or increment the error limit counter), but we don't want that to happen when the backend is rate limiting (that would be an unfortunate amplification of the rate limiting)
21:43:51 <acoles> so the patch adds special handling for 529 response codes (at the PTG IIRC we decided 529 could be used as 'too many backend requests')
21:44:31 <mattoliver> oh yeah, that could be bad, don't want to rate limit our way into being error limited
21:44:34 <acoles> I chose to NOT log these in the proxy since there could be a lot and they won't stop because the proxy won't error limit the node
21:46:06 <timburke_> seems reasonable -- the 529s will still show up in log lines like `Object returning 503 for [...]`, though, yeah?
21:47:43 <acoles> yes I checked that
21:48:53 <timburke_> https://review.opendev.org/c/openstack/swift/+/840531 seems like a great opportunity to allow for config reloading similar to what we do with rings
21:50:02 <acoles> haha, yes that's exactly where I am heading with that! actually, clayg's idea
21:50:06 <mattoliver> oh yeah, that'll be cool.
21:51:06 <acoles> but, one thing at a time, I think the finer-grained per-method ratelimiting may be higher priority?? we'll see
21:51:19 <timburke_> sounds good
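The 529 special-casing described above boils down to something like the sketch below. It is illustrative only, not the code in https://review.opendev.org/c/openstack/swift/+/839088, and the helper names (error_limiter, record_node_error) are invented for the example: a 529 is neither error-limited nor logged per-response, while any other 5xx keeps the behaviour that exists on master.

```python
# A hedged sketch of the proxy-side behaviour discussed in the meeting,
# assuming hypothetical helper names; this is not Swift's actual API.
HTTP_RATE_LIMITED = 529  # agreed at the PTG as 'too many backend requests'


def note_backend_response(status, node, error_limiter, logger):
    """Decide whether a backend response should count toward error limiting."""
    if status == HTTP_RATE_LIMITED:
        # The backend is deliberately shedding load: don't bump the node's
        # error-limit counter (that would amplify rate limiting into error
        # limiting) and don't log each response, since there could be a lot
        # of them and the node is otherwise healthy.
        return
    if 500 <= status <= 599:
        # Any other 5xx keeps the existing error-limiting behaviour.
        error_limiter.record_node_error(node)
        logger.error('node %s returned %d', node, status)
```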
21:51:45 <timburke_> ok, we're getting toward the end of our time -- maybe i'll skip the memcache stuff for now, though i'd encourage people to take a look
21:51:51 <timburke_> #topic open discussion
21:52:01 <timburke_> what else should we bring up this week?
21:55:27 <mattoliver> nothing really, but for those who use it: I've been updating VSAIO to use jammy because I got sick of deprecation warnings and older versions of things: https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/126
21:56:12 <mattoliver> timburke_: you were interested in getting people to look at signal handling, so https://review.opendev.org/c/openstack/swift/+/840154 -- I'll take a better look today.
21:56:13 <timburke_> 🎉 edge of technology, man!
21:56:15 <clarkb> we've got jammy ci nodes coming up now. There have been a few minor bumps, but they should mostly be working
21:56:16 <zaitcev> Personally I don't believe in that limiting thing unless it's global.
21:56:39 <zaitcev> Election, a master, and an algorithm that considers a snapshot of load.
21:56:49 <zaitcev> Buuuut
21:57:16 <zaitcev> But it's Alistair, so it's obviously good, so I dunno.
21:57:55 <mattoliver> cool thanks clarkb !
21:58:09 <zaitcev> Maybe it will be some kind of thing that can dampen itself and never amplify the congestion.
21:58:51 <acoles> zaitcev: it is admittedly a fairly blunt tool
22:00:08 <timburke_> mostly depends on the client response to being told to back off, i suppose -- i have a hard time believing it'd be worse than what we've got now though
22:00:47 <timburke_> (which is to say, backend servers backing up so much that proxies have already timed out by the time they've read the request that was sent)
22:01:28 <timburke_> all right -- i've kept you all long enough
22:01:37 <timburke_> thank you all for coming, and thank you for working on swift!
22:01:42 <timburke_> #endmeeting