21:00:15 <timburke> #startmeeting swift
21:00:15 <opendevmeet> Meeting started Wed Nov 30 21:00:15 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:15 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:15 <opendevmeet> The meeting name has been set to 'swift'
21:00:22 <timburke> who's here for the swift meeting?
21:00:41 <mattoliver> o/
21:01:08 <kota> o/
21:01:27 <acoles> o/
21:02:01 <timburke> as usual, the agenda's on the wiki
21:02:04 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:13 <timburke> first up
21:02:27 <timburke> #topic func test failures
21:02:50 <timburke> so we've still got those failing func tests on centos9
21:03:29 <timburke> and gmann was good enough to pin our dsvm job to focal so it wouldn't start failing, too
21:03:57 <timburke> we've got a fix in hand, and it's even got a +2 (thanks, acoles!)
21:04:00 <timburke> #link https://review.opendev.org/c/openstack/swift/+/863441
21:04:29 <timburke> so: are there any objections to merging it?
21:04:51 <mattoliver> Ok cool. I said I'd look at that, sorry, got distracted. Promise to take a look first thing today.
21:05:04 <timburke> no worries; thanks mattoliver
21:05:23 <mattoliver> From when I looked at it, it made sense.
21:06:01 <timburke> then separately: do we think it's worth removing our dependence on stdlib's http parsing? i have a gut feeling that it would be a fairly large endeavor
21:08:29 <timburke> and it's not clear whether we'd want to have that in-tree or push it into eventlet. i got the impression from https://github.com/eventlet/eventlet/pull/574 that temoto might be receptive to that...
21:09:25 <timburke> it doesn't necessarily mean that we'd have to own the parser, either -- something like https://github.com/python-hyper/h11/ looks somewhat appealing
21:09:50 <mattoliver> That does sound like a large undertaking. And keeping it maintained. Although we do pull in more and more. I dunno, but if eventlet want to maintain it, maybe it's worth it.
21:10:23 <mattoliver> Although would that mean a tighter coupling with eventlet?
21:11:49 <timburke> and we'd almost certainly need to revisit https://review.opendev.org/c/openstack/swift/+/427911 | Replace MIME with PUT+POST for EC and Encryption
21:13:03 <timburke> anyway, i think we can debate that more later. my biggest worry is that we already have a bunch of large undertakings in progress
21:13:20 <mattoliver> yeah
21:13:42 <mattoliver> not a bad idea, maybe add it to ideas or something so we can discuss it more, or have a place to write things
21:13:45 <timburke> as long as we can continue to paper over the current trouble, i think we can defer that decision
21:14:00 <timburke> next up
21:14:00 <mattoliver> +1
21:14:10 <timburke> #topic testing under py310
21:14:26 <timburke> just a quick update that we're still blocked on requirements
21:14:35 <timburke> #link https://review.opendev.org/c/openstack/requirements/+/863581/
21:15:16 <timburke> i gave a nudge just before the meeting, we'll see if we can get this moving
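[Editor's note: h11, linked at 21:09:25, is a sans-I/O HTTP/1.1 parser: you feed it bytes and it hands back parsed events, leaving all socket handling -- and hence the eventlet question -- to the caller. A minimal illustrative sketch of that style, not taken from any Swift patch:]

    import h11

    # server-side connection object; it never touches a socket itself
    conn = h11.Connection(our_role=h11.SERVER)
    conn.receive_data(b"GET /v1/AUTH_test/c/o HTTP/1.1\r\n"
                      b"Host: saio\r\n\r\n")
    while True:
        event = conn.next_event()
        if event is h11.NEED_DATA or isinstance(event, h11.EndOfMessage):
            break
        if isinstance(event, h11.Request):
            # method, target, and headers arrive fully parsed; no stdlib http involved
            print(event.method, event.target)  # b'GET' b'/v1/AUTH_test/c/o'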
21:15:38 <timburke> #topic testing under py311
21:16:01 <timburke> regardless of what *we're* testing with, our users will continue marching forward! ;-)
21:16:41 <timburke> and by doing that, we helped discover a cpython bug
21:16:54 <timburke> zigo noticed it first
21:16:57 <timburke> #link https://meetings.opendev.org/irclogs/%23openstack-swift/%23openstack-swift.2022-11-24.log.html
21:17:03 <mattoliver> yup, and you found a rather large issue in 3.11 too right?
21:17:21 <timburke> i eventually ran it down and wrote it up
21:17:23 <timburke> #link https://github.com/python/cpython/issues/99886
21:17:36 <timburke> and it looks like they're working on a fix
21:17:38 <timburke> #link https://github.com/python/cpython/pull/99902
21:18:30 <zigo> FYI, I have warned Doko aka Matthias Klose (maintainer of the interpreter in Debian and Ubuntu) about it on #debian-python.
21:18:44 <zigo> Hopefully, we'll have a fix soon for it.
21:18:56 <timburke> one question i've still got on my mind: i found a workaround to avoid the segfault -- is it worth us applying that? or should we just wait it out for a 3.11.1?
21:19:00 <zigo> However, it's not clear to me: do I still need a patch on the Swift side for it?
21:19:06 <timburke> #link https://review.opendev.org/c/openstack/swift/+/866051
21:19:11 <zigo> Oh, thanks.
21:19:41 <zigo> IMO, not worth merging upstream, but I can take it in downstream Debian until the interpreter is fixed.
21:19:51 <timburke> zigo, if cpython fixes their issue, i don't think the swift patch is needed (though i also haven't tested with the WIP fix yet)
21:20:06 <zigo> I'll build and let you know if that's an ok-ish temp fix for me.
21:20:07 <timburke> 👍
21:20:13 <zigo> Thanks again tim!
21:20:53 <acoles> great detective work timburke
21:21:57 <timburke> there was something surprising in my work-around patch that seemed like we might still want...
21:22:02 <timburke> #link https://review.opendev.org/c/openstack/swift/+/866051/2/swift/common/db.py#a585
21:22:45 <timburke> the blanket exception handling in that context manager on master doesn't sit right with me
21:22:48 <zigo> I just wrote about the above at the Debian bug entry: https://bugs.debian.org/1024784
21:23:41 <timburke> thanks
21:24:33 <timburke> all right, we'll plan on holding off on that patch, though i might pull that bit i highlighted out as its own patch
21:24:46 <timburke> #topic ring v2
21:25:23 <timburke> so based on some feedback from clayg i started playing around with using zipfiles as rings
21:25:25 <timburke> #link https://review.opendev.org/c/openstack/swift/+/865066
21:26:27 <timburke> it's definitely still rough (nowhere near as polished as the patch we talked about at the PTG), but the core ideas are all there
21:26:56 <timburke> but i think my biggest concern is over the name change
21:27:29 <timburke> 1. there are a bunch of places throughout the code base where we assume rings will be named *.ring.gz
21:28:01 <timburke> 2. dealing with automatic ring reloading seems like it'd get a little hairy
21:28:21 <mattoliver> I plan to do that as a follow up to the zip patch as it stands, to see what the fallout of a name change would look like.
21:28:42 <mattoliver> might play with that today if there isn't a fire or distraction at work
21:29:52 <timburke> that said, there were definitely some nice things about being able to use zip/unzip on the thing -- in particular, it seems like a much better way to shove in whatever other extension data you might want to put in a ring
21:29:54 <mattoliver> I assume for ring reloading, we'd just be looking at the latest mtime.
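[Editor's note: the appeal of the zipfile approach (21:25 onward) is that a ring becomes an ordinary archive you can inspect with stock tools, with room for extra members alongside the ring data. A hypothetical sketch -- the path and member names are made up for illustration -- including the kind of mtime-based reload check mattoliver mentions:]

    import os
    import zipfile

    RING_PATH = '/etc/swift/object.ring'  # hypothetical v2 name, no .gz

    def ring_has_changed(path, last_loaded_mtime):
        # reload only when the file on disk is newer than what we loaded
        return os.path.getmtime(path) > last_loaded_mtime

    with zipfile.ZipFile(RING_PATH) as zf:
        # plain zip tooling works too: unzip -l /etc/swift/object.ring
        print(zf.namelist())  # e.g. ['ring.json', 'extensions/notes.txt']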
21:29:59 <mattoliver> maybe it won't be so bad
21:30:14 <mattoliver> but time and code will tell :P
21:31:25 <timburke> mattoliver, the thing that really got in my way was that we hang on to the filename we loaded and keep checking that -- being able to flop between two names got messy
21:31:50 <mattoliver> hmm, yeah
21:31:56 <timburke> anyway, based on that, i pushed up a new swift-ring-packer utility
21:31:59 <timburke> #link https://review.opendev.org/c/openstack/swift/+/866082
21:32:39 <timburke> lets you pack/unpack/list v2 rings made with the patch that we talked about at the PTG
21:33:08 <timburke> that's all i've got
21:33:14 <timburke> #topic open discussion
21:33:26 <timburke> what else should we talk about this week?
21:33:55 <mattoliver> I've made some progress on cleaning up the tracing stuff
21:34:08 <mattoliver> even moved it into a better namespace inside swift
21:34:53 <mattoliver> At the end of the chain I have a docs patch with an overview_trace
21:35:02 <zigo> timburke: I can confirm your workaround fixes the Swift Debian package unit test runs (just did a build run...).
21:35:10 <timburke> \o/
21:35:19 <timburke> on both fronts ;-)
21:35:35 <mattoliver> The doc is very rough, but I plan to put design decisions there, because I did have to make some interesting decisions thanks to eventlet
21:35:57 <mattoliver> And it's kinda important to know when reviewing tracing in swift
21:36:33 <mattoliver> #link https://review.opendev.org/c/openstack/swift/+/865612
21:36:43 <mattoliver> ^ that's the end of the chain with the doc
21:37:44 <mattoliver> I'm also trying to revive the shrinking chain
21:37:57 <mattoliver> #link https://review.opendev.org/c/openstack/swift/+/843973
21:38:03 <mattoliver> ^ currently ending there
21:38:31 <mattoliver> I see acoles has worked worked on the front end of the chain again. So will check it out today
21:38:43 <mattoliver> yup, he double worked :P
21:39:19 <acoles> mattoliver: I think it is in good shape, thanks for your help.
21:39:28 <mattoliver> nps
21:39:38 <acoles> I checked test coverage and added some
21:40:14 <acoles> I'll try to keep working up the chain (which I just realised I did not rebase), maybe we can get the first merged
21:40:42 <mattoliver> yeah!
21:40:45 <mattoliver> good plan
21:41:05 <timburke> i'll take another look, too
21:42:21 <acoles> I've been working on reducing the memcache blob size for cached shard ranges https://review.opendev.org/c/openstack/swift/+/863562
21:42:43 <mattoliver> oh cool
21:43:26 <acoles> there's two ideas in there: 1) inspired by timburke's patch, compact the serialised shard range data by only storing the lower & name
21:43:53 <acoles> 2) use a shallow tree to break up 1000's of shard ranges into leaves of 100's
21:44:06 <timburke> nice!
21:44:10 <timburke> i bet i can help with that first TODO ;-)
21:44:46 <acoles> timburke: oh, please do 🙏
21:45:38 <mattoliver> it looks awesome, and thanks for responding to my questions inline. I'll loop back to it soon
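[Editor's note: a sketch of why (lower, name) alone can answer "which shard owns this object?" (idea 1 at 21:43:26). With the pairs sorted by lower and assumed contiguous, each range implicitly runs up to the next range's lower, so a bisect finds the owner -- which is also why the overlaps discussed next are a problem. The names below are made up, and exact-boundary semantics are elided:]

    from bisect import bisect_right

    # compacted, sorted, contiguous: (lower bound, shard container name)
    compact = [('', '.shards_a/c-0'),
               ('m', '.shards_a/c-1'),
               ('t', '.shards_a/c-2')]

    def find_shard(obj_name):
        lowers = [lower for lower, _ in compact]
        # the last range whose lower bound precedes obj_name owns it
        return compact[bisect_right(lowers, obj_name) - 1][1]

    print(find_shard('pineapple'))  # '.shards_a/c-1'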
21:45:45 <acoles> one wrinkle I uncovered (well, probe tests uncovered!): the backend may return overlaps in updating shard range lists (ok overlaps, when a parent is sharding and children are created)
21:46:16 <acoles> and the overlap does not play well with compacting the shard ranges to only (lower, name) :O
21:46:37 <acoles> so I ended up needing to filter out the overlaps before serialising to cache
21:47:11 <acoles> everything is fine on master because of the way the ranges sort by (upper, state, lower) but once we throw away upper and state the overlaps mess things up
21:47:22 <mattoliver> ahh yeah
21:47:38 <acoles> so... should we change the backend to filter what it returns?
21:47:58 <timburke> i remember seeing that on my patch, too -- pushed up https://review.opendev.org/c/openstack/swift/+/852221 fwiw
21:48:00 <acoles> that means we have an upgrade dance - backends must be upgraded before proxies
21:48:28 <mattoliver> I guess having a cached valid path is good anyway. So long as any mistakes can eventually be fixed.
21:48:29 <timburke> i suppose the better approach might be to have proxies do the filtering themselves...
21:49:12 <acoles> mattoliver: yes, all we want in cache is a single contiguous path of namespaces
21:49:55 <acoles> I need to ponder more what the proxy should do in the face of gaps... we could have the proxy stretch a neighbor to cover a gap for updates 🤷
21:50:19 <mattoliver> or put the root in there
21:50:34 <timburke> yeah, that was my thought, too
21:50:59 <mattoliver> or not cache and force it to go back to the container for truth, but that would cause hotspotting (but alert us to the problem quickly) :P
21:51:06 <acoles> yes. again, I think we (maybe timburke) have dabbled with the backend filling ALL gaps with the root, but again that gets us into an upgrade wrangle
21:51:15 <timburke> though if we're only tracking (lower, name), we necessarily start stretching shards
21:51:47 <acoles> timburke: we can put ANY name (including root) in a tuple with a lower to fill a gap
21:51:57 <timburke> true
21:52:11 <acoles> I mean, we could even pick a random other range to spread the pain :)
21:52:24 <mattoliver> oh that's true
21:52:30 <mattoliver> and take load off the root
21:52:44 <acoles> ...and muddle the debugging... :(
21:53:08 <mattoliver> well maybe just leave the gap, as tim says it'll get caught by the preceding range
21:53:31 <mattoliver> then the question remains what happens if the gap is at the start
21:54:14 <acoles> ok, I'll work up something to handle gaps
21:54:27 <acoles> at this point I'd appreciate any feedback on whether the whole endeavour seems worthwhile
21:54:58 <timburke> definitely seems worthwhile to me -- we've been struggling with those large cache values, for sure
21:55:07 <mattoliver> +1
21:55:21 <mattoliver> and they're only going to go up, unless we do something
21:55:36 <acoles> and whether the class hierarchy makes sense to others - for a while I've wanted a superclass of ShardRange that just knows about namespace bounds
21:56:03 <acoles> I recently checked and we have one shard range list at > 4MB!
21:56:25 <mattoliver> that is a little nuts.
21:56:29 <timburke> one question: now that there are two lookups, what happens if the first succeeds, but the second fails? i'm guessing we just treat it as a miss and go back to the root for a fresh set of everything?
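[Editor's note: a sketch of the behaviour timburke is asking about -- with a two-level tree in memcache, any missing piece degrades to a plain cache miss and the proxy falls back to the root container. The key scheme and helper are hypothetical, not from the patch:]

    def get_cached_namespaces(memcache, top_key):
        top = memcache.get(top_key)        # first lookup: list of leaf blob keys
        if top is None:
            return None                    # miss: caller goes back to the root
        leaves = [memcache.get(key) for key in top]
        if any(leaf is None for leaf in leaves):
            return None                    # partial miss: treat the whole set as a miss
        # each leaf holds a sorted slice of the compacted namespaces
        return [ns for leaf in leaves for ns in leaf]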
21:56:37 <mattoliver> how big is the list once you run it through your code now?
21:57:05 <acoles> mattoliver: less ;-)
21:57:24 <mattoliver> lol, great, ship it then :P
21:57:30 <acoles> timburke: that would be the intent
21:58:36 <timburke> mattoliver, on my other patch (that just reduced what we track without breaking it into a tree), i saw a 3.1MB cache value shrink to 0.9MB
21:58:37 <acoles> mattoliver: seriously, each blob is reduced to some configurable max-items, but for our 4MB maybe a 10x reduction?
21:59:01 <acoles> And then we can estimate at least a 2x reduction for simply dropping the uppers
21:59:12 <mattoliver> nice!
22:00:04 <acoles> ok, so what timburke measured is for the compaction alone, then the shallow tree gets us even smaller if needed
22:00:09 <timburke> acoles, we're always limited to just the two layers, right? seems like we'd want to target len(shard_ranges)**0.5 shards per blob
22:01:07 <timburke> all right, we're at time
22:01:20 <timburke> thank you all for coming, and thank you for working on swift!
22:01:23 <timburke> #endmeeting
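[Editor's note: on timburke's len(shard_ranges)**0.5 suggestion at 22:00:09 -- with only two layers, sizing each leaf at the square root of the total keeps the top blob and every leaf at the same order of magnitude. A back-of-envelope sketch, not code from the patch:]

    import math

    def plan_blobs(num_ranges):
        # ~sqrt(N) entries per leaf means ~sqrt(N) leaves in the top blob
        per_leaf = max(1, round(math.sqrt(num_ranges)))
        num_leaves = math.ceil(num_ranges / per_leaf)
        return num_leaves, per_leaf

    # 10,000 ranges -> 100 leaves of 100 entries each, so no single
    # memcache value holds more than ~sqrt(N) items, versus one
    # 10,000-entry blob without the tree
    print(plan_blobs(10000))  # (100, 100)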