| opendevreview | Lhoussain AIT ASSOU proposed openstack/liberasurecode master: prepare LRC backend https://review.opendev.org/c/openstack/liberasurecode/+/959280 | 07:23 |
|---|---|---|
| opendevreview | Christian Schwede proposed openstack/swift feature/threaded: WIP: Add gunicorn as wsgi server https://review.opendev.org/c/openstack/swift/+/959192 | 10:10 |
| opendevreview | Christian Schwede proposed openstack/swift feature/threaded: WIP: Add dummy Timeout exception https://review.opendev.org/c/openstack/swift/+/959194 | 10:10 |
| opendevreview | Christian Schwede proposed openstack/swift feature/threaded: WIP: Replace eventlet imports with native Python modules https://review.opendev.org/c/openstack/swift/+/959193 | 10:10 |
| opendevreview | Elod Illes proposed openstack/swift unmaintained/2023.1: CI: Fix py27/py36/py37 jobs https://review.opendev.org/c/openstack/swift/+/959327 | 14:40 |
| opendevreview | Clay Gerrard proposed openstack/swift master: s3api: fix test_service with pre-existing buckets https://review.opendev.org/c/openstack/swift/+/958291 | 17:01 |
| *** Guest22851 is now known as diablo_rojo_phone | 18:12 | |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters https://review.opendev.org/c/openstack/swift/+/930918 | 18:46 |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters https://review.opendev.org/c/openstack/swift/+/930918 | 18:49 |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters https://review.opendev.org/c/openstack/swift/+/930918 | 19:01 |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: Provide some s3 helper methods for other middlewares to use. https://review.opendev.org/c/openstack/swift/+/940791 | 19:27 |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters https://review.opendev.org/c/openstack/swift/+/930918 | 19:28 |
| *** timburke_ is now known as timburke | 19:39 | |
| opendevreview | Merged openstack/swift unmaintained/2023.1: CI: Fix py27/py36/py37 jobs https://review.opendev.org/c/openstack/swift/+/959327 | 19:53 |
| opendevreview | Merged openstack/swift master: tests: Skip some tests if crc32c is not available https://review.opendev.org/c/openstack/swift/+/958912 | 20:04 |
| opendevreview | Tim Burke proposed openstack/swift master: AUTHORS/CHANGELOG for 2.36.0 https://review.opendev.org/c/openstack/swift/+/956333 | 20:15 |
| opendevreview | Shreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters https://review.opendev.org/c/openstack/swift/+/930918 | 20:45 |
| opendevreview | Tim Burke proposed openstack/swift stable/2025.1: tests: Skip some tests if crc32c is not available https://review.opendev.org/c/openstack/swift/+/959391 | 21:00 |
| timburke | #startmeeting swift | 21:00 |
| opendevmeet | Meeting started Wed Sep 3 21:00:41 2025 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
| opendevmeet | The meeting name has been set to 'swift' | 21:00 |
| timburke | who's here for the swift team meeting? | 21:00 |
| mattoliver | o/ | 21:01 |
| timburke | as usual the agenda's at | 21:02 |
| timburke | #link https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
| timburke | first up | 21:02 |
| timburke | #topic vPTG | 21:02 |
| timburke | as a reminder, it's just under two months out now | 21:02 |
| timburke | oct 27-31 | 21:02 |
| timburke | but there's more than just a reminder this time! there's a call to action! | 21:03 |
| timburke | for i've got the skeleton of an etherpad up for topics | 21:03 |
| timburke | #link https://etherpad.opendev.org/p/swift-ptg-gazpacho | 21:03 |
| mattoliver | Oh nice | 21:03 |
| timburke | so please, add topics you'd like to discuss! | 21:04 |
| timburke | next up | 21:04 |
| timburke | #topic releases | 21:04 |
| timburke | it's getting to that point in the cycle where we really need to get code shipped :-) | 21:04 |
| timburke | liberasurecode 1.7.1 is now released | 21:05 |
| timburke | and the stable branch for swiftclient has been cut | 21:05 |
| mattoliver | Time to fix up and promote the priority reviews page? | 21:05 |
| timburke | but i still need to do a swift release | 21:05 |
| timburke | always a good time for that :-) i really should do it more regularly | 21:06 |
| timburke | but i think i like where things are at the moment. we could always do more, and if anyone wants to slip something in this week, i can update notes for it, but i think i'd be content once we get the notes i've written so far merged | 21:07 |
| timburke | #link https://review.opendev.org/c/openstack/swift/+/956333 | 21:07 |
| mattoliver | Kk | 21:08 |
| timburke | if you've got a sec to spot-check those (make sure i didn't misspell something, or that i explained things well enough), i'd appreciate it | 21:08 |
| mattoliver | Will do! | 21:08 |
| timburke | thanks jianjian for taking a look last week! | 21:08 |
| timburke | now it's got the reno-ified notes, too, though | 21:09 |
| jianjian | no problem, I also have some nits, good time to add since Matt is going to take a look as well :-) | 21:09 |
| timburke | #link https://1fa741547dce2186b901-c02cba5f61e61aa034234ed930eebdcc.ssl.cf1.rackcdn.com/openstack/d2a082b2322e47a791f7d98062e50b34/docs/current.html | 21:09 |
| timburke | but there's even more! i'd also like to get a pyeclib release out, so it can offer the new backend in liberasurecode 1.7.x | 21:10 |
| timburke | #link https://review.opendev.org/c/openstack/pyeclib/+/958706 | 21:11 |
| mattoliver | Oh yeah, getting the new backend out would be awesome too | 21:11 |
| jianjian | nice! | 21:11 |
| timburke | all right, next up | 21:12 |
| timburke | #topic eventlet removal | 21:12 |
| timburke | a bunch of us (clayg, jianjian, and i) met with cschwede last week to ask about where things stand and how we could help out | 21:13 |
| timburke | sounds like things are going fairly well, as POCs go! account server is looking pretty good | 21:15 |
| timburke | one of the things to come out of that was to spin up a feature branch where we can all hack on it | 21:15 |
| timburke | so we now have a feature/threaded branch! | 21:15 |
| timburke | and cschwede has pushed up a chain to get us started | 21:16 |
| timburke | #link https://review.opendev.org/q/project:openstack/swift+branch:feature/threaded+is:open | 21:16 |
| mattoliver | you know we're getting serious when we now have 2 feature branches running! | 21:17 |
| timburke | we're going to have a feature flag for it (currently implemented as an env var), that way we don't have to dive into PUT+POST+POST right away | 21:17 |
| mattoliver | oh nice idea | 21:18 |
| jianjian | seems like a clean addition, great start from cschwede | 21:18 |
| mattoliver | is there like a one-pager, overview or doc on the main idea? | 21:18 |
| timburke | and i think i'll try to get some of the gate jobs running with gunicorn for the account server this week | 21:18 |
| timburke | mattoliver, i don't think we have that at the moment -- good idea | 21:18 |
| jianjian | for me, I have been thinking of re-running my previous benchmark tests with newer python versions to check for improvements. I started with Uvicorn (even though we probably won’t adopt it for various reasons) and saw a significant throughput gain when using py3.13 with the uvloop scheduler. I’ll be testing Gunicorn next. | 21:19 |
| mattoliver | it'd just help people collab if there is at least a known rough idea. | 21:19 |
| mattoliver | although git diff between branches and gunicorn is a good start | 21:20 |
| timburke | all right, one last thing that's been keeping me preoccupied lately | 21:21 |
| timburke | #topic potential EC data loss | 21:21 |
| timburke | we've actually had this come up in two ways | 21:22 |
| timburke | in the first, we've got two clients trying to write to the same object with the same timestamp | 21:22 |
| timburke | the bad news is, the writes can interleave in such a way that both writes return 201 to the client yet we don't have enough fragments to reconstruct either of them -- in an 8+4 policy, we might have 7 of one and 5 of the other | 21:24 |
| timburke | i went searching for bugs about it and apparently we hadn't written anything up about it -- but it sounds a lot like | 21:25 |
| timburke | #link https://bugs.launchpad.net/swift/+bug/1971686 | 21:25 |
| mattoliver | I know other distributed eventual systems use a thing called a Lamport clock. That basically boils down to a node having a counter or unique number they also send to break collision deadlocks. And swift already allows this with the timestamp offsets.. so I've been having a play: https://review.opendev.org/c/openstack/swift/+/959009 | 21:26 |
| timburke | the good news is, they were trying to write the same data. but with encryption enabled, they each used different keys/ivs, so the proxy bombs out on read when it notices the mismatched etags | 21:26 |
| mattoliver | That's just using randint because I haven't got something more generic like machine ids working properly: they're unique but too big an int. | 21:26 |
| timburke | yep, thanks mattoliver! and clayg's been playing with a probe test to prompt the error: https://review.opendev.org/c/openstack/swift/+/927327 | 21:27 |
| mattoliver | you guys have been amazing, digging into the EC details and getting things fixed! | 21:27 |
| mattoliver | at that end | 21:27 |
| timburke | one trouble i see, though, is that there's nothing to stop both clients going through the same proxy (as with the probe test) | 21:28 |
| mattoliver | yeah, I got it different per worker | 21:28 |
| jianjian | the new improvement from timburke on the repair tool is great 👍 | 21:28 |
| mattoliver | but I guess I need to add some more randomness via eventlet | 21:28 |
| timburke | yeah, so you're beating me to the punch a little :-) despite not having any set of frags large enough to rebuild using the normal machinery, we did find a way to fix them | 21:29 |
| jianjian | that's very smart | 21:30 |
| mattoliver | you guys are bloody geniuses, especially you timburke to figure it all out! | 21:30 |
| timburke | basically, since they've always been collisions with the same plaintext user data and we've been using the systematic isa-l codes, we could grab just the data fragments, do some math to figure out the right offsets, and decrypt the frags directly without involving pyeclib at all | 21:31 |
| mattoliver | :mind-blown: | 21:33 |
| timburke | that worked for most cases, but every now and then we had an object that lost a data frag. *then* we can figure out which key/iv has the most parity, re-encrypt any other data frags to use that key/iv, and then send it all through pyeclib and a decrypter like a proxy would | 21:34 |
| timburke | so far i don't think we've had anything that we couldn't fix that way | 21:35 |
| jianjian | \o/ | 21:35 |
| timburke | but it's still treating the symptoms, not the root cause. so expect more work on that front (thanks for thinking about lamport clocks, mattoliver!) | 21:36 |
| mattoliver | incredible! | 21:36 |
| jianjian | +1, will look into matt's new patches | 21:37 |
| timburke | the other case is a little more scary. we've *also* seen the occasional fragment that's just bad. like, the frag-level etag matches, but trying to use it when decoding gives us data that doesn't match the object-level etag | 21:38 |
| mattoliver | well the fact that you got as far as you did is way, way better than I'd ever expect you to get. | 21:39 |
| timburke | we're still searching for explanations. the good news is that there are usually enough other frags such that we can find *some* set that still works | 21:39 |
| jianjian | probably that was some kind of bit rot or disk corruption | 21:40 |
| mattoliver | yeah, I guess if you gather all the frags (even on handoffs) you can then brute force the combinations until it works | 21:40 |
| mattoliver | yeah | 21:40 |
| timburke | maybe? but then i would've expected our normal machinery to quarantine it, and the frag-level etag to not match | 21:41 |
| timburke | (we've even had a case where an object hit *both* of these issues, so clayg enhanced our repair tool to be able to exclude specific frags) | 21:41 |
| mattoliver | Re uniqueness in green threads, I wonder if I could just do something simple like crc the txid and add that to the offset to unique-ify between greenthreads :hmm: Need to add something to make it more unique. Does each thread have an id in eventlet land or something? | 21:43 |
| mattoliver | Anyway, my patch is just a start at playing with the idea. don't think it's unique enough yet, well it can be.. but not happy with it yet. | 21:43 |
| mattoliver | but we have Timestamp objects with offsets plumbed all the way to diskfile, so it kinda just works. Just weird seeing data files on disk with an offset included. | 21:44 |
| timburke | i think our leading suspects are an issue with reconstruction, where one of the frags we reach for *does* get quarantined but we still send data reconstructed from it to the destination with a fresh, good etag; or, some kind of weird bit-flip somewhere in the proxy, where the frag was always corrupted as soon as we got it back from pyeclib (but due to memory problems, not pyeclib) | 21:45 |
| timburke | mattoliver, yeah, iirc greenthreads have some kind of id -- if nothing else, you could surely use python's id() on it | 21:46 |
| mattoliver | oh yeah! | 21:47 |
| timburke | another thought i had was to make diskfile more brittle: prevent it from linking over an existing file | 21:47 |
| mattoliver | that's not a bad idea either. So basically in this case, both would fail | 21:48 |
| mattoliver | unless enough frags for each were written to handoffs | 21:48 |
| mattoliver | *both requests | 21:48 |
| timburke | yeah -- getting 503s back out to the client and prompting a retry | 21:48 |
| timburke | hmm... true... | 21:48 |
| timburke | so, might be good to do both :D | 21:49 |
| timburke | all right, that's all i've got, and we've only got about 10mins left | 21:49 |
| timburke | #topic open discussion | 21:49 |
| timburke | anything else we ought to bring up this week? | 21:49 |
| mattoliver | I've respun the ring/ringdata patch and playing with using a reentrant lock so the ring can't update mid get_nodes/get_more_nodes | 21:51 |
| mattoliver | #link https://review.opendev.org/c/openstack/swift/+/955263 | 21:51 |
| mattoliver | oops, wrong one | 21:51 |
| mattoliver | #link https://review.opendev.org/c/openstack/swift/+/957291 | 21:51 |
| timburke | nice -- i'll try to have a look soon | 21:52 |
| timburke | the stable gates that were broken should be fixed now. i saw elodilles also backported the fix to the most-recent unmaintained branch | 21:53 |
| mattoliver | oh cool | 21:54 |
| timburke | oh yeah, and the guy that added the new libec backend is working on another one | 21:55 |
| timburke | #link https://review.opendev.org/c/openstack/liberasurecode/+/959280 | 21:55 |
| timburke | if you're interested, maybe start with https://en.wikipedia.org/wiki/Locally_recoverable_code (at least for an overview of the goals) | 21:56 |
| mattoliver | thanks, yeah, I'll need an overview | 21:56 |
| timburke | it kinda sounds like a better take on our ec duplication factor -- which did always feel like overkill | 21:57 |
| mattoliver | But nice to see so much liberasure love | 21:57 |
| timburke | agree | 21:57 |
| timburke | all right, i think i oughta call it | 21:57 |
| timburke | thank you all for coming, and thank you for working on swift! | 21:57 |
| timburke | #endmeeting | 21:57 |
| opendevmeet | Meeting ended Wed Sep 3 21:57:51 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:57 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.html | 21:57 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.txt | 21:57 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.log.html | 21:57 |
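
The timestamp-offset tie-break discussed under "potential EC data loss" can be sketched roughly as follows. This is a minimal illustration and not the code in the linked patch: `tie_break_offset`, `order_key`, and the `worker_token` parameter are hypothetical names, and it uses a CRC of the transaction id (one idea floated in the meeting) instead of `randint`. The only assumption about Swift is the one stated in the log, that a timestamp can carry an extra offset used for ordering.

```python
import zlib


def tie_break_offset(txid: str, worker_token: int = 0) -> int:
    """Derive a deterministic 32-bit offset from a transaction id.

    worker_token is a hypothetical per-worker or per-greenthread value
    mixed in so two requests handled by the same proxy still differ.
    """
    data = f"{txid}:{worker_token}".encode("utf-8")
    return zlib.crc32(data) & 0xFFFFFFFF


def order_key(timestamp: float, offset: int) -> tuple:
    """Order writes by timestamp first, then by offset as the tie-breaker."""
    return (timestamp, offset)


# Two writers that happen to pick the exact same timestamp.
ts = 1756933241.12345
write_a = order_key(ts, tie_break_offset("tx-aaaa", worker_token=1))
write_b = order_key(ts, tie_break_offset("tx-bbbb", worker_token=2))

# With distinct offsets, every node can agree on a single winner
# instead of treating the two writes as the same version.
winner = max(write_a, write_b)
```

A CRC is not guaranteed to be unique, so this only makes collisions unlikely; it does not rule them out, which matches the "not unique enough yet" caveat raised in the meeting.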
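
The "gather all the frags, then brute force the combinations until it works" idea from the bad-fragment discussion can be illustrated with a toy example. This is a sketch under loud assumptions: it uses a trivial k+1 XOR parity code in place of the ISA-L codes and pyeclib mentioned in the log, and `encode`, `decode`, and `find_good_subset` are hypothetical helpers, not part of Swift or its repair tool. It only shows the search strategy: try each set of k fragments and keep the first one whose decoded data matches the object-level MD5 etag.

```python
import hashlib
from itertools import combinations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Toy code: k equal data fragments plus one XOR parity fragment."""
    assert len(data) % k == 0
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for frag in frags[1:]:
        parity = xor_bytes(parity, frag)
    return frags + [parity]


def decode(available: dict, k: int) -> bytes:
    """Rebuild the data from exactly k of the k+1 fragments, keyed by index."""
    frags = dict(available)
    missing = [i for i in range(k) if i not in frags]
    if missing:
        # One data fragment is absent: XOR of all the others recreates it.
        (idx,) = missing
        rebuilt = None
        for frag in available.values():
            rebuilt = frag if rebuilt is None else xor_bytes(rebuilt, frag)
        frags[idx] = rebuilt
    return b"".join(frags[i] for i in range(k))


def find_good_subset(all_frags: dict, k: int, expected_etag: str):
    """Try every k-sized subset until the decoded data matches the etag."""
    for idxs in combinations(sorted(all_frags), k):
        candidate = decode({i: all_frags[i] for i in idxs}, k)
        if hashlib.md5(candidate).hexdigest() == expected_etag:
            return idxs, candidate
    return None


data = b"0123456789abcdef" * 3
etag = hashlib.md5(data).hexdigest()
frags = dict(enumerate(encode(data, k=4)))

# Simulate a silently corrupted fragment: flip one bit in fragment 1.
frags[1] = bytes([frags[1][0] ^ 0x01]) + frags[1][1:]

good_idxs, recovered = find_good_subset(frags, 4, etag)
```

Here the search settles on the subset that leaves out the corrupted fragment, which is the same effect as telling a repair tool to exclude specific frags. With a real 8+4 policy the number of subsets is much larger (up to 495 sets of 8 from 12), but still small enough to enumerate.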