Wednesday, 2025-09-03

opendevreviewLhoussain AIT ASSOU proposed openstack/liberasurecode master: prepare LRC backend  https://review.opendev.org/c/openstack/liberasurecode/+/95928007:23
opendevreviewChristian Schwede proposed openstack/swift feature/threaded: WIP: Add gunicorn as wsgi server  https://review.opendev.org/c/openstack/swift/+/95919210:10
opendevreviewChristian Schwede proposed openstack/swift feature/threaded: WIP: Add dummy Timeout exception  https://review.opendev.org/c/openstack/swift/+/95919410:10
opendevreviewChristian Schwede proposed openstack/swift feature/threaded: WIP: Replace eventlet imports with native Python modules  https://review.opendev.org/c/openstack/swift/+/95919310:10
opendevreviewElod Illes proposed openstack/swift unmaintained/2023.1: CI: Fix py27/py36/py37 jobs  https://review.opendev.org/c/openstack/swift/+/95932714:40
opendevreviewClay Gerrard proposed openstack/swift master: s3api: fix test_service with pre-existing buckets  https://review.opendev.org/c/openstack/swift/+/95829117:01
*** Guest22851 is now known as diablo_rojo_phone18:12
opendevreviewShreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters  https://review.opendev.org/c/openstack/swift/+/93091818:46
opendevreviewShreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters  https://review.opendev.org/c/openstack/swift/+/93091818:49
opendevreviewShreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters  https://review.opendev.org/c/openstack/swift/+/93091819:01
opendevreviewShreeya Deshpande proposed openstack/swift master: Provide some s3 helper methods for other middlewares to use.  https://review.opendev.org/c/openstack/swift/+/94079119:27
opendevreviewShreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters  https://review.opendev.org/c/openstack/swift/+/93091819:28
*** timburke_ is now known as timburke19:39
opendevreviewMerged openstack/swift unmaintained/2023.1: CI: Fix py27/py36/py37 jobs  https://review.opendev.org/c/openstack/swift/+/95932719:53
opendevreviewMerged openstack/swift master: tests: Skip some tests if crc32c is not available  https://review.opendev.org/c/openstack/swift/+/95891220:04
opendevreviewTim Burke proposed openstack/swift master: AUTHORS/CHANGELOG for 2.36.0  https://review.opendev.org/c/openstack/swift/+/95633320:15
opendevreviewShreeya Deshpande proposed openstack/swift master: proxy-logging: Add real-time transfer bytes counters  https://review.opendev.org/c/openstack/swift/+/93091820:45
opendevreviewTim Burke proposed openstack/swift stable/2025.1: tests: Skip some tests if crc32c is not available  https://review.opendev.org/c/openstack/swift/+/95939121:00
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Sep  3 21:00:41 2025 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift team meeting?21:00
mattolivero/21:01
timburkeas usual the agenda's at21:02
timburke#link https://wiki.openstack.org/wiki/Meetings/Swift21:02
timburkefirst up21:02
timburke#topic vPTG21:02
timburkeas a reminder, it's just under two months out now21:02
timburkeoct 27-3121:02
timburkebut there's more than just a reminder this time! there's a call to action!21:03
timburkefor i've got the skeleton of an etherpad up for topics21:03
timburke#link https://etherpad.opendev.org/p/swift-ptg-gazpacho21:03
mattoliverOh nice21:03
timburkeso please, add topics you'd like to discuss!21:04
timburkenext up21:04
timburke#topic releases21:04
timburkeit's getting to that point in the cycle where we really need to get code shipped :-)21:04
timburkeliberasurecode 1.7.1 is now released21:05
timburkeand the stable branch for swiftclient has been cut21:05
mattoliverTime to fix up and promote the priority reviews page? 21:05
timburkebut i still need to do a swift release21:05
timburkealways a good time for that :-) i really should do it more regularly21:06
timburkebut i think i like where things are at the moment. we could always do more, and if anyone wants to slip something in this week, i can update notes for it, but i think i'd be content once we get the notes i've written so far merged21:07
timburke#link https://review.opendev.org/c/openstack/swift/+/95633321:07
mattoliverKk21:08
timburkeif you've got a sec to spot-check those (make sure i didn't misspell something, or that i explained things well enough), i'd appreciate it21:08
mattoliverWill do! 21:08
timburkethanks jianjian for taking a look last week!21:08
timburkenow it's got the reno-ified notes, too, though21:09
jianjianno problem, I also have some nits, good time to add since Matt is going to take a look as well :-)21:09
timburke#link https://1fa741547dce2186b901-c02cba5f61e61aa034234ed930eebdcc.ssl.cf1.rackcdn.com/openstack/d2a082b2322e47a791f7d98062e50b34/docs/current.html21:09
timburkebut there's even more! i'd also like to get a pyeclib release out, so it can offer the new backend in liberasurecode 1.7.x21:10
timburke#link https://review.opendev.org/c/openstack/pyeclib/+/95870621:11
mattoliverOh yeah, getting the new backend out would be awesome too21:11
jianjiannice!21:11
timburkeall right, next up21:12
timburke#topic eventlet removal21:12
timburkea bunch of us (clayg, jianjian, and i) met with cschwede last week to ask about where things stand and how we could help out21:13
timburkesounds like things are going fairly well, as POCs go! account server is looking pretty good21:15
timburkeone of the things to come out of that was to spin up a feature branch where we can all hack on it21:15
timburkeso we now have a feature/threaded branch!21:15
timburkeand cschwede has pushed up a chain to get us started21:16
timburke#link https://review.opendev.org/q/project:openstack/swift+branch:feature/threaded+is:open21:16
mattoliveryou know we're getting serious when we now have 2 feature branches running!21:17
timburkewe're going to have a feature flag for it (currently implemented as an env var), that way we don't have to dive into PUT+POST+POST right away21:17
mattoliveroh nice idea21:18
jianjianseems like a clean addition, great start from cschwede21:18
mattoliveris there like a one-pager, overview or doc on the main idea?21:18
timburkeand i think i'll try to get some of the gate jobs running with gunicorn for the account server this week21:18
timburkemattoliver, i don't think we have that at the moment -- good idea21:18
jianjianfor me, I have been thinking of re-running my previous benchmark tests with newer python versions to check for improvements. I started with Uvicorn (even though we probably won’t adopt it for various reasons) and saw a significant throughput gain when using py3.13 with the uvloop scheduler. I’ll be testing Gunicorn next.21:19
mattoliverjust help people collab if there is at least a known rough idea. 21:19
mattoliveralthough git diff between branches and gunicorn is a good start21:20
timburkeall right, one last thing that's been keeping me preoccupied lately21:21
timburke#topic potential EC data loss21:21
timburkewe've actually had this come up in two ways21:22
timburkein the first, we've got two clients trying to write to the same object with the same timestamp21:22
timburkethe bad news is, the writes can interleave in such a way that both writes return 201 to the client yet we don't have enough fragments to reconstruct either of them -- in an 8+4 policy, we might have 7 of one and 5 of the other21:24
timburkei went searching for bugs about it and apparently we hadn't written anything up about it -- but it sounds a lot like21:25
timburke#link https://bugs.launchpad.net/swift/+bug/197168621:25
mattoliverI know other distributed eventual systems use a thing called a Lamport clock. That basically boils down to a node having a counter or unique number they also send to break collision deadlocks. And swift already allows this with the timestamp offsets.. so I've been having a play: https://review.opendev.org/c/openstack/swift/+/959009 21:26
timburkethe good news is, they were trying to write the same data. but with encryption enabled, they each used different keys/ivs, so the proxy bombs out on read when it notices the mismatched etags21:26
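
The 7-versus-5 split in the 8+4 example can be checked with a little arithmetic. This is a minimal sketch, assuming any 8 fragments from the same write are enough to decode; `decodable_writes` and the "A"/"B" owner labels are hypothetical names used only for illustration and are not part of Swift.

```python
from collections import Counter


def decodable_writes(fragment_owners, ndata):
    """Return the writes that still have at least ndata fragments on disk."""
    counts = Counter(fragment_owners)
    return sorted(w for w, n in counts.items() if n >= ndata)


NDATA, NPARITY = 8, 4

# Twelve fragment slots; the interleaved writes A and B each "won" some.
owners = ["A"] * 7 + ["B"] * 5
assert len(owners) == NDATA + NPARITY

# Neither write reaches the 8-fragment threshold, so nothing is decodable.
print(decodable_writes(owners, NDATA))  # []
```

Both clients saw a 201, yet the result is an empty list: with 12 slots shared between two writes, any split from 5/7 to 7/5 leaves neither side with 8 fragments.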
mattoliverThat's just using randint because I haven't got something more generic like machine ids working properly because they're too big an int but unique. 21:26
timburkeyep, thanks mattoliver! and clayg's been playing with a probe test to prompt the error: https://review.opendev.org/c/openstack/swift/+/92732721:27
mattoliveryou guys have been amazing, digging into the EC details and getting things fixed!21:27
mattoliverat that end21:27
timburkeone trouble i see, though, is that there's nothing to stop both clients going through the same proxy (as with the probe test)21:28
mattoliveryeah, I got it different per worker21:28
jianjianthe new improvement from timburke on the repair tool is great 👍21:28
mattoliverbut I guess I need to add some more randomness via eventlet21:28
timburkeyeah, so you're beating me to the punch a little :-) despite not having any set of frags large enough to rebuild using the normal machinery, we did find a way to fix them21:29
jianjianthat's very smart21:30
mattoliveryou guys are bloody geniuses, especially you timburke to figure it all out!21:30
timburkebasically, since they've always been collisions with the same plaintext user data and we've been using the systematic isa-l codes, we could grab just the data fragments, do some math to figure out the right offsets, and decrypt the frags directly without involving pyeclib at all21:31
mattoliver:mind-blown: 21:33
timburkethat worked for most cases, but every now and then we had an object that lost a data frag. *then* we can figure out which key/iv has the most parity, re-encrypt any other data frags to use that key/iv, and then send it all through pyeclib and a decrypter like a proxy would21:34
timburkeso far i don't think we've had anything that we couldn't fix that way21:35
jianjian\o/21:35
timburkebut it's still treating the symptoms, not the root cause. so expect more work on that front (thanks for thinking about lamport clocks, mattoliver!)21:36
mattoliverincredible! 21:36
jianjian+1, will look into matt's new patches21:37
timburkethe other case is a little more scary. we've *also* seen the occasional fragment that's just bad. like, the frag-level etag matches, but trying to use it when decoding gives us data that doesn't match the object-level etag21:38
mattoliverwell the fact that you got as far as you did is way way better than I'd ever expect you to get. 21:39
timburkewe're still searching for explanations. the good news is that there are usually enough other frags such that we can find *some* set that still works21:39
jianjianprobably that was some kind of bit rot or disk corruption21:40
mattoliveryeah, I guess if you gather all frags (even on handoffs) then brute force the combinations until it works21:40
mattoliveryeah21:40
timburkemaybe? but then i would've expected our normal machinery to quarantine it, and the frag-level etag to not match21:41
timburke(we've even had a case where an object hit *both* of these issues, so clayg enhanced our repair tool to be able to exclude specific frags)21:41
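
The brute-force idea (try combinations of fragments until the decoded body matches the object-level etag, optionally excluding known-bad frags) can be sketched as follows. This is only an illustration under stated assumptions: `find_working_subset` and `toy_decode` are hypothetical names, the toy code is a 2+1 XOR parity scheme rather than the isa-l codes discussed here, and this is not the actual repair tool.

```python
import hashlib
from itertools import combinations


def find_working_subset(frags, ndata, decode, expected_etag, exclude=()):
    """Try ndata-sized subsets of frags until one decodes to expected_etag."""
    candidates = [i for i in sorted(frags) if i not in exclude]
    for subset in combinations(candidates, ndata):
        try:
            body = decode({i: frags[i] for i in subset})
        except Exception:
            continue  # this combination could not be decoded at all
        if hashlib.md5(body).hexdigest() == expected_etag:
            return subset
    return None


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_decode(subset):
    """Toy 2+1 code: frags 0 and 1 are data halves, frag 2 is their XOR."""
    if 0 in subset and 1 in subset:
        return subset[0] + subset[1]
    if 0 in subset:
        return subset[0] + xor(subset[0], subset[2])
    return xor(subset[1], subset[2]) + subset[1]


body = b"swiftobj"
etag = hashlib.md5(body).hexdigest()
frags = {0: body[:4], 1: body[4:], 2: xor(body[:4], body[4:])}
frags[1] = b"tobX"  # simulate a silently corrupted fragment

print(find_working_subset(frags, 2, toy_decode, etag))  # (0, 2)
```

Here the search skips the (0, 1) pair because its decoded body does not match the etag, then succeeds with (0, 2). The `exclude` argument mirrors the idea of telling the tool to ignore specific frags.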
mattoliverRe uniqueness in green threads, I wonder if I could just do something simple like crc the txid and add that to the offset to unique-ify between greenthreads :hmm: Need to add something to make it more unique. Does each thread have an id in eventlet land or something?21:43
mattoliverAnyway, what I have in my patch is just a start at playing with the idea. don't think it's unique enough yet, well it can be.. but not happy with it yet.21:43
mattoliverbut we have Timestamp objects with offsets plumbed all the way to diskfile, so it kinda just works. Just weird seeing data files on disk with an offset included. 21:44
timburkei think our leading suspects are an issue with reconstruction, where one of the frags we reach for *does* get quarantined but we still send data reconstructed from it to the destination with a fresh, good etag; or, some kind of weird bit-flip somewhere in the proxy, where the frag was always corrupted as soon as we got it back from pyeclib (but due to memory problems, not pyeclib)21:45
timburkemattoliver, yeah, iirc greenthreads have some kind of id -- if nothing else, you could surely use python's id() on it21:46
mattoliveroh yeah! 21:47
timburkeanother thought i had was to make diskfile more brittle: prevent it from linking over an existing file21:47
mattoliverthat's not a bad idea either. So basically in this case, both would fail21:48
mattoliverunless enough frags for each were written to handoffs21:48
mattoliver*both requests21:48
timburkeyeah -- getting 503s back out to the client and prompting a retry21:48
timburkehmm... true...21:48
timburkeso, might be good to do both :D21:49
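
The tie-breaking idea behind the timestamp offsets can be sketched in a few lines. This is a hedged illustration only: `ToyTimestamp` and `stamp` are hypothetical stand-ins, not Swift's Timestamp class and not the logic in the patch under review, and the 16-bit offset width is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ToyTimestamp:
    # Compared as a (seconds, offset) tuple, so the offset only matters
    # when two writers share the same wall-clock time.
    seconds: float
    offset: int = 0


def stamp(now, writer_id, offset_bits=16):
    """Attach a per-writer offset; distinct ids can still collide modulo 2**offset_bits."""
    return ToyTimestamp(now, writer_id % (1 << offset_bits))


now = 1756933241.0
a = stamp(now, writer_id=7)
b = stamp(now, writer_id=9)

# Same wall-clock time, but every node agrees on a single winner.
winner = max(a, b)
print(winner.offset)  # 9
```

Because every node orders the two writes the same way, one of them consistently supersedes the other instead of the fragments ending up split between them. The remaining question raised in the discussion, how to pick a `writer_id` that differs between workers and greenthreads, is left open here.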
timburkeall right, that's all i've got, and we've only got about 10mins left21:49
timburke#topic open discussion21:49
timburkeanything else we ought to bring up this week?21:49
mattoliverI've respun the ring/ringdata patch and playing with using a reentrant lock so the ring can't update mid get_nodes/get_more_nodes21:51
mattoliver#link https://review.opendev.org/c/openstack/swift/+/955263 21:51
mattoliveroops wrong one21:51
mattoliver#link https://review.opendev.org/c/openstack/swift/+/957291 21:51
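
The reason for a reentrant lock, rather than a plain one, shows up when one locked method calls another. A minimal sketch, assuming a toy ring: `ToyRing`, its `reload` method and the modulo-based node lookup are all hypothetical and are not Swift's Ring implementation.

```python
import threading


class ToyRing:
    def __init__(self, devs):
        self._lock = threading.RLock()
        self._devs = list(devs)

    def reload(self, devs):
        # Blocks until no lookup is in progress in another thread.
        with self._lock:
            self._devs = list(devs)

    def get_nodes(self, part):
        with self._lock:
            return [self._devs[part % len(self._devs)]]

    def get_more_nodes(self, part):
        with self._lock:
            # Re-enters the lock already held by this thread; a plain
            # threading.Lock would deadlock on this call.
            primary = self.get_nodes(part)
            return [d for d in self._devs if d not in primary]


ring = ToyRing(["a", "b", "c"])
print(ring.get_nodes(1))       # ['b']
print(ring.get_more_nodes(1))  # ['a', 'c']
```

Holding the lock across the whole of `get_more_nodes` means a concurrent `reload` cannot swap the device list between the primary lookup and the handoff selection.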
timburkenice -- i'll try to have a look soon21:52
timburkethe stable gates that were broken should be fixed now. i saw elodilles also backported the fix to the most-recent unmaintained branch21:53
mattoliveroh cool21:54
timburkeoh yeah, and the guy that added the new libec backend is working on another one21:55
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/95928021:55
timburkeif you're interested, maybe start with https://en.wikipedia.org/wiki/Locally_recoverable_code (at least for an overview of the goals)21:56
mattoliverthanks, yeah, I'll need an overview21:56
timburkeit kinda sounds like a better take on our ec duplication factor -- which did always feel like overkill21:57
mattoliverBut nice to see so much liberasure love21:57
timburkeagree21:57
timburkeall right, i think i oughta call it21:57
timburkethank you all for coming, and thank you for working on swift!21:57
timburke#endmeeting21:57
opendevmeetMeeting ended Wed Sep  3 21:57:51 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:57
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.html21:57
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.txt21:57
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2025/swift.2025-09-03-21.00.log.html21:57