21:00:09 <timburke> #startmeeting swift
21:00:09 <opendevmeet> Meeting started Wed Sep 18 21:00:09 2024 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 <opendevmeet> The meeting name has been set to 'swift'
21:00:16 <timburke> who's here for the swift meeting?
21:00:20 <fuleco> o/
21:01:03 <mattoliver> o/
21:01:14 <timburke> as usual, the agenda's at
21:01:17 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01:22 <timburke> first up
21:01:29 <timburke> #topic 2.34.0 release
21:01:54 <timburke> it's out! this is our final release for the dalmatian cycle
21:02:08 <fuleco> 👏🏻
21:02:11 <timburke> there's a lot of good stuff in it
21:02:17 <timburke> #link https://opendev.org/openstack/swift/src/branch/master/CHANGELOG
21:03:09 <mattoliver> \o/
21:03:11 <mattoliver> nice
21:03:19 <mattoliver> best swift ever :)
21:03:30 <mattoliver> or best swift yet
21:03:35 <mattoliver> whatever the saying :P
21:03:43 <timburke> it's one better than the last one!
21:03:50 <mattoliver> -ETOOEARLY :P
21:03:51 <mattoliver> lol
21:04:03 <timburke> next up
21:04:08 <timburke> #topic vPTG
21:04:13 <timburke> it's next month!
21:04:35 <timburke> sorry, it kind of snuck up on me; i should really pay more attention to the mailing list
21:05:00 <mattoliver> Yeah, it seems to do that
21:05:04 <timburke> we aren't on the initial project list
21:05:06 <timburke> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/APAC5ANX4TQLP5R257D6OIADTN6Y5GMS/
21:05:07 <patch-bot> Bug #5 - Plone Placeless Translation Service metadata missing from po files (Fix Released)
21:05:31 <mattoliver> I don't think that's the right line :P
21:05:34 <mattoliver> *link
21:05:58 <mattoliver> I think we need to send an email to the ptg people.
21:06:12 <timburke> but i sent an email as recommended
21:06:27 <mattoliver> oh cool
21:06:41 <mattoliver> want me to set up the etherpads so we can start collecting topics?
21:07:12 <timburke> sure! i started on the general topics one, but if you wanted to pick up the ops-feedback, that'd be great!
21:07:22 <mattoliver> kk
21:07:24 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-epoxy
21:08:17 <timburke> (side note: i also just learned today that our next release name is Epoxy -- i suppose we help hold everything together :-)
21:09:41 <mattoliver> lol, really
21:09:50 <timburke> i'll get a poll up for meeting times this week, though i suspect as usual there just isn't going to be a great time slot
21:10:41 <timburke> oh, and registration is at
21:10:42 <timburke> #link https://ptg2024.openinfra.dev/
21:10:55 <timburke> next up
21:11:03 <timburke> #topic s3api bugs
21:11:29 <timburke> fuleco has been good at finding some current deficiencies in our s3api
21:11:36 <timburke> #link https://bugs.launchpad.net/swift/+bug/2077179
21:11:37 <patch-bot> Bug #2077179 - S3Api - async delete (New)
21:11:39 <timburke> and today
21:11:44 <timburke> #link https://bugs.launchpad.net/swift/+bug/2081103
21:11:44 <patch-bot> Bug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New)
21:13:06 <timburke> i don't think there's much progress on either yet, but i wanted to make sure that people were aware of them, and hopefully we can devote some time toward them before the PTG
21:13:07 <fuleco> Yeah the first one is kind of a feature too
21:13:20 <fuleco> Since it adds a functionality to s3api
21:13:41 <fuleco> But the second one is the weird one in the pack, I've been cracking my head at it for some days now...
21:14:32 <timburke> true enough -- though i'd be willing to bet that AWS does some dribble-out-whitespace trick for long-running multi-deletes, so us *not* matching that behavior could be construed as a bug ;-)
21:15:11 <fuleco> Seems fair XD
21:15:37 <fuleco> Progress on my part has been a little slow, some other issues got my schedule filled
21:15:54 <fuleco> But I hope I can do something on them in the next weeks
21:16:23 <mattoliver> We know the feeling (re: schedules). Nice finds
21:16:33 <timburke> fwiw, the more i think about it, i've got a suspicion that the second one has to do with a Transfer-Encoding: chunked DELETE call, and us re-using the client request environment to PUT the old version back
21:17:06 <fuleco> I do think so too
21:17:15 <timburke> it reminds me a bit of how we (ab)use chunked transfers for EC PUTs
21:17:34 <timburke> https://bugs.launchpad.net/swift/+bug/1496636
21:17:38 <fuleco> On my side, I'll be applying a patch to just add an empty body on the request call to try and deflect that. That would at least give us some more information
21:17:41 <patch-bot> Bug #1496636 - EC: Chunked transfer/commit protocol is *not* HTTP (In Progress)
21:17:44 <fuleco> However
21:18:02 <timburke> which unfortunately means that we don't really want eventlet to fix it properly
21:18:03 <fuleco> I cannot replicate this error in any environment I have access to
21:18:57 <fuleco> So I'll have to wait for it to go to production and maybe get some feedback? Even the original environment where it errored does not consistently report it
21:19:34 <fuleco> But I would be more than happy if anyone has other ideas or can at least replicate the error
21:20:28 <timburke> for sure, having someone replicate the error would be great
21:21:02 <mattoliver> it is weird that it's a timeout and not an eventlet.timeout, and down in eventlet hub.
21:21:33 <fuleco> Just to give a bit of context, it was a bucket with 230GB in files, versions and delete markers, totaling about 3 and a half million entries
21:21:54 <mattoliver> I mean I thought eventlet would've monkey patched timeout there.
21:22:19 <fuleco> Yeah, it is very weird
21:22:49 <timburke> agreed. i might be partly to blame; i think i did a lot of the get-tests-passing-on-py310 work for eventlet
21:23:19 <fuleco> As I was commenting with tim before, my best bet is that it is trying to read a body file that does not exist, but for some reason it registers the read and kinda deadlocks in the IO operation
21:23:24 <timburke> and i kinda remember some funny business around TimeoutError
21:23:40 <fuleco> but I have to admit I didn't go down the eventlet route too much
21:24:32 <mattoliver> good thing we have a timburke who does delve in eventlet upstream :)
21:26:23 <timburke> the chunked transfer thing makes some sense to me -- client sends a DELETE with the 0-byte chunk, then starts reading and blocks
21:26:25 <timburke> server takes the client request and makes the listing to discover the new current version, consuming the client-provided body in the process; server then uses the same client-socket-backed wsgi.input for the PUT, where we try to read a byte that the client will never send
21:26:50 <timburke> (because it would break HTTP)
21:27:48 <mattoliver> oh i see. and handle_put_version is what's then putting the old version back in place?
21:27:49 <timburke> but if the client sends an explicit Content-Length: 0, everything works out fine (because we can "consume" that kind of a body all we want)
21:27:54 <timburke> yup
21:28:47 <timburke> so that's my suspicion; i just need to get my dev environment back to a point that i can test it
21:29:31 <timburke> extra fun will be breaking the HTTP protocol by sending more data, and seeing what happens when symlink balks
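The suspicion above (a server-generated PUT inheriting the client's chunked, already-consumed `wsgi.input`) and fuleco's idea of adding an empty body to the request can be illustrated with a minimal sketch. This is not Swift's actual code: `environ_for_internal_put` is a hypothetical helper, and it only assumes the standard WSGI environ keys.

```python
import io


def environ_for_internal_put(client_environ):
    """Copy a client WSGI environ for a bodyless, server-generated PUT.

    Hypothetical helper for illustration; not part of Swift.
    """
    env = dict(client_environ)  # shallow copy; the client's environ is untouched
    # Drop the chunked framing inherited from the client's DELETE so nothing
    # downstream tries to read another chunk from the client socket.
    env.pop('HTTP_TRANSFER_ENCODING', None)
    env['REQUEST_METHOD'] = 'PUT'
    # An explicit zero-length body can be "consumed" any number of times.
    env['CONTENT_LENGTH'] = '0'
    env['wsgi.input'] = io.BytesIO(b'')
    return env
```

The design point is simply that the subrequest gets its own in-memory input instead of sharing the socket-backed stream, so a read on it returns immediately rather than waiting on bytes the client will never send.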
21:29:47 <opendevreview> Shreeya Deshpande proposed openstack/swift master: Split statsd client from logger  https://review.opendev.org/c/openstack/swift/+/915483
21:29:59 <timburke> speaking of rabbit holes for tim to fall down...
21:30:09 <timburke> #topic liberasurecode segfault
21:31:14 <timburke> i discovered that creating multiple liberasurecode_rs_vand instances leaks memory!
21:31:29 <mattoliver> oh yay :(
21:31:52 <timburke> and worse, destroying any one of them will leave all the others segfault-y
21:32:02 <timburke> like, can't even destroy them
21:32:18 <timburke> patch was reasonably easy, though!
21:32:21 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929193
21:32:21 <patch-bot> patch 929193 - liberasurecode - built-in rs_vand: De-init tables only when last de... - 3 patch sets
21:33:09 <timburke> the test even reads remarkably well, IMHO
21:33:42 <mattoliver> nice one
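The general technique for state shared between instances, where tearing down one instance must not break the others, is reference counting. Below is a minimal Python sketch of that idea; it is not the liberasurecode patch itself (which is C), `SharedTables` and its table contents are hypothetical, and the lock is an extra assumption of the sketch.

```python
import threading


class SharedTables:
    """Hypothetical stand-in for lookup tables shared by all instances."""

    _lock = threading.Lock()
    _refcount = 0
    _tables = None

    @classmethod
    def acquire(cls):
        """Build the tables for the first user; later users share them."""
        with cls._lock:
            if cls._refcount == 0:
                cls._tables = list(range(256))  # placeholder contents
            cls._refcount += 1
            return cls._tables

    @classmethod
    def release(cls):
        """Free the tables only when the last user lets go."""
        with cls._lock:
            if cls._refcount == 0:
                raise RuntimeError('release() without matching acquire()')
            cls._refcount -= 1
            if cls._refcount == 0:
                cls._tables = None
```

With this shape, creating several instances allocates the tables once, and destroying any one of them leaves the rest usable.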
21:33:47 <timburke> i discovered this because i was trying to write an even more complicated test around...
21:33:58 <timburke> #topic liberasurecode thread-safety
21:34:15 <timburke> we've known for a while that it's not thread safe
21:34:24 <timburke> #link https://bugs.launchpad.net/liberasurecode/+bug/1954351
21:34:25 <patch-bot> Bug #1954351 - Multiple APIs not thread-safe (In Progress)
21:34:56 <timburke> when i wrote that, i didn't realize just *how* not-thread-safe it is
21:36:38 <timburke> the one lock we've got currently guards just a single data structure, but i'm increasingly convinced that it should guard all usages of backend instances (the structures that get looked up via the descriptor numbers we return)
21:37:26 <timburke> mostly because we can't trust that whatever backend initialization/cleanup will be thread-safe
21:37:45 <mattoliver> cool, haven't looked at liberasurecode in a while. might need to reload what I can in my head :) But yeah test reads pretty good for c :P
21:38:45 <timburke> the fix i wrote for the built-in rs-vand implementation isn't, though i could add another lock and make it better. but i also discovered that jerasurecode isn't, and i wouldn't be surprised if isa-l had similar trouble
21:39:30 <timburke> i've got a few patches now to improve things; the first two seem pretty ready-to-go
21:39:31 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929324
21:39:32 <patch-bot> patch 929324 - liberasurecode - Fix write locking when destroying instances - 2 patch sets
21:39:35 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929325
21:39:36 <patch-bot> patch 929325 - liberasurecode - Fix write locking when creating instances - 2 patch sets
21:40:00 <timburke> the last one goes and adds all the missing read-locks, but is still under-tested
21:40:07 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929847
21:40:08 <patch-bot> patch 929847 - liberasurecode - Add read locks when accessing EC descriptors - 1 patch set
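The idea of having one lock guard every use of a backend instance, not just the lookup structure, can be sketched as follows. This is an illustration only: `BackendRegistry` and its methods are hypothetical names, and it uses a plain mutex where the patches above talk about read and write locks, since the Python standard library has no reader-writer lock.

```python
import threading


class BackendRegistry:
    """Hypothetical descriptor -> backend instance table, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}
        self._next_desc = 1

    def create(self, backend):
        """Register a backend and hand back an opaque integer descriptor."""
        with self._lock:
            desc = self._next_desc
            self._next_desc += 1
            self._instances[desc] = backend
            return desc

    def destroy(self, desc):
        """Unregister a backend; raises KeyError for an unknown descriptor."""
        with self._lock:
            del self._instances[desc]

    def use(self, desc, func):
        """Run func(backend) while holding the lock, so the instance cannot
        be destroyed out from under the caller."""
        with self._lock:
            return func(self._instances[desc])
```

Callers only ever see the descriptor, which matches the point made later in the meeting that backend instances should not be handled directly. Holding a mutex for the whole call serializes all users; a reader-writer lock would let concurrent `use` calls proceed while still excluding `create` and `destroy`.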
21:41:41 <timburke> the one hesitation i've got with those first two is that we'll remove some functions that are currently exposed
21:42:05 <timburke> liberasurecode_backend_instance_register and liberasurecode_backend_instance_unregister specifically
21:42:14 <mattoliver> yeah, I just saw that
21:42:51 <timburke> but these are functions that act on ec_backend_t instances, which callers should not be using directly -- that's the whole point of the descriptors we return
21:43:10 <timburke> the more i looked into it, the more i realized
21:43:32 <timburke> 1, we expose a whole lot more symbols than we really want to maintain as part of our API
21:44:24 <timburke> and 2, we really need to have a way to track what symbols we're publishing and make it obvious that we're adding to or removing from that list
21:45:13 <timburke> FWIW, i did a bit of digging and discovered that we've previously removed (well, renamed) symbols
21:45:34 <timburke> see https://github.com/openstack/liberasurecode/commit/a6a8d201 -- is_valid_fragment_metadata was renamed to is_invalid_fragment_metadata
21:45:58 <timburke> (yes, really)
21:46:14 <fuleco> Oh no
21:46:27 <mattoliver> wow, go us
21:47:02 <timburke> that was back in 1.2.0 -- and so far as i know, no one called us out for not going to a 2.0 release instead
21:47:56 <timburke> i *think* that the main reason for that is that we have so few consumers -- checking a variety of package managers, i saw at most liberasurecode-dev(el) and pyeclib as reverse dependencies
21:48:45 <timburke> so, provided we don't break pyeclib, maybe we're good to slim things down to the API we actually want to maintain?
21:49:18 <timburke> probably a thing to discuss at the PTG
21:49:55 <mattoliver> I guess the other option is to document and expect that anyone using the register/unregister functions must use some kind of locking, like you're now doing, and so should safely be able to just call them.
21:50:14 <mattoliver> but that still leaves them as non thread-safe blast zones.
21:50:59 <mattoliver> or pull them out of the headers and they just become helper methods for us. We can always add them back if people get annoyed.
21:51:12 <mattoliver> or just rip em out like you've done.
21:51:26 <mattoliver> either way, I'll take a closer look and yeah great topic for PTG
21:51:35 <timburke> yeah, i'd kinda really prefer to lock down the API: make sure that we only expose what we want to expose, and make sure that every entrypoint is threadsafe
21:52:08 <mattoliver> +1
21:52:26 <timburke> all this was done because i realized that getting pyeclib to properly support free-threaded python probably needs to start with thread-safety for liberasurecode
21:52:40 <timburke> which leads me to...
21:52:58 <timburke> #topic test pyeclib on py313t
21:53:23 <timburke> currently on master it doesn't even build on a free-threaded python build
21:54:18 <timburke> this is because i was a little over-zealous in trying to enforce abi3 builds a while back; it was effective, but i should have left it to setuptools to get the #define in there
21:54:26 <timburke> #link https://review.opendev.org/c/openstack/pyeclib/+/929328
21:54:26 <patch-bot> patch 929328 - pyeclib - Stop defining Py_LIMITED_API ourselves - 1 patch set
21:54:36 <timburke> will allow it to build, so that
21:54:43 <timburke> #link https://review.opendev.org/c/openstack/pyeclib/+/927605
21:54:43 <patch-bot> patch 927605 - pyeclib - Test under py313 - 2 patch sets
21:54:54 <timburke> can actually pass
21:55:07 <timburke> though there's a warning about how the GIL got re-enabled
21:55:39 <timburke> so if anyone wants to brush up on their C, there's a bunch of stuff that could use eyes
21:56:03 <timburke> but i've used up a lot of the time already, i ought to open it up for other topics
21:56:08 <timburke> #topic open discussion
21:56:52 <timburke> anything else we ought to bring up this week?
21:56:55 <mattoliver> definitely need to brush up on my c. Nice work tim, really interesting and exciting stuff
21:59:44 <mattoliver> I can't remember what I've been doing. Took time off last week. I started reloading chunked s3api chain and that somehow got me looking at pyeclib wheels. So think I might just try and get some reviews done this week. (I also have a bunch of downstream work on my plate I need to finish up). But next week, hopefully I'll have more updates on things I'm digging into.
22:00:14 <timburke> thanks for looking at both those topics!
22:00:29 <timburke> let me know if you want some high-bandwidth time on either or both of them
22:01:13 <mattoliver> kk, thanks tim
22:01:37 <timburke> all right, i think i'll call it
22:01:47 <timburke> thank you all for coming, and thank you for working on swift!
22:01:51 <timburke> #endmeeting