21:00:09 <timburke> #startmeeting swift
21:00:09 <opendevmeet> Meeting started Wed Sep 18 21:00:09 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 <opendevmeet> The meeting name has been set to 'swift'
21:00:16 <timburke> who's here for the swift meeting?
21:00:20 <fuleco> o/
21:01:03 <mattoliver> o/
21:01:14 <timburke> as usual, the agenda's at
21:01:17 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01:22 <timburke> first up
21:01:29 <timburke> #topic 2.34.0 release
21:01:54 <timburke> it's out! this is our final release for the dalmatian cycle
21:02:08 <fuleco> 👏🏻
21:02:11 <timburke> there's a lot of good stuff in it
21:02:17 <timburke> #link https://opendev.org/openstack/swift/src/branch/master/CHANGELOG
21:03:09 <mattoliver> \o/
21:03:11 <mattoliver> nice
21:03:19 <mattoliver> best swift ever :)
21:03:30 <mattoliver> or best swift yet
21:03:35 <mattoliver> whatever the saying :P
21:03:43 <timburke> it's one better than the last one!
21:03:50 <mattoliver> -ETOOEARLY :P
21:03:51 <mattoliver> lol
21:04:03 <timburke> next up
21:04:08 <timburke> #topic vPTG
21:04:13 <timburke> it's next month!
21:04:35 <timburke> sorry, it kind of snuck up on me; i should really pay more attention to the mailing list
21:05:00 <mattoliver> Yeah, it seems to do that
21:05:04 <timburke> we aren't on the initial project list
21:05:06 <timburke> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/APAC5ANX4TQLP5R257D6OIADTN6Y5GMS/
21:05:07 <patch-bot> Bug #5 - Plone Placeless Translation Service metadata missing from po files (Fix Released)
21:05:31 <mattoliver> I don't think thats the right line :P
21:05:34 <mattoliver> *link
21:05:58 <mattoliver> I think we need to send an email to the ptg people.
21:06:12 <timburke> but i sent an email as recommended
21:06:27 <mattoliver> oh cool
21:06:41 <mattoliver> want me to set up the etherpads so we can start collecting topics?
21:07:12 <timburke> sure! i started on the general topics one, but if you wanted to pick up the ops-feedback, that'd be great!
21:07:22 <mattoliver> kk
21:07:24 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-epoxy
21:08:17 <timburke> (side note: i also just learned today that our next release name is Epoxy -- i suppose we help hold everything together :-)
21:09:41 <mattoliver> lol, really
21:09:50 <timburke> i'll get a poll up for meeting times this week, though i suspect as usual there just isn't going to be a great time slot
21:10:41 <timburke> oh, and registration is at
21:10:42 <timburke> #link https://ptg2024.openinfra.dev/
21:10:55 <timburke> next up
21:11:03 <timburke> #topic s3api bugs
21:11:29 <timburke> fuleco has been good at finding some current deficiencies in our s3api
21:11:36 <timburke> #link https://bugs.launchpad.net/swift/+bug/2077179
21:11:37 <patch-bot> Bug #2077179 - S3Api - async delete (New)
21:11:39 <timburke> and today
21:11:44 <timburke> #link https://bugs.launchpad.net/swift/+bug/2081103
21:11:44 <patch-bot> Bug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New)
21:13:06 <timburke> i don't think there's much progress on either yet, but i wanted to make sure that people were aware of them, and hopefully we can devote some time toward them before the PTG
21:13:07 <fuleco> Yeah the first one is kind of a feature too
21:13:20 <fuleco> Since it adds functionality to s3api
21:13:41 <fuleco> But the second one is the weird one in the pack, I've been cracking my head at it for some days now...
21:14:32 <timburke> true enough -- though i'd be willing to bet that AWS does some dribble-out-whitespace trick for long-running multi-deletes, so us *not* matching that behavior could be construed as a bug ;-)
21:15:11 <fuleco> Seems fair XD
21:15:37 <fuleco> Progress on my part has been a little slow, some other issues got my schedule filled
21:15:54 <fuleco> But I hope I can do something on them in the next weeks
21:16:23 <mattoliver> We know the feeling (re: schedules). Nice finds
21:16:33 <timburke> fwiw, the more i think about it, i've got a suspicion that the second one has to do with a Transfer-Encoding: chunked DELETE call, and us re-using the client request environment to PUT the old version back
21:17:06 <fuleco> I do think so too
21:17:15 <timburke> it reminds me a bit of how we (ab)use chunked transfers for EC PUTs
21:17:34 <timburke> https://bugs.launchpad.net/swift/+bug/1496636
21:17:38 <fuleco> On my side, I'll be applying a patch to just add an empty body on the request call to try and deflect that. That would at least give us some more information
21:17:41 <patch-bot> Bug #1496636 - EC: Chunked transfer/commit protocol is *not* HTTP (In Progress)
21:17:44 <fuleco> However
21:18:02 <timburke> which unfortunately means that we don't really want eventlet to fix it properly
21:18:03 <fuleco> I cannot replicate this error in any environment I have access to
21:18:57 <fuleco> So I'll have to wait for it to go to production and maybe get some feedback? Even the original environment where it errored does not consistently report it
21:19:34 <fuleco> But I would be more than happy if anyone has other ideas or can at least replicate the error
21:20:28 <timburke> for sure, having someone replicate the error would be great
21:21:02 <mattoliver> it is weird that it's a timeout and not an eventlet.timeout, and down in eventlet hub.
21:21:33 <fuleco> Just to give a bit of context, it was a bucket with 230GB in files, versions and delete markers, totaling about 3 and a half million entries
21:21:54 <mattoliver> I mean I thought eventlet would've monkey patched timeout there.
21:22:19 <fuleco> Yeah, it is very weird
21:22:49 <timburke> agreed. i might be partly to blame; i think i did a lot of the get-tests-passing-on-py310 work for eventlet
21:23:19 <fuleco> As I was commenting with tim before, my best bet is that it is trying to read a body file that does not exist, but for some reason it registers the read and kinda deadlocks in the IO operation
21:23:24 <timburke> and i kinda remember some funny business around TimeoutError
21:23:40 <fuleco> but I have to admit I didn't go down the eventlet route too much
21:24:32 <mattoliver> good thing we have a timburke who does delve into eventlet upstream :)
21:26:23 <timburke> the chunked transfer thing makes some sense to me -- client sends a DELETE with the 0-byte chunk, then starts reading and blocks
21:26:25 <timburke> server takes the client request and makes the listing to discover the new current version, consuming the client-provided body in the process; server then uses the same client-socket-backed wsgi.input for the PUT, where we try to read a byte that the client will never send
21:26:50 <timburke> (because it would break HTTP)
21:27:48 <mattoliver> oh i see. and the handle_put_version is what's then putting the old version back in place?
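A minimal Python sketch of the workaround floated for the suspected failure mode (a chunked DELETE whose socket-backed `wsgi.input` gets reused for the internal PUT): give the PUT-back its own empty, explicitly zero-length body instead of the client's. The helper name `make_put_back_env` is hypothetical and this is not Swift's actual code; it only assumes a plain WSGI environ dict, where the client's chunked framing shows up as `HTTP_TRANSFER_ENCODING`.

```python
import io


def make_put_back_env(client_env):
    """Hypothetical helper: copy a client WSGI environ for an internal PUT
    subrequest without inheriting the client's request body.

    Reusing client_env['wsgi.input'] is the suspected problem: once a
    chunked DELETE's terminating 0-byte chunk has been consumed, any further
    read on the socket-backed input waits for bytes the client never sends.
    """
    env = dict(client_env)  # shallow copy; the client's environ is untouched
    env['REQUEST_METHOD'] = 'PUT'
    # Drop the chunked framing header so nothing downstream expects chunks.
    env.pop('HTTP_TRANSFER_ENCODING', None)
    # An explicit zero-length body can be "consumed" as often as we like.
    env['CONTENT_LENGTH'] = '0'
    env['wsgi.input'] = io.BytesIO(b'')
    return env
```

The design choice mirrors the Content-Length: 0 observation: with an in-memory empty body, reading to EOF returns immediately no matter how many times it happens, so a listing and a later PUT can both "consume" it without touching the client socket.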
21:27:49 <timburke> but if the client sends an explicit Content-Length: 0, everything works out fine (because we can "consume" that kind of a body all we want)
21:27:54 <timburke> yup
21:28:47 <timburke> so that's my suspicion; i just need to get my dev environment back to a point that i can test it
21:29:31 <timburke> extra fun will be breaking the HTTP protocol by sending more data, and seeing what happens when symlink balks
21:29:47 <opendevreview> Shreeya Deshpande proposed openstack/swift master: Split statsd client from logger https://review.opendev.org/c/openstack/swift/+/915483
21:29:59 <timburke> speaking of rabbit holes for tim to fall down...
21:30:09 <timburke> #topic liberasurecode segfault
21:31:14 <timburke> i discovered that creating multiple liberasurecode_rs_vand instances leaks memory!
21:31:29 <mattoliver> oh yay :(
21:31:52 <timburke> and worse, destroying any one of them will leave all the others segfault-y
21:32:02 <timburke> like, can't even destroy them
21:32:18 <timburke> patch was reasonably easy, though!
21:32:21 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929193
21:32:21 <patch-bot> patch 929193 - liberasurecode - built-in rs_vand: De-init tables only when last de... - 3 patch sets
21:33:09 <timburke> the test even reads remarkably well, IMHO
21:33:42 <mattoliver> nice one
21:33:47 <timburke> i discovered this because i was trying to write an even more complicated test around...
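To illustrate the shape of the rs_vand segfault fix, here is a Python model (not the C code in the patch) of tables shared by every instance of a backend. It assumes the usual reference-counting pattern that the truncated patch title points at: build the tables when the first instance is created, and free them only when the last one is torn down. One plausible reading of the two symptoms is that re-allocating on every creation leaks, and freeing on every teardown strands the surviving instances. `SharedTables`, `acquire`, and `release` are hypothetical names.

```python
import threading


class SharedTables:
    """Model of lookup tables shared by every instance of one backend.
    Hypothetical names; this is not liberasurecode's C implementation."""

    def __init__(self):
        self._lock = threading.Lock()  # keeps the refcount consistent
        self._refcount = 0
        self.tables = None

    def acquire(self):
        """Called when an instance is created."""
        with self._lock:
            if self._refcount == 0:
                # First instance: build the tables exactly once, rather
                # than re-allocating (and leaking) on every creation.
                self.tables = self._build()
            self._refcount += 1
            return self.tables

    def release(self):
        """Called when an instance is destroyed."""
        with self._lock:
            if self._refcount == 0:
                raise RuntimeError("release() without a matching acquire()")
            self._refcount -= 1
            if self._refcount == 0:
                # Last instance gone: only now is freeing the tables safe;
                # freeing earlier would strand the surviving instances.
                self.tables = None

    @staticmethod
    def _build():
        # Stand-in for the real table initialization.
        return {"gf_tables": list(range(8))}
```

The lock is the "another lock" the meeting mentions could be added: without it, two threads creating the first instance at the same moment could both build tables.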
21:33:58 <timburke> #topic liberasurecode thread-safety
21:34:15 <timburke> we've known for a while that it's not thread safe
21:34:24 <timburke> #link https://bugs.launchpad.net/liberasurecode/+bug/1954351
21:34:25 <patch-bot> Bug #1954351 - Multiple APIs not thread-safe (In Progress)
21:34:56 <timburke> when i wrote that, i didn't realize just *how* not-thread-safe it is
21:36:38 <timburke> the one lock we've got currently guards just a single data structure, but i'm increasingly convinced that it should guard all usages of backend instances (the structures that get looked up via the descriptor numbers we return)
21:37:26 <timburke> mostly because we can't trust that whatever backend initialization/cleanup will be thread-safe
21:37:45 <mattoliver> cool, haven't looked at liberasurecode in a while. might need to reload what I can in my head :) But yeah test reads pretty good for c :P
21:38:45 <timburke> the fix i wrote for the built-in rs-vand implementation isn't, though i could add another lock and make it better. but i also discovered that jerasurecode isn't, and i wouldn't be surprised if isa-l had similar trouble
21:39:30 <timburke> i've got a few patches now to improve things; the first two seem pretty ready-to-go
21:39:31 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929324
21:39:32 <patch-bot> patch 929324 - liberasurecode - Fix write locking when destroying instances - 2 patch sets
21:39:35 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929325
21:39:36 <patch-bot> patch 929325 - liberasurecode - Fix write locking when creating instances - 2 patch sets
21:40:00 <timburke> the last one goes and adds all the missing read-locks, but is still under-tested
21:40:07 <timburke> #link https://review.opendev.org/c/openstack/liberasurecode/+/929847
21:40:08 <patch-bot> patch 929847 - liberasurecode - Add read locks when accessing EC descriptors - 1 patch set
21:41:41 <timburke> the one hesitation i've got with those first two is that we'll remove some functions that are currently exposed
21:42:05 <timburke> liberasurecode_backend_instance_register and liberasurecode_backend_instance_unregister specifically
21:42:14 <mattoliver> yeah, I just saw that
21:42:51 <timburke> but these are functions that act on ec_backend_t instances, which callers should not be using directly -- that's the whole point of the descriptors we return
21:43:10 <timburke> the more i looked into it, the more i realized
21:43:32 <timburke> 1, we expose a whole lot more symbols than we really want to maintain as part of our API
21:44:24 <timburke> and 2, we really need to have a way to track what symbols we're publishing and make it obvious that we're adding to or removing from that list
21:45:13 <timburke> FWIW, i did a bit of digging and discovered that we've previously removed (well, renamed) symbols
21:45:34 <timburke> see https://github.com/openstack/liberasurecode/commit/a6a8d201 -- is_valid_fragment_metadata was renamed to is_invalid_fragment_metadata
21:45:58 <timburke> (yes, really)
21:46:14 <fuleco> Oh no
21:46:27 <mattoliver> wow, go us
21:47:02 <timburke> that was back in 1.2.0 -- and so far as i know, no one called us out for not going to a 2.0 release instead
21:47:56 <timburke> i *think* that the main reason for that is that we have so few consumers -- checking a variety of package managers, i saw at most liberasurecode-dev(el) and pyeclib as reverse dependencies
21:48:45 <timburke> so, provided we don't break pyeclib, maybe we're good to slim things down to the API we actually want to maintain?
21:49:18 <timburke> probably a thing to discuss at the PTG
21:49:55 <mattoliver> I guess the other option is to document that anyone using the register/unregister functions must use some kind of locking, like you're now doing, and so should be able to safely call them.
21:50:14 <mattoliver> but that still leaves them as non thread-safe blast zones.
21:50:59 <mattoliver> or pull them out of the headers and they just become helper methods for us. We can always add them back if people get annoyed.
21:51:12 <mattoliver> or just rip em out like you've done.
21:51:26 <mattoliver> either way, I'll take a closer look and yeah great topic for PTG
21:51:35 <timburke> yeah, i'd kinda really prefer to lock down the API: make sure that we only expose what we want to expose, and make sure that every entrypoint is threadsafe
21:52:08 <mattoliver> +1
21:52:26 <timburke> all this was done because i realized that getting pyeclib to properly support free-threaded python probably needs to start with thread-safety for liberasurecode
21:52:40 <timburke> which leads me to...
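The locking scheme from the three liberasurecode patches discussed above (write locks around creating and destroying instances, read locks whenever an EC descriptor is used) can be modeled in Python as a sketch. Everything here is hypothetical: `RWLock`, `DescriptorRegistry`, and the `close()` method on backends are illustrative names, and because Python's stdlib has no readers-writer lock, a minimal one is built on `threading.Condition`. Backend setup and teardown run under the write lock, since backend initialization/cleanup can't be trusted to be thread-safe, and callers only ever hold integer descriptors.

```python
import threading


class RWLock:
    """Minimal readers-writer lock: many concurrent readers or one writer.
    No writer preference, so heavy read traffic can delay writers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


class DescriptorRegistry:
    """Hypothetical model: backend objects never leave the registry;
    callers get integer descriptors instead."""

    def __init__(self):
        self._lock = RWLock()
        self._backends = {}
        self._next_desc = 1

    def create(self, factory):
        """Build a backend under the write lock; return only its descriptor."""
        self._lock.acquire_write()
        try:
            backend = factory()  # backend init may not be thread-safe
            desc = self._next_desc
            self._next_desc += 1
            self._backends[desc] = backend
            return desc
        finally:
            self._lock.release_write()

    def destroy(self, desc):
        """Unregister and clean up under the write lock (KeyError if unknown)."""
        self._lock.acquire_write()
        try:
            backend = self._backends.pop(desc)
            backend.close()  # backend cleanup may not be thread-safe either
        finally:
            self._lock.release_write()

    def use(self, desc, fn):
        """Look up a descriptor and call fn(backend) under a read lock."""
        self._lock.acquire_read()
        try:
            return fn(self._backends[desc])  # KeyError if unknown
        finally:
            self._lock.release_read()
```

Running `fn` while the read lock is held is what keeps `destroy()` from tearing a backend down mid-encode; the trade-off of this simple lock is that a steady stream of readers can delay a pending destroy.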
21:52:58 <timburke> #topic test pyeclib on py313t
21:53:23 <timburke> currently on master it doesn't even build on a free-threaded python build
21:54:18 <timburke> this is because i was a little over-zealous in trying to enforce abi3 builds a while back; it was effective, but i should have left it to setuptools to get the #define in there
21:54:26 <timburke> #link https://review.opendev.org/c/openstack/pyeclib/+/929328
21:54:26 <patch-bot> patch 929328 - pyeclib - Stop defining Py_LIMITED_API ourselves - 1 patch set
21:54:36 <timburke> will allow it to build, so that
21:54:43 <timburke> #link https://review.opendev.org/c/openstack/pyeclib/+/927605
21:54:43 <patch-bot> patch 927605 - pyeclib - Test under py313 - 2 patch sets
21:54:54 <timburke> can actually pass
21:55:07 <timburke> though there's a warning about how the GIL got re-enabled
21:55:39 <timburke> so if anyone wants to brush up on their C, there's a bunch of stuff that could use eyes
21:56:03 <timburke> but i've used up a lot of the time already, i ought to open it up for other topics
21:56:08 <timburke> #topic open discussion
21:56:52 <timburke> anything else we ought to bring up this week?
21:56:55 <mattoliver> definitely need to brush up on my c. Nice work tim, really interesting and exciting stuff
21:59:44 <mattoliver> I can't remember what I've been doing. Took time off last week. I started reloading the chunked s3api chain and that somehow got me looking at pyeclib wheels. So I think I might just try and get some reviews done this week. (I also have a bunch of downstream work on my plate I need to finish up). But next week, hopefully I'll have more updates on things I'm digging into.
22:00:14 <timburke> thanks for looking at both those topics!
22:00:29 <timburke> let me know if you want some high-bandwidth time on either or both of them
22:01:13 <mattoliver> kk, thanks tim
22:01:37 <timburke> all right, i think i'll call it
22:01:47 <timburke> thank you all for coming, and thank you for working on swift!
22:01:51 <timburke> #endmeeting