21:00:09 #startmeeting swift
21:00:09 Meeting started Wed Sep 18 21:00:09 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 The meeting name has been set to 'swift'
21:00:16 who's here for the swift meeting?
21:00:20 o/
21:01:03 o/
21:01:14 as usual, the agenda's at
21:01:17 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01:22 first up
21:01:29 #topic 2.34.0 release
21:01:54 it's out! this is our final release for the dalmatian cycle
21:02:08 👏🏻
21:02:11 there's a lot of good stuff in it
21:02:17 #link https://opendev.org/openstack/swift/src/branch/master/CHANGELOG
21:03:09 \o/
21:03:11 nice
21:03:19 best swift ever :)
21:03:30 or best swift yet
21:03:35 whatever the saying :P
21:03:43 it's one better than the last one!
21:03:50 -ETOOEARLY :P
21:03:51 lol
21:04:03 next up
21:04:08 #topic vPTG
21:04:13 it's next month!
21:04:35 sorry, it kind of snuck up on me; i should really pay more attention to the mailing list
21:05:00 Yeah, it seems to do that
21:05:04 we aren't on the initial project list
21:05:06 #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/APAC5ANX4TQLP5R257D6OIADTN6Y5GMS/
21:05:07 Bug #5 - Plone Placeless Translation Service metadata missing from po files (Fix Released)
21:05:31 I don't think that's the right line :P
21:05:34 *link
21:05:58 I think we need to send an email to the ptg people.
21:06:12 but i sent an email as recommended
21:06:27 oh cool
21:06:41 want me to set up the etherpads so we can start collecting topics?
21:07:12 sure! i started on the general topics one, but if you wanted to pick up the ops-feedback, that'd be great!
21:07:22 kk
21:07:24 #link https://etherpad.opendev.org/p/swift-ptg-epoxy
21:08:17 (side note: i also just learned today that our next release name is Epoxy -- i suppose we help hold everything together :-)
21:09:41 lol, really
21:09:50 i'll get a poll up for meeting times this week, though i suspect as usual there just isn't going to be a great time slot
21:10:41 oh, and registration is at
21:10:42 #link https://ptg2024.openinfra.dev/
21:10:55 next up
21:11:03 #topic s3api bugs
21:11:29 fuleco has been good at finding some current deficiencies in our s3api
21:11:36 #link https://bugs.launchpad.net/swift/+bug/2077179
21:11:37 Bug #2077179 - S3Api - async delete (New)
21:11:39 and today
21:11:44 #link https://bugs.launchpad.net/swift/+bug/2081103
21:11:44 Bug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New)
21:13:06 i don't think there's much progress on either yet, but i wanted to make sure that people were aware of them, and hopefully we can devote some time toward them before the PTG
21:13:07 Yeah the first one is kind of a feature too
21:13:20 Since it adds functionality to s3api
21:13:41 But the second one is the weird one in the pack, I've been cracking my head at it for some days now...
21:14:32 true enough -- though i'd be willing to bet that AWS does some dribble-out-whitespace trick for long-running multi-deletes, so us *not* matching that behavior could be construed as a bug ;-)
21:15:11 Seems fair XD
21:15:37 Progress on my part has been a little slow, some other issues got my schedule filled
21:15:54 But I hope I can do something on them in the next weeks
21:16:23 We know the feeling (re: schedules).
Nice finds
21:16:33 fwiw, the more i think about it, i've got a suspicion that the second one has to do with a Transfer-Encoding: chunked DELETE call, and us re-using the client request environment to PUT the old version back
21:17:06 I do think so too
21:17:15 it reminds me a bit of how we (ab)use chunked transfers for EC PUTs
21:17:34 https://bugs.launchpad.net/swift/+bug/1496636
21:17:38 On my side, I'll be applying a patch to just add an empty body on the request call to try and deflect that. That would at least give us some more information
21:17:41 Bug #1496636 - EC: Chunked transfer/commit protocol is *not* HTTP (In Progress)
21:17:44 However
21:18:02 which unfortunately means that we don't really want eventlet to fix it properly
21:18:03 I cannot replicate this error in any environment I have access to
21:18:57 So I'll have to wait for it to go to production and maybe get some feedback? Even the original environment where it errored does not consistently report it
21:19:34 But I would be more than happy if anyone has other ideas or can at least replicate the error
21:20:28 for sure, having someone replicate the error would be great
21:21:02 it is weird that it's a timeout and not an eventlet.timeout, and down in eventlet hub.
21:21:33 Just to give a bit of context, it was a bucket with 230GB in files, versions and delete markers, totaling about 3 and a half million entries
21:21:54 I mean I thought eventlet would've monkey patched timeout there.
21:22:19 Yeah, it is very weird
21:22:49 agreed.
i might be partly to blame; i think i did a lot of the get-tests-passing-on-py310 work for eventlet
21:23:19 As I was commenting with tim before, my best bet is that it is trying to read a body file that does not exist, but for some reason it registers the read and kinda deadlocks in the IO operation
21:23:24 and i kinda remember some funny business around TimeoutError
21:23:40 but I have to admit I didn't go down the eventlet route too much
21:24:32 good thing we have a timburke who does delve in eventlet upstream :)
21:26:23 the chunked transfer thing makes some sense to me -- client sends a DELETE with the 0-byte chunk, then starts reading and blocks
21:26:25 server takes the client request and makes the listing to discover the new current version, consuming the client-provided body in the process; server then uses the same client-socket-backed wsgi.input for the PUT, where we try to read a byte that the client will never send
21:26:50 (because it would break HTTP)
21:27:48 oh i see. and the handle_put_version is what's then putting the old version back in place?
21:27:49 but if the client sends an explicit Content-Length: 0, everything works out fine (because we can "consume" that kind of a body all we want)
21:27:54 yup
21:28:47 so that's my suspicion; i just need to get my dev environment back to a point that i can test it
21:29:31 extra fun will be breaking the HTTP protocol by sending more data, and seeing what happens when symlink balks
21:29:47 Shreeya Deshpande proposed openstack/swift master: Split statsd client from logger https://review.opendev.org/c/openstack/swift/+/915483
21:29:59 speaking of rabbit holes for tim to fall down...
21:30:09 #topic liberasurecode segfault
21:31:14 i discovered that creating multiple liberasurecode_rs_vand instances leaks memory!
21:31:29 oh yay :(
21:31:52 and worse, destroying any one of them will leave all others segfault-y
21:32:02 like, can't even destroy them
21:32:18 patch was reasonably easy, though!
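(editor's sketch) The failure mode described above, where destroying one instance tears down state that the remaining instances still rely on, is commonly avoided with reference counting. The Python below is only an illustration of that general idea, not the liberasurecode code or the actual patch: `SharedTables`, `Instance`, the placeholder table contents, and the lock are all hypothetical names and assumptions.

```python
import threading


class SharedTables:
    """Hypothetical process-wide tables shared by every Instance."""

    _lock = threading.Lock()   # assumption: guards the counter and the tables
    _refcount = 0
    _tables = None

    @classmethod
    def acquire(cls):
        with cls._lock:
            if cls._refcount == 0:
                # Build the tables once, on first use (placeholder contents).
                cls._tables = {"lookup": list(range(256))}
            cls._refcount += 1
            return cls._tables

    @classmethod
    def release(cls):
        with cls._lock:
            if cls._refcount == 0:
                raise RuntimeError("release() without matching acquire()")
            cls._refcount -= 1
            if cls._refcount == 0:
                # Only the last user tears the shared state down.
                cls._tables = None


class Instance:
    """Hypothetical backend instance that depends on the shared tables."""

    def __init__(self):
        self._tables = SharedTables.acquire()
        self._open = True

    def usable(self):
        # Usable only while open and while the shared tables still exist.
        return self._open and SharedTables._tables is not None

    def destroy(self):
        if self._open:
            self._open = False
            self._tables = None
            SharedTables.release()


a, b = Instance(), Instance()
a.destroy()
still_usable = b.usable()   # b keeps working after a is destroyed
b.destroy()
```

Without the counter, the first `destroy()` would drop the tables and leave `b` pointing at freed state, which matches the segfault-y behavior described in the meeting.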
21:32:21 #link https://review.opendev.org/c/openstack/liberasurecode/+/929193
21:32:21 patch 929193 - liberasurecode - built-in rs_vand: De-init tables only when last de... - 3 patch sets
21:33:09 the test even reads remarkably well, IMHO
21:33:42 nice one
21:33:47 i discovered this because i was trying to write an even more complicated test around...
21:33:58 #topic liberasurecode thread-safety
21:34:15 we've known for a while that it's not thread safe
21:34:24 #link https://bugs.launchpad.net/liberasurecode/+bug/1954351
21:34:25 Bug #1954351 - Multiple APIs not thread-safe (In Progress)
21:34:56 when i wrote that, i didn't realize just *how* not-thread-safe it is
21:36:38 the one lock we've got currently guards just a single data structure, but i'm increasingly convinced that it should guard all usages of backend instances (the structures that get looked up via the descriptor numbers we return)
21:37:26 mostly because we can't trust that whatever backend initialization/cleanup will be thread-safe
21:37:45 cool, haven't looked at liberasurecode in a while. might need to reload what I can in my head :) But yeah test reads pretty good for c :P
21:38:45 the fix i wrote for the built-in rs-vand implementation isn't, though i could add another lock and make it better.
but i also discovered that jerasurecode isn't, and i wouldn't be surprised if isa-l had similar trouble
21:39:30 i've got a few patches now to improve things; the first two seem pretty ready-to-go
21:39:31 #link https://review.opendev.org/c/openstack/liberasurecode/+/929324
21:39:32 patch 929324 - liberasurecode - Fix write locking when destroying instances - 2 patch sets
21:39:35 #link https://review.opendev.org/c/openstack/liberasurecode/+/929325
21:39:36 patch 929325 - liberasurecode - Fix write locking when creating instances - 2 patch sets
21:40:00 the last one goes and adds all the missing read-locks, but is still under-tested
21:40:07 #link https://review.opendev.org/c/openstack/liberasurecode/+/929847
21:40:08 patch 929847 - liberasurecode - Add read locks when accessing EC descriptors - 1 patch set
21:41:41 the one hesitation i've got with those first two is that we'll remove some functions that are currently exposed
21:42:05 liberasurecode_backend_instance_register and liberasurecode_backend_instance_unregister specifically
21:42:14 yeah, I just saw that
21:42:51 but these are functions that act on ec_backend_t instances, which callers should not be using directly -- that's the whole point of the descriptors we return
21:43:10 the more i looked into it, the more i realized
21:43:32 1, we expose a whole lot more symbols than we really want to maintain as part of our API
21:44:24 and 2, we really need to have a way to track what symbols we're publishing and make it obvious that we're adding to or removing from that list
21:45:13 FWIW, i did a bit of digging and discovered that we've previously removed (well, renamed) symbols
21:45:34 see https://github.com/openstack/liberasurecode/commit/a6a8d201 -- is_valid_fragment_metadata was renamed to is_invalid_fragment_metadata
21:45:58 (yes, really)
21:46:14 Oh no
21:46:27 wow, go us
21:47:02 that was back in 1.2.0 -- and so far as i know, no one called us out for not going to a 2.0 release instead
21:47:56 i
*think* that the main reason for that is that we have so few consumers -- checking a variety of package managers, i saw at most liberasurecode-dev(el) and pyeclib as reverse dependencies
21:48:45 so, provided we don't break pyeclib, maybe we're good to slim things down to the API we actually want to maintain?
21:49:18 probably a thing to discuss at the PTG
21:49:55 I guess the other option is to document that anyone using the register/unregister functions must use some kind of locking, like you're now doing, and so should safely be able to just call them.
21:50:14 but that still leaves them as non thread-safe blast zones.
21:50:59 or pull them out of the headers and they just become helper methods for us. We can always add them back if people get annoyed.
21:51:12 or just rip em out like you've done.
21:51:26 either way, I'll take a closer look and yeah great topic for PTG
21:51:35 yeah, i'd kinda really prefer to lock down the API: make sure that we only expose what we want to expose, and make sure that every entrypoint is threadsafe
21:52:08 +1
21:52:26 all this was done because i realized that getting pyeclib to properly support free-threaded python probably needs to start with thread-safety for liberasurecode
21:52:40 which leads me to...
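(editor's sketch) One way to picture the "only hand out descriptors, and guard every use of the instance behind them" idea from this topic is a registry that owns the lock and never lets callers touch an instance outside it. This is a simplified Python illustration under stated assumptions, not the liberasurecode API: `DescriptorRegistry` and its methods are hypothetical, and it uses a single mutex where the patches discussed above talk about separate read and write locking.

```python
import threading


class DescriptorRegistry:
    """Hypothetical table mapping integer descriptors to backend instances."""

    def __init__(self):
        self._lock = threading.Lock()   # assumption: one mutex for everything
        self._instances = {}
        self._next_desc = 1

    def create(self, instance):
        # Registration mutates the table, so it happens under the lock.
        with self._lock:
            desc = self._next_desc
            self._next_desc += 1
            self._instances[desc] = instance
            return desc

    def destroy(self, desc):
        with self._lock:
            if desc not in self._instances:
                raise KeyError(f"invalid descriptor {desc}")
            del self._instances[desc]

    def use(self, desc, fn):
        # The lock is held for the lookup *and* the call, so the instance
        # cannot be destroyed by another thread while fn is running.
        with self._lock:
            if desc not in self._instances:
                raise KeyError(f"invalid descriptor {desc}")
            return fn(self._instances[desc])


registry = DescriptorRegistry()
desc = registry.create({"k": 4, "m": 2})   # placeholder "instance"
total = registry.use(desc, lambda inst: inst["k"] + inst["m"])
registry.destroy(desc)
```

Holding one mutex across the whole call serializes all users, which is the simplest thing that is safe; a reader/writer lock would let concurrent uses proceed while still excluding create and destroy.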
21:52:58 #topic test pyeclib on py313t
21:53:23 currently on master it doesn't even build on a free-threaded python build
21:54:18 this is because i was a little over-zealous in trying to enforce abi3 builds a while back; it was effective, but i should have left it to setuptools to get the #define in there
21:54:26 #link https://review.opendev.org/c/openstack/pyeclib/+/929328
21:54:26 patch 929328 - pyeclib - Stop defining Py_LIMITED_API ourselves - 1 patch set
21:54:36 will allow it to build, so that
21:54:43 #link https://review.opendev.org/c/openstack/pyeclib/+/927605
21:54:43 patch 927605 - pyeclib - Test under py313 - 2 patch sets
21:54:54 can actually pass
21:55:07 though there's a warning about how the GIL got re-enabled
21:55:39 so if anyone wants to brush up on their C, there's a bunch of stuff that could use eyes
21:56:03 but i've used up a lot of the time already, i ought to open it up for other topics
21:56:08 #topic open discussion
21:56:52 anything else we ought to bring up this week?
21:56:55 definitely need to brush up on my c. Nice work tim, really interesting and exciting stuff
21:59:44 I can't remember what I've been doing. Took time off last week. I started reloading chunked s3api chain and that somehow got me looking at pyeclib wheels. So think I might just try and get some reviews done this week. (I also have a bunch of downstream work on my plate I need to finish up). But next week, hopefully I'll have more updates on things I'm digging into.
22:00:14 thanks for looking at both those topics!
22:00:29 let me know if you want some high-bandwidth time on either or both of them
22:01:13 kk, thanks tim
22:01:37 all right, i think i'll call it
22:01:47 thank you all for coming, and thank you for working on swift!
22:01:51 #endmeeting