simondodsley | Newbie to Swift here. What is the preferred way of integrating an S3 capable external device to use as a Swift store? | 16:17 |
---|---|---|
DHE | you mean use swift as its storage? | 17:17 |
opendevreview | Merged openstack/swift master: CI: Include --domain in more openstack commands https://review.opendev.org/c/openstack/swift/+/929688 | 17:59 |
fuleco | hey timburke if you're online, I'd like to discuss a bug that appeared in our deploy this week | 18:51 |
timburke | simondodsley, what do you mean by "use as a Swift store"? swift is a storage platform; it can act as an S3 endpoint, but the configuration details will vary from client to client | 18:53 |
timburke | fuleco, sure -- what's up? | 18:53 |
simondodsley | timburke I have an object store platform that i want to configure into swift. It is S3 compliant, but I'm not seeing how to add it to Swift as the only backend. | 18:55 |
fuleco | So, we've had a user with an extremely large versioned bucket and a high level of traffic who reported that an rclone purge (delete all objects in the bucket, then the bucket) got a timeout. From what we got out of the stack trace, it comes down to object_versioning.py, which tries to read an empty body and never gets unblocked. | 18:55 |
fuleco | I couldn't replicate the error in other environments, however. I was wondering if something like that came up in the past? From what i've searched, it has not | 18:56 |
timburke | simondodsley, swift doesn't really take storage backends; it's an object store. i guess maybe you could wire something together with https://opendev.org/x/swiftonfile and some filesystem <-> S3 bridge, but i'd mostly be left wondering, why? | 18:58 |
timburke | fuleco, i don't recall having seen that before -- any chance you've got a stack trace for that "tries to read an empty body"? | 19:00 |
simondodsley | what our customers are wanting to do is use the expensive object store they have purchased and integrate it into their openstack cluster. Is there any way to use that external object platform as the primary object store for swift? | 19:00 |
simondodsley | or is there some gateway that can be utilized | 19:00 |
fuleco | I do have one, but it's somewhat large. How would you like me to send it? | 19:00 |
timburke | simondodsley, maybe you could register it in the keystone catalog? but does the S3-compatible store speak the Swift API? | 19:01 |
timburke | fuleco, i'll often drop these sorts of things on https://paste.opendev.org/ | 19:02 |
simondodsley | timburke - no it doesn't speak swift | 19:02 |
fuleco | timburke ok, ill set it up | 19:03 |
simondodsley | is there any swift<->S3 gateway we could use? | 19:05 |
fuleco | https://paste.opendev.org/show/b0G5BYi68GrDCeWIOdrZ/ Maybe you can access it? | 19:06 |
timburke | simondodsley, maybe something like https://github.com/caiobrentano/swift-s3-sync ? it's an old project, and not even the original, but a fork that was made before the original went private | 19:13 |
simondodsley | timburke: thanks - is the original a paid for service now? | 19:14 |
timburke | simondodsley, it's more complicated than that -- it went private when funding dried up for the startup that built it; the startup got acquired with it still private, then the project largely got abandoned | 19:19 |
timburke | (full disclosure: i was working at the startup through all this) | 19:19 |
timburke | looks like some of the other "forks" have a more complete history -- something like https://github.com/GhostPunk/1space is better, but still hasn't been updated in 5 years | 19:20 |
timburke | fuleco, looks like the client didn't provide any content-length (see https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/versioned_writes/object_versioning.py#L676-L679) | 19:23 |
timburke | that should either mean a chunked transfer, or the client should shut down its sending and just be listening for a response | 19:24 |
fuleco | The client is rclone. Even though it seems like it, this just happens in this one specific bucket. We could not isolate the problem or repeat the behaviour. Looks to me like it could be a data desync between concurrent tasks? | 19:27 |
timburke | curious that it should be caught somewhere around https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/s3api/s3api.py#L361-L362 -- oh! it's a proper TimeoutError, rather than an eventlet.Timeout! | 19:29 |
fuleco | Yep | 19:29 |
fuleco | It looks like it tries to read an empty buffer and keeps waiting eternally | 19:30 |
timburke | oh! or maybe the trouble is that https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/s3api/controllers/obj.py#L206-L207 re-uses the client req without replacing the wsgi.input... | 19:37 |
timburke | i'm surprised that wouldn't have been caught in func tests though... | 19:37 |
fuleco | Yep that's what we were going for. | 19:37 |
fuleco | It is very weird though that it didn't happen in any tests | 19:38 |
fuleco | Not the automated, not the manual we run | 19:38 |
fuleco | It just happens in this specific scenario | 19:38 |
timburke | ugh, right, and i left my dev env in a dirty state... i can't even really test myself right now | 19:45 |
timburke | looks like it'd manifest as a 500 to the client, though, yeah? | 19:45 |
fuleco | No worries, just wanted to bring it up for discussion. | 19:45 |
fuleco | Exactly, 500 error | 19:46 |
timburke | created https://bugs.launchpad.net/swift/+bug/2081103 at least | 20:03 |
patch-bot | Bug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New) | 20:03 |
fuleco | Oh thanks a lot timburke. I was opening it on my end too haha | 20:03 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: built-in rs_vand: De-init tables only when last descriptor is destroyed https://review.opendev.org/c/openstack/liberasurecode/+/929193 | 20:26 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Fix write locking when destroying instances https://review.opendev.org/c/openstack/liberasurecode/+/929324 | 20:26 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Fix write locking when creating instances https://review.opendev.org/c/openstack/liberasurecode/+/929325 | 20:26 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Add read locks when accessing EC descriptors https://review.opendev.org/c/openstack/liberasurecode/+/929847 | 20:27 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Sep 18 21:00:09 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
fuleco | o/ | 21:00 |
mattoliver | o/ | 21:01 |
timburke | as usual, the agenda's at | 21:01 |
timburke | #link https://wiki.openstack.org/wiki/Meetings/Swift | 21:01 |
timburke | first up | 21:01 |
timburke | #topic 2.34.0 release | 21:01 |
timburke | it's out! this is our final release for the dalmatian cycle | 21:01 |
fuleco | 👏🏻 | 21:02 |
timburke | there's a lot of good stuff in it | 21:02 |
timburke | #link https://opendev.org/openstack/swift/src/branch/master/CHANGELOG | 21:02 |
mattoliver | \o/ | 21:03 |
mattoliver | nice | 21:03 |
mattoliver | best swift ever :) | 21:03 |
mattoliver | or best swift yet | 21:03 |
mattoliver | whatever the saying :P | 21:03 |
timburke | it's one better than the last one! | 21:03 |
mattoliver | -ETOOEARLY :P | 21:03 |
mattoliver | lol | 21:03 |
timburke | next up | 21:04 |
timburke | #topic vPTG | 21:04 |
timburke | it's next month! | 21:04 |
timburke | sorry, it kind of snuck up on me; i should really pay more attention to the mailing list | 21:04 |
mattoliver | Yeah, it seems to do that | 21:05 |
timburke | we aren't on the initial project list | 21:05 |
timburke | #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/APAC5ANX4TQLP5R257D6OIADTN6Y5GMS/ | 21:05 |
patch-bot | Bug #5 - Plone Placeless Translation Service metadata missing from po files (Fix Released) | 21:05 |
mattoliver | I don't think thats the right line :P | 21:05 |
mattoliver | *link | 21:05 |
mattoliver | I think we need to send an email to the ptg people. | 21:05 |
timburke | but i sent an email as recommended | 21:06 |
mattoliver | oh cool | 21:06 |
mattoliver | want me to set up the etherpads so we can start collecting topics? | 21:06 |
timburke | sure! i started on the general topics one, but if you wanted to pick up the ops-feedback, that'd be great! | 21:07 |
mattoliver | kk | 21:07 |
timburke | #link https://etherpad.opendev.org/p/swift-ptg-epoxy | 21:07 |
timburke | (side note: i also just learned today that our next release name is Epoxy -- i suppose we help hold everything together :-) | 21:08 |
mattoliver | lol, really | 21:09 |
timburke | i'll get a poll up for meeting times this week, though i suspect as usual there just isn't going to be a great time slot | 21:09 |
timburke | oh, and registration is at | 21:10 |
timburke | #link https://ptg2024.openinfra.dev/ | 21:10 |
timburke | next up | 21:10 |
timburke | #topic s3api bugs | 21:11 |
timburke | fuleco has been good at finding some current deficiencies in our s3api | 21:11 |
timburke | #link https://bugs.launchpad.net/swift/+bug/2077179 | 21:11 |
patch-bot | Bug #2077179 - S3Api - async delete (New) | 21:11 |
timburke | and today | 21:11 |
timburke | #link https://bugs.launchpad.net/swift/+bug/2081103 | 21:11 |
patch-bot | Bug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New) | 21:11 |
timburke | i don't think there's much progress on either yet, but i wanted to make sure that people were aware of them, and hopefully we can devote some time toward them before the PTG | 21:13 |
fuleco | Yeah the first one is kind of a feature too | 21:13 |
fuleco | Since it adds a functionality to s3api | 21:13 |
fuleco | But the second one is the weird one in the pack, I've been cracking my head at it for some days now... | 21:13 |
timburke | true enough -- though i'd be willing to bet that AWS does some dribble-out-whitespace trick for long-running multi-deletes, so us *not* matching that behavior could be construed as a bug ;-) | 21:14 |
fuleco | Seems fair XD | 21:15 |
fuleco | Progress on my part has been a little slow, some other issues got my schedule filled | 21:15 |
fuleco | But I hope I can do something on them in the next weeks | 21:15 |
mattoliver | We know the feeling (re: schedules). Nice finds | 21:16 |
timburke | fwiw, the more i think about it, i've got a suspicion that the second one has to do with a Transfer-Encoding: chunked DELETE call, and us re-using the client request environment to PUT the old version back | 21:16 |
fuleco | I do think so too | 21:17 |
timburke | it reminds me a bit of how we (ab)use chunked transfers for EC PUTs | 21:17 |
timburke | https://bugs.launchpad.net/swift/+bug/1496636 | 21:17 |
fuleco | On my side, I'll be applying a patch to just add an empty body on the request call to try and deflect that. That would at least give us some more information | 21:17 |
patch-bot | Bug #1496636 - EC: Chunked transfer/commit protocol is *not* HTTP (In Progress) | 21:17 |
fuleco | However | 21:17 |
timburke | which unfortunately means that we don't really want eventlet to fix it properly | 21:18 |
fuleco | I cannot replicate this error in any environment I have access to | 21:18 |
fuleco | So I'll have to wait for it to go to production and maybe get some feedback? Even the original environment where it errored does not consistently report it | 21:18 |
fuleco | But I would be more than happy if anyone has other ideas or can at least replicate the error | 21:19 |
timburke | for sure, having someone replicate the error would be great | 21:20 |
mattoliver | it is weird that it's a timeout and not an eventlet.timeout, and down in eventlet hub. | 21:21 |
fuleco | Just to give a bit of context, it was a bucket with 230GB in files, versions and delete markers, totaling about 3 and a half million entries | 21:21 |
mattoliver | I mean I thought eventlet would've monkey patched timeout there. | 21:21 |
fuleco | Yeah, it is very weird | 21:22 |
timburke | agreed. i might be partly to blame; i think i did a lot of the get-tests-passing-on-py310 work for eventlet | 21:22 |
fuleco | As I was commenting with tim before, my best bet is that it is trying to read a body file that does not exist, but for some reason it registers the read and kinda deadlocks in the IO operation | 21:23 |
timburke | and i kinda remember some funny business around TimeoutError | 21:23 |
fuleco | but I have to admit I didn't go down the eventlet route too much | 21:23 |
mattoliver | good thing we have a timburke who does delve in eventlet upstream :) | 21:24 |
timburke | the chunked transfer thing makes some sense to me -- client sends a DELETE with the 0-byte chunk, then starts reading and blocks | 21:26 |
timburke | server takes the client request and makes the listing to discover the new current version, consuming the client-provided body in the process; server then uses the same client-socket-backed wsgi.input for the PUT, where we try to read a byte that the client will never send | 21:26 |
timburke | (because it would break HTTP) | 21:26 |
mattoliver | oh i see. and handle_put_version is what's then putting the old version back in place? | 21:27 |
timburke | but if the client sends an explicit Content-Length: 0, everything works out fine (because we can "consume" that kind of a body all we want) | 21:27 |
timburke | yup | 21:27 |
timburke | so that's my suspicion; i just need to get my dev environment back to a point that i can test it | 21:28 |
timburke | extra fun will be breaking the HTTP protocol by sending more data, and seeing what happens when symlink balks | 21:29 |
opendevreview | Shreeya Deshpande proposed openstack/swift master: Split statsd client from logger https://review.opendev.org/c/openstack/swift/+/915483 | 21:29 |
timburke | speaking of rabbit holes for tim to fall down... | 21:29 |
timburke | #topic liberasurecode segfault | 21:30 |
timburke | i discovered that creating multiple liberasurecode_rs_vand instances leaks memory! | 21:31 |
mattoliver | oh yay :( | 21:31 |
timburke | and worse, destroying any one of them will leave all the others segfault-y | 21:31 |
timburke | like, can't even destroy them | 21:32 |
timburke | patch was reasonably easy, though! | 21:32 |
timburke | #link https://review.opendev.org/c/openstack/liberasurecode/+/929193 | 21:32 |
patch-bot | patch 929193 - liberasurecode - built-in rs_vand: De-init tables only when last de... - 3 patch sets | 21:32 |
timburke | the test even reads remarkably well, IMHO | 21:33 |
mattoliver | nice one | 21:33 |
timburke | i discovered this because i was trying to write an even more complicated test around... | 21:33 |
timburke | #topic liberasurecode thread-safety | 21:33 |
timburke | we've known for a while that it's not thread safe | 21:34 |
timburke | #link https://bugs.launchpad.net/liberasurecode/+bug/1954351 | 21:34 |
patch-bot | Bug #1954351 - Multiple APIs not thread-safe (In Progress) | 21:34 |
timburke | when i wrote that, i didn't realize just *how* not-thread-safe it is | 21:34 |
timburke | the one lock we've got currently guards just a single data structure, but i'm increasingly convinced that it should guard all usages of backend instances (the structures that get looked up via the descriptor numbers we return) | 21:36 |
timburke | mostly because we can't trust that whatever backend initialization/cleanup will be thread-safe | 21:37 |
mattoliver | cool, haven't looked at liberasurecode in a while. might need to reload what I can in my head :) But yeah test reads pretty good for c :P | 21:37 |
timburke | the fix i wrote for the built-in rs-vand implementation isn't, though i could add another lock and make it better. but i also discovered that jerasure isn't, and i wouldn't be surprised if isa-l had similar trouble | 21:38 |
timburke | i've got a few patches now to improve things; the first two seem pretty ready-to-go | 21:39 |
timburke | #link https://review.opendev.org/c/openstack/liberasurecode/+/929324 | 21:39 |
patch-bot | patch 929324 - liberasurecode - Fix write locking when destroying instances - 2 patch sets | 21:39 |
timburke | #link https://review.opendev.org/c/openstack/liberasurecode/+/929325 | 21:39 |
patch-bot | patch 929325 - liberasurecode - Fix write locking when creating instances - 2 patch sets | 21:39 |
timburke | the last one goes and adds all the missing read-locks, but is still under-tested | 21:40 |
timburke | #link https://review.opendev.org/c/openstack/liberasurecode/+/929847 | 21:40 |
patch-bot | patch 929847 - liberasurecode - Add read locks when accessing EC descriptors - 1 patch set | 21:40 |
timburke | the one hesitation i've got with those first two is that we'll remove some functions that are currently exposed | 21:41 |
timburke | liberasurecode_backend_instance_register and liberasurecode_backend_instance_unregister specifically | 21:42 |
mattoliver | yeah, I just saw that | 21:42 |
timburke | but these are functions that act on ec_backend_t instances, which callers should not be using directly -- that's the whole point of the descriptors we return | 21:42 |
timburke | the more i looked into it, the more i realized | 21:43 |
timburke | 1, we expose a whole lot more symbols than we really want to maintain as part of our API | 21:43 |
timburke | and 2, we really need to have a way to track what symbols we're publishing and make it obvious that we're adding to or removing from that list | 21:44 |
timburke | FWIW, i did a bit of digging and discovered that we've previously removed (well, renamed) symbols | 21:45 |
timburke | see https://github.com/openstack/liberasurecode/commit/a6a8d201 -- is_valid_fragment_metadata was renamed to is_invalid_fragment_metadata | 21:45 |
timburke | (yes, really) | 21:45 |
fuleco | Oh no | 21:46 |
mattoliver | wow, go us | 21:46 |
timburke | that was back in 1.2.0 -- and so far as i know, no one called us out for not going to a 2.0 release instead | 21:47 |
timburke | i *think* that the main reason for that is that we have so few consumers -- checking a variety of package managers, i saw at most liberasurecode-dev(el) and pyeclib as reverse dependencies | 21:47 |
timburke | so, provided we don't break pyeclib, maybe we're good to slim things down to the API we actually want to maintain? | 21:48 |
timburke | probably a thing to discuss at the PTG | 21:49 |
mattoliver | I guess the other option is to document that anyone using the register/unregister functions must use some kind of locking, like you're now doing, and so should safely be able to just call them. | 21:49 |
mattoliver | but that still leaves them as non thread-safe blast zones. | 21:50 |
mattoliver | or pull them out of the headers and they just become helper methods for us. We can always add them back if people get annoyed. | 21:50 |
mattoliver | or just rip em out like you've done. | 21:51 |
mattoliver | either way, I'll take a closer look and yeah great topic for PTG | 21:51 |
timburke | yeah, i'd kinda really prefer to lock down the API: make sure that we only expose what we want to expose, and make sure that every entrypoint is threadsafe | 21:51 |
mattoliver | +1 | 21:52 |
timburke | all this was done because i realized that getting pyeclib to properly support free-threaded python probably needs to start with thread-safety for liberasurecode | 21:52 |
timburke | which leads me to... | 21:52 |
timburke | #topic test pyeclib on py313t | 21:52 |
timburke | currently on master it doesn't even build on a free-threaded python build | 21:53 |
timburke | this is because i was a little over-zealous in trying to enforce abi3 builds a while back; it was effective, but i should have left it to setuptools to get the #define in there | 21:54 |
timburke | #link https://review.opendev.org/c/openstack/pyeclib/+/929328 | 21:54 |
patch-bot | patch 929328 - pyeclib - Stop defining Py_LIMITED_API ourselves - 1 patch set | 21:54 |
timburke | will allow it to build, so that | 21:54 |
timburke | #link https://review.opendev.org/c/openstack/pyeclib/+/927605 | 21:54 |
patch-bot | patch 927605 - pyeclib - Test under py313 - 2 patch sets | 21:54 |
timburke | can actually pass | 21:54 |
timburke | though there's a warning about how the GIL got re-enabled | 21:55 |
timburke | so if anyone wants to brush up on their C, there's a bunch of stuff that could use eyes | 21:55 |
timburke | but i've used up a lot of the time already, i ought to open it up for other topics | 21:56 |
timburke | #topic open discussion | 21:56 |
timburke | anything else we ought to bring up this week? | 21:56 |
mattoliver | definitely need to brush up on my c. Nice work tim, really interesting and exciting stuff | 21:56 |
mattoliver | I can't remember what I've been doing. Took time off last week. I started reloading chunked s3api chain and that somehow got me looking at pyeclib wheels. So think I might just try and get some reviews done this week. (I also have a bunch of downstream work on my plate I need to finish up). But next week, hopefully I'll have more updates on things I'm digging into. | 21:59 |
timburke | thanks for looking at both those topics! | 22:00 |
timburke | let me know if you want some high-bandwidth time on either or both of them | 22:00 |
mattoliver | kk, thanks tim | 22:01 |
timburke | all right, i think i'll call it | 22:01 |
timburke | thank you all for coming, and thank you for working on swift! | 22:01 |
timburke | #endmeeting | 22:01 |
opendevmeet | Meeting ended Wed Sep 18 22:01:51 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:01 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.html | 22:01 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.txt | 22:01 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.log.html | 22:01 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's https://review.opendev.org/c/openstack/liberasurecode/+/929855 | 22:37 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's https://review.opendev.org/c/openstack/liberasurecode/+/929855 | 22:57 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's https://review.opendev.org/c/openstack/liberasurecode/+/929855 | 23:02 |
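
A minimal Python sketch of the workaround fuleco describes at 21:17 (giving the internal request an explicit empty body), under timburke's chunked-DELETE hypothesis from 21:26. This is not Swift's code: `env_for_internal_put` and `DrainedClientInput` are hypothetical names used only for illustration, the environ keys are the standard WSGI ones, and the stand-in input raises `TimeoutError` where a real client-socket-backed `wsgi.input` would simply block.

```python
import io


class DrainedClientInput:
    """Stand-in for a socket-backed wsgi.input whose chunked body has
    already been consumed. A real one would block waiting for bytes the
    client will never send; raising keeps this demo finite."""

    def read(self, size=-1):
        raise TimeoutError('client will never send another byte')


def env_for_internal_put(client_env):
    """Copy a client's WSGI environ for a server-generated PUT that has
    no body of its own (hypothetical helper, not part of Swift)."""
    env = dict(client_env)  # shallow copy; the client's environ is untouched
    env['REQUEST_METHOD'] = 'PUT'
    # Never inherit the client's input stream: swap in an empty body and
    # say so explicitly, so nothing downstream waits on the socket.
    env['wsgi.input'] = io.BytesIO(b'')
    env['CONTENT_LENGTH'] = '0'
    env.pop('HTTP_TRANSFER_ENCODING', None)
    return env


# A chunked DELETE whose zero-length body was already read by the server.
client_env = {
    'REQUEST_METHOD': 'DELETE',
    'PATH_INFO': '/v1/AUTH_test/bucket/obj',
    'HTTP_TRANSFER_ENCODING': 'chunked',
    'wsgi.input': DrainedClientInput(),
}

put_env = env_for_internal_put(client_env)
body = put_env['wsgi.input'].read()  # returns immediately with no data
```

With an explicit `Content-Length: 0` and a fresh input object, the read completes at once, which matches the observation at 21:27 that clients sending `Content-Length: 0` do not trigger the problem. Where such a change would belong in Swift is not settled in the discussion above.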