Wednesday, 2024-09-18

simondodsleyNewbie to Swift here. What is the preferred way of integrating an S3 capable external device to use as a Swift store?16:17
DHEyou mean use swift as its storage?17:17
opendevreviewMerged openstack/swift master: CI: Include --domain in more openstack commands  https://review.opendev.org/c/openstack/swift/+/92968817:59
fulecohey timburke if you're online, I'd like to discuss a bug that appeared in our deploy this week18:51
timburkesimondodsley, what do you mean by "use as a Swift store"? swift is a storage platform; it can act as an S3 endpoint, but the configuration details will vary from client to client18:53
timburkefuleco, sure -- what's up?18:53
simondodsleytimburke I have an object store platform that i want to configure into swift. It is S3 compliant, but I'm not seeing how to add it to Swift as the only backend.18:55
fulecoSo, we've had a user with an extremely large versioned bucket and a high level of traffic who reported a problem with an rclone purge (delete all objects in the bucket, then the bucket). It got a timeout. From what we got out of the stack trace, it comes down to object_versioning.py, which tries to read an empty body and never gets unblocked.18:55
fulecoI couldn't replicate the error in other environments, however. I was wondering if something like that came up in the past? From what i've searched, it has not18:56
timburkesimondodsley, swift doesn't really take storage backends; it's an object store. i guess maybe you could wire something together with https://opendev.org/x/swiftonfile and some filesystem <-> S3 bridge, but i'd mostly be left wondering, why?18:58
timburkefuleco, i don't recall having seen that before -- any chance you've got a stack trace for that "tries to read an empty body"?19:00
simondodsleywhat our customers are wanting to do is use the expensive object store they have purchased and integrate it into their openstack cluster. Is there any way to use that external object platform as the primary object store for swift?19:00
simondodsleyor is there some gateway that can be utilized19:00
fulecoI do have one, but it is somewhat large. How would you like me to send it?19:00
timburkesimondodsley, maybe you could register it in the keystone catalog? but does the S3-compatible store speak the Swift API?19:01
timburkefuleco, i'll often drop these sorts of things on https://paste.opendev.org/19:02
simondodsleytimburke - no it doesn't speak swift19:02
fulecotimburke ok, ill set it up19:03
simondodsleyis there any swift<->S3 gateway we could use?19:05
fulecohttps://paste.opendev.org/show/b0G5BYi68GrDCeWIOdrZ/ Maybe you can access it?19:06
timburkesimondodsley, maybe something like https://github.com/caiobrentano/swift-s3-sync ? it's an old project, and not even the original, but a fork that was made before the original went private19:13
simondodsleytimburke: thanks - is the original a paid for service now?19:14
timburkesimondodsley, it's more complicated than that -- it went private when funding dried up for the startup that built it; start up got acquired with it still private, then the project largely got abandoned19:19
timburke(full disclosure: i was working at the startup through all this)19:19
timburkelooks like some of the other "forks" have a more complete history -- something like https://github.com/GhostPunk/1space is better, but still hasn't been updated in 5 years19:20
timburkefuleco, looks like the client didn't provide any content-length (see https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/versioned_writes/object_versioning.py#L676-L679)19:23
timburkethat should either mean a chunked transfer, or the client should shut down its sending and just be listening for a response19:24
fulecoThe client is rclone. Even though it seems like it, this just happens in this one specific bucket. We could not isolate the problem or repeat the behaviour. Looks to me like it could be a data desync between concurrent tasks?19:27
timburkecurious that it should be caught somewhere around https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/s3api/s3api.py#L361-L362 -- oh! it's a proper TimeoutError, rather than an eventlet.Timeout!19:29
fulecoYep19:29
fulecoIt looks like it tries to read an empty buffer and keeps waiting eternally19:30
timburkeoh! or maybe the trouble is that https://github.com/openstack/swift/blob/2.34.0/swift/common/middleware/s3api/controllers/obj.py#L206-L207 re-uses the client req without replacing the wsgi.input...19:37
timburkei'm surprised that wouldn't have been caught in func tests though...19:37
fulecoYep that's what we were going for.19:37
fulecoIt is very weird though that it didn't happen in any tests19:38
fulecoNot the automated ones, nor the manual ones we run19:38
fulecoIt just happens in this specific scenario19:38
timburkeugh, right, and i left my dev env in a dirty state... i can't even really test myself right now19:45
timburkelooks like it'd manifest as a 500 to the client, though, yeah?19:45
fulecoNo worries, just wanted to bring it up to discussion.19:45
fulecoExactly, 500 error19:46
timburkecreated https://bugs.launchpad.net/swift/+bug/2081103 at least20:03
patch-botBug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New)20:03
fulecoOh thanks a lot timburke. I was opening it on my end too haha20:03
opendevreviewTim Burke proposed openstack/liberasurecode master: built-in rs_vand: De-init tables only when last descriptor is destroyed  https://review.opendev.org/c/openstack/liberasurecode/+/92919320:26
opendevreviewTim Burke proposed openstack/liberasurecode master: Fix write locking when destroying instances  https://review.opendev.org/c/openstack/liberasurecode/+/92932420:26
opendevreviewTim Burke proposed openstack/liberasurecode master: Fix write locking when creating instances  https://review.opendev.org/c/openstack/liberasurecode/+/92932520:26
opendevreviewTim Burke proposed openstack/liberasurecode master: Add read locks when accessing EC descriptors  https://review.opendev.org/c/openstack/liberasurecode/+/92984720:27
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Sep 18 21:00:09 2024 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift meeting?21:00
fulecoo/21:00
mattolivero/21:01
timburkeas usual, the agenda's at21:01
timburke#link https://wiki.openstack.org/wiki/Meetings/Swift21:01
timburkefirst up21:01
timburke#topic 2.34.0 release21:01
timburkeit's out! this is our final release for the dalmatian cycle21:01
fuleco👏🏻21:02
timburkethere's a lot of good stuff in it21:02
timburke#link https://opendev.org/openstack/swift/src/branch/master/CHANGELOG21:02
mattoliver\o/21:03
mattolivernice21:03
mattoliverbest swift ever :) 21:03
mattoliveror best swift yet21:03
mattoliverwhatever the saying :P 21:03
timburkeit's one better than the last one!21:03
mattoliver-ETOOEARLY :P 21:03
mattoliverlol21:03
timburkenext up21:04
timburke#topic vPTG21:04
timburkeit's next month!21:04
timburkesorry, it kind of snuck up on me; i should really pay more attention to the mailing list21:04
mattoliverYeah, it seems to do that21:05
timburkewe aren't on the initial project list21:05
timburke#link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/APAC5ANX4TQLP5R257D6OIADTN6Y5GMS/21:05
patch-botBug #5 - Plone Placeless Translation Service metadata missing from po files (Fix Released)21:05
mattoliverI don't think that's the right line :P 21:05
mattoliver*link21:05
mattoliverI think we need to send an email to the ptg people. 21:05
timburkebut i sent an email as recommended21:06
mattoliveroh cool21:06
mattoliverwant me to set up the etherpads so we can start collecting topics?21:06
timburkesure! i started on the general topics one, but if you wanted to pick up the ops-feedback, that'd be great!21:07
mattoliverkk21:07
timburke#link https://etherpad.opendev.org/p/swift-ptg-epoxy21:07
timburke(side note: i also just learned today that our next release name is Epoxy -- i suppose we help hold everything together :-)21:08
mattoliverlol, really21:09
timburkei'll get a poll up for meeting times this week, though i suspect as usual there just isn't going to be a great time slot21:09
timburkeoh, and registration is at21:10
timburke#link https://ptg2024.openinfra.dev/21:10
timburkenext up21:10
timburke#topic s3api bugs21:11
timburkefuleco has been good at finding some current deficiencies in our s3api21:11
timburke#link https://bugs.launchpad.net/swift/+bug/207717921:11
patch-botBug #2077179 - S3Api - async delete (New)21:11
timburkeand today21:11
timburke#link https://bugs.launchpad.net/swift/+bug/208110321:11
patch-botBug #2081103 - s3api: Deleting the current version of an object can (sometimes?) 500 (New)21:11
timburkei don't think there's much progress on either yet, but i wanted to make sure that people were aware of them, and hopefully we can devote some time toward them before the PTG21:13
fulecoYeah the first one is kind of a feature too21:13
fulecoSince it adds a functionality to s3api21:13
fulecoBut the second one is the weird one in the pack, I've been cracking my head at it for some days now...21:13
timburketrue enough -- though i'd be willing to bet that AWS does some dribble-out-whitespace trick for long-running multi-deletes, so us *not* matching that behavior could be construed as a bug ;-)21:14
fulecoSeems fair XD21:15
fulecoProgress on my part has been a little slow, some other issues got my schedule filled21:15
fulecoBut I hope I can do something on them in the next weeks21:15
mattoliverWe know the feeling (re: schedules). Nice finds21:16
timburkefwiw, the more i think about it, i've got a suspicion that the second one has to do with a Transfer-Encoding: chunked DELETE call, and us re-using the client request environment to PUT the old version back21:16
fulecoI do think so too21:17
timburkeit reminds me a bit of how we (ab)use chunked transfers for EC PUTs21:17
timburkehttps://bugs.launchpad.net/swift/+bug/149663621:17
fulecoOn my side, I'll be applying a patch to just add an empty body on the request call to try and deflect that. That would at least give us some more information21:17
patch-botBug #1496636 - EC: Chunked transfer/commit protocol is *not* HTTP (In Progress)21:17
fulecoHowever21:17
timburkewhich unfortunately means that we don't really want eventlet to fix it properly21:18
fulecoI cannot replicate this error in any environment I have access to21:18
fulecoSo I'll have to wait for it to go to production and maybe get some feedback? Even the original environment where it errored does not consistently report it21:18
fulecoBut I would be more than happy if anyone has other ideas or can at least replicate the error21:19
timburkefor sure, having someone replicate the error would be great21:20
mattoliverit is weird that it's a timeout and not an eventlet.timeout, and down in eventlet hub. 21:21
fulecoJust to give a bit of context, it was a bucket with 230GB in files, versions and delete markers, totaling about 3 and a half million entries21:21
mattoliverI mean I thought eventlet would've monkey patched timeout there. 21:21
fulecoYeah, it is very weird21:22
timburkeagreed. i might be partly to blame; i think i did a lot of the get-tests-passing-on-py310 work for eventlet21:22
fulecoAs I was commenting with tim before, my best bet is that it is trying to read a body file that does not exist, but for some reason it registers the read and kinda deadlocks in the IO operation21:23
timburkeand i kinda remember some funny business around TimeoutError21:23
fulecobut I have to admit I didn't go down the eventlet route too much21:23
mattolivergood thing we have a timburke who does delve in eventlet upstream :) 21:24
timburkethe chunked transfer thing makes some sense to me -- client sends a DELETE with the 0-byte chunk, then starts reading and blocks21:26
timburkeserver takes the client request and makes the listing to discover the new current version, consuming the client-provided body in the process; server then uses the same client-socket-backed wsgi.input for the PUT, where we try to read a byte that the client will never send21:26
timburke(because it would break HTTP)21:26
mattoliveroh i see. and handle_put_version is what's then putting the old version back in place? 21:27
timburkebut if the client sends an explicit Content-Length: 0, everything works out fine (because we can "consume" that kind of a body all we want)21:27
timburkeyup21:27
timburkeso that's my suspicion; i just need to get my dev environment back to a point that i can test it21:28
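The suspected failure mode, and the "empty body" workaround fuleco mentions, can be sketched outside of Swift. This is only an illustration under stated assumptions, not Swift's actual code: `subrequest_environ` is a hypothetical helper, and the environ keys are the standard WSGI ones. The idea is that the internal PUT gets its own empty `wsgi.input` and an explicit `Content-Length: 0`, instead of inheriting the client's already-consumed, socket-backed input from a chunked DELETE.

```python
import io


def subrequest_environ(client_environ, method):
    """Copy a WSGI environ for an internal subrequest with an empty body.

    Hypothetical helper, not part of Swift. The copy never touches the
    client's wsgi.input, so reading the subrequest body cannot block on
    the client socket.
    """
    env = dict(client_environ)
    env['REQUEST_METHOD'] = method
    # Fresh, empty body that reads as EOF immediately.
    env['wsgi.input'] = io.BytesIO(b'')
    env['CONTENT_LENGTH'] = '0'
    # Drop the client's chunked framing; it does not apply to the new body.
    env.pop('HTTP_TRANSFER_ENCODING', None)
    return env


client_env = {
    'REQUEST_METHOD': 'DELETE',
    'PATH_INFO': '/bucket/object',
    'HTTP_TRANSFER_ENCODING': 'chunked',
    'wsgi.input': io.BytesIO(b''),  # stand-in for the client socket
}
put_env = subrequest_environ(client_env, 'PUT')
print(put_env['REQUEST_METHOD'], put_env['CONTENT_LENGTH'])
```

Whether the real fix belongs in s3api or in the versioning middleware is not settled in this discussion; the sketch only shows the shape of the idea.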
timburkeextra fun will be breaking the HTTP protocol by sending more data, and seeing what happens when symlink balks21:29
opendevreviewShreeya Deshpande proposed openstack/swift master: Split statsd client from logger  https://review.opendev.org/c/openstack/swift/+/91548321:29
timburkespeaking of rabbit holes for tim to fall down...21:29
timburke#topic liberasurecode segfault21:30
timburkei discovered that creating multiple liberasurecode_rs_vand instances leaks memory!21:31
mattoliveroh yay :( 21:31
timburkeand worse, destroying any one of them will leave all the others segfault-y21:31
timburkelike, can't even destroy them21:32
timburkepatch was reasonably easy, though!21:32
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/92919321:32
patch-botpatch 929193 - liberasurecode - built-in rs_vand: De-init tables only when last de... - 3 patch sets21:32
timburkethe test even reads remarkably well, IMHO21:33
mattolivernice one21:33
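The idea named in the patch title ("De-init tables only when last descriptor is destroyed") can be illustrated with a small reference-counting sketch. This is Python rather than the library's C, every name here is hypothetical, and the lock is an addition for the sake of the sketch rather than a description of the patch.

```python
import threading

_lock = threading.Lock()
_shared_tables = None    # stand-in for a backend's global lookup tables
_live_instances = 0      # how many instances currently rely on the tables


def init_instance():
    """Build the shared tables on first use, then just count users."""
    global _shared_tables, _live_instances
    with _lock:
        if _live_instances == 0:
            _shared_tables = [i * i for i in range(256)]
        _live_instances += 1


def destroy_instance():
    """Tear the shared tables down only when the last user goes away."""
    global _shared_tables, _live_instances
    with _lock:
        if _live_instances == 0:
            raise RuntimeError('no live instances to destroy')
        _live_instances -= 1
        if _live_instances == 0:
            _shared_tables = None


def tables_ready():
    with _lock:
        return _shared_tables is not None
```

With two instances created, destroying one leaves the tables in place for the other; only the second destroy releases them.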
timburkei discovered this because i was trying to write an even more complicated test around...21:33
timburke#topic liberasurecode thread-safety21:33
timburkewe've known for a while that it's not thread safe21:34
timburke#link https://bugs.launchpad.net/liberasurecode/+bug/195435121:34
patch-botBug #1954351 - Multiple APIs not thread-safe (In Progress)21:34
timburkewhen i wrote that, i didn't realize just *how* not-thread-safe it is21:34
timburkethe one lock we've got currently guards just a single data structure, but i'm increasingly convinced that it should guard all usages of backend instances (the structures that get looked up via the descriptor numbers we return)21:36
timburkemostly because we can't trust that whatever backend initialization/cleanup will be thread-safe21:37
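A rough sketch of what "the lock guards all usages of backend instances" could look like, again in Python with hypothetical names. It uses one plain mutex for simplicity, whereas the patches under review talk about separate read and write locking, so treat this as a simplification rather than the proposed design.

```python
import itertools
import threading


class DescriptorRegistry:
    """Hand out integer descriptors; never expose instances directly."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}
        self._next_desc = itertools.count(1)

    def create(self, instance):
        with self._lock:
            desc = next(self._next_desc)
            self._instances[desc] = instance
            return desc

    def use(self, desc, func):
        # The lock is held for the whole call, so the instance cannot be
        # destroyed by another thread while func is still using it.
        with self._lock:
            if desc not in self._instances:
                raise ValueError('invalid descriptor: %r' % (desc,))
            return func(self._instances[desc])

    def destroy(self, desc):
        with self._lock:
            if desc not in self._instances:
                raise ValueError('invalid descriptor: %r' % (desc,))
            del self._instances[desc]
```

Holding a single mutex across every use serializes callers, which is the trade-off a reader/writer lock is meant to soften.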
mattolivercool, haven't looked at liberasurecode in a while. might need to reload what I can in my head :) But yeah test reads pretty good for c :P 21:37
timburkethe fix i wrote for the built-in rs-vand implementation isn't, though i could add another lock and make it better. but i also discovered that jerasurecode isn't, and i wouldn't be surprised if isa-l had similar trouble21:38
timburkei've got a few patches now to improve things; the first two seem pretty ready-to-go21:39
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/92932421:39
patch-botpatch 929324 - liberasurecode - Fix write locking when destroying instances - 2 patch sets21:39
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/92932521:39
patch-botpatch 929325 - liberasurecode - Fix write locking when creating instances - 2 patch sets21:39
timburkethe last one goes and adds all the missing read-locks, but is still under-tested21:40
timburke#link https://review.opendev.org/c/openstack/liberasurecode/+/92984721:40
patch-botpatch 929847 - liberasurecode - Add read locks when accessing EC descriptors - 1 patch set21:40
timburkethe one hesitation i've got with those first two is that we'll remove some functions that are currently exposed21:41
timburkeliberasurecode_backend_instance_register and liberasurecode_backend_instance_unregister specifically21:42
mattoliveryeah, I just saw that21:42
timburkebut these are functions that act on ec_backend_t instances, which callers should not be using directly -- that's the whole point of the descriptors we return21:42
timburkethe more i looked into it, the more i realized21:43
timburke1, we expose a whole lot more symbols than we really want to maintain as part of our API21:43
timburkeand 2, we really need to have a way to track what symbols we're publishing and make it obvious that we're adding to or removing from that list21:44
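One way to make additions and removals obvious is to diff the symbols a build exports against a checked-in list. The sketch below is an assumption about how that might look, not what the liberasurecode change does: it parses text in the usual `nm -D --defined-only` layout (address, type letter, name), and the sample addresses are made up.

```python
def exported_symbols(nm_output):
    """Collect defined global symbol names from nm-style output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expect "<address> <type> <name>"; keep global code/data symbols.
        if len(parts) == 3 and parts[1] in ('T', 'D', 'B', 'R'):
            symbols.add(parts[2])
    return symbols


def diff_symbols(expected, actual):
    """Return (removed, added) relative to the expected symbol list."""
    return sorted(expected - actual), sorted(actual - expected)


# Made-up sample: the tracked list still names the two functions the
# patches would remove, while the new build no longer exports them.
expected = {
    'liberasurecode_encode',
    'liberasurecode_backend_instance_register',
    'liberasurecode_backend_instance_unregister',
}
sample_nm_output = """\
0000000000001139 T liberasurecode_encode
0000000000004010 b some_private_table
"""
removed, added = diff_symbols(expected, exported_symbols(sample_nm_output))
print('removed:', removed)
print('added:', added)
```

A check like this could fail the build whenever either list is non-empty, forcing the tracked list to be updated deliberately.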
timburkeFWIW, i did a bit of digging and discovered that we've previously removed (well, renamed) symbols21:45
timburkesee https://github.com/openstack/liberasurecode/commit/a6a8d201 -- is_valid_fragment_metadata was renamed to is_invalid_fragment_metadata21:45
timburke(yes, really)21:45
fulecoOh no21:46
mattoliverwow, go us21:46
timburkethat was back in 1.2.0 -- and so far as i know, no one called us out for not going to a 2.0 release instead21:47
timburkei *think* that the main reason for that is that we have so few consumers -- checking a variety of package managers, i saw at most liberasurecode-dev(el) and pyeclib as reverse dependencies21:47
timburkeso, provided we don't break pyeclib, maybe we're good to slim things down to the API we actually want to maintain?21:48
timburkeprobably a thing to discuss at the PTG21:49
mattoliverI guess the other option is to document that anyone using the register/unregister functions must use some kind of locking, like you're now doing, and so should be able to call them safely. 21:49
mattoliverbut that still leaves them as non thread-safe blast zones. 21:50
mattoliveror pull them out of the headers and they just become helper methods for us. We can always add them back if people get annoyed. 21:50
mattoliveror just rip em out like you've done.21:51
mattolivereither way, I'll take a closer look and yeah great topic for PTG21:51
timburkeyeah, i'd kinda really prefer to lock down the API: make sure that we only expose what we want to expose, and make sure that every entrypoint is threadsafe21:51
mattoliver+121:52
timburkeall this was done because i realized that getting pyeclib to properly support free-threaded python probably needs to start with thread-safety for liberasurecode21:52
timburkewhich leads me to...21:52
timburke#topic test pyeclib on py313t21:52
timburkecurrently on master it doesn't even build on a free-threaded python build21:53
timburkethis is because i was a little over-zealous in trying to enforce abi3 builds a while back; it was effective, but i should have left it to setuptools to get the #define in there21:54
timburke#link https://review.opendev.org/c/openstack/pyeclib/+/92932821:54
patch-botpatch 929328 - pyeclib - Stop defining Py_LIMITED_API ourselves - 1 patch set21:54
timburkewill allow it to build, so that21:54
timburke#link https://review.opendev.org/c/openstack/pyeclib/+/92760521:54
patch-botpatch 927605 - pyeclib - Test under py313 - 2 patch sets21:54
timburkecan actually pass21:54
timburkethough there's a warning about how the GIL got re-enabled21:55
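For anyone wanting to check this locally, here is a small sketch for telling whether the interpreter is a free-threaded build and whether the GIL is currently active. It assumes CPython 3.13's `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` build variable; on older interpreters it falls back to reporting the GIL as enabled. The function name is made up for this example.

```python
import sys
import sysconfig


def free_threading_status():
    """Return (build_is_free_threaded, gil_currently_enabled)."""
    build_is_free_threaded = bool(
        sysconfig.get_config_var('Py_GIL_DISABLED'))
    # sys._is_gil_enabled() only exists on 3.13+; older versions always
    # run with the GIL.
    gil_enabled = getattr(sys, '_is_gil_enabled', lambda: True)()
    return build_is_free_threaded, gil_enabled


build_ft, gil_on = free_threading_status()
print('free-threaded build:', build_ft)
print('GIL enabled right now:', gil_on)
```

Calling this before and after importing an extension module on a free-threaded build should show whether that import switched the GIL back on.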
timburkeso if anyone wants to brush up on their C, there's a bunch of stuff that could use eyes21:55
timburkebut i've used up a lot of the time already, i ought to open it up for other topics21:56
timburke#topic open discussion21:56
timburkeanything else we ought to bring up this week?21:56
mattoliverdefinitely need to brush up on my c. Nice work tim, really interesting and exciting stuff21:56
mattoliverI can't remember what I've been doing. Took time off last week. I started reloading chunked s3api chain and that somehow got me looking at pyeclib wheels. So think I might just try and get some reviews done this week. (I also have a bunch of downstream work on my plate I need to finish up). But next week, hopefully I'll have more updates on things I'm digging into.21:59
timburkethanks for looking at both those topics!22:00
timburkelet me know if you want some high-bandwidth time on either or both of them22:00
mattoliverkk, thanks tim22:01
timburkeall right, i think i'll call it22:01
timburkethank you all for coming, and thank you for working on swift!22:01
timburke#endmeeting22:01
opendevmeetMeeting ended Wed Sep 18 22:01:51 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)22:01
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.html22:01
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.txt22:01
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2024/swift.2024-09-18-21.00.log.html22:01
opendevreviewTim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's  https://review.opendev.org/c/openstack/liberasurecode/+/92985522:37
opendevreviewTim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's  https://review.opendev.org/c/openstack/liberasurecode/+/92985522:57
opendevreviewTim Burke proposed openstack/liberasurecode master: Track symbols exposed by built so's  https://review.opendev.org/c/openstack/liberasurecode/+/92985523:02