21:00:08 <timburke> #startmeeting swift
21:00:08 <opendevmeet> Meeting started Wed Jul  3 21:00:08 2024 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <opendevmeet> The meeting name has been set to 'swift'
21:00:15 <timburke> who's here for the swift meeting?
21:00:33 <timburke> i know acoles and mattoliver are out
21:01:01 <fulecorafa> I'm here
21:01:10 <timburke> o/
21:01:14 <fulecorafa> o/
21:03:38 <timburke> well, it'll do, even if it's just the two of us :-)
21:03:47 <timburke> as usual, the agenda's at
21:03:49 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:03:54 <timburke> first up
21:04:08 <timburke> #topic published docker images
21:04:21 <timburke> they were busted -- now they're fixed!
21:04:33 <fulecorafa> That's great news!
21:04:34 <timburke> they're also now py3-only!
21:04:45 <timburke> #link https://review.opendev.org/c/openstack/swift/+/896450
21:04:45 <patch-bot> patch 896450 - swift - Get rid of py2 docker image builds; switch "latest... (MERGED) - 3 patch sets
21:05:40 <timburke> we probably should have changed the "latest" tag to be py3 a while ago, so i think it's all good
21:07:06 <timburke> i still need to look into getting a gate job to run functional tests using the built container, though -- but at least the immediate problem of publishing known-broken images is resolved
21:07:41 <timburke> unfortunately, i realized we'd been doing that for months and months :-(
21:08:02 <fulecorafa> We've been taking a look at automating some tests, but they're focused on s3api
21:08:24 <timburke> going back to september at least
21:08:26 <fulecorafa> Maybe we could start messing around with getting the tests to run in the container as well
21:08:26 <timburke> #link https://bugs.launchpad.net/swift/+bug/2037268
21:08:27 <patch-bot> Bug #2037268 - Docker's SAIO doesn't work (Fix Released)
21:09:35 <timburke> fulecorafa, that'd be great, thanks for thinking of it! iirc, the docker image includes s3api support, so that could work well -- i'll double check
21:09:56 <fulecorafa> It does include it, I'm sure of it
21:10:50 <fulecorafa> But the testing we've been doing with the image has been more manual, i.e. a cli client which makes a bunch of calls to a running instance
21:11:17 <timburke> the main thing i need to figure out is how to get the zuul plumbing sorted; i think clarkb has previously given me some pointers to help get me started, just need to go back through some notes
21:11:21 <timburke> ah, makes sense
21:11:49 <fulecorafa> I imagine the ideal scenario would be to automatically spin up a docker instance and run the tests together
21:12:02 <clarkb> timburke: ya, zuul has a bunch of existing base job stuff you can inherit from and supply secrets to, and it does a lot of the container build and publishing work for you
21:12:14 <clarkb> timburke: opendev/system-config has a lot of examples as does zuul/zuul or zuul/nodepool
21:12:15 <timburke> my thought is basically to have a job that starts up the just-built container, then runs `pytest test/functional` with an appropriate config file to point it at the container
21:12:55 <clarkb> probably the zuul stuff is closer to what you're doing since it's also python and a smaller number of images. Look at the zuul image builds and the zuul quickstart job and how they interact
21:13:36 <timburke> yeah -- probably what i really ought to do is extend the existing build job to include validation testing
21:13:52 <fulecorafa> That should be great as well. I did try to run the tests on the image before, but I just couldn't figure out the python dependencies. I'll try again later with this new fix
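(For concreteness, a rough sketch of what that validation step could look like, assuming the published openstackswift/saio image and a test.conf pointing at the running container -- image tag, port, and paths are illustrative, not the final zuul job:

    # start the just-built image in the background (tag and port are assumptions)
    docker run -d -p 8080:8080 openstackswift/saio:latest
    # run swift's functional tests against it, with a config whose auth
    # settings point at the container, e.g. http://127.0.0.1:8080/auth/v1.0
    SWIFT_TEST_CONFIG_FILE=$PWD/test.conf pytest test/functional
)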
21:14:22 <timburke> next up
21:14:28 <timburke> #topic account-reaper and sharded containers
21:14:50 <timburke> i still haven't gotten to writing a probe test. but it's still on my list!
21:15:07 <timburke> hopefully i can get something together for next week
21:15:12 <zaitcev> thanks for remembering
21:15:54 <timburke> next up...
21:16:00 <timburke> #topic cooperative tokens
21:16:31 <jianjian> ah, I joined at the right moment
21:16:51 <timburke> you did!
21:17:29 <timburke> i was hoping jianjian might offer an overview of some of the work here, since i've seen a decent bit of activity/interest lately
21:17:37 <timburke> #link https://review.opendev.org/c/openstack/swift/+/890174
21:17:38 <patch-bot> patch 890174 - swift - common: add memcached based cooperative token mech... - 32 patch sets
21:17:45 <zaitcev> I have a fundamental question: do we actually need these
21:17:53 <zaitcev> in light of the Fernet token review
21:18:10 <timburke> and
21:18:14 <timburke> #link https://review.opendev.org/c/openstack/swift/+/908969
21:18:14 <zaitcev> Soon there will be no memcached, no McRouter, no trouble ... right?
21:18:14 <patch-bot> patch 908969 - swift - proxy: use cooperative tokens to coalesce updating... - 23 patch sets
21:18:51 <jianjian> zaitcev, other than Fernet tokens, swift has other use cases which rely on memcached
21:19:05 <timburke> zaitcev, so memcache will still be a thing -- these are mostly around account/container info and (especially) shard range caching
21:19:17 <zaitcev> jianjian: yeah. But those aren't cooperative token use cases.
21:19:23 <timburke> (thanks for reviewing the fernet token patch, though!)
21:19:43 <zaitcev> Did I review it? I remember looking at it, but...
21:19:54 <zaitcev> (having a biden moment)
21:20:11 <timburke> p 861271 has your +2 and everything :-)
21:20:12 <patch-bot> https://review.opendev.org/c/openstack/swift/+/861271 - swift - tempauth: Support fernet tokens - 6 patch sets
21:20:22 <zaitcev> So anyway, I'm not against the cooperative tokens idea, I think they are pretty clever.
21:20:38 <jianjian> from our production experience, shard range caching is our main target use case. we see a lot of shard range cache misses and the associated thundering herd problems.
21:20:52 <zaitcev> I see.
21:22:07 <jianjian> and when the shard range cache misses, thousands of requests would go to the same containers (3 replicas) and overload those containers, because a shard range GET is a very expensive and slow operation.
21:23:47 <jianjian> and also, some of those requests would get shard ranges and start to write into memcache at the same time, causing memcache to fail as well
21:24:29 <jianjian> but think about it: all of those tens of thousands of requests are asking for the same thing! we only need one of them
21:25:40 <jianjian> that's the basic idea of the cooperative token. on top of that, we allow a few requests to get a token and go to the backend, in case any single one fails to do so.
21:26:37 <zaitcev> I see what I misunderstood. You aren't talking about authentication tokens at all, but tokens that you circulate in memcached like in a TokenRing.
21:27:37 <timburke> yeah, something like a semaphore i think
21:27:38 <zaitcev> Way to go off half-cocked. But thanks for the additional explanation.
21:27:47 <jianjian> no, it's not for authentication.
21:28:03 <jianjian> timburke, that's right
21:28:18 <jianjian> testing on the staging cluster works well; we are going to enable it in production, let's see.
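(Not the actual implementation in patch 890174, just a minimal sketch of the idea jianjian describes, using memcached's atomic add/incr via the python-memcached client -- the key prefix, limit, and TTL are illustrative:

    import memcache  # e.g. mc = memcache.Client(['127.0.0.1:11211'])

    def try_acquire_token(mc, key, limit=3, ttl=10):
        # Return True if this request may fetch shard ranges from the backend.
        token_key = 'cooptoken/%s' % key
        # add() is atomic: exactly one caller gets to create the key
        if mc.add(token_key, 1, time=ttl):
            return True
        # incr() is atomic too; let a few more callers through, in case
        # the first one fails before filling the cache
        count = mc.incr(token_key)
        return count is not None and count <= limit

Requests that don't win a token would then poll memcache briefly for the shard ranges instead of all hammering the container servers.)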
21:29:05 <timburke> we've got a few processes that might be interested in shard ranges -- is this work only improving proxy-server handling? would mattoliver's p 874721 be able to benefit, too?
21:29:05 <patch-bot> https://review.opendev.org/c/openstack/swift/+/874721 - swift - updater: add memcache shard update lookup support - 5 patch sets
21:29:57 <timburke> how are you looking at measuring the improvement?
21:30:16 <jianjian> just noticed this patch, will take a look at it
21:31:29 <jianjian> my goal is to not see the container server 503 storms any more; not sure if it will improve anything that front-end users will see
21:32:01 <timburke> it's more than a year out of date, so i wouldn't worry *too much* about it -- i was just curious if you could see other processes which need shard ranges (such as the object-updater) using the same API
21:32:27 <jianjian> if those container servers aren't overloaded, users would see fewer 503 errors
21:32:49 <timburke> or rather, that *could benefit from* shard ranges -- updater doesn't currently fetch shard ranges, just accepts redirects
21:32:49 <jianjian> good point, I will take a look. thanks, timburke!
21:32:51 <jianjian> cccccbkvbdvfndecuccvrufvdvlccbretkdtnufnnrvr
21:33:13 <jianjian> oh, no, sorry my usb device
21:34:20 <zaitcev> My keyboard just repeats a key.
21:34:29 <timburke> but either way, reducing the load from proxies will be great, and getting some measurable improvements from prod is always a vote of confidence :-) i'll try to review the chain, though it looks like acoles has been helping too
21:35:14 <timburke> next up
21:35:29 <timburke> #topic py312 and slow process start-up
21:36:20 <timburke> this was something i noticed when trying to run tests locally ahead of declaring py312 support in p 917878
21:36:20 <patch-bot> https://review.opendev.org/c/openstack/swift/+/917878 - swift - Test under py312 (MERGED) - 5 patch sets
21:36:53 <timburke> (that only added automated unit testing under py312; func and probe tests i performed locally)
21:37:12 <timburke> func tests were fine, but probe tests took *forever*
21:38:13 <timburke> i eventually traced it down to warnings about pkg_resources being deprecated
21:39:03 <zaitcev> I think I recall fixing one... Only because watcher used it and I was obligated.
21:39:37 <timburke> the kind of funny thing is that setuptools (which was what was issuing the warning about pkg_resources being deprecated) was *also* the one writing code that would use pkg_resources!
21:40:02 <timburke> https://github.com/pypa/setuptools/blob/main/setuptools/script.tmpl
21:40:02 <jianjian> haha
21:40:20 <zaitcev> groan
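(For reference, the template behind those generated bin scripts is only a few lines; at the time it looked roughly like this, per the setuptools repo linked above:

    # EASY-INSTALL-SCRIPT: %(spec)r,%(script_name)r
    __requires__ = %(spec)r
    __import__('pkg_resources').run_script(%(spec)r, %(script_name)r)
)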
21:41:05 <jianjian> I noticed our pipeline has "swift-tox-py312" now, so that's added by the 917878 patch?
21:41:08 <timburke> that only really came up with py312 because prior to that, python -m venv would include an older version of setuptools that didn't issue the warning
21:41:16 <timburke> jianjian, yup!
21:42:04 <jianjian> 👍
21:42:24 <timburke> after a bit more digging, i figured out that we could change how we declare these bin scripts to get around it
21:43:59 <timburke> basically, instead of listing it in the "scripts" of the "files" section in setup.cfg, list it in "console_scripts" in "entry_points"
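(Concretely, the setup.cfg change is something like the following -- the module path and main() callable on the right are illustrative; the "relocating" work mentioned below is about making sure each script's code lives somewhere importable:

    [files]
    scripts =
        bin/swift-object-server

becomes

    [entry_points]
    console_scripts =
        swift-object-server = swift.obj.server:main
)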
21:44:44 <zaitcev> Oh, so that was the reason
21:45:08 <zaitcev> Why didn't you write it in the changelog? It's not self-evident. I'd add a +2 right away.
21:45:12 <timburke> so i started doing the conversion. it's mostly fairly mechanical, but there are a few bin scripts that still have a lot of code in them that needs to get relocated
21:45:44 <timburke> i thought i did ok on that in p 918365 ! i guess not clear enough :-)
21:45:44 <patch-bot> https://review.opendev.org/c/openstack/swift/+/918365 - swift - Use entry_points for server executables - 4 patch sets
21:45:46 <jianjian> nice! this brings us closer to running py312 on prod
21:45:53 <zaitcev> Oh, right. That'd add a lot of boilerplate.
21:47:26 <timburke> anyway, mattoliver at least took a look a couple of months ago, recommending that we do it for everybody, so now i've added more patches stacked on top
21:47:46 <timburke> if anyone has time to review, i'd appreciate it
21:47:54 <timburke> next up
21:48:06 <timburke> #topic multi-policy containers
21:48:25 <timburke> i promised fulecorafa i'd get this on the agenda :-)
21:48:36 <fulecorafa> Thanks a lot :)
21:49:23 <timburke> i'm not sure what would be most useful for you, though. more discussion/brainstorming, maybe?
21:49:57 <fulecorafa> So, we've been working on this and we have a working prototype. Based on that idea from last meeting of making a +cold bucket automatically
21:50:23 <timburke> cool!
21:51:01 <zaitcev> Interesting!
21:51:07 <fulecorafa> We still have to make some adaptations before we start thinking about a patch, though. We've been working with an old version, which is what we use in production
21:51:54 <fulecorafa> Some other good news: we got it working with MPUs and versioning as well
21:51:54 <timburke> always best to start from what you're running in prod :-)
21:51:59 <timburke> nice!
21:52:56 <fulecorafa> One thing we are still pondering though
21:53:54 <fulecorafa> For multipart uploads, it kind of came for free with our modifications to the protocol, if we have 4 buckets in the end: bkt, bkt+segments, bkt+cold, and bkt+segments+cold
21:54:35 <timburke> that'd seem to make sense
21:55:33 <fulecorafa> We came to consider possibly removing this need by adapting the manifest file to skip the linking bkt -> bkt+cold (manifest) -> bkt+segments+cold (parts). So it would be just bkt (manifest) -> bkt+segments+cold (parts)
21:55:48 <fulecorafa> Is it worth the trouble?
21:57:20 <timburke> maybe? i think it'd probably come down to what kind of performance tradeoffs you can make -- the extra hop will necessarily increase time-to-first-byte
21:58:26 <fulecorafa> My personal opinion is that it is not worth it. It would largely complicate the code, and the increase in time would be small compared to the whole process
21:58:48 <fulecorafa> In our tests, we didn't notice much of a difference
22:01:01 <timburke> having the four buckets seems unavoidable -- bkt / bkt+cold for normal uploads, bkt+segments / bkt+segments+cold for part data
22:01:04 <timburke> i'm inclined to agree that having the extra hop is worth it if it means the code is easier to grok and maintain
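(For concreteness, skipping the hop would mean the manifest stored in bkt points straight at the cold segments, e.g. an SLO manifest along these lines -- the paths are hypothetical:

    [{"path": "/bkt+segments+cold/obj/part-01", "etag": "...", "size_bytes": 1048576},
     {"path": "/bkt+segments+cold/obj/part-02", "etag": "...", "size_bytes": 1048576}]

versus the chain through bkt+cold that the prototype uses today.)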
22:02:06 <fulecorafa> Then we agree. Maybe this changes when we come around to patching upstream, but we can talk about it then
22:02:17 <timburke> cool, sounds great!
22:02:23 <timburke> all right, we're about at time
22:02:38 <timburke> thank you all for coming, and thank you for working on swift!
22:02:44 <timburke> #endmeeting