21:01:00 <timburke> #startmeeting swift
21:01:00 <opendevmeet> Meeting started Wed Dec  4 21:01:00 2024 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:00 <opendevmeet> The meeting name has been set to 'swift'
21:01:08 <timburke> who's here for the swift meeting?
21:01:17 <mattoliver> o/
21:01:46 <nathang15> #help
21:01:50 <nathang15> me
21:02:06 <timburke> i haven't updated the agenda; still catching up a bit on what i've missed the past couple weeks
21:02:49 <timburke> but i think i can pull a few topics out of that catch-up ;-)
21:02:58 <timburke> #topic failing cors tests
21:03:29 <timburke> a new eventlet release broke one of our test jobs
21:04:40 <timburke> it was a fairly esoteric sort of a break; the content-type of HEAD requests against accounts and containers changed with eventlet 0.38.0
21:05:16 <timburke> clayg spearheaded the fix in p 935817
21:05:17 <patch-bot> https://review.opendev.org/c/openstack/swift/+/935817 - swift - Ensure correct content-type in container HEAD resp... (MERGED) - 6 patch sets
21:05:21 <opendevreview> Matthew Oliver proposed openstack/swift master: Quarantine asyncs older then reclaim_age  https://review.opendev.org/c/openstack/swift/+/527296
21:06:06 <mattoliver> yup, that was fun, we all learnt how much work timburke does for this project!
21:06:23 <timburke> and i've proposed backports of that to the stable branches. i'm a little torn about the `Content-Length: 0` that's getting inserted, though
21:07:51 <timburke> it revealed a broader problem with our CI, though: many/most of our jobs run with a weird mishmash of dependency versions
21:08:53 <timburke> many get pinned by upper-constraints from the requirements repo, but direct dependencies are not (which meant that eventlet was getting updated despite the global u-c having 0.36.1)
21:09:42 <timburke> *that* should get fixed by p 936872 (which also has a comment going into what's going on there a little more)
21:09:42 <patch-bot> https://review.opendev.org/c/openstack/swift/+/936872 - swift - CI: drop pip --upgrade flag in tox.ini - 1 patch set
21:09:47 <mattoliver> ahh, ok, that's annoying
21:10:20 <timburke> or rather, that addresses the problem for many, but not all, jobs
21:11:04 <timburke> there's another comment on that review that calls out the other jobs that i saw pulling in latest eventlet
21:11:06 <mattoliver> ok, so the -U would trump the constraints.. wow.
21:12:45 <timburke> mattoliver, kind of! only for direct dependencies, which is probably why it hasn't bitten us much before: we actively try to keep that list small
21:14:04 <timburke> earlier today i also pushed up p 937045 to add constraints to a job that was missing them
21:14:04 <patch-bot> https://review.opendev.org/c/openstack/swift/+/937045 - swift - CI: Use constraints for api-ref builds - 1 patch set
21:14:52 <mattoliver> ok, still, unexpected behaviour can cause us to shoot ourselves in the foot unknowingly. nice find.
21:14:59 <timburke> and i'm working on bringing our py3-constraints.txt file more in line with the requirements u-c version
21:15:07 <mattoliver> at least now we know we can work with a later eventlet :P
21:15:31 <timburke> next up after that, we'll want to
21:16:18 <timburke> (1) use constraints for all the *other* places we run `pip install` (down in ansible playbooks)
21:16:52 <timburke> and (2) add some non-voting jobs that intentionally run unconstrained
21:17:19 <timburke> i'm thinking unit and func-encryption tests should be sufficient, probably on py312
21:17:39 <mattoliver> kk, I feel less worried about api-ref, but agree that consistency is much better than having problems like this bite us on the butt again.
21:17:46 <mattoliver> Sounds like a solid plan.
21:18:03 <mattoliver> non-voting to let us know if there are problems afoot but not gate blockers!
21:20:33 <timburke> #topic statsd-logging coupling
21:20:41 <timburke> looks like p 931473 finally merged!
21:20:42 <patch-bot> https://review.opendev.org/c/openstack/swift/+/931473 - swift - Remove statds from the logs module (MERGED) - 27 patch sets
21:21:31 <timburke> great work acoles and shreeya, and thanks for reviewing clayg
21:21:43 <timburke> i know there was a decent bit of back-and-forth on it
21:22:35 <timburke> not just on that patch, but p 915483 as well
21:22:36 <patch-bot> https://review.opendev.org/c/openstack/swift/+/915483 - swift - Split statsd client from logger (ABANDONED) - 52 patch sets
21:23:49 <mattoliver> 52 patchsets and abandoned. yeah
21:25:23 <timburke> now that that's landed...
21:25:32 <timburke> #topic labeled metrics
21:26:14 <timburke> it looks like yan has rebased the chain: p 909882, p 917711, p 930918
21:26:14 <patch-bot> https://review.opendev.org/c/openstack/swift/+/909882 - swift - stats: API for native labeled metrics - 39 patch sets
21:26:14 <patch-bot> https://review.opendev.org/c/openstack/swift/+/917711 - swift - Add labeled metrics to proxy-logging - 23 patch sets
21:26:16 <patch-bot> https://review.opendev.org/c/openstack/swift/+/930918 - swift - proxy-logging: Add real-time transfer bytes counters - 10 patch sets
21:27:29 <mattoliver> nice, one step closer to having labelled metrics support in swift
21:27:31 <timburke> we should probably try to focus on that next so it doesn't get stalled out again (i think i last reviewed it back in July 😱)
21:28:16 <mattoliver> we have been running these downstream too, right? so they're also getting some live testing
21:29:37 <timburke> speaking of code already running downstream...
21:29:55 <timburke> #topic swift-reload robustness
21:30:10 <timburke> i finally got around to addressing some of clayg's concerns on p 837641
21:30:10 <patch-bot> https://review.opendev.org/c/openstack/swift/+/837641 - swift - Add abstract sockets for process notifications - 13 patch sets
21:31:11 <mattoliver> oh cool, I'll take a look
21:31:19 <timburke> the major thing was to rip out the ancdata processing that verified messages were coming from the expected pid; it's unlikely that anything else would be using the pid-specific address
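A minimal sketch of pid-specific notification addresses in Linux's abstract socket namespace. It assumes datagram sockets and a hypothetical `swift-notify-example-<pid>` naming scheme; neither is taken from p 837641. Following the change described above, the listener relies on the pid-specific address alone and does not check sender credentials via ancillary data.

```python
import os
import socket


def notify_address(pid):
    # Hypothetical naming scheme. The leading NUL byte places the address
    # in Linux's abstract namespace, so no socket file is created on disk.
    return b"\0swift-notify-example-%d" % pid


def make_listener(pid):
    # Bind a datagram socket to the pid-specific address. No SO_PASSCRED /
    # ancdata check: the address itself is trusted to identify the process.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(notify_address(pid))
    return sock


def send_notification(pid, message):
    # Fire-and-forget datagram to whichever process bound that address.
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, notify_address(pid))


if __name__ == "__main__":
    listener = make_listener(os.getpid())
    try:
        send_notification(os.getpid(), b"READY")
        listener.settimeout(1)
        print(listener.recv(1024))
    finally:
        listener.close()  # frees the abstract address immediately
```

Abstract addresses disappear when the last socket bound to them closes, which avoids stale socket files after a crash. The tradeoff is that this is Linux-only.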
21:32:08 <mattoliver> kk
21:32:10 <timburke> which also gave me an excuse to dust off p 900957
21:32:10 <patch-bot> https://review.opendev.org/c/openstack/swift/+/900957 - swift - wsgi: Avoid the need for a separate socket-closer ... - 2 patch sets
21:35:03 <mattoliver> cool, one day we'll get through your swift-reload chain :P
21:35:07 <timburke> and speaking of server process management, i should probably also revisit p 789035 ...
21:35:07 <patch-bot> https://review.opendev.org/c/openstack/swift/+/789035 - swift - wsgi: Reap stale workers (after a timeout) followi... - 19 patch sets
21:35:34 <timburke> but there's always so much to review!
21:35:38 <timburke> on that note...
21:35:42 <timburke> #topic feature/mpu
21:35:51 <timburke> acoles has been hard at work!
21:36:31 <timburke> i'm pretty sure the majority of landed changes these past two weeks have been to feature/mpu
21:37:30 <timburke> that said, i'm sure he'd appreciate some reviews on the other ones that are still open: https://review.opendev.org/q/project:openstack/swift+branch:feature/mpu+is:open
21:39:19 <timburke> ...but i think that's about as far as i've gotten in terms of catching up ;-)
21:39:25 <timburke> #topic open discussion
21:39:35 <timburke> anything else we should talk about this week?
21:40:02 <mattoliver> I've been playing with 2 things
21:40:19 <mattoliver> p 527296 was revived
21:40:20 <patch-bot> https://review.opendev.org/c/openstack/swift/+/527296 - swift - Quarantine asyncs older then reclaim_age - 4 patch sets
21:40:36 <mattoliver> And I've changed it to quarantine rather than unlink.
21:41:15 <zaitcev> It's like there's no win.
21:41:32 <zaitcev> Except keep the cluster health up so that asyncs aren't that old.
21:41:48 <mattoliver> Also finally managed to get the reaper to properly use an internal client. Writing a probe test led me to find areas where internal_clients would fail because x-backend-override-deleted isn't passed in get_info requests.
21:42:39 <timburke> at least now we've got p 931979 !
21:42:39 <patch-bot> https://review.opendev.org/c/openstack/swift/+/931979 - swift - Add oldest failed async pending tracker (MERGED) - 33 patch sets
21:43:00 <mattoliver> zaitcev: yeah true. and it stops the updater from continually picking them up and using them forever
21:43:32 <mattoliver> probably why it sat there for 6 years :P
21:44:53 <mattoliver> here is the shard-aware reaper using internal_client: p 925188
21:44:54 <patch-bot> https://review.opendev.org/c/openstack/swift/+/925188 - swift - WIP: reaper: use internal client - 5 patch sets
21:45:25 <mattoliver> Still need to clean it up, as there was a bit of live hacking/debugging while diagnosing why the probe test kept failing.
21:46:05 <mattoliver> but ended up being (as I said) get_infos and better account-server override-delete support.
21:46:24 <mattoliver> turns out direct client made certain things much easier.
21:47:01 <mattoliver> That patch also adds a 'legecy_mode' to the reaper, for a possible upgrade path.. but that's still in debate.. so still wip
21:47:17 <mattoliver> that's all I have
21:48:31 <jianjian> good work, Matt.
21:48:59 <jianjian> yeah, zuul is not happy yet with the reaper patch
21:50:58 <mattoliver> yeah, I saw that. Only got the probe test working late last night.. so now I'll double-check everything else with a fresh brain :)
21:51:27 <timburke> think it's just that unit tests need updating
21:51:30 <jianjian> for the "Quarantine asyncs older then reclaim_age" patch, is there a valid case that we are going to need that async pending even after reclaim_age?
21:52:22 <timburke> i think the idea is that the operator probably ought to dig in some and figure out what the listing is supposed to look like
21:52:58 <jianjian> yeah, I'm wondering if there's a case where the uploaded object is still there but the container listing still doesn't have it
21:54:15 <jianjian> will think more and look into it
21:54:46 <mattoliver> Clay liked the idea of trying again and only quarantining if it failed
21:55:27 <timburke> with multiple replicas trying to get the update in, i don't think dark data tends to be the issue. the trouble is usually the opposite: ghost listings, where the data's been cleaned up but container still has the listing
21:55:30 <mattoliver> if it partially succeeds (managed to update a replica) then keep it around, knowing it still might get quarantined next time round.
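A hypothetical sketch of the retry-then-quarantine policy discussed above. Past reclaim_age, the async is still attempted. It is quarantined only if no container replica accepted it, and kept if at least one did. The function name, arguments and return values are illustrative and are not taken from p 527296.

```python
def async_disposition(successes, replicas, age, reclaim_age):
    """Decide what to do with an async pending after one update attempt.

    successes: number of container replicas that accepted the update this pass
    replicas:  total number of container replicas
    age, reclaim_age: seconds
    """
    if successes == replicas:
        # Fully delivered; assumed to match the updater's usual unlink.
        return "unlink"
    if age > reclaim_age and successes == 0:
        # Old and still undeliverable anywhere: set aside for an operator.
        return "quarantine"
    # Young, or partially delivered: retry on a later pass. A partially
    # delivered async may still be quarantined next time round.
    return "keep"


if __name__ == "__main__":
    week = 7 * 86400
    print(async_disposition(3, 3, age=10 * week, reclaim_age=week))
    print(async_disposition(0, 3, age=10 * week, reclaim_age=week))
    print(async_disposition(1, 3, age=10 * week, reclaim_age=week))
```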
21:56:04 <mattoliver> yeah, ghost listings is really what I meant
21:56:54 <jianjian> good point, the chance of ghost listing is bigger
21:58:12 <timburke> even if we've got an async for the delete, once we get it to the container, it'll delete the row entirely, so another replica can just send the ghost back again :-(
21:59:11 <timburke> all right, we're about at time
21:59:22 <mattoliver> does it delete entirely or put a tombstone (mark deleted=1) in the db?
21:59:22 <timburke> thank you all for coming, and thanks for working on swift!
21:59:37 <timburke> mattoliver, well, if it's past reclaim age...
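A toy model of how reclaiming tombstones can bring ghost listings back. Rows merge with the newest timestamp winning, and reclaim drops tombstones (deleted=1 rows) older than reclaim_age. Once the tombstone is gone, a stale replica's live row wins again. The dict-based rows and the `merge`/`reclaim`/`listing` helpers are simplifications assumed for illustration; they are not Swift's container-DB replication code.

```python
def merge(local, remote):
    """Copy rows from remote into local; the newer timestamp wins per name."""
    for name, row in remote.items():
        if name not in local or row["ts"] > local[name]["ts"]:
            local[name] = dict(row)


def reclaim(db, now, reclaim_age):
    """Drop tombstones (deleted rows) older than reclaim_age."""
    expired = [name for name, row in db.items()
               if row["deleted"] and now - row["ts"] > reclaim_age]
    for name in expired:
        del db[name]


def listing(db):
    """Names a container listing would show (tombstones are hidden)."""
    return sorted(name for name, row in db.items() if not row["deleted"])


if __name__ == "__main__":
    reclaim_age = 7 * 86400  # one week, Swift's default reclaim_age
    # Replica A never saw the delete; replica B did.
    replica_a = {"obj": {"ts": 100, "deleted": False}}
    replica_b = {"obj": {"ts": 200, "deleted": True}}

    merge(replica_b, replica_a)  # tombstone is newer, so it wins
    print(listing(replica_b))    # []

    reclaim(replica_b, now=200 + reclaim_age + 1, reclaim_age=reclaim_age)
    merge(replica_b, replica_a)  # no tombstone left to outrank A's row
    print(listing(replica_b))    # ['obj']: the ghost listing is back
```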
21:59:49 <timburke> #endmeeting