21:01:00 <timburke> #startmeeting swift
21:01:00 <opendevmeet> Meeting started Wed Dec 4 21:01:00 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:00 <opendevmeet> The meeting name has been set to 'swift'
21:01:08 <timburke> who's here for the swift meeting?
21:01:17 <mattoliver> o/
21:01:46 <nathang15> #help
21:01:50 <nathang15> me
21:02:06 <timburke> i haven't updated the agenda; still catching up a bit on what i've missed the past couple weeks
21:02:49 <timburke> but i think i can pull a few topics out of that catch-up ;-)
21:02:58 <timburke> #topic failing cors tests
21:03:29 <timburke> a new eventlet release broke one of our test jobs
21:04:40 <timburke> it was a fairly esoteric sort of a break; the content-type of HEAD requests against accounts and containers changed with eventlet 0.38.0
21:05:16 <timburke> clayg spearheaded the fix in p 935817
21:05:17 <patch-bot> https://review.opendev.org/c/openstack/swift/+/935817 - swift - Ensure correct content-type in container HEAD resp... (MERGED) - 6 patch sets
21:05:21 <opendevreview> Matthew Oliver proposed openstack/swift master: Quarantine asyncs older then reclaim_age https://review.opendev.org/c/openstack/swift/+/527296
21:06:06 <mattoliver> yup, that was fun, we all learnt how much work timburke does for this project!
21:06:23 <timburke> and i've proposed backports of that to the stable branches.
i'm a little torn about the `Content-Length: 0` that's getting inserted, though
21:07:51 <timburke> it revealed a broader problem with our CI, though: many/most of our jobs run with a weird mishmash of dependency versions
21:08:53 <timburke> many get pinned by upper-constraints from the requirements repo, but direct dependencies are not (which meant that eventlet was getting updated despite the global u-c having 0.36.1)
21:09:42 <timburke> *that* should get fixed by p 936872 (which also has a comment going into what's going on there a little more)
21:09:42 <patch-bot> https://review.opendev.org/c/openstack/swift/+/936872 - swift - CI: drop pip --upgrade flag in tox.ini - 1 patch set
21:09:47 <mattoliver> ahh, ok, that's annoying
21:10:20 <timburke> or rather, that addresses the problem for many, but not all, jobs
21:11:04 <timburke> there's another comment on that review that calls out the other jobs that i saw pulling in latest eventlet
21:11:06 <mattoliver> ok so the -U would trump the constraints.. wow.
21:12:45 <timburke> mattoliver, kind of! only for direct dependencies, which is probably why it hasn't bitten us much before: we actively try to keep that list small
21:14:04 <timburke> earlier today i also pushed up p 937045 to add constraints to a job that was missing them
21:14:04 <patch-bot> https://review.opendev.org/c/openstack/swift/+/937045 - swift - CI: Use constraints for api-ref builds - 1 patch set
21:14:52 <mattoliver> ok, still unexpected behaviour can cause us to shoot ourselves in the foot unknowingly. nice find.
21:14:59 <timburke> and i'm working on bringing our py3-constraints.txt file more in line with the requirements u-c version
21:15:07 <mattoliver> at least now we know we can now work with a later eventlet :P
21:15:31 <timburke> next up after that, we'll want to
21:16:18 <timburke> (1) use constraints for all the *other* places we run `pip install` (down in ansible playbooks)
21:16:52 <timburke> and (2) add some non-voting jobs that intentionally run unconstrained
21:17:19 <timburke> i'm thinking unit and func-encryption tests should be sufficient, probably on py312
21:17:39 <mattoliver> kk, so api-ref I feel less worried about but agree that consistency is much better, rather than problems like this biting us on the butt again.
21:17:46 <mattoliver> Sounds like a solid plan.
21:18:03 <mattoliver> non-voting to let us know if there are problems afoot but not gate blockers!
21:20:33 <timburke> #topic statsd-logging coupling
21:20:41 <timburke> looks like p 931473 finally merged!
21:20:42 <patch-bot> https://review.opendev.org/c/openstack/swift/+/931473 - swift - Remove statds from the logs module (MERGED) - 27 patch sets
21:21:31 <timburke> great work acoles and shreeya, and thanks for reviewing clayg
21:21:43 <timburke> i know there was a decent bit of back-and-forth on it
21:22:35 <timburke> not just on that patch, but p 915483 as well
21:22:36 <patch-bot> https://review.opendev.org/c/openstack/swift/+/915483 - swift - Split statsd client from logger (ABANDONED) - 52 patch sets
21:23:49 <mattoliver> 52 patchsets and abandoned. yeah
21:25:23 <timburke> now that that's landed...
21:25:32 <timburke> #topic labeled metrics
21:26:14 <timburke> it looks like yan has rebased the chain: p 909882, p 917711, p 930918
21:26:14 <patch-bot> https://review.opendev.org/c/openstack/swift/+/909882 - swift - stats: API for native labeled metrics - 39 patch sets
21:26:14 <patch-bot> https://review.opendev.org/c/openstack/swift/+/917711 - swift - Add labeled metrics to proxy-logging - 23 patch sets
21:26:16 <patch-bot> https://review.opendev.org/c/openstack/swift/+/930918 - swift - proxy-logging: Add real-time transfer bytes counters - 10 patch sets
21:27:29 <mattoliver> nice, one step closer to having labelled metrics support in swift
21:27:31 <timburke> we should probably try to focus on that next so it doesn't get stalled out again (i think i last reviewed it back in July 😱)
21:28:16 <mattoliver> we have been running these downstream too right, so they're also getting some live testing
21:29:37 <timburke> speaking of code already running downstream...
21:29:55 <timburke> #topic swift-reload robustness
21:30:10 <timburke> i finally got around to addressing some of clayg's concerns on p 837641
21:30:10 <patch-bot> https://review.opendev.org/c/openstack/swift/+/837641 - swift - Add abstract sockets for process notifications - 13 patch sets
21:31:11 <mattoliver> oh cool, I'll take a look
21:31:19 <timburke> the major thing was to rip out the ancdata processing to verify that messages were coming from the expected pid; it's unlikely that anything else would be using the pid-specific address
21:32:08 <mattoliver> kk
21:32:10 <timburke> which also gave me an excuse to dust off p 900957
21:32:10 <patch-bot> https://review.opendev.org/c/openstack/swift/+/900957 - swift - wsgi: Avoid the need for a separate socket-closer ... - 2 patch sets
21:35:03 <mattoliver> cool, one day we'll get through your swift-reload chain :P
21:35:07 <timburke> and speaking of server process management, i should probably also revisit p 789035 ...
21:35:07 <patch-bot> https://review.opendev.org/c/openstack/swift/+/789035 - swift - wsgi: Reap stale workers (after a timeout) followi... - 19 patch sets
21:35:34 <timburke> but there's always so much to review!
21:35:38 <timburke> on that note...
21:35:42 <timburke> #topic feature/mpu
21:35:51 <timburke> acoles has been hard at work!
21:36:31 <timburke> i'm pretty sure the majority of landed changes these past two weeks have been to feature/mpu
21:37:30 <timburke> that said, i'm sure he'd appreciate some reviews on the other ones that are still open: https://review.opendev.org/q/project:openstack/swift+branch:feature/mpu+is:open
21:39:19 <timburke> ...but i think that's about as far as i've gotten in terms of catching up ;-)
21:39:25 <timburke> #topic open discussion
21:39:35 <timburke> anything else we should talk about this week?
21:40:02 <mattoliver> I've been playing with 2 things
21:40:19 <mattoliver> p 527296 was revived
21:40:20 <patch-bot> https://review.opendev.org/c/openstack/swift/+/527296 - swift - Quarantine asyncs older then reclaim_age - 4 patch sets
21:40:36 <mattoliver> And I've changed it to quarantine rather than unlink.
21:41:15 <zaitcev> It's like there's no win.
21:41:32 <zaitcev> Except keep the cluster health up so that asyncs aren't that old.
21:41:48 <mattoliver> Also finally managed to get the reaper to properly use an internal client. Writing a probe test led me to find areas where internal_clients would fail because x-backend-override-deleted isn't passed in get_info requests.
21:42:39 <timburke> at least now we've got p 931979 !
21:42:39 <patch-bot> https://review.opendev.org/c/openstack/swift/+/931979 - swift - Add oldest failed async pending tracker (MERGED) - 33 patch sets
21:43:00 <mattoliver> zaitcev: yeah true.
and stops the updater from continually picking them up and using them forever
21:43:32 <mattoliver> probably why it sat there for 6 years :P
21:44:53 <mattoliver> here is the shard-aware reaper using internal_client: p 925188
21:44:54 <patch-bot> https://review.opendev.org/c/openstack/swift/+/925188 - swift - WIP: reaper: use internal client - 5 patch sets
21:45:25 <mattoliver> Still need to clean it up. As there was a bit of live hacking/debugging while diagnosing why the probetest kept failing.
21:46:05 <mattoliver> but ended up being (as I said) get_infos and better account-server override-delete support.
21:46:24 <mattoliver> turns out direct client made certain things much easier.
21:47:01 <mattoliver> That patch also adds a 'legecy_mode' to the reaper, for a possible upgrade path.. but that's still in debate.. so still wip
21:47:17 <mattoliver> that's all I have
21:48:31 <jianjian> good work, Matt.
21:48:59 <jianjian> yeah, zuul is not happy yet with the reaper patch
21:50:58 <mattoliver> yeah, I saw that. Only got probe test working late last night.. so now I'll double-check everything else with a fresh brain :)
21:51:27 <timburke> think it's just that unit tests need updating
21:51:30 <jianjian> for the "Quarantine asyncs older then reclaim_age" patch, is there a valid case that we are going to need that async pending even after reclaim_age?
21:52:22 <timburke> i think the idea is that the operator probably ought to dig in some and figure out what the listing is supposed to look like
21:52:58 <jianjian> yeah, I am thinking if there is a case like uploaded object is still there, and container listing still doesn't have it
21:54:15 <jianjian> will think more and look into it
21:54:46 <mattoliver> Clay liked the idea to try again and only q if it failed
21:55:27 <timburke> with multiple replicas trying to get the update in, i don't think dark data tends to be the issue.
the trouble is usually the opposite: ghost listings, where the data's been cleaned up but container still has the listing
21:55:30 <mattoliver> if it partially succeeds (managed to update a replica) then keep it around.. but knowing it still might get q next time round.
21:56:04 <mattoliver> yeah, ghost listings is really what I meant
21:56:54 <jianjian> good point, the chance of ghost listing is bigger
21:58:12 <timburke> even if we've got an async for the delete, once we get it to the container, it'll delete the row entirely, so another replica can just send the ghost back again :-(
21:59:11 <timburke> all right, we're about at time
21:59:22 <mattoliver> does it delete entirely or put a tombstone (mark deleted=1) in the db
21:59:22 <timburke> thank you all for coming, and thanks for working on swift!
21:59:37 <timburke> mattoliver, well, if it's past reclaim age...
21:59:49 <timburke> #endmeeting