21:00:57 #startmeeting swift
21:00:57 Meeting started Wed Oct 2 21:00:57 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:57 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:57 The meeting name has been set to 'swift'
21:01:05 who's here for the swift meeting?
21:01:18 o/
21:01:20 o/
21:01:50 i didn't get around to updating the agenda (sorry)
21:02:06 but first up is still the PTG!
21:02:10 #topic PTG
21:02:27 o/
21:02:37 it's not far off now -- Oct 21-25
21:02:39 (not a part of the team, but thought I should make an appearance!)
21:04:54 i'm never too particular about who's "part of the team" -- way i see it, if you want to help make swift better, you're always welcome :-)
21:05:41 great :D
21:05:53 i still need to book meeting slots for the ptg -- filling out the poll at https://framadate.org/LQOsGVVWXDhXqQUw would be appreciated
21:06:32 though i also know there's a strong US contingent that hasn't responded yet :-/ i poked them internally earlier today
21:07:08 we've also started gathering topics to discuss at
21:07:10 #link https://etherpad.opendev.org/p/swift-ptg-epoxy
21:09:26 i think i heard that cschwede might be able to make it, too -- it'd be nice to catch up a bit :-)
21:10:10 but i think that's about all i've got to say about the ptg
21:10:12 next up
21:10:20 #topic feature/mpu
21:10:31 acoles is on a roll lately!
21:10:54 acoles has been on a streak!
21:11:01 i'm sure he'd appreciate more feedback if anyone has some spare review cycles
21:11:10 Trying to find some time to review these
21:11:26 #link https://review.opendev.org/q/project:openstack/swift+branch:feature/mpu+is:open
21:11:52 to get a sense of what's currently in progress
21:14:05 next up
21:14:17 #topic object expirer
21:14:42 i know clayg and jianjian (and to some degree, acoles) have been thinking about this lately
21:14:59 o/
21:17:54 Do we mean the object expirer for orphans in this case?
Trying to get the context
21:18:08 so we have a large backlog of work in the queue, and recently updated our task-containers-per-day from 100 to 10000
21:18:33 the 100 is currently a hard-coded constant buried down in utils
21:20:32 we previously ran into issues with all expirers wanting to list all tasks in every container -- with an expirer running on each object-server, we were doing so many listings that the container servers would get overloaded
21:22:45 so we put up a patch to try to divide things such that any particular expirer could skip an entire container, then take more work from the containers that it *did* bother to list
21:24:54 in the extreme, each expirer could find the one and only container it actually cared about and work through the whole thing -- but only if we remove that hard-coded 100-containers-per-day limit
21:25:39 trouble was, even after doing that, we have a bunch of work still in the old queues, so what would happen was that we'd get 100 really busy expirers, and 1000+ mostly-idle ones
21:26:24 yeah, Tim summarized the situation very well, thanks
21:27:11 and worse, a bunch of the old entries were in fact stale -- the object had been overwritten, or updated with a new x-delete-at -- and the attempt to clean up the queue went to the new 1-in-10000 container, rather than the old 1-in-100
21:27:42 so we're also looking at what to do with the mass of 412s you get when X-If-Delete-At doesn't match
21:28:01 we think we've found a couple viable paths forward
21:28:21 first is to try to migrate the old queue entries to the new, expected location
21:28:55 yes, thanks Tim for the suggestions for fixing the 412s. I am working on a patch that tries to remove those old delete tasks from the queue as soon as we see them
21:29:21 second is to try to figure out when a 412 happens because the metadata it was based upon was stale, so we can delete them rather than retry forever
21:29:35 also, there is one patch under review as well, to have the expirer deal with all the different listing errors, 503, 404: If, instead of this remark, my father had taken the pains to explain to me that the principles of Agrippa had been entirely exploded, and that a modern system of science had been introduced, I should certainly have thrown Agrippa aside, but the cursory glance my father had taken of my volume by no means
21:29:35 assured me that he was acquainted with its contents; and I continued to read with the greatest avidity.
21:30:10 sorry, that was my son 😞
21:30:35 also, there is one patch under review as well, to have the expirer deal with all the different listing errors, 503, 404: https://review.opendev.org/c/openstack/swift/+/930393
21:30:35 patch 930393 - swift - Object-expirer: continue to process next container... - 4 patch sets
21:30:39 jianjian, you reading frankenstein? i recently picked that up again -- been a while :-)
21:32:11 i was just looking for another quote earlier today: "It was my temper to avoid a crowd, and to attach myself fervently to a few."
21:32:25 my son is working on his homework, yes, it is frankenstein. And I accidentally copied his copy-and-paste, haha, sorry again.
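Backing up to the queue-sharding discussion earlier in this topic: below is a minimal sketch of the idea described there, in which each task entry is routed to one of N per-day task containers in the expirer queue. This is an illustration only, not the exact helper "buried down in utils"; the function name, the md5-based shard, and the exact container-naming scheme here are assumptions.

```python
# Illustrative sketch only -- not Swift's exact implementation.
# Idea from the discussion above: shard each expiration task onto one of
# task_containers_per_day container names for the day it expires, so raising
# the hard-coded 100 to 10000 spreads listing load across many more containers.
from hashlib import md5

DAY = 86400  # seconds per day


def task_container_for(x_delete_at, account, container, obj,
                       task_containers_per_day=100):
    # Hash the object path so tasks expiring on the same day are spread
    # evenly across the available task containers.
    path = '/%s/%s/%s' % (account, container, obj)
    shard = int(md5(path.encode('utf-8')).hexdigest(), 16) \
        % task_containers_per_day
    # Round the expiry down to the day, then offset by the shard so each
    # shard lands in its own container (named by that timestamp).
    day_start = int(x_delete_at) // DAY * DAY
    return str(day_start - shard)


# e.g. task_container_for(1729555200, 'AUTH_test', 'c', 'o') -> '1729555151'
```

With 10000 task containers per day, each expirer process could claim whole containers for itself instead of every expirer listing every container, which is the container-server overload problem described at 21:20:32; the trade-off, as noted, is that entries queued under the old 1-in-100 scheme land in different containers than a cleanup based on the new divisor would look in.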
21:33:05 lol
21:33:54 some other interesting patches in this vein:
21:33:56 #link https://review.opendev.org/c/openstack/swift/+/920452
21:33:57 patch 920452 - swift - Configurable expiring_objects_task_container_per_day - 19 patch sets
21:34:04 #link https://review.opendev.org/c/openstack/swift/+/918366
21:34:04 Tim is a great coder plus reader 😉
21:34:05 patch 918366 - swift - Parallel distributed task container iteration - 22 patch sets
21:35:58 #link https://review.opendev.org/c/openstack/swift/+/914713
21:35:58 patch 914713 - swift - DNM expirer: new options to control task iteration - 36 patch sets
21:36:46 i would also appreciate a review on
21:36:46 #link https://review.opendev.org/c/openstack/swift/+/916547
21:36:46 patch 916547 - swift - fix x-open-expired 404 on HEAD?part-number reqs - 43 patch sets
21:37:29 those two patches introduce a new random strategy and a parallel strategy to distribute the expirer listing work evenly across every expirer; we'll probably ditch random and land only parallel upstream if testing of parallel goes well in the end.
21:37:45 i feel like we ought to structure these a little more so we could actually review/merge some of them -- i don't particularly like how that DNM patch is ahead of the other two -- makes me worry none of it's actually close to being able to merge
21:37:56 my bad, msg got prematurely sent
21:38:05 ok, good to know jianjian
21:38:14 indianwhocodes, no worries :-)
21:39:05 i think i'm running out of things to bring up anyway
21:39:09 #topic open discussion
21:39:16 what else should we discuss this week?
21:39:51 So, something came up around here, wanted to get your input on it
21:40:49 The demand is to delete a container/bucket with items inside of it, without having to list all objects in the container -> delete objects in batches -> delete container
21:41:14 does it need to be user(/client) facing?
21:41:16 Isn't that what bulk does?
21:41:18 we do get people grumpy about that, and have scripts to work around it
21:41:39 Was thinking of doing something like what the object expirer does, but for buckets: `X-Delete-At` and "markers"
21:41:41 o/ I have an item for open discussion whenever you're at the end of this thread
21:41:51 timburke yes we'd like it client facing
21:41:59 It has both bulk upload and bulk delete. Still not a single call but at least better than the full thing.
21:41:59 a while back i wrote up https://github.com/openstack/swift/blob/master/swift/cli/container_deleter.py but it's intended as an operator tool
21:42:35 zaitcev kind of. In this case we're deleting buckets with about 100 million objects, so it takes quite a long time to delete all the objects through client scripting
21:42:54 Some bucket deletions are taking north of 7 days running non-stop
21:44:06 timburke nice to know, will certainly take a look. Thanks for the help
21:44:23 rwmtinkywinky4 mind sharing how you solved this?
21:44:32 fulecorafa: I'll see if I can share some of our deletion scripts
21:44:34 fulecorafa, so one way or another, it's going to take a long time to fully process. what should happen with the bucket in the meantime? do we want this to be something like an account deletion, where the container still exists but client attempts to access it get back a 410 or the like?
21:45:30 timburke yes I'm aware. But at this point, the important thing is to make it appear to the client as if it's already been deleted
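For the client-side scripting mentioned above (the slow workaround this discussion wants to replace), a rough sketch of the usual "list a page, bulk-delete it, repeat" loop, assuming the bulk middleware is enabled in the proxy pipeline. The storage URL, token, and container name are placeholders, not values from the meeting.

```python
# Hedged sketch of batched deletion via the bulk middleware's ?bulk-delete
# endpoint: one POST per page of up to 10000 object names. Endpoint, token,
# and container name below are placeholders.
import requests
from urllib.parse import quote

STORAGE_URL = 'https://swift.example.com/v1/AUTH_test'   # placeholder
TOKEN = 'gAAAA...'                                        # placeholder
CONTAINER = 'my_bucket'                                   # placeholder
PAGE = 10000  # default container listing limit / bulk deletes per request


def drain_container(session=None):
    session = session or requests.Session()
    marker = ''
    while True:
        # Fetch the next page of object names from the container listing.
        resp = session.get('%s/%s' % (STORAGE_URL, CONTAINER),
                           params={'format': 'json',
                                   'marker': marker,
                                   'limit': PAGE},
                           headers={'X-Auth-Token': TOKEN})
        resp.raise_for_status()
        names = [o['name'] for o in resp.json()]
        if not names:
            break
        # One bulk-delete per page: newline-separated /container/object paths.
        body = '\n'.join('/%s/%s' % (quote(CONTAINER), quote(name))
                         for name in names)
        session.post('%s?bulk-delete' % STORAGE_URL,
                     data=body.encode('utf-8'),
                     headers={'X-Auth-Token': TOKEN,
                              'Content-Type': 'text/plain',
                              'Accept': 'application/json'}
                     ).raise_for_status()
        marker = names[-1]
```

At ~100 million objects that is still on the order of 10,000 listing/bulk-delete round trips, which is why a jobbed-off, reaper-style delete (discussed next) looks attractive.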
21:45:47 i should go back through some of our old etherpads -- i'm sure i remember people talking about making a container-reaper before
21:46:08 From what I understand, that's what happens in the object expirer, right? It may still be there, not deleted, but since it is marked with `X-Delete-At` it is not listed to the user/client
21:46:09 fulecorafa, what should happen if they attempt to recreate it?
21:46:38 yup -- objects appear in listings until the expirer actually processes them
21:47:13 That's a good question, but I reckon it should just fail with a BucketAlreadyExists
21:48:01 haven't tried it or read the code, but can we rename a container?
21:48:32 jianjian, nope -- gotta copy all the data, then delete the old locations
21:48:37 The name feeds into the DHT directly for both containers and objects, so I highly doubt it's possible.
21:48:45 There's no inode.
21:48:51 bingo
21:48:55 jianjian I think not. The existing "move" is just a copy then a delete
21:49:08 yeah, it's the ring.
21:49:47 One thing to notice is that we have a workaround to share the bucket namespace between all accounts
21:50:02 So user_b cannot create my_bucket if user_a/my_bucket exists
21:50:40 fulecorafa, the jobbed-off delete seems reasonable enough -- i could definitely imagine a way to make that work. might even be able to piggy-back off the existing account-reaper -- have it start looking at both account and container dbs. it'd be a bit of finicky plumbing to get all the behaviors right, i think
21:51:13 fulecorafa, the global bucket namespace sounds super cool! i keep meaning to think more about that, but haven't
21:51:46 Yeah, in our case we just keep a separate DB, not implemented in swift itself
21:51:56 Sorry about that
21:52:04 yeah, that seems like the way to do it :-)
21:52:37 should be low enough write traffic that it wouldn't be too hard to manage
21:52:49 Yep
21:53:11 I do think it's finicky as well. But I've taken enough of the time; I think JayF wanted to chat?
21:53:23 Thanks for the help guys!
21:53:30 Mine is easy and simple: just wanted to draw attention to https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/XJUTKK5ZVWEO6IFHVQ3YSH6S5HQE6NEQ/
21:53:44 and request on behalf of the VMT that you audit your security contacts in case we need to contact someone about a security vuln
21:53:55 we've discovered many teams had hideously out-of-date groups so I'm asking folks to check
21:54:15 that's all but I'm happy to answer questions if you've got 'em o/
21:55:36 JayF, thanks for the heads-up! there are definitely some folks listed that i wouldn't expect to be terribly active in resolving any security issues, but all the people that should be there are
21:56:17 alright, I'd suggest removing those folks so they aren't accidentally made privy to secrets they don't need to know
21:56:21 but thanks for checking :D
21:57:37 i probably ought to add a topic for the PTG: "review core membership"
21:58:18 Yeah. I've suggested in the Ironic community that we more aggressively add cores. It's easy to revert an accidentally-merged change; it's hard to onboard someone if the core team dwindles too low.
21:59:29 JayF, so the coresec members are a subset of the overall cores, i take it?
21:59:56 I don't think it has a universal meaning other than "the first place the VMT looks for humans to review security issues if they don't have a security liaison"
22:00:09 and given our old security liaison tracking page is in the wiki and is about 5 years outta date
22:00:17 you can see why I take the approach I am :D
22:01:07 i was thinking more specifically about "what does ironic do?" and "is that maybe what *we* should do?"
22:02:41 i suppose whoever's listed there can specifically bring in other subject-matter experts if/as needed for a given security bug
22:03:19 all right, we're a bit past time
22:03:24 thank you all for coming, and thank you for working on swift!
22:03:26 #endmeeting