21:00:30 #startmeeting swift
21:00:30 Meeting started Wed Feb 23 21:00:30 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 The meeting name has been set to 'swift'
21:00:36 who's here for the swift meeting?
21:00:58 o/
21:01:13 o/
21:02:19 o/
21:02:34 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:54 (though i've forgotten to update it :P)
21:03:01 #topic PTG
21:03:17 quick reminder to fill out the doodle poll to pick meeting times
21:03:24 #link https://doodle.com/poll/qs2pysgyb8nb36c2
21:03:34 oh ok. will do soon
21:04:02 i'll get an etherpad up to collect development topics, too
21:04:53 #topic priority reviews
21:05:00 i updated the page at https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:05:32 mostly to call out some patches i know we're running in prod
21:07:17 some seem about ready to go -- expirer: Only try to delete empty containers (https://review.opendev.org/c/openstack/swift/+/825883) did just what we hoped it would, and we saw a precipitous drop in container deletes and listing shard range cache misses
21:08:09 yes that was a great improvement
21:08:28 others had somewhat more mixed results -- container-server: plumb includes down into _get_shard_range_rows (https://review.opendev.org/c/openstack/swift/+/569847) *maybe* had some impact on updater timings, but it was hard to say definitively
21:08:53 there was one that i wanted to check in on in particular
21:08:55 #link https://review.opendev.org/c/openstack/swift/+/809969
21:09:01 Sharding: a remote SR without an epoch can't replicate over one with an epoch
21:09:58 mattoliver, am i remembering right that the idea was to get the no-epoch SR to stick around so we could hunt down how it happened?
21:10:03 That stops the reset, but I think it currently locks the problem to the problem node.
21:10:24 But if that problem node is a handoff then it might be fine.
21:10:42 Interestingly, we haven't seen the problem again since we started running it.
21:11:21 what do we think about merging it sooner rather than later, and calling the problem fixed until we get new information?
21:11:58 Yeah, kk, it does log when there is an issue, so it'll let people know.
21:13:43 might be worth adding broker.db_path to the warning?
21:14:56 oh yeah, good idea.
21:15:03 all right, that's about all i've got then
21:15:11 #topic open discussion
21:15:15 I haven't looked at the patch so will look today
21:15:22 what else should we bring up this week?
21:16:27 I added handoff_delete to the db replicators https://review.opendev.org/c/openstack/swift/+/828637
21:16:53 which helps when needing to drain and gets them closer to on par with the obj replicator
21:18:23 Also been playing with concurrent container object puts to the same container, trying to understand the problems involved and attempting to improve things some more.
21:19:02 nice! along the same lines, i wrote up https://review.opendev.org/c/openstack/swift/+/830535 to clean up part dirs more quickly when you're rebalancing DBs
21:19:25 cool
21:19:49 In initial testing, moving the container directory lock, sharding out the pending file, and locking the pending file you're updating seems really promising. Getting far fewer directory lock timeouts
21:20:58 Just improves concurrent access to the server. So helps when running multiple workers
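
A minimal sketch of the pending-file sharding idea described above: append each incoming update to one of several per-database pending files and lock only that file, instead of taking the container directory lock. This is not the code from the WIP patch; the shard count, file-naming scheme, and use of flock are assumptions made for the example, though the colon-delimited base64 pickle encoding mirrors the existing single pending file.

    import base64
    import fcntl
    import pickle
    import random

    PENDING_SHARDS = 4  # hypothetical number of pending-file shards


    def append_pending_record(db_path, record):
        """Append one pickled update to a randomly chosen pending-file shard."""
        shard = random.randrange(PENDING_SHARDS)
        pending_file = '%s.pending.%d' % (db_path, shard)
        # Records are colon-delimited, base64-encoded pickles, mirroring the
        # existing single pending-file format.
        encoded = b':' + base64.b64encode(pickle.dumps(record, protocol=2))
        with open(pending_file, 'a+b') as fp:
            # Lock only this shard; concurrent PUTs hitting other shards of
            # the same container are not serialized behind this one.
            fcntl.flock(fp, fcntl.LOCK_EX)
            try:
                fp.write(encoded)
                fp.flush()
            finally:
                fcntl.flock(fp, fcntl.LOCK_UN)
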
21:21:51 current POC WIP is https://review.opendev.org/c/openstack/swift/+/830551
21:22:10 yeah, that looked promising -- anything to get a few more reqs/s out of the container-server
21:22:19 That still has debugging and q statements in it. Just wanted to get it backed up off my laptop.
21:22:24 +1
21:24:17 one thing i'm still curious about is what the curve looks like for number of container-server workers vs. max concurrent requests before clients start hitting timeouts
21:25:05 yeah, on my VSAIO it won't be as high as a real server :P
21:26:21 still, hopefully the curve would still look somewhat similar -- start off at some level, and as you add a *ton* of workers it drops pretty low because of all the contention -- but what happens in the middle?
21:26:33 i feel like that may push us toward something like a servers-per-port strategy
21:26:43 yup, can have a play.
21:27:28 currently I'm randomly choosing a pending file shard when a put comes in. I wonder if I could just have a shard per worker, or maybe it's shards per worker.
21:27:42 some of the timeouts could also be due to the randomness of choosing a shard.
21:28:25 mattoliver: are you no longer locking the parent directory when appending to the pending file?
21:29:01 nope, not unless it's a _commit_puts and we actually update the DB
21:29:15 but not sure what effect that has on other things like replication yet
21:29:49 but I do lock the pending file being updated so we don't lose pending data.
21:30:10 but not locking the pending file when flushing it?
21:30:28 does the parent dir lock also take a lock on all the pending files?
21:30:31 I do lock it then too, because we use a truncate on it
21:31:18 yeah, i'd imagine you'd want to lock all the pending files (and the parent dir) when flushing
21:31:31 OIC, down in commit_puts
21:31:52 but I take a lock on a pending file while flushing it, and only while dealing with that one, so a concurrent put could go use it again.
21:32:04 timburke: yup
21:32:11 nice
21:32:13 if anyone has some spare time to think about a client-facing api change, i've got some users that'd appreciate something like https://review.opendev.org/c/openstack/swift/+/829605 - container: Add delimiter-depth query param
21:33:36 I was wondering if it would be possible to direct updates to a pending file that isn't being flushed?
21:34:07 oh interesting!
21:34:13 that'd be fancy! do it as a ring ;-)
21:34:19 e.g. if the pending files could be pinned to workers
21:34:46 or some kind of rotation
21:35:12 I like it!
21:35:44 maybe just try 'em all til you get a lock, a bit like how we do multiple lock files
21:36:15 yeah, can borrow that code as a start at least :)
21:36:53 also like the ring-like approach.
21:37:08 Will have a play. thanks for the awesome ideas
21:38:18 all right, i think i'll call it
21:38:30 thank you all for coming, and thank you for working on swift!
21:38:34 #endmeeting
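
A similarly rough sketch of the "try 'em all til you get a lock" idea from the end of the discussion, loosely modeled on Swift's multiple-lock-file pattern. It is not from any proposed patch; PENDING_SHARDS and the naming scheme are the same assumptions as in the earlier sketch.

    import errno
    import fcntl

    PENDING_SHARDS = 4  # hypothetical shard count, matching the earlier sketch


    def open_free_pending_shard(db_path):
        """Return an open, exclusively locked pending-file shard.

        Tries a non-blocking lock on each shard in turn; if every shard is
        contended, blocks on the first one. Closing the returned file
        releases the lock.
        """
        first_fp = None
        for shard in range(PENDING_SHARDS):
            fp = open('%s.pending.%d' % (db_path, shard), 'a+b')
            try:
                fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as err:
                if err.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                    fp.close()
                    raise
                if shard == 0:
                    first_fp = fp  # keep open in case we have to block on it
                else:
                    fp.close()
                continue
            if first_fp is not None:
                first_fp.close()
            return fp
        # Every shard was busy; wait our turn on the first one.
        fcntl.flock(first_fp, fcntl.LOCK_EX)
        return first_fp

Pinning a shard to each worker, or rotating through shards so that updates avoid the file currently being flushed, would replace this linear scan with a deterministic choice, as suggested in the 21:34 exchange.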