21:00:30 <timburke> #startmeeting swift
21:00:30 <opendevmeet> Meeting started Wed Feb 23 21:00:30 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:30 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 <opendevmeet> The meeting name has been set to 'swift'
21:00:36 <timburke> who's here for the swift meeting?
21:00:58 <mattoliver> o/
21:01:13 <kota> o/
21:02:19 <acoles> o/
21:02:34 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:54 <timburke> (though i've forgotten to update it :P)
21:03:01 <timburke> #topic PTG
21:03:17 <timburke> quick reminder to fill out the doodle poll to pick meeting times
21:03:24 <timburke> #link https://doodle.com/poll/qs2pysgyb8nb36c2
21:03:34 <kota> oh ok. will do soon
21:04:02 <timburke> i'll get an etherpad up to collect development topics, too
21:04:53 <timburke> #topic priority reviews
21:05:00 <timburke> i updated the page at https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:05:32 <timburke> mostly to call out some patches i know we're running in prod
21:07:17 <timburke> some seem about ready to go -- expirer: Only try to delete empty containers (https://review.opendev.org/c/openstack/swift/+/825883) did just what we hoped it would, and we saw a precipitous drop in container deletes and listing shard range cache misses
21:08:09 <acoles> yes that was a great improvement
21:08:28 <timburke> others had somewhat more mixed results -- container-server: plumb includes down into _get_shard_range_rows (https://review.opendev.org/c/openstack/swift/+/569847) *maybe* had some impact on updater timings, but it was hard to say decidedly
21:08:53 <timburke> there was one that i wanted to check in on in particular
21:08:55 <timburke> #link https://review.opendev.org/c/openstack/swift/+/809969
21:09:01 <timburke> Sharding: a remote SR without an epoch can't replicate over one with an epoch
21:09:58 <timburke> mattoliver, am i remembering right that the idea was to get the no-epoch SR to stick around so we could hunt down how it happened?
21:10:03 <mattoliver> That stops the reset, but I think currently locks the problem to the problem node.
21:10:24 <mattoliver> But if that problem node is a handoff then it might be fine.
21:10:42 <mattoliver> Interestingly, we haven't seen the problem again since we started running it.
21:11:21 <timburke> what do we think about merging it sooner rather than later, and calling the problem fixed until we get new information?
21:11:58 <mattoliver> Yeah, kk, it does log when there is an issue, so it'll let people know.
21:13:43 <acoles> might be worth adding broker.db_path to the warning?
21:14:56 <mattoliver> oh yeah, good idea.
21:15:03 <timburke> all right, that's about all i've got then
21:15:11 <timburke> #topic open discussion
21:15:15 <mattoliver> I haven't looked at the patch so will look today
21:15:22 <timburke> what else should we bring up this week?
21:16:27 <mattoliver> I added handoff_delete to the db replicators https://review.opendev.org/c/openstack/swift/+/828637
21:16:53 <mattoliver> which helps when needing to drain and gets them closer to on par with the obj replicator
21:18:23 <mattoliver> Also been playing with concurrent container object puts to the same container and trying to understand the problems involved and attempting to improve things some more.
21:19:02 <timburke> nice! along the same lines, i wrote up https://review.opendev.org/c/openstack/swift/+/830535 to clean up part dirs more quickly when you're rebalancing DBs
21:19:25 <mattoliver> cool
21:19:49 <mattoliver> In initial testing, moving the container directory lock and sharding out the pending file and locking the pending file you're updating seems really promising. Getting far fewer directory lock timeouts
21:20:58 <mattoliver> Just improves concurrent access to the server. So helps when running multiple workers
21:21:51 <mattoliver> current POC WIP is https://review.opendev.org/c/openstack/swift/+/830551
21:22:10 <timburke> yeah, that looked promising -- anything to get a few more reqs/s out of the container-server
21:22:19 <mattoliver> That still has debugging and q statements in it. Just wanted to get it backed up off my laptop.
21:22:24 <mattoliver> +1
21:24:17 <timburke> one thing i'm still curious about is what the curve looks like for number of container-server workers vs. max concurrent requests before clients start hitting timeouts
21:25:05 <mattoliver> yeah, on my VSAIO it won't be as high as a real server :P
21:26:21 <timburke> still, hopefully the curve would still look somewhat similar -- start off at some level, and as you add a *ton* of workers it drops pretty low because of all the contention -- but what happens in the middle?
21:26:33 <timburke> i feel like that may push us toward something like a servers-per-port strategy
21:26:43 <mattoliver> yup, can have a play.
21:27:28 <mattoliver> currently I'm randomly choosing a pending file shard when a put comes in. I wonder if I could just have a shard per worker, or maybe it's shards per worker.
21:27:42 <mattoliver> some of the timeouts could also be due to the randomness of choosing a shard.
21:28:25 <acoles> mattoliver: are you no longer locking the parent directory when appending to the pending file?
21:29:01 <mattoliver> nope, not unless it's a _commit_puts and we actually update the DB
21:29:15 <mattoliver> but not sure what effect that has on other things like replication yet
21:29:49 <mattoliver> but I do lock the pending file being updated so we don't lose pending data.
21:30:10 <acoles> but not locking the pending file when flushing it?
21:30:28 <acoles> does the parent dir lock also take lock on all the pending files?
21:30:31 <mattoliver> I do lock it then too, because we use a truncate on it
21:31:18 <timburke> yeah, i'd imagine you'd want to lock all the pending files (and the parent dir) when flushing
21:31:31 <acoles> OIC down in commit_puts
21:31:52 <mattoliver> but I take a lock on a pending file while flushing it, and only while dealing with that one so a concurrent put could go use it again.
21:32:04 <mattoliver> timburke: yup
21:32:11 <timburke> nice
21:32:13 <timburke> if anyone has some spare time to think about a client-facing api change, i've got some users that'd appreciate something like https://review.opendev.org/c/openstack/swift/+/829605 - container: Add delimiter-depth query param
21:33:36 <acoles> I was wondering if it would be possible to direct updates to a pending file that isn't being flushed?
21:34:07 <mattoliver> oh interesting!
21:34:13 <timburke> that'd be fancy! do it as a ring ;-)
21:34:19 <acoles> e.g. if the pending files could be pinned to workers
21:34:46 <acoles> or some kind of rotation
21:35:12 <mattoliver> I like it!
21:35:44 <acoles> maybe just try 'em all til you get a lock, a bit like how we do multiple lock files
21:36:15 <mattoliver> yeah can borrow that code as a start at least :)
21:36:53 <mattoliver> also like the ring like approach.
21:37:08 <mattoliver> Will have a play. thanks for the awesome ideas
21:38:18 <timburke> all right, i think i'll call it
21:38:30 <timburke> thank you all for coming, and thank you for working on swift!
21:38:34 <timburke> #endmeeting
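A few illustrative sketches of the changes discussed above. First, the handoff_delete option mattoliver mentions (https://review.opendev.org/c/openstack/swift/+/828637): a hypothetical config snippet, assuming the patch mirrors the name and integer semantics of the existing object-replicator option. The section shown and the value 2 are illustrative, not taken from the meeting or the patch.

    # container-server.conf (illustrative)
    [container-replicator]
    # Remove a handoff DB once it has replicated to at least this many
    # primary nodes, instead of waiting for all of them (assumed to
    # mirror the object-replicator option of the same name).
    handoff_delete = 2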
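Second, the delimiter-depth query parameter timburke mentions (https://review.opendev.org/c/openstack/swift/+/829605) is still under review; below is a hypothetical container-listing request, assuming the new parameter simply rides alongside the existing delimiter parameter. The storage URL, token, and container name are placeholders, and the exact depth semantics are whatever the patch defines.

    import requests

    # Placeholders -- a real client would get these from auth.
    storage_url = "http://saio:8080/v1/AUTH_test"
    token = "AUTH_tk..."

    # An ordinary delimiter listing collapses everything below the first
    # "/"; the proposed delimiter-depth param would control how many
    # levels deep the collapsing happens.
    resp = requests.get(
        storage_url + "/my-container",
        headers={"X-Auth-Token": token},
        params={"format": "json", "delimiter": "/", "delimiter-depth": "2"},
    )
    for entry in resp.json():
        print(entry.get("name") or entry.get("subdir"))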
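Finally, a minimal sketch of the pending-file idea discussed in open discussion. This is not the code from the POC (https://review.opendev.org/c/openstack/swift/+/830551); it only illustrates the shape of the change: writers append to one of several sharded .pending files and lock just that file, while the flush path locks each pending file (the real code would also hold the parent-directory lock) before applying and truncating it. All names here (PENDING_SHARDS, pending_path, put_record, commit_puts) are invented for the illustration.

    import fcntl
    import os
    import random
    from contextlib import contextmanager

    PENDING_SHARDS = 4  # invented knob: number of .pending.N files per DB


    @contextmanager
    def locked(path):
        # Simple blocking flock on the pending file itself.
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)


    def pending_path(db_path, shard):
        return '%s.pending.%d' % (db_path, shard)


    def put_record(db_path, record):
        # Writer path: pick a shard (randomly here, as in the POC
        # discussion) and lock only that pending file -- not the
        # parent directory.
        shard = random.randrange(PENDING_SHARDS)
        with locked(pending_path(db_path, shard)) as fd:
            os.lseek(fd, 0, os.SEEK_END)
            os.write(fd, record + b':')


    def commit_puts(db_path, apply_to_db):
        # Flush path: lock each pending file in turn, apply its records
        # to the DB, then truncate it so concurrent writers can reuse it.
        for shard in range(PENDING_SHARDS):
            with locked(pending_path(db_path, shard)) as fd:
                os.lseek(fd, 0, os.SEEK_SET)
                data = b''
                while True:
                    chunk = os.read(fd, 65536)
                    if not chunk:
                        break
                    data += chunk
                for record in data.split(b':'):
                    if record:
                        apply_to_db(record)
                os.ftruncate(fd, 0)

Because each shard is locked independently, a PUT only contends with other PUTs that happened to pick the same shard and with the flusher while it is on that particular file, which is the contention reduction the discussion is after.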