opendevreview | Tim Burke proposed openstack/swift master: db: Attempt to clean up part dir post replication https://review.opendev.org/c/openstack/swift/+/830535 | 00:07 |
---|---|---|
opendevreview | Tim Burke proposed openstack/liberasurecode master: Add CentOS 9 Stream job https://review.opendev.org/c/openstack/liberasurecode/+/820969 | 05:38 |
opendevreview | Matthew Oliver proposed openstack/swift master: POC/WIP - db: shard up the DatabaseBroker pending files https://review.opendev.org/c/openstack/swift/+/830551 | 05:59 |
mattoliver | ^ that is _very_ rough, and still has debugging `import q` statements in it, so it will fail tests. Just want to push it off my laptop before I go further down the rabbit hole :P | 06:01 |
afaranha | hi, does anyone know if there are currently issues when adding zuul jobs for wallaby and xena? We added fips jobs and they're not being run: https://review.opendev.org/c/openstack/swift/+/827901 https://review.opendev.org/c/openstack/swift/+/827902 | 10:08 |
clarkb | afaranha: can you point to where it isn't being run? The fips jobs appear to have run against 827901 and 827902 and there are no newer xena or wallaby changes that may have triggered the tests | 16:21 |
afaranha | clarkb, shouldn't it run on the patch itself? | 16:24 |
clarkb | afaranha: yes, it did | 16:25 |
afaranha | i can't see its results on the patch itself | 16:25 |
afaranha | in others projects, like nova, I can see it there | 16:26 |
afaranha | https://review.opendev.org/c/openstack/nova/+/827895 | 16:26 |
clarkb | for 827901, zuul +1'd on February 9. You can see them if you open the zuul summary too | 16:26 |
clarkb | they are there | 16:26 |
clarkb | they don't run in the gate so are not part of the gate testing | 16:26 |
clarkb | but the change didn't add them to the gate so that is expected | 16:27 |
afaranha | clarkb++ ow, nice, thanks for the explanation | 16:29 |
*** timburke_ is now known as timburke | 20:59 |
kota | good morning | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Feb 23 21:00:30 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
mattoliver | o/ | 21:00 |
kota | o/ | 21:01 |
acoles | o/ | 21:02 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | (though i've forgotten to update it :P) | 21:02 |
timburke | #topic PTG | 21:03 |
timburke | quick reminder to fill out the doodle poll to pick meeting times | 21:03 |
timburke | #link https://doodle.com/poll/qs2pysgyb8nb36c2 | 21:03 |
kota | oh ok. will do soon | 21:03 |
timburke | i'll get an etherpad up to collect development topics, too | 21:04 |
timburke | #topic priority reviews | 21:04 |
timburke | i updated the page at https://wiki.openstack.org/wiki/Swift/PriorityReviews | 21:05 |
timburke | mostly to call out some patches i know we're running in prod | 21:05 |
timburke | some seem about ready to go -- expirer: Only try to delete empty containers (https://review.opendev.org/c/openstack/swift/+/825883) did just what we hoped it would, and we saw a precipitous drop in container deletes and listing shard range cache misses | 21:07 |
acoles | yes that was a great improvement | 21:08 |
timburke | others had somewhat more mixed results -- container-server: plumb includes down into _get_shard_range_rows (https://review.opendev.org/c/openstack/swift/+/569847) *maybe* had some impact on updater timings, but it was hard to say definitively | 21:08 |
timburke | there was one that i wanted to check in on in particular | 21:08 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/809969 | 21:08 |
timburke | Sharding: a remote SR without an epoch can't replicate over one with an epoch | 21:09 |
timburke | mattoliver, am i remembering right that the idea was to get the no-epoch SR to stick around so we could hunt down how it happened? | 21:09 |
mattoliver | That stops the reset, but I think it currently confines the problem to the problem node. | 21:10 |
mattoliver | But if that problem node is a handoff then it might be fine. | 21:10 |
mattoliver | Interestingly, we haven't seen the problem again since we started running it. | 21:10 |
timburke | what do we think about merging it sooner rather than later, and calling the problem fixed until we get new information? | 21:11 |
mattoliver | Yeah, kk, it does log when there is an issue, so it'll let people know. | 21:11 |
acoles | might be worth adding broker.db_path to the warning? | 21:13 |
mattoliver | oh yeah, good idea. | 21:14 |
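The guard discussed above could look something like this sketch. Shard ranges are modeled as plain dicts here for illustration (the real code uses ShardRange objects), and the function and field names are hypothetical, not taken from the patch:

```python
import logging


def accept_remote_shard_range(local_sr, remote_sr, logger, db_path):
    """Refuse to let a remote shard range with no epoch replicate over
    a local one that has an epoch, and log where it happened (including
    the db path, per acoles' suggestion)."""
    if local_sr.get('epoch') is not None and remote_sr.get('epoch') is None:
        logger.warning(
            'Ignoring remote shard range without epoch for %s in %s',
            local_sr.get('name'), db_path)
        return False
    return True
```

A merge loop would call this per shard range and simply skip rejected rows, leaving the no-epoch copy stranded on its own node for later investigation.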
timburke | all right, that's about all i've got then | 21:15 |
timburke | #topic open discussion | 21:15 |
mattoliver | I haven't looked at the patch so will look today | 21:15 |
timburke | what else should we bring up this week? | 21:15 |
mattoliver | I added handoff_delete to the db replicators https://review.opendev.org/c/openstack/swift/+/828637 | 21:16 |
mattoliver | which helps when needing to drain and gets them closer to on par with the obj replicator | 21:16 |
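The `handoff_delete` option already exists for the object replicator; under the patch above, the db replicators would honor something similar. A hypothetical container-replicator config (the value here is illustrative, not a recommendation):

```ini
[container-replicator]
# Delete a handoff DB once this many primary nodes have confirmed a
# successful replication, rather than waiting for all of them --
# useful when draining a node.
handoff_delete = 2
```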
mattoliver | Also been playing with concurrent container object puts to the same container and trying to understand the problems involved and attempting to improve things some more. | 21:18 |
timburke | nice! along the same lines, i wrote up https://review.opendev.org/c/openstack/swift/+/830535 to clean up part dirs more quickly when you're rebalancing DBs | 21:19 |
mattoliver | cool | 21:19 |
mattoliver | In initial testing, moving the container directory lock, sharding out the pending file, and locking the pending file you're updating seems really promising. Getting far fewer directory lock timeouts | 21:19 |
mattoliver | Just improves concurrent access to the server. So helps when running multiple workers | 21:20 |
mattoliver | current POC WIP is https://review.opendev.org/c/openstack/swift/+/830551 | 21:21 |
timburke | yeah, that looked promising -- anything to get a few more reqs/s out of the container-server | 21:22 |
mattoliver | That still has debugging and q statements in it. Just wanted to get it backed up off my laptop. | 21:22 |
mattoliver | +1 | 21:22 |
timburke | one thing i'm still curious about is what the curve looks like for number of container-server workers vs. max concurrent requests before clients start hitting timeouts | 21:24 |
mattoliver | yeah, though on my VSAIO it won't be as high as on a real server :P | 21:25 |
timburke | still, hopefully the curve would still look somewhat similar -- start off at some level, and as you add a *ton* of workers it drops pretty low because of all the contention -- but what happens in the middle? | 21:26 |
timburke | i feel like that may push us toward something like a servers-per-port strategy | 21:26 |
mattoliver | yup, can have a play. | 21:26 |
mattoliver | currently I'm randomly choosing a pending file shard when a put comes in. I wonder if I could just have a shard per worker, or maybe multiple shards per worker. | 21:27 |
mattoliver | some of the timeouts could also be due to the randomness of choosing a shard. | 21:27 |
acoles | mattoliver: are you no longer locking the parent directory when appending to the pending file? | 21:28 |
mattoliver | nope, not unless it's a _commit_puts and we actually update the DB | 21:29 |
mattoliver | but not sure the effect that is on other things like replication yet | 21:29 |
mattoliver | but I do lock the pending file being updated so we don't lose pending data. | 21:29 |
acoles | but not locking the pending file when flushing it? | 21:30 |
acoles | does the parent dir lock also take lock on all the pending files? | 21:30 |
mattoliver | I do lock them too, because we use a truncate on it | 21:30 |
timburke | yeah, i'd imagine you'd want to lock all the pending files (and the parent dir) when flushing | 21:31 |
acoles | OIC down in commit_puts | 21:31 |
mattoliver | but I take a lock on a pending file while flushing it, and only while dealing with that one so a concurrent put could go use it again. | 21:31 |
mattoliver | timburke: yup | 21:32 |
timburke | nice | 21:32 |
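The per-shard locking scheme described above could be sketched roughly like this. The shard count, file names, and helper names are hypothetical, not taken from the WIP patch; the point is that a PUT locks only its one shard file, while a flush drains shards one at a time under a truncate-protected lock:

```python
import fcntl
import os
import random

PENDING_SHARDS = 4  # hypothetical shard count


def append_pending(db_dir, entry, shard=None):
    """Append an update to one of several pending-file shards, locking
    only that shard file so concurrent writers to other shards don't
    contend on a single directory lock."""
    if shard is None:
        shard = random.randrange(PENDING_SHARDS)
    path = os.path.join(db_dir, 'pending.%d' % shard)
    with open(path, 'ab') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)  # lock just this shard
        try:
            fp.write(entry + b'\n')
            fp.flush()
            os.fsync(fp.fileno())
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)


def flush_pending(db_dir):
    """Drain each shard in turn, holding only that shard's lock while
    reading and truncating it, so a concurrent PUT can keep landing in
    the other shards mid-flush."""
    entries = []
    for shard in range(PENDING_SHARDS):
        path = os.path.join(db_dir, 'pending.%d' % shard)
        if not os.path.exists(path):
            continue
        with open(path, 'r+b') as fp:
            fcntl.flock(fp, fcntl.LOCK_EX)
            try:
                entries.extend(fp.read().splitlines())
                fp.truncate(0)  # consume the pending data under the lock
            finally:
                fcntl.flock(fp, fcntl.LOCK_UN)
    return entries
```

The lock during flush matters precisely because of the truncate: without it, a write racing the truncate could be lost.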
timburke | if anyone has some spare time to think about a client-facing api change, i've got some users that'd appreciate something like https://review.opendev.org/c/openstack/swift/+/829605 - container: Add delimiter-depth query param | 21:32 |
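One plausible reading of the proposed delimiter-depth parameter (this sketch is an interpretation of the patch title, not its actual semantics): roll listings up at the Nth occurrence of the delimiter rather than the first, so clients can get two or more levels of pseudo-directories in one request:

```python
def collapse(names, delimiter='/', depth=1):
    """Collapse object names at the depth-th occurrence of the
    delimiter, yielding rolled-up 'subdir' prefixes for deeper names
    and full names otherwise (hypothetical delimiter-depth behavior)."""
    seen = set()
    out = []
    for name in names:
        parts = name.split(delimiter)
        if len(parts) > depth:
            prefix = delimiter.join(parts[:depth]) + delimiter
            if prefix not in seen:
                seen.add(prefix)
                out.append(prefix)
        else:
            out.append(name)
    return out
```

With depth=1 this matches today's delimiter behavior; depth=2 would surface `a/b/` instead of just `a/`.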
acoles | I was wondering if it would be possible to direct updates to a pending file that isn't being flushed? | 21:33 |
mattoliver | oh interesting! | 21:34 |
timburke | that'd be fancy! do it as a ring ;-) | 21:34 |
acoles | e.g. if the pending files could be pinned to workers | 21:34 |
acoles | or some kind of rotation | 21:34 |
mattoliver | I like it! | 21:35 |
acoles | maybe just try 'em all til you get a lock, a bit like how we do multiple lock files | 21:35 |
mattoliver | yeah can borrow that code as a start at least :) | 21:36 |
mattoliver | also like the ring like approach. | 21:36 |
mattoliver | Will have a play. thanks for the awesome ideas | 21:37 |
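The "try 'em all til you get a lock" idea acoles mentions could be sketched like this, borrowing the non-blocking-then-fall-back pattern from Swift's multiple lock files. Again the names and shard count are hypothetical:

```python
import errno
import fcntl
import os


def pick_unlocked_shard(db_dir, num_shards=4):
    """Try each pending-file shard with a non-blocking lock so a PUT
    avoids shards that are busy being flushed; if every shard is busy,
    fall back to a blocking lock on shard 0 rather than failing.
    Returns (shard_index, open file object holding the lock)."""
    for shard in range(num_shards):
        path = os.path.join(db_dir, 'pending.%d' % shard)
        fp = open(path, 'ab')
        try:
            fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return shard, fp
        except OSError as err:
            fp.close()
            if err.errno not in (errno.EACCES, errno.EAGAIN):
                raise
    fp = open(os.path.join(db_dir, 'pending.0'), 'ab')
    fcntl.flock(fp, fcntl.LOCK_EX)  # all busy: wait on shard 0
    return 0, fp
```

This naturally routes concurrent writers to different shards without any pinning, which would also address the timeouts that purely random shard choice can cause.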
timburke | all right, i think i'll call it | 21:38 |
timburke | thank you all for coming, and thank you for working on swift! | 21:38 |
timburke | #endmeeting | 21:38 |
opendevmeet | Meeting ended Wed Feb 23 21:38:34 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:38 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.html | 21:38 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.txt | 21:38 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.log.html | 21:38 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!