opendevreview | Tim Burke proposed openstack/swift master: db: Attempt to clean up part dir post replication https://review.opendev.org/c/openstack/swift/+/830535 | 00:07 |
---|---|---|
opendevreview | Tim Burke proposed openstack/liberasurecode master: Add CentOS 9 Stream job https://review.opendev.org/c/openstack/liberasurecode/+/820969 | 05:38 |
opendevreview | Matthew Oliver proposed openstack/swift master: POC/WIP - db: shard up the DatabaseBroker pending files https://review.opendev.org/c/openstack/swift/+/830551 | 05:59 |
mattoliver | ^ that is _very_ rough, and still has debugging `import q` statements in it, so it will fail tests. Just want to push it off my laptop before I go further down the rabbit hole :P | 06:01 |
afaranha | hi, does anyone know if there are currently issues when adding zuul jobs for wallaby and xena? We added fips jobs and they're not being run: https://review.opendev.org/c/openstack/swift/+/827901 https://review.opendev.org/c/openstack/swift/+/827902 | 10:08 |
clarkb | afaranha: can you point to where it isn't being run? The fips jobs appear to have run against 827901 and 827902 and there are no newer xena or wallaby changes that may have triggered the tests | 16:21 |
afaranha | clarkb, shouldn't it run on the patch itself? | 16:24 |
clarkb | afaranha: yes, it did | 16:25 |
afaranha | i can't see its results on the patch itself | 16:25 |
afaranha | in others projects, like nova, I can see it there | 16:26 |
afaranha | https://review.opendev.org/c/openstack/nova/+/827895 | 16:26 |
clarkb | for 827901, zuul +1'd on February 9. You can see them if you open the zuul summary too | 16:26 |
clarkb | they are there | 16:26 |
clarkb | they don't run in the gate so are not part of the gate testing | 16:26 |
clarkb | but the change didn't add them to the gate so that is expected | 16:27 |
afaranha | clarkb++ ow, nice, thanks for the explanation | 16:29 |
*** timburke_ is now known as timburke | 20:59 |
kota | good morning | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Feb 23 21:00:30 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
mattoliver | o/ | 21:00 |
kota | o/ | 21:01 |
acoles | o/ | 21:02 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | (though i've forgotten to update it :P) | 21:02 |
timburke | #topic PTG | 21:03 |
timburke | quick reminder to fill out the doodle poll to pick meeting times | 21:03 |
timburke | #link https://doodle.com/poll/qs2pysgyb8nb36c2 | 21:03 |
kota | oh ok. will do soon | 21:03 |
timburke | i'll get an etherpad up to collect development topics, too | 21:04 |
timburke | #topic priority reviews | 21:04 |
timburke | i updated the page at https://wiki.openstack.org/wiki/Swift/PriorityReviews | 21:05 |
timburke | mostly to call out some patches i know we're running in prod | 21:05 |
timburke | some seem about ready to go -- expirer: Only try to delete empty containers (https://review.opendev.org/c/openstack/swift/+/825883) did just what we hoped it would, and we saw a precipitous drop in container deletes and listing shard range cache misses | 21:07 |
acoles | yes that was a great improvement | 21:08 |
timburke | others had somewhat more mixed results -- container-server: plumb includes down into _get_shard_range_rows (https://review.opendev.org/c/openstack/swift/+/569847) *maybe* had some impact on updater timings, but it was hard to say definitively | 21:08 |
timburke | there was one that i wanted to check in on in particular | 21:08 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/809969 | 21:08 |
timburke | Sharding: a remote SR without an epoch can't replicate over one with an epoch | 21:09 |
timburke | mattoliver, am i remembering right that the idea was to get the no-epoch SR to stick around so we could hunt down how it happened? | 21:09 |
mattoliver | That stops the reset, but I think it currently confines the problem to the problem node. | 21:10 |
mattoliver | But if that problem node is a handoff then it might be fine. | 21:10 |
mattoliver | Interestingly, we haven't seen the problem again since we started running it. | 21:10 |
timburke | what do we think about merging it sooner rather than later, and calling the problem fixed until we get new information? | 21:11 |
mattoliver | Yeah, kk, it does log when there is an issue, so it'll let people know. | 21:11 |
acoles | might be worth adding broker.db_path to the warning? | 21:13 |
mattoliver | oh yeah, good idea. | 21:14 |
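The guard discussed above could look something like this sketch. Shard ranges are modeled as plain dicts here for illustration (the real code uses ShardRange objects), and the function and field names are hypothetical, not taken from the patch:

```python
import logging


def accept_remote_shard_range(local_sr, remote_sr, logger, db_path):
    """Refuse to let a remote shard range with no epoch replicate over
    a local one that has an epoch, and log where it happened (including
    the db path, per acoles' suggestion)."""
    if local_sr.get('epoch') is not None and remote_sr.get('epoch') is None:
        logger.warning(
            'Ignoring remote shard range without epoch for %s in %s',
            local_sr.get('name'), db_path)
        return False
    return True
```

A merge loop would call this per shard range and simply skip rejected rows, leaving the no-epoch copy stranded on its own node for later investigation.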
timburke | all right, that's about all i've got then | 21:15 |
timburke | #topic open discussion | 21:15 |
mattoliver | I haven't looked at the patch so will look today | 21:15 |
timburke | what else should we bring up this week? | 21:15 |
mattoliver | I added handoff_delete to the db replicators https://review.opendev.org/c/openstack/swift/+/828637 | 21:16 |
mattoliver | which helps when needing to drain and gets them closer to on par with the obj replicator | 21:16 |
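The `handoff_delete` option already exists for the object replicator; under the patch above, the db replicators would honor something similar. A hypothetical container-replicator config (the value here is illustrative, not a recommendation):

```ini
[container-replicator]
# Delete a handoff DB once this many primary nodes have confirmed a
# successful replication, rather than waiting for all of them --
# useful when draining a node.
handoff_delete = 2
```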
mattoliver | Also been playing with concurrent container object puts to the same container and trying to understand the problems involved and attempting to improve things some more. | 21:18 |
timburke | nice! along the same lines, i wrote up https://review.opendev.org/c/openstack/swift/+/830535 to clean up part dirs more quickly when you're rebalancing DBs | 21:19 |
mattoliver | cool | 21:19 |
mattoliver | In initial testing, moving the container directory lock, sharding out the pending file, and locking the pending file you're updating seems really promising. Getting far fewer directory lock timeouts | 21:19 |
mattoliver | Just improves concurrent access to the server. So helps when running multiple workers | 21:20 |
mattoliver | current POC WIP is https://review.opendev.org/c/openstack/swift/+/830551 | 21:21 |
timburke | yeah, that looked promising -- anything to get a few more reqs/s out of the container-server | 21:22 |
mattoliver | That still has debugging and q statements in it. Just wanted to get it backed up off my laptop. | 21:22 |
mattoliver | +1 | 21:22 |
timburke | one thing i'm still curious about is what the curve looks like for number of container-server workers vs. max concurrent requests before clients start hitting timeouts | 21:24 |
mattoliver | yeah, though on my VSAIO it won't be as high as on a real server :P | 21:25 |
timburke | still, hopefully the curve would still look somewhat similar -- start off at some level, and as you add a *ton* of workers it drops pretty low because of all the contention -- but what happens in the middle? | 21:26 |
timburke | i feel like that may push us toward something like a servers-per-port strategy | 21:26 |
mattoliver | yup, can have a play. | 21:26 |
mattoliver | currently I'm randomly choosing a pending file shard when a put comes in. I wonder if I could just have a shard per worker, or maybe multiple shards per worker. | 21:27 |
mattoliver | some of the timeouts could also be due to the randomness of choosing a shard. | 21:27 |
acoles | mattoliver: are you no longer locking the parent directory when appending to the pending file? | 21:28 |
mattoliver | nope, not unless it's a _commit_puts and we actually update the DB | 21:29 |
mattoliver | but not sure the effect that is on other things like replication yet | 21:29 |
mattoliver | but I do lock the pending file being updated so we don't lose pending data. | 21:29 |
acoles | but not locking the pending file when flushing it? | 21:30 |
acoles | does the parent dir lock also take lock on all the pending files? | 21:30 |
mattoliver | I do lock them too, because we use a truncate on it | 21:30 |
timburke | yeah, i'd imagine you'd want to lock all the pending files (and the parent dir) when flushing | 21:31 |
acoles | OIC down in commit_puts | 21:31 |
mattoliver | but I take a lock on a pending file while flushing it, and only while dealing with that one so a concurrent put could go use it again. | 21:31 |
mattoliver | timburke: yup | 21:32 |
timburke | nice | 21:32 |
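The per-shard locking scheme described above could be sketched roughly like this. The shard count, file names, and helper names are hypothetical, not taken from the WIP patch; the point is that a PUT locks only its one shard file, while a flush drains shards one at a time under a truncate-protected lock:

```python
import fcntl
import os
import random

PENDING_SHARDS = 4  # hypothetical shard count


def append_pending(db_dir, entry, shard=None):
    """Append an update to one of several pending-file shards, locking
    only that shard file so concurrent writers to other shards don't
    contend on a single directory lock."""
    if shard is None:
        shard = random.randrange(PENDING_SHARDS)
    path = os.path.join(db_dir, 'pending.%d' % shard)
    with open(path, 'ab') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)  # lock just this shard
        try:
            fp.write(entry + b'\n')
            fp.flush()
            os.fsync(fp.fileno())
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)


def flush_pending(db_dir):
    """Drain each shard in turn, holding only that shard's lock while
    reading and truncating it, so a concurrent PUT can keep landing in
    the other shards mid-flush."""
    entries = []
    for shard in range(PENDING_SHARDS):
        path = os.path.join(db_dir, 'pending.%d' % shard)
        if not os.path.exists(path):
            continue
        with open(path, 'r+b') as fp:
            fcntl.flock(fp, fcntl.LOCK_EX)
            try:
                entries.extend(fp.read().splitlines())
                fp.truncate(0)  # consume the pending data under the lock
            finally:
                fcntl.flock(fp, fcntl.LOCK_UN)
    return entries
```

The lock during flush matters precisely because of the truncate: without it, a write racing the truncate could be lost.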
timburke | if anyone has some spare time to think about a client-facing api change, i've got some users that'd appreciate something like https://review.opendev.org/c/openstack/swift/+/829605 - container: Add delimiter-depth query param | 21:32 |
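One plausible reading of the proposed delimiter-depth parameter (this sketch is an interpretation of the patch title, not its actual semantics): roll listings up at the Nth occurrence of the delimiter rather than the first, so clients can get two or more levels of pseudo-directories in one request:

```python
def collapse(names, delimiter='/', depth=1):
    """Collapse object names at the depth-th occurrence of the
    delimiter, yielding rolled-up 'subdir' prefixes for deeper names
    and full names otherwise (hypothetical delimiter-depth behavior)."""
    seen = set()
    out = []
    for name in names:
        parts = name.split(delimiter)
        if len(parts) > depth:
            prefix = delimiter.join(parts[:depth]) + delimiter
            if prefix not in seen:
                seen.add(prefix)
                out.append(prefix)
        else:
            out.append(name)
    return out
```

With depth=1 this matches today's delimiter behavior; depth=2 would surface `a/b/` instead of just `a/`.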
acoles | I was wondering if it would be possible to direct updates to a pending file that isn't being flushed? | 21:33 |
mattoliver | oh interesting! | 21:34 |
timburke | that'd be fancy! do it as a ring ;-) | 21:34 |
acoles | e.g. if the pending files could be pinned to workers | 21:34 |
acoles | or some kind of rotation | 21:34 |
mattoliver | I like it! | 21:35 |
acoles | maybe just try 'em all til you get a lock, a bit like how we do multiple lock files | 21:35 |
mattoliver | yeah can borrow that code as a start at least :) | 21:36 |
mattoliver | also like the ring like approach. | 21:36 |
mattoliver | Will have a play. thanks for the awesome ideas | 21:37 |
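The "try 'em all til you get a lock" idea acoles mentions could be sketched like this, borrowing the non-blocking-then-fall-back pattern from Swift's multiple lock files. Again the names and shard count are hypothetical:

```python
import errno
import fcntl
import os


def pick_unlocked_shard(db_dir, num_shards=4):
    """Try each pending-file shard with a non-blocking lock so a PUT
    avoids shards that are busy being flushed; if every shard is busy,
    fall back to a blocking lock on shard 0 rather than failing.
    Returns (shard_index, open file object holding the lock)."""
    for shard in range(num_shards):
        path = os.path.join(db_dir, 'pending.%d' % shard)
        fp = open(path, 'ab')
        try:
            fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return shard, fp
        except OSError as err:
            fp.close()
            if err.errno not in (errno.EACCES, errno.EAGAIN):
                raise
    fp = open(os.path.join(db_dir, 'pending.0'), 'ab')
    fcntl.flock(fp, fcntl.LOCK_EX)  # all busy: wait on shard 0
    return 0, fp
```

This naturally routes concurrent writers to different shards without any pinning, which would also address the timeouts that purely random shard choice can cause.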
timburke | all right, i think i'll call it | 21:38 |
timburke | thank you all for coming, and thank you for working on swift! | 21:38 |
timburke | #endmeeting | 21:38 |
opendevmeet | Meeting ended Wed Feb 23 21:38:34 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:38 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.html | 21:38 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.txt | 21:38 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.log.html | 21:38 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!