Monday, 2023-07-10

paladox	hmm, maybe 4 x 500 (one is 490g) and 1 x 1000 (it's 925g)	12:15
opendevreview	Alistair Coles proposed openstack/swift master: Encode header in latin-1 with wsgi_to_bytes https://review.opendev.org/c/openstack/swift/+/884240	14:20
opendevreview	Alistair Coles proposed openstack/swift master: proxy: remove client_chunk_size and skip_bytes from GetOrHeadHandler https://review.opendev.org/c/openstack/swift/+/886823	15:25
opendevreview	Alistair Coles proposed openstack/swift master: proxy: encapsulate Getter resp, node and parts_iter https://review.opendev.org/c/openstack/swift/+/886994	15:25
paladox	hmm, not sure how to get the perfect balance. so we have 3 x 525, 1 x 490 and 1 x 916. But it seems the 3 x 525 ran out of storage whilst there was like 80g left on the 1 x 916 one (this was with 3 x 600, 1 x 500 and 1 x 900 as the weights).	16:32
timburke	paladox, it's hard to get a perfect balance, especially in a small cluster -- the distribution of object sizes tends to be lumpy, some partitions are more (or less) densely filled than average, etc.	16:57
timburke	you can keep fiddling with weights to try to get through the current pain -- but whether a cluster is "healthy" or not should never come down to whether we've got a handful of partitions assigned to one disk vs another. when it seems like it does, we're already to "unhealthy." ultimately you kinda need to get some more hardware in (or delete some data)	17:00
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	17:04
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	17:19
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	17:22
paladox	timburke: ah ok. i'm going to see if i can get more storage. But setting the weights to like 3 x 525, 1 x 490 and 1 x 916 is fine?	17:22
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	17:33
timburke	paladox, if it seems to be working for you, go for it. if you're worried about the 525 disks still being much fuller than the others, maybe bring that down some, or bring the 916 up a bit. i'd do it fairly slowly, though -- maybe 1-3% weight change, rebalance, let things settle, see how fullness has changed across the cluster, re-evaluate and decide whether to continue. how much any change will help may largely depend upon	17:37
timburke	how much space is required for the individual partitions that get reassigned at this point	17:37
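The "small steps" advice above can be sketched in Python. This is only an illustration: the function name, the device labels, and the fullness figures are hypothetical and not taken from the log. It lowers the weight of devices that are fuller than average and raises the weight of emptier ones by a small fraction, leaving the rebalance and the "let things settle" part to the operator.

```python
def nudge_weights(weights, fullness, step=0.02):
    """Return new weights, moved by `step` (a fraction, e.g. 0.02 = 2%).

    weights:  {device: current ring weight}
    fullness: {device: fraction of the disk in use, 0.0-1.0}
    Devices fuller than the cluster average get a slightly lower weight,
    emptier devices a slightly higher one. (Hypothetical helper.)
    """
    average = sum(fullness.values()) / len(fullness)
    new_weights = {}
    for device, weight in weights.items():
        if fullness[device] > average:
            new_weights[device] = round(weight * (1 - step), 2)
        elif fullness[device] < average:
            new_weights[device] = round(weight * (1 + step), 2)
        else:
            new_weights[device] = weight
    return new_weights


# Illustrative numbers only: one nearly full 525-weight disk and a
# less full 916-weight disk.
current = {"d1": 525, "d5": 916}
used = {"d1": 0.99, "d5": 0.91}
proposed = nudge_weights(current, used, step=0.02)
```

The proposed values would then be applied with the ring builder's `set_weight` and `rebalance` commands, and the fullness re-measured before deciding whether to take another step.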
paladox	ah ok	17:37
opendevreview	Merged openstack/swift master: Encode header in latin-1 with wsgi_to_bytes https://review.opendev.org/c/openstack/swift/+/884240	17:44
reid_g	I just saw the note about the upgrade order: O>C>A>P. Is that documented someplace?	18:43
DHE	I upgraded 1 proxy (nothing else) and it's definitely gone sideways on my EC containers on that proxy..	18:55
timburke	reid_g, unfortunately, i think it's been largely tribal knowledge -- i should make sure that gets written down somewhere. notmyname had a blog post about it a while back, but it's since gone MIA	19:35
timburke	DHE, are there tracebacks with those 500s?	19:35
DHE	https://pastebin.com/raw/ETNsWqQj	19:40
DHE	the version is 2.26.0-10+deb11u1 (yes, debian's managed package)	19:40
timburke	well, the good news is it's fixed on master... https://github.com/openstack/swift/commit/a5fa3cfc	19:50
opendevreview	Merged openstack/swift master: Object-server: keep SLO manifest files in page cache. https://review.opendev.org/c/openstack/swift/+/885302	19:50
timburke	and backported it to victoria: https://github.com/openstack/swift/commit/acb742ac	19:53
timburke	i never did a 2.26.1 tho :-(	19:56
timburke	oh! DHE, you also mentioned proxies sometimes jamming up, right? i think that may have been fixed in a more-recent version, too: https://opendev.org/openstack/swift/src/tag/2.27.0/CHANGELOG#L184-L186	20:01
DHE	umm, yes. I believe I'm the one who suggested the fix	20:02
DHE	the original problem from the changelog jammed up the whole proxy process, deadlocking it forever. this problem seems specific to EC containers, and the proxy process remains otherwise serviceable	20:04
timburke	oh, right	20:04
DHE	this py3 issue looks like it's related to my other issue. debian 11 carries swift 2.26 which sounds about right	20:05
DHE	hmm... maybe I can just cherry-pick it over the debian code...	20:06
timburke	i'd recommend it -- the change should apply cleanly. just need to remember to do it again if you ever need to re-image a node or something	20:09
reid_g	Do you have a link to the write up by notmyname?	20:09
DHE	with the recent release of debian 12, I hope I don't... :)	20:10
reid_g	I was about to upgrade our clusters and it would be interesting to read it. We're running OCAP on all of our hosts and I did not separate the backend upgrades in my testing/staging clusters.	20:11
timburke	reid_g, i've been searching, but haven't had any luck. it was an old post, though; i'm sure there's a bunch more that could be said about how to do upgrades without your clients noticing these days	20:11
timburke	fwiw, we've definitely done upgrades like that (node at a time, all services on a node at once) for a while; it should still work pretty well	20:12
reid_g	Fair enough. We did that in the past and didn't have any noticeable problems. I was curious since you mentioned an order.	20:13
reid_g	Also, while you are here. I'm not sure if you noticed my comment the other day about the commit you posted relating to being unable to bind ports. I tested it out by manually patching a host and it did not fix the issue.	20:15
timburke	good to know, thanks for testing it! i'm still scratching my head, then :-/	20:18
reid_g	The only way I found is to stop the object reconstructor/replicator/server, wait 60 sec for the timewaits to die down and then I can start the object server again.	20:19
timburke	reid_g, what version are you on? i'm wondering if the "seamless reload" from https://github.com/openstack/swift/commit/1107f241 would work any better for you, or if the re-exec'ed process would also fail to bind....	20:21
reid_g	ussuri / 2.25.2	20:22
reid_g	Oh. Actually it still happens in Yoga / 2.29.2	20:25
timburke	reid_g, try doing a `kill -USR1 $MAINPID`! should be supported since 2.24.0. the idea is that we fork an extra child that's responsible for shutting down the old servers, re-exec the main guy to spawn a new batch of workers with new code/configs, then signal the extra child that we're ready so it can actually do the shutdown	20:28
reid_g	I will try to test that out.	20:30
DHE	with SO_REUSEPORT (or SO_REUSEADDR ? I always get them mixed up) it might not even need that. in theory you can start the new service immediately alongside the old service. connections are assigned randomly. is there a "clean shutdown" command?	20:36
DHE	actually that probably causes problems with the service manager	20:37
timburke	DHE, send a `HUP` to the main process and we'll close the listen sockets while still completing any in-flight requests	20:38
timburke	and iirc, we let eventlet take care of setting both REUSEPORT and REUSEADDR for us: https://github.com/eventlet/eventlet/blob/master/eventlet/convenience.py#L34	20:39
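Since the two options are easy to mix up: `SO_REUSEADDR` lets a server bind to an address while old connections on it are still in TIME_WAIT, whereas `SO_REUSEPORT` (where the platform supports it) lets several sockets listen on the same port at once, with the kernel distributing incoming connections among them. A minimal standard-library sketch of setting both on a listening socket follows; the helper name `make_listener` is hypothetical, and this is not a copy of eventlet's code.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=50):
    """Create a TCP listening socket with address/port reuse enabled.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow binding even if earlier connections linger in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT is not available everywhere, so treat it as optional.
    if hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass  # kernel does not support it; carry on without it
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


listener = make_listener()          # port=0 asks the OS for a free port
bound_port = listener.getsockname()[1]
listener.close()
```

Whether starting a second service alongside the first is practical still depends on the service manager, as noted in the conversation.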
reid_g	Sounds like the systemd service unit should have a few more options configured.	20:48
reid_g	I'm just using the default unit files from Ubuntu package	20:49
opendevreview	Tim Burke proposed openstack/swift-bench master: Fix SyntaxWarning https://review.opendev.org/c/openstack/swift-bench/+/888069	21:35
opendevreview	Merged openstack/swift-bench master: Fix SyntaxWarning https://review.opendev.org/c/openstack/swift-bench/+/888069	22:22
opendevreview	Tim Burke proposed openstack/swift-bench master: refactor bin/bench into swiftbench/cli for testing https://review.opendev.org/c/openstack/swift-bench/+/866826	22:42
opendevreview	Tim Burke proposed openstack/swift-bench master: Switch from optparse to argparse https://review.opendev.org/c/openstack/swift-bench/+/874341	22:42
opendevreview	Tim Burke proposed openstack/swift-bench master: support container_name from cli https://review.opendev.org/c/openstack/swift-bench/+/865369	22:42
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	23:15
opendevreview	ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904	23:32