Nick | Message | Time |
---|---|---|
paladox | hmm, maybe 4 x 500 (one is 490g) and 1 x 1000 (it's 925g) | 12:15 |
opendevreview | Alistair Coles proposed openstack/swift master: Encode header in latin-1 with wsgi_to_bytes https://review.opendev.org/c/openstack/swift/+/884240 | 14:20 |
opendevreview | Alistair Coles proposed openstack/swift master: proxy: remove client_chunk_size and skip_bytes from GetOrHeadHandler https://review.opendev.org/c/openstack/swift/+/886823 | 15:25 |
opendevreview | Alistair Coles proposed openstack/swift master: proxy: encapsulate Getter resp, node and parts_iter https://review.opendev.org/c/openstack/swift/+/886994 | 15:25 |
paladox | hmm, not sure how to get the perfect balance. so we have 3 x 525, 1 x 490 and 1 x 916. But it seems the 3 x 525 ran out of storage whilst there was like 80g left on the 1 x 916 one (this was with 3 x 600, 1 x 500 and 1 x 900 as the weights). | 16:32 |
timburke | paladox, it's hard to get a *perfect* balance, especially in a small cluster -- the distribution of object sizes tends to be lumpy, some partitions are more (or less) densely filled than average, etc. | 16:57 |
timburke | you can keep fiddling with weights to try to get through the current pain -- but whether a cluster is "healthy" or not should never come down to whether we've got a handful of partitions assigned to one disk vs another. when it seems like it does, we're already at "unhealthy." ultimately you kinda *need* to get some more hardware in (or delete some data) | 17:00 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 17:04 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 17:19 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 17:22 |
paladox | timburke: ah ok. i'm going to see if i can get more storage. But setting it to something like 3 x 525, 1 x 490 and 1 x 916 as weights is fine? | 17:22 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 17:33 |
timburke | paladox, if it seems to be working for you, go for it. if you're worried about the 525 disks still being fuller than the others, maybe bring that down some, or bring the 916 up a bit. i'd do it fairly slowly, though -- maybe 1-3% weight change, rebalance, let things settle, see how fullness has changed across the cluster, re-evaluate and decide whether to continue. how much any change will help may largely depend upon | 17:37 |
timburke | how much space is required for the individual partitions that get reassigned at this point | 17:37 |
paladox | ah ok | 17:37 |
opendevreview | Merged openstack/swift master: Encode header in latin-1 with wsgi_to_bytes https://review.opendev.org/c/openstack/swift/+/884240 | 17:44 |
reid_g | I just saw the note about the upgrade order: O>C>A>P. Is that documented someplace? | 18:43 |
DHE | I upgraded 1 proxy (nothing else) and it's definitely gone sideways on my EC containers on that proxy.. | 18:55 |
timburke | reid_g, unfortunately, i think it's been largely tribal knowledge -- i should make sure that gets written down somewhere. notmyname had a blog post about it a while back, but it's since gone MIA | 19:35 |
timburke | DHE, are there tracebacks with those 500s? | 19:35 |
DHE | https://pastebin.com/raw/ETNsWqQj | 19:40 |
DHE | the version is 2.26.0-10+deb11u1 (yes, debian's managed package) | 19:40 |
timburke | well, the good news is it's fixed on master... https://github.com/openstack/swift/commit/a5fa3cfc | 19:50 |
opendevreview | Merged openstack/swift master: Object-server: keep SLO manifest files in page cache. https://review.opendev.org/c/openstack/swift/+/885302 | 19:50 |
timburke | and backported it to victoria: https://github.com/openstack/swift/commit/acb742ac | 19:53 |
timburke | i never did a 2.26.1 tho :-( | 19:56 |
timburke | oh! DHE, you also mentioned proxies sometimes jamming up, right? i think that may have been fixed in a more-recent version, too: https://opendev.org/openstack/swift/src/tag/2.27.0/CHANGELOG#L184-L186 | 20:01 |
DHE | umm, yes. I believe I'm the one who suggested the fix | 20:02 |
DHE | the original problem from the changelog jammed up the whole proxy process, deadlocking it forever. this problem seems specific to EC containers, and the proxy process remains otherwise serviceable | 20:04 |
timburke | oh, right | 20:04 |
DHE | this py3 issue looks like it's related to my other issue. debian 11 carries swift 2.26 which sounds about right | 20:05 |
DHE | hmm... maybe I can just cherry-pick it over the debian code... | 20:06 |
timburke | i'd recommend it -- the change should apply cleanly. just need to remember to do it again if you ever need to re-image a node or something | 20:09 |
reid_g | Do you have a link to the write up by notmyname? | 20:09 |
DHE | with the recent release of debian 12, I hope I don't... :) | 20:10 |
reid_g | I was about to upgrade our clusters and it would be interesting to read it. We're running OCAP on all of our hosts and I did not separate the backend upgrades in my testing/staging clusters. | 20:11 |
timburke | reid_g, i've been searching, but haven't had any luck. it was an old post, though; i'm sure there's a bunch more that could be said about how to do upgrades without your clients noticing these days | 20:11 |
timburke | fwiw, we've definitely done upgrades like that (node at a time, all services on a node at once) for a while; it should still work pretty well | 20:12 |
reid_g | Fair enough. We did that in the past and didn't have any noticeable problems. I was curious since you mentioned an order. | 20:13 |
reid_g | Also, while you are here: I'm not sure if you noticed my comment the other day about the commit you posted relating to being unable to bind ports. I tested it out by manually patching a host and it did not fix the issue. | 20:15 |
timburke | good to know, thanks for testing it! i'm still scratching my head, then :-/ | 20:18 |
reid_g | The only workaround I found is to stop the object reconstructor/replicator/server, wait 60 sec for the TIME_WAIT connections to die down, and then I can start the object server again. | 20:19 |
timburke | reid_g, what version are you on? i'm wondering if the "seamless reload" from https://github.com/openstack/swift/commit/1107f241 would work any better for you, or if the re-exec'ed process would also fail to bind.... | 20:21 |
reid_g | ussuri / 2.25.2 | 20:22 |
reid_g | Oh. Actually it still happens in Yoga / 2.29.2 | 20:25 |
timburke | reid_g, try doing a `kill -USR1 $MAINPID`! should be supported since 2.24.0. the idea is that we fork an extra child that's responsible for shutting down the old servers, re-exec the main guy to spawn a new batch of workers with new code/configs, then signal the extra child that we're ready so it can actually do the shutdown | 20:28 |
reid_g | I will try to test that out. | 20:30 |
DHE | with SO_REUSEPORT (or SO_REUSEADDR ? I always get them mixed up) it might not even need that. in theory you can start the new service immediately alongside the old service. connections are assigned randomly. is there a "clean shutdown" command? | 20:36 |
DHE | actually that probably causes problems with the service manager | 20:37 |
timburke | DHE, send a `HUP` to the main process and we'll close the listen sockets while still completing any in-flight requests | 20:38 |
timburke | and iirc, we let eventlet take care of setting both REUSEPORT and REUSEADDR for us: https://github.com/eventlet/eventlet/blob/master/eventlet/convenience.py#L34 | 20:39 |
reid_g | Sounds like the systemd service unit should have a few more options configured. | 20:48 |
reid_g | I'm just using the default unit files from Ubuntu package | 20:49 |
opendevreview | Tim Burke proposed openstack/swift-bench master: Fix SyntaxWarning https://review.opendev.org/c/openstack/swift-bench/+/888069 | 21:35 |
opendevreview | Merged openstack/swift-bench master: Fix SyntaxWarning https://review.opendev.org/c/openstack/swift-bench/+/888069 | 22:22 |
opendevreview | Tim Burke proposed openstack/swift-bench master: refactor bin/bench into swiftbench/cli for testing https://review.opendev.org/c/openstack/swift-bench/+/866826 | 22:42 |
opendevreview | Tim Burke proposed openstack/swift-bench master: Switch from optparse to argparse https://review.opendev.org/c/openstack/swift-bench/+/874341 | 22:42 |
opendevreview | Tim Burke proposed openstack/swift-bench master: support container_name from cli https://review.opendev.org/c/openstack/swift-bench/+/865369 | 22:42 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 23:15 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: bad request syntax response missing txn-id https://review.opendev.org/c/openstack/swift/+/887904 | 23:32 |
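
The slow reweighting timburke describes (a 1-3% weight change, rebalance, let things settle, re-evaluate) can be sketched as plain arithmetic. This is only an illustration: the device names `d0` to `d4` and the helper `step_weight` are hypothetical, the starting weights are the ones paladox mentions, and nothing here calls Swift or `swift-ring-builder`.

```python
def step_weight(current, percent, direction):
    """Return the weight after one small step.

    percent is limited to the 1-3% range suggested in the log;
    direction is +1 to raise the weight, -1 to lower it.
    """
    if not 1 <= percent <= 3:
        raise ValueError("keep each step within 1-3%")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return round(current * (1 + direction * percent / 100.0), 2)


# Hypothetical device names; weights taken from the conversation.
weights = {"d0": 525.0, "d1": 525.0, "d2": 525.0, "d3": 490.0, "d4": 916.0}

# One round: nudge the three fuller disks down 2% and the large disk up 2%.
proposed = dict(weights)
for dev in ("d0", "d1", "d2"):
    proposed[dev] = step_weight(weights[dev], 2, -1)
proposed["d4"] = step_weight(weights["d4"], 2, +1)

for dev in sorted(proposed):
    print(dev, weights[dev], "->", proposed[dev])
```

After applying a round like this to the ring and rebalancing, the idea in the log is to wait, look at how fullness has changed across the cluster, and only then decide whether another step is needed.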
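
DHE's point about starting a new service alongside the old one can be tried out with the standard `socket` module. This is a minimal sketch and not Swift or eventlet code: `make_listener` and `overlap_listeners` are hypothetical helpers, and it assumes a platform such as Linux where `SO_REUSEPORT` lets two listening sockets share a port. Where the option is missing or the second bind is refused, the sketch reports that instead of failing.

```python
import socket


def make_listener(port=0, host="127.0.0.1"):
    """Open a listening TCP socket with address/port reuse enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT is not defined on every platform, so guard it.
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def overlap_listeners():
    """Bind an 'old' listener, then a 'new' one on the same port.

    Returns (old_port, new_port); new_port is None if the platform
    refused the second bind.
    """
    old = make_listener()              # port 0: let the kernel pick one
    old_port = old.getsockname()[1]
    try:
        new = make_listener(old_port)  # second listener, same port
    except OSError:
        old.close()
        return old_port, None
    new_port = new.getsockname()[1]
    old.close()                        # the "old service" goes away
    new.close()
    return old_port, new_port


old_port, new_port = overlap_listeners()
print("old listener port:", old_port)
print("new listener port:", new_port)
```

As DHE notes in the log, running two copies side by side like this may not sit well with a service manager, which is part of why the `USR1` and `HUP` signals come up in the conversation.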