Friday, 2025-08-08

*** mtreinish_ is now known as mtreinish11:00
*** mtreinish_ is now known as mtreinish12:02
fungiare we even still deploying elastic-recheck? that should probably have been retired years ago when we took down the elasticsearch cluster12:28
clarkbfungi: I'm not sure if the opensearch setup is still running elastic-recheck or not14:51
clarkbI think we can retire the project if not14:51
fungii thought it was all rewritten from scratch14:52
fungii haven't seen dpawlik around for a while to ask though14:52
*** darmach43 is now known as darmach415:39
clarkbinfra-root any objections to me proceeding with https://review.opendev.org/c/opendev/system-config/+/956828 then https://review.opendev.org/c/opendev/system-config/+/956829 ?15:58
funginone on my part, i just didn't want to approve them until you were settled in for the morning16:18
clarkbcool I've eaten breakfast and taken acre of paperwork I needed to get done. I'll approve the first one now16:19
fungii feel like i have an acre of paperwork every week16:20
clarkbI've been reading up on mjolnir and haven't found any indication from EMS docs that they can run one for you (though I haven't logged into the control dashboard to confirm that this is the case). But they publish a container image that you supply a config to with account credentials for your homeserver and that account gets configured as administrator on your channels. Then you add16:47
clarkbthe bot to the channels as well as a private moderation channel which is where the humans can receive reports. I think they can also send commands to the bot there or dm the bot16:47
clarkbin the past you also needed to use a proxy to join encrypted channels but now mjolnir supports that natively.16:48
fungiso in theory we could add it to the fleet of bot containers on eavesdrop16:51
clarkbya that is what I'm thinking. I think it does store some data (ban lists for example) but that should be minimal. Doesn't seem to use a database either16:51
clarkbor if it does it's something like a sqlite file directly on disk, not a separate service16:51
fungiwell, if it stores something then it's using some kind of database, yeah16:52
fungiit might just be a flat text file, but that's still a sort of database16:52
clarkbhttps://github.com/matrix-org/mjolnir/blob/main/docs/setup_docker.md ya I meant more that we don't need additional services based on this documentation16:52
clarkbjust a data volume16:52
fungiperfect16:52
fungiwe have those16:53
clarkblooks like the entire cinder volume on eavesdrop is bind mounted for limnoria now. So we'd need to either expand that fs and split the fs tree for multiple mount points or use a second volume16:54
clarkbin any case I think this is doable16:54
fungii can imagine a quick container restart where we put eavesdrop02 into emergency disable, mkdir /var/lib/bots and /var/lib/limnoria/limnoria, stop the bot, mv all the other files in /var/lib/limnoria into /var/lib/limnoria/limnoria, umount /var/lib/limnoria, mount it at /var/lib/bots, adjust the volume source path in the compose file, start the bot, merge a change to make the config17:05
fungipermanent17:05
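
The remount sequence fungi outlines above can be sketched as a dry-run plan. This is only an illustration of the ordering, not a tested procedure: the function name `plan_remount` and the device path passed to it are hypothetical, the emergency-disable step and the compose file edit happen out of band, and the `find ... -exec mv -t` form assumes GNU coreutils.

```python
from typing import List


def plan_remount(device: str,
                 old_mount: str = "/var/lib/limnoria",
                 new_mount: str = "/var/lib/bots",
                 subdir: str = "limnoria") -> List[str]:
    """Return, in order, the shell steps for the move (nothing is executed).

    `device` is a placeholder for the cinder volume's block device; the
    real device name is not given in the discussion.
    """
    nested = f"{old_mount}/{subdir}"
    return [
        # Create the future mount point and the subdirectory that will
        # hold the existing limnoria data.
        f"mkdir -p {new_mount} {nested}",
        "# stop the bot container here (command depends on the compose setup)",
        # Move every top-level entry except the new subdirectory into it.
        f"find {old_mount} -mindepth 1 -maxdepth 1 ! -name {subdir} "
        f"-exec mv -t {nested} {{}} +",
        f"umount {old_mount}",
        f"mount {device} {new_mount}",
        f"# point the compose volume source at {new_mount}/{subdir}, "
        "then start the bot",
    ]


if __name__ == "__main__":
    for step in plan_remount("/dev/example"):
        print(step)
```

After the remount the old data lives at /var/lib/bots/limnoria, so other bots can get sibling directories under /var/lib/bots on the same volume.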
fungithen we can put other mappings in /var/lib/bots and they'll be on the cinder volume17:06
fungi*or* we could just not care because the mjolnir state data is likely tiny and the only reason limnoria has its own cinder volume is lack of sufficient space on the rootfs17:07
fungialso losing state for mjolnir probably isn't catastrophic since spammers usually burn their accounts within minutes of using them and getting banned everywhere, but we can lean on backups for emergencies too17:09
clarkbthat's a good point17:10
clarkbcan probably just bind mount off the rootfs17:10
fungii wouldn't worry about moving it to the cinder volume unless it's a lot of data, which i have a hard time imagining it would be17:16
corvusit looks like there were 3 post-failures for the image upload role switch change; the rest succeeded17:16
corvuss/3/4/ math is hard17:16
corvus "cloud": "defaults"17:17
corvusthat doesn't look right17:17
corvushttps://zuul.opendev.org/t/opendev/build/e9084d621659404ab67a9428efd418ff/log/job-output.txt#1046217:17
fungiin actuality we have enough room on the eavesdrop02 rootfs for limnoria's data too, but it's enough data (20gb in cinder for now while there's only 33gb free on the rootfs) that this is reasonable future-proofing17:18
corvushttps://zuul.opendev.org/t/opendev/build/e9084d621659404ab67a9428efd418ff/console oh that's a much better error actually17:19
corvusopenstack.exceptions.HttpException: HttpException: 499: Client Error for url: https://swift.api.sjc3.rackspacecloud.com/v1/AUTH_ac0fed44dbe4539d83485bcefc4e2d4b/images-7b7d44d25aa9/e9084d621659404ab67a9428efd418ff-centos-9-stream-arm64.raw.zst/000001, Client Disconnect The client was disconnected during request.17:19
clarkbwas it disconnected because the cloud details were wrong so auth failed?17:20
clarkbI would've expected a different http code in that case but maybe that explains it?17:20
fungimisbehaving middlebox, poorly-configured idle state timeout, or just a random network glitch17:20
corvusi think the "cloud" error is a red herring and my fault; i think the real error is the Disconnect, and that's just a normal glitch.  i think clarkb mentioned that our change to "retry" didn't cover all the cases, and i think maybe this is evidence of that, and it's still a problem17:23
fungithe common interpretation for a 499 is that a client closed the connection before the server responded. this can be a webserver reporting a timeout proxying/calling to a wsgi process17:23
corvusso, in short, i think this is not evidence that there is something wrong with the new copy of the role; i think we're just seeing sporadic errors we've previously seen17:23
fungimaybe we're finding the less common cases now that they're not drowned out in other noise17:24
corvusthe "cloud" thing is because i tried to helpfully put in some debug data in error responses, probably based on code copied from the logs roles, and that code doesn't work because it's dereferencing a variable that doesn't exist.  we're lucky it did that instead of just crashing, actually.17:24
corvusfungi: yeah, i think this may be the only one we've seen that we don't have a solution for17:25
corvusi think we speculated that to actually have requests/urllib retry due to this error code, we would need to do some intrusive work... it's not supported in the api.17:25
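
One way to picture a retry for this case is a wrapper around the whole upload call, rather than the retry inside requests/urllib that corvus describes as unsupported. The following is a minimal sketch under stated assumptions, not the upload role's actual code: `HttpError` is a hypothetical stand-in for an SDK exception that carries a status code, and `call_with_retry` and its parameters are invented for illustration.

```python
import time
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


class HttpError(Exception):
    """Hypothetical stand-in for an SDK exception carrying an HTTP status."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message)
        self.status_code = status_code


def call_with_retry(func: Callable[[], T],
                    retry_statuses: FrozenSet[int] = frozenset({499}),
                    attempts: int = 3,
                    delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), retrying only when it raises a retryable status."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except HttpError as exc:
            # Re-raise anything non-retryable, and the final failure.
            if exc.status_code not in retry_statuses or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * attempt)
    raise RuntimeError("unreachable")
```

Retrying at this level repeats the entire operation, so it only suits calls that are safe to repeat, such as re-uploading a single segment.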
corvusoh i think i get it.  since we're not using a clouds.yaml file, our cloud doesn't have a name.  so i guess cloud.name on the cloud object we get back from openstacksdk is just defaults, meaning "whatever you gave me as parameters"17:27
clarkbaha17:28
corvusi think i'm convinced that a "recheck" is okay because everything is status quo17:28
clarkbsounds good17:28
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Stop syncing run_tests/Vagrantfiles for OSA  https://review.opendev.org/c/openstack/project-config/+/95694417:28
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Use label-defaults  https://review.opendev.org/c/opendev/zuul-providers/+/95694617:35
corvusno rush, but ^ is in response to a proposed niz syntax change... i wanted to go ahead and write the change for opendev so we can take a look at the result and see if it looks sane (to help evaluate the upstream zuul change).17:36
clarkbI guess the idea is to reorganize to better reflect those values apply to label boots and not the cloud itself?17:37
clarkbsimilarly with the image defaults being image specific17:38
clarkbsyntax wise that seems fine17:38
corvusyeah, and especially that they apply to labels and there might be different defaults for the same attribute that would apply to images.  the depends-on commit message goes into it a bit.17:44
opendevreviewMerged opendev/system-config master: Reapply "Migrate statsd sidecar container images to quay.io"  https://review.opendev.org/c/opendev/system-config/+/95682818:09
fungii guess hourlies got in first18:10
clarkbthe promote jobs don't wait18:11
clarkbonce they are done and we confirm the new content on quay I'll approve the followup to pull the image from there18:11
clarkblooks like they both updated18:12
clarkbI've approved the other change (956829)18:12
fungiah, right, it's the second change that deploys anyway18:25
clarkbOnce that change merges and deploys I'm going to pop out for lunch19:08
fungiyeah, i need to run some errands once it's in19:11
opendevreviewMerged opendev/system-config master: Pull the haproxy and zookeeper statsd sidecars from quay  https://review.opendev.org/c/opendev/system-config/+/95682919:34
fungideploy jobs are already starting19:35
clarkbboth haproxy-statsd containers have restarted on the new containers19:37
clarkbhttps://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc shows a little gap then resumed data19:38
clarkbthe zookeeper statsd containers also restarted19:38
clarkbhttps://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-5m&to=now&timezone=utc the bottom of this dashboard has the zk metrics19:39
clarkbI think the 19:39:10 set are post restart so this also looks good19:39
fungiyeah, looks right so far19:40
clarkbthe deploy buildset reported success and I'm happy with the statsd grafana results19:41
clarkbI'm going to grab lunch now19:41
fungiyep, i think we're good19:42
fungii'm going to pop out to run some errands while the tourists are hopefully all out on the water19:42
fungilooks like everything's still working21:59
