Thursday, 2026-01-15

*** jroll07 is now known as jroll007:16
opendevreviewMerged openstack/project-config master: Migrate base propose-updates job to Jammy  https://review.opendev.org/c/openstack/project-config/+/96147509:59
fricklermordred: the secret for OSC you created in https://review.opendev.org/714502 needs rotation, seems you might be the only person having access? we could also drop the osc-image uploads instead if nobody is using them?10:08
frickleras a side note it seems weird to me that the job is running in gate, blocking merges if it fails. wouldn't it be better to do it in the post pipeline? e.g. https://review.opendev.org/c/openstack/python-openstackclient/+/96277810:09
fungistill no newer hits to update_bug.py in the gerrit error_log since we rotated its application credential13:59
fungihttps://launchpad.net/~hudson-openstack/+karma shows a bunch of bug comments though only some of them line up with the approximate time of release changes merging, suggesting that e.g. the one "4 hours ago" was from gerrit at least14:27
fungi(no openstack release requests merged between when we made the credential change and 4 hours ago)14:27
frickleryes, I don't think we have any releases pending that would trigger LP updates currently14:55
fungibug comments are working, https://bugs.launchpad.net/keystonemiddleware/+bug/2129018 got them in the past few minutes15:36
fungii'll go ahead and revoke authorization for the old credential now15:36
clarkbawesome thank you for handling that15:45
fricklerah, let's hope we get a couple of fresh releases from that soonish, too16:00
fungiright, i'll be monitoring that bug and we should see the zuul secret exercised that way16:02
fungiso far we only know that the token for gerrit updates is working, and that's configured via private hostvars rather than zuul secret (which now has a dedicated token to itself, for future flexibility and improved separation of concerns)16:03
fungiand i reduced the scope of privileges for these new tokens to just what i think they need, which makes all of this a bit more secure than what we had before too16:04
*** gmaan is now known as gmaan_afk16:33
opendevreviewMohammad Issa proposed openstack/project-config master: Add repo app-openvswitch for starlingx  https://review.opendev.org/c/openstack/project-config/+/97352117:31
clarkbfungi: I'm looking at the failed borg backup for lists and it looks like we've had a backup running to the ord server since the 12th which is why subsequent backups have failed (they cannot get the lock to backup to that server)17:33
fungioh, i wonder what got it stuck17:33
clarkbfungi: strace on the python process has it stuck on a read17:34
clarkba quick google search indicates that borg should handle interrupted backups safely17:34
fungii don't see anything remotely related to the lists server nor backups happening on monday, so it was probably a fluke17:35
clarkbI suspect that we can probably kill the process and then monitor tomorrow's backup pass to ensure it is happier?17:35
clarkbya maybe something in the networking between them?17:35
fungiyeah, sgtm17:35
clarkbfungi: 2769786 is the python process and it has a child ssh (the transport mechanism) with pid 276978917:36
clarkbfungi: I'm thinking I'll just do a sudo kill 2769789 and see what happens?17:36
fungiright, interrupt the child and let the parent clean up17:36
clarkbok I'll do that now17:37
fungithanks!17:37
clarkbhttps://borgbackup.readthedocs.io/en/stable/faq.html#if-a-backup-stops-mid-way-does-the-already-backed-up-data-stay-there here are the upstream docs. Killing ssh doesn't seem to have killed the python process17:39
clarkbso I think I'll kill 2769786 next17:39
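[editor's note: the kill sequence described above — interrupt the child transport, then the parent if it hangs on — can be sketched as below. A long-running `sleep` stands in for the stuck borg python process, since the real PIDs (2769786/2769789) only existed on the lists server; on a real host you would locate them with `pstree -p` or `ps --ppid <parent>` and strace them first to confirm they are wedged.]

```shell
# Stand-in sketch of the cleanup above, assuming POSIX kill/wait semantics.
sleep 300 &                      # plays the role of the stuck backup process
STUCK=$!
kill "$STUCK"                    # SIGTERM first; borg is designed to survive interrupted backups
wait "$STUCK" 2>/dev/null        # reap the terminated process
if kill -0 "$STUCK" 2>/dev/null; then
    echo "still running"         # would escalate to SIGKILL here
else
    echo "gone"
fi
```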
clarkbfungi: looking at docker ps -a I think that docker restarted the mariadb server which interrupted our mysqldump17:40
clarkband that must be what we're stuck on a read for17:40
clarkb2769798 is the mysql dump so I'm going to try and kill it first17:41
clarkbthe trace of the borg process is proceeding now with munmaps17:43
clarkbI suspect that if we wait a bit now it will figure out that things went side ways17:44
clarkbas for preventing this in the future I wonder if we should run borg under timeout? Say 48 hours?17:44
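[editor's note: the `timeout` idea suggested above can be sketched as follows, assuming GNU coreutils `timeout` is available in the backup cron environment. The borg invocation in the trailing comment is hypothetical, not the real wrapper path.]

```shell
# Demonstrate the protection with a short stand-in command: timeout kills
# the child once the limit passes and itself exits 124, so a wedged backup
# cannot hold the borg repository lock for days.
timeout 1 sleep 10
echo "exit status: $?"           # 124 indicates the command was timed out
# The real wrapper might look roughly like this (paths/flags are assumptions):
#   timeout --kill-after=10m 48h /usr/local/bin/borg-backup
```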
clarkband it emailed us when the old running process exited with failure.17:45
clarkbtomorrow we should check that it succeeds and in the meantime think about how we might avoid this problem in the future. I'm surprised that mysqldump doesn't exit with an error if the mysql server goes away mid stream17:46
*** gmaan_afk is now known as gmaan17:46
fungiinteresting, yeah i guess we pull from an outside mariadb container source and it updated?17:49
fungitimeouts make sense in that situation, i suppose any of our many database backup streams could get hit by the same17:49
clarkbyup I think we must've merged something that caused ansible to run on lists and it saw a mariadb image update and updated it17:53
fungiwell, we were updating a lot of things earlier in the week, obviously17:54
clarkbexactly. I'm not surprised that mariadb updated. More surprised that mysqldump can't handle that18:04
fungior the kernel... maybe the client side of the local network socket just hung when the server stopped listening?18:06
clarkbperhaps. the python side was reading fd 0 which I think is due to it being in a pipe from mysqldump? Maybe stringing the processes together like that is part of the problem18:07
clarkbwe have decent alerting from when this goes wrong so I'm not too worried about it. We can probably just observe and see if any action needs to be taken18:07
fungiwell, by "client" i meant mysqldump itself. i assume it's connecting over loopback but maybe it's a unix socket/fifo instead?18:09
fungilooks like we did get a backup failure e-mail about the one you terminated too18:13
clarkbfungi: yup at ~17:44 UTC18:14
clarkbfungi: we don't specify a host value to mysql dump which causes the host to be localhost. And if you use localhost it will use the domain socket18:15
clarkbso that may be why we don't notice that things go aware18:15
clarkbs/aware/away/18:15
fungiyeah, maybe that doesn't force close the fd18:15
clarkbsince having the socket open will keep it around, similar to how an open file persists even if you recursively delete the directory it is in?18:15
fungiwhereas network to loopback would probably have done tcp/rst once the service was down, if not a proper tcp/fin18:16
fungiduring shutdown18:16
clarkbyup18:16
clarkbI wonder if we can update that mysqldump command to use -h 127.0.0.1 and have it just work18:17
clarkbmysql/mariadb differentiate auth based on auth method so that may not work as expected. But it is something we could test via a held node in zuul or just manually18:17
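[editor's note: a minimal sketch of the change being discussed, with placeholder database name, options, and output path (the real scripts live in system-config). Naming 127.0.0.1 explicitly forces the MySQL/MariaDB client over TCP, whereas the default host "localhost" silently selects the unix domain socket.]

```shell
# Hypothetical backup fragment; DB name and paths are placeholders.
# Over TCP the client sees a RST/FIN when the server container restarts,
# so the dump fails fast instead of hanging on a dead socket read.
DB=lodgeit
BACKUP_CMD="mysqldump --host=127.0.0.1 --port=3306 --single-transaction ${DB}"
# In the real cron entry this is piped through compression, roughly:
#   ${BACKUP_CMD} | gzip > /var/backups/mysql/${DB}.sql.gz
echo "${BACKUP_CMD}"
```

[as noted in the log, MariaDB can apply a different auth plugin (e.g. unix_socket) to socket logins than to TCP logins, so the credentials may need adjusting when switching hosts.]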
fungiit might be marginally less efficient, but probably not in any way that matters for our case18:17
clarkblooking in system-config none of our current db backup scripts use -h at all so I think they are all using the default (I was hoping for one example to indicate this may just work)18:24
fungiit's super easy to test from the command line though18:24
fungijust direct stdout to /dev/null rather than piping through compression18:25
fungiunless you want to be able to check the contents of the dump i guess18:25
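[editor's note: the quick manual test suggested above can be sketched like this; since mysqldump needs a live server, a placeholder function stands in for it here, and only the exit status is inspected.]

```shell
# Placeholder for: mysqldump -h 127.0.0.1 --all-databases
fake_dump() { printf 'CREATE TABLE example (id INT);\n'; }
# Discard stdout; we only care whether the dump ran to completion.
if fake_dump > /dev/null; then
    echo "dump ok"
fi
```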
fungilaunchpad is inconveniently down at the moment18:26
opendevreviewClark Boylan proposed opendev/system-config master: DO NOT MERGE Test mysql connectivity via tcp instead of local socket  https://review.opendev.org/c/opendev/system-config/+/97353518:28
fungilooks like they moved from irc to matrix semi-recently. i'll have to update my client config18:29
clarkbfungi: oh ^ I did that because it is easy18:29
clarkbif that works I think the real scripts should work too, and we can test mysqldump itself to be sure. If it doesn't work then we're into debugging18:29
fungihttps://status.canonical.com/ says major outage for lp18:44
clarkbI think github had an issue today too. And yesterday verizon was down for your part of the world18:50
clarkbholidays are over and we're getting back to work and breaking things :)18:50
fungibut not moving fast, just breaking things18:51
clarkbfungi: my simple test in 973535 did work. So the next step is probably updating one of the mysqldump scripts to do the same thing?19:02
clarkbeither that or manually testing it first19:02
clarkblet's see if we have any low impact hosts to yolo with19:03
clarkblodgeit? I'll propose a change to use -h 127.0.0.1 there19:03
fungisgtm19:07
opendevreviewClark Boylan proposed opendev/system-config master: Perform lodgeit mysqldump via tcp instead of local unix socket  https://review.opendev.org/c/opendev/system-config/+/97354019:12
opendevreviewClark Boylan proposed opendev/system-config master: Run all mysqldumps over tcp 127.0.0.1 instead of localhost socket  https://review.opendev.org/c/opendev/system-config/+/97354119:12
clarkbthings look quiet I'm going to take advantage of that and the weather and go for a bike ride21:31
clarkbLooking ahead to tomorrow as long as things are still quiet I think I would like to try the gerrit java 21 cut over21:31
clarkbinfra-root ^ just fyi and if you have reason to believe that is a bad idea please let me know21:31
fungisounds good, i plan to be around21:37

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!