Monday, 2025-09-29

[10:11] <opendevreview> Elod Illes proposed openstack/project-config master: Temporarily remove release docs semaphores  https://review.opendev.org/c/openstack/project-config/+/962476
[10:11] <opendevreview> Elod Illes proposed openstack/project-config master: Revert "Temporarily remove release docs semaphores"  https://review.opendev.org/c/openstack/project-config/+/962477
[10:59] *** dmellado3 is now known as dmellado
[11:53] <ykarel> clarkb, seen that multiple interface issue again once on 27th on rax-iad-main https://e04cd551585f8367e58f-97f1bb270b356fde15ebdabc0499edaa.ssl.cf2.rackcdn.com/openstack/f8cff2e0d42847e1b689a29f88b6200d/job-output.txt
[11:55] <ykarel> hmm but i see a day before there were some undo/reapply https://review.opendev.org/c/opendev/zuul-providers/+/962379 so likely related to that?
[12:17] *** dhill is now known as Guest27912
[12:59] <fungi> ykarel: yes, we were still trying to work out the cause, and finally found it
[13:05] <ykarel> fungi, ack thx
[14:07] <Clark[m]> corvus: not sure if you saw but we tried adding the network config back to rax flex and got duplicate IPs again on Friday
[14:07] <Clark[m]> corvus: it's possible that I didn't restart the launchers on a fixed version, but I wondered if you could check via the repl against the test provider again to see if it looks better?
[14:08] <Clark[m]> But also we may hold off on trying again until after Wednesday's Openstack release so maybe this is not urgent
[14:15] <corvus> yep i'll check
[14:18] <corvus> Clark: ['opendevzuul-network1']
[14:18] <corvus> Clark: that's for 'ubuntu-noble-8GB' in 'raxflex-test-sjc3-main'
[14:19] <Clark[m]> Huh so now I wonder if maybe weekend restarts updated to an actually fixed version and what I restarted into Wednesday was not fixed
[14:19] <Clark[m]> Or maybe there is a second issue we haven't tracked down yet
[14:21] <corvus> Clark: how about we make a new label, stick it on the test provider, then run a test job that selects that label and emits network config?
[14:22] <Clark[m]> I like that idea
[14:23] <Clark[m]> I can work on getting a change up for that in a bit
[14:23] <corvus> ++
[14:28] *** sean-k-mooney-pto is now known as sean-k-mooney
[14:37] <opendevreview> Clark Boylan proposed opendev/zuul-providers master: Add a test label to raxflex-test  https://review.opendev.org/c/opendev/zuul-providers/+/962499
[14:37] <clarkb> corvus: ^ something like that?
[14:38] <clarkb> corvus: I think any job can be run on that label as the zuul info gathering in the base playbooks should fetch sufficient info via ansible facts to know if we have multiple ip addrs/nics
[14:39] <opendevreview> Merged opendev/zuul-providers master: Add a test label to raxflex-test  https://review.opendev.org/c/opendev/zuul-providers/+/962499
[14:39] <corvus> cool, a sandbox change to just emit a debug should do it then :)
[14:44] <opendevreview> Clark Boylan proposed opendev/bindep master: DNM testing raxflex network behaviors in zuul-launcher  https://review.opendev.org/c/opendev/bindep/+/962500
[14:44] <clarkb> corvus: ^ I think that should do it
[14:46] <clarkb> hrm that hit node failure
[14:51] <corvus> i'm going to re-enque it
[14:52] <clarkb> ack. I'm getting beginning of week software updates and reboots out of the way
[14:55] <opendevreview> James E. Blair proposed opendev/zuul-providers master: Update raxflex-test config  https://review.opendev.org/c/opendev/zuul-providers/+/962503
[14:55] <corvus> clarkb: ^
[14:56] <opendevreview> James E. Blair proposed opendev/zuul-providers master: Update raxflex-test config  https://review.opendev.org/c/opendev/zuul-providers/+/962503
[14:59] <opendevreview> Merged opendev/zuul-providers master: Update raxflex-test config  https://review.opendev.org/c/opendev/zuul-providers/+/962503
[15:01] <clarkb> ok back
[15:02] <clarkb> I rechecked 962500
[15:06] <clarkb> corvus: https://zuul.opendev.org/t/opendev/build/bd8a41c5d1754c939f30a11511a63e92/log/zuul-info/host-info.ubuntu-noble.yaml#2-5 this looks like one interface to me. I'm really confused and best I can figure is maybe what I restarted onto wasn't as fixed as I thought and since then restarts over the weekend have caught things up
[15:07] *** clif1 is now known as clif
[15:08] <clarkb> corvus: is it possible that we needed to restart other components to have a consistent view of the configuration?
[15:08] <clarkb> corvus: for example is it executors that provide the label info to the launchers rather than the launchers reading it directly? that might also explain it
[15:08] <clarkb> (I don't think this is the case as we were able to fix things by dropping the network config and that took effect immediately)
[15:16] <corvus> we may have needed to update the schedulers to get the fix.
[15:17] <corvus> that being the case, it should be fine to unrevert.  i'm happy to double check the repl too, just to be safe.
[15:21] <clarkb> ok I'll get another change up
[15:22] <opendevreview> Clark Boylan proposed opendev/zuul-providers master: Revert "Reapply "Remove raxflex networks config""  https://review.opendev.org/c/opendev/zuul-providers/+/962507
[15:24] <opendevreview> Merged opendev/zuul-providers master: Revert "Reapply "Remove raxflex networks config""  https://review.opendev.org/c/opendev/zuul-providers/+/962507
[15:27] <corvus> ['opendevzuul-network1']
[15:27] <corvus> for 'ubuntu-noble-8GB' in 'raxflex-sjc3-main' on both servers
[15:28] <corvus> and that's after waiting for the logs to indicate the config update is done
[15:28] <corvus> i think we're all set, and can probably remove that test provider now too
[15:32] <clarkb> let me double check the booted nodes
[15:32] <clarkb> on the cloud side I mean
[15:33] <clarkb> corvus: all nodes in sjc3 currently have one interface. You don't happen to have a node name do you so that I can check a specific node?
[15:34] <clarkb> but I agree this is looking good. We should in theory be able to update the clouds.yaml to clean up the unnecessary config and also remove the test provider and label
[15:34] <clarkb> corvus: separately do you know why every node name ends in 4? is that a characteristic of whatever uuid system we're using?
[15:34] <corvus> you want me to confirm the name of a node that was launched after the config merged?
[15:35] <corvus> s/merged/went into effect/
[15:35] <clarkb> corvus: yes
[15:35] <clarkb> if you have it
[15:35] <corvus> and yes :)
[15:35] <clarkb> np25da30f833794 was booted at 2025-09-29T15:32:50.000000
[15:35] <clarkb> so I think this is one that is after it went into effect
[15:35] <corvus> probably the 4 is part of the uuid that comes from the host or nic or whatever
[15:35] <clarkb> ah
[15:36] <corvus> (and is apparently something shared between the 2 launchers?)
[15:36] <clarkb> if np25da30f833794 is one that booted after the update then I think we're good
[15:36] <clarkb> but I'll check dfw3 and iad3 really quickly too
[15:36] <corvus> 15:26:04,704 is the latest reconfig time
[15:37] <corvus> so anything after that should be using the new config
[15:37] <clarkb> np0bda80f9f2674 launched at 2025-09-29T15:36:03.000000 in dfw3 and has one interface
[15:37] <clarkb> np95718a142dfc4 launched at 2025-09-29T15:36:48.000000 in IAD3 and has one interface
[15:38] <clarkb> I note that all three example nodes have slightly earlier created at timestamps but they are all well after 15:26:04,704 so I think we are good
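The check described above (every example node launched after the latest reconfig time) can be sketched in a few lines of Python. This is only an illustration using the values quoted in the log; it assumes the reconfig time is on the same date and in the same timezone as the launch timestamps, and the function name `launched_after_reconfig` is hypothetical, not part of Zuul or the OpenStack tooling.

```python
from datetime import datetime

# Reconfig time quoted in the log. The date is an assumption: the log only
# gives the time of day, so the same day as the launches is assumed.
RECONFIG = datetime.strptime("2025-09-29 15:26:04,704", "%Y-%m-%d %H:%M:%S,%f")

# Launch timestamps quoted in the log, keyed by node name.
LAUNCHED = {
    "np25da30f833794": "2025-09-29T15:32:50.000000",  # sjc3
    "np0bda80f9f2674": "2025-09-29T15:36:03.000000",  # dfw3
    "np95718a142dfc4": "2025-09-29T15:36:48.000000",  # iad3
}


def launched_after_reconfig(launched, reconfig=RECONFIG):
    """Map each node name to True if it launched after the reconfig time."""
    return {
        name: datetime.fromisoformat(stamp) > reconfig
        for name, stamp in launched.items()
    }


results = launched_after_reconfig(LAUNCHED)
for name, ok in results.items():
    print(name, "uses new config" if ok else "predates reconfig")
```

A node whose timestamp is later than the reconfig time should have been built with the new provider configuration, which is the reasoning used in the conversation.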
[15:39] <clarkb> infr-root I removed my workflow -1 from https://review.opendev.org/c/opendev/system-config/+/962237 as it should be safe to clean that up per ^
[15:39] <clarkb> I'm going to clean up the test provider next and then abandon the bindep test change
[15:40] <corvus> clarkb: oh, the '4' is apparently the field in the uuid that literally means it's a version 4 uuid
[15:41] <opendevreview> Clark Boylan proposed opendev/zuul-providers master: Cleanup the raxflex test provider and label  https://review.opendev.org/c/opendev/zuul-providers/+/962516
[15:41] <clarkb> corvus: aha
[15:41] <clarkb> it just happens to also be where the prefix cutoff is
[15:42] <corvus> yep, and that is the maxlen of a windows hostname
[15:42] <corvus> (including the "np")
[15:42] <corvus> we could change that to "z" and get an extra character
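The reason every node name ends in 4 can be reproduced with a short Python sketch. It assumes, based only on the names seen in this log, that a node name is the "np" prefix followed by the hex form of a version 4 UUID, truncated to 15 characters; `node_name` and `MAX_HOSTNAME_LEN` are hypothetical names for illustration, not Zuul's actual code.

```python
import uuid

PREFIX = "np"
# Assumed limit: 15 characters, the hostname maxlen mentioned in the chat.
MAX_HOSTNAME_LEN = 15


def node_name(u: uuid.UUID, prefix: str = PREFIX) -> str:
    """Build a name from a prefix plus UUID hex, cut to the length limit."""
    return (prefix + u.hex)[:MAX_HOSTNAME_LEN]


# With a 2-character prefix, 13 hex digits remain. The 13th hex digit of a
# UUID is its version field, so a version 4 UUID always puts "4" there.
names = [node_name(uuid.uuid4()) for _ in range(5)]
for name in names:
    print(name)

# A 1-character prefix such as "z" leaves room for 14 hex digits, so the
# name would end on a random digit instead of the version digit.
z_name = node_name(uuid.uuid4(), prefix="z")
print(z_name)
```

With the two-character prefix, the last character carries no entropy because it is always the version digit; a shorter prefix would recover one more random character.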
[15:43] <clarkb> ykarel__: ^ fyi since you've been following this. We think we're finally in the state we want to be long term and networks seem to work the way we want them to
[15:44] <opendevreview> Merged opendev/zuul-providers master: Cleanup the raxflex test provider and label  https://review.opendev.org/c/opendev/zuul-providers/+/962516
[15:44] <clarkb> heh I realize my highlight of https://review.opendev.org/c/opendev/system-config/+/962237 won't have worked because I typoed infra-root
[15:44] <clarkb> that is the change to clear the network config out of clouds.yaml for raxflex and rely only on the zuul-provider config
[15:46] <corvus> +3
[15:54] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[16:00] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[16:00] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[16:01] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[16:06] <opendevreview> Merged opendev/system-config master: Revert "Reapply "Select the network to use in raxflex""  https://review.opendev.org/c/opendev/system-config/+/962237
[16:24] <corvus> clarkb: ^ if we want to be extra careful, then we'll want to restart the launchers to get that and then double check everything, right?
[16:24] <corvus> (but only after that deploys)
[16:25] <clarkb> corvus: yes that is correct
[16:25] <corvus> that should happen in the jobs that launch in 35m, iirc.
[16:26] <corvus> if we're lucky, we're going to merge an unrelated zuul change in 11m, and we're going to start merging launcher zuul changes in 41m.  that may introduce an unwanted variable into this.
[16:27] <corvus> to that end, i have paused the queues in the zuul tenant
[16:27] <corvus> because that's a thing we can do now
[16:27] <corvus> https://zuul.opendev.org/t/zuul/status
[16:28] <clarkb> neat
[16:28] <corvus> (i don't think this should normally be the way we handle this, but in this case, i think it's fine, plus good to exercise the new feature in a low-stakes situation :)
[16:29] <clarkb> I need to pop out for a bit but should be able to help with restarts once deployed later if that helps
[16:29] <corvus> ++
[16:29] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[16:51] <clarkb> ok I'm back around. In about 15-20 minutes the clouds.yaml on zuul launchers should update, then we can stop/start (down/up -d) the containers
[17:02] <opendevreview> Merged openstack/diskimage-builder master: Remove nodepool based testing  https://review.opendev.org/c/openstack/diskimage-builder/+/952953
[17:04] <clarkb> the zuul hourly deployment is running now
[17:08] <clarkb> it's done. I'll check the clouds.yaml files now
[17:09] <clarkb> corvus: both of the clouds.yaml files look good to me. Do you want to restart launchers or should I? Any particular process you want to employ? I'm guessing a stop/start is safer to keep versions in sync with what we're currently running
[17:10] <clarkb> that's interesting, docker's documentation is giving me access denied errors
[17:12] <clarkb> I think we have ~15 minutes before the paused pipelines proceed
[17:13] <clarkb> https://review.opendev.org/c/zuul/zuul/+/962177 is the one change that has merged since the weekend restart and that is web only so should be fine
[17:13] <opendevreview> Merged openstack/project-config master: Retire shade  https://review.opendev.org/c/openstack/project-config/+/961524
[17:20] <corvus> clarkb: what do you mean 15 minutes?
[17:20] <corvus> i'll go ahead and restart launchers
[17:20] <clarkb> corvus: the zuul pause says it is paused for one hour which I think is up in 7 minutes based on irc timestamps
[17:20] <clarkb> but maybe I'm mistaken about how that works
[17:21] <corvus> oh sorry that's just informative text i wrote in the comment
[17:21] <corvus> i was trying to communicate to anyone else who read it what our expectations were for resuming it
[17:21] <clarkb> oh I see, it's not an actual timer, just a note from your side
[17:21] <corvus> yep
[17:21] <corvus> i recognize how unclear that was now.  :)
[17:22] <corvus> both launchers restarted and running now
[17:23] <clarkb> npbdb2e77336124 is building in dfw3
[17:24] <clarkb> it has one interface
[17:24] <clarkb> I think that is the only one that has booted in rax flex since the restart
[17:24] <clarkb> but I'll try to confirm the other two regions are also happy as things get used
[17:25] <corvus> 17:23:11,857 is its start time, so that should be good
[17:25] <corvus> i'll unpause the zuul tenant now
[17:26] <clarkb> npbdefd2f2ded44 in sjc3 lgtm
[17:29] <clarkb> corvus: oh also I deleted ze11 last week but not ze12 (just a heads up)
[17:32] <clarkb> jitsi meet published a release and then very quickly published an update to that release. We don't update until our daily runs so shouldn't be a big deal
[17:33] <fungi> worth keeping in mind as we approach the ptg in about a month
[17:33] <clarkb> yup I think we should do the put jitsi meet servers in emergency thing again
[17:33] <fungi> absolutely
[17:33] <clarkb> fungi: oh also I checked and our periodic daily runs appear to be happy again after fixing borg on kdc03
[17:33] <fungi> yeah, i hadn't seen any more failures
[17:33] <clarkb> fungi: might be a good idea to check the borg logs on kdc03 to ensure it is backing up successfully too (I don't see emails about it complaining)
[17:34] <fungi> right, i was mainly going by the absence of failure e-mails
[17:35] <fungi> terminating with success status, rc 0
[17:35] <fungi> Mon Sep 29 05:05:23 PM UTC 2025 Backup finished with warnings.
[17:36] <clarkb> excellent
[17:36] <fungi> it's not clear to me what the warnings are from
[17:37] <fungi> ah, "file changed while we backed it up"
[17:37] <fungi> okay, so looks good
[17:38] <clarkb> ya if you scroll through the log it should record all the things whether they were successful, warning, or error
[17:38] <fungi> right, and in this case we seem to have configured borg to back up its own log
[17:39] <clarkb> oh fun but also probably not a bad idea
[17:39] <fungi> so it, understandably, changes as it logs what it's doing
[17:39] <fungi> /var/log/borg-backup-backup01.ord.rax.opendev.org.log: file changed while we backed it up
[17:39] <fungi> that's the only warning i saw in the log, fwiw
[17:41] <clarkb> iad3 still hasn't booted any new nodes but considering that dfw3 and sjc3 are looking good I'm not worried
[17:42] <clarkb> I'm going to consider this done when I edit the meeting agenda later today. I'll also drop the ze11 topic and the lists.o.o performance topic. Let me know if there are other edits we should add/remove/edit
[18:16] <corvus> clarkb: i agree, looks like we can completely close out the network topic
[18:17] <corvus> re ze11, the graphs look good after that change (ze11) is nulled out
[18:50] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[18:51] <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10  https://review.opendev.org/c/openstack/diskimage-builder/+/960336
[19:54] <clarkb> first pass on meeting agenda edits is in
[19:55] <clarkb> I have a meeting at 2300 UTC so I'll aim to send the agenda before that time
