| opendevreview | Elod Illes proposed openstack/project-config master: Temporarily remove release docs semaphores https://review.opendev.org/c/openstack/project-config/+/962476 | 10:11 |
|---|---|---|
| opendevreview | Elod Illes proposed openstack/project-config master: Revert "Temporarily remove release docs semaphores" https://review.opendev.org/c/openstack/project-config/+/962477 | 10:11 |
| *** dmellado3 is now known as dmellado | 10:59 | |
| ykarel | clarkb, seen that multiple interface issue again once on 27th on rax-iad-main https://e04cd551585f8367e58f-97f1bb270b356fde15ebdabc0499edaa.ssl.cf2.rackcdn.com/openstack/f8cff2e0d42847e1b689a29f88b6200d/job-output.txt | 11:53 |
| ykarel | hmm but i see a day before there were some undo/reapply https://review.opendev.org/c/opendev/zuul-providers/+/962379 so likely related to that? | 11:55 |
| *** dhill is now known as Guest27912 | 12:17 | |
| fungi | ykarel: yes, we were still trying to work out the cause, and finally found it | 12:59 |
| ykarel | fungi, ack thx | 13:05 |
| Clark[m] | corvus: not sure if you saw but we tried adding the network config back to rax flex and got duplicate IPs again on Friday | 14:07 |
| Clark[m] | corvus: it's possible that I didn't restart the launchers on a fixed version, but I wondered if you could check via the repl against the test provider again to see if it looks better? | 14:07 |
| Clark[m] | But also we may hold off on trying again until after Wednesday's OpenStack release so maybe this is not urgent | 14:08 |
| corvus | yep i'll check | 14:15 |
| corvus | Clark: ['opendevzuul-network1'] | 14:18 |
| corvus | Clark: that's for 'ubuntu-noble-8GB' in 'raxflex-test-sjc3-main' | 14:18 |
| Clark[m] | Huh so now I wonder if maybe weekend restarts updated to an actually fixed version and what I restarted into on Wednesday was not fixed | 14:19 |
| Clark[m] | Or maybe there is a second issue we haven't tracked down yet | 14:19 |
| corvus | Clark: how about we make a new label, stick it on the test provider, then run a test job that selects that label and emits network config? | 14:21 |
| Clark[m] | I like that idea | 14:22 |
| Clark[m] | I can work on getting a change up for that in a bit | 14:23 |
| corvus | ++ | 14:23 |
| *** sean-k-mooney-pto is now known as sean-k-mooney | 14:28 | |
| opendevreview | Clark Boylan proposed opendev/zuul-providers master: Add a test label to raxflex-test https://review.opendev.org/c/opendev/zuul-providers/+/962499 | 14:37 |
| clarkb | corvus: ^ something like that? | 14:37 |
| clarkb | corvus: I think any job can be run on that label as the zuul info gathering in the base playbooks should fetch sufficient info via ansible facts to know if we have multiple ip addrs/nics | 14:38 |
| opendevreview | Merged opendev/zuul-providers master: Add a test label to raxflex-test https://review.opendev.org/c/opendev/zuul-providers/+/962499 | 14:39 |
| corvus | cool, a sandbox change to just emit a debug should do it then :) | 14:39 |
| opendevreview | Clark Boylan proposed opendev/bindep master: DNM testing raxflex network behaviors in zuul-launcher https://review.opendev.org/c/opendev/bindep/+/962500 | 14:44 |
| clarkb | corvus: ^ I think that should do it | 14:44 |
| clarkb | hrm that hit node failure | 14:46 |
| corvus | i'm going to re-enqueue it | 14:51 |
| clarkb | ack. I'm getting beginning of week software updates and reboots out of the way | 14:52 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update raxflex-test config https://review.opendev.org/c/opendev/zuul-providers/+/962503 | 14:55 |
| corvus | clarkb: ^ | 14:55 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update raxflex-test config https://review.opendev.org/c/opendev/zuul-providers/+/962503 | 14:56 |
| opendevreview | Merged opendev/zuul-providers master: Update raxflex-test config https://review.opendev.org/c/opendev/zuul-providers/+/962503 | 14:59 |
| clarkb | ok back | 15:01 |
| clarkb | I rechecked 962500 | 15:02 |
| clarkb | corvus: https://zuul.opendev.org/t/opendev/build/bd8a41c5d1754c939f30a11511a63e92/log/zuul-info/host-info.ubuntu-noble.yaml#2-5 this looks like one interface to me. I'm really confused and best I can figure is maybe what I restarted onto wasn't as fixed as I thought and since then restarts over the weekend have caught things up | 15:06 |
| *** clif1 is now known as clif | 15:07 | |
| clarkb | corvus: is it possible that we needed to restart other components to have a consistent view of the configuration? | 15:08 |
| clarkb | corvus: for example is it executors that provide the label info to the launchers rather than the launchers reading it directly? that might also explain it | 15:08 |
| clarkb | (I don't think this is the case as we were able to fix things by dropping the network config and that took effect immediately) | 15:08 |
| corvus | we may have needed to update the schedulers to get the fix. | 15:16 |
| corvus | that being the case, it should be fine to unrevert. i'm happy to double check the repl too, just to be safe. | 15:17 |
| clarkb | ok I'll get another change up | 15:21 |
| opendevreview | Clark Boylan proposed opendev/zuul-providers master: Revert "Reapply "Remove raxflex networks config"" https://review.opendev.org/c/opendev/zuul-providers/+/962507 | 15:22 |
| opendevreview | Merged opendev/zuul-providers master: Revert "Reapply "Remove raxflex networks config"" https://review.opendev.org/c/opendev/zuul-providers/+/962507 | 15:24 |
| corvus | ['opendevzuul-network1'] | 15:27 |
| corvus | for 'ubuntu-noble-8GB' in 'raxflex-sjc3-main' on both servers | 15:27 |
| corvus | and that's after waiting for the logs to indicate the config update is done | 15:28 |
| corvus | i think we're all set, and can probably remove that test provider now too | 15:28 |
| clarkb | let me double check the booted nodes | 15:32 |
| clarkb | on the cloud side I mean | 15:32 |
| clarkb | corvus: all nodes in sjc3 currently have one interface. You don't happen to have a node name do you so that I can check a specific node? | 15:33 |
| clarkb | but I agree this is looking good. We should in theory be able to update the clouds.yaml to clean up the unnecessary config and also remove the test provider and label | 15:34 |
| clarkb | corvus: separately do you know why every node name ends in 4? is that a characteristic of whatever uuid system we're using? | 15:34 |
| corvus | you want me to confirm the name of a node that was launched after the config merged? | 15:34 |
| corvus | s/merged/went into effect/ | 15:35 |
| clarkb | corvus: yes | 15:35 |
| clarkb | if you have it | 15:35 |
| corvus | and yes :) | 15:35 |
| clarkb | np25da30f833794 was booted at 2025-09-29T15:32:50.000000 | 15:35 |
| clarkb | so I think this is one that is after it went into effect | 15:35 |
| corvus | probably the 4 is part of the uuid that comes from the host or nic or whatever | 15:35 |
| clarkb | ah | 15:35 |
| corvus | (and is apparently something shared between the 2 launchers?) | 15:36 |
| clarkb | if np25da30f833794 is one that booted after the update then I think we're good | 15:36 |
| clarkb | but I'll check dfw3 and iad3 really quickly too | 15:36 |
| corvus | 15:26:04,704 is the latest reconfig time | 15:36 |
| corvus | so anything after that should be using the new config | 15:37 |
| clarkb | np0bda80f9f2674 launched at 2025-09-29T15:36:03.000000 in dfw3 and has one interface | 15:37 |
| clarkb | np95718a142dfc4 launched at 2025-09-29T15:36:48.000000 in IAD3 and has one interface | 15:37 |
| clarkb | I note that all three example nodes have slightly earlier created at timestamps but they are all well after 15:26:04,704 so I think we are good | 15:38 |
| clarkb | infr-root I removed my workflow -1 from https://review.opendev.org/c/opendev/system-config/+/962237 as it should be safe to clean that up per ^ | 15:39 |
| clarkb | I'm going to clean up the test provider next and then abandon the bindep test change | 15:39 |
| corvus | clarkb: oh, the '4' is apparently the field in the uuid that literally means it's a version 4 uuid | 15:40 |
| opendevreview | Clark Boylan proposed opendev/zuul-providers master: Cleanup the raxflex test provider and label https://review.opendev.org/c/opendev/zuul-providers/+/962516 | 15:41 |
| clarkb | corvus: aha | 15:41 |
| clarkb | it just happens to also be where the prefix cutoff is | 15:41 |
| corvus | yep, and that is the maxlen of a windows hostname | 15:42 |
| corvus | (including the "np") | 15:42 |
| corvus | we could change that to "z" and get an extra character | 15:42 |
| clarkb | ykarel__: ^ fyi since you've been following this. We think we're finally in the state we want to be long term and networks seem to work the way we want them to | 15:43 |
| opendevreview | Merged opendev/zuul-providers master: Cleanup the raxflex test provider and label https://review.opendev.org/c/opendev/zuul-providers/+/962516 | 15:44 |
| clarkb | heh I realize my highlight of https://review.opendev.org/c/opendev/system-config/+/962237 won't have worked because I typoed infra-root | 15:44 |
| clarkb | that is the change to clear the network config out of clouds.yaml for raxflex and rely only on the zuul-provider config | 15:44 |
| corvus | +3 | 15:46 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 15:54 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 16:00 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 16:00 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 16:01 |
| opendevreview | Merged opendev/system-config master: Revert "Reapply "Select the network to use in raxflex"" https://review.opendev.org/c/opendev/system-config/+/962237 | 16:06 |
| corvus | clarkb: ^ if we want to be extra careful, then we'll want to restart the launchers to get that and then double check everything, right? | 16:24 |
| corvus | (but only after that deploys) | 16:24 |
| clarkb | corvus: yes that is correct | 16:25 |
| corvus | that should happen in the jobs that launch in 35m, iirc. | 16:25 |
| corvus | if we're lucky, we're going to merge an unrelated zuul change in 11m, and we're going to start merging launcher zuul changes in 41m. that may introduce an unwanted variable into this. | 16:26 |
| corvus | to that end, i have paused the queues in the zuul tenant | 16:27 |
| corvus | because that's a thing we can do now | 16:27 |
| corvus | https://zuul.opendev.org/t/zuul/status | 16:27 |
| clarkb | neat | 16:28 |
| corvus | (i don't think this should normally be the way we handle this, but in this case, i think it's fine, plus good to exercise the new feature in a low-stakes situation :) | 16:28 |
| clarkb | I need to pop out for a bit but should be able to help with restarts once deployed later if that helps | 16:29 |
| corvus | ++ | 16:29 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 16:29 |
| clarkb | ok I'm back around. In about 15-20 minutes the clouds.yaml on zuul launchers should update, then we can stop/start (down/up -d) the containers | 16:51 |
| opendevreview | Merged openstack/diskimage-builder master: Remove nodepool based testing https://review.opendev.org/c/openstack/diskimage-builder/+/952953 | 17:02 |
| clarkb | the zuul hourly deployment is running now | 17:04 |
| clarkb | it's done. I'll check the clouds.yaml files now | 17:08 |
| clarkb | corvus: both of the clouds.yaml files look good to me. Do you want to restart launchers or should I? any particular process you want to employ? I'm guessing a stop/start is safer to keep versions in sync with what we're currently running? | 17:09 |
| clarkb | that's interesting, docker's documentation is giving me access denied errors | 17:10 |
| clarkb | I think we have ~15 minutes before the paused pipelines proceed | 17:12 |
| clarkb | https://review.opendev.org/c/zuul/zuul/+/962177 is the one change that has merged since the weekend restart and that is web only so should be fine | 17:13 |
| opendevreview | Merged openstack/project-config master: Retire shade https://review.opendev.org/c/openstack/project-config/+/961524 | 17:13 |
| corvus | clarkb: what do you mean 15 minutes? | 17:20 |
| corvus | i'll go ahead and restart launchers | 17:20 |
| clarkb | corvus: the zuul pause says it is paused for one hour which I think is up in 7 minutes based on irc timestamps | 17:20 |
| clarkb | but maybe I'm mistaken about how that works | 17:20 |
| corvus | oh sorry that's just informative text i wrote in the comment | 17:21 |
| corvus | i was trying to communicate to anyone else who read it what our expectations were for resuming it | 17:21 |
| clarkb | oh I see, it's not an actual timer, just a note from your side | 17:21 |
| corvus | yep | 17:21 |
| corvus | i recognize how unclear that was now. :) | 17:21 |
| corvus | both launchers restarted and running now | 17:22 |
| clarkb | npbdb2e77336124 is building in dfw3 | 17:23 |
| clarkb | it has one interface | 17:24 |
| clarkb | I think that is the only one that has booted in rax flex since the restart | 17:24 |
| clarkb | but I'll try to confirm the other two regions are also happy as things get used | 17:24 |
| corvus | 17:23:11,857 is its start time, so that should be good | 17:25 |
| corvus | i'll unpause the zuul tenant now | 17:25 |
| clarkb | npbdefd2f2ded44 in sjc3 lgtm | 17:26 |
| clarkb | corvus: oh also I deleted ze11 last week but not ze12 (just a heads up) | 17:29 |
| clarkb | jitsi meet published a release and then very quickly published an update to that release. We don't update until our daily runs so shouldn't be a big deal | 17:32 |
| fungi | worth keeping in mind as we approach the ptg in about a month | 17:33 |
| clarkb | yup I think we should do the put jitsi meet servers in emergency thing again | 17:33 |
| fungi | absolutely | 17:33 |
| clarkb | fungi: oh also I checked and our periodic daily runs appear to be happy again after fixing borg on kdc03 | 17:33 |
| fungi | yeah, i hadn't seen any more failures | 17:33 |
| clarkb | fungi: might be a good idea to check the borg logs on kdc03 to ensure it is backing up successfully too (I don't see emails about it complaining) | 17:33 |
| fungi | right, i was mainly going by the absence of failure e-mails | 17:34 |
| fungi | terminating with success status, rc 0 | 17:35 |
| fungi | Mon Sep 29 05:05:23 PM UTC 2025 Backup finished with warnings. | 17:35 |
| clarkb | excellent | 17:36 |
| fungi | it's not clear to me what the warnings are from | 17:36 |
| fungi | ah, "file changed while we backed it up" | 17:37 |
| fungi | okay, so looks good | 17:37 |
| clarkb | ya if you scroll through the log it should record all the things whether they were successful, warning, or error | 17:38 |
| fungi | right, and in this case we seem to have configured borg to back up its own log | 17:38 |
| clarkb | oh fun but also probably not a bad idea | 17:39 |
| fungi | so it, understandably, changes as it logs what it's doing | 17:39 |
| fungi | /var/log/borg-backup-backup01.ord.rax.opendev.org.log: file changed while we backed it up | 17:39 |
| fungi | that's the only warning i saw in the log, fwiw | 17:39 |
| clarkb | iad3 still hasn't booted any new nodes but considering that dfw3 and sjc3 are looking good I'm not worried | 17:41 |
| clarkb | I'm going to consider this done when I edit the meeting agenda later today. I'll also drop the ze11 topic and the lists.o.o performance topic. Let me know if there are other edits we should add/remove/edit | 17:42 |
| corvus | clarkb: i agree, looks like we can completely close out the network topic | 18:16 |
| corvus | re ze11, the graphs look good after that change (ze11) is nulled out | 18:17 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 18:50 |
| opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: almalinux-container: Add support for building 10 https://review.opendev.org/c/openstack/diskimage-builder/+/960336 | 18:51 |
| clarkb | first pass on meeting agenda edits is in | 19:54 |
| clarkb | I have a meeting at 2300 UTC so I'll aim to send the agenda before that time | 19:55 |
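
Note on the node names discussed at 15:34-15:42: the trailing "4" can be reproduced with a small sketch. This is not Zuul's actual code. It assumes, based only on the conversation above, that a node name is the "np" prefix followed by the UUID's hex digits, cut off at 15 characters (the Windows hostname limit corvus mentions). The function `node_name` and the example UUID are hypothetical.

```python
import uuid

# Assumption: 15 is the Windows (NetBIOS) hostname limit referred to in the log.
MAX_HOSTNAME_LEN = 15


def node_name(node_uuid: uuid.UUID, prefix: str = "np") -> str:
    """Hypothetical naming scheme: prefix + UUID hex, truncated to 15 chars."""
    return (prefix + node_uuid.hex)[:MAX_HOSTNAME_LEN]


# A made-up version 4 UUID, used only for illustration.
example = uuid.UUID("12345678-1234-4678-9234-567812345678")

# With "np" there is room for 13 hex digits. The 13th hex digit of a UUID is
# the version field, which is always "4" for a version 4 UUID.
name_np = node_name(example)

# With a one-character prefix such as "z" there is room for 14 hex digits,
# so the name gains one character of randomness and no longer ends on the
# version digit.
name_z = node_name(example, prefix="z")

# Any freshly generated version 4 UUID shows the same trailing "4".
random_name = node_name(uuid.uuid4())
```

This matches the names in the log, such as np25da30f833794: two prefix characters plus 13 hex digits, the last of which is the version digit.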
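
Note on the check at 15:35-15:38: the reasoning is that any node launched after the latest launcher reconfiguration (15:26:04,704) must be using the new network config. A minimal sketch of that comparison follows. It assumes the launcher log time and the cloud-side timestamps are in the same timezone and on the same day (2025-09-29). The helper `parse_launcher_time` is hypothetical, and the node names and timestamps are copied from the log.

```python
from datetime import datetime


def parse_launcher_time(day: str, clock: str) -> datetime:
    """Parse a log-style time such as '15:26:04,704' on the given day."""
    return datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S,%f")


# Latest reconfiguration time reported by corvus (date is an assumption).
reconfig = parse_launcher_time("2025-09-29", "15:26:04,704")

# Launch timestamps reported by clarkb for one node per region.
launched = {
    "np25da30f833794": "2025-09-29T15:32:50.000000",  # sjc3
    "np0bda80f9f2674": "2025-09-29T15:36:03.000000",  # dfw3
    "np95718a142dfc4": "2025-09-29T15:36:48.000000",  # iad3
}

# A node counts as evidence for the new config only if it launched after
# the reconfiguration completed.
after_reconfig = {
    name: datetime.fromisoformat(stamp) > reconfig
    for name, stamp in launched.items()
}

for name, ok in after_reconfig.items():
    print(name, "uses new config" if ok else "predates reconfig")
```

All three nodes launched at least six minutes after the reconfiguration, which is why the slightly earlier "created at" timestamps clarkb mentions do not change the conclusion.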