Monday, 2025-07-07

opendevreviewMerged opendev/zuul-providers master: Add centos-10-stream and rockylinux-10 image definitions  https://review.opendev.org/c/opendev/zuul-providers/+/95372606:49
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346007:41
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346008:12
opendevreviewTristan Cacqueray proposed zuul/zuul-jobs master: DNM: testing third-party CI config  https://review.opendev.org/c/zuul/zuul-jobs/+/95421208:15
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346008:37
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346008:48
*** liuxie is now known as liushy08:50
priteauHello. Do you know when we could expect new nodepool images to be available on all clouds following the merge of https://review.opendev.org/c/opendev/zuul-providers/+/953908?09:53
fricklerpriteau: iiuc this should have happened right after the promote pipeline ran for that change. but there were also fresh image builds this morning. do you still see any issues?10:51
priteauI still had the issue this morning in https://zuul.opendev.org/t/openstack/build/3d027faf5d084bd185363ef931f65ea611:07
priteauzuul-info says /etc/dib-builddate.txt is 2025-05-27 19:2711:07
priteau(stable/2025.1 branch)11:08
fricklerinfra-root: ^^ I can confirm that https://zuul.opendev.org/t/openstack/image/rockylinux-9 only shows pretty old images, I guess someone needs to take a closer look as to why the images e.g. from https://review.opendev.org/c/opendev/zuul-providers/+/953908 don't get uploaded and/or used11:23
fricklerok, looks like rocky images are missing from the periodic-image-build pipeline, but shouldn't uploads also be triggered when promoting images?11:35
mnasiadkaseems not11:41
mnasiadkalet me raise a patch11:42
mnasiadkaI think promote just promotes the built image11:42
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add missing rocky jobs to perodic pipeline  https://review.opendev.org/c/opendev/zuul-providers/+/95423111:44
fricklerright, but promote should act on the images that were built in the gate pipeline?11:44
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add missing rocky jobs to perodic pipeline  https://review.opendev.org/c/opendev/zuul-providers/+/95423111:45
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add missing rocky jobs to perodic pipeline  https://review.opendev.org/c/opendev/zuul-providers/+/95423111:47
*** dhill is now known as Guest2150411:47
mnasiadkafrickler: seems there's some logic in promote playbooks which query for artifact, if nothing was built and uploaded - nothing is promoted I guess11:50
mnasiadkafrickler: although the gating jobs should be uploading something, so I'm a bit lost there ;-)11:52
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add missing rocky jobs to periodic pipeline  https://review.opendev.org/c/opendev/zuul-providers/+/95423112:16
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346013:19
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add image names to promote jobs  https://review.opendev.org/c/opendev/zuul-providers/+/95424413:28
corvusfrickler: mnasiadka ^ that should fix the promote jobs13:28
opendevreviewJames E. Blair proposed openstack/project-config master: Update Zuul status node graphs  https://review.opendev.org/c/openstack/project-config/+/95424613:43
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346013:45
ykarelHi, is there some recent change which made multinode jobs get nodes across different cloud providers? Seeing random failures in CI with such cases13:49
ykarelexample https://zuul.openstack.org/build/37303139f81f4390a04446c9e5f1a9a5 and https://zuul.openstack.org/build/886bbfaeb5894fb389d1fecb989b631f, but there are many such cases started ~ week ago13:51
fungiykarel: what's your most recent example? have you seen it happen today?13:51
ykarelfungi, you mean this is fixed already?13:51
ykarelone of above ex is from today13:51
fungiover the weekend we upgraded to a zuul-launcher fix that should try harder to keep nodes for the same request in one provider13:51
fungier, zuul, not zuul-launcher13:53
fungia fix to the zuul-launcher service in the zuul repo13:53
fungiykarel: in theory https://review.opendev.org/c/zuul/zuul/+/954064 should have addressed the problem13:53
ykarelok, seeing the failures today, at least it seems that has not fully helped13:53
fungiit does look like our upgrade over the weekend maybe didn't happen? we're running several different versions on various components according to https://zuul.opendev.org/components13:54
ykarelno idea about that :)13:56
fungithe launchers are still a couple of commits behind, i think 12.1.1.dev48 75766e938 that some components are running has it, but 12.1.1.dev46 ad9d8bc4e on the launchers doesn't13:59
fungilooks like the upgrade may have stopped between ze08 and ze09?13:59
fungize09 is missing from the components list, and then ze10 onwards are still reporting older versions14:00
fungiinfra-root: ^ i'm in meetings for the next 2 hours and can't look deeper into this yet14:00
opendevreviewTristan Cacqueray proposed zuul/zuul-jobs master: Remove python2-devel from bindep for Fedora  https://review.opendev.org/c/zuul/zuul-jobs/+/95425714:31
opendevreviewMerged opendev/zuul-providers master: Add image names to promote jobs  https://review.opendev.org/c/opendev/zuul-providers/+/95424414:43
corvusi can look into the launcher/nodes issue; i'll leave debugging the restart playbook for someone else or later14:49
fungicorvus: i suspect the launcher/nodes issue is simply that the fix from last week isn't deployed yet due to the stuck upgrade14:52
corvusfungi: oh i restarted the launchers immediately with that fix14:53
corvusso that should be deployed14:53
fungioh, the fix in 954064 is 12.1.1.dev46 then?14:54
corvusi'm not sure, i'm just sure that i did a pull and restart14:54
fungii was trying to map it based on commit count since the last tag and it seemed like that was one or two dev version commits too low14:55
fungibut maybe i was off by a couple14:55
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346014:56
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426514:56
mnasiadkahmm, DCO is not required in opendev tenant?14:56
corvusnope14:57
mnasiadkafine by me :)14:58
mnasiadka(was just surprised)14:58
corvusthose policies are set by the projects themselves (eg, "openstack", "starlingx", "opendev", etc)14:59
fungiafaik it's currently set for official openstack deliverables, as well as airship and starlingx15:00
fungiopendev isn't an official openinfra project, we're more like a community that operates with assistance from openinfra15:01
corvusfungi: zuul-launcher is running commit "Add monitoring server to zuul-launcher" which is current master15:01
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426515:02
corvus(based on docker inspect)15:02
fungicorvus: huh, i wonder if the components list is stale then?15:02
fungimaybe something broke version reporting15:04
corvusfungi: oops, sorry, it's running "Minor launcher log improvements", one commit back15:05
corvusthat seems consistent with the components page, where launchers are running 46, and executors are running 48.15:06
fungiokay, so launchers should have the locality fix already since before the weekend, and separately the upgrade to an even newer state broke in the middle of the executors list after that15:09
fungidoes indeed sound like two different unrelated problems in that case15:09
corvus++15:10
fricklerze09 upgrade failed in "Upgrade executor server packages" with "E:Could not get lock /var/lib/apt/lists/lock. It is held by process 917246 (apt-get)". possibly a collision with automated upgrades15:15
fungifatal: [ze09.opendev.org]: FAILED! => ... MSG: Failed to lock apt for exclusive operation: Failed to lock directory /var/lib/apt/lists/: W:Be aware that removing the lock file is not a solution and may break your system., E:Could not get lock /var/lib/apt/lists/lock. It is held by process 917246 (apt-get)15:17
fungioh, frickler beat me to the log15:17
frickleractually looks like "/bin/sh /usr/lib/apt/apt.systemd.daily update" is stuck since 2025063015:17
fungibut yeah, sounds like the reboot playbook collided with unattended upgrades on ze09, leaving it offline and stopping cold there15:17
fungi20250630 is coincidentally close to "up 8 days"15:19
Clark[m]I assume that caused the reboot playbook to exit with a failure as well?15:19
fungiyes15:22
fungiit aborted at that point, and since it serializes the upgrades/reboots it left half the list un-upgraded15:23
fungi/var/log/dpkg.log is empty, the last entry in /var/log/dpkg.log.1 is "2025-06-29 02:20:16 status installed libc-bin:amd64 2.39-0ubuntu8.4"15:24
fungisimilarly, /var/log/unattended-upgrades/unattended-upgrades.log is empty but the most recent rotated one ends at "2025-06-30 06:09:41,059 INFO No packages found that can be upgraded unattended and no pending auto-removals"15:25
Clark[m]corvus: both of ykarel's examples appear to have been in the periodic pipeline. I wonder if that makes a difference with the stampede of requests all at once vs normal operating through the rest of the day15:31
Clark[m]thinking maybe boot failures are more likely during that time which may lead to three failures in a row in some clouds15:31
fungi`ps -q 917246 -o lstart` indicates the apt-get update process started "Mon Jun 30 11:26:29 2025" which is well after the last entry in unattended-upgrades.log15:36
clarkbfungi: ansible runs apt-get update too iirc15:37
clarkbit's possible that an ansible run triggered the process and then things went sour15:37
fungiwell, in this case as frickler said it's a child of /usr/lib/apt/apt.systemd.daily15:39
opendevreviewDr. Jens Harbott proposed opendev/zuul-providers master: Add debian-trixie build  https://review.opendev.org/c/opendev/zuul-providers/+/95147115:39
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426515:39
fungiso locally scheduled process on the machine, not ansible as far as i can tell15:40
clarkboh that's odd that it would be so offset from when the daily runs occur15:40
fungilooks like systemd has registered an apt-daily.timer and an apt-daily-upgrade.timer (with an associated service for each)15:43
fungi`journalctl -u apt-daily.service` reports that it started Jun 30 11:26:28 for "Daily apt download activities" and never reported completion15:45
fungilooks like it fires twice a day at somewhat random times15:46
fungi/usr/lib/systemd/system/apt-daily.timer has OnCalendar=*-*-* 6,18:00 and RandomizedDelaySec=12h15:46
fungiso it will run at a random time between 06:00-18:00 utc and again at a random time between 18:00-06:00 utc15:47
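For reference, a minimal Python sketch of the window arithmetic described above. It assumes only the two values quoted from apt-daily.timer (base times 06:00 and 18:00, RandomizedDelaySec=12h) and treats all times as UTC; `firing_windows` is a hypothetical helper name, and other systemd timer options are not modelled.

```python
from datetime import datetime, timedelta

# Values quoted from /usr/lib/systemd/system/apt-daily.timer in the log.
BASE_HOURS = (6, 18)                    # OnCalendar=*-*-* 6,18:00
RANDOMIZED_DELAY = timedelta(hours=12)  # RandomizedDelaySec=12h


def firing_windows(day):
    """Return (earliest, latest) firing bounds for each base time on `day`.

    Hypothetical helper: the timer fires at a base time plus a random
    delay between zero and RandomizedDelaySec.
    """
    windows = []
    for hour in BASE_HOURS:
        base = day.replace(hour=hour, minute=0, second=0, microsecond=0)
        windows.append((base, base + RANDOMIZED_DELAY))
    return windows


if __name__ == "__main__":
    for start, end in firing_windows(datetime(2025, 6, 30)):
        print(start.isoformat(), "->", end.isoformat())
```

Under those assumptions, the stuck run that started at Jun 30 11:26:28 falls inside the first window, so its start time is consistent with the timer configuration.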
clarkbwe can probably make the reboot playbook more resilient by detecting a lock and then waiting. However, in this case that wouldn't have helped much as things seem properly stuck15:49
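A rough Python sketch of the "detect a lock, then wait" idea, not the actual playbook change (the playbook is Ansible). It assumes the apt lock is an fcntl-style advisory lock; `try_lock` and `wait_for_lock` are hypothetical names, and the timeout values are placeholders. The deadline matters for exactly the case noted above: a truly stuck holder never releases the lock, so the wait has to give up eventually.

```python
import fcntl
import time


def try_lock(path):
    """Return True if a non-blocking exclusive lock on `path` can be taken."""
    with open(path, "a") as handle:
        try:
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # another process holds the lock
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return True


def wait_for_lock(probe, timeout=300.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    Returns True when the lock became free and False when the deadline
    passed, which is the signal that the holder is probably stuck.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would use something like `wait_for_lock(lambda: try_lock("/var/lib/apt/lists/lock"))` and treat a False result as a reason to stop and investigate rather than keep retrying.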
fungiyeah, looks like child processes are in select loops waiting for something that's never coming15:50
fungiand strace indicates the parent is waiting on each of them15:51
fungii'm not sure how much deeper i can dig into this, but suspect the next step is to try to forcibly terminate the apt-daily.service and then clean up any hung processes15:53
clarkbthat seems reasonable to me15:55
clarkbI recall there were problems reaching the ubuntu mirrors a little while back15:55
clarkbI'd have to look at irc logs to confirm but I wonder if this daily update occurred during that period of time and it's a network sad related to that15:55
fungientirely possible15:56
jrosserthere is something a bit strange happening here https://zuul.opendev.org/t/openstack/build/22b144ba953b42758cec8f4cc32d0fce/log/job-output.txt#326-34316:05
jrosseri can see two different mirrors there for different CI regions16:05
jrosserthe iad3 ones are correct, the dfw ones not16:08
clarkbmy hunch is that the image build process is setting the dfw mirror in the files then the configure-mirrors role is not overriding the backports target16:09
jrosseri don't see it for noble on the same job, so it could be debian specific16:10
clarkbjrosser: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/tasks/mirror/Debian.yaml#L12-L1316:11
clarkband yes that behavior differs to Ubuntu https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/tasks/mirror/Ubuntu.yaml16:11
clarkbI think you can set that flag and then things will work. I'm not sure why that is hidden behind a var though16:12
jrosserseems to be me that added that :)16:13
clarkbreading the README it seems the implication is that backports are not enabled on debian by default so you have to opt into them here16:14
clarkbbut apparently we are already opted into them during image builds? I wonder if that is a bug16:14
clarkbfungi: ^ you may have thoughts16:14
jrosserbut that doesn't explain why the mirror URL is different?16:14
jrosserthat should just be enable/disable backports iirc16:14
fungihaving backports in the sources list doesn't automatically result in installing packages from backports16:14
clarkbjrosser: my assumption which I haven't confirmed is that the image build is producing the backports source file with that mirror in it16:14
clarkbjrosser: then your job runs and because it doesn't override the value it uses the wrong backports target16:15
fungibackports have a lower priority so you have to explicitly request packages from the backports suite or specify versions that can only be satisfied from backports16:15
clarkbwith nodepool you'd be hardset to the public dfw mirror interface. With zuul launcher it's inheriting from the test node environment I think16:15
clarkbso you probably just never noticed the backport mirror was configured on nodepool images but now we notice because the target moved16:16
fungibasically, including backports in the sources list means jobs can request packages from backports if they want, without having to first add more sources list entries16:16
clarkbfungi: jrosser given the behavior that backports won't be used by default I think the easiest and most correct thing would be to have configure-mirrors always configure backports to the correct target16:16
fungialso, i think debian's policy on whether backports are enabled by default differs between their installer and their cloud images16:17
jrosseri am pretty confused now16:18
fungilast i recall, official debian cloud images have backports available by default in the sources list, but if you use the installer it prompts you whether you want them enabled defaulting to no16:18
clarkbjrosser: when your test node booted all of the apt list configuration was pointed at mirror.dfw-int.rax.opendev.org. Then configure-mirrors ran and updated all of the apt list config except backports to point at the current cloud location.16:19
clarkb(this is still my hunch I haven't checked the image to see if that is the case, but I'm like 95% certain)16:19
clarkbthis happened due to the move from nodepool building the image to zuul-launcher jobs building the image. With nodepool we always hardcoded to mirror.dfw.rax.opendev.org16:20
fungii think mirror-int.dfw.rax.opendev.org is used initially because that's local to where the image builders are, so that starts out baked into our images16:20
jrosserah and this is because in my job i set `configure_mirrors_extra_repos` to be False, and that leaves the stale backports mirror config visible16:20
clarkbso it just worked before (tm) but not quite in the way you expected it to16:20
clarkbyes exactly16:20
jrosserright, yes i understand now, thankyou :)16:20
clarkbI think we should remove configure_mirrors_extra_repos checking for debian backports and always configure it16:20
clarkbsince backports are not used by default it seems like it would be correct to ensure they are available and configured correctly for things that opt into using it16:20
fungii concur, as long as that's always going to be used on images with the backports suite enabled16:21
fungiit may be that at some point in the past we didn't have backports in the initial sources list for images we built16:21
jrosserin the past it was extremely difficult to remove the backports config that was in the image16:22
fungiwhich in theory shouldn't be necessary unless you're trying to make sure you don't accidentally request a version of something from backports16:23
fungiunless you're doing things like `apt install -t bookworm-backports foo` or `apt install foo/bookworm-backports` or `apt install foo>=some.backport.version`16:24
fungiin which case you're going to get an error without bookworm-backports in sources16:25
fungibut installing packages from backports shouldn't ever happen implicitly unless you alter the suite priority in apt.conf or do bespoke package pinning configuration16:27
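A simplified Python model of the priority behavior described above. It assumes apt's documented default pin priorities (500 for an ordinary archive, 100 for a backports suite, 990 for a release selected with `-t`); `pick_candidate` is a hypothetical helper, versions are plain tuples rather than real Debian version strings, and installed-package state and custom pinning are ignored.

```python
DEFAULT_PRIORITY = 500    # ordinary archive such as bookworm
BACKPORTS_PRIORITY = 100  # backports suites are deprioritized by default
TARGET_PRIORITY = 990     # release explicitly selected with -t


def pick_candidate(available, target_release=None):
    """Pick the (suite, version) pair this simplified model would install.

    `available` is a list of (suite, version_tuple) pairs. The highest
    priority wins; the version only breaks ties within one priority.
    """
    def priority(suite):
        if suite == target_release:
            return TARGET_PRIORITY
        if suite.endswith("-backports"):
            return BACKPORTS_PRIORITY
        return DEFAULT_PRIORITY

    return max(available, key=lambda item: (priority(item[0]), item[1]))


if __name__ == "__main__":
    foo = [("bookworm", (1, 0)), ("bookworm-backports", (2, 0))]
    print(pick_candidate(foo))
    print(pick_candidate(foo, target_release="bookworm-backports"))
```

In this model the newer backports version is chosen only when the suite is requested explicitly, which matches the point that merely listing backports in sources does not change what gets installed.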
jrosserwell anyway, my commit message here has some of the reasoning for this https://opendev.org/zuul/zuul-jobs/commit/5d01b68574931435ce7605a8773fc4db0b47b60c16:28
fungithe only reason i can think of to not include backports in sources is to reduce the number of indices downloaded during an apt update, making jobs (slightly) faster and consuming less bandwidth with fewer random network failures, though little of that is relevant since we have local mirrors16:28
clarkbI guess the alternative would be to ensure the backports file is absent when that var is not set16:30
fungialso as previously mentioned, if you're trying to approximate official debian cloud images, i believe those *do* include backports in their sources lists. it's debian-installer which defaults to not enabling it (and prompts asking whether to if run interactively)16:30
jrosserthat flag is just there to avoid collisions in jobs that themselves expect to be managing that aspect of the repo setup16:31
jrosserwe've previously had very fragile code in pre.yml to try to unpick this16:31
clarkbwouldn't you just ensure the file is absent and update?16:32
clarkbthat's all I was going to do as the alternative to just always configuring it16:32
clarkbmostly just looking for some rough consensus on what we think is appropriate. I'm leaning towards always configuring backports since you have to opt into using them anyway16:34
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Always configure Debian backports in configure-mirrors  https://review.opendev.org/c/zuul/zuul-jobs/+/95428016:40
mnasiadkaOk, finally stream 10 zuul-launcher builds are working - if anyone has some time to review https://review.opendev.org/c/opendev/zuul-providers/+/953460 - I’d be grateful16:40
clarkbWe can decide in review if 954280 is what we want16:40
clarkbmnasiadka: reviewed. One nit but then the thing with rax classic not being able to boot el10 should be addressed so I -1'd16:46
mnasiadkaAh right16:48
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add CentOS Stream 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95346016:50
clarkbfollowing up on things from last week: has anyone checked if the zuul key backup cron is working now with the updated container name?17:00
clarkbI've done a first pass update to the meeting agenda. I tried to capture the current state of zuul launcher things in particular17:05
clarkblet me know what else should go on there. I guess adding centos 10 stream nodes is a good topic too17:05
clarkbok added centos 10 stream and debian trixie to the agenda17:06
clarkbthe zuul keys backup file has a modtime of July 7 00:00 UTC and has many bytes of content. I suspect that is happy now17:08
corvusremote:   https://review.opendev.org/c/zuul/zuul/+/954284 Further fixes to multi-tenant provider locality [NEW]        17:12
corvusclarkb: ^ previous fix was not complete17:12
clarkbon it thanks17:13
fungithanks corvus!17:13
fungiykarel: ^ is the answer to earlier17:13
fungisince i didn't hear any objections, i issued a `sudo systemctl stop apt-daily` on ze09 which seems to have killed the hung processes and the associated timer automatically started it again. i'll monitor to make sure it completes17:18
fungi/var/log/dpkg.log indicates package upgrades are in progress this time, i see a bunch scrolling by17:19
clarkbcorvus: why use explicit includes in ps2's tenant config? isn't the default to include all of that stuff anyway?17:20
clarkbchange lgtm either way17:20
corvusclarkb: not the niz objects.  they are not included by default while niz is still in stealth mode17:28
corvus(otherwise, zuul installations could be surprised their users are using an undocumented feature)17:29
fungiautomatic package upgrades on ze09 completed this time (took a while since a new kernel triggered another dkms rebuild for the openafs lkm)17:30
fungieverything should be back to normal there now17:30
clarkbcorvus: aha that explains it17:31
clarkbfungi: corvus frickler  should we manually trigger the cronjob on bridge that upgrades and reboots zuul now? Or just wait for Friday again?17:32
clarkbI think the main concern is launchers and those are easy to manually update if/when we need to out of band if we just want to wait for Friday17:32
fungiwell also we're one executor down for the moment17:32
fungisince ze09 never got rebooted17:33
clarkbI think the playbook does handle the case where services are already off on the executor17:34
clarkbalso I wonder if noble is running those updates more aggressively than jammy did17:34
corvusi think we can start up ze09 and leave it17:34
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426517:35
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426517:35
clarkbthat might explain part of why this is a new issue17:35
corvusi do think this is the second time this happened, but i can't be sure without digging through irc logs that it's not the same stuck process as the first.17:35
fungiwhat's the best way to recover ze09? issue a reboot?17:36
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426517:36
fungisince there was also a kernel update in the mix17:36
corvusoh i just remembered, searching through irc logs is easy with matrix/element17:36
corvusthe thing i'm remembering is that it got stuck on june 28 when the server booted initially17:37
clarkbfungi: a reboot won't start the containers again. But rebooting to pick up the updates is probably a good idea anyway. So reboot then docker compose up -d in /etc/zuul-executor-docker/ or whatever the dir is called17:37
clarkbgitea09 is a jammy node and has both the daily and daily upgrade timer in systemd17:37
clarkband the daily run runs ExecStart=/usr/lib/apt/apt.systemd.daily update17:38
clarkbfungi: both the upgrade unit and the daily unit run /usr/lib/apt/apt.systemd.daily. One passes install as the argument (upgrade) and the other update (daily). Reading the script it seems like these two may be redundant with one another17:42
clarkboh no "update" only runs unattended-upgrade if download only is set17:43
clarkbso it doesn't install any packages with unattended-upgrade. That is the difference. I wonder why we need both to run though17:43
fungiyeah, i think the update is meant to happen more often to pull newer indices and maybe download and stage packages for later upgrading, so that the upgrade will be faster17:44
fungii didn't really dig into the reasoning behind those17:45
fungii see they're present on my debian/unstable systems as well17:45
clarkbya I think that is correct. But they both set the update timestamp so I think you can get away with only the upgrade unit and not the daily17:46
clarkbthere isn't a strict dependency between them, just potential optimization if both are used? I wonder if we should disable the apt daily unit and rely only on the upgrade unit to reduce the chances of conflict17:46
clarkbbut also I see that this happens on jammy too so not a new regression I don't think17:46
clarkbpossibly just more likely to happen on noble since it is newer so maybe gets more package updates17:47
fungipossible17:49
clarkbto be clear "this happens" == the two crons exist and run. Not necessarily the stuck process we saw on ze0917:50
clarkbmnasiadka: -1 on the rocky image change due to missing nodeset details but otherwise lgtm17:57
clarkbfrickler: the trixie image change lgtm as well. We'll just have to remember to switch testing to trixie once the release happens and also add arm64 images at some point17:57
clarkbfollowing up on the ipv4 -> ipv6 -> ipv4 -> ipv6 git fetch behavior seen building an image that retried git repo updates it looks like the node had ipv6 configured on its primary interface with the expected global address at the beginning of the job when we record node details17:59
clarkbhttps://zuul.opendev.org/t/opendev/build/2f4200f24ae4492295d5c43307e93038/log/zuul-info/zuul-info.ubuntu-noble.txt#53 any idea what stale ipv6 router neighbors indicates?18:02
clarkboh that is for the scope link network anyway?18:06
fungii think stale state may be an indicator that the tcp/ip stack simply hasn't seen any traffic for that address and so hasn't bothered to try to refresh it18:07
fungibasically the kernel doesn't know whether that address is reachable or not18:08
fungibut saw it working at one time18:08
fungiand yeah the via routes (default being the only one listed) use the gateway's global address instead, which is marked as reachable in the neighbor table, so should be fine18:10
fungimy guess is that the router is sending route announcements via its linklocal address, but not frequently enough that the linklocal address is always fresh in the neighbor table18:11
clarkbI've hopped onto a running bhs1 noble node and we actually configure ipv6 statically on that node18:11
clarkbso now I'm even more confused as to why we would see things move from ipv4 to ipv6 and back again18:12
clarkbcould it be DNS record lookup failures?18:12
fungitrying to remember what the actual issue was... is this concerning git?18:12
clarkbfungi: yes, it's git running within the image build chroot (which I suppose may impact things) updating each repo18:13
clarkbhttps://zuul.opendev.org/t/opendev/build/2f4200f24ae4492295d5c43307e93038/log/job-output.txt#4353-435718:13
fungii believe git will automatically try over ipv4 if it gets an error doing things over v618:13
fungiwhich can lead to odd behavior if you have intermittent network issues18:13
clarkbhere you can see it fails then tries again. The prior git command used ipv6 from what I can tell from haproxy logs18:14
clarkbthe first attempt presumably tries ipv6 as well but that is never recorded on the haproxy side. Then ipv4 request shows up18:14
mnasiadkaclarkb: will update soon, thanks for review18:14
clarkband before using ipv6 it did a bunch of requests via ipv4 first18:14
fungibasically git assumes ipv6 is broken when it gets an error, even though it may be a network issue impacting both v6 and v418:14
clarkbfungi: well git doesn't retry automatically to use ipv4 we do that18:15
clarkbfungi: it fails using ipv6 then when we execute the same git command again in a loop to retry it used ipv418:15
clarkbafter a 5 minute timeout18:16
fungihuh, for some reason i thought git did that on its own too18:16
clarkbthe trying again in the linked log is something explicit that mnasiadka added to the dib script18:17
clarkbMaybe need to check how dns is configured in the chroot18:19
clarkbI think our centos 9 stream mirror is sad18:22
clarkbAppstream repomd.xml hashes don't match expected values. This probably means upstream made updates out of order and we're seeing the fallout18:23
fungirevisiting my recollections and trying to reproduce, it does look like git only tries one address family. i was probably thinking of curl, which does implement a "happy eyeballs" fast fallback (ietf rfc 6555/8305) and exhibits the oddities i described18:32
clarkbI'm still a bit stumped on what could cause the change of behavior between subsequent git invocations now that i know the addr is statically configured18:34
clarkbthe only thing I can figure is dns but the ttl on the record is 3600 and we see the flip flapping multiple times in under an hour18:34
clarkbmaybe git is round-robining between A and AAAA records?18:34
fungiit really shouldn't. based on my reading it uses ipv6 if aaaa records are returned, otherwise ipv418:35
fungiexcept when explicitly overridden with -4/--ipv4 and -6/--ipv6 cli options18:35
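One way to check the DNS hypothesis from inside the chroot is to list which address families the local resolver returns for a host. This is a minimal Python sketch using the standard `socket.getaddrinfo` call; `address_families` is a hypothetical helper name, and it only shows what the resolver hands back, not what git itself does with the result.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def address_families(host, port=443):
    """Return (family, address) pairs in the order the resolver gives them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ordered = []
    for family, _type, _proto, _canonname, sockaddr in infos:
        label = FAMILY_NAMES.get(family)
        if label is None:
            continue  # ignore families other than IPv4/IPv6
        entry = (label, sockaddr[0])
        if entry not in ordered:
            ordered.append(entry)
    return ordered


if __name__ == "__main__":
    # Numeric literals need no DNS lookup; substitute the real git host.
    print(address_families("127.0.0.1"))
    print(address_families("::1"))
```

Running it repeatedly against the real host during a build would show whether AAAA records intermittently go missing, which is one possible explanation for the family flapping between git invocations.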
clarkb`sudo zgrep '158.69.71.233\|2607:5300:201:2000::5e8' /var/log/haproxy.log.4.gz` this will show you the flapping on gitea-lb0218:40
clarkbI'm thinking we probably don't need to preserve that log since the node is gone and any real debugging probably needs a held node18:41
clarkbbut feel free to grab it if you think otherwise18:41
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426518:54
fungiclarkb: not sure if you've seen https://review.opendev.org/c/openstack/pbr/+/954040 (not urgent), but istr you have some strong feelings about it18:57
clarkbfungi: I'm actually fine with whatever format people want to use. I just don't want pre-commit installing everything from git repos if we can use pypi packages instead (for reliability reasons) but that is orthogonal19:00
clarkbfungi: as for why it helps refactor: black's formatting is intentionally done to make diffs cleaner19:00
mnasiadkaclarkb: rechecked the stream10 patch because of some arm64 package fetch issues (maybe mirror update in progress) and updated the rocky10 one - let's see if these pass19:00
clarkbso in theory it makes refactoring easier as the diffs between old and new code will be cleaner19:01
clarkbmnasiadka: ya centos 9 stream's mirror seems to be in a sad state while they do updates19:01
corvusthis is... not important... but i just opened one file at random from that change and the reformatting that black is doing is clearly not going to make diffs easier: https://review.opendev.org/c/openstack/pbr/+/954040/2/pbr/tests/test_files.py#8619:03
corvusi'm not here to argue against black in pbr19:03
corvusi'm just saying i don't agree with that assessment of black19:04
mordredthe pre-commit stuff is gross, although it looks like that line has already been crossed19:04
corvus(you don't make diffs easier to read by squashing a multi-line dict construction into a single line)19:04
mordredI agree - that diff goes the wrong direction19:04
clarkbhuh that is weird since elsewhere black breaks things up across multiple lines19:05
clarkblike here https://review.opendev.org/c/openstack/pbr/+/954040/2/pbr/cmd/main.py I wonder if that is a bug or intentional19:05
mordredI like the general approach of indenting after opening parens and brackets, and of being more aggressive about breaking across lines. I really don't like the "this is small, we'll collapse it" thing19:05
clarkbbut I agree that won't make diffs easier to read19:05
corvusmordred: ++19:06
clarkbanyway I don't use black so have no idea if that is likely intentional or not. But I do know one of the stated goals of the project is to make diffing easier so I would probably consider that a bug myself given that goal19:07
mnasiadkacorvus: I've had multiple issues with black in some repos, it usually boiled down to some bugs (whose fixes were hidden behind some experimental flag or were not released yet) - I'm probably not a fan as well ;-)19:07
mnasiadka(and it was always some weird reformatting)19:08
fungiyeah, skimming, it seems like it wants to combine lists into a single line if short, but explode them to one per line if not19:08
mordredI use black on things I work on these days because it's easy enough to do so, and even if I disagree with some of its choices they're not usually terrible choices, so it's usually less cognitive burden to just "shrug, autoformat and be done with it" than to spend time configuring rule specifics.19:14
mordredthis is hilarious though: https://review.opendev.org/c/openstack/pbr/+/954040/2/pbr/sphinxext.py#6819:14
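The diff-readability point above (a collapsed one-line dict versus one entry per line) can be illustrated with a minimal sketch. The strings below are hand-written stand-ins, not actual black output or code from the pbr change, and `changed_lines` is a hypothetical helper that counts added and removed lines using the standard library's `difflib`.

```python
import difflib


def changed_lines(old: str, new: str) -> int:
    """Count lines that a line-based diff reports as added or removed."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


# Collapsed style: the whole dict lives on one line, so adding a key
# rewrites that line (one removal plus one addition).
collapsed_old = 'opts = {"name": "pbr", "version": "1.0"}'
collapsed_new = 'opts = {"name": "pbr", "version": "1.0", "license": "Apache-2.0"}'

# Exploded style: one entry per line with a trailing comma, so adding a
# key only adds a single new line.
exploded_old = """opts = {
    "name": "pbr",
    "version": "1.0",
}"""
exploded_new = """opts = {
    "name": "pbr",
    "version": "1.0",
    "license": "Apache-2.0",
}"""

print(changed_lines(collapsed_old, collapsed_new))
print(changed_lines(exploded_old, exploded_new))
```

In the collapsed form a reviewer has to scan the whole line to find what changed, while the exploded form shows only the new entry.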
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Rocky Linux 10 builds  https://review.opendev.org/c/opendev/zuul-providers/+/95426519:57
corvusfor my own edification: rocky10 has the same cpu flag requirements as centos10?20:13
clarkbcorvus: yes20:13
clarkbcorvus: alma however does rebuilds of packages for older hardware. But they don't do that for all packages so it may or may not work depending on what you install20:13
corvusis that a bug or a feature?20:13
clarkbin alma's case it is an intentional choice on their side to allow people with older hardware to try and use almalinux20:14
corvusyeah; i guess i wondered if rocky might make the decision differently too20:15
clarkbiirc they decided not to20:15
corvushere's a stray thought on a different subject: both the rocky and trixie changes are building all the images because they require some pro-forma changes to dib elements to account for the new releases.  perhaps that's something that could be optimized out of the elements themselves.20:16
clarkbmaybe. The element is literally the infra package needs element. We could possibly have copies of that element per platform with symlinks to shared bits maybe20:17
clarkbAlso I think the trixie change makes a broader change that potentially impacts all of debuntu20:17
corvus(example, the trixie change edits a file that basically says "the ntp package is still called 'ntp' in trixie".  and the rocky change edits a file that says "we still want to install haveged from epel")20:18
corvusyeah, they're totally running the right jobs for those changes20:18
corvusjust thinking there could be a lot less churn if those elements were restructured, maybe with defaults that carry over automatically, or possibly just have values supplied as input20:19
clarkbwe may be able to invert the defaults where old things override rather than new things overriding20:19
corvusya20:19
corvusfor infra-package-needs, could that just be a list that we pass in through env variables?  then we can set it once for debuntu and we're done until something changes in the distros20:20
corvus(i mean, we set it once in the debuntu zuul image build job)20:21
clarkbmaybe? we're relying on dib tooling to take the list and expand it into the list of packages to install and uninstall20:21
clarkbso it would depend on whether or not we could have an element that writes files into itself that meet the interface for that tooling20:21
clarkbor updating those elements to look for the data via env vars if they don't already20:22
corvusack.  i don't have time to dig into that right now, but that could be a big win for efficiency if someone does.20:22
corvuswe're currently using 50% of the arm quota every time we make one of these changes.20:22
clarkbin the trixie example we could invert the pkg-map pretty easily I think20:24
clarkbbasically things older than bookworm and focal get listed and then we default the family/distro entries to what we're setting noble,trixie et al to20:24
clarkband do similar for the timesyncd check20:25
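The inversion idea discussed above can be sketched roughly as follows. This is a simplified model, not dib's actual pkg-map implementation: `resolve` is a hypothetical helper, the lookup order (release entry first, then family entry, then the name itself) is an assumption for illustration, and the release and package names are placeholders.

```python
def resolve(pkg_map, name, release, family):
    """Return the package for `name`; the most specific entry wins."""
    by_release = pkg_map.get("release", {}).get(release, {})
    by_family = pkg_map.get("family", {}).get(family, {})
    if name in by_release:
        return by_release[name]
    return by_family.get(name, name)


# Layout where new releases override an old default: every new release
# needs its own entry, which means editing the element each time.
NEW_OVERRIDES = {
    "family": {"debian": {"timesync": "legacy-pkg"}},
    "release": {
        "new-1": {"timesync": "current-pkg"},
        "new-2": {"timesync": "current-pkg"},
    },
}

# Inverted layout: the family default carries the current value and only
# the old releases are listed, so future releases need no edit.
OLD_OVERRIDES = {
    "family": {"debian": {"timesync": "current-pkg"}},
    "release": {
        "old-1": {"timesync": "legacy-pkg"},
    },
}

# A release that neither map mentions yet.
print(resolve(NEW_OVERRIDES, "timesync", "new-3", "debian"))
print(resolve(OLD_OVERRIDES, "timesync", "new-3", "debian"))
```

Under these assumptions the inverted map gives an unlisted release the current value by default, so adding that release would not touch the element and would not trigger a rebuild of every image.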
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add ubuntu-bionic and ubuntu-focal labels  https://review.opendev.org/c/opendev/zuul-providers/+/95429620:43
clarkbfungi: jrosser: should we go ahead and proceed with https://review.opendev.org/c/zuul/zuul-jobs/+/954280 ?20:48
fungii think so, but holding for confirmation20:56
fungithe reality is that right now we have those sources enabled, but they're potentially unreachable from most cloud providers if not overridden in the role20:56
jrosserI didn’t expect to lose a feature that were used as a side effect of reporting a different bug tbh21:05
jrosser*we are using21:05
opendevreviewMerged opendev/zuul-providers master: Add missing rocky jobs to periodic pipeline  https://review.opendev.org/c/opendev/zuul-providers/+/95423121:06
fungijrosser: as i understand it, the sources entries were already there, but pulling from a distant cloud mirror most of the time, and we didn't find out until we switched them defaulting to one which wasn't reachable remotely from any other clouds21:13
jrosserthis is a new behaviour though21:13
fungilooks like it's been in there since https://review.openstack.org/563748 merged almost 7 years ago?21:15
fungithe errors are a new behavior because we switched the baked-in mirror hostname to one which isn't reachable from other clouds21:16
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add ubuntu-bionic and ubuntu-focal labels  https://review.opendev.org/c/opendev/zuul-providers/+/95429621:16
opendevreviewClark Boylan proposed opendev/zuul-providers master: Normalize infra-package-needs packages on current releases  https://review.opendev.org/c/opendev/zuul-providers/+/95429921:16
clarkbcorvus: ^ something like that maybe. If that passes testing then mnasiadka and frickler might want to rebase on top of that change21:16
corvuslgtm21:20
corvusthanks!21:20
fungijrosser: these days it's being set by https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/ubuntu-minimal/environment.d/12-ubuntu-repo-dists.bash#L1521:21
clarkbfungi: jrosser  right I think this never worked the way jrosser thought it did21:22
clarkbit's just working less well and failing in the process rather than silently succeeding21:22
jrosseryep - well clearly i have no logs or anything concrete to show related to the original issue we had with the backports repo21:23
clarkbthe new behavior is that we're embedding the mirror for the node that built the image rather than the "central" mirror.dfw.rax.opendev.org mirror in every image when built by nodepool on static nodes within dfw21:24
jrosserso it's fine, we can go back to removing it in a pre playbook21:24
fungiyeah, i'm not opposed to figuring out a way to make the addition of those sources optional, but they've been included for many years21:24
fungiand the current toggle was a useless feel-good knob, it didn't actually control the existence of the entries, merely let you choose whether they should get updated to point to the closest mirror instead of the baked-in one21:25
fungifixing it down to the dib layer would be a much larger amount of engineering21:26
fungimainly because we haven't previously required jobs which might intentionally install things from backports to first enable the backports sources21:26
fungiso would need to shake out any potential breakage if we stopped including them21:27
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add nested-virt labels  https://review.opendev.org/c/opendev/zuul-providers/+/95430121:56
corvusi think that change and its parent should be the last of the labels being handled by nodepool21:57
clarkbI approved the parent22:04
clarkbhopefully that is ok22:04
opendevreviewMerged opendev/zuul-providers master: Add ubuntu-bionic and ubuntu-focal labels  https://review.opendev.org/c/opendev/zuul-providers/+/95429622:04
fungicorvus: one possible error on 95430122:08
fungialso i see we have an incorrect comment on the one above it, though it was there before your change so i didn't comment on that22:08
opendevreviewClark Boylan proposed opendev/zuul-providers master: Normalize infra-package-needs packages on current releases  https://review.opendev.org/c/opendev/zuul-providers/+/95429922:13
clarkbmy brain saw cronie and thought thats where chrony goes22:13
clarkbbut no we have both cronie and chrony22:13
clarkbLast call for meeting updates. I think it's mostly caught up based on the work I've seen going on but let me know if there are updates I should make22:14
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add nested-virt labels  https://review.opendev.org/c/opendev/zuul-providers/+/95430122:20
corvusfungi: ^ thx fixed clarkb ^22:20
corvusthat comment above tripped me up; i fixed it too22:20
corvus#status log restarted zuul-launcher with locality fix22:24
opendevstatuscorvus: finished logging22:25
fungithanks!22:25
fungiykarel: ^ should hopefully be fixed now22:25
clarkbperiodic jobs tonight will be a good exercise22:26
clarkbI've approved 95430122:27
opendevreviewMerged opendev/zuul-providers master: Add nested-virt labels  https://review.opendev.org/c/opendev/zuul-providers/+/95430122:27
corvusyeah -- sorry i forgot to respond to your point directly earlier, but, yes, the rush of jobs put some of our providers over quota; so those temp-failures triggered the flaw22:27
clarkbya I remembered you had said that the periodic job rush is a bit of a stampede that exercises zuul launcher in weird ways before so figured that it was related22:31
corvusthe short version of that is: quota calculations are (intentionally) racy now, and stampedes tickle that.22:35
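As a rough illustration of why a stampede can push a provider over quota when the quota check is racy, here is a toy simulation. It is not zuul-launcher code: `launch`, its parameters, and the numbers are all hypothetical, and it only models the general check-then-act pattern in which decisions are made against a stale usage snapshot.

```python
def launch(requests, limit, in_use, fresh_reads):
    """Return total usage after handling `requests` (node counts).

    With fresh_reads=True every decision sees current usage. With
    fresh_reads=False every decision reuses the snapshot taken before
    the burst, which is the racy case.
    """
    snapshot = in_use
    for nodes in requests:
        observed = in_use if fresh_reads else snapshot
        if observed + nodes <= limit:
            # In a real cloud an over-quota request would be rejected
            # by the provider; here we just record the attempted usage.
            in_use += nodes
    return in_use


burst = [2] * 5  # five requests arriving at once, two nodes each

print(launch(burst, limit=8, in_use=4, fresh_reads=True))
print(launch(burst, limit=8, in_use=4, fresh_reads=False))
```

In the stale case every request looks affordable on its own, so the burst as a whole exceeds the limit; with a real provider those extra requests would come back as temporary failures.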
clarkbmeeting agenda should be in your inboxes now22:55
opendevreviewClark Boylan proposed opendev/zuul-providers master: Normalize infra-package-needs packages on current releases  https://review.opendev.org/c/opendev/zuul-providers/+/95429923:01
clarkbwhile I'm burning through test nodes it is nice that we can check all of this pre merge without spinning up dib locally23:02
corvus++  also, we could add image validation in zuul if we get bored23:05
opendevreviewClark Boylan proposed opendev/zuul-providers master: Normalize infra-package-needs packages on current releases  https://review.opendev.org/c/opendev/zuul-providers/+/95429923:35
clarkbapparently I don't understand bash regexes. Those quotes I added are not appropriate23:36
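For context on the quoting remark above: in bash, any quoted part of the right-hand side of `[[ ... =~ ... ]]` is matched as a literal string rather than as a regular expression. The sketch below models that rule in Python, where `re.escape` stands in for the effect of quoting. `bash_like_match` is a hypothetical helper, the pattern is a made-up example rather than the one from the change, and Python's `re` syntax differs from bash's extended regular expressions, although this particular pattern means the same in both.

```python
import re


def bash_like_match(value, pattern, quoted):
    """Model [[ $value =~ pattern ]]: a quoted pattern matches literally."""
    if quoted:
        # Quoting disables the regex metacharacters.
        pattern = re.escape(pattern)
    return re.search(pattern, value) is not None


PATTERN = r"^(alpha|beta)$"  # hypothetical pattern for illustration

print(bash_like_match("beta", PATTERN, quoted=False))
print(bash_like_match("beta", PATTERN, quoted=True))
print(bash_like_match("^(alpha|beta)$", PATTERN, quoted=True))
```

The usual fix in bash is to leave the pattern unquoted, or to store it in a variable and reference that variable unquoted.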
