Friday, 2022-07-29

opendevreviewMerged zuul/zuul-jobs master: Add jammy testing
opendevreviewMerged zuul/zuul-jobs master: Sort supported platforms
opendevreviewMerged zuul/zuul-jobs master: Support subsets of platforms in update-test-platforms
opendevreviewMerged zuul/zuul-jobs master: Revert "Revert cri-dockerd changes"
opendevreviewMerged zuul/zuul-jobs master: Test ensure-kubernetes on all Ubuntu platforms
ianw is showing a lot of green00:47
clarkbya I think elodilles confirmed it was looking good earlier today too00:48
clarkbthank you for working through that00:48
ianwi guess something like is really a failure of the package and its bindep dependencies00:49
opendevreviewMerged openstack/project-config master: Synchronize diskimages on builders
*** ysandeep|out is now known as ysandeep01:54
*** ysandeep is now known as ysandeep|afk03:24
*** ysandeep|afk is now known as ysandeep03:38
tonybHow do I see which groups/users have access to a repo?  I expected it to be the repo's ,access page but that has an empty area where I'd expect the details to be04:24
tonybI can see the answer in the appropriate REST API04:30
*** ysandeep is now known as ysandeep|break04:32
*** ysandeep|break is now known as ysandeep05:36
*** bhagyashris is now known as bhagyashris|ruck06:05
*** ysandeep is now known as ysandeep|lunch07:14
bbezakHi - I've noticed that 'rockylinux-8' nodes stopped working a couple of days ago. Do you happen to know what's up with that? -
opendevreviewSlawek Kaplonski proposed openstack/project-config master: Add new project "whitebox-neutron-tempest-plugin" in the x/ namespace
*** jpena|off is now known as jpena08:45
*** akahat|ruck is now known as akahat09:12
*** ysandeep|lunch is now known as ysandeep10:21
*** rlandy|out is now known as rlandy10:35
*** dviroel|afk is now known as dviroel11:15
*** dviroel is now known as dviroel|rover11:18
dtantsurhi folks, could someone elaborate on the error message in and why it ignored depends-on?11:40
dtantsurthis is definitely something that used to work11:40
*** soniya is now known as soniya|ruck11:52
*** soniya|ruck is now known as soniya|afk11:52
*** frenzy_friday is now known as frenzyfriday|rover11:54
NeilHanlonbbezak: I'm poking at that today, came to ask for some help myself :)12:17
fungitonyb: unfortunately, gerrit's security model is such that it errs on the side of only showing you access information for things you have access to do, so you can't actually see the acl entries for things that don't authorize you to do them12:19
fungitonyb: the workaround would be to parse the gerrit/projects.yaml and gerrit/acls/*/* files in the openstack/project-config repo12:20
tonybfungi: but I can see the information via that REST API with curl 12:20
fungii'm not surprised that they apply that security model inconsistently12:20
tonybFair enough.  I wanted to answer the question "who can land this patch" which I used to be able to answer in an older Gerrit at that url.12:23
tonybI got my answer, just found it strange/a regression that I can't do it in the web UI like I could12:23
tonybanyway, thanks fungi for the reply 12:23
fungitonyb: i had a script for that at but it likely needs some updating for recent changes in gerrit12:24
fungiit handled things like recursive group resolution too12:25
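[Editor's note: the ACL files fungi points at use Gerrit's git-config-style syntax. A minimal sketch of answering tonyb's "who can land this patch" question from such a file — the sample repo content and the `example-core` group name are invented for illustration, and real ACLs can be far more involved (inheritance, multiple refs, exclusive rules):]

```shell
# Hypothetical sample mirroring the gerrit/acls/*/*.config syntax
# found in openstack/project-config (git-config style).
cat > /tmp/sample-acl.config <<'EOF'
[access "refs/heads/*"]
  label-Code-Review = -2..+2 group example-core
  label-Workflow = -1..+1 group example-core
EOF
# Groups allowed to set Workflow +1, i.e. approve/land changes:
awk '/label-Workflow/ {print $NF}' /tmp/sample-acl.config
```

A fuller tool would also resolve group membership recursively, which is what fungi's script handled.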
fungibbezak: NeilHanlon: it looks like the images are being built successfully, just looking at the logs on our nodepool image builders. my guess is they've ceased to be bootable for some reason12:25
tonybfungi thanks.  I'll check that out over the weekend 12:26
fungibbezak: NeilHanlon: the most recently built one seems to be here if you want to give it a try locally:
fungidtantsur: a while back, zuul went from ignoring approvals on changes like that to registering a failure, so that you'd be aware12:30
fungithe reason and solution remain the same: if changes depend on each other but their projects don't share a change queue, don't approve the depending change until the one it depends on has merged12:31
NeilHanlonsweet, thanks fungi. checking out now12:31
dtantsurfungi: hmm, I see. That's probably a message that did not come across, at least I somehow missed it.12:35
dtantsurmaybe an error message could be expanded with an explanation?12:35
NeilHanlonfungi: so I grabbed that rocky8 qcow and booted it fine in KVM using virt-install. are there any logs I can look at for it trying to start up?12:47
fungidtantsur: well, it already says "Change 851500 in project openstack/requirements does not share a change queue with 851501 in project openstack/ironic-inspector" as its reason. are you suggesting it should explain why changes have to share a change queue in order to be enqueued together, or that zuul start embedding links to relevant sections of its documentation in errors, or something else?12:57
fungiNeilHanlon: i think we might have a feature in nodepool which tries to capture the console log from nova for boot failures, i'll have to see if that happens by default (and where it gets written), or if it's something we have to turn on12:59
fungijust booting isn't sufficient though, we need to be able to ssh into it for the boot to be considered successful, so if it's coming up with no networking or sshd isn't starting that could still explain it12:59
NeilHanlonah, yeah that makes sense, too. i'm also rebuilding my AIO here as I ruined it a few weeks ago and haven't had a chance to fix it. so I can test in nova too13:00
*** ysandeep is now known as ysandeep|out13:02
dtantsurfungi: well, it's confusing for those who got used to the old behavior (which I liked much more tbh). Like, it never shared a change queue, so the cause is unclear without a further explanation.13:03
fungiahh, yeah the old behavior was that it ignored your approval completely and you had to approve it again once the dependency merged13:04
funginow it actually gives you feedback explaining why it wasn't enqueued, rather than just silence13:04
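[Editor's note: for context, the cross-project dependency being discussed is declared with a Depends-On footer in the commit message. A sketch using the change numbers quoted above — the subject line is invented, and the URL follows opendev's review URL format:]

```
Adapt ironic-inspector to the new requirement

Depends-On: https://review.opendev.org/c/openstack/requirements/+/851500
```

Zuul records the dependency either way; the point fungi makes is that when the two projects don't share a change queue, the depending change (851501 here) can't be enqueued until 851500 has actually merged.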
fungiNeilHanlon: i can confirm, our launcher logs indicate a timeout waiting for a connection to port 22 on the vm13:05
fungiit reports that the node makes it to a running state in nova, but we can't reach it13:05
fungiso something has happened with networking setup i guess, either in rocky itself or in dib13:06
NeilHanlonfungi: is it safe to assume it's trying to ssh in with a password, not a pubkey?13:17
funginope, we embed allowed keys into the image, but you can override them through configdrive13:19
fungifirst order of business though would be to see if there's even a reachable socket on 22/tcp13:19
fungiif you can netcat or telnet to that and get an sshd banner back, then at least that much is working13:19
*** dviroel|rover is now known as dviroel13:20
fungias far as i can tell from our launcher logs, i don't think we're getting that far even13:20
Clark[m]If the issue is with glean the openstack console log should record what glean has done or attempted to do. Often times with issues like this we end up needing to manually boot the image, check console log, then maybe rescue instance to edit something and try rebooting and so on13:21
NeilHanlonon my qemu/kvm setup here, sshd is started and I can ssh to it13:22
NeilHanlon(I recognize this is not the setup it's failing in, though)13:22
*** dviroel is now known as dviroel|afk13:25
fungii can confirm, for example, that it's failing in rackspace but that's of course xen and not kvm. i'll see if i can also find a similar situation in another provider13:33
fungiClark[m]: did we or did we not add a feature to nodepool for capturing console logs automatically?13:34
Clark[m]I think you might have to toggle it on for the label or the image 13:36
Clark[m]I've got to pop out now but can take a second look in a couple hours13:36
fungistrangely, i see a bunch of rockylinux-8 nodes in a ready state in various providers, about half of which are over a day old13:37
fungiand the rest are from roughly 4 hours ago13:37
fungiabout 20 in total13:38
fungiyet they're not getting used to fulfill node requests13:38
fungino, nevermind. i was looking at image-list. those are images. *sigh*13:39
* fungi finishes his coffee13:39
fungiit looks like nl01 (our rackspace launcher) is the only one that's tried booting grep rockylinux-8 nodes in the past few hours, so maybe it's failing on xen but no other launchers are taking over the node request?13:42
fungis/grep //13:42
*** arxcruz|rover is now known as arxcruz13:43
funginevermind, i found an example in iweb from 07:38:10 utc, same story (timeout waifing for connection to port 22)13:48
fungiso we're seeing it in a kvm provider too13:48
fungier, timeout waiting13:49
fungianyway, i think this confirms the problem isn't provider or backend specific, so we need to see what's being recorded to the console log (which i think will be easier in iweb due to rackspace only providing web-based console access)13:50
*** dasm|off is now known as dasm13:59
opendevreviewJeremy Stanley proposed openstack/project-config master: Temporarily turn on console logs for rocky in iweb
opendevreviewJeremy Stanley proposed openstack/project-config master: Revert "Temporarily turn on console logs for rocky in iweb"
fungii'll set the revert to wip for now13:59
*** pojadhav is now known as pojadhav|out14:08
opendevreviewMerged openstack/project-config master: Allow Stackalytics maintainers to rewrite history
fungiself-approving 851519 so we can start collecting logs on that asap14:18
opendevreviewMerged openstack/project-config master: Temporarily turn on console logs for rocky in iweb
fungideploy job for it is almost done running14:59
fungiand it's deployed. now to wait for another rockylinux-8 boot attempt in iweb-mtl0115:03
fungiit doesn't appear to actually put the captured console in the launcher's debug log, nor are there any separate files for that i can see in the log dir15:25
funginothing relevant in /tmp either15:25
fungithe launcher doesn't need to be restarted after a config update, does it?15:26
Clark[m]It shouldn't. It reloads the config each pass through the run loop15:35
fungimaybe the console log collector is bitrotten?15:38
fungiahh, nope it's in there, i was just having trouble sifting it out15:43
fungivery little console output captured because it's sitting at a grub> prompt15:45
fungilooks like it probably has no idea what to boot15:46
fungino errors though that i can see, it's just not automatically booting. maybe it can't find the grub config?15:48
fungilooks like we're generating a /boot/grub2/grub.cfg during image building at least15:51
fungistarting at 08:29:32 in the log here:
clarkbis the device label wrong maybe? or did we switch to efi somehow?15:52
*** jpena is now known as jpena|off15:52
fungiit's not finding a /sys/firmware/efi and so doing mbr15:53
fungi/usr/sbin/grub2-install '--modules=part_msdos part_gpt lvm biosdisk' --target=i386-pc --force /dev/loop015:53
clarkbok thats good. we want bios for the x86 clouds15:54
clarkbfungi: often at this point a rescue instance or mounting the image locally can be helpful just to see what it looks like. With a rescue instance you can make changes and reboot to see if they fix it15:55
fungilooks like the image builds were failing on the 26th which is as far back as our log retention for those goes, so i suspect it went from "we have working images" to "we can't build images" for some period of time and then to "now we can build images again but they no longer boot in our providers" at which point people started getting node_error results15:55
clarkbit is odd that the image would boot for NeilHanlon but not in the cloud if grub config is the problem15:55
clarkbyes all image builds were failing due to the full disks due to the leaked fedora-35 images I cleaned up15:56
clarkbonce I cleared that up we had room to make images again and new images would've been created15:56
fungiahh, okay. so really no idea how long ago this broke, because it's likely the working nodes were booting from quite stale images15:56
*** iurygregory_ is now known as iurygregory15:57
fungilast time i see any substantive changes for rocky image builds in dib was february when it started insisting on networkmanager15:59
fungiso probably some outside factor changed15:59
fungiwe updated our container base images for nodepool on june 30, but that's also quite a while ago16:01
clarkband the container base images shouldn't affect building much16:01
clarkbdib makes a chroot and isolates itself from the host env16:01
fungiahh, yep16:02
clarkbit is probably some change to the distro itself that we aren't accommodating properly16:02
*** marios is now known as marios|out16:02
fungi"the distro" meaning rocky, not the builder's distro?16:03
fungirocky linux 8.6 happened in may16:05
fungi~10 weeks ago16:05
NeilHanlonthere are some changes, definitely. i didn't anticipate any making the builds fail though. my apologies16:06
fungi9.0 happened a couple of weeks back though, are we accidentally installing that thinking it's v8?16:06
fungiNeilHanlon: "recent" changes after 8.6?16:07
NeilHanlonthe latest containers I pushed in early July16:08
NeilHanlonthe only difference in the container package set is a langpack, so my assumption is there's a change in config somewhere/somehow going on16:12
*** dviroel|afk is now known as dviroel16:14
fungii suppose we could try to figure out how to roll back to the earlier container for that and test the theory16:14
NeilHanlonif the build does any dnf upgrade, it would go to the latest versions, so i'm not sure that would help16:15
clarkbya I think it will be more productive to work backward from grub doesn't work16:16
clarkbrather than try and identify a needle in a haystack16:16
clarkbit could also be a change to how dib manages grub16:18
NeilHanlon maybe? 16:21
clarkbya I wonder what was in the most recent dib release16:23
fungi`git tag --contains 9987d09` says that first appeared in 3.21.0 tagged 2022-05-0416:26
fungiso it's been in for a while16:26
fungiand we bumped our nodepool images to that dib version the same day16:26
clarkbunrelated, why in 2022 do we still have to do an initial login to set an admin user password16:43
fungiis that for postorius/hyperkitty?16:47
clarkbreading the script that handles it, it looks like you can just not set the vars to create an admin user and I'm hoping doing that is "secure" the docs say you have to do it though so we'll see17:00
clarkbthese containers use alpine too and the user management is weird. I'm worried we might have to replace them. But one step at a time17:01
fungiyeah, i had similar concerns when looking at them. though when i first looked they were building "fat" containers with an init and multiple services17:14
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server
clarkbfbo[m]: ya so far no real deal breakers just some odd decisions (but I've got my own biases). Another thing I noticed is one container uses pymysql to use the db and the other uses the C bound library17:15
clarkber sorry fbo[m]  that was for fungi 17:15
fungiyes, it seems like a somewhat disjoint hodgepodge17:15
clarkbthat latest patchset will almost certainly explode, but it should give us logs I hope for what assumptions we need to work around17:17
clarkbit isn't clear to me how to configure the vhost aspects of mailman3 through the docker containers yet for example17:17
fungiit's been a while since i did the first poc, but i can pull up my notes when i'm at another computer. my recollection though is you don't set up the vhosts, you just add mailing lists to them and they appear as if by magic. also possible i was drunk, no guarantees17:19
clarkbha ok17:20
fungiin mm3 though, i'm pretty sure the domain is just part of the list info17:20
clarkbbut ya I figure a lot of this stuff will make more sense if we just get a thing in CI with some tests17:20
fungiclarkb: my old notes (from 4 years ago, omg!) are at
fungithose include creating mailing lists in multiple domains17:32
fungilooks like you pass the `mailman create` subcommand a parameter for the list address which necessarily includes the domain (unlike in mm2 where it's just the base name for the ml)17:34
fungia downside to that design is that we'll probably need to have some extra validation or maybe a hierarchical data structure (like now) in our meta-config to make sure you don't accidentally typo a new domain into existence17:39
fungiso even though mm3 fully qualifies the listnames now, we could keep our yaml as a nested associative array of domain:list17:41
fungitechnically we organize it by a "site name" and then a list at present, but we could replace the site names with corresponding fqdn from our existing map17:51
fungithis is a good opportunity to simplify all that anyway17:52
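[Editor's note: a sketch of the nested domain:list structure fungi describes — the domains and list names are invented for illustration, not the real opendev config:]

```yaml
# Hypothetical meta-config: top-level keys are mail domains, values are
# the lists under each, so a typo'd domain stands out in review rather
# than silently springing a new domain into existence.
lists.example.org:
  - announce
  - discuss
lists.example.io:
  - dev
```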
clarkbI've booted clarkb-test-rocky in ovh bhs1 just to verify the behavior is consistent there (it is)17:56
clarkbNow I'm going to try rescueing it to see if anything stands out in the image17:56
fungioh, thanks!17:57
fungiNeilHanlon: ^17:57
clarkbside note ovh apparently has uefi images. I wonder if we can boot efi there now17:57
fungii got briefly sidetracked by overdue yardwork and an impending appliance delivery17:57
fungioh neat17:58
clarkbOk first thing I checked is that it isn't somehow rocky 9. /etc/os-release reports 8.618:02
fungioh good18:03
clarkbnext the device label is cloudimg-rootfs so that was set properly18:03
corvusthe zuul pipeline refresh change merged, so that should be in place after this weekends restarts (this is the issue that caused 2 zuul changes to get stuck in gate for 24h)18:04
clarkbfungi: NeilHanlon /boot/grub2/grub.cfg doesn't appear to have any menu entries18:06
clarkbI think we haven't specified what to boot in the grub config18:06
clarkbNeilHanlon: I'm surprised it was able to boot for you?18:07
clarkboh wait18:08
fungithanks corvus!18:08
clarkbI may have derped and looked at the wrong path18:08
NeilHanlonthe image i downloaded from nb02 seems to be fine 18:08
clarkbno I looked at the correct grub config. I looked at the wrong etc/default/grub18:08
NeilHanlonw.r.t. the kernel, i mean18:08
clarkbNeilHanlon: thats the image I booted on ovh18:08
clarkbNeilHanlon: grub.cfg is present but without any entries to boot18:09
fungimaybe grub picks the first definition it finds in that case?18:09
fungibut yeah, the behavior we're observing looks (from the console log) exactly like "i launched grub... what next?"18:10
clarkbwell there isn't anything in the grub config to say what it should boot18:10
clarkb is the image; it should be the one in ovh18:10
NeilHanlonyeah that's the one I grabbed18:11
clarkbNeilHanlon: uh this image only has /boot/grub2/grub.cfg18:11
clarkbthat paste looks like grub 118:11
clarkbthere is also a /boot/efi/EFI here (which on the surface shouldn't affect anything but it is unexpected to me)18:12
clarkbparted shows the partition type is mbr not gpt so I think that implies we aren't trying to build an efi image with dib (as dib assumes gpt with efi?)18:14
clarkbNeilHanlon: can you check your /etc/dib-build-date.txt18:15
clarkber /etc/dib-builddate.txt18:15
clarkb2022-07-29 08:17 is what I've got in there18:15
NeilHanloni'm very confused. i downloaded the image again and now it is booting to a grub console...18:18
NeilHanloni don't believe the lack of menuentries in grub.cfg is a problem; none of my installs have that either18:19
clarkbhow does grub know what to boot in that case?18:19
clarkbit may be that in the efi case shortcuts can be taken but I think in the bios case you have to have something there for grub to boot18:20
clarkb/etc/default/grub appears to be set properly with GRUB_DEVICE=LABEL=cloudimg-rootfs so whatever is generating the grub config is just not producing entries for that device and the kernel?18:21
NeilHanlon this is from a different rocky 8 system18:21
clarkboh! I see no kernels in /boot18:22
NeilHanlonwe don't need those, right?18:22
clarkbI bet the underlying issue is no kernel means grub config isn't properly populated18:23
clarkbin your paste you have stuff under 10_linux starting at line 106 which is empty in these images18:23
clarkbbut ya I suspect if we fix the lack of kernels then grub install will be happy18:23
clarkbI think that is enough info to debug via the build logs and local dib runs now? I'm going to clean up my test instance18:23
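[Editor's note: the working theory above — no kernel in /boot means grub has nothing to generate entries for — can be sanity-checked on a mounted image or rescue instance. The sketch below only simulates that check against a scratch directory; the kernel version string is the one from the build log, and the BLS entry filename is invented:]

```shell
# Simulated /boot layout check: a healthy BLS-style Rocky/RHEL 8 image
# should have both a kernel and at least one loader entry file.
boot=$(mktemp -d)
mkdir -p "$boot/loader/entries"
touch "$boot/vmlinuz-4.18.0-372.16.1.el8_6.0.1.x86_64"
touch "$boot/loader/entries/example-4.18.0.conf"
# On a real image, point these globs at the mounted filesystem's /boot.
ls "$boot"/vmlinuz-* >/dev/null 2>&1 && echo "kernel present"
ls "$boot"/loader/entries/*.conf >/dev/null 2>&1 && echo "BLS entry present"
```

On the broken images discussed here, the first check would have come up empty.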
NeilHanlonfair enough! :) yep, definitely18:24
NeilHanlonthank you and fungi for the assistance thus far18:24
clarkbyou're welcome. Fwiw I was expecting a glean issue because it is always a glean (or networkmanager) issue :)18:25
clarkbneat to find something different18:25
fungialso happy when the problem isn't something we have to fix ;)18:43
NeilHanloni enjoy breaking things in new and exciting ways18:45
fungithat's the thrill of computering18:52
NeilHanlonit seems to be installing the kernel, so I am le confused18:52
NeilHanlonbut that is relatively normal for me, so i'll take it18:53
fungiyou're looking at the image build log i linked earlier, right?18:53
NeilHanlonthat, and running locally18:53
fungi2022-07-29 08:15:30.945 | > ---> Package kernel.x86_64 4.18.0-372.16.1.el8_6.0.1 will be installed18:55
fungithat looks good18:55
fungi2022-07-29 08:16:40.587 | >   Installing       : kernel-headers-4.18.0-372.16.1.el8_6.0.1.x86_64     79/193 18:55
fungioh, wait, wrong package18:55
fungi2022-07-29 08:17:03.186 | >   Installing       : kernel-4.18.0-372.16.1.el8_6.0.1.x86_64            171/193 18:56
fungi"/boot is not a mountpoint" so it wants a separate /boot partition?18:58
clarkbit is common practice to do that but I'm not aware of anyone requiring it be done18:58
fungior is it just that something needs to add a /boot/grub2/grubenv?18:58
NeilHanloni think it is stemming from /boot/loader/entries/ being empty18:59
clarkbit is required to have a fat32 partition for efi at /boot/efi or whatever but we aren't efi'ing18:59
fungiyeah, it's not clear to me whether any of the entries in that paste are errors or just informing the decisions19:00
fungii do see it in our build log too: 2022-07-29 08:29:32.972 | grep: /boot/grub2/grubenv: No such file or directory19:00
fungithough "/boot is not a mountpoint" is not in our log19:01
NeilHanlonyeah, same. will have to dig in, I think. I wonder if it is a separate partition at some point, and then after save it reverts to being a (different) directory? Does that make sense? i.e., something is mounting on top of an existing directory and doing work that gets lost19:02
clarkbif that were happening it would have to do a weird end around dib's disk management. I think that would be weird for physical devices too19:03
NeilHanlonI.. might have found it. Confirming19:06
fungilast touched 2 years ago, did something change in rocky to need it?19:09
fungia recent addition of the iscsi element to rocky maybe necessitates it?19:09
clarkbfungi: I think it is the source of that message you saw but unlikely related to this bug19:09
fungiahh, yes19:09
NeilHanlonno, this wasn't it either. i was hoping it was somehow responsible for making the configs on centos images, but I was mistaken19:10
fungihalf of the problem is that i would normally do a side-by-side comparison of the working and broken builds, but since we also had a bout of rapid-fire build failures we no longer have a good log to compare against19:10
NeilHanlonI believe what needs to happen is adding `GRUB_ENABLE_BLSCFG=true` to /etc/default/grub - but it is not clear to me why it's unnecessary on other images, too19:11
NeilHanlone.g., I expect this to be roughly equivalent to a centos8 one or a rhel 819:12
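[Editor's note: NeilHanlon's hypothesis as a config fragment — whether this actually fixes the Rocky images was not confirmed in-channel. The GRUB_DEVICE line is the one clarkb already observed in the image; only the BLSCFG line would be new:]

```shell
# /etc/default/grub on the built image (sketch; shell-sourced syntax)
GRUB_DEVICE=LABEL=cloudimg-rootfs   # already present per clarkb's inspection
GRUB_ENABLE_BLSCFG=true             # proposed addition, unverified
```

On RHEL 8 family distros this setting makes grub2-mkconfig emit Boot Loader Specification (blscfg) based config instead of explicit menuentry stanzas.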
clarkbwe only boot centos-8-stream now so maybe they have diverged and rhel 8 is also broken19:13
fungiyeah, at this point rocky and openeuler are the closest things we have to rhel19:14
fungior did someone get alma images going too?19:15
NeilHanloni don't think so19:15
clarkbya no alma19:15
*** dasm is now known as dasm|off20:46
clarkbwhen I run into distro specific issues I tend to fire up a container really quickly and sanity check. Unfortunately that doesn't help with kernels and boot stuff20:51
clarkbdib-run-parts Running /tmp/in_target.d/finalise.d/01-clean-old-kernels seems to run and not find any old kernels to clean20:55
BlaisePabon[m]1you had me going for a bit here, because this is (almost) a plausible sentence in Spanish and I was staring at it, trying to figure out what you were trying to say.20:55
clarkbBlaisePabon[m]1: with "ya no alma" ?20:56
BlaisePabon[m]1I imagined a woman named Alma who was giving you a hard time and you were telling her off.20:57
clarkbNeilHanlon: I see in the image size report that 69MiB /opt/dib_tmp/dib_build.5sTNqxWo/built/usr/lib/modules/4.18.0-372.16.1.el8_6.0.1.x86_64 is present. Is it possible that the symlinks in /boot are missing due to package changes?21:00
clarkbmaybe we need another package to add those symlinks?21:00
opendevreviewClark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server
NeilHanlonthat seems reasonable. i'll be trying to sort it later this evening. gotta go do family stuff21:07
clarkbno rush, thanks for looking21:10
*** dviroel is now known as dviroel|out21:33

Generated by 2.17.3 by Marius Gedminas - find it at!