Thursday, 2022-08-18

ianwclarkb/Neilhanlon: comparing the dump of the grubby install it does seem to be missing LABEL=00:02
ianwthose two are taken from build logs on nob0100:02
ianwnb01 even00:03
Clark[m]If you look at older builds it seems the info call didn't emit much and definitely not the uuid either00:04
ianwi think this is the same sort of problem00:04
ianwthat sets root="LABEL=cloudimg-rootfs" and i bet that it gets that from the node it's booted on00:05
ianwi wonder how related is00:08
ianwi wonder if we used a different label in test somehow00:30
ianwwe have --root-label=00:33
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] trigger builds with different root labels
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] trigger builds with different root labels
ianw^ hopefully that *fails* -- it would show us the root disk isn't being set correctly01:50
*** ysandeep|out is now known as ysandeep05:29
*** ysandeep is now known as ysandeep|ruck06:01
ianwclarkb/NeilHanlon -- that did fail @
ianwjob is
ianwdib build is
*** jpena|off is now known as jpena07:38
*** ysandeep|ruck is now known as ysandeep|lunch08:34
*** ysandeep|lunch is now known as ysandeep10:05
*** ysandeep is now known as ysandeep|ruck10:07
*** dviroel|afk is now known as dviroel11:39
*** ysandeep|ruck is now known as ysandeep|afk14:38
*** ysandeep|afk is now known as ysandeep14:57
*** ysandeep is now known as ysandeep|PTO15:05
*** dasm|off is now known as dasm15:09
*** marios is now known as marios|out15:30
NeilHanlonianw: so, just to make sure I'm understanding correctly. The image being built is being given a LABEL, but the installed grub boot option uses UUID regardless. right?15:45
clarkbNeilHanlon: I think it is a bit more subtle than that. It uses the build hosts boot device. In production that is a uuid. In testing (and this is the reason it passes testing) it used a label because our test nodes boot using a disk label15:51
NeilHanlonah, okay. I think I am understanding. The nodepool stuff is still a bit of a mystery to me. trying to piece it all together15:52
clarkbNeilHanlon: the stack ianw pushed above overrides the LABEL value so that it is different than the one on the test node which causes it to fail in testing. We were getting lucky that disk image builder was used to build the test nodes so the label was the same "cloudimg-rootfs" on the dib host and the resulting image we want to test so it just worked15:52
clarkbNeilHanlon: I think the question now is how is the grub install on rocky 9 ignoring the image build's /etc/default/grub value which sets the label and UUID avoidance flag and still finding the host disk info15:56
clarkbonce we figure that out we can update with the fix and the test should pass at that point15:57
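(Editor's aside: the failure mode discussed above, where the built image ends up with the build host's root= value instead of its own label, can be checked with a small script. This is a minimal sketch, not anything diskimage-builder actually runs. The function names, the sample command lines, and the UUID value are hypothetical; only the "cloudimg-rootfs" label comes from the discussion.)

```python
import shlex


def root_spec(cmdline):
    """Return the value of the last root= argument on a kernel command line, or None."""
    spec = None
    for token in shlex.split(cmdline):
        if token.startswith("root="):
            spec = token[len("root="):]
    return spec


def boots_by_label(cmdline, expected_label):
    """True only if the command line selects the root device by the expected label."""
    return root_spec(cmdline) == "LABEL=" + expected_label


# Hypothetical command lines, for illustration only.
good = 'BOOT_IMAGE=/vmlinuz ro root="LABEL=cloudimg-rootfs" console=ttyS0'
leaked = "BOOT_IMAGE=/vmlinuz ro root=UUID=1234-abcd console=ttyS0"

print(boots_by_label(good, "cloudimg-rootfs"))
print(boots_by_label(leaked, "cloudimg-rootfs"))
```

(Building the test image with a label that differs from the test node's label, as the [dnm] change above does, is what makes a check like this meaningful: if both share a label, a leaked host value looks correct.)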
NeilHanlonagreed. it's been funky since the beginning and I'm at the point where I feel we (i've) missed something obvious here15:59
NeilHanlonthough at least we are not alone. Our releng lead for Rocky has been having almost the same issues in his lab with EL916:00
*** jpena is now known as jpena|off16:42
clarkbianw: I left a question on 85357317:09
tristanCclarkb: so we collected all the periodic pipelines and their job count in this page:
tristanCthe openstack-promote-component pipeline has 64 jobs and it triggers every 2 hours. Is there a place where we can see when is the rdo's zuul load causing trouble for
clarkbtristanC: cacti is what I've been looking at; it shows a spike of tcp connections, double the typical background, when it has problems17:42
tristanCclarkb: oh, so it's not even daily? the only 2x spike i can see is between the 16th and 17th of august17:45
clarkbtristanC: correct it has happened twice at roughly the same time but at least a day apart17:45
clarkb(at least twice I should say)17:45
clarkbI think after the first one you assumed periodic jobs were doing it, but I'm not sure we have any evidence of that yet. What we did have evidence of was the sf account having a significant number of open connections during the connection spikes17:46
clarkbit happened Monday and Wednesday at around 0800 UTC17:47
tristanCclarkb: thanks, i'll check the scheduler log at those dates17:49
clarkbLooks like there was a spike at 0800 ish today as well just minimal compared to Wednesday's17:49
clarkbI suspect it may be occurring daily, but depending on other system load we may not cross our limit thresholds17:50
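(Editor's aside: the "double the typical background" reading of the cacti graph can be expressed as a simple check against a median baseline. This is a rough sketch under stated assumptions: the sample data is invented for illustration, the function name is hypothetical, and cacti's own thresholding is not modelled here.)

```python
from statistics import median


def find_spikes(samples, factor=2.0):
    """Return the (timestamp, count) pairs whose count is at least factor times the median."""
    baseline = median(count for _, count in samples)
    return [(ts, count) for ts, count in samples if count >= factor * baseline]


# Invented hourly TCP connection counts, not real measurements.
samples = [
    ("06:00", 100),
    ("07:00", 110),
    ("08:00", 230),
    ("09:00", 105),
    ("10:00", 95),
]

print(find_spikes(samples))
```

(A smaller daily bump, like the one seen today, would pass unnoticed at factor=2.0 but show up with a lower factor, which matches the suspicion that the event recurs daily without always crossing the limit.)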
clarkband maybe when fungi is back from vacation we can land and restart gerrit (I suppose we can do that this week, but making a change like that on friday is maybe not the best idea)17:52
opendevreviewClark Boylan proposed openstack/diskimage-builder master: DNM test if GRUB_DISABLE_UUID fixes rocky 9 boots
clarkbI don't really expect that to fix things, but it was the bit I found yesterday and figured testing it was easy so why not18:18
fungii'm good with 853528 but not really around if it goes sideways19:15
*** rcastillo|rover_ is now known as rcastillo|rover19:16
*** dviroel is now known as dviroel|afk20:32
opendevreviewClark Boylan proposed opendev/system-config master: Update to Gitea 1.17
clarkb1.17.1 has happened21:13
clarkblooks like the rocky 9 build against 853691 does indeed fail as expected. That wasn't the fix21:19
ianwi'm sure it will have to do with the bls uuid matching again.  i'll try to pull something up soon22:01
*** dasm is now known as dasm|off22:08
clarkbLooks like the runc issue preventing docker exec got fixed. Not sure how long until we see it in a release though22:16
*** rlandy_ is now known as rlandy|out23:13
*** rcastillo|rover is now known as rcastillo23:13
