ianw | clarkb/NeilHanlon: comparing the dump of the grubby install https://paste.opendev.org/show/bAZtHOkYK22lhvMr4zew/ it does seem to be missing LABEL= | 00:02 |
---|---|---|
ianw | those two are taken from build logs on nob01 | 00:02 |
ianw | nb01 even | 00:03 |
Clark[m] | If you look at older builds it seems the info call didn't emit much and definitely not the uuid either | 00:04 |
ianw | i think this is the same sort of problem | 00:04 |
ianw | https://c4b8fbf52a989902d3ec-7920090a053c319fb201bfc42c18b31c.ssl.cf2.rackcdn.com/840144/6/check/dib-nodepool-functional-openstack-rockylinux-9-containerfile-src/0255c6c/nodepool/builds/test-image-0000000001.log | 00:04 |
ianw | that sets root="LABEL=cloudimg-rootfs" and i bet that it gets that from the node it's booted on | 00:05 |
ianw | i wonder how related https://review.opendev.org/c/openstack/diskimage-builder/+/851687 is | 00:08 |
ianw | i wonder if we used a different label in test somehow | 00:30 |
ianw | we have --root-label= | 00:33 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment https://review.opendev.org/c/openstack/diskimage-builder/+/853573 | 01:19 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment https://review.opendev.org/c/openstack/diskimage-builder/+/853573 | 01:23 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] trigger builds with different root labels https://review.opendev.org/c/openstack/diskimage-builder/+/853575 | 01:23 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [dnm] trigger builds with different root labels https://review.opendev.org/c/openstack/diskimage-builder/+/853575 | 01:28 |
ianw | ^ hopefully that *fails* -- it would show us the root disk isn't being set correctly | 01:50 |
*** ysandeep|out is now known as ysandeep | 05:29 | |
*** ysandeep is now known as ysandeep|ruck | 06:01 | |
ianw | clarkb/NeilHanlon -- that did fail @ https://zuul.opendev.org/t/openstack/build/0ed7322cc25d41588ddaec00435ea4aa/log/instances/16cf4a50-e107-494a-88f8-194f5115061c/console.log | 07:28 |
ianw | job is https://zuul.opendev.org/t/openstack/build/0ed7322cc25d41588ddaec00435ea4aa/logs | 07:28 |
ianw | dib build is https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0ed/853575/2/check/dib-nodepool-functional-openstack-rockylinux-9-containerfile-src/0ed7322/nodepool/builds/test-image-0000000001.log | 07:28 |
*** jpena|off is now known as jpena | 07:38 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:34 | |
*** ysandeep|lunch is now known as ysandeep | 10:05 | |
*** ysandeep is now known as ysandeep|ruck | 10:07 | |
*** dviroel|afk is now known as dviroel | 11:39 | |
*** ysandeep|ruck is now known as ysandeep|afk | 14:38 | |
*** ysandeep|afk is now known as ysandeep | 14:57 | |
*** ysandeep is now known as ysandeep|PTO | 15:05 | |
*** dasm|off is now known as dasm | 15:09 | |
*** marios is now known as marios|out | 15:30 | |
NeilHanlon | ianw: so, just to make sure I'm understanding correctly. The image being built is being given a LABEL, but the installed grub boot option uses UUID regardless. right? | 15:45 |
clarkb | NeilHanlon: I think it is a bit more subtle than that. It uses the build hosts boot device. In production that is a uuid. In testing (and this is the reason it passes testing) it used a label because our test nodes boot using a disk label | 15:51 |
NeilHanlon | ah, okay. I think I am understanding. The nodepool stuff is still a bit of a mystery to me. trying to piece it all together | 15:52 |
clarkb | NeilHanlon: the stack ianw pushed above overrides the LABEL value so that it is different than the one on the test node which causes it to fail in testing. We were getting lucky that disk image builder was used to build the test nodes so the label was the same "cloudimg-rootfs" on the dib host and the resulting image we want to test so it just worked | 15:52 |
clarkb | NeilHanlon: I think the question now is how the grub install on rocky 9 is ignoring the image build's /etc/default/grub value, which sets the label and the UUID avoidance flag, and still finding the host disk info | 15:56 |
clarkb | once we figure that out we can update https://review.opendev.org/c/openstack/diskimage-builder/+/853575 with the fix and the test should pass at that point | 15:57 |
NeilHanlon | agreed. it's been funky since the beginning and I'm at the point where I feel we (i've) missed something obvious here | 15:59 |
NeilHanlon | though at least we are not alone. Our releng lead for Rocky has been having almost the same issues in his lab with EL9 | 16:00 |
*** jpena is now known as jpena|off | 16:42 | |
clarkb | ianw: I left a question on 853573 | 17:09 |
tristanC | clarkb: so we collected all the periodic pipelines and their job count in this page: https://softwarefactory-project.io/weeder/tenant/rdoproject.org/info | 17:40 |
tristanC | the openstack-promote-component pipeline has 64 jobs and it triggers every 2 hours. Is there a place where we can see when rdo's zuul load is causing trouble for review.opendev.org? | 17:41 |
clarkb | tristanC: cacti is what I've been looking at http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=70350&rra_id=all shows a spike of tcp connections double typical background when it has problems | 17:42 |
tristanC | clarkb: oh, so it's not even daily? the only 2x spike i can see is between the 16th and 17th of august | 17:45 |
clarkb | tristanC: correct, it has happened twice at roughly the same time but at least a day apart | 17:45 |
clarkb | (at least twice I should say) | 17:45 |
clarkb | I think after the first one you assumed periodic jobs were doing it, but I'm not sure we have any evidence of that yet. What we did have evidence of was the sf account having a significant number of open connections during the connection spikes | 17:46 |
clarkb | it happened Monday and Wednesday at around 0800 UTC | 17:47 |
tristanC | clarkb: thanks, i'll check the scheduler log at those dates | 17:49 |
clarkb | Looks like there was a spike at 0800 ish today as well, just minimal compared to Wednesday's | 17:49 |
clarkb | I suspect it may be occurring daily, but depending on other system load we may not cross our limit thresholds | 17:50 |
clarkb | and maybe when fungi is back from vacation we can land https://review.opendev.org/c/opendev/system-config/+/853528 and restart gerrit (I suppose we can do that this week, but making a change like that on friday is maybe not the best idea) | 17:52 |
opendevreview | Clark Boylan proposed openstack/diskimage-builder master: DNM test if GRUB_DISABLE_UUID fixes rocky 9 boots https://review.opendev.org/c/openstack/diskimage-builder/+/853691 | 18:18 |
clarkb | I don't really expect that to fix things, but it was the bit I found yesterday and figured testing it was easy so why not | 18:18 |
fungi | i'm good with 853528 but not really around if it goes sideways | 19:15 |
*** rcastillo|rover_ is now known as rcastillo|rover | 19:16 | |
*** dviroel is now known as dviroel|afk | 20:32 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to Gitea 1.17 https://review.opendev.org/c/opendev/system-config/+/847204 | 21:13 |
clarkb | 1.17.1 has happened | 21:13 |
clarkb | looks like the rocky 9 build against 853691 does indeed fail as expected. That wasn't the fix | 21:19 |
ianw | i'm sure it will have to do with the bls uuid matching again. i'll try to pull something up soon | 22:01 |
*** dasm is now known as dasm|off | 22:08 | |
clarkb | Looks like the runc issue preventing docker exec got fixed. Not sure how long until we see it in a release though | 22:16 |
*** rlandy_ is now known as rlandy|out | 23:13 | |
*** rcastillo|rover is now known as rcastillo | 23:13 |
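
The root-label problem discussed above (the built image picking up the build host's `root=` value instead of its own label) can be illustrated with a small check. This is only a sketch, not part of diskimage-builder: the function names `root_spec` and `leaked_from_host` and the label `dib-test-rootfs` are hypothetical, and the sample command lines are made up for illustration. It parses the `root=` argument of a kernel command line and reports whether it matches the label the image was supposed to use.

```python
def root_spec(cmdline):
    """Return (kind, value) for the root= argument of a kernel command line.

    kind is "LABEL", "UUID", "PARTUUID", or "DEVICE"; returns None when
    no root= argument is present.
    """
    for token in cmdline.split():
        if token.startswith("root="):
            # Tolerate a quoted value such as root="LABEL=cloudimg-rootfs".
            value = token[len("root="):].strip('"')
            for kind in ("LABEL", "UUID", "PARTUUID"):
                if value.startswith(kind + "="):
                    return kind, value[len(kind) + 1:]
            return "DEVICE", value
    return None


def leaked_from_host(image_cmdline, expected_label):
    """True when the image does not boot from the label it was built with."""
    return root_spec(image_cmdline) != ("LABEL", expected_label)


# Hypothetical example: the image was built with a non-default label, but
# its boot entry still carries the build host's label.
built = 'ro root="LABEL=cloudimg-rootfs" console=ttyS0'
print(leaked_from_host(built, "dib-test-rootfs"))
```

With the default label on both the build host and the image, the same check would report no mismatch, which is consistent with why the tests passed until the label was overridden.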