Monday, 2021-10-25

opendevreviewIan Wienand proposed openstack/diskimage-builder master: centos-minimal: boot test 9-stream
ianwclarkb: after much faffing about, i am 99% sure the fedora 34 boot issues are from
fricklerianw: seems you used the f35 advisory code instead of the one for f34 05:34
fricklercomment 29 vs. 30 05:36
ianwah, yes06:31
ianwi think that is too early still anyway...06:34
ianwfedora-34-0000007913.log ... trying with an explicit pre-install of dracut, then upgrading it06:44
opendevreviewMartin Kopec proposed opendev/system-config master: Adjust RefStack build for osf->openinfra rename
opendevreviewAlfredo Moralejo proposed openstack/diskimage-builder master: Add support for CentOS Stream 9 in DIB
*** ykarel is now known as ykarel|lunch07:49
*** mazzy5096 is now known as mazzy50908:10
*** mazzy5098 is now known as mazzy50908:23
opendevreviewdaniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing
opendevreviewdaniel.pawlik proposed openstack/project-config master: Setup configuration for project openstack/ci-log-processing
*** ykarel|lunch is now known as ykarel09:35
*** noonedeadpunk_ is now known as noonedeadpunk10:36
ianwi think fedora-34-0000007919.log will work10:36
*** ysandeep is now known as ysandeep|afk10:48
*** dviroel|rover|afk is now known as dviroel|rover11:10
opendevreviewdaniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing
ianwclarkb: did apply the change (search updates-testing) but TBH i'm not sure if that is sufficient to get the initramfs updated sufficiently.  we also have a dracut-regenerate element which might need to get involved11:43
ianwit should upload soon, but i won't have time to check it tonight11:43
*** ysandeep|afk is now known as ysandeep11:50
opendevreviewdaniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing
fungiianw: i can see where that would impact booting on rackspace (xen), but it's quite curious if it also fixes kvm-based providers12:45
*** jpena|off is now known as jpena12:59
*** artom_ is now known as artom13:06
*** ykarel is now known as ykarel|away13:25
*** cloudnull2 is now known as cloudnull13:41
opendevreviewAlfredo Moralejo proposed openstack/project-config master: Add support for CentOS Stream 9 in nodepool elements
opendevreviewMerged openstack/project-config master: Mirror newly added charms to GitHub
opendevreviewdaniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing
fungididn't we hide the tarball/zipball misfeature of gitea at one point?
*** dviroel|rover is now known as dviroel|rover|lunch15:15
clarkbfungi: we disable "releases" which has no content. I'm not sure the tags bit is a problem if it doesn't claim to be more than a tarball of a tag state. However, not sure if we can disable those anyway15:17
fungiahh, okay15:17
clarkbfungi: DEFAULT_REPO_UNITS is the relevant config15:18
clarkband disabled repo units15:19
clarkbbasically it is imperfect but we do what we can15:19
fungiyep, thanks!15:20
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade gitea to 1.15.5
clarkbfungi: ^ we should probably go ahead and land that today too to keep up with upstream15:23
fungiyeah, today's good for that15:24
clarkbthe person with gerrit issues responded to me. I think directly. Sounds like they want to remove the email from the old account and allow gerrit to create a new account. I'll confirm that with them and if they ack that I've understood correctly we can go about making that happen15:28
fungiahh, good. thanks for the update!15:30
fungiand yeah, i didn't see any reply from them, so it must have gone only to you and not the ml15:30
*** ysandeep is now known as ysandeep|out15:32
clarkbre fedora I guess removing xen drivers would explain the problem for rax. But ya not sure what happened with ovh or iweb. Wouldn't surprise me if the xen driver cleanup came with cleanups for other drivers we need in those clouds though15:34
fungiright, that's what i realized after i asked, i didn't actually look at the fix so it might re-add a number of missing modules15:35
fungii suppose we should have some booting the new image now15:36
clarkbI just rechecked which will queue up at least one f34 job15:37
fungione building in rax-iad for the past 9 minutes, and another in airship-kna1 for 7 minutes15:37
fungithat doesn't bode well15:37
fungiyour recheck has added another building in rax-iad now15:37
fungibut yeah, fedora-34 0000007919 has been available in all providers for ~3-3.5 hours15:39
clarkbfrickler: if you have a moment can you rereview for zuul sigusr2 docs?15:39
fungi0027083402 in rax-iad has been building for almost 12 minutes now. something tells me it's going to time out15:40
clarkbinfra-root should be a straightforward comment only change that helps clarify some behavior in our ansible playbooks15:40
clarkbfungi: it could be the new image needing to be copied to hypervisor slowness15:40
fungibut also 0027083406 in airship-kna1, so i now worry we've broken the image even for the places where it was booting successfully before15:41
clarkbbut ya we could've made it worse15:41
fungiyeah, it could still be hypervisor image caches warming15:41
fricklerclarkb: approved15:48
clarkbfrickler: thanks!15:48
fricklerthis small fix for dib gentoo builds might be worth a look, too
fungistill no in-use fedora-34 nodes, but now 0027083653 is trying to build in ovh-bhs1 15:50
fungii'll see if i can spot why the earlier ones didn't do anything15:50
fungitimeout waiting for connection on port 22 15:52
fungifor both rax-iad and airship-kna1 15:52
fungilooks like we're probably no longer successfully booting fedora-34 anywhere15:52
fungii'll try to scrape a console from one of the currently building nodes15:53
clarkbfungi: vexxhost was the other successful location when I did manual boots. I'm not sure that we'll schedule much there automatically on fedora 34 right now due to flavor sizes though15:53
fungibooting node i looked at in ovh has a console full of kernel panics i think, seems to have exceeded the buffer for it so i don't see any of the early boot output15:55
clarkbthat would imply the previous panics were not a result of bad image uploads (unless we got lucky twice)15:56
fungikernel panic on one in rax-ord as well15:58
clarkbok that is new behavior for rax I think15:59
clarkbseems like before it never got that far with the kernel in rax15:59
clarkbI wonder if the fix is incomplete in that case and the kernel is just broken15:59
fungiunfortunately the kernel panic is so verbose i can't seem to capture the start of it for proper context16:00
fungilooks like the buffer may be limited to 102400 bytes16:01
clarkbis that also true of nova's console log show command?16:03
clarkb(not sure where that buffer limit might be)16:03
fungithat's what i'm using16:03
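A minimal Python sketch of why a size-capped console buffer hides the start of a long panic, assuming the cap keeps only the most recent bytes. The 102400 figure comes from the log above; where the limit is enforced is not known, and the `tail_buffer` helper and sample console lines are hypothetical.

```python
LIMIT = 102400  # byte cap observed in the log; its origin is an open question


def tail_buffer(console_output: bytes, limit: int = LIMIT) -> bytes:
    """Return only the last `limit` bytes, as a tail-only buffer would."""
    if limit <= 0:
        return b""
    return console_output[-limit:]


# Hypothetical console: one early boot line followed by a very verbose panic.
early_boot = b"[    0.000000] Linux version ...\n"
panic_spam = b"Kernel panic - not syncing ...\n" * 5000

captured = tail_buffer(early_boot + panic_spam)
print(len(captured))           # capped at LIMIT
print(early_boot in captured)  # the early boot context has scrolled out
```

Under that assumption, anything printed before the last 102400 bytes is unrecoverable from the console log alone, which matches the missing early boot output described above.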
opendevreviewMerged opendev/system-config master: Document Zuul's SIGUSR2 handler
clarkbgerrit 3.5's first RC has happened16:05
fungiwelp, should we pause fedora-34 image builds and delete the most recent one? granted, the older one we still have is from ~8 hours ago so may not be viable either16:08
fungiat this point we should probably discourage anyone from trying to run jobs on fedora-34, period16:09
clarkbya looking at scrollback it isn't clear to me if we iterated through a few broken builds and then thought the last one would be happy or if we just did the last one thinking it would be good16:09
clarkbfungi: I think what we can do is undo the change that ianw documented in the etherpad and then rebuild to go back to where we were before16:10
fungier, discourage anyone from trying to run jobs on fedora at all. we never had fedora-33 working apparently, and we deleted fedora-32 a while back16:10
fungiso at the moment this is our only fedora16:10
clarkbthe undo of the dnf stuff for f34 seems likely to be the most reliable option available to us16:11
clarkbsince as you mention older images are of unknown function too16:11
*** dviroel|rover|lunch is now known as dviroel|rover16:11
clarkbfungi: ^ should I go ahead and do that?16:15
clarkbheh no vi on the image16:15
fungiclarkb: yeah i need to pull up that pad16:15
fungibut sounds like the next logical step16:16
clarkbya that etherpad. No ed/ex/vi/nano/emacs. There is sed16:17
clarkbI guess I can give sed line numbers to prefix with #16:17
fungi order16:18
fungier, wrong terminal16:18
clarkb`sed -e '14,17s/^/#/' /usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update` but add a -i to do it in place16:19
clarkbfungi: if that looks sane I'll do that on nb01 and nb02 and we can ask nodepool to build a new image16:21
fungiclarkb: running that against the current copy from the dib source tree seems to do nothing?16:24
clarkbfungi: according to the etherpad those lines were hand patched in on nb01 and nb02 and my sed comments them out16:25
fungithough i guess there's more in the file on the image16:25
fungiand the idea is to comment out lines from 14-16 or some such?16:25
clarkb14-17 yup16:25
fungier, or has that been edited on the builders?16:25
clarkbthose are the lines added according to the etherpad. Commenting them out should hopefully produce an image that boots again in airship and vexxhost (but likely nowhere else)16:25
clarkbfungi: yes, the etherpad says those 4 lines were hand patched on the builder images on nb01 and nb02 16:26
fungiinside the containers i guess16:26
fungifound it, /var/lib/docker/overlay2/faf1aed676e6d6a9bcab23fc50b32b4a48fda396b48752ecbfafcf977e7e8ad5/merged/usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update on nb01 16:28
fungiand yeah, that'll comment out a conditional block for $DISTRO_NAME == "fedora" && $DIB_RELEASE -ge 34 16:28
fungiwhich is preinstalling dracut and upgrading to updates-testing for the FEDORA-2021-e4843341ca advisory16:29
fungiclarkb: okay, i've confirmed the sed command will do what we want16:29
fungii say go for it16:29
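For readers less familiar with sed addresses, a minimal Python sketch of what `sed -e '14,17s/^/#/'` does: it prefixes lines 14 through 17 (1-indexed, inclusive) with `#` and leaves every other line alone. The `comment_out` helper and the placeholder file contents are hypothetical; the real contents of 00-dnf-update are not shown in the log.

```python
def comment_out(lines, first, last):
    """Prefix lines first..last (1-indexed, inclusive) with '#'."""
    return [
        "#" + line if first <= number <= last else line
        for number, line in enumerate(lines, start=1)
    ]


# Placeholder 20-line script standing in for the patched 00-dnf-update.
script = [f"line {n}" for n in range(1, 21)]
patched = comment_out(script, 14, 17)

# Show the lines on either side of the range boundaries.
for number in (13, 14, 17, 18):
    print(number, patched[number - 1])
```

Like the sed command without `-i`, this returns a modified copy and leaves the input untouched, so it can be inspected before anything is written back.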
clarkbThere is also a /usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update~ which I wonder if the runparts will find and run against too?16:30
clarkb(I hope not but wouldn't be surprised if it doesn't filter those files properly)16:30
fungii... hope not, yeah16:30
clarkbI'll run the sed against both files16:30
fungii'd just clear out the editor backup16:30
fungii suppose emacs must write those by default on our systems16:31
clarkbfungi: it has different content though and I didn't want to remove that in case it was relevant to debugging16:32
clarkbit wasn't bad to comment it out. I'm going to ask for a rebuild now16:32
clarkb(I did the comments on both nb01 and nb02)16:33
clarkband build requested16:33
clarkbtime for breakfast16:34
opendevreviewMerged openstack/diskimage-builder master: Fix bootloader installation for gentoo
*** jpena is now known as jpena|off17:05
clarkbfungi: dpawlik do we need a governance change before landing ?17:07
clarkbI'm happy to approve it as is now but wanted to double check we aren't getting ahead of ourselves for some reason (also might be worth waiting on the gitea upgrade before approving to avoid any order issues there, but I can approve it after the upgrade easily enough)17:08
fungiclarkb: i suggested one in my comments, but it's not critical that the governance change exist before we create the repo i think, since we already have agreement from this sig chair17:13
fungicouldn't hurt though17:13
clarkbin that case I guess I can approve it once the gitea upgrade is done17:14
fungiand yeah, upgrade first, surely17:15
opendevreviewMerged opendev/system-config master: Upgrade gitea to 1.15.5
clarkbThe hourly jobs queued up before ^ so we're waiting about 35 minutes before we do the upgrade (I'm keeping an eye on it)18:01
clarkbgitea01 has updated and lgtm18:38
fungiyeah, stepped away for a moment but testing them now18:51
fungiseems to still work as intended18:52
clarkb08 hasn't updated but the other 7 have and seem happy18:53
clarkbI expect 08 will be done shortly18:53
clarkband now 08 is done. The job should be completing shortly. I'm going to approve the openstack/ci-log-processing repo change as soon as this job completes18:56
clarkboh wait just saw a bug I think18:56
clarkbdpawlik: please see comments on
fungioh good catch, i entirely missed that typo18:59
*** dviroel|rover is now known as dviroel|rover|out20:56
ianwclarkb/fungi: sorry, looking at scrollback now21:15
ianwadding that dracut package making it globally not bootable was certainly not an expected outcome ...21:16
ianw(it also ignores files with ~ on the end)21:18
clarkbianw: ok so the ~ thing was probably unrelated21:20
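A minimal sketch of the kind of name filter that makes editor backup files harmless, assuming a run-parts-style rule that only runs files whose names consist entirely of ASCII letters, digits, underscores, and hyphens (the default rule in Debian's run-parts). Whether dib applies exactly this pattern is an assumption; `ALLOWED` and `would_run` are hypothetical names.

```python
import re

# Assumed rule: the whole filename must be letters, digits, '_' or '-'.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def would_run(filename: str) -> bool:
    """True when the whole filename matches the allowed character set."""
    return ALLOWED.fullmatch(filename) is not None


for name in ("00-dnf-update", "00-dnf-update~"):
    print(name, would_run(name))
```

Under that rule the trailing `~` alone is enough to exclude the backup copy, regardless of its contents.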
ianwoh, doh, i guess this is it21:21
ianw2021-10-25 10:58:43.585 | >  Problem: package dracut-config-generic-055-3.fc34.x86_64 requires dracut = 055-3.fc34, but none of the providers can be installed21:22
ianw2021-10-25 10:58:43.586 | >   - cannot install both dracut-055-3.fc34.x86_64 and dracut-055-5.fc34.x86_64 21:22
ianw2021-10-25 10:58:43.586 | >   - cannot install the best candidate for the job21:22
ianw... although, it still installed it21:23
ianwbut it downgraded dracut21:23
ianw2021-10-25 10:58:43.593 | > Downgrading:21:23
ianw2021-10-25 10:58:43.593 | >  dracut                        x86_64 055-3.fc34                  updates 347 k21:23
ianwso the end result of all this *should* have actually just been nothing -- it ended up with the old version of dracut anyway21:25
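A minimal sketch of the conflict in the dnf output above, using only what the log shows: `dracut-config-generic-055-3.fc34` requires exactly `dracut = 055-3.fc34`. It models just that equality check, not dnf's actual dependency solver; `pin_satisfied` is a hypothetical helper.

```python
# Exact-version requirement quoted in the dnf output above.
REQUIRED_DRACUT = "055-3.fc34"


def pin_satisfied(dracut_version: str, required: str = REQUIRED_DRACUT) -> bool:
    """True when the installed dracut matches the exact version pin."""
    return dracut_version == required


for candidate in ("055-5.fc34", "055-3.fc34"):
    print(candidate, pin_satisfied(candidate))
```

With the pin in place, the newer 055-5 build cannot coexist with the config package, which is consistent with the downgrade seen in the transaction.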
ianwfungi: i'm not seeing any console log in your homedir on nl01?21:32
clarkbianw: you should be able to boot the old image to get one at least? We didn't delete the image just replaced it with a revert of the in place patch of dib21:34
fungiianw: mmm. checking again, maybe i typed wrong21:36
fungiianw: oops, sorry, bridge.o.o21:36
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [wip] regenerate initramfs with FEDORA-2021-e4843341ca
ianwinteresting ... i did not see an oops21:40
ianwwill see if the logs for ^ seem like it makes something sane21:42
fungiokay, this is weird... /afs/ has changes which do not appear in the read-only replica, however i don't see any errors releasing the docs volume in our logs21:49
fungioh, nevermind, i should learn to read more closely21:50
fungirelease ERROR    Release of docs failed21:50
fungiearliest occurrence was 2021-10-25 15:00:02,165 21:52
fungilooks like there might be a stuck release lock for the replica on afs01.ord21:55
fungithough the server seems fine21:55
fungii've manually unlocked and started a new vos release22:04
fungivos status showed afs01.ord doing deletevolume for that replica, and now it's running restore22:05
fungii expect this to take a while given the rtt between dfw and ord22:07
ianwfungi: i'm not sure but you might like to stop the cron job on ... mirror-update(?) that releases docs periodically?22:10
fungiyeah, i'll hold the flock it checks22:10
ianwclarkb: if you get a chance to check out that would be good, some dstat updates.  interested if it works for your browser (or, any browser that's not my firefox)22:13
fungiif i'm mathing correctly, it should complete in roughly 4 hours:
ianwyeah, about 10 megabit is the limit22:27
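A rough sketch of the arithmetic behind the estimate, assuming a steady link speed. The volume size is not stated in the log; the 18 GB used here is simply what 4 hours at 10 Mbit/s works out to, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_bytes: float, megabits_per_second: float) -> float:
    """Hours needed to move size_bytes at a steady rate in Mbit/s."""
    bits = size_bytes * 8
    seconds = bits / (megabits_per_second * 1_000_000)
    return seconds / 3600


# Hypothetical size: back-calculated from the log's 4 hours at ~10 Mbit/s.
print(transfer_hours(18e9, 10))
```

Protocol overhead and the round-trip time between dfw and ord would push the real figure higher, so this is best read as a lower bound.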
clarkbianw: it works in my FF and Chrome installs. Any other testing you think we should do or should I go ahead and approve it?22:31
ianwclarkb: i feel like that's about it ... i feel like it's better than what it currently does at least22:32
ianwno response to my pull request -- if we get past a couple of weeks i think it would probably make sense to import it with my latest changes to an and just maintain it there, similar to lodgeit22:33
corvusianw: agreed and thanks!22:34
fungi#status log The OpenStack docs volume in AFS has been stuck for replication since 15:00 UTC, so a full release has been initiated which should complete in roughly 4 hours22:34
opendevstatusfungi: finished logging22:34
ianwlooks like dracut-regenerate doesn't work anyway ... sigh22:50
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [wip] regenerate initramfs with FEDORA-2021-e4843341ca
opendevreviewMerged zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
clarkbthe agenda seems pretty empty after I cleared out old topics. Maybe this is a good thing23:36
clarkb*tomorrow's meeting agenda I mean23:36
clarkbI guess we can always talk about extra stuff when we have plenty of extra time tomorrow :)23:36
clarkbI'll give it a few more minutes in case anyone wants to add something (or have me add an extra topic)23:37
ianwclarkb: want to start planning for gerrit 3.4?  23:38
clarkbprobably a good idea23:38
ianwi'll be happy to drive that one if you like, last time seemed to go well enough23:39
ianw(timestamp that for famous last words)23:39
clarkbianw: I added some thoughts to the agenda around what we should be thinking about before upgrading. But also happy to help.23:41
clarkband agenda sent.23:43
clarkboh we should also test the revert23:44
clarkbI did that with 3.3 -> 3.2 on a held test node23:44
clarkbwe should do the same with a held 3.4 -> 3.3 23:44
