Nick | Message | Time |
---|---|---|
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: centos-minimal: boot test 9-stream https://review.opendev.org/c/openstack/diskimage-builder/+/814844 | 00:07 |
ianw | clarkb: after much faffing about, i am 99% sure the fedora 34 boot issues are from https://bugzilla.redhat.com/show_bug.cgi?id=2010058 | 04:42 |
ianw | https://etherpad.opendev.org/p/f34-ci-boot | 04:51 |
frickler | ianw: seems you used the f35 advisory code instead of the one for f34 | 05:34 |
frickler | comment 29 vs. 30 | 05:36 |
ianw | ah, yes | 06:31 |
ianw | i think that is too early still anyway... | 06:34 |
ianw | fedora-34-0000007913.log ... trying with an explicit pre-install of dracut, then upgrading it | 06:44 |
opendevreview | Martin Kopec proposed opendev/system-config master: Adjust RefStack build for osf->openinfra rename https://review.opendev.org/c/opendev/system-config/+/808480 | 07:26 |
opendevreview | Alfredo Moralejo proposed openstack/diskimage-builder master: Add support for CentOS Stream 9 in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/811392 | 07:40 |
*** ykarel is now known as ykarel|lunch | 07:49 | |
*** mazzy5096 is now known as mazzy509 | 08:10 | |
*** mazzy5098 is now known as mazzy509 | 08:23 | |
opendevreview | daniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing https://review.opendev.org/c/openstack/project-config/+/815260 | 09:29 |
opendevreview | daniel.pawlik proposed openstack/project-config master: Setup configuration for project openstack/ci-log-processing https://review.opendev.org/c/openstack/project-config/+/815024 | 09:32 |
*** ykarel|lunch is now known as ykarel | 09:35 | |
*** noonedeadpunk_ is now known as noonedeadpunk | 10:36 | |
ianw | i think fedora-34-0000007919.log will work | 10:36 |
*** ysandeep is now known as ysandeep|afk | 10:48 | |
*** dviroel|rover|afk is now known as dviroel|rover | 11:10 | |
opendevreview | daniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing https://review.opendev.org/c/openstack/project-config/+/815260 | 11:20 |
ianw | clarkb: https://nb02.opendev.org/fedora-34-0000007919.log did apply the change (search updates-testing) but TBH i'm not sure that alone is enough to get the initramfs updated. we also have a dracut-regenerate element which might need to get involved | 11:43 |
ianw | it should upload soon, but i won't have time to check it tonight | 11:43 |
*** ysandeep|afk is now known as ysandeep | 11:50 | |
opendevreview | daniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing https://review.opendev.org/c/openstack/project-config/+/815260 | 12:06 |
fungi | ianw: i can see where that would impact booting on rackspace (xen), but it's quite curious if it also fixes kvm-based providers | 12:45 |
*** jpena|off is now known as jpena | 12:59 | |
*** artom_ is now known as artom | 13:06 | |
*** ykarel is now known as ykarel|away | 13:25 | |
*** cloudnull2 is now known as cloudnull | 13:41 | |
opendevreview | Alfredo Moralejo proposed openstack/project-config master: Add support for CentOS Stream 9 in nodepool elements https://review.opendev.org/c/openstack/project-config/+/811442 | 13:58 |
opendevreview | Merged openstack/project-config master: Mirror newly added charms to GitHub https://review.opendev.org/c/openstack/project-config/+/814888 | 14:16 |
opendevreview | daniel.pawlik proposed openstack/project-config master: Add project openstack/ci-log-processing https://review.opendev.org/c/openstack/project-config/+/815260 | 14:23 |
fungi | didn't we hide the tarball/zipball misfeature of gitea at one point? https://opendev.org/opendev/bindep/tags | 15:14 |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:15 | |
clarkb | fungi: we disable "releases" https://opendev.org/opendev/bindep/releases which has no content. I'm not sure the tags bit is a problem if it doesn't claim to be more than a tarball of a tag state. However, not sure if we can disable those anyway | 15:17 |
fungi | ahh, okay | 15:17 |
clarkb | fungi: https://docs.gitea.io/en-us/config-cheat-sheet/ DEFAULT_REPO_UNITS is the relevant config | 15:18 |
clarkb | and disabled repo units | 15:19 |
clarkb | basically it is imperfect but we do what we can | 15:19 |
fungi | yep, thanks! | 15:20 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.15.5 https://review.opendev.org/c/opendev/system-config/+/815326 | 15:23 |
clarkb | fungi: ^ we should probably go ahead and land that today too to keep up with upstream | 15:23 |
fungi | yeah, today's good for that | 15:24 |
clarkb | the person with gerrit issues responded to me. I think directly. Sounds like they want to remove the email from the old account and allow gerrit to create a new account. I'll confirm that with them and if they ack that I've understood correctly we can go about making that happen | 15:28 |
fungi | ahh, good. thanks for the update! | 15:30 |
fungi | and yeah, i didn't see any reply from them, so it must have gone only to you and not the ml | 15:30 |
*** ysandeep is now known as ysandeep|out | 15:32 | |
clarkb | re fedora I guess removing xen drivers would explain the problem for rax. But ya not sure what happened with ovh or iweb. Wouldn't surprise me if the xen driver cleanup came with cleanups for other drivers we need in those clouds though | 15:34 |
fungi | right, that's what i realized after i asked, i didn't actually look at the fix so it might re-add a number of missing modules | 15:35 |
fungi | i suppose we should have some booting the new image now | 15:36 |
fungi | checking | 15:36 |
clarkb | I just rechecked https://review.opendev.org/c/opendev/bindep/+/814809 which will queue up at least one f34 job | 15:37 |
fungi | one building in rax-iad for the past 9 minutes, and another in airship-kna1 for 7 minutes | 15:37 |
fungi | that doesn't bode well | 15:37 |
fungi | your recheck has added another building in rax-iad now | 15:37 |
fungi | but yeah, fedora-34 0000007919 has been available in all providers for ~3-3.5 hours | 15:39 |
clarkb | frickler: if you have a moment can you rereview https://review.opendev.org/c/opendev/system-config/+/815134 for zuul sigusr2 docs? | 15:39 |
fungi | 0027083402 in rax-iad has been building for almost 12 minutes now. something tells me it's going to time out | 15:40 |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/815049 should be a straightforward comment only change that helps clarify some behavior in our ansible playbooks | 15:40 |
clarkb | fungi: it could be slowness from the new image needing to be copied to the hypervisors | 15:40 |
fungi | but also 0027083406 in airship-kna1, so i now worry we've broken the image even for the places where it was booting successfully before | 15:41 |
clarkb | but ya we could've made it worse | 15:41 |
fungi | yeah, it could still be hypervisor image caches warming | 15:41 |
frickler | clarkb: approved | 15:48 |
clarkb | frickler: thanks! | 15:48 |
frickler | this small fix for dib gentoo builds might be worth a look, too https://review.opendev.org/c/openstack/diskimage-builder/+/814866 | 15:48 |
fungi | still no in-use fedora-34 nodes, but now 0027083653 is trying to build in ovh-bhs1 | 15:50 |
fungi | i'll see if i can spot why the earlier ones didn't do anything | 15:50 |
fungi | timeout waiting for connection on port 22 | 15:52 |
fungi | for both rax-iad and airship-kna1 | 15:52 |
fungi | looks like we're probably no longer successfully booting fedora-34 anywhere | 15:52 |
fungi | i'll try to scrape a console from one of the currently building nodes | 15:53 |
clarkb | fungi: vexxhost was the other successful location when I did manual boots. I'm not sure that we'll schedule much there automatically on fedora 34 right now due to flavor sizes though | 15:53 |
fungi | booting node i looked at in ovh has a console full of kernel panics i think, seems to have exceeded the buffer for it so i don't see any of the early boot output | 15:55 |
clarkb | that would imply the previous panics were not a result of bad image uploads (unless we got lucky twice) | 15:56 |
fungi | kernel panic on one in rax-ord as well | 15:58 |
clarkb | ok that is new behavior for rax I think | 15:59 |
clarkb | seems like before it never got that far with the kernel in rax | 15:59 |
clarkb | I wonder if the fix is incomplete in that case and the kernel is just broken | 15:59 |
fungi | unfortunately the kernel panic is so verbose i can't seem to capture the start of it for proper context | 16:00 |
fungi | looks like the buffer may be limited to 102400 bytes | 16:01 |
fungi | 100 KiB | 16:02 |
clarkb | is that also true of nova's console log show command? | 16:03 |
clarkb | (not sure where that buffer limit might be) | 16:03 |
fungi | that's what i'm using | 16:03 |
opendevreview | Merged opendev/system-config master: Document Zuul's SIGUSR2 handler https://review.opendev.org/c/opendev/system-config/+/815134 | 16:03 |
clarkb | gerrit 3.5's first RC has happened | 16:05 |
fungi | welp, should we pause fedora-34 image builds and delete the most recent one? granted, the older one we still have is from ~8 hours ago so may not be viable either | 16:08 |
fungi | at this point we should probably discourage anyone from trying to run jobs on fedora-34, period | 16:09 |
clarkb | ya looking at scrollback it isn't clear to me if we iterated through a few broken builds and then thought the last one would be happy or if we just did the last one thinking it would be good | 16:09 |
clarkb | fungi: I think what we can do is undo the change that ianw documented in the etherpad and then rebuild to go back to where we were before | 16:10 |
fungi | er, discourage anyone from trying to run jobs on fedora at all. we never had fedora-33 working apparently, and we deleted fedora-32 a while back | 16:10 |
fungi | so at the moment this is our only fedora | 16:10 |
clarkb | the undo of the dnf stuff for f34 seems likely to be the most reliable option available to us | 16:11 |
clarkb | since as you mention older images are of unknown function too | 16:11 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:11 | |
clarkb | fungi: ^ should I go ahead and do that? | 16:15 |
clarkb | heh no vi on the image | 16:15 |
fungi | clarkb: yeah i need to pull up that pad | 16:15 |
fungi | but sounds like the next logical step | 16:16 |
fungi | https://etherpad.opendev.org/p/f34-ci-boot | 16:16 |
clarkb | ya that etherpad. No ed/ex/vi/nano/emacs. There is sed | 16:17 |
clarkb | I guess I can give sed line numbers to prefix with # | 16:17 |
fungi | order | 16:18 |
fungi | er, wrong terminal | 16:18 |
clarkb | `sed -e '14,17s/^/#/' /usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update` but add a -i to do it in place | 16:19 |
clarkb | fungi: if that looks sane I'll do that on nb01 and nb02 and we can ask nodepool to build a new image | 16:21 |
fungi | clarkb: running that against the current copy from the dib source tree seems to do nothing? | 16:24 |
clarkb | fungi: according to the etherpad those lines were hand patched in on nb01 and nb02 and my sed comments them out | 16:25 |
fungi | though i guess there's more in the file on the image | 16:25 |
fungi | and the idea is to comment out lines from 14-16 or some such? | 16:25 |
clarkb | 14-17 yup | 16:25 |
fungi | er, or has that been edited on the builders? | 16:25 |
clarkb | those are the lines added according to the etherpad. Commenting them out should hopefully produce an image that boots again in airship and vexxhost (but likely nowhere else) | 16:25 |
clarkb | fungi: yes, the etherpad says those 4 lines were hand patched on the builder images on nb01 and nb02 | 16:26 |
fungi | inside the containers i guess | 16:26 |
clarkb | yes | 16:26 |
fungi | found it, /var/lib/docker/overlay2/faf1aed676e6d6a9bcab23fc50b32b4a48fda396b48752ecbfafcf977e7e8ad5/merged/usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update on nb01 | 16:28 |
fungi | and yeah, that'll comment out a conditional block for $DISTRO_NAME == "fedora" && $DIB_RELEASE -ge 34 | 16:28 |
fungi | which is preinstalling dracut and upgrading to updates-testing for the FEDORA-2021-e4843341ca advisory | 16:29 |
fungi | clarkb: okay, i've confirmed the sed command will do what we want | 16:29 |
fungi | i say go for it | 16:29 |
clarkb | There is also a /usr/local/lib/python3.7/site-packages/diskimage_builder/elements/yum/pre-install.d/00-dnf-update~ and I wonder if run-parts will find and run that too? | 16:30 |
clarkb | (I hope not but wouldn't be surprised if it doesn't filter those files properly) | 16:30 |
fungi | i... hope not, yeah | 16:30 |
clarkb | I'll run the sed against both files | 16:30 |
fungi | i'd just clear out the editor backup | 16:30 |
fungi | i suppose emacs must write those by default on our systems | 16:31 |
clarkb | fungi: it has different content though and I didn't want to remove that in case it was relevant to debugging | 16:32 |
clarkb | it wasn't bad to comment it out. I'm going to ask for a rebuild now | 16:32 |
clarkb | (I did the comments on both nb01 and nb02) | 16:33 |
clarkb | and build requested | 16:33 |
fungi | thanks | 16:33 |
clarkb | time for breakfast | 16:34 |
opendevreview | Merged openstack/diskimage-builder master: Fix bootloader installation for gentoo https://review.opendev.org/c/openstack/diskimage-builder/+/814866 | 17:02 |
*** jpena is now known as jpena|off | 17:05 | |
clarkb | fungi: dpawlik do we need a governance change before landing https://review.opendev.org/c/openstack/project-config/+/815260 ? | 17:07 |
clarkb | I'm happy to approve it as is now but wanted to double check we aren't getting ahead of ourselves for some reason (also might be worth waiting on the gitea upgrade before approving to avoid any order issues there, but I can approve it after the upgrade easily enough) | 17:08 |
fungi | clarkb: i suggested one in my comments, but it's not critical that the governance change exist before we create the repo i think, since we already have agreement from this sig chair | 17:13 |
fungi | couldn't hurt though | 17:13 |
clarkb | in that case I guess I can approve it once the gitea upgrade is done | 17:14 |
fungi | and yeah, upgrade first, surely | 17:15 |
opendevreview | Merged opendev/system-config master: Upgrade gitea to 1.15.5 https://review.opendev.org/c/opendev/system-config/+/815326 | 18:00 |
clarkb | The hourly jobs queued up before ^ so we're waiting about 35 minutes before we do the upgrade (I'm keeping an eye on it) | 18:01 |
fungi | thanks | 18:05 |
clarkb | gitea01 has updated and lgtm | 18:38 |
fungi | yeah, stepped away for a moment but testing them now | 18:51 |
fungi | seems to still work as intended | 18:52 |
clarkb | 08 hasn't updated but the other 7 have and seem happy | 18:53 |
clarkb | I expect 08 will be done shortly | 18:53 |
clarkb | and now 08 is done. The job should be completing shortly. I'm going to approve the openstack/ci-log-processing repo change as soon as this job completes | 18:56 |
clarkb | oh wait just saw a bug I think | 18:56 |
clarkb | dpawlik: please see comments on https://review.opendev.org/c/openstack/project-config/+/815260 | 18:57 |
fungi | oh good catch, i entirely missed that typo | 18:59 |
*** dviroel|rover is now known as dviroel|rover|out | 20:56 | |
ianw | clarkb/fungi: sorry, looking at scrollback now | 21:15 |
ianw | adding that dracut package making it globally not bootable was certainly not an expected outcome ... | 21:16 |
ianw | (it also ignores files with ~ on the end) | 21:18 |
clarkb | ianw: ok so the ~ thing was probably unrelated | 21:20 |
ianw | oh, doh, i guess this is it | 21:21 |
ianw | 2021-10-25 10:58:43.585 | > Problem: package dracut-config-generic-055-3.fc34.x86_64 requires dracut = 055-3.fc34, but none of the providers can be installed | 21:22 |
ianw | 2021-10-25 10:58:43.586 | > - cannot install both dracut-055-3.fc34.x86_64 and dracut-055-5.fc34.x86_64 | 21:22 |
ianw | 2021-10-25 10:58:43.586 | > - cannot install the best candidate for the job | 21:22 |
ianw | ... although, it still installed it | 21:23 |
ianw | but it downgraded dracut | 21:23 |
ianw | 2021-10-25 10:58:43.593 | > Downgrading: | 21:23 |
ianw | 2021-10-25 10:58:43.593 | > dracut x86_64 055-3.fc34 updates 347 k | 21:23 |
ianw | so the end result of all this *should* have actually just been nothing -- it ended up with the old version of dracut anyway | 21:25 |
ianw | fungi: i'm not seeing any console log in your homedir on nl01? | 21:32 |
clarkb | ianw: you should be able to boot the old image to get one at least? We didn't delete the image just replaced it with a revert of the in place patch of dib | 21:34 |
fungi | ianw: mmm. checking again, maybe i typed wrong | 21:36 |
fungi | ianw: oops, sorry, bridge.o.o | 21:36 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [wip] regenerate initramfs with FEDORA-2021-e4843341ca https://review.opendev.org/c/openstack/diskimage-builder/+/815385 | 21:38 |
ianw | interesting ... i did not see an oops | 21:40 |
ianw | will see if the logs for ^ seem like it makes something sane | 21:42 |
fungi | okay, this is weird... /afs/.openstack.org/docs/charm-guide/latest/release-notes/index.html has changes which do not appear in the read-only replica, however i don't see any errors releasing the docs volume in our logs | 21:49 |
fungi | oh, nevermind, i should learn to read more closely | 21:50 |
fungi | release ERROR Release of docs failed | 21:50 |
fungi | earliest occurrence was 2021-10-25 15:00:02,165 | 21:52 |
fungi | looks like there might be a stuck release lock for the replica on afs01.ord | 21:55 |
fungi | though the server seems fine | 21:55 |
fungi | i've manually unlocked and started a new vos release | 22:04 |
fungi | vos status showed afs01.ord doing deletevolume for that replica, and now it's running restore | 22:05 |
fungi | i expect this to take a while given the rtt between dfw and ord | 22:07 |
ianw | :/ | 22:10 |
ianw | fungi: i'm not sure but you might like to stop the cron job on ... mirror-update(?) that releases docs periodically? | 22:10 |
fungi | yeah, i'll hold the flock it checks | 22:10 |
ianw | clarkb: if you get a chance to check out https://review.opendev.org/c/zuul/zuul-jobs/+/815089 that would be good, some dstat updates. interested if it works for your browser (or, any browser that's not my firefox) | 22:13 |
fungi | if i'm mathing correctly, it should complete in roughly 4 hours: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2264&rra_id=all | 22:22 |
ianw | yeah, about 10 megabit is the limit | 22:27 |
clarkb | ianw: it works in my FF and Chrome installs. Any other testing you think we should do or should I go ahead and approve it? | 22:31 |
ianw | clarkb: i feel like that's about it ... i feel like it's better than what it currently does at least | 22:32 |
ianw | no response to my pull request -- if we get past a couple of weeks i think it would probably make sense to import it with my latest changes to an opendev.org and just maintain it there, similar to lodgeit | 22:33 |
clarkb | wfm | 22:33 |
corvus | ianw: agreed and thanks! | 22:34 |
fungi | #status log The OpenStack docs volume in AFS has been stuck for replication since 15:00 UTC, so a full release has been initiated which should complete in roughly 4 hours | 22:34 |
opendevstatus | fungi: finished logging | 22:34 |
ianw | https://2645a96313d0095b3bcc-38806e64f2d3daae89d7ad3776e6aee4.ssl.cf5.rackcdn.com/815385/1/check/dib-nodepool-functional-openstack-fedora-34-containerfile-src/5b12ca8/nodepool/builds/test-image-0000000001.log | 22:49 |
ianw | looks like dracut-regenerate doesn't work anyway ... sigh | 22:50 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: [wip] regenerate initramfs with FEDORA-2021-e4843341ca https://review.opendev.org/c/openstack/diskimage-builder/+/815385 | 23:00 |
opendevreview | Merged zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch https://review.opendev.org/c/zuul/zuul-jobs/+/815089 | 23:32 |
clarkb | the agenda seems pretty empty after I cleared out old topics. Maybe this is a good thing | 23:36 |
clarkb | *tomorrow's meeting agenda I mean | 23:36 |
clarkb | I guess we can always talk about extra stuff when we have plenty of extra time tomorrow :) | 23:36 |
clarkb | I'll give it a few more minutes in case anyone wants to add something (or have me add an extra topic) | 23:37 |
ianw | clarkb: want to start planning for gerrit 3.4? | 23:38 |
clarkb | probably a good idea | 23:38 |
ianw | i'll be happy to drive that one if you like, last time seemed to go well enough | 23:39 |
ianw | (timestamp that for famous last words) | 23:39 |
clarkb | ianw: I added some thoughts to the agenda around what we should be thinking about before upgrading. But also happy to help. | 23:41 |
ianw | ++ | 23:41 |
clarkb | and agenda sent. | 23:43 |
clarkb | oh we should also test the revert | 23:44 |
clarkb | I did that with 3.3 -> 3.2 on a held test node | 23:44 |
clarkb | we should do the same with a held 3.4 -> 3.3 | 23:44 |
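
A note on the `sed -e '14,17s/^/#/'` command from 16:19: it prefixes `#` to every line from 14 through 17 of the hand-patched `00-dnf-update` script, which disables the conditional block added on nb01 and nb02. The Python sketch below mirrors that transformation as a rough illustration only. The helper name `comment_out_lines` and the 20-line stand-in text are assumptions for the example, not content from the builders.

```python
def comment_out_lines(text: str, first: int, last: int, prefix: str = "#") -> str:
    """Prefix lines first..last (1-indexed, inclusive) with `prefix`.

    Roughly equivalent to: sed -e 'first,lasts/^/prefix/'
    Lines outside the range are returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    # Clamp the upper bound so a short file does not raise IndexError.
    for index in range(first - 1, min(last, len(lines))):
        lines[index] = prefix + lines[index]
    return "".join(lines)


# Stand-in content: 20 numbered lines (hypothetical, not the real script).
sample = "".join(f"line {n}\n" for n in range(1, 21))
patched = comment_out_lines(sample, 14, 17)
patched_lines = patched.splitlines()
```

Previewing the result before writing it back corresponds to running `sed` without `-i` first, as was done in the log.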
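
On the 16:30 question about the `00-dnf-update~` editor backup: ianw notes at 21:18 that files with `~` on the end are ignored. Classic Debian `run-parts` only runs files whose names consist entirely of ASCII letters, digits, underscores, and hyphens. The sketch below assumes the dib runner applies an equivalent naming rule, which the log confirms only for the `~` case. `runnable_names` is a hypothetical helper, not part of diskimage-builder.

```python
import re

# Naming rule of classic run-parts: letters, digits, underscore, hyphen only.
# Assumption: the dib runner filters names in an equivalent way.
RUN_PARTS_NAME = re.compile(r"[A-Za-z0-9_-]+")


def runnable_names(names):
    """Return the names a run-parts style runner would consider, sorted."""
    return sorted(name for name in names if RUN_PARTS_NAME.fullmatch(name))


candidates = ["00-dnf-update", "00-dnf-update~", "01-extra.bak"]
selected = runnable_names(candidates)
```

Under this rule the backup file is skipped because `~` falls outside the allowed character set, so commenting it out as well was harmless but likely unnecessary.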