Wednesday, 2024-05-22

*** ykarel__ is now known as ykarel  [08:10]
<clarkb> ildikov: that sounds like a symptom of older cached files (basically you end up with a cached parent file that can't require/import old files that are no longer on a new server installation)  [15:09]
<clarkb> did a hard refresh help at all?  [15:09]
<clarkb> infra-root I wonder if the kolla struggles with docker py will affect us. https://review.opendev.org/c/opendev/system-config/+/920115 passed testing and should be affected and ran after the updates that broke stuff I think, so probably not  [15:10]
<clarkb> should we go ahead and approve 920115 with a plan to restart on that version today? then I'll be able to retest stuff with the new images before may 31  [15:10]
<ildikov> Oh, ok. And yes, the etherpad is loaded now. It’s just odd, because this isn’t a browser that’s been running for a long time…  [15:10]
<fungi> ildikov: it's possible chrome had cached the javascript to disk and didn't invalidate it when the server replaced that file with a newer version  [15:13]
<fungi> i suppose a forced refresh invalidates the cached files referenced from that page  [15:14]
<ildikov> I’ll keep an eye on it and see how it behaves from here.  [15:14]
<clarkb> ya, a hard refresh is meant to refetch files even if the browser thinks they haven't changed  [15:15]
<opendevreview> Jeremy Stanley proposed openstack/project-config master: Clean up unused labels from nl02 config  https://review.opendev.org/c/openstack/project-config/+/920190  [15:36]
<fungi> infra-root: should i self-approve https://review.opendev.org/920149 to move forward with the ubuntu-noble node addition, or does anyone else want to review it first?  [15:38]
<clarkb> fungi: I think it's probably fine to proceed. Worst case we either fail to build images and then pause them while we debug, or we do build images and they don't boot so we disable them. Either way no jobs are using them yet so the blast radius is small  [15:39]
<fungi> fair enough. fire in the hole!  [15:42]
<opendevreview> Merged openstack/project-config master: Add Ubuntu 24.04 LTS (ubuntu-noble) nodes  https://review.opendev.org/c/openstack/project-config/+/920149  [15:49]
<fungi> it's deployed, nb01 building ubuntu-noble since about 9 minutes ago  [16:00]
<fungi> same for nb04 and ubuntu-noble-arm64  [16:00]
<clarkb> now we wait  [16:01]
<clarkb> it will probably take about 1.5-2 hours until we can see the first boot attempt, assuming the build itself is successful  [16:01]
<fungi> still going at 26 minutes elapsed  [16:18]
<opendevreview> Merged opendev/system-config master: Update Gerrit images to 3.8.6 and 3.9.5  https://review.opendev.org/c/opendev/system-config/+/920115  [16:22]
<clarkb> both the 3.8 and 3.9 images promoted after ^ landed  [16:36]
<clarkb> when do we want to restart gerrit?  [16:36]
<tonyb> It's pretty low risk right?  Assuming so I don't see a lot of value in delaying it.  So "at the top of the hour"?  [16:38]
<tonyb> gives other infra-roots time to weigh in  [16:38]
<fungi> yeah, fine by me. i'll be around to help  [16:38]
<clarkb> ya, should be low risk given the small delta between what we're running now and this update  [16:39]
<clarkb> ok, process should be to do an image pull, docker-compose down, mv the replication waiting queue aside, docker-compose up -d, then tail logs and check functionality  [16:41]
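
(A minimal sketch of that restart sequence, assuming the compose file lives under /etc/gerrit-compose and the replication waiting queue sits under the Gerrit site's data directory; both paths are assumptions, not the documented production layout:)

    cd /etc/gerrit-compose                       # assumed location of the compose file
    docker-compose pull                          # fetch the freshly promoted 3.8 image
    docker-compose down                          # stop Gerrit
    # move the waiting replication events aside instead of deleting them
    mv ~gerrit2/review_site/data/replication/ref-updates/waiting \
       ~gerrit2/review_site/data/replication/ref-updates/waiting.before-restart   # assumed path
    docker-compose up -d                         # start Gerrit on the new image
    docker-compose logs -f                       # tail logs and confirm the expected version starts
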
<clarkb> I've just warned the release team  [16:42]
<tonyb> okay  [16:42]
<tonyb> When do you delete the replication queue? Once we're happy it's working?  [16:43]
<clarkb> tonyb: I actually haven't been deleting them. The queue is a bunch of really tiny files so its "cost" is low, and by keeping them around we have data for future fixing/debugging. Though the value of that is minimal at this point as I think I managed to identify and report the cases upstream  [16:44]
<tonyb> Okay  [16:44]
<clarkb> so ya, I think we can prune those copied waiting queues the day after a restart or whatever  [16:44]
<tonyb> Makes sense  [16:46]
<clarkb> https://hub.docker.com/layers/opendevorg/gerrit/3.8/images/sha256-69a0493bf9f74fe9798cfa974a814065e6a6cad958fc55656e149c14c76a3d16?context=explore is the image we should expect to end up on. I'll check that after the pull as well  [16:47]
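
(One way to confirm the pulled image matches that Docker Hub digest; the tag and format flags here are illustrative:)

    docker pull opendevorg/gerrit:3.8
    # the RepoDigests entry should include the sha256 shown on the Docker Hub layers page
    docker image inspect --format '{{.RepoDigests}}' opendevorg/gerrit:3.8
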
<clarkb> how does this look: #status notice There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade.  [16:49]
<tonyb> LGTM  [16:49]
<clarkb> I've started a root screen on review02. Will send that message and do the pull at the top of the hour  [16:51]
<tonyb> +1  [16:52]
<clarkb> #status notice There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade.  [17:00]
<opendevstatus> clarkb: sending notice  [17:00]
-opendevstatus- NOTICE: There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade.  [17:00]
<tonyb> clarkb: image id looks good  [17:01]
<clarkb> the image that was pulled matches what I expected to see in docker hub so that checks out  [17:01]
<clarkb> yup  [17:01]
<clarkb> I'll do the down, queue mv, and up again after the bot reports it is done sending the notice  [17:01]
<tonyb> Okay  [17:02]
<opendevstatus> clarkb: finished sending notice  [17:03]
<tonyb> Powered by Gerrit Code Review (3.8.6-3-g1219a3208f-dirty)  [17:04]
<clarkb> ya, the log reported it started up on that version  [17:05]
<clarkb> diffs are loading for me on a random change I clicked on out of my status page  [17:05]
<clarkb> one annoying thing is that half the plugins report they are version 3.9.5, and that is because versions 3.9.5 and 3.8.6 are the same thing  [17:05]
<tonyb> Yup, they just started working for me  [17:06]
<tonyb> Ah.  [17:06]
<clarkb> I wonder if we should modify our build process to prune all tags in the plugin repos that are not the version we are checking out so that it doesn't do that  [17:06]
<clarkb> https://gerrit.googlesource.com/plugins/replication/+log/8fd3c271ce0a21480e3d04da5ad2112efea3bedf for example shows that 3.8.6 and 3.9.5 of the replication plugin are identical  [17:06]
<tonyb> We could do that.  [17:07]
<clarkb> slightly worried the risk of doing that is worse than the inconvenience of having misleading versions reported in the log  [17:07]
<clarkb> would have to think about it a bit more; as long as we don't remove what we check out though it should be safe, since tags are just pointers  [17:08]
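
(If that pruning idea were pursued, a sketch of the step run in each plugin checkout before the build might look like this; KEEP_TAG and the plugin path are placeholders:)

    KEEP_TAG=v3.8.6                     # placeholder for the tag we actually check out
    cd plugins/replication              # example plugin checkout, path assumed
    # delete every local tag except the one we build from; tags are just pointers,
    # so the checked-out commit itself is untouched
    git tag | grep -v -x "$KEEP_TAG" | xargs -r git tag -d
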
<clarkb> just need someone to push a change now  [17:08]
<tonyb> Yeah.  I think it's worth looking at but implementing very carefully  [17:08]
<clarkb> https://review.opendev.org/c/openstack/octavia/+/919846 I think this was a new patchset after the update  [17:08]
<clarkb> this seems to have gone as anticipated so I'm going to close the screen down  [17:10]
<tonyb> Okay  [17:10]
<clarkb> oh, someone beat me to it  [17:10]
<tonyb> Ooops  [17:10]
<tonyb> ctrl-d isn't the same as ctrl-a d :/  [17:10]
<clarkb> no, it's fine  [17:10]
<clarkb> I was just about to type exit myself  [17:11]
<clarkb> Once we're happy with where the noble test node situation is I'll see about retesting some gerrit upgrade stuff. I'll rotate node holds now  [17:11]
<fungi> seems to be working for me  [17:12]
<fungi> gerrit i mean  [17:12]
<fungi> as for noble images, the build is 80 minutes in and hasn't aborted yet, so i take that as a good sign  [17:12]
<clarkb> fwiw I think I'm going to give up on the gerrit doc build target change and/or patching that in locally for now. We can always make an update like that after the 3.9 upgrade, but for now the more important thing is actually getting the upgrade done since this isn't a regression  [17:14]
<opendevreview> Clark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to test the 3.9 upgrade  https://review.opendev.org/c/opendev/system-config/+/893571  [17:15]
<tonyb> clarkb: Sounds reasonable  [17:15]
<fungi> agreed  [17:17]
<clarkb> fungi: fwiw future noble builds should be quicker as dib builds out caching for noble stuff  [17:36]
<fungi> yeah, still building at 105 minutes now  [17:38]
<mnasiadka> Hello - any idea why rechecks/rebases do not make Zuul jobs run for https://review.opendev.org/c/openstack/kayobe/+/910513 ?  [17:42]
<clarkb> mnasiadka: yes, this was debugged in #openstack-infra earlier today. The problem is the addition of three github depends-on. In particular we believe the one whose branch was deleted may be affecting zuul's ability to check if the PR is mergeable  [17:44]
<clarkb> basically zuul is falling over during mergeability checking, which ensures the parents are not in a state that would prevent the child from merging. We know that when people do PRs on the same repo and then delete things after the PR is merged this can impact that. There may also be a token problem, but we would need more debugging to say for sure. However, in this case all of the depends-on  [17:45]
<clarkb> have merged and the ci jobs don't actually do anything with the depends-on, so the best thing in this case is to remove them from the change entirely  [17:45]
<mnasiadka> clarkb: makes sense, let me check  [17:55]
<mnasiadka> Yeah, running now - thanks clarkb  [17:58]
<fungi> ubuntu-noble-arm64 is still building, but ubuntu-noble has been ready for about 25 minutes and is uploading to providers (ready in both ovh regions already)  [18:02]
<fungi> a ready node has also been successfully booted at 158.69.64.166 in ovh-bhs1  [18:03]
<fungi> i can ssh into it as root  [18:03]
<fungi> so we can probably run a test job as soon as someone feels like lining it up  [18:04]
<clarkb> it denied my key. I think I have it loaded properly according to ssh-add -l  [18:04]
<clarkb> oh wait, it's because we haven't rotated keys on test nodes yet, pebkac  [18:04]
<clarkb> fungi: I would suggest adding noble jobs to zuul-jobs since that tenant should already use ansible 9  [18:05]
<clarkb> do you want to do that or should I?  [18:05]
<fungi> i can set it up shortly  [18:06]
<clarkb> that should also get us feedback on where any incompatibilities might be since zuul-jobs covers a broad set of distro releases  [18:07]
<fungi> clarkb: how would you recommend going about it? just add to tools/update-test-platforms.py and regenerate the configs?  [18:50]
<fungi> tox -re update-test-platforms  [18:52]
<fungi> looks like it did a thing  [18:52]
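
(For reference, the workflow described here is roughly the following; the exact edit is whatever adds ubuntu-noble to the platform list in that script:)

    cd zuul-jobs
    $EDITOR tools/update-test-platforms.py    # add ubuntu-noble to the platform list
    tox -re update-test-platforms             # regenerate the per-platform test job configs
    git add -A && git review                  # push the regenerated configs for review
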
<opendevreview> Jeremy Stanley proposed zuul/zuul-jobs master: Add ubuntu-noble testing  https://review.opendev.org/c/zuul/zuul-jobs/+/920205  [18:53]
<fungi> it's already got a job running on the initial ready node  [18:54]
<fungi> https://zuul.opendev.org/t/zuul/stream/ba64de4048db4865965233c114c7c751?logfile=console.log  [18:54]
<fungi> failed in pre-run: https://zuul.opendev.org/t/zuul/build/ba64de4048db4865965233c114c7c751  [18:56]
<fungi> unbound failed to start, guess we'll need to hold a node to check logs for that  [18:56]
<fungi> i've added an autohold for zuul-jobs-test-bindep-ubuntu-noble since it hasn't run yet  [18:58]
<fungi> also one for zuul-jobs-test-ensure-nox-ubuntu-noble since it's running right now so might get us a sample a few minutes sooner  [19:00]
<fungi> oh, though it's not going to hold on a retry, is it  [19:01]
<fungi> added a hold for zuul-jobs-test-validate-zone-db-ubuntu-noble too since it's starting its third attempt soon  [19:01]
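
(Those holds were presumably created with something along these lines via zuul-client; the tenant, reason, and count values here are illustrative:)

    zuul-client autohold --tenant zuul \
        --project opendev.org/zuul/zuul-jobs \
        --job zuul-jobs-test-bindep-ubuntu-noble \
        --reason "fungi: debug unbound failure on ubuntu-noble" \
        --count 1
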
<fungi> error: unable to open /var/lib/unbound/root.key for reading: No such file or directory  [19:16]
<Clark[m]> Different paths maybe?  [19:17]
<fungi> split to a new package: unbound-anchor  [19:17]
<fungi> maybe. still verifying  [19:18]
<fungi> oh, it's dns-root-data on ubuntu  [19:19]
<fungi> okay, yeah, installing dns-root-data (recommended by the unbound package) gets it working  [19:24]
<fungi> i'll re-test that on another held node, since i have several to choose from  [19:25]
<fungi> yeah, that's what fixed it, just installing that one package allows the unbound service to start successfully, and then i can resolve through it over the loopback  [19:27]
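
(The check on the held node was essentially the following; commands are illustrative:)

    sudo apt-get install -y dns-root-data     # ships the root trust anchor unbound wants
    sudo systemctl restart unbound            # now starts cleanly
    dig @127.0.0.1 opendev.org +dnssec        # resolve through the local unbound over the loopback
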
<fungi> oh, even better, we already did this for debian-bookworm  [19:28]
<johnsom> Yeah, the root key is usually in a different package as it changes at a different rate  [19:29]
<fungi> well, it apparently wasn't until ~recently  [19:29]
<johnsom> I think it was in unbound-anchor before. At least that is what I remember  [19:29]
<fungi> maybe it moved from a depends to a recommends when the package name changed  [19:31]
<fungi> unfortunately, packages.ubuntu.com is choking on me again, as it often does  [19:31]
<johnsom> Yeah, same for me. lol  [19:31]
<fungi> though a transition from depends to recommends is exactly what happened in debian-bookworm according to frickler's commit message in https://review.opendev.org/c/openstack/project-config/+/887570 a year ago  [19:34]
<opendevreview> Jeremy Stanley proposed openstack/project-config master: Fix unbound setup for ubuntu-noble  https://review.opendev.org/c/openstack/project-config/+/920208  [19:35]
<johnsom> Yeah, so unbound-anchor was a depends in Jammy, and now dns-root-data is a recommends  [19:37]
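
(When packages.ubuntu.com is slow, the same relationship can be checked on any node with apt itself, for example:)

    apt-cache show unbound | grep -E '^(Depends|Recommends):'
    # on noble, dns-root-data appears under Recommends rather than Depends
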
<johnsom> Finally got the pages to load  [19:37]
<Clark[m]> Weird that it is only a recommends if it doesn't work at all without it. Or maybe that is a side effect of our config doing dnssec validation?  [19:37]
<johnsom> https://www.irccloud.com/pastebin/GrCQSy5R/  [19:39]
<johnsom> Interesting. Good to know as I bet I'm going to have upgrade issues  [19:40]
<johnsom> https://www.irccloud.com/pastebin/NAWZjrmP/  [19:41]
<clarkb> back from lunch and I'll go ahead and approve that change after I double check the json is proper  [19:56]
<fungi> pretty sure i dusted it with the requisite number of commas  [20:03]
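
(A quick local sanity check for hand-edited JSON; the file name is a placeholder:)

    python3 -m json.tool path/to/edited-file.json > /dev/null && echo "valid json"
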
<opendevreview> Merged openstack/project-config master: Fix unbound setup for ubuntu-noble  https://review.opendev.org/c/openstack/project-config/+/920208  [20:09]
<fungi> clarkb: we need to delete the existing images from nodepool to trigger new ones with that ^ right?  [20:09]
<clarkb> fungi: you did. I just know that anytime I edit json directly I'm more likely to get it wrong than not  [20:10]
<clarkb> fungi: I think you can explicitly tell nodepool to build new images instead  [20:10]
<clarkb> alternatively you can delete the old ones; that should work too  [20:10]
<fungi> oh, good. also the arm64 image is still building. been 4h20m so far  [20:10]
<fungi> hopefully the cache it's building up will help the rerun  [20:10]
<clarkb> ++  [20:10]
<fungi> sudo docker-compose -f /etc/nodepool-builder-compose/docker-compose.yaml exec nodepool-builder nodepool image-build ubuntu-noble  [20:13]
<fungi> and ubuntu-noble-arm64  [20:13]
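
(Build progress can then be watched from the same container, e.g.:)

    sudo docker-compose -f /etc/nodepool-builder-compose/docker-compose.yaml \
        exec nodepool-builder nodepool dib-image-list | grep ubuntu-noble
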
<clarkb> yes, it might ignore the arm64 request since one is in progress. I can't remember  [20:13]
<fungi> though i think that's not going to work for that one, right  [20:13]
<fungi> seems to have been ignored  [20:14]
<fungi> not a big deal, that one's not holding up testing for now  [20:14]
<fungi> clarkb: interestingly, it wasn't ignored, it was merely queued up. once nb04 stopped building the previous version, it immediately began building the new one  [20:40]
<clarkb> oh cool. I wasn't sure how it would resolve the request and whether it would queue it up or ignore it because the requested task was already in progress  [20:42]
<fungi> though related question... should i have waited for the deploy to complete before starting new image builds?  [20:42]
<fungi> just realized the change triggered a deploy job  [20:43]
<fungi> infra-prod-service-nodepool  [20:43]
<fungi> i think the elements get checked out directly from git when building an image though, so should be fine as long as it was merged  [20:44]
<clarkb> I'm not sure about that  [20:45]
<clarkb> I think the project-config checkout may be what the builds rely on  [20:45]
<clarkb> and they don't self bootstrap that  [20:46]
<fungi> mmm  [20:46]
<clarkb> you can confirm via the elements list in the build string I think /me looks  [20:46]
<fungi> or i can keep an eye on the log and see if it installs the missing package  [20:48]
<clarkb> the container bind mounts /opt/project-config in  [20:49]
<clarkb> I'm not finding where we tell dib to find the elements in that repo yet, but I think the fact we bind mount it implies we're relying on the ansible-managed checkout there  [20:49]
<fungi> can't really start a new image build until this finishes anyway, though i suppose it's possible the deploy job updated the bind-mounted files before dib will end up reading them  [20:51]
<clarkb> ELEMENTS_PATH=/etc/nodepool/elements is in the environment  [20:51]
<clarkb> which isn't /opt/project-config  [20:51]
<clarkb> ok, /etc/nodepool/elements is a symlink to /opt/project-config/nodepool/elements  [20:52]
<clarkb> and that is configured via the elements_dir config option. So ya, we are using the ansible-managed checkout  [20:53]
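
(That chain can be confirmed inside the builder container with something like the following; the config file path is an assumption:)

    env | grep ELEMENTS_PATH                  # -> ELEMENTS_PATH=/etc/nodepool/elements
    readlink -f /etc/nodepool/elements        # -> /opt/project-config/nodepool/elements
    grep -i elements /etc/nodepool/nodepool.yaml   # the builder's elements dir config option
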
<clarkb> fungi: it's possible the update would've occurred quickly enough to apply to the running build  [20:53]
<fungi> right, that's what i also surmised above  [20:54]
<clarkb> I don't see the current build getting to installing unbound yet, so ya  [20:54]
<fungi> will check the log once it reaches that stage  [20:54]
<clarkb> the image build is nearing the end of the git repo stuff; it should move more quickly after that is done  [21:17]
<clarkb> fungi: I think it may be using an old version of the file. Maybe it copies them all into the image build at build start time  [21:21]
<clarkb> in the bit where it says the following packages will be installed, unbound is listed but not dns-root-data (dns-root-data is listed above in recommended packages)  [21:22]
<fungi> yeah, i see that. oh well, i'll queue up more image builds now  [21:31]
<fungi> i may still have time to clear out the remaining ready node later this evening once those complete, and then recheck the zuul-jobs change to see what else is broken  [21:33]
<opendevreview> Julia Kreger proposed openstack/diskimage-builder master: simple-init: Swap continue for true  https://review.opendev.org/c/openstack/diskimage-builder/+/920215  [21:39]
<fungi> i think it didn't actually start another ubuntu-noble build, checking the logs on that last one  [23:39]
<fungi> yeah, it didn't  [23:43]
<fungi> i'll try to coerce it now  [23:43]
<fungi> there it goes  [23:43]
