*** | ykarel__ is now known as ykarel | 08:10 |
clarkb | ildikov: that sounds like a symptom of older cached files (basically you end up with a cached parent file that can't require/import old files that are no longer on a new server installation) | 15:09 |
clarkb | did a hard refresh help at all? | 15:09 |
clarkb | infra-root I wonder if the kolla struggles with docker py will affect us. https://review.opendev.org/c/opendev/system-config/+/920115 passed testing and should be affected, and it ran after the updates that broke stuff I think, so probably not | 15:10 |
clarkb | should we go ahead and approve 920115 with a plan to restart on that version today? then I'll be able to retest stuff with the new images before may 31 | 15:10 |
ildikov | Oh, ok. And yes, the etherpad is loaded now. It’s just odd, because this isn’t a browser that’s been running for a long time… | 15:10 |
fungi | ildikov: it's possible chrome had cached the javascript to disk and didn't invalidate it when the server replaced that file with a newer version | 15:13 |
fungi | i suppose a forced refresh invalidates the cached files referenced from that page | 15:14 |
ildikov | I’ll keep an eye on it and see how it behaves from here. | 15:14 |
clarkb | ya a hard refresh is meant to refetch files even if the browser thinks they haven't changed | 15:15 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Clean up unused labels from nl02 config https://review.opendev.org/c/openstack/project-config/+/920190 | 15:36 |
fungi | infra-root: should i self-approve https://review.opendev.org/920149 to move forward with the ubuntu-noble node addition, or does anyone else want to review it first? | 15:38 |
clarkb | fungi: I think it's probably fine to proceed. Worst case we either fail to build images and pause them while we debug, or we do build images but they don't boot, so we disable them. Either way no jobs are using them yet so the blast radius is small | 15:39 |
fungi | fair enough. fire in the hole! | 15:42 |
opendevreview | Merged openstack/project-config master: Add Ubuntu 24.04 LTS (ubuntu-noble) nodes https://review.opendev.org/c/openstack/project-config/+/920149 | 15:49 |
fungi | it's deployed, nb01 building ubuntu-noble since about 9 minutes ago | 16:00 |
fungi | same for nb04 and ubuntu-noble-arm64 | 16:00 |
clarkb | now we wait | 16:01 |
clarkb | it will probably take about 1.5-2 hours until we can see the first boot attempt assuming the build itself is successful | 16:01 |
fungi | still going at 26 minutes elapsed | 16:18 |
opendevreview | Merged opendev/system-config master: Update Gerrit images to 3.8.6 and 3.9.5 https://review.opendev.org/c/opendev/system-config/+/920115 | 16:22 |
clarkb | both the 3.8 and 3.9 images promoted after ^ landed | 16:36 |
clarkb | when do we want to restart gerrit? | 16:36 |
tonyb | It's pretty low risk right? Assuming so I don't see a lot of value in delaying it. So "at the top of the hour" ? | 16:38 |
tonyb | gives other infra-roots time to weigh in | 16:38 |
fungi | yeah, fine by me. i'll be around to help | 16:38 |
clarkb | ya should be low risk given the small delta between what we're running now and this update | 16:39 |
clarkb | ok process should be to do an image pull, docker-compose down, mv replication waiting queue aside, docker-compose up -d, tail logs and check functionality | 16:41 |
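A minimal sketch of the restart sequence clarkb outlines above, assuming it is run on review02 from the directory holding the Gerrit compose file; the compose-file location and the replication waiting-queue path are assumptions, not taken from the log:

    docker-compose pull                 # fetch the newly promoted image
    docker-compose down                 # stop Gerrit
    # set the persisted replication waiting queue aside for later inspection (path assumed)
    mv ~gerrit2/review_site/data/replication/ref-updates/waiting \
       ~gerrit2/replication-waiting-$(date +%Y%m%d)
    docker-compose up -d                # start Gerrit on the new image
    docker-compose logs -f              # tail the logs, then check functionality in the web UI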
clarkb | I've just warned the release team | 16:42 |
tonyb | okay | 16:42 |
tonyb | When do you delete the replication queue ? Once we're happy its working? | 16:43 |
clarkb | tonyb: I actually haven't been deleting them. The queue is a bunch of really tiny files so its "cost" is low and by keeping them around we have data for future fixing/debugging. Though the value of that is minimal at this point as I think I managed to identify and report the cases upstream | 16:44 |
tonyb | Okay | 16:44 |
clarkb | so ya I think we can prune those copied waiting queues the day after a restart or whatever | 16:44 |
tonyb | Makes sense | 16:46 |
clarkb | https://hub.docker.com/layers/opendevorg/gerrit/3.8/images/sha256-69a0493bf9f74fe9798cfa974a814065e6a6cad958fc55656e149c14c76a3d16?context=explore is the image we should expect to end up on. I'll check that after the pull as well | 16:47 |
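One way to make that comparison, as a sketch; the tag is from the log, the exact invocation is an assumption:

    # print the digest of the locally pulled image and compare it with the sha256 shown on Docker Hub
    docker image inspect opendevorg/gerrit:3.8 --format '{{index .RepoDigests 0}}'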
clarkb | how does this look? #status notice There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade. | 16:49 |
tonyb | LGTM | 16:49 |
clarkb | I've started a root screen on review02. Will send that message and do the pull at the top of the hour | 16:51 |
tonyb | +1 | 16:52 |
clarkb | #status notice There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade. | 17:00 |
opendevstatus | clarkb: sending notice | 17:00 |
-opendevstatus- | NOTICE: There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade. | 17:00 |
tonyb | clarkb: image id looks good | 17:01 |
clarkb | the image that was pulled matches what I expected to see in docker hub so that checks out | 17:01 |
clarkb | yup | 17:01 |
clarkb | I'll do the down, queue mv, and up again after the bot reports it is done sending the notice | 17:01 |
tonyb | Okay | 17:02 |
opendevstatus | clarkb: finished sending notice | 17:03 |
tonyb | Powered by Gerrit Code Review (3.8.6-3-g1219a3208f-dirty) | 17:04 |
clarkb | ya the log reported it started up on that version | 17:05 |
clarkb | diffs are loading for me on a random change I clicked on out of my status page | 17:05 |
clarkb | one annoying thing is that half the plugins report they are version 3.9.5, and that is because the 3.9.5 and 3.8.6 versions of those plugins are the same thing | 17:05 |
tonyb | Yup they just started working for me | 17:06 |
tonyb | Ah. | 17:06 |
clarkb | I wonder if we should modify our build process to prune all tags in the plugin repos that are not the version we are checking out so that it doesn't do that | 17:06 |
clarkb | https://gerrit.googlesource.com/plugins/replication/+log/8fd3c271ce0a21480e3d04da5ad2112efea3bedf for example shows that 3.8.6 and 3.9.5 of the replication plugin are identical | 17:06 |
tonyb | We could do that. | 17:07 |
clarkb | slightly worried the risk of doing that is worse than the inconvenience of having misleading versions reported in the log | 17:07 |
clarkb | would have to think about it a bit more; as long as we don't remove what we check out, though, it should be safe since tags are just pointers | 17:08 |
clarkb | just need someone to push a change now | 17:08 |
tonyb | Yeah. I think it's worth looking at but implementing very carefully | 17:08 |
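A rough sketch of the tag-pruning idea discussed above, run inside each plugin checkout during the image build; the tag name and where this would hook into the build are hypothetical. Because tags are only pointers, deleting the extra ones locally does not change the commit being checked out:

    KEEP=v3.8.6                                        # hypothetical: the tag we intend to build from
    git tag | grep -vx "$KEEP" | xargs -r git tag -d   # drop every other local tag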
clarkb | https://review.opendev.org/c/openstack/octavia/+/919846 I think this was a new patchset after the update | 17:08 |
clarkb | this seems to have gone as anticipated so I'm going to close the screen down | 17:10 |
tonyb | Okay | 17:10 |
clarkb | oh someone beat me to it | 17:10 |
tonyb | Ooops | 17:10 |
tonyb | ctrl-d isn't the same as ctrl-a d :/ | 17:10 |
clarkb | no its fine | 17:10 |
clarkb | I was just about to type exit myself | 17:11 |
clarkb | Once we're happy with where the noble test node situation is I'll see about retesting some gerrit upgrade stuff. I'll rotate node holds now | 17:11 |
fungi | seems to be working for me | 17:12 |
fungi | gerrit i mean | 17:12 |
fungi | as for noble images, building is 80 minutes in and hasn't aborted yet, so i take that as a good sign | 17:12 |
clarkb | fwiw I think I'm going to give up on the gerrit doc build target change and/or patching that in locally for now. We can always make an update like that after the 3.9 upgrade but for now the more important thing is actually getting the upgrade done since this isn't a regression | 17:14 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to test the 3.9 upgrade https://review.opendev.org/c/opendev/system-config/+/893571 | 17:15 |
tonyb | clarkb: Sounds reasonable | 17:15 |
fungi | agreed | 17:17 |
clarkb | fungi: fwiw future noble builds should be quicker as dib builds out caching for noble stuff | 17:36 |
fungi | yeah, still building at 105 minutes now | 17:38 |
mnasiadka | Hello - any idea why rechecks/rebase does not make Zuul jobs run for https://review.opendev.org/c/openstack/kayobe/+/910513 ? | 17:42 |
clarkb | mnasiadka: yes this was debugged in #openstack-infra earlier today. The problem is the addition of three github Depends-On entries. In particular we believe the one whose branch was deleted may be affecting zuul's ability to check if the PR is mergeable? | 17:44 |
clarkb | basically zuul is falling over on the mergeability checking it does to ensure the parents are not in a state that would prevent the child from merging. We know that when people open PRs against the same repo and then delete things after the PR is merged, this can impact that. There may also be a token problem but we would need more debugging to say for sure. However, in this case all of the depends on | 17:45 |
clarkb | have merged and the CI jobs don't actually do anything with the depends on, so the best thing in this case is to remove them from the change entirely | 17:45 |
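For illustration, the lines to remove are the Depends-On footers in the change's commit message, which for GitHub pull requests look roughly like:

    Depends-On: https://github.com/<org>/<repo>/pull/<number>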
mnasiadka | clarkb: makes sense, let me check | 17:55 |
mnasiadka | Yeah, running now - thanks clarkb | 17:58 |
fungi | ubuntu-noble-arm64 is still building, but ubuntu-noble has been ready for about 25 minutes and is uploading to providers (ready in both ovh regions already) | 18:02 |
fungi | a ready node has also been successfully booted at 158.69.64.166 in ovh-bhs1 | 18:03 |
fungi | i can ssh into it as root | 18:03 |
fungi | so we can probably run a test job as soon as someone feels like lining it up | 18:04 |
clarkb | it denied my key. I think I have it loaded properly according to ssh-add -l | 18:04 |
clarkb | oh wait it's because we haven't rotated keys on test nodes yet, pebkac | 18:04 |
clarkb | fungi: I would suggest adding noble jobs to zuul-jobs since that tenant should already use ansible 9 | 18:05 |
clarkb | do you want to do that or should I? | 18:05 |
fungi | i can set it up shortly | 18:06 |
clarkb | that should also get us feedback on where any incompatibilities might be since zuul-jobs has a broad set of distro release reach | 18:07 |
fungi | clarkb: how would you recommend going about it? just add to tools/update-test-platforms.py and regenerate the configs? | 18:50 |
fungi | tox -re update-test-platforms | 18:52 |
fungi | looks like it did a thing | 18:52 |
opendevreview | Jeremy Stanley proposed zuul/zuul-jobs master: Add ubuntu-noble testing https://review.opendev.org/c/zuul/zuul-jobs/+/920205 | 18:53 |
fungi | it's already got a job running on the initial ready node | 18:54 |
fungi | https://zuul.opendev.org/t/zuul/stream/ba64de4048db4865965233c114c7c751?logfile=console.log | 18:54 |
fungi | failed in pre-run: https://zuul.opendev.org/t/zuul/build/ba64de4048db4865965233c114c7c751 | 18:56 |
fungi | unbound failed to start, guess we'll need to hold a node to check logs for that | 18:56 |
fungi | i've added an autohold for zuul-jobs-test-bindep-ubuntu-noble since it hasn't run yet | 18:58 |
fungi | also one for zuul-jobs-test-ensure-nox-ubuntu-noble since it's running right now so might get us a sample a few minutes sooner | 19:00 |
fungi | oh, though it's not going to hold on a retry, is it | 19:01 |
fungi | added a hold for zuul-jobs-test-validate-zone-db-ubuntu-noble too since it's starting its third attempt soon | 19:01 |
fungi | error: unable to open /var/lib/unbound/root.key for reading: No such file or directory | 19:16 |
Clark[m] | Different paths maybe? | 19:17 |
fungi | split to a new package: unbound-anchor | 19:17 |
fungi | maybe. still verifying | 19:18 |
fungi | oh, it's dns-root-data on ubuntu | 19:19 |
fungi | okay, yeah installing dns-root-data (recommended by the unbound package) gets it working | 19:24 |
fungi | i'll re-test that on another held node, since i have several to choose from | 19:25 |
fungi | yeah, that's what fixed it, just installing that one package allows the unbound service to start successfully, and then i can resolve through it over the loopback | 19:27 |
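Roughly what that verification looks like on the held node; the name resolved at the end is just an example:

    sudo apt-get install -y dns-root-data   # the package split out of unbound
    sudo systemctl restart unbound
    dig +short @127.0.0.1 opendev.org       # confirm resolution over the loopback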
fungi | oh, even better, we already did this for debian-bookworm | 19:28 |
johnsom | Yeah, the root key is usually in a different package as it changes at a different rate | 19:29 |
fungi | well, it apparently wasn't until ~recently | 19:29 |
johnsom | I think it was in unbound-anchor before. At least that is what I remember | 19:29 |
fungi | maybe it moved from a depends to a recommends when the package name changed | 19:31 |
fungi | unfortunately, packages.ubuntu.com is choking on me again, as it often does | 19:31 |
johnsom | Yeah, same for me. lol | 19:31 |
fungi | though transition from depends to recommends is exactly what happened in debian-bookworm according to frickler's commit message in https://review.opendev.org/c/openstack/project-config/+/887570 a year ago | 19:34 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Fix unbound setup for ubuntu-noble https://review.opendev.org/c/openstack/project-config/+/920208 | 19:35 |
johnsom | Yeah, so unbound-anchor was depends in Jammy, and now dns-root-data is recommends | 19:37 |
johnsom | Finally got the pages to load | 19:37 |
Clark[m] | Weird that it is only a recommends if it doesn't work at all without it. Or maybe that is a side effect of our config doing dnssec validation? | 19:37 |
johnsom | https://www.irccloud.com/pastebin/GrCQSy5R/ | 19:39 |
johnsom | Interesting. Good to know as I bet I'm going to have upgrade issues | 19:40 |
johnsom | https://www.irccloud.com/pastebin/NAWZjrmP/ | 19:41 |
clarkb | back from lunch and I'll go ahead and approve that change after I double check the json is proper | 19:56 |
fungi | pretty sure i dusted it with the requisite number of commas | 20:03 |
opendevreview | Merged openstack/project-config master: Fix unbound setup for ubuntu-noble https://review.opendev.org/c/openstack/project-config/+/920208 | 20:09 |
fungi | clarkb: we need to delete the existing images from nodepool to trigger new ones with that ^ right? | 20:09 |
clarkb | fungi: you did. I just know that anytime I edit json directly I'm more likely to get it wrong than not | 20:10 |
clarkb | fungi: I think you can explicitly tell nodepool to build new images instead | 20:10 |
clarkb | alternatively you can delete the old ones that should work too | 20:10 |
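Either option can be driven from the nodepool CLI on the builder (fungi shows the full docker-compose exec wrapper just below); a sketch, with a made-up build id:

    nodepool image-build ubuntu-noble                  # ask for a fresh build explicitly
    nodepool dib-image-list                            # or find the existing build...
    nodepool dib-image-delete ubuntu-noble-0000000001  # ...and delete it so a replacement is scheduled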
fungi | oh, good. also the arm64 image is still building. been 4h20m so far | 20:10 |
fungi | hopefully the cache it's building up will help the rerun | 20:10 |
clarkb | ++ | 20:10 |
fungi | sudo docker-compose -f /etc/nodepool-builder-compose/docker-compose.yaml exec nodepool-builder nodepool image-build ubuntu-noble | 20:13 |
fungi | and ubuntu-noble-arm64 | 20:13 |
clarkb | yes, it might ignore the arm64 request since one is in progress. I can't remember | 20:13 |
fungi | though i think that's not going to work for that one, right | 20:13 |
fungi | seems to have been ignored | 20:14 |
fungi | not a big deal, that one's not holding up testing for now | 20:14 |
fungi | clarkb: interestingly, it wasn't ignored, it was merely queued up. once nb04 stopped building the previous version, it immediately began building the new one | 20:40 |
clarkb | oh cool. I wasn't sure how it would resolve the request and whether or not it would queue it up or ignore it because the requested task was already in progress | 20:42 |
fungi | though related question... should i have waited for the deploy to complete before starting new image builds? | 20:42 |
fungi | just realized the change triggered a deploy job | 20:43 |
fungi | infra-prod-service-nodepool | 20:43 |
fungi | i think the elements get checked out directly from git when building an image though, so should be fine as long as it was merged | 20:44 |
clarkb | I'm not sure about that | 20:45 |
clarkb | I think the project-config checkout may be what the builds rely on | 20:45 |
clarkb | and they don't self bootstrap that | 20:46 |
fungi | mmm | 20:46 |
clarkb | you can confirm via the elements list in the build string I think /me looks | 20:46 |
fungi | or i can keep an eye on the log and see if it installs the missing package | 20:48 |
clarkb | the container bind mounts /opt/project-config in | 20:49 |
clarkb | I'm not finding where we tell dib to find the elements in that repo yet, but I think the fact we bind mount it implies we're relying on the ansible managed checkout there | 20:49 |
fungi | can't really start a new image build until this finishes anyway, though i suppose it's possible the deploy job updated the bind-mounted files before dib will end up reading them | 20:51 |
clarkb | ELEMENTS_PATH=/etc/nodepool/elements is in the environment | 20:51 |
clarkb | which isn't /opt/project-config | 20:51 |
clarkb | ok /etc/nodepool/elements is a symlink to /opt/project-config/nodepool/elements | 20:52 |
clarkb | and that is configured via the elements_dir config option. So ya we are using the ansible managed checkout | 20:53 |
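A quick way to confirm which checkout the builder reads elements from, as a sketch; the config file path inside the container is an assumption:

    readlink -f /etc/nodepool/elements                      # should resolve to the project-config checkout
    grep -E 'elements[-_]dir' /etc/nodepool/nodepool.yaml   # the builder config option that points there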
clarkb | fungi: it's possible the update would've occurred quickly enough to apply to the running build | 20:53 |
fungi | right, that's what i also surmised above | 20:54 |
clarkb | I don't see the current build getting to installing unbound yet so ya | 20:54 |
fungi | will check the log once it reaches that stage | 20:54 |
clarkb | the image build is nearing the end of the git repo stuff; it should move more quickly after that is done | 21:17 |
clarkb | fungi: I think it may be using an old version of the file. Maybe it copies them all into the image build at build start time | 21:21 |
clarkb | in the bit where it says "the following packages will be installed", unbound is listed but not dns-root-data (dns-root-data is listed above under recommended packages) | 21:22 |
fungi | yeah, i see that. oh well, i'll queue up more image builds now | 21:31 |
fungi | i may still have time to clear out the remaining ready node later this evening once those complete and then recheck the zuul-jobs change to see what else is broken | 21:33 |
opendevreview | Julia Kreger proposed openstack/diskimage-builder master: simple-init: Swap continue for true https://review.opendev.org/c/openstack/diskimage-builder/+/920215 | 21:39 |
fungi | i think it didn't actually start another ubuntu-noble build, checking the logs on that last one | 23:39 |
fungi | yeah, it didn't | 23:43 |
fungi | i'll try to coerce it now | 23:43 |
fungi | there it goes | 23:43 |