frickler | corvus: clarkb: there is something weird about zuul history. https://zuul.opendev.org/t/openstack/build/a6515b5787b349e490d9c24029d08f80 ran on Sep 1, but if I look at build history for that job, it only shows 49 builds, none earlier than Sep 13 | 01:40 |
---|---|---|
frickler | I'm also not sure why and when that job started failing; it seems related to the switch to bookworm | 01:41 |
Clark[m] | frickler: related to zuul switching to bookworm? | 01:54 |
frickler | Clark[m]: nodepool rather, but I didn't check the details yet. I wanted to start by seeing when exactly the job started failing, but with the zuul amnesia I only have a rough guess so far | 01:55 |
frickler | though https://review.opendev.org/c/zuul/nodepool/+/892697 looks like a probable trigger | 01:57 |
Clark[m] | Oh I thought you were saying the history weirdness might be related to bookworm. But you are saying the actual job result may be | 01:58 |
frickler | ah, yes, sorry for mixing that up. the reason for history lacking things is completely unclear to me | 01:59 |
Clark[m] | I did test that the nodepool builds with dib ran successfully building an Ubuntu focal image in the bookworm switch. But maybe the problem is more specific | 01:59 |
frickler | dib history looks more complete and would match my assumption https://zuul.opendev.org/t/openstack/builds?job_name=nodepool-build-image-siblings&project=openstack/diskimage-builder | 02:00 |
Clark[m] | Got it. Not in a good spot to debug now but can probably look in the morning | 02:00 |
frickler | iiuc the issue is in building an actual nodepool image, not with generic images | 02:01 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 07:21 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 07:33 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.26 https://review.opendev.org/c/zuul/zuul-jobs/+/895597 | 07:45 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Add feature to set --vm-driver name for minikube https://review.opendev.org/c/zuul/zuul-jobs/+/894755 | 07:46 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.26 https://review.opendev.org/c/zuul/zuul-jobs/+/895597 | 07:50 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Bump cri-o default version to 1.28 https://review.opendev.org/c/zuul/zuul-jobs/+/895597 | 08:09 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Change CoreDNS configuration for Minikube https://review.opendev.org/c/zuul/zuul-jobs/+/895604 | 08:38 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Fix tox job when stestr is used https://review.opendev.org/c/zuul/zuul-jobs/+/895606 | 08:55 |
*** amoralej is now known as amoralej|lunch | 10:59 | |
*** amoralej|lunch is now known as amoralej | 12:22 | |
corvus | frickler: Clark it appears to be a pagination error; here is a list of 100 builds: https://zuul.opendev.org/t/openstack/builds?job_name=nodepool-build-image-siblings&project=openstack/openstacksdk&limit=100 | 14:58 |
frickler | corvus: oh, interesting, it only shows 99 builds there for me. some kind of bad/hidden apple that makes zuul think the list is complete? | 15:03 |
frickler | also going to 200 confirms Sep 6 as the date for the job failing | 15:03 |
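frickler's "bad apple" theory can be made concrete with a minimal Python sketch. It is purely hypothetical and not a diagnosis of the actual Zuul bug: it assumes the builds listing is fetched in pages via `skip`/`limit` query parameters on a `/api/tenant/<tenant>/builds` endpoint, and that one row per page is silently dropped after the database query. A client that treats any short page as the end of history then stops after 49 of 50 (or 99 of 100) rows, which happens to mirror the counts reported above.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Hypothetical page fetcher: (skip, limit) -> rows for that page.
Fetch = Callable[[int, int], List[Dict]]


def builds_url(base: str, tenant: str, **params) -> str:
    """Build a builds-listing URL (endpoint path is an assumption)."""
    return f"{base}/api/tenant/{tenant}/builds?{urlencode(params)}"


def fetch_all_naive(fetch: Fetch, limit: int) -> List[Dict]:
    """Fragile pager: stops as soon as a page is shorter than `limit`."""
    rows: List[Dict] = []
    skip = 0
    while True:
        page = fetch(skip, limit)
        rows.extend(page)
        if len(page) < limit:  # a short page is taken to mean "no more data"
            return rows
        skip += limit


def fetch_all_robust(fetch: Fetch, limit: int) -> List[Dict]:
    """Keeps paging by `limit` and stops only when a page comes back empty."""
    rows: List[Dict] = []
    skip = 0
    while True:
        page = fetch(skip, limit)
        if not page:
            return rows
        rows.extend(page)
        skip += limit


def make_fake_fetch(total: int, bad_ids: set) -> Fetch:
    """Fake backend: slices `total` rows, then drops 'bad' rows post-query."""
    data = [{"id": i} for i in range(total)]

    def fetch(skip: int, limit: int) -> List[Dict]:
        return [r for r in data[skip:skip + limit] if r["id"] not in bad_ids]

    return fetch
```

With `make_fake_fetch(300, {10})`, the naive pager returns 49 rows for `limit=50` and 99 for `limit=100`, while the robust one returns all 299 surviving rows. The design point is simply that "page shorter than requested" and "end of data" are not the same signal once anything filters rows after the query.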
*** ykarel is now known as ykarel|away | 15:04 | |
clarkb | is that when we switched to bookworm? | 15:11 |
clarkb | it is | 15:12 |
frickler | corvus: any idea what is happening with the pagination? nothing special found in web-debug.log | 15:14 |
frickler | clarkb: yes, that's the patch I mentioned earlier | 15:14 |
clarkb | ok the issue in the job is trying to install dib bindep packages that apparently don't exist on bookworm anymore | 15:15 |
clarkb | so updating dib to install the correct packages should fix it | 15:15 |
clarkb | I'll work on that | 15:15 |
clarkb | I'm rearranging things in the bindep.txt to hopefully be more future proof | 15:23 |
clarkb | basically stop explicitly matching new releases, which forces us to add every new release, and instead unmatch old releases | 15:23 |
fungi | good call, also makes cleanup easier | 15:27 |
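clarkb's "unmatch old things" approach can be illustrated with a small Python model of bindep-style platform selectors. This is a simplified sketch, not bindep's implementation: the package names `libexample-old`/`libexample-new` are hypothetical, the profile names are illustrative, and the matching rule (any matching negative selector excludes; otherwise any matching positive selector includes) is an assumed simplification of bindep's semantics. It shows why a per-release opt-in rule silently drops a package on a new release like bookworm, while a "whole family minus old releases" rule picks it up automatically.

```python
from typing import Dict, List, Set


def rule_applies(selectors: List[str], host: Set[str]) -> bool:
    """Simplified matching: a matching '!' selector excludes the rule;
    otherwise it applies if it has no positive selectors or any matches."""
    negatives = [s[1:] for s in selectors if s.startswith("!")]
    positives = [s for s in selectors if not s.startswith("!")]
    if any(n in host for n in negatives):
        return False
    return not positives or any(p in host for p in positives)


# Hypothetical host profiles (names are illustrative only).
FOCAL = {"platform:dpkg", "platform:ubuntu", "platform:ubuntu-focal"}
BOOKWORM = {"platform:dpkg", "platform:debian", "platform:debian-bookworm"}

# Old style: opt each new release in explicitly, e.g.
#   libexample-new [platform:debian-bullseye]
EXPLICIT: Dict[str, List[str]] = {
    "libexample-old": ["platform:ubuntu-focal"],
    "libexample-new": ["platform:debian-bullseye"],
}

# New style: match the whole dpkg family and unmatch only old releases, e.g.
#   libexample-old [platform:ubuntu-focal]
#   libexample-new [platform:dpkg !platform:ubuntu-focal]
FUTURE_PROOF: Dict[str, List[str]] = {
    "libexample-old": ["platform:ubuntu-focal"],
    "libexample-new": ["platform:dpkg", "!platform:ubuntu-focal"],
}


def packages_for(rules: Dict[str, List[str]], host: Set[str]) -> List[str]:
    """Return the package names whose selectors match `host`."""
    return [pkg for pkg, sel in rules.items() if rule_applies(sel, host)]
```

Here `packages_for(EXPLICIT, BOOKWORM)` selects nothing, while `packages_for(FUTURE_PROOF, BOOKWORM)` selects `libexample-new`; cleanup later means deleting the old-release lines rather than appending a new one every cycle.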
opendevreview | Clark Boylan proposed openstack/diskimage-builder master: Update bindep rules for Debuntu https://review.opendev.org/c/openstack/diskimage-builder/+/895699 | 15:35 |
clarkb | frickler: ^ I think that will fix it | 15:35 |
clarkb | frickler: did you want to review https://review.opendev.org/c/openstack/project-config/+/895514 to update nodepool image upload timeouts before we approve it? | 15:40 |
frickler | clarkb: I meant to, but got distracted, thx for the reminder | 15:54 |
opendevreview | Merged openstack/project-config master: Set a six hour nodepool image upload timeout https://review.opendev.org/c/openstack/project-config/+/895514 | 16:43 |
frickler | corvus: clarkb: when you restarted zuul from scratch, what exactly did you do to clean up the cache, is that documented somewhere? just some zookeeper commands? | 16:57 |
clarkb | frickler: it is a zuul command that clears out the zk data. Let me find a link | 16:57 |
clarkb | frickler: https://zuul-ci.org/docs/zuul/latest/client.html#delete-state | 16:58 |
frickler | clarkb: ah, I was only looking at zuul-scheduler commands, thx a lot | 17:01 |
frickler | clarkb: bonus question: how do you run that command, when all containers are stopped? some docker foo with manually mounted config volumes? | 17:09 |
clarkb | frickler: I believe that corvus indicated it was executed using the scheduler container and docker-compose run | 17:09 |
clarkb | frickler: something like `docker-compose run zuul-scheduler zuul delete-state` but corvus can confirm | 17:10 |
fungi | frickler: run works by spawning a new container, rather than exec which uses an already running container | 17:10 |
fungi | (and i think run overrides the init so services don't get started in it?) | 17:10 |
frickler | oh, that sounds easy enough, thanks again | 17:11 |
clarkb | run overrides the CMD of the container if you supply an explicit command, but it does not override the ENTRYPOINT | 17:13 |
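The ENTRYPOINT/CMD distinction can be modelled in a few lines of Python. This sketch assumes exec-form ENTRYPOINT and CMD (a shell-form ENTRYPOINT ignores CMD entirely); the `dumb-init` entrypoint and `zuul-scheduler -f` default command are hypothetical placeholders, not the actual contents of the Zuul images.

```python
from typing import List, Optional


def effective_argv(entrypoint: List[str], cmd: List[str],
                   override: Optional[List[str]] = None) -> List[str]:
    """Exec-form model: ENTRYPOINT is always kept; a command passed to
    `run` replaces the default CMD and is appended as its arguments."""
    return entrypoint + (override if override is not None else cmd)


# Hypothetical image settings, for illustration only.
ENTRYPOINT = ["/usr/bin/dumb-init", "--"]
CMD = ["zuul-scheduler", "-f"]

# `docker-compose run zuul-scheduler zuul delete-state` would then execute:
print(effective_argv(ENTRYPOINT, CMD, ["zuul", "delete-state"]))
# -> ['/usr/bin/dumb-init', '--', 'zuul', 'delete-state']
```

Because `run` starts a fresh container rather than attaching to a running one, this works even with all services stopped, and the default service command is never launched.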
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Revert "Disable base role testing that runs code on localhost" https://review.opendev.org/c/zuul/zuul-jobs/+/895708 | 17:18 |
opendevreview | Merged openstack/project-config master: Retire os-win: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894419 | 17:24 |
opendevreview | Merged openstack/project-config master: Retire compute-hyperv: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894420 | 17:25 |
opendevreview | Merged openstack/project-config master: Retire networking-hyperv: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894441 | 17:27 |
opendevreview | Merged openstack/project-config master: Retire oswin-tempest-plugin: remove project from infra https://review.opendev.org/c/openstack/project-config/+/894442 | 17:27 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Revert "Disable base role testing that runs code on localhost" https://review.opendev.org/c/zuul/zuul-jobs/+/895708 | 17:36 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Pin python-subunit in fetch-subunit-output test https://review.opendev.org/c/zuul/zuul-jobs/+/895715 | 17:36 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Unpin stestr and python-subunit in fetch-subunit-output test https://review.opendev.org/c/zuul/zuul-jobs/+/895716 | 17:36 |
frickler | just ftr I did 'docker-compose run zuul-scheduler bash' and then 'zuul delete-state' inside and it worked fine | 17:45 |
frickler | (on my local zuul, no need to worry ;) | 17:46 |
fungi | corvus: odd behavior that might be regex related, not sure... https://opendev.org/openstack/requirements/src/branch/stable/train/.zuul.d/cross-jobs.yaml#L185-L197 defines a job parented to build-openstack-sphinx-docs which https://zuul.opendev.org/t/openstack/job/build-openstack-sphinx-docs says is only relevant on stable/rocky, yet there is no config error for that reported in | 18:55 |
fungi | https://zuul.opendev.org/t/openstack/config-errors | 18:55 |
fungi | change https://review.opendev.org/891629 is attempting to remove the definition scoped to stable/rocky, and is getting the config error about stable/stein of openstack/requirements using it | 18:55 |
fungi | https://zuul.opendev.org/t/openstack/job/cross-osc-build-sphinx-docs also correctly shows only being relevant for stable/stein (despite having a parent that is only relevant for stable/rocky) | 18:57 |
fungi | it's left me scratching my head | 18:57 |
frickler | I think https://opendev.org/openstack/requirements/src/branch/stable/stein/.zuul.d/cross-jobs.yaml#L180-L192 is the questionable job reference, not the stable/train link above | 19:09 |
frickler | after reading the docs, my understanding of job.branches is that it only limits when the job will be run, it doesn't block the "parent" composition. so a job definition with "parent: build-openstack-sphinx-docs" will only become a config error once the last definition of that parent job gets removed, even if already earlier there is no chance that this job variant will ever be executed | 19:13 |
fungi | yeah, maybe it's just not intuitive to me that you can parent a job to a definition on another branch that doesn't match (either implicitly, explicitly, or via default fallback) | 19:15 |
fungi | i agree that seems possible though | 19:15 |
corvus | fungi: the parent definition doesn't specify a branch, it just specifies a job, and there's no way to know if the job will run without an exemplar ref (and even in the case where it seems like there should be no possibility, it's possible for a child to alter things to encourage zuul to use a parent on a different branch). | 19:23 |
corvus | fungi: so if you want to try to rewire your brain to match zuul: jobs are abstract ideas that exist independent of branches; branches are just things that are used to decide which variants of jobs to use. | 19:24 |
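frickler's reading and corvus's "jobs exist independent of branches" framing can be captured in a toy Python model. It is a deliberately simplified illustration, not Zuul's implementation: real branch matchers are regexes with implied branches, and variant selection involves far more. It only separates the two checks discussed here: parent references are validated by job name across all branches, while branches are consulted only when picking variants for a concrete ref.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    """One job definition; `branches=None` means 'any branch' in this toy."""
    name: str
    branches: Optional[List[str]] = None
    parent: Optional[str] = None


class Registry:
    """Toy registry: jobs are keyed by name only, independent of branches."""

    def __init__(self) -> None:
        self.jobs: Dict[str, List[Variant]] = {}

    def add(self, variant: Variant) -> None:
        self.jobs.setdefault(variant.name, []).append(variant)

    def remove(self, name: str, branch: str) -> None:
        """Drop the variants of `name` scoped to exactly `branch`."""
        kept = [v for v in self.jobs.get(name, []) if v.branches != [branch]]
        if kept:
            self.jobs[name] = kept
        else:
            self.jobs.pop(name, None)

    def config_errors(self) -> List[str]:
        """Parents are checked by name only: an error appears only once no
        definition of that name exists on any branch."""
        return [
            f"{v.name}: unknown parent {v.parent}"
            for variants in self.jobs.values()
            for v in variants
            if v.parent is not None and v.parent not in self.jobs
        ]

    def variants_for(self, name: str, branch: str) -> List[Variant]:
        """Branches only matter when choosing variants for a concrete ref."""
        return [v for v in self.jobs.get(name, [])
                if v.branches is None or branch in v.branches]


reg = Registry()
reg.add(Variant("build-openstack-sphinx-docs", branches=["stable/rocky"]))
reg.add(Variant("cross-osc-build-sphinx-docs", branches=["stable/stein"],
                parent="build-openstack-sphinx-docs"))
```

In this model the stein child raises no config error even though `variants_for("build-openstack-sphinx-docs", "stable/stein")` is empty; only after `reg.remove("build-openstack-sphinx-docs", "stable/rocky")` does `config_errors()` report the dangling parent, which matches the behavior seen with change 891629.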
fungi | corvus: thanks, so essentially what frickler was also saying | 19:25 |
fungi | and https://review.opendev.org/891629 then really is creating an error by trying to remove a job definition that is referenced by another project's config | 19:26 |
corvus | fungi: well, i was hoping to try to address your comment that it was non-intuitive. i was trying to find a way to meet you at your intuition. | 19:27 |
fungi | yep, got it. i wasn't thinking of the possibility that job definitions can be taken from other branches even when those aren't the default branch | 19:30 |
corvus | "jobs exist in a state of quantum superposition until a change is enqueued..." does that help? ;) | 19:32 |
fungi | yes, perfectly! | 19:32 |
corvus | definitely going to work "speculative execution" and "quantum superposition" into the same slide for my next talk | 19:33 |
fungi | so to be clear, the job wouldn't have actually run anyway because the parent wasn't defined for that branch | 19:33 |
corvus | fungi: it is very unlikely for that job to run in the normal course of business as i understand things in openstack, yes. | 19:35 |
fungi | wfm, thanks again | 19:35 |
fungi | (and yes i can imagine there might be some convoluted way to get it to run from a third project that defined its default branch as one of those and then had a change proposed for the other) | 19:36 |
corvus | here's a draft of the announcement about zuul regexes: https://etherpad.opendev.org/p/3FbfuhNmIWT33fCFK1oK | 21:07 |
clarkb | corvus: lgtm | 21:30 |
opendevreview | Merged openstack/diskimage-builder master: Update bindep rules for Debuntu https://review.opendev.org/c/openstack/diskimage-builder/+/895699 | 21:49 |
clarkb | I've done some edits to the team meeting agenda. Anything else to add? | 21:50 |
clarkb | also yuriy responded to my email already. So starting to work through the information gathering for that cloud replacement | 21:51 |
corvus | clarkb: probably worth bringing up that email in the meeting (not sure if that falls under an existing item) | 21:51 |
JayF | Hello infra friends; can someone hold a node for me: job: ironic-inspector-grenade, change: 895164 repo: openstack/ironic-inspector | 21:52 |
JayF | it's in experimental queue in the change I'm testing it on, if that matters | 21:52 |
JayF | let me know when that's in place and I'll run a `check experimental` | 21:52 |
clarkb | corvus: yup I added a new agenda item to discuss the whole cloud redeployment thing | 21:52 |
clarkb | JayF: yup one sec | 21:53 |
clarkb | JayF: it should be in place now | 21:56 |
corvus | clarkb: oh sorry i meant the regex notification email (though the other thing sounds good too) | 21:56 |
JayF | thank you | 21:56 |
JayF | clarkb: I will note if Julia somehow wants to steal this node in the morning before I get to it, please give it to her (or anyone else who wants to help fix it, really LOL) | 21:57 |
clarkb | corvus: oh got it | 22:01 |
clarkb | corvus: do you plan to send it before then so I can link to the archive, or should I link to the draft? | 22:01 |
corvus | the draft -- i'd like to make sure everyone agrees with the approach since this is somewhat novel | 22:11 |
fungi | corvus: that announcement lgtm | 22:31 |
JayF | corvus: if you want a PTL+TC perspective, that LGTM as well. Thank you for the detail :) | 22:32 |
fungi | on the openstack side of things, i expect the biggest challenges are going to be trying to fix errors on several-years-old branches where nobody has kept their jobs passing, and may require removal/fixing of failing jobs or assistance from sysadmins to bypass gating | 22:32 |
fungi | but we can tackle that on a case-by-case basis | 22:33 |
corvus | JayF, fungi thanks :) | 22:33 |
JayF | fungi: I think there's a potential prioritization battle there: what is more important, the "fix CI" forcing function and CI-correctness, or fixing zuul-config-errors in an expedient manner through force-merging changes? | 22:34 |
fungi | JayF: or seeing it as a reminder that if you're going to keep branches open to accept changes, you need to keep whatever jobs you run on those branches passing | 22:34 |
fungi | (with several implied solutions to the problem, depending on your available time and preferences) | 22:35 |
JayF | that is another way to state 'forcing CI-correctness', at least in how I meant it | 22:35 |
fungi | yep | 22:35 |
JayF | I think you're right, and that probably is the correct move, but I know it always feels like I'm closing a door forever when I remove jobs from an old branch, which makes it tough to do | 22:36 |
fungi | when i said several implied solutions, i meant: 1. fixing jobs, 2. removing jobs, 3. closing branches | 22:36 |
fungi | which is appropriate will be situation-dependent | 22:37 |
clarkb | slittle: did further decisions get made around starlingx repo renaming? | 23:09 |
clarkb | meeting agenda sent | 23:10 |
fungi | thanks! | 23:40 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!