tonyb | Sounds good, I have removed the images from bridge, as they match the ones that exist on my laptop | 00:02 |
---|---|---|
tonyb | I left the vhd "just in case". I'll verify the RAX boot in all regions and then remove the vhd also | 00:02 |
*** jp is now known as Guest6628 | 06:19 | |
opendevreview | Dr. Jens Harbott proposed opendev/infra-manual master: Replace twitter with fediverse https://review.opendev.org/c/opendev/infra-manual/+/939775 | 09:43 |
frickler | infra-root: ^^ btw., who has the credentials for the fosstodon.org account? I couldn't find them on bridge, only a token in the hostvars for eavesdrop? | 09:44 |
ianw | i feel certain i set it up and can't imagine why i didn't put the credentials in ... | 11:42 |
opendevreview | Merged opendev/infra-manual master: Replace twitter with fediverse https://review.opendev.org/c/opendev/infra-manual/+/939775 | 13:35 |
slittle | We have two gerrit reviews that seem to be stuck... zuul isn't voting on them | 15:39 |
slittle | https://review.opendev.org/c/starlingx/utilities/+/938743 | 15:40 |
slittle | https://review.opendev.org/c/starlingx/vault-armada-app/+/938744 | 15:40 |
clarkb | slittle: if I had to guess either you've got file matchers on all your jobs that don't match updates to .gitreview or you've got branch matchers that don't match the new branch. | 15:49 |
clarkb | slittle: https://opendev.org/starlingx/vault-armada-app/src/branch/master/.zuul.yaml#L35-L36 picking a random check job this is the case for that specific job | 15:50 |
clarkb | however doesn't seem to be the case for https://opendev.org/starlingx/vault-armada-app/src/branch/r/stx.10.0/.zuul.yaml#L42 | 15:51 |
clarkb | another thing to check is if you have any config errors preventing zuul config from loading for the new branches | 15:52 |
clarkb | slittle: starlingx/zuul-jobs has nodeset not found errors. Not sure if the job config for these two repos would rely on anything in that repo | 15:53 |
clarkb | infra-root I'm going to approve https://review.opendev.org/c/opendev/system-config/+/939667 now then double check the lodgeit image pulls to see if we pull them correctly before approving those updates | 15:54 |
clarkb | slittle: looking at zuul logs on zuul01 I see zuul reports it is using a cached layout for 938743 and it then lists a number of sources including master and r/stx.9.0 but not r/stx.10.0 so ya I think the issue is likely related to config errors. | 15:58 |
*** gthiemon1e is now known as gthiemonge | 15:58 | |
clarkb | slittle: one hack may be to push up a change to the .zuul.yaml for that branch which should cause zuul to report back if there are config errors | 15:59 |
clarkb | I think that is what I would do either in the existing changes or in a followup change | 15:59 |
opendevreview | Yaguang Tang proposed zuul/zuul-jobs master: the buildx image changed to alpine which has no ca-certificates https://review.opendev.org/c/zuul/zuul-jobs/+/939823 | 16:02 |
clarkb | fungi: I'd like to land https://review.opendev.org/c/opendev/system-config/+/936297 and then recheck https://review.opendev.org/c/opendev/system-config/+/939767 as that should help us confirm we're pulling the current image from the intermediate registry properly using podman (not possible with docker previously when looking at quay as the source) | 16:16 |
clarkb | fungi: if the captcha doesn't render properly we know we're fetching the old image from quay directly and need to look into the speculative image testing setup | 16:16 |
clarkb | any chance you can review that change? It's straightforward and I'll probably approve it in the next hour if not but figured if you are around it's worth looking at | 16:16 |
frickler | slittle: https://zuul.opendev.org/t/openstack/config-errors?project=starlingx%2Fzuul-jobs&skip=0 shows the errors, not sure if they are related to your issue, but fixing these would be good anyway | 16:17 |
fungi | clarkb: done | 16:19 |
clarkb | thanks. As a side note I wonder if there is a way to log more explicitly the versions of things that docker compose is pulling | 16:21 |
clarkb | I can't really tell from our logs in the CI jobs what we're fetching | 16:21 |
clarkb | but I figure the existing test update should give us decent signal and we can continue to improve the info collection from there | 16:21 |
opendevreview | Yaguang Tang proposed zuul/zuul-jobs master: the buildx image changed to alpine which has no ca-certificates https://review.opendev.org/c/zuul/zuul-jobs/+/939823 | 16:34 |
clarkb | tonyb: https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-12-17.log.html#t2024-12-17T20:59:58 this is where the unhappy gerrit restart started getting discussed last month | 16:42 |
opendevreview | Merged opendev/system-config master: Handle borg 1.2 rc 1 for warnings behavior https://review.opendev.org/c/opendev/system-config/+/939667 | 16:43 |
frickler | I'm trying to find out why zuul doesn't report anything on https://review.opendev.org/c/openstack/kolla/+/938546 , my guess is that that is due to the accidental (or so I assume) mix of using both .zuul.d and zuul.d, but I don't find any useful clue in the executor logs | 16:50 |
frickler | (cc mnasiadka) | 16:51 |
clarkb | reporting is done by the schedulers | 16:51 |
mnasiadka | that's funny I might have uncovered a bug by doing a typo ;-) | 16:52 |
clarkb | https://review.opendev.org/c/openstack/kolla/+/938546/6/.zuul.d/project.yaml is invalid config | 16:52 |
frickler | I meant schedulers, zuul01+02, sorry I mix up the names | 16:52 |
clarkb | you need to specify which pipeline the jobs run in | 16:53 |
clarkb | I do wonder if there is some regression preventing zuul from reporting config errors though based on ^ and slittle's problem | 16:53 |
clarkb | corvus: not sure if there have been changes to config loading with the last restart | 16:53 |
opendevreview | Merged opendev/system-config master: Capture lodgeit captchas for verification purposes https://review.opendev.org/c/opendev/system-config/+/936297 | 16:53 |
frickler | ah, right, that should be "check:\njobs:" mnasiadka | 16:53 |
mnasiadka | but yes, I would expect Zuul tells me it's wrong config :) | 16:54 |
clarkb | in both cases zuul seems to eventually log "no jobs for queue item stuff in check" | 17:01 |
clarkb | and then I think it decides not to report as a result. But the no jobs are due to errors (I suspect anyway) so we should be reporting? | 17:02 |
clarkb | ya looks like that case is explicitly a no report case one sec I'll link to it | 17:03 |
clarkb | https://opendev.org/zuul/zuul/src/branch/master/zuul/manager/__init__.py#L2444-L2449 | 17:03 |
clarkb | earlier in that "switch" statement we handle the config error case so something must be overriding the config error case for us? | 17:04 |
clarkb | since mnasiadka's example at least is definitely a config error | 17:04 |
clarkb | ok the captcha is entirely in the image so I think we're pulling the speculative image properly. I'll recheck again after this reports and if we get the expected behavior again I think we can proceed with landing this | 17:37 |
clarkb | following up on borg rc 1 behavior the backup on paste02 that just ended at 17:28 UTC reported Wed Jan 22 17:28:21 UTC 2025 Backup finished with warnings. and I don't see an email today for a failed backup so I think that worked | 17:41 |
fungi | yeah, i just checked the inbox (there was one from ~05z in spam which i moved to the inbox) | 17:42 |
clarkb | Zuul is quite busy this morning too | 17:44 |
clarkb | assuming things remain normalish for the rest of the morning I think I'll go ahead and approve the gerrit image update and the h2 compaction time changes. Then plan to restart gerrit at the end of our meetup time today. That way we can use the meetpad call to coordinate | 17:46 |
clarkb | I suspect we'll have plenty of time for that within our allocated time | 17:47 |
fungi | sounds good to me | 17:49 |
slittle | frickler,clarkb: perhaps we need to backport https://review.opendev.org/c/starlingx/zuul-jobs/+/939482 to older branches | 17:50 |
frickler | slittle: ah, you fixed it in master already, that's a good start, didn't see that before. but yes, that needs to go into all affected branches, then | 17:55 |
fungi | (or affected branches can alternatively be deleted to resolve the error) | 17:57 |
clarkb | for the gerrit updates I think we can do both the 3.10.4 update and the h2 compaction update at the same time and restart 1 on both. Then when things settle we restart again to check the compaction time behavior. Or we can do them separately. If we do them separately we would do restart 1 to pick up the compaction update, then stop gerrit to verify compaction behavior and if all | 17:59 |
clarkb | looks good then do restart 2 onto gerrit 3.10.4 | 17:59 |
clarkb | I'm throwing that out now so that people can think about the approach that makes the most sense to them. I think the two things are independent enough of each other that it is unlikely we'd have an interaction that caused problems | 17:59 |
clarkb | it would be more a case of how worried we are having problems from both at the same time I guess | 17:59 |
fungi | plan #1 sounds better to me. also we should restart apache since we were getting some stale ssl keys reported as recently as yesterday | 18:14 |
clarkb | ++ sounds good | 18:15 |
clarkb | I should've just checked the lodgeit image on quay to see if we are pulling it. The image was created in 2023 but the image logs that it is using uwsgi from 2024 so we almost certainly are using speculative images in testing \o/ | 18:42 |
clarkb | https://quay.io/repository/opendevorg/lodgeit?tab=tags | 18:42 |
clarkb | so I guess I'll go ahead and approve the change to publish that image to quay again? | 18:43 |
corvus | clarkb: no relevant changes merged in zuul afaik. looking at 938546, that appears to be a change which, by virtue of the zuul.d directory overriding the .zuul.d directory, completely removes the in-repo config for the project, replacing it with a single job definition (which is not run). so i think zuul correctly decides that no jobs should run and does not report. | 18:49 |
corvus | i think having both config files is a configuration warning, but not an error. arguably, we could consider promoting that to error-level; not sure of the implications of that off-hand. | 18:49 |
clarkb | ah that would explain it too then | 18:50 |
clarkb | so the invalid config is completely ignored which means no error. The new config which is used is basically a noop and we may warn about the overriding of configs but only if it merges first | 18:51 |
clarkb | corvus: unrelated to zuul config errors I think I've decided that https://review.opendev.org/939767 is speculatively testing container images hosted on quay with podman and docker compose | 18:52 |
clarkb | I think we figured it would but it's nice to see it actually working apparently | 18:52 |
clarkb | I think I decided to hold off on approving the change to move the image until after Gerrit stuff is done so that I don't have to juggle both things at the same time. But I will work on getting lodgeit image publishing moved to quay again after gerrit is done later today | 18:53 |
fungi | makes sense | 18:57 |
clarkb | and I'll approve the gerrit changes if everything is still happy before I go to lunch. That way it should merge and be ready for us a couple hours later | 18:57 |
corvus | clarkb: i double checked and i agree that job ran the speculative container image. \o/ | 18:57 |
clarkb | corvus: were you able to confirm using hashes or using inferred info like me? | 18:59 |
clarkb | I wasn't able to find something as specific as a hash due to how things get logged by the pull | 19:00 |
corvus | hashes | 19:04 |
corvus | sorry i closed all the tabs :( | 19:05 |
corvus | it was a mess to read, but it was in there | 19:05 |
clarkb | ok good to know if I dug further it is there | 19:06 |
corvus | https://zuul.opendev.org/t/openstack/build/0b2e83b16c744d89a7d5b8393131dfac/log/bridge99.opendev.org/ansible/service-paste.yaml.log#2172 | 19:07 |
frickler | oh, so zuul.d completely overrides .zuul.d, I wasn't aware of that. is that intentional or would merging contents be a possibly less confusing option? | 19:07 |
corvus | https://zuul.opendev.org/t/openstack/build/0b2e83b16c744d89a7d5b8393131dfac/console#2/0/13/localhost | 19:07 |
corvus | clarkb: those are the two hashes i matched | 19:08 |
clarkb | thanks! | 19:09 |
corvus | frickler: very intentional and documented | 19:09 |
clarkb | does anyone else want to review the gerrit 3.10.4 update before I approve it: https://review.opendev.org/c/opendev/system-config/+/939167 ? | 19:11 |
fungi | looks like i already have | 19:12 |
frickler | corvus: is there a use case for actually having both in parallel in a repo or would at least generating a configuration warning be an option? | 19:17 |
clarkb | I've approved the h2.maxCompactTime change and will give the 3.10.4 update a little longer in case anyone else wants to review it | 19:21 |
corvus | see above, it does make a warning | 19:23 |
frickler | corvus: was that meant for me? I don't see a warning anywhere | 19:26 |
corvus | frickler: https://meetings.opendev.org/irclogs/%23opendev/latest.log.html#t2025-01-22T18:49:51 | 19:28 |
frickler | corvus: so that warning would only appear if that change is merged? let me try that in the sandbox, then | 19:29 |
clarkb | ya I think so because we don't report warnings on proposed changes (we do for errors), but warnings do show up in the error list when merged | 19:32 |
clarkb | ok approving the 3.10.4 update change now | 19:45 |
frickler | ok, I did find the warnings in the scheduler log now, didn't spot those before. so I guess that's ok-ish then, except maybe we want to report those? are there warnings we do not want to report? | 19:45 |
corvus | We don't force a report for only warnings. That's a good thing. | 19:47 |
corvus | That project was not configured for like 20 pipelines. We don't want 20 warnings. | 19:48 |
corvus | 20 reports | 19:49 |
corvus | It should be user visible on a report or the config error page | 19:50 |
frickler | otoh it would be nice for users to get feedback as to when (and possibly why) zuul will not report on their change, instead of waiting for hours and then having to ask opendev admins | 19:53 |
corvus | No doubt. | 19:55 |
clarkb | the 3.10.4 update failed on docker hub too many requests. I'm going to manually dequeue it so that I can reapprove it | 20:01 |
clarkb | this should speed things up by about 40 minutes. Then I'll eat lunch and then meetup time | 20:01 |
clarkb | oh I don't have a +1 anymore so I can't just reapprove I need to reenqueue. Doing that instead | 20:02 |
fungi | wfm | 20:03 |
clarkb | and if that fails again maybe we're just doing the update for h2 compaction time today... | 20:03 |
fungi | as well as a change for moving gerrit images to quay? ;) | 20:03 |
Clark[m] | Need to update to podman and noble first. | 20:05 |
fungi | ah, yeah | 20:05 |
clarkb | https://zuul.opendev.org/t/openstack/build/b0ca1e0e3af5448681596930b59587c6 though it failed pulling the python base image which we mirror now | 20:12 |
clarkb | maybe we need to more aggressively move things over to the mirrored image? | 20:12 |
clarkb | also it says the non buildkit builder is deprecated we should probably just build everything with buildkit? | 20:12 |
clarkb | I don't think we need to solve either of those things right this instant though and I'm hungry | 20:13 |
opendevreview | Merged opendev/system-config master: Set h2.maxCompactTime to 15 seconds https://review.opendev.org/c/opendev/system-config/+/938000 | 20:39 |
clarkb | I checked and ^ is in place and ready for us | 20:55 |
clarkb | the version bump hasn't failed yet | 20:55 |
fungi | yeah, deploy reported success | 20:58 |
clarkb | ya and I looked at the compose file too | 20:58 |
fungi | i fixed my disappearing etherpad scrollbar by turning on layout.testing.overlay-scrollbars.always-visible in firefox's about:config | 22:16 |
fungi | #status notice The Gerrit service on review.opendev.org will be offline momentarily for a restart to put some database compaction config changes into effect, and will return within a few minutes | 22:52 |
opendevstatus | fungi: sending notice | 22:52 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a restart to put some database compaction config changes into effect, and will return within a few minutes | 22:53 | |
opendevstatus | fungi: finished sending notice | 22:57 |
clarkb | hashar: as a heads up ^ was done to try the h2.maxCompactTime setting that you suggested. Unfortunately it didn't make any of the cache files smaller. Now we're wondering if we are shutting down gerrit properly so that compaction actually runs | 23:29 |
clarkb | hashar: I'll ask on discord about the correct way to do that, but wanted to mention it to you too in case you have noticed that compaction isn't actually working or maybe know of something that we might be doing wrong | 23:29 |
clarkb | I've asked upstream on discord we'll see what they say and until then I guess we can try and setup a test system | 23:34 |
clarkb | I've rechecked https://review.opendev.org/c/opendev/system-config/+/893571 and put a hold on the gerrit 3.10 job so that we can try to set it up for more testing | 23:36 |
clarkb | oh I failed to update the hold string meh its fine | 23:36 |
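
The `zuul.d` versus `.zuul.d` confusion discussed around 18:49 and 19:07 can be checked locally before pushing a change. The following is a rough sketch, not anything Zuul itself ships: `find_zuul_config` and `SEARCH_GROUPS` are hypothetical names, and the search order is an assumption based on the Zuul documentation's description (the un-dotted `zuul.yaml` / `zuul.d` are looked for first, and the dotted `.zuul.yaml` / `.zuul.d` only if neither is found). How Zuul treats two names within the same group is not modelled here.

```python
from pathlib import Path
import tempfile

# Assumed search order: un-dotted names first, dotted names only as a
# fallback. This mirrors the documented behaviour, not Zuul's actual code.
SEARCH_GROUPS = [("zuul.yaml", "zuul.d"), (".zuul.yaml", ".zuul.d")]


def find_zuul_config(repo_root):
    """Return (active, shadowed) lists of config names found in repo_root.

    'active' holds the names from the first group that has any match;
    'shadowed' holds names from later groups that would be ignored.
    """
    root = Path(repo_root)
    active, shadowed = [], []
    for group in SEARCH_GROUPS:
        present = [name for name in group if (root / name).exists()]
        if not active:
            # No earlier group matched, so this group (if any) wins.
            active = present
        else:
            # An earlier group already matched; these are overridden.
            shadowed.extend(present)
    return active, shadowed


if __name__ == "__main__":
    # Recreate the situation from the kolla change: both directories exist.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "zuul.d").mkdir()
        (Path(tmp) / ".zuul.d").mkdir()
        active, shadowed = find_zuul_config(tmp)
        print("active:", active)
        print("shadowed:", shadowed)
```

A non-empty `shadowed` list would be a hint that part of the in-repo config is being ignored, which is the case corvus describes as a warning rather than an error.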