Friday, 2025-09-12

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:00:44
- [zuul/zuul] 960417: Add subnode support to launcher https://review.opendev.org/c/zuul/zuul/+/960417
- [zuul/zuul] 960695: Add reuse label option to launcher https://review.opendev.org/c/zuul/zuul/+/960695
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:02:09
- [zuul/zuul] 960417: Add subnode support to launcher https://review.opendev.org/c/zuul/zuul/+/960417
- [zuul/zuul] 960695: Add reuse label option to launcher https://review.opendev.org/c/zuul/zuul/+/960695
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul] 960679: Pull base python images from quay.io/opendevorg https://review.opendev.org/c/zuul/zuul/+/96067911:00
@fungicide:matrix.orgClark: not sure if you saw, but someone raised a question post-merge on your https://review.opendev.org/958783 change for zuul-jobs14:06
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 958963: AWS: add kms-key-id for label https://review.opendev.org/c/zuul/zuul/+/95896314:40
@mhuin:matrix.orgI might have asked before but I don't remember the answer; out of curiosity how many zookeeper nodes do you run on opendev? What about other deployments?14:49
@jim:acmegating.commhu: 314:50
@fungicide:matrix.orgbasically 3 is the minimum required to have proper redundancy14:52
@mhuin:matrix.orgyeah ... we're learning this the hard way14:53
@mhuin:matrix.orgalso for availability during upgrades14:53
@clarkb:matrix.orgfungi: I hadn't seen that, I'll respond there I guess14:54
-@gerrit:opendev.org- Jan Gutter proposed: [zuul/zuul-jobs] 960840: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/96084014:54
@jangutter:matrix.org^^ Clark I dunno if that's a good idea or not, or if it needs to be a job var instead of a rolevar.14:55
@jangutter:matrix.orgfeel free to take over (or I can abandon if it's a bad idea)14:55
@clarkb:matrix.orgjangutter: are you working with sjabasti on this? I'm wondering if I still need to respond on the original change14:56
@clarkb:matrix.orgI think its fine to make that configurable14:56
@jangutter:matrix.orgClark: Nope, just saw some low hanging fruit and needed a mind cleanser of my on-call stuff this week :-p14:57
@clarkb:matrix.orgthanks I'll respond with a link to your change then.14:57
@jim:acmegating.comhttps://zuul-ci.org/docs/zuul-jobs/latest/policy.html#role-variable-naming-policy14:57
@jim:acmegating.comshould probably consider the variable name14:58
@jangutter:matrix.orgYeah, the cognitive dissonance to me is because this is technically a "global" definition that's common across those two builder roles. But I have no objection with namespacing them (the two roles are not expected to be drop-in replacements for each other)15:01
@clarkb:matrix.orgI think the other variables those roles use are scoped15:01
@jangutter:matrix.orgYep15:01
@clarkb:matrix.orglike the container image lists to build so doing it here isn't a problem either.15:01
@jim:acmegating.comthat's why the vars use the "container_" prefix15:01
@jim:acmegating.comit's a slight relaxation of the general rule which would be "build_container_image_buildx_builder_image" to account for the multi-role use15:02
@jangutter:matrix.orgDammit, you didn't fall for my global namespace landgrab. Oh well....15:02
@jim:acmegating.comnot my first rodeo15:02
@jangutter:matrix.orgBut rolevar rather than jobvar?15:03
@jangutter:matrix.org(I've been caught by stashing defaults in roles....)15:03
@jim:acmegating.comyeah i think that's fine (rolevar means it can be set in both places).  the docs for the job should be updated to include it as well.15:04
@jangutter:matrix.org👍️ thanks for the quick out-side-of-gerrit review. I'll do a bit of a quick fix on that.15:05
@jim:acmegating.comClark: https://review.opendev.org/959887  appeared to immediately succeed -- was that expected?15:12
@clarkb:matrix.orgcorvus: ish. I suspect the underlying issue was related to the same disk problems that the ubuntu mirrors had at the same time. Things were "working" but very slow and I think possibly geo location impacted results as well15:13
@clarkb:matrix.orgfungi: do you know if canonical and ubuntu are all caught up after the mirror issues now? I suspect landing that change now is safe15:14
@fungicide:matrix.orgClark: i haven't seen any more evidence since the weekend15:15
@jim:acmegating.comthe job with microk8s in the name ran in a reasonable amount of time https://zuul.opendev.org/t/zuul/runtime?job_name=zuul-jobs-test-registry-buildset-registry-k8s-microk8s&project=zuul/zuul-jobs&branch=master&pipeline=check15:16
@jim:acmegating.comi'll go ahead and +3 it then15:17
@clarkb:matrix.orgsounds good15:17
@jim:acmegating.comClark: i went ahead and signed up for the open source pavilion.  i think based on your description of the environment, maybe the best approach would be to show opendev's zuul and do ad-hoc demos and discussion, rather than something structured.  i can also have a quick-start demo on the laptop already running in case we want to talk about that.  and this would be a zuul project booth, so anyone at the summit can help out.  it's only an hour or two, so i don't think we need to coordinate shifts or anything.  just show up and show and tell zuul i'm thinking.15:19
-@gerrit:opendev.org- Jan Gutter proposed: [zuul/zuul-jobs] 960840: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/96084015:19
@jim:acmegating.comi have a bunch of the one-page zuul flyers too, i'll bring them.15:21
@clarkb:matrix.orgcorvus: I like that. And ya we should be able to push changes to zuul itself to trigger behaviors we want to show off in the opendev zuul15:21
@clarkb:matrix.orgthat seems like an easy way to engage people with something more interactive than the flyer15:21
@jim:acmegating.com++15:21
@jim:acmegating.comClark: *now* it's failing :) https://zuul.opendev.org/t/zuul/build/4d835ef03dd949e584d318d2f116303b/console15:35
@jangutter:matrix.orgcorvus: hitting a rate limit?15:36
@jim:acmegating.comunclear; `context canceled` is vague15:37
@jangutter:matrix.orglooks like there's at least some other folks who believe it's vague https://bugs.launchpad.net/snapstore-server/+bug/2104066 15:40
@clarkb:matrix.orgwow thats classic15:40
@clarkb:matrix.orgit works when you think it is broken and broken when you suspect it should work15:41
@clarkb:matrix.orgfwiw I'm trying to clear out my opendev backlog then I'm going to see if I can make sense of the bwrap trixie issue on that held node15:41
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 959428: Fix provider quota calculation https://review.opendev.org/c/zuul/zuul/+/95942815:47
@jim:acmegating.comjangutter: i did not realize there was a variable already for that for the docker role.... we could make an argument we should use that variable (which is what you did in PS1), or we could take the opportunity to improve things for the container role and use the container_ prefix.  but we probably shouldn't actually change the variable in the docker role -- or if we do, we're going to need to support backwards compat for that and do a migration.15:55
@jangutter:matrix.orgOh hang on... I thought that wasn't templated out in the docker role...15:56
@jim:acmegating.comoh you're adding it to the docker role too15:57
@jim:acmegating.com * oh you're adding it to the docker role too?15:57
@jim:acmegating.comsorry i mistook how the docker role got involved; so this is expanding the scope of the change to do the same thing for both roles?15:58
@clarkb:matrix.orgyes I think it is being added to both15:58
@jangutter:matrix.orgYeah, the code is mirrored, so it made sense to mirror the interface to me.15:58
@jim:acmegating.comokay, then i think the approach in PS2 may be the approach we want15:58
@jim:acmegating.comyeah, indentation aside that lgtm16:00
@jangutter:matrix.orgSo if someone is converting a project from `docker` to `container`, they can just replace the `docker_` section of the job vars with `container_` (following the same pattern as most of the other "common" bits).16:00
@jangutter:matrix.orgThat whitespace is going to haunt me.16:01
@jim:acmegating.comline 83 too16:01
@jangutter:matrix.orgyep, serves me right for not doing a git show and being all obsessive.16:02
@fungicide:matrix.org`git diff --check` will also call out trailing whitespace16:03
@jangutter:matrix.orgheh, trailing whitespace goes all red in my vim.16:04
@fungicide:matrix.orgoh, well yes mine too16:05
-@gerrit:opendev.org- Jan Gutter proposed: [zuul/zuul-jobs] 960840: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/96084016:05
@fungicide:matrix.orgi think --check might also call out mixed tabs and spaces16:05
@fungicide:matrix.orgdon't recall now, manpage likely says16:05
@jangutter:matrix.orgI'm particularly miffed that vscode doesn't add trailing newlines by default.16:06
@jangutter:matrix.orgOur codebase is a delightful mixture of text files that sometimes end in newlines.16:06
@fungicide:matrix.orgindeed, manpage says by default --check warns about trailing whitespace, spaces immediately followed by tabs in an initial indent, and merge/rebase conflict markers16:07
@clarkb:matrix.orgoh nice it will check for conflict markers16:08
@fungicide:matrix.orgoh, yeah, dos vs unix text file formatting is a pain too, git core.whitespace might be able to be tweaked to identify it16:08
@clarkb:matrix.orgcorvus: is there a way to keep job dirs without restarting the executor?16:28
@clarkb:matrix.orgI'm on that held node now and builds/ is empty so I think I need to keep jobdirs and then trigger a new test and see what things look like when zuul is trying to run bwrap16:28
@jim:acmegating.comClark: `zuul-executor keep` on all the executors16:29
@jim:acmegating.comthen `nokeep` on all of them once the job is done16:29
@jim:acmegating.comminimize the time it's enabled16:29
@jim:acmegating.comthe weekly restart should clean up the detritus from other jobs16:30
@clarkb:matrix.orgthanks in this case its the quickstart setup on a held test node so I think its ok to just enable for now16:30
@jim:acmegating.comoh sorry i thought you were asking about a meta level up :)16:30
@clarkb:matrix.org`drwx------ 7 root root 4096 Sep 12 16:32 5153f599f4cd4ceaaf29c026f69b222a` this is what the resulting build dir looks like so now I'm thinking the bwrap `Permission denied` is literal16:33
@jim:acmegating.commaybe the umask changed16:35
@clarkb:matrix.orgwe also don't set a user on the container in the docker compose file so its running as root within the container so you'd think that should still work. But maybe the unshare of users by bwrap means that it won't use a 700 directory?16:37
@clarkb:matrix.orgumask reports `0022` within the trixie container16:37
@clarkb:matrix.orglet me see if I can figure out running the command from the logs to reproduce then modify the dir perms to see if the error goes away16:39
@clarkb:matrix.orgok I think the issue is that /var/lib/zuul is 0700 zuul:zuul16:46
@clarkb:matrix.org* ok I think the issue is that /var/lib/zuul is 0700 zuul:zuul (10001:10001 not 1000:1000 of the calling zuul user in the host env)16:49
@clarkb:matrix.orgthat directory is created by `"lib-zuul-executor:/var/lib/zuul:z"` in the docker compose file as a volume mount. But we create it in the container itself using useradd and set it as the 10001 zuul user's homedir16:50
@clarkb:matrix.orgcorvus: I'm thinking that useradd must've changed default permissions on homedirs16:50
@clarkb:matrix.orgcorvus: do you know why we don't run the containers as the 10001 user within the quickstart environment? Is the issue with podman running the containers as a regular user?16:50
@clarkb:matrix.orgI'm going to test running this as the zuul user16:53
@clarkb:matrix.orgheh deleting the lib zuul volume required deleting the gerrit config container which ended up automatically deleting the gerrit container so now I have a running executor I can't trigger jobs in.16:59
@clarkb:matrix.orgI'm just going to push up a change to have zuul automate this testing for me16:59
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 960682: Build container images on Debian Trixie https://review.opendev.org/c/zuul/zuul/+/96068217:00
@fungicide:matrix.orghow meta17:02
@clarkb:matrix.organyway the problem appears to be the mismatch between the /var/lib/zuul permissions and ownership and the uid running zuul-executor which ultimately runs the bwrap command17:03
@clarkb:matrix.orgI suspect that this change is part of /var/lib/zuul being created in the container via useradd17:03
@fungicide:matrix.orgoh, created as the homedir17:04
@clarkb:matrix.orgyup17:04
@clarkb:matrix.orguseradd might access an override to permissions but if running the executor as zuul works instead I think that is better17:04
@fungicide:matrix.orgyeah, useradd may have started making more restrictively permissioned homedirs by default in trixie17:04
@clarkb:matrix.orgthen once I get this sorted out I think I want to update quickstart to use a different volume for /var/lib/zuul in gerritconfig so that the executor can be cleared out and restarted without impacting gerrit17:05
@clarkb:matrix.orgfungi: ya that is my assumption right now. i haven't confirmed17:05
@clarkb:matrix.orgI don't see any useradd options to set perms on the homedir when created with the `-m` flag. But it does appear to look at login.defs? I assume something like that is what changed. Anyway another option would be to drop the -m flag and start creating the dir ourselves if we want17:12
@clarkb:matrix.orgbut I think if just running the quickstart processes as the zuul user works then that is a better option17:12
@jim:acmegating.comClark: i'd be concerned about fixing this with the quickstart; fundamentally it seems we're creating the directory in the container image with the wrong perms?  i think we should keep them the same.17:13
@fungicide:matrix.orgyeah, login.defs has HOME_MODE 070017:13
@jim:acmegating.com`drwxr-xr-x 2 zuul zuul` is what they're expected to be17:14
@jim:acmegating.comi also think it's curious that bwrap isn't overriding that when run as root.17:14
@clarkb:matrix.orgcorvus: I guess it depends on how we define what wrong permissions are. They are wrong in this instance because we create /var/lib/zuul for the zuul user's homedir then run processes as another user (root in this case) out of that directory17:14
@clarkb:matrix.orgmy initial impression is that the process uid should match the homedir ownership. But if that is not the case then I can modify the Dockerfile to create the homedir explicitly rather than via useradd (that way we can control the permissions)17:15
@jim:acmegating.comi think we expect people to bind mount /var/lib/zuul if they run it as a different user, but strictly speaking, we haven't set the USER in the dockerfile, so the default user is still root.  we have some pending changes to update that, and i think it may make sense to merge those changes and move this project along.  but if the goal is "make zuul work on trixie" then i think we should do that minimally.17:17
@jim:acmegating.comhttps://review.opendev.org/c/zuul/zuul/+/84075817:19
@clarkb:matrix.orggot it. In that case I'll rework the image build to create the directory in a separate command17:19
@clarkb:matrix.orgsomething like useradd && mkdir && chown && chmod17:19
@jim:acmegating.comsgtm.  and tbc, i'm totally in favor of opening the can of worms, but just want it to be a different can of worms than trixie.  :)17:20
@clarkb:matrix.orgor I guess I can let it create the dir then chmod17:20
@jim:acmegating.com(we should probably update 840758 and see where that stands in the current world since almost everything has changed since then)17:20
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 960875: Launcher: don't use the main node with slots https://review.opendev.org/c/zuul/zuul/+/96087517:32
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 960682: Build container images on Debian Trixie https://review.opendev.org/c/zuul/zuul/+/96068217:38
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 959397: Add "final" attribute to image/flavor/label objects https://review.opendev.org/c/zuul/zuul/+/95939717:38
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 960878: Give quickstart gerritconfig a distinct /var/lib/zuul volume https://review.opendev.org/c/zuul/zuul/+/96087817:44
@clarkb:matrix.orgcorvus: I think 960692 is green now. But I want to drop the zuul-executor and scheduler debug flags unless you think we should keep those in quickstart. Then separately do you think we should merge that change before the new nodejs and golang images are mirrored to quay.io/opendevmirror? If not I'll push an edit up to try and use the mirror which will fail for now until opendev starts mirroring those images18:27
@clarkb:matrix.orghttps://review.opendev.org/c/opendev/system-config/+/960681 is the change to mirror those images18:28
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 960682: Build container images on Debian Trixie https://review.opendev.org/c/zuul/zuul/+/96068219:22
@clarkb:matrix.orgI decided to go ahead and make those two changes I think that is the end state we want and may as well point in that direction now19:23
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 960682: Build container images on Debian Trixie https://review.opendev.org/c/zuul/zuul/+/96068219:24
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 960208: Add image validation admin method https://review.opendev.org/c/zuul/zuul/+/96020820:02
@clarkb:matrix.orgI just restarted the matrix eavesdrop bot and want to make sure it captures this message20:59
@fajfer:reszka.organd did it?21:31
@clarkb:matrix.orgit did21:33
@clarkb:matrix.orghttps://meetings.opendev.org/irclogs/%23zuul/%23zuul.2025-09-12.log21:33
@jim:acmegating.comClark: i reckon we need to wait till tomorrow (or do something to force a run) to get that mirrored22:37
@clarkb:matrix.orgcorvus: ya I'll recheck the change after mirroring happens 22:47
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:23:51
- [zuul/zuul] 960875: Launcher: don't use the main node with slots https://review.opendev.org/c/zuul/zuul/+/960875
- [zuul/zuul] 960921: Handle launch failures with subnodes https://review.opendev.org/c/zuul/zuul/+/960921
