Thursday, 2025-09-25

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 962238: Fix defaults for upload-image-swift and -s3 https://review.opendev.org/c/zuul/zuul-jobs/+/962238 00:34
@tristanc_:matrix.org Hello folks, we are trying to diagnose a weird issue when we perform a ZooKeeper upgrade. We presently only have one replica, and when the ZK service is restarted, we observe that Nodepool deletes in-use nodes, which causes running Zuul jobs to fail weirdly. This apparently happens for the ZKNodes that are unlocked, but looking at the metastatic adapter, the IN_USE node state should be locked with a non-ephemeral lock. So the question is: is it expected to lose in-use nodes when ZooKeeper is fully restarted? And is the only way to upgrade ZooKeeper without losing IN_USE nodes to set up more than one replica? 09:43
@jim:acmegating.com tristanC: a zk quorum with an even number of nodes could split-brain and cause issues. running an odd number is recommended. 13:55
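ZooKeeper in replicated mode only serves requests while a strict majority of the ensemble is up. A minimal sketch of that majority arithmetic (the function names are illustrative, not part of ZooKeeper or Zuul) shows why odd sizes are the usual advice: going from 3 to 4 servers raises the quorum without tolerating any extra failure, and a single server tolerates none.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"{n} server(s): quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

Running the loop prints quorum 1/2/2/3/3 and tolerated failures 0/0/1/1/2 for ensembles of 1 to 5 servers.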
@fungicide:matrix.org (this is why opendev has 3 zk servers for its zuul) 13:59
@jangutter:matrix.org I have a question about https://review.opendev.org/c/zuul/zuul-jobs/+/962194/1/roles/ensure-python/tasks/main.yaml <--- I think that general section is making the assumption that Fedora and CentOS python packages use the dotless convention by default. I'm reasonably sure the regex on line 44 can be removed. 14:31
I don't know much about the users of this, for example whether python_version: "311" is considered a valid input to the role.
@jangutter:matrix.org Argh, I meant to say "making the incorrect assumption" 14:32
@tristanc_:matrix.org corvus, fungi: Thanks! And one is not enough, I guess... It's a bit surprising though; I thought that restarting zookeeper would not kill running jobs 14:33
@fungicide:matrix.org jangutter: i have a feeling some of that came about due to using the role in tox jobs where "py311" is tox's built-in selector for a python 3.11 test environment 14:35
@fungicide:matrix.org and so we wanted a way to map a test environment back to a python interpreter install mechanism 14:35
@fungicide:matrix.org but yeah, it looks like this is in reverse? we pass in "3.11" as python_version and then it expects fedora to supply a python311-devel package 14:37
@jangutter:matrix.org I'm thinking the behaviour for fedora/rhel packages should map "311" or "3.11" to "python3.11-devel" (or "39"/"3.9" to "python3.9-devel"). Does that sound reasonable? 14:38
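The ensure-python role itself is Ansible, so the following is only a Python sketch of the mapping proposed above, under the assumption that "3.11", "311" and a tox-style "py311" should all be accepted. The function names are hypothetical and do not exist in zuul-jobs.

```python
import re

# Accepts "3.11", "311", or a tox-style "py311". These input forms are an
# assumption taken from the discussion, not the role's documented contract.
_VERSION_RE = re.compile(r"^(?:py)?3\.?(\d+)$")


def normalize_python_version(value: str) -> str:
    """Return a dotted "3.N" version string for any accepted input form."""
    match = _VERSION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized python_version: {value!r}")
    return f"3.{match.group(1)}"


def el_devel_package(value: str) -> str:
    """Dotted RH-style devel package name, e.g. python3.11-devel."""
    return f"python{normalize_python_version(value)}-devel"


print(el_devel_package("311"))    # python3.11-devel
print(el_devel_package("3.9"))    # python3.9-devel
print(el_devel_package("py312"))  # python3.12-devel
```

Normalizing first and formatting second keeps the accepted input forms in one place, so a later change to package naming does not touch input parsing.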
@fungicide:matrix.org jangutter: hunting around, it looks like at some time in the past there was a dotless package naming scheme in centos/epel 14:38
@jangutter:matrix.org I'm shocked, I tell you. 14:39
@fungicide:matrix.org yeah, rpmfind has a lot of python39-devel package hits, for example 14:39
@fungicide:matrix.org opensuse still does python311-devel apparently 14:40
@fungicide:matrix.org so it has now diverged depending on the red hat sub-flavor 14:40
@jangutter:matrix.org _but_ I wonder if there's an alias. 14:40
@fungicide:matrix.org oh maybe 14:41
@jangutter:matrix.org yeah, python39 + python3.9 works on c9s 14:41
@jangutter:matrix.org Oh my giddy aunt.... python312 works on c10s too 14:42
@jangutter:matrix.org OK, so please ignore my rant - it turns out that even though the package might be named python3.9 or python39, the aliases seem to be working and last across many versions. 14:43
@jangutter:matrix.org Aha. 14:45
@jangutter:matrix.org There is a difference though: python3.12-devel works, but python312-devel does not. 14:45
@fungicide:matrix.org so depending on the vintage we'll need to use one or the other i guess? 14:46
@jangutter:matrix.org so for the base packages the aliases remain, but the devel package seems to have dropped them. That's new in both c9s and c10s 14:46
@jangutter:matrix.org For the devel package (with rpmfind) it looks like OpenSuse diverged from RH-derived distros. 14:48
@fungicide:matrix.org right, that's what i was saying earlier 14:48
@jangutter:matrix.org (just the devel package, mind you!) 14:48
@fungicide:matrix.org what a mess 14:48
@fungicide:matrix.org i tried to express this complexity in the pep 725/804 draft discussions, but i'm not sure my concern was heard 14:49
@fungicide:matrix.org (not about python.*-devel packages specifically, but the problem of package names changing over time in various distros) 14:50
@jangutter:matrix.org That's why you shouldn't download the packages yourself: give it to an LLM agent to do it for you! 14:52
@fungicide:matrix.org but what if you already *are* an llm agent? 14:52
@jangutter:matrix.org I'm waay ahead of you, never having had a soul. 14:53
@jangutter:matrix.org Never should have gone to 64 bit... it all went downhill from there. 14:54
@jangutter:matrix.org Looking at the ci coverage for that job, there are precious few rpm-based distros voting on it. 14:56
@fungicide:matrix.org at this point opendev only has centos stream and rocky available; someone is currently working on alma. we dropped fedora and opensuse due to lack of interest (in maintaining support for them) 14:58
@jim:acmegating.com (if people are interested in maintaining those, volunteering to do so in opendev is welcome :) 14:59
@fungicide:matrix.org yes, exactly 14:59
@jangutter:matrix.org ah, to be young and have the fun of keeping a distro integrated! (decades ago, I had a Gentoo desktop) 15:00
@jangutter:matrix.org So, two choices: keep the current logic in place in `ensure-python` (just amend it for c10), or I can propose a fix with something that swings the logic around? It's niche, but it means one less thing we keep downstream. 15:02
@fungicide:matrix.org yeah, i think if the only user who's reported running into it only needs it to work for centos, then take the simple approach for now and don't prematurely engineer support for other users who haven't brought it up and likely don't exist 15:08
@fungicide:matrix.org reworking it so the "old" package name format is the exception might help avoid future updates to the logic, but would also potentially be backward-incompatible for platforms we don't know about 15:12
@fungicide:matrix.org basically, prioritize fixing what we know isn't working for actual users who have run into a problem, but avoid breaking what may be currently working for users we're not aware of 15:13
@jangutter:matrix.org Agree... at best I think I need to add a note in the code though. 15:17
@jangutter:matrix.org (for future archaeologists, in some distant age....) 15:18
@fungicide:matrix.org yes, absolutely, a comment about it would be grand 15:19
@fungicide:matrix.org and then later if we end up with an unwieldy list of exceptions or get annoyed by constantly updating it, that's the time to think about potentially-backward-incompatible refactoring of the logic 15:20
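For comparison, a rough Python sketch of the "old name format is the exception" rework discussed above, not what was agreed for the role. The only exception entry is openSUSE, taken from the observation in this log that it still ships python311-devel; the function name and the `dotless_exceptions` parameter are hypothetical.

```python
def devel_package(distro: str, version: str,
                  dotless_exceptions: frozenset[str] = frozenset({"opensuse"})) -> str:
    """Pick a devel package name, defaulting to the dotted form.

    version is expected in dotted "3.N" form. Only distros listed in
    dotless_exceptions get the older dotless name (python311-devel).
    """
    major, minor = version.split(".")
    if distro.lower() in dotless_exceptions:
        return f"python{major}{minor}-devel"
    return f"python{major}.{minor}-devel"


print(devel_package("centos", "3.12"))    # python3.12-devel
print(devel_package("opensuse", "3.11"))  # python311-devel
```

This is the backward-incompatible variant mentioned above: any platform missing from the exception list silently switches to dotted names, which is why it is only worth doing once the current special-casing becomes unwieldy.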
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: 15:27
- [zuul/zuul] 961292: Launcher: handle reused node failure https://review.opendev.org/c/zuul/zuul/+/961292
- [zuul/zuul] 961557: Assign unassigned building nodes to requests https://review.opendev.org/c/zuul/zuul/+/961557
- [zuul/zuul] 962145: Use a subnode for request assignment https://review.opendev.org/c/zuul/zuul/+/962145
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 962291: Update packer job/role documentation https://review.opendev.org/c/zuul/zuul-jobs/+/962291 15:43
-@gerrit:opendev.org- Zuul merged on behalf of Andy Ladjadj: [zuul/zuul-jobs] 836744: fix(packer): prevent task failure when packer_variables is not defined https://review.opendev.org/c/zuul/zuul-jobs/+/836744 15:57
-@gerrit:opendev.org- Jan Gutter proposed: [zuul/zuul-jobs] 962194: Fix up some EL10 compatibility https://review.opendev.org/c/zuul/zuul-jobs/+/962194 16:52
@gordonmessmer:fedora.im https://opendev.org/zuul/zuul/src/branch/master/tools/docker-compose.yaml refers to a container named zuul-test-zookeeper, but I can't find that container or a definition for it 18:09
@fungicide:matrix.org Gordon Messmer: that... is the definition for it? 18:23
@fungicide:matrix.org docker-compose reads the definition in that file to create the container 18:23
@fungicide:matrix.org e.g. if you run https://opendev.org/zuul/zuul/src/branch/master/tools/test-setup-docker.sh i think? 18:24
@gordonmessmer:fedora.im oh hell... I misread the error message I get when trying to start podman-compose. it's failing to *start* that container because of an "invalid mount option" -_- 18:25
@gordonmessmer:fedora.im podman-compose seems not to understand the tmpfs directives 18:27
@fungicide:matrix.org i wonder if it's thrown by the uid= option 18:27
@gordonmessmer:fedora.im yes 18:28
@fungicide:matrix.org at least that's not set in the mysql container's tmpfs list 18:28
@fungicide:matrix.org so if it's starting the mysql container and not zookeeper then that would stand to reason 18:28
@fungicide:matrix.org looks like that option was added to the docker-compose.yaml file by https://review.opendev.org/c/zuul/zuul/+/835019 ~3.5 years ago, for reference 18:30
@fungicide:matrix.org i would be surprised if nobody's tried running the setup with podman-compose in that long, but i suppose it's possible 18:31
@gordonmessmer:fedora.im perhaps they hit an error and simply reverted to using docker. :) 18:31
@fungicide:matrix.org "The :U suffix tells Podman to use the correct host UID and GID based on the UID and GID within the <<container|pod>>, to change recursively the owner and group of the source volume. Chowning walks the file system under the volume and changes the UID/GID on each file, it the volume has thousands of inodes, this process will take a long time, delaying the start of the <<container|pod>>." 18:34
@fungicide:matrix.org from the "Chowning Volume Mounts" section of https://docs.podman.io/en/v4.4/markdown/options/volume.html 18:35
@fungicide:matrix.org i wonder if that's used directly by podman-compose 18:35
@gordonmessmer:fedora.im I've simply removed the uid mount option, and the USER spec for the container. that allows podman-compose to start the set, at least. 18:35
@gordonmessmer:fedora.im perhaps also notable, tox 4.30 no longer supports the "whitelist_externals" directive, so tox fails. 18:39
@fungicide:matrix.org ah, yeah zuul switched to nox around the time that tox v4 happened, so it probably never got fixed to work with it 18:41
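tox 4 removed the `whitelist_externals` key; its replacement, `allowlist_externals`, has been accepted since tox 3.18. For any tox.ini that is kept around, a minimal Python sketch of the rename could look like this. The helper name is hypothetical, and it does a plain text substitution so comments and layout survive.

```python
import re

# Matches a whitelist_externals key at the start of a line, keeping its
# indentation and the "=" separator intact.
_OLD_KEY = re.compile(r"^([ \t]*)whitelist_externals([ \t]*=)", re.MULTILINE)


def modernize_tox_ini(text: str) -> str:
    """Rename the key tox 4 dropped to the spelling tox 4 accepts."""
    return _OLD_KEY.sub(r"\1allowlist_externals\2", text)


sample = """[testenv:docs]
whitelist_externals =
    make
commands = make html
"""
print(modernize_tox_ini(sample))
```

A regex keeps the file byte-for-byte identical apart from the key; round-tripping through `configparser` would drop comments.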
@fungicide:matrix.org i bet TESTING.rst got overlooked for updating 18:42
@gordonmessmer:fedora.im yes, that would make sense. it still refers to tox 18:42
@fungicide:matrix.org well, we also never removed the old tox.ini 18:42
@gordonmessmer:fedora.im OK, so do I need to know anything other than "run nox"? 18:44
@fungicide:matrix.org aha, it looks like there are still some uses of tox mixed around in tool scripts, which i guess is why tox.ini wasn't removed, though i see patterns like `ensure_tox_version: "<4"` 18:45
@fungicide:matrix.org Gordon Messmer: basically yes, though the equivalent of `tox -e myenv` is `nox -s myenv` 18:45
@fungicide:matrix.org and the environments are defined in noxfile.py instead of tox.ini 18:46
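tox's `-e` takes a comma-separated list of environments, while nox's `-s` takes several session names as separate arguments, and both forward anything after `--` as positional arguments. A small Python sketch of that translation, assuming each old tox environment has a nox session with the same name in noxfile.py (the helper itself is hypothetical):

```python
import shlex


def tox_to_nox(command: str) -> str:
    """Translate a simple `tox -e a,b [-- args]` line into its nox form.

    Only -e/--envlist and "--" posargs are understood; any other tox
    option is rejected rather than guessed at.
    """
    args = shlex.split(command)
    if not args or args[0] != "tox":
        raise ValueError("expected a command starting with 'tox'")
    sessions: list[str] = []
    posargs: list[str] = []
    rest = args[1:]
    i = 0
    while i < len(rest):
        arg = rest[i]
        if arg == "--":
            # Everything after "--" is forwarded to the session as posargs.
            posargs = rest[i + 1:]
            break
        if arg in ("-e", "--envlist") and i + 1 < len(rest):
            # Split tox's comma-separated list into separate nox arguments.
            sessions.extend(name for name in rest[i + 1].split(",") if name)
            i += 2
            continue
        raise ValueError(f"unsupported tox argument: {arg!r}")
    nox_args = ["nox"]
    if sessions:
        nox_args += ["-s", *sessions]
    if posargs:
        nox_args += ["--", *posargs]
    return shlex.join(nox_args)


print(tox_to_nox("tox -e myenv"))       # nox -s myenv
print(tox_to_nox("tox -e py311,pep8"))  # nox -s py311 pep8
```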
@gordonmessmer:fedora.im thanks. let's see what happens... 18:48
@gordonmessmer:fedora.im my goal is to rebase and update https://review.opendev.org/c/zuul/zuul/+/859939 18:48
@fungicide:matrix.org cool, this is helpful discussion regardless; i'm working on a patch now to get some of the stuff you've identified cleaned up 18:49
@fungicide:matrix.org though not sure what we should do about podman-compose at the moment 18:49
@fungicide:matrix.org that'll need a bit more digging 18:49
@gordonmessmer:fedora.im I'm seeing a lot of kazoo client errors, "Connection time-out" 18:51
@fungicide:matrix.org could be that zookeeper didn't start 18:52
@fungicide:matrix.org which might be due to the dropped uid mapping 18:52
@fungicide:matrix.org kazoo is the zk client lib 18:52
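Before digging into kazoo time-outs, it can help to confirm that anything is listening on the ZooKeeper port at all. A stdlib-only Python sketch follows. The host and port in the usage line are assumptions (check the ports section of tools/docker-compose.yaml for the real values), and a successful connect says nothing about TLS or ZooKeeper health.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Only proves that something accepts TCP connections; it does not
    perform a TLS handshake or check that ZooKeeper is serving.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


# Hypothetical usage; the port is an assumption, not taken from the log:
# wait_for_port("localhost", 2281)
```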
@gordonmessmer:fedora.im it looks like the container is *running*, at least 18:53
@gordonmessmer:fedora.im I'll see if I can get logs out of it 18:53
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul] 962304: Clean up tox remnants https://review.opendev.org/c/zuul/zuul/+/962304 19:04
@fungicide:matrix.org now to see if that ^ passes testing 19:04
@gordonmessmer:fedora.im found that zookeeper did not have read access to its certificates. simply made them globally readable. I may try to figure out the rootless container permissions later, but this should be good enough for now 19:08
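"Globally readable" here means the "other" read bit. When the container user is neither the owner nor in the group of the mounted files, that bit is what decides access. A Python sketch for finding and applying that bit under a certificate directory, where the directory path and function names are hypothetical:

```python
import os
import stat
from pathlib import Path


def not_world_readable(directory: str) -> list[str]:
    """List regular files under directory that lack the "other" read bit."""
    missing = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and not path.stat().st_mode & stat.S_IROTH:
            missing.append(str(path))
    return missing


def make_world_readable(paths: list[str]) -> None:
    """Add o+r to each file, leaving the other permission bits unchanged."""
    for p in paths:
        mode = stat.S_IMODE(os.stat(p).st_mode)
        os.chmod(p, mode | stat.S_IROTH)


# Hypothetical usage; the certificate directory path is an assumption:
# make_world_readable(not_world_readable("path/to/certs"))
```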
@gordonmessmer:fedora.im unit tests are running. thanks. 19:09
@gordonmessmer:fedora.im if/when the gitea tests pass, I'll open a new review 19:10
@fungicide:matrix.org interesting, i guess the lack of user mapping could have affected the perms on the certs volume 19:11
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul] 962304: Clean up tox remnants https://review.opendev.org/c/zuul/zuul/+/962304 19:33

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!