Tuesday, 2025-07-01

clarkbcorvus: there was a small syntax error in that change so I went ahead and fixed it so we can get test results back00:33
corvusoh thx00:42
clarkbcorvus: I did have a couple small questions on the change. I think it should work but I'm still trying to wrap my brain around the behavior there00:42
clarkbbut I need to eat dinner now so no rush00:42
clarkbalso zk01-03 seem to be stable in count of connections and rough znode numbers. I think the next big event is periodic jobs enqueuing but I may have to follow up on that tomorrow morning00:44
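
For anyone wanting to eyeball those numbers themselves, a minimal sketch of that kind of check, assuming the ZooKeeper 'mntr' four-letter-word command is allowed by the servers' whitelist (it may not be):

    import socket

    def zk_mntr(host, port=2181):
        # send the 'mntr' four-letter-word command and parse the tab-separated reply
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(b"mntr")
            data = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
        return dict(line.split("\t", 1) for line in data.decode().splitlines() if "\t" in line)

    for host in ("zk01.opendev.org", "zk02.opendev.org", "zk03.opendev.org"):
        stats = zk_mntr(host)
        print(host, stats.get("zk_num_alive_connections"), stats.get("zk_znode_count"))
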
corvusclarkb: replied00:45
clarkbthanks I updated to +200:47
mnasiadkacorvus: yup, it has depends-on on a DIB change i dug out that should help with retries (of course after some modifications from what I see)04:06
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Retry git clone/fetch on timeout  https://review.opendev.org/c/openstack/diskimage-builder/+/72158104:33
opendevreviewLukas Kranz proposed zuul/zuul-jobs master: limit-log-files: allow unlimited files  https://review.opendev.org/c/zuul/zuul-jobs/+/95385408:06
opendevreviewLukas Kranz proposed zuul/zuul-jobs master: limit-log-files: allow unlimited files  https://review.opendev.org/c/zuul/zuul-jobs/+/95385408:07
opendevreviewLukas Kranz proposed zuul/zuul-jobs master: limit-log-files: allow unlimited files  https://review.opendev.org/c/zuul/zuul-jobs/+/95385408:13
opendevreviewLukas Kranz proposed zuul/zuul-jobs master: limit-log-files: allow unlimited files  https://review.opendev.org/c/zuul/zuul-jobs/+/95385408:13
opendevreviewLukas Kranz proposed zuul/zuul-jobs master: limit-log-files: allow unlimited files  https://review.opendev.org/c/zuul/zuul-jobs/+/95385408:21
opendevreviewRodolfo Alonso proposed openstack/project-config master: Remove the fullstack job from the master dashboard  https://review.opendev.org/c/openstack/project-config/+/95385608:36
priteauHello. Do you know what is required to make this patch move out of "Ready to submit" state? https://review.opendev.org/c/openstack/bifrost/+/94824509:35
mnasiadkaclarkb, corvus: I think https://review.opendev.org/c/opendev/zuul-providers/+/953269 should be fine now (with the depends-on), although I see there are some other problems (disk full?)09:47
fricklerpriteau: I answered in the ironic channel yesterday: it needs a rebase, the status that gerrit shows is wrong10:29
fricklerinfra-root some review on https://review.opendev.org/c/openstack/diskimage-builder/+/951469 would be nice to allow to build testing/trixie images10:30
fricklermnasiadka: looks like there is another error to check for: "Empty reply from server", I wonder whether we should simply retry any failure? https://zuul.opendev.org/t/opendev/build/b47ca097c4c54a799cf68305a7c27ce510:32
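
Not the actual DIB patch, just a sketch of that "retry any failure" idea: retry the whole clone a few times with a growing pause, regardless of what the error text was.

    import subprocess
    import time

    def clone_with_retries(url, dest, attempts=3, delay=30):
        # retry on any non-zero exit, not just specific curl/git error strings
        for attempt in range(1, attempts + 1):
            try:
                subprocess.run(["git", "clone", url, dest], check=True)
                return
            except subprocess.CalledProcessError:
                if attempt == attempts:
                    raise
                time.sleep(delay * attempt)  # back off a little more each time
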
fricklerand yes, this looks pretty full "/dev/vda1       78062860 73215660   1268820  99% /". maybe a good chance to test the z-l autohold :)10:35
frickler24G for /opt/git sounds like a lot to me, I had a figure of less than 20G for total image size in mind. 10:39
mnasiadkafrickler: might make sense, I'll have a look later10:53
priteaufrickler: I missed your reply, thanks11:13
opendevreviewMerged openstack/project-config master: Remove the fullstack job from the master dashboard  https://review.opendev.org/c/openstack/project-config/+/95385611:24
jrosserwhen does the ubuntu mirror sync run? there are some broken dependencies on qemu-block-extra in the CI mirrors which are (no longer?) present in the upstream mirror12:34
jrosser(some discussion in #openstack-ansible and #openstack-nova about broken jobs related to that)12:35
fungijrosser: it updated an hour ago12:40
fungihttps://static.opendev.org/mirror/ubuntu/timestamp.txt12:40
jrosserso this http://mirror.iad3.openmetal.opendev.org/ubuntu/dists/noble-updates/main/binary-amd64/Packages is still showing different content for qemu-block-extra12:41
fungi"different" compared to...12:41
jrosserdifferent from the equivalent thing at archive.ubuntu.com12:42
jrosserwhich i think is this https://archive.ubuntu.com/ubuntu/ubuntu/ubuntu/dists/noble-updates/main/binary-amd64/Packages.gz12:42
jrosserthe depends on `librbd1 (>= 19.2.1-0ubuntu0.24.04.1)` seems to be wrong12:43
fungireprepro is pulling the packages from http://us.archive.ubuntu.com/ubuntu per https://opendev.org/opendev/system-config/src/commit/5a27433/playbooks/roles/reprepro/files/ubuntu/config/updates#L212:44
fungii'll check the logs12:44
jrosserus.archive.ubuntu.com shows librbd1 (>= 19.2.0-0ubuntu0.24.04.2) as a depends for qemu-block-extra, so that looks OK12:49
fungilooks like the packages file on us.a.u.c last updated at 11:34 utc, we finished pulling from it about 10 minutes before that12:50
jrosseri think that this has been OK using the upstream repos for at least the last 6 hours12:52
fungiwell, it may be that us.archive.ubuntu.com was lagging behind other mirrors12:52
jrosserah right, ok, that makes sense12:52
fungiso if you weren't pulling from the same one we're mirroring from, you might have gotten different results12:52
fungiit looks like there's another mirror pull in progress, started about 40 minutes ago, so should have the new content shortly12:54
fungii wonder if ubuntu's doing a point update for noble today or something12:58
fungibecause these syncs have been really slow12:59
fungiwould also explain the upstream mirrors being somewhat inconsistent12:59
jrosseri think that this is some kind of error13:04
fungicould be, but if so it seems like it may have affected a lot of packages because reprepro mirror pulls are taking over an hour to complete, or at least the last one did13:06
funginormally they're on the order of a few minutes13:06
fungimaybe they accidentally dumped plucky package data into noble or something13:07
opendevreviewMerged openstack/project-config master: Replace OpenStack's CLA enforcement with the DCO  https://review.opendev.org/c/openstack/project-config/+/95099813:10
jrosserthere does seem to be some evidence of ceph 19.2.1 being built https://archive.ubuntu.com/ubuntu/pool/main/c/ceph/13:11
fungiyeah, it's just acting like there's a ton of package churn in the noble packages today13:11
stephenfinfungi: clarkb: (discussing here since clarkb isn't on #openstack-oslo) do we have a list of things pbr does that we want/need to keep written down anywhere?13:14
stephenfinasking before I start one13:15
opendevreviewMerged openstack/project-config master: Replace StarlingX's CLA enforcement with the DCO  https://review.opendev.org/c/openstack/project-config/+/95381913:16
opendevreviewMerged openstack/project-config master: Replace Airship's CLA enforcement with the DCO  https://review.opendev.org/c/openstack/project-config/+/95384913:16
fungistephenfin: do pbr's docs count?13:18
fungior are you asking for a separate document that lists the pbr features we know projects are using?13:18
stephenfin'ish. I think we're missing docs for things like the custom scriptwriter13:19
fungiand when you say "things pbr does" you mean pbr features or undocumented behaviors? (or both?)13:19
fungiah13:19
fungii thought we documented that it creates wsgi scripts and such, but i'll check13:20
fungiyou're right, the features chapter of the document is rather short13:21
stephenfinyeah, and it focuses more on the setup.cfg file and its configuration (likely because that's all I knew about when I wrote it 😅)13:23
stephenfinhttps://etherpad.opendev.org/p/pbr-setuptools-future13:23
fungithe pbr.packaging.generate_script() method is covered in the auto-generated api docs, but that's about it13:23
stephenfinI'll start jotting down stuff there so13:23
fungithanks!13:23
stephenfinwe don't need to get this done today, but I think it would be a good first step in working with the setuptools folks13:25
stephenfinideally we should learn from each other and either fix things in setuptools (to avoid the need to have pbr provide its own implementation), add a hook point to setuptools, or realise a feature no longer makes sense13:26
Clark[m]stephenfin: off the top of my head the PBR scripts do faster process startup because they avoid pkg_resources/importlib. The setuptools version would work but slower. Then there are the wsgi scripts which setuptools doesn't do at all. PBR's commit message version bump tags are something that Openstack uses. The inclusion of git sha information in package metadata is another important feature that setuptools and others don't do. I think those13:36
Clark[m]may be the big ones. Setuptools-scm will get you close otherwise13:36
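
To illustrate the startup-cost point, here are two hypothetical console script stubs (the package and command names are made up): the first is the old pkg_resources-based wrapper setuptools used to generate, the second is the plain-import style that pbr writes and that newer tooling produces.

    # --- old-style wrapper ---
    # importing pkg_resources scans every installed distribution before
    # the program even starts running
    import sys
    from pkg_resources import load_entry_point

    if __name__ == '__main__':
        sys.exit(load_entry_point('mypkg==1.0', 'console_scripts', 'mycmd')())

    # --- plain-import wrapper ---
    # only the imports the program actually needs get paid for at startup
    import sys
    from mypkg.cmd import main

    if __name__ == '__main__':
        sys.exit(main())
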
Clark[m]PBR has also been 100x better at backward compatibility than setuptools. I think that is more of a culture problem than a technical one. For some reason Pypa doesn't want to keep trivial to maintain code around even if it prevents breaking their users13:38
fungithey see forcing users to drop upstream-eol python versions as a feature, not a bug13:38
fungidoing them a favor by saving them from themselves running code with likely security vulnerabilities13:39
Clark[m]It's not just that though. The whole _ vs - debacle served no purpose other than to break existing packages13:41
Clark[m]And many packages built for old python continue to work for new python. The only exception is if you aren't updating every 6 months to fix the next setuptools breakage13:41
fungithe filename normalization change ostensibly prevents future ambiguous package files from entering the archive (ambiguous in the sense that the dist name and classifiers/version are clearly separated now)13:43
fungibut yes, it's arguable whether the churn was justified13:43
Clark[m]They did it in setup.cfg keys too and broke a bunch of stuff aiui13:45
Clark[m]PBR has a whole translation layer protecting its users (that stephenfin wrote)13:46
Clark[m]More generally the risk with just using setuptools is you get on their treadmill of breaking changes. PBR insulated its users from much of that13:46
Clark[m]Because we can shim compatibility into PBR rather than into every repo/package13:47
Clark[m]So that's not really a feature of PBR itself more of a shortcoming of setuptools but probably still worth calling out in a discussion of "this is why we have PBR"13:47
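
A toy illustration of the kind of shim being described; pbr's real translation layer does far more than this, but the dash-to-underscore metadata key mapping amounts to something like:

    def normalize_cfg_keys(metadata_section):
        # map legacy dash-separated setup.cfg keys (e.g. 'author-email',
        # 'description-file') to the underscore form newer setuptools expects
        return {key.replace('-', '_'): value for key, value in metadata_section.items()}

    print(normalize_cfg_keys({'author-email': 'dev@example.org'}))
    # {'author_email': 'dev@example.org'}
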
fungia big part of the setuptools maintainers' argument for that sort of churn is that they lack sufficient bandwidth to keep all of the project maintained, so are trying to actively purge older functionality in order to keep the maintenance burden down, with the idea that projects that need the old features can just use old setuptools versions instead13:52
fungithough i think they see it as "projects that are effectively abandonware, or otherwise too lazy to keep updated to the latest packaging standards"13:53
fungijrosser: the latest mirror pull didn't resolve it. one thing i'm noticing is that the version reprepro is holding onto that depends on librbd1 from plucky is qemu-block-extra 1:8.2.2+ds-0ubuntu1.8 while the one with the working dependency on the official mirrors is qemu-block-extra 1:8.2.2+ds-0ubuntu1.714:13
fungii wonder if they "rolled back" by pulling the 0ubuntu1.8 build from the archive, and reprepro won't downgrade to that now because it thinks it's older (since it is)14:14
fungiservers that installed -0ubuntu1.8 are similarly not going to auto-downgrade to -0ubuntu1.7, so it may be that they need to do a new bump of the build version for everything to straighten out14:15
fungii expect it's possible to force reprepro to forget qemu-block-extra so it refetches the older version, but will take some reading of the docs to figure out14:16
fungi(and i won't have time to get to that right away, since today is "dco day")14:17
clarkbthat's a fun package mirror scenario if that is what happened. I wonder why they didn't revert by rolling the version forward with old content14:39
clarkbif we think that is what happened it may be worth pinging ubuntu about it? as you say people who already installed the package won't downgrade this way14:41
fungiit may be that they thought it would be fine because the package was essentially uninstallable, depending on a version of librbd1 that wasn't available15:00
clarkbfungi: when you get a chance https://review.opendev.org/c/opendev/system-config/+/953783 would be a good one for you to look at as it adds a new mailing list15:13
fungiah, yeah, i saw that but then forgot in my rush to get other stuff done yesterday15:17
clarkbthanks15:23
stephenfinfungi: clarkb: Another pbr question to follow. FYI these are not urgent: I just don't want to waste time if you've already gotten to the bottom of something...15:28
stephenfinI'm trying to fix the remaining issue on https://review.opendev.org/c/openstack/pbr/+/953839 just to unblock that gate. fwict, the issue is that setuptools changed how they normalise their package names between v79.0.0 and v79.0.2 and I'm guessing that breaks this code where we drop a package from the constraints list15:29
stephenfinhttps://github.com/openstack/pbr/blob/5d4a1815afa920cf20e889be20617105446f7ce2/pbr/tests/test_integration.py#L101-L11215:30
fungithey changed the dist name normalization in metadata?15:31
fungiwhat's the actual change in setuptools?15:31
fungijust making sure it's not being conflated with the change to sdist filename normalization15:31
stephenfinsec, lemme grab links15:31
stephenfinI'm looking at 'git diff v75.8.0..v75.8.2', since the job started failing on 2025-02-26 and setuptools cut a release that day (neither virtualenv nor pip did)15:32
stephenfinhttps://github.com/pypa/setuptools/compare/v75.8.0..v75.8.215:33
stephenfinboth pkg_resources and setuptools have normalization-related changes there. Nothing else looks like it could be related. The test is failing due to constraints mismatches15:34
clarkbstephenfin: we're writing too many packages to the constraints file?15:35
clarkbbecause they don't match?15:35
stephenfinI'm thinking (still working on a fix) that we're using upper-constraints.txt and we're also setting PBRVERSION=0.0, so built packages have version 0.0.0, and constraints say they need X.Y.Z15:37
stephenfinand it only fails on glance-store (again, a guess) because the name in the package's setup.cfg differs from what's in upper-constraints.txt15:37
clarkbstephenfin: and that quoted block of code you linked is there to ensure those projects are not in constraints I think so that 0.0 can be installed?15:38
clarkbso ya if the names don't align we'd keep adding them to constraints then have a conflict15:39
stephenfinyeah, I think that's the idea. I'm just wondering why it broke with v79.0.1 or .2. We didn't have anything else normalization-related like this pop up earlier this year, did we?15:39
clarkbjust the sdist names and the setup.cfg entry keys that I can remember15:40
fungiwe switched to using pyproject-build instead of running setup.py directly in release jobs around that time, so could maybe be a difference in resulting metadata for that package15:40
fungistephenfin: glance-store 4.9.1 was released on 2025-02-21... lemme see when that propagated to upper-constraints.txt15:42
stephenfinoh, I didn't think to look at glance-store itself15:43
opendevreviewMerged opendev/system-config master: Add a summitspeakers@lists.openinfra.org mailing list  https://review.opendev.org/c/opendev/system-config/+/95378315:43
stephenfinit's only that one out of the ~65 packages we test that is failing, which is what's so odd15:44
fungihuh, https://review.opendev.org/c/openstack/requirements/+/942508 removed it on 2025-02-2515:46
fungicould the problem be that it's not in the constraints list then?15:47
fungioh, it switched to glance_store15:47
stephenfinbingo15:48
stephenfinso we need to normalize our constraint name also15:49
fungiyeah, 942508 renamed it from glance-store to glance_store15:49
fungiso the output from the generate-constraints script changed around that time, i suppose15:50
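
For reference, the packaging library's canonical name collapses exactly this difference, so a constraints filter that compares canonicalized names rather than raw strings would have kept working across the rename. A minimal sketch, not the actual pbr test code:

    from packaging.utils import canonicalize_name

    def drop_from_constraints(constraint_lines, package):
        # keep every constraint except the package under test, comparing
        # PEP 503 canonical names instead of the raw spelling
        wanted = canonicalize_name(package)
        kept = []
        for line in constraint_lines:
            name = line.split('===')[0].split('==')[0].strip()
            if canonicalize_name(name) != wanted:
                kept.append(line)
        return kept

    print(canonicalize_name('glance-store') == canonicalize_name('glance_store'))  # True
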
fungifrickler: ^ this may be the answer to the baffling failures we were discussing in #openstack-release too?15:50
stephenfinI hope that wasn't me...15:51
stephenfin(I would have been working on it about then iirc)15:51
* stephenfin blame tonyb preventatively15:51
noonedeadpunkhey folks! Do you have any guesses why https://docs.openstack.org/2025.1/deploy/ is not showing any projects, given that https://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/artifacts is merged and promoted https://review.opendev.org/c/openstack/openstack-manuals/+/953313 ?15:55
noonedeadpunkis there any sort of server-side cache for html content?15:55
funginoonedeadpunk: i'll check the log to see if it's failing to update or something, but no normally it should take no more than 5 minutes after the promote job completes15:57
clarkbhave you checked that the artifacts contain the data?15:58
clarkbnoonedeadpunk: ^ you can do that first15:58
clarkbI think either https://opendev.org/openstack/openstack-manuals/src/branch/master/www/2025.1/deploy/index.html isn't doing what you expect it to do in generating the content (check the build result content to see) or publication is failing somewhere15:59
fungiyeah, volume releases are happening as scheduled at least15:59
fungihttps://109254a1749d24a3f999-62593741b623fd737fcd3b17f392bcb8.ssl.cf5.rackcdn.com/openstack/240a3287daba4c029dbc06ed9ca9519d/docs/2025.1/deploy/ seems to be in the preview from the gate job16:02
clarkband it doesn't have the content. So the issue is in generating the content not publishing it16:03
fungiagreed, i pulled the docs archive tarball from that job and it also only contains an index.html file under ./2025.1/deploy/16:04
fungier, from that build16:04
clarkband that index file doesn't have any guides in it16:04
fungiit has two links in it, one for ansible in docker and one for ansible in lxc16:06
clarkboh huh I must be blind it does16:07
clarkbok ignore me the data appears to be there then16:07
clarkbnext thing is probably to check if the RW volume has the data in it?16:08
fungiyeah, so the question is did something else race it and deploy an older version of that file? maybe changes merged for different branches around the same time16:08
clarkboh except you said volume releases are happening so RW and RO should match16:09
clarkbya rsync races are a good guess16:09
noonedeadpunkyes, it should contain only links16:09
fungilooks like there was another docs change that promoted at almost the same time16:10
clarkbthough openstack-manuals seems to be primarily master now (there are some really old stable branches but nothing for 2025.1 for example)16:10
noonedeadpunk2 patches landed one after the other - that's true16:10
noonedeadpunkbut neither of them was promoted16:10
noonedeadpunk(there was a chain)16:10
fungi"Build succeeded (promote pipeline)"16:10
fungithose are promotions16:10
noonedeadpunkbut I'd expect some semaphores there then if it's sensitive to race conditions?16:11
fungibuilds of jobs running in the promote pipeline16:11
noonedeadpunkand they were run in serial manner, yeah16:12
fungiyeah, promote should be using a supercedent pipeline manager, i think, so as long as those were for the same branch they wouldn't have run concurrently16:12
noonedeadpunkyeah, right16:13
clarkbhttps://zuul.opendev.org/t/openstack/build/56ea313811794f9999fcce2c5625cdee/log/job-output.txt#133-135 and https://zuul.opendev.org/t/openstack/build/b569b5cc2b454b2593d8e2d982166fdf/log/job-output.txt#133-135 seem to confirm there was a minute and half ish between them16:14
noonedeadpunkSynchronize files to AFS is changed in there as well16:15
noonedeadpunkSo I'd expect promote to be successfull16:15
clarkbthe executors were replaced with new systems which means new kernels and openafs builds. However, elsewhere afs synchronization seems to be working (I just published a zuul blog post an hour ago for example)16:15
clarkbso I don't think its a fundamental afs problem16:15
noonedeadpunkThe only thing I can guess then is that AFS wasn't in fact mounted16:17
clarkbor the source data was wrong. Or the target path is wrong somehow16:18
noonedeadpunkwell the artifact does look fine, so unless it fetched the wrong artifact...16:18
clarkbhttps://ef4886d127b9d5e50b2a-46ed996c4c88287cea630d62dd5380de.ssl.cf1.rackcdn.com/openstack/37a9b4537fc9495c99974bf011d09f00/docs-html.tar.gz this is what was fetched16:18
clarkbaccording to https://zuul.opendev.org/t/openstack/build/56ea313811794f9999fcce2c5625cdee/console#1/0/4/localhost16:19
fungii tracked back from the last run of that promote job and the artifact it said it pulled in https://zuul.opendev.org/t/openstack/build/b569b5cc2b454b2593d8e2d982166fdf/console#1/0/4/localhost does include those lines in the index.html16:19
fungiis that the most recent run of the job?16:19
clarkbthe one I linked to is for https://review.opendev.org/c/openstack/openstack-manuals/+/95331216:19
clarkbwhich ran first not second I think16:20
noonedeadpunkyes, right, I just saw that artifact was correct as well16:20
fungiyeah, from there i went to build history to check the most recent build, just in case it had overwritten it with different content16:20
noonedeadpunkhttps://zuul.opendev.org/t/openstack/buildset/a53547d0a00447c2828a57569a53c725 is the last one 16:21
noonedeadpunkaccording to https://zuul.opendev.org/t/openstack/buildsets?project=openstack%2Fopenstack-manuals&pipeline=promote&skip=016:21
fungiyeah, that's the buildset containing the build i linked above16:21
clarkbhttps://109254a1749d24a3f999-62593741b623fd737fcd3b17f392bcb8.ssl.cf5.rackcdn.com/openstack/240a3287daba4c029dbc06ed9ca9519d/docs-html.tar.gz and that  build pulled this tarball16:22
clarkbside note: those tarballs are tarbombs :/16:22
noonedeadpunkdoes content on AFS match? 16:22
fungilast modified time on /afs/.openstack.org/docs/2025.1/deploy/index.html is 2025-05-02T14:2716:25
fungiso it's not getting overwritten in afs16:25
noonedeadpunkso I am wondering what in promote job does mount afs?16:26
fungii did at least check from a zuul executor that it does have working afs at that path16:26
noonedeadpunkas right after `aklog` the upload happens16:26
clarkbopenafs is installed on the executors and is part of the bwrap bind mount iirc16:27
clarkbso nothing in the job is mounting it. Its already mounted16:27
noonedeadpunkaha, ok16:27
noonedeadpunkso it just needs auth during runtime which it gets16:28
fungiyeah, the "upload-afs-roots: Synchronize files to AFS" task should be doing it, and seems to reference the correct unpacked source path16:28
fungiwe don't get any output from that task though16:28
clarkbthis build is what published my zuul blog post update https://zuul.opendev.org/t/zuul/build/5bee8a15e93d4e219471f064f0debc9216:28
clarkbthat runs in post not promote but I don't think that should matter16:29
noonedeadpunkok, let me land another patch to the repo then and see what happens16:29
clarkbhttps://zuul.opendev.org/t/zuul/build/5bee8a15e93d4e219471f064f0debc92/console#3/0/5/localhost its synchronize is far more verbose16:30
clarkbI think that task literally copied no files16:30
fungiyeah16:30
fungicould /var/lib/zuul/builds/b569b5cc2b454b2593d8e2d982166fdf/work/docs/ be empty then?16:30
clarkbin https://zuul.opendev.org/t/openstack/build/b569b5cc2b454b2593d8e2d982166fdf/console#1/0/26/localhost build_roots is empty16:31
clarkbbut build_roots is not empty in my working example16:31
clarkbI think this must be the source of the problem16:31
fungihttps://zuul.opendev.org/t/openstack/build/b569b5cc2b454b2593d8e2d982166fdf/console#1/0/6/localhost is what should have put the files in place16:31
clarkbhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-afs-roots/library/zuul_afs.py#L109 this is where build_roots comes from. I was initially reading it as an input to the task but it is an output logging what it did16:33
clarkbhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/upload-afs-roots/library/zuul_afs.py#L32-L3316:34
clarkbis there a .root-marker?16:34
clarkbthere isn't one in the tarball. Not sure if the job creates that or it needs it in the tarball16:34
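
Roughly what the role is checking for, going by the linked module: only directories that contain a .root-marker file get collected as publish roots, so a tarball without one produces the empty build_roots seen above. A simplified sketch, not the real zuul_afs.py:

    import os

    def find_publish_roots(source_dir):
        # walk the unpacked docs tree and keep only subtrees carrying a
        # .root-marker file; everything else is skipped by the sync
        roots = []
        for dirpath, dirnames, filenames in os.walk(source_dir):
            if '.root-marker' in filenames:
                roots.append(dirpath)
        return roots

    print(find_publish_roots('work/docs'))  # [] when the tarball has no marker
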
fungithe "Write root_marker file" task was skipped due to a conditional result16:35
clarkbin the zuul example the job writes it here: https://zuul.opendev.org/t/zuul/build/5bee8a15e93d4e219471f064f0debc92/console#2/0/0/localhost16:35
noonedeadpunkhttps://zuul.opendev.org/t/openstack/buildsets?project=openstack%2Fopenstack-manuals&pipeline=promote&skip=016:35
fungihttps://zuul.opendev.org/t/openstack/build/b569b5cc2b454b2593d8e2d982166fdf/console#1/0/7/localhost16:35
noonedeadpunkthis is what creates .root-marker I think16:35
noonedeadpunkdoh16:35
noonedeadpunkhttps://zuul.opendev.org/t/zuul/build/5bee8a15e93d4e219471f064f0debc92/console#2/0/0/localhost16:36
clarkbhttps://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L597-L61516:36
clarkbwrite_root_marker: false16:36
clarkbto me that implies the artifact is supposed to contain the root marker and we stopped doing that?16:36
fungiso presumably you'd do that if it should contain one already16:36
fungiyeah16:37
clarkbfungi: ya exactly. I think that flag exists because sometimes the content in the archive is expected to already have the data16:37
noonedeadpunkbut what about copy task above?16:37
clarkbnoonedeadpunk: that is from the zuul blog post which works16:37
noonedeadpunkah, doh16:37
noonedeadpunkright16:37
noonedeadpunkok, yes, there's indeed no .root-marker task in the affected job16:39
clarkbhttps://opendev.org/openstack/openstack-manuals/src/branch/master/.zuul.yaml#L6-L29 the comments in that job make no sense. One says no root marker is written and the other says it is written so you need the vars16:40
noonedeadpunkwell the job was not touched for a veeeeery long time16:41
fungilooking to see what, if anything has updated in that volume in the past month16:41
clarkbya but maybe something that tox runs was changed16:41
fungifor comparison purposes16:41
fungidocs for glean and openstack-zuul-jobs updated there recently16:41
fungibetting they run a different set of jobs16:42
clarkbyes manuals has its own jobs16:42
clarkbopenstack-manuals/tools/build-all-rst.sh does write a root marker16:43
fungi948577,1 and 948425,4 promoted about half an hour apart on may 2, the first one got its content published and the second did not16:44
noonedeadpunkI'm not sure this is actually the job which runs?16:45
noonedeadpunkdisregard16:45
noonedeadpunkI mixed myself up again with the URIs16:45
clarkbwe build the artifact tarball with tox -epublishdocs which runs tools/publishdocs.sh which calls tools/build-all-rst.sh --pdf16:45
fungiso it seems like we might have a half-hour window we can narrow the behavior change to16:46
fungiunfortunately the artifacts have all aged out of swift since that was 2 months ago16:46
clarkbI think that should be writing the root marker in that process. Maybe the problem is in the tarball step we aren't including the . file16:46
clarkboh we only write the root marker if the branch is master16:47
clarkbwhich it was16:47
fungiwhich should be irrelevant in the case of openstack-manuals since it doesn't have stable branches i don't think16:47
fungior rather it stopped having them after stable/ocata16:48
noonedeadpunkright16:48
fungicould the branch test behavior have changed and stopped matching?16:49
fungi(complete shot in the dark)16:49
clarkbmaybe its looking at $ZUUL_BRANCH16:50
clarkbhttps://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/console#2/0/13/ubuntu-jammy doesn't seem to record the env vars that were used to run the task16:50
clarkbhttps://opendev.org/openstack/openstack-manuals/src/branch/master/.zuul.yaml#L23-L29 this is where we set it theoretically16:51
noonedeadpunkis it https://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/log/tox/publishdocs/3-commands[2].log#13 ?16:51
clarkbnoonedeadpunk: ya that looks like it is being set16:53
clarkbnoonedeadpunk: I think what I would do to debug further is update openstack-manuals tools/publishdocs and tools/build-all-rst to set -x and add debugging output around root marker creation. Then if we confirm it is being written the next step is checking why it isn't included in the archive16:53
noonedeadpunkso here we run publishdocs.sh https://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/log/tox/publishdocs/1-commands[0].log#2016:54
noonedeadpunkbut I don't see build-all-rst being sourced16:54
noonedeadpunkoh, it doesn't have it....16:55
noonedeadpunkdoh16:55
clarkbnoonedeadpunk: publishdocs.sh calls it16:55
clarkbbut I think we need more logging (set -x would help a lot)16:55
noonedeadpunkI type faster than I think :(16:56
clarkbmaybe the publish-docs/html/.root-marker path is not the correct path anymore?16:56
clarkbbut I think the first thing is to confirm it is being written at all, then debug why it isn't getting into the tarball16:57
fungicheck jobs should be sufficient for verifying that, so a dnm change ought to do the trick16:57
clarkbyup16:57
clarkbhttps://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/console#3/0/16/ubuntu-jammy this is what creates the tarball17:00
clarkbpublish-docs/html does look to be the correct path to me17:00
clarkbnoonedeadpunk: you might be able to add a -v to the tar invocation there and do a depends on to get the file listing of things going into the tarball if you confirm the root marker is written17:01
noonedeadpunk++17:02
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Retry git clone/fetch on timeout  https://review.opendev.org/c/openstack/diskimage-builder/+/72158117:09
clarkbinfra-root when do you think we should pull the old zk servers out of DNS? https://review.opendev.org/c/opendev/zone-opendev.org/+/953844 I can also take that as a signal to cleanup the old servers17:10
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Retry git clone/fetch on timeout  https://review.opendev.org/c/openstack/diskimage-builder/+/72158117:10
corvusclarkb: zk dns any time17:23
priteauI just got some job failures due to: Source /opt/cache/files/cirros-0.5.3-x86_64-disk.img not found. Has the file been removed?17:23
fungimight be something we didn't expect that changed with zuul-built images vs nodepool?17:24
fungithough we're including it in https://opendev.org/opendev/zuul-providers/src/branch/master/dib-elements/cache-devstack/source-repository-images#L417:25
clarkbfungi: priteau I see that image listed in both nodepool image build config and zuul launcher built images config17:25
clarkbbut also those files are supposed to be treated as a best effort cache. If not present you should download it yourself17:25
clarkbpriteau: can you link to the job?17:26
fungiyeah, which is devstack's behavior noramlly, at least17:26
clarkbsorry job log17:26
fungiso guessing this isn't devstack-based17:26
noonedeadpunkso it is echoing data at least in debug: https://zuul.opendev.org/t/openstack/build/1aa3f63f30434b4baa89af3f7b018145/log/tox/publishdocs/1-commands[0].log#1016517:27
clarkbnoonedeadpunk: I notice there is an rsync -a after the root marker is written that targets the dir that root marker is written to. But that shouldn't overwrite the file17:28
clarkbyou have to set extra flags to delete with rsync iirc17:28
noonedeadpunkbut it's not implying --delete17:28
noonedeadpunkiirc17:29
clarkbyes. I just wanted to call that out17:29
clarkbto make sure my assumptions made sense, and it sounds like they do17:29
noonedeadpunkI also was looking at it right now17:29
clarkbnoonedeadpunk: maybe add a task after tox that does an ls -al of the dir 17:29
noonedeadpunkbut it kinda opens the door for error17:29
clarkband maybe even cat the file17:29
noonedeadpunkSo I'd rather create it in www/static/ to be on the safe side...17:30
clarkbzuul web node listing seems to only show in use nodes?17:30
fricklerfungi: stephenfin: not sure if I missed something in the backlog, but yes, I've been thinking about needing to update the constraints and possibly the requirements.txt in consumers, but that sounds like a difficult/questionable thing to do for stable branches17:30
clarkband I don't see IP addresses17:30
fungiclarkb: i think because it's tenant-scoped? so only relevant once there's a node request for a build in that tenant?17:31
clarkbfungi: ya. I'm wondering what the equivalent of nodepool list is now so that I as admin can see an overall picture. Maybe that doesn't exist yet?17:31
fungifrickler: what we observed was that generate-constraints renamed glance-store to glance_store in the upper-constraints.txt file at the time when the errors in job began17:31
fungithe change i linked where that happened didn't call it out specifically as happening though17:32
corvusclarkb: ips should be in the json; i think there isn't a column in the javascript ui for them (yet)17:33
clarkbcorvus: any idea what the server column with null values and copy buttons is supposed to represent?17:34
corvusi think external id17:34
clarkboh the first id is the internal zuul id. Got it17:35
clarkbI'm going to step out for a bit. I've got hvac people coming over today and the time window they gave us covers the team meeting. So I need to make sure there is unobstructed access in and out of the garage before then17:40
noonedeadpunkhm, so I actually think it could be that the job was just broken but for some reason was not failing before, as until I rebased on top of https://review.opendev.org/c/openstack/openstack-manuals/+/951429 my DNM was failing: https://zuul.opendev.org/t/openstack/build/f6a4b48ce567463db385a1c6de5f301817:46
opendevreviewMichal Nasiadka proposed openstack/diskimage-builder master: Retry git clone/fetch on timeout  https://review.opendev.org/c/openstack/diskimage-builder/+/72158117:46
noonedeadpunk(not failing due to missing set -e or smth)17:47
noonedeadpunkas now `Synchronize files to AFS` is already running for 5 minutes17:48
stephenfinclarkb: fungi: wrapping up now, but fwiw it seems setuptools is no longer responsible for generating console scripts. That's instead handled by pip (via the vendored distlib library) and that doesn't use importlib17:54
fungineat17:56
fungiso maybe we can drop that from pbr and just expect everyone to have a new enough pip?17:56
noonedeadpunkhow new these need to be?17:57
stephenfinor at the very minimum put it behind a Python version check, on the assumption that if you're using python3.6 or better, you're good17:57
stephenfinnoonedeadpunk: not sure yet, but this is the code that does the generation https://github.com/pypa/pip/blob/main/src/pip/_vendor/distlib/scripts.py17:58
stephenfinjust a case of finding out how that's called, and how long/since what release of pip it's been called that way17:58
stephenfinmy guess is that it may be a PEP-517 thing but TBD17:59
noonedeadpunkwell, according to history, it looks around 1y at least...17:59
fungiscripts.py has been there in one form or another dating back to a vendored copy of distlib that was added to pip in march 201318:00
fungii suppose the bigger question is at what point did it start getting used for this purpose18:00
fungihttps://github.com/pypa/pip/commit/9936d6b18:01
stephenfinhttps://github.com/pypa/pip/commit/22b63127cbccc3476033e54eb72bac8ea932ce3a18:02
clarkbthats only when installing from a wheel but everything installed by pip becomes a wheel first then is installed right?18:02
clarkbI guess that is good news and probably something we can simplify in pbr / remove then18:02
fungijrosser: i tweaked our mirror script to frontload the update with an explicit `reprepro remove noble-updates qemu-block-extra` and started a manual run of the script. looks like it pulled the older version of the package now, so once this completes the errors will hopefully go away18:03
jrosserfungi: thats great, thank you18:04
noonedeadpunkjrosser: thanks for raising that 18:04
fungiclarkb: yes, pip doesn't "install" sdists these days, it builds a wheel and installs that18:04
clarkbnoonedeadpunk: so you think 951429 fixes this?18:04
clarkbthis == root markers for afs publication?18:05
noonedeadpunkpromote jobs already runs for over 20 minutes18:05
stephenfinindeed. even if you give it an sdist directly, it still builds the wheel18:05
noonedeadpunkso either smth went terribly wrong, or it does the job now for real18:05
stephenfinexhibit a https://paste.opendev.org/show/bOq1KbVBPlXRVU33ygAG/18:06
noonedeadpunkbut the fact that the dnm which adds set -e failed right away, with nothing merged before it, is very suspicious18:06
clarkbstephenfin: I guess the thing to double check is update PBR to drop all scripts handling then pip install a package and see what the scripts look like. Setuptools does have script handling in it but I'm guessing that's largely for setup.py install these days which is largely going away? Basically let's ensure that the normal pip install case does what we expect and if so we can probably18:08
funginoonedeadpunk: so supposition is that the script was terminating early before the root marker was added, and then the job was proceeding normally thereafter as if it assumed all the content was present?18:08
clarkbdrop the handling in PBR entirely18:08
clarkbstephenfin: but also I think it is late for you so don't worry about that now18:08
noonedeadpunkyeah, as some content was generated already, but not all of it18:08
noonedeadpunkso artifacts looked "fine"18:08
clarkbnoonedeadpunk: I think adding -e is something that should happen then. To make errors more obvious (rather than dnm it)18:09
fungiheck, it's late enough for me, i'm ready for meetings to be over so i can go get dinner18:09
fungiagreed, sounds like that script needs to become more robust18:09
fungiif this was actually the problem18:10
stephenfinclarkb: I've been checking the behaviour of setuptools by itself (see my paste above) and that looks good. The question will be whether we remove the pbr stuff entirely, or as I said hide it behind a Python version check18:10
stephenfinat some point, releasing pbr2 and leaving pbr as-is for more ancient software might just be easier. not something to be figured out now though18:11
noonedeadpunkbut the project deploy guide was produced in my dnm before the failure: https://zuul.opendev.org/t/openstack/build/f6a4b48ce567463db385a1c6de5f3018/log/job-output.txt#101318:11
noonedeadpunkand then it was hitting the warning and failed: https://zuul.opendev.org/t/openstack/build/f6a4b48ce567463db385a1c6de5f3018/log/job-output.txt#127018:11
clarkbfungi: I'm trying to pull up lists.opendev.org to get a link to my archived meeting agenda and the page is not loading. Server load seems a bit high but not crazy. Not sure if this is a known problem (ai crawlers possible)18:12
clarkbhrm looks like we may be digging into swap and mariadb is workign hard18:13
clarkbok page finally loaded for me18:13
noonedeadpunkand the warning was also in the kolla/osa jobs: https://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/log/job-output.txt#124418:13
noonedeadpunkexcept, it finished with `ok: Runtime: 0:01:53.808615` right after that18:13
clarkbfungi: ya there is definitely some suspicious activity in the apache logs18:14
noonedeadpunkbut in my DNM which was successful - it's not even half: https://zuul.opendev.org/t/openstack/build/1aa3f63f30434b4baa89af3f7b018145/log/job-output.txt#133318:14
clarkbif we haven't already we should apply our user agent filter list to that node18:14
noonedeadpunkbuild finished with problems, 1 warning (with warnings treated as errors). congratulations :) (113.61 seconds) 18:15
noonedeadpunkhttps://zuul.opendev.org/t/openstack/build/240a3287daba4c029dbc06ed9ca9519d/log/job-output.txt#1307-131518:16
noonedeadpunkso yeah18:16
noonedeadpunkit should not be DNM then....18:16
clarkbI need to focus on meeting prep so won't dig into the lists server situation further right now but I suspect step 0 is applying that filter if we don't already then maybe updating the filter list as necessary18:17
fungihttps://opendev.org/opendev/system-config/src/commit/76233e9/playbooks/roles/mailman3/tasks/main.yaml#L167-L170 has it in there already18:18
fungiso i guess we need to make some additions to the filter list18:19
clarkbagreed18:20
opendevreviewJeremy Stanley proposed opendev/system-config master: Add another weird UA to our general filters  https://review.opendev.org/c/opendev/system-config/+/95390418:21
clarkbfungi: do we want to just add that one? I didn't check how many occurrences there are, but I suspect there may be others if we spend a few minutes checking the logs (usually I try to grep the UAs out then have sort/uniq etc give me a count of the worst offenders then those that are definitely odd get added to our list)18:22
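
The kind of triage being described, as a rough sketch, assuming the standard combined log format where the user agent is the final quoted field on each line:

    import re
    from collections import Counter

    ua_pattern = re.compile(r'"([^"]*)"\s*$')  # last quoted field = User-Agent
    counts = Counter()

    with open('access.log') as log:  # path is illustrative
        for line in log:
            match = ua_pattern.search(line)
            if match:
                counts[match.group(1)] += 1

    for agent, hits in counts.most_common(20):
        print(f'{hits:8d}  {agent}')
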
clarkbfungi: I posted a comment to the change for a bug that will need fixing either way18:23
noonedeadpunkand content now promoted: https://docs.openstack.org/2025.1/deploy/18:24
fungid'oh, that's what i get for appending a copy of the last line18:24
noonedeadpunkthanks folks for your time - I would never have found that .root-marker18:25
opendevreviewJeremy Stanley proposed opendev/system-config master: Add another weird UA to our general filters  https://review.opendev.org/c/opendev/system-config/+/95390418:25
corvusnoonedeadpunk: for future reference there's a description of it here: https://zuul-ci.org/docs/zuul-jobs/latest/afs-roles.html#role-upload-afs-roots18:33
noonedeadpunkthanks!18:35
mnasiadkaclarkb: Any idea if anything related to disk space changed after switching to niz? 2025-07-01 19:05:30.838 | cp: error copying '/opt/dib_tmp/dib_image.xgYsGicI/image0.raw' to '/opt/dib_tmp/dib-images/ubuntu-focal.vhd-intermediate': No space left on device19:09
clarkbmnasiadka: we're limited to the disk space available on the test nodes. We've made some changes previously to reduce the total amount of space used. But the total space varies by cloud provider and their flavors. We may need to look and see if there is more we can trim out to make things fit. That said I think frickler indicated the total size of the git repos seemed larger than19:10
clarkbexpected so maybe we start there?19:10
mnasiadkaAh, right - I'll check19:10
fungithe zuul info logs collected for the build should give some idea of the fs layout and sizes19:11
fungijrosser: it's updated now and i see qemu-block-extra 1:8.2.2+ds-0ubuntu1.7 depending on librbd1 (>= 19.2.0-0ubuntu0.24.04.2) again19:18
mnasiadkaWell, not really seeing anything that is off from the DIB image size report in terms of filenames, but still 23G in total for /opt/dib_tmp/dib_build.9SeIVw8X/built/opt/git/opendev.org and 15G for git/opendev.org/openstack alone19:19
fungiif we're running it on a rax classic node with 40gb rootfs and not mounting the ephemeral disk at /opt then that could easily blow up19:21
corvus(since the image builds happen in the opendev tenant, they've been running on niz nodes for several weeks)19:24
mnasiadkawe are using the ephemeral disk, so it's not that19:26
mnasiadkalet me remove the depends-on to check if the git retry code is not adding something it shouldn't19:27
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: Add Ubuntu bionic/focal builds, labels and provider config  https://review.opendev.org/c/opendev/zuul-providers/+/95326919:27
clarkboooh side effects from that would be interesting19:27
priteauclarkb: sorry for the delay responding, this is the job which failed to get the cirros image: https://zuul.opendev.org/t/openstack/build/62ce2e3acdd549f896b24839d88214c619:34
priteauSame failure in another one: https://zuul.opendev.org/t/openstack/build/85da9e8b18774b72b865322b5402177919:35
clarkbcan you link to the portion of the log that shows the cirros issue? there are a lot of failures I'm seeing19:36
clarkbalso I see zuul logger errors are you rebooting the node possibly?19:37
clarkb(that might impact mounts of /opt in some clouds)19:37
priteauThe failure is during the last task of https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_62c/openstack/62ce2e3acdd549f896b24839d88214c6/primary/ansible/seed-deploy19:38
priteau"msg: Source /opt/cache/files/cirros-0.5.3-x86_64-disk.img not found"19:38
priteauI don't believe we are rebooting the node at all19:38
clarkbpriteau: if you navigate to that file in the zuul web dashboard you can link to the line19:39
clarkbfor example https://zuul.opendev.org/t/openstack/build/62ce2e3acdd549f896b24839d88214c6/log/primary/ansible/seed-deploy#11121 which seems to indicate the file is present?19:39
clarkbor maybe the stat: exists: false just below that is that task saying nothing changed because the file isn't there?19:42
clarkbbut it doesn't error until later19:42
clarkbpriteau: fungi corvus mnasiadka I see the problem. In opendev/zuul-providers we don't put cache-devstack element in all image builds. We should probably move it into the base-elements list so that all images get that file19:46
clarkbpriteau: that said we've never promised those files will exist on all images. It is a cache and caches may not have entries. Your jobs should handle the fallback case where they don't exist as well19:46
clarkbin particular that cirros image is quite old and it wouldn't be crazy for us to stop caching it in hopes people use the newer versions instead19:48
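
Separate from the cache fix, a sketch of the fallback clarkb describes: use the cached copy when the image happens to be there, otherwise download it (the cirros URL below is the usual upstream location, shown purely as an example).

    import os
    import shutil
    import urllib.request

    CACHED = '/opt/cache/files/cirros-0.5.3-x86_64-disk.img'
    UPSTREAM = 'https://download.cirros-cloud.net/0.5.3/cirros-0.5.3-x86_64-disk.img'

    def fetch_cirros(dest):
        if os.path.exists(CACHED):
            shutil.copy(CACHED, dest)  # cache hit: no network needed
        else:
            urllib.request.urlretrieve(UPSTREAM, dest)  # cache miss: fall back to upstream
        return dest
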
fungiwe've semi-regularly dropped less-used versions to keep the list from growing massive19:49
priteauFor some reason we have issues booting the 0.6.x series. This is the last one in the 0.5.x19:50
priteauBut that's another topic to investigate19:50
fungiwith the expectation that taking slightly longer to fetch those images in a handful of jobs is an okay tradeoff for keeping our node images smaller19:50
fungibut yeah, 5.3 is probably used in a lot of jobs still19:50
opendevreviewClark Boylan proposed opendev/zuul-providers master: Cache devstack files on all images  https://review.opendev.org/c/opendev/zuul-providers/+/95390819:51
clarkbI think ^ will fix the missing files19:51
fungigonna go grab dinner, back soon19:54
priteauthanks19:55
clarkbcorvus: I updated the zk upgrade etherpad notes. Hopefully those make sense now.19:59
clarkbI'm going to dig up lunch then will look at zookeeper server cleanups after20:00
opendevreviewMerged opendev/zone-opendev.org master: Remove zk04-zk06 from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/95384420:45
clarkbok those names don't resolve for me anymore. Last call otherwise I'm deleting the servers at the top of the hour20:51
clarkb#status log Deleted zk04, zk05, and zk06.opendev.org. These zookeeper servers have been replaced by zk01, zk02, and zk03.opendev.org.21:03
opendevstatusclarkb: finished logging21:04
clarkbI even remembered to clean up the emergency file21:28
clarkbtook me a few to remember that those servers were in there though21:28
fungithanks!21:30
clarkbcorvus: I think both of the launcher changes you wanted to get in place for opendev have landed is the plan still to restart launchers today?21:37
clarkbI'm thinking I can recheck my changes that tripped over the provider nodeset assignment a few times afterwards if you think that would be helpful21:37
corvusclarkb: i'm almost ready to switch back to that21:39
corvusclarkb: both launchers restarted22:00
clarkbI'll issue a couple rechecks now then22:01
clarkbfrickler: I posted a comment on https://review.opendev.org/c/openstack/diskimage-builder/+/951469 asking for clarification on the trixie support change. fungi you may be interested in that too22:22
opendevreviewMerged openstack/project-config master: Add zuul-launcher max servers  https://review.opendev.org/c/openstack/project-config/+/95380322:25
opendevreviewMerged openstack/project-config master: Grafana: update zuul status page for NIZ  https://review.opendev.org/c/openstack/project-config/+/95382322:29
clarkbfungi: looking at the reprepro hack for ubuntu package cleanup. That script is used by all of the reprepro repos but I guess this is safe because we'll just short circuit on the set -e if it fails in a repo without those packages?23:00
clarkbmostly just wondering what happens here if run against debian mirroring for example23:00
clarkbbut it's probably fine to fail a couple of updates in order to fix ubuntu noble23:00
fungiyeah23:04
fungiif it becomes a problem i can run a copy of the script23:04
fungibut not planning to leave it like this anyway23:04
clarkbyup its temporary23:04
clarkbI'm going to pop out now for a family dinner. Probably won't be back until tomorrow morning23:05
fungicool, catch up with you then23:05
