Thursday, 2025-03-20

opendevreview Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles  https://review.opendev.org/c/opendev/glean/+/941672 10:25
opendevreview Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles  https://review.opendev.org/c/opendev/glean/+/941672 11:57
Clark[m] https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ 14:24
Clark[m] I'm having a slow start today. But enjoy this article that covers similar experiences to our own. Also I feel a bit lucky in that our mitigations seem to have been mostly sufficient so far 14:25
*** priteau2 is now known as priteau 15:01
priteau Hello. In blazar-nova we need to install nova to run unit tests (since blazar-nova is a scheduler filter for Nova), we've been using a script that supports this for a while now. We recently reworked it to remove use of zuul-cloner, thinking that we could clone from local /opt/git/opendev.org instead. It worked fine in master but is failing now in stable/2025.1, but only for py39 15:26
priteau jobs. Do you know what could be causing this? 15:26
priteau https://review.opendev.org/c/openstack/blazar-nova/+/834159/4/tools/tox_install.sh 15:26
frickler priteau: that branch might not be in the copy in the image until it has been rebuilt, which can take a couple of days. but then, why do you do this at all and do not add nova as required-project and let zuul prepare the repo for you? 15:31
tonyb priteau: There's always a lag between what's in the image cache and what is truly in master.  There are many potential solutions.  You could always do a `git remote update` in the cache dir, you could look for a branch and fallback to master, you could add nova to your required projects (if it isn't already) and then use that to install from. 15:48
clarkb ya the most correct thing for this use case is to use zuul's required projects to set up the repo for you 15:49
clarkb that will get you working depends-on among other features 15:49
tonyb priteau: At some level it depends on the use case for tox_install.sh.  Is it primarily used for CI?  or for local developers 15:49
clarkb infra-root nb07 has built every arm64 image except openeuler (which is paused/disabled for now). It will run the image export cron in about 2 hours. If that cronjob is successful then I'll start pushing up changes to remove nb04 and clean it up. Note I think I put nb04 in the emergency file before its cron could update so we may still get a complaint from it but as long as nb07 looks 15:50
clarkb good that should be fine as nb04 will go away soon I hope 15:50
tonyb Yeah adding nova as a required project will do many good things for CI,  but some of those things will need to be emulated for a local developer experience. 15:51
tonyb priteau: Happy to discuss some of the various options 15:51
clarkb one thing to consider with removing old builders is that while we don't think that will cause us to orphan image records in the zk db it is a possibility. It looks like rockylinux-9 images have rolled over though and cleared out records without nb01 or nb02 running so I think this is evidence that our hunch is correct and we won't have a problem 15:53
fungi technically both should be compatible approaches 15:53
fungi if a developer has a nova clone then they'll want to check out the latest branch state for it 15:54
fungi while zuul will do that part for you (and adjust to speculative state for any depends-on reference) 15:55
tonyb fungi: I didn't mean to imply it couldn't be done. zuul just makes it easier and if a local developer wants the same easy experience the tox or tools/tox_install.sh need to accommodate that. 15:56
tonyb It may be that it already does but I'm not super familiar with the blazar-nova daily workflow and I'd be guessing based on other projects 15:57
clarkb ya you can do something like what devstack does. 15:57
clarkb (devstack has a flag that says git repos are in a happy state already leave them alone (the zuul case) or it will clone if necessary then update to the current branch) 15:58
tonyb Oh nice. 15:58
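The devstack-style behaviour clarkb describes could be sketched roughly as below. This is only an illustration under stated assumptions: the REPOS_PREPARED variable and the repo_action and prepare_repo function names are hypothetical and are not devstack's actual flag or code. The idea is that CI sets the flag so the script leaves Zuul-prepared repos alone, while a local developer run clones when needed and otherwise updates to the requested branch.

```shell
#!/bin/sh
# Hypothetical flag (not devstack's real variable name): set to 1 when
# something else (e.g. Zuul) has already put every repo in the right state.
REPOS_PREPARED="${REPOS_PREPARED:-0}"

# Print the action to take for a repo directory: leave, clone or update.
repo_action() {
    repo_dir="$1"
    if [ "$REPOS_PREPARED" = "1" ]; then
        echo leave
    elif [ ! -d "$repo_dir/.git" ]; then
        echo clone
    else
        echo update
    fi
}

# Apply the chosen action. The URL, directory and branch come from the caller.
prepare_repo() {
    repo_url="$1"; repo_dir="$2"; branch="$3"
    case "$(repo_action "$repo_dir")" in
        leave)
            ;;
        clone)
            git clone --branch "$branch" "$repo_url" "$repo_dir"
            ;;
        update)
            git -C "$repo_dir" fetch origin "$branch" &&
                git -C "$repo_dir" checkout "$branch" &&
                git -C "$repo_dir" merge --ff-only "origin/$branch"
            ;;
    esac
}
```

A local run might then call `prepare_repo https://opendev.org/openstack/nova "$HOME/src/nova" master`, while a CI job would export REPOS_PREPARED=1 before invoking the script.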
opendevreview Clark Boylan proposed opendev/system-config master: Remove nb01, nb02, and nb04 from config management  https://review.opendev.org/c/opendev/system-config/+/945124 16:06
clarkb as mentioned I'd like to hold off on merging ^ until we at least see the cronjob on nb07 succeed 16:07
clarkb and then happy to hold off further if others want to wait for other checks or database state cleanup 16:07
opendevreview Clark Boylan proposed opendev/zone-opendev.org master: Remove dns records for old nodepool builders  https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 16:07
opendevreview Clark Boylan proposed openstack/project-config master: Cleanup nb04 builder config  https://review.opendev.org/c/openstack/project-config/+/945126 16:08
opendevreview Vladimir Kozhukalov proposed openstack/project-config master: End gating for openstack/openstack-helm-infra  https://review.opendev.org/c/openstack/project-config/+/945127 16:12
priteau tonyb: to be honest I copied this script from neutron years ago and rarely use it outside of CI 16:14
tonyb Trying to get nodepool functional tests running on noble, I hit an error with nodepool-nox-3.11 which I don't think is related: 16:17
tonyb https://zuul.opendev.org/t/zuul/build/0c81062113ec47e9ab2e6ddcd589e99d/log/job-output.txt#1821-1832 16:17
tonyb Pointers on how to debug/check would be appreciated as I don't think that recheck will do anything helpful 16:17
opendevreview Michal Nasiadka proposed openstack/diskimage-builder master: CI: Add a tool for displaying CPU flags and QEMU version  https://review.opendev.org/c/openstack/diskimage-builder/+/937836 16:19
clarkb tonyb: look at 'ZooKeeper connection: LOST' in the logs. I think this is the annoying zookeeper connectivity exploded problem 16:21
clarkb I recently updated the zookeeper version used by zuul to something more modern that should run better on the java on jammy and noble. That updated zuul-jobs' ensure-zookeeper role which I suspect nodepool also uses 16:21
clarkb it's possible that a simple recheck is all you need due to ^ 16:21
clarkb I'm trying to find the version it installed now 16:22
clarkb https://zuul.opendev.org/t/zuul/build/0c81062113ec47e9ab2e6ddcd589e99d/console#3/0/3/ubuntu-jammy hrm no that used 3.9.3 which is current 16:23
priteau frickler: clarkb: I was already suspecting that adding a required-project was the proper fix, but we are only using openstack-python3-jobs from templates here. Do we need to use the same approach as neutron where they redefine jobs? https://opendev.org/openstack/neutron/src/branch/master/zuul.d/job-templates.yaml 16:23
clarkb tonyb: doesn't look like this job collects the statsd info like zuul's test jobs do. Also doesn't collect the zk logs like zuul does. Grabbing those two pieces of info might be the next thing to see if zookeeper crashed or similar 16:23
clarkb priteau: yes, though you aren't so much redefining the jobs as creating a child of the job with additional config 16:24
tonyb clarkb: If you point me at where we do that elsewhere I can do that 16:24
clarkb priteau: if you look in openstack-zuul-jobs there are templates for projects like neutron and horizon that do some of this too but if this is the only place you need it doing it once in the repo is probably fine 16:24
clarkb tonyb: https://opendev.org/zuul/zuul/src/branch/master/playbooks/zuul-nox/post-system-logs.yaml I think this post-run playbook on the zuul-nox job does it 16:25
tonyb clarkb: Thanks 16:25
clarkb oh actually dstat-graph may be half the equation for dstat 16:26
clarkb tonyb: https://opendev.org/zuul/zuul/src/branch/master/playbooks/zuul-nox/pre.yaml#L4-L5 this is the other half in pre-run 16:26
opendevreview Clark Boylan proposed opendev/system-config master: Remove nb01, nb02, and nb04 from config management  https://review.opendev.org/c/opendev/system-config/+/945124 16:30
opendevreview Clark Boylan proposed opendev/system-config master: Cleanup docker-compose.yaml versions in Noble services  https://review.opendev.org/c/opendev/system-config/+/945131 16:33
clarkb frickler: ^ those two changes now cleanup the docker-compose.yaml version: lines for services it is safe to do so on 16:33
clarkb that should cleanup the warning for those specific setups 16:34
clarkb tonyb: re 944118 keep in mind the nodepool functional jobs run nodepool from a container so we're not testing noble python etc 16:46
clarkb but I think you're trying to cover the runtime bootstrapping on noble so that's fine? 16:46
tonyb Correct 16:46
clarkb ack 16:46
tonyb I can call that out in the commit message when it needs re-working 16:47
priteau Am I doing things correctly in this change? https://review.opendev.org/c/openstack/blazar-nova/+/945132/1/.zuul.yaml 16:48
priteau It appears to have worked 16:48
opendevreview Clark Boylan proposed opendev/lodgeit master: Bump lodgeit up to python3.12  https://review.opendev.org/c/opendev/lodgeit/+/945135 16:51
clarkb if ^ is happy I'll rebase the granian stuff onto it so that we move to 3.12 first (I think that is a bigger priority) 16:51
clarkb priteau: yes that will ensure /home/zuul/src/opendev.org/openstack/nova is checked out to the appropriate commit 16:51
clarkb priteau: so you just need to update your job to refer to that location 16:52
tonyb It looks right, though more complex than I expected. 16:52
tonyb Actually it's installing nova from: Found -e git+file:///opt/git/opendev.org/openstack/nova python package installed rather than from /home/zuul/src/opendev.org/openstack/nova 16:53
opendevreview Clark Boylan proposed opendev/lodgeit master: Bump lodgeit up to python3.12  https://review.opendev.org/c/opendev/lodgeit/+/945135 16:55
priteau tonyb: that's hardcoded in the tox_install script. Should I use /home/zuul instead? 16:57
clarkb /opt/git is a cache and is only updated when the images rebuild. Zuul knows to use this cache when preparing the content in /home/zuul/src. Basically it starts with the cache content then updates it to the correct git state for your change context 16:58
clarkb In theory no one but zuul should use /opt/git in the zuul ci jobs 16:58
tonyb priteau: I would.  You can attempt to detect the path by looking at the zuul.projects ansible var or you could do something like `for repo_dir in /home/zuul/src/opendev.org /opt/git/opendev.org ; do pip install ${repo_dir}/openstack/nova && break ; done` 17:02
clarkb I would stop using /opt/git entirely 17:02
clarkb there really is no reason for any zuul job to ever look in those repos themselves 17:02
clarkb it's an implementation detail to speed up zuul's processing of git repos for you 17:03
tonyb priteau: but I *think* if you require nova, and nova is included in zuul.projects[], tox_siblings will just do what you want 17:03
tonyb clarkb: Okay, priteau Just switch to /home/zuul :) 17:04
clarkb oh I had forgotten about tox siblings 17:05
clarkb yes tox siblings should do what you want here and is maybe the simplest fix 17:05
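If the install script is kept instead of relying on tox siblings, the advice above (use the Zuul-prepared checkout under /home/zuul/src, never /opt/git) might look roughly like the sketch below. The function names and the LOCAL_SRC_ROOT variable are hypothetical, introduced only for illustration, and the fallback directory for local developers is an assumption rather than anything from blazar-nova's actual tools/tox_install.sh.

```shell
#!/bin/sh
# Print the first candidate root that contains the project checkout.
# Arguments: project path (e.g. openstack/nova), then one or more roots.
find_project_src() {
    project="$1"; shift
    for root in "$@"; do
        if [ -d "$root/$project" ]; then
            echo "$root/$project"
            return 0
        fi
    done
    return 1
}

# Install nova from the Zuul-prepared checkout when present, otherwise from
# a developer-supplied root (LOCAL_SRC_ROOT is a hypothetical variable).
# /opt/git is deliberately not consulted: it is only Zuul's image cache.
install_nova() {
    src=$(find_project_src openstack/nova \
        /home/zuul/src/opendev.org \
        "${LOCAL_SRC_ROOT:-$HOME/src/opendev.org}") || {
        echo "nova checkout not found" >&2
        return 1
    }
    pip install -e "$src"
}
```

In a job that lists nova under required-projects, the first root exists and carries any depends-on state; on a developer machine the second root is used if a checkout is there.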
tonyb clarkb: and with no actual code change the nodepool unittests passed :/ 17:08
clarkb tonyb: ya it's almost certainly a slow node/oom/low memory type of failure 17:10
clarkb zookeeper and its clients rely on heartbeats to maintain connections and if something causes a major slowdown heartbeats start to fail and go sideways 17:11
clarkb I think this is also why corvus has been interested in 16gb nodes with zuul launcher efforts 17:11
tonyb ahh.  That makes sense 17:11
clarkb corvus: speaking of, the openmetal change has an invalid key for az: I noted we might want to put ovh in first before openmetal while we add support for az? 17:11
corvus clarkb: there should be a depends-on for the change that implements that, and i've approved it, so it should land within the next several years. 17:18
corvus i think i'd prefer to just wait for it to merge before doing more testing, that way the next batch of zuul-launcher experiments happen with the drivers at parity 17:19
clarkb wfm 17:19
opendevreview Jeremy Stanley proposed opendev/bindep master: Drop requirements.txt  https://review.opendev.org/c/opendev/bindep/+/938570 17:26
opendevreview Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files  https://review.opendev.org/c/opendev/bindep/+/940711 17:26
fungi frickler: ^ addressed your consistency comment 17:27
opendevreview Clark Boylan proposed opendev/system-config master: DNM testing new py312 build of lodgeit  https://review.opendev.org/c/opendev/system-config/+/945143 17:31
clarkb the cronjob produced an export file on nb07. I think I'm happy with the new builders and it should be ok to start removing the old ones 17:36
opendevreview Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files  https://review.opendev.org/c/opendev/bindep/+/940711 18:05
fungi frickler: ^ seems the comments do work based on local testing 18:06
frickler fungi: did you reply with gertty? looks a bit weird to see your reply in front of my question https://review.opendev.org/c/opendev/bindep/+/940711/7..8/pyproject.toml 18:19
fungi frickler: yeah, i'm using the experimental patch for threaded comment support, but i guess it still gets something wrong with ordering things 18:20
opendevreview Merged opendev/system-config master: Cleanup docker-compose.yaml versions in Noble services  https://review.opendev.org/c/opendev/system-config/+/945131 19:01
opendevreview Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files  https://review.opendev.org/c/opendev/bindep/+/940711 19:01
fungi clarkb: frickler: ^ i just spotted a lingering test-requirements.txt reference in CONTRIBUTING.rst too 19:01
opendevreview Jeremy Stanley proposed opendev/engagement master: Update project boilerplate  https://review.opendev.org/c/opendev/engagement/+/945151 19:04
opendevreview Jeremy Stanley proposed opendev/engagement master: Import old who-approves.py script  https://review.opendev.org/c/opendev/engagement/+/945152 19:04
opendevreview Jeremy Stanley proposed opendev/system-config master: Remove redundant old who-approves.py script  https://review.opendev.org/c/opendev/system-config/+/945154 19:06
opendevreview Merged opendev/system-config master: Remove nb01, nb02, and nb04 from config management  https://review.opendev.org/c/opendev/system-config/+/945124 19:17
opendevreview Jeremy Stanley proposed opendev/engagement master: Update project boilerplate  https://review.opendev.org/c/opendev/engagement/+/945151 19:28
opendevreview Jeremy Stanley proposed opendev/engagement master: Import old who-approves.py script  https://review.opendev.org/c/opendev/engagement/+/945152 19:28
fungi infra-root: any more takers on https://review.opendev.org/938570 and https://review.opendev.org/940711 for bindep? both have 2x+2 already, so i'll plan to self-approve those and tag a new release in, say, an hour 20:55
clarkb infra-root anyone else want to review https://review.opendev.org/c/openstack/project-config/+/945126 to remove nb04 from project-config and then there is this unreviewed dns cleanup for the three old builders: https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 20:56
clarkb assuming those land soonish I'll plan to check the new builders one more time tomorrow morning then delete them and their volumes 20:56
clarkb and then I guess I'll look at replacing the arm mirror node and whatever else is next on the todo list after that 20:57
clarkb and hopefully also update more things to python3.12 20:57
fungi thanks, i missed those two in the earlier flurry, but have approved both now 20:59
clarkb thanks! 21:00
opendevreview Merged opendev/zone-opendev.org master: Remove dns records for old nodepool builders  https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 21:00
clarkb fungi: note the project-config change didn't get approved 21:47
clarkb should I go ahead and approve it? 21:47
fungi sorry, did now. thought i'd pushed the button but i guess not 22:36
opendevreview Merged openstack/project-config master: Cleanup nb04 builder config  https://review.opendev.org/c/openstack/project-config/+/945126 22:38
opendevreview Merged opendev/bindep master: Drop requirements.txt  https://review.opendev.org/c/opendev/bindep/+/938570 22:43
opendevreview Merged opendev/bindep master: Drop auxiliary requirements files  https://review.opendev.org/c/opendev/bindep/+/940711 22:43
clarkb boom 22:45
fungi going to tag bindep 2.13.0 as a minor increment since it drops python 3.6 support 22:48
clarkb sounds good 22:48
fungi we've generally not been doing major increments for that if requires_python keeps consumers with older interpreters from pulling the new release 22:48
clarkb I'm double checking nodepool builders have 945126 deployed 22:48
clarkb ya /etc/nodepool/nodepool.yaml hasn't updated on nb07 since yesterday and much longer for nb04 (which isn't in inventory anymore and is in the emergency file) 22:50
clarkb anyway I think that lgtm. I'll plan to do server cleanups in the morning 22:50
clarkb note the change that removed version: from docker-compose for paste, tracing, grafana, and codesearch doesn't appear to have restarted any of those services. I don't think we need to but wanted to make note of that 22:51
clarkb the python3.12 updates will address that for paste at least 22:51
clarkb and we have sufficient CI I'm not worried that broke something. I also confirmed the warning about version went away when manually running the cron command on nb06 22:51
fungi commit 09f16f60713cc6d00cf8226cd55ca3417e33051e (HEAD -> tag: 2.13.0, origin/master, origin/HEAD, gerrit/master, gerrit/HEAD) 22:55
clarkb looking 22:55
clarkb 09f16f60713cc6d00cf8226cd55ca3417e33051e is the commit I see and 2.13.0 appears to be the next tag version so that lgtm 22:56
fungi thanks! pushing... 22:56
fungi opendev-publish-unversioned-nox-docs has still got problems: https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4 23:02
fungi supposed to be ref-agnostic, but doesn't seem to work in the tag-oriented pipelines yet 23:03
clarkb the task that failed only runs when zuul.tag is set 23:03
clarkb https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console#3/0/3/localhost 23:03
fungi yeah, that's presumably what still needs adjusting 23:04
clarkb that playbook is identical to the tox-docs publish playbook 23:05
clarkb er wait maybe it's a different part of the code 23:05
clarkb ok write-root-marker is an included role. So the publish playbook is the same and they both call that role which is where it ultimately fails 23:06
fungi the hope was that we could have one job that updates the docs after changes merge and also after tags appear, particularly so that published release notes don't show the latest release as still under development until the next change after the tag merges 23:08
fungi https://pypi.org/project/bindep/ shows the new version now 23:08
fungi #status log Released Bindep 2.13.0: https://docs.opendev.org/opendev/bindep/latest/releasenotes.html#relnotes-2-13-0 23:08
opendevstatus fungi: finished logging 23:08
fungi (that url is "eventually consistent") 23:08
fungi seems it did actually publish? 23:09
clarkb https://pypi.org/project/bindep/#files seems so 23:10
fungi i mean the release notes published with the updated version 23:10
clarkb fungi: I think I see the problem and I think this affects tox too 23:10
clarkb post.yaml needs to run first but if you look at https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console#3/0/3/localhost it's queued up behind the child job publish.yaml 23:11
fungi ah 23:11
clarkb also the playbooks/tox-jobs/post.yaml does an extra fetch compared to playbooks/nox-jobs/post.yaml 23:11
clarkb I'm not sure if that matters yet but I think the fundamental problem is the order of playbooks. 23:11
clarkb I think in opendev/base-jobs we need the child jobs to run the post.yaml and the publish.yaml to get the order right (then parent can run them again as things bubble up)? 23:12
clarkb actually the parent abstract base job is using the same playbooks so we can just drop the playbook from the child jobs 23:12
clarkb fetch-tox-output is just collecting tox logs so that shouldn't matter 23:13
clarkb one sec 23:13
fungi i expect that's something i screwed up in the existing implementation, this inconsistency has been a quixotic ocd quest of mine for years now 23:13
clarkb the two publishing jobs don't actually differ in any way except their names 23:14
clarkb oh wait the secrets are different and that includes publication dest info I think 23:15
clarkb so two jobs there is fine 23:15
fungi yeah, i think that was the reason 23:16
fungi dest was baked into the secret 23:16
fungi so needed a different secret to alter 23:16
clarkb fungi: looking at the console log we actually run publish.yaml and fail. Then we run post.yaml and publish.yaml and succeed 23:17
clarkb this is why things updated but the job still records it as a fail 23:17
clarkb anyway I'm writing a commit message explaining this and will push shortly 23:18
fungi hah, wow that's terrible 23:18
opendevreview Clark Boylan proposed opendev/base-jobs master: Remove post-run playbooks from child doc publishing jobs  https://review.opendev.org/c/opendev/base-jobs/+/945174 23:19
clarkb https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console illustrates this 23:19
opendevreview Merged opendev/base-jobs master: Remove post-run playbooks from child doc publishing jobs  https://review.opendev.org/c/opendev/base-jobs/+/945174 23:38
