opendevreview | Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles https://review.opendev.org/c/opendev/glean/+/941672 | 10:25 |
---|---|---|
opendevreview | Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles https://review.opendev.org/c/opendev/glean/+/941672 | 11:57 |
Clark[m] | https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ | 14:24 |
Clark[m] | I'm having a slow start today. But enjoy this article that covers similar experiences to our own. Also I feel a bit lucky in that our mitigations seem to have been mostly sufficient so far | 14:25 |
*** priteau2 is now known as priteau | 15:01 | |
priteau | Hello. In blazar-nova we need to install nova to run unit tests (since blazar-nova is a scheduler filter for Nova), we've been using a script that supports this for a while now. We recently reworked it to remove use of zuul-cloner, thinking that we could clone from local /opt/git/opendev.org instead. It worked fine in master but is failing now in stable/2025.1, but only for py39 | 15:26 |
priteau | jobs. Do you know what could be causing this? | 15:26 |
priteau | https://review.opendev.org/c/openstack/blazar-nova/+/834159/4/tools/tox_install.sh | 15:26 |
frickler | priteau: that branch might not be in the copy in the image until it has been rebuilt, which can take a couple of days. but then, why do you do this at all and do not add nova as required-project and let zuul prepare the repo for you? | 15:31 |
tonyb | priteau: There's always a lag between what's in the image cache and what is truly in master. There are many potential solutions. You could always do a `git remote update` in the cache dir, you could look for a branch and fall back to master, you could add nova to your required projects (if it isn't already) and then use that to install from. | 15:48 |
clarkb | ya the most correct thing for this use case is to use zuul's required projects to set up the repo for you | 15:49 |
clarkb | that will get you working depends-on among other features | 15:49 |
tonyb | priteau: At some level it depends on the usecase for tox_install.sh. Is it primarily used for CI? or for local developers | 15:49 |
clarkb | infra-root nb07 has built every arm64 image except openeuler (which is paused/disabled for now). It will run the image export cron in about 2 hours. If that cronjob is successful then I'll start pushing up changes to remove nb04 and clean it up. Note I think I put nb04 in the emergency file before its cron could update so we may still get a complaint from it but as long as nb07 looks | 15:50 |
clarkb | good that should be fine as nb04 will go away soon I hope | 15:50 |
tonyb | Yeah adding nova as a required project will do many good things for CI, but some of those things will need to be emulated for a local developer experience. | 15:51 |
tonyb | priteau: Happy to discuss some of the various options | 15:51 |
clarkb | one thing to consider with removing old builders is that while we don't think that will cause us to orphan image records in the zk db it is a possibility. It looks like rockylinux-9 images have rolled over though and cleared out records without nb01 or nb02 running so I think this is evidence that our hunch is correct and we won't have a problem | 15:53 |
fungi | technically both should be compatible approaches | 15:53 |
fungi | if a developer has a nova clone then they'll want to check out the latest branch state for it | 15:54 |
fungi | while zuul will do that part for you (and adjust to speculative state for any depends-on reference) | 15:55 |
tonyb | fungi: I didn't mean to imply it couldn't be done. zuul just makes it easier and if a local developer wants the same easy experience the tox or tools/tox_install.sh need to accommodate that. | 15:56 |
tonyb | It may be that it already does but I'm not super familiar with the blazar-nova daily workflow and I'd be guessing based on other projects | 15:57 |
clarkb | ya you can do something like what devstack does. | 15:57 |
clarkb | (devstack has a flag that says git repos are in a happy state already leave them alone (the zuul case) or it will clone if necessary then update to the current branch) | 15:58 |
tonyb | Oh nice. | 15:58 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove nb01, nb02, and nb04 from config management https://review.opendev.org/c/opendev/system-config/+/945124 | 16:06 |
clarkb | as mentioned I'd like to hold off on merging ^ until we at least see the cronjob on nb07 succeed | 16:07 |
clarkb | and then happy to hold off further if others want to wait for other checks or database state cleanup | 16:07 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove dns records for old nodepool builders https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 | 16:07 |
opendevreview | Clark Boylan proposed openstack/project-config master: Cleanup nb04 builder config https://review.opendev.org/c/openstack/project-config/+/945126 | 16:08 |
opendevreview | Vladimir Kozhukalov proposed openstack/project-config master: End gating for openstack/openstack-helm-infra https://review.opendev.org/c/openstack/project-config/+/945127 | 16:12 |
priteau | tonyb: to be honest I copied this script from neutron years ago and rarely use it outside of CI | 16:14 |
tonyb | Trying to get nodepool functional tests running on noble, I hit an error with nodepool-nox-3.11 which I don't think is related: | 16:17 |
tonyb | https://zuul.opendev.org/t/zuul/build/0c81062113ec47e9ab2e6ddcd589e99d/log/job-output.txt#1821-1832 | 16:17 |
tonyb | Pointers on how to debug/check would be appreciated as I don't think that recheck will do anything helpful | 16:17 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: CI: Add a tool for displaying CPU flags and QEMU version https://review.opendev.org/c/openstack/diskimage-builder/+/937836 | 16:19 |
clarkb | tonyb: look at 'ZooKeeper connection: LOST' in the logs. I think this is the annoying "zookeeper connectivity exploded" problem | 16:21 |
clarkb | I recently updated the zookeeper version used by zuul to something more modern that should run better on the java on jammy and noble. That updated zuul-jobs' ensure-zookeeper role which I suspect nodepool also uses | 16:21 |
clarkb | it's possible that a simple recheck is all you need due to ^ | 16:21 |
clarkb | I'm trying to find the version it installed now | 16:22 |
clarkb | https://zuul.opendev.org/t/zuul/build/0c81062113ec47e9ab2e6ddcd589e99d/console#3/0/3/ubuntu-jammy hrm no that used 3.9.3 which is current | 16:23 |
priteau | frickler: clarkb: I was already suspecting that adding a required-project was the proper fix, but we are only using openstack-python3-jobs from templates here. Do we need to use the same approach as neutron where they redefine jobs? https://opendev.org/openstack/neutron/src/branch/master/zuul.d/job-templates.yaml | 16:23 |
clarkb | tonyb: doesn't look like this job collects the statsd info like zuul's test jobs do. Also doesn't collect the zk logs like zuul does. Grabbing those two pieces of info might be the next thing to see if zookeeper crashed or similar | 16:23 |
clarkb | priteau: yes, though you aren't so much redefining the jobs as creating a child of the job with additional config | 16:24 |
tonyb | clarkb: If you point me at where we do that elsewhere I can do that | 16:24 |
clarkb | priteau: if you look in openstack-zuul-jobs there are templates for projects like neutron and horizon that do some of this too but if this is the only place you need it doing it once in the repo is probably fine | 16:24 |
clarkb | tonyb: https://opendev.org/zuul/zuul/src/branch/master/playbooks/zuul-nox/post-system-logs.yaml I think this post-run playbook on the zuul-nox job does it | 16:25 |
tonyb | clarkb: Thanks | 16:25 |
clarkb | oh actually dstat-graph may be half the equation for dstat | 16:26 |
clarkb | tonyb: https://opendev.org/zuul/zuul/src/branch/master/playbooks/zuul-nox/pre.yaml#L4-L5 this is the other half in pre-run | 16:26 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove nb01, nb02, and nb04 from config management https://review.opendev.org/c/opendev/system-config/+/945124 | 16:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Cleanup docker-compose.yaml versions in Noble services https://review.opendev.org/c/opendev/system-config/+/945131 | 16:33 |
clarkb | frickler: ^ those two changes now clean up the docker-compose.yaml version: lines for the services where it is safe to do so | 16:33 |
clarkb | that should clean up the warning for those specific setups | 16:34 |
clarkb | tonyb: re 944118 keep in mind the nodepool functional jobs run nodepool from a container so we're not testing noble python etc | 16:46 |
clarkb | but I think you're trying to cover the runtime bootstrapping on noble so that's fine? | 16:46 |
tonyb | Correct | 16:46 |
clarkb | ack | 16:46 |
tonyb | I can call that out in the commit message when it needs re-working | 16:47 |
priteau | Am I doing things correctly in this change? https://review.opendev.org/c/openstack/blazar-nova/+/945132/1/.zuul.yaml | 16:48 |
priteau | It appears to have worked | 16:48 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Bump lodgeit up to python3.12 https://review.opendev.org/c/opendev/lodgeit/+/945135 | 16:51 |
clarkb | if ^ is happy I'll rebase the granian stuff onto it so that we move to 3.12 first (I think that is a bigger priority) | 16:51 |
clarkb | priteau: yes that will ensure /home/zuul/src/opendev.org/openstack/nova is checked out to the appropriate commit | 16:51 |
clarkb | priteau: so you just need to update your job to refer to that location | 16:52 |
tonyb | It looks right, though more complex than I expected. | 16:52 |
tonyb | Actually it's installing nova from: Found -e git+file:///opt/git/opendev.org/openstack/nova python package installed rather than from /home/zuul/src/opendev.org/openstack/nova | 16:53 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Bump lodgeit up to python3.12 https://review.opendev.org/c/opendev/lodgeit/+/945135 | 16:55 |
priteau | tonyb: that's hardcoded in the tox_install script. Should I use /home/zuul instead? | 16:57 |
clarkb | /opt/git is a cache and is only updated when the images rebuild. Zuul knows to use this cache when preparing the content in /home/zuul/src. Basically it starts with the cache content then updates it to the correct git state for your change context | 16:58 |
clarkb | In theory no one but zuul should use /opt/git in the zuul ci jobs | 16:58 |
tonyb | priteau: I would. You can attempt to detect the path by looking at the zuul.projects ansible var or you could do something like `for repo_dir in /home/zuul/src/opendev.org /opt/git/opendev.org ; do pip install "${repo_dir}/openstack/nova" && break ; done` | 17:02 |
clarkb | I would stop using /opt/git entirely | 17:02 |
clarkb | there really is no reason for any zuul job to ever look in those repos themselves | 17:02 |
clarkb | it's an implementation detail to speed up zuul's processing of git repos for you | 17:03 |
tonyb | priteau: but I *think* if you require nova, and nova is included in zuul.projects[], tox_siblings will just do what you want | 17:03 |
tonyb | clarkb: Okay, priteau Just switch to /home/zuul :) | 17:04 |
clarkb | oh I had forgotten about tox siblings | 17:05 |
clarkb | yes tox siblings should do what you want here and is maybe the simplest fix | 17:05 |
tonyb | clarkb: and with no actual code change the nodepool unittests passed :/ | 17:08 |
clarkb | tonyb: ya it's almost certainly a slow node/oom/low memory type of failure | 17:10 |
clarkb | zookeeper and its clients rely on heartbeats to maintain connections and if something causes a major slowdown heartbeats start to fail and go sideways | 17:11 |
clarkb | I think this is also why corvus has been interested in 16gb nodes with zuul launcher efforts | 17:11 |
tonyb | ahh. That makes sense | 17:11 |
clarkb | corvus: speaking of the openmetal change has an invalid key for az: I noted we might want to put ovh in first before openmetal while we add support for az? | 17:11 |
corvus | clarkb: there should be a depends-on for the change that implements that, and i've approved it, so it should land within the next several years. | 17:18 |
corvus | i think i'd prefer to just wait for it to merge before doing more testing, that way the next batch of zuul-launcher experiments happen with the drivers at parity | 17:19 |
clarkb | wfm | 17:19 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop requirements.txt https://review.opendev.org/c/opendev/bindep/+/938570 | 17:26 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files https://review.opendev.org/c/opendev/bindep/+/940711 | 17:26 |
fungi | frickler: ^ addressed your consistency comment | 17:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing new py312 build of lodgeit https://review.opendev.org/c/opendev/system-config/+/945143 | 17:31 |
clarkb | the cronjob produced an export file on nb07. I think I'm happy with the new builders and it should be ok to start removing the old ones | 17:36 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files https://review.opendev.org/c/opendev/bindep/+/940711 | 18:05 |
fungi | frickler: ^ seems the comments do work based on local testing | 18:06 |
frickler | fungi: did you reply with gertty? looks a bit weird to see your reply in front of my question https://review.opendev.org/c/opendev/bindep/+/940711/7..8/pyproject.toml | 18:19 |
fungi | frickler: yeah, i'm using the experimental patch for threaded comment support, but i guess it still gets something wrong with ordering things | 18:20 |
opendevreview | Merged opendev/system-config master: Cleanup docker-compose.yaml versions in Noble services https://review.opendev.org/c/opendev/system-config/+/945131 | 19:01 |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop auxiliary requirements files https://review.opendev.org/c/opendev/bindep/+/940711 | 19:01 |
fungi | clarkb: frickler: ^ i just spotted a lingering test-requirements.txt reference in CONTRIBUTING.rst too | 19:01 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Update project boilerplate https://review.opendev.org/c/opendev/engagement/+/945151 | 19:04 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Import old who-approves.py script https://review.opendev.org/c/opendev/engagement/+/945152 | 19:04 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Remove redundant old who-approves.py script https://review.opendev.org/c/opendev/system-config/+/945154 | 19:06 |
opendevreview | Merged opendev/system-config master: Remove nb01, nb02, and nb04 from config management https://review.opendev.org/c/opendev/system-config/+/945124 | 19:17 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Update project boilerplate https://review.opendev.org/c/opendev/engagement/+/945151 | 19:28 |
opendevreview | Jeremy Stanley proposed opendev/engagement master: Import old who-approves.py script https://review.opendev.org/c/opendev/engagement/+/945152 | 19:28 |
fungi | infra-root: any more takers on https://review.opendev.org/938570 and https://review.opendev.org/940711 for bindep? both have 2x+2 already, so i'll plan to self-approve those and tag a new release in, say, an hour | 20:55 |
clarkb | infra-root anyone else want to review https://review.opendev.org/c/openstack/project-config/+/945126 to remove nb04 from project-config and then there is this unreviewed dns cleanup for the three old builders: https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 | 20:56 |
clarkb | assuming those land soonish I'll plan to check the new builders one more time tomorrow morning then delete them and their volumes | 20:56 |
clarkb | and then I guess I'll look at replacing the arm mirror node and whatever else is next on the todo list after that | 20:57 |
clarkb | and hopefully also update more things to python312 | 20:57 |
fungi | thanks, i missed those two in the earlier flurry, but have approved both now | 20:59 |
clarkb | thanks! | 21:00 |
opendevreview | Merged opendev/zone-opendev.org master: Remove dns records for old nodepool builders https://review.opendev.org/c/opendev/zone-opendev.org/+/945125 | 21:00 |
clarkb | fungi: note the project-config change didn't get approved | 21:47 |
clarkb | should I go ahead and approve it? | 21:47 |
fungi | sorry, did now. thought i'd pushed the button but i guess not | 22:36 |
opendevreview | Merged openstack/project-config master: Cleanup nb04 builder config https://review.opendev.org/c/openstack/project-config/+/945126 | 22:38 |
opendevreview | Merged opendev/bindep master: Drop requirements.txt https://review.opendev.org/c/opendev/bindep/+/938570 | 22:43 |
opendevreview | Merged opendev/bindep master: Drop auxiliary requirements files https://review.opendev.org/c/opendev/bindep/+/940711 | 22:43 |
clarkb | boom | 22:45 |
fungi | going to tag bindep 2.13.0 as a minor increment since it drops python 3.6 support | 22:48 |
clarkb | sounds good | 22:48 |
fungi | we've generally not been doing major increments for that if requires_python keeps consumers with older interpreters from pulling the new release | 22:48 |
clarkb | I'm double checking that 945126 has deployed to the nodepool builders | 22:48 |
clarkb | ya /etc/nodepool/nodepool.yaml hasn't updated on nb07 since yesterday and much longer for nb04 (which isn't in inventory anymore and is in the emergency file) | 22:50 |
clarkb | anyway I think that lgtm. I'll plan to do server cleanups in the morning | 22:50 |
clarkb | note the change that removed version: from docker-compose for paste, tracing, grafana, and codesearch doesn't appear to have restarted any of those services. I don't think we need to but wanted to make note of that | 22:51 |
clarkb | the python3.12 updates will address that for paste at least | 22:51 |
clarkb | and we have sufficient CI I'm not worried that broke something. I also confirmed the warning about version went away when manually running the cron command on nb06 | 22:51 |
fungi | commit 09f16f60713cc6d00cf8226cd55ca3417e33051e (HEAD -> tag: 2.13.0, origin/master, origin/HEAD, gerrit/master, gerrit/HEAD) | 22:55 |
clarkb | looking | 22:55 |
clarkb | 09f16f60713cc6d00cf8226cd55ca3417e33051e is the commit I see and 2.13.0 appears to be the next tag version so that lgtm | 22:56 |
fungi | thanks! pushing... | 22:56 |
fungi | opendev-publish-unversioned-nox-docs has still got problems: https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4 | 23:02 |
fungi | supposed to be ref-agnostic, but doesn't seem to work in the tag-oriented pipelines yet | 23:03 |
clarkb | the task that failed only runs when zuul.tag is set | 23:03 |
clarkb | https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console#3/0/3/localhost | 23:03 |
fungi | yeah, that's presumably what still needs adjusting | 23:04 |
clarkb | that playbook is identical to the tox-docs publish playbook | 23:05 |
clarkb | er wait maybe it's a different part of the code | 23:05 |
clarkb | ok write-root-marker is an included role. So the publish playbook is the same and they both call that role which is where it ultimately fails | 23:06 |
fungi | the hope was that we could have one job that updates the docs after changes merge and also after tags appear, particularly so that published release notes don't show the latest release as still under development until the next change after the tag merges | 23:08 |
fungi | https://pypi.org/project/bindep/ shows the new version now | 23:08 |
fungi | #status log Released Bindep 2.13.0: https://docs.opendev.org/opendev/bindep/latest/releasenotes.html#relnotes-2-13-0 | 23:08 |
opendevstatus | fungi: finished logging | 23:08 |
fungi | (that url is "eventually consistent") | 23:08 |
fungi | seems it did actually publish? | 23:09 |
clarkb | https://pypi.org/project/bindep/#files seems so | 23:10 |
fungi | i mean the release notes published with the updated version | 23:10 |
clarkb | fungi: I think I see the problem and I think this affects tox too | 23:10 |
clarkb | post.yaml needs to run first but if you look at https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console#3/0/3/localhost its queued up behind the child job publish.yaml | 23:11 |
fungi | ah | 23:11 |
clarkb | also the playbooks/tox-jobs/post.yaml does an extra fetch compared to playbooks/nox-jobs/post.yaml | 23:11 |
clarkb | I'm not sure if that matters yet but I think the fundamental problem is the order of playbooks. | 23:11 |
clarkb | I think in opendev/base-jobs we need the child jobs to run the post.yaml and the publish.yaml to get the order right (then parent can run them again as things bubble up?) | 23:12 |
clarkb | actually the parent abstract base job is using the same playbooks so we can just drop the playbook from the child jobs | 23:12 |
clarkb | fetch-tox-output is just collecting tox logs so that shouldn't matter | 23:13 |
clarkb | one sec | 23:13 |
fungi | i expect that's something i screwed up in the existing implementation, this inconsistency has been a quixotic ocd quest of mine for years now | 23:13 |
clarkb | the two publishing jobs don't actually differ in any way except their names | 23:14 |
clarkb | oh wait the secrets are different and that includes publication dest info I think | 23:15 |
clarkb | so two jobs there is fine | 23:15 |
fungi | yeah, i think that was the reason | 23:16 |
fungi | dest was baked into the secret | 23:16 |
fungi | so needed a different secret to alter | 23:16 |
clarkb | fungi: looking at the console log we actually run publish.yaml and fail. Then we run post.yaml and publish.yaml and succeed | 23:17 |
clarkb | this is why things updated but the job still records it as a fail | 23:17 |
clarkb | anyway I'm writing a commit message explaining this and will push shortly | 23:18 |
fungi | hah, wow that's terrible | 23:18 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Remove post-run playbooks from child doc publishing jobs https://review.opendev.org/c/opendev/base-jobs/+/945174 | 23:19 |
clarkb | https://zuul.opendev.org/t/opendev/build/889e2f2e6883492eab63bf5376be7aa4/console illustrates this | 23:19 |
opendevreview | Merged opendev/base-jobs master: Remove post-run playbooks from child doc publishing jobs https://review.opendev.org/c/opendev/base-jobs/+/945174 | 23:38 |
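
The advice to priteau above (use the checkout Zuul prepares under /home/zuul/src rather than the /opt/git cache) could be sketched in a script like tools/tox_install.sh roughly as follows. This is only an illustration, not the actual blazar-nova script: the function name `find_repo_src` is hypothetical, and the idea of passing a developer's own clone as a second candidate is an assumption about the local workflow. As tonyb and clarkb note, tox siblings may make this unnecessary in CI once nova is a required project.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that holds a
# git checkout, and return non-zero when none of the candidates do.
# /opt/git is deliberately not a candidate, since it is only Zuul's cache.
find_repo_src() {
    for candidate in "$@"; do
        if [ -d "${candidate}/.git" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}
```

A caller might then run something like `nova_src=$(find_repo_src /home/zuul/src/opendev.org/openstack/nova "$HOME/src/nova") && pip install -e "$nova_src"`, where the second path stands in for wherever a local developer keeps their nova clone.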