Monday, 2025-07-14

[00:11] *** cloudnull1097746 is now known as cloudnull109774
[03:02] *** cloudnull1097742 is now known as cloudnull109774
[12:39] <frickler> infra-root: afs permission denied error in the release job https://zuul.opendev.org/t/openstack/build/efae434519884bf3b9710e29800ebcca
[12:48] <fungi> looks like that ran on ze09
[12:49] <fungi> and /afs/.openstack.org is currently mounted there
[12:51] <fungi> and looks like the same job succeeded a few minutes later for another project
[12:52] <fungi> ran the same task even
[12:53] <fungi> there were no package upgrades occurring that could have restarted the openafs client daemon at that time (last entries in dpkg.log were from friday)
[12:53] <fungi> nothing relevant in dmesg output either
[12:54] <fungi> the job got far enough to push packages to pypi, but not to the tarballs site
[12:57] <fungi> and packages were pushed to that same /afs/.openstack.org/project/tarballs.opendev.org/openstack/neutron-lib/ directory a week ago
[13:01] <fungi> i'm about out of potential causes to check
[13:02] <fungi> as for fixing things, i can probably fetch the files back from pypi, manually sign them with the release openpgp key, and stick all those files into afs myself, but i've got a full day of meetings so it's not going to happen straight away
[13:07] <Clark[m]> frickler we never built a wheel mirror for noble because pypi happily ignores it as long as packages are available otherwise
[13:09] <Clark[m]> fungi ze09 was our half upgraded server from the upgrade round before last. However it should've upgraded this last round so probably unrelated. Unless the latest upgrades booted up with non functional afs
[13:09] <Clark[m]> When you checked the mount did you check being able to retrieve content under the mount?
[13:11] <Clark[m]> But ya if the kernel log/dmesg don't have anything in them then maybe it isn't some non functioning afs issue and something more subtle. I don't have good ideas for debugging further
[13:13] <Clark[m]> Oh I have an idea. That path is within the executor container not the host. Maybe the container started before afs booted sufficiently and the bindmount is sad
[13:16] <Clark[m]> Maybe we bind mounted an empty /afs then the host mounts its afs stuff over that and the container side doesn't see the update?
[13:34] <frickler> Clark[m]: that theory sounds correct: https://paste.opendev.org/show/bnnLbr28rozylIb7JF4J/
[13:38] <fungi> yeah, that does indeed sound likely
[13:38] <fungi> maybe we need some safety to block starting the container until there is content in /afs/.openstack
[13:39] <fungi> but yes i did confirm from the host context it saw files under there
[13:45] <Clark[m]> Ya I think we need to check each of the executors and restart containers for each in this state. Then update the rolling reboots playbook to delay starting the containers
[13:45] <Clark[m]> I'm popping out for a bike ride before it gets hot but then can look at the playbook side if that is still useful
[13:45] <fungi> have a good time!
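
Editor's sketch of the safety check discussed above (blocking the container start until the AFS mount actually shows content). This is not the change that was later proposed to the rolling reboots playbook; the function name `wait_for_mount_content`, its parameters, and the default timings are hypothetical and chosen only for illustration. It assumes that "mounted" can be approximated by "the directory lists at least one entry", which mirrors the `ls -l /afs` check used later in this log.

```python
import os
import time


def wait_for_mount_content(path, timeout=300.0, interval=5.0):
    """Poll `path` until it lists at least one entry.

    Returns True as soon as the directory is non-empty, or False once
    `timeout` seconds have elapsed without seeing any content.
    Hypothetical helper: the name and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.listdir(path):
                return True
        except OSError:
            # A missing or not-yet-accessible path counts as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Example: refuse to start the container while /afs is still empty.
    if not wait_for_mount_content("/afs", timeout=60.0, interval=5.0):
        raise SystemExit("/afs has no content yet; not starting the container")
```

A check like this only helps when it runs on the host before the bind mount is created; running it inside an already started container would keep seeing the empty directory.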
[13:55] <fungi> wiki logins aren't sticking, going to reboot the server once more
[13:55] <fungi> system load average is only a little over 30 at the moment
[13:56] <fungi> i've seen it a lot higher in recent months
[14:04] <fungi> #status log Rebooted wiki.openstack.org again to get OpenID logins back to a working state
[14:05] <opendevstatus> fungi: finished logging
[14:22] <fungi> oof, now the wiki's load average is up to 106
[16:28] <clarkb> ok finally back after a bike ride and system updates and reboots
[16:29] <clarkb> doesn't look like the executor on ze09 has been restarted yet. I'm going to check if any others need restarts then figure out a plan
[16:35] <clarkb> ze09 is the only one in this state (on all of the others, ls -l /afs returned openstack.org inside the container)
[16:38] <clarkb> I'm going to request graceful shutdown of the container on ze09 now
[16:39] <clarkb> that's done. Now we wait
[16:56] <opendevreview> Clark Boylan proposed opendev/system-config master: Wait for AFS to mount when rebooting executors  https://review.opendev.org/c/opendev/system-config/+/954978
[16:57] <clarkb> that isn't easily testable so review carefully please :)
[17:00] <clarkb> corvus: I do wonder if that check should go into the start.yaml tasks file instead?
[17:00] <clarkb> corvus: not sure if you think that is generally useful to check or if it is reboot specific
[17:03] <corvus> clarkb: technically, yes, but i think this is better because it should never be a problem except on boot... and maybe if it is, maybe we want to start anyway...  but i don't feel strongly.
[17:07] <clarkb> ack I'll leave it as is then
[17:20] <clarkb> down to 1 job on ze09
[17:44] <clarkb> #status log Restarted zuul executor on ze09 to force its bind mount for /afs to see the actual AFS filesystem
[17:44] <opendevstatus> clarkb: finished logging
[17:45] <clarkb> frickler: ^ fyi this is done. I think we should be clear for releases to proceed again
[17:45] <clarkb> I also simply restarted the existing container so that the executor stays in sync with the rest of the install version wise
[18:13] <fungi> thanks clarkb!
[18:13] <fungi> i'll get to work on the cleanup for the failed release shortly
[18:41] <corvus> clarkb: do recall correctly that the only nodepool label at this point is bionic-arm64, and you have changes to remove all the jobs that use it, but those changes have not merged?
[18:41] <corvus> s/do recall/do i recall/
[18:48] *** dhill is now known as Guest21942
[18:55] <clarkb> corvus: that sounds right? I'm not sure if the changes remove all use of them but gmaan took them over and made them more complete. Let me pull up the latest state
[18:56] <clarkb> corvus: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954761 and child are the "top level" removals and they have a bunch of depends on to clean up the leaves
[18:56] <clarkb> looks like ~6 leaf changes are still open
[18:56] <gmaan> only one dep left to merge, which again depends on a few fixes https://review.opendev.org/c/openstack/neutron/+/954786
[18:56] <clarkb> gmaan: ya and then all of the unmaintained openstack/requirements changes are still -1
[18:57] <clarkb> because tempest full py3 is failing on them
[18:57] <gmaan> ah that one, right.
[18:58] <clarkb> we could force merge the requirements updates since it's largely mechanical zuul updates, no real code changes
[18:58] <clarkb> fungi: ^ not sure what you think about that
[19:16] <fungi> i'm okay with the idea, but from an openstack community perspective i would prefer openstack/requirements eol those branches if they're not being maintained
[19:16] <fungi> elodilles: ^ what's your take?
[19:17] <fungi> well, not being taken care of and kept working with their testing anyway, i should avoid the term "maintain" with unmaintained branches
[19:33] <slittle1_> Can't seem to upload a merge commit to ssh://review.opendev.org:29418/starlingx/docs.git as I can to other starlingx projects
[19:34] <slittle1_> To ssh://review.opendev.org:29418/starlingx/docs.git
[19:34] <slittle1_>  ! [remote rejected]   HEAD -> refs/for/f/trixie%topic=merge-master-as-of-20250711T041000Z (commit 0b8bd1e: you are not allowed to upload merges)
[19:34] <slittle1_> error: failed to push some refs to 'ssh://review.opendev.org:29418/starlingx/docs.git'
[19:35] <slittle1_> Its config in project-config seems normal
[19:45] <fungi> slittle1_: what's a repository you've successfully pushed merge commits into? i don't see any explicit merge commit permissions in https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/docs.config
[19:46] <slittle1_> https://review.opendev.org/c/starlingx/vault-armada-app/+/955034
[19:46] <fungi> that one has this permission which the docs repo lacks? https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/vault-armada-app.config#L1-L2
[19:47] <fungi> slittle1_: if you want to propose that addition i'll approve it right away
[19:48] <slittle1_> will do
[19:53] <opendevreview> Scott Little proposed openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs  https://review.opendev.org/c/openstack/project-config/+/955035
[19:53] <slittle1_> https://review.opendev.org/c/openstack/project-config/+/955035
[19:59] <fungi> yep, approved
[20:00] <fungi> once it merges and deploy pipeline jobs complete, try to push again
[20:00] <clarkb> note the manage-projects job has been slow recently due to the ai crawlers so it may be a minute
[20:01] <clarkb> though load looks reasonable right now so maybe it will be quick
[20:01] <opendevreview> Merged opendev/zuul-providers master: Move ubuntu-focal-arm64 nodeset  https://review.opendev.org/c/opendev/zuul-providers/+/954778
[20:06] <opendevreview> Merged openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs  https://review.opendev.org/c/openstack/project-config/+/955035
[20:15] <clarkb> it was quick, nice. slittle1_ should be ready for testing now
[21:39] <clarkb> if we land https://review.opendev.org/c/opendev/system-config/+/954978 by Friday I'll be back by then and can try to monitor the restarts over the weekend
[21:40] <clarkb> but also now is a good time to remind everyone I'm out tomorrow through thursday. Back Friday. With the goal of doing those Gerrit updates Friday or Monday depending on how Friday is going
[21:40] <clarkb> I did recheck my gerrit quay.io image hosting change last week and it still passes so I think that is good to go
[21:41] <clarkb> looks like the CLA removal change also passes testing so I think we're in good shape for Friday
[21:41] <fungi> i'll remind myself to run the meeting tomorrow
[21:42] <fungi> were you going to send out the agenda or should i?
[21:43] <clarkb> I can send it out if that helps but happy for you to do it as well
[21:45] <clarkb> I think the edits I would make are updates to niz status (capture trixie and centos 10 stuff here and drop the specific entries), make note of the planned gerrit updates, and add the matrix spec to the agenda
[21:45] <clarkb> I'll go ahead and start on those edits
[21:54] <clarkb> those edits are in. If there are no other comments for edits in the next half hour to hour or so I'll send it out
[21:54] <clarkb> fungi: ^ fyi I can do that
[21:58] <clarkb> this is interesting. Local dns stopped returning records to me. I think it is because cloudflare is having a sad
[22:04] <clarkb> also both 1.0.0.1 and 1.1.1.1 appear to be non working. I guess a situation where their failure domains are not properly separated
[22:17] <clarkb> debugging this I notice that docs.openstack.org has a 5 minute ttl. Not sure if we want to change that
[22:29] <fungi> i didn't have any updates for the agenda. feel free to send it out or i can do it if you'd prefer
[22:44] <clarkb> ya I'll send it out here in a few minutes
[22:45] <fungi> thanks!
[22:52] <clarkb> sent!
[23:14] <corvus> clarkb: i think we're ready to remove the bionic label and start letting that error out for anyone still using it.  can you confirm or object to that?  https://review.opendev.org/954759  is the change that would start doing that.  that will help me progress niz while you're out.
[23:15] <corvus> (that's the main question i have for the group at tomorrow's meeting; so asking for your input ahead of time)
[23:19] <clarkb> corvus: I think my main concern with that is if we think it will make landing the change to clean it up more difficult
[23:20] <clarkb> currently things are split into two changes and I think we would have to squash them if that change lands
[23:20] <clarkb> which I guess is fine so I'd be ok with that
[23:22] <corvus> ack thanks!

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!