*** cloudnull1097746 is now known as cloudnull109774 | 00:11 | |
*** cloudnull1097742 is now known as cloudnull109774 | 03:02 | |
frickler | infra-root: afs permission denied error in the release job https://zuul.opendev.org/t/openstack/build/efae434519884bf3b9710e29800ebcca | 12:39 |
fungi | looks like that ran on ze09 | 12:48 |
fungi | and /afs/.openstack.org is currently mounted there | 12:49 |
fungi | and it looks like the same job succeeded a few minutes later for another project | 12:51 |
fungi | ran the same task even | 12:52 |
fungi | there were no package upgrades occurring that could have restarted the openafs client daemon at that time (last entries in dpkg.log were from friday) | 12:53 |
fungi | nothing relevant in dmesg output either | 12:53 |
fungi | the job got far enough to push packages to pypi, but not to the tarballs site | 12:54 |
fungi | and packages were pushed to that same /afs/.openstack.org/project/tarballs.opendev.org/openstack/neutron-lib/ directory a week ago | 12:57 |
fungi | i'm about out of potential causes to check | 13:01 |
fungi | as for fixing things, i can probably fetch the files back from pypi, manually sign them with the release openpgp key, and stick all those files into afs myself, but i've got a full day of meetings so it's not going to happen straight away | 13:02 |
Clark[m] | frickler we never built a wheel mirror for noble because pypi happily ignores it as long as packages are available otherwise | 13:07 |
Clark[m] | fungi: ze09 was our half-upgraded server from the upgrade round before last. However it should've been upgraded this last round, so probably unrelated. Unless the latest upgrades booted up with non-functional afs | 13:09 |
Clark[m] | When you checked the mount did you check being able to retrieve content under the mount? | 13:09 |
Clark[m] | But ya, if the kernel log/dmesg don't have anything in them then maybe it isn't some non-functioning afs issue and something more subtle. I don't have good ideas for debugging further | 13:11 |
Clark[m] | Oh I have an idea. That path is within the executor container not the host. Maybe the container started before afs booted sufficiently and the bindmount is sad | 13:13 |
Clark[m] | Maybe we bind mounted an empty /afs, then the host mounts its afs stuff over that and the container side doesn't see the update? | 13:16 |
frickler | Clark[m]: that theory sounds correct: https://paste.opendev.org/show/bnnLbr28rozylIb7JF4J/ | 13:34 |
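[Editor's note: the paste above is not reproduced in this log. A check along the lines frickler ran might look something like the sketch below, comparing the host's view of the AFS mount with the view inside the executor's bind mount. The container name `zuul-executor` and the use of the docker CLI are assumptions about the deployment, not taken from the log.]

```shell
#!/bin/sh
# Succeed only if the directory exists and contains at least one entry.
afs_dir_nonempty() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Host-side view of the AFS mount.
if afs_dir_nonempty /afs/.openstack.org; then
  echo "host sees AFS content"
else
  echo "host sees empty /afs"
fi

# Container-side view via the bind mount (container name is hypothetical).
if docker exec zuul-executor ls /afs/.openstack.org >/dev/null 2>&1; then
  echo "container sees AFS content"
else
  echo "container sees empty/broken /afs"
fi
```

If the host reports content but the container reports an empty directory, the bind mount was created before the host's AFS mount appeared, matching the theory above.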
fungi | yeah, that does indeed sound likely | 13:38 |
fungi | maybe we need some safety to block starting the container until there is content in /afs/.openstack | 13:38 |
fungi | but yes i did confirm from the host context it saw files under there | 13:39 |
Clark[m] | Ya I think we need to check each of the executors and restart containers for each in this state. Then update the rolling reboots playbook to delay starting the containers | 13:45 |
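[Editor's note: a minimal sketch of the startup delay Clark proposes, as a host-side shell helper. The path and timeout values are illustrative; the real implementation is the system-config change reviewed later in this log.]

```shell
#!/bin/sh
# Block until the given directory is non-empty or the timeout (seconds)
# expires. Returns 0 once content is visible, 1 on timeout.
wait_for_afs() {
  dir="$1"
  timeout="${2:-300}"
  waited=0
  while [ -z "$(ls -A "$dir" 2>/dev/null)" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $dir" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage (container name hypothetical):
#   wait_for_afs /afs/.openstack.org 300 && docker start zuul-executor
```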
Clark[m] | I'm popping out for a bike ride before it gets hot but then can look at the playbook side if that is still useful | 13:45 |
fungi | have a good time! | 13:45 |
fungi | wiki logins aren't sticking, going to reboot the server once more | 13:55 |
fungi | system load average is only a little over 30 at the moment | 13:55 |
fungi | i've seen it a lot higher in recent months | 13:56 |
fungi | #status log Rebooted wiki.openstack.org again to get OpenID logins back to a working state | 14:04 |
opendevstatus | fungi: finished logging | 14:05 |
fungi | oof, now the wiki's load average is up to 106 | 14:22 |
clarkb | ok finally back after a bike ride and system updates and reboots | 16:28 |
clarkb | doesn't look like the executor on ze09 has been restarted yet. I'm going to check if any others need restarts then figure out a plan | 16:29 |
clarkb | ze09 is the only one in this state (on all of the others, ls -l /afs inside the container returned openstack.org) | 16:35 |
clarkb | I'm going to request graceful shutdown of the container on ze09 now | 16:38 |
clarkb | that's done. Now we wait | 16:39 |
opendevreview | Clark Boylan proposed opendev/system-config master: Wait for AFS to mount when rebooting executors https://review.opendev.org/c/opendev/system-config/+/954978 | 16:56 |
clarkb | that isn't easily testable so review carefully please :) | 16:57 |
clarkb | corvus: I do wonder if that check should go into the start.yaml tasks file instead? | 17:00 |
clarkb | corvus: not sure if you think that is generally useful to check or if it is reboot specific | 17:00 |
corvus | clarkb: technically, yes, but i think this is better because it should never be a problem except on boot... and if it is, maybe we want to start anyway... but i don't feel strongly. | 17:03 |
clarkb | ack I'll leave it as is then | 17:07 |
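[Editor's note: conceptually, a playbook task like the following could gate executor startup on AFS content being visible. This is a sketch using stock Ansible modules with illustrative retry values, not the text of the change under review.]

```yaml
- name: Wait for AFS content before starting the executor container
  ansible.builtin.command: ls /afs/.openstack.org
  register: afs_listing
  until: afs_listing.stdout != ""
  retries: 30
  delay: 10
  changed_when: false
```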
clarkb | down to 1 job on ze09 | 17:20 |
clarkb | #status log Restarted zuul executor on ze09 to force its bind mount for /afs to see the actual AFS filesystem | 17:44 |
opendevstatus | clarkb: finished logging | 17:44 |
clarkb | frickler: ^ fyi this is done. I think we should be clear for releases to proceed again | 17:45 |
clarkb | I also simply restarted the existing container so that that executor stays in sync version-wise with the rest of the install | 17:45 |
fungi | thanks clarkb! | 18:13 |
fungi | i'll get to work on the cleanup for the failed release shortly | 18:13 |
corvus | clarkb: do i recall correctly that the only nodepool label at this point is bionic-arm64, and you have changes to remove all the jobs that use it, but those changes have not merged? | 18:41 |
*** dhill is now known as Guest21942 | 18:48 | |
clarkb | corvus: that sounds right? I'm not sure if the changes remove all use of them, but gmaan took them over and made them more complete. Let me pull up the latest state | 18:55 |
clarkb | corvus: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/954761 and child are the "top level" removals and they have a bunch of depends on to clean up the leaves | 18:56 |
clarkb | looks like ~6 leaf changes are still open | 18:56 |
gmaan | only one dep left to merge, which again depends on a few fixes https://review.opendev.org/c/openstack/neutron/+/954786 | 18:56 |
clarkb | gmaan: ya and then all of the unmaintained openstack/requirements changes are still -1 | 18:56 |
clarkb | because tempest full py3 is failing on them | 18:57 |
gmaan | ah that one, right. | 18:57 |
clarkb | we could force merge the requirements updates since it's largely mechanical zuul updates, no real code changes | 18:58 |
clarkb | fungi: ^ not sure what you think about that | 18:58 |
fungi | i'm okay with the idea, but from an openstack community perspective i would prefer openstack/requirements eol those branches if they're not being maintained | 19:16 |
fungi | elodilles: ^ what's your take? | 19:16 |
fungi | well, not being taken care of and kept working with their testing anyway, i should avoid the term "maintain" with unmaintained branches | 19:17 |
slittle1_ | Can't seem to upload a merge commit to ssh://review.opendev.org:29418/starlingx/docs.git as I can to other starlingx projects | 19:33 |
slittle1_ | To ssh://review.opendev.org:29418/starlingx/docs.git | 19:34 |
slittle1_ | ! [remote rejected] HEAD -> refs/for/f/trixie%topic=merge-master-as-of-20250711T041000Z (commit 0b8bd1e: you are not allowed to upload merges) | 19:34 |
slittle1_ | error: failed to push some refs to 'ssh://review.opendev.org:29418/starlingx/docs.git' | 19:34 |
slittle1_ | Its config in project-config seems normal | 19:35 |
fungi | slittle1_: what's a repository you've successfully pushed merge commits into? i don't see any explicit merge commit permissions in https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/docs.config | 19:45 |
slittle1_ | https://review.opendev.org/c/starlingx/vault-armada-app/+/955034 | 19:46 |
fungi | that one has this permission which the docs repo lacks? https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/vault-armada-app.config#L1-L2 | 19:46 |
fungi | slittle1_: if you want to propose that addition i'll approve it right away | 19:47 |
slittle1_ | will do | 19:48 |
opendevreview | Scott Little proposed openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs https://review.opendev.org/c/openstack/project-config/+/955035 | 19:53 |
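[Editor's note: in Gerrit project-config ACL syntax, the addition proposed above presumably mirrors the vault-armada-app permission fungi linked, roughly as below. The group name is taken from the change subject; the exact ref pattern is an assumption.]

```ini
[access "refs/for/*"]
pushMerge = group starlingx-release
```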
slittle1_ | https://review.opendev.org/c/openstack/project-config/+/955035 | 19:53 |
fungi | yep, approved | 19:59 |
fungi | once it merges and deploy pipeline jobs complete, try to push again | 20:00 |
clarkb | note the manage-projects job has been slow recently due to the ai crawlers so it may be a minute | 20:00 |
clarkb | though load looks reasonable right now so maybe it will be quick | 20:01 |
opendevreview | Merged opendev/zuul-providers master: Move ubuntu-focal-arm64 nodeset https://review.opendev.org/c/opendev/zuul-providers/+/954778 | 20:01 |
opendevreview | Merged openstack/project-config master: add ability for group starlingx-release to do merges on starlingx/docs https://review.opendev.org/c/openstack/project-config/+/955035 | 20:06 |
clarkb | it was quick, nice. slittle1_: should be ready for testing now | 20:15 |
clarkb | if we land https://review.opendev.org/c/opendev/system-config/+/954978 by Friday I'll be back by then and can try to monitor the restarts over the weekend | 21:39 |
clarkb | but also now is a good time to remind everyone I'm out tomorrow through thursday. Back Friday. With the goal of doing those Gerrit updates Friday or Monday depending on how Friday is going | 21:40 |
clarkb | I did recheck my gerrit quay.io image hosting change last week and it still passes so I think that is good to go | 21:40 |
clarkb | looks like the CLA removal change also passes testing so I think we're in good shape for Friday | 21:41 |
fungi | i'll remind myself to run the meeting tomorrow | 21:41 |
fungi | were you going to send out the agenda or should i? | 21:42 |
clarkb | I can send it out if that helps but happy for you to do it as well | 21:43 |
clarkb | I think the edits I would make are updates to niz status (capture trixie and centos 10 stuff here and drop the specific entries), make note of the planned gerrit updates, and add the matrix spec to the agenda | 21:45 |
clarkb | I'll go ahead and start on those edits | 21:45 |
clarkb | those edits are in. If there are no other comments for edits in the next half hour to hour or so I'll send it out | 21:54 |
clarkb | fungi: ^ fyi I can do that | 21:54 |
clarkb | this is interesting. Local dns stopped returning records to me. I think it is because cloudflare is having a sad | 21:58 |
clarkb | also both 1.0.0.1 and 1.1.1.1 appear to be non-working. I guess a situation where their failure domains are not properly separated | 22:04 |
clarkb | debugging this I notice that docs.openstack.org has a 5 minute ttl. Not sure if we want to change that | 22:17 |
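[Editor's note: one way to see a record's TTL is the second field of dig's answer section, e.g. `dig +noall +answer docs.openstack.org A`. A sketch of extracting it; the sample answer line below uses illustrative values, not a real lookup.]

```shell
#!/bin/sh
# Print the TTL (second field) of the first answer record on stdin.
ttl_of_first_answer() {
  awk '{print $2; exit}'
}

# In practice:
#   dig +noall +answer docs.openstack.org A | ttl_of_first_answer
# Sample answer line (illustrative values):
echo "docs.openstack.org.    300    IN    A    203.0.113.10" | ttl_of_first_answer
# → 300
```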
fungi | i didn't have any updates for the agenda. feel free to send it out or i can do it if you'd prefer | 22:29 |
clarkb | ya I'll send it out here in a few minutes | 22:44 |
fungi | thanks! | 22:45 |
clarkb | sent! | 22:52 |
corvus | clarkb: i think we're ready to remove the bionic label and start letting that error out for anyone still using it. can you confirm or object to that? https://review.opendev.org/954759 is the change that would start doing that. that will help me progress niz while you're out. | 23:14 |
corvus | (that's the main question i have for the group at tomorrow's meeting; so asking for your input ahead of time) | 23:15 |
clarkb | corvus: I think my main concern with that is if we think it will make landing the change to clean it up more difficult | 23:19 |
clarkb | currently things are split into two changes and I think we would have to squash them if that change lands | 23:20 |
clarkb | which I guess is fine so I'd be ok with that | 23:20 |
corvus | ack thanks! | 23:22 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!