clarkb | corvus: the upgrades and reboots have gotten as far as zl01. We issued a graceful stop, which the process recorded in its debug log. I think we're waiting now for all the uploads to finish maybe? I guess that's fine for rolling upgrades but it is different from the hard stop/start we have been doing. I wonder if we should update the playbook to match or not | 00:26 |
clarkb | looks like it took approximately 15 minutes to gracefully stop. That isn't too bad, so maybe this is fine. It does mean the launcher falls off the components list because, unlike the executors, it isn't "paused" during that period of time | 00:27 |
clarkb | anyway I think this is working, it just didn't look how I expected for a moment so I dug in | 00:28 |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/957995 | 02:25 |
opendevreview | Merged openstack/project-config master: kolla: Allow kolla-core to remove RP votes https://review.opendev.org/c/openstack/project-config/+/949842 | 04:19 |
opendevreview | Merged openstack/project-config master: Remove nodepool configuration/elements https://review.opendev.org/c/openstack/project-config/+/956184 | 04:22 |
opendevreview | Merged openstack/project-config master: Prepare for retirement of RefStack repositories https://review.opendev.org/c/openstack/project-config/+/947859 | 04:24 |
opendevreview | Merged openstack/project-config master: Stop syncing run_tests/Vagrantfiles for OSA https://review.opendev.org/c/openstack/project-config/+/956944 | 04:24 |
ykarel | Hi is this known? ERROR! couldn't resolve module/action 'openvswitch_bridge'. This often indicates a misspelling, missing collection, or incorrect module path. | 04:47 |
ykarel | or was a timing thing during ansible 11 switch? | 04:47 |
ykarel | seen in today's periodic run | 04:47 |
ykarel | https://c4e65dc90d54a7cb0d09-c58207963db0f03dec19154799b50d2d.ssl.cf5.rackcdn.com/openstack/41fccd9ecb204b53bc4e82d4e6cd9dec/job-output.txt | 04:47 |
ykarel | https://6ab93f4ca7b96e79e883-fb8b0f0ff152f556a5802daf1433e080.ssl.cf5.rackcdn.com/openstack/492f0763d7b54cb388374107cc79cf62/job-output.txt | 04:48 |
ykarel | or maybe we need to adapt things in https://codesearch.opendev.org/?q=Ensure%20the%20infra%20bridge%20exists&i=nope&literal=nope&files=&excludeFiles=&repos= | 04:49 |
frickler | ykarel: IIUC that module was dropped from what gets installed with ansible 11; a short term workaround might be to let those jobs run with ansible 9 | 04:59 |
opendevreview | Takashi Kajinami proposed opendev/system-config master: Add OpenVox to mirror https://review.opendev.org/c/opendev/system-config/+/957299 | 05:00 |
ykarel | frickler, ack i was going to copy the module like https://opendev.org/zuul/zuul-jobs/raw/branch/master/roles/multi-node-bridge/library/openvswitch_bridge.py | 05:00 |
frickler | ah, yes. maybe there is a way we can make it usable from other playbooks without duplication, though. | 05:05 |
ykarel | https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/958008 | 05:09 |
ykarel | yes, looks like by setting ANSIBLE_LIBRARY it can be reused; any way to set it? | 05:10 |
frickler | sorry, I don't know too much about these ansible-in-zuul details, maybe clarkb or corvus have an idea | 05:14 |
ykarel | ok thx | 05:33 |
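Outside of Zuul (where the executor controls the Ansible environment and this isn't directly settable), a local reproduction of the ANSIBLE_LIBRARY idea might look like the sketch below; the directory layout and playbook name are hypothetical. Note also that a library/ directory inside a role is picked up automatically for that role's own tasks, which is how the zuul-jobs vendoring works.

```bash
# hypothetical paths: a vendored copy of openvswitch_bridge.py in the repo under test
export ANSIBLE_LIBRARY=$PWD/roles/multi-node-setup/library
# plays run from here can now resolve the openvswitch_bridge module again
ansible-playbook -i inventory.yaml playbooks/multinode.yaml
```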
*** clarkb is now known as Guest24535 | 11:02 | |
*** dhill is now known as Guest24539 | 12:35 | |
fungi | ykarel: yeah, ansible dropped the ovs module, so clarkb vendored it into the multi-node-bridge role with https://review.opendev.org/c/zuul/zuul-jobs/+/957188 | 12:54 |
fungi | you could also just do something similar | 12:54 |
fungi | though i agree finding a way to not have two copies floating around would be nice | 12:55 |
fungi | longer term it would probably make more sense to replace it entirely in multi-node-bridge with a similar setup just using linux's bridge driver | 12:55 |
ykarel | fungi, for the limited use I'm going with https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/958008/2/roles/multi-node-setup/tasks/main.yaml | 12:56 |
frickler | infra-root: looks like we again have a kolla-ansible change stuck in check for > 24h, https://review.opendev.org/935704 PS20. not sure if that's related to the ongoing restarts? I'll leave it in place for now in case someone wants to dig further | 13:02 |
*** dhill is now known as Guest24544 | 13:36 | |
Clark[m] | frickler: I wouldn't expect that to be related to the restarts as the launchers updated closer to 14 hours ago. I think if you hover over the build status bar in the UI you get the request ID, which you can grep for in the launcher and scheduler logs to see where it may be stuck | 13:46 |
Clark[m] | We probably want to check on rax ord and dfw as well since reenabling those may have caused them to error again? | 13:47 |
opendevreview | Clif Houck proposed openstack/diskimage-builder master: Add a sha256sum check for CentOS Cloud Images https://review.opendev.org/c/openstack/diskimage-builder/+/957983 | 13:52 |
opendevreview | Clif Houck proposed openstack/diskimage-builder master: Add a sha256sum check for CentOS Cloud Images https://review.opendev.org/c/openstack/diskimage-builder/+/957983 | 13:53 |
frickler | Clark[m]: if neither you nor corvus want to check deeper, I'd just abandon and restore that change in order to rerun jobs | 13:54 |
Clark[m] | I can take a look but it will be a bit. I'm trying to get out for an early morning bike ride before the heat of the day but can look when I get back | 13:56 |
fungi | looks like it has two builds that are waiting on specific nodeset requests | 13:57 |
fungi | we could check what provider those are for | 13:57 |
fungi | though it's been in the queue since long before we reenabled the rax classic regions, and these don't look like retries | 13:58 |
corvus | nodeset requests are a56f042da27b4cfda9af080eea029ac7 and ea449d9a4f744590a1e0af1b7c3bc625, for posterity | 14:08 |
corvus | given that they each have the requisite nodes ready and assigned to the request, i think that's very likely a launcher bug | 14:10 |
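For anyone retracing this later, the grep Clark[m] described might look roughly like the sketch below; the debug log locations are assumptions rather than the actual paths on the launcher and scheduler hosts.

```bash
# nodeset request id taken from the messages above
REQ=a56f042da27b4cfda9af080eea029ac7
# follow the request through the launcher and scheduler debug logs
grep "$REQ" /var/log/zuul/launcher-debug.log
grep "$REQ" /var/log/zuul/scheduler-debug.log
```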
fungi | i'll be in and out a bit today. taking a break from storm prep to go grab lunch, then when i get back i'll split my time between last-minute yardwork and server upgrades | 14:59 |
cloudnull | Clark[m] fungi can we get you all to shut down jobs on the rackspace legacy cloud environments? We're seeing more than 300k api requests hammering the environment this morning. | 15:07 |
cloudnull | Maybe there's a runaway process? Bad return from the legacy api? | 15:07 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Revert "Reenable rax DFW and ORD providers" https://review.opendev.org/c/opendev/zuul-providers/+/958094 | 15:16 |
corvus | 2025-08-20 15:13:51,274 ERROR zuul.Launcher: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://dfw.servers.api.rackspacecloud.com/v2/637776/servers/d92503ac-b6c8-4d5d-a54c-f6c0a4717271: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) | 15:16 |
corvus | cloudnull ^ i see a lot of those errors | 15:16 |
corvus | 6422 of those errors since utc midnight | 15:18 |
corvus | 31294 regular api calls | 15:20 |
corvus | cloudnull: opendev's zuul issues 37716 api calls to both regions of rax legacy since utc midnight, 6422 of them failed with connection errors | 15:20 |
opendevreview | Merged opendev/zuul-providers master: Revert "Reenable rax DFW and ORD providers" https://review.opendev.org/c/opendev/zuul-providers/+/958094 | 15:21 |
cloudnull | We'll give it a look once things calm down a bit. | 15:26 |
cloudnull | Do you happen to have an api breakdown of calls made across DFW and ORD since it was reenabled? Maybe this is just an issue with DFW? | 15:27 |
corvus | cloudnull: yes, we have a record of every call. one sec. | 15:32 |
corvus | dfw: 19009 successful calls, 6422 failures; ord: 11271 successful calls, 0 failures | 15:35 |
corvus | cloudnull: ^ so yeah, looks like dfw is the only one we're seeing errors for | 15:35 |
corvus | frickler: i think i have what i need from that change; it's never going to resolve on its own. i will dequeue it. bugfix to come later. | 15:40 |
cloudnull | corvus, could I bother you to re-enable ord and iad? | 15:42 |
cloudnull | We think there’s an issue specifically with DFW and I’d like to prove that and help with resources | 15:43 |
corvus | cloudnull: we hadn't enabled iad yet, just because its failure mode was different (slow booting nodes) so we wanted to monitor that separately... can i do ord for now and wait for Clark or fungi to be around to add iad back? | 15:44 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Re-enable rax-ord https://review.opendev.org/c/opendev/zuul-providers/+/958099 | 15:45 |
opendevreview | Merged opendev/zuul-providers master: Re-enable rax-ord https://review.opendev.org/c/opendev/zuul-providers/+/958099 | 15:46 |
cloudnull | Thank you corvus | 15:46 |
frickler | corvus: thx, rechecked the change and will check later whether things progress better now. those stuck requests weren't related to the dfw issues, right? | 15:50 |
corvus | don't think so; it's an error in the ready node reuse code | 15:56 |
fungi | okay, back at the keyboard for a bit to see what i missed | 16:15 |
fungi | and yeah, the stuck jobs had node assignments pending for hours before we tried to bring rax classic back online yesterday, so should have been entirely unrelated | 16:17 |
Guest24535 | I am around now and can monitor reenabling rax-iad if cloudnull is still ok with that: https://review.opendev.org/c/opendev/zuul-providers/+/957957 is the change for that | 16:24 |
Guest24535 | oh heh I've guestified | 16:24 |
Guest24535 | one moment please | 16:24 |
*** Guest24535 is now known as clarkb | 16:25 | |
fungi | welcome back Guest24535! ;) | 16:25 |
clarkb | thanks | 16:26 |
fungi | i'll approve that change, i'm semi-around again now | 16:26 |
clarkb | thanks I'm properly around at this point | 16:27 |
opendevreview | Merged opendev/zuul-providers master: Reenable rax IAD https://review.opendev.org/c/opendev/zuul-providers/+/957957 | 16:27 |
clarkb | then also in my backlog is https://review.opendev.org/c/opendev/zone-opendev.org/+/957981 and child | 16:27 |
fungi | looking | 16:29 |
fungi | approved both, though on the second what do you think about just adding a www vhost to static and doing the redirect there since it already hosts a ton of them? | 16:30 |
fungi | alternatively we can probably point it to the lb as long as we add a redirect in the apache configs for the backends | 16:31 |
opendevreview | Merged opendev/zone-opendev.org master: Delete review02 DNS records https://review.opendev.org/c/opendev/zone-opendev.org/+/957981 | 16:32 |
clarkb | fungi: ya I didn't think too far ahead on that one. I just didn't want us to accidentally enable the record and have it point to a nonexistent location | 16:32 |
clarkb | but now that apache is doing all the initial connections on the giteas I think we could handle it there | 16:32 |
opendevreview | Merged opendev/zone-opendev.org master: Update commented out www.opendev.org record https://review.opendev.org/c/opendev/zone-opendev.org/+/957982 | 16:33 |
fungi | oh, good point, it was a less clean set of options in the past when we were doing haproxy->gitea instead of haproxy->apache->gitea | 16:34 |
clarkb | I've shutdown the screen that was running the zuul reboots as those completed successfully | 16:38 |
frickler | if someone gets bored, there's a new big yaml reformatting change, I didn't check yet what happened to trigger this https://review.opendev.org/c/openstack/project-config/+/957995 | 16:39 |
clarkb | corvus: we're largely caught up as of yesterday evening. I think the setup_hook change landed just a bit too late to get deployed though so we aren't running that in prod yet (but we did direct checking of it so I'm not too worried) | 16:39 |
clarkb | fungi: frickler: do you recall what the process was for setting the MTU to 1500 on the rax flex network? that is up next for me if the ssh keys and routers and networks etc all got created properly overnight. Then we can boot a mirror | 16:42 |
clarkb | based on https://grafana.opendev.org/d/fd44466e7f/zuul-launcher3a-rackspace?orgId=1&from=now-1h&to=now&timezone=utc&var-region=$__all rax classic iad seems to be working. we have ready and in use nodes implying that boot timeouts are not a major issue | 16:43 |
fungi | `openstack --os-cloud=opendevzuul-rax-flex --os-region-name=SJC3 network set --mtu=1500 opendevzuul-network1` | 16:44 |
fungi | that's from my shell history on bridge | 16:44 |
frickler | clarkb: from what I remember we needed to set that on the tenant network? | 16:44 |
frickler | like that, yes | 16:44 |
clarkb | perfect thanks. run_cloud_launcher.log seems to indicate success so I'll set that in IAD3 momentarily | 16:44 |
frickler | maybe check the current value first | 16:45 |
clarkb | frickler: ++ | 16:45 |
frickler | would be interesting to see whether rax fixed their deployments | 16:45 |
fungi | well, they already fixed them at least once to no longer be <1500; they merely made them much larger | 16:46 |
clarkb | it's 3942 currently | 16:46 |
fungi | but yeah, i dunno if they re-lowered them to 1500 | 16:46 |
fungi | guess not, that's the setpoint i recall | 16:46 |
frickler | ok, so bad configuration for a public cloud IMO, but up to them. | 16:46 |
fungi | not entirely broken at least, but yes likely to lead to pmtud negotiations and/or fragmentation/reassembly | 16:47 |
clarkb | ok I set the mtu to 1500 on that network in both accounts | 16:48 |
fungi | and possible unreachability to/from some places impacted by pmtud black holes, though hopefully those parts of the internet are vanishingly rare these days | 16:48 |
fungi | i still have ptsd from dealing with customer sites where their "security" people had been convinced that icmp was dangerous so they just blocked all of it at their edge | 16:49 |
clarkb | there is an Ubuntu 24.04 image in this cloud region. Do you think we should upload our own noble image or just use theirs? (we uploaded our own image into most other clouds since it wasn't otherwise available) | 16:49 |
fungi | the whole "ping of death" scare did more lasting damage to the internet in misplaced security filters than the actual packets ever could | 16:50 |
clarkb | I'm somewhat inclined to just use the cloud provided image | 16:50 |
clarkb | our config management should coerce it to look basically identical to whatever we would upload I think | 16:50 |
fungi | i think i deleted the noble-server-cloudimg-amd64.img from my homedir that i had uploaded to the other regions | 16:52 |
clarkb | ya we would probably just download the ubuntu published image and re-upload it, which is likely the same result as what that cloud image is anyway | 16:52 |
fungi | `wget https://cloud-images.ubuntu.com/noble/current/{noble-server-cloudimg-amd64.img,SHA256SUMS{,.gpg}}` is how i pulled it, fwiw | 16:52 |
fungi | followed by `gpg --verify SHA256SUMS.gpg SHA256SUMS && sha256sum -c --ignore-missing SHA256SUMS` | 16:53 |
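For completeness, the upload into the other regions was presumably something along these lines once the checksum verified; the cloud, region, and image names below are placeholders, not the exact values used:

```bash
# upload the verified qcow2 to glance (names are illustrative)
openstack --os-cloud=openstackci-rax-flex --os-region-name=IAD3 \
    image create "Ubuntu 24.04 LTS (Noble Numbat)" \
    --file noble-server-cloudimg-amd64.img \
    --disk-format qcow2 --container-format bare
```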
keekz | cloudnull corvus: dfw should be better now | 16:57 |
fungi | keekz: thanks for the followup! | 16:58 |
clarkb | did someone else want to push a change up to enable that region? I'm working on a new mirror in iad3 | 17:01 |
fungi | gimme a sec i can push a revert | 17:01 |
opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Reapply "Reenable rax DFW and ORD providers" https://review.opendev.org/c/opendev/zuul-providers/+/958104 | 17:03 |
fungi | though i guess the subject is now misleading since part of it was reverted already | 17:03 |
fungi | happy to amend with revised commit message if anyone cares | 17:03 |
clarkb | ya maybe amend it just to avoid confusion if we have problems in the future. It will be clearer that ord was ok | 17:05 |
fungi | can do | 17:13 |
opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Reapply "Reenable rax DFW provider" https://review.opendev.org/c/opendev/zuul-providers/+/958104 | 17:14 |
fungi | better? | 17:14 |
clarkb | approved | 17:16 |
opendevreview | Merged opendev/zuul-providers master: Reapply "Reenable rax DFW provider" https://review.opendev.org/c/opendev/zuul-providers/+/958104 | 17:17 |
frickler | fwiw I'd be fine with using the cloud ubuntu image | 17:19 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Add mirror01.iad3.raxflex to our DNS zone https://review.opendev.org/c/opendev/zone-opendev.org/+/958106 | 17:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add mirror01.iad3.raxflex to our inventory https://review.opendev.org/c/opendev/system-config/+/958107 | 17:26 |
clarkb | frickler: ya that is what I ended up doing | 17:26 |
clarkb | I think the main reason we didn't use cloud images previously was that they were simply not available, so I took advantage of the image being available here to simplify things. It seems to have worked fine and you can ssh in to the IP address in ^ and check it out first too | 17:26 |
opendevreview | Merged opendev/zone-opendev.org master: Add mirror01.iad3.raxflex to our DNS zone https://review.opendev.org/c/opendev/zone-opendev.org/+/958106 | 17:30 |
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Add raxflex iad3 region to zuul's resource pools https://review.opendev.org/c/opendev/zuul-providers/+/958109 | 17:33 |
clarkb | cloudnull: ^ I noticed that we're still set to a 10 instance limit and unlimited floating IPs in iad3. I've hardcoded us to the 10 instance limit as a conservative starting point but can update that to whatever we set the floating ip limit to later | 17:34 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Use Jammy for our Kerberos servers https://review.opendev.org/c/opendev/system-config/+/958112 | 17:47 |
fungi | infra-root: ^ i couldn't find anything similar for the afs db or file servers, am i blind or do we not do test deployments of those? | 17:47 |
clarkb | fungi: in zuul.d/system-config-roles.yaml we have tests for the openafs role. I think that may only be the client side though | 17:49 |
fungi | yeah, nothing that deploys test servers on specific platforms | 17:50 |
clarkb | I don't see any job for that seems to run the service-afs playbook | 17:50 |
fungi | okay, so good enough | 17:50 |
clarkb | you could add a job that does ^ but I'm wondering if part of the reason for that is that it isn't fully automated? | 17:50 |
clarkb | oh I wonder if some of the problem is with the domain and authentication and all that | 17:50 |
clarkb | since it's a global filesystem it probably isn't trivial to spin up something working without making it a different domain? | 17:51 |
fungi | yeah, i assumed it was complexities of actually having a working subtree in global afs | 17:52 |
clarkb | did anyone else want to review the mirror01.iad3.raxflex server addition? I'll probably approve it in ~10 minutes if there is no -1 between now and then | 18:21 |
fungi | yeah, i just wanted to make sure it was in dns before approving | 18:26 |
corvus | lgtm | 18:27 |
fungi | since my jammy change for the kerberos servers is passing i'm going to stick them and all the afs servers into the emergency disable list now | 18:27 |
clarkb | fungi: I think we have a documented process for kerberos server outages fwiw | 18:28 |
fungi | afs servers too | 18:34 |
fungi | i'm pulling them all up | 18:34 |
fungi | https://docs.opendev.org/opendev/system-config/latest/kerberos.html#no-service-outage-server-maintenance and https://docs.opendev.org/opendev/system-config/latest/afs.html#no-outage-server-maintenance for the record | 18:42 |
fungi | i've placed the following servers temporarily in the emergency disable list on bridge in order to start working through upgrades over the rest of the week: afs01.dfw.openstack.org, afs01.ord.openstack.org, afs02.dfw.openstack.org, afsdb01.openstack.org, afsdb02.openstack.org, afsdb03.openstack.org, kdc03.openstack.org, kdc04.openstack.org | 18:48 |
fungi | these are the only afs and kerberos servers i found in our inventory | 18:48 |
fungi | per the no-outage docs i'm upgrading kdc04 first since it's the inactive replica | 18:50 |
fungi | packages for focal are already up to date, but it apparently needs a clean reboot before i can run do-release-upgrade to jammy. i expect this will be common across the entirety of the set | 18:52 |
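The per-host sequence being described amounts to roughly the sketch below, assuming a root shell on a host that is already in the emergency disable list:

```bash
# make sure focal is fully patched, then reboot onto the current kernel
apt-get update && apt-get -y dist-upgrade
reboot
# once the host comes back up, start the release upgrade to jammy;
# this is interactive and (as noted below) opens a fallback sshd on 1022/tcp
do-release-upgrade
```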
fungi | it starts an extra sshd on 1022/tcp, for reference | 18:58 |
Clark[m] | Our iptables likely blocks that fwiw. I'm on matrix now due to lunch | 18:59 |
fungi | "Sorry, this storage driver is not supported in kernels for newer releases. There will not be any further Ubuntu releases that provide kernel support for the aufs storage driver. Please ensure that none of your containers are using the aufs storage driver, remove the directory /var/lib/docker/aufs and try again." | 19:00 |
fungi | i don't think we rely on it? | 19:00 |
Clark[m] | That should be fine. I didn't even think we run docker on the kdcs | 19:00 |
fungi | yeah, don't think we do | 19:01 |
Clark[m] | Are there any containers? If not then nothing should use aufs | 19:01 |
Clark[m] | And I would expect new containers to have stopped using aufs at some point | 19:02 |
fungi | Command 'docker' not found | 19:02 |
fungi | survey says "no" | 19:02 |
opendevreview | Merged opendev/system-config master: Add mirror01.iad3.raxflex to our inventory https://review.opendev.org/c/opendev/system-config/+/958107 | 19:03 |
Clark[m] | That change will run all the jobs but you've got the hosts in the emergency file so shouldn't matter | 19:03 |
fungi | right | 19:03 |
fungi | i did a `rm /var/lib/docker/aufs` and retried do-release-upgrade, seems maybe happier now | 19:04 |
Clark[m] | Ack | 19:04 |
fungi | er, rm -rf because it was a directory | 19:05 |
Clark[m] | I suspect it was empty too and just auto created by something for some reason at some point :) | 19:05 |
fungi | i concur | 19:06 |
fungi | so far i've only told it to keep our sshd and sudoers config changes, anything else i expect ansible can (re-)correct | 19:28 |
clarkb | https://mirror.iad3.raxflex.opendev.org/ubuntu/ has content now. I think it should be ok to land https://review.opendev.org/c/opendev/zuul-providers/+/958109 as a result. But I know that cloudnull wants to adjust quotas there. Not sure if we want to wait for that to happen before we try to use the 10 instance quota | 19:39 |
fungi | lgtm, yep | 19:51 |
fungi | kdc04 is up and running on jammy now. i'll do the switchover steps in our docs next | 19:57 |
fungi | Database propagation to kdc04.openstack.org: SUCCEEDED | 19:59 |
corvus | clarkb: did you verify the flavor names? (they are different in the other 2 flex regions) | 20:06 |
corvus | (i mean to say, dfw3 and sjc3 are different from each other, so i wonder if iad3 should be different still, or is the same as sjc3) | 20:07 |
fungi | oh! good memory | 20:07 |
fungi | granted, that should have become readily apparent when booting the new mirror instance | 20:08 |
clarkb | corvus: I did. sjc3 and iad3 have matching flavors | 20:08 |
fungi | wonder why dfw3 is the odd one out | 20:09 |
clarkb | actually iad3 is a subset. But the three flavors we use are in the subset | 20:09 |
corvus | clarkb: cool, lgtm then... | 20:09 |
corvus | the zuul-providers config for iad3 == sjc3 | 20:10 |
corvus | 4 flavors there in your change | 20:10 |
clarkb | ya I had to check for booting the mirror as well so made sure everything lined up | 20:10 |
clarkb | the fourth is a duplicate; we just alias the nested virt one to the 8gb flavor iirc. But yes I checked they are in there | 20:10 |
fungi | did i miss the zuul-providers addition? | 20:10 |
clarkb | gp.0.4.4 gp.0.4.8 and gp.0.4.16 show up | 20:11 |
clarkb | fungi https://review.opendev.org/c/opendev/zuul-providers/+/958109 its this change | 20:11 |
corvus | ah yes, 4 of our flavors, 3 of theirs. i thought you meant that iad3 was a subset of sjc3 (but it's not) | 20:13 |
corvus | clarkb: i think from our pov, it's okay to start with the 10. | 20:13 |
clarkb | corvus: oh sorry I meant the flavors on the cloud side of iad3 are a subset of the flavors in sjc3 | 20:17 |
clarkb | but those that do exist overlap | 20:17 |
clarkb | corvus: in that case I think we can probably land the change and see how it goes? | 20:17 |
clarkb | then after the quota is adjusted we can change that value | 20:17 |
fungi | lgtm, thanks! | 20:24 |
opendevreview | Merged opendev/zuul-providers master: Add raxflex iad3 region to zuul's resource pools https://review.opendev.org/c/opendev/zuul-providers/+/958109 | 20:24 |
corvus | clarkb: oh that's interesting about the cloud flavors. gtk. all caught up now. :) | 20:24 |
fungi | re-ran /usr/local/bin/run-kprop.sh on kdc03 after upgrade to jammy, all done there now | 20:40 |
fungi | starting on afsdb01 with our no-downtime maintenance instructions | 20:43 |
fungi | Instance ptserver, temporarily disabled, has core file, currently shutdown. Instance vlserver, temporarily disabled, currently shutdown. | 20:44 |
clarkb | I think for the fileservers we have to transition the primary volume away from the host being updated. That might be a bit more painful unless things are already distributed in a way that just works for all but one | 20:48 |
clarkb | but this already seems like good progress! | 20:48 |
fungi | yeah | 20:49 |
fungi | already reading ahead to those | 20:49 |
fungi | but we have 3 db servers to get through first | 20:50 |
fungi | and then maybe repeat from jammy to noble | 20:50 |
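The file server volume moves Clark mentioned above would presumably be done with vos move; a sketch with an illustrative volume name, assuming vicepa partitions on both servers and localauth on the server running the command:

```bash
# move a read-write volume off afs01.dfw before taking it down
# (volume name and partitions are illustrative, not a real plan)
vos move -id mirror.ubuntu \
    -fromserver afs01.dfw.openstack.org -frompartition a \
    -toserver afs02.dfw.openstack.org -topartition a \
    -localauth
```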
clarkb | one thing that just occurred to me is you may want to remove the ansible fact cache files for those hosts before you remove them from the emergency file | 20:54 |
clarkb | that way ansible rereads all the facts as they are now rather than potentially relying on old fact info | 20:54 |
fungi | any idea where that is these days? | 20:58 |
fungi | but good call, yep | 20:58 |
clarkb | /var/cache/ansible/facts on bridge I think | 20:59 |
fungi | k, thx | 20:59 |
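Concretely, that would be something like the following on bridge for each host before it comes back out of the emergency file; kdc04 here is just the first host from this batch:

```bash
# drop the cached facts so the next ansible run regathers them
rm /var/cache/ansible/facts/kdc04.openstack.org
```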
fungi | Instance ptserver, has core file, currently running normally. Instance vlserver, currently running normally. | 21:12 |
fungi | i'll move on to afsdb02 and 03 | 21:13 |
cloudnull | clarkb: I can go get those quotas updated now. | 21:20 |
clarkb | cloudnull: ack I don't think we're in a hurry. But we have everything configured on our side to take advantage of them once they're set | 21:21 |
clarkb | fungi: are you having to do a preparatory reboot on all of these before beginning the update process? | 21:24 |
clarkb | and did any other server complain about aufs? | 21:24 |
fungi | clarkb: yeah, all of them want a reboot before running, and i've just preemptively been removing that directory | 21:26 |
clarkb | ack thanks | 21:27 |
fungi | basically if there's been a kernel update applied since the last reboot they want to be rebooted first, so that's every last one really | 21:27 |
fungi | because we reboot them infrequently | 21:27 |
clarkb | makes sense | 21:28 |
fungi | hopefully afsdb02 will be done soon and i can move on to 03 | 21:28 |
fungi | finally finished the yardwork so i can focus on this a little more intently | 21:28 |
fungi | though i'll probably grab a shower while afsdb03 is upgrading | 21:29 |
fungi | otherwise christine will complain | 21:29 |
clarkb | once you get to a good pausing point you can always pick it back up again in the morning | 21:33 |
clarkb | I have an appointment tomorrow morning but am around otherwise | 21:35 |
clarkb | I'm off to get my annual eyeball scan | 21:36 |
fungi | yeah, other than meetings i've got nothing pressing tomorrow | 21:36 |
fungi | assuming my cables don't get sucked up by a hurricane (unlikely) | 21:37 |
clarkb | omnomnom | 21:37 |
fungi | copper ramen | 21:38 |
fungi | working on 03 now | 21:46 |
fungi | okay, afsdb03.openstack.org is now upgraded to jammy. that just leaves the file servers, which i'll pick back up with in my morning | 23:04 |
clarkb | and we're leaving all of the hosts in the emergency file for now? (I think that is fine just double checking) | 23:07 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!