corvus | it passed the second time through | 00:07 |
opendevreview | Merged opendev/system-config master: Fix system-config-run for Ansible 9 vs 11 https://review.opendev.org/c/opendev/system-config/+/957811 | 01:30 |
opendevreview | Merged opendev/system-config master: Add rax flex iad3 to clouds.yaml and cloud management https://review.opendev.org/c/opendev/system-config/+/957810 | 01:30 |
*** mrunge_ is now known as mrunge | 12:12 | |
*** dhill is now known as Guest24465 | 12:33 | |
frickler | what's the status for gitea-lb02.opendev.org? looks like the DNS records are gone but the server is still running? (just asking because of the mail errors that this setup produces) | 12:47 |
Clark[m] | frickler I thought I had cleaned up the server but maybe in the rush to replace things I hadn't | 14:20 |
Clark[m] | Oh we were keeping the server up as an example for vexxhost to debug | 14:21 |
Clark[m] | That is why it is still up. We can shut down the server to avoid it generating email and boot it later if that aids debugging | 14:21 |
*** dhill is now known as Guest24474 | 14:25 | |
opendevreview | Jeremy Stanley proposed opendev/bindep master: Drop setup.cfg https://review.opendev.org/c/opendev/bindep/+/943976 | 14:44 |
clarkb | looks like the zuul reboots have managed to get stuck waiting for a paused ze06 overnight | 14:52 |
priteau | Is the openstack tenant overloaded right now? I see lots of jobs in queued state | 14:53 |
clarkb | I don't see anything in the zuul queues older than ~4 hours so not sure why the executor isn't shutting down | 14:53 |
clarkb | priteau: yes, it is very likely we have more demand than quota. We had to disable a number of regions recently due to cloud errors too so our quota is lower than normal | 14:54 |
clarkb | the cloud is aware and looking into it. So hopefully that gets resolved soonish | 14:54 |
priteau | thanks | 14:54 |
fungi | priteau: the node requests chart at https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-6h&to=now&timezone=utc indicates that's the case | 14:54 |
*** darmach47 is now known as darmach4 | 14:58 | |
clarkb | zuul launcher log file sizes look much more reasonable this morning | 14:58 |
clarkb | ze06's log file says it is waiting for one job to finish. I don't know what job that is yet. I have a meeting in a minute too so will have to dig in after | 14:59 |
fungi | looks like we've peaked around 370 nodes in use at best today | 14:59 |
clarkb | corvus: on ze06 /var/lib/zuul/builds shows 5aedd3b812174f7c9f8b29c78e69c3e3 as the only build entry. That uuid shows up in executor logs from days ago. I suspect this may be some sort of bug holding things up? I don't want to shut down ze06 manually until you have a chance to look at it. But I suspect that may be a workaround we employ | 15:11 |
corvus | looking | 15:13 |
clarkb | also feel free to take any debugging and/or shutdown steps you like. Just keep in mind the playbook is running on bridge so as soon as the paused executor stops things will update on the server and it will be rebooted | 15:14 |
corvus | clarkb: ack. i did a sigusr2 and it's waiting for a paused build which apparently no longer exists. seems likely to be a bug somehow. https://zuul.opendev.org/t/openstack/build/5aedd3b812174f7c9f8b29c78e69c3e3 | 15:16 |
corvus | i think further debugging can be done just with the scheduler and executor logs. i don't think we need any more state from the executor, so i will hard-stop it now. | 15:16 |
clarkb | ack thanks for checking | 15:17 |
clarkb | the playbook is proceeding now. | 15:20 |
clarkb | #status log Shut down the old gitea-lb02 server to prevent it from sending cron spam without DNS records. Server is not yet deleted as it may be useful for debugging ipv6 cloud behaviors. | 16:09 |
clarkb | frickler: ^ fyi | 16:10 |
opendevstatus | clarkb: finished logging | 16:10 |
frickler | ty | 16:18 |
opendevreview | Clark Boylan proposed opendev/system-config master: Drop Bionic testing for OpenDev services, playbooks and roles https://review.opendev.org/c/opendev/system-config/+/957950 | 16:21 |
clarkb | infra-root ^ it occurred to me that I should do an actual audit of whether or not we have bionic nodes and as far as I can tell we do not | 16:21 |
clarkb | that means we can fully rip out our testing of bionic things in system-config and remove the ansible version override | 16:21 |
clarkb | more than happy for people to double check me on that as part of their review | 16:21 |
fungi | we'll likely need to drop py37 testing from bindep, git-review, pbr, and so on | 16:25 |
clarkb | ya I wonder what that means for python2.7 testing? Do we have 2.7 on focal or jammy? | 16:29 |
clarkb | but yes those are likely additional cleanups. | 16:29 |
cloudnull | clarkb fungi following up from last week - are workloads still disabled on OpenStack Public Cloud (legacy)? Anything I can help with on that front? | 16:29 |
clarkb | cloudnull: I believe they are still turned off. Should we turn them back on and see if things are happier now? | 16:29 |
cloudnull | Yeah. I think it should be good now. | 16:33 |
cloudnull | If you run into issue let me know. We want to make sure we unblock additional jobs/quota. | 16:33 |
clarkb | cloudnull: thanks I'll get a change or two up momentarily to turn things back on | 16:34 |
cloudnull | Cool cool. | 16:36 |
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Reenable rax DFW and ORD providers https://review.opendev.org/c/opendev/zuul-providers/+/957955 | 16:36 |
opendevreview | Clark Boylan proposed opendev/zuul-providers master: Reenable rax IAD https://review.opendev.org/c/opendev/zuul-providers/+/957957 | 16:38 |
clarkb | infra-root ^ I figure we can reenable dfw and ord first and if that looks good proceed with iad | 16:38 |
cloudnull | ++ | 16:39 |
fungi | clarkb: looks like infra-prod-run-cloud-launcher failed on the iad3 addition, not sure if you saw yet | 16:40 |
clarkb | fungi: I hadn't | 16:40 |
fungi | https://zuul.opendev.org/t/openstack/build/df8689f85f574beb998d1f17c977d045 | 16:40 |
clarkb | The error was: keystoneauth1.exceptions.http.Unauthorized: The request you have made requires authentication. | 16:41 |
clarkb | I can't remember, do we still need to do something special to sync up the new cloud regions in rax flex with our auth tokens? | 16:42 |
fungi | that may be iad3 not having our account cached yet | 16:42 |
fungi | i never got a clear answer on how that works, but cloudnull may know | 16:42 |
fungi | when we added dfw3 the api only started working after i manually logged into skyline once | 16:43 |
clarkb | ya so maybe we need to do that here? | 16:44 |
clarkb | it's worth a shot. I need to get myself ready for the infra meeting later today before my three-hour block of meetings begins so won't get to that immediately | 16:44 |
fungi | on the topic of bionic nodes and ansible 11, https://review.opendev.org/c/openstack/pbr/+/942594 shows a pbr change now returning RETRY_LIMIT results for openstack-tox-py27, openstack-tox-py36 and openstack-tox-py37 | 16:44 |
clarkb | fungi: ya so 36 and 37 are basically gone. Then 27 depends on whether or not focal has 27 packages | 16:45 |
clarkb | https://packages.ubuntu.com/jammy/python2.7 jammy has 2.7 supposedly | 16:45 |
*** dhill is now known as Guest24484 | 16:48 | |
cloudnull | clarkb the simplest way to get the project set up on the environment is to log in to the webui; once you do that you’ll be able to hit the API. | 16:55 |
clarkb | cloudnull: thanks for confirming. I should be able to do that sometime today if no one beats me to it | 16:56 |
clarkb | (more than happy for someone else to beat me to it) | 16:56 |
cloudnull | Once someone logs in, let me know. I’ll go tweak the quotas to push them up. | 16:57 |
clarkb | will do | 16:59 |
fungi | cloudnull: not urgent, but from a future simplification perspective is there any api call we can make to do that from an automated system so a user doesn't need to dig out the credentials and load them into a browser? if so, we can just improve our automation to perform the additional step | 17:01 |
fungi | if not, no biggie, we just have to remember to do that as a separate step any time we bring up a new region, which isn't all that often | 17:01 |
fungi | but happens infrequently enough that we'll probably always forget and then have to remember why it doesn't work initially | 17:02 |
*** sfinucan is now known as stephenfin | 17:02 | |
keekz | i am also hitting `RETRY_LIMIT`: https://review.opendev.org/c/openstack/skyline-apiserver/+/957743 is that an infra issue? it's been happening for a couple reviews of mine since yesterday | 17:06 |
fungi | keekz: are those jobs using a really old platform, like ubuntu-bionic nodes maybe? | 17:09 |
fungi | or an old python version (<3.8) that needs an older test platform? | 17:10 |
frickler | "ANSIBLE PARSE ERROR" | 17:10 |
frickler | on ubuntu-noble | 17:10 |
keekz | :shrugs: i've never contributed to this project before.. things have been getting merged in fairly regularly it looks like though | 17:10 |
frickler | https://zuul.opendev.org/t/openstack/build/bb6ffbbe24f444e7af2b7d2316b1f406 | 17:10 |
frickler | so likely some new ansible 11 issue | 17:11 |
clarkb | "ERROR! Vars in a Play must be specified as a dictionary." it says | 17:13 |
clarkb | https://opendev.org/openstack/skyline-apiserver/src/branch/master/playbooks/devstack/pre.yaml#L8-L9 looks like an easy fix | 17:14 |
clarkb | keekz: I think you just need to convert that list of vars to a dictionary of vars. Essentially just remove the '- ' prefix from those two lines | 17:15 |
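(A minimal sketch of the shape of the fix clarkb describes, with placeholder variable names rather than the actual ones in skyline-apiserver's pre.yaml: Ansible 11 no longer accepts play-level vars written as a YAML list of single-key mappings.)

```yaml
# Rejected by Ansible 11 ("Vars in a Play must be specified as a dictionary."),
# though Ansible 9 still tolerated it: vars as a list of single-key mappings.
# Variable names below are placeholders, not the real ones in pre.yaml.
- hosts: all
  vars:
    - example_var_one: foo
    - example_var_two: bar

# The fix: drop the "- " prefixes so vars is a plain dictionary.
- hosts: all
  vars:
    example_var_one: foo
    example_var_two: bar
```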
fungi | okay, so newer ansible>9 behavior change | 17:15 |
clarkb | ya and I kinda wonder if that means ansible 9 was running things in a way that was unexpected | 17:15 |
clarkb | but seems like an easy rollforward situation | 17:15 |
keekz | cool, appreciate it, i'll give that a try. i didn't notice that ansible error message tucked away in there, just that it seemingly had gotten stuck on swift upload at the end of the log | 17:17 |
clarkb | we have a road here called skyline that I ride my bike up semi regularly. Searching my browser history for skyline to see if I have the url for the rax dashboard in there just turns up a list of git repo links and bike routes | 17:40 |
clarkb | https://docs.rackspace.com/docs/accessing-rackspace-flex-cloud has the links I need | 17:40 |
clarkb | cloudnull: fungi: I just tried using the keystone credentials for the two accounts (what is in our clouds.yaml) and that failed. I'm guessing I need to use rackspace federation the first time as that is what syncs the keystone creds for both skyline and the APIs? | 17:44 |
clarkb | ya using rackspace federation appears to have worked for the first account. Working on the second next | 17:46 |
clarkb | cloudnull: fungi: ok I logged into both of our accounts using rackspace federation method in iad3's skyline instance. One thing to note is I had to use an incognito tab for the second one as "sign out" leaves enough of a cookie trail that on next login it redirects you straight back into the account and doesn't ask for your new username and password | 17:50 |
fungi | probably a skyline bug/feature request | 17:50 |
clarkb | I am able to do image listings using openstack client with both accounts in that region now as well so I suspect this has addressed the problem | 17:53 |
clarkb | and for the record this is where I logged in: https://skyline.api.iad3.rackspacecloud.com/auth/login | 17:53 |
fungi | awesome, thanks! | 17:55 |
*** dhill is now known as Guest24491 | 17:59 | |
opendevreview | Merged opendev/zuul-providers master: Reenable rax DFW and ORD providers https://review.opendev.org/c/opendev/zuul-providers/+/957955 | 19:35 |
fungi | i didn't approve the second one; waiting until we see the upshot of ^ | 19:35 |
clarkb | ++ | 19:36 |
fungi | related, looks like we're basically caught up on the node request backlog finally | 19:55 |
corvus | keep in mind that a lot of the time the backlog is only arm64 requests | 19:55 |
fungi | yeah | 19:55 |
clarkb | we are up to ze11 on the upgrade and reboot process | 21:01 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Delete review02 DNS records https://review.opendev.org/c/opendev/zone-opendev.org/+/957981 | 21:12 |
clarkb | infra-root ^ I double checked that review02 has been deleted from the cloud (it was; it's not there anymore) so this DNS cleanup should be safe. It is also not in our inventory anymore so I will clean up the emergency file now as well | 21:13 |
clarkb | gitea-lb02 was also in the emergency file and I have removed it too | 21:14 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Update commented out www.opendev.org record https://review.opendev.org/c/opendev/zone-opendev.org/+/957982 | 21:18 |
clarkb | and that is a super minor non production update for sanity purposes | 21:18 |
opendevreview | Clif Houck proposed openstack/diskimage-builder master: Add a sha256sum check for CentOS Cloud Images https://review.opendev.org/c/openstack/diskimage-builder/+/957983 | 21:20 |
opendevreview | Clark Boylan proposed openstack/diskimage-builder master: Drop python36 testing https://review.opendev.org/c/openstack/diskimage-builder/+/957985 | 21:47 |
clarkb | more ^ ansible 11 python36 bionic fallout | 21:47 |
clarkb | I think that centos 9 stream may have published bad qcow2 image(s) | 21:56 |
clarkb | 957983 adds sha256sum image hash verification. Well the sha256sum file for the image we use is 0 bytes. Then in my change that drops python36 we fail to build centos 9 stream images because qemu-img's conversion from qcow2 to raw fails on a read error | 21:57 |
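(Not the actual diskimage-builder change in 957983, but a rough Ansible-flavored sketch of the kind of verification being discussed, with placeholder URLs and paths: fetch the upstream checksum file, fail fast if it is empty like the one observed here, and only then download and verify the qcow2.)

```yaml
# Hypothetical sketch only; URLs, paths, and filenames are placeholders.
- hosts: localhost
  tasks:
    - name: Fetch the upstream SHA256SUM file for the cloud image
      ansible.builtin.get_url:
        url: https://example.org/centos-stream-9/images/SHA256SUM
        dest: /tmp/SHA256SUM

    - name: Check the size of the checksum file
      ansible.builtin.stat:
        path: /tmp/SHA256SUM
      register: sums

    - name: Fail fast if the checksum file is empty (the 0 byte case seen here)
      ansible.builtin.assert:
        that: sums.stat.size > 0
        fail_msg: upstream SHA256SUM file is empty

    - name: Download the qcow2 and verify it against the checksum file
      ansible.builtin.get_url:
        url: https://example.org/centos-stream-9/images/CentOS-Stream-9.qcow2
        dest: /tmp/CentOS-Stream-9.qcow2
        checksum: sha256:https://example.org/centos-stream-9/images/SHA256SUM
```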
clarkb | spotz isn't in here, but would maybe know who to ping about that? | 21:57 |
fungi | oh too fun | 22:01 |
JayF | clarkb: we literally do that in an ironic CI job; rebuild an image with few changes just to get it using 4k block sizes | 22:02 |
JayF | clarkb: so that's why those image issues break even our CI that uses prebuilt ramdisks :( | 22:03 |
clarkb | you could use centos-minimal probably | 22:03 |
clarkb | problems like this are exactly why opendev doesn't use the prebuilt images as a starting point | 22:03 |
JayF | that's all Julia's stuff, they did a bunch of DIB 4k image support stuff | 22:03 |
JayF | I am not fully familiar enough with it to know if that's possible | 22:04 |
clarkb | they change too often in weird undocumented ways without much path for input. | 22:04 |
JayF | stuff I do starts with *-minimal | 22:04 |
clarkb | we've made it to ze12. This might finish before the end of the day | 22:05 |
fungi | yay! | 22:05 |
opendevreview | Merged opendev/zuul-providers master: Switch to zuul-jobs upload-image-swift https://review.opendev.org/c/opendev/zuul-providers/+/951018 | 23:20 |
corvus | wow that finally happened :) | 23:25 |