Tuesday, 2025-08-19

corvus: it passed the second time through  [00:07]
opendevreview: Merged opendev/system-config master: Fix system-config-run for Ansible 9 vs 11  https://review.opendev.org/c/opendev/system-config/+/957811  [01:30]
opendevreview: Merged opendev/system-config master: Add rax flex iad3 to clouds.yaml and cloud management  https://review.opendev.org/c/opendev/system-config/+/957810  [01:30]
*** mrunge_ is now known as mrunge  [12:12]
*** dhill is now known as Guest24465  [12:33]
frickler: what's the status for gitea-lb02.opendev.org? looks like the DNS records are gone but the server is still running? (just asking because of the mail errors that this setup produces)  [12:47]
Clark[m]: frickler: I thought I had cleaned up the server but maybe in the rush to replace things I hadn't  [14:20]
Clark[m]: Oh we were keeping the server up as an example for vexxhost to debug  [14:21]
Clark[m]: That is why it is still up. We can shut down the server to avoid it generating email and boot it later if that aids debugging  [14:21]
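If that shutdown is done through the cloud API rather than on the host itself, a minimal sketch of the step being described (the clouds.yaml entry name `vexxhost` and the server name `gitea-lb02` are stand-ins here, not confirmed values):

    # Stop the instance so it stops generating cron mail; it can be
    # booted again later if it is still wanted for debugging.
    # (cloud and server names are assumptions)
    openstack --os-cloud vexxhost server stop gitea-lb02

    # Bring it back when debugging resumes.
    openstack --os-cloud vexxhost server start gitea-lb02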
*** dhill is now known as Guest24474  [14:25]
opendevreview: Jeremy Stanley proposed opendev/bindep master: Drop setup.cfg  https://review.opendev.org/c/opendev/bindep/+/943976  [14:44]
clarkb: looks like the zuul reboots have managed to get stuck waiting for a paused ze06 overnight  [14:52]
priteau: Is the openstack tenant overloaded right now? I see lots of jobs in queued state  [14:53]
clarkb: I don't see anything in the zuul queues older than ~4 hours so not sure why the executor isn't shutting down  [14:53]
clarkb: priteau: yes, it is very likely we have more demand than quota. We had to disable a number of regions recently due to cloud errors too, so our quota is lower than normal  [14:54]
clarkb: the cloud is aware and looking into it. So hopefully that gets resolved soonish  [14:54]
priteau: thanks  [14:54]
fungi: priteau: the node requests chart at https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-6h&to=now&timezone=utc indicates that's the case  [14:54]
*** darmach47 is now known as darmach4  [14:58]
clarkb: zuul launcher log file sizes look much more reasonable this morning  [14:58]
clarkb: ze06's log file says it is waiting for one job to finish. I don't know what job that is yet. I have a meeting in a minute too so will have to dig in after  [14:59]
fungi: looks like we've peaked around 370 nodes in use at best today  [14:59]
clarkb: corvus: on ze06 /var/lib/zuul/builds shows 5aedd3b812174f7c9f8b29c78e69c3e3 as the only build entry. That uuid shows up in executor logs from days ago. I suspect this may be some sort of bug holding things up? I don't want to shut down ze06 manually until you have a chance to look at it. But I suspect that may be a workaround we employ  [15:11]
corvus: looking  [15:13]
clarkb: also feel free to take any debugging and/or shutdown steps you like. Just keep in mind the playbook is running on bridge, so as soon as the paused executor stops, things will update on the server and it will be rebooted  [15:14]
corvus: clarkb: ack. i did a sigusr2 and it's waiting for a paused build which apparently no longer exists. seems likely to be a bug somehow. https://zuul.opendev.org/t/openstack/build/5aedd3b812174f7c9f8b29c78e69c3e3  [15:16]
corvus: i think further debugging can be done just with the scheduler and executor logs. i don't think we need any more state from the executor, so i will hard-stop it now.  [15:16]
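For context, Zuul components dump a stack trace for every running thread to their debug log when they receive SIGUSR2, which is what the inspection above relies on. A minimal sketch of that step (the container name and log path are assumptions about this deployment):

    # Deliver SIGUSR2 to the executor's main process via Docker; Zuul
    # responds by writing per-thread stack traces to its debug log.
    # (container name "zuul-executor" is an assumption)
    docker kill --signal=USR2 zuul-executor

    # Then read the dump to see what the executor is waiting on.
    # (log path is an assumption)
    less /var/log/zuul/executor-debug.log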
clarkb: ack thanks for checking  [15:17]
clarkb: the playbook is proceeding now.  [15:20]
clarkb: #status log Shut down the old gitea-lb02 server to prevent it from sending cron spam without DNS records. Server is not yet deleted as it may be useful for debugging ipv6 cloud behaviors.  [16:09]
clarkb: frickler: ^ fyi  [16:10]
opendevstatus: clarkb: finished logging  [16:10]
frickler: ty  [16:18]
opendevreview: Clark Boylan proposed opendev/system-config master: Drop Bionic testing for OpenDev services, playbooks and roles  https://review.opendev.org/c/opendev/system-config/+/957950  [16:21]
clarkb: infra-root ^ it occurred to me that I should do an actual audit of whether or not we have bionic nodes and as far as I can tell we do not  [16:21]
clarkb: that means we can fully rip out our testing of bionic things in system-config and remove the ansible version override  [16:21]
clarkb: more than happy for people to double check me on that as part of their review  [16:21]
fungi: we'll likely need to drop py37 testing from bindep, git-review, pbr, and so on  [16:25]
clarkb: ya I wonder what that means for python2.7 testing? Do we have 2.7 on focal or jammy?  [16:29]
clarkb: but yes those are likely additional cleanups.  [16:29]
cloudnull: clarkb fungi following up from last week - are workloads still no longer operating on OpenStack Public Cloud (legacy)? Anything I can help with on that front?  [16:29]
clarkb: cloudnull: I believe they are still turned off. Should we turn them back on and see if things are happier now?  [16:29]
cloudnull: Yeah. I think it should be good now.  [16:33]
cloudnull: If you run into issues let me know. We want to make sure we unblock additional jobs/quota.  [16:33]
clarkb: cloudnull: thanks, I'll get a change or two up momentarily to turn things back on  [16:34]
cloudnull: Cool cool.  [16:36]
opendevreview: Clark Boylan proposed opendev/zuul-providers master: Reenable rax DFW and ORD providers  https://review.opendev.org/c/opendev/zuul-providers/+/957955  [16:36]
opendevreview: Clark Boylan proposed opendev/zuul-providers master: Reenable rax IAD  https://review.opendev.org/c/opendev/zuul-providers/+/957957  [16:38]
clarkb: infra-root ^ I figure we can reenable dfw and ord first and if that looks good proceed with iad  [16:38]
cloudnull: ++  [16:39]
fungi: clarkb: looks like infra-prod-run-cloud-launcher failed on the iad3 addition, not sure if you saw yet  [16:40]
clarkb: fungi: I hadn't  [16:40]
fungi: https://zuul.opendev.org/t/openstack/build/df8689f85f574beb998d1f17c977d045  [16:40]
clarkb: The error was: keystoneauth1.exceptions.http.Unauthorized: The request you have made requires authentication.  [16:41]
clarkb: I can't remember, do we still need to do something special to sync up the new cloud regions in rax flex with our auth tokens?  [16:42]
fungi: that may be iad3 not having our account cached yet  [16:42]
fungi: i never got a clear answer on how that works, but cloudnull may know  [16:42]
fungi: when we added dfw3 the api only started working after i manually logged into skyline once  [16:43]
clarkb: ya so maybe we need to do that here?  [16:44]
clarkb: it's worth a shot. I need to get myself ready for the infra meeting later today before my three hour block of meetings begins, so I won't get to that immediately  [16:44]
fungi: on the topic of bionic nodes and ansible 11, https://review.opendev.org/c/openstack/pbr/+/942594 shows a pbr change now returning RETRY_LIMIT results for openstack-tox-py27, openstack-tox-py36 and openstack-tox-py37  [16:44]
clarkb: fungi: ya so 36 and 37 are basically gone. Then 27 depends on whether or not focal has 27 packages  [16:45]
clarkb: https://packages.ubuntu.com/jammy/python2.7 jammy has 2.7 supposedly  [16:45]
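A quick way to confirm that from a jammy node, as a minimal sketch:

    # Check whether the python2.7 package exists in the Jammy archive.
    apt-cache policy python2.7

    # Or install it and confirm the interpreter actually runs.
    sudo apt-get install -y python2.7 && python2.7 --version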
*** dhill is now known as Guest24484  [16:48]
cloudnull: clarkb the simplest way to get the project onto the environment is to log in to the webui; once you do that you'll be able to hit the API.  [16:55]
clarkb: cloudnull: thanks for confirming. I should be able to do that sometime today if no one beats me to it  [16:56]
clarkb: (more than happy for someone else to beat me to it)  [16:56]
cloudnull: Once someone logs in, let me know. I'll go tweak the quotas to push them up.  [16:57]
clarkb: will do  [16:59]
fungi: cloudnull: not urgent, but from a future simplification perspective is there any api call we can make to do that from an automated system so a user doesn't need to dig out the credentials and load them into a browser? if so, we can just improve our automation to perform the additional step  [17:01]
fungi: if not, no biggie, we just have to remember to do that as a separate step any time we bring up a new region, which isn't all that often  [17:01]
fungi: but it happens infrequently enough that we'll probably always forget and then have to remember why it doesn't work initially  [17:02]
*** sfinucan is now known as stephenfin  [17:02]
keekz: i am also hitting `RETRY_LIMIT`: https://review.opendev.org/c/openstack/skyline-apiserver/+/957743 is that an infra issue? it's been happening for a couple reviews of mine since yesterday  [17:06]
fungi: keekz: are those jobs using a really old platform, like ubuntu-bionic nodes maybe?  [17:09]
fungi: or an old python version (<3.8) that needs an older test platform?  [17:10]
frickler: "ANSIBLE PARSE ERROR"  [17:10]
frickler: on ubuntu-noble  [17:10]
keekz: :shrugs: i've never contributed to this project before.. things have been getting merged in fairly regularly it looks like though  [17:10]
frickler: https://zuul.opendev.org/t/openstack/build/bb6ffbbe24f444e7af2b7d2316b1f406  [17:10]
frickler: so likely some new ansible 11 issue  [17:11]
clarkb: "ERROR! Vars in a Play must be specified as a dictionary." it says  [17:13]
clarkb: https://opendev.org/openstack/skyline-apiserver/src/branch/master/playbooks/devstack/pre.yaml#L8-L9 looks like an easy fix  [17:14]
clarkb: keekz: I think you just need to convert that list of vars to a dictionary of vars. Essentially just remove the '- ' prefix from those two lines  [17:15]
fungi: okay, so newer ansible>9 behavior change  [17:15]
clarkb: ya and I kinda wonder if that means ansible 9 was running things in a way that was unexpected  [17:15]
clarkb: but seems like an easy rollforward situation  [17:15]
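For reference, the shape of the fix being suggested: Ansible used to tolerate play-level vars written as a list of single-entry mappings, but newer releases require a plain dictionary. A minimal sketch with hypothetical variable names (not the actual contents of pre.yaml):

    # Rejected by newer Ansible ("Vars in a Play must be specified as a
    # dictionary") because each '- ' item turns vars into a list:
    # (variable names below are illustrative)
    - hosts: all
      vars:
        - tox_envlist: functional
        - python_version: "3.12"

    # Accepted: drop the '- ' prefixes so vars is a single dictionary:
    - hosts: all
      vars:
        tox_envlist: functional
        python_version: "3.12"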
keekz: cool, appreciate it, i'll give that a try. i didn't notice that ansible error message tucked away in there, just that it seemingly had gotten stuck on swift upload at the end of the log  [17:17]
clarkb: we have a road here called skyline that I ride my bike up semi regularly. Searching my browser history for skyline to see if I have the url for the rax dashboard in there is just a list of git repo links and bike routes  [17:40]
clarkb: https://docs.rackspace.com/docs/accessing-rackspace-flex-cloud has the links I need  [17:40]
clarkb: cloudnull: fungi: I just tried using the keystone credentials for the two accounts (what is in our clouds.yaml) and that failed. I'm guessing I need to use rackspace federation the first time as that is what syncs the keystone creds for both skyline and the APIs?  [17:44]
clarkb: ya using rackspace federation appears to have worked for the first account. Working on the second next  [17:46]
clarkb: cloudnull: fungi: ok I logged into both of our accounts using the rackspace federation method in iad3's skyline instance. One thing to note is I had to use an incognito tab for the second one as "sign out" leaves enough of a cookie trail that on next login it redirects you straight back into the account and doesn't ask for your new username and password  [17:50]
fungi: probably a skyline bug/feature request  [17:50]
clarkb: I am able to do image listings using openstack client with both accounts in that region now as well so I suspect this has addressed the problem  [17:53]
clarkb: and for the record this is where I logged in: https://skyline.api.iad3.rackspacecloud.com/auth/login  [17:53]
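The verification described above, as a minimal sketch (the cloud name `rax-flex-iad3` is a stand-in for whatever the region's entry is called in our clouds.yaml):

    # Confirm the new region now accepts our keystone credentials.
    # (cloud name is an assumption)
    openstack --os-cloud rax-flex-iad3 token issue

    # And that ordinary API calls work, matching the image listing above.
    openstack --os-cloud rax-flex-iad3 image list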
fungi: awesome, thanks!  [17:55]
*** dhill is now known as Guest24491  [17:59]
opendevreview: Merged opendev/zuul-providers master: Reenable rax DFW and ORD providers  https://review.opendev.org/c/opendev/zuul-providers/+/957955  [19:35]
fungi: i didn't approve the second one until we see the upshot of ^  [19:35]
clarkb: ++  [19:36]
fungi: related, looks like we're basically caught up on the node request backlog finally  [19:55]
corvus: keep in mind that a lot of the time the backlog is only arm64 requests  [19:55]
fungi: yeah  [19:55]
clarkb: we are up to ze11 in the upgrade and reboot process  [21:01]
opendevreview: Clark Boylan proposed opendev/zone-opendev.org master: Delete review02 DNS records  https://review.opendev.org/c/opendev/zone-opendev.org/+/957981  [21:12]
clarkb: infra-root ^ I double checked that review02 has been deleted from the cloud (it was; it's not there anymore) so this DNS cleanup should be safe. It is also not in our inventory anymore so I will clean up the emergency file now as well  [21:13]
clarkb: gitea-lb02 was also in the emergency file and I have removed it too  [21:14]
opendevreview: Clark Boylan proposed opendev/zone-opendev.org master: Update commented out www.opendev.org record  https://review.opendev.org/c/opendev/zone-opendev.org/+/957982  [21:18]
clarkb: and that is a super minor non-production update for sanity purposes  [21:18]
opendevreview: Clif Houck proposed openstack/diskimage-builder master: Add a sha256sum check for CentOS Cloud Images  https://review.opendev.org/c/openstack/diskimage-builder/+/957983  [21:20]
opendevreview: Clark Boylan proposed openstack/diskimage-builder master: Drop python36 testing  https://review.opendev.org/c/openstack/diskimage-builder/+/957985  [21:47]
clarkb: more ^ ansible 11 python36 bionic fallout  [21:47]
clarkb: I think that centos 9 stream may have published bad qcow2 image(s)  [21:56]
clarkb: 957983 adds sha256sum image hash verification. Well, the sha256sum file for the image we use is 0 bytes. Then in my change that drops python36 we fail to build centos 9 stream images because qemu-img's conversion from qcow2 to raw fails on a read error  [21:57]
clarkb: spotz isn't in here, but maybe she would know who to ping about that?  [21:57]
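The kind of check 957983 adds, sketched by hand; the mirror URL, image filename, and per-image SHA256SUM file name here are illustrative, not necessarily the exact ones DIB downloads:

    # Fetch the image and its published checksum, then verify the pair.
    # (URL and filenames are assumptions)
    URL=https://cloud.centos.org/centos/9-stream/x86_64/images
    IMG=CentOS-Stream-GenericCloud-9-latest.x86_64.qcow2
    wget "$URL/$IMG" "$URL/$IMG.SHA256SUM"

    # A 0-byte checksum file (as observed here) fails fast at this step,
    # instead of a corrupt qcow2 surfacing later as a qemu-img read error.
    sha256sum -c "$IMG.SHA256SUM"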
fungi: oh too fun  [22:01]
JayF: clarkb: we literally do that in an ironic CI job; rebuild an image with a few changes just to get it using 4k block sizes  [22:02]
JayF: clarkb: so that's why those image issues break even our CI that uses prebuilt ramdisks :(  [22:03]
clarkb: you could use centos-minimal probably  [22:03]
clarkb: problems like this are exactly why opendev doesn't use the prebuilt images as a starting point  [22:03]
JayF: that's all Julia's stuff, they did a bunch of DIB 4k image support stuff  [22:03]
JayF: I am not familiar enough with it to know if that's possible  [22:04]
clarkb: they change too often in weird undocumented ways without much path for input.  [22:04]
JayF: stuff I do starts with *-minimal  [22:04]
clarkb: we've made it to ze12. This might finish before the end of the day  [22:05]
fungi: yay!  [22:05]
opendevreview: Merged opendev/zuul-providers master: Switch to zuul-jobs upload-image-swift  https://review.opendev.org/c/opendev/zuul-providers/+/951018  [23:20]
corvus: wow that finally happened :)  [23:25]
