Monday, 2025-06-23

<opendevreview> tianyutong proposed openstack/project-config master: Allow tag creation for heterogeneous-distributed-training-framework  https://review.opendev.org/c/openstack/project-config/+/953069  [01:41]
<opendevreview> Merged openstack/project-config master: Allow tag creation for heterogeneous-distributed-training-framework  https://review.opendev.org/c/openstack/project-config/+/953069  [12:31]
*** haleyb|out is now known as haleyb  [13:12]
*** mmagr__ is now known as mmagr  [14:28]
<priteau> Hello. I have a job which has been queued for 5+ hours: https://zuul.opendev.org/t/openstack/status?change=952983  [15:33]
<priteau> kayobe-seed-vm-ubuntu-noble towards the bottom of the list  [15:33]
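(The Zuul status page linked above is backed by a JSON API, so the same per-change job state can be pulled programmatically. This is a minimal sketch, not what anyone in the log actually ran; the /api/tenant/<tenant>/status/change/<change>,<patchset> endpoint and the field names are assumptions based on the public Zuul web API, and the patchset number is made up for illustration.)

```python
#!/usr/bin/env python3
"""Sketch: check which jobs for a change are still waiting for a node."""
import requests

ZUUL = "https://zuul.opendev.org"
TENANT = "openstack"
CHANGE = "952983,1"  # hypothetical patchset number, for illustration only

resp = requests.get(f"{ZUUL}/api/tenant/{TENANT}/status/change/{CHANGE}", timeout=30)
resp.raise_for_status()

for item in resp.json():
    for job in item.get("jobs", []):
        # A job without a start time has not been handed a node yet.
        state = "running" if job.get("start_time") else "queued"
        print(f"{job['name']}: {state}")
```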
<clarkb> looks like that job uses this nodeset: https://opendev.org/openstack/kayobe/src/branch/master/zuul.d/nodesets.yaml#L14-L18 which uses a standard ubuntu-noble label (so not arm or nested virt etc)  [15:42]
<fungi> and just a single node  [15:42]
<clarkb> 2025-06-23 15:42:55,713 DEBUG nodepool.PoolWorker.raxflex-sjc3-main: Active requests: ['200-0027294972']  [15:43]
<clarkb> that's the nodepool provider that has the request in its todo list. It reports there isn't enough quota to fulfill the request. It is possible we have leaked fips there again. I'll check  [15:43]
<clarkb> yes, based on a floating ip listing I believe this is the case. I'll do what I did a week or two ago and delete all the fips that are not attached to anything  [15:44]
<clarkb> dfw3 is in the same situation so I'll do the same there as well  [15:45]
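(For reference, a minimal openstacksdk sketch of the kind of cleanup described here: delete floating IPs that are not attached to any port. The cloud name is a hypothetical clouds.yaml entry; the actual credentials and region names for the raxflex sjc3/dfw3 providers are not shown in the log.)

```python
#!/usr/bin/env python3
"""Sketch: remove leaked floating IPs that have no port attachment."""
import openstack

# "raxflex-sjc3" is a placeholder clouds.yaml entry name, not the real config.
conn = openstack.connect(cloud="raxflex-sjc3")

for fip in conn.network.ips():      # iterates over floating IPs in the project
    if fip.port_id is None:         # not attached to anything -> leaked
        print(f"deleting leaked floating IP {fip.floating_ip_address}")
        conn.network.delete_ip(fip)
```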
<clarkb> sjc3 api responses are a bit slow so I started with dfw3  [15:50]
<clarkb> dfw3 is done. sjc3 is in progress but slow due to the api response timing. Hopefully things will be happy in the next 5-10 minutes  [15:53]
<priteau> Thank you!  [16:20]
<priteau> Should I have posted this to #opendev instead?  [16:20]
<clarkb> either is fine, but this is an opendev ci system issue, nothing specific to openstack  [16:21]
<clarkb> looking at grafana graphs I think we also have a number of nodes stuck in a deleting state in sjc3. Possibly due to the api slowness I've observed  [16:25]
<clarkb> I can try to manually delete a node there and see what happens  [16:26]
<clarkb> ok, manually deleting ~3 nodes seemed to get things moving. Possibly, with the api response times, nodepool was hitting some error that short-circuited things until I reduced the total size of the list? I'm not sure. Either way this also seems to have helped  [16:31]
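(A rough sketch of how the "nodes stuck in a deleting state" symptom could be confirmed from the provider side with openstacksdk. This is an assumption about one possible check, not the command actually used; nodepool has its own node listing and deletion tooling, and the cloud name is again a placeholder.)

```python
#!/usr/bin/env python3
"""Sketch: list instances that look stuck mid-deletion or errored."""
import openstack

conn = openstack.connect(cloud="raxflex-sjc3")  # hypothetical clouds.yaml entry

for server in conn.compute.servers(details=True):
    # task_state surfaces Nova's OS-EXT-STS:task_state field.
    if server.task_state == "deleting" or server.status == "ERROR":
        print(f"{server.id} {server.name}: status={server.status} "
              f"task_state={server.task_state}")
```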
<clarkb> the job has a node now too  [16:32]
<fungi> agreed, i see it started running  [16:34]
<clarkb> actually I think my manual deletion just happened to coincide with James Denton making a fix on the cloud side  [16:40]
<clarkb> so anyway the cloud helped us out; once that was sorted, nodepool could clean things up normally (except for the fips, since we have fip cleanup disabled)  [16:40]
<clarkb> priteau: thank you for the heads up, and I think your change has reported now  [16:41]
<priteau> It did, thanks.  [16:46]
