fungi | yeah, the failure mode is basically things not getting deployed because the repos aren't updating, but the updates re effectively deferred until we get those repos updating consistently at the right times | 00:01 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run infra-prod jobs in parallel https://review.opendev.org/c/opendev/system-config/+/943488 | 00:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Use required-projects in bootstrap-bridge https://review.opendev.org/c/opendev/system-config/+/943509 | 00:04 |
clarkb | I think that will help. It's not completely clear to me how the two ansible roles update properly since they don't have the master checkout special cases in base-jobs but that isn't a regression so we can sort that out later I suspect | 00:05 |
clarkb | tonyb: I'll let you upload that change whenever convenient for you since ^ has managed to continue to distract me | 00:06 |
clarkb | fungi: one thing I realized is my project-config fix will always run bootstrap-bridge even if we don't update files that match things for nodepool, zuul, grafana, etc. I don't think that is a major issue and optimizing it might be more dangerous than it is worth. But I wanted to mention it | 00:12 |
opendevreview | Merged openstack/project-config master: A comment tweak to trigger nl01 config deployment https://review.opendev.org/c/openstack/project-config/+/943508 | 00:13 |
clarkb | the deploy queue for ^ lgtm | 00:13 |
clarkb | it's got bootstrap-bridge and service-nodepool and both are waiting behind the hourly jobs | 00:13 |
fungi | confirmed | 00:14 |
clarkb | and in this case project-config will update but system-config won't... not ideal but it will work for now | 00:14 |
clarkb | or that's the expectation anyway. We should check once bootstrap-bridge is done | 00:15 |
fungi | /etc/nodepool/nodepool.yaml was updated on nl01 at 00:08 utc, over 10 minutes ago | 00:20 |
clarkb | that would be roughly when the 00:00 hourlies for nodepool ran | 00:21 |
clarkb | did merging 943508 somehow trigger it to update? that's weird | 00:21 |
clarkb | https://zuul.opendev.org/t/openstack/build/608251960f3343ff8721323faba73c70/log/job-output.txt#109-110 it says we didn't update there | 00:22 |
fungi | not without a time machine since 943508 didn't merge until 00:13 | 00:22 |
fungi | oh, 943494 merged tho | 00:23 |
fungi | i guess that caused project-config on bridge to get updated: https://zuul.opendev.org/t/openstack/buildset/560b4ec63265447ba74eed24f55209e3 | 00:25 |
clarkb | aha yup that would do it | 00:25 |
clarkb | because as I mentioned I didn't restrict when infra-prod-bootstrap-bridge runs to only match when the other service jobs run | 00:25 |
clarkb | so the noop for services change landed but ran bootstrap-bridge and bumped things up to include your first change. Then you updated with the noop comment which should update to include that noop comment | 00:26 |
fungi | so 943508 was ultimately unnecessary between 943494 triggering the bootstrap job and then hourlies running another nodepool deploy | 00:26 |
clarkb | agreed | 00:26 |
clarkb | so I think the next step is likely updating required projects so that we update all 4 repos every time rather than just one or another. | 00:27 |
clarkb | but that seems like a good one to sleep on to ensure there isn't any race condition introduced by that | 00:27 |
clarkb | nl01 did update to your noop fix btw | 00:27 |
clarkb | grafana graphs are looking like I would expect too | 00:28 |
fungi | also i guess we need to restart the nodepool-launcher container on nl01 to pick up the config change? | 00:29 |
clarkb | fungi: no that is automatic | 00:29 |
clarkb | you can see it in the grafana graphs | 00:29 |
clarkb | https://grafana.opendev.org/d/6d29645669/nodepool3a-rackspace-flex?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all | 00:29 |
clarkb | you might need to manually ask nodepool to delete the ready nodes for raxflex sjc3 though to speed cleanup for those up. It's like an 8-hour timeout otherwise? | 00:30 |
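(A minimal sketch of that manual cleanup, assuming the standard nodepool CLI on the launcher; the node ID below is a placeholder, not a value from the log:)

```shell
# Show node state per provider, then request deletion of each ready node
# still sitting in the old SJC3 project.
nodepool list --detail | grep rax-flex-sjc3
nodepool delete 0012345678   # hypothetical node ID; repeat per ready node
```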
fungi | looks like the server list in that project is turning up some stuck "building" nodes that are months old | 00:32 |
clarkb | I think we have two options for those. Ask jamesdenton__ to clean them up nicely and have nodepool update the db for us. Or we just say meh, switch things over, then nodepool may still clean them up because they won't exist in the new tenant? | 00:33 |
clarkb | and then followup later and have jamesdenton__ clean them up | 00:33 |
clarkb | when I say nodepool may still clean them up I mean on the nodepool db side. That won't help with the cloud side | 00:34 |
fungi | yeah | 00:34 |
clarkb | but I'm not sure nodepool can transition building to deleting in that case. I'm 99% certain this would work if they were already deleting | 00:34 |
clarkb | so maybe ask nodepool to delete them first so they are in a deleting state? | 00:35 |
clarkb | looking at grafana the nodepool side may already be deleting so ya I think that may just work? | 00:36 |
jamesdenton__ | still running into issues? | 00:36 |
fungi | jamesdenton__: some server instances reported as "building" in nova for months | 00:37 |
fungi | d29a78ca-979b-43d3-b2d0-9439a955d62a, bfc3635d-9d3f-4e14-88de-021932759c67, 8065ff20-e395-46d0-a7f5-780513461e7c | 00:37 |
jamesdenton__ | just those 3? | 00:38 |
fungi | yes, for servers stuck in building | 00:38 |
clarkb | fungi: grafana shows 4 servers consistently in a deleting state I wonder if those are stuck too? | 00:39 |
fungi | clarkb: two of the stuck building (from nova's perspective) servers are in a deleting state in nodepool | 00:40 |
fungi | not sure about the other two yet | 00:40 |
clarkb | ack | 00:40 |
clarkb | jamesdenton__: the background here is we're trying to shut down the old tenant resources in sjc3 so that we can spin up resources in the new tenant to match what is in dfw3, then turn on dfw3 | 00:40 |
jamesdenton__ | so, don't worry about those 3 because they're fake news. We will clean them up, though | 00:41 |
fungi | also tkajinam has an autohold from a few months ago locking an instance in the old project there, so i'll release that as i doubt it's still relevant this long after | 00:42 |
clarkb | ++ | 00:43 |
fungi | i also manually nodepool deleted the ready nodes | 00:45 |
clarkb | according to grafana everything is in a deleting state now. My hunch is that when we switch sjc3 over to the new tenant nodepool will do a listing and see those nodes are all "gone" and clean up its db treating them as deleted (even though they may still exist in the old tenant) | 00:46 |
clarkb | so I think that is a 'safe' state to begin the transition from as long as we coordinate with jamesdenton__ and crew to clean things up in the old tenant | 00:46 |
fungi | the other two that are stuck in a deleting state in nodepool are showing active in nova. i'll see what happens if i try openstack server delete on them | 00:47 |
fungi | they seem to stay listed as active | 00:48 |
fungi | both were created around 12 hours ago | 00:49 |
fungi | jamesdenton__: these two also won't delete... 098ed720-3160-4aa6-8364-ba960a1841a4 49771e38-c82d-4107-93a6-019c9e3f1795 | 00:50 |
jamesdenton__ | ok looking | 00:50 |
jamesdenton__ | for those two - are you getting any sort of error or do they just remain active? | 00:51 |
fungi | they just remain active, no error | 00:51 |
jamesdenton__ | kk | 00:51 |
jamesdenton__ | mind if i try on my side? | 00:52 |
fungi | openstack server delete returns successfully for me but no apparent change on the cloud end. please try whatever you like | 00:52 |
jamesdenton__ | kk | 00:52 |
fungi | i'm really just trying to clear out that tenant/project at this point since, as clark said, we're pulling out of it in order to move to the new project in that region | 00:53 |
fungi | all the server instances i was able to delete have been, the remaining 5 (3 in building, 2 in active) just don't seem to want to go away quietly | 00:54 |
fungi | clarkb: from the nodepool side of things, it may just clean them out when we switch the cloud config over to the new tenant project_id, since it will do a server list and discover they're gone, right? | 01:00 |
fungi | it's already cleaned up the images, so i expect it's safe to merge https://review.opendev.org/942231 now | 01:03 |
clarkb | fungi: yes that is my expectation | 01:04 |
fungi | i've approved it | 01:04 |
clarkb | it requests a delete then periodically lists servers to see if they have gone away. Changing tenants will effectively mark them gone away | 01:04 |
fungi | yeah, good enough | 01:04 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 01:12 |
jamesdenton__ | fungi for 098ed720-3160-4aa6-8364-ba960a1841a4 49771e38-c82d-4107-93a6-019c9e3f1795, i reset their state to error then issued the delete again which seems to have done the trick | 01:17 |
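(A sketch of the workaround jamesdenton__ describes, assuming the usual admin-level OpenStack CLI; `--state error` is the reset-state step, followed by re-issuing the delete:)

```shell
# Reset the stuck instances to the error state (admin-only operation),
# then delete them again; UUIDs are the two from the log.
openstack server set --state error 098ed720-3160-4aa6-8364-ba960a1841a4
openstack server set --state error 49771e38-c82d-4107-93a6-019c9e3f1795
openstack server delete 098ed720-3160-4aa6-8364-ba960a1841a4 \
  49771e38-c82d-4107-93a6-019c9e3f1795
```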
fungi | thanks jamesdenton__! that did indeed solve it | 01:26 |
opendevreview | Merged opendev/system-config master: Switch Nodepool to the new Rackspace Flex project https://review.opendev.org/c/opendev/system-config/+/942231 | 01:41 |
fungi | it's deploying | 01:44 |
clarkb | that update may require you to restart nodepool-launcher because it updates clouds.yaml and not nodepool.yaml | 01:46 |
jamesdenton__ | fungi those 3 BUILD offenders are gone, too | 01:46 |
fungi | confirmed, thanks again jamesdenton__! | 01:49 |
fungi | clarkb: 943102 goes in next to update nodepool.yaml, so that will probably get it? | 01:51 |
fungi | and similarly 943103 for zuul-launcher configuration | 01:52 |
clarkb | it may. I'm not sure if we create a new openstack client when bumping the provider config up | 01:54 |
clarkb | if it does then ya it should be fine | 01:54 |
fungi | though also it presumably won't matter until 943106 ups max-servers on the nodepool launchers | 01:56 |
fungi | /etc/openstack/clouds.yaml on the builders updated at 01:43 | 02:00 |
fungi | approving 943102 and 943103 now | 02:01 |
clarkb | fungi: note that periodics just started | 02:05 |
clarkb | so you'll be enqueued behind all that | 02:06 |
clarkb | looks like hourlies got in ahead of periodics and periodics are waiting on hourlies | 02:06 |
clarkb | so thats good | 02:06 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 02:06 |
fungi | no biggie | 02:06 |
opendevreview | Merged opendev/zuul-providers master: Revert "Reapply "Wind down/clean up Rackspace Flex SJC3 resources"" https://review.opendev.org/c/opendev/zuul-providers/+/943103 | 02:07 |
fungi | i'll go ahead and put in 943105 too so the images can start uploading to dfw3 asap | 02:10 |
opendevreview | Merged openstack/project-config master: Revert "Wind down/clean up Rackspace Flex SJC3 resources" https://review.opendev.org/c/openstack/project-config/+/943102 | 02:30 |
opendevreview | Merged openstack/project-config master: Add the DFW3 region for Rackspace Flex https://review.opendev.org/c/openstack/project-config/+/943105 | 02:36 |
fungi | looks like the deploy jobs are going to be a while still, the backlogged hourlies ahead of them still haven't started. i may have to pick this back up in the morning, but hopefully images will upload while i'm asleep so we can just turn the max-servers back on at that point | 03:27 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 03:31 |
*** dmellado0755393736 is now known as dmellado075539373 | 06:43 | |
*** diablo_rojo_phone is now known as Guest10755 | 08:01 | |
veith4f_ | Hello. Just stumbled over openstack project cleanup ... https://bugs.launchpad.net/openstacksdk/+bug/2100958. Seems like the fix is to just delete security groups after networks. | 08:49 |
frickler | veith4f_: the topic of this channel is to discuss the infrastructure that is used in developing openstack or other projects. for sdk bugs please see the #openstack-sdks channel. also have a look at https://docs.openstack.org/contributors/code-and-documentation/quick-start.html for submitting patches | 09:04 |
veith4f_ | will do. | 09:04 |
frickler | infra-root: first branches for 2025.1 are happening, I'll watch to see whether zuul picks up all of these, iiuc that should be a good indicator for missed branch events otherwise? https://review.opendev.org/q/topic:%22create-2025.1%22 | 09:06 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 11:30 |
opendevreview | Matthieu Huin proposed zuul/zuul-jobs master: Fix the upload-logs-s3 test playbook https://review.opendev.org/c/zuul/zuul-jobs/+/927600 | 11:31 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 12:54 |
Clark[m] | frickler: ya if you look at the zuul API or web UI you should see the new 2025.1 branch like on https://zuul.opendev.org/t/openstack/project/opendev.org/openstack/ovsdbapp | 14:12 |
Clark[m] | frickler if that branch is missing but does exist in Gerrit/git then that is a case of this problem and we have an example to examine logs for | 14:12 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 14:16 |
fungi | deploy for 943105 succeeded at 05:07:47 utc, but i don't see any images uploaded to dfw3 yet, investigating now | 14:18 |
fungi | the sjc3 images did upload to the correct (new) project at least, so i don't think it's a stale clouds.yaml problem for the nodepool builders | 14:19 |
fungi | urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='glance.api.dfw3.rackspacecloud.com', port=443): Max retries exceeded with url: /v2/images/e1b42593-4dca-46aa-973c-f3982447e1a8/file (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)'))) | 14:24 |
Clark[m] | My browser seems to be happy with that / at that host and port. I get back a json doc and no ssl issues. Maybe a proxy problem when actually uploading images? | 14:28 |
fungi | yeah, that was the first thing i tested too | 14:29 |
fungi | looks like the last time nb01 logged that was 10:15:31,111 utc, a little over 4 hours ago | 14:32 |
fungi | and 11:58:33,064 utc on nb02 | 14:34 |
fungi | doesn't look like either has tried to upload again since then | 14:34 |
fungi | so ~2.5 hours ago | 14:34 |
Clark[m] | Are there any uploads currently running against dfw3? | 14:34 |
fungi | not that i found in the current debug logs | 14:35 |
Clark[m] | Looking to see what state we can infer about those might point at where the ssl eof is occuring | 14:35 |
Clark[m] | A nodepool image-list should confirm | 14:35 |
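(For reference, a hedged sketch of that confirmation step using the nodepool CLI; the grep pattern is an assumption about the provider name:)

```shell
# Per-provider upload records (state, age) for the DFW3 region, plus the
# locally built images still held on the builders.
nodepool image-list | grep dfw3
nodepool dib-image-list
```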
fungi | looks like the create image calls are terminating at exactly 30 seconds in | 14:35 |
fungi | my guess is a waf/middlebox cutting it off | 14:36 |
Clark[m] | Ya. Though you uploaded an image for the noble mirror right? So we know it worked at one point? | 14:36 |
fungi | yes | 14:36 |
fungi | when jamesdenton__ is awake, maybe he can confirm whether something new might be disconnecting glance uploads in dfw3 after exactly 30 seconds | 14:37 |
fungi | my successful manual image upload to dfw3 was 2025-02-20T16:56:22Z so just over two weeks ago | 14:39 |
fungi | er, exactly two weeks ago | 14:40 |
fungi | i'll try another manual upload just to confirm | 14:40 |
fungi | HttpException: 413: Client Error for url: https://glance.api.dfw3.rackspacecloud.com/v2/images/7e041fcd-6513-4e81-808a-05da65635e28/file, 413 Request Entity Too Large: nginx | 14:44 |
fungi | that's what i get back from `openstack image create` to dfw3 now, exact same command worked when i ran it two weeks ago, pulled right from my shell history | 14:45 |
fungi | trying to sjc3 again for comparison | 14:45 |
fungi | and it worked fine there just now | 14:46 |
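(A hedged reconstruction of the comparison test; the cloud name, region names, and image name are assumptions rather than values from the log, while the file name is the one mentioned later in the conversation:)

```shell
# Same upload against both regions: DFW3 currently fails with the nginx 413,
# SJC3 accepts it.
openstack --os-cloud opendevci-raxflex --os-region-name DFW3 \
  image create --disk-format qcow2 --container-format bare \
  --file noble-server-cloudimg-amd64.img manual-upload-test
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 \
  image create --disk-format qcow2 --container-format bare \
  --file noble-server-cloudimg-amd64.img manual-upload-test
```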
fungi | the 30 seconds might be a red herring, i have a feeling that might be related to async image upload processing and background checking intervals | 14:47 |
Clark[m] | That could be. It's also good that the problem reproduces with the main tool, because now rax can test without running a nodepool | 14:49 |
fungi | yeah, when testing with the cli i get that 413 error back after only 15 seconds | 14:50 |
Clark[m] | I guess the next step for us is to turn sjc3 test nodes on while we wait for dfw3 images to sort out? | 14:50 |
jamesdenton__ | fungi how large is that image? | 14:51 |
fungi | 582M noble-server-cloudimg-amd64.img | 14:52 |
fungi | the one i just tested with the cli | 14:52 |
fungi | the ones nodepool is uploading are much larger, but i can't even upload a half-gigabyte cloud image to dfw3 at the moment | 14:52 |
jamesdenton__ | ok, let me ping the team about that | 14:53 |
fungi | jamesdenton__: and two weeks ago it was working fine | 14:53 |
fungi | exact same image create command | 14:53 |
fungi | with the exact same file even | 14:54 |
fungi | i still had it sitting in my homedir, hadn't gotten around to deleting it | 14:54 |
fungi | Clark[m]: yeah, i can split 943106 into two separate changes for now | 14:55 |
jamesdenton__ | fungi can you try again? they made a change that might fix it | 14:56 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 14:57 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Boot nodes in Rackspace Flex SJC3 again https://review.opendev.org/c/openstack/project-config/+/943106 | 15:01 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Start booting nodes in Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943617 | 15:01 |
fungi | jamesdenton__: now my test upload to dfw3 is working, yes. thanks!!! | 15:02 |
jamesdenton__ | thanks for submitting your bug report. have a nice day :D | 15:02 |
fungi | heh ;) | 15:03 |
fungi | i'll figure out how to prompt the nodepool builders to try again | 15:03 |
fungi | i've set 943617 wip until we get our images up there | 15:06 |
fungi | images are already starting to appear there without me doing anything, so i guess nodepool has started to figure it out | 15:08 |
Clark[m] | fungi: 943106 lgtm but I can't vote for a bit due to the school run. I say feel free to self approve | 15:21 |
fungi | will do | 15:21 |
clarkb | fungi: I +2'd both with a note on the second that I'll let you approve when you are happy with the image state | 15:45 |
clarkb | fungi: did we or do we still need to update zuul-launcher? | 15:45 |
opendevreview | Merged openstack/project-config master: Boot nodes in Rackspace Flex SJC3 again https://review.opendev.org/c/openstack/project-config/+/943106 | 15:46 |
fungi | clarkb: that was https://review.opendev.org/c/opendev/zuul-providers/+/943103 | 15:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 15:48 |
clarkb | excellent. There was enough going on yesterday that I missed it. | 15:48 |
fungi | clarkb: also 943104 goes along with 943617 as the zuul-launcher counterpart for enabling dfw3 | 15:51 |
fungi | though it could in theory be approved any time if we don't want to wait to confirm nodepool is able to boot nodes there first | 15:52 |
clarkb | I suspect that is less urgent as sjc3 should be able to provide zuul-launcher coverage for now so maybe focus on nodepool first since it is easier for us to confirm functionality via nodepool | 15:54 |
clarkb | infra-root I posted a self review to https://review.opendev.org/c/opendev/system-config/+/943509 with one last thought on the required-projects list and how they interact with the dns zone repos. Maybe corvus knows the answer to my question off the top of his head otherwise we may need to read the source luke or experiment | 15:54 |
clarkb | actually I answered my own question the zone repos are fetched from opendev.org | 15:56 |
clarkb | so zuul isn't involved. I'll update my review | 15:56 |
fungi | yeah, my understanding is that zuul will add the triggering project to the projects list regardless, but i suppose that doesn't matter if the job is ignoring zuul's checkout | 15:57 |
clarkb | yup | 15:58 |
clarkb | fungi: grafana shows max-servers in sjc3 has gone back up to 32. I don't see any nodes yet via the graphs | 16:00 |
clarkb | no building nodes either so probably just a lack of demand at the moment | 16:02 |
fungi | or the launcher on nl01 is still looking at the old project which no longer contains any images, e.g. because updating nodepool.yaml didn't cause it to create new cloud objects after clouds.yaml updated | 16:05 |
fungi | may need to restart the container | 16:06 |
clarkb | hrm could be | 16:06 |
clarkb | if we think that is possible restarting it now before it accidentally manages to boot something (though without images that should be impossible) is probably a good idea | 16:07 |
clarkb | the nodepool hourly job is running through so may want to wait for that to finish then restart | 16:07 |
fungi | sure | 16:08 |
clarkb | we have a building node now | 16:18 |
clarkb | it doesn't look like the nodepool launcher on nl01 was restarted yet fwiw | 16:20 |
clarkb | but a server list from bridge seems to show that it is launching in the correct tenant | 16:21 |
fungi | agreed, one active and another building | 16:23 |
fungi | correct project in sjc3 | 16:23 |
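(A sketch of that check from bridge; the cloud and region names are assumptions:)

```shell
# Confirm new nodes are landing in the new SJC3 project rather than the old one.
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 server list
```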
fungi | no intervention needed, just patience i suppose ;) | 16:24 |
clarkb | ++ | 16:24 |
clarkb | the in use node is no longer in use and got deleted. I'm going to try and ssh into one of the building nodes when they are ready/in-use to double check network mtus | 16:25 |
clarkb | in theory it should all be happy now | 16:25 |
fungi | good idea | 16:25 |
clarkb | ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 | 16:26 |
clarkb | from 65.17.193.51 | 16:26 |
fungi | perfect | 16:26 |
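(A sketch of the MTU spot check above; the login user is an assumption:)

```shell
# Check the primary interface MTU on the freshly booted test node.
ssh root@65.17.193.51 ip link show ens3
# expected: "ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ..."
```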
clarkb | yup I think that means the two major things we wanted to address have been addressed (use newer project/tenant and get 1500 network MTUs) | 16:30 |
clarkb | adding dfw3 and doubling our quota is an excellent bonus | 16:31 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 16:33 |
fungi | clarkb: i suppose we could go ahead and and turn on dfw3 to start at least satisfying node requests for the images that are already uploaded. or will that cause issues? | 16:51 |
clarkb | I think that should be fine | 16:51 |
clarkb | nodepool should know it can't boot the other images. | 16:52 |
fungi | presumably it doesn't make sense to retry earlier uploads as the builders have probably cleaned up their local copies | 16:54 |
fungi | also images we don't rebuild as often will take a while to start appearing regardless | 16:54 |
clarkb | we can request new builds of images that we know we want/need | 16:58 |
clarkb | I think we do keep the local image content as long as the image is active in a cloud | 16:59 |
clarkb | active from nodepool's perspective I mean. Once nodepool says I'm deleting that image and that is true for all clouds then it will clean up the local data. This can occur before the actual images are removed from the clouds | 16:59 |
opendevreview | Merged openstack/project-config master: Start booting nodes in Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943617 | 17:04 |
fungi | we probably won't see much uptake there since it's only got debian-bookworm and ubuntu-bionic for now, but i'll check the graphs later today to see if there are at least some blips in use | 17:05 |
clarkb | oh however we may only keep the smallest image file (qcow2?) so maybe we'd have to manually convert to raw if we are uploading raw here | 17:10 |
clarkb | and ya its probably not worth the trouble just let new images build and deploy over the next few days (the most used images should still be built daily) | 17:11 |
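(If a rebuild or format conversion were wanted sooner, something like the following would apply; the image name is an assumption:)

```shell
# Ask the builders for a fresh build of a specific image, or convert a cached
# qcow2 to raw if a raw copy had to be re-uploaded by hand.
nodepool image-build ubuntu-noble
qemu-img convert -f qcow2 -O raw ubuntu-noble.qcow2 ubuntu-noble.raw
```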
opendevreview | Jeremy Stanley proposed opendev/system-config master: Clean up old Rackspace Flex SJC3 project https://review.opendev.org/c/opendev/system-config/+/943625 | 17:27 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/943216 is passing again now. As written it should only affect our CI jobs and not production. But that is worth double checking as I think we need to be much more careful with this behavior in production (due to issues like stale records that could point requests at the wrong ips) | 17:33 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 17:43 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 17:45 |
fungi | a bunch more images have appeared in dfw3 now | 17:50 |
clarkb | still waiting on the first node boot. Quick someone push a change to tacker :) | 17:56 |
fungi | i went ahead and cleaned up orphaned floating ips, images, routers, networks and ports in our two old sjc3 projects | 18:25 |
clarkb | good idea | 18:33 |
clarkb | I've rechecked https://review.opendev.org/c/opendev/system-config/+/943326 on the off chance the selenium update can land without the docker.io ipv4 change and maybe that will generate enough load to see things schedule on dfw3 | 18:34 |
fungi | oh, removed old keypairs and security group rules too | 18:37 |
clarkb | seems like overall load is low enough that everything else keeps scheduling nodes and not dfw3. | 18:38 |
fungi | i can't think of anything else we'd need to cleanup | 18:38 |
fungi | yeah, i have a feeling we may not see any node use there until 02:00 | 18:38 |
fungi | at least the daily jobs still (more than) saturate our quota everywhere | 18:38 |
opendevreview | Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles https://review.opendev.org/c/opendev/glean/+/941672 | 21:34 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 21:35 |
opendevreview | Merged opendev/system-config master: Pull the selenium standalone-firefox image from quay https://review.opendev.org/c/opendev/system-config/+/943326 | 22:00 |
fungi | Exception: Unable to find flavor: gp.0.4.8 | 22:01 |
fungi | should be gp.5.4.8 there, working on a patch | 22:02 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Correct flavor name for Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943647 | 22:07 |
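(A quick way to confirm the flavor names a new region actually offers before patching the provider config; the cloud and region names are assumptions:)

```shell
# List available flavors in DFW3 to verify gp.5.4.8 exists there.
openstack --os-cloud opendevci-raxflex --os-region-name DFW3 flavor list
```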
fungi | that's going to break some assumptions for our existing zuul-launcher config too, looks like | 22:09 |
clarkb | oh hey the selenium image update merged without extra help | 22:11 |
fungi | yup! | 22:11 |
clarkb | I've approved the flavor fix in project-config | 22:11 |
clarkb | looks like sjc3 may be erroring on boots too | 22:12 |
opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Add the DFW3 region for Rackspace Flex https://review.opendev.org/c/opendev/zuul-providers/+/943104 | 22:14 |
clarkb | Invalid key_name provided. | 22:14 |
clarkb | fungi: ^ did stuff get deleted in the wrong tenant? | 22:14 |
fungi | checking | 22:15 |
fungi | no | 22:15 |
fungi | unless keypairs are cross-tenant resources? | 22:16 |
fungi | oh! they are :/ | 22:16 |
clarkb | that would explain it. keypair list is empty in both tenants | 22:16 |
clarkb | how often does cloud launcher run? once a day in periodic? | 22:17 |
fungi | they must be tied to the account? | 22:17 |
clarkb | ya I guess so that seems like a bug if I'm honest | 22:17 |
clarkb | there is no reason for tenant A to know anything about tenant B's keys if all other resources are separated | 22:17 |
clarkb | anyway cloud launcher should fix that for us | 22:17 |
clarkb | I want to say it runs daily | 22:18 |
clarkb | or when we update the vars for it | 22:18 |
fungi | yeah, i stupidly assumed that `openstack keypair delete ...` would delete them from the identified project only, not from all projects that account has access to | 22:18 |
fungi | but yeah, we're probably going to see that get fixed in ~4 hours | 22:19 |
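(If the keypairs needed restoring before the periodic cloud-launcher run, a manual recreation would look roughly like this; the keypair name and public key path are assumptions about what the launcher normally manages:)

```shell
# Recreate the shared keypair in the new project so launches stop failing
# with "Invalid key_name provided."
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 \
  keypair create --public-key ~/.ssh/infra-root-keys.pub infra-root-keys
```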
clarkb | and dfw3 looks fine so it should come online with the flavor fix | 22:20 |
clarkb | fungi: fwiw I would've assumed they were separate too. I would've done the same thing and been confused | 22:26 |
clarkb | (also why I initially wondered if the wrong tenant was used to request the deletion) | 22:27 |
fungi | well, the week's incomplete if i haven't broken something! at least i got it out of the way ;) | 22:31 |
clarkb | I'm still trying to fix the stuff I broke :) | 22:32 |
clarkb | mostly waiting on a second review on the infra-prod stuff since it's complicated and sensitive enough that the current half-broken but mostly working state is preferable to very broken if I got the fix wrong | 22:32 |
opendevreview | Merged openstack/project-config master: Correct flavor name for Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943647 | 22:33 |
opendevreview | Merged openstack/diskimage-builder master: tox: Drop functest target from tox https://review.opendev.org/c/openstack/diskimage-builder/+/932266 | 22:44 |
clarkb | I have confirmed that cloud launcher will run with the periodic jobs | 22:44 |
clarkb | so ya I don't think we need to rush to try and fix it before then | 22:44 |
clarkb | dfw3 is attempting to boot 5 servers right now | 22:46 |
clarkb | I was able to ssh into one and check mtus. All looks well there | 22:47 |
fungi | cool, so it was just the flavor difference | 22:50 |
clarkb | yup seems to be working based on these measurements now | 22:51 |
clarkb | https://zuul.opendev.org/t/openstack/stream/f112d5081fe9466d8fc360174cc865ee?logfile=console.log this job is running in dfw3 | 23:08 |
clarkb | https://zuul.opendev.org/t/openstack/build/f112d5081fe9466d8fc360174cc865ee it finished successfully | 23:14 |
fungi | i think that's about all we can hope for | 23:14 |
clarkb | yup | 23:15 |
clarkb | I think I'm going to take that as a cue to go for a bike ride. It's been a quiet day and weather is nice. The only things in my backlog are potentially impactful so better for tomorrow morning (the docker stuff, gitea upgrade, and infra-prod required projects update) | 23:16 |
fungi | great idea, have fun! | 23:20 |
fungi | i'll try to check back up on sjc3 after the keypairs are redeployed | 23:20 |