fungi | yeah, the failure mode is basically things not getting deployed because the repos aren't updating, but the updates re effectively deferred until we get those repos updating consistently at the right times | 00:01 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run infra-prod jobs in parallel https://review.opendev.org/c/opendev/system-config/+/943488 | 00:04 |
opendevreview | Clark Boylan proposed opendev/system-config master: Use required-projects in bootstrap-bridge https://review.opendev.org/c/opendev/system-config/+/943509 | 00:04 |
clarkb | I think that will help. It's not completely clear to me how the two ansible roles update properly since they don't have the master checkout special cases in base-jobs but that isn't a regression so we can sort that out later I suspect | 00:05 |
clarkb | tonyb: I'll let you upload that change whenever convenient for you since ^ has managed to continue to distract me | 00:06 |
clarkb | fungi: one thing I realized is my project-config fix will always run bootstrap-bridge even if we don't update files that match things for nodepool, zuul, grafana, etc. I don't think that is a major issue and optimizing it might be more dangerous than it is worth. But I wanted to mention it | 00:12 |
opendevreview | Merged openstack/project-config master: A comment tweak to trigger nl01 config deployment https://review.opendev.org/c/openstack/project-config/+/943508 | 00:13 |
clarkb | the deploy queue for ^ lgtm | 00:13 |
clarkb | it's got bootstrap-bridge and service-nodepool and both are waiting behind the hourly jobs | 00:13 |
fungi | confirmed | 00:14 |
clarkb | and in this case project-config will update but system-config won't... not ideal but it will work for now | 00:14 |
clarkb | or that's the expectation anyway. We should check once bootstrap-bridge is done | 00:15 |
fungi | /etc/nodepool/nodepool.yaml was updated on nl01 at 00:08 utc, over 10 minutes ago | 00:20 |
clarkb | that would be roughly when the 00:00 hourlies for nodepool ran | 00:21 |
clarkb | did merging 943508 somehow trigger it to update? that's weird | 00:21 |
clarkb | https://zuul.opendev.org/t/openstack/build/608251960f3343ff8721323faba73c70/log/job-output.txt#109-110 it says we didn't update there | 00:22 |
fungi | not without a time machine since 943508 didn't merge until 00:13 | 00:22 |
fungi | oh, 943494 merged tho | 00:23 |
fungi | i guess that caused project-config on bridge to get updated: https://zuul.opendev.org/t/openstack/buildset/560b4ec63265447ba74eed24f55209e3 | 00:25 |
clarkb | aha yup that would do it | 00:25 |
clarkb | because as I mentioned I didn't restrict when infra-prod-bootstrap-bridge runs to only match when the other service jobs run | 00:25 |
clarkb | so the noop for services change landed but ran bootstrap-bridge and bumped things up to include your first change. Then you updated with the noop comment which should update to include that noop comment | 00:26 |
fungi | so 943508 was ultimately unnecessary between 943494 triggering the bootstrap job and then hourlies running another nodepool deploy | 00:26 |
clarkb | agreed | 00:26 |
clarkb | so I think the next step is likely updating required projects so that we update all 4 repos every time rather than just one or another. | 00:27 |
clarkb | but that seems like a good one to sleep on to ensure there isn't any race condition introduced by that | 00:27 |
clarkb | nl01 did update to your noop fix btw | 00:27 |
clarkb | grafana graphs are looking like I would expect too | 00:28 |
fungi | also i guess we need to restart the nodepool-launcher container on nl01 to pick up the config change? | 00:29 |
clarkb | fungi: no that is automatic | 00:29 |
clarkb | you can see it in the grafana graphs | 00:29 |
clarkb | https://grafana.opendev.org/d/6d29645669/nodepool3a-rackspace-flex?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all | 00:29 |
clarkb | you might need to manually ask nodepool to delete the ready nodes for raxflex sjc3 though to speed cleanup for those up. It's like an 8-hour timeout otherwise? | 00:30 |
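(A minimal sketch of that manual cleanup, assuming the standard nodepool CLI on the launcher; the node ID below is a placeholder, not a value from the log:)

```shell
# Show node state per provider, then request deletion of each ready node
# still sitting in the old SJC3 project.
nodepool list --detail | grep rax-flex-sjc3
nodepool delete 0012345678   # hypothetical node ID; repeat per ready node
```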
fungi | looks like the server list in that project is turning up some stuck "building" nodes that are months old | 00:32 |
clarkb | I think we have two options for those. Ask jamesdenton__ to clean them up nicely and have nodepool update the db for us. Or we just say meh, switch things over, then nodepool may still clean them up because they won't exist in the new tenant? | 00:33 |
clarkb | and then followup later and have jamesdenton__ clean them up | 00:33 |
clarkb | when I say nodepool may still clean them up I mean on the nodepool db side. That won't help with the cloud side | 00:34 |
fungi | yeah | 00:34 |
clarkb | but I'm not sure nodepool can transition building to deleting in that case. I'm 99% certain this would work if they were already deleting | 00:34 |
clarkb | so maybe ask nodepool to delete them first so they are in a deleting state? | 00:35 |
clarkb | looking at grafana the nodepool side may already be deleting so ya I think that may just work? | 00:36 |
jamesdenton__ | still running into issues? | 00:36 |
fungi | jamesdenton__: some server instances reported as "building" in nova for months | 00:37 |
fungi | d29a78ca-979b-43d3-b2d0-9439a955d62a, bfc3635d-9d3f-4e14-88de-021932759c67, 8065ff20-e395-46d0-a7f5-780513461e7c | 00:37 |
jamesdenton__ | just those 3? | 00:38 |
fungi | yes, for servers stuck in building | 00:38 |
clarkb | fungi: grafana shows 4 servers consistently in a deleting state I wonder if those are stuck too? | 00:39 |
fungi | clarkb: two of the stuck building (from nova's perspective) servers are in a deleting state in nodepool | 00:40 |
fungi | not sure about the other two yet | 00:40 |
clarkb | ack | 00:40 |
clarkb | jamesdenton__: the background here is we're trying to shut down the old tenant resources in sjc3 so that we can spin up resources in the new tenant to match what is in dfw3, then turn on dfw3 | 00:40 |
jamesdenton__ | so, don't worry about those 3 because they're fake news. We will clean them up, though | 00:41 |
fungi | also tkajinam has an autohold from a few months ago locking an instance in the old project there, so i'll release that as i doubt it's still relevant this long after | 00:42 |
clarkb | ++ | 00:43 |
fungi | i also manually nodepool deleted the ready nodes | 00:45 |
clarkb | according to grafana everything is in a deleting state now. My hunch is that when we switch sjc3 over to the new tenant nodepool will do a listing and see those nodes are all "gone" and clean up its db treating them as deleted (even though they may still exist in the old tenant) | 00:46 |
clarkb | so I think that is a 'safe' state to begin the transition from as long as we coordinate with jamesdenton__ and crew to clean things up in the old tenant | 00:46 |
fungi | the other two that are stuck in a deleting state in nodepool are showing active in nova. i'll see what happens if i try openstack server delete on them | 00:47 |
fungi | they seem to stay listed as active | 00:48 |
fungi | both were created around 12 hours ago | 00:49 |
fungi | jamesdenton__: these two also won't delete... 098ed720-3160-4aa6-8364-ba960a1841a4 49771e38-c82d-4107-93a6-019c9e3f1795 | 00:50 |
jamesdenton__ | ok looking | 00:50 |
jamesdenton__ | for those two - are you getting any sort of error or do they just remain active? | 00:51 |
fungi | they just remain active, no error | 00:51 |
jamesdenton__ | kk | 00:51 |
jamesdenton__ | mind if i try on my side? | 00:52 |
fungi | openstack server delete returns successfully for me but no apparent change on the cloud end. please try whatever you like | 00:52 |
jamesdenton__ | kk | 00:52 |
fungi | i'm really just trying to clear out that tenant/project at this point since, as clark said, we're pulling out of it in order to move to the new project in that region | 00:53 |
fungi | all the server instances i was able to delete have been, the remaining 5 (3 in building, 2 in active) just don't seem to want to go away quietly | 00:54 |
fungi | clarkb: from the nodepool side of things, it may just clean them out when we switch the cloud config over to the new tenant project_id, since it will do a server list and discover they're gone, right? | 01:00 |
fungi | it's already cleaned up the images, so i expect it's safe to merge https://review.opendev.org/942231 now | 01:03 |
clarkb | fungi: yes that is my expectation | 01:04 |
fungi | i've approved it | 01:04 |
clarkb | it requests a delete then periodically lists servers to see if they have gone away. Changing tenants will effectively mark them gone away | 01:04 |
fungi | yeah, good enough | 01:04 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 01:12 |
jamesdenton__ | fungi for 098ed720-3160-4aa6-8364-ba960a1841a4 49771e38-c82d-4107-93a6-019c9e3f1795, i reset their state to error then issued the delete again which seems to have done the trick | 01:17 |
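(A sketch of the workaround jamesdenton__ describes, assuming the usual admin-level OpenStack CLI; `--state error` is the reset-state step, followed by re-issuing the delete:)

```shell
# Reset the stuck instances to the error state (admin-only operation),
# then delete them again; UUIDs are the two from the log.
openstack server set --state error 098ed720-3160-4aa6-8364-ba960a1841a4
openstack server set --state error 49771e38-c82d-4107-93a6-019c9e3f1795
openstack server delete 098ed720-3160-4aa6-8364-ba960a1841a4 \
  49771e38-c82d-4107-93a6-019c9e3f1795
```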
fungi | thanks jamesdenton__! that did indeed solve it | 01:26 |
opendevreview | Merged opendev/system-config master: Switch Nodepool to the new Rackspace Flex project https://review.opendev.org/c/opendev/system-config/+/942231 | 01:41 |
fungi | it's deploying | 01:44 |
clarkb | that update may require you to restart nodepool-launcher because it updates clouds.yaml and not nodepool.yaml | 01:46 |
jamesdenton__ | fungi those 3 BUILD offenders are gone, too | 01:46 |
fungi | confirmed, thanks again jamesdenton__! | 01:49 |
fungi | clarkb: 943102 goes in next to update nodepool.yaml, so that will probably get it? | 01:51 |
fungi | and similarly 943103 for zuul-launcher configuration | 01:52 |
clarkb | it may. I'm not sure if we create a new openstack client when bumping the provider config up | 01:54 |
clarkb | if it does then ya it should be fine | 01:54 |
fungi | though also it presumably won't matter until 943106 ups max-servers on the nodepool launchers | 01:56 |
fungi | /etc/openstack/clouds.yaml on the builders updated at 01:43 | 02:00 |
fungi | approving 943102 and 943103 now | 02:01 |
clarkb | fungi: note that periodics just started | 02:05 |
clarkb | so you'll be enqueued behind all that | 02:06 |
clarkb | looks like hourlies got in ahead of periodics and periodics are waiting on hourlies | 02:06 |
clarkb | so thats good | 02:06 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 02:06 |
fungi | no biggie | 02:06 |
opendevreview | Merged opendev/zuul-providers master: Revert "Reapply "Wind down/clean up Rackspace Flex SJC3 resources"" https://review.opendev.org/c/opendev/zuul-providers/+/943103 | 02:07 |
fungi | i'll go ahead and put in 943105 too so the images can start uploading to dfw3 asap | 02:10 |
opendevreview | Merged openstack/project-config master: Revert "Wind down/clean up Rackspace Flex SJC3 resources" https://review.opendev.org/c/openstack/project-config/+/943102 | 02:30 |
opendevreview | Merged openstack/project-config master: Add the DFW3 region for Rackspace Flex https://review.opendev.org/c/openstack/project-config/+/943105 | 02:36 |
fungi | looks like the deploy jobs are going to be a while still, the backlogged hourlies ahead of them still haven't started. i may have to pick this back up in the morning, but hopefully images will upload while i'm asleep so we can just turn the max-servers back on at that point | 03:27 |
opendevreview | Tony Breeds proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 03:31 |
*** dmellado0755393736 is now known as dmellado075539373 | 06:43 | |
*** diablo_rojo_phone is now known as Guest10755 | 08:01 | |
veith4f_ | Hello. Just stumbled over openstack project cleanup ... https://bugs.launchpad.net/openstacksdk/+bug/2100958. Seems like the fix is to just delete security groups after networks. | 08:49 |
frickler | veith4f_: the topic of this channel is to discuss the infrastructure that is used in developing openstack or other projects. for sdk bugs please see the #openstack-sdks channel. also have a look at https://docs.openstack.org/contributors/code-and-documentation/quick-start.html for submitting patches | 09:04 |
veith4f_ | will do. | 09:04 |
frickler | infra-root: first branches for 2025.1 are happening, I'll watch to see whether zuul picks up all of these, iiuc that should be a good indicator for missed branch events otherwise? https://review.opendev.org/q/topic:%22create-2025.1%22 | 09:06 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 11:30 |
opendevreview | Matthieu Huin proposed zuul/zuul-jobs master: Fix the upload-logs-s3 test playbook https://review.opendev.org/c/zuul/zuul-jobs/+/927600 | 11:31 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 12:54 |
Clark[m] | frickler: ya if you look at the zuul API or web UI you should see the new 2025.1 branch like on https://zuul.opendev.org/t/openstack/project/opendev.org/openstack/ovsdbapp | 14:12 |
Clark[m] | frickler if that branch is missing but does exist in Gerrit/git then that is a case of this problem and we have an example to examine logs for | 14:12 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 14:16 |
fungi | deploy for 943105 succeeded at 05:07:47 utc, but i don't see any images uploaded to dfw3 yet, investigating now | 14:18 |
fungi | the sjc3 images did upload to the correct (new) project at least, so i don't think it's a stale clouds.yaml problem for the nodepool builders | 14:19 |
fungi | urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='glance.api.dfw3.rackspacecloud.com', port=443): Max retries exceeded with url: /v2/images/e1b42593-4dca-46aa-973c-f3982447e1a8/file (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)'))) | 14:24 |
Clark[m] | My browser seems to be happy with that / at that host and port. I get back a json doc and no ssl issues. Maybe a proxy problem when actually uploading images? | 14:28 |
fungi | yeah, that was the first thing i tested too | 14:29 |
fungi | looks like the last time nb01 logged that was 10:15:31,111 utc, a little over 4 hours ago | 14:32 |
fungi | and 11:58:33,064 utc on nb02 | 14:34 |
fungi | doesn't look like either has tried to upload again since then | 14:34 |
fungi | so ~2.5 hours ago | 14:34 |
Clark[m] | Are there any uploads currently running against dfw3? | 14:34 |
fungi | not that i found in the current debug logs | 14:35 |
Clark[m] | Looking to see what state we can infer about those might point at where the ssl eof is occuring | 14:35 |
Clark[m] | A nodepool image-list should confirm | 14:35 |
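(For reference, a hedged sketch of that confirmation step using the nodepool CLI; the grep pattern is an assumption about the provider name:)

```shell
# Per-provider upload records (state, age) for the DFW3 region, plus the
# locally built images still held on the builders.
nodepool image-list | grep dfw3
nodepool dib-image-list
```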
fungi | looks like the create image calls are terminating at exactly 30 seconds in | 14:35 |
fungi | my guess is a waf/middlebox cutting it off | 14:36 |
Clark[m] | Ya. Though you uploaded an image for the noble mirror right? So we know it worked at one point? | 14:36 |
fungi | yes | 14:36 |
fungi | when jamesdenton__ is awake, maybe he can confirm whether something new might be disconnecting glance uploads in dfw3 after exactly 30 seconds | 14:37 |
fungi | my successful manual image upload to dfw3 was 2025-02-20T16:56:22Z so just over two weeks ago | 14:39 |
fungi | er, exactly two weeks ago | 14:40 |
fungi | i'll try another manual upload just to confirm | 14:40 |
fungi | HttpException: 413: Client Error for url: https://glance.api.dfw3.rackspacecloud.com/v2/images/7e041fcd-6513-4e81-808a-05da65635e28/file, 413 Request Entity Too Large: nginx | 14:44 |
fungi | that's what i get back from `openstack image create` to dfw3 now, exact same command worked when i ran it two weeks ago, pulled right from my shell history | 14:45 |
fungi | trying to sjc3 again for comparison | 14:45 |
fungi | and it worked fine there just now | 14:46 |
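(A hedged reconstruction of the comparison test; the cloud name, region names, and image name are assumptions rather than values from the log, while the file name is the one mentioned later in the conversation:)

```shell
# Same upload against both regions: DFW3 currently fails with the nginx 413,
# SJC3 accepts it.
openstack --os-cloud opendevci-raxflex --os-region-name DFW3 \
  image create --disk-format qcow2 --container-format bare \
  --file noble-server-cloudimg-amd64.img manual-upload-test
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 \
  image create --disk-format qcow2 --container-format bare \
  --file noble-server-cloudimg-amd64.img manual-upload-test
```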
fungi | the 30 seconds might be a red herring, i have a feeling that might be related to async image upload processing and background checking intervals | 14:47 |
Clark[m] | That could be. It's also good that the problem reproduces with the main tool, because now rax can test without running a nodepool | 14:49 |
fungi | yeah, when testing with the cli i get that 413 error back after only 15 seconds | 14:50 |
Clark[m] | I guess the next step for us is to turn sjc3 test nodes on while we wait for dfw3 images to sort out? | 14:50 |
jamesdenton__ | fungi how large is that image? | 14:51 |
fungi | 582M noble-server-cloudimg-amd64.img | 14:52 |
fungi | the one i just tested with the cli | 14:52 |
fungi | the ones nodepool is uploading are much larger, but i can't even upload a half-gigabyte cloud image to dfw3 at the moment | 14:52 |
jamesdenton__ | ok, let me ping the team about that | 14:53 |
fungi | jamesdenton__: and two weeks ago it was working fine | 14:53 |
fungi | exact same image create command | 14:53 |
fungi | with the exact same file even | 14:54 |
fungi | i still had it sitting in my homedir, hadn't gotten around to deleting it | 14:54 |
fungi | Clark[m]: yeah, i can split 943106 into two separate changes for now | 14:55 |
jamesdenton__ | fungi can you try again? they made a change that might fix it | 14:56 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 14:57 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Boot nodes in Rackspace Flex SJC3 again https://review.opendev.org/c/openstack/project-config/+/943106 | 15:01 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Start booting nodes in Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943617 | 15:01 |
fungi | jamesdenton__: now my test upload to dfw3 is working, yes. thanks!!! | 15:02 |
jamesdenton__ | thanks for submitting your bug report. have a nice day :D | 15:02 |
fungi | heh ;) | 15:03 |
fungi | i'll figure out how to prompt the nodepool builders to try again | 15:03 |
fungi | i've set 943617 wip until we get our images up there | 15:06 |
fungi | images are already starting to appear there without me doing anything, so i guess nodepool has started to figure it out | 15:08 |
Clark[m] | fungi: 943106 lgtm but I can't vote for a bit due to the school run. I say feel free to self approve | 15:21 |
fungi | will do | 15:21 |
clarkb | fungi: I +2'd both with a note on the second that I'll let you approve when you are happy with the image state | 15:45 |
clarkb | fungi: did we or do we still need to update zuul-launcher? | 15:45 |
opendevreview | Merged openstack/project-config master: Boot nodes in Rackspace Flex SJC3 again https://review.opendev.org/c/openstack/project-config/+/943106 | 15:46 |
fungi | clarkb: that was https://review.opendev.org/c/opendev/zuul-providers/+/943103 | 15:47 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add option to force docker.io addresses to IPv4 https://review.opendev.org/c/opendev/system-config/+/943216 | 15:48 |
clarkb | excellent. There was enough going on yesterday that I missed it. | 15:48 |
fungi | clarkb: also 943104 goes along with 943617 as the zuul-launcher counterpart for enabling dfw3 | 15:51 |
fungi | though it could in theory be approved any time if we don't want to wait to confirm nodepool is able to boot nodes there first | 15:52 |
clarkb | I suspect that is less urgent as sjc3 should be able to provide zuul-launcher coverage for now so maybe focus on nodepool first since it is easier for us to confirm functionality via nodepool | 15:54 |
clarkb | infra-root I posted a self review to https://review.opendev.org/c/opendev/system-config/+/943509 with one last thought on the required-projects list and how they interact with the dns zone repos. Maybe corvus knows the answer to my question off the top of his head otherwise we may need to read the source luke or experiment | 15:54 |
clarkb | actually I answered my own question the zone repos are fetched from opendev.org | 15:56 |
clarkb | so zuul isn't involved. I'll update my review | 15:56 |
fungi | yeah, my understanding is that zuul will add the triggering project to the projects list regardless, but i suppose that doesn't matter if the job is ignoring zuul's checkout | 15:57 |
clarkb | yup | 15:58 |
clarkb | fungi: grafana shows max-servers in sjc3 has gone back up to 32. I don't see any nodes yet via the graphs | 16:00 |
clarkb | no building nodes either so probably just a lack of demand at the moment | 16:02 |
fungi | or the launcher on nl01 is still looking at the old project which no longer contains any images, e.g. because updating nodepool.yaml didn't cause it to create new cloud objects after clouds.yaml updated | 16:05 |
fungi | may need to restart the container | 16:06 |
clarkb | hrm could be | 16:06 |
clarkb | if we think that is possible restarting it now before it accidentally manages to boot something (though without images that should be impossible) is probably a good idea | 16:07 |
clarkb | the nodepool hourly job is running through so may want to wait for that to finish then restart | 16:07 |
fungi | sure | 16:08 |
clarkb | we have a building node now | 16:18 |
clarkb | it doesn't look like the nodepool launcher on nl01 was restarted yet fwiw | 16:20 |
clarkb | but a server list from bridge seems to show that it is launching in the correct tenant | 16:21 |
fungi | agreed, one active and another building | 16:23 |
fungi | correct project in sjc3 | 16:23 |
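(A sketch of that check from bridge; the cloud and region names are assumptions:)

```shell
# Confirm new nodes are landing in the new SJC3 project rather than the old one.
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 server list
```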
fungi | no intervention needed, just patience i suppose ;) | 16:24 |
clarkb | ++ | 16:24 |
clarkb | the in use node is no longer in use and got deleted. I'm going to try and ssh into one of the building nodes when they are ready/in-use to double check network mtus | 16:25 |
clarkb | in theory it should all be happy now | 16:25 |
fungi | good idea | 16:25 |
clarkb | ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 | 16:26 |
clarkb | from 65.17.193.51 | 16:26 |
fungi | perfect | 16:26 |
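(A sketch of the MTU spot check above; the login user is an assumption:)

```shell
# Check the primary interface MTU on the freshly booted test node.
ssh root@65.17.193.51 ip link show ens3
# expected: "ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ..."
```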
clarkb | yup I think that means the two major things we wanted to address have been addressed (use newer project/tenant and get 1500 network MTUs) | 16:30 |
clarkb | adding dfw3 and doubling our quota is an excellent bonus | 16:31 |
opendevreview | Stephen Reaves proposed openstack/diskimage-builder master: Enable custom overlays https://review.opendev.org/c/openstack/diskimage-builder/+/943500 | 16:33 |
fungi | clarkb: i suppose we could go ahead and and turn on dfw3 to start at least satisfying node requests for the images that are already uploaded. or will that cause issues? | 16:51 |
clarkb | I think that should be fine | 16:51 |
clarkb | nodepool should know it can't boot the other images. | 16:52 |
fungi | presumably it doesn't make sense to retry earlier uploads as the builders have probably cleaned up their local copies | 16:54 |
fungi | also images we don't rebuild as often will take a while to start appearing regardless | 16:54 |
clarkb | we can request new builds of images that we know we want/need | 16:58 |
clarkb | I think we do keep the local image content as long as the image is active in a cloud | 16:59 |
clarkb | active from nodepool's perspective I mean. Once nodepool says I'm deleting that image and that is true for all clouds then it will clean up the local data. This can occur before the actual images are removed from the clouds | 16:59 |
opendevreview | Merged openstack/project-config master: Start booting nodes in Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943617 | 17:04 |
fungi | we probably won't see much uptake there since it's only got debian-bookworm and ubuntu-bionic for now, but i'll check the graphs later today to see if there are at least some blips in use | 17:05 |
clarkb | oh however we may only keep the smallest image file (qcow2?) so maybe we'd have to manually convert to raw if we are uploading raw here | 17:10 |
clarkb | and ya its probably not worth the trouble just let new images build and deploy over the next few days (the most used images should still be built daily) | 17:11 |
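(If a rebuild or format conversion were wanted sooner, something like the following would apply; the image name is an assumption:)

```shell
# Ask the builders for a fresh build of a specific image, or convert a cached
# qcow2 to raw if a raw copy had to be re-uploaded by hand.
nodepool image-build ubuntu-noble
qemu-img convert -f qcow2 -O raw ubuntu-noble.qcow2 ubuntu-noble.raw
```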
opendevreview | Jeremy Stanley proposed opendev/system-config master: Clean up old Rackspace Flex SJC3 project https://review.opendev.org/c/opendev/system-config/+/943625 | 17:27 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/943216 is passing again now. As written it should only affect our CI jobs and not production. But that is worth double checking as I think we need to be much more careful with this behavior in production (due to issues like stale records that could point requests at the wrong ips) | 17:33 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 17:43 |
opendevreview | Fabien Boucher proposed zuul/zuul-jobs master: zuul_log_path: allow override for emit-job-header and upload-logs https://review.opendev.org/c/zuul/zuul-jobs/+/943586 | 17:45 |
fungi | a bunch more images have appeared in dfw3 now | 17:50 |
clarkb | still waiting on the first node boot. Quick someone push a change to tacker :) | 17:56 |
fungi | i went ahead and cleaned up orphaned floating ips, images, routers, networks and ports in our two old sjc3 projects | 18:25 |
clarkb | good idea | 18:33 |
clarkb | I've rechecked https://review.opendev.org/c/opendev/system-config/+/943326 on the off chance the selenium update can land without the docker.io ipv4 change and maybe that will generate enough load to see things schedule on dfw3 | 18:34 |
fungi | oh, removed old keypairs and security group rules too | 18:37 |
clarkb | seems like overall load is low enough that everything else keeps scheduling nodes and not dfw3. | 18:38 |
fungi | i can't think of anything else we'd need to cleanup | 18:38 |
fungi | yeah, i have a feeling we may not see any node use there until 02:00 | 18:38 |
fungi | at least the daily jobs still (more than) saturate our quota everywhere | 18:38 |
opendevreview | Karolina Kula proposed opendev/glean master: WIP: Add support for CentOS 10 keyfiles https://review.opendev.org/c/opendev/glean/+/941672 | 21:34 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 21:35 |
opendevreview | Merged opendev/system-config master: Pull the selenium standalone-firefox image from quay https://review.opendev.org/c/opendev/system-config/+/943326 | 22:00 |
fungi | Exception: Unable to find flavor: gp.0.4.8 | 22:01 |
fungi | should be gp.5.4.8 there, working on a patch | 22:02 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Correct flavor name for Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943647 | 22:07 |
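(A quick way to confirm the flavor names a new region actually offers before patching the provider config; the cloud and region names are assumptions:)

```shell
# List available flavors in DFW3 to verify gp.5.4.8 exists there.
openstack --os-cloud opendevci-raxflex --os-region-name DFW3 flavor list
```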
fungi | that's going to break some assumptions for our existing zuul-launcher config too, looks like | 22:09 |
clarkb | oh hey the selenium image update merged without extra help | 22:11 |
fungi | yup! | 22:11 |
clarkb | I've approved the flavor fix in project-config | 22:11 |
clarkb | looks like sjc3 may be erroring on boots too | 22:12 |
opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Add the DFW3 region for Rackspace Flex https://review.opendev.org/c/opendev/zuul-providers/+/943104 | 22:14 |
clarkb | Invalid key_name provided. | 22:14 |
clarkb | fungi: ^ did stuff get deleted in the wrong tenant? | 22:14 |
fungi | checking | 22:15 |
fungi | no | 22:15 |
fungi | unless keypairs are cross-tenant resources? | 22:16 |
fungi | oh! they are :/ | 22:16 |
clarkb | that would explain it. keypair list is empty in both tenants | 22:16 |
clarkb | how often does cloud launcher run? once a day in periodic? | 22:17 |
fungi | they must be tied to the account? | 22:17 |
clarkb | ya I guess so that seems like a bug if I'm honest | 22:17 |
clarkb | there is no reason for tenant A to know anything about tenant B's keys if all other resources are separated | 22:17 |
clarkb | anyway cloud launcher should fix that for us | 22:17 |
clarkb | I want to say it runs daily | 22:18 |
clarkb | or when we update the vars for it | 22:18 |
fungi | yeah, i stupidly assumed that `openstack keypair delete ...` would delete them from the identified project only, not from all projects that account has access to | 22:18 |
fungi | but yeah, we're probably going to see that get fixed in ~4 hours | 22:19 |
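(If the keypairs needed restoring before the periodic cloud-launcher run, a manual recreation would look roughly like this; the keypair name and public key path are assumptions about what the launcher normally manages:)

```shell
# Recreate the shared keypair in the new project so launches stop failing
# with "Invalid key_name provided."
openstack --os-cloud opendevci-raxflex --os-region-name SJC3 \
  keypair create --public-key ~/.ssh/infra-root-keys.pub infra-root-keys
```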
clarkb | and dfw3 looks fine so it should come online with the flavor fix | 22:20 |
clarkb | fungi: fwiw I would've assumed they were separate too. I would've done the same thing and been confused | 22:26 |
clarkb | (also why I initially wondered if the wrong tenant was used to request the deletion) | 22:27 |
fungi | well, the week's incomplete if i haven't broken something! at least i got it out of the way ;) | 22:31 |
clarkb | I'm still trying to fix the stuff I broke :) | 22:32 |
clarkb | mostly waiting on a second review on the infra-prod stuff since it's complicated and sensitive enough that the current half-broken but mostly working state is preferable to very broken if I got the fix wrong | 22:32 |
opendevreview | Merged openstack/project-config master: Correct flavor name for Rackspace Flex DFW3 https://review.opendev.org/c/openstack/project-config/+/943647 | 22:33 |
opendevreview | Merged openstack/diskimage-builder master: tox: Drop functest target from tox https://review.opendev.org/c/openstack/diskimage-builder/+/932266 | 22:44 |
clarkb | I have confirmed that cloud launcher will run with the periodic jobs | 22:44 |
clarkb | so ya I don't think we need to rush to try and fix it before then | 22:44 |
clarkb | dfw3 is attempting to boot 5 servers right now | 22:46 |
clarkb | I was able to ssh into one and check mtus. All looks well there | 22:47 |
fungi | cool, so it was just the flavor difference | 22:50 |
clarkb | yup seems to be working based on these measurements now | 22:51 |
clarkb | https://zuul.opendev.org/t/openstack/stream/f112d5081fe9466d8fc360174cc865ee?logfile=console.log this job is running in dfw3 | 23:08 |
clarkb | https://zuul.opendev.org/t/openstack/build/f112d5081fe9466d8fc360174cc865ee it finished successfully | 23:14 |
fungi | i think that's about all we can hope for | 23:14 |
clarkb | yup | 23:15 |
clarkb | I think I'm going to take that as a cue to go for a bike ride. It's been a quiet day and weather is nice. The only things in my backlog are potentially impactful so better for tomorrow morning (the docker stuff, gitea upgrade, and infra-prod required projects update) | 23:16 |
fungi | great idea, have fun! | 23:20 |
fungi | i'll try to check back up on sjc3 after the keypairs are redeployed | 23:20 |