Monday, 2023-08-28

liuc49_	Has anyone run into this problem: I cannot ssh into the server after booting an instance, and the status of the floating IP is DOWN. The error log is as follows:	00:48
liuc49_	Body: b'{"floatingip": {"id": "5c4e68c1-aad9-498d-8fcf-91bb2031c27e", "tenant_id": "9fc039ef8cb942de9e6a57f484dd4942", "floating_ip_address": "172.24.4.159", "floating_network_id": "3067c886-e311-402f-b4b9-3b24c7456fa5", "router_id": "4ea198ac-facc-4201-b80c-ccd5a5828a43", "port_id": "8b8b5840-a5d3-427c-9278-ead7f54a544f", "fixed_ip_address": "10.0.0.11", "status": "DOWN", "description": "", "port_details":	00:48
liuc49_	{"name": "", "network_id": "e8b6d264-7e08-4da9-bd67-1f1062634b4d", "mac_address": "fa:16:3e:e4:17:54", "admin_state_up": true, "status": "ACTIVE", "device_id": "d9af96ca-56cc-49fd-9bc9-3d99a2e542bf", "device_owner": "compute:nova"}	00:48
liuc49_	, "dns_domain": "", "dns_name": "", "tags": [], "created_at": "2023-08-18T04:09:27Z", "updated_at": "2023-08-18T04:09:38Z", "revision_number": 1, "project_id": "9fc039ef8cb942de9e6a57f484dd4942"}}'	00:48
liuc49_	2023-08-18 04:09:38,489 395657 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.4.159:22' as 'cirros' with public key authentication	00:48
liuc49_	2023-08-18 04:10:38,546 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 1. Retry after 2 seconds.	00:49
liuc49_	2023-08-18 04:11:41,109 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 2. Retry after 3 seconds.	00:49
liuc49_	2023-08-18 04:12:44,663 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 3. Retry after 4 seconds.	00:49
liuc49_	2023-08-18 04:13:49,225 395657 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 after 3 attempts. Proxy client: no proxy client	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh Traceback (most recent call last):	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh ssh.connect(self.host, port=self.port, username=self.username,	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/.local/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh sock.connect(addr)	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh TimeoutError: timed out	00:49
liuc49_	2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh	00:49
liuc49_	How can I resolve this issue?	00:54
liuc49_	It occurs while running tempest test cases.	01:08
Clark[m]	liuc49_ we run the zuul instance which executes tempest jobs but don't do much directly with tempest. The openstack qa team is probably a better resource for that. Their irc channel is #openstack-qa	02:04
frickler	fungi: interesting, I wasn't aware of the .test domain. I need to test whether the restrictions regarding caching resolvers apply to systemd	05:14
frickler	does devstack.org belong to the foundation? can't really tell from the outside	05:15
*** elodilles_pto is now known as elodilles		06:41
opendevreview	Bartosz Bezak proposed openstack/diskimage-builder master: Add NetworkManager-config-server to rocky-container https://review.opendev.org/c/openstack/diskimage-builder/+/892893	09:53
*** dhill is now known as Guest909		10:42
*** gthiemon1e is now known as gthiemonge		12:20
fungi	frickler: yes, openinfra foundation controls the domain registration and the domain is presently hosted in rackspace but we could move it to opendev's nameservers or wherever really	12:21
fungi	it was the domain for a vanity site dtroyer (i think) put together for devstack in the very early days, but years ago we took that down and just turned it into a redirect to the docs	12:22
frickler	fungi: well we would not need anything happen with it, just make sure that the testing use in devstack/tempest doesn't collide with any real use	12:22
frickler	fungi: I'll discuss this with neutron and qa people and see how they want to proceed	12:23
fungi	sounds good, let me know what consensus you reach	12:26
fungi	looks like we're running at around 500 concurrent builds at the moment, all our quota is in use since around 12:55z and there's a modest node request backlog (though it seems to be catching back up quickly)	13:33
frickler	hmm, I was thinking our peak capacity would be larger, but maybe that's just memories of the past	13:58
fungi	we lost several hundred nodes of capacity when iweb pulled out	14:00
fungi	but also we average more than one node per build, and there's node building/deleting overhead to take into account as well (plus ready nodes for less-used platforms)	14:01
fungi	the node request backlog only lasted an hour, caught up by 13:55z	14:25
fungi	looks like it coincided with a large batch of openstack project release jobs, which all go pretty quickly	14:26
fungi	however, that pushed our jph for the 13z sample to 1.5k	14:27
fungi	build concurrency topped out at 503	14:28
fungi	so the build durations were averaging <20min	14:28
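The three figures above (build concurrency, jph, average duration) are tied together by Little's law: average concurrency equals throughput times average duration. A minimal sketch of that arithmetic, assuming "jph" means jobs per hour and taking the 503 peak as an upper bound on average concurrency; the function name is hypothetical:

```python
def average_build_minutes(concurrent_builds: float, jobs_per_hour: float) -> float:
    """Little's law: L = lambda * W, so W = L / lambda (converted to minutes)."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return concurrent_builds / jobs_per_hour * 60.0


# 503 was the peak concurrency, so this is an upper bound on the average duration.
upper_bound = average_build_minutes(503, 1500)
print(f"average build duration is at most about {upper_bound:.1f} minutes")
```

Because 503 was the peak rather than the hourly average, the real average duration sits somewhat below this bound, in line with the "<20min" estimate.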
*** JasonF is now known as JaqyF		14:54
*** JaqyF is now known as JayF		14:54
fungi	infra-root: came up earlier in #openstack-infra but frickler looked into the stale centos mirroring and it looks like we're at quota on it. recommendation is we increase that (and also the centos-stream volume which is nearly maxed as well) by 50gb each. any objections?	15:07
clarkb	no objections. I think last I looked centos isn't pruning old packages and is just adding to them. Some are large and we may need to add more exclusions though that gets messy when trying to exclude versions and not suites	15:16
fungi	maybe we'll be able to drop centos 7 soon and free up space	15:18
fungi	but also fedora is using 400gb right now which will hopefully go away rsn	15:19
clarkb	++ https://review.opendev.org/c/opendev/base-jobs/+/892380 is the next step for fedora I think	15:19
fungi	frickler: were you wanting to do the centos/centos-stream mirror volume quota increases, or shall i?	15:25
opendevreview	Clark Boylan proposed openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892976	15:31
clarkb	infra-root quick review of ^ is appreciated since we've already discussed that previously	15:31
opendevreview	Clark Boylan proposed opendev/system-config master: Exclude i686 rpms in our centos mirrors https://review.opendev.org/c/opendev/system-config/+/892978	15:47
fungi	lgtm, thanks!	15:49
fungi	backup02.ca-ymq-1.vexxhost is filling up, i'll start pruning there	15:52
clarkb	thanks	15:52
clarkb	fyi I just pushed remote: https://review.opendev.org/c/openstack/tempest/+/892981 DNM test devstack+tempest under Ansible 8 [NEW]	15:57
fungi	thanks!	15:57
fungi	that's probably some of the most complicated job playbooks we've got, so should serve as a good litmus test	15:58
clarkb	yup	15:58
frickler	fungi: if you want to bump the quota, feel free to go ahead	16:03
fungi	frickler: now it looks like we may not need to, with 892978	16:04
fungi	clarkb's eyes were better than mine at spotting (rather a lot of) something we didn't need to be mirroring	16:05
clarkb	fwiw I don't think it will be half because some packages don't get i686 packages and there are a lot of noarch packages too	16:06
clarkb	but it should be a good chunk	16:06
frickler	I think rsync will delete only after syncing, so will probably need the added headroom once anyway, but we can give it a try	16:08
clarkb	yes I agree. I think we should do both and then can adjust the quota back later	16:10
fungi	okay, i can take care of the increases in a moment	16:13
fungi	also rsync >=3 does --delete-during as its default. there is also a --delete-before which would make it somewhat less likely to run out of room while syncing	16:15
fungi	the only time it waits until the end to perform deletions is if you specify --delete-after	16:16
fungi	--delete-before was the default in rsync <3	16:16
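The three deletion timings mentioned above correspond to the rsync options `--delete-before`, `--delete-during` and `--delete-after`. A rough sketch of how a sync command could be assembled with an explicit choice; the helper name, source URL and destination path are hypothetical and are not taken from the real mirror update script:

```python
import shlex

# Map a readable mode name to the rsync option that selects it.
DELETE_MODES = {
    "before": "--delete-before",  # delete on the receiver before transferring
    "during": "--delete-during",  # delete incrementally as the transfer runs
    "after": "--delete-after",    # delete only once the transfer has finished
}


def build_rsync_command(source: str, dest: str, delete_mode: str = "during") -> list[str]:
    """Return an rsync argument list using the requested deletion timing."""
    if delete_mode not in DELETE_MODES:
        raise ValueError(f"unknown delete mode: {delete_mode!r}")
    return ["rsync", "-a", DELETE_MODES[delete_mode], source, dest]


# Hypothetical endpoints, for illustration only.
cmd = build_rsync_command(
    "rsync://mirror.example.org/centos/", "/srv/mirror/centos/", "before"
)
print(shlex.join(cmd))
```

Deleting before the transfer keeps peak disk usage lower, which is why it is less likely to hit a quota mid-sync.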
fungi	#status log Increased quotas of the AFS mirror volumes for centos from 400GB to 450GB and centos-stream from 250GB to 300GB	16:27
opendevstatus	fungi: finished logging	16:28
fungi	i'll go ahead and manually initiate rsync for the stale centos volume to get things updated	16:28
fungi	in progress	16:30
fungi	still getting "File too large (27)" errors	16:33
fungi	maybe i needed to update the quotas for the read-only volumes too	16:33
clarkb	I thought quotas were only on the rw side and ro got them synced over	16:35
clarkb	is the problem a literal file that is too large?	16:35
clarkb	looks like the file size limit is either 2GB or many many terabytes	16:37
fungi	combined size of the filenames in a directory	16:38
fungi	https://lists.openafs.org/pipermail/openafs-info/2016-July/041859.html	16:39
fungi	so hopefully your change will fix this	16:39
clarkb	fungi: how did we go from file too large (27) to running out of directory entries? I agree my change should help if that is the problem	16:40
fungi	the volume was running 5gb below quota so the initial assumption was that the error message was related to running out of available quota	16:41
fungi	but apparently in the past that error message showed up for our tumbleweed mirror a few years ago: https://lists.openstack.org/pipermail/openstack-infra/2018-June/005972.html	16:43
fungi	it looks like it was resolved for us when suse removed some packages from their mirror sites	16:44
fungi	also it looks like the afs servers are still reporting the old quotas for the read-only volumes, but won't let me set them, so i think quota changes require a vos release to take effect (which will happen once we get the mirror syncing successfully)	16:46
fungi	i can try patching 892978 in by hand temporarily and running again with that	16:46
fungi	the change needs a recheck, archive.ubuntu.com is apparently in an incoherent state at the moment (leading to the check job failure on it)	16:47
clarkb	yes quota changes are vos released from rw to ro	16:47
fungi	clarkb: see apevec's comment on that change, but i still suspect it's our best way out if the problem is the number of files in directories	16:49
clarkb	ya I just looked at https://nb01.opendev.org/centos-8-stream-a814f58c1b3c4fa79f8eba7f991eb5d1.log and https://nb01.opendev.org/centos-9-stream-684af0330ddf4e7d893df6f720588b65.log and neither file has 'i686' in it	16:50
clarkb	the yum output does show x86_64 as arch for packages though so I think i686 would show up if we were installing those packages	16:50
fungi	any concerns with me running the version of centos-mirror-update with the --exclude="*.i686.rpm" lines added?	16:51
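To see which files a pattern like `*.i686.rpm` would drop, here is a small sketch using Python's `fnmatch` as an approximation of rsync's exclude matching (rsync's own pattern rules differ in some details, such as how slashes are handled). The package file names are invented for illustration:

```python
from fnmatch import fnmatch

EXCLUDE_PATTERN = "*.i686.rpm"

# Invented file names; real mirror contents will differ.
sample_files = [
    "glibc-2.28-225.el8.i686.rpm",
    "glibc-2.28-225.el8.x86_64.rpm",
    "tzdata-2023c-1.el8.noarch.rpm",
]


def kept_after_exclude(names, pattern=EXCLUDE_PATTERN):
    """Return the names that do NOT match the exclude pattern."""
    return [name for name in names if not fnmatch(name, pattern)]


print(kept_after_exclude(sample_files))
```

Only the i686 build is filtered out; x86_64 and noarch packages are untouched, which matches the expectation that the saving is a good chunk but less than half.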
apevec	yeah no i686 should install by default on 8/9 - maybe on 7 but we don't have any c7 nodes do we?	16:51
fungi	we do still run jobs on centos-7 yes	16:52
clarkb	apevec: we do have c7 and the change does remove i686 packages from c7 too	16:52
clarkb	fungi: it's broken as is so I don't think running it that way will make it any worse	16:52
apevec	uhm what still runs on c7	16:52
clarkb	really old stuff	16:52
clarkb	some openshift things because openshift 3 on centos 7 is actually installable like a normal application	16:52
clarkb	the vast majority of things are probably bitrotted and really old though	16:53
apevec	openshift3 is EOL https://access.redhat.com/support/policy/updates/openshift_noncurrent	16:55
clarkb	sure but you cannot test with openshift 4 in a reasonable manner so this is a halfway measure	16:55
clarkb	it's not great, but it's better than nothing I guess?	16:55
apevec	dunno, I'd drop it - for openshift4 we actually have reasonable microshift ansible role if anyone wants to test not EOL openshift	16:56
clarkb	apevec: is that new? last I saw sean was working on something similar but hadn't heard it was functional yet. Also openshift apparently refuses to start unless you give it massive amounts of memory	16:57
fungi	i thought ianw had tried to get microshift working for ci jobs without success	16:57
clarkb	fungi: yes among other things. Memory being the problem iirc	16:57
fungi	oh right, it needed like 32gb of ram	16:58
fungi	for zuul i think we ended up deciding that making sure the kubernetes driver works was sufficient since we couldn't actually test openshift any more?	16:58
clarkb	yes, zuul appears to have dropped the job	16:59
apevec	https://github.com/openstack-k8s-operators/ansible-microshift-role/ we use it for sf-operator development	16:59
apevec	dpawlik tristanC what are min resource reqs?	16:59
clarkb	oh wait maybe its in nodepool	16:59
clarkb	ya the job still runs in nodepool	17:00
apevec	ah could be for openshift nodepool driver?	17:00
clarkb	apevec: yes	17:00
fungi	right, and that's where we're still testing with openshift 3 on centos 7?	17:00
clarkb	fungi: yes	17:00
apevec	tristanC: ^ is this worth keeping or could it be removed?	17:00
clarkb	because openshift 4 isn't really an application or set of applications anymore. It's like a full rack appliance instead	17:00
apevec	c7 goes EOL soon next year...	17:01
clarkb	that you install from the OS up using a full system orchestrator	17:01
fungi	right, you don't install openshift on an operating system any more, openshift is the operating system, as best i could tell	17:01
apevec	yeah, normal openshift installs coreos nodes using ignition, but microshift is really a reduced footprint	17:01
tristanC[m]	apevec: the issue with microshift is that it needs a pull secret. The min resource reqs are rather low, 3GB of ram, 2VCPU	17:01
clarkb	tristanC[m]: what sort of secret?	17:02
fungi	what's a "pull secret?"	17:02
fungi	like authenticated access to the packages?	17:02
clarkb	oh I see a secret to access the container images looks like	17:02
apevec	to pull container images from Red Hat CDN ... sigh	17:02
apevec	there's OKD which might not need that, but that's like openshift nightly i.e. openshift "Stream"	17:03
fungi	anyway, the centos mirror volume is vos releasing now, without i686 packages included	17:03
clarkb	I guess openshift releases aren't packaged in an opensource way then?	17:03
clarkb	we'd have to build from source?	17:03
fungi	maybe the source code is available to build images of it yourself	17:03
apevec	well, src is all in github ...	17:03
tristanC[m]	clarkb: fungi I think the pull secret is just an authentication token for the registry	17:03
clarkb	ya I mean I'm not going to worry about any of that myself. If it came down to me I'd remove the testing from nodepool and push people towards the k8s driver	17:04
clarkb	it should work against openshift too so wouldn't impact users	17:04
fungi	agreed. users of openshift would want to know it works with the official openshift images anyway, and since those are proprietary it would be better for the vendor to assess that	17:05
clarkb	in any case we haven't removed centos 7 because we have said a few times that we'll wait for distro EOL if we can manage it and people are using the images. It does seem like some people are using it and it's fairly static at this point so hasn't created many problems	17:08
clarkb	once it does eol we'll clean it all up	17:08
corvus	the nodepool functional test for openshift does exercise some openshift-specific functionality like projects, so it's useful to have. but also, nodepool interacts with a lot of proprietary systems that are impractical to functional test in the opendev environment, so if openshift becomes impractical to functional-test then we'll just make sure we have those bits faked out and rely on unit tests.	17:11
fungi	makes sense, yep	17:15
clarkb	the zookeeper statsd container appears to have updated as expected during the daily runs	17:17
clarkb	and we still have data in grafana so that all looks good to me	17:18
clarkb	fungi: I'm going to go ahead and +A https://review.opendev.org/c/openstack/project-config/+/892976 in order to stay on schedule with ansible 8 and ensure we get as much data as quickly as possible	17:21
clarkb	the tempest change I pushed is looking good so far too	17:21
opendevreview	Merged openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892976	17:35
fungi	clarkb: sounds good	17:37
clarkb	fungi: looks like the recheck you did failed again trying to install pci.ids from ubuntu's main us mirror?	17:54
clarkb	the tempest change I pushed to shake out ansible 8 issues passed every single job it ran except for two non voting jobs. Spot checking those two failures they both appear to fail running tests and not in the ansible. That is a very good sign	17:58
fungi	clarkb: yeah, i figure we should give archive.ubuntu.com some more time	18:03
fungi	seems it's having issues	18:03
clarkb	fungi: and for clarity you only ran the mirror script for centos not centos-stream?	18:09
fungi	correct	18:11
fungi	the other one is still successfully syncing on its own	18:11
fungi	#status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org as the volume had reached 90% utilization	18:48
opendevstatus	fungi: finished logging	18:48
ildikov	Hi	19:04
ildikov	I have a quick question. The patch that adds the new StarlingX Matrix rooms to the logbot got merged: https://review.opendev.org/c/opendev/system-config/+/892387	19:04
ildikov	If I understand it correctly, the logs are supposed to show up on the eavesdrop page. Is that accurate?	19:05
fungi	ildikov: yes, you can find them under https://meetings.opendev.org/irclogs/	19:08
fungi	(the meetings site is where channel logs are also published)	19:08
fungi	for example, https://meetings.opendev.org/irclogs/%23starlingx-general/latest.log.html	19:08
ildikov	fungi: never mind, I'm just blind, lol	19:09
fungi	nah, it's not super obvious	19:09
ildikov	fungi: for some reason I thought it'll all be under the #starlingx folder, since we organized the Matrix rooms into a Starlingx space	19:10
fungi	the logbot doesn't have a concept of spaces, but also spaces aren't channel namespaces in matrix anyway (which is why we prefixed all the channel names with starlingx-)	19:10
ildikov	so I opened that folder in the tree and then my brain just stayed in that rabbit hole	19:11
ildikov	gotta love Mondays, I think I just need my 2nd coffee now! :D	19:11
fungi	keep in mind that the element web client can also be used to anonymously look at public channel logs/history in a richer format	19:11
fungi	the main benefit to our channel logging is that it can be indexed by web search engines	19:12
ildikov	yeah, the logbot part makes sense, the namespace part is a good reminder	19:12
ildikov	that's great to know, thank you!	19:12
fungi	my pleasure, as always	19:13
fungi	clarkb: should we merge https://review.opendev.org/892817 once system-config-run-base is working again? basically get the last of the mm3 patches in before we start scheduling more migrations	19:41
Clark[m]	fungi: yup we should. I'm headed home from lunch now. Not sure if the base job is working again	19:55
fungi	i'll recheck your mirroring change again and find out	19:59
opendevreview	Merged openstack/diskimage-builder master: Install netplan.io for Debian Bookworm https://review.opendev.org/c/openstack/diskimage-builder/+/891323	20:17
fungi	yeah, looks like it's working again	20:38
fungi	once 892978 merges i'll release my lock on the volume	20:39
clarkb	++	20:42
opendevreview	Merged opendev/system-config master: Exclude i686 rpms in our centos mirrors https://review.opendev.org/c/opendev/system-config/+/892978	21:20
fungi	it deployed so dropping my lock now	21:29
fungi	i've self-approved 892817 now	21:49
Clark[m]	I'm at school pickup for a bit but don't expect trouble	21:57
fungi	nah, should be a no-op	22:01
opendevreview	Merged opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817	22:15
fungi	it deployed	22:31
fungi	everything still looks fine	22:37
fungi	containers restarted at 22:21	22:37
clarkb	did I end up deleting the old ci registry node? I recall we had to land the change for that which I think did happen but maybe I didn't delete the server yet?	22:52
clarkb	I'm going to drop the gerrit updates meeting topic. I don't think we need to keep bringing up replication leak files	22:53
clarkb	with that I've just pushed an updated agenda	22:53
clarkb	anything else to add to it before I send it out? that will also serve to confirm the list server is happy	22:55
fungi	nothing comes to mind	23:15
clarkb	looks like it went through	23:30
