Monday, 2023-08-28

liuc49_Has anyone run into this problem: SSH login to the server fails after booting an instance, and the status of the floating IP is DOWN. The error log is as follows:00:48
liuc49_Body: b'{"floatingip": {"id": "5c4e68c1-aad9-498d-8fcf-91bb2031c27e", "tenant_id": "9fc039ef8cb942de9e6a57f484dd4942", "floating_ip_address": "172.24.4.159", "floating_network_id": "3067c886-e311-402f-b4b9-3b24c7456fa5", "router_id": "4ea198ac-facc-4201-b80c-ccd5a5828a43", "port_id": "8b8b5840-a5d3-427c-9278-ead7f54a544f", "fixed_ip_address": "10.0.0.11", "status": "DOWN", "description": "", "port_details":00:48
liuc49_{"name": "", "network_id": "e8b6d264-7e08-4da9-bd67-1f1062634b4d", "mac_address": "fa:16:3e:e4:17:54", "admin_state_up": true, "status": "ACTIVE", "device_id": "d9af96ca-56cc-49fd-9bc9-3d99a2e542bf", "device_owner": "compute:nova"}00:48
liuc49_, "dns_domain": "", "dns_name": "", "tags": [], "created_at": "2023-08-18T04:09:27Z", "updated_at": "2023-08-18T04:09:38Z", "revision_number": 1, "project_id": "9fc039ef8cb942de9e6a57f484dd4942"}}'00:48
liuc49_2023-08-18 04:09:38,489 395657 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.4.159:22' as 'cirros' with public key authentication00:48
liuc49_2023-08-18 04:10:38,546 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 1. Retry after 2 seconds.00:49
liuc49_2023-08-18 04:11:41,109 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 2. Retry after 3 seconds.00:49
liuc49_2023-08-18 04:12:44,663 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 3. Retry after 4 seconds.00:49
liuc49_2023-08-18 04:13:49,225 395657 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 after 3 attempts. Proxy client: no proxy client00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh Traceback (most recent call last):00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh ssh.connect(self.host, port=self.port, username=self.username,00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/.local/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh sock.connect(addr)00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh TimeoutError: timed out00:49
liuc49_2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh00:49
liuc49_How can I resolve the issue?00:54
liuc49_It occurs while running tempest cases.01:08
Clark[m]liuc49_ we run the zuul instance which executes tempest jobs but don't do much directly with tempest. The openstack qa team is probably a better resource for that. Their irc channel is #openstack-qa02:04
fricklerfungi: interesting, I wasn't aware of the .test domain. I need to test whether the restrictions regarding caching resolvers apply to systemd05:14
fricklerdoes devstack.org belong to the foundation? can't really tell from the outside05:15
*** elodilles_pto is now known as elodilles06:41
opendevreviewBartosz Bezak proposed openstack/diskimage-builder master: Add NetworkManager-config-server to rocky-container  https://review.opendev.org/c/openstack/diskimage-builder/+/89289309:53
*** dhill is now known as Guest90910:42
*** gthiemon1e is now known as gthiemonge12:20
fungifrickler: yes, openinfra foundation controls the domain registration and the domain is presently hosted in rackspace but we could move it to opendev's nameservers or wherever really12:21
fungiit was the domain for a vanity site dtroyer (i think) put together for devstack in the very early days, but years ago we took that down and just turned it into a redirect to the docs12:22
fricklerfungi: well we would not need anything happen with it, just make sure that the testing use in devstack/tempest doesn't collide with any real use12:22
fricklerfungi: I'll discuss this with neutron and qa people and see how they want to proceed12:23
fungisounds good, let me know what consensus you reach12:26
fungilooks like we're running at around 500 concurrent builds at the moment, all our quota is in use since around 12:55z and there's a modest node request backlog (though it seems to be catching back up quickly)13:33
fricklerhmm, I was thinking our peak capacity would be larger, but maybe that's just memories of the past13:58
fungiwe lost several hundred nodes of capacity when iweb pulled out14:00
fungibut also we average more than one node per build, and there's node building/deleting overhead to take into account as well (plus ready nodes for less-used platforms)14:01
fungithe node request backlog only lasted an hour, caught up by 13:55z14:25
fungilooks like it coincided with a large batch of openstack project release jobs, which all go pretty quickly14:26
fungihowever, that pushed our jph for the 13z sample to 1.5k14:27
fungibuild concurrency topped out at 50314:28
fungiso the build durations were averaging <20min14:28
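The figures quoted above (build concurrency of about 503 and roughly 1.5k jobs per hour) are related through Little's law: average concurrency equals throughput times average duration. A minimal Python sketch, assuming steady state and treating "1.5k" as exactly 1500; the function name is illustrative and not part of any OpenDev tooling:

```python
def average_build_minutes(concurrent_builds: float, jobs_per_hour: float) -> float:
    """Little's law: L = lambda * W, so W = L / lambda (returned in minutes)."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    hours_per_build = concurrent_builds / jobs_per_hour
    return hours_per_build * 60.0


# Rounded inputs taken from the discussion; real samples will differ.
estimate = average_build_minutes(503, 1500)
print(f"{estimate:.1f} min")
```

With these rounded inputs the estimate lands at about 20 minutes, in line with the ballpark mentioned in the channel; the exact value depends on how the hourly sample was rounded.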
*** JasonF is now known as JaqyF14:54
*** JaqyF is now known as JayF14:54
fungiinfra-root: came up earlier in #openstack-infra but frickler looked into the stale centos mirroring and it looks like we're at quota on it. recommendation is we increase that (and also the centos-stream volume which is nearly maxed as well) by 50gb each. any objections?15:07
clarkbno objections. I think last I looked centos isn't pruning old packages and is just adding to them. Some are large and we may need to add more exclusions though that gets messy when trying to exclude versions and not suites15:16
fungimaybe we'll be able to drop centos 7 soon and free up space15:18
fungibut also fedora is using 400gb right now which will hopefully go away rsn15:19
clarkb++ https://review.opendev.org/c/opendev/base-jobs/+/892380 is the next step for fedora I think15:19
fungifrickler: were you wanting to do the centos/centos-stream mirror volume quota increases, or shall i?15:25
opendevreviewClark Boylan proposed openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8  https://review.opendev.org/c/openstack/project-config/+/89297615:31
clarkbinfra-root quick review of ^ is appreciated since we've already discussed that previously15:31
opendevreviewClark Boylan proposed opendev/system-config master: Exclude i686 rpms in our centos mirrors  https://review.opendev.org/c/opendev/system-config/+/89297815:47
fungilgtm, thanks!15:49
fungibackup02.ca-ymq-1.vexxhost is filling up, i'll start pruning there15:52
clarkbthanks15:52
clarkbfyi I just pushed remote:   https://review.opendev.org/c/openstack/tempest/+/892981 DNM test devstack+tempest under Ansible 8 [NEW]15:57
fungithanks!15:57
fungithat's probably some of the most complicated job playbooks we've got, so should serve as a good litmus test15:58
clarkbyup15:58
fricklerfungi: if you want to bump the quota, feel free to go ahead16:03
fungifrickler: now it looks like we may not need to, with 89297816:04
fungiclarkb's eyes were better than mine at spotting (rather a lot of) something we didn't need to be mirroring16:05
clarkbfwiw I don't think it will be half because some packages don't get i686 packages and there are a lot of no arch packages too16:06
clarkbbut it should be a good chunk16:06
fricklerI think rsync will delete only after syncing, so will probably need the added headroom once anyway, but we can give it a try16:08
clarkbyes I agree. I think we should do both and then can adjust the quota back later16:10
fungiokay, i can take care of the increases in a moment16:13
fungialso rsync >=3 does --delete-during as its default. there is also a --delete-before which would make it somewhat less likely to run out of room while syncing16:15
fungithe only time it waits until the end to perform deletions is if you specify --delete-after16:16
fungi--delete-before was the default in rsync <316:16
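The three deletion timings mentioned above can be made explicit rather than relying on the version-dependent default. A minimal Python sketch that only assembles an rsync argument list; the source URL, destination path, and helper name are hypothetical, and this is not the actual centos-mirror-update script:

```python
import shlex

# rsync options controlling when the receiver deletes extraneous files.
# Per the rsync man page, each of these implies --delete.
DELETE_MODES = {
    "before": "--delete-before",  # delete first, lowering peak disk usage
    "during": "--delete-during",  # delete incrementally while transferring
    "after": "--delete-after",    # delete only once the transfer is done
}


def build_rsync_command(source, dest, delete_mode="during", excludes=()):
    """Return an rsync argv list with an explicit deletion timing."""
    if delete_mode not in DELETE_MODES:
        raise ValueError(f"unknown delete mode: {delete_mode!r}")
    cmd = ["rsync", "-a", DELETE_MODES[delete_mode]]
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += [source, dest]
    return cmd


# Hypothetical endpoints; the exclude pattern is the one discussed in the channel.
cmd = build_rsync_command(
    "rsync://mirror.example.org/centos/",
    "/srv/mirror/centos/",
    delete_mode="before",
    excludes=["*.i686.rpm"],
)
print(shlex.join(cmd))
```

Choosing "before" trades a longer window of missing files for a lower chance of running out of quota mid-sync, which is the concern raised in the discussion.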
fungi#status log Increased quotas of the AFS mirror volumes for centos from 400GB to 450GB and centos-stream from 250GB to 300GB16:27
opendevstatusfungi: finished logging16:28
fungii'll go ahead and manually initiate rsync for the stale centos volume to get things updated16:28
fungiin progress16:30
fungistill getting "File too large (27)" errors16:33
fungimaybe i needed to update the quotas for the read-only volumes too16:33
clarkbI thought quotas were only on the rw side and ro got them synced over16:35
clarkbis the problem a literal file that is too large?16:35
clarkblooks like the file size limit is either 2GB or many many terabytes16:37
fungicombined size of the filenames in a directory16:38
fungihttps://lists.openafs.org/pipermail/openafs-info/2016-July/041859.html16:39
fungiso hopefully your change will fix this16:39
clarkbfungi: how did we go from file too large (27) to running out of directory entries? I agree my change should help if that is the problem16:40
fungithe volume was running 5gb below quota so the initial assumption was that the error message was related to running out of available quota16:41
fungibut apparently in the past that error message showed up for our tumbleweed mirror a few years ago: https://lists.openstack.org/pipermail/openstack-infra/2018-June/005972.html16:43
fungiit looks like it was resolved for us when suse removed some packages from their mirror sites16:44
fungialso it looks like the afs servers are still reporting the old quotas for the read-only volumes, but won't let me set them, so i think quota changes require a vos release to take effect (which will happen once we get the mirror syncing successfully)16:46
fungii can try patching 892978 in by hand temporarily and running again with that16:46
fungithe change needs a recheck, archive.ubuntu.com is apparently in an incoherent state at the moment (leading to the check job failure on it)16:47
clarkbyes quota changes are vos released from rw to ro16:47
fungiclarkb: see apevec's comment on that change, but i still suspect it's our best way out if the problem is the number of files in directories16:49
clarkbya I just looked at https://nb01.opendev.org/centos-8-stream-a814f58c1b3c4fa79f8eba7f991eb5d1.log and https://nb01.opendev.org/centos-9-stream-684af0330ddf4e7d893df6f720588b65.log and neither file has 'i686' in it16:50
clarkbthe yum output does show x86_64 as arch for packages though so I think i686 would show up if we were installing those packages16:50
fungiany concerns with me running the version of centos-mirror-update with the --exclude="*.i686.rpm" lines added?16:51
apevecyeah no i686 should install by default on 8/9 - maybe on 7 but we don't have any c7 nodes do we?16:51
fungiwe do still run jobs on centos-7 yes16:52
clarkbapevec: we do have c7 and the change does remove i686 packages from c7 too16:52
clarkbfungi: it's broken as is so I don't think running it that way will make it any worse16:52
apevecuhm what still runs on c716:52
clarkbreally old stuff16:52
clarkbsome openshift things because openshift 3 on centos 7 is actually installable like a normal application16:52
clarkbthe vast majority of things are probably bitrotted and really old though16:53
apevecopenshift3 is EOL https://access.redhat.com/support/policy/updates/openshift_noncurrent16:55
clarkbsure but you cannot test with openshift 4 in a reasonable manner so this is a halfway measure16:55
clarkbit's not great, but it's better than nothing I guess?16:55
apevecdunno, I'd drop it - for openshift4 we actually have reasonable microshift ansible role if anyone wants to test not EOL openshift 16:56
clarkbapevec: is that new? last I saw sean was working on something similar but hadn't heard it was functional yet. Also openshift apparently refuses to start unless you give it massive amounts of memory16:57
fungii thought ianw had tried to get microshift working for ci jobs without success16:57
clarkbfungi: yes among other things. Memory being the problem iirc16:57
fungioh right, it needed like 32gb of ram16:58
fungifor zuul i think we ended up deciding that making sure the kubernetes driver works was sufficient since we couldn't actually test openshift any more?16:58
clarkbyes, zuul appears to have dropped the job16:59
apevechttps://github.com/openstack-k8s-operators/ansible-microshift-role/ we use it for sf-operator development16:59
apevecdpawlik tristanC  what are min resource reqs?16:59
clarkboh wait maybe its in nodepool16:59
clarkbya the job still runs in nodepool17:00
apevecah could be for openshift nodepool driver?17:00
clarkbapevec: yes17:00
fungiright, and that's where we're still testing with openshift 3 on centos 7?17:00
clarkbfungi: yes17:00
apevectristanC: ^ is this worth to keep or could be removed?17:00
clarkbbecause openshift 4 isn't really an application or set of applications anymore. Its like a full rack appliance instead17:00
apevecc7 goes EOL soon next year...17:01
clarkbthat you install from the OS up using a full system orchestrator17:01
fungiright, you don't install openshift on an operating system any more, openshift is the operating system, as best i could tell17:01
apevecyeah, normal openshift installs coreos nodes using ignition, but microshift is really a reduced footprint17:01
tristanC[m]apevec: the issue with microshift is that it needs a pull secret. The min resource reqs are rather low, 3GB of ram, 2VCPU17:01
clarkbtristanC[m]: what sort of secret?17:02
fungiwhat's a "pull secret?"17:02
fungilike authenticated access to the packages?17:02
clarkboh I see a secret to access the container images looks like17:02
apevecto pull container images from Red Hat CDN ... sigh17:02
apevecthere's OKD which might not need that, but that's like openshift nightly i.e. openshift "Stream"17:03
fungianyway, the centos mirror volume is vos releasing now, without i686 packages included17:03
clarkbI guess openshift releases aren't packaged in an opensource way then?17:03
clarkbwe'd have to build from source?17:03
fungimaybe the source code is available to build images of it yourself17:03
apevecwell, src is all in github ...17:03
tristanC[m]clarkb: fungi I think the pull secret is just an authentication token for the registry17:03
clarkbya I mean I'm not going to worry about any of that myself. If it came down to me I'd remove the testing from nodepool and push people towards the k8s driver17:04
clarkbit should work against openshift too so wouldn't impact users17:04
fungiagreed. users of openshift would want to know it works with the official openshift images anyway, and since those are proprietary it would be better for the vendor to assess that17:05
clarkbin any case we haven't removed centos 7 because we have said a few times that we'll wait for distro EOL if we can manage it and people are using the images. It does seem like some people are using it and its fairly static at this point so hasn't created many problems17:08
clarkbonce it does eol we'll clean it all up17:08
corvusthe nodepool functional test for openshift does exercise some openshift-specific functionality like projects, so it's useful to have.  but also, nodepool interacts with a lot of proprietary systems that are impractical to functional test in the opendev environment, so if openshift becomes impractical to functional-test then we'll just make sure we have those bits faked out and rely on unit tests.17:11
fungimakes sense, yep17:15
clarkbthe zookeeper statsd container appears to have updated as expected during the daily runs17:17
clarkband we still have data in grafana so that all looks good to me17:18
clarkbfungi: I'm going to go ahead and +A https://review.opendev.org/c/openstack/project-config/+/892976 in order to stay on schedule with ansible 8 and ensure we get as much data as quickly as possible17:21
clarkbthe tempest change I pushed is looking good so far too17:21
opendevreviewMerged openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8  https://review.opendev.org/c/openstack/project-config/+/89297617:35
fungiclarkb: sounds good17:37
clarkbfungi: looks like the recheck you did failed again trying to install pci.ids from ubuntu's main us mirror?17:54
clarkbthe tempest change I pushed to shake out ansible 8 issues passed every single job it ran except for two non voting jobs. Spot checking those two failures they both appear to fail running tests and not in the ansible. That is a very good sign17:58
fungiclarkb: yeah, i figure we should give archive.ubuntu.com some more time18:03
fungiseems it's having issues18:03
clarkbfungi: and for clarity you only ran the mirror script for centos not centos-stream?18:09
fungicorrect18:11
fungithe other one is still successfully syncing on its own18:11
fungi#status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org as the volume had reached 90% utilization18:48
opendevstatusfungi: finished logging18:48
ildikovHi19:04
ildikovI have a quick question. The patch that adds the new StarlingX Matrix rooms to the logbot got merged: https://review.opendev.org/c/opendev/system-config/+/89238719:04
ildikovIf I understand it correctly, the logs are supposed to show up on the eavesdrop page. Is that accurate?19:05
fungiildikov: yes, you can find them under https://meetings.opendev.org/irclogs/19:08
fungi(the meetings site is where channel logs are also published)19:08
fungifor example, https://meetings.opendev.org/irclogs/%23starlingx-general/latest.log.html19:08
ildikovfungi: never mind, I'm just blind, lol19:09
funginah, it's not super obvious19:09
ildikovfungi: for some reason I thought it'll all be under the #starlingx folder, since we organized the Matrix rooms into a Starlingx space19:10
fungithe logbot doesn't have a concept of spaces, but also spaces aren't channel namespaces in matrix anyway (which is why we prefixed all the channel names with starlingx-)19:10
ildikovso I opened that folder in the tree and then my brain just stayed in that rabbit hole19:11
ildikovgotta love Mondays, I think I just need my 2nd coffee now! :D19:11
fungikeep in mind that the element web client can also be used to anonymously look at public channel logs/history in a richer format19:11
fungithe main benefit to our channel logging is that it can be indexed by web search engines19:12
ildikovyeah, the logbot part makes sense, the namespace part is a good reminder19:12
ildikovthat's great to know, thank you!19:12
fungimy pleasure, as always19:13
fungiclarkb: should we merge https://review.opendev.org/892817 once system-config-run-base is working again? basically get the last of the mm3 patches in before we start scheduling more migrations19:41
Clark[m]fungi: yup we should. I'm headed home from lunch now. Not sure if the base job is working again19:55
fungii'll recheck your mirroring change again and find out19:59
opendevreviewMerged openstack/diskimage-builder master: Install netplan.io for Debian Bookworm  https://review.opendev.org/c/openstack/diskimage-builder/+/89132320:17
fungiyeah, looks like it's working again20:38
fungionce 892978 merges i'll release my lock on the volume20:39
clarkb++20:42
opendevreviewMerged opendev/system-config master: Exclude i686 rpms in our centos mirrors  https://review.opendev.org/c/opendev/system-config/+/89297821:20
fungiit deployed so dropping my lock now21:29
fungii've self-approved 892817 now21:49
Clark[m]I'm at school pickup for a bit but don't expect trouble 21:57
funginah, should be a no-op22:01
opendevreviewMerged opendev/system-config master: mailman3: re-sync custom web/settings.py  https://review.opendev.org/c/opendev/system-config/+/89281722:15
fungiit deployed22:31
fungieverything still looks fine22:37
fungicontainers restarted at 22:2122:37
clarkbdid I end up deleting the old ci registry node? I recall we had to land the change for that which I think did happen but maybe I didn't delete the server yet?22:52
clarkbI'm going to drop the gerrit updates meeting topic. I don't think we need to keep bringing up replication leak files22:53
clarkbwith that I've just pushed an updated agenda22:53
clarkbanything else to add to it before I send it out? that will also serve to confirm the list server is happy22:55
funginothing comes to mind23:15
clarkblooks like it went through23:30
