liuc49_ | Has anyone hit this problem: the server cannot be reached over ssh after booting an instance, and the status of the floating IP is DOWN. The error log is as follows: | 00:48 |
liuc49_ | Body: b'{"floatingip": {"id": "5c4e68c1-aad9-498d-8fcf-91bb2031c27e", "tenant_id": "9fc039ef8cb942de9e6a57f484dd4942", "floating_ip_address": "172.24.4.159", "floating_network_id": "3067c886-e311-402f-b4b9-3b24c7456fa5", "router_id": "4ea198ac-facc-4201-b80c-ccd5a5828a43", "port_id": "8b8b5840-a5d3-427c-9278-ead7f54a544f", "fixed_ip_address": "10.0.0.11", "status": "DOWN", "description": "", "port_details": | 00:48 |
liuc49_ | {"name": "", "network_id": "e8b6d264-7e08-4da9-bd67-1f1062634b4d", "mac_address": "fa:16:3e:e4:17:54", "admin_state_up": true, "status": "ACTIVE", "device_id": "d9af96ca-56cc-49fd-9bc9-3d99a2e542bf", "device_owner": "compute:nova"} | 00:48 |
liuc49_ | , "dns_domain": "", "dns_name": "", "tags": [], "created_at": "2023-08-18T04:09:27Z", "updated_at": "2023-08-18T04:09:38Z", "revision_number": 1, "project_id": "9fc039ef8cb942de9e6a57f484dd4942"}}' | 00:48 |
liuc49_ | 2023-08-18 04:09:38,489 395657 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.4.159:22' as 'cirros' with public key authentication | 00:48 |
liuc49_ | 2023-08-18 04:10:38,546 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 1. Retry after 2 seconds. | 00:49 |
liuc49_ | 2023-08-18 04:11:41,109 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 2. Retry after 3 seconds. | 00:49 |
liuc49_ | 2023-08-18 04:12:44,663 395657 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 (timed out). Number attempts: 3. Retry after 4 seconds. | 00:49 |
liuc49_ | 2023-08-18 04:13:49,225 395657 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.4.159 after 3 attempts. Proxy client: no proxy client | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh Traceback (most recent call last): | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh ssh.connect(self.host, port=self.port, username=self.username, | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh File "/opt/stack/.local/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh sock.connect(addr) | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh TimeoutError: timed out | 00:49 |
liuc49_ | 2023-08-18 04:13:49.225 395657 ERROR tempest.lib.common.ssh | 00:49 |
liuc49_ | How can I resolve this issue? | 00:54 |
liuc49_ | It occurs while running tempest cases. | 01:08 |
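A minimal first-pass triage for this kind of ssh timeout might look like the following sketch, assuming a devstack-style environment; the IDs are copied from the error output above and the commands may need adjusting to your setup:

```bash
# Did the guest actually boot and bring up its fixed IP? Check the cirros console output.
openstack console log show d9af96ca-56cc-49fd-9bc9-3d99a2e542bf | tail -n 50

# Does the security group on the port allow ingress tcp/22 (and icmp for pings)?
openstack port show 8b8b5840-a5d3-427c-9278-ead7f54a544f -c security_group_ids

# Is the floating IP still associated and is the router healthy on the neutron side?
openstack floating ip show 5c4e68c1-aad9-498d-8fcf-91bb2031c27e
openstack router show 4ea198ac-facc-4201-b80c-ccd5a5828a43 -c status -c external_gateway_info

# Basic reachability from the devstack host before blaming ssh itself.
ping -c 3 172.24.4.159
```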
Clark[m] | liuc49_ we run the zuul instance which executes tempest jobs but don't do much directly with tempest. The openstack qa team is probably a better resource for that. Their irc channel is #openstack-qa | 02:04 |
frickler | fungi: interesting, I wasn't aware of the .test domain. I need to test whether the restrictions regarding caching resolvers apply to systemd | 05:14 |
frickler | does devstack.org belong to the foundation? can't really tell from the outside | 05:15 |
*** elodilles_pto is now known as elodilles | 06:41 | |
opendevreview | Bartosz Bezak proposed openstack/diskimage-builder master: Add NetworkManager-config-server to rocky-container https://review.opendev.org/c/openstack/diskimage-builder/+/892893 | 09:53 |
*** dhill is now known as Guest909 | 10:42 | |
*** gthiemon1e is now known as gthiemonge | 12:20 | |
fungi | frickler: yes, openinfra foundation controls the domain registration and the domain is presently hosted in rackspace but we could move it to opendev's nameservers or wherever really | 12:21 |
fungi | it was the domain for a vanity site dtroyer (i think) put together for devstack in the very early days, but years ago we took that down and just turned it into a redirect to the docs | 12:22 |
frickler | fungi: well we would not need anything to happen with it, just make sure that the testing use in devstack/tempest doesn't collide with any real use | 12:22 |
frickler | fungi: I'll discuss this with neutron and qa people and see how they want to proceed | 12:23 |
fungi | sounds good, let me know what consensus you reach | 12:26 |
fungi | looks like we're running at around 500 concurrent builds at the moment, all our quota is in use since around 12:55z and there's a modest node request backlog (though it seems to be catching back up quickly) | 13:33 |
frickler | hmm, I was thinking our peak capacity would be larger, but maybe that's just memories of the past | 13:58 |
fungi | we lost several hundred nodes of capacity when iweb pulled out | 14:00 |
fungi | but also we average more than one node per build, and there's node building/deleting overhead to take into account as well (plus ready nodes for less-used platforms) | 14:01 |
fungi | the node request backlog only lasted an hour, caught up by 13:55z | 14:25 |
fungi | looks like it coincided with a large batch of openstack project release jobs, which all go pretty quickly | 14:26 |
fungi | however, that pushed our jph for the 13z sample to 1.5k | 14:27 |
fungi | build concurrency topped out at 503 | 14:28 |
fungi | so the build durations were averaging <20min | 14:28 |
*** JasonF is now known as JaqyF | 14:54 | |
*** JaqyF is now known as JayF | 14:54 | |
fungi | infra-root: came up earlier in #openstack-infra but frickler looked into the stale centos mirroring and it looks like we're at quota on it. recommendation is we increase that (and also the centos-stream volume which is nearly maxed out as well) by 50gb each. any objections? | 15:07 |
clarkb | no objections. I think last I looked centos isn't pruning old packages and is just adding to them. Some are large and we may need to add more exclusions though that gets messy when trying to exclude versions and not suites | 15:16 |
fungi | maybe we'll be able to drop centos 7 soon and free up space | 15:18 |
fungi | but also fedora is using 400gb right now which will hopefully go away rsn | 15:19 |
clarkb | ++ https://review.opendev.org/c/opendev/base-jobs/+/892380 is the next step for fedora I think | 15:19 |
fungi | frickler: were you wanting to do the centos/centos-stream mirror volume quota increases, or shall i? | 15:25 |
opendevreview | Clark Boylan proposed openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892976 | 15:31 |
clarkb | infra-root quick review of ^ is appreciated since we've already discussed that previously | 15:31 |
opendevreview | Clark Boylan proposed opendev/system-config master: Exclude i686 rpms in our centos mirrors https://review.opendev.org/c/opendev/system-config/+/892978 | 15:47 |
fungi | lgtm, thanks! | 15:49 |
fungi | backup02.ca-ymq-1.vexxhost is filling up, i'll start pruning there | 15:52 |
clarkb | thanks | 15:52 |
clarkb | fyi I just pushed remote: https://review.opendev.org/c/openstack/tempest/+/892981 DNM test devstack+tempest under Ansible 8 [NEW] | 15:57 |
fungi | thanks! | 15:57 |
fungi | that's probably some of the most complicated job playbooks we've got, so should serve as a good litmus test | 15:58 |
clarkb | yup | 15:58 |
frickler | fungi: if you want to bump the quota, feel free to go ahead | 16:03 |
fungi | frickler: now it looks like we may not need to, with 892978 | 16:04 |
fungi | clarkb's eyes were better than mine at spotting (rather a lot of) something we didn't need to be mirroring | 16:05 |
clarkb | fwiw I don't think it will be half because some packages don't get i686 packages and there are a lot of noarch packages too | 16:06 |
clarkb | but it should be a good chunk | 16:06 |
frickler | I think rsync will delete only after syncing, so we will probably need the added headroom once anyway, but we can give it a try | 16:08 |
clarkb | yes I agree. I think we should do both and then can adjust the quota back later | 16:10 |
fungi | okay, i can take care of the increases in a moment | 16:13 |
fungi | also rsync >=3 does --delete-during as its default. there is also a --delete-before which would make it somewhat less likely to run out of room while syncing | 16:15 |
fungi | the only time it waits until the end to perform deletions is if you specify --delete-after | 16:16 |
fungi | --delete-before was the default in rsync <3 | 16:16 |
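A quick illustration of the difference between the delete-ordering flags; the host and paths here are placeholders, not the real mirror configuration:

```bash
# rsync >= 3 treats a bare --delete as --delete-during (deletions interleaved with the transfer):
rsync -av --delete rsync://mirror.example.org/centos/ /afs/.example.org/mirror/centos/

# Deleting before the transfer keeps peak disk usage lower on a nearly-full volume:
rsync -av --delete-before rsync://mirror.example.org/centos/ /afs/.example.org/mirror/centos/

# --delete-after postpones deletions until the transfer finishes, giving the highest peak usage:
rsync -av --delete-after rsync://mirror.example.org/centos/ /afs/.example.org/mirror/centos/
```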
fungi | #status log Increased quotas of the AFS mirror volumes for centos from 400GB to 450GB and centos-stream from 250GB to 300GB | 16:27 |
opendevstatus | fungi: finished logging | 16:28 |
fungi | i'll go ahead and manually initiate rsync for the stale centos volume to get things updated | 16:28 |
fungi | in progress | 16:30 |
fungi | still getting "File too large (27)" errors | 16:33 |
fungi | maybe i needed to update the quotas for the read-only volumes too | 16:33 |
clarkb | I thought quotas were only on the rw side and ro got them synced over | 16:35 |
clarkb | is the problem a literal file that is too large? | 16:35 |
clarkb | looks like the file size limit is either 2GB or many many terabytes | 16:37 |
fungi | combined size of the filenames in a directory | 16:38 |
fungi | https://lists.openafs.org/pipermail/openafs-info/2016-July/041859.html | 16:39 |
fungi | so hopefully your change will fix this | 16:39 |
clarkb | fungi: how did we go from file too large (27) to running out of directory entries? I agree my change should help if that is the problem | 16:40 |
fungi | the volume was running 5gb below quota so the initial assumption was that the error message was related to running out of available quota | 16:41 |
fungi | but apparently in the past that error message showed up for our tumbleweed mirror a few years ago: https://lists.openstack.org/pipermail/openstack-infra/2018-June/005972.html | 16:43 |
fungi | it looks like it was resolved for us when suse removed some packages from their mirror sites | 16:44 |
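One rough way to see whether a directory is approaching that per-directory entry limit is just to count entries under the mirror tree; a sketch, assuming the RW mirror path is the one used elsewhere in this log:

```bash
# Print the directories with the most entries, largest first (slow over AFS, but workable).
find /afs/.openstack.org/mirror/centos -type d -print0 \
  | while IFS= read -r -d '' d; do
      printf '%s %s\n' "$(ls -1 "$d" | wc -l)" "$d"
    done \
  | sort -rn | head
```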
fungi | also it looks like the afs servers are still reporting the old quotas for the read-only volumes, but won't let me set them, so i think quota changes require a vos release to take effect (which will happen once we get the mirror syncing successfully) | 16:46 |
fungi | i can try patching 892978 in by hand temporarily and running again with that | 16:46 |
fungi | the change needs a recheck, archive.ubuntu.com is apparently in an incoherent state at the moment (leading to the check job failure on it) | 16:47 |
clarkb | yes quota changes are vos released from rw to ro | 16:47 |
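Roughly, the sequence looks like the following sketch; the AFS paths, volume names, and exact quota values are assumptions for illustration, not taken from the actual mirror-update tooling:

```bash
# Quotas are set on the read-write volume via its path (fs setquota takes kilobytes).
fs setquota /afs/.openstack.org/mirror/centos 471859200         # ~450GB
fs setquota /afs/.openstack.org/mirror/centos-stream 314572800  # ~300GB

# The read-only replicas only pick up the new quota on the next release.
vos release mirror.centos
vos release mirror.centos-stream

# Confirm what the read-only side now reports.
fs listquota /afs/openstack.org/mirror/centos
```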
fungi | clarkb: see apevec's comment on that change, but i still suspect it's our best way out if the problem is the number of files in directories | 16:49 |
clarkb | ya I just looked at https://nb01.opendev.org/centos-8-stream-a814f58c1b3c4fa79f8eba7f991eb5d1.log and https://nb01.opendev.org/centos-9-stream-684af0330ddf4e7d893df6f720588b65.log and neither file has 'i686' in it | 16:50 |
clarkb | the yum output does show x86_64 as arch for packages though so I think i686 would show up if we were installing those packages | 16:50 |
fungi | any concerns with me running the version of centos-mirror-update with the --exclude="*.i686.rpm" lines added? | 16:51 |
apevec | yeah no i686 should install by default on 8/9 - maybe on 7 but we don't have any c7 nodes do we? | 16:51 |
fungi | we do still run jobs on centos-7 yes | 16:52 |
clarkb | apevec: we do have c7 and the change does remove i686 packages from c7 too | 16:52 |
clarkb | fungi: it's broken as is so I don't think running it that way will make it any worse | 16:52 |
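For illustration, the exclusion amounts to adding something like the following to the mirror rsync invocation; only the --exclude pattern comes from the change under discussion, the surrounding flags and paths are assumptions:

```bash
# Skip 32-bit packages entirely when pulling from the upstream mirror.
rsync -av --delete \
  --exclude="*.i686.rpm" \
  rsync://mirror.example.org/centos-stream/9-stream/ \
  /afs/.openstack.org/mirror/centos-stream/9-stream/
```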
apevec | uhm what still runs on c7 | 16:52 |
clarkb | really old stuff | 16:52 |
clarkb | some openshift things because openshift 3 on centos 7 is actually installable like a normal application | 16:52 |
clarkb | the vast majority of things are probably bitrotted and really old though | 16:53 |
apevec | openshift3 is EOL https://access.redhat.com/support/policy/updates/openshift_noncurrent | 16:55 |
clarkb | sure but you cannot test with openshift 4 in a reasonable manner so this is a halfway measure | 16:55 |
clarkb | it's not great, but it's better than nothing I guess? | 16:55 |
apevec | dunno, I'd drop it - for openshift4 we actually have a reasonable microshift ansible role if anyone wants to test non-EOL openshift | 16:56 |
clarkb | apevec: is that new? last I saw sean was working on something similar but hadn't heard it was functional yet. Also openshift apparently refuses to start unless you give it massive amounts of memory | 16:57 |
fungi | i thought ianw had tried to get microshift working for ci jobs without success | 16:57 |
clarkb | fungi: yes among other things. Memory being the problem iirc | 16:57 |
fungi | oh right, it needed like 32gb of ram | 16:58 |
fungi | for zuul i think we ended up deciding that making sure the kubernetes driver works was sufficient since we couldn't actually test openshift any more? | 16:58 |
clarkb | yes, zuul appears to have dropped the job | 16:59 |
apevec | https://github.com/openstack-k8s-operators/ansible-microshift-role/ we use it for sf-operator development | 16:59 |
apevec | dpawlik tristanC what are min resource reqs? | 16:59 |
clarkb | oh wait maybe its in nodepool | 16:59 |
clarkb | ya the job still runs in nodepool | 17:00 |
apevec | ah could be for openshift nodepool driver? | 17:00 |
clarkb | apevec: yes | 17:00 |
fungi | right, and that's where we're still testing with openshift 3 on centos 7? | 17:00 |
clarkb | fungi: yes | 17:00 |
apevec | tristanC: ^ is this worth to keep or could be removed? | 17:00 |
clarkb | because openshift 4 isn't really an application or set of applications anymore. It's like a full rack appliance instead | 17:00 |
apevec | c7 goes EOL soon next year... | 17:01 |
clarkb | that you install from the OS up using a full system orchestrator | 17:01 |
fungi | right, you don't install openshift on an operating system any more, openshift is the operating system, as best i could tell | 17:01 |
apevec | yeah, normal openshift installs coreos nodes using ignition, but microshift is really a reduced footprint | 17:01 |
tristanC[m] | apevec: the issue with microshift is that it needs a pull secret. The min resource reqs are rather low, 3GB of RAM, 2 vCPUs | 17:01 |
clarkb | tristanC[m]: what sort of secret? | 17:02 |
fungi | what's a "pull secret?" | 17:02 |
fungi | like authenticated access to the packages? | 17:02 |
clarkb | oh I see a secret to access the container images looks like | 17:02 |
apevec | to pull container images from Red Hat CDN ... sigh | 17:02 |
apevec | there's OKD which might not need that, but that's like openshift nightly i.e. openshift "Stream" | 17:03 |
fungi | anyway, the centos mirror volume is vos releasing now, without i686 packages included | 17:03 |
clarkb | I guess openshift releases aren't packaged in an opensource way then? | 17:03 |
clarkb | we'd have to build from source? | 17:03 |
fungi | maybe the source code is available to build images of it yourself | 17:03 |
apevec | well, src is all in github ... | 17:03 |
tristanC[m] | clarkb: fungi I think the pull secret is just an authentication token for the registry | 17:03 |
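For reference, a hedged sketch of wiring that up for a manually installed MicroShift node; the file path and service name here are assumptions based on MicroShift's documented conventions, not something taken from the ansible role discussed above:

```bash
# 1. Download the registry pull secret for your Red Hat account from
#    https://console.redhat.com/openshift/install/pull-secret (saved as pull-secret.txt).
# 2. Install it where MicroShift is expected to read it, then start the service.
sudo install -o root -g root -m 0600 pull-secret.txt /etc/crio/openshift-pull-secret
sudo systemctl enable --now microshift
```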
clarkb | ya I mean I'm not going to worry about any of that myself. If it came down to me I'd remove the testing from nodepool and push people towards the k8s driver | 17:04 |
clarkb | it should work against openshift too so wouldn't impact users | 17:04 |
fungi | agreed. users of openshift would want to know it works with the official openshift images anyway, and since those are proprietary it would be better for the vendor to assess that | 17:05 |
clarkb | in any case we haven't removed centos 7 because we have said a few times that we'll wait for distro EOL if we can manage it and people are using the images. It does seem like some people are using it and it's fairly static at this point, so it hasn't created many problems | 17:08 |
clarkb | once it does eol we'll clean it all up | 17:08 |
corvus | the nodepool functional test for openshift does exercise some openshift-specific functionality like projects, so it's useful to have. but also, nodepool interacts with a lot of proprietary systems that are impractical to functional test in the opendev environment, so if openshift becomes impractical to functional-test then we'll just make sure we have those bits faked out and rely on unit tests. | 17:11 |
fungi | makes sense, yep | 17:15 |
clarkb | the zookeeper statsd container appears to have updated as expected during the daily runs | 17:17 |
clarkb | and we still have data in grafana so that all looks good to me | 17:18 |
clarkb | fungi: I'm going to go ahead and +A https://review.opendev.org/c/openstack/project-config/+/892976 in order to stay on schedule with ansible 8 and ensure we get as much data as quickly as possible | 17:21 |
clarkb | the tempest change I pushed is looking good so far too | 17:21 |
opendevreview | Merged openstack/project-config master: Convert all zuul tenants except openstack to Ansible 8 https://review.opendev.org/c/openstack/project-config/+/892976 | 17:35 |
fungi | clarkb: sounds good | 17:37 |
clarkb | fungi: looks like the recheck you did failed again trying to install pci.ids from ubuntu's main us mirror? | 17:54 |
clarkb | the tempest change I pushed to shake out ansible 8 issues passed every single job it ran except for two non-voting jobs. Spot checking those two failures, they both appear to fail while running tests and not in the ansible. That is a very good sign | 17:58 |
fungi | clarkb: yeah, i figure we should give archive.ubuntu.com some more time | 18:03 |
fungi | seems it's having issues | 18:03 |
clarkb | fungi: and for clarity you only ran the mirror script for centos not centos-stream? | 18:09 |
fungi | correct | 18:11 |
fungi | the other one is still successfully syncing on its own | 18:11 |
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org as the volume had reached 90% utilization | 18:48 |
opendevstatus | fungi: finished logging | 18:48 |
ildikov | Hi | 19:04 |
ildikov | I have a quick question. The patch that adds the new StarlingX Matrix rooms to the logbot got merged: https://review.opendev.org/c/opendev/system-config/+/892387 | 19:04 |
ildikov | If I understand it correctly, the logs are supposed to show up on the eavesdrop page. Is that accurate? | 19:05 |
fungi | ildikov: yes, you can find them under https://meetings.opendev.org/irclogs/ | 19:08 |
fungi | (the meetings site is where channel logs are also published) | 19:08 |
fungi | for example, https://meetings.opendev.org/irclogs/%23starlingx-general/latest.log.html | 19:08 |
ildikov | fungi: never mind, I'm just blind, lol | 19:09 |
fungi | nah, it's not super obvious | 19:09 |
ildikov | fungi: for some reason I thought it'd all be under the #starlingx folder, since we organized the Matrix rooms into a StarlingX space | 19:10 |
fungi | the logbot doesn't have a concept of spaces, but also spaces aren't channel namespaces in matrix anyway (which is why we prefixed all the channel names with starlingx-) | 19:10 |
ildikov | so I opened that folder in the tree and then my brain just stayed in that rabbit hole | 19:11 |
ildikov | gotta love Mondays, I think I just need my 2nd coffee now! :D | 19:11 |
fungi | keep in mind that the element web client can also be used to anonymously look at public channel logs/history in a richer format | 19:11 |
fungi | the main benefit to our channel logging is that it can be indexed by web search engines | 19:12 |
ildikov | yeah, the logbot part makes sense, the namespace part is a good reminder | 19:12 |
ildikov | that's great to know, thank you! | 19:12 |
fungi | my pleasure, as always | 19:13 |
fungi | clarkb: should we merge https://review.opendev.org/892817 once system-config-run-base is working again? basically get the last of the mm3 patches in before we start scheduling more migrations | 19:41 |
Clark[m] | fungi: yup we should. I'm headed home from lunch now. Not sure if the base job is working again | 19:55 |
fungi | i'll recheck your mirroring change again and find out | 19:59 |
opendevreview | Merged openstack/diskimage-builder master: Install netplan.io for Debian Bookworm https://review.opendev.org/c/openstack/diskimage-builder/+/891323 | 20:17 |
fungi | yeah, looks like it's working again | 20:38 |
fungi | once 892978 merges i'll release my lock on the volume | 20:39 |
clarkb | ++ | 20:42 |
opendevreview | Merged opendev/system-config master: Exclude i686 rpms in our centos mirrors https://review.opendev.org/c/opendev/system-config/+/892978 | 21:20 |
fungi | it deployed so dropping my lock now | 21:29 |
fungi | i've self-approved 892817 now | 21:49 |
Clark[m] | I'm at school pickup for a bit but don't expect trouble | 21:57 |
fungi | nah, should be a no-op | 22:01 |
opendevreview | Merged opendev/system-config master: mailman3: re-sync custom web/settings.py https://review.opendev.org/c/opendev/system-config/+/892817 | 22:15 |
fungi | it deployed | 22:31 |
fungi | everything still looks fine | 22:37 |
fungi | containers restarted at 22:21 | 22:37 |
clarkb | did I end up deleting the old ci registry node? I recall we had to land the change for that which I think did happen but maybe I didn't delete the server yet? | 22:52 |
clarkb | I'm going to drop the gerrit updates meeting topic. I don't think we need to keep bringing up replication leak files | 22:53 |
clarkb | with that I've just pushed an updated agenda | 22:53 |
clarkb | anything else to add to it before I send it out? that will also serve to confirm the list server is happy | 22:55 |
fungi | nothing comes to mind | 23:15 |
clarkb | looks like it went through | 23:30 |