opendevreview | Merged opendev/system-config master: launch: fix RAX rdns command-line tool https://review.opendev.org/c/opendev/system-config/+/880785 | 00:00 |
---|---|---|
opendevreview | Merged opendev/system-config master: reprepro doc: mention contents.cache.db https://review.opendev.org/c/opendev/system-config/+/857530 | 00:32 |
opendevreview | Merged opendev/system-config master: doc/nodepool: update vhd-util docs https://review.opendev.org/c/opendev/system-config/+/869623 | 00:33 |
opendevreview | Merged opendev/system-config master: grafana: pull the grafyaml image before running https://review.opendev.org/c/opendev/system-config/+/852067 | 00:57 |
opendevreview | Merged opendev/system-config master: logrotate: don't use filename to generate config file https://review.opendev.org/c/opendev/system-config/+/873481 | 00:57 |
ianw | https://static.opendev.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64/ still doesn't have the 1.8.9 rpm files i expected it to have | 01:48 |
ianw | ls -c1 /afs/openstack.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64 | wc -l | 01:52 |
ianw | 43 | 01:52 |
ianw | ls -c1 /afs/.openstack.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64 | wc -l | 01:52 |
ianw | 63 | 01:52 |
opendevreview | Merged opendev/zone-opendev.org master: Remove old nameservers https://review.opendev.org/c/opendev/zone-opendev.org/+/880709 | 01:53 |
opendevreview | Merged opendev/zone-zuul-ci.org master: Remove old nameservers https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/880910 | 01:53 |
ianw | Updating existing ro volume 536871090 on afs02.dfw.openstack.org ... | 01:56 |
ianw | Starting ForwardMulti from 536871090 to 536871090 on afs02.dfw.openstack.org (as of Sun Apr 23 02:22:36 2023). | 01:56 |
ianw | i'm running it again. maybe it only synced to when the volume was locked previously? | 01:56 |
opendevreview | Ian Wienand proposed opendev/system-config master: Remove old DNS servers https://review.opendev.org/c/opendev/system-config/+/880710 | 02:10 |
fungi | ianw: yes, normally the first new run redoes the previous failed serial, then the next catches it up | 02:26 |
ianw | i don't think i realised that tidbit. anyway, it's released (again) and looks in sync now | 05:06 |
ianw | i'll drop the root shell and emergency etc. | 05:06 |
opendevreview | Ian Wienand proposed opendev/system-config master: haproxy-statsd: add something at startup https://review.opendev.org/c/opendev/system-config/+/881794 | 06:08 |
opendevreview | Merged opendev/system-config master: openafs-client: get logs better https://review.opendev.org/c/opendev/system-config/+/881528 | 06:20 |
*** gthiemon1e is now known as gthiemonge | 07:12 | |
opendevreview | Luis Tomas Bolivar proposed openstack/project-config master: Enable pypi and github clone jobs for ovn-bgp-agent https://review.opendev.org/c/openstack/project-config/+/881800 | 07:35 |
opendevreview | yatin proposed openstack/project-config master: Temporary disable vexxhost-ca-ymq-1 provider https://review.opendev.org/c/openstack/project-config/+/881810 | 10:48 |
opendevreview | Ian Wienand proposed opendev/system-config master: haproxy-statsd: add something at startup https://review.opendev.org/c/opendev/system-config/+/881794 | 10:54 |
ianw | ^^ some discussion in #openstack-tc about this, but it looks quite weird | 11:03 |
ianw | the mirror host is up and nothing seems out of order in terms of processes, etc. | 11:03 |
ianw | https://4c92e32258abec426e9c-a1b3a735e9c0af824100c02f6885a5ce.ssl.cf1.rackcdn.com/881798/1/check/neutron-tempest-plugin-ovn/266a77a/job-output.txt | 11:03 |
ianw | is a log. it seems like the mirror goes between being completely accessible to not, then back again | 11:04 |
ianw | ipv4 & ipv6 addresses appear, so it doesn't seem to be one or the other | 11:05 |
ianw | not seeing anything in vexxhost status | 11:06 |
ianw | https://04313aea8028a6223239-76770e8376bdcb5a12e0ef605f8b8d22.ssl.cf5.rackcdn.com/881742/3/check/neutron-tempest-plugin-designate-scenario/6f701ec/job-output.txt is another one | 11:14 |
ianw | almost the same | 11:15 |
ianw | 2023-04-28 10:58:19.131915 | controller | Hit:8 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal-security Release | 11:16 |
ianw | 2023-04-28 10:58:51.169261 | controller | Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (2604:e100:1:0:f816:3eff:fe0c:e2c0). - connect (113: No route to host) Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (199.204.45.149), connection timed out | 11:17 |
ianw | it gets a package list, then seconds later can't connect | 11:17 |
ralonsoh | ianw, hi, thanks for checking that. guilhermesp_____ (not in this channel) was checking that | 11:19 |
ralonsoh | but we didn't receive any feedback yet | 11:19 |
ianw | i would suspect the job, but nothing really happens between the "apt-get update" and the install of the packages; and the node doesn't drop off the network | 11:20 |
ianw | mnaser: ^ any thoughts?! | 11:21 |
gthiemonge | ianw: ralonsoh: the Octavia gates are also blocked by this issue | 11:21 |
ralonsoh | right | 11:22 |
ianw | https://zuul.openstack.org/build/db2df31a57594568a23aab551586023f is an octavia job doing the same | 11:23 |
ianw | yeah, was just looking through the failure list for something ! neutron | 11:24 |
ralonsoh | yes, octavia is using the same virt nested nodes | 11:24 |
ykarel | ianw, from quite sometime compute nodes are being upgraded in vexxhost-ca-ymq-1 to fix a nested-virt issue https://bugs.launchpad.net/neutron/+bug/1999249/comments/3 | 11:28 |
ykarel | and the nodes that are impacted by mirror issue also matches those to-be-upgraded(Bad Nodes ^) host list | 11:29 |
ykarel | so seems some fix(which is available in other nodes) is missed during the upgrade | 11:29 |
ykarel | just assuming based on the available data, infra guys may have more details about the history here | 11:31 |
ianw | ... that bug seems to refer to things randomly hanging, which sounds more plausible for nested virt issues than networking somehow stopping at a first glance | 11:34 |
ianw | but i don't know. i tried logging into a running ca-ymq-1 nested virt node and it was pinging the mirror fine | 11:35 |
ianw | and as noted the mirror node itself doesn't seem unhappy with anything | 11:36 |
ianw | i think we probably need vexxhost to look behind what we can see | 11:36 |
ykarel | ianw, the node you logged in was booted on one of the compute node listed above? | 11:38 |
ianw | yes, just a random one | 11:40 |
ykarel | do you have the host_id for that node? | 11:42 |
ianw | i've just checked a few running ones. some of them are clearly doing devstack and are past the point of installing things | 11:42 |
ykarel | it's possible they are good ones then, as bad ones fails during devstack setup | 11:43 |
ianw | one is installing stuff now | 11:45 |
ianw | though it has no ipv6 | 11:45 |
opendevreview | Merged openstack/project-config master: Enable pypi and github clone jobs for ovn-bgp-agent https://review.opendev.org/c/openstack/project-config/+/881800 | 12:02 |
opendevreview | Merged openstack/project-config master: Indent Gerrit ACL options https://review.opendev.org/c/openstack/project-config/+/879906 | 12:02 |
opendevreview | Merged openstack/project-config master: tools/normalize_acl.py: Add some human readable output https://review.opendev.org/c/openstack/project-config/+/880898 | 12:02 |
opendevreview | yatin proposed openstack/project-config master: Temporary disable nested-virt labels in vexxhost-ca-ymq-1 https://review.opendev.org/c/openstack/project-config/+/881810 | 12:12 |
dpawlik | hello folks o/ dansmith, fungi: I did not catch that you replied to me on 17.04.2023 about the question related to the performance.json file | 12:22 |
dpawlik | I can skip that file, when the value is wrong but... | 12:22 |
dpawlik | as I mentioned, this problem is just in project: x/networking-opencontrail | 12:23 |
dpawlik | there is some periodic pipeline related to that project | 12:23 |
dpawlik | is it still used or it can be disabled? | 12:23 |
dpawlik | https://review.opendev.org/c/x/networking-opencontrail/+/881820 | 12:27 |
dpawlik | ah, this project got some cores. Ignore review for that PS. | 12:29 |
fungi | ianw: when looking into this previously, i also confirmed logs on both the mirror and collected from the test node don't indicate stray routes being temporarily added or removed either | 12:31 |
fungi | and someone suggested this coincided with the switch from focal to jammy? so could be kernel-related i guess | 12:32 |
*** amoralej is now known as amoralej|lunch | 12:47 | |
opendevreview | Merged openstack/project-config master: Temporary disable nested-virt labels in vexxhost-ca-ymq-1 https://review.opendev.org/c/openstack/project-config/+/881810 | 13:05 |
dpawlik | fungi: do you know if the core reviewers can push the code directly to the repo? Disabling the periodic jobs https://review.opendev.org/c/x/networking-opencontrail/+/881820 requires to fix many things... | 13:05 |
dpawlik | or set voting: false to all jobs.. | 13:06 |
fungi | dpawlik: disabling broken jobs seems reasonable to me, but that's probably up to the maintainers of that project if they're still active. if they're not, the opendev sysadmins might consider removing their job configuration entirely in order to stop them wasting resources | 13:12 |
fungi | it's not something we've done in the past, so we'd need to talk through what sort of precedent that sets and what we want our policy for that sort of thing to be going forward | 13:13 |
fungi | infra-root: ^ opinions on that are welcome | 13:13 |
fungi | note that merging a change which replaces all their project-pipelines with just check and gate runs of the built-in noop job should be mergeable normally by zuul, if it's not then it's because we've also got jobs added via the project-config repo which will need to be removed first | 13:16 |
Clark[m] | I think there are two separate concerns here. The first is dpawlik's where the project generates files that can't be indexed properly. The indexer should either learn to ignore them or figure out how to index the files in some way. Then on the OpenDev side it is what do we do about dead projects running jobs/having broken jobs. In the case of x/windmill repos we simply removed them from the zuul projects list. I don't think we need to do | 13:20 |
Clark[m] | anything more in depth than that. | 13:20 |
dpawlik | "The indexer should either learn to ignore them or figure out how to index the files in some way" - yeah, will do a patch for that to avoid such issues, but from the other side, if project is dead or no activity for one year, such periodic jobs that for sure would fail does not make sense. | 13:32 |
dpawlik | thank you folks for reply, will do a patch | 13:32 |
Clark[m] | dpawlik in the old system we had a way to exclude specific jobs. Usually this was necessary because the jobs would create massive log files that we couldn't index in a reasonable amount of time | 13:41 |
Clark[m] | Sometimes we did it because the log format was broken | 13:41 |
*** amoralej|lunch is now known as amoralej | 13:44 | |
fungi | yeah, removal from tenants seems like the best approach if there's an abandoned project still running jobs and wasting resources | 13:54 |
fungi | i agree that's even simpler than the noop job change approach | 13:54 |
fungi | and avoids tampering with the content of the repository itself | 13:55 |
*** dviroel_ is now known as dviroel | 14:30 | |
clarkb | and it is easy to add it back in should people show up looking to make things work | 15:06 |
clarkb | slaweq: ykarel fungi ianw I feel like the vexxhost nested virt stuff is almost certainly going to be a cloud issue because no other regions experience this and the mirror node reports all is well | 15:07 |
clarkb | and there is a high probability that it is a neutron issue in the cloud :) now we just need to make dogfooding work | 15:08 |
fungi | yes | 15:09 |
clarkb | fungi: have time for https://review.opendev.org/c/opendev/system-config/+/881682 ? to fix gerrit theme on 3.8? | 15:11 |
fungi | guilhermesp: mnaser: the neutron folks have been observing random internal routing issues between some test nodes in ca-ymq-1 and our mirror server in the same network there, like packets intermittently not making it between hosts there (and often very early in jobs when there's not much besides package downloads going on). they put together a list of the host_id hashes seen for nodes that | 15:11 |
fungi | exhibited the problem: https://paste.opendev.org/show/bCbxIrXR1P01q4JYUh3i/ | 15:11 |
clarkb | fungi: did we end up deciding whether or not we want ipv6 glue records for opendev nameservers? | 15:15 |
fungi | i don't think they're critical, do you know if we had them before we switched? | 15:16 |
clarkb | I don't know | 15:16 |
fungi | but also we didn't ever get reverse dns working for the ns04 v6 addy | 15:17 |
clarkb | ah | 15:17 |
clarkb | fungi: re gite links in gerrit 3.8 I think the thing that changed was an internal interface that updated affecting plugins used to set web links. We don't use a plugin for that and they would have updated the internal interfaces for gitweb stuff we do use | 15:18 |
clarkb | We should still test it but I think this one is a noop for us | 15:18 |
fungi | oh, that helps | 15:18 |
fungi | it wasn't clear to me that what we're doing doesn't use a polygerrit plugin | 15:19 |
clarkb | basically the gerrit official stuff appears to have been updated even if it was in a plugin. But then there may be third party plugins that do similar which they warn about | 15:19 |
fungi | got it | 15:20 |
fungi | so shouldn't impact our configuration | 15:20 |
clarkb | ya I don't think so. But we should definitely check it since something may have been missed | 15:21 |
*** amoralej is now known as amoralej|off | 15:48 | |
opendevreview | Merged opendev/system-config master: gerrit: update OpenDev theme CSS installation https://review.opendev.org/c/opendev/system-config/+/881682 | 16:42 |
opendevreview | Merged opendev/base-jobs master: Run ensure-quay-repo in our base container jobs https://review.opendev.org/c/opendev/base-jobs/+/881522 | 16:45 |
dansmith | is something up with the gate or is it just busy? >100 things in the queue is a lot for a friday | 16:57 |
clarkb | I see 10 in the gate | 17:02 |
dansmith | sorry I don't mean the gate queue specifically | 17:02 |
clarkb | picking random jobs that are in progress their console logs seem to show they are progressing with recent timestamps | 17:03 |
dansmith | there are some things that have been in there for >4hr and have "paused" jobs.. I dunno what that means | 17:03 |
clarkb | looks like starlingx also just pushed a bunch of changes | 17:03 |
dansmith | I just noticed because I submitted a few things and even after 45m no jobs have even started | 17:03 |
clarkb | I think this is mostly demand and a large number of changes from starlingx showing up all at once | 17:03 |
clarkb | Zuul allows you to pause execution of a job so that it can provide resources to other jobs. This is commonly used to host container images (which is what the tripleo examples you see are doing) | 17:04 |
clarkb | the content-provider jobs build and host container images then a bunch of the other jobs fetch and run those containers | 17:04 |
dansmith | okay I just hadn't seen that before | 17:05 |
fungi | you can see the node requests are climbing too https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1 | 17:06 |
clarkb | rax-iad has a backlog of deleting nodes, but that is something we've been fighting for a while now so not new | 17:06 |
fungi | we've been maxed out on available capacity for the past ~4 hours | 17:06 |
dansmith | yeah, just very different suddenly from yesterday and somewhat unusual for a friday so I thought maybe we had another thing that was making everything fail | 17:07 |
dansmith | okay the top nova job just started running tests, 88 minutes after enqueue.. whew. | 17:29 |
*** JayF is now known as Guest12444 | 18:27 | |
*** JasonF is now known as JayF | 18:27 | |
opendevreview | Marcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 18:41 |
opendevreview | Marcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 18:43 |
opendevreview | Marcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 19:40 |
opendevreview | Marcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 19:47 |
*** dmellado4 is now known as dmellado | 20:22 | |
fungi | we caught back up on node requests a few hours ago, btw | 21:21 |
fungi | around 18z | 21:22 |
opendevreview | Marcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 21:48 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Skip quay repo creation if necessary info is missing https://review.opendev.org/c/zuul/zuul-jobs/+/881893 | 22:09 |
opendevreview | Merged zuul/zuul-jobs master: Skip quay repo creation if necessary info is missing https://review.opendev.org/c/zuul/zuul-jobs/+/881893 | 22:22 |
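
The 01:52 check above counts entries under two AFS paths (43 under `/afs/openstack.org/...` versus 63 under `/afs/.openstack.org/...`). A minimal sketch of the same comparison that also names the missing files, assuming the dotted path is the read-write view and the undotted path is the released read-only replica; the function name `unreleased_entries` is hypothetical and not part of any OpenDev tooling:

```python
from pathlib import Path


def unreleased_entries(rw_dir: str, ro_dir: str) -> list[str]:
    """Return entry names present in rw_dir but missing from ro_dir.

    Assumption: rw_dir is the read-write view and ro_dir is the
    read-only replica of the same directory. Only names are compared,
    not file contents or timestamps.
    """
    rw_names = {entry.name for entry in Path(rw_dir).iterdir()}
    ro_names = {entry.name for entry in Path(ro_dir).iterdir()}
    # Anything only in the read-write view has not been released yet.
    return sorted(rw_names - ro_names)
```

Called with the two directories from the log as `rw_dir` and `ro_dir`, an empty result would suggest the replica has caught up, which is roughly what the matching `wc -l` counts would show after a successful release.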