Thursday, 2025-05-29

00:03 <corvus> https://zuul.opendev.org/t/opendev/build/3a0987f20e5142e4abd6bd29b78d0441/console#4/0/28/ubuntu-noble
00:03 <corvus> so far so good re no_log
00:04 <corvus> now we can update that to return some more useful info
00:04 <corvus> we could give that task a timeout so that it can fail internally and return more data too
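A rough shell-level sketch of the idea corvus describes: bound the check's runtime so a hang becomes an ordinary failure whose output can still be reported. The actual change would presumably set a timeout on the Ansible task itself; the command name and the 120 second limit below are placeholders.

    #!/bin/bash
    # Placeholder command and limit; a hang now becomes exit code 124 from
    # GNU timeout instead of an indefinitely stuck task, and the captured
    # output can be returned for debugging.
    if ! output=$(timeout 120 some-mirror-check 2>&1); then
        echo "check failed or timed out; captured output:"
        printf '%s\n' "${output}"
        exit 1
    fi
    printf '%s\n' "${output}"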
00:41 <clarkb> we're still getting flaky package installs from upstream mirrors on the system-config-run jobs
00:41 <clarkb> we don't use our own mirrors for those because they are meant to look like prod
00:41 <clarkb> I'll just have to follow up tomorrow morning
06:06 <opendevreview> Vladimir Kozhukalov proposed openstack/project-config master: Retire openstack-helm-infra repo  https://review.opendev.org/c/openstack/project-config/+/951237
07:04 <frickler> mnaser: fwiw from AS3320 I'm now getting "no route to host" for opendev.org (2604:e100:3::/48), while review.opendev.org (2604:e100:1::/48) is working fine. I'd also very much like to see some postmortem report on these repeated networking issues
07:25 <frickler> I'm also having partial connectivity issues via IPv4 to review.o.o for some weeks, where only some of the flows are affected, looking like there is a single broken patch within a loadbalancing bundle within cogent (AS174), but they keep saying "all the pings work fine for us" :-(
07:31 <frickler> ah, wrong, the latter affects all of 38.108.68.0/24 for me, so including gitea (opendev.org), not gerrit
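The flow-dependent breakage frickler describes can be demonstrated by probing the same destination with several different TCP ports, which exercises different hash buckets in an ECMP/LAG bundle; some flows then succeed while others stall. The address and ports below are illustrative, not the exact flows frickler tested.

    #!/bin/bash
    # Compare several TCP flows to the same host; a single bad member in a
    # load-balanced bundle typically breaks only a subset of them.
    for port in 443 80 9418; do
        echo "== mtr via TCP port ${port} =="
        mtr --report --report-cycles 10 --tcp --port "${port}" 38.108.68.1
    done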
11:56 <mnaser> frickler: we can't really do much traffic steering outside our network. since cogent is a vendor of ours, if you can show that it's clearly broken, i can open a ticket, but i cannot guarantee anything
11:57 <mnaser> at the end of the day, it's _really_ hard for us to solve issues that are happening on networks that don't belong to us; opendev.org was not even affected by yesterday's issue, that was another site
12:09 *** amoralej_ is now known as amoralej
13:11 <fungi> been right there myself more times than i can count, i feel for you, hope cogent will get their act together but i know backbone providers too well to expect that
13:12 <fungi> unrelated, i need to run some errands on another nearby island today, so will probably not be around the keyboard between 14:30 and 17:00 utc, in case anyone's looking for me
14:10 <opendevreview> Merged openstack/project-config master: Retire openstack-helm-infra repo  https://review.opendev.org/c/openstack/project-config/+/951237
14:23 <fungi> looks like our system-config jobs are still choking on upstream ubuntu mirror server timeouts
14:33 <fungi> okay, heading out, back in ~2.5 hours hopefully
14:42 <clarkb> fungi: ack. Thanks for rechecking those changes. Though it looks like the US ubuntu mirror pool is still struggling today
14:42 <clarkb> we already have failures hitting 2620:2d:4002:1::103 which iirc is the same location we had problems with yesterday
14:44 <clarkb> oh, though I think it is saying a specific backend is unreachable?
14:46 <clarkb> https://status.canonical.com/ indicates all is well. If we hit failures again I wonder if we should maybe ask canonical/ubuntu about it
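A quick way to check this kind of report from any affected host is to probe the quoted archive address directly; the address below is the one clarkb mentions, everything else is illustrative.

    #!/bin/bash
    # Hit the Ubuntu archive backend over IPv6 directly; a connection refused
    # or timeout here points at the upstream mirror pool rather than our jobs.
    curl -g -6 --connect-timeout 10 -sv -o /dev/null \
        "http://[2620:2d:4002:1::103]/"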
16:13 <opendevreview> Merged opendev/system-config master: Switch from netcat to netcat-openbsd package  https://review.opendev.org/c/opendev/system-config/+/951178
16:13 <opendevreview> Merged opendev/system-config master: Install python3-venv for venv creation on mirror-update  https://review.opendev.org/c/opendev/system-config/+/951214
16:18 <JayF> FWIW, there are reports in #gentoo-infra of outages in ubuntu archives as well
16:21 <clarkb> JayF: the two changes above are the ones I've been fighting to get in that hit that issue, and they finally made it through check and gate, so maybe things are better? but good to know it's observable elsewhere
16:22 <JayF> yeah, I did the reciprocal as well. Just nice for everyone to have confirmation they aren't seeing a problem :D
16:23 <JayF> **the only ones seeing a problem
16:25 <clarkb> infra-root: I decided to move /usr/local/afsmonvenv aside on mirror-update02 since it was half-created and I'm not sure how ansible's pip module will handle that
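Roughly what that manual step looks like; moving the half-created venv aside lets the next deploy build a fresh one instead of reusing a broken directory. The path matches the one mentioned above, and the backup suffix is arbitrary.

    #!/bin/bash
    # On mirror-update02: park the partially created venv out of the way so
    # the next Ansible run recreates it from scratch.
    sudo mv /usr/local/afsmonvenv "/usr/local/afsmonvenv.broken.$(date +%Y%m%d)"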
16:27 <clarkb> and the deploy failed in bootstrap-bridge due to an apt update problem
16:27 <clarkb> so maybe ubuntu isn't happy again
16:27 <clarkb> I'll figure out re-enqueueing it
16:29 <clarkb> `zuul-client enqueue --tenant openstack --pipeline deploy --project opendev.org/opendev/system-config --change 951214,1` and it is re-enqueued
16:32 <clarkb> "msg": "Failed to update apt cache: unknown reason"
16:32 <clarkb> this happened again. I'll try to manually run an apt-get update
16:33 <clarkb> Could not connect to security.ubuntu.com:80 (2620:2d:4000:1::103). - connect (111: Connection refused)
16:33 <clarkb> so this does indeed appear to be the same issue. I'll wait a bit before trying again
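If waiting doesn't help, one possible workaround is to force apt over IPv4 for a single run, since the failure above is against the IPv6 address. This only affects the one invocation; nothing persistent is changed.

    # Retry the cache update over IPv4 only; Acquire::ForceIPv4 is a stock
    # apt option and leaves the system configuration untouched.
    sudo apt-get -o Acquire::ForceIPv4=true update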
16:34 <clarkb> but given those issues I think trying to build entirely new servers is also a bit of a lost cause (as it will randomly fail)
16:34 <clarkb> I should probably focus on something else until this is happier
16:34 <JayF> maybe they are behind cogent too :|
16:47 <ykarel> Hi, can someone create an autohold for job neutron-functional-1, change 950303? need to debug a random issue
16:49 <clarkb> ykarel: yes, one moment
16:50 <clarkb> ok that should be in place
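For reference, an autohold like that is created with zuul-client along roughly these lines; only the job and change come from ykarel's request, while the tenant, project, and reason below are illustrative guesses.

    # Sketch of the autohold an infra-root would place; --count 1 captures
    # the nodes from a single matching failed build.
    zuul-client autohold --tenant openstack --project openstack/neutron \
        --job neutron-functional-1 --change 950303 \
        --reason "ykarel debugging a random failure" --count 1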
16:51 <ykarel> clarkb, thx very much
16:55 <clarkb> I've re-enqueued the afsmonvenv change's deployment because apt-get update on bridge succeeded for me just now
17:00 <clarkb> success!
17:01 <clarkb> but ya, booting new nodes is likely to be flaky while the mirrors are sad, so I'll pause here, ready to deploy a new mirror-update and new zookeeper, and wait on that until we think the mirrors are happy again
17:16 <fungi> good to see those merged and deployed successfully at least
17:51 <clarkb> https://grafana.opendev.org/d/9871b26303/afs?orgId=1&from=now-6h&to=now&timezone=utc I think this shows we're continuing to report afs monitoring stats to graphite/grafana
17:51 <clarkb> (it runs every half hour)
17:51 <clarkb> and you can see a slight change in some values at 17:30 UTC
17:54 <clarkb> ykarel: that job completed and the node is held. Do you have an ssh key posted somewhere I can put on the host?
17:54 <clarkb> ykarel: or feel free to PM me a pubkey
17:54 <ykarel> clarkb, I already injected one with the add authorized keys role
17:54 <clarkb> ah
20:06 <nhicher[m]> clarkb fungi: FYI, I created a bug report for openafs on centos-9, and I found today that the issue is that dkms now requires 'yes' or 'no' values (it's 'true' in the dkms.conf provided by openafs): https://github.com/dell/dkms/blob/main/dkms.in#L824. You should not be impacted, since NO_WEAK_MODULES is only used on rhel/opensuse; on other distributions it is ignored according to the dkms man page
20:17 <fungi> nhicher[m]: thanks for the follow-up! i suppose that's good news for us, but it prompted us to go ahead and update our openafs package build anyway, which isn't a total waste ;)
20:39 <clarkb> nhicher[m]: I think I'd be happy to accept a patch to our package build script that does s/NO_WEAK_MODULES=true/NO_WEAK_MODULES=yes/ if you can figure out the paths and where in the process to apply that
20:39 <clarkb> that is a minimal update that we can clean up when upstream has fixed it
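The substitution itself is trivial; a sketch of the kind of step that could go into the package build process is below. The dkms.conf path is a placeholder, since where it lands in the build tree still needs to be worked out, as clarkb notes.

    # Rewrite the value newer dkms rejects; dkms now only accepts yes/no for
    # NO_WEAK_MODULES while upstream openafs still ships "true".
    # The path below is a placeholder, not the real location in the build.
    sed -i 's/^NO_WEAK_MODULES=true$/NO_WEAK_MODULES=yes/' \
        path/to/openafs/packaging/dkms.conf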
20:40 <fungi> agreed, sounds fine to me
20:40 <nhicher[m]> clarkb: sure, I will have a look, thanks =)
21:16 <opendevreview> Clark Boylan proposed opendev/engagement master: Fix data accuracy bugs  https://review.opendev.org/c/opendev/engagement/+/950471
22:22 <opendevreview> Merged opendev/engagement master: Collect reviewer and maintainer counts per project  https://review.opendev.org/c/opendev/engagement/+/950369
22:23 <opendevreview> Merged opendev/engagement master: Properly query for opened changes during date range  https://review.opendev.org/c/opendev/engagement/+/950442
22:25 <opendevreview> Merged opendev/engagement master: Fix data accuracy bugs  https://review.opendev.org/c/opendev/engagement/+/950471
