Tuesday, 2026-05-26

-@gerrit:opendev.org- Mauricio Harley proposed: [openstack/project-config] 990078: Remove gerritbot project notifications from #openstack-pqc https://review.opendev.org/c/openstack/project-config/+/990078 09:59
-@gerrit:opendev.org- Zuul merged on behalf of Mauricio Harley: [openstack/project-config] 990078: Remove gerritbot project notifications from #openstack-pqc https://review.opendev.org/c/openstack/project-config/+/990078 13:08
@fungicide:matrix.org looks like the project.tarballs afs volume has been running vos release since some time yesterday, blocking all the other static volume updates 13:16
@fungicide:matrix.org `2026-05-25 02:55:02,365 release DEBUG    Running: ssh -T -i /root/.ssh/id_vos_release vos_release@afs01.dfw.openstack.org -- vos release project.tarballs` 13:19
@fungicide:matrix.org that's the one still running 13:20
-@gerrit:opendev.org- Sylvain Bauza proposed: [openstack/project-config] 990118: Add #openstack-agentic-workflows IRC channel https://review.opendev.org/c/openstack/project-config/+/990118 14:12
@fungicide:matrix.org out to run a quick errand, back shortly 14:16
-@gerrit:opendev.org- Zuul merged on behalf of Sylvain Bauza: [openstack/project-config] 990118: Add #openstack-agentic-workflows IRC channel https://review.opendev.org/c/openstack/project-config/+/990118 14:28
-@gerrit:opendev.org- Stephen Finucane proposed: [openstack/project-config] 990122: Rename x/cursive to openstack/cursive https://review.opendev.org/c/openstack/project-config/+/990122 14:42
@clarkb:matrix.org infra-root we'll discuss much of this in today's meeting but the things I'm looking at this week are hopefully landing https://review.opendev.org/c/opendev/system-config/+/988993 to ensure new executors have manual configs that were added. Upgrading Gitea to 1.26.2 https://review.opendev.org/c/opendev/system-config/+/989448 Helping mnasiadka add a new backup server https://review.opendev.org/c/opendev/system-config/+/989567 and its depends on. Then doing more Gerrit upgrade prep. I plan to announce the June 5 upgrade date if there are no concerns with that today and then work through my TODO list in the planning etherpad: https://etherpad.opendev.org/p/gerrit-upgrade-3.13 15:23
@clarkb:matrix.org oh and digging into the dns job failure. Looks like my last recheck of the test change may have caught a failure but I haven't looked any closer yet 15:24
@clarkb:matrix.org https://3edfd4ea22585141d74d-f3c4fca3c92876a4d627c25bf953ebd1.ssl.cf1.rackcdn.com/openstack/486a31d808e04145a00d975a8b984e59/bridge99.opendev.org/ara-report/results/373.html shows a SERVFAIL dns response 15:28
@clarkb:matrix.org that query was run against the local resolver. The next two queries query the authoritative servers successfully 15:29
@clarkb:matrix.org Looking at that I think I need to grab /var/log/unbound.log as there isn't really any info as to why the local resolver failed 15:30
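[Editor's sketch] The comparison described above (the same name queried against the local resolver and against an authoritative server, then checking for SERVFAIL) could be reproduced without `dig` along these lines. This is a minimal illustration using only the Python standard library and the RFC 1035 wire format; the function names are hypothetical and this is not the debugging script used in the job.

```python
import socket
import struct

# Common DNS response codes (low 4 bits of the header flags field, RFC 1035).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: id, flags with only the RD (recursion desired) bit set,
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def response_rcode(packet):
    """Return the response code name from a raw DNS response."""
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    flags = struct.unpack("!H", packet[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def query_rcode(server, name, timeout=3.0):
    """Send one UDP query to server:53 and report the response code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(512)
    return response_rcode(data)
```

Calling `query_rcode("127.0.0.1", "opendev.org")` and then the same against an authoritative server's address would show whether only the local path fails. It does not say which daemon answered, so resolver logs are still needed for that.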
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989784: Add some dns lookup debugging to adns test server https://review.opendev.org/c/opendev/system-config/+/989784 15:33
@clarkb:matrix.org now with more logging 15:33
-@gerrit:opendev.org- Zuul merged on behalf of Sylvain Bauza: [opendev/system-config] 988406: Add bots to #openstack-agentic-worfklows https://review.opendev.org/c/opendev/system-config/+/988406 15:42
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 990139: DNM: Test zuul_user_dir https://review.opendev.org/c/zuul/zuul-jobs/+/990139 16:22
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 989997: Use zuul_user_dir in some roles https://review.opendev.org/c/zuul/zuul-jobs/+/989997 17:09
-@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: 18:09
- [opendev/system-config] 980840: Add Prometheus monitoring service https://review.opendev.org/c/opendev/system-config/+/980840
- [opendev/system-config] 980994: Deploy node_exporter across all managed hosts https://review.opendev.org/c/opendev/system-config/+/980994
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/988310 18:11
@mnasiadka:matrix.org Clark: updated the node-exporter patch, will work on the greptimedb one - still I'm unsure if we should have that in the deploy pipeline from the start or not 18:12
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 990164: DNM: test zuul_user_dir fallback https://review.opendev.org/c/zuul/zuul-jobs/+/990164 18:16
@clarkb:matrix.org mnasiadka: I think we can leave it out of deploy and add it to deploy when we add a server if we prefer that approach 19:07
@fungicide:matrix.org the project.tarballs vos release completed some time between 18:30 and 18:35 utc, and another vos release is in progress for it now, hopefully wrapping up soon 19:08
@dpanech:matrix.org Hi all, in this review: https://review.opendev.org/c/starlingx/tools/+/988511 the blueprint link doesn't work right. It links to https://blueprints.launchpad.net/openstack/... rather than .../starlingx/... . Is there a workaround? 19:11
I suspect the answer is "no" based on this: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/gerrit.config.j2#L137 .
Could somebody confirm?
@fungicide:matrix.org i think it's taking advantage of launchpad's umbrella org functionality that does bp name lookups for child projects of an umbrella parent 19:21
@fungicide:matrix.org maybe we could drop the "openstack/" from the url and do this? https://blueprints.launchpad.net/?searchtext=testbp 19:22
@fungicide:matrix.org it would potentially return blueprints from other projects on lp, but if people namespace their blueprints consistently maybe things would just work out most of the time? 19:23
@clarkb:matrix.org ya I think you'd need to namespace blueprint names themselves to make that work and that may be an appropriate solution here 19:24
@fungicide:matrix.org also worth noting, we already had the possibility for bp names to collide between different openstack projects, resulting in the existing query returning multiple results 19:30
@mnasiadka:matrix.org So we just have a smaller collision domain right now 19:35
@fungicide:matrix.org right. and insofar as our gerrit isn't only for openstack, this feels like another legacy decision we've needed to generalize for some time but nobody's pointed out until now 19:37
@mnasiadka:matrix.org True, sounds like an easy switch, but some communication work is needed 19:40
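[Editor's sketch] The project-agnostic link form suggested above (a Launchpad-wide `searchtext` query instead of a URL under `openstack/`) can be illustrated as follows. The regular expression and both function names are hypothetical, chosen only for this example; they are not the commentlink pattern from the Gerrit configuration.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the example link in the discussion.
LP_BLUEPRINTS = "https://blueprints.launchpad.net/"

# Hypothetical pattern for "blueprint <name>" mentions in a commit message.
BLUEPRINT_RE = re.compile(r"\bblueprint[:\s]+([\w-]+)", re.IGNORECASE)


def blueprint_search_url(name):
    """Build a Launchpad-wide blueprint search link for a blueprint name."""
    return LP_BLUEPRINTS + "?" + urlencode({"searchtext": name})


def find_blueprints(commit_message):
    """Return the blueprint names mentioned in a commit message."""
    return BLUEPRINT_RE.findall(commit_message)
```

Because the search spans every project on Launchpad, a name shared by two projects returns both, which is why consistent namespacing of blueprint names matters here.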
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: 19:46
- [opendev/system-config] 990186: Don't assume all LP blueprints belong to OpenStack https://review.opendev.org/c/opendev/system-config/+/990186
- [opendev/system-config] 990187: Update the bug reporting link in Gerrit https://review.opendev.org/c/opendev/system-config/+/990187
@fungicide:matrix.org the second one was something trivial i happened to notice while i was in there 19:46
@clarkb:matrix.org fungi: I think you might be right about bind and unbound getting in a fight. In the unbound log I see that we do queries for us.archive.ubuntu.com to install bind and a couple of other packages. Then I don't see queries from my debug script after that. In particular google.com shows up nowhere in the log and I do a query for google.com against the local resolver 19:49
@clarkb:matrix.org however, it does resolve 19:49
@clarkb:matrix.org so maybe bind is coming up and taking over resolution duties? And for some reason google.com can resolve but not opendev.org in that situation? 19:49
@fungicide:matrix.org though we do disable recursion in bind 19:50
@fungicide:matrix.org or at least it seemed like we did 19:50
@clarkb:matrix.org fungi: yes, but this is before we configure bind 19:50
@clarkb:matrix.org we're in the period of time between package install for bind and configuring bind 19:50
@fungicide:matrix.org mmm, so could also be coming from a cache 19:51
@fungicide:matrix.org maybe even cached results can be returned later after disabling recursion 19:51
@clarkb:matrix.org I think I would see the request go to unbound in that case. Unless systemd or libc is caching it and short circuiting before going to unbound? 19:51
@fungicide:matrix.org or systemd-resolved 19:52
@clarkb:matrix.org anyway at this point I'm wondering if I should try to hold a node or I can collect more debug output (ss to see what's listening where and maybe the bind log?) 19:52
@fungicide:matrix.org hopefully not that 19:52
@fungicide:matrix.org but yeah, an autohold is warranted for this 19:52
@clarkb:matrix.org ok I'll put one in place and start rechecking. I do worry that after we configure bind the issue will go away 19:52
@fungicide:matrix.org otherwise you're probably stuck dumping lsof to a file 19:52
@clarkb:matrix.org so maybe I should also add more debugging and cover all the bases 19:52
@fungicide:matrix.org getting query logs from bind as well as unbound would help us identify if the queries were going to the wrong daemon 19:53
@clarkb:matrix.org yup I'll work both angles in case one is insufficient 19:53
@dpanech:matrix.org fungi: thanks for the BP links update 19:57
@fungicide:matrix.org davlet: thanks for pointing it out! that wasn't something i had thought about in a very long time 19:59
@clarkb:matrix.org fungi: more datapoints. We are collecting syslog which has named logs and it is listening at 127.0.0.1:53: `listening on IPv4 interface lo, 127.0.0.1#53` and that happens according to syslog before my debugging command task from ansible runs. In prod we have unbound on local and bind on the "public" (which is not publicly accessible) interfaces 20:01
@clarkb:matrix.org also these jobs are failing in rax flex which are fast. I suspect that bind is coming up quicker there and then breaking/conflicting with unbound whereas normally there is enough of a delay there. 20:02
@clarkb:matrix.org I think maybe we can refactor these tasks so that we install git first and then clone repos. Then install bind and rsync and worry about synchronization and configuring bind 20:03
@clarkb:matrix.org and honestly I'm tempted to just push that change up and recheck it a few times and if it doesn't break maybe we're good 20:03
@clarkb:matrix.org I think the problem is just within the window where bind has been installed and we need dns lookups which is small and avoidable 20:03
@fungicide:matrix.org makes sense 20:04
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 990190: Fix order of operations during adns bind installation https://review.opendev.org/c/opendev/system-config/+/990190 20:12
@clarkb:matrix.org that change is decoupled from the debugging change. That said I have an autohold set and will push an update to the debugging change shortly to better illustrate the problem 20:12
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989784: Add some dns lookup debugging to adns test server https://review.opendev.org/c/opendev/system-config/+/989784 20:16
@clarkb:matrix.org https://zuul.opendev.org/t/openstack/build/d87513edff3f4e3ca278d4552e0def4f/log/job-output.txt#17238-17265 I think this shows the bind port conflict 21:05
@fungicide:matrix.org interesting, how are they listening on the same port? i thought the kernel didn't allow that 21:08
@fungicide:matrix.org oh! udp, so not "listening" in the tcp sense 21:09
@fungicide:matrix.org my brain is fried, and it's only tuesday and yesterday was a holiday 21:10
@clarkb:matrix.org Though it isn't multicast so not sure how the kernel decides which one to deliver to? Or does it deliver to all in a unicast fashion and then maybe conntrack gets confused 21:12
@clarkb:matrix.org fungi: those nodes should be held too if you want to inspect the running system 21:14
@clarkb:matrix.org But I'm doing the school run now 21:14
@fungicide:matrix.org i think they all receive every datagram and then respond if they want 21:14
@clarkb:matrix.org If they respond the resolution should work though? 21:15
@fungicide:matrix.org so it's more about which response does the client accept 21:15
@clarkb:matrix.org The client (dig) says servfail 21:15
@fungicide:matrix.org my guess is the client is receiving two responses 21:15
@clarkb:matrix.org Which is in that log just below 21:15
@fungicide:matrix.org but i'm currently prepping dinner so can't experiment at the moment 21:16
@fungicide:matrix.org we could probably tcpdump on lo0 to confirm that theory, but it doesn't change the fact that they're clearly in conflict with one another 21:16
@clarkb:matrix.org Ya and I think the fix is straightforward. Avoid DNS name lookups after bind is installed and before it is reconfigured 21:17
@fungicide:matrix.org oh! there's even more fun... 21:17
@clarkb:matrix.org My proposed fix did not hit raxflex so I rechecked it 21:17
@fungicide:matrix.org `...127.0.0.53%lo:53         0.0.0.0:*    users:(("systemd-resolve"...` 21:17
@clarkb:matrix.org Ya though I think that systemd magic happens there 21:18
@fungicide:matrix.org so in fact we have 3 resolvers all listening on 53/udp 21:18
@fungicide:matrix.org on the loopback 21:19
@clarkb:matrix.org Ya it's probably worth checking the held node to see how that all shakes out. But I suspect the fix I already pushed is the solution 21:22
@clarkb:matrix.org mnasiadka: noticed a test issue with the greptimedb change and left a comment with a suggestion on how to fix it 21:50
@fungicide:matrix.org the irc meeting schedule indicates there's nothing going on at this point, so i'm going to restart the ircbot container on eavesdrop02 to pick up new channels 22:44
@fungicide:matrix.org looks like it's rejoined all the channels now 22:47
@fungicide:matrix.org including the two newly-added ones 22:49
@clarkb:matrix.org perfect 22:59
@clarkb:matrix.org I hopped onto the held adns node and did a `dig opendev.org` which failed on the first query but succeeded on the second 23:34
@clarkb:matrix.org `broken trust chain resolving 'opendev.org/A/IN': 104.239.145.127#53` is in the bind log 23:35
@clarkb:matrix.org I think maybe it is failing the first request on dnssec validation but then serving out of the cache on the second (that feels like a bug but whatever) 23:35
@clarkb:matrix.org and the unbound log file is acting like it isn't moving at all 23:35
@clarkb:matrix.org as if the udp packet isn't getting delivered to it at all 23:36
@clarkb:matrix.org In any case I suspect that reconfiguring bind removes the conflict which should allow unbound to get the packets again so reordering ansible tasks to avoid the name lookups when bind is half configured should work well as a fix (but I'm still trying to get a test run of that change in rax flex) 23:36
@clarkb:matrix.org ya stracing the unbound process and running dig doesn't show any movement in that process at all so I think it is sitting there idle and bind is getting the packets for whatever reason 23:38
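[Editor's sketch] The open question above, which of two UDP sockets bound to the same loopback address and port actually receives a unicast datagram, can be tried in isolation. This is a minimal experiment that assumes Linux and assumes both sockets opt in to address/port reuse; whether named and unbound set these options on the test node is not established by the log. The function names are hypothetical.

```python
import select
import socket


def bind_shared_udp(addr, port):
    """Bind a UDP socket with address/port reuse enabled (assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((addr, port))
    sock.setblocking(False)
    return sock


def who_receives(payload=b"ping"):
    """Send one datagram to a port shared by two sockets; report receivers."""
    first = bind_shared_udp("127.0.0.1", 0)   # kernel picks a free port
    port = first.getsockname()[1]
    second = bind_shared_udp("127.0.0.1", port)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        client.sendto(payload, ("127.0.0.1", port))
        # Wait up to a second for the datagram to land on either socket.
        select.select([first, second], [], [], 1.0)
        result = {}
        for label, sock in (("first", first), ("second", second)):
            try:
                sock.recvfrom(512)
                result[label] = True
            except BlockingIOError:
                result[label] = False
        return result
    finally:
        for sock in (first, second, client):
            sock.close()
```

On Linux a unicast datagram is handed to one of the matching sockets rather than copied to all of them, so one entry in the result should be `True` and the other `False`. That fits the observation that unbound sat idle while bind answered, though the experiment does not show why the kernel picked bind on the held node.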
@clarkb:matrix.org oh there is already an etherpad 3.2.0 with the session cleanup fix in it 23:44
@clarkb:matrix.org I missed that 23:44
