Thursday, 2025-10-16

<opendevreview> OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/962557 [02:15]
<hemanth> Hey, regarding RETRY_LIMIT / Host Unreachable.. I was able to see the error multiple times yesterday and today.. 1. Increased verbosity of the charmcraft command: charmcraft creates a lxc container, sets up networking, tz etc. inside the container, tries to stop the container, and then the log stream disconnects 2. Modified the lxc network from 10.x.x.x to 172.28.x.x but the problem still exists ... the pattern I observed is that this is [03:41]
<hemanth> happening on instances where more than 1 interface exists (primary interfaces: 104.x.x.x and 23.x.x.x).. is there a nodeset I can use so that the test machine is from 104.x.x.x/23.x.x.x? (I do not want to recheck, which runs 40+ jobs, just to see the error) [03:41]
<hemanth> I see 3 attempts on some of the jobs that are failing, but there is no pre-run defined for the job.. is this expected? https://zuul-ci.org/docs/zuul/latest/config/job.html#attr-job.attempts [03:42]
<hemanth> Job definition: https://opendev.org/openstack/sunbeam-charms/src/branch/main/zuul.d/jobs.yaml#L1 (or any charm-build-* job in this file) [03:44]
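Zuul retries a build not only when a pre-run playbook fails but also when it loses contact with the node mid-run (the host-unreachable case here), so up to three attempts can appear even without a pre-run; job.attempts caps that. A minimal, hypothetical sketch for making the failure surface on the first try while debugging (the -debug job name is invented, not part of the sunbeam-charms config):

    # Hypothetical debugging-only variant of the job.
    - job:
        name: charm-build-glance-k8s-debug
        parent: charm-build-glance-k8s
        attempts: 1   # fail immediately instead of retrying on host-unreachable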
<tonyb> hemanth: I'm AFK right now (on my phone). Thanks for the updates. I'll process what you've written and see what suggestions I have [04:16]
<tonyb> hemanth: A very non-scientific look at: https://zuul.opendev.org/t/openstack/builds?job_name=charm-build-glance-k8s&project=openstack%2Fsunbeam*&change=963705&skip=0 seems to show that all the failing jobs are running on the 'RAX Classic' cloud (in various regions). [06:10]
<tonyb> Some (https://4ca4d8620dc864739feb-f83d06667d580e000031601b82c71a43.ssl.cf5.rackcdn.com/openstack/acfdfa3dcb224b04b45750039d5298c0/zuul-info/inventory.yaml) run (successfully) on "RAX Flex", which has multiple interfaces. [06:10]
<tonyb> I don't think there is any way to explicitly request a node on that (RAX Classic) provider. [06:15]
<tonyb> hemanth: I suggest creating a modified version of your change that removes all jobs from: https://opendev.org/openstack/sunbeam-charms/src/branch/main/zuul.d/project-templates.yaml#L40 apart from charm-build-glance-k8s. Then you can recheck on that. [06:18]
<tonyb> Once you see it's running on RAX Classic (near the top of the job-output/console stream), look for the "Print node information" task. Assuming an admin is online we can add an autohold for you [06:20]
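A rough sketch of what that trimmed-down debug change could look like; the stanza layout is a guess, not the exact content of the linked project-templates.yaml:

    # Hypothetical stripped-down check list for the debug change only.
    - project:
        check:
          jobs:
            - charm-build-glance-k8s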
<hemanth> I moved the builds to ubuntu-noble and i do not see any issue .. https://zuul.openstack.org/status?change=964205 [06:27]
<hemanth> https://review.opendev.org/c/openstack/sunbeam-charms/+/964205/2#message-8f3e9d694cff4b7e7badd61f615e41e2565e0f45 [06:31]
<tonyb> You got a little lucky in that you landed on RAX Flex [06:32]
<hemanth> ok.. let me start one just with charm-build-glance-k8s [06:34]
<tonyb> Okay, I'll discard mine [06:37]
<hemanth> I triggered one - change id 964208 [06:37]
<tonyb> hemanth: What's the change number? [06:38]
<hemanth> The job started on the rax classic one [06:43]
<hemanth> Can you autohold, and is it possible for me to ssh into the node? [06:45]
<fungi> tonyb: ^ (in case you weren't watching closely) [06:46]
<tonyb> Added, sorry I was trying to get the hold in place while making a stupid amount of typos [06:47]
<tonyb> https://zuul.opendev.org/t/openstack/autoholds [06:47]
<tonyb> hemanth: ^^ [06:47]
<tonyb> hemanth: Can you send me your ssh public key and I'll add it to the node [06:48]
<hemanth> tonyb: https://launchpad.net/~hemanth-n/+sshkeys .. can you add the last one? [06:50]
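A sketch of how that key could be authorized on the held node with Ansible; the host alias and root user here are assumptions, and note this form would add every key on the Launchpad page rather than only the last one:

    - hosts: held-node
      tasks:
        - name: Authorize hemanth's Launchpad SSH keys for root
          ansible.posix.authorized_key:
            user: root
            key: https://launchpad.net/~hemanth-n/+sshkeys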
<tonyb> Ah, and of course the networking is b0rked [06:51]
<hemanth> :-( i will ping you back when i see it runs again on rax classic [06:52]
<fungi> well, that's sort of to be expected since that's what the job failures were suggesting [06:52]
<tonyb> Yeah :/ [06:52]
<hemanth> i am just thinking how to debug further [06:52]
<tonyb> You could add an explicit "fail" into the playbooks before the charmcraft step [06:53]
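A hedged sketch of that suggestion as an Ansible task dropped into the build playbook right before the charmcraft step (the task wording is invented):

    # Hypothetical stop-point so the autohold captures the node while its
    # networking is still intact.
    - name: Intentionally fail before charmcraft for debugging
      ansible.builtin.fail:
        msg: "Stopping before 'charmcraft pack' so the held node can be inspected"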
<tonyb> gimme 3 to think ... [06:54]
<fungi> booting the node into recovery mode through the rackspace dashboard should be possible, but that may destroy whatever state the playbook has gotten it into [06:54]
<fungi> though could still be enough to collect logs [06:54]
<tonyb> fungi: Do you think I could *maybe* access it via the service-net in that region ... from the mirror node [06:56]
<tonyb> (guessing that only eth0 is b0rked) [06:56]
<fungi> oh, entirely possible but you'd need to copy your keys over [06:56]
<fungi> or do agent forwarding or something [06:56]
<fungi> not really the safest idea [06:57]
<tonyb> Yeah I'd probably do agent forwarding [06:57]
<fungi> though i suppose you could proxy the socket [06:57]
<tonyb> Let me see if I can get the Service-IP, and if it's listening on 22 [06:59]
<tonyb> The first node has been recycled [07:01]
<hemanth> tonyb: 2nd attempt landed on rax-ord-main [07:06]
<tonyb> Okay I got in and killed /usr/bin/snap install --channel latest/stable lxd [07:08]
<tonyb> so the job failed and hopefully we get the autohold [07:08]
<tonyb> I'll see if I can get in via service-net, and then add hemanth's ssh key. [07:09]
<tonyb> hemanth: You'll need to manually walk through the playbook [07:09]
<hemanth> tonyb: ack [07:09]
<tonyb> Okay, service-net does listen on 22 [07:11]
<hemanth> tonyb: let me know when i should be good to ssh [07:18]
<tonyb> Will do, sorry it's taking a little longer to get in than I thought [07:18]
<tonyb> hemanth: root@23.253.164.133 [07:21]
<hemanth> tonyb: thanks, i am in [07:22]
<tonyb> Okay [07:22]
<tonyb> you should be able to sudo -i -u zuul su - (or similar) [07:22]
<tonyb> and then work through the playbook [07:22]
<hemanth> ack [07:22]
<tonyb> *hopefully*, if the network dies again it will only impact the public network [07:23]
<tonyb> I'm running a mix of ip/iptables commands (with local logging) "just in case" [07:25]
<hemanth> ack [07:26]
<tonyb> the lxdbr0 is still on 10.x, is that expected? [07:27]
<hemanth> yes, with the current one yes.. [07:28]
<tonyb> ++ [07:28]
<hemanth> i can change manually.. lemme do that [07:28]
<tonyb> No it's fine [07:28]
<tonyb> I just wanted to check what I'm seeing [07:28]
<tonyb> the routes look okay [07:28]
<hemanth> started the charmcraft pack command [07:32]
<tonyb> ++ [07:33]
<hemanth> is the connection gone for you? [07:34]
<tonyb> I still have access on service-net [07:35]
<hemanth> I can no longer access it [07:36]
<tonyb> Okay, it's lost the default route [07:36]
<hemanth> I added the following rule before running charmcraft pack: nft insert rule filter openstack-INPUT iif lxdbr0 accept [07:36]
<hemanth> as per the playbook.. to get network access to lxd [07:37]
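The rule quoted above, expressed as a playbook task for reference; this is a sketch, not the exact task from the sunbeam-charms playbooks:

    # Allow traffic arriving from the LXD bridge through the host firewall so
    # the containers can reach the network (the step hemanth ran by hand).
    - name: Accept input from lxdbr0 in the openstack-INPUT chain
      ansible.builtin.command: nft insert rule filter openstack-INPUT iif lxdbr0 accept
      become: true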
<tonyb> eth0 no longer has an IP and is marked as DOWN [07:37]
<tonyb> Good: https://paste.opendev.org/show/buP2Y3YFx7fbLhAfd2e3/ [07:38]
<tonyb> Now: https://paste.opendev.org/show/bFGMj25zQox3PexkP6o6/ [07:39]
<hemanth> anything in syslog that tells why it happened? [07:39]
<tonyb> Anything in the charmcraft pack step that would down the host's network? [07:39]
*** ykarel_ is now known as ykarel [07:40]
<tonyb> https://paste.opendev.org/show/bk4ZjLzFAvs00bGZU0BM/ [07:40]
<hemanth> ideally it shouldn't touch the host... the only thing that puzzles me is that the interface name defined within the container is eth0, which is set in the lxc profile.. sudo lxc profile show default [07:40]
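For context, the nic device in a stock LXD default profile usually looks roughly like this (values vary per setup; this is what `lxc profile show default` typically reports, not a capture from the held node):

    devices:
      eth0:
        name: eth0        # interface name inside the container
        network: lxdbr0
        type: nic
      root:
        path: /
        pool: default
        type: disk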
<tonyb> Okay, I have run ifup eth0, you should be able to get back in [07:42]
<hemanth> ok [07:42]
<hemanth> I will run some experiments if you dont mind... [07:43]
<tonyb> https://paste.opendev.org/show/bTCmjYHDdgYcoNUjOe4Q/ [07:44]
<tonyb> Looks like the snap shut down eth0 [07:44]
<hemanth> yeah i want to change the lxd interface name to something other than eth0 [07:44]
<tonyb> Okay [07:44]
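One hedged way to do that rename on the default profile, again as a playbook task; the device key and the new name "eth1" are assumptions, and older LXD releases may expect the key and value as separate arguments:

    # Sketch: rename the container-side nic in the default profile so it no
    # longer shares the host's "eth0" name.
    - name: Rename the LXD profile nic device's in-container name
      ansible.builtin.command: lxc profile device set default eth0 name=eth1
      become: true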
<hemanth> I tried to remove lxd and my connection was lost again [07:45]
<hemanth> can you ifup eth0 [07:45]
<tonyb> done [07:45]
<hemanth> I will get back to you in 15 minutes [07:46]
<tonyb> Okay [07:46]
<hemanth> tonyb: I think it's an issue with charmcraft when the host interface name is eth0 ... I will debug by creating an environment with that combination.. I do not have a good workaround yet.. but it seems this is not a problem with ubuntu-noble, where I see the tests ran on rax-iad-main but the interface names are enx0, enx1 ... please release the node [08:29]
<hemanth> and thank you for all the support [08:30]
<tonyb> np, good luck [08:30]
<tonyb> you know where I am / we are if you need more work [08:30]
<hemanth> yeah sure, and thanks for the patience [08:30]
<tonyb> All good. [08:32]
<hemanth> tonyb: did you happen to save the iptables rules? If you have them, can you provide that information as well [09:04]
<tonyb> Sorry, I cleaned them up, thinking they weren't needed anymore [09:05]
<hemanth> ack.. thats fine [09:05]
<tonyb> hemanth: Let me check something [09:06]
<tonyb> I thought they might be in: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a15/openstack/a15b8e30ca57484c8f33eac2cfc352f4/zuul-info/zuul-info.ubuntu-jammy.txt [09:07]
<tonyb> but it seems not [09:07]
<hemanth> I tried to replicate the eth0 interface on my VM and this time the build is fine.. the only difference in my environment is that there are no specific iptables/nft rules [09:09]
<hemanth> I have similar logs to the upstream one (which you provided) https://paste.opendev.org/show/bTCmjYHDdgYcoNUjOe4Q/ .. except for the iptables drop .. the eth0 interface referred to there is the container's eth0 interface [09:10]
<tonyb> Well you can use the dummy change to dump them on a new run. You could also add a hard fail right before the charmcraft pack and then try again [09:11]
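A hypothetical pre-charmcraft task for that dummy change which would save the firewall state into the job's collected logs; the zuul-output path is the usual convention for log collection in these jobs, but treat it as an assumption:

    # Capture the host firewall state so it ends up in the uploaded job logs.
    - name: Record iptables and nft rulesets before charmcraft runs
      ansible.builtin.shell: |
        iptables-save > /home/zuul/zuul-output/logs/iptables-save.txt
        nft list ruleset > /home/zuul/zuul-output/logs/nft-ruleset.txt
      become: true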
<hemanth> I moved the jobs to ubuntu-noble and triggered those jobs in change 962366 .. seems fine for now.. (mostly due to the different interface name on the host.. cross-checked a couple of instances on rax classic) [09:12]
<tonyb> now that we know roughly where/why it happens [09:12]
<tonyb> or if it makes no real difference to the output you could just switch to noble; as you say, that changes the interface name on the host [09:13]
<tonyb> I guess at this stage it's a matter of how much you want to debug this/understand the underlying problem [09:13]
<hemanth> yeah i will watch some builds today but will come back to jammy tomorrow, since I'm curious what causes the iptables drop [09:13]
<tonyb> Well the DROPS in https://paste.opendev.org/show/bTCmjYHDdgYcoNUjOe4Q/ are just random checks to see if there is a webserver on the node [09:18]
<hemanth> ack [09:19]
<tonyb> I don't think they're the cause [09:19]
<tonyb> It's the 'Oct 16 07:32:55 ubuntu kernel: [ 1680.186759] eth0: renamed from physyLxJCZ' that I thought was more interesting [09:20]
<hemanth> i am retesting again locally with a few settings.. [09:21]
<tonyb> Okay [09:21]
<tonyb> I'm probably going to sign off soon [09:22]
<hemanth> \o [09:23]
*** bauzas9 is now known as bauzas [19:35]
<opendevreview> Tony Breeds proposed openstack/project-config master: [pti-python-tarball] Add compatibility for older wheels  https://review.opendev.org/c/openstack/project-config/+/964251 [20:12]
<opendevreview> Tony Breeds proposed openstack/project-config master: [pti-python-tarball] Add compatibility for older wheels  https://review.opendev.org/c/openstack/project-config/+/964251 [21:07]
