Saturday, 2026-06-27

mikal: I think I've found another flaky CI issue -- we're seeing libvirt timeouts and connection drops because libvirt hasn't been tuned to allow the number of connections we're making. I am working on a patch for it now. 01:36
mikal: It's why CI on https://review.opendev.org/c/openstack/kolla-ansible/+/994603 keeps failing. 01:36
opendevreview: Michael Still proposed openstack/kolla-ansible master: Tune libvirtd connection limits for OpenStack. https://review.opendev.org/c/openstack/kolla-ansible/+/995171 04:29
mikal: ^--- this attempts to address libvirt "flakiness" I am seeing in CI. Basically nova-compute's connections to libvirt are being rate limited, which particularly sucks when a bunch of VMs are being started. 04:30
blanson[m]: oh, those weren't tuned 07:24
blanson[m]: thank you! will take a look, but from experience I think that's generally something we want people to be able to set for their env 07:25
blanson[m]: so win-win :D 07:25
mikal: I cannot find an OpenStack deployment tool which does tune them, but the libvirt authors clearly intended us to. So yeah, Kolla for the win I suppose? 07:45
mikal: We should definitely backport this one if people are ok with doing it in master. 07:46
*** jhorstmann is now known as Guest12120 09:14
blanson[m]: in production we had to reach some level of scale before these became an issue, so I guess not a lot of people end up having to tweak them. it's noticeable in CI though, because tempest does tempest things 10:00
blanson[m]: I agree a backport would be good; it needs a bug report, but otherwise +2 10:01
frickler: I'm just wondering what changed to make the issue so prominent right now. the limits in libvirtd aren't new, are they? 11:10
Vii_: Yeah, the libvirtd limits themselves aren't new. My guess is that something changed in the libvirt/nova stack (or simply the timing of operations), so max_client_requests=5 is now hit much more often under concurrent builds. 11:40
Vii_: Since we're only seeing this on Debian Trixie and not on Rocky or Ubuntu, it also points towards a package/version difference rather than the limit itself suddenly becoming an issue. 11:41
Vii_: ubuntu: metadata: {'Python': '3.12.3', 11:45
Vii_: debian: metadata: {'Python': '3.13.5', 11:46
Vii_: hmmm 11:46
Vii_: From a quick look, it seems this only affects the "upgrade" jobs. 11:47
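
[Editor's sketch] The limit discussed above can be illustrated with a small Python example. It reads `key = <integer>` lines from libvirtd.conf-style text and falls back to the max_client_requests=5 value quoted in the chat when the key is commented out or absent. The sample config text, the value 64, and the "waves" model are illustrative assumptions only; they are not taken from the proposed patch or from libvirt's own documentation.

```python
import re

# Only the default quoted in the discussion is assumed here.
CHAT_QUOTED_DEFAULTS = {"max_client_requests": 5}

# Matches uncommented lines such as "max_client_requests = 64".
_SETTING = re.compile(r"^\s*([A-Za-z_]+)\s*=\s*(\d+)\s*(?:#.*)?$")


def parse_int_settings(conf_text):
    """Return {key: int} for every uncommented integer setting."""
    settings = {}
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def effective_limit(conf_text, key="max_client_requests"):
    """Configured value if present, else the chat-quoted default, else None."""
    return parse_int_settings(conf_text).get(key, CHAT_QUOTED_DEFAULTS.get(key))


def waves_needed(concurrent_requests, per_client_limit):
    """Toy model: sequential batches needed when one client may only have
    per_client_limit requests in flight at a time (ceiling division)."""
    return -(-concurrent_requests // per_client_limit)


# Hypothetical config snippets, for illustration only.
UNTUNED = "#max_client_requests = 5\n"
TUNED = "max_client_requests = 64  # hypothetical tuned value\n"

if __name__ == "__main__":
    for name, text in (("untuned", UNTUNED), ("tuned", TUNED)):
        limit = effective_limit(text)
        print(name, limit, waves_needed(40, limit))
```

Under this toy model, a single client such as nova-compute issuing 40 requests at once needs 8 batches at a limit of 5 and 1 batch at the hypothetical limit of 64, which is one plausible reason concurrent builds in CI would hit the limit more visibly than a quiet deployment.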
opendevreview: Verification of a change to openstack/kolla-ansible stable/2025.1 failed: proxysql: fix keepalived healthcheck to use MySQL protocol https://review.opendev.org/c/openstack/kolla-ansible/+/992405 12:02
opendevreview: Verification of a change to openstack/kolla-ansible stable/2025.1 failed: proxysql: fix keepalived healthcheck to use MySQL protocol https://review.opendev.org/c/openstack/kolla-ansible/+/992405 13:40
mikal: Do we have a way to search historic zuul logs? Is it possible we've been masking the problem for a while with rechecks? 20:22
mikal: The short answer is I don't know why it surfaces now, I just know it's relatively easy to fix and the libvirt team has fairly clear recommendations on how to tune it. 20:22
blanson[m]: I think there is, because I've seen people pull logs out of nowhere (once or twice in the neutron channel when I asked questions), but I have no idea how that works 20:28
blanson[m]: maybe someone with more experience can answer this one 20:28
mikal: There is an OpenSearch instance at https://opensearch.logs.openstack.org/, but I cannot get my query to work. I am certainly not an Elasticsearch expert, however. 23:12
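
[Editor's sketch] For the log search question above, here is a minimal Python example that builds a request body in the OpenSearch query DSL: a phrase match combined with a time-range filter, newest hits first. The field names `message` and `@timestamp`, the search phrase, and the 30-day window are assumptions, since the actual index layout of the service mentioned in the chat is not described there. The body would need adapting to the real field names before use, and the sketch does not contact any server.

```python
import json


def build_log_query(phrase, days_back=30, message_field="message",
                    time_field="@timestamp", size=20):
    """Build a query DSL body: exact phrase in the message field,
    restricted to the last `days_back` days, sorted newest first.
    Field names are hypothetical and must match the real index mapping."""
    return {
        "size": size,
        "sort": [{time_field: {"order": "desc"}}],
        "query": {
            "bool": {
                # Scored clause: the log line must contain the phrase.
                "must": [{"match_phrase": {message_field: phrase}}],
                # Unscored clause: date-math range relative to now.
                "filter": [
                    {"range": {time_field: {"gte": f"now-{days_back}d"}}}
                ],
            }
        },
    }


if __name__ == "__main__":
    # "libvirt" is only a placeholder search phrase.
    print(json.dumps(build_log_query("libvirt"), indent=2))
```

A body like this could help answer whether the failures predate the recent rechecks, by widening `days_back` and comparing hit counts, assuming the older job logs are still indexed.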
