| mikal | I think I've found another flaky CI issue -- we're seeing libvirt timeouts and connection drops because libvirt hasn't been tuned to allow the number of connections we're doing. I am working on a patch for it now. | 01:36 |
|---|---|---|
| mikal | It's why CI on https://review.opendev.org/c/openstack/kolla-ansible/+/994603 keeps failing. | 01:36 |
| opendevreview | Michael Still proposed openstack/kolla-ansible master: Tune libvirtd connection limits for OpenStack. https://review.opendev.org/c/openstack/kolla-ansible/+/995171 | 04:29 |
| mikal | ^--- this attempts to address libvirt "flakiness" I am seeing in CI. Basically nova-compute's connections to libvirt are being rate limited, which particularly sucks when a bunch of VMs are being started. | 04:30 |
| blanson[m] | oh those weren't tuned | 07:24 |
| blanson[m] | thank you! will take a look but from exp I think that's generally something we want people to be able to set for their env | 07:25 |
| blanson[m] | so win-win :D | 07:25 |
| mikal | I cannot find an openstack deployment tool which does tune them, but the libvirt authors clearly intended us to. So yeah, Kolla for the win I suppose? | 07:45 |
| mikal | We should definitely backport this one if people are ok with doing it in master. | 07:46 |
| | *** jhorstmann is now known as Guest12120 | 09:14 |
| blanson[m] | in production we had to reach some level of scale before these became an issue so I guess not a lot of people end up having to tweak. it's noticeable in CI tho cause tempest does tempest things | 10:00 |
| blanson[m] | I agree backport would be good, it needs a bug report for it but otherwise +2 | 10:01 |
| frickler | I'm just wondering what changed to make the issue so prominent right now. the limits in libvirtd aren't new, are they? | 11:10 |
| Vii_ | Yeah, the libvirtd limits themselves aren't new. My guess is that something changed in the libvirt/nova stack (or simply the timing of operations), so max_client_requests=5 is now hit much more often under concurrent builds. | 11:40 |
| Vii_ | Since we're only seeing this on Debian Trixie and not on Rocky or Ubuntu, it also points towards a package/version difference rather than the limit itself suddenly becoming an issue | 11:41 |
| Vii_ | ubuntu: metadata: {'Python': '3.12.3', | 11:45 |
| Vii_ | debian: metadata: {'Python': '3.13.5', | 11:46 |
| Vii_ | hmmm | 11:46 |
| Vii_ | From a quick look, it seems this only affects the "upgrade" jobs | 11:47 |
| opendevreview | Verification of a change to openstack/kolla-ansible stable/2025.1 failed: proxysql: fix keepalived healthcheck to use MySQL protocol https://review.opendev.org/c/openstack/kolla-ansible/+/992405 | 12:02 |
| opendevreview | Verification of a change to openstack/kolla-ansible stable/2025.1 failed: proxysql: fix keepalived healthcheck to use MySQL protocol https://review.opendev.org/c/openstack/kolla-ansible/+/992405 | 13:40 |
| mikal | Do we have a way to search historic zuul logs? Is it possible we've been masking the problem for a while with rechecks? | 20:22 |
| mikal | The short answer is I don't know why it surfaces now, I just know it's relatively easy to fix and the libvirt team has fairly clear recommendations on how to tune it. | 20:22 |
| blanson[m] | i think there is cause i've seen people pull logs out of nowhere (once or twice in the neutron channel when I asked questions) but I have no idea how that works | 20:28 |
| blanson[m] | maybe someone with more exp can answer this one | 20:28 |
| mikal | There is an elastic search at https://opensearch.logs.openstack.org/, but I cannot get my query to work. I am certainly not an elasticsearch expert however. | 23:12 |
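
For readers following the discussion above, here is a rough sketch of what tuning libvirtd connection limits could look like. It only renders `key = value` lines in `libvirtd.conf` syntax. The option names are standard libvirtd settings, and `max_client_requests` is the one quoted in the log at its value of 5. Every number below is a made-up placeholder, and the helper `render_libvirtd_conf` is hypothetical; neither is taken from the proposed kolla-ansible change or from libvirt's recommendations.

```python
# Placeholder limits for illustration only. These numbers are assumptions,
# not the values used in the proposed patch.
LIBVIRTD_LIMITS = {
    "max_clients": 200,          # concurrent client connections
    "max_queued_clients": 200,   # connections waiting to be accepted
    "max_workers": 40,           # upper bound on worker threads
    "max_client_requests": 50,   # concurrent requests per client (5 in the log)
}


def render_libvirtd_conf(limits):
    """Render integer limits as `key = value` lines, sorted by key."""
    lines = []
    for key, value in sorted(limits.items()):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_libvirtd_conf(LIBVIRTD_LIMITS), end="")
```

A deployment tool would typically template these lines into the libvirtd configuration and expose the values as operator-overridable variables, which matches the point made in the channel that people will want to set them for their own environment.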