Friday, 2024-05-17

sorbal: Hi, I have an issue with the Octavia health manager rebuilding the amphora VMs over and over, while the load balancer is working fine and I can access my worker. I am getting the following in the amphora syslog: WARNING octavia.amphorae.backends.health_daemon.health_daemon [-] Unable to query the HAProxy stats (/var/lib/octavia/<id>.sock) due to: not enough values to unpack (expected 2, got 1) [12:48]
sorbal: and the following in my octavia-health-manager logs: Amphora <id> health message reports 0 listeners when 1 expected Waiting for 1 failovers to finish [12:48]
sorbal: Do you have any suggestions for where I should look to find the issue, besides those two logs and the haproxy journalctl output, which doesn't show any errors? [12:49]
sorbal: I also tried echo "show info" | socat on the haproxy socket, but the stats seem normal [12:50]
johnsom: sorbal That is a new one to me. Can you tell us what OS and version the Amphora is running, what version of HAProxy (haproxy -v) is in the Amphora, and what the ID in /opt/amphora-agent.gitref is? [14:27]
sorbal: johnsom Amphora is Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-107-generic x86_64), HAProxy is 2.4.24. I am still looking into the issue. I removed some additions I had made to the amphora, specifically a SPOE filter for haproxy, and I no longer get the warning. There seems to be an issue with the show_stat() func in the haproxy_query.py file. It is probably unable to read the format of the stats of the SPOE frontend and backend [14:42]
sorbal: returned by the 'show stat' command. [14:42]
sorbal: However, the format of the stats returned by the 'show stat' query on the socket is the same between the original amphora (without SPOE) and the one that has SPOE enabled in the haproxy config [14:44]
johnsom: Yeah, ok, so 22.04 and 2.4.24 is what we run in all of our test jobs, so yes, I would expect it is related to your local changes. [14:44]
sorbal: I am looking into haproxy_query.HAProxyQuery(), stats_query.show_stat(), and stats_query.get_pool_status() now to see if there is something that could throw the "not enough values to unpack (expected 2, got 1)" exception [14:45]
johnsom: I would log the output of this: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L97 [14:45]
sorbal: Yes, I am looking at that method now. I will try to debug it by adding some extra logs. [14:49]
sorbal: I am guessing that line 102, list_results = results[2:].split('\n'), is not happy about the format of the results when trying to split them [14:49]
johnsom: That would be my guess [14:50]
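The parsing step under discussion can be sketched roughly as follows. This is a minimal illustration, not Octavia's actual code: the sample text is made up and trimmed to three columns (real `show stat` output has many more), the IDs are placeholders, and `parse_show_stat` is a hypothetical helper. It assumes the CSV header line starts with `# `, which is why the first two characters are dropped before the lines are passed to `csv.DictReader`.

```python
import csv

# Made-up, trimmed sample; real "show stat" output has many more columns.
SAMPLE = (
    "# pxname,svname,status\n"
    "id-b,FRONTEND,OPEN\n"
    "id-a:id-b,member-1,UP\n"
    "id-a:id-b,BACKEND,UP\n"
)


def parse_show_stat(results):
    """Drop the leading '# ' from the header, then parse the rows as CSV."""
    list_results = results[2:].split("\n")
    # DictReader uses the first line as field names and skips blank lines.
    return list(csv.DictReader(list_results))


rows = parse_show_stat(SAMPLE)
for row in rows:
    print(row["pxname"], row["svname"], row["status"])
```

If this step succeeds on the real output, the split itself is probably not the culprit, and the later per-row handling is the next place to look.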
opendevreview: Michael Johnson proposed openstack/octavia master: Fix failover when using SRIOV VIP https://review.opendev.org/c/openstack/octavia/+/919974 [16:49]
sorbal: johnsom found the issue at https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L136 [17:53]
johnsom: So your code modified the pool or listener ID somehow? [17:54]
sorbal: My SPOA backend and server don't have a pxname with ":" and unique IDs, because I use the same SPOA backend for all the listeners. So I just gave it the name "coraza-spoa", so there is no ":" to split on. [17:55]
sorbal: The Octavia listeners and backends remain the same, but it had an issue with the name of my SPOA backend [17:56]
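The failure mode described here is easy to reproduce in plain Python. The sketch below is not the Octavia code. It only assumes, as the discussion suggests, that the parser unpacks a pxname into exactly two parts split on ":". `split_pxname` and the `id-a:id-b` name are hypothetical.

```python
def split_pxname(pxname):
    """Unpack a two-part, colon-separated name; raises ValueError otherwise."""
    first, second = pxname.split(":")
    return first, second


# A name with one ":" unpacks cleanly.
print(split_pxname("id-a:id-b"))

# A name without ":" yields a single element, so the unpacking fails.
try:
    split_pxname("coraza-spoa")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The ValueError text matches the "not enough values to unpack (expected 2, got 1)" warning reported in the amphora syslog.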
sorbal: I guess I have to do something like the line above: "if 'prometheus' in line['pxname']: continue" [17:57]
sorbal: Just wondering though, wouldn't it be better if Octavia handled things like that more gracefully instead of falling into an infinite loop? Do you think it would be best to log a warning or something and ignore backends that don't have a ":" in their name, like Octavia currently ignores prometheus? [18:00]
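One way the more tolerant behaviour suggested here could look is sketched below. It is an illustration under stated assumptions, not a proposed Octavia patch: `group_by_pxname`, the row dictionaries, the sample names, and the logger name are all hypothetical, and the rows only mimic parsed `show stat` output. Proxies whose pxname does not split into two parts on ":" are logged and skipped instead of raising.

```python
import logging

LOG = logging.getLogger("pxname_sketch")  # hypothetical logger name


def group_by_pxname(rows):
    """Group server names by pxname, skipping names that are not '<id>:<id>'."""
    grouped = {}
    for row in rows:
        pxname = row["pxname"]
        if "prometheus" in pxname:
            continue  # same kind of skip as the check quoted in the chat
        if len(pxname.split(":")) != 2:
            LOG.warning("Ignoring unmanaged proxy %r", pxname)
            continue
        grouped.setdefault(pxname, []).append(row["svname"])
    return grouped


rows = [
    {"pxname": "id-a:id-b", "svname": "member-1"},
    {"pxname": "coraza-spoa", "svname": "coraza-agent"},
    {"pxname": "prometheus-exporter-internal", "svname": "BACKEND"},
]
result = group_by_pxname(rows)
print(result)
```

Only the `id-a:id-b` entry survives; the SPOA and prometheus proxies are ignored.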
johnsom: Well, the amphora is a highly managed system. We do not expect there to be things in that configuration that are not put there by the amphora driver. [18:02]
johnsom: As for the loop, the health manager is doing exactly what it is supposed to do. It was finding a broken amphora and attempting to repair it. [18:02]
johnsom: What are you attempting to do with your SPOA? [18:03]
sorbal: I am developing my own amphora image elements and amphora driver in order to create a way to enable a WAF on any listener you want from the CLI. [18:04]
johnsom: Ah, I see. That is on our feature roadmap [18:04]
sorbal: Well, this is my uni project, but I will look into the roadmap, I didn't know [18:05]
johnsom: It's not scheduled to be worked on any time soon [18:05]
sorbal: Well, I am sure I will have something simple working soon, but I am sure it will need a lot of polishing to be worthy of a contribution [18:07]
johnsom: Yeah, cool. Let us know. Are you going down the ModSecurity path? [18:07]
johnsom: Coraza? [18:07]
sorbal: I tried, but for some reason I couldn't get it to work with ModSecurity. I haven't ruled it out, but it is throwing random deny codes (overflowed ints) and I can't see why [18:08]
sorbal: So yes, Coraza is working great for now [18:08]
johnsom: Nice, ModSecurity is EOL anyway I think [18:09]
sorbal: It is, although I see some people from the OWASP community want to keep it alive; they have a Slack channel about it [18:09]
sorbal: I wanted to look into openappsec too, but I will not have the time, I am afraid [18:10]
johnsom: I think Coraza is a good path [18:10]
sorbal: Think so too, thanks for your help johnsom [18:12]
johnsom: Sure, NP [18:12]
