sorbal | Hi, I have an issue with the octavia health manager rebuilding the amphorae VMs over and over while the loadbalancer is working fine and I can access my worker. I am getting the following in the amphorae syslog: WARNING octavia.amphorae.backends.health_daemon.health_daemon [-] Unable to query the HAProxy stats (/var/lib/octavia/<id>.sock) due to: not enough values to unpack (expected 2, got 1) and the following on my | 12:48 |
---|---|---|
sorbal | octavia-health-manager logs: Amphora <id> health message reports 0 listeners when 1 expected Waiting for 1 failovers to finish | 12:48 |
sorbal | Do you have any suggestions of where I should look to find the issue besides those two logs and the haproxy journalctl that doesn't show any errors? | 12:49 |
sorbal | I also tried echo "show info" | socat on the haproxy socket but the stats seem normal | 12:50 |
johnsom | sorbal That is a new one to me. Can you tell us what OS and version the Amphora is running, what version of HAProxy is (haproxy -v) in the Amphora, and what the ID in /opt/amphora-agent.gitref is? | 14:27 |
sorbal | johnsom Amphora is Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-107-generic x86_64), Haproxy is 2.4.24. I am still looking into the issue. I removed some additions I have made to the amphora, specifically a SPOE filter for haproxy, and I don't get the warning. There seems to be an issue with the show_stat() func in the haproxy_query.py file. It is probably unable to read the format of the stats of the spoe frontend and backend | 14:42 |
sorbal | returned by the 'show stat' command. | 14:42 |
sorbal | However the format of the stats returned by the 'show stat' query on the socket is the same between the original amphora (without spoe) and the one that has the spoe enabled in the haproxy config | 14:44 |
johnsom | Yeah, ok, so 22.04 and 2.4.24 is what we run in all of our test jobs, so yes, I would expect it is related to your local changes. | 14:44 |
sorbal | I am looking into the haproxy_query.HAProxyQuery(), stats_query.show_stat(), stats_query.get_pool_status() now to see if there is something that could throw the " not enough values to unpack (expected 2, got 1)" exception | 14:45 |
johnsom | I would log the output of this: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L97 | 14:45 |
sorbal | yes I am looking at that method now, I will try to debug it by adding some extra logs. | 14:49 |
sorbal | I am guessing maybe the line 102: list_results = results[2:].split('\n') is not happy about the format of the results when trying to split it | 14:49 |
johnsom | That would be my guess | 14:50 |
opendevreview | Michael Johnson proposed openstack/octavia master: Fix failover when using SRIOV VIP https://review.opendev.org/c/openstack/octavia/+/919974 | 16:49 |
sorbal | johnsom found the issue on https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L136 | 17:53 |
johnsom | So your code modified the pool or listener ID somehow? | 17:54 |
sorbal | my spoa backend and server don't have a pxname with ":" and unique ids, because I use the same spoa backend for all the listeners. So I just gave it the name "coraza-spoa", so there is no ":" to split on. | 17:55 |
sorbal | The listeners and backends of octavia remain the same but it had an issue with the name of my spoa backend | 17:56 |
sorbal | I guess i have to do something like the line above "if 'prometheus' in line['pxname']: continue" | 17:57 |
sorbal | Just wondering though, wouldn't it be best if octavia handled stuff like that more gracefully instead of falling into an infinite loop? Do you think it would be best to log a warning or something and ignore backends that don't have a ":" in their name like octavia currently ignores prometheus? | 18:00 |
johnsom | Well, the amphora is a highly managed system. We do not expect there to be things in that configuration that are not put there by the amphora driver. | 18:02 |
johnsom | As for the loop, the health manager is doing exactly what it is supposed to do. It was finding a broken amphora and attempting to repair it. | 18:02 |
johnsom | What are you attempting to do with your SPOA? | 18:03 |
sorbal | I am developing my own amphora image elements and amphora driver in order to create a way to enable a WAF on any listener you want from the cli. | 18:04 |
johnsom | Ah, I see. That is on our feature roadmap | 18:04 |
sorbal | Well this is my uni project, but I will look into the roadmap, I didn't know | 18:05 |
johnsom | It's not scheduled to be worked on any time soon | 18:05 |
sorbal | Well I am sure I will have something simple working soon but I am sure it will need a lot of polishing to be worthy of a contribution | 18:07 |
johnsom | Yeah, cool. Let us know. Are you going down the ModSecurity path? | 18:07 |
johnsom | Coraza? | 18:07 |
sorbal | I tried, for some reason I couldn't get it to work with modsecurity. I haven't ruled it out, but it is throwing random deny codes (overflowed ints) and I can't see why | 18:08 |
sorbal | So yes coraza is working great for now | 18:08 |
johnsom | Nice, ModSecurity is EOL anyway I think | 18:09 |
sorbal | It is although I see some people from the owasp community want to keep it alive, they have a slack channel about it | 18:09 |
sorbal | I wanted to look into openappsec too but I will not have the time I am afraid | 18:10 |
johnsom | I think Coraza is a good path | 18:10 |
sorbal | think so too, thanks for your help johnsom | 18:12 |
johnsom | Sure, NP | 18:12 |
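
The failure discussed above comes down to unpacking `pxname.split(':')` into two names when the proxy name has no ":" in it. The following is a minimal sketch of that behaviour and of the "skip names without a colon" idea sorbal raised, not Octavia's actual code: `SAMPLE_STATS`, `parse_show_stat()` and `pool_status()` are hypothetical names, and the sample is an invented `show stat` output trimmed to three columns.

```python
import csv
import io

# Hypothetical, trimmed `show stat` CSV output (real output has many more
# columns). The "coraza-spoa" proxy has no ":" in its name.
SAMPLE_STATS = (
    "# pxname,svname,status\n"
    "pool-1:listener-1,member-1,UP\n"
    "pool-1:listener-1,BACKEND,UP\n"
    "coraza-spoa,spoa-server,UP\n"
    "coraza-spoa,BACKEND,UP\n"
)


def parse_show_stat(raw):
    """Parse raw CSV stats into a list of dicts, dropping the leading '# '."""
    return list(csv.DictReader(io.StringIO(raw[2:])))


def pool_status(rows):
    """Group rows by pool; collect proxy names that are not '<pool>:<listener>'.

    Assumption for this sketch: such proxies are skipped rather than treated
    as an error, similar in spirit to the existing 'prometheus' check.
    """
    pools = {}
    skipped = []
    for row in rows:
        # partition() never raises; sep is '' when there is no ":" in the name.
        pool_id, sep, listener_id = row["pxname"].partition(":")
        if not sep:
            if row["pxname"] not in skipped:
                skipped.append(row["pxname"])
            continue
        pool = pools.setdefault(
            pool_id, {"listener": listener_id, "status": None, "members": {}}
        )
        if row["svname"] == "BACKEND":
            pool["status"] = row["status"]
        else:
            pool["members"][row["svname"]] = row["status"]
    return pools, skipped


if __name__ == "__main__":
    # The strict two-value unpack raises the error seen in the amphora syslog.
    try:
        pool_id, listener_id = "coraza-spoa".split(":")
    except ValueError as exc:
        print("strict unpack failed:", exc)

    pools, skipped = pool_status(parse_show_stat(SAMPLE_STATS))
    print(pools)
    print("skipped:", skipped)
```

Whether skipping is the right policy is a separate question: as johnsom notes, the amphora is expected to contain only what the amphora driver put there, so a custom image may be the more appropriate place for a filter like this.
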