Friday, 2024-05-17

sorbal	Hi, I have an issue with the Octavia health manager rebuilding the amphora VMs over and over, even though the load balancer is working fine and I can access my worker. I am getting the following in the amphora syslog: WARNING octavia.amphorae.backends.health_daemon.health_daemon [-] Unable to query the HAProxy stats (/var/lib/octavia/<id>.sock) due to: not enough values to unpack (expected 2, got 1) and the following in my	12:48
sorbal	octavia-health-manager logs: Amphora <id> health message reports 0 listeners when 1 expected / Waiting for 1 failovers to finish	12:48
sorbal	Do you have any suggestions on where else I should look to find the issue, besides those two logs and the haproxy journalctl output, which doesn't show any errors?	12:49
sorbal	I also tried echo "show info" | socat on the HAProxy socket, but the stats seem normal	12:50
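For reference, the socat check in the message above can also be done from Python with only the standard library. This is a minimal sketch, not Octavia's HAProxyQuery code; `query_haproxy_socket` is a hypothetical helper name, and the socket path keeps the `<id>` placeholder from the log.

```python
import socket


def query_haproxy_socket(sock_path: str, command: str, bufsize: int = 4096) -> str:
    """Send one command to HAProxy's runtime API socket and return the reply.

    Roughly what `echo "show stat" | socat stdio <sock_path>` does: connect,
    send the command terminated by a newline, then read until HAProxy closes
    the connection (it does so after answering in non-interactive mode).
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(bufsize)
            if not chunk:  # empty read: HAProxy closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage (inside the amphora; substitute the real socket name for <id>):
# print(query_haproxy_socket("/var/lib/octavia/<id>.sock", "show stat"))
```

Comparing this raw `show stat` output with and without the SPOE filter is a quick way to see which proxy names the agent will have to parse.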
johnsom	sorbal That is a new one to me. Can you tell us what OS and version the Amphora is running, what version of HAProxy (haproxy -v) is in the Amphora, and what the ID in /opt/amphora-agent.gitref is?	14:27
sorbal	johnsom Amphora is Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-107-generic x86_64), HAProxy is 2.4.24. I am still looking into the issue. I removed some additions I had made to the amphora, specifically a SPOE filter for HAProxy, and now I don't get the warning. There seems to be an issue with the show_stat() function in the haproxy_query.py file. It is probably unable to read the format of the stats for the SPOE frontend and backend	14:42
sorbal	returned by the 'show stat' command.	14:42
sorbal	However, the format of the stats returned by the 'show stat' query on the socket is the same for the original amphora (without SPOE) and the one with SPOE enabled in the HAProxy config	14:44
johnsom	Yeah, ok, so 22.04 and 2.4.24 are what we run in all of our test jobs, so yes, I would expect it is related to your local changes.	14:44
sorbal	I am looking into haproxy_query.HAProxyQuery(), stats_query.show_stat(), and stats_query.get_pool_status() now to see if anything there could raise the "not enough values to unpack (expected 2, got 1)" exception	14:45
johnsom	I would log the output of this: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L97	14:45
sorbal	Yes, I am looking at that method now; I will try to debug it by adding some extra logging.	14:49
sorbal	My guess is that line 102, list_results = results[2:].split('\n'), is not happy with the format of the results when it tries to split them	14:49
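To see what that line is doing, here is a minimal sketch of the parsing step, assuming HAProxy's usual `show stat` CSV layout (the header line is prefixed with "# " and each row ends with a trailing comma). `parse_show_stat` is a hypothetical helper, not Octavia's code. Note that this step on its own accepts a pxname such as "coraza-spoa" without complaint, so the exception has to come from code that later expects a ":" in the name.

```python
import csv


def parse_show_stat(raw: str) -> list[dict[str, str]]:
    """Turn raw 'show stat' output into one dict per CSV row.

    HAProxy prefixes the CSV header with "# ", so dropping the first two
    characters leaves a plain header that csv.DictReader uses as keys.
    DictReader skips the blank line(s) HAProxy sends at the end.
    """
    lines = raw[2:].split("\n")
    return list(csv.DictReader(lines))


sample = "# pxname,svname,status,\nabc:def,BACKEND,UP,\ncoraza-spoa,BACKEND,UP,\n\n"
for row in parse_show_stat(sample):
    print(row["pxname"], row["svname"], row["status"])
```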
johnsom	That would be my guess	14:50
opendevreview	Michael Johnson proposed openstack/octavia master: Fix failover when using SRIOV VIP https://review.opendev.org/c/openstack/octavia/+/919974	16:49
sorbal	johnsom: found the issue at https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/utils/haproxy_query.py#L136	17:53
johnsom	So your code modified the pool or listener ID somehow?	17:54
sorbal	My SPOA backend and server don't have a pxname containing ":" and unique IDs, because I use the same SPOA backend for all the listeners. I just named it "coraza-spoa", so there is no ":" to split on.	17:55
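The failure can be reproduced in isolation: unpacking the result of `str.split(":")` into two names raises exactly the ValueError seen in the amphora syslog when the name contains no ":". `split_pxname` is a hypothetical stand-in for illustration, not the actual code at haproxy_query.py#L136; it only assumes, per the message above, that Octavia's own proxy names contain a ":" between IDs.

```python
def split_pxname(pxname: str) -> tuple[str, str]:
    """Hypothetical stand-in for a two-name unpack of an '<id>:<id>' pxname."""
    first_id, second_id = pxname.split(":")  # fails unless there is exactly one ':'
    return first_id, second_id


print(split_pxname("aaa:bbb"))  # ('aaa', 'bbb')

try:
    split_pxname("coraza-spoa")
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 2, got 1)
```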
sorbal	Octavia's own listeners and backends are unchanged; the code just couldn't handle the name of my SPOA backend.	17:56
sorbal	I guess I have to do something like the line above it: "if 'prometheus' in line['pxname']: continue"	17:57
sorbal	Just wondering, though: wouldn't it be better if Octavia handled things like that more gracefully instead of falling into an endless failover loop? Do you think it would be best to log a warning and ignore backends that don't have a ":" in their name, like Octavia currently ignores prometheus?	18:00
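A minimal sketch of the "log a warning and ignore it" idea, applied to rows shaped like parsed `show stat` output. `split_octavia_backends` and `IGNORED_PROXIES` are hypothetical names; the substring check mirrors the prometheus line quoted above, and the returned mapping is made up for illustration rather than the structure Octavia's get_pool_status() builds.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical skip list: 'prometheus' mirrors the check quoted in the chat,
# 'coraza-spoa' is the custom SPOA backend from this discussion.
IGNORED_PROXIES = ("prometheus", "coraza-spoa")


def split_octavia_backends(rows: list[dict[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each '<id>:<id>' BACKEND row to its two IDs, skipping foreign proxies."""
    result = {}
    for row in rows:
        pxname = row["pxname"]
        if any(name in pxname for name in IGNORED_PROXIES):
            continue  # known non-Octavia proxy: skip silently
        if row["svname"] != "BACKEND":
            continue  # only the per-backend summary rows matter here
        if ":" not in pxname:
            # Unknown proxy without the expected separator: warn instead of crashing.
            LOG.warning("Ignoring proxy %r: name has no ':' separator", pxname)
            continue
        first_id, second_id = pxname.split(":", 1)
        result[pxname] = (first_id, second_id)
    return result
```

Whether upstream should tolerate foreign proxies at all is a separate question: as johnsom replies next, the amphora is expected to contain only what the amphora driver puts there, so for a custom image the local fix is extending the skip check.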
johnsom	Well, the amphora is a highly managed system. We do not expect there to be things in that configuration that are not put there by the amphora driver.	18:02
johnsom	As for the loop, the health manager is doing exactly what it is supposed to do. It was finding a broken amphora and attempting to repair it.	18:02
johnsom	What are you attempting to do with your SPOA?	18:03
sorbal	I am developing my own amphora image elements and amphora driver to make it possible to enable a WAF on any listener you want from the CLI.	18:04
johnsom	Ah, I see. That is on our feature roadmap	18:04
sorbal	Well, this is my uni project, but I will look into the roadmap; I didn't know about it.	18:05
johnsom	It's not scheduled to be worked on any time soon	18:05
sorbal	Well, I am sure I will have something simple working soon, but it will need a lot of polishing to be worthy of a contribution	18:07
johnsom	Yeah, cool. Let us know. Are you going down the ModSecurity path?	18:07
johnsom	Coraza?	18:07
sorbal	I tried, but for some reason I couldn't get it to work with ModSecurity. I haven't ruled it out, but it is throwing random deny codes (overflowed ints) and I can't see why	18:08
sorbal	So yes, Coraza is working great for now	18:08
johnsom	Nice, ModSecurity is EOL anyway, I think	18:09
sorbal	It is, although I see some people from the OWASP community want to keep it alive; they have a Slack channel about it	18:09
sorbal	I wanted to look into openappsec too, but I am afraid I will not have the time	18:10
johnsom	I think Coraza is a good path	18:10
sorbal	I think so too, thanks for your help johnsom	18:12
johnsom	Sure, NP	18:12
