Tuesday, 2020-01-21

05:31 <openstackgerrit> Adit Sarfaty proposed openstack/neutron-lbaas stable/stein: Prevent deletion of a listener attached to a pool  https://review.opendev.org/677659
06:05 <openstackgerrit> Carlos Goncalves proposed openstack/octavia-tempest-plugin master: Enable KVM libvirt type on all scenario jobs  https://review.opendev.org/702921
08:07 <openstackgerrit> Merged openstack/octavia master: Allow the Octavia wsgi to accept argv parameters  https://review.opendev.org/701485
12:14 *** tkajinam has joined #openstack-lbaas
14:36 <openstackgerrit> Brian Haley proposed openstack/octavia master: Remove all usage of six library  https://review.opendev.org/701290
18:43 <johnsom> Finally: Flow 'octavia-failover-loadbalancer-flow' (0a5f299e-ba0b-4433-b172-84421b130e3d) transitioned into state 'SUCCESS' from state 'RUNNING'
18:44 <johnsom> It actually works too: <grin>
18:44 <johnsom> Flow 'octavia-failover-loadbalancer-flow' (0a5f299e-ba0b-4433-b172-84421b130e3d) transitioned into state 'SUCCESS' from state 'RUNNING'
18:44 <johnsom> Wrong paste
      <johnsom> stack@devstack:~/octavia$ curl
18:44 <johnsom> Welcome to connection 110942
18:44 <johnsom> I really hate gnome's cut/paste...
18:47 <cgoncalves> mnaser, hey! I've been trying to hit a nodepool instance on vexxhost to specifically test a patch there, with no luck thus far. all I get is rax, ovh and fortnebula. any chance you could help somehow?
18:47 <cgoncalves> patch: https://review.opendev.org/#/c/702921/
18:48 <cgoncalves> johnsom, it will be fun merging that in the amphorav2 driver. not to mention backporting
18:49 <cgoncalves> can't get worse than the single-process patch
18:49 <cgoncalves> flows refactor
18:50 <johnsom> ummm, well, .... Yeah, it will be a bunch of work for sure.
18:58 <johnsom> I guess I have found the longest time fedora 31 can deal with my system load, 49 days. Windows are not all painting, terminal character movement is stuttering, etc....  Time for a reboot.
19:02 <cgoncalves> I had the same issue until some weeks (months?) ago. very sporadic and only affecting me when on multi-screen
19:03 <johnsom> I have seen this before. It seems to be swap related. Once it decides it needs to start swapping, it gets unhappy.
20:09 <cgoncalves> mnaser, never mind. I got one job running on vexxhost. KVM libvirt type + cpu mode host-passthrough work. expected to see better performance, though, but this is good progress :)
20:11 <rm_work> are there any known issues currently with UDP-CONNECT healthchecks?
20:11 <rm_work> Seeing ERROR for all member operating_status, but traffic is still passing to those members <_<
20:12 <rm_work> and apparently even if the members are taken completely down, the LB still tries to pass traffic to them
20:12 <rm_work> and they're still just "ERROR"
20:12 <rm_work> not DOWN
20:12 <johnsom> rm_work, works for me, but you have to have your security groups set up right for the ICMP traffic.
20:12 <rm_work> yeah i'll make sure
20:13 <johnsom> We don't have a "DOWN" status BTW
20:13 <rm_work> but why would the HM allow traffic to an ERROR member?
20:14 <rm_work> could have sworn we did, ok whelp
20:14 <rm_work> but shouldn't it take ERROR members out of rotation?
20:15 <johnsom> yes, in theory
20:15 <johnsom> I would hop on the amp and look at the ipvsadm status
20:15 <johnsom> then back track
20:15 <rm_work> AH ok that's how you do that? I was looking for logs but there didn't seem to be any
20:16 <johnsom> Yeah, the lvs-based UDP functionality is all in the kernel and does not log flows.
20:18 <cgoncalves> FYI, gthiemonge's UDP scenario patch includes a health monitor test for UDP members: https://review.opendev.org/#/c/656515/
20:18 <rm_work> i think it's not listing anything O_o
20:18 <johnsom> Are you in the netns?
20:18 <johnsom> Yeah, pretty sure I sent you all this once before, including the test patch
20:19 <rm_work> ah ok it's netns relevant too
20:19 <rm_work> yeah, so it's listing the one member that's in ERROR according to the HM
20:20 <rm_work> cgoncalves: is that run on both centos and ubuntu amps? could be an OS difference?
20:24 <cgoncalves> rm_work, Cirros in upstream CI and I'm +50% certain gthiemonge also tested on RHEL/CentOS too ;)
20:24 <gthiemonge> yep, ubuntu and centos amps
20:24 <cgoncalves> oh, there's the man!
20:27 <rm_work> yeah if i'm on the Amp, how can i get some insight into what the HM is doing?
20:27 <rm_work> I don't know how the UDP HMs work
20:28 <gthiemonge> I'd use tcpdump (in the amp or on the host) to see if HM is working correctly
20:28 <johnsom> All of the info is in ipvsadm
20:29 <johnsom> I am going to grab lunch, but can maybe answer questions after lunch
20:29 <cgoncalves> rm_work, https://docs.openstack.org/octavia/latest/user/guides/basic-cookbook.html#other-heath-monitors
20:29 <rm_work> hmm k
20:29 <cgoncalves> see "UDP-CONNECT"
20:30 <rm_work> so we're not getting "ONLINE" for a down member
20:30 <rm_work> we're getting ERROR for up OR down members, and the members are remaining in the rotation either way
20:30 <rm_work> if status is ERROR (correctly, OR not) shouldn't it be removed from the rotation?
20:32 <gthiemonge> rm_work: yes it should
20:33 <rm_work> ok, that's what I'm not seeing
20:33 <rm_work> so trying to figure out what the HM is actually seeing, not what our agent is forwarding
20:34 <rm_work> because the agent is obviously forwarding "DOWN"
20:35 * rm_work https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L421-L422
20:36 <rm_work> but ipvsadm shows the down members still in rotation
20:40 <rm_work> (which I guess is where I was thinking of "DOWN" existing)
20:45 *** gcheresh has quit IRC
21:31 <johnsom> So, rm_work, in the amp netns, if you run "ipvsadm --list" it shows all of the healthy members for the VIP (first line).
21:31 <johnsom> Any members detected as DOWN would not appear in that list.
21:32 <rm_work> it's listing the member
21:32 <johnsom> I have this config:
21:32 <johnsom> but once the health check kicks in:
21:33 <johnsom> instance has the ICMP SG rules in to allow the "port unreachable"
21:33 <johnsom> Instance does not allow sending ICMP so it is falsely detecting port 55555 as open
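The false "open" johnsom describes follows from how a UDP connect-style check can tell anything at all: a probed UDP service usually sends nothing back, so the only "down" signal is an ICMP port-unreachable returned by the member. A minimal Python sketch of that logic, assuming a Linux host where a connected UDP socket surfaces the ICMP error as `ConnectionRefusedError` on the next `recv()`; the function name `udp_connect_check` is hypothetical and this is not Octavia's `udp_check.sh`.

```python
import socket


def udp_connect_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical UDP "connect" probe: False only if ICMP says the port is closed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on UDP sends nothing; it just lets the kernel report
        # ICMP errors for this peer back to this socket.
        sock.connect((host, port))
        try:
            sock.send(b"\0")  # one-byte probe datagram
            sock.recv(1)      # any reply at all means something is listening
        except ConnectionRefusedError:
            # ICMP port-unreachable came back: the member is really down.
            return False
        except socket.timeout:
            # Silence is indistinguishable from "open but quiet", so it counts
            # as up. If the member's security group blocks outbound ICMP,
            # a closed port also lands here: the false positive.
            return True
        return True
```

The design choice to treat a timeout as "up" is what makes the security group rules matter: with ICMP blocked, every probe times out and every member looks healthy.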
21:35 <rm_work> yeah, again, assuming it's totally unreachable
21:35 <rm_work> it'd be marked DOWN by amp agent (which translates to ERROR in our DB/API)
21:35 <johnsom> If the ICMP is not being sent, the operating status will never update
21:35 <rm_work> which is what i'm seeing
21:36 <rm_work> so it'd not be UP *or* DOWN (ONLINE/OFFLINE)
21:36 <rm_work> and the status just wouldn't ... change?
21:36 <rm_work> if the HM was never able to reach it from the initial add
21:36 <johnsom> I think it sticks at the last state it had
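The "sticks at the last state" behaviour can be modelled as an update that only overwrites a member's stored status when a recognised report arrives. This is a hypothetical simplification that uses only the mapping stated in this conversation (agent `UP` becomes `ONLINE`, agent `DOWN` becomes `ERROR`); it is not the logic in Octavia's `update_db.py`, and the function name is made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical mapping taken from the chat: agent "UP" -> API "ONLINE",
# agent "DOWN" -> API "ERROR". Anything else counts as "no usable report".
AGENT_TO_API: Dict[str, str] = {"UP": "ONLINE", "DOWN": "ERROR"}


def apply_member_report(current: Dict[str, str],
                        report: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return member statuses after one (hypothetical) health report.

    Members missing from the report, or reported with None or an
    unrecognised status, keep whatever status they already had.
    """
    updated = dict(current)
    for member_id, agent_status in report.items():
        api_status = AGENT_TO_API.get(agent_status) if agent_status else None
        if api_status is not None:
            updated[member_id] = api_status
    return updated
```

Under this model, a member that was once recorded as `ERROR` and then never produces a usable report stays `ERROR` indefinitely, which matches what rm_work is seeing in the API.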
21:37 <rm_work> so maybe that's the case, and the last state was ERROR for some reason? O_o
21:37 <rm_work> so the HM actually isn't getting any status, and therefore wouldn't remove the node from rotation
21:37 <johnsom> Hmm, something is fishy though. My "good" one is still showing "NO_MONITOR"
21:37 *** mithilarun has joined #openstack-lbaas
21:37 <johnsom> No, no, no, the removal from rotation is all in the kernel, HM has nothing to do with it
21:38 <johnsom> That is the ipvsadm --list output
21:38 <johnsom> If it's listed, it is in rotation, if not, it is out of the pool
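Since rotation membership is simply whatever the kernel IPVS table holds, checking a member reduces to reading `ipvsadm --list` inside the amphora netns. A small parser sketch follows; the sample output uses made-up addresses, and its column layout is an assumption based on typical `ipvsadm --list --numeric` output rather than something captured in this log. The function name is hypothetical.

```python
from typing import Dict, List, Optional

# Illustrative output only (made-up addresses), modelled on `ipvsadm --list --numeric`.
SAMPLE_OUTPUT = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  10.0.0.5:80 rr
  -> 10.0.0.10:55555              Masq    1      0          0
  -> 10.0.0.11:55555              Masq    1      0          0
"""


def members_in_rotation(output: str) -> Dict[str, List[str]]:
    """Map each virtual service ("PROTO VIP:PORT") to the real servers listed under it."""
    services: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "SCTP", "FWM") and len(fields) >= 2:
            # A virtual service line starts a new group.
            current = f"{fields[0]} {fields[1]}"
            services[current] = []
        elif fields[0] == "->" and current is not None and len(fields) >= 2:
            # A real server line: listed here means it is in rotation.
            # The "-> RemoteAddress:Port" header comes before any service,
            # so current is still None and it is skipped.
            services[current].append(fields[1])
    return services
```

On an amphora the input could come from something like `ip netns exec amphora-haproxy ipvsadm --list --numeric`; the namespace name is inferred from the `Keepalived_healthcheckers_amphora-haproxy` log tag in this conversation and should be double-checked. A member that is absent from the returned list has been pulled out of the pool.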
21:39 <johnsom> When it is pulled out of the pool, because it has failed, it will also log:
21:39 <johnsom> Jan 21 21:35:10 amphora-6d2886c5-b068-49c8-a510-d4b987a97d9d Keepalived_healthcheckers_amphora-haproxy[5639]: Misc check to [] for [/var/lib/octavia/lvs/check/udp_check.sh 55555] failed.
21:39 <johnsom> in the syslog/messages
21:41 <rm_work> ahh ok so the HM is still keepalived?
21:42 <johnsom> The health monitor for UDP is part keepalived, part kernel. keepalived will monitor
21:44 <rm_work> ok, and then it just runs commands to update the rotation
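That split (keepalived runs the check, the kernel's IPVS table does the forwarding) shows up in a keepalived `virtual_server` stanza with a `MISC_CHECK` per real server. Below is a hypothetical Python helper that renders one. The keepalived keywords (`delay_loop`, `lb_algo`, `lb_kind`, `protocol`, `real_server`, `weight`, `MISC_CHECK`, `misc_path`, `misc_timeout`) are standard keepalived syntax, but the values, the helper name, and passing `ip port` to the check script are assumptions for illustration, not Octavia's actual template.

```python
from typing import List, Tuple


def render_udp_virtual_server(vip: str, vip_port: int,
                              members: List[Tuple[str, int]],
                              check_script: str,
                              delay_loop: int = 5,
                              misc_timeout: int = 5) -> str:
    """Render a simplified keepalived virtual_server block (hypothetical helper)."""
    lines = [
        f"virtual_server {vip} {vip_port} {{",
        f"    delay_loop {delay_loop}",   # seconds between health checks
        "    lb_algo rr",                 # IPVS scheduler: round robin
        "    lb_kind NAT",
        "    protocol UDP",
    ]
    for ip, port in members:
        lines += [
            f"    real_server {ip} {port} {{",
            "        weight 1",
            # keepalived runs this script; a non-zero exit removes the
            # real server from the IPVS table, a zero exit (re)adds it.
            "        MISC_CHECK {",
            f'            misc_path "{check_script} {ip} {port}"',
            f"            misc_timeout {misc_timeout}",
            "        }",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

With a layout like this, a failing script only changes the kernel table (and produces the "Misc check ... failed" syslog line); the operating status in the API is a separate path via the amphora agent and health manager.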
21:46 <johnsom> So, on mine, the agent is sending the right stuff, but an "UP" isn't making the member online.  That said, this devstack is fairly old code
22:57 *** tkajinam has joined #openstack-lbaas
23:00 *** openstackgerrit has joined #openstack-lbaas
23:00 <openstackgerrit> Brian Haley proposed openstack/octavia master: Make octavia-grenade job use python3  https://review.opendev.org/693486
23:02 <johnsom> 71 seconds for an Active/Standby load balancer full failover.
23:04 <johnsom> <note: no downtime of course>
23:27 <johnsom> Well, ok, minimal downtime. lol
