*** noonedeadpunk has quit IRC | 00:08 | |
*** noonedeadpunk has joined #openstack-lbaas | 00:10 | |
*** cgoncalves has quit IRC | 00:28 | |
*** gregwork has quit IRC | 00:48 | |
*** cgoncalves has joined #openstack-lbaas | 01:06 | |
*** armax has quit IRC | 01:29 | |
*** rcernin has quit IRC | 01:58 | |
*** rcernin has joined #openstack-lbaas | 01:58 | |
*** ramishra_ has joined #openstack-lbaas | 02:08 | |
*** xgerman has quit IRC | 02:56 | |
*** rcernin has quit IRC | 03:06 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add notifications specification documens https://review.opendev.org/c/openstack/octavia/+/727915 | 03:16 |
*** rcernin has joined #openstack-lbaas | 03:26 | |
*** rcernin has quit IRC | 03:30 | |
*** rcernin has joined #openstack-lbaas | 03:30 | |
*** sapd1_x has joined #openstack-lbaas | 03:33 | |
*** psachin has joined #openstack-lbaas | 03:59 | |
*** lemko has quit IRC | 04:25 | |
*** lemko has joined #openstack-lbaas | 04:25 | |
*** gcheresh has joined #openstack-lbaas | 05:32 | |
*** sapd1_x has quit IRC | 05:40 | |
openstackgerrit | zhangboye proposed openstack/octavia master: Replace deprecated UPPER_CONSTRAINTS_FILE variable https://review.opendev.org/c/openstack/octavia/+/765240 | 05:48 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Add HTTP/2 support to the Go test server https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/758617 | 06:39 |
openstackgerrit | Merged openstack/octavia master: Add amphora_id in store params for failover_amphora https://review.opendev.org/c/openstack/octavia/+/760380 | 06:45 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 07:20 |
*** rpittau|afk is now known as rpittau | 07:27 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add default value for enabled column in l7rule table https://review.opendev.org/c/openstack/octavia/+/761283 | 07:27 |
*** sapd1_x has joined #openstack-lbaas | 07:29 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add notifications specification documens https://review.opendev.org/c/openstack/octavia/+/727915 | 07:49 |
rm_work | interesting one -- an amp went down because a HV went down... and it took MANY HOURS to actually failover (config is set to default of 60s heartbeat stale time) | 08:04 |
rm_work | recorded this error at the time I assume it was first stale: | 08:04 |
rm_work | octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:84 Failed to update listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to update due to: contacting the amphora timed out | 08:08 |
rm_work | and i see a ton of the timeouts going back for ~12 minutes (120 retries at 5s each, also default config, it seems) | 08:13 |
rm_work | so ... the amp went down, became stale in the DB, and then... why would it be trying to connect and timing out? O_o | 08:13 |
rm_work | after that timeout it seems it moved on to another step? and failed timeouts for another ~10.5m (120*5) until it got this: | 08:16 |
rm_work | octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:136 Failed to reload listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to reload due to: contacting the amphora timed out | 08:16 |
rm_work | AH, it seems that both of the amphorae on the LB failed at almost the same time (may have been on the same HV... damn soft-aa) | 08:21 |
rm_work | possibly the single-amp failover process doesn't handle the case super well? | 08:21 |
rm_work | one succeeded but the other most certainly did not | 08:21 |
*** luksky has joined #openstack-lbaas | 08:22 | |
rm_work | OH I SEE (I think) | 08:24 |
rm_work | So both went down at approximately the same time. One of them went stale first. HM tried to failover that amp. It succeeded but took a really long time because it had to time-out on two steps attempting to update the other amp. | 08:25 |
rm_work | those timeout failures cause the other amp (which was also down) to go to ERROR | 08:26 |
rm_work | somehow about 2h45m later, it actually DID get picked up as stale again??? was it marked busy somehow for that time? unclear | 08:28 |
rm_work | at which point it completed failover in short order | 08:28 |
*** redrobot has quit IRC | 08:41 | |
*** rcernin has quit IRC | 08:46 | |
*** vishalmanchanda has joined #openstack-lbaas | 08:47 | |
rm_work | nevermind. figured out the correct timeline | 08:50 |
rm_work | 6:26 -- amp1 HV dies | 08:51 |
rm_work | 6:27 -- amp1 goes stale, attempts to failover, but can't connect to amp2 to update the haproxy/vrrp peer configs | 08:51 |
rm_work | 6:39 -- times out on first task (update listeners) | 08:53 |
rm_work | 6:49 -- times out on second task (reload listeners) | 08:53 |
rm_work | 6:50 -- amp1 failover complete | 08:54 |
rm_work | 9:32 -- amp2 HV dies | 08:54 |
rm_work | 9:33 -- amp2 goes stale, fails over | 08:54 |
rm_work | 9:35 -- amp2 failover complete | 08:54 |
rm_work | so, the issue was that amp2 was not connectable during the amp1 failover, even tho it should have been up, as it was sending heartbeats just fine and the HV *was* up | 08:55 |
rm_work | amp2 was in ERROR status for the intervening period until the heartbeats actually did fail, and then it was replaced correctly | 08:55 |
rm_work | MEANWHILE, user was reporting intermittent 502 errors -- I can only guess that: without an updated vrrp config on amp2, it thought it was still supposed to be the MASTER and so it kept gARPing, but so did amp1-new, and the routes were flipping back and forth constantly? would that cause a 502 in the case where it happened at exactly the right time (between packets on a keepalive connection)? | 08:57 |
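[Editor's note: the ~10-12 minute stalls rm_work describes above line up with the amphora-driver retry defaults he quotes (120 retries at 5s each). This is a sketch of that arithmetic, assuming the `connection_max_retries` / `connection_retry_interval` option names from the `[haproxy_amphora]` config section; verify the names against your release.]

```python
# Sketch: worst-case wall-clock time a single amphora-driver task spends
# retrying an unreachable amphora before giving up, with the default
# connection_max_retries=120 and connection_retry_interval=5 settings.

def task_timeout_seconds(connection_max_retries: int = 120,
                         connection_retry_interval: int = 5) -> int:
    """Seconds one task blocks while retrying a dead amphora."""
    return connection_max_retries * connection_retry_interval

# In the timeline above, two tasks (update listeners, then reload
# listeners) each timed out in turn against the unreachable amp2:
single_task = task_timeout_seconds()   # 600 s = 10 minutes per task
total_blocked = 2 * single_task        # ~20 minutes added to the failover
print(single_task, total_blocked)      # 600 1200
```

This matches the observed 6:27 to 6:50 window: two back-to-back ~10 minute retry loops plus normal failover overhead.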
*** rcernin has joined #openstack-lbaas | 09:00 | |
*** zzzeek has quit IRC | 09:02 | |
*** rcernin has quit IRC | 09:04 | |
*** zzzeek has joined #openstack-lbaas | 09:04 | |
*** ataraday_ has joined #openstack-lbaas | 09:07 | |
lxkong | has anyone seen this error during failover before? https://dpaste.com/C68CMVQ6A#wrap | 09:38 |
lxkong | after upgrading octavia from train to ussuri. | 09:39 |
lxkong | The load balancer info: https://dpaste.com/7HQRKPWL8 | 09:39 |
lxkong | in ussuri, when failover failed, the new amphora is removed, so no chance to log in and check | 09:40 |
*** luksky has quit IRC | 09:51 | |
lxkong | well, the pool in the load balancer has configured session persistence | 10:11 |
lxkong | the peer section is empty, https://dpaste.com/GV6V84PHV | 10:22 |
lxkong | hmm...seems like the issue has been fixed in the upstream recently | 11:01 |
lxkong | using the latest master, amphora can be successfully failed over | 11:01 |
lxkong | found, https://review.opendev.org/q/change:I923accd73e0c9cadc91c115157c576432f428622 | 11:16 |
*** sapd1 has quit IRC | 11:17 | |
*** zzzeek has quit IRC | 11:17 | |
*** zzzeek has joined #openstack-lbaas | 11:19 | |
*** luksky has joined #openstack-lbaas | 11:22 | |
*** sapd1_x has quit IRC | 11:27 | |
*** sapd1_x has joined #openstack-lbaas | 11:33 | |
*** spatel has joined #openstack-lbaas | 11:34 | |
*** sapd1_x has quit IRC | 11:38 | |
*** spatel has quit IRC | 11:39 | |
*** ramishra_ has quit IRC | 11:40 | |
*** psachin has quit IRC | 11:45 | |
*** ramishra has joined #openstack-lbaas | 11:59 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 12:01 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 12:03 |
*** ramishra has quit IRC | 12:09 | |
*** ramishra has joined #openstack-lbaas | 12:16 | |
*** zzzeek has quit IRC | 13:09 | |
*** zzzeek has joined #openstack-lbaas | 13:11 | |
*** mugsie has quit IRC | 13:32 | |
*** TrevorV has joined #openstack-lbaas | 13:38 | |
*** ramishra has quit IRC | 13:45 | |
*** ramishra has joined #openstack-lbaas | 14:04 | |
*** ataraday_ has quit IRC | 14:51 | |
*** laerling has joined #openstack-lbaas | 15:17 | |
*** redrobot has joined #openstack-lbaas | 15:35 | |
*** spatel has joined #openstack-lbaas | 15:40 | |
spatel | johnsom: hey! | 15:40 |
johnsom | rm_work: Your frankinetwork makes it hard to say definitively, but if the user got a 502 from the load balancer it was reachable, but a backend server may have become unreachable while it was servicing the request. | 15:40 |
johnsom | spatel Hi | 15:40 |
spatel | I have affinity SINGLE for Octavia, but when I build an LB, by default it is creating two amphorae somehow | 15:41 |
rm_work | Hmm | 15:41 |
johnsom | Do you have a spares pool configured? | 15:42 |
spatel | johnsom: let me collect more logs etc. I thought I'd just ask you in case anything changed recently that I am not aware of. | 15:42 |
spatel | spares pool? i didn't do any special configuration (everything is default) | 15:43 |
johnsom | Check your config file for the spares setting and make sure it is not configured | 15:43 |
spatel | looking.. | 15:44 |
johnsom | spare_amphora_pool_size | 15:45 |
spatel | spare_amphora_pool_size = 1 | 15:45 |
johnsom | That is why, it is booting a spare amp | 15:45 |
johnsom | Set that to zero | 15:45 |
spatel | how is this extra amphora different from full HA mode? | 15:46 |
spatel | what is the use of having spare_amphora_pool_size setting? | 15:46 |
johnsom | They are unconfigured and can be used when creating a new load balancer | 15:47 |
adeberg | faster recovery i believe | 15:47 |
johnsom | Well, in very limited situations due to nova issues. We have marked it deprecated in recent releases | 15:47 |
spatel | johnsom: you are saying this option will be deprecated in a future release, right? | 15:49 |
johnsom | The idea was to speed up creation and failover by having the VM already booted. | 15:49 |
johnsom | Yes | 15:49 |
spatel | good to know so i don't spend my time on that one. :) | 15:49 |
johnsom | https://docs.openstack.org/octavia/latest/configuration/configref.html#house_keeping.spare_amphora_pool_size | 15:49 |
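[Editor's note: a quick way to check whether a deployment still has a spares pool configured, assuming the stock octavia.conf layout with the `[house_keeping]` section and `spare_amphora_pool_size` option discussed above. Setting it to 0 stops the extra "spare" amphora from being booted.]

```python
# Sketch: detect a non-zero (deprecated) spares pool in an octavia.conf.
import configparser

def spare_pool_size(conf_text: str) -> int:
    """Return the configured spares pool size, defaulting to 0 if unset."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    return cp.getint("house_keeping", "spare_amphora_pool_size", fallback=0)

sample = """
[house_keeping]
spare_amphora_pool_size = 1
"""
print(spare_pool_size(sample))  # 1 -> a spare amp will be kept booted
```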
spatel | +1 | 15:53 |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover https://review.opendev.org/c/openstack/octavia/+/763732 | 15:53 |
openstackgerrit | Merged openstack/octavia stable/stein: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764890 | 15:54 |
openstackgerrit | Merged openstack/octavia stable/train: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764889 | 15:54 |
spatel | johnsom: does octavia support an nginx amphora? | 15:54 |
johnsom | No | 15:54 |
spatel | :( | 15:54 |
johnsom | Bad licensing issues and no one developed one | 15:55 |
*** armax has joined #openstack-lbaas | 15:55 | |
johnsom | Plus why? | 15:55 |
spatel | currently we are running nginx using a tcp stream socket | 15:55 |
spatel | not sure if haproxy supports that protocol | 15:56 |
johnsom | Yes it does | 15:56 |
spatel | hmm, I think I need to ask a developer to try out haproxy to validate functionality | 15:57 |
spatel | Is there any load-testing or benchmark report available for Octavia, to verify how many TPS it can handle with standard hardware? | 15:58 |
spatel | I am going to benchmark but i need some baseline report to compare my result | 15:59 |
johnsom | Ha, well, that is a moving target and highly dependent on the underlying cloud | 15:59 |
johnsom | If you google, there is a page that will come up for an old version | 16:00 |
spatel | Ok i will try to find them. | 16:00 |
johnsom | It was around 30,000 for an older amp, 1 core, 1gbps | 16:01 |
spatel | 30k TPS with SSL.. that is freaking awesome number | 16:01 |
johnsom | No, that was not with TLS | 16:01 |
spatel | ah | 16:02 |
johnsom | With TLS you will want your nova to pass through the encryption acceleration cpu functions and may need to bump the RAM for the amp | 16:03 |
spatel | currently I am using the public CentOS 8 amphora, and lots of docs say to build your own, so is there really an advantage to building your own amphora image? | 16:03 |
spatel | Yes, for TLS we need the AES flag on the CPU, with OpenSSL support to use that flag | 16:04 |
johnsom | Well, we don’t ship images, so everyone builds their own | 16:04 |
johnsom | Our amps will use the extensions if they are there | 16:04 |
spatel | what is the advantage of getting one from a public place vs building our own? (I believe we can add some custom stuff if we build it in-house) | 16:05 |
johnsom | I guess that you have current bits. We provide scripts that make it quick and easy to build the image | 16:06 |
johnsom | We, as the OpenStack community, do not ship prebuilt images for production use. | 16:07 |
johnsom | Some vendors do however | 16:07 |
johnsom | Over the years there has been some advantage to building custom to get a newer version of HAproxy than the distros shipped. But right now I think we are pulling in 2.x so in good shape | 16:10 |
spatel | johnsom: thanks | 16:21 |
johnsom | Sure, np | 16:21 |
spatel | johnsom: does amphora support SRIOV instances for performance? | 16:22 |
johnsom | We have not yet added the scheduling hints to flavors to support that. It can, but the dev work has not been done yet. | 16:23 |
johnsom | Are you interested in QAT SRIOV or the nic SRIOV? | 16:24 |
spatel | nic SRIOV | 16:25 |
spatel | my 80% workload running on sriov instances so looking for that support if required to run high performance haproxy LB | 16:25 |
spatel | what is QAT? | 16:26 |
johnsom | Encryption and compression offload | 16:27 |
spatel | Not there yet. | 16:28 |
johnsom | Yeah, the underlying networking is usually the bottleneck for the amps | 16:28 |
spatel | we use nic SRIOV for low latency network | 16:28 |
spatel | virtio is really bad for moderate workload | 16:29 |
spatel | i did benchmarking and found virtio only support 200kpps Vs sriov support 1.5mpps | 16:29 |
johnsom | I have seen up to 14gbps through an amp, TCP, but it was same host | 16:30 |
johnsom | Well, there is a lot of tuning that can be done as well | 16:30 |
spatel | I always do benchmark based on PPS rate | 16:30 |
spatel | This is my Trex result of standard virtio vm - https://asciinema.org/a/qXPA48Kc7deILJJrObtF2i3ZT | 16:31 |
spatel | This is SR-IOV Trex result - https://asciinema.org/a/376367 | 16:32 |
johnsom | There are still a lot of usecases where using a hardware provider with Octavia is the right answer. | 16:32 |
spatel | ACTIVE+ACTIVE will solve all those issue :) | 16:33 |
johnsom | Well, it was going to target them for sure. Now with the HAProxy 2.x amps we can vertically scale by adding cpu cores as well, which will also provide a good bump | 16:36 |
johnsom | There is more tuning for that I have planned as well, but sadly my focus is on other internal projects at the moment | 16:37 |
*** sapd1_x has joined #openstack-lbaas | 16:37 | |
spatel | adding more cpu means changing flavor right? | 16:38 |
johnsom | Well, you would create an Octavia flavor, unless you want all of your LBs to have more cores | 16:40 |
spatel | +1 | 16:40 |
spatel | Can i pass properties via flavor to select SINGLE vs ACTIVE-STANDBY | 16:41 |
johnsom | Yes | 16:42 |
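[Editor's note: a sketch of the flavor data implied by this exchange. An Octavia flavor profile carries a JSON blob, and the `loadbalancer_topology` key selects SINGLE vs ACTIVE_STANDBY per load balancer; the exact key name is an assumption from memory, so verify it against your release's flavor documentation.]

```python
# Sketch: build the flavor-profile JSON that would pin a load balancer
# to a given topology (the "loadbalancer_topology" key is assumed here).
import json

def flavor_profile_data(topology: str) -> str:
    """Return the flavor-profile data blob for the given topology."""
    allowed = {"SINGLE", "ACTIVE_STANDBY"}
    if topology not in allowed:
        raise ValueError(f"topology must be one of {sorted(allowed)}")
    return json.dumps({"loadbalancer_topology": topology})

print(flavor_profile_data("SINGLE"))
```

That blob would then be attached to a flavor profile, and a flavor referencing the profile selected at LB create time.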
spatel | nice! let me go back to my lab for more testing :) thank you johnsom | 16:46 |
johnsom | Sure, NP | 16:47 |
*** rpittau is now known as rpittau|afk | 16:49 | |
*** luksky has quit IRC | 16:54 | |
*** sapd1_x has quit IRC | 17:25 | |
openstackgerrit | Merged openstack/octavia stable/stein: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764894 | 17:43 |
openstackgerrit | Merged openstack/octavia stable/train: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764893 | 18:27 |
*** vishalmanchanda has quit IRC | 19:22 | |
*** beagles has quit IRC | 19:28 | |
*** b3nt_pin has joined #openstack-lbaas | 19:29 | |
*** b3nt_pin is now known as beagles | 19:29 | |
*** gcheresh has quit IRC | 19:35 | |
*** luksky has joined #openstack-lbaas | 19:42 | |
*** rcernin has joined #openstack-lbaas | 19:57 | |
*** rcernin has quit IRC | 20:23 | |
*** xgerman has joined #openstack-lbaas | 20:31 | |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764888 | 20:34 |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover https://review.opendev.org/c/openstack/octavia/+/756903 | 20:54 |
openstackgerrit | Merged openstack/octavia master: Remove re-import of octavia-lib constants https://review.opendev.org/c/openstack/octavia/+/763437 | 20:54 |
*** rcernin has joined #openstack-lbaas | 20:58 | |
*** rcernin has quit IRC | 21:03 | |
*** TrevorV has quit IRC | 21:04 | |
*** openstackgerrit has quit IRC | 21:08 | |
*** rcernin has joined #openstack-lbaas | 21:29 | |
*** ccamposr has joined #openstack-lbaas | 21:41 | |
*** ccamposr__ has quit IRC | 21:43 | |
*** spatel has quit IRC | 22:16 | |
*** ccamposr__ has joined #openstack-lbaas | 22:35 | |
*** ccamposr has quit IRC | 22:38 | |
*** luksky has quit IRC | 23:03 | |
*** tkajinam has quit IRC | 23:03 | |
*** tkajinam has joined #openstack-lbaas | 23:04 | |
*** openstackgerrit has joined #openstack-lbaas | 23:11 | |
openstackgerrit | Merged openstack/octavia stable/victoria: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764891 | 23:11 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!