*** noonedeadpunk has quit IRC | 00:08 | |
*** noonedeadpunk has joined #openstack-lbaas | 00:10 | |
*** cgoncalves has quit IRC | 00:28 | |
*** gregwork has quit IRC | 00:48 | |
*** cgoncalves has joined #openstack-lbaas | 01:06 | |
*** armax has quit IRC | 01:29 | |
*** rcernin has quit IRC | 01:58 | |
*** rcernin has joined #openstack-lbaas | 01:58 | |
*** ramishra_ has joined #openstack-lbaas | 02:08 | |
*** xgerman has quit IRC | 02:56 | |
*** rcernin has quit IRC | 03:06 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add notifications specification documens https://review.opendev.org/c/openstack/octavia/+/727915 | 03:16 |
*** rcernin has joined #openstack-lbaas | 03:26 | |
*** rcernin has quit IRC | 03:30 | |
*** rcernin has joined #openstack-lbaas | 03:30 | |
*** sapd1_x has joined #openstack-lbaas | 03:33 | |
*** psachin has joined #openstack-lbaas | 03:59 | |
*** lemko has quit IRC | 04:25 | |
*** lemko has joined #openstack-lbaas | 04:25 | |
*** gcheresh has joined #openstack-lbaas | 05:32 | |
*** sapd1_x has quit IRC | 05:40 | |
openstackgerrit | zhangboye proposed openstack/octavia master: Replace deprecated UPPER_CONSTRAINTS_FILE variable https://review.opendev.org/c/openstack/octavia/+/765240 | 05:48 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Add HTTP/2 support to the Go test server https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/758617 | 06:39 |
openstackgerrit | Merged openstack/octavia master: Add amphora_id in store params for failover_amphora https://review.opendev.org/c/openstack/octavia/+/760380 | 06:45 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 07:20 |
*** rpittau|afk is now known as rpittau | 07:27 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add default value for enabled column in l7rule table https://review.opendev.org/c/openstack/octavia/+/761283 | 07:27 |
*** sapd1_x has joined #openstack-lbaas | 07:29 | |
openstackgerrit | wu.chunyang proposed openstack/octavia master: Add notifications specification documens https://review.opendev.org/c/openstack/octavia/+/727915 | 07:49 |
rm_work | interesting one -- an amp went down because a HV went down... and it took MANY HOURS to actually failover (config is set to default of 60s heartbeat stale time) | 08:04 |
rm_work | recorded this error at the time I assume it was first stale: | 08:04 |
rm_work | octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:84 Failed to update listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to update due to: contacting the amphora timed out | 08:08 |
rm_work | and i see a ton of the timeouts going back for ~12 minutes (120 retries at 5s each, also default config, it seems) | 08:13 |
rm_work | so ... the amp went down, became stale in the DB, and then... why would it be trying to connect and timing out? O_o | 08:13 |
rm_work | after that timeout it seems it moved on to another step? and failed timeouts for another ~10.5m (120*5) until it got this: | 08:16 |
rm_work | octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:136 Failed to reload listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to reload due to: contacting the amphora timed out | 08:16 |
rm_work | AH, it seems that both of the amphorae on the LB failed at almost the same time (may have been on the same HV... damn soft-aa) | 08:21 |
rm_work | possibly the single-amp failover process doesn't handle the case super well? | 08:21 |
rm_work | one succeeded but the other most certainly did not | 08:21 |
*** luksky has joined #openstack-lbaas | 08:22 | |
rm_work | OH I SEE (I think) | 08:24 |
rm_work | So both went down at approximately the same time. One of them went stale first. HM tried to failover that amp. It succeeded but took a really long time because it had to time-out on two steps attempting to update the other amp. | 08:25 |
rm_work | those timeout failures cause the other amp (which was also down) to go to ERROR | 08:26 |
rm_work | somehow about 2h45m later, it actually DID get picked up as stale again??? was it marked busy somehow for that time? unclear | 08:28 |
rm_work | at which point it completed failover in short order | 08:28 |
*** redrobot has quit IRC | 08:41 | |
*** rcernin has quit IRC | 08:46 | |
*** vishalmanchanda has joined #openstack-lbaas | 08:47 | |
rm_work | nevermind. figured out the correct timeline | 08:50 |
rm_work | 6:26 -- amp1 HV dies | 08:51 |
rm_work | 6:27 -- amp1 goes stale, attempts to failover, but can't connect to amp2 to update the haproxy/vrrp peer configs | 08:51 |
rm_work | 6:39 -- times out on first task (update listeners) | 08:53 |
rm_work | 6:49 -- times out on second task (reload listeners) | 08:53 |
rm_work | 6:50 -- amp1 failover complete | 08:54 |
rm_work | 9:32 -- amp2 HV dies | 08:54 |
rm_work | 9:33 -- amp2 goes stale, fails over | 08:54 |
rm_work | 9:35 -- amp2 failover complete | 08:54 |
rm_work | so, the issue was that amp2 was not connectable during the amp1 failover, even tho it should have been up, as it was sending heartbeats just fine and the HV *was* up | 08:55 |
rm_work | amp2 was in ERROR status for the intervening period until the heartbeats actually did fail, and then it was replaced correctly | 08:55 |
rm_work | MEANWHILE, user was reporting intermittent 502 errors -- I can only guess that: without an updated vrrp config on amp2, it thought it was still supposed to be the MASTER and so it kept gARPing, but so did amp1-new, and the routes were flipping back and forth constantly? would that cause a 502 in the case where it happened at exactly the right time (between packets on a keepalive connection)? | 08:57 |
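[Editor's note: the ~10-12 minute stalls rm_work describes above line up with the amphora-driver retry defaults he quotes (120 retries at 5s each). This is a sketch of that arithmetic, assuming the `connection_max_retries` / `connection_retry_interval` option names from the `[haproxy_amphora]` config section; verify the names against your release.]

```python
# Sketch: worst-case wall-clock time a single amphora-driver task spends
# retrying an unreachable amphora before giving up, with the default
# connection_max_retries=120 and connection_retry_interval=5 settings.

def task_timeout_seconds(connection_max_retries: int = 120,
                         connection_retry_interval: int = 5) -> int:
    """Seconds one task blocks while retrying a dead amphora."""
    return connection_max_retries * connection_retry_interval

# In the timeline above, two tasks (update listeners, then reload
# listeners) each timed out in turn against the unreachable amp2:
single_task = task_timeout_seconds()   # 600 s = 10 minutes per task
total_blocked = 2 * single_task        # ~20 minutes added to the failover
print(single_task, total_blocked)      # 600 1200
```

This matches the observed 6:27 to 6:50 window: two back-to-back ~10 minute retry loops plus normal failover overhead.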
*** rcernin has joined #openstack-lbaas | 09:00 | |
*** zzzeek has quit IRC | 09:02 | |
*** rcernin has quit IRC | 09:04 | |
*** zzzeek has joined #openstack-lbaas | 09:04 | |
*** ataraday_ has joined #openstack-lbaas | 09:07 | |
lxkong | has anyone seen this error during failover before? https://dpaste.com/C68CMVQ6A#wrap | 09:38 |
lxkong | after upgrading octavia from train to ussuri. | 09:39 |
lxkong | The load balancer info: https://dpaste.com/7HQRKPWL8 | 09:39 |
lxkong | in ussuri, when failover failed, the new amphora is removed, so no chance to log in and check | 09:40 |
*** luksky has quit IRC | 09:51 | |
lxkong | well, the pool in the load balancer has configured session persistence | 10:11 |
lxkong | the peer section is empty, https://dpaste.com/GV6V84PHV | 10:22 |
lxkong | hmm...seems like the issue has been fixed in the upstream recently | 11:01 |
lxkong | using the latest master, amphora can be successfully failed over | 11:01 |
lxkong | found, https://review.opendev.org/q/change:I923accd73e0c9cadc91c115157c576432f428622 | 11:16 |
*** sapd1 has quit IRC | 11:17 | |
*** zzzeek has quit IRC | 11:17 | |
*** zzzeek has joined #openstack-lbaas | 11:19 | |
*** luksky has joined #openstack-lbaas | 11:22 | |
*** sapd1_x has quit IRC | 11:27 | |
*** sapd1_x has joined #openstack-lbaas | 11:33 | |
*** spatel has joined #openstack-lbaas | 11:34 | |
*** sapd1_x has quit IRC | 11:38 | |
*** spatel has quit IRC | 11:39 | |
*** ramishra_ has quit IRC | 11:40 | |
*** psachin has quit IRC | 11:45 | |
*** ramishra has joined #openstack-lbaas | 11:59 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 12:01 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/c/openstack/octavia/+/740432 | 12:03 |
*** ramishra has quit IRC | 12:09 | |
*** ramishra has joined #openstack-lbaas | 12:16 | |
*** zzzeek has quit IRC | 13:09 | |
*** zzzeek has joined #openstack-lbaas | 13:11 | |
*** mugsie has quit IRC | 13:32 | |
*** TrevorV has joined #openstack-lbaas | 13:38 | |
*** ramishra has quit IRC | 13:45 | |
*** ramishra has joined #openstack-lbaas | 14:04 | |
*** ataraday_ has quit IRC | 14:51 | |
*** laerling has joined #openstack-lbaas | 15:17 | |
*** redrobot has joined #openstack-lbaas | 15:35 | |
*** spatel has joined #openstack-lbaas | 15:40 | |
spatel | johnsom: hey! | 15:40 |
johnsom | rm_work: Your frankinetwork makes it hard to say definitively, but if the user got a 502 from the load balancer it was reachable, but a backend server may have become unreachable while it was servicing the request. | 15:40 |
johnsom | spatel Hi | 15:40 |
spatel | I have affinity SINGLE for Octavia, but when I build an LB, by default it is creating two amphorae somehow | 15:41 |
rm_work | Hmm | 15:41 |
johnsom | Do you have a spares pool configured? | 15:42 |
spatel | johnsom: let me collect more logs etc. I thought I'd just ask you in case anything changed recently that I am not aware of. | 15:42 |
spatel | spares pool? i didn't do any special configuration (everything is default) | 15:43 |
johnsom | Check your config file for the spares setting and make sure it is not configured | 15:43 |
spatel | looking.. | 15:44 |
johnsom | spare_amphora_pool_size | 15:45 |
spatel | spare_amphora_pool_size = 1 | 15:45 |
johnsom | That is why, it is booting a spare amp | 15:45 |
johnsom | Set that to zero | 15:45 |
spatel | how is this extra amphora different from full HA mode? | 15:46 |
spatel | what is the use of having spare_amphora_pool_size setting? | 15:46 |
johnsom | They are unconfigured and can be used when creating a new load balancer | 15:47 |
adeberg | faster recovery i believe | 15:47 |
johnsom | Well, in very limited situations due to nova issues. We have marked it deprecated in recent releases | 15:47 |
spatel | johnsom: you are saying this option will be deprecated in a future release, right? | 15:49 |
johnsom | The idea was to speed up creation and failover by having the VM already booted. | 15:49 |
johnsom | Yes | 15:49 |
spatel | good to know so i don't spend my time on that one. :) | 15:49 |
johnsom | https://docs.openstack.org/octavia/latest/configuration/configref.html#house_keeping.spare_amphora_pool_size | 15:49 |
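[Editor's note: a quick way to check whether a deployment still has a spares pool configured, assuming the stock octavia.conf layout with the `[house_keeping]` section and `spare_amphora_pool_size` option discussed above. Setting it to 0 stops the extra "spare" amphora from being booted.]

```python
# Sketch: detect a non-zero (deprecated) spares pool in an octavia.conf.
import configparser

def spare_pool_size(conf_text: str) -> int:
    """Return the configured spares pool size, defaulting to 0 if unset."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    return cp.getint("house_keeping", "spare_amphora_pool_size", fallback=0)

sample = """
[house_keeping]
spare_amphora_pool_size = 1
"""
print(spare_pool_size(sample))  # 1 -> a spare amp will be kept booted
```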
spatel | +1 | 15:53 |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover https://review.opendev.org/c/openstack/octavia/+/763732 | 15:53 |
openstackgerrit | Merged openstack/octavia stable/stein: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764890 | 15:54 |
openstackgerrit | Merged openstack/octavia stable/train: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764889 | 15:54 |
spatel | johnsom: does octavia support an nginx amphora? | 15:54 |
johnsom | No | 15:54 |
spatel | :( | 15:54 |
johnsom | Bad licensing issues and no one developed one | 15:55 |
*** armax has joined #openstack-lbaas | 15:55 | |
johnsom | Plus why? | 15:55 |
spatel | currently we are running nginx using a tcp stream socket | 15:55 |
spatel | not sure if haproxy supports that protocol | 15:56 |
johnsom | Yes it does | 15:56 |
spatel | hmm, I think I need to ask a developer to try out haproxy to validate functionality | 15:57 |
spatel | Is there any load-testing or benchmark report available for Octavia, to verify how many TPS it can handle with standard hardware? | 15:58 |
spatel | I am going to benchmark but i need some baseline report to compare my result | 15:59 |
johnsom | Ha, well, that is a moving target and highly dependent on the underlying cloud | 15:59 |
johnsom | If you google, there is a page that will come up for an old version | 16:00 |
spatel | Ok i will try to find them. | 16:00 |
johnsom | It was around 30,000 for an older amp, 1 core, 1gbps | 16:01 |
spatel | 30k TPS with SSL.. that is freaking awesome number | 16:01 |
johnsom | No, that was not with TLS | 16:01 |
spatel | ah | 16:02 |
johnsom | With TLS you will want your nova to pass through the encryption acceleration cpu functions and may need to bump the RAM for the amp | 16:03 |
spatel | currently I am using the public CentOS 8 amphora, and lots of docs say to build your own, so is there really an advantage to building your own amphora image? | 16:03 |
spatel | Yes, for TLS we need the AES flag on the CPU, with OpenSSL support to use that flag | 16:04 |
johnsom | Well, we don’t ship images, so everyone builds their own | 16:04 |
johnsom | Our amps will use the extensions if they are there | 16:04 |
spatel | what is the advantage of getting one from a public place vs building our own? (I believe we can add some custom stuff if we build it in-house) | 16:05 |
johnsom | I guess that you have current bits. We provide scripts that make it quick and easy to build the image | 16:06 |
johnsom | We, as the OpenStack community, do not ship prebuilt images for production use. | 16:07 |
johnsom | Some vendors do however | 16:07 |
johnsom | Over the years there has been some advantage to building custom to get a newer version of HAproxy than the distros shipped. But right now I think we are pulling in 2.x so in good shape | 16:10 |
spatel | johnsom: thanks | 16:21 |
johnsom | Sure, np | 16:21 |
spatel | johnsom: does amphora support SRIOV instances for performance? | 16:22 |
johnsom | We have not yet added the scheduling hints to flavors to support that. It can, but the dev work has not been done yet. | 16:23 |
johnsom | Are you interested in QAT SRIOV or the nic SRIOV? | 16:24 |
spatel | nic SRIOV | 16:25 |
spatel | my 80% workload running on sriov instances so looking for that support if required to run high performance haproxy LB | 16:25 |
spatel | what is QAT? | 16:26 |
johnsom | Encryption and compression offload | 16:27 |
spatel | Not there yet. | 16:28 |
johnsom | Yeah, the underlying networking is usually the bottleneck for the amps | 16:28 |
spatel | we use nic SRIOV for low latency network | 16:28 |
spatel | virtio is really bad for moderate workload | 16:29 |
spatel | i did benchmarking and found virtio only support 200kpps Vs sriov support 1.5mpps | 16:29 |
johnsom | I have seen up to 14gbps through an amp, TCP, but it was same host | 16:30 |
johnsom | Well, there is a lot of tuning that can be done as well | 16:30 |
spatel | I always do benchmark based on PPS rate | 16:30 |
spatel | This is my Trex result of standard virtio vm - https://asciinema.org/a/qXPA48Kc7deILJJrObtF2i3ZT | 16:31 |
spatel | This is SR-IOV Trex result - https://asciinema.org/a/376367 | 16:32 |
johnsom | There are still a lot of usecases where using a hardware provider with Octavia is the right answer. | 16:32 |
spatel | ACTIVE+ACTIVE will solve all those issue :) | 16:33 |
johnsom | Well, it was going to target them for sure. Now with the HAProxy 2.x amps we can vertically scale by adding cpu cores as well, which will also provide a good bump | 16:36 |
johnsom | There is more tuning for that I have planned as well, but sadly my focus is on other internal projects at the moment | 16:37 |
*** sapd1_x has joined #openstack-lbaas | 16:37 | |
spatel | adding more cpu means changing flavor right? | 16:38 |
johnsom | Well, you would create an Octavia flavor, unless you want all of your LBs to have more cores | 16:40 |
spatel | +1 | 16:40 |
spatel | Can i pass properties via flavor to select SINGLE vs ACTIVE-STANDBY | 16:41 |
johnsom | Yes | 16:42 |
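[Editor's note: a sketch of the flavor data implied by this exchange. An Octavia flavor profile carries a JSON blob, and the `loadbalancer_topology` key selects SINGLE vs ACTIVE_STANDBY per load balancer; the exact key name is an assumption from memory, so verify it against your release's flavor documentation.]

```python
# Sketch: build the flavor-profile JSON that would pin a load balancer
# to a given topology (the "loadbalancer_topology" key is assumed here).
import json

def flavor_profile_data(topology: str) -> str:
    """Return the flavor-profile data blob for the given topology."""
    allowed = {"SINGLE", "ACTIVE_STANDBY"}
    if topology not in allowed:
        raise ValueError(f"topology must be one of {sorted(allowed)}")
    return json.dumps({"loadbalancer_topology": topology})

print(flavor_profile_data("SINGLE"))
```

That blob would then be attached to a flavor profile, and a flavor referencing the profile selected at LB create time.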
spatel | nice! let me go back to my lab for more testing :) thank you johnsom | 16:46 |
johnsom | Sure, NP | 16:47 |
*** rpittau is now known as rpittau|afk | 16:49 | |
*** luksky has quit IRC | 16:54 | |
*** sapd1_x has quit IRC | 17:25 | |
openstackgerrit | Merged openstack/octavia stable/stein: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764894 | 17:43 |
openstackgerrit | Merged openstack/octavia stable/train: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764893 | 18:27 |
*** vishalmanchanda has quit IRC | 19:22 | |
*** beagles has quit IRC | 19:28 | |
*** b3nt_pin has joined #openstack-lbaas | 19:29 | |
*** b3nt_pin is now known as beagles | 19:29 | |
*** gcheresh has quit IRC | 19:35 | |
*** luksky has joined #openstack-lbaas | 19:42 | |
*** rcernin has joined #openstack-lbaas | 19:57 | |
*** rcernin has quit IRC | 20:23 | |
*** xgerman has joined #openstack-lbaas | 20:31 | |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix missing cronie package in RHEL-based image builds https://review.opendev.org/c/openstack/octavia/+/764888 | 20:34 |
openstackgerrit | Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover https://review.opendev.org/c/openstack/octavia/+/756903 | 20:54 |
openstackgerrit | Merged openstack/octavia master: Remove re-import of octavia-lib constants https://review.opendev.org/c/openstack/octavia/+/763437 | 20:54 |
*** rcernin has joined #openstack-lbaas | 20:58 | |
*** rcernin has quit IRC | 21:03 | |
*** TrevorV has quit IRC | 21:04 | |
*** openstackgerrit has quit IRC | 21:08 | |
*** rcernin has joined #openstack-lbaas | 21:29 | |
*** ccamposr has joined #openstack-lbaas | 21:41 | |
*** ccamposr__ has quit IRC | 21:43 | |
*** spatel has quit IRC | 22:16 | |
*** ccamposr__ has joined #openstack-lbaas | 22:35 | |
*** ccamposr has quit IRC | 22:38 | |
*** luksky has quit IRC | 23:03 | |
*** tkajinam has quit IRC | 23:03 | |
*** tkajinam has joined #openstack-lbaas | 23:04 | |
*** openstackgerrit has joined #openstack-lbaas | 23:11 | |
openstackgerrit | Merged openstack/octavia stable/victoria: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros. https://review.opendev.org/c/openstack/octavia/+/764891 | 23:11 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!