*** wuchunyang has quit IRC | 00:15 | |
*** wuchunyang has joined #openstack-lbaas | 00:16 | |
*** yamamoto has joined #openstack-lbaas | 00:17 | |
*** wuchunyang has quit IRC | 01:02 | |
openstackgerrit | Anushka Singh proposed openstack/octavia master: Refactoring amphora stats driver interface https://review.opendev.org/737111 | 01:03 |
aannuusshhkkaa | johnsom, we fixed 2/3 issues you had raised on https://review.opendev.org/737111... | 01:10 |
*** yamamoto has quit IRC | 01:23 | |
*** yamamoto has joined #openstack-lbaas | 01:26 | |
*** yamamoto has quit IRC | 01:30 | |
*** tkajinam has quit IRC | 01:32 | |
*** tkajinam has joined #openstack-lbaas | 01:32 | |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Change to use memory_tracker variable https://review.opendev.org/704202 | 01:32 |
*** wuchunyang has joined #openstack-lbaas | 01:34 | |
*** ianychoi_ has quit IRC | 01:48 | |
*** ianychoi_ has joined #openstack-lbaas | 01:50 | |
*** armax has joined #openstack-lbaas | 02:27 | |
*** yamamoto has joined #openstack-lbaas | 02:51 | |
*** yamamoto has quit IRC | 03:05 | |
*** yamamoto has joined #openstack-lbaas | 03:06 | |
*** wuchunyang has quit IRC | 03:40 | |
*** wuchunyang has joined #openstack-lbaas | 03:45 | |
*** coreycb has quit IRC | 04:08 | |
*** headphoneJames has quit IRC | 04:08 | |
*** rm_work has quit IRC | 04:08 | |
*** nicolasbock has quit IRC | 04:08 | |
*** armax has quit IRC | 04:08 | |
*** KeithMnemonic has quit IRC | 04:08 | |
*** bcafarel has quit IRC | 04:08 | |
*** laerling has quit IRC | 04:08 | |
*** oklhost has quit IRC | 04:08 | |
*** dayou has quit IRC | 04:08 | |
*** zigo has quit IRC | 04:08 | |
*** gthiemonge has quit IRC | 04:08 | |
*** andy_ has quit IRC | 04:08 | |
*** cgoncalves has quit IRC | 04:08 | |
*** amotoki has quit IRC | 04:08 | |
*** dulek has quit IRC | 04:08 | |
*** f0o has quit IRC | 04:08 | |
*** ramishra has quit IRC | 04:08 | |
*** dasp_ has quit IRC | 04:08 | |
*** gmann has quit IRC | 04:08 | |
*** emccormick has quit IRC | 04:08 | |
*** dougwig has quit IRC | 04:08 | |
*** mnaser has quit IRC | 04:08 | |
*** squarebracket has quit IRC | 04:08 | |
*** ianychoi_ has quit IRC | 04:08 | |
*** jmccrory has quit IRC | 04:08 | |
*** njohnston has quit IRC | 04:08 | |
*** wuchunyang has quit IRC | 04:08 | |
*** rpittau has quit IRC | 04:08 | |
*** osmanlicilegi has quit IRC | 04:08 | |
*** zzzeek has quit IRC | 04:08 | |
*** brtknr has quit IRC | 04:08 | |
*** mloza has quit IRC | 04:08 | |
*** trident has quit IRC | 04:08 | |
*** tobberydberg_ has quit IRC | 04:08 | |
*** eandersson has quit IRC | 04:08 | |
*** sorrison has quit IRC | 04:08 | |
*** zetaab has quit IRC | 04:08 | |
*** openstackgerrit has quit IRC | 04:08 | |
*** frickler has quit IRC | 04:08 | |
*** johnthetubaguy has quit IRC | 04:08 | |
*** vesper11 has quit IRC | 04:08 | |
*** fyx has quit IRC | 04:08 | |
*** NobodyCam has quit IRC | 04:08 | |
*** jrosser has quit IRC | 04:08 | |
*** aannuusshhkkaa has quit IRC | 04:08 | |
*** JayF has quit IRC | 04:08 | |
*** mugsie has quit IRC | 04:08 | |
*** stingrayza has quit IRC | 04:08 | |
*** jamespage has quit IRC | 04:08 | |
*** kklimonda has quit IRC | 04:08 | |
*** dmsimard has quit IRC | 04:08 | |
*** dosaboy has quit IRC | 04:08 | |
*** TMM has quit IRC | 04:08 | |
*** michchap has quit IRC | 04:08 | |
*** devfaz has quit IRC | 04:08 | |
*** numans has quit IRC | 04:08 | |
*** yamamoto has quit IRC | 04:08 | |
*** servagem has quit IRC | 04:08 | |
*** colin- has quit IRC | 04:08 | |
*** dtruong has quit IRC | 04:08 | |
*** beisner has quit IRC | 04:08 | |
*** hemanth_n has quit IRC | 04:08 | |
*** irclogbot_3 has quit IRC | 04:08 | |
*** haleyb has quit IRC | 04:08 | |
*** logan- has quit IRC | 04:08 | |
*** kevinz has quit IRC | 04:08 | |
*** lxkong has quit IRC | 04:08 | |
*** johnsom has quit IRC | 04:08 | |
*** tkajinam has quit IRC | 04:08 | |
*** xgerman has quit IRC | 04:08 | |
*** andrein has quit IRC | 04:08 | |
*** stevenglasford has quit IRC | 04:08 | |
*** ramishra has joined #openstack-lbaas | 04:14 | |
*** squarebracket has joined #openstack-lbaas | 04:14 | |
*** mnaser has joined #openstack-lbaas | 04:14 | |
*** dougwig has joined #openstack-lbaas | 04:14 | |
*** emccormick has joined #openstack-lbaas | 04:14 | |
*** gmann has joined #openstack-lbaas | 04:14 | |
*** dasp_ has joined #openstack-lbaas | 04:14 | |
*** amotoki has joined #openstack-lbaas | 04:14 | |
*** cgoncalves has joined #openstack-lbaas | 04:14 | |
*** andy_ has joined #openstack-lbaas | 04:14 | |
*** gthiemonge has joined #openstack-lbaas | 04:14 | |
*** nicolasbock has joined #openstack-lbaas | 04:14 | |
*** rm_work has joined #openstack-lbaas | 04:14 | |
*** headphoneJames has joined #openstack-lbaas | 04:14 | |
*** coreycb has joined #openstack-lbaas | 04:14 | |
*** frickler has joined #openstack-lbaas | 04:14 | |
*** openstackgerrit has joined #openstack-lbaas | 04:14 | |
*** zetaab has joined #openstack-lbaas | 04:14 | |
*** sorrison has joined #openstack-lbaas | 04:14 | |
*** eandersson has joined #openstack-lbaas | 04:14 | |
*** johnthetubaguy has joined #openstack-lbaas | 04:14 | |
*** tobberydberg_ has joined #openstack-lbaas | 04:14 | |
*** trident has joined #openstack-lbaas | 04:14 | |
*** zzzeek has joined #openstack-lbaas | 04:14 | |
*** mugsie has joined #openstack-lbaas | 04:14 | |
*** JayF has joined #openstack-lbaas | 04:14 | |
*** aannuusshhkkaa has joined #openstack-lbaas | 04:14 | |
*** jrosser has joined #openstack-lbaas | 04:14 | |
*** NobodyCam has joined #openstack-lbaas | 04:14 | |
*** vesper11 has joined #openstack-lbaas | 04:14 | |
*** jmccrory has joined #openstack-lbaas | 04:14 | |
*** ianychoi_ has joined #openstack-lbaas | 04:14 | |
*** njohnston has joined #openstack-lbaas | 04:14 | |
*** osmanlicilegi has joined #openstack-lbaas | 04:14 | |
*** rpittau has joined #openstack-lbaas | 04:14 | |
*** numans has joined #openstack-lbaas | 04:14 | |
*** devfaz has joined #openstack-lbaas | 04:14 | |
*** michchap has joined #openstack-lbaas | 04:14 | |
*** TMM has joined #openstack-lbaas | 04:14 | |
*** dosaboy has joined #openstack-lbaas | 04:14 | |
*** dmsimard has joined #openstack-lbaas | 04:14 | |
*** kklimonda has joined #openstack-lbaas | 04:14 | |
*** jamespage has joined #openstack-lbaas | 04:14 | |
*** stingrayza has joined #openstack-lbaas | 04:14 | |
*** yamamoto has joined #openstack-lbaas | 04:14 | |
*** tkajinam has joined #openstack-lbaas | 04:14 | |
*** servagem has joined #openstack-lbaas | 04:14 | |
*** fyx has joined #openstack-lbaas | 04:14 | |
*** lxkong has joined #openstack-lbaas | 04:14 | |
*** johnsom has joined #openstack-lbaas | 04:14 | |
*** xgerman has joined #openstack-lbaas | 04:14 | |
*** andrein has joined #openstack-lbaas | 04:14 | |
*** stevenglasford has joined #openstack-lbaas | 04:14 | |
*** beisner has joined #openstack-lbaas | 04:14 | |
*** hemanth_n has joined #openstack-lbaas | 04:14 | |
*** kevinz has joined #openstack-lbaas | 04:14 | |
*** colin- has joined #openstack-lbaas | 04:14 | |
*** f0o has joined #openstack-lbaas | 04:14 | |
*** dulek has joined #openstack-lbaas | 04:14 | |
*** dtruong has joined #openstack-lbaas | 04:14 | |
*** irclogbot_3 has joined #openstack-lbaas | 04:14 | |
*** haleyb has joined #openstack-lbaas | 04:14 | |
*** logan- has joined #openstack-lbaas | 04:14 | |
*** brtknr has joined #openstack-lbaas | 04:15 | |
*** mloza has joined #openstack-lbaas | 04:15 | |
*** armax has joined #openstack-lbaas | 04:15 | |
*** laerling has joined #openstack-lbaas | 04:15 | |
*** KeithMnemonic has joined #openstack-lbaas | 04:15 | |
*** bcafarel has joined #openstack-lbaas | 04:15 | |
*** oklhost has joined #openstack-lbaas | 04:15 | |
*** dayou has joined #openstack-lbaas | 04:15 | |
*** zigo has joined #openstack-lbaas | 04:15 | |
*** coreycb has quit IRC | 04:15 | |
*** nicolasbock has quit IRC | 04:16 | |
*** beisner has quit IRC | 04:16 | |
*** gmann has quit IRC | 04:16 | |
*** mnaser has quit IRC | 04:16 | |
*** fyx has quit IRC | 04:16 | |
*** coreycb has joined #openstack-lbaas | 04:16 | |
*** beisner has joined #openstack-lbaas | 04:18 | |
*** fyx has joined #openstack-lbaas | 04:18 | |
*** gmann has joined #openstack-lbaas | 04:19 | |
*** nicolasbock has joined #openstack-lbaas | 04:21 | |
*** yamamoto has quit IRC | 04:35 | |
*** yamamoto has joined #openstack-lbaas | 04:38 | |
*** gcheresh has joined #openstack-lbaas | 05:11 | |
*** vishalmanchanda has joined #openstack-lbaas | 05:25 | |
*** gcheresh has quit IRC | 05:32 | |
*** wuchunyang has joined #openstack-lbaas | 05:51 | |
*** ianychoi_ has quit IRC | 06:22 | |
*** ianychoi_ has joined #openstack-lbaas | 06:23 | |
*** tkajinam has quit IRC | 06:25 | |
*** tkajinam has joined #openstack-lbaas | 06:26 | |
*** also_stingrayza has joined #openstack-lbaas | 06:37 | |
*** stingrayza has quit IRC | 06:39 | |
*** also_stingrayza is now known as stingrayza | 06:47 | |
*** wuchunyang has quit IRC | 06:48 | |
*** wuchunyang has joined #openstack-lbaas | 06:53 | |
*** ataraday_ has joined #openstack-lbaas | 07:20 | |
*** wuchunyang has quit IRC | 07:25 | |
*** maciejjozefczyk has joined #openstack-lbaas | 07:57 | |
*** gcheresh has joined #openstack-lbaas | 08:00 | |
*** gcheresh has quit IRC | 08:24 | |
*** yamamoto has quit IRC | 08:26 | |
*** cgoncalves has quit IRC | 08:27 | |
*** yamamoto has joined #openstack-lbaas | 08:28 | |
*** cgoncalves has joined #openstack-lbaas | 08:29 | |
*** yamamoto has quit IRC | 08:29 | |
*** yamamoto has joined #openstack-lbaas | 08:29 | |
*** gcheresh has joined #openstack-lbaas | 08:30 | |
ataraday_ | cgoncalves, Hi! I was looking into making the grenade job do an amphora -> amphorav2 upgrade, and there is an issue with that. I found that I can pass some settings via grenade_devstack_localrc only to new setups https://review.opendev.org/#/c/737993/6/zuul.d/amphorav2-jobs.yaml@107 | 08:38 |
ataraday_ | But I cannot pass post-configs with that | 08:38 |
ataraday_ | I tried some options and checked other projects, but cannot find anything that helps | 08:39 |
cgoncalves | ataraday_, hello! yeah, grenade does not support post-config settings like normal jobs do. you will have to have a per release specific upgrade script | 08:44 |
cgoncalves | ref: https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade (see last bullet item in that section) | 08:45 |
cgoncalves | an example of a patch I have open: https://review.opendev.org/#/c/738017/4/devstack/upgrade/from-ussuri/upgrade-octavia | 08:45 |
cgoncalves | ataraday_, although I'm not sure we need such script if we go forward with aliasing "amphora" to "amphorav2" | 08:46 |
ataraday_ | cgoncalves, great, I'll look into it | 08:47 |
ataraday_ | mmm, this was about adding experimental jobs | 08:48 |
ataraday_ | maybe I should not add the grenade job for now | 08:50 |
*** born2bake has joined #openstack-lbaas | 08:53 | |
*** gcheresh has quit IRC | 08:55 | |
cgoncalves | ataraday_, you could propose a patch with the alias | 09:02 |
cgoncalves | should be trivial, see the "octavia" to "amphora" alias: https://github.com/openstack/octavia/blob/master/setup.cfg#L59-L60 | 09:04 |
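For context, a rough sketch of what the aliasing buys: provider names are just entry points in the 'octavia.api.drivers' namespace, so two names can resolve to the same driver class and operators keep their existing "amphora" provider setting once the v2 code takes over. The snippet below is illustrative only (the namespace is real, but the helper name and invocation details are assumptions for this sketch):

```python
# Illustrative sketch: how a provider alias resolves. Both the old and the
# new entry point name can point at the same "module:Class" string in
# setup.cfg, so loading either name yields the same driver.
from stevedore import driver as stevedore_driver


def load_provider_driver(provider_name):
    # 'octavia.api.drivers' is the entry point namespace Octavia uses for
    # provider drivers; an alias is simply a second entry point name that
    # targets the same class.
    return stevedore_driver.DriverManager(
        namespace='octavia.api.drivers',
        name=provider_name,
        invoke_on_load=True,
    ).driver


# With an "amphora = ...v2.driver:AmphoraProviderDriver" alias in place,
# both of these calls would return an instance of the same class:
# load_provider_driver('amphora')
# load_provider_driver('amphorav2')
```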
ataraday_ | I guess making amphorav2 -> amphora is the final step, and we need the experimental jobs before that to verify that we can do it :) | 09:07 |
cgoncalves | right, so a depends-on patch would work | 09:09 |
cgoncalves | 1) propose alias patch (ignore CI results), 2) set depends-on on the experimental jobs patch for CI validation | 09:10 |
ataraday_ | OK, but then this job will make sense only against this change... Maybe this is fine, I will add a comment about it. | 09:22 |
ataraday_ | Thanks a lot! | 09:22 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Add experimental amphorav2 jobs https://review.opendev.org/737993 | 09:39 |
cgoncalves | we can run some experiments to see how it goes :) | 09:40 |
*** dosaboy has quit IRC | 09:40 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/740432 | 09:41 |
cgoncalves | ataraday_, not sure I follow your comment in https://review.opendev.org/#/c/739053/3/octavia/common/base_taskflow.py. retryMaskFilter is in both v2.controller_worker and in base_taskflow. are you saying we only need it in one place? | 09:43 |
ataraday_ | cgoncalves, no, I mean we need it in both places. In v2.controller_worker and base_taskflow | 09:44 |
ataraday_ | with jobboard enabled or disabled, the logs come from different start points | 09:45 |
*** wuchunyang has joined #openstack-lbaas | 09:45 | |
ataraday_ | cgoncalves, https://review.opendev.org/#/c/647406/106/octavia/controller/worker/v2/controller_worker.py@48 It was dropped from v2.controller_worker | 09:49 |
cgoncalves | ataraday_, oh, I see the log filter was later removed in the v2.controller_worker. | 09:49 |
cgoncalves | yep | 09:49 |
ataraday_ | sorry for confusion | 09:49 |
cgoncalves | ataraday_, still, v2.controller_worker imports base_taskflow so the log filter will still be applied no? | 09:50 |
cgoncalves | ah, no. never mind | 09:50 |
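For anyone following along, the filter being discussed is a standard logging.Filter attached to the taskflow engine logger so expected retry exceptions do not spam tracebacks; because the v2 controller worker and base_taskflow are separate start points depending on whether jobboard is enabled, it has to be registered in both places. A rough sketch, with the class name and the "retryable" exception type assumed rather than taken from the actual patch:

```python
import logging


class RetryMaskFilter(logging.Filter):
    """Drop tracebacks for exceptions that only signal 'retry this task'."""

    def filter(self, record):
        # If the record carries exception info and it is a known retryable
        # error, suppress it; everything else passes through unchanged.
        if record.exc_info:
            exc_type = record.exc_info[0]
            if exc_type is not None and issubclass(exc_type, ConnectionError):
                # ConnectionError stands in here for whatever exception the
                # flows treat as retryable (an assumption in this sketch).
                return False
        return True


# Registered once per process entry point, e.g.:
# logging.getLogger('taskflow').addFilter(RetryMaskFilter())
```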
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: WIP SCTP traffic scenario tests https://review.opendev.org/738643 | 09:53 |
*** yamamoto has quit IRC | 09:53 | |
*** gcheresh has joined #openstack-lbaas | 09:55 | |
*** wuchunyang has quit IRC | 10:08 | |
*** yamamoto has joined #openstack-lbaas | 10:08 | |
*** yamamoto has quit IRC | 10:08 | |
*** yamamoto has joined #openstack-lbaas | 10:09 | |
*** yamamoto has quit IRC | 10:11 | |
*** yamamoto has joined #openstack-lbaas | 10:12 | |
*** gcheresh has quit IRC | 10:20 | |
*** spatel has joined #openstack-lbaas | 10:41 | |
*** spatel has quit IRC | 10:46 | |
*** pck has quit IRC | 10:48 | |
*** pck has joined #openstack-lbaas | 10:51 | |
*** dosaboy has joined #openstack-lbaas | 11:19 | |
*** dosaboy has quit IRC | 11:19 | |
*** dosaboy has joined #openstack-lbaas | 11:19 | |
*** ramishra has quit IRC | 11:23 | |
*** ramishra has joined #openstack-lbaas | 11:27 | |
*** yamamoto has quit IRC | 11:30 | |
*** yamamoto has joined #openstack-lbaas | 11:40 | |
*** yamamoto has quit IRC | 11:42 | |
*** pck has quit IRC | 11:52 | |
*** pck has joined #openstack-lbaas | 11:54 | |
*** yamamoto has joined #openstack-lbaas | 12:00 | |
*** gcheresh has joined #openstack-lbaas | 12:12 | |
*** pck has quit IRC | 12:12 | |
*** pck has joined #openstack-lbaas | 12:13 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora https://review.opendev.org/740432 | 12:14 |
*** yamamoto has quit IRC | 12:18 | |
*** yamamoto has joined #openstack-lbaas | 12:28 | |
*** spatel has joined #openstack-lbaas | 12:42 | |
*** spatel has quit IRC | 12:47 | |
*** yamamoto has quit IRC | 12:53 | |
*** yamamoto has joined #openstack-lbaas | 13:05 | |
*** gcheresh has quit IRC | 13:07 | |
*** mnaser has joined #openstack-lbaas | 13:09 | |
*** jamesdenton has joined #openstack-lbaas | 13:11 | |
devfaz | hi, anyone here able to help us get some loadbalancers back to "normal"? We have amphoras in ERROR state and are unable to failover. | 13:11 |
*** yamamoto has quit IRC | 13:12 | |
*** irclogbot_3 has quit IRC | 13:27 | |
*** kevinz has quit IRC | 13:27 | |
*** irclogbot_0 has joined #openstack-lbaas | 13:29 | |
*** hemanth_n has quit IRC | 13:29 | |
*** hemanth_n_ has joined #openstack-lbaas | 13:30 | |
*** TrevorV has joined #openstack-lbaas | 13:30 | |
*** haleyb has quit IRC | 13:30 | |
*** logan- has quit IRC | 13:30 | |
devfaz | we would like to remove an amphora from the database (the instance already got removed) and just let octavia create a new one. If we just try to failover an amphora we run into different issues, e.g. "unable to attach port" to the new amphora; then we removed the port => "Port: NULL not found"; then we created a vrrp_port as described here http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 | 13:30 |
devfaz | but now we are getting "subnet_id: Null"... is there an easy way to just tell octavia: hey, drop this amphora and create a new one with a new vrrp_port? | 13:30 |
*** gmann has quit IRC | 13:30 | |
*** gmann has joined #openstack-lbaas | 13:32 | |
*** logan- has joined #openstack-lbaas | 13:32 | |
*** yamamoto has joined #openstack-lbaas | 13:38 | |
openstackgerrit | Merged openstack/octavia master: Stop to use the __future__ module. https://review.opendev.org/732880 | 13:41 |
*** yamamoto has quit IRC | 13:42 | |
*** yamamoto has joined #openstack-lbaas | 13:52 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Deny the creation of L7Policies in TCP or UDP listeners https://review.opendev.org/740478 | 14:16 |
*** gcheresh has joined #openstack-lbaas | 15:09 | |
*** yamamoto has quit IRC | 15:10 | |
*** vishalmanchanda has quit IRC | 15:15 | |
*** sapd1_x has joined #openstack-lbaas | 15:16 | |
*** ataraday_ has quit IRC | 15:17 | |
*** tkajinam has quit IRC | 15:37 | |
*** gcheresh has quit IRC | 15:57 | |
*** gcheresh has joined #openstack-lbaas | 15:58 | |
*** armax has quit IRC | 16:16 | |
*** gcheresh has quit IRC | 16:19 | |
*** armax has joined #openstack-lbaas | 16:20 | |
*** dmellado has joined #openstack-lbaas | 16:50 | |
*** dmellado has quit IRC | 17:04 | |
*** dmellado has joined #openstack-lbaas | 17:08 | |
*** dmellado has quit IRC | 17:26 | |
*** armax has joined #openstack-lbaas | 17:36 | |
*** armax has quit IRC | 17:38 | |
*** sapd1_x has quit IRC | 18:44 | |
*** spatel has joined #openstack-lbaas | 19:22 | |
*** spatel has quit IRC | 19:23 | |
*** spatel has joined #openstack-lbaas | 19:23 | |
*** spatel has quit IRC | 19:30 | |
rm_work | hmm just heads up I am debugging an issue around some session persistence config causing LBs to ERROR | 19:36 |
rm_work | in my cloud, so cent8 amps and minor patching but not anything that should interfere, will follow up when i have some idea what's up | 19:37 |
johnsom | "not anything that should interfere" lol | 19:38 |
rm_work | it's pretty minimal now | 19:42 |
rm_work | nova scheduling patch, and a patch to force the cent8 amps to actually ARP properly on boot | 19:43 |
*** TrevorV has quit IRC | 20:01 | |
*** gcheresh has joined #openstack-lbaas | 20:02 | |
*** maciejjozefczyk has quit IRC | 20:25 | |
rm_work | johnsom: ok it's super weird | 20:41 |
cgoncalves | rm_work, could you please revisit https://review.opendev.org/#/c/738246/ (nest virt for CI patch) | 20:41 |
johnsom | rm_work Theme of my day | 20:42 |
rm_work | one of my amps gets | 20:42 |
rm_work | [2020-07-10 20:35:55 +0000] [1090] [DEBUG] Ignoring connection reset | 20:42 |
rm_work | and then won't respond for a while | 20:42 |
rm_work | then [2020-07-10 20:38:55 +0000] [1031] [CRITICAL] WORKER TIMEOUT (pid:1090) | 20:42 |
rm_work | and then exits and starts a new worker, then the new worker just gets constant SSL/socket errors | 20:43 |
johnsom | This is haproxy? <missing some context> | 20:43 |
rm_work | this is the agent | 20:43 |
rm_work | LBs getting stuck in pending | 20:43 |
rm_work | and eventually ERROR | 20:43 |
rm_work | (after timeout) | 20:43 |
johnsom | yeah, go to ERROR. Ok, so this is the gunicorn worker | 20:44 |
rm_work | yes | 20:44 |
johnsom | This rings a bell, I'm just not sure which one yet. | 20:44 |
rm_work | oh huh | 20:45 |
rm_work | [2020-07-10 20:34:10 +0000] [1090] [DEBUG] PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload | 20:45 |
rm_work | ::ffff:10.249.23.94 - - [10/Jul/2020:20:35:40 +0000] "PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload HTTP/1.1" 500 377 "-" "Octavia HaProxy Rest Client/0.5 (https://wiki.openstack.org/wiki/Octavia)" | 20:45 |
rm_work | right before it starts doing this | 20:46 |
johnsom | Yeah, is it memory pressure in the amp? | 20:46 |
rm_work | http://paste.openstack.org/show/3AbgVxoWHvxUwpf6N2Di/ | 20:47 |
*** KeithMnemonic has quit IRC | 20:47 | |
rm_work | no | 20:47 |
cgoncalves | could it be the haproxy memory bug gthiemonge has been working on? because you mentioned session persistence | 20:48 |
johnsom | Is there a "failed" haproxy config file? You know the one that it saves if haproxy doesn't like the config? | 20:48 |
rm_work | looking | 20:49 |
johnsom | Yeah, that is why I asked about the memory pressure | 20:49 |
rm_work | no | 20:49 |
rm_work | and the other amp took the config correctly | 20:49 |
rm_work | both amps seem to be around: | 20:49 |
rm_work | total used free shared buff/cache available | 20:49 |
rm_work | Mem: 979Mi 315Mi 540Mi 6.0Mi 123Mi 529Mi | 20:49 |
rm_work | seems not bad | 20:49 |
cgoncalves | rm_work, https://storyboard.openstack.org/#!/story/2007794 | 20:49 |
johnsom | Yeah, plenty | 20:50 |
rm_work | i can try changing the connection limit tho | 20:50 |
johnsom | I doubt it is related | 20:50 |
johnsom | Can you paste the systemd service file for haproxy? | 20:51 |
johnsom | Really I'm looking for the peer ID string | 20:51 |
rm_work | heh | 20:52 |
cgoncalves | hah | 20:52 |
rm_work | wait where is that | 20:52 |
rm_work | yeah it always happens when I add a member... | 20:52 |
rm_work | i am trying to confirm that also | 20:52 |
johnsom | You can also get the string I want with "ps -ef | grep haproxy" | 20:54 |
johnsom | Should be after the -L | 20:54 |
rm_work | ah | 20:54 |
rm_work | lEyka8jttt6jkyQiPHSB6AvUwU0 | 20:54 |
rm_work | no - | 20:54 |
johnsom | Ok, bummer, it's not that | 20:54 |
johnsom | what does the haproxy log file have? Anything interesting? | 20:55 |
rm_work | ah i do see something in there, was just looking | 20:55 |
rm_work | interesting | 20:55 |
rm_work | FD limit issues | 20:56 |
rm_work | on third member i think | 20:56 |
rm_work | http://paste.openstack.org/show/795804/ | 20:56 |
rm_work | maxcon related also | 20:56 |
johnsom | Nope, it's the usage output. | 20:57 |
johnsom | The FD stuff is always there | 20:57 |
johnsom | So, that peer ID has to be the problem. | 20:57 |
cgoncalves | wouldn't we see a second usage output if it was a bad peer ID? | 20:58 |
johnsom | Hmm, there is a "cannot fork" in there | 20:58 |
johnsom | Maybe it is memory. Try dropping the max connections on the listener down to 50k | 20:58 |
rm_work | wait so we ALWAYS have an FD problem? | 20:59 |
johnsom | Yeah, it always whines about the FD limit and drops it down to whatever the instance can handle | 20:59 |
johnsom | Something that is fixed in 2.x versions btw | 21:00 |
johnsom | That usage output makes me wonder though. I don't think we see that when it's just the memory being too low | 21:02 |
rm_work | hmmm this is weird tho | 21:05 |
rm_work | i created a third member and it worked fine... | 21:05 |
rm_work | so i deleted it, which worked | 21:06 |
rm_work | and recreated it | 21:06 |
rm_work | and now it broke one amp again | 21:06 |
cgoncalves | maxconn issue | 21:06 |
rm_work | memory is totally fine tho | 21:07 |
johnsom | Give it a shot though, I'm leaning that way as well | 21:07 |
cgoncalves | rm_work, it is until haproxy tries to reload | 21:07 |
rm_work | k | 21:08 |
*** gcheresh has quit IRC | 21:08 | |
johnsom | You could check syslog and see if there are oom logs, but I don't think there always are | 21:08 |
rm_work | i mean it then succeeds at loading haproxy again right after | 21:08 |
rm_work | and memory is NOW fine | 21:08 |
rm_work | but the amp agent is still totally busted | 21:08 |
rm_work | and therefore the amp is broken | 21:08 |
cgoncalves | rm_work, it works fine if you don't reload too fast | 21:09 |
rm_work | ok sooooo | 21:09 |
rm_work | why is the amp agent dead | 21:09 |
rm_work | and spewing connection errors | 21:09 |
rm_work | so haproxy failed to load -- ok | 21:09 |
rm_work | but now the amp agent can't accept connections? | 21:09 |
rm_work | how would one affect the other that way | 21:10 |
johnsom | I have some theories on that. My guess is the systemctl restart was hanging, which led the controller to time out and close the connection, thus the connection reset, but gunicorn is still waiting on systemctl to give up or whatever | 21:10 |
rm_work | trying a manual restart of the amp agent to see if it comes back | 21:11 |
johnsom | check if there is a systemctl in the process list | 21:11 |
rm_work | ok well i restarted the agent | 21:12 |
rm_work | and the amp is back to ACTIVE | 21:12 |
rm_work | and now the LB is good | 21:12 |
rm_work | "restart" didn't work, had to do stop/start BTW | 21:12 |
rm_work | yeah | 21:15 |
rm_work | /bin/systemctl reload haproxy-143b63ec-6058-437f-8eb6-112380a612e4.service | 21:15 |
rm_work | it's stuck on the reload and timing out gunicorn, you're correct | 21:15 |
rm_work | so gunicorn doesn't handle that well and just ... breaks? | 21:15 |
rm_work | and can't recover | 21:15 |
johnsom | Well, we have one worker configured (for good reasons) and that one worker is locked up with systemd dumbness | 21:16 |
rm_work | hmm | 21:16 |
rm_work | and systemd NEVER times out? | 21:17 |
rm_work | ah no, it is gone now | 21:17 |
rm_work | but agent is still borked I think | 21:17 |
johnsom | It does, but probably waits longer than gunicorn | 21:17 |
rm_work | so once that's timed out, gunicorn should start responding, right? | 21:17 |
johnsom | Yeah, we see it killing the worker, but I don't know what it does when a new worker is started, does it just run the request again? | 21:19 |
rm_work | doesn't seem to | 21:19 |
johnsom | So out of curiosity, did lowering the listener connection limit help? | 21:23 |
rm_work | still doing a bunch of testing to make sure i can 100% replicate | 21:25 |
rm_work | i think i'm about confident | 21:25 |
johnsom | Now that we are on python3 we could add a timeout to that systemd call | 21:35 |
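The idea floated here (a python3-only timeout on the systemd call) could look roughly like the following inside the amphora agent; the unit name pattern and the timeout value are placeholders, not the agent's actual code:

```python
import subprocess


def reload_haproxy(lb_id, timeout=30):
    """Ask systemd to reload the per-LB haproxy unit, but never block forever.

    A hung 'systemctl reload' is what wedges the single gunicorn worker, so
    bounding the call lets the agent return an error to the controller
    instead of going silent.
    """
    cmd = ['systemctl', 'reload', 'haproxy-{}.service'.format(lb_id)]
    try:
        subprocess.run(cmd, check=True, timeout=timeout,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.TimeoutExpired:
        # systemd did not come back in time; surface a failure rather than
        # leaving the request (and the worker) hanging indefinitely.
        raise RuntimeError('haproxy reload timed out after %ss' % timeout)
```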
*** shtepanie has joined #openstack-lbaas | 21:41 | |
aannuusshhkkaa | johnsom: are you around? | 21:42 |
johnsom | aannuusshhkkaa I am | 21:43 |
aannuusshhkkaa | so we were wondering if we should create a new table for the new metrics we are adding | 21:44 |
johnsom | Those are the CPU and RAM? | 21:45 |
shtepanie | yep, and later on we plan on adding things like load averages and disk usage | 21:45 |
aannuusshhkkaa | and probably some related to active connections and network bandwidth etc... | 21:46 |
johnsom | Hmm, do you need them stored for your use case? | 21:46 |
aannuusshhkkaa | as opposed to? | 21:46 |
johnsom | Well, we could collect them from the amphora, pass them to the metrics driver(s), and that is all. | 21:47 |
johnsom | We are not planning to add those to the API, so I don't know if they need to be stored or simply passed to the metrics driver(s) | 21:47 |
aannuusshhkkaa | hmm okay.. | 21:49 |
aannuusshhkkaa | so we wont need a new api either right? | 21:50 |
johnsom | We store the current metrics because they are exposed via the API, so if someone asks for the stats we don't have to wait for a message or poll the amphora. But these metrics I don't think we are planning to add to the API, we are just going to use them and/or send them somewhere. | 21:50 |
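The shape being discussed is a pluggable stats/metrics driver: the existing update_db driver persists listener stats so the API can serve them, while new metrics (CPU, RAM, etc.) could simply be handed to every loaded driver without touching the database. A minimal sketch of that idea, with class and method names assumed rather than taken from the review:

```python
import abc


class StatsDriverBase(metaclass=abc.ABCMeta):
    """Sketch of a pluggable metrics/stats driver interface."""

    @abc.abstractmethod
    def update_stats(self, amphora_metrics):
        """Receive a batch of metrics reported by an amphora.

        amphora_metrics: assumed here to be a list of dicts with keys such
        as 'amphora_id', 'cpu_percent', 'mem_percent', 'active_connections'.
        """


class UpdateDBDriver(StatsDriverBase):
    def update_stats(self, amphora_metrics):
        # The existing driver would persist listener stats so the API can
        # answer stats requests without polling the amphora (sketch only).
        pass


class ForwarderDriver(StatsDriverBase):
    def update_stats(self, amphora_metrics):
        # A pass-through driver could push CPU/RAM samples to an external
        # metrics system instead of storing them in Octavia's database.
        pass
```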
aannuusshhkkaa | so we just query the amphora for the NEW metrics say every 10 seconds, and use them however we want? | 21:52 |
johnsom | Well, the amphora is going to send them right? we aren't adding polling. | 21:53 |
shtepanie | we want our customer to be able to get the metrics when they ask for it, so wouldn't we fall into the first scenario? and if we don't add them to the api, the customer would just have to wait for the next message to get the stats? | 21:54 |
johnsom | Hmm, we didn't write up a spec did we rm_work? | 21:54 |
rm_work | we did not.... though i expected we WOULD expose those via the API somehow | 21:55 |
rm_work | I realize they're not always totally generic | 21:55 |
johnsom | Yeah, so maybe our use cases are very different. I thought we were updating the metrics and adding the ability to have metrics drivers, which would take the data and do something with it. | 21:56 |
rm_work | yes, both | 21:56 |
rm_work | but they're separate things | 21:56 |
rm_work | and if we pass that data to the metrics drivers, the update_db driver needs to store it too... | 21:56 |
johnsom | Yeah, the issues with the API are, 1. users aren't supposed to know amphora even exist, let alone the memory or CPU allocated to them. | 21:56 |
rm_work | right, been thinking about that | 21:57 |
johnsom | 2. we would have to store those | 21:57 |
rm_work | so if we didn't create a new table and added them to the listenerstats table... there would be a little bit of data duplication on the amp driver, but could allow other providers at least to store things granularly enough | 21:57 |
johnsom | 3. then expose more "amphora" stuff | 21:57 |
johnsom | 4. Is it going to be consistent with different amphora images, etc. | 21:58 |
rm_work | well, they all have a concept of CPU/RAM | 21:58 |
rm_work | and if it's percentages, then... pretty consistent | 21:58 |
johnsom | Not really per-listener or LB though | 21:58 |
johnsom | This just feels like exposing the sausage making, it's ugly and customers really only want the finished product. | 21:59 |
rm_work | yeah, i mean we will already have to make funky decisions like "do we store just the MAX of the usage values, since we have two amps returning data?" | 22:00 |
rm_work | you say that, but we've tried to convince our customer that all they need is active connections and an estimate of the maximum we support | 22:00 |
rm_work | and that has not been an accepted answer, they want to see CPU/RAM data | 22:01 |
johnsom | Yeah, you just can't add that to the listener or lb api. It wouldn't make any sense | 22:01 |
rm_work | it wouldn't necessarily make sense to SHOW on the listener stats api call | 22:02 |
johnsom | The only place you could add it is is to create a new "amphora stats" API | 22:02 |
rm_work | yeah, that's an option | 22:04 |
rm_work | I mean, it COULD be displayed on the loadbalancer stats | 22:04 |
rm_work | because I believe every provider will have these stats, even the hardware ones? wouldn't an F5 have CPU/RAM usage? | 22:04 |
johnsom | So you have an LB with five amphora on it. Do you average? | 22:04 |
rm_work | like i said, Peak | 22:04 |
rm_work | Max | 22:05 |
johnsom | No, F5 doesn't have this | 22:05 |
johnsom | F5 would have a few thousand load balancers all sharing a CPU and RAM | 22:05 |
rm_work | ah I didn't think F5 did multi-tenant | 22:06 |
rm_work | i guess it makes sense tho | 22:06 |
rm_work | i've just never seen it deployed that way | 22:06 |
aannuusshhkkaa | looks like F5 does? https://techdocs.f5.com/kb/en-us/products/big-ip_analytics/manuals/product/analytics-implementations-12-1-0/7.html | 22:07 |
aannuusshhkkaa | have cpu and ram usage.. correct me if I am wrong! | 22:07 |
johnsom | aannuusshhkkaa Yeah, that is exactly what I am saying. The only metrics for cpu/disk/ram on the appliances are for thousands of load balancers. It's not per-load balancer like the amphora. | 22:08 |
*** rcernin has joined #openstack-lbaas | 22:09 | |
aannuusshhkkaa | ouuu okay | 22:10 |
rm_work | hmm, it does have them? | 22:10 |
rm_work | and i guess it depends on the deployment | 22:10 |
johnsom | On F5 a load balancer is a "virtual server". The appliances are expensive, so you stack as many virtual servers (load balancers) on each appliance as you can. | 22:10 |
aannuusshhkkaa | and if one fails, does it mean the others are about to fail too? | 22:11 |
johnsom | Yeah, it is shared fate, but typically you have them in an HA pair and it fails over to the other appliance. | 22:12 |
rm_work | so you would -2 adding system statistics to LoadbalancerStats in any form? | 22:13 |
aannuusshhkkaa | gotcha, then using peak(max) totally makes sense right? | 22:13 |
johnsom | So every customer would see 95% basically. No matter if their load balancer was idle. | 22:14 |
aannuusshhkkaa | i guess a false positive indicating failure is safer than a false negative in this case right? | 22:16 |
rm_work | mixed bag -- don't necessarily want a ton of customers coming to us and complaining that their LB is always over 50% capacity and they want a new one | 22:16 |
johnsom | Just to give you an idea, this is the cheapest F5 appliance: https://www.softchoice.com/catalog/en-us/network-devices-f5-big-ip-iseries-local-traffic-manager-i2800-load-balancing-device-F5Networks-UX5251 | 22:17 |
rm_work | lol yeah | 22:17 |
aannuusshhkkaa | whaaaaat!!!!! | 22:17 |
aannuusshhkkaa | rm_work, right.. so what do we do then? about the mixed bag? | 22:18 |
*** dmellado has joined #openstack-lbaas | 22:20 | |
johnsom | Oh, ha, I was wrong, there is one for $16,000 | 22:20 |
rm_work | well, one option is we tell our customer "sorry, we know you want CPU/RAM stats, but... you don't get that. Trust us that the meaningful metric is current active connections, and use that." | 22:22 |
rm_work | which I tried to do in our last meeting and was ignored | 22:22 |
rm_work | or rather, told "no, that isn't good enough" | 22:23 |
rm_work | but we may just have to be more forceful | 22:23 |
johnsom | Yeah, so given that for pretty much every other driver those are shared resources across tenants, I don't think it is viable for adding to the load balancer or listener stats. | 22:23 |
johnsom | If you want to make something like that public, you could I guess come up with a bogo-mip kind of thing. It's not really CPU or RAM, but "bogo-load". | 22:24 |
johnsom | Most drivers would probably do number of connections over max | 22:25 |
rm_work | right, a "capacity units" measurement, 0-1 | 22:26 |
rm_work | something like that | 22:26 |
rm_work | and yeah, we would do connections over our estimated max | 22:26 |
johnsom | We already have that to some degree as the listener will go degraded if the connection level goes too high | 22:27 |
aannuusshhkkaa | how do we check that currently? just based on number of active connections? | 22:31 |
johnsom | Yes. You set your maximum number of connections when creating the listener. Then we currently collect the number of current active connections. | 22:32 |
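The "bogo-load" / capacity-unit idea above boils down to a ratio of observed connections to a configured or operator-estimated maximum, taking the worst amphora when an LB has more than one. A back-of-the-envelope sketch (function name and inputs are illustrative):

```python
def capacity_units(per_amp_active_conns, estimated_max_conns):
    """Return a 0..1 'how full is this load balancer' number.

    per_amp_active_conns: active connection counts reported by each amphora.
    estimated_max_conns: operator-determined ceiling for the deployment
    (per the discussion, this has to come from performance testing, not
    from raw CPU/RAM numbers).
    """
    if not per_amp_active_conns or estimated_max_conns <= 0:
        return 0.0
    # Use the busiest amphora ("peak/max" as suggested above) so a single
    # hot amp is not hidden by averaging across an ACTIVE/STANDBY pair.
    return min(1.0, max(per_amp_active_conns) / float(estimated_max_conns))


# e.g. capacity_units([42000, 39500], 100000) -> 0.42
```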
johnsom | HAProxy also notifies us via the "FULL" state: https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L289 | 22:33 |
johnsom | That means connections are getting queued on the front end | 22:33 |
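For reference, the linked update_db code is where the amphora-reported listener state gets translated into an operating status; conceptually it is a small mapping like the sketch below (the status strings follow the Octavia API, but the function itself is illustrative, not the actual implementation):

```python
def listener_operating_status(haproxy_state):
    """Map the state HAProxy reports for a frontend to an Octavia status.

    'FULL' means the listener hit its connection limit and new connections
    are being queued, which Octavia surfaces as DEGRADED rather than ERROR.
    """
    mapping = {
        'OPEN': 'ONLINE',
        'FULL': 'DEGRADED',
    }
    return mapping.get(haproxy_state, 'ERROR')
```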
aannuusshhkkaa | aah gotcha | 22:40 |
aannuusshhkkaa | so the health of a LB is determined based on current connections/max number of connections? | 22:41 |
johnsom | Among the other status issues that could lead to a degraded or error operational state. | 22:41 |
rm_work | basically everything boils down to how many connections can be open | 22:42 |
rm_work | CPU/RAM is all just there so we can have more open connections | 22:42 |
johnsom | +1 to that | 22:42 |
rm_work | basically an operator should know in their environment how many connections is a theoretical max | 22:42 |
rm_work | and then that should be the "100%" | 22:43 |
aannuusshhkkaa | johnsom, what are the other "status issues"? | 22:43 |
johnsom | Like the number of member servers that are down, etc. | 22:43 |
aannuusshhkkaa | okay so would we have to incorporate those as well in determining the health of the LB? | 22:44 |
johnsom | So if you have a pool with five member servers, and one is not responding, that would be a degraded state as well. If all are not responding, then you are in ERROR. | 22:44 |
shtepanie | going back a little, but if HAProxy notifies us of a "FULL" state, is it also possible to notify us of a "almost full" type of state? some sort of warning when we're starting to use close to a full state? | 22:45 |
johnsom | Yeah, we already do. That is what the operational status is for | 22:45 |
shtepanie | ahh okay | 22:45 |
aannuusshhkkaa | do we already have something that would warn us when the LB is at 50% or 75% capacity? | 22:46 |
johnsom | Not really percentages. Really CPU and RAM won't tell you that either. | 22:48 |
aannuusshhkkaa | yeap yeap.. | 22:48 |
aannuusshhkkaa | so if we want to find that out, what would we use? | 22:49 |
johnsom | I mean a user can calculate the percentage using the active connections. We will say when you have reached your connection capacity limit. The part that is harder is deciding on the correct "MAX" to set. | 22:50 |
rm_work | yeah MAX has to be determined with performance testing per cloud | 22:50 |
aannuusshhkkaa | dont we already calculate the max for each cloud? | 22:51 |
johnsom | google "openstack octavia performance" a few lines down is a guide I wrote a long time ago that I guess is now published. It has a list of some of the factors that will go into why getting a "MAX" is hard. | 22:54 |
aannuusshhkkaa | alrighty.. i'll take a look | 22:56 |
rm_work | yeah it has to do with your network hardware setup, your compute hosts, and maybe a couple other things | 22:59 |
aannuusshhkkaa | https://developer.rackspace.com/docs/private-cloud/rpc/master/rpc-octavia-internal/octavia-perf-guide/ is this the link? | 22:59 |
rm_work | yeah | 22:59 |
aannuusshhkkaa | okay | 22:59 |
johnsom | rm_work So back to the haproxy reload issue. Systemd docs suck, so it's not clear which of those timeouts matter on a reload call. Worse yet, it implies the timeout is 100ms which can't be true. Any luck testing out the connection limit change on the listener? | 23:02 |
aannuusshhkkaa | so there are about 18 factors that contribute to the performance according to that page.. do we collect that data? if so, where can i find it? | 23:02 |
johnsom | At least that many factors... | 23:03 |
rm_work | yeah seems to be the connection limit thing | 23:03 |
aannuusshhkkaa | yeah.. maybe if we look at the logs, we can come up with a formula to determine LB capacity units.. | 23:03 |
rm_work | guess the memory pressure is too ephemeral | 23:03 |
johnsom | Yeah, great. ok. So we have been talking about this internally for a while. Thus the patch Greg posted, but last I looked it needs work. | 23:04 |
johnsom | Basically with the "unlimited" -1, we translated that into 1,000,000 connections. With the memory allocation up front, that is a sizable amount. Now, using the current reload mechanism for hitless reloads, haproxy starts a secondary process, or depending on how often the reloads come in, more. | 23:06 |
johnsom | What seems to be a bit new is how long the old processes stick around. | 23:06 |
johnsom | In general, what should happen is if haproxy doesn't have enough memory, it should fail and systemd *should* kill it and restart it. So the pain would only be that it was a non-hitless reload, but the amp should continue fine. However that isn't happening right. I found a bug in systemd that was causing restarts to not fire in the version centos has. That *should* be fixed, at least in RHEL. | 23:10 |
johnsom | The gunicorn issue.... that one I'm not sure about. If we could set a timeout for systemd reload, that would be great. The only other thought is setting a timeout on the subprocess call in python. | 23:11 |
rm_work | hmm | 23:21 |
rm_work | weird that even a "restart" on the agent didn't actually stop/start the agent properly either | 23:21 |
rm_work | it was kinda... stuck? | 23:21 |
rm_work | I think | 23:21 |