Friday, 2018-04-13

*** Deific714 has joined #openstack-lbaas00:08
*** Deific714 has quit IRC00:10
openstackgerritMichael Johnson proposed openstack/octavia master: Switch to ubuntu-minimal for default amphora image
openstackgerritAdam Harwell proposed openstack/neutron-lbaas master: WIP: Test l7 proxy to octavia
*** harlowja has quit IRC00:52
*** ianychoi_ is now known as ianychoi01:01
openstackgerritAdam Harwell proposed openstack/neutron-lbaas master: WIP: Test l7 proxy to octavia
johnsomHa, bionic passed the gate01:41
xgerman_Proxy is down to 12 fails…02:59
openstackgerritMerged openstack/neutron-lbaas master: Fix pep8 errors
*** imacdonn has quit IRC03:09
*** imacdonn has joined #openstack-lbaas03:09
*** annp has joined #openstack-lbaas03:21
*** benj_UW23UQ has joined #openstack-lbaas03:23
*** benj_UW23UQ has quit IRC03:25
*** H48XK3b00kworm has joined #openstack-lbaas03:36
*** H48XK3b00kworm has quit IRC03:38
*** sanfern has joined #openstack-lbaas03:46
*** pppktzMPTGBW has joined #openstack-lbaas03:51
*** pppktzMPTGBW has quit IRC03:53
*** sanfern has quit IRC04:02
*** bbzhao has quit IRC04:09
*** bbzhao has joined #openstack-lbaas04:09
*** sapd has joined #openstack-lbaas04:11
*** zuollien has joined #openstack-lbaas04:11
*** zuollien has quit IRC04:12
*** harlowja has joined #openstack-lbaas04:44
*** sapd has quit IRC04:44
*** sapd has joined #openstack-lbaas04:47
*** links has joined #openstack-lbaas05:12
*** mordred has quit IRC05:29
*** harlowja has quit IRC05:39
*** mordred has joined #openstack-lbaas05:42
*** sapd has quit IRC05:45
*** dayou has quit IRC06:08
*** dayou has joined #openstack-lbaas06:23
*** AlexeyAbashkin has joined #openstack-lbaas06:37
*** slaweq has joined #openstack-lbaas06:54
*** AlexeyAbashkin has quit IRC06:58
*** tesseract has joined #openstack-lbaas07:17
*** rcernin has quit IRC07:33
*** AlexeyAbashkin has joined #openstack-lbaas07:41
*** AlexeyAbashkin has quit IRC07:46
*** dulek has joined #openstack-lbaas08:10
dulekHey people! We've started seeing issues with Octavia on Kuryr gates.08:11
dulekBasically, in some runs the amphorae don't answer:
dulekAny ideas what might be the cause? I see your gates are healthy, so I'm pretty surprised.08:12
dmelladohey dulek08:14
dmelladoso, calling cgoncalves_08:14
dmelladoone two three08:14
dmelladoalso if you say bcafarel three times it might help08:14
dmelladobcafarel: bcafarel bcafarel08:14
dmelladoit's not beetlejuice but anyways08:14
openstackgerritAlberto Planas proposed openstack/octavia master: Update osutil support for SUSE distro
bcafareldmellado: nah you won't succeed planting the beetlejuice song in my head :p08:17
bcafarellooks like the first amphora replies in the end?
bcafareloh ok failure on config upload later on08:19
bcafarelmay need confirmation from others here, but if you reach config upload stage, the amphora itself is up and replying08:20
bcafarelbut then the agent in the amphora replies 50008:21
dulekbcafarel: Any way to see its logs?08:23
bcafareldulek: not at the moment IIRC :/ gathering amphora logs is still on the todo list08:25
bcafarel25s for health check, maybe the gate is missing nested kvm or something like that?08:26
dulekbcafarel: You think it's infra VM being slow?08:26
dulekbcafarel: I don't think it's missing that… Hm.08:27
dulekbcafarel: If it was infra's fault you should observe that in your gate as well.08:27
dulekOh, but you might have longer timeouts than us.08:27
bcafarelI admit I never checked the HM logs in gates :/ so not sure if replies that long happen often or not08:28
*** yamamoto has quit IRC08:33
openstackgerritMerged openstack/neutron-lbaas master: Cap haproxy log level severity
*** bbzhao has quit IRC08:44
*** bbzhao has joined #openstack-lbaas08:44
*** yamamoto has joined #openstack-lbaas08:44
*** yamamoto has quit IRC08:51
openstackgerritWei Li proposed openstack/octavia master: No need create vrrp port in TOPOLOGY_SINGLE
openstackgerritWei Li proposed openstack/octavia master: No need create vrrp port in TOPOLOGY_SINGLE
*** pcaruana has joined #openstack-lbaas09:12
*** salmankhan has joined #openstack-lbaas09:19
*** sanfern has joined #openstack-lbaas09:22
*** sanfern has quit IRC09:22
*** fnaval_ has quit IRC09:32
*** celebdor1 has joined #openstack-lbaas09:49
*** celebdor1 is now known as apuimedo09:50
*** salmankhan has quit IRC09:57
*** salmankhan has joined #openstack-lbaas10:09
*** annp has quit IRC10:29
*** salmankhan has quit IRC10:31
*** salmankhan has joined #openstack-lbaas10:31
*** irenab has quit IRC10:42
*** oanson has quit IRC10:44
rm_workdulek: our timeouts are very long, it can take up to like 6-8 minutes to boot one amphora in the gates without nested kvm11:30
rm_workyou will see a TON of the "timeout" messages, and they look very foreboding, but it usually eventually connects11:31
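The "timeout" warnings rm_work mentions come from the controller retrying the amphora agent over and over until it answers or the retry budget is spent. A minimal sketch of such a bounded retry loop, assuming a hypothetical `connect` callable that raises `ConnectionError` while the agent is unreachable. Octavia's own knobs for this live in its configuration (e.g. `connection_max_retries` and `connection_retry_interval` under `[haproxy_amphora]`); check the config reference for your release rather than relying on this sketch.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    max_retries: int,
    retry_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `connect` until it succeeds or `max_retries` attempts are used up.

    `connect` is a hypothetical callable; each ConnectionError it raises is
    the kind of "timeout" warning that looks alarming in the logs but is
    expected while a slow (non-nested-KVM) amphora is still booting.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except ConnectionError as exc:
            print(f"attempt {attempt}/{max_retries} failed: {exc}")
            if attempt < max_retries:
                sleep(retry_interval)
    # Gave up: the worst-case wait was roughly max_retries * retry_interval.
    return None
```

The worst-case wait is roughly `max_retries × retry_interval`, so a job-level timeout (15 minutes in the Kuryr gate) only bites if that product, plus boot time, exceeds it.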
rm_workbut yeah, bcafarel's assessment seems correct... it got to creating the listener and that exploded.11:32
rm_workdo you use a custom amphora image at all, or just the same one devstack makes?11:32
*** atoth has joined #openstack-lbaas11:51
dulekrm_work: Currently it's the DevStack-made.12:03
dulekrm_work: We're planning to use the one from tarballs, but that commit isn't merged yet.12:04
dulekOne more data point - the issue is transient.12:04
cgoncalvesdulek, you mean ?12:07
dulekcgoncalves: I think so.12:08
dulekrm_work: Our timeout is 15 minutes.12:08
cgoncalvespatch was merged already; images being uploaded to tarballs.o.o12:09
dulekcgoncalves: I mean Kuryr gate patch.12:10
dulekcgoncalves: This one:
rm_workyeah it's not the timeout12:14
rm_worknot sure why but my guess is something is happening in your cloud such that for the first few seconds after the amphora VM boots, it's a little unstable (like, maybe a cloud-init latency thing, or networking peculiarities)12:15
rm_workand so it does finally respond, but then it is still processing stuff in the background or something, so when the agent goes to set up the config (and probably the netns) it breaks12:16
rm_workreally wish our 500s were more useful12:16
cgoncalvesdulek, oh, ok. I lost the backlog so I got the conversation halfway through12:18
*** sanfern has joined #openstack-lbaas12:49
apuimedodulek: did you show cgoncalves the log from the gate?12:54
dulekapuimedo: Not from your run, but the others.12:54
*** dayou has quit IRC12:55
*** KeithMnemonic has joined #openstack-lbaas13:02
cgoncalvesapuimedo, dulek: I lost backlog older than 1h30. what I see from is a TypeError: delete_namespaced_service() takes exactly 4 arguments (3 given)13:04
dulekcgoncalves: Sorry for confusing you. We're currently investigating .13:08
dulekcgoncalves: But I have a feeling it's apuimedo's fault on that patch, not Octavia's.13:08
apuimedodulek: :'(13:08
dulekcgoncalves: And the issue you've listed is fixed now in DevStack. It's always like we get 2 or 3 gate breakages in at once. :D13:09
dulekapuimedo: Hey, haven't I convinced you on #openstack-kuryr?13:09
apuimedocgoncalves: when does an LB go operating_status OFFLINE with an ACTIVE provisioning status?13:19
cgoncalvesapuimedo, heartbeats not received13:20
apuimedoI see nothing on the health manager log13:23
apuimedo(deployed by tripleo)13:23
cgoncalvesapuimedo, glad you asked :)13:25
cgoncalvesbackported to and merged in queens already13:25
cgoncalvesapuimedo, rhbz downstream bug 1506644 in openstack-tripleo-common "Add support for configuring 'in-overcloud' resources for octavia through workflows and ansible" [High,Post] - Assigned to cgoncalves13:26
apuimedocgoncalves: so... without this fix... Does anything ever go into good operation?13:27
cgoncalvesapuimedo, IIRC amphorae still serve traffic13:28
*** fnaval has joined #openstack-lbaas13:30
*** samccann has joined #openstack-lbaas13:30
apuimedocgoncalves: so what does it affect? Only reporting?13:31
*** tzumainn has joined #openstack-lbaas13:33
cgoncalvesapuimedo, I'd expect octavia to trigger failover. 1-2 people have told me it was not; I haven't confirmed13:33
*** fnaval has quit IRC13:34
cgoncalvesrm_work, shouldn't housekeeping trigger failover on LBs in operating_status=OFFLINE?13:34
apuimedocgoncalves: what does triggering failover mean in a non-ha environment, respawning the amphora?13:34
cgoncalvesapuimedo, yes. create new vm, configure amphora, delete failed amphora13:35
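The order cgoncalves gives for failover without an HA pair (create a new VM, configure it, then delete the failed amphora) can be sketched with hypothetical callables. Octavia's real failover flow has many more tasks (ports, VIP plugging, database updates); this only illustrates why the old amphora is removed last.

```python
from typing import Callable


def failover_amphora(
    old_id: str,
    create_vm: Callable[[], str],
    configure: Callable[[str], None],
    delete_vm: Callable[[str], None],
) -> str:
    """Replace a failed amphora in a single (non-HA) topology.

    All three callables are hypothetical stand-ins. The old amphora is only
    deleted after the replacement is built and configured, so an exception
    part-way through leaves the old one in place for another attempt (the
    half-built replacement would still need cleanup, omitted here).
    """
    new_id = create_vm()
    configure(new_id)
    delete_vm(old_id)
    return new_id
```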
apuimedocgoncalves: thanks13:36
*** dayou has joined #openstack-lbaas13:38
*** velizarx has joined #openstack-lbaas14:29
*** fnaval has joined #openstack-lbaas14:42
*** AlexeyAbashkin has joined #openstack-lbaas14:45
*** velizarx has quit IRC14:46
*** velizarx has joined #openstack-lbaas14:49
*** AlexeyAbashkin has quit IRC14:49
*** velizarx has quit IRC14:49
*** openstackgerrit has quit IRC14:50
*** velizarx has joined #openstack-lbaas14:56
*** links has quit IRC14:58
*** velizarx has quit IRC14:59
*** velizarx has joined #openstack-lbaas15:01
*** velizarx has quit IRC15:06
*** pcaruana has quit IRC15:12
*** slaweq has quit IRC15:12
*** gokhan_ has quit IRC15:13
*** slaweq has joined #openstack-lbaas15:19
*** openstackgerrit has joined #openstack-lbaas15:23
openstackgerritMichael Johnson proposed openstack/octavia master: Switch to ubuntu-minimal for default amphora image
*** dayou has quit IRC15:24
*** slaweq has quit IRC15:30
*** dlundquist has joined #openstack-lbaas15:31
johnsomYeah, working on the image a bit. Saved 100MB already15:32
johnsomPlus have a bionic image and gate in the works.15:32
apuimedoI tried it a while ago with centos minimal and ubuntu minimal15:33
apuimedoand shaved a lot15:33
apuimedoI didn't have time to clean up and submit though :(15:33
johnsomYeah, I tried minimal a year or two ago and it was super broken, but Ian has been getting things in shape.15:33
*** qwebirc75161 has joined #openstack-lbaas15:34
cgoncalveswant it really minimal? alpine is the answer :)15:37
johnsomcgoncalves Great, make it happen!  Thanks for volunteering.  Grin15:37
johnsomWait, did you guys buy that too?15:38
cgoncalveswe can't afford it atm. we're saving for next RH party15:40
johnsomWhere are our invites?15:40
xgerman_yep, by now I think we are on the black list for RH parties15:42
xgerman_it looks like they go out of their way not to invite us15:42
apuimedocgoncalves: I had the kuryr container based in minimal15:43
cgoncalvesfwiw i know nothing this time 'xD15:43
apuimedobut I'd tell you this. if you want really minimal15:43
apuimedoyou should do like I do for the kuryr testing container15:43
apuimedobusybox + static built binaries for extra tools15:43
apuimedoI think we're at 4MiB15:44
apuimedoor something15:44
*** qwebirc75161 has quit IRC15:44
apuimedothe curl static build was a PITA to get right15:44
johnsomYeah, it would be nice, but it's a bunch of work15:46
apuimedojohnsom: just for reference apart from python and haproxy, what does it use?15:51
apuimedoof course15:53
johnsomblah, bionic seems broken this morning.15:55
johnsomThe fun of testing with pre-RC bits15:56
*** salmankhan has quit IRC15:59
cgoncalvesjohnsom, shouldn't the health manager trigger failover of amps with operating_status=offline?15:59
cgoncalvesand provisioning_status=ACTIVE15:59
johnsomNo, operating status is the observed status. This can be because all of the backend servers are down, or the LB is not yet fully configured.16:00
johnsomIt's not a fault of the LB.16:00
johnsomIt could also be that we never received a health heartbeat from the amp. I.e. bad network setup16:01
*** slaweq has joined #openstack-lbaas16:01
cgoncalveswell, what if amp is created with no members *and* health manager does not receive heartbeats?16:01
*** salmankhan has joined #openstack-lbaas16:01
johnsomSo, in summary, operating_status offline is not necessarily a failure of the LB16:01
cgoncalvesok, so once it has received the first heartbeat, if it stops receiving them it should fail over?16:02
cgoncalvesok, good16:02
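The rule johnsom describes (operating_status is only an observation; failover is driven by heartbeats stopping after at least one was received) can be sketched as below. The `Amphora` record and the `heartbeat_timeout` parameter are hypothetical names for illustration, not the health manager's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Amphora:
    # Hypothetical record: epoch seconds of the last heartbeat, None if none yet.
    last_heartbeat: Optional[float]
    operating_status: str  # observed status, e.g. "ONLINE" or "OFFLINE"


def needs_failover(amp: Amphora, now: float, heartbeat_timeout: float) -> bool:
    """Decide failover from heartbeat staleness only.

    operating_status is deliberately ignored: OFFLINE can simply mean every
    backend member is down or the LB is not fully configured yet.
    An amphora that never sent a first heartbeat is not failed over here,
    since that usually points at a network/config problem (controller IP
    list, firewall) that a new VM won't fix.
    """
    if amp.last_heartbeat is None:
        return False
    return now - amp.last_heartbeat > heartbeat_timeout
```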
cgoncalvesthat's what we're observing now in tripleo envs. there were 2 misconfigurations: 1) controller_ip_list and 2) firewall :555516:03
johnsomThere is a bug open to enable failover of an amp that never sends its initial heartbeat, but that is tricky, and if it never sends one, failover probably isn't going to fix it.16:03
cgoncalvespatch submitted and merged16:03
johnsomOk, cool16:03
cgoncalvesyeah, in that scenario it would enter a failover loop16:04
johnsomRight, which is good or bad...  It opens the non-linear backoff can of worms16:04
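The discussion only names the idea of a non-linear backoff for an amphora stuck in a failover loop. A minimal sketch of one such schedule (capped exponential); the `base` and `cap` values are arbitrary and `failover_delay` is a hypothetical helper.

```python
def failover_delay(attempt: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Seconds to wait before failover attempt number `attempt` (1-based).

    Exponential backoff: base, 2*base, 4*base, ... capped at `cap`, so an
    amphora whose heartbeats can never arrive (e.g. blocked by a firewall)
    is retried less and less often instead of looping at full speed.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))
```

The cap keeps a permanently broken amphora from waiting arbitrarily long once the underlying misconfiguration is fixed.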
*** dayou has joined #openstack-lbaas16:07
*** pcaruana has joined #openstack-lbaas16:15
*** apuimedo has quit IRC16:24
*** links has joined #openstack-lbaas16:53
*** bbzhao has quit IRC17:02
*** bbzhao has joined #openstack-lbaas17:02
openstackgerritMichael Johnson proposed openstack/octavia master: Switch to ubuntu-minimal for default amphora image
*** slaweq has quit IRC17:19
*** atoth has quit IRC17:30
*** atoth has joined #openstack-lbaas17:32
*** tesseract has quit IRC17:53
*** links has quit IRC18:02
*** salmankhan has quit IRC18:08
*** atoth has quit IRC18:11
mnaserhi everyone18:12
mnaserso i'm deploying octavia for a private cloud customer right now18:12
mnaseris there a fairly friendly way of plugging the octavia control plane into a neutron network?18:13
openstackgerritMichael Johnson proposed openstack/octavia master: DNM: Gate test
mnaserthe only network routable between their control plane and vms is 'public' but i'd rather avoid public ips if i could18:13
mnaseri remember someone had a trick of manually creating a port in openvswitch or something18:13
mnaserlike creating a vxlan network and somehow getting the control plane to get an ip out of it18:14
johnsomWell, we use the OVS trick in devstack:
johnsomBut, you need to think about your HA strategy so you don't have a single pop out of neutron18:15
*** links has joined #openstack-lbaas18:15
mnasersingle pop out of neutron?18:16
johnsomYeah, if you bridge out of OVS onto a control plane network18:17
johnsomIf you just create an interface on all of your control plane hosts, that is fine. It just means you have neutron on each of your octavia controllers18:18
mnaserall control plane servers run neutron on them anyways18:18
johnsomWell, then yeah, the trick we use for devstack will work for you.18:18
mnaserso i'll have an octavia network and 3 interfaces, one on each controller18:19
johnsomYeah, just create the lb-mgmt-net in neutron, exclude addresses for the controllers, then create the interface via ovs, bind it in the namespace for octavia or make sure the network you pick doesn't conflict with other controller stuff.18:21
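One step in johnsom's recipe is excluding the controllers' addresses from the lb-mgmt-net so neutron never hands them to amphorae. A sketch of computing the allocation pools with the standard `ipaddress` module, assuming the first host address is the gateway; the CIDR and controller addresses in the test are made up, and the OVS interface creation (done by the devstack plugin) is not reproduced here.

```python
import ipaddress
from typing import Iterable, List, Tuple


def allocation_pools(cidr: str, reserved: Iterable[str]) -> List[Tuple[str, str]]:
    """Split a subnet's usable hosts into (start, end) ranges that skip `reserved`.

    Neutron allocates only from a subnet's allocation pools, so leaving the
    controllers' addresses out of every pool keeps them free for the
    interfaces created on the controllers.
    """
    net = ipaddress.ip_network(cidr)
    skip = {ipaddress.ip_address(a) for a in reserved}
    hosts = list(net.hosts())  # fine for small management subnets
    # Assumption: the first host address is the gateway, so keep it out too.
    skip.add(hosts[0])
    pools: List[Tuple[str, str]] = []
    start = prev = None
    for ip in hosts:
        if ip in skip:
            if start is not None:  # close the range that just ended
                pools.append((str(start), str(prev)))
                start = None
            continue
        if start is None:
            start = ip
        prev = ip
    if start is not None:
        pools.append((str(start), str(prev)))
    return pools
```

Each resulting pair maps onto one `--allocation-pool start=<ip>,end=<ip>` option of `openstack subnet create`.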
*** links has quit IRC18:31
*** apuimedo has joined #openstack-lbaas18:32
mnaserjohnsom: yup that's the plan, okay cool18:35
mnaseri'll keep you updated on how that rolls out18:35
mnaserpretty excited but a little bit nervous about this being in the hands of the customers, they'll be giving it quite the workout18:35
mnaserthey dynamically reconfigure their LBs all the time, use SNI, etc18:35
*** numans has quit IRC18:50
*** sanfern has quit IRC18:50
*** numans has joined #openstack-lbaas18:52
*** apuimedo has quit IRC18:57
*** harlowja has joined #openstack-lbaas19:10
*** numans has quit IRC19:25
*** numans has joined #openstack-lbaas19:28
*** slaweq has joined #openstack-lbaas19:43
*** gokhan has joined #openstack-lbaas20:37
johnsomrm_work around?20:54
*** samccann has quit IRC21:09
*** tzumainn has quit IRC21:15
rm_workjohnsom: i am now21:19
xgerman_and I was about to alias myself21:19
johnsomrm_work on your patch:
johnsombelow that, line 297, I think that filter should go21:20
rm_workcgoncalves / johnsom: my theory on the "initial heartbeat" thing was that the second we first get an active connection to the amp for vip-plug (meaning the agent is up) we put in a fake heartbeat for like, 5m in the future (or something) and then if it doesn't get overwritten by real heartbeats starting, it'll fail over after that21:22
rm_workwe're not in a hurry on new LBs IMO since they won't have any existing traffic21:23
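rm_work's proposal (seed a synthetic heartbeat a few minutes in the future once the agent first answers, and let real heartbeats overwrite it) can be sketched with an in-memory table. The dictionary, the 300 s grace period, and the helper names are all hypothetical; the real health data lives in Octavia's database.

```python
import time
from typing import Dict, Optional

# Hypothetical stand-in for the health table: amphora id -> last heartbeat time.
last_heartbeat: Dict[str, float] = {}


def seed_initial_heartbeat(amp_id: str, grace: float = 300.0, now: Optional[float] = None) -> None:
    """Record a synthetic heartbeat `grace` seconds in the future.

    Meant to be called the first time the controller reaches the amphora
    agent (e.g. during vip-plug). setdefault() avoids clobbering a real
    heartbeat that may already have arrived.
    """
    now = time.time() if now is None else now
    last_heartbeat.setdefault(amp_id, now + grace)


def record_heartbeat(amp_id: str, now: float) -> None:
    """A real heartbeat always overwrites the synthetic one."""
    last_heartbeat[amp_id] = now


def is_stale(amp_id: str, now: float, heartbeat_timeout: float) -> bool:
    """Stale once more than `heartbeat_timeout` has passed since the last entry."""
    ts = last_heartbeat.get(amp_id)
    return ts is not None and now - ts > heartbeat_timeout
```

With no real heartbeats, the amphora becomes stale after `grace + heartbeat_timeout` seconds. As cgoncalves points out, a controller IP list or firewall misconfiguration would then cause repeated failovers, so this only makes sense together with some form of backoff.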
rm_workjohnsom: wait, what?21:23
rm_workthat's in a totally different function than i'm touching?21:24
johnsomThis is in addition to your change21:24
rm_workso what should go there?21:24
xgerman_I still think we should explore more root causes before bringing out the hammer21:24
xgerman_(deleting all ports on the sec-grp)21:25
johnsomWhy can't I do a "sudo systemctl restart devstack@o-hm"?  The old listener process hangs around and the new one does:  error: [Errno 98] Address already in use21:25
johnsomrm_work I think we should take out the filter for "allocated" amps. I think that is too narrow a scope. Maybe it should try all of them.21:26
cgoncalvesrm_work, yeah although there's at least one corner case: misconfiguration of controller ip list and/or firewall dropping heartbeats. with what you said, it would fail over again and again21:26
xgerman_johnsom: +121:26
rm_workjohnsom: ah so21:27
xgerman_but we should put this in an extra patch and test independently - my 2ct21:29
rm_workjohnsom: yeah i do kinda agree, i think21:30
rm_workjust ...21:30
rm_workall amphora21:31
rm_workwe only call deallocate VIP at the end of a delete for a LB, right?21:31
xgerman_as part of it: we unplug_vip, deallocate VIP, and then remove the vm21:32
xgerman_unplug_vip and deallocate_vip share some code21:32
rm_workbut it's 100% only on LB deletion21:35
rm_workthen yeah, we should just do it21:35
rm_workit reminds me of a little bit21:36
xgerman_yeah, we need a couple of fail-safes21:36
johnsomHmm, something is fishy here too.  Stats isn't getting called for me21:37
rm_workin which patch?21:38
*** slaweq has quit IRC21:38
rm_workcgoncalves: yep :/21:38
rm_workjohnsom: wee yeah but like... where isn't stats getting called?21:39
rm_worka tempest test you're working on?21:39
*** slaweq has joined #openstack-lbaas21:39
johnsomNo, I just have an LB created. It never updates the stats.  Tracing it back now.21:39
*** vkceggzw has joined #openstack-lbaas21:39
johnsomYeah, ok stats driver doesn't load and silently fails.21:40
johnsomImportError: No module named octavia_controller.healthmanager.health_drivers.update_db21:42
rm_workmissed a thing in a refactor21:42
rm_workyou got a fix or should I21:42
johnsomI will push up a fix21:43
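The symptom here was a stats driver whose dotted module path had a typo, so the import failed silently and stats never updated. A sketch of loading a driver by path that logs and re-raises instead of swallowing the error; `load_driver` and its `module:attr` format are illustrative only, and Octavia resolves its drivers by name from configuration rather than with a helper like this.

```python
import importlib
import logging

LOG = logging.getLogger(__name__)


def load_driver(dotted_path: str):
    """Import `package.module:attr` (or just `package.module`) and return it.

    A typo in the path raises ImportError (ModuleNotFoundError) immediately
    and is logged, rather than surfacing later as "stats never update".
    """
    module_name, _, attr = dotted_path.partition(":")
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        LOG.exception("Could not import driver module %s", module_name)
        raise  # fail loudly at startup instead of running without the driver
    return getattr(module, attr) if attr else module
```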
*** slaweq has quit IRC21:44
*** slaweq has joined #openstack-lbaas21:44
*** vkceggzw has quit IRC21:46
*** numans has quit IRC21:48
openstackgerritMichael Johnson proposed openstack/octavia master: Fix statistics update typo
rm_workwhen did that even happen21:48
johnsomWhen you made the updates drivers21:49
johnsomSo new to rocky21:49
rm_workso a while ago21:49
rm_workok that's good at least21:49
rm_worknot in queens21:49
*** numans has joined #openstack-lbaas21:49
johnsomback to that filter, do you want to make that change?21:51
rm_workyeah i can do that21:51
xgerman_can we have it in a separate patch?21:52
johnsomI am warming to whack-a-mole21:52
rm_worki think it's related21:52
xgerman_in case one might cause the other21:52
rm_workyeah basically IMO there are some places we can be more careful, maybe, but no matter how careful we are we will get screwed by nova/neutron being shitty sometimes21:52
rm_workso we may as well just have our hammer out21:52
rm_workxgerman_: i can do it as a second patch if you really have a problem with it being the same21:54
xgerman_I wouldn’t say problems — it’s just a preference to make backporting easier21:54
rm_workbut IMO it should be easier to just roll it in21:54
rm_workeven for backporting :/21:54
rm_workthe change is related, it's all delete-flow21:54
johnsomYeah, I think it is related and I'm inclined to go ahead with the port delete patch21:55
xgerman_ok, I will review it with a fine-tooth comb so we have the appropriate logging21:55
rm_workyeah honestly now that you point it out johnsom, i am not sure why we ever filtered that21:57
johnsomMy bad actually21:57
johnsomI think I was trying to save time in the failover flow by not making calls for deleted stuff.  Plus the AAP driver has been dumb in the past and blown up on deleting things that were already deleted.21:58
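johnsom's explanation implies that dropping the "allocated only" filter is safe as long as deleting something that is already gone counts as success. A sketch of that pattern; `NotFound`, the dict-shaped amphora records, and `delete_ports_for_all` are hypothetical, and which port the real code deletes is not stated in the log (`vrrp_port_id` is used here for illustration).

```python
from typing import Callable, Iterable, List


class NotFound(Exception):
    """Hypothetical stand-in for the 'already gone' error a network client raises."""


def delete_ports_for_all(amphorae: Iterable[dict], delete_port: Callable[[str], None]) -> List[str]:
    """Try the port of every amphora, whatever its status (no ALLOCATED filter).

    Already-deleted ports are treated as success, which is what makes it safe
    to drop the status filter. Returns the port ids that were handled.
    """
    handled = []
    for amp in amphorae:
        port_id = amp.get("vrrp_port_id")
        if not port_id:
            continue  # nothing to clean up for this amphora
        try:
            delete_port(port_id)
        except NotFound:
            pass  # an earlier attempt (or someone else) already removed it
        handled.append(port_id)
    return handled
```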
rm_workanywho, running tests22:01
rm_workjohnsom: oh, did you see my question about the issue in my gate test22:01
rm_workit isn't finding the method from the plugin.sh22:01
rm_workare we not allowed to use those there?22:01
rm_workah it was in PM22:01
*** slaweq has quit IRC22:10
openstackgerritAdam Harwell proposed openstack/octavia master: When SG delete fails on vip deallocate, try harder
rm_workthere you go22:14
rm_workalso added a clarifying comment to the other bit22:16
rm_workbecause no matter how many times i explain, people keep thinking it's deleting ALL the ports, but no, it really is JUST the ones we own, I swear22:17
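rm_work stresses that the "try harder" path deletes only ports the service owns, not every port on the security group. The log does not say how ownership is determined; this sketch assumes a `device_owner` check, and the `"Octavia"` owner value and `ports_to_delete` helper are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Assumption: ports created by the load-balancing service carry a recognisable
# device_owner value; "Octavia" is used here purely for illustration.
OWNER = "Octavia"


def ports_to_delete(ports: Iterable[Dict[str, str]], owner: str = OWNER) -> List[str]:
    """Return ids of ports on the VIP security group that we created ourselves.

    Ports owned by anything else (a user's VM, a router) are left alone,
    even though they sit on the same security group.
    """
    return [p["id"] for p in ports if p.get("device_owner") == owner]
```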
openstackgerritMichael Johnson proposed openstack/neutron-lbaas master: Fix the double zuul project definition
*** fnaval has quit IRC22:27
*** KeithMnemonic has quit IRC22:41
*** fnaval has joined #openstack-lbaas23:04
*** fnaval has quit IRC23:04
openstackgerritAdam Harwell proposed openstack/neutron-lbaas master: WIP: Test l7 proxy to octavia
openstackgerritMichael Johnson proposed openstack/octavia master: Improve the error logging for zombie amphora
