20:00:03 <johnsom> #startmeeting Octavia
20:00:04 <openstack> Meeting started Wed May 2 20:00:03 2018 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:08 <openstack> The meeting name has been set to 'octavia'
20:00:16 <johnsom> Hi folks!
20:00:49 <cgoncalves> hi
20:00:52 <xgerman_> o/
20:01:00 <johnsom> #topic Announcements
20:01:12 <johnsom> The only announcement I have this week is that we have a new TC elected:
20:01:17 <xgerman_> +1
20:01:18 <johnsom> #link https://governance.openstack.org/election/results/rocky/tc.html
20:02:06 <johnsom> Oh, and there is now an Octavia ingress controller for Octavia
20:02:13 <johnsom> #link https://github.com/kubernetes/cloud-provider-openstack/tree/master/pkg/ingress
20:02:35 <johnsom> Any other announcements this week?
20:03:14 <johnsom> #topic Brief progress reports / bugs needing review
20:03:49 <johnsom> I have been busy working on the provider driver. The Load Balancer part is now complete and up for review comments.
20:03:56 <johnsom> #link https://review.openstack.org/#/c/563795/
20:04:16 <johnsom> It got a bit big due to single-call-create being part of load balancer.
20:04:30 <rm_work> o/
20:04:32 <johnsom> So, I'm going to split it across a few patches (and update the commit to reflect that)
20:05:01 <nmagnezi> johnsom, thank you for taking a lead on this. I will review it.
20:05:06 <johnsom> Ha, I guess there is that announcement as well
20:05:33 <rm_work> I have been working on the octavia tempest plugin. Two patches ready for review (although I need to address johnsom's comments)
20:05:36 <johnsom> I think the listener one will be a good example for what needs to happen with the rest of the API. It's up next for me
20:05:53 <johnsom> +1 on tempest plugin work
20:07:06 <johnsom> Any updates on Rally or grenade tests?
20:07:53 <cgoncalves> sorry, I still need to resume the grenade patch
20:08:24 <johnsom> Ok, NP. Just curious for an update.
20:08:32 <nmagnezi> johnsom, the rally scenario now works, i have some other internal fires to put out and then I'll iterate back to run it and report the numbers. it had a bug with the loadbalancers cleanup which is fixed now. so we are in a good shape there overall.
20:08:47 <johnsom> Cool!
20:09:11 <johnsom> Any other updates this week or should we move on to our next agenda topic?
20:09:24 <nmagnezi> yeah :) it took quite a few tries but it worth the effort i think.
20:09:36 <johnsom> #topic Discuss health monitors of type PING
20:09:44 <johnsom> #link https://review.openstack.org/#/c/528439/
20:09:53 <johnsom> nmagnezi This is your topic.
20:10:04 <nmagnezi> open it ^^ while gerrit still works :)
20:10:13 <rm_work> PING is dumb and should be burned with fire
20:10:17 <nmagnezi> so, rm_work submitted a patch to allow operators to block it
20:10:26 <johnsom> I can give a little background on why I added this feature.
20:10:39 <cgoncalves> rm_work: wait for it. I think you will like it ;)
20:10:46 <johnsom> 1. Most load balancers offer it.
20:10:49 <rm_work> johnsom: because you want users to suffer?
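For readers following along, here is a minimal sketch of creating the kind of PING health monitor under discussion, using the openstacksdk load-balancer proxy. The cloud name and pool UUID are placeholders, and the field names (which follow the Octavia v2 API) may vary between SDK releases, so treat this as an illustration rather than a reference.

    import openstack

    # 'mycloud' is a placeholder clouds.yaml entry, not something from the meeting.
    conn = openstack.connect(cloud='mycloud')

    # Create a PING health monitor on an existing pool (the pool UUID is a placeholder).
    hm = conn.load_balancer.create_health_monitor(
        pool_id='POOL_UUID',
        type='PING',       # the monitor type being debated in this topic
        delay=5,           # seconds between ICMP probes
        timeout=3,         # seconds to wait for a reply
        max_retries=3,     # failed probes before the member is marked down
    )
    print(hm.id)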
20:10:52 <nmagnezi> i commented that I understand rm_work's point, but I don't know if adding a config option is a good idea here
20:11:02 <nmagnezi> rm_work, lol
20:11:33 <rm_work> we're handing them a gun and pointing it at their foot for them
20:11:34 <nmagnezi> anyhow, the discussion I think we should have is whether or not we want to deprecate and later remove this option from our API
20:11:47 <rm_work> cgoncalves: you're right :)
20:11:57 <johnsom> 2. I was doing some API load testing with members and wanted them online, but not getting HTTP hits to skew metrics.
20:12:53 <rm_work> you could also just ... not use HMs in a load test... they'll also be "online"
20:13:02 <rm_work> or use an alternate port
20:13:10 <johnsom> Well, they would be "no monitor"
20:13:35 <rm_work> does TCP Connect actually count for stats?
20:13:36 <johnsom> It was basically, ping localhost so they all go online no matter what.
20:14:17 <johnsom> So, I'm just saying there was a reason I went to the trouble to fix that (beyond the old broken docs that listed it)
20:15:11 <rm_work> we could rename it to "DO_NOT_USE_PING"
20:15:16 <nmagnezi> johnsom, your opinion is that we should keep ping hm as is?
20:15:38 <johnsom> Now, I fully understand that joe-I-don't-know-jack-but-am-a-load-balancer-expert will use PING for all of the wrong reasons.... I have seen it with my own eyes.
20:16:18 <rm_work> in *most openstack clouds* the default SG setup is to block ICMP
20:16:29 <rm_work> though I guess I can't back that up with actual survey data
20:16:47 <johnsom> Nice, so they instantly fail and they don't get too burned by being dumb
20:16:54 <johnsom> grin
20:16:56 <rm_work> so people are like "all my stuff is down, your thing is broken"
20:17:41 <xgerman_> I dislike most openstack clouds — there are some wacky clouds out there
20:17:46 <rm_work> lol
20:18:02 <johnsom> My stance is, most, if not all of our load balancers support it. There was at least one use case for adding it. It's there and works (except on centos amps). Do we really need to remove it?
20:18:05 <nmagnezi> johnsom, in your eyes, what are the right reasons for using ping hm?
20:18:16 * xgerman_ read about people using k8s to loadbalance since they don’t want to upgrade from Mitaka
20:18:27 <johnsom> Testing purposes only... Ha
20:18:34 <nmagnezi> lol
20:19:19 <nmagnezi> i'm not asking if we should or shouldn't remove this because of the centos amps. I'm asking this because it seems that everyone agrees with rm_work's gentle statements about ping :)
20:19:38 * rm_work is so gentle and PC
20:20:17 <rm_work> tremendously gentle, everyone says so. anyone who doesn't is fake news
20:20:27 <johnsom> #link http://andrewkandels.com/easy-icmp-health-checking-for-front-end-load-balanced-web-servers
20:20:29 <johnsom> lol
20:20:34 <cgoncalves> +1. unless there's a compelling use case for keeping ping, I'm for removing it
20:20:48 <rm_work> we SHOULD probably check with some vendors
20:20:54 <rm_work> I wish we had more participation from them
20:20:58 <nmagnezi> the point i'm trying to make here is that if ping is something we would want to keep, i don't think we need a config option to block it.
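As a point of reference for the patch being discussed (a config option letting operators block PING), a rough sketch of how such a knob could be wired up with oslo.config follows. The option name and placement are assumptions based on the discussion, not a quote of the actual change under review.

    from oslo_config import cfg

    # Hypothetical operator option; the real patch may name or place it differently.
    api_opts = [
        cfg.BoolOpt('allow_ping_health_monitors', default=True,
                    help='Allow users to create health monitors of type PING.'),
    ]
    cfg.CONF.register_opts(api_opts, group='api_settings')


    def validate_health_monitor_type(hm_type):
        """Reject PING health monitors when the operator has disabled them."""
        if (hm_type == 'PING'
                and not cfg.CONF.api_settings.allow_ping_health_monitors):
            raise ValueError(
                'Health monitors of type PING have been disabled by the '
                'operator. Consider an HTTP or TCP health monitor instead.')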
20:21:06 <xgerman_> +1
20:21:12 <rm_work> I don't even see most of our vendor contacts in-channel anymore
20:21:20 <nmagnezi> if we agree that it should be removed, we don't need that config option as well :)
20:21:26 <xgerman_> that’s why we are doing providers
20:21:38 <rm_work> nmagnezi: yeah, this was supposed to be a compromise
20:21:53 <rm_work> you could argue that all compromise is bad and we should just pick a direction
20:21:54 <xgerman_> anyhow, I think ping has value — not everybody runs HTTP or TCP
20:22:00 <xgerman_> we have UDP coming up
20:22:06 <johnsom> Yeah, from what I see, all of our vendors support ICMP
20:22:15 <rm_work> alright
20:22:16 <rm_work> well
20:22:37 <xgerman_> just trying to think through a UDP healthmonitor
20:22:39 <johnsom> This is true, UDP is harder to check
20:22:42 <rm_work> yes
20:22:57 <johnsom> Maybe someone will want us to load balance ICMP....
20:22:58 <johnsom> grin
20:23:03 <rm_work> but that's why there's TCP_CONNECT and alternate ports
20:23:03 <nmagnezi> HAHA
20:23:33 <rm_work> any reason a UDP member wouldn't allow a TCP_CONNECT HM with the monitor_port?
20:23:53 <johnsom> Yes, if they don't have any TCP code....
20:24:09 <nmagnezi> rm_work, that might depend on the app you run on the members
20:24:51 <rm_work> i mean
20:24:54 <johnsom> Yeah, so F5, A10, radware, and netscaler all have ICMP health check options
20:24:56 <rm_work> you would run another app
20:25:04 <rm_work> that is a health check for the UDP app
20:25:08 <rm_work> to make sure it is up, etc
20:25:33 <rm_work> so combo of connectable + 200OK response == good
20:25:43 <rm_work> I was pretty sure that was the standard for healthchecking stuff and why we added the monitor_port thing to begin with
20:25:48 <johnsom> Well, some of this UDP stuff is for very dumb/simple devices. That was what the use case discussion was at the PTG around the need for UDP
20:26:02 <nmagnezi> rm_work, sounds a little bit redundant. if you want to check the health of your ACTUAL app, why have another one just to answer the lb?
20:26:06 <xgerman_> probably not too dumb for ICMP
20:26:25 <nmagnezi> (but you could argue the same for ICMP, but at least it checks networking.. ha)
20:26:28 <johnsom> So, if the concern is for users mis-using ICMP, should we maybe just add a warning print to the client and dashboard?
20:26:37 <nmagnezi> johnsom, +!
20:26:40 <nmagnezi> johnsom, +1
20:26:48 <xgerman_> +1
20:26:58 <rm_work> k T_T
20:27:01 <rm_work> I am ok with this
20:27:03 <nmagnezi> johnsom, i would add another warning to the logs as well
20:27:25 <cgoncalves> +1, plus warning msg in server side?
20:27:32 <rm_work> eh, logs just go to ops, and they can see it in the DB
20:27:33 <rm_work> which is easier to check
20:27:34 <rm_work> and they already know it's dumb
20:27:38 <rm_work> i wouldn't bother with the server side
20:27:45 <johnsom> Eh, not sure operators would care that much what health monitors the users are setting. Does that cross the "INFO" log level?????
20:27:46 <rm_work> it's users we need to reach
20:28:28 <nmagnezi> johnsom, a user being dump sounds like a warning to me :)
20:28:34 <nmagnezi> dumb*
20:29:12 <johnsom> Yeah, I just want us to draw a balance between filling up log files with noise and having actionable info in there.
20:29:38 <nmagnezi> well, you only print it once, when it's created
20:29:50 <nmagnezi> so it's not spamming the logs that bad
20:30:02 <johnsom> Ha, I have seen projects with 250 LBs in it. Click-deploy....
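To illustrate the client/dashboard warning agreed on above, here is a rough sketch of the kind of message a client could print when a user selects PING. It is not the actual python-octaviaclient code, just an example of the idea, and the wording is an assumption.

    import sys

    def warn_if_ping(hm_type):
        """Print a one-time warning when a user creates a PING health monitor."""
        if hm_type == 'PING':
            print('WARNING: A PING health monitor only verifies that the member '
                  'answers ICMP; it does not confirm the application is serving '
                  'traffic, and many clouds block ICMP in their default security '
                  'groups. Consider an HTTP or TCP health monitor instead.',
                  file=sys.stderr)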
20:30:27 <johnsom> I am ok with logging it, no higher than INFO, if you folks think it is useful
20:30:41 <nmagnezi> fair enough.
20:30:51 <rm_work> wait, isn't info the one that always prints?
20:31:01 <rm_work> or, i guess that was your point
20:31:02 <rm_work> k
20:31:08 <johnsom> It would be some "fanatical support" to have agents call the user that just did that.... Grin
20:31:40 <rm_work> I would set up an automated email job
20:31:46 <nmagnezi> lol
20:32:02 <johnsom> That was flux...
20:32:21 <johnsom> Ha, ok, so where are we at with the config patch?
20:32:22 <rm_work> "We noticed you just created a PING Health Monitor for LB #UUID#. We recommend you reconsider, and use a different method for the following reasons: ...."
20:33:04 <rm_work> I mean... I would still like to be able to disable it, personally, but I grant that it should probably remain an option at large (however reluctantly)
20:33:07 <johnsom> I can open a story to add warnings to the client and dashboard
20:33:32 <rm_work> I can put WIP on this one or DNM or whatever, and just continue to pull it in downstream I guess <_<
20:33:48 <rm_work> I just figured a config couldn't hurt
20:34:11 <rm_work> the way I designed it, it would explain to the user when it blocks the creation
20:34:17 <nmagnezi> rm_work, if everyone else agrees on that, I will not be the one to block it. Just wanted to raise discussion around this topic
20:34:22 <johnsom> I am ok with empowering operators myself
20:34:55 <rm_work> can we get CentOS to 1.8? :P
20:35:05 <rm_work> I'd have a much weaker case then
20:35:13 <xgerman_> \me wrong person to ask
20:35:14 <cgoncalves> +1, still knowing nmagnezi is not a fan of adding config options like this
20:35:15 <nmagnezi> cgoncalves and myself are working on it. it's not easy but we are doing our best :)
20:35:25 <cgoncalves> rm_work: soon! ;)
20:35:33 <rm_work> k
20:35:33 <nmagnezi> rm_work, we'll keep you posted
20:35:34 <rm_work> I mean
20:35:35 <rm_work> if we got a more official repo
20:35:40 <rm_work> we don't even need it in the main repo
20:35:49 <rm_work> we could merge my patch to the amp agent element
20:35:54 <rm_work> err, amp element
20:36:10 <rm_work> (which I already pull in downstream)
20:36:21 <cgoncalves> rm_work: short answer is: likely to have 1.8 in OSP14 (Rocky)
20:36:32 <rm_work> in what way?
20:36:41 <rm_work> CentOS amps based on CentOS8?
20:36:51 <rm_work> Official repo for OpenStack HAProxy?
20:37:00 <rm_work> HAProxy 1.8 backported into CentOS7?
20:37:34 <cgoncalves> cross tag. haproxy rpm in osp repo, same rpm as from openshift/pass repo
20:37:43 <rm_work> ok
20:37:53 <rm_work> so we would update and merge my patch
20:38:06 <cgoncalves> we will keep haproxy 1.5 but add 'haproxy18' package
20:38:11 <rm_work> yeah
20:38:18 <johnsom> #link https://storyboard.openstack.org/#!/story/2001957
20:38:54 <cgoncalves> rm_work: you could then delete the repo add part from your patch
20:39:01 <rm_work> ok
20:39:08 <rm_work> i wish i could look up that CR now >_>
20:39:14 <rm_work> great timing on gerrit outage for us, lol
20:40:55 <johnsom> So, I guess to close out the PING topic, vote on the open patch. (once gerrit is back)
20:41:16 <johnsom> #topic Open Discussion
20:41:23 <johnsom> Any topics today?
20:41:33 <rm_work> Multi-AZ?
20:41:42 <rm_work> I have a patch, it is actually reasonable to review
20:42:01 <rm_work> the question is... since it will only work if every AZ is routable on the same L2... is this reasonable to merge?
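Since the multi-AZ patch itself is not quoted in the log, here is a purely illustrative sketch of the idea rm_work describes: the operator lists several availability zones in the configuration and the compute driver spreads amphorae across them. The option name and the selection logic are assumptions, not the actual change under review.

    import random

    # Hypothetical operator setting listing the usable AZs (the "az config"
    # mentioned later in the discussion); the real option name may differ.
    CONFIGURED_AZS = ['az-1', 'az-2', 'az-3']


    def pick_availability_zone(azs=CONFIGURED_AZS):
        """Pick an AZ for the next amphora build.

        A fuller implementation could anti-affinitize the two amphorae of an
        active/standby pair so they land in different AZs.
        """
        return random.choice(azs)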
20:42:26 <rm_work> At least one other operator was doing the same thing and even had some similar patches started
20:42:28 <johnsom> We have a bionic gate, it is passing, but I'm not sure how, given the networking changes they made. It must have a backward compatibility feature. It's on my list to go update the amphora-agent for bionic's new networking.
20:43:28 <johnsom> I have not looked at the AZ patch, so can't really comment at the moment
20:43:32 <rm_work> (or if they're using an L3 networking driver)
20:43:44 <rm_work> k, it's more about whether the concept is a -2 or not
20:45:27 <johnsom> In general multi-AZ seems great to me. However the details really get deep
20:47:06 <rm_work> yeah
20:47:33 <rm_work> though if you have a routable L2 for all AZs, or you use an L3 net driver... then my patch will *just work*
20:47:37 <xgerman_> +1
20:47:39 <rm_work> and the best part is that the only required config change is ... adding the additional AZs to the az config
20:47:51 <rm_work> :)
20:48:21 <xgerman_> Would love nova to do something reasonable but in the interim…
20:49:19 <johnsom> Yeah, so I think it's down to review
20:49:39 <johnsom> Which brings me to a gentle nag....
20:49:39 <xgerman_> +1
20:49:49 <johnsom> #link ttps://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:09 <johnsom> Well, when gerrit is back up.
20:50:10 <nmagnezi> johnsom, forgot an 'h'
20:50:28 <rm_work> ono
20:50:29 <johnsom> There are a ton of open un-reviewed patches....
20:50:38 <johnsom> #undo
20:50:39 <openstack> Removing item from minutes: #link ttps://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:42 <rm_work> so many
20:50:50 <rm_work> I need to go review too, but
20:50:53 <johnsom> #link https://review.openstack.org/#/q/(project:openstack/octavia+OR+project:openstack/octavia-dashboard+OR+project:openstack/python-octaviaclient+OR+project:openstack/octavia-tempest-plugin)+AND+status:open+AND+NOT+label:Code-Review%253C0+AND+NOT+label:Verified%253C%253D0+AND+NOT+label:Workflow%253C0
20:50:55 <rm_work> not just me :P
20:51:15 <johnsom> Yeah, please take a few minutes and help us with reviews.
20:51:41 <johnsom> Any other topics today?
20:52:30 <johnsom> Ok then. Thanks everyone!
20:52:35 <johnsom> #endmeeting