20:00:11 <johnsom> #startmeeting Octavia
20:00:12 <openstack> Meeting started Wed Aug  1 20:00:11 2018 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:15 <openstack> The meeting name has been set to 'octavia'
20:00:20 <johnsom> Hi folks!
20:00:35 <cgoncalves> o/
20:00:47 <johnsom> #topic Announcements
20:00:54 <rm_work> o/
20:01:09 <nmagnezi> O/
20:01:12 <johnsom> We are still tracking priority bugs for Rocky.  We are in feature freeze, but we can still be fixing bugs.....
20:01:18 <johnsom> #link https://etherpad.openstack.org/p/octavia-priority-reviews
20:01:37 <xgerman_> o/
20:02:06 <johnsom> As an FYI, Rocky RC1 is next week. This is where we will cut a stable branch for Rocky. We should strive to have as many bug fixes in as we can.
20:02:19 <johnsom> It would be super nice to  only do one RC and start work on Stein
20:02:34 <johnsom> I do have some sad news for you however....
20:03:02 <johnsom> Since no one ran against me, you are stuck with me as PTL for another release.....
20:03:11 <johnsom> #link https://governance.openstack.org/election/
20:03:15 * xgerman_ raises the “4 more years” sign
20:03:18 <cgoncalves> 4 more years \o/ !!
20:03:28 * nmagnezi joins xgerman_
20:03:45 <johnsom> You all are trying to make me crazy aren't you....
20:03:59 <cgoncalves> crazier
20:03:59 <xgerman_> just showing our appreciation…
20:04:01 <nmagnezi> johnsom, you scared me for a sec
20:04:10 <nmagnezi> johnsom, not cool :)
20:04:37 <johnsom> Towards the end of the year it will be three years for me. I would really like to see a change in management around here, so....  Start planning your campaign.
20:04:38 <xgerman_> this is not where we elect PTLs for a year?
20:05:23 <johnsom> Not yet. Maybe the cycle after Stein will be longer than six months....
20:05:42 <johnsom> Actually Stein is going to be slightly longer than a normal release to sync back with the summits
20:05:53 <xgerman_> :-)
20:05:56 <johnsom> #link https://releases.openstack.org/stein/schedule.html
20:06:05 <johnsom> If you are interested in the Stein schedule
20:06:57 <johnsom> Also an FYI, all of the OpenStack IRC channels now require you to be signed in with a freenode account to join.
20:07:05 <johnsom> #link http://lists.openstack.org/pipermail/openstack-dev/2018-August/132719.html
20:07:37 <johnsom> There have been bad IRC spam storms recently. I blocked our channel yesterday, but infra has done the rest today.
20:07:49 <cgoncalves> I see the longer Stein release as a good thing this time around
20:08:21 <johnsom> It doesn't mean we can procrastinate though.... grin
20:08:42 <johnsom> I think my early Stein goal is going to be implement flavors
20:09:05 <johnsom> That is all I have for announcements, anything I missed?
20:09:45 <johnsom> #topic Brief progress reports / bugs needing review
20:10:23 <johnsom> I have been pretty distracted with internal stuffs over the week, but most of that is clear now (some docs to create which I hope to upstream).
20:10:40 <johnsom> I have also been focused on the UDP patch and helping there.
20:10:54 <xgerman_> did two bug fixes: one when nova doesn’t release the port for failover + one for the zombie amps
20:12:04 <johnsom> Yeah, the nova thing was interesting. Someone turned off a compute host for eight hours.  Nova just sits on the instance delete evidently and doesn't do it, nor release the attached ports.
20:12:38 <johnsom> If someone has a multi-node lab where they can power off compute hosts, that patch could use some testing assistance.
20:12:47 <xgerman_> +10
20:13:21 <xgerman_> my multinode lab has customers — so I can’t chaos monkey
20:13:24 <cgoncalves> nothing special from my side: some octavia and neutron-lbaas backporting, devstack plugin fixing and CI jobs changes + housekeeping, then some tripleo-octavia bits
20:13:46 <rm_work> interesting, we're ... having that happen here, as we're patching servers on a rolling thing, and some hosts end up down for a while sometimes <_<
20:13:46 <johnsom> Any other updates?  nmagnezi cgoncalves ?
20:14:04 <cgoncalves> xgerman_, could your patch (which I haven't looked yet) improve scenarios like https://bugzilla.redhat.com/show_bug.cgi?id=1609064 ? it sounds like it
20:14:04 <openstack> bugzilla.redhat.com bug 1609064 in openstack-octavia "Rebooting the cluster causes the loadbalancers are not working anymore" [High,New] - Assigned to amuller
20:14:10 <nmagnezi> On my end: have been looking deeply into active standby. Will report bunch of stories (and submit patches) soon
20:14:30 <nmagnezi> Some of the issues were already known; some look new (at least to me)
20:14:38 <nmagnezi> But nothing drastic
20:14:39 <johnsom> rm_work the neat thing we saw once, but couldn't confirm was nova status said "DELETED" but there is a second status in the EXT that said "deleting"
20:15:09 <rm_work> O_o
20:15:31 <nmagnezi> johnsom, I actually have a question related to active standby, but that can wait for open discussion
20:15:34 <rm_work> well ... we wouldn't get bugs like cgoncalves linked, as our ACTIVE_STANDBY amps are split across AZs
20:15:38 <rm_work> with AZ Anti-affinity ;P
20:16:05 <rm_work> which I still wish we could merge as an experimental feature, as i have seen at least two other operators HERE that use similar code
20:16:23 <xgerman_> Stein…
20:16:32 <johnsom> Looks like it went to failover those and there were no compute hosts left: Failed to build compute instance due to: {u'message': u'No valid host was found. There are not enough hosts available.'
20:17:46 <johnsom> Yeah, so too short of a timeout before we start failing over or too small of a cloud?
20:17:55 <cgoncalves> johnsom, right. and I think after that the LBs/amps couldn't failover manually because they were in ERROR. I need to look deeper and need-info the reporter. anyway
20:18:14 <johnsom> hmmm.  Ok, thanks for the updates
20:18:22 <johnsom> Our main event today:
20:18:30 <johnsom> #topic Make the FFE call on UDP support
20:19:09 <johnsom> Hmm, wonder if meeting bot is broken
20:19:16 <xgerman_> mmh
20:19:16 <johnsom> Well, we will see at the end
20:19:22 <cgoncalves> I *swear* I've been wanting to test this :( I even restacked this afternoon with latest patch sets
20:19:35 <johnsom> Current status from my perspective:
20:19:51 <johnsom> 1. the client patch is merged and was in Rocky.  This is good and it seems to work great for me.
20:20:13 <johnsom> 2. Two out of the three patches I have +2'd as I think they are fine.
20:20:40 <johnsom> 3. I have started some stories for issues I see, but I don't consider show stoppers: https://storyboard.openstack.org/#!/story/list?status=active&tags=UDP
20:20:58 <johnsom> 4. I have successfully built working UDP LBs with this code.
20:21:38 <johnsom> 5. The gates show it doesn't break existing stuff. (the one gate failure today was a "connection reset" while devstack was downloading a package)
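For reference, the kind of UDP load balancer described in item 4 can be built with the already-merged client support. A hypothetical command sequence (the names, subnet, and member address are made up, and it requires a cloud with the UDP patches applied):

```shell
# All names/addresses are examples; requires the UDP server-side patches.
openstack loadbalancer create --name udp-lb --vip-subnet-id private-subnet
openstack loadbalancer listener create --name udp-listener \
    --protocol UDP --protocol-port 53 udp-lb
openstack loadbalancer pool create --name udp-pool --listener udp-listener \
    --protocol UDP --lb-algorithm ROUND_ROBIN
openstack loadbalancer member create --subnet-id private-subnet \
    --address 10.0.0.21 --protocol-port 53 udp-pool
```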
20:21:50 <rm_work> Yeah I think this also falls into "merge, fix bugs as we find them" territory
20:21:59 <rm_work> as with any big feature
20:22:12 <rm_work> so long as it doesn't interfere with existing code paths (which i believe it does not)
20:22:38 <johnsom> Yeah, I'm leaning that way too.  I need to take another pass across the middle patch to see if anything recent jumps out at me, but I expect we can turn and burn on that if needed.
20:22:54 <xgerman_> I am not entirely in love with the additional code path for UDP LB health
20:23:13 <cgoncalves> would it make sense to somehow flag it as experimental? there's not a single tempest test for it IIRC
20:23:16 <xgerman_> but we can streamline that later
20:23:16 <johnsom> That is in the middle patch I haven't looked at for a bit
20:23:59 <johnsom> cgoncalves we have shipped stuff in worse shape tempest wise... sigh
20:24:01 <xgerman_> yeah, my other beef is with having a special UDP listener on the amphora REST API…
20:24:27 <johnsom> xgerman_ What do you mean? It's just another protocol on the listener.....
20:25:04 <johnsom> Oh, amphora-agent API?
20:26:19 <xgerman_> yep: https://review.openstack.org/#/c/529651/57/octavia/amphorae/backends/agent/api_server/server.py
20:26:23 <johnsom> cgoncalves I can probably whip up some tempest tests for this before Rocky ships if you are that concerned.  We will need them
20:26:38 <johnsom> Will be a heck of a lot easier than the dump migration tool tests
20:26:57 <xgerman_> wouldn’t bet on it ;-)
20:27:27 <johnsom> Well, I have been doing manual testing on this a lot so have a pretty good idea how I would do it
20:27:35 <cgoncalves> johnsom, I'd prefer having at least a basic udp test but I wont ask you for that. too much already on your plate
20:29:32 <johnsom> Any more discussion or should we vote?
20:30:04 <xgerman_> well, how do others feel about the architecture ?
20:30:26 <xgerman_> or, let’s vote ;-)
20:30:45 <johnsom> #startvote Should we merge-and-fix the UDP patches? Yes, No
20:30:46 <openstack> Begin voting on: Should we merge-and-fix the UDP patches? Valid vote options are Yes, No.
20:30:47 <openstack> Vote using '#vote OPTION'. Only your last vote counts.
20:31:06 <xgerman_> #vote abstain
20:31:07 <openstack> xgerman_: abstain is not a valid option. Valid options are Yes, No.
20:31:12 <johnsom> No maybe options for you wimps....  Grin
20:31:22 <cgoncalves> #vote yes
20:31:41 <cgoncalves> (do I get to vote?!)
20:31:41 <johnsom> #vote yes
20:31:52 <johnsom> Yes, everyone gets a vote
20:32:01 <nmagnezi> I was not involved in this, but johnsom's reasoning makes sense to me
20:32:04 <nmagnezi> #vote yes
20:32:53 <johnsom> xgerman_ rm_work Have a vote?  Anyone else lurking?
20:32:59 <rm_work> ah
20:33:29 <xgerman_> I thought sitting it out would be like abstain ;-)
20:33:50 <rm_work> #vote yes
20:34:06 * johnsom needs a buzzer for "abstain" votes
20:34:08 <rm_work> though I should at least try to get through the patches
20:34:20 <rm_work> to make sure there's nothing that'd be hard to fix later
20:34:39 <johnsom> Yeah, I think 1 and 3 are good. I would like some time on 2 today, so maybe push to merge later today or early tomorrow
20:35:11 <johnsom> Going once....
20:35:16 <johnsom> Going twice.....
20:35:24 <johnsom> #endvote
20:35:25 <openstack> Voted on "Should we merge-and-fix the UDP patches?" Results are
20:35:26 <openstack> Yes (4): rm_work, nmagnezi, cgoncalves, johnsom
20:35:42 <johnsom> Sold, you are now the proud owners of a UDP protocol load balancer
20:36:57 <xgerman_> dougwig: will be proud ;-)
20:37:09 <johnsom> So, cores, if you could give your approve votes on 1 and 3.  Give us some time on 2. I will ping in the channel if it's ready for the final review pass
20:37:20 <cgoncalves> we now *really* need to fix it for centos amps. I just tried creating a LB and it failed
20:37:33 <xgerman_> what’s the patch I have to pull for a complete install? 1 or 3?
20:37:43 <johnsom> Ah bummer. cgoncalves can you help with that or too busy?
20:38:04 <cgoncalves> johnsom, I will prioritize that for tomorrow
20:38:06 <johnsom> xgerman_ 3 or https://review.openstack.org/539391
20:38:22 <xgerman_> k
20:38:25 <johnsom> I also added a follow up patch with API-ref and release notes and some minor cleanup
20:38:38 <johnsom> https://review.openstack.org/587690
20:38:43 <johnsom> Which is also at the end of the chain.
20:39:04 <xgerman_> k
20:39:18 <johnsom> cgoncalves If you have changes, can you create a patch at the end of the chain? That way we can still make progress on review/merge but get it fixed
20:39:30 <cgoncalves> johnsom, sure
20:39:37 <dougwig> UDP, damn straight.
20:39:58 <johnsom> If I get done early with my review on 2 I might poke at centos, but no guarantees I will get there.
20:40:23 <johnsom> dougwig o/ Sorry you missed the vote.  Now you can load balance your DNS servers...   grin
20:40:45 <xgerman_> :-)
20:40:47 <dougwig> next up, rewrite in ruby
20:41:05 <johnsom> You had better sign up for PTL if you want to do that....
20:41:08 <johnsom> grin
20:41:20 <johnsom> #topic Open Discussion
20:41:38 <johnsom> nmagnezi I think you had an act/stdby question
20:41:49 <nmagnezi> johnsom, yup :)
20:42:25 <nmagnezi> johnsom, so I did a basic test of spawning a highly available load balancer, and captured the traffic on both amps
20:42:37 <nmagnezi> Specifically, on the namespace that we run there
20:42:49 <nmagnezi> MASTER: https://www.cloudshark.org/captures/1d0a1028c402
20:42:59 <nmagnezi> BACKUP: https://www.cloudshark.org/captures/8a4ee5b38e18
20:43:21 <nmagnezi> First question, mm.. I was not expecting to see tenant traffic in the backup one
20:43:54 <nmagnezi> Even if I manually "if down" the VIP interface (which does not send GARPs - I verified that) -> I still see that traffic
20:44:17 <nmagnezi> And that happens specifically when I send traffic towards the VIP
20:44:52 <nmagnezi> btw in this example -> is the qrouter NIC and is the VIP
20:45:25 <johnsom> It's likely the promiscuous capture on the port.
20:46:19 <nmagnezi> johnsom, you mean that it is because I use 'tcpdump -i any" in the namespace?
20:46:22 <johnsom> Oh, I know what it is. It's the health monitor set on the LB. It's outgoing tests for the member I bet
20:46:40 <xgerman_> +1
20:46:48 <nmagnezi> IIRC I didn't set any health monitor
20:46:56 <nmagnezi> Lemme double check that real quick
20:46:56 <xgerman_> mmh
20:47:02 <johnsom> Your member is located outside the VIP subnet (you didn't specify a subnet at member create)
20:47:47 <johnsom> Because on that backup, those HTTP are all outbound from the VIP
20:48:21 <nmagnezi> Checked.. no health monitor set
20:48:42 <nmagnezi> The members reside on the same subnet as the VIP
20:49:00 <nmagnezi> All in the private-subnet that is created by default in devstack
20:49:37 <johnsom> Hmm, they do look a bit odd. Yeah, my bet is the promiscuous setting on the port is picking up the response traffic from the master, let's look at the MAC addresses.
20:50:29 <johnsom> That is probably why only half the conversation is seen on the backup.
20:50:57 <nmagnezi> Yeah that looked very strange.. no SYN packets
20:51:04 <johnsom> If you check, the backup's haproxy counters will not be going up
20:51:47 <nmagnezi> Will check that
20:52:10 <nmagnezi> But honestly I was not expecting to see that traffic on the backup amp
20:53:01 <johnsom> It looks right to me in general. Yeah, generally I wouldn't either, but I'm just guessing it's how the network is setup underneath and the point of capture.
20:53:26 <nmagnezi> I still don't get why it's there actually. I know the two amps communicate for other stuff (e.g. keepalived)
20:53:50 <nmagnezi> okay
20:53:56 <johnsom> The key to helping understand that is to look at the MAC addresses of your ports and the packets.  The 0.3 packets will likely have the MAC of the base port on the master
20:54:47 <johnsom> If you switch it over it should be the base port of the backup in those packets.
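The MAC check johnsom suggests can be done by capturing with link-level headers (e.g. `ip netns exec amphora-haproxy tcpdump -i any -e -n` inside the amphora namespace) and comparing source MACs against the amps' base ports. A minimal sketch of pulling the MAC pair out of a `tcpdump -e` style line (the sample addresses are made up):

```python
import re

# One line in the style of `tcpdump -e -n` output (MACs and IPs are made up).
SAMPLE = ("20:49:01.123456 fa:16:3e:aa:bb:cc > fa:16:3e:11:22:33, "
          "ethertype IPv4 (0x0800), length 74: "
          "10.0.0.5.80 > 10.0.0.9.51234: Flags [S.]")

# Two colon-separated 6-octet MACs joined by " > ", as tcpdump -e prints them.
MAC_PAIR = re.compile(r"([0-9a-f]{2}(?::[0-9a-f]{2}){5}) > "
                      r"([0-9a-f]{2}(?::[0-9a-f]{2}){5})")

def src_dst_macs(line):
    """Return (source MAC, destination MAC) from a tcpdump -e line, or None."""
    m = MAC_PAIR.search(line)
    return (m.group(1), m.group(2)) if m else None

print(src_dst_macs(SAMPLE))
```

Which amp's base-port MAC shows up as the source on those half-conversations tells you whether the backup's capture is just seeing the master's replies.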
20:55:19 <johnsom> Does that help?
20:55:34 <nmagnezi> I was inspecting the qrouter arp table while doing those tests. It remained consistent with the MASTER MAC address
20:55:39 <nmagnezi> It does, thank you
20:55:44 <nmagnezi> Will keep looking into this
20:55:51 <johnsom> Ok, cool.
20:55:58 <johnsom> Any other items today?
20:56:01 <nmagnezi> If there's time I have another question
20:56:08 <nmagnezi> But let other folks talk first
20:56:09 <nmagnezi> :)
20:56:09 <johnsom> Sure, 5 minutes
20:56:29 <nmagnezi> Going once..
20:56:36 <johnsom> Just take it
20:56:39 <nmagnezi> ha
20:56:40 <nmagnezi> ok
20:56:50 <nmagnezi> So if we look at the capture from master
20:57:00 <nmagnezi> MASTER: https://www.cloudshark.org/captures/1d0a1028c402
20:57:19 <nmagnezi> Some connections end with RST, ACK and RST
20:57:20 <nmagnezi> Some not
20:57:35 <nmagnezi> Is that an HAPROXY thing to close connections with pool members?
20:58:21 <nmagnezi> It does not happen with all the sessions
20:58:22 <johnsom> If it is a flow with the pool member, yes, that is the connection between HAProxy and the member server.
20:59:41 <nmagnezi> Okay, no more questions here
20:59:45 <johnsom> If the client on the front end closes the connection to the LB, haproxy will RST the backend.
20:59:54 <nmagnezi> Thank you!
20:59:59 <johnsom> Let me see if I can find that part of the docs.
21:00:07 <johnsom> I will send a link after the meeting
21:00:13 <nmagnezi> Np
21:00:35 <johnsom> Thanks folks!
21:00:39 <johnsom> #endmeeting