14:01:20 <carl_baldwin> #startmeeting neutron_routed_networks
14:01:26 <openstack> Meeting started Tue Mar 22 14:01:20 2016 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <openstack> The meeting name has been set to 'neutron_routed_networks'
14:01:55 <carl_baldwin> #topic Announcements
14:02:10 <carl_baldwin> I don't have much to announce.  Anyone?
14:02:36 <xiaohhui> nothing from me
14:02:40 <mlavalle> Newton is open now
14:03:05 <carl_baldwin> mlavalle: ++
14:03:20 <mlavalle> and we are about a month from the Summit
14:03:49 <mlavalle> we should start thinking what meetings we want to hold regarding routed networks
14:03:59 <mlavalle> especially with the Nova team
14:04:00 * johnthetubaguy lurks
14:04:04 <carl_baldwin> mlavalle: true.  I've got to book travel.
14:04:17 <carl_baldwin> johnthetubaguy: hi
14:04:20 <johnthetubaguy> hey
14:04:29 <mlavalle> johnthetubaguy: good to see you :-)
14:04:30 <johnthetubaguy> I keep meaning to hit your spec again
14:04:41 <carl_baldwin> #topic Progress
14:04:51 <carl_baldwin> johnthetubaguy: that is a good segue.
14:05:09 <carl_baldwin> #link https://review.openstack.org/#/c/263898/
14:05:28 <carl_baldwin> We've been getting some attention from the scheduler team on it.
14:05:48 <carl_baldwin> I have responses to feedback and a new version of the spec almost ready to post.
14:06:01 <mlavalle> I asked jaypipes to review it yesterday during the scheduler meeting
14:06:11 <johnthetubaguy> yeah, I suspect getting agreement on that spec is the best target for the summit, I guess that covers most of the details now
14:06:16 <mlavalle> He was super nice and reviewed it right away
14:06:21 <johnthetubaguy> on the nova side I mean
14:06:29 <carl_baldwin> Thank you jaypipes.
14:06:36 <reedip__> Hi
14:07:03 <johnthetubaguy> did you see jay's spec on the nova side, for the generic resource provider stuff, I guess you have?
14:07:27 <mlavalle> johnthetubaguy: yes I just reviewed it and provided feedback, especially on the Neutron aspects
14:07:51 <johnthetubaguy> mlavalle: awesome, I need to take a look at the pair of these
14:08:09 <carl_baldwin> It looks to me like the difficult part is going to be nailing down the details of how to handle allocations and reserved in the scheduler.  I started thinking about it again last night but it will take me a little more time to think through it again.
14:08:34 <carl_baldwin> johnthetubaguy: I should take another pass through Jay's.
14:08:37 <johnthetubaguy> carl_baldwin: +1 thats the tricky bit, I think we all mostly agree on the other parts, based on the midcycle
14:09:30 <carl_baldwin> johnthetubaguy: It would help to have another review.  As I recall, you had some good insight in Bristol.
14:09:33 <mlavalle> carl_baldwin: the sticking point is ports with ip addresses created in Neutron and passed to Nova
14:09:39 <johnthetubaguy> the problem is ports that have an IP address, but are not associated with that particular Nova deploy (worrying about two Novas, one Neutron in my head here)
14:09:48 <johnthetubaguy> mlavalle: +1
14:11:11 <johnthetubaguy> I have wondered about Neutron registering the claims in Nova's database when it's a "Nova"-claimed IP, i.e. check Nova has a claim for an IP for that port before assigning it in Neutron
14:11:15 <mlavalle> carl_baldwin: what if for each segment, we assign 2 subnets, one for Nova to allocate ips from and the other for Neutron to allocate ips from?
14:11:20 <johnthetubaguy> but I don't like how coupled those two things are
14:11:22 <carl_baldwin> Or, with ports created in Neutron without an IP address.  Depending on how you look at it.
14:11:57 <johnthetubaguy> mlavalle: I don't like more segments, we have more pools of wasted IP addresses that way
14:12:22 <johnthetubaguy> mlavalle: think about when a port gets re-used for an existing server, you can't switch segments at that point
14:12:51 <mlavalle> johnthetubaguy: yeap, I am just brainstorming ways to decouple the ip allocation between Nova and Neutron
14:13:05 <johnthetubaguy> I think either neutron checks claims in Nova, or we do some dance around the reserved number getting updated correctly
14:13:13 <carl_baldwin> I think mlavalle meant two subnets rather than segments.  But, the IP waste is still something to think about.
14:13:28 <mlavalle> yes, the IP waste is a problem
14:13:56 <carl_baldwin> johnthetubaguy: My current revision of the spec has something like the "dance".  Jay provided feedback on that (Ed too, a little bit) and I'm currently digesting that feedback.
14:15:00 <mlavalle> I tend to think that the number of ports created in Neutron with IPs will be smaller than the number of ports created by Nova. So the subnet for Neutron could have a smaller range
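A minimal sketch of the two-subnet idea mlavalle floats above: carve each routed-network segment's range into a larger pool for Nova-booted ports and a smaller one for ports created directly through the Neutron API. The function name and the choice of prefix length are assumptions for illustration, not anything in Neutron.

```python
import ipaddress

def split_segment_range(cidr, neutron_prefixlen):
    """Split a segment CIDR into Nova pools and a smaller Neutron pool.

    Hypothetical helper: takes one small subnet off the top of the range
    for Neutron-created ports and leaves the rest for Nova allocations.
    """
    net = ipaddress.ip_network(cidr)
    # Last /neutron_prefixlen subnet goes to ports created via the Neutron API...
    neutron_pool = list(net.subnets(new_prefix=neutron_prefixlen))[-1]
    # ...and everything else stays available for Nova-scheduled boots.
    nova_pools = list(net.address_exclude(neutron_pool))
    return nova_pools, neutron_pool

nova_pools, neutron_pool = split_segment_range("10.0.0.0/22", 26)
```

As johnthetubaguy notes below, the cost of this approach is stranded addresses: whatever sizing you pick, one of the two pools will usually be over-provisioned.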
14:15:01 <johnthetubaguy> honestly, I am leaning towards Neutron optionally checking Nova has a claim for that port uuid in the correct pool, when assigning IPs (and adding a claim if it's missing).
14:15:22 <johnthetubaguy> so if Nova has reserved an IP, Neutron doesn't give it out by accident
14:15:27 <johnthetubaguy> and other such cases
14:15:59 <kevinbenton> a nova api call on every neutron port create?
14:16:03 <johnthetubaguy> its more coupling than I would like, but I am struggling to find anything that's as clean
14:16:11 <johnthetubaguy> kevinbenton: yeah, its not great
14:16:47 <kevinbenton> is it an issue if neutron gives it out?
14:16:48 <mlavalle> kevinbenton: not necessarily on every port creation. We could limit that to segments in routed networks
14:17:16 <kevinbenton> mlavalle: but in a deployment that depends on this topology, that would be almost every port creation
14:17:29 <mlavalle> mhhh... True
14:17:31 <johnthetubaguy> so here is the thing: we can use a service token when Nova talks to Neutron, to avoid the callback, so Nova asserts we have a claim
14:18:12 <carl_baldwin> kevinbenton: hi.  It is turning out to be a bit of a struggle to ensure that the scheduler has the data it needs to make reservations and still maintain ownership of IP addresses in Neutron.  Welcome to the conversation.
14:18:12 <johnthetubaguy> and most port creates in this world mean no IP address is assigned, until the port is attached, so it would limit the cases where its needed
14:18:58 <johnthetubaguy> so I think it's possible to "require" the call, then promptly avoid it in 99% of port creates, if that makes any sense
14:19:04 <carl_baldwin> johnthetubaguy: I'll admit I don't have my head wrapped around this idea yet.
14:19:05 <kevinbenton> there are two things at play here. one is just getting the data to nova. the other is blocking neutron from allocating an IP
14:19:29 <kevinbenton> if the latter is not necessary, we can use the existing nova notifier neutron has to dispatch IP allocation pool changes
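A sketch of the notifier route kevinbenton mentions: Neutron pushes allocation-pool changes to Nova so the scheduler has availability data without Neutron blocking on a Nova call. The event name and payload fields here are illustrative assumptions, not the actual notifier schema.

```python
def build_pool_change_event(segment_id, subnet_id, total, allocated, reserved):
    """Build a hypothetical pool-update notification for Nova's scheduler.

    Eventually consistent by design: Nova's view can be stale for the
    window between an allocation and the notification arriving.
    """
    return {
        "event_type": "ip_allocation_pool.update",  # assumed event name
        "payload": {
            "segment_id": segment_id,
            "subnet_id": subnet_id,
            "total_ips": total,
            "available_ips": total - allocated - reserved,
        },
    }
```

The trade-off debated below is exactly this staleness: the notification keeps the services decoupled, but it cannot stop Neutron from handing out an IP Nova thought it had reserved.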
14:19:44 <johnthetubaguy> we need the latter, else the claim in Nova is pointless
14:19:57 <kevinbenton> well the claim affects all VM booting, no?
14:20:03 <mlavalle> yes, the latter is exactly the sticking point
14:20:25 <johnthetubaguy> kevinbenton: I think so, but not sure I fully understand your question
14:20:39 <kevinbenton> so not completely pointless. it would just be stale for a few seconds if a user creates a port with an IP via the Neutron API directly
14:21:20 <johnthetubaguy> so we are trying to avoid all those races, as in practice they turn out to be a big deal
14:22:00 <johnthetubaguy> its kinda surprising, but its really about when you get big bursts of build requests coming into the system, and there being certain limits of when you can retry builds
14:22:21 <kevinbenton> bursts of builds would not trigger this though because nova will manage its allocations
14:22:36 <johnthetubaguy> kevinbenton: true
14:22:48 <kevinbenton> the only time this would be an issue is if someone ate up all of the IPs via the neutron API at exactly the same time as a build
14:23:28 <kevinbenton> The thing i'm worried about is implying that Nova will be a source of truth that will make us safe to these races
14:23:36 <kevinbenton> Because IPAM is pluggable in Neutron now
14:24:07 <kevinbenton> so available IPs could change under other constraints
14:25:12 <johnthetubaguy> honestly, I think we should design for a case that works great, and make it easy for others to opt into that approach
14:26:33 <johnthetubaguy> so if neutron has an optional check of Nova for the claim, before calling out to get the IP from IPAM, what other issues would be hit?
14:26:55 <johnthetubaguy> (assuming for the common case, Nova sends details to Neutron to say there is no need to check the claim, it's already been claimed)
14:27:51 <kevinbenton> what will neutron ask Nova?
14:27:57 <carl_baldwin> kevinbenton: johnthetubaguy:  I'm not sure that I have a firm grasp on what either of you has in mind in great detail.
14:28:26 <carl_baldwin> Is there a way that we could spell out these proposals to be sure we all have the same understanding?
14:28:34 <kevinbenton> johnthetubaguy: something like, "can i use an IP out of this subnet?"
14:29:16 <johnthetubaguy> kevinbenton: it would more be, get-or-return-claim-uuid for port uuid-a for resource pool uuid-b
14:29:29 <johnthetubaguy> carl_baldwin: comments on your spec, maybe?
14:29:57 <carl_baldwin> johnthetubaguy: yes, that works.
14:30:17 <kevinbenton> johnthetubaguy: does that create a claim if one doesn't exist?
14:30:44 <johnthetubaguy> kevinbenton: yes, a bit like get-me-a-network, creates it, or returns the existing one
14:30:56 <johnthetubaguy> idempotent-ey
14:31:07 <mlavalle> so the 'get' part is the create if it doesn't exist, right?
14:31:09 <kevinbenton> johnthetubaguy: so then if the port create fails, we have to handle a rollback notification to nova, right?
14:31:32 <johnthetubaguy> mlavalle: good point, I meant create-or-get-claim-uuid
14:31:51 <mlavalle> clearer now :-)
14:32:15 <johnthetubaguy> kevinbenton: hmm, depends how you handle that, I was expecting to have a port in the error state that needs deleting, and on delete you drop the claim
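An in-memory sketch of the create-or-get-claim-uuid semantics johnthetubaguy describes: idempotent per (port uuid, resource pool), so a retry with the same port uuid gets the same claim back, and deleting the port drops the claim. The API name and shape are assumptions for illustration; no such Nova call exists today.

```python
import uuid

class ClaimRegistry:
    """Hypothetical Nova-side registry of IP claims per resource pool."""

    def __init__(self, pool_capacity):
        self._capacity = pool_capacity   # pool uuid -> max claims allowed
        self._claims = {}                # (port uuid, pool uuid) -> claim uuid

    def create_or_get_claim(self, port_uuid, pool_uuid):
        key = (port_uuid, pool_uuid)
        if key in self._claims:          # idempotent: return the existing claim
            return self._claims[key]
        used = sum(1 for (_, p) in self._claims if p == pool_uuid)
        if used >= self._capacity[pool_uuid]:
            raise RuntimeError("pool exhausted")
        claim = str(uuid.uuid4())
        self._claims[key] = claim
        return claim

    def release(self, port_uuid, pool_uuid):
        # Dropped when the error-state port is deleted, as discussed above.
        self._claims.pop((port_uuid, pool_uuid), None)
```

Note the claim is for "a" member of the pool, not a specific address, which is what makes the retry story below workable at all.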
14:32:48 <kevinbenton> johnthetubaguy: well it depends on where it fails
14:33:24 <kevinbenton> johnthetubaguy: If it fails to allocate an IP because of a race (concurrent server request), the whole process will be restarted
14:33:26 <johnthetubaguy> kevinbenton: can we leave that late enough, as we only need the claim before doing the IPAM call, I guess
14:33:36 <johnthetubaguy> yeah
14:33:48 <kevinbenton> johnthetubaguy: so in that case, we can no longer use db rollbacks
14:34:13 <kevinbenton> johnthetubaguy: this is now an operation with an external side effect
14:34:22 <johnthetubaguy> kevinbenton: this is certainly outside the db transaction
14:35:16 <kevinbenton> johnthetubaguy: yeah, that won't work. we will need to refactor stuff. Right now everything including IPAM is assumed to be pre-commit and safe to rollback and retry
14:35:53 <johnthetubaguy> so clarification point
14:36:03 <johnthetubaguy> the claim is for "a" member of the pool
14:36:09 <johnthetubaguy> not for a specific IP address
14:36:27 <johnthetubaguy> so you should be able to retry several times, without needing to refresh the claim
14:36:33 <johnthetubaguy> assuming the port uuid is static
14:36:44 <kevinbenton> but the port uuid is not IIRC
14:36:45 <kevinbenton> one sec
14:37:01 <johnthetubaguy> ah, that would certainly need to change to make the above work
14:38:14 <kevinbenton> yes, right now it will be generated on each call since it won't be specified
14:38:19 <kevinbenton> https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L1213
14:38:29 <kevinbenton> this function will be called on each retry
14:39:34 <johnthetubaguy> kevinbenton: yeah, as long as we generate the uuid outside that loop, it should be better I guess
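The fix being discussed can be sketched as follows: generate the port uuid once, outside the retry loop, so a claim keyed on that uuid survives IPAM retries. This is a simplified stand-in for the retry logic in `db_base_plugin_v2.py`, not the actual Neutron code; `allocate_ip` is a hypothetical callable representing the claim check plus IPAM call.

```python
import uuid

def create_port_with_retries(allocate_ip, max_retries=3):
    """Retry IP allocation for a port without regenerating its uuid.

    Today the uuid is regenerated inside the loop on each attempt, which
    would invalidate a claim keyed on the port uuid; hoisting it out keeps
    the claim stable across retries.
    """
    port_id = str(uuid.uuid4())          # fixed across all retry attempts
    for _attempt in range(max_retries):
        try:
            ip = allocate_ip(port_id)    # claim check + IPAM would go here
            return port_id, ip
        except RuntimeError:             # e.g. lost a concurrent-allocation race
            continue
    raise RuntimeError("retries exhausted")
```

The harder issue kevinbenton raises next still stands: once the claim call is an external side effect, a database rollback alone can no longer undo a failed attempt.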
14:40:01 <kevinbenton> what happens when a server dies in the middle of this? what makes this eventually consistent?
14:43:08 <kevinbenton> we can take this discussion to the spec. my initial reaction to this is that it feels wrong that Neutron has to ask Nova if it's allowed to do something with Neutron resources
14:47:14 <carl_baldwin> kevinbenton: I agree with that.  We also don't want the scheduler to have to ask Neutron when scheduling.  So, we don't really want to require either service to ask the other.
14:47:41 <carl_baldwin> I also agree that Neutron needs to be the ultimate source of truth around IP availability.
14:47:50 <mlavalle> ++
14:47:53 <carl_baldwin> ... more specifically, IPAM
14:48:51 <carl_baldwin> Let's take it to the spec.  I think there are a lot of ways this *could* be done.  I'd like to make sure that we have them all laid out well enough to understand them and compare them.
14:49:28 <mlavalle> I am still going to propose the 2 subnets approach in the spec. At least it is another alternative
14:49:59 <carl_baldwin> mlavalle: Please do.
14:51:19 <johnthetubaguy> sorry, I got totally distracted there
14:51:45 <carl_baldwin> kevinbenton: I'm not sure that I have a handle on what alternative you have in mind.
14:51:50 <carl_baldwin> Here's a link to the spec:
14:51:59 <carl_baldwin> #link https://review.openstack.org/#/c/263898
14:52:18 <carl_baldwin> Is there anything else for this meeting?  Our time is getting short.
14:52:37 <carl_baldwin> I'll be honest, I've had a lot of time off since two weeks ago and I haven't made much progress.
14:53:06 <reedip__> carl_baldwin : everyone needs a break :)
14:53:18 <mlavalle> This discussion is what I wanted to trigger yesterday when I asked jay to review the spec, especially the 'algorithm'. So, I guess I can claim 'mission accomplished'
14:53:22 <carl_baldwin> I'll update both specs to freshen them up today.  Then, I'll visit the patches already in flight.
14:53:35 <carl_baldwin> mlavalle: :)
14:53:48 <xiaohhui> I had a short chat with Russell today. We are starting some work for routed networks in ovs-ovn and networking-ovn
14:54:38 <carl_baldwin> xiaohhui: Anything to bring up?
14:54:51 <carl_baldwin> All, I'd like to get this wrapped up:
14:54:55 <carl_baldwin> #link https://review.openstack.org/#/c/242393
14:55:32 <xiaohhui> The work is to bring the bridge-mapping to ovn. It is not supported yet.
14:56:00 <xiaohhui> yeah, I think that patch should have a quick update, many patches depend on it
14:56:35 <carl_baldwin> I just put a note on it that we should rename the table in that patch.  Anyone willing to take that on?
14:57:04 <mlavalle> carl_baldwin: in Rochester you and I talked about the 1-to-1 relationship between a segment and a physical network, at least in the context of routed networks
14:57:25 <mlavalle> I made that assumption here: https://review.openstack.org/#/c/285548
14:57:26 <xiaohhui> Can we do it in a following patch? I think that was the discussion in the patch.
14:58:14 <carl_baldwin> xiaohhui: After reading other comments, I don't see the benefit of doing the rename in a follow-up patch.  It just leaves things in an awkward intermediate state.
14:58:37 <mlavalle> carl_baldwin: I can do it, if nobody else steps up
14:58:40 <carl_baldwin> Avoiding a migration in this patch, to me, isn't a really compelling reason not to do it.
14:59:29 <carl_baldwin> mlavalle: You got time?
14:59:33 <xiaohhui> hmmm, I am OK with that. I can also do that, but I will be on vacation for the rest of this week.
14:59:44 <mlavalle> carl_baldwin: yes
15:00:17 <EmilienM> hello
15:00:19 <carl_baldwin> xiaohhui: thanks for offering.  enjoy your vacation.
15:00:19 <mlavalle> we probably have to quit now
15:00:25 <carl_baldwin> Sorry.  meeting's over.
15:00:27 <carl_baldwin> #endmeeting