14:01:20 <carl_baldwin> #startmeeting neutron_routed_networks
14:01:26 <openstack> Meeting started Tue Mar 22 14:01:20 2016 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <openstack> The meeting name has been set to 'neutron_routed_networks'
14:01:55 <carl_baldwin> #topic Announcements
14:02:10 <carl_baldwin> I don't have much to announce. Anyone?
14:02:36 <xiaohhui> nothing from me
14:02:40 <mlavalle> Newton is open now
14:03:05 <carl_baldwin> mlavalle: ++
14:03:20 <mlavalle> and we are about a month from the Summit
14:03:49 <mlavalle> we should start thinking about what meetings we want to hold regarding routed networks
14:03:59 <mlavalle> especially with the Nova team
14:04:00 * johnthetubaguy lurks
14:04:04 <carl_baldwin> mlavalle: true. I've got to book travel.
14:04:17 <carl_baldwin> johnthetubaguy: hi
14:04:20 <johnthetubaguy> hey
14:04:29 <mlavalle> johnthetubaguy: good to see you :-)
14:04:30 <johnthetubaguy> I keep meaning to hit your spec again
14:04:41 <carl_baldwin> #topic Progress
14:04:51 <carl_baldwin> johnthetubaguy: that is a good segue.
14:05:09 <carl_baldwin> #link https://review.openstack.org/#/c/263898/
14:05:28 <carl_baldwin> We've been getting some attention from the scheduler team on it.
14:05:48 <carl_baldwin> I have responses to feedback and a new version of the spec almost ready to post.
14:06:01 <mlavalle> I asked jaypipes to review it yesterday during the scheduler meeting
14:06:11 <johnthetubaguy> yeah, I suspect getting agreement on that spec is the best target for the summit; I guess that covers most of the details now
14:06:16 <mlavalle> He was super nice and reviewed it right away
14:06:21 <johnthetubaguy> on the nova side I mean
14:06:29 <carl_baldwin> Thank you jaypipes.
14:06:36 <reedip__> Hi
14:07:03 <johnthetubaguy> did you see jay's spec on the nova side, for the generic resource provider stuff? I guess you have?
14:07:27 <mlavalle> johnthetubaguy: yes I just reviewed it and provided feedback, especially on the Neutron aspects
14:07:51 <johnthetubaguy> mlavalle: awesome, I need to take a look at the pair of these
14:08:09 <carl_baldwin> It looks to me like the difficult part is going to be nailing down the details of how to handle allocations and reserved in the scheduler. I started thinking about it again last night but it will take me a little more time to think through it again.
14:08:34 <carl_baldwin> johnthetubaguy: I should take another pass through Jay's.
14:08:37 <johnthetubaguy> carl_baldwin: +1 that's the tricky bit; I think we all mostly agree on the other parts, based on the midcycle
14:09:30 <carl_baldwin> johnthetubaguy: It would help to have another review. As I recall, you had some good insight in Bristol.
14:09:33 <mlavalle> carl_baldwin: the sticking point is with ports with IP addresses created in Neutron and passed to Nova
14:09:39 <johnthetubaguy> the problem is ports that have an IP address, but are not associated with that particular Nova deploy (worrying about two Novas, one Neutron in my head here)
14:09:48 <johnthetubaguy> mlavalle: +1
14:11:11 <johnthetubaguy> I have wondered about Neutron registering the claims in Nova's database when it's a "Nova"-claimed IP, i.e. check that Nova has a claim for an IP for that port before assigning it in Neutron
14:11:15 <mlavalle> carl_baldwin: what if for each segment, we assign 2 subnets, one for Nova to allocate IPs from and the other for Neutron to allocate IPs from?
14:11:20 <johnthetubaguy> but I don't like how coupled those two things are
14:11:22 <carl_baldwin> Or, with ports created in Neutron without an IP address. Depending on how you look at it.
14:11:57 <johnthetubaguy> mlavalle: I don't like more segments, we have more pools of wasted IP addresses that way
14:12:22 <johnthetubaguy> mlavalle: think about when a port gets re-used for an existing server, you can't switch segments at that point
14:12:51 <mlavalle> johnthetubaguy: yeap, I am just brainstorming ways to decouple the IP allocation between Nova and Neutron
14:13:05 <johnthetubaguy> I think either neutron checks claims in Nova, or we do some dance around the reserved number getting updated correctly
14:13:13 <carl_baldwin> I think mlavalle meant two subnets rather than segments. But, the IP waste is still something to think about.
14:13:28 <mlavalle> yes, the IP waste is a problem
14:13:56 <carl_baldwin> johnthetubaguy: My current revision of the spec has something like the "dance". Jay provided feedback on that (Ed too, a little bit) and I'm currently digesting that feedback.
14:15:00 <mlavalle> I tend to think that the number of ports created in Neutron with IPs will be smaller than the number of ports created by Nova. So the subnet for Neutron could have a smaller range
14:15:01 <johnthetubaguy> honestly, I am leaning towards Neutron optionally checking that Nova has a claim for that port uuid in the correct pool when assigning IPs (and adding a claim if it's missing).
14:15:22 <johnthetubaguy> so if Nova has reserved an IP, Neutron doesn't give it out by accident
14:15:27 <johnthetubaguy> and other such cases
14:15:59 <kevinbenton> a nova api call on every neutron port create?
14:16:03 <johnthetubaguy> it's more coupling than I would like, but I am struggling to find anything that's as clean
14:16:11 <johnthetubaguy> kevinbenton: yeah, it's not great
14:16:47 <kevinbenton> is it an issue if neutron gives it out?
14:16:48 <mlavalle> kevinbenton: not necessarily on every port creation. We could limit that to segments in routed networks
14:17:16 <kevinbenton> mlavalle: but in a deployment that depends on this topology, that would be almost every port creation
14:17:29 <mlavalle> mhhh... True
14:17:31 <johnthetubaguy> so here is the thing, we can use a service token when Nova calls neutron, to avoid the callback, so Nova asserts we have a claim
14:18:12 <carl_baldwin> kevinbenton: hi. It is turning out to be a bit of a struggle to ensure that the scheduler has the data it needs to make reservations and still maintain ownership of IP addresses in Neutron. Welcome to the conversation.
14:18:12 <johnthetubaguy> and most port creates in this world mean no IP address is assigned until the port is attached, so it would limit the cases where it's needed
14:18:58 <johnthetubaguy> so I think it's possible to "require" the call, then promptly avoid it in 99% of port creates, if that makes any sense
14:19:04 <carl_baldwin> johnthetubaguy: I'll admit I don't have my head wrapped around this idea yet.
14:19:05 <kevinbenton> there are two things at play here. one is just getting the data to nova. the other is blocking neutron from allocating an IP
14:19:29 <kevinbenton> if the latter is not necessary, we can use the existing nova notifier neutron has to dispatch IP allocation pool changes
14:19:44 <johnthetubaguy> we need the latter, else the claim in Nova is pointless
14:19:57 <kevinbenton> well the claim affects all VM booting, no?
14:20:03 <mlavalle> yes, the latter is exactly the sticking point
14:20:25 <johnthetubaguy> kevinbenton: I think so, but not sure I fully understand your question
14:20:39 <kevinbenton> so not completely pointless. it would just be stale for a few seconds if a user creates a port with an IP via the Neutron API directly
14:21:20 <johnthetubaguy> so we are trying to avoid all those races, as in practice they turn out to be a big deal
14:22:00 <johnthetubaguy> it's kinda surprising, but it's really about when you get big bursts of build requests coming into the system, and there being certain limits on when you can retry builds
14:22:21 <kevinbenton> bursts of builds would not trigger this though because nova will manage its allocations
14:22:36 <johnthetubaguy> kevinbenton: true
14:22:48 <kevinbenton> the only time this would be an issue is if someone ate up all of the IPs via the neutron API at exactly the same time as a build
14:23:28 <kevinbenton> The thing I'm worried about is implying that Nova will be a source of truth that will make us safe from these races
14:23:36 <kevinbenton> Because IPAM is pluggable in Neutron now
14:24:07 <kevinbenton> so available IPs could change under other constraints
14:25:12 <johnthetubaguy> honestly, I think we should design for a case that works great, and make it easy for others to opt into that approach
14:26:33 <johnthetubaguy> so if neutron has an optional check of Nova for the claim, before calling out to get the IP from IPAM, what other issues would be hit?
14:26:55 <johnthetubaguy> (assuming for the common case, Nova sends details to neutron to say there is no need to check the claim, it's already been claimed)
14:27:51 <kevinbenton> what will neutron ask Nova?
14:27:57 <carl_baldwin> kevinbenton: johnthetubaguy: I'm not sure that I have a firm grasp on what either of you has in mind in great detail.
14:28:26 <carl_baldwin> Is there a way that we could spell out these proposals to be sure we all have the same understanding?
14:28:34 <kevinbenton> johnthetubaguy: something like, "can I use an IP out of this subnet?"
14:29:16 <johnthetubaguy> kevinbenton: it would be more like get-or-return-claim-uuid for port uuid-a for resource pool uuid-b
14:29:29 <johnthetubaguy> carl_baldwin: comments on your spec, maybe?
14:29:57 <carl_baldwin> johnthetubaguy: yes, that works.
14:30:17 <kevinbenton> johnthetubaguy: does that create a claim if one doesn't exist?
14:30:44 <johnthetubaguy> kevinbenton: yes, a bit like get-me-a-network, creates it, or returns the existing one
14:30:56 <johnthetubaguy> idempotent-ey
14:31:07 <mlavalle> so the 'get' part is the create if it doesn't exist, right?
14:31:09 <kevinbenton> johnthetubaguy: so then if the port create fails, we have to handle a rollback notification to nova, right?
14:31:32 <johnthetubaguy> mlavalle: good point, I meant create-or-get-claim-uuid
14:31:51 <mlavalle> clearer now :-)
14:32:15 <johnthetubaguy> kevinbenton: hmm, depends how you handle that, I was expecting to have a port in the error state that needs deleting, and on delete you drop the claim
14:32:48 <kevinbenton> johnthetubaguy: well it depends on where it fails
14:33:24 <kevinbenton> johnthetubaguy: If it fails to allocate an IP because of a race (concurrent server request), the whole process will be restarted
14:33:26 <johnthetubaguy> kevinbenton: can we leave that late enough? we only need the claim before doing the IPAM call, I guess
14:33:36 <johnthetubaguy> yeah
14:33:48 <kevinbenton> johnthetubaguy: so in that case, we can no longer use db rollbacks
14:34:13 <kevinbenton> johnthetubaguy: this is now an operation with an external side effect
14:34:22 <johnthetubaguy> kevinbenton: this is certainly outside the db transaction
14:35:16 <kevinbenton> johnthetubaguy: yeah, that won't work. we will need to refactor stuff. Right now everything including IPAM is assumed to be pre-commit and safe to roll back and retry
14:35:53 <johnthetubaguy> so, clarification point
14:36:03 <johnthetubaguy> the claim is for "a" member of the pool
14:36:09 <johnthetubaguy> not for a specific IP address
14:36:27 <johnthetubaguy> so you should be able to retry several times without needing to refresh the claim
14:36:33 <johnthetubaguy> assuming the port uuid is static
14:36:44 <kevinbenton> but the port uuid is not IIRC
14:36:45 <kevinbenton> one sec
14:37:01 <johnthetubaguy> ah, that would certainly need to change to make the above work
14:38:14 <kevinbenton> yes, right now it will be generated on each call since it won't be specified
14:38:19 <kevinbenton> https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_v2.py#L1213
14:38:29 <kevinbenton> this function will be called on each retry
14:39:34 <johnthetubaguy> kevinbenton: yeah, as long as we generate the uuid outside that loop, it should be better I guess
14:40:01 <kevinbenton> what happens when a server dies in the middle of this? what makes this eventually consistent?
14:43:08 <kevinbenton> we can take this discussion to the spec. my initial reaction to this is that it feels wrong that Neutron has to ask Nova if it's allowed to do something with Neutron resources
14:47:14 <carl_baldwin> kevinbenton: I agree with that. We also don't want the scheduler to have to ask Neutron when scheduling. So, we don't really want to require either service to ask the other.
14:47:41 <carl_baldwin> I also agree that Neutron needs to be the ultimate source of truth around IP availability.
14:47:50 <mlavalle> ++
14:47:53 <carl_baldwin> ... more specifically, IPAM
14:48:51 <carl_baldwin> Let's take it to the spec. I think there are a lot of ways this *could* be done. I'd like to make sure that we have them all laid out well enough to understand them and compare them.
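A minimal sketch of the create-or-get claim idea discussed above, assuming a hypothetical in-memory `ClaimRegistry` standing in for Nova's claim store and a hypothetical `allocate_ip` callable standing in for Neutron's IPAM. Neither is a real Nova or Neutron API, and this is not what the spec proposes. It illustrates three points from the thread: the claim is for "a" member of a pool, so it is keyed on (port UUID, pool UUID) and repeated calls are idempotent; the port UUID is generated once, outside the retry loop, so each retry reuses the same claim; and because the claim is an external side effect outside the DB transaction, it has to be released explicitly when all retries fail.

```python
import uuid


class RetriableIPConflict(Exception):
    """Hypothetical error: a concurrent request took the IP we tried to use."""


class ClaimRegistry:
    """Hypothetical stand-in for a Nova-side claim store (not a real Nova API).

    A claim is for "a" member of a pool, not a specific IP address, so it is
    keyed only on (port_id, pool_id).
    """

    def __init__(self):
        self._claims = {}

    def create_or_get_claim(self, port_id, pool_id):
        # Idempotent: the first call creates the claim, later calls return it.
        key = (port_id, pool_id)
        if key not in self._claims:
            self._claims[key] = str(uuid.uuid4())
        return self._claims[key]

    def release_claim(self, port_id, pool_id):
        # Compensating action: a Neutron DB rollback would not undo this claim.
        self._claims.pop((port_id, pool_id), None)

    def count(self):
        return len(self._claims)


def create_port(registry, pool_id, allocate_ip, max_retries=3):
    """Hypothetical port-create flow that takes the claim just before IPAM.

    allocate_ip(port_id) stands in for an IPAM call that may raise
    RetriableIPConflict when it loses a race with a concurrent request.
    """
    # Generated once, outside the retry loop, so every retry maps to one claim.
    port_id = str(uuid.uuid4())
    for _ in range(max_retries):
        claim_id = registry.create_or_get_claim(port_id, pool_id)
        try:
            ip = allocate_ip(port_id)
        except RetriableIPConflict:
            continue  # retry with the same port_id, hence the same claim
        return {"id": port_id, "ip": ip, "claim": claim_id}
    # All retries failed: the claim lives outside the transaction, so drop it.
    registry.release_claim(port_id, pool_id)
    raise RetriableIPConflict("exhausted retries for port %s" % port_id)
```

Dropping the claim when a successfully created port is later deleted, and recovering claims orphaned when a server dies mid-request (the eventual-consistency question raised at 14:40:01), are left out of this sketch.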
14:49:28 <mlavalle> I am still going to propose the 2 subnets approach in the spec. At least it is another alternative
14:49:59 <carl_baldwin> mlavalle: Please do.
14:51:19 <johnthetubaguy> sorry, I got totally distracted there
14:51:45 <carl_baldwin> kevinbenton: I'm not sure that I have a handle on what alternative you have in mind.
14:51:50 <carl_baldwin> Here's a link to the spec:
14:51:59 <carl_baldwin> #link https://review.openstack.org/#/c/263898
14:52:18 <carl_baldwin> Is there anything else for this meeting? Our time is getting short.
14:52:37 <carl_baldwin> I'll be honest, I've had a lot of time off over the past two weeks and I haven't made much progress.
14:53:06 <reedip__> carl_baldwin : everyone needs a break :)
14:53:18 <mlavalle> This discussion is what I wanted to trigger yesterday when I asked jay to review the spec, especially the 'algorithm'. So, I guess I can claim 'mission accomplished'
14:53:22 <carl_baldwin> I'll update both specs to freshen them up today. Then, I'll visit the patches already in flight.
14:53:35 <carl_baldwin> mlavalle: :)
14:53:48 <xiaohhui> I had a short chat with Russell today. We are starting some work for routed networks in ovs-ovn and networking-ovn
14:54:38 <carl_baldwin> xiaohhui: Anything to bring up?
14:54:51 <carl_baldwin> All, I'd like to get this wrapped up:
14:54:55 <carl_baldwin> #link https://review.openstack.org/#/c/242393
14:55:32 <xiaohhui> The work is to bring the bridge-mapping to ovn. It is not supported yet.
14:56:00 <xiaohhui> yeah, I think that patch should have a quick update, many patches depend on it
14:56:35 <carl_baldwin> I just put a note on it that we should rename the table in that patch. Anyone willing to take that on?
14:57:04 <mlavalle> carl_baldwin: in Rochester you and I talked about the 1-to-1 relationship between a segment and a physical network, at least in the context of routed networks
14:57:25 <mlavalle> I made that assumption here: https://review.openstack.org/#/c/285548
14:57:26 <xiaohhui> Can we do it in a follow-up patch? I think that was the discussion in the patch.
14:58:14 <carl_baldwin> xiaohhui: After reading other comments, I don't see the benefit of doing the rename in a follow-up patch. It just leaves things in an awkward intermediate state.
14:58:37 <mlavalle> carl_baldwin: I can do it, if nobody else steps up
14:58:40 <carl_baldwin> Avoiding a migration in this patch, to me, isn't a really compelling reason to not do it.
14:59:29 <carl_baldwin> mlavalle: You got time?
14:59:33 <xiaohhui> hmmm, I am OK with that. I can also do that, but I will be on vacation for the rest of this week.
14:59:44 <mlavalle> carl_baldwin: yes
15:00:17 <EmilienM> hello
15:00:19 <carl_baldwin> xiaohhui: thanks for offering. enjoy your vacation.
15:00:19 <mlavalle> we probably have to quit now
15:00:25 <carl_baldwin> Sorry. meeting's over.
15:00:27 <carl_baldwin> #endmeeting