16:03:41 #startmeeting networking_ml2
16:03:42 Meeting started Wed Mar 4 16:03:41 2015 UTC and is due to finish in 60 minutes. The chair is rkukura. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:03:44 rkukura: he is here
16:03:46 The meeting name has been set to 'networking_ml2'
16:04:00 #topic Agenda
16:04:15 #link https://wiki.openstack.org/wiki/Meetings/ML2
16:04:29 Anything we need to add to the agenda?
16:04:57 #topic Announcements
16:05:05 One announcement: Kilo Feature Freeze is tomorrow (March 5th)
16:05:36 I missed this week’s neutron IRC meeting. This is the deadline to submit patches against BPs, right?
16:05:46 clarification, the code should be in for review by this date
16:06:04 Bug fix code can be submitted after this deadline, right?
16:06:24 correct
16:06:27 if the CI is not functional it is still a good idea to push the patchset
16:06:33 shivharis, rkukura: No, I think this is the FPF date -
16:06:36 bug fixes are not features :-(
16:06:40 bug fixes are not features :-)
16:06:59 amotoki :-)
16:07:12 Right, so FPF is tomorrow, code freeze is when?
16:07:23 Mar 19
16:07:29 rkukura: Usually a week or two later
16:07:54 OK. I think the ML2 features have all made the FPF - am I missing anything?
16:07:57 rkukura: I think you mean feature freeze.
16:08:42 amotoki: right - freeze for all code to merge except for bugs blocking the release, right?
16:09:08 Any other announcements?
16:09:14 rkukura: no freeze for feature proposal - I think
16:09:39 Sukhdev: Isn’t FPF tomorrow?
16:10:26 rkukura: yes - I meant the freeze is for feature proposals, not code
16:11:05 My understanding is FPF is the deadline to submit patches for new features, and FF is the deadline for them to be merged. Is that correct?
16:11:30 rkukura: right
16:12:12 Do we know whether we will see initial patches for the pecan and plugin API BPs by tomorrow?
16:13:20 I have no information.
16:13:30 OK, let's move on
16:13:48 #topic ML2 Drivers decomposition discussion
16:14:23 Need reviewers for https://review.openstack.org/#/c/155436/ pls
16:14:28 other than arista, what other drivers are in?
16:14:32 #link https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#how-to
16:14:56 I have marked it WIP since the corresponding vendor patch is still not in stackforge
16:15:19 rkukura: ODL
16:16:33 sadasu: it has to pass Cisco CI - that is the requirement
16:16:35 the patches are complete in all other respects… do you want me to lift the WIP tag?
16:17:27 sadasu: if the Cisco CI is not voting on it, you will not get any useful reviews :-)
16:17:45 reviewers are going to ask for CI immediately; they look at Jenkins followed by the CI vote
16:18:26 sadasu: If you need help in getting Cisco CI to vote on this patch, ping me, I can help
16:19:01 one question on vendor decomposition. Before and after the decomposition we need different CI setups. Is it better to stop the current CI and set up a new CI for the decomposition?
16:19:11 sadasu: If you remove WIP, your patch will start to get attention - so, it is the right thing to do - if you believe it is ready
16:19:20 Sukhdev: thanks! I am in the process of integrating my UCS M specific testbed with the rest of Cisco CI
16:19:38 other than that I am all set
16:19:52 ... my question is not for sadasu's case. it is a new mech driver.
16:19:55 amotoki: It is a chicken-and-egg problem -
16:19:57 amotoki: yes, that is what I did
16:20:37 amotoki: I believe that is correct, and that is why there is some down-time while moving the test-bed over
16:20:39 a new mech driver will not have a prior CI (?)
16:20:55 amotoki: For CI to pass during the decomposition, you have to manually install the stackforge repo, and afterwards you can pull it from pypi
16:21:01 trying to follow: what do WIP and CI mean plz
16:21:26 WIP: work in progress; CI: continuous integration
16:21:51 Sukhdev: thanks for the tips. It sounds like a reasonable approach
16:22:43 amotoki: without this tip, you will not get the core team's approval :-)
16:22:48 Sukhdev: +1
16:23:18 actually, to be precise, uninstall then install
16:24:00 Sukhdev: I would rather get attention now than on the last day
16:24:01 sadasu: I will review your patch - you have to mock most of your stackforge code - will let you know if I see anything funny
16:24:27 sadasu: hence, I suggested removing the WIP -
16:25:00 Sukhdev: please take a look at my patch-set as well, thanks
16:25:10 Sukhdev: thanks
16:25:13 shivharis: link?
16:25:28 i have sent it to you just now
16:25:47 shivharis: cool
16:26:14 any other decomposition questions/issues/comments?
16:26:16 who should update https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#how-to, since networking-mlnx is out of tree already?
16:26:50 moshele: I would suggest you submit the patch to update it
16:26:56 ok
16:27:11 I was going to suggest we all submit patches to https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#decomposition-progress-chart as we make progress.
16:27:29 it is just a snapshot at a point in time, and more input is really welcome.
16:27:53 rkukura: correct - that is the right way to do it - but keep armax in the loop - he is in charge of this
16:28:07 anything else on decomposition?
16:28:19 in other words, make sure he approves the patch
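As a rough illustration of the thin-shim pattern from the contribute.rst how-to linked above: after decomposition the in-tree module only delegates to the out-of-tree package, which (per Sukhdev's tip) the third-party CI installs manually from its stackforge repo during the transition and pulls from pypi afterwards. The networking_foo package and class names below are hypothetical, not any particular vendor's code.

# Hypothetical post-decomposition shim; networking_foo is an assumed
# out-of-tree package living on stackforge (later published to pypi).
from networking_foo.ml2 import mech_driver


class FooMechanismDriver(mech_driver.FooMechanismDriver):
    """Thin in-tree wrapper: the existing ML2 entry point keeps working,
    while all real driver logic lives in the out-of-tree package."""
    pass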
16:28:35 #topic ML2 Sync and error handling (Task Flow)
16:28:55 manishg: nice to have you here to discuss this
16:29:24 have been sick! much better now :)
16:29:26 just wondering if anyone has looked at the patch?
16:29:48 manishg: I am guilty as charged :-)
16:30:00 manishg: It's been on my list to review in detail, but I have not had a chance yet
16:30:11 I will try today
16:30:13 manishg: this time I left a comment :-)
16:30:29 As was suggested by HenryG/Sukhdev, I added a lot of documentation and updated the code.
16:30:46 Hoping that the intention/design is clear, but if it's not, let me know.
16:31:09 manishg: Can you briefly summarize what further work would be required beyond the current patch?
16:31:57 rkukura: the current patch gives an idea of the interface with the drivers (which is backward compatible). next steps would be to make it modify the db to maintain state
16:32:06 + making it async
16:32:31 right now, the functionality is more or less the same (with a refactor - getting ready for the next steps)
16:32:45 So for example, a create would then return with the resource in a “creating” state, which would change to “ready” when the async tasks complete?
16:32:58 yep
16:33:33 we can add a state to the resource and perhaps a separate table to track the progress of the various drivers which are
16:33:36 registered.
16:33:50 Do you have thoughts on how the resources should behave in these intermediate states? Are new requests blocked/queued or rejected or run in parallel?
16:34:09 the driver table only has entries while the state is "creating", e.g.
16:34:42 when all of the drivers are done creating, the overall state of the resource can be ACTIVE and the driver table could have its entries removed for that resource
16:34:48 sounds right?
16:35:14 while a resource is in an intermediate state
16:35:23 only certain operations are permitted.
16:35:37 manishg: I think so, although it might be worth keeping track of which drivers are up-to-date rather than which drivers are not
16:35:38 and when querying for resources, only ACTIVE ones are returned
16:35:44 manishg: During the Paris summit we discussed how long to keep the resource in an intermediate state, say if one of the MDs is misbehaving?
16:36:15 not sure hiding the resource makes sense
16:36:29 it would seem a create followed by a get should always work
16:36:36 rkukura: I was thinking we maintain all driver states for the resources so that you know the complete state... until the transition happens to ACTIVE
16:37:09 manishg: I agree the plugin should keep track of the driver states and reflect these in the overall state
16:37:11 rkukura: the hiding part is only the default behaviour and one can request "all" resources. The reason for this
16:37:29 manishg: At what stage will we declare an operation failed? Or keep it in an intermediate state forever?
16:37:32 is to maintain compatibility with current behaviour
16:37:53 is there any reason we cannot return resources with non-ACTIVE status?
16:38:26 I think we can return an error response for operations on a resource in "creating" status.
16:38:28 manishg: I’d think a lot of scripts and tools expect the resource to exist right after a create that didn’t fail
16:38:29 amotoki: non-ACTIVE is equivalent to "creating"
16:38:31 rkukura, amotoki: today, we create and that succeeds only when the resource is ACTIVE. so create + get = resource in ACTIVE state. and callers use it that way
16:38:59 we can have both options. but not sure what we want the default to be
16:39:07 I'd imagine the default is to return only ACTIVE.
16:39:09 Sukhdev: non-ACTIVE is the same as "creating" when creating a resource.
16:39:27 when u do a create you get a net uuid, but when u do a 'normal' get you don't see it?
16:39:45 Is it reasonable to consider this driver state as trailing the definitive DB state, hopefully eventually becoming consistent?
16:39:47 Sukhdev: about how long.... that depends on the drivers. And when they give up...
16:40:17 amotoki: non-ACTIVE = "creating" or "error" state (if drivers are able to declare it such due to a certain condition)
16:40:21 And if so, why not allow API operations while in these intermediate states?
16:40:56 So the DB operations occur in sequence, building up a queue of tasks that need to complete until all the drivers are in sync
16:41:18 rkukura: in intermediate states certain operations would make sense. for example, in "creating" state one can delete
16:41:21 pending state (*-ing) in my mind. non-ACTIVE is misleading....
16:41:31 And then when nova, for instance, needs to wait for a port to be ready to plug, it would make sure there is no backlog of driver activity at that point
16:42:34 rkukura: it will be simpler to not queue up lots of operations, because then we run the risk of either uselessly doing operations (e.g. multiple updates) or needing to prune the queue (either way it will be more complex)
16:43:03 manishg: I agree it could get complicated. But we do need to keep it usable.
16:43:37 rkukura: you bring up a good point
16:43:39 rkukura: isn't nova doing the same?
16:44:10 I think nova uses intermediate states as well - so, we may be OK
16:44:29 It may be worthwhile to look into this
16:44:34 Sukhdev: I think rkukura's question is not about the states but about the operations that are permitted in those states.
16:44:48 We’ve got a few more agenda items and limited time. I suggest we continue this discussion of the next steps next week, hopefully after we’ve all reviewed manishg’s current patch in detail.
16:44:52 so he doesn't want the caller to have to bother about the state.
16:45:21 rkukura: sure. but I think the more people look at the patch and comment, the sooner I can proceed to the next step of the states.
16:45:51 The patch's been out for two weeks or so. The sooner I get feedback, the faster we can march.
16:45:52 manishg: exactly - let's review this first phase, and discuss the next phase
16:45:59 cool. thanks.
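To make the design being sketched above concrete, here is a minimal illustration (not manishg's patch; all names are invented): a per-driver progress table whose rows exist only while work is outstanding, an overall status derived from it, and list calls that return only ACTIVE resources by default while still letting callers ask for everything.

# Illustrative sketch only; not the actual patch under review.
ACTIVE = 'ACTIVE'
CREATING = 'CREATING'


class DriverSyncEntry(object):
    """One row per (resource, driver) while that driver still has work to do;
    rows are removed as drivers finish, so no rows means the resource is in
    sync."""
    def __init__(self, resource_id, driver_name):
        self.resource_id = resource_id
        self.driver_name = driver_name


def overall_status(resource_id, sync_entries):
    """A resource is ACTIVE once no driver has outstanding work for it."""
    pending = [e for e in sync_entries if e.resource_id == resource_id]
    return CREATING if pending else ACTIVE


def list_resources(resources, sync_entries, include_all=False):
    """Default GET hides resources that are still converging; callers can
    explicitly request all of them."""
    if include_all:
        return list(resources)
    return [r for r in resources
            if overall_status(r['id'], sync_entries) == ACTIVE]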
16:46:02 manishg, rkukura: I have one question - before you move to the next topic
16:46:11 Sukhdev: go ahead
16:46:53 Say the resource is in ACTIVE state (all MDs are in sync) - can one MD then move the state of the resource into an "..ing" state?
16:47:20 what event would cause that?
16:47:38 I ask this in the context of the driver's back-end resetting and needing to bring everything back in sync - and not wanting to allow additional operations until all is in sync
16:47:40 Is this like if something breaks and the driver detects it?
16:48:08 if it's ACTIVE, the delete API could cause it to go to DELETing, but after ACTIVE, going back to CREATing would be weird.
16:48:10 rkukura: yes
16:48:36 manishg: I would want to move from ACTIVE to creating
16:48:52 Maybe these should be special intermediate states, not CREATING or DELETING.
16:49:14 manishg: if my back-end resets and needs to rebuild the state - I do not want any operations (e.g. update) performed on it while I am rebuilding my state
16:50:11 manishg: This is one of the use cases I was thinking of when I was reviewing your patch
16:50:42 Sukhdev: I'm not really sure if we want to feed events from devices back into the db... but I'll think about it. we can discuss more next time.
16:51:05 if we do, there will be lots of such cases. what if a certain segment is down (data-path)...
16:51:25 manishg: yes, let's think about it - as I see this use case all the time - when HW resets :-)
16:51:27 do we want to update the db saying the segment is down and prevent selection of that network for new VMs?
16:51:29 I’d also like to avoid complicating the client view - they shouldn’t need to worry about whether an operation can be accepted or whether they have to wait before trying
16:51:43 Sukhdev: if you are going to put the network in "creating" you will also have to do the same to the respective ports
16:52:12 shivharis: I am more or less thinking about ports (was not thinking about networks)
16:52:38 Sukhdev: why not?
16:52:53 rkukura: I think building an async API while the client isn't looking at the states will surely introduce complications. But that is another topic we need to discuss more, I think.
16:52:59 In general, if I have 10K VMs running on 4 networks - the ports become critical
16:53:46 Sukhdev: in the interest of getting something done, I suggest simplifying the first cut
16:54:11 we can add HA/audit-like functionality later
16:54:16 shivharis: +1 (Sukhdev: what do we do now?)
16:54:24 The definitive state is the DB. It seems to me we should allow all API operations to proceed, moving the DB from one state to the next, and come up with a solution where the backend state eventually converges, and there is visibility when needed of whether it has or hasn’t converged.
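Continuing the illustration (again, invented names rather than the patch under review): a per-state table of permitted API operations, a hypothetical RESYNCING state for the backend-reset case Sukhdev raised, and the kind of convergence check a consumer such as nova could wait on before plugging a port.

# Illustrative sketch only; the state names and policy are assumptions.
ACTIVE = 'ACTIVE'
CREATING = 'CREATING'
DELETING = 'DELETING'
RESYNCING = 'RESYNCING'  # hypothetical: backend is rebuilding its own state

# Example policy: a resource being created can still be deleted, but updates
# wait until the backend has converged.
ALLOWED_OPS = {
    ACTIVE: {'get', 'update', 'delete'},
    CREATING: {'get', 'delete'},
    DELETING: {'get'},
    RESYNCING: {'get', 'delete'},
}


def check_operation(state, op):
    """Reject (or defer) operations the current state does not permit."""
    if op not in ALLOWED_OPS.get(state, set()):
        raise ValueError('operation %r not allowed while %s' % (op, state))


def port_ready_to_plug(port_status, outstanding_driver_work):
    """What a consumer like nova would wait on before binding/plugging a VIF:
    the DB says ACTIVE and no driver work is backlogged."""
    return port_status == ACTIVE and not outstanding_driver_work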
16:54:28 I do not know at what scale you are testing your MDs, but I am pushing beyond 20K VMs - then these kinds of issues pop up, especially if you reset the HW in the middle (or switch over)
16:54:42 I am hoping this new mechanism can help all MDs
16:55:03 rkukura: agree. will discuss more.
16:55:14 OK, let's move on, and continue this very useful discussion next week
16:55:23 Sukhdev: what do you do now?
16:55:28 #topic portsecurity patches
16:55:30 rkukura: sure.
16:55:38 #link portsecurity patches
16:55:38 hi
16:55:50 #link https://review.openstack.org/#/c/126552/
16:56:01 #link https://review.openstack.org/#/c/160051/
16:56:03 manishg: we have a very elaborate sync mechanism - I was hoping to remove that and use taskflow - :-)
16:56:07 yamahata: go ahead...
16:56:24 126552 is a framework for generic functional tests of the iptables firewall.
16:56:40 So I'd like to merge it first so that we can add tests independently.
16:56:55 the second one is 160051. it has a db issue.
16:57:06 yalie1: can you explain it?
16:57:36 Sukhdev: I'll look at your driver. you update the db state? Let's chat offline since folks are discussing another topic now.
16:57:43 yes, 160051 is the major patch to implement the extension driver for port-sec
16:58:22 it's almost done, but missing the functional test cases.
16:58:59 it's the other way around: 126552 is the port-sec patch and 160051 is the functional test patch.
16:59:11 manishg: no, I do not - but I considered it... especially when running 3 or 4 Neutron servers all pushing 20K+ VMs and forcing the switchovers in HW
16:59:29 we are down to a minute - anything blocking progress?
17:00:06 hopefully more reviews on the portsecurity patches.
17:00:10 Let's review those patches!
17:00:23 shivharis: Any bugs we need to look at this week?
17:00:27 shwetaap: I will review later today
17:00:28 FYI: I am pushing this bug fix for K3, https://bugs.launchpad.net/neutron/+bug/1328991
17:00:29 Launchpad bug 1328991 in neutron "External network should not have provider:network_type as vxlan" [Medium,In progress] - Assigned to Aman Kumar (amank)
17:01:01 i will ping the owner; it is close to getting a fix and useful to have
17:01:19 I’m making forward progress on the DVR schema/logic cleanup - hopefully a patch soon
17:01:44 We are out of time - anything else urgent?
17:01:47 K3 is Mar 19; we need all high-prio bugs fixed
17:02:09 shivharis: right
17:02:12 Thanks everyone!
17:02:14 thanks. bye.
17:02:17 thanks!!
17:02:17 #endmeeting