16:03:41 <rkukura> #startmeeting networking_ml2
16:03:42 <openstack> Meeting started Wed Mar  4 16:03:41 2015 UTC and is due to finish in 60 minutes.  The chair is rkukura. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:03:44 <Sukhdev> rkukura: he is here
16:03:46 <openstack> The meeting name has been set to 'networking_ml2'
16:04:00 <rkukura> #topic Agenda
16:04:15 <rkukura> #link https://wiki.openstack.org/wiki/Meetings/ML2
16:04:29 <rkukura> Anything we need to add to the agenda?
16:04:57 <rkukura> #topic Announcements
16:05:05 <rkukura> One announcement: Kilo Feature Freeze is tomorrow (March 5th)
16:05:36 <rkukura> I missed this week’s neutron IRC meeting. This is the deadline to submit patches against BPs, right?
16:05:46 <shivharis> clarification, the code should be in for review by this date
16:06:04 <rkukura> Bug fix code can be submitted after this deadline, right?
16:06:24 <sadasu> correct
16:06:27 <shivharis> if the CI is not functional it is still a good idea to push the patchset
16:06:33 <Sukhdev> shivharis, rkukura: No, I think this is the FPF date -
16:06:36 <amotoki> bug fixes are not features :-)
16:06:59 <Sukhdev> amotoki: :-)
16:07:12 <rkukura> Right, so FPF is tomorrow, code freeze is when?
16:07:23 <shivharis> Mar 19
16:07:29 <Sukhdev> rkukura: Usually a week or two later
16:07:54 <rkukura> OK. I think the ML2 features have all made the FPF - am I missing anything?
16:07:57 <amotoki> rkukura: I think you mean feature freeze.
16:08:42 <rkukura> amotoki: right - freeze for all code to merge except for bugs blocking release, right?
16:09:08 <rkukura> Any other announcements?
16:09:14 <Sukhdev> rkukura: no freeze for feature proposal - I think
16:09:39 <rkukura> Sukhdev: Isn’t FPF tomorrow?
16:10:26 <Sukhdev> rkukura: yes - I meant the freeze is for feature proposals, not code
16:11:05 <rkukura> My understanding is FPF is the deadline to submit patch for new features, and FF is the deadline for them to be merged. Is that correct?
16:11:30 <amotoki> rkukura: right
16:12:12 <rkukura> Do we know whether we will see initial patches for the pecan and plugin API BPs by tomorrow?
16:13:20 <amotoki> I have no information.
16:13:30 <rkukura> OK, let's move on
16:13:48 <rkukura> #topic ML2 Drivers decomposition discussion
16:14:23 <sadasu> Need reviewers for https://review.openstack.org/#/c/155436/ please
16:14:28 <shivharis> other than arista, what other drivers are in?
16:14:32 <rkukura> #link https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#how-to
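(The decomposition how-to linked above boils down to leaving a thin shim in the neutron tree that imports the driver from its new out-of-tree package. A minimal sketch of that pattern, with networking_vendor and the class name as placeholders:)

    # Hypothetical in-tree shim left behind by the decomposition; the real
    # driver now lives in an out-of-tree package (networking_vendor is a
    # placeholder name, as is the class).
    from networking_vendor.plugins.ml2 import mech_driver

    class VendorMechanismDriver(mech_driver.VendorMechanismDriver):
        """Thin proxy kept in-tree so existing ml2_conf.ini entries
        pointing at the old class path keep working."""
        pass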
16:14:56 <sadasu> I have marked it as WIP since the corresponding vendor patch is still not in stackforge
16:15:19 <Sukhdev> rkukura: ODL
16:16:33 <Sukhdev> sadasu: it has to pass Cisco CI - that is the requirement
16:16:35 <sadasu> the patches are complete in all other respects…do you want me to lift the WIP tag?
16:17:27 <Sukhdev> sadasu: if the Cisco CI is not voting on it, you will not get any useful reviews :-)
16:17:45 <shivharis> reviewers are going to ask for CI immediately, they look at jenkins followed by CI vote
16:18:26 <Sukhdev> sadasu: If you need help in getting Cisco CI to vote on this patch, ping me, I can help
16:19:01 <amotoki> one question on vendor decomposition. Before and after the decomposition we need different CI setups. Is it better to stop the current CI and set up a new CI for the decomposition?
16:19:11 <Sukhdev> sadasu: If you remove WIP, your patch will start to get attention - so, it is the right thing to do - if you believe it is ready
16:19:20 <sadasu> Sukhdev: thanks! I am in the process of integrating my UCS M specific testbed with the rest of the Cisco CI
16:19:38 <sadasu> other than that I am all set
16:19:52 <amotoki> ... my question is not for sadasu's case. it is about a new mech driver.
16:19:55 <Sukhdev> amotoki: It is a chicken-n-egg problem -
16:19:57 <shivharis> amotoki: yes, that is what i did
16:20:37 <sadasu> amotoki: I believe that is correct, and that is why there is some downtime while moving the testbed over
16:20:39 <shivharis> new mech driver will not have a prior CI (?)
16:20:55 <Sukhdev> amotoki: For CI to pass during the decomposition, you have to manually install the stackforge repo and afterwards you can pull it from pypi
16:21:01 <crimmson_> trying to follow: what do WIP and CI mean, please?
16:21:26 <shivharis> WIP = work in progress; CI = continuous integration
16:21:51 <amotoki> Sukhdev: thanks for the tips. That sounds like a reasonable approach
16:22:43 <Sukhdev> amotoki: without this tip, you will not get core team's approval :-)
16:22:48 <shivharis> Sukhdev: +1
16:23:18 <shivharis> actually, to be precise, uninstall then install
16:24:00 <sadasu> Sukhdev: I would rather get attention now rather than the last day
16:24:01 <Sukhdev> sadasu: I will review your patch - you have to mock most of your stackforge code - will let you know if I see anything funny
16:24:27 <Sukhdev> sadasu: hence, I suggested removing the WIP -
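(On Sukhdev's point about mocking: since the out-of-tree package is not installed in neutron's unit-test environment, its modules are typically stubbed out before anything imports them. A rough sketch, with all module names assumed:)

    import sys
    import unittest
    from unittest import mock

    # The out-of-tree package is not installed in neutron's unit-test venv,
    # so stub its modules before anything imports them (names are assumed).
    sys.modules['networking_vendor'] = mock.MagicMock()
    sys.modules['networking_vendor.plugins'] = mock.MagicMock()
    sys.modules['networking_vendor.plugins.ml2'] = mock.MagicMock()

    class TestVendorShim(unittest.TestCase):
        def test_backend_is_stubbed(self):
            # Attribute access on the stub returns more mocks instead of
            # touching real vendor code, so the shim can be imported safely.
            stub = sys.modules['networking_vendor.plugins.ml2']
            stub.mech_driver.VendorMechanismDriver.assert_not_called()

    if __name__ == '__main__':
        unittest.main()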
16:25:00 <shivharis> Sukhdev: please take a look at my patch-set as well, thanks
16:25:10 <sadasu> Sukhdev: thanks
16:25:13 <Sukhdev> shivharis: link?
16:25:28 <shivharis> I just sent it to you
16:25:47 <Sukhdev> shivharis: cool
16:26:14 <rkukura> any other decomposition questions/issues/comments?
16:26:16 <moshele> who should update https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#how-to, since networking-mlnx is out of tree already?
16:26:50 <Sukhdev> moshele: I would suggest you submit the patch to update it
16:26:56 <moshele> ok
16:27:11 <rkukura> I was going to suggest we all submit patches to https://github.com/openstack/neutron/blob/master/doc/source/devref/contribute.rst#decomposition-progress-chart as we make progress.
16:27:29 <amotoki> it is just a snapshot at a point in time, and more input is really welcome.
16:27:53 <Sukhdev> rkukura: correct - that is the right way to do it - but keep armax in the loop - he is in charge of this
16:28:07 <rkukura> anything else on decomposition?
16:28:19 <Sukhdev> in other words make sure he approves the patch
16:28:35 <rkukura> #topic ML2 Sync and error handling (Task Flow)
16:28:55 <rkukura> manishg: nice to have you here to discuss this
16:29:24 <manishg> have been sick!  much better now :)
16:29:26 <manishg> just wondering if anyone has looked at the patch?
16:29:48 <Sukhdev> manishg: I am guilty as charged :-)
16:30:00 <rkukura> manishg: Its been on my list to review in detail, but have not had a chance yet
16:30:11 <rkukura> I will try today
16:30:13 <Sukhdev> manishg: this time I left a comment :-)
16:30:29 <manishg> As suggested by HenryG/Sukhdev, I added a lot of documentation and updated the code.
16:30:46 <manishg> Hoping that the intention/design is clear, but if it's not, let me know.
16:31:09 <rkukura> manishg: Can you briefly summarize what further work would be required beyond the current patch?
16:31:57 <manishg> rkukura: the current patch gives an idea of the interface with the drivers (which is backward compatible). the next steps would be to make it modify the db to maintain state
16:32:06 <manishg> + making it async
16:32:31 <manishg> right now, the functionality is more or less the same (with refactor - getting ready for next steps)
16:32:45 <rkukura> So for example, a create would then return with the resource in a “creating” state, which would change to “ready” when the async tasks complete?
16:32:58 <manishg> yep
16:33:33 <manishg> we can add a state to the resource and perhaps a separate table to track the progress of various drivers which are
16:33:36 <manishg> registered.
16:33:50 <rkukura> Do you have thoughts on how the resources should behave in these intermediate states? Are new requests blocked/queued or rejected or run in parallel?
16:34:09 <manishg> the other table with drivers only has entries while the state is "creating", e.g.
16:34:42 <manishg> when all of the drivers are done creating, the overall state of the resource can be ACTIVE and the driver table could have entries removed for that resource
16:34:48 <manishg> sounds right?
16:35:14 <manishg> while resource is in intermediate state
16:35:23 <manishg> only certain operations are permitted.
16:35:37 <rkukura> manishg: I think so, although it might be worth keeping track of which drivers are up-to-date rather than which drivers are not
16:35:38 <manishg> and when querying for resources only ACTIVE ones are returned
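(A toy sketch of the bookkeeping manishg is describing, purely to make the model concrete - illustrative names, not the actual patch: the resource carries an overall status, a side table holds one row per driver while work is pending, and the rows are dropped once every driver reports done.)

    # Toy model of the bookkeeping described above; not the actual patch.
    class DriverProgress(object):
        """One row per (resource, driver) while the resource is in transition."""
        def __init__(self, resource_id, driver_name, done=False):
            self.resource_id = resource_id
            self.driver_name = driver_name
            self.done = done

    def maybe_activate(resource, progress_rows):
        """Promote the resource to ACTIVE and drop the per-driver rows once
        every registered driver has finished its async work."""
        if progress_rows and all(row.done for row in progress_rows):
            resource['status'] = 'ACTIVE'
            del progress_rows[:]  # rows are removed once the resource is ACTIVE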
16:35:44 <Sukhdev> manishg: During the Paris summit we discussed how long to keep the resource in an intermediate state, say if one of the MDs is misbehaving?
16:36:15 <rkukura> not sure hiding the resource makes sense
16:36:29 <rkukura> would seem a create followed by a get should always work
16:36:36 <manishg> rkukura: I was thinking we maintain all driver states for the resources so that you know the complete state... until the transition happens to ACTIVE
16:37:09 <rkukura> manishg: I agree the plugin should keep track of the driver states and reflect these in the overall state
16:37:11 <manishg> rkukura: the hiding part is only default behaviour and one can request "all" resources.  The reason for this
16:37:29 <Sukhdev> manishg: At what stage will we declare an operation failed? Or keep in intermediate state forever?
16:37:32 <manishg> is to maintain compatibility with current behaviour
16:37:53 <amotoki> is there any reason we cannot return resources with non-ACTIVE status?
16:38:26 <amotoki> I think we can return an error response for operations on a resource in creating status.
16:38:28 <rkukura> manishg: I’d think a lot of scripts and tools expect the resource to exist right after a create that didn’t fail
16:38:29 <Sukhdev> amotoki: non-ACTIVE is equivalent to "creating"
16:38:31 <manishg> rkukura, amotoki: today, we create and that succeeds only when resource is ACTIVE.  so create + get = resource in ACTIVE state.  and callers use it that way
16:38:59 <manishg> we can have both options.  but not sure what we want the default to be
16:39:07 <manishg> I'd imagine default to return only ACTIVE.
16:39:09 <amotoki> Sukhdev: non-ACTIVE is the same as "creating" when creating a resource.
16:39:27 <shivharis> when you do a create you get a net uuid, but when you do a 'normal' get you don't see it?
16:39:45 <rkukura> Is it reasonable to consider this driver state as trailing the definitive DB state, hopefully eventually becoming consistent?
16:39:47 <manishg> Sukhdev: about how long.... that depends on the drivers.  And when they give up...
16:40:17 <manishg> amotoki: non-ACTIVE = "creating"  or "error" state (if drivers are able to declare it such due to certain condition)
16:40:21 <rkukura> And if so, why not allow API operations while in these intermediate states?
16:40:56 <rkukura> So the DB operations occur in sequence, building up a queue of tasks that need to complete until all the drivers are in sync
16:41:18 <manishg> rkukura: in intermediate states certain operations would make sense.  for example in "creating" state one can delete
16:41:21 <amotoki> pending state (*-ing) in my mind. non-ACTIVE is misleading....
16:41:31 <rkukura> And then when nova for instance needs to wait for a port to be ready to plug, it would make sure there is no backlog of driver activity at that point
16:42:34 <manishg> rkukura: it will be simpler to not queue up lots of operations because then we run the risk of either uselessly doing operations (e.g. multiple updates) or will need to prune the queue (either ways it will be more complex)
16:43:03 <rkukura> manishg: I agree it could get complicated. But we do need to keep it usable.
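(One way to picture the "certain operations are permitted" idea from above, purely as an illustration - the exact policy was still an open question at this point in the discussion:)

    # Illustrative per-state operation policy only; the exact rules were an
    # open question in this discussion.
    ALLOWED_OPS = {
        'CREATING': {'get', 'delete'},            # e.g. delete is OK mid-create
        'ACTIVE': {'get', 'update', 'delete'},
        'DELETING': {'get'},
    }

    def check_operation(status, op):
        if op not in ALLOWED_OPS.get(status, set()):
            raise ValueError('%s not permitted while resource is %s'
                             % (op, status))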
16:43:37 <Sukhdev> rkukura: you bring up a good point
16:43:39 <manishg> rkukura: isn't nova doing the same?
16:44:10 <Sukhdev> I think nova uses intermediate states as well - so, we may be OK
16:44:29 <Sukhdev> It may be worthwhile to look into this
16:44:34 <manishg> Sukhdev: I think rkukura's question is not about the states but operations that are permitted in those states.
16:44:48 <rkukura> We’ve got a few more agenda items and limited time. I suggest we continue this discussion of the next steps next week, hopefully after we’ve all reviewed manishg’s current patch in detail.
16:44:52 <manishg> so he doesn't want the caller to bother about the state.
16:45:21 <manishg> rkukura: sure, but the more people look at the patch and comment, the sooner I can proceed to the next step on the states.
16:45:51 <manishg> The patch's been out for two weeks or so.  The sooner I get feedback, the faster we can march.
16:45:52 <rkukura> manishg: exactly - lets review this first phase, and discuss the next phase
16:45:59 <manishg> cool.  thanks.
16:46:02 <Sukhdev> manishg, rkukura: I have one question before you move to the next topic
16:46:11 <rkukura> Sukhdev: go ahead
16:46:53 <Sukhdev> Say if the resource is in Active state (all MD's are in sync) - can one MD then move the state of the resource into "..ing" state?
16:47:20 <manishg> what event would cause that?
16:47:38 <Sukhdev> I ask this in the context of the driver's back-end resetting and needing to bring everything back in sync - and we do not want to allow additional operations until all is in sync
16:47:40 <rkukura> Is this like if something breaks and the driver detects it?
16:48:08 <manishg> if it's ACTIVE, delete API could cause it to go to DELETing but after ACTIVE, going back to CREATing would be weird.
16:48:10 <Sukhdev> rkukura: yes
16:48:36 <Sukhdev> manishg: I would want to move from ACTIVE - to creating
16:48:52 <rkukura> Maybe these should be special intermediate states, not CREATING or DELETING.
16:49:14 <Sukhdev> manishg: if my back-end resets and needs to rebuild the state - I do not want any operations (e.g. update) performed on it while I am rebuilding my state
16:50:11 <Sukhdev> manishg: This is one of the use cases I was thinking of when I was reviewing your patch
16:50:42 <manishg> Sukhdev: I'm not really sure if we want to feed in events from devices back in the db... but I'll think about it.  we can discuss more next time.
16:51:05 <manishg> if we do, there will be lots of such cases.  what if a certain segment is down (data-path)...
16:51:25 <Sukhdev> manishg: yes, let's think about it - as I see this use case all the time - when HW resets :-)
16:51:27 <manishg> do we want to update the db saying the segment is down and prevent selection of that network for new VMs?
16:51:29 <rkukura> I’d also like to avoid complicating the client view - they shouldn’t need to worry about whether an operation can be accepted or whether they have to wait before retrying
16:51:43 <shivharis> Sukhdev: if you are going to put the network in "creating" you will also have to do the same to the respective ports
16:52:12 <Sukhdev> shivharis: I am more or less thinking about ports (was not thinking about networks)
16:52:38 <shivharis> Sukhdev: why not?
16:52:53 <manishg> rkukura: I think building an async API while client isn't looking at the states will surely introduce complications.  But that is another topic we need to discuss more I think.
16:52:59 <Sukhdev> In general if I have 10K VMs running on 4 networks - the ports become critical
16:53:46 <shivharis> Sukhdev: in the interest of getting something done, I suggest simplifying the first cut
16:54:11 <shivharis> we can add HA/audit like functionality later
16:54:16 <manishg> shivharis: +1  (Sukhdev: what do we do now?)
16:54:24 <rkukura> The definitive state is the DB. It seems to me we should allow all API operations to proceed moving the DB from one state to the next, and come up with a solution where the backend state eventually converges, and there is visibility when needed of whether it has or hasn’t converged.
16:54:28 <Sukhdev> I do not know at what scale you are testing your MDs, but I am pushing beyond 20K VMs - these kinds of issues pop up, especially if you reset HW in the middle (or switch over)
16:54:42 <Sukhdev> I am hoping this new mechanism can help all MDs
16:55:03 <manishg> rkukura: agree.  will discuss more.
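(The eventual-convergence view rkukura outlines above could be sketched like this - a hypothetical helper, all names assumed: the DB revision is authoritative, each driver records the last revision it has applied, and "converged" just means every driver has caught up.)

    # Hypothetical helper for the convergence view sketched above: the DB
    # revision is authoritative and each driver records the last revision
    # it has applied to the backend.
    def is_converged(db_revision, driver_revisions):
        """driver_revisions maps driver name -> last revision applied."""
        return all(rev == db_revision for rev in driver_revisions.values())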
16:55:14 <rkukura> OK, let's move on, and continue this very useful discussion next week
16:55:23 <manishg> Sukhdev: what do you do now?
16:55:28 <rkukura> #topic portsecurity patches
16:55:30 <manishg> rkukura: sure.
16:55:38 <yamahata> hi
16:55:50 <rkukura> #link https://review.openstack.org/#/c/126552/
16:56:01 <rkukura> #link https://review.openstack.org/#/c/160051/
16:56:03 <Sukhdev> manishg: we have a very elaborate Sync mechanism - I was hoping to remove that and use taskflow - :-)
16:56:07 <rkukura> yamahata: go ahead...
16:56:24 <yamahata> 126552 is a generic framework for functional tests of the iptables firewall.
16:56:40 <yamahata> So I'd like to merge it first so that we can add tests independently.
16:56:55 <yamahata> the second one is 160051. it has db issue.
16:57:06 <yamahata> yalie1: can you explain it?
16:57:36 <manishg> Sukhdev: I'll look at your driver.  you update the db state?  Let's chat offline since folks are discussing another topic now.
16:57:43 <yalie1> y, 160051 is the major patch to implement the extension driver of port-sec
16:58:22 <yalie1> it's almost done, but missing the functional test cases.
16:58:59 <shwetaap> it's the other way around: 126552 is the port-sec patch and 160051 is the functional test patch.
16:59:11 <Sukhdev> manishg: no, I do not - but I considered it, especially when running 3 or 4 Neutron servers all pushing 20K+ VMs and forcing switchovers in HW
16:59:29 <rkukura> we are down to a minute - anything blocking progress?
17:00:06 <shwetaap> hopefully more reviews on the portsecurity patches.
17:00:10 <rkukura> Let's review those patches!
17:00:23 <rkukura> shivharis: Any bugs we need to look at this week?
17:00:27 <Sukhdev> shwetaap: I will review later today
17:00:28 <shivharis> FYI: I am pushing this bug fix for K3, https://bugs.launchpad.net/neutron/+bug/1328991
17:00:29 <openstack> Launchpad bug 1328991 in neutron "External network should not have provider:network_type as vxlan" [Medium,In progress] - Assigned to Aman Kumar (amank)
17:01:01 <shivharis> I will ping the owner - it is close to a fix and useful to have
17:01:19 <rkukura> I’m making forward progress on the DVR schema/logic cleanup - hopefully a patch soon
17:01:44 <rkukura> We are out of time - anything else urgent?
17:01:47 <shivharis> K3 is Mar 19 - we need all high-prio bugs fixed
17:02:09 <rkukura> shivharis: right
17:02:12 <rkukura> Thanks everyone!
17:02:14 <manishg> thanks.  bye.
17:02:17 <sadasu> thanks!!
17:02:17 <rkukura> #endmeeting