#openstack-meeting-alt log

16:04:06 <rkukura> #startmeeting networking_ml2
16:04:07 <openstack> Meeting started Wed Dec 18 16:04:06 2013 UTC and is due to finish in 60 minutes.  The chair is rkukura. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:04:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:11 <openstack> The meeting name has been set to 'networking_ml2'
16:04:22 <rkukura> #link https://wiki.openstack.org/wiki/Meetings/ML2 Agenda
16:04:44 <rkukura> #topic Action Items From Last Week
16:04:56 <asadoughi> hi
16:05:08 <asadoughi> so, last week i had two action items, first being running unit tests
16:05:41 <asadoughi> i posted my results to the ML, but with no interpretation of the results since i wasn't sure what we are aiming for besides more coverage which would be represented by the % field
16:05:49 <rkukura> asadoughi: Yes, and see how our coverage looks
16:05:51 <asadoughi> http://paste.openstack.org/show/54845/
16:06:16 <asadoughi> ^ that was for ml2 tests and this is the agent unit tests: http://lists.openstack.org/pipermail/openstack-dev/2013-December/022146.html
16:07:51 <Sukhdev> Hi
16:07:58 <asadoughi> yeah, i have nothing more to add about unit tests, rkukura?
16:08:10 <rkukura> Sukhdev: hi - just discussing the tox coverage action item
16:08:44 <rkukura> asadoughi: nothing in the plugin jumps out, but there is room for improvement
16:09:11 <asadoughi> definitely
16:09:35 <rkukura> particularly in the type drivers
16:10:20 <asadoughi> agents could use some improvement and i will be working on the ovs_neutron_agent going forward
16:10:25 <rkukura> I'm kind of surprised rpc.py is at 85%
16:10:47 <asadoughi> higher than expected?
16:10:52 <rkukura> agent unit tests are really important as we do all the bug fixing and refactoring for stability
16:11:26 <rkukura> I didn't think the RPC handlers (like get_device_details) were covered in the unit tests
16:11:55 <asadoughi> ah, well, i'm not sure how much of the unit tests are actually unit tests if you catch my drift
16:11:56 <rkukura> Does anyone know if there is a way to get annotated source showing line-by-line coverage?
16:12:21 <rkukura> Or at least method-level coverage rather than just file-level
16:12:49 <asadoughi> not sure, but i can look into if it's possible or not
16:13:07 <rkukura> asadoughi: I think so, but I did not expect real RPCs between agent and server to be executed in the unit test run, right?
16:13:49 <asadoughi> right
16:13:57 <rkukura> #action: asadoughi to look into if line-level or function-level coverage data is available
16:14:12 <dane_> Don't know about 'tox -e cover', but 'run_tests.sh --coverage' will give line-by-line coverage in HTML format
16:14:16 <kblin> sorry for barging in, but if you're testing classic python code, the python "coverage" tool can generate line-by-line coverage output
16:14:52 <rkukura> dane_, kblin: Thanks.
16:15:26 <rkukura> anything else on the current coverage data asadoughi provided, or next steps?
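[editor's note] The question above about line-by-line coverage can be illustrated with a small sketch. It does not use the "coverage" tool kblin mentions; it is a hand-rolled stand-in built on the standard library's sys.settrace, meant only to show what line-level data looks like. The helper executed_line_offsets and the sample function classify are hypothetical names invented for this illustration.

```python
import sys


def executed_line_offsets(func, *args, **kwargs):
    """Run func and return the executed lines as offsets from its def line."""
    code = func.__code__
    hits = set()

    def local_tracer(frame, event, arg):
        # 'line' events fire once per source line about to be executed.
        if event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames that run the function under test.
        if event == "call" and frame.f_code is code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return hits


def classify(n):                # offset 0
    if n < 0:                   # offset 1
        return "negative"       # offset 2
    return "non-negative"       # offset 3


# With a non-negative input, the "negative" branch (offset 2) never runs.
covered = executed_line_offsets(classify, 5)
```

A real run against the ML2 tree would rely on the coverage tooling named in the log; this sketch only shows why file-level percentages can hide an untested branch.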
16:16:12 <rkukura> if not, let's move on to: asadoughi to discuss ovs-firewall-driver on email list and schedule IRC meeting on ML2+SG
16:17:31 <asadoughi> i held a meeting monday at 2000 utc; all of those who voiced their opinion agreed about what i later posted to the ML for wider discussion: adding --source-port-range-min and --source-port-range-max to security groups API
16:17:44 <asadoughi> http://lists.openstack.org/pipermail/openstack-dev/2013-December/022518.html or https://wiki.openstack.org/wiki/Neutron/blueprint_ovs-firewall-driver#Security_groups_extension_API_addition_discussion
16:18:18 <asadoughi> did anyone have a chance to read the e-mail? the tl;dr is: To implement a performant OVS-based security groups solution in Neutron today, source port matching is a required addition to the security groups extension API.
16:20:31 <rkukura> asadoughi: I had not seen the email, but looks like there is a concrete recommendation
16:21:40 <asadoughi> this week, i will continue with the reviews i have already uploaded and continue with baseline work to get the agent compatible with the firewall
16:22:13 <asadoughi> that is all for now. any feedback is appreciated. thanks.
16:22:42 <rkukura> asadoughi: thanks!
16:22:55 <rkukura> any other comments on the ovs-firewall-driver?
16:23:03 <asadoughi> https://review.openstack.org/#/c/62129/ https://review.openstack.org/#/c/62130/
16:24:18 <asadoughi> just mentioning the links for completeness if anyone had questions
16:24:47 <rkukura> asadoughi: I'm not seeing any review comments - how long until you feel you can remove the WIP?
16:25:13 <asadoughi> work should be completed this week on those reviews
16:25:43 <rkukura> Is the source port support being added to the iptables driver as well?
16:25:58 <asadoughi> it already has it, will add tests though
16:26:13 <rkukura> great
16:26:16 <asadoughi> (source port is already in the rpc api)
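[editor's note] A minimal sketch of what source port matching on a security group rule could look like. The fields port_range_min and port_range_max mirror the existing destination port range in the security groups API; source_port_range_min and source_port_range_max are assumed names derived from the CLI flags proposed above. The Rule class and the matches function are hypothetical and are not the driver's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    protocol: str
    # Destination port range, as in the existing security groups API.
    port_range_min: Optional[int] = None
    port_range_max: Optional[int] = None
    # Proposed addition (names assumed from the CLI flags in the log).
    source_port_range_min: Optional[int] = None
    source_port_range_max: Optional[int] = None


def _in_range(port, low, high):
    """A range with either bound unset is treated as 'match any port'."""
    if low is None or high is None:
        return True
    return low <= port <= high


def matches(rule, protocol, src_port, dst_port):
    """Return True if a packet's protocol and ports satisfy the rule."""
    return (
        rule.protocol == protocol
        and _in_range(src_port, rule.source_port_range_min,
                      rule.source_port_range_max)
        and _in_range(dst_port, rule.port_range_min, rule.port_range_max)
    )


# Example: allow DHCP replies, which go from server port 67 to client port 68.
dhcp_reply = Rule("udp", port_range_min=68, port_range_max=68,
                  source_port_range_min=67, source_port_range_max=67)
```

Without the source range, the rule above would accept any UDP packet addressed to port 68, whatever port it came from.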
16:27:20 <rkukura> #topic MechanismDriver API delete method ordering and the relationship between DB state and external mechanism state managed by the driver
16:27:50 <rkukura> rcurran: Would you like to summarize where we are on this?
16:28:07 <rcurran> ummm, lots of good email :-)
16:28:20 <rcurran> two solutions have been proposed
16:28:33 <rcurran> 1. reverse order of deletes to match create
16:28:37 <rkukura> rcurran: Agreed, but we need to move the email discussion back onto openstack-dev
16:28:50 <rcurran> 2. save off bound_segment to another object for use w/ deletes
16:29:46 <Sukhdev> I believe both solutions are orthogonal
16:30:11 <Sukhdev> perhaps need both of them :-)
16:30:26 <rkukura> rcurran: Isn't #2 more about making the previous_bound segments available in update_port_postcommit, and making sure that's called while unbinding before deleting?
16:31:45 <rkukura> rcurran, Sukhdev: There seems to be some fundamental question about the relationship between the DB state and the external state (device state managed by drivers), and how these are kept in sync
16:31:57 <rcurran> perhaps ... today most mech drivers only take actions (in this area) on delete_port
16:33:45 <rkukura> rcurran: That may work in many cases, but what about things like VM migration where a port gets unbound and then rebound?
16:33:48 <Sukhdev> rkukura: I saw your latest email, agree to most of what you say and replied to it.
16:34:45 <rkukura> I'm not sure whether we should try to continue/wrap-up that discussion here, or move the email discussion to openstack-dev and continue there
16:34:54 <rcurran> vm migration is one of the things i need to port to the cisco_nexus md, and yes this is where i'll need more info on bound_segment info
16:35:45 <rkukura> So we have several options on how to make bound_segment or previous_bound_segment available in the existing method calls
16:36:15 <rkukura> we also seem to realize there are race conditions and failure modes where external state can diverge from the DB state
16:37:59 <rkukura> To me, the biggest question is whether we want loose synchronization with eventual consistency, or need tighter transactional synchronization between the DB state and the external state
16:38:39 <Sukhdev> I think keeping loose coupling is a better model
16:39:30 <rcurran> sukhdev: agreed
16:39:32 <rkukura> If we stick with loose synchronization, what do we need to do to ensure eventual consistency?
16:39:45 <Sukhdev> However, the split of pre/post gives the perception that the framework is proposing a tight synchronization model
16:40:38 <rkukura> Sukhdev: Maybe the names are unfortunate, but the precommit methods are the ones that are part of the DB transactions, and are therefore the only tightly synchronized part
16:41:31 <rcurran> creating delete functions is always tricky: on a failure do you try to re-create what was (potentially) just deleted, or make a "best effort" at deleting everything and then regardless tell the calling function about the exception?
16:42:09 <rkukura> rcurran: It's definitely easier to recover from a failed create by doing a delete than trying to undo a failed delete
16:42:10 <Sukhdev> In that case, what should be the behaviour of ML2 framework when a post operation has failed?
16:42:20 <rkukura> We also need to think about update failures
16:43:09 <rkukura> One approach is to really only guarantee the transactional part, and come up with some way to achieve eventual consistency for the external state
16:43:37 <Sukhdev> rkukura: I like that model
16:43:48 <rcurran> rkukura: agree about creates being the easier to solve and for updates we can consider (and andre left this as a TODO) on failure use the org_dict info to put the info back to the way it was
16:45:54 <Sukhdev> I have a crazy thought - what if we used pre operations as golden and do not act on the failures of post operations and send a notification to the north bound APIs that devices may be out of sync and will require operator intervention
16:45:57 <rkukura> Would some sort of periodic re-syncing of the DB state with all the drivers be sufficient and scalable?
16:45:57 <rcurran> but as sukhdev has stated these are similar but orthogonal types of conversations
16:47:07 <rkukura> Sukhdev: So failures in postcommit driver methods would flag the resource to be resynced later?
16:47:25 <rcurran> sukhdev: agreed - i think for all types of failures (create, update, delete - even though on create we "clean up" by deleting) that the calling function would get an exception
16:47:30 <Sukhdev> rkukura: If you look at Arista Driver's sync mechanism, it precisely does that - periodically sync the state between DB and back-end....it assumes DB as "true source"
16:48:20 <Sukhdev> rkukura: yes
16:48:25 <rkukura> Sukhdev: So maybe that sync mechanism or something similar could get promoted to the plugin, and the cost amortized across all the drivers?
16:49:38 <rkukura> rcurran: If we go with loose synchronization with some mechanism in the plugin to ensure eventual consistency, do we still have a reason to reverse the precommit/postcommit methods for delete operations?
16:49:48 <Sukhdev> rkukura: it may be a very heavy handed operation :-)
16:50:42 <rkukura> Sukhdev: That is a concern, and maybe it could be optimized by flagging individual resources for re-synch or something like that
16:50:51 <rcurran> rkukura: yes. port.bound_segment would not be available on delete_port_postcommit()
16:51:23 <rkukura> Sukhdev: Let's flesh out the sync options/details on openstack-dev
16:51:57 <Sukhdev> rkukura: yes, we can improve upon the implementation - but, overall, this is a great way to go....
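[editor's note] The loose-synchronization idea discussed above (treat the DB as the source of truth, flag resources whose postcommit step failed, and resync only those later) can be sketched roughly as follows. Every name here (FlakyBackend, Plugin, needs_resync, resync) is hypothetical and chosen for illustration; this is neither the Arista driver's mechanism nor ML2 code.

```python
class FlakyBackend:
    """Stand-in for an external device; fails its first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.state = {}

    def apply(self, port_id, port):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("backend unreachable")
        if port is None:
            self.state.pop(port_id, None)   # port no longer exists in the DB
        else:
            self.state[port_id] = dict(port)


class Plugin:
    def __init__(self, backend):
        self.db = {}                # source of truth
        self.backend = backend
        self.needs_resync = set()   # ports whose external state may be stale

    def _postcommit(self, port_id):
        # A postcommit failure does not roll back the DB; it only flags
        # the resource so a later resync can repair the external state.
        try:
            self.backend.apply(port_id, self.db.get(port_id))
        except Exception:
            self.needs_resync.add(port_id)

    def create_port(self, port_id, port):
        self.db[port_id] = dict(port)       # the "committed" DB state
        self._postcommit(port_id)

    def delete_port(self, port_id):
        self.db.pop(port_id, None)
        self._postcommit(port_id)

    def resync(self):
        """Replay current DB state for flagged ports; keep the ones that fail."""
        for port_id in sorted(self.needs_resync):
            try:
                self.backend.apply(port_id, self.db.get(port_id))
            except Exception:
                continue
            self.needs_resync.discard(port_id)


backend = FlakyBackend(failures=1)
plugin = Plugin(backend)
plugin.create_port("p1", {"vlan": 100})   # postcommit fails, port is flagged
plugin.resync()                           # periodic task repairs the backend
```

Flagging individual resources keeps the periodic pass proportional to the number of failures rather than to the size of the DB, which is the optimization suggested in the log.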
16:52:30 <rkukura> rcurran: So why not have the unbinding phase occur before delete_precommit, and result in update_precommit/update_postcommit calls on the driver where previous_bound_segment is available in PortContext?
16:53:53 <rcurran> i think we need a previous_bound_segment for other reasons. if we want delete_port_xxxcommit() to access the previous_bound_segment for retrieving the vlan then i can live w/ that
16:54:11 <rkukura> rcurran: That way unbinding looks the same to the drivers regardless of whether it's part of a delete, a migration, or something else (agent failure)
16:54:52 <rkukura> rcurran: What about doing the unbind 1st as an update before calling delete_port_precommit on the drivers?
16:55:29 <rkukura> Or do we really want the unbinding to be the same transaction as the deleting?
16:55:48 <rcurran> right now, the cisco_nexus driver (and i think all others) are taking no actions on port_bind and unbind ... only deletes
16:56:40 <rkukura> rcurran: Yes, and that is because port_bind and port_unbind are only called on a subset of the drivers
16:57:21 <rkukura> rcurran: I'm suggesting unbinding is really an update with port_update_precommit and port_update_postcommit calls on all registered mechanism drivers
16:57:31 <Sukhdev> True for Arista - we do not take any action on bind/unbind - only delete
16:58:16 <rcurran> and will previous_bound_segment still be available (and accurate as the last vlan used) on delete_port()?
16:58:47 <rkukura> There is also a TODO related to compound bindings or something like that, which maybe is really what we need so that ToR switch drivers are explicitly part of a binding that is based on an L2 agent's driver.
16:59:11 <asadoughi> sorry, we're running out of time, had a quick question: next week is christmas and the week after is new year's day. are we holding any meetings before January 8?
17:00:10 <rkukura> rcurran: If we do treat the implicit unbind as a separate transaction, I don't think the previous info from the unbind transaction would be visible in the delete precommit/postcommit calls
17:00:48 <rkukura> asadoughi: mestery proposed cancelling the next two - I'm OK either way
17:01:20 <asadoughi> rkukura: +1 on cancel, i'll be out on vacation after this week until 2014
17:01:27 <rcurran> then that would cause a re-write for my md ... basically i'd want to remove the nexus switch info on port_unbind(). not undoable but not how things have been working
17:02:22 <rkukura> rcurran: I think you'd need to remove the switch info in port_update_postcommit unless we did compound-binding
17:02:51 <rkukura> Let's move this discussion to openstack-dev
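[editor's note] A rough sketch of the "unbind is just an update" proposal from this topic: the port is unbound through an update call in which previous_bound_segment is visible, and only then deleted. The PortContext below is a simplified, hypothetical stand-in, not the ML2 class. The method names follow the update_port_postcommit / delete_port_postcommit form rather than the port_update_* spelling used in the chat, and the helpers unbind and delete_port are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortContext:
    """Hypothetical, minimal stand-in for the context passed to drivers."""
    port_id: str
    bound_segment: Optional[dict] = None
    previous_bound_segment: Optional[dict] = None


class SwitchDriver:
    """Toy mechanism driver that tracks which VLAN is configured per port."""

    def __init__(self):
        self.configured = {}   # port_id -> VLAN id on the external switch

    def update_port_postcommit(self, context):
        prev, cur = context.previous_bound_segment, context.bound_segment
        if prev is not None and prev != cur:
            # The old binding went away: remove its VLAN from the switch.
            self.configured.pop(context.port_id, None)
        if cur is not None:
            self.configured[context.port_id] = cur["segmentation_id"]

    def delete_port_postcommit(self, context):
        # By now the port is already unbound, so no segment info is needed.
        self.configured.pop(context.port_id, None)


def unbind(driver, context):
    """Unbinding expressed as an update, whatever triggered it."""
    context.previous_bound_segment = context.bound_segment
    context.bound_segment = None
    driver.update_port_postcommit(context)


def delete_port(driver, context):
    """Delete = unbind first (as an update), then the actual delete."""
    if context.bound_segment is not None:
        unbind(driver, context)
    driver.delete_port_postcommit(context)


driver = SwitchDriver()
ctx = PortContext("p1", bound_segment={"segmentation_id": 100})
driver.update_port_postcommit(ctx)   # initial binding configures VLAN 100
delete_port(driver, ctx)             # cleanup happens in the unbind update
```

Under this shape a migration would call unbind and then rebind through the same update path, so the driver needs no delete-specific access to the old segment.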
17:03:15 <rkukura> OK, we'll cancel the next two regular meetings
17:03:38 <rkukura> Enjoy the holidays and/or time off everyone!
17:03:47 <rkukura> #endmeeting