16:04:06 #startmeeting networking_ml2
16:04:07 Meeting started Wed Dec 18 16:04:06 2013 UTC and is due to finish in 60 minutes. The chair is rkukura. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:04:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:11 The meeting name has been set to 'networking_ml2'
16:04:22 #link https://wiki.openstack.org/wiki/Meetings/ML2 Agenda
16:04:44 #topic Action Items From Last Week
16:04:56 hi
16:05:08 so, last week i had two action items, first being running unit tests
16:05:41 i posted my results to the ML, but with no interpretation of the results since i wasn't sure what we are aiming for besides more coverage, which would be represented by the % field
16:05:49 asadoughi: Yes, and see how our coverage looks
16:05:51 http://paste.openstack.org/show/54845/
16:06:16 ^ that was for ml2 tests and this is the agent unit tests: http://lists.openstack.org/pipermail/openstack-dev/2013-December/022146.html
16:07:51 Hi
16:07:58 yeah, i have nothing more to add about unit tests, rkukura?
16:08:10 Sukhdev: hi - just discussing the tox coverage action item
16:08:44 asadoughi: nothing in the plugin jumps out, but there is room for improvement
16:09:11 definitely
16:09:35 particularly in the type drivers
16:10:20 agents could use some improvement and i will be working on the ovs_neutron_agent going forward
16:10:25 I'm kind of surprised rpc.py is at 85%
16:10:47 higher than expected?
16:10:52 agent unit tests are really important as we do all the bug fixing and refactoring for stability
16:11:26 I didn't think the RPC handlers (like get_device_details) were covered in the unit tests
16:11:55 ah, well, i'm not sure how much of the unit tests are actually unit tests if you catch my drift
16:11:56 Does anyone know if there is a way to get annotated source showing line-by-line coverage?
16:12:21 Or at least method-level coverage rather than just file-level
16:12:49 not sure, but i can look into whether it's possible or not
16:13:07 asadoughi: I think so, but I did not expect real RPCs between agent and server to be executed in the unit test run, right?
16:13:49 right
16:13:57 #action: asadoughi to look into whether line-level or function-level coverage data is available
16:14:12 Don't know about 'tox -e cover', but 'run_tests.sh --coverage' will give line-by-line coverage in HTML format
16:14:16 sorry for barging in, but if you're testing classic python code, the python "coverage" tool can generate line-by-line coverage output
16:14:52 dane_, kblin: Thanks.
16:15:26 anything else on the current coverage data asadoughi provided, or next steps?
16:16:12 if not, let's move on to: asadoughi to discuss ovs-firewall-driver on email list and schedule IRC meeting on ML2+SG
16:17:31 i held a meeting monday at 2000 utc; all of those who voiced their opinion agreed about what i later posted to the ML for wider discussion: adding --source-port-range-min and --source-port-range-max to security groups API
16:17:44 http://lists.openstack.org/pipermail/openstack-dev/2013-December/022518.html or https://wiki.openstack.org/wiki/Neutron/blueprint_ovs-firewall-driver#Security_groups_extension_API_addition_discussion
16:18:18 did anyone have a chance to read the e-mail? the tl;dr is: to implement a performant OVS-based security groups solution in Neutron today, source port matching is a required addition to the security groups extension API.
16:20:31 asadoughi: I had not seen the email, but looks like there is a concrete recommendation
16:21:40 this week, i will continue with the reviews i have already uploaded and continue with baseline work to get the agent compatible with the firewall
16:22:13 that is all for now. any feedback is appreciated. thanks.
16:22:42 asadoughi: thanks!
16:22:55 any other comments on the ovs-firewall-driver?
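[Editor's sketch, not part of the log] To make the proposal above concrete, here is a minimal Python illustration of what matching on a source port range, in addition to the existing destination port range, could look like. The `Rule` class, its field names, and the `matches` helper are hypothetical and chosen only to mirror the proposed `--source-port-range-min` / `--source-port-range-max` options; this is not the Neutron security groups API, and the DNS rule is an invented example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Simplified rule; all field names here are illustrative assumptions."""
    protocol: str
    # Destination port range (already expressible today, per the discussion).
    port_range_min: Optional[int] = None
    port_range_max: Optional[int] = None
    # Proposed addition: a source port range.
    source_port_range_min: Optional[int] = None
    source_port_range_max: Optional[int] = None


def _in_range(port: int, low: Optional[int], high: Optional[int]) -> bool:
    # An unset bound means "no restriction" on that side.
    return (low is None or port >= low) and (high is None or port <= high)


def matches(rule: Rule, protocol: str, src_port: int, dst_port: int) -> bool:
    """Return True if the packet's protocol and both ports satisfy the rule."""
    return (
        rule.protocol == protocol
        and _in_range(dst_port, rule.port_range_min, rule.port_range_max)
        and _in_range(src_port, rule.source_port_range_min,
                      rule.source_port_range_max)
    )


# Hypothetical example: admit UDP traffic whose source port is 53.
dns_replies = Rule(protocol="udp",
                   source_port_range_min=53, source_port_range_max=53)
```

With this rule, `matches(dns_replies, "udp", 53, 40000)` is true, while the same packet with source port 5353 is rejected.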
16:23:03 https://review.openstack.org/#/c/62129/ https://review.openstack.org/#/c/62130/
16:24:18 just mentioning the links for completeness if anyone had questions
16:24:47 asadoughi: I'm not seeing any review comments - how long until you feel you can remove the WIP?
16:25:13 work should be completed this week on those reviews
16:25:43 Is the source port support being added to the iptables driver as well?
16:25:58 it already has it, will add tests though
16:26:13 great
16:26:16 (source port is already in the rpc api)
16:27:20 #topic MechanismDriver API delete method ordering and the relationship between DB state and external mechanism state managed by the driver
16:27:50 rcurran: Would you like to summarize where we are on this?
16:28:07 ummm, lots of good email :-)
16:28:20 two solutions have been proposed
16:28:33 1. reverse order of deletes to match create
16:28:37 rcurran: Agreed, but we need to move the email discussion back onto openstack-dev
16:28:50 2. save off bound_segment to another object for use w/ deletes
16:29:46 I believe both solutions are orthogonal
16:30:11 perhaps need both of them :-)
16:30:26 rcurran: Isn't #2 more about making the previous_bound_segment available in update_port_postcommit, and making sure that's called while unbinding before deleting?
16:31:45 rcurran, Sukhdev: There seems to be some fundamental question about the relationship between the DB state and the external state (device state managed by drivers), and how these are kept in sync
16:31:57 perhaps ... today most mech drivers only take actions (in this area) on delete_port
16:33:45 rcurran: That may work in many cases, but what about things like VM migration where a port gets unbound and then rebound?
16:33:48 rkukura: I saw your latest email, agree with most of what you say and replied to it.
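[Editor's sketch, not part of the log] The idea raised above, that unbinding runs as an update before the delete so drivers can see the previously bound segment, might look roughly like the following. `PortContext` here is a simplified stand-in carrying only the attributes named in the discussion (`bound_segment`, `previous_bound_segment`); `RecordingDriver`, the `delete_port` helper, and the segment dictionary layout are hypothetical and do not reflect the real ML2 plugin code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PortContext:
    """Simplified stand-in for the context object passed to drivers."""
    port_id: str
    bound_segment: Optional[Dict] = None
    previous_bound_segment: Optional[Dict] = None


class RecordingDriver:
    """Hypothetical mechanism driver that records what it would do."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, object]] = []

    def update_port_postcommit(self, ctx: PortContext) -> None:
        # An unbind shows up as: nothing bound now, something bound before.
        if ctx.bound_segment is None and ctx.previous_bound_segment:
            vlan = ctx.previous_bound_segment["segmentation_id"]
            self.calls.append(("release_vlan", vlan))

    def delete_port_postcommit(self, ctx: PortContext) -> None:
        self.calls.append(("delete", ctx.port_id))


def delete_port(port_id: str, bound_segment: Optional[Dict],
                drivers: List[RecordingDriver]) -> None:
    """Unbind first (as an update), then delete."""
    if bound_segment is not None:
        unbind_ctx = PortContext(port_id, bound_segment=None,
                                 previous_bound_segment=bound_segment)
        for driver in drivers:
            driver.update_port_postcommit(unbind_ctx)
    # By the time delete runs, the port is already unbound.
    delete_ctx = PortContext(port_id)
    for driver in drivers:
        driver.delete_port_postcommit(delete_ctx)
```

Under this ordering a driver sees the same unbind event whether it comes from a delete or from a migration, which is the property being argued for in the discussion.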
16:34:45 I'm not sure whether we should try to continue/wrap-up that discussion here, or move the email discussion to openstack-dev and continue there
16:34:54 vm migration is one of the things i need to port to the cisco_nexus md, and yes this is where i'll need more info on bound_segment info
16:35:45 So we have several options on how to make bound_segment or previous_bound_segment available in the existing method calls
16:36:15 we also seem to realize there are race conditions and failure modes where external state can diverge from the DB state
16:37:59 To me, the biggest question is whether we want loose synchronization with eventual consistency, or need tighter transactional synchronization between the DB state and the external state
16:38:39 I think keeping loose coupling is a better model
16:39:30 sukhdev: agreed
16:39:32 If we stick with loose synchronization, what do we need to do to ensure eventual consistency?
16:39:45 However, the split of pre/post gives the perception that the framework is proposing a tight synchronization model
16:40:38 Sukhdev: Maybe the names are unfortunate, but the precommit methods are the ones that are part of the DB transactions, and are therefore the only tightly synchronized part
16:41:31 creating delete functions is always tricky: on a failure do you try to re-create what was (potentially) just deleted, or make a "best effort" at deleting everything and then regardless tell the calling function about the exception?
16:42:09 rcurran: It's definitely easier to recover from a failed create by doing a delete than trying to undo a failed delete
16:42:10 In that case, what should be the behaviour of the ML2 framework when a post operation fails?
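[Editor's sketch, not part of the log] The distinction drawn above, that only the precommit calls run inside the DB transaction, can be illustrated with a small in-memory model. `FakeDB`, `DriverError`, `FailingDriver`, and this `create_port` helper are all hypothetical; the driver method names follow the precommit/postcommit naming used in the discussion, with simplified signatures. The sketch deliberately leaves out any cleanup after a failed postcommit, since that behaviour is exactly the open question.

```python
from contextlib import contextmanager


class DriverError(Exception):
    """Raised by the hypothetical driver below."""


class FakeDB:
    """In-memory stand-in for a transactional store."""

    def __init__(self):
        self.ports = {}

    @contextmanager
    def transaction(self):
        snapshot = dict(self.ports)
        try:
            yield
        except Exception:
            self.ports = snapshot  # roll back everything done in the block
            raise


class FailingDriver:
    """Driver that fails in the phase named by fail_in."""

    def __init__(self, fail_in=None):
        self.fail_in = fail_in

    def create_port_precommit(self, port_id):
        if self.fail_in == "precommit":
            raise DriverError("precommit failed")

    def create_port_postcommit(self, port_id):
        if self.fail_in == "postcommit":
            raise DriverError("postcommit failed")


def create_port(db, driver, port_id):
    with db.transaction():
        db.ports[port_id] = {"id": port_id}
        # Inside the transaction: a failure here rolls the row back.
        driver.create_port_precommit(port_id)
    # Outside the transaction: the row is already committed, so a
    # failure here leaves DB state and external state out of step.
    driver.create_port_postcommit(port_id)
```

A precommit failure leaves no trace in `db.ports`, while a postcommit failure leaves the port in the DB even though the driver never finished, which is the divergence between DB state and external state discussed above.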
16:42:20 We also need to think about update failures
16:43:09 One approach is to really only guarantee the transactional part, and come up with some way to achieve eventual consistency for the external state
16:43:37 rkukura: I like that model
16:43:48 rkukura: agree about creates being the easier to solve and for updates we can consider (and andre left this as a TODO) on failure use the org_dict info to put the info back to the way it was
16:45:54 I have a crazy thought - what if we used pre operations as golden and do not act on the failures of post operations, and send a notification to the north bound APIs that devices may be out of sync and will require operator intervention
16:45:57 Would some sort of periodic re-syncing of the DB state with all the drivers be sufficient and scalable?
16:45:57 but as sukhdev has stated these are similar but orthogonal type conversations
16:47:07 Sukhdev: So failures in postcommit driver methods would flag the resource to be resynced later?
16:47:25 sukhdev: agreed - i think for all types of failures (create, update, delete - even though on create we "clean up" by deleting) that the calling function would get an exception
16:47:30 rkukura: If you look at Arista Driver's sync mechanism, it precisely does that - periodically sync the state between DB and back-end....it assumes DB as "true source"
16:48:20 rkukura: yes
16:48:25 Sukhdev: So maybe that sync mechanism or something similar could get promoted to the plugin, and the cost amortized across all the drivers?
16:49:38 rcurran: If we go with loose synchronization with some mechanism in the plugin to ensure eventual consistency, do we still have a reason to reverse the precommit/postcommit methods for delete operations?
16:49:48 rkukura: it may be a very heavy handed operation :-)
16:50:42 Sukhdev: That is a concern, and maybe it could be optimized by flagging individual resources for re-sync or something like that
16:50:51 rkukura: yes.
port.bound_segment would not be available on delete_port_postcommit()
16:51:23 Sukhdev: Let's flesh out the sync options/details on openstack-dev
16:51:57 rkukura: yes, we can improve upon the implementation - but, overall, this is a great way to go....
16:52:30 rcurran: So why not have the unbinding phase occur before delete_precommit, and result in update_precommit/update_postcommit calls on the driver where previous_bound_segment is available in PortContext?
16:53:53 i think we need a previous_bound_segment for other reasons. if we want delete_port_xxxcommit() to access the previous_bound_segment for retrieving the vlan then i can live w/ that
16:54:11 rcurran: That way unbinding looks the same to the drivers regardless of whether it's part of a delete, a migration, or something else (agent failure)
16:54:52 rcurran: What about doing the unbind 1st as an update before calling delete_port_precommit on the drivers?
16:55:29 Or do we really want the unbinding to be the same transaction as the deleting?
16:55:48 right now, the cisco_nexus driver (and i think all others) are taking no actions on port_bind and unbind ... only deletes
16:56:40 rcurran: Yes, and that is because port_bind and port_unbind are only called on a subset of the drivers
16:57:21 rcurran: I'm suggesting unbinding is really an update with port_update_precommit and port_update_postcommit calls on all registered mechanism drivers
16:57:31 True for Arista - we do not take any action on bind/unbind - only delete
16:58:16 and will previous_bound_segment still be available (and accurate as the last vlan used) on delete_port()
16:58:47 There is also a TODO related to compound bindings or something like that, which maybe is really what we need so that ToR switch drivers are explicitly part of a binding that is based on an L2 agent's driver.
16:59:11 sorry, we're running out of time, had a quick question: next week is christmas and the week after is new year's day.
are we holding any meetings before January 8?
17:00:10 rcurran: If we do treat the implicit unbind as a separate transaction, I don't think the previous info from the unbind transaction would be visible in the delete precommit/postcommit calls
17:00:48 asadoughi: mestery proposed cancelling the next two - I'm OK either way
17:01:20 rkukura: +1 on cancel, i'll be out on vacation after this week until 2014
17:01:27 then that would cause a re-write for my md ... basically i'd want to remove the nexus switch info on port_unbind(). not undoable but not how things have been working
17:02:22 rcurran: I think you'd need to remove the switch info in port_update_postcommit unless we did compound-binding
17:02:51 Let's move this discussion to openstack-dev
17:03:15 OK, we'll cancel the next two regular meetings
17:03:38 Enjoy the holidays and/or time off everyone!
17:03:47 #endmeeting
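[Editor's sketch, not part of the log] The loose-synchronization idea from the meeting (treat the DB as the source of truth, flag resources whose postcommit step failed, and re-push them later) could be sketched as below. Every name here (`FlakyBackend`, `Plugin`, `needs_sync`, `resync`) is hypothetical; this is not the Arista driver's sync mechanism or any ML2 code, only an illustration of the flag-and-resync pattern under the assumption that re-pushing DB state to the back-end is idempotent.

```python
class FlakyBackend:
    """Hypothetical external device manager that can be made to fail."""

    def __init__(self):
        self.state = {}
        self.fail = True

    def push(self, port_id, data):
        if self.fail:
            raise ConnectionError("backend unreachable")
        self.state[port_id] = data


class Plugin:
    """Commits to the DB first, then makes a best-effort push."""

    def __init__(self, backend):
        self.db = {}             # source of truth
        self.needs_sync = set()  # resources flagged for a later resync
        self.backend = backend

    def create_port(self, port_id, data):
        self.db[port_id] = data  # the transactional part
        try:
            self.backend.push(port_id, data)  # the "postcommit" part
        except Exception:
            # Do not undo the DB change; remember to retry instead.
            self.needs_sync.add(port_id)

    def resync(self):
        """Periodic task: re-push flagged resources from DB state."""
        for port_id in sorted(self.needs_sync):
            try:
                self.backend.push(port_id, self.db[port_id])
            except Exception:
                continue  # still failing; stays flagged for the next run
            self.needs_sync.discard(port_id)
```

Flagging individual resources keeps each resync pass proportional to the number of failures rather than the size of the DB, which addresses the "heavy handed" concern raised about a full periodic sync.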