16:01:11 #startmeeting networking_ml2
16:01:11 hi
16:01:11 Meeting started Wed Sep 24 16:01:11 2014 UTC and is due to finish in 60 minutes. The chair is Sukhdev. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:15 The meeting name has been set to 'networking_ml2'
16:01:30 #topic: Agenda
16:01:47 #link: https://wiki.openstack.org/wiki/Meetings/ML2#Agenda
16:02:05 #topic: Announcements
16:02:24 Juno RC1 is very close
16:02:43 Does anyone know the exact date? I missed the core meeting yesterday
16:02:50 https://wiki.openstack.org/wiki/Juno_Release_Schedule
16:03:12 don't know the exact date
16:03:43 I think it is sometime this week
16:03:49 it was said to be around today :) no exact day
16:04:31 hi
16:04:40 will high-priority bug fixes still get in after the date?
16:04:46 Kilo Summit sessions are growing in number
16:05:03 shivharis: I think so
16:05:24 shivharis: If any critical bugs are left, they may cut an RC2
16:05:31 Yes, we are now focusing on high-priority bugs.
16:06:02 We have one to discuss - let's cover that in the Bugs section
16:06:31 Please look at the Kilo summit sessions list - it is growing: https://etherpad.openstack.org/p/kilo-neutron-summit-topics
16:07:02 nice!
16:07:11 If you have a passion for something or would like to propose a session, this etherpad is a good place to add it
16:07:19 about RC1, the exact date is not decided, but several projects plan to release at the end of the week, so it is a good milestone.
16:07:21 long!
16:07:48 amotoki: thanks for clarifying
16:08:17 I have one bug to push - hopefully it will make it :-)
16:08:25 Any other announcements?
16:08:52 #topic: Actions from last week
16:09:04 make hotel reservations soon, they are going fast
16:09:12 banix, mlavalle: update?
16:09:24 Sukhdev: we had a nice chat yesterday
16:09:31 mlavalle: do you want to discuss?
16:10:08 Sukhdev: we agreed to propose to the team to implement a special mechanism driver that will fail in predictable ways, for example every fourth operation
16:10:42 this will enable a tempest script to expect the bulk operations to fail and verify they are rolled back
16:10:57 I was thinking the exact behavior of this driver could also be set by config parameters; just a thought; more flexible
16:11:11 banix: great idea
16:11:25 mlavalle: you would need a predictable trigger to achieve this
16:11:37 mlavalle, banix: Could also use special values set on some attribute (e.g. name) to trigger behaviour.
16:11:53 I like the idea +1
16:12:11 as time goes by it can be used for other tests as well
16:12:25 I wrote a brief description in the agenda items
16:12:28 rkukura: yeah we thought about your suggestion as well
16:12:30 my question is how to specify such a driver.
16:13:20 ah.. rkukura already mentioned the related topic.
16:13:38 amotoki: I am thinking of a mocked driver with some config knobs
16:13:39 amotoki: The driver could live in tempest - just needs its entry point in the right namespace in setup.py.
16:13:48 rkukura: I thought the other option might be a bit more straightforward, less involved
16:14:02 yeah we have such a driver there already: bulkless
16:14:03 rkukura: yes. we can specify multiple drivers.
16:14:33 bulkfail?
16:14:42 mlavalle: Is this special driver part of tempest or for unit/functional tests?
16:14:48 rkukura: it could live in ml2/plugin/drivers (as a test driver?)
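For illustration, here is a minimal sketch of the fault-injecting driver mlavalle and banix propose above: it fails every Nth postcommit call, so a bulk create of N resources is guaranteed to hit a failure and a test can assert that everything is rolled back. This is not the actual proposal; the class, exception, and attribute names are hypothetical, and the hard-coded interval stands in for banix's suggested config knob. It assumes the Juno-era neutron.plugins.ml2.driver_api module, where only initialize() must be implemented.

```python
# Hypothetical fault-injecting ML2 mechanism driver (sketch only).
from neutron.plugins.ml2 import driver_api as api


class DeliberateFailure(Exception):
    """Raised on purpose to simulate a backend failure."""


class FaultMechanismDriver(api.MechanismDriver):
    """Fails predictably so tests can verify ML2 rollback behaviour."""

    def initialize(self):
        self._calls = 0
        self._fail_every = 4  # banix's config knob would replace this

    def _maybe_fail(self, context):
        # rkukura's alternative trigger would inspect an attribute here,
        # e.g. context.current.get('name') == 'trigger-failure'
        self._calls += 1
        if self._calls % self._fail_every == 0:
            raise DeliberateFailure('failing call #%d' % self._calls)

    def create_network_postcommit(self, context):
        self._maybe_fail(context)

    def create_subnet_postcommit(self, context):
        self._maybe_fail(context)

    def create_port_postcommit(self, context):
        self._maybe_fail(context)
```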
16:15:03 amotoki: neutron/tests/unit/ml2/drivers/mechanism_bulkless.py
16:15:10 If it's only for tempest, why not put it in the tempest repo?
16:15:11 banix: I know.
16:15:22 rkukura: I am thinking this driver is part of ml2 and used by tempest scripts
16:15:42 I'm a bit uncomfortable making such a driver part of ML2's "stable API", with tempest being unversioned.
16:15:52 but my first impression is that using tempest for this purpose is a bit heavyweight
16:16:11 But I'm also uncomfortable making the driver API a "stable API" for tempest :(
16:16:16 amotoki: you mean too much; not necessary?
16:17:14 banix: tempest runs API and scenario tests for some set of OpenStack services, so I wonder whether a special driver fits or not.
16:17:35 amotoki: Maybe this is best done as a unit test or functional test.
16:17:50 amotoki, rkukura: I see the concern
16:17:53 rkukura: totally agree.
16:18:19 we have some forms of these tests already in the unit tests
16:18:56 maybe functional tests are the correct place for something like this
16:19:02 banix: yes, so I think it fits unit tests or functional tests rather than tempest tests.
16:19:36 rkukura: I am not too sure how you will test this behavior with unit tests - but functional tests, maybe
16:20:21 Sukhdev: we do test with failing mechanism drivers in unit tests
16:20:29 mlavalle: Are functional tests not part of tempest tests?
16:20:30 Sukhdev: We have plenty of unit tests that configure drivers and drive them through the neutron API. Maybe these should be functional tests instead, but not sure.
16:20:43 but the proposed solution may be a step further wrt testing functionality
16:21:03 Sukhdev: they are not
16:21:20 mlavalle: thanks
16:21:29 I think it fits functional tests, though I have no good idea on how to implement it so far.
16:21:42 maybe the next action item is to see how we can improve neutron functional tests in this regard?
16:22:05 banix: +1
16:22:14 It seems that if we want to do this with tempest, we are best off putting this driver in neutron and making sure that its behaviour and the mechanism for triggering it are things we can maintain backwards compatibility with.
16:22:35 rkukura: correct
16:23:28 in my view, tempest tests should not expect specific plugins/drivers.
16:23:30 mlavalle: in functional testing, is it possible to restart neutron-server?
16:23:33 banix, mlavalle: Will you continue to drive this?
16:23:44 Sukhdev: yes
16:24:08 shivharis: I don't really have experience with neutron functional testing. for me, that is something to explore
16:24:29 #action: banix to work with mlavalle and the team to figure out the best testing strategy for bulk operations
16:24:30 Sukhdev: yes, as I said last week, I am joining the ml2 team for Kilo
16:24:44 I think that is a good thing for us to explore
16:25:01 Anything else on this subject?
16:25:09 Sukhdev: if you all take me, of course :-)
16:25:29 mlavalle: You are most welcome
16:25:34 mlavalle: You are more than welcome to get as involved as you want!
16:25:41 :-)
16:25:52 mlavalle: why wait for kilo
16:25:56 mlavalle: we humbly accept you as a fellow ML2'er :-)
16:26:20 shivharis: it means now... it's just that it takes some ramp-up time
16:26:28 Anything else, folks?
16:26:35 now we have to tell him about the secret rules ;)
16:26:43 I'm done for today
16:26:50 #topic: Bugs
16:26:58 hi
16:26:59 shivharis: Floor is yours
16:27:32 we have discussed banix's bug quite well...
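On rkukura's earlier point that the driver "just needs its entry point in the right namespace in setup.py": a sketch of what that registration could look like if the test driver shipped as its own package. The package and module names are invented for illustration; the 'neutron.ml2.mechanism_drivers' namespace is the one ML2 loads mechanism drivers from.

```python
# Hypothetical setup.py registering the test driver so ML2 can load it.
from setuptools import setup

setup(
    name='ml2-fault-driver',
    version='0.1',
    py_modules=['fault_driver'],
    entry_points={
        'neutron.ml2.mechanism_drivers': [
            'fault = fault_driver:FaultMechanismDriver',
        ],
    },
)
```

Once installed, the driver would be enabled like any other, e.g. mechanism_drivers = openvswitch,fault in the [ml2] section of the plugin config.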
16:27:48 moving on:
16:27:52 https://bugs.launchpad.net/neutron/+bug/1179223
16:27:54 Launchpad bug 1179223 in neutron "Retired GRE and VXLAN tunnels persists in neutron db" [High,In progress]
16:28:00 romilg: ?
16:28:40 I posted the patch set
16:28:52 is this based on a new design?
16:28:57 need reviewers
16:29:25 yeah
16:29:30 kyle thinks that this should not be postponed to kilo
16:29:46 I have added a table tunnel_mappings to accommodate this
16:29:51 we need help reviewing this (high priority)
16:30:00 let's have a look at https://review.openstack.org/#/c/121000/
16:30:05 and share comments
16:30:16 rkukura, amotoki: can you spare some cycles for this?
16:30:17 thanks :)
16:30:26 Sukhdev: Yes
16:30:34 sure
16:30:39 rkukura, amotoki: thanks
16:30:43 next:
16:30:50 https://bugs.launchpad.net/neutron/+bug/1367391
16:30:51 I'll take a look at this as well.
16:30:52 Launchpad bug 1367391 in neutron "ML2 DVR port binding implementation unnecessarily duplicates schema and logic" [High,Confirmed]
16:31:08 rkukura: this one is yours
16:31:35 shivharis: mestery decided to postpone that one to kilo
16:31:52 ok
16:32:18 there are other DVR issues that are more critical, I believe
16:32:51 ok, we will move this to kilo
16:32:59 other than the 3 we discussed, please look at all the ml2 bugs and see if any need to be prioritized
16:33:28 if things don't change a whole lot, we look quite alright for RC1
16:33:54 Targeted to Kilo just means second priority, and it is a balance with the gate queue... we need to focus on bugs targeted to RC1 now.
16:34:32 amotoki: agreed
16:34:59 amotoki: Makes sense - so, we have one from banix and others mentioned by rkukura for DVR
16:34:59 we have only 3 that need work for RC1
16:35:26 this kind of consensus is important :-)
16:35:33 if there are any that should be re-prioritized, please speak up now
16:35:47 if there is time, may I ask a question regarding my bug?
16:35:54 that is my beloved bug :)
16:35:55 Any questions on bugs?
16:36:04 I think one DVR issue we should be tracking is https://review.openstack.org/#/c/123403/
16:36:05 banix: yes
16:36:16 Thanks for all the reviews; will address all today; mainly minor; have one question to discuss here
16:37:02 and that is how we deal with failure in rollback; in the patch, we try to set the status of the resource to ERROR. amotoki in his reviews mentioned applying the same to non-bulk operations
16:37:10 shivharis: can you add that to the list?
16:37:14 This, I believe, is changing ML2 so that delete_port_postcommit() gets called multiple times, once for each host on which the DVR port is bound.
16:37:14 rkukura: I don't have a background on this; will take this up next time
16:37:29 Sukhdev: will do
16:37:37 that is not done for non-bulk operations right now, and I am wondering if we could leave that for a separate patch?
16:38:10 It already calls delete_port_precommit() for each host, so this makes these match, but I believe the delete calls should match the create calls.
16:39:04 I'm expecting this fix will need to go into juno-rc1 for DVR, and that in kilo we will need to revisit how deletes work.
16:40:45 if there is a bug critical for RC1, please raise it or let kyle or one of the core team know.
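A rough sketch of the change rkukura describes for the DVR review above: invoking delete_port_postcommit() once per host binding, so the postcommit notifications mirror the existing per-host delete_port_precommit() calls. All names here are hypothetical helpers, not the actual patch or Neutron's API.

```python
# Illustrative only; parameter and helper names are not Neutron's.
def notify_dvr_port_deleted(mech_manager, port, host_bindings,
                            make_port_context):
    """Call delete_port_postcommit once per host the port was bound on.

    A DVR port can be bound on several compute hosts, so a single
    postcommit call would lose the per-host information the drivers
    already received in their per-host precommit calls.
    """
    for binding in host_bindings:
        port_context = make_port_context(port, binding)
        mech_manager.delete_port_postcommit(port_context)
```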
16:40:55 I am just thinking we do not change much for non-bulk operations in this patch, considering where we are in the cycle
16:41:14 banix: so, for my clarification - if one of the four (in a bulk operation) fails, do we roll back only the failed one or all four?
16:41:27 Sukhdev: all four
16:41:34 banix: on the bulk ops fix, I feel we should avoid unnecessary changes and leave cleanup/refactoring/etc. for kilo
16:41:35 the bulk ops are atomic by definition
16:41:50 I will talk to Kyle regarding the bugs owned by banix and romilg (any others?)
16:42:00 amotoki: ^^^^
16:42:03 banix: I am neutral on whether we need to fix it in the same patch.
16:42:37 rkukura: are you ok with how rollback failures for bulk are done in the patch, or do you want that done in a later patch as well?
16:42:59 Sukhdev: add an action for me
16:43:17 rkukura: that is, setting the status to ERROR for resources that didn't get deleted properly in rollback
16:43:24 amotoki: ok thanks
16:43:53 banix: what does it mean for a resource to not get deleted properly?
16:43:56 #action: shivharis to follow up with mestery regarding ML2 RC1 bugs
16:43:58 banix: setting ERROR status should be consistent
16:44:09 rkukura: an exception is thrown
16:44:15 Do you mean that some MD failed?
16:44:15 banix: bulk or non-bulk
16:44:31 right now we do not worry about the delete failing in non-bulk operations
16:44:55 banix: IMO
16:45:00 So for non-bulk, we go ahead and delete the resource?
16:45:02 I think consistency is important, and one gets a way to deal with these later on, based on status
16:45:04 ChuckC: I see your point
16:45:16 rkukura: yes
16:45:27 I'd say this late in the game, it's best to make the bulk behaviour match the current non-bulk behaviour
16:45:38 rkukura: +1
16:45:55 rkukura: +1
16:45:57 amotoki: what do you think about what rkukura just stated above?
16:46:17 banix: which one?
16:46:41 amotoki: that we do not worry about setting the status of resources to ERROR if delete fails in this patch
16:47:26 I like amotoki's idea, but can it wait for kilo?
16:47:27 asking amotoki explicitly because he suggested the further step of setting status to ERROR
16:47:31 hmm... setting status in bulk can be in, but....
16:47:43 others welcome to chime in, of course
16:47:46 for a single operation it should be deferred.
16:48:21 ideally both should match, but we don't need to fix it at the same time.
16:48:45 So if a non-bulk create fails, no resource has been created, but with bulk it may have been created but be left in a state where it cannot be used or deleted?
16:49:11 rkukura: even in non-bulk, the failure may come in postcommit
16:49:19 amotoki: do you support a separate Juno patch to make it consistent?
16:49:31 setting status can be deferred too, of course.
16:49:36 banix: But in that case, doesn't the resource get deleted before the create op returns?
16:49:37 banix: agreed
16:49:41 IMO, I still think we should keep consistency, bulk or non-bulk
16:49:42 ChuckC: I don't think it is a requirement.
16:50:06 rkukura: unless the DB delete fails for some reason
16:50:15 rkukura: resources can be visible in the API after precommit finishes, so they can potentially be deleted before the rollback in a bulk create.
16:50:27 amotoki: you have a point
16:50:29 I think the issue with bulk is that even if a delete fails, we have to do the other deletes
16:50:43 that we will do
16:50:54 banix: +1
16:51:00 I am thinking dealing with a failed delete may be left for another patch
16:51:11 that is, setting the status to ERROR
16:51:28 that way we have consistency between bulk and non-bulk as well
16:51:37 My comment on delete failure is just a possibility. +1 for banix now.
16:51:57 banix: please summarize so we can move on
16:52:22 amotoki: thanks; just considering how late in the cycle we are, that would be reasonable
16:52:26 yes, to summarize:
16:53:18 for bulk operations we make sure that if a delete in rollback fails, we continue deleting the other resources as needed; in a future patch we make sure that if a delete fails (for bulk and non-bulk), we set the status of the resource to ERROR
16:53:29 banix: thanks
16:53:42 we have beaten all bugs to death, back to you Sukhdev
16:53:47 #agreed: for bulk operations we make sure that if a delete in rollback fails, we continue deleting the other resources as needed; in a future patch we make sure that if a delete fails (for bulk and non-bulk), we set the status of the resource to ERROR
16:54:00 shivharis: Thanks
16:54:29 Anything else on the bugs, folks?
16:54:44 #topic: Deferred specs
16:55:17 I just put the link in the agenda as a placeholder so that folks can see it - feel free to update it: https://wiki.openstack.org/wiki/Tracking_ML2_Subgroup_Reviews#Under_Review
16:55:41 #topic: Open Discussion
16:55:48 banix: please include me in discussions re testing mech driver failures
16:55:50 anything?
16:56:03 ChuckC: sure, will do
16:56:09 banix: thanks
16:56:21 Anybody want to discuss anything? we have 4 mins
16:56:51 waiting... waiting...
16:57:14 Looks like we are done, folks... great discussion and conclusions!
16:57:17 thanks for attending
16:57:21 (announce) horizon Juno supports provider networks.
16:57:22 #endmeeting
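For reference, a minimal sketch of the behaviour captured in the #agreed line above: the rollback keeps deleting even when an individual delete raises, and the ERROR-status step is left as a comment because it was deferred to a future patch for both bulk and non-bulk. All names here are illustrative, not the actual patch.

```python
# Sketch of the agreed bulk-create rollback behaviour (names invented).
import logging

LOG = logging.getLogger(__name__)


def rollback_bulk_create(created_resources, delete_fn):
    """Best-effort rollback: one failed delete must not stop the rest."""
    for resource in created_resources:
        try:
            delete_fn(resource)
        except Exception:
            LOG.exception('rollback failed to delete %s', resource)
            # deferred to a future patch, for bulk and non-bulk alike:
            # set_resource_status(resource, 'ERROR')
```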