15:00:26 #startmeeting neutron_dvr
15:00:27 Meeting started Wed Dec 16 15:00:26 2015 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:31 The meeting name has been set to 'neutron_dvr'
15:00:32 obondarev:hi
15:00:35 haleyb: hi
15:00:42 #chair Swami
15:00:43 Warning: Nick not in channel: Swami
15:00:45 Current chairs: Swami haleyb
15:00:51 hi there
15:00:57 #chair Swami_
15:00:58 Current chairs: Swami Swami_ haleyb
15:01:09 * regXboi watches the musical chairs
15:01:17 do we have any announcements today
15:01:20 :)
15:01:34 #topic Announcements
15:02:14 o/
15:02:28 carl_baldwin:hi
15:02:31 So this might be the last meeting this year for me as I'm on vacation, don't know about others
15:02:42 ditto for me
15:02:46 haleyb: makes sense
15:03:04 ++
15:03:34 if we all decide then we can resume the meeting in 2016
15:04:00 I'd ask it the other way
15:04:07 Right. I might be watching reviews here and there but not much else
15:04:16 "does anybody see a reason to meet with reduced groups"
15:04:41 I don't see any value with reduced groups.
15:04:48 we can continue in 2016 I think
15:05:06 ok, so somebody want to do the #agree?
15:05:19 #agree with that
15:05:24 +1
15:05:37 #agreed DVR meetings will pick up on 1/6/2016
15:05:42 whatever - I'll do it :)
15:05:44 +1
15:06:09 I'll update the wiki page after the meeting
15:06:15 #action I will send a message out in the channel.
15:06:30 Swami_: you want to #undo and s/I/Swami_/ in that?
15:06:32 Swami_: dev list?
15:06:38 otherwise the notes have an action item for "I"
15:06:39 We need ML for info
15:06:56 #undo s/I/Swami
15:06:57 Removing item from minutes:
15:07:18 (I can't believe I'm asking this) haleyb: hand me a chair, please
15:07:37 #chair regXboi
15:07:38 Current chairs: Swami Swami_ haleyb regXboi
15:07:45 hope you can reach the top shelf now
15:08:00 #action Swami_ to send a message about meetings continuing on 1/6/2016 to both channel and -dev ML
15:08:09 * regXboi breathes easier
15:08:30 any other announcements?
15:08:32 regXboi: did we give you enough breathing problems.
15:09:05 Swami_: since I've read minutes from these things in the past, I've become sensitive to the messages making sense to people who weren't here
15:09:06 i wish the bot would ackowledge all the #'s
15:09:07 nothing else I hope let us get going.
15:09:22 and note: in three days I will be a "person who weren't here" :)
15:09:31 #topic Bugs
15:09:39 haleyb: hi
15:09:56 I have created the bug sections in the Wiki page to categorize the bugs.
15:10:12 Please let me know if that helps out for the people who review and read the wiki.
15:10:23 Swami_: nice
15:10:29 still minor clean up is required, I will do it in the coming week.
15:10:51 This week we have not seen many bugs but a couple
15:10:53 Swami_: thanks, it helps, should cut/paste a small info line for each
15:11:01 Let us go over it.
15:11:08 haleyb: yes will work on it.
15:11:28 #link https://bugs.launchpad.net/neutron/+bug/1524908
15:11:28 Launchpad bug 1524908 in neutron "Router may be removed from dvr_snat agent by accident" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:11:47 There is a patch for this bug right now.
15:12:11 It should be straight forward, this was found during addressing another patch by oleg, so oleg has pushed in this patch.
15:12:15 Please review it.
15:12:29 Nothing to discuss more about this bug.
15:13:00 I think this bug has two patchs one to add the "admin_context" to delete and the other one to handle delete of router_namespaces in snat.
15:13:07 review both the patches.
15:13:29 The next one in the list is
15:13:34 #link https://bugs.launchpad.net/neutron/+bug/1526175
15:13:34 Launchpad bug 1526175 in neutron "ha router schedule to dvr agent in compute node" [Medium,In progress] - Assigned to zhang sheng (langyxxl)
15:14:01 did we drop a bug reference there?
15:14:12 This one was filed recently, the bug states that somehow when ha is configured and dvr agent is running, the ha routers end up in the dvr node.
15:14:35 regXboi: which bug reference?
15:15:06 Swami_: you said Nothing to discuss more about 1524908 and then said "I think this bug has two patches..."
15:15:28 and I'm confused if the I think statement still refers to 1524908
15:15:42 #link https://bugs.launchpad.net/neutron/+bug/1424096
15:15:42 Launchpad bug 1424096 in neutron "DVR routers attached to shared networks aren't being unscheduled from a compute node after deleting the VMs using the shared net" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:16:03 regXboi: this is the bug associated with the other dependent patch.
15:16:26 ah
15:16:31 ok, now the loop is closed - thanks
15:16:33 regXboi: is that clear now.
15:16:52 I reopened that one
15:17:06 obondarev: thanks
15:17:08 because faced it while reworking unit tests
15:17:20 so decided to go with a separate patch
15:17:58 obondarev: thanks for the update
15:18:03 next one.
15:18:07 #link https://bugs.launchpad.net/neutron/+bug/1522824
15:18:07 Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:18:30 This is old bug and there is a patch out there for review. Please review it if not reviewed.
15:18:54 #link https://review.openstack.org/#/c/253569/ - This is the patch.
15:19:01 obondarev: were you discussing that one with kevin ?
15:19:28 it had +2 from carl_baldwin once but then kevinbenton suggested to use BUILD instead of new PENDING_BUILD
15:19:34 haleyb: no the one that i was discussing with kevin is the other one, this is related, but not the same.
15:19:57 haleyb: yes, we discussed with kevinbenton
15:19:58 obondarev: I did see that you pushed in a new version.
15:20:30 so the suggestion didn't wokr spo I returned to initial version
15:20:44 reviews needed
15:21:01 obondarev: ok, thanks will review it.
15:21:10 #link https://bugs.launchpad.net/neutron/+bug/1456073
15:21:10 Launchpad bug 1456073 in neutron "Connection to an instance with floating IP breaks during block migration when using DVR" [High,Confirmed]
15:21:30 Swami_: thanks, I think you did already
15:21:58 This bug is related to live migration on DVR and FIP.
15:22:12 haleyb: This is the one that I had discussion with kevin yesterday.
15:22:58 obondarev has a couple of patches to address the live migration but that is not directly related to this bug
15:23:18 right
15:24:16 obondarev: I will try to test with obondarev patch the live migration issue with fip and see if there is any improvement on this.
15:24:49 Swami_: my patches will hardly resolve the issue, I think more work is needed
15:24:50 obondarev: I have also added some comments in your patch on passing some information as "kwargs" to the registerd parties.
15:24:58 obondarev: yes I agree.
15:25:11 Swami_: saw that, prefer a separate patch for this
15:25:15 obondarev: I will work on it and see what is more required.
15:25:23 obondarev: ok will add one.
15:25:49 #link https://review.openstack.org/#/c/246898/
15:25:56 This is the patch that we are discussing.
15:26:32 Thanks for the link
15:26:49 The next high priority bug in the list is
15:26:53 #link https://bugs.launchpad.net/neutron/+bug/1462154
15:26:53 Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips" [High,In progress] - Assigned to ZongKai LI (lzklibj)
15:27:36 #link https://review.openstack.org/#/c/246855/
15:27:55 This patch is under review, please review if not reviewed.
15:28:52 #link https://bugs.launchpad.net/neutron/+bug/1522824
15:28:52 Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:29:01 This is related to the gate test failure.
15:29:01 I need to catch up on this. Looks like still a wip
15:29:16 ... About the previous bug
15:29:21 carl_baldwin: yes seems like it.
15:29:34 1522824 was already discussed, wasn't it?
15:29:41 lizk: are you still here.
15:29:41 yes, it was first
15:29:50 yes, I'm here
15:30:25 lizk: are you still working on 1522824
15:30:38 it's should be under review now, but failed for gate-grenade-dsvm-neutron test
15:30:43 #undo
15:30:45 Removing item from minutes:
15:30:49 yes, I'm working on that
15:31:07 lizk: ok, just ping us in the channel once your are ready for review.
15:31:22 ok
15:31:41 obondarev: yes we have already reviewed the bug 1522824, my mistake.
15:31:41 bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] https://launchpad.net/bugs/1522824 - Assigned to Oleg Bondarev (obondarev)
15:32:58 #topic Gate-Test-Failures
15:33:19 Is there any new failures seen in the gate recently with respect to DVR.
15:33:39 Swami_: we should discuss the two infra reviews as well
15:33:47 I've not seen anything new
15:33:57 regXboi: thanks
15:33:58 and +1 on the infra discussion
15:34:08 haleyb: what are the two infra ones.
15:34:16 haleyb: do you have the links
15:34:24 the dvr job does seem higher than neutron-full fwiw
15:34:53 https://review.openstack.org/#/c/255325/ -make dvr job voting again
15:35:05 which had some good comments from kyle, doug and swami
15:35:19 haleyb: the single voting job was pushed last week based on our agreement.
15:35:41 But there were a couple of comments on that patch.
15:35:54 if we need to go for voting on single node job or multinode job?
15:36:15 What is the reason that we need to go in for single node job?
15:36:15 well, ideally, we want multinode to be voting and single node to go away
15:36:37 The comment on the single-node was to change it to be a check only job
15:36:40 but multinode (of both types) are rather ill
15:36:57 and I agree to that idea - single node can be check only
15:37:00 regXboi: is but we don't want to wait until the multinode job gets stable, if single node job is stable, then we should vote and then start working on multinode to prevent further regression.
15:37:19 honestly, having single node vote in the gate isn't really useful, is it?
15:37:22 both multinode jobs are affected by some block live migration failure
15:37:31 obondarev: ack
15:37:33 haleyb: regXboi: what is the difference between having it in check versus the other.
15:37:38 not related to dvr I guess
15:37:50 and not sure if related to neutron
15:37:58 so, we *had* single node voting before, remember?
15:38:01 regXboi: why do you think so that voting single node is not useful.
15:38:11 and it led to regressions because it doesn't really *test* anything
15:38:20 DVR without multinode isn't really DVR
15:39:01 +1
15:39:06 and dougwig + mestery (and armax I expect) are pointing that out
15:39:11 and after thinking about it, they are correct
15:39:37 regXboi: but do we need to delay the single node voting because of multinode that is my point.
15:40:02 once multinode gets voting we can remove the single node job.
15:40:08 the push back is "making single node voting doesn't mean anything because it isn't really testing DVR"
15:40:18 and I can't argue with that
15:40:31 in fact, I'd argue that we might as well just remove singlenode DVR completely
15:40:42 right, and we want to reduce the number of rechecks in the gate
15:40:46 as a non voting job it doesn't do anything
15:40:47 obondarev: as you pointed out in multinode the live migration is a blocker right now, until we fix that bug.
15:40:53 as a voting job it doesn't do anything
15:40:58 so why is it there?
15:41:10 regXboi: were things different when we discussed this last week.
15:41:36 Swami_: yes, I had forgotten a bit of history that I went and boned up on after the comment stream
15:41:56 regXboi: thanks
15:42:28 regXboi: first I would say that the tempest tests were not specifically written for multinode jobs.
15:43:02 Swami_: and as you noticed, the dvr job is already in the check queue, it's just not voting
15:43:21 haleyb: yes I see that.
15:43:26 Swami_: doesn't that sorta translate into "the tempest tests were not specifically written for DVR?"
15:43:57 regXboi: not only DVR any multinode scenario is not well handled for the multinode case.
15:44:26 regXboi: yes we need to fix all the tests before attempting to make the multinode voting, that is my argument.
15:44:34 Swami_: granted, but I don't see why that's a reason to (a) make the single node dvr job voting or (b) keep the single node job around
15:44:35 whoa
15:44:47 I'm not talking about making the multinode job voting now
15:44:53 regXboi: haleyb: agreed
15:45:01 that's the end goal, yes, but we are nowhere *close* to that one
15:45:20 So the agreement here by the team is not to make single node job vote and move forward with voting the multinode job.
15:45:33 If the single now job isn't testing anything more than the non dvr one, why has it been falling more?
15:45:35 #agree
15:45:40 uh... not exactly
15:46:00 carl_baldwin: I didn't quite parse that
15:46:01 right, has the single-node job caught anything?
15:46:01 carl_baldwin: I like your question.
15:46:18 I'll ask the question this way
15:46:28 although i guess since it's not voting the net is always empty
15:46:37 (1) why should the single node DVR job be voting?
15:46:54 (2) if there is no good answer to #1, why does the singe node DVR job exist?
15:47:16 regXboi: I think you have not answered for carl_baldwin question above.
15:47:18 and I've not heard a reason for #1 that holds up yet
15:47:31 Swami_: I didn't *parse* carl_baldwin's question above
15:47:42 Yes, the single now job has caught problems with dvr.
15:48:01 s/now/node
15:48:07 Swami_ knows this
15:48:22 carl_baldwin: recently?
15:48:38 carl_baldwin: I agree, the only one thing that the single node did not catch is the live migration that involves two or more nodes.
15:49:02 regXboi: does it matter when?
15:49:11 carl_baldwin: yes actually it does
15:49:34 regXboi: if the answer is Yes, then why should we remove the job
15:49:46 regXboi: why?
15:49:58 if the answer is "yes, recently" than keeping the job makes sense
15:50:12 if the answer is "yes, but not recently" then having the multi-node job covers it
15:50:36 in other words: "what is the single node job testing that the multi node job isn't"
15:51:14 regXboi: as I mentioned above, the additional tests that the multinode job is testing is the nova live migration that is only turned on when there are more than one nodes.
15:51:36 Swami_: that's not the answer I think you want to give me
15:51:51 regXboi: what do you expect?
15:51:54 If the set of tests from multinode is a superset of the tests of single node then why does single node exist
15:52:12 regXboi: it is a stepping stone.
15:52:26 regXboi: well, it is testing the namespace manager code, as that aspect differs
15:52:56 regXboi: I haven't seen that the multinode job is anywhere near working well. Why not have the single node, which is close, in the mean time?
15:53:28 carl_baldwin: I believe these are the arguements that will need to be made to the infra folks
15:53:41 carl_baldwin: that is exactly what I am thinking. Until the multinode job is stable and ready let us depend on the single node.
15:53:51 Someone has said that the single node test doesn't test *anything*. That is bogus.
15:54:02 I said that
15:54:14 What they should be saying is that the single node test doesn't test all that might be testable in a multinode.
15:54:14 and except for haleyb's statement, I rather stand by it
15:54:17 carl_baldwin: fyi, see the discussion in https://review.openstack.org/#/c/255325/ from doug and kyle (and swami) if you haven't already
15:54:38 But, in its current form, the multi-node test barely tests anything more and it is a lot more broken.
15:54:39 regXboi: right, packet flow via namespaces and OVS rules is different
15:54:49 haleyb's namespace manager statement is the first thing I've seen that I think single node can hang its hat on
15:55:42 my thoughts.
15:55:58 When multi-node is serving its purpose well, there is no need for single node. I agree with you there. But, for now, no one will pay attention to it for anything because it is broken.
15:56:15 carl_baldwin: I'll give you a +1 on that
15:56:17 If we don't make the single node job voting we will be constantly fixing bugs which are due to new updates.
15:56:34 So, suggesting we remove single now, that it has no purpose, is premature.
15:57:23 carl_baldwin: ok, that's self consistent reasoning for keeping it
15:57:23 carl_baldwin: agreed
15:57:31 but that's not enough to make it voting
15:58:06 or at least, I don't think it is
15:58:09 yet
15:58:36 now - haleyb's comments about namespace and packet flow I think *might be*
15:58:37 regXboi: this gets at "noone will notice" if a non-voting job fails, we don't want to slide backwards
15:58:38 regXboi: may be thinks will change next week or by 2016, if you think through.
15:59:00 s/thinks/things
15:59:00 regXboi: That was only arguing to not get rid of the single node job not to be conflated with any arguments to make it voting.
15:59:00 we are almost out of time
15:59:09 Can I ask dvr floks in last?
15:59:11 carl_baldwin: ack
15:59:17 I wonder if DVR floks will review a linuxbridge DVR spec https://review.openstack.org/#/c/255174/
15:59:23 carl_baldwin: apologies for conflation
15:59:30 hichihara: will od
15:59:36 carl_baldwin: so you don't think it should vote in the check job?
15:59:37 s/od/do
15:59:48 Swami_: Thanks :)
16:00:06 we are at the top of the hour
16:00:13 we will have to keep talking in #neutron as out of time
16:00:19 we can discuss it in IRC or on the voting job patch.
16:00:30 have a happy new year everyone, and thanks for the great work
16:00:32 #endmeeting