15:04:43 <Swami> #startmeeting distributed_virtual_router
15:04:44 <openstack> Meeting started Wed Aug 20 15:04:43 2014 UTC and is due to finish in 60 minutes. The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:47 <openstack> The meeting name has been set to 'distributed_virtual_router'
15:04:51 <Swami> Bhooshan: hi
15:04:55 <pcm_> hi
15:05:09 <Rajeev> Swami: Hi
15:05:22 <Swami> Rajeev: hi
15:05:29 <Swami> pcm_:hi
15:06:06 <Swami> #info Feature Proposal freeze is this week
15:06:33 <Swami> #topic Agenda
15:06:42 <Swami> 1. DVR Update
15:07:17 <Swami> DVR is currently under test.
15:07:52 <Swami> The code has been completely merged.
15:08:20 <Swami> DVR team is currently focused on testing, fixing bugs.
15:09:00 <Swami> #topic Horizon
15:09:20 <Bhooshan> Completed code changes for enhancement of Horizon to support DVR. Patchset4 has been uploaded.
15:09:23 <Swami> Bhooshan: I saw your patch yesterday
15:09:41 <Bhooshan> As per the last meeting removed checkbox from has been replaced with drop down menu.
15:09:53 <Swami> Bhooshan: I was not able to test your code, I will try to test it today.
15:10:15 <Bhooshan> swami: fine
15:10:17 <Bhooshan> Administrator can choose "Use server default", “Distributed” and “Centralized” from dropdown menu.
15:10:39 <Swami> Bhooshan: Thanks for the accommodating those change request
15:10:51 <Bhooshan> "Use server default" won’t carry “distributed” flag along with the REST request.
15:11:12 <Swami> Bhooshan: Good to know.
15:11:21 <Bhooshan> Now I am writing unit test case for the DVR support and addressing the comments.
15:11:41 <Swami> Bhooshan: The only other pending item is the "Router update" or "Edit" tab
15:12:11 <Swami> Amotoki mentioned that it can be done by Juno 3.
15:12:38 <Bhooshan> Amotoki replied for this in one mail
15:12:45 <Swami> If you are pretty much occupied on this initial effort, can I ask amotoki to work on the edit tab.
15:13:03 <Bhooshan> update router will go as different patch
15:13:38 <amotoki> Swami: Bhooshan: I am okay with either case.
15:13:53 <Swami> amotoki: thanks for jumping in.
15:13:57 <Bhooshan> ok, let wait till we will this patchset
15:14:11 <Swami> So let us keep both as separate patches.
15:14:21 <Bhooshan> till we merge this patchset
15:14:39 <Swami> Bhooshan you can focus on wrapping up the current work that you are doing along with amotoki's work
15:14:39 <Bhooshan> I am planning to finish unit tests by tomorrow
15:14:49 <Bhooshan> fine
15:15:04 <Swami> I will request amotoki to put in another patch for the router edit tab.
15:15:14 <Bhooshan> Ok.
15:15:29 <Swami> amotoki: are you ok with this proposal
15:15:42 <amotoki> Swami: yes. no problem.
15:15:55 <Swami> amotoki: Thanks for your help.
15:16:39 <Bhooshan> Amotoki: Thank you for all your helps
15:16:55 <Swami> Bhooshan: can you add your patch link into the irc
15:17:19 <Swami> Just for people to review
15:17:28 <Bhooshan> https://review.openstack.org/112583
15:18:06 <Swami> DVR folks please review the horizon UI pages for DVR.
15:18:11 <amotoki> Swami: is "Edit router" for admin panel in your mind? "Edit Router" in project panel provides not so much value.
15:18:52 <Swami> amotoki: Yes edit router is only for "admin" at this time to allow an admin to update a router from a legacy to "distributed".
15:20:04 <Swami> amotoki: is it clear
15:20:15 <amotoki> Swami: In my understanding, Juno supports only updating a router from legacy to distributed. update from dvr to legacy is not supported.
15:20:40 <Swami> amotoki: Yes your understanding is right.
15:20:52 <amotoki> Swami: thanks. it is now clear to me.
15:21:02 <Swami> ok
15:21:07 <Swami> Let us move on to the next topic
15:21:14 <amotoki> there are many patches related to dvr and i cannot track the status completely :-(
15:21:15 <Swami> #topic Bugs
15:21:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1358718
15:21:31 <uvirtbot> Launchpad bug 1358718 in neutron "duplicate ping packets from dhcp namespace when pinging across DVR subnet VMs" [Medium,New]
15:21:47 <Swami> I have posted the link to the DVR bug list
15:22:33 <Swami> Most of the bugs we have assignee
15:22:48 <Swami> A couple of new bugs have popped yesterday.
15:23:35 <Swami> One of the bug requires some discussion
15:23:35 <carl_baldwin> I think our High importance bugs look like they’re under control. I’m having difficulty keeping up with the Medium.
15:24:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1358998
15:24:01 <uvirtbot> Launchpad bug 1358998 in neutron ""No L3 agents can host the router..." traces for DVR" [Medium,New]
15:24:14 <Swami> This was one of the bug that was filed by armando yesterday
15:24:14 <carl_baldwin> If you notice that there has not been enough reviewer attention on a bug fix, could you ping me?
15:24:39 <Swami> carl_baldwin: is there a specific bug
15:25:05 <Swami> carl_baldwin: got it I will let you know if there is a patch that is waiting for review.
15:25:13 <carl_baldwin> Swami: no. No specific bug. Thanks.
15:25:42 <Swami> carl_baldwin: sure will do.
15:26:16 <Swami> The bug that i have posted the link above is caused by FWaaS failing in the gate.
15:26:41 <Swami> The problem for this is our router namespaces are not getting created and so the Firewall raises an error.
15:27:24 <Swami> The current firewall does not support DVR and the router configured is all dVR, so the behavior is odd.
15:28:04 <carl_baldwin> Swami: You mean that the bug is the cause for FWaaS failing?
15:28:07 <Swami> In dvr we only create an IR if there is a VM on the network or in the case of service node we create an IR if there is a gateway and interface attached to the IR.
15:28:32 <Swami> carl_baldwin: Yes that's what armando mentioned to me yesterday.
15:28:49 <carl_baldwin> Swami: Thanks for the clarification.
15:29:08 <Swami> The log message by itself is kind of misleading we might have to correct the log message.
15:29:23 <carl_baldwin> After looking at this further, I wonder if Medium is appropriate for this bug.
15:29:41 <Swami> When there is no VM in the network, the scheduler comes back and says that there are no valid L3 agents right now.
15:30:27 <Swami> The main reason I brought up this topic is because myself, armando had concerns with the way the current IR's are created.
15:30:44 <Swami> I just wanted to get some feedback from the team on how we can handle this situation.
15:31:10 <Swami> There are two options
15:31:57 <Swami> 1. Fix the FWaaS code to handle DVR routers in the gate testing so that it never waits for the IR and when IR is created the FWaaS should apply the rules. In this case the L3scheduler to provide information to the FwaaS agent when the IR is created.
15:32:38 <Swami> Option: 2. We should create IR's irrespective of the VM's availability.
15:32:51 <Swami> What is your take on this.
15:33:33 <Swami> armando: ping
15:33:50 <Rajeev> Swami: thanks for putting out the options. Don't like 2
15:34:25 <Swami> Rajeev: Thanks for your input
15:34:36 <Swami> armando seems to align on option 2.
15:35:07 <Rajeev> Swami: is 2 for only single node scenario ?
15:35:25 <carl_baldwin> Swami: Could I get a little more detail on option 2? In what case would we create an IR where we don’t now?
15:36:03 <Swami> No he was ok for multi node as well.
15:36:31 * chuckC_ wonders if armax gets notified when his handle appears
15:36:33 <Rajeev> Swami: Is FWaaS the only reason for 2 ?
15:36:45 <armax> chuckC_: hi
15:36:55 <Swami> carl_baldwin: Armando is suggesting can we go ahead and create IR's similar to centralized routers on all nodes, where there are active L3 agents, instead of checking for a dvr related port to pop up.
15:36:57 <chuckC_> armax: guess not!
15:37:19 <armax> yeah, armax would do the trick
15:37:23 <Swami> armax: are you there
15:37:32 <armax> yes
15:37:46 <armax> hang on let me read through
15:37:52 <carl_baldwin> Swami: So, all compute nodes would host all IRs?
15:37:58 <Swami> armax: we are just going through the two options that we were discussing regarding the IR creation
15:38:25 <Swami> carl_baldwin: Yes
15:38:27 <armax> Swami: right, I am still on the fence as to what option is best
15:39:10 <Rajeev> Swami: I still want to know what the motivation is for these options ?
15:39:12 <carl_baldwin> Swami: armax: That just won’t scale. Network namespaces are not as resource intensive as VMs but they are not free. We cannot load a compute host with 1000s of NSs.
15:39:14 <armax> Swami: the fact that we create namespaces under some (not well documented) circumstances
15:39:17 <WormMan> I think all compute nodes having every router would be messy and complicated. We run nova-network MultiHost and 64 tenants/vlans is ba enough there
15:39:25 <WormMan> bad enough
15:39:27 <armax> makes it difficult to understand when and where namespaces should or should not be
15:39:41 <armax> option 2 is only better to rule out any logic error
15:39:44 <Swami> carl_baldwin: agreed, that was my original comment on that option.
15:39:47 <armax> if namespaces are always supposed to be there
15:40:41 <armax> then the presence of namespace or lack thereof will tell immediately if there’s a symptom or not
15:40:59 <armax> carl_baldwin: agreed that performance may be an issue
15:41:26 <carl_baldwin> armax: Performance will be an issue. A big one.
15:41:28 <armax> it’s not a matter of namespace handles on the hosts
15:41:41 <armax> carl_baldwin: but also the notifications that would result in getting those namespaces there
15:42:25 <Rajeev> control plane traffic after the Namespaces are created would be high too
15:42:52 <Rajeev> because updates will get directed to all nodes
15:42:53 <Swami> So are we all in an agreement that Option 2 is ruled out.
15:43:07 <carl_baldwin> +1
15:43:32 <Swami> Ok, now let us come back to the Option 1.
15:43:38 <Rajeev> +1 would feel better if I knew why we are bringing these options
15:44:20 <Rajeev> Is there any other reason than the Fwaas and simplicity ?
15:44:32 <Swami> So in order to fix this issue for now, we either have to fix the current FWaaS tempest test suite that tries to configure the FWaaS for Distributed routers
15:45:18 <Rajeev> In that case 2 sounds like an overkill
15:45:21 <carl_baldwin> Is the fwaas agent the same as the l3 agent?
15:45:53 <Swami> Yes it is using the l3 agent, there is no separate agent for it.
15:47:04 <armax> when looking at the fwaas test errors yesterday I noticed that the test might pass based on a dirty state of the L3 agent
15:47:24 <Swami> rajeev: Yes in terms of simplicity option 2 would be more simple, but for performance reasons we have to move away from option 2.
15:48:10 <Swami> armax: Is that a trivial change.
15:48:16 <armax> Swami: I haven’t root caused it completely…but the fact that we make the transition of the firewall state from PENDING_CREATE to CREATE by looking also at the actual namespaces on the agent’s node is a bit weird
15:49:07 <Swami> armax: I think it is because they are dependent on the router, so they have a check in there.
15:49:07 <armax> Rajeev, Swami: let’s put at rest option 2.
The only reason I brought it up was because we found so many issues with namespace placement that I was advocating for option 2, only as a first step and then walk back to address the performance issues
15:49:36 <armax> but that ship has sailed in my opinion, it seems we got a good handle on how to deal with snat/ir namespaces etc.
15:49:40 <carl_baldwin> armax: That is a bit weird.
15:50:01 <Rajeev> armax: understood. thanks for explaining.
15:50:02 <armax> so let’s stick with option 1
15:50:13 <Swami> agreed.
15:50:21 <armax> and perhaps let’s document somewhere in the code where/when the placement happens
15:50:33 <armax> because just reading the code it’s not super clear
15:50:54 <carl_baldwin> armax: +1 we need to be clear about this.
15:50:58 <Swami> armax: carl_baldwin: I have initiated a thread with the FWaaS team on this issue and probably we might fix this behavior for the DVR with their current implementation.
15:51:01 <Rajeev> +1
15:51:54 <armax> Swami: the other reason why I brought option 2 up was because of this b
15:51:56 <armax> bug
15:52:10 <armax> #link https://bugs.launchpad.net/neutron/+bug/1358998
15:52:11 <uvirtbot> Launchpad bug 1358998 in neutron ""No L3 agents can host the router..." traces for DVR" [Medium,New]
15:52:14 <armax> not a bug per se...
15:52:34 <armax> but the fact that we’re looking a failure in router placement is confusing to the eye of the admin
15:52:54 <armax> anyhow, let’s go back to fwaas/dvr
15:52:56 <Swami> armax: yes that's what leaded us to this discussion here in the IRC
15:53:13 <armax> gotcha
15:53:14 <carl_baldwin> armax: I haven’t totally wrapped my head around the problem. It could just be a misleading log message.
15:53:23 <armax> carl_baldwin: it is indeed
15:53:33 <carl_baldwin> Or, maybe it is a bigger problem?
15:53:51 <armax> carl_baldwin: right…we can’t tell for sure and it depends on each case
15:54:02 <carl_baldwin> Hence my inability to make up my mind about the bugs Importance.
:)
15:54:39 <Swami> armax: carl_baldwin: The log messages requires some clean up, there are some misleading logs. We will try to clean it up as part of the snat fixes.
15:54:41 <armax> hence the whole story around option 2; if we had chosen that path initially only to improve it afterwards we would’ve been able to tell
15:55:27 <Swami> ok.
15:55:39 <Swami> #topic DVR migration patch
15:56:17 <Swami> Mike is currently working on the migration patch, but he is also having challenges in migrating a legacy router with multiple subnets to a DVR router.
15:56:56 <mrsmith> Swami: yes - a couple probs
15:57:10 <Swami> If VMs are scattered across multiple subnets for a legacy router, there is no cleaner way to migrate the legacy router and create IR on all respective compute Nodes.
15:57:29 <mrsmith> the current code doesn't handle multiple subnets well - so there is a fix for that in progress
15:58:07 <mrsmith> it comes back to our previous discussion on IRs on CNs and checking for VMs being present or not
15:58:36 <Swami> mrsmith: thanks for the update
15:58:37 <mrsmith> I keep hitting db errors related to open sessions and not rolling back previous sessions
15:58:57 <mrsmith> hopefully I will have more progress today
15:59:02 <mrsmith> Swami: np
15:59:15 <Swami> #topic Open Discussion
15:59:24 <mrsmith> quick - time low
15:59:36 <Swami> Any other items that we need to discuss
16:00:00 <Swami> Ok folks, thanks for joining the meeting.
16:00:13 <Swami> If we have any other topic we can discuss tomorrow in the L3 meeting.
16:00:18 <carl_baldwin> mrsmith: Is there any way to fix this in stages? Maybe throw an exception if there are multiple subnets in the first patch and follow on with other patches?
16:00:18 <Swami> bye
16:00:36 <mrsmith> carl_baldwin - yes possible
16:00:43 <mrsmith> I have considered that
16:00:48 <Swami> carl_baldwin: that might be a possibility. We will try to explore more on this today and give an update.
16:00:57 <mrsmith> k
16:01:01 <carl_baldwin> mrsmith: Swami: thanks
16:01:07 <Swami> sorry we are at the end of hour.
16:01:07 <Swami> bye
16:01:11 <Swami> #endmeeting
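
Note on the Horizon item above: the dropdown behaviour Bhooshan describes ("Use server default" sends no "distributed" flag in the REST request) could be sketched roughly as follows. This is only an illustration, not the code in the patch under review. The function name, the dropdown keys and the assumption that "Centralized" sends distributed=False are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical keys for the three dropdown entries; the real Horizon
# form may use different values.
SERVER_DEFAULT = "server_default"
DISTRIBUTED = "distributed"
CENTRALIZED = "centralized"


def build_router_create_body(name: str, mode: str = SERVER_DEFAULT) -> Dict[str, Any]:
    """Build a router-create request body from the dropdown choice (sketch)."""
    if mode not in (SERVER_DEFAULT, DISTRIBUTED, CENTRALIZED):
        raise ValueError("unknown router mode: %r" % (mode,))
    router: Dict[str, Any] = {"name": name}
    # "Use server default": leave the flag out so the server decides.
    if mode != SERVER_DEFAULT:
        # Assumption: "Centralized" is sent as an explicit distributed=False.
        router["distributed"] = (mode == DISTRIBUTED)
    return {"router": router}
```

With this sketch, choosing "Use server default" yields a body with only the router name, so whichever default the server is configured with applies.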