15:00:33 <Swami> #startmeeting distributed_virtual_router
15:00:34 <openstack> Meeting started Wed Oct 8 15:00:33 2014 UTC and is due to finish in 60 minutes. The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'distributed_virtual_router'
15:01:14 <Swami> #info RC1 cut happened last Thursday
15:01:29 <Swami> If anyone is testing the DVR code please make sure that you are testing the RC1 code.
15:01:46 <Swami> #topic Bugs
15:02:02 <Swami> There are couple of high bugs that we are currently working on
15:02:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1377241
15:02:36 <uvirtbot> Launchpad bug 1377241 in neutron "Lock wait timeout on delete port for DVR" [High,In progress]
15:03:34 <Swami> I am working on the above bug and I have posted a patch for review. It is still a WIP. Please take a look at it and provide your comments.
15:03:45 <viveknarasimhan> sure Swami
15:03:58 <Swami> #link https://review.openstack.org/#/c/124849/
15:04:37 <Swami> viveknarasimhan: yes this has the lock for the csnat delete and also I made a minor change for the gateway-clear to call the csnat-delete only when the gateway is associated with the current network id.
15:05:11 <Swami> There was some issue with the upstream test_requirements, I could not run tox last night since it was broken, I will check it again today and see if it works.
15:05:53 <Swami> This lockwait bug exposes other areas where there might be a timing issue.
15:06:09 <ChuckC> Swami: unit tests need more setup now
15:06:18 <viveknarasimhan> ok Swami
15:06:50 <Swami> Because of the router_interface_delete and gateway_clear not ordered, there is more timing issues.
15:07:09 <Swami> This was the reason that we are also seeing the DBDuplicateError for Snat agent binding.
15:07:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1378468
15:07:35 <uvirtbot> Launchpad bug 1378468 in neutron "DBDuplicateError found sometimes when router_interface_delete issued with DVR" [Undecided,In progress]
15:07:43 <ChuckC> Swami: other tests also. as of 10/6
15:08:03 <Swami> ChuckC: can you elaborate on the setup please.
15:08:31 <ChuckC> You need to install postgresql postgresql-contrib postgresql-clien
15:08:50 <ChuckC> and postgresql-server-dev-9.3
15:09:18 <Swami> Should it be done manually
15:09:29 <ChuckC> to run tests other than unit tests, more setup is needed, but I think devstack will handle that
15:09:31 <Swami> will it not pull it automatically from the requirements.txt file
15:09:56 <Swami> Right now devstack is broken, it is not able to pull all the information.
15:10:12 <Swami> It was giving me some errors on mysql.conf not found or something.
15:10:25 <viveknarasimhan> did you try
15:10:30 <viveknarasimhan> run_tests.sh instead of tox?
15:10:33 <ChuckC> These are not in requirements.txt, but I think devstack will cover them once a fix merges (let me look)
15:10:53 <ChuckC> viveknarasimhan: I just ran tox
15:11:26 <Swami> viveknarasimhan: last night I ran both, after cleaning up my existing .venv and .tox
15:11:33 <Swami> viveknarasimhan: but did not succeed.
15:12:17 <Swami> Ok I will again today by manually installing the packages that chuck mentioned.
15:12:26 <viveknarasimhan> Ok
15:12:38 <Swami> coming back to our discussion.
15:12:46 <viveknarasimhan> when you say Tox failure i assume its not the old proxy issue we discussed
15:12:52 <Swami> For the above bug with DBDuplicateError I have posted a patch.
15:13:08 <Swami> #link https://review.openstack.org/#/c/126793/
15:13:29 <Swami> viveknarasimhan: yes not the old proxy, since i was running it from my home yesterday.
15:13:35 <viveknarasimhan> ok
15:13:58 <ChuckC> the review is https://review.openstack.org/#/c/126175, but it's for gate, not devstack
15:14:12 <Swami> In order to work around this timing issue I went back to introducing the hints, and based on the hints, if it is router_interface_action I will not call the schedule_snat.
15:15:22 <Swami> I am not sure if this is the optimal solution for this timing issue, but in the short term, this helps by preventing any calls to schedule_router.
15:15:53 <viveknarasimhan> ok i will look closer
15:16:07 <Swami> viveknarasimhan: Also ChuckC is working on a patch that Carl Baldwin started for removing the rpc from the delete_port, can you review it.
15:16:16 <ChuckC> https://review.openstack.org/#/c/122880
15:16:37 <ChuckC> thanks, Swami
15:16:42 <viveknarasimhan> i see that is WIP
15:16:45 <Swami> ChuckC: You might have to create a bug on the launchpad for this issue, I did see that it did not have a bug id on it.
15:16:47 <viveknarasimhan> but, yes i will review that as well
15:17:11 <ChuckC> Swami: I don't really have context for submitting a bug (I don't know the symptoms)
15:17:31 <Swami> ChuckC: no problem
15:17:41 <ChuckC> I need some help here, since I'll need to test the fix somehow also
15:17:48 <Swami> I will file a bug on that for splitting the rpc from db transactions for delete_port
15:18:03 <Swami> Once I file the bug I will let you know.
15:18:12 <ChuckC> thanks Swami
15:18:35 <Swami> By the way last couple of days I have been testing your patch along with my fix for the DB lockwait timeout, I did not see any issues.
15:18:39 <ChuckC> viveknarasimhan: I don't think it's WIP any more
15:19:17 <ChuckC> Swami: great!
15:19:27 <viveknarasimhan> ChuckC: Ok , will review it
15:19:40 <Swami> viveknarasimhan: With both my patches that I mentioned above, I still see some errors related to dhcp when I run the clean up script.
15:19:49 <Swami> I sent you an email about this.
15:20:21 <Swami> There is a "KeyError network_id" in dhcp_rpc.py. I already filed a bug upstream on this, and someone is working on it.
15:20:49 <Swami> #link https://bugs.launchpad.net/bugs/1378508
15:20:50 <uvirtbot> Launchpad bug 1378508 in neutron "KeyError in DHPC RPC when port_update happens.- this is seen when a delete_port event occurs " [Undecided,New]
15:20:57 <viveknarasimhan> I couldn't spend time on the DHCP bug today
15:21:10 <viveknarasimhan> due to interview calls and VLAN/FLAT bridge overlapping troubleshooting
15:21:12 <Swami> This is not related to DVR so I did not tag it as l3-dvr-backlog.
15:21:22 <viveknarasimhan> i will look at both your reviews
15:21:34 <viveknarasimhan> but DHCP might be a different problem you have uncovered
15:21:40 <viveknarasimhan> unrelated to the patch..
15:21:48 <Swami> Also I do see some "SAWarning" messages in the logs
15:22:03 <Swami> But these are not critical.
15:22:25 <Swami> But I am not sure what are the side effects of these warning messages,
15:23:12 <Swami> I also see some log messages such as "Will not send event port_delete_end for network 3db2c093-c033-4b18-860b-2691892aaea7: no agent available. Payload: {'port_id': u'c7f99dd3-1523-4128-85d2-ec15ff0f7f2e'}"
15:24:35 <viveknarasimhan> those are audit logging messages
15:24:37 <Swami> All these are related to "timing" issues that is my hunch
15:24:50 <viveknarasimhan> the xxx_delete_start and xxx_delete_end where xxx represents the resource
15:25:03 <Swami> Why those messages are shown as red.(error)
15:25:44 <Swami> These are not seen on all logs, but it occurs sometimes.
15:26:12 <Swami> Like yesterday I ran the regression testing for clean up around 20 times and I did see these errors couple of times.
15:27:40 <Swami> getting back to the bugs
15:27:48 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1376325
15:27:50 <uvirtbot> Launchpad bug 1376325 in neutron "Cannot enable DVR and IPv6 simultaneously" [Medium,New]
15:28:00 <Swami> This bug is related to IPv6 configuration with DVR
15:28:19 <viveknarasimhan> we did not promote support of IPv6
15:28:27 <viveknarasimhan> so i am not sure why this is considered bug
15:28:36 <viveknarasimhan> this would be a feature
15:28:42 <Swami> Upstream does support IPv6, but our l3-agent kind of does not like it.
15:28:43 <haleyb> I was just thinking about that one, and if there was an easy fix to add the ip rule correctly
15:28:57 <Swami> haleyb: hi
15:29:23 <Swami> Yes this week we don't have rajeev and mike, they are on vacation, so we are limited by resources
15:29:52 <Swami> I will ask rajeev to take a look at it.
15:30:20 <haleyb> hi swami. i'll see if i have time to look further, it's obvious what is broken, just not sure if a simple fix will get it working. I'm not looking at optimizing things
15:30:25 <Swami> haleyb: I am not sure about the level of that bug.
15:31:13 <Swami> haleyb: sure take a look at it and get an estimate on that work,
15:31:22 <haleyb> medium or low, armando had marked as medium
15:31:31 <Swami> Once we wrap up the critical ones we can come back to these backlog items.
15:32:33 <Swami> I think that's all I had for the bugs
15:33:09 <Swami> Do you guys have any other topic to discuss today.
15:33:31 <Swami> #topic DVR Backlog
15:33:57 <Swami> viveknarasimhan: I have not filed bugs for all the backlog items yet
15:34:11 <viveknarasimhan> Please file the bugs
15:34:12 <Swami> viveknarasimhan: I will do it today as per our earlier discussion
15:34:32 <viveknarasimhan> sure.. tag them as l3-dvr-backlog only if failure happens with DVR enabled
15:34:37 <Swami> viveknarasimhan: Once I file the bugs I will let you know and if I have missed anything you can let me know.
15:35:31 <Swami> #topic DVR-Documentation
15:35:47 <Swami> There are couple of bugs filed for DVR-Documentation.
15:35:54 <viveknarasimhan> ok sure
15:36:09 <Swami> Armando wanted us to work on those after we complete the bugs for the Juno RC.
15:36:35 <Swami> I have asked Vinod to take a look at it and it seems Vinod was ok with starting with one of the doc bugs.
15:36:53 <Swami> I have also spoken to Edgar on this to get the ownership of the DVR documentation
15:37:19 <Swami> Vivek, may be if you and me can help Vinod we can complete the documentation for the DVR.
15:37:29 <viveknarasimhan> i had a discussion with Vinod today
15:37:38 <viveknarasimhan> he will be able to address doc bugs for L2
15:37:55 <Swami> carl_baldwin: hi
15:37:57 <viveknarasimhan> but for l3 extensions, we felt it will be good if you could guide
15:38:19 <Swami> sure, I can help vinod if he have any questions
15:38:52 <Swami> Let me see how these bugs end up and then take up the doc work.
15:39:23 <Swami> #topic Open Discussion
15:39:26 <carl_baldwin> Swami: sorry to be late.
15:39:36 <Swami> carl_baldwin: no worries
15:39:45 <Swami> I just went over the bugs.
15:39:51 <ChuckC> carl_baldwin: Swami has volunteered to submit a bug for 122880
15:39:57 <Swami> Most of our discussion was with the DB lockwait timeout issue.
15:40:33 <carl_baldwin> Swami: 122880 didn’t help with that in your testing, right?
15:40:47 <Swami> carl_baldwin: Yes it did not help
15:40:57 <Swami> But it is good to have it.
15:41:23 <Swami> Adding a lock to the transaction that does the "csnat_port_delete" helps a bit.
15:41:41 <Swami> I have posted a couple of patches upstream as WIP for review.
15:42:06 <Swami> #link https://review.openstack.org/#/c/124849/
15:42:20 <Swami> #link https://review.openstack.org/#/c/126793/
15:42:40 <Swami> With both these patches I don't see the "Internal Server Error" any more.
15:43:00 <carl_baldwin> Swami: Have you seen my latest comment on https://review.openstack.org/#/c/105855/ ?
15:43:04 <Swami> Please take a look at it and let me know your thoughts.
15:43:22 <carl_baldwin> Swami: I’ll have a look.
15:44:08 <Swami> carl_baldwin: no I have been heads down on the lockwait issue, I will take a look at it.
15:44:22 <carl_baldwin> Maybe it can wait until mrsmith is back.
15:44:55 <Swami> no problem I will check it out and fix it, if it is minor one.
15:45:29 <viveknarasimhan> i have question for bhailey:
15:45:30 <carl_baldwin> I’m not sure it is minor but you can probably judge that better.
15:45:52 <haleyb> viveknarasimhan: i'm here
15:45:52 <carl_baldwin> viveknarasimhan: do you mean haleyb ?
15:46:02 <viveknarasimhan> yes
15:46:07 <viveknarasimhan> for review 123911
15:47:00 <viveknarasimhan> i see PS11 came back full circle to be PS8
15:47:42 <viveknarasimhan> haleyb: please let us know the self.dvr_agent attribute error issue not popping out
15:47:44 <haleyb> Yes, removing that self.dvr_agent check broke the tests, since it immediately calls _report_state() and blows up
15:47:50 <viveknarasimhan> on report_state running earlier than rpc_loop
15:48:17 <viveknarasimhan> also pls feel free to let us know
15:48:25 <viveknarasimhan> any help that might be required on 123911
15:48:57 <Swami> Any other topics
15:49:14 <Swami> if not we can end this meeting
15:49:15 <haleyb> viveknarasimhan: i think it's ready to go now, if you just want to review and give a +1/-1 as needed
15:49:26 <viveknarasimhan> PS 11 looks ok to me
15:49:34 <viveknarasimhan> will give +1 later today
15:49:51 <Swami> can you post the link in here for reference.
15:50:10 <viveknarasimhan> https://review.openstack.org/#/c/123911/8..11/neutron/plugins/openvswitch/agent/ovs_dvr_neutron_agent.py
15:50:25 <Swami> viveknarasimhan: thanks
15:51:06 <Swami> thanks everyone for joining the meeting
15:51:14 <Swami> see you all next week
15:51:17 <Swami> bye
15:51:20 <viveknarasimhan> good day swami, carl, brian , chuck
15:51:25 <Swami> #endmeeting
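
Note on the self.dvr_agent exchange at 15:47: the failure described is a state report that runs before the attribute it reads has been assigned. The sketch below is only an illustration of that ordering problem and one possible guard. It is not the code from review 123911; SketchAgent, enable_dvr, reports and dvr_ready are hypothetical names, and only dvr_agent and _report_state are taken from the log.

```python
class SketchAgent:
    """Hypothetical stand-in for an agent; not the Neutron OVS agent."""

    def __init__(self, enable_dvr):
        self.reports = []
        # Assumption: the first state report fires during construction,
        # before dvr_agent has been assigned, mimicking _report_state()
        # running earlier than rpc_loop.
        self._report_state()
        self.dvr_agent = object() if enable_dvr else None
        self._report_state()

    def _report_state(self):
        # getattr with a default avoids an AttributeError when the
        # attribute has not been set yet.
        dvr_agent = getattr(self, "dvr_agent", None)
        self.reports.append({"dvr_ready": dvr_agent is not None})


agent = SketchAgent(enable_dvr=True)
print(agent.reports)
```

Without the getattr default, the first _report_state() call in this sketch would raise AttributeError, which is the kind of early blow-up haleyb describes.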