18:01:59 <SumitNaiksatam> #startmeeting networking_policy
18:02:00 <openstack> Meeting started Thu Sep 17 18:01:59 2015 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:03 <openstack> The meeting name has been set to 'networking_policy'
18:02:17 <SumitNaiksatam> #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#Sept_3rd.2C_10th.2C_17th_2015
18:02:29 <SumitNaiksatam> not much change in the agenda from last time
18:02:46 <SumitNaiksatam> our plan was to release l1 on sept 20th
18:03:04 <SumitNaiksatam> let me know if there are any concerns
18:03:38 <SumitNaiksatam> #topic Integration tests
18:03:46 <rkukura> SumitNaiksatam: Do you mean “out plan is …”?
18:03:47 <SumitNaiksatam> per last week, i had posted #link https://review.openstack.org/#/c/222048
18:03:54 <rkukura> s/out/our/
18:04:25 <SumitNaiksatam> rkukura: yes, the plan was and still “is” ;-)
18:04:37 <rkukura> thanks
18:04:48 <SumitNaiksatam> i need to update the license header
18:05:05 <SumitNaiksatam> ivar-lazzaro: thanks for catching that, i made a mess in the search and replace
18:05:36 <SumitNaiksatam> anything else to discuss on that patch?
18:06:03 <SumitNaiksatam> okay
18:06:04 <SumitNaiksatam> #topic Packaging
18:06:06 <ivar-lazzaro> I was a bit confused about the comment on subprocess
18:06:21 <SumitNaiksatam> #undo
18:06:22 <openstack> Removing item from minutes: <ircmeeting.items.Topic object at 0x9cc1fd0>
18:06:25 <ivar-lazzaro> "This has NO USE of subprocess module"
18:06:47 <ivar-lazzaro> and yet that is imported below... Am I missing something? :)
18:06:55 <SumitNaiksatam> ivar-lazzaro: hopefully jishnu can respond to that, it's his code verbatim
18:07:14 <ivar-lazzaro> ok
18:07:22 <ivar-lazzaro> will catch up with him
18:07:55 <SumitNaiksatam> #topic Packaging
18:08:23 <rkukura> No news from me on this - I’ve been focusing on bugs and reviews, plus had some PTO
18:09:28 <SumitNaiksatam> rkukura: okay
18:09:56 <SumitNaiksatam> #topic Bugs
18:11:12 <SumitNaiksatam> rkukura: want to discuss the patch you posted yesterday?
18:11:24 <rkukura> sure
18:11:31 <SumitNaiksatam> #link https://review.openstack.org/#/c/224293/
18:11:53 <SumitNaiksatam> rkukura: so regarding the question on the success reported
18:11:58 <SumitNaiksatam> on the rally job
18:12:05 <rkukura> Apparently, the IP Pool overlaps exception is being hit via some codepath other than the one I fixed.
18:12:12 <SumitNaiksatam> rkukura: okay
18:12:30 <rkukura> I do see failures in http://logs.openstack.org/24/166424/44/check/gate-group-based-policy-dsvm-rally/eb4a553//logs/rally-task-results.txt.gz
18:12:35 <SumitNaiksatam> rkukura: right
18:13:01 <SumitNaiksatam> so the rally job is not instrumented very leniently right now
18:13:20 <SumitNaiksatam> so it does not report failures unless it's a 0% failure on a particular test
18:13:36 <SumitNaiksatam> i meant it *is* lenient
18:13:37 <rkukura> But I’m not seeing the new logging I added in the q-svc log that I’d expect if this was happening during implicit default l3p creation
18:13:46 <ivar-lazzaro> rkukura: have you considered using overlapping name as per my comment?
18:14:02 <rkukura> ivar-lazzaro: I’m not sure what you mean by overlapping name
18:14:29 <ivar-lazzaro> rkukura: we could verify in the create_l3p_precommit that no other L3P with name "default" exists for that tenant
18:14:57 <ivar-lazzaro> this will be more reliable, and there are other places where this is done already
18:15:07 <rkukura> so your idea is to head off creation before we get to the postcommit
18:15:10 <SumitNaiksatam> ivar-lazzaro: so “default” would be a reserved name
18:15:12 <ivar-lazzaro> (for instance, I think you can't create a default SG in Neutron)
18:15:23 <ivar-lazzaro> SumitNaiksatam: yes
18:15:35 <SumitNaiksatam> ivar-lazzaro: you are right, there is that precedent in neutron
18:16:00 <ivar-lazzaro> that way we won't need to change the implementation once we fix the overlapping IP limitation
18:16:03 <rkukura> ivar-lazzaro: This may be the way to go, but I’d like to understand what is happening currently that is preventing my fix from working
18:16:37 <rkukura> I should be able to try ivar-lazzaro’s suggestion later today
18:16:52 <ivar-lazzaro> rkukura: sounds like a plan!
18:16:54 <SumitNaiksatam> rkukura: i agree, it will be really helpful to understand what code blocks are problem areas for us from a concurrency perspective
18:17:43 <SumitNaiksatam> rkukura: so regarding the rally job status, the “success” report is kind of misleading and we need to change that
18:18:19 <rkukura> SumitNaiksatam: Do we want to fix the underlying issue first so that we really do expect 100% success?
18:18:28 <SumitNaiksatam> rkukura: so we should decide what level of concurrency error is tolerable, or we can just report failure if any test scenario does not succeed 100% of the time
18:18:57 <SumitNaiksatam> rkukura: yes sure, lets fix this, and see it get to 100%, and then we can consider changing the job reporting
18:19:07 <rkukura> SumitNaiksatam: +1
18:19:53 <SumitNaiksatam> this is actually the reason why i had kept it the way it is, because we had latent issues that we needed to fix before we put up a benchmark
18:20:00 <SumitNaiksatam> rkukura: thanks for the update on the patch
18:20:13 <SumitNaiksatam> rkukura: so you were able to find all the logs that you were looking for?
18:20:35 <ivar-lazzaro> SumitNaiksatam: I'd rather keep it non voting
18:20:40 <rkukura> SumitNaiksatam: yes, although I wish GBP did more logging in q-svc.log
18:20:44 <SumitNaiksatam> ivar-lazzaro: it is non-voting
18:20:48 <ivar-lazzaro> SumitNaiksatam: and have people's attention on concurrency issues
18:20:55 <ivar-lazzaro> when they come
18:21:01 <ivar-lazzaro> instead of making it more forgiving
18:21:03 <SumitNaiksatam> rkukura: agreed
18:21:11 <ivar-lazzaro> people don't look at SUCCESS logs ;)
18:21:23 <SumitNaiksatam> ivar-lazzaro: i agree
18:22:06 <SumitNaiksatam> for now, just spreading the word around that we should examine rally job logs to see the actual passing percentage per scenario
18:23:50 <SumitNaiksatam> ivar-lazzaro: do we need to discuss any of your patches?
18:24:30 <ivar-lazzaro> Does anyone have any question on them?
18:24:42 <ivar-lazzaro> not sure which one in particular could need attention
18:25:19 <SumitNaiksatam> ivar-lazzaro: i believe we decided that https://review.openstack.org/212707 and https://review.openstack.org/166424 were ready
18:25:25 <SumitNaiksatam> based on mageshgv’s testing
18:25:38 <SumitNaiksatam> so we will move forward with them shortly
18:26:00 <ivar-lazzaro> yes, I need to remove the wip
18:26:01 <SumitNaiksatam> i believe #link https://review.openstack.org/224383 is also good
18:26:24 <rkukura> Was just going to mention the WIP in https://review.openstack.org/#/c/212707/
18:27:55 <ivar-lazzaro> done
18:27:55 <SumitNaiksatam> mageshgv: hi
18:28:00 <SumitNaiksatam> ivar-lazzaro: thanks ;-)
18:28:01 <mageshgv> SumitNaiksatam: hi
18:28:21 <rkukura> I think I missed the last couple lines - anything important?
18:28:23 <SumitNaiksatam> mageshgv: any feedback on the above discussion, or anything you are blocked on?
18:28:41 <SumitNaiksatam> rkukura: i dont think so
18:28:46 <rkukura> ok
18:29:17 <mageshgv> SumitNaiksatam: I ran into a few minor issues with provider owned patch, will add them to the review
18:29:33 <SumitNaiksatam> mageshgv: ah good to know, timely feedback
18:30:36 <SumitNaiksatam> mageshgv: are you still pursuing this #link https://review.openstack.org/212676
18:30:37 <SumitNaiksatam> ?
18:31:20 <mageshgv> SumitNaiksatam: yes, will have to enforce the check on drivers only for the service chains
18:31:30 <SumitNaiksatam> mageshgv: okay
18:32:04 <SumitNaiksatam> rkukura: any chance you were able to explore #link https://bugs.launchpad.net/group-based-policy/+bug/1417312 or
18:32:05 <openstack> Launchpad bug 1417312 in Group Based Policy "GBP: Existing L3Policy results in failure to create PTG with default L2Policy" [High,Confirmed] - Assigned to Robert Kukura (rkukura)
18:32:13 <SumitNaiksatam> https://bugs.launchpad.net/group-based-policy/+bug/1470646
18:32:14 <openstack> Launchpad bug 1470646 in Group Based Policy "Deleting network associated with L2P results in infinite loop" [High,Triaged] - Assigned to Robert Kukura (rkukura)
18:32:29 <rkukura> I’ve looked at both a bit
18:34:20 <rkukura> On the first, is the bug stating we need the ability to configure different default IP pools for L3Ps named ‘default’ than for L3Ps not named ‘default’?
18:35:30 <rkukura> And on the 2nd, do we want to consider monkey patching neutron to avoid the loop?
18:36:04 <SumitNaiksatam> rkukura: for the first i think the observation was that using an IP pool which overlaps the “default” ip_pool results in error but leaves some dangling resources
18:36:17 <SumitNaiksatam> the error is possibly coming from post-commit
18:36:35 <SumitNaiksatam> so i believe the suggestion is to catch the overlapping ip_pool upfront
18:36:51 <SumitNaiksatam> on the second one, it sounds fine to me to patch
18:36:57 <rkukura> SumitNaiksatam: I may have missed the dangling resources aspect of this
18:37:01 <SumitNaiksatam> i guess we probably dont have too many other options
18:37:25 <ivar-lazzaro> SumitNaiksatam: shouldn't be a postcommit check though
18:38:09 <SumitNaiksatam> ivar-lazzaro: you are saying it should not be, it is currently not the case?
18:38:21 <SumitNaiksatam> * or it's currently not the case
18:38:45 <ivar-lazzaro> SumitNaiksatam: I think this is not the case today
18:38:52 <ivar-lazzaro> SumitNaiksatam: but let me check very quickly
18:39:05 <SumitNaiksatam> rkukura: lets follow up with jishnub (bug reporter) on what inconsistent state he was seeing
18:39:07 <SumitNaiksatam> ivar-lazzaro: thanks
18:39:29 <ivar-lazzaro> SumitNaiksatam: also... What driver are we talking about
18:39:32 <ivar-lazzaro> ? :)
18:39:35 <rkukura> SumitNaiksatam: Sounds like the PTG itself isn’t getting cleaned up
18:39:37 <SumitNaiksatam> ivar-lazzaro: RMD
18:39:56 <SumitNaiksatam> rkukura: okay
18:40:36 <ivar-lazzaro> #link https://github.com/stackforge/group-based-policy/blob/master/gbpservice/neutron/services/grouppolicy/drivers/resource_mapping.py#L827-L829
18:40:41 <ivar-lazzaro> that's a precommit operation
18:40:53 <ivar-lazzaro> this failure should rollback everything
18:41:36 <rkukura> Is the L3P getting created during the PTG’s postcommit?
18:41:39 <ivar-lazzaro> But I think we are dealing with the typical "IPD plot twist" issue :D
18:42:02 <SumitNaiksatam> ivar-lazzaro: what if the ip_pool is a subset of the default ip_pool?
18:42:20 <ivar-lazzaro> rkukura: probably, but on any exception during Creation we delete the whole thing
18:42:21 <rkukura> We should be deleting the resource when an exception is raised during postcommit of a create
18:42:53 <ivar-lazzaro> rkukura: but since the L3P was already deleted within its precommit, IPD doesn't find it and fails during deletion
18:43:03 <ivar-lazzaro> rkukura: we do, for all resources
18:44:05 <SumitNaiksatam> it's quite possible that this bug no longer manifests exactly as it was observed originally
18:44:29 <rkukura> ivar-lazzaro: I see errors in q-svc.log like: ERROR gbpservice.neutron.services.grouppolicy.plugin [req-61d2038c-d838-4805-825c-c3d8ebaba95a None None] delete_policy_target_group_postcommit failed for policy_target_group 1b4866b6-eea6-4180-a4c2-9681c87690e4
18:44:35 <SumitNaiksatam> but i dont think we have updated this part of the RMD driver code for a while now, so it might still be there
18:44:37 <ivar-lazzaro> SumitNaiksatam: in that case (subset ip pool) it should still fail
18:44:48 <SumitNaiksatam> ivar-lazzaro: okay
18:44:48 <rkukura> ivar-lazzaro: Do you think these are from the IPD driver in the case above?
18:45:00 <ivar-lazzaro> rkukura: yes
18:45:17 <ivar-lazzaro> rkukura: The other issue we have there is that we don't LOG the error
18:45:29 <ivar-lazzaro> rkukura: actually we do, but it should be a LOG.exception instead
18:46:42 <SumitNaiksatam> ivar-lazzaro: so yeah, this sounds like the IPD, RMD interaction issue where the rollback can happen only up to a certain extent
18:46:43 <ivar-lazzaro> good news is that this issue seems reproducible via UTs
18:46:54 <SumitNaiksatam> in this case the new L3P does not get created
18:47:17 <SumitNaiksatam> but the PTG in error is not rolled back
18:47:30 <ivar-lazzaro> SumitNaiksatam: it is rolled back! But the rollback fails
18:47:45 <ivar-lazzaro> SumitNaiksatam: that's what I suspect at least
18:48:01 <SumitNaiksatam> ivar-lazzaro: yeah, i mean the net effect being that the PTG stays in the DB
18:48:06 <rkukura> I’ll dig into this one once I’ve updated the fix for the 1st bug
18:48:13 <SumitNaiksatam> in an inconsistent state
18:48:23 <SumitNaiksatam> rkukura: nice, thanks
18:48:53 <SumitNaiksatam> #topic Open Discussion
18:49:07 <SumitNaiksatam> just wanted to update there were some gate issues yesterday
18:49:20 <SumitNaiksatam> new oslo and client libraries were released yesterday
18:49:34 <SumitNaiksatam> and they caused kilo branch breakages
18:49:51 <SumitNaiksatam> we had to cap some oslo lib versions to get beyond this
18:50:07 <SumitNaiksatam> also currently the gate seems to be stalled
18:50:40 <SumitNaiksatam> i noticed in the morning that the pypi mirror for openstack was not finding the right pbr version
18:51:08 <SumitNaiksatam> i suspect that at this point the jobs are only queued, but nothing is actually executing
18:51:56 <SumitNaiksatam> i also have not posted the client fix for the stable/juno branch which will align it with neutronclient 2.3.12
18:52:07 <SumitNaiksatam> will try to do that shortly, will let you know
18:52:11 <SumitNaiksatam> anything else?
18:52:53 <SumitNaiksatam> alright, thanks everyone!
18:52:56 <SumitNaiksatam> bye!
18:52:59 <rkukura> thanks SumitNaiksatam!
18:53:10 <ivar-lazzaro> bye!
18:53:15 <SumitNaiksatam> #endmeeting
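The guard ivar-lazzaro proposes in the meeting (verify in create_l3p_precommit that no other L3P named "default" exists for the tenant, mirroring how Neutron reserves the "default" security group) might be sketched roughly as below. This is a minimal illustration only: the exception class, function signature, and dict shapes are hypothetical stand-ins, not the actual GBP driver API, and a real driver would query the DB inside the precommit transaction rather than take a list.

```python
# Hypothetical sketch of the reserved-name precommit check discussed above.
RESERVED_L3P_NAME = "default"


class ReservedL3PolicyNameInUse(Exception):
    """Raised when the tenant already owns an L3 Policy named 'default'."""


def create_l3p_precommit(existing_l3ps, new_l3p):
    """Abort creation in precommit instead of failing later in postcommit.

    existing_l3ps: iterable of dicts with 'tenant_id' and 'name' keys,
    standing in for a DB query scoped to the new policy's tenant.
    new_l3p: dict describing the L3 Policy being created.
    """
    if new_l3p["name"] != RESERVED_L3P_NAME:
        return  # only the reserved name is restricted
    for l3p in existing_l3ps:
        if (l3p["tenant_id"] == new_l3p["tenant_id"]
                and l3p["name"] == RESERVED_L3P_NAME):
            # Raising here rolls back the whole create, so no dangling
            # resources are left behind by a postcommit failure.
            raise ReservedL3PolicyNameInUse(
                "tenant %s already has an L3 Policy named %r"
                % (new_l3p["tenant_id"], RESERVED_L3P_NAME))
```

The point of doing this in precommit, as discussed above, is that the exception fires inside the DB transaction, so the create is rolled back cleanly instead of tripping the IP-pool overlap error in postcommit and leaving partially created resources.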