18:01:59 #startmeeting networking_policy
18:02:00 Meeting started Thu Sep 17 18:01:59 2015 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:03 The meeting name has been set to 'networking_policy'
18:02:17 #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#Sept_3rd.2C_10th.2C_17th_2015
18:02:29 not much change in the agenda from last time
18:02:46 our plan was to release l1 on sept 20th
18:03:04 let me know if there are any concerns
18:03:38 #topic Integration tests
18:03:46 SumitNaiksatam: Do you mean “out plan is …”?
18:03:47 per last week, i had posted #link https://review.openstack.org/#/c/222048
18:03:54 s/out/our/
18:04:25 rkukura: yes, the plan was and still “is” ;-)
18:04:37 thanks
18:04:48 i need to update the license header
18:05:05 ivar-lazzaro: thanks for catching that, i made a mess in the search and replace
18:05:36 anything else to discuss on that patch?
18:06:03 okay
18:06:04 #topic Packaging
18:06:06 I was a bit confused about the comment on subprocess
18:06:21 #undo
18:06:22 Removing item from minutes:
18:06:25 "This has NO USE of subprocess module"
18:06:47 and yet that is imported below... Am I missing something? :)
18:06:55 ivar-lazzaro: hopefully jishnu can respond to that, it's his code verbatim
18:07:14 ok
18:07:22 will catch up with him
18:07:55 #topic Packaging
18:08:23 No news from me on this - I’ve been focusing on bugs and reviews, plus had some PTO
18:09:28 rkukura: okay
18:09:56 #topic Bugs
18:11:12 rkukura: want to discuss the patch you posted yesterday?
18:11:24 sure
18:11:31 #link https://review.openstack.org/#/c/224293/
18:11:53 rkukura: so regarding the question on the success reported
18:11:58 on the rally job
18:12:05 Apparently, the IP Pool overlaps exception is being hit via some codepath other than the one I fixed.
18:12:12 rkukura: okay
18:12:30 I do see failures in http://logs.openstack.org/24/166424/44/check/gate-group-based-policy-dsvm-rally/eb4a553//logs/rally-task-results.txt.gz
18:12:35 rkukura: right
18:13:01 so the rally job is not instrumented very leniently right now
18:13:20 so it does not report failure unless a particular test has a 0% success rate
18:13:36 i meant it *is* lenient
18:13:37 But I’m not seeing the new logging I added in the q-svc log that I’d expect if this was happening during implicit default l3p creation
18:13:46 rkukura: have you considered using an overlapping name check as per my comment?
18:14:02 ivar-lazzaro: I’m not sure what you mean by overlapping name
18:14:29 rkukura: we could verify in the create_l3p_precommit that no other L3P with the name "default" exists for that tenant
18:14:57 this will be more reliable, and there are other places where this is done already
18:15:07 so your idea is to head off creation before we get to the postcommit
18:15:10 ivar-lazzaro: so “default” would be a reserved name
18:15:12 (for instance, I think you can't create a default SG in Neutron)
18:15:23 SumitNaiksatam: yes
18:15:35 ivar-lazzaro: you are right, there is that precedent in neutron
18:16:00 that way we won't need to change the implementation once we fix the overlapping IP limitation
18:16:03 ivar-lazzaro: This may be the way to go, but I’d like to understand what is happening currently that is preventing my fix from working
18:16:37 I should be able to try ivar-lazzaro’s suggestion later today
18:16:52 rkukura: sounds like a plan!
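The precommit check ivar-lazzaro describes at 18:14:29 could look roughly like the sketch below: reject a second L3 policy named "default" for the same tenant before any postcommit work runs, so the failure rolls back cleanly inside the DB transaction. The hook signature and the plugin query follow the resource_mapping driver's general conventions, but the names here are illustrative assumptions, not the project's actual code.

```python
from neutron.common import exceptions as n_exc


class DefaultL3PolicyExists(n_exc.BadRequest):
    message = ("An L3 policy named 'default' already exists for "
               "tenant %(tenant_id)s.")


def create_l3_policy_precommit(self, context):
    """Hypothetical driver hook; context.current is the new L3P dict."""
    l3p = context.current
    if l3p['name'] == 'default':
        # Look for any other "default" L3P owned by the same tenant;
        # raising here aborts the surrounding DB transaction, so no
        # dangling resources are left behind.
        peers = context._plugin.get_l3_policies(
            context._plugin_context,
            filters={'tenant_id': [l3p['tenant_id']], 'name': ['default']})
        if any(p['id'] != l3p['id'] for p in peers):
            raise DefaultL3PolicyExists(tenant_id=l3p['tenant_id'])
```

This mirrors the Neutron precedent mentioned at 18:15:12, where a security group named "default" cannot be created by hand.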
18:16:54 rkukura: i agree, it will be really helpful to understand what code blocks are problem areas for us from a concurrency perspective
18:17:43 rkukura: so regarding the rally job status, the “success” report is kind of misleading and we need to change that
18:18:19 SumitNaiksatam: Do we want to fix the underlying issue first so that we really do expect 100% success?
18:18:28 rkukura: so we should decide what level of concurrency error is tolerable, or we can just report failure if any test scenario does not succeed 100% of the time
18:18:57 rkukura: yes sure, let's fix this, and see it get to 100%, and then we can consider changing the job reporting
18:19:07 SumitNaiksatam: +1
18:19:53 this is actually the reason why i had kept it the way it is, because we had latent issues that we needed to fix before we put up a benchmark
18:20:00 rkukura: thanks for the update on the patch
18:20:13 rkukura: so you were able to find all the logs that you were looking for?
18:20:35 SumitNaiksatam: I'd rather keep it non-voting
18:20:40 SumitNaiksatam: yes, although I wish GBP did more logging in q-svc.log
18:20:44 ivar-lazzaro: it is non-voting
18:20:48 SumitNaiksatam: and keep people's attention on concurrency issues
18:20:55 when they come
18:21:01 instead of making it more forgiving
18:21:03 rkukura: agreed
18:21:11 people don't look at SUCCESS logs ;)
18:21:23 ivar-lazzaro: i agree
18:22:06 for now, just spreading the word around that we should examine rally job logs to see the actual passing percentage per scenario
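For the stricter reporting discussed above (fail the job if any scenario succeeds less than 100% of the time), a gate-side helper could parse the rally task results and print the per-scenario passing percentage. The JSON schema assumed below — a list of entries with a "key"/"name" and a "result" list whose items carry a non-empty "error" on failure — is based on rally's task-results output of that era, and should be treated as an assumption.

```python
import json
import sys


def main(path):
    """Exit non-zero if any rally scenario has less than a 100% pass rate."""
    with open(path) as f:
        tasks = json.load(f)

    exit_code = 0
    for task in tasks:
        name = task['key']['name']
        iterations = task['result']
        failed = sum(1 for it in iterations if it.get('error'))
        total = len(iterations)
        print('%s: %d/%d iterations passed' % (name, total - failed, total))
        if failed:
            exit_code = 1
    sys.exit(exit_code)


if __name__ == '__main__':
    main(sys.argv[1])
```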
18:23:50 ivar-lazzaro: do we need to discuss any of your patches?
18:24:30 Does anyone have any questions on them?
18:24:42 not sure which one in particular could need attention
18:25:19 ivar-lazzaro: i believe we decided that https://review.openstack.org/212707 and https://review.openstack.org/166424 were ready
18:25:25 based on mageshgv’s testing
18:25:38 so we will move forward with them shortly
18:26:00 yes, I need to remove the WIP
18:26:01 i believe #link https://review.openstack.org/224383 is also good
18:26:24 Was just going to mention the WIP in https://review.openstack.org/#/c/212707/
18:27:55 done
18:27:55 mageshgv: hi
18:28:00 ivar-lazzaro: thanks ;-)
18:28:01 SumitNaiksatam: hi
18:28:21 I think I missed the last couple lines - anything important?
18:28:23 mageshgv: any feedback on the above discussion, or anything you are blocked on?
18:28:41 rkukura: i don't think so
18:28:46 ok
18:29:17 SumitNaiksatam: I ran into a few minor issues with the provider-owned patch, will add them to the review
18:29:33 mageshgv: ah good to know, timely feedback
18:30:36 mageshgv: are you still pursuing this #link https://review.openstack.org/212676
18:30:37 ?
18:31:20 SumitNaiksatam: yes, will have to enforce the check on drivers only for the service chains
18:31:30 mageshgv: okay
18:32:04 rkukura: any chance you were able to explore #link https://bugs.launchpad.net/group-based-policy/+bug/1417312 or
18:32:05 Launchpad bug 1417312 in Group Based Policy "GBP: Existing L3Policy results in failure to create PTG with default L2Policy" [High,Confirmed] - Assigned to Robert Kukura (rkukura)
18:32:13 https://bugs.launchpad.net/group-based-policy/+bug/1470646
18:32:14 Launchpad bug 1470646 in Group Based Policy "Deleting network associated with L2P results in infinite loop" [High,Triaged] - Assigned to Robert Kukura (rkukura)
18:32:29 I’ve looked at both a bit
18:34:20 On the first, is the bug stating we need the ability to configure different default IP pools for L3Ps named ‘default’ than for L3Ps not named ‘default’?
18:35:30 And on the second, do we want to consider monkey patching neutron to avoid the loop?
18:36:04 rkukura: for the first i think the observation was that using an IP pool which overlaps the “default” ip_pool results in an error but leaves some dangling resources
18:36:17 the error is possibly coming from postcommit
18:36:35 so i believe the suggestion is to catch the overlapping ip_pool upfront
18:36:51 on the second one, it sounds fine to me to patch
18:36:57 SumitNaiksatam: I may have missed the dangling resources aspect of this
18:37:01 i guess we probably don't have too many other options
18:37:25 SumitNaiksatam: shouldn't be a postcommit check though
18:38:09 ivar-lazzaro: you are saying it should not be, it is currently not the case?
18:38:21 * or it's currently not the case
18:38:45 SumitNaiksatam: I think this is not the case today
18:38:52 SumitNaiksatam: but let me check very quickly
18:39:05 rkukura: let's follow up with jishnub (bug reporter) on what inconsistent state he was seeing
18:39:07 ivar-lazzaro: thanks
18:39:29 SumitNaiksatam: also... What driver are we talking about
18:39:32 ? :)
18:39:35 SumitNaiksatam: Sounds like the PTG itself isn’t getting cleaned up
18:39:37 ivar-lazzaro: RMD
18:39:56 rkukura: okay
18:40:36 #link https://github.com/stackforge/group-based-policy/blob/master/gbpservice/neutron/services/grouppolicy/drivers/resource_mapping.py#L827-L829
18:40:41 that's a precommit operation
18:40:53 this failure should roll back everything
18:41:36 Is the L3P getting created during the PTG’s postcommit?
18:41:39 But I think we are dealing with the typical "IPD plot twist" issue :D
18:42:02 ivar-lazzaro: what if the ip_pool is a subset of the default ip_pool?
18:42:20 rkukura: probably, but on any exception during creation we delete the whole thing
18:42:21 We should be deleting the resource when an exception is raised during postcommit of a create
18:42:53 rkukura: but since the L3P was already deleted within its precommit, IPD doesn't find it and fails during deletion
18:43:03 rkukura: we do, for all resources
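A minimal sketch of the create flow being debated above, with in-memory stand-ins for the real DB and driver manager: the precommit hook (like the check linked at resource_mapping.py#L827-L829) runs inside the transaction, so a failure there rolls everything back; a postcommit failure instead triggers an explicit delete of the just-created resource, and that delete can itself fail, which is how a PTG ends up stranded. All class and method names here are hypothetical, not the actual GBP plugin code.

```python
from oslo_log import log
from oslo_utils import excutils

LOG = log.getLogger(__name__)


class GroupPolicyPluginSketch(object):
    """Illustration only; the DB dict and driver calls are stand-ins."""

    def __init__(self, driver_manager):
        self._db = {}
        self._drivers = driver_manager

    def create_policy_target_group(self, context, ptg):
        # "Transaction": persist and run precommit together; a precommit
        # exception (e.g. the overlapping ip_pool check) undoes the
        # insert, so nothing is left behind.
        self._db[ptg['id']] = ptg
        try:
            self._drivers.create_policy_target_group_precommit(context, ptg)
        except Exception:
            with excutils.save_and_reraise_exception():
                del self._db[ptg['id']]

        # Postcommit runs after the transaction; implicit L2P/L3P and
        # Neutron resources are created here and can fail.
        try:
            self._drivers.create_policy_target_group_postcommit(context, ptg)
        except Exception:
            with excutils.save_and_reraise_exception():
                LOG.exception('postcommit failed for PTG %s, cleaning up',
                              ptg['id'])
                # The cleanup is itself best-effort: if it fails too,
                # the PTG stays in the DB in an inconsistent state.
                self.delete_policy_target_group(context, ptg['id'])
        return ptg

    def delete_policy_target_group(self, context, ptg_id):
        # Precommit removes records first; in the reported bug an
        # implicit L3P is deleted here, so the IPD driver's postcommit
        # later fails to find it and the rollback itself errors out.
        self._db.pop(ptg_id, None)
        self._drivers.delete_policy_target_group_postcommit(context, ptg_id)
```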
18:44:05 it's quite possible that this bug does not still manifest exactly as it was observed originally
18:44:29 ivar-lazzaro: I see errors in q-svc.log like: ERROR gbpservice.neutron.services.grouppolicy.plugin [req-61d2038c-d838-4805-825c-c3d8ebaba95a None None] delete_policy_target_group_postcommit failed for policy_target_group 1b4866b6-eea6-4180-a4c2-9681c87690e4
18:44:35 but i don't think we have updated this part of the RMD driver code for a while now, so it might still be there
18:44:37 SumitNaiksatam: in that case (subset ip pool) it should still fail
18:44:48 ivar-lazzaro: okay
18:44:48 ivar-lazzaro: Do you think these are from the IPD driver in the case above?
18:45:00 rkukura: yes
18:45:17 rkukura: The other issue we have there is that we don't LOG the error
18:45:29 rkukura: actually we do, but it should be a LOG.exception instead
18:46:42 ivar-lazzaro: so yeah, this sounds like the IPD/RMD interaction issue where the rollback can happen only up to a certain extent
18:46:43 good news is that this issue seems reproducible via UTs
18:46:54 in this case the new L3P does not get created
18:47:17 but the PTG in error is not rolled back
18:47:30 SumitNaiksatam: it is rolled back! But the rollback fails
18:47:45 SumitNaiksatam: that's what I suspect at least
18:48:01 ivar-lazzaro: yeah, i mean the net effect being that the PTG stays in the DB
18:48:06 I’ll dig into this once I’ve updated the fix for the 1st bug
18:48:13 in an inconsistent state
18:48:23 rkukura: nice, thanks
18:48:53 #topic Open Discussion
18:49:07 just wanted to note that there were some gate issues yesterday
18:49:20 new oslo and client libraries were released yesterday
18:49:34 and they caused kilo branch breakages
18:49:51 we had to cap some oslo lib versions to get beyond this
18:50:07 also currently the gate seems to be stalled
18:50:40 i noticed in the morning that the pypi mirror for openstack was not finding the right pbr version
18:51:08 i suspect that at this point the jobs are only queued, but nothing is actually executing
18:51:56 i also have not posted the client fix for the stable/juno branch which will align it with neutronclient 2.3.12
18:52:07 will try to do that shortly, will let you know
18:52:11 anything else?
18:52:53 alright, thanks everyone!
18:52:56 bye!
18:52:59 thanks SumitNaiksatam!
18:53:10 bye!
18:53:15 #endmeeting
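On the LOG.exception point raised at 18:45:29: LOG.error records only the message, while LOG.exception, called from an except block, also records the current traceback, which is what makes a failed rollback diagnosable from q-svc.log. A minimal illustration with the stdlib logger (oslo.log behaves the same way here); the cleanup function is a hypothetical stand-in, not GBP code.

```python
import logging

LOG = logging.getLogger(__name__)


def _cleanup(ptg_id):
    # Stand-in for the IPD/RMD rollback path that fails above.
    raise RuntimeError('L3P already deleted in precommit')


def rollback_ptg(ptg_id):
    try:
        _cleanup(ptg_id)
    except Exception:
        # LOG.error(...) would record only this one line;
        # LOG.exception(...) records the same line plus the full
        # traceback of the RuntimeError, so the root cause survives
        # in the service log.
        LOG.exception('delete_policy_target_group_postcommit failed '
                      'for policy_target_group %s', ptg_id)


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    rollback_ptg('1b4866b6-eea6-4180-a4c2-9681c87690e4')
```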