18:01:59 <SumitNaiksatam> #startmeeting networking_policy
18:02:00 <openstack> Meeting started Thu Sep 17 18:01:59 2015 UTC and is due to finish in 60 minutes.  The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:03 <openstack> The meeting name has been set to 'networking_policy'
18:02:17 <SumitNaiksatam> #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#Sept_3rd.2C_10th.2C_17th_2015
18:02:29 <SumitNaiksatam> not much change in the agenda from last time
18:02:46 <SumitNaiksatam> our plan was to release l1 on sept 20th
18:03:04 <SumitNaiksatam> let me know if there are any concerns
18:03:38 <SumitNaiksatam> #topic Integration tests
18:03:46 <rkukura> SumitNaiksatam: Do you mean “out plan is …”?
18:03:47 <SumitNaiksatam> per last week, i had posted #link https://review.openstack.org/#/c/222048
18:03:54 <rkukura> s/out/our/
18:04:25 <SumitNaiksatam> rkukura: yes, the plan was and still “is” ;-)
18:04:37 <rkukura> thanks
18:04:48 <SumitNaiksatam> i need to update the license header
18:05:05 <SumitNaiksatam> ivar-lazzaro: thanks for catching that, i made a mess in the search and replace
18:05:36 <SumitNaiksatam> anything else to discuss on that patch?
18:06:03 <SumitNaiksatam> okay
18:06:04 <SumitNaiksatam> #topic Packaging
18:06:06 <ivar-lazzaro> I was a bit confused about the comment on subprocess
18:06:21 <SumitNaiksatam> #undo
18:06:22 <openstack> Removing item from minutes: <ircmeeting.items.Topic object at 0x9cc1fd0>
18:06:25 <ivar-lazzaro> "This has NO USE of subprocess module"
18:06:47 <ivar-lazzaro> and yet that is imported below... Am I missing something? :)
18:06:55 <SumitNaiksatam> ivar-lazzaro: hopefully jishnu can respond to that, its his code verbatim
18:07:14 <ivar-lazzaro> ok
18:07:22 <ivar-lazzaro> will catch up with him
18:07:55 <SumitNaiksatam> #topic Packaging
18:08:23 <rkukura> No news from me on this - I’ve been focusing on bugs and reviews, plus had some PTO
18:09:28 <SumitNaiksatam> rkukura: okay
18:09:56 <SumitNaiksatam> #topic Bugs
18:11:12 <SumitNaiksatam> rkukura: want to discuss the patch you posted yesterday?
18:11:24 <rkukura> sure
18:11:31 <SumitNaiksatam> #link https://review.openstack.org/#/c/224293/
18:11:53 <SumitNaiksatam> rkukura: so regarding the question on the success reported
18:11:58 <SumitNaiksatam> on the rally job
18:12:05 <rkukura> Apparently, the IP Pool overlaps exception is being hit via some codepath other than the one I fixed.
18:12:12 <SumitNaiksatam> rkukura: okay
18:12:30 <rkukura> I do see failures in http://logs.openstack.org/24/166424/44/check/gate-group-based-policy-dsvm-rally/eb4a553//logs/rally-task-results.txt.gz
18:12:35 <SumitNaiksatam> rkukura: right
18:13:01 <SumitNaiksatam> so the rally job is instrumented very leniently right now
18:13:20 <SumitNaiksatam> so it does not report a failure unless a particular test scenario fails 100% of the time
18:13:37 <rkukura> But I’m not seeing the new logging I added in the q-svc log that I’d expect if this was happening during implicit default l3p creation
18:13:46 <ivar-lazzaro> rkukura: have you considered using overlapping name as per my comment?
18:14:02 <rkukura> ivar-lazzaro: I’m not sure what you mean by overlapping name
18:14:29 <ivar-lazzaro> rkukura: we could verify in the create_l3p_precommit that no other L3P with name "default" exists for that tenant
18:14:57 <ivar-lazzaro> this will be more reliable, and there are other places where this is done already
18:15:07 <rkukura> so your idea is to head off creation before we get to the postcommit
18:15:10 <SumitNaiksatam> ivar-lazzaro: so “default” would be a reserved name
18:15:12 <ivar-lazzaro> (for instance, I think you can't create a default SG in Neutron)
18:15:23 <ivar-lazzaro> SumitNaiksatam: yes
18:15:35 <SumitNaiksatam> ivar-lazzaro: you are right, there is that precedent in neutron
18:16:00 <ivar-lazzaro> that way we won't need to change the implementation once we fix the overlapping IP limitation
18:16:03 <rkukura> ivar-lazzaro: This may be the way to go, but I’d like to understand what is happening currently that is preventing my fix from working
18:16:37 <rkukura> I should be able to try ivar-lazzaro’s suggestion later today
18:16:52 <ivar-lazzaro> rkukura: sounds like a plan!
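A minimal sketch of what ivar-lazzaro's suggestion could look like in a create_l3_policy_precommit hook; this is not the actual resource_mapping driver code, and the exception class and `_get_l3_policies` helper are illustrative stand-ins.

```python
# Hedged sketch of the "reserved 'default' name" check suggested above; not
# the actual resource_mapping driver code. DefaultL3PolicyExists and
# _get_l3_policies are illustrative stand-ins.

class DefaultL3PolicyExists(Exception):
    def __init__(self, tenant_id):
        super(DefaultL3PolicyExists, self).__init__(
            "An L3 policy named 'default' already exists for tenant %s"
            % tenant_id)


class ResourceMappingDriverSketch(object):

    def create_l3_policy_precommit(self, context):
        l3p = context.current
        if l3p['name'] == 'default':
            # Reject a second tenant-owned L3P named 'default' before any
            # Neutron resources get created in postcommit.
            others = self._get_l3_policies(
                context._plugin_context,
                filters={'tenant_id': [l3p['tenant_id']],
                         'name': ['default']})
            if any(x['id'] != l3p['id'] for x in others):
                raise DefaultL3PolicyExists(l3p['tenant_id'])
```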
18:16:54 <SumitNaiksatam> rkukura: i agree, it will be really helpful to understand what code blocks are problem areas for us from a concurrency perspective
18:17:43 <SumitNaiksatam> rkukura: so regarding the rally job status, the “success” report is kind of misleading and we need to change that
18:18:19 <rkukura> SumitNaiksatam: Do we want to fix the underlying issue first so that we really do expect 100% success?
18:18:28 <SumitNaiksatam> rkukura: so we should decide what level of concurrency error is tolerable, or we can just report a failure if any test scenario does not succeed 100% of the time
18:18:57 <SumitNaiksatam> rkukura: yes sure, lets fix this, and see it get to 100%, and then we can consider changing the job reporting
18:19:07 <rkukura> SumitNaiksatam: +1
18:19:53 <SumitNaiksatam> this is actually the reason why i had kept it the way it is, because we had latent issues that we needed to fix before we put up a benchmark
18:20:00 <SumitNaiksatam> rkukura: thanks for the update on the patch
18:20:13 <SumitNaiksatam> rkukura: so you were able to find all the logs that you were looking for?
18:20:35 <ivar-lazzaro> SumitNaiksatam: I'd rather keep it non voting
18:20:40 <rkukura> SumitNaiksatam: yes, although I wish GBP did more logging in q-svc.log
18:20:44 <SumitNaiksatam> ivar-lazzaro: it is non-voting
18:20:48 <ivar-lazzaro> SumitNaiksatam: and have people attention on concurrency issues
18:20:55 <ivar-lazzaro> when they come
18:21:01 <ivar-lazzaro> instead of making it more forgiving
18:21:03 <SumitNaiksatam> rkukura: agreed
18:21:11 <ivar-lazzaro> people don't look at SUCCESS logs ;)
18:21:23 <SumitNaiksatam> ivar-lazzaro: i agree
18:22:06 <SumitNaiksatam> for now, just spreading the word around that we should examine rally job logs to see the actual passing percentage per scenario
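For anyone checking the actual per-scenario pass rate, a small helper along these lines works; it assumes the linked rally-task-results file is the JSON that `rally task results` emits, i.e. a list of scenario runs each with a "key" and a "result" list whose failed iterations carry a non-empty "error" entry.

```python
# Assumes the results file is the JSON emitted by `rally task results`: a
# list of scenario runs, each with a "key" describing the scenario and a
# "result" list whose failed iterations have a non-empty "error" entry.
import json
import sys


def per_scenario_pass_rate(path):
    with open(path) as f:
        runs = json.load(f)
    for run in runs:
        name = run['key']['name']
        iterations = run['result']
        failed = sum(1 for it in iterations if it.get('error'))
        total = len(iterations)
        passed = total - failed
        rate = (100.0 * passed / total) if total else 0.0
        print("%-60s %6.2f%% (%d/%d passed)" % (name, rate, passed, total))


if __name__ == '__main__':
    per_scenario_pass_rate(sys.argv[1])
```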
18:23:50 <SumitNaiksatam> ivar-lazzaro: do we need to discuss any of your patches?
18:24:30 <ivar-lazzaro> Does anyone have any question on them?
18:24:42 <ivar-lazzaro> not sure which one in particular could need attention
18:25:19 <SumitNaiksatam> ivar-lazzaro: i believe we decided that https://review.openstack.org/212707 and https://review.openstack.org/166424 were ready
18:25:25 <SumitNaiksatam> based on mageshgv’s testing
18:25:38 <SumitNaiksatam> so we will move forward with them shortly
18:26:00 <ivar-lazzaro> yes, I need to remove the wip
18:26:01 <SumitNaiksatam> i believe #link https://review.openstack.org/224383 is also good
18:26:24 <rkukura> Was just going to mention the WIP in https://review.openstack.org/#/c/212707/
18:27:55 <ivar-lazzaro> done
18:27:55 <SumitNaiksatam> mageshgv: hi
18:28:00 <SumitNaiksatam> ivar-lazzaro: thanks ;-)
18:28:01 <mageshgv> SumitNaiksatam: hi
18:28:21 <rkukura> I think I missed the last couple lines - anything important?
18:28:23 <SumitNaiksatam> mageshgv: any feedback on the above discussion, or anything you are blocked on?
18:28:41 <SumitNaiksatam> rkukura: i dont think so
18:28:46 <rkukura> ok
18:29:17 <mageshgv> SumitNaiksatam: I ran into a few minor issues with provider owned patch, will add them to the review
18:29:33 <SumitNaiksatam> mageshgv: ah good to know, timely feedback
18:30:36 <SumitNaiksatam> mageshgv: are you still pursuing this #link https://review.openstack.org/212676
18:30:37 <SumitNaiksatam> ?
18:31:20 <mageshgv> SumitNaiksatam: yes, will have to enforce the check on drivers only for the service chains
18:31:30 <SumitNaiksatam> mageshgv: okay
18:32:04 <SumitNaiksatam> rkukura: any chance you were able to explore #link https://bugs.launchpad.net/group-based-policy/+bug/1417312 or
18:32:05 <openstack> Launchpad bug 1417312 in Group Based Policy "GBP: Existing L3Policy results in failure to create PTG with default L2Policy" [High,Confirmed] - Assigned to Robert Kukura (rkukura)
18:32:13 <SumitNaiksatam> https://bugs.launchpad.net/group-based-policy/+bug/1470646
18:32:14 <openstack> Launchpad bug 1470646 in Group Based Policy "Deleting network associated with L2P results in infinite loop" [High,Triaged] - Assigned to Robert Kukura (rkukura)
18:32:29 <rkukura> I’ve looked at both a bit
18:34:20 <rkukura> On the first, is the bug stating we need the ability to configure different default IP pools for L3Ps named ‘default’ than for L3Ps not named ‘default’?
18:35:30 <rkukura> And on the 2nd, do we want to consider monkey patching neutron to avoid the loop?
18:36:04 <SumitNaiksatam> rkukura: for the first i think the observation was that using an IP pool which overlaps the “default” ip_pool results in error but leaves some dangling resources
18:36:17 <SumitNaiksatam> the error is possibly coming from post-commit
18:36:35 <SumitNaiksatam> so i believe the suggestion is to catch the overlapping ip_pool upfront
18:36:51 <SumitNaiksatam> on the second one, it sounds fine to me to patch
18:36:57 <rkukura> SumitNaiksatam: I may have missed the dangling resources aspect of this
18:37:01 <SumitNaiksatam> i guess we probably dont have too many other options
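The monkey-patching idea floated for bug 1470646 would amount to rebinding the looping Neutron callable when GBP initializes; the sketch below only shows the mechanism with a dummy stand-in, since the actual looping code path is not identified in this discussion.

```python
# Mechanism-only sketch of the monkey-patching idea for bug 1470646; the
# DummyNeutronMixin below is a stand-in, not the real Neutron class, and a
# real patch would reproduce the original logic minus the looping condition.

class DummyNeutronMixin(object):
    """Stand-in for the Neutron class whose cleanup loops during delete."""

    def _cleanup_after_network_delete(self, context, network_id):
        raise NotImplementedError("imagine this code path never terminates")


def _patched_cleanup_after_network_delete(self, context, network_id):
    # Replacement behaviour that avoids the loop condition.
    return None


# The monkey patch itself: rebind the method once, e.g. when the GBP service
# plugin is loaded, so every later call goes through the patched version.
DummyNeutronMixin._cleanup_after_network_delete = (
    _patched_cleanup_after_network_delete)
```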
18:37:25 <ivar-lazzaro> SumitNaiksatam: shouldn't be a postcommit check though
18:38:09 <SumitNaiksatam> ivar-lazzaro: are you saying it should not be, or that it's currently not the case?
18:38:45 <ivar-lazzaro> SumitNaiksatam: I think this is not the case today
18:38:52 <ivar-lazzaro> SumitNaiksatam: but let me check very quickly
18:39:05 <SumitNaiksatam> rkukura: lets follow up with jishnub (bug reporter) on what inconsistent state he was seeing
18:39:07 <SumitNaiksatam> ivar-lazzaro: thanks
18:39:29 <ivar-lazzaro> SumitNaiksatam: also... What driver are we talking about
18:39:32 <ivar-lazzaro> ? :)
18:39:35 <rkukura> SumitNaiksatam: Sounds like the PTG itself isn’t getting cleaned up
18:39:37 <SumitNaiksatam> ivar-lazzaro: RMD
18:39:56 <SumitNaiksatam> rkukura: okay
18:40:36 <ivar-lazzaro> #link https://github.com/stackforge/group-based-policy/blob/master/gbpservice/neutron/services/grouppolicy/drivers/resource_mapping.py#L827-L829
18:40:41 <ivar-lazzaro> that's precommit operation
18:40:53 <ivar-lazzaro> this failure should rollback everything
18:41:36 <rkukura> Is the L3P getting created during the PTG’s postcommit?
18:41:39 <ivar-lazzaro> But I think we are dealing with the typical "IPD plot twist" issue :D
18:42:02 <SumitNaiksatam> ivar-lazzaro: what if the ip_pool is a subset of the default ip_pool?
18:42:20 <ivar-lazzaro> rkukura: probably, but on any exception during Creation we delete the whole thing
18:42:21 <rkukura> We should be deleting the resource when an exception is raised during postcommit of a create
18:42:53 <ivar-lazzaro> rkukura: but since the L3P was already deleted within its precommit, IPD doesn't find it and fails during deletion
18:43:03 <ivar-lazzaro> rkukura: we do, for all resources
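To make the rollback path being described concrete, here is a hedged sketch of the create/precommit/postcommit flow with cleanup on postcommit failure; this is not the actual GBP plugin code, and `_create_ptg_in_db` / `_call_drivers` are illustrative helpers.

```python
# Hedged sketch of the rollback pattern described above, not the actual GBP
# plugin code; _create_ptg_in_db and _call_drivers are illustrative helpers.
import logging

LOG = logging.getLogger(__name__)


class GroupPolicyPluginSketch(object):

    def create_policy_target_group(self, context, ptg):
        # DB record and driver precommit run inside one transaction.
        with context.session.begin(subtransactions=True):
            result = self._create_ptg_in_db(context, ptg)
            self._call_drivers('create_ptg_precommit', context, result)
        try:
            self._call_drivers('create_ptg_postcommit', context, result)
        except Exception:
            LOG.exception("create_ptg_postcommit failed for %s", result['id'])
            try:
                # Roll back the resource created above.
                self.delete_policy_target_group(context, result['id'])
            except Exception:
                # If the rollback itself fails (e.g. a driver already removed
                # its piece), the PTG is left behind in the DB.
                LOG.exception("cleanup of %s also failed", result['id'])
            raise
        return result
```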
18:44:05 <SumitNaiksatam> its quite possible that this bug no longer manifests exactly as it was observed originally
18:44:29 <rkukura> ivar-lazzaro: I see errors in q-svc.log like: ERROR gbpservice.neutron.services.grouppolicy.plugin [req-61d2038c-d838-4805-825c-c3d8ebaba95a None None] delete_policy_target_group_postcommit failed for policy_target_group 1b4866b6-eea6-4180-a4c2-9681c87690e4
18:44:35 <SumitNaiksatam> but i dont think we have updated this part of the RMD driver code for a while now, so it might still be there
18:44:37 <ivar-lazzaro> SumitNaiksatam: in that case (subset ip pool) it should still fail
18:44:48 <SumitNaiksatam> ivar-lazzaro: okay
18:44:48 <rkukura> ivar-lazzaro: Do you think these are from the IPD driver in the case above?
18:45:00 <ivar-lazzaro> rkukura: yes
18:45:17 <ivar-lazzaro> rkukura: The other issue we have there is that we don't LOG the error
18:45:29 <ivar-lazzaro> rkukura: actually we do, but it should be a LOG.exception instead
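A small illustration of the logging point: inside an exception handler, LOG.exception records the message plus the full traceback, while LOG.error records only the message, which is what makes these postcommit failures hard to diagnose.

```python
# Illustration of LOG.error vs LOG.exception inside an exception handler.
import logging

logging.basicConfig(level=logging.ERROR)
LOG = logging.getLogger(__name__)


def risky_cleanup():
    # stand-in for the failing postcommit/cleanup call
    raise ValueError("boom")


try:
    risky_cleanup()
except ValueError:
    LOG.error("cleanup failed")       # message only
    LOG.exception("cleanup failed")   # message plus the full traceback
```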
18:46:42 <SumitNaiksatam> ivar-lazzaro: so yeah, this sounds like the IPD, RMD interaction issue where the rollback can happen only up to a certain extent
18:46:43 <ivar-lazzaro> good news is that this issue seems reproducible via UTs
18:46:54 <SumitNaiksatam> in this case the new L3P does not get created
18:47:17 <SumitNaiksatam> but the PTG in error is not rolled back
18:47:30 <ivar-lazzaro> SumitNaiksatam: it is rolled back! But the rollback fails
18:47:45 <ivar-lazzaro> SumitNaiksatam: that's what I suspect at least
18:48:01 <SumitNaiksatam> ivar-lazzaro: yeah, i mean the net effect being that the PTG stays in the DB
18:48:06 <rkukura> I’ll dig into this one once I’ve updated the fix for the 1st bug
18:48:13 <SumitNaiksatam> in an inconsistent state
18:48:23 <SumitNaiksatam> rkukura: nice, thanks
18:48:53 <SumitNaiksatam> #topic Open Discussion
18:49:07 <SumitNaiksatam> just wanted to update that there were some gate issues yesterday
18:49:20 <SumitNaiksatam> new oslo and client libraries were released yesterday
18:49:34 <SumitNaiksatam> and they caused kilo branch breakages
18:49:51 <SumitNaiksatam> we had to cap some oslo lib versions to get beyond this
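For context, "capping some oslo lib versions" amounts to upper bounds of this shape in the requirements files; the library names and version numbers below are illustrative only, not the pins actually applied for the kilo breakage.

```
# Illustrative requirements.txt caps only; the libraries and versions actually
# pinned for the kilo branch breakage are in the real GBP change, not here.
oslo.config>=1.9.3,<=1.11.0
oslo.serialization>=1.4.0,<1.5.0
```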
18:50:07 <SumitNaiksatam> also currently the gate seems to be stalled
18:50:40 <SumitNaiksatam> i noticed in the morning that the pypi mirror for openstack was not finding the right pbr version
18:51:08 <SumitNaiksatam> i suspect that at this point the jobs are only queued, but nothing is actually executing
18:51:56 <SumitNaiksatam> i also have not posted the client fix for the stable/juno branch which will align it with neutronclient 2.3.12
18:52:07 <SumitNaiksatam> will try to do that shortly, will let you know
18:52:11 <SumitNaiksatam> anything else?
18:52:53 <SumitNaiksatam> alright, thanks everyone!
18:52:56 <SumitNaiksatam> bye!
18:52:59 <rkukura> thanks SumitNaiksatam!
18:53:10 <ivar-lazzaro> bye!
18:53:15 <SumitNaiksatam> #endmeeting