18:02:38 <SumitNaiksatam> #startmeeting networking_policy
18:02:39 <openstack> Meeting started Thu Nov 30 18:02:38 2017 UTC and is due to finish in 60 minutes.  The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:42 <openstack> The meeting name has been set to 'networking_policy'
18:02:47 <rkukura> hi annakk
18:03:38 <SumitNaiksatam> as I was mentioning to rkukura, I don't have much of an agenda for today except the "pike sync"
18:04:03 <SumitNaiksatam> but i thought it might be good to meet since we haven't met for a few weeks for different reasons
18:04:05 * mlavalle waves at rkukura and SumitNaiksatam
18:04:19 <SumitNaiksatam> mlavalle: hi there! :-)
18:04:20 <rkukura> maybe we should discuss the stable/newton branch a bit too
18:04:25 <SumitNaiksatam> rkukura: sure
18:04:32 <SumitNaiksatam> #topic Pike Support
18:04:32 <rkukura> hi mlavalle!
18:05:13 <SumitNaiksatam> so we need at least one more reviewer on this: #link https://review.openstack.org/#/c/514864/
18:05:18 <SumitNaiksatam> rkukura: thanks for your review
18:05:34 <rkukura> looks good
18:05:37 <SumitNaiksatam> (had fixed the devstack gate job prior to this, so it reflects in this patch as well)
18:05:58 <SumitNaiksatam> annakk: sorry, i might have missed your comment here
18:06:08 <annakk> I had one comment there but it's not a must for sure
18:06:29 <SumitNaiksatam> i saw it before thanksgiving and have been on leave since
18:06:34 <SumitNaiksatam> totally forgot about it
18:06:39 <annakk> np :)
18:06:52 <annakk> I just thought it was preferable to limit the hack to the aim notifier only
18:07:12 <SumitNaiksatam> annakk: yes sure, i will respond to the comment
18:07:28 <rkukura> I’ll be ready to re-review
18:07:48 <SumitNaiksatam> and i can also do a follow up patch if required (i might not be able to get to this until monday)
18:08:20 <SumitNaiksatam> i was spending most of my time on this patch #link https://review.openstack.org/#/c/514864/
18:08:30 <SumitNaiksatam> but the patchset is outdated
18:08:51 <SumitNaiksatam> i have a lot more changes in my local repo which i should have posted
18:08:55 <rkukura> same patch?
18:09:02 <SumitNaiksatam> oh sorry
18:09:33 <SumitNaiksatam> #link https://review.openstack.org/#/c/518183/
18:09:54 <SumitNaiksatam> basically i was stuck at a transactions issue
18:10:39 <SumitNaiksatam> the new reader/writer calls in neutron were breaking the apic driver’s long transaction requirement
18:10:57 <SumitNaiksatam> had discussed with rkukura and tbachman offline before
18:11:01 <SumitNaiksatam> rkukura: thanks for the discussion
18:11:32 <rkukura> SumitNaiksatam: Is this an issue for the apic_aim MD, or just the aim_mapping PD?
18:11:33 <SumitNaiksatam> so it seems like nesting the reader/writer calls will preserve the session
18:11:36 <annakk> is this a new issue? sounds like a deja vu
18:11:56 <SumitNaiksatam> starts with apic_aim MD
18:12:01 <SumitNaiksatam> rkukura: ^^^
18:12:08 <SumitNaiksatam> and of course for the aim_mapping as well
18:12:32 <rkukura> SumitNaiksatam: was wondering if the problem surfaces in the apic_aim MD UTs?
18:12:38 <SumitNaiksatam> annakk: the issue is new; the usage of reader/writer is not
18:13:05 <SumitNaiksatam> rkukura: yes, those are the ones i was trying to fix all along
18:13:35 <SumitNaiksatam> so anyway, the nesting seems to work, what was throwing me off was another bug in neutron (in the extend dict part)
18:14:37 <SumitNaiksatam> we will still have to patch the reader/writer part for attaching the notification queue to the session (like before)
18:14:48 <SumitNaiksatam> and for that i am trying to figure out how to get it to work with the context manager
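(Editorial note: the nesting behavior discussed above can be sketched with a toy, stdlib-only model. This is not the actual GBP patch or the real oslo.db enginefacade API; the names `Context` and `writer_using` are illustrative stand-ins for the reader/writer context managers being discussed.)

```python
# Toy sketch of the nesting idea: an oslo.db-style writer context
# manager where nested ``using`` blocks reuse the session opened by
# the outermost block, so the driver still sees one long transaction.
import contextlib

class Context:
    """Stand-in for a neutron context that carries a DB session."""
    def __init__(self):
        self.session = None

@contextlib.contextmanager
def writer_using(ctx):
    outermost = ctx.session is None
    if outermost:
        ctx.session = object()   # outermost block: open a session/transaction
    try:
        yield ctx.session        # nested callers get the very same session
    finally:
        if outermost:
            ctx.session = None   # commit/close only at the top level

sessions = []
ctx = Context()
with writer_using(ctx) as outer:
    sessions.append(outer)
    with writer_using(ctx) as inner:   # nested call preserves the session
        sessions.append(inner)
```

Under this model, both `with` blocks observe the same session object, which is the property the long-transaction requirement needs.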
18:15:04 <annakk> the extend dict was handled in ocata as well (subnet pools and address scopes)
18:15:38 <SumitNaiksatam> annakk: yes, but there was a commit made in neutron which moved the extend_dict call to a different point (essentially before post_commit is called)
18:15:54 <annakk> oh
18:16:06 <SumitNaiksatam> annakk: and because of that we were never seeing resources extended
18:16:08 <rkukura> SumitNaiksatam: Is it called within the transaction now?
18:16:19 <SumitNaiksatam> rkukura: no
18:16:40 <rkukura> That’s too bad
18:16:45 <SumitNaiksatam> rkukura: i had meant to send you the commit last week, somehow slipped my mind
18:17:09 <SumitNaiksatam> by commit, i mean the patch where this change happened
18:17:28 <rkukura> I’d like to see it
18:17:35 <SumitNaiksatam> i can't seem to find it now
18:17:49 <SumitNaiksatam> it was a patch dealing with MTU setting i think
18:17:59 <rkukura> later is fine
18:18:46 <SumitNaiksatam> rkukura: to fix that would require a small monkey patch in GBP
18:19:07 <SumitNaiksatam> but having done all that, the UT seemed to pass in my local env
18:19:30 <rkukura> ok, but maybe there is a reason for the change in Neutron
18:19:42 <SumitNaiksatam> so that was some progress as opposed to what seemed a bit of a hopeless situation with regards to dealing with the transactions (when i last spoke to rkukura)
18:19:59 <SumitNaiksatam> rkukura: yes, i think for that particular resource they needed it that way
18:20:26 <SumitNaiksatam> rkukura: they might have not hit the side effects since other resources might not be using the extension
18:20:43 <SumitNaiksatam> rkukura: but anyway, you can take a look at it and let me know what you think
18:20:50 <rkukura> sure
18:21:55 <rkukura> I’ve always felt extend_dict should be done just before the transaction commits to ensure that the results returned to the client reflect the state committed by the operation
18:22:07 <SumitNaiksatam> rkukura: agree
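(Editorial note: a minimal sketch of the kind of "small monkey patch" mentioned above — wrapping a plugin method so the dict-extension hook runs while the result is being built, before post_commit, rather than afterwards. The names `CorePlugin` and `extend_resource_dict` are hypothetical; they are not actual Neutron or GBP APIs.)

```python
# Hypothetical monkey-patch pattern: intercept the upstream method and
# apply the extension hook to its result before callers see it.
class CorePlugin:
    def create_resource(self, context, data):
        # upstream path: returns the dict without extension attributes
        return {'id': 'uuid-1', 'name': data.get('name')}

def extend_resource_dict(resource):
    # GBP-style extension: attach driver-managed attributes to the dict
    resource['ext:attr'] = 'value'
    return resource

_original = CorePlugin.create_resource

def _patched(self, context, data):
    result = _original(self, context, data)
    return extend_resource_dict(result)  # extend before returning

CorePlugin.create_resource = _patched

resource = CorePlugin().create_resource(None, {'name': 'net1'})
```

The same wrap-and-delegate shape would apply wherever the real extend_dict call was moved; only the patched target changes.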
18:22:23 <SumitNaiksatam> i was down to some 200 UT failures and most seemed to be related to this transaction issue
18:22:41 <SumitNaiksatam> i hope there is no other issue lurking
18:23:07 <SumitNaiksatam> this was just the UT, so we would still need to update the devstack job to see if everything works properly
18:23:23 <SumitNaiksatam> *UTs
18:23:29 <rkukura> so did this reduce below 200 failures with the extend_dict monkey patch?
18:23:55 <rkukura> or was it 200 failures with the monkey patch?
18:24:01 <SumitNaiksatam> rkukura: no no, the 200 UT failures is prior to all the discussion above
18:24:16 <rkukura> ok, great!
18:24:16 <SumitNaiksatam> all that i discussed above is not in the latest patchset posted
18:24:35 <SumitNaiksatam> and apologies, i should have posted that patchset like a week back
18:25:16 <SumitNaiksatam> i had too many debugs all over the place and didn't get a chance to clean them
18:25:18 <SumitNaiksatam> anyway
18:25:33 <SumitNaiksatam> annakk: rkukura anything more to discuss on Pike?
18:25:44 <annakk> not from me..
18:25:49 <rkukura> nothing from me
18:26:20 <SumitNaiksatam> once the UTs pass, perhaps if anyone has time, we can split work and knock off updating the gate jobs in parallel
18:26:34 <SumitNaiksatam> rkukura:  we would need to create temp branches for the apic repos
18:26:50 <SumitNaiksatam> rkukura: hopefully we don't have to make many code updates
18:26:59 <SumitNaiksatam> will need to discuss with tbachman
18:27:13 <rkukura> right
18:27:18 <SumitNaiksatam> so far i was just running against the master of those branches and the UTs seemed to be okay
18:27:33 <SumitNaiksatam> but we should at least have place holder branches
18:27:44 <annakk> I hope to be able to take some, but in a while cause next week I'm on PTO
18:27:55 <SumitNaiksatam> nsx is great in that regard since they have long had the pike branches :-)
18:27:59 <rkukura> I think we use AIM master for all GBP branches, don’t we?
18:28:01 <SumitNaiksatam> annakk: np
18:28:37 <SumitNaiksatam> rkukura: will need to go back and check (aim does use some oslo libs)
18:28:57 <rkukura> agreed we need to check
18:29:05 <SumitNaiksatam> #topic Newton branches
18:29:09 <SumitNaiksatam> rkukura: go ahead
18:30:25 <rkukura> I back-ported an AIM-related fix to stable/newton recently, and noticed only a small subset of the UTs run. I understand this is done because the number of cores for stable/newton jobs has been reduced.
18:30:54 <SumitNaiksatam> rkukura: yes, we went through several iterations on this, and essentially gave up!
18:30:58 <rkukura> So I tried restoring the full set (we had already eliminated a lot of redundant Neutron UTs from ml2plus)
18:31:06 <SumitNaiksatam> since the branch is eol’ed upstream
18:31:22 <rkukura> Right. That still timed out.
18:31:36 <SumitNaiksatam> rkukura: right
18:32:04 <rkukura> So I tried running the full set locally. I ran into some failures with my branch, then eventually tried running the full set locally on stable/newton without my branch.
18:32:09 <rkukura> This also hit some failures.
18:32:29 <rkukura> So I think stable/newton may already be in a somewhat broken state.
18:32:32 <SumitNaiksatam> rkukura: i think there is an ordering issue
18:32:52 <rkukura> Right, pretty much any small group of tests would run fine in isolation.
18:32:57 <SumitNaiksatam> which gets exposed when we run subsets, or in local environments
18:33:08 <rkukura> I think the schema loading and table clearing stuff isn’t exactly right.
18:33:29 <SumitNaiksatam> i think the config gets carried over to tests
18:33:29 <SumitNaiksatam> rkukura: yes, that too
18:33:42 <SumitNaiksatam> i think i ran into this in the pike prep branch
18:33:43 <rkukura> I cleaned that up with regards to the AIM schema by adding a new AimSqlFixture
18:33:49 <SumitNaiksatam> rkukura: yes, that was great
18:34:45 <rkukura> And I spent some time trying to clean up the way we handle the GBP schema, which uses Neutron's base class, and therefore needs to be handled via Neutron's StaticSqlFixture
18:35:11 <rkukura> But getting this to work properly requires some work…
18:35:32 <rkukura> I think we need to eliminate all of the non-fixture schema loading and table clearing code
18:35:48 <SumitNaiksatam> rkukura: yeah, and the problem is that it's pretty challenging to validate this in the upstream gate (for newton)
18:35:55 <rkukura> And we also need to ensure that the entire GBP schema is visible in every single test module
18:36:02 <SumitNaiksatam> rkukura: sure
18:36:39 <rkukura> So I think we should add a new models.py into GBP that imports all the DB model files that use Neutron’s ModelBase.
18:36:44 <SumitNaiksatam> rkukura: the assumption is that it is currently visible in all the tests that need it
18:37:14 <SumitNaiksatam> rkukura: okay
18:37:27 <rkukura> I think some test modules only import the model files they directly need, which means subsequent tests run by the same process fail later
18:37:51 <SumitNaiksatam> rkukura: okay
18:38:05 <rkukura> Just wanted to see if you all thought this analysis made sense
18:38:26 <SumitNaiksatam> we do the explicit create_tables thing, which is for sure a red flag, and should not be required if we use the fixtures properly, i guess
18:38:33 <rkukura> and if it's worth trying to clean it up on master, then backport it to all our active branches
18:39:37 <SumitNaiksatam> rkukura: per my comment above, the only hole in my understanding is the use of the sql fixture, i thought using that correctly would fix this problem
18:39:41 <rkukura> SumitNaiksatam: My conclusion is that Neutron’s StaticSqlFixture should work properly as long as all the GBP repo’s models that use Neutron’s ModelBase are imported in every test module
18:40:08 <SumitNaiksatam> rkukura: ah, was just saying that, perhaps the import is required in addition to get it to work
18:40:10 <rkukura> I think it's a combo of not going around the fixture, and making sure the fixture always loads the entire schema for all tests
18:40:38 <SumitNaiksatam> rkukura: yeah, nice explanation, it makes sense to me
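(Editorial note: the failure mode rkukura describes can be illustrated with a toy, stdlib-only model. The `MetaData` class stands in for Neutron's ModelBase metadata, and the table names are made up; this is not the actual GBP schema or SQLAlchemy API.)

```python
# Toy illustration: a shared metadata registry only knows about tables
# whose defining model module has actually been imported, so a fixture
# that creates the schema from that registry misses the rest.
class MetaData:
    def __init__(self):
        self.tables = {}

metadata = MetaData()

def define_model(table_name):
    """Stands in for importing a model module, whose class
    definitions register their tables on the shared metadata."""
    metadata.tables[table_name] = object()

# A test module that only imports the models it directly needs:
define_model('gbp_policy_targets')
created_at_fixture_time = set(metadata.tables)  # what the fixture creates

# A later test in the same process touches an unregistered table:
missing = 'gbp_policy_rules' not in created_at_fixture_time

# The proposed models.py aggregator imports every model module up
# front, so the fixture would create the full schema for every test:
define_model('gbp_policy_rules')
```

This is why an aggregator module that imports all ModelBase-derived models, imported by every test module, would make StaticSqlFixture behave consistently regardless of test ordering.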
18:40:52 <rkukura> annakk: make sense to you?
18:41:25 <annakk> I need to catch up on this, sorry :)
18:41:29 <rkukura> no problem
18:41:59 <rkukura> Anyway, if it does make sense, I’ll start a new branch and try to get it working
18:42:11 <SumitNaiksatam> rkukura: sounds good to me
18:42:28 <SumitNaiksatam> annakk: it's a little difficult to follow this discussion without enough context
18:42:47 <rkukura> SumitNaiksatam: Did you say this may be currently an issue for the pike sync?
18:42:58 <annakk> yeah I noticed the cleanup was weird, but never got to the bottom of it
18:43:46 <SumitNaiksatam> rkukura: i made some spot fixes which eliminated issues with configs not getting loaded correctly (previously it worked since the tests were running in a different order in the gate, i guess)
18:44:05 <SumitNaiksatam> rkukura: so, so far not a blocker for pike sync
18:44:30 <SumitNaiksatam> but i can only say that with confidence if i can get the whole suite to pass :-)
18:44:32 <rkukura> SumitNaiksatam: OK. If there are fixes related to UT ordering, it would be ideal to back-port those to the older branches
18:44:48 <SumitNaiksatam> rkukura: okay
18:45:14 <SumitNaiksatam> we might have to cherry-pick only part of the change
18:45:19 <rkukura> sure
18:45:43 <rkukura> the other thing with stable/newton...
18:46:17 <rkukura> It seems to me that if infra reduces the number of CPU cores allocated, they should correspondingly increase the overall job timeout.
18:46:34 <SumitNaiksatam> rkukura: right, that is the source of the timeout
18:47:01 <SumitNaiksatam> sorry, i missed your “if”
18:47:18 <rkukura> s/if/when/
18:47:47 <SumitNaiksatam> rkukura: well, based on my observation, that definitely did not happen
18:48:02 <rkukura> what’s the process to request such a change from infra?
18:48:19 <SumitNaiksatam> rkukura: i think this can be controlled from our gate job
18:48:26 <rkukura> really?
18:48:35 <SumitNaiksatam> i mean, not from our repo
18:48:48 <SumitNaiksatam> but from our definition of the gate job in the openstack-infra
18:49:04 <rkukura> Do we propose a change in gerrit to do that?
18:49:09 <SumitNaiksatam> but i wasn't comfortable even doing that before we put our house in order
18:49:23 <SumitNaiksatam> rkukura: yes, but i don't know what the exact config is
18:49:33 <rkukura> Any particular part of our house?
18:49:51 <SumitNaiksatam> i believe it would go into the zuul def in project-config repo
18:50:10 <SumitNaiksatam> rkukura: well we did a part of that by eliminating some UTs
18:50:26 <SumitNaiksatam> some of our UTs run for 90 seconds or so
18:50:58 <rkukura> We control the timeout for individual UTs in our own repo, so I don’t see that as an issue
18:51:09 <SumitNaiksatam> in general, if you see much bigger projects like neutron, their py27 jobs still finish in about 20 mins
18:51:24 <SumitNaiksatam> so we are doing more than unit testing :-)
18:51:30 <rkukura> we are better off with fewer long-running UTs than having many UTs duplicate the same setup work
18:51:51 <rkukura> right
18:51:57 <SumitNaiksatam> rkukura: i agree that the philosophy makes more sense for GBP
18:51:59 <rkukura> but I don’t see us changing that anytime soon
18:52:04 <SumitNaiksatam> rkukura: yes
18:52:13 <SumitNaiksatam> and hence my predicament in going to infra :-)
18:52:33 <SumitNaiksatam> *going to infra -> approaching the infra team earlier
18:53:28 <SumitNaiksatam> also trying to make this case for an eol’ed branch seemed like a tough proposition to begin with
18:53:28 <rkukura> If we just proposed a patch to an infra repo to increase the stable/newton py27 job timeout, explaining that it's needed due to fewer cores being allocated, do you think they would reject it?
18:53:29 <annakk> can we split the UT to several jobs?
18:53:56 <SumitNaiksatam> annakk: yes, that is possible, and it's a nice novel idea
18:54:08 <rkukura> Seems restructuring UTs is not something that should even be back-ported to a stable branch, let alone for an EOL branch
18:54:20 <SumitNaiksatam> rkukura: agree
18:54:39 <SumitNaiksatam> annakk: but i don't think we would be able to do that for newton branch
18:54:59 <rkukura> So we could promise to work to improve things on our master, while asking for the change on the EOL stable jobs
18:55:00 <SumitNaiksatam> rkukura: one option is that we enumerate the tests that we want to run
18:55:11 <SumitNaiksatam> rkukura: possible
18:55:29 <rkukura> was wondering if we could specify a regex or list in testr.conf
18:55:41 <SumitNaiksatam> rkukura: unfortunately not regex
18:55:52 <rkukura> that’s what I concluded too
18:55:57 <SumitNaiksatam> you can either give a directory starting point for discovery
18:56:04 <SumitNaiksatam> or provide a list
18:56:22 <SumitNaiksatam> i haven't tried the latter, but based on documentation it should be doable
18:56:32 <rkukura> when running tox manually, you can use a regex on the command line I think, which matches a subset of those discovered
18:56:52 <rkukura> wonder if we can change the job to do that?
18:56:58 <SumitNaiksatam> rkukura: ah, i did not try that, but good point
18:57:39 <SumitNaiksatam> rkukura: we might have to completely define a new py27 job
18:57:46 <annakk> I think regex worked for me
18:57:53 <SumitNaiksatam> annakk: nice
18:58:07 <SumitNaiksatam> rkukura: we currently use the standard py27 job definition
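(Editorial note: a hypothetical .testr.conf sketch of the two options just discussed — pointing discovery at a subdirectory, or loading an explicit test list. The paths and package names are illustrative, not the repo's actual layout; when running tox manually, a filter regex can also be passed through as `tox -e py27 -- <regex>`.)

```ini
[DEFAULT]
# Option 1: narrow the discovery root to a subset directory
# (path is illustrative):
test_command=${PYTHON:-python} -m subunit.run discover -t ./ \
    gbpservice/tests/unit/subset $LISTOPT $IDOPTION
# Option 2: drive the run from an explicit enumerated list of test ids:
test_id_option=--load-list $IDFILE
test_list_option=--list
```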
18:58:23 <SumitNaiksatam> anyway, we are hitting the hour
18:58:29 <rkukura> I’ve never looked at how these jobs are defined
18:58:37 <SumitNaiksatam> rkukura: i can send you a pointer
18:58:38 <rkukura> that’s all from me on stable/newton
18:58:43 <SumitNaiksatam> rkukura: thanks
18:59:01 <SumitNaiksatam> annakk: anything more for today?
18:59:13 <SumitNaiksatam> annakk: great to read that you enjoyed Sydney! :-)
18:59:15 <annakk> no
18:59:21 <annakk> :)
18:59:39 <SumitNaiksatam> was planning to put your retrospective on the agenda the following week but we missed the meeting
18:59:44 <SumitNaiksatam> thanks for your email though!
18:59:53 <SumitNaiksatam> alrighty thanks rkukura and annakk for joining!
18:59:55 <SumitNaiksatam> bye!
18:59:58 <rkukura> thanks!
19:00:00 <SumitNaiksatam> #endmeeting