18:02:38 #startmeeting networking_policy
18:02:39 Meeting started Thu Nov 30 18:02:38 2017 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:42 The meeting name has been set to 'networking_policy'
18:02:47 hi annakk
18:03:38 as I was mentioning to rkukura, I don't have much of an agenda for today except the "pike sync"
18:04:03 but i thought it might be good to meet since we haven't met for a few weeks for different reasons
18:04:05 * mlavalle waves at rkukura and SumitNaiksatam
18:04:19 mlavalle: hi there! :-)
18:04:20 maybe we should discuss the stable/newton branch a bit too
18:04:25 rkukura: sure
18:04:32 #topic Pike Support
18:04:32 hi mlavalle!
18:05:13 so we need at least one more reviewer on this: #link https://review.openstack.org/#/c/514864/
18:05:18 rkukura: thanks for your review
18:05:34 looks good
18:05:37 (had fixed the devstack gate job prior to this, so it reflects in this patch as well)
18:05:58 annakk: sorry, i might have missed your comment here
18:06:08 I had one comment there but it's not a must for sure
18:06:29 i saw it before Thanksgiving and have been on leave since
18:06:34 totally forgot about it
18:06:39 np :)
18:06:52 I just thought it was preferable to limit the hack to the aim notifier only
18:07:12 annakk: yes sure, i will respond to the comment
18:07:28 I'll be ready to re-review
18:07:48 and i can also do a follow-up patch if required (i might not be able to get to this until Monday)
18:08:20 i was spending most of my time on this patch #link https://review.openstack.org/#/c/514864/
18:08:30 but the patchset is outdated
18:08:51 i have a lot more changes in my local repo which i should have posted
18:08:55 same patch?
18:09:02 oh sorry
18:09:33 #link https://review.openstack.org/#/c/518183/
18:09:54 basically i was stuck on a transactions issue
18:10:39 the new reader/writer calls in neutron were breaking the apic driver's long-transaction requirement
18:10:57 had discussed with rkukura and tbachman offline before
18:11:01 rkukura: thanks for the discussion
18:11:32 SumitNaiksatam: Is this an issue for the apic_aim MD, or just the aim_mapping PD?
18:11:33 so it seems like nesting the reader/writer calls will preserve the session
18:11:36 is this a new issue? sounds like deja vu
18:11:56 starts with the apic_aim MD
18:12:01 rkukura: ^^^
18:12:08 and of course for the aim_mapping as well
18:12:32 SumitNaiksatam: was wondering if the problem surfaces in the apic_aim MD UTs?
18:12:38 annakk: the issue is new; the usage of reader/writer is not new
18:13:05 rkukura: yes, those are the ones i was trying to fix all along
18:13:35 so anyway, the nesting seems to work; what was throwing me off was another bug in neutron (in the extend dict part)
18:14:37 we will still have to patch the reader/writer part for attaching the notification queue to the session (like before)
18:14:48 and for that i am trying to figure out how to get it to work with the context manager
18:15:04 the extend dict was handled in ocata as well (subnet pools and address scopes)
18:15:38 annakk: yes, but there was a commit made in neutron which moved the extend_dict call to a different point (essentially before post_commit is called)
18:15:54 oh
18:16:06 annakk: and because of that we were never seeing resources extended
18:16:08 SumitNaiksatam: Is it called within the transaction now?
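
[Editor's note: a minimal sketch of the nesting behavior discussed above, assuming Pike-era neutron.db.api, which exposes oslo.db's enginefacade as context_manager; create_foo, _precommit_hook, and foo_db are hypothetical names used only for illustration.]

    from neutron.db import api as db_api

    def create_foo(context, foo_db):
        # Outer writer block: starts the transaction and binds a
        # session to the request context.
        with db_api.context_manager.writer.using(context):
            _precommit_hook(context, foo_db)
            # ... the rest of the driver's "long transaction" ...

    def _precommit_hook(context, foo_db):
        # Nesting a writer block with the same context does not open
        # a new transaction: enginefacade sees the active session on
        # the context and reuses it, which is what preserves the apic
        # driver's long-transaction requirement.
        with db_api.context_manager.writer.using(context):
            context.session.add(foo_db)
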
18:16:19 rkukura: no
18:16:40 That's too bad
18:16:45 rkukura: i had meant to send you the commit last week, somehow slipped my mind
18:17:09 by commit, i mean the patch where this change happened
18:17:28 I'd like to see it
18:17:35 i can't seem to find it now
18:17:49 it was a patch dealing with MTU setting i think
18:17:59 later is fine
18:18:46 rkukura: to fix that would require a small monkey patch in GBP
18:19:07 but having done all that, the UTs seemed to pass in my local env
18:19:30 ok, but maybe there is a reason for the change in Neutron
18:19:42 so that was some progress as opposed to what seemed like a bit of a hopeless situation with regards to dealing with the transactions (when i last spoke to rkukura)
18:19:59 rkukura: yes, i think for that particular resource they needed it that way
18:20:26 rkukura: they might not have hit the side effects since other resources might not be using the extension
18:20:43 rkukura: but anyway, you can take a look at it and let me know what you think
18:20:50 sure
18:21:55 I've always felt extend_dict should be done just before the transaction commits to ensure that the results returned to the client reflect the state committed by the operation
18:22:07 rkukura: agree
18:22:23 i was down to some 200 UT failures and most seemed to be related to this transaction issue
18:22:41 i hope there is no other issue lurking
18:23:07 this was just the UT, so we would still need to update the devstack job to see if everything works properly
18:23:23 *UTs
18:23:29 so did this reduce below 200 failures with the extend_dict monkey patch?
18:23:55 or was it 200 failures with the monkey patch?
18:24:01 rkukura: no no, the 200 UT failures were prior to all the discussion above
18:24:16 ok, great!
18:24:16 all that i discussed above is not in the latest patchset posted
18:24:35 and apologies, i should have posted that patchset like a week back
18:25:16 i had too many debugs all over the place and didn't get a chance to clean them
18:25:18 anyway
18:25:33 annakk: rkukura: anything more to discuss on Pike?
18:25:44 not from me..
18:25:49 nothing from me
18:26:20 once the UTs pass, perhaps if anyone has time, we can split the work and knock off updating the gate jobs in parallel
18:26:34 rkukura: we would need to create temp branches for the apic repos
18:26:50 rkukura: hopefully we don't have to do many code updates
18:26:59 will need to discuss with tbachman
18:27:13 right
18:27:18 so far i was just running against the master of those repos and the UTs seemed to be okay
18:27:33 but we should at least have placeholder branches
18:27:44 I hope to be able to take some, but in a while cause next week I'm on PTO
18:27:55 nsx is great in that regard since they have long had the pike branches :-)
18:27:59 I think we use AIM master for all GBP branches, don't we?
18:28:01 annakk: np
18:28:37 rkukura: will need to go back and check (aim does use some oslo libs)
18:28:57 agreed we need to check
18:29:05 #topic Newton branches
18:29:09 rkukura: go ahead
18:30:25 I back-ported an AIM-related fix to stable/newton recently, and noticed only a small subset of the UTs run. I understand this is done because the number of cores for stable/newton jobs has been reduced.
18:30:54 rkukura: yes, we went through several iterations on this, and essentially gave up!
18:30:58 So I tried restoring the full set (we had already eliminated a lot of redundant Neutron UTs from ml2plus)
18:31:06 since the branch is EOL'ed upstream
18:31:22 Right. That still timed out.
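
[Editor's note: a hedged sketch of the "small monkey patch in GBP" mentioned above. _apply_dict_extend_functions is the Pike-era extend-dict hook runner in neutron.db.common_db_mixin; the exact patch point, and what the wrapper would need to do beyond re-running the hooks inside the transaction, are assumptions.]

    from neutron.db import common_db_mixin

    _orig_apply = common_db_mixin.CommonDbMixin._apply_dict_extend_functions

    def _patched_apply(self, resource_type, response, db_object):
        # Run the registered extend-dict hooks; a real patch would
        # arrange for this to happen just before the transaction
        # commits, so the dict returned to the client reflects the
        # committed state (per rkukura's point above).
        return _orig_apply(self, resource_type, response, db_object)

    common_db_mixin.CommonDbMixin._apply_dict_extend_functions = _patched_apply
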
18:31:36 rkukura: right
18:32:04 So I tried running the full set locally. I ran into some failures with my branch, then eventually tried running the full set locally on stable/newton without my branch.
18:32:09 This also hit some failures.
18:32:29 So I think stable/newton may already be in a somewhat broken state.
18:32:32 rkukura: i think there is an ordering issue
18:32:52 Right, pretty much any small group of tests would run fine in isolation.
18:32:57 which gets exposed when we run subsets, or in local environments
18:33:08 I think the schema loading and table clearing stuff isn't exactly right.
18:33:29 i think the config gets carried over to tests
18:33:29 rkukura: yes, that too
18:33:42 i think i ran into this in the pike prep branch
18:33:43 I cleaned that up with regards to the AIM schema by adding a new AimSqlFixture
18:33:49 rkukura: yes, that was great
18:34:45 And I spent some time trying to clean up the way we handle the GBP schema, which uses Neutron's base class, and therefore needs to be handled via Neutron's StaticSqlFixture
18:35:11 But getting this to work properly requires some work…
18:35:32 I think we need to eliminate all of the non-fixture schema loading and table clearing code
18:35:48 rkukura: yeah, and the problem is that it's pretty challenging to validate this in the upstream gate (for newton)
18:35:55 And we also need to ensure that the entire GBP schema is visible in every single test module
18:36:02 rkukura: sure
18:36:39 So I think we should add a new models.py into GBP that imports all the DB model files that use Neutron's ModelBase.
18:36:44 rkukura: the assumption is that it is currently visible in all the tests that need it
18:37:14 rkukura: okay
18:37:27 I think some of the tests don't import model files that aren't needed for that test module, which means subsequent tests run by the same process fail later
18:37:51 rkukura: okay
18:38:05 Just wanted to see if you all thought this analysis made sense
18:38:26 we do the explicit create_tables thing, which is for sure a red flag, and should not be required if we use the fixtures properly, i guess
18:38:33 and if it's worth trying to clean it up on master, then backport it to all our active branches
18:39:37 rkukura: per my comment above, the only hole in my understanding is the use of the sql fixture; i thought using that correctly would fix this problem
18:39:41 SumitNaiksatam: My conclusion is that Neutron's StaticSqlFixture should work properly as long as all the GBP repo's models that use Neutron's ModelBase are imported in every test module
18:40:08 rkukura: ah, was just saying that; perhaps the import is required in addition to get it to work
18:40:10 I think it's a combo of not going around the fixture, and making sure the fixture always loads the entire schema for all tests
18:40:38 rkukura: yeah, nice explanation, it makes sense to me
18:40:52 annakk: make sense to you?
18:41:25 I need to catch up on this, sorry :)
18:41:29 no problem
18:41:59 Anyway, if it does make sense, I'll start a new branch and try to get it working
18:42:11 rkukura: sounds good to me
18:42:28 annakk: it's a little difficult to follow this discussion without enough context
18:42:47 SumitNaiksatam: Did you say this may be currently an issue for the pike sync?
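
[Editor's note: a sketch of the proposed models.py. The module list below is illustrative (these are existing GBP DB modules, but the complete set would need to be audited); the idea is that importing this single module from every test module makes the full GBP schema visible to Neutron's StaticSqlFixture.]

    # gbpservice/neutron/db/all_models.py (proposed)
    # Import every GBP module that defines models on Neutron's
    # ModelBase, so that any test module importing this one file
    # registers the entire GBP schema before the SQL fixture creates
    # (and later clears) the tables.
    from gbpservice.neutron.db.grouppolicy import group_policy_db  # noqa
    from gbpservice.neutron.db.grouppolicy import group_policy_mapping_db  # noqa
    from gbpservice.neutron.db import servicechain_db  # noqa
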
18:42:58 yeah I noticed the cleanup was weird, but never got to the bottom of it
18:43:46 rkukura: i made some spot fixes which eliminated issues with configs not getting loaded correctly (previously it worked since the tests were running in a different order in the gate, i guess)
18:44:05 rkukura: so, so far not a blocker for pike sync
18:44:30 but i can only say with confidence if i can get the whole suite to pass :-)
18:44:32 SumitNaiksatam: OK. If there are fixes related to UT ordering, it would be ideal to back-port those to the older branches
18:44:48 rkukura: okay
18:45:14 we might have to cherry-pick only part of the change
18:45:19 sure
18:45:43 the other thing with stable/newton...
18:46:17 It seems to me that if infra reduces the number of CPU cores allocated, they should correspondingly increase the overall job timeout.
18:46:34 rkukura: right, that is the source of the timeout
18:47:01 sorry, i missed your "if"
18:47:18 s/if/when/
18:47:47 rkukura: well, based on my observation, that definitely did not happen
18:48:02 what's the process to request such a change from infra?
18:48:19 rkukura: i think this can be controlled from our gate job
18:48:26 really?
18:48:35 i mean, not from our repo
18:48:48 but from our definition of the gate job in openstack-infra
18:49:04 Do we propose a change in gerrit to do that?
18:49:09 but i wasn't comfortable even doing that before we put our house in order
18:49:23 rkukura: yes, but i don't know what that exact config is
18:49:33 Any particular part of our house?
18:49:51 i believe it would go into the zuul def in the project-config repo
18:50:10 rkukura: well we did a part of that by eliminating some UTs
18:50:26 some of our UTs run for 90 seconds or so
18:50:58 We control the timeout for individual UTs in our own repo, so I don't see that as an issue
18:51:09 in general, if you see much bigger projects like neutron, their py27 jobs still finish in about 20 mins
18:51:24 so we are doing more than unit testing :-)
18:51:30 we are better off with fewer long-running UTs than having many UTs duplicate the same work to get set up
18:51:51 right
18:51:57 rkukura: i agree that the philosophy makes more sense for GBP
18:51:59 but I don't see us changing that anytime soon
18:52:04 rkukura: yes
18:52:13 and hence my predicament in going to infra :-)
18:52:33 *going to infra -> approaching the infra team earlier
18:53:28 also trying to make this case for an EOL'ed branch seemed like a tough proposition to begin with
18:53:29 If we just proposed a patch to an infra repo to increase the stable/newton py27 job timeout, explaining that it's needed due to fewer cores being allocated, do you think they would reject it?
18:53:29 can we split the UTs into several jobs?
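
[Editor's note: rkukura's "we control the timeout for individual UTs in our own repo" refers to the usual OS_TEST_TIMEOUT pattern; a minimal sketch follows, with the base-class name hypothetical.]

    import os

    import fixtures
    import testtools

    class GbpTestCase(testtools.TestCase):  # hypothetical base class
        def setUp(self):
            super(GbpTestCase, self).setUp()
            timeout = int(os.environ.get('OS_TEST_TIMEOUT', 90))
            # Abort any single test that exceeds its budget, so one
            # long-running UT cannot consume the whole job timeout.
            self.useFixture(fixtures.Timeout(timeout, gentle=True))
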
18:53:56 annakk: yes, that is possible, and it's a nice novel idea
18:54:08 Seems restructuring UTs is not something that should even be back-ported to a stable branch, let alone an EOL branch
18:54:20 rkukura: agree
18:54:39 annakk: but i don't think we would be able to do that for the newton branch
18:54:59 So we could promise to work to improve things on our master, while asking for the change on the EOL stable jobs
18:55:00 rkukura: one option is that we enumerate the tests that we want to run
18:55:11 rkukura: possible
18:55:29 was wondering if we could specify a regex or list in testr.conf
18:55:41 rkukura: unfortunately not regex
18:55:52 that's what I concluded too
18:55:57 you can either give a directory starting point for discovery
18:56:04 or provide a list
18:56:22 i haven't tried the latter, but based on the documentation it should be doable
18:56:32 when running tox manually, you can use a regex on the command line I think, which matches a subset of those discovered
18:56:52 wonder if we can change the job to do that?
18:56:58 rkukura: ah, i did not try that, but good point
18:57:39 rkukura: we might have to completely define a new py27 job
18:57:46 I think regex worked for me
18:57:53 annakk: nice
18:58:07 rkukura: we currently use the standard py27 job definition
18:58:23 anyway, we are hitting the hour
18:58:29 I've never looked at how these jobs are defined
18:58:37 rkukura: i can send you a pointer
18:58:38 that's all from me on stable/newton
18:58:43 rkukura: thanks
18:59:01 annakk: anything more for today?
18:59:13 annakk: great to read that you enjoyed Sydney! :-)
18:59:15 no
18:59:21 :)
18:59:39 was planning to put your retrospective on the agenda the following week but we missed the meeting
18:59:44 thanks for your email though!
18:59:53 alrighty, thanks rkukura and annakk for joining!
18:59:55 bye!
18:59:58 thanks!
19:00:00 #endmeeting
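
[Editor's postscript: the subset-selection behavior annakk and rkukura describe near the end matches testr's usual interface: a positional regex passed through tox filters the discovered tests, and an explicit list can be supplied with --load-list. The test-path regex below is illustrative.]

    # filter by regex via tox (posargs are passed through to testr)
    tox -e py27 -- gbpservice.neutron.tests.unit.services.grouppolicy

    # or run an enumerated list of tests from a file
    testr run --load-list my-test-list.txt
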