18:01:06 <SumitNaiksatam> #startmeeting networking_policy
18:01:07 <openstack> Meeting started Thu Mar 23 18:01:06 2017 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:11 <openstack> The meeting name has been set to 'networking_policy'
18:01:26 <igordcard> hi SumitNaiksatam
18:01:29 <igordcard> hi all
18:01:32 <SumitNaiksatam> #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#March_23rd_2017
18:01:37 <SumitNaiksatam> rkukura: hi
18:01:38 <rkukura> hi
18:02:01 <SumitNaiksatam> so most of the newton sync patches merged over the past few days
18:02:06 <SumitNaiksatam> thanks all for the work and the reviews
18:02:06 <tbachman> SumitNaiksatam: congrats!
18:02:14 * igordcard claps
18:02:20 * tbachman knows that was no small effort
18:02:25 <SumitNaiksatam> tbachman: but still facing some niggling issues :-)
18:02:28 <tbachman> :)
18:02:36 <SumitNaiksatam> igordcard: have to apologize to you since it created more work for you
18:02:47 <SumitNaiksatam> so lets go to QoS first
18:03:04 <SumitNaiksatam> #topic QoS via NSP patch
18:03:04 <igordcard> SumitNaiksatam: it's ok I invested on the wrong week to do it :p
18:03:06 <SumitNaiksatam> #link https://review.openstack.org/#/c/426436
18:03:15 <SumitNaiksatam> igordcard: :-(
18:03:28 <SumitNaiksatam> igordcard: so per your latest comment, this is in good shape now?
18:03:59 <igordcard> SumitNaiksatam: it looks like it is, at least by comparing the latest nfp gate failures with the nfp gate failures of one of the last merged patch
18:04:10 <SumitNaiksatam> igordcard: nice
18:04:28 <igordcard> I am disabling QoS entirely on the aim gate, is this ok?
18:04:36 <SumitNaiksatam> igordcard: i was planning to look at it before the meeting, but got distracted with something else
18:04:43 <SumitNaiksatam> igordcard: oh
18:04:59 <SumitNaiksatam> igordcard: does it break if you dont?
18:05:24 <igordcard> SumitNaiksatam: yeah and the errors weren't very explicit
18:05:43 <igordcard> SumitNaiksatam: I believe it was mentioned back in the time that aim didn't have to support qos?
18:06:01 <SumitNaiksatam> igordcard: yes, aim doesnt have to support
18:06:10 <SumitNaiksatam> igordcard: but i would be curious to know why it failed
18:06:32 <SumitNaiksatam> igordcard: so when you say you disabled qos, you mean which configuration?
18:06:35 <igordcard> #link http://logs.openstack.org/36/426436/20/check/gate-group-based-policy-dsvm-aim-ubuntu-xenial-nv/fd4ebeb/console.html#_2017-03-21_22_10_08_099496
18:07:04 <rkukura> I’m part way through reviewing that patch, and thought the devstack config looked reasonable
18:07:11 <igordcard> #link https://review.openstack.org/#/c/426436/22/devstack/override-defaults
18:07:17 <SumitNaiksatam> rkukura: great
18:07:37 <SumitNaiksatam> igordcard: okay got it
18:07:41 <rkukura> It does not add the QoS extension driver for AIM, but does otherwise
18:07:41 <SumitNaiksatam> igordcard: that looks fine to me
18:07:49 <SumitNaiksatam> rkukura: yeah, that is fine for now
18:08:14 <SumitNaiksatam> rkukura: so great, rkukura i was going to request you to review, but looks like you are already on it
18:08:38 <rkukura> yes
18:08:48 <SumitNaiksatam> so if nothing major, lets try to merge it before it diverges more
18:08:56 <SumitNaiksatam> igordcard: we would have to backport to stable/newton
18:09:12 <SumitNaiksatam> and then the question is if we should backport it to stable/mitaka as well
18:09:16 <rkukura> only comment so far other than ripping out the clean_session stuff had to do with the exception text
18:09:45 <SumitNaiksatam> i ah
18:09:48 <SumitNaiksatam> *ah
18:10:02 <rkukura> I’m not sure I understand the original text either
18:10:02 <SumitNaiksatam> so looks like it will need another rebase :-(
18:10:42 * rkukura will be back in about 2 minutes
18:10:57 <igordcard> great, feel free to leave all the comments there and I'll fix and rebase on the next patchset
18:11:24 <SumitNaiksatam> igordcard: thanks
18:11:49 <SumitNaiksatam> #topic NFP patches
18:12:10 <SumitNaiksatam> the other big thing I had were the NFP patches
18:12:33 <SumitNaiksatam> oh and there is songole right on cue :-)
18:12:38 <songole> Hi
18:12:49 <songole> Wrong timing :)
18:12:55 <SumitNaiksatam> songole: lol
18:13:06 <tbachman> lol
18:13:09 * igordcard thanks all and gracefully leaves to get home
18:13:11 <SumitNaiksatam> songole: so you and hemanth are mostly shepherding the NFP patches
18:13:20 <SumitNaiksatam> igordcard: thanks a bunch for taking the time to join!
18:13:27 <SumitNaiksatam> igordcard: good night!
18:13:32 <igordcard> :)
18:13:50 <SumitNaiksatam> songole: a disruptive patch just merged
18:14:14 <songole> What is it? Qos?
18:14:17 <SumitNaiksatam> disruptive in the sense that it requires a rebase for other patches
18:14:24 <SumitNaiksatam> songole: no, QoS not merged yet
18:14:43 <songole> Ah.
18:14:48 <SumitNaiksatam> oh, I should have mentioned this in the bigger context - we are completely eliminating all the “clean_session” stuff
18:15:39 <SumitNaiksatam> #link https://review.openstack.org/#/c/448885/
18:16:09 <SumitNaiksatam> so this eliminates the use of the clean_session flag
18:16:22 <SumitNaiksatam> but it causes merge conflicts with the existing patches
18:16:39 <SumitNaiksatam> so if you see conflicts thats the first thing you need to take care of
18:18:11 <songole> ok
18:18:12 <SumitNaiksatam> songole: other than that, how are we doing on the NFP patches?
18:18:41 <SumitNaiksatam> songole: there were some patches which tried to fix the NFP gate job, and which we merged, but the gate job is still broken
18:18:42 <songole> we are facing a few issues with lbaasv2
18:18:48 <SumitNaiksatam> songole: ah okay
18:18:56 <SumitNaiksatam> songole: do we need to discuss here?
18:19:12 <songole> in the base mode where we used to use the namespace lb implementation
18:19:21 <songole> without having to launch a VM for lb
18:19:41 <SumitNaiksatam> songole: ah, but you can do that any more?
18:19:45 <SumitNaiksatam> *cant
18:20:01 <songole> looks like it. default is octavia
18:20:08 <SumitNaiksatam> bummer!!!
18:20:10 <songole> which spins up a vm
18:20:16 <SumitNaiksatam> hmmm
18:20:25 <SumitNaiksatam> okay i see the difficulty now
18:20:43 <tbachman> LBaa(!OS)S
18:20:56 <SumitNaiksatam> tbachman: :-) :-(
18:21:02 <songole> lol
18:21:15 <SumitNaiksatam> mixed feelings
18:21:32 <SumitNaiksatam> songole: so there is no way to adapt the old driver to v2?
18:21:34 <songole> so, it may take some time to get the tests running
18:21:43 <SumitNaiksatam> just to validate the gate
18:21:52 <tbachman> I think you can still use LBaaSv2, but it’s deprecated
18:22:05 <SumitNaiksatam> tbachman: i think you mean v1
18:22:07 * rkukura finally back
18:22:08 <tbachman> ah
18:22:09 <tbachman> k
18:22:15 <SumitNaiksatam> and v1 is totally out of newton
18:22:16 <tbachman> it will finally hit us at some point
18:22:17 <SumitNaiksatam> hence the issue
18:22:37 <songole> ash said there might be a way to use namespace in v2
18:22:48 <SumitNaiksatam> songole: okay
18:23:08 <SumitNaiksatam> songole: that said should the tests work on mitaka?
18:23:11 * tbachman was confused
18:23:20 <SumitNaiksatam> tbachman: np, i know what you meant
18:23:29 <songole> mitaka should be good
18:23:46 <songole> the issue is only with newton
18:24:08 <songole> but nfp tests are failing on mitaka for a different reason..
18:24:17 <SumitNaiksatam> songole: so what i am suggesting is that to some small extent we can at least retroactively validate against stable/mitaka (but this will be after the backport, so master is already merged by then)
18:24:25 <SumitNaiksatam> songole: ah okay
18:24:51 <SumitNaiksatam> songole: what happens if you dont launch the service instance in newton?
18:25:05 <SumitNaiksatam> songole: can we fake the responses?
18:25:54 <SumitNaiksatam> songole: just so that we can derive benefit from all the other things in that gate test
18:26:03 <songole> Will explore the idea
18:26:14 <SumitNaiksatam> songole: right now since the whole job fails we cant tell what is broken
18:26:52 <SumitNaiksatam> songole: okay thanks
18:26:52 <songole> got it
18:27:04 <SumitNaiksatam> songole: anything else you want to bring up on the NFP patches today?
18:27:42 <songole> nothing more..
18:27:47 <SumitNaiksatam> songole: okay thanks
18:27:54 <SumitNaiksatam> #topic Open Discussion
18:28:19 <SumitNaiksatam> so we are seeing some perplexing DB issues when running newton (with aim_mapping driver)
18:28:31 <SumitNaiksatam> which are not noticed in the gate
18:28:55 * tbachman listens in
18:29:00 <SumitNaiksatam> so if you hit any weirdness, get in touch with me, might save you some time
18:29:22 <SumitNaiksatam> tbachman: we think that the session is somehow leaking across threads
18:29:36 <tbachman> SumitNaiksatam: sounds familiar ;)
18:29:46 <SumitNaiksatam> tbachman: again :-) :-(
18:29:52 <tbachman> I was always a little nervous when we backed out the expunge_all
18:30:09 <tbachman> b/c I wasn’t convinced we no longer had a root cause
18:30:31 <SumitNaiksatam> tbachman: i think the expunge all would have created even more problems
18:30:37 <SumitNaiksatam> tbachman: but thanks for bringing that up
18:30:45 <tbachman> SumitNaiksatam: ack. but it was something that let us know that there might be a problem there
18:30:50 <tbachman> (i.e. that it was a possibility)
18:30:54 <SumitNaiksatam> tbachman: i had lost track of the fact that neutron does some expunging in newton
18:31:06 <tbachman> SumitNaiksatam: do you have any logs of the more recent failures?
18:31:13 <SumitNaiksatam> tbachman: initially i had patched that but then i let it be there
18:31:57 <SumitNaiksatam> tbachman: these are in the deployment, i think jishnu refreshed the fab
18:32:05 <tbachman> SumitNaiksatam: ack
18:32:16 <SumitNaiksatam> tbachman: will send you once we are able to reproduce them again
18:32:21 <tbachman> SumitNaiksatam: thx!
18:32:27 * rkukura back in moment
18:32:33 * tbachman may be working on the same problem with Jishnu in parallel
18:32:34 <SumitNaiksatam> i also ran into a very weird issue in stable/mitaka
18:32:40 <SumitNaiksatam> tbachman: ah right
18:32:49 <SumitNaiksatam> the issue is complicated to explain
18:33:20 <SumitNaiksatam> but basically I was seeing this exception - “ResourceClosedError: This transaction is closed”
18:33:29 <SumitNaiksatam> i know exactly where it is happening
18:33:34 <SumitNaiksatam> but not why!
18:33:38 <SumitNaiksatam> i have a workaround
18:33:43 <songole> SumitNaiksatam: are you able to bring up a working devstack on newton consistently?
18:33:47 <SumitNaiksatam> but need to get the root cause
18:34:05 <SumitNaiksatam> songole: the devstack installation is happening in every gate job run
18:34:08 <SumitNaiksatam> songole: so yes
18:34:17 <SumitNaiksatam> songole: i tried last week on my system
18:34:20 <SumitNaiksatam> not this well
18:34:24 <SumitNaiksatam> *week
18:34:44 <tbachman> SumitNaiksatam: this is only on Mitaka?
18:34:48 <songole> right.
18:34:58 <SumitNaiksatam> tbachman: and yes, this later issue on mitaka
18:35:10 <SumitNaiksatam> tbachman: the session sharing issue on newton
18:36:08 <tbachman> SumitNaiksatam: so this isn’t related to the problem you were seeing with the newton sync
18:36:12 <tbachman> (b/c it’s mitaka)
18:36:37 <tbachman> (i.e. caused by the addition of the context manager)
18:37:05 * rkukura is back
18:37:07 <SumitNaiksatam> tbachman: no
18:37:26 <SumitNaiksatam> tbachman: yes, mitaka issue has probably been around
18:37:39 <SumitNaiksatam> tbachman: its just that we didnt catch it
18:37:51 <SumitNaiksatam> tbachman: the issue is that all the DB operations are committed
18:38:14 <SumitNaiksatam> however, when the outer most transaction context exits, the exception is thrown
18:38:20 <SumitNaiksatam> (i am referring to the resource closed)
18:39:19 * tbachman wishes he had slayed that dragon when he first encountered it :(
18:39:21 <SumitNaiksatam> this is the trace from that: #link https://www.pastiebin.com/58cd90f6bce04
18:39:49 <SumitNaiksatam> tbachman: i dont know if its just one dragon
18:40:12 <SumitNaiksatam> and for the newton issue, this is the trace: #link https://www.pastiebin.com/58d16e63140f9
18:40:30 <SumitNaiksatam> we dont have a lead on the newton issue
18:40:46 <SumitNaiksatam> on the mitaka issue, like i said, i have a workaround, but its not committed in code
18:41:09 <SumitNaiksatam> the newton issue gets triggered only when we use a heat template to exercise the system
18:41:47 <SumitNaiksatam> rkukura: and you are looking at this, #link https://www.pastiebin.com/58d01638c9db3
18:42:19 <SumitNaiksatam> so three DB issues, when the aim drivers are used
18:42:43 <SumitNaiksatam> just want to make sure everyone is aware in case you hit them
18:42:46 <rkukura> I’ve looked at the log, but that’s about as far as I’ve got. Seems it retries and succeeds.
18:43:02 <SumitNaiksatam> rkukura: yeah there is no functional issue there
18:43:14 <SumitNaiksatam> rkukura: i think this might be on account of the expunge that neutron is doing
18:43:50 <rkukura> SumitNaiksatam: Where is that expunge?
18:45:10 <SumitNaiksatam> rkukura: so i recall that before the extension attributes get processed, the expunge gets call
18:45:13 <SumitNaiksatam> *called
18:45:41 <SumitNaiksatam> i am really not comfortable with that expunge but i did not patch it since it was not breaking things in the gate tests
18:45:45 <rkukura> right - I’ll look and see if that could be related
18:45:53 <SumitNaiksatam> rkukura: just do a grep and you will see it
18:45:59 <SumitNaiksatam> i think its in a couple of places
18:46:05 <tbachman> it’s weird — how come we don’t see the parent class implementation in delete_policy_target?
18:46:08 <tbachman> (in the trace)
18:46:19 <SumitNaiksatam> this is the neutron code that i am referring to, not GBP
18:46:24 <tbachman> (the one for stable/mitaka)
18:46:58 <SumitNaiksatam> tbachman: its difficult to read that trace in the way things get called on account of the decorators involved
18:47:05 <tbachman> yeah
18:47:30 <SumitNaiksatam> tbachman: you will flip even more if i tell you how the workaround works
18:47:35 <tbachman> :)
18:48:16 <SumitNaiksatam> anyway, not a very happy place right now!
18:48:28 <tbachman> :(
18:48:36 * tbachman hands SumitNaiksatam a snickers
18:48:40 <SumitNaiksatam> tbachman: :-)
18:48:55 <SumitNaiksatam> tbachman: so if you are running tempest tests, be warned
18:49:04 <tbachman> SumitNaiksatam: thx
18:49:22 <SumitNaiksatam> not that its very helpful just saying that :-)
18:49:44 <SumitNaiksatam> we might have to pool our collective wisdom at some point to get past these DB issues
18:50:19 <SumitNaiksatam> alrighty, if nothing else for today, we can stop here
18:50:34 <annak> i wanted to suggest something small
18:50:48 <annak> on a much smaller scale :)
18:50:52 <SumitNaiksatam> annak: yes please, and thanks for all the newton patches!
18:51:05 <annak> to move to neutron_lib pep8 factory
18:51:09 <annak> instead of neutron
18:51:17 <SumitNaiksatam> annak: okay
18:51:30 <annak> i think that's what we're supposed to do.
18:51:40 <SumitNaiksatam> annak: sure then we should
18:51:44 <annak> ok :) i'll do that
18:51:49 <SumitNaiksatam> annak: i havent explored that
18:51:53 <SumitNaiksatam> annak: great, thanks!
18:52:34 <SumitNaiksatam> annak: did you see any issues using newton with the vmw backend?
18:52:42 <SumitNaiksatam> i mean GBP newton
18:53:22 <annak> the vmw plugin is still very minimal, and no, no GBP-specific issues
18:53:35 <annak> lots of backend-specific issues :)
18:53:41 <SumitNaiksatam> annak: hmmm, okay
18:53:54 <SumitNaiksatam> annak: we are seeing these issues when we have concurrent operations
18:54:36 <annak> I am still far from testing anything under load, but thanks, i'll keep that in mind
18:54:44 <SumitNaiksatam> annak: so maybe if you have a test suite which exercises things in parallel, perhaps you can report back your experience
18:54:47 <SumitNaiksatam> annak: yeah
18:55:10 <SumitNaiksatam> okay, thanks all for joining
18:55:11 <SumitNaiksatam> bye
18:55:22 <annak> bye!
18:55:25 <SumitNaiksatam> #endmeeting
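
Note on the "session leaking across threads" concern from Open Discussion: the following is only a toy sketch of the general idea, not GBP or Neutron code, and it is not a diagnosis of the traces linked above. `FakeSession`, `shared_session`, `get_scoped_session` and `run_workers` are hypothetical names invented for illustration. It contrasts one session object handed to every thread with a session kept in per-thread storage, using only the Python standard library.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a DB session; records which threads used it."""

    def __init__(self):
        self.users = set()

    def execute(self):
        # Remember the name of the thread that touched this session.
        self.users.add(threading.current_thread().name)


# One object handed to every thread: the "leak" being illustrated.
shared_session = FakeSession()

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_scoped_session():
    """Return a session private to the calling thread, creating it on first use."""
    if not hasattr(_local, "session"):
        _local.session = FakeSession()
    return _local.session


def run_workers(get_session, count=4):
    """Run `count` threads and return the session each one ended up using."""
    seen = []
    lock = threading.Lock()

    def work():
        session = get_session()
        session.execute()
        with lock:
            seen.append(session)

    threads = [
        threading.Thread(target=work, name=f"worker-{i}") for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


leaked = run_workers(lambda: shared_session)   # every worker shares one session
scoped = run_workers(get_scoped_session)       # every worker gets its own session
```

After running, `shared_session.users` holds all four worker names, while each session in `scoped` was touched by exactly one thread. State left behind by one request in a shared session is visible to the next, which is the kind of symptom that tends to show up only under concurrent operations and not in a mostly serial gate run.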