18:01:06 <SumitNaiksatam> #startmeeting networking_policy
18:01:07 <openstack> Meeting started Thu Mar 23 18:01:06 2017 UTC and is due to finish in 60 minutes.  The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:11 <openstack> The meeting name has been set to 'networking_policy'
18:01:26 <igordcard> hi SumitNaiksatam
18:01:29 <igordcard> hi all
18:01:32 <SumitNaiksatam> #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#March_23rd_2017
18:01:37 <SumitNaiksatam> rkukura: hi
18:01:38 <rkukura> hi
18:02:01 <SumitNaiksatam> so most of the newton sync patches merged over the past few days
18:02:06 <SumitNaiksatam> thanks all for the work and the reviews
18:02:06 <tbachman> SumitNaiksatam: congrats!
18:02:14 * igordcard claps
18:02:20 * tbachman knows that was no small effort
18:02:25 <SumitNaiksatam> tbachman: but still facing some niggling issues :-)
18:02:28 <tbachman> :)
18:02:36 <SumitNaiksatam> igordcard: have to apologize to you since it created more work for you
18:02:47 <SumitNaiksatam> so lets go to QoS first
18:03:04 <SumitNaiksatam> #topic QoS via NSP patch
18:03:04 <igordcard> SumitNaiksatam: it's ok I invested on the wrong week to do it :p
18:03:06 <SumitNaiksatam> #link https://review.openstack.org/#/c/426436
18:03:15 <SumitNaiksatam> igordcard: :-(
18:03:28 <SumitNaiksatam> igordcard: so per your latest comment, this is in good shape now?
18:03:59 <igordcard> SumitNaiksatam: it looks like it is, at least by comparing the latest nfp gate failures with the nfp gate failures of one of the last merged patches
18:04:10 <SumitNaiksatam> igordcard: nice
18:04:28 <igordcard> I am disabling QoS entirely on the aim gate, is this ok?
18:04:36 <SumitNaiksatam> igordcard: i was planning to look at it before the meeting, but got distracted with something else
18:04:43 <SumitNaiksatam> igordcard: oh
18:04:59 <SumitNaiksatam> igordcard: does it break if you dont?
18:05:24 <igordcard> SumitNaiksatam: yeah and the errors weren't very explicit
18:05:43 <igordcard> SumitNaiksatam: I believe it was mentioned back in the time that aim didn't have to support qos?
18:06:01 <SumitNaiksatam> igordcard: yes, aim doesnt have to support
18:06:10 <SumitNaiksatam> igordcard: but i would be curious to know why it failed
18:06:32 <SumitNaiksatam> igordcard: so when you say you disabled qos, you mean which configuration?
18:06:35 <igordcard> #link http://logs.openstack.org/36/426436/20/check/gate-group-based-policy-dsvm-aim-ubuntu-xenial-nv/fd4ebeb/console.html#_2017-03-21_22_10_08_099496
18:07:04 <rkukura> I’m part way through reviewing that patch, and thought the devstack config looked reasonable
18:07:11 <igordcard> #link https://review.openstack.org/#/c/426436/22/devstack/override-defaults
18:07:17 <SumitNaiksatam> rkukura: great
18:07:37 <SumitNaiksatam> igordcard: okay got it
18:07:41 <rkukura> It does not add the QoS extension driver for AIM, but does otherwise
18:07:41 <SumitNaiksatam> igordcard: that looks fine to me
18:07:49 <SumitNaiksatam> rkukura: yeah, that is fine for now
18:08:14 <SumitNaiksatam> rkukura: so great, rkukura i was going to request you to review, but looks like you are already on it
18:08:38 <rkukura> yes
18:08:48 <SumitNaiksatam> so if nothing major, lets try to merge it before it diverges more
18:08:56 <SumitNaiksatam> igordcard: we would have to backport to stable/newton
18:09:12 <SumitNaiksatam> and then the question is if we should backport it to stable/mitaka as well
18:09:16 <rkukura> only comment so far other than ripping out the clean_session stuff had to do with the exception text
18:09:45 <SumitNaiksatam> i ah
18:09:48 <SumitNaiksatam> *ah
18:10:02 <rkukura> I’m not sure I understand the original text either
18:10:02 <SumitNaiksatam> so looks like it will need another rebase :-(
18:10:42 * rkukura will be back in about 2 minutes
18:10:57 <igordcard> great, feel free to leave all the comments there and I'll fix and rebase on the next patchset
18:11:24 <SumitNaiksatam> igordcard: thanks
18:11:49 <SumitNaiksatam> #topic NFP patches
18:12:10 <SumitNaiksatam> the other big thing I had were the NFP patches
18:12:33 <SumitNaiksatam> oh and there is songole right on cue :-)
18:12:38 <songole> Hi
18:12:49 <songole> Wrong timing :)
18:12:55 <SumitNaiksatam> songole: lol
18:13:06 <tbachman> lol
18:13:09 * igordcard thanks all and gracefully leaves to get home
18:13:11 <SumitNaiksatam> songole: so you and hemanth are mostly shepherding the NFP patches
18:13:20 <SumitNaiksatam> igordcard: thanks a bunch for taking the time to join!
18:13:27 <SumitNaiksatam> igordcard: good night!
18:13:32 <igordcard> :)
18:13:50 <SumitNaiksatam> songole: a disruptive patch just merged
18:14:14 <songole> What is it? Qos?
18:14:17 <SumitNaiksatam> disruptive in the sense that it requires a rebase for other patches
18:14:24 <SumitNaiksatam> songole: no, QoS not merged yet
18:14:43 <songole> Ah.
18:14:48 <SumitNaiksatam> oh, I should have mentioned this in the bigger context - we are completely eliminating all the “clean_session” stuff
18:15:39 <SumitNaiksatam> #link https://review.openstack.org/#/c/448885/
18:16:09 <SumitNaiksatam> so this eliminates the use of the clean_session flag
18:16:22 <SumitNaiksatam> but it causes merge conflicts with the existing patches
18:16:39 <SumitNaiksatam> so if you see conflicts thats the first thing you need to take care of
18:18:11 <songole> ok
18:18:12 <SumitNaiksatam> songole: other than that, how are we doing on the NFP patches?
18:18:41 <SumitNaiksatam> songole: there were some patches which tried to fix the NFP gate job, and which we merged, but the gate job is still broken
18:18:42 <songole> we are facing a few issues with lbaasv2
18:18:48 <SumitNaiksatam> songole: ah okay
18:18:56 <SumitNaiksatam> songole: do we need to discuss here?
18:19:12 <songole> in the base mode where we used to use the namespace lb implementation
18:19:21 <songole> without having to launch a VM for lb
18:19:41 <SumitNaiksatam> songole: ah, but you can do that any more?
18:19:45 <SumitNaiksatam> *cant
18:20:01 <songole> looks like it. default is octavia
18:20:08 <SumitNaiksatam> bummer!!!
18:20:10 <songole> which spins up a vm
18:20:16 <SumitNaiksatam> hmmm
18:20:25 <SumitNaiksatam> okay i see the difficulty now
18:20:43 <tbachman> LBaa(!OS)S
18:20:56 <SumitNaiksatam> tbachman: :-) :-(
18:21:02 <songole> lol
18:21:15 <SumitNaiksatam> mixed feelings
18:21:32 <SumitNaiksatam> songole: so there is no way to adapt the old driver to v2?
18:21:34 <songole> so, it may take some time to get the tests running
18:21:43 <SumitNaiksatam> just to validate the gate
18:21:52 <tbachman> I think you can still use LBaaSv2, but it’s deprecated
18:22:05 <SumitNaiksatam> tbachman: i think you mean v1
18:22:07 * rkukura finally back
18:22:08 <tbachman> ah
18:22:09 <tbachman> k
18:22:15 <SumitNaiksatam> and v1 is totally out of newton
18:22:16 <tbachman> it will finally hit us at some point
18:22:17 <SumitNaiksatam> hence the issue
18:22:37 <songole> ash said there might be a way to use namespace in v2
18:22:48 <SumitNaiksatam> songole: okay
18:23:08 <SumitNaiksatam> songole: that said should the tests work on mitaka?
18:23:11 * tbachman was confused
18:23:20 <SumitNaiksatam> tbachman: np, i know what you meant
18:23:29 <songole> mitaka should be good
18:23:46 <songole> the issue is only with newton
18:24:08 <songole> but nfp tests are failing on mitaka for a different reason..
18:24:17 <SumitNaiksatam> songole: so what i am suggesting is that to some small extent we can at least retroactively validate against stable/mitaka (but this will be after the backport, so master is already merged by then)
18:24:25 <SumitNaiksatam> songole: ah okay
18:24:51 <SumitNaiksatam> songole: what happens if you dont launch the service instance in newton?
18:25:05 <SumitNaiksatam> songole: can we fake the responses?
18:25:54 <SumitNaiksatam> songole: just so that we can derive benefit from all the other things in that gate test
18:26:03 <songole> Will explore the idea
18:26:14 <SumitNaiksatam> songole: right now since the whole job fails we cant tell what is broken
18:26:52 <SumitNaiksatam> songole: okay thanks
18:26:52 <songole> got it
18:27:04 <SumitNaiksatam> songole: anything else you want to bring up on the NFP patches today?
18:27:42 <songole> nothing more..
18:27:47 <SumitNaiksatam> songole: okay thanks
18:27:54 <SumitNaiksatam> #topic Open Discussion
18:28:19 <SumitNaiksatam> so we are seeing some perplexing DB issues when running newton (with aim_mapping driver)
18:28:31 <SumitNaiksatam> which are not noticed in the gate
18:28:55 * tbachman listens in
18:29:00 <SumitNaiksatam> so if you hit any weirdness, get in touch with me, might save you some time
18:29:22 <SumitNaiksatam> tbachman: we think that the session is somehow leaking across threads
18:29:36 <tbachman> SumitNaiksatam: sounds familiar ;)
18:29:46 <SumitNaiksatam> tbachman: again :-) :-(
18:29:52 <tbachman> I was always a little nervous when we backed out the expunge_all
18:30:09 <tbachman> b/c I wasn’t convinced we no longer had a root cause
18:30:31 <SumitNaiksatam> tbachman: i think the expunge all would have created even more problems
18:30:37 <SumitNaiksatam> tbachman: but thanks for bringing that up
18:30:45 <tbachman> SumitNaiksatam: ack. but it was something that let us know that there might be a problem there
18:30:50 <tbachman> (i.e. that it was a possibility)
18:30:54 <SumitNaiksatam> tbachman: i had lost track of the fact that neutron does some expunging in newton
18:31:06 <tbachman> SumitNaiksatam: do you have any logs of the more recent failures?
18:31:13 <SumitNaiksatam> tbachman: initially i had patched that but then i let it be there
18:31:57 <SumitNaiksatam> tbachman: these are in the deployment, i think jishnu refreshed the fab
18:32:05 <tbachman> SumitNaiksatam: ack
18:32:16 <SumitNaiksatam> tbachman: will send you once we are able to reproduce them again
18:32:21 <tbachman> SumitNaiksatam: thx!
18:32:27 * rkukura back in moment
18:32:33 * tbachman may be working on the same problem with Jishnu in parallel
18:32:34 <SumitNaiksatam> i also ran into a very weird issue in stable/mitaka
18:32:40 <SumitNaiksatam> tbachman: ah right
18:32:49 <SumitNaiksatam> the issue is complicated to explain
18:33:20 <SumitNaiksatam> but basically I was seeing this exception - “ResourceClosedError: This transaction is closed”
18:33:29 <SumitNaiksatam> i know exactly where it is happening
18:33:34 <SumitNaiksatam> but not why!
18:33:38 <SumitNaiksatam> i have a workaround
18:33:43 <songole> SumitNaiksatam: are you able to bring up a working devstack on newton consistently?
18:33:47 <SumitNaiksatam> but need to get the root cause
18:34:05 <SumitNaiksatam> songole: the devstack installation is happening in every gate job run
18:34:08 <SumitNaiksatam> songole: so yes
18:34:17 <SumitNaiksatam> songole: i tried last week on my system
18:34:20 <SumitNaiksatam> not this well
18:34:24 <SumitNaiksatam> *week
18:34:44 <tbachman> SumitNaiksatam: this is only on Mitaka?
18:34:48 <songole> right.
18:34:58 <SumitNaiksatam> tbachman: and yes, this latter issue on mitaka
18:35:10 <SumitNaiksatam> tbachman: the session sharing issue on newton
18:36:08 <tbachman> SumitNaiksatam: so this isn’t related to the problem you were seeing with the newton sync
18:36:12 <tbachman> (b/c it’s mitaka)
18:36:37 <tbachman> (i.e. caused by the addition of the context manager)
18:37:05 * rkukura is back
18:37:07 <SumitNaiksatam> tbachman: no
18:37:26 <SumitNaiksatam> tbachman: yes, mitaka issue has probably been around
18:37:39 <SumitNaiksatam> tbachman: its just that we didnt catch it
18:37:51 <SumitNaiksatam> tbachman: the issue is that all the DB operations are committed
18:38:14 <SumitNaiksatam> however, when the outer most transaction context exits, the exception is thrown
18:38:20 <SumitNaiksatam> (i am referring to the resource closed)
18:39:19 * tbachman wishes he had slayed that dragon when he first encountered it :(
18:39:21 <SumitNaiksatam> this is the trace from that: #link https://www.pastiebin.com/58cd90f6bce04
18:39:49 <SumitNaiksatam> tbachman: i dont know if its just one dragon
18:40:12 <SumitNaiksatam> and for the newton issue, this is the trace: #link https://www.pastiebin.com/58d16e63140f9
18:40:30 <SumitNaiksatam> we dont have a lead on the newton issue
18:40:46 <SumitNaiksatam> on the mitaka issue, like i said, i have a workaround, but its not committed in code
18:41:09 <SumitNaiksatam> the newton issue gets triggered only when we use a heat template to exercise the system
18:41:47 <SumitNaiksatam> rkukura: and you are looking at this, #link https://www.pastiebin.com/58d01638c9db3
18:42:19 <SumitNaiksatam> so three DB issues, when the aim drivers are used
18:42:43 <SumitNaiksatam> just want to make sure everyone is aware in case you hit them
18:42:46 <rkukura> I’ve looked at the log, but that’s about as far as I’ve got. Seems it retries and succeeds.
18:43:02 <SumitNaiksatam> rkukura: yeah there is no functional issue there
18:43:14 <SumitNaiksatam> rkukura: i think this might be on account of the expunge that neutron is doing
18:43:50 <rkukura> SumitNaiksatam: Where is that expunge?
18:45:10 <SumitNaiksatam> rkukura: so i recall that before the extension attributes get processed, the expunge gets call
18:45:13 <SumitNaiksatam> *called
18:45:41 <SumitNaiksatam> i am really not comfortable with that expunge but i did not patch it since it was not breaking things in the gate tests
18:45:45 <rkukura> right - I’ll look and see if that could be related
18:45:53 <SumitNaiksatam> rkukura: just do a grep and you will see it
18:45:59 <SumitNaiksatam> i think its in a couple of places
18:46:05 <tbachman> it’s weird — how come we don’t see the parent class implementation in delete_policy_target?
18:46:08 <tbachman> (in the trace)
18:46:19 <SumitNaiksatam> this is the neutron code that i am referring to, not GBP
18:46:24 <tbachman> (the one for stable/mitaka)
18:46:58 <SumitNaiksatam> tbachman: its difficult to read that trace in the way things get called on account of the decorators involved
18:47:05 <tbachman> yeah
18:47:30 <SumitNaiksatam> tbachman: you will flip even more if i tell you how the workaround works
18:47:35 <tbachman> :)
18:48:16 <SumitNaiksatam> anyway, not a very happy place right now!
18:48:28 <tbachman> :(
18:48:36 * tbachman hands SumitNaiksatam a snickers
18:48:40 <SumitNaiksatam> tbachman: :-)
18:48:55 <SumitNaiksatam> tbachman: so if you are running tempest tests, be warned
18:49:04 <tbachman> SumitNaiksatam: thx
18:49:22 <SumitNaiksatam> not that its very helpful just saying that :-)
18:49:44 <SumitNaiksatam> we might have to pool our collective wisdom at some point to get past these DB issues
18:50:19 <SumitNaiksatam> alrighty, if nothing else for today, we can stop here
18:50:34 <annak> i wanted to suggest something small
18:50:48 <annak> on a much smaller scale :)
18:50:52 <SumitNaiksatam> annak: yes please, and thanks for all the newton patches!
18:51:05 <annak> to move to neutron_lib pep8 factory
18:51:09 <annak> instead of neutron
18:51:17 <SumitNaiksatam> annak: okay
18:51:30 <annak> i think that's what we're supposed to do.
18:51:40 <SumitNaiksatam> annak: sure then we should
18:51:44 <annak> ok :) i'll do that
18:51:49 <SumitNaiksatam> annak: i havent explored that
18:51:53 <SumitNaiksatam> annak: great, thanks!
18:52:34 <SumitNaiksatam> annak: did you see any issues using newton with the vmw backend?
18:52:42 <SumitNaiksatam> i mean GBP newton
18:53:22 <annak> the vmw plugin is still very minimal, and no, no GBP-specific issues
18:53:35 <annak> lots of backend-specific issues :)
18:53:41 <SumitNaiksatam> annak: hmmm, okay
18:53:54 <SumitNaiksatam> annak: we are seeing these issues when we have concurrent operations
18:54:36 <annak> I am still far from testing anything under load, but thanks, i'll keep that in mind
18:54:44 <SumitNaiksatam> annak: so maybe if you have a test suite which exercises things in parallel, perhaps you can report back your experience
18:54:47 <SumitNaiksatam> annak: yeah
18:55:10 <SumitNaiksatam> okay, thanks all for joining
18:55:11 <SumitNaiksatam> bye
18:55:22 <annak> bye!
18:55:25 <SumitNaiksatam> #endmeeting