18:00:57 #startmeeting networking_policy 18:00:58 Meeting started Thu Dec 8 18:00:57 2016 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot. 18:00:59 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 18:01:01 The meeting name has been set to 'networking_policy' 18:01:26 #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#Dec_8th_2016 18:01:37 hi 18:01:38 #topic Remove expunge_all calls 18:01:43 #link https://review.openstack.org/#/c/399772/ 18:01:44 :( 18:01:48 i just rebased it 18:01:50 tbachman: :-) 18:02:03 SumitNaiksatam: I just got back to this mid-yesterday 18:02:03 songole: any more data points at your end? 18:02:08 tbachman: np 18:02:55 Regarding sharing across threads.. 18:03:11 we use getsession method provided by neutron 18:03:32 would that be an issue? we are talking about green threads here, right? 18:04:06 songole: in my mind, it *shouldn’t*, but I have yet to get through this code 18:04:37 I believe get_session is supposed to provide an interface that ensures each green thread gets its own session 18:04:44 but I haven’t gotten thru this myself just yet 18:04:49 * tbachman is doubting :) 18:05:05 ok 18:05:35 regarding the specific patch, we ran tests locally with the patch 18:05:48 i dont thinkg neutron does anything there, i think this has to have been implemented in oslo_db 18:06:04 songole: have you verified that each thread does its own getsession call, and that neither the session itself nor any of the DB objects associated with the sesssion are shared across threads? 18:06:04 and we hit the db exception which sumit and I debugged for lbaas v1 18:06:31 rkukura: yes. each thread does its own getsession 18:06:44 songole: thanks 18:07:13 okay, now I’m confused 18:07:21 you’re hitting this w/o the expunge? 18:07:30 (i.e. the DetachedInstanceError exception) 18:08:23 tbachman: i share your confusion ;-) 18:09:26 songole: you hit the exception with or without the expunge_all ? 18:10:00 tbachman: thanks for referring to the exact exception, just so that we know we are all talking about the same thing :-) 18:10:01 I didn't see it myself. I just got an email from india team regarding this. 18:10:09 I will confirm 18:10:47 songole: i hope its with the expunge_all call, otherwise it flies in the face of most of our analysis so far :-( 18:11:38 SumitNaiksatam: I understand.. I will confirm.. 18:12:52 SumitNaiksatam: I’m hoping to have a better handle on this tomorrow 18:12:59 tbachman: on the side perhaps we can continue to rebase/recheck your patch to see if we see any consistent pattern of failures 18:13:07 i will try to keep an eye as well 18:13:11 I realize that we’re short on time here, so unfortunately I’m not sure what to do here. 18:13:49 i think there are several changes being made on the NFP side, and perhaps some of the exceptions we were seeing earlier on this patch were not related and might have gone away 18:14:09 tbachman: yeah same here, hence the offline email thread last week 18:14:43 SumitNaiksatam: maybe we can address this again offline tomorrow? 18:14:49 for at least the core GBP stuff? 18:14:49 tbachman: sue 18:14:51 *sure 18:15:07 If we’re lucky, any findings there might give us clues for NFP as well (if there are any) 18:16:10 songole: so you are confirming that your usage of threads in NFP is session-safe i.e. sessions are not getting shared threads? 18:16:46 SumitNaiksatam: From our understanding, they are thread safe 18:16:53 *shared across 18:17:26 And, the lb db exception, i was seeing even without nfp 18:18:35 songole: oh, but i thought you were earlier saying that was not the case with the lb exception 18:19:08 songole: so the lb exception occured with or without GBP in the mix? 18:19:29 with GBP and with or without nfp 18:19:54 songole: okay, so the expunge_all is still relevant 18:20:12 because without GBP, it wont be 18:20:42 * tbachman would like to expunge expunge_all 18:21:03 tbachman: yay to that 18:21:20 tbachman: songole: any other data point we need to discuss? 18:21:33 SumitNaiksatam: nothing from me (for now) 18:22:07 okay, so lets assume that NFP is indeed getting one session per thread 18:22:42 and sessions are not leaking across threads 18:23:22 can we analyze the exceptions we see henceforth on tbachman’s with that assumption? 18:23:49 tbachman’s patch i meant 18:24:22 i think we got into the session leaking discussion because we were seeing some NFP exceptions which we could not explain 18:24:29 SumitNaiksatam: FWIW, in the original bug report from Ivar, the problem seemed to be state that wasn’t being created 18:24:30 Don’t forget that the DB objects stay associated with the session, so its important not to leak them across threads either. 18:24:34 the UTs were failing b/c state wasn’t there 18:24:40 and attirbuted it to either the removal of the expunge_all call or the session leaking 18:25:51 (e.g. assertEqual for a configuration that’s read back, but the configuration doesn’t exist) 18:26:26 we should make sure that’s what the NFP failures were seeing (I believe they were, but don’t have the data in front of me atm) 18:27:07 tbachman: you are referring to this bug report, right: #link https://bugs.launchpad.net/group-based-policy/+bug/1404412 18:27:07 Launchpad bug 1404412 in Group Based Policy "Session's persistent objects across transactions create inconsistency" [High,Fix released] - Assigned to Ivar Lazzaro (mmaleckk) 18:27:08 ? 18:27:20 SumitNaiksatam: ack 18:27:45 what i recall was that the problem was of stale state 18:28:01 and there are not threading issues with the UTs (at least not that i am aware off) 18:28:22 doing the expunge_all was cleaning the in-session state and forcing reload from the db 18:29:51 SumitNaiksatam: one theory I had was that tox runs UTs in parallel, and somehow the two threads were affecting the same DB state 18:29:53 but that’s a shaky theory at best 18:30:07 so, it wasn’t the sequential transactions per-se 18:30:08 tbachman: hmmm, i would be doubtful about that 18:30:28 tbachman: it might very well have been a bug at that time though 18:30:31 sorry, networking issues. reconnected 18:30:35 SumitNaiksatam: ack 18:30:37 yeah 18:30:46 tbachman: and hence that problem does not manifest any more now 18:31:01 SumitNaiksatam: ack. But there are still a few things that don’t make sense to me 18:31:01 tbachman: because you are not seeing any issues with the UTs after removing the call 18:31:03 songole: np 18:31:13 for example, changing it to expire should have had the same effect, were that the case 18:31:15 but it didn't. 18:31:42 tbachman: hmmm, didnt think of that, but makes sense 18:32:26 rkukura: to your earlier point about sharing db objects across threads, if that is a culprit is some way with regards to the NFP errors, it would be difficult to track 18:32:58 rkukura: however, i don’t understand why it would manifest after removing the expunge_all call 18:33:23 *is some way - in some way 18:33:34 SumitNaiksatam: Agreed it might be difficult to track, but the DB objects use the session when you try to follow associations, etc. 18:34:42 I think there are event logs we can turn on here, but I’m guessing it would be a complete overkill of info 18:35:52 rkukura: answering my own question, if db objects were being used across threads, and removing the expunge_all call led to an exception, i think it means that it actually revealed an incorrect use of the db objects (across threads), no? 18:36:20 SumitNaiksatam: since the DB objects are tied to sessions, and sessions should only be used by a single thread, I’d say yes :) 18:36:27 tbachman: okay 18:36:31 SumitNaiksatam: Since expunge_all() dissaciates the DB objects from the session, it seems like it could be preventing the errors 18:36:33 * tbachman believes that’s what rkukura was getting at 18:36:44 dissassociates 18:37:07 keep in mind that the DB objects can only be used in this state if no backrefs are accessed 18:37:15 otherwise we get the exception 18:37:22 tbachman: ah okay 18:37:43 So, if the NFP code is doing this and not accessing backrefs, then we won’t see the exceptions in their code 18:38:06 tbachman: but that is regardless of the expunge_all, right? 18:38:24 SumitNaiksatam: actually, without the expunge_all, the object is still tied to the session 18:38:27 and the backref will work 18:38:30 and worse 18:38:35 tbachman: ah okay 18:39:14 If two contexts are sharing the same DB object, and mutating it, then there are issues 18:40:04 SumitNaiksatam: I will also say that I’m still worried that my understanding of this is limited 18:40:09 I don’t think a DB object can be associated with two sessions at once 18:40:15 tbachman: by contexts you mean? 18:40:38 I read context as session, but maybe tbachman meant thread 18:40:43 yeah 18:40:57 * tbachman hates the terminology here 18:41:06 context here meant thread 18:41:07 sorry about that 18:41:09 but how can the db object with multiple sessions? 18:41:19 *be associated with 18:41:31 SumitNaiksatam: I think “multiple sessions” was my mistaken interprettation of tbachman 18:41:40 rkukura: ack 18:41:48 I meant multiple threads sharing the same DB object 18:41:49 ah sorry 18:41:57 SumitNaiksatam: my bad :) 18:42:07 tbachman: no my bad, didnt read your comment 18:42:11 * tbachman is barely monolingual :) 18:42:26 tbachman: lol 18:42:31 one more item on the agenda 18:42:32 My point was that the DB objects (until expunged) stay associated with the session, so if they are shared across threads, they can cause the session to be shared across threads. 18:43:04 rkukura: but that is not a good usage pattern, right? 18:43:12 SumitNaiksatam: Correct 18:44:21 I’d also be somewhat concerned about continuing to use DB objects in the same thread after the transaction in which they were obtained is commited, and especially if a new transaction on the thread’s session gets started. Those old DB objects might become involved in the new transaction. 18:44:52 rkukura: agree, but i believe that pattern exists in several places in neutron code 18:45:00 so we cant change that 18:45:31 It might be something we should emphasize for new code 18:45:42 (in GBP of course) 18:46:18 as for our NFP code, removing the expunge_all call somehow is leading stale state being read within the threads, which was not earlier the case because (implicitly) the assumption for expunge was being made 18:46:26 *leading to 18:46:34 Or we just call expunge_all in the right places to dissassociate the DB objects from the sessions they came from ;) 18:47:06 rkukura: oh that can get really messy, i am sure you were indeed joking :-) 18:47:31 one more item on the agenda 18:47:46 #topic Configuring extension drivers from policy drivers 18:47:51 SumitNaiksatam: Not really - I think it might explain why expunge_all seems to solve some problems, and maybe cause others 18:48:44 rkukura: yes, it might explain, but i dont think we want to solve that by calling expunge in a different place 18:48:47 #link https://review.openstack.org/#/c/407985/ 18:49:35 in the above review you mentioned something to the effect of aggregating the driver extension i have proposed in the patch with another one 18:50:34 rkukura: this was ostensibly to avoid the need (and potential error) in specifying multiple driver extensions in the configraion 18:50:45 right 18:51:08 Or configuring extension meant for one driver with the other driver. 18:51:09 i was thinking what if we allowed the policy driver to specify the driver extensions 18:52:18 so if a particular policy driver always knows that it needs certaing driver extensions, it can internally specify that 18:52:49 that would avoid the need for any additional extension driver configuration for that particular policy driver 18:52:56 SumitNaiksatam: That may be a reasonable idea, but I’m a bit worried about it not being consistent with the way EDs work in ML2. 18:53:14 FWIW, here’s the log of a failure in the NFP gatefor the last recheck on the expunge_all (search for “Traceback”): http://logs.openstack.org/72/399772/3/check/gate-group-based-policy-dsvm-nfp-ubuntu-xenial-nv/67fd5b7/logs/q-svc.txt.gz 18:53:44 tbachman: thanks 18:54:02 rkukura: but the GBP framework can diverge where relevant, right? 18:54:05 First question is whether we want users to be able to choose whether or not to enable these extensions. If so, we should make them individual EDs that are not automatically pulled in by the PD/MD. 18:54:53 rkukura: there are two aspects to this, one is how they are implemented, and second how they are configured 18:55:33 rkukura: if they are implemented as separate extensions/drivers, it leaves the possibility open for configuring them individually 18:55:49 I think it would be reasonable for a deployer to decide they want to use the aim_mapping PD, but only expose the standard GBP API. But I don’t know if the PD code will even work if the EDs are not enabled. 18:56:49 rkukura: specificall as for the AIM PD (and its reliance on the ml2+, etc) i think there is enough going on there that we should assume that it wont work without the EDs 18:57:37 rkukura: so in such cases why not let the PD decide that it always needs certain EDs 18:58:09 * or at least certain EDs 18:58:23 But then why package each (required) feature as a separate extension and ED? Wouldn’t it be a lot simpler to just have one extension and one ED for all of them? 18:58:46 rkukura: so that some other PD could potentially consume them in a different combination 18:59:27 SumitNaiksatam: maybe that makes sense 18:59:41 okay we reached the hour 18:59:53 thanks all for your time 18:59:58 bye! 19:00:01 #endmeeting