18:01:32 #startmeeting networking_policy 18:01:33 Meeting started Thu Mar 30 18:01:32 2017 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot. 18:01:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 18:01:35 hi 18:01:36 The meeting name has been set to 'networking_policy' 18:02:00 #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#March_30th.2C_23rd_2017 18:02:07 since igordcard is around 18:02:12 #topic QoS patch 18:02:22 #link https://review.openstack.org/426436 18:02:48 igordcard: thanks for fixing all the issues, great work 18:03:01 rkukura: do you think your comments are addressed on this patch? 18:03:29 I think so - planning to re-review 18:03:33 rkukura: thanks 18:03:47 igordcard: as before, my apologies for dragging this patch for so long 18:04:22 I wasn’t sure if we wanted to merge this right away, or hold off until the newton work settled out completely 18:04:29 igordcard: we were battling some pretty nasty DB issues last week, and I preferred to not merge anything major which might have further complicated debugging the DB issues 18:04:36 rkukura: right, just saying that 18:05:00 igordcard: we did solve one of the DB problems (which i will update a little later) 18:05:45 we should be in a position to merge this as soon as a couple of the other issues settle down 18:06:13 igordcard: i dont anticipate that any of the fixes we make will have any impact on your patch 18:06:21 SumitNaiksatam, igordcard: Do we plan to back-port QoS to stable/newton and/or stable/mitaka? 18:06:46 rkukura: at this point i am not inclined to backport to mitaka 18:07:09 makes sense 18:07:15 rkukura: okay 18:08:29 okay moving on 18:08:48 #topic Use of upper constraints on master 18:09:59 so basically the summary here is that regardless of what branch version we use in the tox.ini for the upper-constraints, the upstream openstack infra job looks at the branch name and basically uses the version of upper-constraints based on that 18:10:14 this messes up things in our case where our master trails the rest of openstack 18:11:02 so if we take the current snapshot, we are on newton but the py27 job is being run with library dependencies of pike 18:12:03 even if we specify explicit version numbers in requirements or test-requirements, if the dependency is present in upper-constraints, the version in upper-constraints takes precedence 18:12:42 the best fix is to obviously catch up with openstack master, so that we are trailing 18:13:01 but since we are not there yet, we have to work around this 18:13:47 one of my recent commits fixed an issue related to this such that it worked in the CI, but it broke local tox py27 runs 18:13:51 rkukura: thanks for catching that 18:14:07 that was an easy one 18:14:27 we can adjust the local runs.. 18:14:35 thanks SumitNaiksatam, sorry was on another conversation 18:14:36 rkukura: and you were contemplating chaning the upper-constraints of our master to point to our master 18:14:46 igordcard: no problem at all 18:14:52 igordcard: thanks for sticking around 18:14:59 SumitNaiksatam: yes, two possible fixes 18:15:32 SumitNaiksatam: [qos] backporting to newton should be alright 18:15:34 rkukura: i am not in favor of changing the upper-constraints version for our master just yet 18:15:48 igordcard: yes, that wouldnt be an issue 18:15:58 SumitNaiksatam: I agree - I’d rather catch issues while developing rather than while back-porting 18:16:13 rkukura: exactly, you said it well 18:17:55 rkukura: so for now lets roll with your patch 18:18:06 rkukura: i just had some nit comments on it 18:18:21 OK - just fixed the commit msg, and will add the REVISIT in a moment 18:18:25 that said, we will continue to live on the edge for this reason! :-( 18:18:31 rkukura: great thanks 18:19:23 i think songole is not around 18:19:47 unfortunately we have not been able to make progress with the NFP patches for the same reason as for the QoS patch 18:20:03 we need newton to settle, before we can start merging any major changes 18:20:20 #topic Open Discussion 18:20:52 annak: thanks for the patch on pep-8 changes, great contribution towards the betterment of the project! 18:21:23 SumitNaiksatam: np 18:21:54 So are we seeing more deprecation warnings now that we should deal with? 18:21:57 annak: the merge failures on that patch were on account of the upper-constraints issue discussed earlier here, but for now we dodged a bullet and were able to get through 18:22:22 rkukura: i havent checked after annak’s latest merge, but i suspect yes 18:22:43 SumitNaiksatam: yes, thanks 18:22:43 I think its mostly stuff that’s moved to neutron_lib 18:22:49 rkukura: the ones that remain are probably bigger changes in terms of which APIs we are using 18:22:53 rkukura: yes 18:23:23 rkukura: there might be some keystone/keystoneclient related too 18:24:11 and most likely these are already deprecated in Ocata, so we will have to deal with these pretty soon 18:25:02 of course, as a standard operating procedure, we should not be waiting for a deprecation to happen :-) 18:25:25 the other update i had was on the DB issue 18:26:59 last week i had reported this: #link https://www.pastiebin.com/58d16e63140f9 18:27:06 the DB reentrant issue 18:27:28 it turned out that this was an issue in the “aim” code outside of GBP 18:27:44 so it was happening only when configuring the aim_mapping GBP driver 18:28:05 the issue was that a session was being switched with another during a transaction 18:28:42 the switch was inadvertent, hence difficult to find, and had catastrophic results! 18:28:52 SumitNaiksatam: good find! 18:29:06 tbachman: yes, thank amit! :-) 18:29:35 there are still a couple of other DB issues we are looking at 18:30:20 i am still sitting on the “resource closed” error, not reproduced it yet, and rkukura is dealing with the stale data error (which is specific to aim driver) 18:31:10 but regarding the first issue, switching of session is very difficult to catch, so we needless to say, we have to be extra careful when we pass the session around 18:31:39 i was also thinking if we could instrument an automatic check if the session gets used outside the scope of the thread that created it 18:31:50 * tbachman kicks the evil “session cache” daemon 18:31:58 tbachman: :-) 18:32:32 SumitNaiksatam: I like that idea 18:32:38 rkukura: okay 18:32:57 the last thing i wanted to bring up is adding “retry” decorators for the GBP plugin, similar to the one that the neutron plugin has 18:33:24 currently we are seeing an issue, at least with the aim_mapping driver, where concurrent operations cause failure 18:33:35 in such cases, a retry would help 18:34:18 (the failure is because the concurrent operations lead to mutliple per-tenant resources) 18:34:30 SumitNaiksatam: do you have the patch that fixes the session issue so i can get better idea about the problem? 18:34:51 annak: its outside of GBP but I can certainly point you to where it is 18:35:07 annak: i will follow up with you offline 18:35:15 SumitNaiksatam: thanks 18:35:58 rkukura: tbachman: do you have any thoughts about adding the retry decorator? 18:36:14 SumitNaiksatam: not sure I know of a reason against it? 18:36:20 i had actually added them in the newton sync patch, but eventually removed them before it got merged 18:36:32 tbachman: and i dont recall why 18:37:02 SumitNaiksatam: things like this already exist in neutron, I believe (e.g. port binding)? 18:37:33 tbachman: right, so we can follow that pattern 18:37:57 okay, since there does not seem to be an objection, i wil try it out 18:38:09 songole: hi, we were just about to wrap up 18:38:18 just a heads up - I'm planning to start posting the vmware plugin soon, it is still under heavy WIP so probably not worth looking at yet 18:38:23 Sorry, I am late 18:38:32 annak: np, feel free to post 18:38:51 annak: most of our patches start that way :-) 18:38:59 :) 18:39:27 annak: i actually prefer to post even at an early stage, if not for anything else, just to see how it fares in the upstream gate 18:39:59 sorry - got interrupted 18:40:06 annak: we could also consider adding a new gate job for that plugin 18:40:17 songole: np 18:40:21 I was also going to ask about adding retry decorators to the GBP API 18:40:31 SumitNaiksatam: yes, definately. but too early for that as well 18:40:34 rkukura: so you feel we should add 18:40:37 annak: sure 18:41:04 annak: but be aware that the gate job that CI gives is just a hook (with devstack install) 18:41:13 annak: you can decide what tests you want to run 18:41:28 annak: so in the initial stage you can run something very basic 18:42:36 songole: i apologized earlier for not making progress with the NFP patches :-) 18:42:37 SumitNaiksatam: Whether we should need to rety decorators depends on the DB isolation level, I think 18:43:35 rkukura: could you elaborate? 18:43:35 SumitNaiksatam: np. We will not be merging until mid next week 18:44:01 songole: one of the reason we are holding back a bit is because all the DB issues are not yet resolved with newton 18:44:18 ok 18:45:05 songole: so we want to avoid complicating the problem (which we are anyway finding difficult to debug) with more changes on top 18:45:42 SumitNaiksatam: Could me merge it for mitaka first in that case? 18:45:44 SumitNaiksatam: As I understand it, neutron was originally written assuming a high isolation level, such as serializable, but I think this has been relaxed over time 18:45:54 songole: most likely the NFP patches are not going to affect anything major in the framework, but we just want to be careful, since we had some issues now with outside repos breaking us 18:46:08 There is also the version column thing, which maybe we should be using 18:46:34 when ensures a commit fails if some other TX increments versions of rows that it modifies 18:46:58 rkukura: you mean object versioning (OVO)? 18:47:26 no, its a column in the DB that keeps a version counter for each row 18:47:39 rkukura: okay 18:47:51 #link http://docs.sqlalchemy.org/en/latest/orm/versioning.html 18:48:17 rkukura: ah thanks, i was thinking something totally different 18:49:11 rkukura: have you observed this pattern being used in openstack? 18:49:14 My very vague understanding of Neutron’s approach now is to allow a lower isolation level, such as read committed, and count on the DB throughing retry exceptions when attempting to commit something that has been concurrently modified 18:49:33 s/throughing/throwing/ 18:50:57 rkukura: okay 18:51:00 neutron.db.standard_attr includes “version_id_col” 18:51:15 rkukura: okay 18:52:26 rkukura: the retry decorator should work for that case, right? 18:52:58 so I think the retry decorators work together with the StaleDataError exceptions 18:53:41 right, the version_id_col declaration tells sqlalchemy to raise StaleDataError during commit if the version number isn’t what was expected. 18:53:54 rkukura: okay 18:54:27 i am hoping they will also work in the case, say, when a per-tenant resource is attempted to be created from two concurrent threads, one succeeds, the other one fails 18:55:08 but the one which fails can be retried, and will now see the per-tenant resource already present and would not try recreating it the second time around 18:55:51 So if we really are using read committed or repeatable reads isolation level, but aren’t suing version_id_col, then we might be exposed to all kinds of concurrency issues 18:56:32 * tbachman notes the time 18:56:42 tbachman: thanks :-) 18:57:01 rkukura: okay, thanks for bringing that up 18:57:19 rkukura: will take up this discussion offline with you 18:57:23 sure 18:57:48 songole: i am not sure front porting is a good model 18:58:04 songole: lets see which patches you have in mind 18:58:19 songole: hopefully we can open up the master itself soon 18:58:57 okay if nothing else, we can wrap up for today 18:59:03 thanks all for joining! 18:59:05 SumitNaiksatam: thanks! 18:59:08 thanks SumitNaiksatam! 18:59:10 bye! 18:59:14 #endmeeting