18:09:36 #startmeeting networking_policy
18:09:37 Meeting started Thu Feb 22 18:09:36 2018 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:09:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:09:41 The meeting name has been set to 'networking_policy'
18:09:49 #topic Pending patches
18:10:06 SumitNaiksatam: we may also want to add “Reverting patches”
18:10:19 tho we may just carry that discussion over gerrit
18:10:38 tbachman: ah
18:10:39 which ones?
18:10:45 the one I merged :P
18:10:45 lets discuss here
18:10:51 * tbachman goes to gerrit
18:11:02 #link https://review.openstack.org/#/c/546269/
18:11:59 okay i see bob’s comments now
18:12:09 incidentally, amit had discussed this with me yesterday
18:12:18 I’m not arguing that we should revert the patch
18:12:28 and i had pointed out that we prefer to not use lockmode update
18:12:38 But I do think the locking is completely useless
18:13:02 rkukura: is it useless in all cases?
18:13:27 * tbachman wonders if we have an overarching set of DB access semantics for GBP
18:13:35 s/semantics/idioms/
18:13:43 if the DB server is clustered, and locks are not global across the DB servers, they don’t do any good
18:13:48 fyi - #link https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#MySQLdb_.2B_eventlet_.3D_sad
18:14:06 lol
18:14:27 see sections “MySQLdb + eventlet = sad” and also “Pessimistic Locking - SELECT FOR UPDATE”
18:14:39 and in cases like this, locking the set of rows returned from the query does not help - what would be needed would be locking the whole table to prevent adding new rows that would have been returned by the query
18:15:00 sounds heavy
18:15:18 rkukura: so i agree in the case of clustered (with Galera front ending) it’s not helpful
18:15:18 and in general, we would prefer optimistic approaches to pessimistic locking in order to scale and perform better
18:16:10 and I had forgotten about the potential deadlocking with threads
18:16:13 Do different clustered DB servers provide this as part of their clustering behavior?
18:16:14 tbachman: to your point, moving to an optimistic locking strategy seemed like an overarching project concern/goal
18:16:34 and it seemed too big a change for this patch
18:16:55 this is way beyond the topic of this specific subject, but I find it challenging to define a set of DB access semantics/idioms when we can’t limit the set of what people use
18:17:18 It seems like we’d have to go with a very low bar, in order to support any DB integration
18:17:25 * tbachman gets off soapbox
18:17:38 s/any/every possible/
18:18:11 * tbachman taps mic
18:18:20 we tried to think through ways of doing “optimistic” locking but per Amit, in this case the logic wasn’t as straightforward
18:18:33 I don’t claim to know what the solution is, but my point on this patch is that adding the locking doesn’t really help anything, and we’ve got bigger issues of the same type
18:18:44 * tbachman nods
18:18:50 rkukura: i think we are somewhat covered with the thread deadlocks because we will presumably retry
18:19:03 SumitNaiksatam: that may be the case
18:19:13 rkukura: it would help in the case the DB is not clustered
18:19:38 but how does locking the set of rows returned by the query prevent other sessions from adding new rows that would have been returned by the query?
18:20:35 As I understand it, the rows in this DB are generally not even mutated, just added or removed. So locking existing rows doesn’t do much good, if any.
18:21:36 rkukura: so i did not study all the cases in which reference counting was being done in this patch, but it seemed to me that at least in the delete case it would help, no?
18:21:44 What is needed is some way for the commit to validate that the same rows would be returned at that point as when the query was made
18:23:28 rkukura: in the above, you’re referring to MVCC?
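For reference, the pessimistic SELECT ... FOR UPDATE being questioned above, and the optimistic validate-and-retry approach being preferred, look roughly like this in SQLAlchemy. This is a hedged sketch, not GBP code: the `Allocation` model and `owner` column are invented for illustration, and as noted in the meeting, FOR UPDATE row locks neither block concurrent INSERTs of rows that would have matched the query nor hold globally under Galera-style clustering.

```python
# Illustrative sketch only -- model and column names are hypothetical.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Allocation(Base):
    __tablename__ = 'allocations'
    id = sa.Column(sa.Integer, primary_key=True)
    owner = sa.Column(sa.String(36), nullable=True)

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)

def allocate_pessimistic(session):
    # Pessimistic: SELECT ... FOR UPDATE locks the returned rows until
    # commit. It does NOT stop other sessions from inserting new rows that
    # would also have matched the query, which was rkukura's objection.
    row = (session.query(Allocation)
           .filter_by(owner=None)
           .with_for_update()
           .first())
    if row is not None:
        row.owner = 'me'
    session.commit()
    return row

def allocate_optimistic(session, retries=3):
    # Optimistic: read without locks, then make the UPDATE conditional on
    # the row still being unowned; retry if another session won the race.
    for _ in range(retries):
        row = session.query(Allocation).filter_by(owner=None).first()
        if row is None:
            return None
        updated = (session.query(Allocation)
                   .filter_by(id=row.id, owner=None)
                   .update({'owner': 'me'}, synchronize_session=False))
        session.commit()
        if updated:
            return row
    return None
```

The optimistic variant trades a lock for a conditional write plus retry, which is the "scale and perform better" preference stated earlier; it still does not solve the phantom-row problem rkukura raised, which would need commit-time validation of the query result.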
18:24:10 not specifically, but maybe
18:24:13 k
18:24:24 just wasn’t sure what a lock means in that case
18:24:42 rkukura: in your review comments on the patch, you meant “live” not “leave” right?
18:24:53 but even that seems to mainly address existing rows, not rows that might get added concurrently during a transaction
18:26:14 SumitNaiksatam: No, I meant “I’d prefer that we leave the … out for now”
18:26:49 ah sorry, i was misreading, the sentence is fine as is
18:27:40 I don’t think the locking does a whole lot of harm in this case, so I am not arguing to revert the patch. I just don’t want the illusion that we don’t have a problem.
18:28:07 rkukura: totally with you on the point about this creating the illusion
18:28:13 * tbachman nods again
18:28:13 hence had requested Amit to add the comment
18:28:50 sorry again folks for the quick merge on this — should have given it my due attention :(
18:28:54 i had pointed this link to him yesterday so he is aware of the limitations
18:29:43 tbachman: hmmm, but i think we are converging to the same result
18:29:49 as for other patches
18:29:52 they are building up
18:30:39 SumitNaiksatam: I put a comment in your DB patch — more of a question, really
18:30:49 tbachman: thanks, just noticed and responded
18:30:59 i had considered that before
18:31:05 k
18:31:23 I just couldn’t remember if this was an issue, since I last wrote a DB migration
18:32:00 we add only what is needed in the migration
18:32:39 so, in the case where we’re doing things like adding columns, etc., that makes sense
18:32:50 but if we need to move things around, etc., that would be an issue, right?
18:33:47 tbachman: sorry, i did not understand either of those
18:33:57 we are doing this only in the case of data migration
18:33:57 basically, anything that would affect the integrity of the row, if the row would include a relationship to another table row
18:34:05 not schema migration
18:34:09 ah
18:35:33 if you all would quit changing things on my, I could complete the validation/repair tool, and we could use that to handle the data migrations in some cases going forward
18:36:17 rkukura: “my” ?
18:36:19 I cannot type on this stupid keyboard
18:36:27 me
18:36:32 :-)
18:36:37 got it
18:36:48 ;)
18:36:49 don’t mean to change the topic, but quick one - #link https://review.openstack.org/#/c/546017
18:37:37 Just added that it should have a commit message
18:37:41 explanation/context
18:37:49 tbachman: agree :-)
18:38:45 why is this only on stable/ocata?
18:39:01 I think this is to address the UT issue
18:39:03 rkukura: its relevant only to ocata
18:39:15 not just a UT issue
18:39:15 there’s a UT that’s failing sporadically, but only on ocata
18:39:26 ah
18:39:27 ok, wouldn’t hurt to mention that in the commit msg
18:39:44 presumably, the code is different on other branches?
18:40:13 it fixes the UT, but it’s a much bigger issue because the new and old transaction patterns are getting mixed up
18:40:23 so it breaks our transaction integrity
18:40:36 this is the part where neutron introduced this change selectively
18:40:56 and only in Pike was it done more comprehensively
18:41:03 so we adapted in Pike
18:41:21 but in Ocata we used the older pattern (and so does Neutron in most other cases)
18:41:47 SumitNaiksatam: thx for the background
18:41:52 and neutron doesn’t care as much, most likely, because this plugin is either not used as much in Ocata, or not in the way that we do
18:41:57 * tbachman definitely thinks it needs a commit message now ;)
18:42:04 tbachman: lol, np
18:42:12 i can’t type either
18:42:41 i spent some time debugging the UT issue, which led to this realization
18:42:44 SumitNaiksatam: That explains why it’s only on stable/ocata. Did “it fixes the UT” and “breaks our transaction integrity” above both refer to the patch?
18:42:56 rkukura: yes
18:43:14 so the patch breaks transaction integrity?
18:43:23 so when i refer to the “plugin” i meant the “trunk” plugin
18:44:32 rkukura: yes, because it mixes up the old and new transaction creation pattern
18:45:32 SumitNaiksatam: Are you saying that the trunk plugin broke transaction integrity, or that this patch broke transaction integrity? I’m getting confused.
18:46:13 rkukura: the trunk plugin broke transactional integrity for our apic_aim mech driver
18:46:32 it might be working fine independently
18:47:00 ok, and this patch fixes that issue?
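The failure mode described here, inner code committing a transaction the outer code still believes is open, can be simulated in plain SQLAlchemy. This is not neutron or GBP code; the `Port` model is invented, and the two functions merely stand in for code written against the newer (owns-the-transaction) and older (caller-owns-the-transaction) patterns:

```python
# Simulation only -- illustrates why mixing the two transaction patterns
# breaks integrity; none of these names come from neutron or GBP.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Port(Base):
    __tablename__ = 'ports'
    id = sa.Column(sa.Integer, primary_key=True)

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)

def inner_new_pattern(session):
    # Written against the newer pattern: assumes it owns the transaction
    # scope and commits when its own work is done.
    session.add(Port(id=1))
    session.commit()

def outer_old_pattern(session):
    # Written against the older pattern: assumes all work on this session
    # stays uncommitted until the outermost caller decides.
    inner_new_pattern(session)
    try:
        raise RuntimeError('failure later in the same logical operation')
    except RuntimeError:
        session.rollback()  # too late: the inner commit already persisted

session = Session(engine)
outer_old_pattern(session)
# The Port row survives the rollback: one logical operation was silently
# split across two transactions.
```

This is the shape of the problem attributed to the Ocata trunk plugin: its transaction scoping did not compose with the apic_aim mech driver's expectations, so a failure partway through could no longer undo the whole operation.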
18:47:43 rkukura: yes
18:47:56 for example, use of #link https://github.com/openstack/neutron/blob/stable/ocata/neutron/services/trunk/plugin.py#L226
18:48:03 in Ocata, causes a problem for us
18:48:50 this patch aims to change the use to the older pattern prevalent in ocata
18:48:51 OK, I’m fine with this patch, but wonder if it’s worth updating its commit message
18:49:20 rkukura: absolutely, i didn’t think of it because i had the background, but i can see why it might not make sense without knowing it
18:50:06 of course, there is no justification/motivation required for putting commit messages, it should always be required! :-)
18:50:24 SumitNaiksatam: +1
18:50:37 this one would be particularly helpful
18:51:23 as reviewers we should enforce it for all patches, even for trivial patches
18:51:45 +1
18:51:48 just add one line! :-)
18:51:58 but it matters a lot when you are reading the git log
18:51:59 this one might take more than one
18:52:03 tbachman: lol
18:52:09 of course
18:52:10 anyway
18:52:20 SumitNaiksatam: totally agree about when reading the log
18:52:28 I’ve run into that a bit lately
18:53:01 i used to comment on the patches earlier asking for a commit message, but patches got merged in spite of that, so i gave up
18:53:22 heh
18:53:37 as for #link https://review.openstack.org/#/c/546427/
18:54:14 it fixes a show stopper, basically the installation upgrade fails
18:54:50 (sorry i changed context and circled back to this patch which we already discussed)
18:55:28 rkukura: i was responding to your earlier point about things like this getting fixed by your repair tool
18:56:17 depending on when the tool is run, it might fix this issue, but at least for now, since the data migration is embedded in our migration chain, it runs and breaks the upgrade
18:56:19 sorry
18:56:22 SumitNaiksatam: so, we don’t need to populate this: https://github.com/openstack/group-based-policy/blob/master/gbpservice/neutron/plugins/ml2plus/drivers/apic_aim/db.py#L31-L34
18:56:49 or is it this:
18:56:50 https://github.com/openstack/group-based-policy/blob/master/gbpservice/neutron/plugins/ml2plus/drivers/apic_aim/db.py#L27-L29
18:57:01 * tbachman gets confused about directions wrt ForeignKey and relationship
18:57:57 tbachman: i don’t think the foreign key is required
18:58:07 SumitNaiksatam: k. thx
18:58:15 we are not redefining the table
18:58:41 we are just providing enough definition to fill the columns that are required to migrate the data
18:58:53 got it
18:59:15 there is a stack trace i can share with you offline that will provide more context on this
18:59:23 note that the patch is doing two things
18:59:28 even tho there’s the ondelete=‘CASCADE’?
18:59:57 it’s also getting rid of the use of the mixin methods, which also breaks things
19:00:03 we are out of time
19:00:13 tbachman: responding offline
19:00:19 SumitNaiksatam: ack. Thx!
19:00:28 rkukura: tbachman: thanks for joining!
19:00:31 SumitNaiksatam: bye!
19:00:44 #endmeeting
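The "just enough definition" approach discussed at the end of the meeting, where a data migration declares only the columns it needs rather than importing the full model with its ForeignKey and relationship machinery, can be sketched as follows. The table and column names here are made up for illustration and are not GBP's actual schema:

```python
# Hedged sketch of a data migration's partial table definition; the table
# and column names are hypothetical, not taken from GBP.
import sqlalchemy as sa

# Minimal, partial view of an existing table: just the columns this
# migration touches. No ForeignKey, no relationship(), no full redefinition.
mapping_table = sa.table(
    'example_mappings',
    sa.column('resource_id', sa.String(36)),
    sa.column('synced', sa.Boolean),
)

def upgrade_data(connection):
    # Backfill the column the new code expects, as a step embedded in the
    # migration chain.
    connection.execute(
        mapping_table.update()
        .where(mapping_table.c.synced.is_(None))
        .values(synced=False)
    )
```

Because the lightweight `sa.table()` construct emits only the UPDATE it is asked for, the migration avoids depending on model metadata (such as foreign keys) that it does not need to fill the required columns.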