18:09:36 <SumitNaiksatam> #startmeeting networking_policy
18:09:37 <openstack> Meeting started Thu Feb 22 18:09:36 2018 UTC and is due to finish in 60 minutes.  The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:09:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:09:41 <openstack> The meeting name has been set to 'networking_policy'
18:09:49 <SumitNaiksatam> #topic Pending patches
18:10:06 <tbachman> SumitNaiksatam: we may also want to add “Reverting patches”
18:10:19 <tbachman> tho we may just carry that discussion over gerrit
18:10:38 <SumitNaiksatam> tbachman: ah
18:10:39 <SumitNaiksatam> which ones?
18:10:45 <tbachman> the one I merged :P
18:10:45 <SumitNaiksatam> lets discuss here
18:10:51 * tbachman goes to gerrit
18:11:02 <tbachman> #link https://review.openstack.org/#/c/546269/
18:11:59 <SumitNaiksatam> okay i see bob’s comments now
18:12:09 <SumitNaiksatam> incidentally, amit had discussed this with me yesterday
18:12:18 <rkukura> I’m not arguing that we should revert the patch
18:12:28 <SumitNaiksatam> and i had pointed out that we prefer to not use lockmode update
18:12:38 <rkukura> But I do think the locking is completely useless
18:13:02 <SumitNaiksatam> rkukura: is it useless in all cases?
18:13:27 * tbachman wonders if we have an overarching set of DB access idioms for GBP
18:13:43 <rkukura> if the DB server is clustered, and locks are not global across the DB servers, they don’t do any good
18:13:48 <SumitNaiksatam> fyi - #link https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#MySQLdb_.2B_eventlet_.3D_sad
18:14:06 <tbachman> lol
18:14:27 <SumitNaiksatam> see sections “MySQLdb + eventlet = sad” and also “Pessimistic Locking - SELECT FOR UPDATE"
18:14:39 <rkukura> and in cases like this, locking the set of rows returned from the query does not help  - what would be needed would be locking the whole table to prevent adding new rows that would have been returned by the query
18:15:00 <tbachman> sounds heavy
18:15:18 <SumitNaiksatam> rkukura: so i agree in the case of clustered (with Galera front ending) its not helpful
18:15:18 <rkukura> and in general, we would prefer optimistic approaches to pessimistic locking in order to scale and perform better
18:16:10 <rkukura> and I had forgotten about the potential deadlocking with threads
18:16:13 <tbachman> Do different clustered DB servers provide this as part of their clustering behavior?
18:16:14 <SumitNaiksatam> tbachman: to your point, moving to an optimistic locking strategy seemed like an overarching project concern/goal
18:16:34 <SumitNaiksatam> and it seemed too big a change for this patch
18:16:55 <tbachman> this is way beyond this specific topic, but I find it challenging to define a set of DB access semantics/idioms when we can’t limit the set of what people use
18:17:18 <tbachman> It seems like we’d have to go with a very low bar, in order to support every possible DB integration
18:17:25 * tbachman gets off soapbox
18:18:11 * tbachman taps mic
18:18:20 <SumitNaiksatam> we tried to think through ways of doing “optimistic” locking but per Amit, in this case the logic wasnt as straightforward
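[Editor's note: the optimistic alternative mentioned here is usually a compare-and-swap guarded by a revision column. A hedged sketch with invented names, not the actual GBP schema:]

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical table with a revision counter used for compare-and-swap.
resources = sa.Table(
    'resources', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('state', sa.String(16)),
    sa.Column('revision', sa.Integer, nullable=False))

engine = sa.create_engine('sqlite://')
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(resources.insert().values(id=1, state='down', revision=0))

def update_if_unchanged(conn, res_id, seen_revision, new_state):
    # The WHERE clause makes the write conditional on the revision we
    # read earlier; 0 matched rows means another session won the race,
    # so the caller should re-read and retry instead of holding locks.
    result = conn.execute(
        resources.update()
        .where(sa.and_(resources.c.id == res_id,
                       resources.c.revision == seen_revision))
        .values(state=new_state, revision=seen_revision + 1))
    return result.rowcount == 1

with engine.begin() as conn:
    won = update_if_unchanged(conn, 1, 0, 'up')     # first writer wins
    lost = update_if_unchanged(conn, 1, 0, 'down')  # stale revision fails

print(won, lost)  # -> True False
```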
18:18:33 <rkukura> I don’t claim to know what the solution is, but my point on this patch is that adding the locking doesn’t really help anything, and we’ve got bigger issues of the same type
18:18:44 * tbachman nods
18:18:50 <SumitNaiksatam> rkukura: i think we are somewhat covered with the thread deadlocks because we will presumably retry
18:19:03 <rkukura> SumitNaiksatam: that may be the case
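[Editor's note: for context on the retry being counted on here, OpenStack services typically wrap DB work with oslo.db's `wrap_db_retry(retry_on_deadlock=True)` so a transaction killed as a deadlock victim is simply re-run. This stand-in sketch shows the idea without depending on oslo.db; `DBDeadlock` below is a local stand-in for `oslo_db.exception.DBDeadlock`.]

```python
import time

class DBDeadlock(Exception):
    """Stand-in for oslo_db.exception.DBDeadlock."""

def run_with_deadlock_retry(operation, max_retries=3, backoff=0.0):
    # Re-run the whole transaction when the server picks it as the
    # deadlock victim; any other exception propagates immediately.
    for attempt in range(max_retries):
        try:
            return operation()
        except DBDeadlock:
            if attempt == max_retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

attempts = []

def flaky_transaction():
    # Fails once with a deadlock, then succeeds on the retry.
    attempts.append(1)
    if len(attempts) < 2:
        raise DBDeadlock()
    return 'committed'

result = run_with_deadlock_retry(flaky_transaction)
print(result, len(attempts))  # -> committed 2
```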
18:19:13 <SumitNaiksatam> rkukura: it would help in the case the DB is not clustered
18:19:38 <rkukura> but how does locking the set of rows returned by the query prevent other sessions from adding new rows that would have been returned by the query?
18:20:35 <rkukura> As I understand it, the rows in this DB are generally not even mutated, just added or removed. So locking existing rows doesn’t do much good, if any.
18:21:36 <SumitNaiksatam> rkukura: so i did not study all the cases in which reference counting was being done in this patch, but it seemed to me that at least in the delete case it would help, no?
18:21:44 <rkukura> What is needed is some way for the commit to validate that the same rows would be returned at that point as when the query was made
18:23:28 <tbachman> rkukura: in the above, you’re referring to MVCC
18:23:29 <tbachman> ?
18:24:10 <rkukura> not specifically, but maybe
18:24:13 <tbachman> k
18:24:24 <tbachman> just wasn’t sure what a lock means in that case
18:24:42 <SumitNaiksatam> rkukura: in your review comments on the patch, you meant “live” not “leave” right?
18:24:53 <rkukura> but even that seems to mainly address existing rows, not rows that might get added concurrently during a transaction
18:26:14 <rkukura> SumitNaiksatam: No, I meant “I’d prefer that we leave the … out for now”
18:26:49 <SumitNaiksatam> ah sorry, i was misreading, the sentence is fine as is
18:27:40 <rkukura> I don’t think the locking does a whole lot of harm in this case, so I am not arguing to revert the patch. I just don’t want the illusion that we don’t have a problem.
18:28:07 <SumitNaiksatam> rkukura: totally with you on the point about this creating the illusion
18:28:13 * tbachman nods again
18:28:13 <SumitNaiksatam> hence had requested Amit to add the comment
18:28:50 <tbachman> sorry again folks for the quick merge on this — should have given it my due attention :(
18:28:54 <SumitNaiksatam> i had pointed him to this link yesterday so he is aware of the limitations
18:29:43 <SumitNaiksatam> tbachman: hmmm, but i think we are converging to the same result
18:29:49 <SumitNaiksatam> as for other patches
18:29:52 <SumitNaiksatam> they are building up
18:30:39 <tbachman> SumitNaiksatam: I put a comment in your DB patch — more of a question, really
18:30:49 <SumitNaiksatam> tbachman: thanks, just noticed and responded
18:30:59 <SumitNaiksatam> i had considered that before
18:31:05 <tbachman> k
18:31:23 <tbachman> I just couldn’t remember if this was an issue, since I last wrote a DB migration
18:32:00 <SumitNaiksatam> we add only what is needed in the migration
18:32:39 <tbachman> so, in the case where we’re doing things like adding columns, etc., that makes sense
18:32:50 <tbachman> but if we need to move things around, etc., that would be an issue, right?
18:33:47 <SumitNaiksatam> tbachman: sorry, i did not understand either of those
18:33:57 <SumitNaiksatam> we are doing this only in the case of data migration
18:33:57 <tbachman> basically, anything that would affect the integrity of the row, if the row would include a relationship to another table row
18:34:05 <SumitNaiksatam> not schema migration
18:34:09 <tbachman> ah
18:35:33 <rkukura> if you all would quit changing things on me, I could complete the validation/repair tool, and we could use that to handle the data migrations in some cases going forward
18:36:49 <SumitNaiksatam> dont mean to change the topic, but quick one - #link https://review.openstack.org/#/c/546017
18:37:37 <tbachman> Just added that it should have a commit message
18:37:41 <tbachman> explanation/context
18:37:49 <SumitNaiksatam> tbachman: agree :-)
18:38:45 <rkukura> why is this only on stable/ocata?
18:39:01 <tbachman> I think this is to address the UT issue
18:39:03 <SumitNaiksatam> rkukura: its relevant only to ocata
18:39:15 <SumitNaiksatam> not just a UT issue
18:39:15 <tbachman> there’s a UT that’s failing sporadically, but only on ocata
18:39:26 <tbachman> ah
18:39:27 <rkukura> ok, wouldn’t hurt to mention that in the commit msg
18:39:44 <rkukura> presumably, the code is different on other branches?
18:40:13 <SumitNaiksatam> it fixes the UT, but its a much bigger issue because the new and old transaction patterns are getting mixed up
18:40:23 <SumitNaiksatam> so it breaks our transaction integrity
18:40:36 <SumitNaiksatam> this is the part where neutron introduced this change selectively
18:40:56 <SumitNaiksatam> and only in Pike it was done more comprehensively
18:41:03 <SumitNaiksatam> so we adapted in Pike
18:41:21 <SumitNaiksatam> but in Ocata we used the older pattern (and so does Neutron in most other cases)
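[Editor's note: an illustration, in plain SQLAlchemy with invented names rather than neutron code, of why mixing independently managed transactions into an enclosing one breaks transactional integrity: the inner write commits on its own, so when the outer transaction rolls back, the two writes are no longer all-or-nothing.]

```python
import os
import tempfile

import sqlalchemy as sa

# File-backed SQLite so both connections see the same database.
fd, path = tempfile.mkstemp(suffix='.db')
os.close(fd)
engine = sa.create_engine('sqlite:///' + path)

metadata = sa.MetaData()
log_table = sa.Table(
    'log', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('msg', sa.String(32)))
metadata.create_all(engine)

try:
    with engine.begin() as outer:
        # An independently managed inner transaction commits on its own...
        with engine.begin() as inner:
            inner.execute(log_table.insert().values(msg='inner-write'))
        # ...so when the enclosing transaction later fails and rolls
        # back, the inner write is already durable: atomicity is gone.
        outer.execute(log_table.insert().values(msg='outer-write'))
        raise RuntimeError('simulated failure after inner commit')
except RuntimeError:
    pass

with engine.connect() as conn:
    msgs = [row.msg for row in conn.execute(sa.select(log_table.c.msg))]
os.unlink(path)

print(msgs)  # only the independently committed write survives
```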
18:41:47 <tbachman> SumitNaiksatam: thx for the background
18:41:52 <SumitNaiksatam> and neutron doesnt care as much, most likely, because this plugin is either not used as much in Ocata, or not in the way that we do
18:41:57 * tbachman definitely thinks it needs a commit message now ;)
18:42:04 <SumitNaiksatam> tbachman: lol, np
18:42:12 <SumitNaiksatam> i cant type either
18:42:41 <SumitNaiksatam> i spent some time debugging the UT issue, which led to this realization
18:42:44 <rkukura> SumitNaiksatam: That explains why its only on stable/ocata. Did “it fixes the UT” and “it breaks our transaction integrity” above both refer to the patch?
18:42:56 <SumitNaiksatam> rkukura: yes
18:43:14 <rkukura> so the patch breaks transaction integrity?
18:43:23 <SumitNaiksatam> so when i refer to the “plugin” i meant the “trunk” plugin
18:44:32 <SumitNaiksatam> rkukura: yes, because it mixes up the old and new transaction creation pattern
18:45:32 <rkukura> SumitNaiksatam: Are you saying that the trunk plugin broke transaction integrity, or that this patch broke transaction integrity? I’m getting confused.
18:46:13 <SumitNaiksatam> rkukura: the trunk plugin broke transactional integrity for our apic_aim mech driver
18:46:32 <SumitNaiksatam> it might be working fine independently
18:47:00 <rkukura> ok, and this patch fixes that issue?
18:47:43 <SumitNaiksatam> rkukura: yes
18:47:56 <SumitNaiksatam> for example, use of #link https://github.com/openstack/neutron/blob/stable/ocata/neutron/services/trunk/plugin.py#L226
18:48:03 <SumitNaiksatam> in Ocata, causes a problem for us
18:48:50 <SumitNaiksatam> this patch aims to change the use to the older pattern prevalent in ocata
18:48:51 <rkukura> OK, I’m fine with this patch, but wonder if its worth updating its commit message
18:49:20 <SumitNaiksatam> rkukura: absolutely, i didnt think of it because i had the background, but i can see why it might not make sense without knowing it
18:50:06 <SumitNaiksatam> of course, no justification/motivation is required for putting in commit messages, they should always be required! :-)
18:50:24 <tbachman> SumitNaiksatam: +1
18:50:37 <tbachman> this one would be particularly helpful
18:51:23 <SumitNaiksatam> as reviewers we should enforce it for all patches, even for trivial patches
18:51:45 <rkukura> +1
18:51:48 <SumitNaiksatam> just add one line! :-)
18:51:58 <SumitNaiksatam> but it matters a lot when you are reading the git log
18:51:59 <tbachman> this one might take more than one
18:52:03 <SumitNaiksatam> tbachman: lol
18:52:09 <SumitNaiksatam> of course
18:52:10 <SumitNaiksatam> anyway
18:52:20 <tbachman> SumitNaiksatam: totally agree about when reading the log
18:52:28 <tbachman> I’ve run into that a bit lately
18:53:01 <SumitNaiksatam> i used to comment on the patches earlier asking for a commit message, but patches got merged in spite of that, so i gave up
18:53:22 <tbachman> heh
18:53:37 <SumitNaiksatam> as for #link https://review.openstack.org/#/c/546427/
18:54:14 <SumitNaiksatam> it fixes a show stopper: without it, the installation upgrade fails
18:54:50 <SumitNaiksatam> (sorry i changed context and circled back to this patch which we already discussed)
18:55:28 <SumitNaiksatam> rkukura: i was responding to your earlier point about things like this getting fixed by your repair tool
18:56:17 <SumitNaiksatam> depending on when the tool is run, it might fix this issue, but at least for now, since the data migration is embedded in our migration chain, it runs and breaks the upgrade
18:56:19 <rkukura> sorry
18:56:22 <tbachman> SumitNaiksatam: so, we don’t need to populate this: https://github.com/openstack/group-based-policy/blob/master/gbpservice/neutron/plugins/ml2plus/drivers/apic_aim/db.py#L31-L34
18:56:49 <tbachman> or is it this:
18:56:50 <tbachman> https://github.com/openstack/group-based-policy/blob/master/gbpservice/neutron/plugins/ml2plus/drivers/apic_aim/db.py#L27-L29
18:57:01 * tbachman gets confused about directions wrt ForeignKey and relationship
18:57:57 <SumitNaiksatam> tbachman: i dont think the foreign key is required
18:58:07 <tbachman> SumitNaiksatam: k. thx
18:58:15 <SumitNaiksatam> we are not redefining the table
18:58:41 <SumitNaiksatam> we are just providing enough definition to fill the columns that are required to migrate the data
18:58:53 <tbachman> got it
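[Editor's note: the pattern described here, sketched with invented table/column names: an alembic-style data migration declares a lightweight, partial table (`sa.table`/`sa.column`) with only the columns it needs, and no ForeignKey or ORM relationship, because it is not redefining the schema.]

```python
import sqlalchemy as sa

def upgrade_data(connection):
    # Partial, lightweight table definition: just enough to locate and
    # fill the columns the migration touches. No ForeignKey, no ORM
    # relationship -- we are not redefining the table here.
    networks = sa.table(
        'apic_networks',
        sa.column('id', sa.String(36)),
        sa.column('external', sa.Boolean))
    connection.execute(
        networks.update()
        .where(networks.c.external.is_(None))
        .values(external=False))

# Demo: the real table (created by the schema migrations) can have more
# columns than the data migration declares.
engine = sa.create_engine('sqlite://')
metadata = sa.MetaData()
sa.Table('apic_networks', metadata,
         sa.Column('id', sa.String(36), primary_key=True),
         sa.Column('name', sa.String(255)),
         sa.Column('external', sa.Boolean))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(sa.text(
        "INSERT INTO apic_networks (id, name, external) "
        "VALUES ('n1', 'net1', NULL)"))
    upgrade_data(conn)
    external = conn.execute(
        sa.text("SELECT external FROM apic_networks")).scalar()

print(external)  # the NULL has been backfilled to false (0 on SQLite)
```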
18:59:15 <SumitNaiksatam> there is a stack trace i can share with you offline that will provide more context on this
18:59:23 <SumitNaiksatam> note that the patch is doing two things
18:59:28 <tbachman> even tho there’s the ondelete=‘CASCADE’?
18:59:57 <SumitNaiksatam> its also getting rid of the use of the mixin methods, which also breaks things
19:00:03 <SumitNaiksatam> we are out of time
19:00:13 <SumitNaiksatam> tbachman: responding offline
19:00:19 <tbachman> SumitNaiksatam: ack. Thx!
19:00:28 <SumitNaiksatam> rkukura: tbachman: thanks for joining!
19:00:31 <tbachman> SumitNaiksatam: bye!
19:00:44 <SumitNaiksatam> #endmeeting