16:00:10 <lbragstad> #startmeeting keystone
16:00:10 <openstack> Meeting started Tue Apr 24 16:00:10 2018 UTC and is due to finish in 60 minutes.  The chair is lbragstad. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 <lbragstad> ping ayoung, breton, cmurphy, dstanek, gagehugo, henrynash, hrybacki, knikolla, lamt, lbragstad, lwanderley, kmalloc, rderose, rodrigods, samueldmq, spilla, aselius, dpar, jdennis, ruan_he, wxy, sonuk
16:00:14 <openstack> The meeting name has been set to 'keystone'
16:00:19 <gagehugo> o/
16:00:20 <hrybacki> o/
16:00:22 <lbragstad> #link https://etherpad.openstack.org/p/keystone-weekly-meeting
16:00:25 <lbragstad> agenda ^
16:00:43 <cmurphy> o/
16:00:43 <knikolla> o/
16:00:54 <wxy|> o/
16:01:15 <kmalloc> o/
16:01:51 <lbragstad> #topic specifications
16:02:04 <lbragstad> just a heads up - we're now past the specification proposal deadline
16:02:27 <lbragstad> we'll continue to press on with reviews of what's been proposed
16:03:01 <lbragstad> next important deadline will be specification freeze which is rocky-2, or June 8th
16:03:17 <lbragstad> #topic consul
16:03:24 <lbragstad> #link http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-04-18.log.html#t2018-04-18T13:21:23
16:03:32 <lbragstad> ^ consul came up again in the -tc channel
16:03:54 <lbragstad> which has been a topic that comes up every once in a while
16:04:20 <hrybacki> 'discussed but never implemented' -- just always a low prio?
16:04:39 <lbragstad> i'm not even sure we made it that far
16:04:59 <lbragstad> here is a list of logs from every time it's been mentioned in keystone http://paste.openstack.org/raw/719483/ (i think)
16:05:16 <gagehugo> hmm
16:05:37 <hrybacki> nice tool lbragstad
16:05:56 <lbragstad> does this seem like something that would be appropriate for the idea log? http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ideas/README.html
16:06:26 <cmurphy> what is the use case?
16:06:41 <lbragstad> so - consul is service discovery software
16:06:59 <lbragstad> the use case would be that we would off load service catalog stuff to consul instead of reinventing the wheel
16:07:09 <lbragstad> but kmalloc knows way more about it than i do
16:07:15 <kmalloc> if we want something clever, like services self-registering.
16:07:18 <kmalloc> consul is great
16:07:35 <kmalloc> otherwise it's not much different than we have now
16:07:40 <cmurphy> so rewrite the service catalog in order to avoid writing a new service catalog?
16:08:26 <cmurphy> sorry didn't mean for that to come out so cynical
16:08:34 <lbragstad> i was under the impression we would implement a catalog driver or work directly with services to leverage consul
16:10:23 <lbragstad> (i believe the second option would offload the responsibility of the catalog to a separate service, and users would talk to consul to figure out where to go to interact with services in a deployment)
16:13:09 <kmalloc> lbragstad: coreect
16:13:14 <kmalloc> if we are doing that
16:13:30 <kmalloc> but... i am against the speculative "maybe" bit of a driver
16:13:40 <kmalloc> if consul isn't really a dep of openstack, I don't want to add it
16:13:51 <cmurphy> ++
16:13:56 <kmalloc> it's why i pushed for picking zookeeper...or consul...or even etcd
16:14:06 <kmalloc> and we ended up with "well ALL OF THEM/any"
16:14:07 <cmurphy> etcd is a base service of openstack
16:14:11 <lbragstad> ^ that's a good point that someone brought up in the discussion a couple days ago
16:14:22 <kmalloc> etcd is mostly a base service.
16:14:40 <kmalloc> so, if we are picking one, i'd say hard set on etcd
16:14:45 <kmalloc> not "add consul"
16:14:53 <kmalloc> and work around etcd's limitations
16:14:57 <kmalloc> if we can
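[editor's note: a rough sketch of the self-registration model kmalloc describes, with a plain dict standing in for an etcd/consul key space and TTL-based expiry. All names here (CatalogStore, register, lookup) are illustrative, not keystone or etcd APIs.]

```python
import time

class CatalogStore:
    """Toy stand-in for a KV store like etcd, where catalog entries
    expire unless the owning service re-registers before the TTL."""

    def __init__(self):
        self._entries = {}  # key -> (url, expires_at)

    def register(self, service, region, url, ttl=30):
        # A service would call this periodically to keep its entry alive.
        key = f"/catalog/{region}/{service}"
        self._entries[key] = (url, time.monotonic() + ttl)

    def lookup(self, service, region):
        key = f"/catalog/{region}/{service}"
        entry = self._entries.get(key)
        if entry is None or entry[1] < time.monotonic():
            return None  # never registered, or entry expired
        return entry[0]

store = CatalogStore()
store.register("identity", "RegionOne", "https://keystone.example.com:5000/v3")
print(store.lookup("identity", "RegionOne"))
```

The point of the sketch: with self-registration plus TTL expiry, dead endpoints age out of the catalog on their own, which is the capability a static SQL-backed catalog lacks.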
16:15:46 <lbragstad> ok - the fact that consul gets brought up every few/six months tells me it's worth investigating, but i don't expect anyone to sign up for this
16:15:59 <lbragstad> that said, where should we document this?
16:16:08 <kmalloc> *shrug*
16:16:08 <lbragstad> it feels like a PoC of some kind is needed
16:16:30 <kmalloc> i'm going to be very anti-consul for catalog unless we have some serious weight behind it... even with a PoC
16:16:39 <cmurphy> i'd like to see some evidence or arguments for why this is better than what we have now and worth us putting in the effort, or else see someone with a specific use case step up to do the work
16:16:46 <cmurphy> we don't have the bandwidth to take on projects just because
16:16:51 <kmalloc> cmurphy: ++
16:16:53 <gagehugo> yeah..
16:17:00 <lbragstad> oh - sorry, i meant in more of a general sense and not just consul
16:17:19 <wxy|> So users want Keystone to support service self-registering. One of the drivers may be consul.
16:17:23 <wxy|> right?
16:17:31 <kmalloc> wxy|: sortof...
16:17:45 <kmalloc> self-registering and direct query by endpoints vs. in token
16:18:04 <lbragstad> cmurphy: i completely agree about bandwidth
16:21:00 <lbragstad> i don't want to propose more work for the sake of work, but during the tc discussion it was apparent that this particular thing has been brought up consistently since ~2015
16:22:21 <lbragstad> i guess my end goal was to either open a wishlist bug describing the thing that gets brought up or somehow make it discoverable so that if someone did want to do a PoC they wouldn't have to dig through all the context again
16:24:03 <hrybacki> I think adding this to the idea list and letting relevant ideas formulate there until a strong ask arises is a good idea
16:24:37 <hrybacki> then we have some documented history and can make a more educated decision when priorities begin to shift?
16:24:59 <lbragstad> if people don't have any objections to that, i can take a stab at proposing it to the idea log
16:25:04 <cmurphy> i think that's fine, as long as the idea is "do some research on this" and not "implement this"
16:25:12 <lbragstad> ++
16:25:14 <lbragstad> agreed
16:25:19 <hrybacki> ++
16:25:40 <lbragstad> #action lbragstad to fill out an idea form for consul
16:25:58 <lbragstad> #topic idempotent migrations
16:26:00 <lbragstad> hrybacki: o/
16:26:05 <hrybacki> o/
16:26:27 <hrybacki> So stemming from the review posted by Ade
16:26:29 <hrybacki> #link https://review.openstack.org/#/c/563266/
16:27:15 <hrybacki> tl;dr -- let's discuss the merits of making migrations idempotent as a requirement/recommendation/requirement under situations x, y, but not z
16:27:31 <cmurphy> this would help with that postgres deadlock situation
16:27:41 <hrybacki> cmurphy: I'm not following that
16:27:50 <cmurphy> let me pull up the bug
16:27:54 <hrybacki> ty!
16:28:16 <hrybacki> kmalloc: made the good point that making a migration idempotent adds a level of complexity
16:28:31 <ayoung> I'm against
16:28:35 <kmalloc> a lot of complexity
16:28:39 <cmurphy> https://bugs.launchpad.net/keystone/+bug/1755906
16:28:40 <openstack> Launchpad bug 1755906 in OpenStack Identity (keystone) "Occasional deadlock during db_sync --contract during Newton to Pike live upgrade" [Medium,Incomplete]
16:28:40 <lbragstad> cmurphy: https://bugs.launchpad.net/keystone/+bug/1755906 ?
16:28:47 <lbragstad> oh - yep, beat me to it :)
16:28:48 <ayoung> Its too much work, and the migration mechanism is already designed to be Idempotent
16:28:48 <kmalloc> ayoung: this is more for *all* migrations (going forward)
16:29:10 <kmalloc> and my guidance/recommendation is we dont
16:29:18 <ayoung> instead, we need a better process for doing backports
16:29:21 <kmalloc> but i am willing to humor discussing and open to changing my opinion
16:29:37 <ayoung> I think the problem is something like this:
16:29:38 <kmalloc> ayoung: this came up because a vendor backported
16:29:43 <kmalloc> not upstream backported
16:29:47 <ayoung> kmalloc, I know.  I'm the vendor
16:29:51 <kmalloc> so am i :P
16:29:52 <kmalloc> but...
16:29:54 <kmalloc> that said
16:29:56 <lbragstad> how complicated is doing a backport with alembic?
16:29:57 <ayoung> we're the vendor, as is hrybacki
16:30:03 <hrybacki> and vendors will backport. So we should think about that :)
16:30:09 <ayoung> so...say a feature goes into R
16:30:15 <ayoung> and we decide we want it in Q
16:30:22 <kmalloc> i am against feature backports. full stop
16:30:27 <ayoung> migration fills a hole in a prealloc
16:30:28 <kmalloc> i'd argue for microversions if it stopped it
16:30:32 <gagehugo> kmalloc heh
16:30:43 <ayoung> but...since the feature is not in Q yet, there is no way to make the migration idempotent
16:30:43 <kmalloc> backporting features is *bad* and should not be encouraged
16:30:57 <ayoung> so...if the vender needs that kind of idempotency, the flow should be:
16:31:10 <ayoung> 1. submit the backport to stable.  Does not need to be accepted, but should be submitted
16:31:21 <ayoung> as a req to 1 ever getting merged...
16:31:24 <kmalloc> lbragstad: alembic is easier, since you can just place a migration at any point (iirc) and can verify it was added/applied side-channel if it hasn't
16:31:30 <kmalloc> but... that is a limited understanding
16:31:33 <ayoung> 2.  Submit a change to the original migration to make it idempotent
16:32:12 <kmalloc> ayoung: i'd -2 upstream changes like that in general unless we're striving to make all migrations idempotent
16:32:13 <ayoung> we can be lenient on the idempotent patches.  They should have 0 impact
16:32:27 <ayoung> kmalloc, what?  Are you looking to make more work?
16:32:28 <kmalloc> because migration changes could change resulting schema
16:32:34 <ayoung> So...no
16:32:38 <kmalloc> so. no. not more work
16:32:56 <ayoung> the only thing we accept is a commit to an existing migration that confirms the change has already been made, and then skips
16:33:00 <kmalloc> just avoiding accidental schema changes that end up with divergent schemas across deployments
16:33:06 <ayoung> say we add a column in migration 101
16:33:19 <kmalloc> again, migrations are so touchy as is.
16:33:33 <ayoung> the only thing we would then accept is a change to migration 101 that checks for the column and, if it exists, no-ops
16:33:35 <kmalloc> changing them in the past could really really break things for some folks
16:33:39 <ayoung> vendor submits both patches
16:33:43 <kmalloc> even if it looks like all it does is "check"
16:33:57 <kmalloc> my opinion: vendor carries downstream.
16:33:58 <ayoung> there is 0 impact on the positive thread
16:34:11 <ayoung> kmalloc, this is IFF they submit to stable
16:34:25 <kmalloc> in the case of the bug brought up... it could have been 100% solved with a different index name
16:34:39 <kmalloc> i don't trust them to submit to stable and master and follow through
16:34:39 <ayoung> and that is also why it should be submitted to stable
16:34:44 <ayoung> to get that kind of code review
16:34:46 <kmalloc> it has to land in multiple places.
16:34:56 <kmalloc> this feels like a really bad idea to support
16:34:58 <hrybacki> I'm genuinely curious what the rest of the cores feel regarding vendors carrying downstream backports for migrations vs planning for that to happen in upstream
16:35:06 <ayoung> so...we may end up NEVER accepting these, but we should document the process.
16:35:07 * kmalloc quiets up and lets other speak
16:35:15 * ayoung too
16:35:22 * hrybacki leans on others' experience and knowledge
16:35:35 <ayoung> and everyone looks at cmurphy!
16:36:01 * kmalloc stares intently at lbragstad too.
16:36:21 * hrybacki rubs gagehugo's shoulders encouragingly
16:36:22 <lbragstad> umm
16:36:25 <cmurphy> we mostly don't carry patches for any major features
16:36:40 <cmurphy> we don't have any for migrations that i'm aware of
16:36:52 <gagehugo> I don't think we carry backported features across releases
16:37:07 <ayoung> we have, in the past, accepted changes to old migrations, when an issue has been identified that required it.
16:37:11 <lbragstad> i'm unfamiliar with how we're managing that relationship internally
16:37:36 <hrybacki> curious
16:37:37 <kmalloc> ayoung: the issues were not for "we backported this" it has always been "this migration fails in this way specifically"
16:37:53 <lbragstad> i will say that after looking into the osp issue, i see that the migration process is pretty tricky to get right
16:38:00 <kmalloc> and those were reproducible bugs... some around things like MySQL's DDL being finicky
16:38:09 <lbragstad> if you don't propose the original patch properly, you're kinda screwed
16:38:38 <kmalloc> my view is pretty strongly: we don't support backporting migrations outside of legit bugs for upstream fixes.
16:39:01 <kmalloc> if you're backporting a migration --- be aware it will be run again in upgrade.
16:39:34 <kmalloc> and take proper steps to ensure (e.g. use a different index name) you're not causing failures.
16:39:43 <ayoung> kmalloc, so this would be the process we would need iff we backported a fix to the stable branch
16:40:08 <hrybacki> ^^ that scenario has to arise, right?
16:40:12 <ayoung> I say we document it, and ensure that it is labeled "IN CASE OF FIRE BREAK GLASS" or sommat
16:40:22 <kmalloc> we already have a way to backport migrations
16:40:35 <kmalloc> make a migration in master (idempotent) and in a placeholder, backport placeholder
16:41:03 <ayoung> kmalloc, but that will break if the migration in master is not idempotent.  So Add that step in.
16:41:08 <kmalloc> no it wont
16:41:14 <hrybacki> so when do we decide to backport to stable or not? The bug raising this question goes way back -- at least to Liberty
16:41:20 <kmalloc> we never ever ever ever backport a landed migration.
16:41:37 <kmalloc> if we're doing a backport it's a BUG fix specific migration
16:41:47 <kmalloc> and that is written to be idempotent to begin with
16:42:04 <lbragstad> that's the step we missed with migration 22
16:42:07 <kmalloc> and we then land it in the placeholders (also idempotent) and then we backport just the placeholder
16:42:14 <hrybacki> right, that didn't happen this time -- my point
16:42:15 <kmalloc> we didn't backport it.
16:42:15 <ayoung> say we drop an index in master.  then we realize we need it in stable. As part of that fix, we realize that if we then go from stable to master (or later stable) we are going to break.  So, yeah, accepting the "check before drop" as a modification to the master migration is the only thing we can do.
16:42:57 <kmalloc> ayoung: lets table that, you are talking about a different scenario that is not applicable here
16:43:21 <ayoung> ++
16:43:47 <kmalloc> ayoung: i'm not arguing about having a mechanism to support smart migration backporting (and we have that in place -- we can better document it)
16:43:50 <kmalloc> :)
16:43:52 <hrybacki> right, so in summary: we only 'backport' migrations when it's related to a bug that affects versions older than master?
16:44:19 <kmalloc> the summary is: if we have a bug that requires fixing a migration there are 2 scenarios
16:44:33 <kmalloc> 1) Bug affects everyone and we create a new migration and placeholders for backporting
16:44:35 <ayoung> hrybacki, I think that, when in doubt, submit your patch to stable.  And we can make the discussion public.  Might not get accepted, but at  least you will know why.
16:45:15 <kmalloc> 2) the bug is in a specific migration and in some scenarios that migration is failing, but strictly in an upgrade (not a backported thing done out of tree). This is fixed in master and backported directly
16:45:46 <kmalloc> 2 is a very very rare case
16:45:52 <kmalloc> we have had maybe 1 of those
16:45:53 <kmalloc> or 2
16:46:14 <lbragstad> ~14 minutes remaining
16:46:15 <kmalloc> case #1 is not common, but an example is: "we need to make sure indexes are consistently named"
16:46:42 <kmalloc> we write a migration to fix it, and placeholder backports to fix it in older releases so we're not randomly breaking due to bad assumptions
16:46:51 <kmalloc> everything in case #1 is idempotent
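[editor's note: a hedged sketch of what "idempotent" means for case #1 above: the migration inspects the current schema before acting, so running it a second time (e.g. after a downstream backport lands, then the upgrade replays it) is a no-op. sqlite3 is used purely for illustration; keystone's real migrations go through sqlalchemy-migrate/alembic, and the table/index names are made up.]

```python
import sqlite3

def migrate_add_column(conn, table, column, coltype):
    # Check the live schema first; only add the column if it is missing.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {coltype}")

def migrate_add_index(conn, index, table, column):
    # IF NOT EXISTS sidesteps the duplicate-name failure discussed above
    # (which could also have been avoided with a distinct index name).
    conn.execute(f"CREATE INDEX IF NOT EXISTS {index} ON {table} ({column})")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id TEXT PRIMARY KEY)")

# Applying the migration twice must succeed and leave the same schema.
for _ in range(2):
    migrate_add_column(conn, "project", "domain_id", "TEXT")
    migrate_add_index(conn, "ix_project_domain_id", "project", "domain_id")
```

The second loop iteration is the backport-replay scenario: a naive `ALTER TABLE` or `CREATE INDEX` would raise on the duplicate, which is exactly the class of failure the thread is about.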
16:47:25 <ayoung> Anything else on the Agenda?
16:47:29 <hrybacki> ack. I think I have a good understanding. (lbragstad please push us forward if there is another item on the agenda)
16:47:48 <lbragstad> nope - that was our last topic
16:47:56 <lbragstad> i didn't mean to cut the discussion short
16:48:04 <hrybacki> ack
16:48:04 <lbragstad> just keeping an eye on time is all
16:48:06 <ayoung> OK...topic I want people to think about.  We'll discuss this at the summit but...
16:48:17 <lbragstad> #topic open discussion
16:48:41 <lbragstad> ayoung: the floor is yours
16:48:41 <ayoung> Deleting a project today leaves all objects in remote services orphaned.  What should we do about this?
16:49:26 <ayoung> And...I think that the implied task in that is "how would a user figure out everything that they have labeled under a project to delete it"
16:49:29 <ayoung> and also
16:49:44 <ayoung> "how would I lock a project so I can delete stuff without letting someone else still create stuff"
16:49:49 <ayoung> I don't think there are easy answers
16:50:08 <ayoung> and I don't think that we can just punt on it, either, say it is not a Keystone problem
16:50:18 <ayoung> it is not exclusively a Keystone problem, obviously
16:50:39 <ayoung> but we should be working cross-project to come up with a range of solutions
16:50:45 <lbragstad> i was under the impression that notifications were supposed to help with this
16:50:53 <kmalloc> iirc, i chatted with nova and nova had a mechanism that would already allow this
16:50:57 <kmalloc> direct API calls
16:51:10 <kmalloc> notifications could help, but that is possibly wonky
16:51:19 <kmalloc> since notifications could be missed.
16:51:27 <ayoung> Issue with notifications is that the services currently have no listeners.
16:51:52 <ayoung> So, one solution would be to build a "cleanup service" that got the project delete notification and had sufficient capabilities to delete...everything?
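[editor's note: a hypothetical sketch of the "cleanup service" idea: something consumes a project-deleted notification and walks the catalog asking each service to purge the project's resources. The function names and payload shape are assumptions; a real consumer would use oslo.messaging and authenticated per-service API calls.]

```python
def cleanup_project(project_id, catalog, purge):
    """Ask every service in the catalog to purge the project's resources.

    `purge(service, project_id)` stands in for a real delete call against
    that service's API.  Returns the services whose purge failed, since a
    missed notification or failed call is the weakness raised above:
    the notification path gives no delivery guarantee by itself.
    """
    failures = []
    for service in catalog:
        try:
            purge(service, project_id)
        except Exception:
            failures.append(service)
    return failures

# Toy usage: record which services were asked to purge.
calls = []
cleanup_project("abc123", ["nova", "cinder", "glance"],
                lambda svc, pid: calls.append((svc, pid)))
```

This is also where the manual kick-off lbragstad mentions fits: the same function could be invoked from a CLI against a soft-deleted project, rather than only from a notification listener.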
16:52:38 <wxy|> ayoung: ++ that's what we do in downstream.
16:52:40 <lbragstad> so - that would still be contingent upon the notification getting received by something
16:53:16 <ayoung> lbragstad, right,although you could also kick it off manually.
16:53:32 <lbragstad> i suppose
16:53:47 <lbragstad> i assume that service would need a user?
16:54:22 <ayoung> lbragstad, I think it would be a service level role to perform, or we would have to "fake out" a project token for the deleted project
16:54:28 <lbragstad> would deployments looking to achieve decentralized authorization from NIST's RBAC model be hesitant to adopt that kind of solution?
16:54:54 <lbragstad> since a lot of authorization would bubble up to that role?
16:55:01 <lbragstad> or service user?
16:55:17 <ayoung> It is a lot of power.
16:55:39 <lbragstad> yeah... but it's necessary if you go across the API boundaries, isn't it?
16:55:45 <ayoung> I'd rather have it done with a soft-delete of the project, followed by cleanup, followed by a hard delete
16:56:05 <kmalloc> ayoung: hm.
16:56:06 <lbragstad> wouldn't that require the same amount of authorization?
16:56:41 <ayoung> lbragstad, I think "delete only tokens" might make sense once we have finer grained roles.
16:56:44 <kmalloc> ayoung: i could probably spin up some Resource-Option fu for projects that would support "soft" delete
16:56:48 <kmalloc> in a couple hours
16:57:08 <kmalloc> (different semantically from "delete" or "disable")
16:57:18 <ayoung> it could be the original role assignment that is used to determine who can delete
16:57:26 <lbragstad> isn't disable == soft delete?
16:57:30 <kmalloc> lbragstad: sortof.
16:57:32 <kmalloc> not really.
16:57:47 <lbragstad> and we already implement that
16:57:49 <ayoung> we could say that disable is a pre-req, tho
16:57:55 <kmalloc> it is somewhere between disable and soft-delete
16:57:59 <ayoung> disable means you can get delete only tokens...
16:58:06 * ayoung waves hands
16:58:16 <kmalloc> because we can't change disable to what ayoung says
16:58:22 <kmalloc> since it changes api behavior
16:58:43 <kmalloc> also, disable is sortof meant to be reversible
16:58:54 <ayoung> It also leaves the question of "what do I delete where" up in the air
16:59:15 <ayoung> I mean, you could start by walking the service catalog, and say "ok, Nova...whatcha got?  Delete that"
16:59:35 <kmalloc> ayoung: i really hope we can simplify it long term to "nova - cleanup resources for project X"
16:59:37 <ayoung> one thing to avoid is the race condition
16:59:43 <kmalloc> not "what do you have, delete 1, delete 2, etc"
16:59:44 <kmalloc> ;)
16:59:52 <ayoung> "I keep trying to delete things, but kmalloc keeps making new  VMs on me"
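[editor's note: a rough sketch of the lifecycle distinction being debated: "disabled" stays reversible (existing API behavior), while a separate soft-deleted marker, e.g. via the resource-option mechanism kmalloc suggests, gates cleanup before the hard delete and also closes ayoung's race (no new resources while cleanup runs). All names and states here are hypothetical.]

```python
class Project:
    """Toy model of project states: enabled -> disabled is reversible,
    soft_deleted is a one-way gate that must precede hard delete."""

    def __init__(self, name):
        self.name = name
        self.state = "enabled"

    def disable(self):
        # Reversible, matching today's disable semantics.
        self.state = "disabled"

    def enable(self):
        if self.state == "soft_deleted":
            raise ValueError("soft-deleted projects cannot be re-enabled")
        self.state = "enabled"

    def soft_delete(self):
        # Distinct from disable: signals "cleanup in progress, no new
        # resources, no going back".
        self.state = "soft_deleted"

    def hard_delete_allowed(self):
        # Hard delete only after the project has been soft-deleted
        # (and, in a real flow, after remote cleanup has completed).
        return self.state == "soft_deleted"
```

The one-way `soft_delete` state is what disable cannot provide without changing API behavior, which is the objection raised above.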
17:00:09 <lbragstad> and with that we're out of time,
17:00:15 <ayoung> kmalloc, so...yeah, that is exactly what I think we should work towards
17:00:15 <hrybacki> Reminder that we are conducting our M1 retrospective. Call-in: https://bluejeans.com/8559013623 Trello Board: https://trello.com/b/PiJecAs4/keystone-rocky-m1-retrospective -- I have to break. Let's plan to join in 10? Please add cards in respective columns as you may
17:00:23 <lbragstad> we can continue this in office hours after the retro though
17:00:34 <lbragstad> thanks for coming
17:00:34 <hrybacki> o/
17:00:45 <lbragstad> #endmeeting