16:00:10 #startmeeting keystone
16:00:10 Meeting started Tue Apr 24 16:00:10 2018 UTC and is due to finish in 60 minutes. The chair is lbragstad. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 ping ayoung, breton, cmurphy, dstanek, gagehugo, henrynash, hrybacki, knikolla, lamt, lbragstad, lwanderley, kmalloc, rderose, rodrigods, samueldmq, spilla, aselius, dpar, jdennis, ruan_he, wxy, sonuk
16:00:14 The meeting name has been set to 'keystone'
16:00:19 o/
16:00:20 o/
16:00:22 #link https://etherpad.openstack.org/p/keystone-weekly-meeting
16:00:25 agenda ^
16:00:43 o/
16:00:43 o/
16:00:54 o/
16:01:15 o/
16:01:51 #topic specifications
16:02:04 just a heads up - we're now past the specification proposal deadline
16:02:27 we'll continue to press on with reviews of what's been proposed
16:03:01 next important deadline will be specification freeze, which is rocky-2, or June 8th
16:03:17 #topic consul
16:03:24 #link http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-04-18.log.html#t2018-04-18T13:21:23
16:03:32 ^ consul came up again in the -tc channel
16:03:54 which has been a topic that comes up every once in a while
16:04:20 'discussed but never implemented' -- just always a low prio?
16:04:39 i'm not even sure we made it that far
16:04:59 here is a list of every time it's been mentioned in keystone http://paste.openstack.org/raw/719483/ (i think)
16:05:08 list of logs*
16:05:16 hmm
16:05:37 nice tool lbragstad
16:05:56 does this seem like something that would be appropriate for the idea log? http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ideas/README.html
16:06:26 what is the use case?
16:06:41 so - consul is service discovery software
16:06:59 the use case would be that we would offload service catalog stuff to consul instead of reinventing the wheel
16:07:09 but kmalloc knows way more about it than i do
16:07:15 if we want something clever, like services self-registering.
16:07:18 consul is great
16:07:35 otherwise it's not much different than we have now
16:07:40 so rewrite the service catalog in order to avoid writing a new service catalog?
16:08:26 sorry didn't mean for that to come out so cynical
16:08:34 i was under the impression we would implement a driver or work directly with services to leverage consul
16:08:51 s/driver/catalog driver/
16:10:23 (i believe the second option would offload the responsibility of the catalog to a separate service, and users would talk to consul to figure out where to go to interact with services in a deployment)
16:13:09 lbragstad: correct
16:13:14 if we are doing that
16:13:30 but... i am against the speculative "maybe" bit of a driver
16:13:40 if consul isn't really a dep of openstack, I don't want to add it
16:13:51 ++
16:13:56 it's why i pushed for picking zookeeper... or consul... or even etcd
16:14:06 and we ended up with "well ALL OF THEM/any"
16:14:07 etcd is a base service of openstack
16:14:11 ^ that's a good point that someone brought up in the discussion a couple days ago
16:14:22 etcd is mostly a base service.
16:14:40 so, if we are picking one, i'd say hard set on etcd
16:14:45 not "add consul"
16:14:53 and work around etcd's limitations
16:14:57 if we can
16:15:46 ok - the fact that consul gets brought up every few/six months tells me it's worth investigating, but i don't expect anyone to sign up for this
16:15:59 that said, where should we document this?
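[For context on the self-registration idea discussed above: a minimal sketch, not from the meeting, of what an etcd-backed catalog lookup could look like, assuming the python-etcd3 client and a made-up key layout of /service_catalog/<service-type>/<interface>. A real implementation would live behind keystone's catalog driver interface rather than in a standalone script.]

    # Hypothetical sketch: services self-register endpoints in etcd and a
    # catalog-like dict is assembled by reading them back, instead of keystone
    # owning the catalog tables. The key layout is an assumption.
    import json

    import etcd3


    def register_endpoint(client, service_type, interface, url, region):
        """Self-register an endpoint under the assumed key layout."""
        key = '/service_catalog/%s/%s' % (service_type, interface)
        client.put(key, json.dumps({'url': url, 'region': region}))


    def build_catalog(client):
        """Assemble a catalog-like dict from every registered endpoint."""
        catalog = {}
        for value, metadata in client.get_prefix('/service_catalog/'):
            _, _, service_type, interface = metadata.key.decode().split('/')
            catalog.setdefault(service_type, {})[interface] = json.loads(value)
        return catalog


    if __name__ == '__main__':
        c = etcd3.client(host='127.0.0.1', port=2379)
        register_endpoint(c, 'compute', 'public',
                          'https://nova.example.org:8774/v2.1', 'RegionOne')
        print(build_catalog(c))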
16:16:08 *shrug*
16:16:08 it feels like a PoC of some kind is needed
16:16:30 i'm going to be very anti-consul for catalog unless we have some serious weight behind it... even with a PoC
16:16:39 i'd like to see some evidence or arguments for why this is better than what we have now and worth us putting in the effort, or else see someone with a specific use case step up to do the work
16:16:46 we don't have the bandwidth to take on projects just because
16:16:51 cmurphy: ++
16:16:53 yeah..
16:17:00 oh - sorry, i meant in more of a general sense and not just consul
16:17:19 So users want Keystone to support service self-registering. One of the drivers may be consul.
16:17:23 right?
16:17:31 wxy|: sortof...
16:17:45 self-registering and direct query by endpoints vs. in token
16:18:04 cmurphy: i completely agree about bandwidth
16:21:00 i don't want to propose more work for the sake of work, but during the tc discussion it was apparent that this particular thing has been brought up consistently since ~2015
16:22:21 i guess my end goal was to either open a wishlist bug describing the thing that gets brought up or somehow make it discoverable so that if someone did want to do a PoC they wouldn't have to dig through all the context again
16:24:03 I think adding this to the idea list and letting relevant ideas formulate there until a strong ask arises is a good idea
16:24:37 then we have some documented history and can make a more educated decision when priorities begin to shift?
16:24:59 if people don't have any objections to that, i can take a stab at proposing it to the idea log
16:25:04 i think that's fine, as long as the idea is "do some research on this" and not "implement this"
16:25:12 ++
16:25:14 agreed
16:25:19 ++
16:25:40 #action lbragstad to fill out an idea form for consul
16:25:58 #topic idempotent migrations
16:26:00 hrybacki: o/
16:26:05 o/
16:26:27 So stemming from the review posted by Ade
16:26:29 #link https://review.openstack.org/#/c/563266/
16:27:15 tl;dr -- let's discuss the merits of making migrations idempotent as a requirement/recommendation/requirement under situations x, y, but not z
16:27:31 this would help with that postgres deadlock situation
16:27:41 cmurphy: I'm not following that
16:27:50 let me pull up the bug
16:27:54 ty!
16:28:16 kmalloc: made the good point that making a migration idempotent adds a level of complexity
16:28:31 I'm against
16:28:35 a lot of complexity
16:28:39 https://bugs.launchpad.net/keystone/+bug/1755906
16:28:40 Launchpad bug 1755906 in OpenStack Identity (keystone) "Occasional deadlock during db_sync --contract during Newton to Pike live upgrade" [Medium,Incomplete]
16:28:40 cmurphy: https://bugs.launchpad.net/keystone/+bug/1755906 ?
16:28:47 oh - yep, beat me to it :)
16:28:48 It's too much work, and the migration mechanism is already designed to be idempotent
16:28:48 ayoung: this is more for *all* migrations (going forward)
16:29:10 and my guidance/recommendation is we don't
16:29:18 instead, we need a better process for doing backports
16:29:21 but i am willing to humor discussing and open to changing my opinion
16:29:37 I think the problem is something like this:
16:29:38 ayoung: this came up because a vendor backported
16:29:43 not upstream backported
16:29:47 kmalloc, I know. I'm the vendor
16:29:51 so am i :P
16:29:52 but...
16:29:54 that said
16:29:56 how complicated is doing a backport with alembic?
16:29:57 we're the vendor, as is hrybacki
16:30:03 and vendors will backport. So we should think about that :)
16:30:09 so... say a feature goes into R
16:30:15 and we decide we want it in Q
16:30:22 i am against feature backports. full stop
16:30:27 migration fills a hole in a prealloc
16:30:28 i'd argue for microversions if it stopped it
16:30:32 kmalloc heh
16:30:43 but... since the feature is not in Q yet, there is no way to make the migration idempotent
16:30:43 backporting features is *bad* and should not be encourages
16:30:50 encouraged*
16:30:57 so... if the vendor needs that kind of idempotency, the flow should be:
16:31:10 1. submit the backport to stable. Does not need to be accepted, but should be submitted
16:31:21 as a req to 1 ever getting merged...
16:31:24 lbragstad: alembic is easier, since you can just place a migration at any point (iirc) and can verify it was added/applied side-channel if it hasn't
16:31:30 but... that is a limited understanding
16:31:33 2. Submit a change to the original migration to make it idempotent
16:32:12 ayoung: i'd -2 upstream changes like that in general unless we're striving to make all migrations idempotent
16:32:13 we can be lenient on the idempotent patches. They should have 0 impact
16:32:27 kmalloc, what? Are you looking to make more work?
16:32:28 because migration changes could change the resulting schema
16:32:34 So... no
16:32:38 so. no. not more work
16:32:56 the only thing we accept is a commit to an existing migration that confirms the change has already been made, and then skips
16:33:00 just avoiding accidental schema changes that end up with divergent schemas across deployments
16:33:06 say we add a column in migration 101
16:33:19 again, migrations are so touchy as is.
16:33:33 the only thing we would then accept is a change to migration 101 that checks for the column and, if it exists, no-ops
16:33:35 changing them in the past could really really break things for some folks
16:33:39 vendor submits both patches
16:33:43 even if it looks like all it does is "check"
16:33:57 my opinion: vendor carries downstream.
16:33:58 there is 0 impact on the positive thread
16:34:11 kmalloc, this is IFF they submit to stable
16:34:25 in the case of the bug brought up... it could have been 100% solved with a different index name
16:34:39 i don't trust them to submit to stable and master and follow through
16:34:39 and that is also why it should be submitted to stable
16:34:44 to get that kind of code review
16:34:46 it has to land in multiple places.
16:34:56 this feels like a really bad idea to support
16:34:58 I'm genuinely curious what the rest of the cores feel regarding vendors carrying downstream backports for migrations vs planning for that to happen in upstream
16:35:06 so... we may end up NEVER accepting these, but we should document the process.
16:35:07 * kmalloc quiets up and lets others speak
16:35:15 * ayoung too
16:35:22 * hrybacki leans on others' experience and knowledge
16:35:35 and everyone looks at cmurphy!
16:36:01 * kmalloc stares intently at lbragstad too.
16:36:21 * hrybacki rubs gagehugo's shoulders encouragingly
16:36:22 umm
16:36:25 we mostly don't carry patches for any major features
16:36:40 we don't have any for migrations that i'm aware of
16:36:52 I don't think we carry backported features across releases
16:37:07 we have, in the past, accepted changes to old migrations, when an issue has been identified that required it.
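[To illustrate the "checks for the column and, if it exists, no-ops" pattern discussed above: a minimal sketch of an idempotent migration in the sqlalchemy-migrate style keystone used at the time. The table and column names are hypothetical, and this is not an actual keystone migration.]

    # Hypothetical sketch of an idempotent schema migration. Assumes the
    # sqlalchemy-migrate changeset extensions (which provide create_column)
    # are active, as they are when migrations run under db_sync.
    import sqlalchemy as sql


    def upgrade(migrate_engine):
        meta = sql.MetaData()
        meta.bind = migrate_engine
        inspector = sql.inspect(migrate_engine)

        # If the column already exists (e.g. a downstream backport added it
        # earlier), skip the change instead of failing the whole db_sync run.
        existing = [col['name'] for col in inspector.get_columns('project')]
        if 'example_flag' in existing:
            return

        project = sql.Table('project', meta, autoload=True)
        example_flag = sql.Column('example_flag', sql.Boolean, nullable=True)
        project.create_column(example_flag)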
16:37:11 i'm unfamiliar with how we're managing that relationship internally
16:37:36 curious
16:37:37 ayoung: the issues were not for "we backported this" it has always been "this migration fails in this way specifically"
16:37:53 i will say that after looking into the osp issue, i see that the migration process is pretty tricky to get right
16:38:00 and those were reproducible bugs... some around things like MySQL's DDL being finicky
16:38:09 if you don't propose the original patch properly, you're kinda screwed
16:38:38 my view is pretty strongly: we don't support backporting migrations outside of legit bugs for upstream fixes.
16:39:01 if you're backporting a migration --- be aware it will be run again in upgrade.
16:39:34 and take proper steps to ensure (e.g. use a different index name) you're not causing failures.
16:39:43 kmalloc, so this would be the process we would need iff we backported a fix to the stable branch
16:40:08 ^^ that scenario has to arise, right?
16:40:12 I say we document it, and ensure that it is labeled "IN CASE OF FIRE BREAK GLASS" or sommat
16:40:22 we already have a way to backport migrations
16:40:35 make a migration in master (idempotent) and in a placeholder, backport placeholder
16:41:03 kmalloc, but that will break if the migration in master is not idempotent. So add that step in.
16:41:08 no it won't
16:41:14 so when do we decide to backport to stable or not? The bug raising this question goes way back -- at least to Liberty
16:41:20 we never ever ever ever backport a landed migration.
16:41:37 if we're doing a backport it's a BUG fix specific migration
16:41:47 and that is written to be idempotent to begin with
16:42:04 that's the step we missed with migration 22
16:42:07 and we then land it in the placeholders (also idempotent) and then we backport just the placeholder
16:42:14 right, that didn't happen this time -- my point
16:42:15 we didn't backport it.
16:42:15 say we drop an index in master. then we realize we need it in stable. As part of that fix, we realize that if we then go from stable to master (or later stable) we are going to break. So, yeah, accepting the "check before drop" as a modification to the master migration is the only thing we can do.
16:42:57 ayoung: let's table that, you are talking about a different scenario that is not applicable here
16:43:21 ++
16:43:47 ayoung: i'm not arguing about having a mechanism to support smart migration backporting (and we have that in place -- we can better document it)
16:43:50 :)
16:43:52 right so in summary: We only 'backport' migrations when it's related to a bug that affects versions older than master?
16:44:19 the summary is: if we have a bug that requires fixing a migration there are 2 scenarios
16:44:33 1) Bug affects everyone and we create a new migration and placeholders for backporting
16:44:35 hrybacki, I think that, when in doubt, submit your patch to stable. And we can make the discussion public. Might not get accepted, but at least you will know why.
16:45:15 2) the bug is in a specific migration and in some scenarios that migration is failing, but strictly in an upgrade (not a backported thing done out of tree). This is fixed in master and backported directly
16:45:46 2 is a very very rare case
16:45:52 we have had maybe 1 of those
16:45:53 or 2
16:46:14 ~14 minutes remaining
16:46:15 case #1 is not common, but an example is: "we need to make sure indexes are consistently named"
16:46:42 we write a migration to fix it, and placeholder backports to fix it in older releases so we're not randomly breaking due to bad assumptions
16:46:51 everything in case #1 is idempotent
16:47:25 Anything else on the Agenda?
16:47:29 ack. I think I have a good understanding. (lbragstad please push us forward if there is another item on the agenda)
16:47:48 nope - that was our last topic
16:47:56 i didn't mean to cut the discussion short
16:48:04 ack
16:48:04 just keeping an eye on time is all
16:48:06 OK... topic I want people to think about. We'll discuss this at the summit but...
16:48:17 #topic open discussion
16:48:41 ayoung: the floor is yours
16:48:41 Deleting a project today leaves all objects in remote services orphaned. What should we do about this?
16:49:26 And... I think that the implied task in that is "how would a user figure out everything that they have labeled under a project to delete it"
16:49:29 and also
16:49:44 "how would I lock a project so I can delete stuff without letting someone else still create stuff"
16:49:49 I don't think there are easy answers
16:50:08 and I don't think that we can just punt on it, either, say it is not a Keystone problem
16:50:18 it is not exclusively a Keystone problem, obviously
16:50:39 but we should be working cross-project to come up with a range of solutions
16:50:45 i was under the impression that notifications were supposed to help with this
16:50:53 iirc, i chatted with nova and nova had a mechanism that would already allow this
16:50:57 direct API calls
16:51:10 notifications could help, but that is possibly wonky
16:51:19 since notifications could be missed.
16:51:27 Issue with notifications is that the services currently have no listeners.
16:51:52 So, one solution would be to build a "cleanup service" that got the project delete notification and had sufficient capabilities to delete... everything?
16:52:38 ayoung: ++ that's what we do downstream.
16:52:40 so - that would still be contingent upon the notification getting received by something
16:53:16 lbragstad, right, although you could also kick it off manually.
16:53:32 i suppose
16:53:47 i assume that service would need a user?
16:54:22 lbragstad, I think it would be a service level role to perform, or we would have to "fake out" a project token for the deleted project
16:54:28 would deployments looking to achieve decentralized authorization from NIST's RBAC model be hesitant to adopt that kind of solution?
16:54:54 since a lot of authorization would bubble up to that role?
16:55:01 or service user?
16:55:17 It is a lot of power.
16:55:39 yeah... but it's necessary if you go across the API boundaries, isn't it?
16:55:45 I'd rather have it done with a soft-delete of the project, followed by cleanup, followed by a hard delete
16:56:05 ayoung: hm.
16:56:06 wouldn't that require the same amount of authorization?
16:56:41 lbragstad, I think "delete only tokens" might make sense once we have finer grained roles.
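[A minimal sketch of the "cleanup service" idea floated above, assuming an oslo.messaging notification listener watching keystone's identity.project.deleted events. The 'resource_info' payload key and the cleanup function are assumptions/placeholders, not an existing service.]

    # Hypothetical sketch of a standalone cleanup listener, assuming keystone
    # is configured to emit notifications on the 'notifications' topic.
    import oslo_messaging
    from oslo_config import cfg


    class ProjectCleanupEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            if event_type != 'identity.project.deleted':
                return
            # 'resource_info' carrying the project id is an assumption about
            # the payload shape; verify against the notification format used.
            project_id = payload.get('resource_info')
            purge_project_resources(project_id)


    def purge_project_resources(project_id):
        # Placeholder: walk the service catalog and ask each service to
        # remove anything owned by project_id (servers, volumes, images, ...).
        print('would clean up resources for project %s' % project_id)


    if __name__ == '__main__':
        transport = oslo_messaging.get_notification_transport(cfg.CONF)
        targets = [oslo_messaging.Target(topic='notifications')]
        listener = oslo_messaging.get_notification_listener(
            transport, targets, [ProjectCleanupEndpoint()],
            executor='threading')
        listener.start()
        listener.wait()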
16:56:44 ayoung: i could probably spin up some Resource-Option fu for projects that would support "soft" delete
16:56:48 in a couple hours
16:57:08 (different semantically from "delete" or "disable")
16:57:18 it could be the original role assignment that is used to determine who can delete
16:57:26 isn't disable == soft delete?
16:57:30 lbragstad: sortof.
16:57:32 not really.
16:57:47 and we already implement that
16:57:49 we could say that disable is a pre-req, tho
16:57:55 it is somewhere between disable and soft-delete
16:57:59 disable means you can get delete only tokens...
16:58:06 * ayoung waves hands
16:58:16 because we can't change disable to what ayoung says
16:58:22 since it changes api behavior
16:58:43 also, disable is sortof meant to be reversible
16:58:54 It also leaves the question of "what do I delete where" up in the air
16:59:15 I mean, you could start by walking the service catalog, and say "ok, Nova... whatcha got? Delete that"
16:59:35 ayoung: i really hope we can simplify it long term to "nova - cleanup resources for project X"
16:59:37 one thing to avoid is the race condition
16:59:43 not "what do you have, delete 1, delete 2, etc"
16:59:44 ;)
16:59:52 "I keep trying to delete things, but kmalloc keeps making new VMs on me"
17:00:09 and with that we're out of time,
17:00:15 kmalloc, so... yeah, that is exactly what I think we should work towards
17:00:15 Reminder that we are conducting our M1 retrospective. Call-in: https://bluejeans.com/8559013623 Trello Board: https://trello.com/b/PiJecAs4/keystone-rocky-m1-retrospective -- I have to break. Let's plan to join in 10? Please add cards in respective columns as you may
17:00:23 we can continue this in office hours after the retro though
17:00:34 thanks for coming
17:00:34 o/
17:00:45 #endmeeting
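[To make the "disable, clean up per service, then hard delete" sequence from the open discussion concrete: a minimal sketch using openstacksdk, covering only nova as an example. The cloud name, project id, and the all_projects/project_id filter parameters are assumptions about the SDK version in use; a real tool would walk every service in the catalog.]

    # Hypothetical sketch of a manual project purge: disable first so nothing
    # new can be created, clean up owned resources, then hard delete.
    import openstack


    def purge_and_delete_project(conn, project_id):
        # Step 1: disable the project so no new resources appear mid-cleanup.
        conn.identity.update_project(project_id, is_enabled=False)

        # Step 2: clean up resources owned by the project (compute only here;
        # repeat for cinder volumes, neutron ports, glance images, etc.).
        for server in conn.compute.servers(all_projects=True,
                                           project_id=project_id):
            conn.compute.delete_server(server, ignore_missing=True)

        # Step 3: hard delete the project once cleanup has finished.
        conn.identity.delete_project(project_id, ignore_missing=True)


    if __name__ == '__main__':
        # Credentials come from clouds.yaml; this needs admin-level
        # authorization across services, which is the concern raised above.
        purge_and_delete_project(openstack.connect(cloud='example'),
                                 'PROJECT_ID')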