16:00:12 #startmeeting cinder-nova-api-changes 16:00:12 Meeting started Thu Apr 27 16:00:12 2017 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 16:00:16 The meeting name has been set to 'cinder_nova_api_changes' 16:00:21 DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis xyang1 raj_singh lyarwood breitz jungleboyj 16:00:31 o/ 16:00:45 o/ 16:00:48 \o 16:01:28 o/ 16:01:40 hi :) 16:01:44 o/ 16:02:46 ok, let's jump in the middle 16:02:54 o/ 16:02:59 most of the ongoing items are here: https://review.openstack.org/#/q/topic:bp/cinder-new-attach-apis 16:03:29 stvnoyes updated shutdown_instance: https://review.openstack.org/#/c/456877/ 16:03:42 and local_cleanup: https://review.openstack.org/#/c/456851/ 16:03:46 with tests 16:04:13 those two are standalone ones, plz review and let us know if there's anything else that should be addressed there 16:04:50 so the etherpad is a touch out of date I guess 16:04:52 we should slow down because we have some open questions on some existing patches 16:04:54 I added comments about attachment_update in the chain mriedem picked up starting with terminate_volume_connections: https://review.openstack.org/#/c/456896/ 16:04:59 #link https://etherpad.openstack.org/p/pike-nova-priorities-tracking 16:05:30 i.e. johnthetubaguy is wondering in https://review.openstack.org/#/c/456896/ if we should delete the attachment, or update the attachment to set connector=None 16:05:41 and i have those exact same questions in https://review.openstack.org/#/c/456971/ for swap volume 16:05:55 so I fear its not that simple 16:05:59 johnthetubaguy: would adding the topic there for sub-team review make sense? 16:06:13 ildikov: I think it is there 16:06:41 mriedem: so you cannot update the attachment with connector=None today 16:06:52 as attachment_update blows up in that case 16:07:13 so if we look at terminate connection, I think there are two things we could be doing 16:07:17 the idea what we also started to follow is to create a new attachment and delete the old one 16:07:30 (1) deleting, (2) moving 16:07:35 ildikov +1 16:07:44 We made the decision to stop moving 16:08:01 johnthetubaguy: terminate connection always terminates the connection regardless of calling detach or not 16:08:04 Attachment-update at one point was actually attachment-finalize but that was changed 16:08:07 so moving I am talking about is a create and delete (in that order) 16:08:20 ildikov: the problem is we need to make sure the new attachment is created first 16:08:49 johnthetubaguy: if we have a new attachment for the new volume/host that should cover the not calling detach case I think 16:08:56 so delete attachment is replicating both terminate + detach 16:08:59 its not like we can just delete attachments where we used to terminate 16:09:22 ildikov: it can, its messy, but it can 16:09:33 jgriffith: I start to feel sorry for insisting to switch to 'update' from 'finalize'...:/ 16:09:40 johnthetubaguy we can certainly do the create and then delete if we need to 16:09:48 one of the questions i had in the swap volume patch was, 16:09:50 ildikov haha.. nah 16:09:57 will migrate_volume_completion handle that we deleted the old attachment? 16:10:32 retype gets pretty damn nutty 16:11:03 where is that again? 16:11:45 johnthetubaguy: https://review.openstack.org/#/c/456971/5/nova/compute/manager.py@5030 16:11:57 i have TODOs in there asking what i should be doing wrt delete vs update 16:12:05 sounds like i can't do update with connector=None so that's my answer 16:12:13 btw, we really need to get these APIs documented 16:12:18 mriedem: so where you call update, we can call delete there as you created a new attachment already 16:12:26 jungleboyj: do you have minions that can document these APIs? 16:12:29 for 3.27 16:12:45 which reserves the volume just like when you didn't call detach with the old flow 16:13:09 mriedem: I have no minions right now. :-( 16:13:14 mriedem: what would be the purpose of update in the swap case there? 16:13:36 mriedem: there's a patch up there to document the new attach/detach API 16:13:36 smcginnis: tommylikehu has been working on some of that, right? 16:14:00 jungleboyj: yes, that's what I'm referring to, just don't have the link handy 16:14:05 jungleboyj: There have been various api-ref updates. 16:14:34 mriedem: ah, I see what you mean... I am confused in all that code 16:15:14 API docs patch: https://review.openstack.org/#/c/451252/ 16:15:14 johnthetubaguy: i wrote it and i'm confused 16:15:20 Mriedem jungleboyj I’ll look at updating the docs we have here https://github.com/openstack/cinder/blob/master/doc/source/devref/attach_detach_conventions_v2.rst 16:15:42 :) 16:15:42 ildikov: Thanks, I could not find that patch. 16:15:46 ildikov: the purpose of the update_attachment call with connector=None was me trying to terminate the connection but keep the attachment around for migrate_volume_completion 16:15:56 ildikov: but that's why i had a todo, because i didn't know what i as the client should be doing 16:16:04 so I remember tracing back migrate_volume_completion now, and its kinda "bi-modal" depending on if cinder started the migration 16:16:24 jgriffith: Can you look at that api-ref patch and make sure it's in sync with your view? 16:16:34 smcginnis: +1 16:16:35 jgriffith: to be clear i'm looking for it here https://developer.openstack.org/api-ref/block-storage/v3/ 16:16:37 mriedem: what does migrate_volume_completion do exactly? 16:16:39 mriedem ildikov tommylikehu so if you have an alternate idea I’m open, but my intent here was that we don’t re-use attachments any more. They’re disposable things 16:16:52 jgriffith, +1 16:16:53 jgriffith: i'm fine with not re-using it, i just didn't know 16:16:55 hence the todo to ask 16:17:01 That way we eliminate the need for any funny reference counting or corner cases on keeping things around and updating them 16:17:07 ildikov: its a crazy cinder API that is a Nova call back during a volume migrate 16:17:10 jgriffith: Cool, thanks. 16:17:23 mriedem yeah, I got ya, just explaining my view and opening it for discussion if needed 16:17:24 ildikov: yeah it's nova telling cinder that we completed our side of the retype operatoin 16:17:33 ildikov: and we say if it failed on the nova side or not 16:17:46 so that cinder can rollback 16:17:47 the new volume 16:17:48 ildikov: cinder API: volume migrate -> nova API: swap volume -> cinder API: migrate_volume_completion 16:17:50 I think 16:17:57 johnthetubaguy: correct 16:18:07 plus 30 sub-api calls in between :) 16:18:20 also... user creates volume and calls nova API: swap volume -> cinder API: migrate_volume_completion 16:18:27 * johnthetubaguy nods 16:18:33 jgriffith: I don't want to re-use them either, but we need to figure out how to get a new flow with some old things like this migrate_volume_completion stuff... :/ 16:18:55 mriedem: johnthetubaguy: that's a little bit nuts... :) 16:18:58 I think in the first case, the volume_uuid is swapped, but its not swapped in the second case, or something like that 16:19:10 * ildikov wishes we could simplify that somehow... 16:19:31 btw, migrate_volume_completion is not ready for multi attach 16:20:10 ildikov `git revert` 16:20:11 tommylikehu: +100 16:20:21 tommylikehu: I think currently it might not even ready for single attach... :( 16:20:33 heh 16:21:06 yeah, I am not totally sure how that all works with the new API, now you mention it 16:21:13 jgriffith: we keep the new flow for sure, just need to clean up some more old stuff 16:21:18 mriedem: is there an easier place to start, like migrate? 16:21:54 jgriffith: and that's non-negotiable :) 16:22:02 I just feel like if we did an easier one, it would be clearer whats missing/odd 16:22:29 johnthetubaguy: the two standalone things are ready for review I mentioned at the beginning of the meeting 16:22:48 johnthetubaguy: the last patch in that chain is migrate and I couldn't really get there to give much thought to it 16:22:57 johnthetubaguy: so we can play with that 16:23:04 yeah, not seen those patches before now 16:23:05 changing the order of patches is simple too 16:23:45 johnthetubaguy: post_live_migrate: https://review.openstack.org/#/c/456988/ 16:24:09 tommylikehu: we aren't supporting anything with multiattach in nova yet 16:24:27 johnthetubaguy: i generally don't combine "easier" and "migrate" 16:24:46 so doing post_live-migrate seems hard if you haven't done the first bit 16:24:58 mriedem: me neither, seemed strange saying that 16:25:05 feels dirty 16:25:12 mriedem: lol, my thinking too :) 16:25:53 johnthetubaguy: so far we tried to find where we call terminate and replace that 16:26:01 as we're starting with detach 16:26:14 mriedem: nova is going to support right? 16:26:20 yeah, I guess migrate does bring attach into it 16:26:29 tommylikehu: nova won't support multiattach until all of the operations are using the new attach flow 16:26:53 mriedem: that make sense 16:27:20 well, and thats because we need the new flow to be able to support it safely, AFAICT 16:27:39 so this one is nice actually: https://review.openstack.org/#/c/456877 16:27:45 johnthetubaguy: that's correct 16:27:56 its really clear, attachment_delete === terminate + detach 16:29:08 yeah, these corner cases are tough like swap and migrate where we showed the volume attached even when it technically wasn't anymore... 16:30:29 right, its the reason we are doing the change, telling cinder what is really happening 16:30:35 so multi-attach is possible 16:32:19 ok, so now we need to look into the migrate_volume_completion in Cinder and figure out what it does with the volume to see how it can work with the new flow or what we need to do to have the new flow support swap 16:32:55 I would think we don't want update to have the connector=None option here as it does not seem to be useful for swap here anyhow 16:33:00 ildikov: ok so i'll review your comments and change the attachment_update connector=None stuff to delete 16:33:24 mriedem: that should work, I'll look into the Cinder side 16:33:53 is there anyone here from the Cinder team, who's familiar with that particular piece in question? 16:34:08 in case I would go nuts for some reason while looking into it :) 16:34:26 Which? Sorry, too many streams going right now. 16:34:48 smcginnis: migrate_volume_completion 16:35:14 ildikov, we should eliminate the connector=None wherever we can 16:35:24 that simply doesn't work for most backends in cinder. 16:35:37 hemna: the Cinder side returns an HTTPBadRequest for that right now 16:35:55 I mean attachment_update does return that 16:36:40 ok phew 16:36:48 :) 16:37:09 hemna: to be clear, nova creates an attachment initially with connector=None 16:37:11 by design 16:37:24 to reserve that volume 16:37:35 we provide the connector once we get it from brick on the compute 16:37:49 as part of the new API right? but that attach call never makes it to the cinder driver afaik 16:38:07 hemna: Yep, just creates in DB. 16:38:13 ok cool 16:38:15 mriedem: that does not go down to the drivers 16:38:27 sorry I was thinking the old API 16:38:28 mriedem: that's basically a DB record at that point 16:38:40 re: calling terminate_connection with a None connector 16:38:48 ignore me 16:39:14 hemna: ah ok, no, we definitely didn't want to do that :) 16:39:23 i'll adjust my patches then 16:39:58 mriedem: I'll confirm the migrate_volume_completion part as soon as I figured it out 16:40:37 I also started to clean up the very old attach PoC so we have something to test with 16:40:53 ok 16:41:30 so, I have an idea that may help us here 16:42:01 thinking about this one: https://review.openstack.org/#/c/456877 16:42:15 it does terminate + detach 16:42:20 or delete attachment 16:42:27 lets make that a single method 16:42:49 its basically detach_from_host_and_volume 16:43:15 or detach(keep_attached=False) 16:43:47 so if you do dettach(keep_volume_attached_to_instance=False) 16:43:54 I think thats a better name 16:43:59 detach(keep_attached) seems a little funny, but yeah 16:44:00 all be it too long 16:44:19 detach_from_host(also_detatch_volume=True) 16:44:26 johnthetubaguy, where the call to os-brick.disconnect_volume in that flow? 16:44:27 well, we use the terminate + detach like that in 1.5 places 16:44:31 (gerrit url) 16:44:38 hemna: its before it I think 16:44:51 * hemna is looking 16:45:01 https://review.openstack.org/#/c/456877/2/nova/compute/manager.py@2261 16:45:21 and the keep_attached piece is tricky as sometimes we have a new volume, in the other case we have a new host 16:45:35 hemna: dang, I wonder if its missing? 16:45:37 so I wouldn't go for consolidation at thi spoint 16:45:42 johnthetubaguy, looks like it is 16:46:07 ildikov: maybe, but here is the next bit 16:46:14 I don't mean to derail this, but we are having a discussing in the nova channel about the races between calling brick's disconnect_volume and calling terminate_connection 16:46:21 so detach_from_host(also_detatch_volume=True) deletes the attachment 16:46:37 which is why I was looking for the disconnect_volume call in that patch 16:46:40 detach_from_host(also_detatch_volume=False) creates a new attachment, then deletes the old one 16:46:42 johnthetubaguy: you're trying to combine what shutdown_instance and https://review.openstack.org/#/c/456896/5/nova/compute/manager.py is doing right? 16:47:06 mriedem: yeah 16:47:20 well, I am thinking, what method could we replace terminate connection with 16:47:34 but rather, its what do we replace terminate connection with AND terminate + detach with 16:47:38 we aren't creating a new attachment in shutdown_instance though 16:47:48 hemna: we only changed terminate_connection, so wherever disconnect is I don't think we made anything worse 16:48:11 hehe 16:48:52 johnthetubaguy: as a matter of fact I started with something like that with terminate_volume_connections having a param keep_attached 16:48:55 hemna: sounds like those races should get shook out of all this work, if we do it correctly 16:49:03 I'm hoping so 16:49:04 johnthetubaguy: but going into the details it seemed like a bad idea at this point 16:49:17 johnthetubaguy: while we're trying to figure out the individual flows 16:49:25 I don't want to have to deal with a callback from os-brick to nova 16:49:29 nor calling cinder 16:49:44 ildikov: maybe, but this is much worse: https://review.openstack.org/#/c/456896/5 16:50:39 johnthetubaguy: i don't see how that is worse 16:50:51 i was -1 on the earlier patches in that series which were trying to do things that we didn't need to do yet 16:51:01 plus it was all one big change which was hard to review 16:51:25 mriedem: yeah, I was referring to the don't need to do those yet part here 16:51:25 until we actually have to write something that does an attach and then detach, can we save it for then? 16:51:45 mriedem: so isn't _terminate_volume_connections used in loads of places? 16:52:19 johnthetubaguy: it's used in 2 places 16:52:34 yep, that's what I remember too 16:52:35 resize and revert_resize 16:52:44 mriedem: OK, so thats probably where I am getting confused 16:52:51 we call terminate_connection separately in swap volume 16:52:54 and shutdown_instance 16:52:56 and probably live migration 16:52:59 and cold migration 16:53:03 and foobarbilly 16:53:13 yeah 16:53:45 johnthetubaguy, https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2149 16:54:01 this also looks like a place where terminate_connection is called w/o calling brick to disconnect 16:54:09 :( 16:54:19 that's local delete folks 16:54:23 you can't call brick b/c the compute is dead, 16:54:24 hemna: thats because the host is dead 16:54:30 or the instance is shelved offloaded 16:54:31 ok 16:54:34 phew 16:54:41 I'm just scouring through nova code 16:54:43 we also stash the connector since mitaka or something 16:54:50 and pass that in if we have it 16:54:58 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2157 16:55:15 so, we have really been around the houses today, 5 mins left, whats next? 16:55:35 i have to drop for another call 16:56:02 sounds like I need to look at the other patches in this series, and re-read some of this code again 16:56:14 after that, it should be a bit clearer to me I hope 16:56:29 sounds like something fun for a friday morning I guess 16:57:05 johnthetubaguy: if we could get local_cleanup and shutdown_instance done quickly that would be great 16:57:11 and then deal with the chain 16:57:28 also plz look into remove check_detach as it's just hanging there lonely at this point 16:57:47 I will add attach to the etherpads once it's in working state 16:58:18 I mean the attach PoC, not for merging but if anyone wants to pick it up for testing, etc. 16:58:55 I see we have a spec update up for check_detach 16:59:07 johnthetubaguy: and begin_detaching 16:59:14 I have a question there for the latter 16:59:23 we can chat about that on the review 16:59:25 yeah, I added follow on questions 16:59:32 ok, cool, tnx 16:59:39 we need to resolve begin_detaching before we do check_detach I think 16:59:59 at least, I think resolving that would stop my concerns 17:00:05 johnthetubaguy: it does the checks regardless 17:00:24 my problem is finding a place for begin_detaching in the new world 17:00:30 if its still there, the checks can die 17:00:37 johnthetubaguy: did you see my spec update for begin_detaching? 17:00:38 improvements in the volume state machine is independent IMHO 17:00:53 mriedem: yeah, thats the patch we are discussing, I added some questions 17:01:23 so if we keep using begin_detaching with the new flow (no multi-attach), I think we are all good 17:01:29 johnthetubaguy: as I see it comes before we get to the new world 17:01:58 right now we still do, no one touched that call 17:02:02 I think we probably have agreed that now, once thats agreed I am fine with the check_detach patch 17:02:13 ildikov: totally 17:02:40 I think we are talking past each other on this one, I am 99% sure I have the same aims as you, just a few extra worries 17:02:55 ok, we can finish agreeing on the spec 17:03:12 yeah, the problem is we loose the protection when we get multiple attachments 17:03:19 johnthetubaguy: sometimes you worry more than my mother and she's really good at it! :) 17:03:28 the protection against two detach calls being made to one volume, basically 17:03:48 ildikov: I can out worry quite a few mothers, its a bit of a worry 17:04:07 well, in Pike we will not get there to have multiple attachments anyhow 17:04:19 anyways, I guess we are done, timewise 17:04:23 johnthetubaguy: lol :) 17:04:40 yeah, let's take this discussion to the spec review 17:04:49 thank y'all for today! 17:04:49 yup, yup 17:05:23 C U on Gerrit and next week here! 17:05:28 #endmeeting