16:00:12 <ildikov> #startmeeting cinder-nova-api-changes
16:00:12 <openstack> Meeting started Thu Apr 27 16:00:12 2017 UTC and is due to finish in 60 minutes.  The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:16 <openstack> The meeting name has been set to 'cinder_nova_api_changes'
16:00:21 <ildikov> DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis  xyang1 raj_singh lyarwood breitz jungleboyj
16:00:31 <mriedem> o/
16:00:45 <jungleboyj> o/
16:00:48 <hemna> \o
16:01:28 <johnthetubaguy> o/
16:01:40 <ildikov> hi :)
16:01:44 <smcginnis> o/
16:02:46 <ildikov> ok, let's jump in the middle
16:02:54 <stvnoyes> o/
16:02:59 <ildikov> most of the ongoing items are here: https://review.openstack.org/#/q/topic:bp/cinder-new-attach-apis
16:03:29 <ildikov> stvnoyes updated shutdown_instance: https://review.openstack.org/#/c/456877/
16:03:42 <ildikov> and local_cleanup: https://review.openstack.org/#/c/456851/
16:03:46 <ildikov> with tests
16:04:13 <ildikov> those two are standalone ones, plz review and let us know if there's anything else that should be addressed there
16:04:50 <johnthetubaguy> so the etherpad is a touch out of date I guess
16:04:52 <mriedem> we should slow down because we have some open questions on some existing patches
16:04:54 <ildikov> I added comments about attachment_update in the chain mriedem picked up starting with terminate_volume_connections: https://review.openstack.org/#/c/456896/
16:04:59 <johnthetubaguy> #link https://etherpad.openstack.org/p/pike-nova-priorities-tracking
16:05:30 <mriedem> i.e. johnthetubaguy is wondering in https://review.openstack.org/#/c/456896/ if we should delete the attachment, or update the attachment to set connector=None
16:05:41 <mriedem> and i have those exact same questions in https://review.openstack.org/#/c/456971/ for swap volume
16:05:55 <johnthetubaguy> so I fear its not that simple
16:05:59 <ildikov> johnthetubaguy: would adding the topic there for sub-team review make sense?
16:06:13 <johnthetubaguy> ildikov: I think it is there
16:06:41 <ildikov> mriedem: so you cannot update the attachment with connector=None today
16:06:52 <ildikov> as attachment_update blows up in that case
16:07:13 <johnthetubaguy> so if we look at terminate connection, I think there are two things we could be doing
16:07:17 <ildikov> the idea what we also started to follow is to create a new attachment and delete the old one
16:07:30 <johnthetubaguy> (1) deleting, (2) moving
16:07:35 <jgriffith> ildikov +1
16:07:44 <jgriffith> We made the decision to stop moving
16:08:01 <ildikov> johnthetubaguy: terminate connection always terminates the connection regardless of calling detach or not
16:08:04 <jgriffith> Attachment-update at one point was actually attachment-finalize but that was changed
16:08:07 <johnthetubaguy> so moving I am talking about is a create and delete (in that order)
16:08:20 <johnthetubaguy> ildikov: the problem is we need to make sure the new attachment is created first
16:08:49 <ildikov> johnthetubaguy: if we have a new attachment for the new volume/host that should cover the not calling detach case I think
16:08:56 <johnthetubaguy> so delete attachment is replicating both terminate + detach
16:08:59 <johnthetubaguy> its not like we can just delete attachments where we used to terminate
16:09:22 <johnthetubaguy> ildikov: it can, its messy, but it can
16:09:33 <ildikov> jgriffith: I start to feel sorry for insisting to switch to 'update' from 'finalize'...:/
16:09:40 <jgriffith> johnthetubaguy we can certainly do the create and then delete if we need to
16:09:48 <mriedem> one of the questions i had in the swap volume patch was,
16:09:50 <jgriffith> ildikov haha.. nah
16:09:57 <mriedem> will migrate_volume_completion handle that we deleted the old attachment?
16:10:32 <mriedem> retype gets pretty damn nutty
16:11:03 <johnthetubaguy> where is that again?
16:11:45 <mriedem> johnthetubaguy: https://review.openstack.org/#/c/456971/5/nova/compute/manager.py@5030
16:11:57 <mriedem> i have TODOs in there asking what i should be doing wrt delete vs update
16:12:05 <mriedem> sounds like i can't do update with connector=None so that's my answer
16:12:13 <mriedem> btw, we really need to get these APIs documented
16:12:18 <ildikov> mriedem: so where you call update, we can call delete there as you created a new attachment already
16:12:26 <mriedem> jungleboyj: do you have minions that can document these APIs?
16:12:29 <mriedem> for 3.27
16:12:45 <ildikov> which reserves the volume just like when you didn't call detach with the old flow
16:13:09 <jungleboyj> mriedem: I have no minions right now.  :-(
16:13:14 <ildikov> mriedem: what would be the purpose of update in the swap case there?
16:13:36 <ildikov> mriedem: there's a patch up there to document the new attach/detach API
16:13:36 <jungleboyj> smcginnis: tommylikehu has been working on some of that, right?
16:14:00 <ildikov> jungleboyj: yes, that's what I'm referring to, just don't have the link handy
16:14:05 <smcginnis> jungleboyj: There have been various api-ref updates.
16:14:34 <johnthetubaguy> mriedem: ah, I see what you mean... I am confused in all that code
16:15:14 <ildikov> API docs patch: https://review.openstack.org/#/c/451252/
16:15:14 <mriedem> johnthetubaguy: i wrote it and i'm confused
16:15:20 <jgriffith> Mriedem jungleboyj I’ll look at updating the docs we have here https://github.com/openstack/cinder/blob/master/doc/source/devref/attach_detach_conventions_v2.rst
16:15:42 <tommylikehu> :)
16:15:42 <smcginnis> ildikov: Thanks, I could not find that patch.
16:15:46 <mriedem> ildikov: the purpose of the update_attachment call with connector=None was me trying to terminate the connection but keep the attachment around for migrate_volume_completion
16:15:56 <mriedem> ildikov: but that's why i had a todo, because i didn't know what i as the client should be doing
16:16:04 <johnthetubaguy> so I remember tracing back migrate_volume_completion now, and its kinda "bi-modal" depending on if cinder started the migration
16:16:24 <smcginnis> jgriffith: Can you look at that api-ref patch and make sure it's in sync with your view?
16:16:34 <tommylikehu> smcginnis: +1
16:16:35 <mriedem> jgriffith: to be clear i'm looking for it here https://developer.openstack.org/api-ref/block-storage/v3/
16:16:37 <ildikov> mriedem: what does migrate_volume_completion do exactly?
16:16:39 <jgriffith> mriedem ildikov tommylikehu so if you have an alternate idea I’m open, but my intent here was that we don’t re-use attachments any more.  They’re disposable things
16:16:52 <hemna> jgriffith, +1
16:16:53 <mriedem> jgriffith: i'm fine with not re-using it, i just didn't know
16:16:55 <mriedem> hence the todo to ask
16:17:01 <jgriffith> That way we eliminate the need for any funny reference counting or corner cases on keeping things around and updating them
16:17:07 <johnthetubaguy> ildikov: its a crazy cinder API that is a Nova call back during a volume migrate
16:17:10 <jungleboyj> jgriffith: Cool, thanks.
16:17:23 <jgriffith> mriedem yeah, I got ya, just explaining my view and opening it for discussion if needed
16:17:24 <mriedem> ildikov: yeah it's nova telling cinder that we completed our side of the retype operatoin
16:17:33 <mriedem> ildikov: and we say if it failed on the nova side or not
16:17:46 <mriedem> so that cinder can rollback
16:17:47 <mriedem> the new volume
16:17:48 <johnthetubaguy> ildikov: cinder API: volume migrate -> nova API: swap volume -> cinder API: migrate_volume_completion
16:17:50 <johnthetubaguy> I think
16:17:57 <mriedem> johnthetubaguy: correct
16:18:07 <mriedem> plus 30 sub-api calls in between :)
16:18:20 <johnthetubaguy> also... user creates volume and calls nova API: swap volume -> cinder API: migrate_volume_completion
16:18:27 * johnthetubaguy nods
16:18:33 <ildikov> jgriffith: I don't want to re-use them either, but we need to figure out how to get a new flow with some old things like this migrate_volume_completion stuff... :/
16:18:55 <ildikov> mriedem: johnthetubaguy: that's a little bit nuts... :)
16:18:58 <johnthetubaguy> I think in the first case, the volume_uuid is swapped, but its not swapped in the second case, or something like that
16:19:10 * ildikov wishes we could simplify that somehow...
16:19:31 <tommylikehu> btw, migrate_volume_completion is not ready for multi attach
16:20:10 <jgriffith> ildikov `git revert`
16:20:11 <johnthetubaguy> tommylikehu: +100
16:20:21 <ildikov> tommylikehu: I think currently it might not even ready for single attach... :(
16:20:33 <hemna> heh
16:21:06 <johnthetubaguy> yeah, I am not totally sure how that all works with the new API, now you mention it
16:21:13 <ildikov> jgriffith: we keep the new flow for sure, just need to clean up some more old stuff
16:21:18 <johnthetubaguy> mriedem: is there an easier place to start, like migrate?
16:21:54 <ildikov> jgriffith: and that's non-negotiable :)
16:22:02 <johnthetubaguy> I just feel like if we did an easier one, it would be clearer whats missing/odd
16:22:29 <ildikov> johnthetubaguy: the two standalone things are ready for review I mentioned at the beginning of the meeting
16:22:48 <ildikov> johnthetubaguy: the last patch in that chain is migrate and I couldn't really get there to give much thought to it
16:22:57 <ildikov> johnthetubaguy: so we can play with that
16:23:04 <johnthetubaguy> yeah, not seen those patches before now
16:23:05 <ildikov> changing the order of patches is simple too
16:23:45 <ildikov> johnthetubaguy: post_live_migrate: https://review.openstack.org/#/c/456988/
16:24:09 <mriedem> tommylikehu: we aren't supporting anything with multiattach in nova yet
16:24:27 <mriedem> johnthetubaguy: i generally don't combine "easier" and "migrate"
16:24:46 <johnthetubaguy> so doing post_live-migrate seems hard if you haven't done the first bit
16:24:58 <johnthetubaguy> mriedem: me neither, seemed strange saying that
16:25:05 <mriedem> feels dirty
16:25:12 <ildikov> mriedem: lol, my thinking too :)
16:25:53 <ildikov> johnthetubaguy: so far we tried to find where we call terminate and replace that
16:26:01 <ildikov> as we're starting with detach
16:26:14 <tommylikehu> mriedem: nova is going to support right?
16:26:20 <johnthetubaguy> yeah, I guess migrate does bring attach into it
16:26:29 <mriedem> tommylikehu: nova won't support multiattach until all of the operations are using the new attach flow
16:26:53 <tommylikehu> mriedem:  that make sense
16:27:20 <johnthetubaguy> well, and thats because we need the new flow to be able to support it safely, AFAICT
16:27:39 <johnthetubaguy> so this one is nice actually: https://review.openstack.org/#/c/456877
16:27:45 <ildikov> johnthetubaguy: that's correct
16:27:56 <johnthetubaguy> its really clear, attachment_delete === terminate + detach
16:29:08 <ildikov> yeah, these corner cases are tough like swap and migrate where we showed the volume attached even when it technically wasn't anymore...
16:30:29 <johnthetubaguy> right, its the reason we are doing the change, telling cinder what is really happening
16:30:35 <johnthetubaguy> so multi-attach is possible
16:32:19 <ildikov> ok, so now we need to look into the migrate_volume_completion in Cinder and figure out what it does with the volume to see how it can work with the new flow or what we need to do to have the new flow support swap
16:32:55 <ildikov> I would think we don't want update to have the connector=None option here as it does not seem to be useful for swap here anyhow
16:33:00 <mriedem> ildikov: ok so i'll review your comments and change the attachment_update connector=None stuff to delete
16:33:24 <ildikov> mriedem: that should work, I'll look into the Cinder side
16:33:53 <ildikov> is there anyone here from the Cinder team, who's familiar with that particular piece in question?
16:34:08 <ildikov> in case I would go nuts for some reason while looking into it :)
16:34:26 <smcginnis> Which? Sorry, too many streams going right now.
16:34:48 <ildikov> smcginnis: migrate_volume_completion
16:35:14 <hemna> ildikov, we should eliminate the connector=None wherever we can
16:35:24 <hemna> that simply doesn't work for most backends in cinder.
16:35:37 <ildikov> hemna: the Cinder side returns an HTTPBadRequest for that right now
16:35:55 <ildikov> I mean attachment_update does return that
16:36:40 <hemna> ok phew
16:36:48 <ildikov> :)
16:37:09 <mriedem> hemna: to be clear, nova creates an attachment initially with connector=None
16:37:11 <mriedem> by design
16:37:24 <mriedem> to reserve that volume
16:37:35 <mriedem> we provide the connector once we get it from brick on the compute
16:37:49 <hemna> as part of the new API right?  but that attach call never makes it to the cinder driver afaik
16:38:07 <smcginnis> hemna: Yep, just creates in DB.
16:38:13 <hemna> ok cool
16:38:15 <ildikov> mriedem: that does not go down to the drivers
16:38:27 <hemna> sorry I was thinking the old API
16:38:28 <ildikov> mriedem: that's basically a DB record at that point
16:38:40 <hemna> re: calling terminate_connection with a None connector
16:38:48 <hemna> ignore me
16:39:14 <ildikov> hemna: ah ok, no, we definitely didn't want to do that :)
16:39:23 <mriedem> i'll adjust my patches then
16:39:58 <ildikov> mriedem: I'll confirm the migrate_volume_completion part as soon as I figured it out
16:40:37 <ildikov> I also started to clean up the very old attach PoC so we have something to test with
16:40:53 <mriedem> ok
16:41:30 <johnthetubaguy> so, I have an idea that may help us here
16:42:01 <johnthetubaguy> thinking about this one: https://review.openstack.org/#/c/456877
16:42:15 <johnthetubaguy> it does terminate + detach
16:42:20 <johnthetubaguy> or delete attachment
16:42:27 <johnthetubaguy> lets make that a single method
16:42:49 <johnthetubaguy> its basically detach_from_host_and_volume
16:43:15 <johnthetubaguy> or detach(keep_attached=False)
16:43:47 <johnthetubaguy> so if you do dettach(keep_volume_attached_to_instance=False)
16:43:54 <johnthetubaguy> I think thats a better name
16:43:59 <smcginnis> detach(keep_attached) seems a little funny, but yeah
16:44:00 <johnthetubaguy> all be it too long
16:44:19 <johnthetubaguy> detach_from_host(also_detatch_volume=True)
16:44:26 <hemna> johnthetubaguy, where the call to os-brick.disconnect_volume in that flow?
16:44:27 <ildikov> well, we use the terminate + detach like that in 1.5 places
16:44:31 <hemna> (gerrit url)
16:44:38 <johnthetubaguy> hemna: its before it I think
16:44:51 * hemna is looking
16:45:01 <johnthetubaguy> https://review.openstack.org/#/c/456877/2/nova/compute/manager.py@2261
16:45:21 <ildikov> and the keep_attached piece is tricky as sometimes we have a new volume, in the other case we have a new host
16:45:35 <johnthetubaguy> hemna: dang, I wonder if its missing?
16:45:37 <ildikov> so I wouldn't go for consolidation at thi spoint
16:45:42 <hemna> johnthetubaguy, looks like it is
16:46:07 <johnthetubaguy> ildikov: maybe, but here is the next bit
16:46:14 <hemna> I don't mean to derail this, but we are having a discussing in the nova channel about the races between calling brick's disconnect_volume and calling terminate_connection
16:46:21 <johnthetubaguy> so detach_from_host(also_detatch_volume=True) deletes the attachment
16:46:37 <hemna> which is why I was looking for the disconnect_volume call in that patch
16:46:40 <johnthetubaguy> detach_from_host(also_detatch_volume=False) creates a new attachment, then deletes the old one
16:46:42 <mriedem> johnthetubaguy: you're trying to combine what shutdown_instance and https://review.openstack.org/#/c/456896/5/nova/compute/manager.py is doing right?
16:47:06 <johnthetubaguy> mriedem: yeah
16:47:20 <johnthetubaguy> well, I am thinking, what method could we replace terminate connection with
16:47:34 <johnthetubaguy> but rather, its what do we replace terminate connection with AND terminate + detach with
16:47:38 <mriedem> we aren't creating a new attachment in shutdown_instance though
16:47:48 <ildikov> hemna: we only changed terminate_connection, so wherever disconnect is I don't think we made anything worse
16:48:11 <hemna> hehe
16:48:52 <ildikov> johnthetubaguy: as a matter of fact I started with something like that with terminate_volume_connections having a param keep_attached
16:48:55 <johnthetubaguy> hemna: sounds like those races should get shook out of all this work, if we do it correctly
16:49:03 <hemna> I'm hoping so
16:49:04 <ildikov> johnthetubaguy: but going into the details it seemed like a bad idea at this point
16:49:17 <ildikov> johnthetubaguy: while we're trying to figure out the individual flows
16:49:25 <hemna> I don't want to have to deal with a callback from os-brick to nova
16:49:29 <hemna> nor calling cinder
16:49:44 <johnthetubaguy> ildikov: maybe, but this is much worse: https://review.openstack.org/#/c/456896/5
16:50:39 <mriedem> johnthetubaguy: i don't see how that is worse
16:50:51 <mriedem> i was -1 on the earlier patches in that series which were trying to do things that we didn't need to do yet
16:51:01 <mriedem> plus it was all one big change which was hard to review
16:51:25 <ildikov> mriedem: yeah, I was referring to the don't need to do those yet part here
16:51:25 <mriedem> until we actually have to write something that does an attach and then detach, can we save it for then?
16:51:45 <johnthetubaguy> mriedem: so isn't _terminate_volume_connections used in loads of places?
16:52:19 <mriedem> johnthetubaguy: it's used in 2 places
16:52:34 <ildikov> yep, that's what I remember too
16:52:35 <mriedem> resize and revert_resize
16:52:44 <johnthetubaguy> mriedem: OK, so thats probably where I am getting confused
16:52:51 <mriedem> we call terminate_connection separately in swap volume
16:52:54 <mriedem> and shutdown_instance
16:52:56 <mriedem> and probably live migration
16:52:59 <mriedem> and cold migration
16:53:03 <mriedem> and foobarbilly
16:53:13 <johnthetubaguy> yeah
16:53:45 <hemna> johnthetubaguy, https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2149
16:54:01 <hemna> this also looks like a place where terminate_connection is called w/o calling brick to disconnect
16:54:09 <hemna> :(
16:54:19 <mriedem> that's local delete folks
16:54:23 <mriedem> you can't call brick b/c the compute is dead,
16:54:24 <johnthetubaguy> hemna: thats because the host is dead
16:54:30 <mriedem> or the instance is shelved offloaded
16:54:31 <hemna> ok
16:54:34 <hemna> phew
16:54:41 <hemna> I'm just scouring through nova code
16:54:43 <mriedem> we also stash the connector since mitaka or something
16:54:50 <mriedem> and pass that in if we have it
16:54:58 <mriedem> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2157
16:55:15 <johnthetubaguy> so, we have really been around the houses today, 5 mins left, whats next?
16:55:35 <mriedem> i have to drop for another call
16:56:02 <johnthetubaguy> sounds like I need to look at the other patches in this series, and re-read some of this code again
16:56:14 <johnthetubaguy> after that, it should be a bit clearer to me I hope
16:56:29 <johnthetubaguy> sounds like something fun for a friday morning I guess
16:57:05 <ildikov> johnthetubaguy: if we could get local_cleanup and shutdown_instance done quickly that would be great
16:57:11 <ildikov> and then deal with the chain
16:57:28 <ildikov> also plz look into remove check_detach as it's just hanging there lonely at this point
16:57:47 <ildikov> I will add attach to the etherpads once it's in working state
16:58:18 <ildikov> I mean the attach PoC, not for merging but if anyone wants to pick it up for testing, etc.
16:58:55 <johnthetubaguy> I see we have a spec update up for check_detach
16:59:07 <ildikov> johnthetubaguy: and begin_detaching
16:59:14 <ildikov> I have a question there for the latter
16:59:23 <ildikov> we can chat about that on the review
16:59:25 <johnthetubaguy> yeah, I added follow on questions
16:59:32 <ildikov> ok, cool, tnx
16:59:39 <johnthetubaguy> we need to resolve begin_detaching before we do check_detach I think
16:59:59 <johnthetubaguy> at least, I think resolving that would stop my concerns
17:00:05 <ildikov> johnthetubaguy: it does the checks regardless
17:00:24 <johnthetubaguy> my problem is finding a place for begin_detaching in the new world
17:00:30 <johnthetubaguy> if its still there, the checks can die
17:00:37 <mriedem> johnthetubaguy: did you see my spec update for begin_detaching?
17:00:38 <ildikov> improvements in the volume state machine is independent IMHO
17:00:53 <johnthetubaguy> mriedem: yeah, thats the patch we are discussing, I added some questions
17:01:23 <johnthetubaguy> so if we keep using begin_detaching with the new flow (no multi-attach), I think we are all good
17:01:29 <ildikov> johnthetubaguy: as I see it comes before we get to the new world
17:01:58 <ildikov> right now we still do, no one touched that call
17:02:02 <johnthetubaguy> I think we probably have agreed that now, once thats agreed I am fine with the check_detach patch
17:02:13 <johnthetubaguy> ildikov: totally
17:02:40 <johnthetubaguy> I think we are talking past each other on this one, I am 99% sure I have the same aims as you, just a few extra worries
17:02:55 <ildikov> ok, we can finish agreeing on the spec
17:03:12 <johnthetubaguy> yeah, the problem is we loose the protection when we get multiple attachments
17:03:19 <ildikov> johnthetubaguy: sometimes you worry more than my mother and she's really good at it! :)
17:03:28 <johnthetubaguy> the protection against two detach calls being made to one volume, basically
17:03:48 <johnthetubaguy> ildikov: I can out worry quite a few mothers, its a bit of a worry
17:04:07 <ildikov> well, in Pike we will not get there to have multiple attachments anyhow
17:04:19 <johnthetubaguy> anyways, I guess we are done, timewise
17:04:23 <ildikov> johnthetubaguy: lol :)
17:04:40 <ildikov> yeah, let's take this discussion to the spec review
17:04:49 <ildikov> thank y'all for today!
17:04:49 <johnthetubaguy> yup, yup
17:05:23 <ildikov> C U on Gerrit and next week here!
17:05:28 <ildikov> #endmeeting