16:00:24 <ildikov> #startmeeting cinder-nova-api-changes
16:00:24 <openstack> Meeting started Thu Sep 21 16:00:24 2017 UTC and is due to finish in 60 minutes.  The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:28 <openstack> The meeting name has been set to 'cinder_nova_api_changes'
16:00:38 <ildikov> johnthetubaguy jaypipes e0ne jgriffith hemna mriedem patrickeast smcginnis diablo_rojo xyang1 raj_singh lyarwood jungleboyj stvnoyes
16:00:52 <jungleboyj> @!
16:00:52 <_pewp_> jungleboyj \(-_- )
16:00:58 <stvnoyes> o/
16:01:33 <ildikov> let's wait a tiny bit, I guess people are slower joining after having the PTG last week :)
16:02:50 <mriedem> o/
16:03:02 <smcginnis> o/
16:03:05 <jgriffith> o/
16:03:07 <ildikov> ok, let's start
16:03:08 <jgriffith> pheww
16:03:17 <ildikov> jgriffith: just in time :)
16:03:39 <ildikov> so we agreed on a few things last week and are about to move forward
16:03:41 <jungleboyj> jgriffith:  Slides into home.
16:03:58 <ildikov> I saw recent comments on the live_migration patch and test
16:04:25 <ildikov> mriedem: there's only one service version check in the new attach patch that I'm aware of
16:04:28 <stvnoyes> looks like i have a grenade issue there.
16:04:39 <ildikov> mriedem: which is not supposed to make the Grenade test fail
16:05:01 <mriedem> it could be a problem in the test, i'm not sure
16:05:18 <stvnoyes> i'll look into it to see why it's failing
16:05:28 <mriedem> but yes, we shouldn't be using any new style attachments if any computes are not running the latest code
16:05:33 <mriedem> so i'm not sure why grenade would have issues
16:05:44 <stvnoyes> i am guessing it's a test issue
16:05:55 <mriedem> maybe, i haven't grok'ed your tempest change
16:07:31 <stvnoyes> while we're on the topic of tempest tests...
16:07:33 <ildikov> the attach patch should be up to date with the compute service version
16:07:45 <ildikov> but let me know if something went wrong there
16:08:27 <stvnoyes> I've been working on debugging the migrate iscsi test.  In my local env I am seeing very occasional detach timeouts - DeviceDetachFailed: Device detach failed for vdb: Unable to detach from guest transient domain.
16:08:54 <stvnoyes> happened twice yesterday, but after a bunch of debug, hasn't happened yet today
16:09:18 <stvnoyes> i've been updating this bug - https://bugs.launchpad.net/nova/+bug/1696125
16:09:18 <openstack> Launchpad bug 1696125 in OpenStack Compute (nova) "Detach interface failed - timeout waiting to detach tap device in linuxbridge job (pike)" [High,In progress] - Assigned to Matt Riedemann (mriedem)
16:09:21 <stvnoyes> fyi
16:10:02 <stvnoyes> seems to go down into libvirt and libvirt never responds with detach complete
16:10:16 <jgriffith> stvnoyes I ran into that at one point a while back, but with the old code.  Problem "went away" and I never got it back
16:10:27 <jgriffith> stvnoyes and never figured out what it was doing either :(
16:10:56 <mriedem> stvnoyes: are you using ovs or linuxbridge?
16:11:03 <stvnoyes> ovs
16:11:19 <mriedem> are you running latest nova?
16:11:26 <mriedem> there was a recent change that went into that code not too long ago
16:11:35 <stvnoyes> sorta, as of Aug 31, after the last change listed in that bug
16:12:04 <mriedem> https://review.openstack.org/#/q/I8cd056fa17184a98c31547add0e9fb2d363d0908
16:12:14 <mriedem> ok you should have that then
16:13:00 <mriedem> melwitt, mdbooth, lyarwood and kashyap are the people to ask about that weird persistent vs transient domain stuff
16:13:38 <stvnoyes> ok thx. the persistent detach seems to work ok, it dies on the transient detach
16:14:14 <stvnoyes> i am hoping to get a failure with my latest debug code, should learn more when that happens
16:14:32 <stvnoyes> unless of course it changes things enough to prevent the failure
16:16:13 <ildikov> stvnoyes: thanks for looking into it
16:16:26 <ildikov> I didn't run into this one besides the Gerrit reviews
16:17:38 <ildikov> is there anything else to add to this topic?
16:17:49 <stvnoyes> all set here
16:18:01 <ildikov> stvnoyes: cool, thanks
16:18:29 <ildikov> so we're working on a couple of things to get the multi-attach prerequisites covered as well before merging the new attach patch
16:18:52 <ildikov> the reason is to wait and have the latest microversion out so we don't need to deal with bumps later if possible
16:19:24 <ildikov> we said we will have a uses_shared_connections flag
16:19:36 * johnthetubaguy hides in the back row
16:19:48 <ildikov> looking more into it the recent plans are to have that in the volume info
16:19:58 <ildikov> as opposed to the attachment or the connection_info
16:20:13 <ildikov> mriedem: johnthetubaguy: any objections to that? ^^
16:20:23 <ildikov> johnthetubaguy: glad you could join :)
16:21:05 <jgriffith> johnthetubaguy mriedem for a little more detail, the idea being that you'd want *something* to use as a lock string even during attachment-reserve/create
16:21:12 <johnthetubaguy> as long as when we do attachment_update and attachment_delete we have the info, I think it is fine
16:21:21 <jgriffith> So the proposal is that we have a bool added in the volume that says "uses_shared_targets"
16:21:22 <johnthetubaguy> jgriffith: why on reserve?
16:21:39 <jgriffith> patrickeast raised a concern stating it needed to be there
16:21:52 <johnthetubaguy> do we know why?
16:22:00 <jgriffith> I'm not 100% convinced, but didn't feel like rat holing
16:22:03 <ildikov> and as it is a capability it doesn't change over attachments, so it makes more sense as part of the volume info
16:22:19 <johnthetubaguy> so a lock on reserve isn't really implementable in Nova right now
16:22:23 <ildikov> jgriffith: didn't he say attachment_update?
16:22:27 <jgriffith> ildikov +1 it made sense to associate it with the volume regardless
16:22:50 <jgriffith> ildikov well, yes; the problem being that he'd need the info DURING the attachment_update
16:22:54 <johnthetubaguy> yeah, in volume_info seems to make sense, from a data model POV
16:22:59 <jgriffith> which means ideally he'd have it on the attachment_reserve
16:23:25 <johnthetubaguy> I am guessing non-Nova users will appreciate it before attachment_create/reserve etc
16:23:30 <ildikov> that would mean having it in the return value from reserve at the latest, and not using it during reserve itself
16:23:30 <jgriffith> So my proposal is to just put that flag in the volume, as well as the volume_backend_name which could then be used for the lock
16:23:38 <ildikov> jgriffith: +1
16:24:03 <johnthetubaguy> yeah, we want it for attachment_update and _delete, I think that is the main bit
16:24:24 <jgriffith> johnthetubaguy yeah, and this way you can have it for *all the things* if something comes up
16:24:35 <ildikov> we retrieve the volume info before those calls I think in all cases, so it should be ok
16:24:37 <jgriffith> logically it makes sense, and the testing so far looks bueno
16:24:44 <ildikov> we could still save it in the BDM if we wanted to
16:24:52 <jgriffith> hope to have it posted later this morning
16:24:55 <mriedem> what happens if the volume is retyped?
16:25:01 <mriedem> and the backend name changes
16:25:03 <jgriffith> if I can figure out which Instance I had all the code on :)
16:25:07 <johnthetubaguy> jgriffith: yeah, all sounds fine, my worry was there was something happening in reserve/create we don't understand
16:25:15 <jgriffith> mriedem then that info gets updated if it's a backend migration
16:25:43 <jgriffith> johnthetubaguy no, I don't think there is, it's more about when/how to get the info on the caller side
16:26:02 <jungleboyj> jgriffith:  That plan makes sense to me.
16:26:04 <jgriffith> mriedem did that address our question?
16:26:06 <mriedem> i don't really want to think about retyping a multi-attach volume
16:26:14 <jgriffith> mriedem me neither :(
16:26:15 <ildikov> the info makes even more sense on the volume level considering back end migration IMHO
16:26:35 <jgriffith> There's a reason we used to not allow retype of an in-use volume :)
16:26:48 <ildikov> yeah, let's not ruin this nice morning with that thought... :/
16:26:57 <johnthetubaguy> jgriffith: cool
16:27:14 <johnthetubaguy> mriedem +1 on that
16:28:54 <jungleboyj> mriedem:  One step at a time.  :-)
16:28:58 <ildikov> ok, it seems to me that we're on an agreement here
16:29:40 <ildikov> so if anyone wants to rain on this parade please do it now
16:29:46 <ildikov> or don't do it at all :)
16:30:26 <mriedem> shared connections are backend specific,
16:30:31 <mriedem> and the volume is typed per backend,
16:30:37 <mriedem> so it makes sense to model that in the volume
16:30:39 <mriedem> so +1
16:31:25 <ildikov> great, so we will add the info to the volume and do the microversion bumps accordingly
16:32:12 * johnthetubaguy nods
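(Editor's note: the scheme agreed above, a boolean on the volume plus the backend name to use as a lock string, can be sketched roughly as follows. The field names `uses_shared_targets` and `volume_backend_name` are taken from the discussion but their exact spelling in the final API is an assumption, as is the fallback to a per-volume lock; real OpenStack code would likely use oslo.concurrency's named locks rather than this minimal in-process registry.)

```python
import threading
from collections import defaultdict

# Minimal named-lock registry; real OpenStack code would use
# oslo_concurrency.lockutils.synchronized() or similar instead.
_locks = defaultdict(threading.Lock)

def attach_lock_name(volume):
    """Pick the lock string for serializing connect/disconnect.

    'uses_shared_targets' and 'volume_backend_name' are the volume
    fields proposed in the meeting; their names here are assumptions.
    """
    if volume.get('uses_shared_targets'):
        # All volumes on a shared-target backend must serialize on
        # the same string, so key the lock on the backend name.
        return 'shared-%s' % volume['volume_backend_name']
    # Otherwise a per-volume lock is enough.
    return 'volume-%s' % volume['id']

def with_attach_lock(volume, fn):
    """Run fn while holding the appropriate attach/detach lock."""
    with _locks[attach_lock_name(volume)]:
        return fn()
```

Because the flag and backend name live on the volume, callers have a usable lock string from the very first call (attachment_create/reserve) onward, which is the concern raised above.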
16:32:21 <ildikov> next update
16:32:34 <johnthetubaguy> is there a spec to +1 around all that, I maybe missed the link?
16:32:53 <ildikov> johnthetubaguy: I will add this to the multi-attach spec after the meeting
16:33:09 <johnthetubaguy> OK
16:33:20 <ildikov> johnthetubaguy: I added the info about the lock there, but wanted to get to an agreement on where we put the extra info before updating the spec again
16:33:30 <johnthetubaguy> +1
16:33:56 <ildikov> johnthetubaguy: I also added a note there about the policies I believe where I will need some double checks and suggestions as well
16:34:06 <johnthetubaguy> cool
16:34:21 <ildikov> johnthetubaguy: so highly appreciated if you can check the spec once I update it again shortly
16:34:26 <mriedem> you guys don't want a cinder spec for any of that?
16:34:29 <johnthetubaguy> I think the policy should be mostly cinder side, but I should read through that
16:34:36 <johnthetubaguy> mriedem: I was wondering that too
16:34:39 <jgriffith> mriedem yeah, we do/will
16:34:52 <jgriffith> I'll have it posted prior to proposing the changes
16:34:52 <mriedem> nova was going to have a policy rule for multi-attach boot from volume i think
16:35:02 <mriedem> cinder was going to have a policy rule for multi-attach in general
16:35:30 <mriedem> it was in my nova/cinder ptg recap email
16:35:37 <ildikov> covering that on both sides would be good, I agree
16:36:04 <ildikov> I'm happy to type it up, but my knowledge on policies is fairly weak, so I'll need some help
16:36:38 <johnthetubaguy> some of this could be called "configuration" depending on how you expose this in the API
16:37:07 <johnthetubaguy> I guess I was meaning giving some control to the operators, however you feel is best
16:37:51 <jungleboyj> mriedem:  Yeah, we were going to have policies for multi-attach, read-only and read/write
16:39:31 <jgriffith> jungleboyj we have some discussing to do around some of that on the Cinder side FWIW
16:39:33 <jgriffith> but anyway
16:40:24 <ildikov> ok, so let's discuss what we need to before the next meeting
16:40:25 <jungleboyj> jgriffith: Ok.
16:40:41 <mriedem> stvnoyes: yay https://photos.app.goo.gl/Q8JdpjM0PZhAzsv32
16:40:54 <ildikov> I will update the Nova side spec and we can add a reference to any Cinder side document we will have
16:40:54 <mriedem> required every time i review any live migration code
16:41:26 <stvnoyes> nice, so simple :-)
16:42:05 <ildikov> mriedem: nice! in the sense of existence
16:42:40 <johnthetubaguy> mriedem: that first one is a cast depending on the microversion now
16:42:43 <ildikov> we should have more of these honestly, I wish I wouldn't be that lazy to draw some when I go and re-understand these flows...
16:42:59 <mriedem> johnthetubaguy: thats from the API, which is not in this diagram
16:43:09 <mriedem> i'm starting at the conductor live migration task
16:43:13 <johnthetubaguy> mriedem: good point
16:44:39 <ildikov> ok
16:44:53 <ildikov> further items
16:45:18 <ildikov> I updated the new attach patch with removing the tiny race conditions by changing the order of attachment_delete and attachment_create
16:45:24 <ildikov> to have create first
16:45:45 <stvnoyes> +1
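(Editor's note: the reordering described above, creating the new attachment before deleting the old one, closes the window in which a volume briefly has no attachment record at all. A toy sketch of the ordering, with made-up in-memory helpers standing in for the Cinder attachment API calls:)

```python
import uuid

# In-memory stand-ins for cinder attachment_create/attachment_delete;
# the real calls go over the Cinder REST API.
attachments = {}

def attachment_create(volume_id, instance_id):
    """Reserve the volume for an instance; returns an attachment id."""
    attachment_id = str(uuid.uuid4())
    attachments[attachment_id] = {'volume_id': volume_id,
                                  'instance_id': instance_id}
    return attachment_id

def attachment_delete(attachment_id):
    attachments.pop(attachment_id)

def swap_attachment(volume_id, old_attachment_id, new_instance_id):
    """Create-then-delete ordering: the volume is never left with
    zero attachment records mid-operation, avoiding the race."""
    new_attachment_id = attachment_create(volume_id, new_instance_id)
    # At this point both records exist, so the volume cannot be
    # mistaken for unattached by a concurrent observer.
    attachment_delete(old_attachment_id)
    return new_attachment_id
```

The state-tracking problem discussed next is exactly what this ordering surfaces: with two live attachment records, the volume status can no longer be derived from any single one of them.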
16:45:49 <ildikov> with jgriffith we're now in a smaller rabbit hole on tracking the volume and attachment state on the Cinder side
16:46:04 <mriedem> that's this right? https://review.openstack.org/#/c/504467/
16:46:12 <jgriffith> mriedem yes
16:46:15 <mriedem> i was confused how that played into the changes we needed on the nova side
16:46:19 <mriedem> i guess they are separate issues?
16:46:43 <jgriffith> partially
16:46:52 <jgriffith> it's a multi-attach issue for the most part
16:47:14 <jgriffith> But when moving those things there's a problem with making sure we have the right "states" on everything
16:47:16 <ildikov> mriedem: the problem with having create come first is that it changes the volume state to 'reserved', which messes with delete; delete then changes the volume state to 'available', which also doesn't work, and that's not even multi-attach yet...
16:47:59 <jgriffith> what I mean is not the enforcement/check of things, but at the end of actions, figuring out the disaster that is all the various states stashed here and there when you have multiple attachment records
16:48:05 <jgriffith> and you don't treat them as the primary resource
16:48:08 <mriedem> ok
16:48:24 <mriedem> so if a volume is reserved or in-use because there is an existing attachment,
16:48:39 <mriedem> and we create another attachment to keep it reserved, that changes the volume status from in-use to reserved?
16:48:46 <jgriffith> mriedem correct
16:48:48 <mriedem> rather than just leaving it in-use
16:48:49 <mriedem> ok
16:48:54 <jgriffith> oh...
16:48:56 <jgriffith> no opposite
16:49:06 <jgriffith> what happens currently is it will get updated to reserved
16:49:11 <jgriffith> I don't think that's what we want
16:49:17 <jgriffith> we want to keep the in-use status
16:49:17 <mriedem> and once we delete the old attachment, it should leave the volume reserved, but not in-use if that remaining attachment doesn't have a connector
16:49:19 <ildikov> mriedem: exactly
16:49:33 <mriedem> something like that
16:49:46 <jgriffith> mriedem the last one you pointed out yes
16:49:52 <ildikov> mriedem: yes and right now if you delete an attachment it will change the volume state to 'available' blindly
16:49:53 <mriedem> ok
16:49:57 <jungleboyj> jgriffith:  Agreed.  Seems like in-use is the right status.
16:50:11 <mriedem> jungleboyj: in-use depends on if the attachment has a connector
16:50:22 <mriedem> from nova's pov, in-use or reserved are basically the same
16:50:23 <ildikov> mriedem: +1
16:50:25 <jgriffith> https://review.openstack.org/#/c/504467/5/cinder/volume/api.py L#2048
16:50:43 <jgriffith> I'm open to suggestions/discussion if people think that ranking is incorrect
16:51:02 <jungleboyj> mriedem:  Right, but it shouldn't get switched to reserved if there is a connector and it is still in-use in at least one location.  Right?
16:51:16 <mriedem> jungleboyj: correct
16:51:20 <mriedem> at least that's what i'd expect
16:51:20 <ildikov> jgriffith: I couldn't finish my comments before the meeting, but will add them shortly
16:51:27 <jungleboyj> mriedem:  ++
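(Editor's note: the transition rules being agreed here, together with jgriffith's point at the end of the meeting that the attachment records should be the single source of truth, reduce to a small pure function. This is an illustration of the rule as discussed, not the actual Cinder code:)

```python
def derive_volume_status(attachment_records):
    """Derive a volume's status purely from its attachment records.

    Rule as discussed: no attachments -> 'available'; attachments
    exist but none has a connector yet -> 'reserved'; any attachment
    with a connector -> 'in-use'. Deleting one of several attachments
    then falls out naturally instead of blindly going 'available'.
    """
    if not attachment_records:
        return 'available'
    if any(a.get('connector') for a in attachment_records):
        return 'in-use'
    return 'reserved'
```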
16:51:35 <jgriffith> honestly I'd like to propose a complete rewrite of all of the status mess in Cinder, but I think that's a bigger problem and don't want to pause the Nova attachment changes any more than we have to
16:51:47 <ildikov> mriedem: and more importantly we also want to set the 'attaching' and 'detaching' statuses correctly, would be my thinking
16:51:50 <jungleboyj> mriedem:  Granted I am new to this and trying to catch up.  If I make no sense you can tell me to go away.  ;-)
16:52:14 <jgriffith> jungleboyj no!!  You don't get to "go away"
16:52:28 <jungleboyj> jgriffith:  Yeah, that would be a much larger effort and we should let this work finish and stabilize first.
16:52:29 <jgriffith> and if something doesn't make sense, ping me or somebody else to figure it out
16:52:32 <ildikov> jgriffith: that makes sense, and as we don't want to reintroduce much status checking in Nova, I would hope we can rewrite this later without any big mess
16:52:33 <jungleboyj> We have made so much progress.
16:52:47 <jungleboyj> awww jgriffith  Wants me around!  :-)
16:52:54 * jgriffith didn't say that
16:52:56 <mriedem> there is no attaching status right?
16:53:01 * jungleboyj laughs
16:53:03 <jgriffith> mriedem there is
16:53:07 <mriedem> it's available, reserved, in-use, detaching, available
16:53:25 <ildikov> jungleboyj: we're just eager to share the pain with as many fellow Stackers as possible ;)
16:53:48 <jungleboyj> :-)
16:53:49 <mriedem> ok, i'll just leave this alone...
16:53:55 <ildikov> mriedem: attachment_update puts the volume in 'attaching' state
16:54:05 <mriedem> well, that's the attach_status yeah?
16:54:07 <ildikov> and it's attachment_complete that moves it to 'in-use'
16:54:08 <mriedem> there is status, and attach_status
16:54:15 <mriedem> like vm and task state in nova
16:54:26 <mriedem> a vm can be ACTIVE and 'migrating'
16:54:34 <mriedem> anywho
16:54:51 <ildikov> mriedem: ah, ok
16:55:06 <mriedem> btw,
16:55:12 <mriedem> i should not know this much about the internal state machine in cinder
16:55:22 <jgriffith> mriedem and now you're asking for my 20 minute rant on how F'd up all of this is on the Cinder side :)
16:55:22 <jgriffith> mriedem these are why I feel the whole thing is "broken" in Cinder and needs fixed
16:55:31 <ildikov> mriedem: I wouldn't volunteer right now to write up all the possible variations of the 4 statuses we're having...
16:55:33 <jgriffith> mriedem agreed
16:55:50 <jgriffith> anyway... I do plan to "fix" it
16:56:00 <jgriffith> just not before the nova attach changes
16:56:05 <mriedem> ok, ill go back to reviewing steve's live migration change
16:56:07 <ildikov> mriedem: correct, Nova shouldn't play on that field in an ideal world
16:56:19 <jungleboyj> The fact that there is status and attach_status blew my mind when I realized it.
16:56:46 <jgriffith> jungleboyj don't forget the attachment records (new and old) also have an independent attach_status column as well
16:56:55 <ildikov> ok, so I think we all have homework that we can move forward with
16:57:05 <jgriffith> we really want to make sure we record attach_status!!
16:57:31 <ildikov> jgriffith: it's very important! :)
16:57:47 <jgriffith> FWIW, my intent was that attach related statuses on the volume go away; and the attachment records are the only ones that matter
16:58:00 <jgriffith> 1 source of truth
16:58:13 <jungleboyj> jgriffith:  ++ That sounds like the right goal.
16:58:17 <jgriffith> anyway... we don't want to go into that right now probably
16:58:24 <ildikov> nope
16:58:28 <ildikov> we have 2 minutes left
16:58:57 <ildikov> so I wonder whether we have any other topics we need to discuss this week and haven't yet?
16:59:53 <ildikov> ok, it seems that was it for this week
16:59:59 <ildikov> thank you all!
17:00:14 <ildikov> let's continue the chats on the channel(s)
17:00:24 <ildikov> see you here next week
17:00:28 <ildikov> #endmeeting