16:00:39 <ildikov> #startmeeting cinder-nova-api-changes
16:00:40 <openstack> Meeting started Thu Jul 27 16:00:39 2017 UTC and is due to finish in 60 minutes.  The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:44 <openstack> The meeting name has been set to 'cinder_nova_api_changes'
16:00:54 <ildikov> johnthetubaguy jaypipes e0ne jgriffith hemna mriedem patrickeast smcginnis diablo_rojo xyang1 raj_singh lyarwood jungleboyj stvnoyes
16:01:17 <jgriffith> o/
16:02:02 <stvnoyes> o/
16:02:08 <_pewp_> hemna (^▽^)/ ʸᵉᔆᵎ
16:03:02 <smcginnis> o/
16:03:32 <mriedem> sorry
16:03:38 <ildikov> mriedem: no worries
16:03:50 <ildikov> let's start, I think we can be quick-ish
16:04:02 <ildikov> mriedem: did you get a chance to check out live_migrate?
16:04:17 <ildikov> stvnoyes: I believe everything is fixed in that one by this point, right?
16:04:56 <mriedem> no
16:04:58 <stvnoyes> should be good to go
16:05:01 <mriedem> been putting out other fires all week
16:05:39 <ildikov> mriedem: ok
16:06:00 <ildikov> mriedem: on the attach patch, jgriffith and I figured out what the issue is
16:06:00 <stvnoyes> i also have a review up to add some checks to the tempest migrate test. Want to make sure the old attachments are deleted - https://review.openstack.org/#/c/487884/ waiting for CI before adding reviewers
16:06:29 <ildikov> stvnoyes: sounds good, thanks!
16:06:41 <ildikov> mriedem: the blockdev issues are with shelved
16:06:49 <mriedem> just shelved?
16:06:54 <jgriffith> mriedem yep
16:06:58 <mriedem> or is it shelve does the thing,
16:07:05 <mriedem> and then it errors on every periodic update in the compute
16:07:17 <jgriffith> mriedem it's strictly a matter of the resource counter running while an instance is shelved
16:07:20 <ildikov> mriedem: we put the volume into reserved state when it's shelved and attach only when the instance gets unshelved
16:07:42 <jgriffith> mriedem so here's what I'm trying to decide; two options (kinda)
16:08:12 <jgriffith> 1. Don't try and resource count in that case (ie if it's shelved and reserved, skip it)
16:09:16 <jgriffith> 2. Try and actually connect/attach the volume in the new flow just like the old code used to do
16:09:24 <jgriffith> I prefer option 1 if possible
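(A minimal sketch of option 1 in Python, with hypothetical names; the real check would live wherever the periodic resource/blockdev accounting iterates the instance's block devices.)

    def iter_countable_block_devices(instance, block_device_mappings):
        """Yield only block devices that are actually connected on a host."""
        for bdm in block_device_mappings:
            # New attach flow: the volume was only reserved via the Cinder
            # 3.27 attachment API while shelving, so there is no host
            # connection and nothing to count yet.
            if instance.vm_state == 'shelved_offloaded' and not bdm.get('connection_info'):
                continue
            yield bdm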
16:09:53 <mriedem> the instance is shelved or shelved offloaded?
16:10:04 <mriedem> because if it's offloaded, the guest is destroyed from the host
16:10:11 <mriedem> and when we unshelve, we end up on another host potentially
16:10:24 <jgriffith> shelved_offloaded
16:10:33 <mriedem> btw, shelve is already totally screwed when it comes to having volumes attached
16:10:48 <mriedem> https://review.openstack.org/#/c/257275/
16:11:03 <ildikov> hmm, would be good to get it into a bit better shape then... :)
16:11:39 <mriedem> yeah but that depends on cinder changes
16:11:40 <mriedem> https://bugs.launchpad.net/cinder/+bug/1527278
16:11:40 <openstack> Launchpad bug 1527278 in Cinder "remove_export needs an API to prevent dangling exports" [Wishlist,New]
16:12:07 <ildikov> with the new flow it wouldn't even be attached to the instance
16:12:22 <ildikov> or well, it's already not
16:13:59 <mriedem> jgriffith: is this the code where things start to blow up? https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7284
16:14:38 <ildikov> mriedem: jgriffith: this one: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7210
16:15:10 <jgriffith> 7928
16:15:34 <smcginnis> I was hoping at least two of you would have the same line #.
16:15:51 <mriedem> ildikov: yeah that's called from my link
16:15:56 <mriedem> smcginnis: same flow
16:16:11 <mriedem> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7299
16:16:42 <ildikov> mriedem: in that case, yes; I was a bit more explicit
16:16:56 <mriedem> that method is called from a bunch of places though
16:17:31 <mriedem> so what's different about the new and old flows here?
16:17:51 <jgriffith> mriedem old flow does a connect and attach even when the instance is shelved
16:18:11 <jgriffith> mriedem so the volume is connected to the host, iscsi-attached and connected
16:18:24 <jgriffith> mriedem the new APIs do NOT actually make the connection and just reserve the volume
16:18:49 <smcginnis> Makes much more sense.
16:18:57 <jgriffith> mriedem so there's not an actual device attached
16:19:02 <ildikov> mriedem: https://review.openstack.org/#/c/330285/106/nova/compute/api.py line #3638
16:19:19 <jgriffith> smcginnis yeah, it's more consistent with what's actually being requested, and the other part is...
16:19:36 <mriedem> "old flow does a connect and attach even when the instance is shelved"
16:19:43 <mriedem> ^ when is that happening?
16:19:45 <jgriffith> the old flow abuses the state of the volume, re-initializes on unshelve and does some things under the covers with terminate_connection
16:19:49 <jgriffith> hides a bunch of stuff
16:21:14 <jgriffith> mriedem it happens https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3598
16:21:32 <mriedem> that doesn't actually make a host connection though
16:21:36 <mriedem> it can't,
16:21:39 <mriedem> because the instance.host is None
16:21:40 <jgriffith> mriedem which calls back into the check_attach_and_reserve
16:21:46 <jgriffith> mriedem it cheats
16:21:53 <mriedem> right it reserves the volume,
16:21:53 <jgriffith> and it does
16:21:59 <mriedem> so that it can attach on unshelve
16:22:11 <jgriffith> yes
16:22:18 <jgriffith> the whole thing is kinda *bad* IMO
16:22:21 <mriedem> the actual init conection and connect_volume happens on unshelve
16:22:23 <jgriffith> it shouldn't behave that way
16:22:26 <mriedem> yes it's terrible :)
16:22:39 <jgriffith> mriedem it does it earlier in the old flow though, that's the problem
16:22:39 <mriedem> HP public cloud pushed for that ability years ago and we're still paying for it,
16:22:44 <mriedem> and shelve already sucked before that
16:22:49 <ildikov> it calls volume_api.attach(), which sort of attaches the volume, but only in the Cinder DB
16:23:13 <ildikov> so yeah, just weird
16:23:22 <mriedem> yeah so the old flow attaches the volume here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3616
16:23:26 <mriedem> which is just the in-use state change in cinder
16:23:32 <mriedem> the volume is not actually connected to the host
16:23:35 <mriedem> since there is no host at that point
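(A sketch of the Cinder-side bookkeeping each flow does at shelve time, written against python-cinderclient for illustration; `cinder` is assumed to be a v3 Client created with microversion 3.27, and the helper itself is hypothetical, not Nova's actual code path.)

    def shelve_volume_bookkeeping(cinder, volume_id, instance_uuid, new_flow=True):
        """What each flow asks of Cinder while the instance has no host."""
        if new_flow:
            # New flow: create an attachment with no connector; the volume
            # sits in 'reserved' until unshelve supplies a real connector.
            return cinder.attachments.create(volume_id, None, instance_uuid)
        # Old flow: reserve, then flip the volume to in-use in Cinder's DB.
        # No host connection is made because instance.host is None, and the
        # mountpoint is unknown at this point, so it is passed as None.
        cinder.volumes.reserve(volume_id)
        cinder.volumes.attach(volume_id, instance_uuid, None)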
16:24:11 <jgriffith> mriedem https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3591
16:24:33 <mriedem> jgriffith: that's for an instance that is not shelved_offloaded
16:24:51 <ildikov> jgriffith: that's not called due to this: https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3650
16:24:59 <jgriffith> sorry, wrong paste
16:25:59 <jgriffith> Oh wait
16:26:12 <jgriffith> derp
16:26:14 <jgriffith> old flow doesn't create a bdm
16:26:31 <jgriffith> so it never gets queried on that call
16:26:53 <jgriffith> but we needed the bdm to indicate which api flow we were using on the unshelve
16:26:58 <jgriffith> at least I think that's right :)
16:27:03 <mriedem> the old flow does https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3611
16:27:14 <mriedem> there is always a bdm otherwise we couldn't attach the volume during unshelve
16:27:44 <mriedem> it won't have a mountpoint set
16:27:58 <mriedem> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3557
16:28:15 <mriedem> mountpoint == device_name
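(To make the BDM point concrete, an illustrative sketch; the field names follow the discussion above rather than Nova's exact object model. In both flows device_name/mountpoint stays unset until unshelve picks a host; only the new flow records an attachment_id, which is what tells unshelve which API flow to follow.)

    # Old flow: volume flipped to in-use in Cinder, no attachment record.
    old_flow_bdm = {
        'volume_id': '<volume uuid>',
        'attachment_id': None,    # nothing reserved via the 3.27 API
        'device_name': None,      # mountpoint is set later, on unshelve
        'connection_info': None,
    }

    # New flow: attachment created with no connector, volume stays 'reserved'.
    new_flow_bdm = dict(old_flow_bdm, attachment_id='<attachment uuid>')

    def uses_new_attach_flow(bdm):
        return bdm.get('attachment_id') is not None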
16:29:17 <jgriffith> mriedem hmm.... so that might solve the problem
16:29:33 <jgriffith> I'll look at it again and try that
16:30:37 <jgriffith> I'm not quite sure where that's getting populated differently from the old code though
16:31:15 <ildikov> I guess we need to check the tests again, we might miss something
16:31:32 <jgriffith> ildikov what do you mean?
16:31:40 <ildikov> as we don't do much differently besides having the volume in reserved state and having an attachment_id in the bdm as an extra
16:32:19 <ildikov> jgriffith: I mean that there's seemingly not much practical difference between the two flows here
16:32:44 <jgriffith> ildikov well I think mriedem may have found the answer
16:32:59 <jgriffith> I need to compare the bdm's
16:33:14 <ildikov> ok, let's try that and then we will see where to move forward
16:33:18 <jgriffith> '<disk type="block" device="disk">\n  <driver name="qemu" type="raw"
16:33:18 <jgriffith> cache="none" io="native"/>\n  <source dev="/dev/sdc"/>\n  <target bus="virtio"
16:33:18 <jgriffith> dev="vdb"/>\n  <serial>d41a57f4-4f90-4d58-8922-d8e05671227d</serial>\n
16:33:18 <jgriffith> <address type="pci"/>\n</disk>\n'
16:33:34 <jgriffith> That's the issue
16:33:52 <jgriffith> it has all that info for the reserved device, but it's not valid
16:34:11 <jgriffith> source dev in particular
16:34:20 <mriedem> that's from the guest right?
16:34:28 <jgriffith> mriedem yes, the xml
16:34:30 <mriedem> on shelve offload, the guest should get deleted
16:35:11 <mriedem> https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4321
16:35:41 <mriedem> as noted, we don't terminate the volume connection on the host during shelve offload https://review.openstack.org/#/c/257275/
16:35:45 <mriedem> which is a super latent bug
16:35:57 <mriedem> and apparently requires some changes in cinder to remove the export w/o detaching the volume
16:36:30 <mriedem> i think somewhere in the bug report or patches someone said it's fine for lvm, but not for all backends
16:36:37 <mriedem> probably due to shared connections?
16:36:48 <mriedem> to call terminate_connection there i mean
16:37:07 <jgriffith> possibly
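(For reference, a rough sketch of what terminating the host connection at shelve-offload time would mean against the Cinder API; purely illustrative, with the helper name and connector handling assumed. Review 257275 and bug 1527278 track the real fix, which needs a Cinder API to remove the export without detaching.)

    def release_host_connection_on_offload(cinder, volume_id, connector):
        """Hypothetical cleanup at shelve-offload time.

        cinder: a python-cinderclient v3 Client; connector: the host's
        connector properties dict (what os-brick reports for this host).
        """
        # Drops the host's export so it doesn't dangle while the guest is
        # gone.  Reportedly fine for LVM but not safe for every backend
        # (e.g. shared connections), hence the wishlist bug above.
        cinder.volumes.terminate_connection(volume_id, connector)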
16:37:25 <mriedem> this is also going to be a problem for counting connections on shelve offload for multiattach...
16:37:31 <mriedem> i'd really like to just kill the shelve api
16:37:34 <jgriffith> I wonder if we could make this simple process any more complex
16:37:38 <mriedem> :)
16:37:42 <jgriffith> mriedem I'd be down with that
16:37:46 <mriedem> lots of people would
16:37:58 <mriedem> we just don't have a good way to get metrics if anyone uses it
16:38:12 <jgriffith> mriedem I'd also be down for rewriting all of this mess in the next release (attach code in nova)
16:38:14 <jgriffith> it's cra-cra
16:38:45 <mriedem> so to summarize, i think we know there are some issues with the new flow and shelve
16:38:46 <mriedem> yes?
16:38:53 <jgriffith> alright, well let me poke at what I'm doing differently/wrong with the devices entry
16:38:57 <mriedem> not totally sure i understand what they are, or if we do, or how to fix
16:39:00 <mriedem> ok
16:39:02 <jgriffith> mriedem yes
16:39:17 <mriedem> ok, having said that, i think it's fair to say the new attach flow is going to be deferred to queens
16:39:26 <mriedem> that shouldn't be a shock to everyone is it?
16:39:38 <jgriffith> mriedem how about you give me til 16:00 Mountain today before doing that?
16:39:48 <jgriffith> because frankly the behavior is fine
16:39:55 <jgriffith> it's just this stupid counter
16:40:08 <ildikov> jgriffith: +1
16:40:18 <jgriffith> and also it's now or never if people want me working on it
16:40:23 <mriedem> go ahead and shoot, but we still have the live migration patch and reviews to do, and pike3 has to get tagged today, and the rest of the team is dealing with cellsv2 and placement issues
16:40:33 <jgriffith> of course there are plenty of other capable people but hopefully I'm valuable for something
16:40:36 <mriedem> if this doesn't make pike, i'd like to land it very early in queens
16:40:48 <mriedem> like, ~ptg if possible
16:40:50 <ildikov> mriedem: so no FFE this time?
16:40:53 <jgriffith> if it doesn't land in Pike I'd rather just move on with life
16:41:07 <mriedem> ildikov: we have 2 weeks between today and rc1
16:41:13 <jgriffith> Nova can continue to use the old stuff, we'll deprecate it
16:41:16 <jgriffith> and leave it to you
16:41:18 <jgriffith> :)
16:41:27 <mriedem> don't get me wrong, i want this stuff in
16:41:42 <jgriffith> mriedem I know, not taking it that way
16:41:47 <mriedem> and we've made a ton of progress
16:42:30 <ildikov> yeah, we really did; live_migrate looked fine to Stephen before his vacation and it should be just about ready to merge
16:42:31 <jgriffith> I'm just being honest that, IMO, having this old code that sucks stick around for yet another cycle is not good
16:42:52 <jgriffith> Cinder REALLY needs to move forward, but that's not your fault/problem
16:42:55 <jgriffith> anyway
16:43:08 <mriedem> this isn't blocking cinder from moving forward though right?
16:43:11 <jgriffith> let me go work on this instead of soap boxing :)
16:43:30 <mriedem> i'm not sure what moving forward means - you can't drop cinder api < 3.27 in queens anyway
16:43:40 <jgriffith> mriedem well, you're our #1 (and favorite consumer) :)
16:43:52 <jgriffith> mriedem it means get things like multi-attach sorted
16:44:03 <mriedem> right - so i think we can still work on multiattach in queens,
16:44:11 <jgriffith> force the drivers to move off of the old code that is nothing but trouble for us
16:44:11 <mriedem> if we can land these 2 changes early in queens, like ~ptg
16:44:13 <jgriffith> and frankly fix that thing called
16:44:14 <jgriffith> anyway
16:44:42 <smcginnis> Let's just see how far jgriffith can get today.
16:44:44 <ildikov> if it's not Pike, I would like to get this landed *before* the PTG, so we can move on by then and focus on the remaining bits of multi-attach, etc.
16:44:53 <mriedem> smcginnis: agree
16:45:13 <mriedem> i'm sorry the reviews didn't come this release as usual
16:45:20 <mriedem> short on cores again
16:45:27 <mriedem> and other high priority stuff taking attention
16:45:31 <jgriffith> mriedem nah we know how that goes
16:45:31 <ildikov> but if we're thinking of RC1, 2 weeks means a lot of hours... :)
16:45:45 <jgriffith> mriedem and honestly I could have done a much better job on this; thank GOD for ildikov !!!
16:45:52 <mriedem> ildikov: this is not the only thing we're working on
16:45:56 <jgriffith> I would've dropped out at patch set 20 :)
16:46:07 <mriedem> yes ildikov is certainly the champion
16:46:14 <jgriffith> mriedem +1
16:46:28 <mriedem> ok anything else?
16:46:33 <ildikov> mriedem: we really didn't get lucky in this release, so I understand that part, I know it's not your or the team's fault, it is what it is :/
16:46:37 <ildikov> tnx :)
16:46:52 <ildikov> nah, let's see where we get with figuring out stuff ASAP and keep in touch
16:47:18 <ildikov> I don't think we can solve anything more here than what we discussed already
16:47:57 <stvnoyes> i have a question for the cinder folks: is there any intention to add attachment list (at least) to the cinder cli? When attachment objects are detached they go invisible. It would be nice to have a way to see them other than going into the db.
16:47:58 <ildikov> is there anything else anyone would want to discuss?
16:48:40 <ildikov> stvnoyes: you mean you would like to see the list of deleted attachments?
16:48:52 <jgriffith> stvnoyes it's already there?
16:49:11 <jgriffith> stvnoyes oh... detached == deleted
16:49:19 <jgriffith> stvnoyes you mean add a filter to show deleted?
16:49:36 <jgriffith> stvnoyes you can do "cinder attachment-list" using the cli currently
16:49:55 <jgriffith> but yes, if you call "cinder attachment-delete" it's disconnected and marked as deleted
16:50:02 <jgriffith> that's by design
16:50:04 <stvnoyes> ah ok, I must have the wrong version spec'd, didn't see that. nm.
16:50:19 <jgriffith> stvnoyes it could be that you're not setting your silly microversion
16:50:25 <jgriffith> export OS_VOLUME_API_VERSION=3.27
16:50:26 <stvnoyes> that would be it
16:50:42 <stvnoyes> thanks
16:50:44 <jgriffith> stvnoyes yeah, it took me a while to get in the habit before adding it to my rc file
16:50:54 <jgriffith> microversions will be the death of us all... mark my words :)
16:51:21 <stvnoyes> yeah, I had fun with those in tempest with that new change
16:51:36 <jgriffith> stvnoyes yeah, I don't envy you on that at all :(
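(The Python equivalent of the CLI usage above, as a sketch; it assumes a keystoneauth1 session is available and that the attachments API needs microversion 3.27 or later, matching OS_VOLUME_API_VERSION.)

    from cinderclient import client as cinder_client

    def list_attachments(session):
        """Equivalent of "cinder attachment-list" (microversion 3.27+)."""
        cinder = cinder_client.Client('3.27', session=session)
        # Attachments removed with "cinder attachment-delete" are marked
        # deleted server-side and no longer show up here, as discussed above.
        return cinder.attachments.list()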
16:51:51 <jgriffith> stvnoyes and by the way!!!
16:51:54 <ildikov> life never gets boring with microversions it seems... :)
16:51:59 <jgriffith> I also meant to give you a HUGE SHOUT OUT!!!!
16:52:20 <ildikov> jgriffith: +1 :)
16:52:20 <jgriffith> stvnoyes you've been a tremendous help and THANK YOU so much for stepping up and taking this on !!!
16:52:32 <stvnoyes> thanks, it's been fun
16:52:38 <ildikov> jgriffith: +2 +A :)
16:52:49 <stvnoyes> i just found out I'll be at the ptg in denver
16:52:53 <ildikov> stvnoyes: I cannot thank you enough times :)
16:52:58 <stvnoyes> so hope to see you there
16:53:00 <jgriffith> stvnoyes sweet! I owe you a drink!
16:53:03 <ildikov> stvnoyes: awesome!!! :)
16:54:48 <ildikov> ok, is there anything else for today?
16:54:52 <ildikov> :)
16:55:17 <mriedem> nope
16:56:01 <ildikov> ok, then let's get back to making things work! :)
16:56:12 <ildikov> thanks everyone!
16:56:23 <ildikov> keep in touch on the channels!
16:56:31 <ildikov> #endmeeting