16:00:39 #startmeeting cinder-nova-api-changes
16:00:40 Meeting started Thu Jul 27 16:00:39 2017 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:44 The meeting name has been set to 'cinder_nova_api_changes'
16:00:54 johnthetubaguy jaypipes e0ne jgriffith hemna mriedem patrickeast smcginnis diablo_rojo xyang1 raj_singh lyarwood jungleboyj stvnoyes
16:01:17 o/
16:02:02 o/
16:02:08 <_pewp_> hemna (^▽^)/ ʸᵉᔆᵎ
16:03:02 o/
16:03:32 sorry
16:03:38 mriedem: no worries
16:03:50 let's start, I think we can be quick-ish
16:04:02 mriedem: did you get a chance to check out live_migrate?
16:04:17 stvnoyes: I believe everything is fixed in that one by this point, right?
16:04:56 no
16:04:58 should be good to go
16:05:01 been putting out other fires all week
16:05:39 mriedem: ok
16:06:00 mriedem: with the attach patch with jgriffith we figured out what the issue is
16:06:00 i also have a review to add some checks to the tempest migrate test. Want to make sure the old attachments are deleted - https://review.openstack.org/#/c/487884/ waiting for CI before adding reviewers
16:06:29 stvnoyes: sounds good, thanks!
16:06:41 mriedem: the blockdev issues are with shelved
16:06:49 just shelved?
16:06:54 mriedem yep
16:06:58 or is it that shelve does the thing,
16:07:05 and then it errors on every periodic update in the compute
16:07:17 mriedem it's strictly a matter of the resource counter running while an instance is shelved
16:07:20 mriedem: we put the volume into reserved state when it's shelved and attach only when the instance gets unshelved
16:07:42 mriedem so here's what I'm trying to decide; two options (kinda)
16:08:12 1. Don't try and resource count in that case (ie if it's shelved and reserved, skip it)
16:09:16 2. Try and actually connect/attach the volume in the new flow just like the old code used to do
16:09:24 I prefer option 1 if possible
16:09:53 the instance is shelved or shelved offloaded?
16:10:04 because if it's offloaded, the guest is destroyed from the host
16:10:11 and when we unshelve, we end up on another host potentially
16:10:24 shelved_offloaded
16:10:33 btw, shelve is already totally screwed when it comes to having volumes attached
16:10:48 https://review.openstack.org/#/c/257275/
16:11:03 hmm, would be good to get it into a bit better shape then... :)
16:11:39 yeah but that depends on cinder changes
16:11:40 https://bugs.launchpad.net/cinder/+bug/1527278
16:11:40 Launchpad bug 1527278 in Cinder "remove_export needs an API to prevent dangling exports" [Wishlist,New]
16:12:07 with the new flow it wouldn't even be attached to the instance
16:12:22 or well, it's already not
16:13:59 jgriffith: is this the code where things start to blow up? https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7284
16:14:38 mriedem: jgriffith: this one: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7210
16:15:10 7928
16:15:34 I was hoping at least two of you would have the same line #.
16:15:51 ildikov: yeah that's called from my link
16:15:56 smcginnis: same flow
16:16:11 https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7299
16:16:42 mriedem: in that case, yes; I was a bit more explicit
16:16:56 that method is called from a bunch of places though
16:17:31 so what's different about the new and old flows here?
16:17:51 mriedem old flow does a connect and attach even when the instance is shelved
16:18:11 mriedem so the volume is connected to the host, iscsi-attached and connected
16:18:24 mriedem the new APIs do NOT actually make the connection and just reserve the volume
16:18:49 Makes much more sense.
16:18:57 mriedem so there's not an actual device attached
16:19:02 mriedem: https://review.openstack.org/#/c/330285/106/nova/compute/api.py line #3638
16:19:19 smcginnis yeah, it's more consistent with what's actually being requested, and the other part is...
16:19:36 "old flow does a connect and attach even when the instance is shelved"
16:19:43 ^ when is that happening?
16:19:45 the old flow abuses the state of the volume, re-initializes on unshelve and does some things under the covers with terminate connection
16:19:49 hides a bunch of stuff
16:21:14 mriedem it happens https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3598
16:21:32 that doesn't actually make a host connection though
16:21:36 it can't,
16:21:39 because the instance.host is None
16:21:40 mriedem which calls back into the check_attach_and_reserve
16:21:46 mriedem it cheats
16:21:53 right it reserves the volume,
16:21:53 and it does
16:21:59 so that it can attach on unshelve
16:22:11 yes
16:22:18 the whole thing is kinda *bad* IMO
16:22:21 the actual init connection and connect_volume happens on unshelve
16:22:23 it shouldn't behave that way
16:22:26 yes it's terrible :)
16:22:39 mriedem it does it earlier in the old flow though, that's the problem
16:22:39 HP public cloud pushed for that ability years ago and we're still paying for it,
16:22:44 and shelve already sucked before that
16:22:49 it calls volume_api.attach(), which attaches the volume in the Cinder DB kind of
16:23:13 so yeah, just weird
16:23:22 yeah so the old flow attaches the volume here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3616
16:23:26 which is just the in-use state change in cinder
16:23:32 the volume is not actually connected to the host
16:23:35 since there is no host at that point
16:24:11 mriedem https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3591
16:24:33 jgriffith: that's for an instance that is not shelved_offloaded
16:24:51 jgriffith: that's not called due to this: https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3650
16:24:59 sorry, wrong paste
16:25:59 Oh wait
16:26:12 derp
16:26:14 old flow doesn't create a bdm
16:26:31 so it never gets queried on that call
16:26:53 but we needed the bdm to indicate which api flow we were using on the unshelve
16:26:58 at least I think that's right :)
16:27:03 the old flow does https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3611
16:27:14 there is always a bdm otherwise we couldn't attach the volume during unshelve
16:27:44 it won't have a mountpoint set
16:27:58 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3557
16:28:15 mountpoint == device_name
16:29:17 mriedem hmm.... so that might solve the problem
16:29:33 I'll look at it again and try that
16:30:37 I'm not quite sure where that's getting populated differently from the old code though
16:31:15 I guess we need to check the tests again, we might be missing something
16:31:32 ildikov what do you mean?
16:31:40 as we don't do much differently besides having the volume in reserved state and having an attachment_id in the bdm as extra
16:32:19 jgriffith: that seemingly there isn't much practical difference between the two flows here
16:32:44 ildikov well I think mriedem may have found the answer
16:32:59 I need to compare the bdm's
16:33:14 ok, let's try that and then we will see where to move forward
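[Aside: to make the old-flow vs. new-flow difference discussed above concrete, here is a minimal, hedged sketch of the Cinder side of the two reserve paths, written against python-cinderclient. The credentials, IDs, the mountpoint value, and the commented-out update call are placeholders/assumptions for illustration only; the actual Nova code paths are the nova/compute/api.py links above, and exact client signatures may vary between cinderclient releases.]

```python
from cinderclient import client
from keystoneauth1 import session
from keystoneauth1.identity import v3

# Placeholder credentials and IDs purely for illustration.
auth = v3.Password(auth_url="http://controller/identity/v3",
                   username="demo", password="secret", project_name="demo",
                   user_domain_id="default", project_domain_id="default")
sess = session.Session(auth=auth)

volume_id = "11111111-2222-3333-4444-555555555555"
instance_uuid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

# Old flow: reserve the volume, then mark it "attached" in the Cinder DB via
# the os-attach action, even though no host connection exists yet; the real
# initialize_connection/connect_volume only happens on unshelve.
old = client.Client("3.0", session=sess)
old.volumes.reserve(volume_id)
old.volumes.attach(volume_id, instance_uuid, "/dev/vdb")  # placeholder mountpoint; state change only

# New flow (microversion 3.27+): create an attachment with no connector,
# which leaves the volume in "reserved"; on unshelve, once the destination
# host's connector is known, the attachment is updated to get connection_info.
new = client.Client("3.27", session=sess)
attachment = new.attachments.create(volume_id, None, instance_uuid)  # connector=None -> reserve only
# later, on unshelve, with a real connector dict from the destination host:
# new.attachments.update(attachment.id, connector)
```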
16:33:18 [pasted libvirt guest XML for the reserved volume; recoverable fragments: driver cache="none" io="native", target dev="vdb", serial d41a57f4-4f90-4d58-8922-d8e05671227d]
16:33:34 That's the issue
16:33:52 it has all that info for the reserved device, but it's not valid
16:34:11 source dev in particular
16:34:20 that's from the guest right?
16:34:28 mriedem yes, the xml
16:34:30 on shelve offload, the guest should get deleted
16:35:11 https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4321
16:35:41 as noted, we don't terminate the volume connection on the host during shelve offload https://review.openstack.org/#/c/257275/
16:35:45 which is a super latent bug
16:35:57 and apparently requires some changes in cinder to remove the export w/o detaching the volume
16:36:30 i think somewhere in the bug report or patches someone said it's fine for lvm, but not for all backends
16:36:37 probably due to shared connections?
16:36:48 to call terminate_connection there i mean
16:37:07 possibly
16:37:25 this is also going to be a problem for counting connections on shelve offload for multiattach...
16:37:31 i'd really like to just kill the shelve api
16:37:34 I wonder if we could make this simple process any more complex
16:37:38 :)
16:37:42 mriedem I'd be down with that
16:37:46 lots of people would
16:37:58 we just don't have a good way to get metrics on whether anyone uses it
16:38:12 mriedem I'd also be down for rewriting all of this mess in the next release (attach code in nova)
16:38:14 it's cra-cra
16:38:45 so to summarize, i think we know there are some issues with the new flow and shelve
16:38:46 yes?
16:38:53 alright, well let me poke at what I'm doing differently/wrong with the devices entry
16:38:57 not totally sure i understand what they are, or if we do, or how to fix
16:39:00 ok
16:39:02 mriedem yes
16:39:17 ok, having said that, i think it's fair to say the new attach flow is going to be deferred to queens
16:39:26 that shouldn't be a shock to anyone, should it?
16:39:38 mriedem how about you give me til 16:00 Mountain today before doing that?
16:39:48 because frankly the behavior is fine
16:39:55 it's just this stupid counter
16:40:08 jgriffith: +1
16:40:18 and also it's now or never if people want me working on it
16:40:23 go ahead and shoot, but we still have the live migration patch and reviews to do, and pike3 has to get tagged today, and the rest of the team is dealing with cellsv2 and placement issues
16:40:33 of course there are plenty of other capable people but hopefully I'm valuable for something
16:40:36 if this doesn't make pike, i'd like to land it very early in queens
16:40:48 like, ~ptg if possible
16:40:50 mriedem: so no FFE this time?
16:40:53 if it doesn't land in Pike I'd rather just move on with life
16:41:07 ildikov: we have 2 weeks between today and rc1
16:41:13 Nova can continue to use the old stuff, we'll deprecate it
16:41:16 and leave it to you
16:41:18 :)
16:41:27 don't get me wrong, i want this stuff in
16:41:42 mriedem I know, not taking it that way
16:41:47 and we've made a ton of progress
16:42:30 yeah, we really did, and live_migrate looked fine to Stephen before his vacation, so it should be pretty much ready to get merged
16:42:31 I'm just being honest that IMO having this old code that sucks stick around for yet another cycle is not good
16:42:52 Cinder REALLY needs to move forward, but that's not your fault/problem
16:42:55 anyway
16:43:08 this isn't blocking cinder from moving forward though right?
16:43:11 let me go work on this instead of soap boxing :)
16:43:30 i'm not sure what moving forward means - you can't drop cinder api < 3.27 in queens anyway
16:43:40 mriedem well, you're our #1 (and favorite) consumer :)
16:43:52 mriedem it means get things like multi-attach sorted
16:44:03 right - so i think we can still work on multiattach in queens,
16:44:11 force the drivers to move off of the old code that is nothing but trouble for us
16:44:11 if we can land these 2 changes early in queens, like ~ptg
16:44:13 and frankly fix that thing called
16:44:14 anyway
16:44:42 Let's just see how far jgriffith can get today.
16:44:44 if it's not Pike, I would like to get this landed *before* the PTG, so we can move on by then and focus on the remaining bits of multi-attach, etc.
16:44:53 smcginnis: agree
16:45:13 i'm sorry the reviews didn't come this release as usual
16:45:20 short more cores again
16:45:27 and other high priority stuff taking attention
16:45:31 mriedem nah we know how that goes
16:45:31 but if we can think of RC1, 2 weeks means a lot of hours... :)
16:45:45 mriedem and honestly I could have done a much better job on this; thank GOD for ildikov !!!
16:45:52 ildikov: this is not the only thing we're working on
16:45:56 I would've dropped out at patch set 20 :)
16:46:07 yes ildikov is certainly the champion
16:46:14 mriedem +1
16:46:28 ok anything else?
16:46:33 mriedem: we really didn't get lucky in this release, so I understand that part, I know it's not your or the team's fault, it is what it is :/
16:46:37 tnx :)
16:46:52 nah, let's see where we get with figuring out stuff ASAP and keep in touch
16:47:18 I don't think we can solve anything more here than what we discussed already
16:47:57 i have a question for the cinder folks, is there any intention to add attachment list (at least) to the cinder cli? When attachment objects are detached they go invisible. It would be nice to have a way to see them other than going into the db.
16:47:58 is there anything else anyone would want to discuss?
16:48:40 stvnoyes: you mean you would like to see the list of deleted attachments?
16:48:52 stvnoyes it's already there?
16:49:11 stvnoyes oh... detached == deleted
16:49:19 stvnoyes you mean add a filter to show deleted?
16:49:36 stvnoyes you can do "cinder attachment-list" using the cli currently
16:49:55 but yes, if you call "cinder attachment-delete" it's disconnected and marked as deleted
16:50:02 that's by design
16:50:04 ah ok, I must have the wrong version spec'd, didn't see that. nm.
16:50:19 stvnoyes it could be you're not setting your silly microversion
16:50:25 export OS_VOLUME_API_VERSION=3.27
16:50:26 that would be it
16:50:42 thanks
16:50:44 stvnoyes yeah, took me a while to get in the habit before adding it to my rc file
16:50:54 microversions will be the death of us all... mark my words :)
16:51:21 yeah, I had fun with those in tempest with that new change
16:51:36 stvnoyes yeah, I don't envy you on that at all :(
16:51:51 stvnoyes and by the way!!!
16:51:54 life never gets boring with microversions it seems... :)
16:51:59 I also meant to give you a HUGE SHOUT OUT!!!!
16:52:20 jgriffith: +1 :)
16:52:20 stvnoyes you've been a tremendous help and THANK YOU so much for stepping up and taking this on !!!
16:52:32 thanks, it's been fun
16:52:38 jgriffith: +2 +A :)
16:52:49 i just found out I'll be at the ptg in denver
16:52:53 stvnoyes: I cannot thank you enough times :)
16:52:58 so hope to see you there
16:53:00 stvnoyes sweet! I owe you a drink!
16:53:03 stvnoyes: awesome!!!
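[Aside: the same attachment-list check from the exchange above, done from Python via python-cinderclient rather than the CLI, as a hedged sketch. The credentials are placeholders, and the attribute names on the returned attachment objects are assumptions based on the columns the CLI shows; the microversion requirement (3.27) is the same one set with OS_VOLUME_API_VERSION=3.27.]

```python
from cinderclient import client
from keystoneauth1 import session
from keystoneauth1.identity import v3

# Placeholder credentials purely for illustration.
auth = v3.Password(auth_url="http://controller/identity/v3",
                   username="demo", password="secret", project_name="demo",
                   user_domain_id="default", project_domain_id="default")
sess = session.Session(auth=auth)

# Needs microversion 3.27 or later, just like "cinder attachment-list".
c = client.Client("3.27", session=sess)

# List the attachment records Cinder still tracks; once an attachment has
# been removed with attachment-delete (attachments.delete), it no longer
# shows up here, which is the "detached == deleted" behavior discussed above.
for att in c.attachments.list():
    print(att.id, att.volume_id, att.status)
```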
:)
16:54:48 ok, is there anything else for today?
16:54:52 :)
16:55:17 nope
16:56:01 ok, then let's get back to making things work! :)
16:56:12 thanks everyone!
16:56:23 keep in touch on the channels!
16:56:31 #endmeeting