17:00:08 <ildikov> #startmeeting cinder-nova-api-changes 17:00:09 <openstack> Meeting started Mon May 23 17:00:08 2016 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 17:00:12 <openstack> The meeting name has been set to 'cinder_nova_api_changes' 17:00:17 <ildikov> scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis 17:00:28 <scottda> Hi 17:00:35 <scottda> ildikov: Just pinged you in Cinder :) 17:00:44 <ildikov> scottda: :) 17:01:04 <geguileo> Hi 17:01:05 <patrickeast> hi 17:01:10 <ildikov> I started to answer there, but then I switch to this channel 17:01:53 <hemna> hey 17:02:02 <hemna> is this our meeting slot now? 17:02:06 <hemna> I can't keep it straight 17:02:08 <ildikov> hemna: yes :) 17:02:18 <hemna> seems to change every week 17:02:19 <ildikov> hemna: sorry for the many variations 17:02:35 <hemna> nah it's cool. 17:02:36 <hemna> :) 17:02:51 <ildikov> hemna: last week's slot was meant to be the one, but then it turned out we don't have a channel for that slot anymore... 17:03:18 <hemna> coolio 17:03:23 <ildikov> hemna: and next week Monday is a US holiday, so we need to find another slot or sync up separately... 17:03:25 <hemna> so do we have an agenda today? 17:03:43 <ildikov> I hope we have jgriffith around 17:03:59 <ildikov> to discuss his part as I haven't seen patches from him but I might missed it 17:04:29 <hemna> man that'd be great to see that patch(s) 17:04:30 <ildikov> hemna: I also checked check_attach a bit last week 17:04:57 <ildikov> and I found that in case of attaching a volume at boot time Nova does not call reserve_volume 17:05:17 <hemna> w t f 17:05:19 <hemna> seriously? 17:05:30 <patrickeast> lol 17:05:35 <hemna> ugh, lets follow that up 17:05:39 <hemna> that needs to get fixed asap 17:05:56 <ildikov> I raised an exception from the reserve call and attach at boot time went through and then detach and the attach after that failed there 17:06:24 <ildikov> I mean I had this dumb way of testing, but anyway, the point is the same 17:06:34 <hemna> yah 17:06:47 <ildikov> I think if we can have that fixed than check_attach can be removed from there too 17:06:57 <ildikov> s/than/then/ 17:07:25 <ildikov> maybe where they call check_attach first we can call reserve volume there 17:07:39 <ildikov> let's follow that up after the meeting 17:07:54 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1288 17:08:07 <hemna> that's the one I thought should be changed to a reserve_volume call 17:08:18 <hemna> I haven't tried it yet though 17:08:26 <ildikov> yeah, I meant this one 17:08:38 <ildikov> I think this method is called only in that flow 17:08:48 <hemna> in the boot from volume flow ? 17:08:53 <ildikov> yeap 17:09:05 <ildikov> otherwise they just call attach and that creates the BDM 17:09:19 <hemna> ok, so do you think I should try the same thing I did here: https://review.openstack.org/#/c/315789/ ? 17:09:36 <ildikov> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3101 17:10:21 <ildikov> that and also call reserve_volume 17:10:25 <hemna> ildikov, so for that one, my patch changes it already 17:10:27 <ildikov> as otherwise no one will 17:10:29 <hemna> https://review.openstack.org/#/c/315789/2/nova/compute/api.py 17:11:14 <ildikov> right, I meant that for other attach cases this method is called here, but for boot from volume they use the _validate_bdm, etc 17:11:36 <hemna> *smh* 17:12:30 <hemna> ok, so at some point in the boot from volume workflow there has to be a call to initialize_connection and attach 17:12:38 <ildikov> regarding the reserve call, I would guess we need to check that what if anything goes wrong, I did not analyse the full call sequence... 17:12:40 <hemna> I'll see if I can track those down 17:13:06 <ildikov> I didn't look for initialize_connection, next time I will be smarter 17:13:12 <hemna> I can't comprehend why nova has copy/paste code for doing the same operation. 17:13:16 <hemna> mind boggling 17:13:18 <ildikov> do you have any other case I should check? 17:13:46 <ildikov> agreed, good that we're doing some cleanup 17:14:39 <ildikov> so normal attach has reserve, live migration is a special case anyway 17:15:17 <hemna> http://paste.openstack.org/show/498223/ 17:15:23 <hemna> that's the entire list of check_attach references 17:15:26 <hemna> fwiw 17:15:46 <hemna> lots of tests referencing it 17:16:44 <hemna> anyway, I'll try testing my patch against he boot from volme and modify the check_attach case there to call check_availability_zone and reserve_volume instead 17:16:48 <ildikov> there is a fake check_attach code as well, which is the copy of the normal basically 17:16:54 <hemna> and see how that goes 17:16:56 <ildikov> that code is a nightmare 17:17:14 <johnthetubaguy> +1 to the above comments, that seems crazy 17:17:19 <hemna> check_attach should go away IMHO 17:17:28 <johnthetubaguy> where is the cut and paste again? 17:17:34 <ildikov> I meant line 33 for my previous comment 17:17:34 * johnthetubaguy reads more of scrollback 17:17:43 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1287-L1291 17:18:02 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3097-L3099 17:18:08 <hemna> those should be the same thing IMHO 17:18:14 <hemna> but should be changed to.... 17:18:14 <ildikov> johnthetubaguy: the base of this discussion is that the BFV case with existing Cinder volume does not call reserve_volume 17:18:22 <hemna> https://review.openstack.org/#/c/315789/2/nova/compute/api.py 17:18:23 <hemna> that 17:18:24 <johnthetubaguy> ah... 17:20:05 <ildikov> so we thought to add that missing call and also remove check_attach from there as well 17:20:31 <johnthetubaguy> a new method in volume_api to do those calls? 17:20:40 <hemna> possibly yah 17:20:49 <hemna> I had thought of refactoring those 2 into a single place 17:20:53 <hemna> if all goes well :) 17:21:13 <johnthetubaguy> yeah, that makes sense 17:21:34 <johnthetubaguy> seems like reserve and check should always both we done? 17:21:49 <hemna> possibly 17:21:55 <hemna> live migration is a special case though 17:21:57 <hemna> :( 17:22:02 <johnthetubaguy> oh, true 17:22:15 <johnthetubaguy> its always a special case 17:22:20 <johnthetubaguy> thats why its so dammed broken 17:22:25 <hemna> yup 17:22:26 <hemna> pain 17:22:33 <hemna> ok so baby steps 17:22:34 <ildikov> but in general the less state check in Nova is the better 17:22:54 <hemna> ildikov, +1 17:23:04 <ildikov> and use reserve what it's for 17:23:07 <johnthetubaguy> yeah, +1 17:23:16 <ildikov> do we have any other special case like live migration? 17:23:27 <hemna> I'm shocked that BFV doesn't use reserve 17:23:35 <ildikov> or it's the only, but big monster? 17:23:37 <hemna> I'd like to confirm that 17:23:46 <ildikov> I haven't checked how it is done when Nova creates the volume 17:23:58 <johnthetubaguy> so I wonder if we should get you lovely cinder folks to write a very quick doc on how we *should* use your API? we could put that in our devref thingy? 17:23:59 <scottda> ildikov: I think we need to look at shelve_offload, as cinder state if fudged there as well. 17:24:12 <hemna> heh 17:24:12 <johnthetubaguy> ildikov: so migrate, and shelved, evacuate, resize 17:24:16 <ildikov> just add an exception to the client call and it will get through or I'm a magician... 17:24:16 <scottda> johnthetubaguy: That's a good idea. 17:24:23 <hemna> I think jgriffith created a nice blog about it and was working on a devref for it as well 17:24:31 <johnthetubaguy> that would be prefect 17:25:07 <ildikov> johnthetubaguy: migrate is live migrate or Cinder migrate there? 17:25:12 <johnthetubaguy> so I think its when the server moves, we have issues 17:25:20 <johnthetubaguy> let me find a doc on those, one second... 17:25:36 <hemna> johnthetubaguy, http://docs.openstack.org/developer/cinder/devref/attach_detach_conventions.html 17:25:37 <hemna> :) 17:25:51 <johnthetubaguy> http://developer.openstack.org/api-guide/compute/server_concepts.html 17:26:22 <johnthetubaguy> hemna: ah, coolness, we might want to add something in the Nova docs, that just link to your doc :) but that might be overkill! 17:26:22 <ildikov> hemna: nice, we could add a ref to your diagrams there as well 17:26:34 <hemna> johnthetubaguy, I think that'd be nice. 17:26:43 <johnthetubaguy> so the api docs cover server moves, we have very silly names 17:26:49 <johnthetubaguy> live-migrate is what you know 17:26:54 <johnthetubaguy> migrate is cold migrate 17:27:01 <hemna> server move = live migration ? 17:27:02 <johnthetubaguy> i.e. turn it off, move it, turn it on 17:27:20 <johnthetubaguy> yeah, live-migration is move the server, but with only a slight pause and no reboot 17:27:22 <ildikov> johnthetubaguy: yeah, right, I forgot you call it migrate as well, my bad :) 17:27:36 <johnthetubaguy> yeah, live migrate and cold migrate is what we often call it now 17:27:56 <johnthetubaguy> now we have dead parrot migrate (I totally just made that up) 17:28:03 <johnthetubaguy> its called evacuate 17:28:08 <ildikov> johnthetubaguy: how much are these things tested? 17:28:13 <johnthetubaguy> when you host is dead, you need to move your server 17:28:18 <johnthetubaguy> ildikov: honestly, its a WIP 17:28:47 <johnthetubaguy> ildikov: migrate is OK, live-migrate is only tested in the experimental queue, we can't make it pass enough yet 17:28:48 <hemna> is cold migrate = shelve/unshelve ? 17:28:53 <johnthetubaguy> nope 17:28:54 <hemna> ok 17:28:58 <johnthetubaguy> lets do evacuate 17:29:03 <johnthetubaguy> your host is head 17:29:06 <johnthetubaguy> you can't get to it 17:29:11 <johnthetubaguy> its in a skip 17:29:15 <johnthetubaguy> but you want your VM back 17:29:19 <johnthetubaguy> so you use evacuate 17:29:38 <johnthetubaguy> (resurrect is another name we came up with, since we created that API call) 17:29:43 <johnthetubaguy> does that make any sense? 17:29:50 <hemna> yah 17:29:53 <ildikov> I know the operation 17:29:53 <hemna> thanks :) 17:29:56 <johnthetubaguy> sweet 17:29:59 <ildikov> but what happens with the volume there? 17:30:10 <johnthetubaguy> it gets re-attached on the destination host 17:30:21 <johnthetubaguy> we don't have access to the old host to do anything there 17:30:32 <johnthetubaguy> we technically share that code with rebuild 17:30:39 <ildikov> but then from volume perspective it's like a migrate? 17:30:44 <scottda> so volume does not need to be "reserved", i.e. set to attaching, it should already be attached and in-use 17:30:46 <johnthetubaguy> I am not sure if we call terminate correctly 17:30:50 <johnthetubaguy> ildikov: kinda, yeah 17:31:02 <johnthetubaguy> scottda: yeah, thats the key bit I guess 17:31:14 <johnthetubaguy> the detach is harder, as we can't call os-brick on the old host, its dead 17:31:27 <hemna> so nova will call initialize_connection on a new host then, when the volume is already 'in-use' 17:31:32 <scottda> and we don't want Nova to check the volume state, because it won't be correct for non-multi-attach case 17:31:37 <johnthetubaguy> hemna: I think so 17:31:40 <hemna> ok 17:31:50 <hemna> yet another case where the connector information should be updated 17:31:52 <ildikov> yeah, like migrate 17:31:57 <hemna> ildikov, yup 17:32:02 <johnthetubaguy> yeah, its very similar 17:32:09 <scottda> johnthetubaguy: Yes, detach won't work for Nova, we'll want to call cinder force-detach (which needs fixing after jgriffith patches) 17:32:32 <johnthetubaguy> scottda: yeah, that sounds correct for evacuate (although wrong for rebuild/migrate) 17:32:56 <johnthetubaguy> which might be tricky, and rebuild and evacuate are mostly common code, but lets ignore that for now 17:33:02 <ildikov> I just wanted to ask whether we could call force-detach :) 17:33:16 <johnthetubaguy> hehe, I think we should be able to fix that 17:33:21 <scottda> ildikov: Force detach won't work properly until Cinder saves the connector info... 17:33:29 <hemna> scottda, +! 17:33:30 <scottda> Which is will soon with jgriffith patches. 17:33:31 <hemna> +1 17:33:31 <johnthetubaguy> scottda: +1 17:33:40 <johnthetubaguy> cool beans, we should do shelved? 17:33:42 <ildikov> scottda: right, I got it now :) 17:34:00 <johnthetubaguy> so that migrate, with a really long pause, and possible a delete before the resume 17:34:02 <scottda> johnthetubaguy: Yes, please proceed with shelved :) 17:34:13 <johnthetubaguy> the idea is this: 17:34:18 <ildikov> johnthetubaguy: please educate us :) 17:34:21 <johnthetubaguy> user wants to keep all their resources 17:34:36 <johnthetubaguy> but they don't want to pay for compute resources when the instance isn't used 17:34:43 <johnthetubaguy> (i.e. over night) 17:34:58 <johnthetubaguy> and what a really quick resume, where possible, when they do want to use it 17:35:09 <johnthetubaguy> its a bit messy 17:35:27 <johnthetubaguy> but the instance is basically not really on a host, once it is selved (offloaded) 17:35:45 <johnthetubaguy> the offloaded bit is about snapshotting any local disks, so you can resume on a different host) 17:35:50 <hemna> but the volume is still 'in-use' 17:35:56 <johnthetubaguy> yeah, volume stays in-use 17:36:06 <johnthetubaguy> it shouldn't be attached to any other place 17:36:16 <scottda> so similar to evacuate, we need to skip state checking 17:36:37 <johnthetubaguy> ... now there is a spec to detach disks from a shelved instance, which we currently don't support, but thats not approved yet, AFAIK 17:36:42 <johnthetubaguy> scottda: +1 17:37:14 <scottda> johnthetubaguy: I believe that under the covers, os-brick is called to actually disconnect the disk when we shelve, does that sound right? 17:37:29 <scottda> and a new connection occurs during un-shelve. 17:37:31 <johnthetubaguy> yes, that does sound correct 17:37:43 <johnthetubaguy> yeah, its assumed to be on a new host, and you want to clean up the old host 17:38:03 <johnthetubaguy> not sure how well shelve is tested yet, I don't remember if the tempest stuff landed or not 17:38:11 <johnthetubaguy> but thats the general intent 17:38:16 <scottda> So, there's no messy force-detach issues, or dangling connection. Just a basic disconncect/re-connect in brick and dealing with weird state management. 17:38:26 <johnthetubaguy> thats my understanding of it 17:38:34 <johnthetubaguy> evacuate is the only one with the force, I think... 17:39:31 <johnthetubaguy> I think thats the main move operations all covered 17:39:52 <johnthetubaguy> oh, right, live-migrate is slightly different to the other migrates 17:40:04 <smcginnis> johnthetubaguy: So for shelve, were you saying we need to detach the volume but still keep it in-use? 17:40:12 <johnthetubaguy> smcginnis: yes 17:40:39 <johnthetubaguy> its possible we just don't call terminate_connection thinking about it 17:40:44 <ildikov> I guess it's only disconnected from the host 17:40:50 <johnthetubaguy> ildikov: +1 17:40:56 <johnthetubaguy> ildikov: at least, thats the intent 17:41:05 <johnthetubaguy> its logically still attached to the instance 17:41:23 <ildikov> johnthetubaguy: otherwise the state would change and you don't want to have the volume available 17:41:31 <johnthetubaguy> +1 17:42:02 <scottda> I need to move to IRC_on_phone mode, sorry folks, mostly Read-only from here on out.... 17:42:20 <ildikov> scottda: any updtes on the migrate test? 17:42:35 <ildikov> scottda: on the Cinder migrate ones I meant 17:43:58 <johnthetubaguy> are there any other questions like the above ones I can help with? 17:44:46 <scottda> ildikov: sorry, no update. 17:44:54 <ildikov> johnthetubaguy: I wonder where we will update the host info for the shelve case 17:45:04 <ildikov> scottda: no worries, I just wanted to check 17:45:31 <johnthetubaguy> ildikov: so there is no host, until you do unshelve, and we call initialize_connection again 17:45:32 <hemna> ildikov, any time initialize_connection is called the connector passed in has the host and it should be updated 17:46:01 <johnthetubaguy> I guess for local volumes, that messes things up, but otherwise, that should be OK? 17:46:02 <hemna> I'm hoping jgriffith's change has this in place as part of his patch(es) 17:46:05 <ildikov> johnthetubaguy: ah, ok, I wasn't sure we have the initialize_connection call 17:46:14 <johnthetubaguy> ildikov: I think we should have 17:46:56 <ildikov> ok, cool, I'll check if I get there 17:47:17 <ildikov> johnthetubaguy: can you also take a look at the multiattach spec? 17:47:26 <ildikov> next week is the deadline if I'm not mistaken 17:48:03 <johnthetubaguy> ildikov: for unshelve, its doing this: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4205 17:48:16 <johnthetubaguy> ildikov: yeah, sorry, I must look at that again, ASAP 17:48:38 <hemna> I think we are stuck waiting for jgriffith's patches no ? 17:48:42 <johnthetubaguy> I have a quite a few specs in that list, I am afraid, but hope to get through more of those tomorrow 17:49:38 <johnthetubaguy> we should be able to merge the spec before the patches, or is there something undecided we are depending on? 17:50:29 <ildikov> I think we should be able to conclude on the spec 17:51:00 <johnthetubaguy> cool 17:51:37 <ildikov> at least in my opinion, but it would be nice to get a heads up if there's anything missing 17:51:52 <johnthetubaguy> totally, I will try hit that soon 17:52:04 <johnthetubaguy> ildikov: you totally have permission to keep bugging me every morning till I review it 17:52:14 <ildikov> I tried to handle things like jgriffith's patches as an identified dependency, but not part of the multiattach spec technically 17:52:34 <johnthetubaguy> ildikov: that sounds correct 17:52:46 <johnthetubaguy> we could call the other stuff a bug fix 17:52:47 <johnthetubaguy> ish 17:52:48 <ildikov> johnthetubaguy: thanks :) I might use it, but hopefully I will not need :) 17:53:02 <ildikov> johnthetubaguy: +1 17:53:23 <ildikov> many of the discussions today sound like a cleanup and fixing issues we have hidden 17:53:51 <johnthetubaguy> yeah 17:53:52 <smcginnis> ildikov: +1 17:54:04 <ildikov> ok, we have 7 minutes left from the today's slot 17:54:05 <johnthetubaguy> honestly, most of this comes from neither side understand what the other is wanting to do 17:54:16 <johnthetubaguy> so great we are cleaning this up now :) 17:54:25 <ildikov> is there anything we should discuss? 17:54:37 <johnthetubaguy> I was looking at the devref stuff in cinder, I think we might want a more API focused doc 17:54:37 <ildikov> johnthetubaguy: my thinking as well :) 17:54:48 <johnthetubaguy> that includes os-brick in the discussion, maybe 17:54:49 <smcginnis> johnthetubaguy: I do like the suggestion of a devref in Nova for some of this. It's duplication, but hopefully it saves this from happening again once it's cleaned up. 17:55:01 <johnthetubaguy> actually, that would work 17:55:09 <johnthetubaguy> do the nova focused one in Nova 17:55:16 <johnthetubaguy> i.e. here are the patterns 17:55:41 <ildikov> smcginnis: johnthetubaguy: I think we can also add hemna's diagrams 17:55:44 <johnthetubaguy> if we do create server, delete server, live-migrate server, evacuate and shelve, we should be there 17:55:56 <johnthetubaguy> yeah, diagrams are good to make this clearer 17:56:15 <ildikov> not as an API ref, but if someone would like to figure it out what's in the background it's pretty useful 17:56:22 <johnthetubaguy> yeah 17:57:16 <scottda> I'll put it on my list to put up a patch in Nova devref 17:57:18 <johnthetubaguy> it goes the core reviews something to go double check when we can't remember the details next time 17:57:31 <johnthetubaguy> s/ws/wers/ 17:57:40 <johnthetubaguy> scottda: awesome, thank you 17:58:01 <johnthetubaguy> nothing more from me for this week :) 17:58:35 <ildikov> I think we have home work with these server moving actions and fix the reserve_volume in BFV 17:58:48 <ildikov> I will try to hunt down jgriffith as soon as I can 17:58:58 <johnthetubaguy> yeah, feel free to ping me in channel if later in the week, if thats useful 17:59:24 <ildikov> johnthetubaguy: cool, thanks 17:59:41 <ildikov> anything else from anyone before we close? 18:00:07 <ildikov> then, thanks for the good chat! 18:00:12 <johnthetubaguy> +1 18:00:15 <scottda> Thanks everyone 18:00:17 <ildikov> I learnt a lot today :) 18:00:18 <smcginnis> ildikov: Thanks again for arranging this. 18:00:24 <ildikov> oh one thing 18:00:44 <ildikov> next Monday is US holiday, so I will ask around when we could have a quick sync 18:00:59 <scottda> I'm out until Wed next week 18:01:12 <hemna> I think next week is going to be a bit of a loss for some 18:01:20 <hemna> I might be taking time off next week too. 18:01:22 <johnthetubaguy> actually, its a UK holiday too 18:01:27 <ildikov> thanks for the info, Matt is out too at the beginning 18:01:31 <johnthetubaguy> I forget about those 18:01:52 <johnthetubaguy> by which I mean, I forget they are coming up 18:02:05 <ildikov> ok, then let's aim for having a short chat mid next week 18:02:13 <scottda> cool 18:02:13 <johnthetubaguy> that works 18:02:26 <ildikov> johnthetubaguy: we need to get the spec landed, otherwise we should be good 18:02:32 <johnthetubaguy> yeah, +1 18:02:39 <smcginnis> Maybe sync up on Thurday? 18:02:54 <johnthetubaguy> thats a good plan 18:02:57 <ildikov> smcginnis: yeap, sounds good 18:03:00 <scottda> yup 18:03:23 <ildikov> I hope 1700UTC can work 18:03:39 <johnthetubaguy> if its quick, that should be OK 18:03:46 <ildikov> I will send out a heads up mail later this week 18:03:54 <scottda> thanks ildikov 18:03:54 <smcginnis> Works for me. We can hijack the cindre channel again if we need to. 18:04:01 <ildikov> johnthetubaguy: due to the holidays I would assume we can keep it short 18:04:02 <johnthetubaguy> +1 18:04:09 <johnthetubaguy> :) 18:04:24 <ildikov> smcginnis: cool, thanks, I'll check the channels 18:04:51 <ildikov> ok, I don't want steal more of your time today :) 18:05:00 <ildikov> thanks guys! 18:05:17 <ildikov> #endmeeting