16:00:05 <ildikov> #startmeeting cinder-nova-api-changes 16:00:06 <openstack> Meeting started Thu Apr 20 16:00:05 2017 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 16:00:09 <mriedem> o/ 16:00:10 <openstack> The meeting name has been set to 'cinder_nova_api_changes' 16:00:14 <mriedem> o/ o/ o/ 16:00:16 <johnthetubaguy> o/ 16:00:20 <ildikov> DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis xyang1 raj_singh lyarwood breitz jungleboyj 16:00:22 <lyarwood> o/ 16:00:31 <hemna> \o 16:00:34 <ildikov> mriedem: hi :) 16:00:42 <smcginnis> ./ 16:01:05 <patrickeast> o/ 16:01:14 <jungleboyj> o/ 16:01:55 <ildikov> I think we can start :) 16:02:36 <stvnoyes> o/ 16:02:49 <ildikov> so ongoing things are updating the detach flow in Nova 16:02:57 <ildikov> and remove check_detach at this point 16:03:02 <mriedem> https://review.openstack.org/#/c/438750/ has a +2 16:03:41 <lyarwood> a quick nit on that, do we also need to delete the attachment_id from the bdm when we are removing the attachment? 16:03:41 <ildikov> mriedem: awesome, thank you :) 16:04:06 <mriedem> oh hmm 16:04:18 <ildikov> lyarwood: actually we should :S 16:04:22 <johnthetubaguy> does the BDM get deleted at that point? 16:04:30 <johnthetubaguy> I guess not always 16:05:32 <lyarwood> yeah sorry trying and failing to think of a use case where it would remain, ignore me, I'll follow up in the review if I find one 16:05:46 <mriedem> well, 16:05:52 <mriedem> it's a good question, 16:05:56 <johnthetubaguy> terminate connection ones 16:06:07 <mriedem> but things that are keying off attachment_id would see it's not set and then do any old path stuff 16:06:11 <johnthetubaguy> but wait, no attachment delete there 16:06:23 <johnthetubaguy> mriedem: good point 16:06:54 <johnthetubaguy> although after you detach a bdm, not sure what you would do with it, unless its already attached somewhere else or about to be attached somewhere else 16:06:55 <lyarwood> but that's only on the detach flow right? attach isn't going to use attachment_id, it's just always going to use the new flow. 16:07:08 <johnthetubaguy> lyarwood: I think that might be the key bit 16:07:17 <mriedem> detach_volume in the compute manager deletes the bdm 16:07:21 <mriedem> after the detach 16:07:27 <ildikov> lyarwood: attach will set it in the BDM 16:07:39 <mriedem> unless it's doing a rebuild 16:07:45 <ildikov> lyarwood: attach will use microversions to decide what flow to use 16:08:06 <mriedem> hmm, this is going to be weird with rebuild/evacuate 16:08:07 <mdbooth> o/ 16:08:29 <mriedem> i believe rebuild models things with a single bdm 16:08:30 <johnthetubaguy> well, on the new host its an attach flow 16:09:55 <johnthetubaguy> ... I wonder if we really detach the attachment, rather than the BDM, oh my, my head is mashed :( 16:10:03 <ildikov> mriedem: you mean updating or not updating the attachment_id? 16:11:17 <ildikov> mriedem: johnthetubaguy: do you have kind of a state machine for the BDM? 16:11:34 <ildikov> just to try to figure out what happens with it in each use case 16:12:06 <mriedem> hells no 16:12:11 <johnthetubaguy> I think we wrote it out in the spec 16:12:19 <johnthetubaguy> I wouldn't want to go down that path here 16:12:29 <mriedem> johnthetubaguy: so rebuild is either going to delete the old bdm and create a new bdm, or update the single bdm with the new attachment_id 16:12:31 <mriedem> i think 16:12:40 <johnthetubaguy> yeah, I think it does the update 16:12:51 <johnthetubaguy> you know connection_info 16:12:52 <mriedem> because it's the same volume 16:12:58 <johnthetubaguy> we have to store that 16:12:58 <mriedem> same volume, different attachment 16:13:01 <johnthetubaguy> so we don't loose it 16:13:07 <mriedem> yeah so i'm still +2 16:13:10 <johnthetubaguy> I think attachment_id will have to be done in the same way 16:13:13 <johnthetubaguy> yeah, I think me too 16:13:36 <johnthetubaguy> either way, I think its fine with what uses that code right now 16:13:47 <mriedem> yeah that too, 16:13:51 <mriedem> we can flush out other demons later 16:13:55 <johnthetubaguy> +1 16:14:00 <johnthetubaguy> its a problem for live-migrate 16:14:02 <johnthetubaguy> etc 16:14:12 <ildikov> yeap, I'm trying to organize the chain of patches right now 16:14:28 <ildikov> I had to accept I cannot really avoid doing that... :/ 16:14:39 <mriedem> yeah i was trying to figure out what to review next 16:14:42 <mriedem> since it's not a series that's hard 16:15:08 <mriedem> i guess https://review.openstack.org/#/c/456896/ is next? 16:15:33 <mriedem> since some things build on top of that 16:16:21 <johnthetubaguy> hmm... is there not a way to do an explicit flow 16:16:29 <johnthetubaguy> like API operation X 16:16:42 <mriedem> i think that's what ildikov tried to do 16:16:49 <mriedem> the common things are at the bottom, 16:16:53 <ildikov> mriedem: that can be 16:16:54 <mriedem> but then each api is handled separately 16:16:56 <mriedem> not in a chain 16:16:57 <johnthetubaguy> ah, my bad, I see 16:17:07 <mriedem> i just don't personally want to handle live migration right now... 16:17:17 <johnthetubaguy> yeah 16:17:25 <ildikov> mriedem: I tried to re-use terminate_connection, re your comment on try not to do too much copy-paste there 16:17:26 <johnthetubaguy> maybe migrate would be easier, or rebuild? 16:17:46 <johnthetubaguy> honestly, I would add a new method, and consolidate if it makes sense 16:18:04 <johnthetubaguy> to stop accidentally using the new stuff, I was thinking 16:18:13 <ildikov> mriedem: but it's not crystal clean yet whether that was a good idea or not 16:18:23 <ildikov> mriedem: I meant _terminate_volume_connections to be clear 16:18:25 <mriedem> ildikov: in https://review.openstack.org/#/c/456896/3/nova/compute/manager.py ? 16:18:41 <mriedem> ok i see 16:19:16 <johnthetubaguy> you know what, we need create before delete 16:19:55 <ildikov> mriedem: yes 16:20:13 <ildikov> mriedem: attachment_create is there by mistake in that patch 16:20:47 <ildikov> johnthetubaguy: accidentally use new stuff? 16:21:19 <mriedem> ildikov: i agree with lyarwood in that change, comments inline 16:21:27 <mriedem> i don't see anything calling _terminate_volume_connections that already has a connector in scope 16:21:30 <mriedem> unless those come later in the series 16:21:37 <johnthetubaguy> ildikov: I will describe more in the patch 16:21:43 <ildikov> johnthetubaguy: ok 16:21:53 <johnthetubaguy> ah, so I think I mean what mriedem just said 16:22:07 <johnthetubaguy> oops, maybe not 16:22:38 <mriedem> i guess swap_volume is the one that already has the connector 16:22:47 <mriedem> so https://review.openstack.org/#/c/456896/3/nova/compute/manager.py is prematurely optimizing 16:23:41 <ildikov> mriedem: we have it here for instance: https://review.openstack.org/#/c/456988/1/nova/compute/manager.py 16:24:22 <johnthetubaguy> doesn't attachment_create need the connector here? 16:24:31 <johnthetubaguy> my bad, its the wrong one 16:24:43 <mriedem> ildikov: yeah we don't need that yet in https://review.openstack.org/#/c/456896/3/nova/compute/manager.py 16:24:46 <mriedem> comments inline 16:24:52 <mriedem> anyway, we could review by committee later :) 16:24:55 <mriedem> should we move on? 16:25:03 <ildikov> johnthetubaguy: I thought it just reserves at this point, but we might actually want to connect as well 16:25:22 <johnthetubaguy> I don't think you do the connect in there right now 16:25:42 <mriedem> creating an attachment in the _terminate_volume_connections method is wrong 16:25:44 <johnthetubaguy> I think thats correct, because we are on the wrong host to get the destination connector 16:25:56 <johnthetubaguy> mriedem: I think its only needed for the old API case 16:25:57 <ildikov> I only split the code into multiple patches and that was it 16:26:01 <mriedem> so whatever patches do that, like swap_volume, will need something different 16:26:11 <ildikov> my bad, but I didn't have more bandwidth for it this week :( 16:26:39 <johnthetubaguy> mriedem: ++ 16:26:45 <ildikov> mriedem: when you create an attachment without connector that only keeps the volume reserved 16:27:06 <johnthetubaguy> mriedem: I think thats what I was trying to say earlier with my create a new method for this, it feels like we have many uses, not all need to do the attachment create 16:27:11 <ildikov> mriedem: we can do that later, but then the volume will be out there available 16:27:16 <johnthetubaguy> oh, its worse than that isn't it.... 16:27:58 <johnthetubaguy> the attachment delete is needed when we both detatch and terminate connection, but when we just terminate connection, we probably want to create then delete 16:28:12 <ildikov> I added a keep_reserved boolean to that method later on and when that's True that's when I wanted to do the attachment_create to reserve 16:28:19 <johnthetubaguy> I am not sure I understood what I just said there 16:28:44 <johnthetubaguy> honestly, I would do swap with all brand new code, then we look for shared code later? 16:28:52 <ildikov> the first patch there shouldn't have attachment_create regardless 16:28:54 <johnthetubaguy> I know thats going to mean cut and paste in the mean time 16:29:09 <smcginnis> johnthetubaguy: Seems a reasonable approach though. 16:29:11 <johnthetubaguy> mriedem: do you think that would work ^ 16:29:18 <smcginnis> As long as the clean up work gets identified and done. 16:29:24 <johnthetubaguy> I just think we are going to confuse ourselves with premature optimizations here 16:29:27 <ildikov> johnthetubaguy: works for me, I only tried this per request from mriedem, I can switch back to new code 16:29:50 <johnthetubaguy> its a good idea to try, but its bending my mind reviewing this stuff 16:30:01 <johnthetubaguy> agreed we need no cut and paste in the end, like smcginnis said 16:30:26 <smcginnis> Let's get it working first, then get it cleaned up. 16:30:44 <johnthetubaguy> so... I am thinking about to nova-network and neutron integration 16:30:45 <mriedem> i think re-using code is fine, if it's done properly, 16:30:45 <jungleboyj> smcginnis: +2 16:30:47 <ildikov> fair enough, I will reorganize this when I do that chaining up 16:30:51 <mriedem> there are just simple mistakes in these patches 16:30:55 <mriedem> that once removed are not a problem 16:31:07 <johnthetubaguy> mriedem: maybe 16:31:12 <mriedem> i've commented in two of them 16:31:36 <lyarwood> ildikov: is bandwidth going to be an issue over the next week? Happy to take on swap_volume for example if it is. 16:31:54 <mriedem> so what's the issue with the keeping the volume reserved think? 16:31:57 <mriedem> that's for swap volume? 16:32:14 <mriedem> because as it stands, the code is deleting the attachment and then creating a new attachment w/o a connector to fake out that the volume is still reserved 16:32:17 <mriedem> that's lossy 16:32:26 <mriedem> we have to have a better way to make that atomic 16:32:32 <mriedem> can't we update the attachment to drop the connector? 16:32:36 <johnthetubaguy> you don't need the connector 16:32:39 <ildikov> lyarwood: let's sync up after the meeting as if I put all these into one chain then it's tough to share the work on it 16:32:55 <johnthetubaguy> oh, sorry, thats not your question 16:33:09 <johnthetubaguy> mriedem: the API we have is create a new one, then delete it, to make it atomic 16:33:10 <lyarwood> mriedem: I suggested that during the spec review but I don't think the api can do that at the moment 16:33:16 <ildikov> mriedem: the Cinder code does not support that at the moment 16:33:30 <johnthetubaguy> ildikov: does it support the reverse ordering? 16:33:36 <johnthetubaguy> create then delete 16:33:50 <ildikov> mriedem: and when I last talked to jgriffith we kind of agreed it would be better not to mess up update with that case 16:33:55 <mriedem> ok, create new attachment (w/o connector), delete old attachment (that had connector), then update new attachment with new connector? 16:34:08 <johnthetubaguy> mriedem: yeah, that I thought what the plan 16:34:17 <mriedem> it'd be nice if attachment update could just be atomic 16:34:20 <johnthetubaguy> mriedem: agreed its messy 16:34:26 <mriedem> since the client has to jump through hoops now 16:34:37 <ildikov> johnthetubaguy: we might be able to do that, I don't think anything prevent us from reversing that order 16:34:37 <mriedem> and it means the client has to know the internal state machine of cinder for how attachments work 16:34:39 <johnthetubaguy> so live-migrate has to do that flow anyway, as both are connected for some time 16:34:58 <johnthetubaguy> ildikov: we need the reverse order for live-migrate, so I hope it works 16:35:28 <mriedem> yeah create new attachment on destination host, once live migration is done, delete old attachment from source host 16:35:32 <mriedem> we do the same with ports 16:36:40 <lyarwood> johnthetubaguy: iirc you called that flow out in the spec so it is allowed 16:36:41 <ildikov> johnthetubaguy: I think it should, I don't recall anything that would fail with that case right now 16:37:11 <johnthetubaguy> lyarwood: yeah, I remember jgriffith gave me a funny look about that, so I am curious if it made it into the code 16:37:47 <johnthetubaguy> mriedem: FWIW, its causing big problems with ports 16:38:05 <ildikov> johnthetubaguy: what kind of problems? 16:38:34 <johnthetubaguy> so its probably a total distraction here, its specific to how neutron does a partial attach 16:38:43 <johnthetubaguy> and live-migrate, naturally 16:38:57 <ildikov> I just asked as if we have anything similar to that here then we could try to do it better for volumes 16:39:05 <mriedem> johnthetubaguy: are you referring to all of the stuff we talked about with neutron in BCN that we never implemented? 16:39:11 <johnthetubaguy> mriedem: yeah 16:39:31 <johnthetubaguy> mriedem: they wouldn't implement it, because we wouldn't approve our side of the work, or something like that 16:39:40 <johnthetubaguy> and it wasn't high priority 16:39:44 <mriedem> oh mututally assured destruction, i see 16:39:45 <johnthetubaguy> but anyways, thats a total rat hole 16:39:54 <mriedem> *mutually 16:39:58 * johnthetubaguy nods 16:40:13 <mriedem> well it was ocata, 16:40:17 <mriedem> we had 3 people and 3 months to do anything 16:40:18 <mriedem> so... 16:40:35 <mriedem> anyway, 16:40:38 <mriedem> comments are in the patches 16:40:41 <mriedem> what else? 16:41:10 <johnthetubaguy> seems like we have the stuff for next week 16:41:26 <johnthetubaguy> who was working on those, I forget lyarwood or ildikov or both? 16:41:30 <ildikov> we could decide what to do with remove check_detach: https://review.openstack.org/#/c/446671/ 16:41:55 <ildikov> johnthetubaguy: me for sure, I will sync up with lyarwood and stvnoyes to see how we can share work items 16:42:04 <johnthetubaguy> sweet 16:42:24 <johnthetubaguy> #action ildikov will sync up with lyarwood and stvnoyes to see how we can share work items 16:42:31 <johnthetubaguy> check_detach 16:42:34 <ildikov> johnthetubaguy: and I will hunt down jgriffith as we're in the same town today and figure out what we touched on today to see what to do with these edge cases 16:42:42 <johnthetubaguy> so I totally missread the code and got all worried about breaking the API 16:43:10 <johnthetubaguy> it seems the extra checks are very much "belt and braces" stuff, by which I mean overkill 16:43:37 <johnthetubaguy> ildikov: is that right, its just redundant checks we are removing here really 16:44:35 <ildikov> johnthetubaguy: check_attach and check_detach were/are checks remained here from the ancient times when Cinder became a separate entity in my understanding and from what I saw in the code so far 16:45:02 <ildikov> johnthetubaguy: and I personally believe in code clean up and deleting code time to time and this seems like a good time for it 16:45:33 <mriedem> check_attach had implications for api behavior, 16:45:34 <ildikov> johnthetubaguy: and finally in my view Nova should not track the volume state in these cases as it's Cinder's responsibility really 16:45:38 <mriedem> so it required service version checks and the like 16:45:45 <hemna> ildikov, +1 16:45:52 <mriedem> i also love deleting code 16:45:54 <ildikov> mriedem: that was because of a missed reserve_volume call 16:46:07 <mriedem> but i don't assuming every service is always upgraded to the latest level at the same time 16:46:07 <johnthetubaguy> ildikov: I support all your aims, I am just trying to work out if the change is safe 16:46:28 <ildikov> so I think we made the Nova side code better overall with that :) 16:46:31 <mriedem> having said all that, i haven't gone through the check_detach patch, it might be less messy 16:46:35 <johnthetubaguy> I mean if it were me merge code deletes should deliver you a cookie 16:46:48 <johnthetubaguy> mriedem: I think it is 16:46:50 <mriedem> i didn't get a cookie for removing os-pci 16:47:10 <mriedem> anyway...moving on? 16:47:14 <ildikov> mriedem: we can sort this out in Boston :) 16:47:24 <johnthetubaguy> so... I have a quick question 16:47:28 <ildikov> mriedem: plz check the patch, it's pretty small, should be a piece of cake 16:47:29 <johnthetubaguy> begin_detaching... 16:47:36 <ildikov> mriedem: sorry, cookie :) 16:47:44 <johnthetubaguy> what does that get replaced by in the new world? 16:48:15 <johnthetubaguy> i.e. one we use the new attachment API 16:48:31 <ildikov> johnthetubaguy: you or someone from Cinder should remind me what that does exactly before I try to answer that question 16:48:51 <johnthetubaguy> its the API that does the state check, to see if we can detach 16:48:54 <johnthetubaguy> cinder API 16:49:11 <johnthetubaguy> I don't think it exists in the new API, so Nova would have to manual check the state 16:49:38 <johnthetubaguy> I think it does attached -> detaching state change on the volume in the old world 16:50:06 <ildikov> johnthetubaguy: we shouldn't solve anything on the Nova side that belongs to Cinder 16:50:21 <johnthetubaguy> I agree, but the API doesn't let us do that 16:50:59 <ildikov> we can also re-evaluate the states that we're maintaining in Cinder, if all of them are needed then we can check what's missing and fix it 16:51:15 <johnthetubaguy> so... actually, about that patch we just approved 16:51:25 <ildikov> I mean fix where the fix should happen 16:51:29 <johnthetubaguy> I guess we actually still call begin_detaching, which I don't think we are allowed to? 16:51:55 <johnthetubaguy> so maybe begin_detaching is in the new API too, I wasn't ever 100% clear on that 16:52:29 <mriedem> begin_detaching changes the volume status from in-use to detaching, 16:52:35 <johnthetubaguy> right 16:52:38 <johnthetubaguy> sorry 16:52:39 <ildikov> I don't think it is 16:52:40 <mriedem> if we fail to detach, we call roll_detaching to put it back into in-use state 16:53:23 <johnthetubaguy> maybe thats why we dropped it in the new API, my memory is sketchy on that now 16:53:30 <mriedem> if begin_detaching isn't available in the new api, shouldn't it fail if you request it at microversion >= 2.37? 16:53:32 <mriedem> *3.27 16:53:47 <mriedem> or is it just a noop? 16:53:57 <ildikov> mriedem: I meant that the new flow doesn't use it, the code is available still 16:53:59 <mriedem> i guess this is homework for cinder 16:54:07 <johnthetubaguy> I think so 16:54:18 <johnthetubaguy> so to be clear, I think this is the new flow 16:54:36 <johnthetubaguy> check BDM state, continue (maybe check cinder is in sync-> attachment still exists) 16:54:44 <johnthetubaguy> give early error to uses it was a bad API call 16:54:45 <johnthetubaguy> etc 16:54:48 <johnthetubaguy> then we do stuff 16:54:52 <johnthetubaguy> then delete attachment 16:55:01 <mriedem> why does begin_detaching even exist? 16:55:03 <johnthetubaguy> so its like one cinder API call only in the new flow, I think 16:55:04 <smcginnis> I thought there was a reason it was no longer needed, but we can verify that. 16:55:16 <mriedem> who cares if the volume is being detached? 16:55:21 <johnthetubaguy> so the code we are deleting seems to be needed in the new flow 16:55:22 <mriedem> it's either in-use or it's not 16:55:26 <johnthetubaguy> mriedem: correct 16:55:35 <smcginnis> mriedem: +1 16:55:43 <mriedem> begin_detaching was probably some internal nova-volume thing, externalized in the cinder api 16:56:12 <johnthetubaguy> its just so you get a detaching... state in the API 16:56:18 <johnthetubaguy> actually, thats a good point 16:56:18 <mriedem> same with putting the volume in reserved state, but that's a bit trickier because we need to reserve the volume in nova-api and then put it in-use in the compute 16:56:24 <johnthetubaguy> the new flow changes the API state changes 16:56:35 <hemna> well it was also used to try and prevent race conditions 16:56:43 <johnthetubaguy> reserve just == "in use" I think 16:56:43 <mriedem> hemna: which one? 16:56:52 <ildikov> mriedem: if we would ever get there to multi-attach I guess we might not want to attach and detach at the same time? 16:56:57 <johnthetubaguy> two people calling detatch... damm it 16:56:59 <hemna> multiple detaches happening at the same time 16:57:14 <hemna> re: volume state mgmt 16:57:18 <johnthetubaguy> grr, we missed that one 16:57:31 <johnthetubaguy> right now it will fail on the compute due to the locking going on there 16:57:39 <mriedem> so request 1 puts the volume into detaching state, request 2 fails because the volume is already detaching 16:57:44 <johnthetubaguy> wich is a poor API expereience 16:57:45 <hemna> yah 16:57:59 <mriedem> and we don't have this with the new flows? 16:58:08 <hemna> if it's already detaching, then isn't the 2nd api request a success ? 16:58:10 <mriedem> i.e. we can't use begin_detaching with the >=3.27? 16:58:11 <hemna> it's detaching.... 16:58:19 <hemna> except multi-attach 16:58:29 <mriedem> we have 2 minutes left 16:58:32 <ildikov> mriedem: we can and I think we still do actually 16:58:36 <mriedem> but this is why we can't just rip code out willy nilly 16:58:48 <mriedem> i said it, willy nilly 16:58:57 <ildikov> lol :) 16:59:12 <mriedem> anyway we're out of time 16:59:13 <ildikov> mriedem: it's check_detach I would love to rip out 16:59:35 <ildikov> we can continue on the Cinder channel and code reviews 16:59:50 * willynilly wants to remove check_detach 16:59:51 * johnthetubaguy needs to go cook his dinner 16:59:57 <ildikov> or Nova channel, whichever we want to 17:00:00 <johnthetubaguy> heh 17:00:09 <ildikov> thanks everyone for today 17:00:29 <ildikov> #endmeeting