16:00:30 #startmeeting cinder-nova-api-changes 16:00:31 Meeting started Thu May 25 16:00:30 2017 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 16:00:34 The meeting name has been set to 'cinder_nova_api_changes' 16:00:40 o/ 16:00:40 DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis xyang1 raj_singh lyarwood breitz jungleboyj 16:00:51 o/ 16:00:52 @! 16:00:52 hemna (╯°□°)╯︵ ┻━┻ 16:01:06 @!h 16:01:06 jungleboyj (/ .□.) ︵╰(゜Д゜)╯︵ /(.□. ) 16:01:26 hemna: already? :) 16:01:40 it's the new hi. 16:01:52 hemna: oh, got it :) 16:02:06 hemna: Is just angry all the time. 16:02:11 I guess jungleboyj said smth like 'Hi All' then :) 16:02:24 hah 16:02:27 o? 16:02:35 ildikov: :-) 16:02:42 i guess i am here 16:03:18 stvnoyes: Not sure about that? :) 16:03:18 ok, so for today we have live migrate, touch on attach and swap a bit and then we might look into some other things stvnoyes found recently in the new Cinder attach/detach API 16:03:48 :-) 16:03:54 stvnoyes: I saw you put live_migrate into WIP 16:04:08 do we have any nova changes which aren't WIP that need review? 16:04:08 stvnoyes: is that because of the comments for rollback? 16:04:12 and can actually move forward? 16:04:34 mriedem: your two changes I think still 16:04:55 ildikov: ok i thought so 16:05:20 mriedem: and this: https://review.openstack.org/#/c/456896/ 16:05:51 which i co-authored 16:06:01 i'll need to just find some helpless nova core to pick on here 16:06:06 is there a proposed fix yet for this cinder bug? - https://bugs.launchpad.net/cinder/+bug/1692153 - it will be important to get that fixed 16:06:07 Launchpad bug 1692153 in Cinder "v3 attachment connection_info formatting is different than v2" [High,New] 16:06:24 mriedem: and we could decide on this one too: https://review.openstack.org/#/c/446671/ 16:06:34 stvnoyes: is that on 3.0? 16:06:36 or 3.27? 16:06:46 mriedem: 3.27 16:06:57 only with the new v3 cinder attach code 16:07:24 stvnoyes, so, I think I can put in a fix in os-brick to cast those values to ints 16:07:33 but I still think it's a bug in the output IMHO 16:07:43 bug in the output of the api? 16:07:47 yes 16:07:51 at this point it's in the response that way 16:07:54 you can't change it 16:07:58 w/o a new microversion 16:07:58 the OVO stuff completely reformats it 16:08:12 even if it's a bug ? 16:08:16 correct 16:08:21 i think there were boolean issues too... 16:08:25 yes 16:08:31 "False" vs. False 16:08:33 values. 16:08:47 ok 16:08:57 the oslo versioned object stuff forces it to strings :( 16:09:04 :/ 16:09:09 need response schema validation in tempest 16:09:12 however, 16:09:25 given (1) you don't really want to test initialize_connection in tempest and (2) it's a freeform dict, 16:09:29 that's probably not trivial 16:09:29 attaches won't work w/o a fix somewhere 16:09:35 you could do it with your in-tree tempest plugin 16:09:43 the fixes will have to be client-side for now 16:09:43 so we use OVO and then convert stuff back before sending out the response? 16:09:46 I discovered it while trying to update the brick cinderclient extension to do bare metal attaches 16:10:19 i ran into it doing a v3 attach using the POC bits 16:10:22 hemna: yeah, I remember now you were discussing it on the Cinder channel not long ago 16:10:47 nobody is using this new API yet..... 16:11:19 heh 16:11:21 again, 16:11:25 that's not the answer 16:11:44 once you guys are getting used outside openstack, like you want to be, you can't make those assumptions 16:11:46 :P 16:11:47 plus, 16:11:56 changing this on pike means it's still busted on ocata 16:11:57 come on Dad! 16:12:04 and client side doesn't know b/c you didnt bump the microversion 16:12:07 Until we backport the fix. 16:12:11 wrong answer 16:12:25 Which we can't do if we bump the mv just for a bug fix. 16:12:32 I'm not sure how to fix it on the cinder side at the moment 16:12:49 who are the ovo "guys" in oslo ? 16:12:57 dansmith and jaypipes 16:13:04 i don't see how this is a problem for ovo though 16:13:11 ok, I'll raise this with them and see what their thoughts are on a fix 16:13:13 smcginnis: correct, you can't backport it 16:13:15 it's a new microversion 16:13:21 in the mean time, I can patch brick to do a fix 16:13:25 even if you changed ovo, the response would change 16:13:48 then we can microversion the fix on the cinder side once we figure out the right way. 16:14:32 yup 16:14:48 see 2.42 in nova if you want to see how we've also had to deal with this kind of pain 16:14:58 are we ok with me patching brick to fix at least the LUN int ? 16:15:07 brick is client side, 16:15:11 and can straddle releases 16:15:28 and nova is also client side, 16:15:33 and can require a minimum version of brick 16:15:34 with the fix 16:15:47 ok I'll try and get that up today 16:16:01 cool, so brick it is 16:16:41 and also check what we do with ovo and this type of issues long term 16:16:48 ildikov, +1 16:17:22 mriedem, if you can find me a nova review that I can see how you guys handled something similar that'd help 16:18:32 is there a note/warning anywhere for ovo that it does things you would not necessarily expect? 16:19:09 ildikov, well, I think it's doing exactly as you'd expect 16:19:10 https://github.com/openstack/cinder/blob/master/cinder/objects/volume_attachment.py#L53 16:19:22 the connection_info was declared as a dict of strings. 16:19:27 hence the conversion 16:19:32 hemna: https://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/fix-tag-attribute-disappearing.html is a story about how we f'ed up 16:19:59 I think you have to declare those vars as something, and as far as I can tell, there is no DictOfLeaveMeAlone 16:20:01 hemna: hmm, I was just wondering why we only realized it now... :) 16:20:16 ildikov, because nobody was using the result coming back from the API yet 16:20:22 to try and do a real attach 16:20:28 well, 16:20:41 the long-term awesome thing to do is model ConnectionInfo as an object itself, 16:20:42 brick puked, because "1" != 1 16:20:46 with it's own fields that are various types 16:20:53 but, 16:20:56 ouch 16:20:56 hemna: I know, but still 16:20:59 yah 16:20:59 that means actually codifying that thing 16:21:15 actually knowing wtf goes into connection_info would be awesome 16:21:21 +1 16:21:23 even if you needed a vendor_extras field which is a dict of strings 16:21:29 and unvalidated 16:21:38 anyway, that's way down the road and will probably never happen 16:21:47 brick 2.0 16:21:52 uh, that looks a tough goal to aim for, but I definitely agree it would be nice to have some sort of a structure for that thing 16:21:55 hemna: it's server side 16:21:57 not brick 16:22:04 think nova.objects.ImageMetaProps in nova 16:22:09 well, there is the connector dict too 16:22:11 codifies image metadata properties from glance 16:22:14 I guess I was thinking of that too 16:22:16 yes connector should also be a versioned object 16:22:22 they are both a wild west of terrible 16:22:36 anyway, wayyyyyy down the rabbit hole here 16:22:38 can we move on? 16:22:41 yup 16:22:45 yes 16:23:34 so we have a recent WIP, which is live migrate 16:23:49 stvnoyes: do you want to discuss that here? 16:24:17 hold off on rv'ing that for now. i am in the middle of functionally testing it. i just got v3 attach working yesterday. 16:24:36 I saw your latest comments about rollback and was wondering whether that's something to raise here? 16:24:47 i've been updating the POC and my rv with what I'm finding 16:25:22 well there is a odd bug in nova in rollback.... 16:25:29 new or old code 16:26:37 https://review.openstack.org/#/c/463987/3/nova/compute/manager.py - self.compute_rpcapi.remove_volume_connection has the wrong args 16:26:50 mriedem: have you seen that ^? 16:26:56 ~line 5758 16:27:25 also, i'm not sure fixing the args is the real answer, i don't know why we'd do this on a source node. 16:28:41 why is the signature wrong? 16:29:25 this is the source host telling the dest host to rollback volume connections from the dest host right? 16:29:30 b/c we're rolling back 16:30:13 hmmm, re-looking at this, it seems ok. not sure what i was thinking. 16:30:25 ignore this for now 16:30:37 so, 16:30:41 live migration is a turd 16:30:43 just fyi 16:30:43 :) 16:31:02 yeah, we had similar issues in the virtual iron product 16:31:09 it's like 20 compute manager methods doing rpc calls back and forth and you rarely know which one is on the source or target host 16:31:20 none of it is documented, 16:31:21 or mapped out 16:31:28 you're welcome 16:31:34 :-) 16:31:43 nice :) 16:32:11 you're lucky the docstring even says 16:32:11 :param dest: 16:32:12 This method is called from live migration src host. 16:32:12 This param specifies destination host. 16:32:25 that's a rare bit of critical information in these methods 16:33:16 ok, then I guess whomever can make attach work with the new flow should test the live migrate patch for now to see how it works out 16:33:18 the odd part is that i was getting wrong number of args exception in here. i need to go back and re-look at where that was coming from. 16:33:26 $5 to a new contributor to document the flow :) 16:33:54 lol :) 16:34:06 i need to check the other patches that i applied... 16:35:23 moving on? 16:35:36 stvnoyes and i reviewed jgriffith's attach WIP https://review.openstack.org/#/c/330285/ 16:35:37 the attach PoC throws warious things I would guess, the other bits should be fine 16:36:05 mriedem: stvnoyes: we will skip the microversion discovery for now 16:36:25 so we can have attach itself working 16:36:29 first 16:36:45 i'm fine with that for now 16:36:54 we 16:36:56 we'll want t 16:36:58 damn 16:37:00 shelve should be moved out of the patch too, but as it contained more things originally than just attach I didn't bother as of yet... 16:37:05 we'll want to split that out into a separate patch first anyway i think 16:37:19 well, 16:37:28 this is attaching a volume to a shelved offloaded server 16:37:31 so it's still new style attach 16:37:41 just a wrinkle 16:37:46 i don't think you can split that out really 16:38:31 I hoped we can 16:39:20 I talked to John this morning, he's cleaning up things in the patch like using the attachment_id we have in the bdm now, etc. 16:39:41 I also finally have a DevStack up and running so I will chime in testing and cleaning up this stuff 16:40:21 i think there are some obvious things to fix in there, yes, which is a good start 16:40:28 which should also decomplicate it 16:40:45 mriedem: stvnoyes: also tnx for the thorough review :) 16:41:15 np, i am anxious to see migrate work 16:41:17 mriedem: yeah, I started to fix up check and reserve for instance, but couldn't get there to clean up the rest and things like these 16:41:25 hopefully shortly 16:41:49 the connection_info format changes in the driver are also old ones, we had a stage when the format didn't match 16:42:41 that needs to be checked again too 16:44:07 and when we have attach then we will start write a new version of swap 16:44:17 that thing still confuses me... 16:45:33 if someone could once describe me what migrate_volume_completion is supposed to do I would be pretty pleased :) 16:45:49 it's nova calling back to cinder saying i'm done 16:45:59 which was either successful or an error 16:46:09 if cinder initiated the swap, i thought telling it we failed initiated a rollback 16:46:12 on the cinder side 16:46:27 if cinder didn't initiate the swap, i assume cinder is like, 'ok great but i don't know what you're talking about' 16:47:30 and what does it do when Nova initiated a swap and it was successful? 16:47:35 mriedem: I believe so: https://github.com/openstack/cinder/blob/114e3bd7223ae957bd9bb71e3fb7e114af5c9f72/cinder/volume/manager.py#L2018 16:48:40 ildikov: i don't know 16:48:44 that's a question for cinder 16:48:53 mriedem: ok :) 16:49:10 i would think if cinder wasn't tracking a migration for the volume it would ignore it, 16:49:11 but idk 16:49:41 https://github.com/openstack/cinder/blob/114e3bd7223ae957bd9bb71e3fb7e114af5c9f72/cinder/volume/api.py#L1449 16:49:59 we don't cast to the volume manager then 16:50:33 so i think that's just updating the db on the cinder side 16:50:38 in the case that it didn't fail 16:50:47 which if nova already did that i think it's just a noop 16:50:53 but again i'm not sure 16:50:55 you mean with all that attach/detach calls in the non-error state? 16:50:58 yes 16:51:26 ok, cool 16:51:56 http://logs.openstack.org/95/467995/3/check/gate-grenade-dsvm-neutron-multinode-live-migration-nv/46deb21/logs/new/tempest_conf.txt.gz 16:51:59 oops wrong window 16:52:20 then we will look into how to reproduce those many noop's with the new flow then... 16:52:52 I think mainly that's it for today 16:52:57 cool, 16:53:08 i've also put out the nova core call to 2 specific cores to review the bottom changes which have the +2 16:53:10 mriedem: who to ping with those few patches to merge? 16:53:10 alex and kenichi 16:53:22 alex is in china so he's probably done for the day already 16:53:23 mriedem: awesome, tnx 16:53:31 but kenichi is in california so he can't hide 16:53:40 ok :) 16:54:32 I pinged other two guys, hopefully we will have at least one looking at those finally 16:55:07 cool 16:55:32 if nothing else then thank y'all for the today's meeting 16:55:32 which other two guys? 16:55:55 lyarwood and sfinucan IIRC 16:55:59 oh, not cores 16:56:01 but ok 16:56:05 anyway yeah let's call it a meeting 16:56:13 oh sfinucan is 16:56:15 duh 16:56:32 :) 16:57:15 talk to you next week the latest! :) 16:57:20 have a good day 16:57:26 #endmeeting