14:01:30 #startmeeting PowerVM Driver Meeting
14:01:32 Meeting started Tue Apr 10 14:01:30 2018 UTC and is due to finish in 60 minutes. The chair is edmondsw. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:35 The meeting name has been set to 'powervm_driver_meeting'
14:01:40 #link agenda: https://etherpad.openstack.org/p/powervm_driver_meeting_agenda
14:01:52 #topic In-Tree Driver
14:02:04 #link https://etherpad.openstack.org/p/powervm-in-tree-todos
14:02:30 esberglu update on IT status?
14:03:00 edmondsw: Everything before localdisk is ready for core review
14:03:36 efried: I've responded to all your comments on localdisk except
14:03:38 https://review.openstack.org/#/c/549300/18/nova/virt/powervm/disk/localdisk.py@122
14:03:44 Wasn't sure exactly what you meant there
14:04:45 hotplug merged, so I've removed that from the todo etherpad
14:04:46 jichenjc left a few more comments that I haven't hit yet
14:05:04 shall we talk about that comment now?
14:05:13 go ahead
14:06:14 We're building a ftsk, which is supposed to have a list of VIOSes in it as getters rather than wrappers, which is supposed to allow us to defer the retrieval of the VIOS(es) until we want to do the actual work, which is supposed to minimize the window for conflicts.
14:06:42 But we're doing things here that are eliminating those benefits.
14:08:23 First off, there's only one VIOS we care about, and we already know which one it is (self._vios_uuid or whatever). So using build_active_vio_feed_task - which goes out and tries to figure out which of all the VIOSes are "active" (RMC up) and stuffs all of those into the ftsk - will only *hopefully* include that VIOS, and may very well include the other(s) that we don't care about.
14:08:51 agree that we only care about one VIOS
14:08:52 Second, L137 accesses the .wrapper @property, which prefetches the wrappers in the ftsk, so we're not getting the benefit of deferring that fetch.
14:09:24 __init__ only sets self._vios_uuid, it does not cache the vios_w, so we do need a way to get vios_w
14:09:48 and we do need to get it to make sure we have the latest info there, right?
14:09:58 The main benefit I forgot to mention is running the subtasks in parallel across the VIOSes. Which is n/a here since there is (should be) only one VIOS we care about.
14:10:12 efried: So what you're proposing is that instead of adding the rm_func to the stg_ftsk we would just call tsk_map.remove_maps
14:10:20 Directly after find_maps?
14:10:35 so do we need a different stg_ftsk that only retrieves one vios, or do we need to get the vios without a feedtask?
14:10:48 I'm saying using a ftsk at all in this method is overkill.
14:10:51 unnecessary.
14:11:08 esberglu: Let me look; it's possible remove_maps already returns the maps that get removed.
14:11:27 are feedtasks only relevant when you're dealing with lists, and not singletons?
14:11:39 feed = list?
14:12:07 ...which means we could have probably extracted those results out of the ftsk after execute.
14:13:07 edmondsw: No, that's not the only advantage. Doing multiple operations, reverting stuff, etc. (FeedTask is a derivative of TaskFlow)
14:13:33 ...yup, remove_maps already returns the list of maps removed.
14:13:58 I haven't looked, but I suspect the current code is an artifact of slot manager garbage from OOT, and we're going to have to re-complexify it later when we put that shit back in.
14:14:05 but for now, we can make this way simpler.
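A rough sketch of the simplification being agreed here, anticipating the steps spelled out in the next few messages: drop the FeedTask entirely, fetch the single VIOS we care about with only the SCSI-mapping xag, and rely on remove_maps returning the mappings it removed. This is not the merged code; the method name, self._adapter/self._vios_uuid, and the match_func argument are assumptions taken from the localdisk review under discussion.

    from nova.virt.powervm import vm
    from pypowervm import const as pvm_const
    from pypowervm.tasks import scsi_mapper as tsk_map
    from pypowervm.wrappers import virtual_io_server as pvm_vios

    def _disconnect_image_disk(self, instance, match_func):
        # One targeted GET instead of build_active_vio_feed_task: we already
        # know which VIOS hosts the mapping, and VIO_SMAP is the only xag
        # needed for SCSI mapping work.
        vios_w = pvm_vios.VIOS.get(self._adapter, uuid=self._vios_uuid,
                                   xag=[pvm_const.XAG.VIO_SMAP])
        # remove_maps returns the list of mappings it removed, so there is
        # no separate find_maps pass and no rm_func subtask to queue.
        removed = tsk_map.remove_maps(vios_w, vm.get_pvm_uuid(instance),
                                      match_func=match_func)
        # Outside a FeedTask the wrapper change has to be pushed explicitly.
        vios_w.update()
        return removed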
14:14:20 efried: So rip out the stg_ftsk & rm_func stuff, rip out find_maps
14:14:30 And just have
14:14:32 vios_w = stg_ftsk.wrapper_tasks[self._vios_uuid].wrapper
14:14:53 you just said rip out stg_ftsk, so that won't work
14:15:14 Oh right
14:15:16 No stg_ftsk. Retrieve the VIOS wrapper afresh based on self.vios_uuid
14:15:25 yep
14:15:26 ...using the SCSI xag
14:15:31 VIO_SMAP
14:15:53 and then tsk_map.remove_maps on it
14:15:59 and done
14:16:09 Okay got it
14:16:45 esberglu for vscsi, we already did this right? "Add a follow on to use pypowervm 1.1.12 for wwpns"
14:16:53 so I'm removing that from TODO etherpad
14:17:02 edmondsw: Yeah
14:17:25 has anything merged other than netw hotplug
14:17:37 or have comments we need to address, other than localdisk?
14:17:43 edmondsw: Nope, nothing has been reviewed, still a few things ahead of us in runways
14:17:52 yep
14:18:02 any updates on migrate/resize?
14:18:27 Gonna finish up localdisk today and get it ready for review, then jump back into that
14:18:36 cool
14:18:36 Ready for core review
14:18:51 ok, anything else IT?
14:19:08 Are there any system requirements for SDE installs? My install failed
14:19:23 And I need that to test localdisk snapshot
14:19:53 (esberglu: Just noticed gaffe in the commit message)
14:20:14 esberglu thinking... but check with seroyer
14:21:01 I think there are some local disk size requirements?
14:22:17 edmondsw: I'll ask and try again, might also see if anyone can loan me a system for a couple days
14:22:37 #topic Out-of-Tree Driver
14:22:47 #link https://etherpad.openstack.org/p/powervm-oot-todos
14:23:05 I've got a meeting set up with the PowerVC folks to talk about the volume refactoring
14:23:19 and get everyone on the same page there
14:23:49 I'd talked to gfm about this, and he was onboard, but some others on his team are freaking out
14:23:52 so need to calm them down
14:24:15 I've been working with chhavi__ quite a bit on iscsi
14:24:23 I think we're making progress there
14:24:59 I need to ping burgerk about https://review.openstack.org/#/c/428433/ again
14:25:14 #action edmondsw to ping burgerk about config drive UUID
14:25:43 I also need to start writing code for MSP support
14:26:40 efried I think the pypowervm support is already there for that, though obviously untested
14:27:04 efried I will probably be proposing a change to at least the docstring, though, since it says name where it actually needs IPs
14:27:31 and the arg is badly named as well... I'd love to rename it, but that would break backward compat
14:27:35 "MSP support"?
14:27:40 What arg?
14:27:42 What docstring?
14:27:44 do you think that's ok since it didn't work before?
14:27:45 What's going on here??
14:27:48 one sec
14:28:14 https://github.com/powervm/pypowervm/blob/master/pypowervm/tasks/migration.py#L52
14:28:28 dest_msp_name and src_msp_name should actually be lists of IP addresses
14:28:31 not names
14:28:39 MSP = mover service partition
14:29:05 specifying IPs allows you to dictate which interfaces are used for LPM
14:29:45 new for NovaLink, but HMC has had this... presumably the pypowervm code was copied from HMC support
14:30:06 efried make more sense now?
14:30:47 I thought the "lists of" thing was something new coming down the pipe.
14:30:55 And... you're saying those args don't work at all today?
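A tiny sketch of the backward-compatible handling floated a little further down in this exchange: keep the existing (badly named) dest_msp_name/src_msp_name keyword arguments, but let callers pass a python list of mover service partition IP addresses and normalize it inside the method. The helper name is hypothetical and the comma delimiter is an assumption straight from the discussion ("comma-delimited or whatever").

    def _msp_value(val):
        """Hypothetical helper: normalize an MSP kwarg for the migration job.

        Accepts either the legacy single name/IP string or a python list of
        mover service partition IP addresses.
        """
        if val is None:
            return None
        if isinstance(val, (list, tuple)):
            # New style: a list of IPs, joined comma-delimited (assumed).
            return ','.join(val)
        # Old style: pass the single string through unchanged, preserving
        # backward compatibility for existing callers.
        return val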
14:31:37 efried NovaLink didn't support those in REST until the changes Nicolas has just now been working on
14:31:44 so they couldn't have worked (for NovaLink) before
14:32:00 OIC, we just copied that method from k2operator or whatever?
14:32:06 I assume, yes
14:32:44 was REST just ignoring any values passed down there?
14:33:04 cause if so, we can't remove/rename them. If it was erroring, then maybe we can get away with it.
14:33:24 efried right, I have to check with Nicolas on that
14:33:42 until I know otherwise, I'm assuming we have to leave them and just clean up the docstring
14:33:48 Well...
14:34:11 If they can now be lists, we should probably accept (python) lists, and convert 'em to comma-delimited (or whatever) within the method.
14:34:32 yes
14:34:48 I don't mean there would only be a docstring change... just that I wouldn't rename the args unless they were erroring before
14:35:10 Dig.
14:35:37 anything else to discuss OOT?
14:36:09 nope
14:36:11 #topic Device Passthrough
14:36:15 #efried you're up
14:36:35 I started working on granular. Some pretty intricate algorithms happening there.
14:36:55 Got grudging agreement from jaypipes that the spec as written is the way we should go (rather than switching to separate-by-default)
14:37:18 cool
14:37:26 he still has to convince Dan, but I think since the path of least resistance is what we've got, it'll just fall off.
14:37:56 In case you're interested in looking at the code: https://review.openstack.org/#/c/517757/
14:38:04 I need to fix tests, but the general idea is there.
14:38:20 I'm interested, but won't have time
14:38:21 At this point I've given up waiting for Jay to finish nrp-in-alloc-cands before I do that.
14:38:29 :)
14:38:38 So whichever one of us wins, the other has to figure out how to integrate granular+NRP.
14:39:33 There's a new #openstack-placement channel you may wish to join.
14:39:43 efried ah, tx for the heads up
14:40:21 upt stuff is mostly merged. I think my runway expires tomorrow. But the stuff that's left is pretty nonessential - if it doesn't get in, it's not the end of the world.
14:40:34 so the last important one did merge?
14:40:49 I think so. Lemme double check.
14:41:24 yeah. The pending ones are nice-to-have, but we can get by without 'em if we need.
14:42:02 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/update-provider-tree
14:42:17 efried so are we ready to start making changes in our driver?
14:42:35 Yes, if someone else wants to do it. I'm going to be heads down on granular and reviewing other placement stuff for a while yet.
14:42:49 also note that we can't get any mileage out of actual trees until Jay's thing is done.
14:43:29 We can do single-provider stuff with traits, but we won't be able to do anything with child providers for GPUs etc.
14:43:43 so that'll probably wait a bit longer then, because I have too many other things on my plate right now as well
14:44:00 We could hack that together with custom resource classes, one per GPU, inventory of 1. But that would be an interim solution.
14:44:22 If we get to the end of Rocky and the NRP work still isn't finished, we may have to do that.
14:44:29 k
14:44:42 anything else?
14:45:18 Well, on your side, have we gotten any further figuring out how we want to map/represent/filter supported devices?
14:45:37 I think it's pretty much what we'd talked about before
14:45:55 provider per adapter
14:46:28 so we can allow selection by unique id if need be
14:47:19 for PCI, representation will use PCI vendor/device IDs
14:47:32 hm, then I wonder if we actually want to model it with custom resource classes.
14:47:41 Nah, the cores will freak about that.
14:47:50 the custom bit will be the unique id
14:48:13 Right, but we need to use traits for that.
14:48:26 so we could use a common provider and have custom traits?
14:48:34 if so, great
14:48:44 no, if it's a common provider, it would need to be distinct RCs.
14:48:50 Separate providers, traits.
14:49:13 otherwise there's no way to know which trait belongs to which device.
14:49:44 I did not follow that
14:49:45 btw, RP names are freeform - no char restrictions - so we can do whatever tf we want with them.
14:50:18 meaning that we can use the DRC name (or whatever) for the RP name, and not have to do any weird mapping.
14:50:28 Sorry, okay, lemme back up.
14:51:00 Traits are on providers, not resources.
14:51:44 efried: edmondsw: Sorry to butt in, but I've got to present on CI in a few minutes
14:51:46 Multinode CI status: Have working multinode stack within staging, updated prep_devstack to handle control and compute
14:51:58 Still seeing a few errors there
14:52:06 efried yeah, let's give esberglu a few min on CI and we can continue later
14:52:13 #topic PowerVM CI
14:52:17 #link https://etherpad.openstack.org/p/powervm_ci_todos
14:52:27 Next up is getting zuul/nodepool to work with multinode
14:52:37 And figuring out the tempest failures
14:52:43 That's pretty much all I have
14:53:09 esberglu tempest failures?
14:53:31 is that specific to multinode, or in general?
14:53:53 Seeing cold mig tempest failures (not all, just a few tests)
14:53:56 On OOT
14:54:10 ok
14:54:33 Gotta run
14:54:43 I need to run as well
14:54:58 #topic Open Discussion
14:55:01 anything quick here?
14:55:02 edmondsw: If we want to have all of our devices in the same provider, and have them all with the same generic resource class (e.g. "GPU"), it doesn't help us to have all the traits that represent all the devices on that provider, because when you pick one off, you don't know which trait goes with which inventory item. And we don't want to be editing traits on the fly to indicate that kind of thing. So if we want them all in the same provider, we'd have to give each device its own custom resource class (e.g. "GPU_") and we kinda lose the ability to request different types (e.g. based on vendor/product IDs).
14:56:03 So what we want is one RP per device, with the provider name equating to a unique identifier we can correlate back to the real device, and traits on the RP marking the type (vendor/product IDs, capabilities, whatever).
14:56:20 each RP has inventory 1 of the generic resource class (e.g. "GPU")
14:56:47 If that's still murky, hmu later and we can talk it through s'more.
14:56:55 so we can use the common/generic RC
14:57:02 but need custom RP
14:57:15 We were going to want to do that to some extent anyway.
14:57:16 that's what I was hoping
14:57:29 Theoretically we could group like devices
14:57:42 but then we lose the ability to target a *specific* device.
14:57:47 which I gather is something we still want.
14:57:52 I think so
14:57:53 even though it's not very cloudy.
14:58:22 well... there are different definitions of cloud
14:58:31 I think you're falling into the nova definition trap :)
14:58:40 s/nova/certain nova cores/
14:58:55 #endmeeting
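For reference, a sketch of the device modeling edmondsw and efried converge on above: one resource provider per passthrough device, named by a unique identifier such as the DRC name, carrying inventory 1 of a generic resource class and traits marking vendor/product IDs. Method names follow nova's ProviderTree interface from the update-provider-tree work; the exact signature, the _enumerate_passthrough_devices() helper, and the resource class/trait strings are illustrative assumptions, not actual driver code.

    def update_provider_tree(self, provider_tree, nova_compute_node_name):
        for dev in self._enumerate_passthrough_devices():  # hypothetical
            # The unique id (e.g. DRC name) doubles as the RP name; RP names
            # are freeform, so no extra mapping layer is needed.
            rp_name = dev.drc_name
            if not provider_tree.exists(rp_name):
                # Child provider under the compute node RP (needs nested
                # resource provider support on the placement side).
                provider_tree.new_child(rp_name, nova_compute_node_name)
            # Each device RP exposes exactly one unit of the generic class.
            provider_tree.update_inventory(
                rp_name,
                {'GPU': {'total': 1, 'reserved': 0, 'min_unit': 1,
                         'max_unit': 1, 'step_size': 1,
                         'allocation_ratio': 1.0}})
            # Traits mark the device type so a flavor can still request a
            # particular vendor/product id.
            provider_tree.update_traits(
                rp_name,
                ['CUSTOM_VENDOR_ID_%s' % dev.vendor_id,
                 'CUSTOM_PRODUCT_ID_%s' % dev.product_id])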