13:04:01 <esberglu> #startmeeting powervm_driver_meeting
13:04:02 <openstack> Meeting started Tue Aug 29 13:04:01 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:04:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:04:06 <openstack> The meeting name has been set to 'powervm_driver_meeting'
13:04:18 <efried> \o
13:04:31 <efried> edmondsw is at VMWorld.
13:04:41 <efried> thorst_afk - you going to be here?
13:04:48 <mdrabe> o/
13:04:52 <thorst_afk> not really
13:05:02 <thorst_afk> will be catching up on the feed periodically
13:05:16 <esberglu> #link https://etherpad.openstack.org/p/powervm_driver_meeting_agenda
13:05:24 <esberglu> #topic In Tree Driver
13:05:37 <esberglu> #link https://etherpad.openstack.org/p/powervm-in-tree-todos
13:05:40 <efried> Okay. I guess it'll be mostly a status update. esberglu when done, send link to minutes out so others can catch up.
13:06:45 <esberglu> I tested the config drive patch with FORCE_CONFIG_DRIVE=True
13:06:56 <esberglu> Everything seemed to be working as expected
13:07:06 <efried> Sweet. Have you seen my email?
13:07:20 <esberglu> efried: Yep. Ran it last night, 0 failures
13:07:29 <efried> Well, so here's the funky thing...
13:07:29 <esberglu> Need to port that to the IT patch
13:07:46 <efried> https://review.openstack.org/#/c/498614/ <== this passed our CI.
13:07:59 <efried> It oughtn't to have.
13:08:42 <esberglu> efried: Weird...
13:08:56 <efried> We would have been sending down AssociatedLogicalPartition links like http://localhost:12080/rest/api/uom/ManagedSystem/None/LogicalPartition/<real_uuid>
13:09:16 <efried> But... perhaps the REST side is just parsing the end off without looking too closely at the rest of it.
13:09:39 <efried> The only side effect I can think of would have been that we would be ignoring fuse logic in mappings
13:09:48 <efried> which just means every mapping ends up on its own bus.
13:09:57 <efried> which wouldn't manifest any problems in the CI, really.
13:10:20 <esberglu> efried: Want me to post the CI logs from the manual CI run to see if there's anything different going on?
13:10:32 <efried> No, if it passes, we won't be able to see anything useful in there.
13:11:00 <efried> Anyway, yeah, esberglu you want to pick up those two changes and run with 'em?
13:11:06 <efried> finish up UT and whatnot?
13:11:16 <esberglu> efried: Sure
13:11:33 <esberglu> #action esberglu: Port host_uuid change to IT config drive patch
13:11:52 <esberglu> #action esberglu: Finish UT for OOT host_uuid patch
13:12:03 <esberglu> That
13:12:08 <esberglu> that's it for IT?
13:12:39 <efried> For completeness, the pypowervm side is 5818; the nova-powervm change to be finished up and ported to the in-tree cfg drive change is https://review.openstack.org/#/c/498614/
13:13:09 <efried> #action esberglu UT for 5818 too
13:13:24 <efried> It's passing right now, but needs some extra testing for the stuff I changed.
13:13:35 <esberglu> efried: ack
13:13:55 <efried> Oh, and following up on vfc mappings. I don't have anything fibre-channely to test with. Do you?
13:14:38 <esberglu> Don't think so
13:14:57 <efried> Need to make sure the same logic (posting ROOT URIs to create mappings) also works for vfc, then make the same change in the vfc mapping bld methods.
13:15:06 <efried> Can be a followon change, I suppose
13:15:26 <efried> For the moment, we're just using it to trim down the cfg drive stuff, which is always vscsi.
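For anyone reading the minutes later, the two href shapes under discussion look roughly like this; the host, port, and UUID are placeholders, and the ROOT-URI form is a paraphrase of the intent of the fix, not a quote from the patch:

    lpar_uuid = '794654A6-B856-4077-B52C-69A1E6E84718'  # placeholder UUID

    # Shape the buggy path would have sent: a child URI under a bogus
    # ManagedSystem of "None"
    old_href = ('http://localhost:12080/rest/api/uom/ManagedSystem/None/'
                'LogicalPartition/' + lpar_uuid)

    # Shape the host_uuid change sends instead: a ROOT URI for the partition
    new_href = ('http://localhost:12080/rest/api/uom/'
                'LogicalPartition/' + lpar_uuid)

efried's guess above is that the REST server only parses the tail of the link, which would explain why the bogus form still passed CI.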
13:16:00 <esberglu> efried: Ok. We can loop back to that after this first wave is done and either add it in or start a new change
13:16:26 <efried> yuh. Perhaps someone from pvc can lend us a fc-havin system for a day or two.
13:16:36 <efried> mdrabe you got anything like that?
13:16:52 <mdrabe> Yes
13:17:02 <efried> nice
13:17:07 <mdrabe> But as far as lending I'm not sure :/
13:17:28 <efried> I actually would literally need it for half an hour
13:17:44 <mdrabe> All be consumed for pvc stories atm, in a few days they should be free I think
13:17:45 <efried> and would need a free fc disk I could assign to an LPAR (even the nvl)
13:18:11 <efried> Assuming that disk is free, the testing would be nondestructive.
13:18:24 <efried> I just need to create a mapping in a certain way and make sure it works.
13:18:40 <mdrabe> Testing with devstack though right?
13:18:43 <efried> no
13:18:45 <efried> just pypowervm
13:18:49 <efried> don't even care what level
13:18:55 <mdrabe> oh mkay
13:19:09 <mdrabe> That's good then, dm me
13:19:14 <efried> rgr
13:19:46 <efried> #action efried to validate vfc mappings work the same way (with ROOT URIs for AssociatedLogicalPartition) using mdrabe's setup.
13:19:49 <efried> aaaand...
13:20:03 <efried> #action efried to continue thorough review of cfg drive change
13:20:22 <efried> At some point I reckon we're gonna need thorst_afk to review it too.
13:20:54 <efried> All three of us have had hands on it, so approval is going to be like supermajority consensus.
13:21:07 <efried> Oh, wait, this is in tree.
13:21:17 <efried> So we all just get to +1 it anyway.
13:21:25 <efried> And community approval is going to entail...
13:21:48 <efried> #action esberglu to drive pvm in-tree integration bp for q.
13:22:07 <esberglu> efried: Yep
13:22:14 <efried> Not sure if you caught my parting shot yesterday on that, but: may want to ask mriedem in -nova whether he wants a fresh bp or just re-approve existing.
13:22:31 <esberglu> efried: Yep saw that was planning on putting that in motion today
13:22:37 <efried> coo
13:23:11 <esberglu> #topic Out Of Tree Driver
13:23:28 <mdrabe> https://review.openstack.org/#/c/471926 passed functional testing
13:23:49 <mdrabe> There's one issue uncovered from the testing left to be ironed out, but it's unrelated to the change
13:24:34 <efried> What was the issue?
13:24:38 <efried> Love it when test finds bugs.
13:24:49 <efried> It kinda justifies the whole existence of testing as a thing.
13:25:21 <mdrabe> One evacuation failed with the dreaded vscsi rebuild exception...
13:25:45 <efried> got bug?
13:26:09 <mdrabe> Can't link here, dming, sec
13:26:17 <efried> If it's RTC, I don't care.
13:26:28 <mdrabe> Heh k
13:26:28 <efried> Is it not a bug in the community code?
13:26:49 <efried> If so, we ought to have a lp bug for it.
13:26:54 <mdrabe> It's been some time since I've looked at it
13:27:05 <efried> oh, is it an old bug?
13:27:09 * efried confused
13:27:23 <mdrabe> No, I just mean within a week's timeframe
13:27:28 <mdrabe> I forget things quickly, sorry
13:28:18 <efried> mdrabe Okay, well, I'm not in a huge hurry to get a lp bug opened, but if the changes are going into nova-powervm, that should happen eventually (before we merge it).
13:28:47 <mdrabe> efried: The exception that was raised was this one: https://github.com/powervm/pypowervm/blob/develop/pypowervm/tasks/slot_map.py#L665
13:29:00 <mdrabe> For 1 out of 5 evacuations
13:29:32 <efried> As in, we couldn't find one of the devices on the target system?
13:29:37 <mdrabe> Right
13:29:42 <efried> Uhm.
13:29:47 <efried> So first of all, 1/5 ain't good.
13:29:58 <mdrabe> And I _think_ I recall seeing LUA recovery failures in the logs
13:30:02 <efried> Second, upon what are you basing your assertion that this is unrelated to your change?
13:30:16 <mdrabe> Because it's not related to the slot map
13:31:21 <efried> even though ten out of the 13 or so LOC leading up to that exception have 'slot' in 'em?
13:31:49 <mdrabe> but
13:31:52 <mdrabe> ok
13:32:04 <mdrabe> I'll -1 WF until we resolve it
13:32:35 <mdrabe> efried: fair?
13:32:46 <efried> I had put a +2 on it, but yeah, I think we should follow up first.
13:34:21 <esberglu> Reminder that pike official release is tomorrow
13:34:30 <esberglu> That it for OOT?
13:34:48 <efried> other than pci stuff, I think so.
13:35:10 <esberglu> #topic PCI Passthrough
13:35:18 <efried> okay
13:35:28 <efried> Lots to catch up on here since last week.
13:36:11 <efried> First of all, last week I got a prototype successfully claiming *and* assigning PCI passthrough devices during spawn.
13:36:38 <efried> Were any of y'all in the demo on Friday?
13:36:58 <esberglu> Yeah
13:36:58 <mdrabe> nope
13:37:32 <efried> The nova-powervm code is here: https://review.openstack.org/#/c/496434/
13:38:09 <efried> And I'm actually not sure ^ relies on any pypowervm or REST changes, as currently written.
13:38:19 <efried> despite what the commit message says.
13:38:29 <efried> now
13:39:12 <efried> REST has merged the change that lets us assign slots on LPAR PUT. Which means I can remove the hack here: https://review.openstack.org/#/c/496434/3/nova_powervm/virt/powervm/vm.py@573
13:40:37 <efried> Also the much-debated PCI address spoofing I think I'm gonna keep in nova-powervm (abandoned 5755 accordingly) because...
13:40:43 <efried> All of this is going to be temporary
13:40:50 <efried> It may not even survive queens, gods willing.
13:41:02 <mdrabe> efried: I forget, through what API do we assign PCI devices after spawn?
13:41:46 <efried> mdrabe Before that REST fix? IOSlot.bld and append that guy to the LPAR's io_config.io_slots. Then POST the LPAR.
13:42:11 <openstackgerrit> Eric Berglund proposed openstack/nova-powervm master: DNM: ci check https://review.openstack.org/328315
13:42:43 <mdrabe> efried: And that's triggered by an interface attach from an openstack perspective?
13:43:16 <efried> mdrabe No, actually, I'm not sure what happens during interface attach - should probably look into that.
13:43:40 <efried> No, in openstack the instance object we get passed during spawn contains a list of pci_devices that have been claimed for us.
13:43:49 <openstackgerrit> Eric Berglund proposed openstack/nova-powervm master: DNM: CI Check2 https://review.openstack.org/328317
13:44:24 <efried> Via the above change sets, we're culling that info and sending it into LPARBuilder (curse him).
13:44:51 <efried> mdrabe Is that what you were looking for?
13:45:12 <mdrabe> I'm just trying to understand the flows affected
13:45:38 <efried> Sure, definitely worth going over in more detail, let's do that.
13:46:38 <mdrabe> Yea, I've been meaning to take some time to stare at this stuff, I'll probably ask better questions after I do that
13:46:45 <efried> Nova gets PCI dev info from three places:
13:46:52 <efried> => get_available_resource (in the compute driver - code we control) produces a list of pci_passthrough_devices as part of the json object it dumps.
13:47:35 <efried> => The compute process looks in its conf for [pci]passthrough_whitelist, which it intersects with the above to filter down to only devices you're allowed to assign to VMs.
13:48:16 <efried> => The nova API process looks in its conf (which may not be the same .conf as the compute process - took me a while to figure THAT one out) for [pci]alias entries, which it *also* uses to filter the above.
13:48:55 <efried> The operator sets up a flavor. In the flavor extra_specs he sets a field called pci_passthrough:alias whose value is a comma-separated list of <alias>:<count>
13:49:48 <efried> The <alias> names come from the [pci]alias config, and are how the op identifies what kinds of devices he wants on his VM. Those [pci]alias entries just map the alias name to a vendor/product ID pair.
13:49:57 <efried> And the <count> is how many of that kind of dev you want.
13:50:01 <efried> So
13:50:56 <efried> When you do a spawn with that flavor, nova looks at the pci_passthrough:alias in the flavor, maps it to the vendor/product ID, and then goes and looks in the filtered-down pci_passthrough_devices list for devices that match.
13:51:12 <efried> Meanwhile it's keeping track of how many of those kinds of devices it has claimed and whatnot.
13:51:36 <mdrabe> Ok so adding/removing PCI devices is triggered through resize
13:51:38 <efried> So assuming it finds suitable devices, it decrements their available count and assigns 'em to your instance.
13:51:50 <efried> Yes, I believe that's the case, though I haven't explicitly tried it yet.
13:52:17 <mdrabe> That makes me wonder how this works with SR-IOV
13:52:32 <efried> To come full circle: nova puts the specific devices it claimed into your instance object that it passes to spawn, which is where our code again gets control.
13:52:50 <efried> Yeah, SR-IOV is going to be a different story
13:52:59 <efried> Especially since we're not doing the same thing nova does with SR-IOV.
13:53:16 <efried> But much of the flow is the same.
13:53:54 <efried> pci_passthrough_devices is *supposed* to register each VF as a child of its respective PF.
13:54:05 <efried> So you could claim a VF and the matching is done based on the parent.
13:54:32 <efried> But when you're doing that as part of network interface setup, things go off the rails a bit.
13:55:02 <efried> Now it starts looking for a physical_network tag on your device and trying to bind a neutron port with that network and all that jazz.
13:55:40 <efried> In the rest of the world, you have to pre-create VFs, and they're passed through explicitly one by one and assigned directly to the VM.
13:56:01 <efried> In our world... we don't have the VFs until we need 'em, and even then, they're not assigned directly to the VM.
13:56:49 <efried> So we have to fool the pci manager by spoofing "fake" VFs in our pci_passthrough_devices list. We just create however many entries according to the MaxLPs on the PF.
13:57:23 <mdrabe> Right okay, I'm stuck in the PowerVM perspective
13:58:12 <efried> Yeah, so when we do a claim with SR-IOV, nova actually hands us one of those fake VFs, but we ignore it and just create our VNIC on the appropriate PF.
13:58:55 <efried> This is probably enough historical treatise. The aforementioned PoC code gives me confidence that we can make this work in q without community involvement. Which is not bad.
13:59:01 <efried> But it also ain't pretty.
13:59:16 <efried> The main ugliness is that we have to spoof our PCI addresses.
13:59:41 <efried> Because nova refuses to operate without a Linuxy PCI address in <domain>:<bus>:<slot>.<func> format.
13:59:54 <efried> Our devices don't have those. We have DRC index and location code.
14:00:09 <efried> Linuxy PCI addresses are 32-bit. DRC index is 64-bit.
14:00:42 <mdrabe> What determines the DRC index for us?
14:00:46 <efried> PHYP
14:00:46 <mdrabe> phyp?
14:01:25 <efried> So I started down a path of suggesting some changes to nova's pci manager that would allow us to use our DRC index (or location code, or whatever we wanted) to address and identify devices.
14:01:42 <efried> https://review.openstack.org/497965
14:02:38 <efried> It was basically shot down as being an interim hackup that would be superseded by the move to placement and resource providers.
14:03:04 <efried> Which is really what I was going for in the first place. I wanted to garner some attention and discussion that would get us moving in that direction.
14:04:07 <efried> The upshot is that we (I believe Jay is the nova core most invested in this) want to make devices (not just PCI - any devices) managed through the placement and resource provider framework.
14:04:36 <efried> In that nirvana, our compute driver provides a get_inventory method, which replaces get_available_resource.
14:05:24 <efried> The information contained therein is able to represent any resource generically, and the nova code doesn't try to introspect values and do stuff with 'em like it is doing today for PCI addresses and whatnot.
14:05:54 <mdrabe> That sounds like the way to go
14:06:10 <efried> That work is off the ground at this point in nova, for resources like vcpu, mem, and disk.
14:06:18 <efried> There's also some support for custom resource classes.
14:06:23 <efried> So
14:06:57 <efried> Jay and I are working up content for discussion at the PTG toward making devices managed by the same setup.
14:07:18 <mdrabe> Cool
14:08:22 <esberglu> Good discussion. We ready to move on?
14:08:33 <efried> A resource provider would describe the devices it has available; those devices would have qualitative and quantitative properties. Nova would get a spawn request asking for a device with certain qualitative and quantitative properties. Placement and scheduler and claims and family would just match those values (again, blindly, not introspecting the values) and give us the resources.
14:08:51 <efried> And we get the helm back in our driver and do whatever we want with those claimed resources.
14:09:29 <mdrabe> I feel much more informed than I did an hour ago
14:09:53 <esberglu> Same
14:10:02 <efried> So my action this week is going to be collating some of these notes and stuff, creating an etherpad for the PTG, and perhaps putting some of it down in a blueprint https://blueprints.launchpad.net/nova/+spec/devices-as-resources whose spec is here: https://review.openstack.org/#/c/497978/
14:10:51 <mdrabe> efried: Is the resource provider change targeted for q?
14:11:02 <efried> Well, that's what I don't know.
14:11:09 <efried> I'm sure it will be targeted for q.
14:11:15 <efried> Whether it will get done in q is another question.
14:11:19 <efried> So
14:11:31 <efried> We need to be prepared to move forward with our hacked version
14:11:44 <efried> And we can transition over as able.
14:11:48 <efried> It's a big piece of work.
14:12:09 <efried> So I suspect that even if it gets done in q, it'll get done late in the cycle, possibly too late for us to exploit it fully ourselves.
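To make the whitelist/alias/flavor chain efried walked through above concrete, here is roughly what the configuration looks like; the vendor/product IDs and the alias name are invented for illustration:

    # nova.conf on the compute node
    [pci]
    passthrough_whitelist = {"vendor_id": "10df", "product_id": "e228"}

    # nova.conf seen by the nova API process (possibly a different file,
    # per efried's note above)
    [pci]
    alias = {"vendor_id": "10df", "product_id": "e228", "device_type": "type-PCI", "name": "myfc"}

The operator then asks for one such device per instance through the flavor extra spec:

    openstack flavor set my-flavor --property "pci_passthrough:alias"="myfc:1"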
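Pulling together the "fake VF" and address-spoofing points from the discussion above, a hypothetical sketch of what the spoofed pci_passthrough_devices entries might look like; the dict keys follow nova's PCI device reporting format, but the DRC-index-to-address packing and the helper itself are invented for illustration, not the actual nova-powervm PoC code:

    def fake_vf_entries(pf_drc_index, max_lps, vendor_id, product_id):
        # Spoof one "VF" entry per potential logical port on an SR-IOV PF.
        # The addresses are fabricated: nova's pci manager insists on a
        # <domain>:<bus>:<slot>.<func> string, so a wider PowerVM identifier
        # has to be squeezed into that space somehow (collision avoidance is
        # the hard part and is glossed over here).
        parent_addr = '%04x:%02x:%02x.%x' % (
            (pf_drc_index >> 16) & 0xffff, (pf_drc_index >> 8) & 0xff,
            (pf_drc_index >> 3) & 0x1f, pf_drc_index & 0x7)
        entries = []
        for lp in range(max_lps):
            entries.append({
                'address': '%04x:%02x:%02x.%x' % (
                    (pf_drc_index >> 16) & 0xffff,
                    (pf_drc_index >> 8) & 0xff,
                    (lp >> 3) & 0x1f, lp & 0x7),
                'parent_addr': parent_addr,
                'dev_type': 'type-VF',
                'vendor_id': vendor_id,
                'product_id': product_id,
                'label': 'fake_vf_%d' % lp,
            })
        return entries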
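And for the placement direction efried describes, a rough sketch of the get_inventory() shape in the Pike-era virt driver interface; the numbers are made up and the custom resource class at the end is hypothetical, just showing where generically-modeled devices would slot in:

    def get_inventory(self, nodename):
        # Inventory is keyed by resource class; placement matches requests
        # against these values without introspecting them.
        return {
            'VCPU': {'total': 16, 'reserved': 0, 'min_unit': 1,
                     'max_unit': 16, 'step_size': 1, 'allocation_ratio': 16.0},
            'MEMORY_MB': {'total': 65536, 'reserved': 512, 'min_unit': 1,
                          'max_unit': 65536, 'step_size': 1,
                          'allocation_ratio': 1.0},
            'DISK_GB': {'total': 2048, 'reserved': 0, 'min_unit': 1,
                        'max_unit': 2048, 'step_size': 1,
                        'allocation_ratio': 1.0},
            # Hypothetical custom class for a passthrough device type
            'CUSTOM_PHYSICAL_FC_PORT': {'total': 2, 'reserved': 0,
                                        'min_unit': 1, 'max_unit': 1,
                                        'step_size': 1,
                                        'allocation_ratio': 1.0},
        }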
14:12:37 <efried> The really good news here is that Jay is very invested in this, and it fits with the overall direction nova is moving wrt placement and resource providers, so I don't doubt it's going to get done... eventually.
14:12:57 <efried> It's not just us whining "we need this for PowerVM".
14:13:40 <esberglu> Cool
14:13:50 <efried> Okay, I think that's probably enough of that for now. Any further questions, or ready to move on?
14:14:09 <efried> #action efried to write etherpad and/or spec content for nova device management as generic resources.
14:14:09 <esberglu> I might have questions later, I need to look through the code still
14:14:32 <esberglu> #topic PowerVM CI
14:14:37 <esberglu> Not much to report here
14:15:10 <esberglu> Still waiting for the REST change for the serialization issue
14:15:25 <efried> esberglu It's been prototyped, though?
14:15:33 <efried> And run through CI?
14:15:58 <esberglu> efried: Prototyped and run through CI, but not with the latest version of the code
14:16:10 <efried> 5775?
14:17:38 <esberglu> efried: I think it requires the related changes as well. Not 100% sure though, hsien deployed it
14:18:59 <esberglu> Other than that the compute driver was occasionally failing to come up on CI runs. The stacks on the undercloud for a few systems were messed up
14:19:09 <esberglu> I redeployed, haven't seen it since, gonna keep an eye out
14:20:00 <esberglu> Those were the only failures hitting CI consistently, so failure rates should be pretty low now
14:20:16 <esberglu> Well not now, once that rest fix is in
14:20:30 <esberglu> That's all I had for CI
14:20:52 <esberglu> #topic Driver Testing
14:21:05 <esberglu> Jay isn't on. But he was having problems stacking last week
14:21:24 <esberglu> I got his system stacked, not sure if any further testing has been done on it yet
14:22:14 <esberglu> Nothing else to report there
14:22:34 <esberglu> #topic Open Discussion
14:22:45 <esberglu> That's it for me
14:22:52 <efried> nothing else here
14:23:40 <esberglu> Alright. See you here next week
14:23:50 <esberglu> #endmeeting