13:00:09 <esberglu> #startmeeting powervm_driver_meeting
13:00:10 <openstack> Meeting started Tue Sep 5 13:00:09 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:13 <openstack> The meeting name has been set to 'powervm_driver_meeting'
13:00:26 <edmondsw> o/
13:00:30 <efried> \o
13:00:34 <thorst_afk> \o
13:00:35 <mdrabe> o/
13:01:28 <esberglu> #link https://etherpad.openstack.org/p/powervm_driver_meeting_agenda
13:01:45 <esberglu> #topic In-Tree Driver
13:02:19 <esberglu> Planning on starting the pike spec today in the background
13:02:27 <esberglu> Might have some questions
13:02:33 <edmondsw> efried did you mark the pike one implemented?
13:02:38 <esberglu> queens spec
13:02:47 <efried> I think mriedem did
13:03:11 <edmondsw> cool... I thought he had, but then you were talking about it on Friday and I assumed I'd missed something
13:03:26 <efried> oh, I didn't wind up moving it from approved/ to implemented/ because it appears there's a script to do that for all of them, and I don't think that's my responsibility.
13:03:49 <efried> well, to rephrase, I'm not sure it would be appreciated if I proposed that.
13:03:50 <edmondsw> oh, interesting
13:04:10 <edmondsw> as long as we're in sync with everything else
13:04:19 <efried> Yeah, nothing has been moved yet.
13:04:28 <edmondsw> esberglu how is config drive going?
13:05:11 <esberglu> edmondsw: Good, I think I'm ready for the first wave of reviews on the IT patch
13:05:27 <edmondsw> what have you done to test it?
13:06:18 <edmondsw> I assume there are functional tests that we can start running in the CI related to this... have you tried those with it?
13:06:19 <esberglu> Still need to finish up the UT pypowervm side for removal of the host_uuid which is needed for that
13:06:30 <esberglu> Yeah there is a tempest conf option FORCE_CONFIG_DRIVE
13:06:38 <esberglu> I have done a couple manual runs with that set to true
13:06:58 <esberglu> And have looked through the logs to make sure that it is hitting the right code paths
13:07:07 <esberglu> Everything looked fine for those
13:07:08 <edmondsw> cool
13:07:14 <efried> Manually you can also do a spawn and then look to make sure the device and its scsi mapping exist.
13:07:25 <esberglu> I have not done any testing of spawns from the CLI
13:07:53 <esberglu> Only through tempest so far
13:08:21 <edmondsw> k, let's try the CLI as well just to be safe
13:08:48 <esberglu> #action: esberglu: Test config drive patch from CLI
13:08:50 <edmondsw> and make sure that what you tried to get config drive to do was actually done on the guest OS
13:09:12 <edmondsw> e.g. setting the hostname
13:09:31 <esberglu> Yep will do
13:09:38 <thorst_afk> efried: you started posting in nova
13:09:40 <thorst_afk> lol
13:09:50 <efried> thorst_afk Eh?
13:09:53 <efried> Posting what?
13:09:59 <mdrabe> Does IT support LPM yet?
13:10:11 <thorst_afk> nm...I misread :-)
13:10:13 <edmondsw> mdrabe not yet... that's one of our TODOs for queens
13:10:31 <mdrabe> K, was just thinking about the problems that vopt causes
13:10:39 <efried> LPM might be ambitious for Q
13:11:06 <edmondsw> efried oh, you're right... that was NOT a TODO for queens...
13:11:09 <efried> mdrabe We have that stuff solved for OOT; do you see any reason the solution would be different IT?
13:11:56 <mdrabe> I don't think so
13:13:17 <edmondsw> anybody else have something to discuss IT?
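A note on the FORCE_CONFIG_DRIVE option discussed above: assuming it is the standard devstack variable of that name (which writes force_config_drive into nova.conf) rather than a CI-specific tempest setting, a minimal sketch of enabling it looks like:

    # devstack local.conf -- sketch only; the CI's actual config may differ
    [[local|localrc]]
    FORCE_CONFIG_DRIVE=True
    # devstack propagates this to nova.conf as:
    #   [DEFAULT]
    #   force_config_drive = True

With that set, every spawn gets a config drive, so a tempest run exercises the config drive code paths without any per-test forcing.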
13:14:10 <esberglu> #topic Out-of-Tree Driver
13:14:28 <esberglu> Anything to discuss here?
13:14:35 <efried> Pursuant to above, the rename from approved/ to implemented/ is already proposed here: https://review.openstack.org/#/c/500369/2
13:14:47 <efried> (for the pike spec)
13:15:09 <edmondsw> efried cool
13:15:18 <edmondsw> mdrabe any updates on what you were working on?
13:16:27 <mdrabe> The PPT ratio stuff has been delivered in pypowervm
13:16:59 <mdrabe> I've tested the teeny nova-powervm bits, but I wanted to get with Satish on merging that
13:17:02 <edmondsw> mdrabe and nova-powervm?
13:17:28 <mdrabe> (Also the nova-powervm side has been merged internally)
13:17:30 <edmondsw> k
13:17:45 <edmondsw> anything else OOT?
13:18:08 <efried> I think I went through on Friday and cleaned up some oldies.
13:18:18 <edmondsw> there is an iSCSI-related pypowervm change from tjakobs that I need to review
13:18:49 <edmondsw> thorst_afk what are we doing with 5531?
13:19:08 <thorst_afk> edmondsw: looking
13:19:35 <edmondsw> been sitting a while
13:19:42 <thorst_afk> I don't think we need that
13:20:01 <thorst_afk> and if we want it, we can punt to a later pypowervm
13:20:12 <edmondsw> thorst_afk abandon?
13:20:14 <thorst_afk> but the OVS update proposed doesn't require it
13:20:20 <thorst_afk> yeah
13:20:22 <thorst_afk> can do
13:20:24 <edmondsw> tx
13:20:29 <edmondsw> anything else?
13:21:50 <edmondsw> esberglu next...
13:21:53 <esberglu> #topic PCI Passthrough
13:22:04 <efried> Which should be renamed "device passthrough"
13:22:21 <edmondsw> because?
13:22:30 <efried> Because it's not going to be limited to PCI devices.
13:22:38 <edmondsw> what else?
13:22:53 <efried> No significant update from last week; been doing a brain dump in prep for the PTG here https://etherpad.openstack.org/p/nova-ptg-queens-generic-device-management but it's not really ready for anyone else to read yet.
13:23:27 <efried> At the end of last week I think I started to get the idea of how it's really going to end up working.
13:24:11 <efried> And I think there's only going to be a couple of things that will be device-specific about it (as distinguishable from any other type of resource)
13:24:47 <efried> One will be how to transition away from the existing PCI device management setup (see L62 of that etherpad)
13:25:03 <efried> The other will be how network attachments will be associated with devices when they're generic resources.
13:25:37 <efried> I'm going to spend the rest of this week thinking through various scenarios and populating the section at L47...
13:25:57 <efried> ...and possibly putting those into a nice readable RST that we can put up on the screen in Denver.
13:26:01 <edmondsw> that sounds great
13:26:30 <edmondsw> even just throwing up the etherpad would be great
13:26:38 <efried> The premise is that I believe we can handle devices just like any other resource, with some careful (and occasionally creative) modeling of traits etc.
13:26:51 <efried> So the goal is to enumerate the scenarios and try to describe how each one would fit into that picture.
13:27:11 <efried> Now, getting this right relies *completely* on nested resource providers.
13:27:21 <efried> Which aren't done yet, but which I think will be a focus for Q.
13:27:42 <efried> If they aren't already, the need for device passthrough will be a push in that direction.
13:28:35 <edmondsw> so are we being pushed away from doing things first with the current state of things and then again later moving to resource providers?
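To make the modeling idea above concrete: a rough sketch of how a passthrough device might be represented once nested resource providers land. Every name below is hypothetical (placement reserves the CUSTOM_ prefix for operator- or driver-defined resource classes and traits); this is an illustration, not a settled design:

    compute node  (root resource provider)
    └── passthrough device  (child provider; requires nested RP support)
          inventory: CUSTOM_PCI_DEVICE = 1            (hypothetical resource class)
          traits:    CUSTOM_GPU, CUSTOM_PHYSNET_NET1  (hypothetical traits)

A flavor could then request CUSTOM_PCI_DEVICE:1 plus whichever traits it needs, and placement would pick a host with a matching child provider.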
13:28:37 <efried> Once that framework is all in place, the onus will be on individual virt drivers to do most of the work as far as inventory reporting, creation of resource classes and traits, etc.
13:28:55 <efried> What do you mean pushed?
13:29:17 <edmondsw> is this our choice, or are comments from jaypipes and others making us have to go that way?
13:30:08 <efried> So as far as the powervm driver is concerned (both in and out of tree), I believe the appropriate plan is for us to implement our hacked-up PCI passthrough using the existing PCI device management subsystem. Basically clean up the PoCs I've got already proposed.
13:30:14 <efried> and have that be our baseline for Q.
13:30:37 <efried> Then whenever the generic resource provider and placement stuff is ready, we transition. Whether that be Queens or Rocky or whatever.
13:30:50 <edmondsw> ok, I misunderstood your intentions then
13:31:01 <edmondsw> sounds good
13:31:23 <edmondsw> I like doing what we can under the current system in case the resource providers work is delayed
13:31:28 <edmondsw> but moving to that as soon as we can
13:31:39 <efried> Right; and to answer the other part of your question: yes, it's Jay et al (Dan, Ed, Chris, etc.) pushing for the way things are going to be.
13:32:11 <efried> I'm tracking it very closely, and imposing myself in the process, to make sure our particular axes are appropriately ground.
13:32:29 <edmondsw> great, as long as they're not resisting patches that will get things working under the current system
13:32:30 <efried> But so far the direction seems sane and correct and generic enough to accommodate everyone.
13:32:40 <efried> Oh, I have no idea about that.
13:33:04 <efried> We'll have to throw those at the wall and see if they stick.
13:33:08 <efried> But we can at least get it done for OOT.
13:33:08 <edmondsw> yeah
13:33:13 <edmondsw> yep
13:33:28 <efried> Which is the important thing for us.
13:33:50 <edmondsw> mdrabe need you to look at this resource providers future and assess PowerVC impacts
13:34:01 <edmondsw> we can talk more about that offline
13:34:09 <mdrabe> aye
13:34:19 <esberglu> Ready to move on?
13:34:27 <edmondsw> I think so
13:34:38 <efried> yup
13:34:40 <esberglu> #topic PowerVM CI
13:34:59 <efried> Noticed things are somewhat unhealthy at the moment.
13:35:02 <efried> At least OOT.
13:35:25 <efried> https://review.openstack.org/#/c/500099/ https://review.openstack.org/#/c/466425/
13:35:50 <esberglu> neo19 is failing to start the compute service with this error
13:36:19 <esberglu> http://paste.openstack.org/show/620409/
13:36:37 <edmondsw> http://ci-watch.tintri.com/project?project=nova says the last 7 OOT have passed
13:36:44 <esberglu> This persisted through an unstack and a stack. Anyone know what that's about? I asked in novalink with no response
13:36:50 <esberglu> As far as actual tempest runs
13:37:13 <esberglu> The fix for the REST serialization stuff doesn't appear to have solved the issue
13:37:22 <edmondsw> esberglu anytime we see HTTP 500 we will have to look at pvm-rest logs
13:37:27 <esberglu> Still seeing "The physical location code "U8247.22L.2125D5A-V2-C4" either does not exist on the system or the device associated with it is not in AVAILABLE state."
13:37:37 <esberglu> Need to link up with hsien again for that
13:38:20 <esberglu> There is this other issue that has been popping up lately
13:38:25 <esberglu> http://paste.openstack.org/show/620411/
13:39:08 <esberglu> Other than that I spent some time looking at the networking related tempest failures. It seems to be an issue with multiple tests trying to interact with the same network
13:39:25 <edmondsw> interesting...
13:39:27 <esberglu> I haven't wrapped my head around exactly what's going on there
13:39:38 <efried> esberglu That rootwrap one - is it occasional or consistent?
13:39:44 <esberglu> efried: Occasional
13:39:59 <efried> that's really weird. If there's no filter for `tee`, there's no filter for `tee`.
13:40:14 <edmondsw> could be using wildcards that sometimes match and sometimes don't
13:40:23 <edmondsw> but that is really weird
13:41:00 <edmondsw> maybe part of the rootwrap setup is sometimes failing?
13:42:33 <esberglu> I haven't been keeping this page as up to date as I should, but I started transitioning some local notes to it
13:42:33 <edmondsw> esberglu I'll try to help you with that offline
13:42:35 <esberglu> https://etherpad.openstack.org/p/powervm_tempest_failures
13:42:54 <edmondsw> ++
13:43:11 <edmondsw> esberglu could you stop taking local notes and just work out of that etherpad?
13:44:11 <esberglu> edmondsw: Yeah that's my plan. The formatting options aren't as robust as I would like but I can deal
13:44:40 <edmondsw> just as much as possible
13:45:01 <esberglu> Anyone know what that neo19 issue is about? Might just reinstall unless anyone has an idea
13:45:31 <esberglu> (IIRC this error message has been seen before and fixed via reinstall)
13:45:52 <efried> Seems like we should open a defect and have the VIOS team look at that.
13:46:09 <esberglu> efried: k
13:46:23 <esberglu> That's all for CI
13:46:24 <efried> Can we bump neo19 out of the pool in the meantime?
13:46:31 <esberglu> efried: Yeah it already is
13:46:34 <efried> coo
13:46:46 <esberglu> #topic Driver Testing
13:47:00 <esberglu> Haven't heard from jay in a while, anyone know where that testing is at?
13:47:09 <edmondsw> yeah, I've got updates here
13:47:14 <edmondsw> we have lost Jay
13:47:31 <edmondsw> he's been pulled off to other things
13:47:56 <edmondsw> We may have someone else that can help here, or we may not... I will be figuring that out this week
13:48:42 <edmondsw> longterm we can probably assume that testing will be whatever we can do as a dev team via tempest, with no specific tester assigned
13:49:17 <edmondsw> any questions?
13:49:48 <esberglu> Not atm
13:50:50 <esberglu> #topic Open Discussion
13:51:00 <esberglu> Anything else this week?
13:51:01 <edmondsw> we should all start thinking about tempest coverage and improving it where we can / should / have time
13:51:34 <edmondsw> I think that's it for me
13:51:58 <edmondsw> we have the PTG next week, so probably no meeting
13:52:17 <efried> yuh
13:52:30 <esberglu> Yep I'll cancel
13:52:51 <esberglu> Have a good week all
13:52:55 <esberglu> #endmeeting
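For reference on the intermittent rootwrap failure discussed above: oslo.rootwrap refuses to run any command that does not match an entry in its filter files, so a "no filter for tee" error means no filter defining tee was found at that moment. A minimal sketch of such an entry, assuming nova's usual filter-file layout (the actual path and filters used by the CI may differ):

    # /etc/nova/rootwrap.d/compute.filters -- path assumed, sketch only
    [Filters]
    tee: CommandFilter, tee, root

An only-sometimes-missing filter would be consistent with edmondsw's guess that part of the rootwrap setup (e.g. installing or reading the filter files) occasionally fails, rather than the filter definition itself being wrong.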