13:01:52 <esberglu> #startmeeting powervm_driver_meeting
13:01:53 <openstack> Meeting started Tue May 16 13:01:52 2017 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:56 <openstack> The meeting name has been set to 'powervm_driver_meeting'
13:02:24 <edmondsw> o/
13:02:55 <esberglu> #topic In Tree Driver
13:03:01 <efried> o/
13:03:06 <thorst> \o
13:03:14 <efried> No progress
13:03:32 <efried> obviously nothing was going to happen during the summit.
13:03:42 <esberglu> Yep that's what I thought. Anything to discuss here?
13:03:45 <efried> And at this point I wanna give everyone a bit to calm back down.
13:03:55 <efried> Nope.  I'll look for an opportune time to pester again.
13:04:34 <esberglu> #topic Out-of-tree Driver
13:04:54 <thorst> lots of reviews out.  PowerVC team has been pushing up some solid patches
13:04:59 <thorst> I sent efried a list to review
13:05:16 <efried> #action efried to do those
13:05:27 <thorst> chhavi has also been driving a ton of the iSCSI work.  She found a ton of issues that needed to be worked, changes in the REST itself, etc...
13:05:34 <thorst> so great work chhavi...many many thanks
13:05:49 <efried> +1
13:07:02 <thorst> I'm basically working on core issues with the QCOW handler
13:07:09 <thorst> but they're transparent to the OpenStack driver
13:07:25 <efried> I have a couple of topics for the driver in general, came out of the summit.  Let's have a separate "Summit takeaways" topic later.
13:07:33 <thorst> k
13:07:36 <thorst> I think that's all I had.
13:07:51 <esberglu> I'm still working on in-tree backports as I'm able. Nothing else from me
13:08:25 <efried> CI?
13:08:35 <esberglu> #topic PowerVM CI
13:08:58 <esberglu> My #1 priority this week is identifying/opening/fixing bugs that are causing tempest test failures
13:09:49 <esberglu> There are still some in-tree tests that fail occasionally because of networks left over from other tests running at the same time
13:10:08 <esberglu> And a handful of problematic tests for OOT
13:10:56 <esberglu> Other than that I'm still working the systemd change. It's looking good for the most part.
13:11:11 <efried> I'd still like a systemd'd setup to noodle around with.
13:11:37 <efried> I've got some definite comments queued up for 5245; but will have more once I've had a chance to play.
13:11:59 <esberglu> However the n-dhcp service looks like it is still getting started with screen, which doesn't work with the logserver changes
13:12:12 <esberglu> Haven't got into a node to play around with it yet.
13:12:32 <thorst> that systemd thing is weird
13:12:39 <thorst> I'm not sure I'm a fan of getting rid of screen.
13:12:45 <thorst> but that may just be that I'm so used to screen.
13:13:16 <efried> thorst Totally agree.  I'm old, and not a fan of change.  But I think once we get used to it, it'll be better.
13:13:28 <efried> The one thing I'm going to miss the most is the coloring.
13:13:30 <thorst> totes...but grumble in the meantime
13:13:39 <efried> Hopefully we figure out how to get colors.
13:13:43 <esberglu> efried: I can spin you up a node on the prod. cloud. I'm gonna be tearing the staging down a bunch this week
13:14:07 <efried> I don't see why we can't still throw color codes into journald - mebbe it doesn't allow them?
13:14:17 <thorst> I thought I was seeing it...
13:14:34 <efried> thorst Where?  We haven't produced anything with journald yet, have we?
13:15:37 <thorst> chhavi's env
13:15:41 <thorst> iscsi debug
13:15:46 <efried> Anyway, converged logs plus the "traveling request-id" stuff jaypipes is working on ought to make debugging much nicer.
13:16:06 <thorst> traveling request-id?  Sounds beautiful
13:16:18 <thorst> almost as nice as the service token stuff
13:16:57 <edmondsw> it's related, actually
13:17:22 <edmondsw> dependent on service tokens... so another reason to love those
13:17:43 <thorst> heh, you know how much I love security discussions...
13:18:12 <thorst> anywho...
13:18:15 <thorst> anything else here?
13:18:40 <esberglu> Backlog of small CI issues as always. But nothing noteworthy
13:19:15 <esberglu> #topic Driver Testing
13:19:21 <efried> thorst #link https://review.openstack.org/#/c/464746/
13:19:26 <efried> ^^ request-id spec
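(For context on the "traveling request-id" idea in that spec: a rough sketch of the intent, not the actual implementation. The header name X-OpenStack-Request-ID is the conventional one; the helper functions below are placeholders for illustration only.)

```python
# Hypothetical sketch: carry the caller's request id on every downstream
# call so converged logs can be correlated end to end.
import uuid
import requests

REQUEST_ID_HEADER = 'X-OpenStack-Request-ID'  # assumed header name

def handle_incoming(headers):
    # Reuse the caller's id if one was supplied; otherwise mint a new one.
    return headers.get(REQUEST_ID_HEADER, 'req-%s' % uuid.uuid4())

def call_downstream(url, req_id):
    # Forward the same id so the downstream service logs it too.
    return requests.get(url, headers={REQUEST_ID_HEADER: req_id})
```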
13:20:30 <esberglu> Any issues or updates on the manual driver testing?
13:21:12 <efried> jay1_ ^^ ?
13:23:24 <efried> FYI guys, I gotta jump in about 20 minutes.  My #3 has a track meet.  (He's running the hurdles, like his ol' dad used to do ;-)
13:23:54 <efried> Can we move on, and come back to jay1_ later?
13:23:56 <esberglu> Alright moving on.
13:24:07 <esberglu> #topic Open Discussion
13:24:43 <efried> Okay, couple things came out of the summit that I wanted to document for the sake of having 'em in writing (read: before I forget all about 'em)
13:25:29 <efried> First, placement and SR-IOV.  Talked with jaypipes; he agrees PCI stuff is totally broken, and would definitely be in the corner of anyone "volunteering" to fix it.
13:26:13 <efried> He seemed at least superficially sympathetic to the fact that the existing model doesn't work at all for us (or anyone who's not running on the hypervisor - access to /dev special files, pre-creation of VFs, etc.)
13:26:35 <efried> So this would be an opportunity for us to a) fix our SR-IOV story; and b) look good in the community.
13:27:02 <edmondsw> +1
13:27:21 <edmondsw> that should definitely be on our TODO list
13:27:57 <efried> Second, resource groups.  Today, if we have 5 compute nodes all on the same SSP which has, say, 20GB of free space, and you ask the conductor how much storage you have in your cloud, it answers 5x20GB, cause it doesn't know any better.
13:28:22 <jay1_> efried: the update on the ISCSI verification, issue is there with attach/detach.
13:28:33 <jay1_> https://jazz07.rchland.ibm.com:13443/jazz/web/projects/NEO#action=com.ibm.team.workitem.viewWorkItem&id=174342
13:28:35 <efried> jay1_ Hold on, we'll come back to that.
13:28:38 <edmondsw> I thought there was a concept of shared resources... are we not signaling something correctly there?
13:28:52 <efried> There's a spec to define resource groups within placement, seems to be mostly done.  This would allow you to register the fact that all of those computes are on the same SSP, and the math would fix itself.
13:29:39 <edmondsw> ah... so still in the works. Is the spec almost done, or the implementation?
13:29:41 <efried> bah, I'll have to find that spec later, I've got it here somewhere.
13:29:50 <efried> edmondsw I think the code is ready for use.
13:29:53 <efried> So the thing is:
13:30:28 <efried> The user can define the resource group by running commands.  In that picture, we don't have to do anything - but it's the responsibility of the user to get it right.
13:30:55 <efried> However, there's a way we can tie into the placement API with code and set up this resource group ourselves from within our driver.
13:31:05 <efried> Like, when get_inventory is called.
13:31:23 <edmondsw> yeah, we don't want users having to do that
13:32:22 <efried> Roughly, from get_inventory, we would check whether the SSP is registered (hopefully we're allowed to specify its UUID).  If not, register it from this host.  If so (and this might be a no-op), add this host to it.
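(A minimal sketch of what that flow could look like, under several assumptions: `placement` is a requests-style client rooted at the placement endpoint, the SSP UUID can be reused as the provider UUID, and aggregate handling is simplified. This is not the actual driver code.)

```python
# Hypothetical sketch: register the SSP as a shared resource provider from
# get_inventory() so placement stops counting the same disk once per host.
SHARES_TRAIT = 'MISC_SHARES_VIA_AGGREGATE'  # standard placement trait name

def ensure_ssp_provider(placement, ssp_uuid, ssp_name, host_rp_uuid, agg_uuid):
    # Register the SSP provider if it doesn't exist yet; no-op otherwise.
    if placement.get('/resource_providers/%s' % ssp_uuid).status_code == 404:
        placement.post('/resource_providers',
                       json={'uuid': ssp_uuid, 'name': ssp_name})
        # Flag it as sharing its inventory with providers in its aggregates.
        placement.put('/resource_providers/%s/traits' % ssp_uuid,
                      json={'traits': [SHARES_TRAIT],
                            'resource_provider_generation': 0})
    # Put the SSP provider and this compute's provider in the same aggregate.
    # (Simplified: a real version would merge with any existing aggregates.)
    for rp_uuid in (ssp_uuid, host_rp_uuid):
        placement.put('/resource_providers/%s/aggregates' % rp_uuid,
                      json=[agg_uuid])
```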
13:32:53 <efried> jaypipes volunteered to show me where that code is, but he's in China this week, so I'll follow up next week.
13:33:16 <efried> Those were the two major takeaways from the summit.
13:33:18 <efried> from me.
13:33:34 <efried> to-dos, I should say.  I had much more takeaway than that.
13:33:40 <edmondsw> :)
13:33:47 <edmondsw> thanks, efried
13:34:07 <edmondsw> esberglu did you get a chance to look at that etcd stuff and see whether it would impact us?
13:34:18 <edmondsw> or more likely, how?
13:35:11 <esberglu> That totally slipped my mind. Adding it to the list now
13:37:14 <efried> Also on the backlog of driver work to be done (these have been hanging out for a while, but I may as well mention them here to get them on record):
13:37:18 <efried> o Implement get_inventory().  This replaces get_available_resource().
13:37:18 <efried> o Make the compute driver mark the host unavailable if the REST API is busted.
13:37:18 <efried> o Subsume HttpNotFound exception - now available via pypowervm 1.1.4, which is through global-reqs.
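(Regarding the first item above: a rough sketch of the get_inventory() shape as the compute driver hook that replaces get_available_resource() for placement reporting. The values, ratios, and the _get_host_stats helper are placeholders, not the real PowerVM driver code.)

```python
def get_inventory(self, nodename):
    # Hypothetical helper; the real driver would pull host stats from REST.
    stats = self._get_host_stats(nodename)
    return {
        'VCPU': {
            'total': stats['vcpus'], 'reserved': 0,
            'min_unit': 1, 'max_unit': stats['vcpus'],
            'step_size': 1, 'allocation_ratio': 16.0,
        },
        'MEMORY_MB': {
            'total': stats['memory_mb'], 'reserved': 512,
            'min_unit': 1, 'max_unit': stats['memory_mb'],
            'step_size': 1, 'allocation_ratio': 1.5,
        },
        'DISK_GB': {
            'total': stats['disk_gb'], 'reserved': 0,
            'min_unit': 1, 'max_unit': stats['disk_gb'],
            'step_size': 1, 'allocation_ratio': 1.0,
        },
    }
```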
13:38:03 <edmondsw> efried that's all for the IT driver, right?
13:38:12 <efried> Both.
13:38:20 <edmondsw> ah, true
13:39:00 <efried> Oh, also need to keep an eye on https://review.openstack.org/#/c/452958/ and remove that arg from our destroy method when it merges.
13:40:07 <edmondsw> efried why isn't that fixing the IT driver as part of the patch?
13:40:16 <thorst> I love the idea to mark the driver down if the REST API isn't working
13:40:16 <efried> edmondsw Was just about to -1 for that.
13:40:23 <edmondsw> good
13:41:33 <edmondsw> should we go back to the iSCSI testing now?
13:42:20 <esberglu> Sure. I didn't have anything else
13:42:39 <edmondsw> #topic iSCSI testing
13:42:49 <efried> Gotta bounce - I'll have to catch up on the iSCSI stuff later.  (I think I may have some to-dos wrt scrubbers there.)
13:43:06 <edmondsw> bye efried
13:43:16 <edmondsw> jay1_ ready to talk about iSCSI testing now
13:44:20 <jay1_> sure; so far the progress is that we are able to do a successful deploy without NW
13:44:40 <jay1_> the attach/detach issue is still being fixed
13:45:09 <jay1_> the next target would be to try LPM.
13:46:03 <edmondsw> jay1_ this is with a CHAP secret configured but not using CHAP, is that right?
13:46:07 <thorst> multi attach has REST issues as well
13:46:38 <edmondsw> I spoke to gfm about the CHAP issues... sounds like a couple different problems we're trying to track down and get fixed
13:46:45 <edmondsw> glad you were able to work around that in the meantime
13:47:25 <edmondsw> jay1_ is the bug you linked above the only other issue we're seeing?
13:47:43 <edmondsw> do we have a bug for the CHAP issues?
13:48:13 <jay1_> edmondsw: yes we are not using CHAP yet
13:48:31 <chhavi> edmondsw: currently we have CHAP disabled for SVC
13:49:00 <chhavi> edmondsw: the reason I disabled CHAP is that we were having discovery issues when it was enabled
13:49:09 <thorst> working the multi attach, lpm, then chap can come in
13:49:16 <edmondsw> thorst +1
13:49:16 <thorst> just chipping away at all the edge cases
13:49:20 <chhavi> edmondsw: the reason is that CHAP is not configured in iscsid.conf on neo34
13:49:38 <edmondsw> I'm trying to work CHAP in parallel with the storage guys, but that shouldn't be the focus for jay1_ or chhavi right now
13:50:03 <chhavi> yeah, we have put CHAP aside for now and are just playing with attach/detach
13:50:05 <edmondsw> what's the latest on the attach issues?
13:50:38 <jay1_> https://jazz07.rchland.ibm.com:13443/jazz/web/projects/NEO#action=com.ibm.team.workitem.viewWorkItem&id=174342
13:50:53 <chhavi> currently the issue I am seeing is: if I try to attach, say, 2 volumes to the VM, the second volume is not getting discovered correctly
13:50:53 <jay1_> we have other open defect as well
13:51:43 <thorst> jay1_: as noted earlier, please don't put links to internal IBM sites in here  :-)
13:51:54 <thorst> but that issue is for multi-attach
13:51:57 <chhavi> the problem I suspect is that during discovery we just do an iscsiadm login; I am trying to find where we can use the LUN id to identify the correct LUN on the same target
13:52:06 <chhavi> thorst: this is not multiattach
13:52:14 <thorst> ooo
13:52:18 <thorst> its just straight detach
13:52:19 <jay1_> thorst: sure ..
13:52:19 <chhavi> multiattach means the same volume on multiple VMs
13:52:33 <thorst> ahh, sorry I meant multiple volumes on same vm
13:53:32 <edmondsw> chhavi if we only attach/detach one volume, things work, but if we attach/detach a second volume we have issues... is that correct?
13:53:42 <chhavi> yeah
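(One way to pick the right LUN on an already-logged-in target, sketched under assumptions: it relies on the usual udev /dev/disk/by-path symlink convention after an `iscsiadm ... --login`, and the portal/IQN/LUN values are placeholders, not taken from this environment.)

```python
import glob
import os

def find_iscsi_device(portal, iqn, lun):
    # udev names the block device by portal, target IQN, and LUN number,
    # so a specific LUN can be matched instead of taking the first device.
    pattern = '/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s' % (portal, iqn, lun)
    matches = glob.glob(pattern)
    # Resolve the symlink to the actual device node (e.g. /dev/sdb).
    return os.path.realpath(matches[0]) if matches else None

# Example with placeholder values:
# find_iscsi_device('192.168.1.10:3260', 'iqn.1986-03.com.ibm:2145.svc.node1', 2)
```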
13:54:40 <edmondsw> chhavi have you spoken to the owner of that bug? Do they have any ideas on how to fix it?
13:54:46 <chhavi> another use case I have seen: we have 2 VMs on the same host, and if you try to attach one volume to each VM, it does not discover those either
13:55:13 <thorst> edmondsw: changch is aware.  He's got a backlog himself that he's working through unfortunately
13:55:19 <thorst> maybe nvcastet can help him out
13:55:31 <chhavi> I am waiting for hsien to come, and I am also checking in parallel how to do iSCSI discovery using the LUN id
13:55:56 <chhavi> in the meantime, I am updating my pypowervm review with exception handling; too many things in parallel :)
13:57:35 <thorst> chhavi: yeah, we probably need to just list out the active items there.  Maybe an etherpad would help keep track of it all
13:57:46 <thorst> but we know we'll at least need a new pypowervm rev when all is said and done
13:59:43 <edmondsw> we're at the top of the hour... any last words?
14:00:24 <esberglu> Thanks for joining
14:00:32 <esberglu> #endmeeting