13:01:52 #startmeeting powervm_driver_meeting
13:01:53 Meeting started Tue May 16 13:01:52 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:56 The meeting name has been set to 'powervm_driver_meeting'
13:02:24 o/
13:02:55 #topic In Tree Driver
13:03:01 o/
13:03:06 \o
13:03:14 No progress
13:03:32 obviously nothing was going to happen during the summit.
13:03:42 Yep that's what I thought. Anything to discuss here?
13:03:45 And at this point I wanna give everyone a bit to calm back down.
13:03:55 Nope. I'll look for an opportune time to pester again.
13:04:34 #topic Out-of-tree Driver
13:04:54 lots of reviews out. PowerVC team has been pushing up some solid patches
13:04:59 I sent efried a list to review
13:05:16 #action efried to do those
13:05:27 chhavi has also been driving a ton of the iSCSI work. She found a ton of issues that needed to be worked, changes in the REST itself, etc...
13:05:34 so great work chhavi...many many thanks
13:05:49 +1
13:07:02 I'm basically working on core issues with the QCOW handler
13:07:09 but they're transparent to the OpenStack driver
13:07:25 I have a couple of topics for the driver in general, came out of the summit. Let's have a separate "Summit takeaways" topic later.
13:07:33 k
13:07:36 I think that's all I had.
13:07:51 I'm still continuing to work in-tree backports as able. Nothing else from me
13:08:25 CI?
13:08:35 #topic PowerVM CI
13:08:58 My #1 priority this week is identifying/opening/fixing bugs that are causing tempest test failures
13:09:49 There are still some in-tree tests that fail occasionally because of other networks that exist from other tests during the time they are running
13:10:08 And a handful of problematic tests for OOT
13:10:56 Other than that I'm still working the systemd change. It's looking good for the most part.
13:11:11 I'd still like a systemd'd setup to noodle around with.
13:11:37 I've got some definite comments queued up for 5245; but will have more once I've had a chance to play.
13:11:59 However the n-dhcp service looks like it is still getting started with screen, which doesn't work with the logserver changes
13:12:12 Haven't got into a node to play around with it yet.
13:12:32 that systemd thing is weird
13:12:39 I'm not sure I'm a fan of getting rid of screen.
13:12:45 but that may just be that I'm so used to screen.
13:13:16 thorst Totally agree. I'm old, and not a fan of change. But I think once we get used to it, it'll be better.
13:13:28 The one thing I'm going to miss the most is the coloring.
13:13:30 totes...but grumble in the mean time
13:13:39 Hopefully we figure out how to get colors.
13:13:43 efried: I can spin you up a node on the prod. cloud. I'm gonna be tearing the staging down a bunch this week
13:14:07 I don't see why we can't still throw color codes into journald - mebbe it doesn't allow them?
13:14:17 I thought I was seeing it...
13:14:34 thorst Where? We haven't produced anything with journald yet, have we?
13:15:37 chhavi's env
13:15:41 iscsi debug
13:15:46 Anyway, converged logs plus the "traveling request-id" stuff jaypipes is working on ought to make debugging much nicer.
13:16:06 traveling request-id? Sounds beautiful
13:16:18 almost as nice as the service token stuff
13:16:57 it's related, actually
13:17:22 dependent on service tokens... so another reason to love those
13:17:43 heh, you know how much I love security discussions...
13:18:12 anywho...
13:18:15 anything else here?
13:18:40 Backlog of small CI issues as always. But nothing noteworthy
13:19:15 #topic Driver Testing
13:19:21 thorst #link https://review.openstack.org/#/c/464746/
13:19:26 ^^ request-id spec
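The spec linked just above defines the real mechanics; as a toy sketch only (plain stdlib logging, with a made-up request id and a hypothetical logger name), the appeal of a request id that travels with the operation is that, combined with converged logs, one grep recovers the whole chain:

```python
# Toy illustration only -- not the spec's implementation. A made-up request
# id is attached to every log record for an operation, so converged logs
# (journald or files) can be filtered down to that one request with a grep.
import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)s [%(request_id)s] %(name)s: %(message)s',
    level=logging.INFO)
base = logging.getLogger('nova_powervm.example')  # hypothetical logger name

# LoggerAdapter injects the id into each record emitted for this "request".
log = logging.LoggerAdapter(base, {'request_id': 'req-1234abcd'})
log.info('volume attach requested')
log.info('volume attach complete')
# ...then e.g. `journalctl | grep req-1234abcd` (or a grep across log files)
# pulls out every line for that one request, across services.
```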
13:20:30 Any issues or updates on the manual driver testing?
13:21:12 jay1_ ^^ ?
13:23:24 FYI guys, I gotta jump in about 20 minutes. My #3 has a track meet. (He's running the hurdles, like his ol' dad used to do ;-)
13:23:54 Can we move on, and come back to jay1_ later?
13:23:56 Alright moving on.
13:24:07 #topic Open Discussion
13:24:43 Okay, couple things came out of the summit that I wanted to document for the sake of having 'em in writing (read: before I forget all about 'em)
13:25:29 First, placement and SR-IOV. Talked with jaypipes; he agrees PCI stuff is totally broken, and would definitely be in the corner of anyone "volunteering" to fix it.
13:26:13 He seemed at least superficially sympathetic to the fact that the existing model doesn't work at all for us (or anyone who's not running on the hypervisor - access to /dev special files, pre-creation of VFs, etc.)
13:26:35 So this would be an opportunity for us to a) fix our SR-IOV story; and b) look good in the community.
13:27:02 +1
13:27:21 that should definitely be on our TODO list
13:27:57 Second, resource groups. Today, if we have 5 compute nodes all on the same SSP which has, say, 20GB of free space, and you ask the conductor how much storage you have in your cloud, it answers 5x20GB, cause it doesn't know any better.
13:28:22 efried: the update on the ISCSI verification, issue is there with attach/detach.
13:28:33 https://jazz07.rchland.ibm.com:13443/jazz/web/projects/NEO#action=com.ibm.team.workitem.viewWorkItem&id=174342
13:28:35 jay1_ Hold on, we'll come back to that.
13:28:38 I thought there was a concept of shared resources... are we not signaling something correctly there?
13:28:52 There's a spec to define resource groups within placement, seems to be mostly done. This would allow you to register the fact that all of those computes are on the same SSP, and the math would fix itself.
13:29:39 ah... so still in the works. Is the spec almost done, or the implementation?
13:29:41 bah, I'll have to find that spec later, I've got it here somewhere.
13:29:50 edmondsw I think the code is ready for use.
13:29:53 So the thing is:
13:30:28 The user can define the resource group by running commands. In that picture, we don't have to do anything - but it's the responsibility of the user to get it right.
13:30:55 However, there's a way we can tie into the placement API with code and set up this resource group ourselves from within our driver.
13:31:05 Like, when get_inventory is called.
13:31:23 yeah, we don't want users having to do that
13:32:22 Roughly, from get_inventory, we would check whether the SSP is registered (hopefully we're allowed to specify its UUID). If not, register it from this host. If so (and this might be a no-op), add this host to it.
13:32:53 jaypipes volunteered to show me where that code is, but he's in China this week, so I'll follow up next week.
13:33:16 Those were the two major takeaways from the summit.
13:33:18 from me.
13:33:34 to-dos, I should say. I had much more takeaway than that.
13:33:40 :)
13:33:47 thanks, efried
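For reference, a minimal sketch of the shape involved, assuming the Pike-era get_inventory() contract (resource class name mapped to placement inventory fields). The function name, numbers, and allocation ratios below are placeholders, not the real nova-powervm code; the point is that per-host disk is deliberately left off so the shared SSP can be reported once.

```python
# Minimal sketch, assuming the Pike-era ComputeDriver.get_inventory()
# contract (resource class -> placement inventory fields). Names and
# numbers here are placeholders, not the real nova-powervm implementation.

def build_inventory(vcpus, memory_mb):
    """Per-host inventory, leaving DISK_GB to a shared SSP provider."""
    return {
        'VCPU': {
            'total': vcpus,
            'reserved': 0,
            'min_unit': 1,
            'max_unit': vcpus,
            'step_size': 1,
            'allocation_ratio': 16.0,
        },
        'MEMORY_MB': {
            'total': memory_mb,
            'reserved': 512,
            'min_unit': 1,
            'max_unit': memory_mb,
            'step_size': 1,
            'allocation_ratio': 1.0,
        },
        # DISK_GB intentionally omitted per host: the idea discussed above
        # is to register the SSP once as a shared resource provider in
        # placement and associate every compute node in the pool with it,
        # so 5 hosts on a 20GB SSP report 20GB total rather than 5x20GB.
    }

# e.g. build_inventory(vcpus=64, memory_mb=262144)
```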
13:34:07 esberglu did you get a chance to look at that etcd stuff and see whether it would impact us?
13:34:18 or more likely, how?
13:35:11 That totally slipped my mind. Adding it to the list now
13:37:14 Also on the backlog of driver work to be done (these have been hanging out for a while, but I may as well mention them here to get them on record):
13:37:18 o Implement get_inventory(). This replaces get_available_resource().
13:37:18 o Make the compute driver mark the host unavailable if the REST API is busted.
13:37:18 o Subsume HttpNotFound exception - now available via pypowervm 1.1.4, which is through global-reqs.
13:38:03 efried that's all for the IT driver, right?
13:38:12 Both.
13:38:20 ah, true
13:39:00 Oh, also need to keep an eye on https://review.openstack.org/#/c/452958/ and remove that arg from our destroy method when it merges.
13:40:07 efried why isn't that fixing the IT driver as part of the patch?
13:40:16 I love the idea to mark the driver down if the REST API isn't working
13:40:16 edmondsw Was just about to -1 for that.
13:40:23 good
13:41:33 should we go back to the iSCSI testing now?
13:42:20 Sure. I didn't have anything else
13:42:39 #topic iSCSI testing
13:42:49 Gotta bounce - I'll have to catch up on the iSCSI stuff later. (I think I may have some to-dos wrt scrubbers there.)
13:43:06 bye efried
13:43:16 jay1_ ready to talk about iSCSI testing now
13:44:20 sure; so far the progress is that we are able to do successful deploy without NW
13:44:40 attach/detach issue is still getting fixed
13:45:09 the next target would be to try LPM.
13:46:03 jay1_ this is with a CHAP secret configured but not using CHAP, is that right?
13:46:07 multi attach has REST issues as well
13:46:38 I spoke to gfm about the CHAP issues... sounds like a couple different problems we're trying to track down and get fixed
13:46:45 glad you were able to work around that in the meantime
13:47:25 jay1_ is the bug you linked above the only other issue we're seeing?
13:47:43 do we have a bug for the CHAP issues?
13:48:13 edmondsw: yes we are not using CHAP yet
13:48:31 edmondsw: currently we have the CHAP disabled for SVC,
13:49:00 edmondsw: reason why i disabled CHAP, we were having discovery issues if we enable CHAP
13:49:09 working the multi attach, lpm, then chap can come in
13:49:16 thorst +1
13:49:16 just chipping away at all the edge cases
13:49:20 edmondsw: reason is on neo34 iscsid.conf CHAP is not configured
13:49:38 I'm trying to work CHAP in parallel with the storage guys, but that shouldn't be the focus for jay1_ or chhavi right now
13:50:03 yeah, for us we have put CHAP on the side for now, and just playing with attach/detach
13:50:05 what's the latest on the attach issues?
13:50:38 https://jazz07.rchland.ibm.com:13443/jazz/web/projects/NEO#action=com.ibm.team.workitem.viewWorkItem&id=174342
13:50:53 currently the issue which I am seeing is, if I am trying to attach say 2 volumes to the VM, the second volume is not getting discovered correctly
13:50:53 we have other open defects as well
13:51:43 jay1_: as noted earlier, please don't put links to internal IBM sites in here :-)
13:51:54 but that issue is for multi-attach
13:51:57 the problem which i suspect is, while we do the discovery we just do an iscsiadm login, i am trying to find where we can use the lun id to identify the correct lun on the same target
13:52:06 thorst: this is not multiattach
13:52:14 ooo
13:52:18 it's just straight detach
13:52:19 thorst: sure ..
13:52:19 multiattach means the same volume on multiple VMs
13:52:33 ahh, sorry I meant multiple volumes on the same VM
13:53:32 chhavi if we only attach/detach one volume, things work, but if we attach/detach a second volume we have issues... is that correct?
13:53:42 yeah
13:54:40 chhavi have you spoken to the owner of that bug? Do they have any ideas on how to fix it?
13:54:46 another use case which i have seen, is we have 2 VMs on the same host, and if you try to attach one volume on each VM, it does not discover that as well
13:55:13 edmondsw: changch is aware. He's got a backlog himself that he's working through unfortunately
13:55:19 maybe nvcastet can help him out
13:55:31 i am waiting for hsien to come, and i am also checking in parallel how to do iscsi discovery using lun-id
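One possible angle on the "identify the correct LUN on the same target" question, sketched under the assumption that the standard udev by-path links are present after the iscsiadm login; the portal, IQN, and LUN values below are illustrative, not taken from the environment above.

```python
# Rough sketch: after an iscsiadm login, locate a specific LUN on a target
# via the standard udev by-path links instead of guessing which /dev/sdX
# appeared. Assumes the usual /dev/disk/by-path naming
# (ip-<portal>-iscsi-<iqn>-lun-<n>); values shown are examples only.
import glob
import os

def find_iscsi_device(target_iqn, lun_id, portal='*'):
    """Return the /dev/sdX node backing target_iqn/lun_id, or None."""
    pattern = '/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s' % (
        portal, target_iqn, lun_id)
    for link in glob.glob(pattern):
        return os.path.realpath(link)  # resolve the symlink to /dev/sdX
    return None

# Example: a second volume attached to the same target shows up as lun 1
# print(find_iscsi_device('iqn.1992-08.com.example:target0', 1))
```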
13:55:56 in the meantime, i am updating my pypowervm review with exception handling, too many things in parallel :)
13:57:35 chhavi: yeah, we probably need to just list out the active items there. Maybe an etherpad would help keep track of it all
13:57:46 but we know we'll at least need a new pypowervm rev when all is said and done
13:59:43 we're at the top of the hour... any last words?
14:00:24 Thanks for joining
14:00:32 #endmeeting