14:01:25 <esberglu> #startmeeting powervm_driver_meeting 14:01:26 <openstack> Meeting started Tue Feb 20 14:01:25 2018 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 14:01:30 <openstack> The meeting name has been set to 'powervm_driver_meeting' 14:03:11 <esberglu> edmondsw: efried: You guys around? Sorry didn't realize the recurring meeting notices were done 14:03:30 <efried> oh, yeah, I forgot because no calendar entry. 14:03:35 <efried> But I'm here. 14:04:22 <esberglu> #topic In-tree Driver 14:04:59 <esberglu> The bp looked good to me, I think edmondsw is gonna add the interface hotplug stuff still? 14:06:02 <esberglu> Only other open IT item for me is snapshot 14:06:37 <esberglu> Hoping we can get vSCSI, snapshot and localdisk in pretty early in the cycle 14:07:04 <esberglu> Because the migration stuff is gonna require a lot more work, both coding and testing 14:07:09 <esberglu> And getting CI up to speed 14:09:06 <efried> yuh 14:09:09 <esberglu> #topic Out-of-tree Driver 14:09:31 <efried> edmondsw and I learned things about MagicMock and one of our test cases yesterday. 14:10:08 <efried> about the former: it doesn't blow up when you try to iterate on it (Mock does). And about the latter: we weren't testing what we thought we were testing. Which happens a lot. 14:12:09 <esberglu> efried: You mean we have a bunch of tests not doing what we thought? 14:12:40 <efried> Probably. But at least this one for sure. 14:13:18 <efried> Interested in the recap? 14:13:44 <esberglu> efried: Yeah 14:14:20 <efried> This was for driver capabilities. We've been setting has_imagecache dynamically, because whether the driver supports it depends on which disk driver we're using. 14:14:25 <efried> Localdisk supports it, the others don't. 14:14:52 <efried> So we had a test that said assertFalse(driver.capabilities['has_imagecache']) 14:15:18 <efried> which seems reasonable enough 14:15:42 <efried> The dynamic setting used to be code in the driver that did something like: 14:15:42 <efried> self.capabilities.update(disk_drv.capabilities) 14:16:05 <efried> i.e. "take the stuff in the disk_drv.capabilities dict and overwrite/add whatever's in self.capabilities" 14:16:42 <efried> edmondsw wasn't happy with the "add" part, because it meant we were pulling capability keys into our compute driver that aren't part of the ComputeDriver interface. 14:17:11 <esberglu> yep 14:17:14 <efried> So to get rid of that, he changed it to be explicit about just pulling in has_imagecache: 14:17:14 <efried> self.capabilities['has_imagecache'] = disk_drv.capabilities['has_imagecache'] 14:17:27 <efried> Whereupon the above assertion started to fail. 14:17:40 <efried> And when he changed it to assertTrue, it passed. 14:17:50 <efried> Which should not happen. 14:19:34 <efried> If we were testing this properly, both of those modes of assignment should have been equivalent in terms of the has_imagecache key. So it should have been the same before & after. Which in fact should have been a signal that the code change was valid. 14:20:34 <esberglu> But the code change is valid, so we realized it was a problem with our test 14:20:57 <efried> Well, in broad terms, yeah. But we couldn't suss what the problem could possibly be. 14:21:21 <efried> The first thing I suspected was that one or both of those .capabilities was a Mock. But that didn't make sense, because you get an exception if you try to [key access] or .update a Mock. 14:21:31 <efried> So either code path should have raised an exception. 14:22:03 <efried> But it turns out that the disk_drv.capabilities dict was a *Magic*Mock 14:23:14 <efried> And MagicMock, in contrast with Mock, has default impls for magic methods. Like __getitem__ (returns a mock) and __iter__ (yields as if an empty list). 14:24:27 <efried> So when we were doing .update(), we were updating the driver's capabilities - where has_imagecache is defaulted to False - from an empty list of disk driver capabilities. Which left it False. 14:24:48 <efried> But when we did the assignment, we were assigning a mock to the value. And a mock evaluates to True. 14:25:34 <esberglu> Ah okay I see 14:26:39 <esberglu> I'm not seeing where in the code the capabilities are a MagicMock vs Mock 14:28:41 <efried> Yeah, that was tricky to track down. The driver is a real driver, which we were doing on purpose so we could get "real" values in the capabilities dict 14:28:42 <efried> BUT 14:29:02 <efried> the setUp for this test class is installing a fixture that actually mocks the disk driver constructor 14:29:37 <efried> So we're getting a real compute driver, but when it goes to create the disk driver, that mock patch takes over and we get a MagicMock. 14:30:29 <esberglu> Okay I follow 14:32:23 <efried> What's next? 14:32:40 <esberglu> efried: How do we fix it? 14:32:53 <efried> Don't let the disk driver get mocked. 14:33:24 <efried> I suggested moving the test case up to the first class in the suite, which doesn't do any of the fixture stuff. 14:33:57 <efried> And also adding test cases where we use different disk drivers to make sure it's True only for LocalStorage. 14:34:22 <efried> And also adding assertions to ensure that the extraneous keys aren't being created. 14:34:57 <esberglu> Sounds good 14:35:10 <esberglu> #topic Device Passthrough 14:35:12 <efried> IOW, what I'm sure edmondsw thought was going to be an ace across the net is rapidly metastasizing 14:35:58 <esberglu> Yep sounds that way 14:35:59 <efried> No news from me on dev passthrough. We're frantically writing & reviewing specs ahead of the PTG. 14:36:08 <efried> I abandoned that resource class affinity spec. 14:36:21 <efried> And put up this new one (unrelated) last night: https://review.openstack.org/#/c/546009/ 14:38:24 <esberglu> #topic PowerVM CI 14:38:31 <efried> I believe I'm still waiting on prompts from edmondsw et al for next steps on dev passthrough. 14:38:38 <efried> sorry, I'm a step behind you all day. 14:38:45 <esberglu> np 14:39:23 <esberglu> So the big thing for CI for queens is getting multinode jobs so we can do migration testing 14:39:39 <esberglu> I haven't looked into that at all really in the past, so I need to do some research there 14:39:40 <efried> Yeah. Big fun times. 14:39:55 <efried> I can tell you this: I will be of absolutely no help. 14:40:21 <efried> I have managed to go my entire career without ever migrating a partition 14:40:47 <esberglu> Huh, would have bet against that :) 14:41:22 <esberglu> The other thing for CI I wanted to bring up is that stale vopt issue we were seeing 14:41:46 <esberglu> Since fixing up the post stack cleaner, that has gone WAY down. However it did hit a handful of times in the last week 14:42:21 <esberglu> So we either need to re-explore that vopt naming change or try to figure out where the vopts are coming from now 14:42:22 <efried> interesting. 14:42:54 <efried> Did you manually clean up (or check) any stale vopts that were lying around from before we fixed the cleaner? 14:43:08 <esberglu> efried: Yes 14:43:52 <efried> So we're somehow still ending up with stale ones. 14:44:31 <esberglu> Yep, looks that way 14:44:36 <efried> I wonder if it could be timing issues with multiple tests (possibly even from multiple patches) running at once. 14:45:05 <efried> Like, the offending vopt isn't actually stale; it's just in the process of being created/mapped or unmapped/deleted when the other test hits this code path. 14:45:18 <efried> ...and it's still the naming issue. 14:45:43 <esberglu> efried: If that were the case I don't think the post stack cleaner update would have cut down the occurrences so much 14:46:13 <efried> No, I'm saying both issues that we talked about were problems. The post-stack cleaner took care of most of them, but not all. 14:46:19 <efried> Unless... is it possible the cleaner fix didn't get deployed to whatever server hit the problem? 14:46:38 <esberglu> efried: Nope, the latest of that gets pulled every run 14:47:50 <efried> How often is this happening? Did we just see this once in a little blip and never again? 14:49:31 <esberglu> 7 times since I left on wednesday 14:49:48 <esberglu> Where it was a handful a day before the cleaner fix 14:51:09 <esberglu> Either way, needs further debugging. I will look into it and let you know if I get stuck 14:51:12 <efried> Spread out across multiple nodes? 14:51:19 <esberglu> efried: Yep 14:51:23 <efried> I was thinking we could reinstate the naming fix and see if it clears up entirely. 14:52:30 <esberglu> efried: And if it doesn't, that points to a possible timing issue 14:52:36 <esberglu> I'm on board with that 14:53:11 <efried> ight. will look for that review. I need to un-abandon the renaming patch, eh? 14:53:14 * efried looks... 14:53:48 <efried> Done: https://review.openstack.org/#/c/540453/ 14:54:40 <esberglu> efried: Cool I'll put up the change to patch it in right after this 14:55:01 <esberglu> efried: That's all for me today, anything else? 14:55:42 <efried> don't think so. 14:55:49 <esberglu> #endmeeting