14:01:25 <esberglu> #startmeeting powervm_driver_meeting
14:01:26 <openstack> Meeting started Tue Feb 20 14:01:25 2018 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:30 <openstack> The meeting name has been set to 'powervm_driver_meeting'
14:03:11 <esberglu> edmondsw: efried: You guys around? Sorry, didn't realize the recurring meeting notices had ended
14:03:30 <efried> oh, yeah, I forgot because no calendar entry.
14:03:35 <efried> But I'm here.
14:04:22 <esberglu> #topic In-tree Driver
14:04:59 <esberglu> The bp looked good to me. I think edmondsw is still gonna add the interface hotplug stuff?
14:06:02 <esberglu> Only other open IT item for me is snapshot
14:06:37 <esberglu> Hoping we can get vSCSI, snapshot and localdisk in pretty early in the cycle
14:07:04 <esberglu> Because the migration stuff is gonna require a lot more work, both coding and testing
14:07:09 <esberglu> And getting CI up to speed
14:09:06 <efried> yuh
14:09:09 <esberglu> #topic Out-of-tree Driver
14:09:31 <efried> edmondsw and I learned things about MagicMock and one of our test cases yesterday.
14:10:08 <efried> about the former: it doesn't blow up when you try to iterate on it (Mock does).  And about the latter: we weren't testing what we thought we were testing.  Which happens a lot.
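A minimal sketch of the Mock/MagicMock difference being described here, assuming stock unittest.mock behavior:

    from unittest import mock

    list(mock.MagicMock())   # [] -- MagicMock supplies a default __iter__
    list(mock.Mock())        # TypeError -- plain Mock has no __iter__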
14:12:09 <esberglu> efried: You mean we have a bunch of tests not doing what we thought?
14:12:40 <efried> Probably.  But at least this one for sure.
14:13:18 <efried> Interested in the recap?
14:13:44 <esberglu> efried: Yeah
14:14:20 <efried> This was for driver capabilities.  We've been setting has_imagecache dynamically, because whether the driver supports it depends on which disk driver we're using.
14:14:25 <efried> Localdisk supports it, the others don't.
14:14:52 <efried> So we had a test that said assertFalse(driver.capabilities['has_imagecache'])
14:15:18 <efried> which seems reasonable enough
14:15:42 <efried> The dynamic setting used to be code in the driver that did something like:
14:15:42 <efried> self.capabilities.update(disk_drv.capabilities)
14:16:05 <efried> i.e. "take the stuff in the disk_drv.capabilities dict and overwrite/add whatever's in self.capabilities"
14:16:42 <efried> edmondsw wasn't happy with the "add" part, because it meant we were pulling capability keys into our compute driver that aren't part of the ComputeDriver interface.
14:17:11 <esberglu> yep
14:17:14 <efried> So to get rid of that, he changed it to be explicit about just pulling in has_imagecache:
14:17:14 <efried> self.capabilities['has_imagecache'] = disk_drv.capabilities['has_imagecache']
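Side by side, the two forms behave like this on plain dicts (key names other than has_imagecache are illustrative):

    compute_caps = {'has_imagecache': False, 'supports_recreate': True}
    disk_caps = {'has_imagecache': True, 'snapshot_capable': True}

    # old form: merges every disk-driver key, so 'snapshot_capable' leaks
    # into the compute driver even though ComputeDriver never defines it
    compute_caps.update(disk_caps)

    # new form: copy only the one capability we actually advertise
    compute_caps = {'has_imagecache': False, 'supports_recreate': True}
    compute_caps['has_imagecache'] = disk_caps['has_imagecache']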
14:17:27 <efried> Whereupon the above assertion started to fail.
14:17:40 <efried> And when he changed it to assertTrue, it passed.
14:17:50 <efried> Which should not happen.
14:19:34 <efried> If we were testing this properly, both of those modes of assignment should have been equivalent in terms of the has_imagecache key.  So it should have been the same before & after.  Which in fact should have been a signal that the code change was valid.
14:20:34 <esberglu> But the code change is valid, so we realized it was a problem with our test
14:20:57 <efried> Well, in broad terms, yeah.  But we couldn't suss what the problem could possibly be.
14:21:21 <efried> The first thing I suspected was that one or both of those .capabilities was a Mock.  But that didn't make sense, because you get an exception if you try to [key access] or .update a Mock.
14:21:31 <efried> So either code path should have raised an exception.
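A rough illustration of why a plain Mock would have blown up on either path (each line raises on its own, per stock unittest.mock):

    from unittest import mock

    caps = {'has_imagecache': False}
    disk_caps = mock.Mock()

    caps.update(disk_caps)         # TypeError: Mock's keys() result isn't iterable
    disk_caps['has_imagecache']    # TypeError: Mock isn't subscriptable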
14:22:03 <efried> But it turns out that the disk_drv.capabilities dict was a *Magic*Mock
14:23:14 <efried> And MagicMock, in contrast with Mock, has default impls for magic methods, like __getitem__ (returns a mock) and __iter__ (iterates like an empty list).
14:24:27 <efried> So when we were doing .update(), we were updating the driver's capabilities - where has_imagecache is defaulted to False - from an empty list of disk driver capabilities.  Which left it False.
14:24:48 <efried> But when we did the assignment, we were assigning a mock to the value.  And a mock evaluates to True.
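The same two paths with a MagicMock standing in for disk_drv.capabilities, which is what the test was actually exercising (self-contained sketch):

    from unittest import mock

    caps = {'has_imagecache': False}
    disk_caps = mock.MagicMock()

    caps.update(disk_caps)                     # keys() iterates as empty -> no-op
    print(caps['has_imagecache'])              # False, so assertFalse passed

    caps['has_imagecache'] = disk_caps['has_imagecache']   # a child MagicMock
    print(bool(caps['has_imagecache']))        # True, so assertFalse now fails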
14:25:34 <esberglu> Ah okay I see
14:26:39 <esberglu> I'm not seeing where in the code the capabilities are a MagicMock vs Mock
14:28:41 <efried> Yeah, that was tricky to track down.  The driver is a real driver, which we were doing on purpose so we could get "real" values in the capabilities dict
14:28:42 <efried> BUT
14:29:02 <efried> the setUp for this test class is installing a fixture that actually mocks the disk driver constructor
14:29:37 <efried> So we're getting a real compute driver, but when it goes to create the disk driver, that mock patch takes over and we get a MagicMock.
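A self-contained analogue of that setup (class names are stand-ins, not the real nova-powervm ones; run as a script so the '__main__' patch target resolves): patching the disk driver class means the real compute driver's constructor call hands back a MagicMock.

    from unittest import mock

    class DiskDriver(object):                   # stand-in for the real disk driver
        capabilities = {'has_imagecache': True}

    class ComputeDriver(object):                # stand-in for the real compute driver
        def __init__(self):
            self.capabilities = {'has_imagecache': False}
            self.disk_drv = DiskDriver()        # this constructor call gets patched

    with mock.patch('__main__.DiskDriver'):     # mock.patch's default is a MagicMock
        drv = ComputeDriver()                   # real compute driver...
        print(type(drv.disk_drv))               # ...but its disk_drv is a MagicMock
        print(type(drv.disk_drv.capabilities))  # and so is its capabilities "dict"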
14:30:29 <esberglu> Okay I follow
14:32:23 <efried> What's next?
14:32:40 <esberglu> efried: How do we fix it?
14:32:53 <efried> Don't let the disk driver get mocked.
14:33:24 <efried> I suggested moving the test case up to the first class in the suite, which doesn't do any of the fixture stuff.
14:33:57 <efried> And also adding test cases where we use different disk drivers to make sure it's True only for LocalStorage.
14:34:22 <efried> And also adding assertions to ensure that the extraneous keys aren't being created.
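Roughly the shape those suggested tests might take (the _get_driver helper, disk-driver names, and base_caps here are hypothetical, just to show the intent):

    def test_capabilities_has_imagecache(self):
        # no disk-driver mocking: build the driver with a real disk driver
        drv = self._get_driver(disk_driver='localdisk')   # hypothetical helper
        self.assertTrue(drv.capabilities['has_imagecache'])

        drv = self._get_driver(disk_driver='ssp')
        self.assertFalse(drv.capabilities['has_imagecache'])

        # and make sure no extraneous disk-driver keys leaked in
        # (base_caps: the capability keys the ComputeDriver interface defines)
        self.assertEqual(set(base_caps), set(drv.capabilities))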
14:34:57 <esberglu> Sounds good
14:35:10 <esberglu> #topic Device Passthrough
14:35:12 <efried> IOW, what I'm sure edmondsw thought was going to be an ace across the net is rapidly metastasizing
14:35:58 <esberglu> Yep sounds that way
14:35:59 <efried> No news from me on dev passthrough.  We're frantically writing & reviewing specs ahead of the PTG.
14:36:08 <efried> I abandoned that resource class affinity spec.
14:36:21 <efried> And put up this new one (unrelated) last night: https://review.openstack.org/#/c/546009/
14:38:24 <esberglu> #topic PowerVM CI
14:38:31 <efried> I believe I'm still waiting on prompts from edmondsw et al for next steps on dev passthrough.
14:38:38 <efried> sorry, I'm a step behind you all day.
14:38:45 <esberglu> np
14:39:23 <esberglu> So the big thing for CI for queens is getting multinode jobs so we can do migration testing
14:39:39 <esberglu> I haven't looked into that at all really in the past, so I need to do some research there
14:39:40 <efried> Yeah.  Big fun times.
14:39:55 <efried> I can tell you this: I will be of absolutely no help.
14:40:21 <efried> I have managed to go my entire career without ever migrating a partition
14:40:47 <esberglu> Huh, would have bet against that :)
14:41:22 <esberglu> The other thing for CI I wanted to bring up is that stale vopt issue we were seeing
14:41:46 <esberglu> Since fixing up the post stack cleaner, that has gone WAY down. However, it did hit a handful of times in the last week
14:42:21 <esberglu> So we either need to re-explore that vopt naming change or try to figure out where the vopts are coming from now
14:42:22 <efried> interesting.
14:42:54 <efried> Did you manually clean up (or check) any stale vopts that were lying around from before we fixed the cleaner?
14:43:08 <esberglu> efried: Yes
14:43:52 <efried> So we're somehow still ending up with stale ones.
14:44:31 <esberglu> Yep, looks that way
14:44:36 <efried> I wonder if it could be timing issues with multiple tests (possibly even from multiple patches) running at once.
14:45:05 <efried> Like, the offending vopt isn't actually stale; it's just in the process of being created/mapped or unmapped/deleted when the other test hits this code path.
14:45:18 <efried> ...and it's still the naming issue.
14:45:43 <esberglu> efried: If that were the case I don't think the post stack cleaner update would have cut down the occurrences so much
14:46:13 <efried> No, I'm saying both issues that we talked about were problems.  The post-stack cleaner took care of most of them, but not all.
14:46:19 <efried> Unless... is it possible the cleaner fix didn't get deployed to whatever server hit the problem?
14:46:38 <esberglu> efried: Nope, the latest of that gets pulled every run
14:47:50 <efried> How often is this happening?  Did we just see this once in a little blip and never again?
14:49:31 <esberglu> 7 times since I left on Wednesday
14:49:48 <esberglu> Where it was a handful a day before the cleaner fix
14:51:09 <esberglu> Either way, needs further debugging. I will look into it and let you know if I get stuck
14:51:12 <efried> Spread out across multiple nodes?
14:51:19 <esberglu> efried: Yep
14:51:23 <efried> I was thinking we could reinstate the naming fix and see if it clears up entirely.
14:52:30 <esberglu> efried: And if it doesn't, that points to a possible timing issue
14:52:36 <esberglu> I'm on board with that
14:53:11 <efried> ight.  will look for that review.  I need to un-abandon the renaming patch, eh?
14:53:14 * efried looks...
14:53:48 <efried> Done: https://review.openstack.org/#/c/540453/
14:54:40 <esberglu> efried: Cool I'll put up the change to patch it in right after this
14:55:01 <esberglu> efried: That's all for me today, anything else?
14:55:42 <efried> don't think so.
14:55:49 <esberglu> #endmeeting