13:30:37 <esberglu> #startmeeting powervm_ci_meeting
13:30:38 <openstack> Meeting started Thu Feb  2 13:30:37 2017 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:30:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:30:43 <openstack> The meeting name has been set to 'powervm_ci_meeting'
13:30:54 <esberglu> Hi
13:31:01 <xia> o/
13:31:07 <xia> Hi
13:31:11 <thorst_> o/
13:31:36 <efried> o/
13:32:08 <efried> #topic In-tree runs
13:32:26 <efried> Have we rebuilt and done runs with the latest pypowervm power-off fix?
13:32:47 <esberglu> Yeah, still hitting power off issues
13:32:51 <efried> Wait, that was for oot
13:33:34 <esberglu> Oh yeah for oot, didn't look at topic
13:34:16 <esberglu> In tree also has those change patches picked in
13:35:07 <esberglu> In tree needs some further whitelist work
13:36:08 <esberglu> Latest in-tree results from patch 1
13:36:11 <esberglu>
13:38:25 <efried> Yeah, those port binding failures are what I expect to see any time you try to spawn a VM with that first change set.
13:38:50 <efried> In fact, anything network-related I would say indicates that test should be stricken from the whitelist.
13:39:16 <thorst_> +1
13:40:28 <esberglu> Yep
13:40:37 <efried> Server rename isn't supposed to work either; though that failure looks like the VM didn't build in the first place.  Probably worth looking into that further - but again, I don't think we should be running that test.
13:40:59 <esberglu> #action: esberglu: Update whitelist based on In Tree results
13:41:30 <efried> All in all, I'm fairly pleased that we can pass 862 tests in-tree.
13:41:45 <efried> Though I suppose the vast majority of them never hit our code.
13:41:49 <thorst_> yeah...
13:42:07 <thorst_> I don't know that it's a huge success of anything other than the framework, and that our driver doesn't die on startup at the moment  :-)
13:42:18 <thorst_> unless we did get that network-less test in?
13:42:33 <efried> Validating the framework is a win, in my book.
13:42:37 <thorst_> totes
13:43:06 <efried> A bigger deal than proving a real spawn, with no disk or network, IMO.
13:43:17 <thorst_> true
13:43:55 <efried> esberglu I thought we ascertained that there was some way we could set up the networks where there were existing tests that would do real spawns - i.e. we didn't need to write our own explicit network-less spawn tests.
13:44:04 <efried> Any update on that statement?
13:44:25 <efried> Like, can you point to one or more of the tests in this run and say, "This is doing a spawn and succeeding"?
13:44:35 <efried> Guess I could grep the log...
13:44:43 <esberglu> tempest.api.compute.servers.test_servers.ServersTestJSON
13:44:56 <esberglu> Look at the tests under that
13:46:01 <esberglu> Yeah I stopped looking into the networkless spawn test when we started passing other ones
13:46:57 <efried> Well, shoot, I see a ton of successful spawns in the log.  This is mucho goodness.
13:47:41 <efried> 29 of them.
13:48:15 <thorst_> neat
13:48:39 <efried> So: let's get rid of failing tests that we know we don't/shouldn't support yet, and produce a list of remaining failures that we don't think should be failures, and look into those.
13:49:17 <efried> Delete the former from the whitelist, but go ahead and comment out the latter, and let's see if we can get some consistent successful runs out of that.
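[editor's note: for context, a hedged illustration of what the whitelist edit discussed above might look like. Tempest whitelists are plain-text files of test-name regexes; the exact test names below are taken from this log or are placeholders, not a claim about the real whitelist contents.]

```
# Keep: passing spawn tests
tempest.api.compute.servers.test_servers.ServersTestJSON

# Commented out (unexpected failure, under investigation):
# tempest.api.compute.servers.test_server_actions.<some_test>

# Network-related and rename tests deleted entirely, since the
# in-tree driver does not support them yet.
```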
13:49:35 <esberglu> Sounds good
13:49:43 <esberglu> Once that change is in I will push through some runs
13:50:07 <esberglu> Anything else for in tree CI?
13:50:27 <efried> #topic Out-of-tree runs
13:50:43 <efried> So you said we're still seeing power-off failures?  Show me.
13:51:12 <esberglu> One sec, finding the right logs
13:51:26 <efried> I'm refactoring the power on/off code.  Hopefully that'll make it a lot easier to isolate where these failures happen, and make appropriate tweaks.
13:51:43 <esberglu>
13:52:32 <efried> looking...
13:53:11 <esberglu>
13:53:28 <esberglu> Hitting a different test case in that second one
13:56:00 <efried> So here again it looks like we're trying the power-off with force_immediate=False - which is ON_FAILURE, which is _supposed_ to retry with vsp hard shutdown on any failure.  But it ain't.
13:57:05 <efried> Give me the day to finish this refactor, and then we can start verifying those code paths better.
13:58:07 <thorst_> efried: did we add logging to the power off code?
13:58:11 <thorst_> seems like we're at that point.
13:58:11 <esberglu> Okay
13:58:29 <thorst_> especially if they ask for a force on failure
13:58:34 <thorst_> we should log warn that we're escalating
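[editor's note: a minimal sketch of the escalation flow being discussed, including the warning log thorst_ asks for. All names here (`ForceImmediate`, `_vsp_normal`, `_vsp_hard`, `PowerOffFailure`) are illustrative, not pypowervm's actual API.]

```python
import enum
import logging

LOG = logging.getLogger(__name__)


class ForceImmediate(enum.Enum):
    TRUE = 'true'            # go straight to VSP hard shutdown
    NO_RETRY = 'no_retry'    # one normal attempt; never escalate
    ON_FAILURE = 'on_failure'  # normal first; hard shutdown on failure


class PowerOffFailure(Exception):
    """Raised when a shutdown attempt fails."""


def _vsp_normal(part):
    # Stub: simulate a normal shutdown that fails, as seen in the CI runs.
    raise PowerOffFailure(part)


def _vsp_hard(part):
    # Stub: a hard shutdown that succeeds.
    return "hard-off:%s" % part


def power_off(part, force_immediate=ForceImmediate.ON_FAILURE):
    """Sketch of the intended flow: try normal, escalate per force_immediate."""
    LOG.info("power_off requested for %s (force_immediate=%s)",
             part, force_immediate.name)
    if force_immediate is ForceImmediate.TRUE:
        return _vsp_hard(part)
    try:
        return _vsp_normal(part)
    except PowerOffFailure:
        if force_immediate is ForceImmediate.ON_FAILURE:
            # Log the escalation so failures like the ones in this
            # meeting are traceable from the logs.
            LOG.warning("Normal shutdown of %s failed; escalating to "
                        "VSP hard shutdown", part)
            return _vsp_hard(part)
        raise  # NO_RETRY: surface the failure without escalating
```

Under this sketch, ON_FAILURE escalates with a warning while NO_RETRY re-raises, which is the contract efried describes below for NO_RETRY.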
13:59:30 <esberglu> Other than that there is this register image test that is hitting us
13:59:51 <esberglu> And a few other ones that are failing (but may be related / side effects of power off)
13:59:54 <efried> I'm glad you asked.  The power_off method was under @logcall, but without params logged.  So I'm adding that.  But yeah, I'm also going to add more log statements so we can follow the flow.
14:00:12 <thorst_> cool beans dude
14:00:18 <thorst_> what a weird mess
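[editor's note: a hedged sketch of what a `@logcall`-style decorator that also logs parameters could look like; this is an assumption for illustration, not pypowervm's actual implementation.]

```python
import functools
import logging

LOG = logging.getLogger(__name__)


def logcall(func):
    """Log entry (with args) and exit of the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Logging the parameters is the addition efried describes above.
        LOG.debug("Entering %s args=%s kwargs=%s",
                  func.__name__, args, kwargs)
        try:
            return func(*args, **kwargs)
        finally:
            LOG.debug("Exiting %s", func.__name__)
    return wrapper
```

Usage would look like `@logcall` above `power_off`, so every call records the `force_immediate` value actually passed in.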
14:00:27 <thorst_> I wonder if this is a side effect of the NovaLink's being up for so long.
14:00:36 <thorst_> is there some leak in core that we're just now seeing...
14:00:58 <efried> The flow diagrams nvcastet showed us yesterday are really informative.  I'm actually suspecting we're not completely honoring those - as in this case.
14:01:08 <efried> I'm going to render those ascii-wise in the docstrings.
14:01:42 <efried> I still have some questions on what paths we should take along those flows depending on the different values of force_immediate, though.
14:02:28 <thorst_> nvcastet is probably the right one for that...he's super smart.
14:02:32 <thorst_> smarter than me :-)
14:05:30 <efried> Well, I think the semantics of force_immediate were more of an artifact of how we wanted the community code to flow.  So I'll need input from both of you.
14:06:34 <efried> nvcastet implied that the IBMi folks said we should "always" try vsp normal before vsp hard - so I'll want to know if that applies even if we say NO_RETRY.
14:07:20 <efried> The contract of NO_RETRY is supposedly that we *don't* go to vsp hard if the previous attempt failed.
14:07:27 <efried> So I'm stuck in the middle, at least for IBMi.
14:07:32 <efried> Anywho...
14:07:41 <efried> Are we done talking about CI?
14:08:00 <esberglu> I am
14:08:10 <thorst_> yep
14:08:23 <efried> xia, anything?
14:09:38 <xia> No, thanks
14:09:51 <esberglu> Thanks for joining
14:09:54 <esberglu> #endmeeting