13:30:37 <esberglu> #startmeeting powervm_ci_meeting
13:30:38 <openstack> Meeting started Thu Feb 2 13:30:37 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:30:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:30:43 <openstack> The meeting name has been set to 'powervm_ci_meeting'
13:30:54 <esberglu> Hi
13:31:01 <xia> o/
13:31:07 <xia> Hi
13:31:11 <thorst_> o/
13:31:36 <efried> o/
13:32:08 <efried> #topic In-tree runs
13:32:26 <efried> Have we rebuilt and done runs with the latest pypowervm power-off fix?
13:32:47 <esberglu> Yeah, still hitting power-off issues
13:32:51 <efried> Wait, that was for oot
13:33:34 <esberglu> Oh yeah, that was for oot; didn't look at the topic
13:34:16 <esberglu> In-tree also has those patches picked up
13:35:07 <esberglu> In-tree needs some further whitelist work
13:36:08 <esberglu> Latest in-tree results from patch 1
13:36:11 <esberglu> http://184.172.12.213/88/391288/33/silent/nova-in-tree-pvm/98345c2/powervm_os_ci.html
13:38:25 <efried> Yeah, those port binding failures are what I expect to see any time you try to spawn a VM with that first change set.
13:38:50 <efried> In fact, anything network-related I would say indicates that test should be stricken from the whitelist.
13:39:16 <thorst_> +1
13:40:28 <esberglu> Yep
13:40:37 <efried> Server rename isn't supposed to work either; though that failure looks like the VM didn't build in the first place. Probably worth looking into that further - but again, I don't think we should be running that test.
13:40:59 <esberglu> #action: esberglu: Update whitelist based on in-tree results
13:41:30 <efried> All in all, I'm fairly pleased that we can pass 862 tests in-tree.
13:41:45 <efried> Though I suppose the vast majority of them never hit our code.
13:41:49 <thorst_> yeah...
13:42:07 <thorst_> I don't know that it's a huge success of anything other than the framework, and that our driver doesn't die on startup at the moment :-)
13:42:18 <thorst_> unless we did get that network-less test in?
13:42:33 <efried> Validating the framework is a win, in my book.
13:42:37 <thorst_> totes
13:43:06 <efried> A bigger deal than proving a real spawn, with no disk or network, IMO.
13:43:17 <thorst_> true
13:43:55 <efried> esberglu: I thought we ascertained that there was some way we could set up the networks where there were existing tests that would do real spawns - i.e. we didn't need to write our own explicit network-less spawn tests.
13:44:04 <efried> Any update on that?
13:44:25 <efried> Like, can you point to one or more of the tests in this run and say, "This is doing a spawn and succeeding"?
13:44:35 <efried> Guess I could grep the log...
13:44:43 <esberglu> tempest.api.compute.servers.test_servers.ServersTestJSON
13:44:56 <esberglu> Look at the tests under that
13:46:01 <esberglu> Yeah, I stopped looking into the network-less spawn test when we started passing other ones
13:46:57 <efried> Well, shoot, I see a ton of successful spawns in the log. This is mucho goodness.
13:47:41 <efried> 29 of them.
13:48:15 <thorst_> neat
13:48:39 <efried> So: let's get rid of failing tests that we know we don't/shouldn't support yet, and produce a list of remaining failures that we don't think should be failures, and look into those.
13:49:17 <efried> Delete the former from the whitelist, but go ahead and comment out the latter, and let's see if we can get some consistent successful runs out of that.
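For reference, a minimal sketch of the whitelist pruning plan just described: delete tests for features the driver doesn't support yet (network-related, server rename), and comment out unexplained failures so they can be re-enabled once investigated. The file name, match patterns, and test IDs below are hypothetical placeholders, not the CI's actual whitelist contents or tooling:

```python
# Hypothetical sketch of the pruning described above. Patterns for tests we
# know we don't/shouldn't support yet are deleted outright; suspect failures
# are kept but commented out pending investigation.
UNSUPPORTED = ('network', 'rename')  # placeholder substrings
SUSPECT = {'tempest.api.compute.servers.test_server_actions'}  # placeholder IDs


def prune_whitelist(in_path, out_path):
    with open(in_path) as f:
        lines = [ln.rstrip('\n') for ln in f]
    out = []
    for ln in lines:
        if any(pat in ln for pat in UNSUPPORTED):
            continue  # delete: feature not supported by the driver yet
        if any(ln.startswith(tid) for tid in SUSPECT):
            out.append('# ' + ln)  # comment out pending investigation
        else:
            out.append(ln)
    with open(out_path, 'w') as f:
        f.write('\n'.join(out) + '\n')


if __name__ == '__main__':
    # Hypothetical file names.
    prune_whitelist('tempest_whitelist.txt', 'tempest_whitelist.txt.new')
```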
13:49:35 <esberglu> Sounds good
13:49:43 <esberglu> Once that change is in I will push through some runs
13:50:07 <esberglu> Anything else for in-tree CI?
13:50:27 <efried> #topic Out-of-tree runs
13:50:43 <efried> So you said we're still seeing power-off failures? Show me.
13:51:12 <esberglu> One sec, finding the right logs
13:51:26 <efried> I'm refactoring the power on/off code. Hopefully that'll make it a lot easier to isolate where these failures happen, and make appropriate tweaks.
13:51:43 <esberglu> http://184.172.12.213/30/427930/1/check/nova-out-of-tree-pvm/5bbe540/powervm_os_ci.html
13:52:32 <efried> looking...
13:53:11 <esberglu> http://184.172.12.213/82/416882/9/check/nova-out-of-tree-pvm/5cb242a/powervm_os_ci.html
13:53:28 <esberglu> Hitting a different test case in that second one
13:56:00 <efried> So here again it looks like we're trying the power-off with force_immediate=False - which is ON_FAILURE, which is _supposed_ to retry with VSP hard shutdown on any failure. But it ain't.
13:57:05 <efried> Give me the day to finish this refactor, and then we can start verifying those code paths better.
13:58:07 <thorst_> efried: did we add logging to the power-off code?
13:58:11 <thorst_> seems like we're at that point.
13:58:11 <esberglu> Okay
13:58:29 <thorst_> especially if they ask for a force on failure
13:58:34 <thorst_> we should log a warning that we're escalating
13:59:30 <esberglu> Other than that, there is this register image test that is hitting us
13:59:51 <esberglu> And a few other ones that are failing (but may be related / side effects of power off)
13:59:54 <efried> I'm glad you asked. The power_off method was under @logcall, but without params logged. So I'm adding that. But yeah, I'm also going to add more log statements so we can follow the flow.
14:00:12 <thorst_> cool beans dude
14:00:18 <thorst_> what a weird mess
14:00:27 <thorst_> I wonder if this is a side effect of the NovaLinks being up for so long.
14:00:36 <thorst_> is there some leak in core that we're just now seeing...
14:00:58 <efried> The flow diagrams nvcastet showed us yesterday are really informative. I'm actually suspecting we're not completely honoring those - as in this case.
14:01:08 <efried> I'm going to render those ascii-wise in the docstrings.
14:01:42 <efried> I still have some questions on what paths we should take along those flows depending on the different values of force_immediate, though.
14:02:28 <thorst_> nvcastet is probably the right one for that... he's super smart.
14:02:32 <thorst_> smarter than me :-)
14:05:30 <efried> Well, I think the semantics of force_immediate were more of an artifact of how we wanted the community code to flow. So I'll need input from both of you.
14:06:34 <efried> nvcastet implied that the IBMi folks said we should "always" try VSP normal before VSP hard - so I'll want to know if that applies even if we say NO_RETRY.
14:07:20 <efried> The contract of NO_RETRY is supposedly that we *don't* go to VSP hard if the previous attempt failed.
14:07:27 <efried> So I'm stuck in the middle, at least for IBMi.
14:07:32 <efried> Anywho...
14:07:41 <efried> Are we done talking about CI?
14:08:00 <esberglu> I am
14:08:10 <thorst_> yep
14:08:23 <efried> xia, anything?
14:09:38 <xia> No, thanks
14:09:51 <esberglu> Thanks for joining
14:09:54 <esberglu> #endmeeting
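For context on the force_immediate discussion above, the escalation flow being debated looks roughly like the sketch below. This is a hedged reconstruction from the conversation, not pypowervm's actual API: the enum names mirror the TRUE/NO_RETRY/ON_FAILURE values mentioned in the log, but the function signatures, exception type, and VSP calls are placeholders.

```python
import enum
import logging

LOG = logging.getLogger(__name__)


class PowerOffError(Exception):
    """Placeholder for whatever pypowervm raises on a failed power-off."""


def vsp_normal_off(vm):
    """Placeholder for a VSP 'normal' (soft) shutdown request."""


def vsp_hard_off(vm):
    """Placeholder for a VSP 'hard' (immediate) shutdown request."""


class Force(enum.Enum):
    # Semantics as discussed in the meeting; names/values are illustrative.
    TRUE = 'true'              # go straight to VSP hard shutdown
    NO_RETRY = 'no_retry'      # one attempt only; never escalate on failure
    ON_FAILURE = 'on_failure'  # escalate to VSP hard if the soft path fails


def power_off(vm, force=Force.ON_FAILURE):
    """Sketch of the escalation contract under discussion."""
    if force is Force.TRUE:
        return vsp_hard_off(vm)
    # Per the IBMi guidance relayed above, try VSP normal first.
    try:
        return vsp_normal_off(vm)
    except PowerOffError:
        if force is Force.NO_RETRY:
            raise  # contract: do NOT escalate to VSP hard
        # thorst_'s suggestion: warn loudly when we escalate.
        LOG.warning('Soft power-off of %s failed; escalating to VSP hard '
                    'shutdown (force=%s).', vm, force)
        return vsp_hard_off(vm)
```

The open question efried raises - whether the IBMi "always try VSP normal before VSP hard" guidance overrides NO_RETRY - is exactly the except branch here: NO_RETRY's contract says re-raise, while the IBMi guidance seems to imply escalating anyway.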