13:30:37 #startmeeting powervm_ci_meeting
13:30:38 Meeting started Thu Feb 2 13:30:37 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:30:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:30:43 The meeting name has been set to 'powervm_ci_meeting'
13:30:54 Hi
13:31:01 o/
13:31:07 Hi
13:31:11 o/
13:31:36 o/
13:32:08 #topic In-tree runs
13:32:26 Have we rebuilt and done runs with the latest pypowervm power-off fix?
13:32:47 Yeah, still hitting power off issues
13:32:51 Wait, that was for oot
13:33:34 Oh yeah, for oot; didn't look at the topic
13:34:16 In-tree also has those patches picked up
13:35:07 In-tree needs some further whitelist work
13:36:08 Latest in-tree results from patch 1
13:36:11 http://184.172.12.213/88/391288/33/silent/nova-in-tree-pvm/98345c2/powervm_os_ci.html
13:38:25 Yeah, those port binding failures are what I expect to see any time you try to spawn a VM with that first change set.
13:38:50 In fact, anything network-related, I would say, indicates that test should be stricken from the whitelist.
13:39:16 +1
13:40:28 Yep
13:40:37 Server rename isn't supposed to work either; though that failure looks like the VM didn't build in the first place. Probably worth looking into that further - but again, I don't think we should be running that test.
13:40:59 #action esberglu: Update whitelist based on in-tree results
13:41:30 All in all, I'm fairly pleased that we can pass 862 tests in-tree.
13:41:45 Though I suppose the vast majority of them never hit our code.
13:41:49 yeah...
13:42:07 I don't know that it's a huge success of anything other than the framework, and that our driver doesn't die on startup at the moment :-)
13:42:18 unless we did get that network-less test in?
13:42:33 Validating the framework is a win, in my book.
13:42:37 totes
13:43:06 A bigger deal than proving a real spawn, with no disk or network, IMO.
13:43:17 true
13:43:55 esberglu: I thought we ascertained that there was some way we could set up the networks where there were existing tests that would do real spawns - i.e. we didn't need to write our own explicit network-less spawn tests.
13:44:04 Any update on that statement?
13:44:25 Like, can you point to one or more of the tests in this run and say, "This is doing a spawn and succeeding"?
13:44:35 Guess I could grep the log...
13:44:43 tempest.api.compute.servers.test_servers.ServersTestJSON
13:44:56 Look at the tests under that
13:46:01 Yeah, I stopped looking into the network-less spawn test when we started passing other ones
13:46:57 Well, shoot, I see a ton of successful spawns in the log. This is mucho goodness.
13:47:41 29 of them.
13:48:15 neat
13:48:39 So: let's get rid of failing tests that we know we don't/shouldn't support yet, and produce a list of remaining failures that we don't think should be failures, and look into those.
13:49:17 Delete the former from the whitelist, but go ahead and comment out the latter, and let's see if we can get some consistent successful runs out of that.
13:49:35 Sounds good
13:49:43 Once that change is in I will push through some runs
13:50:07 Anything else for in-tree CI?
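A rough sketch of the whitelist cleanup action above, assuming the CI's tempest whitelist is a regex-per-line file where # starts a comment (that format is an assumption here, not confirmed in the meeting): entries for tests the first change set can't support are deleted outright, while failures still under investigation are commented out rather than removed. Only ServersTestJSON comes from the discussion; the other names below are hypothetical placeholders.

    # Known-good: these drive real spawns and pass with no disk/network.
    tempest\.api\.compute\.servers\.test_servers\.ServersTestJSON

    # Network-related tests and server rename: deleted outright (not supported
    # by patch 1), so they simply no longer appear in this file.

    # Failures we don't think should be failures: commented out until
    # investigated, then either fixed or re-enabled.
    # tempest\.api\.compute\.servers\.test_server_actions\.ServerActionsTestJSON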
13:50:27 #topic Out-of-tree runs
13:50:43 So you said we're still seeing power-off failures? Show me.
13:51:12 One sec, finding the right logs
13:51:26 I'm refactoring the power on/off code. Hopefully that'll make it a lot easier to isolate where these failures happen, and make appropriate tweaks.
13:51:43 http://184.172.12.213/30/427930/1/check/nova-out-of-tree-pvm/5bbe540/powervm_os_ci.html
13:52:32 looking...
13:53:11 http://184.172.12.213/82/416882/9/check/nova-out-of-tree-pvm/5cb242a/powervm_os_ci.html
13:53:28 Hitting a different test case in that second one
13:56:00 So here again it looks like we're trying the power-off with force_immediate=False - which is ON_FAILURE, which is _supposed_ to retry with vsp hard shutdown on any failure. But it ain't.
13:57:05 Give me the day to finish this refactor, and then we can start verifying those code paths better.
13:58:07 efried: did we add logging to the power off code?
13:58:11 seems like we're at that point.
13:58:11 Okay
13:58:29 especially if they ask for a force on failure
13:58:34 we should log a warning that we're escalating
13:59:30 Other than that, there is this register image test that is hitting us
13:59:51 And a few other ones that are failing (but may be related / side effects of power off)
13:59:54 I'm glad you asked. The power_off method was under @logcall, but without params logged. So I'm adding that. But yeah, I'm also going to add more log statements so we can follow the flow.
14:00:12 cool beans dude
14:00:18 what a weird mess
14:00:27 I wonder if this is a side effect of the NovaLinks being up for so long.
14:00:36 is there some leak in core that we're just now seeing...
14:00:58 The flow diagrams nvcastet showed us yesterday are really informative. I'm actually suspecting we're not completely honoring those - as in this case.
14:01:08 I'm going to render those ascii-wise in the docstrings.
14:01:42 I still have some questions on what paths we should take along those flows depending on the different values of force_immediate, though.
14:02:28 nvcastet is probably the right one for that... he's super smart.
14:02:32 smarter than me :-)
14:05:30 Well, I think the semantics of force_immediate were more of an artifact of how we wanted the community code to flow. So I'll need input from both of you.
14:06:34 nvcastet implied that the IBMi folks said we should "always" try vsp normal before vsp hard - so I'll want to know if that applies even if we say NO_RETRY.
14:07:20 The contract of NO_RETRY is supposedly that we *don't* go to vsp hard if the previous attempt failed.
14:07:27 So I'm stuck in the middle, at least for IBMi.
14:07:32 Anywho...
14:07:41 Are we done talking about CI?
14:08:00 I am
14:08:10 yep
14:08:23 xia, anything?
14:09:38 No, thanks
14:09:51 Thanks for joining
14:09:54 #endmeeting
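As a post-meeting reference for the force_immediate discussion above, the intended escalation flow - including the warning-on-escalation logging requested at 13:58:34 - might look roughly like the sketch below. This is a minimal illustration, not the actual pypowervm power module: the Force enum values, PowerOffError, and the _vsp_* helpers are hypothetical stand-ins, and the open IBMi question (whether vsp normal must always precede vsp hard) is deliberately left out.

    # Illustrative sketch only -- hypothetical names, not the real pypowervm API.
    import enum
    import logging

    LOG = logging.getLogger(__name__)


    class PowerOffError(Exception):
        """Placeholder for whatever the real power-off failure exception is."""


    class Force(enum.Enum):
        TRUE = 'true'              # Skip straight to vsp hard shutdown.
        ON_FAILURE = 'on_failure'  # Try vsp normal; escalate to vsp hard on any failure.
        NO_RETRY = 'no_retry'      # Try vsp normal only; never escalate.


    def _vsp_normal_shutdown(part):
        """Placeholder for the vsp normal shutdown job."""
        LOG.debug("vsp normal shutdown of %s", part)


    def _vsp_hard_shutdown(part):
        """Placeholder for the vsp hard shutdown job."""
        LOG.debug("vsp hard shutdown of %s", part)


    def power_off(part, force_immediate=Force.ON_FAILURE):
        # Log the parameters up front so CI logs show which path was requested.
        LOG.info("power_off requested for %s (force_immediate=%s)",
                 part, force_immediate)

        if force_immediate is Force.TRUE:
            return _vsp_hard_shutdown(part)

        try:
            return _vsp_normal_shutdown(part)
        except PowerOffError:
            if force_immediate is Force.NO_RETRY:
                # NO_RETRY contract: do not escalate to vsp hard on failure.
                raise
            # ON_FAILURE: escalate, and warn so the escalation shows up in CI logs.
            LOG.warning("vsp normal shutdown of %s failed; escalating to vsp "
                        "hard shutdown.", part)
            return _vsp_hard_shutdown(part)

The key contract captured here is that NO_RETRY never escalates, while ON_FAILURE escalates on any failure of the normal shutdown - which is the behavior the out-of-tree failures above suggest is not currently happening.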