esberglu | #startmeeting powervm_ci_meeting | 13:31 |
openstack | Meeting started Thu Jan 26 13:31:10 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot. | 13:31 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 13:31 |
openstack | The meeting name has been set to 'powervm_ci_meeting' | 13:31 |
thorst_ | o/ | 13:31 |
XiaYan | o/ | 13:31 |
esberglu | #topic status | 13:32 |
esberglu | I spent some time yesterday looking through the OOT CI driver. Other than that I've been working on getting a passing tempest spawn on the IT driver | 13:34 |
thorst_ | how close are we with IT? | 13:35 |
esberglu | #topic In Tree CI | 13:37 |
esberglu | I'm at the point where it is requesting a server on the compute api. According to the docs, you can set networks='none' | 13:37 |
esberglu | and it will create a server without networks | 13:37 |
esberglu | But it isn't accepting it as a valid request | 13:37 |
esberglu | It might be a version problem | 13:38 |
thorst_ | do you have a pastebin that we can work off of? | 13:38 |
thorst_ | I'd like to see that after meeting | 13:38 |
thorst_ | but that's good to hear that we are that far along :-) | 13:38 |
esberglu | And hopefully once we get through the rest api it will spawn successfully | 13:39 |
thorst_ | well, it should | 13:40 |
thorst_ | I've done it myself...unless it changed recently | 13:40 |
thorst_ | but that's good news | 13:40 |
thorst_ | good work | 13:41 |
esberglu | Other than that, just creating the white list for IT | 13:41 |
esberglu | #action esberglu: Create In Tree Whitelist | 13:42 |
esberglu | #action: esberglu: Finish the tempest no network spawn test | 13:42 |
esberglu | I think that's pretty much it for in tree unless you had anything | 13:42 |
thorst_ | not for in tree | 13:43 |
esberglu | #Topic out of tree driver | 13:44 |
esberglu | I haven't looked into the failures this morning, but there are a decent amount coming through | 13:45 |
thorst_ | soooo | 13:45 |
thorst_ | I think I know why | 13:45 |
thorst_ | I think a lot of them are power_off failures. | 13:45 |
thorst_ | good news about that power off change I put in | 13:45 |
thorst_ | it's respecting the timeout values now | 13:45 |
thorst_ | bad news, a lot of images are using a default timeout of 60 seconds | 13:46 |
thorst_ | and I didn't say 'force off after 60 seconds' | 13:46 |
thorst_ | which I apparently should have. | 13:46 |
thorst_ | well, 'force off after timeout' | 13:46 |
thorst_ | I'd like to catch efried today and push that through. | 13:46 |
esberglu | Cool, glad we have a plan on that | 13:47 |
thorst_ | before we'd just wait up to 20 minutes...and it'd usually power off. | 13:47 |
thorst_ | now, that timing window is small and we have pain :-( | 13:47 |
thorst_ | sorry | 13:47 |
esberglu | That clears up some of the errors. | 13:49 |
esberglu | But we are still seeing an occasional "no valid host found" | 13:49 |
esberglu | Or seeing servers fail to build | 13:49 |
esberglu | I will go through the failures again today and see what we are hitting again | 13:50 |
esberglu | That's all I had on OOT | 13:51 |
esberglu | Any other topics you want to do? | 13:51 |
esberglu | Want to talk drivers at all? | 13:51 |
efried | Did I miss the CI meeting? | 13:54 |
esberglu | Yeah | 13:55 |
esberglu | It's still going if you had topics | 13:55 |
efried | Link me to the minutes? | 13:55 |
efried | oh, okay. | 13:55 |
efried | Have you tried 4754 yet? | 13:55 |
esberglu | No I went down the wrong path for a while yesterday trying to get that spawn test working | 13:56 |
esberglu | Almost working now | 13:56 |
esberglu | I then was gonna test that | 13:56 |
esberglu | Test 4754 that is | 13:57 |
efried | cool beans. Do you need it to be merged before you can try it, or can you pick it up in-flight? | 13:57 |
esberglu | No merge needed. I will just do a manual run on the staging env | 13:57 |
efried | k. Let me know if you need any help setting it up. | 13:57 |
thorst_ | sorry - I had someone stop by. I didn't have anything else | 13:58 |
thorst_ | efried: can you read the eavesdrop logs from about 10-20 minutes ago | 13:58 |
thorst_ | about the power off stuff | 13:58 |
efried | The os_ci_tempest.conf for in-tree uses the default base regex (all-inclusive) | 13:58 |
efried | thorst_: it stops at 13:44 | 13:59 |
thorst_ | alright, I'll catch you up after the nova IRC meeting | 13:59 |
efried | Sounds good. | 13:59 |
thorst_ | (it'll also be there by then I bet) | 14:00 |
efried | yuh | 14:00 |
efried | Sorry, guys - for some reason Thursday mornings always seem to start slow for me. | 14:00 |
esberglu | No biggie | 14:00 |
esberglu | #endmeeting | 14:00 |
openstack | Meeting ended Thu Jan 26 14:00:50 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 14:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/powervm_ci_meeting/2017/powervm_ci_meeting.2017-01-26-13.31.html | 14:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/powervm_ci_meeting/2017/powervm_ci_meeting.2017-01-26-13.31.txt | 14:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/powervm_ci_meeting/2017/powervm_ci_meeting.2017-01-26-13.31.log.html | 14:00 |
efried | What's Michael Still's IRC handle? | 14:01 |
efried | thorst_: ^ | 14:01 |
efried | looks like 'mikal' | 14:03 |
thorst_ | yep | 14:03 |
efried | oh, I did have a CI topic. Remind me after the nova meeting. It goes something like this: Can we eliminate some or all of the tests that don't hit our code? That would make our CI run (a lot) faster. Do those tests actually tell us anything? (Maybe they would theoretically uncover errors running standard in-tree code on Power hardware?) | 14:06 |
thorst_ | efried: this is basically the fix I think we need https://review.openstack.org/#/c/425711/ | 14:07 |
thorst_ | for the 'power off' discussion. Hopefully the commit message explains...but the CI is failing because some power offs use a timeout of 60 seconds. Then we periodically fail there...because the VM may not care. So I think we need to switch to a force immediate after the timeout has been hit | 14:08 |
thorst_ | it looks like that is what libvirt does as well...so I'm comfortable doing it in our code :-) | 14:10 |
efried | thorst_: The 60s timeout comes down from nova? | 14:19 |
thorst_ | yep | 14:19 |
efried | "libvirt does it" is slightly compelling, but without that, I would be pretty resistant to this idea. | 14:20 |
efried | Shutting down forcefully can result in data loss. | 14:20 |
efried | We don't care in tempest, but in real life... | 14:20 |
efried | We're basically coding this up so there'll never be a power off failure. | 14:21 |
thorst_ | correct | 14:21 |
efried | but by masking the fact that the graceful power off actually did fail. | 14:21 |
thorst_ | and that actually appears to be the design | 14:21 |
efried | which doesn't seem... ethical. | 14:21 |
thorst_ | I guess if timeout is -1...I could do a forever thing | 14:21 |
thorst_ | that seems more appropriate actually | 14:21 |
thorst_ | let me read the conf prop a little more closely | 14:22 |
efried | I wouldn't mind having a little data on how long it typically does take to power off gracefully. | 14:23 |
thorst_ | well, the problem is it matters where you are in the boot cycle | 14:23 |
thorst_ | and then different OSes do things differently too | 14:23 |
efried | If it's, like, 70s, then maybe we respect the timeout with a fudge factor. | 14:23 |
thorst_ | so for our CI, if we're in like a bootp process (cause we have a non-bootable image), it'll never come up | 14:23 |
thorst_ | if it's in SMS still (because slow boot), I think it's fine | 14:23 |
thorst_ | IBMi I believe is slow | 14:23 |
thorst_ | AIX can be slow, because workloads can say 'hold on'.... | 14:24 |
thorst_ | I think...not sure on that? | 14:24 |
efried | But we don't have the ability to know whether we're in OF or SMS - PHYP reports all of that as "on", right? | 14:24 |
efried | I guess there's the status LED. | 14:24 |
efried | But I don't relish the thought of coding to the zillion possible values of that guy. | 14:25 |
thorst_ | efried: well, also there's the what do we think we should do, versus what is the OpenStack contract... | 14:25 |
thorst_ | regardless of what we think | 14:25 |
efried | Does power-off accept a timeout from the user, ever? | 14:25 |
thorst_ | https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L905-L926 | 14:25 |
thorst_ | yes. | 14:25 |
efried | I've never noticed such an option in horizon, but from the CLI? | 14:25 |
thorst_ | *kinda* | 14:25 |
thorst_ | it's set in the image. | 14:25 |
thorst_ | or defaults to this value. | 14:26 |
thorst_ | the images in tempest just use the 60s | 14:26 |
efried | aha | 14:26 |
efried | Okay, the way I read that doc, it does seem to imply that you're supposed to pull the plug if it's not down by the timeout. | 14:27 |
thorst_ | right. | 14:27 |
thorst_ | (I actually thought that was how I had it originally, but forgot that new Force enum) | 14:27 |
efried | Putting the onus on the user to set a reasonable timeout in their image metadata seems like a pretty heavy burden. | 14:28 |
efried | But if that's the OpenStack contract... I guess I'm okay with it. | 14:28 |
thorst_ | yeah, I agree. | 14:28 |
thorst_ | I think for our typical workloads, we'd generally suggest that be higher. | 14:28 |
thorst_ | want to give those db's time to gracefully shut down | 14:29 |
efried | Is that something we should elucidate in our devref? | 14:29 |
efried | Or just count on the user to know their OpenStack esoterica? | 14:29 |
thorst_ | I think it's really up to the distro... | 14:29 |
thorst_ | but I'm kinda iffy on it. | 14:30 |
thorst_ | I don't think I want to put it in...let's wait for someone to correct us on that... | 14:30 |
thorst_ | (if we're in fact wrong) | 14:30 |
efried | The doc implies -1 isn't a valid value. | 14:30 |
efried | In fact, even zero doesn't appear to be valid. | 14:31 |
thorst_ | right | 14:31 |
thorst_ | so I *can't* do an if -1 go for a while | 14:31 |
efried | So actually, the condition you have for setting the force flag will never work. | 14:32 |
efried | Unless there's some other way that value gets set within nova, that ignores/overrides the config opt. | 14:32 |
thorst_ | well, if they tell me to force it, it doesn't matter the timeout | 14:34 |
thorst_ | I just rip it I thought | 14:34 |
efried | If *who* tells you to force it? | 14:35 |
efried | The power_off API doesn't have a separate force option. | 14:35 |
efried | I guess that must be the answer to my above question. | 14:35 |
efried | If you force power off from the API, it sends you a zero, regardless of the conf option. | 14:35 |
efried | So - I think I'm good with your change. | 14:36 |
efried | Let me scrutinize it for spelling mistakes ;-) | 14:36 |
thorst_ | and definitely apostrophese | 14:36 |
efried | Nope, looks good to me. Let's wait for the PowerVM CI before merging, tho. | 14:37 |
thorst_ | efried: agree | 14:38 |
thorst_ | give it your +2 though, so I can just W+1 it when that comes in | 14:38 |
thorst_ | don't worry, I am ethical enough not to push it through until I get the CI +1'd | 14:38 |
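The power-off behavior agreed on above can be sketched roughly as follows. This is a minimal illustration, not the nova-powervm implementation: the callables request_soft_shutdown, is_powered_off, and force_off are hypothetical stand-ins for whatever the driver actually calls, and treating a timeout of zero as "force immediately" follows efried's description of what the API sends on a forced power off.

```python
import time


def power_off(request_soft_shutdown, is_powered_off, force_off,
              timeout, poll_interval=1.0,
              sleep=time.sleep, clock=time.monotonic):
    """Try a graceful power off, then force it once the timeout expires.

    All three callables are hypothetical placeholders for driver calls.
    Returns "graceful" if the VM stopped on its own, "forced" otherwise.
    """
    if timeout > 0:
        # Ask the guest to shut down and wait up to `timeout` seconds.
        request_soft_shutdown()
        deadline = clock() + timeout
        while clock() < deadline:
            if is_powered_off():
                return "graceful"
            sleep(poll_interval)
        # One last check in case the VM stopped during the final sleep.
        if is_powered_off():
            return "graceful"
    # Timeout of zero (forced request) or graceful attempt timed out.
    force_off()
    return "forced"
```

The sleep and clock parameters exist only so the wait loop can be exercised without real delays. Note that the forced path hides the fact that the graceful attempt failed, which is the trade-off efried raises; a caller that cares can log the "forced" return value.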
thorst_ | also, I responded to your comments back in the fileio patch | 14:39 |
thorst_ | I did not take any action in the change set (after rewriting that one section, only to find out it doesn't matter) | 14:39 |
efried | thorst_: saw that, agree with your reasoning. Gave +1, will promote once csky and gfm do the same (since they had comments earlier). | 14:43 |
thorst_ | I thought they did on previous patch and this was really just a rebase | 14:43 |
efried | I don't see +1s from them. | 14:43 |
thorst_ | o bleh, that was the iscsi one | 14:44 |
thorst_ | let me go pester them | 14:44 |
efried | bbiab | 14:44 |
efried | My gerrit views are out of control. | 14:53 |
efried | thorst_: can you push this one? https://review.openstack.org/#/c/424238/ | 14:53 |
efried | and this one https://review.openstack.org/#/c/421448/ | 14:54 |
efried | I'll follow up with more cleanup change sets if needed. | 14:54 |
openstackgerrit | Merged openstack/networking-powervm: Move deprecated pci_passthrough_whitelist https://review.openstack.org/424238 | 15:04 |
*** smatzek has joined #openstack-powervm | 15:09 | |
*** esberglu has joined #openstack-powervm | 15:09 | |
thorst_ | hopefully that leads to more stable runs this afternoon | 16:02 |
efried | thorst_: gave +2 to File I/O. | 16:03 |
thorst_ | k | 16:03 |
efried | thorst_: Got time for https://review.openstack.org/#/c/421448/ ? | 16:03 |
thorst_ | yeah, I also want to peek at that OVS one from qingwu | 16:03 |
efried | esberglu: Did you determine for sure that https://review.openstack.org/#/c/399254/ and its brethren are not needed? | 16:04 |
openstackgerrit | Merged openstack/nova-powervm: Force immediate on failure https://review.openstack.org/425711 | 16:06 |
esberglu | I will keep an eye on CI with that force immediate merged | 16:13 |
esberglu | efried: It isn't needed for our CI. But if someone else was trying to use a custom pypowervm they would need a workaround. | 16:13 |
esberglu | But those patches didn't work when I tested them all concurrently. | 16:13 |
efried | I'll abandon for now, I guess, unless adreznec says otherwise. | 16:14 |
efried | thorst_: is https://review.openstack.org/#/c/400451/ ever likely to be resurrected? I can't remember where we stand on the whole glance thing. | 16:15 |
adreznec | efried: Nope, I don't think we need them if they're not needed for CI | 16:16 |
efried | k, abandoned, thx | 16:16 |
adreznec | The only people who'd be likely to need custom pypowervm is us | 16:16 |
efried | thorst_: How is configdrive.required_by(instance) normally set? I'm testing out the cfg_drv change set and that condition doesn't seem to be firing. I can force it with the conf option, but is that how we usually do it? | 16:20 |
thorst_ | efried: that's honestly how I've always done it | 16:20 |
thorst_ | I think you can also set image metadata to flag it | 16:21 |
thorst_ | but I just do the conf | 16:21 |
efried | okay. I've never noticed that. | 16:21 |
thorst_ | I *think* that's what most deploys do too... | 16:21 |
openstackgerrit | Merged openstack/nova-powervm: Mock data cleanup 1 https://review.openstack.org/421448 | 16:22 |
efried | thorst_: cfg_drv isn't working because the code path "requires" the mgmt_cna. | 16:31 |
efried | First of all, I'm assuming we can't do mgmt_cna yet because no network? | 16:31 |
efried | Second, assuming #1 is true, can we just rip out the mgmt_cna from that code path, or do we actually need it in order to build the cfg drv? | 16:31 |
thorst_ | mgmt_cna is separate from networking | 16:36 |
thorst_ | it is an additional network | 16:36 |
thorst_ | but cfg drive shouldn't REQUIRE the mgmt cna | 16:36 |
efried | Okay, so for now, rip it out? | 16:37 |
thorst_ | there is in fact a config option to turn off mgmt cna altogether... | 16:37 |
thorst_ | rip it out from what? | 16:37 |
thorst_ | which patch | 16:37 |
efried | https://review.openstack.org/#/c/409404/10/nova/virt/powervm/tasks/storage.py@135 | 16:38 |
thorst_ | yeah, that should be something we add in a separate change set | 16:39 |
thorst_ | right now qingwu has the mgmt cna (and some vnic stuff....) in the OVS patch | 16:39 |
thorst_ | maybe we need the mgmt CNA to be a patch on top of the OVS one | 16:39 |
thorst_ | because it extends into multiple things | 16:39 |
openstackgerrit | Matt Rabe proposed openstack/nova-powervm: Add vopt removal params to the power on job in spawn https://review.openstack.org/425780 | 16:43 |
openstackgerrit | Eric Fried proposed openstack/networking-powervm: Use neutron-lib portbindings api-def https://review.openstack.org/422759 | 16:45 |
efried | thorst_: adreznec: ^^ I didn't hear from boden, so I went ahead and made the changes. I reckon we're in the driver's seat on this one - merge whenever we're ready. | 16:46 |
adreznec | Hmm ok | 16:46 |
thorst_ | sounds good | 16:47 |
adreznec | efried: https://review.openstack.org/#/c/421912/ | 16:48 |
esberglu | thorst_: Pretty much hitting the same failures today as yesterday. I will update again once some runs go through with the force immediate patch | 16:53 |
esberglu | Did we decide we were okay with disabling the test with the security group in use error? | 16:53 |
openstackgerrit | Matt Rabe proposed openstack/nova-powervm: Add vopt removal params to the power on job in spawn https://review.openstack.org/425780 | 17:03 |
thorst_ | I'm OK with it. efried may want to weigh in...but I'd prefer to do it | 17:06 |
efried | Like I said yesterday, I would like to know more about why this fails. If it's our concurrency level or it fails for everyone like this, community bug. Otherwise, we need to look at why it fails with our code and whether _we_ have a bug. | 17:08 |
thorst_ | so maybe some research quick on what tests were running when that failed | 17:11 |
thorst_ | looking at the n-cpu log | 17:11 |
efried | For now I guess I'm okay putting it in the part of the skip list that says, "We're skipping this because it fails, but we need to investigate why" | 17:11 |
efried | But we need to be careful that bit doesn't grow too large or get stale. | 17:11 |
thorst_ | yeah, figure it out in staging | 17:12 |
thorst_ | rather than hold production hostage | 17:12 |
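A skip list of the kind discussed here could look roughly like this. It is only a sketch under assumptions: the function name, the pattern format, and the test IDs used with it are illustrative and are not taken from os_ci_tempest.conf or the actual PowerVM CI configuration.

```python
import re


def select_tests(test_ids, include_regex=".*", skip_patterns=()):
    """Return the test IDs that match the include regex and no skip pattern.

    The default include regex is all-inclusive; skip_patterns is a
    hypothetical list of regexes for tests that are disabled for now.
    """
    include = re.compile(include_regex)
    skips = [re.compile(pattern) for pattern in skip_patterns]
    return [
        test_id for test_id in test_ids
        if include.search(test_id)
        and not any(skip.search(test_id) for skip in skips)
    ]
```

Keeping the "fails, needs investigation" entries in their own list, separate from tests that are skipped for good reasons, would make it easier to notice when that part grows or goes stale.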
efried | thorst_: community code uses shared procs always? | 17:18 |
efried | and uncapped | 17:18 |
thorst_ | yeah, openstack code always runs that way. It's a shared system, so we do that. I think there was an extra specs thing to override though, but that would be the controversial bit | 17:24 |
openstackgerrit | Drew Thorstensen (thorst) proposed openstack/nova-powervm: Experimental: File I/O Volume Driver Implementation https://review.openstack.org/422273 | 17:40 |
thorst_ | adreznec: efried: if good with that commit update, can we then push through? | 17:40 |
efried | thorst_: +2 | 17:41 |
efried | thorst_: adreznec: https://review.openstack.org/#/c/391288/ - added detail on those two proc tunables. See whatcha think. | 17:47 |
mdrabe | esberglu: Is the CI vote gating on all nova-powervm changes? | 17:49 |
thorst_ | mdrabe: yes | 17:50 |
thorst_ | all of them. | 17:50 |
thorst_ | won't get in unless it passes :-) | 17:50 |
mdrabe | k cool | 17:50 |
thorst_ | it's amazing :-) | 17:50 |
mdrabe | It is, also means I don't have to put "DNM yet" in the vopt removal change | 17:51 |
efried | mdrabe: Do you have the pypowervm change set up yet? I don't see it. | 17:59 |
mdrabe | clbush already merged it | 18:00 |
efried | thanks, I see it now. | 18:01 |
thorst_ | efried: I think it has NL updates too | 18:01 |
thorst_ | so the CI systems would need to be updated | 18:01 |
efried | thorst_: https://review.openstack.org/#/c/409404/ - updated to remove mgmt_cna stuff, tested, works. | 18:24 |
efried | Also rebased 'em all. | 18:25 |
efried | thorst_ adreznec esberglu: Follow up from earlier | 18:25 |
efried | Can we eliminate some or all of the tests that don't hit our code? That would make our CI run (a lot) faster. Do those tests actually tell us anything? (Maybe they would theoretically uncover errors running standard in-tree code on Power hardware?) | 18:25 |
thorst_ | efried: adreznec and I are in a meeting :-( | 18:26 |
efried | Fine, be that way. | 18:26 |
thorst_ | all of the latest jobs have failed the CI | 19:01 |
thorst_ | ruh roh | 19:01 |
openstackgerrit | Matt Rabe proposed openstack/nova-powervm: Add vopt removal params to the power on job in spawn https://review.openstack.org/425780 | 19:02 |
openstackgerrit | Taylor Jakobson proposed openstack/nova-powervm: Add image cache to nova-powervm https://review.openstack.org/371946 | 19:09 |
esberglu | thorst_: Just got back from lunch and saw the CI. Looks like they aren't even stacking. Looking into logs and checking devstack commits | 19:15 |
thorst_ | OK - if we need to turn something off...we should until we get it sorted | 19:16 |
thorst_ | or at least turning off the voting | 19:16 |
thorst_ | not voting...but basically red showing up from our CI on nova patches | 19:16 |
esberglu | thorst_: Most of those failures were expected | 19:40 |
esberglu | 7 failures were the in tree driver patches | 19:40 |
esberglu | 2 WIP patches that are failing everything | 19:40 |
esberglu | 3 failing haven't rebased with the force immediate | 19:40 |
esberglu | And the 4 have failed with tests that were in that list yesterday | 19:41 |
openstackgerrit | Merged openstack/nova-powervm: Experimental: File I/O Volume Driver Implementation https://review.openstack.org/422273 | 20:08 |
thorst_ | esberglu: phew! | 20:31 |
thorst_ | good analysis | 20:31 |
efried | esberglu: is https://review.openstack.org/#/c/416667/ on your radar? What do we need to do to get CI passing here? | 20:43 |
efried | ( adreznec ^^ ) | 20:43 |
adreznec | efried: esberglu I think we'd need a CI local.conf change | 20:56 |
adreznec | For the dsvm AIOs | 20:56 |
esberglu | Yeah to the local.conf.aio | 20:56 |
adreznec | Adding enable_service pvm-q-sea-agt to it | 20:56 |
esberglu | Yep | 20:56 |
adreznec | I think we could add that and it wouldn't impact anything | 20:56 |
adreznec | Then recheck, and then merge things | 20:56 |
adreznec | Since enabling the service twice should end up with the second being a no-op | 20:57 |
esberglu | Yeah. It will take a redeploy of the CI though. Or we can put it in and wait for the next image template to pick it up | 20:58 |
adreznec | Right | 20:58 |
adreznec | esberglu: Have a preference on the approach? | 20:59 |
esberglu | I vote for the latter and just wait to merge the change until we know that those lines are in the local.conf on the image template | 21:00 |
esberglu | thorst_: adreznec: efried: I am gonna tag the *-powervm projects for the ocata 3 milestone today | 21:06 |
efried | This means we'll have a stable/ocata branch and have to do all the backporty stuff? | 21:07 |
adreznec | Ok | 21:07 |
thorst_ | let me merge one thing quick | 21:07 |
thorst_ | o wait, nm...its in | 21:07 |
thorst_ | rip it! | 21:07 |
adreznec | Which thing | 21:07 |
adreznec | file i/o? | 21:08 |
thorst_ | yep | 21:08 |
adreznec | yeah I just merged it | 21:08 |
adreznec | Figured you wouldn't care | 21:08 |
adreznec | :P | 21:08 |
thorst_ | I was wondering how that merged | 21:08 |
thorst_ | lol | 21:08 |
esberglu | efried: Nope, that's next thursday (release candidate 1). Ocata 3 is supposed to signify the end of new feature development | 21:10 |
esberglu | And then we tag the first release candidate next thursday and create the stable/ocata branch | 21:11 |
esberglu | And start doing backports if needed | 21:11 |
esberglu | Anything else people want to get in before the tag? | 21:13 |
openstackgerrit | Merged openstack/nova-powervm: Add image cache to nova-powervm https://review.openstack.org/371946 | 21:19 |
*** smatzek has quit IRC | 21:31 | |
efried | esberglu adreznec thorst_ - Was about to rebase 4757 on top of 4754; but maybe y'all want to do it the other way around? | 22:00 |
efried | Are we delaying 4754 for some reason? | 22:00 |
efried | It is backward compatible for OOT, so we oughtta be able to merge it and keep on keepin on. | 22:00 |
esberglu | Yeah. You tested it with no whitelist to confirm that OOT would work? | 22:01 |
efried | yes | 22:01 |
esberglu | Cool. +1 from me, I'll let the others look at it one more time | 22:03 |
adreznec | efried: No reason I can think of, I'll review that next | 22:04 |
efried | thx | 22:04 |
efried | esberglu - I did the rebase and the skip list mod on 4757 | 22:06 |
esberglu | Awesome thanks | 22:07 |
efried | yahyoubetcha. Only problem is, now we have to get someone else to review it ;-) | 22:07 |
*** tlian has joined #openstack-powervm | 23:21 | |
