14:00:46 #startmeeting tripleo
14:00:47 Meeting started Tue Mar 29 14:00:46 2016 UTC and is due to finish in 60 minutes. The chair is dprince. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:51 The meeting name has been set to 'tripleo'
14:01:03 o/
14:01:04 o/
14:01:07 o/
14:01:08 o/
14:01:09 o/
14:01:11 hi everyone
14:01:11 o/
14:01:11 o/
14:01:22 o/
14:01:23 * larsks lurks
14:01:23 o/
14:01:25 \o
14:01:29 o/
14:01:30 o/
14:01:37 hello \o
14:01:40 grep o/
14:01:44 :)
14:01:45 o/
14:01:59 hi
14:02:15 o/
14:02:39 #topic agenda
14:02:46 * bugs
14:02:46 * Projects releases or stable backports
14:02:46 * CI
14:02:46 * Specs
14:02:46 * one off agenda items
14:02:48 * open discussion
14:03:01 o/
14:03:24 Looks like we have a one off agenda item to discuss today with regards to quickstart
14:03:32 florianf: o/
14:03:57 yep, I put that there
14:04:05 we can save it for last though
14:04:11 trown: cool, we will try to get to it
14:04:36 any other agenda items for this week?
14:04:44 dprince, I'd like to add "tempest run on CI jobs" to the CI agenda today
14:05:01 dprince: tripleo-mistral spec?
14:05:06 sshnaidm: okay, let's talk about that in the CI section again
14:05:18 dprince: and I'd like to discuss once more where to add the IPv6 functionality.
14:05:40 dprince: that can go in the CI part too, if that's fine.
14:05:49 rbrady: ack, adarazs ack. We will hit those under Specs and CI
14:06:36 okay, let's go
14:06:40 #topic bugs
14:07:31 this one is killing a lot of HA jobs, https://bugs.launchpad.net/tripleo/+bug/1562893
14:07:31 Launchpad bug 1562893 in tripleo "CI: HA jobs failing regularly with Error: /Stage[main]/Heat::Engine/Service[heat-engine]: Failed to call refresh: Could not restart Service[heat-engine]" [Critical,In progress] - Assigned to James Slagle (james-slagle)
14:07:54 pretty sure we've had the problem since we merged the patch to use the heat keystone domain
14:07:55 I confirm, I've seen it a lot of times over the last few days
14:08:08 EmilienM: check out my linked patch from the bz
14:08:09 slagle: can you remind us of the patch please?
14:08:12 sshnaidm: revert then?
14:08:13 ok
14:08:16 slagle: ^^^
14:08:18 sorry
14:08:25 it passed the HA job once, and i rechecked it to see if it passes again
14:08:42 dprince: possibly, we've already had to revert it once
14:08:46 also have seen this regularly .. http://logs.openstack.org/32/252032/33/check-tripleo/gate-tripleo-ci-f22-nonha/c06f6ba/console.html#_2016-03-27_23_23_08_401
14:08:47 slagle: if I recall we had a lot of issues the first time we added a Heat keystone domain, and it was reverted then too
14:08:51 slagle: exactly
14:08:57 slagle: that was like months ago though
14:09:04 dprince: yes, no one is actually making sure it works
14:09:11 just recheck recheck recheck, until you get one pass, then merge
14:09:40 so, we could revert, in the meantime, i took a shot at a "fix"
14:09:56 https://review.openstack.org/#/c/298295/
14:10:00 michchap: ^^ that would affect our new keystone profile in puppet-tripleo too
14:10:15 we need a solution tho, as running without the isolated domain is a very insecure setup
14:10:38 shardy: slagle's solution is secure I think
14:10:46 dprince: when are we looking to merge patches that use the profiles?
14:10:46 shardy, slagle: I'm fine trying to fix it
14:10:53 my patch actually eliminates all of step6 in the manifest
14:10:53 Ah yup, that's not a revert :)
14:11:01 although i left the Step6 resource in tht
14:11:07 michchap: we've branched for Mitaka, so I think it is fine to merge
14:11:24 dprince: great. I don't want to be porting piecemeal patches for too long.
14:11:32 slagle on a side note https://review.openstack.org/#/c/298695/
14:11:41 slagle: before landing it, we might need to check if it passes consistently
14:11:50 iirc, when this was reverted the first time, the problem wasn't the patch, just that it was the first thing to try and actually use services that weren't ready yet
14:12:02 slagle: we are moving keystone stuff here soon http://git.openstack.org/cgit/openstack/puppet-tripleo/tree/manifests/profile/pacemaker/keystone.pp
14:12:09 EmilienM: yes, that's why i rechecked it even though it already passed all jobs
14:12:27 slagle: that already landed and this is passing CI in t-h-t too https://review.openstack.org/#/c/295588/
14:12:41 though we might have an orchestration issue, I'll investigate that
14:12:46 slagle: so your step change could just go into puppet-tripleo instead?
14:13:13 #action EmilienM to investigate lp#1562893 and find if we can do better orchestration in puppet-heat or THT
14:13:26 slagle: in the meantime, your patch does the job I think
14:13:29 dprince: ok, i'd be fine rebasing on that
14:13:59 slagle: we'd have to actually land 3 patches then, but it moves us in a better direction I think in terms of puppet maintenance
14:14:19 any other bugs this week?
14:14:40 there is another bug with stack-delete not working when using net-iso: https://bugs.launchpad.net/heat/+bug/1561172
14:14:40 Launchpad bug 1561172 in tripleo "Overcloud delete never completes" [Critical,Triaged]
14:15:25 bnemec and I have both hit it
14:15:27 slagle: yeah, bummer on the stack delete bug. Do we think that is a heat or heatclient issue?
14:15:37 i'm not sure we really know yet
14:15:58 we ought to add an overcloud delete to the end of our ci jobs
14:16:09 although that would add yet another 5 minutes :)
14:16:11 slagle: I did, then we reverted it because it took too long :(
14:16:29 shardy: that's not why we reverted it,
14:16:45 slagle: yeah, hold that thought. Our job time can't increase I think
14:16:48 we reverted it because postci couldn't get logs from the overcloud because we had deleted it
14:17:00 slagle: I had stumbled upon that bug, but for some reason now it's not happening in my current deployment
14:17:06 derekh: we also started hitting timeouts IIRC, but yeah that was another issue
14:17:17 derekh: post_postci?
14:17:19 either way, reinstating that coverage would be good if we can figure out a way
14:17:38 derekh: seriously though, I don't think we can entertain adding it back until we get our job walltime down
14:17:43 dprince: you mean pre_postci ?
14:18:08 yup, things are long enough
14:18:16 shardy: see the thread on the list about infra not liking our long running job times. This would just make it worse :/
14:18:41 dprince: sure, but we trade a few mins CI time for hours and hours of developer debugging, when things keep breaking over and over again :(
14:19:04 something as fundamental as delete should be tested IMO, but I appreciate we are time constrained
14:19:06 shardy: periodic job then?
14:19:15 dprince: Yeah that'd be a good start for sure
14:19:37 shardy: or run a parallel job that is much shorter perhaps, many ways to accomplish this coverage I think....
14:19:51 shardy: perhaps even under the Heat CI test coverage (outside of TripleO)
14:20:15 dprince: delete is already tested in heat CI, but it's possible we're missing some cases TripleO hits
14:20:20 in which case we can absolutely add them
14:20:29 yea, and it's only tripleo when using net-iso
14:20:36 so it's a complex case
14:20:38 shardy: right, perhaps just review those to add more advanced neutron port nested stacks....
14:21:02 dprince: Yup we can do that when we figure out a heat-ci appropriate reproducer
14:21:46 shardy: that is the angle I would pursue here I think. Adding CI to the right component would guard TripleO's cases closer to the source
14:22:03 shardy: and we could still have a longer running periodic CI job to do it too.
14:22:26 +1
14:22:27 okay, any other bugs this week?
14:23:18 i think that's it from me :)
14:23:34 #topic Projects releases or stable backports
14:23:43 slagle: mitaka is all set?
14:23:54 yes, i cut the branches off master yesterday
14:24:03 and we got another pass from rdo mitaka ci last night
14:24:05 nice :)
14:24:10 i assume it used the branches, but i didn't check
14:24:11 just an FYI, Puppet OpenStack released Mitaka last night - 8.0.0 is out for us and stable/mitaka is created. We opened Newton development
14:24:11 nice one
14:24:33 i also did releases from master yesterday for the branched projects to bump the major version
14:25:25 i also deployed locally with tripleo.sh, and it was all good, modulo the stack-delete issue
14:25:25 cool, any other release things this week then?
14:26:05 I assume that the "no feature backport" policy is in effect, right?
14:26:32 dtantsur: unless it impacts upgrades I think that would be correct
14:26:45 got it
14:26:51 dtantsur: Yes that's what we agreed, bugs only, unless an exception is agreed (e.g. upgrades)
14:27:03 pradk: You had a question about aodh tho didn't you?
14:27:10 i think it would be prudent as well to require a lp bug for backports
14:27:18 there was some discussion of allowing an exception for that as it was close to landing last week
14:27:18 slagle: +1
14:27:21 https://review.openstack.org/288417 is giving us some pain, but I'm not sure it's worth an exception
14:27:26 +1, and limit it to medium/high/critical bugs only.
14:27:38 slagle: thanks for cutting all the branches/releases! :)
14:27:49 do we have a backport policy? I created one last week for the Puppet group https://wiki.openstack.org/wiki/Puppet/Backport_policy
14:28:17 shardy, yea regarding gnocchi
14:28:50 since we cut mitaka, can we agree to make an exception to backport gnocchi into mitaka
14:28:51 oh yeah s/aodh/gnocchi
14:29:22 EmilienM: we don't have a wiki page but I think there is agreement on what the policy is. We should probably draft a wiki for it though
14:29:26 slagle: +1 on bugs
14:29:42 dprince: documentation would help contributors not familiar with the process
14:29:44 I'll look into creating series branches in launchpad, then you can just do "target to series" on the bug
14:29:58 the way most other projects do it (currently we don't have LP set up right for this)
14:30:08 shardy: sounds good to me
14:30:20 shardy, most projects have abandoned lp series
14:30:31 in favor of tracking things with reno
14:30:38 not arguing, just FYI :)
14:30:41 for now, shall we tag any potential backport-appropriate bugs with "mitaka-backport-potential" ?
14:30:46 interesting, we do both in puppet
14:30:47 again, same as other projects
14:31:02 dtantsur: but reno is only for release notes
14:31:03 dtantsur: reno for tracking bug backports?
14:31:10 I thought that was just for release notes
14:31:22 well, that's what people were using launchpad mostly
14:31:27 * for
14:31:49 I've never used launchpad for release notes, it used to be a wiki
14:31:53 tbh managing lp is annoying, so in ironic world we only use it for bugs
14:32:15 dtantsur: that's all I'm suggesting - report a bug, if it's potentially something we should backport, add the tag
14:32:32 okay, let's move on to CI
14:32:34 #topic CI
14:32:50 adarazs had one question about IPv6 testing
14:32:59 EmilienM: hold on that
14:33:07 sshnaidm: you had a tempest question?
14:33:14 * EmilienM stops moving
14:33:31 dprince, yes, I'd like to talk about how we can configure tempest on CI
14:33:35 EmilienM: :)
14:33:49 Puppet OpenStack CI is using puppet-tempest
14:33:51 Now I used the config_tempest.py file from redhat tempest, that was installed on the CI hosts
14:34:03 config_tempest.py is upstream?
14:34:08 EmilienM, nope
14:34:12 that's bad
14:34:32 container job works again ^.^ https://review.openstack.org/#/c/288915/
14:34:42 EmilienM, and as I heard from dmellado he is working on completely refactoring it for upstream, so maybe soon
14:34:43 :)
14:34:51 seems like if we have puppet-tempest working for puppet tests, we should use the same thing in tripleo
14:35:11 sshnaidm: I think we'd just want smoke tests, similar to what puppet uses probably
14:35:18 whatever the solution: we need to pick an upstream thing.
14:35:25 trown, yes, it's one of the options
14:35:43 we do more than smoke, fyi https://github.com/openstack/puppet-openstack-integration/blob/master/run_tests.sh#L132-L143
14:35:45 sshnaidm: but I'd want to hold on that until we get our wall time down. Specifically some of the mirroring, caching work derekh is doing
14:35:52 rhallisey: that's cool - it's really near to the timeout tho by the looks of it..
14:36:06 dprince, actually I already ran tests on live CI, and got some results: http://logs.openstack.org/38/297038/4/check-tripleo/gate-tripleo-ci-f22-upgrades/b7de594/console.html http://logs.openstack.org/38/297038/4/check-tripleo/gate-tripleo-ci-f22-nonha/63cbc53/console.html
14:36:27 dprince: it could be worth doing sooner, if we keep the time the same as the ping test
14:36:28 tempest smoke tests take from 20 to 30 minutes, right now
14:36:34 oh
14:36:41 dprince: sshnaidm we can start by tagging the tempest testing on to the end of the periodic test
14:36:49 sshnaidm: excellent!
14:36:50 trown: if the time is the same then yeah. I think it is a significant hit
14:37:04 derekh: yep, +1 for periodic tempest
14:37:12 cool
14:37:17 test_network_basic_ops is the most important scenario imho
14:37:18 sshnaidm, is that w/ 0 failures, w/ the skip list?
14:37:26 EmilienM, and it flaps :(
14:37:46 I don't think tempest can really replace the pingtest entirely
14:37:50 shardy, ya it is running a bit slower than usual..
14:37:58 shardy, agree
14:37:58 sshnaidm: so please do proceed w/ this. It sounds like the periodic job might be the best first place for it to live though
14:38:04 e.g. there's zero functional coverage of heat in tempest, because we couldn't get our patches in there
14:38:11 the only blocker I see now is that we use config_tempest.py which is not in openstack/tempest
14:38:12 wendar, yes, with skip list: tempest.api.orchestration.stacks tempest.scenario.test_volume_boot_pattern
14:38:19 dprince, can we push config_tempest.py to a tripleo repo?
14:38:33 weshay: sure, probably tripleo-ci if it helps
14:38:40 well
14:38:41 That is not the right way to upstream that.
14:38:50 the second question is how we configure tempest.. puppet is one of the options, and the second is to commit the configure_tempest script to the CI repo
14:38:52 shouldn't we follow how openstack/tempest does config?
14:39:17 I don't see what the downside of using puppet-tempest is?
14:39:19 either puppet-tempest or the openstack/tempest upstream tool
14:39:23 bnemec: so long as we can (quickly) make changes to it I don't mind where it lives
14:39:33 bnemec: right
14:40:12 I don't want us to end up maintaining a tempest thing that upstream tempest is not using.
14:40:14 Which option is a tripleo/rdo user most likely to use?
14:40:22 weshay: I don't think that's the best solution, imho
14:40:35 trown, configuring tempest with puppet seems to be very complex, maybe because I'm not a puppet expert, but configure_tempest.py is much much easier
14:40:41 EmilienM, ya.. I'm not suggesting it is..
14:40:47 shardy, slagle, so could we consider gnocchi as an exception for mitaka? it would be very helpful to get it in, and low risk i think as we don't default to gnocchi in ceilometer as discussed with shardy
14:40:48 I agree with bnemec, we should talk with the tempest folks and find a common way to do it
14:41:06 I wouldn't use puppet-tempest I think. Honestly I don't even think Tempest belongs on the undercloud because it can't be called remotely. There is no API to execute it... unless we did something w/ Mistral I guess to exec it that way
14:41:28 Tempest is just a test suite that can be called anywhere. It doesn't need to live on an undercloud proper
14:41:36 yup
14:41:50 I don't mind not using puppet-tempest, but I'm against using config_tempest.py which is not upstream but redhat only.
14:41:52 it's just convenient
14:42:18 dprince, but it should access the AUTH_URL of the overcloud
14:42:20 EmilienM, ya.. I think that is what it boils down to
14:42:20 EmilienM: I don't think it'd be redhat only. More of a TripleO config
14:42:33 Where it lives is kind of a big tangent here isn't it? For CI it's going to be on the undercloud because that's where it has to be.
14:42:45 dprince: I disagree, tripleo aims to use openstack projects, and tempest is already working on a configuration tool
14:42:50 mtreinish: ^
14:42:56 sshnaidm: sure, but no need to log into the undercloud for that. You can do it that way, but you don't have to
14:43:12 EmilienM, are we *sure* the upstream tempest folks are working on something
14:43:17 I've heard that for 3+ years
14:43:25 it could also be on the jenkins node
14:43:39 derekh: Not for net-iso
14:43:39 weshay: if not, we need to fix it.
14:43:47 wendar, EmilienM, as far as I know dmellado works on this, and it will be upstream. Somewhen
14:43:48 bnemec: ack
14:43:48 derekh: exactly
14:43:52 It seems like there are more than a couple design decisions here... maybe a spec?
14:43:52 weshay: but I don't like taking the short path and implementing our own stuff
14:44:00 I checked yesterday and I heard they were punting on it
14:44:10 EmilienM, aye
14:44:14 * bnemec suddenly remembers why we just wrote our own ping test
14:44:18 lol
14:44:24 EmilienM: ??
14:44:24 bnemec, :)
14:44:25 bnemec +1
14:44:36 it's more stable than network_basic_ops for sure...
14:44:57 bnemec: we'd just need to expose the routes, I get your case... and we can run it on the undercloud due to how we set up our environment. But again, this isn't a requirement for everyone. Just a remnant of how our CI works
14:45:11 note.. that config_tempest.py is used mid and downstream
14:45:31 and would allow us to compare tempest results and have faith the config was the same
14:45:48 it's just a point.. not saying we should use one or the other really atm..
14:45:51 weshay: check again in 3 years ;)
14:46:02 it's important to get *some* tempest results soon though
14:46:04 Committing config_tempest.py to CI may be a temporary solution until the same tool appears upstream
14:46:35 weshay: again, I'd be fine for it to live wherever. But if tripleo-ci is a good start then perhaps there first?
14:46:39 I would like to see a collaboration with the Tempest folks
14:46:54 is there a thread on the ML or a patch in Gerrit that we can monitor?
14:47:12 I would like to see us iterate on this outside of our 1 hour meeting :)
14:47:16 dprince, that works for me.. as long as others think it's reasonable
14:47:19 time is ticking guys. Perhaps we can table the opinions on the location of the python script to the list?
14:47:23 I'm not in favor of accepting some custom tools while there are some upstream dup efforts
14:47:24 * adarazs thinks tripleo-ci is a good place for it until it's ready to go into tempest directly.
14:47:31 my 2 cents, being able to run upstream tempest is attractive for the installer strategy
14:47:57 let's move forward and follow up later.
14:48:06 weshay: propose it in a review where you think it belongs. We can debate it there....
14:48:08 let's use whatever RDO is using, we'll then be more likely to be able to debug each other's setups
14:48:11 We've not reached consensus re pradk's request for a backport exception - do we take that to the ML so folks can vote and the pending patches can be listed?
14:48:16 ack
14:48:22 thanks
14:48:26 we need some +1's on the mitaka periodic job, https://review.openstack.org/#/c/298295/
14:48:27 shardy: +1
14:48:32 either way there's a couple of remaining patches which need to land on master
14:48:34 shardy: sure, sorry if I missed that
14:48:41 derekh: well, RDO is not part of OpenStack, while TripleO is, we don't have the same constraints
14:48:53 ipv6 question time?
14:48:54 Regarding the overcloud deletion, https://review.openstack.org/#/c/296765/
14:48:58 adarazs: yes. you are up
14:49:04 IPv6 question: currently the upgrades job does "nothing", that is, it doesn't have pacemaker (or actual upgrades). previously we agreed that we'll add the IPv6 netiso to upgrades, but if the job doesn't have pacemaker, we're not testing the more important parts of our code path, and things could break with IPv6 pacemaker.
14:49:04 1) is it okay if we enable pacemaker for upgrades? (which was planned anyway, I think, and is my preference too) or 2) should we use the HA job for IPv6 or 3) not test IPv6 with pacemaker (which seems like a bad idea to me)
14:49:08 That's the periodic job I pushed for it.
14:50:31 before having upgrades with pacemaker, I would rather try to have an upgrade scenario working on non-ha, to iterate later.
14:50:44 adarazs: so you are proposing that the upgrades job test IPv6. And by doing so it would cover IPv6 w/ pacemaker because that is what we are doing there anyways?
14:50:51 to me, iteration would help: first get an upgrade job working without HA, then add HA, and then add IPv6
14:51:04 dprince: last time we talked that was the end result, that we should use upgrades. :)
14:51:07 if we try all in one, we'll never finish
14:51:19 dprince: I'm fine with either, but we should add it to some gate job.
14:51:20 EmilienM: I think upgrades requires pacemaker currently. I agree with your sentiment but that isn't what has evolved here :/
14:51:32 matbu, ^
14:52:32 adarazs: IPv6 for the upgrades job would be fine with me.
14:52:33 derekh: my point was tripleo & rdo do not have the same constraints. tripleo by its mission statement aims to consume OpenStack upstream projects, and this thing with tempest is downstream only, which is odd.
14:52:36 also relevant here is a patch gfidente proposed to try and document the test matrix we are hoping to converge to https://review.openstack.org/#/c/294071/
14:52:48 EmilienM: I tried an upgrade from liberty to mitaka
14:52:49 adarazs: we have to get the upgrades job working first though right?
14:53:02 adarazs: right now it is only partially testing upgrades I think
14:53:22 dprince: we currently have an upgrade job that runs without any option as it's not in tripleo-ci. I can add the IPv6 option to it first.
14:53:26 AIUI gfidente was hoping to have the upgrades job with pacemaker and IPv6
14:53:34 dprince: so that we test that at least until upgrades actually land.
14:53:39 of course we could add each of them one at a time
14:53:40 trown: I'm not sure we are going to get to your quickstart issue man, too much time on CI :/
14:53:46 dprince: i can try to make it work
14:53:59 dprince: no worries, I will send it to the ML, and we can discuss next week if still needed
14:54:21 trown: we can follow up on this topic async on #tripleo
14:54:35 adarazs: sounds fine to me. I think just seeing IPv6 passing somewhere upstream on any CI job is the first step here.
14:54:53 dprince: right, and the upgrade job is very new, I'm not sure it's a good candidate.
14:54:55 dprince: we had it passing a few days ago with all the different configurations!
14:55:10 to me, we should start simple, and non-ha or ha would be a good candidate for ipv6 testing. We can iterate later.
14:55:14 dprince: that is with and without pacemaker, with and without ceph.
14:55:28 does the upgrade job perform a real upgrade?
14:55:29 adarazs: you mean ipv6 ?
14:55:42 EmilienM: limited resources. we have to combine some things
14:55:57 EmilienM: nonha doesn't have network isolation or pacemaker currently (but it has SSL which we shouldn't test together with IPv6 for now)
14:56:00 EmilienM: I'm not opposed to adding it to nonha though
14:56:11 EmilienM I think the conclusion was that an actual upgrade would take too long to run at the gate
14:56:27 so the upgrade job will update, deploying without the patch first and with the patch after
14:56:47 that's what we did in our Puppet CI
14:56:48 adarazs: could you please email the list on these options?
14:56:51 and we noticed it was useless
14:56:55 dprince: sure.
14:57:08 adarazs: thanks
14:57:12 #topic specs
14:57:13 we had the exact same job, that was deploying our scenarios without the patch and then applying the patch.
14:57:24 rbrady: you wanted to mention something about tripleo-mistral?
14:57:28 we stopped doing that because the feedback from the jobs was mostly useless
14:57:38 what operators really care about is upgrades between openstack releases
14:58:01 EmilienM so that's a vote for deploying from the stable branch first and updating to master+patch right?
14:58:02 EmilienM: minor updates also matter tho, because bugfixes..
14:58:05 so yeah, it takes time but we should not skip it imho
14:58:28 shardy: right, but you and I know that today the biggest issues that we have are in major upgrades.
14:58:35 I think rbrady wanted feedback on this:
14:58:38 #link https://review.openstack.org/#/c/280407/
14:58:39 gfidente: yes
14:58:50 The spec for the tripleo mistral integration has been up for some time now. If there is no further issue, I'd like to see if there is another core willing to review.
14:59:41 rbrady: ack, thanks for bringing it up again
14:59:43 rbrady: sure, I'll provide feedback, it's been on my todo list
15:00:06 almost out of time. trown sorry we didn't get to your item this week
15:00:08 * EmilienM switching to puppet meeting on openstack-meeting-4
15:00:21 #endmeeting