14:00:44 <shardy> #startmeeting tripleo
14:00:44 <openstack> Meeting started Tue May 17 14:00:44 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:47 <EmilienM> o/
14:00:48 <jrist> o/
14:00:48 <openstack> The meeting name has been set to 'tripleo'
14:00:49 <shardy> #topic rollcall
14:00:53 <shadower> hey
14:00:55 <marios> o/
14:00:55 <shardy> Hi all, who's around?
14:00:58 <rdopiera> hi
14:01:02 <gfidente> o/
14:01:10 <jdob> o/
14:01:15 <sshnaidm> o/
14:01:19 <pabelanger> o/
14:01:40 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:01:46 <jistr> o/
14:01:54 * myoung waves
14:02:08 <shardy> #topic agenda
14:02:08 <shardy> * one off agenda items
14:02:08 <shardy> * bugs
14:02:08 <shardy> * Projects releases or stable backports
14:02:08 <shardy> * CI
14:02:10 <shardy> * Specs
14:02:13 <shardy> * open discussion
14:02:28 <shardy> Does anyone have any more one-off items to add today?
14:02:37 <sshnaidm> shardy, please add to items: "tempest run for nonha periodic jobs"
14:02:50 <shardy> sshnaidm: ack, thanks
14:03:04 * beagles wanders in a couple of min late
14:03:09 <rbrady> o/
14:03:16 <shardy> #topic one off agenda items
14:03:34 <shardy> #info should we remove the containers job (slagle)
14:03:51 <shardy> slagle: did you chat with rhallisey or Slower about that?
14:03:54 <akrivoka> \o
14:04:09 <jcoufal> o/
14:04:12 <dprince> hi
14:04:13 <EmilienM> we should rather bring it back to green, isn't?
14:04:15 <chem`> yo
14:04:18 <shardy> it's been broken for a long time, so I agree we either need to fix it or remove it
14:04:24 <slagle> shardy: no, i wasn't able to last week, that's why i added it to the meeting
14:04:34 <shardy> EmilienM: well, yes, but folks said they were doing that and it's not happened for $months
14:04:45 <EmilienM> :-(
14:04:49 <weshay> o/
14:04:52 <slagle> i would just like to know if people are working on fixing it
14:05:03 <EmilienM> he has a bunch of patches in progress for that
14:05:20 <EmilienM> they need to be rebased / passing CI / merged and things could be better
14:05:34 <enikher> hey
14:05:40 <enikher> #info Nikolas Hermanns
14:05:41 <rhallisey> hey
14:05:41 <d0ugal> o/
14:05:46 <rhallisey> sorry I'm late
14:05:50 <derekh> o/
14:05:53 <shardy> rhallisey: Hey, we're discussing the containers CI job
14:05:55 <slagle> EmilienM: right there was a lot of discussion at the summit though about not using docker-compose any longer, etc
14:06:04 <shardy> what's the status there, do we have a plan to get it working again?
14:06:22 <ccamacho> o/
14:06:51 <rhallisey> so the job worked for awhile. I can rebase it to get it back working, but with composable roles the gate will be flaky
14:06:59 <dprince> shardy: we will need the job again at some point
14:07:03 <adarazs> o/
14:07:06 <qasims> o/
14:07:14 <shardy> rhallisey: Ok, so we're trying to decide if we can get it working, or if it should be temporarily removed
14:07:16 <slagle> of course, we can always add it back
14:07:31 <shardy> obviously getting it working is preferable, but we need folks willing to keep it green
14:07:40 <rhallisey> I considered the current state as temproarily removed
14:07:50 <rhallisey> which I think is appropriate
14:08:00 <EmilienM> maybe keep it only for tripleo-ci project jobs so rhallisey can still run some tests or moving it to experimental pipeline
14:08:09 <EmilienM> so he can "check experimental" when needed
14:08:13 <shardy> rhallisey: Ok, well it's disabled, but we're discussing removing it from the default pipeline
14:08:22 <shardy> EmilienM: +1 that's perhaps a reasonable compromise
14:08:35 <rhallisey> shardy, ya I agree with EmilienM
14:08:40 <rhallisey> that would be best
14:08:46 <shardy> slagle: does that work for you?
14:08:52 <slagle> sure
14:08:53 <derekh> yup +1 to what EmilienM said, that way it wont be using up as many jenkins slaves
14:09:01 <EmilienM> rhallisey: if you're not familiar with infra, I can make the patch, let me know.
14:09:14 <shardy> #info agreement to make containers job temporarily only run via experimental pipeline
14:09:39 <shardy> #info tempest run for nonha periodic jobs
14:09:44 <shardy> sshnaidm: ^^
14:09:47 <rhallisey> EmilienM, may ask some questions. I've done it once :)
14:10:04 <EmilienM> rhallisey: cool, we catch-up after the meeting
14:10:12 <sshnaidm> I think timeout is not the issue now after HW upgrade as I see in logs
14:10:34 <sshnaidm> so we can try to run tempest tests on periodic nonha jobs as the shortest ones
14:10:41 <rhallisey> shardy, sorry about being a tad late. I'm in a large meeting in westford
14:10:54 <shardy> rhallisey: no problem
14:10:59 <sshnaidm> https://review.openstack.org/#/c/297038/
14:11:11 <rhallisey> container stuff
14:11:31 <derekh> sshnaidm: ya, I think we have the time available to do this now
14:11:52 <dprince> I'd like to hold on Tempest
14:12:00 <dprince> It really isn't our biggest problem ATM
14:12:05 <EmilienM> +1
14:12:13 <EmilienM> our CI is broken almost every day
14:12:17 <EmilienM> can we stabilize it first?
14:12:28 <dprince> like seriously... could someone point out what if anything Tempest is going to catch that our existing CI isn't already telling us
14:12:30 <derekh> sshnaidm: befor we merge that though can we just do one think, submit a WIP patch to run a fake period job to make sure there working as that patch isn't tested in ci
14:12:52 <shardy> derekh: I was just about to ask how we see that patch actually working :)
14:12:53 <dprince> Our existing ping test is working quite well
14:12:57 <sshnaidm> dprince, we already caought 2 bugs with tempest runs
14:13:09 <sshnaidm> derekh, sure
14:13:09 <trown> sshnaidm: what bugs?
14:13:26 <dprince> sshnaidm: that is great. The question is does the extra time Tempest costs us warrent running it on every single test
14:13:29 <EmilienM> on the other hand running tempest on periodic job does not hurt as much as running in our gate.
14:13:31 <sshnaidm> trown, I don't remember the numbers, one is about sahara non available and another one, I can find later
14:13:35 <trown> I have seen it catch packaging bugs where out of tree tempest plugins werent packaged right
14:13:44 <shardy> dprince: I think the proposal is only to run it once per day?
14:13:47 <dprince> sshnaidm: If those bugs are like constantly regressing you may have a case
14:13:56 <EmilienM> trown: right, puppet-CI catches is very often.
14:13:57 <dprince> shardy: we've been over this *so* many times
14:13:58 <derekh> dprince: its, for the periodic job, it doesn't add anything to normal ci runs
14:14:06 <dprince> nobody is again running tempest in the periodic jobs
14:14:16 <dprince> okay, that is fine then
14:14:33 <sshnaidm> dprince, EmilienM I don't think tempest will hurt CI, the current fails have their own reasons, not sure tempest test will break something
14:14:40 <shardy> dprince: we went over it a couple of weeks ago, and the consensus was OK, let's add it to the periodic job if we have enough walltime
14:14:50 <dprince> period is fine
14:15:03 <slagle> if we merge that, our promotes, which are also done by periodic jobs, will be dependent on tempest
14:15:04 <derekh> so we're all good
14:15:15 <derekh> slagle: yes they will
14:15:19 <dprince> slagle: that is a good point
14:15:22 <sshnaidm> slagle, right
14:15:29 <slagle> i dont think we really need that tbh
14:15:36 <dprince> but... my main concern is adding wall time to the existing jobs
14:15:51 <slagle> what about a separate periodic job for tempest that does not influence the promote?
14:16:16 <sshnaidm> dprince, as I saw all last nonha jobs it's not an issue, we have a time now
14:16:34 <sshnaidm> slagle, the point here is not promote bugs, afaiu..
14:16:51 <dprince> sshnaidm: even so. I think what you are hearing is basically Tempest doesn't add much to our existing CI
14:16:55 <pabelanger> I raised a question on the ML about the CI pipeline recently: http://lists.openstack.org/pipermail/openstack-dev/2016-May/095143.html and have some ideas how to reduce CI times. Would be interested in feedback on it after the meeting
14:17:15 <dprince> sshnaidm: there is value in having our promotion job match (as closely as possible) what our Trunk jobs do as well
14:17:18 <sshnaidm> dprince, if it already caught a few bugs, why not?
14:18:01 <dprince> sshnaidm: list the bugs and make the case. I'm more interested if those bugs are perhaps corner cases... that may not often cause regressions in TripleO
14:18:08 <shardy> Ok, we're going to have to time-box this discussion and move on I think
14:18:21 <shardy> we keep having the same discussion every time, so lets all vote on https://review.openstack.org/#/c/297038
14:18:23 <slagle> i think the issue is that not every bug out there needs to block tripleo forward progress
14:18:45 <shardy> sshnaidm: please link the bugs found there, and post a link to where we can see results of it running
14:18:49 <derekh> slagle: that true but if it doesn't block something we'll possibly ignore it
14:19:24 <shardy> slagle: every bug out there typically does block forward progress anyway tho ;)
14:19:33 <slagle> derekh: agreed, and i think i'm leaning towards ignoring some things is ok :)
14:19:45 <shardy> #topic bugs
14:19:46 <slagle> obviously we want to find and report all bugs that we can
14:20:07 <derekh> slagle: then a valid bug could make it the whole way to the end of a cycle until we start panicing about it
14:20:24 <shardy> #link https://bugs.launchpad.net/tripleo/?orderby=-id&start=0
14:20:38 <sshnaidm> dprince, shardy one of them: https://review.openstack.org/#/c/309042/
14:20:50 <dprince> shardy: I'm -2 on the Tempest patch as is I think
14:20:58 <shardy> So there's a packaging issue wrt mistral (caught by the periodic job), it now needs designateclient
14:22:03 <shardy> I'm trying to get a packaging fix in for that, and I confirmed the long term plan is for mistral to have soft dependencies so we won't have to install all-the-clients
14:22:14 <shardy> Any other bugs folks want to discuss?
14:22:26 <dprince> shardy: so rbrady filed a bug on the Mistral timeouts issue
14:22:30 <dprince> shardy: https://bugs.launchpad.net/mistral/+bug/1581649
14:22:31 <openstack> Launchpad bug 1581649 in Mistral "Action Execution Response Timeout" [High,New]
14:22:59 <rbrady> dprince: I'm still working on that bug
14:23:00 <dprince> which I don't actually see myself. But apparently it could be related to eventlet and the fact that we run Mistral API in Apache WSGI
14:23:15 <d0ugal> I am seeing that bug too
14:23:19 <dprince> rbrady: yep, just highlighting it so people know where to look if they hit that issue
14:23:37 <dprince> shardy: that is all from me on bugs
14:23:49 <shardy> dprince: interesting, thanks! I hit that myself quite a few times
14:24:04 <marios> +1 me too i thought it was underpowered undercloud
14:24:18 <rbrady> it would be helpful that anyone else seeing that bug notes that in launchpad
14:24:23 <marios> i often had to just re-run the execution
14:24:40 <shardy> rbrady: will do - I hit it pretty much 50% of my executions
14:24:48 <marios> shardy: same
14:24:56 <shardy> which sounds like a blocker to switching the client over to mistral
14:25:13 <d0ugal> +1
14:25:20 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-1
14:25:36 <dprince> shardy: yes, this needs to be fixed
14:26:04 <shardy> So, the newton-1 milestone is in less than two weeks - I'm hoping we can cut some releases around the end of next week (or during the week after, but I'll be on PTO)
14:26:38 <shardy> can folks review that list, and ensure anything you consider release-blocker gets on there, and anything else gets moved to n-2?
14:26:49 <marios> shardy: well, https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades
14:26:57 <marios> shardy: i was winderig about that one
14:27:12 <shardy> marios: Yup, that was going to be my next question - can we break that down into some smaller pieces?
14:27:25 <shardy> that's a catch-all BP and I'm guessing we can't declare it implemented for n-1?
14:27:42 <marios> shardy: right... i wasn't even clear what is was about exactly
14:28:16 <jistr> yea i think we can break it down into smaller BPs
14:28:40 <marios> shardy: i can work with jistr to define something for next week's meeting?
14:28:42 <shardy> marios: any remaining pieces (including CI coverage) we need to be able to declare mitaka->newton upgrades fully supported in upstream builds
14:28:54 <jistr> marios: +1
14:28:58 <marios> shardy: or is that too late already?
14:28:59 <shardy> marios: sure, sounds good, thanks!
14:29:22 <shardy> marios: we can chat about it between now and next week - please enumerate the todo list somewhere, e.g etherpad
14:29:32 <marios> shardy: ack
14:29:34 <jistr> ack
14:29:35 <jcoufal> +1 feel free to drag me in if needed
14:29:39 <shardy> trown: what's the status wrt quickstart - can we declare that implemented now?
14:30:10 <trown> shardy: we are still a bit blocked on the image front, but I am meeting with derekh later today to sync on that
14:30:25 <trown> shardy: since the only consumable image atm is from RDO
14:31:00 <shardy> trown: Ok, looks like we may have to bump to n-2 then, can you perhaps link an etherpad from the BP with details of what remains, or link bugs in the tripleo-quickstart launcpad?
14:31:15 <trown> shardy: will do
14:31:17 <shardy> zero blueprints for n-1
14:31:42 <shardy> Ok, sorry we kinda side-tracked off bugs a bit there
14:31:49 <shardy> #topic Projects releases or stable backports
14:32:35 <slagle> on the aodh backport topic, i did see what deps get pulled in if you try and install aodh from mitaka on a liberty controller
14:32:40 <shardy> So, pradk posted http://lists.openstack.org/pipermail/openstack-dev/2016-May/095097.html
14:32:42 <slagle> and it was only aodh related packages
14:32:50 <slagle> afaict
14:33:10 <slagle> so i think we might be able to migrate to aodh as the first step in a mitaka upgrade
14:33:13 <shardy> slagle: interesting, so we could potentially do that as part of the upgrade vs backport?
14:33:22 <slagle> shardy: yes, that's my hope
14:33:24 <trown> I like that better
14:33:24 <slagle> and avoid the backport
14:33:29 <EmilienM> +1
14:33:30 <shardy> I think that would be better if possible
14:33:54 <shardy> If folks have other opinions please reply to the thread
14:34:14 <slagle> i'll reply with what i found in my test
14:34:19 <shardy> I guess this is a somewhat common requirement, so it'd be good to solve it in a general way
14:34:35 <shardy> since for stable/mitaka we will not be even considering such backports
14:35:06 <shardy> I already had to abandon our application for stable:follows_policy governance tag due to stable/liberty
14:35:39 <shardy> we're now blocked from applying that until liberty is EOL, so a non-backport solution would be preferable from that angle also
14:35:48 <shardy> slagle: thanks
14:35:50 <jistr> the general way we had in mind previously involved doing it as a last step of the upgrade
14:36:04 <jistr> together with the "convergence via puppet"
14:36:11 <jistr> but that implies not missing the deprecation period
14:36:28 <jistr> (we need to be able to use the old service/config/whatever with the new packages)
14:36:36 <shardy> jistr: AIUI the problem is aodh obsoletes the ceilometer-alarm packages, so then we'd have no alarm service for the duration of the upgrade?
14:36:48 <shardy> (which could be days while all the computes are upgraded?)
14:36:51 <jistr> yea
14:37:00 <jistr> but it wouldn't have to obsolete them necessarily i think
14:37:12 <jistr> but the issue is
14:37:28 <jistr> that ceilometer-alarm is not present in upstream release of mitaka IIUC
14:38:05 <shardy> jistr: ack - well can you, slagle and pradk talk, and post an update to the thread with your preferred approach?
14:38:47 <jistr> sure thing. If we can find better solution than what we've found out so far, that would be great.
14:38:58 <shardy> +1
14:39:06 <shardy> Ok, anything else re releases or backports to mention?
14:39:36 <shardy> #topic CI
14:39:52 <shardy> derekh: care to give an update re the new super-speedy CI?
14:40:21 <derekh> Main CI story is that we (eventually) got HW upgrades done for the machines doing CI
14:40:33 <derekh> all of them not have 128G of RAM (was 64)
14:40:42 <derekh> and 200G SSD's
14:41:10 <shardy> AFAICT the performance looks quite a bit better?
14:41:15 <derekh> but w'ere only currently using the SSD'd for the testenv's , jenkins slaves arn't using them yet
14:41:34 <EmilienM> shardy: well, we haven't had much time to evaluate, CI was broken a lot of time since Friday.
14:41:37 <derekh> testenvs is where it was most important though
14:41:54 <EmilienM> but yeah, upgrades jobs ~1h45 I've seen
14:42:12 <derekh> shardy: ya things look better but I'm reluctant to say anything concrete until we have a few days where tunk is actually working
14:42:17 <EmilienM> some nonha take 1h23
14:42:30 <dprince> performance is definately improved. Would be nice to try and quantify it a bit further though
14:42:45 <derekh> the upgrade took a day longer then I had hoped, due to our bastion host refusing to reboot
14:42:58 <shardy> Yeah, definitely moving in the right direction, but would be good to figure out if we can get the runtimes a little more consistent
14:43:16 <derekh> dprince: yup, once we get a few days of results I'll try and do a comparison
14:43:25 <derekh> might even generate a graph or something
14:43:27 <shardy> e.g I've seen nonha jobs vary between 1hr11 and 1hr40+
14:43:28 <dprince> derekh: thanks for driving this
14:43:55 <shardy> pabelanger: did you have anything to add re your CI work?
14:43:57 <dprince> and thanks for the teamwork in configuring the new drives in the BIOS (slagle, bnemec, EmilienM)
14:44:02 <pabelanger> sure
14:44:13 <pabelanger> I'd like to see if we can merge: https://review.openstack.org/#/c/316973/
14:44:15 <shardy> +1 thanks to everyone involved in the hardware upgrade and subsequent cleanup
14:44:22 <derekh> ya,thanks all with the bios help, that would have been a PITA alone
14:44:30 <EmilienM> also big kudos for pabelanger for staying very late on Thursday evening
14:44:34 <pabelanger> then find a time to migrate to centos-7 jenkins slaves: https://review.openstack.org/#/c/316856/
14:44:52 <pabelanger> Then the last step is launching an AFS mirror, which derekh is going to be commenting on the ML
14:44:55 <jrist> pabelanger++
14:46:13 <EmilienM> nice move :-)
14:46:56 <derekh> pabelanger: that patch looks good to me
14:47:11 <shardy> thanks pabelanger, we'll check out those patches
14:47:20 <shardy> #topic Specs
14:47:29 <shardy> https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:47:35 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:47:59 <shardy> beagles: thanks for your feedback on the dpdk and sriov ones, still need others to check those out
14:48:30 <beagles> yup
14:48:32 <shardy> we also need to confirm direction wrt the lightweight HA one
14:49:01 <shardy> e.g if we can switch to only that model, or if we must support the old model too
14:49:26 <shardy> Any other updates re specs/features?
14:50:18 <shardy> dprince: other than needing more reviews, any updates re composable services?
14:50:41 <shardy> hopefully we can get a clear few days in CI and start landing those
14:51:35 <shardy> I posted a further example of how the fully composable/custom roles might work in https://review.openstack.org/#/c/315679/
14:51:36 <dprince> shardy: no, reviews are good
14:51:40 <dprince> shardy: working CI will help too
14:51:47 <shardy> Not yet functional, but feedback re the general approach is welcome
14:52:10 <shardy> Ok then
14:52:15 <shardy> #topic Open Discussion
14:52:30 <shardy> Anyone have anything else to raise?
14:52:30 <slagle> shardy: do you think we'd drive the templating via mistral?
14:52:39 <shardy> slagle: yes, that was my plan
14:53:01 <shardy> for now I've put the script in t-h-t, but I'd expect it to move to tripleo-common and be driven via mistral as part of the deployment workflow
14:53:07 <gfidente> hey guys I wanted to ask if anybody has thoughts about the problem raised by EmilienM in the mailing list?
14:53:09 <slagle> cool, will have a look
14:53:18 <gfidente> on sharing data across roles
14:53:43 <shardy> gfidente: I think the allnodes deployment is probably the place to do it
14:53:49 <gfidente> and if there are opinions about my reply to have an 'allNodesConfig' output from the different toles?
14:54:05 <shardy> we've done that in various templates already
14:54:10 <EmilienM> let's try that
14:54:25 <gfidente> ack
14:54:36 <shardy> gfidente: Yeah, figuring out how that may wire in to composable services is tricky, but the only way to pass data between all the roles will be in allnodeconfig I think
14:54:50 <EmilienM> gfidente: do you have a PoC already?
14:54:58 <EmilienM> or any example of existing code?
14:55:03 <gfidente> EmilienM no so we can start by using the *existing* allNodesConfig
14:55:14 <gfidente> until we figure how to 'compose' that from the roles
14:55:33 <dprince> gfidente: the composable role bits we have today should be thought of as global config blogs
14:55:55 <dprince> gfidente: any per node settings are still being handled in the server templates themselves (compute.yaml, controller.yaml, etc.)
14:56:25 <dprince> and then the all nodes configs roll up that information into lists, etc. when needed
14:56:45 <gfidente> dprince yes but it's only distributed to nodes of same 'type'
14:56:46 <dprince> if we need to access data between the two I'd just suggest using hiera I think
14:57:11 <gfidente> so yes for networking I have a submission which dumps IPs into hieradata
14:57:28 <gfidente> but for inter-role data we can't assume hiera is available
14:57:45 <gfidente> because we could be writing hiera on a different node then the one we need to read the data from
14:57:51 <dprince> gfidente: it may not be available, but it eventually would be right?
14:57:57 <dprince> gfidente: hiera lookup with a default?
14:58:08 <gfidente> let's continue on #tripleo
14:58:15 <shardy> +1
14:58:16 <gfidente> it's probably different cases we need to cover
14:58:35 <dprince> gfidente: sure, there are some corner cases to cover yet
14:58:50 <gfidente> hiera seems okay for networking though
14:59:00 <gfidente> cause we dump it from controller and consume from service
14:59:05 <shardy> Ok, we're pretty much out of time, thanks everyone!
14:59:07 <gfidente> on same node
14:59:11 <shardy> #endmeeting