14:00:44 <shardy> #startmeeting tripleo
14:00:44 <openstack> Meeting started Tue May 17 14:00:44 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:47 <EmilienM> o/
14:00:48 <jrist> o/
14:00:48 <openstack> The meeting name has been set to 'tripleo'
14:00:49 <shardy> #topic rollcall
14:00:53 <shadower> hey
14:00:55 <marios> o/
14:00:55 <shardy> Hi all, who's around?
14:00:58 <rdopiera> hi
14:01:02 <gfidente> o/
14:01:10 <jdob> o/
14:01:15 <sshnaidm> o/
14:01:19 <pabelanger> o/
14:01:40 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:01:46 <jistr> o/
14:01:54 * myoung waves
14:02:08 <shardy> #topic agenda
14:02:08 <shardy> * one off agenda items
14:02:08 <shardy> * bugs
14:02:08 <shardy> * Projects releases or stable backports
14:02:08 <shardy> * CI
14:02:10 <shardy> * Specs
14:02:13 <shardy> * open discussion
14:02:28 <shardy> Does anyone have any more one-off items to add today?
14:02:37 <sshnaidm> shardy, please add to items: "tempest run for nonha periodic jobs"
14:02:50 <shardy> sshnaidm: ack, thanks
14:03:04 * beagles wanders in a couple of min late
14:03:09 <rbrady> o/
14:03:16 <shardy> #topic one off agenda items
14:03:34 <shardy> #info should we remove the containers job (slagle)
14:03:51 <shardy> slagle: did you chat with rhallisey or Slower about that?
14:03:54 <akrivoka> \o
14:04:09 <jcoufal> o/
14:04:12 <dprince> hi
14:04:13 <EmilienM> we should rather bring it back to green, shouldn't we?
14:04:15 <chem`> yo
14:04:18 <shardy> it's been broken for a long time, so I agree we either need to fix it or remove it
14:04:24 <slagle> shardy: no, i wasn't able to last week, that's why i added it to the meeting
14:04:34 <shardy> EmilienM: well, yes, but folks said they were doing that and it's not happened for $months
14:04:45 <EmilienM> :-(
14:04:49 <weshay> o/
14:04:52 <slagle> i would just like to know if people are working on fixing it
14:05:03 <EmilienM> he has a bunch of patches in progress for that
14:05:20 <EmilienM> they need to be rebased / passing CI / merged and things could be better
14:05:34 <enikher> hey
14:05:40 <enikher> #info Nikolas Hermanns
14:05:41 <rhallisey> hey
14:05:41 <d0ugal> o/
14:05:46 <rhallisey> sorry I'm late
14:05:50 <derekh> o/
14:05:53 <shardy> rhallisey: Hey, we're discussing the containers CI job
14:05:55 <slagle> EmilienM: right there was a lot of discussion at the summit though about not using docker-compose any longer, etc
14:06:04 <shardy> what's the status there, do we have a plan to get it working again?
14:06:22 <ccamacho> o/
14:06:51 <rhallisey> so the job worked for a while. I can rebase it to get it back working, but with composable roles the gate will be flaky
14:06:59 <dprince> shardy: we will need the job again at some point
14:07:03 <adarazs> o/
14:07:06 <qasims> o/
14:07:14 <shardy> rhallisey: Ok, so we're trying to decide if we can get it working, or if it should be temporarily removed
14:07:16 <slagle> of course, we can always add it back
14:07:31 <shardy> obviously getting it working is preferable, but we need folks willing to keep it green
14:07:40 <rhallisey> I considered the current state as temporarily removed
14:07:50 <rhallisey> which I think is appropriate
14:08:00 <EmilienM> maybe keep it only for tripleo-ci project jobs so rhallisey can still run some tests, or move it to the experimental pipeline
14:08:09 <EmilienM> so he can "check experimental" when needed
14:08:13 <shardy> rhallisey: Ok, well it's disabled, but we're discussing removing it from the default pipeline
14:08:22 <shardy> EmilienM: +1 that's perhaps a reasonable compromise
14:08:35 <rhallisey> shardy, ya I agree with EmilienM
14:08:40 <rhallisey> that would be best
14:08:46 <shardy> slagle: does that work for you?
14:08:52 <slagle> sure
14:08:53 <derekh> yup +1 to what EmilienM said, that way it won't be using up as many jenkins slaves
14:09:01 <EmilienM> rhallisey: if you're not familiar with infra, I can make the patch, let me know.
14:09:14 <shardy> #info agreement to make containers job temporarily only run via experimental pipeline
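For reference, moving a job into the experimental pipeline is normally a small zuul layout change in project-config; a rough, hypothetical sketch of what such a change could look like (project and job names here are illustrative only, not the actual patch EmilienM and rhallisey discuss preparing):

    # hypothetical zuul layout.yaml fragment -- names are illustrative
    projects:
      - name: openstack/tripleo-heat-templates
        check:
          - gate-tripleo-ci-centos-7-nonha        # stays in the default pipeline
        experimental:
          - gate-tripleo-ci-centos-7-containers   # now runs only on "check experimental"

With a layout like this the containers job stops consuming Jenkins slaves on every patch, but anyone can still trigger it by leaving a "check experimental" comment on a review.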
14:09:39 <shardy> #info tempest run for nonha periodic jobs
14:09:44 <shardy> sshnaidm: ^^
14:09:47 <rhallisey> EmilienM, I may ask some questions.  I've done it once :)
14:10:04 <EmilienM> rhallisey: cool, we'll catch up after the meeting
14:10:12 <sshnaidm> I think the timeout is not an issue now after the HW upgrade, as I see in the logs
14:10:34 <sshnaidm> so we can try to run tempest tests on the periodic nonha jobs, as they're the shortest ones
14:10:41 <rhallisey> shardy, sorry about being a tad late.  I'm in a large meeting in westford
14:10:54 <shardy> rhallisey: no problem
14:10:59 <sshnaidm> https://review.openstack.org/#/c/297038/
14:11:11 <rhallisey> container stuff
14:11:31 <derekh> sshnaidm: ya, I think we have the time available to do this now
14:11:52 <dprince> I'd like to hold on Tempest
14:12:00 <dprince> It really isn't our biggest problem ATM
14:12:05 <EmilienM> +1
14:12:13 <EmilienM> our CI is broken almost every day
14:12:17 <EmilienM> can we stabilize it first?
14:12:28 <dprince> like seriously... could someone point out what, if anything, Tempest is going to catch that our existing CI isn't already telling us
14:12:30 <derekh> sshnaidm: before we merge that though, can we just do one thing: submit a WIP patch to run a fake periodic job to make sure they're working, as that patch isn't tested in ci
14:12:52 <shardy> derekh: I was just about to ask how we see that patch actually working :)
14:12:53 <dprince> Our existing ping test is working quite well
14:12:57 <sshnaidm> dprince, we already caught 2 bugs with tempest runs
14:13:09 <sshnaidm> derekh, sure
14:13:09 <trown> sshnaidm: what bugs?
14:13:26 <dprince> sshnaidm: that is great. The question is does the extra time Tempest costs us warrant running it on every single test
14:13:29 <EmilienM> on the other hand, running tempest on a periodic job does not hurt as much as running it in our gate.
14:13:31 <sshnaidm> trown, I don't remember the numbers, one is about sahara not being available and another one I can find later
14:13:35 <trown> I have seen it catch packaging bugs where out-of-tree tempest plugins weren't packaged right
14:13:44 <shardy> dprince: I think the proposal is only to run it once per day?
14:13:47 <dprince> sshnaidm: If those bugs are like constantly regressing you may have a case
14:13:56 <EmilienM> trown: right, puppet-CI catches these very often.
14:13:57 <dprince> shardy: we've been over this *so* many times
14:13:58 <derekh> dprince: it's for the periodic job, it doesn't add anything to normal ci runs
14:14:06 <dprince> nobody is against running tempest in the periodic jobs
14:14:16 <dprince> okay, that is fine then
14:14:33 <sshnaidm> dprince, EmilienM I don't think tempest will hurt CI, the current failures have their own reasons, not sure tempest tests will break anything
14:14:40 <shardy> dprince: we went over it a couple of weeks ago, and the consensus was OK, let's add it to the periodic job if we have enough walltime
14:14:50 <dprince> periodic is fine
14:15:03 <slagle> if we merge that, our promotes, which are also done by periodic jobs, will be dependent on tempest
14:15:04 <derekh> so we're all good
14:15:15 <derekh> slagle: yes they will
14:15:19 <dprince> slagle: that is a good point
14:15:22 <sshnaidm> slagle, right
14:15:29 <slagle> i don't think we really need that tbh
14:15:36 <dprince> but... my main concern is adding wall time to the existing jobs
14:15:51 <slagle> what about a separate periodic job for tempest that does not influence the promote?
14:16:16 <sshnaidm> dprince, as I saw in all the last nonha jobs it's not an issue, we have time now
14:16:34 <sshnaidm> slagle, the point here is not promote bugs, afaiu..
14:16:51 <dprince> sshnaidm: even so. I think what you are hearing is basically Tempest doesn't add much to our existing CI
14:16:55 <pabelanger> I raised a question on the ML about the CI pipeline recently: http://lists.openstack.org/pipermail/openstack-dev/2016-May/095143.html and have some ideas how to reduce CI times.  Would be interested in feedback on it after the meeting
14:17:15 <dprince> sshnaidm: there is value in having our promotion job match (as closely as possible) what our Trunk jobs do as well
14:17:18 <sshnaidm> dprince, if it already caught a few bugs, why not?
14:18:01 <dprince> sshnaidm: list the bugs and make the case. I'm more interested if those bugs are perhaps corner cases... that may not often cause regressions in TripleO
14:18:08 <shardy> Ok, we're going to have to time-box this discussion and move on I think
14:18:21 <shardy> we keep having the same discussion every time, so lets all vote on https://review.openstack.org/#/c/297038
14:18:23 <slagle> i think the issue is that not every bug out there needs to block tripleo forward progress
14:18:45 <shardy> sshnaidm: please link the bugs found there, and post a link to where we can see results of it running
14:18:49 <derekh> slagle: that's true but if it doesn't block something we'll possibly ignore it
14:19:24 <shardy> slagle: every bug out there typically does block forward progress anyway tho ;)
14:19:33 <slagle> derekh: agreed, and i think i'm leaning towards ignoring some things being ok :)
14:19:45 <shardy> #topic bugs
14:19:46 <slagle> obviously we want to find and report all bugs that we can
14:20:07 <derekh> slagle: then a valid bug could make it the whole way to the end of a cycle until we start panicking about it
14:20:24 <shardy> #link https://bugs.launchpad.net/tripleo/?orderby=-id&start=0
14:20:38 <sshnaidm> dprince, shardy one of them: https://review.openstack.org/#/c/309042/
14:20:50 <dprince> shardy: I'm -2 on the Tempest patch as is I think
14:20:58 <shardy> So there's a packaging issue wrt mistral (caught by the periodic job), it now needs designateclient
14:22:03 <shardy> I'm trying to get a packaging fix in for that, and I confirmed the long term plan is for mistral to have soft dependencies so we won't have to install all-the-clients
14:22:14 <shardy> Any other bugs folks want to discuss?
14:22:26 <dprince> shardy: so rbrady filed a bug on the Mistral timeouts issue
14:22:30 <dprince> shardy: https://bugs.launchpad.net/mistral/+bug/1581649
14:22:31 <openstack> Launchpad bug 1581649 in Mistral "Action Execution Response Timeout" [High,New]
14:22:59 <rbrady> dprince: I'm still working on that bug
14:23:00 <dprince> which I don't actually see myself. But apparently it could be related to eventlet and the fact that we run Mistral API in Apache WSGI
14:23:15 <d0ugal> I am seeing that bug too
14:23:19 <dprince> rbrady: yep, just highlighting it so people know where to look if they hit that issue
14:23:37 <dprince> shardy: that is all from me on bugs
14:23:49 <shardy> dprince: interesting, thanks!  I hit that myself quite a few times
14:24:04 <marios> +1 me too, i thought it was an underpowered undercloud
14:24:18 <rbrady> it would be helpful if anyone else seeing that bug notes that in launchpad
14:24:23 <marios> i often had to just re-run the execution
14:24:40 <shardy> rbrady: will do - I hit it pretty much 50% of my executions
14:24:48 <marios> shardy: same
14:24:56 <shardy> which sounds like a blocker to switching the client over to mistral
14:25:13 <d0ugal> +1
14:25:20 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-1
14:25:36 <dprince> shardy: yes, this needs to be fixed
14:26:04 <shardy> So, the newton-1 milestone is in less than two weeks - I'm hoping we can cut some releases around the end of next week (or during the week after, but I'll be on PTO)
14:26:38 <shardy> can folks review that list, and ensure anything you consider release-blocker gets on there, and anything else gets moved to n-2?
14:26:49 <marios> shardy: well, https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades
14:26:57 <marios> shardy: i was wondering about that one
14:27:12 <shardy> marios: Yup, that was going to be my next question - can we break that down into some smaller pieces?
14:27:25 <shardy> that's a catch-all BP and I'm guessing we can't declare it implemented for n-1?
14:27:42 <marios> shardy: right... i wasn't even clear what it was about exactly
14:28:16 <jistr> yea i think we can break it down into smaller BPs
14:28:40 <marios> shardy: i can work with jistr to define something for next week's meeting?
14:28:42 <shardy> marios: any remaining pieces (including CI coverage) we need to be able to declare mitaka->newton upgrades fully supported in upstream builds
14:28:54 <jistr> marios: +1
14:28:58 <marios> shardy: or is that too late already?
14:28:59 <shardy> marios: sure, sounds good, thanks!
14:29:22 <shardy> marios: we can chat about it between now and next week - please enumerate the todo list somewhere, e.g etherpad
14:29:32 <marios> shardy: ack
14:29:34 <jistr> ack
14:29:35 <jcoufal> +1 feel free to drag me in if needed
14:29:39 <shardy> trown: what's the status wrt quickstart - can we declare that implemented now?
14:30:10 <trown> shardy: we are still a bit blocked on the image front, but I am meeting with derekh later today to sync on that
14:30:25 <trown> shardy: since the only consumable image atm is from RDO
14:31:00 <shardy> trown: Ok, looks like we may have to bump to n-2 then, can you perhaps link an etherpad from the BP with details of what remains, or link bugs in the tripleo-quickstart launchpad?
14:31:15 <trown> shardy: will do
14:31:17 <shardy> zero blueprints for n-1
14:31:42 <shardy> Ok, sorry we kinda side-tracked off bugs a bit there
14:31:49 <shardy> #topic Projects releases or stable backports
14:32:35 <slagle> on the aodh backport topic, i did see what deps get pulled in if you try and install aodh from mitaka on a liberty controller
14:32:40 <shardy> So, pradk posted http://lists.openstack.org/pipermail/openstack-dev/2016-May/095097.html
14:32:42 <slagle> and it was only aodh related packages
14:32:50 <slagle> afaict
14:33:10 <slagle> so i think we might be able to migrate to aodh as the first step in a mitaka upgrade
14:33:13 <shardy> slagle: interesting, so we could potentially do that as part of the upgrade vs backport?
14:33:22 <slagle> shardy: yes, that's my hope
14:33:24 <trown> I like that better
14:33:24 <slagle> and avoid the backport
14:33:29 <EmilienM> +1
14:33:30 <shardy> I think that would be better if possible
14:33:54 <shardy> If folks have other opinions please reply to the thread
14:34:14 <slagle> i'll reply with what i found in my test
14:34:19 <shardy> I guess this is a somewhat common requirement, so it'd be good to solve it in a general way
14:34:35 <shardy> since for stable/mitaka we will not even be considering such backports
14:35:06 <shardy> I already had to abandon our application for stable:follows_policy governance tag due to stable/liberty
14:35:39 <shardy> we're now blocked from applying that until liberty is EOL, so a non-backport solution would be preferable from that angle also
14:35:48 <shardy> slagle: thanks
14:35:50 <jistr> the general way we had in mind previously involved doing it as a last step of the upgrade
14:36:04 <jistr> together with the "convergence via puppet"
14:36:11 <jistr> but that implies not missing the deprecation period
14:36:28 <jistr> (we need to be able to use the old service/config/whatever with the new packages)
14:36:36 <shardy> jistr: AIUI the problem is aodh obsoletes the ceilometer-alarm packages, so then we'd have no alarm service for the duration of the upgrade?
14:36:48 <shardy> (which could be days while all the computes are upgraded?)
14:36:51 <jistr> yea
14:37:00 <jistr> but it wouldn't have to obsolete them necessarily i think
14:37:12 <jistr> but the issue is
14:37:28 <jistr> that ceilometer-alarm is not present in the upstream release of mitaka IIUC
14:38:05 <shardy> jistr: ack - well can you, slagle and pradk talk, and post an update to the thread with your preferred approach?
14:38:47 <jistr> sure thing. If we can find a better solution than what we've found so far, that would be great.
14:38:58 <shardy> +1
14:39:06 <shardy> Ok, anything else re releases or backports to mention?
14:39:36 <shardy> #topic CI
14:39:52 <shardy> derekh: care to give an update re the new super-speedy CI?
14:40:21 <derekh> Main CI story is that we (eventually) got HW upgrades done for the machines doing CI
14:40:33 <derekh> all of them now have 128G of RAM (was 64)
14:40:42 <derekh> and 200G SSDs
14:41:10 <shardy> AFAICT the performance looks quite a bit better?
14:41:15 <derekh> but we're only currently using the SSDs for the testenvs, jenkins slaves aren't using them yet
14:41:34 <EmilienM> shardy: well, we haven't had much time to evaluate, CI was broken a lot of the time since Friday.
14:41:37 <derekh> testenvs is where it was most important though
14:41:54 <EmilienM> but yeah, I've seen upgrades jobs at ~1h45
14:42:12 <derekh> shardy: ya things look better but I'm reluctant to say anything concrete until we have a few days where trunk is actually working
14:42:17 <EmilienM> some nonha jobs take 1h23
14:42:30 <dprince> performance is definitely improved. Would be nice to try and quantify it a bit further though
14:42:45 <derekh> the upgrade took a day longer than I had hoped, due to our bastion host refusing to reboot
14:42:58 <shardy> Yeah, definitely moving in the right direction, but would be good to figure out if we can get the runtimes a little more consistent
14:43:16 <derekh> dprince: yup, once we get a few days of results I'll try and do a comparison
14:43:25 <derekh> might even generate a graph or something
14:43:27 <shardy> e.g. I've seen nonha jobs vary between 1hr11 and 1hr40+
14:43:28 <dprince> derekh: thanks for driving this
14:43:55 <shardy> pabelanger: did you have anything to add re your CI work?
14:43:57 <dprince> and thanks for the teamwork in configuring the new drives in the BIOS (slagle, bnemec, EmilienM)
14:44:02 <pabelanger> sure
14:44:13 <pabelanger> I'd like to see if we can merge: https://review.openstack.org/#/c/316973/
14:44:15 <shardy> +1 thanks to everyone involved in the hardware upgrade and subsequent cleanup
14:44:22 <derekh> ya, thanks all for the BIOS help, that would have been a PITA alone
14:44:30 <EmilienM> also big kudos to pabelanger for staying very late on Thursday evening
14:44:34 <pabelanger> then find a time to migrate to centos-7 jenkins slaves: https://review.openstack.org/#/c/316856/
14:44:52 <pabelanger> Then the last step is launching an AFS mirror, which derekh is going to be commenting about on the ML
14:44:55 <jrist> pabelanger++
14:46:13 <EmilienM> nice move :-)
14:46:56 <derekh> pabelanger: that patch looks good to me
14:47:11 <shardy> thanks pabelanger, we'll check out those patches
14:47:20 <shardy> #topic Specs
14:47:29 <shardy> https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:47:35 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:47:59 <shardy> beagles: thanks for your feedback on the dpdk and sriov ones, still need others to check those out
14:48:30 <beagles> yup
14:48:32 <shardy> we also need to confirm direction wrt the lightweight HA one
14:49:01 <shardy> e.g if we can switch to only that model, or if we must support the old model too
14:49:26 <shardy> Any other updates re specs/features?
14:50:18 <shardy> dprince: other than needing more reviews, any updates re composable services?
14:50:41 <shardy> hopefully we can get a clear few days in CI and start landing those
14:51:35 <shardy> I posted a further example of how the fully composable/custom roles might work in https://review.openstack.org/#/c/315679/
14:51:36 <dprince> shardy: no, reviews are good
14:51:40 <dprince> shardy: working CI will help too
14:51:47 <shardy> Not yet functional, but feedback re the general approach is welcome
14:52:10 <shardy> Ok then
14:52:15 <shardy> #topic Open Discussion
14:52:30 <shardy> Anyone have anything else to raise?
14:52:30 <slagle> shardy: do you think we'd drive the templating via mistral?
14:52:39 <shardy> slagle: yes, that was my plan
14:53:01 <shardy> for now I've put the script in t-h-t, but I'd expect it to move to tripleo-common and be driven via mistral as part of the deployment workflow
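As a rough illustration of the direction being discussed (not the contents of the linked review), a jinja2-based generator for custom roles might consume a per-role list of services and emit the matching overcloud templates; a purely hypothetical sketch, with role and service names as placeholder examples:

    # hypothetical roles definition a template generator could consume;
    # field and service names here are assumptions, not the actual format
    - name: Controller
      CountDefault: 1
      Services:
        - OS::TripleO::Services::Keystone
        - OS::TripleO::Services::HeatEngine
    - name: Compute
      CountDefault: 1
      Services:
        - OS::TripleO::Services::NovaCompute

The idea is that the generator (and later the Mistral-driven workflow shardy mentions) would expand such a definition into one template per role rather than hard-coding controller/compute/etc.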
14:53:07 <gfidente> hey guys I wanted to ask if anybody has thoughts about the problem raised by EmilienM in the mailing list?
14:53:09 <slagle> cool, will have a look
14:53:18 <gfidente> on sharing data across roles
14:53:43 <shardy> gfidente: I think the allnodes deployment is probably the place to do it
14:53:49 <gfidente> and if there are opinions about my reply to have an 'allNodesConfig' output from the different roles?
14:54:05 <shardy> we've done that in various templates already
14:54:10 <EmilienM> let's try that
14:54:25 <gfidente> ack
14:54:36 <shardy> gfidente: Yeah, figuring out how that may wire in to composable services is tricky, but the only way to pass data between all the roles will be in allnodeconfig I think
14:54:50 <EmilienM> gfidente: do you have a PoC already?
14:54:58 <EmilienM> or any example of existing code?
14:55:03 <gfidente> EmilienM no so we can start by using the *existing* allNodesConfig
14:55:14 <gfidente> until we figure how to 'compose' that from the roles
14:55:33 <dprince> gfidente: the composable role bits we have today should be thought of as global config blobs
14:55:55 <dprince> gfidente: any per node settings are still being handled in the server templates themselves (compute.yaml, controller.yaml, etc.)
14:56:25 <dprince> and then the all nodes configs roll up that information into lists, etc. when needed
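A hypothetical sketch of that roll-up pattern (resource and key names are illustrative, not verbatim tripleo-heat-templates code): per-role server attributes are aggregated into lists in an all-nodes config whose hieradata every node can consume.

    # illustrative heat snippet only -- names are assumptions
    resources:
      allNodesConfig:
        type: OS::Heat::StructuredConfig
        properties:
          config:
            hiera_data:
              # lists rolled up from each role's server group attributes
              controller_node_ips:
                list_join:
                  - ','
                  - {get_attr: [Controller, ip_address]}
              compute_node_ips:
                list_join:
                  - ','
                  - {get_attr: [Compute, ip_address]}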
14:56:45 <gfidente> dprince yes but it's only distributed to nodes of the same 'type'
14:56:46 <dprince> if we need to access data between the two I'd just suggest using hiera I think
14:57:11 <gfidente> so yes for networking I have a submission which dumps IPs into hieradata
14:57:28 <gfidente> but for inter-role data we can't assume hiera is available
14:57:45 <gfidente> because we could be writing hiera on a different node than the one we need to read the data from
14:57:51 <dprince> gfidente: it may not be available, but it eventually would be, right?
14:57:57 <dprince> gfidente: hiera lookup with a default?
14:58:08 <gfidente> let's continue on #tripleo
14:58:15 <shardy> +1
14:58:16 <gfidente> there are probably different cases we need to cover
14:58:35 <dprince> gfidente: sure, there are some corner cases to cover yet
14:58:50 <gfidente> hiera seems okay for networking though
14:59:00 <gfidente> cause we dump it from the controller and consume it from the service
14:59:05 <shardy> Ok, we're pretty much out of time, thanks everyone!
14:59:07 <gfidente> on same node
14:59:11 <shardy> #endmeeting