14:00:03 <shardy> #startmeeting tripleo
14:00:09 <ccamacho> o/
14:00:10 <openstack> Meeting started Tue Jun 7 14:00:03 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 <openstack> The meeting name has been set to 'tripleo'
14:00:20 <shardy> #topic rollcall
14:00:22 <EmilienM> o/
14:00:27 <shardy> Hey all, who's around?
14:00:31 <marios> \o
14:00:36 <dprince> hey
14:00:41 <slagle> hi
14:00:46 <skramaja> hi
14:00:48 <rohitpagedar__> Hi
14:00:52 <derekh> o/
14:00:53 <jdob> o/
14:01:06 <jokke_> o/
14:01:21 <pradk> o/
14:01:24 <shadower> hey
14:01:24 <trown> o/
14:01:50 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:02:01 <shardy> #topic agenda
14:02:02 <shardy> * one off agenda items
14:02:02 <shardy> * bugs
14:02:02 <shardy> * Projects releases or stable backports
14:02:02 <shardy> * CI
14:02:04 <shardy> * Specs
14:02:06 <shardy> * open discussion
14:02:15 <shardy> Anyone have anything to add today? We don't have any one-off items in the wiki
14:02:48 <rohitpagedar__> o/
14:02:52 <jrist> o/
14:03:10 <rbrady> o/
14:03:11 <adarazs> o/
14:03:13 <beagles> o/
14:03:21 <d0ugal> o/
14:03:38 <shardy> Ok then, let's get started
14:03:43 <shardy> #topic bugs
14:04:10 <jistr> o/
14:04:13 <saneax> o/
14:04:13 <shardy> So, I flipped all released bugs to Fix Released for n-1:
14:04:16 <shardy> https://launchpad.net/tripleo/+milestone/newton-1
14:04:34 <shardy> if folks can target bugs to n-2 from now on so we can track them that would be good
14:05:28 <shardy> #link https://bugs.launchpad.net/tripleo/+bugs?orderby=-id&start=0
14:05:39 <shardy> Anyone have anything to highlight re bugs this week?
14:05:55 <shardy> I know we have various CI issues, but anything else we need to prioritize re bugs?
14:06:26 <shardy> I raised https://bugs.launchpad.net/tripleo/+bug/1589983 earlier, would be good if someone feels like picking that up
14:06:26 <openstack> Launchpad bug 1589983 in tripleo "tripleoclient leaks temporary files" [Medium,Triaged]
14:07:25 <weshay> \o
14:07:27 <shardy> beagles: how's your bug cleanup been going?
14:07:35 <myoung> o/
14:07:49 <shardy> I've been attempting to clean up old bugs too, a bunch of things are due to expire in the next 2 weeks, which is good
14:08:01 <shardy> and I've set all old Fix Committed things to Fix Released
14:08:10 <beagles> pretty well.. I didn't do much last week but got up to 1 year old bugs :)
14:08:23 <shardy> beagles: ack, well thanks for your help :)
14:08:42 <dprince> ls
14:08:45 <beagles> np.. I was just noticing on our current bug list that at least one is an RFE...
14:09:09 <beagles> what was the story with those... are bugs fine for that kind of thing
14:09:11 <shardy> beagles: that's OK, but they should be tagged spec-lite and marked as wishlist priority
14:09:18 <beagles> cool
14:09:48 <shardy> http://docs.openstack.org/developer/glance/contributing/blueprints.html#glance-spec-lite
14:10:03 <beagles> thanks
14:10:05 <shardy> we agreed a while back that folks could use a process similar to glance if they wish
14:10:18 <shardy> blueprints with/without specs as appropriate are also fine
14:10:35 <shardy> the main thing is we target things to the milestones so we have an idea on progress
14:10:45 <shardy> Ok anything else re bugs or shall we move on?
14:11:08 <shardy> #topic Projects releases or stable backports
14:11:27 <shardy> So, thanks very much to EmilienM and anyone else who helped get n-1 shipped last week :)
14:11:41 <trown> EmilienM++
14:11:41 <EmilienM> cool, I missed puppet release :P
14:11:45 <EmilienM> but no worries
14:12:12 <shardy> One thing I've seen discussed in the context of RDO is tagging releases for stable branches
14:12:32 <shardy> does anyone have any thoughts on how best to handle that, now that stable-maint don't tag periodic releases?
14:12:42 <shardy> trown: any suggestions on preferences there?
14:13:07 <shardy> the issue I've heard is some folks want to consume RDO stable/mitaka which is the latest tagged release, not the head of the stable branch
14:13:13 <slagle> we've done releases from stable branches before
14:13:13 <shardy> so they end up missing backported things
14:13:22 <trown> shardy: I wonder if we could just do stable milestone releases when we do the master milestone releases
14:13:25 <slagle> is it not just requesting a new release via openstack/releases?
14:13:36 <jokke_> shardy: In glance we have been trying to follow the idea that we tag whenever we merge something and it makes sense
14:13:52 <trown> shardy: assuming there is something new to release
14:13:56 <shardy> slagle: yup, I'm just wondering if we should have a plan re the period or other criteria which triggers semi-regular releases
14:14:06 <jokke_> as in we tag if something critical gets backported and otherwise we tag in bundles of a few patches merged
14:14:33 <slagle> yea, i was more or less doing it on demand previously
14:15:01 <jokke_> shardy: that was discussed quite a bit when stable decided on no periodic releases, and basically the consensus was that anything worth backporting should be worth releasing
14:15:01 <shardy> jokke_: ack, yeah we could do that, but TripleO has quite a lot of repos, so it's somewhat easier to just do periodic releases of everything vs actively monitoring everywhere
14:15:32 <jokke_> shardy: isn't that what the stable liaison is for ;)
14:16:36 <shardy> jokke_: Yeah, I think by default that's the PTL but if anyone else wants to step up to help I'm more than happy :)
14:16:57 <shardy> EmilienM has offered to help as a release liaison, so if anyone wants to share the load of proposing stable releases, that'd be good
14:17:00 <shardy> https://wiki.openstack.org/wiki/CrossProjectLiaisons
14:17:00 <EmilienM> yep
14:17:11 * coolsvap_mobile interested
14:17:18 <jokke_> shardy: let me get used to the codebase first, maybe next cycle ;)
14:17:21 <EmilienM> I'm happy to help with that, I'm already familiar with release management
14:17:28 <coolsvap_mobile> But I will need some help initially
14:17:58 <shardy> Ok, let's follow up in #tripleo - IMO it'd be best if we can spread the load around vs expecting one person to do everything
14:18:06 <shardy> given the number of repos/branches involved
14:18:25 <jokke_> I'm happy to help with any stable/release general questions so feel free to shoot so we don't overload the few carrying the responsibilities
14:19:02 <shardy> Ok, anything else on release/backports before we move on to CI?
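
For reference, the release request slagle mentions is a patch to the openstack/releases repo adding an entry to the deliverable file for the stable series. A minimal illustrative sketch of what such a file looks like is below; the version, hash, and exact field names are placeholders from memory rather than a real release:

    launchpad: tripleo
    team: tripleo
    releases:
      - version: 2.0.1                                # placeholder stable/mitaka version
        projects:
          - repo: openstack/tripleo-heat-templates
            hash: 0000000000000000000000000000000000000000   # placeholder commit on stable/mitaka
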
14:19:49 <shardy> #topic CI
14:19:57 <EmilienM> I noticed the stable/mitaka nonha job is broken https://review.openstack.org/#/c/324527/
14:20:05 <EmilienM> I haven't spent time to figure out why yet
14:20:21 <EmilienM> but imho we should high-prio this work too
14:20:28 <shardy> EmilienM: yeah, was just about to mention that - I started looking at it yesterday but haven't yet figured out the problem
14:20:33 <shardy> EmilienM: can you raise a bug please?
14:20:46 <EmilienM> shardy: I will
14:20:46 <shardy> AFAICT pingtest is failing because nova can't connect to keystone
14:21:09 <EmilienM> #action EmilienM to investigate & file a bug for why the stable/mitaka nonha job is failing
14:21:29 <shardy> So, more generally, CI has been broken by the move to centos7 slaves
14:21:45 <shardy> pabelanger: Do you have any update on the revert of that?
14:22:01 <EmilienM> I ran a recheck on a patch to see if CI is now working
14:22:11 <EmilienM> but not sure if something else needs to be updated
14:22:19 <pabelanger> no, in fact, I don't believe the issue is related to centos-7 DIBs. But more the exposure of actual failures running on centos-7.
14:22:27 <pabelanger> 2 issues
14:22:41 <pabelanger> 1) failure to connect to gearman servers. I don't have the ability to look into this
14:22:48 <fungi> the details in https://review.openstack.org/326182 were meager. is the suspected issue that your jobs currently don't work on centos 7 and you need time to port forward from centos 6?
14:23:05 <pabelanger> 2) openstack overcloud deploy takes 2+ hours (again no logs generated for this)
14:23:16 <dprince> pabelanger: I saw a few spurious gearman failures, but most of the issues were related to stack timeouts I think
14:23:26 <dprince> agree there may be multiple issues though
14:23:54 <pabelanger> To move forward, I think we need tripleo members to review the failures and see what is going on (and why no logs are produced)
14:24:49 <pabelanger> http://logs.openstack.org/11/326311/4/check-tripleo/gate-tripleo-ci-f22-nonha/208cda5/console.html
14:24:50 <trown> there was no way to test this in advance?
14:24:51 <dprince> using Centos 7 would be more ideal for TripleO. And pabelanger mentions that infra wants to soon deprecate the way TripleO builds up its jenkins image (they'd rather just use DIB directly)
14:25:01 <derekh> pabelanger: I wasn't in yesterday so haven't looked, but on 2) usually we get no logs if a job times out
14:25:04 <pabelanger> gearman failures are also happening on fedora-22
14:25:05 <dprince> so we do want to support the effort to switch sooner rather than later
14:25:21 <derekh> pabelanger: as the last part of the job is to copy logs off the nodes, this doesn't get executed
14:25:40 <derekh> pabelanger: what were the gearman failures?
14:25:50 <dprince> trown: I think we can test more in advance, yes
14:25:56 <EmilienM> derekh: why isn't it a jenkins publisher?
14:25:57 <pabelanger> derekh: okay, it should be easy to reproduce now in tripleo-experimental. I think we need logs to see what is going on with the deploy scripts
14:26:43 <pabelanger> trown: I did test in advance
14:26:45 <derekh> EmilienM: the publisher copies them to the log server, but the end of the tripleo jobs copies them onto the jenkins slave
14:27:03 <shardy> gear.NoConnectedServersError: No connected Gearman servers
14:27:11 <EmilienM> yeah pabelanger did experimental-jobs testing, it worked fine afaict
14:27:14 <shardy> that doesn't sound like a deploy failure to me
14:27:30 <trown> pabelanger: how did it work then and not now?
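
For context on the gear.NoConnectedServersError shardy quotes above: it comes from the gear client library used by the CI tooling to talk to gearman, and it is raised when the client has no live connection to any geard server, i.e. before any job payload runs at all. A minimal sketch, assuming the standard gear client API (the server host and job name here are made up):

    import gear

    client = gear.Client()
    client.addServer('gearman.example.org', 4730)   # hypothetical geard host/port
    # client.waitForServer() would block until a connection exists; without a
    # reachable server, submitting a job fails immediately with
    # gear.NoConnectedServersError: No connected Gearman servers
    job = gear.Job(b'build:gate-tripleo-ci-f22-nonha', b'{}')  # hypothetical job name/payload
    client.submitJob(job)
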
14:27:38 <pabelanger> shardy: jobs are timing out after 2.5 hours
14:27:42 * bnemec recalls that the patch merged with 2 of the 3 experimental voting jobs failing
14:27:43 <fungi> we work around that in devstack-gate (i know you never got around to reimplementing under d-g) by having an inner timeout to kill the test payload with enough time to still reshuffle logs into the location the publisher expects
14:27:44 <trown> just trying to understand how we ended up with CI totally down
14:27:47 <bnemec> but I could be wrong.
14:27:51 <shardy> it looks like it's just getting stuck and not really running the job?
14:28:05 <derekh> shardy: seems like a problem with our infrastructure, especially as it was happening on fedora slaves too
14:28:33 <pabelanger> trown: because I believe the issue is not centos-7 but the infra we are deploying to. e.g. gearman servers and other external factors.
14:28:49 <derekh> fungi: yup, we should do something similar
14:29:01 <shardy> Ok, let's work together after the meeting and try to figure it out
14:29:24 <shardy> pabelanger: ack - I think it was the timing which made us want to revert the centos7 change, e.g. it merged and everything broke around the same time
14:29:31 <pabelanger> What dprince mentioned is correct. If we can all iterate on this today, we can likely fix centos-7 with a small amount of work
14:29:34 <derekh> pabelanger: shardy: was the gear.NoConnectedServersError happening on fedora slaves also?
14:29:44 <pabelanger> derekh: it is now
14:29:52 <pabelanger> derekh: see the log I just linked
14:30:09 <derekh> pabelanger: ack, will take a look after this meeting
14:30:14 <dprince> pabelanger: yep, thanks for your work on this. fast tracking reverts is healthy too though :)
14:30:51 <shardy> Ok, shall we move on and continue discussion after the meeting?
14:30:59 <pabelanger> Sure, we are close. Just want to get us over the line
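
The inner-timeout approach fungi describes (kill the test payload early enough that log collection still runs before the outer Jenkins job timeout) could be sketched roughly as follows; the script names and timeout values are placeholders, not the actual devstack-gate or tripleo-ci implementation:

    import subprocess

    JOB_TIMEOUT = 150 * 60   # outer Jenkins job timeout (placeholder value)
    LOG_GRACE = 20 * 60      # time reserved for gathering/uploading logs

    try:
        # Run the deploy under an inner timeout shorter than the job timeout.
        subprocess.run(['./run-deploy.sh'], timeout=JOB_TIMEOUT - LOG_GRACE, check=True)
    except subprocess.TimeoutExpired:
        print('deploy hit the inner timeout; collecting logs anyway')
    except subprocess.CalledProcessError:
        print('deploy failed; collecting logs anyway')
    finally:
        # Always copy logs to where the publisher expects to find them,
        # so a timed-out run still produces console/overcloud logs.
        subprocess.run(['./collect-logs.sh'])
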
14:31:12 <shardy> #topic Specs
14:32:09 <shardy> So, in terms of features, we've got about 5 weeks until n-2
14:32:16 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-2
14:32:36 <shardy> we've got 10 blueprints targeted, and we landed zero for n-1
14:32:52 <dprince> shardy: I would like to land remote execution in this timeline. I think it is useful by itself... and also will be helpful to bootstrap Ansible validations
14:33:14 <shardy> I'm wondering if we need to break some of them down, or if we're confident e.g. we can get composable services, custom roles and the Mistral API landed in the next few weeks
14:33:50 <shardy> I've had some complaints that some (large) blueprints slipped into n-2 so we probably need to show incremental progress
14:33:58 <dprince> shardy: regarding the composable side of things, I'd like to see us focus on composable services first
14:34:14 <dprince> shardy: the composable roles bits can come later I think
14:34:48 <shardy> dprince: Yeah, although I think it's quite related - e.g. we need to sync steps on all roles, or you can't move an e.g. "step 4" service from say controller to compute
14:34:51 <EmilienM> dprince++
14:35:11 <dprince> shardy: for the Mistral workflow stuff I would very much like to see us cut over and start using Mistral for workflows via python-tripleoclient ASAP
14:35:14 <shardy> there are a few ways to do that, but one way is to template in all the steps for the roles via jinja, which would be an initial step towards composable roles
14:35:41 <dprince> shardy: no need to take baby steps and maintain backwards compat via "python library" code I think
14:36:01 <dprince> shardy: in fact having two code paths for python-tripleoclient to choose from is actually worse I think.
14:36:11 <shardy> dprince: agreed, I'm just wondering if we should have a couple of mini-sprints where we just push and get folks to concentrate on landing one feature for a few days
14:36:39 <dprince> shardy: well, I think these are different groups of people. So they can co-exist
14:37:14 <shardy> dprince: Yeah, I'm just trying to avoid fragmentation of effort meaning $many of the 10 blueprints slipping into n-3
14:37:15 <dprince> shardy: composable services is coming along slowly but surely I think
14:37:35 <EmilienM> slowly mainly because of CI issues
14:37:45 <EmilienM> a lot of work is already done or WIP
14:37:58 <dprince> yes, CI is slowing down composable work a bit
14:38:05 <shardy> Yeah, composable services is definitely progressing
14:38:30 <dprince> For Mistral I think we've got a nice spec. But we have just recently worked out some of the finer interface arrangements around the zaqar queue, etc.
14:38:37 <dprince> so there is still some design work ongoing there
14:39:01 <shardy> jistr: https://launchpad.net/tripleo/+milestone/newton-2 has three high priority BPs with no assignee
14:39:14 <shardy> can you get folks to assign themselves and set the delivery status?
14:39:33 <dprince> furthermore, I would like to see us focus on "parity" with the initial workflows... we are actually getting lots of good feedback from guys like dtansur for the baremetal.yaml workflows for example
14:39:36 <beagles> fwiw, I was able to spin up an l3 agent and metadata agent (required for DVR) with the existing merged code and a simple environment file .. but we need that step sync thing to keep that working
14:39:37 <shardy> I'll start deferring anything not obviously started fairly soon to avoid last-minute slips
14:39:49 * beagles blurts sometimes ;)
14:40:13 <shardy> beagles: you mean sync the steps between roles?
14:40:23 <dprince> feedback for future improvements I think. Rather than implement all of the improvements for workflows I would suggest capturing the new ideas in bugs and moving on
14:40:25 <jistr> shardy: will try to figure it out. Personally i'm presently tasked 100% downstream, will try to push to at least carve out time for reviews.
14:40:29 <marios> shardy: i thought it was already assigned to me so grabbing https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-workflow-mitaka-to-newton since am working on it
14:40:37 <marios> shardy: it's for n2
14:41:04 <shardy> jistr: Ok, that's fine, anything not assigned will just get deferred to n-3 so it's clear it's at-risk
14:41:13 <beagles> shardy: I think so .. basically that the l3 and metadata agent roles do their thing at the proper step...
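
To make the step-syncing discussion a bit more concrete: the idea shardy describes is to jinja-template the per-step deployment resources for every role in overcloud.yaml, so all roles share one step sequence and a service can move between roles without breaking ordering. A purely illustrative sketch of that shape follows; the roles variable, resource names, and step count are hypothetical, not the actual tripleo-heat-templates code:

    # Sketch only: generate the same numbered steps for every role; in practice
    # each step would also depend on the previous step of *all* roles so the
    # steps stay in sync across the whole overcloud.
    {% for role in roles %}
    {% for step in range(1, 6) %}
      {{role}}Deployment_Step{{step}}:
        type: OS::Heat::StructuredDeploymentGroup
        {% if step > 1 %}depends_on: {{role}}Deployment_Step{{step - 1}}{% endif %}
        properties:
          servers: {get_param: [servers, {{role}}]}
          config: {get_resource: {{role}}Config}
          input_values:
            step: {{step}}
    {% endfor %}
    {% endfor %}
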
14:41:17 <dprince> beagles: once we decompose all the roles I think we can rework how the -post.yaml's work to better accommodate syncing the steps
14:41:36 <beagles> shardy: or maybe I misunderstood what you meant
14:41:45 <dprince> beagles: we probably need to decompose it all first though so we can move things around more capably in t-h-t
14:41:51 <beagles> dprince: ack
14:42:13 <dprince> beagles: part of the vision I think... we just have to finish composable services first to get there
14:42:18 <beagles> I was just pretty jazzed to try it and see it kind of work this early in the game
14:42:20 <shardy> dprince: Yeah, I think we actually have to move away from -post per-role completely, because you can never sync resources between different nested stacks
14:42:30 <dprince> shardy: exactly, this was the idea
14:42:35 <jokke_> I have cycles, I just will need quite a bit of guidance
14:42:57 <shardy> dprince: I'm trying an approach which might work, which jinja templates all the steps, for each role, in overcloud.yaml
14:42:58 <jokke_> if it's storage related it's really easy for me to justify time for it
14:43:00 <dprince> shardy: I think this will be a top level overcloud.yaml architecture thing once we finish
14:43:07 <shardy> that would work, and fits well with the custom roles stuff I think
14:43:16 <shardy> dprince: +1
14:43:27 <shardy> good, sounds like we've reached the same conclusion :)
14:43:50 <shardy> Ok, we've kinda derailed from specs somewhat
14:43:52 <dprince> shardy: I'm less convinced the core team agrees on the Mistral workflow conclusions BTW
14:43:59 <shardy> anything else to raise re specs/features?
14:44:08 <dprince> shardy: seems to be some ambiguity about which code we are moving over to tripleo-common
14:44:27 * dprince wants the "library" concept for tripleo-common to be kept at a minimum
14:44:37 <shardy> dprince: that was why I mentioned a mini-sprint, I'm not sure there's a lack of agreement, I just think many folks haven't even looked at it
14:45:01 <shardy> dprince: rbrady was talking about a tripleo-workflows repo to contain all the mistral pieces
14:45:09 <d0ugal> I think there is ambiguity about tripleo-common in general :)
14:45:14 <dprince> shardy: yep, I'm very much a fan of the separate repo too
14:45:16 <shardy> and I agree we should stop (ab)using tripleo-common to put all the things in
14:45:17 <trown> indeed
14:45:27 <trown> wrt ambiguity of tripleo-common
14:45:37 <shardy> I always thought it was weird we decided to put API stuff in there
14:45:38 <dprince> shardy: that said, I'd rather see us cut over and actually use a Mistral workflow for something sooner rather than later
14:45:52 <dprince> shardy: all this talk... and we still aren't actually using it yet
14:46:11 <shardy> dprince: agreed, Ok let's try to get some eyes on it then
14:46:37 <shardy> #topic open discussion
14:46:51 <shardy> I suppose we've already started open discussion, but anything else folks want to raise?
14:47:08 <trown> just something semi-related to the above
14:47:21 <trown> do we intend to present image building through a mistral workflow?
14:47:27 <slagle> just wanted to mention i pushed some patches for multinode ci jobs via nodepool
14:47:33 <trown> or is that a case where we use tripleo-common as a library
14:47:37 <slagle> i'm trying to move forward with that and see how it goes
14:47:38 <d0ugal> trown: I don't think that is a candidate for the first pass
14:47:41 <EmilienM> slagle: w00t, links?
14:47:44 <dprince> trown: meh, I mean... that is a weird one I think
14:47:57 <shardy> slagle: nice!
14:47:59 <trown> d0ugal: cool, I think it shouldn't be a mistral workflow fwiw
14:48:03 <dprince> trown: if the UI requires it... then maybe. But I'd rather see the UI just require it
14:48:04 <slagle> EmilienM: https://review.openstack.org/#/q/topic:tripleo-multinode
14:48:08 <shardy> trown: IMO it's not something we should consider initially
14:48:13 <d0ugal> trown: Yeah, I wasn't sure but I've not really thought about it yet.
14:48:14 <dprince> trown: rather, the UI should just require that images exist
14:48:19 <shardy> let's just get the basic deployment workflow working first
14:48:23 <d0ugal> dprince: +1
14:48:25 <dprince> trown: building images via Mistral would be an abuse
14:48:42 <trown> dprince: ya that was my thought as well, just wanted to check, because tripleo-common is pretty ambiguous :)
14:49:17 <EmilienM> slagle: just an FYI, https://review.openstack.org/#/c/326095/ needs to be rebased
14:49:30 <jokke_> just a comment from fresh eyes trying to wrap my head around tripleO ... the current number of repos is already quite a handful to get hold of
14:49:31 <dprince> shardy: nice on the multinode work
14:49:48 <jokke_> Just saying, please don't introduce too many new ones just because we can
14:49:57 <trown> context: there are a couple of folks looking at updating tripleoclient to use the yaml based image build library in tripleo-common, and I didn't want to steer them wrong
14:50:09 <dprince> jokke_: ack, I think the goal is to eventually drop some... like perhaps tripleo-image-elements
14:50:21 <slagle> EmilienM: yes there's actually other issues with it since nodepool is installed from pypi in the jenkins jobs, so my needed depends-on isn't pulled in
14:50:27 <EmilienM> slagle: nice work anyway, thanks
14:50:39 <dprince> trown: I'm okay for the image building YAMLs to live in tripleo-common for now
14:50:41 <slagle> EmilienM: so i'll have to rework it a bit to get around that initially
14:50:53 <dprince> trown: perhaps not where we'll have it long term but it is okay
14:51:12 <EmilienM> pabelanger: just an FYI to put this on your radar: https://review.openstack.org/#/q/topic:tripleo-multinode
14:51:34 <shardy> talking of reducing repos, it'd be good to figure out what needs to be moved so we can retire tripleo-incubator
14:52:08 <shardy> The outdated README in that comes up above tripleo-docs in a google search :(
14:52:11 <trown> shardy: is anything still using it?
14:53:01 <shardy> trown: I thought a couple of the scripts were still used in CI but I may be wrong
14:53:11 <dprince> shardy: did we ever land your patch to switch to t-h-t driving os-net-config?
14:53:14 <shardy> perhaps I'll push a patch deleting everything and see what happens ;)
14:53:16 <dprince> shardy: via software deployments?
14:53:24 <shardy> dprince: No, I need to rebase it
14:53:30 <shardy> landing that would be good though
14:53:35 <shardy> I'll rebase it today
14:53:40 <dprince> shardy: I gave you some -1's for nits but I very much like that idea
14:53:52 <bnemec> +1
14:53:54 <pabelanger> EmilienM: thanks, just left a comment on 326095
14:53:57 <shardy> dprince: ack, thanks, I'll address those and hopefully we can get it passing CI
14:54:42 <bnemec> shardy: Note that there was just a patch posted related to the os-net-config script: https://review.openstack.org/#/c/326511
14:54:52 <bnemec> Looks like a valid bug, so we should make sure we don't lose the fix.
14:55:04 <EmilienM> slagle: please let me know if you need help with this work & bindep stuffs
14:55:17 <shardy> bnemec: thanks, I'll rebase on top of that
14:55:25 <bnemec> Cool
14:55:58 <bnemec> Oh, we're still tracking ci-admins in incubator too.
14:56:27 <shardy> bnemec: Yeah I thought there was some stuff like that
14:56:41 <shardy> it'd be good to move it to tripleo-ci then retire -incubator IMO
14:56:48 <slagle> pabelanger: thanks, i'll have a look at the bindep/jjb stuff
14:56:49 <bnemec> +1000
14:57:39 <shardy> Ok, anything else or shall we declare EOM?
14:58:04 <dprince> shardy: thanks for hosting
14:58:04 <shardy> Thanks all!
14:58:13 <shardy> #endmeeting