14:00:40 <shardy> #startmeeting tripleo
14:00:41 <openstack> Meeting started Tue Jul  5 14:00:40 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:45 <openstack> The meeting name has been set to 'tripleo'
14:00:47 <shardy> #topic rollcall
14:00:51 <saneax> o/
14:00:53 <EmilienM> o/
14:00:53 <shardy> Hi all, who's around?
14:00:56 <rhallisey> hi
14:00:57 <skramaja> hello
14:00:58 <dprince> hello
14:00:59 <beagles> o/
14:01:01 <ccamacho> Hello guys!!!
14:01:18 <thrash> o/
14:01:44 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:01:51 <derekh> o/
14:02:20 <trown> o/
14:02:24 <shardy> #topic agenda
14:02:25 <shardy> * one off agenda items
14:02:25 <shardy> * bugs
14:02:25 <shardy> * Projects releases or stable backports
14:02:27 <shardy> * CI
14:02:30 <shardy> * Specs
14:02:32 <shardy> * open discussion
14:02:42 <shardy> Does anyone have anything to add re one-off items?
14:02:50 <shardy> I see one re the UI network diagram, anything else?
14:04:02 <panda> o/ (late rollcall)
14:04:10 <shardy> #topic one-off agenda items
14:04:17 <shardy> #link https://blueprints.launchpad.net/tripleo-ui/+spec/network-diagram
14:04:20 <shardy> honza: Hi!
14:04:38 <shardy> So apologies, I saw the openstack-dev thread on this but have not yet replied
14:05:03 <shardy> the feature looks good, I'm just wondering what additional interfaces it may need
14:05:06 <marios> o/ /me late
14:05:12 <d0ugal> o/
14:05:20 <shardy> e.g can you use something like a heat stack-preview to get the graph of all network resources?
14:05:42 <shardy> IMHO you should definitely avoid parsing the templates directly as that will probably be fragile, and duplicate stuff inside heat
14:05:59 <shardy> Any other comments or questions on the BP?
14:07:15 <shardy> A spec would probably be a good idea, but that probably shouldn't prevent you starting the implementation
14:07:29 <shardy> the main risk I see in the BP description is "parse them from YAML"
14:07:37 <shardy> as mentioned, I don't think we should do that
14:08:51 <saneax> will it be encompassing the provider network as well, the ones set by os-net-config?
14:09:45 <shardy> saneax: good question - seems like we'd want it to support visualizing and validating both the physical and overlay network configuration
14:10:22 <shardy> Since honza doesn't appear to be around, let's take this to the existing openstack-dev thread
14:10:43 <shardy> #topic bugs
14:10:44 <dprince> shardy: sounds good
14:11:00 <shardy> #link https://bugs.launchpad.net/tripleo/
14:11:09 <honza> shardy: sorry, late
14:11:45 <shardy> honza: np, let's catch up in #tripleo after and/or on the ML
14:11:56 <honza> shardy: ok, thanks
14:11:58 <shardy> overall the BP looks good, but there's some details we should work out re parsing the templates
14:12:29 <shardy> Does anyone have any specific bugs to discuss this week?
14:12:29 <honza> shardy: dprince: yaml parsing is to be done via heat to avoid duplication of efforts (sorry that wasn't clear)
14:12:58 <shardy> things have been going better with CI this week, but there's still some issues with upgrades job both on master and stable branches
14:13:11 <shardy> some of that seems to be memory and/or walltime related
14:13:29 <sshnaidm> shardy, filed a bug for upgrades: https://bugs.launchpad.net/tripleo/+bug/1599048
14:13:29 <openstack> Launchpad bug 1599048 in tripleo "CI upgrades job fail because of memory shortage" [High,New]
14:13:49 <sshnaidm> shardy, but will look at this after ci move
14:14:46 <sshnaidm> (next week)
14:14:56 <derekh> shardy sshnaidm so the upgrades job now doesn't exist, until we get rh1 back doing something
14:15:12 <shardy> derekh: hehe, that's one way to fix the problem ;)
14:15:35 <derekh> shardy: yup, unfortunately we could only run a subset of jobs to keep us ticking over
14:15:56 <shardy> derekh: cool, what's the timeline for bringing the rh1 capacity back online?
14:16:01 <dprince> derekh: could we fit in a periodic upgrade job to at least monitor it?
14:16:13 <dprince> derekh: just an idea...
14:16:21 <derekh> I've added the upgrades jobs as an experimental job on the tripleo-ci repo, but that won't work unless we add multi-nics to the OVB envs
14:16:42 <shardy> so that means we've got no coverage of net-iso now too?
14:17:02 <dprince> sounds like it is time to land a bunch of code :) !
14:17:07 <shardy> lol
14:17:10 <shardy> #topic CI
14:17:12 <derekh> dprince: yup, I think I did add it as a periodic job
14:17:22 <shardy> since we're talking CI, let's just do the CI topic now :)
14:17:28 <derekh> dprince: but we'll need multi-nics for it to pass
14:17:40 <dprince> derekh: okay, so perhaps just a report, or we monitor that manually each day...
14:17:46 * derekh had something prepared
14:17:48 <derekh> All our jobs are now running on rh1, this is a subset of jobs but hopefully we'll only be in this situation for a short period
14:17:48 <derekh> Most of the details on the move are here http://lists.openstack.org/pipermail/openstack-dev/2016-July/098616.html
14:17:48 <derekh> If the jobs on rh2 go well, I think the main question we need to answer is, should we redeploy rh1 as it was or instead switch to OVB
14:17:48 <derekh> Some of the pros/cons I've put in that email, so maybe we should discuss it there
14:17:50 <derekh> On rh2 we have two problems currently
14:17:52 <derekh> 1. liberty and mitaka jobs were failing because the sudoers file on the image used contained quotes and instack-undercloud didn't allow for that
14:17:55 <derekh> this patch fixes the problem https://review.openstack.org/#/c/336470/
14:17:57 <derekh> but on liberty there seems to be a separate problem, python is failing with a seg fault
14:17:59 <derekh> http://logs.openstack.org/70/336470/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/816d049/console.html#_2016-07-04_16_14_34_270127
14:18:02 <derekh> not sure why yet
14:18:04 <derekh> 2. The failure rate of the HA job is a little higher than it was on rh2, this is what I'm trying to reproduce at the moment
14:18:12 <derekh> the first line is wrong, all jobs are now on rh2...
14:19:30 <shardy> derekh: thanks for the update - so can we make the call re OVB when we've run the new rh2 CI for a little longer?
14:19:45 <derekh> the last line is wrong also, s/rh2/rh1/
14:19:45 <dprince> derekh: I will catch up on the list too, but unless RH1 comes up without much work I'm leaning towards a rebuild
14:19:53 <shardy> e.g when will rh1 be either rebuilt or brought back online?
14:20:08 <dprince> we are long overdue... I hate having to context switch back to Nova BM and tripleo-image-elements to maintain that cloud
14:20:16 <derekh> shardy: yup, I think that would be best, but I'm leaning in the direction of OVB as well
14:20:27 <shardy> Yeah, provided the OVB env is working smoothly it seems like it'd be good to standardize on that
14:20:31 <derekh> shardy: we should have the HW available to us again on thursday
14:20:44 <shardy> derekh: Ok, cool, not a huge outage then
14:20:53 <derekh> shardy: after that it depends on what problems we hit, there could be none or many
14:21:03 <dprince> shardy: available doesn't mean anything is working :)
14:21:10 <dprince> fingers crossed though
14:21:16 <shardy> dprince: Yeah, but at that point we can make the call re OVB or not ;)
14:21:17 <derekh> yup, what dprince said
14:21:31 <shardy> until the hardware is moved, we can keep deliberating and looking at how things are with rh2 :)
14:21:32 <EmilienM> do we need to refrain from merging some kinds of patches (eg: net iso)?
14:22:04 <shardy> EmilienM: I think we'll have to rely on local testing for things that look risky from a CI coverage perspective
14:22:08 <dprince> EmilienM: without local testing I'd say lets be careful
14:22:25 <shardy> +1
14:22:27 <EmilienM> good to know
14:22:28 <dprince> I don't think we need to make a hard rule about not merging things though
14:23:21 <shardy> Ok, anything else we need to discuss re CI?
14:23:49 <panda> onboarding for new workforce?
14:24:26 <shardy> panda: Ah yeah, I never got around to the ML thread re mentoring, I'll do it today
14:24:34 <shardy> sorry, busy week
14:24:48 <panda> I will need some directions on where to start and what the initial expectations are
14:24:49 <shardy> thanks for the reminder
14:24:52 <panda> shardy: ok, thanks.
14:25:15 <derekh> I'll also create an account on rh2 with some quota for people looking to reproduce ci and put together some instructions
14:25:29 <shardy> derekh: +1 that sounds good
14:25:29 <EmilienM> https://goo.gl/VpIZPJ <- CI bugs
14:25:36 <saneax> derekh, +1
14:25:42 <shardy> #topic Projects releases or stable backports
14:25:54 <shardy> So, it's only a week until newton-2
14:26:02 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-2
14:26:12 <shardy> #link http://releases.openstack.org/newton/schedule.html
14:26:28 <EmilienM> shardy: sorry, still on CI
14:26:31 <shardy> I'm starting to get worried, because although we're making good progress, we've not landed any blueprints yet
14:26:32 <EmilienM> shardy: we need to land https://review.openstack.org/#/c/336130/
14:26:34 <shardy> same as n-1
14:26:50 <EmilienM> shardy: mgould has some patches waiting on liberty and he's waiting for https://review.openstack.org/#/c/336130/
14:27:07 <shardy> I think we're already pushing hard on composable-services, but can folks please help by prioritizing reviews to things on that n-2 launchpad list (bugs and BPs)
14:27:33 <marios> shardy: for https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-workflow-mitaka-to-newton - i've been pulled into some mitaka related upgrades work atm... so that may slip. biggest issue there is https://bugs.launchpad.net/tripleo/+bug/1596950
14:27:33 <openstack> Launchpad bug 1596950 in tripleo "undercloud upgrade from stable/mitaka to latest hangs - possibly for swift services yet tbd" [Medium,Triaged] - Assigned to Marios Andreou (marios-b)
14:27:40 <shardy> I'm going to start deferring things this week, but it'd be super-great if I don't have to defer multiple features..
14:28:05 <shardy> marios: ack, please retarget to n-3 then, and link the related bugs on the whiteboard
14:28:30 <shardy> if it's blocked by that bug, you can set the status to blocked
14:28:43 <marios> shardy: ack
14:29:08 <shardy> EmilienM: thanks, I saw that yesterday, good to see the lint issues are now fixed
14:29:35 <EmilienM> shardy: still need +1 on liberty, mitaka is fixed
14:30:22 <shardy> I think we should merge it, the upgrades job isn't failing because of the gemfile change
14:30:48 <EmilienM> right
14:31:02 <shardy> done ;)
14:31:06 <EmilienM> shardy: the upgrade job is broken on liberty since we switched to ipv6
14:31:23 <shardy> Yeah, it'd be good to work out which patches we need to fix that
14:31:34 <shardy> thanks for investigating and figuring out the root-cause
14:31:40 <EmilienM> right, I spent time on it but I could not find time to work on it more
14:31:56 <shardy> np, lets revisit after n-2 has passed
14:32:18 <shardy> Does anyone else have updates or questions around releases or backports?
14:33:04 <shardy> #topic Specs
14:33:35 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:33:38 <EmilienM> shardy: for the release, I can take care of it next Monday if you want
14:34:20 <shardy> We need further review of https://review.openstack.org/#/c/313872/ and https://review.openstack.org/#/c/313871/
14:34:31 <shardy> beagles: ^^ any chance you can revisit those?
14:34:42 <shardy> it'd be good to land them soon, as we're getting late in the cycle
14:34:53 <shardy> dsneddon has an interesting one re lldp too
14:35:04 <shardy> and we need to figure out the plan re composable upgrades
14:35:22 <shardy> EmilienM: Sure, if you're happy to do that that would be great, thanks! :)
14:35:31 <karthiks> shardy, we are working on SRIOV and DPDK
14:35:31 <EmilienM> will do
14:36:09 <shardy> karthiks: ack - I've approved the blueprints, so it'd be good to get those specs landed too
14:36:19 <skramaja> shardy, we have almost completed the SRIOV changes.. final puppet changes in progress..
14:36:46 <skramaja> for dpdk, we are working on the os-net-config and the puppet changes..
14:37:02 <marios> shardy: ack on the composable upgrades, though as we discussed previously, it's an upgrade newton to ocata thing. by "work out" you mean still tbd/discuss etc right, or did i miss a thread again
14:37:21 <shardy> skramaja: good to know, I see some patches referenced from https://blueprints.launchpad.net/tripleo/+spec/tripleo-sriov
14:37:24 <shardy> thanks for the update
14:37:34 <skramaja> shardy, is there anything we have to do on the specs to push them forward?
14:37:42 <marios> shardy: also happy to work/help on a spec once we're there... though jistr has one up already that we should update once we decide
14:37:46 <shardy> marios: Yeah, I was just mentioning it as it's an outstanding spec
14:38:03 <shardy> marios: we've got a more urgent "ensure minor updates work with composable services" problem for Newton
14:38:32 <shardy> but the major upgrade of composable services can potentially slip to ocata, but I'd like to get a clear idea of how we'll do it asap
14:38:45 <marios> shardy: ack thanks
14:38:53 <shardy> been planning some prototyping to help with that, but composable roles/services have taken priority
14:39:10 <dprince> shardy: any links to your prototypes?
14:39:16 * dprince is interested in prototyping this too
14:39:32 <shardy> dprince: No, I've not done them yet, just looked at a few possible approaches
14:39:41 <shardy> hoping to get to that after we're past n2 next week
14:39:47 <dprince> shardy: cool. thanks
14:40:05 <EmilienM> I have one remark about composable services
14:40:08 <shardy> dprince: cool, we can sync up and perhaps work on it together
14:40:41 <EmilienM> I would like to slow down a bit the effort in new composable services (mistral, zaqar, etc) and help me (and others) to finish core services (neutron, ceilometer, etc)
14:41:14 <EmilienM> we still have some bits to move
14:41:25 <EmilienM> and if we want to make it for newton, we need more hands
14:41:38 <EmilienM> mistral/zaqar/etc can wait IMHO
14:41:42 <dprince> EmilienM: for new services though there shouldn't be any conflicts right?
14:41:49 <EmilienM> I'm not talking about conflict
14:41:52 <dprince> EmilienM: besides the CI resource drain, what are your concerns?
14:42:20 <EmilienM> ok let me put this way: I need help to finish ceilometer / nova / neutron bits
14:42:24 <marios> EmilienM: :( i'd really rather not delay manila further  though i can appreciate your concern for the time left to get all thigns done (core)
14:42:51 <dprince> EmilienM: from my POV one of the biggest benefits of composability is that we can add new services without concern for impacting the core
14:42:52 <EmilienM> I'm dealing with multiple blockers in same time, I would need some help
14:43:07 <dprince> EmilienM: ack, I agree with the priorities FWIW
14:43:11 <EmilienM> dprince: *again* it's not a problem of impacting the core
14:43:18 <EmilienM> it's a problem of "enough people moving core bits"
14:43:19 <shardy> we should prioritize those services we know folks want to scale independently of the controller
14:43:27 <dprince> EmilienM: but I would like to not discourage others who are chipping away at new services
14:43:40 <shardy> e.g of those which are still part of the monolithic controller definition
14:43:43 <EmilienM> I'm not discouraging people, I'm asking for help here
14:43:50 <dprince> EmilienM: gotcha
14:43:53 <EmilienM> cool
14:44:43 <EmilienM> some people are waiting for this thing to deploy AIO etc
14:44:56 <shardy> Yeah, there's a balance here, stuff like manila have been around for ages so we shouldn't block that, but absolutely we need more help with reviews/patches for the core services
14:45:05 <EmilienM> it's currently not possible because we need to finish nova / neutron (very close but still not finished)
14:45:27 <slagle> fwiw, i have deployed aio, w/o pacemaker
14:45:35 <shardy> EmilienM: perhaps we should break down the remaining services and raise bugs for those not yet decomposed
14:45:43 <EmilienM> right, but with pacemaker we still have issues (I'm on it)
14:45:45 <slagle> EmilienM: i'm testing your other 2 patches now
14:46:04 <shardy> then we could land the composable services BP, which is really about the new architecture, and have a clearer view wrt progress on remaining/new services
14:46:26 <shardy> with the benefit of hindsight, a single BP and an etherpad hasn't been the best way to track this (huge) chunk of work
14:46:45 <jrist> did it seem huge in foresight? :)
14:46:57 <EmilienM> right, it was a lot of work
14:47:23 <shardy> jrist: I think we knew it was a lot of work, but tracking it could have been done in a more granular way
14:47:46 <ccamacho> guys quick question, is the etherpad fully updated? https://etherpad.openstack.org/p/tripleo-composable-services
14:47:55 <EmilienM> ccamacho: I'm trying to update it every day
14:48:14 <dprince> shardy: we could have just forked t-h-t and then we'd be done by now already :)
14:48:22 <EmilienM> ccamacho: I check all patches every day and update status
14:48:37 <ccamacho> EmilienM me too, but when I hit things not updated, ack
14:48:49 <shardy> dprince: maybe, but this approach has allowed us to prove each step via CI, which we would have lost with the forking approach I guess
14:49:16 <shardy> progress lately has been very good, since CI has been more stable :)
14:49:48 <shardy> #topic open discussion
14:50:03 <shardy> Anyone have any topics to discuss in our final 10mins?
14:50:17 <shardy> slagle: the AIO work sounds interesting, care to share any details on that?
14:52:09 <shardy> Evidently not ;)
14:52:21 <shardy> Ok, well if there's nothing else lets finish early, thanks everyone!
14:52:33 <dprince> shardy: thanks
14:52:34 <shardy> #endmeeting