14:00:40 <shardy> #startmeeting tripleo
14:00:41 <openstack> Meeting started Tue Jul 5 14:00:40 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:45 <openstack> The meeting name has been set to 'tripleo'
14:00:47 <shardy> #topic rollcall
14:00:51 <saneax> o/
14:00:53 <EmilienM> o/
14:00:53 <shardy> Hi all, who's around?
14:00:56 <rhallisey> hi
14:00:57 <skramaja> hello
14:00:58 <dprince> hello
14:00:59 <beagles> o/
14:01:01 <ccamacho> Hello guys!!!
14:01:18 <thrash> o/
14:01:44 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:01:51 <derekh> o/
14:02:20 <trown> o/
14:02:24 <shardy> #topic agenda
14:02:25 <shardy> * one off agenda items
14:02:25 <shardy> * bugs
14:02:25 <shardy> * Projects releases or stable backports
14:02:27 <shardy> * CI
14:02:30 <shardy> * Specs
14:02:32 <shardy> * open discussion
14:02:42 <shardy> Does anyone have anything to add re one-off items?
14:02:50 <shardy> I see one re the UI network diagram, anything else?
14:04:02 <panda> o/ (late rollcall)
14:04:10 <shardy> #topic one-off agenda items
14:04:17 <shardy> #link https://blueprints.launchpad.net/tripleo-ui/+spec/network-diagram
14:04:20 <shardy> honza: Hi!
14:04:38 <shardy> So apologies, I saw the openstack-dev thread on this but have not yet replied
14:05:03 <shardy> the feature looks good, I'm just wondering what additional interfaces it may need
14:05:06 <marios> o/ /me late
14:05:12 <d0ugal> o/
14:05:20 <shardy> e.g. can you use something like a heat stack-preview to get the graph of all network resources?
14:05:42 <shardy> IMHO you should definitely avoid parsing the templates directly as that will probably be fragile, and duplicate stuff inside heat
14:05:59 <shardy> Any other comments or questions on the BP?
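[Editor's note: shardy's suggestion above is to query heat (e.g. via stack-preview) rather than parse the YAML templates. As a hedged illustration only, here is a minimal sketch of pulling network resources out of a preview-style response; the response shape, the `OS::Neutron::` filter, and all names here are assumptions for illustration, not code from the discussion.]

```python
# Hypothetical sketch: instead of parsing templates from YAML, filter
# network-related resources out of a `heat stack-preview`-style response.
# The dict shape below is an assumed simplification of a real preview.

NETWORK_TYPE_PREFIXES = ("OS::Neutron::",)


def network_resources(preview):
    """Recursively collect Neutron resource names from a preview dict."""
    found = []
    for res in preview.get("resources", []):
        if isinstance(res, list):
            # nested stacks can appear as nested lists of resources
            found.extend(network_resources({"resources": res}))
        elif res.get("resource_type", "").startswith(NETWORK_TYPE_PREFIXES):
            found.append(res["resource_name"])
    return found


sample = {
    "resources": [
        {"resource_name": "InternalApiNetwork",
         "resource_type": "OS::Neutron::Net"},
        {"resource_name": "Controller",
         "resource_type": "OS::Nova::Server"},
    ]
}

print(network_resources(sample))  # -> ['InternalApiNetwork']
```

The point of the sketch is the approach shardy advocates: let heat resolve the templates and walk its resource graph, rather than re-implementing template parsing in the UI.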
14:07:15 <shardy> A spec would probably be a good idea, but that probably shouldn't prevent you starting the implementation
14:07:29 <shardy> the main risk I see in the BP description is "parse them from YAML"
14:07:37 <shardy> as mentioned, I don't think we should do that
14:08:51 <saneax> will it be encompassing the provider networks as well, the ones set by os-net-config?
14:09:45 <shardy> saneax: good question - seems like we'd want it to support visualizing and validating both the physical and overlay network configuration
14:10:22 <shardy> Since honza doesn't appear to be around, let's take this to the existing openstack-dev thread
14:10:43 <shardy> #topic bugs
14:10:44 <dprince> shardy: sounds good
14:11:00 <shardy> #link https://bugs.launchpad.net/tripleo/
14:11:09 <honza> shardy: sorry, late
14:11:45 <shardy> honza: np, let's catch up in #tripleo after and/or on the ML
14:11:56 <honza> shardy: ok, thanks
14:11:58 <shardy> overall the BP looks good, but there's some details we should work out re parsing the templates
14:12:29 <shardy> Does anyone have any specific bugs to discuss this week?
14:12:29 <honza> shardy: dprince: yaml parsing is to be done via heat to avoid duplication of effort (sorry that wasn't clear)
14:12:58 <shardy> things have been going better with CI this week, but there's still some issues with the upgrades job both on master and stable branches
14:13:11 <shardy> some of that seems to be memory and/or walltime related
14:13:29 <sshnaidm> shardy, filed a bug for upgrades: https://bugs.launchpad.net/tripleo/+bug/1599048
14:13:29 <openstack> Launchpad bug 1599048 in tripleo "CI upgrades job fail because of memory shortage" [High,New]
14:13:49 <sshnaidm> shardy, but will look at this after the ci move
14:14:46 <sshnaidm> (next week)
14:14:56 <derekh> shardy sshnaidm so the upgrades job now doesn't exist, until we get rh1 back doing something
14:15:12 <shardy> derekh: hehe, that's one way to fix the problem ;)
14:15:35 <derekh> shardy: yup, unfortunately we could only run a subset of jobs to keep us ticking over
14:15:56 <shardy> derekh: cool, what's the timeline for bringing the rh1 capacity back online?
14:16:01 <dprince> derekh: could we fit in a periodic upgrade job to at least monitor it?
14:16:13 <dprince> derekh: just an idea...
14:16:21 <derekh> I've added the upgrades job as an experimental job on the tripleo-ci repo, but that won't work unless we add multi-nics to the OVB envs
14:16:42 <shardy> so that means we've got no coverage of net-iso now too?
14:17:02 <dprince> sounds like it is time to land a bunch of code :) !
14:17:07 <shardy> lol
14:17:10 <shardy> #topic CI
14:17:12 <derekh> dprince: yup, I think I did add it as a periodic job
14:17:22 <shardy> since we're talking CI, let's just do the CI topic now :)
14:17:28 <derekh> dprince: but we'll need multi-nics for it to pass
14:17:40 <dprince> derekh: okay, so perhaps just a report, or we monitor that manually each day...
14:17:46 * derekh had something prepared
14:17:48 <derekh> All our jobs are now running on rh1, this is a subset of jobs but hopefully we'll only be in this situation for a short period
14:17:48 <derekh> Most of the details on the move are here http://lists.openstack.org/pipermail/openstack-dev/2016-July/098616.html
14:17:48 <derekh> If the jobs on rh2 go well, I think the main question we need to answer is, should we redeploy rh1 as it was or instead switch to OVB
14:17:48 <derekh> Some of the pros/cons I've put in that email, so maybe we should discuss it there
14:17:50 <derekh> On rh2 we have two problems currently
14:17:52 <derekh> 1. liberty and mitaka jobs were failing because the sudoers file on the image used contained quotes and instack-undercloud didn't allow for that
14:17:55 <derekh> this patch fixes the problem https://review.openstack.org/#/c/336470/
14:17:57 <derekh> but on liberty there seems to be a separate problem, python is failing with a seg fault
14:17:59 <derekh> http://logs.openstack.org/70/336470/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/816d049/console.html#_2016-07-04_16_14_34_270127
14:18:02 <derekh> not sure why yet
14:18:04 <derekh> 2. The failure rate of the HA job is a little higher than it was on rh2, this is what I'm trying to reproduce at the moment
14:18:12 <derekh> the first line is wrong, all jobs are now on rh2...
14:19:30 <shardy> derekh: thanks for the update - so can we make the call re OVB when we've run the new rh2 CI for a little longer?
14:19:45 <derekh> the last line is wrong also, s/rh2/rh1/
14:19:45 <dprince> derekh: I will catch up on the list too, but unless RH1 comes up without much work I'm leaning towards a rebuild
14:19:53 <shardy> e.g. when will rh1 be either rebuilt or brought back online?
14:20:08 <dprince> we are long overdue... I hate having to context switch back to Nova BM and tripleo-image-elements to maintain that cloud
14:20:16 <derekh> shardy: yup, I think that would be best, but I'm leaning in the direction of OVB as well
14:20:27 <shardy> Yeah, provided the OVB env is working smoothly it seems like it'd be good to standardize on that
14:20:31 <derekh> shardy: we should have the HW available to us again on thursday
14:20:44 <shardy> derekh: Ok, cool, not a huge outage then
14:20:53 <derekh> shardy: after that it depends on what problems we hit, there could be none or many
14:21:03 <dprince> shardy: available doesn't mean anything is working :)
14:21:10 <dprince> fingers crossed though
14:21:16 <shardy> dprince: Yeah, but at that point we can make the call re OVB or not ;)
14:21:17 <derekh> yup, what dprince said
14:21:31 <shardy> until the hardware is moved, we can keep deliberating and looking at how things are with rh2 :)
14:21:32 <EmilienM> do we need to refrain from merging some kinds of patches (eg: net iso)?
14:22:04 <shardy> EmilienM: I think we'll have to rely on local testing for things that look risky from a CI coverage perspective
14:22:08 <dprince> EmilienM: without local testing I'd say let's be careful
14:22:25 <shardy> +1
14:22:27 <EmilienM> good to know
14:22:28 <dprince> I don't think we need to make a hard rule about not merging things though
14:23:21 <shardy> Ok, anything else we need to discuss re CI?
14:23:49 <panda> onboarding for new workforce?
14:24:26 <shardy> panda: Ah yeah, I never got around to the ML thread re mentoring, I'll do it today
14:24:34 <shardy> sorry, busy week
14:24:48 <panda> I will need some directions on where to start and what the initial expectations are
14:24:49 <shardy> thanks for the reminder
14:24:52 <panda> shardy: ok, thanks.
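[Editor's note: the liberty/mitaka failure derekh describes above was instack-undercloud not tolerating quotes in a sudoers file on the image. As a hedged illustration only, here is a toy validator showing how a character whitelist that omits double quotes rejects an otherwise legitimate sudoers line; the regexes and the example line are invented for illustration and are not the actual instack-undercloud code.]

```python
import re

# Hypothetical sudoers line check, illustrating the quoting problem:
# a whitelist without the double-quote character rejects any entry
# whose command arguments are quoted.
NAIVE = re.compile(r"^[\w\s,:=()/.*!-]+$")            # no quotes allowed
QUOTE_AWARE = re.compile(r'^[\w\s,:=()/.*!"-]+$')     # double quotes allowed

line = 'jenkins ALL = NOPASSWD: /usr/bin/rm -rf "/tmp/ci-*"'

print(bool(NAIVE.match(line)))        # -> False: the quotes trip it up
print(bool(QUOTE_AWARE.match(line)))  # -> True
```

A fix in the spirit of the referenced review would widen the validation to accept quoting that sudoers itself considers legal, rather than stripping the quotes from the image.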
14:25:15 <derekh> I'll also create an account on rh2 with some quota for people looking to reproduce ci and put together some instructions
14:25:29 <shardy> derekh: +1 that sounds good
14:25:29 <EmilienM> https://goo.gl/VpIZPJ <- CI bugs
14:25:36 <saneax> derekh, +1
14:25:42 <shardy> #topic Projects releases or stable backports
14:25:54 <shardy> So, it's only a week until newton-2
14:26:02 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-2
14:26:12 <shardy> #link http://releases.openstack.org/newton/schedule.html
14:26:28 <EmilienM> shardy: sorry, still on CI
14:26:31 <shardy> I'm starting to get worried, because although we're making good progress, we've not landed any blueprints yet
14:26:32 <EmilienM> shardy: we need to land https://review.openstack.org/#/c/336130/
14:26:34 <shardy> same as n-1
14:26:50 <EmilienM> shardy: mgould has some patches waiting on liberty and he's waiting for https://review.openstack.org/#/c/336130/
14:27:07 <shardy> I think we're already pushing hard on composable-services, but can folks please help by prioritizing reviews to things on that n-2 launchpad list (bugs and BPs)
14:27:33 <marios> shardy: for https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-workflow-mitaka-to-newton - i've been pulled into some mitaka related upgrades work atm... so that may slip. biggest issue there is https://bugs.launchpad.net/tripleo/+bug/1596950
14:27:33 <openstack> Launchpad bug 1596950 in tripleo "undercloud upgrade from stable/mitaka to latest hangs - possibly for swift services yet tbd" [Medium,Triaged] - Assigned to Marios Andreou (marios-b)
14:27:40 <shardy> I'm going to start deferring things this week, but it'd be super-great if I don't have to defer multiple features..
14:28:05 <shardy> marios: ack, please retarget to n-3 then, and link the related bugs on the whiteboard
14:28:30 <shardy> if it's blocked by that bug, you can set the status to blocked
14:28:43 <marios> shardy: ack
14:29:08 <shardy> EmilienM: thanks, I saw that yesterday, good to see the lint issues are now fixed
14:29:35 <EmilienM> shardy: still need +1 on liberty, mitaka is fixed
14:30:22 <shardy> I think we should merge it, the upgrades job isn't failing because of the gemfile change
14:30:48 <EmilienM> right
14:31:02 <shardy> done ;)
14:31:06 <EmilienM> shardy: the upgrade job is broken on liberty since we switched to ipv6
14:31:23 <shardy> Yeah, it'd be good to work out which patches we need to fix that
14:31:34 <shardy> thanks for investigating and figuring out the root-cause
14:31:40 <EmilienM> right, I spent time on it but I could not find time to work on it more
14:31:56 <shardy> np, let's revisit after n-2 has passed
14:32:18 <shardy> Does anyone else have updates or questions around releases or backports?
14:33:04 <shardy> #topic Specs
14:33:35 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:33:38 <EmilienM> shardy: for the release, I can take care of it next Monday if you want
14:34:20 <shardy> We need further review of https://review.openstack.org/#/c/313872/ and https://review.openstack.org/#/c/313871/
14:34:31 <shardy> beagles: ^^ any chance you can revisit those?
14:34:42 <shardy> it'd be good to land them soon, as we're getting late in the cycle
14:34:53 <shardy> dsneddon has an interesting one re lldp too
14:35:04 <shardy> and we need to figure out the plan re composable upgrades
14:35:15 <karthiks> shadower, we are working on SRIOV and DPDK
14:35:22 <shardy> EmilienM: Sure, if you're happy to do that that would be great, thanks! :)
14:35:31 <karthiks> shardy, we are working on SRIOV and DPDK
14:35:31 <EmilienM> will do
14:36:09 <shardy> karthiks: ack - I've approved the blueprints, so it'd be good to get those specs landed too
14:36:19 <skramaja> shardy, we are almost done with the SRIOV changes.. final puppet changes in progress..
14:36:46 <skramaja> for the dpdk, we are working on the os-net-config and the puppet changes for dpdk..
14:37:02 <marios> shardy: ack on the composable upgrades, though as we discussed previously, that's a newton to ocata upgrade thing. by "work out" you mean still tbd/to discuss etc, right, or did i miss a thread again?
14:37:21 <shardy> skramaja: good to know, I see some patches referenced from https://blueprints.launchpad.net/tripleo/+spec/tripleo-sriov
14:37:24 <shardy> thanks for the update
14:37:34 <skramaja> shardy, is there anything we have to do on the specs to push the specs
14:37:42 <marios> shardy: also happy to work/help on a spec once we're there... though jistr has one up already that we should update once we decide
14:37:46 <shardy> marios: Yeah, I was just mentioning it as it's an outstanding spec
14:38:03 <shardy> marios: we've got a more urgent "ensure minor updates work with composable services" problem for Newton
14:38:32 <shardy> but the major upgrade of composable services can potentially slip to ocata, but I'd like to get a clear idea of how we'll do it asap
14:38:45 <marios> shardy: ack thanks
14:38:53 <shardy> been planning some prototyping to help with that, but composable roles/services have taken priority
14:39:10 <dprince> shardy: any links to your prototypes?
14:39:16 * dprince is interested in prototyping this too
14:39:32 <shardy> dprince: No, I've not done them yet, just looked at a few possible approaches
14:39:41 <shardy> hoping to get to that after we're past n-2 next week
14:39:47 <dprince> shardy: cool. thanks
14:40:05 <EmilienM> I have one remark about composable services
14:40:08 <shardy> dprince: cool, we can sync up and perhaps work on it together
14:40:41 <EmilienM> I would like to slow down the effort on new composable services (mistral, zaqar, etc) a bit, and have folks help me (and others) finish the core services (neutron, ceilometer, etc)
14:41:14 <EmilienM> we still have some bits to move
14:41:25 <EmilienM> and if we want to make it for newton, we need more hands
14:41:38 <EmilienM> mistral/zaqar/etc can wait IMHO
14:41:42 <dprince> EmilienM: for new services though there shouldn't be any conflicts right?
14:41:49 <EmilienM> I'm not talking about conflicts
14:41:52 <dprince> EmilienM: besides the CI resource drain, what are your concerns?
14:42:20 <EmilienM> ok let me put it this way: I need help to finish the ceilometer / nova / neutron bits
14:42:24 <marios> EmilienM: :( i'd really rather not delay manila further, though i can appreciate your concern for the time left to get everything (core) done
14:42:51 <dprince> EmilienM: from my POV one of the biggest benefits of composability is that we can add new services without concern for impacting the core
14:42:52 <EmilienM> I'm dealing with multiple blockers at the same time, I would need some help
14:43:07 <dprince> EmilienM: ack, I agree with the priorities FWIW
14:43:11 <EmilienM> dprince: *again* it's not a problem of impacting the core
14:43:18 <EmilienM> it's a problem of having enough people moving the core bits
14:43:19 <shardy> we should prioritize those services we know folks want to scale independently of the controller
14:43:27 <dprince> EmilienM: but I would like to not discourage others who are chipping away at new services
14:43:40 <shardy> e.g. those which are still part of the monolithic controller definition
14:43:43 <EmilienM> I'm not discouraging people, I'm asking for help here
14:43:50 <dprince> EmilienM: gotcha
14:43:53 <EmilienM> cool
14:44:43 <EmilienM> some people are waiting for this thing to deploy AIO etc
14:44:56 <shardy> Yeah, there's a balance here, stuff like manila has been around for ages so we shouldn't block that, but absolutely we need more help with reviews/patches for the core services
14:45:05 <EmilienM> it's currently not possible because we need to finish nova / neutron (very close but still not finished)
14:45:27 <slagle> fwiw, i have deployed aio, w/o pacemaker
14:45:35 <shardy> EmilienM: perhaps we should break down the remaining services and raise bugs for those not yet decomposed
14:45:43 <EmilienM> right, but with pacemaker we still have issues (I'm on it)
14:45:45 <slagle> EmilienM: i'm testing your other 2 patches now
14:46:04 <shardy> then we could land the composable services BP, which is really about the new architecture, and have a clearer view wrt progress on remaining/new services
14:46:26 <shardy> with the benefit of hindsight, a single BP and an etherpad hasn't been the best way to track this (huge) chunk of work
14:46:45 <jrist> did it seem huge in foresight? :)
14:46:57 <EmilienM> right, it was a lot of work
14:47:23 <shardy> jrist: I think we knew it was a lot of work, but tracking it could have been done in a more granular way
14:47:46 <ccamacho> guys quick question, is the etherpad fully updated? https://etherpad.openstack.org/p/tripleo-composable-services
14:47:55 <EmilienM> ccamacho: I'm trying to update it every day
14:48:14 <dprince> shardy: we could have just forked t-h-t and then we'd be done by now already :)
14:48:22 <EmilienM> ccamacho: I check all patches every day and update the status
14:48:37 <ccamacho> EmilienM me too, but when I hit things not updated, ack
14:48:49 <shardy> dprince: maybe, but this approach has allowed us to prove each step via CI, which we would have lost with the forking approach I guess
14:49:16 <shardy> progress lately has been very good, since CI has been more stable :)
14:49:48 <shardy> #topic open discussion
14:50:03 <shardy> Anyone have any topics to discuss in our final 10 mins?
14:50:17 <shardy> slagle: the AIO work sounds interesting, care to share any details on that?
14:52:09 <shardy> Evidently not ;)
14:52:21 <shardy> Ok, well if there's nothing else let's finish early, thanks everyone!
14:52:33 <dprince> shardy: thanks
14:52:34 <shardy> #endmeeting