14:00:32 #startmeeting tripleo
14:00:33 Meeting started Tue Jul 11 14:00:32 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:37 The meeting name has been set to 'tripleo'
14:00:42 o/
14:01:02 \o
14:01:08 o/
14:01:10 o/
14:01:11 #topic agenda
14:01:12 o/
14:01:12 * review past action items
14:01:14 * one off agenda items
14:01:14 o\
14:01:16 * bugs
14:01:18 * Projects releases or stable backports
14:01:20 * CI
14:01:22 * Specs
14:01:24 * open discussion
14:01:26 Anyone can use the #link, #action and #info commands, not just the moderator!
14:01:28 Hi everyone! who is around today?
14:01:30 * EmilienM slow today
14:01:36 o/
14:01:40 \o
14:01:44 o/
14:01:45 o/
14:01:47 o/
14:01:56 hi2u
14:02:20 #topic review past action items
14:02:29 team to review https://review.openstack.org/#/c/478516/
14:02:39 panda isn't here but I see the patch needs some love
14:02:45 o/
14:02:49 I'll check with him
14:02:54 o/
14:02:58 #topic one off agenda items
14:03:02 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:03:13 gfidente: go ahead
14:03:17 o/
14:03:19 o/
14:03:30 so I think we got up for review the remaining submissions to test ceph-ansible in ci
14:03:46 my question is if it should replace the old environment file, so that jobs previously using ceph now use ceph-ansible
14:03:57 or if we should have separate jobs
14:04:04 how do you manage upgrades?
14:04:04 /me would like the first option more
14:04:23 EmilienM that is a ceph-ansible playbook which we trigger from a heat parameter
14:04:27 o/
14:04:28 so your option would force everyone to deploy the new thing?
14:04:56 my option would keep the existing puppet-ceph services but default the existing environment files to use ceph-ansible, yes
14:05:46 will we have tripleo jobs running on ceph-ansible changes?
14:06:03 trown good question, we don't have that yet, this was discussed but it's totally WIP
14:06:13 so the answer is no
14:06:29 right now we own ceph-ansible builds in cbs, but we don't have ci for ceph-ansible against tripleo
14:06:35 that seems prone to ci outages
14:06:35 unless we have CI coverage for ceph-ansible, I don't see any point to switch to it
14:06:58 though that is our only option to deploy ceph in containers today
14:07:21 ya, maybe we need to first add the patches that enable turning it on, but keep the default to puppet-ceph
14:07:24 if we had ceph-ansible in our promotion pipeline (packaging in RDO, etc), then fine
14:07:38 then get tripleo CI running on ceph-ansible
14:08:00 yes, I would have liked to see a CI job - it was the plan but it never came up
14:08:00 if we get enabling patches in, we could always run the container job that way
14:08:02 EmilienM so no, right now it's not subject to promotion, we manually tag the package for testing in the cbs repos and later release
14:08:20 maybe you can switch multinode-scenario001-container to use ceph-ansible
14:08:58 gfidente: jobs are voting. Everything manual will be error-prone and break our CI. The multinode container job breaks almost every day nowadays
14:08:58 EmilienM okay I'll look into that
14:09:26 EmilienM yeah we have to put up automation in ceph-ansible CI
14:09:34 everything that isn't tested and isn't in the promotion pipeline shouldn't be in the gate.
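
[Editor's sketch] For readers following the exchange above: "default the existing environment files to use ceph-ansible" amounts to a resource_registry override that points the Ceph service interfaces at ceph-ansible-backed templates instead of the puppet-ceph ones. The sketch below is illustrative only, rendered in Python/PyYAML for brevity; the service names follow TripleO's OS::TripleO::Services::* convention, but the template paths and the CephAnsiblePlaybook parameter are assumptions, not the merged tripleo-heat-templates content.

```python
# Minimal sketch, assuming hypothetical template paths: what an environment
# file defaulting the Ceph services to ceph-ansible could look like.
import yaml

ceph_ansible_env = {
    "resource_registry": {
        # Point the Ceph service interfaces at ceph-ansible backed
        # implementations instead of the puppet-ceph ones.
        "OS::TripleO::Services::CephMon":
            "../docker/services/ceph-ansible/ceph-mon.yaml",
        "OS::TripleO::Services::CephOSD":
            "../docker/services/ceph-ansible/ceph-osd.yaml",
        "OS::TripleO::Services::CephClient":
            "../docker/services/ceph-ansible/ceph-client.yaml",
    },
    "parameter_defaults": {
        # Hypothetical knob: the playbook triggered from the heat parameter
        # gfidente mentions above.
        "CephAnsiblePlaybook": "/usr/share/ceph-ansible/site-docker.yml.sample",
    },
}

with open("ceph-ansible.yaml", "w") as f:
    yaml.safe_dump(ceph_ansible_env, f, default_flow_style=False)
```

Keeping the puppet-ceph service templates in the tree while only swapping the registry entries is what lets the old and new paths coexist, which is the trade-off being debated above.
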
14:09:56 +1
14:10:05 gfidente: so make it work in multinode-scenario001-container and add ceph-ansible to rdo packaging where we can control its version from rdoinfo.
14:10:26 once we have that, it's a bit more solid and we can maybe consider it for the gate again
14:10:39 ack, thanks
14:10:59 anything else for this week before we move to the regular agenda?
14:11:13 (not sure if we can make it into rdo packaging actually, but that's a different conversation I think)
14:11:28 gfidente: something where our delivery team can control the version
14:11:43 gfidente: folks who manage ceph-ansible are in europe - if something breaks, who will be around?
14:12:08 we want to be able to control tags / bumps from a single place, like we do for almost everything else
14:12:41 to me it makes sense to have ceph-ansible part of rdo (with a distgit and version control)
14:12:44 EmilienM yeah I think I get the point
14:12:52 but yeah we can discuss it offline
14:13:07 gfidente: maybe on #rdo after this meeting
14:13:13 with apevec & the team
14:13:15 EmilienM sure if you can, thanks
14:13:26 cool, let's do that
14:13:44 #action gfidente & EmilienM to figure out whether or not we package ceph-ansible in RDO
14:14:04 #action gfidente to add ceph-ansible to the multinode-scenario001-container job
14:14:11 #topic bugs
14:14:16 #link https://launchpad.net/tripleo/+milestone/pike-3
14:14:26 is there any critical bug we need to discuss this week?
14:14:37 anything outstanding?
14:15:14 I see https://bugs.launchpad.net/tripleo/+bug/1703599
14:15:15 Launchpad bug 1703599 in tripleo "Containers multinode job deploying an empty overcloud" [Critical,In progress] - Assigned to Jiří Stránský (jistr)
14:15:27 jistr|call: is it the only patch that needs to be merged? https://review.openstack.org/#/c/482545
14:15:59 EmilienM: for the deploy job, yes. We have another one for the upgrade job.
14:16:02 jistr|call: what change caused this bug? I'm curious - we gate all tripleo projects on containers
14:16:33 EmilienM: it went to quickstart-extras where we don't run the multinode job. The OOOQ job passed on it, somewhat strangely.
14:16:52 we need to fix that
14:17:09 sshnaidm: can you check if we run enough jobs in oooq-extras?
14:17:15 the multinode puppet jobs run on extras
14:17:21 EmilienM, I'll add multinode containers to extras
14:17:32 sshnaidm: thank you sir
14:17:36 weshay: not containers
14:17:37 weshay: sorry i meant we don't run multinode-containers there
14:17:41 ah.. I see, but not containers
14:17:42 ya
14:18:05 we should have all oooq jobs on oooq-extras
14:18:06 sshnaidm: maybe one is enough for now, not all scenarios but just gate-tripleo-ci-centos-7-containers-multinode
14:18:17 EmilienM, yeah, right
14:18:24 or all scenarios - really I wouldn't mind
14:18:32 mwhahaha: yes I agree
14:18:43 please add the maximum we can
14:18:47 at least all oooq jobs that are voting
14:18:50 so we reduce the breakages
14:18:59 yeah, all gates
14:19:11 sshnaidm: can you take this one?
14:19:24 EmilienM, sure
14:19:31 #action sshnaidm to add missing oooq jobs in oooq-extras which are already gating somewhere in tripleo
14:19:38 sshnaidm: thx
14:19:44 any other bug to discuss this week?
14:20:10 #topic projects releases or stable backports
14:20:15 #link https://releases.openstack.org/pike/schedule.html
14:20:41 we're 2 weeks from the pike-3 milestone
14:20:53 next week is the final release for non-client libraries
14:21:10 I think tripleo-common and maybe some others
14:22:09 we'll discuss more about pike-3 next week
14:22:18 we still have a bunch of blueprints in progress
14:22:39 if you could help by updating your blueprints / bugs in Launchpad, it would be awesome
14:22:58 because when I do it myself it's in ninja mode and not so good ;-)
14:23:16 any questions about release management before we go ahead?
14:23:16 We usually treat "tripleo-common" as "other" (internal component), not a library, as it tends to need updates right to the end
14:23:32 jpich: right, I just checked in openstack/releases - it's other
14:23:48 so we shouldn't have anything to release by next week
14:23:57 Yeah agreed it's probably too early to freeze t-c
14:24:05 jpich: thanks for the correction
14:24:17 Ok, cool!
14:24:29 #topic CI
14:24:50 adarazs, panda, sshnaidm, weshay, trown : any outstanding updates on CI this week?
14:24:52 weshay: Hey any update on the conversion of the 3nodes job to quickstart?
14:25:12 I'd like to get coverage of multinode (more than one node) w/containers+HA
14:25:23 EmilienM, former ovb-updates jobs transitioned to oooq and run successfully
14:25:33 but I kinda stopped looking into it as I was under the impression we were on the verge of quickstarting the existing 3nodes job
14:25:52 if that's not imminent we should probably go ahead and add the coverage with the old scripts as a stopgap
14:26:01 but I'm a little wary of doing the work twice..
14:26:19 sshnaidm: cool, we could update tripleo.org/cistatus.html maybe
14:26:33 EmilienM, I have a patch, waiting for the gates to be fixed
14:26:53 EmilienM: I'm working on adding a static "readme" footer for our logs on logs.openstack.org, hopefully this can go through: https://review.openstack.org/482210
14:26:56 I think trown was about to look at the 3-nodes thing but I might be wrong
14:26:57 shardy, our backlog and status is here https://trello.com/b/U1ITy0cu/tripleo-ci-squad
14:27:07 adarazs: really cool
14:27:08 shardy, no progress afaik
14:27:14 EmilienM: not on my radar, no
14:27:21 ah ok
14:27:33 adarazs: ping #openstack-infra if you need review - none of us is core on it
14:27:43 weshay: Ok, should we have a discussion re priorities and the 3nodes stuff perhaps?
14:27:46 adarazs: but I'll review it anyway
14:27:56 shardy, yes.. we can do that
14:28:06 the issue is we need coverage of HA w/containers, then rolling updates w/ HA/containers
14:28:08 EmilienM: ack, I added pabelanger and ajaeger, but I might ping them directly too.
14:28:21 shardy, adarazs can you add that to the schedule for Thursday's tripleo-ci-squad mtg and invite shardy
14:28:28 the latter is net-new coverage, and I'm not super motivated to do it twice, or ask someone to do the work only for it to be immediately redone
14:28:36 as was the case e.g. with upgrades in the past
14:28:39 shardy, understood
14:28:53 weshay: thanks!
14:29:26 feel free to ping me, I am back from PTO today
14:30:10 shardy, weshay: I'll add an agenda item on https://etherpad.openstack.org/p/tripleo-ci-squad-meeting for it.
14:30:22 adarazs: ack sounds good, thanks!
14:30:25 cool, thanks
14:30:41 for now perhaps I'll post a WIP patch hijacking the existing 3nodes job to show what we'd like to test
14:31:02 shardy: good idea
14:31:13 shardy: you might have some changes to make in the tripleo-ci repo maybe
14:31:26 EmilienM: yeah, I'll take a look
14:31:30 for the 3-nodes environment - but not sure if really needed, just fyi
14:31:55 anything else CI related for this week?
14:31:58 yeah it uses a custom role which isn't strictly required but should work for HA+Containers I think
14:32:26 good news
14:32:27 adarazs: weshay tripleo-ci mtg is on Thursday, right?
14:32:38 i'll join as well with shardy
14:32:40 matbu: yeah, 14:30 UTC
14:32:52 I'll add you to the pinglist :)
14:32:53 adarazs: ack
14:32:58 adarazs: thx
14:33:04 adarazs: I'll be able to join this week as well (I wasn't last week)
14:33:23 ok moving on
14:33:25 #topic specs
14:33:32 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:33:43 EmilienM: cool. I was on PTO last week too, so I don't know what exactly happened apart from what I see on the pad.
14:34:12 adarazs: I might have missed it but I haven't seen an email summary
14:34:39 well, I couldn't really make a summary because I wasn't there :)
14:34:43 who led the meeting?
14:34:47 apart from pasting the agenda.
14:35:14 weshay, trown, sshnaidm, panda ? ^
14:35:43 ok moving on
14:36:00 anything to discuss about specs this week?
14:36:24 #topic open discussion
14:36:46 * Reminder about PTG: please contribute to the schedule: https://etherpad.openstack.org/p/tripleo-ptg-queens
14:37:08 we'll have 2 1/2 days to work together, again it's a good place to make progress together
14:37:22 feel free to propose new topics
14:37:24 scenario upgrade jobs in ocata are timing out, is there anything we can do to improve these? it's blocking backports
14:38:07 https://bugs.launchpad.net/tripleo/+bug/1702955
14:38:08 Launchpad bug 1702955 in tripleo "tripleo upgrade jobs timeout when running in RAX cloud" [Critical,Triaged]
14:38:13 that's what we found out last week
14:38:23 scenario001 seems to be consistently failing
14:38:33 rax/ovh - it doesn't seem to be provider specific
14:38:50 so from what I understand, it sounds like it's timing out when it's not running on the OSIC cloud (which is by far the fastest cloud that we have)
14:39:05 so yeah there is something to do here
14:39:31 mwhahaha: we should spend some time comparing old jobs (when they were far from timing out) with current ones, and compare the overcloud deploy steps
14:39:35 and see what takes more time
14:39:49 figuring out if it's because of several services or one specific service, etc
14:40:13 I can have a look this week and start some investigation, if someone is willing to help a little also
14:41:11 anything else for open discussion?
14:41:13 i'll see if i can take a look as well
14:41:29 mwhahaha: I would rather poke someone from the upgrade team if you don't mind
14:41:41 mwhahaha: I think you have better things to do :-)
14:42:30 marios: ^ anyone in your team willing to help?
14:42:56 anyway, closing the meeting, we can continue offline
14:42:58 thanks everyone
14:43:00 #endmeeting
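
[Editor's sketch] Following up on the scenario001 timeout discussion in open discussion: below is a rough sketch of the per-step comparison EmilienM proposed, i.e. pull the deploy step timestamps out of two job logs (an older fast run and a recent slow one) and diff the durations. The log line pattern here is a simplified assumption, not the exact tripleo-ci output, and would need adjusting to the real console logs.

```python
#!/usr/bin/env python
"""Rough sketch: compare per-step overcloud deploy durations between two CI
job logs. Assumes a hypothetical log format of
'<YYYY-MM-DD HH:MM:SS> ... Step N ...' lines."""
import re
import sys
from datetime import datetime

STEP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*Step (?P<step>\d+)")


def step_durations(path):
    """Return {step_number: seconds} from the first/last mention of each step."""
    first, last = {}, {}
    with open(path) as f:
        for line in f:
            m = STEP_RE.match(line)
            if not m:
                continue
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            step = int(m.group("step"))
            first.setdefault(step, ts)
            last[step] = ts
    return {s: (last[s] - first[s]).total_seconds() for s in first}


def main(old_log, new_log):
    old, new = step_durations(old_log), step_durations(new_log)
    for step in sorted(set(old) | set(new)):
        o, n = old.get(step, 0.0), new.get(step, 0.0)
        print("step %d: %6.0fs -> %6.0fs (%+.0fs)" % (step, o, n, n - o))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Pointed at an older passing run and a current timed-out run, the per-step delta should show whether the slowdown is spread across all services or concentrated in one step, which is the question raised above.
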