14:00:38 #startmeeting tripleo
14:00:39 Meeting started Tue Mar 21 14:00:38 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:41 #topic agenda
14:00:43 The meeting name has been set to 'tripleo'
14:00:43 Hello!
14:00:43 o/
14:00:46 hey
14:00:46 * review past action items
14:00:47 o/
14:00:47 \o
14:00:48 * one off agenda items
14:00:49 o/
14:00:50 * bugs
14:00:50 o/
14:00:52 * Projects releases or stable backports
14:00:54 * CI
14:00:56 * Specs
14:00:58 o/
14:00:58 * Week roadmap
14:01:00 :)
14:01:00 * open discussion
14:01:02 Anyone can use the #link, #action and #info commands, not just the moderator!
14:01:04 Hi everyone! who is around today?
14:01:08 hello o/
14:01:11 o/
14:01:13 weshay_sick: you still sick? I thought I told you to dress something when you sleep :P
14:01:18 o/
14:01:31 o/
14:01:32 o/
14:01:33 o/
14:01:39 hey
14:01:48 o/
14:02:00 that spawns sooo many questions
14:02:03 o/
14:02:21 (I'm not present today)
14:02:54 hi
14:02:57 ok let's start
14:03:01 o/
14:03:12 #topic review past action items
14:03:20 team to review https://review.openstack.org/#/c/443964 and comment on the ML thread about it: still WIP
14:03:22 sshnaidm: ^
14:03:36 sshnaidm: any blocker so far?
14:04:04 team to review plan import/export patches. akrivoka: did you have enough reviews?
14:04:06 EmilienM, yeah, we are discussing it in the ML actually
14:04:20 sshnaidm: excellent if things are moving
14:04:23 marios check with marius if https://bugs.launchpad.net/tripleo/+bug/1661347 is still valid or has been fixed: done (thanks shardy for doing it)
14:04:23 Launchpad bug 1661347 in tripleo "During Newton->Ocata upgrade pattern is ignored in the 'Stop Ceph Mon' task in ceph-mon upgrade" [Critical,Fix released] - Assigned to Giulio Fidente (gfidente)
14:04:32 EmilienM, and there are blockers from pabelanger, not sure the thing moves actually..
14:04:49 EmilienM to follow-up on DIB thread and propose the change in Governance: done : https://review.openstack.org/#/c/445617/ - need more discussion
14:05:04 EmilienM: yah that one was done in the meeting (the /+bug/1661347 thanks shardy )
14:05:17 sshnaidm: is there anything we can do?
14:05:19 o/
14:05:28 sshnaidm: i see the proposal of using RDO zuul
14:05:53 EmilienM, yeah, I'd like you to comment in the ML what you think..
14:06:07 EmilienM, I'm not sure it's a solution
14:06:22 ok I'll look at it asap today
14:06:34 EmilienM, and I'm not sure that the issues that were raised there are real issues
14:06:34 #action EmilienM to look more closely at the thread about periodic jobs
14:06:54 EmilienM, thanks
14:07:10 ok, moving on
14:07:15 #topic one off agenda items
14:07:18 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:07:29 I want to talk about moving DIB out of TripleO https://review.openstack.org/#/c/445617/
14:07:31 o/
14:08:09 shardy, bnemec: it sounds like dib-utils might not be ready to be removed
14:08:40 can we at least move dib itself out of tripleo, and figure out dib-utils in another patch (and start actions on the second item?)
14:09:09 EmilienM: yeah it sounds like o-r-c still needs it, unless we also bundle the script with o-r-c like dib is doing?
14:09:36 it's pretty simple, perhaps we should modify o-r-c to do it all in python
14:09:43 but yeah for now I think we need dib-utils
14:10:07 ok so are you ok to 1. keep dib-utils now and figure out what to do in the short term and 2. move out dib
14:10:41 shardy, bnemec ^ ?
14:11:20 yeah I think we agreed on the ML that moving dib is good, but we need dib-utils
14:11:33 #action EmilienM to update https://review.openstack.org/#/c/445617/ to keep dib-utils part of TripleO for now and keep moving DIB out of TripleO
14:11:48 any other item for this week before we move on?
14:12:12 ok moving on to bugs
14:12:14 #topic bugs
14:12:22 #link https://launchpad.net/tripleo/+milestone/pike-1
14:12:33 I want to review and inform about critical bugs this week:
14:12:36 #link
14:12:37 https://bugs.launchpad.net/tripleo/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.importance%3Alist=CRITICAL&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.milestone%3Alist=80593&field.tag=&field.ta
14:12:39 gs_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on&search=Search
14:12:43 well the link broke, let me try again
14:13:14 #link https://goo.gl/ZhrWJ8
14:13:32 #link https://bugs.launchpad.net/tripleo/+bug/1669452
14:13:32 Launchpad bug 1669452 in tripleo "Glance and Neutron worker defaults not working" [Critical,In progress] - Assigned to Brent Eagles (beagles)
14:13:46 beagles needs a review on https://review.openstack.org/#/c/425423/
14:13:56 #link https://bugs.launchpad.net/tripleo/+bug/1669462
14:13:57 Launchpad bug 1669462 in tripleo "puppet package versions break ocata deployments and upgrades to pike" [Critical,Triaged]
14:14:11 shardy: do we have any progress so far? I think we're still in a stale situation
14:14:37 EmilienM: going to move https://bugs.launchpad.net/tripleo/+bug/1656791 to Fix Released (is in ocata, see comment #7, landed in master before ocata branched)
14:14:38 Launchpad bug 1656791 in tripleo "Mitaka to Newton undercloud upgrade needs "nova db online_data_migrations" after the undercloud upgrade" [High,In progress] - Assigned to Marios Andreou (marios-b)
14:15:02 EmilienM: matbu just mentioned the upgrade jobs are still broken so I guess not, but we need to sync up with #rdo folks to find an ETA for the proposed workaround
14:15:04 EmilienM: the puppet packages shouldn't break things anymore but I can't vouch for any of the other openstack packages
14:15:14 marios: ok so no need for a backport
14:15:21 EmilienM: right
14:15:26 EmilienM: since we bumped all the puppet metadata to use -dev
14:15:33 shardy: can you take action on this one please?
14:15:43 EmilienM: I can follow up on it sure
14:15:46 mwhahaha: right, I'm more concerned about all the other packages
14:15:48 shardy: thanks
14:15:58 #link https://bugs.launchpad.net/tripleo/+bug/1671119
14:15:58 Launchpad bug 1671119 in tripleo "haproxy start will fail on selinux enabled systems due to unix stat socket" [Critical,Triaged] - Assigned to Michele Baldessari (michele)
14:16:15 sounds like bandini has it under control, I had to rebase the patch but nothing is blocking now
14:16:27 bandini: do you need a backport to newton too?
14:16:39 #link https://bugs.launchpad.net/tripleo/+bug/1673282
14:16:39 Launchpad bug 1673282 in tripleo "quickstart logging: implement compliance with infra requirements" [Critical,Triaged]
14:16:43 panda|pto: any progress on this one? ^
14:16:52 adarazs, sshnaidm: ^ you too
14:17:10 EmilienM, I think we covered most of them
14:17:14 EmilienM: I think panda|pto had a change that addressed most of the problems.
14:17:17 #link https://bugs.launchpad.net/tripleo/+bug/1674026 (under control)
14:17:17 Launchpad bug 1674026 in tripleo "stable/ocata scenario001-multinode-oooq failing with missing panko-dbsync command" [Critical,In progress] - Assigned to Emilien Macchi (emilienm)
14:17:23 sshnaidm, adarazs: please update the bug report then.
14:17:28 EmilienM, ok
14:17:42 #link https://bugs.launchpad.net/tripleo/+bug/1674067 (under control by adarazs)
14:17:42 Launchpad bug 1674067 in tripleo "newton: tripleo-quickstart jobs don't build packages from stable/newton" [Critical,Triaged] - Assigned to Attila Darazs (adarazs)
14:17:49 I think this was not updated again because of the broken gerrit-launchpad stuff.
14:18:00 #link https://bugs.launchpad.net/tripleo/+bug/1674236 (under control by jroll)
14:18:01 Launchpad bug 1674236 in tripleo "CI / promotion: Nova isn't aware of the nodes that were registered with Ironic" [Critical,In progress] - Assigned to Emilien Macchi (emilienm)
14:18:15 EmilienM: for https://bugs.launchpad.net/tripleo/+bug/1673447 the ocata cherry-pick at https://review.openstack.org/#/c/446506/ is waiting for https://review.openstack.org/#/c/444338/ for ci (your comment there)
14:18:15 Launchpad bug 1673447 in tripleo "Cleanup no longer used upgrade files" [High,In progress] - Assigned to Marios Andreou (marios-b)
14:18:18 achievement unlocked: tripleo meeting shoutout
14:18:23 that stable branch one should be resolved soon.
14:18:26 I'm also taking care of following ^ to take actions if things need to be changed in TripleO wrt nova config
14:18:44 jroll: \o/
14:18:55 marios: ok
14:19:08 marios: no, not related
14:19:17 marios: my patch is only for scenario001 and 002
14:19:30 marios: https://review.openstack.org/#/c/446506/ must be failing for another reason
14:19:37 marios: have you looked at the logs?
14:20:14 EmilienM: ok... it was your comment i mean "please don't do recheck, we need https://review.openstack.org/#/c/444338/ first"
14:20:27 marios: yeah but I was prob wrong
14:20:31 marios: the patch is failing on http://logs.openstack.org/06/446506/1/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/c5bd561/console.html#_2017-03-17_10_08_13_172813
14:20:34 needs more investigation
14:20:35 EmilienM: i am pretty sure i did look at the logs at some point, but that review is removing files that aren't used in the job/workflow at all
14:20:48 EmilienM: ok i can have a look, may have missed something
14:20:49 #action marios to investigate why https://review.openstack.org/#/c/446506/ is failing
14:21:05 marios: yeah, probably a transient error but it would be great to investigate it
14:21:11 marios: if you find anything, please let me know
14:21:13 EmilienM: sure thanks
14:21:22 and the last one:
14:21:24 #link https://bugs.launchpad.net/tripleo/+bug/1674681
14:21:24 Launchpad bug 1674681 in tripleo "centos.org repositories are unreliable" [Critical,Triaged]
14:21:28 reported this morning by /me
14:21:39 I commented few idea
14:21:44 ideas*
14:21:52 we've had this one for months but nobody took real action until now
14:22:18 I started to work on https://review.openstack.org/#/q/topic:cbs/https
14:22:42 bogdando: interesting
14:23:12 I've implemented that as a puppet provider (rejected upstream), the same could be done for ansible or the like
14:23:41 bogdando: /join #rdo and we'll discuss it post meeting
14:23:49 but I'm yet to know the details, i.e. which tool exactly suffers from that issue
14:24:15 bogdando: yum, from a bash script, from an ansible task, from a bash script
14:24:35 that's all for Critical bugs targeted for Pike-1
14:24:42 do we have any other bug to discuss this week?
14:25:26 a lot of triage has to be done on blueprints / bugs for pike-1: https://launchpad.net/tripleo/+milestone/pike-1
14:25:38 if anyone wants to help me, that would be nice, thanks
14:25:56 3 New, 4 Incomplete, 4 Invalid, 1 Won't Fix, 2 Confirmed, 157 Triaged, 80 In Progress, 4 Fix Committed, 56 Fix Released
14:26:11 and blueprints:
14:26:13 12 Unknown, 13 Not started, 6 Started, 3 Slow progress, 11 Good progress, 7 Needs Code Review, 1 Implemented
14:26:48 EmilienM: I'll try to help out - I've used scripts in the past to auto-defer e.g. all not-in-progress bugs
14:27:04 shardy: defer to pike-2?
14:27:21 EmilienM: yeah, we could just defer anything not in progress to pike-2
14:27:21 yeah I have the script too, but I'm not sure we should do that at this stage (maybe?)
14:27:58 pike-1 is Apr 10 - Apr 14
14:28:09 I'll probably start deferring next week
14:28:17 EmilienM: I'm just looking at the numbers - we should probably prioritize landing those things already in progress pretty soon
14:28:27 but yeah I guess we can wait until next week
14:28:40 157 Triaged
14:28:53 #action team to review https://launchpad.net/tripleo/+milestone/pike-1 and prioritize in-progress work
14:29:12 #action team to postpone Triaged bugs to pike-2 next week
14:29:29 if no other bug to discuss, moving on
14:29:37 #topic projects releases or stable backports
14:29:42 adarazs: re: broken newton package building: https://review.openstack.org/447747
14:30:08 so the next release should be pike-1 in ~3 weeks
14:30:23 bkero: that should be solved once https://review.openstack.org/447530 (build-test-packages: fix distgit branch naming) merges, which depends on https://review.openstack.org/447993 (Don't use string parsing when detecting deploy result) -- they have both +2 and +w, just waiting for the gates to finish running.
14:31:02 do we have any question or comment about release management & stable backports?
14:31:16 stable/newton CI is currently down because of quickstart but it's WIP
14:31:32 stable/ocata CI is very unstable, but we're also working on it
14:31:52 adarazs: Awesome, we both wrote it.
14:32:05 I didn't see yours linked from the bug yesterday when I made my effort.
14:32:11 https://bugs.launchpad.net/tripleo/+bug/1674026 and https://bugs.launchpad.net/tripleo/+bug/1674067 are the 2 bugs to follow
14:32:11 Launchpad bug 1674026 in tripleo "stable/ocata scenario001-multinode-oooq failing with missing panko-dbsync command" [Critical,In progress] - Assigned to Emilien Macchi (emilienm)
14:32:12 Launchpad bug 1674067 in tripleo "newton: tripleo-quickstart jobs don't build packages from stable/newton" [Critical,Triaged] - Assigned to Attila Darazs (adarazs)
14:32:37 anything else about releases before we move on to the CI topic?
14:33:01 #topic CI
14:33:11 I think we already covered all the bugs related to CI
14:33:24 yep...
14:33:25 adarazs: any update to give (that we haven't said yet in this meeting)?
14:33:46 no, it should be working after those two changes merge.
14:33:54 let me put a quick summary of CI status:
14:34:13 1. Promotion is blocked by a regression in Nova for now https://bugs.launchpad.net/tripleo/+bug/1674236
14:34:13 Launchpad bug 1674236 in tripleo "CI / promotion: Nova isn't aware of the nodes that were registered with Ironic" [Critical,In progress] - Assigned to Emilien Macchi (emilienm)
14:34:36 2. Upgrade jobs on master are broken because of https://bugs.launchpad.net/tripleo/+bug/1669462
14:34:37 Launchpad bug 1669462 in tripleo "puppet package versions break ocata deployments and upgrades to pike" [Critical,Triaged]
14:35:02 3. stable/ocata jobs are unstable because of https://bugs.launchpad.net/tripleo/+bug/1674681 and https://bugs.launchpad.net/tripleo/+bug/1674026
14:35:02 Launchpad bug 1674681 in tripleo "centos.org repositories are unreliable" [Critical,Triaged]
14:35:03 Launchpad bug 1674026 in tripleo "stable/ocata scenario001-multinode-oooq failing with missing panko-dbsync command" [Critical,In progress] - Assigned to Emilien Macchi (emilienm)
14:35:27 4. stable/newton jobs are broken because of https://bugs.launchpad.net/tripleo/+bug/1674067
14:35:27 Launchpad bug 1674067 in tripleo "newton: tripleo-quickstart jobs don't build packages from stable/newton" [Critical,Triaged] - Assigned to Attila Darazs (adarazs)
14:35:29 and 5. all jobs can break anytime because of https://bugs.launchpad.net/tripleo/+bug/1674681
14:35:39 so before doing a recheck, please look at the logs
14:36:04 and if you don't know where to find the relevant logs, please ping me, adarazs, sshnaidm, panda|pto, bkero, or trown if it's a question for quickstart
14:36:35 any comment / question about CI this week?
14:36:59 oh and if anyone is willing to help, that is always welcome...
14:37:12 EmilienM: + apparently the output of a successful overcloud deployment changed in master, which Quickstart was relying on... so until https://review.openstack.org/#/c/447993/ merges the master branch oooq jobs will fail too :/
14:37:25 but it's very close to merging.
14:37:26 ah and that also
14:37:28 yeah
14:37:52 #topic specs
14:37:55 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:38:21 I'll repeat what we said next week but we need to review the specs before pike-1, it would be great
14:39:16 EmilienM for the new oooq CI jobs is there any guide about the file organization? I mean, e.g. should subnode-1 be the undercloud, or subnode-2?
14:39:21 anyone wants to talk about a specific spec now?
14:39:37 ccamacho, it's in progress
14:39:39 ccamacho: subnode-2 shouldn't exist. It's subnode1
14:40:06 but yeah, afaik they're working on providing the doc in the logs
14:40:16 weshay_sick: any bug report / patch we can follow to know progress?
14:40:16 EmilienM but it does :P, also, e.g. where is postci.txt.gz now?
14:40:26 ack weshay_sick thanks!
14:40:30 I'll open a bug so you guys can follow it
14:40:35 awesome
14:41:17 any comments on specs before we move on? please please review the specs, if you're tripleo core
14:41:33 yeah any docs on the CI logs would be good, it's taking me a while to find things since we changed to oooq
14:41:54 in particular I miss the postci log, that was always super-helpful
14:41:55 right, it's not obvious which log file to check when a specific ansible task is failing
14:42:08 I saw postci somewhere, let me find it
14:42:13 +1 on that
14:42:18 shardy, the postci log is in /undercloud/var/log
14:42:18 we miss you postci xD
14:42:27 cool
14:42:30 sshnaidm: Ok can we link that from the top-level logs dir?
14:42:31 hint: ignore_errors: true may be replaced by failed_when: false in most cases
14:42:38 in logs/undercloud/var/log/postci.txt.gz
14:42:39 this keeps the build log less RED
14:42:45 shardy, yeah, I can copy it there
14:42:46 a few small changes like that would make a big difference to those familiar with the old jobs I think
14:43:13 bogdando: please submit patches or it will be trashed in this meeting
14:43:19 ok)
14:43:23 shardy: ya that is the focus for the ci-squad over moving more jobs over
14:43:34 shardy: that's one of our focuses now after the transition, so just keep the suggestions coming, we'll fix these up.
14:43:39 trown: ack good to know :)
14:43:46 adarazs: cool, thanks!
14:43:57 #topic open discussion
14:44:16 if you have any question or feedback, please go for it now
14:45:01 For containerized OpenStack, shall we not put logs into docker volumes (tht templates)?.. -v log:/var/log is not good for each of the multiple containers, it uses COW, and is slow. IMO all should log to stdout only, each containerized service is run by the foreground command AFAIK, so it should not a problem.
14:45:26 not be*
14:46:12 bogdando: how would it work with our opstools that read logfiles?
14:46:34 also, I'm not sure how a single docker volume "log" would handle simultaneous writes to overlapping files!
14:46:45 they should switch to fluentd or the like
14:46:50 flaper87, dprince, jistr, mandre ^ thoughts?
14:47:18 bogdando: we already use fluentd
14:47:25 bogdando: see the logging profiles in TripleO
14:47:31 IIUC, it can collect from containers' stdout
14:47:43 but this should be tested
14:47:45 bogdando: we could just bind mount to a host dir, and avoid the COW penalty
14:48:02 bogdando: dirty, but it would solve the performance issue I think
14:48:04 yeah we need to solve for the fluentd case but also for local development and CI
14:48:09 dprince: right, but still, several containers may want to write to the same log file
14:48:10 bogdando: you might want to talk with larks and mmagr about this one
14:48:15 what happens to the history if we just log to stdout in the containers?
14:48:48 shardy: it should survive in the ES cluster or whatever backend fluentd has
14:48:52 bogdando: with our services right now I think there would be little overlap here. Perhaps just apache and we mostly have control of that already
14:49:07 bogdando: most developers don't have an ELK stack, neither does CI
14:49:21 so that dependency should be considered optional
14:49:21 right, we need to collect logs into a file and store it
14:49:36 well, host path should work as well, if there are no corrupted overlaps
14:49:57 bogdando: ack, cool, I'm just saying let's keep local logging as an option
14:50:06 ack
14:50:21 bogdando: there may be some, but not many. I like moving towards console logging too. But local for now doesn't break things
14:50:48 bogdando: I've a patch for logs that might need to be revisited based on this
14:50:58 but there's also the need to integrate well with fluentd
14:51:03 we could mount not /var/logs but /var/logs/{{ service_name }}
14:51:08 s/not/from
14:51:13 I think, eventually, we should log to /var/log and also stdout
14:51:28 flaper87: can we file a launchpad bug for the fluentd case and maybe assign it to opstools folks (not in this meeting sadly)
14:51:41 EmilienM: yes, will do that
14:51:44 flaper87: merci
14:51:44 err, I mean -v /var/logs/foo/:/var/logs/
14:51:55 so no overlaps
14:52:02 bogdando: https://review.openstack.org/#/c/442603/
14:52:09 that's the patch that might need to be revisited
14:52:13 feel free to update it yourself
14:52:21 ok thanks
14:52:47 bogdando: thanks for bringing this up, super important
14:52:49 but, TBH, I think we need to get to the bottom of the slowness of these volumes
14:53:04 mounting host dirs can't always be the answer :(
14:53:53 any other topic for this week before I close the meeting?
14:54:23 I raised that cuz IIRC, once upon a time, the fuel scale lab team had discovered *major* keystone degradation cuz of a slow syslog setup
14:54:42 so COW plays a role here as well
14:54:56 bogdando: interesting, thanks for sharing
14:54:58 serious things :)
14:55:11 yep, let's take some actions on that
14:55:32 #action flaper87 to file a bug in tripleo assigned to opstools for fluentd to read logs from containers
14:55:45 #action bogdando to follow up on https://review.openstack.org/#/c/442603/ and update if needed
14:55:52 thanks everyone
14:56:03 #endmeeting
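bogdando's hint at 14:42:31 can be sketched as a minimal Ansible fragment. The task names and log path below are hypothetical (not from tripleo-quickstart); the point is the difference in how the run output is reported:

```yaml
# Hypothetical log-collection tasks, for illustration only.

# Variant 1: the task is still reported as failed (red in the output,
# counted under ignored=), and the failure is then skipped over.
- name: Collect postci log (failure shown, then ignored)
  command: cat /var/log/postci.txt
  ignore_errors: true

# Variant 2: the task can never be marked failed, so expected
# non-zero exits don't turn the build log red at all.
- name: Collect postci log (never marked failed)
  command: cat /var/log/postci.txt
  failed_when: false
```

This matches the "keeps the build log less RED" remark: `ignore_errors` only suppresses the consequences of a failure, while `failed_when: false` suppresses the failure status itself.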
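The per-service mount idea from the open discussion (14:51:03 / 14:51:44, `-v /var/logs/foo/:/var/logs/`) amounts to giving each container its own host log directory. A sketch in docker-compose syntax — the service names, images, and host paths are illustrative assumptions, not TripleO's actual tht templates:

```yaml
# Illustrative sketch only: each containerized service bind-mounts its
# own host directory, avoiding both the COW penalty of a shared named
# volume and two containers writing the same log file.
version: "2"
services:
  keystone:
    image: example/keystone            # placeholder image
    volumes:
      - /var/log/containers/keystone:/var/log/keystone
  glance-api:
    image: example/glance-api          # placeholder image
    volumes:
      - /var/log/containers/glance:/var/log/glance
```

Host bind mounts write straight to the host filesystem, so they sidestep the copy-on-write slowness raised at 14:52:49, at the cost of the "dirty" host coupling dprince acknowledges at 14:48:02.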