14:00:36 <mwhahaha> #startmeeting tripleo
14:00:36 <mwhahaha> #topic agenda
14:00:36 <mwhahaha> * Review past action items
14:00:36 <mwhahaha> * One off agenda items
14:00:36 <mwhahaha> * Squad status
14:00:36 <mwhahaha> * Bugs & Blueprints
14:00:36 <mwhahaha> * Projects releases or stable backports
14:00:37 <slagle> sdoran: the context is that we are running ansible as a user that has no home dir, and there are some localhost tasks, so we need to override remote_tmp
14:00:37 <openstack> Meeting started Tue Nov 7 14:00:36 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <mwhahaha> * Specs
14:00:37 <mwhahaha> * open discussion
14:00:38 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:38 <mwhahaha> Hi everyone! who is around today?
14:00:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:41 <openstack> The meeting name has been set to 'tripleo'
14:01:08 <beagles> o/
14:01:09 * mwhahaha watches as everyone disappears
14:01:14 <panda|ruck> o/
14:01:17 <ccamacho> hola!
14:01:18 <fultonj> o/
14:01:27 <jkilpatr> o/
14:01:33 <abishop> o/
14:01:51 <jpich> o/
14:01:58 <jkilpatr> should I bring up quickstart commits here or wait for the TripleO CI meeting?
14:01:58 <oidgar> o/
14:02:13 <slagle> hi
14:02:14 <mwhahaha> jkilpatr: wait until the open discussion in the meeting plz
14:02:37 <jkilpatr> sure
14:02:55 <owalsh> o/
14:03:04 <jfrancoa> o/
14:03:10 <mwhahaha> ok lets get started
14:03:15 <mwhahaha> #topic review past action items
14:03:21 <mwhahaha> EmilienM to prepare an etherpad for tripleo onboarding session in Sydney - DONE
14:03:29 <mwhahaha> well i assume it's done since he's in Sydney
14:03:37 <mwhahaha> marios, matbu, chem provide doc/status of upgrade workflow
14:04:10 <mwhahaha> marios, matbu, chem - any update?
14:04:20 <chem> mwhahaha: yes, there is a review hold on
14:04:40 <openstackgerrit> Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Fix devmode by right order of playbooks https://review.openstack.org/518336
14:04:52 <chem> mwhahaha: https://review.openstack.org/#/c/517916/
14:04:59 <chem> mwhahaha: this is the skeleton
14:05:05 <mwhahaha> cool
14:05:09 <marios> mwhahaha: o/ thought there wouldn't be a meeting /me late ... not that i know of but sounds like chem knows more
14:05:31 <mwhahaha> #action team to review upgrades developer docs https://review.openstack.org/#/c/517916/
14:05:40 <mwhahaha> gfidente put together issues around multiple service instances
14:05:44 <chem> mwhahaha: we plan on filling this as we go, maybe adding TODO and merging this one, not sure how to proceed
14:05:46 <jrist> o/
14:05:59 <mwhahaha> chem: yea let's get the skeleton merged and iterate
14:06:20 <chem> mwhahaha: ack
14:06:27 <gfidente> mwhahaha I added just two lines into the integration squad etherpad https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:06:28 <jfrancoa> mwhahaha: chem: yes, that's the best way to proceed I think
14:06:33 <chem> jfrancoa: ^
14:06:48 <mwhahaha> gfidente: ok thanks
14:06:53 <mwhahaha> mwhahaha to move medium bugs to queens-3 - DONE
14:07:01 <gfidente> mwhahaha but I'd like to get some feedback from people about those and how to approach it
14:07:10 <mwhahaha> i moved all the unstarted medium bugs to queens-3
14:07:19 <gfidente> https://etherpad.openstack.org/p/tripleo-integration-squad-status lines 10 > 12
14:07:25 <mwhahaha> gfidente: ok probably wouldn't hurt to solicit feedback via the ML
14:07:30 <gfidente> mwhahaha ack
14:08:11 <mwhahaha> #action gfidente to send a note requesting feedback on the ML about multiple service instances issues
14:08:15 <mwhahaha> ci squad to start gathering gate failure metrics and information
14:08:31 <mwhahaha> weshay, adarazs|rover, panda|ruck: any updates on the metrics?
14:09:12 <matbu> o/
14:09:25 <panda|ruck> mwhahaha: sova has a dedicated page for the gate jobs now
14:09:44 <panda|ruck> mwhahaha: we have an aggregate RSS to look at the failures too
14:10:06 <mwhahaha> panda|ruck: cool, please make sure to communicate this information so others can follow
14:10:15 <ooolpbot> URGENT TRIPLEO TASKS NEED ATTENTION
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1727406
14:10:17 <openstack> Launchpad bug 1727406 in tripleo "Zaqar subscriptions failed to report deployment error" [Critical,Triaged] - Assigned to Thomas Herve (therve)
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1729253
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1729586
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730111
14:10:19 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730477
14:10:19 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730671
14:10:20 <openstack> Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] - Assigned to Martin André (mandre)
14:10:21 <openstack> Launchpad bug 1729586 in tripleo "CI: rdocloud node randomly going offline during jobs" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami)
14:10:22 <openstack> Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged]
14:10:23 <openstack> Launchpad bug 1730477 in tripleo "legacy-puppet-syntax-3 job missing on instack-undercloud stable/newton branch" [Critical,Triaged]
14:10:24 <openstack> Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:10:34 <panda|ruck> mwhahaha: ok, I'll send an email
14:10:41 <mwhahaha> panda|ruck: thanks
14:10:50 <mwhahaha> #topic one off agenda items
14:10:50 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:11:06
<mwhahaha> the agenda is empty, so unless anyone has anything they wish to bring up now I'll move on
14:11:35 <jkilpatr> I just wanted to bring up a quickstart patch.
14:11:50 <mwhahaha> jkilpatr: sure which one
14:11:52 <jkilpatr> https://review.openstack.org/#/c/497950/20
14:12:12 <jkilpatr> I'm not really sure where it belongs. But it needs to live somewhere.
14:12:25 <jkilpatr> if that has to be my own repo fine, if it's in extras great. Just want a verdict on that.
14:12:42 <mwhahaha> jkilpatr: personally i think quickstart-extras is the correct place
14:13:18 <mwhahaha> let's get some more folks to weigh in but it seems like something useful for ci/developers
14:13:57 <jkilpatr> yup, trying to get a nonvoting job running so it would be great if I didn't have to cherry pick it in.
14:14:10 <fultonj> where else could it go?
14:14:11 <panda|ruck> will this be confined to CI or will it be useful for customers too ?
14:14:11 <trown> seems like there are 2 things in that patch
14:14:29 <jkilpatr> fultonj, in theory you can pull in arbitrary repos containing extra roles for quickstart.
14:14:33 <mwhahaha> it's a dev/qe thing for the most part
14:14:51 <trown> there is an update role, and a disruption role...
14:15:04 <mwhahaha> customers want no disruptions but we need to be able to test that is true
14:15:09 <jkilpatr> trown, yes there was no role for trying the various stack settings change possibilities. I can split that out if you like?
14:15:12 <jkilpatr> it's a semantics issue.
14:15:28 <mwhahaha> trown: panda|ruck, please take the comments to the review
14:15:53 <trown> sure
14:15:55 <panda|ruck> ok
14:16:05 <mwhahaha> ok moving on to squad status
14:16:09 <trown> seems like a good idea...
we just dont have wall time for even upgrades atm
14:16:18 <trown> let alone multi upgrades with special code
14:16:28 <mwhahaha> it could be a periodic
14:16:34 <mwhahaha> but anyway
14:16:37 <mwhahaha> #topic Squad status
14:16:37 <mwhahaha> ci
14:16:37 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:16:37 <mwhahaha> upgrade
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:16:38 <mwhahaha> containers
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:16:38 <mwhahaha> integration
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:16:39 <mwhahaha> ui/cli
14:16:39 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:16:40 <mwhahaha> validations
14:16:40 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:16:41 <mwhahaha> networking
14:16:41 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:17:10 <mwhahaha> jrist: ui/cli status missing updates
14:17:26 <jrist> sorry, will update
14:17:56 <mwhahaha> workflows - thrash|biab please provide a status when you have a chance
14:18:15 <mwhahaha> everyone else it looks like statuses have been updated, please take a look and review
14:18:48 <openstackgerrit> Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Send ARA statistics to Graphite server https://review.openstack.org/479882
14:18:57 <mwhahaha> moving on
14:19:00 <mwhahaha> #topic bugs & blueprints
14:19:00 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/queens-2
14:19:00 <mwhahaha> For Queens we currently have 70 (+1) blueprints and about 521 (+24) open bugs. 254 queens-2 and 267 queens-3.
14:19:16 <mwhahaha> so it seems last week hasn't been a good week for bugs as we're +24
14:19:37 <mwhahaha> also CI has been hosed for some time
14:20:29 <mwhahaha> the queue keeps failing with timeouts so we need to work on critical bugs
14:20:46 <mwhahaha> any other bug related items?
14:21:41 <mwhahaha> sounds like nope
14:21:42 <mwhahaha> #topic projects releases or stable backports
14:21:56 <mwhahaha> Any stable backports people need eyes on?
14:22:11 <mwhahaha> I think EmilienM is working through some release issues if they haven't already been resolved
14:22:30 <mwhahaha> I believe we'll be lining up another release for stable stuff next week depending on the status of previous ones
14:24:08 <mwhahaha> moving on
14:24:12 <mwhahaha> #topic specs
14:24:13 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:25:37 <mwhahaha> please take some time to review the open specs and if you have an open one with comments, please update it
14:26:11 <mwhahaha> dsneddon, lhinds, ccamacho: looks like you folks have some open specs with -1s
14:27:01 <mwhahaha> moving on
14:27:03 <mwhahaha> #topic open discussion
14:27:05 <ccamacho> mwhahaha yeah Ill ask you some questions after the meeting thanks
14:27:41 <mwhahaha> so CI...
14:28:13 <mwhahaha> adarazs|rover, panda|ruck: do we have updates on where we're at with CI and why we're running into so many timeouts?
14:28:47 <adarazs|rover> mwhahaha: nope. I just opened the bug for it, as I didn't find anything more enlightening after looking around in the logs.
14:29:02 <adarazs|rover> here's the bug: https://bugs.launchpad.net/tripleo/+bug/1730671
14:29:03 <openstack> Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:29:21 <mwhahaha> ok so the gate is 24+ hours behind and it seems that we're hitting a timeout consistently on some jobs
14:30:11 <mwhahaha> i'll take a look at that bug and see if i can add some more information
14:30:43 <adarazs|rover> mwhahaha: thanks!
14:31:07 <panda|ruck> I really need to understand why the zuul queue can grow indefinitely. Chasing gate failures is something we certainly have to do, but it looks like a bug that we add and add changes to a queue that grow the possibility of getting a job to fail and reset everything again
14:31:08 <mwhahaha> so at this point we're pretty much blocked again in the gate unless we can figure out what's timing out
14:31:51 <mwhahaha> panda|ruck: it grows because people are approving stuff or rechecking. and when a reset occurs that's another 2+ hours of not advancing
14:32:19 <mwhahaha> we probably need a stop order on all approvals until we figure out what is causing the timeouts
14:32:46 <mwhahaha> unless a patch is going to fix a known blocker bug I think we need to -2 for now
14:32:47 <panda|ruck> mwhahaha: well, zuul could just say "i'll take the first five and change only them together, so the risk of getting a reset is lower"
14:33:15 <Tengu> hmm there's also the random issue with the volumes
14:33:37 <panda|ruck> this kind of optimization zuul is doing, doesn't work very well with our rate of failures
14:33:51 <mwhahaha> well the argument is that we shouldn't have this rate of failures
14:33:54 <mwhahaha> cause we shouldn't
14:34:27 <mwhahaha> so anyway i'm going to send a note about not approving anything else and we may have to clear the queue to right this
14:34:31 <adarazs|rover> I mean if you have a 99% chance of passing and you try to run it 20 times you're already at only an
80% chance of passing.
14:34:41 <adarazs|rover> and 99% is quite ideal.
14:35:08 <mwhahaha> #action mwhahaha send a note about CI to ML and proposing no more merging of items not specifically critical CI bugs
14:35:08 <panda|ruck> we can keep the rate low, putting a lot of effort, but I don't think the situation is going to improve in the long run
14:35:09 <jaosorior> Tengu: got logs for it?
14:35:26 <Tengu> jaosorior: there's two issues about that already. 2s, getting logs.
14:36:15 <Tengu> jaosorior: Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" and "Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error"" - I regularly hit the first one on my apache review.
14:36:17 <openstack> Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] https://launchpad.net/bugs/1729253 - Assigned to Martin André (mandre)
14:36:18 <openstack> Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged] https://launchpad.net/bugs/1730111
14:36:41 <ccamacho> mwhahaha what about patches/bug fixes already +2 but waiting for merge?
14:36:52 <ccamacho> I mean +A
14:36:59 <mwhahaha> ccamacho: unless it's a CI bug no we shouldn't recheck
14:37:03 <ccamacho> ack
14:37:29 <mwhahaha> ok anything else?
14:40:34 <mwhahaha> sounds like nope
14:40:36 <mwhahaha> thanks everyone
14:40:38 <mwhahaha> #endmeeting
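
Note on the gate arithmetic raised at 14:34: the "99% over 20 runs gives roughly 80%" figure can be checked with a small sketch. It assumes each run passes independently with the same probability, which is a simplification and not a description of how Zuul actually schedules or resets the queue. The function name `all_pass_probability` is hypothetical and used only for illustration.

```python
def all_pass_probability(pass_rate: float, runs: int) -> float:
    """Chance that every one of `runs` job runs passes.

    Assumption (not from the meeting): runs are independent and share
    the same per-run pass rate.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return pass_rate ** runs


if __name__ == "__main__":
    # Compare a short window of changes with a longer queue.
    for runs in (1, 5, 20):
        chance = all_pass_probability(0.99, runs)
        print(f"{runs:2d} runs at 99% each: {chance:.1%} chance that all pass")
```

Under the same assumption, a window of five changes keeps the all-pass chance at about 95%, which is the intuition behind the "take the first five" suggestion made at 14:32.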