14:00:36 #startmeeting tripleo
14:00:36 #topic agenda
14:00:36 * Review past action items
14:00:36 * One off agenda items
14:00:36 * Squad status
14:00:36 * Bugs & Blueprints
14:00:36 * Projects releases or stable backports
14:00:37 sdoran: the context is that we are running ansible as a user that has no home dir, and there are some localhost tasks, so we need to override remote_tmp
14:00:37 Meeting started Tue Nov 7 14:00:36 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 * Specs
14:00:37 * open discussion
14:00:38 Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:38 Hi everyone! who is around today?
14:00:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:41 The meeting name has been set to 'tripleo'
14:01:08 o/
14:01:09 * mwhahaha watches as everyone disappears
14:01:14 o/
14:01:17 hola!
14:01:18 o/
14:01:27 o/
14:01:33 o/
14:01:51 o/
14:01:58 should I bring up quickstart commits here or wait for the TripleO CI meeting?
14:01:58 o/
14:02:13 hi
14:02:14 jkilpatr: wait until the open discussion in the meeting plz
14:02:37 sure
14:02:55 o/
14:03:04 o/
14:03:10 ok lets get started
14:03:15 #topic review past action items
14:03:21 EmilienM to prepare an etherpad for tripleo onboarding session in Sydney - DONE
14:03:29 well i assume it's done since he's in Sydney
14:03:37 marios, matbu, chem provide doc/status of upgrade workflow
14:04:10 marios, matbu, chem - any update?
14:04:20 mwhahaha: yes, there is a review hold on
14:04:40 Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Fix devmode by right order of playbooks https://review.openstack.org/518336
14:04:52 mwhahaha: https://review.openstack.org/#/c/517916/
14:04:59 mwhahaha: this is the skeleton
14:05:05 cool
14:05:09 mwhahaha: o/ thought there wouldn't be a meeting /me late ...
not that i know of but sounds like chem knows more
14:05:31 #action team to review upgrades developer docs https://review.openstack.org/#/c/517916/
14:05:40 gfidente put together issues around multiple service instances
14:05:44 mwhahaha: we plan on filling this as we go, maybe adding TODO and merging this one, not sure how to proceed
14:05:46 o/
14:05:59 chem: yea let's get the skeleton merged and iterate
14:06:20 mwhahaha: ack
14:06:27 mwhahaha I added just two lines into the integration squad etherpad https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:06:28 mwhahaha: chem: yes, that's the best way to proceed I think
14:06:33 jfrancoa: ^
14:06:48 gfidente: ok thanks
14:06:53 mwhahaha to move medium bugs to queens-3 - DONE
14:07:01 mwhahaha but I'd like to get some feedback from people about those and how to approach it
14:07:10 i moved all the unstarted medium bugs to queens-3
14:07:19 https://etherpad.openstack.org/p/tripleo-integration-squad-status lines 10 > 12
14:07:25 gfidente: ok probably wouldn't hurt to solicit feedback via the ML
14:07:30 mwhahaha ack
14:08:11 #action gfidente to send a note requesting feedback on the ML about multiple service instances issues
14:08:15 ci squad to start gathering gate failure metrics and information
14:08:31 weshay, adarazs|rover, panda|ruck: any updates on the metrics?
14:09:12 o/
14:09:25 mwhahaha: sova has a dedicated page for the gate jobs now
14:09:44 mwhahaha: we have an aggregate RSS to look at the failures too
14:10:06 panda|ruck: cool, please make sure to communicate this information so others can follow
14:10:15 URGENT TRIPLEO TASKS NEED ATTENTION
14:10:15 https://bugs.launchpad.net/tripleo/+bug/1727406
14:10:17 Launchpad bug 1727406 in tripleo "Zaqar subscriptions failed to report deployment error" [Critical,Triaged] - Assigned to Thomas Herve (therve)
14:10:17 https://bugs.launchpad.net/tripleo/+bug/1729253
14:10:17 https://bugs.launchpad.net/tripleo/+bug/1729586
14:10:18 https://bugs.launchpad.net/tripleo/+bug/1730111
14:10:19 https://bugs.launchpad.net/tripleo/+bug/1730477
14:10:19 https://bugs.launchpad.net/tripleo/+bug/1730671
14:10:20 Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] - Assigned to Martin André (mandre)
14:10:21 Launchpad bug 1729586 in tripleo "CI: rdocloud node randomly going offline during jobs" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami)
14:10:22 Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged]
14:10:23 Launchpad bug 1730477 in tripleo "legacy-puppet-syntax-3 job missing on instack-undercloud stable/newton branch" [Critical,Triaged]
14:10:24 Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:10:34 mwhahaha: ok, I'll send an email
14:10:41 panda|ruck: thanks
14:10:50 #topic one off agenda items
14:10:50 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:11:06 the agenda is empty, so unless anyone has anything they wish to bring up now I'll move on
14:11:35 I just wanted to bring up a quickstart patch.
14:11:50 jkilpatr: sure which one
14:11:52 https://review.openstack.org/#/c/497950/20
14:12:12 I'm not really sure where it belongs. But it needs to live somewhere.
14:12:25 if that has to be my own repo fine, if it's in extras great. Just want a verdict on that.
14:12:42 jkilpatr: personally i think quickstart-extras is the correct place
14:13:18 let's get some more folks to weigh in but it seems like something useful for ci/developers
14:13:57 yup, trying to get a nonvoting job running so it would be great if I didn't have to cherry pick it in.
14:14:10 where else could it go?
14:14:11 will this be confined to CI or will it be useful for customers too ?
14:14:11 seems like there are 2 things in that patch
14:14:29 fultonj, in theory you can pull in arbitrary repos containing extra roles for quickstart.
14:14:33 it's a dev/qe thing for the most part
14:14:51 there is an update role, and a disruption role...
14:15:04 customers want no disruptions but we need to be able to test that is true
14:15:09 trown, yes there was no role for trying the various stack settings change possibilities. I can split that out if you like?
14:15:12 it's a semantics issue.
14:15:28 trown: panda|ruck, please take the comments to the review
14:15:53 sure
14:15:55 ok
14:16:05 ok moving on to squad status
14:16:09 seems like a good idea...
we just don't have wall time for even upgrades atm
14:16:18 let alone multi upgrades with special code
14:16:28 it could be a periodic
14:16:34 but anyway
14:16:37 #topic Squad status
14:16:37 ci
14:16:37 #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:16:37 upgrade
14:16:38 #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:16:38 containers
14:16:38 #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:16:38 integration
14:16:38 #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:16:39 ui/cli
14:16:39 #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:16:40 validations
14:16:40 #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:16:41 networking
14:16:41 #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:17:10 jrist: ui/cli status missing updates
14:17:26 sorry, will update
14:17:56 workflows - thrash|biab please provide a status when you have a chance
14:18:15 everyone else it looks like statuses have been updated, please take a look and review
14:18:48 Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Send ARA statistics to Graphite server https://review.openstack.org/479882
14:18:57 moving on
14:19:00 #topic bugs & blueprints
14:19:00 #link https://launchpad.net/tripleo/+milestone/queens-2
14:19:00 For Queens we currently have 70 (+1) blueprints and about 521 (+24) open bugs. 254 queens-2 and 267 queens-3.
14:19:16 so it seems last week hasn't been a good week for bugs as we're +24
14:19:37 also CI has been hosed for some time
14:20:29 the queue keeps failing with timeouts so we need to work on critical bugs
14:20:46 any other bug related items?
14:21:41 sounds like nope
14:21:42 #topic projects releases or stable backports
14:21:56 Any stable backports people need eyes on?
14:22:11 I think EmilienM is working through some release issues if they haven't already been resolved
14:22:30 I believe we'll be lining up another release for stable stuff next week depending on the status of previous ones
14:24:08 moving on
14:24:12 #topic specs
14:24:13 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:25:37 please take some time to review the open specs and if you have an open one with comments, please update it
14:26:11 dsneddon, lhinds, ccamacho: looks like you folks have some open specs with -1s
14:27:01 moving on
14:27:03 #topic open discussion
14:27:05 mwhahaha yeah I'll ask you some questions after the meeting thanks
14:27:41 so CI...
14:28:13 adarazs|rover, panda|ruck: do we have updates on where we're at with CI and why we're running into so many timeouts?
14:28:47 mwhahaha: nope. I just opened the bug for it, as I didn't find anything more enlightening after looking around in the logs.
14:29:02 here's the bug: https://bugs.launchpad.net/tripleo/+bug/1730671
14:29:03 Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:29:21 ok so the gate is 24+ hours behind and it seems that we're hitting a timeout consistently on some jobs
14:30:11 i'll take a look at that bug and see if i can add some more information
14:30:43 mwhahaha: thanks!
14:31:07 I really need to understand why the zuul queue can grow indefinitely. Chasing gate failures is something we certainly have to do, but it looks like a bug that we add and add changes to a queue that grow the possibility of getting a job to fail and reset everything again
14:31:08 so at this point we're pretty much blocked again in the gate unless we can figure out what's timing out
14:31:51 panda|ruck: it grows because people are approving stuff or rechecking.
and when a reset occurs that's another 2+ hours of not advancing
14:32:19 we probably need a stop order on all approvals until we figure out what is causing the timeouts
14:32:46 unless a patch is going to fix a known blocker bug I think we need to -2 for now
14:32:47 mwhahaha: well, zuul could just say "i'll take the first five and change only them together, so the risk of getting a reset is lower"
14:33:15 hmm there's also the random issue with the volumes
14:33:37 this kind of optimization zuul is doing, doesn't work very well with our rate of failures
14:33:51 well the argument is that we shouldn't have this rate of failures
14:33:54 cause we shouldn't
14:34:27 so anyway i'm going to send a note about not approving anything else and we may have to clear the queue to right this
14:34:31 I mean if you have a 99% chance of passing and you try to run it 20 times you're already at only an 80% chance of passing.
14:34:41 and 99% is quite ideal.
14:35:08 #action mwhahaha send a note about CI to ML and proposing no more merging of items not specifically critical CI bugs
14:35:08 we can keep the rate low, putting a lot of effort, but I don't think the situation is going to improve in the long run
14:35:09 Tengu: got logs for it?
14:35:26 jaosorior: there's two issues about that already. 2s, getting logs.
14:36:15 jaosorior: Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" and "Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error"" - I regularly hit the first one on my apache review.
14:36:17 Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] https://launchpad.net/bugs/1729253 - Assigned to Martin André (mandre)
14:36:18 Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged] https://launchpad.net/bugs/1730111
14:36:41 mwhahaha what about patches/bug fixes already +2 but waiting for merge?
14:36:52 I mean +A
14:36:59 ccamacho: unless it's a CI bug no we shouldn't recheck
14:37:03 ack
14:37:29 ok anything else?
14:40:34 sounds like nope
14:40:36 thanks everyone
14:40:38 #endmeeting
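
Note on the pass-rate estimate from open discussion (14:34:31): the "99% over 20 runs" figure can be checked with a few lines of Python. This is only a rough sketch. It assumes every run passes independently with the same probability, which real gate jobs may not do, and the function name `all_pass_probability` is hypothetical and not part of Zuul or any TripleO tooling.

```python
def all_pass_probability(pass_rate: float, runs: int) -> float:
    """Chance that every one of `runs` independent runs passes.

    Assumption: each run passes with the same probability `pass_rate`
    and failures are independent of each other.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return pass_rate ** runs


# Compare a short window (five changes, as suggested at 14:32:47)
# with a queue of twenty.
for runs in (1, 5, 20):
    print(runs, round(all_pass_probability(0.99, runs), 3))
```

Under those assumptions, 20 runs at 99% come out at roughly 0.82, close to the 80% quoted in the meeting, while a window of five stays at about 0.95.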