13:59:37 <mwhahaha> #startmeeting tripleo
13:59:37 <mwhahaha> #topic agenda
13:59:37 <mwhahaha> * Review past action items
13:59:37 <mwhahaha> * One off agenda items
13:59:37 <mwhahaha> * Squad status
13:59:37 <mwhahaha> * Bugs & Blueprints
13:59:37 <openstack> Meeting started Tue Oct 3 13:59:37 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:37 <mwhahaha> * Projects releases or stable backports
13:59:38 <mwhahaha> * Specs
13:59:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:38 <mwhahaha> * open discussion
13:59:39 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderator!
13:59:41 <openstack> The meeting name has been set to 'tripleo'
13:59:55 <EmilienM> o/
13:59:57 <fultonj> o/
13:59:59 <slagle> hi
14:00:02 <jfrancoa> o/
14:00:06 <trown> o/
14:00:07 <bandini> o/
14:00:12 <mandre> o/
14:00:17 <cdearborn_> \o
14:00:19 <marios> o/
14:00:20 <beagles> o/
14:00:22 <jistr> o/
14:00:22 <matbu_> o/
14:00:29 <jtomasek> o/
14:00:44 <atoth> o/
14:00:47 <ccamacho> o/
14:00:49 <jrist> \o
14:01:26 <mwhahaha> alright let's do this
14:01:27 <mwhahaha> #topic review past action items
14:01:33 <mwhahaha> shardy to look at how to reduce # of services deployed on ovb (continued)
14:01:45 <mwhahaha> shardy, did you get a chance to do this? not that it matters because CI is hosed
14:02:38 <adarazs> o/
14:03:06 <mwhahaha> I'll take that as a no, i'll follow up with steve later
14:03:15 <owalsh> o/
14:03:25 <mwhahaha> #action mwhahaha to follow up with shardy about services and ovb jobs
14:03:27 <mwhahaha> review newton backports in gerrit
14:03:39 <EmilienM> I think we did it
14:03:43 <mwhahaha> as a reminder upstream eol is soon
14:03:53 <EmilienM> but some of them are still under review
14:04:05 <mwhahaha> still trying to figure out what that exactly means for tripleo but make sure to take a look at newton stuff
14:04:06 <EmilienM> and it won't make progress until this zuul v3 stops killing us
14:04:24 <mwhahaha> yup
14:04:31 <fultonj> regarding newton i have a bug i am working on i hope to backport to newton but it has to go further https://bugs.launchpad.net/tripleo/+bug/1720787
14:04:32 <openstack> Launchpad bug 1720787 in tripleo "TripleO deploys ceph client keyring with 644 permissions" [High,In progress] - Assigned to John Fulton (jfulton-org)
14:04:33 <shardy> mwhahaha: Hey sorry, running late, no progress re ovb jobs yet, as you say CI has been down
14:04:34 <fultonj> just an fyi
14:05:06 <shardy> mwhahaha: still plan to take a look, but pre-zuul3 issues the OVB jobs weren't timing out as much so I deprioritized reducing the services
14:05:11 <EmilienM> fultonj: it will be - in theory we release newton / ocata / pike every 2 weeks
14:05:44 <mwhahaha> shardy: ok. yea i think it's still a good idea to evaluate what we're checking to also help with excessive coverage, etc
14:05:49 <fultonj> EmilienM: ack
14:06:01 <openstackgerrit> Tim Rozet proposed openstack/tripleo-heat-templates master: Dynamically renders network isolation resource registry https://review.openstack.org/509190
14:06:18 <shardy> mwhahaha: Yeah agreed
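[Editor's note: LP#1720787 above concerns the Ceph client keyring being deployed world-readable (mode 644). The snippet below is only an illustrative sketch of the kind of check and fix involved; the keyring path is a placeholder and this is not the patch that actually addressed the bug.]

#!/usr/bin/env python
# Illustrative only: verify a Ceph client keyring is not group/world readable
# and tighten it to 0600 if it is. The path below is a placeholder, not
# necessarily where TripleO writes the keyring.
import os
import stat
import sys

KEYRING = "/etc/ceph/ceph.client.openstack.keyring"  # hypothetical path

def tighten_keyring(path):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # group or other bits are set: restrict to owner read/write only
        print("%s has mode %o, restricting to 0600" % (path, mode))
        os.chmod(path, 0o600)
    else:
        print("%s already restricted (mode %o)" % (path, mode))

if __name__ == "__main__":
    tighten_keyring(sys.argv[1] if len(sys.argv) > 1 else KEYRING)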
14:06:36 <mwhahaha> ok moving on to the next stuff
14:06:37 <mwhahaha> #topic one off agenda items
14:06:42 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:06:59 <mwhahaha> looks empty, anyone have anything they wish to address?
14:07:01 <fultonj> one-off...
14:07:07 <fultonj> https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:07:10 <fultonj> TripleO Integration Squad Status
14:07:20 <mwhahaha> fultonj: we have a section for squad status (next) :D
14:07:24 <fultonj> whops
14:07:26 <mwhahaha> but thanks i'll just link that
14:07:35 <EmilienM> fultonj: nice etherpad, thanks!
14:07:42 <fultonj> thanks
14:07:53 <fultonj> thanks gfidente
14:08:11 <mwhahaha> ok moving on to squad status since we have some
14:08:12 <mwhahaha> #topic Squad status
14:08:20 <mwhahaha> integration
14:08:20 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:08:38 <fultonj> we talked about these items at PTG
14:08:55 <EmilienM> fultonj: do we target them all for Queens?
14:08:59 <fultonj> no progress on multiple ceph clusters (would be expansion of composable roles capability)
14:09:13 <fultonj> only multiple ceph pools and luminous
14:09:15 <fultonj> 2 out of the 3
14:09:35 <fultonj> we hope to land changes for multiple ceph clusters now, but not fully deliver feature until post-queens
14:09:45 <marios> mwhahaha: i just put this down (we spoke about it in dfg upgrades not sure if someone got round to it already) but https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:09:46 <fultonj> now --> during the queens cycle
14:09:58 <mwhahaha> marios: thanks!
14:10:08 <marios> matbu_: can you update with the minor updates remaining https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:10:13 <ooolpbot> URGENT TRIPLEO TASKS NEED ATTENTION
14:10:14 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1718387
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1719123
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720220
14:10:16 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720458
14:10:16 <openstack> Launchpad bug 1718387 in tripleo "ping test is periodically failing for the gate-tripleo-ci-centos-7-nonha-multinode-oooq " [Critical,Triaged] - Assigned to Sofer Athlan-Guyot (sofer-athlan-guyot)
14:10:16 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720556
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720721
14:10:18 <marios> mwhahaha: i'll ask folks to update it
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720918
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720973
14:10:19 <openstack> Launchpad bug 1719123 in tripleo "tempest fails on overcloud keystone admin tenant error" [Critical,Triaged] - Assigned to Arx Cruz (arxcruz)
14:10:20 <openstack> Launchpad bug 1720220 in tripleo "CI: Most/all legacy-tripleo jobs failing on gate" [Critical,Triaged]
14:10:21 <openstack> Launchpad bug 1720458 in tripleo "Lastest delorean pip package causes error in autodoc" [Critical,In progress] - Assigned to wes hayutin (weshayutin)
14:10:22 <openstack> Launchpad bug 1720556 in tripleo "tracker: upstream zuul is not triggering periodic jobs" [Critical,Triaged] - Assigned to wes hayutin (weshayutin)
14:10:23 <openstack> Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:24 <openstack> Launchpad bug 1720918 in tripleo "puppet-firewall changed the ipv6-icmp rule type name" [Critical,In progress] - Assigned to Ben Nemec (bnemec)
14:10:26 <openstack> Launchpad bug 1720973 in tripleo "CI: infras cirros image has wrong permission" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:30 <mwhahaha> marios: sounds good
i'll use that to refer to status during the meetings going forward
14:10:34 <matbu_> marios: yep /me tries to click on the link
14:10:34 <EmilienM> fultonj: do you have upgrades working from ocata - puppet-ceph managed to pike ceph-ansible managed? (I know CI is down now, but I would like to see some CI job voting here once it's back)
14:11:04 <fultonj> from ocata working, but no ci (must add ci)
14:11:27 <EmilienM> fultonj: maybe you can add it in the etherpad, I would like it to be a priority
14:11:37 <fultonj> EmilienM: yes, will do
14:11:40 <EmilienM> thanks
14:11:41 <fultonj> #actionitem
14:11:46 <mandre> EmilienM: I think that may be covered in the CI work we do in the containers squad
14:11:58 <fultonj> #action ceph-upgrade ci
14:12:10 <EmilienM> mandre: most probably, I just want to make sure this is done at some point
14:12:12 <mandre> talking about the container squad, we also prepared an etherpad :) https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:12:29 <mwhahaha> mandre: thanks! i'll add that to the list for next time as well
14:12:52 <matbu_> marios: done
14:13:04 <mwhahaha> ci - weshay trown sshnaidm|mtg - any status that you can share around CI?
14:13:29 <mwhahaha> ui/cli - jrist - any status you can share around UI? (or validations as well)
14:13:39 <trown> mwhahaha: other than it is totally borked due to zuulv3 migration not a ton this week
14:13:44 <EmilienM> mwhahaha: status is here https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
14:13:57 <marios> ty matbu_
14:13:58 <mwhahaha> EmilienM: yea but they are getting rid of the squad meeting so is that still valid?
14:13:59 <jrist> nothing in particular except that we are beginning to work on some ci and upstream automation
14:14:00 <weshay> mwhahaha, we're putting on the finishing touches to tripleo promotion jobs in rdo software factory
14:14:10 <EmilienM> mwhahaha: or here even https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:14:12 <trown> we are working on a new process for organizing our squad and will have a more formal status etherpad going forward
14:14:43 <mwhahaha> trown: sounds good, just let me know so I can add it to the list
14:15:04 <mwhahaha> workflows - thrash - any status you can share?
14:15:12 <mwhahaha> networking - beagles - any status you can share?
14:15:19 <weshay> tripleo-ci-squad sprint details can be found https://www.redhat.com/archives/rdo-list/2017-September/msg00068.html
14:16:39 <thrash> mwhahaha: working on some rfe's for ui
14:16:54 <mwhahaha> thrash: sounds good, thanks. let us know if you need anything (reviews/etc)
14:17:00 <thrash> mwhahaha: ack
14:17:39 <mwhahaha> ok thanks everyone for the status. let's move on
14:17:49 <mwhahaha> #topic bugs & blueprints
14:17:50 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/queens-1
14:18:03 <mwhahaha> We currently have 62 blueprints and about 476 open bugs. Please take some time to review your blueprint status and make sure it is properly up to date.
14:18:14 <mwhahaha> please be aware that queens-1 is in about 2 weeks
14:18:26 <mwhahaha> so please move stuff that you aren't going to land by queens-1 out to queens-2
14:19:00 <mwhahaha> anyone have any blueprints or bugs they want to raise for visibility?
14:19:39 <slagle> i filed a blueprint for the ansible work
14:19:42 <slagle> just fyi
14:19:44 <EmilienM> I hope we can merge something before the 2 weeks ^^
14:20:14 <mwhahaha> ok
14:20:52 <marios> mwhahaha: it's already in upgrades squad status, but for visibility, spec for the Q upgrades here https://review.openstack.org/507620
14:21:07 <marios> mwhahaha: captures what we discussed in ptg
14:21:12 <mwhahaha> marios: sounds good
14:21:17 <EmilienM> mwhahaha: https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download-ci and https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download
14:22:07 <mwhahaha> slagle: do you think those will be landed in the next two weeks?
14:22:23 <EmilienM> it's possible
14:22:24 <slagle> mwhahaha: i think it's possible
14:22:30 <slagle> yea, we'll go with "possible" :)
14:22:54 <mwhahaha> provided ci stops being on fire :D
14:22:58 <slagle> it's also possible not a single patch will get landed in the next 2 weeks :)
14:22:59 <EmilienM> we already tested the bits on our envs, it works fine (modulo some changes) - if our CI is back this week...
14:23:23 <EmilienM> we'll move it to queens-2 otherwise, but we want to get ci coverage asap for this feature
14:23:31 <mwhahaha> makes sense
14:23:40 <mwhahaha> Ok any other bugs/blueprints?
14:23:41 <shardy> 476 open bugs - I wonder what we can do about that - I think the trend is upwards so perhaps we can prune/prioritize or de-duplicate better there?
14:23:57 <mwhahaha> shardy: i've started to go through and clean some up
14:24:04 <mwhahaha> shardy: we're about +6 for the week
14:24:18 <shardy> mwhahaha: ack OK I'll see if I can spend some time helping
14:24:32 <EmilienM> we could target 450 by queens-1, and maybe 400 by queens-2
14:24:36 <mwhahaha> there's a bunch of old stuff that is no longer valid i'm sure
14:24:44 <shardy> mwhahaha: I find it useful to have a clearly defined (smaller) set of priority things for a milestone, then it's easier to know which reviews to prioritize
14:25:10 <mwhahaha> so we could move all <= medium to queens-2 for visibility
14:25:22 <shardy> mwhahaha: yeah something like that might be good
14:25:39 <mwhahaha> #action mwhahaha to move bugs <= medium to queens-2 and review > medium for validity
14:25:43 <mwhahaha> i'll do that this week
14:25:45 <EmilienM> mwhahaha: +1
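[Editor's note: the #action above (move open bugs of Medium importance or lower from queens-1 to queens-2) is the sort of bulk retargeting normally scripted against the Launchpad API, as EmilienM alludes to just below. The following is a rough sketch under that assumption, not the actual script used; the application name, milestone names and status/importance lists are placeholders to illustrate the launchpadlib calls.]

#!/usr/bin/env python
# Rough sketch: retarget open tripleo bugs of Medium importance or lower
# from the queens-1 milestone to queens-2 using launchpadlib.
from launchpadlib.launchpad import Launchpad

LOW_IMPORTANCE = ['Medium', 'Low', 'Wishlist', 'Undecided']
OPEN_STATUSES = ['New', 'Incomplete', 'Confirmed', 'Triaged', 'In Progress']

def main():
    # interactive OAuth login against production Launchpad
    lp = Launchpad.login_with('tripleo-bug-triage', 'production', version='devel')
    tripleo = lp.projects['tripleo']
    queens_1 = tripleo.getMilestone(name='queens-1')
    queens_2 = tripleo.getMilestone(name='queens-2')

    for task in tripleo.searchTasks(milestone=queens_1,
                                    importance=LOW_IMPORTANCE,
                                    status=OPEN_STATUSES):
        print("moving bug %s (%s) to queens-2" % (task.bug.id, task.importance))
        task.milestone = queens_2
        task.lp_save()

if __name__ == '__main__':
    main()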
14:25:56 <mwhahaha> moving on
14:25:57 <mwhahaha> #topic projects releases or stable backports
14:26:00 <EmilienM> mwhahaha: I can do it with the script
14:26:09 <mwhahaha> EmilienM: i'll get it
14:26:13 <EmilienM> k
14:26:40 <mwhahaha> so we have pending stable releases but there's some issues around the stable-policy stuff
14:27:31 <mwhahaha> Given the status of CI there isn't much point in talking about backport patches
14:27:34 <mwhahaha> so let's move on
14:27:38 <mwhahaha> #topic specs
14:27:38 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:27:58 <mwhahaha> given that queens-1 is in ~2 weeks, we need to make sure any queens specs get merged asap
14:28:04 <mwhahaha> please take a look at the open ones and review
14:28:18 <mwhahaha> as marios pointed out the upgrade spec is here:
14:28:20 <mwhahaha> #link https://review.openstack.org/507620
14:28:33 <EmilienM> can we merge specs?
lol
14:28:50 <mwhahaha> well it starts with reviewing them :D
14:28:51 <EmilienM> ah yeah, jobs pass, at least something that works
14:28:57 <marios> mwhahaha: thanks, sorry for the noise before (I didn't read the context, i thought you were asking for updates on Q stuff, not blueprints/bugs targeting, apologies)
14:29:12 <mwhahaha> marios: it's all good, it's better to have more info than none at all :D
14:30:06 <mwhahaha> Ok so on to the open discussion
14:30:07 <mwhahaha> #topic open discussion
14:30:30 <mwhahaha> Anything else that folks would like to talk about?
14:30:38 <EmilienM> we haven't talked much about CI
14:30:43 <bandini> is there any super rough/high level eta for fixing zuulv3?
14:30:44 <EmilienM> it's unclear to me where we are now
14:30:53 <mwhahaha> dmsimard, pabelanger: any updates on CI
14:30:56 <marios> right, i was wary about asking... do we know anything more about it :) seems others share the sentiment
14:31:01 <shardy> bandini: that's been asked on the ML but I don't see any reply yet
14:31:10 <EmilienM> we've been very patient until now but things are getting bad now
14:31:35 <bandini> shardy: yeah saw the mail, thought maybe some tripleo ci overlords had some additional thoughts :)
14:31:37 <EmilienM> pabelanger, dmsimard: this CI downtime is having a critical effect on TripleO delivery, fyi
14:31:52 <dmsimard> mwhahaha, EmilienM: sshnaidm|mtg gave me an update earlier, it looks like we're almost clear and hitting some sort of timeout right now
14:32:08 <dmsimard> let me pull up the review, hang on.
14:32:18 <EmilienM> shardy: which email?
14:32:24 <EmilienM> [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:27 <EmilienM> this one?
14:32:28 <shardy> [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:30 <dmsimard> mwhahaha, EmilienM: https://review.openstack.org/#/c/508660/
14:32:39 <trown> dmsimard: we are hitting a timeout because the connection to the nodepool node drops
14:32:47 <amoralej> EmilienM, but if we pin puppet-firewall, we'll be unable to merge the fix, right?
14:33:07 <shardy> yeah that one, dansmith asked about mitigation as did sdague yesterday
14:33:09 <bnemec> True
14:33:10 <dmsimard> trown: yes, I am looking into it right now
14:33:13 <EmilienM> amoralej: let's pin it anyway for now, so we remove one issue
14:33:17 <amoralej> ok
14:33:59 <EmilienM> dmsimard, pabelanger: so back to the initial question, any ETA?
14:34:11 <dmsimard> mwhahaha, EmilienM: there are different ongoing discussions about considering a rollback but it's not a simple task because some projects (tripleo included) have had to introduce changes in their projects to support zuul v3 and doing a rollback would mean breaking those projects again
14:34:30 <EmilienM> why haven't we tested things before?
14:35:03 <EmilienM> it's a paradox that the projects in charge of testing weren't tested against tripleo before
14:35:13 <shardy> yeah it's kinda surprising there wasn't some sort of parallel migration strategy so jobs could be switched over gradually
14:35:34 <shardy> EmilienM: well it sounds like it's not only TripleO
14:36:02 <tosky> the problem is that some of the fixes are stuck in the queue, but they will land
14:36:20 <tosky> I'm not sure that rolling back now wouldn't cause even more disruption
14:36:32 <pabelanger> we are having scale issues, which are harder to test than the migration process
14:36:44 <shardy> tosky: yeah, it'd just be nice to have better visibility of the status/progress I guess
14:37:15 <pabelanger> EmilienM: no ETA, suggest watching https://etherpad.openstack.org/p/zuulv3-issues and #openstack-infra.
14:37:27 <EmilienM> anything we can help with?
14:37:42 <dmsimard> EmilienM: I heavily recommended a gradual opt-in of selected projects, especially deployment projects such as puppet-openstack, openstack-ansible and tripleo but the decision ended otherwise
14:38:10 <dmsimard> EmilienM: right now we are working with https://review.openstack.org/#/c/508660/
14:38:24 <dmsimard> and I am looking into what seems to be a timeout related issue
14:38:38 <pabelanger> just patience and whatever needs done on the etherpad
14:38:52 <dmsimard> seeing as the jobs ran for over 2 hours, I'm hopeful there are no more issues related to tripleo or the jobs themselves
14:39:07 <EmilienM> pabelanger: patience ✓ already ;-)
14:39:32 <EmilienM> last patch merged in tripleo was Sep 28th
14:39:55 <EmilienM> do we have a failover plan if by end of week it's not fixed?
14:40:15 <EmilienM> we can't stop merging code during more than a week, we have to find a plan b
14:40:29 <dmsimard> EmilienM: there's ongoing discussion around that topic and I anticipate an infra core to chime in on the openstack-dev thread
14:40:30 <mwhahaha> weshay said there might be an option with software factory
14:40:49 <EmilienM> I don't think they have enough resources tbh
14:41:22 <mwhahaha> time to figure out what would qualify as a critical minimal subset of CI testing
14:41:57 <EmilienM> do we have someone looking at that^ ?
14:41:59 <EmilienM> weshay: ^
14:42:21 <EmilienM> mwhahaha: anyway, we can move on I guess
14:42:23 <shardy> we did that once before, disabled a bunch of jobs and merged code with a subset - it took weeks to fix all the regressions after
14:42:33 <shardy> so IMO it'd be best avoided if possible
14:42:35 <EmilienM> shardy: no way we do that again
14:43:34 * mwhahaha shrugs
14:43:42 <mwhahaha> it's an option if we can't get anything goig
14:43:44 <mwhahaha> going
14:43:46 <dmsimard> it seems jobs are timing out on a specific play:
14:43:48 <dmsimard> http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet/c92e76d/job-output.txt.gz#_2017-10-03_10_59_26_310226
14:44:01 <dmsimard> anyone familiar would know why ?
14:44:19 <mwhahaha> we can look after that
14:44:26 <shardy> mwhahaha: yeah, cool, just saying probably should be a last resort :)
14:44:32 <EmilienM> on "Add virthost to inventory" ?
14:44:50 <dmsimard> maybe it's not a specific thing, another job timed out on a different task http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-undercloud-oooq/5dc7681/job-output.txt.gz#_2017-10-03_10_46_17_231725
14:44:53 <mwhahaha> ok so it sounds like we need to actively work on CI, so lets close out the meeting and go focus on CI
14:44:59 <mwhahaha> thanks everyone
14:45:01 <mwhahaha> #endmeeting
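[Editor's note: for the job timeouts dmsimard links above, one quick way to narrow down where a run sat idle is to measure the gap between consecutive timestamps in job-output.txt. The helper below is hypothetical and assumes the 'YYYY-MM-DD HH:MM:SS.ffffff | ' line prefix seen in those logs; it is not a tool the team used, just a sketch of the approach.]

#!/usr/bin/env python
# Hypothetical helper: report the largest gaps between consecutive timestamped
# lines in a zuul job-output.txt, to spot the task where a job sat idle before
# timing out. Assumes lines are prefixed like '2017-10-03 10:59:26.310226 | '.
import re
import sys
from datetime import datetime

STAMP = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \| (.*)')

def largest_gaps(path, top=10):
    gaps = []
    prev_time, prev_line = None, None
    with open(path) as f:
        for line in f:
            m = STAMP.match(line)
            if not m:
                continue
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
            if prev_time is not None:
                # record how long the job was silent after the previous line
                gaps.append(((ts - prev_time).total_seconds(), prev_line))
            prev_time, prev_line = ts, m.group(2).rstrip()
    return sorted(gaps, reverse=True)[:top]

if __name__ == '__main__':
    for seconds, line in largest_gaps(sys.argv[1]):
        print('%8.1fs after: %s' % (seconds, line))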