13:59:37 <mwhahaha> #startmeeting tripleo
13:59:37 <mwhahaha> #topic agenda
13:59:37 <mwhahaha> * Review past action items
13:59:37 <mwhahaha> * One off agenda items
13:59:37 <mwhahaha> * Squad status
13:59:37 <mwhahaha> * Bugs & Blueprints
13:59:37 <openstack> Meeting started Tue Oct  3 13:59:37 2017 UTC and is due to finish in 60 minutes.  The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:37 <mwhahaha> * Projects releases or stable backports
13:59:38 <mwhahaha> * Specs
13:59:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:38 <mwhahaha> * open discussion
13:59:39 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderatorǃ
13:59:41 <openstack> The meeting name has been set to 'tripleo'
13:59:55 <EmilienM> o/
13:59:57 <fultonj> o/
13:59:59 <slagle> hi
14:00:02 <jfrancoa> o/
14:00:06 <trown> o/
14:00:07 <bandini> o/
14:00:12 <mandre> o/
14:00:17 <cdearborn_> \o
14:00:19 <marios> o/
14:00:20 <beagles> o/
14:00:22 <jistr> o/
14:00:22 <matbu_> o/
14:00:29 <jtomasek> o/
14:00:44 <atoth> o/
14:00:47 <ccamacho> o/
14:00:49 <jrist> \o
14:01:26 <mwhahaha> alright let's do this
14:01:27 <mwhahaha> #topic review past action items
14:01:33 <mwhahaha> shardy to look at how to reduce # of services deployed on ovb (continued)
14:01:45 <mwhahaha> shardy, did you get a chance to do this? not that it matters because CI is hosed
14:02:38 <adarazs> o/
14:03:06 <mwhahaha> I'll take that as a no, i'll follow up with steve later
14:03:15 <owalsh> o/
14:03:25 <mwhahaha> #action mwhahaha to follow up with shardy about services and ovb jobs
14:03:27 <mwhahaha> review newton backports in gerrit
14:03:39 <EmilienM> I think we did it
14:03:43 <mwhahaha> as a reminder upstream eol is soon
14:03:53 <EmilienM> but some of them are still under review
14:04:05 <mwhahaha> still trying to figure out exactly what that means for tripleo, but make sure to take a look at newton stuff
14:04:06 <EmilienM> and it won't make progress until this zuul v3 stuff stops killing us
14:04:24 <mwhahaha> yup
14:04:31 <fultonj> regarding newton, i have a bug i'm working on that i hope to backport to newton, but it has to go further https://bugs.launchpad.net/tripleo/+bug/1720787
14:04:32 <openstack> Launchpad bug 1720787 in tripleo "TripleO deploys ceph client keyring with 644 permissions" [High,In progress] - Assigned to John Fulton (jfulton-org)
14:04:33 <shardy> mwhahaha: Hey sorry, running late, no progress re ovb jobs yet, as you say CI has been down
14:04:34 <fultonj> just an fyi
14:05:06 <shardy> mwhahaha: still plan to take a look, but before the zuul3 issues the OVB jobs weren't timing out as much, so I deprioritized reducing the services
14:05:11 <EmilienM> fultonj: it will be - in theory we release newton / ocata / pike every 2 weeks
14:05:44 <mwhahaha> shardy: ok. yea i think it's still a good idea to evaluate what we're checking to also help with excessive coverage, etc
14:05:49 <fultonj> EmilienM: ack
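[editor's note: the fix for bug 1720787 above is not shown in the log; the following is only a minimal illustrative sketch of tightening Ceph client keyring permissions, not the actual TripleO patch. The /etc/ceph path and the 0600 target mode are assumptions based on the bug title.]

```python
#!/usr/bin/env python
# Illustrative sketch only: tighten permissions on Ceph client keyrings.
# The /etc/ceph glob and the 0600 target mode are assumptions taken from
# the bug description; the real fix lives in the TripleO templates/roles.
import glob
import os
import stat


def tighten_keyring_perms(pattern="/etc/ceph/*.keyring", mode=0o600):
    for keyring in glob.glob(pattern):
        current = stat.S_IMODE(os.stat(keyring).st_mode)
        if current != mode:
            print("fixing %s: %o -> %o" % (keyring, current, mode))
            os.chmod(keyring, mode)


if __name__ == "__main__":
    tighten_keyring_perms()
```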
14:06:01 <openstackgerrit> Tim Rozet proposed openstack/tripleo-heat-templates master: Dynamically renders network isolation resource registry  https://review.openstack.org/509190
14:06:18 <shardy> mwhahaha: Yeah agreed
14:06:36 <mwhahaha> ok moving on to the next stuff
14:06:37 <mwhahaha> #topic one off agenda items
14:06:42 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:06:59 <mwhahaha> looks empty, anyone have anything they wish to address?
14:07:01 <fultonj> one-off...
14:07:07 <fultonj> https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:07:10 <fultonj> TripleO Integration Squad Status
14:07:20 <mwhahaha> fultonj: we have a section for squad status (next) :D
14:07:24 <fultonj> whoops
14:07:26 <mwhahaha> but thanks i'll just link that
14:07:35 <EmilienM> fultonj: nice etherpad, thanks!
14:07:42 <fultonj> thanks
14:07:53 <fultonj> thanks gfidente
14:08:11 <mwhahaha> ok moving on to squad status since we have some
14:08:12 <mwhahaha> #topic Squad status
14:08:20 <mwhahaha> integration
14:08:20 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:08:38 <fultonj> we talked about these items at PTG
14:08:55 <EmilienM> fultonj: do we target them all for Queens?
14:08:59 <fultonj> no progress on multiple ceph clusters (would be expansion of composable roles capability)
14:09:13 <fultonj> only multiple ceph pools and luminous
14:09:15 <fultonj> 2 out of the 3
14:09:35 <fultonj> we hope to land changes for multiple ceph clusters now, but not fully deliver feature until post-queens
14:09:45 <marios> mwhahaha: i just put this down (we spoke about it in dfg upgrades not sure if someone got round to it already) but https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:09:46 <fultonj> now --> during the queens cycle
14:09:58 <mwhahaha> marios: thanks!
14:10:08 <marios> matbu_: can you update with the minor updates remaining https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:10:13 <ooolpbot> URGENT TRIPLEO TASKS NEED ATTENTION
14:10:14 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1718387
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1719123
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720220
14:10:16 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720458
14:10:16 <openstack> Launchpad bug 1718387 in tripleo "ping test is periodically failing for the gate-tripleo-ci-centos-7-nonha-multinode-oooq " [Critical,Triaged] - Assigned to Sofer Athlan-Guyot (sofer-athlan-guyot)
14:10:16 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720556
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720721
14:10:18 <marios> mwhahaha: i'll ask folks to update it
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720918
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1720973
14:10:19 <openstack> Launchpad bug 1719123 in tripleo "tempest fails on overcloud keystone admin tenant error" [Critical,Triaged] - Assigned to Arx Cruz (arxcruz)
14:10:20 <openstack> Launchpad bug 1720220 in tripleo "CI: Most/all legacy-tripleo jobs failing on gate" [Critical,Triaged]
14:10:21 <openstack> Launchpad bug 1720458 in tripleo "Lastest delorean pip package causes error in autodoc" [Critical,In progress] - Assigned to wes hayutin (weshayutin)
14:10:22 <openstack> Launchpad bug 1720556 in tripleo "tracker: upstream zuul is not triggering periodic jobs" [Critical,Triaged] - Assigned to wes hayutin (weshayutin)
14:10:23 <openstack> Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:24 <openstack> Launchpad bug 1720918 in tripleo "puppet-firewall changed the ipv6-icmp rule type name" [Critical,In progress] - Assigned to Ben Nemec (bnemec)
14:10:26 <openstack> Launchpad bug 1720973 in tripleo "CI: infras cirros image has wrong permission" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:30 <mwhahaha> marios: sounds good i'll use that to refer to status during the meetings going forward
14:10:34 <matbu_> marios: yep /me tries to click on the link
14:10:34 <EmilienM> fultonj: do you have upgrades working from ocata - puppet-ceph managed to pike ceph-ansible managed? (I know CI is down now, but I would like to see some CI job voting here once it's back)
14:11:04 <fultonj> from ocata working, but no ci (must add ci)
14:11:27 <EmilienM> fultonj: maybe you can add it in the etherpad, I would like it to be a priority
14:11:37 <fultonj> EmilienM: yes, will do
14:11:40 <EmilienM> thanks
14:11:41 <fultonj> #actionitem
14:11:46 <mandre> EmilienM: I think that may be covered in the CI work we do in the containers squad
14:11:58 <fultonj> #action ceph-upgrade ci
14:12:10 <EmilienM> mandre: most probably, I just want to make sure this is done at some point
14:12:12 <mandre> talking about the container squad, we also prepared an etherpad :) https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:12:29 <mwhahaha> mandre: thanks! i'll add that to the list for next time as well
14:12:52 <matbu_> marios: done
14:13:04 <mwhahaha> ci - weshay trown sshnaidm|mtg - any status that you can share around CI?
14:13:29 <mwhahaha> ui/cli - jrist - any status you can share around UI? (or validations as well)
14:13:39 <trown> mwhahaha: other than it is totally borked due to the zuulv3 migration, not a ton this week
14:13:44 <EmilienM> mwhahaha: status is here https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
14:13:57 <marios> ty matbu_
14:13:58 <mwhahaha> EmilienM: yea but they are getting rid of the squad meeting so is that still valid?
14:13:59 <jrist> nothing in particular except that we are beginning to work on some ci and upstream automation
14:14:00 <weshay> mwhahaha, we're putting the finishing touches on tripleo promotion jobs in rdo software factory
14:14:10 <EmilienM> mwhahaha: or here even https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:14:12 <trown> we are working on a new process for organizing our squad and will have more formal status etherpad going forward
14:14:43 <mwhahaha> trown: sounds good, just let me know so I can add it to the list
14:15:04 <mwhahaha> workflows - thrash - any status you can share?
14:15:12 <mwhahaha> networking - beagles - any status you can share?
14:15:19 <weshay> tripleo-ci-squad sprint details can be found https://www.redhat.com/archives/rdo-list/2017-September/msg00068.html
14:16:39 <thrash> mwhahaha: working on some rfe's for ui
14:16:54 <mwhahaha> thrash: sounds good, thanks. let us know if you need anything (reviews/etc)
14:17:00 <thrash> mwhahaha: ack
14:17:39 <mwhahaha> ok thanks everyone for the status. let's move on
14:17:49 <mwhahaha> #topic bugs & blueprints
14:17:50 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/queens-1
14:18:03 <mwhahaha> We currently have 62 blueprints and about 476 open bugs. Please take some time to review your blueprint status and make sure it is properly up to date.
14:18:14 <mwhahaha> please be aware that queens-1 is in about 2 weeks
14:18:26 <mwhahaha> so please move stuff that you aren't going to land by queens-1 out to queens-2
14:19:00 <mwhahaha> anyone have any blueprints or bugs they want to raise for visibility?
14:19:39 <slagle> i filed a blueprint for the ansible work
14:19:42 <slagle> just fyi
14:19:44 <EmilienM> I hope we can merge something before the 2 weeks ^^
14:20:14 <mwhahaha> ok
14:20:52 <marios> mwhahaha: its already in upgrades squad status, but for visibility, spec for the Q upgrades here https://review.openstack.org/507620
14:21:07 <marios> mwhahaha: captures what we discussed in ptg
14:21:12 <mwhahaha> marios: sounds good
14:21:17 <EmilienM> mwhahaha: https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download-ci and https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download
14:22:07 <mwhahaha> slagle: do you think those will be landed in the next two weeks?
14:22:23 <EmilienM> it's possible
14:22:24 <slagle> mwhahaha: i think it's possible
14:22:30 <slagle> yea, we'll go with "possible" :)
14:22:54 <mwhahaha> provided ci stops being on fire :D
14:22:58 <slagle> it's also possible not a single patch will get landed in the next 2 weeks :)
14:22:59 <EmilienM> we already tested the bits on our envs, it works fine (modulo some changes) - if our CI is back this week...
14:23:23 <EmilienM> we'll move it to queens-2 otherwise, but we want to get ci coverage asap for this feature
14:23:31 <mwhahaha> makes sense
14:23:40 <mwhahaha> Ok any other bugs/blueprints?
14:23:41 <shardy> 476 open bugs - I wonder what we can do about that - I think the trend is upwards so perhaps we can prune/prioritize or de-duplicate better there?
14:23:57 <mwhahaha> shardy: i've started to go through and clean some up
14:24:04 <mwhahaha> shardy: we're about +6 for the week
14:24:18 <shardy> mwhahaha: ack OK I'll see if I can spend some time helping
14:24:32 <EmilienM> we could target 450 by queens-1, and maybe 400 by queens-2
14:24:36 <mwhahaha> there's a bunch of old stuff that is no longer valid i'm sure
14:24:44 <shardy> mwhahaha: I find it useful to have a clearly defined (smaller) set of priority things for a milestone, then it's easier to know which reviews to prioritize
14:25:10 <mwhahaha> so we could move all <= medium to queens-2 for visibility
14:25:22 <shardy> mwhahaha: yeah something like that might be good
14:25:39 <mwhahaha> #action mwhahaha to move bugs <= medium to queens-2 and review > medium for validity
14:25:43 <mwhahaha> i'll do that this week
14:25:45 <EmilienM> mwhahaha: +1
14:25:56 <mwhahaha> moving on
14:25:57 <mwhahaha> #topic projects releases or stable backports
14:26:00 <EmilienM> mwhahaha: I can do it with the script
14:26:09 <mwhahaha> EmilienM: i'll get it
14:26:13 <EmilienM> k
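[editor's note: the script EmilienM refers to above is not part of the log; below is only a hedged sketch of how bulk bug retargeting could look with launchpadlib. The milestone names and the importance filter are assumptions taken from the action item discussed earlier.]

```python
#!/usr/bin/env python
# Hedged sketch of bulk bug retargeting with launchpadlib; NOT the script
# referenced in the meeting. Milestone names and the <= Medium importance
# filter are assumptions based on the agreed action item.
from launchpadlib.launchpad import Launchpad


def move_low_priority_bugs(project_name="tripleo",
                           from_milestone="queens-1",
                           to_milestone="queens-2"):
    lp = Launchpad.login_with("tripleo-bug-triage", "production")
    project = lp.projects[project_name]
    source = project.getMilestone(name=from_milestone)
    target = project.getMilestone(name=to_milestone)
    # Retarget Medium and lower bugs still open against the source milestone.
    for task in project.searchTasks(milestone=source,
                                    importance=["Medium", "Low", "Wishlist"]):
        task.milestone = target
        task.lp_save()
        print("moved %s" % task.web_link)


if __name__ == "__main__":
    move_low_priority_bugs()
```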
14:26:40 <mwhahaha> so we have pending stable releases but there's some issues around the stable-policy stuff
14:27:31 <mwhahaha> Given the status of CI there isn't much point in talking about backport patches
14:27:34 <mwhahaha> so let's move on
14:27:38 <mwhahaha> #topic specs
14:27:38 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:27:58 <mwhahaha> given that queens-1 is in ~2 weeks, we need to make sure any queens specs get merged asap
14:28:04 <mwhahaha> please take a look at the open ones and review
14:28:18 <mwhahaha> as marios pointed out the upgrade spec is here:
14:28:20 <mwhahaha> #link https://review.openstack.org/507620
14:28:33 <EmilienM> can we merge specs? lol
14:28:50 <mwhahaha> well it starts with reviewing them :D
14:28:51 <EmilienM> ah yeah, jobs pass, at least something that works
14:28:57 <marios> mwhahaha: thanks, sorry for the noise before (I didn't read the context, i thought you were asking for updates on Q stuff, not blueprint/bug targeting; apologies)
14:29:12 <mwhahaha> marios: it's all good, it's better to have more info than none at all :D
14:30:06 <mwhahaha> Ok so on to the open discussion
14:30:07 <mwhahaha> #topic open discussion
14:30:30 <mwhahaha> Anything else that folks would like to talk about?
14:30:38 <EmilienM> we haven't talked much about CI
14:30:43 <bandini> is there any super rough/high level eta for fixing zuulv3?
14:30:44 <EmilienM> it's unclear to me where we are now
14:30:53 <mwhahaha> dmsimard, pabelanger: any updates on CI
14:30:56 <marios> right, i was wary about asking... do we know anything more about it :) seems others share the sentiment
14:31:01 <shardy> bandini: that's been asked on the ML but I don't see any reply yet
14:31:10 <EmilienM> we've been very patient until now, but things are getting bad
14:31:35 <bandini> shardy: yeah saw the mail, thought maybe some tripleo ci overlords had some additional thoughts :)
14:31:37 <EmilienM> pabelanger, dmsimard: this CI downtime is having a critical effect on TripleO delivery, fyi
14:31:52 <dmsimard> mwhahaha, EmilienM: sshnaidm|mtg gave me an update earlier, it looks like we're almost clear and hitting some sort of timeout right now
14:32:08 <dmsimard> let me pull up the review, hang on.
14:32:18 <EmilienM> shardy: which email?
14:32:24 <EmilienM> [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:27 <EmilienM> this one?
14:32:28 <shardy> [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:30 <dmsimard> mwhahaha, EmilienM: https://review.openstack.org/#/c/508660/
14:32:39 <trown> dmsimard: we are hitting a timeout because the connection to the nodepool node drops
14:32:47 <amoralej> EmilienM, but if we ping puppet-firewall, we'll be unable to merge the fix, right?
14:33:07 <shardy> yeah that one, dansmith asked about mitigation as did sdague yesterday
14:33:09 <bnemec> True
14:33:10 <dmsimard> trown: yes, I am looking into it right now
14:33:13 <EmilienM> amoralej: let's pin it anyway for now, so we remove one issue
14:33:17 <amoralej> ok
14:33:59 <EmilienM> dmsimard, pabelanger : so back to the initial question, any ETA?
14:34:11 <dmsimard> mwhahaha, EmilienM: there are different ongoing discussions about considering a rollback but it's not a simple task because some projects (tripleo included) have had to introduce changes in their projects to support zuul v3 and doing a rollback would mean breaking those projects again
14:34:30 <EmilienM> why haven't we tested things before?
14:35:03 <EmilienM> it's a paradox that the project in charge of testing wasn't tested against tripleo before
14:35:13 <shardy> yeah it's kinda surprising there wasn't some sort of parallel migration strategy so jobs could be switched over gradually
14:35:34 <shardy> EmilienM: well it sounds like it's not only TripleO
14:36:02 <tosky> the problem is that some of the fixes are stuck in the queue, but they will land
14:36:20 <tosky> I'm not sure that rolling back now wouldn't cause even more disruption
14:36:32 <pabelanger> we are having scale issues, which are harder to test than the migration process
14:36:44 <shardy> tosky: yeah, it'd just be nice to have better visibility of the status/progress I guess
14:37:15 <pabelanger> EmilienM: no ETA, suggest watching https://etherpad.openstack.org/p/zuulv3-issues and #openstack-infra.
14:37:27 <EmilienM> anything we can help?
14:37:42 <dmsimard> EmilienM: I heavily recommended a gradual opt-in of selected projects, especially deployment projects such as puppet-openstack, openstack-ansible and tripleo but the decision ended otherwise
14:38:10 <dmsimard> EmilienM: right now we are working with https://review.openstack.org/#/c/508660/
14:38:24 <dmsimard> and I am looking into what seems to be a timeout related issue
14:38:38 <pabelanger> just patience and what ever needs done on etherpad
14:38:52 <dmsimard> seeing as the jobs ran for over 2 hours, I'm hopeful there are no more issues related to tripleo or the jobs themselves
14:39:07 <EmilienM> pabelanger: patience ✓ already ;-)
14:39:32 <EmilienM> last patch merged in tripleo was Sep 28th
14:39:55 <EmilienM> do we have a failover plan if by end of week it's not fixed?
14:40:15 <EmilienM> we can't stop merging code for more than a week, we have to find a plan b
14:40:29 <dmsimard> EmilienM: there's ongoing discussion around that topic and I anticipate an infra core to chime in the openstack-dev thread
14:40:30 <mwhahaha> weshay: said there might be an option with software factory
14:40:49 <EmilienM> I don't think they have enough resources tbh
14:41:22 <mwhahaha> time to figure out what would qualify as a critical minimal subset of CI testing
14:41:57 <EmilienM> do we have someone looking at that^ ?
14:41:59 <EmilienM> weshay: ^
14:42:21 <EmilienM> mwhahaha: anyway, we can move on I guess
14:42:23 <shardy> we did that once before, disabled a bunch of jobs and merged code with a subset - it took weeks to fix all the regressions after
14:42:33 <shardy> so IMO it'd be best avoided if possible
14:42:35 <EmilienM> shardy: no way we do that again
14:43:34 * mwhahaha shrugs
14:43:42 <mwhahaha> it's an option if we can't get anything going
14:43:46 <dmsimard> it seems jobs are timing out on a specific play:
14:43:48 <dmsimard> http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet/c92e76d/job-output.txt.gz#_2017-10-03_10_59_26_310226
14:44:01 <dmsimard> anyone familiar would know why ?
14:44:19 <mwhahaha> we can look after that
14:44:26 <shardy> mwhahaha: yeah, cool, just saying probably should be a last resort :)
14:44:32 <EmilienM> on "Add virthost to inventory" ?
14:44:50 <dmsimard> maybe it's not a specific thing, another job timed out on a different task http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-undercloud-oooq/5dc7681/job-output.txt.gz#_2017-10-03_10_46_17_231725
14:44:53 <mwhahaha> ok so it sounds like we need to actively work on CI, so lets close out the meeting and go focus on CI
14:44:59 <mwhahaha> thanks everyone
14:45:01 <mwhahaha> #endmeeting