19:03:35 <lifeless> #topic agenda
19:03:36 <slagle> hello
19:03:41 <lifeless> bugs
19:03:41 <lifeless> reviews
19:03:41 <lifeless> Projects needing releases
19:03:41 <lifeless> CD Cloud status
19:03:41 <lifeless> CI virtualized testing progress
19:03:43 <lifeless> Insert one-off agenda items here
19:03:46 <lifeless> open discussion
19:03:48 <lifeless> #topic bugs
19:03:52 <lifeless> #link https://bugs.launchpad.net/tripleo/
19:03:52 <lifeless> #link https://bugs.launchpad.net/diskimage-builder/
19:03:52 <lifeless> #link https://bugs.launchpad.net/os-refresh-config
19:03:54 <lifeless> #link https://bugs.launchpad.net/os-apply-config
19:03:57 <lifeless> #link https://bugs.launchpad.net/os-collect-config
19:03:59 <lifeless> #link https://bugs.launchpad.net/tuskar
19:04:02 <lifeless> #link https://bugs.launchpad.net/tuskar-ui
19:04:04 <lifeless> #link https://bugs.launchpad.net/python-tuskarclient
19:04:07 <lifeless> hmm, should remove -ui from there, it's now a horizon problem ;)
19:04:22 <lsmola_> :-)
19:05:06 <lifeless> ok, so lets see
19:05:16 <lifeless> we're drowning in criticals
19:05:58 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1270646
19:06:02 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1271344
19:06:09 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1272803
19:06:14 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1272969
19:06:20 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1278861
19:06:25 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1280941
19:06:30 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1283921
19:06:35 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1284054
19:06:45 * SpamapS opens new window and starts ctrl-clicking
19:06:55 <lifeless> and we've got untriaged!
19:07:03 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1277168
19:07:08 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1279537
19:07:13 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1281174
19:07:17 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1281702
19:07:22 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1281705
19:07:26 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1281719
19:07:31 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1281977
19:07:35 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1284242
19:07:52 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1272605
19:07:55 <lifeless> much wow
19:07:56 <jcoufal> o/ hey guys
19:09:30 <lifeless> jcoufal: ola!
19:10:02 <lifeless> ok so untriaged: I think we need to all jointly commit to doing 1hr of bug triage a week
19:10:13 <lifeless> and this will be all sorted next week with no stress
19:10:28 <lifeless> can anyone *not* make such a commitment ?
19:10:30 <SpamapS> well I just +A'd the fix for 1284054
19:10:34 <SpamapS> bug 1284054 I should say
19:12:15 <SpamapS> lifeless: +1 for the bug triage. I've always found it quite useful to have a specific day in the week that is "my bug triage day" so that I just know when that 1 hour will happen.
19:12:59 <dprince_> fine
19:12:59 <tchaypo> I can do that, but I'm going to need some hand-holding
19:13:16 <jistr> i have e-mail notifications on tuskar and tuskarclient bugs and usually i just triage as they come
19:13:28 <bnemec> My bug triage hour is usually right now. ;-)
19:13:52 <SpamapS> tchaypo: As Stevie Wonder said.. "Do allll that you caaaan"
19:14:17 <lsmola_> trying to do that, at least on tuskar :-)
19:14:25 <lifeless> jistr: so that's good, but, like reviews, please share the load with the rest of the team
19:14:46 <lifeless> jistr: so put aside time after a coffee break one day and look at all the tripleo projects for untriaged bugs
19:15:03 <lifeless> untriaged == no priority and status != triaged
19:15:21 <lifeless> [ignoring in-progress and fix committed of course]
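The untriaged definition above can be read as a simple filter. A minimal sketch, assuming bug tasks from every tripleo project have already been fetched into plain records; the `BugTask` class, its fields, and the sample bug numbers are hypothetical. The status and importance strings are Launchpad's display values ("Undecided" is what Launchpad shows when no priority is set). Because the canned queries are per project, the sketch merges all projects into one triage list.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Launchpad shows "Undecided" when no importance (priority) has been set.
NO_PRIORITY = "Undecided"
# Statuses skipped during triage because the bug is already being worked on.
IGNORED_STATUSES = {"In Progress", "Fix Committed"}


@dataclass
class BugTask:
    """Hypothetical flattened record for one bug task in one project."""
    project: str
    number: int
    status: str
    importance: str


def is_untriaged(task: BugTask) -> bool:
    """untriaged == no priority and status != Triaged, ignoring in-flight bugs."""
    if task.status in IGNORED_STATUSES:
        return False
    return task.importance == NO_PRIORITY and task.status != "Triaged"


def untriaged(tasks: Iterable[BugTask]) -> List[BugTask]:
    """Merge tasks from all projects into one list, ordered by project, then bug number."""
    return sorted(
        (t for t in tasks if is_untriaged(t)),
        key=lambda t: (t.project, t.number),
    )


tasks = [
    BugTask("tripleo", 2, "New", "Undecided"),
    BugTask("diskimage-builder", 1, "Confirmed", "Undecided"),
    BugTask("tripleo", 3, "In Progress", "Undecided"),
    BugTask("tuskar", 4, "Triaged", "High"),
]
print([(t.project, t.number) for t in untriaged(tasks)])
```

The in-progress task and the already-prioritised tuskar task drop out, leaving one list to work through during the weekly triage hour.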
19:15:25 <jdob> that's a good idea for the e-mail notification, I didn't realize it supported that
19:16:18 <ccrouch> lifeless: is there no way to generate a single query?
19:16:32 <lifeless> ccrouch: we have canned queries on the wiki page I believe
19:16:37 <jistr> lifeless: i don't have 100% confidence in my ability to correctly triage bugs on non-tuskar projects, so that's why i don't do it much, but i'll try
19:16:39 <lifeless> ccrouch: but no, not across multiple projects
19:16:46 <ccrouch> ah ok
19:17:04 <d0ugal> Can only core do full triage? I noticed I can't set some of the values
19:17:04 <lifeless> jistr: can you tell 'omg this shouldn't happen' from 'it might be nice if?' :)
19:17:08 <lifeless> jistr: that's really all it takes
19:17:16 <jistr> yeah i hope so ;)
19:17:20 <lifeless> d0ugal: there is a team on launchpad - ~tripleo - request membership of that
19:17:23 <lifeless> tchaypo: ^ you too
19:17:32 <d0ugal> lifeless: aha, will do. Thanks.
19:17:34 <tchaypo> do we try to reproduce or confirm the problem during triage?
19:17:51 <lifeless> tchaypo: use your own judgement
19:17:55 <greghaynes> lifeless: I think we still have some people with membership pending, fyi
19:18:03 <lifeless> I will check post meeting
19:18:06 <lifeless> so these criticals
19:18:31 <tchaypo> Team request made
19:18:51 <lifeless> the PMTU one I think we probably need to put an ethtool -K gro off in automatically, drop the bug to high and pursue out of band
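A minimal sketch of what "put an ethtool -K gro off in automatically" could look like, assuming it runs as root on each affected node and that the list of interface names is supplied by the caller (the name `eth0` below is hypothetical; the log does not say which interfaces need it or where the hook would live). `ethtool -K <iface> gro off` is the standard way to disable generic receive offload on one interface. The default is a dry run that only returns the commands.

```python
import subprocess
from typing import List, Sequence


def gro_off_commands(interfaces: Sequence[str]) -> List[List[str]]:
    """Build one `ethtool -K <iface> gro off` command per interface."""
    return [["ethtool", "-K", iface, "gro", "off"] for iface in interfaces]


def disable_gro(interfaces: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Disable GRO on each interface; with dry_run=True, only return the commands."""
    cmds = gro_off_commands(interfaces)
    if not dry_run:
        for cmd in cmds:
            # Requires root and the ethtool binary; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return cmds


print(disable_gro(["eth0"]))  # hypothetical interface name
```

Keeping the command construction separate from execution makes it easy to log exactly what a boot-time hook would run before turning it on for real.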
19:19:10 <lifeless> the leases one neutron are pursuing, but the undefined priority there makes me cry
19:19:30 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1272803 is low hanging fruit for someone interested
19:19:45 <lifeless> just make a new internal interface rather than reusing the mac of the bridge
19:20:11 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1272969 dprince_'s work will solve
19:20:41 <lifeless> jistr: https://bugs.launchpad.net/tripleo/+bug/1278861 - any additional news on that ?
19:21:24 <jistr> lifeless: not beyond what Dmitri posted... i'm reprovisioning my lab machine now so i'll try to run as non-root and we'll see
19:22:00 <lifeless> jistr: I think we should downgrade this - CI is working [there's a different failure right now but we had three-green-bars yesterday]
19:22:05 <lifeless> jistr: so it's not systemic
19:22:09 <jistr> lifeless: +1
19:22:29 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1280941 we need to back out our workaround and then it's done
19:22:51 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1283921 I think we landed my proposed patch
19:23:03 <lifeless> but we haven't followed up to see if the life cycle is managed properly
19:23:11 <lifeless> I'm inclined to close it and wait and see
19:23:21 <lifeless> we know we need to do more work on migrations as we finish the HA arc
19:23:25 <lifeless> thoughts?
19:24:39 <lifeless> ok, silence == assent ;P
19:24:55 <lifeless> and SpamapS says he +A'd https://bugs.launchpad.net/tripleo/+bug/1284054
19:25:30 <lifeless> ok, any other bug stuff?
19:25:31 <SpamapS> lifeless: it had a Partial-Bug tag.. I think we should drop it to High since this workaround mitigates the impact
19:25:44 <lifeless> SpamapS: # ?
19:25:52 <SpamapS> bug #1284054
19:25:59 <SpamapS> We still go too slow.
19:26:01 <lifeless> #action all tripleo devs to do 1hr bug triage a week, random offsets.
19:26:02 <SpamapS> We just wait longer now
19:26:31 <lifeless> sure
19:26:46 <lifeless> though I suspect folk will perf optimise independently
19:26:51 <lifeless> I'd be inclined to just close it
19:26:58 <lifeless> up to you!
19:27:09 <lifeless> any other bug stuff to discuss?
19:28:16 <lifeless> ok
19:28:25 <lifeless> #topic reviews
19:28:59 <lifeless> I'm going to resume my monthly reviewer summaries now that we're all well and truly back from holidays
19:29:08 <lifeless> current status
19:29:10 <lifeless> http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:29:21 <lifeless> Stats since the last revision without -1 or -2 :
19:29:21 <lifeless> Average wait time: 4 days, 1 hours, 33 minutes
19:29:30 <lifeless> Median wait time: 3 days, 19 hours, 16 minutes
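The average sitting above the median suggests a few long-waiting reviews are pulling the mean up. A minimal sketch of how the two figures relate, assuming each open review contributes one wait time measured from its last revision without a -1 or -2; this is an illustration, not the stats page's actual code, and the sample waits are made up. `fmt` mimics the page's "days, hours, minutes" rendering and drops seconds.

```python
from datetime import timedelta
from typing import List, Tuple


def wait_stats(waits: List[timedelta]) -> Tuple[timedelta, timedelta]:
    """Return (average, median) of a non-empty list of review wait times."""
    ordered = sorted(waits)
    n = len(ordered)
    average = sum(ordered, timedelta()) / n
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        # Even count: median is the midpoint of the two middle waits.
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return average, median


def fmt(td: timedelta) -> str:
    """Render a wait as 'D days, H hours, M minutes', dropping seconds."""
    minutes = int(td.total_seconds() // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days} days, {hours} hours, {mins} minutes"


avg, med = wait_stats([timedelta(days=1), timedelta(days=3), timedelta(days=8)])
print(fmt(avg))
print(fmt(med))
```

With one slow review in the sample, the average (4 days) lands above the median (3 days), the same shape as the numbers quoted above.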
19:29:49 <lifeless> we're letting ourselves down - that's a long time to be waiting for feedback on a branch
19:30:28 <SpamapS> lifeless: I've seen a lot more operational focus lately, I don't think we've been focused on reviews.
19:30:28 <lifeless> longest review is up at 6 days
19:30:47 <lifeless> there are 9 people in cd-admins, and ~20 in the review team.
19:31:04 <lifeless> SpamapS: so while cd-admins can be affected by that, it doesn't cover the whole team.
19:31:14 <lifeless> SpamapS: not by a long shot.
19:31:32 <SpamapS> so reviewers are just not doing enough reviews
19:31:46 <lifeless> I'd say so
19:31:54 <lifeless> reviews are how we scale development bandwidth
19:32:00 <SpamapS> actually yeah.. I'm at the top for 30 days.. and I KNOW I haven't been doing enough
19:32:10 <lifeless> they are more critical than bug fixing
19:32:18 <lifeless> because they actually *deliver the code to users*
19:32:38 <SpamapS> I'm not so much seeing bug fixing as testing and using.
19:32:50 <SpamapS> the non admins.. I don't know what's up with that.
19:32:51 <lifeless> I'm not interested in singling individuals out as not reviewing enough - this is a team wide challenge.
19:32:55 <slagle> the CI jobs are slowing reviews down
19:32:59 <SpamapS> but admins are also the most active reviewers
19:33:10 <slagle> not that we shouldn't be doing CI, but it's a data point to consider
19:33:12 <lifeless> slagle: you can still -1 or +2 based on the code.
19:33:20 <lifeless> slagle: the only thing CI affects is +A.
19:33:45 <slagle> ok, i've been  -1'ing still
19:33:58 <slagle> but, i've been holding off on the +2 as well
19:34:07 <slagle> it doesn't "look good to me" if jenkins hasn't run yet
19:34:28 <lifeless> of the 4 oldest reviews, 3 have only one +2; the last has +A but depends on a patch with two -1s from non-core
19:34:36 <slagle> but, if the pattern is to still +2, i can start doing so
19:34:47 <lifeless> both -1's which the author (me) disagreed with.
19:35:08 <lifeless> I think cores need to review things with -1's on them, at least far enough to detect this sort of thing
19:35:32 <SpamapS> With the pipeline length we have... CI would have to be DAYS behind before it blocks a reviewer.
19:35:48 <lifeless> slagle: Here's my thought on the +2 thing - say you upload something, and two people +2 it then CI passes, I think the author could +A reasonably.
19:35:51 <SpamapS> I review everything, even if it has -2's
19:35:54 <lifeless> slagle: what do you think ?
19:36:25 <slagle> wfm
19:36:29 <lifeless> slagle: [without invoking any of our special clauses about jointly edited patches, CD features etc]
19:36:32 <lifeless> ok
19:36:35 <lifeless> so
19:36:48 <lifeless> #action pick up the game on reviews everyone. EVERYONE.
19:37:13 <lifeless> #info +2 is ok even when CI hasn't checked in yet.
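The review policy agreed above can be summarised as: a core may +2 on the code alone without waiting for CI, and the author may +A once two people have +2'd and CI has passed. A minimal sketch, assuming Code-Review votes arrive as plain integers; the function name is hypothetical, and the special clauses mentioned (jointly edited patches, CD features) are deliberately not modelled. A -2 is treated as blocking, as it is in Gerrit, while non-core -1s do not block by themselves, so cores still need to read through them.

```python
from typing import Iterable


def may_approve(code_review_votes: Iterable[int], ci_passed: bool) -> bool:
    """Author may +A once two +2s are in and CI has passed; any -2 blocks.

    Giving a +2 never waits on CI; only the +A (approval) does.
    """
    votes = list(code_review_votes)
    if -2 in votes:
        return False
    return votes.count(2) >= 2 and ci_passed


print(may_approve([2, 2], ci_passed=True))
print(may_approve([2, 2], ci_passed=False))
```

Keeping CI out of the +2 decision is what lets reviewers clear the queue even while check jobs lag behind.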
19:37:19 * SpamapS winces as the whip cracks
19:37:37 <lifeless> SpamapS: sadface, I was trying to avoid that framing of the problem.
19:37:43 <SpamapS> haha I'm kidding. :)
19:37:48 <lifeless> seriously, it's a jointly affected, jointly solved issue
19:37:52 <lifeless> we have to want it.
19:38:04 <SpamapS> Not reviewing means your own patches sit longer.
19:38:06 <SpamapS> I get it. :)
19:38:10 <lifeless> #topic  Projects needing releases
19:38:24 <rpodolyaka1> I can take. not a problem
19:38:36 <lifeless> awesome
19:38:43 <lifeless> #topic  CD Cloud status
19:38:55 <lifeless> dprince_: RH region status?
19:39:33 <dprince_> lifeless: still fleshing out NW access. More good progress today. I expect full access for everyone this week I think
19:39:40 <lifeless> awesome
19:39:45 <lifeless> HP region status:
19:40:01 <lifeless> we've got a bad node running the ci overcloud, which is a problem
19:40:17 <lifeless> may be as simple as a BIOS firmware upgrade though - firedrill card open in trello
19:41:05 <lifeless> we have about 25% capacity bad in some way, an HP tripleo-cd-admin needs to take ownership of going through all the machines listed bad, opening JIRA tickets and shepherding them through
19:41:32 <tchaypo> JIRA my old friend, we meet again
19:41:34 <lifeless> ng and spamaps have done a chunk of work here but it's not finished
19:42:08 <lifeless> I'd like to say any HP person, but reality is that without access to the machines to test etc it would be very hard.
19:42:31 <lifeless> the cd-undercloud network card seems to be glitching again; our old friend mellanox
19:42:36 <lifeless> that's about it
19:42:42 <lifeless> #topic CI
19:43:22 <lifeless> so we've had an exciting week
19:43:26 <lifeless> we're back with check jobs
19:43:29 <lifeless> and yesterday they were all passing
19:43:30 <lifeless> OMG
19:43:43 <rpodolyaka1> \o/
19:43:53 <lifeless> one thing that became clear last week, I want to reinforce here
19:44:23 <lifeless> tripleo-cd-admins - this is a production quality admin team: we're on the hook for treating CD-undercloud, CD-overcloud and CI-overcloud failures as production failures
19:44:29 <lifeless> we've got enough folk to do follow the sun
19:44:40 <lifeless> its a volunteer team
19:44:42 <lifeless> but
19:44:44 <lifeless> IMO
19:45:26 <lifeless> if you're in the team, it requires commitment - specifically if it breaks (e.g. if infra go 'wtf this region is down') we need to drop anything else and fix it
19:45:46 <lifeless> if that happens a lot, it's in our power to correct it (e.g. more HA, take servers out of rotation, fix bugs in the code)
19:46:13 <lifeless> So - I'm going to call a vote of the cd-admins folk here: does this all make sense
19:46:37 <lifeless> remember - tripleo-cloud/tripleo-cd-admins in incubator is the list of admins
19:46:44 <dprince_> lifeless: yes
19:47:02 <lifeless> #vote does cd-admins membership imply production quality responses to everyone?
19:47:16 <lifeless> erm, I hope that calls a vote :P
19:47:25 <SpamapS> +1
19:47:26 <lifeless> #startvote does cd-admins membership imply production quality responses to everyone?
19:47:26 <SpamapS> ?
19:47:27 <openstack> Begin voting on: does cd-admins membership imply production quality responses to everyone? Valid vote options are Yes, No.
19:47:29 <openstack> Vote using '#vote OPTION'. Only your last vote counts.
19:47:33 <greghaynes> Yes
19:47:34 <SpamapS> #vote Yes
19:47:36 <greghaynes> ah
19:47:37 <lifeless> #vote yes
19:47:38 <greghaynes> #vote Yes
19:47:39 <slagle> #vote Yes
19:47:46 <lifeless> greghaynes: you're not an admin, but thanks :P
19:47:46 <dprince_> lifeless: +1 (when I'm online I'll help)
19:47:48 <greghaynes> :p
19:47:54 <tchaypo> lifeless: I'm happy to volunteer for the team, but I don't think I can be very useful just yet
19:48:23 <slagle> i'm going to ask an embarrassing question though
19:48:24 <lifeless> tchaypo: read tripleo-cloud/README.md
19:48:35 <slagle> where are the ci clouds documented?
19:48:47 <slagle> we have TripleOCloud on the wiki
19:48:53 <slagle> that pertains only to the CD cloud does it not?
19:49:11 <lifeless> slagle: tripleo-cloud/README.md + https://wiki.openstack.org/wiki/TripleO/TripleOCloud (linked from the README) + https://wiki.openstack.org/wiki/TripleO/TripleOCloud/Regions (linked from the first wiki page)
19:49:27 <lifeless> slagle: we probably need more CI-overcloud docs!
19:49:37 <lifeless> slagle: plus the admin spreadsheet with network ranges, passwords etc.
19:49:45 <lifeless> #endvote
19:49:46 <openstack> Voted on "does cd-admins membership imply production quality responses to everyone?" Results are
19:49:52 <slagle> ah, ok the spreadsheet
19:50:01 <slagle> b/c i don't see any ci hostnames on that wiki page
19:50:04 <lifeless> slagle: the spreadsheet is linked from the wiki pages
19:50:27 <slagle> got it, will check it out
19:51:17 <lifeless> dprince_: ok, so on contacting us. I'd like to explore us sharing phone numbers to permit follow-the-sun handoffs even if folk are offline (e.g. I'm not going to keep poking at servers at 11pm when e.g. derekh or ng are awake and much more compos mentis)
19:51:39 <lifeless> dprince_: I'll send mail to the list for that though, I think it needs everyone to be involved - consensus discussion
19:51:57 <lifeless> #action lifeless to mail list about tripleo-cd-admins vote + contact-options topic
19:52:18 <lifeless> #topic open discussion
19:53:12 <tchaypo> I'm hoping to finagle a vpn token and cert from the office at rhodes today, at which point I'll be able to read my work email, which will be exciting
19:54:57 <jistr> lifeless: i didn't catch the discussion but i heard a rumor that you picked up extraction of overcloud init from devtest into a separate library that tuskar api could use. Is that right?
19:55:03 <SpamapS> tchaypo: woohoo :)
19:55:23 <lifeless> jistr: I've committed to bootstrapping that this week yes
19:55:49 <lifeless> jistr: long as the clouds we're running stay up
19:56:01 <lifeless> does anyone know how multi region keystone is meant to work
19:56:02 <lifeless> like
19:56:17 <lifeless> do we run two keystones each with the other's services registered and round-robin DNS ?
19:56:31 <lifeless> or do we run one globally distributed keystone ?
19:57:06 <lifeless> I mean - we'll have two underclouds soon, which should be separate, but the overcloud should present a single multi-region cloud to users, no?
19:57:19 <SpamapS> lifeless: I had always thought it was achieved using shared users/catalogs, but not tokens basically.
19:57:20 <jistr> lifeless: re the general stuff: sounds good, thanks. I just wanted to check on a rough ETA. (re keystone i don't know :) )
19:58:11 <jdob> lifeless: is there a kickoff of sorts on monday morning for the meetup?
19:58:16 <lifeless> jistr: I'm going to get a separate tree together, move stuff across into it untangling deps as needed, make it pip installable, and then say 'here, add what you need on top into this thing' :)
19:58:32 <lifeless> jdob: yes, on monday we run around going wtf are we going to be.
19:58:37 <jistr> lifeless: cool :)
