14:00:23 <shardy> #startmeeting tripleo
14:00:28 <openstack> Meeting started Tue Nov  8 14:00:23 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:31 <d0ugal> o/
14:00:32 <openstack> The meeting name has been set to 'tripleo'
14:00:36 <shardy> hey all, who's around?
14:00:40 <skramaja> hello
14:00:44 <beagles> o/
14:00:44 <shadower> hey
14:00:56 <shardy> Please add any one-off items to https://etherpad.openstack.org/p/tripleo-meeting-items
14:00:59 <sshnaidm> o/
14:01:04 <d0ugal> Hi!
14:01:06 <bnemec> o/
14:01:15 <marios> \o
14:01:16 <trown> o/
14:01:29 <arxcruz> o/
14:01:34 <coolsvap> o/
14:01:50 <jpich> o/
14:01:54 <shardy> #topic agenda
14:01:54 <shardy> * one off agenda items
14:01:54 <shardy> * bugs
14:01:54 <shardy> * Projects releases or stable backports
14:01:54 <shardy> * CI
14:01:55 <mwhahaha> o/
14:01:57 <shardy> * Specs
14:01:59 <shardy> * open discussion
14:02:03 <ccamacho> hey!!! TripleOers!
14:02:26 <bandini> o/
14:02:40 <shardy> Ok then, hi all - lets get started!
14:02:45 <cdearborn> o/
14:02:52 <jokke_> o/
14:03:00 <shardy> I don't see any one-off items except the one I added re Ocata-1, which we can cover in the project releases standing item
14:03:16 <shardy> does anyone have anything to add before we get into the recurring items?
14:04:04 <shardy> Alright, could be a short meeting today then :)
14:04:16 <fultonj> o/
14:04:26 <bandini> w00t
14:04:30 <shardy> #info skipping one-off items as there aren't any
14:04:35 <shardy> #topic bugs
14:04:52 <shardy> #link https://bugs.launchpad.net/tripleo/
14:05:13 <shardy> So, it's been another bad week for CI impacting bugs, thanks to everyone for efforts resolving them
14:05:35 <shardy> https://bugs.launchpad.net/tripleo/+bug/1638908 is still unresolved AFAIK
14:05:35 <openstack> Launchpad bug 1638908 in tripleo-quickstart "Overcloud deployment fails in minimal configuration with ('Connection aborted.', BadStatusLine("''",))" [Undecided,In progress] - Assigned to Alfredo Moralejo (amoralej)
14:05:41 <shardy> has anyone had any luck locally reproducing that?
14:05:49 <shardy> I hit it once, but was then unable to reproduce
14:06:03 <bandini> not me, plan to retry tonight though
14:06:19 <shardy> there's a theory that increasing haproxy timeouts will help, but I'm not yet clear if that's the full story
14:06:19 <trown> I have never been able to reproduce that one outside of CI
14:07:31 <arxcruz> a long time ago, in one of my tests, I saw this, and it was because haproxy wasn't sending the proper headers
14:07:31 <trown> ya it only happens with ssl, so haproxy timeouts could help
14:07:54 <arxcruz> and as trown said, on ssl  only
14:08:08 <shardy> I hit it without ssl post'ing to swift locally, but that could have been a different issue which just caused the same low-level python cryptic error
14:08:20 <trown> there is a bigger issue in that bug though (not CI impacting, but user impacting) in that the logging is pretty awful
14:09:01 <shardy> yeah there's no swift logs at all, even with undercloud_debug = true
14:09:09 <shardy> so we can probably fix that at least
14:09:27 <shardy> Ok then, lets move on, but if anyone has any more clues please do update the bug, thanks!
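[Not part of the meeting log: for readers puzzling over the error discussed above, the sketch below shows how python-requests surfaces a BadStatusLine when a server, such as haproxy hitting a backend timeout, accepts a request and closes the connection without sending any response. This is an illustrative repro against a throwaway local socket, not the actual CI failure, and the exact exception text varies by Python version.]

```python
# Illustrative sketch only: reproduce the cryptic error from bug 1638908 by
# talking to a server that accepts a request and hangs up without replying,
# which is roughly what haproxy does when a server-side timeout fires.
import socket
import threading

import requests  # assumes python-requests is available


def hang_up_without_reply(listener):
    """Accept one connection, read the request, then close with no response."""
    conn, _ = listener.accept()
    conn.recv(4096)  # consume the request so the client finishes sending
    conn.close()     # close without writing any HTTP response


listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=hang_up_without_reply, args=(listener,)).start()

try:
    requests.get("http://127.0.0.1:%d/" % port, timeout=10)
except requests.exceptions.ConnectionError as exc:
    # On Python 2 this prints ('Connection aborted.', BadStatusLine("''",));
    # Python 3 reports RemoteDisconnected instead, but the root cause is the
    # same: the connection was closed before any status line came back.
    print(exc)
```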
14:10:34 <shardy> https://bugs.launchpad.net/tripleo/+bug/1604927 is another critical issue we don't seem to have a handle on yet
14:10:34 <openstack> Launchpad bug 1604927 in tripleo "Deployment times out due to nova-conductor never starting" [Critical,Triaged]
14:10:43 <shardy> bnemec: any further clues on that one?
14:11:29 <bnemec> shardy: I haven't actually seen that recently.  We could probably close it for now.
14:11:43 <shardy> bnemec: ack, please do if you're happy it's gone, thanks!
14:12:05 <shardy> Anyone else have bugs they want to highlight before we move on?
14:13:11 <ccamacho> shardy me
14:13:42 <ccamacho> related to HAproxy restarts on ControllerPrePuppet and ControllerPostPuppet
14:13:57 <ccamacho> https://bugs.launchpad.net/tripleo/+bug/1640175
14:13:57 <openstack> Launchpad bug 1640175 in tripleo "HAProxy doesn't load the new configuration never" [Undecided,In progress] - Assigned to Carlos Camacho (ccamacho)
14:14:13 <ccamacho> just wanted to ask for more info about whether this can impact upgrades
14:14:55 <ccamacho> bandini for HA and marios for upgrades, please, when you have some free cycles.
14:15:10 <sshnaidm> shardy, these two are also CI impacting: https://bugs.launchpad.net/tripleo/+bug/1639885  https://bugs.launchpad.net/tripleo/+bug/1639970
14:15:10 <openstack> Launchpad bug 1639885 in tripleo "CI: pingtest timeouts cause by performance issues (redis, swift, ceiliometer)" [High,Triaged]
14:15:11 <openstack> Launchpad bug 1639970 in tripleo "CI: cinder fails to allocate memory while creating volume for ping test tenant" [Critical,Confirmed]
14:15:42 <shardy> ccamacho: certainly looks like it may, can you please add more details to the bug then we can discuss further?
14:15:42 <bandini> ccamacho: I am almost done with an escalation. happy to sync up in a bit?
14:16:04 <ccamacho> shardy ack Ill add more details there
14:16:33 <ccamacho> thanks bandini!
14:16:41 <marios> ccamacho: sorry, chasing BZ, reading back
14:17:19 <shardy> Ok so bug #1639970 needs further investigation to see where/why we're using more memory
14:17:19 <openstack> bug 1639970 in tripleo "CI: cinder fails to allocate memory while creating volume for ping test tenant" [Critical,Confirmed] https://launchpad.net/bugs/1639970
14:17:41 <shardy> do we know if https://review.openstack.org/#/c/394548/ fixes bug 1639885 or if further work is needed?
14:17:41 <openstack> bug 1639885 in tripleo "CI: pingtest timeouts cause by performance issues (redis, swift, ceiliometer)" [High,Triaged] https://launchpad.net/bugs/1639885
14:18:12 <sshnaidm> shardy, no, it doesn't fix it
14:18:24 <sshnaidm> shardy, the performance issue is still there
14:18:27 <marios> ccamacho: ok i guess you will add more info there? seems very new, do we have a BZ for that? (we can take it offline after the meeting too). Haven't heard of anyone hitting that yet, but we should find out more (I mean for upgrades)
14:18:42 <shardy> sshnaidm: Ok, can you please add more details, as "various performance issues" isn't that actionable
14:18:46 <shardy> thanks :)
14:19:01 <ccamacho> marios ack, https://bugzilla.redhat.com/show_bug.cgi?id=1390962
14:19:01 <openstack> bugzilla.redhat.com bug 1390962 in rhel-osp-director "HAProxy doesn't load the new configuration after scaling out the role running the Openstack API services" [Urgent,Assigned] - Assigned to ccamacho
14:19:13 <marios> ccamacho: ty
14:19:19 <shardy> launchpad bugs here please ;)
14:19:37 <ccamacho> shardy upps
14:19:38 <shardy> Ok then, any further bugs or shall we continue?
14:20:25 <shardy> #topic Projects releases or stable backports
14:20:33 <marios> shardy: sorry my fault i asked for that. we do always link to LP from the BZ though where appropriate
14:20:49 <shardy> Ok, two things to discuss here, slagle is planning a stable/newton release tomorrow
14:21:12 <shardy> and we need to release ocata-1 next week (I'm happy to coordinate that unless anyone else wants to)
14:21:23 <shardy> slagle: what's the status of the newton release, are we in good shape for tomorrow?
14:21:31 <shardy> any outstanding backports need review attention?
14:21:47 <bandini> I'd like to add this one (only to newton, not to master) https://review.openstack.org/394980
14:21:52 <bandini> slagle, marios: ^
14:22:26 <marios> bandini: thanks (for the galera issue looks like)
14:22:36 <bandini> marios: totally not galera btw ;)
14:22:46 <marios> shardy: this is another one that was filed moments ago https://review.openstack.org/#/c/394968/ which we'll need into newton
14:23:34 <shardy> marios: ack
14:23:36 <marios> bandini: k :D well the symptom was galera at least
14:23:42 <marios> bandini: will check the review thanks
14:23:49 <jpich> This one as well, a missed parameter in the generated passwords: https://review.openstack.org/#/c/394493/
14:24:08 <bandini> marios: np ;)
14:25:09 <marios> shardy: another one here https://review.openstack.org/#/c/389830/
14:25:23 <slagle> shardy: everything for newton won't be merged by tomorrow
14:25:36 <slagle> we can always do another release though
14:25:40 <shardy> slagle: Yeah, I'm assuming we may need to do another one
14:25:54 <shardy> but we can try to push any passing CI patches in during the rest of today I guess
14:25:57 <trown> ya releases are fairly inexpensive
14:26:21 <slagle> yea so i'll request the release of what we've got tomorrow
14:26:43 <shardy> Ok then sounds like the newton release is under control, thanks!
14:26:49 <shardy> #link https://launchpad.net/tripleo/+milestone/ocata-1
14:26:57 <shardy> 168 bugs targeted :-O
14:27:29 <shardy> I'm going to start deferring all bugs later this week to ocata-2 unless they're assigned and high/critical priority
14:27:51 <shardy> we'll aim to cut the ocata-1 release next week (shall we say Wednesday again?)
14:28:04 <trown> wednesday seems good
14:28:14 <jpich> sounds reasonable
14:28:14 <shardy> jpich: what's the status of the tripleo-ui CI job?
14:28:26 <shardy> https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-basic-ci-check
14:28:29 <jpich> shardy: Last patch is ready for review
14:28:40 <jpich> https://review.openstack.org/#/c/390845/
14:28:51 <shardy> Ok, thanks, lets see if we can get that solitary blueprint landed this week then ;)
14:28:56 <jpich> :)
14:29:18 <shardy> Should we use a gerrit topic again to help focus reviews?
14:29:38 <shardy> e.g. if folks have release blocker bugs, and they're targeted to ocata-1, tag the patches with tripleo/ocata1 ?
14:29:48 <shardy> I found that helpful in the run-up to the newton release
14:29:57 <trown> +1 that is really helpful
14:30:15 <shardy> should probably be tripleo/ocata-1
14:30:19 <shardy> for consistency
14:31:21 <shardy> Ok then, well lets do that, but FWIW I'll probably prefer deferring things to ocata-2 wherever possible given the huge number of outstanding bugs
14:32:28 <shardy> Feel free to help by deferring bugs to ocata-2 if you think they can wait
14:32:58 <trown> I have been filing new bugs targeted at ocata-2 already
14:32:58 <shardy> Anything else related to releases before we continue?
14:33:24 <shardy> ++ yeah please don't target any new bugs to ocata-1 unless they're super critical
14:33:31 <shardy> I'll just write a script that defers them ;)
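[Not part of the meeting log: the deferral script shardy jokes about above might look roughly like the hedged sketch below. It is not an actual TripleO tool; the application name, milestone names, and the "keep assigned high/critical bugs" filter are assumptions based on what was said in the meeting, and the launchpadlib calls are the standard patterns for bulk-editing bug tasks.]

```python
# Hedged sketch (not an actual TripleO tool): bulk-defer tripleo bugs from
# the ocata-1 milestone to ocata-2 using launchpadlib.
from launchpadlib.launchpad import Launchpad

# "defer-bugs" is just an application name for the OAuth token; "production"
# targets the real Launchpad instance.
lp = Launchpad.login_with("defer-bugs", "production")

project = lp.projects["tripleo"]
ocata_1 = project.getMilestone(name="ocata-1")
ocata_2 = project.getMilestone(name="ocata-2")

for task in project.searchTasks(milestone=ocata_1):
    # Keep assigned high/critical bugs on ocata-1, as discussed above.
    if task.assignee and task.importance in ("High", "Critical"):
        continue
    print("Deferring: %s" % task.title)
    task.milestone = ocata_2
    task.lp_save()
```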
14:34:28 <shardy> #topic CI
14:34:45 <shardy> Ok who wants to give an update of the current status of CI?
14:35:06 <shardy> I know things are looking a lot more green now, and we've talked about a few remaining CI impacting bugs
14:35:53 <shardy> I'm interested to discuss how we can more effectively triage/assign CI related issues
14:36:24 <slagle> stop all other work
14:36:34 <d0ugal> :)
14:36:43 <bandini> I personally would love if we could dedicate a deep dive to CI. I try to help but I am often a little confused by the whole CI topic ;)
14:36:52 <shardy> slagle: so, that's one option - but is it efficient to have *everyone* stop to look at the same issue
14:37:06 <shardy> sometimes there are multiple issues, and often there are critical bugs which sit unassigned
14:37:30 <bandini> especially regarding the flow of fixes into rdo when the issue is not tht/tripleo specific
14:38:07 <slagle> shardy: yea, i think it is more efficient
14:38:10 <jokke_> I personally think that as long as the CI is as complex to debug as it is, an "everyone stop all other work" kind of approach is just a waste of time
14:39:09 <shardy> I think we need some way to avoid the same folks always fixing CI, but which doesn't result in a 50% efficiency hit on all development
14:39:19 <shardy> CI regressions happen almost every day lately
14:39:38 <bandini> yeah
14:40:06 <mwhahaha> we probably need to start going back and doing proper RCAs on blocking CI issues to see what happened
14:40:16 <slagle> part of the problem is that the people who do work on CI are consumed with just the firedrills/reactions
14:40:19 <mwhahaha> the big ones have been packaging issues that caused partial rather than complete failures
14:40:37 <slagle> so there is little time left to work on things like documenting it for others or making it less complex
14:40:52 <slagle> that's one of the ways that "stopping other work" would make us more efficient
14:41:04 <shardy> slagle: I agree, I'm just saying if 100% of the team are consumed by the same firedrills, that doesn't necessarily help
14:41:20 <jokke_> shardy: ++
14:41:32 <shardy> but it's definitely a problem we need to address
14:41:39 <slagle> shardy: yea, not saying we need everyone looking at the same issue at the same time
14:42:17 <slagle> just that if CI is down...and it's already being worked on, maybe take that as an opportunity to look into how to avoid similar failures in the future
14:42:27 <slagle> or improve something so that different people could help next time
14:42:51 <slagle> or document the issue for a wider understanding, etc
14:42:54 <shardy> slagle: cool, that was my initial interpretation of "stopping other work" - if we can figure out ways for more folks to help I'm 100% in favor of it of course
14:43:38 <shardy> Alright, perhaps we can continue this discussion on the ML as we'll run out of time here
14:43:49 <shardy> #topic specs
14:44:04 <shardy> So Emilien was talking about observing a spec freeze starting next week
14:44:10 <slagle> just a follow-up from last week, the major undercloud upgrades job is merged and non-voting now
14:44:17 <slagle> sorry, done with ci now :)
14:44:21 <bnemec> \o/
14:44:26 <jokke_> one thing that would be super helpful for getting to the level of being able to help (either on the firedrills or preventatively) would be some kind of summary of what caused it, how it was found and how it was fixed
14:45:05 <shardy> We've got a bunch of open spec reviews, please help with reviews, and I think we should start landing things which look in good shape with at least a couple of +2s
14:45:26 <shardy> slagle: good news :)
14:45:31 <skramaja> I have added a BP - https://blueprints.launchpad.net/tripleo/+spec/tuned-nfv-dpdk for DPDK performance. Things are not clear yet and we are working with the performance team on it, but we don't want to miss the ocata cycle freeze, so I'm raising the BP now
14:45:53 <shadower> I've written a couple of validation-related specs that could use attention: https://review.openstack.org/#/c/393281/ and https://review.openstack.org/#/c/393775/
14:45:59 <shadower> (they affect tripleo-common)
14:46:10 <shardy> jokke_: yeah, I think that's what mwhahaha was suggesting with the RCA comment, in theory that info should be in the bug report, but often it isn't
14:46:49 <shardy> #action everyone to review all-the-specs ahead of proposed spec freeze
14:47:06 <skramaja> shardy: we will have more clarity in the coming week.
14:47:15 <fultonj> review all-the-specs++
14:47:37 <shardy> Ok lets try to land as much as possible then re-assess in next weeks meeting
14:47:44 <shardy> thanks all
14:47:57 <shardy> #topic Open Discussion
14:48:27 <shardy> 12 minutes to discuss other things (or continue to debate CI if you wish ;)
14:48:30 <mwhahaha> I wanted to mention something about bugs: if you spot something wrong, it would be helpful to create a bug. You don't have to fix it, but it gives other people a chance to work on it and lets them know there's an issue
14:48:40 <jpich> +1
14:48:44 <mwhahaha> i've noticed many times people will create a bug right before proposing a patch
14:48:58 <jpich> if at all :)
14:49:03 <shardy> +1 that should already be happening in theory but a good reminder mwhahaha thanks
14:49:25 <mwhahaha> just a friendly reminder :)
14:49:31 <shardy> :)
14:49:43 <shardy> Anyone have anything else to raise?
14:49:43 <bnemec> Example: I opened an ipv6 bug yesterday and beagles fixed it before I could. :-)
14:49:51 <shadower> I'd like to point out that there's a bunch of open validation bugs free to take :-)
14:50:22 <bandini> mwhahaha: +1
14:52:00 <shardy> Ok, waiting 1 minute before declaring meeting complete, anything else before we wrap things up?
14:53:03 <shardy> thanks all!
14:53:08 <shardy> #endmeeting