19:01:48 #startmeeting tripleo
19:01:49 Meeting started Tue Oct 15 19:01:48 2013 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:52 The meeting name has been set to 'tripleo'
19:02:13 #topic agenda
19:02:20 bugs
19:02:20 reviews
19:02:20 Projects needing releases
19:02:20 CD Cloud status
19:02:20 CI virtualized testing progress
19:02:22 Discuss latest comments on LXC+ISCSI bug: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855
19:02:25 Insert one-off agenda items here
19:02:28 MVP1 retrospective TripleO/TripleOCloud/MVP1Retrospective
19:02:30 MVP2 retrospective TripleO/TripleOCloud/MVP2Retrospective
19:02:33 open discussion
19:02:38 lots of topics today, I'm going to be moving fast, so if you have something to add in a section, don't wait
19:02:44 #topic bugs
19:02:50 #link https://bugs.launchpad.net/tripleo/
19:02:51 #link https://bugs.launchpad.net/diskimage-builder/
19:02:51 #link https://bugs.launchpad.net/os-refresh-config
19:02:51 #link https://bugs.launchpad.net/os-apply-config
19:02:51 #link https://bugs.launchpad.net/os-collect-config
19:02:53 #link https://bugs.launchpad.net/tuskar
19:02:55 #link https://bugs.launchpad.net/tuskar-ui
19:02:58 #link https://bugs.launchpad.net/python-tuskarclient
19:03:57 only 1 critical that I see
19:04:22 and that one is fix committed
19:04:42 bunch of untriaged on tripleo.
19:04:48 And the sad thing is they were filed by ATCs
19:05:18 Any thoughts on how we can get folk to file bugs triaged?
19:05:35 so should I be able to triage? I'm guessing there should be an edit button beside importance
19:06:10 do we necessarily want that? Some people think it's better to capture a bug in any state than to hold off on capturing it because it's not fleshed out enough to be triaged
19:06:10 derekh: join ~tripleo on Launchpad, and then you can. And when you file there is an 'other settings' or 'more' or something which lets you set it during creation.
19:06:22 Ng: triaged doesn't imply fleshed out.
19:06:37 lifeless: I believe we can control the "what to put in the bug" message and the "thanks for filing a bug" message, so we can remind people to self-triage in there.
19:06:37 lifeless: ahh, that explains it
19:06:50 SpamapS: good idea
19:07:02 I'll look at doing that right now
19:07:04 #action SpamapS to customise the reporting guidelines to encourage self-triage
19:07:12 lifeless: shouldn't it mean that we at least understand enough about the bug to be able to decide its priority and start fixing it?
19:07:21 Ng: yes and no, in that order.
19:07:55 hmm, ok. Not how I've used Triaged in the past, but I'm not opposed to it having that meaning :)
19:08:07 (for tripleo at least :)
19:08:08 Ng: think of it in the original context
19:08:12 Ng: will the patient die?
19:08:25 * Ng nods
19:08:30 Ng: quick snap assessment. Then move on.
19:08:51 "Only the team's administrators can invite a user to be a member"
19:08:54 We can and will tune bugs further, but having them completely uncategorised just means that I end up triaging them.
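
(For anyone wanting to check the untriaged backlog discussed above, a minimal sketch using launchpadlib; the consumer name is arbitrary, and 'New' is the state Launchpad assigns to bugs that have not yet been triaged:)

    from launchpadlib.launchpad import Launchpad

    # Anonymous login is enough for read-only triage queries.
    lp = Launchpad.login_anonymously('tripleo-triage-check', 'production')
    tripleo = lp.projects['tripleo']
    # 'New' is Launchpad's untriaged state; setting Importance moves a bug to Triaged.
    for task in tripleo.searchTasks(status='New'):
        print('%s: %s' % (task.bug.id, task.bug.title))
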
19:08:59 derekh: nuts, let me fix that
19:09:38 set to moderated
19:09:40 derekh: ^
19:10:16 * derekh joins ~tripleo
19:10:23 * rpodolyaka1 too
19:10:47 Ng: the consequence of setting a high bar for triage is that you get a massive pool of bugs noone is willing to touch because they aren't clear/complete/reproducible enough, but some may well be affecting everybody
19:11:09 Ng: clearing up a bug enough to work on it is something you only want to do *when you want to work on it*
19:11:33 and unless you are running short of bugs, that's never the problem. The problem is 'which bugs are most useful to fix' :)
19:11:47 fair points, all
19:13:06 #action lifeless to summarise this + team joining and mail to the list
19:13:40 #topic reviews
19:13:47 http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:13:50 http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:13:53 http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt
19:14:50 Median wait time: 1 days, 0 hours, 21 minutes
19:14:51 3rd quartile wait time: 3 days, 11 hours, 41 minutes
19:15:03 We're losing ground
19:15:12 that's under:
19:15:13 Stats since the last revision without -1 or -2 (ignoring jenkins):
19:15:39 Longest waiting reviews (based on oldest rev without nack, ignoring jenkins):
19:15:42 7 days, 12 hours, 29 minutes https://review.openstack.org/50010 (Fix a couple of default config values)
19:16:14 lifeless: the MVP push has narrowed our review focus and we're glossing over rather than driving to 0, I think.
19:16:15 Is everyone in core using the link on https://wiki.openstack.org/wiki/TripleO#Review_team ?
19:16:44 SpamapS: most of the tuskar oriented folk are -core
19:17:06 SpamapS: and that 7 days one hadn't been +2'd or -1'd; I've +2'd it now.
19:17:13 I am using the link, but I am still ignoring tuskar reviews because I still haven't poked at it
19:17:39 Ng: can I suggest you give +1's and -1's at least?
19:17:42 Was not using the link, but it is not far off from my "watched changes" minus heat.
19:17:52 lifeless: you can, and I will do so
19:18:29 I am reviewing only tuskar, for now
19:18:30 lifeless: if you were to knock https://review.openstack.org/#/c/50431/2 in the affirmatory, that would clear two of our oldest reviews :)
19:18:34 so, SpamapS, you raise an interesting point, at least for the folk doing tripleo-cd MVPs.
19:18:55 which is: do we keep driving reviews to 0?
19:19:33 SpamapS: I think we should. The kanban provides focus, but it doesn't replace our responsibility for things we've already shipped, or for team maintenance and growth
19:19:34 I think we should still devote time to it. We just may not devote as much time to it.
19:20:11 SpamapS: so, when we sketched this, we said:
19:20:14 unblock bottlenecks first, then unblock everyone else.
19:20:15 folk are still self-directed - it's open source - but clear
19:20:25 SpamapS: I think doing reviews counts as unblocking everyone else
19:20:47 SpamapS: so my personal view on this is that we should review all firedrill reviews first
19:20:51 then all CD related reviews
19:20:56 then all reviews in the project
19:21:02 with a drive-to-0 mentality
19:21:37 +1 from me on that plan
19:21:50 lsmola_: I think it's very important that tuskar folk really understand what's going on in the rest of tripleo - what's holding you back from reviewing across the full set of projects?
19:21:53 I agree with the order, just worry we've been a little too aggressive, letting regressions creep in
19:22:09 SpamapS: It means less personal bandwidth for CD/MVP contributions, but more team bandwidth.
19:22:45 derekh: agreed on that as well, though I think the regressions have been contained _mostly_ to tripleo-incubator
19:22:46 SpamapS: If the team as a whole doesn't dig in and share the review load, then we can revisit this.
19:22:57 lifeless, well I have been trying devtest the last two weeks, so now I am still gaining confidence :-)
19:23:08 i'm fine with the review priority, as i still feel the average review time is pretty good
19:23:31 derekh: and overall, we have a more tested solution because we have been aggressive and pushed to have a real live cloud deploying all the time.
19:23:33 *proposed review priority, i mean
19:24:24 lifeless, most of us will be testing stuff we are preparing for Hong Kong in like the next 3 weeks, so we might not have many new patches or reviews
19:24:38 derekh: it's true, there have been regressions. As we work down the stack towards CI, I think that will get solved
19:24:44 SpamapS: yup, true, but we've left behind the seed and undercloud, so we're only catching things at the top of the stack (which is better than nothing)
19:24:58 lsmola_: that's fine, but reviewers have to scale as contributors do
19:25:01 yup, more CI will solve the problem
19:25:26 lsmola_: if you only review a thin slice, the contribution becomes asymmetric.
19:25:54 lsmola_: an explicit goal for me is to see as many of the tuskar folk remain tripleo-core as possible, but that means:
19:26:09 derekh: yeah, we went from a broad comprehensive defense strategy to a tactical surgical strike strategy. :)
19:26:16 - contributing as -core at a sustained rate (e.g. a review a day minimum)
19:26:29 - and learning about the other components (by participating in reviews)
19:26:44 lifeless, true that, though right now I could only do code review, i.e. checking good programming style, because I don't know how to test most of the stuff, but I am getting into it
19:26:59 * jistr will try to take off the blinkers focused on tuskarclient and do reviews elsewhere too
19:27:23 lsmola_: yeah, that's my concern too. I don't know how to test the other stuff properly.
19:27:37 but i'll try to start at least with +/-1
19:27:46 lsmola_: please do - even if you can only comment on surface aspects you'll see a) other reviewers' comments and b) the code and the problems it's trying to solve.
19:27:51 There's an interesting parallel to the Hyper-V Nova discussion here.
19:28:09 jistr, that's my concern as well - and coming from a tuskar-ui background, so far my interactions have mostly been limited to tuskar-ui/tuskar
19:28:22 Narrow review focus vs. the project as a whole.
19:28:27 lifeless, ok then
19:28:48 bnemec: yes, there is.
19:29:14 We had something similar in Heat too with rackspace specific resource plugins.
19:29:17 however we have at least one order of magnitude less code to deal with in direct terms.
19:29:54 Yeah, absolutely. And review turnaround in tripleo seems to be much faster IME.
19:30:08 bnemec: that's deliberate :).
19:30:11 Which addresses a lot of the complaints the hyperv people have.
19:30:38 bnemec: review turnaround is crucial. I think nova needs to fix that, but I don't have a trivial 'do X' to fix it.
19:30:43 anyhow...
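
(The wait-time figures quoted under the reviews topic come from russellbryant's published stats pages linked above. A rough, unofficial sketch of pulling similar raw data straight from the Gerrit REST API might look like the following; the project queried is just an example:)

    import json
    import requests

    # Query open changes; Gerrit prefixes every JSON response with ")]}'"
    # as an XSSI guard, hence the slicing below.
    resp = requests.get('https://review.openstack.org/changes/',
                        params={'q': 'status:open project:openstack/tripleo-incubator',
                                'n': 50})
    changes = json.loads(resp.text[4:])
    # Oldest-updated open changes first, roughly matching the 'longest waiting' list.
    for change in sorted(changes, key=lambda c: c['updated'])[:10]:
        print('%s  %s' % (change['updated'], change['subject']))
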
19:30:57 #action lifeless to summarise review meta to the list as well
19:31:48 #topic Projects needing releases
19:32:04 SpamapS: you were going to do releases of $everything; I'm guessing the MVP work sidetracked that?
19:32:31 lifeless: and my own scatterbrain.
19:32:45 lifeless: will put it higher on the list
19:33:16 SpamapS: how about we ask someone not visibly tackling the MVP work to do it?
19:33:50 do we have a volunteer? Goal is to a) make sure all projects other than -incubator are correctly configured in infra/config + their own code to /do/ releases.
19:33:53 lifeless: That would not bother me, but I am not worried that I won't complete the task.
19:34:07 SpamapS: you're a bottleneck on heat.
19:34:21 I know
19:34:28 goal b) is then to do a release of everything with pending commits
19:34:33 so yeah, it seems to make more sense to take tasks away from me
19:34:35 I'm happy to look at release stuff. I'll need some handholding or docs, having not done any openstack releases before
19:34:42 Ng: ok
19:34:51 Ng: mordred and clarkb can get you through it.
19:34:59 aroo?
19:35:03 #action Ng to ensure all non-incubator projects can release, and get a release done of any which have unreleased code.
19:35:12 #topic CD Cloud status
19:35:24 Ng: make sure you have a gpg key
19:35:38 We're now deploying a publicly accessible cloud with admins and users set up.
19:35:45 mordred: Ng: -> -infra please
19:36:12 It's deploying quite reliably; < 3% failure, and AFAIK we manually caused all of those.
19:36:20 So this is big!
19:37:08 lifeless: well, we don't actually know if what we deploy works fully, do we?
19:37:11 We're now working on preserving state (with downtime) so that folk don't have to reupload images etc.
19:37:12 aka no tempest yet
19:37:26 jog0: we spawn a VM and assign a floating IP
19:37:46 jog0: that's not perfect, but pretty indicative.
19:38:06 * derekh is working on tempest, got sidetracked, will refocus again
19:38:08 if you're a TripleO contributor, please do get your accounts set up
19:38:09 lifeless: nice, didn't know we were running that
19:38:27 see https://wiki.openstack.org/wiki/TripleO/TripleOCloud for instructions
19:38:37 jog0: end of devtest_overcloud.sh :)
19:38:58 #topic CI virtualized testing progress
19:39:09 pleia2: is the lxc bug thing a stale subtopic?
19:39:26 lifeless: it is, I'll remove it
19:39:42 ok, so how goes it?
19:39:52 I have a local nodepool running now to debug the issues we ran into last week with iteration 1
19:40:13 so hopefully that will be sorted soon so we can get that completed
19:40:31 #link https://etherpad.openstack.org/p/tripleo-test-cluster
19:40:33 for reference
19:40:44 cool
19:40:49 how is iteration two shaping up?
19:40:55 and based on discussion with lifeless last week I have some preliminary work done on iteration 2, but nothing shareable yet
19:41:10 ok, cool.
19:41:17 #topic MVP1 retrospective TripleO/TripleOCloud/MVP1Retrospective
19:41:33 http://finding-marbles.com/retr-o-mat/what-is-a-retrospective/ if you aren't aware of retrospectives
19:42:01 The goal here is to gather a shared pool of information about what happened during MVP1
19:42:17 and from that jointly decide what to do to make things better
19:42:38 so -
19:42:56 What do you guys think went well? Both outcomes and just actions
19:43:16 I think we learned a lot.
19:43:20 this is brainstorm mode, so please no critique of what people say
19:43:42 So if an MVP is supposed to teach you, then it was a success in that light.
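
(The deploy check mentioned under CD Cloud status above, "we spawn a VM and assign a floating IP" at the end of devtest_overcloud.sh, amounts to booting a server and attaching a floating IP. A rough Python equivalent, assuming the 2013-era novaclient v1_1 bindings, with all credentials and IDs as placeholders:)

    import time
    from novaclient.v1_1 import client

    # Placeholder credentials and IDs; fill in real values for the overcloud.
    nova = client.Client('admin', 'PASSWORD', 'admin',
                         'http://overcloud.example.com:5000/v2.0')
    server = nova.servers.create('smoke-test', image='IMAGE-UUID', flavor='FLAVOR-ID')
    # Poll until the instance leaves the BUILD state.
    while nova.servers.get(server.id).status == 'BUILD':
        time.sleep(10)
    assert nova.servers.get(server.id).status == 'ACTIVE'
    ip = nova.floating_ips.create()   # allocate from the default pool
    server.add_floating_ip(ip)        # one would then ping/ssh ip.ip to verify
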
19:43:53 cool
19:44:22 I thought we did a good job on firedrills
19:44:26 I think we delivered very quickly; it's immensely cool, only a week into the experiment, to have a cloud folk can use, deployed using our tooling
19:44:39 specifically, we learned that the tools we have built around the core work as expected on a real network
19:45:00 jog0: anything specific that stood out about the firedrills?
19:45:29 lifeless: we fixed them fairly quickly
19:45:33 and all helped out
19:45:51 ok
19:46:05 What went poorly? Both outcomes and actions
19:46:07 ?
19:46:17 define firedrill - cd-overcloud not working?
19:46:26 I think we also learned how to be a little more continuous in our own development.. handing things off.. communicating across sunsets.. etc.
19:46:41 derekh: a regression in anything we've delivered -
19:46:46 derekh: and are maintaining
19:47:06 SpamapS: the sun never sets on TripleO development
19:47:08 I think we need a list of those things at some point
19:47:18 lifeless: ++
19:47:39 so I guess I learned I need to file more bugs; as far as I know there are 2 reasons devtest is currently busted
19:47:51 I guess a critical bug would speed up reviews
19:48:01 so one thing that went poorly is that we regressed devtest
19:48:17 and seemed to ignore toci
19:49:00 yup, toci has been failing since yesterday http://54.228.118.193/toci/
19:49:58 I think we spent a lot of time fiddling directly on the cloud
19:50:04 rather than landing reviews to do things
19:50:09 derekh: do we track bugs for toci in the tripleo project?
19:50:22 at one point there were 15 or so commits live on the cloud that hadn't been reviewed.
19:50:42 SpamapS: nope, we don't
19:50:48 lifeless: that's interesting. From my perspective we did a good job of making sure things were at least in reviews while fiddling. I never felt that the backlog got out of hand.
19:51:12 SpamapS: we want CICD, which means all deploys in prod are from trunk, not from ahead-of-trunk
19:51:13 derekh: Ok, because I was going to suggest that toci grow some instructions on how to debug failures. I see the fails but I don't always know what failed or how to tell.
19:51:15 SpamapS: we're cheating
19:51:21 but it's usually a regression in one of the other projects that causes the failure
19:51:29 fwiw I deliberately did very little directly on the CD itself and focussed very hard on getting as much reviewing done as I could, mainly because lifeless was spitting dozens of the things out each day
19:51:34 lifeless: doh.
19:51:34 ok, let's switch to 'what can we do better' now
19:51:51 SpamapS: I'll put something together
19:52:11 Ng: thank you :)
19:52:13 lifeless: so that suggests that we need to have a playground for developers so they aren't tempted to fiddle on live.
19:52:27 SpamapS: no, you misunderstand
19:52:35 SpamapS: reviews need to be down to 15-20m latency
19:52:42 SpamapS: to keep the velocity up
19:53:03 SpamapS: well, maybe a playground would be good too!
19:53:17 * SpamapS will have to revisit his expectations so 15-20m latency gets out of the "not likely to succeed" bucket in his head.
19:53:19 SpamapS: but e.g. I deployed a test node without interfering with cd-overcloud to test things.
19:53:21 lifeless: where does gating fit in this?
19:53:34 jog0: gating helps avoid regressions
19:53:39 so, things to change
19:53:45 lifeless: there's no way to test turning file injection off without breaking the cloud and/or having a second cloud.
19:54:07 SpamapS: right, fortunately you can deploy a second undercloud alongside the first
19:54:16 SpamapS: and now that we're using neutron, the dhcp won't fight.
19:54:40 SpamapS: however! I'm not whinging about that sort of explicit manual test
19:54:44 sure, it sounds like we just need to chop off a few machines for things like that.
19:54:56 SpamapS: I'm talking about the system sitting there running in 'review -d X' mode
19:55:17 lifeless: any way we can gate devtest? that would act like a playground for devs
19:55:26 push a patch, it runs in a sandbox and logs are collected
19:55:29 jog0: pleia2 is working up to that.
19:55:30 just like infra
19:55:32 lifeless: woot
19:55:42 lifeless: yeah, with 7 people (is that right?) spread out across many hours of timezones, we're not going to get 15-20m latency on such things. Something would have to change.
19:55:48 so one thing we could do is say CI is a much higher priority, and do that before stateful.
19:55:51 turns out infra needs some tweaking each time we add a new provider :)
19:55:51 I think that's a good way to remove the need for human access
19:56:09 jog0: I think it's important, but not sufficient
19:56:26 jog0: the thing is that if there is a lot of inventory, people stop collaborating.
19:56:37 jog0: the place people collaborate is the shared code - what's in trunk.
19:56:53 jog0: I'm not saying where they *should*, but where they *do*, by observation.
19:57:18 so - brainstorm time
19:57:23 lifeless: I am not sure I follow
19:57:27 suggestions - one-liners - for doing things better over the next week
19:57:44 So this is all good information. I'm struggling to see how any of it will help us keep velocity up without cherry-pick cheating.
19:58:09 SpamapS: ++
19:58:34 change CD related reviews to only need two +2's, not 'two +2's from non-the-author'
19:58:55 I also don't think even 0 latency would enable "I need to screw with X for a while to see how it works."
19:59:09 SpamapS: like I said, I'm fine with *that*
19:59:39 SpamapS: I'm not talking about removing sysadmin access, I'm talking about making the standard loop 'propose a review, get it landed, it deploys automatically'
20:00:12 I do agree that lower latency CD reviews will make us all more comfortable running in a mode where logging into the prod undercloud is basically seen as a massive team failure.
20:00:14 SpamapS: which means that everyone can /see/ what's running at any point in time, reducing hidden state
20:00:29 I'd be ok with CD related reviews allowing a +2 from the author, although ideally with an "I'm +2ing this because I just deployed it and it works" ;)
20:00:46 we have hit EOMeeting tho
20:00:49 we're right on the edge of running out of time
20:00:51 lifeless: it sounds like you are saying that instead of doing: try code on metal, merge, deploy trunk; we do: merge, deploy trunk
20:01:01 but while we have folk here, let's try to get this answered
20:01:19 So I suggest we discuss on the ML and in IRC over the next few days.
20:02:15 there isn't anyone after us.
20:02:18 as we get further along in our CD cloud story, I assume we will soon get to the point where we do not want to risk any potentially breaking changes from landing
20:02:36 jog0: that's the CI value proposition, yes.
20:02:45 jog0: but also the small-changes, no-inventory aspect.
20:03:00 define inventory
20:03:02 jog0: and having heat self-detect and roll back, and things like that
20:03:08 jog0: code that is written but not delivered.
20:03:29 jog0: e.g. sitting in review, or in trunk but not released.
20:03:35 lifeless: ahh, I was thinking hardware inventory; that makes more sense
20:03:52 inventory is something you have built but are not benefiting from
20:03:58 it's a cost
20:04:30 who else is still here? derekh? dprince? GheRivero?
20:04:39 * SpamapS is here
20:04:43 * Ng
20:04:45 i'm here :)
20:04:48 * jog0 here
20:04:57 * rpodolyaka1 here
20:04:58 dprince: leaving in a moment.
20:05:01 me too
20:05:04 * jprovazn here
20:05:05 here
20:05:07 before you go
20:05:13 * marios
20:05:23 any thoughts on "change CD related reviews to only needing two +2's, not 'two +2's from non-the-author'"?
20:05:51 It was the only item proposed in the brainstorm :(
20:06:08 We probably need a retrospective retrospective ;)
20:06:11 can we reasonably require the author to assert that their +2 is based on the change having been tested?
20:06:11 lifeless: fine by me, can re-evaluate later if it doesn't prove useful.
20:06:28 Ng: yes
20:06:31 to me, that just means "one +2" :)
20:06:34 Ng: +1 good idea, +2 (I manually tested this)
20:06:35 Ng: I think so
20:06:37 Ng: +1
20:06:45 sounds ok to me, maybe we say 1 +2 from a non-author
20:06:52 even requiring 2 +2's, i think we expect folks to test first
20:07:02 although, realistically, i know that can't always happen
20:07:11 * dprince for the record I've always loved sending my own code in
20:07:20 I think two +2's is useful because it's saying two sets of experienced eyeballs.
20:07:37 dprince: you're in core, I believe :)
20:08:02 ok, so no objections;
20:08:14 #action lifeless to report CD review tweak to the list
20:08:28 So, guidelines attached: +2 your own patch if you believe it is a) extremely straightforward, and b) you have a high degree of confidence it improves the situation of the CD environment.
20:08:30 oh, I had one more suggestion
20:08:32 no objections
20:09:00 If anyone finds a problem with something we're supporting (as opposed to building out), they should open a firedrill card in kanban
20:09:05 (or get someone to open one)
20:09:13 I agree with SpamapS's guidelines
20:09:20 I'm not sure if this is a suggestion, but just now I looked at trello to remind myself what had been done in MVP1, and I don't think we keep separate cards for tasks that have been completed for a specific MVP, which makes doing a retrospective (at least for those of us with poor memories) harder :)
20:09:27 SpamapS, +1
20:09:40 Ng: I think we kinda did a big retrospective of the last week
20:09:45 Ng: perhaps we should archive all those cards now?
20:09:51 SpamapS, or +2 I guess :-)
20:09:54 :)
20:10:02 lifeless: that's a good idea
20:10:06 Ng: or are you suggesting a column for mvpN-done + generic-done?
20:10:27 lifeless: well, I was thinking mvpN-done so I could look at exactly what was part of a given mvp for retrospecting
20:10:36 * SpamapS would like to see the board duplicated and frozen whenever a milestone is reached, as that would be ideal for a retrospective
20:10:42 but if we archive things as we've retrospected them, it's clear what is relevant for the next retrospective
20:10:54 I don't know what will work better.
20:11:06 Ok, this doesn't seem nearly as important as the other question
20:11:10 and we've gone well over the hour.
20:11:16 yup, thanks for your patience
20:11:19 #endmeeting