20:04:12 #startmeeting tripleo
20:04:13 Meeting started Mon Jul 15 20:04:12 2013 UTC. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:04:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:04:17 The meeting name has been set to 'tripleo'
20:04:27 sorry I'm late; had a non-sleeping baby night :(
20:04:45 lifeless: had one of those too. Pondering a post-meeting nap :)
20:05:26 mmm nap
20:05:35 o/
20:05:41 #topic agenda
20:05:42 bugs
20:05:43 Grizzly test rack status
20:05:43 CI virtualized testing progress
20:05:43 open discussion
20:05:49 #topic bugs
20:06:05 https://bugs.launchpad.net/tripleo/
20:06:19 #link https://bugs.launchpad.net/tripleo/
20:06:23 #link https://bugs.launchpad.net/diskimage-builder/
20:06:55 #link https://bugs.launchpad.net/os-refresh-config
20:07:10 #link https://bugs.launchpad.net/os-config-applier
20:07:11 as usual, my two bugs are both a bit lacking in context
20:07:35 SpamapS: is there a config-collector tracker now?
20:08:04 lifeless: no, I set it aside a bit to get my tripleo setup in order for testing it..
20:08:08 that was a week ago
20:08:10 have not booted a VM yet. :-/
20:08:27 ok. Would you like me to do the LP administrivia?
20:09:00 lifeless: yeah, I don't think I'm admin of the teams anyway
20:10:11 SpamapS: will do. (You don't need to be admin of them to make and hand off a project)
20:12:09 ok so
20:12:22 hmm, bug https://bugs.launchpad.net/tripleo/+bug/1182241
20:12:25 Launchpad bug 1182241 in tripleo "first-boot.d rules are running on every boot" [Critical,Triaged]
20:12:56 I think that's fixed; it should have been closed.
20:13:17 I moved the rules to orc scripts
20:13:29 and made them idempotent
20:13:46 we can't delete the first-boot feature yet
20:13:55 perhaps we should deprecate it though?
20:14:29 lifeless: indeed I think it may need to stay around with big ugly DEPRECATED warnings for a while .. since we seem to have some adoption now.
20:14:30 salv-orlando can't reproduce the quantum load issue
20:14:43 ayoung: why would having two delegated auth mechanisms be bad?
20:14:50 in bug 1184484
20:14:52 Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:14:58 stevemar2: this is a different meeting
20:15:08 lifeless, wrong window, sry
20:15:10 stevemar2, having a broken delegation mechanism would be bad
20:15:13 stevemar2: please use the dev channel for out-of-meeting chat
20:15:16 stevemar2: np, thanks!
20:15:48 stevemar2, and having two mechanisms is fine, but duplication in general leads to fixes in one needing to be made in the other as well
20:15:51 We may need to do a manual update of the control plane in the POC to help him reproduce, or perhaps we can trigger it in virt.
20:16:01 ayoung: hey, you too please! -> ~-meeting.
20:16:10 lifeless, sorry
20:16:31 bug 1199412
20:16:33 Launchpad bug 1199412 in tripleo "seed vm build fails during cinder service install" [Critical,Triaged] https://launchpad.net/bugs/1199412
20:17:07 That's fixed too, isn't it?
20:17:46 lifeless: yes
20:17:59 and so is bug 1199568
20:18:01 Launchpad bug 1199568 in tripleo "keystone service not running during wipe-openstack" [Critical,Triaged] https://launchpad.net/bugs/1199568
20:19:08 jog0: you've got docs somewhere on the fake virt driver for nova, right?
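The bug 1182241 fix described above - moving first-boot.d rules into os-refresh-config (orc) scripts and making them idempotent - boils down to refresh-safe shell. A minimal sketch of the pattern, with an illustrative script path and user (not the actual tripleo-image-elements change):

    #!/bin/bash
    # Illustrative os-refresh-config script, e.g. shipped by an element into
    # .../os-refresh-config/configure.d/20-create-stack-user
    # orc re-runs configure.d on every boot/refresh, so rather than relying
    # on "first boot", each step checks real system state before acting.
    set -eu

    if ! getent passwd stack >/dev/null; then
        useradd -m stack                   # only runs while the user is absent
    fi

    install -d -m 0755 /var/lib/example    # install -d is naturally idempotent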
20:19:26 lifeless: yeah
20:19:29 jog0: so that we can try booting 100 VMs at once without having a 100-VM-capable control plane
20:19:47 jog0: perhaps you'd like to try reproducing 1184484?
20:20:10 https://github.com/openstack-dev/devstack/commit/baf37ea81720982050eceea2b1b1e9bbdf6f0c94
20:20:29 jog0: just take an overcloud and change the virt driver on the compute node, then throw a big boot request at it
20:21:03 ok, that's all the crits I can see
20:21:06 lifeless: sounds good to me
20:21:16 anyone have high bugs they want to discuss?
20:22:09 nada, ok.
20:22:17 lifeless: getting a gate up?
20:22:25 not sure if that counts as a bug
20:22:27 https://bugs.launchpad.net/tripleo/+bug/1201056
20:22:29 Launchpad bug 1201056 in tripleo "init-nova requires internet access" [High,Triaged]
20:22:29 fix released, right?
20:22:36 SpamapS: yes plox
20:22:45 jog0: yeah, other business
20:22:53 #topic grizzly POC rack status
20:23:02 SpamapS: you were going to file some bugs about this?
20:23:08 I'm having problems right now with setup-baremetal ...
20:23:33 lifeless: drafting them now.
20:23:44 SpamapS: cool
20:24:02 #action SpamapS to finish drafting the bugs about long-term rack running
20:24:21 we had the control plane for the POC go offline mid last week
20:24:45 the good news is that 'nova boot' on the undercloud brought it right back.
20:25:28 The bad news is that I suspect the [I think fixed in nova trunk - devananda will know] 'oh look, IPMI didn't respond quickly, clearly the machine wants to be off' bug turned it off in the first place.
20:25:50 we haven't confirmed that via logs.
20:26:15 I'm not sure if we need to bother, since the only reason it was a fire drill was this being a non-HA setup.
20:26:27 thoughts?
20:27:01 I think it is worth confirming that is what happened.
20:27:13 It's a serious enough thing that we don't want to gloss over it because it is "likely".
20:27:29 Also, if we had better monitoring on our POC we'd have known sooner.
20:27:43 SpamapS: entirely agreed.
20:27:51 SpamapS: perhaps you could include a bug about both of those points.
20:28:00 SpamapS: in your drafts
20:28:44 https://bugs.launchpad.net/tripleo/+bugs?field.tag=poc
20:30:18 lifeless: there, I think now I have all of them
20:30:57 #link https://bugs.launchpad.net/tripleo/+bugs?field.tag=poc
20:31:16 ok
20:31:46 so I think we should treat these as indeed critical and move on them after the current crits are closed
20:31:50 which are SpamapS's two fuzzy ones
20:32:04 #topic CI virtualized testing progress
20:32:10 hello
20:32:15 pleia2: how goes it? We synced a little on the weekend
20:32:46 yeah, so that was helpful in understanding some of the networking stuff that I was tripping up on; now just nailing down the specific things I want to run for this testing
20:33:15 pleia2: is your kvm seed setup working - can you nova boot a bm node?
20:33:32 lifeless: unfortunately not :(
20:33:42 in the middle of backing out some changes
20:34:07 pleia2: perhaps we should pair up again after this meeting?
20:34:31 lifeless: yeah, that would be good (but lunch for me first)
20:34:41 pleia2: kk; ping me maybe.
20:34:45 #topic open discussion
20:34:45 will do
20:35:07 jog0: You wanted to talk CI gates
20:35:14 lifeless: I feel like we need to start pushing harder down the path toward gating.
20:35:19 https://review.openstack.org/#/c/30441/ fingers crossed again
20:35:59 There are enough people involved, and enough moving parts, that breaking stuff is worse than not moving forward at the highest possible speed.
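The reproduction lifeless sketches above for bug 1184484 - point an overcloud compute node at nova's fake virt driver, then throw a big boot request at it - would look roughly like this. Option, path, and service names are grizzly-era assumptions, and the image ID and instance names are placeholders:

    # On the overcloud compute node: fake out the hypervisor so instances
    # "boot" without consuming real resources. Assumes compute_driver is
    # already set in nova.conf; append the line otherwise.
    sudo sed -i 's|^compute_driver=.*|compute_driver=nova.virt.fake.FakeDriver|' \
        /etc/nova/nova.conf
    sudo service nova-compute restart

    # From a client with overcloud credentials: 100 boots at once.
    for i in $(seq 1 100); do
        nova boot --flavor m1.tiny --image "$IMAGE_ID" "pool-test-$i" &
    done
    wait
    nova list | grep -c ERROR   # requests that hit the pool overflow should surface here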
20:36:36 so we break for two reasons. Other projects. Our stuffups.
20:36:54 are there any small gating tests that can help that don't require my bit yet?
20:36:56 My sense is that in the last 2 weeks it's been about a 50-50 split
20:37:36 SpamapS: ++
20:37:36 to gate we need all our components mutually gated, at low risk of random breaks, or used via releases
20:38:27 uhm
20:38:49 what kinds of issues have been breaking trunk? Perhaps there is a smaller gate we can start with, as pleia2 suggested
20:38:53 e.g. diskimage-builder - to gate on that we need to get the ubuntu and fedora images it needs cached into the CI infrastructure so we're not dependent on internet access.
20:39:09 maybe just getting through DIB or something
20:39:22 for tie we need the git caching derekh is working on, and pip caching, which I put an etherpad up designing
20:39:37 jog0: so we do need that; but note that dib hasn't broken.
20:40:09 lifeless: I was referring to the image elements aspect, so what generally breaks.
20:40:14 we had a tie rule break (cinder builds), we had neutron break (the rename of the client), and then the quantum-server compatibility script broke
20:40:46 bah, spelling
20:40:58 anyhow, just to say - I'm totally +100 on CI
20:41:41 how much of that can be detected with just getting a seed-stack vm running?
20:41:58 the cinder rule break would have been detected
20:42:02 (and not any fake baremetal booting)
20:42:16 both neutron failures were silent until we tried to do stuff in anger
20:42:44 jog0: pleia2: so - we can indeed get some benefit from smaller gate checks.
20:43:00 is it possible to do seed-stack + tempest?
20:43:18 jog0: in principle, yes.
20:43:27 err, rather, would that help us
20:43:29 jog0: in the current gate I very much doubt it.
20:43:32 assuming we can get toci back in working order, is it possible to ask everyone to run it before checking in new changes? if there is a smaller set of tests folks can run, I would be in favor of that too
20:43:51 Another thing that might help is getting error reporting into the heat templates (via waitconditions + orc)
20:44:04 rwsu: what would be awesome would be if toci just subscribed to branches proposed to tie/the/dib/oac/orc/occ
20:44:16 lifeless: ++
20:45:12 so at the moment, the toci folk are carrying the CI burden in chase-mode for most of tripleo, and pleia2 is the only person working directly on gating infrastructure
20:45:25 pleia2 is working on /nova/ gating infrastructure atm
20:45:28 lifeless: good idea, it would be nice to have it report yea or nay in the review process
20:45:30 and I'm getting impacted by breakage too, so it's a bit slow going :(
20:45:30 because nova bm isn't gated
20:46:10 a consequence of that (that it will gate nova) is that it's going to have to be super reliable
20:46:31 which probably means stevebaker's packaging patch set, derekh's git cache stuff, and a pip cache
20:46:44 plus other ancillary changes will all be needed just to get that
20:47:07 I'm going to suggest that pushing straight at that target is better than picking small side gates
20:47:55 because I don't think small side gates will catch any breakage that actually impacts pleia2's work - she has been hitting all the quantum rename stuff, and also issues with running in lxc not kvm etc.
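On the "error reporting into the heat templates (via waitconditions + orc)" idea above: the instance side could be a final os-refresh-config script that signals a heat WaitConditionHandle. A sketch only - the metadata key and script placement are hypothetical, and the signal body follows the cfn-style wait-condition format that heat implements:

    #!/bin/bash
    # Hypothetical last configure.d script: if orc got this far, every earlier
    # configure.d script succeeded, so report SUCCESS to heat. A matching error
    # path would trap failures and PUT Status=FAILURE instead, making the stack
    # fail visibly rather than hang until the wait condition times out.
    set -eu
    HANDLE_URL=$(os-apply-config --key completion-handle --type raw)  # hypothetical key

    curl -s -X PUT -H 'Content-Type: application/json' --data-binary "{
        \"Status\": \"SUCCESS\", \"Reason\": \"configuration complete\",
        \"UniqueId\": \"$(hostname)\", \"Data\": \"-\"}" "$HANDLE_URL"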
20:48:23 opinion: we should make bugs that would prevent her gate being activated critical
20:48:37 because getting CI for us is now critical
20:48:54 lifeless: ++
20:49:37 we may have program status soon
20:49:50 if we do, we can ask for -infra help gating everything
20:49:53 which we can't at the moment
20:50:37 I'm going to be out of the office next week for OSCON (checking in, but solid testing+work will be hard) but I hope to have enough progress this week that I have some kind of dependency list of what we'll need in the gate (caches, etc)
20:50:55 is someone already working on the pip cache?
20:51:24 infra has a nice pip cache :)
20:51:51 rwsu: we have a design nutted out - http://etherpad.openstack.org/TripleO-pip-mirror
20:52:17 nice
20:52:47 hmm, last minutes
20:52:51 I followed up SpamapS's sprint idea
20:53:11 one thing is that a bunch of folk have said 'after the beta milestone please'
20:53:20 implicitly, just by 'this date is better'
20:53:38 how important is it that we be sprinting before the milestone (to get things in order for it)
20:53:48 vs that we be sprinting together (to get things together)
20:54:31 Hrm
20:54:52 Well, my thinking was to get together to push things into h3.
20:54:58 indeed
20:55:04 which is sept 5th
20:55:18 But if people would rather get together to hash out ideas for working on before the icehouse summit.. I would find value in that as well.
20:55:45 lifeless: why not something over sept 5th so we can both push to finish features and then switch to finding/triaging and fixing bugs?
20:56:09 (so we can make things work for the burners)
20:56:14 jog0: I have a conference 6/7/8 sept
20:56:14 I get the feeling that people not explicitly "working on tripleo" are constrained by their primary focus.
20:56:48 jog0: though like mordred I don't strictly need to be at the sprint, I'd really /like/ to be there.
20:57:19 jog0: aug 19th is early enough for most burners I think, at least for half the week.
20:57:28 lifeless: I think we'd be less productive without you
20:57:29 26th is the burn start
20:57:56 aug 19th doesn't work for me, but if it works for everyone else I will just have to skip it
20:58:01 And yeah, this is a short sprint.. 19/20 gives them 6 days of pre-burn prep time :)
20:58:29 what about sept 2/3?
20:58:31 lifeless: 19/20 is probably no good for either monty or me, FWIW
20:58:39 devananda: ack
20:58:57 devananda: mordred had indicated he could do mon, maybe tuesday
20:59:08 jog0: terrible for burners
20:59:09 maybe tuesday, but it would be pushing it
20:59:17 mon yes, maybe. tues is kinda driving day
20:59:18 jog0: they need a couple of weeks of decompression after the thing
20:59:31 right
20:59:31 maybe we should do this in http://www.doodle.com/
20:59:37 and I'm not available 8/27 - 9/9 .. (not burning.. ;)
20:59:38 doodle ++
20:59:39 out of time
20:59:44 I will take it to the list.
20:59:49 #endmeeting
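Appendix on the pip cache discussed near the end: whatever shape the TripleO-pip-mirror design in the etherpad takes, consuming a local mirror from image builds is typically a one-setting change on the client side. A sketch with a hypothetical mirror URL, not the etherpad's actual design:

    # Persistent: point pip at the local mirror instead of PyPI.
    mkdir -p ~/.pip
    cat > ~/.pip/pip.conf <<'EOF'
    [global]
    index-url = http://pip-mirror.example.com/simple/
    EOF

    # Or via the environment, for tools and scripts that shell out to pip:
    export PIP_INDEX_URL=http://pip-mirror.example.com/simple/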