20:00:21 <lifeless> #startmeeting tripleo
20:00:22 <openstack> Meeting started Mon Jun 3 20:00:21 2013 UTC. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:25 <openstack> The meeting name has been set to 'tripleo'
20:00:48 <lifeless> #agenda
20:00:51 <lifeless> bugs
20:00:51 <lifeless> Grizzly test rack progress
20:00:51 <lifeless> CI virtualized testing progress
20:00:51 <lifeless> open discussion
20:00:52 <lifeless> bah
20:00:57 <lifeless> #topic agenda
20:01:01 <lifeless> bugs
20:01:01 <lifeless> Grizzly test rack progress
20:01:01 <lifeless> CI virtualized testing progress
20:01:01 <lifeless> open discussion
20:01:10 <lifeless> #topic bugs
20:01:26 <lifeless> https://bugs.launchpad.net/tripleo/
20:01:30 <lifeless> sigh.
20:01:32 <lifeless> #link https://bugs.launchpad.net/tripleo/
20:02:15 <SpamapS> o/
20:02:20 <lifeless> 10 criticals
20:02:37 <lifeless> 4 in progress
20:02:53 <lifeless> SpamapS: am I wrong, or do you have 1182249 too ?
20:03:11 <lifeless> and 1182732 and 1182737 ?
20:03:24 <SpamapS> checking
20:03:35 <lifeless> and 1183442 ? :)
20:03:45 <SpamapS> I believe 1182249 yes
20:04:08 <Ng> lifeless: which bug? :)
20:04:19 <lifeless> #action lifeless https://bugs.launchpad.net/tripleo/+bug/1184484 I will add it to the discussion about defaults on the -dev list.
20:04:20 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged]
20:04:23 <lifeless> Ng: iLO
20:04:24 <SpamapS> Not sure if my patches in review handle 1182732
20:04:32 <lifeless> Ng: https://bugs.launchpad.net/tripleo/
20:05:23 <SpamapS> I do have 1183442
20:06:22 <SpamapS> is there some reason gerrit doesn't manage LP projects for stackforge?
20:07:15 <lifeless> unlinking https://bugs.launchpad.net/tripleo/+bug/1182732 - we have a separate workaround task.
20:07:16 <uvirtbot> Launchpad bug 1182732 in quantum "bad dependency on quantumclient breaks metadata agent" [High,Confirmed]
20:07:55 <lifeless> SpamapS: it will
20:07:58 <lifeless> SpamapS: if it's configured correctly
20:08:12 <lifeless> SpamapS: we should be configured correctly now; clarkb gave us a hand last week to sort it out
20:09:16 <SpamapS> good, will cross my fingers :)
20:09:48 <lifeless> Ng: https://bugs.launchpad.net/tripleo/+bug/1178112 specifically
20:09:49 <uvirtbot> Launchpad bug 1178112 in tripleo "baremetal kernel boot options make console inaccessible on ILO environments" [Critical,Triaged]
20:11:14 <lifeless> so that leaves two
20:11:21 <lifeless> one is a workaround issue
20:11:34 <lifeless> not a lot we can do; clearly quantum hasn't been used at moderate scale
20:11:43 <lifeless> thats bug
20:11:48 <lifeless> bug 1184484
20:11:49 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:12:10 <lifeless> and I'm fairly sure we have to have 1182737 fixed to bring up an automated overcloud
20:12:16 <Ng> lifeless: repointed at the commit that landed in dib and marked as fix committed
20:12:36 <lifeless> Ng: \o/ - as dib isn't doing releases yet, just fix released please.
20:12:49 <Ng> k
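For context on bug 1184484 above: the deadlock-like behaviour comes from SQLAlchemy's connection pool being smaller than the number of concurrent Quantum API workers, so requests queue on the pool and time out under load. A minimal sketch of the mechanism, with illustrative numbers and a placeholder DSN rather than Quantum's actual defaults:

```python
# Sketch of the pool-exhaustion failure mode; values are examples only.
from sqlalchemy import create_engine

engine = create_engine(
    "mysql://quantum:secret@localhost/quantum",  # placeholder DSN
    pool_size=5,       # connections kept open in the pool
    max_overflow=10,   # extra connections allowed during bursts
    pool_timeout=30,   # seconds a request waits for a connection
)

# With, say, 40 greenthreads each holding a connection for the duration of an
# API call, only pool_size + max_overflow = 15 can proceed at once; the rest
# block on the pool and raise TimeoutError after pool_timeout, which under
# load looks like a deadlocked service. Raising the pool settings (or capping
# worker concurrency) is the usual workaround.
```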
20:12:53 <lifeless> SpamapS: are you sure you're not installing git trunk of quantumclient yet ?
20:13:07 <SpamapS> lifeless: was just looking
20:13:08 <lifeless> SpamapS: I thought you brought up an overcloud in fully automated fashion and it worked?
20:13:57 <SpamapS> lifeless: 99% automated.. still getting stuck at booting an instance and having metadata because of a lack of routers..
20:14:11 <SpamapS> lifeless: in my notes I have "install quantumclient from trunk in quantum venv"
20:14:23 <lifeless> kk
20:14:33 <lifeless> SpamapS: I will debug that with you later today?
20:14:51 <lifeless> so, bugs done to death I think; lots of high, but lets get the fire drill sorted before we worry about that.
20:15:07 <SpamapS> lifeless: yes I've got it working well with very straight forward manual steps
20:15:08 <lifeless> #topic Grizzly test rack POC
20:15:21 <SpamapS> lifeless: also we should lean on quantumclient maintainers maybe?
20:15:25 <lifeless> So, we have a live working grizzly cloud.
20:15:42 <lifeless> SpamapS: we should.
20:16:28 <lifeless> but we have no monitoring in place.
20:16:33 <SpamapS> lifeless: I need to lean on the keystoneclient maintainers for similar reasons. :)
20:16:47 <lifeless> Ng: GheRivero: perhaps thats something you guys have lots of experience with and would like to take up the mantle for?
20:16:48 <SpamapS> lifeless: sure we do, our POC users will phone us if it breaks. ;)
20:16:49 * SpamapS hides
20:17:50 <Ng> lifeless: monitoring? sure. do we have any ideas about what we want?
20:18:01 <GheRivero> lifeless: yeah, sure (don't know what, but if you say so... :)
20:18:13 <lifeless> well
20:18:38 <lifeless> icinga or nagios perhaps? NobodyCam had the start of an element, but it's stalled AFAICT
20:18:58 <lifeless> perhaps work with him on that, and on the heat template for same
20:19:07 <jog0> I assume alerting is included in monitoring?
20:19:10 <cody-somerville> I can also give a hand there.
20:19:20 <lifeless> cody-somerville: cool
20:19:25 <SpamapS> I'd love to have an icinga-heat bit that, given heat read credentials, can interpret a heat stack and generate all of the monitoring.
20:19:40 <lifeless> SpamapS: +1, and I welcome our strong AI overlords.
20:19:49 <NobodyCam> I meant to get back to the nagios ... but got side tracked on ironic
20:20:17 <lifeless> so, we've three weeks of POC to go; be really good to have monitoring sooner rather than later.
20:20:20 <lifeless> jog0: oh yeah, HI
20:20:21 <lifeless> !
20:20:26 <lifeless> what do we want?
20:20:34 <lifeless> I think we want base level hardware/OS monitoring.
20:20:49 <lifeless> We want cloud health - have we maxed out any resource
20:20:50 * jog0 waves to lifeless and the rest of the room
20:21:05 <lifeless> We want API health: are all the API endpoints answering, and doing so in a reasonable timeframe.
20:21:27 <lifeless> We want functional monitoring - is spinning up/down instances working, is networking working.
20:21:43 <lifeless> it's likely we want more than one tool; but a consolidated view of their data.
20:21:58 <lifeless> should I turn this into a blueprint/etherpad?
20:22:34 <Ng> we should probably have this captured somewhere
20:22:39 <lifeless> I'd love it if someone can pick it up and run with it; dragging other folk in as needed.
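A rough sketch of the icinga-heat idea SpamapS floats above: given the resource list of a Heat stack (assumed here to have been fetched already, e.g. with python-heatclient), emit Icinga/Nagios object definitions for the resources we know how to check. The type-to-check mapping, check command names, sample resource, and dict keys are illustrative assumptions, not an existing tool:

```python
# Hypothetical mapping from Heat resource types to a basic check command.
CHECKS = {
    "AWS::EC2::Instance": "check_ping",
    "OS::Nova::Server": "check_ping",
    "AWS::ElasticLoadBalancing::LoadBalancer": "check_http",
}


def icinga_services(stack_name, resources):
    """Yield Icinga service definitions for one stack's resources.

    `resources` is assumed to be a list of dicts carrying at least
    'resource_name' and 'resource_type' keys.
    """
    for res in resources:
        check = CHECKS.get(res["resource_type"])
        if not check:
            continue  # no check known for this resource type
        yield ("define service {\n"
               "    use                 generic-service\n"
               "    host_name           %s-%s\n"
               "    service_description %s\n"
               "    check_command       %s\n"
               "}\n" % (stack_name, res["resource_name"],
                        res["resource_type"], check))


if __name__ == "__main__":
    sample = [{"resource_name": "notcompute",
               "resource_type": "AWS::EC2::Instance"}]
    print("".join(icinga_services("overcloud", sample)))
```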
20:22:45 <jog0> lifeless: and cloud health also detects if a box dies?
20:23:00 <lifeless> jog0: the base hardware/os layer stuff would capture that
20:23:25 <lifeless> jog0: I'm inclined to worry about automated remedial actions at a later date
20:23:40 <jog0> ah, perhaps we should sync up offline, so I can get up to speed
20:24:00 <lifeless> jog0: sure; though we have some time here.
20:24:24 <SpamapS> Note that Heat wants to be able to do some of that remedial action stuff too.
20:24:27 <lifeless> Basically, tripleo aims to deliver a production ready cloud; having *an* answer for monitoring is an important thing.
20:24:32 <lifeless> SpamapS: right
20:24:41 <lifeless> SpamapS: I'm thinking an icinga endpoint can be the canary check.
20:24:52 <SpamapS> yes indeed
20:24:55 <lifeless> SpamapS: at 10000ft view
20:25:16 <lifeless> jog0: the other thing, as SpamapS just brought up, is that solid service monitoring is a key part of safe deployment automation.
20:25:28 <lifeless> jog0: so you can stop a deploy mid-way if things go pear shaped.
20:26:24 <jog0> right
20:26:33 <lifeless> right now we have nothing; so we need something
20:26:47 <lifeless> #action lifeless to capture 10000ft view of monitoring needs in a blueprint
20:26:58 <lifeless> #action someone to take point on monitoring
20:27:44 <lifeless> I have an open todo to track down the missing machines; echohead gave me a spreadsheet, but I don't know [yet] the network topology
20:27:51 <lifeless> anything else about the test rack?
20:28:53 <lifeless> ok
20:29:02 <lifeless> #topic
20:29:02 <SpamapS> next topic: 2nd test rack? :)
20:29:07 <lifeless> #topic CI virtualized testing progress
20:29:15 <pleia2> so, this one is lots of fun
20:29:24 <lifeless> SpamapS: once we can bring up and pull down parallel clouds in this rack I'll ask for a test row.
20:29:35 <pleia2> a couple months ago I was tasked with testing nova-baremetal https://bugs.launchpad.net/openstack-ci/+bug/1082795
20:29:36 <lifeless> pleia2: tag, you're it
20:29:36 <uvirtbot> Launchpad bug 1082795 in openstack-ci "Add baremetal testing" [High,Triaged]
20:29:55 <pleia2> as we all know, a ton has changed since then, ironic and all
20:30:19 <pleia2> but I've still been focusing on tripleo to do virtualized testing of the soundness of launching these test bmnodes
20:30:30 <pleia2> so two things
20:30:59 <pleia2> 1. This is difficult, I tried just straight toci that dprince worked on, but our virtualized environments don't really allow for this (they don't have kvm, and qemu is way way too slow)
20:31:18 <lifeless> how slow?
20:31:26 <lifeless> Like, can we get working-but-slow, and then iterate?
20:31:30 <pleia2> openstack starts exhibiting strange timeout bugs slow, not usable
20:31:40 <pleia2> 2 minutes to ssh in
20:32:19 <lifeless> ok, thats pretty messed up.
20:32:29 <pleia2> 2. Am I still on the right track here at all by using tripleo? If I use lifeless' takeover node I end up pulling out so much virtualization that I'm really just testing dib and launching of nodes (and haven't quite figured out networking on that)
20:32:57 <pleia2> which isn't really tripleo anymore, but probably is where I want to be testing baremetal-wise (I think)
20:33:01 <SpamapS> pleia2: I have working nested kvm on my i7 laptop
20:33:05 <lifeless> pleia2: so, what code path do you want to test ?
20:33:07 <SpamapS> pleia2: using boot-stack
20:33:14 <lifeless> SpamapS: cloud test environments are rackspace/HPCS.
20:33:21 <lifeless> SpamapS: so thats interesting but irrelevant
20:33:22 <pleia2> SpamapS: me too, but not on hpcloud
20:33:27 <SpamapS> gah
20:33:32 <pleia2> can't even load kvm module on hpcloud
20:33:42 <SpamapS> yeah didn't realize thats what we were talking about
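A small check along the lines of what pleia2 is running into above: hardware virtualization shows up as the vmx/svm CPU flags plus a usable /dev/kvm device, and when neither is exposed to a guest, libvirt falls back to plain qemu (TCG) emulation, which is what makes the test nodes so slow. A minimal sketch; the claim about the specific public clouds is from the discussion, not verified here:

```python
# Report whether KVM acceleration is actually available on this host.
import os


def kvm_available():
    """True only if the CPU exposes hardware virt and /dev/kvm exists."""
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except IOError:
        return False
    flags = set()
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            flags.update(line.split())
    # vmx = Intel VT-x, svm = AMD-V; without one of these, nested guests
    # run under pure qemu emulation.
    return ("vmx" in flags or "svm" in flags) and os.path.exists("/dev/kvm")


if __name__ == "__main__":
    print("kvm acceleration available: %s" % kvm_available())
```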
20:33:45 <lifeless> pleia2: what codepath are you aiming to test.
20:34:20 <pleia2> lifeless: so that's what I realized this morning - I don't know, nova-baremetal is now ironic (and doesn't yet have a nova driver afaik)
20:34:30 <lifeless> nova-baremetal still exists.
20:34:36 <lifeless> ironic is coming together.
20:34:47 <pleia2> right, and it's a goal for ironic to have a driver which I assume will behave the same way
20:34:53 <pleia2> +for nova
20:34:58 <lifeless> once ironic is integrated, we'll still want to know that the use case of 'nova boot baremetal' still works.
20:35:05 <lifeless> so, lets ignore ironic.
20:35:28 <lifeless> if we do that, what codepath do you want to test?
20:35:51 <pleia2> nova
20:35:58 <lifeless> ok
20:36:04 <lifeless> so the minimum you need for that is
20:36:07 <lifeless> the nova code
20:36:10 <lifeless> configured for baremetal
20:36:15 * pleia2 nods
20:36:19 <lifeless> you need a dedicated network
20:36:37 <lifeless> with 'physical' machines on it w/PXE boot configured
20:36:38 <pleia2> yes, this is my current challenge when trying to do this virtualized without nesting
20:36:52 <lifeless> and you need a power driver capable of turning them on / off.
20:37:26 <lifeless> I suggest that a solid 'it worked' test is to boot a vanilla ubuntu image, ssh in with the metadata-supplied ssh key.
20:37:48 <lifeless> tripleo's boot-stack is neither here nor there w.r.t. testing this specific code path.
20:37:55 <pleia2> ok
20:37:57 <SpamapS> Just a thought. Have we ever tried lxc as a way around the nesting problems?
20:38:05 <pleia2> SpamapS: nope
20:38:10 <lifeless> SpamapS: lxc container set to pxe boot ?
20:38:19 <lifeless> SpamapS: with a different kernel....
20:38:20 <pleia2> just qemu (drop in for kvm, easy to test)
20:38:26 <lifeless> SpamapS: I don't think its a fit.
20:38:37 <lifeless> SpamapS: though it's an interesting idea
20:38:40 <SpamapS> lifeless: oh well if we're testing _that_ part yeah there's no point.
20:38:50 <pleia2> so I've had a few ideas, but they end up being so weird that we don't end up testing what we think we're testing (and the tests could break in weird ways)
20:39:06 <lifeless> pleia2: so, *tripleo* wants to test the full path.
20:39:26 <lifeless> pleia2: one reason you've been steered at tripleo, I think, is so that you can kill two birds with one stone.
20:39:47 <lifeless> pleia2: a) nova baremetal functional/integration test. b) tripleo boot-stack functional/integration test.
20:40:01 <lifeless> pleia2: I'll let -infra folk weigh in on the relative importance of that, but...
20:40:03 <pleia2> yeah, and it also tests dib
20:40:16 <lifeless> pleia2: for my part, I think 'lets get /a/ test in place and upgrade it later'
20:40:18 <pleia2> (or, is potentially broken by dib :))
20:40:59 <lifeless> now, in the absence of a test cloud with nested vm enabled.... which btw the grizzly POC rack could be set up as
20:41:06 <lifeless> or a bare metal test cloud
20:41:15 <lifeless> we're going to have nested KVM for the baremetal node you boot
20:41:27 <lifeless> we don't have to have nested KVM for the boot-stack node.
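A rough sketch of the 'it worked' smoke test lifeless suggests above: boot a vanilla Ubuntu image through nova (configured for baremetal), wait for it to go ACTIVE, then ssh in with the metadata-supplied key. The credentials dict, image/flavor/keypair names and the assumption that the first listed address is reachable from the test runner are all illustrative; the python-novaclient v1_1 interface of the period is assumed:

```python
import subprocess
import time

from novaclient.v1_1 import client


def smoke_test(auth):
    nova = client.Client(auth["user"], auth["password"],
                         auth["tenant"], auth["auth_url"])
    image = nova.images.find(name="ubuntu-vanilla")    # assumed image name
    flavor = nova.flavors.find(name="baremetal")       # assumed flavor name
    server = nova.servers.create("bm-smoke-test", image, flavor,
                                 key_name="default")   # assumed keypair
    # Baremetal deploys are slow; poll for up to ~10 minutes.
    for _ in range(60):
        server = nova.servers.get(server.id)
        if server.status == "ACTIVE":
            break
        if server.status == "ERROR":
            raise RuntimeError("baremetal deploy failed")
        time.sleep(10)
    else:
        raise RuntimeError("timed out waiting for ACTIVE")

    ip = list(server.networks.values())[0][0]  # assumes it is reachable
    # If a command runs over ssh with the injected key, call the test a pass.
    subprocess.check_call(["ssh", "-o", "StrictHostKeyChecking=no",
                           "ubuntu@%s" % ip, "true"])
    return True
```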
20:41:34 <lifeless> SpamapS: oh, I may have misinterpreted you....
20:41:41 <pleia2> so my thought was to spin up 3 hpcloud/rackspace instances
20:41:43 <lifeless> pleia2: we could run the boot-stack image in lxc perhaps.
20:42:08 <lifeless> SpamapS: ^ is that what you meant ?
20:42:28 <pleia2> one would be what usually is physical hardware, then boot-stack then the baremetal node, but those are all public machines, not a private lan where they all talk
20:43:06 <lifeless> I don't think it will buy you anything
20:43:16 <pleia2> yeah, it's a mess
20:43:20 <lifeless> as they'll have to run a VPN to get the layer 2 network to do PXE
20:43:27 <lifeless> and that implies nested KVM on each machine
20:43:50 <lifeless> except the boot-stack one; but - see lxc.
20:43:55 <lifeless> so, SpamapS is afk ;). I'll riff
20:44:23 <lifeless> use dib to build a boot-stack image. loopback mount it and lxc boot it - no nested kvm
20:44:29 <pleia2> hah, so lxc container inside the hpcloud instance?
20:44:33 <lifeless> we document 'use kvm to boot the seed cloud'
20:44:46 <lifeless> we can also document 'use lxc to boot the seed cloud', just as well
20:44:56 <lifeless> bmnodes will still be nested kvm
20:45:14 <lifeless> and you'll still have a br99 or whatever between the bm nodes and eth1 in the boot-stack container.
20:45:14 <SpamapS> sorry yeah had local interrupt
20:45:17 <pleia2> well, qemu, right?
20:45:23 <pleia2> since we can't do nested kvm
20:45:23 <lifeless> pleia2: ack
20:45:35 <SpamapS> lifeless: and yes I meant run boot-stack in lxc
20:45:49 <lifeless> SpamapS: so yeah, I misinterpreted you :(.
20:45:53 <lifeless> SpamapS: argue more, dammit!
20:46:12 <SpamapS> lifeless: I was mid-argument when wife needed muscles
20:47:00 <lifeless> doh!
20:47:03 <lifeless> pleia2: what do you think ?
20:47:05 <pleia2> ok, so instead of booting boot-stack as a kvm instance, we make it lxc (can lxc boot qcow2?), right?
20:47:36 <pleia2> then we just create the bmnode as usual (except with qemu rather than kvm)
20:48:12 <NobodyCam> just I run boot-stack setup on three virtual box vms, dib,boot-stack,bm-node.... no nested vms at all
20:48:25 <NobodyCam> *justFYI*
20:48:34 <lifeless> NobodyCam: those are nested when your host is a cloud instance
20:48:38 <lifeless> NobodyCam: thats the issue
20:48:44 <pleia2> yeah, we're doing this on a public cloud
20:48:47 <lifeless> pleia2: yes. And qemu-nbd can loopback mount qcow2.
20:48:57 <pleia2> lifeless: ok, cool
20:49:15 <pleia2> ok, I have a plan, thanks lifeless and SpamapS
20:49:19 <lifeless> cool
20:49:23 <pleia2> (now to learn more about lxc :))
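A sketch of the plan just agreed, driven via subprocess calls: loop-mount the dib-built boot-stack qcow2 with qemu-nbd and start it as an lxc container attached to the baremetal bridge, leaving only the emulated bm nodes to run under qemu. The image path, nbd device, partition number, bridge and container names are assumptions, and every step needs root:

```python
# Loop-mount a dib-built boot-stack image and run it as an lxc container.
import subprocess


def run(*cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)


def start_bootstack(image="bootstack.qcow2", rootfs="/mnt/bootstack"):
    run("modprobe", "nbd", "max_part=8")
    run("qemu-nbd", "--connect=/dev/nbd0", image)
    run("mount", "/dev/nbd0p1", rootfs)   # assumes a single root partition

    # Minimal lxc config (pre-2.0 syntax): use the mounted image as the
    # rootfs and plug a container interface into the bridge the emulated
    # bm nodes will PXE boot on.
    config = "\n".join([
        "lxc.utsname = bootstack",
        "lxc.rootfs = %s" % rootfs,
        "lxc.network.type = veth",
        "lxc.network.link = br99",     # assumed baremetal bridge name
        "lxc.network.flags = up",
    ]) + "\n"
    with open("/tmp/bootstack-lxc.conf", "w") as f:
        f.write(config)
    run("lxc-start", "-d", "-n", "bootstack", "-f", "/tmp/bootstack-lxc.conf")
```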
20:49:32 <lifeless> #topic open discussion
20:50:15 <lifeless> anything?
20:50:17 <SpamapS> so many bugs
20:50:20 <SpamapS> so little time :)
20:50:32 <SpamapS> (I think that may mean tripleo is healthy)
20:51:44 <SpamapS> oh
20:51:46 <lifeless> MORE PEOPLEZ PLEAHSE
20:51:51 <SpamapS> os-config-applier is now os-apply-config
20:52:24 <SpamapS> also I was thinking o-a-c should have a way to reference instance metadata the same way it references heat metadata.
20:52:43 <lifeless> mmm
20:52:55 <lifeless> what about a thing to suck instance metadata down to disk as json
20:53:13 <lifeless> and oac unions multiple json files in some well defined manner?
20:53:26 <SpamapS> yeah thats the way I was thinking of doing it actually.
20:53:58 <SpamapS> local_ip = {{instance_metadata.private_ip}} or something like that.
20:54:16 <lifeless> sob, you want to kill my sed ?:)
20:54:39 <lifeless> should we namespace the heat variables too ?
20:54:43 <lifeless> {{heat.goo}} ?
20:54:49 <SpamapS> Yeah thats what pops into my head as well
20:55:01 <lifeless> so
20:55:04 <SpamapS> though another thought is to just reserve some namespaces
20:55:07 <lifeless> what I was thinking was that neither was namespaced
20:55:22 <lifeless> and we define what happens on conflicts in a formal predictable manner
20:55:25 <sthakkar> hey guys
20:55:35 <lifeless> so that you can locally override something
20:55:41 <lifeless> sthakkar: hi ?
20:56:04 * mestery thinks sthakkar is early for the next meeting. :)
20:56:20 <sthakkar> mestery is right. sorry guys :)
20:56:55 <SpamapS> lifeless: well I will put together a bug about the need for access to metadata.. the design can come later.
20:57:16 <lifeless> ok, so I think thats a wrap then.
20:57:19 <lifeless> last call
20:57:57 <lifeless> #endmeeting
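Following up the os-apply-config thread from open discussion, a minimal sketch of unioning instance metadata and heat metadata pulled down to disk as JSON, with a defined "later file wins" rule so a local value can override. The file paths and the recursive-merge rule are illustrative assumptions, not os-apply-config's actual design:

```python
import json


def deep_merge(base, override):
    """Recursively merge two dicts; keys from `override` win on conflict."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def load_metadata(paths=("/var/lib/heat-metadata.json",        # assumed path
                         "/var/lib/instance-metadata.json")):  # assumed path
    """Union the JSON files in order; later files override earlier ones."""
    merged = {}
    for path in paths:
        try:
            with open(path) as f:
                merged = deep_merge(merged, json.load(f))
        except IOError:
            continue  # a missing source is simply skipped
    return merged
```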