Wednesday, 2014-04-16

*** lipinski has quit IRC00:01
*** ChanServ changes topic to "support @ https://ask.openstack.org | developer wiki @ https://wiki.openstack.org/wiki/Heat | development @ https://launchpad.net/heat | logged @ http://eavesdrop.openstack.org/irclogs/%23heat/"00:01
*** jay_t has quit IRC00:05
*** achampion has joined #heat00:10
*** arbylee has quit IRC00:11
*** andersonvom has quit IRC00:15
*** m_22 has quit IRC00:16
*** spzala has joined #heat00:18
aru__: thanks stevebaker  00:24
aru__: it seemed to solve the problem  00:24
*** asalkeld has quit IRC00:24
*** lindsayk1 has quit IRC00:25
*** arbylee has joined #heat00:26
*** matsuhashi has joined #heat00:30
*** lindsayk has joined #heat00:33
openstackgerrit: A change was merged to openstack/heat: Implement locking in abandon stack  https://review.openstack.org/86663  00:36
*** asalkeld has joined #heat00:37
*** blamar has quit IRC00:41
*** lindsayk has joined #heat00:49
*** andersonvom has joined #heat00:55
*** blamar has joined #heat00:59
*** spzala has quit IRC01:01
*** nati_uen_ has quit IRC01:20
*** andersonvom has quit IRC01:28
*** lindsayk has quit IRC01:30
*** daneyon has joined #heat01:32
*** Qiming has joined #heat01:34
*** david-lyle has joined #heat01:36
*** matsuhashi has quit IRC01:40
*** matsuhas_ has joined #heat01:43
*** julienvey has joined #heat01:46
*** julienvey has quit IRC01:51
*** david-lyle has quit IRC02:03
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Add --preview option to stack abandon command  https://review.openstack.org/84680  02:04
*** lipinski has joined #heat02:05
*** alexpilotti has quit IRC02:13
*** harlowja is now known as harlowja_away02:35
*** connie has joined #heat02:35
*** connie has quit IRC02:36
*** julienvey has joined #heat02:45
*** etoews has quit IRC02:47
*** matsuhas_ has quit IRC02:49
*** matsuhashi has joined #heat02:49
*** julienvey has quit IRC02:50
*** matsuhas_ has joined #heat02:52
*** matsuhashi has quit IRC02:52
*** matsuhas_ has quit IRC02:58
*** zhiyan_ is now known as zhiyan03:01
*** etoews has joined #heat03:05
*** sergmelikyan has quit IRC03:10
*** sergmelikyan has joined #heat03:13
*** etoews has quit IRC03:14
*** arbylee has quit IRC03:14
*** arbylee has joined #heat03:14
*** etoews has joined #heat03:22
*** ramishra has joined #heat03:23
*** etoews has quit IRC03:28
*** nosnos has quit IRC03:35
*** lipinski has quit IRC03:40
sdake: harlowja had early dinner - which message is interrupting your business continuity?  03:44
sdake: and on that note, i'm off to bed, enjoy :)  03:45
*** julienvey has joined #heat03:46
*** etoews has joined #heat03:47
*** julienvey has quit IRC03:51
openstackgerrit: Jun Jie Nan proposed a change to openstack/heat: Add preview option to stack abandon  https://review.openstack.org/84664  03:52
*** etoews has quit IRC03:52
*** IlyaE has quit IRC04:03
*** etoews has joined #heat04:06
*** sdake_ has joined #heat04:08
*** etoews has quit IRC04:10
*** asalkeld has quit IRC04:14
*** asalkeld has joined #heat04:16
*** IlyaE has joined #heat04:17
*** sergmelikyan has quit IRC04:21
*** sergmelikyan has joined #heat04:23
*** nosnos has joined #heat04:25
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Add code coverage in resource list test  https://review.openstack.org/87846  04:25
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Fix empty resource list index out of range error  https://review.openstack.org/87269  04:25
*** saju_m has joined #heat04:27
*** saju_m has quit IRC04:27
*** aru__ has quit IRC04:28
*** saju_m has joined #heat04:31
*** aweiteka has joined #heat04:34
*** pithagora has joined #heat04:38
*** achampio1 has joined #heat04:42
*** achampion has quit IRC04:44
*** julienvey has joined #heat04:47
*** achampion has joined #heat04:47
*** achampio1 has quit IRC04:48
*** nanjj has joined #heat04:49
*** julienvey has quit IRC04:52
*** cmyster has joined #heat04:59
*** cmyster has joined #heat04:59
*** IlyaE has quit IRC05:07
*** nkhare has joined #heat05:09
*** etoews has joined #heat05:10
cmyster: morning  05:11
Qiming: morning  05:14
*** etoews has quit IRC  05:17
cmyster: how are you this morning Qiming?  05:17
Qiming: cmyster: feeling very hot in the office  05:18
cmyster: same here, summer has started very early this year...  05:20
*** chandan_kumar has joined #heat05:31
*** pithagora has quit IRC05:37
*** dmueller has joined #heat05:48
*** Qiming has quit IRC05:50
*** dmueller has quit IRC05:50
*** julienvey has joined #heat05:51
*** IlyaE has joined #heat05:54
*** julienvey has quit IRC05:55
*** sdague has quit IRC05:58
*** sdague has joined #heat06:05
*** slagle has quit IRC06:12
*** slagle has joined #heat06:13
*** saju_m has quit IRC06:17
*** etoews has joined #heat06:23
*** liang has joined #heat06:28
*** etoews has quit IRC06:29
*** saju_m has joined #heat06:29
*** fandi has joined #heat06:39
*** etoews has joined #heat06:40
*** etoews has quit IRC06:45
*** arbylee has quit IRC06:47
therve: Good morning!  06:50
*** tomek_adamczewsk has joined #heat  06:51
*** jprovazn has joined #heat  06:51
cmyster: morning  06:52
*** IlyaE has quit IRC  06:55
*** chandan_kumar has quit IRC  07:02
*** chandan_kumar has joined #heat  07:08
shardy: morning all  07:17
cmyster: morning  07:20
*** sdake has quit IRC07:21
*** jiangyaoguo has joined #heat07:21
*** jiangyaoguo has left #heat07:22
*** sdake has joined #heat07:36
*** tspatzier has joined #heat07:37
*** tspatzier has quit IRC07:42
*** arbylee has joined #heat07:48
*** asalkeld has quit IRC07:51
*** jistr has joined #heat07:52
*** akuznets_ has quit IRC07:53
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  07:54
*** arbylee has quit IRC07:55
pas-ha: morning all  07:56
skraynev: Morning all  07:56
*** akuznetsov has joined #heat  08:00
cmyster: morning  08:01
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Adding attribute schema class for attributes  https://review.openstack.org/86525  08:11
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Using attribute schema for building documentation  https://review.openstack.org/86803  08:11
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Deprecate first_address attribute of Server  https://review.openstack.org/86526  08:12
*** derekh has joined #heat  08:12
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Deprecate first_address attribute of Server  https://review.openstack.org/86526  08:16
cmyster: http://download.fedoraproject.org/pub/fedora/linux/updates/20/Images/x86_64/Fedora-x86_64-20-20140407-sda.qcow2 is heartbleed free btw  08:17
*** e0ne has joined #heat08:21
*** che-arne has joined #heat08:33
*** sorantis has joined #heat08:35
*** petertoft has joined #heat08:35
*** pablosan is now known as zz_pablosan08:36
*** TonyBurn has joined #heat08:55
*** zhangyang has joined #heat08:56
*** rpothier has quit IRC09:00
*** rpothier has joined #heat09:01
*** chandan_kumar has quit IRC09:06
*** ramishra has quit IRC09:06
*** ramishra has joined #heat09:07
*** ramishra has quit IRC09:08
*** alexpilotti has joined #heat09:16
*** chandan_kumar has joined #heat09:20
*** chandan_kumar has quit IRC09:26
*** chandan_kumar has joined #heat09:26
*** tspatzier has joined #heat09:29
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  09:30
*** liang has quit IRC09:32
*** saju_m has quit IRC09:49
*** arbylee has joined #heat09:53
*** tspatzier has quit IRC09:56
*** arbylee has quit IRC09:58
*** nosnos has quit IRC10:07
openstackgerrit: A change was merged to openstack/heat: Add hint on creating new user for Heat in DevStack  https://review.openstack.org/87555  10:10
*** nosnos has joined #heat10:12
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  10:14
*** dmakogon_ is now known as denis_makogon  10:14
shardy: If anyone wants lots of details on stack domain users, I just posted this:  10:15
shardy: http://hardysteven.blogspot.co.uk/2014/04/heat-auth-model-updates-part-2-stack.html  10:15
skraynev: shardy: thanks. will read ;)  10:16
*** e0ne has quit IRC10:17
*** e0ne has joined #heat10:18
*** Qiming has joined #heat10:18
*** nanjj has quit IRC10:21
*** e0ne has quit IRC10:22
*** etoews has joined #heat10:29
Qiming: shardy, thank you!  10:29
pas-ha: shardy: thanks, good read  10:31
shardy: Qiming: ah, you're here now, no problem :)  10:32
*** mestery_ has joined #heat10:33
*** nosnos has quit IRC10:34
*** etoews has quit IRC10:34
*** nosnos has joined #heat10:35
*** mestery has quit IRC10:36
Qiming: shardy, I am stuck on sending a signal to heat using a mechanism other than an ec2-signed URL  10:38
shardy: Qiming: You can send a signal via the native API  10:38
shardy: but not a WaitCondition notification at the moment  10:39
shardy: heat resource-signal ...  10:39
*** nosnos has quit IRC  10:39
Qiming: shardy: I read the deployment code where heat-config sends back a signal now  10:39
Qiming: in that implementation, the signalling side needs to have user-id, password, project-id, auth-url ...  10:40
shardy: Qiming: Yes, the SoftwareDeployment resources have been designed to use either ec2 signed URLs or native signals  10:40
shardy: yes, it uses a stack domain user and a randomly generated password  10:40
Qiming: I hope your blog will help me better understand how trusts work  10:40
shardy: My post from last week may do, but it's unrelated to in-instance signalling  10:40
shardy: please read both posts and come back if you still have questions :)  10:41
Qiming: then maybe I can try having the Ceilometer::Alarm post to a 'trust-url'?  10:41
shardy: Qiming: therve has posted patches which enable exactly that  10:42
Qiming: still not sure how to post some data back along with the signal  10:42
Qiming: shardy, that is the patch I will try, :)  10:42
* Qiming printed the blog posts and a dictionary, started to study English ...  10:43
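[For readers of the log: the native signal path shardy refers to above maps onto the python-heatclient CLI roughly as follows. The stack and resource names are placeholders, and the -D/--data flag for attaching a JSON payload may vary by client version.]

```shell
# Native (non-ec2-signed) signal to a stack resource, as discussed above.
# "mystack" and "signal_handle" are hypothetical names for illustration;
# -D attaches an optional JSON payload to the signal.
heat resource-signal mystack signal_handle -D '{"status": "SUCCESS"}'
```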
*** e0ne has joined #heat10:48
*** e0ne_ has joined #heat10:50
*** sorantis has quit IRC10:53
*** e0ne has quit IRC10:53
*** nkhare has quit IRC10:58
*** fandi has quit IRC11:04
*** e0ne_ has quit IRC11:06
*** e0ne has joined #heat11:06
*** Michalik- has joined #heat11:06
*** nosnos has joined #heat11:06
*** sorantis has joined #heat11:08
*** ifarkas has quit IRC11:10
*** ifarkas has joined #heat11:22
*** yassine has joined #heat11:33
*** lipinski has joined #heat11:39
sdake: morning  11:40
cmyster: morning  11:41
*** mkollaro has joined #heat  11:42
*** tspatzier has joined #heat  11:43
sdake: cmyster for some reason I thought TLV was in shutdown until the 22nd  11:43
cmyster: ummm  11:43
cmyster: I'm not really here?  11:43
sdake: hmm well you're in, so guess not :)  11:43
*** etoews has joined #heat11:49
*** etoews has quit IRC11:53
*** arbylee has joined #heat11:54
*** igormarnat_ has joined #heat11:57
*** arbylee has quit IRC11:58
*** tspatzier has quit IRC11:58
*** Qiming has quit IRC11:59
*** Qiming has joined #heat12:00
*** akuznetsov has quit IRC12:09
*** akuznets_ has joined #heat12:09
*** alexpilotti has quit IRC12:15
*** nosnos has quit IRC12:17
*** tspatzier has joined #heat12:22
*** achampion has quit IRC12:23
*** jdob has joined #heat12:38
*** slagle has quit IRC12:39
*** saju_m has joined #heat12:40
*** slagle has joined #heat12:40
*** rbuilta has joined #heat12:40
*** akuznets_ has quit IRC12:43
*** akuznetsov has joined #heat12:43
*** saju_m has quit IRC12:45
*** blomquisg has joined #heat12:46
*** saju_m has joined #heat13:02
*** pafuent has joined #heat13:03
*** spzala has joined #heat13:07
*** erecio has quit IRC13:08
*** alexpilotti has joined #heat13:14
*** erecio has joined #heat13:14
*** achampion has joined #heat13:16
*** mestery_ is now known as mestery13:22
*** ramishra has joined #heat13:27
*** jprovazn has quit IRC13:29
*** zz_gondoi is now known as gondoi13:31
*** gondoi is now known as zz_gondoi13:31
*** dims has quit IRC13:32
*** samstav has joined #heat13:34
*** zz_gondoi is now known as gondoi13:35
*** etoews has joined #heat13:36
*** igormarnat_ has quit IRC13:36
*** arbylee has joined #heat13:41
*** pafuent has left #heat13:43
*** pafuent has joined #heat13:44
*** gondoi is now known as zz_gondoi13:45
*** zz_gondoi is now known as gondoi13:51
*** arbylee has quit IRC13:52
*** arbylee has joined #heat13:52
*** jprovazn has joined #heat13:53
*** vijendar has joined #heat13:53
*** julienvey has joined #heat13:59
*** spzala has quit IRC14:00
*** spzala has joined #heat14:01
*** tspatzier has quit IRC14:02
*** aweiteka has quit IRC14:02
*** spzala has quit IRC14:04
*** sjmc7 has joined #heat14:05
*** aweiteka has joined #heat14:15
*** jaustinpage has joined #heat14:16
Qiming: shardy: there?  14:17
*** dims has joined #heat  14:18
shardy: Qiming: yes  14:20
Qiming: shardy: do we assign a role to a user, or assign a user to a role?  14:21
Qiming: or, there is no difference, :p  14:21
*** zns has joined #heat  14:22
shardy: Qiming: I would say you assign a role to a user, scoped to a project or domain  14:23
jaustinpage: shardy: re: today's blog post: how does heat handle signaling when it is in standalone mode?  14:23
Qiming: thanks, shardy  14:24
shardy: jaustinpage: assuming you don't have permission to create a new domain, you probably have to use the old fallback behavior, which is to create the users as before, in the project of the stack-owner  14:25
shardy: jaustinpage: I guess it depends on what level of control you have over the remote keystone  14:25
jaustinpage: shardy: but in standalone mode, i thought there was an assumption that you couldn't create users either...  14:25
jaustinpage: shardy: thanks for the reply  14:26
shardy: jaustinpage: I'm not aware of any such assumption, or none of the signalling features would have worked for anyone ever  14:26
jaustinpage: shardy  14:26
shardy: jaustinpage: happy to get use-case feedback though, if you have specific issues :)  14:26
jaustinpage: shardy: ok, thanks for the reply  14:27
shardy: jaustinpage: np  14:27
jaustinpage: shardy: one other question, you mentioned the ec2 method of passing keys, is there a significant difference between this method and the heat_signal method of passing keys?  14:28
jaustinpage: shardy: *from the heat engine to the vm being deployed, through cloud-init if i am understanding correctly...  14:28
shardy: jaustinpage: sorry, by heat_signal, you mean the native signals, e.g. heat resource-signal?  14:28
jaustinpage: i believe so, i am pretty sure the heat softwaredeployment resource makes use of the heat resource-signal  14:30
shardy: jaustinpage: Ah, HEAT_SIGNAL for SoftwareDeployment resources creates a stack domain user, but not an ec2 keypair; instead it creates a random password, and we use that from the instance  14:30
shardy: So the main difference is it removes the dependency on ec2tokens being enabled in keystone, which some deployers don't enable  14:31
*** julienvey has quit IRC  14:31
shardy: But the disadvantage is you have to obtain a token from the instance, e.g. heatclient has to connect to keystone then heat  14:31
jaustinpage: ok, so the instance, in order to signal back, would get a token from keystone, then use that to authenticate to the heat metadata?  14:31
shardy: jaustinpage: exactly  14:31
shardy: jaustinpage: we're still looking at ways we might avoid that additional call to keystone, x509 cert most likely  14:33
*** daneyon has quit IRC  14:34
jaustinpage: shardy: ok, cool. if the call to keystone could be avoided, it would seem that a custom authentication mechanism in the heat engine's pipeline could then work, and still have signalling support  14:34
*** dims has quit IRC14:34
jaustinpage: *heat engine's authentication pipeline  14:34
*** daneyon has joined #heat  14:35
shardy: jaustinpage: "custom authentication mechanism"?  14:35
shardy: jaustinpage: FWIW, we (or at least I) have been specifically trying to avoid inventing something heat-specific for this  14:35
jaustinpage: shardy: somebody writing one of these: https://github.com/openstack/heat/blob/master/heat/common/custom_backend_auth.py  14:36
lipinski: Any reason why the heat client and/or engine needs permissions to /lost+found ?  14:36
lipinski: I'm failing to create a stack because of permissions on /lost+found and /root - while the heat-engine is running as heat user  14:37
*** andrew_plunk has joined #heat  14:37
jaustinpage: shardy: yea, i can definitely understand trying to avoid having a custom authentication backend  14:37
shardy: jaustinpage: If you write your own auth middleware then the call to keystone becomes irrelevant, you just insert your m/w earlier in the paste pipeline, and modify the client to send whatever secret your auth scheme understands  14:37
*** sorantis has quit IRC  14:38
shardy: jaustinpage: The call to keystone is already optional e.g. in python-heatclient, so if for example you were using the heat-api-standalone API pipeline, you could just hard-code a password in all your templates  14:38
jaustinpage: shardy: thanks for the info, and thanks for putting up with all of my questions!  14:39
shardy: jaustinpage: that doesn't really solve the problem that some resources are integrated with keystone functionality though, so you might have to modify them as well as your middleware  14:40
sdague: hey folks, we're seeing a lot of inconsistent fails in the heat-slow jobs  14:40
sdague: it would be really great to get some eyes on some of these to figure out what's going wrong  14:40
shardy: jaustinpage: np, given me some things to think about re standalone mode, I've mostly been considering the integrated use-case  14:40
shardy: sdague: sure, got any links?  14:40
*** mriedem has joined #heat  14:41
mriedem: sdague: hi  14:41
sdague: mriedem has been doing some debug shardy, he should have some links on fails  14:41
jaustinpage: shardy: no worries, heat definitely walks the line between iaas and whatever is higher up the chain than iaas, which makes for some difficult choices when implementing features  14:41
shardy: jaustinpage: Yeah, that is the challenge with standalone mode.  Feel free to raise bugs if you have specific problems  14:42
*** mtreinish has joined #heat  14:42
mriedem: http://goo.gl/NNAUfK  14:42
mriedem: was just looking at the results for that fail after mtreinish raised the build timeout in tempest for heat jobs yesterday, which didn't help  14:42
mriedem: because the timeout happens in heat, not tempest  14:43
shardy: So the problem is a signal from the instance is not reaching heat  14:43
sdague: it also looks like it dramatically got worse  14:44
shardy: either because the instance is not running, the network is broken, or the VM deployment is just taking too long and the timeout is expiring  14:44
sdague: 2 days ago  14:44
mriedem: so i wonder if the nova/neutron timeout/wait stuff slowed this all down  14:44
mriedem: it is failing on a slow neutron job right?  14:44
mriedem: nova waits longer for neutron to callback  14:45
shardy: sdague: do we have any timing data, are VMs taking massively longer to launch recently?  14:45
mriedem: heat is waiting for nova?  14:45
mriedem: shardy: ^  14:45
mriedem: so if nova is waiting on neutron, and heat is waiting on nova, and that all slowed down with callbacks  14:45
mriedem: we're going to see timeouts  14:45
*** zz_pablosan is now known as pablosan  14:45
shardy: mriedem: Yes, heat is waiting for the VM to boot, some stuff to happen inside the VM, and a signal to be POSTed back to us  14:45
mriedem: shardy: is that controlled with stack_action_timeout?  14:46
mriedem: which defaults to 1 minute  14:46
mriedem: derp  14:46
shardy: mriedem: sec, let me look at the tests  14:46
mriedem: 1 hour i should say  14:46
shardy: http://docs.openstack.org/developer/heat/template_guide/cfn.html#AWS::CloudFormation::WaitCondition  14:46
shardy: It's controlled by the Timeout specified in the template  14:47
*** dims has joined #heat  14:47
shardy: do we know which test is failing?  14:47
mriedem: sec  14:47
*** igormarnat_ has joined #heat14:47
mriedem: there are a couple  14:48
mriedem: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html  14:48
sdague: shardy: do the heat tests actually require neutron? I wonder if it's better to disconnect them from neutron failure rates to actually test heat, instead of coupling heat issues to neutron fails  14:48
mriedem: search for FAIL:  14:48
sdague: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_01_860  14:48
sdague: 3 different tests failed in that one  14:48
shardy: sdague: Most tests probably don't, but e.g. api test_neutron_resources.py does :)  14:48
mriedem: sdague: what makes the slow job 'slow'? not run in parallel?  14:48
sdague: mriedem: each of these tests is slow  14:49
sdague: or some of them can be  14:49
*** akuznetsov has quit IRC  14:49
shardy: ServerCfnInitTestJSON.test_all_resources_created[slow]                631.788  14:49
shardy: that can't be right..  14:49
sdague: shardy: right now it's a coin flip to pass heat-slow - http://jogo.github.io/gate/  14:49
shardy: I thought the entire heat-slow job took about 27 minutes  14:50
sdague: the timing there is right  14:51
sdague: depending on the node it drifts from 280 -> 700s  14:51
sdague: honestly, the other tests take longer than reported, because some stuff is done in setupclass  14:51
mriedem: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_13_597  14:51
sdague: which isn't time accounted  14:51
mriedem: 651 sec for that run  14:51
sdague: yeh, it's been running that duration for as long as I can remember  14:52
shardy: So test_server_cfn_init.py is one that's failing, and that doesn't require neutron  14:52
shardy: Timeout: '600'  14:52
*** jaustinpage has quit IRC14:52
shardy: If the test is taking >600s that will timeout  14:52
mriedem: should be 1200 now in tempest: https://review.openstack.org/#/c/87691/  14:53
shardy: https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/cfn_init_signal.yaml#L71  14:53
mriedem: derp  14:53
sdague: the guest boot takes 53s  14:54
sdague: 563s  14:54
sdague: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_13_590  14:54
shardy: Just to boot? ouch :(  14:54
mriedem: so there is a configurable timeout in heat, there is a timeout for the heat-slow job, and there is a build_timeout for tempest.conf  14:54
mriedem: the tempest ones are the same value now, but the heat value isn't changed for the slow jobs  14:55
shardy: AFAICS those config values aren't overriding the values in the templates though  14:55
mriedem: they aren't  14:55
shardy: probably we should have a self.override_timeout(loaded_template) step in all the tests  14:56
sdague: shardy: well booting a full fedora 20 cloud guest 2nd level is slow  14:56
shardy: so we can globally configure the waitcondition timeout  14:56
sdague: could we build a cirros with cfn tools in it? or would that just get crazy  14:56
shardy: sdague: not sure tbh, I've only really used fedora images  14:57
shardy: sdague: if the image has python, cloud-init and boto, then probably  14:57
sdague: it has cloud init  14:57
sdague: I have no idea about the rest  14:58
shardy: cloud-init depends on boto, so it may work  14:58
shardy: there are a few other deps, but those are the main ones  14:58
*** Qiming has quit IRC14:58
*** akuznetsov has joined #heat15:00
SpamapS: boto is the devil  15:01
SpamapS: period  15:01
shardy: sdague, mriedem: want me to post a patch which aligns the WaitCondition timeout with build_timeout, but passing build_timeout as a parameter into the stack?  15:02
mriedem: shardy: yeah i was just looking at that  15:02
mriedem: when it reads the yaml file is it automatically converted to json?  15:02
sdague: cirros does not have python  15:03
shardy: mriedem: I think there are two ways, either directly override the Timeout in the template, or establish a convention where all templates containing a WaitCondition expose a parameter "timeout"  15:03
sdague: I wonder if they compiled down cloud init to a binary  15:03
shardy: sdague: How does cloud-init work then?  15:03
*** IlyaE has joined #heat  15:03
larsks: sdague shardy: cirros has a collection of shell scripts.  15:03
larsks: It's actually pretty clever, and the cli is somewhat nicer (it caches results locally, and provides cli tools for querying the data)  15:04
mriedem: shardy: i have no idea how to pass parameters to templates (never looked at heat before)  15:04
shardy: sdague: for WaitConditions, we don't actually need heat-cfntools, you can do it with just curl  15:04
*** TonyBurn has quit IRC  15:04
shardy: mriedem: Give me 10mins, I'll post a patch showing what I mean  15:05
*** igormarnat_ has left #heat  15:05
shardy: mriedem: what's the bug # for this issue?  15:05
mriedem: shardy: ok, thanks - fwiw the neutron_basic.yaml in tempest also has a 600 second wait timeout  15:05
*** julienvey has joined #heat  15:05
mriedem: 1297560  15:05
shardy: mriedem: thanks  15:05
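[For readers of the log: the parameter-passing convention shardy proposes here (a template-level "timeout" parameter fed from tempest's build_timeout) would look roughly like this on the CLI. The template filename, parameter name, and stack name are illustrative, not taken from the actual patch.]

```shell
# Passing a parameter into a stack at create time with python-heatclient.
# Assumes cfn_init_signal.yaml exposes a "timeout" parameter used as the
# WaitCondition Timeout; all names here are hypothetical.
heat stack-create test-stack -f cfn_init_signal.yaml -P timeout=1200
```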
*** jaustinpage has joined #heat15:06
sdague: larsks: so it just emulates cloud init? or is it totally different  15:06
*** sorantis has joined #heat  15:07
larsks: sdague: It doesn't really emulate cloud-init.  It will run scripts in user-data, though.  I don't think it makes any attempt at reading cloud-config format data.  15:10
shardy: mriedem: https://review.openstack.org/87993  15:12
shardy: mriedem: just going to try testing locally, but that's what I meant  15:12
larsks: sdague: Yeah, it just looks for "#!" in userdata and runs it, otherwise it just exposes the data via "cirros-query".  15:12
*** jprovazn is now known as jprovazn_afk  15:13
mriedem: shardy: cool, that's easy  15:14
mriedem: you missed one template though  15:14
shardy: mriedem: I'm doing it now, was going to post two patches  15:14
shardy: or I can add it to that patch if you prefer :)  15:14
mriedem: doing it in one seems good  15:14
shardy: Ok, git rebase squash it is :)  15:15
*** sorantis has quit IRC15:19
shardy: mriedem: updated  15:19
*** kgriffs|afk is now known as kgriffs  15:21
*** spzala has joined #heat  15:23
mriedem: shardy: looks good  15:25
sdague: shardy: so the lingering question is that currently the fedora cloud image is nearly 2 orders of magnitude slower to complete booting than cirros  15:25
sdague: I think that unless we can get it down to 1 order of magnitude, the amount of coverage we can realistically expect out of heat is going to be small  15:26
sdague: so if you have any thoughts on how to trim what's in that image that would be cool  15:26
*** sdake_ has quit IRC  15:27
*** aweiteka has quit IRC  15:27
*** andersonvom has joined #heat  15:28
shardy: sdague: I think test_neutron_resources.py can be converted to use an image not containing heat-cfntools  15:29
*** saju_m has quit IRC  15:29
shardy: All it does in the user-data is cfn-signal, which is basically just a wrapper for curl  15:30
shardy: so provided there's curl or something similar in the cirros image, perhaps we can use that?  15:30
shardy: I'll have to take a look, don't think I've ever booted a cirros image before  15:30
sdague: yeh, there is curl inside there  15:31
shardy: sdague: I think there are only a very small subset of things which actually *need* cfntools  15:31
sdague: ok, cool, well that would help a lot if we were able to isolate those things  15:31
sdague: then we could run most of the tests on cirros I think  15:32
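[For readers of the log: shardy's "cfn-signal is basically just a wrapper for curl" can be sketched roughly as below. The URL is a placeholder for the ec2-signed wait condition handle URL that Heat exposes to the instance, and the JSON body follows the CloudFormation-style wait condition signal format; treat this as an illustration, not the exact cfn-signal implementation.]

```shell
# Rough curl equivalent of cfn-signal for a wait condition, as discussed
# above. SIGNAL_URL is a hypothetical presigned handle URL from Heat.
SIGNAL_URL='http://heat.example.com:8000/v1/waitcondition/...presigned...'
curl -X PUT -H 'Content-Type: application/json' \
  --data-binary '{"Status": "SUCCESS", "Reason": "configuration complete",
                  "UniqueId": "0001", "Data": "ok"}' \
  "$SIGNAL_URL"
```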
*** fandi has joined #heat15:33
shardy: Even test_server_cfn_init.py could be rewritten to not need cfn-init, although that might defeat the point of it a bit :)  15:35
*** fandi has quit IRC  15:37
sdague: yeh, I'm fine with using cfn-init where it's needed to test that  15:37
sdague: just given the image weight, I'd rather see what we can test with cirros so we can get broader coverage of heat that doesn't need cfn-tools  15:38
sdague: we'll get more bang for our buck that way  15:38
shardy: sdague: Sure, makes sense  15:38
shardy: sdague: the lack of python is an issue though, as we use python hook scripts for SoftwareDeployment resources IIRC  15:39
*** ramishra has quit IRC  15:39
shardy: maybe there's a way to do shell script hooks instead, not sure, stevebaker will know  15:39
sdague: well, I expect software deployment resources will need the bigger image  15:39
sdague: I wonder if there are things that could be stripped from the base image that would help with speed. Part of the issue is it's a 500 MB disk, which means we're generating real io, not keeping it in cache  15:41
sdague: whereas the cirros disk is 13M  15:41
*** smulcahy has joined #heat  15:42
sdague: anyway, got to run away for a bit  15:42
shardy: sdague: In a past life I maintained a Fedora image which was <50M, but the effort to prune things to get to that point was non-trivial  15:42
shardy: sdague: Ok, I'm out till next week but I'll start digging into the image requirements next week  15:43
*** fandi has joined #heat15:44
*** sdake_ has joined #heat15:45
*** chandan_kumar has quit IRC15:46
smulcahy: Hi folks - is anyone looking at https://bugs.launchpad.net/heat/+bug/1306743 ? A few folks in HP are but not making much progress so far. It seems to be a hard blocker for running Heat with more than 2 or 3 nodes, which is a surprisingly low bar.  15:46
uvirtbot: Launchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]  15:46
*** arbylee has quit IRC  15:47
zaneb: therve: I looked at your stack snapshots thing, but I think shardy is more qualified to comment ;)  15:47
shardy: zaneb: Yeah, therve and I discussed it on IRC but I've not got around to replying to the ML post yet  15:48
*** etoews has left #heat  15:49
zaneb: shardy: thanks. that wasn't a hurry-up ;)  15:49
therve: zaneb, OK thanks :)  15:49
zaneb: it was more of a this-is-why-I-haven't-responded-to-the-post-therve-asked-me-to-look-at :)  15:50
shardy: lol :)  15:50
therve: smulcahy, The bug is not super clear to be honest. I don't know where to look  15:51
therve: The only fix I can think of is "make less SQL queries in Heat"  15:55
therve: Which is a fine goal but you may want a faster solution  15:55
smulcahy: therve: are we the only ones seeing this?  15:55
therve: *cough*  15:56
therve: You're the only ones reporting it at least  15:56
*** e0ne has quit IRC  15:56
smulcahy: we'll see if we can peel out a simpler reproducer  15:57
smulcahy: but currently blocked on any real deploys by this  15:57
*** e0ne has joined #heat  15:57
therve: smulcahy, Have you simply tried tweaking those parameters?  15:57
therve: 5 and 10 look small for a real deployment  15:58
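[For readers of the log: the "5 and 10" therve mentions are SQLAlchemy's default QueuePool settings (pool_size=5, max_overflow=10), which is what the "queuepool limit of size 5 overflow" error refers to. A sketch of raising them in heat.conf follows; the section and option names assume oslo.db-style database configuration and may differ by release, so check your deployment's sample config.]

```ini
[database]
# SQLAlchemy QueuePool tuning; upstream defaults are pool_size=5 and
# max_overflow=10. Option names here are assumptions, not verified
# against this era of Heat.
max_pool_size = 30
max_overflow = 60
```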
smulcahy: yes and yes  15:58
zaneb: smulcahy: what does "nodes" mean in "2 or 3 nodes"?  15:58
*** jlanoux has joined #heat  15:59
smulcahy: zaneb: servers running nova bare metal  15:59
zaneb: ok  15:59
*** vinsh has joined #heat  16:00
smulcahy: we're trying to repro with VMs, or maybe figure out a simple Heat-only test of some sort  16:00
*** mkollaro has quit IRC  16:00
smulcahy: but any suggestions and input most welcome on https://bugs.launchpad.net/heat/+bug/1306743  16:00
uvirtbot: Launchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]  16:00
*** e0ne has quit IRC  16:02
therve: smulcahy, Input from you would be welcome  16:02
therve: We really lack enough information to help  16:02
zaneb: smulcahy: so what is polling describe_stack_resource in that traceback?  16:02
*** arbylee has joined #heat16:04
*** jlanoux has quit IRC16:05
*** tomek_adamczewsk has quit IRC16:06
*** ramishra has joined #heat16:06
smulcahyzaneb: one of the os- scripts afaik16:09
zanebcould that be the problem? how fast is it polling?16:09
*** geerdest has joined #heat16:10
smulcahynot sure, asking on of our other folks to pop on if they're available16:10
SpamapSHey if somebody can give https://bugs.launchpad.net/heat/+bug/1306743 a look..16:11
uvirtbotLaunchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]16:11
SpamapSwe're hitting scale problems at just 30 nodes requesting metadata from Heat.16:11
*** Michalik- has quit IRC16:11
zanebSpamapS: by happy coincidence we were just discussing that :)16:12
SpamapSI'm guessing we just need to start looking at a caching layer16:12
smulcahyzaneb: lifeless also ran into this problem on his testing last week so should be able to give more info in a bit16:12
SpamapSoh hah16:12
SpamapSzaneb: not such a coincidence, as smulcahy is indeed somebody probably even more motivated than I am to fix this :)16:12
SpamapSanyway, I'm offline for a while16:12
SpamapSgood luck!16:13
SpamapSzaneb: polling once every 30 seconds per node16:13
SpamapSzaneb: os-collect-config btw16:13
zanebok, that doesn't sound unreasonable16:13
SpamapSzaneb: pretty slow IMO16:13
SpamapSBut if each poll takes 5 queries or something.. :-/16:14
SpamapSanyway.. offline.. forrealz16:14
zanebif it was 30 times per second then I would understand16:14
*** TonyBurn has joined #heat16:16
smulcahywe're still trying to find the source of those 300-400 reqs/sec hitting mysql from heat-engine16:17
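A back-of-envelope check of the rates being discussed (all figures are taken from the conversation above, not measured independently):

```python
# 30 nodes each polling os-collect-config every 30 s is only ~1 API hit/s,
# so 300-400 MySQL queries/s would imply hundreds of DB queries per poll.
nodes = 30
poll_interval_s = 30
observed_db_qps = 350  # midpoint of the 300-400/s figure reported above

api_requests_per_s = nodes / poll_interval_s            # ~1.0 request/s
db_queries_per_poll = observed_db_qps / api_requests_per_s

print(api_requests_per_s)   # 1.0
print(db_queries_per_poll)  # 350.0
```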
*** zhiyan is now known as zhiyan_16:17
petertoftAlso heat-engine pinning a CPU at 100%16:18
smulcahyall we have so far is that it's the calls to resource_data_get(resource, key) in heat/db/sqlalchemy/api.py16:18
*** akuznets_ has joined #heat16:18
zanebsounds like maybe we are creating a new session somewhere to request data that is probably already cached in our existing session16:19
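A toy illustration (hypothetical names, not Heat's actual code) of why opening a fresh session per lookup overflows a QueuePool of size 5, while reusing one session does not:

```python
class FakePool:
    """Counts concurrently checked-out connections, loosely like SQLAlchemy's QueuePool."""
    def __init__(self, size):
        self.size = size
        self.checked_out = 0
        self.overflowed = False

    def connect(self):
        self.checked_out += 1
        if self.checked_out > self.size:
            self.overflowed = True

    def release(self):
        self.checked_out -= 1


def per_lookup_sessions(pool, n_lookups):
    # Anti-pattern: each concurrent lookup opens its own session/connection.
    for _ in range(n_lookups):
        pool.connect()
    for _ in range(n_lookups):
        pool.release()


def shared_session(pool, n_lookups):
    # Reuse one session for every lookup in the request.
    pool.connect()
    for _ in range(n_lookups):
        pass  # query through the same session
    pool.release()


pool = FakePool(size=5)
per_lookup_sessions(pool, 30)
print(pool.overflowed)  # True

pool = FakePool(size=5)
shared_session(pool, 30)
print(pool.overflowed)  # False
```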
*** mriedem has left #heat16:19
*** IlyaE has quit IRC16:20
smulcahyzaneb: there may be some cascading effect here too16:21
*** akuznetsov has quit IRC16:21
*** cmyster has quit IRC16:21
zanebI'd look very closely at the Metadata class16:22
smulcahyzaneb: Can you put any suggestions and/or requests for more info on https://bugs.launchpad.net/heat/+bug/1306743 - it would help us in digging deeper on this16:22
uvirtbotLaunchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]16:22
*** cmyster has joined #heat16:23
*** cmyster has joined #heat16:23
thervesmulcahy, How many http requests to heat do you get?16:24
smulcahytherve: again, could you post these questions to the bug - I'll need to dig to answer them16:25
*** pablosan has quit IRC16:29
*** pablosan has joined #heat16:29
*** IlyaE has joined #heat16:30
*** ramishra has quit IRC16:31
zanebsmulcahy, therve: done16:32
*** zhiyan_ is now known as zhiyan16:34
*** zhiyan is now known as zhiyan_16:41
*** gokrokve has joined #heat16:47
*** harlowja_away is now known as harlowja16:51
*** cmyster has quit IRC16:52
*** yassine has quit IRC16:54
*** cmyster has joined #heat16:54
*** cmyster has joined #heat16:54
*** wendar has quit IRC16:59
*** wendar has joined #heat17:01
*** julienvey has quit IRC17:02
*** derekh has quit IRC17:02
*** jstrachan has joined #heat17:09
*** akuznets_ has quit IRC17:11
*** Lotus907efi has joined #heat17:11
*** Carlos44 has joined #heat17:12
*** denis_makogon has quit IRC17:13
*** dmakogon_ has joined #heat17:13
Carlos44hey everyone, i have hit this bug https://bugs.launchpad.net/heat/+bug/1290274 trying to get heat db sync'd. has anyone found how to get by this?17:13
uvirtbotLaunchpad bug 1290274 in heat "Index on 'tenant' column will be inefficient and 767 bytes per index key on MySQL " [Medium,Triaged]17:13
Carlos44that is the one. any possible manual fixes?17:16
*** jaustinpage has quit IRC17:18
*** Carlos44 has quit IRC17:19
*** Carlos44 has joined #heat17:20
*** david-lyle has joined #heat17:20
*** IlyaE has quit IRC17:26
sdakezaneb i'll be at my folks for dinner during the meeting time - so won't be able to make it17:26
sdakeenjoy :)17:26
harlowjazaneb: any heat guys around? got a probably easy question about whether heat could do something a customer (mail) is asking for17:26
sdakeharlowja heat cannot solve world hunger17:27
harlowja:(17:27
sdakeotherwise its great!17:27
harlowjawill it solve my business continuity17:27
Lotus907efiis heat buzzword compatible?17:28
harlowjaha, anyway the simple question is, mail wants to basically start up CI servers, but have them auto-delete if they aren't used after X minutes. my knowledge of heat is not so much, but i thought that it had some type of capability to do this, but i can't remember anymore17:28
sdakebusiness continuity - in the context of 1) monitor for failures 2) recover from failures 3) notify of failures 4) escalate on repeated failures17:28
sdakeharlowja that is not business continuity imo :)17:29
sdakebut yes, autoscaling will do that17:29
harlowjalol17:29
harlowjaany good docs i can reference about this?17:30
sdakethe developer docs for openstack show the resources you would want to use17:30
sdakethe heat templates repo contains a autoscaling example17:30
sdakeimo autoscaling needs love17:30
*** jaustinpage has joined #heat17:30
sdakethe specific problem you mention, which is autodelete a specific node if it is underutilized, heat will not do17:30
sdakeheat will take a holistic approach to machines in an autoscaling group and scale up or down based upon metrics17:31
sdakebut it doesn't target machines that are at low-utilization for removal17:31
Lotus907efiis there any documentation that would lead a newbie through all necessary steps to do a simple example of using heat and cloud-init to do a semi-routine config task on newly booted system?17:31
harlowjakk17:31
sdakeheat expects a load balancer to run in front of the services to evenly spread load17:31
*** lindsayk has joined #heat17:31
sdakeso realistically when a node is killed off by autoscaling, it would have a similar load as other vms17:32
*** kgriffs is now known as kgriffs|afk17:32
harlowjaright right, makes sense17:32
Lotus907efiI have been looking around a for a few days and reading stuff but I am still a very confused newbie when it comes to using heat meta-data / user-data to get cloud-init to do things on first boot17:32
Lotus907efiand the example yaml files I have looked at seem a little vague17:33
lifelesszaneb: o/ I'm around now17:34
harlowjathx sdake , let me see if i can further figure out what the heck mail people want to do17:35
*** david-lyle has quit IRC17:35
harlowjaha17:35
*** yogesh has joined #heat17:36
*** lindsayk has quit IRC17:36
*** jstrachan has quit IRC17:38
*** TonyBurn has quit IRC17:39
sdakeyou mean yahoo mail harlowja?17:42
*** petertoft has quit IRC17:43
harlowjaya17:43
harlowjai do17:43
sdakei suspect they don't care about killing a low-use node if they have a LB in front17:44
*** lindsayk has joined #heat17:44
sdakethey want to reduce utilization holistically rather than specifically17:44
sdakewe do have some folks that want to be able to target specific nodes for quiesce and kill17:44
sdakebut that isn't implemented (yet)17:45
harlowjasdake this is also for there CI, not neccasrily for the mail facing servers yet, so i think we're trying to figure out what exactly they want still :)17:45
sdakeharlowja http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::AutoScalingGroup and http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::ScalingPolicy17:46
sdakepolicy controls the group17:47
sdakegroup contains the collection of vms17:47
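A hedged sketch of how the two resources linked above fit together in a HOT template; the image/flavor values and resource names are illustrative, and the alarm wiring (e.g. Ceilometer) that would actually trigger the policy is omitted:

```yaml
heat_template_version: 2013-05-23
resources:
  ci_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: ci-worker-image   # assumption: a pre-built CI worker image
          flavor: m1.small
  scale_down:
    type: OS::Heat::ScalingPolicy
    properties:
      auto_scaling_group_id: {get_resource: ci_group}
      adjustment_type: change_in_capacity
      scaling_adjustment: -1
      cooldown: 600
```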
harlowjathx sdake i'll see if i can find out more about their requirement still, and reference the above as a possible way (once the requirement becomes less blurry)17:48
sdakelotus907efi have you launched your first stack?17:49
Lotus907efiI have been playing around with tripleo for about a month now17:50
sdakelotus907efi http://openstack.redhat.com/Deploy_Heat_and_launch_your_first_Application17:50
sdakelotus907efi http://openstack.redhat.com/Deploy_an_application_with_Heat17:51
Lotus907eficool, thanks I will read those17:51
sdakethe first step of heat is launching a stack17:51
Lotus907efiok17:51
sdakeonce you get that down, you can play with the various heat API operations via cli17:51
sdakeonce you understand the clis, you can dig into writing your own templates17:51
sdakeI'd proceed in that order :)17:51
sdakeif you have a working openstack install, it should take less than 30 minutes to get through those 3 things17:52
sdakeone option that works really nicely - rax has stood up heat in their infrastructure, so all you need is the python client (and a credit card) to launch stacks :)17:52
Lotus907efiah, well I have two or three tripleo devtest environments running17:55
Lotus907efiand I have been digging into those17:55
*** gokrokve has quit IRC17:57
sdakelotus907efi I have not tried tripleo + heat17:57
Lotus907efitripleo has heat integrated fully into it17:57
sdakeI intend to get heavily involved at least in using that model during juno tho17:57
Lotus907efitripleo uses heat to bring up the undercloud and overcloud stacks that running devtest produces17:58
sdakelotus907efi yes I know - I think the heat core is going to take a keen interest in making sure that works moar better in juno17:59
Lotus907efiand all of those undercloud and overcloud systems have meta-data servers running at http://169.254.169.25418:00
*** jprovazn_afk is now known as jprovazn18:00
*** spzala has quit IRC18:00
Lotus907efiso are you saying that the heat bits built into tripleo might not be the most up to date fully cooked bits?18:00
cmysterevening18:01
sdakeheat bits built into tripleo are most up to date yes18:01
*** spzala has joined #heat18:02
sdakelotus907efi I get the impression the integration could be improved18:02
zaneblifeless: SpamapS answered my question already; I added some stuff to the bug18:02
*** lindsayk1 has joined #heat18:02
*** kgriffs|afk is now known as kgriffs18:02
sdakemostly from the heat side18:02
Lotus907efiah, ok18:02
sdake(eg heat has gaps that need feed and care)18:02
Lotus907efihmm, from what I can see from what little I have used it the heat stuff in tripleo seems to work pretty well18:03
*** lindsayk has quit IRC18:04
*** e0ne has joined #heat18:05
sdakeI think SpamapS would argue with your definition of seems :)18:06
openstackgerritAndreas Jaeger proposed a change to openstack/heat: Check that all po/pot files are valid  https://review.openstack.org/8422618:06
*** ramishra has joined #heat18:09
*** jstrachan has joined #heat18:09
Lotus907efiah, well he is supposed to be on vacation so not allowed to grouse about stuff now18:09
*** e0ne has quit IRC18:10
*** lindsayk1 has quit IRC18:11
*** zhangyang has quit IRC18:11
sdagueshardy: you still awake?18:12
*** e0ne has joined #heat18:12
sdaguethere is another race I'm seeing here - https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/test_update.py#L73-L7818:12
sdaguewill a stack go through a state transition so that we can wait on something?18:13
*** ramishra has quit IRC18:13
cmysterhi sdague18:13
sdaguebecause it looks like a decent amount of the time the update isn't processing before the list pulls it back18:13
*** rbuilta1 has joined #heat18:13
sdaguecmyster: hi18:14
*** rbuilta has quit IRC18:15
sdagueactually, that's a more general heat question on resource wait on update18:15
*** adeb_ has joined #heat18:16
sdaguethis has failed 39 times in the last 24 hrs, so pretty bad - http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkZBSUw6IHRlbXBlc3QuYXBpLm9yY2hlc3RyYXRpb24uc3RhY2tzLnRlc3RfdXBkYXRlLlVwZGF0ZVN0YWNrVGVzdEpTT04udGVzdF9zdGFja191cGRhdGVfYWRkX3JlbW92ZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzk3NjcxNzUzOTE2fQ==18:16
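The state-transition question above is essentially a wait-loop problem; a tempest-style helper could poll until the stack leaves its transient state instead of asserting immediately after the update call. This is a sketch with assumed client and status names, not the actual tempest code:

```python
import time

def wait_for_stack_status(client, stack_id, expected='UPDATE_COMPLETE',
                          timeout=300, interval=1, sleep=time.sleep):
    """Poll until the stack reaches `expected`, fails, or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = client.get_stack_status(stack_id)
        if status == expected:
            return status
        if status.endswith('_FAILED'):
            raise RuntimeError('stack %s entered %s' % (stack_id, status))
        sleep(interval)
    raise RuntimeError('timed out waiting for %s' % expected)
```

The `sleep` parameter is only there so the loop can be exercised without real delays.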
Lotus907efisdake: one comment about that "Deploy an application with Heat" document - the sentence "There are a number of sample templates available in the github repo" where "github repo" is a link .... the link does not seem to actually lead to any example templates I can see18:17
*** jprovazn has quit IRC18:18
*** gokrokve has joined #heat18:18
*** jstrachan has quit IRC18:21
*** lindsayk has joined #heat18:22
thervesdague, "'Unknown resource Type : OS::Heat::RandomString" ? That's a really weird error18:23
cmysterdepends on version I guess...18:26
sdaguetherve: are you seeing a different issue than I am?18:29
sdaguehttp://logs.openstack.org/54/87554/9/check/check-tempest-dsvm-postgres-full/7922180/console.html#_2014-04-16_13_04_22_36618:29
*** vinsh has quit IRC18:29
sdaguethe mismatch is a race on update18:29
sdaguelooks like it happens 25% of the time18:30
*** spzala has quit IRC18:30
*** spzala has joined #heat18:30
*** pafuent1 has joined #heat18:33
*** spzala has quit IRC18:33
*** aweiteka has joined #heat18:34
sdaguetherve: here's the bug - https://bugs.launchpad.net/heat/+bug/130868218:35
uvirtbotLaunchpad bug 1308682 in tempest "Race in heat stack update " [Undecided,New]18:35
sdagueI'm curious if this is expected that stack_update is only eventually consistent18:35
sdagueand if so, if there is a wait condition to know it's done18:35
*** pafuent has quit IRC18:36
*** akuznetsov has joined #heat18:37
sdaguetherve: so if you have any thoughts before I just straight out skip the test18:37
sdaguewould be appreciated18:37
*** aweiteka has quit IRC18:43
*** kgriffs is now known as kgriffs|afk18:44
gokrokveHi. Is there any good example of HARestarter resource usage available? I checked here: https://github.com/openstack/heat-templates but did not find any.18:46
*** jprovazn has joined #heat18:47
*** chandan_kumar has joined #heat18:49
cmysterthere is actually18:52
cmysterhttp://zenodo.org/record/7571/files/CERN_openlab_report_Michelino.pdf18:52
*** russellb has quit IRC18:53
adeb_My setup requires going through a proxy for downloading things18:53
*** petertoft has joined #heat18:53
adeb_I am trying to create a stack using this https://github.com/openstack/heat-templates/blob/master/hot/hello_world.yaml template...but my create fails with the following error: Could not retrieve template: Failed to retrieve template: [Errno 110] ETIMEDOUT18:54
adeb_Is there any config file where I can set up the proxy18:54
*** aweiteka has joined #heat18:54
adeb_I already have the env variable http_proxy set up18:54
*** bgorski has joined #heat18:54
*** russellb has joined #heat18:55
gokrokvecmyster: Thanks!19:00
*** gondoi is now known as zz_gondoi19:01
*** tango has joined #heat19:01
cmysternp gokrokve19:03
cmysteradeb_: and proxy is working regularly otherwise?19:03
cmysteri.e. if you go online to some web site19:04
*** aweiteka has quit IRC19:08
*** ramishra has joined #heat19:09
*** nati_ueno has joined #heat19:10
*** ramishra has quit IRC19:14
*** e0ne has quit IRC19:15
*** jdob_ has joined #heat19:15
*** e0ne has joined #heat19:16
*** nati_ueno has quit IRC19:20
*** e0ne has quit IRC19:21
*** nati_ueno has joined #heat19:21
thervesdague, Sorry was away. Yes in your logstack results I saw a different error19:22
therveLike http://logs.openstack.org/92/85392/6/check/check-tempest-master-dsvm-full-havana/6fb8d82/console.html19:22
sdagueoh, yeh, that's another job19:23
adeb_yes, if I go to online to other sites it works19:23
sdagueI realized that later19:23
adeb_sorry was away19:23
sdagueso the failure rate is more like 10% I think19:25
sdaguecheck-tempest-master-dsvm-full-havana is our attempt to run tempest master on stable/havana19:25
therveStill pretty bad19:25
sdagueyeh19:25
sdaguetherve: so I pushed a skip19:25
therveYeah I saw :/19:26
sdaguehowever, that doesn't really answer the root question on whether that is expected to be non synchronous19:26
*** tspatzier has joined #heat19:26
sdagueand if it is, how would an api user know things were ready19:26
*** cmyster has quit IRC19:27
*** e0ne has joined #heat19:27
therveUh19:29
thervesdague, TemplateYAMLNegativeTestJSON is pretty bad... It connects to example.com19:29
*** chandan_kumar has quit IRC19:30
sdaguetherve: is it actually connecting?19:32
therveYeah :/19:32
therveUnrelated to your issues, just saw that in the logs19:32
*** nati_uen_ has joined #heat19:32
sdagueso if we give it a totally bogus dns name will it do the right thing?19:32
sdagueI agree we should get rid of network connects like that19:33
*** nati_ueno has quit IRC19:33
therveIt's doing a HTTP GET, so whatever answers fast would be nice19:34
*** e0ne has quit IRC19:34
*** cmyster has joined #heat19:34
*** cmyster has joined #heat19:34
*** e0ne has joined #heat19:35
*** nati_uen_ has quit IRC19:35
*** petertoft has quit IRC19:37
*** e0ne has quit IRC19:39
*** jistr has quit IRC19:40
thervesdague, So to get back to the problem, I think we simply have a race condition in Heat19:41
therveWe set the state to UPDATE_COMPLETE but it's not really complete19:41
sdagueok19:41
therveAnd transactions are for suckers, so...19:41
sdagueso then it was right that I also marked it as a heat bug19:41
therveI think so19:42
sdagueif you want to add that commentary in there, would be appreciated19:42
therveI'm interested in the other failure we're seeing too. It seems weird.19:42
therveWill do19:42
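The race described above can be illustrated with a toy model (not Heat's actual code): publishing UPDATE_COMPLETE before the per-resource work lands lets a poller observe COMPLETE with stale resources, while moving the status change to the end closes the window:

```python
def update_racy(stack, new_resources):
    stack['status'] = 'UPDATE_COMPLETE'   # published too early
    observed = dict(stack)                # an API reader polls at this moment
    stack['resources'] = list(new_resources)
    return observed

def update_fixed(stack, new_resources):
    stack['status'] = 'UPDATE_IN_PROGRESS'
    stack['resources'] = list(new_resources)
    stack['status'] = 'UPDATE_COMPLETE'   # only after all work is done
    observed = dict(stack)
    return observed
```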
sdagueyeh, well the tempest-master ones mostly would be an incompatible change from havana to now19:46
sdagueperhaps something was added?19:46
therveAh yes, those tests wouldn't pass on Heat havana19:48
*** spzala has joined #heat19:50
*** rbuilta1 has quit IRC19:53
*** zns has quit IRC19:53
*** saurabhs has joined #heat19:55
*** tspatzier has quit IRC19:55
*** vinsh has joined #heat19:56
*** akuznetsov has quit IRC19:57
*** chandan_kumar has joined #heat19:57
openstackgerritThomas Herve proposed a change to openstack/heat: Push COMPLETE status change at the end of update  https://review.openstack.org/8807519:57
*** IlyaE has joined #heat19:58
*** jprovazn has quit IRC19:59
*** jdob_ has quit IRC19:59
*** e0ne has joined #heat20:04
*** alexpilotti has quit IRC20:05
*** e0ne has quit IRC20:08
*** e0ne has joined #heat20:08
*** ramishra has joined #heat20:10
*** zns has joined #heat20:14
*** ramishra has quit IRC20:14
*** david-lyle has joined #heat20:16
*** jistr has joined #heat20:17
*** tspatzier has joined #heat20:20
sdagueso it looks like - https://review.openstack.org/#/c/87993/ doesn't fix anything20:21
sdaguewe're still failing on wait condition on that20:22
sdaguethe cloud init errors aren't fun there though - http://logs.openstack.org/93/87993/2/gate/gate-tempest-dsvm-neutron-heat-slow/82b8ac1/console.html#_2014-04-16_17_38_16_66420:23
sdaguewith the heat job at about a 50% fail rate it's bouncing everyone else's patches at this point. I think that if we can't resolve these soon we need to stop voting with it.20:25
*** radez_g0n3 has quit IRC20:26
*** radez_g0n3 has joined #heat20:26
*** aweiteka has joined #heat20:28
sdakesdague my guess there with that last trace is that the metadata server is not metadata serving20:31
sdakethe last trace you showed showed the instance was being orchestrated up to 560 sec and timed out right around 600sec20:32
sdakethis trace doesn't show anything after 300 sec20:32
sdakewhich implies cloud-init is spinning waiting for the metadata server to provide it the goods20:33
stevebakersdague: \o20:33
*** Tross1 has quit IRC20:33
sdakesdague I think you mentioned in some cases neutron doesn't setup the network properly?20:33
*** Tross has joined #heat20:33
stevebakerwhat is that failed to set hostname error about?20:34
sdakeno idea never seen that before20:35
sdakepossibly network connectivity problems not allowing nova to work with the storage?20:36
*** blomquisg has quit IRC20:36
sdakeone thing to eliminate is "does the network actually work properly" prior to blaming wait conditions :)20:36
sdagueyeh, I think that's probably wise. The initial test conditions here assume a bit too much. So it's hard to work backwards to the failures.20:37
stevebakeryes, a waitcondition timeout is a symptom of one failure in a *very* long chain, where most parts are not heat related20:37
stevebakersdague: writing the boot log is meant to help diagnose these, but I'm happy for a whole bunch more debugging to be logged to diagnose these situations20:38
sdake_although if cloud-init fails to set that file, it could cause cloud-init to exit the init process20:39
sdake_and not actually orchestrate the instance20:39
sdake_although I've never seen that happen in years of working on the code20:39
sdake_(the particular error)20:39
stevebakerwe used to have set hostname failures before the name was pinned to under 63 chars20:40
therveIt looks like there is a 9 minute window of nothing20:40
sdaguesdake_: our experience is we run so much more throughput through the gate we see issues that no one else sees, because they happened once, and people just moved past20:40
sdake_sdague yup understood on that point20:40
sdaguestevebaker: so what kind of additional debug, or assert are you thinking here?20:41
sdaguewe could also be more deliberate about asserting on the way up with things we believe should be working20:41
stevebakersdague: I guess following the whole chain, checking networking connectivity from the server to the heat endpoint, checking heat-api-cfn responds to something20:42
*** gokrokve has quit IRC20:42
sdake_sdague most devs also use baremetal rather than virt on virt20:43
sdagueis there an existing heat-api-cfn client in the tree to do that easily20:43
therveMaybe set cloud init debug?20:43
stevebakersdague: if we could find the right assert, then we could at least fail early rather than timeout20:43
sdaguesdake_: sure, but that should only change timing20:43
*** tspatzier has quit IRC20:43
sdaguethis shouldn't *not work* in this environment20:43
sdagueso it will expose different races20:43
sdake_well virt on virt was a POS when I tested previously20:44
sdake_stack traces, kernel oopses, machine lockups, etc20:44
sdake_2014-04-16 17:38:16.628 | [    0.015000] WARNING: This combination of AMD processors is not suitable for SMP.20:44
sdake_this kernel warning looks problematic20:44
sdaguewell we've not really had any guest issues previously20:45
stevebakersdague: is this an issue that only happens on rax or HP?20:45
sdaguestevebaker: good question, let me check20:46
sdaguehttp://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwiaGVhdC5lbmdpbmUucmVzb3VyY2UgV2FpdENvbmRpdGlvblRpbWVvdXQ6IDAgb2YgMSByZWNlaXZlZFwiIEFORCB0YWdzOlwic2NyZWVuLWgtZW5nLnR4dFwiXG4iLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwibW9kZSI6InNjb3JlIiwiYW5hbHl6ZV9maWVsZCI6ImJ1aWxkX25vZGUifQ==20:47
sdagueseems pretty equal opportunity20:47
sdake_mostly hp cloud20:47
sdake_is the load 50/50 in the gate?20:48
sdaguewell, that's only the top 25 events20:48
sdaguetop 25 facets20:48
sdaguewe run 50 ish20:48
sdagueThis combination of AMD processors is not suitable for SMP. would be a rax error though20:49
sdaguehp runs intel procs20:49
sdaguebut I really don't think that's the issue20:49
sdake_not sure if it is just pointing it out20:49
sdake_my top two guesses would be network or virt on virt20:51
sdake_i assume other guest tests don't run f2020:51
sdaguecorrect, we're running cirros20:52
sdake_perhaps f20 has some incompatibility with the hypervisor on those environments20:52
sdaguecould be20:52
thervesdague, Presumably ssh access to those instances during a test is out of the question?20:52
sdaguetherve: no, not out of the question20:53
*** aweiteka has quit IRC20:53
sdaguethat would be completely kosher20:53
therveIt'd be interesting to see what's going on20:53
sdaguesure20:53
therveWe should have "Cloud-init v. 0.7.2 finished" in the logs, so I think it's still running20:53
sdaguewell the nova console log is there in the dump20:54
therveI'm blaming SSH access somewhere20:54
therves/SSH/network20:54
sdake_it is either running or exited in some undefined way20:54
stevebakerhistorically ssh was attempted before the waitcondition returned, but it was moved to after because of ssh timeouts20:54
sdake_systemctl should give output of the cloudinit results20:54
*** e0ne has quit IRC20:54
stevebakerbut actually the ssh timeout is probably exactly the same bug as our current waitcondition timeout20:54
*** e0ne has joined #heat20:55
sdake_would it be possible to gate with a distro other than f20?20:55
*** jdob has quit IRC20:55
sdake_so we can eliminate the distro as a source20:55
sdaguesdake_: if we can get this on cirros, that's super easy, as the image is there already20:55
sdake_does cirros have the proper cloudinit?20:56
sdaguesdake_: no, it's got some lightweight scripts that do part of it20:56
sdagueenough for nova tests to work20:57
*** asalkeld has joined #heat20:57
thervestevebaker, Actually you're right, we should see SSH host key generation in there20:57
sdake_sdague heat definitely needs cloud-init20:57
sdaguethat should be in the backlog, we were going through that this morning actually to try to figure out20:57
*** e0ne has quit IRC20:57
stevebakersdague: that test is to test cfn-init, so cirros is out. But there could be another test which uses cirros to test end to end connectivity, and cfn-signal could be replaced with curl20:58
*** pafuent1 has quit IRC20:58
stevebakersdake_: we could only switch to ubuntu when a solution is found for building images in gate20:59
sdake_well with f20 - it could be a kernel bug, a systemd bug, a cloud init bug21:00
sdake_any one of those things could fail and the test would not complete21:00
sdagueok, so it feels like we have a bunch of loose ends here. The real question is what made this spike at about 18:00 UTC Apr 1421:01
sdake_with cirros + curl, it could only be a systemd bug21:01
sdake_sorry, with cirros + curl - if the gate works, then it is definitely an f20 problem21:01
sdake_with cirros + curl - if the gate fails - likely a network problem21:01
sdagueyep, sure21:02
*** gokrokve has joined #heat21:02
stevebakerwell, a problem which f20 reveals. It could equally be a neutron or nova bug21:02
*** tspatzier has joined #heat21:02
sdake_stevebaker agree21:02
sdaguestevebaker: definitely could be, however, we're not seeing the same high level of failure on nova21:03
sdaguewhich does ~180 guest starts during a run21:03
sdagueneutron is only probably at about 60 guest starts I think21:03
*** kgriffs|afk is now known as kgriffs21:03
stevebakersdague: try starting ~180 f20s ;)21:03
sdaguestevebaker: :)21:03
*** kgriffs is now known as kgriffs|afk21:04
sdaguemy point is we start 1, and we get really high failure. So the guest hypothesis is potentionally interesting21:04
thervestevebaker, So the test is doing mostly the same thing as the neutron ones that work, except cfn-init21:05
sdake_ya makes sense21:05
therveWhich does a metadata retrieve21:05
sdake_cloud-init does the metadata retrieval, not cfn-init21:05
thervesdake_: Heat metadata21:06
sdagueok, so we have some long term items, but we also have the short term issue of the 50% failure rate, which is liable to get flaming torches soon21:06
stevebakertherve: the cirros test could grep the nova metadata service user_data for some pattern, then curl signal the result. No cloud-init, no heat-cfntools21:06
sdake_so course of action - make heat gate non-voting until we get to bottom of problem21:06
sdagueyeh, that was basically the option I was going to ask about21:06
sdake_#2 make a cirros test which uses curl to identify if f20 is the cause21:07
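A sketch of what the cirros + curl idea above could look like in-guest, with no cloud-init or heat-cfntools involved. The endpoint, pattern, and function names are assumptions for illustration, not the real gate job:

```shell
#!/bin/sh
# Build a CFN-style wait condition signal body.
build_signal_body() {
    status="$1"   # SUCCESS or FAILURE
    reason="$2"
    printf '{"Status": "%s", "Reason": "%s", "UniqueId": "00000", "Data": "test complete"}' \
        "$status" "$reason"
}

# Fetch user_data from the nova metadata service and signal the result to a
# pre-signed wait condition URL (passed in via user_data). Only runs inside
# a guest; shown for illustration.
signal_from_guest() {
    md_url="http://169.254.169.254/latest/user-data"
    wc_url="$1"
    if curl -sf "$md_url" | grep -q "expected-pattern"; then
        build_signal_body SUCCESS "pattern found" |
            curl -sf -X PUT -H 'Content-Type: application/json' -d @- "$wc_url"
    else
        build_signal_body FAILURE "pattern missing" |
            curl -sf -X PUT -H 'Content-Type: application/json' -d @- "$wc_url"
    fi
}
```

If this passes in the gate where the f20 job fails, the distro image is implicated; if it fails too, the network path is.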
stevebakersdague: we could skip that test, and have some gerrit changes which unskip it while we continue to diagnose21:07
sdaguestevebaker: that's an option as well, skipping the test means we won't get any data on it though21:07
sdaguevs. non voting, where the runs will still happen21:07
thervestevebaker, Right but it doesn't solve the issue of that test21:07
stevebakeryeah, non-voting might be best21:07
sdagueso I'd like heat core team pov on which you guys think is best21:08
sdagueI'm happy to execute on either of them21:08
therveNote that cfn-signal works, the problem seems to be with cfn-init21:08
stevebakertherve: it looks like cfn-init isn't being run21:08
sdake_given the # of failures of f20 vs cirros coupled with the number of guest starts, I think we need to identify if f20 is part of the problem21:08
stevebakertherve: you mean cloud-init?21:08
*** jaustinpage has quit IRC21:09
thervestevebaker, No I mean cfn-init21:09
sdagueok, got to drop for a few to relocate. I should be back on in 30 mins or so.21:09
thervestevebaker, Difference between https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/cfn_init_signal.yaml  and https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/neutron_basic.yaml21:10
stevebakertherve: there is no evidence in the log that cfn-init is failing though21:11
sdake_it doesn't seem that cfn-init is run21:12
therveMy guess would be that it hangs21:12
therveThe first thing it does is connecting to heat21:13
sdake_my guess would be that cloud-init hangs :)21:13
stevebakertherve: no, cfn-init just consumes metadata data which cloud-init (and loguserdata.py) has already written to disk21:13
thervestevebaker, ? It connects to heat metadata server, no?21:13
sdake_cfn-init does not connect to any server unless yum or deb are specified as files21:14
sdake_it reads off the local disk21:14
therveThat's not what I understand of the code21:14
stevebakertherve: no. cloud-init fetches the user_data from the nova metadata server with an http GET21:15
stevebakertherve: the user_data is a mime package containing the cfn metadata, plus loguserdata.py which cloud-init invokes to write that metadata to disk21:15
stevebaker(plus a bunch of other stuff)21:15
thervestevebaker, https://github.com/openstack/heat-cfntools/blob/master/heat_cfntools/cfntools/cfn_helper.py#L111921:16
therveIt seems we first try to get the remote metadata21:16
therveAnd then fail back to local files21:16
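A hedged sketch of the fetch order being discussed in cfn_helper: try the remote (heat) metadata endpoint first, then fall back to the copy already written to disk. `fetch_remote` is a stand-in for the real call, and the default path is from memory of heat-cfntools:

```python
import json
import os

def load_metadata(fetch_remote,
                  local_path='/var/lib/heat-cfntools/cfn-init-data'):
    try:
        return fetch_remote()   # e.g. DescribeStackResource over HTTP;
                                # a hang or timeout here stalls cfn-init
    except Exception:
        pass
    if os.path.exists(local_path):
        with open(local_path) as f:
            return json.load(f)
    return None
```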
sdake_therve the getting of the remote metadata described in line 1119 happens via cloud-init21:17
sdake_atleast that is how it behaved in the past :)21:18
therveremote_metadata seems pretty remote to me21:19
stevebakertherve: right, ok. We could not write out /etc/cfn/cfn-credentials to see if the test gets further, but it would probably fail for the same reason when attempting to cfn-signal21:19
sdaketherve good point21:19
therveYeah I'd be curious, but anyway21:20
sdaketherve my apologies that is new code :)21:20
thervestevebaker, Maybe we can use a custom image and set debug to some stuff?21:20
therveLike cloud-init and cfn-tools21:20
stevebakertherve: yeah, sorry. I forgot the reason I wrote out cfn-credentials21:20
*** andrew_plunk has quit IRC21:21
*** zns has quit IRC21:31
*** vijendar has quit IRC21:32
*** jistr has quit IRC21:33
*** achampion has quit IRC21:34
*** dims has quit IRC21:52
*** dims has joined #heat22:04
*** e0ne has joined #heat22:05
*** e0ne has quit IRC22:10
*** lindsayk has quit IRC22:13
*** lindsayk has joined #heat22:13
*** lindsayk has quit IRC22:13
*** lindsayk has joined #heat22:15
*** zns has joined #heat22:16
gokrokveHi. Is it possible to use Ceilometer alarms for HARestarter instead of CloudWatch alarms?22:19
sdaguestevebaker / therve / sdake_ : if you guys are good with this, please +1 - https://review.openstack.org/88100 - it's making the job non voting22:20
*** tspatzier has quit IRC22:20
*** Tross1 has joined #heat22:28
*** lindsayk has quit IRC22:30
*** lindsayk has joined #heat22:30
*** Tross has quit IRC22:30
*** sjmc7 has quit IRC22:34
stevebakersdague: +122:36
stevebakerand if any heat-core +2s a change without checking the reason for a heat-slow failure, I keel yooo!22:36
stevebakerwinky face22:37
sdaguehehe22:38
stevebakergokrokve: Yes, but I think you still need to use cfn-push-stats, so the metrics go through heat on the way to ceilometer22:38
SpamapSstevebaker: hah, that ventriloquist guy lives in my neighborhood.. ;)22:38
gokrokvestevebaker: Do I need to configure cfn-credentials for that?22:39
stevebakergokrokve: um, yes?22:40
*** yogesh has quit IRC22:40
stevebakergokrokve: there must be an example template somewhere22:40
gokrokvestevebaker: What about cfn-hup? Should it be in crontab for that?22:40
gokrokvestevebaker: I've got a json example from the CERN guys. It's like 2 pages of bash magic in user-data :-(22:41
mattoliveraumorning all22:43
*** IlyaE has quit IRC22:44
*** zns has quit IRC22:46
stevebakergokrokve: I would avoid cfn-hup, its a very complicated way of achieving configuration updates22:48
gokrokvestevebaker: I see it in all examples. So what will be the best way to setup a VM to report status to Ceilometer?22:48
stevebakergokrokve: look for cfn-push-stats22:48
stevebakergokrokve: call it from cron or a bash loop22:49
gokrokvestevebaker: Cool. So I need to set up cfn-credentials with some secure key and then set up crontab to run cfn-push-stats22:50
stevebakergokrokve: yes22:50
gokrokvestevebaker: Then create a Ceilometer alarm for specific instance gauge22:50
gokrokvestevebaker: Ok. Thanks. Will try to figure out how to glue this all together22:50
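[Editor's note] The pieces discussed above (an instance pushing a heartbeat metric via cfn-push-stats on a cron schedule, and a Ceilometer alarm that fires an HARestarter) could be glued together roughly as below. This is a hedged sketch, not a tested template: the resource names, meter name, and watch name are illustrative, and the cfn-push-stats flags should be checked against the installed heat-cfntools version.

```yaml
# Illustrative CFN-style fragment: a Ceilometer alarm driving HARestarter
Resources:
  Server:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: {Ref: ImageId}
      InstanceType: m1.small
      UserData:
        Fn::Base64:
          Fn::Join:
          - ''
          - - "#!/bin/bash -v\n"
            - "# push a heartbeat sample once a minute (watch name is a literal here)\n"
            - "echo '* * * * * root /opt/aws/bin/cfn-push-stats --heartbeat"
            - " --watch HeartbeatWatch' > /etc/cron.d/heartbeat\n"
  Restarter:
    Type: OS::Heat::HARestarter
    Properties:
      InstanceId: {Ref: Server}
  HeartbeatAlarm:
    Type: OS::Ceilometer::Alarm
    Properties:
      description: restart the server when heartbeats stop arriving
      meter_name: Heartbeat        # illustrative; match what cfn-push-stats reports
      statistic: count
      period: 60
      evaluation_periods: 1
      threshold: 1
      comparison_operator: lt
      alarm_actions:
      - {"Fn::GetAtt": [Restarter, AlarmUrl]}
```

As discussed above, cfn-credentials still needs to be present on the instance (via an IAM user and access key, as in the existing HA example templates) so the pushed metrics go through heat on their way to Ceilometer.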
stevebakergokrokve: or you could use python instead of cfn-push-stats https://review.openstack.org/#/c/44967/5/tempest/scenario/orchestration/test_autoscaling.yaml22:51
*** lnxnut_ has joined #heat22:51
stevebakergokrokve: which would be pure boto22:51
*** zns has joined #heat22:51
*** lnxnut has quit IRC22:51
*** lnxnut_ has quit IRC22:51
*** lnxnut has joined #heat22:52
gokrokvestevebaker: That is great. I will probably use python version as it is much clearer.22:52
SpamapSstevebaker: so, slow polling for metadata...22:59
SpamapSstevebaker: do we actually need to parse the whole stack, to just pull the metadata for a server?22:59
*** david-lyle has quit IRC23:01
stevebakerSpamapS: the current implementation of _authorize_stack_user requires a parsed stack23:01
*** adeb_ has quit IRC23:01
SpamapSstevebaker: so here's a thought for a potential optimization: shove metadata into swift, and hand out tempurls to said metadata.23:02
stevebakerSpamapS: my POLL_DEPLOYMENTS plan would be very low overhead, it's just a formatted SQL query23:03
SpamapSstevebaker: if that is too radical, we could also just precompute the inputs for _authorize_stack_user and save them in resource_data.23:03
stevebakeryeah, there are lots of potential optimisations23:04
stevebakerSpamapS: does raising max_pool_size in heat.conf mitigate this?23:05
SpamapSstevebaker: given that heat-engine is hitting 100% CPU, I think that will just change it from 500 errors to timeouts23:05
stevebakeryeah, ok23:05
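[Editor's note] For reference, SpamapS's TempURL suggestion relies on Swift temporary URLs being nothing more than an HMAC-SHA1 over the request method, an expiry time and the object path, signed with the account's X-Account-Meta-Temp-URL-Key. The engine could write a server's metadata to an object once and hand the server a time-limited, unauthenticated GET URL, avoiding per-poll token work entirely. A minimal sketch of the standard signature computation (the account/container/object path and key are hypothetical):

```python
import hmac
import time
from hashlib import sha1

def make_tempurl(key, method, path, expires_in):
    """Build a Swift TempURL query suffix for `path`
    (e.g. /v1/AUTH_acct/container/obj), signed with the
    account's X-Account-Meta-Temp-URL-Key."""
    expires = int(time.time()) + expires_in
    # The signed body is exactly: METHOD \n expires \n path
    body = '%s\n%s\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%s' % (path, sig, expires)

# A server could then poll this URL with a plain HTTP GET, no token needed:
url = make_tempurl('secret', 'GET', '/v1/AUTH_demo/metadata/server-1', 300)
```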
*** IlyaE has joined #heat23:06
stevebakerSpamapS: Once https://review.openstack.org/#/c/84269/ has landed I'll carry on with a collector which calls heatclient.software_deployments.metadata directly23:08
SpamapSstevebaker: it's got my +2 :)23:11
*** killer_prince has quit IRC23:14
*** ifarkas has quit IRC23:14
stevebakerSpamapS: cool. do you see any issue with getting a new keystone token every 30 seconds for every occ (os-collect-config) based server?23:14
SpamapSstevebaker: seems like a huge waste.23:17
SpamapSstevebaker: have to run for a while.. but I'll be back in a while.23:17
stevebakerSpamapS: it does, doesn't it. The collector should really keep using the token until it is close to expiring23:20
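[Editor's note] The token-reuse idea stevebaker lands on here can be sketched as a small cache that only re-authenticates when the current token is near expiry. This is a hedged illustration, not heat code; the `fetch` callable stands in for a real keystoneclient authentication call:

```python
import time

class TokenCache(object):
    """Reuse a keystone token until shortly before it expires."""

    def __init__(self, fetch, slack=60):
        self._fetch = fetch    # () -> (token, expires_at_epoch_seconds)
        self._slack = slack    # re-authenticate this many seconds early
        self._token = None
        self._expires = 0

    def get(self):
        # Only hit keystone when we have no token or it is about to expire
        if self._token is None or time.time() > self._expires - self._slack:
            self._token, self._expires = self._fetch()
        return self._token

# A fake fetch standing in for keystone; it counts real authentications.
calls = []
def fake_fetch():
    calls.append(1)
    return 'tok-%d' % len(calls), time.time() + 3600

cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()   # well within expiry: no new authentication
```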
*** lazy_prince has joined #heat23:20
*** lazy_prince is now known as killer_prince23:20
*** gokrokve has quit IRC23:23
*** vinsh has quit IRC23:26
*** asalkeld_ has joined #heat23:30
*** lipinski has quit IRC23:30
*** asalkeld has quit IRC23:30
*** zns has quit IRC23:32
*** arbylee has quit IRC23:34
*** achampion has joined #heat23:36
*** chandan_kumar has quit IRC23:37
*** andersonvom has quit IRC23:40
cmysterI am going over the API in http://api.openstack.org/api-ref-orchestration.html and I was wondering how a software config can be updated?23:41
stevebakercmyster: by creating a new one with different contents, they are designed to be immutable23:42
cmysterstevebaker: needs to be the same name or something?23:42
*** asalkeld_ is now known as asalkeld23:43
*** arbylee has joined #heat23:43
stevebakercmyster: the deployment resource associates a config with a server, and creates derived configs whenever input_values change. It's the derived config which the server ends up with23:43
*** arbylee has quit IRC23:43
cmysterso from a user point of view, to replace a config is to delete and recreate?23:44
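[Editor's note] To make the immutability point concrete: in a template the user never updates a config in place. On stack-update, changing the config resource's contents makes Heat create a fresh, immutable SoftwareConfig, and the SoftwareDeployment then derives a new config for the server, matching stevebaker's "create a new one with different contents" answer. A rough HOT sketch, with resource names illustrative and the server resource omitted:

```yaml
heat_template_version: 2013-05-23
resources:
  config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        echo hello > /tmp/greeting
  deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: config}
      server: {get_resource: server}   # assumes a server resource elsewhere
```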
*** andersonvom has joined #heat23:44
*** cmyster has quit IRC23:49
*** andersonvom has quit IRC23:50
*** tango has quit IRC23:50
*** ramishra has joined #heat23:52
*** ramishra has quit IRC23:56

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!