Wednesday, 2014-04-16

*** lipinski has quit IRC00:01
*** ChanServ changes topic to "support @ https://ask.openstack.org | developer wiki @ https://wiki.openstack.org/wiki/Heat | development @ https://launchpad.net/heat | logged @ http://eavesdrop.openstack.org/irclogs/%23heat/"00:01
*** jay_t has quit IRC00:05
*** achampion has joined #heat00:10
*** arbylee has quit IRC00:11
*** andersonvom has quit IRC00:15
*** m_22 has quit IRC00:16
*** spzala has joined #heat00:18
aru__: thanks stevebaker  00:24
aru__: it seemed to solve the problem  00:24
*** asalkeld has quit IRC00:24
*** lindsayk1 has quit IRC00:25
*** arbylee has joined #heat00:26
*** matsuhashi has joined #heat00:30
*** lindsayk has joined #heat00:33
openstackgerrit: A change was merged to openstack/heat: Implement locking in abandon stack  https://review.openstack.org/86663  00:36
*** asalkeld has joined #heat00:37
*** blamar has quit IRC00:41
*** lindsayk has joined #heat00:49
*** andersonvom has joined #heat00:55
*** blamar has joined #heat00:59
*** spzala has quit IRC01:01
*** nati_uen_ has quit IRC01:20
*** andersonvom has quit IRC01:28
*** lindsayk has quit IRC01:30
*** daneyon has joined #heat01:32
*** Qiming has joined #heat01:34
*** david-lyle has joined #heat01:36
*** matsuhashi has quit IRC01:40
*** matsuhas_ has joined #heat01:43
*** julienvey has joined #heat01:46
*** julienvey has quit IRC01:51
*** david-lyle has quit IRC02:03
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Add --preview option to stack abandon command  https://review.openstack.org/84680  02:04
*** lipinski has joined #heat02:05
*** alexpilotti has quit IRC02:13
*** harlowja is now known as harlowja_away02:35
*** connie has joined #heat02:35
*** connie has quit IRC02:36
*** julienvey has joined #heat02:45
*** etoews has quit IRC02:47
*** matsuhas_ has quit IRC02:49
*** matsuhashi has joined #heat02:49
*** julienvey has quit IRC02:50
*** matsuhas_ has joined #heat02:52
*** matsuhashi has quit IRC02:52
*** matsuhas_ has quit IRC02:58
*** zhiyan_ is now known as zhiyan03:01
*** etoews has joined #heat03:05
*** sergmelikyan has quit IRC03:10
*** sergmelikyan has joined #heat03:13
*** etoews has quit IRC03:14
*** arbylee has quit IRC03:14
*** arbylee has joined #heat03:14
*** etoews has joined #heat03:22
*** ramishra has joined #heat03:23
*** etoews has quit IRC03:28
*** nosnos has quit IRC03:35
*** lipinski has quit IRC03:40
sdake: harlowja had early dinner - which message is interrupting your business continuity?  03:44
sdake: and on that note, i'm off to bed, enjoy :)  03:45
*** julienvey has joined #heat03:46
*** etoews has joined #heat03:47
*** julienvey has quit IRC03:51
openstackgerrit: Jun Jie Nan proposed a change to openstack/heat: Add preview option to stack abandon  https://review.openstack.org/84664  03:52
*** etoews has quit IRC03:52
*** IlyaE has quit IRC04:03
*** etoews has joined #heat04:06
*** sdake_ has joined #heat04:08
*** etoews has quit IRC04:10
*** asalkeld has quit IRC04:14
*** asalkeld has joined #heat04:16
*** IlyaE has joined #heat04:17
*** sergmelikyan has quit IRC04:21
*** sergmelikyan has joined #heat04:23
*** nosnos has joined #heat04:25
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Add code coverage in resource list test  https://review.openstack.org/87846  04:25
openstackgerrit: Jun Jie Nan proposed a change to openstack/python-heatclient: Fix empty resource list index out of range error  https://review.openstack.org/87269  04:25
*** saju_m has joined #heat04:27
*** saju_m has quit IRC04:27
*** aru__ has quit IRC04:28
*** saju_m has joined #heat04:31
*** aweiteka has joined #heat04:34
*** pithagora has joined #heat04:38
*** achampio1 has joined #heat04:42
*** achampion has quit IRC04:44
*** julienvey has joined #heat04:47
*** achampion has joined #heat04:47
*** achampio1 has quit IRC04:48
*** nanjj has joined #heat04:49
*** julienvey has quit IRC04:52
*** cmyster has joined #heat04:59
*** cmyster has joined #heat04:59
*** IlyaE has quit IRC05:07
*** nkhare has joined #heat05:09
*** etoews has joined #heat05:10
cmyster: morning  05:11
Qiming: morning  05:14
*** etoews has quit IRC  05:17
cmyster: how are you this morning Qiming?  05:17
Qiming: cmyster: feeling very hot in the office  05:18
cmyster: same here, summer has started very early this year...  05:20
*** chandan_kumar has joined #heat05:31
*** pithagora has quit IRC05:37
*** dmueller has joined #heat05:48
*** Qiming has quit IRC05:50
*** dmueller has quit IRC05:50
*** julienvey has joined #heat05:51
*** IlyaE has joined #heat05:54
*** julienvey has quit IRC05:55
*** sdague has quit IRC05:58
*** sdague has joined #heat06:05
*** slagle has quit IRC06:12
*** slagle has joined #heat06:13
*** saju_m has quit IRC06:17
*** etoews has joined #heat06:23
*** liang has joined #heat06:28
*** etoews has quit IRC06:29
*** saju_m has joined #heat06:29
*** fandi has joined #heat06:39
*** etoews has joined #heat06:40
*** etoews has quit IRC06:45
*** arbylee has quit IRC06:47
therve: Good morning!  06:50
*** tomek_adamczewsk has joined #heat  06:51
*** jprovazn has joined #heat  06:51
cmyster: morning  06:52
*** IlyaE has quit IRC  06:55
*** chandan_kumar has quit IRC  07:02
*** chandan_kumar has joined #heat  07:08
shardy: morning all  07:17
cmyster: morning  07:20
*** sdake has quit IRC07:21
*** jiangyaoguo has joined #heat07:21
*** jiangyaoguo has left #heat07:22
*** sdake has joined #heat07:36
*** tspatzier has joined #heat07:37
*** tspatzier has quit IRC07:42
*** arbylee has joined #heat07:48
*** asalkeld has quit IRC07:51
*** jistr has joined #heat07:52
*** akuznets_ has quit IRC07:53
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  07:54
*** arbylee has quit IRC07:55
pas-ha: morning all  07:56
skraynev: Morning all  07:56
*** akuznetsov has joined #heat  08:00
cmyster: morning  08:01
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Adding attribute schema class for attributes  https://review.openstack.org/86525  08:11
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Using attribute schema for building documentation  https://review.openstack.org/86803  08:11
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Deprecate first_address attribute of Server  https://review.openstack.org/86526  08:12
*** derekh has joined #heat  08:12
openstackgerrit: Sergey Kraynev proposed a change to openstack/heat: Deprecate first_address attribute of Server  https://review.openstack.org/86526  08:16
cmyster: http://download.fedoraproject.org/pub/fedora/linux/updates/20/Images/x86_64/Fedora-x86_64-20-20140407-sda.qcow2 is heartbleed free btw  08:17
*** e0ne has joined #heat08:21
*** che-arne has joined #heat08:33
*** sorantis has joined #heat08:35
*** petertoft has joined #heat08:35
*** pablosan is now known as zz_pablosan08:36
*** TonyBurn has joined #heat08:55
*** zhangyang has joined #heat08:56
*** rpothier has quit IRC09:00
*** rpothier has joined #heat09:01
*** chandan_kumar has quit IRC09:06
*** ramishra has quit IRC09:06
*** ramishra has joined #heat09:07
*** ramishra has quit IRC09:08
*** alexpilotti has joined #heat09:16
*** chandan_kumar has joined #heat09:20
*** chandan_kumar has quit IRC09:26
*** chandan_kumar has joined #heat09:26
*** tspatzier has joined #heat09:29
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  09:30
*** liang has quit IRC09:32
*** saju_m has quit IRC09:49
*** arbylee has joined #heat09:53
*** tspatzier has quit IRC09:56
*** arbylee has quit IRC09:58
*** nosnos has quit IRC10:07
openstackgerrit: A change was merged to openstack/heat: Add hint on creating new user for Heat in DevStack  https://review.openstack.org/87555  10:10
*** nosnos has joined #heat10:12
openstackgerrit: Zhang Yang proposed a change to openstack/heat: Allow DesiredCapacity to be zero  https://review.openstack.org/87204  10:14
*** dmakogon_ is now known as denis_makogon  10:14
shardy: If anyone wants lots of details on stack domain users, I just posted this:  10:15
shardy: http://hardysteven.blogspot.co.uk/2014/04/heat-auth-model-updates-part-2-stack.html  10:15
skraynev: shardy: thanks. will read ;)  10:16
*** e0ne has quit IRC10:17
*** e0ne has joined #heat10:18
*** Qiming has joined #heat10:18
*** nanjj has quit IRC10:21
*** e0ne has quit IRC10:22
*** etoews has joined #heat10:29
Qiming: shardy, thank you!  10:29
pas-ha: shardy: thanks, good read  10:31
shardy: Qiming: ah, you're here now, no problem :)  10:32
*** mestery_ has joined #heat10:33
*** nosnos has quit IRC10:34
*** etoews has quit IRC10:34
*** nosnos has joined #heat10:35
*** mestery has quit IRC10:36
Qiming: shardy, I am stuck on sending a signal to heat using a mechanism other than an ec2-signed URL  10:38
shardy: Qiming: You can send a signal via the native API  10:38
shardy: but not a WaitCondition notification at the moment  10:39
shardy: heat resource-signal ...  10:39
*** nosnos has quit IRC  10:39
Qiming: shardy: I read the deployment code where heat-config sends back a signal now  10:39
Qiming: in that implementation, the signalling side needs to have user-id, password, project-id, auth-url ...  10:40
shardy: Qiming: Yes, the SoftwareDeployment resources have been designed to use either ec2 signed URLs or native signals  10:40
shardy: yes, it uses a stack domain user and a randomly generated password  10:40
Qiming: I hope your blog will help me better understand how trusts work  10:40
shardy: My post from last week may do, but it's unrelated to in-instance signalling  10:40
shardy: please read both posts and come back if you still have questions :)  10:41
Qiming: then maybe I can try having the Ceilometer::Alarm post to a 'trust-url'?  10:41
shardy: Qiming: therve has posted patches which enable exactly that  10:42
Qiming: still not sure how to post some data back along with the signal  10:42
Qiming: shardy, that is the patch I will try, :)  10:42
* Qiming printed the blog posts and a dictionary, started to study English ...  10:43
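[For readers of the log: the native signal path shardy refers to above maps onto the python-heatclient CLI roughly as follows. The stack and resource names are placeholders, and the -D/--data flag for attaching a JSON payload may vary by client version.]

```shell
# Native (non-ec2-signed) signal to a stack resource, as discussed above.
# "mystack" and "signal_handle" are hypothetical names for illustration;
# -D attaches an optional JSON payload to the signal.
heat resource-signal mystack signal_handle -D '{"status": "SUCCESS"}'
```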
*** e0ne has joined #heat10:48
*** e0ne_ has joined #heat10:50
*** sorantis has quit IRC10:53
*** e0ne has quit IRC10:53
*** nkhare has quit IRC10:58
*** fandi has quit IRC11:04
*** e0ne_ has quit IRC11:06
*** e0ne has joined #heat11:06
*** Michalik- has joined #heat11:06
*** nosnos has joined #heat11:06
*** sorantis has joined #heat11:08
*** ifarkas has quit IRC11:10
*** ifarkas has joined #heat11:22
*** yassine has joined #heat11:33
*** lipinski has joined #heat11:39
sdake: morning  11:40
cmyster: morning  11:41
*** mkollaro has joined #heat  11:42
*** tspatzier has joined #heat  11:43
sdake: cmyster for some reason I thought TLV was in shutdown until the 22nd  11:43
cmyster: ummm  11:43
cmyster: I'm not really here?  11:43
sdake: hmm well you're in, so guess not :)  11:43
*** etoews has joined #heat11:49
*** etoews has quit IRC11:53
*** arbylee has joined #heat11:54
*** igormarnat_ has joined #heat11:57
*** arbylee has quit IRC11:58
*** tspatzier has quit IRC11:58
*** Qiming has quit IRC11:59
*** Qiming has joined #heat12:00
*** akuznetsov has quit IRC12:09
*** akuznets_ has joined #heat12:09
*** alexpilotti has quit IRC12:15
*** nosnos has quit IRC12:17
*** tspatzier has joined #heat12:22
*** achampion has quit IRC12:23
*** jdob has joined #heat12:38
*** slagle has quit IRC12:39
*** saju_m has joined #heat12:40
*** slagle has joined #heat12:40
*** rbuilta has joined #heat12:40
*** akuznets_ has quit IRC12:43
*** akuznetsov has joined #heat12:43
*** saju_m has quit IRC12:45
*** blomquisg has joined #heat12:46
*** saju_m has joined #heat13:02
*** pafuent has joined #heat13:03
*** spzala has joined #heat13:07
*** erecio has quit IRC13:08
*** alexpilotti has joined #heat13:14
*** erecio has joined #heat13:14
*** achampion has joined #heat13:16
*** mestery_ is now known as mestery13:22
*** ramishra has joined #heat13:27
*** jprovazn has quit IRC13:29
*** zz_gondoi is now known as gondoi13:31
*** gondoi is now known as zz_gondoi13:31
*** dims has quit IRC13:32
*** samstav has joined #heat13:34
*** zz_gondoi is now known as gondoi13:35
*** etoews has joined #heat13:36
*** igormarnat_ has quit IRC13:36
*** arbylee has joined #heat13:41
*** pafuent has left #heat13:43
*** pafuent has joined #heat13:44
*** gondoi is now known as zz_gondoi13:45
*** zz_gondoi is now known as gondoi13:51
*** arbylee has quit IRC13:52
*** arbylee has joined #heat13:52
*** jprovazn has joined #heat13:53
*** vijendar has joined #heat13:53
*** julienvey has joined #heat13:59
*** spzala has quit IRC14:00
*** spzala has joined #heat14:01
*** tspatzier has quit IRC14:02
*** aweiteka has quit IRC14:02
*** spzala has quit IRC14:04
*** sjmc7 has joined #heat14:05
*** aweiteka has joined #heat14:15
*** jaustinpage has joined #heat14:16
Qiming: shardy: there?  14:17
*** dims has joined #heat  14:18
shardy: Qiming: yes  14:20
Qiming: shardy: do we assign a role to a user, or assign a user to a role?  14:21
Qiming: or, there is no difference, :p  14:21
*** zns has joined #heat  14:22
shardy: Qiming: I would say you assign a role to a user, scoped to a project or domain  14:23
jaustinpage: shardy: re: today's blog post: how does heat handle signaling when it is in standalone mode?  14:23
Qiming: thanks, shardy  14:24
shardy: jaustinpage: assuming you don't have permission to create a new domain, you probably have to use the old fallback behavior, which is to create the users as before, in the project of the stack-owner  14:25
shardy: jaustinpage: I guess it depends on what level of control you have over the remote keystone  14:25
jaustinpage: shardy: but in standalone mode, i thought there was an assumption that you couldn't create users either...  14:25
jaustinpage: shardy: thanks for the reply  14:26
shardy: jaustinpage: I'm not aware of any such assumption, or none of the signalling features would have worked for anyone ever  14:26
jaustinpage: shardy  14:26
shardy: jaustinpage: happy to get use-case feedback though, if you have specific issues :)  14:26
jaustinpage: shardy: ok, thanks for the reply  14:27
shardy: jaustinpage: np  14:27
jaustinpage: shardy: one other question, you mentioned the ec2 method of passing keys, is there a significant difference between this method and the heat_signal method of passing keys?  14:28
jaustinpage: shardy: *from the heat engine to the vm being deployed, through cloud-init if i am understanding correctly...  14:28
shardy: jaustinpage: sorry, by heat_signal, you mean the native signals, e.g. heat resource-signal?  14:28
jaustinpage: i believe so, i am pretty sure the heat softwaredeployment resource makes use of the heat resource-signal  14:30
shardy: jaustinpage: Ah, HEAT_SIGNAL for SoftwareDeployment resources creates a stack domain user, but not an ec2 keypair; instead it creates a random password, and we use that from the instance  14:30
shardy: So the main difference is it removes the dependency on ec2tokens being enabled in keystone, which some deployers don't enable  14:31
*** julienvey has quit IRC  14:31
shardy: But the disadvantage is you have to obtain a token from the instance, e.g. heatclient has to connect to keystone then heat  14:31
jaustinpage: ok, so the instance, in order to signal back, would get a token from keystone, then use that to authenticate to the heat metadata?  14:31
shardy: jaustinpage: exactly  14:31
shardy: jaustinpage: we're still looking at ways we might avoid that additional call to keystone, x509 cert most likely  14:33
*** daneyon has quit IRC  14:34
jaustinpage: shardy: ok, cool. if the call to keystone could be avoided, it would seem that a custom authentication mechanism in the heat engine's pipeline could then work, and still have signalling support  14:34
*** dims has quit IRC14:34
jaustinpage: *heat engine's authentication pipeline  14:34
*** daneyon has joined #heat  14:35
shardy: jaustinpage: "custom authentication mechanism"?  14:35
shardy: jaustinpage: FWIW, we (or at least I) have been specifically trying to avoid inventing something heat-specific for this  14:35
jaustinpage: shardy: somebody writing one of these: https://github.com/openstack/heat/blob/master/heat/common/custom_backend_auth.py  14:36
lipinski: Any reason why the heat client and/or engine needs permissions to /lost+found ?  14:36
lipinski: I'm failing to create a stack because of permissions on /lost+found and /root - while the heat-engine is running as heat user  14:37
*** andrew_plunk has joined #heat  14:37
jaustinpage: shardy: yea, i can definitely understand trying to avoid having a custom authentication backend  14:37
shardy: jaustinpage: If you write your own auth middleware then the call to keystone becomes irrelevant, you just insert your m/w earlier in the paste pipeline, and modify the client to send whatever secret your auth scheme understands  14:37
*** sorantis has quit IRC  14:38
shardy: jaustinpage: The call to keystone is already optional e.g. in python-heatclient, so if for example you were using the heat-api-standalone API pipeline, you could just hard-code a password in all your templates  14:38
jaustinpage: shardy: thanks for the info, and thanks for putting up with all of my questions!  14:39
shardy: jaustinpage: that doesn't really solve the problem that some resources are integrated with keystone functionality though, so you might have to modify them as well as your middleware  14:40
sdague: hey folks, we're seeing a lot of inconsistent fails in the heat-slow jobs  14:40
sdague: it would be really great to get some eyes on some of these to figure out what's going wrong  14:40
shardy: jaustinpage: np, given me some things to think about re standalone mode, I've mostly been considering the integrated use-case  14:40
shardy: sdague: sure, got any links?  14:40
*** mriedem has joined #heat  14:41
mriedem: sdague: hi  14:41
sdague: mriedem has been doing some debug shardy, he should have some links on fails  14:41
jaustinpage: shardy: no worries, heat definitely walks the line between iaas and whatever is higher up the chain than iaas, which makes for some difficult choices when implementing features  14:41
shardy: jaustinpage: Yeah, that is the challenge with standalone mode.  Feel free to raise bugs if you have specific problems  14:42
*** mtreinish has joined #heat  14:42
mriedem: http://goo.gl/NNAUfK  14:42
mriedem: was just looking at the results for that fail after mtreinish raised the build timeout in tempest for heat jobs yesterday, which didn't help  14:42
mriedem: because the timeout happens in heat, not tempest  14:43
shardy: So the problem is a signal from the instance is not reaching heat  14:43
sdague: it also looks like it dramatically got worse  14:44
shardy: either because the instance is not running, the network is broken, or the VM deployment is just taking too long and the timeout is expiring  14:44
sdague: 2 days ago  14:44
mriedem: so i wonder if the nova/neutron timeout/wait stuff slowed this all down  14:44
mriedem: it is failing on a slow neutron job right?  14:44
mriedem: nova waits longer for neutron to callback  14:45
shardy: sdague: do we have any timing data, are VMs taking massively longer to launch recently?  14:45
mriedem: heat is waiting for nova?  14:45
mriedem: shardy: ^  14:45
mriedem: so if nova is waiting on neutron, and heat is waiting on nova, and that all slowed down with callbacks  14:45
mriedem: we're going to see timeouts  14:45
*** zz_pablosan is now known as pablosan  14:45
shardy: mriedem: Yes, heat is waiting for the VM to boot, some stuff to happen inside the VM, and a signal to be POSTed back to us  14:45
mriedem: shardy: is that controlled with stack_action_timeout?  14:46
mriedem: which defaults to 1 minute  14:46
mriedem: derp  14:46
shardy: mriedem: sec, let me look at the tests  14:46
mriedem: 1 hour i should say  14:46
shardy: http://docs.openstack.org/developer/heat/template_guide/cfn.html#AWS::CloudFormation::WaitCondition  14:46
shardy: It's controlled by the Timeout specified in the template  14:47
*** dims has joined #heat  14:47
shardy: do we know which test is failing?  14:47
mriedem: sec  14:47
*** igormarnat_ has joined #heat14:47
mriedem: there are a couple  14:48
mriedem: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html  14:48
sdague: shardy: do the heat tests actually require neutron? I wonder if it's better to disconnect them from neutron failure rates to actually test heat, instead of coupling heat issues to neutron fails  14:48
mriedem: search for FAIL:  14:48
sdague: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_01_860  14:48
sdague: 3 different tests failed in that one  14:48
shardy: sdague: Most tests probably don't, but e.g. api test_neutron_resources.py does :)  14:48
mriedem: sdague: what makes the slow job 'slow'? not run in parallel?  14:48
sdague: mriedem: each of these tests is slow  14:49
sdague: or some of them can be  14:49
*** akuznetsov has quit IRC  14:49
shardy: ServerCfnInitTestJSON.test_all_resources_created[slow]                631.788  14:49
shardy: that can't be right..  14:49
sdague: shardy: right now it's a coin flip to pass heat-slow - http://jogo.github.io/gate/  14:49
shardy: I thought the entire heat-slow job took about 27 minutes  14:50
sdague: the timing there is right  14:51
sdague: depending on the node it drifts from 280 -> 700s  14:51
sdague: honestly, the other tests take longer than reported, because some stuff is done in setupclass  14:51
mriedem: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_13_597  14:51
sdague: which isn't time accounted  14:51
mriedem: 651 sec for that run  14:51
sdague: yeh, it's been running that duration for as long as I can remember  14:52
shardy: So test_server_cfn_init.py is one that's failing, and that doesn't require neutron  14:52
shardy: Timeout: '600'  14:52
*** jaustinpage has quit IRC14:52
shardy: If the test is taking >600s that will timeout  14:52
mriedem: should be 1200 now in tempest: https://review.openstack.org/#/c/87691/  14:53
shardy: https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/cfn_init_signal.yaml#L71  14:53
mriedem: derp  14:53
sdague: the guest boot takes 53s  14:54
sdague: 563s  14:54
sdague: http://logs.openstack.org/47/84447/15/check/check-tempest-dsvm-neutron-heat-slow/f2a11ec/console.html#_2014-04-16_14_34_13_590  14:54
shardy: Just to boot? ouch :(  14:54
mriedem: so there is a configurable timeout in heat, there is a timeout for the heat-slow job, and there is a build_timeout for tempest.conf  14:54
mriedem: the tempest ones are the same value now, but the heat value isn't changed for the slow jobs  14:55
shardy: AFAICS those config values aren't overriding the values in the templates though  14:55
mriedem: they aren't  14:55
shardy: probably we should have a self.override_timeout(loaded_template) step in all the tests  14:56
sdague: shardy: well booting a full fedora 20 cloud guest 2nd level is slow  14:56
shardy: so we can globally configure the waitcondition timeout  14:56
sdague: could we build a cirros with cfn tools in it? or would that just get crazy  14:56
shardy: sdague: not sure tbh, I've only really used fedora images  14:57
shardy: sdague: if the image has python, cloud-init and boto, then probably  14:57
sdague: it has cloud init  14:57
sdague: I have no idea about the rest  14:58
shardy: cloud-init depends on boto, so it may work  14:58
shardy: there are a few other deps, but those are the main ones  14:58
*** Qiming has quit IRC14:58
*** akuznetsov has joined #heat15:00
SpamapS: boto is the devil  15:01
SpamapS: period  15:01
shardy: sdague, mriedem: want me to post a patch which aligns the WaitCondition timeout with build_timeout, but passing build_timeout as a parameter into the stack?  15:02
mriedem: shardy: yeah i was just looking at that  15:02
mriedem: when it reads the yaml file is it automatically converted to json?  15:02
sdague: cirros does not have python  15:03
shardy: mriedem: I think there are two ways, either directly override the Timeout in the template, or establish a convention where all templates containing a WaitCondition expose a parameter "timeout"  15:03
sdague: I wonder if they compiled down cloud init to a binary  15:03
shardy: sdague: How does cloud-init work then?  15:03
*** IlyaE has joined #heat  15:03
larsks: sdague shardy: cirros has a collection of shell scripts.  15:03
larsks: It's actually pretty clever, and the cli is somewhat nicer (it caches results locally, and provides cli tools for querying the data)  15:04
mriedem: shardy: i have no idea how to pass parameters to templates (never looked at heat before)  15:04
shardy: sdague: for WaitConditions, we don't actually need heat-cfntools, you can do it with just curl  15:04
*** TonyBurn has quit IRC  15:04
shardy: mriedem: Give me 10mins, I'll post a patch showing what I mean  15:05
*** igormarnat_ has left #heat  15:05
shardy: mriedem: what's the bug # for this issue?  15:05
mriedem: shardy: ok, thanks - fwiw the neutron_basic.yaml in tempest also has a 600 second wait timeout  15:05
*** julienvey has joined #heat  15:05
mriedem: 1297560  15:05
shardy: mriedem: thanks  15:05
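[For readers of the log: the parameter-passing convention shardy proposes here (a template-level "timeout" parameter fed from tempest's build_timeout) would look roughly like this on the CLI. The template filename, parameter name, and stack name are illustrative, not taken from the actual patch.]

```shell
# Passing a parameter into a stack at create time with python-heatclient.
# Assumes cfn_init_signal.yaml exposes a "timeout" parameter used as the
# WaitCondition Timeout; all names here are hypothetical.
heat stack-create test-stack -f cfn_init_signal.yaml -P timeout=1200
```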
*** jaustinpage has joined #heat15:06
sdague: larsks: so it just emulates cloud init? or is it totally different  15:06
*** sorantis has joined #heat  15:07
larsks: sdague: It doesn't really emulate cloud-init.  It will run scripts in user-data, though.  I don't think it makes any attempt at reading cloud-config format data.  15:10
shardy: mriedem: https://review.openstack.org/87993  15:12
shardy: mriedem: just going to try testing locally, but that's what I meant  15:12
larsks: sdague: Yeah, it just looks for "#!" in userdata and runs it, otherwise it just exposes the data via "cirros-query".  15:12
*** jprovazn is now known as jprovazn_afk  15:13
mriedem: shardy: cool, that's easy  15:14
mriedem: you missed one template though  15:14
shardy: mriedem: I'm doing it now, was going to post two patches  15:14
shardy: or I can add it to that patch if you prefer :)  15:14
mriedem: doing it in one seems good  15:14
shardy: Ok, git rebase squash it is :)  15:15
*** sorantis has quit IRC15:19
shardy: mriedem: updated  15:19
*** kgriffs|afk is now known as kgriffs  15:21
*** spzala has joined #heat  15:23
mriedem: shardy: looks good  15:25
sdague: shardy: so the lingering question is that currently the fedora cloud image is nearly 2 orders of magnitude slower to complete booting than cirros  15:25
sdague: I think that unless we can get it down to 1 order of magnitude, the amount of coverage we can realistically expect out of heat is going to be small  15:26
sdague: so if you have any thoughts on how to trim what's in that image that would be cool  15:26
*** sdake_ has quit IRC  15:27
*** aweiteka has quit IRC  15:27
*** andersonvom has joined #heat  15:28
shardy: sdague: I think test_neutron_resources.py can be converted to use an image not containing heat-cfntools  15:29
*** saju_m has quit IRC  15:29
shardy: All it does in the user-data is cfn-signal, which is basically just a wrapper for curl  15:30
shardy: so provided there's curl or something similar in the cirros image, perhaps we can use that?  15:30
shardy: I'll have to take a look, don't think I've ever booted a cirros image before  15:30
sdague: yeh, there is curl inside there  15:31
shardy: sdague: I think there are only a very small subset of things which actually *need* cfntools  15:31
sdague: ok, cool, well that would help a lot if we were able to isolate those things  15:31
sdague: then we could run most of the tests on cirros I think  15:32
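[For readers of the log: shardy's "cfn-signal is basically just a wrapper for curl" can be sketched roughly as below. The URL is a placeholder for the ec2-signed wait condition handle URL that Heat exposes to the instance, and the JSON body follows the CloudFormation-style wait condition signal format; treat this as an illustration, not the exact cfn-signal implementation.]

```shell
# Rough curl equivalent of cfn-signal for a wait condition, as discussed
# above. SIGNAL_URL is a hypothetical presigned handle URL from Heat.
SIGNAL_URL='http://heat.example.com:8000/v1/waitcondition/...presigned...'
curl -X PUT -H 'Content-Type: application/json' \
  --data-binary '{"Status": "SUCCESS", "Reason": "configuration complete",
                  "UniqueId": "0001", "Data": "ok"}' \
  "$SIGNAL_URL"
```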
*** fandi has joined #heat15:33
shardy: Even test_server_cfn_init.py could be rewritten to not need cfn-init, although that might defeat the point of it a bit :)  15:35
*** fandi has quit IRC  15:37
sdague: yeh, I'm fine with using cfn-init where it's needed to test that  15:37
sdague: just given the image weight, I'd rather see what we can test with cirros so we can get broader coverage of heat that doesn't need cfn-tools  15:38
sdague: we'll get more bang for our buck that way  15:38
shardy: sdague: Sure, makes sense  15:38
shardy: sdague: the lack of python is an issue though, as we use python hook scripts for SoftwareDeployment resources IIRC  15:39
*** ramishra has quit IRC  15:39
shardy: maybe there's a way to do shell script hooks instead, not sure, stevebaker will know  15:39
sdague: well, I expect software deployment resources will need the bigger image  15:39
sdague: I wonder if there are things that could be stripped from the base image that would help with speed. Part of the issue is it's a 500 MB disk, which means we're generating real io, not keeping it in cache  15:41
sdague: whereas the cirros disk is 13M  15:41
*** smulcahy has joined #heat  15:42
sdague: anyway, got to run away for a bit  15:42
shardy: sdague: In a past life I maintained a Fedora image which was <50M, but the effort to prune things to get to that point was non-trivial  15:42
shardy: sdague: Ok, I'm out till next week but I'll start digging into the image requirements next week  15:43
*** fandi has joined #heat15:44
*** sdake_ has joined #heat15:45
*** chandan_kumar has quit IRC15:46
smulcahy: Hi folks - is anyone looking at https://bugs.launchpad.net/heat/+bug/1306743 ? A few folks in HP are but not making much progress so far. It seems to be a hard blocker for running Heat with more than 2 or 3 nodes, which is a surprisingly low bar.  15:46
uvirtbot: Launchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]  15:46
*** arbylee has quit IRC  15:47
zaneb: therve: I looked at your stack snapshots thing, but I think shardy is more qualified to comment ;)  15:47
shardy: zaneb: Yeah, therve and I discussed it on IRC but I've not got around to replying to the ML post yet  15:48
*** etoews has left #heat  15:49
zaneb: shardy: thanks. that wasn't a hurry-up ;)  15:49
therve: zaneb, OK thanks :)  15:49
zaneb: it was more of a this-is-why-I-haven't-responded-to-the-post-therve-asked-me-to-look-at :)  15:50
shardy: lol :)  15:50
therve: smulcahy, The bug is not super clear to be honest. I don't know where to look  15:51
therve: The only fix I can think of is "make less SQL queries in Heat"  15:55
therve: Which is a fine goal but you may want a faster solution  15:55
smulcahy: therve: are we the only ones seeing this?  15:55
therve: *cough*  15:56
therve: You're the only ones reporting it at least  15:56
*** e0ne has quit IRC  15:56
smulcahy: we'll see if we can peel out a simpler reproducer  15:57
smulcahy: but currently blocked on any real deploys by this  15:57
*** e0ne has joined #heat  15:57
therve: smulcahy, Have you simply tried tweaking those parameters?  15:57
therve: 5 and 10 look small for a real deployment  15:58
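[For readers of the log: the "5 and 10" therve mentions are SQLAlchemy's default QueuePool settings (pool_size=5, max_overflow=10), which is what the "queuepool limit of size 5 overflow" error refers to. A sketch of raising them in heat.conf follows; the section and option names assume oslo.db-style database configuration and may differ by release, so check your deployment's sample config.]

```ini
[database]
# SQLAlchemy QueuePool tuning; upstream defaults are pool_size=5 and
# max_overflow=10. Option names here are assumptions, not verified
# against this era of Heat.
max_pool_size = 30
max_overflow = 60
```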
smulcahy: yes and yes  15:58
zaneb: smulcahy: what does "nodes" mean in "2 or 3 nodes"?  15:58
*** jlanoux has joined #heat  15:59
smulcahy: zaneb: servers running nova bare metal  15:59
zaneb: ok  15:59
*** vinsh has joined #heat  16:00
smulcahy: we're trying to repro with VMs, or maybe figure out a simple Heat-only test of some sort  16:00
*** mkollaro has quit IRC  16:00
smulcahy: but any suggestions and input most welcome on https://bugs.launchpad.net/heat/+bug/1306743  16:00
uvirtbot: Launchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]  16:00
*** e0ne has quit IRC  16:02
therve: smulcahy, Input from you would be welcome  16:02
therve: We really lack enough information to help  16:02
zaneb: smulcahy: so what is polling describe_stack_resource in that traceback?  16:02
*** arbylee has joined #heat16:04
*** jlanoux has quit IRC16:05
*** tomek_adamczewsk has quit IRC16:06
*** ramishra has joined #heat16:06
smulcahyzaneb: one of the os- scripts afaik16:09
zanebcould that be the problem? how fast is it polling?16:09
*** geerdest has joined #heat16:10
smulcahynot sure, asking on of our other folks to pop on if they're available16:10
SpamapSHey if somebody can give https://bugs.launchpad.net/heat/+bug/1306743 a look..16:11
uvirtbotLaunchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]16:11
SpamapSwe're hitting scale problems at just 30 nodes requesting metadata from Heat.16:11
*** Michalik- has quit IRC16:11
zanebSpamapS: by happy coincidence we were just discussing that :)16:12
SpamapSI'm guessing we just need to start looking at a caching layer16:12
smulcahyzaneb: lifeless also ran into this problem on his testing last week so should be able to give more info in a bit16:12
SpamapSoh hah16:12
SpamapSzaneb: not such a coincidence, as smulcahy is indeed somebody probably even more motivated than I am to fix this :)16:12
SpamapSanyway, I'm offline for a while16:12
SpamapSgood luck!16:13
SpamapSzaneb: polling once every 30 seconds per node16:13
SpamapSzaneb: os-collect-config btw16:13
zanebok, that doesn't sound unreasonable16:13
SpamapSzaneb: pretty slow IMO16:13
SpamapSBut if each poll takes 5 queries or something.. :-/16:14
SpamapSanyway.. offline.. forrealz16:14
zanebif it was 30 times per second then I would understand16:14
*** TonyBurn has joined #heat16:16
smulcahywe're still trying to find the source of those 300-400 reqs/sec hitting mysql from heat-engine16:17
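A back-of-envelope check of the rates being discussed (all figures are taken from the conversation above, not measured independently):

```python
# 30 nodes each polling os-collect-config every 30 s is only ~1 API hit/s,
# so 300-400 MySQL queries/s would imply hundreds of DB queries per poll.
nodes = 30
poll_interval_s = 30
observed_db_qps = 350  # midpoint of the 300-400/s figure reported above

api_requests_per_s = nodes / poll_interval_s            # ~1.0 request/s
db_queries_per_poll = observed_db_qps / api_requests_per_s

print(api_requests_per_s)   # 1.0
print(db_queries_per_poll)  # 350.0
```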
*** zhiyan is now known as zhiyan_16:17
petertoftAlso heat-engine pinning a CPU at 100%16:18
smulcahyall we have so far is that it's the calls to resource_data_get(resource, key) in heat/db/sqlalchemy/api.py16:18
*** akuznets_ has joined #heat16:18
zanebsounds like maybe we are creating a new session somewhere to request data that is probably already cached in our existing session16:19
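A toy illustration (hypothetical names, not Heat's actual code) of why opening a fresh session per lookup overflows a QueuePool of size 5, while reusing one session does not:

```python
class FakePool:
    """Counts concurrently checked-out connections, loosely like SQLAlchemy's QueuePool."""
    def __init__(self, size):
        self.size = size
        self.checked_out = 0
        self.overflowed = False

    def connect(self):
        self.checked_out += 1
        if self.checked_out > self.size:
            self.overflowed = True

    def release(self):
        self.checked_out -= 1


def per_lookup_sessions(pool, n_lookups):
    # Anti-pattern: each concurrent lookup opens its own session/connection.
    for _ in range(n_lookups):
        pool.connect()
    for _ in range(n_lookups):
        pool.release()


def shared_session(pool, n_lookups):
    # Reuse one session for every lookup in the request.
    pool.connect()
    for _ in range(n_lookups):
        pass  # query through the same session
    pool.release()


pool = FakePool(size=5)
per_lookup_sessions(pool, 30)
print(pool.overflowed)  # True

pool = FakePool(size=5)
shared_session(pool, 30)
print(pool.overflowed)  # False
```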
*** mriedem has left #heat16:19
*** IlyaE has quit IRC16:20
smulcahyzaneb: there may be some cascading effect here too16:21
*** akuznetsov has quit IRC16:21
*** cmyster has quit IRC16:21
zanebI'd look very closely at the Metadata class16:22
smulcahyzaneb: Can you put any suggestions and/or requests for more info on https://bugs.launchpad.net/heat/+bug/1306743 - it would help us in digging deeper on this16:22
uvirtbotLaunchpad bug 1306743 in heat "queuepool limit of size 5 overflow" [Critical,Triaged]16:22
*** cmyster has joined #heat16:23
*** cmyster has joined #heat16:23
thervesmulcahy, How many http requests to heat do you get?16:24
smulcahytherve: again, could you post these questions to the bug - I'll need to dig to answer them16:25
*** pablosan has quit IRC16:29
*** pablosan has joined #heat16:29
*** IlyaE has joined #heat16:30
*** ramishra has quit IRC16:31
zanebsmulcahy, therve: done16:32
*** zhiyan_ is now known as zhiyan16:34
*** zhiyan is now known as zhiyan_16:41
*** gokrokve has joined #heat16:47
*** harlowja_away is now known as harlowja16:51
*** cmyster has quit IRC16:52
*** yassine has quit IRC16:54
*** cmyster has joined #heat16:54
*** cmyster has joined #heat16:54
*** wendar has quit IRC16:59
*** wendar has joined #heat17:01
*** julienvey has quit IRC17:02
*** derekh has quit IRC17:02
*** jstrachan has joined #heat17:09
*** akuznets_ has quit IRC17:11
*** Lotus907efi has joined #heat17:11
*** Carlos44 has joined #heat17:12
*** denis_makogon has quit IRC17:13
*** dmakogon_ has joined #heat17:13
Carlos44hey everyone, i have hit this bug https://bugs.launchpad.net/heat/+bug/1290274 trying to get heat db sync'd. has anyone found how to get by this?17:13
uvirtbotLaunchpad bug 1290274 in heat "Index on 'tenant' column will be inefficient and 767 bytes per index key on MySQL " [Medium,Triaged]17:13
Carlos44that is the one. any possible manual fixes?17:16
*** jaustinpage has quit IRC17:18
*** Carlos44 has quit IRC17:19
*** Carlos44 has joined #heat17:20
*** david-lyle has joined #heat17:20
*** IlyaE has quit IRC17:26
sdakezaneb i'll be at my folks for dinner during the meeting time - so won't be able to make it17:26
sdakeenjoy :)17:26
harlowjazaneb: any heat guys around? got a probably easy question about whether heat could do something a customer (mail) is asking for17:26
sdakeharlowja heat cannot solve world hunger17:27
harlowja:(17:27
sdakeotherwise its great!17:27
harlowjawill it solve my business continuity17:27
Lotus907efiis heat buzzword compatible?17:28
harlowjaha, anyway the simple question is, mail wants to basically start up CI servers, but have them auto-delete if they aren't used after X minutes. my knowledge of heat is not so much, but i thought that it had some type of capability to do this, but i can't remember anymore17:28
sdakebusiness continuity - in the context of 1) monitor for failures 2) recover from failures 3) notify of failures 4) escalate on repeated failures17:28
sdakeharlowja that is not business continuity imo :)17:29
sdakebut yes, autoscaling will do that17:29
harlowjalol17:29
harlowjaany good docs i can reference about this?17:30
sdakethe developer docs for openstack show the resources you would want to use17:30
sdakethe heat templates repo contains a autoscaling example17:30
sdakeimo autoscaling needs love17:30
*** jaustinpage has joined #heat17:30
sdakethe specific problem you mention, which is autodelete a specific node if it is underutilized, heat will not do17:30
sdakeheat will take a holistic approach to machines in an autoscaling group and scale up or down based upon metrics17:31
sdakebut it doesn't target machines that are at low-utilization for removal17:31
Lotus907efiis there any documentation that would lead a newbie through all necessary steps to do a simple example of using heat and cloud-init to do a semi-routine config task on newly booted system?17:31
harlowjakk17:31
sdakeheat expects a load balancer to run in front of the services to evenly spread load17:31
*** lindsayk has joined #heat17:31
sdakeso realistically when a node is killed off by autoscaling, it would have a similar load as other vms17:32
*** kgriffs is now known as kgriffs|afk17:32
harlowjaright right, makes sense17:32
Lotus907efiI have been looking around a for a few days and reading stuff but I am still a very confused newbie when it comes to using heat meta-data / user-data to get cloud-init to do things on first boot17:32
Lotus907efiand the example yaml files I have looked at seem a little vague17:33
lifelesszaneb: o/ I'm around now17:34
harlowjathx sdake , let me see if i can further figure out what the heck mail people want to do17:35
*** david-lyle has quit IRC17:35
harlowjaha17:35
*** yogesh has joined #heat17:36
*** lindsayk has quit IRC17:36
*** jstrachan has quit IRC17:38
*** TonyBurn has quit IRC17:39
sdakeyou mean yahoo mail harlowja?17:42
*** petertoft has quit IRC17:43
harlowjaya17:43
harlowjai do17:43
sdakei suspect they don't care about killing a low-use node if they have a LB in front17:44
*** lindsayk has joined #heat17:44
sdakethey want to reduce utilization holistically rather than specifically17:44
sdakewe do have some folks that want to be able to target specific nodes for quiesce and kill17:44
sdakebut that isn't implemented (yet)17:45
harlowjasdake this is also for there CI, not neccasrily for the mail facing servers yet, so i think we're trying to figure out what exactly they want still :)17:45
sdakeharlowja http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::AutoScalingGroup and http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::ScalingPolicy17:46
sdakepolicy controls the group17:47
sdakegroup contains the collection of vms17:47
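A hedged sketch of how the two resources linked above fit together in a HOT template; the image/flavor values and resource names are illustrative, and the alarm wiring (e.g. Ceilometer) that would actually trigger the policy is omitted:

```yaml
heat_template_version: 2013-05-23
resources:
  ci_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: ci-worker-image   # assumption: a pre-built CI worker image
          flavor: m1.small
  scale_down:
    type: OS::Heat::ScalingPolicy
    properties:
      auto_scaling_group_id: {get_resource: ci_group}
      adjustment_type: change_in_capacity
      scaling_adjustment: -1
      cooldown: 600
```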
harlowjathx sdake i'll see if i can find out more about their requirement still, and reference the above as a possible way (once the requirement becomes less blurry)17:48
sdakelotus907efi have you launched your first stack?17:49
Lotus907efiI have been playing around with tripleo for about a month now17:50
sdakelotus907efi http://openstack.redhat.com/Deploy_Heat_and_launch_your_first_Application17:50
sdakelotus907efi http://openstack.redhat.com/Deploy_an_application_with_Heat17:51
Lotus907eficool, thanks I will read those17:51
sdakethe first step of heat is launching a stack17:51
Lotus907efiok17:51
sdakeonce you get that down, you can play with the various heat API operations via cli17:51
sdakeonce you understand the clis, you can dig into writing your own templates17:51
sdakeI'd proceed in that order :)17:51
sdakeif you have a working openstack install, it should take less than 30 minutes to get through those 3 things17:52
sdakeone option that works really nicely - rax has stood up heat in their infrastructure, so all you need is the python client (and a credit card) to launch stacks :)17:52
Lotus907efiah, well I have two or three tripleo devtest environments running17:55
Lotus907efiand I have been digging into those17:55
*** gokrokve has quit IRC17:57
sdakelotus907efi I have not tried tripleo + heat17:57
Lotus907efitripleo has heat integrated fully into it17:57
sdakeI intend to get heavily involved at least in using that model during juno tho17:57
Lotus907efitripleo uses heat to bring up the undercloud and overcloud stacks that running devtest produces17:58
sdakelotus907efi yes I know - I think the heat core is going to take a keen interest in making sure that works moar better in juno17:59
Lotus907efiand all of those undercloud and overcloud systems have meta-data servers running at http://169.254.169.25418:00
*** jprovazn_afk is now known as jprovazn18:00
*** spzala has quit IRC18:00
Lotus907efiso are you saying that the heat bits built into tripleo might not be the most up to date fully cooked bits?18:00
cmysterevening18:01
sdakeheat bits built into tripleo are most up to date yes18:01
*** spzala has joined #heat18:02
sdakelotus907efi I get the impression the integration could be improved18:02
zaneblifeless: SpamapS answered my question already; I added some stuff to the bug18:02
*** lindsayk1 has joined #heat18:02
*** kgriffs|afk is now known as kgriffs18:02
sdakemostly from the heat side18:02
Lotus907efiah, ok18:02
sdake(eg heat has gaps that need feed and care)18:02
Lotus907efihmm, from what I can see from what little I have used it the heat stuff in tripleo seems to work pretty well18:03
*** lindsayk has quit IRC18:04
*** e0ne has joined #heat18:05
sdakeI think SpamapS would argue with your definition of seems :)18:06
openstackgerritAndreas Jaeger proposed a change to openstack/heat: Check that all po/pot files are valid  https://review.openstack.org/8422618:06
*** ramishra has joined #heat18:09
*** jstrachan has joined #heat18:09
Lotus907efiah, well he is supposed to be on vacation so not allowed to grouse about stuff now18:09
*** e0ne has quit IRC18:10
*** lindsayk1 has quit IRC18:11
*** zhangyang has quit IRC18:11
sdagueshardy: you still awake?18:12
*** e0ne has joined #heat18:12
sdaguethere is another race I'm seeing here - https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/test_update.py#L73-L7818:12
sdaguewill a stack go through a state transition so that we can wait on something?18:13
*** ramishra has quit IRC18:13
cmysterhi sdague18:13
sdaguebecause it looks like a decent amount of the time the update isn't processing before the list pulls it back18:13
*** rbuilta1 has joined #heat18:13
sdaguecmyster: hi18:14
*** rbuilta has quit IRC18:15
sdagueactually, that's a more general heat question on resource wait on update18:15
*** adeb_ has joined #heat18:16
sdaguethis has failed 39 times in the last 24 hrs, so pretty bad - http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkZBSUw6IHRlbXBlc3QuYXBpLm9yY2hlc3RyYXRpb24uc3RhY2tzLnRlc3RfdXBkYXRlLlVwZGF0ZVN0YWNrVGVzdEpTT04udGVzdF9zdGFja191cGRhdGVfYWRkX3JlbW92ZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzk3NjcxNzUzOTE2fQ==18:16
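The state-transition question above is essentially a wait-loop problem; a tempest-style helper could poll until the stack leaves its transient state instead of asserting immediately after the update call. This is a sketch with assumed client and status names, not the actual tempest code:

```python
import time

def wait_for_stack_status(client, stack_id, expected='UPDATE_COMPLETE',
                          timeout=300, interval=1, sleep=time.sleep):
    """Poll until the stack reaches `expected`, fails, or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = client.get_stack_status(stack_id)
        if status == expected:
            return status
        if status.endswith('_FAILED'):
            raise RuntimeError('stack %s entered %s' % (stack_id, status))
        sleep(interval)
    raise RuntimeError('timed out waiting for %s' % expected)
```

The `sleep` parameter is only there so the loop can be exercised without real delays.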
Lotus907efisdake: one comment about that "Deploy an application with Heat" document - the sentence "There are a number of sample templates available in the github repo" where "github repo" is a link .... the link does not seem to actually lead to any example templates I can see18:17
*** jprovazn has quit IRC18:18
*** gokrokve has joined #heat18:18
*** jstrachan has quit IRC18:21
*** lindsayk has joined #heat18:22
thervesdague, "'Unknown resource Type : OS::Heat::RandomString" ? That's a really weird error18:23
cmysterdepends on version I guess...18:26
sdaguetherve: are you seeing a different issue than I am?18:29
sdaguehttp://logs.openstack.org/54/87554/9/check/check-tempest-dsvm-postgres-full/7922180/console.html#_2014-04-16_13_04_22_36618:29
*** vinsh has quit IRC18:29
sdaguethe mismatch is a race on update18:29
sdaguelooks like it happens 25% of the time18:30
*** spzala has quit IRC18:30
*** spzala has joined #heat18:30
*** pafuent1 has joined #heat18:33
*** spzala has quit IRC18:33
*** aweiteka has joined #heat18:34
sdaguetherve: here's the bug - https://bugs.launchpad.net/heat/+bug/130868218:35
uvirtbotLaunchpad bug 1308682 in tempest "Race in heat stack update " [Undecided,New]18:35
sdagueI'm curious if this is expected that stack_update is only eventually consistent18:35
sdagueand if so, if there is a wait condition to know it's done18:35
*** pafuent has quit IRC18:36
*** akuznetsov has joined #heat18:37
sdaguetherve: so if you have any thoughts before I just straight out skip the test18:37
sdaguewould be appreciated18:37
*** aweiteka has quit IRC18:43
*** kgriffs is now known as kgriffs|afk18:44
gokrokveHi. Is there any good example of HARestarter resource usage available? I checked here: https://github.com/openstack/heat-templates but did not find any.18:46
*** jprovazn has joined #heat18:47
*** chandan_kumar has joined #heat18:49
cmysterthere is actually18:52
cmysterhttp://zenodo.org/record/7571/files/CERN_openlab_report_Michelino.pdf18:52
*** russellb has quit IRC18:53
adeb_My setup requires going through a proxy for downloading things18:53
*** petertoft has joined #heat18:53
adeb_I am trying to create a stack using this https://github.com/openstack/heat-templates/blob/master/hot/hello_world.yaml template...but my create fails with the following error: Could not retrieve template: Failed to retrieve template: [Errno 110] ETIMEDOUT18:54
adeb_Is there any config file where I can set up the proxy18:54
*** aweiteka has joined #heat18:54
adeb_I already have the env variable http_proxy set up18:54
*** bgorski has joined #heat18:54
*** russellb has joined #heat18:55
gokrokvecmyster: Thanks!19:00
*** gondoi is now known as zz_gondoi19:01
*** tango has joined #heat19:01
cmysternp gokrokve19:03
cmysteradeb_: and proxy is working regularly otherwise?19:03
cmysteri.e. if you go online to some web site19:04
*** aweiteka has quit IRC19:08
*** ramishra has joined #heat19:09
*** nati_ueno has joined #heat19:10
*** ramishra has quit IRC19:14
*** e0ne has quit IRC19:15
*** jdob_ has joined #heat19:15
*** e0ne has joined #heat19:16
*** nati_ueno has quit IRC19:20
*** e0ne has quit IRC19:21
*** nati_ueno has joined #heat19:21
thervesdague, Sorry was away. Yes in your logstack results I saw a different error19:22
therveLike http://logs.openstack.org/92/85392/6/check/check-tempest-master-dsvm-full-havana/6fb8d82/console.html19:22
sdagueoh, yeh, that's another job19:23
adeb_yes, if I go to online to other sites it works19:23
sdagueI realized that later19:23
adeb_sorry was away19:23
sdagueso the failure rate is more like 10% I think19:25
sdaguecheck-tempest-master-dsvm-full-havana is our attempt to run tempest master on stable/havana19:25
therveStill pretty bad19:25
sdagueyeh19:25
sdaguetherve: so I pushed a skip19:25
therveYeah I saw :/19:26
sdaguehowever, that doesn't really answer the root question on whether that is expected to be non synchronous19:26
*** tspatzier has joined #heat19:26
sdagueand if it is, how would an api user know things were ready19:26
*** cmyster has quit IRC19:27
*** e0ne has joined #heat19:27
therveUh19:29
thervesdague, TemplateYAMLNegativeTestJSON is pretty bad... It connects to example.com19:29
*** chandan_kumar has quit IRC19:30
sdaguetherve: is it actually connecting?19:32
therveYeah :/19:32
therveUnrelated to your issues, just saw that in the logs19:32
*** nati_uen_ has joined #heat19:32
sdagueso if we give it a totally bogus dns name will it do the right thing?19:32
sdagueI agree we should get rid of network connects like that19:33
*** nati_ueno has quit IRC19:33
therveIt's doing a HTTP GET, so whatever answers fast would be nice19:34
*** e0ne has quit IRC19:34
*** cmyster has joined #heat19:34
*** cmyster has joined #heat19:34
*** e0ne has joined #heat19:35
*** nati_uen_ has quit IRC19:35
*** petertoft has quit IRC19:37
*** e0ne has quit IRC19:39
*** jistr has quit IRC19:40
thervesdague, So to get back to the problem, I think we simply have a race condition in Heat19:41
therveWe set the state to UPDATE_COMPLETE but it's not really complete19:41
sdagueok19:41
therveAnd transactions are for suckers, so...19:41
sdagueso then it was right that I also marked it as a heat bug19:41
therveI think so19:42
sdagueif you want to add that commentary in there, would be appreciated19:42
therveI'm interested in the other failure we're seeing too. It seems weird.19:42
therveWill do19:42
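The race described above can be illustrated with a toy model (not Heat's actual code): publishing UPDATE_COMPLETE before the per-resource work lands lets a poller observe COMPLETE with stale resources, while moving the status change to the end closes the window:

```python
def update_racy(stack, new_resources):
    stack['status'] = 'UPDATE_COMPLETE'   # published too early
    observed = dict(stack)                # an API reader polls at this moment
    stack['resources'] = list(new_resources)
    return observed

def update_fixed(stack, new_resources):
    stack['status'] = 'UPDATE_IN_PROGRESS'
    stack['resources'] = list(new_resources)
    stack['status'] = 'UPDATE_COMPLETE'   # only after all work is done
    observed = dict(stack)
    return observed
```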
sdagueyeh, well the tempest-master ones mostly would be an incompatible change from havana to now19:46
sdagueperhaps something was added?19:46
therveAh yes, those tests wouldn't pass on Heat havana19:48
*** spzala has joined #heat19:50
*** rbuilta1 has quit IRC19:53
*** zns has quit IRC19:53
*** saurabhs has joined #heat19:55
*** tspatzier has quit IRC19:55
*** vinsh has joined #heat19:56
*** akuznetsov has quit IRC19:57
*** chandan_kumar has joined #heat19:57
openstackgerritThomas Herve proposed a change to openstack/heat: Push COMPLETE status change at the end of update  https://review.openstack.org/8807519:57
*** IlyaE has joined #heat19:58
*** jprovazn has quit IRC19:59
*** jdob_ has quit IRC19:59
*** e0ne has joined #heat20:04
*** alexpilotti has quit IRC20:05
*** e0ne has quit IRC20:08
*** e0ne has joined #heat20:08
*** ramishra has joined #heat20:10
*** zns has joined #heat20:14
*** ramishra has quit IRC20:14
*** david-lyle has joined #heat20:16
*** jistr has joined #heat20:17
*** tspatzier has joined #heat20:20
sdagueso it looks like - https://review.openstack.org/#/c/87993/ doesn't fix anything20:21
sdaguewe're still failing on wait condition on that20:22
sdaguethe cloud init errors aren't fun there though - http://logs.openstack.org/93/87993/2/gate/gate-tempest-dsvm-neutron-heat-slow/82b8ac1/console.html#_2014-04-16_17_38_16_66420:23
sdaguewith the heat job at about a 50% fail rate it's bouncing everyone else's patches at this point. I think that if we can't resolve these soon we need to stop voting with it.20:25
*** radez_g0n3 has quit IRC20:26
*** radez_g0n3 has joined #heat20:26
*** aweiteka has joined #heat20:28
sdakesdague my guess there with that last trace is that the metadata server is not metadata serving20:31
sdakethe last trace you showed showed the instance was being orchestrated up to 560 sec and timed out right around 600sec20:32
sdakethis trace doesn't show anything after 300 sec20:32
sdakewhich implies cloud-init is spinning waiting for the metadata server to provide it the goods20:33
stevebakersdague: \o20:33
*** Tross1 has quit IRC20:33
sdakesdague I think you mentioned in some cases neutron doesn't setup the network properly?20:33
*** Tross has joined #heat20:33
stevebakerwhat is that failed to set hostname error about?20:34
sdakeno idea never seen that before20:35
sdakepossibly network connectivity problems not allowing nova to work with the storage?20:36
*** blomquisg has quit IRC20:36
sdakeone thing to eliminate is "does the network actually work properly" prior to blaming wait conditions :)20:36
sdagueyeh, I think that's probably wise. The initial test conditions here assume a bit too much. So it's hard to work backwards to the failures.20:37
stevebakeryes, a waitcondition timeout is a symptom of one failure in a *very* long chain, where most parts are not heat related20:37
stevebakersdague: writing the boot log is meant to help diagnose these, but I'm happy for a whole bunch more debugging to be logged to diagnose these situations20:38
sdake_although if cloud-init fails to set that file, it could cause cloud-init to exit the init process20:39
sdake_and not actually orchestrate the instance20:39
sdake_although I've never seen that happen in years of working on the code20:39
sdake_(the particular error)20:39
stevebakerwe used to have set hostname failures before the name was pinned to under 63 chars20:40
therveIt looks like there is a 9 minute window of nothing20:40
sdaguesdake_: our experience is we run so much more throughput through the gate we see issues that no one else sees, because they happened once, and people just moved past20:40
sdake_sdague yup understood on that point20:40
sdaguestevebaker: so what kind of additional debug, or assert are you thinking here?20:41
sdaguewe could also be more deliberate about asserting on the way up with things we believe should be working20:41
stevebakersdague: I guess following the whole chain, checking networking connectivity from the server to the heat endpoint, checking heat-api-cfn responds to something20:42
*** gokrokve has quit IRC20:42
sdake_sdague most devs also use baremetal rather than virt on virt20:43
sdagueis there an existing heat-api-cfn client in the tree to do that easily20:43
therveMaybe set cloud init debug?20:43
stevebakersdague: if we could find the right assert, then we could at least fail early rather than timeout20:43
sdaguesdake_: sure, but that should only change timing20:43
*** tspatzier has quit IRC20:43
sdaguethis shouldn't *not work* in this environment20:43
sdagueso it will expose different races20:43
sdake_well virt on virt was a POS when I tested previously20:44
sdake_stack traces, kernel oopses, machine lockups, etc20:44
sdake_2014-04-16 17:38:16.628 | [    0.015000] WARNING: This combination of AMD processors is not suitable for SMP.20:44
sdake_this kernel warning looks problematic20:44
sdaguewell we've not really had any guest issues previously20:45
stevebakersdague: is this an issue that only happens on rax or HP?20:45
sdaguestevebaker: good question, let me check20:46
sdaguehttp://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwiaGVhdC5lbmdpbmUucmVzb3VyY2UgV2FpdENvbmRpdGlvblRpbWVvdXQ6IDAgb2YgMSByZWNlaXZlZFwiIEFORCB0YWdzOlwic2NyZWVuLWgtZW5nLnR4dFwiXG4iLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsIm9mZnNldCI6MCwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwibW9kZSI6InNjb3JlIiwiYW5hbHl6ZV9maWVsZCI6ImJ1aWxkX25vZGUifQ==20:47
sdagueseems pretty equal opportunity20:47
sdake_mostly hp cloud20:47
sdake_is the load 50/50 in the gate?20:48
sdaguewell, that's only the top 25 events20:48
sdaguetop 25 facets20:48
sdaguewe run 50 ish20:48
sdagueThis combination of AMD processors is not suitable for SMP. would be a rax error though20:49
sdaguehp runs intel procs20:49
sdaguebut I really don't think that's the issue20:49
sdake_not sure if it is just pointing it out20:49
sdake_my top two guesses would be network or virt on virt20:51
sdake_i assume other guest tests don't run f2020:51
sdaguecorrect, we're running cirros20:52
sdake_perhaps f20 has some incompatibility with the hypervisor on those environments20:52
sdaguecould be20:52
thervesdague, Presumably ssh access to those instances during a test is out of the question?20:52
sdaguetherve: no, not out of the question20:53
*** aweiteka has quit IRC20:53
sdaguethat would be completely kosher20:53
therveIt'd be interesting to see what's going on20:53
sdaguesure20:53
therveWe should have "Cloud-init v. 0.7.2 finished" in the logs, so I think it's still running20:53
sdaguewell the nova console log is there in the dump20:54
therveI'm blaming SSH access somewhere20:54
therves/SSH/network20:54
sdake_it is either running or exited in some undefined way20:54
stevebakerhistorically ssh was attempted before the waitcondition returned, but it was moved to after because of ssh timeouts20:54
sdake_systemctl should give output of the cloudinit results20:54
*** e0ne has quit IRC20:54
stevebakerbut actually the ssh timeout is probably exactly the same bug as our current waitcondition timeout20:54
*** e0ne has joined #heat20:55
sdake_would it be possible to gate with a distro other than f20?20:55
*** jdob has quit IRC20:55
sdake_so we can eliminate the distro as a source20:55
sdaguesdake_: if we can get this on cirros, that's super easy, as the image is there already20:55
sdake_does cirros have the proper cloudinit?20:56
sdaguesdake_: no, it's got some lightweight scripts that do part of it20:56
sdagueenough for nova tests to work20:57
*** asalkeld has joined #heat20:57
thervestevebaker, Actually you're right, we should see SSH host key generation in there20:57
sdake_sdague heat definitely needs cloud-init20:57
sdaguethat should be in the backlog, we were going through that this morning actually to try to figure out20:57
*** e0ne has quit IRC20:57
stevebakersdague: that test is to test cfn-init, so cirros is out. But there could be another test which uses cirros to test end to end connectivity, and cfn-signal could be replaced with curl20:58
*** pafuent1 has quit IRC20:58
stevebakersdake_: we could only switch to ubuntu when a solution is found for building images in gate20:59
sdake_well with f20 - it could be a kernel bug, a systemd bug, a cloud init bug21:00
sdake_any one of those things could fail and the test would not complete21:00
sdagueok, so it feels like we have a bunch of loose ends here. The real question is what made this spike at about 18:00 UTC Apr 1421:01
sdake_with cirros + curl, it could only be a systemd bug21:01
sdake_sorry, with cirros + curl - if the gate works, then it is definitely an f20 problem21:01
sdake_with cirros + curl - if the gate fails - likely a network problem21:01
sdagueyep, sure21:02
*** gokrokve has joined #heat21:02
stevebakerwell, a problem which f20 reveals. It could equally be a neutron or nova bug21:02
*** tspatzier has joined #heat21:02
sdake_stevebaker agree21:02
sdaguestevebaker: definitely could be, however, we're not seeing the same high level of failure on nova21:03
sdaguewhich does ~180 guest starts during a run21:03
sdagueneutron is only probably at about 60 guest starts I think21:03
*** kgriffs|afk is now known as kgriffs21:03
stevebakersdague: try starting ~180 f20s ;)21:03
sdaguestevebaker: :)21:03
*** kgriffs is now known as kgriffs|afk21:04
sdaguemy point is we start 1, and we get really high failure. So the guest hypothesis is potentionally interesting21:04
thervestevebaker, So the test is doing mostly the same thing as the neutron ones that work, except cfn-init21:05
sdake_ya makes sense21:05
therveWhich does a metadata retrieve21:05
sdake_cloud-init does the metadata retrieval, not cfn-init21:05
thervesdake_: Heat metadata21:06
sdagueok, so we have some long term items, but we also have the short term issue of the 50% failure rate, which is liable to get flaming torches soon21:06
stevebakertherve: the cirros test could grep the nova metadata service user_data for some pattern, then curl signal the result. No cloud-init, no heat-cfntools21:06
sdake_so course of action - make heat gate non-voting until we get to bottom of problem21:06
sdagueyeh, that was basically the option I was going to ask about21:06
sdake_#2 make a cirros test which uses curl to identify if f20 is the cause21:07
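A sketch of what the cirros + curl idea above could look like in-guest, with no cloud-init or heat-cfntools involved. The endpoint, pattern, and function names are assumptions for illustration, not the real gate job:

```shell
#!/bin/sh
# Build a CFN-style wait condition signal body.
build_signal_body() {
    status="$1"   # SUCCESS or FAILURE
    reason="$2"
    printf '{"Status": "%s", "Reason": "%s", "UniqueId": "00000", "Data": "test complete"}' \
        "$status" "$reason"
}

# Fetch user_data from the nova metadata service and signal the result to a
# pre-signed wait condition URL (passed in via user_data). Only runs inside
# a guest; shown for illustration.
signal_from_guest() {
    md_url="http://169.254.169.254/latest/user-data"
    wc_url="$1"
    if curl -sf "$md_url" | grep -q "expected-pattern"; then
        build_signal_body SUCCESS "pattern found" |
            curl -sf -X PUT -H 'Content-Type: application/json' -d @- "$wc_url"
    else
        build_signal_body FAILURE "pattern missing" |
            curl -sf -X PUT -H 'Content-Type: application/json' -d @- "$wc_url"
    fi
}
```

If this passes in the gate where the f20 job fails, the distro image is implicated; if it fails too, the network path is.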
stevebakersdague: we could skip that test, and have some gerrit changes which unskip it while we continue to diagnose21:07
sdaguestevebaker: that's an option as well, skipping the test means we won't get any data on it though21:07
sdaguevs. non voting, where the runs will still happen21:07
thervestevebaker, Right but it doesn't solve the issue of that test21:07
stevebakeryeah, non-voting might be best21:07
sdagueso I'd like heat core team pov on which you guys think is best21:08
sdagueI'm happy to execute on either of them21:08
therveNote that cfn-signal works, the problem seems to be with cfn-init21:08
stevebakertherve: it looks like cfn-init isn't being run21:08
sdake_given the # of failures of f20 vs cirros coupled with the number of guest starts, I think we need to identify if f20 is part of the problem21:08
stevebakertherve: you mean cloud-init?21:08
*** jaustinpage has quit IRC21:09
thervestevebaker, No I mean cfn-init21:09
sdagueok, got to drop for a few to relocate. I should be back on in 30 mins or so.21:09
thervestevebaker, Difference between https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/cfn_init_signal.yaml  and https://github.com/openstack/tempest/blob/master/tempest/api/orchestration/stacks/templates/neutron_basic.yaml21:10
stevebakertherve: there is no evidence in the log that cfn-init is failing though21:11
sdake_it doesn't seem that cfn-init is run21:12
therveMy guess would be that it hangs21:12
therveThe first thing it does is connecting to heat21:13
sdake_my guess would be that cloud-init hangs :)21:13
stevebakertherve: no, cfn-init just consumes metadata data which cloud-init (and loguserdata.py) has already written to disk21:13
thervestevebaker, ? It connects to heat metadata server, no?21:13
sdake_cfn-init does not connect to any server unless yum or deb are specified as files21:14
sdake_it reads off the local disk21:14
therveThat's not what I understand of the code21:14
stevebakertherve: no. cloud-init fetches the user_data from the nova metadata server with an http GET21:15
stevebakertherve: the user_data is a mime package containing the cfn metadata, plus loguserdata.py which cloud-init invokes to write that metadata to disk21:15
stevebaker(plus a bunch of other stuff)21:15
thervestevebaker, https://github.com/openstack/heat-cfntools/blob/master/heat_cfntools/cfntools/cfn_helper.py#L111921:16
therveIt seems we first try to get the remote metadata21:16
therveAnd then fail back to local files21:16
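A hedged sketch of the fetch order being discussed in cfn_helper: try the remote (heat) metadata endpoint first, then fall back to the copy already written to disk. `fetch_remote` is a stand-in for the real call, and the default path is from memory of heat-cfntools:

```python
import json
import os

def load_metadata(fetch_remote,
                  local_path='/var/lib/heat-cfntools/cfn-init-data'):
    try:
        return fetch_remote()   # e.g. DescribeStackResource over HTTP;
                                # a hang or timeout here stalls cfn-init
    except Exception:
        pass
    if os.path.exists(local_path):
        with open(local_path) as f:
            return json.load(f)
    return None
```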
sdake_therve the getting of the remote metadata described in line 1119 happens via cloud-init21:17
sdake_atleast that is how it behaved in the past :)21:18
therveremote_metadata seems pretty remote to me21:19
stevebakertherve: right, ok. We could not write out /etc/cfn/cfn-credentials to see if the test gets further, but it would probably fail for the same reason when attempting to cfn-signal21:19
sdaketherve good point21:19
therveYeah I'd be curious, but anyway21:20
sdaketherve my apologies that is new code :)21:20
thervestevebaker, Maybe we can use a custom image and set debug to some stuff?21:20
therveLike cloud-init and cfn-tools21:20
stevebakertherve: yeah, sorry. I forgot the reason I wrote out cfn-credentials21:20
*** andrew_plunk has quit IRC21:21
*** zns has quit IRC21:31
*** vijendar has quit IRC21:32
*** jistr has quit IRC21:33
*** achampion has quit IRC21:34
*** dims has quit IRC21:52
*** dims has joined #heat22:04
*** e0ne has joined #heat22:05
*** e0ne has quit IRC22:10
*** lindsayk has quit IRC22:13
*** lindsayk has joined #heat22:13
*** lindsayk has quit IRC22:13
*** lindsayk has joined #heat22:15
*** zns has joined #heat22:16
gokrokveHi. Is it possible to use Ceilometer alarms for HARestarter instead of CloudWatch alarms?22:19
sdaguestevebaker / therve / sdake_ : if you guys are good with this, please +1 - https://review.openstack.org/88100 - it's making the job non voting22:20
*** tspatzier has quit IRC22:20
*** Tross1 has joined #heat22:28
*** lindsayk has quit IRC22:30
*** lindsayk has joined #heat22:30
*** Tross has quit IRC22:30
*** sjmc7 has quit IRC22:34
stevebakersdague: +122:36
stevebakerand if any heat-core +2s a change without checking the reason for a heat-slow failure, I keel yooo!22:36
stevebakerwinky face22:37
sdaguehehe22:38
stevebakergokrokve: Yes, but I think you still need to use cfn-push-stats, so the metrics go through heat on the way to ceilometer22:38
SpamapSstevebaker: hah, that ventriloquist guy lives in my neighborhood.. ;)22:38
gokrokvestevebaker: Do I need to configure cfn-credentials for that?22:39
stevebakergokrokve: um, yes?22:40
*** yogesh has quit IRC22:40
stevebakergokrokve: there must be an example template somewhere22:40
gokrokvestevebaker: What about cfn-hup? Should it be in crontab for that?22:40
gokrokvestevebaker: I've got a json example from the CERN guys. It's like 2 pages of bash magic in user-data :-(22:41
mattoliveraumorning all22:43
*** IlyaE has quit IRC22:44
*** zns has quit IRC22:46
stevebakergokrokve: I would avoid cfn-hup, its a very complicated way of achieving configuration updates22:48
gokrokvestevebaker: I see it in all examples. So what will be the best way to setup a VM to report status to Ceilometer?22:48
stevebakergokrokve: look for cfn-push-stats22:48
stevebakergokrokve: call it from cron or a bash loop22:49
gokrokvestevebaker: Cool. So I need to set up cfn-credentials with some secure key and then set up crontab to run cfn-push-stats22:50
stevebakergokrokve: yes22:50
gokrokvestevebaker: Then create a Ceilometer alarm for specific instance gauge22:50
gokrokvestevebaker: Ok. Thanks. Will try to figure out how to glue this all together22:50
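[Editor's note] The pieces discussed above (an instance pushing a heartbeat metric via cfn-push-stats on a cron schedule, and a Ceilometer alarm that fires an HARestarter) could be glued together roughly as below. This is a hedged sketch, not a tested template: the resource names, meter name, and watch name are illustrative, and the cfn-push-stats flags should be checked against the installed heat-cfntools version.

```yaml
# Illustrative CFN-style fragment: a Ceilometer alarm driving HARestarter
Resources:
  Server:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: {Ref: ImageId}
      InstanceType: m1.small
      UserData:
        Fn::Base64:
          Fn::Join:
          - ''
          - - "#!/bin/bash -v\n"
            - "# push a heartbeat sample once a minute (watch name is a literal here)\n"
            - "echo '* * * * * root /opt/aws/bin/cfn-push-stats --heartbeat"
            - " --watch HeartbeatWatch' > /etc/cron.d/heartbeat\n"
  Restarter:
    Type: OS::Heat::HARestarter
    Properties:
      InstanceId: {Ref: Server}
  HeartbeatAlarm:
    Type: OS::Ceilometer::Alarm
    Properties:
      description: restart the server when heartbeats stop arriving
      meter_name: Heartbeat        # illustrative; match what cfn-push-stats reports
      statistic: count
      period: 60
      evaluation_periods: 1
      threshold: 1
      comparison_operator: lt
      alarm_actions:
      - {"Fn::GetAtt": [Restarter, AlarmUrl]}
```

As discussed above, cfn-credentials still needs to be present on the instance (via an IAM user and access key, as in the existing HA example templates) so the pushed metrics go through heat on their way to Ceilometer.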
stevebakergokrokve: or you could use python instead of cfn-push-stats https://review.openstack.org/#/c/44967/5/tempest/scenario/orchestration/test_autoscaling.yaml22:51
*** lnxnut_ has joined #heat22:51
stevebakergokrokve: which would be pure boto22:51
*** zns has joined #heat22:51
*** lnxnut has quit IRC22:51
*** lnxnut_ has quit IRC22:51
*** lnxnut has joined #heat22:52
gokrokvestevebaker: That is great. I will probably use python version as it is much clearer.22:52
SpamapSstevebaker: so, slow polling for metadata...22:59
SpamapSstevebaker: do we actually need to parse the whole stack, to just pull the metadata for a server?22:59
*** david-lyle has quit IRC23:01
stevebakerSpamapS: the current implementation of _authorize_stack_user requires a parsed stack23:01
*** adeb_ has quit IRC23:01
SpamapSstevebaker: so here's a thought for a potential optimization: shove metadata into swift, and hand out tempurls to said metadata.23:02
stevebakerSpamapS: my POLL_DEPLOYMENTS plan would be very low overhead, it's just a formatted SQL query23:03
SpamapSstevebaker: if that is too radical, we could also just precompute the inputs for _authorize_stack_user and save them in resource_data.23:03
stevebakeryeah, there are lots of potential optimisations23:04
stevebakerSpamapS: does raising max_pool_size in heat.conf mitigate this?23:05
SpamapSstevebaker: given that heat-engine is hitting 100% CPU, I think that will just change it from 500 errors to timeouts23:05
stevebakeryeah, ok23:05
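[Editor's note] For reference, SpamapS's TempURL suggestion relies on Swift temporary URLs being nothing more than an HMAC-SHA1 over the request method, an expiry time and the object path, signed with the account's X-Account-Meta-Temp-URL-Key. The engine could write a server's metadata to an object once and hand the server a time-limited, unauthenticated GET URL, avoiding per-poll token work entirely. A minimal sketch of the standard signature computation (the account/container/object path and key are hypothetical):

```python
import hmac
import time
from hashlib import sha1

def make_tempurl(key, method, path, expires_in):
    """Build a Swift TempURL query suffix for `path`
    (e.g. /v1/AUTH_acct/container/obj), signed with the
    account's X-Account-Meta-Temp-URL-Key."""
    expires = int(time.time()) + expires_in
    # The signed body is exactly: METHOD \n expires \n path
    body = '%s\n%s\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%s' % (path, sig, expires)

# A server could then poll this URL with a plain HTTP GET, no token needed:
url = make_tempurl('secret', 'GET', '/v1/AUTH_demo/metadata/server-1', 300)
```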
*** IlyaE has joined #heat23:06
stevebakerSpamapS: Once https://review.openstack.org/#/c/84269/ has landed I'll carry on with a collector which calls heatclient.software_deployments.metadata directly23:08
SpamapSstevebaker: it's got my +2 :)23:11
*** killer_prince has quit IRC23:14
*** ifarkas has quit IRC23:14
stevebakerSpamapS: cool. do you see any issue with getting a new keystone token every 30 seconds for every occ (os-collect-config) based server?23:14
SpamapSstevebaker: seems like a huge waste.23:17
SpamapSstevebaker: have to run for a while.. but I'll be back in a while.23:17
stevebakerSpamapS: it does, doesn't it. The collector should really keep using the token until it is close to expiring23:20
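[Editor's note] The token-reuse idea stevebaker lands on here can be sketched as a small cache that only re-authenticates when the current token is near expiry. This is a hedged illustration, not heat code; the `fetch` callable stands in for a real keystoneclient authentication call:

```python
import time

class TokenCache(object):
    """Reuse a keystone token until shortly before it expires."""

    def __init__(self, fetch, slack=60):
        self._fetch = fetch    # () -> (token, expires_at_epoch_seconds)
        self._slack = slack    # re-authenticate this many seconds early
        self._token = None
        self._expires = 0

    def get(self):
        # Only hit keystone when we have no token or it is about to expire
        if self._token is None or time.time() > self._expires - self._slack:
            self._token, self._expires = self._fetch()
        return self._token

# A fake fetch standing in for keystone; it counts real authentications.
calls = []
def fake_fetch():
    calls.append(1)
    return 'tok-%d' % len(calls), time.time() + 3600

cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()   # well within expiry: no new authentication
```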
*** lazy_prince has joined #heat23:20
*** lazy_prince is now known as killer_prince23:20
*** gokrokve has quit IRC23:23
*** vinsh has quit IRC23:26
*** asalkeld_ has joined #heat23:30
*** lipinski has quit IRC23:30
*** asalkeld has quit IRC23:30
*** zns has quit IRC23:32
*** arbylee has quit IRC23:34
*** achampion has joined #heat23:36
*** chandan_kumar has quit IRC23:37
*** andersonvom has quit IRC23:40
cmysterI am going over the API in http://api.openstack.org/api-ref-orchestration.html and I was wondering how a software config can be updated?23:41
stevebakercmyster: by creating a new one with different contents, they are designed to be immutable23:42
cmysterstevebaker: needs to be the same name or something?23:42
*** asalkeld_ is now known as asalkeld23:43
*** arbylee has joined #heat23:43
stevebakercmyster: the deployment resource associates a config with a server, and creates derived configs whenever input_values change. It's the derived config which the server ends up with23:43
*** arbylee has quit IRC23:43
cmysterso from a user point of view, to replace a config is to delete and recreate?23:44
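[Editor's note] To make the immutability point concrete: in a template the user never updates a config in place. On stack-update, changing the config resource's contents makes Heat create a fresh, immutable SoftwareConfig, and the SoftwareDeployment then derives a new config for the server, matching stevebaker's "create a new one with different contents" answer. A rough HOT sketch, with resource names illustrative and the server resource omitted:

```yaml
heat_template_version: 2013-05-23
resources:
  config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        echo hello > /tmp/greeting
  deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: config}
      server: {get_resource: server}   # assumes a server resource elsewhere
```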
*** andersonvom has joined #heat23:44
*** cmyster has quit IRC23:49
*** andersonvom has quit IRC23:50
*** tango has quit IRC23:50
*** ramishra has joined #heat23:52
*** ramishra has quit IRC23:56

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!