08:01:42 <therve> #startmeeting Heat
08:01:43 <openstack> Meeting started Wed Apr 6 08:01:42 2016 UTC and is due to finish in 60 minutes. The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:01:47 <openstack> The meeting name has been set to 'heat'
08:02:10 <therve> #topic Rollcall
08:02:17 <shardy> o/
08:02:57 <therve> skraynev ?
08:03:13 <therve> Who's supposed to be here
08:04:02 <therve> stevebaker, Maybe?
08:04:19 <elynn_> o/
08:04:23 <ricolin> o/
08:04:32 <skraynev> o/
08:04:42 <skraynev> therve: I am here
08:04:45 <skraynev> :)
08:04:59 <therve> Alright, that's a start :)
08:05:04 <skraynev> ramishra ?
08:05:21 <therve> #topic Adding items to agenda
08:05:37 <therve> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-04-06_0800_UTC.29
08:06:08 <therve> #topic Add new tag for stable branches
08:06:14 <therve> skraynev, So what's up
08:07:39 <skraynev> therve: so...
08:08:06 <skraynev> I thought about adding new tags for the stable branches like liberty and kilo
08:08:25 <skraynev> looks like we have enough new patches in these branches
08:08:37 <skraynev> and the last tags were applied a long time ago
08:09:20 <skraynev> I discussed it with ttx yesterday and he said that tagging every 2 months sounds ok
08:09:24 <therve> OK I have no idea what those do and how to do it :)
08:09:38 <skraynev> however I also need to ask the same question in the openstack-release channel
08:09:49 <shardy> is this the new model for stable releases, e.g per-project tagging instead of a coordinated periodic stable release?
08:09:59 <therve> Something like 5.0.2 on kilo?
08:10:05 <skraynev> therve: I suppose it's just a patch to the releases repo...
08:10:10 <skraynev> therve: yeah
08:10:55 <therve> That seems reasonable
08:11:00 <skraynev> shardy: I am not sure... maybe we don't have a coordinated periodic stable release now
08:11:10 <therve> Let's see after the meeting if we can do it?
08:11:42 <skraynev> shardy: it's another good reason to ask this question on openstack-stable
08:11:50 <skraynev> therve: sure
08:11:58 <therve> OK moving on
08:12:09 <therve> #topic AWS resources future
08:12:16 <therve> I don't know who put that in :)
08:13:33 <therve> skraynev, You presumably?
08:13:46 <skraynev> therve: yeah. :)
08:14:23 <therve> What's your question?
08:14:34 <skraynev> so I remember that we planned not to remove the AWS resources
08:15:08 <skraynev> but I was surprised that the nova team deprecated AWS support and totally removed it...
08:15:18 <shardy> skraynev: well, it moved into a different project
08:15:19 <skraynev> and moved it to another repo.
08:15:29 <skraynev> shardy: right
08:15:41 <shardy> I personally don't feel like it's a huge issue for us, the maintenance overhead seems pretty small
08:15:41 <skraynev> so my question was: do we need to do the same?
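[Editor's note: the "patch to the releases repo" discussed above means adding a new entry to a deliverable file in the openstack/releases repository. A minimal sketch, assuming a new stable/liberty tag; the 5.0.2 version and both hashes are placeholders, not real values:

    # deliverables/liberty/heat.yaml (sketch)
    launchpad: heat
    releases:
      - version: 5.0.1
        projects:
          - repo: openstack/heat
            hash: 1111111111111111111111111111111111111111  # existing tag (placeholder)
      - version: 5.0.2  # proposed new stable tag (hypothetical)
        projects:
          - repo: openstack/heat
            hash: 2222222222222222222222222222222222222222  # tip of stable/liberty (placeholder)
]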
08:15:51 <shardy> and we still have some coupling between resources I believe
08:15:51 <therve> We don't need to, no
08:16:17 <shardy> I feel like, one day, we might want to move the cfn compatible stuff (APIs and resources), but not now, or even anytime soon
08:16:28 <therve> The cfn API is still fairly useful by default until we have a nice signature story
08:16:30 <skraynev> shardy: as I remember, we wanted to decouple them
08:16:48 <shardy> therve: Yeah, by default SoftwareDeployment still uses it
08:17:06 <shardy> I started a discussion about changing that a few months ago, but then realized that certain config groups won't work without it
08:17:13 <shardy> the os-apply-config one in particular
08:17:26 <shardy> I'm working to remove that from TripleO, but it may be in use elsewhere also
08:17:26 <skraynev> shardy: aha. I see :)
08:17:52 <therve> shardy, What doesn't work?
08:18:14 <shardy> therve: signalling back to heat, because the os-refresh-config script expects a pre-signed URL
08:18:20 <shardy> the swift signal approach may work
08:18:23 <therve> Ah I see
08:18:28 <shardy> but the heat native one won't atm
08:18:32 <therve> Not a bug in Heat per se
08:18:49 <shardy> I could probably fix the hook script, but have focused instead on trying not to use it
08:19:03 <shardy> therve: No, just a known use-case, and a reason for not ripping out the CFN stuff just yet ;)
08:19:13 <therve> Right
08:19:45 <therve> skraynev, So yeah, we'll keep it unless we have a good reason, like legal crap or a real advantage to ripping it out
08:20:01 <tiantian> most of the templates in our product team are using AWS resources :(
08:20:07 <shardy> I always imagined we'd end up with CFN resources as provider resources (nested stack templates)
08:20:21 <skraynev> therve: ok :) I just wanted to clarify it in light of the changes in nova ;)
08:20:30 <shardy> but our stack abstraction has ended up so heavyweight I think the performance penalty would be too much atm
08:21:13 <therve> shardy, We can still do that. And then optimize :)
08:21:36 <therve> #topic Summit sessions
08:21:42 <shardy> tiantian: Thanks, that's good feedback - I suspected they are in quite wide use but it's hard to know for sure :)
08:21:45 <therve> #link https://etherpad.openstack.org/p/newton-heat-sessions
08:22:01 <shardy> +1000 on performance improvements
08:22:15 <therve> Yeah I believe that's one focus
08:22:24 <shardy> Heat has become, by far, the worst memory hog on the TripleO undercloud recently
08:22:38 <therve> Still, there are probably other topics :). I sent an email to the list, but didn't get any feedback yet
08:22:41 <shardy> we fixed it a while back, but it's crept back up again :(
08:22:59 <therve> shardy, We can split that topic into several sessions, 40 minutes is probably not enough
08:23:47 <ricolin> improve +1
08:24:43 <therve> We have 12 slots. If we have leftover topics from Tokyo we can revive them too
08:24:53 <therve> #link https://etherpad.openstack.org/p/mitaka-heat-sessions
08:26:27 <therve> *cricket sound*
08:26:40 <skraynev> therve: do you plan to prepare a list of pain points for performance?
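[Editor's note: the CFN coupling shardy describes above comes from the default signal transport of software deployments. A minimal HOT sketch of opting into another transport; the boot_config and server resources are assumed to be defined elsewhere in the template:

    resources:
      deployment:
        type: OS::Heat::SoftwareDeployment
        properties:
          config: {get_resource: boot_config}  # assumed OS::Heat::SoftwareConfig
          server: {get_resource: server}       # assumed OS::Nova::Server
          # CFN_SIGNAL is the default and needs the heat-api-cfn service
          # plus a pre-signed URL; TEMP_URL_SIGNAL is the swift approach
          # mentioned above, HEAT_SIGNAL the native heat API one.
          signal_transport: HEAT_SIGNAL
]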
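[Editor's note: "CFN resources as provider resources" would map each AWS-compatible type to a nested stack template through the resource_registry of an environment file, which is the existing provider mechanism. A sketch, where aws_instance.yaml is a hypothetical template reimplementing the resource's properties and attributes:

    # environment.yaml
    resource_registry:
      "AWS::EC2::Instance": aws_instance.yaml

The performance concern raised above is that every such resource would then carry the full overhead of a nested stack.]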
08:26:48 <shardy> I don't feel like we've made much progress on the composition improvements or breaking the stack barrier
08:26:57 <shardy> those may be worth revisiting
08:27:06 <skraynev> therve: honestly I have not researched this area deeply :)
08:27:16 <therve> skraynev, I started to
08:27:32 <therve> skraynev, I'm interested in what Rally can do for us, if you know
08:27:41 <shardy> One thing I've also been wondering about is whether we need a heat option that optimizes for all-in-one environments
08:28:10 <shardy> we've been optimizing more and more for breaking work out over RPC, but this makes performance increasingly bad for typical all-in-one deployments
08:28:10 <skraynev> therve: awesome, it will be really useful to discuss the described issues.
08:28:43 <therve> shardy, Yeah I think there is a tradeoff to be made
08:28:54 <skraynev> therve: what's the estimated deadline for proposing session topics?
08:29:17 <therve> Final deadline next week
08:29:27 <therve> We still have next meeting
08:29:32 <skraynev> therve: i.e. when do you plan to start migrating them from the etherpad to the schedule?
08:29:40 <skraynev> therve: aha
08:29:45 <skraynev> got it
08:31:04 <therve> I'd rather have it done before, obviously :)
08:31:38 <therve> shardy, I'm also a bit concerned about convergence, which performs even worse than the regular workflow
08:31:56 <shardy> therve: Yeah, that's kinda the root of my question
08:32:15 <shardy> e.g do we need an abstraction that avoids some of that overhead for e.g developer environments
08:32:40 <shardy> My changes to RPC nested stacks were the first step, and convergence goes a step further
08:33:18 <therve> We don't do the expensive polling over RPC though
08:33:36 <therve> shardy, For dev, you can try the fake messaging driver
08:33:58 <shardy> therve: Yeah, this is the discussion I'd like to have (not necessarily at summit, just generally)
08:34:12 <shardy> is there a way to offer at least docs on how to make aio environments work better
08:34:19 <shardy> or at least as well as they once did ;)
08:34:30 <shardy> or do we need to abstract things internally to enable that
08:35:07 <therve> Well, I'd like all envs to perform well :)
08:35:23 <skraynev> shardy: I have some graphs comparing convergence vs legacy (based on the rally scenarios Angus wrote)
08:35:26 <therve> I don't think there is any reason for things to perform badly, except bad implementation
08:35:29 <shardy> therve: Sure, me too - I'm just pointing out we're regressing badly for some sorts of deployments
08:35:37 <skraynev> however I measured it on liberty code :(
08:36:32 <skraynev> shardy: btw... what about the plan to enable the TripleO job against convergence?
08:36:43 <skraynev> I remember that we discussed it earlier
08:37:04 <shardy> skraynev: This is related to my question - we're hitting heat RPC timeouts without convergence due to the TripleO undercloud being an all-in-one environment
08:37:12 <shardy> I'm pretty sure convergence will make that worse
08:37:35 <shardy> we can do some tests, but I don't think we can enable convergence for tripleo until we resolve some of the performance issues
08:37:55 <shardy> e.g we increased the worker count because of hitting RPC timeouts for nested stacks
08:38:03 <shardy> which made us run out of memory
08:38:15 <shardy> convergence will make the situation worse AFAICT
08:38:43 <shardy> Obviously it's a similar experience for developers and other aio users
08:38:44 <therve> Most likely
08:40:03 <skraynev> shardy: hm.. interesting. Do you have a guess why it happens? I mean, where exactly is the performance bottleneck?
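[Editor's note: rally drives the scenarios skraynev mentions above from a task file. A sketch of a task exercising stack create/delete; the HeatStacks.create_and_delete_stack scenario exists in rally, while the template path and the numbers here are illustrative only:

    HeatStacks.create_and_delete_stack:
      - args:
          template_path: templates/nested_stacks.yaml  # hypothetical template
        runner:
          type: constant
          times: 10
          concurrency: 2
        context:
          users:
            tenants: 1
            users_per_tenant: 1

The "fake messaging driver" therve suggests for development is oslo.messaging's in-process fake transport, which bypasses a real message broker entirely.]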
08:40:38 <therve> Presumably we're doing too many things in parallel? Maybe we should try to limit that
08:40:39 <shardy> skraynev: because we create >70 nested stacks with 4 workers, which causes Heat to DoS itself and use >2GB of memory
08:41:03 <therve> Doing it a bit more sequentially could work much better
08:41:27 <shardy> therve: Yeah, we've already done stuff like unrolling nesting to improve things
08:41:44 <shardy> effectively compromising on our use of the heat template model due to implementation limitations
08:42:02 <skraynev> shardy: could you try to use https://review.openstack.org/#/c/301323/ ?
08:42:06 <shardy> but we don't want to do too much sequentially, because we've got constraints in terms of wall-time for our CI
08:42:17 <skraynev> or https://review.openstack.org/#/c/244117/
08:42:33 <skraynev> sounds like they may help you
08:43:24 <therve> The batch work may be an interesting thing to try indeed
08:43:28 <therve> Anyway
08:43:33 <therve> #topic Open discussion
08:43:36 <shardy> skraynev: thanks, looks interesting, although zaneb appears to be opposing it
08:44:02 <skraynev> shardy: IMO it will be a really good test case for these patches
08:44:19 <shardy> skraynev: yup, I'll pull them and give it a try, thanks!
08:44:39 <skraynev> :) good.
08:46:04 <therve> shardy, A good thing would be to find a reproducer without tripleo
08:46:11 <therve> Ideally without servers at all
08:46:33 <therve> I wonder if just nested stacks of TestResource would trigger the issues
08:47:33 <skraynev> therve: I'm afraid we may have trouble reproducing it on devstack; in that case it would be really bad for normal development/fixing.
08:47:37 <shardy> therve: Yeah I did raise one against convergence with a reproducer that may emulate what we're seeing
08:47:41 <shardy> let me find it
08:48:08 <shardy> I'm pretty sure it can be reproduced, I'll raise a bug
08:48:39 <therve> Cool
08:48:44 <therve> Anything else?
08:50:16 <therve> Alright thanks all
08:50:18 <therve> #endmeeting
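[Editor's note: a sketch of the server-less reproducer therve proposes, using nested stacks of OS::Heat::TestResource; both templates are hypothetical and the count mirrors shardy's ">70 nested stacks" figure:

    # reproducer.yaml
    heat_template_version: 2015-10-15
    resources:
      group:
        type: OS::Heat::ResourceGroup
        properties:
          count: 70
          resource_def:
            type: nested.yaml  # each group member becomes a nested stack

    # nested.yaml (supplied alongside the parent template)
    heat_template_version: 2015-10-15
    resources:
      test:
        type: OS::Heat::TestResource
        properties:
          value: dummy
          wait_secs: 1  # keep each stack in progress briefly so work piles up
]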