08:01:42 <therve> #startmeeting Heat
08:01:43 <openstack> Meeting started Wed Apr 6 08:01:42 2016 UTC and is due to finish in 60 minutes. The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:01:47 <openstack> The meeting name has been set to 'heat'
08:02:10 <therve> #topic Rollcall
08:02:17 <shardy> o/
08:02:57 <therve> skraynev ?
08:03:13 <therve> Who's supposed to be here
08:04:02 <therve> stevebaker, Maybe?
08:04:19 <elynn_> o/
08:04:23 <ricolin> o/
08:04:32 <skraynev> o/
08:04:42 <skraynev> therve: I am here
08:04:45 <skraynev> :)
08:04:59 <therve> Alright, that's a start :)
08:05:04 <skraynev> ramishra ?
08:05:21 <therve> #topic Adding items to agenda
08:05:37 <therve> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-04-06_0800_UTC.29
08:06:08 <therve> #topic Add new tag for stable branches
08:06:14 <therve> skraynev, So what's up
08:07:39 <skraynev> therve: so...
08:08:06 <skraynev> I thought about adding new tags for the stable branches like liberty and kilo
08:08:25 <skraynev> looks like we have enough new patches in these branches
08:08:37 <skraynev> and the last tags were applied a long time ago
08:09:20 <skraynev> I discussed it with ttx yesterday and he said that tagging every 2 months sounds ok
08:09:24 <therve> OK I have no idea what those do and how to do it :)
08:09:38 <skraynev> however I also need to ask the same question in the openstack-release channel
08:09:49 <shardy> is this the new model for stable releases, e.g per-project tagging instead of a coordinated periodic stable release?
08:09:59 <therve> Something like 5.0.2 on kilo?
08:10:05 <skraynev> therve: I suppose it's just a patch to the releases repo...
08:10:10 <skraynev> therve: yeah
08:10:55 <therve> That seems reasonable
08:11:00 <skraynev> shardy: I am not sure... maybe we don't have a coordinated periodic stable release now
08:11:10 <therve> Let's see after the meeting if we can do it?
08:11:42 <skraynev> shardy: it's another good reason to ask this question on openstack-stable
08:11:50 <skraynev> therve: sure
08:11:58 <therve> OK moving on
08:12:09 <therve> #topic AWS resources future
08:12:16 <therve> I don't know who put that in :)
08:13:33 <therve> skraynev, You presumably?
08:13:46 <skraynev> therve: yeah. :)
08:14:23 <therve> What's your question?
08:14:34 <skraynev> so I remember that we planned not to remove the AWS resources
08:15:08 <skraynev> but I was surprised that the nova team deprecated AWS support and totally removed it...
08:15:18 <shardy> skraynev: well, it moved into a different project
08:15:19 <skraynev> and moved it to another repo.
08:15:29 <skraynev> shardy: right
08:15:41 <shardy> I personally don't feel like it's a huge issue for us, the maintenance overhead seems pretty small
08:15:41 <skraynev> so my question was: do we need to do the same?
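[Editor's note: the "patch to the releases repo" discussed above means adding a new entry to a deliverable file in the openstack/releases repository. A minimal sketch, assuming a new stable/liberty tag; the 5.0.2 version and both hashes are placeholders, not real values:

    # deliverables/liberty/heat.yaml (sketch)
    launchpad: heat
    releases:
      - version: 5.0.1
        projects:
          - repo: openstack/heat
            hash: 1111111111111111111111111111111111111111  # existing tag (placeholder)
      - version: 5.0.2  # proposed new stable tag (hypothetical)
        projects:
          - repo: openstack/heat
            hash: 2222222222222222222222222222222222222222  # tip of stable/liberty (placeholder)
]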
08:15:51 <shardy> and we still have some coupling between resources I believe
08:15:51 <therve> We don't need to, no
08:16:17 <shardy> I feel like, one day, we might want to move the cfn compatible stuff (APIs and resources), but not now, or even anytime soon
08:16:28 <therve> The cfn API is still fairly useful by default until we have a nice signature story
08:16:30 <skraynev> shardy: as I remember, we wanted to decouple them
08:16:48 <shardy> therve: Yeah, by default SoftwareDeployment still uses it
08:17:06 <shardy> I started a discussion about changing that a few months ago, but then realized that certain config groups won't work without it
08:17:13 <shardy> the os-apply-config one in particular
08:17:26 <shardy> I'm working to remove that from TripleO, but it may be in use elsewhere also
08:17:26 <skraynev> shardy: aha. I see :)
08:17:52 <therve> shardy, What doesn't work?
08:18:14 <shardy> therve: signalling back to heat, because the os-refresh-config script expects a pre-signed URL
08:18:20 <shardy> the swift signal approach may work
08:18:23 <therve> Ah I see
08:18:28 <shardy> but the heat native one won't atm
08:18:32 <therve> Not a bug in Heat per se
08:18:49 <shardy> I could probably fix the hook script, but have focused instead on trying not to use it
08:19:03 <shardy> therve: No, just a known use-case, and a reason for not ripping out the CFN stuff just yet ;)
08:19:13 <therve> Right
08:19:45 <therve> skraynev, So yeah, we'll keep it unless we have a good reason, like legal crap or a real advantage to ripping it out
08:20:01 <tiantian> most of the templates in our product team are using AWS resources :(
08:20:07 <shardy> I always imagined we'd end up with CFN resources as provider resources (nested stack templates)
08:20:21 <skraynev> therve: ok :) I just wanted to clarify it in light of the changes in nova ;)
08:20:30 <shardy> but our stack abstraction has ended up so heavyweight I think the performance penalty would be too much atm
08:21:13 <therve> shardy, We can still do that. And then optimize :)
08:21:36 <therve> #topic Summit sessions
08:21:42 <shardy> tiantian: Thanks, that's good feedback - I suspected they are in quite wide use but it's hard to know for sure :)
08:21:45 <therve> #link https://etherpad.openstack.org/p/newton-heat-sessions
08:22:01 <shardy> +1000 on performance improvements
08:22:15 <therve> Yeah I believe that's one focus
08:22:24 <shardy> Heat has become, by far, the worst memory hog on the TripleO undercloud recently
08:22:38 <therve> Still, there are probably other topics :). I sent an email to the list, but didn't get any feedback yet
08:22:41 <shardy> we fixed it a while back, but it's crept back up again :(
08:22:59 <therve> shardy, We can split that topic into several sessions, 40 minutes is probably not enough
08:23:47 <ricolin> improve +1
08:24:43 <therve> We have 12 slots. If we have leftover topics from Tokyo we can revive them too
08:24:53 <therve> #link https://etherpad.openstack.org/p/mitaka-heat-sessions
08:26:27 <therve> *cricket sound*
08:26:40 <skraynev> therve: do you plan to prepare a list of pain points for performance?
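[Editor's note: the CFN coupling shardy describes above comes from the default signal transport of software deployments. A minimal HOT sketch of opting into another transport; the boot_config and server resources are assumed to be defined elsewhere in the template:

    resources:
      deployment:
        type: OS::Heat::SoftwareDeployment
        properties:
          config: {get_resource: boot_config}  # assumed OS::Heat::SoftwareConfig
          server: {get_resource: server}       # assumed OS::Nova::Server
          # CFN_SIGNAL is the default and needs the heat-api-cfn service
          # plus a pre-signed URL; TEMP_URL_SIGNAL is the swift approach
          # mentioned above, HEAT_SIGNAL the native heat API one.
          signal_transport: HEAT_SIGNAL
]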
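[Editor's note: "CFN resources as provider resources" would map each AWS-compatible type to a nested stack template through the resource_registry of an environment file, which is the existing provider mechanism. A sketch, where aws_instance.yaml is a hypothetical template reimplementing the resource's properties and attributes:

    # environment.yaml
    resource_registry:
      "AWS::EC2::Instance": aws_instance.yaml

The performance concern raised above is that every such resource would then carry the full overhead of a nested stack.]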
08:26:48 <shardy> I don't feel like we've made much progress on the composition improvements or breaking the stack barrier
08:26:57 <shardy> those may be worth revisiting
08:27:06 <skraynev> therve: honestly I have not researched this area deeply :)
08:27:16 <therve> skraynev, I started to
08:27:32 <therve> skraynev, I'm interested in what Rally can do for us, if you know
08:27:41 <shardy> One thing I've also been wondering about is whether we need a heat option that optimizes for all-in-one environments
08:28:10 <shardy> we've been optimizing more and more for breaking work out over RPC, but this makes performance increasingly bad for typical all-in-one deployments
08:28:10 <skraynev> therve: awesome, it will be really useful to discuss the described issues.
08:28:43 <therve> shardy, Yeah I think there is a tradeoff to be made
08:28:54 <skraynev> therve: what's the estimated deadline for proposing session topics?
08:29:17 <therve> Final deadline next week
08:29:27 <therve> We still have next meeting
08:29:32 <skraynev> therve: i.e. when do you plan to start migrating them from the etherpad to the schedule?
08:29:40 <skraynev> therve: aha
08:29:45 <skraynev> got it
08:31:04 <therve> I'd rather have it done before, obviously :)
08:31:38 <therve> shardy, I'm also a bit concerned about convergence, which performs even worse than the regular workflow
08:31:56 <shardy> therve: Yeah, that's kinda the root of my question
08:32:15 <shardy> e.g do we need an abstraction that avoids some of that overhead for e.g developer environments
08:32:40 <shardy> My changes to RPC nested stacks were the first step, and convergence goes a step further
08:33:18 <therve> We don't do the expensive polling over RPC though
08:33:36 <therve> shardy, For dev, you can try the fake messaging driver
08:33:58 <shardy> therve: Yeah, this is the discussion I'd like to have (not necessarily at summit, just generally)
08:34:12 <shardy> is there a way to offer at least docs on how to make aio environments work better
08:34:19 <shardy> or at least as well as they once did ;)
08:34:30 <shardy> or do we need to abstract things internally to enable that
08:35:07 <therve> Well, I'd like all envs to perform well :)
08:35:23 <skraynev> shardy: I have some graphs comparing convergence vs legacy (based on the rally scenarios Angus wrote)
08:35:26 <therve> I don't think there is any reason for things to perform badly, except bad implementation
08:35:29 <shardy> therve: Sure, me too - I'm just pointing out we're regressing badly for some sorts of deployments
08:35:37 <skraynev> however I measured it on liberty code :(
08:36:32 <skraynev> shardy: btw... what about the plan to enable the TripleO job against convergence?
08:36:43 <skraynev> I remember that we discussed it earlier
08:37:04 <shardy> skraynev: This is related to my question - we're hitting heat RPC timeouts without convergence due to the TripleO undercloud being an all-in-one environment
08:37:12 <shardy> I'm pretty sure convergence will make that worse
08:37:35 <shardy> we can do some tests, but I don't think we can enable convergence for tripleo until we resolve some of the performance issues
08:37:55 <shardy> e.g we increased the worker count because of hitting RPC timeouts for nested stacks
08:38:03 <shardy> which made us run out of memory
08:38:15 <shardy> convergence will make the situation worse AFAICT
08:38:43 <shardy> Obviously it's a similar experience for developers and other aio users
08:38:44 <therve> Most likely
08:40:03 <skraynev> shardy: hm.. interesting. Do you have a guess why it happens? I mean, where exactly is the performance bottleneck?
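[Editor's note: rally drives the scenarios skraynev mentions above from a task file. A sketch of a task exercising stack create/delete; the HeatStacks.create_and_delete_stack scenario exists in rally, while the template path and the numbers here are illustrative only:

    HeatStacks.create_and_delete_stack:
      - args:
          template_path: templates/nested_stacks.yaml  # hypothetical template
        runner:
          type: constant
          times: 10
          concurrency: 2
        context:
          users:
            tenants: 1
            users_per_tenant: 1

The "fake messaging driver" therve suggests for development is oslo.messaging's in-process fake transport, which bypasses a real message broker entirely.]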
08:40:38 <therve> Presumably we're doing too many things in parallel? Maybe we should try to limit that
08:40:39 <shardy> skraynev: because we create >70 nested stacks with 4 workers, which causes Heat to DoS itself and use >2GB of memory
08:41:03 <therve> Doing it a bit more sequentially could work much better
08:41:27 <shardy> therve: Yeah, we've already done stuff like unrolling nesting to improve things
08:41:44 <shardy> effectively compromising on our use of the heat template model due to implementation limitations
08:42:02 <skraynev> shardy: could you try to use https://review.openstack.org/#/c/301323/ ?
08:42:06 <shardy> but we don't want to do too much sequentially, because we've got constraints in terms of wall-time for our CI
08:42:17 <skraynev> or https://review.openstack.org/#/c/244117/
08:42:33 <skraynev> sounds like they may help you
08:43:24 <therve> The batch work may be an interesting thing to try indeed
08:43:28 <therve> Anyway
08:43:33 <therve> #topic Open discussion
08:43:36 <shardy> skraynev: thanks, looks interesting, although zaneb appears to be opposing it
08:44:02 <skraynev> shardy: IMO it will be a really good test case for these patches
08:44:19 <shardy> skraynev: yup, I'll pull them and give it a try, thanks!
08:44:39 <skraynev> :) good.
08:46:04 <therve> shardy, A good thing would be to find a reproducer without tripleo
08:46:11 <therve> Ideally without servers at all
08:46:33 <therve> I wonder if just nested stacks of TestResource would trigger the issues
08:47:33 <skraynev> therve: I'm afraid we may have trouble reproducing it on devstack; in that case it would be really bad for normal development/fixing.
08:47:37 <shardy> therve: Yeah I did raise one against convergence with a reproducer that may emulate what we're seeing
08:47:41 <shardy> let me find it
08:48:08 <shardy> I'm pretty sure it can be reproduced, I'll raise a bug
08:48:39 <therve> Cool
08:48:44 <therve> Anything else?
08:50:16 <therve> Alright thanks all
08:50:18 <therve> #endmeeting
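[Editor's note: a sketch of the server-less reproducer therve proposes, using nested stacks of OS::Heat::TestResource; both templates are hypothetical and the count mirrors shardy's ">70 nested stacks" figure:

    # reproducer.yaml
    heat_template_version: 2015-10-15
    resources:
      group:
        type: OS::Heat::ResourceGroup
        properties:
          count: 70
          resource_def:
            type: nested.yaml  # each group member becomes a nested stack

    # nested.yaml (supplied alongside the parent template)
    heat_template_version: 2015-10-15
    resources:
      test:
        type: OS::Heat::TestResource
        properties:
          value: dummy
          wait_secs: 1  # keep each stack in progress briefly so work piles up
]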