12:01:19 <zaneb> #startmeeting heat
12:01:20 <openstack> Meeting started Wed Oct  1 12:01:19 2014 UTC and is due to finish in 60 minutes.  The chair is zaneb. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:23 <openstack> The meeting name has been set to 'heat'
12:01:34 <zaneb> #topic roll call
12:01:36 <asalkeld> o/
12:01:37 <mspreitz> o/
12:01:44 <tspatzier> hi
12:01:46 <unmeshg> hi
12:01:47 <inc0> o/
12:02:15 <skraynev> yohu!
12:03:07 <zaneb> #topic Review action items from last meeting
12:03:24 <zaneb> #link http://eavesdrop.openstack.org/meetings/heat/2014/heat.2014-09-24-20.02.html
12:03:53 <zaneb> https://review.openstack.org/#/c/122934/ got merged, so success there
12:04:04 <zaneb> therve sync oslo incubator
12:04:11 <therve> Hi
12:04:13 <zaneb> did anyone notice if that happened?
12:04:19 <zaneb> o hai
12:04:23 <therve> No I didn't do it
12:04:33 <therve> I was waiting on the oslo.i18n merge, I don't think it happened?
12:04:38 <skraynev> zaneb: do you mean sync with oslo?
12:04:52 <zaneb> ok, probably counterproductive to do anything at this point anyway
12:05:06 <zaneb> skraynev: oslo-incubator
12:05:18 <zaneb> we have to copy-paste stuff from it
12:05:27 <zaneb> it's ugly
12:05:35 <skraynev> zaneb: got it, thx
12:05:41 <asalkeld> not too much left in the incubator
12:05:42 <therve> Last time I checked there were no critical issues at least
12:05:52 <zaneb> #topic Adding items to the agenda
12:06:10 <zaneb> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282014-10-01_1200_UTC.29
12:06:15 <zaneb> anything else to add?
12:06:20 <asalkeld> we have some bugs causing gating problems
12:06:38 <therve> Heat bugs?
12:06:46 <asalkeld> yeah
12:07:11 <therve> We should raise the priority of those if that's the case
12:07:15 <zaneb> ok, we'll cover that
12:07:36 <asalkeld> #link http://status.openstack.org/elastic-recheck/#1374175
12:07:40 <skraynev> I had one question, but suppose it may be moved to open discussion ;)
12:08:35 <asalkeld> zaneb, do we need to start thinking about summit sessions?
12:08:53 <zaneb> asalkeld: definitely
12:09:07 <asalkeld> not sure about time line
12:09:13 <asalkeld> need to look into that
12:09:26 <zaneb> asalkeld: yeah, it normally sneaks up on you
12:09:45 <zaneb> no sooner are PTL elections over than whammo
12:09:47 <asalkeld> we all need to think of important stuff to discuss
12:09:54 <asalkeld> :-O
12:10:17 <zaneb> #topic HARestarter transition plan
12:10:38 <asalkeld> mspreitz, you have a bp
12:10:49 <mspreitz> the BP is plan C
12:10:55 <zaneb> once again we have failed to communicate to our users the uselessness of HARestarter in yet another release
12:10:56 <mspreitz> plan A is to have a normal transition
12:10:57 <asalkeld> might be a good summit session
12:10:58 <zaneb> sigh :(
12:11:21 <asalkeld> mspreitz, we are not removing it
12:11:35 <asalkeld> we will wait until there is something better
12:11:48 <asalkeld> i think we all care about ha
12:11:54 <mspreitz> So will there be a release in which both HARestarter and something better are available?
12:12:06 <asalkeld> just want to be straight up about its capability
12:12:18 <inc0> we need to define "something better" as well
12:12:28 <asalkeld> inc0, +1
12:12:50 <zaneb> yeah, we need to figure out what we _can_ do without HARestarter holding up convergence
12:12:53 <mspreitz> The thing that worries me is people saying that HARestarter will be impossible in the future
12:12:58 <inc0> I've posted comment to your spec mspreitz, HA is hard thing...
12:13:15 <asalkeld> and harestarter is a toy really
12:13:36 <tspatzier> I think confusion also exists because someone said that HARestarter will not work with convergence, so can there be a release when both the legacy and the new thing work?
12:13:38 * pas-ha being late
12:13:41 <inc0> there are several approaches to both health monitoring and self healing
12:13:59 <skraynev> possibly we may make the shadow replace and add map in environment on new HARestarter
12:14:08 * mspreitz hopes the discussion will focus on the pointed question of a transition
12:14:09 <asalkeld> i think it would work better if we had built in workflow
12:14:14 <skraynev> it allows people to not worry about templates
12:14:19 <asalkeld> and we could run tasks
12:14:45 <asalkeld> repair workflow
12:15:01 <mspreitz> HARestarter has the wrong name but can be used today
12:15:12 <mspreitz> The question is, will there be a normal transition?
12:15:23 <inc0> asalkeld, we can't really have that without going to very low level, I've been probing this subject with self healing after host failure topic
12:15:24 <asalkeld> mspreitz, i don't see why not
12:15:53 <mspreitz> Some remarks say that HARestarter will be impossible when convergence arrives
12:16:04 <inc0> there might be issues with simple transition, because we might need to change logic itself
12:16:09 <asalkeld> i don't see why
12:16:28 <mspreitz> Zane, do you have a reason?
12:16:35 <asalkeld> it's a very similar mechanism to autoscaling
12:16:59 <skraynev> I think that mspreitz worries, that we just remove HARestarter and do not offer any equivalent.
12:17:08 <zaneb> asalkeld: autoscaling controls another stack. HARestarter controls its own stack
12:17:11 <mspreitz> my point is transition
12:17:31 <mspreitz> if HARestarter and something better can not co-exist, there is no transition
12:17:48 <asalkeld> signal -> action -> delete (wait for continuous recovery)
12:17:52 <mspreitz> there needs to be at least one release of overlap
12:18:11 <zaneb> transition is completely unrelated to deprecation, yet we have allowed the former to derail the latter for over a year now
12:19:02 <asalkeld> mspreitz, i'd rather have convergence than harestarter if it comes to that
12:19:14 <asalkeld> but hopefully we don't have to choose
12:19:22 <inc0> if HARestarter takes instance id as property, if convergence rebuilds this instance, id changes right? This way HARestarter will stop covering instance if something goes wrong
12:19:40 <zaneb> inc0: it doesn't though
12:19:52 <mspreitz> asalkeld: my point is that an abrupt change should not be forced on users.  That's just evolution 101
12:20:09 <asalkeld> mspreitz, i am in agreement that we should do what we can to maintain its functionality
12:20:20 <asalkeld> but not at all costs
12:20:29 <zaneb> +1
12:20:39 <mspreitz> We do not need co-existence indefinitely, but ground rules say we need it for 1 release
12:21:20 <therve> Can we talk about it again if/when the problem arises?
12:21:38 <asalkeld> sure, lets not assume the worst
12:21:48 <mspreitz> no wait
12:22:14 <mspreitz> If some time during K we decide we can not have co-existence, what then?
12:22:24 <therve> That's not how it works
12:22:39 <therve> We make patches, and then one patch may break HARestarter, then we'll see
12:22:46 <therve> Before merging it
12:23:04 <zaneb> I don't want to design all of convergence around known-broken resource types
12:23:06 <mspreitz> As long as people are committed to the usual ground rules, 1 release of overlap, I am fine
12:23:24 <therve> Sure
12:23:39 <asalkeld> mspreitz, i think we will come up with a reasonable plan
12:23:58 <asalkeld> no one is a fortune teller
12:24:18 <therve> convergence may not happen at all during K
12:24:26 <mspreitz> I am not asking for clairvoyance.  I am worried by verbiage that suggests less than a commitment to evolution 101
12:24:59 <therve> HARestarter is dumb, we shouldn't go out of our way to support it, that's all
12:25:03 <mspreitz> therve has indicated a clear commitment
12:25:08 <mspreitz> oh dang
12:25:31 <asalkeld> mspreitz, all we are saying is "if it is totally impossible to have harestarter and convergence, I think we should choose convergence"
12:25:37 <mspreitz> calling a useful thing names does not really give you permission to break the ground rules
12:25:47 <asalkeld> but hopefully it won't come to that
12:26:00 <mspreitz> asalkeld: I am saying there are other ways to get smooth transitions
12:26:05 <zaneb> I'm not prepared to commit, because I don't think this feature should ever have been in Heat. we should have deprecated it in Havana
12:26:53 <asalkeld> well zaneb we didn't have a deprecation mechanism until recently
12:27:01 <asalkeld> (for resource types)
12:27:02 <zaneb> and given that we have known since then all the ways that it breaks Heat's data model, I don't want convergence to be forced into architectural changes just to deal with broken stuff like this
12:27:41 <zaneb> that said, if it can be made to work with convergence for a release, then that is obviously preferable
12:27:43 <mspreitz> So plan C is a way to keep things smooth for users even if HARestarter and convergence can not co-exist
12:28:11 <asalkeld> how is that going to happen?
12:28:21 <mspreitz> (plan B is: go back and try harder to make plan A work)
12:28:46 <mspreitz> plan C is to introduce a higher-level abstraction that meets users needs and can be implemented by HARestarter and by convergence
12:29:30 <mspreitz> but you have to introduce that higher abstraction at least 1 release before you remove HARestarter
12:29:53 <asalkeld> mspreitz, the problem is we just don't even know if this is going to be a problem at all
12:30:10 <asalkeld> so this discussion seems totally premature to me
12:30:13 <mspreitz> plan C requires planning 1 release ahead
12:30:28 <mspreitz> but if plan A will not work, it is what you have
12:30:59 <asalkeld> mspreitz, i do think we need to start now with a better ha solution
12:31:13 <asalkeld> so by that time we have something much better
12:31:25 <zaneb> mspreitz: are you suggesting we introduce a less-broken HA to overlap with HARestarter for one release and then convergence for one release?
12:31:35 <mspreitz> zaneb: yes
12:31:46 <inc0> is there any point of making something less-broken and just for one release
12:31:54 <mspreitz> 2 releases
12:31:58 <inc0> maybe lets make something well...unbroken?
12:32:02 <mspreitz> to allow for smooth transitions
12:32:14 <zaneb> inc0: that would be convergence
12:32:17 <mspreitz> zaneb: i have to adjust the wording
12:32:21 <asalkeld> mspreitz, so you want us to delay convergence because of harestarter
12:32:28 <mspreitz> something with an interface that implies less brokenness
12:32:39 <zaneb> my fear is that if we introduce something else, people will use it
12:32:43 <mspreitz> at first the implementation will be based on HARestarter
12:33:13 <mspreitz> oops,  I still botched wording...
12:33:28 <inc0> its my understanding that convergence will be optional right?
12:33:39 <asalkeld> inc0, i don't see how
12:33:46 <mspreitz> introduce something whose interface does not imply the implementation is based on HARestarter, even though the initial implementation will be just that
12:33:51 <zaneb> inc0: convergence is both a feature and an entire architecture
12:33:54 <zaneb> inc0: IOW no
12:34:46 <mspreitz> Plan C is not about introducing a new HA mechanism
12:35:00 <mspreitz> it is about obscuring the fact that HARestarter is the only solution
12:35:12 <mspreitz> so that users will not see abrupt change
12:35:19 <mspreitz> when HARestarter is replaced
12:35:23 <zaneb> mspreitz: I don't see how you get away from the fact that we're doing stack-level operations within a resource. if convergence doesn't support that, we're still hosed
12:35:48 <mspreitz> plan C is a resource type whose implementation can be changed at the moment HARestarter is replaced
12:35:59 <mspreitz> with constant resource type interface, user templates do not have to change
12:36:29 <asalkeld> mspreitz, i think what zaneb is saying is the "server" needs to be a stack
12:36:46 <zaneb> we could have e.g. an OS::Heat::HAStack resource that creates a nested stack with HA control over a named resource
12:36:52 <mspreitz> plan C is a resource type that takes a template as an input
12:37:02 <asalkeld> zaneb, +1
12:37:06 <zaneb> and use the update mechanism to do restarts
12:37:15 <asalkeld> recreates
12:37:33 <asalkeld> stack->repair
12:37:35 <mspreitz> my proposal for the resource type was that it not be for an atomic thing but rather a scaling group of things
12:38:02 <mspreitz> I suggested that because I thought I heard that automatic convergence would first arrive only for scaling group members
12:38:13 <asalkeld> mspreitz, yes that is a nested stack right
12:38:17 <zaneb> asalkeld: it's still horrible though, and I guarantee people will actually start using it :(
12:38:25 <asalkeld> har
12:38:45 <asalkeld> can't we just get pacemaker working in the guest
12:38:48 <mspreitz> my point is that the users templates do not commit to the current bad implementation
12:39:03 <mspreitz> if they use the higher level resource type
12:39:16 <asalkeld> sure
12:39:38 <asalkeld> mspreitz, that sounds better
12:39:47 <mspreitz> asalkeld: if the plan C resource type is not about maintaining one thing but rather a scaling group of things, then I think it is not so bad
12:40:06 <mspreitz> Really, it is just saying what I hear is the most useful feature of scaling groups.
12:40:10 <inc0> asalkeld, pacemaker doesn't really scale, its not good for larger stacks
12:40:26 <asalkeld> inc0, in guest -not run by us
12:40:58 <mspreitz> I am suggesting a resource type that lets users write templates that do not commit to any one bad implementation, the only commitment is to a pretty defensible function
12:41:11 <asalkeld> mspreitz, that sounds ok to me
12:41:36 <inc0> asalkeld, yes, but since we're in cloud we could do this in more scalable way than making people use pacemaker
12:42:04 <asalkeld> tho' what would be good would be to have a pluggable way to repair
12:42:09 <zaneb> inc0: nothing we can do is in any way a substitute for pacemaker
12:42:21 <asalkeld> so we don't have to deal with every possible issue
12:42:33 <inc0> zaneb, agree, but there are subsets of cases we can actually help with
12:42:34 <mspreitz> asalkeld: not sure if you are speaking to plan C, but...
12:42:46 <asalkeld> yip C
12:43:01 <mspreitz> the point of plan C is that the user's template only commits to the idea that we have some way to do repair of scaling group members
12:43:14 <mspreitz> we are free to change implementation abruptly
12:43:36 <inc0> mspreitz, what people will actually get from this commitment?
12:43:48 <asalkeld> we have 15 mins left
12:43:58 <asalkeld> do we want to cover other things?
12:44:02 <mspreitz> inc0: users will not have to change all their templates the moment their cloud operator installs a certain release
12:44:28 <mspreitz> the point of plan C is that we make less of a commitment
12:44:29 <zaneb> mspreitz: before you were all about having a transition plan for users, but now you only care about scaling group members?
12:44:34 * zaneb is confused
12:44:51 <inc0> also, if we introduce interface before designing architecture, that may lead to mistakes impossible to correct
12:45:02 <mspreitz> The users of which I know can be satisfied if we only support HA for scaling group members
12:45:12 <zaneb> inc0: bingo
12:45:17 <mspreitz> I suggested that limitation because it puts less restrictions on what we do
12:45:44 <asalkeld> i think we need lots of further design
12:45:46 <mspreitz> Look at https://review.openstack.org/#/c/124656/ and see if you think the interface promises something we will not be able to deliver
12:46:08 <mspreitz> I think if we can not deliver on that level of function in the future then we will be badly broken in the future
12:46:24 <zaneb> #link https://review.openstack.org/#/c/124656/
12:46:35 <mspreitz> The interface is deliberately not general, so that we do not have all the usual problems of designing too far ahead
12:46:44 <inc0> shouldn't we just meet up in Paris to discuss that and maybe make actual high level draft of architecture?
12:46:45 <zaneb> ok, let's review that spec and go from there
12:46:54 <mspreitz> but I think the interface is general enough to satisfy users for a while
12:47:03 <asalkeld> inc0, +1
12:47:08 <zaneb> #topic Gate bugs
12:47:23 <mspreitz> Again, plan C requires planning 1 release ahead
12:47:41 <zaneb> #link https://bugs.launchpad.net/heat/+bug/1374175
12:47:43 <uvirtbot> Launchpad bug 1374175 in heat "test_server_cfn_init failed in gate-tempest-dsvm-neutron-heat-slow: AssertionError: Timed out waiting for to become reachable" [High,Confirmed]
12:47:54 <zaneb> that one seems to be failing a lot
12:48:00 <asalkeld> so i just wanted to make people aware of that
12:48:25 <therve> is heat-slow voting?
12:48:26 <skraynev> so summarize: 1. we do not delete harestarter (yet) and leave it in deprecation until alternative are not implemented. 2. Implementation of convergence may be dangerous for harestarter, but we believe that all will be ok. 3. Replacement for harestarter will be done after convergence as soon as possible.
12:48:33 <skraynev> I hope it's right ;)
12:48:54 <asalkeld> notice that it came to being at the same time as a neutron gating bug
12:49:09 <mspreitz> skraynev: was there typo in 1?  Extra "not" ?
12:49:23 <zaneb> https://bugs.launchpad.net/tempest/+bug/1370865 <- that one has been fixed
12:49:25 <uvirtbot> Launchpad bug 1370865 in heat "tempest.api.orchestration.stacks.test_update.UpdateStackTestJSON.test_stack_update_add_remove mismatch error" [Medium,Fix committed]
12:49:43 <skraynev> mspreitz: yes, second is unnecessary ;)
12:49:58 <asalkeld> zaneb, look here : http://status.openstack.org/elastic-recheck/#1374175
12:50:14 <asalkeld> there are a bunch of issue that cropped up at the same time
12:50:22 <asalkeld> sep29
12:50:24 <therve> zaneb, It seems to have re-happened since yesterday?
12:50:38 <asalkeld> i wonder if there was an infra change
12:50:40 <therve> http://status.openstack.org/elastic-recheck/#1370865
12:52:05 <zaneb> asalkeld: yeah, something must have been changed, and not by us I suspect
12:52:48 <asalkeld> lots of timeout issue
12:52:56 <asalkeld> maybe slower vms?
12:53:04 <asalkeld> network etc..
12:53:31 <skraynev> convergence are not implemented yet ...
12:53:33 <skraynev> :)
12:53:50 <asalkeld> Bug 1311066 - Some nodes allocated in node pool are very very slow
12:53:52 <uvirtbot> Launchpad bug 1311066 in openstack-ci "Some nodes allocated in node pool are very very slow" [High,Confirmed] https://launchpad.net/bugs/1311066
12:53:53 <zaneb> do elastic recheck graphs include stable branches?
12:53:58 <asalkeld> yes
12:54:40 <zaneb> did we start running a lot of stable/icehouse tests on the 29th maybe?
12:54:54 <asalkeld> honestly not sure
12:55:03 <asalkeld> but maybe
12:55:26 <asalkeld> isn't it the last stable release just before the next new release?
12:55:45 <zaneb> idk
12:55:55 <zaneb> sounds plausible
12:56:48 <asalkeld> any bugs that we might need in a rc2?
12:57:01 <mspreitz> I'd rather not have bugs !
12:57:17 <asalkeld> :) - i mean the fixes
12:57:36 <zaneb> #link https://bugs.launchpad.net/heat/+bugs?field.tag=juno-rc-potential
12:58:08 <zaneb> just a reminder, tag your bug "juno-rc-potential" if you think it should maybe be fixed in -rc2
12:58:18 <asalkeld> okie dokie
12:58:46 <zaneb> btw if someone knowledgeable (i.e. not me) could look at bug 1370302 that would be great
12:58:47 <uvirtbot> Launchpad bug 1370302 in heat "heat stack-create failed due to lack of 'v2.0' in auth_uri" [Undecided,New] https://launchpad.net/bugs/1370302
12:59:10 <asalkeld> everyone waiting for shardy to get back:-O
12:59:10 <zaneb> it sounds bad but could easily be PEBKAC
12:59:35 <asalkeld> googling pebkac
12:59:59 <zaneb> asalkeld: you disappoint me ;)
13:00:20 <asalkeld> ga, never heard of that
13:00:20 <therve> That bug looks weird to me
13:00:35 <skraynev> asalkeld: http://forum.ilikecheats.com/threads/571-new-problems-PEBKAC
13:01:07 <zaneb> ok, we're out of time
13:01:14 <zaneb> let's continue in #heat
13:01:17 <asalkeld> k
13:01:18 <skraynev> ok
13:01:19 <zaneb> #endmeeting