12:01:19 #startmeeting heat
12:01:20 Meeting started Wed Oct 1 12:01:19 2014 UTC and is due to finish in 60 minutes. The chair is zaneb. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:23 The meeting name has been set to 'heat'
12:01:34 #topic roll call
12:01:36 o/
12:01:37 o/
12:01:44 hi
12:01:46 hi
12:01:47 o/
12:02:15 yohu!
12:03:07 #topic Review action items from last meeting
12:03:24 #link http://eavesdrop.openstack.org/meetings/heat/2014/heat.2014-09-24-20.02.html
12:03:53 https://review.openstack.org/#/c/122934/ got merged, so success there
12:04:04 therve sync oslo incubator
12:04:11 Hi
12:04:13 did anyone notice if that happened?
12:04:19 o hai
12:04:23 No I didn't do it
12:04:33 I was waiting on the oslo.i18n merge, I don't think it happened?
12:04:38 zaneb: do you mean sync with oslo?
12:04:52 ok, probably counterproductive to do anything at this point anyway
12:05:06 skraynev: oslo-incubator
12:05:18 we have to copy-paste stuff from it
12:05:27 it's ugly
12:05:35 zaneb: got it, thx
12:05:41 not too much left in the incubator
12:05:42 Last time I checked there were no critical issues at least
12:05:52 #topic Adding items to the agenda
12:06:10 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282014-10-01_1200_UTC.29
12:06:15 anything else to add?
12:06:20 we have some bugs causing gating problems
12:06:38 Heat bugs?
12:06:46 yeah
12:07:11 We should raise the priority of those if that's the case
12:07:15 ok, we'll cover that
12:07:36 #link http://status.openstack.org/elastic-recheck/#1374175
12:07:40 I had one question, but suppose it may be moved to open discussion ;)
12:08:35 zaneb, do we need to start thinking about summit sessions?
12:08:53 asalkeld: definitely
12:09:07 not sure about timeline
12:09:13 need to look into that
12:09:26 asalkeld: yeah, it normally sneaks up on you
12:09:45 no sooner are PTL elections over than whammo
12:09:47 we all need to think of important stuff to discuss
12:09:54 :-O
12:10:17 #topic HARestarter transition plan
12:10:38 mspreitz, you have a bp
12:10:49 the BP is plan C
12:10:55 once again we have failed to communicate to our users the uselessness of HARestarter in yet another release
12:10:56 plan A is to have a normal transition
12:10:57 might be a good summit session
12:10:58 sigh :(
12:11:21 mspreitz, we are not removing it
12:11:35 we will wait until there is something better
12:11:48 i think we all care about ha
12:11:54 So will there be a release in which both HARestarter and something better are available?
12:12:06 just want to be straight up about its capability
12:12:18 we need to define "something better" as well
12:12:28 inc0, +1
12:12:50 yeah, we need to figure out what we _can_ do without HARestarter holding up convergence
12:12:53 The thing that worries me is people saying that HARestarter will be impossible in the future
12:12:58 I've posted a comment to your spec mspreitz, HA is a hard thing...
12:13:15 and harestarter is a toy really
12:13:36 I think confusion also exists because someone said that HARestarter will not work with convergence, so can there be a release when both the legacy and the new thing work?
12:13:38 * pas-ha being late
12:13:41 there are several approaches to both health monitoring and self healing
12:13:59 possibly we may make the shadow replace and add map in environment on new HARestarter
12:14:08 * mspreitz hopes the discussion will focus on the pointed question of a transition
12:14:09 i think it would work better if we had built in workflow
12:14:14 it allows people to not worry about templates
12:14:19 and we could run tasks
12:14:45 repair workflow
12:15:01 HARestarter has the wrong name but can be used today
12:15:12 The question is, will there be a normal transition?
12:15:23 asalkeld, we can't really have that without going to very low level, I've been probing this subject with self healing after host failure topic
12:15:24 mspreitz, i don't see why not
12:15:53 Some remarks say that HARestarter will be impossible when convergence arrives
12:16:04 there might be issues with simple transition, because we might need to change logic itself
12:16:09 i don't see why
12:16:28 Zane, do you have a reason?
12:16:35 it's a very similar mechanism to autoscaling
12:16:59 I think that mspreitz worries that we just remove HARestarter and do not offer any equivalent.
12:17:08 asalkeld: autoscaling controls another stack. HARestarter controls its own stack
12:17:11 my point is transition
12:17:31 if HARestarter and something better can not co-exist, there is no transition
12:17:48 signal -> action -> delete (wait for continuous recovery)
12:17:52 there needs to be at least one release of overlap
12:18:11 transition is completely unrelated to deprecation, yet we have allowed the former to derail the latter for over a year now
12:19:02 mspreitz, i'd rather have convergence than harestarter if it comes to that
12:19:14 but hopefully we don't have to choose
12:19:22 if HARestarter takes instance id as property, if convergence rebuilds this instance, id changes right? This way HARestarter will stop covering instance if something goes wrong
12:19:40 inc0: it doesn't though
12:19:52 asalkeld: my point is that an abrupt change should not be forced on users. That's just evolution 101
12:20:09 mspreitz, i am in agreement that we should do what we can to maintain its functionality
12:20:20 but not at all costs
12:20:29 +1
12:20:39 We do not need co-existence indefinitely, but ground rules say we need it for 1 release
12:21:20 Can we talk about it again if/when the problem arises?
12:21:38 sure, let's not assume the worst
12:21:48 no wait
12:22:14 If some time during K we decide we can not have co-existence, what then?
12:22:24 That's not how it works
12:22:39 We make patches, and then one patch may break HARestarter, then we'll see
12:22:46 Before merging it
12:23:04 I don't want to design all of convergence around known-broken resource types
12:23:06 As long as people are committed to the usual ground rules, 1 release of overlap, I am fine
12:23:24 Sure
12:23:39 mspreitz, i think we will come up with a reasonable plan
12:23:58 no one is a fortune teller
12:24:18 convergence may not happen at all during K
12:24:26 I am not asking for clairvoyance. I am worried by verbiage that suggests less than a commitment to evolution 101
12:24:59 HARestarter is dumb, we shouldn't go out of our way to support it, that's all
12:25:03 therve has indicated a clear commitment
12:25:08 oh dang
12:25:31 mspreitz, all we are saying is "if it is totally impossible to have harestarter and convergence, I think we should choose convergence"
12:25:37 calling a useful thing names does not really give you permission to break the ground rules
12:25:47 but hopefully it won't come to that
12:26:00 asalkeld: I am saying there are other ways to get smooth transitions
12:26:05 I'm not prepared to commit, because I don't think this feature should ever have been in Heat. we should have deprecated it in Havana
12:26:53 well zaneb we didn't have a deprecation mechanism until recently
12:27:01 (for resource types)
12:27:02 and given that we have known since then all the ways that it breaks Heat's data model, I don't want convergence to be forced into architectural changes just to deal with broken stuff like this
12:27:41 that said, if it can be made to work with convergence for a release, then that is obviously preferable
12:27:43 So plan C is a way to keep things smooth for users even if HARestarter and convergence can not co-exist
12:28:11 how is that going to happen?
12:28:21 (plan B is: go back and try harder to make plan A work)
12:28:46 plan C is to introduce a higher-level abstraction that meets users' needs and can be implemented by HARestarter and by convergence
12:29:30 but you have to introduce that higher abstraction at least 1 release before you remove HARestarter
12:29:53 mspreitz, the problem is we just don't even know if this is going to be a problem at all
12:30:10 so this discussion seems totally premature to me
12:30:13 plan C requires planning 1 release ahead
12:30:28 but if plan A will not work, it is what you have
12:30:59 mspreitz, i do think we need to start now with a better ha solution
12:31:13 so by that time we have something much better
12:31:25 mspreitz: are you suggesting we introduce a less-broken HA to overlap with HARestarter for one release and then convergence for one release?
12:31:35 zaneb: yes
12:31:46 is there any point of making something less-broken and just for one release
12:31:54 2 releases
12:31:58 maybe let's make something well...unbroken?
12:32:02 to allow for smooth transitions
12:32:14 inc0: that would be convergence
12:32:17 zaneb: i have to adjust the wording
12:32:21 mspreitz, so you want us to delay convergence because of harestarter
12:32:28 something with an interface that implies less brokenness
12:32:39 my fear is that if we introduce something else, people will use it
12:32:43 at first the implementation will be based on HARestarter
12:33:13 oops, I still botched wording...
12:33:28 it's my understanding that convergence will be optional right?
12:33:39 inc0, i don't see how
12:33:46 introduce something whose interface does not imply the implementation is based on HARestarter, even though the initial implementation will be just that
12:33:51 inc0: convergence is both a feature and an entire architecture
12:33:54 inc0: IOW no
12:34:46 Plan C is not about introducing a new HA mechanism
12:35:00 it is about obscuring the fact that HARestarter is the only solution
12:35:12 so that users will not see abrupt change
12:35:19 when HARestarter is replaced
12:35:23 mspreitz: I don't see how you get away from the fact that we're doing stack-level operations within a resource. if convergence doesn't support that, we're still hosed
12:35:48 plan C is a resource type whose implementation can be changed at the moment HARestarter is replaced
12:35:59 with constant resource type interface, user templates do not have to change
12:36:29 mspreitz, i think what zaneb is saying is the "server" needs to be a stack
12:36:46 we could have e.g. an OS::Heat::HAStack resource that creates a nested stack with HA control over a named resource
12:36:52 plan C is a resource type that takes a template as an input
12:37:02 zaneb, +1
12:37:06 and use the update mechanism to do restarts
12:37:15 recreates
12:37:33 stack->repair
12:37:35 my proposal for the resource type was that it not be for an atomic thing but rather a scaling group of things
12:38:02 I suggested that because I thought I heard that automatic convergence would first arrive only for scaling group members
12:38:13 mspreitz, yes that is a nested stack right
12:38:17 asalkeld: it's still horrible though, and I guarantee people will actually start using it :(
12:38:25 har
12:38:45 can't we just get pacemaker working in the guest
12:38:48 my point is that the users' templates do not commit to the current bad implementation
12:39:03 if they use the higher level resource type
12:39:16 sure
12:39:38 mspreitz, that sounds better
12:39:47 asalkeld: if the plan C resource type is not about maintaining one thing but rather a scaling group of things, then I think it is not so bad
12:40:06 Really, it is just saying what I hear is the most useful feature of scaling groups.
12:40:10 asalkeld, pacemaker doesn't really scale, it's not good for larger stacks
12:40:26 inc0, in guest -not run by us
12:40:58 I am suggesting a resource type that lets users write templates that do not commit to any one bad implementation, the only commitment is to a pretty defensible function
12:41:11 mspreitz, that sounds ok to me
12:41:36 asalkeld, yes, but since we're in cloud we could do this in a more scalable way than making people use pacemaker
12:42:04 tho' what would be good would be to have a pluggable way to repair
12:42:09 inc0: nothing we can do is in any way a substitute for pacemaker
12:42:21 so we don't have to deal with every possible issue
12:42:33 zaneb, agree, but there are subsets of cases we can actually help with
12:42:34 asalkeld: not sure if you are speaking to plan C, but...
12:42:46 yip C
12:43:01 the point of plan C is that the users' templates only commit to the idea that we have some way to do repair of scaling group members
12:43:14 we are free to change implementation abruptly
12:43:36 mspreitz, what will people actually get from this commitment?
12:43:48 we have 15 mins left
12:43:58 do we want to cover other things?
12:44:02 inc0: users will not have to change all their templates the moment their cloud operator installs a certain release
12:44:28 the point of plan C is that we make less of a commitment
12:44:29 mspreitz: before you were all about having a transition plan for users, but now you only care about scaling group members?
12:44:34 * zaneb is confused
12:44:51 also, if we introduce interface before designing architecture, that may lead to mistakes impossible to correct
12:45:02 The users of which I know can be satisfied if we only support HA for scaling group members
12:45:12 inc0: bingo
12:45:17 I suggested that limitation because it puts fewer restrictions on what we do
12:45:44 i think we need lots of future design
12:45:46 Look at https://review.openstack.org/#/c/124656/ and see if you think the interface promises something we will not be able to deliver
12:46:08 I think if we can not deliver on that level of function in the future then we will be badly broken in the future
12:46:24 #link https://review.openstack.org/#/c/124656/
12:46:35 The interface is deliberately not general, so that we do not have all the usual problems of designing too far ahead
12:46:44 shouldn't we just meet up in Paris to discuss that and maybe make actual high level draft of architecture?
12:46:45 ok, let's review that spec and go from there
12:46:54 but I think the interface is general enough to satisfy users for a while
12:47:03 inc0, +1
12:47:08 #topic Gate bugs
12:47:23 Again, plan C requires planning 1 release ahead
12:47:41 #link https://bugs.launchpad.net/heat/+bug/1374175
12:47:43 Launchpad bug 1374175 in heat "test_server_cfn_init failed in gate-tempest-dsvm-neutron-heat-slow: AssertionError: Timed out waiting for 10.1.0.4 to become reachable" [High,Confirmed]
12:47:54 that one seems to be failing a lot
12:48:00 so i just wanted to make people aware of that
12:48:25 is heat-slow voting?
12:48:26 so summarize: 1. we do not delete harestarter (yet) and leave it in deprecation until alternative are not implemented. 2. Implementation of convergence may be danger for harestarter, but we believe, that all will be ok. 3. Replacement for harestarter will be done after convergence as soon as possible.
12:48:33 I hope it's right ;)
12:48:54 notice that it came into being at the same time as a neutron gating bug
12:49:09 skraynev: was there typo in 1? Extra "not" ?
12:49:23 https://bugs.launchpad.net/tempest/+bug/1370865 <- that one has been fixed
12:49:25 Launchpad bug 1370865 in heat "tempest.api.orchestration.stacks.test_update.UpdateStackTestJSON.test_stack_update_add_remove mismatch error" [Medium,Fix committed]
12:49:43 mspreitz: yes, second is unnecessary ;)
12:49:58 zaneb, look here : http://status.openstack.org/elastic-recheck/#1374175
12:50:14 there are a bunch of issues that cropped up at the same time
12:50:22 sep29
12:50:24 zaneb, It seems to have re-happened since yesterday?
12:50:38 i wonder if there was an infra change
12:50:40 http://status.openstack.org/elastic-recheck/#1370865
12:52:05 asalkeld: yeah, something must have been changed, and not by us I suspect
12:52:48 lots of timeout issues
12:52:56 maybe slower vms?
12:53:04 network etc..
12:53:31 convergence is not implemented yet ...
12:53:33 :)
12:53:50 Bug 1311066 - Some nodes allocated in node pool are very very slow
12:53:52 Launchpad bug 1311066 in openstack-ci "Some nodes allocated in node pool are very very slow" [High,Confirmed] https://launchpad.net/bugs/1311066
12:53:53 do elastic recheck graphs include stable branches?
12:53:58 yes
12:54:40 did we start running a lot of stable/icehouse tests on the 29th maybe?
12:54:54 honestly not sure
12:55:03 but maybe
12:55:26 isn't it the last stable release just before the next new release?
12:55:45 idk
12:55:55 sounds plausible
12:56:48 any bugs that we might need in a rc2?
12:57:01 I'd rather not have bugs !
12:57:17 :) - i mean the fixes
12:57:36 #link https://bugs.launchpad.net/heat/+bugs?field.tag=juno-rc-potential
12:58:08 just a reminder, tag your bug "juno-rc-potential" if you think it should maybe be fixed in -rc2
12:58:18 okie dokie
12:58:46 btw if someone knowledgeable (i.e. not me) could look at bug 1370302 that would be great
12:58:47 Launchpad bug 1370302 in heat "heat stack-create failed due to lack of 'v2.0' in auth_uri" [Undecided,New] https://launchpad.net/bugs/1370302
12:59:10 everyone waiting for shardy to get back:-O
12:59:10 it sounds bad but could easily be PEBKAC
12:59:35 googling pebkac
12:59:59 asalkeld: you disappoint me ;)
13:00:20 ga, never heard of that
13:00:20 That bug looks weird to me
13:00:35 asalkeld: http://forum.ilikecheats.com/threads/571-new-problems-PEBKAC
13:01:07 ok, we're out of time
13:01:14 let's continue in #heat
13:01:17 k
13:01:18 ok
13:01:19 #endmeeting