20:01:23 #startmeeting heat
20:01:24 Meeting started Wed Aug 21 20:01:23 2013 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:27 The meeting name has been set to 'heat'
20:01:34 #topic rollcall
20:01:40 hi all, who's around?
20:01:42 jpeeler here
20:01:42 o/
20:01:43 o/
20:01:44 hello!
20:01:45 here
20:01:48 Hi!
20:01:55 Hi!
20:02:01 \o
20:02:36 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda
20:03:10 o/
20:03:14 (leaving early)
20:03:17 o/
20:03:24 asalkeld, sdake?
20:03:37 ok lets get started
20:03:56 #topic Review last week's actions
20:04:27 #link http://eavesdrop.openstack.org/meetings/heat/2013/heat.2013-08-14-20.00.html
20:04:35 * shardy shardy to post mission statement
20:04:42 oops, I forgot to do that, again
20:04:59 does anyone have a link to the thread where this was requested, then I'll reply after this meeting?
20:05:08 #action shardy to post mission statement, again
20:05:20 I think people just post new threads with the statement
20:05:33 stevebaker: Ok, thanks
20:05:44 anything else from last week before we move on?
20:06:23 #topic Reminder re Feature proposal freeze
20:07:10 So it's this Friday, any features posted for review after that will require an exception, so we should start -2ing stuff which gets posted late
20:07:12 The gate already decided for us it seems
20:07:25 therve: haha, yea quite possibly ;)
20:07:29 * bnemec is here, but got distracted playing with stackalytics...
20:07:39 maybe we should review late items at the meeting
20:08:15 stevebaker: yup, next item is h3 bps
20:08:24 #topic h3 blueprint status
20:08:32 so, I'm working on https://bugs.launchpad.net/heat/+bug/1176142
20:08:34 Launchpad bug 1176142 in heat "UPDATE_REPLACE deletes things before it creates the replacement" [High,In progress]
20:08:40 dunno if you would call that a "feature"
20:08:46 #link https://launchpad.net/heat/+milestone/havana-3
20:08:49 but it is *not* going to land by friday
20:08:50 zaneb: \o/
20:09:03 zaneb: that is a bug 100%
20:09:04 Launchpad bug 100 in launchpad "uploading po file overwrites authors list" [Medium,Fix released] https://launchpad.net/bugs/100
20:09:09 haha doh
20:09:22 basically, it takes longer to rebase my current work than time I have to work on it
20:09:24 zaneb: well technically I think it's only BPs which are frozen, but IMO we should start deferring any big bugs which contain new functionality not just fixes
20:09:36 so I have a negative amount of time to spend on the actual problem
20:09:40 zaneb: so maybe we defer until early icehouse?
20:09:45 which is already *really* hard
20:09:48 #link https://launchpad.net/heat/+milestone/havana-3
20:10:21 Ok so I think heat-trusts will slip too - I've just had too many problems with keystone, and it's taken too long to get the client support merged
20:10:27 zaneb: are you saying that you have so many things in-flight that you spend all day rebasing?
20:10:30 yeah, let's see how things go over the next week or so, but I'm thinking it will probably have to be bumped :(
20:10:40 shardy: sad :(
20:10:59 zaneb, Considering it's a bug, can't it go past the feature freeze?
20:11:08 Or is it too big of a behavior change?
20:11:18 it's a big behaviour change, tbh
20:11:19 bugs are exempt
20:11:21 zaneb, radix: IMO it's much better to bump big stuff which is too late than land loads of risky stuff and make our first integrated release really broken
20:11:31 there is no point in delaying a bugfix
20:11:57 shardy: yeah, I understand
20:12:03 adrian_otto: This is a complete rework of our update logic, so it's not a normal bugfix
20:12:20 same logic applies
20:12:24 but yeah, in general bugs are fine
20:12:29 unless you are adding a new feature?
20:12:29 zaneb: if the UPDATE_REPLACE is fixed, i think it's going to be easier to implement the rolling update for as-update-policy.
20:12:35 I was just sad that the keystone stuff is buggy
20:13:04 radix: well it's very new stuff, AFAICT we're the first to try to really use it
20:13:33 shardy: I think landing potentially broken stuff early is also an anti-pattern. Lets land good stuff.
20:13:34 yeah, that's usually how it goes when you have inter project dependencies like that
20:14:31 SpamapS: yeah, but if we've got stuff which is rushed and may be flaky, or depends on stuff known-to-be-flaky in other projects, now is not really the time to land it
20:14:45 early in the cycle is much less risky as we've got more time to fix and test
20:14:52 Yes, what I am saying is, lets not count on it being landable, flaky, on day 1 of icehouse.
20:15:07 There is never a time where it is ok to risk breaking everything.
20:15:53 SpamapS: sure, but deferring gives those working on said features more time to test and solidify their stuff before it's merged
20:16:16 deferring +1. Planning to de-stabilize trunk, -1.
20:16:40 (and I acknowledge that you were just saying "lets defer")
20:16:53 I'm just saying, don't defer, and stop working on it.
20:17:02 defer landing, keep stabilizing it.
20:17:11 Yeah, who was saying destabilize trunk, I think you just made that up ;P
20:17:17 anyways..
20:17:32 The entire software industry made that up. It's called "lets just drop it in trunk when we re-open after freeze".
20:17:43 There are a few bp's still in "Good Progress", some of which have patches posted I think:
20:17:45 * SpamapS moves on :)
20:18:05 #link https://blueprints.launchpad.net/heat/+spec/hot-parameters
20:18:24 #link https://blueprints.launchpad.net/heat/+spec/multiple-engines
20:18:52 #link https://blueprints.launchpad.net/heat/+spec/heat-multicloud
20:18:53 Should have something to submit today
20:18:58 for multi-engines
20:19:07 i mean multicloud :/
20:19:18 too many multis
20:19:23 #link https://blueprints.launchpad.net/heat/+spec/oslo-db-support
20:19:45 Ok, cool, just wanted to see if some of those should actually be either Implemented or Needs Code Review
20:19:57 anyone know if hot-parameters is really done?
20:20:34 I think it's really sort-of done
20:20:42 https://review.openstack.org/#/q/status:merged+project:openstack/heat+branch:master+topic:bp/hot-parameters,n,z
20:21:08 All the stuff posted is merged, so can we claim it Implemented for h3 purposes?
20:21:22 for h3 purposes I think so, yes
20:21:43 zaneb: Ok, cool, thanks
20:22:14 best to double check with nanjj too though
20:22:19 then maybe it needs another bp for post-h features
20:22:57 I moved a few bugs into h3 which look like it would be good to fix, if anyone has bandwidth and needs something to do please pick them up :)
20:22:57 radix: yes, quite likely
20:23:20 Ok I'll ping nanjj to check tomorrow
20:23:43 Anyone else have anything to raise re h3 before we move on?
20:23:58 I may slip my lbaas bp in it
20:24:09 It "just" needs one more branch I think
20:24:33 Overall I think we've done really really well, 27 bps and 60 bugs atm, if we land most of that it's going to be a great effort :)
20:24:33 what's the remaining branch?
20:24:54 therve: Ok, cool, if things are up for review but not targeted please add them
20:24:59 radix, https://review.openstack.org/#/c/41475/
20:25:07 i'm still working to get my last patch submitted for review for as-update-policy before end of week. as-update-policy was moved to "next".
20:25:17 I think m4dcoder posted a patch which got bumped and we may be able to pull back
20:25:21 snap
20:25:21 oh OK
20:25:59 m4dcoder: Ok, if it looks like it's going to land, I'll pull it back
20:26:19 thx. i have 1 in review and another 1 i'm going to submit before end of week for as-update-policy.
20:26:45 m4dcoder: Ok, sounds good thanks
20:27:43 #topic Open Discussion
20:27:54 anyone have anything?
20:28:05 shardy: I will follow up with nanjj tonight on the hot-parameter validation blueprint, I think it is done like you mentioned. Sorry I was on another meeting.
20:28:22 can the havana release of heat still work with openstack instances on grizzly?
20:28:31 spzala: Ok, that would be great, thanks, pls change the Implementation status if so
20:28:43 there was a recent change where if a nova boot fails, the instance resource deletes the server during create. Do we want to be doing this?
20:29:02 stevebaker: excellent question
20:29:02 m4dcoder: You mean on a grizzly openstack install?
20:29:06 shardy: OK, no problem. Yup, will do.
20:29:08 shardy: yes
20:29:14 I don't want to be doing that, it frightens me
20:29:29 m4dcoder: maybe, but it's not something we support, you need to use stable/grizzly
20:29:44 I'm inclined to leave the failed server there, for post-mortem if nothing else
20:29:54 I have been wondering about having an explicit "try to converge" operation in heat
20:30:37 which would basically clean up and retry to get things to look like the template
20:30:39 Are we deleting it, and re-trying?
20:30:42 I do like that
20:30:46 an ERROR state is a dead instance.
20:30:54 just deleting and putting the resource in FAILED state
20:30:55 stevebaker: I agree, I don't think we want to delete, or try to delete until stack delete
20:31:07 i shall raise a bug
20:31:13 no, it doesn't currently retry. and I don't think it should by default
20:31:27 shardy: thanks.
20:31:29 yeah that could lead to a large bill. ;)
20:31:42 instance group does this too BTW - deletes the sub resources
20:32:24 radix: yeah I was looking at that in conjunction with a patch from liang
20:32:46 how did a major change to the behaviour of instances sneak in in a commit that claimed to be just adding a Rackspace resource?
20:32:48 radix, It'd be nice to have at least an API call to "retry" create
20:32:50 https://github.com/openstack/heat/commit/2684f2bb4cda1b1a23ce596fcdb476bb961ea3f8
20:32:53 radix: IMO that is also wrong, the InstanceGroup resource should go into a failed state (probably UPDATE, FAILED) if it can't adjust
20:32:57 that's extremely uncool
20:33:18 therve: there is a bug for allowing retry of create/update
20:33:20 yeah. I didn't do it, I just tried to maintain the behavior through my refactor :)
20:33:32 f
20:34:08 zaneb, That's a bit sad and untested :/
20:34:30 zaneb: Wow, that was a really bad commit. It introduced the ResourceFailure bug too.
20:34:32 shardy: all these transitions to failed state make the urgency of needing a "RETRY" capability go up.
20:34:39 radix: sure, well lets raise a bug and fix it
20:35:02 bnemec: no, I introduced the ResourceFailure bug by not spotting that (bizarre) change
20:35:04 have not had time to address the lack thereof.. but would still like to very much
20:35:08 SpamapS: well IIRC it's assigned to you...
20:35:33 bnemec: and by "not spotting" I mean "relying on the unit tests instead of grep"
20:35:42 SpamapS: if you don't have the bandwidth, let me know and we'll reassign
20:35:45 Yeah, I keep running into these things where heat is a time bomb waiting to eat all of your memory/disk/cpu ... can't seem to prioritize retry over those. ;)
20:35:48 * zaneb goes to the naughty corner
20:36:06 zaneb: Yeah, part of the problem with that is it's too large. 800 some lines is too much to review properly.
20:36:29 SpamapS we're running into the same problem.. other general scalability issues are keeping us from getting to implementing a Retry
20:36:47 * SpamapS watches the ducks line up
20:36:56 but, +1 on wanting retry, both retry create, and retry individual steps of the create.
20:36:56 and now I have an appointment that I have to get to.
20:36:59 bnemec: well, the problem is when it says "Add resource for Rackspace Cloud Servers" but actually makes fundamental changes to other resources :)
20:37:00 anybody need me for something before I go?
20:37:44 kebray: OK well lets coordinate getting someone (if SpamapS can't get to it) looking at that soon
20:37:54 shardy sounds good.
20:38:01 SpamapS: o/
20:38:21 * SpamapS goes poof
20:38:22 So I have a general question re upgrade strategy when we move to trusts..
20:38:38 shoot
20:39:00 the cleanest way is to drop the DB and just use trusts for all user_creds, but I'm thinking we need to allow transition, ie existing stacks should still work
20:39:24 zaneb: Sure, but in a 200 line change that probably gets shot down by reviewers. In 800 it gets lost in the noise.
20:39:32 (sorry for the tangent in the middle of the meeting)
20:39:45 so my current plan is to extend the context and user_creds adding a trust_id, which we use if it's there, otherwise we fall back to the user/pass for old, existing stacks
20:39:59 shardy, Is there a way to migrate existing stacks?
20:40:04 bnemec: fair point; that's hard to avoid when you're adding a whole new resource though
20:40:05 shardy: how about storing in resource_data?
20:40:22 therve: not really, because you don't have a connection to keystone at DB migrate time
20:40:40 bnemec: but yes, that should have been at least 3 patches. and at least 1 should have been rejected ;)
20:40:45 but we could write a tool which creates a trust using the stored credentials, and migrate it that way
20:41:05 zaneb: Agreed. :-)
20:41:25 shardy: does this mean new secrets need to make it onto instances?
20:41:34 stevebaker: I was thinking of just adding the trust_id to the stack table, but that makes the overlap between old/new methods harder
20:41:38 oh, this is just for api requests
20:41:38 shardy: heat-manage could do that maybe?
20:41:54 stevebaker: no, not yet, this is just the credentials for periodic tasks in the engine
20:42:19 zaneb: hmm, yeah, but heat-manage would need the ID of the heat service user
20:42:36 short answer is, it would be nice if the old method continued to work in parallel
20:42:42 I guess it could read it from a cli arg or config file
20:43:00 shardy: it would need the whole config file to decrypt the credentials anyway
20:43:08 shardy: but that seems doable
20:43:13 zaneb: good point
20:43:35 shardy: if you want to get really fancy, you could do it in the db migration ;)
20:43:44 stevebaker: agreed, but then do we publish e.g. that we'll transition to just trusts after e.g. one cycle?
20:44:21 zaneb: haha, yeah I guess, was trying to keep things simple ;)
20:44:45 shardy: actually, we may need to keep the old way for a while if we support older openstack clouds
20:45:07 Ok thanks all for the input, will try to get a wip patch up for review soon, aiming for early Icehouse when the keystoneclient patches etc have landed
20:45:19 like, indefinitely. And have some way of discovering if the keystone supports trusts
20:45:29 ick
20:45:36 stevebaker: you mean for multicloud?
20:45:42 shardy: yes
20:45:48 gah
20:45:59 it's extremely, extremely uncool that we are storing credentials at all
20:46:00 :D
20:46:02 that could get really messy
20:46:17 I think it's better to say that if you don't have a compatible keystone, you lose out
20:46:23 zaneb: yes, unless it is a private heat installation
20:46:39 than to say that if you don't have a compatible keystone, we store your password in a really insecure way
20:46:48 zaneb: that's why I was hoping everyone would say drop-the-db, kill the stored-creds ;)
20:46:51 plaintext!
20:46:59 it may as well be
20:47:31 if we keep both around, you'll never be sure which we're doing
20:47:43 maybe we can look into a keystore for icehouse
20:47:47 stevebaker: So we say master only supports havana for native openstack deployments, are you saying we somehow have to maintain indefinite backwards compat for multicloud?
20:48:29 zaneb: If we can manage a flag-day migration, I would much prefer it, and the resulting code will be much much easier to maintain
20:48:30 shardy: that is something we should discuss
20:48:44 sorry, I have to go.
20:48:56 stevebaker: Ok lets pick it up on the ML
20:49:15 anyone else have anything for the last few minutes?
20:49:37 I was distracted for a bit, was a bug filed about not deleting failed instances?
20:49:45 I can file one for the InstanceGroup and take that on
20:50:20 radix: I think stevebaker was going to file one
20:50:32 for Instance, that is
20:50:34 radix: not yet, please do, it was discussed ref https://review.openstack.org/#/c/42462/
20:51:03 okie doke
20:51:17 radix: https://bugs.launchpad.net/heat/+bug/1215132
20:51:19 whoah. nice new format for jenkins comments :D
20:51:20 Launchpad bug 1215132 in heat "Nova server gets deleted immediately after failed create" [High,Confirmed]
20:51:33 zaneb: ok, I'll create a similar one for InstanceGroup.
20:52:09 funzo: Did you get the feedback you needed re autoscaling last week?
20:52:30 funzo: guess you mainly need to speak with asalkeld re alarms etc from your ML post?
20:53:06 I didn't have a long discussion, it was just pointing to the doc
20:53:14 I can't assign bugs to milestones, so if someone wants to put https://bugs.launchpad.net/heat/+bug/1215140 in h3 that'd be peachy
20:53:15 Launchpad bug 1215140 in heat "InstanceGroup shouldn't delete instances that failed to be created" [Undecided,New]
20:53:57 shardy: I believe the general thought is we could use nested stacks as a first cut, but I've been focused more on DIB this past week.
20:54:21 funzo: Ok, well shout if there's any info you need from us :)
20:54:23 shardy: I'll probably talk more about the scaling work when the rhel images are booting in os
20:54:30 shardy: definitely will, thx
20:54:34 funzo: Ok, cool
20:54:47 anything else before we wrap things up?
20:55:25 Ok then, well thanks all!
20:55:31 #endmeeting
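
The create-failure behaviour debated above (20:28:43 through 20:31:07, and bugs 1215132/1215140) boils down to: when a nova server goes to ERROR during create, mark the resource as failed and leave the server around for post-mortem instead of deleting it. The Python sketch below illustrates just that check; the class and method names are illustrative assumptions, not Heat's actual resource code.

    # Illustrative sketch only -- not Heat's actual Instance resource.
    # A server that hits ERROR during create is reported as a failure but is
    # NOT deleted, so it stays around for debugging until stack delete.

    class ServerCreateError(Exception):
        """Raised when the underlying server enters ERROR during create."""


    class InstanceSketch:
        CREATE_COMPLETE = 'CREATE_COMPLETE'
        CREATE_FAILED = 'CREATE_FAILED'

        def __init__(self):
            self.state = None

        def check_create_complete(self, server_status):
            """Poll helper: return True when done, False to keep polling."""
            if server_status == 'ERROR':
                # Record the failure; deliberately do not delete the server.
                self.state = self.CREATE_FAILED
                raise ServerCreateError('server entered ERROR state during create')
            if server_status == 'ACTIVE':
                self.state = self.CREATE_COMPLETE
                return True
            return False  # still building, poll again


    if __name__ == '__main__':
        res = InstanceSketch()
        try:
            res.check_create_complete('ERROR')
        except ServerCreateError as exc:
            print(res.state, '-', exc)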
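
shardy's trusts upgrade plan (20:39:45) can be sketched the same way: store a trust_id alongside the existing user_creds, use it when present, and fall back to the stored username/password for stacks created before trusts support. The minimal illustration below assumes hypothetical field names and caller-supplied context builders; the real Heat implementation may differ.

    # Illustrative sketch of the trust_id fallback described at 20:39:45.
    # Field names and the two context-builder callables are assumptions made
    # for illustration; they are not Heat's actual API.

    def context_from_user_creds(creds, create_trust_context, create_password_context):
        """Build a context for the engine's periodic tasks from stored creds.

        creds: dict-like row from the user_creds table.
        create_trust_context / create_password_context: callables supplied by
        the caller (e.g. thin wrappers around keystoneclient).
        """
        trust_id = creds.get('trust_id')
        if trust_id:
            # New-style stack: impersonate the user via a keystone trust.
            return create_trust_context(trust_id)
        # Pre-trusts stack: fall back to the stored username/password.
        return create_password_context(creds['username'], creds['password'])


    if __name__ == '__main__':
        # Toy usage with dicts standing in for real contexts.
        make_trust = lambda tid: {'auth': 'trust', 'trust_id': tid}
        make_pw = lambda user, pw: {'auth': 'password', 'username': user}
        print(context_from_user_creds({'trust_id': 'abc123'}, make_trust, make_pw))
        print(context_from_user_creds({'username': 'demo', 'password': 's3cret'},
                                      make_trust, make_pw))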