20:00:51 <zaneb> #startmeeting heat
20:00:52 <openstack> Meeting started Wed Jun 4 20:00:51 2014 UTC and is due to finish in 60 minutes. The chair is zaneb. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:55 <openstack> The meeting name has been set to 'heat'
20:01:16 <skraynev> hey guys!
20:01:17 <zaneb> #topic roll call
20:01:21 <tspatzier> hi all
20:01:24 <stevebaker> \o for now
20:01:24 <mspreitz> here
20:01:30 <andrew_plunk> hello
20:01:37 <tango> Hi
20:01:40 <cyli> hi
20:01:41 <iqbalmohomed> Hello everyone
20:01:45 <zaneb> does anybody want to have a go at chairing one of these meetings?
20:01:51 <andreaRosa_home> hi
20:01:55 <randallburt> hello all
20:02:09 <jpeeler> hey
20:02:27 <mspreitz> zaneb: you mean today, or later?
20:02:35 <zaneb> mspreitz: either
20:02:43 <BillArnold_> hello
20:02:47 <mspreitz> I'll offer to do one later
20:02:51 <zaneb> I have therve doing the other one
20:02:57 <zaneb> mspreitz: cheers :)
20:03:31 <mspreitz> You can contact me directly to arrange a date
20:03:34 <SpamapS> o/
20:03:35 <zaneb> ok
20:03:42 <zaneb> #topic Convergence
20:03:48 <zaneb> SpamapS: go
20:03:49 <SpamapS> I can't chair today because I have to leave in 20m
20:03:50 <stevebaker> I can't chair, kid wrangling for school
20:03:51 <SpamapS> zaneb: thanks
20:04:02 <SpamapS> I put this first because I have to excuse myself in a bit
20:04:25 <SpamapS> I just wanted to highlight that the specs are in full review for convergence, and I really appreciate the comments and ideas already added to them.
20:04:49 <zaneb> #link https://review.openstack.org/95907
20:05:02 <zaneb> #link https://review.openstack.org/96394
20:05:11 <zaneb> #link https://review.openstack.org/96404
20:05:23 <SpamapS> I also would like to see _more_ from everyone, as this is a large change, and it will really affect Heat in a large way, so we don't want to move forward without the full support of the core reviewers and community of developers as a whole.
20:05:31 <SpamapS> zaneb: thanks, was just about to go fetch those :)
20:05:54 <SpamapS> I wanted to bring up one thing: taskflow.
20:05:58 <bgorski> o/
20:06:02 <mspreitz> Yes, I plan to say more
20:06:06 <stevebaker> SpamapS, can you tell me how the observer notifies that a thing is done? is the observer RPC API sync or async, or does the observer actually push the next task into taskflow?
20:06:28 <SpamapS> stevebaker: taskflow isn't where the list of tasks to be done goes.
20:06:46 <randallburt> SpamapS: taskflow as a concept or taskflow the specific library?
20:06:56 <SpamapS> stevebaker: taskflow would be used to manage the process of converging an out-of-sync resource.
20:07:02 <SpamapS> randallburt: the library.
20:07:11 <SpamapS> the one that all the rest of OpenStack has accepted and is adopting in some form.
20:07:12 <randallburt> SpamapS: k
20:07:50 <SpamapS> stevebaker: the graph that is in the database drives what happens after a resource has been moved to a COMPLETE state.
20:08:23 <stevebaker> SpamapS, so the converger polls the database waiting for the observer to write to it?
20:08:34 <SpamapS> stevebaker: so only convergence, not observer, changes that state, and thus it would be responsible for initiating calls to converge those items which have all of their parents in a COMPLETE state.
20:08:56 <mspreitz> SpamapS: which state?
20:09:01 <SpamapS> stevebaker: observer changes the database and calls the convergence engine whenever observed state changes.
20:09:31 <stevebaker> ok
20:09:36 <SpamapS> Oh, I just used the wrong words. Let's try that again.
20:10:23 <SpamapS> so whatever changes the observed state to match the goal state is responsible for initiating calls to converge those items which have all of their parents in a COMPLETE observed state.
20:11:02 * radix arrives
20:11:06 <SpamapS> Regarding async vs. sync, I believe that should be async, but reliable. I don't actually know if we have a way to do that with our current RPC system.
20:11:08 <mspreitz> SpamapS: without regard to whether those downstream items are currently diverged?
20:12:16 <stevebaker> SpamapS, btw, I don't think you were in the room when we volunteered therve to look at starting the observer work during the RPC notification session in Atlanta
20:12:25 <SpamapS> mspreitz: if their parents were diverged, they will either be in an "initialized only" state (meaning they exist but have never been created) or they are existing but may need their other bits updated now that parents might have new attribute values.
20:12:35 <stevebaker> SpamapS, so you should just sync up with him if you want to start it
20:12:44 <SpamapS> stevebaker: no, I was not
20:13:10 <SpamapS> and I already started POC work just to validate my assumptions while writing the spec. That is fine, set-based design and all.
20:13:40 <stevebaker> I assume he hasn't started yet
20:14:18 <mspreitz> SpamapS: My question is, for downstream items that already exist, does the engine test whether they need convergence or is there a call to the resource regardless?
20:14:21 <SpamapS> I've just written some stubs and basic methods to observe, and to walk the dependency graph to find children.
20:14:56 <SpamapS> mspreitz: a converge call on an item that is not diverged is what you just described: the engine testing whether it needs convergence.
20:15:07 <SpamapS> a relative no-op
20:15:20 <stevebaker> SpamapS, are we going to need a resource-level lock? or lock-free?
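The observer/converger split SpamapS describes could be sketched roughly like this (purely illustrative Python: the names, the in-memory stand-in for the database, and the return values are all invented for this example; this is not Heat code, and in the real design the converge call would be an async RPC, not a direct function call):

```python
# Illustrative sketch of the division of labour under discussion.
# The observer only records what it sees and notifies the converger;
# only the converger decides whether anything needs to be done.

observed = {}   # resource_id -> last observed state
goal = {}       # resource_id -> goal (template-defined) state


def observe(resource_id, state):
    """Observer: write the observed state, then call the converger.

    In the real design this call would be async RPC; the observer
    never changes the convergence state itself.
    """
    observed[resource_id] = state
    return converge(resource_id)


def converge(resource_id):
    """Converger: a 'relative noop' when the resource is in sync."""
    if observed.get(resource_id) == goal.get(resource_id):
        return "in sync"
    # ...otherwise drive the resource toward its goal state...
    return "converging"
```

The point of the sketch is simply that a converge call on a non-diverged item reduces to the `observed_state == goal_state` check and returns immediately.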
20:15:23 <SpamapS> if observed_state == goal_state: return
20:15:55 <zaneb> SpamapS: how will you handle merging dependency branches?
20:16:07 <zaneb> i.e. when one resource depends on two others
20:16:08 <mspreitz> SpamapS: I am not sure I understand your answer. Let me ask my question another way. Once a divergence is healed at one resource, is it inevitable that all recursive dependents will be called (and if not, how does it stop)?
20:16:21 <SpamapS> stevebaker: the db should be able to control access at the row level. I am hoping we don't have to be explicit there, just rely on transaction serialization (which we may need to explicitly use)
20:17:03 <mspreitz> SpamapS: We probably do not want a transaction that lives as long as it takes to do an arbitrary resource call
20:17:09 <SpamapS> mspreitz: it is inevitable that the direct dependents will have converge called on them. I don't think we'll force a full traversal though.
20:17:16 <randallburt> SpamapS: +1 and I'm crossing my fingers.
20:17:23 <zaneb> mspreitz: I assume the immediate dependencies are always triggered, and we stop at the point where nothing has changed
20:17:44 <mspreitz> So the resource can report "no change"?
20:18:01 <SpamapS> zaneb: so that's the code I was writing, I find all the children, and then look at their parents. If any parents are still diverged, I do nothing.
20:18:46 <mspreitz> regarding locking, I think we need explicit locks --- you do not want to have to hold a transaction open for as long as it takes to do the convergence for a given resource
20:18:49 <zaneb> ok, so you have a plan :)
20:18:55 <SpamapS> zaneb: only the last parent's COMPLETE state change should kick off convergence of the children.
20:19:18 <SpamapS> mspreitz: no, you don't need it for the time it takes to do the convergence.
20:19:33 <stevebaker> mspreitz, I don't think that is what was being suggested
20:19:44 <SpamapS> mspreitz: but you do need it for the time it takes to look around the graph.
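The rule SpamapS describes — find the children, look at their parents, and act only when the last parent reaches COMPLETE — might look roughly like the following sketch. The data structures and function name are hypothetical, not the actual POC code:

```python
# Sketch of "only the last parent's COMPLETE state change kicks off
# convergence of the children". `graph` maps a resource to the
# resources that depend on it; `parents` is the reverse mapping;
# `state` holds the observed states. All names are invented.

def on_complete(resource, graph, parents, state):
    """Mark `resource` COMPLETE and return the children now ready
    to have converge called on them.

    A child is skipped while ANY of its parents is still diverged;
    in the real design this check runs inside a short transaction,
    so two parents completing at once cannot both skip the child.
    """
    state[resource] = 'COMPLETE'
    ready = []
    for child in graph.get(resource, []):
        if all(state.get(p) == 'COMPLETE' for p in parents[child]):
            ready.append(child)
    return ready
```

This also answers the merging-branches question: with two parents `a` and `b` of one child `c`, completing `a` alone triggers nothing, and completing `b` afterwards is what makes `c` ready.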
20:19:50 <SpamapS> which is tiny
20:20:22 * mspreitz is figuring SpamapS is typing
20:20:26 <stevebaker> shall we move on?
20:20:36 <SpamapS> You basically don't want to be acting thinking that you're not the last COMPLETE parent, when you are. So we need that to be serialized by having a transaction open.
20:21:06 <zaneb> yes, let's move on
20:21:21 <zaneb> #topic API stability for out-of-tree resources
20:21:26 <SpamapS> It might be easier to use tooz to do that. I would accept that level of explicit locking if the transactions were deemed unreliable or overly complex
20:21:27 <zaneb> stevebaker: I'm guessing this is you
20:21:27 <stevebaker> \o
20:21:30 <SpamapS> Anyway yeah, I have to move on too
20:21:33 * SpamapS heads to dentist
20:21:38 <SpamapS> thanks everyone
20:22:12 <randallburt> we have "out-of-tree" resources?
20:22:15 <stevebaker> I'm wondering to what extent we need to keep the internal API stable to ease the burden of maintaining an out-of-tree resource
20:22:37 <zaneb> I would say to a great extent
20:22:45 <mspreitz> Sounds like a worthy goal to me
20:22:56 <zaneb> a lot of people at summit seemed to be using custom resources
20:23:28 <stevebaker> in the long term sure, but we've never set any expectations that our internal API is stable, so what should we set?
20:23:48 <stevebaker> some options are
20:24:03 <zaneb> really? I have always set the expectation that the public parts of that API are supposed to be stable
20:24:09 <randallburt> Why not encourage contrib contributions as well? Hooking your own CI/CD into Gerrit also seems like a viable solution.
20:24:19 <stevebaker> zaneb, what is public?
20:24:56 <radix> everything without an underscore in front of it? :)
20:24:59 <zaneb> stevebaker: that's a trickier question, but for the most part it's obvious
20:25:06 <randallburt> the lifecycle spec for a resource plugin (the methods you need to override) should be relatively stable, but other than that, I'm not sure there's much more the plug-in author should worry about.
20:25:07 <zaneb> what radix said for a start :D
20:25:16 <stevebaker> we could say that unless you put it in contrib or hook in 3rd-party CI then things could break at any time
20:25:24 <randallburt> radix: def not
20:25:29 <randallburt> stevebaker: +1
20:25:34 <zaneb> I have no problem with deprecating stuff
20:25:42 <zaneb> but we should follow a normal deprecation process
20:25:50 <radix> well, that's a default that python programmers expect, and afaik we don't have a published document describing what's public or not
20:25:50 <stevebaker> we could say that the Resource class is stable enough, but anything else is fair game (Stack, nova_utils etc)
20:25:57 <randallburt> things like handle_ and check_ should be changed with great care, but not much else counts (or should)
20:26:18 <radix> randallburt: that's not a good enough description, and also it's not published
20:26:31 <randallburt> radix: it is in the plug-in developers guide
20:26:44 <radix> oh. never mind then :)
20:27:05 <zaneb> yeah, I'd say to a first approximation, anything mentioned in the plugin developer guide should be stable
20:27:20 <randallburt> zaneb: +1
20:27:34 <stevebaker> also, for things which are deprecated, can we remove them as soon as the next dev cycle opens?
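For illustration, the stable surface being discussed — the documented `handle_`/`check_` lifecycle hooks an out-of-tree plugin overrides, with underscore-prefixed names treated as private — might look something like this sketch. The base class, method bodies, and `old_helper` are all invented for this example (the real base class lives in Heat's engine and is much richer); `old_helper` shows the DeprecationWarning pattern raised in the deprecation part of the discussion:

```python
import warnings


class Resource(object):
    """Stand-in for the real Heat base class, for illustration only."""

    def _do_internal_bookkeeping(self):
        # Underscore-prefixed: private, free to change between releases.
        pass


class MyCustomThing(Resource):
    """A minimal out-of-tree resource plugin (hypothetical).

    Per the discussion, only the documented lifecycle hooks are the
    surface a plugin author should rely on staying stable.
    """

    def handle_create(self):
        # Kick off creation of the backing object and return a token
        # that check_create_complete can poll with.
        return {'request_id': 42}

    def check_create_complete(self, token):
        # Poll until the backing object is ready.
        return token['request_id'] == 42


def old_helper():
    """Hypothetical deprecated internal: warn for a full cycle, then
    remove at the start of the next one."""
    warnings.warn("old_helper is deprecated; use a supported API instead",
                  DeprecationWarning)
```

Under the "underscore means private" convention, `_do_internal_bookkeeping` could change at any time, while `handle_create`/`check_create_complete` would follow the deprecation process.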
20:28:02 <mspreitz> zaneb: I presume that relationship runs both ways
20:28:22 <zaneb> stevebaker: I think the usual policy is it remains deprecated for a whole cycle, then remove at the beginning of the next one
20:28:45 <BillArnold_> zaneb +1 (deprecation for an entire cycle)
20:28:47 <skraynev> stevebaker: agree with zaneb
20:28:49 <zaneb> mspreitz: fair point :)
20:29:00 <stevebaker> dang, with the amount of refactoring convergence will need that is a long time
20:29:31 <zaneb> I'd be happy to say anything deprecated before j-1 could be removed at the beginning of K
20:29:54 <stevebaker> ok, I'm alright with that
20:30:09 <zaneb> stevebaker: we may just need a whole new plugin api for convergence
20:30:23 <mspreitz> zaneb: was that "before the open of j-1" or "before the close of j-1"?
20:30:31 <skraynev> zaneb: it would possibly be good to have a list of the (deprecated) things due for removal
20:30:38 <zaneb> mspreitz: j-1 is a milestone
20:31:17 <skraynev> zaneb: I mean it would be useful for filing bugs about deleting such things
20:31:41 <zaneb> I do think it ought to be something particularly heinous we're removing to justify rushing it
20:32:46 <zaneb> skraynev: bugs are good. DeprecationWarning in the code is also good
20:33:20 <zaneb> #topic Voting and reliability of heat-slow
20:33:25 <zaneb> stevebaker: this is also you
20:33:28 <stevebaker> me again
20:34:25 <stevebaker> the heat-slow job fails about 20% of the time due to slow nodes timing out Fedora boot, but it is a very useful check that a change is valid
20:34:25 <skraynev> one question about it: does it work now? I have seen some errors in Jenkins results
20:34:49 <stevebaker> so IMO it *must* pass before any heat change is approved
20:35:15 <zaneb> stevebaker: do you have a bug number for the timeout?
20:35:21 <mspreitz> stevebaker: what would it take to fix the timeout period?
20:35:22 <zaneb> that we can use for retriggers
20:35:40 <stevebaker> there was a change this week which broke heat-slow completely
20:35:46 <stevebaker> zaneb, I can find it later
20:36:04 <stevebaker> mspreitz, a custom image built during devstack start, which I am working on
20:36:23 <mspreitz> ugh!
20:36:47 <mspreitz> Is there a problem here to propagate upstream?
20:37:07 <zaneb> stevebaker: https://bugs.launchpad.net/tempest/+bug/1297560 ?
20:37:09 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New]
20:37:26 <stevebaker> so we can either spank the miscreants when heat-slow-breaking changes land, or we could make heat-slow a voting job just for heat changes
20:37:54 <zaneb> I would support the latter, if the fix for the current issue ever gets merged
20:38:20 <stevebaker> I think it has merged, I can't see it in therve's changes any more
20:38:33 <randallburt> +1 for the latter
20:38:44 <zaneb> I didn't know anything was getting merged :D
20:39:30 <stevebaker> zaneb, that is the bug. it would be better if that bug described what a logind timeout on boot looked like
20:39:42 <mspreitz> stevebaker: is the time limit in the image or something less "baked"?
20:39:55 <stevebaker> ok, action for me: make heat-slow voting
20:39:58 <zaneb> #agreed we want the heat-slow Tempest job gating on Heat patches
20:40:14 <stevebaker> mspreitz, another option is to switch to Ubuntu, which also requires a custom image
20:40:32 <zaneb> #info the bug for timeout rechecks is bug 1297560
20:40:34 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New] https://launchpad.net/bugs/1297560
20:40:40 <mspreitz> Is the customization to change the time limit, or is it about something else?
20:40:42 <zaneb> #link https://bugs.launchpad.net/tempest/+bug/1297560
20:40:43 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New]
20:41:09 <stevebaker> mspreitz, a custom image is also required to install the software-config agent and hooks
20:41:24 <zaneb> #action stevebaker to make heat-slow voting on heat
20:41:49 <zaneb> #topic Review last meeting's actions
20:42:07 <zaneb> 1. zaneb to sync with andrew_plunk on status of Rackspace CI Jenkins job
20:42:16 <zaneb> we synced
20:42:21 <zaneb> nothing had happened yet
20:42:35 <zaneb> andrew_plunk: want to give a quick update?
20:42:55 <andrew_plunk> zaneb, things are happening now, I have had to disable the job because of some build failures in the template we are using for integration testing
20:43:39 <andrew_plunk> which has led me down the path of how to do reporting when an upstream patch causes a problem on the Rackspace cloud
20:43:52 <zaneb> #action andrew_plunk to continue working on integrating Rackspace 3rd-party CI
20:43:56 <andrew_plunk> and how to differentiate those problems from service failures
20:43:59 <andrew_plunk> ok. Thanks zaneb
20:44:11 <zaneb> 2. zaneb to sync with stevebaker on metadata in resource plugin API
20:44:21 <zaneb> we synced on this too
20:44:28 <zaneb> I have no recollection of the outcome :D
20:44:50 <zaneb> I guess this is actually the agenda item we just discussed
20:45:00 <mspreitz> ?
20:45:40 <stevebaker> oh, I haven't put a patch together for that yet
20:45:46 <mspreitz> zaneb: can you pls explain that?
20:46:01 <zaneb> mspreitz: it's about stability of the resource plugin API
20:46:03 <stevebaker> too busy breaking other internal API ;)
20:46:19 * zaneb runs away
20:46:22 <mspreitz> pls remind me, are we adding or removing metadata?
20:46:29 <zaneb> mspreitz: neither
20:46:37 <zaneb> stevebaker broke the api
20:46:50 <zaneb> we discussed how to fix it again before anyone notices
20:47:08 <mspreitz> got it, thanks
20:47:55 <zaneb> #action stevebaker to put up a patch to restore the metadata attribute of Resources for plugin API backwards compatibility
20:48:15 <zaneb> #topic oslo-messaging blueprint
20:48:26 <zaneb> #link https://blueprints.launchpad.net/heat/+spec/oslo-messaging
20:48:46 <zaneb> sdake was working on migrating us to oslo-messaging
20:48:54 <zaneb> however, he is sadly indisposed
20:49:11 <zaneb> I know a lot of people are wanting this change to go through
20:49:18 <zaneb> does anyone want to pick it up?
20:49:35 <zaneb> there are patches available, it just needs a bit of a tidy-up I think
20:50:05 <zaneb> now that therve has fixed the config options generation, it should be relatively easy
20:50:14 <cyli> zaneb: is this something someone new to heat can pick up easily?
20:50:40 <stevebaker> there was someone keen to do that, I can't remember who
20:50:55 <zaneb> cyli: it could be, actually
20:51:28 <stevebaker> i need to go now o/
20:51:34 <zaneb> ok, if anyone wants it, grab the blueprint
20:51:52 <zaneb> if multiple people want it, ping me and I can co-ordinate
20:52:05 <iqbalmohomed> I'm also new and interested ... will take a look at it before committing :D
20:52:32 <skraynev> zaneb: IMO, sdake will be a better person for finishing it :) He already knows all the possible problems
20:53:41 <zaneb> #topic juno-1 blueprint status
20:54:42 <zaneb> I bumped a lot of blueprints that were targeted at j-1
20:55:07 <zaneb> #link https://launchpad.net/heat/+milestone/juno-1
20:56:04 <skraynev> looks like all have good progress
20:56:17 <zaneb> if you think your blueprint will actually land, please target it back
20:56:27 <zaneb> there are a couple there that still need approval
20:56:59 <zaneb> I haven't had the chance to look at the reviews for them yet
20:57:32 <zaneb> and for all blueprints you're working on, please keep the delivery status up to date
20:57:45 <tspatzier> zaneb: remind me, when is j-1?
20:57:50 <zaneb> #action everyone to update delivery status on blueprints
20:57:55 <zaneb> tspatzier: next week
20:58:19 <zaneb> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
20:58:28 <tspatzier> zaneb: thanks, that makes the decision for one of my bps easy
20:58:30 <BillArnold_> tspatzier: 2014-06-12
20:58:34 <skraynev> so close... I forgot about it.
20:58:37 <zaneb> #info juno-1 milestone is next week
20:58:49 <zaneb> yeah, the first one goes quick
20:59:01 <zaneb> the key is to start working on stuff *before* summit
20:59:07 <tspatzier> zaneb: I have one that has not yet been approved and targeted. I think j-2 should work - https://blueprints.launchpad.net/heat/+spec/action-aware-sw-config
20:59:43 <zaneb> tspatzier: thanks, I'll take a look
21:00:01 <zaneb> #topic Critical issues sync
21:00:12 <zaneb> I think everyone left
21:00:22 <zaneb> and we're out of time anyway
21:00:24 <iqbalmohomed> :)
21:00:29 <zaneb> #endmeeting