20:00:51 <zaneb> #startmeeting heat
20:00:52 <openstack> Meeting started Wed Jun  4 20:00:51 2014 UTC and is due to finish in 60 minutes.  The chair is zaneb. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:55 <openstack> The meeting name has been set to 'heat'
20:01:16 <skraynev> hey guys!
20:01:17 <zaneb> #topic roll call
20:01:21 <tspatzier> hi all
20:01:24 <stevebaker> \o for now
20:01:24 <mspreitz> here
20:01:30 <andrew_plunk> hello
20:01:37 <tango> Hi
20:01:40 <cyli> hi
20:01:41 <iqbalmohomed> Hello everyone
20:01:45 <zaneb> does anybody want to have a go at chairing one of these meetings?
20:01:51 <andreaRosa_home> hi
20:01:55 <randallburt> hello all
20:02:09 <jpeeler> hey
20:02:27 <mspreitz> Zaneb: you mean today, or later?
20:02:35 <zaneb> mspreitz: either
20:02:43 <BillArnold_> hello
20:02:47 <mspreitz> I'll offer to do one later
20:02:51 <zaneb> I have therve doing the other one
20:02:57 <zaneb> mspreitz: cheers :)
20:03:31 <mspreitz> You can contact me directly to arrange a date
20:03:34 <SpamapS> o/
20:03:35 <zaneb> ok
20:03:42 <zaneb> #topic Convergence
20:03:48 <zaneb> SpamapS: go
20:03:49 <SpamapS> I can't chair today because I have to leave in 20m
20:03:50 <stevebaker> I can't chair, kid wrangling for school
20:03:51 <SpamapS> zaneb: thanks
20:04:02 <SpamapS> I put this first because I have to excuse myself in a bit
20:04:25 <SpamapS> I just wanted to highlight that the specs are in full review for convergence, and I really appreciate the comments and ideas already added to them.
20:04:49 <zaneb> #link https://review.openstack.org/95907
20:05:02 <zaneb> #link https://review.openstack.org/96394
20:05:11 <zaneb> #link https://review.openstack.org/96404
20:05:23 <SpamapS> I also would like to see _more_ from everyone, as this is a large change, and it will really affect Heat in a large way, so we don't want to move forward without the full support of the core reviewers and community of developers as a whole.
20:05:31 <SpamapS> zaneb: thanks was just about to go fetch those :)
20:05:54 <SpamapS> I wanted to bring up one thing: taskflow.
20:05:58 <bgorski> o/
20:06:02 <mspreitz> Yes, I plan to say more
20:06:06 <stevebaker> SpamapS, can you tell me how the observer notifies that a thing is done? is the observer RPC API sync or async, or does the observer actually push the next task into taskflow?
20:06:28 <SpamapS> stevebaker: taskflow isn't where the list of tasks to be done goes.
20:06:46 <randallburt> SpamapS:  taskflow as a concept or taskflow the specific library?
20:06:56 <SpamapS> stevebaker: taskflow would be used to manage the process of converging an out of sync resource.
20:07:02 <SpamapS> randallburt: the library.
20:07:11 <SpamapS> the one that all the rest of OpenStack has accepted and is adopting in some form.
20:07:12 <randallburt> SpamapS:  k
20:07:50 <SpamapS> stevebaker: the graph that is in the database drives what happens after a resource has been moved to a COMPLETE state.
20:08:23 <stevebaker> SpamapS, so the converger polls the database waiting for the observer to write to it?
20:08:34 <SpamapS> stevebaker: so only convergence, not observer, changes that state, and thus it would be responsible for initiating calls to converge those items which have all of their parents in a COMPLETE state.
20:08:56 <mspreitz> SpamapS: which state?
20:09:01 <SpamapS> stevebaker: observer changes the database and calls the convergence engine whenever observed state changes.
20:09:31 <stevebaker> ok
20:09:36 <SpamapS> Oh, I just used the wrong words. Let's try that again.
20:10:23 <SpamapS> so whatever changes the observed state to match the goal state is responsible for initiating calls to converge those items which have all of their parents in a COMPLETE observed state.
20:11:02 * radix arrives
20:11:06 <SpamapS> Regarding async vs. sync, I believe that should be async, but reliable. I don't actually know if we have a way to do that with our current RPC system.
20:11:08 <mspreitz> SpamapS: without regard to whether those downstream items are currently diverged?
20:12:16 <stevebaker> SpamapS, btw, I don't think you were in the room when we volunteered therve to look at starting the observer work during the RPC notification session in Atlanta
20:12:25 <SpamapS> mspreitz: if their parents were diverged, they will either be in an "initialized only" state (meaning they exist but have never been created) or they are existing but may need their other bits updated now that parents might have new attribute values.
20:12:35 <stevebaker> SpamapS, so you should just sync up with him if you want to start it
20:12:44 <SpamapS> stevebaker: no I was not
20:13:10 <SpamapS> and I already started POC work just to validate my assumptions while writing the spec. That is fine, set based design and all.
20:13:40 <stevebaker> I assume he hasn't started yet
20:14:18 <mspreitz> SpamapS: My question is, for downstream items that already exist, does engine test whether they need convergence or is there a call to the resource regardless?
20:14:21 <SpamapS> I've just written some stubs and basic methods to observe, and to walk the dependency graph to find children.
20:14:56 <SpamapS> mspreitz: a converge call on an item that is not diverged is what you just described: engine testing whether it needs convergence.
20:15:07 <SpamapS> a relative noop
20:15:20 <stevebaker> SpamapS, are we going to need a resource-level lock? or lock free?
20:15:23 <SpamapS> if observed_state == goal_state: return
20:15:55 <zaneb> SpamapS: how will you handle merging dependency branches?
20:16:07 <zaneb> i.e. when one resource depends on two others
20:16:08 <mspreitz> SpamapS: I am not sure I understand your answer.  Let me ask my question another way.  Once a divergence is healed at one resource, is it inevitable that all recursive dependents will be called (and if not, how does it stop)?
20:16:21 <SpamapS> stevebaker: the db should be able to control access at the row level. I am hoping we don't have to be explicit there, just rely on transaction serialization (which we may need to explicitly use)
20:17:03 <mspreitz> SpamapS: We probably do not want a transaction that lives as long as it takes to do an arbitrary resource call
20:17:09 <SpamapS> mspreitz: it is inevitable that the direct dependents will have converge called on them. I don't think we'll force a full traversal though.
20:17:16 <randallburt> SpamapS:  +1 and I'm crossing my fingers.
20:17:23 <zaneb> mspreitz: I assume the immediate dependencies are always triggered, and we stop at the point where nothing has changed
20:17:44 <mspreitz> So the resource can report "no change" ?
20:18:01 <SpamapS> zaneb: so thats the code I was writing, I find all the children, and then look at their parents. If any parents are still diverged, I do nothing.
20:18:46 <mspreitz> regarding locking, I think we need explicit locks --- you do not want to have to hold a transaction open for as long as it takes to do the convergence for a given resource
20:18:49 <zaneb> ok, so you have a plan :)
20:18:55 <SpamapS> zaneb: only the last parent's COMPLETE state change should kick off convergence of the children.
20:19:18 <SpamapS> mspreitz: no you don't need it for the time it takes to do the convergence.
20:19:33 <stevebaker> mspreitz, I don't think that is what was being suggested
20:19:44 <SpamapS> mspreitz: but you do need it for the time it takes to look around the graph.
20:19:50 <SpamapS> which is tiny
20:20:22 * mspreitz is figuring SpamapS is typing
20:20:26 <stevebaker> shall we move on?
20:20:36 <SpamapS> You basically don't want to act on the assumption that you're not the last COMPLETE parent when in fact you are. So we need that to be serialized by having a transaction open.
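[Editor's note: the traversal SpamapS describes above — find the children of a newly COMPLETE resource and trigger convergence only on children whose parents are all COMPLETE, with an in-sync converge call being a near-noop — can be sketched roughly as below. This is an illustrative sketch only; the graph representation and the names `on_resource_complete`, `converge`, etc. are assumptions, not Heat's actual code.]

```python
# Illustrative sketch of the convergence traversal discussed above.
# The graph structure and function names are hypothetical, not Heat's
# actual implementation.

COMPLETE = 'COMPLETE'

def on_resource_complete(graph, resource, converge):
    """Called when `resource` reaches the COMPLETE observed state.

    graph: dict mapping resource name -> {'parents': [...],
           'children': [...], 'state': ...}
    converge: callable invoked for each child that is ready.
    """
    for child in graph[resource]['children']:
        # Only act if *every* parent of the child is COMPLETE; the last
        # parent to complete is the one that kicks off the child.
        if all(graph[p]['state'] == COMPLETE
               for p in graph[child]['parents']):
            converge(child)

def converge_resource(observed_state, goal_state, do_converge):
    # A converge call on a resource that is not diverged is a
    # relative noop, per the discussion above.
    if observed_state == goal_state:
        return
    do_converge()
```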
20:21:06 <zaneb> yes, let's move on
20:21:21 <zaneb> #topic API stability for out-of-tree resources
20:21:26 <SpamapS> It might be easier to use tooz to do that. I would accept that level of explicit locking if the transactions were deemed unreliable or overly complex
20:21:27 <zaneb> stevebaker: I'm guessing this is you
20:21:27 <stevebaker> \o
20:21:30 <SpamapS> Anyway yeah I have to move on too
20:21:33 * SpamapS heads to dentist
20:21:38 <SpamapS> thanks everyone
20:22:12 <randallburt> we have "out-of-tree" resources?
20:22:15 <stevebaker> I'm wondering to what extent we need to keep the internal API stable to ease the burden of maintaining out-of-tree resources
20:22:37 <zaneb> I would say to a great extent
20:22:45 <mspreitz> Sounds like a worthy goal to me
20:22:56 <zaneb> a lot of people at summit seemed to be using custom resources
20:23:28 <stevebaker> in the long term sure, but we've never set any expectations that our internal API is stable, so what should we set?
20:23:48 <stevebaker> some options are
20:24:03 <zaneb> really? I have always set the expectation that the public parts of that API are supposed to be stable
20:24:09 <randallburt> Why not encourage contrib contributions as well? Hooking your own CI/CD into Gerrit also seems like a viable solution.
20:24:19 <stevebaker> zaneb, what is public?
20:24:56 <radix> everything without an underscore in front of it? :)
20:24:59 <zaneb> stevebaker: that's a trickier question, but for the most part it's obvious
20:25:06 <randallburt> the lifecycle spec for a resource plugin (the methods you need to override) should be relatively stable, but other than that, I'm not sure there's much more the plug-in author should worry about.
20:25:07 <zaneb> what radix said for a start :D
20:25:16 <stevebaker> we could say that unless you put it in contrib or hook in 3rd-party CI then things could break at any time
20:25:24 <randallburt> radix:  def not
20:25:29 <randallburt> stevebaker:  +1
20:25:34 <zaneb> I have no problem with deprecating stuff
20:25:42 <zaneb> but we should follow a normal deprecation process
20:25:50 <radix> well, that's a default that python programmers expect, and afaik we don't have a published document describing what's public or not
20:25:50 <stevebaker> we could say that the Resource class is stable enough, but anything else is fair game (Stack, nova_utils etc)
20:25:57 <randallburt> things like handle_ and check_ should be changed with great care, but not much else counts (or should)
20:26:18 <radix> randallburt: that's not a good enough description, and also it's not published
20:26:31 <randallburt> radix:  it is in the plug-in developers guide
20:26:44 <radix> oh. never mind then :)
20:27:05 <zaneb> yeah, I'd say to a first approximation, anything mentioned in the plugin developer guide should be stable
20:27:20 <randallburt> zaneb:  +1
20:27:34 <stevebaker> also, for things which are deprecated, can we remove them as soon as the next dev cycle opens?
20:28:02 <mspreitz> zaneb: I presume that relationship runs both ways
20:28:22 <zaneb> stevebaker: I think the usual policy is it remains deprecated for a whole cycle, then remove at the beginning of the next one
20:28:45 <BillArnold_> zaneb +1 (deprecation for an entire cycle)
20:28:47 <skraynev> stevebaker: agree with zaneb
20:28:49 <zaneb> mspreitz: fair point :)
20:29:00 <stevebaker> dang, with the amount of refactoring convergence will need that is a long time
20:29:31 <zaneb> I'd be happy to say anything deprecated before j-1 could be removed at the beginning of K
20:29:54 <stevebaker> ok, I'm alright with that
20:30:09 <zaneb> stevebaker: we may just need a whole new plugin api for convergence
20:30:23 <mspreitz> zaneb: was that "before the open of j-1" or 'before the close of j-1' ?
20:30:31 <skraynev> zaneb: it would possibly be good to have a list of the deprecated things due for removal
20:30:38 <zaneb> mspreitz: j-1 is a milestone
20:31:17 <skraynev> zaneb: I mean it would be useful for creating bugs about deleting that kind of thing
20:31:41 <zaneb> I do think it ought to be something particularly heinous we're removing to justify rushing it
20:32:46 <zaneb> skraynev: bugs are good. DeprecationWarning in the code is also good
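[Editor's note: the "DeprecationWarning in the code" practice zaneb mentions can be sketched as below. The function names are hypothetical, not actual Heat API; the pattern is simply to warn for a full cycle before removal at the start of the next one.]

```python
# Hypothetical example of the deprecate-then-remove pattern discussed
# above; these helpers are not actual Heat functions.
import warnings

def new_helper(*args, **kwargs):
    """The replacement API."""
    return (args, kwargs)

def old_helper(*args, **kwargs):
    """Deprecated since Juno; scheduled for removal at the start of K.

    Emits a DeprecationWarning for the whole cycle so out-of-tree
    plugin authors get notice, then delegates to the replacement.
    """
    warnings.warn(
        'old_helper() is deprecated and will be removed in the K cycle; '
        'use new_helper() instead.',
        DeprecationWarning,
        stacklevel=2)  # point the warning at the caller's call site
    return new_helper(*args, **kwargs)
```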
20:33:20 <zaneb> #topic Voting and reliability of heat-slow
20:33:25 <zaneb> stevebaker: this is also you
20:33:28 <stevebaker> me again
20:34:25 <stevebaker> the heat-slow job fails about 20% of the time due to slow nodes timing out the Fedora boot, but it is a very useful check that a change is valid
20:34:25 <skraynev> one question about it: does it work now ? I have seen some errors in Jenkins results
20:34:49 <stevebaker> so IMO it *must* pass before any heat change is approved
20:35:15 <zaneb> stevebaker: do you have a bug number for the timeout?
20:35:21 <mspreitz> stevebaker: what would it take to fix the timeout period?
20:35:22 <zaneb> that we can use for retriggers
20:35:40 <stevebaker> there was a change this week which broke heat-slow completely
20:35:46 <stevebaker> zaneb, I can find it later
20:36:04 <stevebaker> mspreitz, a custom image built during devstack start, which I am working on
20:36:23 <mspreitz> ugh!
20:36:47 <mspreitz> Is there a problem here to propagate upstream?
20:37:07 <zaneb> stevebaker: https://bugs.launchpad.net/tempest/+bug/1297560 ?
20:37:09 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New]
20:37:26 <stevebaker> so we can either spank the miscreants when heat-slow breaking changes land, or we could make heat-slow a voting job just for heat changes
20:37:54 <zaneb> I would support the latter, if the fix for the current issue ever gets merged
20:38:20 <stevebaker> I think it has merged, I can't see it in therve's changes any more
20:38:33 <randallburt> +1 for the latter
20:38:44 <zaneb> I didn't know anything was getting merged :D
20:39:30 <stevebaker> zaneb, that is the bug. it would be better if that bug described what a logind timeout on boot looked like
20:39:42 <mspreitz> stevebaker: is the time limit in the image or something less "baked" ?
20:39:55 <stevebaker> ok, action for me: make heat-slow voting
20:39:58 <zaneb> #agreed we want the heat-slow Tempest job gating on Heat patches
20:40:14 <stevebaker> mspreitz, another option is switch to ubuntu, which also requires custom image
20:40:32 <zaneb> #info the bug for timeout rechecks is bug 1297560
20:40:34 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New] https://launchpad.net/bugs/1297560
20:40:40 <mspreitz> Is the customization to change the time limit, or is it about something else?
20:40:42 <zaneb> #link https://bugs.launchpad.net/tempest/+bug/1297560
20:40:43 <uvirtbot> Launchpad bug 1297560 in tempest "*tempest-dsvm-neutron-heat-slow fails with WaitConditionTimeout" [Undecided,New]
20:41:09 <stevebaker> mspreitz, custom image is also required to install the software-config agent and hooks
20:41:24 <zaneb> #action stevebaker to make heat-slow voting on heat
20:41:49 <zaneb> #topic  Review last meeting's actions
20:42:07 <zaneb> 1. zaneb to sync with andrew_plunk on status of Rackspace CI Jenkins job
20:42:16 <zaneb> we synced
20:42:21 <zaneb> nothing had happened yet
20:42:35 <zaneb> andrew_plunk: want to give a quick update?
20:42:55 <andrew_plunk> zaneb, things are happening now, I have had to disable the job because of some build failures in the template we are using for integration testing
20:43:39 <andrew_plunk> which has led me down the path of how to do reporting when an upstream patch causes a problem on the Rackspace cloud
20:43:52 <zaneb> #action andrew_plunk to continue working on integrating Rackspace 3rd-party CI
20:43:56 <andrew_plunk> and how to differentiate those problems from service failures
20:43:59 <andrew_plunk> ok. Thanks zaneb
20:44:11 <zaneb> 2. zaneb to sync with stevebaker on metadata in resource plugin API
20:44:21 <zaneb> we synced on this too
20:44:28 <zaneb> I have no recollection of the outcome :D
20:44:50 <zaneb> I guess this is actually the agenda item we just discussed
20:45:00 <mspreitz> ?
20:45:40 <stevebaker> oh, I haven't put a patch together for that yet
20:45:46 <mspreitz> zaneb: can you pls explain that?
20:46:01 <zaneb> mspreitz: it's about stability of the resource plugin API
20:46:03 <stevebaker> too busy breaking other internal API ;)
20:46:19 * zaneb runs away
20:46:22 <mspreitz> pls remind me, are we adding or removing metadata?
20:46:29 <zaneb> mspreitz: neither
20:46:37 <zaneb> stevebaker broke the api
20:46:50 <zaneb> we discussed how to fix it again before anyone notices
20:47:08 <mspreitz> got it, thanks
20:47:55 <zaneb> #action stevebaker to put up a patch to restore the metadata attribute of Resources for plugin API backwards compatibility
20:48:15 <zaneb> #topic oslo-messaging blueprint
20:48:26 <zaneb> #link https://blueprints.launchpad.net/heat/+spec/oslo-messaging
20:48:46 <zaneb> sdake was working on migrating us to oslo-messaging
20:48:54 <zaneb> however, he is sadly indisposed
20:49:11 <zaneb> I know a lot of people are wanting this change to go through
20:49:18 <zaneb> does anyone want to pick it up?
20:49:35 <zaneb> there are patches available, it just needs a bit of a tidy-up I think
20:50:05 <zaneb> now that therve has fixed the config options generation, it should be relatively easy
20:50:14 <cyli> zaneb:  is this something someone new to heat can pick up easily?
20:50:40 <stevebaker> there was someone keen to do that, I can't remember who
20:50:55 <zaneb> cyli: it could be, actually
20:51:28 <stevebaker> i need to go now o/
20:51:34 <zaneb> ok, if anyone wants it, grab the blueprint
20:51:52 <zaneb> if multiple people want it, ping me and I can co-ordinate
20:52:05 <iqbalmohomed> I'm also new and interested ... will take a look at it before committing :D
20:52:32 <skraynev> zaneb: IMO, sdake would be the better person to finish it :) He already knows all the possible problems
20:53:41 <zaneb> #topic juno-1 blueprint status
20:54:42 <zaneb> I bumped a lot of blueprints that were targeted at j-1
20:55:07 <zaneb> #link https://launchpad.net/heat/+milestone/juno-1
20:56:04 <skraynev> looks like all have good progress
20:56:17 <zaneb> if you think your blueprint will actually land, please target it back
20:56:27 <zaneb> there are a couple there that still need approval
20:56:59 <zaneb> I haven't had the chance to look at the reviews for them yet
20:57:32 <zaneb> and for all blueprints you're working on, please keep the delivery status up to date
20:57:45 <tspatzier> zaneb: remind me, when is j-1?
20:57:50 <zaneb> #action everyone to update delivery status on blueprints
20:57:55 <zaneb> tspatzier: next week
20:58:19 <zaneb> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
20:58:28 <tspatzier> zaneb: thanks, that makes the decision for one of my bps easy
20:58:30 <BillArnold_> tspatzier 2014-06-12
20:58:34 <skraynev> so close... I forgot about it.
20:58:37 <zaneb> #info juno-1 milestone is next week
20:58:49 <zaneb> yeah, the first one goes quick
20:59:01 <zaneb> the key is to start working on stuff *before* summit
20:59:07 <tspatzier> zaneb: I have one that has not yet been approved and targeted. I think j-2 should work - https://blueprints.launchpad.net/heat/+spec/action-aware-sw-config
20:59:43 <zaneb> tspatzier: thanks, I'll take a look
21:00:01 <zaneb> #topic Critical issues sync
21:00:12 <zaneb> I think everyone left
21:00:22 <zaneb> and we're out of time anyway
21:00:24 <iqbalmohomed> :)
21:00:29 <zaneb> #endmeeting