12:00:34 #startmeeting Heat
12:00:35 Meeting started Wed Oct 15 12:00:34 2014 UTC and is due to finish in 60 minutes. The chair is asalkeld. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:38 The meeting name has been set to 'heat'
12:00:45 #topic rollcall
12:00:51 o/
12:00:51 \o
12:00:53 o/
12:00:53 hi
12:00:56 hi
12:00:59 hi
12:01:00 \o/
12:01:14 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda
12:01:18 hi
12:01:34 hi
12:01:55 Hi
12:02:13 all in
12:02:17 #topic Review action items from last meeting
12:02:22 good afternoon/evening
12:02:34 only 2
12:02:51 i sent an email re: the meetup
12:03:21 people need to have a look here: https://etherpad.openstack.org/p/kilo-heat-midcycle-meetup
12:03:33 o/
12:03:50 yo zaneb
12:04:10 #topic Adding items to the agenda
12:04:12 morning y'all
12:04:18 morning
12:04:18 any more topics
12:04:52 hi Qiming
12:04:55 transition period for removing HARestarter
12:05:12 mspreitz, haven't we been through that?
12:05:26 * shardy thought we had
12:05:42 Last time we talked, folks were going to look at my proposal. Can we agree on a conclusion?
12:05:44 lets add it to the end
12:05:44 * zaneb too
12:06:22 Qiming, you want to talk about the autoscale inheritance?
12:06:33 asalkeld, sure
12:06:47 Qiming, i'll add a topic for the end
12:06:47 #link https://review.openstack.org/#/c/119619/
12:07:02 i am going to start moving through the topics now
12:07:05 oh, sorry. I am late, but I am here :) hi all
12:07:12 hi skraynev
12:07:17 #topic Critical issues sync
12:07:20 thx, guys
12:07:28 any critical issues?
12:08:10 #link https://bugs.launchpad.net/heat/+bug/1370865
12:08:12 Launchpad bug 1370865 in heat "tempest.api.orchestration.stacks.test_update.UpdateStackTestJSON.test_stack_update_add_remove - mismatch error" [Critical,In progress]
12:08:13 I had a question about one
12:08:56 o/
12:09:19 actually never mind. it was https://review.openstack.org/#/c/128159/ but it has been abandoned
12:09:25 zaneb, you want to summarize? that's a long bug
12:10:05 sure. something appears to be afoot with the db
12:10:19 :)
12:10:31 such that we are sometimes reporting update_complete...
12:10:33 ok, is it under control
12:10:51 anything we can do to help?
12:11:21 and then _subsequently_ when you retrieve the stack, you get data from the old template
12:11:51 that's not fun
12:12:00 zaneb: is that a release blocker for Juno? IIRC we proclaim final release tomorrow?
12:12:04 it was a low-ish level of failure, then there was a patch that I think made it *much* worse, so I reverted that
12:12:19 shardy: correct, and I don't regard this as a blocker
12:12:27 helpfully, in the meantime, QE reverted the only test which triggered it :\
12:12:36 yeah, the test has been disabled in Tempest
12:12:45 can we drop the severity?
12:12:46 so I'm not even sure how you would test this
12:13:06 asalkeld: done
12:13:15 ok, then I'll move on
12:13:29 #topic Reminder to update midcycle etherpad
12:13:35 is the code for the test still in tempest? We could run it locally
12:13:56 ryansb: No, sdague claimed it was broken by design and reverted it
12:14:04 ryansb: initially they removed it, but I believe they put it back with a skip
12:14:11 stevebaker has a revert-the-revert patch up for discussion
12:14:42 zaneb: https://review.openstack.org/#/c/126464/
12:14:47 not yet AFAIK
12:14:49 asalkeld: what kind of info should be updated? voting for the meetup place or something else?
12:14:52 unless that patch is a dupe?
12:15:11 skraynev, yip, what your preferences are
12:15:21 shardy: ah, ok, I hadn't kept up with it. just knew they had agreed to it in principle
12:15:47 that's all for that topic
12:15:59 #link https://etherpad.openstack.org/p/kilo-heat-midcycle-meetup
12:16:23 Poland is an option, I've spoken with a few people and we can get some venue
12:16:32 depends on number of people attending
12:16:37 nice inc0
12:16:39 ooh, Moscow or Poland would be awesome
12:16:42 *in the Summer*
12:16:54 ;)
12:16:54 we have no shortage of venues
12:16:55 it's seaside ;)
12:16:56 If you want a real Russian experience, you go in the winter
12:17:01 zaneb: ha, not in February I guess
12:17:04 I also checked and in theory we could use RHT offices in Brno
12:17:14 you'll have a chance to take a walk *on the sea*
12:17:19 not only by the sea
12:17:34 since I was not at the last meet-up: what kind of venue is needed? One big room for all, or several breakout rooms?
12:17:36 * shardy shivers
12:17:56 i mostly have shorts
12:18:09 tspatzier: we did it with one big room, and it worked pretty well I'd say
12:18:15 yes, it will be well...different than Australia
12:18:20 mspreitz: that presumes people *want* a Russian winter experience
12:18:26 with a big white board
12:18:30 please
12:18:49 asalkeld++
12:19:00 tho I think all of Europe will be cold at that time
12:19:13 unless we're talking of Spain or so
12:19:30 ok, lets not spend any more time on that
12:19:36 just a reminder
12:19:48 #topic Reminder to update summit session etherpad
12:19:59 another reminder about $topic
12:20:06 get on it
12:20:18 #topic Convergence: Persisting graph and resource versioning
12:20:34 ok shoot
12:20:52 unmeshg_, ...
12:21:10 #link https://wiki.openstack.org/wiki/Heat/ConvergenceDesign
12:21:20 #link https://etherpad.openstack.org/p/convergence
12:21:23 This would be the first step towards convergence
12:21:26 we have the wiki with overall design of convergence
12:21:30 thanks, we would like to get more eyes on the wiki
12:21:44 I have put some comments on the etherpad
12:22:04 IMO we need a semi functional PoC before summit
12:22:10 ok
12:22:15 even with fake resources
12:22:24 I plan to put eyes on this once Juno is out
12:22:24 and wobbling
12:22:42 the idea was to bring the persisted graph and resource versioning to everyone's notice
12:22:51 i don't want to assume the design works
12:22:56 that forms the basis...as I have mentioned in the wiki
12:23:03 lets play and make sure
12:23:03 asalkeld, sure
12:23:23 Ok, so we will have patches for them and folks can play around then
12:23:26 yeah, I feel like we are flying blind here, and db migrations are pretty much the most expensive thing to be guessing at
12:23:27 unmeshg_: one comment (which I made on the convergence-engine spec review) is that IMO the convergence worker (and possibly the observer) should be more heat-engine processes
12:23:43 where we just spread out the load by one engine calling another via RPC
12:23:55 shardy, isn't that the plan
12:23:56 unless there's a really clear reason for a totally new process
12:24:00 shardy: they do completely different things
12:24:04 (i thought it was)
12:24:11 asalkeld: well, it's not from the design in the wiki
12:24:32 the engine isn't an engine any more
12:24:35 really, i thought that was just logical
12:24:39 it's more of a kickstarter
12:24:43 yip
12:24:59 s/EngineService/SomethingelseService
12:25:02 yeah...we should discuss on that more...I will try to elaborate more there
12:25:04 it does the initial db storage and kicks off the process
12:25:09 zaneb: the engine currently creates stacks and resources, we need to decouple those and enable recursion via RPC, similar to what I started for nested stacks
12:25:17 zaneb: yes
12:25:23 my question, where will we define what the autohealing strategy is? Volume goes ERROR for some reason, what do we do then?
12:25:39 that's the continuous observer
12:25:45 maybe I'm missing context from not being at the meetup, but we shouldn't underestimate the pain related to deployers with adding a bunch of new services vs just adding more of what we have
12:25:57 do we want RPC, or work queuing?
12:26:05 inc0: so the idea is maybe the state will become a property of the volume
12:26:16 shardy, i thought we would just have heat-engine
12:26:22 and 3 threads
12:26:23 inc0: so if the state doesn't match the property, then there is work to do
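
A minimal sketch of the continuous-observer idea just described, assuming a hypothetical resource object with a stored desired state and an observable real-world state; the names and the plain work queue are illustrative only, not Heat APIs:

```python
# Hypothetical sketch: the observed state of a real resource is compared with
# the desired state recorded in the DB; any mismatch becomes convergence work.
import queue
import time

work_queue = queue.Queue()  # stand-in for whatever RPC/work-queuing is chosen


def observe(resources, poll_interval=30):
    """Poll real-world state and enqueue work for anything that has drifted."""
    while True:
        for res in resources:
            observed = res.get_observed_state()  # hypothetical: ask the backing service
            desired = res.desired_state          # hypothetical: stored desired state
            if observed != desired:
                # What "converging" means here (re-create, resize, do nothing)
                # is up to the resource plugin, as noted in the discussion.
                work_queue.put(('converge', res.id, observed, desired))
        time.sleep(poll_interval)
```
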
12:26:38 asalkeld: Ok, cool, at least it wasn't just me :)
12:26:39 zaneb, sure, and what work that would be?
12:26:56 in the volume case "destroy->create" is not really a good idea
12:26:58 for example
12:27:14 inc0: that's up to the plugin
12:27:36 more up to the plugin now
12:27:40 zaneb what would the default be?
12:28:03 let's save some other day for discussing the engine or worker part
12:28:15 graph?
12:28:23 my agenda was to discuss what is outlined in the wiki and persisting the graph
12:28:24 BillArnold: I don't think that question makes sense. It's like asking what's the "default" handle_update() now?
12:28:26 I think the framework should recognize that some things can be healed by re-creation and some can not
12:28:34 see, the overall design is being put there
12:28:40 ananta: Ok, I just wanted to point it out in case you have folks implementing exactly what is on the wiki page
12:28:58 should we add like handle_autoheal() to every resource maybe?
12:29:18 For those that can not heal by re-create, the heal, if any, is on a bigger scope
12:29:19 BillArnold, mspreitz : I guess it's "raise UpdateReplace", and will be in future too
12:29:26 zaneb: well, to be fair, the default update policy is replacement..
12:29:46 So the default convergence policy might be replacement, or "do nothing"
12:29:50 ananta, the graph stuff seems ok to me
12:29:56 mspreitz, volume is one of examples imho
12:30:09 inc0: exactly. The volume itself can not recover
12:30:23 I mean, the recovery involves more than a volume
12:30:24 my only question is: if multiple requests come in (create, then update), how do we deal with multiple graphs
12:30:30 asalkeld: ok
12:30:58 asalkeld: huh? REST says each update is given whole new graph
12:31:02 ananta, i have put my questions on the etherpad
12:31:05 asalkeld: yep, that is the $64 question
12:31:21 sure it gets a new graph
12:31:34 we'll have to work on a single latest graph i suppose
12:31:38 but what happens to dangling resources etc.
12:31:53 do we just ditch the older graphs
12:32:01 need to have a gc running.
12:32:12 asalkeld: that's why I hate the state = deleteme idea. If something is not in the latest desired state, that is what means "delete"
12:32:17 also note this graph does seem a lot like a workflow
12:32:23 just saying
12:32:51 asalkeld, what if we are already in process of building graph 1 when graph 2 comes in? that will often be the case, because resources tend to break in cascade
12:33:01 asalkeld: I have not been following convergence closely. I had expected graph would be a model of desired state, not workflow to get there.
12:33:16 one fail causes another and the observer might see each one as a separate failure
12:33:23 mspreitz, it seems to be a graph of dependencies
12:33:26 mspreitz: the graph is dependencies
12:33:44 both actual and projected
12:33:48 but with some extra stuff
12:33:49 graph will just have edges recorded
12:33:52 retries
12:33:54 the graph is dependencies and edges are stored
12:33:56 recorded
12:34:39 desired states too are stored in a separate model
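
To make the "edges recorded, desired state stored separately" point concrete, here is a rough illustration using throw-away SQLAlchemy models; the table and column names are invented for the sketch and are not the schema proposed in the specs or wiki:

```python
# Illustration only (NOT the proposed Heat schema): the graph is just recorded
# edges between resource nodes, and the desired definition of each resource is
# stored in a separate model, keyed by graph/traversal id so an older graph
# can be garbage-collected once a newer one has been realized.
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class GraphEdge(Base):
    __tablename__ = 'graph_edge'             # hypothetical table name
    id = Column(Integer, primary_key=True)
    graph_id = Column(Integer)               # which stack update this edge belongs to
    requirer = Column(String(255))           # resource that depends on...
    required = Column(String(255))           # ...this resource


class DesiredResourceState(Base):
    __tablename__ = 'desired_resource_state' # hypothetical table name
    id = Column(Integer, primary_key=True)
    graph_id = Column(Integer)
    resource_name = Column(String(255))
    definition = Column(Text)                # serialized desired properties/template snippet


engine = create_engine('sqlite://')          # in-memory DB, purely for the sketch
Base.metadata.create_all(engine)
```
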
12:34:44 so guys i had a big question about restarting the actions in case of process failure
12:34:46 @asalkeld: a persisted graph will be deleted only after another graph (after stack update) which has been realized
12:34:58 please go through the spec https://review.openstack.org/#/c/123749/
12:35:02 i think it's part of unmeshg_'s commit on the convergence observer
12:35:13 ok
12:35:22 If all you think about is CREATE, then graph w edges looks like a workflow. Add in UPDATE and DELETE, you see your graph is a model of desired state and the workflow is a separate question
12:35:54 ckmvishnu, ananta unmeshg_ please note there are lots of people very keen to help (including me)
12:36:06 unmeshg_: I'm not sure that works. graphs will have dependencies between each other. they need to be combined into one big graph
12:36:06 the graph is just to track the dependencies
12:36:11 it is not workflow
12:36:30 if we can get chunks of work to share it would help a lot
12:36:45 so please let us know what you are doing
12:36:51 and where we can help
12:37:05 ananta, kinda
12:37:09 zaneb: the existing graph will be updated with new one...with edges from resources in progress from old graph
12:37:27 in case of an ongoing update
12:37:52 I have a writeup for this in concurrent stack update which I intend to push sometime
12:37:57 ok
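
One possible reading of the merge described above, purely as illustration; the real behaviour will be in the pending concurrent-stack-update write-up and spec:

```python
# Hypothetical illustration of updating the existing graph with a new one:
# edges are (requirer, required) pairs; edges that touch resources still
# IN_PROGRESS in the old traversal are carried over into the new graph so the
# new traversal waits for (or cleans up after) that in-flight work.
def merge_graphs(old_edges, new_edges, in_progress):
    """old_edges/new_edges: sets of (requirer, required); in_progress: set of names."""
    carried = {edge for edge in old_edges
               if edge[0] in in_progress or edge[1] in in_progress}
    return new_edges | carried

# e.g. merge_graphs({('server', 'volume')}, {('server', 'new_volume')}, {'volume'})
# -> {('server', 'new_volume'), ('server', 'volume')}
```
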
12:38:06 where is the overall architecture design diagram? I saw people mention a wiki page?
12:38:34 qiming: https://wiki.openstack.org/wiki/Heat/ConvergenceDesign
12:39:00 the wiki is not completely complete :) so I can understand folks have a lot of concerns and questions
12:39:20 Qiming: we'll keep updating the wiki and all related discussion on the etherpad as shared
12:39:34 sounds great.
12:39:44 asalkeld: sure we will....we should get the overall picture clear so that we are all comfortable in discussions
12:39:52 Qiming: https://etherpad.openstack.org/p/convergence
12:40:44 ananta, whilst the design is great, I prefer a PoC patch
12:41:25 asalkeld: sure....that's why we wanted to have many implementable specs so that the patches are smaller in unit and we can test
12:41:37 or play around
12:41:51 ok
12:41:52 asalkeld: Before submit, the better
12:41:53 ok, i can move on now if you want
12:42:21 ckmvishnu: yeah...before summit
12:42:22 do we create a different branch for plc?
12:42:32 PoC?
12:42:40 no just mark it wip
12:43:02 any code upstream I can commit to?
12:43:03 How would I pick up all the PoC related changes?
12:43:22 the same git review -d
12:43:23 mspreitz: Just pull the branch from the review
12:43:39 #topic autoscaling class re-org
12:43:44 ckmvishnu: having them all under one topic would be helpful
12:43:44 Qiming,
12:43:50 that's a good question... since the changes are going to be a bit bigger...i think we can have a branch
12:43:54 just my thoughts
12:44:08 ckmvishnu: i.e. use a different local branch, but still submit against master
12:44:09 okay, regarding https://review.openstack.org/#/c/119619/
12:44:14 ananta: it will be enough to have a different topic
12:44:17 ananta: a series of patches posted for review is a branch, effectively
12:44:25 I have got quite some constructive comments on it
12:44:43 it is a first step for the spec: http://specs.openstack.org/openstack/heat-specs/specs/reorg-autoscaling-group.html
12:44:50 shardy: a topic makes searching easy though
12:45:00 sure, then
12:45:11 zaneb: Sure :)
12:45:12 zaneb: yup, agreed
12:45:20 Does http://specs.openstack.org/openstack/heat-specs/specs/reorg-autoscaling-group.html imply duplicated code between the AWS and OS ASGs?
12:45:21 :)
12:45:23 Qiming, so i think actually seeing the changes makes me wonder about the value
12:45:25 there are multiple goals in this work: 1) tweak the ASG class hierarchy; 2) split AWS and Heat version ASG; 3) use namespace for separation; 4) add test cases to Heat version
12:45:47 so far, as I just summarized, the main concerns are: 1) no immediate benefits; 2) a lot of work; 3) complicated hierarchy; 4) similarity between InstanceGroup and ResourceGroup; 5) should use utility functions whenever possible; 6) patches in that chain are too small
12:46:10 1 and 2: pains I'm prepared to take, for the long term benefits this can bring to the project
12:46:12 mspreitz: no, but we want to break the model where the native resource inherits from the AWS one
12:46:29 regarding 3): I don't agree, the intent was to straighten the relationship, not to make it more complicated.
12:46:34 mspreitz: so the common code needs to move somewhere
12:46:41 Qiming: so the long term goal here should be that we can move the autoscaling stuff behind a separate (Python) API that doesn't dig around in stack internals
12:46:55 instead of having the implementation mingled with the resources
12:46:58 zaneb, I got the points
12:47:13 Qiming, i guess i tend to prefer a collection of shared functions
12:47:14 so that we can eventually move it to a separate process
12:47:28 so I listed them as point 5 and 6, that's something I will bear in mind going further on this path
12:47:29 and a flatter class structure
12:47:35 shardy: I do not see a place for the common code in the hierarchy in http://specs.openstack.org/openstack/heat-specs/specs/reorg-autoscaling-group.html
12:48:12 asalkeld, the current hierarchy is flawed, if we want to reorg it, the earlier the better
12:48:32 mspreitz: evidently, which is why we're having this discussion :)
12:48:38 making InstanceGroup a subclass of ResourceGroup is not only a concept problem
12:48:45 they are indeed quite similar
12:48:51 Qiming, sure but you could just go straight from resource to asg
12:49:05 as I have replied to shardy here: https://review.openstack.org/#/c/119619/1/heat/engine/resources/autoscaling.py
12:49:46 for some related bugs, if we don't have class hierarchy, the code may have to be duplicated as I can imagine
12:50:13 I agree we should abstract utility functions out of class implementation whenever possible
12:50:27 I don't see a big conflict here
12:50:27 it really makes unit testing a lot easier
12:50:59 shrug, ok
12:51:05 in fact, in that chain of patches, #link https://review.openstack.org/#/c/123481/ is the current bottleneck
12:51:20 once I'm through that, the next "big" things would be: 1) extract utility functions from AWS ASG into a module; 2) subclass Heat AutoScalingGroup directly from ResourceGroup, while reusing those utility functions to avoid code duplication as much as possible.
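
A skeletal sketch of the shape being discussed: shared helpers extracted into a utility module, with the native group no longer inheriting from the AWS one. All names here are illustrative rather than the actual classes in the patches, and whether InstanceGroup ends up under AutoScalingGroup (zaneb's suggestion below) is still open:

```python
# Illustrative skeleton only, not the real Heat classes or module layout.

# "scaling_util": helper functions extracted from the AWS ASG so both
# implementations can reuse them without inheriting from each other.
def resize(group, new_size):
    """Grow/shrink the group's nested stack to new_size members (sketch)."""


def apply_scaling_policy(group, adjustment, adjustment_type):
    """Work out the new capacity from a scaling adjustment (sketch)."""


class ResourceGroup(object):
    """Stands in for OS::Heat::ResourceGroup."""


class AutoScalingGroup(ResourceGroup):
    """Native OS::Heat::AutoScalingGroup built directly on ResourceGroup,
    calling into the shared scaling helpers instead of AWS behaviour."""


class InstanceGroup(ResourceGroup):
    """AWS-compatible instance group; the suggestion in the meeting is that
    this should instead inherit from AutoScalingGroup - still undecided."""


class AWSAutoScalingGroup(InstanceGroup):
    """AWS::AutoScaling::AutoScalingGroup equivalent (illustrative)."""
```
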
12:51:21 #action please all have a look and give some constructive feedback
12:51:44 I'm expecting fewer rebase efforts once the rough split (#123481) is done.
12:51:59 thank you all for your time on reviewing this, :)
12:52:10 ok, Qiming lets move on
12:52:16 Qiming: imho the current hierarchy is exactly backwards
12:52:21 asalkeld: could you specify "have a look at WHAT" in the action :)
12:52:34 ok
12:52:40 Qiming: InstanceGroup should inherit from AutoscalingGroup
12:52:54 asalkeld: we just may forget..
12:52:58 so this change seems to be entrenching us further in the backwards way to me
12:52:59 #action please all have a look at review 119619 and give some constructive feedback
12:53:07 thx
12:53:29 moving along...
12:53:41 #topic the harestarter
12:53:56 mspreitz, ...
12:53:58 zaneb, cannot get the point at the moment, but will think about it
12:54:02 So we discussed some possibilities last time, but did not reach an agreement...
12:54:16 What I am looking for is a reason to expect that users will get a transition period.
12:54:36 Here are the leading possibilities I see:
12:54:46 mspreitz, i think that is reasonable
12:54:50 mspreitz, shouldn't we discuss that at the design summit?
12:55:06 inc0, yeah the solution (new thing)
12:55:15 is there anything more to say about this other than #action everyone re-review https://review.openstack.org/#/c/124656/ ?
12:55:19 (interrupted) inc0: if you are willing to hold up convergence until L, then yes
12:55:30 because this is my least-favourite conversation topic.
12:55:59 The timing problem is this: if HARestarter is inconsistent with convergence then you need to do something 1 release before convergence
12:56:42 mspreitz, i don't think this will be needed
12:56:56 asalkeld: what, and why not?
12:57:16 we can make a resource that runs 2 updates on the stack (remove and add resources)
12:57:29 (to re-create)
12:57:53 (to duplicate the current functionality)
12:57:59 zaneb: +1
12:58:18 I really don't think pinging via network is a good way to check resource health
12:58:19 If the people working on convergence agree that we can and will put a new implementation behind the OS::Heat::HARestarter name, then I am happy
12:58:27 aka, the default 'handle_autoheal()'
12:58:33 AFAIK we said, deprecate but not remove until a viable replacement exists, which may be health-aware ASG
12:58:34 mspreitz, i am fine with that
12:58:43 we need to make it work
12:59:00 except, of course, that the name is a bug!
12:59:06 at least no worse than now
12:59:07 but we can fix that separately
12:59:21 asalkeld: I am hearing crickets from the convergence people
12:59:31 InstanceRecreater
12:59:35 mspreitz, we are the convergence people
12:59:46 Qiming, please...don't...
13:00:02 ok nearly timeout
13:00:04 Anant, what do you think?
13:00:09 #endmeeting
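
Post-meeting note: a speculative sketch of the replacement idea floated under the HARestarter topic (forcing re-creation of a resource by driving two stack updates); update_stack() is a placeholder for whatever API ends up doing the update, not a real Heat or heatclient call:

```python
# Speculative sketch of "re-create a resource via two stack updates":
# first update with the resource removed, then update again with it restored.
import copy


def update_stack(stack_id, template):
    """Placeholder: submit a stack update with the given template and wait."""
    raise NotImplementedError


def recreate_resource(stack_id, template, resource_name):
    stripped = copy.deepcopy(template)
    stripped['resources'].pop(resource_name)  # 1st update: remove the resource
    update_stack(stack_id, stripped)
    update_stack(stack_id, template)          # 2nd update: add it back, forcing re-creation
```
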