20:00:22 #startmeeting heat
20:00:26 Meeting started Wed Dec 3 20:00:22 2014 UTC and is due to finish in 60 minutes. The chair is asalkeld. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:30 The meeting name has been set to 'heat'
20:00:47 #topic rollcall
20:00:48 o/
20:00:51 o/
20:00:52 o/
20:00:52 \o
20:00:53 o/
20:00:56 hi
20:00:58 hi folks
20:01:02 o/
20:01:06 hey
20:02:00 while we are waiting for others, let's think of some topics
20:02:25 I have one :) Status of zero-downtime upgrades
20:02:29 #topic Adding items to the agenda
20:02:37 ok
20:02:37 asalkeld: I have one, Grenade testing
20:02:40 kilo-1 status
20:02:48 looking for a volunteer to push that forward
20:03:37 also kilo-1 exceptions
20:04:00 ryansb: exceptions?
20:04:16 I thought that was for FF (feature freeze)
20:04:25 shardy: you mean the grenade job, or what?
20:04:26 ryansb: I don't think we do that for milestones, it's just a snapshot
20:04:55 what about a quick summary of convergence
20:04:58 skraynev_: Yeah, I started a patch adding Heat to grenade testing, EmilienM then did some work on it, but it's still not ready/merged
20:05:21 err, I got my wires crossed. Was thinking of 2014.2.1
20:05:22 need someone to take ownership of it, I never have time to look at it lately
20:05:23 https://review.openstack.org/#/c/86978/
20:05:26 scratch it
20:05:30 urm....
20:05:31 shardy: I may take a look ;)
20:05:36 #topic Review action items from last meeting
20:05:39 skraynev, ^^^^^
20:05:45 ryansb: Ah, yeah, I think we landed all we needed for the stable freeze
20:05:53 #link http://eavesdrop.openstack.org/meetings/heat/2014/heat.2014-11-26-12.00.html
20:06:20 I think the last meeting ended early due to Thanksgiving... is there anything to review?
20:06:22 did this happen: "Convergence folks to organise call/hangout to define direction re various PoC efforts"
20:06:32 skraynev_: thanks, let's action you when we get to that item :)
20:06:48 asalkeld: it's organised for tomorrow
20:06:53 it did -> tomorrow at 9am East Coast time
20:06:54 shardy: ok
20:06:55 cool
20:06:58 pas-ha: thx
20:07:10 ok, moving on
20:07:14 #topic Status of zero downtime upgrades
20:07:34 so this is waiting on the oslo versioned objects
20:07:36 so, do we have issues listed out? blueprints to implement/draft? bugs to fix?
20:07:49 inc0, we have a bp
20:08:15 #link https://review.openstack.org/#/c/132157/
20:08:26 and it seems like nova versioned objects fall into that bucket too
20:08:47 is that the only issue we have? if that lands, we'll be able to do upgrades?
20:08:51 asalkeld: can't we have zero-downtime upgrades now, as long as DB schema changes are backwards compatible?
20:09:08 stevebaker, yeah
20:09:16 is it always?
20:09:23 though I doubt that is normally the case
20:09:47 so can a heat-engine talk to an old DB?
20:10:33 inc0, I could start with just copying the versioned objects code in-tree
20:10:42 there is not much code
20:10:55 and then switch over when we have it in oslo
20:11:05 so it's not holding us back
20:11:19 that's a bit risky
20:11:27 stevebaker, why?
20:11:43 I don't believe it's going to change
20:11:46 the API might change as it moves to oslo
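
[Context: a minimal sketch, assuming the API of the oslo.versionedobjects library under discussion, of what wrapping a Heat DB record in a versioned object could look like for rolling upgrades. The RawTemplate object and its fields here are illustrative, not Heat's actual object model.]

    # Hypothetical example: an old engine speaking version 1.0 never sees the
    # field added in 1.1, so mixed-version engines can coexist mid-upgrade.
    from oslo_versionedobjects import base
    from oslo_versionedobjects import fields


    class RawTemplate(base.VersionedObject):
        # Bump the minor version whenever a backwards-compatible field is added.
        VERSION = '1.1'

        fields = {
            'id': fields.IntegerField(),
            'template': fields.StringField(),
            # Added in 1.1; downgraded payloads drop it below.
            'environment': fields.StringField(nullable=True),
        }

        def obj_make_compatible(self, primitive, target_version):
            # Strip newer fields from the serialized form sent to older engines.
            super(RawTemplate, self).obj_make_compatible(
                primitive, target_version)
            if target_version == '1.0':
                primitive.pop('environment', None)
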
20:11:50 asalkeld: can we just wait for this functionality from oslo?
20:12:19 we can wait, but I am not sure it would make it in kilo
20:12:21 stevebaker: Sometimes the API changes when it moves out of incubator anyway, so I don't see that as a huge barrier
20:12:32 which would be disappointing to me
20:13:00 asalkeld: got it. Agreed, it will be nice to have it in kilo
20:13:22 ok, moving on
20:13:27 asalkeld: what's holding it up landing in oslo?
20:13:27 #topic Grenade testing
20:13:35 ok, I wouldn't block it in the short term
20:13:46 shardy, spec approval + code landing
20:13:49 Ok, so here's the patch I started a while back:
20:13:58 https://review.openstack.org/#/c/86978/
20:14:03 #link https://review.openstack.org/#/c/86978/
20:14:19 shardy: vaguely related, do you have a working local instack?
20:14:42 EmilienM helped get it working; now it's broken again, and I never seem to have time to look at it
20:15:01 skraynev_: if you have time to help, that would be great
20:15:04 shardy: I could have a look if you want
20:15:14 shardy, is the setup tricky?
20:15:30 we need to land initial grenade support, then define the javelin stuff so we can actually test persisting stacks over an upgrade
20:15:51 Oh hey EmilienM, yeah, if you have time to revisit it that would be great too
20:16:01 shardy, does that test a rolling upgrade?
20:16:09 shardy: sure, I'd be interested in it.
20:16:17 shardy: I can make time. I don't want to abandon something I helped with a bit
20:16:27 asalkeld: AIUI grenade tests a stop-start upgrade
20:16:33 EmilienM: could you work with skraynev_ and somehow divide the work of defining what's tested and implementing it?
20:16:35 asalkeld: looks like it kills heat, upgrades the DB, then starts heat again
20:16:45 shardy: sure.
20:16:48 EmilienM: Ok, great, thanks
20:16:50 #action skraynev take over https://review.openstack.org/#/c/86978/
20:16:58 ok
20:17:02 thanks skraynev_
20:17:07 stevebaker: I don't have instack locally atm
20:17:30 k
20:17:41 ok, anything more on this?
20:17:51 asalkeld: Last time I tried, setup was very tricky, but that may have improved
20:18:03 asalkeld: not from me, thanks
20:18:07 ok
20:18:09 #topic kilo-1 status
20:18:16 EmilienM: I suggest we connect via email ;) because it's nearly midnight for me. So I'll start looking tomorrow.
20:18:27 #link https://launchpad.net/heat/+milestone/kilo-1
20:18:42 skraynev_: emilien@redhat.com
20:18:59 there are a couple of BPs there I am not sure of the status of
20:19:28 EmilienM: k
20:19:36 anyone know the multi-region status?
20:19:54 shouldn't that be assigned to Qiming?
20:19:55 and "Templates outputs need to be lazy loaded"
20:20:07 working on that ^
20:20:08 asalkeld: I think Qiming is working on the multi-region one
20:20:21 ok, I'll update the bp
20:20:39 asalkeld: IIRC, stevebaker had some good comments, so Qiming is working on an update
20:21:13 ok, as long as it is moving forward
20:21:35 I have taken shardy's decouple-stack bp
20:21:56 we need to focus on reviewing bugs/bps on that list
20:22:01 to make sure they land
20:22:22 asalkeld: I rebased decouple-nested and fixed the first test failure today; help with the remaining tests would be great, thanks!
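
[Context on the "Templates outputs need to be lazy loaded" blueprint mentioned above: a hedged sketch of the lazy-resolution pattern it implies; the class and method names are illustrative, not Heat's real internals. Outputs are resolved once, on first access, so code paths that never read them — such as listing stacks — never pay the cost.]

    class Stack(object):
        def __init__(self, template):
            self.template = template
            self._outputs = None  # deliberately not resolved at load time

        @property
        def outputs(self):
            # Resolution (intrinsic-function evaluation, attribute lookups)
            # happens here, only when the outputs are first requested.
            if self._outputs is None:
                self._outputs = {
                    name: self._resolve(defn)
                    for name, defn in self.template.get('outputs', {}).items()
                }
            return self._outputs

        def _resolve(self, definition):
            # Stand-in for real intrinsic-function resolution.
            return definition.get('value')
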
20:22:36 I might kick this back to k2: https://bugs.launchpad.net/bugs/1319813
20:22:39 Launchpad bug 1319813 in heat "no event recorded for INIT_* resources" [Medium,Triaged]
20:22:43 thx, shardy
20:23:01 zaneb, ^
20:23:07 you set it to k1
20:23:11 that isn't even a bug
20:23:33 but it will probably happen as part of convergence anyway
20:23:36 'fixing' it looks like a non-trivial amount of work to me
20:23:42 not that it will change anything
20:24:06 moved it to k2
20:24:34 anyone know if jpeeler is working on the lazy loading code?
20:24:50 asalkeld: yes I am
20:25:03 ok, you think it can land in k1?
20:25:08 I do
20:25:17 cool
20:25:19 * zaneb should check the review again
20:25:31 zaneb: nothing yet, will let you know
20:25:38 oh good :)
20:25:46 well, I think we are in good shape then
20:25:52 I hate it when I'm holding everything up
20:26:11 are you?
20:26:24 sometimes
20:26:28 well, half-time bell
20:26:34 and we are out of topics
20:26:58 https://review.openstack.org/#/c/138800/
20:27:14 heads-up: we need to land that (or a sample conf sync) to unblock our gate
20:27:27 oslo.messaging released this afternoon
20:27:54 that needs devstack changes first; devstack starts with heat.conf.sample
20:28:19 oh, I quite like having a sample in-tree to point people to
20:28:20 stevebaker: Hmm, OK, I'll propose a sync and mark that WIP while we work that out
20:28:33 asalkeld: It's constantly breaking us
20:28:39 other projects are ditching it, though, I think
20:28:40 I thought I made that comment in one of the other delete-heat.conf.sample patches
20:28:46 not *that* often
20:29:10 * shardy shrugs
20:29:17 shrug, seems we don't have much of a choice
20:29:20 shardy - looks like some change must be made to devstack first
20:29:23 well, I find it annoying anyway
20:29:30 pas-ha: Yeah
20:29:37 it tries to stat the heat.conf.sample
20:29:40 stevebaker: evidently I missed those, apologies
20:30:00 shardy: there is a tox env to generate a sample; devstack can call that instead
20:30:01 zaneb, you and inc0 want to fill us in on the convergence status?
20:30:31 well, we spoke with the HP guys today and agreed to meet up tomorrow
20:30:32 stevebaker: Yeah, I'm just thinking of the immediate step to un-break things
20:30:52 * shardy waited 6 weeks to land his last devstack patch
20:30:54 #topic open discussion
20:31:03 shardy: maybe just this would be fine for now https://review.openstack.org/#/c/138800/2/tox.ini
20:31:23 from my point of view: I hope I've answered the questions from the mailing list... now I think we just need to decide on an approach
20:31:32 I think we are close to settling on an approach
20:31:36 stevebaker: Ok, I'll break it into two patches, thanks
20:31:56 just figuring out the last differences in how to store the graph efficiently
20:32:07 ok
20:32:16 zaneb, so you've already dropped my approach?
20:32:19 :/
20:32:49 inc0, have you worked on it more?
20:32:53 inc0: I... haven't changed my mind since my email last week
20:33:06 does it fill the gaps that zaneb pointed out?
20:33:08 I haven't seen any new patches as of yesterday
20:33:34 I didn't really have time for coding, unfortunately, but I hope I clarified that over email?
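
[Context: a toy sketch of the graph-driven convergence step being debated — persist the dependency graph, then on each pass dispatch converge() only for resources whose dependencies have all completed. The node names and the converge callable are illustrative; this is neither proof-of-concept's actual code.]

    def ready_nodes(graph, done):
        """graph maps each resource to the set of resources it depends on."""
        return [n for n, deps in graph.items() if n not in done and deps <= done]

    def converge_pass(graph, done, converge):
        for node in ready_nodes(graph, done):
            converge(node)  # in Heat this would be an async RPC cast to a worker
            done.add(node)  # in reality, marked done when the worker reports back

    # Usage: 'net' must converge before 'port', which must converge before 'server'.
    graph = {'net': set(), 'port': {'net'}, 'server': {'port'}}
    done = set()
    while len(done) < len(graph):
        converge_pass(graph, done, lambda n: print('converging %s' % n))
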
20:33:56 and I don't really want to wait another month just to come up with something equivalent to what we already have
20:34:15 zaneb, it will never be equivalent because, well... it's different
20:34:21 what would change my mind at this point is a demonstration that phase 2 won't work with the other approaches
20:34:31 inc0, I think the problem we have with that approach is that there are some nagging questions that remain unanswered
20:34:47 asalkeld, ask, then ;)
20:35:04 the rescheduling
20:35:07 zaneb, have you guys solved things like concurrent updates?
20:35:19 asalkeld, what do you mean by rescheduling?
20:35:35 inc0: yeah. that is pretty much the entire point.
20:35:36 when to re-call converge()
20:35:52 and its effect on the remaining actions
20:36:06 asalkeld, it doesn't matter; at first we can call it every few seconds
20:36:08 also the DB load
20:36:40 I'm pretty sure the DB load can be dropped to n log n, and it can be heavily optimized by queries
20:36:53 it's hard to do optimization on a dummy DB...
20:37:03 inc0, agree
20:37:11 stevebaker: https://review.openstack.org/138850
20:37:29 for the first version I was thinking of calling converge every few seconds
20:37:35 well, it's easy to see what does lots of DB calls and what doesn't
20:37:38 or when a resource state changes
20:38:05 zaneb, when you have things like joins, index lookups and so on, you can optimize that
20:38:24 also, my initial mistake of using name as the unique key took its toll
20:38:43 shardy: +2, waiting for gate pass
20:38:48 I guess I am in favour of something at least somewhat logically proven, compared to an argument
20:39:06 asalkeld: +1
20:39:11 stevebaker: thanks, I'll look at the devstack fix for the next patch now
20:39:18 well, it does solve the test cases ;)
20:39:25 we all agree that this stuff can _probably_ be solved given enough time
20:39:37 inc0, don't get me wrong: I like what you have done
20:39:43 I mean, I'm not mentally bound to this one, but it does seem to solve the biggest space of problems
20:39:52 why would we take a 90% chance of a solution in a month over a 99% chance of a solution right now?
20:39:59 speaking of which, porting the test cases to functional tests now would benefit all approaches, and would be hugely useful
20:40:13 stevebaker, +1
20:40:28 zaneb, because it solves concurrency problems
20:40:37 in an elegant, non-hacky way
20:40:43 stevebaker, I might have to do that with shardy's decouple patches
20:40:47 concurrent updates, I mean
20:41:21 or this stuff https://bugs.launchpad.net/heat/+bug/1260167
20:41:23 Launchpad bug 1260167 in heat "heat autoscale scale up does not work if delete one instance manually" [Medium,Confirmed]
20:41:46 so inc0, that is really phase 2, right?
20:41:52 I mean, it all can be solved by the graph approach... but these problems simply won't happen
20:41:55 having a 'check'
20:41:56 with my approach
20:42:11 and that should be solved
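
[Context: one hedged illustration of how a graph-driven design can cope with the concurrent updates being argued about here — a traversal-id check along the lines Heat's convergence work later adopted. Every name below is illustrative, not either PoC's code.]

    class Stack(object):
        def __init__(self, current_traversal):
            # Each stack update stamps the stack with a fresh traversal id.
            self.current_traversal = current_traversal

    def check_resource(stack, resource, traversal_id, converge):
        # Workers carry the id of the traversal that scheduled them; if a newer
        # update has superseded it, the work item is stale and is dropped.
        if stack.current_traversal != traversal_id:
            return
        converge(resource)

    # Usage: the 'uuid-1' work item is stale and becomes a no-op.
    stack = Stack(current_traversal='uuid-2')
    check_resource(stack, 'server', 'uuid-1', lambda r: print('converging', r))
    check_resource(stack, 'server', 'uuid-2', lambda r: print('converging', r))
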
20:42:17 inc0: it really doesn't. To the extent that the other approaches have a problem with concurrent updates (which is only when something is mid-update and we don't know what to do), yours has exactly the same problem, AFAICT
20:42:55 well, it doesn't have a notion of mid-update
20:43:32 so I don't think that would be a problem at all
20:43:37 and yet the real world does, with existing plugins
20:43:43 yeah, there will be tasks running that you won't know what to do with
20:43:58 shardy: we have a regression in the functional tests http://logs.openstack.org/19/138819/1/check/check-heat-dsvm-functional-mysql/e9d51bc/console.html#_2014-12-03_19_10_38_802
20:44:10 shardy: I think Qiming raised a bug, let me find it
20:44:20 zaneb, we can simulate in_progress, but not build on it
20:44:28 I mean in_progress == actions left to perform
20:44:33 shardy, I can look at that
20:44:50 stevebaker, I mean
20:45:15 asalkeld: ok, let's make that job voting as soon as it's fixed
20:45:24 totally
20:45:32 still, we can work on the graph approach, no problem with me
20:45:38 stevebaker: interesting, great to see real issues getting caught :)
20:45:45 it may give us something faster
20:46:11 and we may learn more in the process
20:46:13 shardy: although I recall Qiming commenting that the test might need fixing, not the code
20:46:45 stevebaker, I'll raise a bug for it
20:47:19 https://bugs.launchpad.net/heat/+bug/1397945
20:47:22 Launchpad bug 1397945 in heat "resource group fails validate a template resource type" [Undecided,New]
20:47:23 found it!
20:47:28 I just hope we won't get ourselves into some hack-loop with the concurrent stuff
20:47:44 I'll mark it critical, since it breaks the gate job
20:48:05 #link https://bugs.launchpad.net/heat/+bug/1398973
20:48:07 Launchpad bug 1398973 in heat "test_stack_update_provider_group integration test is failing" [Undecided,New]
20:48:32 stevebaker: Possibly, I saw his bug but could not reproduce locally
20:49:43 shardy: so running heat_integrationtests.functional.test_update didn't reproduce? maybe it's a race
20:50:14 anyway, whichever approach we take, I prefer to start implementing rather than arguing about PoCs
20:50:26 inc0, agree
20:50:34 we need to get cracking on this
20:51:20 we really don't want to land this too late (like right before release)
20:51:57 we can get back to this discussion in phase 2
20:52:05 when we don't have a fire in our house
20:52:47 stevebaker, we have bugs for tempest
20:53:02 and we have a spec for functional tests
20:53:09 do we need both?
20:53:13 inc0: what makes you think the pressure ever lets up? ;)
20:53:56 zaneb, I know it never does; that's why I'll be secretly preparing one big stateless fire extinguisher ;)
20:54:44 asalkeld: we need to rename all those tempest tags, maybe to 'functional'. They could stay as wishlist bugs though.
20:54:53 ok
20:55:01 I'd like to target some to k2
20:55:16 and get people committed to them
20:55:35 5 mins
20:55:51 asalkeld: things have likely moved on a bit since then. I'd much rather see functional tests for native resources than the AWS ones
20:55:58 asalkeld, any chance we could talk tomorrow about getting a few actions drafted to start working on upgrades?
20:56:14 sure, inc0
20:56:21 I'd love to progress that
20:56:47 I'll bug you about that tomorrow then (for you it will be later today ;))
20:56:53 k
20:57:06 o/
20:57:42 I'll end this 2 minutes early then
20:57:47 #endmeeting