15:00:07 <therve> #startmeeting heat
15:00:08 <openstack> Meeting started Wed Jun 8 15:00:07 2016 UTC and is due to finish in 60 minutes. The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <therve> #topic Roll call
15:00:13 <openstack> The meeting name has been set to 'heat'
15:00:35 <jdob> o/
15:00:39 <jaimguer> o/
15:00:46 <jasond> o/
15:00:50 <zaneb> howdy y'all
15:00:53 <ananta> Hi
15:00:53 <duvarenkov__> hi
15:00:53 <ramishra> hi
15:00:54 <rpothier> o/
15:00:56 <cwolferh> o/
15:01:20 <spzala> o/
15:01:44 <Drago> o/
15:01:57 <ochuprykov> hi
15:02:23 <therve> OK!
15:02:43 <therve> #topic Adding items to agenda
15:02:54 <therve> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-06-08_1500_UTC.29
15:03:31 <therve> #topic Convergence status
15:03:48 <therve> OK! I added this item because we missed the n1 release
15:04:00 <therve> Got some small issues, like the tripleo skip not merged
15:04:39 <therve> I believe it's in now, so we should proceed
15:04:52 <ananta> therve: I think we are ready, we should discuss if there is something stopping us from doing so
15:05:08 <shardy> therve: yup it merged https://review.openstack.org/#/c/321090/
15:05:24 <prazumovsky> Hi!
15:05:29 <ricolin> o/
15:05:30 <therve> ananta, Oh the unittest one is even merged already?
15:05:33 <jdandrea> o/
15:05:39 <ananta> yup
15:05:46 <therve> Awesome
15:05:52 <therve> Let's do this then :)
15:05:59 <zaneb> SHIP. IT.
15:06:00 <ananta> https://review.openstack.org/#/c/325798/
15:06:02 <shardy> therve: I'm planning to get an experimental tripleo job working that enables it
15:06:14 <jdob> three seconds later "Guys, the CI rack is literally on fire."
15:06:45 <shardy> thanks to the patches from cwolferh, zaneb and others the heat memory utilization is looking quite a lot less bad now
15:06:55 <therve> Wednesday is as good a day as any to break everybody :)
15:06:58 <shardy> so it'll be interesting to see how much it explodes now
15:07:00 <zaneb> shardy: nice, any numbers on that?
15:07:08 <therve> Please pay attention to CI breakage from other projects
15:07:13 <shardy> I know stevebaker also has some db optimizations which I've been meaning to test
15:07:32 <shardy> zaneb: I've got some locally but not yet plotted them - will try to do so later
15:07:44 <zaneb> shardy: awesome, thanks
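
For reference, "enabling it" above is a single heat.conf switch; new stacks are then created through the convergence engine. A minimal sketch, assuming the convergence_engine option present in this era of Heat (check your release's default before copying):

    [DEFAULT]
    # Route newly created stacks through the convergence engine instead
    # of the legacy path; existing stacks keep the engine they were
    # created with
    convergence_engine = true
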
15:07:54 <therve> #topic Performance improvements
15:08:08 <therve> Retro topic!
15:08:31 <therve> shardy, Do you think we'll see some other things taking memory now that the files thing is fixed?
15:09:23 <shardy> therve: I'm sure there are others, but that was definitely a big part of the problem
15:09:44 <shardy> I'll try to quantify it later, then we can decide how much effort to invest in further memory profiling etc
15:10:00 <therve> Yeah I was hoping it would clear things enough that we could see other problems
15:11:03 <therve> We'll see how it goes
15:11:23 <therve> That makes me think of another topic
15:11:31 <therve> #topic Refactor db API
15:11:36 <shardy> I think the next big issue is DB optimization, e.g. bug #1578854
15:11:36 <openstack> bug 1578854 in heat "resource-list makes SQL calls for every resource which is a nested stack" [High,In progress] https://launchpad.net/bugs/1578854 - Assigned to Steve Baker (steve-stevebaker)
15:11:54 <therve> zaneb, So the other day you mentioned having more transactions around db operations
15:11:55 <shardy> we've got operators waiting well over 10 minutes for a single heat resource-list command :(
15:12:29 <therve> There is this new enginefacade API in oslo_db which may make things more obvious
15:12:34 <therve> shardy, Yeah I hope it will help
15:12:48 <zaneb> shardy: doesn't it time out at 10 minutes even on an undercloud?
15:13:11 <shardy> real 13m23.083s
15:13:13 <shardy> it appears not
15:13:22 <shardy> I'm not sure if config was modified tho
15:13:32 <zaneb> maybe
15:13:47 <therve> shardy, It's with -n5 or something though, right?
15:14:04 <shardy> therve: Yeah, and it was a big deployment (about 300 nodes)
15:14:34 <therve> Yeah
15:14:40 <zaneb> I've seen some of stevebaker's patches land already... looks like they should make a big difference
15:14:57 <therve> But even some 100 sql queries shouldn't take 10 minutes :)
15:15:13 <therve> Especially when heat has basically one user
15:15:33 <shardy> zaneb: cool, it'd be good to see if we can get them tested in some real large deployments - it's hard to really prove in dev environments
15:15:44 <zaneb> therve: I suspect there is some n^2-ness to how the current queries work ;)
15:16:33 <therve> zaneb, Yeah I know... I reintroduced one recently even...
15:16:43 <therve> Because of that required_by property
15:17:09 <zaneb> shardy: just need to find some sucke^W testers to try it out
15:18:29 <shardy> zaneb: heh, those with a spare few hundred nodes ideally ;)
15:18:53 <shardy> slagle got some time on the OSIC cluster, I'll try to find out when that's happening
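
To make the two threads above concrete: bug 1578854 is the one-query-per-nested-stack pattern, and enginefacade is the oslo.db API therve mentions for making transaction scope explicit. A minimal sketch combining both ideas; the decorators are oslo.db's real API, but the Resource model is a stand-in, not Heat's actual schema, and connection configuration (normally via oslo.config) is omitted:

    from oslo_db.sqlalchemy import enginefacade
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Resource(Base):
        """Stand-in model; Heat's real resource table has more columns."""
        __tablename__ = 'resource'
        id = Column(Integer, primary_key=True)
        stack_id = Column(String(36))
        status = Column(String(255))

    @enginefacade.transaction_context_provider
    class Context(object):
        """Request context; enginefacade attaches the session to it."""

    @enginefacade.reader
    def resources_for_stacks(context, stack_ids):
        # One SELECT ... WHERE stack_id IN (...) covering a whole tree of
        # nested stacks, instead of one query per nested stack
        return context.session.query(Resource).filter(
            Resource.stack_id.in_(stack_ids)).all()

    @enginefacade.writer
    def mark_complete(context, resource_id):
        # Everything inside runs in a single transaction; commit or
        # rollback is handled by the facade when the function returns
        context.session.query(Resource).filter_by(
            id=resource_id).update({'status': 'COMPLETE'})
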
15:19:25 <therve> #topic Pending reviews
15:19:35 <therve> So hum, we have some reviews around
15:19:40 <therve> Please review!
15:19:59 <therve> https://review.openstack.org/135492 seems ready to me, it'd be cool to have that in
15:20:25 <therve> The conditions thing is almost there, so if you want to have a look before it's too late, please do
15:20:29 <therve> cc zaneb :)
15:20:54 <zaneb> yeah, need to re-review that then
15:21:19 <therve> And yeah the performance tweaks by stevebaker too
15:21:32 <therve> Though I'd like rally to be working first, but well :)
15:21:40 <jdob> are those tweaks under the same topic?
15:21:50 <jdob> or just look for baker patches
15:23:03 <therve> jdob, I think there are a bunch of them
15:23:24 <shardy> I'd like to see stevebaker's https://review.openstack.org/#/c/280963/ stack failures list patch land
15:23:26 <therve> At least bug/1588561 and bug/1578854
15:23:34 <shardy> we'll probably enable it by default in tripleoclient
15:23:38 <jdob> kk
15:25:03 <therve> shardy, It seems to query all the resources?
15:25:37 <shardy> therve: it recurses over the failed ones, yes
15:25:56 <therve> shardy, I mean it does a plain resources.list() and then filters on the result, instead of just querying the failed ones
15:26:07 <shardy> probably scope for optimization, but the output it provides is good
15:26:09 <shardy> therve: aha
15:26:48 <ramishra> therve: what about the external_id patch? I've raised an issue, though there may not be an easy fix for it atm.
15:26:57 <ramishra> so probably can go in.
15:27:57 <therve> ramishra, I don't really understand your concern
15:28:17 <therve> You mean you can't reference a removed resource from the properties?
15:28:59 <ramishra> therve: if we leave the properties with a reference to another resource in the template (for an external resource)
15:29:15 <ramishra> then it's used for dependency calculation
15:29:32 <ramishra> so when that referenced resource is removed from the template it would fail
15:29:44 <ramishra> when you try to update
15:29:49 <therve> That seems correct to me?
15:29:59 <therve> I don't understand which behavior you're expecting
15:30:28 <therve> Completely ignore the properties of the external resource?
15:30:28 <ramishra> It would not fail when the reference is there.
15:31:04 <ramishra> yes, ignore them for dep calculation, or don't do dep calculation for external resources
15:31:14 <therve> I don't really see why.
15:31:29 <therve> Maybe it's worth documenting it, but it seems reasonable as-is to me
15:31:58 <ramishra> what do the deps for an external resource mean?
15:32:10 <zaneb> without having looked at the patch, I don't think we would ever want to ignore anything in the dependency calculation
15:32:31 <zaneb> but this is perhaps a topic best discussed after the meeting :)
15:32:42 <therve> ramishra, Not much, but that's a user problem
15:32:56 <therve> Don't add deps to other resources if you have an external resource
15:33:03 <therve> Yeah
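
To illustrate the disagreement above: dependencies are derived from resource references in properties, and under the external_id patch an external resource's properties still take part in that calculation. A toy model (hypothetical, not Heat's actual code) of ramishra's failure case, where the referenced resource is removed by an update:

    # Toy dependency calculation: every get_resource reference in a
    # resource's properties becomes a dependency, external or not.
    template = {
        'server': {
            'external_id': 'abc123',  # resource managed outside the stack
            'properties': {'port': {'get_resource': 'port'}},
        },
        # 'port' was just removed from the template by the update
    }

    def dependencies(tmpl):
        deps = {}
        for name, snippet in tmpl.items():
            deps[name] = []
            for value in snippet.get('properties', {}).values():
                if isinstance(value, dict) and 'get_resource' in value:
                    target = value['get_resource']
                    if target not in tmpl:
                        # the external resource still references the
                        # removed 'port', so the update fails here
                        raise ValueError('%s references unknown resource %s'
                                         % (name, target))
                    deps[name].append(target)
        return deps

    try:
        dependencies(template)
    except ValueError as exc:
        print(exc)  # server references unknown resource port
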
15:33:07 <therve> #topic Open discussions
15:33:51 <zaneb> I had one thing to make folks aware of
15:34:04 <ramishra> I've one more thing to discuss: why do we have policy enforcement for resource_types in the heat default policy? Don't they mask the policies from the services?
15:34:36 <jdob> i have a quick one: Kanagaraj and I have been working towards a PoC and spec changes for the hot-template stuff, so there should be something to show in the next few days
15:35:27 <therve> ramishra, They allow providers to give a better user experience
15:35:45 <therve> jdob, Awesome!
15:36:09 <zaneb> back in Kilo when we split nested stacks and called them over RPC, we didn't have a way of cancelling them if another failure occurred in the parent stack, with the result that the user had to wait for everything to time out after a failed operation before they could try again
15:36:24 <zaneb> we put in code to fix this on updates in Liberty
15:36:42 <zaneb> unfortunately that code is not working even in Mitaka :(
15:37:00 <zaneb> also, we don't have any code that even attempts to cancel creates
15:37:12 <ramishra> therve: hmmm.. I am not sure how that's a better experience, when there can be a possibility of conflicts between the heat policy and the service policy
15:37:34 <zaneb> and if the code was present/working it would still be doing stuff wrong
15:37:41 <zaneb> so that's what I'm working on at the moment :)
15:38:13 <therve> ramishra, http://specs.openstack.org/openstack/heat-specs/specs/liberty/conditional-resource-exposure-roles.html was approved
15:38:43 <zaneb> ramishra: eventually there's going to be a keystone API for this. in the meantime this is probably the best we can do
15:39:19 <therve> zaneb, That's cool. Do you think that will touch what we talked about with regards to transactions?
15:40:27 <zaneb> therve: hopefully we're going to interrupt threads at yield points instead of randomly, so that may solve/mask many of the problems
15:40:45 <therve> zaneb, Hummm. Don't we do that already?
15:40:45 <zaneb> therve: but I still think we should use transactions for everything
15:41:06 <ramishra> zaneb: my suggestion is to get rid of them, anyway the user will get to know about service policy enforcement. Though I know there would not be many takers for this :)
15:41:36 <zaneb> therve: for stack-cancel-update yes. but when you initiate a delete it just kills the thread
15:42:14 <zaneb> therve: it will be a lot of work to do this everywhere though. and even then there will be times when we have to kill a thread as a fallback
15:42:19 <therve> zaneb, Right, but killing an eventlet thread is just raising an exception at a yield point
15:42:22 <therve> AFAIU at least
15:43:18 <zaneb> therve: by 'yield' I meant an explicit 'yield' in a co-routine. so it can't happen in the middle of a series of DB operations like an eventlet switch can
15:43:34 <therve> Ah, okay
15:44:00 <zaneb> ramishra: that's what we used to have and it was terrible ;)
15:44:19 <therve> Well I'm very interested in that work at any rate :)
15:44:27 <zaneb> ramishra: we had to keep admin-only resource plugins in /contrib. this is the price of moving everything in-tree
15:45:13 <zaneb> therve: great, you can review my mega patch series when it's ready ;)
15:45:19 <therve> :D
15:45:24 <therve> OK, anything else?
15:45:59 <zaneb> er, that read wrong. it's a very long series, not a series of very long patches
15:46:45 <cwolferh> feedback on change 303692 (resource properties data data model change) would be appreciated, especially last comment
15:46:48 <therve> I totally expected 28 patches each 13 lines long by reading that :)
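
A minimal sketch of the distinction zaneb draws above, with hypothetical step names: a task written as a coroutine can only be cancelled at its explicit yield points, so a run of DB operations between yields is never interrupted halfway, whereas killing an eventlet greenthread can raise at any implicit switch:

    import threading

    def create_resource():
        # Each block between yields runs to completion; cancellation can
        # only land on the explicit yields (steps are for illustration)
        print('writing DB rows')       # uninterruptible as a unit
        yield
        print('calling external API')
        yield
        print('finalizing DB rows')

    def run(task, cancelled):
        it = task()
        for _ in it:
            if cancelled.is_set():  # checked only between yields
                it.close()          # raises GeneratorExit inside the task
                return 'cancelled'
        return 'complete'

    cancelled = threading.Event()
    print(run(create_resource, cancelled))  # prints the steps, then 'complete'
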
15:47:46 <therve> cwolferh, Do you mean https://review.openstack.org/267953 ?
15:48:21 <cwolferh> er, right, the other patch :-)
15:48:36 <therve> cwolferh, So yeah that test interacts with other tests somewhat on purpose
15:48:57 <therve> The purge is something operators are going to run while heat is running, so it needs to work while stuff is happening
15:49:22 <zaneb> therve: they're not going to run it with time=0 though, surely?
15:49:23 <cwolferh> i'm not sure purge_deleted 0 should really be encouraged, but ok
15:49:42 <therve> zaneb, They're not, but I'm not sure it should make a difference?
15:50:01 <cwolferh> well, the good thing at least is that it didn't cause the other tests to fail
15:50:55 <therve> I'm not sure why it's suddenly a problem with the new table
15:51:09 <therve> If you calculate what to remove correctly it should work as usual
15:51:35 <cwolferh> i'm not sure it wasn't stepping on other things and just not being noticed before
15:52:15 <therve> I'm pretty sure it was stepping on other things and some issues were found :)
15:52:39 <zaneb> therve: if we e.g. write the DELETE_COMPLETE event after marking the stack DELETE_COMPLETE then there could be a race there that only occurs with purge_deleted 0
15:53:34 <zaneb> and that's not necessarily an illegitimate way to do it, although it does sound fixable
15:54:12 <zaneb> although stack-level events should have properties attached anyway...
15:54:15 <therve> zaneb, events reference stacks though?
15:54:22 <therve> So we would see this issue on master
15:55:03 <zaneb> the failure was an event referencing a deleted properties data row iirc
15:55:10 <cwolferh> right
15:55:24 <cwolferh> also a resource
15:55:38 <zaneb> we can move this one back to #heat too I think
15:55:44 <therve> Sure
15:55:44 <cwolferh> yep
15:55:50 <therve> #endmeeting
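
For the record, the race zaneb describes reduces to write ordering: if the stack is marked DELETE_COMPLETE before its final event row is written, a concurrent "heat-manage purge_deleted 0" can remove the rows that event is about to reference. A sketch with hypothetical stub helpers, not Heat's actual code:

    def mark_stack_deleted(stack_id):
        """Stub: UPDATE stack SET status='DELETE_COMPLETE' (hypothetical)."""

    def write_final_event(stack_id):
        """Stub: INSERT INTO event ... referencing the stack's
        resource/properties rows (hypothetical)."""

    # Racy ordering: the stack becomes purge-eligible one step before its
    # final event exists, so a purge with age=0 running in the gap deletes
    # rows the event will reference.
    mark_stack_deleted('stack-1')
    # <-- window: "heat-manage purge_deleted 0" may run here
    write_final_event('stack-1')

    # The fixable version zaneb alludes to: swap the order, or do both
    # writes in one DB transaction so the purge sees all or nothing.
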