20:00:27 #startmeeting heat
20:00:28 Meeting started Wed Mar 12 20:00:27 2014 UTC and is due to finish in 60 minutes. The chair is stevebaker. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:31 The meeting name has been set to 'heat'
20:00:31 #topic rollcall
20:00:36 hi
20:00:37 hi all
20:00:46 o/
20:00:47 o/
20:00:48 o/
20:00:50 o/
20:01:05 o/
20:01:28 shardy?
20:01:36 o/
20:02:09 o/
20:02:17 o/
20:02:19 no action items last week
20:02:41 o/
20:02:48 sorry bit late
20:02:49 #topic Adding items to the agenda
20:02:54 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282014-3-12_2000_UTC.29
20:03:36 anything to add?
20:04:01 AutoScaling and load balancers
20:04:25 that's one topic, the conjunction
20:04:25 mspreitz: done
20:05:04 #topic Tempest tests (and lack thereof)
20:05:16 * radix arrives
20:05:21 does anyone know anything about grenade?
20:05:29 Yeah. Are there tempest tests of autoscaling?
20:05:51 mspreitz: I don't think so
20:05:58 shardy: it's something to do with upgrade testing, right?
20:06:02 (My Yeah was about the topic, not grenade)
20:06:19 zaneb: yeah, we evidently need to integrate with it but I don't know where to start
20:06:32 I don't really know anything at all about tempest or how its tests are defined or run, I should look into that
20:06:42 o/
20:06:47 The good news is that the heat-slow job is now gating and voting, the bad news is that it has taken us (me) a long time to get there and the current tests are superficial. sdague quite rightly gave us a D grade for our current integration test coverage
20:06:52 shardy: is it just DB upgrades, or more than that? idk
20:06:53 radix: we do have some existing tempest tests you can look at as examples
20:07:00 o/
20:07:03 * SpamapS is late
20:07:05 zaneb: I have no idea, hence my question :)
20:07:06 ok
20:07:24 mspreitz: there is a disabled autoscaling scenario test.
I have a local rewrite which I need to resurrect
20:07:46 stevebaker: that would be good
20:08:01 shardy: where are they? in the heat repo?
20:08:04 I am annoyed at the lack of testing of autoscaling. Will try to prod loose some local time to work on it...
20:08:13 radix: no, in the tempest repo ;)
20:08:13 is there a wiki page for learning how to do tempest testing?
20:08:19 grenade runs tempest against havana, then does an upgrade to icehouse, then runs tempest again
20:08:20 So this is a wonderful time to push forward on tempest tests.
20:08:37 Especially if you fix bugs. If you fix a bug, write a tempest regression test.
20:08:43 I also am completely ignorant of how to write tempest tests.
20:08:52 SpamapS: yes, and I'm going on the assumption that if there is no tempest test for it, the heat feature is broken
20:09:01 mspreitz it is pretty easy, the hardest part is learning how tempest works
20:09:08 the easiest way to learn how tempest works is to try it out
20:09:22 Would the right approach for autoscaling be to factor into two pieces: one that tests whether alarms POST to the right URLs at the right times, and another to test whether hitting the webhook causes scaling?
20:09:24 yeah, we might want to put together a quick start for just running heat's tempest tests.
20:09:35 #action everyone to write tempest tests ;)
20:09:58 shardy iirc only the meetbot chair can record #actions :_)
20:10:00 it basically uses all the same stuff we use for unit tests, but you don't do unit tests, just interact at the API level
20:10:19 there are two parts - api tests, and scenario tests
20:10:24 if you're looking for a test to write, we're tracking them as tempest tagged heat wishlist bugs https://bugs.launchpad.net/heat/+bugs?field.tag=tempest
20:10:37 mspreitz: the point of scenario testing AFAIK is to do more end-to-end user-scenario orientated testing
20:10:39 scenario tests are more complex use cases like launch a vm and storage, connect them together, and see if that works
20:10:47 I'll be writing some scenario tests for software config
20:10:59 mspreitz: there is also an API surface test which is more granular (test each action)
20:11:30 shardy: is that in tempest too?
20:11:32 I think all new tests should be scenario tests, that means you can use heatclient. And everything but the most trivial template is orchestrating a "scenario"
20:11:37 mspreitz: yes
20:11:59 github.com/openstack/tempest/blob/master/tempest/api/orchestration/
20:12:20 https://github.com/openstack/tempest/tree/master/tempest/scenario/orchestration
20:12:25 our api test coverage is pretty weak stevebaker
20:12:27 shardy: thanks
20:12:27 it needs more attention
20:12:28 We should have a stretch goal to have documentation making it easy enough to do TDD with tempest tests btw.
20:12:52 SpamapS: TDD?
20:12:58 test driven development
20:13:04 stevebaker: when do you plan to enable autoscaling scenario test?
20:13:15 So, write the tempest test, run it in a loop, commit/git-review once it passes. :)
20:13:19 stevebaker: AFAIK, it's skipped now
20:13:20 also priority should be given to native heat resources.
Eventually we should only be testing cfn resources with cfn compatible templates using the cfn api with boto as tempest thirdparty tests
20:13:37 stevebaker, sdake_: question - how does one run tempest locally?
20:13:46 stevebaker: so what is the heat-slow test currently testing?
20:13:48 zaneb I use testr
20:13:48 i.e. not in the gat
20:13:51 gate
20:13:54 skraynev: get my local rewrite finished, and maybe port it to the native autoscaling resources
20:13:59 stevebaker: I only see the autoscaling scenario test
20:14:10 sdake_: just devstack + testr?
20:14:34 zaneb I add tempest to my devstack install -> http://paste.fedoraproject.org/84814/46552661
20:14:36 zaneb: you can run them just like a unit test
20:14:44 shardy: any orchestration test decorated with attribute "slow"
20:14:53 stevebaker: Ah, OK thanks
20:15:00 * zaneb needs to try setting up devstack again
20:15:01 stevebaker: cool, will it be before Juno release?
20:15:08 I'm still a bit confused by the various decorator categories
20:15:38 to run tempest tl;dr, enable tempest in devstack, cd tempest, testr run slow
20:15:38 stevebaker: I mean before new release cycle start
20:15:47 or: tempest run orchestration
20:16:23 skraynev: I planned to enable it before havana ;)
20:16:56 skraynev: it needs some changes, the current approach is too racy
20:17:11 stevebaker: hehe... just a little late ;)
20:17:48 so can I have a show of hands of people who intend to write some tests really soon now?
20:17:51 zaneb: btw, you don't have to use devstack, I've used tempest installed via packstack against RDO too
20:18:04 oh, ok
20:18:25 stevebaker: o/ I will write a test for the retry thing I am still trying to get done. :)
20:18:27 stevebaker: Ok, I hope, that it will be soon, because my test scenario test for lbaas in heat based on your example.
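The selection rule given at 20:14:44 (the heat-slow job runs any orchestration test decorated with attribute "slow") can be pictured with a small self-contained sketch. The `attr` decorator and `select` helper below are hypothetical stand-ins written for illustration, not Tempest's own code; only the tag name "slow" comes from the discussion.

```python
import unittest


def attr(*tags):
    """Hypothetical stand-in for a test-tagging decorator."""
    def decorate(func):
        # Store the tags on the function so a runner can filter on them.
        func.tags = set(tags) | getattr(func, "tags", set())
        return func
    return decorate


class OrchestrationScenarios(unittest.TestCase):
    @attr("slow")
    def test_server_with_software_config(self):
        # A real scenario would boot a full (non-cirros) image here.
        self.assertTrue(True)

    def test_stack_list(self):
        # Untagged, so a "slow"-only job would skip it.
        self.assertTrue(True)


def select(case, tag):
    """Return the names of test methods on `case` that carry `tag`."""
    names = unittest.TestLoader().getTestCaseNames(case)
    return [n for n in names
            if tag in getattr(getattr(case, n), "tags", set())]
```

Calling `select(OrchestrationScenarios, "slow")` picks out only the tagged scenario, which is roughly what a tag-filtered gate job does when it decides which tests to run.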
20:18:37 accepting that time is short and I may not get it into I ;)
20:18:41 stevebaker: I do, auth user/accesskey/trusts and volume stuff
20:19:17 skraynev: yes, autoscaling plus load balancing should be the end goal
20:19:59 shardy: using software-config to unmount/remount on suspend/resume would be an interesting thing to test for
20:20:16 Human resources uncertain, but interested in that scenario testing.
20:20:32 stevebaker: could you ping me, when you upload new version of it?
20:20:39 skraynev: sure thing
20:20:41 me too
20:20:46 mspreitz: ok
20:20:47 stevebaker: sounds good - I need to spend some time trying out all the new software-config stuff :)
20:21:05 stevebaker: thanks ;)
20:21:10 Maybe we should have a tempest test day
20:21:14 would like to try writing tempest tests too, but not sure how soon will have smth done.. need to learn it first :)
20:21:19 tempest test week would be more appropriate :)
20:21:28 we tried a test day once and nothing happened
20:21:36 SpamapS: will be better - tempest week ;)
20:21:40 SpamapS: we had one a while ago, there wasn't much participation. Maybe there would be more interest now
20:21:41 sdake_: agreed, needs more than a day
20:21:43 sdake_: true, takes a while to complete tests so might take more than a day just to get one to pass ;)
20:21:46 no advertising.
gotta market these things sdake_ ;)
20:22:03 I don't think advertising would have helped
20:22:10 people have to be angry about the lack of tests
20:22:10 I think if we do a 'test day' it should be longer than a day too
20:22:12 it takes a day just to come up to speed on tempest
20:22:17 so as long as sdague is calling us out, we should be angry ;)
20:22:25 I will take a look at the bug wishlist and give a try on the tempest test
20:22:44 tango: cool, thanks
20:23:10 I don't think being angry turns into motivation :)
20:23:15 as I've said many times, I find writing these tests actually fun
20:23:37 while I agree the situation could certainly be better, the community at large needs to pitch in. I wonder if it would behove us to -2 things that we think should have matching tempest tests?
20:24:05 randallburt implementing such a policy before the rest of heat has coverage seems counterproductive
20:24:21 sdake_: how is that? at least new stuff would have tests no?
20:24:25 sdake_: perhaps. would keep us from playing catch-up though
20:24:27 randallburt: maybe now that we have a voting job which we can launch nova resources we can start considering that,
20:24:32 slower seems like a double standard
20:24:55 once we have some basic examples of testing our current resources then we could consider this
20:24:57 sdake_: I see what you are saying but I think randallburt makes a good point
20:25:26 a slight modification to randallburt's idea is to just insist that bug fixes have corresponding tempest tests, and that features only modify parts of Heat covered by at least some tempest test.
20:25:27 seriously if everyone on core spent 1 week madly writing tests cases in #heat, we would be done
20:25:29 maybe we think about transitioning to that
20:25:34 I don't think other projects enforce everything-in-tempest, (near) full coverage in unit tests and good coverage for core functionality in tempest IMO
20:25:34 sdake_: +1
20:25:35 stevebaker: sounds good.
and maybe -2 is a little harsh, but at least some review guidelines specifically mentioning tempest/gate/check tests would be good
20:25:40 SpamapS: +1
20:25:42 we should just get er done
20:25:59 realistically we can't test everything in tempest or the tests will take too long to gate on
20:26:15 sdake_: not every core has that kind of time to devote. we have 100+ contributors though
20:26:20 shardy: interesting framework point
20:26:25 Since these are integration/functional tests, we don't expect 100% coverage. But obvious failure prevention is extremely useful.
20:26:26 worthy of discussion itself
20:26:35 shardy: however a single scenario can squeeze in an awful lot of coverate
20:26:38 coverage
20:26:44 This also helps prevent things like keystone changing an API that only we use trusts
20:27:02 stevebaker: agreed, getting autoscaling+LB would be a massive win
20:27:15 Should we add another category of tests, ones that don't run all the time in the gate but do run periodically?
20:27:26 slower :)
20:27:30 mspreitz: I'd rather focus on all the time tests.
20:27:43 sdake_: then later we'll have slowest, then slowerthanslowest ...
20:27:45 shardy: autoscaling+LB+software-config doing stack updates!
20:27:45 randallburt: it's worth noting that you *can't* create tempest test for something until after it's merged, because otherwise the test will fail
20:27:55 then "getsomecoffee"
20:27:58 stevebaker: ++ :)
20:27:58 SpamapS, i think more than you guys use trusts :P
20:27:59 (I hope more do)
20:28:10 morganfainberg: I kid, I kid.
:)
20:28:15 "slow" just means "boots a server which isn't cirros"
20:28:16 SpamapS, >.>
20:28:28 * morganfainberg goes back to lurking :)
20:28:51 anyway, lets move on
20:28:56 #topic Deferred auth method default to trusts
20:29:02 i think we need scenario tests that handle every resource
20:29:09 but they should probably be split out
20:29:31 stevebaker: so I posted this to devstack:
20:29:39 https://review.openstack.org/#/c/80002/
20:30:02 but I bumped the bug saying make it default to future, due to the upset the instance-users config changes caused..
20:30:25 oh see defaulting deferred auth method to trusts is a good example of something that will benefit from more tempest tests.
20:30:28 the issue is it requires a role to exist, and for users to have that role, or we have nothing to delegate via trusts
20:30:29 shardy: It would to make it the default, but given the upgrade constraints I think it should fallback to password if the roles are not set up correctly
20:30:48 stevebaker: Ok, I can work on a patch which does that
20:31:13 shardy: or should the default be "auto" which uses trusts if the role exists?
20:31:24 does this somehow involve the keystone v2 plugin too?
20:31:48 shardy: is that role heat_stack_user or something else?
20:31:50 radix: no, that shouldn't be required for this
20:32:02 SpamapS: heat_stack_owner by default
20:32:08 ah ok
20:32:08 ok
20:32:17 won't v2 keystone only ever be able to use password deferred auth?
20:32:18 Oh I like the idea of 'auto' ... does that already exist?
20:32:28 SpamapS: that is what's setup in devstack, but the idea is deployers set it to whatever makes sense given their local policy
20:32:30 stevebaker: yup, afaik
20:32:42 SpamapS: heat_stack_user is for the in-instance users
20:33:00 stevebaker: which isn't a problem if you need the v2 plugin you just need to know to set deferred auth to password
20:33:16 stevebaker: yes the v2 plugin will never work with trusts
20:33:55 shardy: what do you think of an "auto" option?
20:34:22 stevebaker: sigh. Seems kinda messy but I'm happy to do it if that's what folks want
20:34:43 can it be "auto with annoying log warnings"? :)
20:34:47 lol
20:35:01 shardy: please do make it that way
20:35:08 I would like to move towards deprecating the whole password thing really
20:35:14 shardy: it is a legitimate WARNING .. you are running with less security than you should be.
20:35:34 shardy: and +1 for deprecating the user/pass auth as soon as keystone v2 is gone.
20:35:43 SpamapS: yeah, but it's also a user-facing annoyance, e.g that box in horizon where you have to enter a password on stack-create
20:35:46 WARNING: we r storng yr secrits
20:35:56 SpamapS: but as much as you can given how heat works and one's particular constraints around api availability.
20:36:12 I saw users trying to use the UI with therve at a workshop recently and it's something we really should fix
20:36:14 stevebaker I had a parse error :)
20:36:36 shardy: ew, yeah lets kill that ASAP :)
20:36:52 SpamapS: +1000, but to do that, we *have* to use trusts
20:36:59 I opened a bug about the admin password boxes
20:37:00 so "auto" won't really cut it
20:37:17 wait, sorry, wrong boxes
20:37:18 as horizon has no way to know what variety of auto-ness heat has selected
20:37:28 shardy: right, so the only blocker to that is the sad pandas who are stuck with keystone v2 right?
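The "auto with annoying log warnings" idea floated between 20:31:13 and 20:35:14 might look roughly like the sketch below. It is not Heat's implementation: the function name, the logger name and the warning text are made up for illustration, and only the option values ("auto", "trusts", "password") and the default role name heat_stack_owner come from the discussion.

```python
import logging

LOG = logging.getLogger("deferred_auth_sketch")  # hypothetical logger name


def resolve_deferred_auth(method, user_roles, trust_role="heat_stack_owner"):
    """Pick the deferred auth method for one request (illustrative only)."""
    if method != "auto":
        # An explicit "trusts" or "password" setting always wins.
        return method
    if trust_role in user_roles:
        # The user holds the role that can be delegated, so trusts work.
        return "trusts"
    # Nothing to delegate: fall back, but be loud about it.
    LOG.warning("User lacks role %s; falling back to stored password, "
                "which is less secure than trusts", trust_role)
    return "password"
```

For example, `resolve_deferred_auth("auto", ["heat_stack_owner"])` selects trusts, while a user without that role gets the password path plus a warning in the log. Keeping the fallback in one small function would also make it easy to delete once password auth is deprecated.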
20:37:33 I guess a deployer can just configure horizon to show the box or not
20:37:41 SpamapS: right
20:37:44 how could horizon know whether to prompt for password or not? it would probably have to do it after attempting a create without it
20:38:12 like heatclient errors with a request to include the password
20:38:27 shardy: that is an interesting failure on a UI level. I wonder if we couldn't just teach horizon how to do the same check as heat-engine does?
20:38:33 stevebaker: Yeah I suppose
20:38:59 horizon probably just needs to pass the auth token rather than a user/pwd
20:39:01 Oh yeah see fallback to password makes sense.
20:39:07 that sounds like a different issue to me
20:39:09 SpamapS: we'd have to expose the requirement for deferred auth via the resource schema
20:39:17 sdake_: for create we need a user/pass or trusts.
20:39:25 oh right
20:39:34 you mean in the non-trusts case
20:39:54 I'd rather ask for forgiveness, attempt create with just a token, and an error will indicate a password is needed too
20:40:00 shardy: I think falling back after heatclient errors is the way to go.. since this is temporary.
20:40:10 My take is, lets just move towards making trusts the default, and make the password box a configuration option in horizon
20:40:34 so that is a breaking config option.. or one that has to default to the lowest common denominator to be useful.
20:40:40 #action make trusts the default, with graceful fallback so existing configuration files continue to work
20:40:43 Yeah, or fallback automagically but really I'd like the confusing password box hidden
20:41:04 try/except seems acceptable.. since we know exactly the failure to expect if we have to fallback.
20:41:13 SpamapS: so re making it the default, what are the barriers re adding a role to every user for TripleO?
20:41:13 could it be done in such way that horizon tries to create with trust and if fail present user with password input dialog?
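The "ask for forgiveness" flow proposed at 20:39:54 and 20:41:04 (attempt the create with just a token, and prompt for a password only when a specific error says one is needed) can be sketched as follows. Everything named here is an assumption for illustration: `PasswordRequired`, `create_stack_with_fallback` and the fake backend are not heatclient or Horizon APIs.

```python
class PasswordRequired(Exception):
    """Hypothetical error meaning: the server cannot use trusts here."""


def create_stack_with_fallback(create, token, prompt_for_password):
    """Try a token-only create; ask for a password only if told to."""
    try:
        return create(token=token, password=None)
    except PasswordRequired:
        # We know exactly which failure to expect, so catch only that one.
        return create(token=token, password=prompt_for_password())


def make_fake_backend(trusts_enabled):
    """Stand-in for the orchestration API, for demonstration only."""
    def create(token, password):
        if not trusts_enabled and password is None:
            raise PasswordRequired()
        return {"status": "CREATE_IN_PROGRESS",
                "used_password": password is not None}
    return create
```

With `make_fake_backend(True)` the prompt callback is never invoked, so a trusts-enabled deployment would not show the password box at all; with `make_fake_backend(False)` the prompt runs exactly once before the retry.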
20:41:28 SpamapS: I'm kinda nervous about the whole thing given recent events ;)
20:41:35 shardy: remember, "auto" is the default ;)
20:42:04 stevebaker: Ok right auto all the things, sorry
20:42:20 shardy: that would have an impact if it was required, yes. But we'll get it done ASAP if we know we have to do it and have a little lead time.
20:42:24 why do we have *any* config options, set them all to auto! :D
20:42:39 config options suck I agree :)
20:42:50 The telephone switch guys have no config options in their products
20:42:53 SpamapS: Ok, I'll work up an auto-trusts patch this week
20:42:53 We're almost done adding stack_domain_admin :)
20:43:09 now that our CI isn't broken :)
20:43:09 shardy: I think you're being facetious, but I actually agree ;)
20:43:33 getting rid of config options is possible, the telephone switch manufacturers did it, but it took them 30 years
20:43:51 stevebaker: I am a bit, my concern re auto-all-the-things is maintaining masses of legacy fallback code long term
20:44:04 but if it's temporary, lets do it :)
20:44:25 SpamapS: that is good to hear! :)
20:44:49 shardy: having an auto can always detect the most appropriate option, it doesn't stop us from deprecating and removing the old broken ways
20:45:15 stevebaker: well it does if everyone ignores the warnings and relies on the old broken ways
20:45:49 15 minutes left
20:45:51 #topic Autoscaling and load balancers
20:45:57 mspreitz: go
20:46:09 OK, first question: do we think this works now?
20:46:20 the new ASG with Neutron LB?
20:46:43 mspreitz: I am still trying to get a test environment working enough to try that
20:46:46 I mean put a PoolMember in a nested stack that gets scaled by the ASG
20:46:47 i.e. one with Neutron
20:46:51 therve is putting it through its paces, I haven't got to it yet but will get back to the existing tempest test soon
20:46:57 mspreitz: I hope it works :)
20:47:22 bah. and my message to openstack-dev just bounced for some reason
20:47:30 Great.
Now I just need a little help setting up neutron so I can test it myself. I have asked all day on IRC and ML, gotten zero useful response
20:47:37 but my position is that if it doesn't work, it's bugs that need to be fixed before icehouse
20:47:47 agreed
20:47:47 stevebaker: great
20:49:11 perhaps we should write a tempest test
20:49:16 for autoscaling + LB ;)
20:49:27 SpamapS: brilliant!
20:49:29 deja vu all over again!
20:49:35 * SpamapS is just a parrot :)
20:49:35 SpamapS: neutron LB?
20:49:48 * shardy just saw a black cat, then another just like it ;)
20:49:58 skraynev: in theory it should work. In practice, I suspect that is something also not well tested in tempest already ;)
20:49:59 #open discussion
20:50:09 ahem
20:50:14 #topic open discussion
20:50:23 maintenance
20:50:31 Anything in heat intend to keep VMs running?
20:50:33 stevebaker: Graceful things from hot-software-config ...
20:51:01 stevebaker: we had talked about how the automatic wait conditions softwareconfig/deployer create would be useful in this area..
20:51:34 stevebaker: wondering if you have any update on that, or guidance as to whether I can write a resource plugin that would make that a reality...
20:51:36 SpamapS: I am having trouble parsing you
20:51:47 sorry
20:52:04 SpamapS: this is for rebuild specifically? what does the shutdown quiescing actually need to do?
20:52:13 mspreitz: you mean, like, keeping things running when bad things happen out-of-band? (like a server being deleted somehow)
20:52:13 graceful things == signalling to in-instance tools that a reboot or instance delete is coming, and waiting for a signal back before doing reboot/delete.
20:52:24 radix: right
20:52:30 SpamapS: hm. I don't know how it will work together, because I only have tempest scenario test for LB (but currently it works local..)
20:52:35 mspreitz: it's something that's being talked about a lot and will probably get attention from multiple people in juno
20:52:51 Spamaps: thanks.
20:53:00 stevebaker: rebuild and delete
20:53:06 stevebaker: and resize
20:53:17 spamaps I think stevebaker has something to handle that
20:53:37 he indicated software config can run a workload at shutdown before actually deleting the instance
20:53:38 Basically, our cluster health will stay higher during updates if we don't rip nodes out from the cluster without warning.
20:53:41 SpamapS: so currently you can only do that for DELETE, since that is an action
20:54:07 stevebaker: ok, so I could in theory extend OS::Nova::Server to use the same method before it does a rebuild?
20:54:07 eg, the workload running would be part of the delete operation
20:54:10 SpamapS: also need signalling to other resources, I think
20:54:14 SpamapS: e.g. dependent PoolMembers
20:54:28 SpamapS: so that they can temporarily remove a node from a load balancer when the node is being e.g. resized
20:54:56 SpamapS: yes. How about putting your subclass in contrib/tripleo. Would you object to moving OS::Heat::UpdateWaitConditionHandle there too?
20:55:33 stevebaker: I would not object to either of those, though IMO OS::Heat::UpdateWaitConditionHandle is generically useful for anybody not ready to use SoftwareConfig so I am less excited to move it to contrib.
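The "graceful things" pattern defined at 20:52:13 (tell the in-instance tools that a reboot, rebuild or delete is coming, then wait for a signal back before acting) can be sketched with a plain event and a timeout. This is an illustration under stated assumptions, not a Heat resource: the class and method names are hypothetical, and proceeding after the timeout is a design choice the meeting did not settle.

```python
import threading


class GracefulAction:
    """Notify an instance, wait for its acknowledgement, then act."""

    def __init__(self, notify, timeout):
        self._notify = notify      # callable that warns the instance
        self._timeout = timeout    # seconds to wait for the signal back
        self._ack = threading.Event()

    def signal(self):
        """Called when the in-instance tooling reports it is ready."""
        self._ack.set()

    def run(self, action):
        self._notify()
        acknowledged = self._ack.wait(self._timeout)
        # Assumption: proceed even without an acknowledgement once the
        # timeout expires, so an unresponsive instance cannot block forever.
        action()
        return acknowledged
```

A caller would pass the destructive step (rebuild, resize or delete) as `action`; the return value says whether the instance acknowledged in time, which could be logged or used to mark the update as degraded.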
20:56:02 stevebaker: and if we can't get it into contrib for Icehouse, we'll just ship it in tripleo-heat-templates
20:56:24 since that does not really freeze
20:56:49 UpdateWaitConditionHandle frightens me, but I don't know what an alternative would look like
20:56:57 SpamapS: FYI I started looking at a native OS::Heat::WaitSignal resource, designed to work with heat resource-signal, but ran out of time
20:57:17 SpamapS: maybe long term will be to represent rebuild/resize workloads as config/deployment, but short term just hack it
20:57:25 will probably pick that up again after the freeze, although the software-config stuff somewhat makes it redundant
20:57:50 SpamapS: but I will think about how it could be done
20:58:11 I'm assuming contrib is not subject to feature freeze by the way
20:58:25 stevebaker: do we know that's the case?
20:58:34 shardy: no, I will confirm
20:58:41 stevebaker: yeah, I think it is just another action, just a resource-centric action rather than a stack-centric action like DELETE
20:59:11 radix: on signalling cross-node - there are lots of reasons a node might be unavailable, lbs should just cope
20:59:19 SpamapS: yes, it might turn out to be easy
20:59:23 radix: it is after all what they are designed to do
20:59:27 out of time..
20:59:29 lifeless: I'm not talking about cross-node signalling
20:59:36 but maybe you're still right
20:59:41 lifeless: agree
20:59:43 its midnight somewhere
20:59:53 lifeless: there are other use cases, like VolumeConnections, I don't know what kind of behavior those have.
21:00:01 #endmeeting