14:00:48 <SergeyLukjanov> #startmeeting sahara
14:00:50 <openstack> Meeting started Thu Oct 8 14:00:48 2015 UTC and is due to finish in 60 minutes. The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:51 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda
14:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:54 <openstack> The meeting name has been set to 'sahara'
14:01:05 <crobertsrh> Hel\o/
14:01:13 <esikachev> hi
14:01:13 <SergeyLukjanov> let's wait for a few more minutes
14:01:27 <elmiko> o/
14:01:36 <vgridnev> hi
14:02:21 <SergeyLukjanov> #topic sahara@horizon status (crobertsrh, vgridnev)
14:02:30 <SergeyLukjanov> #link https://etherpad.openstack.org/p/sahara-reviews-in-horizon
14:02:41 <crobertsrh> I added one bug fix to that list this week.
14:02:49 <crobertsrh> It has to do with filtering by plugin name.
14:03:07 <crobertsrh> It's kind of ugly, but actually not a regression.
14:03:18 <vgridnev> nothing from me on this topic
14:03:50 <crobertsrh> Soon, I will have a change posted for adding shares to node group templates. Reviews will be appreciated.
14:04:17 <SergeyLukjanov> cool, thx
14:04:19 <SergeyLukjanov> #topic News / updates
14:04:54 <elmiko> mainly been working on talks for summit, also a little help with coordinating some outreachy applicants for sahara
14:05:27 <vgridnev> working on ambari plugin improvements
14:05:27 <egafford> Also primarily on summit talks and discussions.
14:05:47 <AndreyPavlov> hi guys, have been working on CLI and grenade job
14:05:55 <esikachev> working on SPI for cluster verification checks, testing liberty
14:06:07 <crobertsrh> I've been working on adding manila share functionality to horizon (node/cluster templates and running clusters)
14:06:34 <tosky> some time to test Liberty
14:07:12 <tosky> and finding bugs, like the one mentioned by crobertsrh (which would be nice to backport to Liberty, if it's not a problem, after it's fixed): https://bugs.launchpad.net/horizon/+bug/1503235
14:07:12 <openstack> Launchpad bug 1503235 in OpenStack Dashboard (Horizon) "[Sahara] Filter (on strings?) not working at least for Node Group and Cluster Template pages" [Undecided,In progress] - Assigned to Chad Roberts (croberts)
14:07:34 <vgridnev> crobertsrh, I viewed Trevor's change for filtering in sahara for cluster templates, it looks like it breaks backward compat.
14:07:38 <vgridnev> tosky, ^^
14:07:47 <elmiko> #link https://review.openstack.org/#/c/232067/
14:07:52 <elmiko> that's the review in question
14:07:58 <tosky> vgridnev: what exactly?
14:08:01 <sreshetn1ak> I'm working on support for api-paste.ini
14:08:05 <crobertsrh> vgridnev: He was striving to not break backward compat, but I haven't looked at it yet.
14:08:26 <elmiko> yea, it looked like from his tests that it would not break compat
14:09:28 <vgridnev> so, if we have 2 cluster templates, with names "abs" and "abss", and then ask to filter by name with arg "abs", we will get 2 cluster templates as a result
14:09:52 <tosky> yes, but this is what happens when you use the same search box in Compute, for example
14:10:14 <tosky> consistency is more important, I personally think that the current behavior is incorrect; I would say users expect substring matching
14:10:54 <egafford> tosky: I think vgridnev's point is that existing users will see a change in response. Still, that's the goal in this case; to fix the bug, behavior does have to change.
14:10:55 <vgridnev> tosky, but initially it was perfect matching, so we are changing the output result
14:11:22 <tosky> vgridnev: yes, to be consistent with similar search filters in horizon
14:11:26 <egafford> So either we live with it until we rev the API for backward compat, or we fix the bug.
14:12:07 <tosky> egafford: it's not a matter of API, it's behavior; if we say that it can't be changed (i.e. perfect match only), it will never be changed regardless of the internal code implementation
14:12:20 <egafford> I think in this case it's reasonable to call the initial version wrong, and worth fixing, rather than reasonable and worth preserving (even if we'd rather do something else.)
14:12:40 <crobertsrh> +1 for consistency across openstack
14:14:11 <egafford> tosky: Well, vgridnev's argument is that we shouldn't change the response (behavior) of an existing API. Response is part of API too; we could decide to only change at a version switch. Still, +1 to consistency across openstack, as crobertsrh says.
14:14:26 <tosky> uhm uhm
14:14:29 <egafford> I think fixing it is a win.
14:15:10 <tosky> and a new API? I guess no one wants that
14:15:24 <elmiko> i want that ;)
14:15:25 <tosky> the goal (from my point of view) is consistency in search behavior
14:15:26 <egafford> tosky: Well, elmiko wants that.
14:15:28 <egafford> :)
14:15:31 <tosky> in Horizon, at least
14:15:45 <tosky> not talking about 2.0 :)
14:15:52 <tosky> I was talking about a new method here
14:15:57 <egafford> (So do I.) tosky: Yeah, we agree, I'm just mapping out the alternative argument.
14:16:20 <elmiko> tosky: so, you are saying add a new method for the glob style filtering?
14:16:30 <vgridnev> we can't be sure that someone uses current behavior of Sahara
14:16:56 <tosky> yes, but on the other side we don't have microversioning - but we did add new methods to the 1.1 API in the past, right?
14:17:25 <elmiko> yea, we could add an entirely new method without breaking the contract
14:17:39 <elmiko> vgridnev: +1
14:17:47 <egafford> elmiko: Where do you even put that in a sensible REST API though?
14:17:47 <tosky> so, if it is possible, maybe we could leave get_cluster_templates as it is and introduce a new get_cluster_templates_filter (the same for other methods)
14:18:01 <egafford> I mean, there are crazy places to put it, but sensible ones? Harder.
14:18:15 <elmiko> egafford: yea, i know. this is a difficult question
14:18:32 <tosky> and then, in horizon, look for the new one and fall back in case it is not there
14:18:33 <tosky> that would make it impossible to backport to Liberty, though
14:18:47 <tosky> vgridnev: would that ^ be acceptable?
14:19:05 <tosky> (we all know that we will have 2.0 in M, so this will be moot, right? :D)
14:19:18 <tosky> ok, joking aside
14:19:27 <elmiko> i dunno, we might have to wait on this. because we will also need to change the filter stuff for node groups, i'd hate to see us propagate a bunch of bad api endpoints just to fix this for now.
14:20:40 <elmiko> i think we could add a new filtering parameter to the current endpoint
14:21:08 <elmiko> which would allow us to distinguish between exact match and glob match
14:21:11 <vgridnev> I think this issue is not so critical for liberty, it doesn't break anything, just something probably strange
14:21:17 <elmiko> +1
14:21:30 <elmiko> also, our api docs are making me sad =(
14:22:00 <SergeyLukjanov> elmiko, outdated?
14:22:19 <egafford> Yeah, if there's uncertainty among the team, waiting does seem to make sense. And vgridnev, your point is good and well-taken; GetOne to GetMany is a real, substantive change.
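Note on the filtering discussion above: a minimal sketch of elmiko's suggestion (a parameter on the existing call that distinguishes exact match from glob match), run against vgridnev's "abs"/"abss" example. The function name, the `match` parameter and its values are hypothetical and are not part of Sahara's API; the templates are plain dicts standing in for real cluster template objects.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for stored cluster templates.
TEMPLATES = [{"name": "abs"}, {"name": "abss"}, {"name": "xyz"}]


def filter_by_name(templates, name, match="exact"):
    """Return the templates whose name matches `name` under a given mode.

    `match` is an assumed parameter, used here for illustration only:
      "exact"     - the original behaviour (perfect match only)
      "substring" - the behaviour tosky describes for the Compute search box
      "glob"      - shell-style patterns such as "abs*"
    """
    if match == "exact":
        def matches(candidate):
            return candidate == name
    elif match == "substring":
        def matches(candidate):
            return name in candidate
    elif match == "glob":
        def matches(candidate):
            return fnmatchcase(candidate, name)
    else:
        raise ValueError("unknown match mode: %r" % match)
    return [t for t in templates if matches(t["name"])]


# Defaulting to "exact" keeps existing callers' results unchanged.
exact = filter_by_name(TEMPLATES, "abs")
substring = filter_by_name(TEMPLATES, "abs", match="substring")
glob = filter_by_name(TEMPLATES, "abs*", match="glob")
```

With the default left at exact matching, a caller that never passes the new parameter sees the old one-result answer, while a UI can opt in to the two-result substring or glob answer.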
14:22:22 * SergeyLukjanov just returned home tonight and has jet lag
14:22:29 <elmiko> SergeyLukjanov: i think we are missing some details
14:22:35 <elmiko> (on the api-ref site)
14:23:01 <SergeyLukjanov> heh, that's sad
14:23:55 <SergeyLukjanov> btw I'm proposing to agree on preliminary list of design summit sessions on the next IRC meeting (Oct 15)
14:23:57 <elmiko> SergeyLukjanov: imo, we just need to start creating our api docs in the specs repo, like keystone and others have done.
14:24:30 <elmiko> in rst format, to make it easier for updating
14:24:35 <SergeyLukjanov> elmiko, and copying to api-ref or dropping api-ref?
14:25:15 <tosky> elmiko: I thought api doc was autogenerated from the APIs and comments, isn't it the case?
14:25:30 <elmiko> SergeyLukjanov: there is much discussion about what should happen to api-ref with the docs team, but one of the suggestions concerning documenting apis is for more projects to keep their api references up to date in their own repos.
14:25:35 <elmiko> tosky: not yet
14:26:04 <elmiko> there is a massive discussion about moving to an autogenerated format for the api-ref site, but for the projects themselves to keep descriptive long-form api references in their own repos
14:26:23 <elmiko> an example, https://github.com/openstack/keystone-specs/tree/master/api
14:26:35 <elmiko> in the v2 api spec, i've proposed we do the same
14:27:00 <tosky> wouldn't it make sense to add also the long-form API references as comments? I mean, it's the doxygen style
14:27:03 <elmiko> this will make it much easier for developers to keep the descriptive docs up-to-date when adding new things
14:27:21 <SergeyLukjanov> #topic Open discussion
14:27:38 <rickflare> hello
14:27:52 <rickflare> I just wanted to introduce myself
14:28:00 <elmiko> tosky: that does make sense, and if you look at some of the oslo libs that is what they do.
14:28:21 <tosky> elmiko: can't we go in that direction then?
14:28:26 <elmiko> tosky: imo, we should still move towards making our docs in the specs repo. it allows for a much more in-depth description of the api
14:29:14 <SergeyLukjanov> rickflare, o/
14:29:23 * elmiko waves at rickflare
14:29:39 <SergeyLukjanov> elmiko, we should evaluate the option of creating v2 api in specs repo
14:29:44 * rickflare waves back o/
14:29:59 <SergeyLukjanov> elmiko, I like the idea, but I'm afraid that we'll need to duplicate it in api-ref
14:30:33 <tosky> elmiko: why can't we have in-depth as API comments directly?
14:30:35 <elmiko> SergeyLukjanov: yea, but api-ref is a pain in the ass. we need to keep that up to date for now, but it will change in the future.
14:30:47 <elmiko> tosky, SergeyLukjanov, look at this review for more ideas https://review.openstack.org/#/c/214817/
14:30:52 <rickflare> So guys let me give you a little insight into my background etc. I am a system engineer/cloud engineer. I have been involved and using Linux for about 12 years now.
14:30:53 <elmiko> tosky: we can
14:31:01 <elmiko> tosky: it's a deeper question though
14:31:21 <tosky> hi rickflare
14:31:33 <elmiko> welcome rickflare
14:31:50 <rickflare> I am extremely passionate about Hadoop and other cloud/big data technologies. I am looking to contribute to Sahara as this is the best group of FOSS people I've worked with online.
14:31:54 <SergeyLukjanov> elmiko, oh, swagger, it's not bad
14:32:22 <rickflare> I am currently in the process of trying to implement saltstack and puppet into our current cluster deployments.
14:32:49 <egafford> o/ rickflare!
14:33:00 <crobertsrh> Glad to have ya here rickflare
14:33:41 <elmiko> tosky: look at lines 39-64 in that review
14:34:06 <rickflare> In my process I have been amazed at how awesome this project is and I can only see great things ahead. I hope to bring years of experience managing large hadoop clusters.
14:34:12 <elmiko> tosky: that stuff is way too in-depth to put in the comments. it will make the code more comments than code
14:34:32 <elmiko> rickflare: \o/
14:34:47 <tosky> elmiko: which is not a bad thing (more comment than code) - well, not for compiled languages at least
14:35:34 <elmiko> tosky: the point is, the proposed solution is to use swagger to auto-generate the api references with a supplemental document written in rst that will have more in-depth explanations of how to use the api
14:35:39 <rickflare> I feel the key to Sahara's success will hinge on ease of use for admins and developers etc. I am extremely passionate about this and I am more than willing to get my hands dirty and really help out.
14:36:22 <elmiko> rickflare: we could definitely use more admin/ops perspectives on how to improve sahara =)
14:37:11 <SergeyLukjanov> yeah and we're really looking for this kind of feedback :)
14:38:44 <rickflare> so one of the things that I think should be addressed is the process in which one makes images.
14:39:10 <egafford> rickflare: :D
14:39:13 <SergeyLukjanov> :)
14:39:25 <SergeyLukjanov> yeah, it's the area where we need to do smth
14:39:40 <crobertsrh> rickflare: Any chance you'll be in Tokyo for the summit?
14:39:49 <rickflare> It's a huge hurdle and for those who are not extremely knowledgeable about tox it will be a huge turn off for newbies
14:39:49 <crobertsrh> we will be talking about image generation there
14:39:58 <egafford> I hope to talk about that at summit coming up. If you have any ideas for specific implementations, I'd really love to know; I want to sound out all the options before us to get a sense of which to explore in the next 6 months.
14:40:23 <rickflare> crobertsrh I really wish I could be there. I am planning on attending the Austin summit for sure.
14:41:44 <rickflare> so the all in one tox command is fine but it needs to be much more modular in its approach. I am certain admins are constantly going to want to slipstream, per se, software into these images and it's going to be a bummer if they can not do that with ease.
14:42:34 <pino|work> rickflare: http://libguestfs.org/virt-customize.1.html there you go
14:42:48 <elmiko> we have been talking about allowing extra parameters to pass through sahara-image-elements into diskimage-builder
14:43:18 <rickflare> thanks pino!
14:44:07 <rickflare> I've also noticed, esp with devstack, that if one builds a large cluster, i.e. 20 nodes or so, MariaDB seems to get overwhelmed quickly.
14:45:04 <rickflare> I have instances in which clusters would not delete etc because of too many connections to MariaDB. I've also altered the allowed number of connections in my.cnf only to see the same effects.
14:45:16 <rickflare> I'm in the process of opening up a bug report about this
14:45:51 <elmiko> i think there has got to be some leeway for devstack too though. devstack is not meant for production work.
14:46:06 <elmiko> imo, 20+ node cluster is not ideal for devstack
14:46:19 <rickflare> The one thing I love is how fluid and natural the process in Horizon is in deploying a cluster. It just works and that's wonderful.
14:46:45 <rickflare> elmiko understood, that's why I brought it up here, because I was not sure whether it was of real concern or not.
14:47:17 <elmiko> building large clusters on devstack seems low prio to me
14:47:43 <elmiko> i just feel like you will spend more time fighting with devstack than getting work done ;)
14:48:23 <rickflare> I've also seen this with 5 or 7 node clusters as well though
14:48:32 <crobertsrh> Yes, 1 node devstack is hard enough sometimes :)
14:48:56 <rickflare> well I'm sorry I might have confused what I was saying
14:49:10 <rickflare> I was referring to clusters with 5 or 7 nodes in them
14:49:15 <elmiko> ok, 5-7 node, that we should look into
14:49:18 <crobertsrh> rickflare: Is the problem special for sahara clusters, or is it something you'd see with 7 or 8 unconnected vms too?
14:52:14 <rickflare> it's only with sahara clusters and per my inspections it seems to be stemming from heat and only during the delete command.
14:52:32 <egafford> rickflare: Huh, interesting.
14:52:57 <rickflare> So once horizon reports that a cluster has been deleted it seems that heat is still actively communicating heavily with MariaDB
14:53:17 <SergeyLukjanov> rickflare, heat generates high load on openstack services apis
14:53:21 <rickflare> and it overwhelms it and then it locks up.
14:53:46 <egafford> We do create a lot of resources, and those resources have DB footprints in Heat and other services; deletion is often quicker than spinup, and Heat may not have much in the way of rate-limiting on those calls.
14:53:47 <SergeyLukjanov> so, we've been seeing heat killing some parts of the cloud itself during cluster creation and removal ;)
14:54:03 <SergeyLukjanov> MQ is very affected too
14:54:14 <rickflare> SergeyLukjanov :) ok so it's not just me.
14:55:08 <rickflare> I think perhaps we can rate limit or load balance mariadb to deal with this? I am currently in the process of starting to test more of this using RDO versus devstack.
14:55:29 <SergeyLukjanov> we're running 50-200 node Sahara clusters regularly and seeing different issues
14:55:39 <egafford> rickflare: Sadly, a lot of the rate limiting would have to be in heat.
14:55:49 <egafford> From Sahara's side, it's really just a very few calls.
14:55:50 <SergeyLukjanov> yeah
14:56:14 <elmiko> also, rate limiting or load balancing to the db is kinda outside sahara purview. we can provide advice, and i think this exists in the operator manual, but that seems like the extent of our actions.
14:56:29 <SergeyLukjanov> actually, we're working with Heat folks to make a kind of limiting (batching) for the requests to guarantee that there will be no DDOS for the APIs
14:56:38 <elmiko> +1
14:56:41 <rickflare> that is awesome!
14:57:10 <egafford> SergeyLukjanov: Yeah, Sahara as attack vector in a public cloud would make us only so attractive to operators. :)
14:57:11 <SergeyLukjanov> rickflare and you still have an option to try the direct engine
14:57:43 <elmiko> egafford: but it's not a sahara only problem, you could do the same by creating some crazy heat requests
14:57:44 <rickflare> SergeyLukjanov I will reach out offline to learn more about that.
14:57:50 <egafford> elmiko: Oh, absolutely.
14:57:57 <egafford> It needs to be on Heat's side.
14:58:09 <elmiko> yea, i just fail to see how we can make a sahara change to affect this
14:58:20 <egafford> Totally agreed.
14:58:39 <egafford> A savvy attacker could make a much more cumbersome Heat template than we do.
14:59:05 <SergeyLukjanov> it could be any load to openstack api
14:59:13 <rickflare> I think we must continue to focus on the junior system admin that is going to have the responsibility of installing and maintaining this software. I think keeping that in mind will go a long way in ensuring we are flexible and easy to understand. Esp because Hadoop, Storm, etc are all monsters of their own.
14:59:15 <elmiko> yea, these are not sahara specific problems
14:59:26 <SergeyLukjanov> our MQ solution sucks IMO
14:59:36 <egafford> SergeyLukjanov: Heh.
14:59:37 <elmiko> rickflare: yea, but what you are describing is a more general openstack issue. not a sahara issue
14:59:56 * regXboi looks at clock
15:00:00 <SergeyLukjanov> #endmeeting
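Note on the rate-limiting discussion above: a minimal sketch of the general batching idea SergeyLukjanov mentions at 14:56:29, where requests go out in small bursts with a pause in between so the API (and the database behind it) never sees them all at once. Everything here is an assumption made for illustration: `delete_in_batches`, its parameters and the `delete_one` callback are hypothetical, and this is not how Heat or Sahara implement limiting.

```python
import time


def delete_in_batches(resource_ids, delete_one, batch_size=5, pause=1.0,
                      sleep=time.sleep):
    """Call `delete_one` for every id, at most `batch_size` per burst.

    A pause of `pause` seconds separates consecutive bursts. `sleep` is
    injectable so the pacing can be checked without actually waiting.
    Returns the number of bursts that were sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = 0
    for start in range(0, len(resource_ids), batch_size):
        if start:
            sleep(pause)  # wait before every burst except the first
        for resource_id in resource_ids[start:start + batch_size]:
            delete_one(resource_id)
        batches += 1
    return batches


# Usage: 12 fake resources, 5 per burst, recording calls instead of
# talking to a real service or really sleeping.
deleted = []
pauses = []
sent = delete_in_batches(list(range(12)), deleted.append,
                         batch_size=5, pause=0.5, sleep=pauses.append)
```

As egafford and elmiko point out in the log, a throttle like this would have to live on the side that issues the many calls, which in the reported case is Heat rather than Sahara.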