14:00:31 <raginbajin> #startmeeting Operators Ops Tools/Monitoring
14:00:31 <openstack> Meeting started Wed Jun 17 14:00:31 2015 UTC and is due to finish in 60 minutes. The chair is raginbajin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 <openstack> The meeting name has been set to 'operators_ops_tools_monitoring'
14:01:08 <raginbajin> #topic Recap the Summit Meetup
14:01:15 <raginbajin> Hi everyone.
14:01:48 <GonZo2000> Hi
14:02:54 <odyssey4me> o/
14:03:27 <raginbajin> Just waiting for some others to join.
14:03:28 <pasquier-s> o/
14:03:59 <GonZo2000> ok
14:05:15 <ddutta> Hi
14:07:12 <raginbajin> Ok I think we will get started.
14:07:22 <GonZo2000> yep
14:07:44 <raginbajin> The first thing is to recap our YVR meeting.
14:07:49 <raginbajin> You can find the link here
14:07:53 <raginbajin> #link https://etherpad.openstack.org/p/YVR-ops-tools
14:08:15 <raginbajin> our agenda for today can be find here
14:08:20 <raginbajin> #link https://etherpad.openstack.org/p/monitoring-ops-tools-meeting-agenda
14:08:45 <raginbajin> So during our meeting we discussed the following items.
14:08:56 <raginbajin> * ToolChains that people are using
14:09:13 <raginbajin> * Monitoring Tools - who's using what and a briefly why
14:09:22 <raginbajin> * health checks
14:09:26 <raginbajin> * dealing with logs
14:09:32 <raginbajin> * gathering metrics
14:09:50 <raginbajin> * Inventory data and using CMDB
14:10:12 <raginbajin> * Message Bus - thoughts (as always rabbitmq being a heartache for everyone)
14:10:23 <raginbajin> * Monitoring the Network
14:10:28 <raginbajin> * Storage Monitoring
14:10:53 <raginbajin> * We broke down operatational areas usually covered by tools.
14:11:20 <raginbajin> A lot of great comments started to go back and forth and the question was raised about how do we share this information.
14:11:39 <raginbajin> We identified an over-arching pain points.
14:11:48 <ddutta> yeah very good meeting
14:11:58 <GonZo2000> yep :) it was
14:12:15 <raginbajin> So that's the basics of what we talked about.
14:12:28 * odyssey4me is sorry that he missed it, but the notes were great for getting the gist of it
14:12:40 <raginbajin> Specifically we came away with the following things
14:12:52 <raginbajin> How do we share this information
14:13:11 <raginbajin> Some solutions was using the github repo called osops to share this information
14:13:23 <raginbajin> and to start up this discussion on a bi-weekly basis
14:13:50 <raginbajin> So that was the YVR meeting overview or meeting minutes from the last time we met.
14:14:32 <raginbajin> Any questions or comments before we move on to some of the actions that we created from that meeting?
14:14:45 <ddutta> Did we send out email to ask for contributions? And also license checks
14:15:10 <ddutta> https://github.com/orgs/osops/dashboard
14:15:24 <raginbajin> I don't think we did.. There was a few emails that I saw floating around about people mentioning it, but nothing directly related to it.
14:15:35 <ddutta> doesn't have a license yet https://github.com/osops/example-configs
14:16:07 <ddutta> Else everything seems to have apache 2.0
14:16:35 <raginbajin> Should we create an action item for sending out a notice about using the osops github group? and maybe requirements for contributing. (ie. specify a license inside your directory)
14:16:49 <ddutta> sure!
14:17:01 <pasquier-s> +1
14:17:15 <raginbajin> #action Create an email to the Operators List-serv discussing the OSOPS repo and what it should be used for
14:17:40 <odyssey4me> Excuse me if this has been covered in previous discussion, but is there a particular reason why a github organisation has been setup for this instead of using stackforge?
14:17:59 <raginbajin> #action Create a set of requirements that can be posted in email to the operators list as well in the github repo mentioning the requirements to contribute, specifically license requirements.
14:18:05 <ddutta> no idea ... but yes stack forge would be the best home eventually
14:18:27 <raginbajin> Yeah. I think this came out of a ops meet-up and was just stood-up to get going.
14:18:36 <pasquier-s> odyssey4me, I think the reason was it would less "scary" for ops to contribute there (no gerrit process)
14:19:45 <raginbajin> Should we add that as an action to evaluate the pro's con's of moving osops to stackforge? I think we need to bring in the owners and maybe get some community feedback based on those pros/cons
14:19:54 <ddutta> +1
14:20:10 <GonZo2000> +
14:20:14 <GonZo2000> +1
14:20:21 <odyssey4me> OK, ultimately it's a decision already taken - I was just curious. When we get to discussing the logstash work we may have to make an exception there - I'll cover the reasons, but don't want to derail the current agenda.
14:20:21 <pasquier-s> sounds right to me
14:20:49 <raginbajin> #action Create a list of Pros/Cons about moving the Github repo OSOPS to stackforge. Once list is created circulate it with the operators community and poll what they think.
14:20:58 <GonZo2000> I understood that there are people using Heka
14:21:05 <GonZo2000> we intend to use Heka
14:21:16 <odyssey4me> raginbajin I don't think that an eval necessarily needs to happen, but I do think that the group should have a clear idea of why it's chosen to be there - seperate from everything else openstack.
14:21:27 <pasquier-s> GonZo2000, Heka user here :-)
14:21:28 <raginbajin> make sense.
14:21:34 <raginbajin> #topic OSOPS github repo
14:22:02 <raginbajin> I think finding out the history of why it was done and see what everyone thinks is a good idea
14:22:19 <raginbajin> I'm not sure what it was, it just popped up at the PHL meetup and kinda took off from there it seems
14:22:27 <ddutta> maybe we could ask the owner of the repo
14:22:37 <GonZo2000> if we are going to the big tent
14:22:38 <raginbajin> ha. That's a good point. They would probably know best.
14:22:46 <GonZo2000> maybe we should move to gerrit workflow
14:23:13 <odyssey4me> yep, it's important to have a record of it so that when you're asked, you have an answer... but also when you re-evaluate whether it's working for you, you can check each reason and validate whether the reasons remain valid or not
14:23:54 <raginbajin> ok cool. Well we can modify that action whenever based on what we find.
14:24:28 <raginbajin> I think that kinda ends that then for now
14:24:54 <raginbajin> #topic Discussion around what people want to achieve from the meetings/working group
14:25:26 <raginbajin> The first was
14:25:31 <raginbajin> #info Place to bring up problems/challenges to monitoring/ops tools
14:26:47 <raginbajin> I think we should all try to solicit people to contribute here during these meetings.
14:26:57 <raginbajin> on the listserv
14:27:25 <raginbajin> example they are looking for tools or looking for something that needs to be done, recommend that they discuss that they bring it up here, and have others start helping us do the same.
14:28:07 <raginbajin> The next thing was
14:28:10 <raginbajin> #info Discuss/plan specs (anyone have any spec ideas?)
14:28:36 <raginbajin> Does anyone have any spec ideas? We should make this a topic for every meeting to see if anyone has any ideas and to help flush them out if they have anything
14:28:38 <ddutta> I think it would be a great idea to have this group start to help other operators out in many ways including communicating to the PTLs about the monitoring needs etc
14:29:15 <ddutta> also collate all the configs and create shared corpus-es
14:29:23 <GonZo2000> Agree ddutta
14:29:27 <raginbajin> What's the best way to gather that info to share?
14:29:28 <ddutta> collate best practices for monitoring
14:29:32 <ddutta> Wiki
14:29:34 <GonZo2000> Wiki ?
14:29:43 <odyssey4me> the scope of this group is rather large, perhaps the scope should be split into headings/sections each with agreed definitions of what they mean and what the mission of that area/section is?
14:29:55 <ddutta> +1
14:30:02 <GonZo2000> +1
14:30:25 <raginbajin> Totally works for me. How do we start to do that?
14:30:27 <pasquier-s> +1 for using the Wiki as the common place where we share documents
14:30:54 <ddutta> I can take a quick pass at the wiki sometime this week
14:30:57 <raginbajin> Ok
14:31:02 <odyssey4me> there is an issue that will quickly arise, though, which is that wiki articles and repo code goes stale really quickly... for this to work it needs a fair amount of commitment from enough people
14:31:23 <ddutta> yes, thats why we need to be at it...
14:31:42 <raginbajin> #action Break down the Wiki into headings/sections and define what each section means and the mission of that specific area
14:32:22 <GonZo2000> so do we create a proposal on a weiki page for the structure ?
14:32:32 <raginbajin> I think thats a good start
14:32:33 <GonZo2000> wiki*
14:32:34 <odyssey4me> otherwise the group is possibly better off not creating content for the most part, but rather keeping track of development objectives and focusing on influencing the various development teams to achieve the goal of improving the ability to consistantly log, and to properly monitor the very large family of OpenStack components
14:33:22 <raginbajin> odyssey4me: I think a mix of both is actually better
14:33:46 <raginbajin> Maybe the wiki page that just have very basic high level development objects that are being tracked
14:33:57 <raginbajin> objectives*
14:34:08 <raginbajin> geez that was bad typing..
14:34:25 <odyssey4me> :)
14:34:38 <raginbajin> I totally see your point, and until this group is so active that constant updates are being done, we shouldn't try to take on a massive documentation type of project at all
14:34:56 <ddutta> It could just be a collection of links and a structure
14:34:59 <raginbajin> but I think your right that the group's topic and role is really large, so breaking it down and just maybe providing enough informaiton about what we are tracking
14:35:12 <ddutta> then we can see how it evolves
14:35:20 <raginbajin> +1
14:35:21 <GonZo2000> agree raginbajin very high level at the beggining
14:35:49 <GonZo2000> +1
14:35:54 <raginbajin> ddutta: So you going to take a swing at it?
14:36:07 <ddutta> yes I volunteer
14:36:15 <raginbajin> #action Review the first wiki page structure attempt
14:36:25 <odyssey4me> +1 keep it simple, try to take on a smaller set of things that are achievable and can build momentum... as the group grows, so can the volume of activities
14:36:32 <GonZo2000> ddutta i will try to help you as well
14:37:00 <ddutta> GonZo2000: thx
14:37:04 <raginbajin> Great... The next thing.
14:37:05 <raginbajin> #info Continuously look at and curate ops-tools repo
14:37:16 <raginbajin> I think we kinda talked about this already and have some work to do for this
14:37:57 <raginbajin> moving on then.
14:38:01 <raginbajin> #info "Stories from the front line" Perhaps if people want to talk about problems they've had with Openstack in production, where the tooling they used lead to resolving them quickly (success stories? are these useful?)
14:38:37 <raginbajin> I'm not sure if we are going to have a lot just yet.
14:38:50 <raginbajin> Maybe as we get going a little more.
14:38:51 <GonZo2000> Maybe we can ask on listserv for a "show/tell" type presentation ?
14:39:22 <odyssey4me> is the intent to have people do a quick paragraph in the meeting, or to collect some sort of formal story in the repo or something?
14:39:44 <raginbajin> I *think* it was to just have a virtual version of what we had at the Summit.
14:39:48 <odyssey4me> a lot of people share that kind of stuff in blogs already
14:40:07 <raginbajin> True, but not everyone blogs and/or can blog.
14:40:23 <GonZo2000> maybe like the RDO does a bi-weekly roundup email with links to stories ?
14:40:59 <raginbajin> I'm not sure if this is really worth while at this point.
14:41:05 <odyssey4me> I do think it's a great idea to encourage sharing a recent experience, or something recently done.... many people don't have the time or confidence to do blog posts, but would be happy to share a short story via IRC.
14:41:15 <raginbajin> exactly.
14:41:29 <raginbajin> So maybe we just ask at the end of the meeting, does anyone have a success story to share
14:41:40 <GonZo2000> lets crawl before we walk then :)
14:42:05 <odyssey4me> perhaps just set some guidelines to help tell the story - 3-5 key questions and limit each person's time to no more than x minutes in the meeting?
14:42:06 <raginbajin> #action Update meeting agenda to ask at the end of the bi-weekly irc meetings if anyone has a success story to share.
14:42:14 <GonZo2000> encourage a irc short story ?
14:42:56 <raginbajin> I think having some structure like the 3-5 questions is good.
14:43:05 <raginbajin> That way someone doesn't take 20 minutes telling a story
14:43:21 <raginbajin> maybe 2-3 minutes max, and answer these 3-5 questions and share that expierence with us
14:43:54 <GonZo2000> +1
14:44:27 <raginbajin> Ok we'll figure it out as we go.
14:44:52 <raginbajin> the last topic we have on our agenda..
14:44:56 <raginbajin> #topic Rackspace contributions/effort around logstash configuration
14:45:18 <raginbajin> #info How can we leverage and formalise this? (opstools repo?)
14:45:42 <GonZo2000> i think the opstools repo for starters is a good
14:45:58 <raginbajin> Odyssey4me: I think this was something you posted about on the listserv to talk about
14:46:23 <odyssey4me> raginbajin yep, if you don't mind I'd like to give a little background before we decide what goes where
14:46:38 <raginbajin> No problem.. Just reading from the agenda..
14:46:38 <odyssey4me> (or how we do what)
14:47:42 <odyssey4me> ok, so there have been various efforts from various directions to develop solutions that are related to OpenStack and the ELK stack.
14:48:20 <odyssey4me> OpenStack-Infra has an ELK stack deployed and has some logstash filters. They have no special dashboard or anything.
14:48:39 <odyssey4me> The osops repo has some submissions covering Kibana dashboards and logstash filters.
14:49:17 <odyssey4me> I'll note that while Heka is another toolset that appears to be gaining some momentum, I'm not aware of any shared 'filters' for Heka just yet.
14:49:35 <GonZo2000> true none until now
14:49:51 <GonZo2000> i hope that my team can contribute with some :)
14:50:31 <odyssey4me> Oh, back to the ELK stack Rackspace Private Cloud (my employer) is happy to contribute and participate in the development of Logstash filters that are useful to operators for production, and Kibana dashboards that are also useful for operating an OpenStack cloud.
14:51:15 <odyssey4me> OpenStack-Infra is also keen for their filters and dashboards to be part of a community effort and they're looking for assistance with upgrading and evolving their ELK stack
14:51:44 <odyssey4me> GonZo2000 sounds great! one of our team is very keen on heka and may possibly help where he can
14:51:54 <raginbajin> This is exciting to hear. My company has been working on the, but just haven't had enough time or resources to really produce valuable filters. Most of them have been very basic and doesn't break out messages just liek the Openstack-infra ones.
14:52:06 <GonZo2000> perfect odyssey4me :)
14:52:47 <odyssey4me> So, of course we need to figure out how we do this in a way that any project (such as openstack-infra and RPC) can very easily pull in the work and make use of the good stuff
14:52:56 <ddutta> We (Cisco) might also be able to help on Kibana dashboards
14:52:59 <pasquier-s> GonZo2000, odyssey4me, some Heka stuff for OpenStack is here => https://github.com/stackforge/fuel-plugin-lma-collector/tree/master/deployment_scripts/puppet/modules/lma_collector/files/plugins/decoders
14:53:28 <ddutta> I think it would be awesome to have a repo with Kibana dashboards
14:53:56 <raginbajin> ddutta: agreed Kibana 4 dashboards would be great.
14:54:14 <GonZo2000> yep :) +1 Kibana dashboards
14:54:35 <odyssey4me> I'm inclined to separate each ELK component (or similar) into its own development space/repo. I'd like to try and tackle the logstash filters personally to begin with because we've done pretty much most of the hard work already, it's really about just figuring out how to package it.
14:55:10 <odyssey4me> If someone else wants to drive Kibana dashboards, I'll be happy to contribute but take a bit more of a back seat.
14:55:35 <pasquier-s> except that Kibana dashboards depend on what/how your filters parse the logs
14:55:48 <GonZo2000> i agree with the separation, because heka can use kibana and elasticsearch
14:56:07 <odyssey4me> Considering that both openstack-infra and RPC intend to (very possibly) directly consume the logstash filters as-is, I think they'll need to be in their own stackforge repository
14:56:10 <pasquier-s> We (Mirantis) have some Kibana dashboards but I doubt they would be useful without our Heka decoders
14:56:15 <odyssey4me> perhaps even in the openstack namespace
14:57:05 <raginbajin> Ok - how should we proceed? or what are our next steps?
14:57:05 <odyssey4me> pasquier-s correct, but if we can agree between Heka and logstash on the terms to use and what info is useful to extract, then the kibana dashboards will work both ways
14:57:10 <raginbajin> we have a just a few minutes left in our meeting.
14:58:05 <odyssey4me> I propose that a project be setup within th eopenstack namespace, something like openstack-logstash-filters and that we contribute our body of work into there.
14:58:07 <pasquier-s> odyssey4me, yes, worth trying at least ;)
14:58:22 <raginbajin> in the stackforge you mean?
14:58:25 <raginbajin> or in the osops?
14:58:49 <odyssey4me> raginbajin openstack-infra has actually suggested putting it straight into the big tent
14:59:23 <raginbajin> ok.. Well then if you guys do that, then we can just contribute as we see fit
14:59:47 <odyssey4me> I'm happy to do this and get things going if that's ok with everyone. I'd just like to know who'd be keen to get involved in setting things up and curating from then on.
15:00:14 <raginbajin> Fine by me
15:00:35 <raginbajin> We have to end the meeting now.
15:00:46 <odyssey4me> I guess we're out of time, and perhaps we should sleep on this and discuss it next week. I'll start making some arrangements meanwhile.
15:00:59 <raginbajin> #action More discussion on the Logstash-filters and how this group can help.
15:01:02 <pasquier-s> bye
15:01:03 <raginbajin> #endmeeting