14:00:31 <raginbajin> #startmeeting Operators Ops Tools/Monitoring
14:00:31 <openstack> Meeting started Wed Jun 17 14:00:31 2015 UTC and is due to finish in 60 minutes.  The chair is raginbajin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 <openstack> The meeting name has been set to 'operators_ops_tools_monitoring'
14:01:08 <raginbajin> #topic Recap the Summit Meetup
14:01:15 <raginbajin> Hi everyone.
14:01:48 <GonZo2000> Hi
14:02:54 <odyssey4me> o/
14:03:27 <raginbajin> Just waiting for some others to join.
14:03:28 <pasquier-s> o/
14:03:59 <GonZo2000> ok
14:05:15 <ddutta> Hi
14:07:12 <raginbajin> Ok I think we will get started.
14:07:22 <GonZo2000> yep
14:07:44 <raginbajin> The first thing is to recap our YVR meeting.
14:07:49 <raginbajin> You can find the link here
14:07:53 <raginbajin> #link https://etherpad.openstack.org/p/YVR-ops-tools
14:08:15 <raginbajin> Our agenda for today can be found here
14:08:20 <raginbajin> #link https://etherpad.openstack.org/p/monitoring-ops-tools-meeting-agenda
14:08:45 <raginbajin> So during our meeting we discussed the following items.
14:08:56 <raginbajin> * ToolChains that people are using
14:09:13 <raginbajin> * Monitoring Tools - who's using what and, briefly, why
14:09:22 <raginbajin> * health checks
14:09:26 <raginbajin> * dealing with logs
14:09:32 <raginbajin> * gathering metrics
14:09:50 <raginbajin> * Inventory data and using CMDB
14:10:12 <raginbajin> * Message Bus - thoughts (as always rabbitmq being a heartache for everyone)
14:10:23 <raginbajin> * Monitoring the Network
14:10:28 <raginbajin> * Storage Monitoring
14:10:53 <raginbajin> * We broke down operational areas usually covered by tools.
14:11:20 <raginbajin> A lot of great comments started to go back and forth and the question was raised about how do we share this information.
14:11:39 <raginbajin> We identified some over-arching pain points.
14:11:48 <ddutta> yeah very good meeting
14:11:58 <GonZo2000> yep :) it was
14:12:15 <raginbajin> So that's the basics of what we talked about.
14:12:28 * odyssey4me is sorry that he missed it, but the notes were great for getting the gist of it
14:12:40 <raginbajin> Specifically we came away with the following things
14:12:52 <raginbajin> How do we share this information
14:13:11 <raginbajin> Some solutions were to use the github repo called osops to share this information
14:13:23 <raginbajin> and to start up this discussion on a bi-weekly basis
14:13:50 <raginbajin> So that was the YVR meeting overview or meeting minutes from the last time we met.
14:14:32 <raginbajin> Any questions or comments before we move on to some of the actions that we created from that meeting?
14:14:45 <ddutta> Did we send out email to ask for contributions? And also license checks
14:15:10 <ddutta> https://github.com/orgs/osops/dashboard
14:15:24 <raginbajin> I don't think we did.. There were a few emails that I saw floating around about people mentioning it, but nothing directly related to it.
14:15:35 <ddutta> doesn't have a license yet https://github.com/osops/example-configs
14:16:07 <ddutta> Else everything seems to have apache 2.0
14:16:35 <raginbajin> Should we create an action item for sending out a notice about using the osops github group? and maybe requirements for contributing. (ie. specify a license inside your directory)
14:16:49 <ddutta> sure!
14:17:01 <pasquier-s> +1
14:17:15 <raginbajin> #action Create an email to the Operators List-serv discussing the OSOPS repo and what it should be used for
14:17:40 <odyssey4me> Excuse me if this has been covered in previous discussion, but is there a particular reason why a github organisation has been set up for this instead of using stackforge?
14:17:59 <raginbajin> #action Create a set of requirements that can be posted in email to the operators list as well in the github repo mentioning the requirements to contribute, specifically license requirements.
14:18:05 <ddutta> no idea ... but yes stack forge would be the best home eventually
14:18:27 <raginbajin> Yeah. I think this came out of an ops meet-up and was just stood up to get going.
14:18:36 <pasquier-s> odyssey4me, I think the reason was it would be less "scary" for ops to contribute there (no gerrit process)
14:19:45 <raginbajin> Should we add that as an action to evaluate the pros and cons of moving osops to stackforge? I think we need to bring in the owners and maybe get some community feedback based on those pros/cons
14:19:54 <ddutta> +1
14:20:10 <GonZo2000> +
14:20:14 <GonZo2000> +1
14:20:21 <odyssey4me> OK, ultimately it's a decision already taken - I was just curious. When we get to discussing the logstash work we may have to make an exception there - I'll cover the reasons, but don't want to derail the current agenda.
14:20:21 <pasquier-s> sounds right to me
14:20:49 <raginbajin> #action Create a list of Pros/Cons about moving the Github repo OSOPS to stackforge. Once list is created circulate it with the operators community and poll what they think.
14:20:58 <GonZo2000> I understood that there are people using Heka
14:21:05 <GonZo2000> we intend to use Heka
14:21:16 <odyssey4me> raginbajin I don't think that an eval necessarily needs to happen, but I do think that the group should have a clear idea of why it's chosen to be there - separate from everything else openstack.
14:21:27 <pasquier-s> GonZo2000, Heka user here :-)
14:21:28 <raginbajin> makes sense.
14:21:34 <raginbajin> #topic OSOPS github repo
14:22:02 <raginbajin> I think finding out the history of why it was done and see what everyone thinks is a good idea
14:22:19 <raginbajin> I'm not sure what it was, it just popped up at the PHL meetup and kinda took off from there it seems
14:22:27 <ddutta> maybe we could ask the owner of the repo
14:22:37 <GonZo2000> if we are going to the big tent
14:22:38 <raginbajin> ha. That's a good point. They would probably know best.
14:22:46 <GonZo2000> maybe we should move to gerrit workflow
14:23:13 <odyssey4me> yep, it's important to have a record of it so that when you're asked, you have an answer... but also when you re-evaluate whether it's working for you, you can check each reason and validate whether the reasons remain valid or not
14:23:54 <raginbajin> ok cool. Well we can modify that action whenever based on what we find.
14:24:28 <raginbajin> I think that kinda ends that then for now
14:24:54 <raginbajin> #topic Discussion around what people want to achieve from the meetings/working group
14:25:26 <raginbajin> The first was
14:25:31 <raginbajin> #info Place to bring up problems/challenges to monitoring/ops tools
14:26:47 <raginbajin> I think we should all try to solicit people to contribute here during these meetings.
14:26:57 <raginbajin> on the listserv
14:27:25 <raginbajin> For example, if they are looking for tools or for something that needs to be done, recommend that they bring it up here, and have others start helping us do the same.
14:28:07 <raginbajin> The next thing was
14:28:10 <raginbajin> #info Discuss/plan specs (anyone have any spec ideas?)
14:28:36 <raginbajin> Does anyone have any spec ideas? We should make this a topic for every meeting to see if anyone has any ideas and to help flesh them out if they have anything
14:28:38 <ddutta> I think it would be a great idea to have this group start to help other operators out in many ways including communicating to the PTLs about the monitoring needs etc
14:29:15 <ddutta> also collate all the configs and create shared corpus-es
14:29:23 <GonZo2000> Agree ddutta
14:29:27 <raginbajin> What's the best way to gather that info to share?
14:29:28 <ddutta> collate best practices for monitoring
14:29:32 <ddutta> Wiki
14:29:34 <GonZo2000> Wiki ?
14:29:43 <odyssey4me> the scope of this group is rather large, perhaps the scope should be split into headings/sections each with agreed definitions of what they mean and what the mission of that area/section is?
14:29:55 <ddutta> +1
14:30:02 <GonZo2000> +1
14:30:25 <raginbajin> Totally works for me.  How do we start to do that?
14:30:27 <pasquier-s> +1 for using the Wiki as the common place where we share documents
14:30:54 <ddutta> I can take a quick pass at the wiki sometime this week
14:30:57 <raginbajin> Ok
14:31:02 <odyssey4me> there is an issue that will quickly arise, though, which is that wiki articles and repo code go stale really quickly... for this to work it needs a fair amount of commitment from enough people
14:31:23 <ddutta> yes, that's why we need to be at it...
14:31:42 <raginbajin> #action Break down the Wiki into headings/sections and define what each section means and the mission of that specific area
14:32:22 <GonZo2000> so do we create a proposal on a weiki page for the structure ?
14:32:32 <raginbajin> I think that's a good start
14:32:33 <GonZo2000> wiki*
14:32:34 <odyssey4me> otherwise the group is possibly better off not creating content for the most part, but rather keeping track of development objectives and focusing on influencing the various development teams to achieve the goal of improving the ability to consistently log, and to properly monitor the very large family of OpenStack components
14:33:22 <raginbajin> odyssey4me: I think a mix of both is actually better
14:33:46 <raginbajin> Maybe the wiki page could just have very basic high level development objects that are being tracked
14:33:57 <raginbajin> objectives*
14:34:08 <raginbajin> geez that was bad typing..
14:34:25 <odyssey4me> :)
14:34:38 <raginbajin> I totally see your point, and until this group is so active that constant updates are being done, we shouldn't try to take on a massive documentation type of project at all
14:34:56 <ddutta> It could just be a collection of links and a structure
14:34:59 <raginbajin> but I think you're right that the group's topic and role is really large, so breaking it down and just maybe providing enough information about what we are tracking
14:35:12 <ddutta> then we can see how it evolves
14:35:20 <raginbajin> +1
14:35:21 <GonZo2000> agree raginbajin very high level at the beginning
14:35:49 <GonZo2000> +1
14:35:54 <raginbajin> ddutta: So you going to take a swing at it?
14:36:07 <ddutta> yes I volunteer
14:36:15 <raginbajin> #action Review the first wiki page structure attempt
14:36:25 <odyssey4me> +1 keep it simple, try to take on a smaller set of things that are achievable and can build momentum... as the group grows, so can the volume of activities
14:36:32 <GonZo2000> ddutta i will try to help you as well
14:37:00 <ddutta> GonZo2000: thx
14:37:04 <raginbajin> Great... The next thing.
14:37:05 <raginbajin> #info Continuously look at and curate ops-tools repo
14:37:16 <raginbajin> I think we kinda talked about this already and have some work to do for this
14:37:57 <raginbajin> moving on then.
14:38:01 <raginbajin> #info "Stories from the front line" Perhaps if people want to talk about problems they've had with OpenStack in production, where the tooling they used led to resolving them quickly (success stories? are these useful?)
14:38:37 <raginbajin> I'm not sure if we are going to have a lot just yet.
14:38:50 <raginbajin> Maybe as we get going a little more.
14:38:51 <GonZo2000> Maybe we can ask on listserv for a "show/tell" type presentation ?
14:39:22 <odyssey4me> is the intent to have people do a quick paragraph in the meeting, or to collect some sort of formal story in the repo or something?
14:39:44 <raginbajin> I *think* it was to just have a virtual version of what we had at the Summit.
14:39:48 <odyssey4me> a lot of people share that kind of stuff in blogs already
14:40:07 <raginbajin> True, but not everyone blogs and/or can blog.
14:40:23 <GonZo2000> maybe like the RDO does a bi-weekly roundup email with links to stories ?
14:40:59 <raginbajin> I'm not sure if this is really worthwhile at this point.
14:41:05 <odyssey4me> I do think it's a great idea to encourage sharing a recent experience, or something recently done.... many people don't have the time or confidence to do blog posts, but would be happy to share a short story via IRC.
14:41:15 <raginbajin> exactly.
14:41:29 <raginbajin> So maybe we just ask at the end of the meeting, does anyone have a success story to share
14:41:40 <GonZo2000> lets crawl before we walk then :)
14:42:05 <odyssey4me> perhaps just set some guidelines to help tell the story - 3-5 key questions and limit each person's time to no more than x minutes in the meeting?
14:42:06 <raginbajin> #action Update meeting agenda to ask at the end of the bi-weekly irc meetings if anyone has a success story to share.
14:42:14 <GonZo2000> encourage an irc short story ?
14:42:56 <raginbajin> I think having some structure like the 3-5 questions is good.
14:43:05 <raginbajin> That way someone doesn't take 20 minutes telling a story
14:43:21 <raginbajin> maybe 2-3 minutes max, and answer these 3-5 questions and share that experience with us
14:43:54 <GonZo2000> +1
14:44:27 <raginbajin> Ok we'll figure it out as we go.
14:44:52 <raginbajin> the last topic we have on our agenda..
14:44:56 <raginbajin> #topic Rackspace contributions/effort around logstash configuration
14:45:18 <raginbajin> #info How can we leverage and formalise this? (opstools repo?)
14:45:42 <GonZo2000> i think the opstools repo is good for starters
14:45:58 <raginbajin> Odyssey4me: I think this was something you posted about on the listserv to talk about
14:46:23 <odyssey4me> raginbajin yep, if you don't mind I'd like to give a little background before we decide what goes where
14:46:38 <raginbajin> No problem.. Just reading from the agenda..
14:46:38 <odyssey4me> (or how we do what)
14:47:42 <odyssey4me> ok, so there have been various efforts from various directions to develop solutions that are related to OpenStack and the ELK stack.
14:48:20 <odyssey4me> OpenStack-Infra has an ELK stack deployed and has some logstash filters. They have no special dashboard or anything.
14:48:39 <odyssey4me> The osops repo has some submissions covering Kibana dashboards and logstash filters.
14:49:17 <odyssey4me> I'll note that while Heka is another toolset that appears to be gaining some momentum, I'm not aware of any shared 'filters' for Heka just yet.
14:49:35 <GonZo2000> true none until now
14:49:51 <GonZo2000> i hope that my team can contribute with some :)
14:50:31 <odyssey4me> Oh, back to the ELK stack: Rackspace Private Cloud (my employer) is happy to contribute and participate in the development of Logstash filters that are useful to operators for production, and Kibana dashboards that are also useful for operating an OpenStack cloud.
14:51:15 <odyssey4me> OpenStack-Infra is also keen for their filters and dashboards to be part of a community effort and they're looking for assistance with upgrading and evolving their ELK stack
14:51:44 <odyssey4me> GonZo2000 sounds great! one of our team is very keen on heka and may possibly help where he can
14:51:54 <raginbajin> This is exciting to hear. My company has been working on this, but just hasn't had enough time or resources to really produce valuable filters. Most of them have been very basic and don't break out messages just like the Openstack-infra ones.
14:52:06 <GonZo2000> perfect odyssey4me :)
14:52:47 <odyssey4me> So, of course we need to figure out how we do this in a way that any project (such as openstack-infra and RPC) can very easily pull in the work and make use of the good stuff
14:52:56 <ddutta> We (Cisco) might also be able to help on Kibana dashboards
14:52:59 <pasquier-s> GonZo2000, odyssey4me, some Heka stuff for OpenStack is here => https://github.com/stackforge/fuel-plugin-lma-collector/tree/master/deployment_scripts/puppet/modules/lma_collector/files/plugins/decoders
14:53:28 <ddutta> I think it would be awesome to have a repo with Kibana dashboards
14:53:56 <raginbajin> ddutta: agreed Kibana 4 dashboards would be great.
14:54:14 <GonZo2000> yep :) +1 Kibana dashboards
14:54:35 <odyssey4me> I'm inclined to separate each ELK component (or similar) into its own development space/repo. I'd like to try and tackle the logstash filters personally to begin with because we've done pretty much most of the hard work already, it's really about just figuring out how to package it.
14:55:10 <odyssey4me> If someone else wants to drive Kibana dashboards, I'll be happy to contribute but take a bit more of a back seat.
14:55:35 <pasquier-s> except that Kibana dashboards depend on what/how your filters parse the logs
14:55:48 <GonZo2000> i agree with the separation, because heka can use kibana and elasticsearch
14:56:07 <odyssey4me> Considering that both openstack-infra and RPC intend to (very possibly) directly consume the logstash filters as-is, I think they'll need to be in their own stackforge repository
14:56:10 <pasquier-s> We (Mirantis) have some Kibana dashboards but I doubt they would be useful without our Heka decoders
14:56:15 <odyssey4me> perhaps even in the openstack namespace
14:57:05 <raginbajin> Ok - how should we proceed? or what are our next steps?
14:57:05 <odyssey4me> pasquier-s correct, but if we can agree between Heka and logstash on the terms to use and what info is useful to extract, then the kibana dashboards will work both ways
14:57:10 <raginbajin> we have just a few minutes left in our meeting.
14:58:05 <odyssey4me> I propose that a project be set up within the openstack namespace, something like openstack-logstash-filters and that we contribute our body of work into there.
14:58:07 <pasquier-s> odyssey4me, yes, worth trying at least ;)
14:58:22 <raginbajin> in the stackforge you mean?
14:58:25 <raginbajin> or in the osops?
14:58:49 <odyssey4me> raginbajin openstack-infra has actually suggested putting it straight into the big tent
14:59:23 <raginbajin> ok.. Well then if you guys do that, then we can just contribute as we see fit
14:59:47 <odyssey4me> I'm happy to do this and get things going if that's ok with everyone. I'd just like to know who'd be keen to get involved in setting things up and curating from then on.
15:00:14 <raginbajin> Fine by me
15:00:35 <raginbajin> We have to end the meeting now.
15:00:46 <odyssey4me> I guess we're out of time, and perhaps we should sleep on this and discuss it next week. I'll start making some arrangements meanwhile.
15:00:59 <raginbajin> #action More discussion on the Logstash-filters and how this group can help.
15:01:02 <pasquier-s> bye
15:01:03 <raginbajin> #endmeeting