15:06:29 <anteaya> #startmeeting third-party
15:06:29 <openstack> Meeting started Mon Jul 11 15:06:29 2016 UTC and is due to finish in 60 minutes.  The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:06:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:06:33 <openstack> The meeting name has been set to 'third_party'
15:06:36 <anteaya> thanks lennyb
15:06:46 <anteaya> I was deep in figuring out storyboard apis
15:06:48 <anteaya> :)
15:07:16 <mmedvede> hi anteaya
15:07:23 <anteaya> how are you today, mmedvede?
15:07:30 <mmedvede> all good, thanks
15:07:36 <anteaya> oh I'm so glad
15:07:39 <anteaya> nice day here
15:07:53 <anteaya> had a deer on my lawn last night for the longest time
15:08:18 <anteaya> does anyone have anything they would like to discuss today?
15:08:19 <mmedvede> lots of deer and rabbit around where I am
15:08:24 <mmedvede> :)
15:08:26 <anteaya> mmedvede: nice
15:08:27 <anteaya> :)
15:08:31 <anteaya> I love watching them
15:08:52 <wznoinsk> hi there
15:09:00 <anteaya> hey wznoinsk
15:09:12 <anteaya> #info http://lists.openstack.org/pipermail/openstack-dev/2016-July/098992.html pin nodepool
15:09:19 <wznoinsk> hi anteaya
15:09:30 <anteaya> so have you all read asselin__'s post to dev about pinning nodepool?
15:09:37 <anteaya> wznoinsk: nice to see you
15:10:02 <asselin__> hi. i'm here but double-booked.
15:10:14 <wznoinsk> anteaya, good to be here ;-) even tho still on vacation
15:10:26 <anteaya> asselin__: thanks for the post to dev
15:10:51 <anteaya> wznoinsk: oh my, well glad you are here but I hope you enjoy vacation
15:11:05 <anteaya> does anyone have anything they would like to discuss today?
15:11:20 <mmedvede> I have a question regarding OpenStack infra monitoring
15:11:27 <anteaya> mmedvede: go ahead
15:11:30 <mmedvede> (trying to set something up myself)
15:11:46 <mmedvede> does the team use any automated notification system?
15:11:50 <anteaya> no
15:11:52 <mmedvede> or looked into having one?
15:12:03 <anteaya> humans are much faster than any automated notification system
15:12:04 <wznoinsk> anteaya, don't mind this little break to get back and set my mind into a technical mode
15:12:21 <anteaya> we have purposely not wanted any automatic notification system
15:12:32 <anteaya> none of the infra team has a pager, nor do we want one
15:12:35 <lennyb> mmedvede: what are you looking for?
15:12:40 <anteaya> wznoinsk: fair enough
15:13:03 <anteaya> some infra team members purposely chose to work in this environment as a way of leaving a pager behind
15:13:08 <wznoinsk> anteaya, this may be due to the amount of notifications teams may get before they get around to the 'really important' ones, isn't it?
15:13:30 <mmedvede> I ideally am looking for alerts being sent to irc, triggered by anomaly detection in metrics
15:13:45 <anteaya> wznoinsk: well there are a couple of reasons, one is a lifestyle choice, when our infra folks are online they are responding to things in channel as fast as possible
15:13:55 <anteaya> when they are offline, they really need to be offline
15:13:56 <mmedvede> anteaya: agree on no pager. I am talking about irc alerts
15:14:00 <wznoinsk> mmedvede, what monitoring system do you have as the source of these alerts?
15:14:06 <anteaya> mmedvede: what kind of alerts?
15:14:38 <mmedvede> wznoinsk: none yet. was considering graphite-beacon initially (simple thresholds trigger a script that sends irc message)
15:15:06 <mmedvede> anteaya: any sorts of alerts, e.g. zuul lost connection to OpenStack gerrit
15:15:45 <mmedvede> anteaya: but my question is mostly to find out if infra considered any tooling (as generally you chose good tools :) )
15:15:54 <anteaya> zuul would reconnect then would it not, if it lost a connection to gerrit?
15:16:11 <anteaya> ah, well our irc bots are in need of an overhaul
15:16:22 <anteaya> but no one has had the time & interest to do it
15:16:48 <anteaya> as our irc bots are fraught with issues I think due to threading which requires us to pin a bot to a server
15:16:58 <anteaya> then if the server goes down we lose the bot
15:17:36 <anteaya> so in terms of irc messaging tool I believe that infra does not feel we are using the latest bright and shiny
15:17:52 <wznoinsk> mmedvede, can an event be used (like running an arbitrary command) when an alert occurs in graphite?
15:18:45 <mmedvede> wznoinsk: yes, but you need an external tool (e.g. graphite-beacon)
15:19:15 <wznoinsk> on the other hand, the more puzzle pieces in the 'notification' system, the more prone it is to not working in some cases
15:19:51 <wznoinsk> my setup was nagios + nagstamon (windows app that sits on your desktop/systray and flashes/makes noise when nagios sees something bad)
15:19:52 <mmedvede> anteaya: ok, that is more or less what I thought
15:20:27 <anteaya> mmedvede: it is an interesting discussion, I'm curious as to your motivation for it, what are you looking to fix or address?
15:20:43 <wznoinsk> I'd recommend something as simple as that for notification + externally hosted script to check the monitoring system (nagios, graphite) itself ;-)
15:20:48 <mmedvede> wznoinsk: I found nagios so far hard to manage
15:21:26 <wznoinsk> mmedvede, I'm not promoting nagios in any form (it's just what I was used to and didn't want to learn new monitoring tool back then)
15:21:51 <mmedvede> anteaya: main motivation is to react quickly to things going wrong. we do not have downstream users to complain, so sometimes it takes awhile to catch things
15:22:03 <wznoinsk> bad wording, again: I like nagios, but the above was just an example of notification simplicity (to avoid problems with the notification system itself)
15:22:34 <anteaya> mmedvede: ah for your personal use, yes that makes sense
15:23:07 <anteaya> mmedvede: um, how many of your tools send you email alerts on failures?
15:23:11 <mmedvede> wznoinsk: I understand what you are saying :)
15:24:01 <wznoinsk> mmedvede, good ;-)
15:24:06 <mmedvede> anteaya: I did not configure emails on failure. I think after awhile you would start ignoring them, as noise
15:24:35 <anteaya> well we are talking about a system for you to know when your system is failing, are we not?
15:25:02 <anteaya> if the tools have the ability to email you, have you tried to figure out how to get that feature to work for you?
15:25:02 <lennyb> mmedvede: we send emails on 5 failures
15:25:26 <mmedvede> anteaya: I thought you meant CI failures
15:25:35 <mmedvede> like jenkins test failed
15:25:45 <anteaya> I don't want your emails
15:26:00 <anteaya> but I thought we were talking about solving a problem you have
15:26:07 <anteaya> so get your tools to email you
15:26:21 <anteaya> and you can configure it as you see fit, as lennyb suggests
15:26:32 <mmedvede> lennyb: that works to a degree, but you might wait 5 hours before you get email
15:27:16 <mmedvede> anteaya: getting a tool to email is not a problem. Main puzzle piece is what to use to decide when to send an alert
15:27:26 <mmedvede> there are a lot of options
15:28:12 <lennyb> mmedvede: correct. my assumption is that a single failure is a developer responsibility, if there are a number of failures, that probably means that the problem is my CI. I still have in my todo list a script to compare my CI failures to the others
15:28:18 <anteaya> mmedvede: ah, yes I do agree
15:29:51 <watanabe_isao> anteaya, do we have a way to know if the zuul of infra CI is down ASAP? I'm thinking maybe that is mmedvede's question?
15:30:08 <anteaya> mmedvede: is that your question?
15:30:27 <wznoinsk> mmedvede, does graphite have a configuration on how many attempts a check has to fail before it's marked as a WARNING/CRITICAL?
15:31:06 <mmedvede> watanabe_isao: I brought up our CI's zuul as example
15:31:30 <mmedvede> there are many more things we need to monitor, zuul was one of them that misbehaves frequently
15:31:58 <mmedvede> wznoinsk: graphite is not a monitoring tool, it is aggregation. So it does not have alerts
15:32:24 <mmedvede> wznoinsk: so someone wrote graphite-beacon to monitor graphite metrics
15:32:29 <watanabe_isao> mmedvede, I see. Well in my third party CI zuul hung before due to some issue, and I need to check it every day now, which is a nightmare.
15:32:30 <mmedvede> (there are many others)
15:32:37 <wznoinsk> mmedvede, is that the graphite you're talking about ? http://graphiteapp.org/ ?
15:32:52 <anteaya> mmedvede: well it sounds like there is no existing thing that does what you are looking for, my suggestion would be to put something in an etherpad that specifies _exactly_ what you want, since we seem to be getting lost guessing due to generalities
15:33:21 <mmedvede> wznoinsk: in the context of OpenStack infra - http://graphite.openstack.org/
15:33:31 <anteaya> then once you get a few people to read the etherpad who can then repeat back what you say you need such that they understand what you want, post to the infra list
15:33:45 <anteaya> since I will be honest, currently I don't know what it is you want
15:34:25 <mmedvede> anteaya: it is ok, I know what I want to try already. And this discussion confirmed I did not miss some obscure super-cool tool everyone is using
15:34:48 <anteaya> ah that was the point of this conversation
15:34:50 <anteaya> okay great
15:34:55 <anteaya> glad you got what you needed
15:35:11 <anteaya> and yeah, I don't think you are missing out on any of the latest hotness
15:35:35 <anteaya> does anyone have anything more for this discussion?
15:35:41 <watanabe_isao> anteaya, are we only talking about 3rd party tools here? May I ask something about devstack-gate?
15:35:55 <anteaya> watanabe_isao: you can ask
15:36:00 <anteaya> this is the third-party meeting
15:36:21 <mmedvede> we use devstack-gate (some of us)
15:36:23 <anteaya> so anything you ask will be viewed in the context of third party operators and their activities
15:36:27 <wznoinsk> mmedvede, ok - I was only reading the 'about' section of graphite, sometimes it's hard to tailor a tool for data collecting/metrics for the monitoring/notifications purposes... would graphite be your only source of data you want to alert/notify on? or would you want to monitor output of different kinds (i.e.: particular processes on a machine, run a completely custom check etc.) ?
15:36:53 <watanabe_isao> Has anyone considered a mid_test_hook?
15:37:07 <mmedvede> wznoinsk: right now it seems graphite (statsd metrics) is a good way to aggregate everything
15:37:20 <anteaya> what do you want a test hook to do in the middle of a test?
15:37:22 <watanabe_isao> Which is used to execute some commands before tempest
15:37:46 <mmedvede> watanabe_isao: do you mean after devstack, but before tempest?
15:38:02 <watanabe_isao> anteaya, to set up the environment, like add a node to ironic.
15:38:09 <watanabe_isao> mmedvede, yes
15:38:11 <wznoinsk> watanabe_isao, if you're thinking what I think you're thinking about you probably want to use local.sh that devstack itself runs at the very end
15:38:34 <mmedvede> watanabe_isao: we actually have ironic job that does something like that
15:38:40 <watanabe_isao> wznoinsk, I know it can also, e.g., add the node to ironic
15:39:04 <mmedvede> we use pre_test_hook to create config with baremetal node information
15:40:00 <watanabe_isao> wznoinsk, for example you want to add a node to ironic as late as you can. But the devstack install takes too long.
15:41:03 <wznoinsk> mmedvede, when I think about it... I agree, I could probably tailor nearly all of my custom scripts to output some form of metric to graphite... monitoring tools usually have graphing tools built-in tho... my main aim is to monitor and alert/notify, and second to see historical graphs, hence I use nagios
15:42:02 <wznoinsk> watanabe_isao, sorry, I can't help you here, don't use Ironic here yet
15:42:41 <mmedvede> watanabe_isao: we considered adding node later, but for POC job ended up de-facto adding it before devstack
15:42:47 <wznoinsk> but shouldn't a stacked node, given OS_URL and other links to the controller/keystone, register itself up (note: lack of Ironic knowledge here)
15:42:49 <watanabe_isao> wznoinsk, it's ok. well in my use case, I also want to run some local scripts before tempest
15:42:57 <mmedvede> watanabe_isao: did you consider a devstack plugin?
15:43:27 <watanabe_isao> mmedvede, no just some local scripts.
15:44:10 <watanabe_isao> mmedvede, but it is a good point I think.
15:45:43 <anteaya> anything more on this topic or the monitoring one?
15:45:52 <anteaya> we seemed to be doing both at the same time
15:46:21 <anteaya> does anyone have anything else they would like to discuss today?
15:46:41 <wznoinsk> anteaya, we've agreed with moshele from Mellanox to submit a barcelona talk abstract about SRIOV/NFV CI setups we have... we both use openstackci toolset... I'd like to bring it up with os infra guys, is tomorrow's 3rdparty WG meeting a good one for this?
15:46:42 <watanabe_isao> anteaya, one more on ci-watch, please.
15:47:09 <watanabe_isao> wznoinsk, you first.
15:47:32 <wznoinsk> watanabe_isao, go with yours, given the time
15:47:46 <watanabe_isao> wznoinsk, thanks
15:48:30 <watanabe_isao> anteaya, is any of us going to give ci-watch a filter?
15:48:44 <anteaya> what filter might they give?
15:49:37 <mjturek1> hey all mmedvede is on his way back. Lost power and is reconnecting
15:49:48 <anteaya> mjturek1: thank you
15:49:59 <mjturek1> np!
15:50:00 <watanabe_isao> anteaya, my CI in cinder is always at the bottom, and I don't want to see some CIs' results. I'm talking about a filter to stop showing some results.
15:50:23 <anteaya> ah filter out results you don't want
15:50:35 <watanabe_isao> anteaya, yes.
15:50:43 <anteaya> watanabe_isao: well the ci-watch code is using gerrit, so you could offer a patch
15:51:14 <anteaya> even if you aren't sure of the code, write a clear commit message saying what you want the patch to do, and hopefully kind reviewers can help you get the patch in shape
15:51:43 <watanabe_isao> anteaya, got it. Currently it is just an idea. will do it.
15:52:26 <anteaya> #link http://git.openstack.org/cgit/openstack/third-party-ci-tools/
15:52:33 <anteaya> I believe that is the repo
15:52:49 <anteaya> great, more on this or shall we move to wznoinsk's question?
15:53:06 <watanabe_isao> anteaya, thank you. yes. please go.
15:53:11 <anteaya> thanks
15:53:16 <anteaya> wznoinsk: you are up
15:54:18 <anteaya> wznoinsk: I believe your question was about making the infra team aware of a talk you have submitted?
15:55:40 <anteaya> well a few things, I personally don't endorse anyone else's talk proposal, lest I be inundated with requests
15:55:41 <wznoinsk> you, os infra, guys would know best what community is looking for about 3rdparty CI setups so I wanted to have a chat about our yet-general sriov/nfv CI talk in barcelona on tomorrow's 3rd party WG meeting
15:56:25 <anteaya> if you want to discuss the content of the proposal prior to proposing that is fine, you can ask questions in the infra channel
15:56:59 <anteaya> if you want the whole team to discuss something (both men and women, not just the guys) then you can add an agenda item to the infra team meeting: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
15:57:15 <wznoinsk> ok, I'll share more on this soon then
15:57:26 <anteaya> I have a lot of meetings already and am hard pressed to attend more
15:57:48 <anteaya> I can't speak for other infra team members but if there is someone you would like to invite you are welcome to ask them
15:58:18 <anteaya> thanks
15:58:22 <anteaya> more on this topic?
15:58:39 <wznoinsk> nope, thanks
15:58:42 <anteaya> thank you
15:58:49 <anteaya> anyone with anything else today?
15:58:55 <anteaya> about 1 minutes remaining
15:59:01 <anteaya> 1 minute
15:59:07 <watanabe_isao> anteaya,  sorry that I'm new to this meeting. Do we have another meeting, tomorrow?
15:59:20 <anteaya> watanabe_isao: thanks for attending, glad to have you
15:59:40 <anteaya> watanabe_isao: all openstack meetings are listed here: http://eavesdrop.openstack.org/
16:00:07 <anteaya> #link http://eavesdrop.openstack.org/#Third_Party_Meeting
16:00:18 <anteaya> #link http://eavesdrop.openstack.org/#Third_Party_Working_Group_Meeting
16:00:28 <anteaya> those would be the links you are looking for
16:00:31 <anteaya> and time to end
16:00:32 <watanabe_isao> anteaya, ohhh,
16:00:32 <watanabe_isao> Third Party Working Group Meeting
16:00:43 <anteaya> thank you everyone see you next week
16:00:46 <anteaya> #endmeeting