15:00:37 <mattmceuen> #startmeeting openstack-helm
15:00:38 <openstack> Meeting started Tue Oct 17 15:00:37 2017 UTC and is due to finish in 60 minutes.  The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:41 <openstack> The meeting name has been set to 'openstack_helm'
15:00:49 <mattmceuen> #topic rollcall
15:00:57 <portdirect> o/
15:01:03 <jayahn> o/
15:01:09 <mateuszb> o/
15:01:17 <alraddarla> o/
15:01:18 <mattmceuen> Here's the agenda, all -- I'll give you a couple mins to add topics: https://etherpad.openstack.org/p/openstack-helm-meeting-2017-10-17
15:03:13 <v1k0d3n> o/
15:03:17 <xek> o/
15:03:55 <mattmceuen> #topic update to official project status
15:04:25 <mattmceuen> Just an update -- we submitted OSH for TC governance last week
15:04:27 <mattmceuen> https://etherpad.openstack.org/p/openstack-helm-meeting-2017-10-17
15:04:32 <srwilkers> o/
15:04:35 <jayahn> yeap.
15:04:53 <mattmceuen> Review is ongoing, but it's looking very positive.  Seven +1 roll-call votes so far. :)
15:05:42 <mattmceuen> That's really all I have to say about that.  Hopefully good progress next week.  Any other thoughts on the topic?
15:06:19 <mattmceuen> #topic log-based alerting
15:06:21 <portdirect> nothing other than congrats mattmceuen
15:06:30 <portdirect> :D
15:06:32 <mattmceuen> Thanks @portdirect :)  back at you man
15:06:42 <mateuszb> I've been investigating different methods for sending alerts based on logs
15:06:54 <mateuszb> That was discussed during the last AT&T and Intel meeting
15:07:08 <v1k0d3n> yeah, really happy to see this mattmceuen. great seeing you leading the charge.
15:07:16 <mateuszb> Before I proceed with an implementation I'd like to know your opinion ;)
15:07:31 <mateuszb> As for now I'm aware of 2 approaches:
15:07:36 <mateuszb> 1. Using ElastAlert
15:07:55 <mateuszb> 2. Using Prometheus plugin for fluentd (https://github.com/kazegusuri/fluent-plugin-prometheus )
15:08:24 <mateuszb> The idea behind both of them is the same: search for specified patterns (for example "Can't connect to MySQL") in container logs
15:08:35 <mateuszb> and if the pattern occurs - fire an alert
15:09:01 <mateuszb> I'd be rather in favor of using ElastAlert as it's easier to configure and more intuitive for me
15:09:17 <mateuszb> It periodically queries ElasticSearch to retrieve a given pattern
15:09:39 <mateuszb> It has highly modular approach - each rule is a separate file
15:09:59 <jayahn> I will look at ElastAlert. Just a question here. What is our purpose here? there can probably be many methods to do log-based alerting. do we want to say "there is a reference implementation on log-based alert"?
15:09:59 <mateuszb> You can configure timeouts, thresholds and so on out of the box. It'd require, however, an additional chart to be implemented
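[Editor's note: for context, a minimal sketch of what an ElastAlert rule file could look like for the "Can't connect to MySQL" example mentioned above. One rule per file, matching the modular approach mateuszb describes; the index pattern, thresholds, and alert target are hypothetical placeholders, not from the discussion.]

```yaml
# Illustrative ElastAlert rule: fire when the pattern appears
# num_events times within the timeframe. Index name, counts, and
# email address are placeholder assumptions.
name: mysql-connection-errors
type: frequency
index: logstash-*
num_events: 5
timeframe:
  minutes: 10
filter:
- query:
    query_string:
      query: 'log: "Can''t connect to MySQL"'
alert:
- "email"
email:
- "ops@example.com"
```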
15:10:22 <mattmceuen> How does that compare to the Prometheus plugin, mateuszb?
15:10:58 <mattmceuen> Does it also support a modular approach, or more monolithic?
15:11:05 <mateuszb> jayahn - that's a good question :) I'd like to know what the requirements from AT&T side are
15:11:21 <portdirect> mateuszb: that's not too important here :)
15:11:42 <portdirect> what does the community want - hopefully we align internally with that
15:11:46 <jayahn> important stuff would be "what alerts we want to have". this can be a shared list. how to implement it will highly depend on what tools you use.
15:11:49 <mattmceuen> Agree.
15:12:00 <mattmceuen> I would like srwilkers as our LMA SME to weigh in as well
15:12:05 <portdirect> ++
15:12:06 <jayahn> so, IMHO, defining the basic set of alerts would be valuable.
15:12:35 <srwilkers> i'd need to look at the differences between elastalert and the prometheus plugin, but i'd honestly prefer to handle it via prometheus if the end result is the same
15:13:10 <portdirect> srwilkers: how is the prometheus ps progressing?
15:13:17 <mateuszb> I agree. I can imagine that for the beginning we can start with alerts which inform about DB/rabbitmq connection issues in specified pods
15:13:23 <portdirect> its pretty close to being out of WIP is it not?
15:13:26 <srwilkers> portdirect: its out of WIP and ready for some initial reviews
15:13:36 <portdirect> w00t
15:13:49 <srwilkers> im currently working on getting jobs set up for the controller manager and scheduler, but thats roughly another ~10m worth of work
15:14:03 <srwilkers> once that's done, i'll be happy with where it stands
15:14:33 <mattmceuen> would it make sense to take this as a homework assignment:  srwilkers and jayahn to get acquainted w/ ElastAlert, and mateuszb maybe you could do a comparative example of a few rules, implemented by both tools, that you could share?
15:14:43 <jayahn> if we know what alerts we want to create, then we can use that list to evaluate how "a specific alert implementation tool" can satisfy our requirements.  that said, initiating an effort to define a basic set of alerts would also be valuable.
15:14:44 <mateuszb> The result would be the same (or at least as for now I don't know any limitations)
15:14:53 <portdirect> that sounds like an awesome idea mattmceuen
15:15:23 <srwilkers> if you call this homework, im going to be a typical CS student and wait until the last minute ;)
15:15:33 <portdirect> and presumably there is nothing precluding both from being implemented?
15:15:34 <mattmceuen> Sorry, I meant to say "ice cream"
15:15:40 <portdirect> if they serve diff use cases
15:16:05 <mateuszb> Ok, so I'll prepare 2 patchsets (one for ElastAlert and one for the fluentd-prometheus plugin)
15:16:07 <srwilkers> portdirect: nope, there's not.  like jayahn said, it'd just be a matter of determining if we want to choose a reference or not
15:16:25 <jayahn> we should say "pizza". typical CS student can do anything for pizza.
15:16:29 <mattmceuen> @jayahn that is another good homework assignment :) do we have any list in progress around specific alerts needed?
15:16:37 <mattmceuen> ha!
15:16:40 <srwilkers> mateuszb: in regards to the fluentd plugin, it'd be nice to make sure it works with the fluentbit+fluentd set up that's in review currently
15:17:03 <jayahn> i can write up some initial docs regarding alert needed
15:17:06 <mateuszb> srwilkers: I'll do that on top of fluentbit+fluentd PS
15:17:28 <mateuszb> srwilkers: in fact, that is a requirement for fluentd-prometheus plugin to work
15:17:45 <mattmceuen> jayahn mateuszb awesome, thank you!
15:18:04 <mateuszb> because it requires us to use only one fluentd instance per cluster, which I think will be true once the fluentd+fluentbit PS is merged
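[Editor's note: for context, a sketch of the fluent-plugin-prometheus approach being compared — counting log lines that match a pattern and exposing the counter for Prometheus to scrape. The tag pattern, record key, and metric name are placeholder assumptions, not from the patchset under discussion.]

```
# Illustrative fluentd config: first narrow events to the pattern
# of interest with the built-in grep filter, then count what remains
# with fluent-plugin-prometheus.
<filter kube.**>
  @type grep
  <regexp>
    key log
    pattern /Can't connect to MySQL/
  </regexp>
</filter>

<filter kube.**>
  @type prometheus
  <metric>
    name fluentd_mysql_connection_errors_total
    type counter
    desc Count of MySQL connection errors seen in container logs
  </metric>
</filter>
```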
15:18:33 <mateuszb> jayahn that would be great
15:19:06 <mattmceuen> #action mateuszb will prep comparison of ElastAlert and fluentd-prometheus
15:19:34 <mattmceuen> #action jayahn will take a stab at initial alerting need documentation
15:19:41 <jayahn> okay
15:19:44 * portdirect notes a mech eng student only works for alcohol
15:19:52 <mateuszb> :)
15:20:05 <mattmceuen> Good discussion guys, anything else on this topic?
* jayahn a korean student regardless of major only works for alcohol
15:20:21 <mateuszb> that's all from my side
15:20:23 <srwilkers> ahahaha
15:20:43 <mattmceuen> @jayahn lol
15:20:54 <mattmceuen> #topic cw's RFC on storage class name
15:21:06 <mattmceuen> I'm not sure we have cw?
15:21:14 <srwilkers> doesnt seem so
15:21:30 <srwilkers> this was just to rename the general storage class right?
15:21:44 <mattmceuen> Well let's get him some opinions offline.  He has an RFC out for whether we should use "general" or some other name.
15:21:48 <mattmceuen> yup
15:21:56 <portdirect> I have some feedback in there: https://review.openstack.org/#/c/511975/
15:22:14 <mattmceuen> thx portdirect, copy/paste was failing me ;-)
15:22:51 <mattmceuen> I'll leave it at that for now -- if you'd like to see the discussion or weigh in, please visit the review!
15:23:24 <mattmceuen> #topic Official project docs move (timing and process)
15:23:52 <mattmceuen> After we become an official project, I think we'll want to move our docs here, correct? https://docs.openstack.org/openstack-helm/latest/
15:23:58 <mattmceuen> (away from readthedocs)
15:24:31 <portdirect> yup
15:24:47 <portdirect> I think this should be very simple to do
15:24:49 <mattmceuen> Will we deprecate the readthedocs at that point, or leave it up?  Maintain it?
15:25:23 <portdirect> I know lamt has insight into what needs done, but i think a theme change and a single ps to infra
15:25:32 <portdirect> I vote for retiring the readthedocs
15:25:48 <portdirect> though it would be possible to have both
15:25:55 <portdirect> and they would stay in sync
15:25:56 <jayahn> i also vote for retiring the readthedocs once we move to the official doc
15:25:57 <srwilkers> i'd rather retire them than maintain them
15:26:02 <portdirect> but seems silly to have them in two places
15:26:03 <mattmceuen> +1
15:27:02 <mattmceuen> #action mattmceuen to discuss readthedocs->docs.openstack.org with lamt
15:27:36 <mattmceuen> Alrighty:  reviews time
15:27:41 <mattmceuen> #topic reviews needed
15:27:48 <jayahn> once we move to docs.openstack.org, i can ask the official translation team to take the openstack-helm project.
15:27:59 <mattmceuen> great point jay
15:28:07 <jayahn> i talked with srwilkers on the review.
15:28:07 <mattmceuen> awesome
15:28:38 <srwilkers> yep :)
15:28:39 <portdirect> jayahn: long as you promise not to write nasty things about us in korean ;)
15:28:58 <mattmceuen> save any nasty comments for english please :-D
15:29:19 <jayahn> nothing to add. personally, i would like to have both prometheus and fluent working in our env. asap. :) .. let's get it done.
15:29:26 <jayahn> ha ha..
15:29:39 <mattmceuen> yes
15:29:47 <srwilkers> jayahn: +1
15:29:50 <jayahn> you have google translation. so i will never write nasty stuff in korean.
15:30:10 <mattmceuen> Here's the review for fluent: https://review.openstack.org/#/c/507023/
15:30:21 <jayahn> however, we know how to avoid google translation, but still understand each other. tweaked version of korean writing. lol
15:30:29 <srwilkers> ;)
15:30:37 <srwilkers> in regards to that review
15:31:09 <srwilkers> im happy with it after discussing it with jayahn.  takeaway was that we can leave kafka as an opportunity for a future addition to OSH
15:31:17 <portdirect> +2
15:31:17 <srwilkers> potentially something for a new contributor to try their hand at
15:31:22 <jayahn> yes.
15:31:23 <mattmceuen> excellent
15:31:31 <portdirect> that was also the one with some fun stuff in the dep check was it not?
15:32:12 <portdirect> ah yes: https://review.openstack.org/#/c/507023/9/fluent-logging/templates/deployment-fluentd.yaml
15:32:37 <portdirect> we should probably make a blueprint for extending the dep checking model to account for conditionals
15:32:47 <portdirect> as this is starting to turn up more and more
15:32:51 <portdirect> esp in neutron
15:32:58 <mattmceuen> very good point
15:32:59 * portdirect why is it always neutron?
15:33:00 <srwilkers> portdirect: i agree
15:33:05 <jayahn> +1
15:33:21 <srwilkers> there are some things that are best handled via overrides, but conditionals may not be the best place for them
15:33:30 <srwilkers> took me chewing on that a bit to feel that way
15:33:47 <srwilkers> well, not all conditionals ill say
15:33:51 <portdirect> I agree - I've had a few chats with alanmeadows on this
15:34:12 <portdirect> and will try and summarise what we were mulling over and throw it out there
* portdirect notes that he should bring up something else in any other business
15:34:52 <mattmceuen> #action portdirect to create a blueprint for extending the OSH dependency checking model to account for conditionals
15:35:06 <mattmceuen> before other business
15:35:20 <mattmceuen> any other outstanding reviews we need more eyeballs on?
15:36:00 <mattmceuen> (also thanks pete :) )
15:36:01 <portdirect> https://review.openstack.org/#/c/457754/
15:36:02 <srwilkers> nope, im good
15:36:15 <portdirect> https://review.openstack.org/#/c/507628/
15:36:33 <portdirect> ^^ both of these ceph ones are right up on my list of things we want
15:36:46 <portdirect> i think the disk targeting has a bit to go
15:36:51 <portdirect> but is getting close
15:37:10 <mattmceuen> That is great to hear
15:37:12 <srwilkers> nice.  ill take a gander and see what's going on there
15:37:36 <mattmceuen> Team:  good work on reviews this week, I think we're going in a good direction there.
15:37:45 <mattmceuen> #topic other business
15:37:50 <mattmceuen> take it away portdirect
15:38:14 <portdirect> so - (and I'm gonna make a blueprint for this)
15:38:33 <portdirect> we have a need to be able to run multiple configs of some daemonsets
15:38:46 <portdirect> ie nova-compute and ovs-agent
15:39:19 <mattmceuen> is "multiple" on the order of per-node, or a constant?
15:39:46 <portdirect> we can currently achieve this through some footwork using the manifests: true/false, and multiple deployments of a chart, but I'd like a cleaner solution
15:40:15 <mattmceuen> give me big-o notation, portdirect!
15:40:20 <portdirect> multiple for us could mean a lot of things unfortunately - from groups of hardware to canary nodes
15:41:02 <portdirect> so I'm thinking of extending the to_oslo_config logic for daemonsets, and the daemonset manifests themselves to allow this
15:41:19 <portdirect> so an example config would be as follows:
15:43:06 * srwilkers waits on the edge of his seat
15:43:23 <alraddarla> he probably got distracted while he was typing
15:43:49 <portdirect> https://www.irccloud.com/pastebin/Fi1hpGbz/
15:44:07 <portdirect> so most compute nodes would be debug=true
15:44:24 <portdirect> ones labeled with `arbitary_label` would be debug=false
15:44:55 <portdirect> and if its hostname was `hostname` it would be debug=false regardless of what labels etc were applied
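[Editor's note: the pastebin contents are not preserved in this log, so the following is only a guess at the shape of the values layout portdirect describes — chart-wide defaults, a label-scoped override, and a host-scoped override that wins over both. All key names below (`overrides`, `nova_compute`, etc.) are hypothetical.]

```yaml
# Illustrative per-group daemonset config, matching the behavior
# described above: most compute nodes get debug=true, labeled nodes
# get debug=false, and the named host gets debug=false regardless.
conf:
  nova:
    DEFAULT:
      debug: true            # default for most compute nodes
  overrides:
    nova_compute:
      labels:
      - label:
          key: arbitary_label
          values: ["enabled"]
        conf:
          nova:
            DEFAULT:
              debug: false   # nodes carrying the label
      hosts:
      - name: hostname
        conf:
          nova:
            DEFAULT:
              debug: false   # host match wins over labels
```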
15:45:25 <portdirect> thoughts before I do the initial write-up/proposal?
15:46:08 <srwilkers> hmm
15:46:32 <mattmceuen> interesting - am I correct in assuming we would continue not to use this for e.g. disk targeting?  Or do you see it being used for that?
15:46:57 <portdirect> disk targeting could potentially benefit from this
15:47:12 <portdirect> though thats solving a slightly diff problem
15:47:52 <mattmceuen> my only concern is that "node_groups" and "nodes" are only meaningful if you already know what they mean
15:48:11 <portdirect> this would need to be documented
15:48:57 <mattmceuen> Would it make sense to put them under something (or call them something) more descriptive, like (but better than) per_thing_configs.node_group
15:49:42 <portdirect> good point
15:49:46 <portdirect> so like this?
15:50:08 <portdirect> https://www.irccloud.com/pastebin/FetsDzFf/
15:51:10 <mattmceuen> that is a good point, but not the point I was trying to make :)
15:51:18 <srwilkers> i suppose i'd need to see what's written up and get the line of thinking behind it
15:51:18 <portdirect> lol
15:51:51 <portdirect> will do - just putting it out there, for us internally this will be required
15:52:02 <mattmceuen> I meant literally an id like "per_thing_conf" that made clear what a "node_group" or a "nodes" is -- what's the context
15:52:09 <portdirect> but would be great to have a solution that works well upstream
15:52:18 <mattmceuen> I really like the flexibility
15:52:24 <portdirect> gotcha
15:52:45 <jayahn> i will follow up on portdirect's upcoming write-up on this.
15:53:03 <mattmceuen> +1 same, looking forward to it
15:53:05 <jayahn> and. with my thinking hat.
15:53:37 <mattmceuen> Aright guys - any other topics?
15:53:42 <portdirect> I'm good
15:53:49 <srwilkers> oh, unrelated
15:53:56 <srwilkers> but TC voting is happening currently
15:53:59 <srwilkers> dont forget to vote!
15:54:06 <mattmceuen> thanks for the reminder!
15:54:47 <mattmceuen> Ok team - good meeting.  Have a great day, see you in the chat room!
15:54:48 <portdirect> theres this dude YamSaple
15:55:05 <portdirect> whatever you do dont vote for him.
15:55:13 <v1k0d3n> haha
15:55:27 <mattmceuen> we don't want to steal time away from him distracting you in our chat room portdirect -- excellent point
15:55:27 <portdirect> ;)
15:55:48 <mattmceuen> #endmeeting