15:00:26 <portdirect> #startmeeting openstack-helm
15:00:27 <openstack> Meeting started Tue Sep 10 15:00:26 2019 UTC and is due to finish in 60 minutes. The chair is portdirect. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:30 <openstack> The meeting name has been set to 'openstack_helm'
15:00:36 <portdirect> lets give it a few mins for people to arrive
15:00:37 <roman_g> o/
15:00:48 <lamt> \o
15:00:53 <portdirect> the agenda is here: https://etherpad.openstack.org/p/openstack-helm-meeting-2019-09-10
15:01:01 <portdirect> please add away :)
15:01:01 <stevthedev> hello everyone
15:01:11 <megheisler> o/
15:01:29 <mattmceuen> o/
15:02:40 <mbuil> \o
15:04:20 <srwilkers> o/
15:04:35 <rihabb> o/
15:05:32 <portdirect> ok looks like it will be a fairly quiet meeting today
15:05:37 <portdirect> #topic Monitoring/Alerting stack in the gates
15:05:46 <cheng1> o/
15:05:57 <gagehugo> o/
15:06:10 <portdirect> as some users of osh are moving forward with serious deployments
15:06:31 <portdirect> im wondering if we could enhance the work we do in the gates
15:06:53 <portdirect> to make more active use of our Logging, monitoring and alerting stack
15:07:38 <portdirect> it would be great (i think) to start querying our monitoring services for the state of things we are attempting to validate
15:07:55 <srwilkers> actually, yes
15:08:10 <srwilkers> i've got a few WIP patches in flight that do this sort of thing
15:08:21 <portdirect> which would help close (some of) the gap we have in that our gate just does point in time checks on pod and service state
15:08:31 <portdirect> srwilkers: awesome!
15:08:36 <portdirect> can you point to them?
15:08:48 <srwilkers> yeah, sec
15:09:07 <stevthedev> I like the thought
15:09:33 <srwilkers> this change added a chaoskube experimental check then queried prometheus for firing alerts: https://review.opendev.org/#/c/630299/28/tools/deployment/common/check-prom-alerts.sh
15:10:15 <srwilkers> this change queried elasticsearch for pod logs using a bash utility i wrote awhile ago: https://review.opendev.org/#/c/624435/14
15:10:42 <srwilkers> granted, these are a bit stale. however, they'd still serve as a decent reference for what we could do in our jobs for validating operation with these tools
15:11:05 <portdirect> i think as a 1st step it would be great to get queries to nagios?
15:11:18 <srwilkers> nagios doesn't have an API, so that's out
15:11:19 <portdirect> as this is the 'front door' we promote to operations etc
15:11:30 <portdirect> what about selenium?
15:11:37 <srwilkers> best we can do is take snapshots with something like selenium
15:11:39 <srwilkers> which we already do
15:11:49 <srwilkers> we just don't do that in every job we run
15:12:16 <portdirect> rather than just take snapshots we should be able to query for element state - eg red/green
15:12:50 <evrardjp> o/
15:13:42 <portdirect> also does our nagios not support ncpa? https://www.nagios.org/ncpa/help/2.0/api.html
15:14:54 <srwilkers> we just use nagios core at the moment - we can see if we can include NCPA, but we don't at the moment
15:16:23 <portdirect> ok, i think we will need something in this space
15:17:03 <srwilkers> the next question is - do we want this as part of every job we run?
15:17:19 <portdirect> if we dont have the ability to simply query nagios via an api, then old skool selenium scraping looks to be our only option?
15:17:27 <portdirect> srwilkers: at the least a periodical
15:17:42 <srwilkers> i was thinking the periodic multinode jobs would be good candidates, beyond what we do already
15:17:48 <portdirect> yup
15:22:36 <portdirect> srwilkers: lets have a look at our options this week, and come back next week with what we come up with?
15:22:45 <srwilkers> portdirect: works for me
15:23:11 <portdirect> ok - thats all i have for topics today
15:23:46 <portdirect> anything else we should be discussing/thinking about this week, before we move onto the plea for reviews?
15:25:48 <portdirect> ok - lets move on
15:25:53 <portdirect> #topic reviews
15:26:12 <portdirect> https://www.irccloud.com/pastebin/jFzEmB8L/
15:27:27 <rihabb> Could you guys please give this patch (https://review.opendev.org/#/c/643284/) final review? We have tried to incorporate all the comments that were addressed
15:27:39 <rihabb> :)
15:28:35 <cheng1> rihabb, I was also about to mention it, it really needs core reviewer's reviews
15:29:35 <portdirect> will do rihabb/cheng1
15:29:51 <rihabb> Thanks :)
15:29:56 <portdirect> if all the comments have been addressed, then i think we can finally put this one home
15:30:02 <cheng1> portdirect, thanks
15:30:48 <portdirect> ok - lets give everyone 30 mins back
15:31:03 <portdirect> #endmeeting
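
The gate check discussed under the monitoring topic (query Prometheus for firing alerts instead of relying only on point-in-time pod checks) could look roughly like the sketch below. It is not the content of check-prom-alerts.sh from the linked review, only an illustration. It assumes the standard Prometheus HTTP endpoint /api/v1/alerts; the function names, the alert names and the sample payload are hypothetical.

```python
import json
import urllib.request


def fetch_alerts(base_url, timeout=10):
    """Fetch the alert list from a Prometheus server.

    base_url is an assumption, e.g. the in-cluster service URL of the
    Prometheus deployed by the gate job. Not called in the demo below.
    """
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=timeout) as resp:
        return json.load(resp)


def firing_alerts(payload):
    """Return the sorted names of alerts whose state is 'firing'."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected response status: {payload.get('status')!r}")
    alerts = payload.get("data", {}).get("alerts", [])
    return sorted(
        alert.get("labels", {}).get("alertname", "<unnamed>")
        for alert in alerts
        if alert.get("state") == "firing"
    )


# Hypothetical payload in the shape returned by /api/v1/alerts.
SAMPLE = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "ExamplePodRestarting"}, "state": "firing"},
            {"labels": {"alertname": "ExampleDiskPressure"}, "state": "pending"},
        ]
    },
}

names = firing_alerts(SAMPLE)
print(names)  # ['ExamplePodRestarting']
```

A periodic multinode job could call fetch_alerts against its Prometheus service and fail the run when firing_alerts returns a non-empty list. Whether pending alerts should also fail the job is a policy choice the meeting did not settle.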