15:00:26 #startmeeting openstack-helm
15:00:27 Meeting started Tue Sep 10 15:00:26 2019 UTC and is due to finish in 60 minutes. The chair is portdirect. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:30 The meeting name has been set to 'openstack_helm'
15:00:36 lets give it a few mins for people to arrive
15:00:37 o/
15:00:48 \o
15:00:53 the agenda is here: https://etherpad.openstack.org/p/openstack-helm-meeting-2019-09-10
15:01:01 please add away :)
15:01:01 hello everyone
15:01:11 o/
15:01:29 o/
15:02:40 \o
15:04:20 o/
15:04:35 o/
15:05:32 ok looks like it will be a fairly quiet meeting today
15:05:37 #topic Monitoring/Alerting stack in the gates
15:05:46 o/
15:05:57 o/
15:06:10 as some users of osh are moving forward with serious deployments
15:06:31 im wondering if we could enhance the work we do in the gates
15:06:53 to make more active use of our Logging, monitoring and alerting stack
15:07:38 it would be great (i think) to start querying our monitoring services for the state of things we are attempting to validate
15:07:55 actually, yes
15:08:10 i've got a few WIP patches in flight that do this sort of thing
15:08:21 which would help close (some of) the gap we have in that our gate just does point in time checks on pod and service state
15:08:31 srwilkers: awesome!
15:08:36 can you point to them?
15:08:48 yeah, sec
15:09:07 I like the thought
15:09:33 this change added a chaoskube experimental check then queried prometheus for firing alerts: https://review.opendev.org/#/c/630299/28/tools/deployment/common/check-prom-alerts.sh
15:10:15 this change queried elasticsearch for pod logs using a bash utility i wrote a while ago: https://review.opendev.org/#/c/624435/14
15:10:42 granted, these are a bit stale. however, they'd still serve as a decent reference for what we could do in our jobs for validating operation with these tools
15:11:05 i think as a 1st step it would be great to get queries to nagios?
15:11:18 nagios doesn't have an API, so that's out
15:11:19 as this is the 'front door' we promote to operations etc
15:11:30 what about selenium?
15:11:37 best we can do is take snapshots with something like selenium
15:11:39 which we already do
15:11:49 we just don't do that in every job we run
15:12:16 rather than just take snapshots we should be able to query for element state - eg red/green
15:12:50 o/
15:13:42 also does our nagios not support ncpa? https://www.nagios.org/ncpa/help/2.0/api.html
15:14:54 we just use nagios core at the moment - we can see if we can include NCPA, but we don't at the moment
15:16:23 ok, i think we will need something in this space
15:17:03 the next question is - do we want this as part of every job we run?
15:17:19 if we dont have the ability to simply query nagios via an api, then old skool selenium scraping looks to be our only option?
15:17:27 srwilkers: at the least a periodical
15:17:42 i was thinking the periodic multinode jobs would be good candidates, beyond what we do already
15:17:48 yup
15:22:36 srwilkers: lets have a look at our options this week, and come back next week with what we come up with?
15:22:45 portdirect: works for me
15:23:11 ok - thats all i have for topics today
15:23:46 anything else we should be discussing/thinking about this week, before we move onto the plea for reviews?
15:25:48 ok - lets move on
15:25:53 #topic reviews
15:26:12 https://www.irccloud.com/pastebin/jFzEmB8L/
15:27:27 Could you guys please give this patch (https://review.opendev.org/#/c/643284/) final review? We have tried to incorporate all the comments that were addressed
15:27:39 :)
15:28:35 rihabb, I was also about to mention it, it really needs core reviewer's reviews
15:29:35 will do rihabb/cheng1
15:29:51 Thanks :)
15:29:56 if all the comments have been addressed, then i think we can finally put this one home
15:30:02 portdirect, thanks
15:30:48 ok - lets give everyone 30 mins back
15:31:03 #endmeeting