15:00:47 <srwilkers_> #startmeeting openstack-helm
15:00:49 <openstack> Meeting started Tue Jun 20 15:00:47 2017 UTC and is due to finish in 60 minutes. The chair is srwilkers_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:52 <openstack> The meeting name has been set to 'openstack_helm'
15:01:01 <srwilkers_> happy tuesday everyone
15:01:15 <portdirect> \o/
15:01:16 <alraddarla> o/
15:01:27 <srwilkers_> we've got a few items on the agenda for today. can be found here: https://etherpad.openstack.org/p/openstack-helm-meeting-2017-06-20
15:01:48 <dulek> o/
15:02:07 <lrensing> o/
15:02:29 <srwilkers_> let's give it a minute or so to let others filter in
15:02:39 <srwilkers_> then we can tackle these topics, and then open up for discussion afterwards
15:03:21 <lamt> o/
15:03:36 * gagehugo lurks
15:04:09 <v1k0d3n> o/
15:04:28 <srwilkers_> alright, let's go ahead and start tackling these then
15:04:29 <srwilkers_> #topic Health Checks
15:04:45 <srwilkers_> portdirect, i think you raised this topic last week or this one -- can't recall
15:04:47 <srwilkers_> you've got the floor
15:06:02 <portdirect> yeah - I've got a few queries about these - as I'm not sure we want to be killing/restarting pods when comms with infra is broken - seems like a way to create a runaway train. but was wondering if dulek had thought of something I'd missed?
15:06:03 * dulek is here to answer any questions about the patches.
15:06:07 <portdirect> :)
15:06:37 <dulek> Right, that's something I was wondering about too.
15:07:06 <dulek> My intention with the patches is also to have some indication for the admin that a pod is unhealthy.
15:07:53 <dulek> Also I can imagine a pod being rescheduled to another kubelet, which resolves the problem if it was caused by physical network failure.
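For context on the patches under discussion: they add Kubernetes liveness probes to the chart templates, which is what would restart a pod on a failed check. A probe of that general shape might look roughly like the sketch below; the script path and all timing values are illustrative assumptions, not taken from the actual patches.

```yaml
# Hypothetical liveness probe for an agent container. The script is assumed
# to wrap a check such as "nova service-list" for the local host; its path
# and the timing values below are illustrative only.
livenessProbe:
  exec:
    command:
      - /tmp/health-probe.sh
  initialDelaySeconds: 120
  periodSeconds: 60      # lengthen on infra prone to transient failures
  timeoutSeconds: 30
  failureThreshold: 3    # require several consecutive failures before a restart
```

A longer `periodSeconds` and higher `failureThreshold` are the standard knobs for avoiding mass simultaneous restarts on a brief infra blip.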
15:08:28 <portdirect> yeah - though i can also see a transient problem causing the entire cluster to go down?
15:08:37 <dulek> Please note that in case of Neutron and Nova, healthchecks are based on nova service-list/neutron agent-list commands. I was hoping to do the same with the rest of the charts but it isn't always possible.
15:09:24 <dulek> portdirect: Pod restart when a transient failure occurs isn't really that bad a thing IMO.
15:10:56 <portdirect> it is if they all restart simultaneously - that's gonna potentially put a huge load on your underlying services
15:11:30 <dulek> portdirect: Well, I cannot disagree with that.
15:11:51 <portdirect> but i get your point - would the visibility you're aiming for not be better achieved through a mechanism like centralised logging?
15:12:18 <dulek> portdirect: Though if your infra is suffering from transient failures, you should probably tweak the liveness probe period.
15:13:25 <portdirect> the best thing about failure is it's like the Spanish Inquisition: https://www.youtube.com/watch?v=sAn7baRbhx4
15:13:34 <dulek> portdirect: RabbitMQ or DB disconnections could be addressed through that, I'm not sure about monitoring "nova service-list" though.
15:14:06 <portdirect> srwilkers_: will the work you're doing be able to monitor those things?
15:14:26 <srwilkers_> portdirect, yeah. it should be able to.
15:14:53 <srwilkers_> and once we actually get it into addons, we can start looking at specific exporters for prometheus and adjusting the rules as necessary
15:15:35 <portdirect> would you be able to work with dulek to get a list of the things we'd need to monitor in a spec?
15:15:48 <srwilkers_> sure, i can get that started today
15:16:27 <srwilkers_> #action srwilkers: document appropriate monitoring targets with dulek
15:16:49 <portdirect> dulek: would that work for you?
I suspect it would need a combination of your work and srwilkers_'s to get all we need
15:16:50 <dulek> I'll be ending my office day just after the meeting, but I can definitely provide a list of alarms in logs/commands that we should look for.
15:17:21 <portdirect> awesome - cheers dude
15:17:23 <dulek> portdirect: Sure, I'll take a look at srwilkers_'s work and see how we can use it.
15:18:02 <srwilkers_> okay cool. think we can move on then?
15:18:28 <srwilkers_> #topic Gating for Addons/Infra
15:18:53 <srwilkers_> so we've got openstack-helm-addons and openstack-helm-infra now, in addition to the primary repo
15:19:32 <srwilkers_> portdirect, lamt and StaceyF have been tossing around ideas for gating the three repos in a way that makes sense in how the repositories are expected to be used
15:20:39 <srwilkers_> the current idea is to explore zuul-cloner to see how we can run checks on the three repos without introducing any race conditions or overhead
15:20:42 <lamt> srwilkers_ -infra was created overnight, I will work to set up the linter gate there, and also to clean up -addons
15:21:13 <srwilkers_> lamt, awesome. i can help as well if you need. think that should be higher priority so we can start getting stuff out of the queue in addons
15:21:18 <srwilkers_> there's already a bit backed up there
15:21:24 <lrensing> just for clarity, can we define where we draw the line for each of the repos?
15:21:27 <srwilkers_> lamt, any other thoughts?
15:22:04 <lamt> for now, either tarball or git clone, will need to play around with it
15:22:07 <srwilkers_> lrensing, sure. the expectation is that infra is anything required to run the openstack services on top of.
addons is anything ancillary that can be used in conjunction with them
15:22:36 <srwilkers_> that's my view at any rate
15:22:38 <portdirect> ^^ ++
15:22:44 <lrensing> sounds good :)
15:22:54 <srwilkers_> lamt, awesome
15:23:16 <srwilkers_> #action lamt to explore tarballs or gitclone for multi repo checks/gates
15:23:26 <portdirect> lamt: i'd like us to use git clone when not in infra - being able to run the gate scripts locally is pretty important :)
15:23:27 <lamt> the linter should work for -addons, please review it. will refine the node3 gate later
15:23:41 <lamt> portdirect sounds good
15:25:06 <portdirect> perhaps we should explore having an 'armada' check as well?
15:25:22 <portdirect> alanmeadows: ^?
15:25:45 <alanmeadows> it supports remote git urls and branch/tag targets
15:25:49 <alanmeadows> seems like the fit you're after
15:26:22 <alanmeadows> I have a working manifest for ~master at this point
15:26:52 <portdirect> nice - if we got an additional 3 node check would you be able to submit a ps for it?
15:27:17 <alanmeadows> sure
15:28:08 <srwilkers_> alanmeadows, portdirect: awesome :)
15:28:22 <portdirect> sounds good - I'll try and get that up today
15:29:10 <srwilkers_> #action portdirect will look into 3 node check for armada check
15:29:41 <srwilkers_> anything else on gating?
15:30:09 <portdirect> I'll be working with lrensing to get ceph running in the gate today and start the road to dropping nfs :D
15:30:50 <srwilkers_> portdirect, that'd be really awesome :)
15:30:54 <portdirect> we'll also have vms running in the gate soon - i just need to tidy up my ps
15:31:29 <portdirect> https://review.openstack.org/#/c/474960/
15:31:35 <srwilkers_> that's great. let's get some visibility on those when they're ready for review. would be nice to get those in for sure
15:32:06 <StaceyF> I put the 3rd party gate as a separate topic but can bring it up now?
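A note on the Armada check discussed above: alanmeadows points out that Armada supports remote git URLs and branch/tag targets, which is what would let a gate manifest pull the charts straight from the repos. A chart document using that feature might look roughly like this sketch (based on my reading of Armada's v1 document schema around this time; the chart name, subpath, and values are placeholders, not from the actual working manifest):

```yaml
# Hypothetical Armada chart entry pulling a chart from a remote git repo.
schema: armada/Chart/v1
metadata:
  schema: metadata/Document/v1
  name: keystone
data:
  chart_name: keystone
  release: keystone
  namespace: openstack
  values: {}
  source:
    type: git                  # fetch the chart from git rather than a tarball
    location: https://git.openstack.org/openstack/openstack-helm
    subpath: keystone          # chart directory within the repo
    reference: master          # branch or tag target
  dependencies: []
```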
15:32:10 <srwilkers_> oh nice
15:32:17 <srwilkers_> of course StaceyF, floor's yours
15:32:52 <srwilkers_> #topic Third Party Gate
15:33:36 <StaceyF> It's currently just a skeleton job, but we have OpenStack-Helm deployed on our CI lab and will be utilizing Jenkins plugins (openstack-cloud) to dynamically provision VMs to test OSH on PS / merge in Gerrit
15:34:03 <StaceyF> I'd like to get the go-ahead to make it a non-voting 3rd party gate instead of skipping, once we've stabilized the Jenkins jobs
15:35:21 <StaceyF> We will be putting the logs on an Apache server so they are accessible to see failures/successes.
15:35:44 <srwilkers_> StaceyF, hmm. i'm okay with it becoming a non-voting 3rd party gate, given the logs are publicly available
15:37:17 <StaceyF> srwilkers thanks
15:37:19 <srwilkers_> portdirect, thoughts?
15:37:20 <portdirect> nice - will we be able to run rally properly inside the ATT gate?
15:37:33 <StaceyF> yes
15:37:56 <portdirect> srwilkers_: think this is pretty awesome :)
15:38:55 <srwilkers_> alright cool.
15:39:11 <StaceyF> then I'll move forward and will provide status as it moves along
15:39:24 <srwilkers_> StaceyF, great. keep us posted :)
15:40:13 <srwilkers_> #topic Monasca-Helm (https://github.com/monasca/monasca-helm)
15:40:24 <srwilkers_> i added this topic for jayahn as it was mentioned yesterday
15:40:42 <srwilkers_> and he brought up possibly touching base with the monasca folks to see what their intentions are for it
15:41:11 <srwilkers_> but i'm not sure if jayahn is present currently, so might want to follow up with him when i see him in the openstack-helm channel again. i know the time difference is pretty drastic
15:41:52 <portdirect> yeah - they have some interesting things going on over there - looks like a full stack in a single chart?
15:42:02 <srwilkers_> portdirect, that's what it seems like at a glance
15:42:25 <srwilkers_> i haven't had a chance to play with it yet, but they're using monasca to grab cluster-level metrics i think, and scraping prometheus endpoints
15:42:47 <portdirect> be good to chat with them for sure - you know what irc channel they're in?
15:42:59 <srwilkers_> i can dig it up and ask about it today
15:43:38 <srwilkers_> either way, i think i'm going to fiddle with it and profile it versus prometheus to see what the differences are, just for the sake of comparison
15:44:37 <srwilkers_> #action srwilkers_ follow up with jayahn about monasca-helm
15:45:15 <srwilkers_> #topic open discussion
15:45:25 <srwilkers_> alright, that's all the topics we had in the etherpad for today
15:45:39 <srwilkers_> i'd like to open the floor for any other concerns/topics that weren't on the agenda
15:47:27 <srwilkers_> going once
15:48:07 <srwilkers_> going twice?
15:49:25 <srwilkers_> alright, going to give you all 10 minutes back
15:49:47 <srwilkers_> #endmeeting
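A footnote on the monitoring action item from the health-checks topic: the "exporters for prometheus and adjusting the rules" approach srwilkers_ describes would ultimately surface targets like a down nova service as alerting rules. A sketch of one such rule is below, in Prometheus's YAML rule format; the metric name is invented for illustration, since the real name depends on whichever exporter the addons work settles on.

```yaml
# Hypothetical Prometheus alerting rule for a monitoring target discussed in
# the meeting (a nova service reported down, as "nova service-list" would show).
# The metric name openstack_nova_service_state is illustrative only.
groups:
  - name: openstack-helm.health
    rules:
      - alert: NovaServiceDown
        expr: openstack_nova_service_state == 0
        for: 5m                # tolerate transient blips before alerting
        labels:
          severity: warning
        annotations:
          summary: "nova service {{ $labels.binary }} on {{ $labels.host }} is down"
```

Alerting on the metric rather than restarting the pod gives the admin visibility without the runaway-restart risk raised earlier in the discussion.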