15:00:47 #startmeeting openstack-helm
15:00:49 Meeting started Tue Jun 20 15:00:47 2017 UTC and is due to finish in 60 minutes. The chair is srwilkers_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:52 The meeting name has been set to 'openstack_helm'
15:01:01 happy tuesday everyone
15:01:15 \o/
15:01:16 o/
15:01:27 we've got a few items on the agenda for today. can be found here: https://etherpad.openstack.org/p/openstack-helm-meeting-2017-06-20
15:01:48 o/
15:02:07 o/
15:02:29 lets give it a minute or so to let others filter in
15:02:39 then we can tackle these topics, and then open up for discussion afterwards
15:03:21 o/
15:03:36 * gagehugo lurks
15:04:09 o/
15:04:28 alright, lets go ahead and start tackling these then
15:04:29 #topic Health Checks
15:04:45 portdirect, i think you raised this topic last week or this one -- cant recall
15:04:47 youve got the floor
15:06:02 yeah - I've got a few queries about these - as I'm not sure we want to be killing/restarting pods when comms with infra is broken - seems like a way to create a runaway train. but was wondering if dulek had thought of something id missed?
15:06:03 * dulek is here to answer any questions about the patches.
15:06:07 :)
15:06:37 Right, that's something I was wondering about too.
15:07:06 My intention with the patches is also to have some indication for the admin that a pod is unhealthy.
15:07:53 Also I can imagine a pod being rescheduled to another kubelet, which resolves the problem if it was caused by physical network failure.
15:08:28 yeah - though i can also see a transient problem causing the entire cluster to go down?
15:08:37 Please note that in case of Neutron and Nova, healthchecks are based on nova service-list/neutron agent-list commands. I was hoping to do the same with the rest of the charts but it isn't always possible.
15:09:24 portdirect: A pod restart when a transient failure occurred isn't really that bad a thing IMO.
15:10:56 it is if they all restart simultaneously - thats gonna potentially put a huge load on your underlying services
15:11:30 portdirect: Well, I cannot disagree with that.
15:11:51 but i get your point - would the visibility you're aiming for not be better achieved through a mechanism like centralised logging?
15:12:18 portdirect: Though if your infra is suffering from transient failures, you should probably tweak the liveness probe period.
15:13:25 the best thing about failure is it's like the Spanish Inquisition: https://www.youtube.com/watch?v=sAn7baRbhx4
15:13:34 portdirect: RabbitMQ or DB disconnections could be addressed through that, I'm not sure about monitoring "nova service-list" though.
15:14:06 srwilkers_: will the work you're doing be able to monitor those things?
15:14:26 portdirect, yeah. it should be able to.
15:14:53 and once we actually get it into addons, we can start looking at specific exporters for prometheus and adjusting the rules as necessary
15:15:35 would you be able to work with dulek to get a list of the things we'd need to monitor in a spec?
15:15:48 sure, i can get that started today
15:16:27 #action srwilkers: document appropriate monitoring targets with dulek
15:16:49 dulek: would that work for you? I suspect it would need a combination of your work and srwilkers_ to get all we need
15:16:50 I'll be ending my office day just after the meeting, but I can definitely provide a list of alarms in logs/commands that we should look for.
15:17:21 awesome - cheers dude
15:17:23 portdirect: Sure, I'll take a look at srwilkers_ work and see how we can use it.
15:18:02 okay cool. think we can move on then?
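[Editor's note] The Health Checks discussion mentions probes based on the output of nova service-list / neutron agent-list. The following is only a minimal sketch of that idea, not the code from dulek's patches. It assumes the service list is available as JSON with "Binary", "Host" and "State" keys; the function names service_is_up and probe_exit_code are hypothetical, and the probe is assumed to signal health through its exit code (0 for healthy, non-zero for unhealthy).

```python
import json


def service_is_up(services_json, host, binary):
    """Return True only if the given binary on the given host reports state 'up'.

    services_json is assumed to be a JSON array of objects carrying
    "Binary", "Host" and "State" keys (an assumption, see the note above).
    """
    for service in json.loads(services_json):
        if service.get("Host") == host and service.get("Binary") == binary:
            return str(service.get("State", "")).lower() == "up"
    # The service is not listed at all: treat that as unhealthy.
    return False


def probe_exit_code(services_json, host, binary):
    """Map the health result onto an exit code: 0 healthy, 1 unhealthy."""
    return 0 if service_is_up(services_json, host, binary) else 1
```

A check of this kind restarts a pod whenever the API or message bus is briefly unreachable, which is the simultaneous-restart concern raised above; the probe period and failure threshold would need tuning accordingly.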
15:18:28 #topic Gating for Addons/Infra
15:18:53 so we've got openstack-helm-addons and openstack-helm-infra now, in addition to the primary repo
15:19:32 portdirect, lamt and StaceyF have been tossing around ideas for gating the three repos in a way that makes sense in how the repositories are expected to be used
15:20:39 the current idea is to explore zuul-cloner to see how we can run checks on the three repos without introducing any race conditions or overhead
15:20:42 srwilkers_ -infra was created overnight, I will work to set up the linter gate there, also to clean up -addons
15:21:13 lamt, awesome. i can help as well if you need. think that should be higher priority so we can start getting stuff out of the queue in addons
15:21:18 theres already a bit backed up there
15:21:24 just for clarity, can we define where we draw the line for each of the repos?
15:21:27 lamt, any other thoughts?
15:22:04 lamt for now, either tarball, or gitclone, will need to play around with it
15:22:07 lrensing, sure. the expectation is that infra is anything required to run the openstack services on top of. addons is anything ancillary that can be used in conjunction with them
15:22:36 thats my view at any rate
15:22:38 ^^ ++
15:22:44 sounds good :)
15:22:54 lamt, awesome
15:23:16 #action lamt to explore tarballs or gitclone for multi repo checks/gates
15:23:26 lamt: id like us to use git clone when not in infra - being able to run the gate scripts locally is pretty important :)
15:23:27 the linter should work for -addons, please review it. will refine the node3 gate later
15:23:41 portdirect sounds good
15:25:06 perhaps we should explore having an 'armada' check as well?
15:25:22 alanmeadows: ^?
15:25:45 it supports remote git urls and branch/tag targets
15:25:49 seems like the fit you're after
15:26:22 I have a working manifest for ~master at this point
15:26:52 nice - if we got an additional 3 node check would you be able to submit a ps for it?
15:27:17 sure
15:28:08 alanmeadows, portdirect: awesome :)
15:28:22 sounds good - I'll try and get that up today
15:29:10 #action portdirect will look into 3 node check for armada check
15:29:41 anything else on gating?
15:30:09 I'll be working with lrensing to get ceph running in the gate today and start the road to dropping nfs :D
15:30:50 portdirect, thatd be really awesome :)
15:30:54 we'll also have vms running in the gate soon - i just need to tidy up my ps
15:31:29 https://review.openstack.org/#/c/474960/
15:31:35 that's great. let's get some visibility on those when they're ready for review. would be nice to get those in for sure
15:32:06 I put the 3rd party gate as a separate topic but can bring it up now?
15:32:10 oh nice
15:32:17 of course StaceyF, floor's yours
15:32:52 #topic Third Party Gate
15:33:36 It's currently just a skeleton job but we have Openstack Helm deployed on our CI lab and will be utilizing Jenkins plugins (openstack-cloud) to dynamically provision VMs to test OSH on PS / merge in Gerrit
15:34:03 I'd like to get the go ahead to make it a non-voting 3rd party instead of the skipping once we've stabilized the Jenkins jobs
15:35:21 We will be putting the logs on an Apache server so they are accessible to see failures/success.
15:35:44 StaceyF, hmm. im okay with it becoming a non-voting 3rd party gate, given the logs are publicly available
15:37:17 srwilkers thanks
15:37:19 portdirect, thoughts?
15:37:20 nice - will we be able to run rally properly inside the ATT gate?
15:37:33 yes
15:37:56 srwilkers_: think this is pretty awesome :)
15:38:55 alright cool.
15:39:11 then I'll move forward and will provide status as it moves along
15:39:24 StaceyF, great. keep us posted :)
15:40:13 #topic Monasca-Helm (https://github.com/monasca/monasca-helm)
15:40:24 i added this topic for jayahn as it was mentioned yesterday
15:40:42 and he brought up possibly touching base with the monasca folks to see what their intentions are for it
15:41:11 but im not sure if jayahn is present currently, so might want to follow up with him when i see him in the openstack-helm channel again. i know the time difference is pretty drastic
15:41:52 yeah - they have some interesting things going on over there - looks like a full stack in a single chart?
15:42:02 portdirect, thats what it seems like at a glance
15:42:25 i havent had a chance to play with it yet, but they're using monasca to grab cluster-level metrics i think and scraping prometheus endpoints
15:42:47 be good to chat to them for sure - you know what irc chan they're in?
15:42:59 i can dig it up and ask about it today
15:43:38 either way, i think im going to fiddle with it and profile it versus prometheus to see what the differences are just for the sake of comparison
15:44:37 #action srwilkers_ follow up with jayahn about monasca-helm
15:45:15 #topic open discussion
15:45:25 alright, that's all the topics we had in the etherpad for today
15:45:39 id like to open the floor for any other concerns/topics that weren't on the agenda
15:47:27 going once
15:48:07 going twice?
15:49:25 alright, going to give you all 10 minutes back
15:49:47 #endmeeting