15:01:33 #startmeeting openstack-helm
15:01:34 Meeting started Tue Jul 24 15:01:33 2018 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:37 #topic Rollcall
15:01:38 The meeting name has been set to 'openstack_helm'
15:01:50 GM/GE all!
15:01:57 =)
15:02:00 o/
15:02:11 o/
15:02:13 good morning
15:02:14 hi
15:02:24 Here's the agenda for our openstack-helm weekly team meeting https://etherpad.openstack.org/p/openstack-helm-meeting-2018-07-24
15:02:39 Please add anything you'd like to discuss today
15:03:23 o/
15:03:54 #topic Use Case for external k8s cluster and Ceph Cluster
15:04:12 ah that's mine :-). We are exploring the use case that deploys OSH on an existing k8s cluster, and a separate ceph cluster (i.e., not as containers on the k8s).
15:04:35 Is there already some document for this use case or interest in the use case?
15:04:35 Good news jgu_ -- every OSH deployment is on an existing k8s cluster :)
15:04:47 It's a "bring your own kubernetes" model
15:04:53 stand corrected :-)
15:04:58 So that part should be all good
15:05:03 yes, bring your k8s and ceph
15:05:07 The external ceph cluster can be done as well
15:05:41 As the response in the agenda mentions, our SKT teammates use Ceph that way - do we have any reference overrides for this that we can share?
15:05:52 not currently afaik
15:06:02 I've been trying this too - and have some steps - but can't really comment too much because I haven't successfully got it working
15:06:02 it would be great if we could get that in
15:06:12 as i know jayahn's team did a load of good work to support it
15:06:28 We have documented the steps to integrate with the external ceph cluster for PVCs used by mariadb and rabbitmq etc. And we're currently working on configuring the glance (and possibly cinder, swift) charts to use the external ceph.
15:06:34 d'oh rwellum - one more reason we need good reference material for it. Are your probs with the Ceph cluster itself, or with plugging it into OSH?
15:06:41 if there's work already on it, we'd love to leverage/use that
15:08:06 i think the best way would be to get a WIP up at least that outlines what's resulted in success so far
15:08:16 that way we can get more eyes on it and collaborate on it
15:08:41 ++
15:09:12 We can try to pull in everyone who has experience with it (rwellum, SKT folks, etc) to weigh in on review
15:09:30 we got the bring your own k8s cluster working to the extent that we can provision the charts, but helm testing fails because the openstack services can't be accessed off the k8s nodes w/o an LB or proxy
15:09:48 sounds great to me.
15:10:00 we'd love to share what we have learnt so far
15:10:01 I think adding a new doc in here would be really valuable: https://github.com/openstack/openstack-helm/tree/master/doc/source/install
15:10:10 Thanks jgu_!
15:10:22 We'll get it to 100%
15:11:01 cool... where would we start? get a spec or storyboard ticket or a PS to the doc?
15:11:27 Yep I can create a storyboard story for you (or feel free to create one if you don't want to wait :) )
15:11:58 For the doc itself, you could just copy one of the existing docs in the `install` folder and modify it
15:12:02 +1
15:12:34 will do
15:12:48 Thanks man
15:13:04 Anything else anyone would like to discuss around ceph?
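(For reference on the external Ceph discussion above: the integration for the MariaDB/RabbitMQ PVCs largely comes down to a StorageClass and Ceph client secrets that point at the existing cluster. The sketch below is illustrative only and is not the documented OSH override set -- monitor addresses, pool, and secret names are placeholders, and the class name is simply whatever the chart values reference; OSH's own provisioner charts would normally stand in for the in-tree RBD provisioner used here.)

```yaml
# Hypothetical StorageClass targeting an existing external Ceph cluster.
# Monitor IPs, pool, and secret names are placeholders, not OSH defaults.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general                    # whatever class name the chart values reference
provisioner: kubernetes.io/rbd     # in-tree RBD provisioner, for illustration only
parameters:
  monitors: 10.0.0.11:6789,10.0.0.12:6789,10.0.0.13:6789
  adminId: admin
  adminSecretName: ceph-admin-keyring
  adminSecretNamespace: ceph
  pool: rbd
  userId: admin
  userSecretName: ceph-client-keyring
  imageFormat: "2"
  imageFeatures: layering
```

The Glance (and later Cinder/Swift) side mentioned above additionally needs the charts' Ceph client configuration pointed at the same external cluster, which is what the proposed install doc would capture.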
15:14:15 #topic Progress update on enabling Galera to survive OSH cluster restart
15:14:38 Pete says: Still in flight :( https://review.openstack.org/#/c/583986/, getting close though.
15:14:51 yeah - I hope to have it out of WIP this week
15:14:58 Nice!
15:15:00 got sidetracked with some prod issues :(
15:15:11 pesky prod issues
15:15:33 thanks guys
15:15:54 #topic PS needing review
15:15:56 https://review.openstack.org/#/c/585039/
15:16:11 ^ srwilkers already on it - thanks Steve
15:16:50 That'll be a good one for anyone new to the team to review
15:16:50 np
15:16:55 https://review.openstack.org/#/c/579022/ could use some eyes too
15:17:04 Because portdirect is adding good docs to helm-toolkit
15:17:13 Good educational opportunity!
15:17:29 i'm even finding some 18 month old bugs :D
15:17:50 ive been working on making elasticsearch a bit smarter/leaner, and ^ would help it a little bit
15:18:11 oh - a perfect segue
15:18:14 yes srwilkers - thanks for bringing tha tone up
15:18:15 *one
15:18:40 #topic And now for something completely different
15:18:49 would be great if we can get some eyes on this: https://review.openstack.org/#/c/582620
15:19:02 esp how we should get lma into this picture
15:19:13 as both a method of reporting issues
15:19:29 and `0 touch` probing
15:19:52 +1
15:20:22 hmm
15:20:40 Getting good probes is really important and wide-ranging, this is a good one for a spec (and a good one to get lots of eyeballs on)
15:21:36 what'd you have in mind for 0 touch probing?
15:22:03 as in saying `im ill` without affecting the pod
15:22:21 as this can sometimes have pretty disastrous effects on a cluster
15:22:39 eg: if external DNS is down, we want to report this
15:22:55 but not necessarily kill, or mark unready, the dns pod
15:23:04 theres loads of other examples
15:23:23 we should only whack pods when we're pretty certain whacking the pod will fix the issue, yes?
15:23:44 light touch
15:23:48 ^^
15:24:13 in many cases we know somethings wrong, but require `quantum computing` * to fix
15:24:31 * the cheapest quantum computer available to most orgs is someone in ops
15:25:02 could the pod return status "not ready yet" in such cases?
15:25:06 okay, as long as we're including the ability to avoid marking pods as unready. having something like prometheus determine whether we can mark a pod as ready or not scares me a bit, as all it takes is a PVC filling up to cause prometheus to stop gathering metrics
15:25:24 and if prometheus was responsible for determining some part of readiness in that scenario, nothing is going to be ready
15:25:27 roman_g: returning not ready yet can (and does) cause issues in itself
15:25:32 I don't see anything around the lma probe (I'm going to go ahead and coin `logliness probe`) in the spec yet - wanting to add that in there, or separate, portdirect?
15:25:57 srwilkers: i think you're getting the wrong end of the stick here
15:26:08 portdirect: i think so too. thats why i asked for clarification :)
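(Background for the probe distinction being clarified here: in Kubernetes, a failing liveness probe restarts the container, while a failing readiness probe leaves it running but drops the pod from its Service endpoints. A minimal, illustrative container fragment -- paths, port, and timings are made up:)

```yaml
# Illustrative pod spec fragment only; endpoints and timings are invented.
containers:
  - name: example-api
    image: example-api:latest
    livenessProbe:            # failure here restarts ("whacks") the container
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:           # failure here drops the pod from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

Neither mechanism covers the "report it, but leave the pod alone" case raised above; that gap is what the LMA discussion that follows is about.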
15:26:28 what im getting at is more that lma should be the 1st step of the health reporting of a cluster
15:26:36 on that, we agree
15:26:41 and then automatic and atomic operations can come in
15:26:52 ie - liveness/readiness probes
15:27:10 Im not proposing coupling lma with that
15:27:27 but in a spec, we need to look at the holistic picture
15:27:51 LMA is for manual resolution; probes are for automated resolution (pod whacking) -- we want to save the automation for when we're quite sure ahead of time that pod whacking is the resolution
15:28:18 when it takes human intelligence to resolve we should rely on lma instead of probes
15:28:19 even pulling a pod out of 'ready' is essentially a whack - as it drops its endpoints
15:28:25 just sounds like smarter and more granular alarming to me
15:28:31 yes
15:28:33 exactly :)
15:28:54 cool
15:29:04 smarter and more granular alarming == `logliness probe`
15:29:14 * srwilkers cringes
15:29:15 I'm like a dog with a bone
15:29:17 moving on
15:29:24 #topic Roundtable
15:29:33 Who else has good jokes
15:29:35 Or other topics
15:29:45 this has stagnated a bit, but would like to figure out a path forward for this one: https://review.openstack.org/#/c/559417/
15:30:13 as this makes it possible to take care of our indexes better, as without it our only option is to just delete them
15:31:21 is the RGW S3 API now the default snapshotting approach with this change, Steve?
15:31:47 yep
15:32:00 awesome
15:32:50 what's the default snapshotting frequency?
15:32:54 that removes the pvc mechanism altogether
15:34:05 default is to snapshot any indices older than one day, which is a bit often. but curator is configured entirely via the values in elasticsearch at the moment
15:34:06 Oh I see -- Snapshot indices older than one day
15:34:09 yep
15:34:44 once we have something reliable to use for snapshot repositories, we can start exploring more sane defaults for curator, or at least exposing more options by default
15:35:22 What's the process to restore from a snapshot?
15:35:59 elasticsearch exposes the ability to do so via its API
15:36:30 Based on the S3 URL?
15:36:33 https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
15:36:39 based on whatever repositories you have defined
15:36:48 ah ok cool
15:37:17 I'll give this a spin - thanks srwilkers
15:37:36 Any other roundtable topics?
15:37:38 PTG - any planning for discussions etc?
15:37:50 I should be there btw
15:37:53 That is quickly approaching isn't it
15:37:55 only plans ive made are which breweries i plan on drinking at
15:38:12 +1
15:38:28 (wynkoop has a 14% IPA)
15:38:30 just sayin
15:38:38 we need to come up with some discussion topics soon, to keep srwilkers and rwellum from drinking too much
15:39:06 Yeah last PTG was a complete blank - don't hang out with the Cinder folks...
15:39:13 :D
15:39:14 drunken discussion is best discussion. we need to hit the Ballmer peak
15:40:21 We will have highly focused and well-planned sessions with plenty of coffee, followed by responsible fun afterwards :)
15:40:24 * mattmceuen PSA complete
15:40:32 this guy
15:40:47 Alright guys :)
15:40:58 Thanks for the discussion today - have a great week!
15:41:07 #endmeeting
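(For reference on the curator behaviour discussed in the roundtable -- snapshot any indices older than one day, driven by the elasticsearch chart values: a standalone Curator action file expressing roughly that policy would look like the sketch below. The repository name and snapshot-name pattern are assumptions; restores happen separately through the Elasticsearch snapshot API linked above.)

```yaml
# Rough Curator action sketch: snapshot indices older than one day into an
# S3/RGW-backed repository. Repository and snapshot names are placeholders.
actions:
  1:
    action: snapshot
    description: Snapshot indices older than one day
    options:
      repository: default_repo
      name: curator-%Y.%m.%d.%H.%M.%S
      wait_for_completion: True
      ignore_empty_list: True
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 1
```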