15:01:33 <mattmceuen> #startmeeting openstack-helm
15:01:34 <openstack> Meeting started Tue Jul 24 15:01:33 2018 UTC and is due to finish in 60 minutes.  The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:37 <mattmceuen> #topic Rollcall
15:01:38 <openstack> The meeting name has been set to 'openstack_helm'
15:01:50 <mattmceuen> GM/GE all!
15:01:57 <roman_g> =)
15:02:00 <portdirect> o/
15:02:11 <srwilkers> o/
15:02:13 <jgu_> good morning
15:02:14 <vadym> hi
15:02:24 <mattmceuen> Here's the agenda for our openstack-helm weekly team meeting https://etherpad.openstack.org/p/openstack-helm-meeting-2018-07-24
15:02:39 <mattmceuen> Please add anything you'd like to discuss today
15:03:23 <rwellum> o/
15:03:54 <mattmceuen> #topic Use Case for external k8s cluster and Ceph Cluster
15:04:12 <jgu_> ah that's mine :-). We are exploring a use case that deploys OSH on an existing k8s cluster and a separate ceph cluster (i.e., ceph not running as containers on the k8s).
15:04:35 <jgu_> Is there already some document for this use case or interest in the use case?
15:04:35 <mattmceuen> Good news jgu_ -- every OSH deployment is on an existing k8s cluster :)
15:04:47 <mattmceuen> It's a "bring your own kubernetes" model
15:04:53 <jgu_> stand corrected :-)
15:04:58 <mattmceuen> So that part should be all good
15:05:03 <jgu_> yes, bring your k8s and ceph
15:05:07 <mattmceuen> The external ceph cluster can be done as well
15:05:41 <mattmceuen> As the response in the agenda mentions, our SKT teammates use Ceph that way - do we have any reference overrides for this that we can share?
15:05:52 <portdirect> not currently afaik
15:06:02 <rwellum> I've been trying this too - and have some steps - but can't really comment too much because I haven't successfully got it working
15:06:02 <portdirect> it would be great if we could get that in
15:06:12 <portdirect> as i know jayahn's team did a load of good work to support it
15:06:28 <jgu_> We have documented the steps to integrate with the external ceph cluster for PVCs used by mariadb and rabbitmq etc., and are currently working on configuring the glance (and possibly cinder, swift) charts to use the external ceph.
15:06:34 <mattmceuen> d'oh rwellum - one more reason we need good reference material for it.  Are your probs with the Ceph cluster itself, or with plugging it into OSH?
15:06:41 <jgu_> if there's work already on it, we'd love to leverage/use that
15:08:06 <srwilkers> i think the best way would be to get a WIP up at least that outlines what's resulted in success so far
15:08:16 <srwilkers> that way we can get more eyes on it and collaborate on it
15:08:41 <mattmceuen> ++
15:09:12 <mattmceuen> We can try to pull in everyone who has experience with it (rwellum, SKT folks, etc) to weigh in on review
15:09:30 <jgu_> we got the bring your own k8s cluster working to the extent that we can provision the charts, but helm testing fails because the openstack services can't be accessed off the k8s nodes w/o an LB or proxy
15:09:48 <jgu_> sounds great to me.
15:10:00 <jgu_> we'd love to share what we have learnt so far
15:10:01 <mattmceuen> I think adding a new doc in here would be really valuable: https://github.com/openstack/openstack-helm/tree/master/doc/source/install
15:10:10 <mattmceuen> Thanks jgu_!
15:10:22 <mattmceuen> We'll get it to 100%
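For reference on the off-cluster access issue jgu_ mentions above, a minimal hedged sketch of the kind of values override that can expose a chart's API without a full LB or proxy. The key names are assumptions rather than confirmed chart structure - check each chart's values.yaml:

```yaml
# Hedged sketch: expose an API via a NodePort when no LB/proxy is available.
# Key names below are assumptions - consult the chart's values.yaml.
network:
  api:
    node_port:
      enabled: true
      port: 30500   # service then reachable at http://<any-node-ip>:30500
```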
15:11:01 <jgu_> cool... where would we start? a spec, a storyboard ticket, or a PS to the doc?
15:11:27 <mattmceuen> Yep I can create a storyboard story for you (or feel free to create one if you don't want to wait :) )
15:11:58 <mattmceuen> For the doc itself, you could just copy one of the existing docs in the `install` folder and modify it
15:12:02 <rwellum> +1
15:12:34 <jgu_> will do
15:12:48 <mattmceuen> Thanks man
15:13:04 <mattmceuen> Anything else anyone would like to discuss around ceph?
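As context for the external-Ceph work discussed above, a minimal hedged sketch of what per-chart overrides might look like for glance. All keys and values are assumptions rather than the chart's confirmed interface - the authoritative structure is in the chart's values.yaml, and PVC support for mariadb/rabbitmq additionally needs a StorageClass whose provisioner points at the external monitors:

```yaml
# Hedged sketch: pointing the glance chart at an external ceph cluster.
# All keys/values are assumptions - check the chart's values.yaml.
storage: rbd
conf:
  ceph:
    monitors:               # external cluster's mon addresses
      - 10.0.0.11:6789
      - 10.0.0.12:6789
      - 10.0.0.13:6789
  glance:
    glance_store:
      rbd_store_pool: images
      rbd_store_user: glance
# the external cluster's client keyring would be supplied via a pre-created
# kubernetes secret rather than inlined in the overrides
```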
15:14:15 <mattmceuen> #topic Progress update on enabling Galera survive OSH cluster restart?
15:14:38 <mattmceuen> Pete says:  Still in flight :( https://review.openstack.org/#/c/583986/, getting close though.
15:14:51 <portdirect> yeah - I hope to have it out of wip this week
15:14:58 <mattmceuen> Nice!
15:15:00 <portdirect> got sidetracked with some prod issues :(
15:15:11 <mattmceuen> pesky prod issues
15:15:33 <vadym> thanks guys
15:15:54 <mattmceuen> #topic PS needing review
15:15:56 <mattmceuen> https://review.openstack.org/#/c/585039/
15:16:11 <mattmceuen> ^ srwilkers already on it - thanks Steve
15:16:50 <mattmceuen> That'll be a good one for anyone new to the team to review
15:16:50 <srwilkers> np
15:16:55 <srwilkers> https://review.openstack.org/#/c/579022/ could use some eyes too
15:17:04 <mattmceuen> Because portdirect is adding good docs to helm-toolkit
15:17:13 <mattmceuen> Good educational opportunity!
15:17:29 <portdirect> i'm even finding some 18 month old bugs :D
15:17:50 <srwilkers> ive been working on making elasticsearch a bit smarter/leaner, and ^ would help it a little bit
15:18:11 <portdirect> oh - a perfect segue
15:18:14 <mattmceuen> yes srwilkers - thanks for bringing that one up
15:18:40 <mattmceuen> #topic And now for something completely different
15:18:49 <portdirect> would be great if we can get some eyes on this: https://review.openstack.org/#/c/582620
15:19:02 <portdirect> esp how we should get lma into this picture
15:19:13 <portdirect> as both a method of reporting issues
15:19:29 <portdirect> and `0 touch` probing
15:19:52 <mattmceuen> +1
15:20:22 <srwilkers> hmm
15:20:40 <mattmceuen> Getting good probes is really important and wide-ranging - this is a good one for a spec (and a good one to get lots of eyeballs on)
15:21:36 <srwilkers> what'd you have in mind for 0 touch probing?
15:22:03 <portdirect> as in saying `im ill` without affecting the pod
15:22:21 <portdirect> as this can sometimes have pretty disastrous effects on a cluster
15:22:39 <portdirect> eg: if external DNS is down, we want to report this
15:22:55 <portdirect> but not necessarily kill, or mark unready, the dns pod
15:23:04 <portdirect> there's loads of other examples
15:23:23 <mattmceuen> we should only whack pods when we're pretty certain whacking the pod will fix the issue, yes?
15:23:44 <mattmceuen> light touch
15:23:48 <portdirect> ^^
15:24:13 <portdirect> in many cases we know something's wrong, but require `quantum computing` * to fix
15:24:31 <portdirect> * the cheapest quantum computer available to most orgs is someone in ops
15:25:02 <roman_g> could the pod return status "not ready yet" in such cases?
15:25:06 <srwilkers> okay, as long as we're including the ability to avoid marking pods as unready.  having something like prometheus determine whether we can mark a pod as ready or not scares me a bit, as all it takes is a PVC filling up to cause prometheus to stop gathering metrics
15:25:24 <srwilkers> and if prometheus was responsible for determining some part of readiness in that scenario, nothing is going to be ready
15:25:27 <portdirect> roman_g: returning "not ready yet" can (and does) cause issues in itself
15:25:32 <mattmceuen> I don't see anything around the lma probe (I'm going to go ahead and coin `logliness probe`) in the spec yet - want to add that in there, or keep it separate, portdirect?
15:25:57 <portdirect> srwilkers: i think you getting the wrong end of the stick here
15:26:08 <srwilkers> portdirect: i think so too.  thats why i asked for clarification :)
15:26:28 <portdirect> what im getting at is more that lma should be the 1st step of the health reporting of a cluster
15:26:36 <srwilkers> on that, we agree
15:26:41 <portdirect> and then automatic and atomic operations can come in
15:26:52 <portdirect> ie - liveness/readiness probes
15:27:10 <portdirect> Im not proposing coupling lma with that
15:27:27 <portdirect> but in a spec, we need to look at the holistic picture
15:27:51 <mattmceuen> LMA is for manual resolution; probes are for automated resolution (pod whacking) -- we want to save the automation for when we're quite sure ahead of time that pod whacking is the resolution
15:28:18 <mattmceuen> when it takes human intelligence to resolve we should rely on lma instead of probes
15:28:19 <portdirect> even pulling a pod out of 'ready' is essentially a whack - as it drops its endpoints
15:28:25 <srwilkers> just sounds like smarter and more granular alarming to me
15:28:31 <mattmceuen> yes
15:28:33 <portdirect> exactly :)
15:28:54 <srwilkers> cool
15:29:04 <mattmceuen> smarter and more granular alarming == `logliness probe`
15:29:14 * srwilkers cringes
15:29:15 <mattmceuen> I'm like a dog with a bone
15:29:17 <mattmceuen> moving on
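To make the "report without whacking" idea above concrete, a hedged sketch (all names and images hypothetical) of a pod that surfaces soft failures through metrics for the LMA stack, while keeping its liveness probe limited to hard local failures so an upstream outage never drops its endpoints:

```yaml
# Hedged sketch: separate "saying im ill" (metrics -> LMA) from pod whacking
# (liveness probe). All names/images are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-dns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-dns
  template:
    metadata:
      labels:
        app: example-dns
      annotations:
        prometheus.io/scrape: "true"  # `0 touch` reporting via LMA
        prometheus.io/port: "9153"
    spec:
      containers:
        - name: dns
          image: example/dns:latest
          ports:
            - name: metrics
              containerPort: 9153
          livenessProbe:              # restart only on hard local failure
            exec:
              command: ["pgrep", "dnsd"]
            initialDelaySeconds: 30
            periodSeconds: 30
          # deliberately no readiness check against external DNS: an upstream
          # outage is reported via metrics, not by dropping this pod's endpoints
```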
15:29:24 <mattmceuen> #topic Roundtable
15:29:33 <mattmceuen> Who else has good jokes
15:29:35 <mattmceuen> Or other topics
15:29:45 <srwilkers> this has stagnated a bit, but would like to figure out a path forward for this one: https://review.openstack.org/#/c/559417/
15:30:13 <srwilkers> as this makes it possible to take care of our indexes better, as without it our only option is to just delete them
15:31:21 <mattmceuen> is the RGW S3 API now the default snapshotting approach with this change, Steve?
15:31:47 <srwilkers> yep
15:32:00 <mattmceuen> awesome
15:32:50 <mattmceuen> what's the default snapshotting frequency?
15:32:54 <srwilkers> that removes the pvc mechanism altogether
15:34:05 <srwilkers> default is to snapshot any indices older than one day, which is a bit often.  but curator is configured entirely via the values in elasticsearch at the moment
15:34:06 <mattmceuen> Oh I see -- Snapshot indices older than one day
15:34:09 <srwilkers> yep
15:34:44 <srwilkers> once we have something reliable to use for snapshot repositories, we can start exploring more sane defaults for curator, or at least exposing more options by default
15:35:22 <mattmceuen> What's the process to restore from a snapshot?
15:35:59 <srwilkers> elasticsearch exposes the ability to do so via its API
15:36:30 <mattmceuen> Based on the S3 URL?
15:36:33 <srwilkers> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
15:36:39 <srwilkers> based on whatever repositories you have defined
15:36:48 <mattmceuen> ah ok cool
15:37:17 <mattmceuen> I'll give this a spin - thanks srwilkers
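For reference, a hedged sketch of a curator-style action matching the default described above (snapshot indices older than one day). The values path and the repository name are assumptions - check the elasticsearch chart's values.yaml for the real structure:

```yaml
# Hedged sketch: curator action snapshotting indices older than one day.
# Values path and repository name are assumptions.
conf:
  curator:
    action_file:
      actions:
        1:
          action: snapshot
          description: Snapshot indices older than one day
          options:
            repository: logstash_snapshots   # hypothetical RGW S3-backed repo
            name: snapshot-%Y.%m.%d
            wait_for_completion: True
          filters:
            - filtertype: age
              source: creation_date
              direction: older
              unit: days
              unit_count: 1
# restore goes through the elasticsearch snapshot API described at the link
# above, e.g. POST /_snapshot/<repository>/<snapshot>/_restore
```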
15:37:36 <mattmceuen> Any other roundtable topics?
15:37:38 <rwellum> PTG - any planning for discussions etc?
15:37:50 <rwellum> I should be there btw
15:37:53 <mattmceuen> That is quickly approaching isn't it
15:37:55 <srwilkers> only plans ive made are which breweries i plan on drinking at
15:38:12 <rwellum> +1
15:38:28 <srwilkers> (wynkoop has a 14% IPA)
15:38:30 <srwilkers> just sayin
15:38:38 <mattmceuen> we need to come up with some discussion topics soon, to keep srwilkers and rwellum from drinking too much
15:39:06 <rwellum> Yeah last PTG was a complete blank - don't hang out with the Cinder folks...
15:39:13 <mattmceuen> :D
15:39:14 <srwilkers> drunken discussion is best discussion.  we need to hit the Ballmer peak
15:40:21 <mattmceuen> We will have highly focused and well-planned sessions with plenty of coffee, followed by responsible fun afterwards :)
15:40:24 * mattmceuen PSA complete
15:40:32 <srwilkers> this guy
15:40:47 <mattmceuen> Alright guys :)
15:40:58 <mattmceuen> Thanks for the discussion today - have a great week!
15:41:07 <mattmceuen> #endmeeting