15:00:30 <mattmceuen> #startmeeting openstack-helm
15:00:31 <openstack> Meeting started Tue Dec 12 15:00:30 2017 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:34 <openstack> The meeting name has been set to 'openstack_helm'
15:00:39 <mattmceuen> #topic Rollcall
15:00:44 <mattmceuen> GM everyone!
15:00:53 <raymaika> o/
15:00:53 <srwilkers> o/
15:00:57 <srwilkers> it's morning
15:01:02 <srwilkers> won't say if it's good yet or not
15:01:22 <mattmceuen> Here's the agenda - I'll give folks a couple mins to fill it out https://etherpad.openstack.org/p/openstack-helm-meeting-2017-12-12
15:02:01 <mattmceuen> srwilkers: it's gonna be a great day man
15:02:15 <portdirect> o/
15:02:23 <srwilkers> our optimistic captain at the helm would never steer us wrong
15:02:43 <mattmceuen> #titanic
15:02:57 <portdirect> too soon mattmceuen
15:02:59 <portdirect> too soon
15:03:05 <srwilkers> oof
15:03:28 <srwilkers> i'm more of a hunt for red october kinda guy
15:03:38 <mattmceuen> alrighty let's get this show on the road
15:03:54 <mattmceuen> #topic Dependencies on non-default images
15:04:34 <mattmceuen> Let me lay this out for y'all. I want to accomplish two things here:
15:04:35 <mattmceuen> 1) Come up with a general principle for us OSH engineers to apply
15:04:35 <mattmceuen> 2) Come up with a tactical plan for Hyunsun's PS
15:05:07 <mattmceuen> First the problem statement: Hyunsun has a PS that has a feature (lbaas plugin), which is turned off by default
15:05:37 <mattmceuen> The feature doesn't work with the default neutron kolla 3.2.0 image that we have configured
15:06:05 <mattmceuen> It doesn't cause any issues unless you turn the feature on, but if you turn it on, you also have to switch out the image to support the feature
15:06:43 <mattmceuen> That's something that either needs to be documented very well (with a reference to an image you can use which supports the feature), or, we should apply a "feature must wait till the default images support it" rule
15:07:05 <mattmceuen> So that's the RFC for you all. I've heard both opinions.
15:07:58 <mattmceuen> I am personally leaning toward "don't merge until the default image supports it"
15:08:21 <mattmceuen> Perhaps leaving the door open for "... unless there is a really special circumstance we haven't thought of yet"
15:08:41 <portdirect> Agreed, though we should confirm if the 4.0.0 image works ok with newton
15:08:45 <mattmceuen> Otherwise we could end up with spaghetti dependencies
15:08:56 <mattmceuen> you're skipping ahead portdirect!
15:09:07 <portdirect> lol - I'll be quiet ;)
15:09:36 <mattmceuen> Any dissenting or reaffirming opinions?
15:09:39 <srwilkers> i'm skeptical of having a default image that doesn't work, as it shouldn't be considered the default at that point
15:09:48 <mattmceuen> Everyone on xmas vacation already? :-D
15:10:10 <srwilkers> it's nothing but a placeholder then
15:10:31 <portdirect> ++
15:10:41 <mattmceuen> Alrighty:
15:11:01 <mattmceuen> #agreed features should not be merged until they are supported by the default images, even if they're turned off by default
15:11:11 <mattmceuen> Next: let's get tactical
15:11:48 <mattmceuen> lbaas is supposed to be supported since kilo, and Hyunsun has or will file a bug with kolla for not supporting it
15:12:17 <portdirect> I suspect that will not yield joy as newton is eol.
15:12:20 <mattmceuen> We could potentially swap in the kolla 4.0.0 image just for the needful image, or swap in a loci image if it supports it out of the box
15:12:20 <srwilkers> in the past, we've had issues getting kolla to provide fixes to images that don't work with the charts we're building
15:12:25 <srwilkers> plus what portdirect said
15:12:54 <portdirect> I think the first step would be to see if a 4.0.0 image works with 3.0.3 - if it does, great
15:13:01 <srwilkers> portdirect: ++
15:13:09 <mattmceuen> Agree - I will pass that on to Hyunsun. Thanks guys.
15:13:23 <portdirect> on that ps as well while we are here
15:13:51 <srwilkers> tlam__: late to the party!
15:13:52 <portdirect> we will need to test it a bit - as the lbaas agent with haproxy used to be prone to leaving zombie processes about
15:13:58 <tlam__> o/ sorry was running late
15:14:09 <portdirect> so we may need an init system in that pod to reap them...
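A minimal sketch of the zombie-process concern raised above: the function below counts unreaped zombie (state `Z`) processes by scanning `/proc`, which is one way a chart's health script could detect the problem in the lbaas agent pod. Running the agent under a lightweight init such as dumb-init or tini as PID 1 is the usual fix — naming those tools is an assumption here, not a decision from this meeting.

```shell
#!/bin/sh
# Sketch: count zombie processes inside a pod. An init system as PID 1
# (e.g. dumb-init or tini -- an illustrative choice, not from the
# meeting) would actually reap them; this just detects the symptom.
count_zombies() {
  count=0
  for stat in /proc/[0-9]*/stat; do
    # /proc/<pid>/stat is "pid (comm) state ...". Strip everything up
    # to the closing paren, then take the first field: the state char.
    state=$(sed -e 's/^.*) *//' -e 's/ .*//' "$stat" 2>/dev/null)
    [ "$state" = "Z" ] && count=$((count + 1))
  done
  echo "$count"
}

count_zombies
```

On a healthy node this prints a small number (usually 0); a steadily growing count in the lbaas agent pod would confirm the reaping problem.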
15:14:11 <mattmceuen> (the PS by the way is https://review.openstack.org/#/c/522162/)
15:14:14 * portdirect finished rant
15:14:35 * srwilkers thinks the patchset needs more cowbell
15:14:41 <mattmceuen> shortest rant I've heard out of you yet portdirect, you've been refining your style
15:15:09 <mattmceuen> Yup Hyunsun affirmed the zombie apocalypse this morning
15:15:10 <portdirect> I just want LBaaS in :D it's a great thing for us to use with magnum :)
15:15:31 <mattmceuen> amen
15:15:33 <portdirect> Kubernetes on OpenStack on Kubernetes awaits :D
15:15:47 * srwilkers groans
15:15:50 <mattmceuen> turtles all the way down
15:15:53 * portdirect dares not abbreviate that.
15:16:07 <mattmceuen> Next:
15:16:14 <mattmceuen> #topic Fluentd Chart
15:16:18 <mattmceuen> Take it away srwilkers!
15:16:44 <srwilkers> the patchset in question is: https://review.openstack.org/#/c/519548/
15:17:12 <srwilkers> sungil and jayahn have done a great job at getting this work started, and i feel bad that it's moved destinations twice as we've worked to get osh-infra sorted
15:18:00 <srwilkers> i think the work's almost there, but it might need some tweaking to really shine. i think the charts need to be separated to appropriately handle rbac for both services without getting too confusing
15:18:51 <mattmceuen> do we have jayahn?
15:18:54 <srwilkers> i also think the configuration files need to be defined in values.yaml to allow for customization of the filters and matches for complex use cases
15:18:56 <srwilkers> i don't think we do :(
15:19:00 <portdirect> how come split for rbac?
15:19:32 <portdirect> would it not just be a rbac-fluent.yaml, and rbac-fluentbit.yaml ?
15:19:54 <srwilkers> the helm-toolkit function names the entrypoints by release
15:19:59 <portdirect> totally agree on moving configs to values.
15:20:09 <srwilkers> so splitting them out in the way you mentioned results in duplicate names
15:20:28 <portdirect> but the entrypoint service account would be the same for both
15:21:26 <srwilkers> okay, that's a misunderstanding on my part then
15:21:55 <portdirect> though it does touch on tin's rbac work - and how much simpler that will make things - can we add that to the parking lot
15:22:27 <mattmceuen> yup
15:22:34 <mattmceuen> Can the values file configurability be done in a follow-on PS?
15:23:18 <srwilkers> it could be. the prime value add there in my mind is that we could then configure fluentd to capture the logs running in the osh-infra gates
15:24:21 <mattmceuen> Cool - I'm looking forward to getting the great work to date merged if possible
15:24:54 <mattmceuen> So where did you land srwilkers - do you think we need to split the fluentd chart after all?
15:26:30 <srwilkers> it's my opinion that it'd make things cleaner and i don't think the collector and aggregator need to be coupled in the same chart, but that's just my opinion
15:26:40 <srwilkers> i'm not entirely stuck on it
15:27:36 <mattmceuen> Would that be overly difficult to change later if we went down the single-chart path today?
15:27:40 <srwilkers> nah
15:28:04 <MarkBaker> o/
15:28:10 <srwilkers> hey MarkBaker
15:28:15 <mattmceuen> awesome - I am in git 'er merged mode as the holidays approach :-D
15:28:21 <mattmceuen> GM MarkBaker!
15:28:36 <srwilkers> let me make sure nothing else needs to be cleaned up in that patchset, then it should be good to go
15:28:38 <alanmeadows> Egg Nog in one hand, +2 mouse in the other -- sounds dangerous.
15:28:56 <mattmceuen> alanmeadows same hand
15:29:03 <alanmeadows> nice
15:29:08 <portdirect> why does 'merica not understand the benefits of mulled wine?
15:29:21 <mattmceuen> sounds like a cultural learning opportunity
15:29:24 <mattmceuen> srwilkers you keep the talking stick
15:29:28 <MarkBaker> alanmeadows, drinks egg nogs that require 2 hands?
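The "configs defined in values.yaml" idea from the fluentd discussion could look something like the sketch below. The values layout and key names (`conf.fluentd.filters`, `conf.fluentd.matches`) are hypothetical — the actual structure depends on how the chart in review 519548 ends up shaped — but it illustrates the point: operators override filters and matches at deploy time instead of editing templated config files.

```shell
# Sketch: a deploy-time values override for fluentd filters/matches.
# All key names here are hypothetical, for illustration only.
cat > /tmp/fluentd-values.yaml <<'EOF'
conf:
  fluentd:
    filters: |
      <filter kube.**>
        @type kubernetes_metadata
      </filter>
    matches: |
      <match **>
        @type elasticsearch
      </match>
EOF

# The override would then be applied when deploying the chart, e.g.:
#   helm upgrade --install fluentd ./fluentd --values /tmp/fluentd-values.yaml
echo "override written to /tmp/fluentd-values.yaml"
```

This is also what would let the osh-infra gates capture their own logs, as srwilkers notes above: the gate job would just ship a gate-specific values file.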
15:29:29 <mattmceuen> #topic Prometheus 2.0
15:29:51 <mattmceuen> alanmeadows is a legit pro at egg nog
15:29:58 <alanmeadows> It comes in a stein
15:30:07 <srwilkers> so prometheus 2.0 was released a bit ago. it brought some benefits i'm happy to see
15:30:21 <srwilkers> the storage layer was drastically reworked to improve performance and reduce resource consumption
15:31:00 <srwilkers> it also changed the rules format from gotpl to yaml, which makes me especially happy
15:31:21 * portdirect does happy dance
15:31:37 <srwilkers> i've got a patchset to change the prometheus chart in osh-infra to use prometheus 2.0 by default
15:32:00 <srwilkers> there are a few other items i want to get merged first before looking to merge it, but it works currently
15:32:14 <alanmeadows> That would be fantastic, one primary concern surrounding prometheus up until this point was its resource consumption
15:32:19 <srwilkers> one of the new storage features added was the ability to snapshot the time series database
15:32:49 <srwilkers> alanmeadows: yeah, i've had a few instances running at home and it wasn't uncommon for prometheus to fall over after chewing through resources
15:33:28 <srwilkers> i was curious if there was appetite for including a cron job in the prometheus chart for snapshotting the database at configured intervals
15:35:15 <portdirect> srwilkers: what would the objective of the cron job be? backup?
15:35:23 <alanmeadows> Beyond that, we should think about how we might trigger that action as well, and how we might apply the same approach to things like mariadb - preupgrade actions across all of these data warehouses
15:35:29 <srwilkers> portdirect: yep
15:35:48 <mattmceuen> so prometheus can have multiple servers replicating the same data in case one goes down. Would we be using it that way?
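The snapshot cron job srwilkers floats above maps onto the TSDB admin API that Prometheus 2.0 introduced: a `POST` to `/api/v1/admin/tsdb/snapshot`, which is only available when the server is started with `--web.enable-admin-api`. A minimal sketch, with the in-cluster service URL as a placeholder assumption:

```shell
# Sketch: what a snapshot CronJob's script could boil down to.
# PROM_URL is a placeholder for the in-cluster prometheus service.
PROM_URL="${PROM_URL:-http://prometheus:9090}"

snapshot_endpoint() {
  # Prometheus 2.0 TSDB admin API; requires the server to run with
  # --web.enable-admin-api, otherwise the endpoint is not exposed.
  echo "${PROM_URL}/api/v1/admin/tsdb/snapshot"
}

# The CronJob container would run something like:
#   curl -s -XPOST "$(snapshot_endpoint)"
# which returns JSON naming a snapshot directory under the TSDB data
# path, ready to be copied off as a backup.
snapshot_endpoint
```

Triggering the same action on demand, rather than on a schedule, is exactly the gap the `helm fire-hook` idea discussed next would fill.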
15:35:51 <srwilkers> alanmeadows: also agree
15:35:54 <portdirect> we could really do with a `helm fire-hook foo`
15:36:02 <portdirect> that operates the same way test does
15:36:09 <alanmeadows> yes
15:36:21 <alanmeadows> really the ask would just be to make 'test' arbitrary
15:37:01 <portdirect> should we look into the feasibility of making a ps for that?
15:37:17 <portdirect> ohh github - s/ps/pr
15:37:22 <mattmceuen> I like that idea
15:37:23 <srwilkers> i think that'd make sense
15:37:26 <alanmeadows> It satisfies two outstanding asks, being able to break tests apart into impacting vs non-impacting
15:37:37 <alanmeadows> and arbitrary actions like backups/snapshots/reversions/... ?
15:38:14 <portdirect> i think so
15:38:28 <portdirect> give us a new hammer, and we'll find nails...
15:38:45 <srwilkers> :)
15:38:52 <alanmeadows> or just hit things
15:38:59 <srwilkers> that too
15:39:06 <mattmceuen> Any other prom bits you want to cover now srwilkers? I'm looking fwd to 2.0
15:39:08 <srwilkers> but that concludes my points there
15:39:12 <srwilkers> nope, that's it for me
15:39:30 <mattmceuen> cool. portdirect get ready
15:39:44 <mattmceuen> #topic The Future of Ceph!
15:39:58 <alanmeadows> Is this, the Ceph of Tomorrow, Today?
15:40:10 <mattmceuen> give us a glimpse of this amazing future, technologist
15:40:17 <portdirect> it's the ceph of the future, tomorrow.
15:40:28 * alanmeadows sips some nog.
15:40:44 <portdirect> at kubecon i had some good chats with the ceph peeps re the various ceph-chart efforts
15:41:14 <portdirect> and I think (ok, hope) that we all have the same desire for there to be one well maintained chart that deploys ceph
15:41:29 <portdirect> rather than the 3 or so versions I know of today.
15:41:51 <mattmceuen> variety is the spice of maintenance
15:42:00 <portdirect> the chart used by ceph/ceph-helm is actually a fork of ours, which in turn is a chartified version of seb-han's work
15:42:40 <portdirect> I put a summary of the steps that we hashed out to get to a single chart in the etherpad
15:42:54 <portdirect> for the sake of meeting logging I'll paste them here
15:44:14 <portdirect> As ceph goes much further than just OpenStack, it makes sense for this to be hosted either by Ceph, or in k8s/charts
15:44:15 <portdirect> ceph/ceph-helm is based on the osh ceph chart from approx 3 months ago
15:44:15 <portdirect> We met with the ceph maintainers (core team) at kubecon and discussed their desires/issues with both of our charts and came up with the following proposals:
15:44:15 <portdirect> 1) Split Keystone endpoint creation out of the ceph chart and into its own thing (that would live in OSH)
15:44:15 <portdirect> 2) Merge the healthchecks from OSH into Ceph-Helm
15:44:15 <portdirect> 3) Merge the luminous support from Ceph-Helm into OSH
15:44:15 <portdirect> 4) Update the loopback device creation scripts from bash to ansible
15:44:16 <portdirect> 5) Combine the disc targeting efforts from both OSH and Ceph-Helm into a single effort that brings together the reliability of RH's approach with the OSD by bus-id from OSH
15:44:16 <portdirect> 6) The Ceph-Helm chart will then be moved/mirrored to k8s/charts
15:44:17 <portdirect> 7) At this point, add OSH gates to experimentally use the Ceph-Helm chart
15:44:17 <portdirect> 8) Once stabilised and we have confidence, deprecate the OSH ceph chart
15:44:45 <portdirect> the order is obviously somewhat flexible - but as a general outline how does this seem?
15:47:16 <mattmceuen> digesting...
15:47:23 <alanmeadows> What is the destination, for example in #2 -- ceph/ceph-helm or k8s/charts?
15:47:45 <portdirect> ceph/ceph-helm
15:48:08 <alanmeadows> is this mishmash of combinations in various targets before aligning on one target because this spans a large period of time?
15:48:12 <portdirect> and then once the majority of big changes are done we move to k8s/charts
15:48:36 <portdirect> i would like us at 7 by eoy
15:48:49 <alanmeadows> i.e. #2 does work in ceph-helm, #3 in osh
15:48:50 <portdirect> and 8 in the first two weeks of next
15:49:34 <portdirect> yup - I have merge rights in ceph/ceph-helm to facilitate this moving faster
15:49:44 <jayahn> Hi, late
15:49:51 <mattmceuen> hey jayahn!
15:50:32 <mattmceuen> portdirect: s/disc/disk/ and then I like the plan
15:50:52 <srwilkers> hey jayahn
15:50:55 <jayahn> Just fell asleep. :)
15:51:16 <jayahn> While waiting for the meeting
15:51:38 <srwilkers> just curious portdirect, as i haven't paid much attention to the ceph work. does the luminous support include enabling the built-in prometheus metrics exporter via ceph-mgr?
15:52:08 <srwilkers> as that makes the ceph-exporter work something we can drop once that's accomplished i think
15:53:17 <portdirect> srwilkers: it does :D
15:53:27 <srwilkers> nice :)
15:54:00 <mattmceuen> I think your plan is the plan portdirect, unless there are any other thoughts
15:54:37 <alanmeadows> It gets us to a unified chart the community owns, I'm all good
15:54:43 <mattmceuen> t minus 5 mins
15:55:22 <mattmceuen> and still agenda items - may have to punt till next week. alanmeadows, will yours fit in 5?
15:55:25 <jayahn> unified ceph chart. sounds really good to me
15:56:37 <portdirect> if we can fit in alanmeadows's topic that would be great
15:56:43 <mattmceuen> #topic Holistic etcd approach
15:57:19 <mattmceuen> to quote alanmeadows: Holistic etcd approach
15:57:19 <mattmceuen> Various charts trying to use etcd, can we (and should we) unify an approach, or let etcds sprinkle the cloud?
15:57:19 <mattmceuen> e.g. https://review.openstack.org/#/c/525752/
15:57:19 <mattmceuen> Rabbit would likely follow in approach at some point
15:57:19 <mattmceuen> Calico ....
15:57:37 <alanmeadows> I see a few different etcds popping up
15:58:12 <alanmeadows> This seems like we need to tackle this one, if nothing else to be cognizant of what we're doing
15:59:03 <portdirect> agreed - I'd like us to get a solid etcd chart that we can use
15:59:04 <mattmceuen> Start with a spec of one etcd chart to rule them all?
15:59:29 <alanmeadows> I think so, with a few harder needs in mind
15:59:40 <alanmeadows> not just resiliency but backups, disaster recovery, and so on
15:59:44 <mattmceuen> Let's let this marinate and continue the discussion next time - we're out of time, friends
15:59:51 <mattmceuen> thanks everyone
16:00:07 <mattmceuen> see y'all in #openstack-helm !
16:00:10 <mattmceuen> #endmeeting