15:00:30 #startmeeting openstack-helm
15:00:31 Meeting started Tue Dec 12 15:00:30 2017 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:34 The meeting name has been set to 'openstack_helm'
15:00:39 #topic Rollcall
15:00:44 GM everyone!
15:00:53 o/
15:00:53 o/
15:00:57 its morning
15:01:02 wont say if its good yet or not
15:01:22 Here's the agenda - I'll give folks a couple mins to fill it out https://etherpad.openstack.org/p/openstack-helm-meeting-2017-12-12
15:02:01 srwilkers: it's gonna be a great day man
15:02:15 o/
15:02:23 our optimistic captain at the helm would never steer us wrong
15:02:43 #titanic
15:02:57 too soon mattmceuen
15:02:59 too soon
15:03:05 oof
15:03:28 im more of a hunt for red october kinda guy
15:03:38 alrighty let's get this show on the road
15:03:54 #topic Dependencies on non-default images
15:04:34 Let me lay this out for y'all. I want to accomplish two things here:
15:04:35 1) Come up with a general principle for us OSH engineers to apply
15:04:35 2) Come up with a tactical plan for Hyunsun's PS
15:05:07 First the problem statement: Hyunsun has a PS that has a feature (lbaas plugin), which is turned off by default
15:05:37 The feature doesn't work with the default neutron kolla 3.2.0 image that we have configured
15:06:05 It doesn't cause any issues unless you turn the feature on, but if you turn it on, you also have to switch out the image to support the feature
15:06:43 That's something that either needs to be documented very well (with a reference to an image you can use which supports the feature), or, we should apply a "feature must wait till the default images support it" rule
15:07:05 So that's the RFC for you all. I've heard both opinions.
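[Editor's note: "switching out the image" for a non-default feature in an OSH chart is a values override; a minimal sketch assuming the usual `images.tags` layout - the image tag shown is illustrative, not a tested reference:]

```yaml
# Illustrative values override: turning the optional lbaas feature on
# means also pointing the chart at an image that supports it
# (the tag below is hypothetical).
images:
  tags:
    neutron_server: docker.io/kolla/ubuntu-source-neutron-server:4.0.0
```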
15:07:58 I am personally leaning toward "don't merge until the default image supports"
15:08:21 Perhaps leaving the door open for "... unless there is a really special circumstance we haven't thought of yet"
15:08:41 Agreed, though we should confirm if the 4.0.0 image works ok with newton
15:08:45 Otherwise we could end up with spaghetti dependencies
15:08:56 you're skipping ahead portdirect!
15:09:07 lol - I'll be quiet ;)
15:09:36 Any dissenting or reaffirming opinions?
15:09:39 im skeptical of having a default image that doesnt work. as it shouldn't be considered the default at that point
15:09:48 Everyone on xmas vacation already? :-D
15:10:10 its nothing but a placeholder then
15:10:31 ++
15:10:41 Alrighty:
15:11:01 #agreed features should not be merged until they are supported by the default images, even if they're turned off by default
15:11:11 Next: let's get tactical
15:11:48 lbaas is supposed to be supported since kilo, and Hyunsun has or will file a bug with kolla for not supporting it
15:12:17 I suspect that will not yield joy as newton is eol.
15:12:20 We could potentially swap in the kolla 4.0.0 image just for the needful image, or swap in a loki image if it supports it out of box
15:12:20 in the past, we've had issues getting kolla to provide fixes to images that don't work with the charts we're building
15:12:25 plus what portdirect said
15:12:54 I think the first step would be to see if a 4.0.0 image works with 3.0.3 - if it does great
15:13:01 portdirect: ++
15:13:09 Agree - I will pass that on to Hyunsun. Thanks guys.
15:13:23 on that ps as well while we are here
15:13:51 tlam__: late to the party!
15:13:52 we will need to test it a bit - as lbaas agent with haproxy used to be prone to leaving zombie processes about
15:13:58 o/ sorry was running late
15:14:09 so we may need an init system in that pod to reap them...
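[Editor's note: one common mitigation for the zombie-reaping concern is running the agent under a minimal init as PID 1. A sketch, not taken from the PS itself - dumb-init is just one option, and the container and command names below are made up:]

```yaml
# Hypothetical container spec fragment: a minimal init (e.g. dumb-init)
# as PID 1 reaps defunct haproxy children left behind by the lbaas agent.
containers:
  - name: neutron-lbaas-agent
    command:
      - dumb-init
      - --
      - neutron-lbaasv2-agent
```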
15:14:11 (the PS by the way is https://review.openstack.org/#/c/522162/)
15:14:14 * portdirect finished rant
15:14:35 * srwilkers thinks the patchset needs more cowbell
15:14:41 shortest rant I've heard out of you yet portdirect, you've been refining your style
15:15:09 Yup Hyunsun affirmed the zombie apocalypse this morning
15:15:10 I just want LBaaS in :D its a great thing for us to use with magnum :)
15:15:31 amen
15:15:33 Kubernetes on OpenStack on Kubernetes awaits :D
15:15:47 * srwilkers groans
15:15:50 turtles all the way down
15:15:53 * portdirect dares not abbreviate that.
15:16:07 Next:
15:16:14 #topic Fluentd Chart
15:16:18 Take it away srwilkers!
15:16:44 the patchset in question is: https://review.openstack.org/#/c/519548/
15:17:12 sungil and jayahn have done a great job at getting this work started, and i feel bad that it's moved destinations twice as we've worked to get osh-infra sorted
15:18:00 i think the work's almost there, but it might need some tweaking to really shine. i think the charts need to be separated to appropriately handle rbac for both services without getting too confusing
15:18:51 do we have jayahn?
15:18:54 i also think the configuration files need to be defined in values.yaml to allow for customization of the filters and matches for complex use cases
15:18:56 i dont think we do :(
15:19:00 how come split for rbac?
15:19:32 would it not just be a rbac-fluent.yaml, and rbac-fluentbit.yaml ?
15:19:54 the helm-toolkit function names the entrypoints by release
15:19:59 totally agree on moving configs to values.
15:20:09 so splitting them out in the way you mentioned results in duplicate names
15:20:28 but the entrypoint service account would be the same for both
15:21:26 okay, thats a misunderstanding on my part then
15:21:55 though it does touch on tins rbac work - and how much simpler that will make things - can we add that to the parking lot
15:22:27 yup
15:22:34 Can the values file configurability be done in a follow-on PS?
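[Editor's note: to illustrate the values.yaml configurability discussed above, a hedged sketch of what exposing fluentd filters and matches might look like - the keys and content are hypothetical, not the patchset's actual layout:]

```yaml
# Hypothetical values layout: operators override filters/matches here
# rather than editing a templated fluent.conf.
conf:
  fluentd:
    filters: |
      <filter kube.**>
        @type kubernetes_metadata
      </filter>
    matches: |
      <match **>
        @type elasticsearch
        host elasticsearch-logging
        port 9200
      </match>
```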
15:23:18 it could be. the prime value add there in my mind is that we could then configure fluentd to capture the logs running in the osh-infra gates
15:24:21 Cool - I'm looking forward to getting the great work to date merged if possible
15:24:54 So where did you land srwilkers - do you think we need to split the fluentd chart after all?
15:26:30 its my opinion that it'd make things cleaner and i dont think the collector and aggregator need to be coupled in the same chart, but thats just my opinion
15:26:40 im not entirely stuck on it
15:27:36 Would that be overly difficult to change later if we went down the single-chart path today?
15:27:40 nah
15:28:04 o/
15:28:10 hey MarkBaker
15:28:15 awesome - I am in git'er merged mode as the holidays approach :-D
15:28:21 GM MarkBaker!
15:28:36 let me make sure nothing else needs to be cleaned up in that patchset then should be good to go
15:28:38 Egg Nog in one hand, +2 mouse in the other -- sounds dangerous.
15:28:56 alanmeadows same hand
15:29:03 nice
15:29:08 why does `merica not understand the benefits of mulled wine?
15:29:21 sounds like a cultural learning opportunity
15:29:24 srwilkers you keep the talking stick
15:29:28 alanmeadows, drinks egg nogs that require 2 hands?
15:29:29 #topic Prometheus 2.0
15:29:51 alanmeadows is a legit pro at egg nog
15:29:58 It comes in a stein
15:30:07 so prometheus 2.0 was released a bit ago. it brought some benefits im happy to see
15:30:21 the storage layer was drastically reworked to improve performance and reduce resource consumption
15:31:00 it also changed the rules format from gotpl to yaml, which makes me especially happy
15:31:21 * portdirect does happy dance
15:31:37 ive got a patchset to change the prometheus chart in osh-infra to use prometheus 2.0 by default
15:32:00 there are a few other items i want to get merged first before looking to merge it, but it works currently
15:32:14 That would be fantastic, one primary concern surrounding prometheus up until this point was its resource consumption
15:32:19 one of the new storage features added was the ability to snapshot the time series database
15:32:49 alanmeadows: yeah, i've had a few instances running at home and it wasnt uncommon for prometheus to fall over after chewing through resources
15:33:28 i was curious if there was appetite for including a cron job in the prometheus chart for snapshotting the database at configured intervals
15:35:15 srwilkers: what would the objective of the cron job be? backup?
15:35:23 Beyond that, we should think about how we might trigger that action as well, and how we might apply the same approach to things like mariadb - preupgrade actions across all of these data warehouses
15:35:29 portdirect: yep
15:35:48 so prometheus can have multiple servers replicating the same data in case one goes down. Would we be using it that way?
15:35:51 alanmeadows: also agree
15:35:54 we could really do with a `helm fire-hook foo`
15:36:02 that operates the same way test does
15:36:09 yes
15:36:21 really the ask would just be make 'test' arbitrary
15:37:01 should we look into the feasibility of making a ps for that?
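[Editor's note: the snapshot discussed above is exposed via Prometheus 2.0's admin API, which must be enabled with `--web.enable-admin-api`. A cron job along the lines floated might look like this sketch - the names, schedule, and image are illustrative:]

```yaml
# Hypothetical CronJob hitting the Prometheus 2.0 snapshot endpoint
# on an interval (service name and image are placeholders).
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: prometheus-tsdb-snapshot
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: snapshot
              image: docker.io/appropriate/curl  # illustrative
              command:
                - curl
                - -XPOST
                - http://prometheus:9090/api/v1/admin/tsdb/snapshot
```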
15:37:17 ohh github - s/ps/pr
15:37:22 I like that idea
15:37:23 i think that'd make sense
15:37:26 It satisfies two outstanding asks, being able to break tests apart into impacting vs non-impacting
15:37:37 and arbitrary actions like backups/snapshots/reversions/... ?
15:38:14 i think so
15:38:28 give us a new hammer, and we'll find nails...
15:38:45 :)
15:38:52 or just hit things
15:38:59 that too
15:39:06 Any other prom bits you want to cover now srwilkers? I'm looking fw to 2.0
15:39:08 but that concludes my points there
15:39:12 nope, thats it for me
15:39:30 cool. portdirect get ready
15:39:44 #topic The Future of Ceph!
15:39:58 Is this, the Ceph of Tomorrow, Today?
15:40:10 give us a glimpse of this amazing future, technologist
15:40:17 its the ceph of the future, tomorrow.
15:40:28 * alanmeadows sips some nog.
15:40:44 at kubecon i had some good chats with the ceph peeps re the various ceph-chart efforts
15:41:14 and I think (ok hope) that we all have the same desire for there to be one well maintained chart that deploys ceph
15:41:29 rather than the 3 or so versions I know of today.
15:41:51 variety is the spice of maintenance
15:42:00 the chart used by ceph/ceph-helm is actually a fork of ours, which in turn is a chartified version of seb-han's work
15:42:40 I put a summary of the steps that we hashed out to get to a single chart in the etherpad
15:42:54 for the sake of meeting logging I'll paste them here
15:44:14 As ceph goes much further than just OpenStack, it makes sense for this to be hosted either by Ceph, or in K8s/charts
15:44:15 ceph/ceph-helm is based on the osh ceph chart from approx 3 months ago
15:44:15 We met with the ceph maintainers (core team) at kubecon and discussed their desires/issues with both of our charts and came up with the following proposals:
15:44:15 1) Split Keystone endpoint creation out of the ceph chart and into its own thing (that would live in OSH)
15:44:15 2) Merge the healthchecks from OSH into Ceph-Helm
15:44:15 3) Merge the luminous support from Ceph-Helm into OSH
15:44:15 4) Update the loopback device creation scripts from bash to ansible
15:44:16 5) Combine the disc targeting efforts from both OSH and Ceph-Helm into a single effort that brings the reliability of RH's approach together with the OSD-by-bus-id from OSH
15:44:16 6) The Ceph-Helm chart will then be moved/mirrored to k8s/charts
15:44:17 7) At this point, add an OSH gate to experimentally use the Ceph-Helm chart
15:44:17 8) Once stabilised and we have confidence, deprecate the OSH ceph chart
15:44:45 the order is obviously somewhat flexible - but as a general outline how does this seem?
15:47:16 digesting...
15:47:23 What is the destination, for example in #2 -- ceph/ceph-helm or K8s/charts?
15:47:45 ceph/ceph-helm
15:48:08 is this mishmash of combination in various targets before aligning on one target because this spans a large period of time?
15:48:12 and then once the majority of big changes are done we move to k8s/charts
15:48:36 i would like us at 7 by eoy
15:48:49 i.e. #2 does work in ceph-helm, #3 in osh
15:48:50 and 8 in the first two weeks of next
15:49:34 yup - I have merge rights in ceph/ceph-helm to facilitate this moving faster
15:49:44 Hi late
15:49:51 hey jayahn!
15:50:32 portdirect: s/disc/disk/ and then I like the plan
15:50:52 hey jayahn
15:50:55 Just fell asleep. :)
15:51:16 While waiting for the meeting
15:51:38 just curious portdirect, as i havent paid much attention to the ceph work. does the luminous support include enabling the built-in prometheus metrics exporter via ceph-mgr?
15:52:08 as that makes the ceph-exporter work something we can drop once that's accomplished i think
15:53:17 srwilkers: it does :D
15:53:27 nice :)
15:54:00 I think your plan is the plan portdirect, unless there are any other thoughts
15:54:37 It gets us to a unified chart the community owns, I'm all good
15:54:43 t minus 5 mins
15:55:22 and still agenda items - may have to punt till next week. alanmeadows, will yours fit in 5?
15:55:25 unified ceph chart. sounds really good to me
15:56:37 if we can fit in alanmeadows's topic that would be great
15:56:43 #topic Holistic etcd approach
15:57:19 to quote alanmeadows: Holistic etcd approach
15:57:19 Various charts trying to use etcd, can we (and should we) unify an approach, or let etcds sprinkle the cloud?
15:57:19 e.g. https://review.openstack.org/#/c/525752/
15:57:19 Rabbit would likely follow in approach at some point
15:57:19 Calico ....
15:57:37 I see a few different etcds popping up
15:58:12 This seems like we need to tackle this one if nothing else to be cognizant of what we're doing
15:59:03 agreed - I'd like us to get a solid etcd chart that we can use
15:59:04 Start with a spec of one etcd chart to rule them all?
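[Editor's note: a spec for "one etcd chart to rule them all" might start by enumerating the values surface shared consumers would need; a purely hypothetical sketch - every key below is invented for illustration:]

```yaml
# Hypothetical values surface for a single shared etcd chart: consumers
# (calico, rabbit, ...) would differ only in overrides, not in chart code.
pod:
  replicas:
    etcd: 3
backup:
  enabled: true
  schedule: "0 */6 * * *"   # e.g. periodic etcdctl snapshot
storage:
  persistentVolumeClaim:
    enabled: true
    size: 10Gi
```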
15:59:29 I think so, with a few harder needs in mind
15:59:40 not just resiliency but backups, disaster recovery, and so on
15:59:44 Let's let this marinate and continue to discuss next time - we're out of time friends
15:59:51 thanks everyone
16:00:07 see y'all in the #openstack-helm !
16:00:10 #endmeeting