#openstack-meeting-5 log

15:00:41 <mattmceuen> #startmeeting openstack-helm
15:00:42 <openstack> Meeting started Tue Jun 19 15:00:41 2018 UTC and is due to finish in 60 minutes.  The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:45 <mattmceuen> #topic rollcall
15:00:45 <openstack> The meeting name has been set to 'openstack_helm'
15:01:08 <mattmceuen> GM all
15:01:20 <mattmceuen> Here's our agenda for today's OpenStack-Helm meeting: https://etherpad.openstack.org/p/openstack-helm-meeting-2018-06-19
15:01:43 <rwellum> o/
15:01:49 <mattmceuen> Please add anything to it you'd like to discuss, PS needing review, etc!
15:01:50 <radeks> o/
15:01:52 <mattmceuen> o/
15:02:10 <srwilkers> o/
15:02:14 <portdirect> o/
15:03:33 <rwellum> mattmceuen: radeks is from my team, we're going to be slowly attending more and doing more with osh as I've spoken about before.
15:03:42 <mattmceuen> Welcome radeks!
15:03:54 <mattmceuen> Glad to have you with us :)
15:04:01 <roman_g> o/
15:04:07 <mattmceuen> Alrighty - first off do we have Robert Choi in the house?
15:04:17 <mattmceuen> (o/ roman_g!)
15:04:44 <mattmceuen> He's got the first agenda item, but we can come back to him if he's not on yet.
15:05:02 <mattmceuen> #topic Doc Update Touchpoint
15:05:24 <mattmceuen> For anyone new around here, our documentation is a key area we want to enhance, grow, and groom
15:05:25 <portdirect> from reading the etherpad, im not sure if he or jayahn are here today?
15:05:40 <portdirect> +++ to that mattmceuen :)
15:05:57 <mattmceuen> oh, maybe I misunderstood "next week" -- maybe Robert meant next next week :)
15:06:30 <mattmceuen> We've been capturing some enhancements that should be made to the Multinode Install guide here: https://etherpad.openstack.org/p/openstack-helm-multinode-doc
15:07:05 <jayahn> i am here, but stuck in different meeting and task.
15:07:08 <mattmceuen> I've started implementing a number of bits in this PS: https://etherpad.openstack.org/p/openstack-helm-multinode-doc
15:07:11 <mattmceuen> Hey jayahn!
15:07:13 <jayahn> we meant next week irc meeting.
15:07:59 <roman_g> I saw PS from Matt, would review today/tomorrow. One thing I would add is answer to the question
15:08:02 <mattmceuen> Cool thanks for clarifying :)  for now I'll just paste in your agenda note for general awareness at the end, since that's good to be aware of
15:08:10 <mattmceuen> ^ jayahn
15:08:16 <roman_g> "What to do next once installation finishes?"
15:08:16 <mattmceuen> Thanks in advance roman_g
15:08:27 <rwellum> Wrong link mattmceuen ?
15:09:07 <portdirect> roman_g: use openstack - and be happy?
15:09:12 <lamt> o/ running late
15:09:27 <srwilkers> portdirect: hehe
15:09:28 <mattmceuen> whoops
15:09:40 <roman_g> portdirect: Matt: add link to https://docs.openstack.org/openstack-helm/latest/install/developer/exercise-the-cloud.html to the bottom of Multinode install page.
15:09:41 <mattmceuen> Someday I will master the art of copy and paste
15:09:42 <mattmceuen> https://review.openstack.org/#/c/576342/
15:11:12 <roman_g> @sigit in Slack was asking on #openstack-helm on Thursday, where could he get openrc file
15:11:28 <roman_g> and that exercise-the-cloud.html is the answer
15:12:18 <mattmceuen> So for that one roman_g, I think we could potentially copy a use-it script to the multinode install batch of scripts, rather than cross-linking from the multinode guide back to the dev scripts.
15:13:00 <roman_g> or just move this file from /developer/ subdir to one level up to /install/
15:13:01 <portdirect> roman_g: we just need to get better at refactoring what we have
15:13:07 <mattmceuen> However, the use-it script makes some assumptions around network setup, etc... not sure if that's a good idea for a multinode setup or not.  What do you think portdirect
15:13:12 <portdirect> for example the openrc is at the bottom of here: https://docs.openstack.org/openstack-helm/latest/install/developer/exercise-the-cloud.html
15:13:19 <portdirect> mattmceuen: totally agree
15:13:47 <portdirect> at the end of the multinode guide - we should have a 'real' openstack deployment
15:14:08 <mattmceuen> I suppose we could leave the scripts as-is and then just say at the bottom of the multinode guide something like, "for examples of how to exercise your new OSH cloud, please see <link> the developer guide"
15:14:22 <portdirect> which means the dev-kick-the-tyres script wont be relevant in 99% of cases.
15:14:57 <portdirect> we should find actual openstack docs on cloud use - and link there
15:15:05 <portdirect> if they dont exist, lets make them
15:15:26 <portdirect> as it should be the same regardless of deployment system: osh, osa, kolla, tripleo etc
15:16:43 <mattmceuen> ++
15:17:56 <mattmceuen> I added that to the etherpad
15:18:07 <mattmceuen> rwellum, I haven't gone through your additions to the etherpad yet in detail
15:18:19 <mattmceuen> Do you want to talk through them in brief here?
15:19:19 <rwellum> Yeah - for the most part it's digging out 'things' from the various playbooks that I was taking for granted when running the AIO
15:20:24 <rwellum> I mainly have issues with ceph - I still think some fundamentals are missing
15:20:36 <mattmceuen> What are you working toward with that - do you see the assumptions making it back into the multinode guide?  Or something standalone?
15:21:20 <rwellum> I am a little on the fence here - because to take the guide away from executing the gate scripts is quite a big step.
15:21:34 <rwellum> Maybe a third guide is in order?
15:21:34 <portdirect> im not sure i quite agree
15:22:14 <portdirect> in that once you had k8s setup, with hosts able to resolve k8s services dns, and ceph-common on the host
15:22:30 <portdirect> what more is/was required, other than building the charts?
15:22:45 <rwellum> Nothing - agreed
15:23:07 <rwellum> Yeah I regressed :( - I don't know why ceph is acting up for me now.
15:23:31 <portdirect> so from this point on: https://docs.openstack.org/openstack-helm/latest/install/multinode.html#deploy-openstack-helm
15:23:46 <portdirect> the guide should be totally agnostic of k8s deployment tooling
15:23:55 <portdirect> provided the above criteria are met
15:24:21 <mattmceuen> So after the PS above, the next PS I'll put in will split out the AIO setup entirely
15:24:44 <rwellum> There's a few assumptions regarding number of nodes etc - in the multinode scripts I believe - or?
15:24:51 <portdirect> nice - i think getting this in  - even if we just have a stub for the above points will really help
15:25:03 <mattmceuen> The idea is that the new multinode guide will still link to it as an example way to set up a dev-grade k8s cluster, but also to link out to more legit ways to set up a k8s cluster for prod use
15:25:16 <portdirect> ++
15:26:02 <rwellum> ok I can buy into that
15:26:30 <mattmceuen> That may help us refer out to good ways to stand up clusters independently of the rest of the guide, and avoid guide sprawl
15:27:14 <portdirect> agreed - we are already at risk of that - eg the multiple places we tell you how to set up sudo :)
15:27:16 <mattmceuen> I'll take a look at your material in there rwellum and chew on it as well with an eye toward how it can best fit in
15:28:16 <mattmceuen> So speaking of the troubleshooting doc - we also have an etherpad for that one https://etherpad.openstack.org/p/openstack-helm-troubleshooting
15:28:25 <rwellum> Ok - I also will document how I create the k8s cluster - as a potential example - but also on the fence because really - we don't need another k8s deployer :)
15:28:49 <rwellum> And it's as portdirect says - osh is agnostic
15:29:22 <mattmceuen> Def don't spend too much time on anything that you don't think would be valuable to others, though rwellum - some things are always going to be operator-specific
15:29:34 <portdirect> rwellum: can we not just link to kubeadm, kubespray and other community projects?
15:29:42 <srwilkers> portdirect: that would be the sane thing to do
15:30:12 <roman_g> link to the kubernetes - the hard way ;)
15:30:30 <mattmceuen> HAHA
15:30:38 <portdirect> ^^ actually we 100% should
15:30:40 <srwilkers> i think it's important we reduce how much kubernetes specific documentation/support we offer
15:30:55 <srwilkers> the hard way is how i learned
15:31:02 <srwilkers> it's actually pretty good
15:31:02 <portdirect> as virtually every deployment tool is based on it
15:31:08 <roman_g> same here
15:31:29 <rwellum> Yeah agreed
15:31:30 <mattmceuen> Yup I'm planning on linky-ing in the next PS, if you have any recommendations for good installers / guides etc for prod use please add them to the multinode etherpad!
15:31:30 <roman_g> it's nearly production install.
15:32:05 <mattmceuen> So for troubleshooting: https://etherpad.openstack.org/p/openstack-helm-troubleshooting
15:32:30 <mattmceuen> At the very very bottom I stashed a couple of errors I ran into during my recent multinode install adventure
15:32:48 <mattmceuen> Symptom: one MDS started fine, but another died, complaining in the logs about a feature flags incompatibility.
15:32:48 <mattmceuen> Cause:  This error was caused by an old version of `docker.io/ceph/daemon:tag-build-master-luminous-ubuntu-16.04` being cached on one node, which was incompatible with a newer version on another node.
15:32:48 <mattmceuen> Resolution:  Pull the updated docker image on the node with the sad mds
15:32:48 <mattmceuen> Error message:  mds.mds-ceph-mds-65bb45dffc-qfqnr handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
15:32:48 <mattmceuen> Symptom:  kube-system ingress (host networking) is running and openstack ingress (non-host networking) is failing
15:32:48 <mattmceuen> Cause:  The default calico pod subnet conflicts with a preexisting subnet in this environment
15:32:49 <mattmceuen> Resolution:  in the multinode-vars.yaml file, override the default via `kubernetes_cluster_pod_subnet: 10.25.0.0/16`
15:32:49 <mattmceuen> Error message: Readiness probe failed: Get http://192.168.23.131:10254/healthz: dial tcp 192.168.23.131:10254: getsockopt: connection refused
15:33:13 <srwilkers> well, i dont know if i'd say a production installation necessarily, but i like that it gives people exposure to whats going on, instead of just throwing kubeadm at the wall and getting a cluster
15:33:16 <srwilkers> but thats just me
15:33:23 <mattmceuen> IRC rendered that sadly -- that's two different errors.  But the point is that I think this might be a straightforward approach to recording and helping people solve common issues
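The second entry above (pod subnet conflict) can be checked before deploying. The following is a minimal sketch, not part of the OSH tooling: it assumes the default pod subnet is 192.168.0.0/16 (Calico's usual default) and that the environment already uses 192.168.23.0/24, which is only inferred from the address in the error message. The function name `conflicting_subnets` is hypothetical.

```python
import ipaddress


def conflicting_subnets(pod_subnet, existing_subnets):
    """Return the existing subnets that overlap the proposed pod subnet."""
    pod = ipaddress.ip_network(pod_subnet)
    return [s for s in existing_subnets
            if pod.overlaps(ipaddress.ip_network(s))]


# Assumed values for illustration only; substitute the subnets in your environment.
existing = ["192.168.23.0/24"]

# Assumed default pod subnet: overlaps the preexisting network.
print(conflicting_subnets("192.168.0.0/16", existing))

# The override suggested in the resolution: no overlap.
print(conflicting_subnets("10.25.0.0/16", existing))
```

If the first call returns a non-empty list, overriding `kubernetes_cluster_pod_subnet` as described in the resolution is the likely fix.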
15:33:33 <rwellum> There's a 'WHAT'S NEXT!!' placeholder in that guide too - I added, my subtle way of asking for a core who knows osh to add to it.
15:33:36 <mattmceuen> Symptom / Cause / Resolution / Error Message
15:33:37 <rwellum> +1 mattmceuen
15:33:48 <mattmceuen> short & sweet & googleable
15:34:06 <portdirect> sounds good
15:34:17 <portdirect> if we can get a nice start on this
15:34:23 <mattmceuen> Interested in feedback, is that what a troubleshooting guide should look like?
15:34:32 <portdirect> then we should try to answer questions via irc/slack as ps's
15:34:46 <portdirect> so they are kept as reference
15:34:48 <mattmceuen> I like that
15:35:02 <portdirect> i find that 90% of support is answering the same things
15:35:07 <rwellum> My thought for the TS guide is that before someone opens a bug on osh, we guide them to this guide.
15:35:20 <mattmceuen> ++
15:35:24 <portdirect> and its my bad for not documenting them, but this would lower the barrier to that loads mattmceuen
15:36:12 <mattmceuen> Cool, I will add those couple errors as a start into the TS guide and we squint at it, and then continually add to it
15:36:53 <mattmceuen> Alright - anything else on the Doc front before moving along?
15:37:11 <mattmceuen> Good discussion guys and appreciate all the attention on this since last week
15:37:31 <mattmceuen> Alrighty srwilkers you're up!
15:37:37 <mattmceuen> #topic Logging Updates
15:37:38 <srwilkers> cool
15:37:49 <mattmceuen> fluentbit sidecar for ceph-mon and ceph-osd:  https://review.openstack.org/#/c/575832/
15:37:49 <mattmceuen> change to logging.conf for openstack services:  https://review.openstack.org/#/c/576001/
15:38:16 <srwilkers> i've proposed adding fluentbit sidecars to the ceph-mon and ceph-osd charts, to allow us to gather the logs that get placed in /var/log/ceph in those pods
15:38:57 <rwellum> My question here srwilkers was why not make it a default?
15:39:37 <portdirect> as that creates a hard dependency on fluentd
15:39:37 <srwilkers> rwellum: it's not set as a default currently as we don't deploy fluentd in the single or multinode gates for openstack-helm
15:39:44 <srwilkers> and what portdirect said
15:40:04 <portdirect> and we want these charts to be composable simply
15:40:11 <srwilkers> it follows what we've done with the prometheus exporters tied to things like rabbitmq and mariadb -- we leave them disabled by default so as not to create dependencies or assumptions
15:40:43 <srwilkers> but for those who want to use it, it provides additional insight into ceph logged events
15:41:08 <portdirect> also leaves the door open for people to add alternate log aggregators
15:41:23 <rwellum> Ok makes sense
15:42:38 <srwilkers> in addition, we can also use the tags on the logged events to possibly add some sane fluentd filters in the future if we want
15:43:14 <mattmceuen> srwilkers for the logging.conf change - looks awesome; would it prevent openstack logs from going to stdout by default?  I.e. breaking kubectl logs?
15:43:47 <srwilkers> no.  you can define multiple handlers
15:43:53 <mattmceuen> neat
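The "multiple handlers" point can be illustrated with Python's standard logging module. This is a sketch under stated assumptions, not the contents of the proposed logging.conf change: the second handler writes to an in-memory buffer as a stand-in for a fluentd-bound handler, and `build_logger` is a hypothetical helper name.

```python
import io
import logging
import sys


def build_logger(name, aggregator_stream):
    """Attach two handlers so every record reaches both destinations."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()

    # Handler 1: stdout, which is what `kubectl logs` reads from the container.
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(stdout_handler)

    # Handler 2: stand-in for a handler that forwards to a log aggregator.
    aggregator_handler = logging.StreamHandler(aggregator_stream)
    aggregator_handler.setFormatter(
        logging.Formatter("%(name)s|%(levelname)s|%(message)s"))
    logger.addHandler(aggregator_handler)
    return logger


buffer = io.StringIO()
log = build_logger("nova", buffer)
log.info("instance spawned")   # written to stdout and to the buffer
```

Because each handler is independent, adding the aggregator handler does not remove the stdout output.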
15:44:12 <rwellum> There's a difference between fluentbit and fluentd though right?
15:44:20 <srwilkers> but the cool thing with that change is that for any version >= ocata, we can use the fluent formatters
15:44:31 <mattmceuen> that will be really great then
15:44:44 <portdirect> srwilkers: I'll get some new images published with that over the next 48 hours
15:44:45 <srwilkers> rwellum: functionality is the same, but the big difference is that fluentbit has a much smaller resource footprint than fluentd
15:45:12 <rwellum> Yeah was confused because you said fluentbit sidecars
15:45:14 <srwilkers> so we use fluentbit for the sidecar, then forward the messages to a fluentd serving as an aggregator
15:45:27 <rwellum> Yeah makes sense
15:45:37 <rwellum> I'll go read the PS again
15:45:54 <srwilkers> but the fluent formatter and handler for the openstack services makes me happy, as we can send the logs directly to fluentd
15:46:07 <srwilkers> instead of stdout > fluentbit > fluentd
15:46:18 <srwilkers> this also gets us something we've wanted for awhile
15:46:56 <srwilkers> using the fluent formatter and handler gets us full stacktraces when they're raised
15:47:04 <mattmceuen> woo hoo!!
15:47:18 <srwilkers> can also get us tags for things like the project name, host name, etc
15:47:26 <mattmceuen> stacktraces - the most important part of logging :)
15:47:46 <srwilkers> and since the services get a unique tag, we can define filters in fluentd in the future
15:47:58 <srwilkers> ie: do this for nova, do this for neutron, etc
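The idea of tagged events that carry full stacktraces, and of filtering per service on the tag, can be sketched with the standard library alone. This is not the fluent formatter discussed above: the `openstack.<project>` tag format, the class names, and the `route` helper are all hypothetical, chosen only to illustrate the shape of the data.

```python
import json
import logging
import traceback


class TaggedJsonFormatter(logging.Formatter):
    """Render a record as JSON with a per-service tag and any stack trace."""

    def __init__(self, project):
        super().__init__()
        self.project = project

    def format(self, record):
        payload = {
            "tag": f"openstack.{self.project}",  # hypothetical tag format
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["traceback"] = "".join(
                traceback.format_exception(*record.exc_info))
        return json.dumps(payload)


class ListHandler(logging.Handler):
    """Collect formatted events in memory (stand-in for an aggregator)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        self.events.append(self.format(record))


def route(event_json, rules):
    """Pick a destination by tag, e.g. one index per service."""
    event = json.loads(event_json)
    return rules.get(event["tag"], "default")


handler = ListHandler()
handler.setFormatter(TaggedJsonFormatter("nova"))
logger = logging.getLogger("nova.compute.sketch")
logger.propagate = False
logger.addHandler(handler)

try:
    raise ValueError("boom")
except ValueError:
    logger.exception("spawn failed")  # records the active exception

event = json.loads(handler.events[0])
print(event["tag"])
print(route(handler.events[0], {"openstack.nova": "nova-index"}))
```

The same routing table could map each service tag to its own filter or index, which is the kind of per-service handling described above.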
15:48:51 <srwilkers> can also take the recent updates to the fluent-logging chart and create multiple indices in elasticsearch, and use the project tags to create indices per openstack service if your heart so desired
15:49:06 <srwilkers> but as mentioned, you do need at least ocata to use that formatter
15:49:29 <srwilkers> but even if you dont, i still feel this gives an operator greater control over what they want to see and how
15:49:46 <rwellum> Have to bow out a little early today - will continue with deployment this afternoon and will bug the IRC :)
15:50:01 <mattmceuen> Thanks rwellum!  Looking forward to seeing the success and/or fallout! :)
15:50:11 <srwilkers> that's it for me
15:50:17 <mattmceuen> Thanks srwilkers - that's awesome
15:50:30 <mattmceuen> Looking forward to playing with that :)
15:50:38 <mattmceuen> #topic topics for next time
15:50:49 <mattmceuen> Just one thing to get in front of y'all -- a request for a discussion next week
15:50:49 <mattmceuen> Support multi versions of Rally - Let's have some time to think about it and discuss again next week.
15:51:11 <mattmceuen> There is a little more discussion / brainstorming in the agenda https://etherpad.openstack.org/p/openstack-helm-meeting-2018-06-19
15:51:25 <mattmceuen> #topic Roundtable
15:51:39 <mattmceuen> We have 9 minutes left - anything else you would like to discuss?
15:52:31 <mattmceuen> Or also - any PS urgently needing review?
15:53:12 <portdirect> oh this is pretty cool: https://review.openstack.org/#/c/570658/
15:53:29 <portdirect> to be honest, I'm kinda surprised it works at all
15:53:37 <portdirect> but would be really nice to have
15:53:46 <portdirect> I'm not sure how we could gate it
15:53:46 <mattmceuen> https://twitter.com/IanFromATT/status/1008815392016551941
15:54:07 <portdirect> and also really uncomfortable with some of the things it does
15:54:14 <mattmceuen> I sense a third party gate hooked up to a macbook portdirect
15:54:28 <srwilkers> https://review.openstack.org/#/c/575157/
15:54:38 <portdirect> as messing with a mac deployment (ie deploying homebrew etc) is kinda bad i think
15:54:55 <portdirect> as most people wont want to redeploy their mac to clean up an env...
15:55:05 <portdirect> but with some changes, could be super valuable
15:55:06 <srwilkers> portdirect: yeah, im not a huge fan of that one
15:56:28 <mattmceuen> Alright guys - unless there's anything else I can give you a full 3 minutes back
15:56:46 <mattmceuen> Thanks!
15:56:49 <mattmceuen> #endmeeting