15:00:49 #startmeeting openstack-helm
15:00:50 Meeting started Tue Nov 14 15:00:49 2017 UTC and is due to finish in 60 minutes. The chair is srwilkers. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 The meeting name has been set to 'openstack_helm'
15:01:29 #topic rollcall
15:01:33 hello
15:01:37 o/
15:01:39 o/
15:02:01 o/
15:02:23 here's the agenda: https://etherpad.openstack.org/p/openstack-helm-meeting-2017-11-14
15:02:33 we'll give it a few minutes to see if anyone else comes along or wants to add to it
15:03:32 hey all! o/\o
15:03:41 hey v1k0d3n \o/
15:03:43 o/*\o
15:04:56 alright, seems we've got a good list to start with
15:05:11 #topic graph drawing in documentation
15:05:14 all yours jayahn
15:05:35 I just wanted to have a graph drawing capability in the docs. :)
15:05:51 i did my best to copy the example configuration.
15:06:04 i agree -- pictures are a great way to share things
15:06:46 not sure if i did everything necessary to set up "sphinxcontrib-blockdiag". if anyone can give more feedback, please do. :)
15:06:50 admittedly i don't know enough about getting it enabled to tell if that's the right way or not
15:07:04 lamt is really good at that sort of stuff
15:07:19 o/
15:07:28 ah. lamt is here. :)
15:07:42 https://review.openstack.org/#/c/519653/
15:08:43 pls leave any feedback on the ps. :)
15:08:46 i'll poke lamt and make sure he gives some feedback there
15:08:52 anything else on this jayahn?
15:08:57 nope
15:09:05 sweet
15:09:09 #topic fluent-logging
15:09:17 i like this one -- take it away jayahn
15:09:40 we are almost done with adding fluent-based logging.
15:10:00 this will be the first one; it will be followed by next steps.
15:10:16 however, is the openstack-helm-infra gate working properly?
15:10:45 yeah, it's working -- i can work with you and provide feedback to do what we need to run it properly
15:11:02 okay. that would be great.
15:11:17 once things are tidied up, i'll make sure everything's documented appropriately so adding new services is easy
15:11:29 can you do it through review? or would it be better to set up a separate time?
15:11:50 i'd be happy to do it through review
15:11:52 if that's okay
15:11:55 okay. great!
15:12:15 also, wanted to get your and everyone's thoughts on something with the fluent-logging stuff
15:12:42 fyi, we have checked that the new version of fluent-bit supports a kubernetes plugin and some experimental kafka output. that will be our very next thing to do after this is merged.
15:13:10 okay. srwilkers, shoot. :)
15:13:40 would it be worth considering handling the parsers and fluentbit configurations via values.yaml? the reason i ask is that while it works out of the box, it's very opinionated in that it expects you'll only ever use the json logging driver for docker
15:14:02 ie, it only gets log events by tailing /var/log/whatever and /var/lib/docker/containers/whatever
15:14:44 it's not something i want us to spin the wheels on with the current patchset, because some of your work and mateuszb's work really depends on the fluent-logging stuff being finished
15:14:53 but it might be worth considering as an enhancement down the road
15:15:25 okay. we will surely include your idea in our enhancement tasks.
15:15:33 i will talk to sungil about this.
15:15:47 awesome :) i can draw up some pictures and throw some roughed-up ideas your way too if that helps
15:16:04 right, all docker container logs are json type. We need to change that.
15:16:30 and kubernetes logs are in /var/log/xxx
15:16:44 but really jayahn -- it's great work. :)
15:16:48 as you said, the current ps will be just the first of several waves coming after it. let's make this base work, then continuously enhance it
15:16:54 i agree
15:17:16 that's all i had
15:17:36 seungkyua is our senior developer. he agrees with you, srwilkers.
15:17:39 :)
15:17:50 nice -- pleasure to meet you seungkyua o/
15:18:07 nice to meet you.
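The values.yaml idea raised above could look roughly like the following: a minimal Python sketch, assuming a hypothetical `VALUES` layout (not the fluent-logging chart's actual schema), that renders Fluent Bit classic-format `[INPUT]` and `[PARSER]` sections from data, so an operator could replace the json-only docker parser without editing templates. `render_sections` and the tail path are illustrative choices, not code from the patchset.

```python
from typing import Dict, List

# Hypothetical values layout, NOT the fluent-logging chart's real schema:
# each list entry becomes one [SECTION] block in Fluent Bit's classic config format.
VALUES: Dict[str, List[Dict[str, str]]] = {
    "input": [
        {"Name": "tail", "Tag": "kube.*",
         "Path": "/var/log/containers/*.log", "Parser": "docker"},
    ],
    "parser": [
        # Swapping this entry is how a non-json logging driver could be supported.
        {"Name": "docker", "Format": "json", "Time_Key": "time"},
    ],
}


def render_sections(values: Dict[str, List[Dict[str, str]]]) -> str:
    """Render {section: [{key: value}, ...]} as Fluent Bit classic-format blocks."""
    blocks = []
    for section, entries in values.items():
        for entry in entries:
            lines = [f"[{section.upper()}]"]
            # Each key/value pair is indented beneath its section header.
            lines.extend(f"    {key} {value}" for key, value in entry.items())
            blocks.append("\n".join(lines))
    # Blocks are separated by a blank line; an empty mapping renders nothing.
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    print(render_sections(VALUES))
```

In the chart itself this would be a Helm template looping over values rather than Python, and parsers normally live in a separate file that the `[SERVICE]` section points to via `Parsers_File`.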
15:18:22 if he says yes.. it means yes for us :)
15:18:23 #topic default alert list spec
15:18:26 awesome :)
15:18:31 this is the first time in an online chat. :)
15:18:36 ah. this is a very, very early draft.
15:19:00 just want to get everyone's opinion on "what is the best way to write it".
15:19:41 I'll take a look at it tomorrow jayahn :)
15:20:02 I think we should first define the "alert/alarm definitions", e.g. that we would like to alert on cpu idle, cpu percent, etc.
15:20:16 but not define the actual trigger thresholds in this spec.
15:20:17 yeah, i was going to say your input would be awesome mateuszb since this touches what you've been working on
15:20:29 jayahn: wrt the fluent work, i can take a look at it too...we've been doing a lot of this recently too for some internal demos. would be nice to get this into upstream.
15:20:32 jayahn: yeah i agree
15:20:46 v1k0d3n: that'd be awesome :)
15:20:50 v1k0d3n, awesome :)
15:21:24 thanks mateuszb. i will add more alert definitions this week. your feedback would be really helpful
15:22:55 anything else on this one?
15:22:59 i think korzen_ only has the first half of the meeting time. let's turn it over to him now. :)
15:23:09 sounds good
15:23:17 yes
15:23:26 #topic multi namespace support for entrypoint
15:23:44 so I would like to highlight that multiple namespace support is done in the PS
15:23:52 i just workflowed it :)
15:23:55 #link https://review.openstack.org/#/c/510810 Support services in different namespaces
15:24:09 #link https://review.openstack.org/#/c/511515/ Add jobs and daemonsets namespace support
15:24:27 once it is merged, the full solution is enabled
15:24:57 so we can add cross-namespace dependencies for services via endpoints, and for jobs and daemonsets in the dependencies section
15:25:19 I am testing it in a use case where every service has its own infra
15:25:31 nice work over there korzen_ ! :) great to see this added.
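To illustrate the cross-namespace dependencies described above, here is a rough Python sketch of how a dependency list could be resolved into (namespace, name) pairs, with unqualified names falling back to the pod's own namespace. The comma-separated `namespace:name` syntax and the `resolve_dependencies` helper are assumptions for illustration only; the format actually merged into kubernetes-entrypoint should be checked against the patchsets linked above.

```python
from typing import List, Tuple


def resolve_dependencies(spec: str, default_namespace: str) -> List[Tuple[str, str]]:
    """Split a comma-separated dependency list into (namespace, name) pairs.

    Hypothetical format: "name" is a dependency in default_namespace,
    while "namespace:name" is a dependency in another namespace.
    """
    resolved = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries, e.g. a trailing comma
        namespace, sep, name = item.partition(":")
        if not sep:
            # No explicit namespace: assume the namespace the pod runs in.
            namespace, name = default_namespace, item
        resolved.append((namespace, name))
    return resolved


if __name__ == "__main__":
    # Hypothetical layout: keystone has its own mariadb and rabbitmq,
    # while ceph lives in a shared namespace.
    print(resolve_dependencies("mariadb,rabbitmq,ceph:ceph-mon", "keystone"))
```

Defaulting to the local namespace keeps existing single-namespace deployments working unchanged, which is the property a per-service infra layout would rely on.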
15:25:36 nice korzen_ :)
15:25:49 e.g. the keystone namespace would have its own mariadb, rabbitmq, ingress etc
15:25:56 ceph would be common
15:26:08 yeah that's awesome. always been the goal...
15:26:08 thx ;)
15:26:20 you guys made it a reality. :)
15:26:55 glad to see that it is appreciated ;)
15:27:39 I guess that's all for multiple namespace
15:27:45 for RBAC
15:27:45 #topic RBAC support
15:28:01 #link https://review.openstack.org/#/c/464630 RBAC authorization support
15:28:22 this one is huge, but it contains all the RBAC rules that are needed to run
15:28:39 I wanted to get portdirect's review on that one
15:29:02 all necessary details are included in the agenda
15:29:49 I will also test it for the multiple namespace use case in the following days
15:30:20 but the example with ceph and ceph-config made this PS ready for multiple namespaces
15:30:25 korzen_ we will try to review this RBAC one as well
15:30:36 i'll get portdirect to look at it today and provide his feedback
15:31:10 ok, I need to run
15:31:15 that's all from my side
15:31:16 later korzen_ :)
15:31:26 #topic log based alerting approaches
15:31:27 thanks korzen_
15:31:27 bye
15:31:29 take it away mateuszb
15:31:41 I've got a couple of patchsets in review regarding log-based alarms
15:31:55 I've grouped them into 2 categories depending on the approach:
15:32:02 1. Based on ElastAlert:
15:32:07 ElastAlert chart: https://review.openstack.org/#/c/516629/
15:32:11 Nagios: passive check for DB errors: https://review.openstack.org/#/c/518543/
15:32:14 Pushing notifications from ElastAlert to Nagios: https://review.openstack.org/#/c/518711/
15:32:34 and 2. Based on fluent-plugin-prometheus:
15:32:43 Gathering DB error counts using fluent-plugin-prometheus: https://review.openstack.org/#/c/514938/
15:32:51 Example log-based alert in Prometheus: https://review.openstack.org/#/c/515061/
15:32:54 Nagios: Prometheus check for DB errors in logs: https://review.openstack.org/#/c/519318/
15:33:28 So I've verified that both of the solutions are ready to be integrated with Nagios (I wasn't so sure about ElastAlert+Nagios, but it works well)
15:33:29 these are really beautifully categorized examples. :)
15:34:00 I'm leaving it as it is until we decide which of the two approaches to choose (I'd vote for ElastAlert, as it's designed specifically for log-based alerting, with a lot of configuration capabilities in place)
15:34:08 So any comments and votes are welcome :)
15:36:14 that's all from me
15:37:02 ElastAlert seems like a great tool to use. +1 on that.
15:37:02 however, since we will probably use prometheus alertmanager for metric-based alerts, it would be good to use a single solution for all alerts. so +1 on fluent-plugin-prometheus. :)
15:37:26 i will discuss this with my team members and leave our feedback.
15:37:34 You're not helping ;)
15:37:40 Ok, that would be great
15:37:42 yep. i know. :(
15:37:45 yeah, i'm a bit torn on this. i feel like i've introduced some confusion with nagios, as it was meant to be pitched as a dead man's switch for things like backing prometheus with ceph
15:38:12 but elastalert is able to fire off alerts independently, right? it doesn't need nagios or alertmanager?
15:38:33 No, it doesn't need nagios or alertmanager
15:38:58 okay, that makes me feel better.
15:39:22 It fires off alerts independently, but it can also execute a script, which in turn executes the passive chech to Nagios
15:39:30 check *
15:39:35 cool :)
15:40:10 I would like to compare the alert templates of both solutions, i.e. how flexible it is to set up alert patterns.
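To make the ElastAlert + Nagios path discussed above more concrete, here is a rough Python model of its two moving parts: a frequency-style rule (fire when at least `num_events` matching log lines arrive within `timeframe`, which is how ElastAlert's `frequency` rule type is documented to behave) and the line a script could write to Nagios' external command file to submit a passive check (`PROCESS_SERVICE_CHECK_RESULT`). This is not ElastAlert's code; the DB-error regex, host and service names are made up, and clearing the window after each alert is a choice of this sketch.

```python
import re
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Deque, Iterable, List, Tuple

# Hypothetical pattern for "DB errors" in OpenStack service logs.
DB_ERROR = re.compile(r"DBConnectionError|OperationalError")


def frequency_alerts(events: Iterable[Tuple[datetime, str]],
                     num_events: int, timeframe: timedelta) -> List[datetime]:
    """Return the timestamps at which a frequency-style rule would fire.

    Events must be sorted by time. The rule fires once at least num_events
    matching lines fall inside a sliding window of length timeframe; the
    window is then cleared so a single burst yields a single alert.
    """
    window: Deque[datetime] = deque()
    fired: List[datetime] = []
    for ts, line in events:
        if not DB_ERROR.search(line):
            continue  # only matching log lines count towards the rule
        window.append(ts)
        # Drop matches older than timeframe relative to the newest match.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            fired.append(ts)
            window.clear()
    return fired


def nagios_passive_check(host: str, service: str, return_code: int,
                         output: str, now: datetime) -> str:
    """Build a Nagios external command that submits a passive service check."""
    return (f"[{int(now.timestamp())}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}")


if __name__ == "__main__":
    start = datetime(2017, 11, 14, 15, 0, tzinfo=timezone.utc)
    logs = [(start + timedelta(seconds=s), "oslo_db DBConnectionError") for s in (0, 5, 9)]
    for fired_at in frequency_alerts(logs, num_events=3, timeframe=timedelta(minutes=1)):
        # Return code 2 is CRITICAL in Nagios plugin conventions.
        print(nagios_passive_check("node-1", "db-log-errors", 2,
                                   "CRITICAL: DB errors in logs", fired_at))
```

In the fluent-plugin-prometheus approach, the counting step would instead live in a Prometheus counter, with the threshold expressed as an alert rule, so the same logic ends up split across two systems.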
15:40:30 yeah, that's something to consider for sure
15:40:39 but great work all around on this stuff mateuszb
15:41:07 Well, I can prepare a list of what's needed to add an additional alert in both cases
15:41:57 to make things faster, I'll write it on slack tomorrow
15:42:01 sounds good :)
15:42:08 in order not to wait until the next meeting :)
15:42:18 great!
15:43:53 anything else?
15:44:07 no, that's all. Thanks
15:44:15 #topic reviews needed
15:45:07 Cell service: https://review.openstack.org/#/c/516810/
15:45:10 cell service and nova placement are the two essential pieces needed for ocata. these are almost ready. pls do a final review on them. :)
15:45:27 :)
15:46:06 FYI, at portdirect's request, we will upstream a separate "value override to make ocata work", probably as a new set of mvp values.
15:46:51 Neutron: Correct section name for linuxbridge bridge_mappings config: https://review.openstack.org/#/c/518503/
15:46:52 jayahn: that'd be awesome.
15:47:13 we have been testing vlan-based provider networks w/ linuxbridge
15:47:21 to support some legacy openstack environments.
15:47:36 this is one of the few things we are fixing while doing that.
15:47:56 i think it is rather straightforward. pls review. :)
15:48:04 just workflowed it
15:48:15 thanks
15:49:03 that is all
15:49:40 awesome :)
15:49:58 any other last minute items?
15:50:22 otherwise we can take the open discussion to the openstack-helm channel -- i'm getting rushed out of my conference room :)
15:50:37 bye
15:50:49 :)
15:50:52 bye
15:50:53 thanks for coming everyone
15:50:57 #endmeeting