Wednesday, 2018-12-19

*** ekcs has quit IRC01:40
*** ricolin has joined #openstack-self-healing02:20
*** ifat_afek has joined #openstack-self-healing06:54
*** alexchadin has joined #openstack-self-healing08:55
aspiershi, anyone around today? I'm not expecting anyone so close to the holidays :)09:00
ifat_afekI’m here :-)09:00
aspiershi :)09:00
aspiersanything you want to discuss?09:01
witekhi09:01
aspiershi witek :)09:01
ifat_afekI was just about to say Vitrage&Monasca ;-)09:01
witekhi aspiers and ifat_afek09:01
aspiersOK sure09:01
aspiers#startmeeting self-healing09:02
openstackMeeting started Wed Dec 19 09:02:12 2018 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.09:02
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.09:02
*** openstack changes topic to " (Meeting topic: self-healing)"09:02
openstackThe meeting name has been set to 'self_healing'09:02
aspiers#topic vitrage / monasca integration09:02
*** openstack changes topic to "vitrage / monasca integration (Meeting topic: self-healing)"09:02
aspiersso, what's new with this? :)09:02
ifat_afekhttps://review.openstack.org/#/c/622899/09:03
ifat_afekThe integration is almost done, but we need to solve the issue of identifying in Vitrage the resource that the alarm is raised on09:03
witekwe have to clarify the mapping between Monasca alarms and Vitrage entities/resources09:03
aspiersawesome09:04
ifat_afekBTW, once we agree on the conceptual design, I think that we might be able to close the current change with minimal fixes, and do the complete solution later in a different change09:04
witek+109:04
aspiersmakes sense09:04
aspiersnice to see my colleague Joe on the review :)09:04
aspiersI should get him on this channel09:04
ifat_afekwitek: did you see my last mail? I suggested a solution, but I’m not familiar enough with Monasca so I need your approval that it will work09:05
witekyes, started writing an answer yesterday, will send today09:05
witekin general I think it should work09:05
ifat_afekCool, thanks09:05
witekwas wondering if you want to implement it as `global` mapping, or add to resource entity definition in Vitrage template?09:06
ifat_afekI thought about a global mapping (single configuration file), but let me think it over09:06
ifat_afekThe benefit of the global mapping is that the same alarm can be easily reused in several templates09:07
witektrue09:07
ifat_afekAny disadvantages in your opinion?09:07
witekI think global file would cover most of use cases, but definition in template might be more flexible09:08
witekbut my knowledge about Vitrage is very limited, so I might be wrong09:09
ifat_afekI need to think about it. Do you have an example?09:09
witekhttp_status on node or VIP09:10
ifat_afekHow is it defined? it was probably written in the mail09:10
ifat_afek `name`: `http_status`, `dimensions`: {`hostname`: `node1`, `service`: `keystone`, `url`: `http://node1/identity`}09:10
ifat_afekThis one, right?09:11
witekyes09:11
ifat_afekAnd why do you think we should handle it in the template?09:11
witekwhen configured with the node URL, it gives information about the service on that node09:12
ifat_afeksorry, I don’t understand09:12
witekwhen configured with the VIP URL, the information is one layer higher, for the load-balanced service09:12
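
(Not from the log: a minimal Python sketch of the two http_status cases witek describes above; the dimension values and the VIP URL are hypothetical placeholders, and the real dimensions depend on how the operator configures the monasca-agent check.)

    # Illustration only: two hypothetical http_status metrics that share a name but
    # describe different layers. All dimension values are invented for the example.
    per_node_metric = {
        "name": "http_status",
        "dimensions": {
            "hostname": "node1",
            "service": "keystone",
            "url": "http://node1/identity",         # checked directly on one node
        },
    }
    load_balanced_metric = {
        "name": "http_status",
        "dimensions": {
            "service": "keystone",
            "url": "http://keystone-vip/identity",  # checked via the VIP, one layer higher
        },
    }
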
aspiersnewbie question: how is it currently done with Zabbix?09:13
ifat_afekwe are facing similar questions with Zabbix. So far we are using it for monitoring hosts, and these are statically defined in a zabbix_conf file. For monitoring VMs, interfaces etc. we haven’t implemented a good solution yet09:14
aspiersah ok09:14
ifat_afekwe are also facing this question with Prometheus...09:14
aspiersis it worth writing a spec for this maybe?09:14
aspiersor is that too heavy-weight?09:14
ifat_afekof course, I’m trying to understand what this spec should include09:14
aspiersgot it :-)09:14
ifat_afekI don’t think it’s too heavy-weight, and it is definitely something we should handle09:15
ifat_afekwitek: let me get back to your question. You are saying that depending on the URL we should figure out the resource type?09:15
aspierswell, the spec could list multiple options but propose a preferred solution and list the other(s) as alternative(s)09:15
ifat_afekaspiers: of course09:16
ifat_afekthis is why I wanted to make a temporary fix for the current change in gerrit. But it should be a smart fix and not the existing POC code09:16
witekifat_afek: I think that in general operators might want to use the same metric to alarm about different things09:17
ifat_afekthe full implementation should not take a long time, the design is the complicated part09:17
ifat_afekwitek: I agree09:17
ifat_afekand how do the operators understand what is being monitored? suppose they see the alarms in Monasca itself, do they figure it out by the resource name? by the URL?…09:18
witekso it might be an advantage if they also had a mechanism in the alarm entity definition to describe how a given alarm should be interpreted09:18
ifat_afekbut the interpretation should happen in Monasca first, right?09:18
ifat_afekwhen you create an alarm definition, you should somehow describe what you are monitoring09:19
witekoperators are free to define their own alarms09:19
witekthey know, what the metric measures09:19
ifat_afekIf I am an operator, and I see a ‘high cpu load’ alarm, how can I tell if it was raised on a vm or on a host? by the resource name? by a certain dimension?09:19
ifat_afekBTW, aspiers, if you prefer we can take this discussion offline :-)09:20
aspiersno, this is a great discussion and obviously important for self-healing :)09:21
aspiersand we probably don't have any other topics today anyway :)09:21
witekdepends on how the metric is collected: each monasca-agent plugin provides unique metric names09:21
witekso, system metric names are different than the ones from the libvirt plugin09:21
ifat_afekin this case, can we have a configuration file in Vitrage that determines the resource type per metric name? will it always be a 1:1 relation?09:22
ifat_afekor maybe better, can we get this information directly from Monasca?09:22
witekI'd say not always, but in most cases09:23
*** eyalb1 has joined #openstack-self-healing09:23
ifat_afekif it’s most cases, then maybe we can’t do what I suggested09:24
ifat_afekso back to my previous question - if the same metric can be used for two different resource types, how does the operator understand what the alarm was raised on?09:25
ifat_afekI’m trying to understand the logic behind this, so I can use this logic in Vitrage09:25
witekI think the problem appears for generic metrics, like e.g. http_check09:26
witekwhich can be configured with really any http endpoint09:27
witekand to your question, the metric is uniquely described by its name and dimension key/values09:28
ifat_afekso I could say something like this: if metric_name==‘http_check’ and url==‘…..’ then resource_type is host and resource_id is ‘resource_id’?09:29
ifat_afekthe disadvantage is, of course, having a detailed description for every alarm09:30
ifat_afekand the need to manually update the description once in a while09:30
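
(Not from the log: a minimal Python sketch of the kind of global mapping rule ifat_afek describes above; the metric name, URL, dimension keys and resource types are hypothetical placeholders, not actual Vitrage or Monasca configuration.)

    # Illustration only: map Monasca alarm metadata to a Vitrage resource via a
    # global rule list, roughly in the shape sketched above.
    MAPPING_RULES = [
        {
            "match": {
                "metric_name": "http_check",
                "dimensions": {"url": "http://node1/identity"},  # hypothetical URL
            },
            "resource_type": "nova.host",          # hypothetical resource type
            "resource_id_dimension": "hostname",   # dimension holding the resource id
        },
    ]

    def resolve_resource(alarm):
        """Return (resource_type, resource_id) for an alarm-like dict, or None."""
        dims = alarm.get("dimensions", {})
        for rule in MAPPING_RULES:
            match = rule["match"]
            if alarm.get("metric_name") != match["metric_name"]:
                continue
            if all(dims.get(k) == v for k, v in match["dimensions"].items()):
                return rule["resource_type"], dims.get(rule["resource_id_dimension"])
        return None

The drawback ifat_afek notes just above (a detailed, manually maintained rule per alarm) is visible here: every new URL or deployment change needs its own entry.
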
ifat_afekalternatively - does it make sense to ask for a dedicated ‘resource_type’ dimension in Monasca?09:31
witekconfiguration of a dedicated `resource_type` dimension in the agent could be left to the operator, per convention09:34
ifat_afekok, so we can’t force it nor assume it is there09:34
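
(Not from the log: a short Python sketch of consuming an optional, convention-based `resource_type` dimension with a fallback, since, as noted above, it can be neither forced nor assumed; the names and the fallback value are hypothetical.)

    # Illustration only: prefer an operator-set `resource_type` dimension when
    # present, otherwise fall back to whatever a global mapping would derive.
    def resource_type_for(alarm, fallback=None):
        return alarm.get("dimensions", {}).get("resource_type", fallback)

    # e.g. resource_type_for(some_alarm, fallback="nova.host")  # hypothetical values
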
ifat_afekso it seems like we should have a slightly-complex configuration file. do you have a better idea?09:35
aspiers#link http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000806.html mailing list discussion on the integration09:36
witekI think we should start with defining use cases we would like to cover and check if we can do it with simple conf file09:36
witekI think it should work for most cases09:36
ifat_afeksounds like a good idea09:37
ifat_afekso I can start a spec with use cases and you’ll help me09:37
witekI'll definitely help09:37
ifat_afeklater I can add a proposed solution to the spec09:38
ifat_afekbut I agree we should start with the use cases09:38
aspierssounds good to me too09:38
ifat_afekgreat. I’ll push an initial version today or tomorrow09:38
witekcool, thanks09:38
aspiersawesome!09:39
ifat_afekwhere to? I was thinking about vitrage-specs, because the implementation will most likely be inside Vitrage09:39
ifat_afekunless you think it should be in the self-healing repo09:39
aspiers#action ifat_afek will submit a spec with use cases09:39
witekyes, I think Vitrage repo is best suited09:39
aspierseither is fine by me09:39
ifat_afekwitek: do you have a simple solution for the existing change? e.g. use a dimension that will work in many cases but not all, instead of the one used for the POC? just so we can say this change is finished09:41
witekI would say, we could stay with current approach for the first version09:43
witekwill leave a comment in review09:43
ifat_afekgreat, thanks09:43
aspierscool. anything else on this topic?09:44
ifat_afeknothing on my side09:44
witekno, thanks09:44
aspiersI learned some useful things, especially that I need to improve my email filters ;-)09:44
witek:) oh, we haven't added [self-healing]09:45
*** eyalb1 has left #openstack-self-healing09:45
ifat_afekmy bad…09:45
aspiershaha no problem X-D09:45
aspiersok, well thanks a lot both - I'm SUPER happy and excited this discussion is happening :)09:45
ifat_afekme too!09:46
witekyes, me too, thanks ifat_afek for launching it!09:46
aspiersit's exactly the kind of cross-project work I was dreaming of for the SIG09:46
aspiersI will ping Joseph and see if he can join future IRC discussions, but I see he's on the mailing list thread already anyway09:47
aspiersand he's in the wrong timezone for this meeting09:47
aspiersdo either of you intend to join the one later today?09:47
aspiersno problem at all if not09:47
ifat_afekI plan to09:47
aspiersOK, it will be his morning then09:47
witekI'm not sure yet09:47
aspiersI'll see if he can, but I guess it's not a big deal if not09:48
aspiersif you include [self-healing] when announcing the spec, maybe it can get a few more reviewers09:48
aspiersI will certainly review, anyway09:49
ifat_afekusually I don’t announce specs, but I can do it this time09:49
ifat_afekbecause indeed it is interesting, and I’ll be happy to hear more opinions09:49
aspiersgreat :)09:49
aspiersalright, just very briefly for the record...09:49
aspiers#topic service health check API09:50
*** openstack changes topic to "service health check API (Meeting topic: self-healing)"09:50
aspiersthere seems to be some movement on this, since the TC have proposed it as a goal for Train09:50
aspiersbut there needs to be a champion for the goal09:50
aspiersI have initiated some discussion in SUSE about this - maybe there is a chance that I or another colleague could volunteer for that09:51
aspiersbut we need to discuss prioritisation first, so can't guarantee anything09:51
aspiersin any case, if we had a health check API across multiple services, this would presumably tie in very nicely with the Vitrage/Monasca integration efforts09:52
ifat_afekof course, I think it’s a great initiative, and I’ll be happy if you or your colleagues can drive it forward09:52
aspiersawesome, thanks09:53
aspiersI think it's being discussed under [all][tc] and I added [self-healing] later IIRC09:53
aspiers#link http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000599.html09:54
aspiersyou probably already saw that09:54
ifat_afekright09:54
aspiersnot much more to say about that right now, but I thought it was worth mentioning09:54
aspiersifat_afek: should I add a task to https://storyboard.openstack.org/#!/story/2002684 for creating the vitrage-spec?09:55
aspiersso you can reference it in the commit message?09:55
*** ricolin has quit IRC09:56
aspiersah sorry09:56
aspierswrong story09:56
ifat_afekaspiers: are you sure this is the right story? this one is about Vitrage and Heat09:56
ifat_afek:-)09:56
aspiers:)09:56
aspiersI'm not awake yet09:56
aspiershrm, do we have a story for the integration yet?09:56
aspiersmaybe need to create one09:56
ifat_afekwhich I plan to progress with, BTW, but I’m not ready to update about it yet09:56
ifat_afekwe have a story in Vitrage09:57
aspiersOK, I'll look for that09:57
ifat_afekhttps://storyboard.openstack.org/#!/story/200455009:57
aspiersthanks!09:57
aspiersalright, I guess we're done09:57
ifat_afekAnd another one for the (near?) future, to accept immediate notifications09:57
aspiersgot it09:57
ifat_afekhttps://storyboard.openstack.org/#!/story/200406409:57
aspiersah yeah, it seems I was already subscribed to that one :)09:58
aspiersOK, thanks a lot both, and maybe see you later!09:59
ifat_afekI can add tasks for writing a spec and also for implementing this spec, on top of the initial implementation09:59
ifat_afeksee you later!09:59
aspiersawesome09:59
aspiers#endmeeting09:59
witekthanks aspiers and ifat_afek09:59
*** openstack changes topic to "https://wiki.openstack.org/wiki/Self_healing_SIG | https://storyboard.openstack.org/#!/project/openstack/self-healing-sig"09:59
openstackMeeting ended Wed Dec 19 09:59:44 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)09:59
openstackMinutes:        http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.html09:59
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.txt09:59
openstackLog:            http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.log.html09:59
*** ifat_afek has quit IRC12:24
*** ifat_afek has joined #openstack-self-healing13:24
aspiersifat_afek: sorry, have realised I can't make the next meeting14:30
aspiersfeel free to have it without me if there is anyone else around14:31
ifat_afekaspiers: no problem. I don’t have any special agenda, at least until I make some progress with the spec14:31
aspiersok14:31
*** alexchadin has quit IRC15:11
*** ekcs has joined #openstack-self-healing16:58
*** joadavis has joined #openstack-self-healing17:02
*** joadavis has left #openstack-self-healing17:27
*** ekcs has quit IRC17:47
*** ifat_afek has quit IRC17:56
*** ekcs has joined #openstack-self-healing18:38
*** ifat_afek has joined #openstack-self-healing18:51
*** ifat_afek has quit IRC19:00
*** ifat_afek has joined #openstack-self-healing19:09
*** ifat_afek has quit IRC19:12
*** ifat_afek has joined #openstack-self-healing19:19
*** ifat_afek has quit IRC19:31
*** joadavis has joined #openstack-self-healing22:37
*** joadavis has left #openstack-self-healing22:38
