*** ekcs has quit IRC | 01:40 | |
*** ricolin has joined #openstack-self-healing | 02:20 | |
*** ifat_afek has joined #openstack-self-healing | 06:54 | |
*** alexchadin has joined #openstack-self-healing | 08:55 | |
aspiers | hi, anyone around today? I'm not expecting anyone so close to the holidays :) | 09:00 |
ifat_afek | I’m here :-) | 09:00 |
aspiers | hi :) | 09:00 |
aspiers | anything you want to discuss? | 09:01 |
witek | hi | 09:01 |
aspiers | hi witek :) | 09:01 |
ifat_afek | I was just about to say Vitrage&Monasca ;-) | 09:01 |
witek | hi aspiers and ifat_afek | 09:01 |
aspiers | OK sure | 09:01 |
aspiers | #startmeeting self-healing | 09:02 |
openstack | Meeting started Wed Dec 19 09:02:12 2018 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot. | 09:02 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 09:02 |
*** openstack changes topic to " (Meeting topic: self-healing)" | 09:02 | |
openstack | The meeting name has been set to 'self_healing' | 09:02 |
aspiers | #topic vitrage / monasca integration | 09:02 |
*** openstack changes topic to "vitrage / monasca integration (Meeting topic: self-healing)" | 09:02 | |
aspiers | so, what's new with this? :) | 09:02 |
ifat_afek | https://review.openstack.org/#/c/622899/ | 09:03 |
ifat_afek | The integration is almost done, but we need to solve the issue of identifying in Vitrage the resource that the alarm is raised on | 09:03 |
witek | we have to clarify the mapping between Monasca alarms and Vitrage entities/resources | 09:03 |
aspiers | awesome | 09:04 |
ifat_afek | BTW, once we agree on the conceptual design, I think that we might be able to close the current change with minimal fixes, and do the complete solution later in a different change | 09:04 |
witek | +1 | 09:04 |
aspiers | makes sense | 09:04 |
aspiers | nice to see my colleague Joe on the review :) | 09:04 |
aspiers | I should get him on this channel | 09:04 |
ifat_afek | witek: did you see my last mail? I suggested a solution, but I’m not familiar enough with Monasca so I need your approval that it will work | 09:05 |
witek | yes, started writing an answer yesterday, will send today | 09:05 |
witek | in general I think it should work | 09:05 |
ifat_afek | Cool, thanks | 09:05 |
witek | I was wondering if you want to implement it as a `global` mapping, or add it to the resource entity definition in the Vitrage template? | 09:06 |
ifat_afek | I thought about a global mapping (single configuration file), but let me think it over | 09:06 |
ifat_afek | The benefit of the global mapping is that the same alarm can be easily reused in several templates | 09:07 |
witek | true | 09:07 |
ifat_afek | Any disadvantages in your opinion? | 09:07 |
witek | I think a global file would cover most use cases, but a definition in the template might be more flexible | 09:08 |
witek | but my knowledge about Vitrage is very limited, so I might be wrong | 09:09 |
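To make the two options concrete, here is a minimal sketch of what a global, metric-name-keyed mapping could look like on the Vitrage side. The table contents, key names, and the `map_alarm_to_resource` helper are all hypothetical; they only illustrate the idea of a single configuration shared by every template, not an existing Vitrage feature or file format.

```python
# Hypothetical illustration of a "global mapping" between Monasca metric
# names and Vitrage resource types. Not actual Vitrage configuration syntax.

# One shared table: metric name -> (resource type, dimension holding the id)
GLOBAL_ALARM_MAPPING = {
    "cpu.system_perc": ("nova.host", "hostname"),
    "vm.cpu.utilization_perc": ("nova.instance", "resource_id"),
    "http_status": ("nova.host", "hostname"),
}


def map_alarm_to_resource(metric_name, dimensions):
    """Resolve a Monasca alarm to the Vitrage resource it was raised on."""
    if metric_name not in GLOBAL_ALARM_MAPPING:
        return None  # unknown metric: attach the alarm to nothing (or a default)
    resource_type, id_dimension = GLOBAL_ALARM_MAPPING[metric_name]
    return resource_type, dimensions.get(id_dimension)


# Example: this alarm resolves to the host "node1", regardless of which
# Vitrage template later consumes it.
print(map_alarm_to_resource("http_status",
                            {"hostname": "node1", "service": "keystone"}))
```

The same alarm can then be reused in several templates, which is the benefit ifat_afek mentions above; the cost, as witek notes, is that one global rule per metric may not be flexible enough.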
ifat_afek | I need to think about it. Do you have an example? | 09:09 |
witek | http_status on node or VIP | 09:10 |
ifat_afek | How is it defined? it was probably written in the mail | 09:10 |
ifat_afek | `name`: `http_status`, `dimensions`: {`hostname`: `node1`, `service`: `keystone`, `url`: `http://node1/identity`} | 09:10 |
ifat_afek | This one, right? | 09:11 |
witek | yes | 09:11 |
ifat_afek | And why do you think we should handle it in the template? | 09:11 |
witek | when configured with node URL, gives information about the service on the node | 09:12 |
ifat_afek | sorry, I don’t understand | 09:12 |
witek | when configured with VIP URL, information is one layer higher, for load-balanced service | 09:12 |
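As an illustration of witek's point, the same check can describe two different layers depending only on its dimensions. The dimension values below are made up; only the first mirrors the metric pasted above.

```python
# Two hypothetical instances of the same http_status metric. The metric name
# is identical; only the dimensions reveal whether it watches a single node
# or a load-balanced VIP, which is why a name-only mapping cannot tell them
# apart.
node_level_check = {
    "name": "http_status",
    "dimensions": {"hostname": "node1", "service": "keystone",
                   "url": "http://node1/identity"},
}

vip_level_check = {
    "name": "http_status",
    "dimensions": {"service": "keystone",
                   "url": "http://keystone-vip/identity"},
}
```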
aspiers | newbie question: how is it currently done with Zabbix? | 09:13 |
ifat_afek | we are facing similar questions with Zabbix. So far we are using it for monitoring hosts, and these are statically defined in a zabbix_conf file. For monitoring VMs, interfaces etc. we haven’t implemented a good solution yet | 09:14 |
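For comparison, the idea behind the static zabbix_conf mapping mentioned here is roughly the following; the field names are only an approximation of the concept, not the exact schema of the real file.

```python
# Rough Python rendering of a static Zabbix-style mapping: each monitored
# Zabbix host is pinned in advance to a Vitrage resource. Field names are
# illustrative, not the exact zabbix_conf schema.
ZABBIX_HOST_MAPPING = [
    {"zabbix_host": "compute-1", "type": "nova.host", "name": "compute-1"},
    {"zabbix_host": "compute-2", "type": "nova.host", "name": "compute-2"},
]
```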
aspiers | ah ok | 09:14 |
ifat_afek | we are also facing this question with Prometheus... | 09:14 |
aspiers | is it worth writing a spec for this maybe? | 09:14 |
aspiers | or is that too heavy-weight? | 09:14 |
ifat_afek | of course, I’m trying to understand what this spec should include | 09:14 |
aspiers | got it :-) | 09:14 |
ifat_afek | I don’t think it’s too heavy-weight, and it is definitely something we should handle | 09:15 |
ifat_afek | witek: let me get back to your question. You are saying that depending on the URL we should figure out the resource type? | 09:15 |
aspiers | well, the spec could list multiple options but propose a preferred solution and list the other(s) as alternative(s) | 09:15 |
ifat_afek | aspiers: of course | 09:16 |
ifat_afek | this is why I wanted to make a temporary fix for the current change in gerrit. But it should be a smart fix and not the existing POC code | 09:16 |
witek | ifat_afek: I think that in general operators might want to use the same metric to alarm about different things | 09:17 |
ifat_afek | the full implementation should not take a long time, the design is the complicated part | 09:17 |
ifat_afek | witek: I agree | 09:17 |
ifat_afek | and how do the operators understand what is being monitored? suppose they see the alarms in Monasca itself, do they figure it out by the resource name? by the URL?… | 09:18 |
witek | so it might be an advantage if they also have a mechanism in the alarm entity definition to describe how a given alarm should be interpreted | 09:18 |
ifat_afek | but the interpretation should happen in Monasca first, right? | 09:18 |
ifat_afek | when you create an alarm definition, you should somehow describe what you are monitoring | 09:19 |
witek | operators are free to define their own alarms | 09:19 |
witek | they know, what the metric measures | 09:19 |
ifat_afek | If I am an operator, and I see a ‘high cpu load’ alarm, how can I tell if it was raised on a vm or on a host? by the resource name? by a certain dimension? | 09:19 |
ifat_afek | BTW, aspiers, if you prefer we can take this discussion offline :-) | 09:20 |
aspiers | no, this is a great discussion and obviously important for self-healing :) | 09:21 |
aspiers | and we probably don't have any other topics today anyway :) | 09:21 |
witek | depends how the metric is collected: each monasca-agent plugin provides unique metric names | 09:21 |
witek | so, system metric names are different than the ones from the libvirt plugin | 09:21 |
ifat_afek | in this case, can we have a configuration file in Vitrage that determines the resource type per metric name? will it always be 1-1 relation? | 09:22 |
ifat_afek | or maybe better, can we get this information directly from Monasca? | 09:22 |
witek | I'd say not always, but in most cases | 09:23 |
*** eyalb1 has joined #openstack-self-healing | 09:23 | |
ifat_afek | if it’s most cases, then maybe we can’t do what I suggested | 09:24 |
ifat_afek | so back to my previous question - if the same metric can be used for two different resource types, how does the operator understand what the alarm was raised on? | 09:25 |
ifat_afek | I’m trying to understand the logic behind this, so I can use this logic in Vitrage | 09:25 |
witek | I think the problem appears for generic metrics, like e.g. http_check | 09:26 |
witek | which can be configured with really any http endpoint | 09:27 |
witek | and to your question, the metric is uniquely described by its name and dimension key/values | 09:28 |
ifat_afek | so I could say something like this: if metric_name==‘http_check’ and url==‘…..’ then resource_type is host and resource_id is ‘resource_id’? | 09:29 |
ifat_afek | the disadvantage is, of course, having a detailed description for every alarm | 09:30 |
ifat_afek | and the need to manually update those descriptions once in a while | 09:30 |
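The rule ifat_afek describes could be expressed as a small ordered list of match conditions. The sketch below is hypothetical and just shows the shape such a per-alarm description might take; it is the "slightly complex" configuration file discussed later, not an existing Vitrage feature.

```python
# Hypothetical rule-based mapping: each rule matches on the metric name plus
# selected dimension values, and says which resource type/id the alarm should
# be attached to. The first matching rule wins.
MAPPING_RULES = [
    {"metric": "http_check",
     "match": {"url": "http://node1/identity"},
     "resource_type": "nova.host",
     "id_dimension": "hostname"},
    {"metric": "http_check",
     "match": {},                      # fallback for any other http_check
     "resource_type": "service",
     "id_dimension": "service"},
]


def resolve(metric, dimensions):
    """Return (resource_type, resource_id) for an alarm, or None."""
    for rule in MAPPING_RULES:
        if rule["metric"] != metric:
            continue
        if all(dimensions.get(k) == v for k, v in rule["match"].items()):
            return rule["resource_type"], dimensions.get(rule["id_dimension"])
    return None
```

The maintenance cost is exactly the disadvantage mentioned above: every new alarm definition may need a matching rule, and the rules must be kept up to date.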
ifat_afek | alternatively - does it make sense to ask for a dedicated ‘resource_type’ dimension in Monasca? | 09:31 |
witek | configuration of a dedicated `resource_type` dimension in the agent could be left to the operator, per convention | 09:34 |
ifat_afek | ok, so we can’t force it nor assume it is there | 09:34 |
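If operators do adopt a `resource_type` dimension by convention, Vitrage could prefer it and only fall back to the configured mapping when it is absent. Again a hypothetical sketch; the precedence order is an assumption, not agreed design.

```python
# Hypothetical precedence: trust an explicit resource_type dimension when the
# operator provided one, otherwise fall back to the configured mapping rules
# (e.g. the resolve() sketch above, passed in as fallback_resolver).
def resolve_with_convention(metric, dimensions, fallback_resolver):
    if "resource_type" in dimensions:
        return dimensions["resource_type"], dimensions.get("resource_id")
    return fallback_resolver(metric, dimensions)
```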
ifat_afek | so it seems like we should have a slightly-complex configuration file. do you have a better idea? | 09:35 |
aspiers | #link http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000806.html mailing list discussion on the integration | 09:36 |
witek | I think we should start with defining use cases we would like to cover and check if we can do it with simple conf file | 09:36 |
witek | I think it should work for most cases | 09:36 |
ifat_afek | sounds like a good idea | 09:37 |
ifat_afek | so I can start a spec with use cases and you’ll help me | 09:37 |
witek | I'll definitely help | 09:37 |
ifat_afek | later I can add a proposed solution to the spec | 09:38 |
ifat_afek | but I agree we should start with the use cases | 09:38 |
aspiers | sounds good to me too | 09:38 |
ifat_afek | great. I’ll push an initial version today or tomorrow | 09:38 |
witek | cool, thanks | 09:38 |
aspiers | awesome! | 09:39 |
ifat_afek | where to? I was thinking about vitrage-specs, because the implementation will most likely be inside Vitrage | 09:39 |
ifat_afek | unless you think it should be in the self-healing repo | 09:39 |
aspiers | #action ifat_afek will submit a spec with use cases | 09:39 |
witek | yes, I think Vitrage repo is best suited | 09:39 |
aspiers | either is fine by me | 09:39 |
ifat_afek | witek: do you have a simple solution for the existing change? e.g. use a dimension that will work in many cases but not all, instead of the one used for the POC? just so we can say this change is finished | 09:41 |
witek | I would say, we could stay with current approach for the first version | 09:43 |
witek | will leave a comment in review | 09:43 |
ifat_afek | great, thanks | 09:43 |
aspiers | cool. anything else on this topic? | 09:44 |
ifat_afek | nothing on my side | 09:44 |
witek | no, thanks | 09:44 |
aspiers | I learned some useful things, especially that I need to improve my email filters ;-) | 09:44 |
witek | :) oh, we haven't added [self-healing] | 09:45 |
*** eyalb1 has left #openstack-self-healing | 09:45 | |
ifat_afek | my bad… | 09:45 |
aspiers | haha no problem X-D | 09:45 |
aspiers | ok, well thanks a lot both - I'm SUPER happy and excited this discussion is happening :) | 09:45 |
ifat_afek | me too! | 09:46 |
witek | yes, me too, thanks ifat_afek for launching it! | 09:46 |
aspiers | it's exactly the kind of cross-project work I was dreaming of for the SIG | 09:46 |
aspiers | I will ping Joseph and see if he can join future IRC discussions, but I see he's on the mailing list thread already anyway | 09:47 |
aspiers | and he's in the wrong timezone for this meeting | 09:47 |
aspiers | do either of you intend to join the one later today? | 09:47 |
aspiers | no problem at all if not | 09:47 |
ifat_afek | I plan to | 09:47 |
aspiers | OK, it will be his morning then | 09:47 |
witek | I'm not sure yet | 09:47 |
aspiers | I'll see if he can, but I guess it's not a big deal if not | 09:48 |
aspiers | if you included [self-healing] when announcing the spec, maybe it can get a few more reviewers | 09:48 |
aspiers | I will certainly review, anyway | 09:49 |
ifat_afek | usually I don’t announce specs, but I can do it this time | 09:49 |
ifat_afek | because indeed it is interesting, and I’ll be happy to hear more opinions | 09:49 |
aspiers | great :) | 09:49 |
aspiers | alright, just very briefly for the record... | 09:49 |
aspiers | #topic service health check API | 09:50 |
*** openstack changes topic to "service health check API (Meeting topic: self-healing)" | 09:50 | |
aspiers | there seems to be some movement on this, since the TC have proposed it as a goal for Train | 09:50 |
aspiers | but there needs to be a champion for the goal | 09:50 |
aspiers | I have initiated some discussion in SUSE about this - maybe there is a chance that I or another colleague could volunteer for that | 09:51 |
aspiers | but we need to discuss prioritisation first, so can't guarantee anything | 09:51 |
aspiers | in any case, if we had a health check API across multiple services, this would presumably tie in very nicely with the Vitrage/Monasca integration efforts | 09:52 |
ifat_afek | of course, I think it’s a great initiative, and I’ll be happy if you or your colleagues can drive it forward | 09:52 |
aspiers | awesome, thanks | 09:53 |
aspiers | I think it's being discussed under [all][tc] and I added [self-healing] later IIRC | 09:53 |
aspiers | #link http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000599.html | 09:54 |
aspiers | you probably already saw that | 09:54 |
ifat_afek | right | 09:54 |
aspiers | not much more to say about that right now, but I thought it was worth mentioning | 09:54 |
aspiers | ifat_afek: should I add a task to https://storyboard.openstack.org/#!/story/2002684 for creating the vitrage-spec? | 09:55 |
aspiers | so you can reference it in the commit message? | 09:55 |
*** ricolin has quit IRC | 09:56 | |
aspiers | ah sorry | 09:56 |
aspiers | wrong story | 09:56 |
ifat_afek | aspiers: are you sure this is the right story? this one is about Vitrage and Heat | 09:56 |
ifat_afek | :-) | 09:56 |
aspiers | :) | 09:56 |
aspiers | I'm not awake yet | 09:56 |
aspiers | hrm, do we have a story for the integration yet? | 09:56 |
aspiers | maybe need to create one | 09:56 |
ifat_afek | which I plan to progress with, BTW, but I’m not ready to update about it yet | 09:56 |
ifat_afek | we have a story in Vitrage | 09:57 |
aspiers | OK, I'll look for that | 09:57 |
ifat_afek | https://storyboard.openstack.org/#!/story/2004550 | 09:57 |
aspiers | thanks! | 09:57 |
aspiers | alright, I guess we're done | 09:57 |
ifat_afek | And another one for the (near?) future, to accept immediate notifications | 09:57 |
aspiers | got it | 09:57 |
ifat_afek | https://storyboard.openstack.org/#!/story/2004064 | 09:57 |
aspiers | ah yeah, it seems I was already subscribed to that one :) | 09:58 |
aspiers | OK, thanks a lot both, and maybe see you later! | 09:59 |
ifat_afek | I can add tasks for writing a spec and also for implementing this spec, on top of the initial implementation | 09:59 |
ifat_afek | see you later! | 09:59 |
aspiers | awesome | 09:59 |
aspiers | #endmeeting | 09:59 |
witek | thanks aspiers and ifat_afek | 09:59 |
*** openstack changes topic to "https://wiki.openstack.org/wiki/Self_healing_SIG | https://storyboard.openstack.org/#!/project/openstack/self-healing-sig" | 09:59 | |
openstack | Meeting ended Wed Dec 19 09:59:44 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 09:59 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.html | 09:59 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.txt | 09:59 |
openstack | Log: http://eavesdrop.openstack.org/meetings/self_healing/2018/self_healing.2018-12-19-09.02.log.html | 09:59 |
*** ifat_afek has quit IRC | 12:24 | |
*** ifat_afek has joined #openstack-self-healing | 13:24 | |
aspiers | ifat_afek: sorry, have realised I can't make the next meeting | 14:30 |
aspiers | feel free to have it without me if there is anyone else around | 14:31 |
ifat_afek | aspiers: no problem. I don’t have any special agenda, at least until I make some progress with the spec | 14:31 |
aspiers | ok | 14:31 |
*** alexchadin has quit IRC | 15:11 | |
*** ekcs has joined #openstack-self-healing | 16:58 | |
*** joadavis has joined #openstack-self-healing | 17:02 | |
*** joadavis has left #openstack-self-healing | 17:27 | |
*** ekcs has quit IRC | 17:47 | |
*** ifat_afek has quit IRC | 17:56 | |
*** ekcs has joined #openstack-self-healing | 18:38 | |
*** ifat_afek has joined #openstack-self-healing | 18:51 | |
*** ifat_afek has quit IRC | 19:00 | |
*** ifat_afek has joined #openstack-self-healing | 19:09 | |
*** ifat_afek has quit IRC | 19:12 | |
*** ifat_afek has joined #openstack-self-healing | 19:19 | |
*** ifat_afek has quit IRC | 19:31 | |
*** joadavis has joined #openstack-self-healing | 22:37 | |
*** joadavis has left #openstack-self-healing | 22:38 |