*** ekcs has quit IRC | 02:13 | |
*** ricolin has joined #openstack-self-healing | 03:31 | |
*** openstackstatus has quit IRC | 04:35 | |
*** openstackstatus has joined #openstack-self-healing | 04:36 | |
*** ChanServ sets mode: +v openstackstatus | 04:36 | |
*** ricolin has quit IRC | 05:30 | |
*** tojuvone has quit IRC | 05:57 | |
*** tojuvone has joined #openstack-self-healing | 05:57 | |
*** ifat_afek has joined #openstack-self-healing | 06:39 | |
*** witek has joined #openstack-self-healing | 07:23 | |
*** bogdando has joined #openstack-self-healing | 08:02 | |
*** evrardjp has quit IRC | 08:18 | |
*** evrardjp has joined #openstack-self-healing | 08:19 | |
*** ricolin has joined #openstack-self-healing | 08:30 | |
aspiers | morning | 09:01 |
*** tojuvone has quit IRC | 09:09 | |
*** tojuvone has joined #openstack-self-healing | 09:09 | |
witek | hi aspiers | 09:16 |
aspiers | hi witek | 09:16 |
aspiers | I guess daylight savings shifted the meeting ... | 09:16 |
aspiers | for some people anyway | 09:17 |
witek | yes | 09:18 |
aspiers | I wonder if it's worth moving | 09:19 |
aspiers | we did that last time | 09:19 |
witek | you mean meeting time, or daylight savings? :) | 09:19 |
aspiers | haha | 09:19 |
aspiers | I'd like to stay permanently in BST | 09:20 |
witek | that is UTC+1? | 09:20 |
*** ifat_afek has quit IRC | 09:56 | |
*** ifat_afek has joined #openstack-self-healing | 09:57 | |
aspiers | yes | 10:01 |
*** tobberydberg has joined #openstack-self-healing | 11:22 | |
*** ifat_afek has quit IRC | 12:25 | |
*** ifat_afek has joined #openstack-self-healing | 12:26 | |
*** ricolin has quit IRC | 12:50 | |
*** ricolin has joined #openstack-self-healing | 12:51 | |
*** irclogbot_2 has joined #openstack-self-healing | 13:04 | |
*** altlogbot_3 has joined #openstack-self-healing | 13:07 | |
*** mvkr has quit IRC | 13:19 | |
*** ifat_afek has quit IRC | 13:20 | |
*** ifat_afek has joined #openstack-self-healing | 13:22 | |
*** mvkr has joined #openstack-self-healing | 13:53 | |
evrardjp | just getting rid of daylight saving globally would be a really good first step | 13:56 |
evrardjp | aspiers: did jsuchome contact you recently? | 13:56 |
*** ifat_afek has quit IRC | 14:02 | |
*** tojuvone has quit IRC | 14:38 | |
*** tojuvone has joined #openstack-self-healing | 14:39 | |
aspiers | evrardjp: yes I just saw it | 14:52 |
aspiers | evrardjp: we could discuss it in 1 hour in the meeting | 14:53 |
ricolin | aspiers, are we having the meeting in an hour or now? | 16:02 |
*** bogdando has quit IRC | 16:34 | |
*** altlogbot_3 has quit IRC | 16:46 | |
*** jsuchome has joined #openstack-self-healing | 16:52 | |
aspiers | ricolin: I think it's now :) | 17:02 |
aspiers | sorry for confusion | 17:02 |
aspiers | evrardjp: are you around? | 17:02 |
ricolin | aspiers, NP, I figured that out by checking irc-meeting | 17:02 |
aspiers | ricolin: I'm still catching up with stuff, but reading http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004246.html now | 17:03 |
jsuchome | o/ | 17:03 |
aspiers | oh hey jsuchome | 17:04 |
ricolin | aspiers, we can discuss it in the meeting if you like | 17:04 |
aspiers | sure | 17:04 |
aspiers | #startmeeting self-healing | 17:04 |
openstack | Meeting started Wed Apr 10 17:04:55 2019 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot. | 17:04 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 17:04 |
*** openstack changes topic to " (Meeting topic: self-healing)" | 17:04 | |
openstack | The meeting name has been set to 'self_healing' | 17:04 |
aspiers | OK so I think we have 3 topics for the agenda today | 17:05 |
aspiers | 1. whatever jsuchome wants to talk about ;-) | 17:05 |
aspiers | 2. ricolin's topic of http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004246.html | 17:05 |
aspiers | 3. Denver | 17:05 |
jsuchome | I've added mine to the agenda btw | 17:06 |
aspiers | cool | 17:06 |
jsuchome | if https://etherpad.openstack.org/p/self-healing-SIG-IRC-meeting is indeed current agenda | 17:06 |
aspiers | I actually forgot about that etherpad %-D | 17:06 |
aspiers | jsuchome, ricolin: if either of you are in a hurry we can change the order so you can leave early | 17:06 |
ricolin | it will be nice if I can go first | 17:07 |
aspiers | wow, I don't think I've ever used that etherpad before | 17:07 |
aspiers | ricolin: OK I guess your topic will be quick anyway | 17:07 |
aspiers | #topic help most needed for SIG | 17:07 |
*** openstack changes topic to "help most needed for SIG (Meeting topic: self-healing)" | 17:07 | |
ricolin | Not that I will leave the meeting, I just wish to cover it while my brain is still awake | 17:07 |
aspiers | haha :) | 17:07 |
aspiers | sure | 17:07 |
aspiers | ricolin: I can answer most of these questions | 17:08 |
ricolin | yeah | 17:08 |
aspiers | but I could send a reply to the ML ? | 17:08 |
aspiers | I guess you have things you want to ask now too? | 17:08 |
ricolin | It will be nice if you can reply on the ML | 17:08 |
aspiers | sure | 17:08 |
*** mvkr has quit IRC | 17:09 | |
ricolin | I'm fine to get those answers from SIG chairs | 17:09 |
aspiers | OK | 17:09 |
ricolin | so I will leave it to you then :) | 17:09 |
aspiers | that's fine :) | 17:10 |
ricolin | Any specific concerns about those questions? | 17:10 |
aspiers | I will reply tonight or tomorrow | 17:10 |
aspiers | feel free to chase me if I forget ... | 17:10 |
ricolin | I will also try to invite project teams to join us in that forum | 17:10 |
ricolin | and hope that will help with SIGs | 17:11 |
aspiers | cool | 17:11 |
aspiers | is there a meta-SIG Forum session? | 17:11 |
aspiers | I don't see one on https://wiki.openstack.org/wiki/Forum/Denver2019 | 17:11 |
ricolin | It's an action from the previous summit forum `expose SIGs and WGs` | 17:12 |
aspiers | Yeah I remember that discussion | 17:12 |
aspiers | Just wondering if there is a slot for it in Denver | 17:12 |
ricolin | https://etherpad.openstack.org/p/DEN-Train-TC-brainstorming | 17:12 |
ricolin | under TC actually | 17:12 |
aspiers | OK cool | 17:13 |
ricolin | I wasn't running meta-SIG at the time | 17:13 |
aspiers | np | 17:13 |
aspiers | maybe we can suggest that to ttx and diablo_rojo | 17:13 |
aspiers | anyway | 17:13 |
ricolin | what do you mean by suggest? | 17:14 |
aspiers | anything else on this topic or can we move on to jsuchome's topic? | 17:14 |
aspiers | I mean suggest that maybe we should have a meta-SIG session | 17:14 |
aspiers | e.g. 30 mins | 17:14 |
ricolin | aspiers, we can move on:) | 17:14 |
aspiers | OK thanks :) | 17:14 |
aspiers | #topic supporting self-healing in openstack-helm | 17:14 |
ricolin | aspiers, That's actually a nice idea | 17:14 |
*** openstack changes topic to "supporting self-healing in openstack-helm (Meeting topic: self-healing)" | 17:14 | |
aspiers | ricolin: we can talk about it in #openstack-tc maybe | 17:15 |
ricolin | aspiers, I already proposed a SIG governance PTG topic under TC PTG as well | 17:15 |
aspiers | oh nice | 17:15 |
ricolin | aspiers, yep | 17:15 |
aspiers | jsuchome: the floor is yours :) | 17:15 |
jsuchome | OK, so - projects deployed by openstack-helm have these probes, basically python scripts that make sure that a certain service is alive | 17:15 |
aspiers | evrardjp: ^^^ in case you are listening or reading scrollback | 17:16 |
jsuchome | (service running in a container / kubernetes pod, but that's not necessarily interesting here) | 17:16 |
jsuchome | you can see one e.g. here, that is for neutron https://review.openstack.org/#/c/632200/ | 17:16 |
aspiers | yeah, just looking at that | 17:16 |
jsuchome | it is using some fake RPC calls just to find out if RPC service is responding | 17:16 |
aspiers | so is this just for RPC, or also APIs? | 17:17 |
jsuchome | problem with that is that while getting fake function, it logs some errors into the logs | 17:17 |
jsuchome | which does not break anything, because we catch the exception and ignore it, but it is ugly and you can miss the real errors in such log files | 17:17 |
jsuchome | just RPC AFAIK | 17:17 |
aspiers | OK so it's basically the RPC equivalent of https://storyboard.openstack.org/#!/story/2001439 ? | 17:18 |
jsuchome | so, I think, we could use some real methods instead of fake ones, the problem is which ones? | 17:18 |
aspiers | I was hoping to drive API health checks as a community goal for Train, but I ran out of time :-/ | 17:19 |
jsuchome | that might be it, I do not know this story, but that's probably the reason I was redirected to self-healing when proposing we should change it in helm | 17:19 |
aspiers | but yeah this sounds like a good idea to me | 17:19 |
jsuchome | I mean, helm : my idea would be to have such simple "ping" methods in different openstack components | 17:19 |
aspiers | the original idea was to start with APIs first and handle RPC etc. later | 17:19 |
aspiers | but if there is developer bandwidth to do RPC first then that is fine too | 17:19 |
aspiers | IIUC https://review.openstack.org/#/c/632200/37/neutron/templates/bin/_health-probe.py.tpl is somewhat overlapping with API health checks too | 17:21 |
jsuchome | so ... I do not know if it's a good idea just to start posting something into e.g. nova codebase ... seems like a better start would be to read through your proposal for API | 17:21 |
aspiers | definitely worth reading through the API proposal, although I guess the implementation will need to be quite different for RPC | 17:21 |
ricolin | jsuchome, I think it's a nice tool to check the RPC health, and the self-healing SIG should definitely help to run/host the process. We can also discuss where that code can live | 17:21 |
aspiers | yeah | 17:21 |
aspiers | I agree that fake RPC calls are not good, we should have real ones to avoid errors | 17:22 |
aspiers | but not just to avoid errors, I guess the call could trigger some internal checks similarly to what we intended with the API health checks | 17:22 |
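The difference between the two probe styles discussed above can be sketched in plain Python. This is only a toy illustration, not the openstack-helm probe and not any real RPC library: `Endpoint`, `Dispatcher`, `ping`, and both probe functions are hypothetical names invented for this sketch.

```python
import logging

log = logging.getLogger("rpc_sketch")


class Endpoint:
    """Hypothetical service endpoint with a real, no-argument health method."""

    def ping(self):
        return {"status": "ok"}


class Dispatcher:
    """Toy stand-in for an RPC server that maps method names to an endpoint."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def call(self, method):
        handler = getattr(self.endpoint, method, None)
        if handler is None:
            # This is the kind of log noise a fake method call leaves behind.
            log.error("no such RPC method: %s", method)
            raise LookupError(method)
        return handler()


def probe_with_fake_method(dispatcher):
    """Liveness via a fake call: it works, but an error lands in the log."""
    try:
        dispatcher.call("does_not_exist")
    except LookupError:
        return True  # the server answered at all, so it is alive
    return False


def probe_with_ping(dispatcher):
    """Liveness via a real method: same answer, no error logged."""
    return dispatcher.call("ping")["status"] == "ok"
```

Both probes report the service as alive, but only the fake-method probe writes an error record, which is the noise that can hide real errors in the log files.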
aspiers | jsuchome: https://storyboard.openstack.org/#!/story/2001439 is the best place to start reading about API health checks | 17:23 |
aspiers | we agreed to reuse the existing oslo.middleware API and extend it to v2 | 17:23 |
jsuchome | maybe ... but there's probably the idea that current probes should be "light" so they do not consume resources needed elsewhere, I do not know right now how often they are called | 17:23 |
aspiers | yes | 17:24 |
ricolin | Yeah agree with aspiers , we can reuse/create the same storyboard story in self-healing SIG and implement the basic checking tool in oslo, and services can implement RPC/API check within their own codebase | 17:24 |
aspiers | with API health checks we debated for years whether to make them synchronous (triggered by the request) or async (run periodically and cache the results) | 17:25 |
aspiers | in the end we decided to start simple with synchronous and worry about maybe moving to background later | 17:25 |
aspiers | since there were endless discussions and no progress on code | 17:25 |
aspiers | so my advice would be to avoid over-engineering something complex early on, and start with the simplest thing which can possibly work | 17:26 |
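The synchronous versus asynchronous trade-off mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the oslo.middleware healthcheck implementation: `SyncHealth`, `CachedHealth`, and the `checks` mapping are hypothetical names, and the periodic caller of `refresh()` is left out.

```python
import time


class SyncHealth:
    """Runs every check each time status is requested (the simple option)."""

    def __init__(self, checks):
        self.checks = checks  # mapping: check name -> zero-argument callable

    def status(self):
        return {name: check() for name, check in self.checks.items()}


class CachedHealth:
    """Serves the last stored result; refresh() is meant to be called
    periodically by some background task (hypothetical, not shown here)."""

    def __init__(self, checks):
        self.checks = checks
        self.cached = None
        self.checked_at = None

    def refresh(self):
        self.cached = {name: check() for name, check in self.checks.items()}
        self.checked_at = time.time()

    def status(self):
        # No checks run here, so the request is cheap but may be stale.
        return {"results": self.cached, "checked_at": self.checked_at}
```

The synchronous variant costs one round of checks per request, which is why probe frequency matters; the cached variant decouples that cost from the request rate at the price of extra moving parts.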
ricolin | aspiers, jsuchome agree, as light as it can be, and expand if really needed | 17:26 |
jsuchome | well, I'm not sure if my idea wasn't too simple - I really just thought of going through those services I want to watch and adding simple calls with no params and basic response | 17:27 |
aspiers | that sounds good to me | 17:27 |
aspiers | later the basic response could optionally be made more informative | 17:27 |
aspiers | e.g. "oh no! I'm still running but my backend is broken" | 17:27 |
aspiers | however IIRC we also agreed to avoid recursive / transitive checks | 17:28 |
ricolin | jsuchome, so is the basic code ready to be split out of openstack-helm now, or do you plan to restart the effort? | 17:28 |
aspiers | so e.g. it's OK for a service to report stuff like "my db connection is healthy/broken", but not OK for it to report "I depend on another service X, and X can't talk to its db" | 17:29 |
aspiers | since in that case service X should report db connection issues via its own healthcheck API or RPC calls | 17:29 |
jsuchome | this would really not be part of openstack-helm code at all, it needs to go directly into openstack components, openstack-helm would just use what is prepared | 17:29 |
aspiers | and we don't want the same health checks duplicated / triggered in multiple places | 17:29 |
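The rule against recursive / transitive checks can be illustrated with a small sketch. The names `health_report` and `cluster_view` and the report layout are assumptions made up for this example; the SIG's actual response format is not given in this log.

```python
def health_report(service_name, local_checks):
    """Build a report from the service's OWN checks only (e.g. its own DB
    connection). It never embeds the health of services it depends on."""
    results = {name: check() for name, check in local_checks.items()}
    return {
        "service": service_name,
        "healthy": all(results.values()),
        "checks": results,
    }


def cluster_view(reports):
    """Hypothetical monitor: gets the overall picture by asking each service
    directly instead of relying on one service to report on another."""
    return {report["service"]: report["healthy"] for report in reports}
```

With this split, a broken database connection in service X shows up once, in X's own report, and whoever monitors the deployment combines the individual reports.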
aspiers | jsuchome: yes exactly | 17:29 |
aspiers | jsuchome: we already intended openstack-helm to be one of the first customers of the API health checks (I talked to the AT&T guys about this in Berlin and they were interested) | 17:30 |
aspiers | so the same applies for RPC | 17:30 |
aspiers | I think this SIG is the best place to track that work | 17:30 |
jsuchome | it makes sense, with kubernetes being one of the engines that should be interested in health | 17:31 |
aspiers | definitely | 17:31 |
jsuchome | ok then - I'm gonna look at your API proposal ... then I might hack some simple POC and offer it for reviews, possibly coming here again for some consulting | 17:31 |
aspiers | my suggestion would be a) submit a new story for this, just like https://storyboard.openstack.org/#!/story/2001439 | 17:31 |
ricolin | +1 | 17:32 |
aspiers | (I think maybe best to keep it separate to the API health checks, unless you can see any significant overlap) | 17:32 |
aspiers | b) submit a spec to self-healing-sig repo | 17:32 |
aspiers | c) after spec is merged (or maybe even before) start submitting code to implement | 17:32 |
aspiers | does that make sense? | 17:33 |
jsuchome | it does, yes | 17:33 |
aspiers | cool | 17:33 |
ricolin | We should also have a PTG session for health checks for API and RPC | 17:33 |
ricolin | jsuchome, are you going to Denver too? | 17:33 |
aspiers | we should have time in the self-healing PTG session | 17:33 |
ricolin | aspiers, that's perfect | 17:34 |
jsuchome | unfortunately not, but I hope I can prepare something before so you can have some base if you have time to talk about this topic | 17:34 |
aspiers | jsuchome: I am very happy to lead that discussion | 17:34 |
aspiers | maybe evrardjp is coming too? I can't remember | 17:34 |
aspiers | jsuchome: feel free to add to https://etherpad.openstack.org/p/DEN-self-healing-SIG | 17:34 |
jsuchome | I think he will, yes | 17:34 |
ricolin | aspiers, I think he will | 17:35 |
aspiers | awesome | 17:35 |
jsuchome | ok, cool, I think I'm done here, thanks for the ideas | 17:35 |
aspiers | cool | 17:35 |
aspiers | thanks a lot for proposing, this is a really cool initiative :) | 17:35 |
aspiers | #topic Denver | 17:36 |
*** openstack changes topic to "Denver (Meeting topic: self-healing)" | 17:36 | |
aspiers | well I guess we already mostly covered this | 17:36 |
aspiers | but just for the record ... | 17:36 |
aspiers | #link https://etherpad.openstack.org/p/DEN-self-healing-SIG self-healing etherpad for Denver Forum and PTG sessions | 17:36 |
aspiers | I thought it was probably overkill to have two separate etherpads | 17:37 |
aspiers | I'll try to touch base with ekcs about Denver too | 17:37 |
aspiers | alright, anything else? if not I think we're done | 17:39 |
ricolin | aspiers, thx, I will try to see if I have any topic to put in | 17:39 |
aspiers | +1 | 17:39 |
ricolin | I think I will put one for tempest later | 17:40 |
aspiers | alright cool, thanks a lot folks! | 17:40 |
aspiers | good idea | 17:40 |
ricolin | aspiers, thx! | 17:40 |
aspiers | #action aspiers to reply to ricolin's SIG questionnaire on ML | 17:40 |
* ricolin likes that action :) | 17:40 | |
aspiers | XD | 17:40 |
jsuchome | good night or whatever time it is in your part of the world | 17:40 |
aspiers | #action jsuchome to propose the Denver discussion about RPC healthchecks | 17:41 |
ricolin | 01:41 for me:/ | 17:41 |
aspiers | #action ricolin to propose discussion topic for tempest | 17:41 |
aspiers | ouch! | 17:41 |
aspiers | OK, please sleep now ;-) | 17:41 |
aspiers | thanks a lot for attending | 17:41 |
aspiers | ttyl o/ | 17:41 |
ricolin | any chance to make our meeting two hour earlier?:) | 17:42 |
aspiers | we can make it at least one hour earlier | 17:42 |
aspiers | maybe 2 | 17:42 |
aspiers | but there was another meeting this morning | 17:42 |
aspiers | that one is targeted at EU / APAC | 17:42 |
aspiers | we always have two on the same day, so that all time zones are covered | 17:43 |
ricolin | aspiers, oh, okay then I should try to join that one | 17:43 |
aspiers | yes please :) | 17:43 |
aspiers | #action aspiers to ask if irc-meetings can accept events in non-fixed time zones, so that they automatically adapt to daylight savings | 17:43 |
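What "non-fixed time zones" would mean in practice can be shown with a short sketch using the Python standard library (`zoneinfo`, Python 3.9+). The 18:00 Europe/London slot and the helper name `meeting_in_utc` are assumptions chosen for illustration; they are not taken from the irc-meetings schedule.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def meeting_in_utc(year, month, day, hour, minute, zone):
    """Convert a meeting pinned to a local wall-clock time into UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(zone))
    return local.astimezone(timezone.utc)


# Same local slot, before and after the UK clock change on 31 March 2019.
before = meeting_in_utc(2019, 3, 27, 18, 0, "Europe/London")
after = meeting_in_utc(2019, 4, 10, 18, 0, "Europe/London")
print(before.strftime("%H:%M UTC"))  # 18:00 UTC
print(after.strftime("%H:%M UTC"))   # 17:00 UTC
```

A meeting stored as a fixed UTC time behaves the other way round: the UTC slot stays put and the local wall-clock time moves by an hour for attendees in zones that observe daylight saving.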
aspiers | I'll ask on infra now | 17:44 |
aspiers | OK, l8r folks! | 17:44 |
aspiers | #endmeeting | 17:44 |
*** openstack changes topic to "https://wiki.openstack.org/wiki/Self_healing_SIG | https://storyboard.openstack.org/#!/project/openstack/self-healing-sig" | 17:44 | |
openstack | Meeting ended Wed Apr 10 17:44:20 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:44 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.html | 17:44 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.txt | 17:44 |
openstack | Log: http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.log.html | 17:44 |
*** witek has quit IRC | 17:44 | |
*** ekcs has joined #openstack-self-healing | 17:45 | |
*** ricolin has quit IRC | 17:49 | |
-openstackstatus- NOTICE: Restarting Gerrit on review.openstack.org to pick up new configuration for the replication plugin | 19:05 | |
*** jsuchome has quit IRC | 19:06 | |
*** mvkr has joined #openstack-self-healing | 20:52 |