Wednesday, 2019-04-10

*** ekcs has quit IRC02:13
*** ricolin has joined #openstack-self-healing03:31
*** openstackstatus has quit IRC04:35
*** openstackstatus has joined #openstack-self-healing04:36
*** ChanServ sets mode: +v openstackstatus04:36
*** ricolin has quit IRC05:30
*** tojuvone has quit IRC05:57
*** tojuvone has joined #openstack-self-healing05:57
*** ifat_afek has joined #openstack-self-healing06:39
*** witek has joined #openstack-self-healing07:23
*** bogdando has joined #openstack-self-healing08:02
*** evrardjp has quit IRC08:18
*** evrardjp has joined #openstack-self-healing08:19
*** ricolin has joined #openstack-self-healing08:30
aspiersmorning09:01
*** tojuvone has quit IRC09:09
*** tojuvone has joined #openstack-self-healing09:09
witekhi aspiers09:16
aspiershi witek09:16
aspiersI guess daylight savings shifted the meeting ...09:16
aspiersfor some people anyway09:17
witekyes09:18
aspiersI wonder if it's worth moving09:19
aspierswe did that last time09:19
witekyou mean meeting time, or daylight savings? :)09:19
aspiershaha09:19
aspiersI'd like to stay permanently in BST09:20
witekthat is UTC+1?09:20
*** ifat_afek has quit IRC09:56
*** ifat_afek has joined #openstack-self-healing09:57
aspiersyes10:01
*** tobberydberg has joined #openstack-self-healing11:22
*** ifat_afek has quit IRC12:25
*** ifat_afek has joined #openstack-self-healing12:26
*** ricolin has quit IRC12:50
*** ricolin has joined #openstack-self-healing12:51
*** irclogbot_2 has joined #openstack-self-healing13:04
*** altlogbot_3 has joined #openstack-self-healing13:07
*** mvkr has quit IRC13:19
*** ifat_afek has quit IRC13:20
*** ifat_afek has joined #openstack-self-healing13:22
*** mvkr has joined #openstack-self-healing13:53
evrardjpjust getting rid of daylight saving globally would be a really good first step13:56
evrardjpaspiers: did jsuchome contact you recently?13:56
*** ifat_afek has quit IRC14:02
*** tojuvone has quit IRC14:38
*** tojuvone has joined #openstack-self-healing14:39
aspiersevrardjp: yes I just saw it14:52
aspiersevrardjp: we could discuss it in 1 hour in the meeting14:53
ricolinaspiers, are we having the meeting in an hour or now?16:02
*** bogdando has quit IRC16:34
*** altlogbot_3 has quit IRC16:46
*** jsuchome has joined #openstack-self-healing16:52
aspiersricolin: I think it's now :)17:02
aspierssorry for confusion17:02
aspiersevrardjp: are you around?17:02
ricolinaspiers, NP, I figured that out by checking irc-meeting17:02
aspiersricolin: I'm still catching up with stuff, but reading http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004246.html now17:03
jsuchomeo/17:03
aspiersoh hey jsuchome17:04
ricolinaspiers, we can discuss it in the meeting if you like17:04
aspierssure17:04
aspiers#startmeeting self-healing17:04
openstackMeeting started Wed Apr 10 17:04:55 2019 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.17:04
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.17:04
*** openstack changes topic to " (Meeting topic: self-healing)"17:04
openstackThe meeting name has been set to 'self_healing'17:04
aspiersOK so I think we have 3 topics for the agenda today17:05
aspiers1. whatever jsuchome wants to talk about ;-)17:05
aspiers2. ricolin's topic of http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004246.html17:05
aspiers3. Denver17:05
jsuchomeI've added mine to the agenda btw17:06
aspierscool17:06
jsuchomeif https://etherpad.openstack.org/p/self-healing-SIG-IRC-meeting is indeed current agenda17:06
aspiersI actually forgot about that etherpad %-D17:06
aspiersjsuchome, ricolin: if either of you are in a hurry we can change the order so you can leave early17:06
ricolinit will be nice if I can go first17:07
aspierswow, I don't think I've ever used that etherpad before17:07
aspiersricolin: OK I guess your topic will be quick anyway17:07
aspiers#topic help most needed for SIG17:07
*** openstack changes topic to "help most needed for SIG (Meeting topic: self-healing)"17:07
ricolinNot that I will leave the meeting, I just wish to do it while my brain is still awake17:07
aspiershaha :)17:07
aspierssure17:07
aspiersricolin: I can answer most of these questions17:08
ricolinyeah17:08
aspiersbut I could send a reply to the ML ?17:08
aspiersI guess you have things you want to ask now too?17:08
ricolinIt would be nice if you could reply on the ML17:08
aspierssure17:08
*** mvkr has quit IRC17:09
ricolinI'm fine with getting those answers from SIG chairs17:09
aspiersOK17:09
ricolinso I will leave it to you then :)17:09
aspiersthat's fine :)17:10
ricolinAny specific concerns about those questions?17:10
aspiersI will reply tonight or tomorrow17:10
aspiersfeel free to chase me if I forget ...17:10
ricolinI will also try to invite project teams to join us in that forum17:10
ricolinand hope that will help with SIGs17:11
aspierscool17:11
aspiersis there a meta-SIG Forum session?17:11
aspiersI don't see one on https://wiki.openstack.org/wiki/Forum/Denver201917:11
ricolinIt's an action from the previous summit forum session `expose SIGs and WGs`17:12
aspiersYeah I remember that discussion17:12
aspiersJust wondering if there is a slot for it in Denver17:12
ricolinhttps://etherpad.openstack.org/p/DEN-Train-TC-brainstorming17:12
ricolinunder TC actually17:12
aspiersOK cool17:13
ricolinI wasn't running meta-SIG at the time17:13
aspiersnp17:13
aspiersmaybe we can suggest that to ttx and diablo_rojo17:13
aspiersanyway17:13
ricolinwhat do you mean by suggest?17:14
aspiersanything else on this topic or can we move on to jsuchome's topic?17:14
aspiersI mean suggest that maybe we should have a meta-SIG session17:14
aspierse.g. 30 mins17:14
ricolinaspiers, we can move on:)17:14
aspiersOK thanks :)17:14
aspiers#topic supporting self-healing in openstack-helm17:14
ricolinaspiers, That's actually a nice idea17:14
*** openstack changes topic to "supporting self-healing in openstack-helm (Meeting topic: self-healing)"17:14
aspiersricolin: we can talk about it in #openstack-tc maybe17:15
ricolinaspiers, I already proposed a SIG governance PTG topic under TC PTG as well17:15
aspiersoh nice17:15
ricolinaspiers, yep17:15
aspiersjsuchome: the floor is yours :)17:15
jsuchomeOK, so - projects deployed by openstack-helm have these probes, basically python scripts that make sure that a certain service is alive17:15
aspiersevrardjp: ^^^ in case you are listening or reading scrollback17:16
jsuchome(service running in a container/kubernetes pod, but that's not necessarily interesting here)17:16
jsuchomeyou can see one e.g. here, that is for neutron https://review.openstack.org/#/c/632200/17:16
aspiersyeah, just looking at that17:16
jsuchomeit is using some fake RPC calls just to find out if RPC service is responding17:16
aspiersso is this just for RPC, or also APIs?17:17
jsuchomethe problem with that is that on receiving the fake call, the service writes errors to its logs17:17
jsuchomewhich does not break anything, because we catch the exception and ignore it, but it is ugly and you can miss the real errors in such log files17:17
jsuchomejust RPC AFAIK17:17
aspiersOK so it's basically the RPC equivalent of https://storyboard.openstack.org/#!/story/2001439 ?17:18
jsuchomeso, I think, we could use some real methods instead of fake ones, the problem is which ones?17:18
aspiersI was hoping to drive API health checks as a community goal for Train, but I ran out of time :-/17:19
jsuchomethat might be it, I do not know this story, but that's probably the reason I was redirected to self-healing when proposing we should change it in helm17:19
aspiersbut yeah this sounds like a good idea to me17:19
jsuchomeI mean, helm : my idea would be to have such simple "ping" methods in different openstack components17:19
aspiersthe original idea was to start with APIs first and handle RPC etc. later17:19
aspiersbut if there is developer bandwidth to do RPC first then that is fine too17:19
aspiersIIUC https://review.openstack.org/#/c/632200/37/neutron/templates/bin/_health-probe.py.tpl is somewhat overlapping with API health checks too17:21
jsuchomeso ... I do not know if it's a good idea just to start posting something into e.g. nova codebase ... seems like a better start would be to read through your proposal for API17:21
aspiersdefinitely worth reading through the API proposal, although I guess the implementation will need to be quite different for RPC17:21
ricolinjsuchome, I think it's a nice tool to check the RPC health, and the self-healing SIG should definitely help to run/host the process. We can also discuss where that code should live17:21
aspiersyeah17:21
aspiersI agree that fake RPC calls are not good, we should have real ones to avoid errors17:22
aspiersbut not just to avoid errors, I guess the call could trigger some internal checks similarly to what we intended with the API health checks17:22
aspiersjsuchome: https://storyboard.openstack.org/#!/story/2001439 is the best place to start reading about API health checks17:23
aspierswe agreed to reuse the existing oslo.middleware API and extend it to v217:23
jsuchomemaybe ... but there's probably the idea that current probes should be "light" so they do not use resources needed elsewhere, I do not know right now how often they are called17:23
aspiersyes17:24
ricolinYeah agree with aspiers , we can reuse/create the same storyboard story in self-healing SIG and implement the basic checking tool in oslo, and services can implement RPC/API check within their own codebase17:24
aspierswith API health checks we debated for years whether to make them synchronous (triggered by the request) or async (run periodically and cache the results)17:25
aspiersin the end we decided to start simple with synchronous and worry about maybe moving to background later17:25
aspierssince there were endless discussions and no progress on code17:25
aspiersso my advice would be to avoid over-engineering something complex early on, and start with the simplest thing which can possibly work17:26
ricolinaspiers, jsuchome agree, as light as it can be, and expand if really needed17:26
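The synchronous-versus-cached trade-off discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `sync_health_check` and `CachedHealthCheck` are hypothetical and are not part of oslo.middleware or any OpenStack project, and the cached variant refreshes lazily on request rather than running as a true periodic background task.

```python
import time
from typing import Callable, Dict, Optional


def sync_health_check(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Simplest option: run every check at request time and report each result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:
            # A check that raises is reported as failed instead of crashing the probe.
            results[name] = f"failed: {exc}"
    return results


class CachedHealthCheck:
    """Hypothetical later refinement: reuse a result until it is max_age seconds old."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._checks = checks
        self._max_age = max_age
        self._clock = clock
        self._cached: Optional[Dict[str, str]] = None
        self._cached_at = 0.0

    def status(self) -> Dict[str, str]:
        now = self._clock()
        # Re-run the checks only when there is no result yet or it has expired.
        if self._cached is None or now - self._cached_at >= self._max_age:
            self._cached = sync_health_check(self._checks)
            self._cached_at = now
        return self._cached
```

Starting with `sync_health_check` alone matches the "simplest thing which can possibly work" advice; the cache only becomes worthwhile if the probes turn out to be called often enough to cost real resources.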
jsuchomewell, I'm not sure if my idea wasn't too simple - I really just thought of going through those services I want to watch and adding simple calls with no params and basic response17:27
aspiersthat sounds good to me17:27
aspierslater the basic response could optionally be made more informative17:27
aspierse.g. "oh no! I'm still running but my backend is broken"17:27
aspiershowever IIRC we also agreed to avoid recursive / transitive checks17:28
ricolinjsuchome, so is the basic code ready to be split out of openstack-helm now, or do you plan to restart the effort?17:28
aspiersso e.g. it's OK for a service to report stuff like "my db connection is healthy/broken", but not OK for it to report "I depend on another service X, and X can't talk to its db"17:29
aspierssince in that case service X should report db connection issues via its own healthcheck API or RPC calls17:29
jsuchomethis would really not be part of openstack-helm code at all, it needs to go directly into openstack components, openstack-helm would just use what is prepared17:29
aspiersand we don't want the same health checks duplicated / triggered in multiple places17:29
aspiersjsuchome: yes exactly17:29
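A minimal sketch of the kind of no-parameter "ping" with a basic response described above, assuming a hypothetical helper `make_ping` (not an existing OpenStack or oslo API). It reports only the service's own local checks, such as its own database connection, and never the health of other services it depends on, in line with the agreement to avoid recursive / transitive checks.

```python
from typing import Callable, Dict


def make_ping(service: str, local_checks: Dict[str, Callable[[], bool]]):
    """Build a no-argument ping() for one service (hypothetical helper).

    local_checks must only cover resources this service owns directly,
    e.g. its own db connection; other services report their own health.
    """

    def ping() -> dict:
        report = {}
        for name, probe in local_checks.items():
            try:
                healthy = bool(probe())
            except Exception:
                # A probe that raises counts as broken; ping itself still answers.
                healthy = False
            report[name] = "healthy" if healthy else "broken"
        # "alive" is True whenever the service can answer at all, which allows
        # the "I'm still running but my backend is broken" style of response.
        return {"service": service, "alive": True, "checks": report}

    return ping
```

With an empty `local_checks` mapping this degenerates to the plain liveness call with a basic response; the per-check detail is the optional, more informative extension.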
aspiersjsuchome: we already intended openstack-helm to be one of the first customers of the API health checks (I talked to the AT&T guys about this in Berlin and they were interested)17:30
aspiersso the same applies for RPC17:30
aspiersI think this SIG is the best place to track that work17:30
jsuchomeit makes sense, with kubernetes being one of the engines that should be interested in health17:31
aspiersdefinitely17:31
jsuchomeok then - I'm gonna look at your API proposal ... then I might hack some simple POC and offer it for reviews, possibly coming here again for some consulting17:31
aspiersmy suggestion would be a) submit a new story for this, just like https://storyboard.openstack.org/#!/story/200143917:31
ricolin+117:32
aspiers(I think maybe best to keep it separate to the API health checks, unless you can see any significant overlap)17:32
aspiersb) submit a spec to self-healing-sig repo17:32
aspiersc) after spec is merged (or maybe even before) start submitting code to implement17:32
aspiersdoes that make sense?17:33
jsuchomeit does, yes17:33
aspierscool17:33
ricolinWe should also have a PTG session for health checks for API and RPC17:33
ricolinjsuchome, are you going to Denver too?17:33
aspierswe should have time in the self-healing PTG session17:33
ricolinaspiers, that's perfect17:34
jsuchomeunfortunately not, but I hope I can prepare something beforehand so you have some basis if you have time to talk about this topic17:34
aspiersjsuchome: I am very happy to lead that discussion17:34
aspiersmaybe evrardjp is coming too? I can't remember17:34
aspiersjsuchome: feel free to add to https://etherpad.openstack.org/p/DEN-self-healing-SIG17:34
jsuchomeI think he will, yes17:34
ricolinaspiers, I think he will17:35
aspiersawesome17:35
jsuchomeok, cool, I think I'm done here, thanks for the ideas17:35
aspierscool17:35
aspiersthanks a lot for proposing, this is a really cool initiative :)17:35
aspiers#topic Denver17:36
*** openstack changes topic to "Denver (Meeting topic: self-healing)"17:36
aspierswell I guess we already mostly covered this17:36
aspiersbut just for the record ...17:36
aspiers#link https://etherpad.openstack.org/p/DEN-self-healing-SIG self-healing etherpad for Denver Forum and PTG sessions17:36
aspiersI thought it was probably overkill to have two separate etherpads17:37
aspiersI'll try to touch base with ekcs about Denver too17:37
aspiersalright, anything else? if not I think we're done17:39
ricolinaspiers, thx, I will try to see if I have any topic to put in17:39
aspiers+117:39
ricolinI think I will put one for tempest later17:40
aspiersalright cool, thanks a lot folks!17:40
aspiersgood idea17:40
ricolinaspiers, thx!17:40
aspiers#action aspiers to reply to ricolin's SIG questionnaire on ML17:40
* ricolin likes that action :)17:40
aspiersXD17:40
jsuchomegood night or whatever time it is in your part of the world17:40
aspiers#action jsuchome to propose the Denver discussion about RPC healthchecks17:41
ricolin01:41 for me:/17:41
aspiers#action ricolin to propose discussion topic for tempest17:41
aspiersouch!17:41
aspiersOK, please sleep now ;-)17:41
aspiersthanks a lot for attending17:41
aspiersttyl o/17:41
ricolinany chance to make our meeting two hours earlier? :)17:42
aspierswe can make it at least one hour earlier17:42
aspiersmaybe 217:42
aspiersbut there was another meeting this morning17:42
aspiersthat one is targeted at EU / APAC17:42
aspierswe always have two on the same day, so that all time zones are covered17:43
ricolinaspiers, oh, okay, then I should try to join that one17:43
aspiersyes please :)17:43
aspiers#action aspiers to ask if irc-meetings can accept events in non-fixed time zones, so that they automatically adapt to daylight savings17:43
aspiersI'll ask on infra now17:44
aspiersOK, l8r folks!17:44
aspiers#endmeeting17:44
*** openstack changes topic to "https://wiki.openstack.org/wiki/Self_healing_SIG | https://storyboard.openstack.org/#!/project/openstack/self-healing-sig"17:44
openstackMeeting ended Wed Apr 10 17:44:20 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)17:44
openstackMinutes:        http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.html17:44
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.txt17:44
openstackLog:            http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-04-10-17.04.log.html17:44
*** witek has quit IRC17:44
*** ekcs has joined #openstack-self-healing17:45
*** ricolin has quit IRC17:49
-openstackstatus- NOTICE: Restarting Gerrit on review.openstack.org to pick up new configuration for the replication plugin19:05
*** jsuchome has quit IRC19:06
*** mvkr has joined #openstack-self-healing20:52
