*** markvoelker has joined #openstack-ha | 00:01 | |
*** markvoelker has quit IRC | 00:05 | |
*** yan-gao has quit IRC | 00:17 | |
*** hoonetorg has quit IRC | 00:32 | |
*** catintheroof has quit IRC | 00:34 | |
*** yan-gao has joined #openstack-ha | 00:42 | |
*** hoonetorg has joined #openstack-ha | 00:45 | |
*** furlongm has quit IRC | 01:28 | |
*** kgaillot has quit IRC | 01:33 | |
*** hoonetorg has quit IRC | 01:42 | |
*** masahito has joined #openstack-ha | 01:47 | |
*** hoonetorg has joined #openstack-ha | 01:54 | |
*** masahito has quit IRC | 03:14 | |
*** raginbajin has quit IRC | 03:14 | |
*** zerick has quit IRC | 03:15 | |
*** zerick has joined #openstack-ha | 03:18 | |
*** raginbajin has joined #openstack-ha | 03:21 | |
*** Dinesh_Bhor has joined #openstack-ha | 04:00 | |
*** dgurtner has joined #openstack-ha | 04:47 | |
*** dgurtner has quit IRC | 04:53 | |
*** obre_ has joined #openstack-ha | 05:06 | |
*** obre has quit IRC | 05:07 | |
*** masahito has joined #openstack-ha | 05:17 | |
*** nkrinner_afk is now known as nkrinner | 06:20 | |
*** dgurtner has joined #openstack-ha | 06:22 | |
*** pcaruana has joined #openstack-ha | 06:24 | |
*** dgurtner has quit IRC | 06:39 | |
*** dgurtner has joined #openstack-ha | 07:35 | |
*** jpena|off is now known as jpena | 07:39 | |
*** jpena is now known as jpena|off | 07:53 | |
aspiers | beekhof, samP: you around? | 07:57 |
---|---|---|
*** jpena|off is now known as jpena | 07:59 | |
*** rossella_s has joined #openstack-ha | 08:03 | |
beekhof | aspiers: for now :) | 08:04 |
*** ducnc has joined #openstack-ha | 08:09 | |
*** dgurtner has quit IRC | 08:14 | |
*** dgurtner has joined #openstack-ha | 08:31 | |
beekhof | aspiers: ok, i'm back | 08:38 |
beekhof | it was story time | 08:38 |
beekhof | so fundamentally, my question is, if they can reliably evacuate a VM after a failure, why do whole nodes still need attrd | 08:38 |
aspiers | by "they" you mean masakari? | 08:39 |
aspiers | I don't think masakari *needs* attrd but like I said, this architecture needs it in order to avoid a hard-coupling with masakari | 08:40 |
beekhof | yes | 08:40 |
aspiers | the monitoring/notification side needs somewhere to queue failures it spotted | 08:40 |
beekhof | tbh, i am most of the way towards saying "to hell with it, lets just convert to masakari" | 08:41 |
aspiers | well that's kind of what this is proposing | 08:41 |
aspiers | but retaining some leeway in case we need to swap it for something else | 08:41 |
beekhof | i mean exactly as-is. no attrd, no fake fencing agents | 08:42 |
aspiers | and also crucially not doing the crazy host monitoring which masakari currently does | 08:42 |
beekhof | crazy? | 08:42 |
aspiers | masakari has its own host and process monitoring | 08:42 |
aspiers | I want to use pacemaker for that | 08:42 |
aspiers | let me show you some code | 08:42 |
beekhof | well, we have host monitoring | 08:42 |
beekhof | i assume process == vm? | 08:43 |
aspiers | no process == process :) | 08:43 |
aspiers | masakari monitors at 3 levels: host, process, VM | 08:44 |
aspiers | but I don't like it doing host/process | 08:44 |
beekhof | agreed | 08:44 |
aspiers | https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/hostmonitor.sh | 08:44 |
aspiers | https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/processmonitor/processmonitor.sh | 08:44 |
aspiers | those attempt to reimplement aspects of Pacemaker as bash scripts | 08:45 |
aspiers | which I do not like | 08:45 |
aspiers | OTOH, I *do* like this VM monitor https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py | 08:46 |
aspiers | I discussed this with samP and IIRC he was OK with the idea of changing it so that Pacemaker does host / process monitoring | 08:46 |
beekhof | log_info "WARNING : $0 is deprecated as of the Ocata release and will be removed in the Queens release. Use masakari-hostmonitor implemented in python instead of $0." | 08:46 |
aspiers | oh, interesting | 08:46 |
aspiers | but still | 08:47 |
aspiers | that's what Pacemaker already does, well | 08:47 |
aspiers | and Pacemaker can guarantee the node has been fenced *before* anything else happens | 08:47 |
aspiers | I guess that message was referring to https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/cmd/hostmonitor.py | 08:48 |
aspiers | Python is better than shell but that's not enough to convince me that the architecture is right | 08:49 |
aspiers | beekhof: AFAIK the architecture has not changed much from https://github.com/ntt-sic/masakari | 08:50 |
beekhof | we do it less well for remote nodes though | 08:51 |
beekhof | but yes, fencing | 08:51 |
aspiers | beekhof: look at https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/handle_host.py | 08:51 |
beekhof | yeah, i was looking at that one :) | 08:51 |
beekhof | since its just wrapping pacemaker's view of the world, might as well just let pacemaker do it | 08:52 |
aspiers | https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/parse_cib_xml.py#L139 | 08:53 |
aspiers | hardcoding use of IPMI? | 08:53 |
beekhof | in any case, i agree with your premise. masakari for handling the evacuations, something else for triggering them | 08:53 |
aspiers | great! | 08:53 |
aspiers | so in terms of how to "sell" this internally | 08:54 |
aspiers | I guess the points are | 08:54 |
aspiers | masakari does a nice job of VM monitoring (and I assume recovery, but I can't remember) | 08:54 |
aspiers | masakari is (I think) in Big Tent already, and adheres to a lot of OpenStack conventions | 08:55 |
beekhof | how this is impacted by containers will be a significant factor | 08:55 |
aspiers | we need something which can execute more flexible / sophisticated policies | 08:55 |
beekhof | more reliable too | 08:55 |
aspiers | I suspect it would be easy to run masakari server in containers | 08:55 |
beekhof | ie. if the evac fails | 08:55 |
aspiers | right | 08:56 |
aspiers | also masakari already uses oslo e.g. oslo.db | 08:57 |
aspiers | the only question mark for me is whether we could achieve something similar with Congress + Mistral | 08:58 |
aspiers | but did you see my comment at the end of the meeting earlier? | 08:58 |
aspiers | https://blueprints.launchpad.net/mistral/+spec/mistral-ha was closed as obsolete in January | 08:58 |
aspiers | IIRC I asked ddeja if he knew why. I can't remember the answer, but I remember being disappointed by it | 08:59 |
ddeja | aspiers: let me think about it.... | 09:00 |
aspiers | ddeja: thanks :) | 09:01 |
ddeja | aspiers: I see that there is some movement around that blueprint | 09:02 |
ddeja | I guess there were some decisions to move forward at the PTG | 09:02 |
aspiers | ddeja: where do you see the movement? | 09:03 |
ddeja | I see new dependencies | 09:03 |
ddeja | that weren't there last time I looked at it | 09:03 |
* ddeja may just be wrong | 09:03 | |
*** ushkalim_ has joined #openstack-ha | 09:03 | |
ddeja | but AFAIK it was closed because it was opened so long ago that circumstances changed and therefore there is a need to write a new one | 09:05 |
* ddeja doesn't have enough time to keep up with Mistral... | 09:07 | |
aspiers | that sounds like a strange reason to close | 09:09 |
ddeja | aspiers: I may have forgotten about something else... | 09:12 |
samP | hi.. | 09:13 |
samP | I thought meeting starts at 6:00 JST ...my mistake.. | 09:14 |
aspiers | oh crap, no my mistake :-( | 09:16 |
aspiers | stupid daylight savings | 09:16 |
samP | aspiers: np... | 09:16 |
aspiers | samP: did your clocks change too? | 09:16 |
samP | aspiers: no...we dont have that luxury | 09:17 |
aspiers | I wouldn't call it a luxury ;-) | 09:17 |
aspiers | makes it much harder to wake up ... | 09:17 |
aspiers | samP: mostly I was explaining the diagram to beekhof. I think he likes it | 09:17 |
samP | aspiers: I'm just following the discussion.. is the diagram the same as the one you mailed me? | 09:18 |
aspiers | samP: yes | 09:18 |
samP | aspiers: great.. | 09:18 |
aspiers | it would make it easier for us to adopt masakari if the host and process monitors could be dropped in favour of Pacemaker, since they don't seem to offer anything extra over what Pacemaker can already do | 09:19 |
aspiers | or am I wrong? | 09:19 |
aspiers | the masakari host monitor seems to be just a wrapper around Pacemaker | 09:20 |
samP | aspiers: you are right | 09:20 |
aspiers | would it be possible for masakari developers to join future meetings? | 09:20 |
samP | aspiers: the only reason we maintain it is that we have some users who are using it. | 09:20 |
aspiers | OK | 09:20 |
aspiers | so if we could integrate Pacemaker host/process monitoring with masakari then those users would be able to switch | 09:21 |
*** rmart04 has joined #openstack-ha | 09:21 | |
aspiers | back in a few mins | 09:22 |
samP | aspiers: I think it is a nice solution and I can't see any problems with that. The only problem is, they have huge clusters and it will take time to adopt | 09:23 |
samP | aspiers: sure | 09:23 |
aspiers | samP: that's fine, we have similar problems ;-/ | 09:23 |
aspiers | samP: BTW http://lists.openstack.org/pipermail/user-committee/2017-March/001890.html | 09:23 |
samP | aspiers: thank you for bringing this up | 09:26 |
samP | aspiers: In masakari meetings, we have already discussed replacing masakari-monitors with resource-agents. | 09:29 |
samP | it's one of the pike work items. | 09:30 |
samP | #link https://etherpad.openstack.org/p/masakari-pike-workitems | 09:30 |
samP | please see #L51-53 | 09:30 |
aspiers | samP: thanks! | 09:31 |
*** dgurtner has quit IRC | 09:32 | |
samP | aspiers: I was planning to make this before summit.. I can ask masakari developers to join ha meeting. | 09:35 |
aspiers | samP: thanks! | 09:35 |
samP | But the problem is, most of masakari developers have very little knowledge about pacemaker-resource-agents. | 09:38 |
aspiers | that should be easy to fix | 09:38 |
aspiers | I volunteer beekhof to help explain them ;-) | 09:38 |
aspiers | he's even in the right time zone | 09:38 |
samP | aspiers: that would be great..or.. | 09:39 |
aspiers | of course I am happy to answer questions about OCF RAs | 09:39 |
aspiers | samP: about https://etherpad.openstack.org/p/masakari-pike-workitems, it would be nice if masakari didn't hardcode any assumptions about stonith | 09:40 |
aspiers | if it just delegates stonith to pacemaker then there is nothing to do | 09:40 |
aspiers | and then it is not limited to IPMI | 09:40 |
aspiers | and this would happen automatically if masakari uses pacemaker for host monitoring | 09:40 |
samP | aspiers: which item? | 09:40 |
aspiers | "Force Stonith" #L33-36 | 09:41 |
aspiers | same for split brain detection #L19 | 09:41 |
samP | ah....it has a different use case | 09:41 |
samP | aspiers: Force Stonith is used to isolate a node by force.. it's kind of optional... | 09:43 |
aspiers | samP: when would you need to do that? | 09:43 |
aspiers | samP: but again it makes sense to do it through Pacemaker | 09:44 |
samP | aspiers: if pacemaker is there, then we can do it through pacemaker. Force Stonith will be the masakari-side function to call it. In the etherpad, 'IPMI' is just an example. | 09:46 |
aspiers | samP: OK. what is the use case? | 09:47 |
samP | aspiers: For process or VM failures, if masakari cannot recover and the operator decides that the compute node can no longer be rescued, then the operator might need to kill the compute node. | 09:50 |
aspiers | samP: OK | 09:50 |
samP | aspiers: I will explain about split brain after this.. | 09:50 |
samP | aspiers: We have got another request for this... let me try to explain.. | 09:51 |
samP | One of our masakari users uses pacemaker+masakari | 09:52 |
samP | when a compute node goes down, pacemaker fences it and calls masakari for evacuation. | 09:53 |
samP | the problem is, pacemaker kills the compute node and the node does not have enough time to do the core dump | 09:53 |
samP | They have no way to know why that compute node went down.. | 09:54 |
samP | <-- that's what they said... | 09:54 |
aspiers | pacemaker could do a core dump before fencing | 09:57 |
samP | aspiers: correct..but they need to isolate that node immediately, so masakari can do the evacuation. since they have to dump 256GB of mem, it takes some time | 10:00 |
aspiers | I see | 10:02 |
aspiers | gotta go now, back later | 10:02 |
aspiers | thanks for all the info! | 10:02 |
samP | aspiers: sure, thanks I will catch you later | 10:02 |
*** dgurtner has joined #openstack-ha | 10:15 | |
*** dgurtner has quit IRC | 10:15 | |
*** dgurtner has joined #openstack-ha | 10:15 | |
*** sticker has quit IRC | 10:24 | |
*** dgurtner has quit IRC | 10:31 | |
*** masahito has quit IRC | 10:35 | |
*** masahito has joined #openstack-ha | 10:40 | |
*** masahito has quit IRC | 10:45 | |
*** samP has quit IRC | 10:45 | |
*** ushkalim_ has quit IRC | 10:48 | |
*** ushkalim_ has joined #openstack-ha | 11:03 | |
*** ushkalim_ has quit IRC | 11:24 | |
*** ushkalim_ has joined #openstack-ha | 11:36 | |
*** ushkalim_ has quit IRC | 12:03 | |
*** ushkalim_ has joined #openstack-ha | 12:18 | |
*** jpena is now known as jpena|lunch | 12:40 | |
*** rossella_s has quit IRC | 12:42 | |
*** rossella_s has joined #openstack-ha | 12:43 | |
*** jmlowe has quit IRC | 12:48 | |
*** jmlowe has joined #openstack-ha | 13:00 | |
*** jmlowe has quit IRC | 13:02 | |
*** ushkalim_ has quit IRC | 13:13 | |
*** ushkalim_ has joined #openstack-ha | 13:25 | |
*** catintheroof has joined #openstack-ha | 13:30 | |
*** jmlowe has joined #openstack-ha | 13:36 | |
*** catintheroof has quit IRC | 13:41 | |
*** sticker has joined #openstack-ha | 13:45 | |
*** jpena|lunch is now known as jpena | 13:45 | |
*** kgaillot has joined #openstack-ha | 13:55 | |
*** masahito has joined #openstack-ha | 13:58 | |
*** aasmith has quit IRC | 13:59 | |
*** jmlowe_ has joined #openstack-ha | 14:02 | |
*** jmlowe has quit IRC | 14:04 | |
*** masahito has quit IRC | 14:16 | |
*** masahito has joined #openstack-ha | 14:18 | |
*** dgurtner has joined #openstack-ha | 14:31 | |
*** aasmith has joined #openstack-ha | 14:48 | |
*** cleong has joined #openstack-ha | 15:07 | |
*** nkrinner is now known as nkrinner_afk | 15:51 | |
*** rmart04 has quit IRC | 15:55 | |
*** ushkalim_ has quit IRC | 16:17 | |
*** masahito has quit IRC | 16:45 | |
*** masahito has joined #openstack-ha | 17:03 | |
*** mrhillsman has quit IRC | 17:10 | |
*** codebauss has joined #openstack-ha | 17:19 | |
*** jpena is now known as jpena|off | 17:20 | |
*** codebauss is now known as mrhillsman | 17:20 | |
*** mrhillsman has quit IRC | 17:21 | |
*** codebauss has joined #openstack-ha | 17:23 | |
*** masahito has quit IRC | 17:23 | |
*** codebauss is now known as mrhillsman | 17:23 | |
*** jmlowe_ has quit IRC | 17:25 | |
*** pcaruana has quit IRC | 17:52 | |
*** dgurtner has quit IRC | 17:55 | |
*** masahito has joined #openstack-ha | 17:59 | |
*** hannibal has joined #openstack-ha | 18:06 | |
*** aasmith has quit IRC | 18:08 | |
*** jmlowe has joined #openstack-ha | 18:13 | |
*** masahito has quit IRC | 18:22 | |
*** hannibal has quit IRC | 18:36 | |
*** jmlowe has quit IRC | 18:42 | |
*** openstackstatus has joined #openstack-ha | 18:44 | |
*** ChanServ sets mode: +v openstackstatus | 18:44 | |
*** hannibal has joined #openstack-ha | 18:49 | |
*** hannibal has quit IRC | 19:00 | |
*** jmlowe has joined #openstack-ha | 19:06 | |
*** hannibal has joined #openstack-ha | 19:16 | |
*** dgurtner has joined #openstack-ha | 19:53 | |
*** jmlowe has quit IRC | 20:00 | |
*** jmlowe has joined #openstack-ha | 20:03 | |
*** jmlowe has quit IRC | 20:15 | |
*** hannibal has quit IRC | 20:25 | |
*** hannibal has joined #openstack-ha | 20:37 | |
*** dgurtner has quit IRC | 20:40 | |
*** hannibal has quit IRC | 20:54 | |
*** cleong has quit IRC | 21:19 | |
*** yee379 has joined #openstack-ha | 21:25 | |
*** yee37915 has quit IRC | 21:26 | |
*** kgaillot has quit IRC | 23:01 |
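
Editor's note: the architecture discussed between 08:38 and 08:53 above (the monitoring side queues the failures it spotted, the node is fenced before anything else happens, and a swappable component handles evacuation) can be sketched roughly as below. This is only an illustration under stated assumptions: `HostFailure`, `FailureQueue`, `mark_fenced` and `run_recovery` are hypothetical names invented here, not Pacemaker, attrd or masakari APIs, and an in-memory deque stands in for whatever actually stores the queued failures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HostFailure:
    """A failure spotted by the monitoring side (hypothetical record)."""
    host: str
    fenced: bool = False


class FailureQueue:
    """Hypothetical stand-in for the place where failures are queued.

    In the discussion this role is played by attrd; here it is just an
    in-memory deque so the sketch stays self-contained.
    """

    def __init__(self):
        self._pending = deque()

    def report(self, host):
        # Monitoring side: record the failure without knowing who recovers it.
        self._pending.append(HostFailure(host))

    def mark_fenced(self, host):
        # Called once the cluster manager has confirmed the node is fenced.
        for failure in self._pending:
            if failure.host == host:
                failure.fenced = True

    def pop_ready(self):
        # Only failures on fenced hosts are released; the rest stay queued.
        ready = [f for f in self._pending if f.fenced]
        self._pending = deque(f for f in self._pending if not f.fenced)
        return ready


def run_recovery(queue, evacuate):
    """Hand each fenced failure to a pluggable recovery callable."""
    handled = []
    for failure in queue.pop_ready():
        evacuate(failure.host)
        handled.append(failure.host)
    return handled


queue = FailureQueue()
queue.report("compute-1")
queue.report("compute-2")
queue.mark_fenced("compute-1")

evacuated = []
run_recovery(queue, evacuated.append)
print(evacuated)
```

Because `run_recovery` only receives a callable, the recovery side could be masakari or something else without the monitoring side changing, which is the loose coupling aspiers describes. The unfenced host stays queued until `mark_fenced` is called for it.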