Wednesday, 2017-03-29

*** markvoelker has joined #openstack-ha		00:01
*** markvoelker has quit IRC		00:05
*** yan-gao has quit IRC		00:17
*** hoonetorg has quit IRC		00:32
*** catintheroof has quit IRC		00:34
*** yan-gao has joined #openstack-ha		00:42
*** hoonetorg has joined #openstack-ha		00:45
*** furlongm has quit IRC		01:28
*** kgaillot has quit IRC		01:33
*** hoonetorg has quit IRC		01:42
*** masahito has joined #openstack-ha		01:47
*** hoonetorg has joined #openstack-ha		01:54
*** masahito has quit IRC		03:14
*** raginbajin has quit IRC		03:14
*** zerick has quit IRC		03:15
*** zerick has joined #openstack-ha		03:18
*** raginbajin has joined #openstack-ha		03:21
*** Dinesh_Bhor has joined #openstack-ha		04:00
*** dgurtner has joined #openstack-ha		04:47
*** dgurtner has quit IRC		04:53
*** obre_ has joined #openstack-ha		05:06
*** obre has quit IRC		05:07
*** masahito has joined #openstack-ha		05:17
*** nkrinner_afk is now known as nkrinner		06:20
*** dgurtner has joined #openstack-ha		06:22
*** pcaruana has joined #openstack-ha		06:24
*** dgurtner has quit IRC		06:39
*** dgurtner has joined #openstack-ha		07:35
*** jpena|off is now known as jpena		07:39
*** jpena is now known as jpena|off		07:53
aspiers	beekhof, samP: you around?	07:57
*** jpena|off is now known as jpena		07:59
*** rossella_s has joined #openstack-ha		08:03
beekhof	aspiers: for now :)	08:04
*** ducnc has joined #openstack-ha		08:09
*** dgurtner has quit IRC		08:14
*** dgurtner has joined #openstack-ha		08:31
beekhof	aspiers: ok, i'm back	08:38
beekhof	it was story time	08:38
beekhof	so fundamentally, my question is, if they can reliably evacuate a VM after a failure, why do whole nodes still need attrd	08:38
aspiers	by "they" you mean masakari?	08:39
aspiers	I don't think masakari needs attrd but like I said, this architecture needs it in order to avoid a hard-coupling with masakari	08:40
beekhof	yes	08:40
aspiers	the monitoring/notification side needs somewhere to queue failures it spotted	08:40
beekhof	tbh, i am most of the way towards saying "to hell with it, let's just convert to masakari"	08:41
aspiers	well that's kind of what this is proposing	08:41
aspiers	but retaining some leeway in case we need to swap it for something else	08:41
beekhof	i mean exactly as-is. no attrd, no fake fencing agents	08:42
aspiers	and also crucially not doing the crazy host monitoring which masakari currently does	08:42
beekhof	crazy?	08:42
aspiers	masakari has its own host and process monitoring	08:42
aspiers	I want to use pacemaker for that	08:42
aspiers	let me show you some code	08:42
beekhof	well, we have host monitoring	08:42
beekhof	i assume process == vm?	08:43
aspiers	no process == process :)	08:43
aspiers	masakari monitors at 3 levels: host, process, VM	08:44
aspiers	but I don't like it doing host/process	08:44
beekhof	agreed	08:44
aspiers	https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/hostmonitor.sh	08:44
aspiers	https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/processmonitor/processmonitor.sh	08:44
aspiers	those attempt to reimplement aspects of Pacemaker as bash scripts	08:45
aspiers	which I do not like	08:45
aspiers	OTOH, I do like this VM monitor https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py	08:46
aspiers	I discussed this with samP and IIRC he was OK with the idea of changing it so that Pacemaker does host / process monitoring	08:46
beekhof	log_info "WARNING : $0 is deprecated as of the Ocata release and will be removed in the Queens release. Use masakari-hostmonitor implemented in python instead of $0."	08:46
aspiers	oh, interesting	08:46
aspiers	but still	08:47
aspiers	that's what Pacemaker already does, well	08:47
aspiers	and Pacemaker can guarantee the node has been fenced before anything else happens	08:47
aspiers	I guess that message was referring to https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/cmd/hostmonitor.py	08:48
aspiers	Python is better than shell but that's not enough to convince me that the architecture is right	08:49
aspiers	beekhof: AFAIK the architecture has not changed much from https://github.com/ntt-sic/masakari	08:50
beekhof	we do it less well for remote nodes though	08:51
beekhof	but yes, fencing	08:51
aspiers	beekhof: look at https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/handle_host.py	08:51
beekhof	yeah, i was looking at that one :)	08:51
beekhof	since it's just wrapping pacemaker's view of the world, might as well just let pacemaker do it	08:52
aspiers	https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/parse_cib_xml.py#L139	08:53
aspiers	hardcoding use of IPMI?	08:53
beekhof	in any case, i agree with your premise. masakari for handling the evacuations, something else for triggering them	08:53
aspiers	great!	08:53
aspiers	so in terms of how to "sell" this internally	08:54
aspiers	I guess the points are	08:54
aspiers	masakari does a nice job of VM monitoring (and I assume recovery, but I can't remember)	08:54
aspiers	masakari is (I think) in Big Tent already, and adheres to a lot of OpenStack conventions	08:55
beekhof	how this is impacted by containers will be a significant factor	08:55
aspiers	we need something which can execute more flexible / sophisticated policies	08:55
beekhof	more reliable too	08:55
aspiers	I suspect it would be easy to run masakari server in containers	08:55
beekhof	ie. if the evac fails	08:55
aspiers	right	08:56
aspiers	also masakari already uses oslo e.g. oslo.db	08:57
aspiers	the only question mark for me is whether we could achieve something similar with Congress + Mistral	08:58
aspiers	but did you see my comment at the end of the meeting earlier?	08:58
aspiers	https://blueprints.launchpad.net/mistral/+spec/mistral-ha was closed as obsolete in January	08:58
aspiers	IIRC I asked ddeja if he knew why. I can't remember the answer, but I remember being disappointed by it	08:59
ddeja	aspiers: let me think about it....	09:00
aspiers	ddeja: thanks :)	09:01
ddeja	aspiers: I see that there is some movement around that blueprint	09:02
ddeja	I guess there were some decisions to move forward at the PTG	09:02
aspiers	ddeja: where do you see the movement?	09:03
ddeja	I see new dependencies	09:03
ddeja	that weren't there last time I looked at it	09:03
* ddeja may just be wrong		09:03
*** ushkalim_ has joined #openstack-ha		09:03
ddeja	but AFAIK it was closed because it was opened so long ago that circumstances changed and therefore there is a need to write a new one	09:05
* ddeja doesn't have enough time to keep up with Mistral...		09:07
aspiers	that sounds like a strange reason to close	09:09
ddeja	aspiers: I may have forgotten something else...	09:12
samP	hi..	09:13
samP	I thought meeting starts at 6:00 JST ...my mistake..	09:14
aspiers	oh crap, no my mistake :-(	09:16
aspiers	stupid daylight savings	09:16
samP	aspiers: np...	09:16
aspiers	samP: did your clocks change too?	09:16
samP	aspiers: no...we don't have that luxury	09:17
aspiers	I wouldn't call it a luxury ;-)	09:17
aspiers	makes it much harder to wake up ...	09:17
aspiers	samP: mostly I was explaining the diagram to beekhof. I think he likes it	09:17
samP	aspiers: I'm just following the discussion.. is the diagram the same as the one you mailed me?	09:18
aspiers	samP: yes	09:18
samP	aspiers: great..	09:18
aspiers	it would make it easier for us to adopt masakari if the host and process monitors could be dropped in favour of Pacemaker, since they don't seem to offer anything extra over what Pacemaker can already do	09:19
aspiers	or am I wrong?	09:19
aspiers	the masakari host monitor seems to be just a wrapper around Pacemaker	09:20
samP	aspiers: you are right	09:20
aspiers	would it be possible for masakari developers to join future meetings?	09:20
samP	aspiers: the only reason we maintain it is that we have some users who are using it.	09:20
aspiers	OK	09:20
aspiers	so if we could integrate Pacemaker host/process monitoring with masakari then those users would be able to switch	09:21
*** rmart04 has joined #openstack-ha		09:21
aspiers	back in a few mins	09:22
samP	aspiers: I think it is a nice solution and I can't see any problems with that. The only problem is, they have huge clusters and it will take time to adopt	09:23
samP	aspiers: sure	09:23
aspiers	samP: that's fine, we have similar problems ;-/	09:23
aspiers	samP: BTW http://lists.openstack.org/pipermail/user-committee/2017-March/001890.html	09:23
samP	aspiers: thank you for bringing this up	09:26
samP	aspiers: In masakari meetings, we have already discussed replacing masakari-monitors** with resource-agents.	09:29
samP	it's one of the Pike work items.	09:30
samP	#link https://etherpad.openstack.org/p/masakari-pike-workitems	09:30
samP	please see #L51-53	09:30
aspiers	samP: thanks!	09:31
*** dgurtner has quit IRC		09:32
samP	aspiers: I was planning to make this before summit.. I can ask masakari developers to join ha meeting.	09:35
aspiers	samP: thanks!	09:35
samP	But the problem is, most of masakari developers have very little knowledge about pacemaker-resource-agents.	09:38
aspiers	that should be easy to fix	09:38
aspiers	I volunteer beekhof to help explain them ;-)	09:38
aspiers	he's even in the right time zone	09:38
samP	aspiers: that would be great..or..	09:39
aspiers	of course I am happy to answer questions about OCF RAs	09:39
aspiers	samP: about https://etherpad.openstack.org/p/masakari-pike-workitems, it would be nice if masakari didn't hardcode any assumptions about stonith	09:40
aspiers	if it just delegates stonith to pacemaker then there is nothing to do	09:40
aspiers	and then it is not limited to IPMI	09:40
aspiers	and this would happen automatically if masakari uses pacemaker for host monitoring	09:40
samP	aspiers: which item?	09:40
aspiers	"Force Stonith" #L33-36	09:41
aspiers	same for split brain detection #L19	09:41
samP	ah....it has a different use case	09:41
samP	aspiers: Force Stonith is used to isolate a node by force.. kind of optional...	09:43
aspiers	samP: when would you need to do that?	09:43
aspiers	samP: but again it makes sense to do it through Pacemaker	09:44
samP	aspiers: if pacemaker is there, then we can do it through pacemaker. Force Stonith will be the masakari-side function to call it. In the etherpad, 'IPMI' is an example.	09:46
aspiers	samP: OK. what is the use case?	09:47
samP	aspiers: In process or VM failures, if masakari cannot rescue and the operator decides that the compute node can no longer be rescued, then the operator might need to kill the compute node.	09:50
aspiers	samP: OK	09:50
samP	aspiers: I will explain about split brain after this..	09:50
samP	aspiers: We have got another request for this... let me try to explain..	09:51
samP	One of our masakari users uses pacemaker+masakari	09:52
samP	when a compute node goes down, pacemaker fences it and calls masakari for evacuation.	09:53
samP	the problem is, pacemaker kills the compute node and the node does not have enough time to do the core dump	09:53
samP	They have no way to know why that compute node went down..	09:54
samP	<-- that's what they said...	09:54
aspiers	pacemaker could do a core dump before fencing	09:57
samP	aspiers: correct..but they need to isolate that node immediately, so masakari can do the evacuation. since they have to dump 256GB of mem, it takes some time	10:00
aspiers	I see	10:02
aspiers	gotta go now, back later	10:02
aspiers	thanks for all the info!	10:02
samP	aspiers: sure, thanks I will catch you later	10:02
*** dgurtner has joined #openstack-ha		10:15
*** dgurtner has quit IRC		10:15
*** dgurtner has joined #openstack-ha		10:15
*** sticker has quit IRC		10:24
*** dgurtner has quit IRC		10:31
*** masahito has quit IRC		10:35
*** masahito has joined #openstack-ha		10:40
*** masahito has quit IRC		10:45
*** samP has quit IRC		10:45
*** ushkalim_ has quit IRC		10:48
*** ushkalim_ has joined #openstack-ha		11:03
*** ushkalim_ has quit IRC		11:24
*** ushkalim_ has joined #openstack-ha		11:36
*** ushkalim_ has quit IRC		12:03
*** ushkalim_ has joined #openstack-ha		12:18
*** jpena is now known as jpena|lunch		12:40
*** rossella_s has quit IRC		12:42
*** rossella_s has joined #openstack-ha		12:43
*** jmlowe has quit IRC		12:48
*** jmlowe has joined #openstack-ha		13:00
*** jmlowe has quit IRC		13:02
*** ushkalim_ has quit IRC		13:13
*** ushkalim_ has joined #openstack-ha		13:25
*** catintheroof has joined #openstack-ha		13:30
*** jmlowe has joined #openstack-ha		13:36
*** catintheroof has quit IRC		13:41
*** sticker has joined #openstack-ha		13:45
*** jpena|lunch is now known as jpena		13:45
*** kgaillot has joined #openstack-ha		13:55
*** masahito has joined #openstack-ha		13:58
*** aasmith has quit IRC		13:59
*** jmlowe_ has joined #openstack-ha		14:02
*** jmlowe has quit IRC		14:04
*** masahito has quit IRC		14:16
*** masahito has joined #openstack-ha		14:18
*** dgurtner has joined #openstack-ha		14:31
*** aasmith has joined #openstack-ha		14:48
*** cleong has joined #openstack-ha		15:07
*** nkrinner is now known as nkrinner_afk		15:51
*** rmart04 has quit IRC		15:55
*** ushkalim_ has quit IRC		16:17
*** masahito has quit IRC		16:45
*** masahito has joined #openstack-ha		17:03
*** mrhillsman has quit IRC		17:10
*** codebauss has joined #openstack-ha		17:19
*** jpena is now known as jpena|off		17:20
*** codebauss is now known as mrhillsman		17:20
*** mrhillsman has quit IRC		17:21
*** codebauss has joined #openstack-ha		17:23
*** masahito has quit IRC		17:23
*** codebauss is now known as mrhillsman		17:23
*** jmlowe_ has quit IRC		17:25
*** pcaruana has quit IRC		17:52
*** dgurtner has quit IRC		17:55
*** masahito has joined #openstack-ha		17:59
*** hannibal has joined #openstack-ha		18:06
*** aasmith has quit IRC		18:08
*** jmlowe has joined #openstack-ha		18:13
*** masahito has quit IRC		18:22
*** hannibal has quit IRC		18:36
*** jmlowe has quit IRC		18:42
*** openstackstatus has joined #openstack-ha		18:44
*** ChanServ sets mode: +v openstackstatus		18:44
*** hannibal has joined #openstack-ha		18:49
*** hannibal has quit IRC		19:00
*** jmlowe has joined #openstack-ha		19:06
*** hannibal has joined #openstack-ha		19:16
*** dgurtner has joined #openstack-ha		19:53
*** jmlowe has quit IRC		20:00
*** jmlowe has joined #openstack-ha		20:03
*** jmlowe has quit IRC		20:15
*** hannibal has quit IRC		20:25
*** hannibal has joined #openstack-ha		20:37
*** dgurtner has quit IRC		20:40
*** hannibal has quit IRC		20:54
*** cleong has quit IRC		21:19
*** yee379 has joined #openstack-ha		21:25
*** yee37915 has quit IRC		21:26
*** kgaillot has quit IRC		23:01
