*** rossella_s has quit IRC | 01:03 | |
*** rossella_s has joined #openstack-ha | 01:04 | |
*** hoangcx has joined #openstack-ha | 01:32 | |
*** sasukeh has quit IRC | 01:35 | |
*** sasukeh has joined #openstack-ha | 01:56 | |
*** masahito has quit IRC | 02:32 | |
*** masahito has joined #openstack-ha | 02:43 | |
*** masahito has quit IRC | 02:58 | |
*** masahito has joined #openstack-ha | 03:00 | |
*** sasukeh has quit IRC | 03:02 | |
*** masahito has quit IRC | 03:38 | |
*** masahito has joined #openstack-ha | 03:39 | |
*** masahito has quit IRC | 03:39 | |
*** moiz has joined #openstack-ha | 03:51 | |
*** hoangcx_ has joined #openstack-ha | 03:59 | |
*** beekhof has quit IRC | 04:00 | |
*** hoangcx has quit IRC | 04:00 | |
*** sasukeh has joined #openstack-ha | 04:05 | |
*** sasukeh has quit IRC | 04:15 | |
*** sasukeh has joined #openstack-ha | 04:25 | |
*** hoangcx_ has quit IRC | 04:45 | |
*** masahito has joined #openstack-ha | 04:45 | |
*** beekhof has joined #openstack-ha | 04:46 | |
*** hoangcx has joined #openstack-ha | 04:48 | |
*** rossella_s has quit IRC | 05:03 | |
*** rossella_s has joined #openstack-ha | 05:04 | |
moiz | masahito: I have set up the Masakari controller on the openstack controller, and the host, process & instance monitors on the openstack compute nodes. | 05:12 |
moiz | masahito: all the processes are running. However, the evacuations are not happening. | 05:13 |
masahito | moiz: hi | 05:13 |
masahito | moiz: I read your problem. | 05:13 |
masahito | moiz: first of all, all monitor processes don't have host fencing feature. | 05:14 |
masahito | moiz: so you need to set up an RA for nova-compute to fence the host when nova-compute goes down. | 05:15 |
masahito | moiz: processmonitor only disables nova-compute on its host if processmonitor fails to restart processes listed in proc.list. | 05:16 |
moiz | okay so from pacemaker side, i need to configure nova-compute RA, fence-nova & ipmilan for each compute node | 05:18 |
moiz | so pacemaker is responsible for detecting nova-compute as down & then fencing it automatically | 05:18 |
masahito | moiz: yes it is. if you want to fence node when nova-compute goes down. | 05:19 |
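A rough sketch of that Pacemaker setup in crmsh syntax follows; the resource names, credentials and parameter names are illustrative and vary by version of the openstack-resource-agents and fence-agents packages, so treat it as a starting point rather than the exact configuration:

    # sketch only: names, addresses and parameters are illustrative
    # power fencing for the compute node via its BMC
    crm configure primitive ipmi-compute1 stonith:fence_ipmilan \
        params pcmk_host_list="compute1" ipaddr="10.0.0.11" \
               login="admin" passwd="secret" lanplus="true"

    # fence_compute records the host as down on the Nova side so evacuation can proceed
    crm configure primitive fence-nova stonith:fence_compute \
        params auth-url="http://controller:5000/v2.0" login="admin" \
               passwd="secret" tenant-name="admin" record-only="true"

    # the NovaCompute RA monitors nova-compute on each compute node
    crm configure primitive p_nova_compute ocf:openstack:NovaCompute \
        params auth_url="http://controller:5000/v2.0" username="admin" \
               password="secret" tenant_name="admin" \
        op monitor interval="10s"
    crm configure clone nova-compute-clone p_nova_compute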
moiz | if i don't want to fence the compute node via pacemaker, and maybe write my own script which calls fence_compute directly, is that possible? | 05:19 |
moiz | because last time i configured nova-compute over pacemaker 1.1.12rc4, it didn't work. | 05:20 |
moiz | pacemaker kept giving me 'not installed' errors for nova-compute on compute nodes, which doesn't make sense as they are installed and running on the compute nodes. | 05:21 |
openstackgerrit | chen.xing proposed openstack/ha-guide: Add a note of virtual node https://review.openstack.org/308237 | 05:22 |
masahito | it means you don't want to fence the node when nova-compute goes down, but want to fence the node when some crash happens on the node, right? | 05:23 |
masahito | if so, yes. | 05:24 |
masahito | write your own fencing script for pacemaker based on your fencing use case. | 05:25 |
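If fencing is driven by a custom script instead, a minimal wrapper around fence_compute might look like the sketch below; the long option names are assumptions based on the fence-agents conventions of the time, so verify them against `fence_compute -h` before relying on them:

    #!/bin/sh
    # sketch only: fence the failed compute host out of Nova's view;
    # option names are assumptions -- verify with `fence_compute -h`
    FAILED_HOST="$1"

    fence_compute \
        --action=off \
        --plug="$FAILED_HOST" \
        --auth-url="http://controller:5000/v2.0" \
        --username="admin" \
        --password="secret" \
        --tenant-name="admin"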
moiz | yes. okay great. so where does masakari come in? i need to understand the workflow from when libvirt goes down till evacuation. | 05:26 |
moiz | as i understand it, the masakari process monitor detects libvirt down, tells the controller, waits for fencing & calls the evacuation API. is this correct? | 05:26 |
masahito | yes | 05:27 |
masahito | sorry, no | 05:27 |
masahito | for evacuation. | 05:27 |
moiz | what is the correct workflow for evacuation? please explain. | 05:29 |
masahito | a host goes down, pacemaker running on another host detects the host down, pacemaker marks the host OFFLINE, hostmonitor detects the host down and sends it to controller, and then the controller waits for fencing and calls the evacuation API | 05:30 |
masahito | long steps :-) | 05:30 |
masahito | the following are the processmonitor's steps: | 05:31 |
*** mjura has joined #openstack-ha | 05:32 | |
masahito | the process monitor detects libvirt down, tells the controller, and then the controller disables nova-compute on the host where libvirt went down | 05:32 |
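For orientation only, the controller-side actions in these two flows roughly correspond to what one could do by hand with the nova CLI of that era; this is an illustration of the flow, not the commands masakari itself runs, and flag availability depends on the python-novaclient version:

    # hostmonitor path: after the failed host has been fenced,
    # evacuate its instances to the reserved host
    nova list --host compute1-t4 --all-tenants
    nova evacuate <instance-uuid> compute2-b1 --on-shared-storage

    # processmonitor path: libvirt (or another process in proc.list) cannot be
    # restarted, so the failed host is only taken out of scheduling
    nova service-disable compute1-t4 nova-compute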
moiz | 1. Process monitor: does exactly the same. i have noticed this on my setup | 05:33 |
moiz | 2. Hostmonitor: pacemaker sets the remote node (compute node) as OFFLINE. but which hostmonitor will detect it as down? because the hostmonitor will die along with the compute node. | 05:34 |
moiz | my hostmonitors are running on the compute nodes only. i have 1 compute node running all 3 monitors, and 1 compute node running all 3 monitors which is the RESERVED HOST. the openstack controller is only running the masakari controller (not the monitors) | 05:35 |
*** nkrinner has joined #openstack-ha | 05:35 | |
*** mjura has quit IRC | 05:39 | |
*** mjura has joined #openstack-ha | 05:39 | |
masahito | to clarify: you have 2 compute nodes, one is a compute node for VMs and has all 3 monitors, the other is the compute node for the RESERVED HOST and also has all 3 monitors. | 05:40 |
masahito | and | 05:40 |
masahito | where are the full-stack pacemaker and pacemaker-remote deployed? | 05:41 |
moiz | yes | 05:41 |
moiz | compute nodes are running pacemaker-remote | 05:41 |
moiz | openstack controller node is running full stack pacemaker | 05:42 |
moiz | and i have added remote nodes in the pacemaker cluster (on the controller) | 05:42 |
masahito | oh, got it. | 05:42 |
masahito | so can you see the status of all the pacemaker nodes from the reserved host? | 05:43 |
masahito | by crm_mon command | 05:43 |
moiz | that's another thing i was going to mention. crm_mon would run on the controller node where the pacemaker stack is, and on the compute nodes only pacemaker-remote is installed, the clients are not installed there. and i was looking at the masakari scripts & they call crm_mon on the compute node | 05:45 |
masahito | right, I think it's root cause. | 05:45 |
moiz | i need to install pacemaker clients on the compute nodes. crmsh ? | 05:46 |
masahito | hostmonitor relies on the output of crm_mon, so you need to install the crm command onto the remote nodes when you use pacemaker-remote | 05:46 |
masahito | yes | 05:46 |
moiz | okay let me try | 05:46 |
masahito | I think. but I don't remember exact package name. | 05:47 |
moiz | done. apt-get install crmsh | 05:47 |
moiz | its working now. i can see the full cluster info on the compute nodes | 05:47 |
moiz | including the reserved host | 05:47 |
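For reference, the two commands involved here, installing the crm shell on each pacemaker-remote compute node and doing a one-shot status check, are roughly:

    # on each compute node running pacemaker-remote
    sudo apt-get install crmsh

    # one-shot status check: the remote nodes, including the reserved host,
    # should now be visible alongside the full-stack controller node
    crm_mon -1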
masahito | great! | 05:48 |
masahito | oh. | 05:48 |
masahito | let me tell you one important thing | 05:48 |
masahito | if possible, could you copy the corosync.conf from the full-stack pacemaker node onto the remote hosts? | 05:49 |
moiz | yes i can do it | 05:49 |
masahito | hostmonitor detects which cluster the monitor belongs to by parsing the config. | 05:50 |
moiz | got it. | 05:50 |
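Since hostmonitor works out which cluster it belongs to by parsing this file, copying the controller's corosync.conf verbatim is enough; the relevant part is the totem interface section, roughly like the excerpt below (values are illustrative, the 5405 port matches the one used elsewhere in this log):

    # /etc/corosync/corosync.conf -- copied as-is from the full-stack pacemaker node
    totem {
        version: 2
        cluster_name: openstack-ha
        interface {
            ringnumber: 0
            bindnetaddr: 172.30.1.0
            mcastport: 5405
        }
    }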
moiz | one last thing. at this moment, i don't want pacemaker to monitor the nova-compute process & fence the compute node. can i still see evacuations happening using masakari? how? | 05:51 |
masahito | only by watching the logs, for now. | 05:54 |
masahito | or check the number of reserved hosts. | 05:54 |
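Concretely, without pacemaker-driven fencing in place, the two ways to watch for evacuation activity are the controller log (syslog-style in this deployment, as the excerpts further down show; the exact path may differ) and the masakari database; the database name below is a placeholder:

    # follow the controller log (syslog-style here; path may differ per distro)
    tail -f /var/log/syslog | grep masakari-controller

    # or inspect the reserved-host table in the masakari database
    # (database name is a placeholder)
    mysql -u root -p123 -h 127.0.0.1 -e 'select * from reserve_list;' <masakari-db-name>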
moiz | if i unplug my compute node, the hostmonitor on the RESERVED node will detect that another compute node is down, tell the controller, and the controller will call the evacuate API for the down compute node. is this correct? | 05:57 |
moiz | meanwhile when i unplug, pacemaker will also mark the compute node (remote) as offline. | 05:58 |
masahito | right | 05:58 |
moiz | great. | 05:58 |
moiz | i am going to try it :-) | 05:58 |
*** moizarif has joined #openstack-ha | 06:07 | |
*** moiz has quit IRC | 06:08 | |
moizarif | masahito: you mentioned a 5 mins convergence period for evacuations. why 5 mins? is it dependent on the # of VMs on the host? | 06:15 |
moizarif | okay i unplugged the compute node. this is what happened: | 06:18 |
moizarif | 1. nova-compute got disabled | 06:19 |
moizarif | 2. Pacemaker marked the compute node as OFFLINE (crm_mon) | 06:19 |
moizarif | 3. its been 10 mins now and still no evacuations. | 06:19 |
moizarif | controller logs say: | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> Recieved notification : { "id": "2f27dcd3-00c0-45e0-8214-6ea8308d8eb9", "type": "nodeStatus", "regionID": "serverstack", "hostname": "compute1-t4", "uuid": "", "time": "20160421060934", "eventID": "1", "eventType": "1", "detail": "02", | 06:21 |
moizarif | "startTime": "20160421060934", "endTime": null, "tzname": "'UTC', 'UTC'", "daylight": "0", "cluster_port": ""}' | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> {u'eventID': u'1', u'hostname': u'compute1-t4', u'uuid': u'', u'eventType': u'1', u'regionID': u'serverstack', u'cluster_port': u'', u'detail': u'02', u'daylight': u'0', u'tzname': u"'UTC', 'UTC'", u'startTime': u'20160421060934', u'time': u'20160421060934', u'endTime': | 06:21 |
moizarif | None, u'type': u'nodeStatus', u'id': u'2f27dcd3-00c0-45e0-8214-6ea8308d8eb9'}' | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <Thread(Thread-4, started 140292101179136)> Disable nova-compute on compute1-t4' | 06:21 |
*** sasukeh has quit IRC | 06:21 | |
masahito | moizarif: 5 mins was just our requirements. | 06:24 |
moizarif | nothing in the logs of process & host monitor on the reserved host | 06:24 |
masahito | moizarif: what we need to be sure of is that the down host has been fenced, so it waits 5 mins. | 06:25 |
masahito | questions | 06:25 |
masahito | 1. is the host name 'compute1-t4' the same as the name in the crm_mon output? | 06:26 |
moizarif | yes | 06:26 |
masahito | 2. when you registered the reserved host via the CLI, what did you specify for cluster_port? | 06:27 |
moizarif | python /home/ubuntu/masakari/masakari-controller/utils/reserve_host_manage.py --mode add --port "172.30.1.205:5405" --host compute2-b1 --db-user root --db-password 123 --db-host 127.0.0.1 | 06:27 |
moizarif | i used this command. i don't think i added any cluster port | 06:28 |
masahito | got it. | 06:29 |
moizarif | 5405 port | 06:29 |
masahito | I think the mismatch between --port "172.30.1.205:5405" in the command and "cluster_port": "" in the notification is what causes the evacuation failure. | 06:30 |
masahito | I'm thinking it's a weak point in Masakari | 06:31 |
moizarif | my mysql output for reserve_list table: | 06:31 |
moizarif | mysql> select * from reserve_list; | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | | id | create_at | update_at | delete_at | deleted | cluster_port | hostname | | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | | 1 | 2016-04-20 09:59:19 | NULL | 2016-04-20 11:55:51 | 1 | 172.30.1.205:5405 | compute2-b1 | | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | 1 row in set (0.00 sec) | 06:31 |
*** sasukeh has joined #openstack-ha | 06:32 | |
masahito | Masakari decides whether the host notified by hostmonitor is in the same cluster or not using the cluster_port value. | 06:32 |
masahito | but the cluster_port in a notification is generated based on corosync.conf | 06:33 |
masahito | we need to improve it. | 06:33 |
moizarif | so what temporary workaround can i use for this? | 06:34 |
masahito | so a workaround for it is to add the reserved host with --cluster_port "". | 06:34 |
moizarif | got it. let me try this out | 06:34 |
masahito | instead of 172.30.1.205:5405 | 06:34 |
*** sasukeh has quit IRC | 06:39 | |
*** rsjethani has joined #openstack-ha | 06:43 | |
rsjethani | hi masahito | 06:44 |
masahito | rsjethani: hi | 06:45 |
rsjethani | I have a few questions regarding masakari | 06:45 |
masahito | rsjethani: ok, go ahead. | 06:46 |
rsjethani | ok, first of all, why are the host monitor and process monitor written in shell script instead of python? | 06:47 |
masahito | because we are using them as RAs of the local pacemaker. | 06:48 |
masahito | and both call linux commands like crm_mon etc., so shell is a natural fit. | 06:51 |
rsjethani | But the RA interface is language independent | 06:51 |
*** sasukeh has joined #openstack-ha | 06:51 | |
*** hoangcx_ has joined #openstack-ha | 06:53 | |
rsjethani | http://www.linux-ha.org/wiki/Resource_agents | 06:53 |
rsjethani | Topic "Implementation" says the RA just needs to have a predefined interface but | 06:54 |
*** hoangcx has quit IRC | 06:54 | |
rsjethani | The reason I am saying this is because shell scripts are hard to follow and maintain. | 06:55 |
rsjethani | Also we want to get masakari under openstack, where the primary language is python | 06:55 |
masahito | yes, we had options, but we decided to implement them as shell scripts. | 06:59 |
rsjethani | ok :) | 06:59 |
masahito | yes, I know. | 06:59 |
masahito | agreed, they're hard to maintain. | 07:00 |
rsjethani | So where does the host monitor run? | 07:00 |
masahito | on the compute nodes? Is that the answer to your question? | 07:00 |
rsjethani | yes, I am trying to understand how and where the host monitor runs in a system where we have, say, three compute nodes and one master/controller node | 07:02 |
masahito | in that case, hostmonitor should run on all 3 compute nodes. | 07:03 |
rsjethani | and masakari-controller will run on the controller node right? | 07:04 |
masahito | right | 07:04 |
rsjethani | ok | 07:05 |
rsjethani | Another question: why do we need masakari-controller? | 07:09 |
rsjethani | IMO we can make all three components independent services | 07:09 |
rsjethani | just like nova ,glance etc | 07:10 |
rsjethani | let HM make its own decisions. Same goes for IM | 07:10 |
masahito | To coordinate handling of all errors, especially race conditions, we introduced masakari-controller | 07:12 |
rsjethani | Can you give an example of a race condition? thanks | 07:13 |
masahito | for example, if the host goes down while the instance monitor is rebuilding a VM, when should it call the evacuate API? | 07:14 |
masahito | From outside of Nova, we can't stop the rebuild steps in Nova. | 07:14 |
*** dgurtner has joined #openstack-ha | 07:15 | |
rsjethani | Thanks masahito. I will look further into masakari and come back here :) | 07:16 |
*** permalac has quit IRC | 07:19 | |
moizarif | masahito: the command: python reserve_host_manage.py --mode add --cluster_port "172.40.1.205:5405" --host compute2-b1 --db-user root --db-password 123 --db-host 127.0.0.1 | 07:29 |
moizarif | gives: reserve_host_manage.py: error: unrecognized arguments: --cluster_port 172.40.1.205:5405 | 07:30 |
*** moiz has joined #openstack-ha | 07:35 | |
*** dileepr has quit IRC | 07:39 | |
masahito | sorry, --port is correct | 07:43 |
masahito | I meant --port "" | 07:44 |
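Putting the correction together, the re-registration command with the options from the earlier invocation becomes something like:

    # re-register the reserved host with an empty port so it matches the empty
    # "cluster_port" field in hostmonitor's notifications
    python /home/ubuntu/masakari/masakari-controller/utils/reserve_host_manage.py \
        --mode add --port "" --host compute2-b1 \
        --db-user root --db-password 123 --db-host 127.0.0.1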
*** jpena|off is now known as jpena | 07:44 | |
*** masahito has quit IRC | 07:46 | |
*** moiz_ has joined #openstack-ha | 07:51 | |
*** moiz has quit IRC | 07:53 | |
*** hoangcx_ has quit IRC | 07:56 | |
*** hoangcx has joined #openstack-ha | 07:56 | |
*** sasukeh has quit IRC | 08:02 | |
*** moizarif has quit IRC | 08:03 | |
*** haukebruno has joined #openstack-ha | 08:13 | |
*** masahito has joined #openstack-ha | 08:22 | |
*** markvoelker has quit IRC | 08:27 | |
aspiers | rsjethani: are you coming to Austin? | 08:29 |
rsjethani | Hi aspiers | 08:44 |
rsjethani | No I won't be there :( | 08:44 |
aspiers | :( | 08:44 |
rsjethani | But my colleagues will be there | 08:45 |
aspiers | rsjethani: then watch for the video of https://www.openstack.org/summit/austin-2016/summit-schedule/events/7327 | 08:45 |
rsjethani | ok | 08:45 |
*** rossella_s has quit IRC | 09:03 | |
*** rossella_s has joined #openstack-ha | 09:06 | |
*** markvoelker has joined #openstack-ha | 09:27 | |
*** markvoelker has quit IRC | 09:32 | |
*** moiz_ has quit IRC | 09:46 | |
*** masahito has quit IRC | 09:52 | |
*** sasukeh has joined #openstack-ha | 09:57 | |
*** hoangcx has quit IRC | 10:32 | |
*** rsjethani has quit IRC | 10:40 | |
*** rsjethani has joined #openstack-ha | 10:40 | |
*** dgurtner has quit IRC | 10:47 | |
*** dgurtner has joined #openstack-ha | 10:47 | |
*** sasukeh has quit IRC | 11:14 | |
*** ChanServ changes topic to "OpenStack HA | next meeting in Austin! 12:30pm Expo Hall 5, table with ClusterLabs sign" | 11:35 | |
*** jpena is now known as jpena|lunch | 11:36 | |
aspiers | http://clusterlabs.org/pipermail/users/2016-April/002753.html | 11:41 |
*** mjura has quit IRC | 11:48 | |
*** mjura has joined #openstack-ha | 12:04 | |
*** markvoelker has joined #openstack-ha | 12:17 | |
*** yan-gao has quit IRC | 12:19 | |
*** yan-gao has joined #openstack-ha | 12:21 | |
*** yan-gao has quit IRC | 12:29 | |
*** yan-gao has joined #openstack-ha | 12:29 | |
*** mjura has quit IRC | 12:38 | |
*** mjura has joined #openstack-ha | 12:51 | |
*** jpena|lunch is now known as jpena | 12:59 | |
*** sasukeh has joined #openstack-ha | 13:38 | |
*** rsjethani has quit IRC | 13:50 | |
*** kgaillot has joined #openstack-ha | 13:56 | |
*** sasukeh has quit IRC | 14:20 | |
*** mjura has quit IRC | 15:16 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 15:33 | |
*** sasukeh has joined #openstack-ha | 15:41 | |
*** dgurtner has quit IRC | 15:46 | |
*** sasukeh has quit IRC | 15:46 | |
*** openstackgerrit has quit IRC | 15:48 | |
*** openstackgerrit has joined #openstack-ha | 15:49 | |
*** sasukeh has joined #openstack-ha | 15:52 | |
*** sasukeh has quit IRC | 16:09 | |
*** jpena is now known as jpena|off | 16:58 | |
*** serverascode has quit IRC | 17:03 | |
*** rossella_s has quit IRC | 17:03 | |
*** rossella_s has joined #openstack-ha | 17:04 | |
*** serverascode has joined #openstack-ha | 17:05 | |
*** sasukeh has joined #openstack-ha | 17:10 | |
*** sasukeh has quit IRC | 17:16 | |
*** rossella_s has quit IRC | 17:19 | |
*** rossella_s has joined #openstack-ha | 17:20 | |
*** FL1SK has quit IRC | 17:23 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 17:55 | |
*** hoonetorg has joined #openstack-ha | 17:56 | |
*** jpokorny has joined #openstack-ha | 17:58 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 18:09 | |
*** haukebruno has quit IRC | 18:11 | |
*** sasukeh has joined #openstack-ha | 18:11 | |
*** sasukeh has quit IRC | 18:16 | |
*** sasukeh has joined #openstack-ha | 19:12 | |
*** sasukeh has quit IRC | 19:18 | |
*** FL1SK has joined #openstack-ha | 19:24 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 20:09 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 20:13 | |
*** sasukeh has joined #openstack-ha | 20:15 | |
*** sasukeh has quit IRC | 20:19 | |
*** sasukeh has joined #openstack-ha | 21:41 | |
*** sasukeh has quit IRC | 21:46 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:35 | |
*** sasukeh has joined #openstack-ha | 22:42 | |
*** vuntz has quit IRC | 22:43 | |
*** vuntz has joined #openstack-ha | 22:44 | |
*** kgaillot has quit IRC | 22:45 | |
*** sasukeh has quit IRC | 22:47 | |
*** markvoelker has quit IRC | 23:20 | |
*** sasukeh has joined #openstack-ha | 23:34 | |
*** sasukeh has quit IRC | 23:39 |