*** rossella_s has quit IRC | 01:03 | |
*** rossella_s has joined #openstack-ha | 01:04 | |
*** hoangcx has joined #openstack-ha | 01:32 | |
*** sasukeh has quit IRC | 01:35 | |
*** sasukeh has joined #openstack-ha | 01:56 | |
*** masahito has quit IRC | 02:32 | |
*** masahito has joined #openstack-ha | 02:43 | |
*** masahito has quit IRC | 02:58 | |
*** masahito has joined #openstack-ha | 03:00 | |
*** sasukeh has quit IRC | 03:02 | |
*** masahito has quit IRC | 03:38 | |
*** masahito has joined #openstack-ha | 03:39 | |
*** masahito has quit IRC | 03:39 | |
*** moiz has joined #openstack-ha | 03:51 | |
*** hoangcx_ has joined #openstack-ha | 03:59 | |
*** beekhof has quit IRC | 04:00 | |
*** hoangcx has quit IRC | 04:00 | |
*** sasukeh has joined #openstack-ha | 04:05 | |
*** sasukeh has quit IRC | 04:15 | |
*** sasukeh has joined #openstack-ha | 04:25 | |
*** hoangcx_ has quit IRC | 04:45 | |
*** masahito has joined #openstack-ha | 04:45 | |
*** beekhof has joined #openstack-ha | 04:46 | |
*** hoangcx has joined #openstack-ha | 04:48 | |
*** rossella_s has quit IRC | 05:03 | |
*** rossella_s has joined #openstack-ha | 05:04 | |
moiz | masahito: I have set up the Masakari controller on the openstack controller, and the host, process & instance monitors on the openstack compute nodes. | 05:12 |
moiz | masahito: all the processes are running. However, the evacuations are not happening. | 05:13 |
masahito | moiz: hi | 05:13 |
masahito | moiz: I read your problem. | 05:13 |
masahito | moiz: first of all, all monitor processes don't have host fencing feature. | 05:14 |
masahito | moiz: so you need to set up an RA for nova-compute to fence the host when nova-compute goes down. | 05:15 |
masahito | moiz: processmonitor only disables nova-compute on its host if processmonitor fails to restart processes listed in proc.list. | 05:16 |
moiz | okay so from pacemaker side, i need to configure nova-compute RA, fence-nova & ipmilan for each compute node | 05:18 |
moiz | so pacemaker is responsible for detecting nova-compute as down & then fencing it automatically | 05:18 |
masahito | moiz: yes it is. if you want to fence node when nova-compute goes down. | 05:19 |
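A rough sketch of that Pacemaker setup in crmsh syntax follows; the resource names, credentials and parameter names are illustrative and vary by version of the openstack-resource-agents and fence-agents packages, so treat it as a starting point rather than the exact configuration:

    # sketch only: names, addresses and parameters are illustrative
    # power fencing for the compute node via its BMC
    crm configure primitive ipmi-compute1 stonith:fence_ipmilan \
        params pcmk_host_list="compute1" ipaddr="10.0.0.11" \
               login="admin" passwd="secret" lanplus="true"

    # fence_compute records the host as down on the Nova side so evacuation can proceed
    crm configure primitive fence-nova stonith:fence_compute \
        params auth-url="http://controller:5000/v2.0" login="admin" \
               passwd="secret" tenant-name="admin" record-only="true"

    # the NovaCompute RA monitors nova-compute on each compute node
    crm configure primitive p_nova_compute ocf:openstack:NovaCompute \
        params auth_url="http://controller:5000/v2.0" username="admin" \
               password="secret" tenant_name="admin" \
        op monitor interval="10s"
    crm configure clone nova-compute-clone p_nova_compute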
moiz | if i don't want to fence the compute node via pacemaker, and maybe write my own script which calls fence_compute directly, is that possible? | 05:19 |
moiz | because last time i configured nova-compute over pacemaker 1.1.12rc4, it didn't work. | 05:20 |
moiz | pacemaker kept giving me 'not installed' errors for nova-compute on compute nodes, which doesn't make sense as they are installed and running on the compute nodes. | 05:21 |
openstackgerrit | chen.xing proposed openstack/ha-guide: Add a note of virtual node https://review.openstack.org/308237 | 05:22 |
masahito | it means you don't want to fence the node when nova-compute goes down, but want to fence the node when some crash happens on the node, right? | 05:23 |
masahito | if so, yes. | 05:24 |
masahito | write your own fencing script for pacemaker based on your fencing use case. | 05:25 |
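If fencing is driven by a custom script instead, a minimal wrapper around fence_compute might look like the sketch below; the long option names are assumptions based on the fence-agents conventions of the time, so verify them against `fence_compute -h` before relying on them:

    #!/bin/sh
    # sketch only: fence the failed compute host out of Nova's view;
    # option names are assumptions -- verify with `fence_compute -h`
    FAILED_HOST="$1"

    fence_compute \
        --action=off \
        --plug="$FAILED_HOST" \
        --auth-url="http://controller:5000/v2.0" \
        --username="admin" \
        --password="secret" \
        --tenant-name="admin"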
moiz | yes. okay great. so where does masakari come in? i need to understand the workflow from when libvirt goes down till evacuation. | 05:26 |
moiz | as i understand it, the masakari process monitor detects libvirt down, tells the controller, waits for fencing & calls the evacuation API. is this correct? | 05:26 |
masahito | yes | 05:27 |
masahito | sorry, no | 05:27 |
masahito | for evacuation. | 05:27 |
moiz | what is the correct workflow for evacuation? please explain. | 05:29 |
masahito | a host goes down, pacemaker running on another host detects the host down, pacemaker marks the host OFFLINE, hostmonitor detects the host down and sends it to controller, and then the controller waits for fencing and calls the evacuation API | 05:30 |
masahito | long steps :-) | 05:30 |
masahito | the following are the processmonitor's steps: | 05:31 |
*** mjura has joined #openstack-ha | 05:32 | |
masahito | the process monitor detects libvirt down, tells the controller, and then the controller disables nova-compute on the host where libvirt went down | 05:32 |
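For orientation only, the controller-side actions in these two flows roughly correspond to what one could do by hand with the nova CLI of that era; this is an illustration of the flow, not the commands masakari itself runs, and flag availability depends on the python-novaclient version:

    # hostmonitor path: after the failed host has been fenced,
    # evacuate its instances to the reserved host
    nova list --host compute1-t4 --all-tenants
    nova evacuate <instance-uuid> compute2-b1 --on-shared-storage

    # processmonitor path: libvirt (or another process in proc.list) cannot be
    # restarted, so the failed host is only taken out of scheduling
    nova service-disable compute1-t4 nova-compute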
moiz | 1. Process monitor: does exactly the same. i have noticed this on my setup | 05:33 |
moiz | 2. Hostmonitor: pacemaker sets the remote node (compute node) as OFFLINE. but which hostmonitor will detect it as down? because the hostmonitor will die along with the compute node. | 05:34 |
moiz | my hostmonitors are running on the compute nodes only. i have 1 compute node running all 3 monitors, and 1 compute node running all 3 monitors which is the RESERVED HOST. the openstack controller is only running the masakari controller (not the monitors) | 05:35 |
*** nkrinner has joined #openstack-ha | 05:35 | |
*** mjura has quit IRC | 05:39 | |
*** mjura has joined #openstack-ha | 05:39 | |
masahito | to clarify: you have 2 compute nodes, one is a compute node for VMs and has all 3 monitors, the other is the compute node for the RESERVED HOST and also has all 3 monitors. | 05:40 |
masahito | and | 05:40 |
masahito | where are the full-stack pacemaker and pacemaker-remote deployed? | 05:41 |
moiz | yes | 05:41 |
moiz | compute nodes are running pacemaker-remote | 05:41 |
moiz | openstack controller node is running full stack pacemaker | 05:42 |
moiz | and i have added remote nodes in the pacemaker cluster (on the controller) | 05:42 |
masahito | oh, got it. | 05:42 |
masahito | so can you see the status of all the pacemaker nodes from the reserved host? | 05:43 |
masahito | by crm_mon command | 05:43 |
moiz | that's another thing i was going to mention. crm_mon would run on the controller node where the pacemaker stack is, and on the compute nodes only pacemaker-remote is installed, the clients are not installed there. and i was looking at the masakari scripts & they call crm_mon on the compute node | 05:45 |
masahito | right, I think it's root cause. | 05:45 |
moiz | i need to install pacemaker clients on the compute nodes. crmsh ? | 05:46 |
masahito | hostmonitor relies on the output of crm_mon, so you need to install the crm command onto the remote nodes when you use pacemaker-remote | 05:46 |
masahito | yes | 05:46 |
moiz | okay let me try | 05:46 |
masahito | I think. but I don't remember exact package name. | 05:47 |
moiz | done. apt-get install crmsh | 05:47 |
moiz | its working now. i can see the full cluster info on the compute nodes | 05:47 |
moiz | including the reserved host | 05:47 |
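For reference, the two commands involved here, installing the crm shell on each pacemaker-remote compute node and doing a one-shot status check, are roughly:

    # on each compute node running pacemaker-remote
    sudo apt-get install crmsh

    # one-shot status check: the remote nodes, including the reserved host,
    # should now be visible alongside the full-stack controller node
    crm_mon -1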
masahito | great! | 05:48 |
masahito | oh. | 05:48 |
masahito | let me tell you one important thing | 05:48 |
masahito | if possible, could you copy the corosync.conf from the full-stack pacemaker node onto the remote hosts? | 05:49 |
moiz | yes i can do it | 05:49 |
masahito | hostmonitor detects which cluster the monitor belongs to by parsing the config. | 05:50 |
moiz | got it. | 05:50 |
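Since hostmonitor works out which cluster it belongs to by parsing this file, copying the controller's corosync.conf verbatim is enough; the relevant part is the totem interface section, roughly like the excerpt below (values are illustrative, the 5405 port matches the one used elsewhere in this log):

    # /etc/corosync/corosync.conf -- copied as-is from the full-stack pacemaker node
    totem {
        version: 2
        cluster_name: openstack-ha
        interface {
            ringnumber: 0
            bindnetaddr: 172.30.1.0
            mcastport: 5405
        }
    }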
moiz | one last thing. at this moment, i don't want pacemaker to monitor the nova-compute process & fence the compute node. can i still see evacuations happening using masakari? how? | 05:51 |
masahito | only by watching the logs, for now. | 05:54 |
masahito | or check the number of reserved hosts. | 05:54 |
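Concretely, without pacemaker-driven fencing in place, the two ways to watch for evacuation activity are the controller log (syslog-style in this deployment, as the excerpts further down show; the exact path may differ) and the masakari database; the database name below is a placeholder:

    # follow the controller log (syslog-style here; path may differ per distro)
    tail -f /var/log/syslog | grep masakari-controller

    # or inspect the reserved-host table in the masakari database
    # (database name is a placeholder)
    mysql -u root -p123 -h 127.0.0.1 -e 'select * from reserve_list;' <masakari-db-name>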
moiz | if i unplug my compute node, the hostmonitor on the RESERVED node will detect that another compute node is down, tell the controller, and the controller will call the evacuate API for the down compute node. is this correct? | 05:57 |
moiz | meanwhile when i unplug, pacemaker will also mark the compute node (remote) as offline. | 05:58 |
masahito | right | 05:58 |
moiz | great. | 05:58 |
moiz | i am going to try it :-) | 05:58 |
*** moizarif has joined #openstack-ha | 06:07 | |
*** moiz has quit IRC | 06:08 | |
moizarif | masahito: you mentioned a 5 mins convergence period for evacuations. why 5 mins? is it dependent on the # of VMs on the host? | 06:15 |
moizarif | okay i unplugged the compute node. this is what happened: | 06:18 |
moizarif | 1. nova-compute got disabled | 06:19 |
moizarif | 2. Pacemaker marked the compute node as OFFLINE (crm_mon) | 06:19 |
moizarif | 3. its been 10 mins now and still no evacuations. | 06:19 |
moizarif | controller logs say: | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> Recieved notification : { "id": "2f27dcd3-00c0-45e0-8214-6ea8308d8eb9", "type": "nodeStatus", "regionID": "serverstack", "hostname": "compute1-t4", "uuid": "", "time": "20160421060934", "eventID": "1", "eventType": "1", "detail": "02", | 06:21 |
moizarif | "startTime": "20160421060934", "endTime": null, "tzname": "'UTC', 'UTC'", "daylight": "0", "cluster_port": ""}' | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> {u'eventID': u'1', u'hostname': u'compute1-t4', u'uuid': u'', u'eventType': u'1', u'regionID': u'serverstack', u'cluster_port': u'', u'detail': u'02', u'daylight': u'0', u'tzname': u"'UTC', 'UTC'", u'startTime': u'20160421060934', u'time': u'20160421060934', u'endTime': | 06:21 |
moizarif | None, u'type': u'nodeStatus', u'id': u'2f27dcd3-00c0-45e0-8214-6ea8308d8eb9'}' | 06:21 |
moizarif | Apr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <Thread(Thread-4, started 140292101179136)> Disable nova-compute on compute1-t4' | 06:21 |
*** sasukeh has quit IRC | 06:21 | |
masahito | moizarif: 5 mins was just our requirements. | 06:24 |
moizarif | nothing in the logs of process & host monitor on the reserved host | 06:24 |
masahito | moizarif: what we need to be sure of is that the down host has been fenced, so it waits 5 mins. | 06:25 |
masahito | questions | 06:25 |
masahito | 1. is the host name 'compute1-t4' the same as the name in the crm_mon output? | 06:26 |
moizarif | yes | 06:26 |
masahito | 2. when you registered the reserved host via the CLI, what did you specify for cluster_port? | 06:27 |
moizarif | python /home/ubuntu/masakari/masakari-controller/utils/reserve_host_manage.py --mode add --port "172.30.1.205:5405" --host compute2-b1 --db-user root --db-password 123 --db-host 127.0.0.1 | 06:27 |
moizarif | i used this command. i don't think i added any cluster port | 06:28 |
masahito | got it. | 06:29 |
moizarif | 5405 port | 06:29 |
masahito | I think the mismatch between --port "172.30.1.205:5405" in the command and "cluster_port": "" in the notification is what causes the evacuation failure. | 06:30 |
masahito | I'm thinking it's a weak point in Masakari | 06:31 |
moizarif | my mysql output for reserve_list table: | 06:31 |
moizarif | mysql> select * from reserve_list; | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | | id | create_at | update_at | delete_at | deleted | cluster_port | hostname | | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | | 1 | 2016-04-20 09:59:19 | NULL | 2016-04-20 11:55:51 | 1 | 172.30.1.205:5405 | compute2-b1 | | 06:31 |
moizarif | +----+---------------------+-----------+---------------------+---------+-------------------+-------------+ | 06:31 |
moizarif | 1 row in set (0.00 sec) | 06:31 |
*** sasukeh has joined #openstack-ha | 06:32 | |
masahito | Masakari decides whether the host notified by hostmonitor is in the same cluster or not using the cluster_port value. | 06:32 |
masahito | but the cluster_port in a notification is generated based on corosync.conf | 06:33 |
masahito | we need to improve it. | 06:33 |
moizarif | so what temporary workaround can i use for this? | 06:34 |
masahito | so a workaround for it is to add the reserved host with --cluster_port "". | 06:34 |
moizarif | got it. let me try this out | 06:34 |
masahito | instead of 172.30.1.205:5405 | 06:34 |
*** sasukeh has quit IRC | 06:39 | |
*** rsjethani has joined #openstack-ha | 06:43 | |
rsjethani | hi masahito | 06:44 |
masahito | rsjethani: hi | 06:45 |
rsjethani | I have a few questions regarding masakari | 06:45 |
masahito | rsjethani: ok, go ahead. | 06:46 |
rsjethani | ok, first of all, why are the host monitor and process monitor written in shell script instead of python? | 06:47 |
masahito | because we are using them as RAs of the local pacemaker. | 06:48 |
masahito | and both call linux commands like crm_mon etc., so shell is a natural fit. | 06:51 |
rsjethani | But the RA interface is language independent | 06:51 |
*** sasukeh has joined #openstack-ha | 06:51 | |
*** hoangcx_ has joined #openstack-ha | 06:53 | |
rsjethani | http://www.linux-ha.org/wiki/Resource_agents | 06:53 |
rsjethani | Topic "Implementation" says the RA just needs to have a predefined interface but | 06:54 |
*** hoangcx has quit IRC | 06:54 | |
rsjethani | The reason I am saying this is because shell scripts are hard to follow and maintain. | 06:55 |
rsjethani | Also we want to get masakari under openstack, where the primary language is python | 06:55 |
masahito | yes, we had options, but we decided to implement them as shell scripts. | 06:59 |
rsjethani | ok :) | 06:59 |
masahito | yes, I know. | 06:59 |
masahito | agreed, they're hard to maintain. | 07:00 |
rsjethani | So where does the host monitor run? | 07:00 |
masahito | on the compute nodes? Is that the answer to your question? | 07:00 |
rsjethani | yes, I am trying to understand how and where the host monitor runs in a system where we have, say, three compute nodes and one master/controller node | 07:02 |
masahito | in that case, hostmonitor should run on all 3 compute nodes. | 07:03 |
rsjethani | and masakari-controller will run on the controller node right? | 07:04 |
masahito | right | 07:04 |
rsjethani | ok | 07:05 |
rsjethani | Another question: why do we need masakari-controller? | 07:09 |
rsjethani | IMO we can make all three components independent services | 07:09 |
rsjethani | just like nova ,glance etc | 07:10 |
rsjethani | let HM make its own decisions. Same goes for IM | 07:10 |
masahito | To coordinate handling of all errors, especially race conditions, we introduced masakari-controller | 07:12 |
rsjethani | Can you give an example of a race condition? thanks | 07:13 |
masahito | for example, if the host goes down while the instance monitor is rebuilding a VM, when should it call the evacuate API? | 07:14 |
masahito | From outside of Nova, we can't stop the rebuild steps in Nova. | 07:14 |
*** dgurtner has joined #openstack-ha | 07:15 | |
rsjethani | Thanks masahito. I will look further into masakari and come back here :) | 07:16 |
*** permalac has quit IRC | 07:19 | |
moizarif | masahito: the command: python reserve_host_manage.py --mode add --cluster_port "172.40.1.205:5405" --host compute2-b1 --db-user root --db-password 123 --db-host 127.0.0.1 | 07:29 |
moizarif | gives: reserve_host_manage.py: error: unrecognized arguments: --cluster_port 172.40.1.205:5405 | 07:30 |
*** moiz has joined #openstack-ha | 07:35 | |
*** dileepr has quit IRC | 07:39 | |
masahito | sorry, --port is correct | 07:43 |
masahito | I meant --port "" | 07:44 |
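Putting the correction together, the re-registration command with the options from the earlier invocation becomes something like:

    # re-register the reserved host with an empty port so it matches the empty
    # "cluster_port" field in hostmonitor's notifications
    python /home/ubuntu/masakari/masakari-controller/utils/reserve_host_manage.py \
        --mode add --port "" --host compute2-b1 \
        --db-user root --db-password 123 --db-host 127.0.0.1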
*** jpena|off is now known as jpena | 07:44 | |
*** masahito has quit IRC | 07:46 | |
*** moiz_ has joined #openstack-ha | 07:51 | |
*** moiz has quit IRC | 07:53 | |
*** hoangcx_ has quit IRC | 07:56 | |
*** hoangcx has joined #openstack-ha | 07:56 | |
*** sasukeh has quit IRC | 08:02 | |
*** moizarif has quit IRC | 08:03 | |
*** haukebruno has joined #openstack-ha | 08:13 | |
*** masahito has joined #openstack-ha | 08:22 | |
*** markvoelker has quit IRC | 08:27 | |
aspiers | rsjethani: are you coming to Austin? | 08:29 |
rsjethani | Hi aspiers | 08:44 |
rsjethani | No I won't be there :( | 08:44 |
aspiers | :( | 08:44 |
rsjethani | But my colleagues will be there | 08:45 |
aspiers | rsjethani: then watch for the video of https://www.openstack.org/summit/austin-2016/summit-schedule/events/7327 | 08:45 |
rsjethani | ok | 08:45 |
*** rossella_s has quit IRC | 09:03 | |
*** rossella_s has joined #openstack-ha | 09:06 | |
*** markvoelker has joined #openstack-ha | 09:27 | |
*** markvoelker has quit IRC | 09:32 | |
*** moiz_ has quit IRC | 09:46 | |
*** masahito has quit IRC | 09:52 | |
*** sasukeh has joined #openstack-ha | 09:57 | |
*** hoangcx has quit IRC | 10:32 | |
*** rsjethani has quit IRC | 10:40 | |
*** rsjethani has joined #openstack-ha | 10:40 | |
*** dgurtner has quit IRC | 10:47 | |
*** dgurtner has joined #openstack-ha | 10:47 | |
*** sasukeh has quit IRC | 11:14 | |
*** ChanServ changes topic to "OpenStack HA | next meeting in Austin! 12:30pm Expo Hall 5, table with ClusterLabs sign" | 11:35 | |
*** jpena is now known as jpena|lunch | 11:36 | |
aspiers | http://clusterlabs.org/pipermail/users/2016-April/002753.html | 11:41 |
*** mjura has quit IRC | 11:48 | |
*** mjura has joined #openstack-ha | 12:04 | |
*** markvoelker has joined #openstack-ha | 12:17 | |
*** yan-gao has quit IRC | 12:19 | |
*** yan-gao has joined #openstack-ha | 12:21 | |
*** yan-gao has quit IRC | 12:29 | |
*** yan-gao has joined #openstack-ha | 12:29 | |
*** mjura has quit IRC | 12:38 | |
*** mjura has joined #openstack-ha | 12:51 | |
*** jpena|lunch is now known as jpena | 12:59 | |
*** sasukeh has joined #openstack-ha | 13:38 | |
*** rsjethani has quit IRC | 13:50 | |
*** kgaillot has joined #openstack-ha | 13:56 | |
*** sasukeh has quit IRC | 14:20 | |
*** mjura has quit IRC | 15:16 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 15:33 | |
*** sasukeh has joined #openstack-ha | 15:41 | |
*** dgurtner has quit IRC | 15:46 | |
*** sasukeh has quit IRC | 15:46 | |
*** openstackgerrit has quit IRC | 15:48 | |
*** openstackgerrit has joined #openstack-ha | 15:49 | |
*** sasukeh has joined #openstack-ha | 15:52 | |
*** sasukeh has quit IRC | 16:09 | |
*** jpena is now known as jpena|off | 16:58 | |
*** serverascode has quit IRC | 17:03 | |
*** rossella_s has quit IRC | 17:03 | |
*** rossella_s has joined #openstack-ha | 17:04 | |
*** serverascode has joined #openstack-ha | 17:05 | |
*** sasukeh has joined #openstack-ha | 17:10 | |
*** sasukeh has quit IRC | 17:16 | |
*** rossella_s has quit IRC | 17:19 | |
*** rossella_s has joined #openstack-ha | 17:20 | |
*** FL1SK has quit IRC | 17:23 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 17:55 | |
*** hoonetorg has joined #openstack-ha | 17:56 | |
*** jpokorny has joined #openstack-ha | 17:58 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 18:09 | |
*** haukebruno has quit IRC | 18:11 | |
*** sasukeh has joined #openstack-ha | 18:11 | |
*** sasukeh has quit IRC | 18:16 | |
*** sasukeh has joined #openstack-ha | 19:12 | |
*** sasukeh has quit IRC | 19:18 | |
*** FL1SK has joined #openstack-ha | 19:24 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 20:09 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 20:13 | |
*** sasukeh has joined #openstack-ha | 20:15 | |
*** sasukeh has quit IRC | 20:19 | |
*** sasukeh has joined #openstack-ha | 21:41 | |
*** sasukeh has quit IRC | 21:46 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:35 | |
*** sasukeh has joined #openstack-ha | 22:42 | |
*** vuntz has quit IRC | 22:43 | |
*** vuntz has joined #openstack-ha | 22:44 | |
*** kgaillot has quit IRC | 22:45 | |
*** sasukeh has quit IRC | 22:47 | |
*** markvoelker has quit IRC | 23:20 | |
*** sasukeh has joined #openstack-ha | 23:34 | |
*** sasukeh has quit IRC | 23:39 |