*** masahito has joined #openstack-ha | 00:45 | |
*** smoriya_ has joined #openstack-ha | 02:27 | |
*** moiz has joined #openstack-ha | 04:09 | |
moiz | hey guys. | 04:37 |
moiz | i am trying out masakari approach on my canonical setup | 04:37 |
moiz | i have started with 1 controller (nova lxc) and 1 compute + 1 reserved host | 04:38 |
moiz | my process & host monitors are running | 04:38 |
moiz | however when on the controller i start masakari-controller it doesn't start up | 04:38 |
moiz | says " * masakari is not running" | 04:39 |
moiz | the db is setup & the database & tables are also present | 04:39 |
moiz | i also dont see any logs on the controller | 04:39 |
*** smoriya_ has quit IRC | 05:00 | |
moiz | is there a requirement for masakari controller to run on the same host where keystone, mysql & nova are present? | 05:06 |
*** rsjethani has joined #openstack-ha | 05:52 | |
*** rsjethani has quit IRC | 06:00 | |
masahito | moiz: what version are you using? | 06:14 |
masahito | moiz: I encountered the same situation when I forgot to install SQLAlchemy-Utils. | 06:17 |
masahito | moiz: now the package is listed in requirements.txt | 06:17 |
moiz | i am using the 1.0.0 | 06:28 |
moiz | and i have SQLAlchemy-Utils installed | 06:29 |
masahito | 1.0.0 doesn't use SQLAlchemy-Utils, so it's not related... | 06:35 |
masahito | hmm.... | 06:35 |
masahito | Or does masakari-controller have access permission to the directory? | 06:35 |
masahito | s/access permission/write permission/ | 06:36 |
moiz | which directory? | 06:36 |
moiz | one more thing. i am getting this error in masakari-controller logs: | 06:38 |
moiz | Apr 18 06:08:24 masakari(12031): ERROR: --MonitoringMessage--ID:[RecoveryController_0003]An error during initializing masakari controller: Could not find a suitable e$ | 06:38 |
moiz | Apr 18 06:08:24 masakari(12031): ERROR: <class 'keystoneauth1.exceptions.discovery.VersionNotAvailable'>' | 06:38 |
moiz | Apr 18 06:08:24 masakari(12031): ERROR: Could not find a suitable endpoint for client version: 3' | 06:38 |
moiz | Apr 18 06:08:24 masakari(12031): ERROR: --MonitoringMessage--ID:[RecoveryController_0003]An error during initializing masakari controller: Could not find a suitable endpoint for client version: 3' | 06:38 |
masahito | this error is masakari-controller's log :-) | 06:38 |
moiz | yes | 06:39 |
moiz | do keystone & nova need to be on the same machine for masakari-controller to work? | 06:39 |
masahito | masakari assumes your openstack is using KeystoneV3 for authentication now | 06:39 |
masahito | because our system, which is designed for masakari's first deploy, uses keystoneV3. | 06:40 |
masahito | keystone & nova don't need to be on the same machine for the controller. | 06:41 |
moiz | okay got it | 06:41 |
moiz | and i believe i have keystone v2 on my system | 06:42 |
moiz | also. running create_database.sh gives me /usr/bin/python: No module named masakari-controller.db | 06:55 |
moiz | but running python create_tables.py gives "Successfully created tables" | 06:56 |
masahito | could you check if your db has vm_ha database? | 07:01 |
masahito | it's a database masakari-controller uses | 07:01 |
moiz | yes it exists | 07:02 |
masahito | ok, no problem ;) | 07:02 |
*** pcaruana has joined #openstack-ha | 07:15 | |
*** mjura has joined #openstack-ha | 07:17 | |
*** beekhof has joined #openstack-ha | 07:29 | |
*** dgurtner has joined #openstack-ha | 07:34 | |
*** haukebruno has joined #openstack-ha | 07:44 | |
*** dgurtner has quit IRC | 07:46 | |
*** rossella_s has joined #openstack-ha | 07:48 | |
*** dgurtner has joined #openstack-ha | 07:48 | |
*** dgurtner has quit IRC | 07:48 | |
*** dgurtner has joined #openstack-ha | 07:48 | |
*** rmart04 has joined #openstack-ha | 07:55 | |
moiz | masahito: i have resolved the issue. masakari controller is now running :) | 07:58 |
moiz | masahito: on the compute node my process and host monitors are also running but the instancemonitor is not starting up: " * masakari-instancemonitor is not running " | 07:59 |
moiz | the 15868 port is also in listening state on the controller | 07:59 |
moiz | no logs for instancemonitor yet. any idea ? | 08:01 |
aspiers | morning | 08:01 |
aspiers | or evening | 08:01 |
masahito | aspiers: hi | 08:02 |
aspiers | -> #openstack-meeting | 08:02 |
masahito | moiz: I solved https://github.com/ntt-sic/masakari/pull/23 for instancemonitor | 08:06 |
masahito | could you check it? | 08:06 |
masahito | it's really tiny patch | 08:06 |
aspiers | masahito, moiz: are you joining the meeting? | 08:07 |
moiz | let me try it | 08:11 |
moiz | aspiers: okay i am there | 08:11 |
*** jpena|off is now known as jpena | 08:49 | |
moiz | masahito: the patch is already in place. | 08:54 |
moiz | p.s. i got the new version from 1.1.0: https://github.com/ntt-sic/masakari/releases/tag/1.1.0 | 08:54 |
masahito | python-daemon and python-libvirt are already installed, aren't they? | 08:57 |
masahito | IIRC, both are installed when nova is installed. | 08:58 |
moiz | yes | 09:04 |
moiz | they are installed | 09:04 |
masahito | hmm... | 09:05 |
masahito | no logs, no process. right? | 09:05 |
moiz | yes. | 09:06 |
moiz | logs are there for host & process monitors only | 09:06 |
*** Deng has joined #openstack-ha | 09:07 | |
moiz | and in the processmonitor.log i see: 2016-04-18 09:05:48 Compute0-B3 process_status_checker.sh: down process id_no : 02 | 09:10 |
masahito | it means processmonitor fails to start or restart instancemonitor. it's the right log in that case. | 09:11 |
moiz | which process is down in this case? id_no: 02 ? | 09:18 |
moiz | okay i need to look why instancemonitor is failing. i have rechecked the .conf files and they are okay | 09:19 |
masahito | the number is based on the number in masakari-processmonitor.conf | 09:20 |
aspiers | masahito: are there any reasons not to replace processmonitor with pacemaker process monitoring? IIRC one point was that if nova-compute doesn't start, Pacemaker won't fence - is that right? | 09:21 |
masahito | yes | 09:25 |
masahito | additionally, it disables the nova-compute service | 09:26 |
masahito | there were no pacemaker professional when we developed it, so we developed it. | 09:27 |
aspiers | masahito: I think it should be possible to fix that in pacemaker | 09:28 |
masahito | yap. | 09:28 |
aspiers | you could have a custom nova-compute-service resource | 09:28 |
aspiers | and nova-compute would depend on it | 09:28 |
aspiers | when nova-compute gets stopped, nova-compute-service would do nova service-disable | 09:29 |
masahito | oh, s/no pacemaker professional/no pacemaker professional in our team/ | 09:29 |
aspiers | :) | 09:29 |
aspiers | masahito: also, is masakari HA? e.g. what if the daemon crashes? | 09:30 |
masahito | we use pacemaker for the controller, hostmonitor and processmonitor. | 09:31 |
aspiers | cool | 09:31 |
aspiers | masahito: and it retries failed evacuations? | 09:32 |
aspiers | masahito: how does it know evacuation succeeded? | 09:32 |
masahito | aspiers: controller has evacuation status in its db. | 09:33 |
masahito | aspiers: so it retries the evacuation or waits for it to finish after the controller's failover. | 09:33 |
aspiers | how does it get evacuation status? | 09:34 |
aspiers | I think it's possible for resurrection to fail after nova evacuate API succeeds | 09:34 |
aspiers | e.g. if nova-scheduler dies at the wrong time | 09:34 |
aspiers | I was discussing this with ddeja on Friday | 09:35 |
masahito | usually it checks the instance status using the Nova API, or IIRC waits for the evacuation timeout set in masakari.conf. | 09:39 |
moiz | masahito: my db has started receiving node status notifications and i can see them in notification_list table. | 09:42 |
moiz | ahhhh | 09:46 |
moiz | found it | 09:46 |
moiz | ImportError: No module named httplib2 | 09:46 |
moiz | solved it | 09:46 |
moiz | instancemonitor working now | 09:46 |
masahito | moiz: great | 09:47 |
moiz | i have a suggestion: i found this out when i manually ran the instancemonitor script with python /opt/masakari/instancemonitor/masakari_instancemonitor.py | 09:47 |
moiz | and i couldnt find this issue any where else | 09:47 |
moiz | masakari should point this out and generate logs even if any service is failing to run | 09:48 |
moiz | like i should have got this error in instancemonitor.log | 09:48 |
masahito | moiz: oh, you're right. | 09:50 |
masahito | if possible, could you report issue in masakari repo? | 09:51 |
moiz | can you share the link? | 09:51 |
masahito | https://github.com/ntt-sic/masakari/issues | 09:51 |
masahito | it helps us not to forget these. | 09:52 |
moiz | okay i will do it. | 09:52 |
moiz | also faced this same issue while trying to run masakari controller. | 09:52 |
aspiers | masahito: so masakari checks that the instance started on another host? | 09:57 |
aspiers | masahito: also, does masakari use force-down API? | 09:58 |
masahito | aspiers: for first one, yes. for second one, no. | 10:02 |
masahito | aspiers: I wrote evacuation steps in doc. please see this: https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md | 10:03 |
aspiers | ok thanks! | 10:03 |
moiz | masahito: https://github.com/ntt-sic/masakari/issues/35 | 10:07 |
masahito | moiz: thanks! | 10:08 |
moiz | masahito: now masakari is setup, masakari controller & DB on openstack controller & host/process/instance monitors on compute node & 1 reserve node configured. | 10:19 |
moiz | pacemaker and pacemaker_remote are also setup and running | 10:19 |
moiz | what are the next steps, adding stonith resource for compute node in pacemaker ? | 10:19 |
moiz | and test evacuations based on setup written on github? | 10:19 |
aspiers | masahito: what is the "resized" status? | 10:20 |
masahito | moiz: yes | 10:20 |
masahito | aspiers: it means the instance is in a resize operation started through the Nova resize API, which lets a user change the instance flavor size. | 10:21 |
aspiers | masahito: but it's also mentioned for hostmonitor host down event? | 10:22 |
aspiers | resizing a host?? | 10:22 |
aspiers | masahito: also, can it evacuate multiple hosts in parallel? | 10:23 |
masahito | aspiers: this row says controller receives host down events while the instance on the down host is in 'resized' state. | 10:24 |
aspiers | oh | 10:24 |
masahito | aspiers: multiple means multi hosts in one pacemaker cluster or multi hosts in different pacemaker clusters? | 10:25 |
aspiers | one cluster | 10:25 |
masahito | I think it can if you set reserved hosts more than 2. | 10:26 |
aspiers | masahito: can you choose which VMs are HA? or does it always make all HA? | 10:32 |
aspiers | masahito: or choose per compute-node, or per AZ, or per project? | 10:32 |
masahito | aspiers: yes, you can. | 10:34 |
aspiers | masahito: which one? | 10:34 |
masahito | aspiers: if a VM has metadata where the key is "HA-Enable" and the value is "OFF", the VM isn't evacuated. | 10:34 |
aspiers | ahh, nice | 10:35 |
aspiers | what about per-project? | 10:35 |
*** dgurtner has quit IRC | 10:35 | |
masahito | per-vms | 10:35 |
aspiers | is it possible to set defaults per project? | 10:35 |
aspiers | I guess that's an OpenStack question | 10:35 |
masahito | no | 10:35 |
aspiers | ok | 10:35 |
moiz | masahito: what about VMs belonging to different tenants residing on the same compute node? are all VMs evacuated in this case? | 10:37 |
masahito | moiz: all are evacuated. it doesn't take tenants into account when evacuating. | 10:39 |
aspiers | masahito: which platforms/versions does masakari currently support? | 10:46 |
masahito | aspiers: ubuntu14.04 is sure to be supported. | 10:47 |
masahito | samP tested it on up-to-date CentOS. | 10:48 |
aspiers | which CentOS? | 10:48 |
aspiers | masahito: which Pacemaker version on 14.04? | 10:49 |
masahito | Cent7 | 10:49 |
aspiers | ok | 10:49 |
masahito | default version on 14.04 is 1.1.10, so if you want to use pacemaker-remote on it you need to build it locally. | 10:49 |
aspiers | ok | 10:50 |
masahito | I heard 16.04 will support 1.1.14 | 10:50 |
masahito | btw, I'll leave here in a few mins. If you have another question, please mail me or see you tomorrow ;-) | 10:51 |
aspiers | ok :) | 10:51 |
aspiers | masahito: I will send you URL to review slide deck | 10:51 |
masahito | aspiers: got it. | 10:51 |
*** masahito has quit IRC | 10:54 | |
moiz | default on 14.04 is 1.1.10. i am using 1.1.12 in order to use masakari with pacemaker remote nodes | 10:59 |
*** serverascode_ has joined #openstack-ha | 11:02 | |
*** g3ek- has joined #openstack-ha | 11:04 | |
*** ljjjustin_ has joined #openstack-ha | 11:05 | |
*** ljjjustin has quit IRC | 11:05 | |
*** serverascode has quit IRC | 11:06 | |
*** zehua has quit IRC | 11:06 | |
*** g3ek has quit IRC | 11:06 | |
*** g3ek- is now known as g3ek | 11:06 | |
*** ljjjustin_ is now known as ljjjustin | 11:06 | |
*** zehua has joined #openstack-ha | 11:08 | |
*** serverascode_ is now known as serverascode | 11:12 | |
*** dgurtner has joined #openstack-ha | 11:23 | |
*** markvoelker has joined #openstack-ha | 12:19 | |
*** jpena is now known as jpena|lunch | 12:36 | |
*** Deng has quit IRC | 12:44 | |
*** smoriya_afk has joined #openstack-ha | 12:53 | |
haukebruno | aspiers, crowbar is crowbar, right? I read about crowbar and opencrowbar and digital rebar | 13:18 |
* haukebruno is confused | 13:18 | |
*** dgurtner has quit IRC | 13:21 | |
aspiers | errr | 13:22 |
aspiers | yes, crowbar is crowbar?! | 13:22 |
aspiers | haukebruno: opencrowbar is dead | 13:22 |
*** dgurtner has joined #openstack-ha | 13:23 | |
haukebruno | and what is that digital rebar thing? | 13:23 |
aspiers | haukebruno: digital rebar is a rewrite of Crowbar which no longer focuses on OpenStack deployment | 13:23 |
haukebruno | good or evil? or dead too? | 13:23 |
haukebruno | ah ok | 13:23 |
haukebruno | so -> it is crowbar | 13:23 |
aspiers | by RackN, startup of ex-Dell employees | 13:23 |
aspiers | rebar is NOT Crowbar | 13:23 |
haukebruno | ah no. I mean: then it - the solution - for me could be crowbar, not rebar, not opencrowbar | 13:25 |
aspiers | haukebruno: oh I see | 13:29 |
aspiers | haukebruno: yes, definitely :) | 13:30 |
*** kgaillot has joined #openstack-ha | 13:34 | |
*** jpena|lunch is now known as jpena | 13:49 | |
*** moiz has quit IRC | 13:51 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:03 | |
*** nkrinner has quit IRC | 14:10 | |
*** dgurtner has quit IRC | 14:43 | |
*** dgurtner has joined #openstack-ha | 14:45 | |
*** mjura has quit IRC | 15:20 | |
*** haukebruno has quit IRC | 16:04 | |
*** raginbajin has quit IRC | 16:05 | |
*** raginbajin has joined #openstack-ha | 16:10 | |
*** pcaruana has quit IRC | 16:21 | |
*** dgurtner has quit IRC | 16:23 | |
*** rmart04 has quit IRC | 16:29 | |
*** rossella_s has quit IRC | 17:03 | |
*** rossella_s has joined #openstack-ha | 17:04 | |
*** jpena is now known as jpena|off | 17:07 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 17:26 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 17:28 | |
*** dgurtner has joined #openstack-ha | 17:39 | |
*** dgurtner has quit IRC | 17:59 | |
*** rmart04 has joined #openstack-ha | 18:15 | |
*** rmart04 has quit IRC | 18:20 | |
*** moiz has joined #openstack-ha | 18:54 | |
*** raginbajin has quit IRC | 19:16 | |
*** FL1SK has quit IRC | 19:19 | |
*** raginbajin has joined #openstack-ha | 19:21 | |
*** moiz has quit IRC | 21:01 | |
*** rossella_s has quit IRC | 21:03 | |
*** rossella_s has joined #openstack-ha | 21:04 | |
*** FL1SK has joined #openstack-ha | 21:20 | |
*** dileepr has joined #openstack-ha | 21:53 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:20 | |
*** markvoelker has quit IRC | 22:28 | |
*** kgaillot has quit IRC | 22:56 | |
*** markvoelker has joined #openstack-ha | 23:28 | |
*** markvoelker has quit IRC | 23:33 |
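
Note on the 10:34 to 10:39 exchange: masahito says a VM whose metadata has key "HA-Enable" and value "OFF" is not evacuated, and that tenants are not considered. The selection rule can be sketched roughly as below. This is only an illustration, not masakari's code: plain dicts stand in for Nova server records, the function names are hypothetical, and the exact-match (case-sensitive) comparison is an assumption.

```python
# Sketch only: plain dicts stand in for Nova server records.
# Function names are hypothetical; this is not masakari's implementation.
HA_METADATA_KEY = "HA-Enable"   # key named in the log
HA_DISABLED_VALUE = "OFF"       # value named in the log


def should_evacuate(instance):
    """Return False only when the instance opts out via HA-Enable=OFF.

    Assumption: the comparison is an exact, case-sensitive match.
    """
    metadata = instance.get("metadata") or {}
    return metadata.get(HA_METADATA_KEY) != HA_DISABLED_VALUE


def instances_to_evacuate(instances):
    """Keep every instance from the failed host except opted-out ones.

    The tenant is deliberately ignored, as described in the log.
    """
    return [inst for inst in instances if should_evacuate(inst)]


if __name__ == "__main__":
    on_failed_host = [
        {"name": "web-1", "tenant": "a", "metadata": {}},
        {"name": "batch-1", "tenant": "b", "metadata": {"HA-Enable": "OFF"}},
        {"name": "db-1", "tenant": "b", "metadata": {"HA-Enable": "ON"}},
    ]
    print([inst["name"] for inst in instances_to_evacuate(on_failed_host)])
```

With real Nova the metadata would be set per instance (the log says there is no per-project default), so opting a VM out means tagging that VM individually.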