nick | message | time |
---|---|---|
opendevreview | OpenStack Proposal Bot proposed openstack/openstack-ansible master: Imported Translations from Zanata https://review.opendev.org/c/openstack/openstack-ansible/+/949692 | 03:35 |
opendevreview | Merged openstack/openstack-ansible-os_neutron master: Add OVS Ubuntu 24.04 jobs https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/948702 | 03:57 |
opendevreview | Merged openstack/openstack-ansible-os_neutron master: Update uwsgi re-disable reno to contain bug ID https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949749 | 03:57 |
noonedeadpunk | mornings | 06:41 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Configure apparmor for dnsmasq https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949780 | 06:42 |
noonedeadpunk | seems we are all set? | 06:43 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron stable/2024.2: Respect aa-disable exit code when disabling profiles https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949847 | 06:43 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts stable/2024.2: Patch the usr.bin.lxc-copy apparmor profile https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/949848 | 06:43 |
noonedeadpunk | I have one personal request kinda - having https://review.opendev.org/c/openstack/openstack-ansible-ops/+/943866 in would be extremely helpful for some of my "home" projects | 06:49 |
noonedeadpunk | it has quite decent molecule coverage | 06:49 |
noonedeadpunk | and I tried to document it even | 06:51 |
noonedeadpunk | https://3753dea4ea75a3fe0692-37ad574b65e44b7a1b1058900a1af5e6.ssl.cf5.rackcdn.com/openstack/6e7edb865aff4c7dbbb9a9ea2c3ac33d/docs/encrypt_secrets.html | 06:51 |
opendevreview | Merged openstack/openstack-ansible-memcached_server master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-memcached_server/+/948950 | 06:51 |
opendevreview | Merged openstack/openstack-ansible-openstack_hosts master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/948949 | 06:51 |
noonedeadpunk | and I was thinking to add smth like sops or vault as another role in collection | 06:52 |
noonedeadpunk | (one day) | 06:52 |
opendevreview | Merged openstack/openstack-ansible-os_mistral master: Remove tags from README and small fix with markings https://review.opendev.org/c/openstack/openstack-ansible-os_mistral/+/948936 | 06:53 |
opendevreview | Merged openstack/openstack-ansible-os_heat master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_heat/+/948940 | 06:55 |
opendevreview | Merged openstack/openstack-ansible-os_gnocchi master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_gnocchi/+/948945 | 06:57 |
opendevreview | Merged openstack/openstack-ansible-openstack_openrc master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-openstack_openrc/+/948948 | 06:57 |
opendevreview | Merged openstack/openstack-ansible-os_barbican master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/948946 | 06:57 |
opendevreview | Merged openstack/openstack-ansible-ceph_client master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/948941 | 06:57 |
opendevreview | Merged openstack/openstack-ansible-os_designate master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_designate/+/948942 | 06:57 |
opendevreview | Merged openstack/openstack-ansible-lxc_container_create master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-lxc_container_create/+/948951 | 06:58 |
opendevreview | Merged openstack/openstack-ansible-os_rally master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/948937 | 06:59 |
opendevreview | Merged openstack/openstack-ansible-os_glance master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/948939 | 06:59 |
opendevreview | Merged openstack/openstack-ansible-os_ironic master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_ironic/+/948938 | 06:59 |
opendevreview | Merged openstack/openstack-ansible-os_octavia master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/948935 | 06:59 |
opendevreview | Merged openstack/openstack-ansible-haproxy_server master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/948947 | 07:00 |
opendevreview | Merged openstack/openstack-ansible-os_manila master: Remove tags from README https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/948929 | 07:00 |
opendevreview | Merged openstack/openstack-ansible master: Imported Translations from Zanata https://review.opendev.org/c/openstack/openstack-ansible/+/949692 | 07:19 |
noonedeadpunk | aha, a couple of patches need re-checking | 07:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/ansible-role-pki master: Do not run CA generation code when already exists https://review.opendev.org/c/openstack/ansible-role-pki/+/867549 | 07:23 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible unmaintained/zed: Remove retired qdrouterd, os_sahara, os_senlin and os_murano repos from zuul jobs https://review.opendev.org/c/openstack/openstack-ansible/+/949203 | 07:44 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update troubleshooting page https://review.opendev.org/c/openstack/openstack-ansible/+/949779 | 08:01 |
user01000 | Hi, I have installed the masakari service, created a segment using horizon, and added hosts to the segment as well. I have a doubt about this: I created the "type" as compute and set the "control attribute" as default, and I am not sure they are correct. Anyway, if I kill the vm process it is automatically restarted, but if a compute node is down nothing happens. Are there any additional settings? | 08:52 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 08:57 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 08:58 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update environment scaling page https://review.opendev.org/c/openstack/openstack-ansible/+/949763 | 08:59 |
noonedeadpunk | user01000: so these 2 actions are controlled by different monitors | 09:00 |
noonedeadpunk | when you kill the VM it's the instance monitor which triggers the event | 09:00 |
noonedeadpunk | when you kill the node - it should be the host monitor | 09:01 |
noonedeadpunk | the host monitor relies on an external cluster mechanism to report node failure | 09:01 |
noonedeadpunk | by default - that is pacemaker/corosync | 09:01 |
noonedeadpunk | so I think it's worth checking if crm_mon -1 is supplying the list of your computes and the hostnames there are exactly the same as in your compute service list output. | 09:02 |
noonedeadpunk | then check if your masakari-engine receives the event and whether it tries to do anything with it | 09:02 |
user01000 | noonedeadpunk: I am not sure how to deploy the pacemaker cluster. Is it part of the openstack-ansible project, or should I set it up manually? | 09:03 |
noonedeadpunk | also keep in mind, that nodes added to segments must match hostnames exactly with pacemaker/corosync cluster and with nova compute services | 09:03 |
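The hostname-consistency check described above (comparing `crm_mon -1` node names against the nova compute service list) can be sketched in miniature. This is an illustrative helper with hypothetical example data, not output from a real cluster; the function name `find_mismatches` is mine, not from any tool mentioned here.

```python
# Sketch of the hostname-consistency check discussed above, using
# hypothetical example data in place of real command output.
# In a real deployment the two lists would come from `crm_mon -1`
# (pacemaker/corosync) and `openstack compute service list` (nova).

def find_mismatches(pacemaker_nodes, nova_hosts):
    """Return names that appear in one list but not the other."""
    return sorted(set(pacemaker_nodes) ^ set(nova_hosts))  # symmetric difference

# "compute1" vs "compute01" is exactly the hostname-vs-alias
# mismatch that comes up later in this conversation.
pacemaker_nodes = ["compute1", "compute2", "compute3"]
nova_hosts = ["compute01", "compute2", "compute3"]

print(find_mismatches(pacemaker_nodes, nova_hosts))
```

Any name printed by this check would prevent masakari from correlating the pacemaker failure report with the right nova compute service.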
noonedeadpunk | I _think_ it should deploy with masakari in some way | 09:03 |
noonedeadpunk | but it's installed only on `masakari_monitor` group https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/masakari.yml#L22-L46 | 09:04 |
noonedeadpunk | btw we really need to reconsider how we setup this.... | 09:05 |
noonedeadpunk | anyway | 09:05 |
user01000 | so the pcs cluster should be deployed on compute nodes right | 09:05 |
noonedeadpunk | so running crm_mon -1 on compute should be providing some output to you | 09:05 |
noonedeadpunk | I think the most widespread issue there is mismatched hostnames | 09:06 |
noonedeadpunk | ie - hostname vs fqdn | 09:06 |
noonedeadpunk | as if it's not an exact match somewhere - it will not identify the correct node | 09:06 |
user01000 | Okay, my compute nodes do have pacemakerd and corosync running, but the management tools are not installed | 09:09 |
user01000 | unable to locate command: /usr/sbin/crm_mon | 09:11 |
noonedeadpunk | I have not run this in a while tbh... So you're pretty much missing the pacemaker-cli-utils package? | 09:13 |
user01000 | hmm, I am also searching for the package name | 09:15 |
user01000 | Is there any metadata for vm migration? I have added "HA_Enabled" to the vms I wanted to migrate | 09:23 |
user01000 | Node List: * Online: [ compute1 compute2 compute3 compute4 ] the pcs cluster seems working as well | 09:24 |
user01000 | But I noticed "0 resource instances configured" - should this be configured, or is masakari taking care of it? | 09:25 |
user01000 | hostname is the same, I don't use fqdn | 09:26 |
noonedeadpunk | "0 resource instances configured" -> this is fine iirc | 09:27 |
noonedeadpunk | masakari hostmonitor should be listening for the traffic on the interface and detect on its own that the node was lost from the cluster | 09:28 |
noonedeadpunk | and issue an event to the API | 09:28 |
noonedeadpunk | so you should be able to list and check for events there | 09:29 |
user01000 | Hmm, I think there are some errors in the services, not sure what; let me check and restart everything again | 09:33 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 09:35 |
user01000 | noonedeadpunk: I can see evacuate host active in the horizon hypervisor list, but the vm is still on the node, not migrated | 09:47 |
user01000 | Is there any timeout for starting the migration? | 09:47 |
noonedeadpunk | yes there is, and you also can not evacuate the host unless it's disabled or marked as "down" in nova | 09:48 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 09:51 |
user01000 | Yes, it is also showing down, so technically masakari should migrate the vms with HA_enabled metadata to any active node - or should I keep a spare node for this to work? | 09:52 |
noonedeadpunk | masakari should evacuate any VM from crashed node by default iirc | 09:53 |
noonedeadpunk | it's just that HA_enabled ones have priority | 09:53 |
noonedeadpunk | and spare node is an optional thing as well | 09:53 |
noonedeadpunk | (but kinda recommended to ensure you have where to spawn instances) | 09:54 |
noonedeadpunk | as in case your existing nodes don't have the capacity - they won't spawn there | 09:54 |
user01000 | Yes, actually I am testing the migration from vmware to openstack for production usage, and this is a very important feature; we will keep spare nodes, at least one | 09:55 |
noonedeadpunk | right. | 09:55 |
noonedeadpunk | I have not used masakari for a while to be completely honest, so not sure in what state it is overall | 09:56 |
user01000 | is there any other condition? these test nodes host ceph storage as well, so when a node fails ceph goes into a degraded state. In proxmox the migration will not start in this situation - is there anything like that in masakari? | 09:57 |
noonedeadpunk | it will start, why won't it? | 09:58 |
noonedeadpunk | degraded state of ceph is kind of "designed" for these situations, isn't it? | 09:58 |
user01000 | noonedeadpunk: I have experience with proxmox - ha won't work in a storage degraded state there | 09:58 |
user01000 | I have no idea why they created like that | 09:59 |
noonedeadpunk | so lately we just rely on `resume_guests_state_on_host_boot`, as modern computes usually take less time to boot than to wait for the timeout to begin evacuation | 09:59 |
noonedeadpunk | https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot | 09:59 |
noonedeadpunk | so in cases of kernel panic, some outages, etc - the node gets reset and VMs are up as soon as the node is up as well | 10:00 |
noonedeadpunk | and the timeout to detect that a node is down in masakari and begin evac is usually smth like 120 sec | 10:00 |
noonedeadpunk | in many cases you just get VMs up faster using that than with the evac process | 10:01 |
noonedeadpunk | except when hardware is cooked | 10:01 |
noonedeadpunk | which is an extremely rare situation, if you don't use 10yo stuff | 10:01 |
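The `resume_guests_state_on_host_boot` option mentioned above lives in nova's configuration on the compute nodes. A minimal fragment (the option name and section are from the linked nova docs; the file path is the conventional default):

```ini
# /etc/nova/nova.conf on the compute node
[DEFAULT]
# Restart guests that were running when the host went down, as soon
# as the hypervisor comes back up. Defaults to False.
resume_guests_state_on_host_boot = True
```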
user01000 | I have more questions, but I don't care about them at this time - like, what if the br-mgmt network is down? In that case the vms will run anyway, but pcs will surely mark the node down. But I don't care, I just need this thing to work :D | 10:02 |
user01000 | will use 660s, probably no issue on hardware | 10:03 |
noonedeadpunk | yes, so if interface is down - it will trigger the evac process | 10:04 |
noonedeadpunk | as basically 2 conditions are met: a) node marked as down in cluster b) node is marked down in compute service list | 10:04 |
noonedeadpunk | but. It's usually better to use a different interface from br-mgmt, ie br-stor | 10:05 |
noonedeadpunk | or br-vxlan | 10:05 |
noonedeadpunk | as then you have communication for rabbitmq and pacemaker over different interfaces | 10:05 |
user01000 | Well, I have br-stor on a different network bond, but the rest are on the same network bond | 10:06 |
noonedeadpunk | so you'd need both of them to be down then | 10:06 |
noonedeadpunk | yeah, right | 10:06 |
noonedeadpunk | but then again, you always need to meet both conditions | 10:06 |
user01000 | So basically my configs are correct it should work | 10:06 |
noonedeadpunk | down in corosync and down in compute service list | 10:06 |
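The two-condition rule described above (down in corosync AND down in the compute service list) is simple enough to state as code. This is only an illustration of the logic in miniature, not masakari's actual implementation, and `should_evacuate` is a name I made up:

```python
# Illustrative sketch of the two-condition rule discussed above;
# NOT masakari's real code, just the decision logic in miniature.

def should_evacuate(down_in_corosync: bool, down_in_nova: bool) -> bool:
    # Evacuation only makes sense when both views agree the node is gone:
    # corosync says it left the cluster AND nova marks the service down.
    return down_in_corosync and down_in_nova

# Cluster interface down but compute service still reporting: no evac yet.
print(should_evacuate(True, False))
# Both conditions met: evacuation can be triggered.
print(should_evacuate(True, True))
```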
noonedeadpunk | I think it should, yeah | 10:07 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 10:07 |
user01000 | noonedeadpunk: I am getting this message in pacemaker, any idea: warning: Cannot route message ping-crmadmin-1747304260-1: Unknown node compute01 | 10:18 |
user01000 | the name is the same as the hostname command output as well as the /etc/hosts file; it's also the same in crm_node -l | 10:21 |
user01000 | Hmm, seems there's a typo in the crm_node output; let me correct it. Probably, as you said, a hostname issue | 10:25 |
noonedeadpunk | there totally could be some issues in playbooks as well | 10:28 |
noonedeadpunk | as in fact we don't functionally test masakari, as it's not trivial to do without normal setup | 10:29 |
user01000 | Yes, the hosts file is created wrong as well: 10.61.0.54 compute01.openstack.local compute1 compute01, and the pcs cluster is using compute1 as the node name | 10:30 |
noonedeadpunk | I don't think it's wrong per se | 10:31 |
noonedeadpunk | it's kinda according to the RFC | 10:31 |
noonedeadpunk | it's the difference between nodename, fqdn and hostname | 10:32 |
opendevreview | Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page https://review.opendev.org/c/openstack/openstack-ansible/+/949734 | 10:32 |
user01000 | nodename, fqdn and hostname are all the same | 10:32 |
noonedeadpunk | they are not, according to the hosts file | 10:33 |
noonedeadpunk | python3 -c "import socket; print(socket.getfqdn()); print(socket.gethostname())" | 10:34 |
noonedeadpunk | you should get compute01.openstack.local\ncompute1 | 10:34 |
noonedeadpunk | `openstack.local` is coming from here just in case https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/all.yml#L88-L90 | 10:36 |
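The hosts-file line quoted earlier (`10.61.0.54 compute01.openstack.local compute1 compute01`) follows the hosts(5) convention: the first name after the address is the canonical (fully qualified) name, the rest are aliases. A simplified model of that mapping, to show why `compute1` and `compute01` both lead back to the same FQDN; real `socket.getfqdn()` also consults DNS, so this only approximates the `/etc/hosts` part, and `canonical_name` is an illustrative helper:

```python
# Simplified model of how a hosts(5) line maps names: the first name
# after the address is the canonical (fully qualified) name, the rest
# are aliases. Real socket.getfqdn() also consults DNS, so this is
# only an approximation of the /etc/hosts contribution.

def canonical_name(hosts_line: str, name: str):
    fields = hosts_line.split()
    canonical, aliases = fields[1], fields[2:]
    if name == canonical or name in aliases:
        return canonical
    return None

line = "10.61.0.54 compute01.openstack.local compute1 compute01"
# Both the short hostname and the alias resolve to the same FQDN:
print(canonical_name(line, "compute1"))   # compute01.openstack.local
print(canonical_name(line, "compute01"))  # compute01.openstack.local
```

This is why the chat recommends comparing the outputs of `socket.getfqdn()` and `socket.gethostname()` directly on the node.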
noonedeadpunk | this is smth you can override, but changing the hostname will kind of mess up your env | 10:37 |
user01000 | Yes i can see that | 10:37 |
noonedeadpunk | you can set `pacemaker_corosync_fqdn: true` to use fqdn in corosync cluster instead of hostname | 10:39 |
noonedeadpunk | it will influence this logic: https://github.com/noonedeadpunk/ansible-role-pacemaker_corosync/blob/master/templates/corosync.conf.j2#L45 | 10:40 |
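For context, the template linked above decides which name ends up in corosync's `nodelist`. A hypothetical rendered fragment, assuming `pacemaker_corosync_fqdn: true` so the ring address uses the FQDN; this is illustrative only, not taken from the role's actual output:

```
# Hypothetical corosync.conf nodelist fragment, assuming
# pacemaker_corosync_fqdn: true so node names use the FQDN.
nodelist {
    node {
        ring0_addr: compute01.openstack.local
        nodeid: 1
    }
}
```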
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Configure apparmor for haproxy https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949781 | 11:44 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!