Thursday, 2025-05-15

03:35 <opendevreview> OpenStack Proposal Bot proposed openstack/openstack-ansible master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/openstack-ansible/+/949692
03:57 <opendevreview> Merged openstack/openstack-ansible-os_neutron master: Add OVS Ubuntu 24.04 jobs  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/948702
03:57 <opendevreview> Merged openstack/openstack-ansible-os_neutron master: Update uwsgi re-disable reno to contain bug ID  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949749
06:41 <noonedeadpunk> mornings
06:42 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Configure apparmor for dnsmasq  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949780
06:43 <noonedeadpunk> seems we are all set?
06:43 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron stable/2024.2: Respect aa-disable exit code when disabling profiles  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949847
06:43 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts stable/2024.2: Patch the usr.bin.lxc-copy apparmor profile  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/949848
06:49 <noonedeadpunk> I have one personal request kinda - having https://review.opendev.org/c/openstack/openstack-ansible-ops/+/943866 in would be extremely helpful for some of my "home" projects
06:49 <noonedeadpunk> it has quite decent molecule coverage
06:51 <noonedeadpunk> and I tried to document it even
06:51 <noonedeadpunk> https://3753dea4ea75a3fe0692-37ad574b65e44b7a1b1058900a1af5e6.ssl.cf5.rackcdn.com/openstack/6e7edb865aff4c7dbbb9a9ea2c3ac33d/docs/encrypt_secrets.html
06:51 <opendevreview> Merged openstack/openstack-ansible-memcached_server master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-memcached_server/+/948950
06:51 <opendevreview> Merged openstack/openstack-ansible-openstack_hosts master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/948949
06:52 <noonedeadpunk> and I was thinking to add smth like sops or vault as another role in the collection
06:52 <noonedeadpunk> (one day)
06:53 <opendevreview> Merged openstack/openstack-ansible-os_mistral master: Remove tags from README and small fix with markings  https://review.opendev.org/c/openstack/openstack-ansible-os_mistral/+/948936
06:55 <opendevreview> Merged openstack/openstack-ansible-os_heat master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_heat/+/948940
06:57 <opendevreview> Merged openstack/openstack-ansible-os_gnocchi master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_gnocchi/+/948945
06:57 <opendevreview> Merged openstack/openstack-ansible-openstack_openrc master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-openstack_openrc/+/948948
06:57 <opendevreview> Merged openstack/openstack-ansible-os_barbican master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/948946
06:57 <opendevreview> Merged openstack/openstack-ansible-ceph_client master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/948941
06:57 <opendevreview> Merged openstack/openstack-ansible-os_designate master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_designate/+/948942
06:58 <opendevreview> Merged openstack/openstack-ansible-lxc_container_create master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-lxc_container_create/+/948951
06:59 <opendevreview> Merged openstack/openstack-ansible-os_rally master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/948937
06:59 <opendevreview> Merged openstack/openstack-ansible-os_glance master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/948939
06:59 <opendevreview> Merged openstack/openstack-ansible-os_ironic master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_ironic/+/948938
06:59 <opendevreview> Merged openstack/openstack-ansible-os_octavia master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/948935
07:00 <opendevreview> Merged openstack/openstack-ansible-haproxy_server master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/948947
07:00 <opendevreview> Merged openstack/openstack-ansible-os_manila master: Remove tags from README  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/948929
07:19 <opendevreview> Merged openstack/openstack-ansible master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/openstack-ansible/+/949692
07:20 <noonedeadpunk> aha, a couple of patches need re-checking
07:23 <opendevreview> Dmitriy Rabotyagov proposed openstack/ansible-role-pki master: Do not run CA generation code when already exists  https://review.opendev.org/c/openstack/ansible-role-pki/+/867549
07:44 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible unmaintained/zed: Remove retired qdrouterd, os_sahara, os_senlin and os_murano repos from zuul jobs  https://review.opendev.org/c/openstack/openstack-ansible/+/949203
08:01 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update troubleshooting page  https://review.opendev.org/c/openstack/openstack-ansible/+/949779
08:52 <user01000> Hi, I have installed the masakari service and created a segment using horizon, and added hosts to the segment as well. I have a doubt about this: I created the "type" as compute and set the "control attribute" as default, and I am not sure those are correct. Anyway, if I kill the VM process it is automatically restarted, but if a compute node goes down nothing happens. Are there any additional settings?
08:57 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
08:58 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
08:59 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update environment scaling page  https://review.opendev.org/c/openstack/openstack-ansible/+/949763
09:00 <noonedeadpunk> user01000: so these 2 actions are controlled by different monitors
09:00 <noonedeadpunk> when you kill the VM, it's the instance monitor which triggers the event
09:01 <noonedeadpunk> when you kill the node - it should be the host monitor
09:01 <noonedeadpunk> the host monitor relies on an external cluster mechanism to report node failure
09:01 <noonedeadpunk> by default - that is pacemaker/corosync
09:02 <noonedeadpunk> so I think it's worth checking if crm_mon -1 is supplying the list of your computes, and that the hostnames there are exactly the same as in your compute service list output.
09:02 <noonedeadpunk> then check if your masakari-engine receives the event and if it tries to do anything with it
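
A minimal sketch of that hostname consistency check, assuming the pacemaker CLI (crm_node) and the openstack client are available on a node with admin credentials sourced; the parsing of crm_node output is a simplification:

    #!/usr/bin/env python3
    # Compare pacemaker/corosync node names with nova-compute service hostnames.
    # Any mismatch (e.g. short name vs fqdn) will break masakari's host monitor.
    import subprocess

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # crm_node -l prints roughly one node per line: "<id> <name> [<state>]"
    pcmk_nodes = set()
    for line in run(["crm_node", "-l"]).splitlines():
        parts = line.split()
        if len(parts) >= 2:
            pcmk_nodes.add(parts[1])

    # hostnames as nova sees them
    nova_hosts = set(run(["openstack", "compute", "service", "list",
                          "--service", "nova-compute",
                          "-f", "value", "-c", "Host"]).split())

    print("only in pacemaker:", pcmk_nodes - nova_hosts)
    print("only in nova:     ", nova_hosts - pcmk_nodes)
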
09:03 <user01000> noonedeadpunk: I am not sure how to deploy the pacemaker cluster: is it part of the openstack-ansible project, or should I set it up manually?
09:03 <noonedeadpunk> also keep in mind that nodes added to segments must have hostnames exactly matching the pacemaker/corosync cluster and the nova compute services
09:03 <noonedeadpunk> I _think_ it should deploy with masakari in some way
09:04 <noonedeadpunk> but it's installed only on the `masakari_monitor` group https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/playbooks/masakari.yml#L22-L46
09:05 <noonedeadpunk> btw we really need to reconsider how we set this up....
09:05 <noonedeadpunk> anyway
09:05 <user01000> so the pcs cluster should be deployed on the compute nodes, right?
09:05 <noonedeadpunk> so running crm_mon -1 on a compute node should give you some output
09:06 <noonedeadpunk> I think the most widespread issue there is mismatched hostnames
09:06 <noonedeadpunk> i.e. hostname vs fqdn
09:06 <noonedeadpunk> as if it's not an exact match somewhere - it will not identify the correct node
09:09 <user01000> Okay, my compute nodes do have pacemakerd and corosync running, but the management tools are not installed
09:11 <user01000> unable to locate command: /usr/sbin/crm_mon
09:13 <noonedeadpunk> I have not run this in a while tbh... So you're pretty much missing the pacemaker-cli-utils package?
09:15 <user01000> hmm, I am also searching for the package name
09:23 <user01000> Is there any metadata for VM migration? I have added "HA_Enabled" to the VMs I want to migrate
09:24 <user01000> Node List:   * Online: [ compute1 compute2 compute3 compute4 ] - the pcs cluster seems to be working as well
09:25 <user01000> But I noticed "0 resource instances configured" - should this be configured, or is masakari taking care of it?
09:26 <user01000> the hostname is the same, I don't use fqdn
noonedeadpunk"0 resource instances configured" -> this is fine iirc09:27
noonedeadpunkmasakari hostmonitor should be listening for the traffic on the interface and detect on it's own that node was lost from the cluster09:28
noonedeadpunkand issue an even to API09:28
noonedeadpunkso you should be able to list and check for events there09:29
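
One way to check those events, assuming python-masakariclient is installed (it adds the `openstack notification ...` and `openstack segment ...` commands) and admin credentials are sourced:

    #!/usr/bin/env python3
    # List masakari notifications to confirm the host monitor reported the failure.
    import subprocess

    out = subprocess.run(
        ["openstack", "notification", "list", "-f", "value"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(out or "no notifications recorded yet")
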
09:33 <user01000> Hmm, I think there are some errors in the services, not sure what; let me check and restart everything again
09:35 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
09:47 <user01000> noonedeadpunk: I can see the host evacuation active in the horizon hypervisor list, but the VM is still on the node, not migrated
09:47 <user01000> Is there any timeout for starting the migration?
09:48 <noonedeadpunk> yes there is, and you also cannot evacuate the host unless it's disabled or marked as "down" in nova
09:51 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
09:52 <user01000> Yes, it is also showing down, so technically masakari should migrate the VM with HA_Enabled metadata to any active node; or should I keep a spare node for this to work?
09:53 <noonedeadpunk> masakari should evacuate any VM from a crashed node by default iirc
09:53 <noonedeadpunk> it's just that HA_Enabled has priority
09:53 <noonedeadpunk> and a spare node is an optional thing as well
09:54 <noonedeadpunk> (but kinda recommended, to ensure you have somewhere to spawn instances)
09:54 <noonedeadpunk> as if your existing nodes don't have the capacity - the VMs won't spawn there
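
For reference, tagging an instance the way user01000 described uses the standard `openstack server set --property` call; the server UUID below is a placeholder, and the exact metadata key masakari honours should be verified against your masakari-engine configuration:

    #!/usr/bin/env python3
    # Tag an instance so masakari treats it as HA-enabled during evacuation.
    import subprocess

    server = "11111111-2222-3333-4444-555555555555"  # placeholder UUID
    subprocess.run(
        ["openstack", "server", "set", "--property", "HA_Enabled=True", server],
        check=True,
    )
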
09:55 <user01000> Yes, actually I am testing the migration from vmware to openstack for production usage, and this is a very important feature; we will keep spare nodes, at least one
09:55 <noonedeadpunk> right.
09:56 <noonedeadpunk> I have not used masakari for a while to be completely honest, so I'm not sure what state it is in overall
09:57 <user01000> Is there any other condition? These testing nodes host ceph storage as well, so when a node fails ceph goes into a degraded state. In proxmox the migration will not start in this situation; is there anything like that in masakari?
09:58 <noonedeadpunk> it will start, why wouldn't it?
09:58 <noonedeadpunk> a degraded state of ceph is kind of "designed" for these situations, isn't it?
09:58 <user01000> noonedeadpunk: I have experience with proxmox; HA won't work in a storage degraded state there
09:59 <user01000> I have no idea why they made it like that
09:59 <noonedeadpunk> so lately we just rely on `resume_guests_state_on_host_boot`, as modern computes usually take less time to boot than it takes to wait for the timeout and begin evacuation
09:59 <noonedeadpunk> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.resume_guests_state_on_host_boot
10:00 <noonedeadpunk> so in cases of kernel panic, some outages, etc - the node gets reset and the VMs are up as soon as the node is up as well
10:00 <noonedeadpunk> and the timeout to detect that a node is down in masakari and begin evac is usually smth like 120 sec
10:01 <noonedeadpunk> in many cases you just get VMs up faster using that than the evac process
10:01 <noonedeadpunk> except when the hardware is cooked
10:01 <noonedeadpunk> which is an extremely rare situation if you don't use 10yo stuff
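
In an openstack-ansible deployment, a sketch of how that nova option could be set through `nova_nova_conf_overrides` in user_variables.yml (the usual OSA overrides mechanism; verify the variable name against your os_nova role version):

    # user_variables.yml
    nova_nova_conf_overrides:
      DEFAULT:
        # restart guests that were running when the host rebooted
        resume_guests_state_on_host_boot: true
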
10:02 <user01000> I have more questions, but I don't care about them at this time. Like, what if the br-mgmt network is down? In that case the VMs will run anyway, but pcs will surely mark the node down. But I don't care, I just need this thing to work :D
10:03 <user01000> we will probably use 660s, so no issue with the hardware
10:04 <noonedeadpunk> yes, so if the interface is down - it will trigger the evac process
10:04 <noonedeadpunk> as basically 2 conditions are met: a) the node is marked as down in the cluster, b) the node is marked down in the compute service list
10:05 <noonedeadpunk> but. It's usually better to use a different interface than br-mgmt, i.e. br-stor
10:05 <noonedeadpunk> or br-vxlan
10:05 <noonedeadpunk> as then you have communication for rabbitmq and pacemaker over different interfaces
10:06 <user01000> Well, I have br-stor on a different network bond, but the rest are on the same network bond
10:06 <noonedeadpunk> so you'd need both of them to be down then
10:06 <noonedeadpunk> yeah, right
10:06 <noonedeadpunk> but then again, you always need to meet both of the conditions
10:06 <user01000> So basically my configs are correct, it should work?
10:06 <noonedeadpunk> down in corosync and down in the compute service list
10:07 <noonedeadpunk> I think it should, yeah
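
Put together, a minimal sketch of verifying both failure conditions for a single node (same CLI assumptions as above; the node name is a placeholder and the crm_mon parsing is simplified):

    #!/usr/bin/env python3
    # Check the two preconditions for masakari to begin evacuation:
    #   a) the node is no longer an online member of the corosync cluster
    #   b) nova-compute on the node is reported as down
    import subprocess

    node = "compute01"  # placeholder

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # a) crm_mon -1 prints an "Online: [ ... ]" node list
    online_line = next((l for l in run(["crm_mon", "-1"]).splitlines() if "Online:" in l), "")
    down_in_cluster = node not in online_line

    # b) compute service state as nova sees it ("up"/"down" per host)
    states = {}
    for line in run(["openstack", "compute", "service", "list",
                     "--service", "nova-compute",
                     "-f", "value", "-c", "Host", "-c", "State"]).splitlines():
        parts = line.split()
        if len(parts) == 2:
            states[parts[0]] = parts[1]
    down_in_nova = states.get(node) == "down"

    print(f"{node}: corosync down={down_in_cluster}, nova down={down_in_nova}")
    print("evacuation preconditions met" if down_in_cluster and down_in_nova
          else "node still considered healthy somewhere")
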
10:07 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
10:18 <user01000> noonedeadpunk: I am getting this message in pacemaker, any idea: warning: Cannot route message ping-crmadmin-1747304260-1: Unknown node compute01
10:21 <user01000> the name is the same as the `hostname` command output, the same in the /etc/hosts file, and also the same in crm_node -l
10:25 <user01000> Hmm, seems there is a typo in the crm_node output; let me correct it. Probably, as you said, a hostname issue
10:28 <noonedeadpunk> there totally could be some issues in the playbooks as well
10:29 <noonedeadpunk> as in fact we don't functionally test masakari, since it's not trivial to do without a normal setup
10:30 <user01000> Yes, the hosts file is created wrong as well: 10.61.0.54 compute01.openstack.local compute1 compute01, and the pcs cluster is using compute1 as the node name
10:31 <noonedeadpunk> I don't think it's wrong per se
10:31 <noonedeadpunk> it's kinda according to the RFC
10:32 <noonedeadpunk> it's the difference between nodename, fqdn and hostname
10:32 <opendevreview> Dmitriy Chubinidze proposed openstack/openstack-ansible master: docs: update managing instances page  https://review.opendev.org/c/openstack/openstack-ansible/+/949734
10:32 <user01000> nodename, fqdn and hostname are all the same
10:33 <noonedeadpunk> they are not, according to the hosts file
10:34 <noonedeadpunk> python3 -c "import socket; print(socket.getfqdn()); print(socket.gethostname())"
10:34 <noonedeadpunk> you should get compute01.openstack.local\ncompute1
10:36 <noonedeadpunk> `openstack.local` is coming from here, just in case: https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/all.yml#L88-L90
10:37 <noonedeadpunk> this is smth you can override, but changing the hostname will kind of mess up your env
10:37 <user01000> Yes, I can see that
10:39 <noonedeadpunk> you can set `pacemaker_corosync_fqdn: true` to use the fqdn in the corosync cluster instead of the hostname
10:40 <noonedeadpunk> it will influence this logic: https://github.com/noonedeadpunk/ansible-role-pacemaker_corosync/blob/master/templates/corosync.conf.j2#L45
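
A sketch of that override in user_variables.yml (same caveat as above: check the variable against the pacemaker_corosync role version you deploy):

    # user_variables.yml
    # Register corosync nodes by fqdn (e.g. compute01.openstack.local)
    # so the cluster names match what nova and masakari expect.
    pacemaker_corosync_fqdn: true
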
11:44 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Configure apparmor for haproxy  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/949781
