Thursday, 2024-05-16

* NeilHanlon took a pass at some reviews while he was avoiding sleeping04:14
opendevreviewMerged openstack/openstack-ansible master: [doc] Rename extending-osa page  https://review.opendev.org/c/openstack/openstack-ansible/+/91507804:26
opendevreviewMerged openstack/openstack-ansible master: [doc] Document usage of user.rc file  https://review.opendev.org/c/openstack/openstack-ansible/+/91507604:30
opendevreviewMerged openstack/openstack-ansible-os_trove master: Manage trove images through openstack_resources role  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/91810306:03
noonedeadpunkmornings07:28
noonedeadpunkthanks Neil!07:28
jrosser_so the manila trouble is starting with an exception in cinder https://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/host/cinder-api.service.journal-17-03-38.log.txt#267007:50
noonedeadpunkwell07:57
noonedeadpunktempest config does not contain image ids07:57
noonedeadpunkhttps://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/etc/host/tempest/tempest.conf.txt#67-6807:57
noonedeadpunkI think it should be detected by this: https://opendev.org/openstack/openstack-ansible-os_tempest/src/branch/master/tasks/tempest_resources.yml#L202-L21707:58
noonedeadpunkbut also this looks weird: https://zuul.opendev.org/t/openstack/build/25cdbc66c19c4a2c9405607ac59b5af0/log/logs/etc/host/tempest/tempest.conf.txt#2907:58
noonedeadpunkso it's quite weird08:01
noonedeadpunkon positive note - https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/868462 passed ovn test :)08:04
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Implement support for octavia-ovn-provider driver  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/86846208:16
noonedeadpunkandrewbonney: it would be nice to have a release note covering the tags change for https://review.opendev.org/c/openstack/openstack-ansible/+/91861508:26
andrewbonneyI think I've added one08:28
noonedeadpunksnap08:28
noonedeadpunksorry :D08:28
andrewbonneyNo problem :)08:28
opendevreviewMerged openstack/openstack-ansible-os_adjutant master: reno: Update master for unmaintained/zed  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/91914308:53
jrosser_oh well i think in the past we had some thing to use a local image file in CI for tempest09:02
farbodHi, after upgrading from xena to yoga everything was ok except neutron-server and neutron-rpc-server.09:22
farbodThe problem is that it's using a lot of resources. Actually it uses all the memory and other services go down.09:22
farbodhere are the logs of neutron-server when I start it: https://paste.opendev.org/show/bMt9mf825ndj81dLa1Z7/09:22
farbodI tried to decrease the threadpool executor and worker counts but nothing changed.09:22
noonedeadpunkyeah, adjutant is severely borked as of today....09:25
noonedeadpunkfarbod: well, one thing you can try, is to disable uwsgi for neutron09:25
noonedeadpunkthough I've spotted recently, that just disabling uwsgi won't bring neutron-rpc-server down on its own09:26
noonedeadpunkso you can define `neutron_use_uwsgi: False` in user_variables and run openstack-ansible os-neutron-install.yml --limit neutron_server09:26
noonedeadpunkonce that is done, you can use ad-hoc to stop/disable/mask rpc service09:27
noonedeadpunkie - `cd /opt/openstack-ansible; ansible -m systemd -a "name=neutron-rpc-server state=stopped enabled=false masked=true" neutron_server`09:27
farbodThanks. Let me test it09:29
jrosser_i did notice that neutron were adding some wsgi zuul jobs so hopefully this situation is going to improve09:34
noonedeadpunkwell, afaik they still don't support ovn09:34
noonedeadpunkat least last time I asked around caracal branching it was not09:35
jrosser_https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/91972509:35
farbodWhat do I lose by disabling uwsgi for neutron-server?09:47
noonedeadpunkkinda nothing09:49
noonedeadpunkjust eventlet vs wsgi09:49
noonedeadpunkit used to be eventlet for neutron for quite a while, but trend is just to go with wsgi everywhere09:50
farbodDoes this problem exist in later versions?09:51
noonedeadpunkum, don't really know, I'm not sure we've seen issues with resources09:54
noonedeadpunkbut we have quite heavy hardware for control plane 09:55
farbodOh you mean with enough resources I can use uwsgi?09:57
jrosser_i think it means more that we have not seen issues with large amounts of resources needed by neutron server09:58
noonedeadpunkyeah, exactly09:59
farbodhow much e.g?09:59
jrosser_depends what you mean but on one infra node i see neutron-server wanting 50% of one CPU core and ~500M ram10:03
farbodSo now with 2 workers and 4 thread pool executors on an 8-core, 64 GB RAM infra node, my neutron-server is using about 20 GB of memory and it's growing...10:05
noonedeadpunkUm.... Something is very off I'd say...10:06
noonedeadpunkThough, I think we're running 2023.1 as of today10:07
noonedeadpunkWe never stayed on Yoga long enough10:07
noonedeadpunkAs we just did Xena->Yoga->2023.1 right away10:07
semanticSo, trying to revert https://github.com/openstack/oslo.messaging/commit/fd2381c723fe805b17aca1f80bfff4738fbe9628 makes things even worse in my case, with rabbit-server constantly logging something like that https://paste.opendev.org/show/bMl8Te9hxAvyXrNDEpOy/10:09
halaliHi, it seems that with tag 27.4.2 and an ubuntu-22.04 redeploy, nova-api failed to reconnect to the newly deployed RabbitMQ node https://paste.openstack.org/show/bZqhykeJV8ygwJSQXWiX/ and required a nova daemon restart 10:18
noonedeadpunkfeels quite alike to what semantic is experiencing10:27
jrosser_i think we might try some more to replicate this10:30
jrosser_though it is clearly a problem between rabbitmq <> oslo.messaging <> nova10:30
jrosser_it is not really at all an openstack-ansible problem, as far as i can see10:31
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Define release names in documentation  https://review.opendev.org/c/openstack/openstack-ansible/+/91981410:43
noonedeadpunkunless we miss that reporting on failover flag to oslo.messaging10:45
noonedeadpunk`enable_cancel_on_failover` 10:46
noonedeadpunkas that somehow sounds very related at least to me10:46
noonedeadpunkhttps://www.rabbitmq.com/docs/ha#cancellation10:47
opendevreviewMerged openstack/openstack-ansible master: Deploy horizon by default with metal AIO scenarios  https://review.opendev.org/c/openstack/openstack-ansible/+/91600510:48
noonedeadpunkso like - instead of waiting for reply, client will be notified that it has no chance to get it?10:48
noonedeadpunkI would really like to try that out, but there's no very easy way to apply it everywhere10:49
jrosser_well - if you have ideas to try i think this afternoon we might look at it some more10:51
jrosser_andrewbonney has some time to spend on this i think10:51
noonedeadpunkI would try to add `enable_cancel_on_failover = True` to [oslo_messaging_rabbit] section of configs10:55
noonedeadpunkI'm not sure if that has that much effect with quorum queues enabled, but it really might help with HA queues10:55
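To illustrate the suggestion above: the option would go in the `[oslo_messaging_rabbit]` section of a service's config file. The sketch below renders that section with Python's standard `configparser`. The helper name `render_rabbit_section` is hypothetical, and this is not how openstack-ansible templates its configs; it only shows the resulting INI fragment.

```python
import configparser
import io


def render_rabbit_section(options):
    """Render an [oslo_messaging_rabbit] section as INI text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # configparser stores values as strings, so convert booleans explicitly.
    parser["oslo_messaging_rabbit"] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# The option name and section are taken from the discussion above.
snippet = render_rabbit_section({"enable_cancel_on_failover": True})
print(snippet)
```

Later in this log semantic reports that the broker rejected the resulting consumer argument on quorum queues, so this fragment would likely only be worth trying with classic HA queues.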
andrewbonneyMy suspicion is that now reply queues are actually HA this isn't really an RMQ issue but something internal getting confused, but happy to try things10:56
noonedeadpunkwell, what this doc describes, is that this is particularly useful for HA queues11:04
noonedeadpunkas they seem to be not master/master anyway11:04
noonedeadpunkso clients might need to be acked about queues being moved to another host, as it's not transparent process...11:04
noonedeadpunkbut I can be wrong11:04
noonedeadpunkbut yeah, that's probably more about duplicates....11:05
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Enable RabbitMQ Quorum Queues by default  https://review.opendev.org/c/openstack/openstack-ansible/+/91981611:19
noonedeadpunkhuh, I just realized, that distro install method has no chance of passing ^11:21
noonedeadpunkdue to severely old versions of rabbitmq11:22
noonedeadpunkin centos....11:22
noonedeadpunkor maybe it's not _that_ bad...11:34
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Update rabbitmq/erlang to latest versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/91982111:34
noonedeadpunkooops, cinder looks borked: https://zuul.opendev.org/t/openstack/build/bd5791375b98451f8c8ef136dd0ec9b9/log/logs/host/cinder-api.service.journal-10-02-53.log.txt#2795-282411:51
noonedeadpunkhuh, why it's trying amqp driver....11:52
noonedeadpunknah, amqp is part of rabbit implementation anyway11:59
farbodFor the neutron high resource usage which I mentioned above, I tried checking logs in debug mode but everything seems OK. Do any of you have an idea about this, or a way to troubleshoot neutron-server? neutron-server was OK in xena, but after the upgrade it started to consume all the resources12:08
semanticenable_cancel_on_failover is not supported with quorum queues it seems: 2024-05-16 12:20:09.800 49061 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 10.33.177.116:5671 after inf tries: Basic.consume: (406) PRECONDITION_FAILED - invalid arg 'x-cancel-on-ha-failover' for queue 'reply_compute2:nova-compute:1' in vhost 'nova' of queue type rabbit_quorum_queue: amqp.exceptions.PreconditionFailed: Basic.consume: (406) PRECONDITION_FAILED - invalid arg 'x-cancel-on-ha-failover' for queue 'reply_compute2:nova-compute:1' in vhost 'nova' of queue type rabbit_quorum_queue12:24
noonedeadpunkfarbod: and you have reverted to non-UWSGI mode?12:43
noonedeadpunkhuh12:44
noonedeadpunkinteresting12:44
noonedeadpunksemantic: vmware docs say it should be...12:44
noonedeadpunkhttps://docs.vmware.com/en/VMware-RabbitMQ-for-Kubernetes/1/rmq/migrate-mcq-to-qq.html#:~:text=x%2Dcancel%2Don%2Dha,sent%20again%20(duplicate%20messages).12:44
noonedeadpunk"Most of the cases covered by x-cancel-on-ha-failover do not exist with quorum queues but those that are not covered are still there"12:44
farbodyes i disabled uwsgi12:45
farbodhere is the end of log:12:47
farbodhttps://paste.opendev.org/show/bcKmwaRfg2BrwMMJltm7/12:47
farbodit doesn't continue and doesn't respond to requests12:47
mnaserjrosser_: are you still seeing a bunch of NODE_FAILUREs?12:56
jrosser_mnaser: no, but last time i looked at grafana something still looked wrong12:56
mnaserI wonder where those are being launched12:57
jrosser_https://grafana.opendev.org/d/b283670153/nodepool3a-vexxhost?orgId=1&from=now-7d&to=now12:57
jrosser_"something" happened and there's a ton of things deleting forever12:57
jrosser_i *think* that when deleting was at 32 i got NODE_FAILURE (i guess that is the number of available 32G instances?) and it's hovering now at 2812:58
semanticWell https://github.com/rabbitmq/rabbitmq-server/blob/58b36b808878d5e29c49cd40eae3286b06291ca1/deps/rabbit/src/rabbit_quorum_queue.erl makes me think that quorum queues really do not support x-cancel-on-ha-failover as it is missing from capabilities/consumer_arguments as opposed to classic queue currently. Though i may be reading it wrong of course...13:05
noonedeadpunkyeah, you can be right about that13:06
noonedeadpunkaccording to https://review.opendev.org/q/topic:%22osa/rmq-migrate%22 - only Cinder is broken with quorum queues13:12
noonedeadpunkand weirdly broken....13:12
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Implement support for octavia-ovn-provider driver  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/86846213:14
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Use NFS Ganesha 5  https://review.opendev.org/c/openstack/openstack-ansible/+/91971413:18
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Update cirros image for manila tempest  https://review.opendev.org/c/openstack/openstack-ansible/+/91970213:21
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add quorum queues support for service  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/89891413:22
noonedeadpunkmhm, so probably the issue with cinder is the non-unique name of the service in shm...13:29
noonedeadpunkwhich would make sense13:29
noonedeadpunkugh13:31
noonedeadpunkas `/dev/shm/aio1_uwsgi_qmanager` would be same for all services using uwsgi13:34
noonedeadpunkand that merged by https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqpdriver.py#L64-L6513:35
noonedeadpunkwhich kinda configurable: https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/impl_rabbit.py#L248-L25513:35
noonedeadpunkand will be just fine for LXC....13:35
noonedeadpunkit probably won't for metal13:35
noonedeadpunkmeaning... we should supply processname in config13:36
noonedeadpunkmeaning... another series of patches...13:36
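The collision described above can be sketched in a few lines. This is a simplified illustration, not the actual oslo.messaging code: the path shape is copied from the `/dev/shm/aio1_uwsgi_qmanager` example quoted in the log, and the service names `cinder-api` and `manila-api` are assumed purely for the example.

```python
def qmanager_path(hostname, processname):
    """Build a path shaped like the one quoted in the log (simplified sketch)."""
    return "/dev/shm/%s_%s_qmanager" % (hostname, processname)


# Assumed example: two uwsgi-run services sharing one metal host named aio1.
services = ["cinder-api", "manila-api"]

# With the process name left at "uwsgi", every service maps to the same file.
default_paths = {svc: qmanager_path("aio1", "uwsgi") for svc in services}

# Supplying a per-service process name gives each service its own file.
named_paths = {svc: qmanager_path("aio1", svc) for svc in services}

collides = len(set(default_paths.values())) < len(services)
unique = len(set(named_paths.values())) == len(services)
print(default_paths["cinder-api"])
print(collides, unique)
```

In LXC deployments each service container has its own hostname, so the paths already differ there; on metal the hostname is shared, which is why a per-service process name would be needed.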
noonedeadpunkbut apparently I found what's wrong with manila images...13:37
noonedeadpunkcrap. but then I guess counter will be reset each time :(13:43
NeilHanlonnoonedeadpunk: regarding mq on rocky/centos... if there's a want to have a newer version, I can investigate14:20
NeilHanlonthere used to be a messaging SIG 14:20
noonedeadpunkI realized that I mixed up mq and mariadb14:22
NeilHanloni do that all the time with rabbit and redis14:23
* NeilHanlon also needs to look at the lxc-templates stuff 14:24
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Enable RabbitMQ Quorum Queues by default  https://review.opendev.org/c/openstack/openstack-ansible/+/91981615:33
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_cinder master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/91967415:33
noonedeadpunkcan't think of anything better than just disabling queue manager by default :(15:33
noonedeadpunkit has weird default in roles... 15:34
noonedeadpunkbut really no good solution here15:34
opendevreviewMerged openstack/openstack-ansible-os_heat master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_heat/+/91967520:49
opendevreviewMerged openstack/openstack-ansible-os_glance master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/91967320:50
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add service policies defenition  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/91812920:50
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Add variable to globally control notifications enablement  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/91813020:50
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_manila master: Implement variables to address oslo.messaging improvements  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/91813120:50
opendevreviewMerged openstack/openstack-ansible-os_skyline master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_skyline/+/91969420:51
opendevreviewMerged openstack/openstack-ansible-os_horizon master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/91967620:52
opendevreviewMerged openstack/openstack-ansible-os_barbican master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/91967120:52
opendevreviewMerged openstack/openstack-ansible-os_ironic master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_ironic/+/91968420:52
opendevreviewMerged openstack/openstack-ansible-os_octavia master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/91968720:53
opendevreviewMerged openstack/openstack-ansible-os_keystone master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/91967020:53
opendevreviewMerged openstack/openstack-ansible-os_swift master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_swift/+/91967820:53
opendevreviewMerged openstack/openstack-ansible-os_blazar master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_blazar/+/91968920:53
jrosser_noonedeadpunk: ^ i have no idea why these are only running the docs job in the gate queue :/20:54
opendevreviewMerged openstack/openstack-ansible-os_aodh master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/91968220:54
opendevreviewMerged openstack/openstack-ansible-os_nova master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/91861420:54
opendevreviewMerged openstack/openstack-ansible-os_designate master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_designate/+/91967720:54
opendevreviewMerged openstack/openstack-ansible-os_placement master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_placement/+/91967220:56
opendevreviewMerged openstack/openstack-ansible-os_trove master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/91968620:56
opendevreviewMerged openstack/openstack-ansible-os_ceilometer master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/91968120:56
opendevreviewMerged openstack/openstack-ansible-os_ceilometer master: Add service policies defenition  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/91810520:56
opendevreviewMerged openstack/openstack-ansible-os_ceilometer master: Implement variables to address oslo.messaging improvements  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/91810720:56
jrosser_omg it is also in the check jobs https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/918130?tab=change-view-tab-header-zuul-results-summary20:58
opendevreviewMerged openstack/openstack-ansible-os_tacker master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_tacker/+/91968820:58
opendevreviewMerged openstack/openstack-ansible-os_mistral master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_mistral/+/91969220:58
opendevreviewMerged openstack/openstack-ansible-os_magnum master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/91968521:04
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Add tag to enable targeting of post-install config elements only  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/91969521:06
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Remove murano from zuul required projects, it is now retired  https://review.opendev.org/c/openstack/openstack-ansible/+/91990221:23
jrosser_ok, so don't approve anything at all. until we merge that ^^^21:30
jrosser_otherwise testing is basically bypassed21:30
