opendevreview | wu.chunyang proposed openstack/kolla-ansible master: Modernize the swift role https://review.opendev.org/c/openstack/kolla-ansible/+/797498 | 07:05 |
opendevreview | Matt Crees proposed openstack/kolla master: Pass rabbitmq apt preferences into kolla-toolbox https://review.opendev.org/c/openstack/kolla/+/920037 | 08:21 |
sylvr | Hello! I'm going to upgrade from 2023.1 to 2023.2. What's the best way to update kayobe-config (a project fork with custom config)? Should a git merge with the updated stable/2023.2 branch do the trick? Thanks a lot! | 08:27 |
kevko | Guys, we upgraded to zed in production and found an interesting issue ... when rabbitmq is restarted, messages are lost, there are heartbeat issues, etc. ... yoga didn't do this, tested ... has anyone seen this? | 09:02 |
kevko | what helped, of course, was restarting the service containers where we saw the issue ... but still, yoga wasn't affected ... | 09:03 |
yusufgungor | hi @kevko, I'm not sure about your problem, but I can share our experience with the zed upgrade. | 09:19 |
yusufgungor | what is your value for om_enable_rabbitmq_high_availability? We had problems when it was true. With that parameter true, an ha-all policy is enabled for queues matching the pattern ^(?!(amq\.)|(.*_fanout_)|(reply_)).*, i.e. everything except amq.*, fanout and reply queues. | 09:19 |
yusufgungor | this config also forces durable queues. We had problems because the queues excluded by that pattern were not mirrored. We had to set the parameter to false and manually create an ha-all policy on the rabbitmq cluster covering all queues, without durable queues, to continue. | 09:19 |
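[editor's note: the exclusion pattern quoted above can be sanity-checked with a short snippet; the queue names below are made-up examples for illustration]

```python
import re

# Mirroring pattern used when om_enable_rabbitmq_high_availability is true:
# every queue is mirrored EXCEPT amq.*, *_fanout_* and reply_* queues.
pattern = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")

queues = [
    "notifications.info",  # regular queue: matches, so it gets mirrored
    "amq.gen-abc123",      # auto-named queue: excluded by the lookahead
    "neutron_fanout_1",    # fanout queue: excluded
    "reply_42",            # RPC reply queue: excluded
]
mirrored = [q for q in queues if pattern.match(q)]
print(mirrored)  # → ['notifications.info']
```

So the fanout and reply queues that carry RPC traffic are exactly the ones left unmirrored, which matches the failure mode described in this thread.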
kevko | mnasiadka: Can we now merge those patches around init-runonce so I can continue with tempest? ... it's a really handy feature ... and I'm also using it | 09:19 |
kevko | yusufgungor: we have high availability turned on, and excluding amq, fanout and reply queues is OK I would say ... We added a global config option turning off durable queues | 09:21 |
yusufgungor | our problem was like this: when restarting rabbitmq cluster nodes during HA tests, we had strange problems. We had to enable HA for the amq, fanout and reply queues too https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png | 09:25 |
yusufgungor | https://usercontent.irccloud-cdn.com/file/uDUAkdSO/image.png | 09:25 |
kevko | We saw exactly the same logs as you sent ^^^ ! | 09:28 |
yusufgungor | oh, so it's probably the same situation. We thought it was fine to add the amq, fanout and reply queues to the ha-all policy, and that resolved the problems for us. | 09:31 |
SvenKieske | kevko: do you run the rabbitmq default configuration or what kind of queue HA mechanism are you using? | 09:39 |
kevko | yusufgungor: the problem is that it doesn't explain why it fails on zed but not on yoga ... because this wasn't changed between yoga -> zed | 09:39 |
SvenKieske | yusufgungor: yes, fanout queues without HA are a known problem, at least for me, but I don't think deleting the in-place HA policy and creating your own is the proper way to fix this | 09:40 |
kevko | SvenKieske: We have 160 hypervisors, and so far we haven't had the courage to migrate to quorum-queues :D | 09:40 |
kevko | SvenKieske: so we have an override in global.conf to not use durable queues, and we still have ha-all | 09:41 |
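[editor's note: a minimal sketch of the kind of override kevko describes; durability is controlled by the standard oslo.messaging option `amqp_durable_queues`, and the file location assumed here is kolla-ansible's usual global service-config override path]

```ini
# /etc/kolla/config/global.conf (assumed path) — merged into every service's
# configuration by kolla-ansible's config override mechanism.
[oslo_messaging_rabbit]
amqp_durable_queues = false
```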
SvenKieske | I'm not 100% certain currently, but I don't think either of you is running a valid config? You need durable queues if you really want HA? https://www.rabbitmq.com/docs/ha#non-mirrored-queue-behavior-on-node-failure | 09:45 |
SvenKieske | because in the case of restarts you will lose queue content if the queue is not durable; its content is not synced to disk then. | 09:45 |
SvenKieske | so I would expect problems when using non-durable queues. There was a handy matrix in the old rabbit docs, but apparently they restyled and rewrote their whole docs, so I can't find it anymore. | 09:46 |
kevko | SvenKieske: durable just means they are stored on disk ... so a restart of a rabbitmq node will drop a non-durable queue ... but openstack will recreate it and start working again, if I'm right | 09:48 |
kevko | SvenKieske: yes, you can have a problem for a few seconds ... a few resources not created, etc., because of the rabbitmq restart ... but not a permanently broken service like we're seeing ... | 09:49 |
SvenKieske | yeah, but if that message doesn't get processed it might mean trouble; you can't really treat rabbitmq as a stateless system in openstack, that's just wrong | 09:49 |
SvenKieske | how was the service "broken"? did rabbit not start or something? | 09:49 |
kevko | SvenKieske: we restarted one rabbitmq node ... | 09:49 |
kevko | SvenKieske: and see similar log as yusufgungor sent https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png | 09:50 |
SvenKieske | looks like a neutron bug? using ephemeral queues across restarts? I think there were some patches floating around in oslo to get rid of the "ephemeral" concept in openstack because it doesn't fit, as can be seen in these errors | 09:52 |
SvenKieske | but does this error result in anything? cluster not starting? | 09:59 |
SvenKieske | did anybody write a bug report which I can look at? | 10:03 |
yusufgungor | @SvenKieske in our case we still had the same logs and problems even when trying durable queues. Everything seemed normal, but we got that log from some services like designate-central, nova-conductor and neutron-server. We could not create instances. I wrote to the oslo channel, but I have to check whether I got any reply | 10:07 |
yusufgungor | @kevko it is probably a bug in oslo.messaging in the zed version | 10:07 |
yusufgungor | @svenkieske i have found it https://usercontent.irccloud-cdn.com/file/QoeJvy04/image.png | 10:09 |
SvenKieske | a bug report would be something on bugs.launchpad.net, I think, not a screenshot of an IRC conversation. Nobody can track it this way. | 10:10 |
SvenKieske | thanks for reporting via IRC, but it seems nobody followed up on it, and you are still affected 9 months after reporting. One more reason to file an actual bug report :) | 10:11 |
kevko | yeah, we have exactly the same | 10:12 |
kevko | https://paste.openstack.org/show/b5ygtfdZErzuByVdYBzY/ | 10:12 |
kevko | SvenKieske: we are affected for few days :) | 10:13 |
kevko | SvenKieske: upgraded from yoga 2 weeks ago :) | 10:13 |
yusufgungor | @SvenKieske you are right about the bug report, but I have too many bug reports that never got any response. At the time we were desperate and went with the easy solution | 10:14 |
yusufgungor | Our environment doesn't have that problem for now. @kevko, would you like to file a bug report with oslo? | 10:17 |
kevko | yusufgungor: I will, right after I fix it :D | 10:18 |
kevko | yusufgungor: in openstack, if you create a bug report it's expected that you'll also send a patch :D | 10:18 |
yusufgungor | @kevko Thanks :D | 10:19 |
opendevreview | Pierre Riteau proposed openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 10:26 |
SvenKieske | that's really not true :P | 10:32 |
kevko | SvenKieske: almost true :D | 10:33 |
kevko | SvenKieske: this was good advice from a colleague at my previous job: "It's open source, you need to know where the problem is, send a bug report, and also send a diff with the fix." :D | 10:34 |
kevko | (it was something different than openstack) | 10:34 |
SvenKieske | I think it still discourages people from filing bugs, don't know if I need to explain why this is bad for the project. | 10:34 |
opendevreview | Pierre Riteau proposed openstack/kayobe master: Fix list formatting in release note https://review.opendev.org/c/openstack/kayobe/+/920089 | 10:36 |
kevko | SvenKieske: I know ... but that's reality | 10:41 |
SvenKieske | no, it's an exaggeration imho. Yes, it's always faster to provide your own patches, but thankfully I didn't need to patch everything myself; many other people did in fact fix bugs which affected me, and I'm grateful for that :) | 10:46 |
opendevreview | Verification of a change to openstack/kayobe master failed: Fix issue removing docker volumes https://review.opendev.org/c/openstack/kayobe/+/909594 | 10:58 |
opendevreview | Pierre Riteau proposed openstack/kayobe master: Fix issue removing docker volumes https://review.opendev.org/c/openstack/kayobe/+/909594 | 10:59 |
kevko | SvenKieske: did you migrate queues to quorum queues on the fly ? | 11:08 |
kevko | SvenKieske: yusufgungor: okay, a colleague probably just found the issue | 11:10 |
yusufgungor | @SvenKieske we created an ha-all policy with a wildcard pattern and applied it on the fly to all queues | 11:24 |
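[editor's note: a manual ha-all policy like the one described can be created with rabbitmqctl; this is a sketch only — the default vhost and automatic sync mode are assumptions, classic mirrored queues are implied, and the command needs a running broker]

```shell
# Mirror every queue (pattern ".*") on the default vhost "/".
rabbitmqctl set_policy --vhost / --apply-to queues ha-all ".*" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```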
sylvr | Hello, I'm having issues deploying the seed/bifrost machine (kayobe seed service deploy fails at "bootstrapping bifrost container"); ironic --watch-log-file tells me: AttributeError: module 'select' has no attribute 'poll' | 11:32 |
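[editor's note: this AttributeError is usually not a platform problem — on an unpatched CPython on Linux, select.poll exists, as the check below shows. The error typically appears only after eventlet's monkey patching has replaced the select module inside the service process; that this is the cause of sylvr's particular traceback is an assumption, but it is a known eventlet interaction]

```python
import select

# On a stock (non-monkey-patched) interpreter on Linux, poll() is available.
print(hasattr(select, "poll"))  # → True on Linux
```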
opendevreview | Kevin Tindall proposed openstack/kolla-ansible master: Add TLS proxy for novncproxy https://review.opendev.org/c/openstack/kolla-ansible/+/911141 | 12:29 |
opendevreview | Matúš Jenča proposed openstack/kolla-ansible master: Add backend TLS between MariaDB and ProxySQL https://review.opendev.org/c/openstack/kolla-ansible/+/909912 | 12:55 |
opendevreview | Matt Crees proposed openstack/kayobe stable/2023.1: Add script to migrate to RabbitMQ quorum queues https://review.opendev.org/c/openstack/kayobe/+/919925 | 14:23 |
sylvr | Here's the traceback from ironic failing to start (`kayobe seed service deploy`): https://pastebin.com/gywtWb3z | 14:30 |
sylvr | I can also send the full logs of `kayobe seed service deploy` but it's a big file | 14:32 |
opendevreview | Verification of a change to openstack/kayobe stable/2023.1 failed: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 14:48 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario https://review.opendev.org/c/openstack/kolla-ansible/+/836941 | 15:08 |
spatel | Any idea what is wrong here, my OS is ubuntu 22.04 - https://paste.opendev.org/show/bIm3G8dLbi3UgQjsvcDX/ | 17:26 |
spatel | docker is running and I can pull images | 17:29 |
SvenKieske | kevko: you don't happen to have a bugreport now for the rabbitmq stuff, do you? :) | 17:30 |
SvenKieske | spatel: at the end it says: "DockerException: Error while fetching server API version: Not supported URL scheme http+docker" | 17:31 |
spatel | These are new compute nodes I am adding today | 17:32 |
spatel | and I noticed this error | 17:32 |
SvenKieske | this seems to be an upstream bug: https://github.com/docker/docker-py/issues/3256 | 17:32 |
spatel | what could be wrong? | 17:32 |
spatel | Yesterday I added 10 compute nodes and didn't see this error | 17:32 |
spatel | today I encountered the error :) | 17:33 |
SvenKieske | yeah this seems to be an upstream error introduced yesterday somewhere | 17:33 |
SvenKieske | fix is here: https://github.com/docker/docker-py/pull/3257 | 17:33 |
SvenKieske | mnasiadka: we might need to do something wrt https://github.com/docker/docker-py/issues/3256, not sure what though. Wondering why I didn't see CI fallout here; do we maybe have the requests module pinned? | 17:35 |
spatel | Can I downgrade the module to fix it? | 17:35 |
spatel | I hate to hand-edit :( | 17:35 |
SvenKieske | if you downgrade requests it should work, afaik we have it pinned to 2.31.0 https://opendev.org/openstack/requirements/src/branch/master/upper-constraints.txt#L234 | 17:37 |
SvenKieske | seems that saved us :D | 17:37 |
SvenKieske | spatel: you could directly use the openstack upper constraints files maybe, that's what we test anyway | 17:38 |
spatel | How do I pin package in my kolla deployment? | 17:38 |
spatel | how do I use this upper-constraints file with kolla-ansible? | 17:39 |
SvenKieske | spatel: well, we do this via zuul CI. In general I would advise building a CI pipeline that builds your kolla images; this is where the projects are listed: https://opendev.org/openstack/kolla-ansible/src/branch/master/zuul.d/base.yaml#L55 (I highlighted the requirements repo) | 17:40 |
SvenKieske | spatel: the process is described in detail here, because it is rather non-trivial: https://docs.openstack.org/project-team-guide/dependency-management.html#solution | 17:44 |
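[editor's note: outside of CI, pinning to the tested versions can be as simple as passing the upper-constraints file to pip via `-c`; a sketch — the branch in the URL and the package selection are assumptions, and the command needs network access]

```shell
# Install docker-py and requests pinned to the same versions OpenStack CI
# tests (e.g. requests==2.31.0 from upper-constraints.txt).
pip install -c https://releases.openstack.org/constraints/upper/master \
    docker requests
```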
spatel | SvenKieske I am not building my images; I'm just downloading them and storing them in a local repo | 18:19 |
SvenKieske | okay, most of the time it's advisable to build your own images, e.g. so you can roll back to an older version when such bugs occur. Depends a bit on your circumstances, of course. | 18:21 |
spatel | agreed | 18:30 |
spatel | SvenKieske this command fixed my issue - pip3 install requests===2.31.0 | 19:41 |
spatel | where do I override this in the kolla-ansible tree so I don't have to do this manually each time? | 19:41 |
spatel | is this coming from kolla-ansible, or is it just an Ubuntu OS issue? | 19:41 |
opendevreview | Merged openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 23:58 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!