Tuesday, 2024-05-21

<opendevreview> wu.chunyang proposed openstack/kolla-ansible master: Modernize the swift role  https://review.opendev.org/c/openstack/kolla-ansible/+/797498 [07:05]
<opendevreview> Matt Crees proposed openstack/kolla master: Pass rabbitmq apt preferences into kolla-toolbox  https://review.opendev.org/c/openstack/kolla/+/920037 [08:21]
<sylvr> Hello! I'm going to update from 2023.1 to 2023.2. What's the best way to update kayobe-config (a project forked with custom config)? Would a git merge with the updated 2023.2 (stable) branch do the trick? Thanks a lot! [08:27]
<kevko> Guys, we upgraded to zed on production and found an interesting issue ... when rabbitmq is restarted, messages are lost, there are heartbeat issues, etc. ... yoga didn't do this (tested) ... has anyone seen this? [09:02]
<kevko> what helped was of course restarting the service containers where we've seen the issue ... but still, yoga wasn't affected ... [09:03]
<yusufgungor> hi @kevko, I am not sure about your problem but I can tell you about our experience with the zed upgrade. [09:19]
<yusufgungor> what is the value of om_enable_rabbitmq_high_availability? We had problems when it was true. If this parameter is true, the ha-all policy is applied with the pattern ^(?!(amq\\.)|(.*_fanout_)|(reply_)).*, which excludes the amq, fanout and reply queues. [09:19]
<yusufgungor> this config also forces the use of durable queues. We had problems because the queues excluded by that pattern were not mirrored. We had to set this parameter to false and manually create an ha-all policy on the rabbitmq cluster, so that we could continue without durable queues and with the ha-all policy applied to all queues. [09:19]
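
The effect of the pattern quoted above can be checked with a small sketch. It assumes the doubled backslash in the log is an escaping artifact and that Python's `re` treats this lookahead the same way RabbitMQ's policy matching does. The queue names are hypothetical, chosen only to exercise each branch.

```python
import re

# Pattern from the log, with the doubled backslash reduced to a single one
# (assumption: the doubling is an escaping artifact of the config file).
HA_ALL_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_covered_by_policy(queue_name: str) -> bool:
    """Return True if the pattern matches, i.e. the policy would apply."""
    return HA_ALL_PATTERN.match(queue_name) is not None


# Hypothetical queue names, not taken from any real deployment.
SAMPLE_QUEUES = [
    "conductor",
    "amq.gen-abc",
    "scheduler_fanout_0123",
    "reply_0123",
]

for name in SAMPLE_QUEUES:
    status = "covered" if is_covered_by_policy(name) else "excluded"
    print(f"{name}: {status}")
```

Under these assumptions only the first name is covered. The other three fall outside the policy, which is consistent with the unmirrored amq, fanout and reply queues described in the conversation.
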
<kevko> mnasiadka: Can we now merge those patches around init-runonce so I can continue with tempest? It's a really handy feature, and CI is also using it [09:19]
<kevko> yusufgungor: we have high availability turned on, and apart from amqp, fanout and reply it is OK, I would say ... We added a global config which turns off durable queues [09:21]
<yusufgungor> our problem was like this: when restarting rabbitmq cluster nodes for HA tests we had strange problems. We had to enable HA for the amqp, fanout and reply queues too https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png [09:25]
<yusufgungor> https://usercontent.irccloud-cdn.com/file/uDUAkdSO/image.png [09:25]
<kevko> We saw exactly the same logs as you sent ^^^ ! [09:28]
<yusufgungor> oh, so it is probably the same situation. We thought it was OK to add the amqp, fanout and reply queues to the ha-all policy, and it resolved the problems for us. [09:31]
<SvenKieske> kevko: do you run the rabbitmq default configuration, or what kind of queue HA mechanism are you using? [09:39]
<kevko> yusufgungor: the problem is that it doesn't explain why it is failing on zed but not on yoga ... because this was not changed between yoga -> zed [09:39]
<SvenKieske> yusufgungor: yes, fanout queues without HA are a known problem - at least for me - but I don't think deleting the in-place ha policy and creating your own is the proper way to fix this [09:40]
<kevko> SvenKieske: We have 160 hypervisors, and so far we haven't had the courage to migrate to quorum queues :D [09:40]
<kevko> SvenKieske: so we have an override in global.conf to not use durable queues and we still have ha-all [09:41]
<SvenKieske> I'm not 100% certain currently, but I don't think either of you is running a valid config? you need durable queues if you really want HA? https://www.rabbitmq.com/docs/ha#non-mirrored-queue-behavior-on-node-failure [09:45]
<SvenKieske> because in the case of restarts you will lose queue content if the queue is not durable; its content is not synced to disk then. [09:45]
<SvenKieske> so I would expect problems when using non-durable queues. there was a handy matrix in the old rabbit docs, but they apparently restyled and rewrote their whole docs so I can't find it anymore.. [09:46]
<kevko> SvenKieske: durable just means that they are stored on disk ... so a restart of a rabbitmq node will drop that queue ... but openstack will recreate it and it will start working again, if I am right [09:48]
<kevko> SvenKieske: yes, you can have a problem for a few seconds ... a few resources don't get created, etc. ... because of the rabbitmq restart ... but not a permanently broken service as we see ... [09:49]
<SvenKieske> yeah, but if that message did not get processed it might mean trouble. you can't really treat rabbitmq as a stateless system in openstack, it's just wrong [09:49]
<SvenKieske> how was the service "broken"? did rabbit not start or something? [09:49]
<kevko> SvenKieske: we restarted one rabbitmq node ... [09:49]
<kevko> SvenKieske: and see a similar log to the one yusufgungor sent https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png [09:50]
<SvenKieske> looks like a neutron bug? using ephemeral queues across restarts? I think there were some patches in oslo floating around getting rid of the "ephemeral" concept in openstack because it doesn't fit, as can be seen in these errors [09:52]
<SvenKieske> but does this error result in anything? cluster not starting? [09:59]
<SvenKieske> did anybody write a bug report which I can look at? [10:03]
<yusufgungor> @SvenKieske in our case we still had the same logs and problems even when trying durable queues. Everything seemed normal but we got that log on some services like designate-central, nova-conductor and neutron-server etc. We could not create instances. I wrote to the oslo channel, but I have to look up whether I got any reply [10:07]
<yusufgungor> @kevko it is probably a bug in oslo.messaging on the zed version [10:07]
<yusufgungor> @svenkieske I have found it https://usercontent.irccloud-cdn.com/file/QoeJvy04/image.png [10:09]
<SvenKieske> a bug report would be something on bugs.launchpad.net, I think, not a screenshot of an IRC channel conversation. nobody can track it this way. [10:10]
<SvenKieske> thanks for reporting via IRC, but it seems nobody followed up on it if you are still affected 9 months after reporting. one more reason to file an actual bug report :) [10:11]
<kevko> yeah, we have exactly the same [10:12]
<kevko> https://paste.openstack.org/show/b5ygtfdZErzuByVdYBzY/ [10:12]
<kevko> SvenKieske: we have been affected for a few days :) [10:13]
<kevko> SvenKieske: upgraded from yoga 2 weeks ago :) [10:13]
<yusufgungor> @SvenKieske you are right about the bug report, but I have too many bug reports which have not got any response. At that time we were desperate and went with the easy solution [10:14]
<yusufgungor> Our environment does not have that problem for now. @kevko would you like to file a bug report with oslo? [10:17]
<kevko> yusufgungor: I will, right after I fix it :D [10:18]
<kevko> yusufgungor: in openstack, if you are creating a bug report it's expected that you will also send a patch :D [10:18]
<yusufgungor> @kevko Thanks :D [10:19]
<opendevreview> Pierre Riteau proposed openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming  https://review.opendev.org/c/openstack/kayobe/+/919807 [10:26]
<SvenKieske> that's really not true :P [10:32]
<kevko> SvenKieske: almost true :D [10:33]
<kevko> SvenKieske: this was good advice from a colleague at a previous job ... "It's open source, you need to know where the problem is, send a bug report and also send a diff showing how to fix it." :D [10:34]
<kevko> (it was something different than openstack) [10:34]
<SvenKieske> I think it still discourages people from filing bugs, don't know if I need to explain why this is bad for the project. [10:34]
<opendevreview> Pierre Riteau proposed openstack/kayobe master: Fix list formatting in release note  https://review.opendev.org/c/openstack/kayobe/+/920089 [10:36]
<kevko> SvenKieske: I know ... but that's reality [10:41]
<SvenKieske> no, it's an exaggeration imho. yes, it's always faster to provide your own patches, but thankfully I didn't need to patch everything myself, many other people in fact did fix bugs which affected me, which I'm grateful for :) [10:46]
<opendevreview> Verification of a change to openstack/kayobe master failed: Fix issue removing docker volumes  https://review.opendev.org/c/openstack/kayobe/+/909594 [10:58]
<opendevreview> Pierre Riteau proposed openstack/kayobe master: Fix issue removing docker volumes  https://review.opendev.org/c/openstack/kayobe/+/909594 [10:59]
<kevko> SvenKieske: did you migrate queues to quorum queues on the fly? [11:08]
<kevko> SvenKieske: yusufgungor: okay, a colleague probably just found an issue [11:10]
<yusufgungor> @SvenKieske we created an ha-all policy with the * pattern and applied it on the fly to all queues [11:24]
<sylvr> Hello, I'm having issues with deploying the seed/bifrost machine (kayobe seed service deploy fails at "bootstrapping bifrost container"; ironic --watch-log-file tells me: AttributeError: module 'select' has no attribute 'poll') [11:32]
<opendevreview> Kevin Tindall proposed openstack/kolla-ansible master: Add TLS proxy for novncproxy  https://review.opendev.org/c/openstack/kolla-ansible/+/911141 [12:29]
<opendevreview> Matúš Jenča proposed openstack/kolla-ansible master: Add backend TLS between MariaDB and ProxySQL  https://review.opendev.org/c/openstack/kolla-ansible/+/909912 [12:55]
<opendevreview> Matt Crees proposed openstack/kayobe stable/2023.1: Add script to migrate to RabbitMQ quorum queues  https://review.opendev.org/c/openstack/kayobe/+/919925 [14:23]
<sylvr> Here's the traceback from ironic failing to start (`kayobe seed service deploy`): https://pastebin.com/gywtWb3z [14:30]
<sylvr> I can also send the full logs of `kayobe seed service deploy` but it's a big file [14:32]
<opendevreview> Verification of a change to openstack/kayobe stable/2023.1 failed: CI: Fix upgrade jobs following zed branch renaming  https://review.opendev.org/c/openstack/kayobe/+/919807 [14:48]
<opendevreview> Martin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario  https://review.opendev.org/c/openstack/kolla-ansible/+/836941 [15:08]
<spatel> Any idea what is wrong here? My OS is ubuntu 22.04 - https://paste.opendev.org/show/bIm3G8dLbi3UgQjsvcDX/ [17:26]
<spatel> docker is running and I can pull images [17:29]
<SvenKieske> kevko: you don't happen to have a bug report now for the rabbitmq stuff, do you? :) [17:30]
<SvenKieske> spatel: at the end it says: "DockerException: Error while fetching server API version: Not supported URL scheme http+docker\\n'"" [17:31]
<spatel> These are new compute nodes I am adding today [17:32]
<spatel> and I noticed this error [17:32]
<SvenKieske> this seems to be an upstream bug: https://github.com/docker/docker-py/issues/3256 [17:32]
<spatel> what could be wrong? [17:32]
<spatel> Yesterday I added 10 compute nodes and didn't see this error [17:32]
<spatel> today I encountered the error :) [17:33]
<SvenKieske> yeah, this seems to be an upstream error introduced yesterday somewhere [17:33]
<SvenKieske> fix is here: https://github.com/docker/docker-py/pull/3257 [17:33]
<SvenKieske> mnasiadka: we might need to do something wrt https://github.com/docker/docker-py/issues/3256, not sure what though. wondering why I didn't see CI fallout here, have we pinned the requests module maybe? [17:35]
<spatel> Can I downgrade my module to fix it? [17:35]
<spatel> I hate doing hand edits :( [17:35]
<SvenKieske> if you downgrade requests it should work, afaik we have it pinned to 2.31.0 https://opendev.org/openstack/requirements/src/branch/master/upper-constraints.txt#L234 [17:37]
<SvenKieske> seems that saved us :D [17:37]
<SvenKieske> spatel: you could directly use the openstack upper constraints files maybe, that's what we test anyway [17:38]
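
To illustrate what the upper-constraints file mentioned above contains, here is a minimal sketch that reads `name===version` pins out of constraints text. The excerpt is hypothetical apart from the requests pin quoted in the conversation; `examplepkg` and `parse_constraints` are illustrative names, not part of any OpenStack tooling.

```python
from typing import Dict


def parse_constraints(text: str) -> Dict[str, str]:
    """Map package names to pinned versions from 'name===version' lines.

    Environment markers after ';' are ignored, so if a package is listed
    more than once, the last entry wins (a simplification for this sketch).
    """
    pins: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "===" not in line:
            continue  # skip blank lines and anything that is not a pin
        requirement = line.split(";", 1)[0].strip()  # drop markers
        name, version = requirement.split("===", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Hypothetical excerpt; only the requests pin comes from the conversation.
SAMPLE = """\
# illustrative excerpt, not the real file
requests===2.31.0
examplepkg===1.2.3;python_version=='3.8'
"""

pins = parse_constraints(SAMPLE)
print(pins["requests"])
```
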
<spatel> How do I pin a package in my kolla deployment? [17:38]
<spatel> how do I use this upper-constraints file with kolla-ansible? [17:39]
<SvenKieske> spatel: well, we do this via zuul CI. in general I would advise building a CI pipeline that builds your kolla images. this is where the projects are listed: https://opendev.org/openstack/kolla-ansible/src/branch/master/zuul.d/base.yaml#L55 (I highlighted the requirements repo) [17:40]
<SvenKieske> spatel: the process is described in detail here, because it is rather non-trivial: https://docs.openstack.org/project-team-guide/dependency-management.html#solution [17:44]
<spatel> SvenKieske: I am not building my images, just downloading them and storing them in a local repo [18:19]
<SvenKieske> okay, most of the time it's advisable to build your own images, e.g. for when you need to roll back to an older version when such bugs occur. depends a bit on your circumstances of course. [18:21]
<spatel> agreed [18:30]
<spatel> SvenKieske: this command fixed my issue - pip3 install requests===2.31.0 [19:41]
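
A host can be checked for the same drift before deploying. The sketch below compares the installed requests version with the 2.31.0 pin from the conversation. The helper names are hypothetical, and the comparison only handles plain dotted release numbers.

```python
from importlib import metadata
from typing import Optional, Tuple

PINNED_REQUESTS = "2.31.0"  # the pin quoted in the conversation


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted release such as '2.31.0' into (2, 31, 0)."""
    return tuple(int(part) for part in version.split("."))


def exceeds_pin(installed: str, pinned: str = PINNED_REQUESTS) -> bool:
    """Return True if the installed release is newer than the pin."""
    return version_tuple(installed) > version_tuple(pinned)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("requests")
    if found is None:
        print("requests is not installed")
    else:
        try:
            newer = exceeds_pin(found)
        except ValueError:
            # Pre-release or otherwise non-numeric version strings.
            print(f"cannot compare requests {found} with the pin")
        else:
            state = "newer than" if newer else "at or below"
            print(f"requests {found} is {state} the pin {PINNED_REQUESTS}")
```

This only reports the mismatch; it does not change what gets installed on the host.
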
<spatel> where do I override this in the kolla-ansible tree so I don't have to do this manually each time? [19:41]
<spatel> is this coming from kolla-ansible or is it just an Ubuntu OS issue? [19:41]
<opendevreview> Merged openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming  https://review.opendev.org/c/openstack/kayobe/+/919807 [23:58]
