opendevreview | wu.chunyang proposed openstack/kolla-ansible master: Modernize the swift role https://review.opendev.org/c/openstack/kolla-ansible/+/797498 | 07:05 |
opendevreview | Matt Crees proposed openstack/kolla master: Pass rabbitmq apt preferences into kolla-toolbox https://review.opendev.org/c/openstack/kolla/+/920037 | 08:21 |
sylvr | Hello! I'm going to upgrade from 2023.1 to 2023.2. What's the best way to update kayobe-config (a project fork with custom config)? Should a git merge with the updated stable/2023.2 branch do the trick? Thanks a lot! | 08:27 |
kevko | Guys, we upgraded to zed in production and found an interesting issue ... when rabbitmq is restarted, messages are lost, there are heartbeat issues, etc. ... yoga didn't do this, tested ... has anyone seen this? | 09:02 |
kevko | what helped, of course, was restarting the service containers where we saw the issue ... but still, yoga wasn't affected ... | 09:03 |
yusufgungor | hi @kevko, I'm not sure about your problem, but I can share our experience with the zed upgrade. | 09:19 |
yusufgungor | what is your value for om_enable_rabbitmq_high_availability? We had problems when it was true. With that parameter true, an ha-all policy is enabled for queues matching the pattern ^(?!(amq\.)|(.*_fanout_)|(reply_)).*, i.e. everything except amq.*, fanout and reply queues. | 09:19 |
yusufgungor | this config also forces durable queues. We had problems because the queues excluded by that pattern were not mirrored. We had to set the parameter to false and manually create an ha-all policy on the rabbitmq cluster covering all queues, without durable queues, to continue. | 09:19 |
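[editor's note: the exclusion pattern quoted above can be sanity-checked with a short snippet; the queue names below are made-up examples for illustration]

```python
import re

# Mirroring pattern used when om_enable_rabbitmq_high_availability is true:
# every queue is mirrored EXCEPT amq.*, *_fanout_* and reply_* queues.
pattern = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")

queues = [
    "notifications.info",  # regular queue: matches, so it gets mirrored
    "amq.gen-abc123",      # auto-named queue: excluded by the lookahead
    "neutron_fanout_1",    # fanout queue: excluded
    "reply_42",            # RPC reply queue: excluded
]
mirrored = [q for q in queues if pattern.match(q)]
print(mirrored)  # → ['notifications.info']
```

So the fanout and reply queues that carry RPC traffic are exactly the ones left unmirrored, which matches the failure mode described in this thread.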
kevko | mnasiadka: Can we now merge those patches around init-runonce so I can continue with tempest? ... it's a really handy feature ... and I'm also using it | 09:19 |
kevko | yusufgungor: we have high availability turned on, and excluding amq, fanout and reply queues is OK I would say ... We added a global config option turning off durable queues | 09:21 |
yusufgungor | our problem was like this: when restarting rabbitmq cluster nodes during HA tests, we had strange problems. We had to enable HA for the amq, fanout and reply queues too https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png | 09:25 |
yusufgungor | https://usercontent.irccloud-cdn.com/file/uDUAkdSO/image.png | 09:25 |
kevko | We saw exactly the same logs as you sent ^^^ ! | 09:28 |
yusufgungor | oh, so it's probably the same situation. We thought it was fine to add the amq, fanout and reply queues to the ha-all policy, and that resolved the problems for us. | 09:31 |
SvenKieske | kevko: do you run the rabbitmq default configuration or what kind of queue HA mechanism are you using? | 09:39 |
kevko | yusufgungor: the problem is that it doesn't explain why it fails on zed but not on yoga ... because this wasn't changed between yoga -> zed | 09:39 |
SvenKieske | yusufgungor: yes, fanout queues without HA are a known problem, at least for me, but I don't think deleting the in-place HA policy and creating your own is the proper way to fix this | 09:40 |
kevko | SvenKieske: We have 160 hypervisors, and so far we haven't had the courage to migrate to quorum-queues :D | 09:40 |
kevko | SvenKieske: so we have an override in global.conf to not use durable queues, and we still have ha-all | 09:41 |
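[editor's note: a minimal sketch of the kind of override kevko describes; durability is controlled by the standard oslo.messaging option `amqp_durable_queues`, and the file location assumed here is kolla-ansible's usual global service-config override path]

```ini
# /etc/kolla/config/global.conf (assumed path) — merged into every service's
# configuration by kolla-ansible's config override mechanism.
[oslo_messaging_rabbit]
amqp_durable_queues = false
```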
SvenKieske | I'm not 100% certain currently, but I don't think either of you is running a valid config? You need durable queues if you really want HA? https://www.rabbitmq.com/docs/ha#non-mirrored-queue-behavior-on-node-failure | 09:45 |
SvenKieske | because in the case of restarts you will lose queue content if the queue is not durable; its content is not synced to disk then. | 09:45 |
SvenKieske | so I would expect problems when using non-durable queues. There was a handy matrix in the old rabbit docs, but apparently they restyled and rewrote their whole docs, so I can't find it anymore. | 09:46 |
kevko | SvenKieske: durable just means they are stored on disk ... so a restart of a rabbitmq node will drop a non-durable queue ... but openstack will recreate it and start working again, if I'm right | 09:48 |
kevko | SvenKieske: yes, you can have a problem for a few seconds ... a few resources not created, etc., because of the rabbitmq restart ... but not a permanently broken service like we're seeing ... | 09:49 |
SvenKieske | yeah, but if that message doesn't get processed it might mean trouble; you can't really treat rabbitmq as a stateless system in openstack, that's just wrong | 09:49 |
SvenKieske | how was the service "broken"? did rabbit not start or something? | 09:49 |
kevko | SvenKieske: we restarted one rabbitmq node ... | 09:49 |
kevko | SvenKieske: and see similar log as yusufgungor sent https://usercontent.irccloud-cdn.com/file/7HUNKpqU/image.png | 09:50 |
SvenKieske | looks like a neutron bug? using ephemeral queues across restarts? I think there were some patches floating around in oslo to get rid of the "ephemeral" concept in openstack because it doesn't fit, as can be seen in these errors | 09:52 |
SvenKieske | but does this error result in anything? cluster not starting? | 09:59 |
SvenKieske | did anybody write a bug report which I can look at? | 10:03 |
yusufgungor | @SvenKieske in our case we still had the same logs and problems even when trying durable queues. Everything seemed normal, but we got that log from some services like designate-central, nova-conductor and neutron-server. We could not create instances. I wrote to the oslo channel, but I have to check whether I got any reply | 10:07 |
yusufgungor | @kevko it is probably a bug in oslo.messaging in the zed version | 10:07 |
yusufgungor | @svenkieske i have found it https://usercontent.irccloud-cdn.com/file/QoeJvy04/image.png | 10:09 |
SvenKieske | a bug report would be something on bugs.launchpad.net, I think, not a screenshot of an IRC conversation. Nobody can track it this way. | 10:10 |
SvenKieske | thanks for reporting via IRC, but it seems nobody followed up on it, and you are still affected 9 months after reporting. One more reason to file an actual bug report :) | 10:11 |
kevko | yeah, we have exactly the same | 10:12 |
kevko | https://paste.openstack.org/show/b5ygtfdZErzuByVdYBzY/ | 10:12 |
kevko | SvenKieske: we are affected for few days :) | 10:13 |
kevko | SvenKieske: upgraded from yoga 2 weeks ago :) | 10:13 |
yusufgungor | @SvenKieske you are right about the bug report, but I have too many bug reports that never got any response. At the time we were desperate and went with the easy solution | 10:14 |
yusufgungor | Our environment doesn't have that problem for now. @kevko, would you like to file a bug report with oslo? | 10:17 |
kevko | yusufgungor: I will, right after I fix it :D | 10:18 |
kevko | yusufgungor: in openstack, if you create a bug report it's expected that you'll also send a patch :D | 10:18 |
yusufgungor | @kevko Thanks :D | 10:19 |
opendevreview | Pierre Riteau proposed openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 10:26 |
SvenKieske | that's really not true :P | 10:32 |
kevko | SvenKieske: almost true :D | 10:33 |
kevko | SvenKieske: this was good advice from a colleague at my previous job: "It's open source, you need to know where the problem is, send a bug report, and also send a diff with the fix." :D | 10:34 |
kevko | (it was something different than openstack) | 10:34 |
SvenKieske | I think it still discourages people from filing bugs, don't know if I need to explain why this is bad for the project. | 10:34 |
opendevreview | Pierre Riteau proposed openstack/kayobe master: Fix list formatting in release note https://review.opendev.org/c/openstack/kayobe/+/920089 | 10:36 |
kevko | SvenKieske: I know ... but that's reality | 10:41 |
SvenKieske | no, it's an exaggeration imho. Yes, it's always faster to provide your own patches, but thankfully I didn't need to patch everything myself; many other people did in fact fix bugs which affected me, and I'm grateful for that :) | 10:46 |
opendevreview | Verification of a change to openstack/kayobe master failed: Fix issue removing docker volumes https://review.opendev.org/c/openstack/kayobe/+/909594 | 10:58 |
opendevreview | Pierre Riteau proposed openstack/kayobe master: Fix issue removing docker volumes https://review.opendev.org/c/openstack/kayobe/+/909594 | 10:59 |
kevko | SvenKieske: did you migrate queues to quorum queues on the fly ? | 11:08 |
kevko | SvenKieske: yusufgungor: okay, a colleague probably just found the issue | 11:10 |
yusufgungor | @SvenKieske we created an ha-all policy with a wildcard pattern and applied it on the fly to all queues | 11:24 |
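[editor's note: a manual ha-all policy like the one described can be created with rabbitmqctl; this is a sketch only — the default vhost and automatic sync mode are assumptions, classic mirrored queues are implied, and the command needs a running broker]

```shell
# Mirror every queue (pattern ".*") on the default vhost "/".
rabbitmqctl set_policy --vhost / --apply-to queues ha-all ".*" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}'
```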
sylvr | Hello, I'm having issues deploying the seed/bifrost machine (kayobe seed service deploy fails at "bootstrapping bifrost container"); ironic --watch-log-file tells me: AttributeError: module 'select' has no attribute 'poll' | 11:32 |
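[editor's note: this AttributeError is usually not a platform problem — on an unpatched CPython on Linux, select.poll exists, as the check below shows. The error typically appears only after eventlet's monkey patching has replaced the select module inside the service process; that this is the cause of sylvr's particular traceback is an assumption, but it is a known eventlet interaction]

```python
import select

# On a stock (non-monkey-patched) interpreter on Linux, poll() is available.
print(hasattr(select, "poll"))  # → True on Linux
```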
opendevreview | Kevin Tindall proposed openstack/kolla-ansible master: Add TLS proxy for novncproxy https://review.opendev.org/c/openstack/kolla-ansible/+/911141 | 12:29 |
opendevreview | Matúš Jenča proposed openstack/kolla-ansible master: Add backend TLS between MariaDB and ProxySQL https://review.opendev.org/c/openstack/kolla-ansible/+/909912 | 12:55 |
opendevreview | Matt Crees proposed openstack/kayobe stable/2023.1: Add script to migrate to RabbitMQ quorum queues https://review.opendev.org/c/openstack/kayobe/+/919925 | 14:23 |
sylvr | Here's the traceback from ironic failing to start (`kayobe seed service deploy`): https://pastebin.com/gywtWb3z | 14:30 |
sylvr | I can also send the full logs of `kayobe seed service deploy` but it's a big file | 14:32 |
opendevreview | Verification of a change to openstack/kayobe stable/2023.1 failed: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 14:48 |
opendevreview | Martin Hiner proposed openstack/kolla-ansible master: Add container engine migration scenario https://review.opendev.org/c/openstack/kolla-ansible/+/836941 | 15:08 |
spatel | Any idea what is wrong here, my OS is ubuntu 22.04 - https://paste.opendev.org/show/bIm3G8dLbi3UgQjsvcDX/ | 17:26 |
spatel | docker is running and I can pull images | 17:29 |
SvenKieske | kevko: you don't happen to have a bugreport now for the rabbitmq stuff, do you? :) | 17:30 |
SvenKieske | spatel: at the end it says: "DockerException: Error while fetching server API version: Not supported URL scheme http+docker" | 17:31 |
spatel | These are new compute nodes I am adding today | 17:32 |
spatel | and I noticed this error | 17:32 |
SvenKieske | this seems to be an upstream bug: https://github.com/docker/docker-py/issues/3256 | 17:32 |
spatel | what could be wrong? | 17:32 |
spatel | Yesterday I added 10 compute nodes and didn't see this error | 17:32 |
spatel | today I encountered the error :) | 17:33 |
SvenKieske | yeah this seems to be an upstream error introduced yesterday somewhere | 17:33 |
SvenKieske | fix is here: https://github.com/docker/docker-py/pull/3257 | 17:33 |
SvenKieske | mnasiadka: we might need to do something wrt https://github.com/docker/docker-py/issues/3256, not sure what though. Wondering why I didn't see CI fallout here; do we maybe have the requests module pinned? | 17:35 |
spatel | Can I downgrade the module to fix it? | 17:35 |
spatel | I hate to hand-edit :( | 17:35 |
SvenKieske | if you downgrade requests it should work, afaik we have it pinned to 2.31.0 https://opendev.org/openstack/requirements/src/branch/master/upper-constraints.txt#L234 | 17:37 |
SvenKieske | seems that saved us :D | 17:37 |
SvenKieske | spatel: you could directly use the openstack upper constraints files maybe, that's what we test anyway | 17:38 |
spatel | How do I pin package in my kolla deployment? | 17:38 |
spatel | how do I use this upper-constraints file with kolla-ansible? | 17:39 |
SvenKieske | spatel: well, we do this via zuul CI. In general I would advise building a CI pipeline that builds your kolla images; this is where the projects are listed: https://opendev.org/openstack/kolla-ansible/src/branch/master/zuul.d/base.yaml#L55 (I highlighted the requirements repo) | 17:40 |
SvenKieske | spatel: the process is described in detail here, because it is rather non-trivial: https://docs.openstack.org/project-team-guide/dependency-management.html#solution | 17:44 |
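[editor's note: outside of CI, pinning to the tested versions can be as simple as passing the upper-constraints file to pip via `-c`; a sketch — the branch in the URL and the package selection are assumptions, and the command needs network access]

```shell
# Install docker-py and requests pinned to the same versions OpenStack CI
# tests (e.g. requests==2.31.0 from upper-constraints.txt).
pip install -c https://releases.openstack.org/constraints/upper/master \
    docker requests
```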
spatel | SvenKieske I am not building my images; I'm just downloading them and storing them in a local repo | 18:19 |
SvenKieske | okay, most of the time it's advisable to build your own images, e.g. so you can roll back to an older version when such bugs occur. Depends a bit on your circumstances, of course. | 18:21 |
spatel | agreed | 18:30 |
spatel | SvenKieske this command fixed my issue - pip3 install requests===2.31.0 | 19:41 |
spatel | where do I override this in the kolla-ansible tree so I don't have to do this manually each time? | 19:41 |
spatel | is this coming from kolla-ansible, or is it just an Ubuntu OS issue? | 19:41 |
opendevreview | Merged openstack/kayobe stable/2023.1: CI: Fix upgrade jobs following zed branch renaming https://review.opendev.org/c/openstack/kayobe/+/919807 | 23:58 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!