Wednesday, 2025-06-04

zigoHi there!07:20
zigoUnder trixie (aka: Debian 13) + epoxy, I get lots of stack dumps like this one in nova-conductor.log:07:20
zigohttps://paste.opendev.org/show/bECP9P2mCs2kelxQhbgK/07:20
zigoHas anyone seen something like this ?07:20
zigoLooks like I'm also getting this when spawning a VM:07:22
zigohttps://paste.opendev.org/show/bOocYhL8mYOV0jf3m7ip/07:22
zigoNot sure if the 2 are related...07:22
sean-k-mooneyzigo: are you using galera or something like that? nova does not support active/active galera, nor does most of openstack for that matter07:28
zigoYes, but everything goes through *one* single node for writing.07:29
zigoSo it's more like active/passive...07:29
sean-k-mooneyok that does work07:29
sean-k-mooneyit's what we use downstream too07:29
sean-k-mooneyum, the only other thing that comes to mind is maybe you have two services (conductors in this case) with the same CONF.host07:30
sean-k-mooneyi.e. you might have two conductor binaries trying to update the same service record07:30
zigoIn my setup, conductor nodes are not compute nodes.07:31
zigoMy host = directive isn't set.07:32
zigoSo probably it's trying to use the FQDN ?07:32
sean-k-mooneyit will use socket.gethostname()07:32
sean-k-mooneyby default, because by default openstack requires unique hostnames, not just unique FQDNs07:32
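A minimal sketch of the collision being described, assuming (as stated above) that nova's `host` option defaults to `socket.gethostname()`. `gethostname()` returns the kernel hostname exactly as configured: a short name on most hosts, but it can itself be fully qualified. `socket.getfqdn()` is the separate call that resolves an FQDN. `colliding_hosts` is a hypothetical helper, not nova code: give it the `host` value each conductor process would report, and any repeated value marks processes that would end up updating one shared service record.

```python
import socket
from collections import Counter


def default_host() -> str:
    # The default described above: the kernel hostname from
    # socket.gethostname(), not the resolved FQDN from socket.getfqdn().
    return socket.gethostname()


def colliding_hosts(reported: list[str]) -> set[str]:
    """Return every host value reported by more than one service process.

    Hypothetical helper: `reported` is the host value collected from each
    conductor process, e.g. by running default_host() on every node.
    """
    counts = Counter(reported)
    return {host for host, n in counts.items() if n > 1}
```

For example, `colliding_hosts(["ctl1", "ctl2", "ctl1"])` returns `{"ctl1"}`, which would point at two conductors sharing one record.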
zigoRight.07:33
sean-k-mooneyif you have two instances of the conductor running on the same host by mistake07:33
zigoI don't.07:33
sean-k-mooneythen it could cause the conductor issue, but i have not seen it other than that07:33
zigoI'm not running in containers, so that would be impossible.07:33
sean-k-mooneyyour compute issue looks different07:33
sean-k-mooneywithout digging in too deeply, it looks like perhaps you have an incompatible version of libs07:34
zigoTo me, it seems like I could ignore the nova-conductor reporting issue. It's ugly, but the service is still reported as alive, so that's ok.07:34
zigoOf what lib?!?07:34
sean-k-mooneyperhaps an old neutron client?07:34
zigoUnlikely, it does work under bookworm + epoxy, and that's the same version of the libs.07:35
zigoie: neutronclient 11.4.0-207:35
sean-k-mooneywell the error is coming from keystoneauth within neutronclient07:36
zigoRight, though it's saying "no attribute 'endpoint_override'" which is weird.07:36
zigoFYI, I did set endpoint_override in nova.conf for the URL of Neutron.07:36
sean-k-mooneywhat is your keystoneauth version?07:37
zigo5.10.007:37
zigoSo both keystoneauth1 and neutronclient are the released versions for Epoxy.07:38
sean-k-mooneyya that should be new enough07:39
sean-k-mooneyso in neutronclient it's failing here https://github.com/openstack/python-neutronclient/blob/11.4.0/neutronclient/client.py#L348 and that is just delegating to keystoneauth, so i guess i'll check there next07:39
sean-k-mooneyso it's calling https://github.com/openstack/keystoneauth/blob/5.10.0/keystoneauth1/adapter.py#L313-L33407:40
zigoI'll try without setting endpoint_override in my config file.07:42
zigoWhat's weird is that I didn't get this under Bookworm.07:42
zigoSo, this smells like a non-openstack-maintained lib is at play here.07:43
sean-k-mooneythat should not result in an attribute error07:44
sean-k-mooneythat's what is really odd07:44
sean-k-mooneynova is still properly registering the relevant config options; if that were broken, https://docs.openstack.org/nova/latest/configuration/config.html#neutron.endpoint_override would not render07:46
sean-k-mooneyso it should be valid to set that directly07:46
zigoRemoving endpoint_override from nova.conf doesn't fix anything. :/07:47
zigoI still get the same stack trace.07:47
sean-k-mooneyzigo: ya, the stack trace looks like something is wrong in our packaging somehow07:52
sean-k-mooneyhave you checked the content of /usr/lib/python3/dist-packages/keystoneauth1/adapter.py07:52
sean-k-mooneyand confirmed that the base adapter in the file on disk has endpoint_override07:53
sean-k-mooneyand tried importing it in a python terminal and accessing it?07:53
sean-k-mooneyor even one step back07:53
sean-k-mooneyif you use the neutron client cli does it have the same issue07:53
sean-k-mooneyi would expect a neutron port list to be broken in the same way on that host07:54
zigoI'm not sure what / how you're asking me to check.07:57
zigo:(07:59
sean-k-mooneygrep endpoint_override /usr/lib/python3/dist-packages/keystoneauth1/adapter.py08:01
sean-k-mooneyand also check if `neutron port-list` works08:01
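The file-on-disk check and the "import it in a python terminal" check can be combined in a few lines. A minimal sketch, not part of the original discussion: `check_module_source` is a hypothetical helper that imports a module, reports which file Python actually loaded (handy for spotting a stray copy shadowing the packaged one), and then greps that file for a string. It only works for pure-Python modules. On the affected host the call would be `check_module_source("keystoneauth1.adapter", "endpoint_override")`; the test below uses a stdlib module so it runs anywhere.

```python
import importlib
from pathlib import Path


def check_module_source(module_name: str, needle: str) -> tuple[str, bool]:
    """Import `module_name`, then return (path of the file it was loaded
    from, whether that file's source text contains `needle`).

    Pure-Python modules only: compiled extensions have no readable source.
    """
    module = importlib.import_module(module_name)
    path = Path(module.__file__)
    found = needle in path.read_text(encoding="utf-8")
    return str(path), found
```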
zigoopenstack port list works, indeed.08:03
sean-k-mooneythe neutron cli supports clouds.yaml so you can just use it like osc08:03
sean-k-mooneyno, not openstack port list08:03
sean-k-mooneyexplicitly `neutron port-list`08:03
zigoI don't think neutronclient is providing an /usr/bin/neutron anymore.08:03
sean-k-mooneyopenstack port list uses the openstack sdk, not the neutron client, i think08:03
sean-k-mooneythey may have dropped it in epoxy08:03
zigohttps://paste.opendev.org/show/bWJcYW3nYUAkjMn71RN6/08:05
sean-k-mooneyya they did. your code on disk all looks correct relative to the 5.10.0 tag08:07
jkulikhttps://bugs.launchpad.net/ubuntu/+source/nova/+bug/2103413 could that be related? looks like having the same stacktrace08:07
sean-k-mooneyoh perhaps08:08
sean-k-mooneyif eventlet is nuking part of the object by garbage collecting too early08:08
zigoOh, thanks ! :)08:08
sean-k-mooneythat does look like the same or a very similar trace08:08
zigoI'm not surprised if eventlet is playing with me, as I'm on Python 3.13.08:09
sean-k-mooneyzigo: that is unfortunately not very good news for you, since it means it's because eventlet does not support 3.13 yet08:09
sean-k-mooneywhich is why openstack does not support 3.13 yet08:09
zigosean-k-mooney: I have no choice, 3.13 it is ...08:09
zigoI was expecting things would go wrong.08:09
sean-k-mooneyso i believe there is an eventlet issue for this08:10
sean-k-mooneywe also know the thread id is broken08:10
zigoMaybe I should try the latest eventlet release.08:10
sean-k-mooneyhttps://github.com/eventlet/eventlet/issues/103208:10
zigoYeah, which is probably what's breaking nova-conductor too.08:10
sean-k-mooneyzigo: it's not fixed yet08:10
gboutryThat was exactly the error I got zigo, this is python 3.13 and eventlet not playing nice together08:10
zigoSHIT ! :(08:11
gboutrythe error would manifest as missing attributes on an object that had been initialized correctly08:10
zigoDie eventlet die ...08:11
sean-k-mooneyi have commented on both the 3.13 bugs to say we need to support 3.13 in eventlet for master this cycle per the project runtimes, and that it's required to complete the eventlet removal08:16
sean-k-mooneythe real question is: will anyone have time to actually work on that08:16
sean-k-mooneythis should be highlighted in the eventlet removal channel too, i guess08:16
zigoWell, for me, this means there won't be a working OpenStack for the Trixie release. That's really bad ... :(08:17
sean-k-mooneycanonical are in the same situation for ubuntu 25.0408:17
zigoExcept it's not an LTS, so they don't really care.08:17
sean-k-mooneyit's the first time in openstack's history that i'm aware of that the latest point release of ubuntu has not supported the latest openstack release08:18
zigoThey mostly provide OpenStack on top of LTS.08:18
sean-k-mooneythey ship it in both08:18
zigoRight.08:18
sean-k-mooneyeven if most of their customers are not on the point releases08:18
sean-k-mooneyit's where they get their early qa08:18
sean-k-mooneyhopefully herve and co will have time to look at the reported bugs08:20
zigoEpoxy is not a skippable release, so it *must* be fixed.08:20
zigoSaying "it's ok, Flamingo will have the fix" is not an option.08:20
sean-k-mooneyit's not an openstack bug currently08:22
sean-k-mooneyit's an eventlet one, so there isn't anything nova can do to enable this08:22
sean-k-mooneyas an aside, apparently there is a proposal to add "virtual threads" à la greenthreads to core python https://discuss.python.org/t/add-virtual-threads-to-python/9140308:22
zigoI'll still have a try with eventlet 0.40 and see how it goes.08:44
zigoSame stuff with eventlet 0.40 ... :(08:48
sean-k-mooneythere is this work in progress hack https://github.com/eventlet/eventlet/pull/1031/files08:49
sean-k-mooneybut that's not complete08:49
sean-k-mooneyand it's technically for the thread issue rather than the one you're currently hitting with the GC08:50
sean-k-mooneythat may get you slightly closer, however08:50
zigoWill try the patch ! :)08:50
zigoI was in fact looking into it.08:50
sean-k-mooneyit's just a guess but https://github.com/eventlet/eventlet/pull/1042 might be related to the gc issue08:52
sean-k-mooneyalthough it does not directly reference the existing issue, so i'm just going off the cover letter08:53
zigoLooks like I had a version of that patch already in my package.08:54
zigoNot sure if it was the latest one, so trying again.08:54
sean-k-mooneyi just looked at the top few open prs, so that's just a guess on my part08:55
zigoYeah...08:56
zigoStill broken ... :(08:57
sean-k-mooneyi would reach out to herve and see if they have any ideas on a path forward08:58
zigosean-k-mooney: What's the progress on eventlet removal in nova-compute?08:58
zigoAre there patches available already?08:58
sean-k-mooneyzigo: we don't expect to complete that until 2026.208:58
opendevreviewStephen Finucane proposed openstack/nova master: tests: Replace keystoneclient with keystoneauth1  https://review.opendev.org/c/openstack/nova/+/95174408:58
sean-k-mooneywe might get the initial version in 2026.108:58
sean-k-mooneybut we are working on the scheduler first this cycle08:58
sean-k-mooneythen maybe api and/or one of the other controller services08:59
sean-k-mooneynova-compute is the hardest to move and will be the last service we move08:59
sean-k-mooneyour hope is that in 2026.1 we might be able to run all the nova services in threaded mode, but we don't know if we will get that far09:00
sean-k-mooneymaking it the default and/or removing eventlet support is a 2026.2+ activity, once we have at least 1 SLURP release that supports threaded mode09:00
sean-k-mooneythat's why we are aiming to have 2026.1 be the first release that can run without eventlet (maybe)09:01
sean-k-mooneyzigo: gibi has been making some promising progress but it's a lot of work09:02
zigoAt the end of https://bugs.launchpad.net/ubuntu/+source/nova/+bug/2103413/comments/1 Guillaume Boutry says:09:03
zigousing `gc.disable()` makes the issue disappear (yay! disable gc!) or actually holding a hardref to `admin_client.baseclient.httpclient` makes the method pass most of the time...09:03
zigoNot sure where/how he's doing the garbage collector disabling.09:04
zigoIs this a gc in eventlet ?09:04
sean-k-mooneyno, that's the main python one09:05
sean-k-mooneywe would have to hack in explicit calls to the garbage collector somewhere and have hard references to stop it being nuked09:06
zigoMaybe, disabling it would be in eventlet/patcher.py ?09:06
gboutry gc.disable() to disable the GC, but that's really NOT a good idea09:07
zigoThat would mean memory leak right?09:07
sean-k-mooneyzigo: that will prevent objects created by nova from being deallocated by python automatically when we exit scope09:07
gboutryyes, nothing would be freed anymore.09:08
sean-k-mooneyzigo: yes it would mean nova would never deallocate any python object 09:08
zigoAt this point, I need to validate the release, so anything is better than completely broken.09:08
zigoI'll try and see what happens. :)09:08
sean-k-mooneywell that will completely break nova09:08
sean-k-mooneyit will cause OOM kill events09:09
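For context on the trade-off above: in CPython, `gc.disable()` switches off only the cyclic garbage collector. Reference counting still frees an object the moment its last reference disappears; what stops being reclaimed is every object trapped in a reference cycle, and in a long-running service those accumulate steadily until memory runs out. A minimal, CPython-specific demonstration; `Node` and `survives` are illustrative names, not nova or eventlet code.

```python
import gc
import weakref


class Node:
    """Small object that can be linked into a reference cycle."""

    def __init__(self):
        self.other = None


def survives(make_cycle: bool, collector_on: bool) -> bool:
    """Create an object, drop every name pointing at it, and report
    whether it is still alive afterwards."""
    was_enabled = gc.isenabled()
    if collector_on:
        gc.enable()
    else:
        gc.disable()
    try:
        a = Node()
        if make_cycle:
            b = Node()
            a.other, b.other = b, a  # build an a <-> b reference cycle
            del b
        ref = weakref.ref(a)
        del a  # refcounting frees `a` right here unless it sits in a cycle
        if collector_on:
            gc.collect()  # only the cyclic collector can reclaim a <-> b
        return ref() is not None
    finally:
        # Restore the interpreter's previous collector state.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
```

With the collector off, `survives(False, False)` is `False` (plain objects are still freed) while `survives(True, False)` is `True` (the cycle leaks); `survives(True, True)` is `False` again.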
opendevreviewIvan Anfimov proposed openstack/nova master: docs: update for services to https  https://review.opendev.org/c/openstack/nova/+/93868009:11
sean-k-mooneyzigo: looking at how they were tracing it https://pastebin.ubuntu.com/p/cj7tb3kmGV/09:12
opendevreviewIvan Anfimov proposed openstack/nova master: docs: update for services to https  https://review.opendev.org/c/openstack/nova/+/93868009:13
sean-k-mooneyi wonder if on line 37 we did something like base_client_ref = admin_client.base_client.httpclient09:13
sean-k-mooneyzigo: would that keep it from being gc'd?09:13
sean-k-mooneyso that is https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L121709:16
sean-k-mooneywe could maybe modify our get_client function to return 3 refs: one to the actual client, one to the base client and one to the http client in the base.09:17
opendevreviewIvan Anfimov proposed openstack/nova master: docs: update for services to https  https://review.opendev.org/c/openstack/nova/+/93868009:17
sean-k-mooneythat should keep them in scope until that function body exits09:17
sean-k-mooneyalthough we already have a reference to the baseclient in our wrapper09:19
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/network/neutron.py#L184-L18509:19
gboutryBut that wouldn't prevent the code from breaking somewhere else?09:20
sean-k-mooneygboutry: it would at best mask the issue for neutronclient09:20
sean-k-mooneyit's not an actual fix.09:20
sean-k-mooneyi'm just trying to think if there is a way nova can keep the relevant object alive with a hard ref of some kind09:21
sean-k-mooneyi'm not seeing anything obvious09:21
opendevreviewIvan Anfimov proposed openstack/nova master: docs: update installation documentation  https://review.opendev.org/c/openstack/nova/+/93868009:21
sean-k-mooneygboutry: i need to go look at something else, but i'll quickly add a direct reference to the http client to our client wrapper and see if that changes anything09:24
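The hard-reference idea relies on an ordinary Python lifetime rule: an object stays alive as long as something holds a strong reference to it. A minimal sketch of that rule with hypothetical stand-in classes (`BaseClient`, `HTTPClient` and `ClientWrapper` are illustrative, not the real neutronclient or nova classes): the wrapper pins the inner HTTP client, so it outlives the base client dropping its own reference, and it is freed once the wrapper goes away. This does not reproduce the eventlet/Python 3.13 bug; whether pinning helps there is exactly what the [DNM] test patch set out to check.

```python
import gc
import weakref


class HTTPClient:
    """Stand-in for an inner HTTP client object (hypothetical)."""


class BaseClient:
    """Stand-in for a client that owns an HTTPClient (hypothetical)."""

    def __init__(self):
        self.httpclient = HTTPClient()


class ClientWrapper:
    """Hypothetical wrapper holding strong references to both the base
    client and its inner HTTP client."""

    def __init__(self, base_client: BaseClient):
        self._base_client = base_client
        self._http_client = base_client.httpclient  # the extra "pin"


def pinned_lifetime() -> tuple[bool, bool]:
    """Return (alive while the wrapper exists, alive after it is gone)."""
    base = BaseClient()
    ref = weakref.ref(base.httpclient)
    wrapper = ClientWrapper(base)
    base.httpclient = None  # the base client stops pointing at it
    del base
    alive_while_wrapped = ref() is not None
    del wrapper  # drops the last strong references
    gc.collect()
    alive_after = ref() is not None
    return alive_while_wrapped, alive_after
```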
opendevreviewIvan Anfimov proposed openstack/nova master: docs: update installation documentation  https://review.opendev.org/c/openstack/nova/+/93868009:26
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174909:38
sean-k-mooneyzigo: gboutry no idea if ^ will work but maybe it will provide more data :shrug:09:39
sean-k-mooneywell that's an excellent start...09:40
*** mikal4 is now known as mikal09:42
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174909:42
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174909:44
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174909:44
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174909:45
sean-k-mooneyit helps to actually save all your changes before committing...09:45
sean-k-mooneyzigo: gboutry: that adds a direct reference to the http client, but it also moves nova-next to try and use 3.13 (assuming it's a thing on ubuntu 24.04), and it also adds a 3.13 functional job to see just how broken nova really is09:47
sean-k-mooneyi expect the answer to be "very", but ci should tell us soon, assuming my hacks actually work.09:48
zigosean-k-mooney: I just tried, added gc.disable() at the end of eventlet/patcher.py's _green_existing_locks(), and I could spawn a VM !10:38
sean-k-mooneyzigo: sure, but you just added an out of memory issue. that's not a solution you can include in your packaging of nova10:51
zigoI know, I just wanted to try what gboutry wrote.10:51
zigoI can even see the memory leak in real time.10:51
sean-k-mooneyack10:51
zigonova-compute used to take 0.9% of my VM's RAM, now it's up to 1.0.10:51
zigoI guess it's going to never stop leaking ...10:52
sean-k-mooneyif you have debug logging enabled it will leak faster10:52
zigoI do ! :)10:52
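To put numbers on the leak being watched above, a minimal Linux-only sketch that samples a process's resident set size from `/proc/<pid>/status`. The `%MEM` column in `ps` is that same resident set size expressed as a share of physical RAM. `rss_kib` and `sample_rss` are illustrative names; the nova-compute pid would come from something like `pgrep -f nova-compute`.

```python
import time
from pathlib import Path


def rss_kib(pid: str = "self") -> int:
    """Resident set size (VmRSS) of a process in KiB, read from
    /proc/<pid>/status. Linux only."""
    status = Path(f"/proc/{pid}/status").read_text()
    for line in status.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    raise ValueError(f"no VmRSS entry for pid {pid}")


def sample_rss(pid: str, count: int, interval_s: float) -> list[int]:
    """Take `count` RSS readings, `interval_s` seconds apart."""
    readings = []
    for i in range(count):
        readings.append(rss_kib(pid))
        if i < count - 1:
            time.sleep(interval_s)
    return readings
```

For instance, `sample_rss(pid, 60, 60.0)` gives an hour of per-minute readings; a series that only ever grows is consistent with the leak described here.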
sean-k-mooneyit looks like python3.13 is not available in ubuntu 24.04; is it available in debian 12?10:55
sean-k-mooneyit's probably in universe in noble10:55
gboutrypython 3.13 is only available through the deadsnakes ppa on noble10:56
sean-k-mooneyack, i'm trying to see if there is an easy way to hack 3.13 into one of our devstack jobs10:58
sean-k-mooneyi could enable that ppa in a pre playbook10:58
zigosean-k-mooney: No, only in Debian 13, though Trixie is in hard freeze, so it's a good moment to start using it.10:58
sean-k-mooneyzigo: well locally i'm using sid :) but ya, it's more work than i have time for today to get trixie in ci10:59
sean-k-mooneyzigo: have you spoken to the infra team about supporting it when it is released yet?11:01
stephenfinUggla: fyi, looks like there's a bug here: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L13370 (the attribute on the class is called accessmode, not access_mode)11:01
zigoNope, no time for that yet.11:01
zigosean-k-mooney: It doesn't work.11:08
zigoI mean, your patch with self._http_client = base_client.httpclient in network/neutron.py11:09
zigoJust tried ...11:09
sean-k-mooneyack11:10
sean-k-mooneyit was a long shot11:10
zigoThanks for trying ! :)11:10
sean-k-mooneyi'm going to try one other thing quickly: provide 3.13 via pyenv the same way we do for functional tests11:11
zigoI'm guessing there's going to be this kind of issue a bit everywhere anyways, so we need a better global eventlet fix.11:11
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174911:15
sean-k-mooneythat should compile 3.13 from source on all the nodes, although it might fail if we don't have gcc etc. available, but if it does it will fail fast11:17
sean-k-mooneyactually i should add bindep first https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/tox-docs/pre.yaml#L311:18
opendevreviewsean mooney proposed openstack/nova master: [DNM] testing py3.13 eventlet bug workaround  https://review.opendev.org/c/openstack/nova/+/95174911:20
gibizigo: sean-k-mooney: I'm torn on prioritizing chasing down python3.13 eventlet bugs over focusing on removing eventlet from nova. Eventlet is on life support as we speak; investing there is problematic. I know that there is a mismatch between what python version OpenStack supports and what the distros want to use. Still, I firmly believe we are better off removing eventlet than patching it.11:54
sean-k-mooneyfun: installing python 3.13 from source worked, but mysql failed to install properly https://paste.opendev.org/show/bcQL03mSHktFaQf6diQX/11:55
sean-k-mooneygibi: well, in epoxy 3.13 is experimental, but it's in the mandatory testing runtime for 2025.211:55
sean-k-mooneyso there kind of need to be two efforts here11:56
sean-k-mooneywe either need to change the testing runtimes or make it work on 3.13 this cycle, regardless of the eventlet removal11:56
sean-k-mooneygibi: i think you should focus on the eventlet removal11:56
sean-k-mooneybut we need to work with the eventlet maintainers to make sure they have time to fix 3.1311:57
sean-k-mooneyi think that's where zigo and others could help with that effort11:57
sean-k-mooneyif we manage to get master working with 3.13, we can assess if we need to backport stuff to epoxy11:57
sean-k-mooneywe do get clean unit and functional tests11:57
sean-k-mooneyso i don't think things are massively broken in nova11:58
sean-k-mooneyi.e. if the eventlet bugs were fixed, openstack might "just work" without code changes on epoxy11:58
gibiwe are the eventlet maintainers 12:00
sean-k-mooneykind of 12:01
gibiyou can talk to hberaud about it12:01
sean-k-mooneyto me 3.13 support is not really optional12:02
gibiif zigo could help maintaining eventlet in py313 that is a win for sure12:02
sean-k-mooneybut ya, it's partly a priority problem12:02
sean-k-mooneygibi: there is currently a python discussion happening about adding virtual threads to core python, started by the gevent folks12:03
gibiI think the reality is that we have limited eventlet internal knowledge to maintain it even if we find the time12:03
zigoHervé is on PTO this week, so I can't talk to him until Monday.12:04
gibizigo: yepp I know12:04
*** ralonsoh_ is now known as ralonsoh12:36
Ugglastephenfin, thanks for finding this bug. I'll fix it asap.12:39
Ugglagibi, fyi I have reviewed the first patch of your eventlet serie and left comments/questions.14:23
gibiUggla: thanks a lot. I will reply you probably tomorrow14:27
Ugglagibi, sure, no hurry. As I mentioned, it took me a while to review this first patch; I just hope I have not asked too many silly questions?14:29
gibiall questions are useful :)14:31
gibiso no worries14:31
opendevreviewDan Smith proposed openstack/nova master: Remove contrib/clean-on-delete.py  https://review.opendev.org/c/openstack/nova/+/95059217:47
*** mikal8 is now known as mikal21:10
