jkulik | I might have found the trigger of the problem. Not sure why exactly that makes it fail with Python 3.13, but at least I cannot reproduce the bug anymore in 100 runs with this patch: https://paste.opendev.org/show/bTK1viFTIu7tZOhwFOch/ | 06:41 |
---|---|---|
jkulik | without this patch, any client has an endless list of base_client.base_client.base_client. those eventually get their __dict__ cleared for some reason, while the ClientWrapper keeps its own. the httpclient (SessionClient) doesn't necessarily get the __dict__ emptied, though. | 06:47 |
zigo | Let me try this! :) | 06:48 |
zigo | \o/ | 06:51 |
zigo | jkulik: Got my first VM spawn on Trixie ! :) | 06:51 |
jkulik | nice! | 06:51 |
zigo | So what is it that happened?!? | 06:51 |
jkulik | I'd still put my money on the Python bug someone mentioned yesterday https://github.com/python/cpython/issues/135552 | 06:52 |
jkulik | But I'm not sure how to reduce the scope of the reproducer to something usefully small :/ | 06:53 |
zigo | Well, I haven't seen Python crashing, so that doesn't seem matching. | 06:53 |
jkulik | "GC doing stuff it shouldn't with circular references" is the summary of that bug for me | 06:54 |
jkulik | hm ... I got the problem reduced quite drastically: https://paste.opendev.org/show/bdXenYhvSBiC7jDazkkc/ | 07:30 |
jkulik | so this looks like a change in how Python works - possibly a bug. I'm not sure how to move forward with that information though. | 07:32 |
zigo | jkulik: Well done! This feels like a Python bug indeed. | 08:09 |
zigo | I'm forwarding this to the #debian-python IRC channel. | 08:09 |
jkulik | https://github.com/python/cpython/issues/130327 | 08:10 |
jkulik | ^ looks pretty much like our bug | 08:10 |
frickler | nice detective work, cool | 08:15 |
zigo | jkulik: Well done, really !!! | 09:43 |
zigo | Though there's no patch upstream for that fix yet, it seems. | 09:43 |
hberaud[m] | jkulik: Kudos, well done, I forwarded your observations on the eventlet side https://github.com/eventlet/eventlet/issues/1032#issuecomment-2987260478 | 09:44 |
jkulik | hberaud[m]: I've seen it. good writeup, thank you! | 09:45 |
hberaud[m] | jkulik: No no thanks to YOU :) | 09:45 |
zigo | Yeah, good summary ! :) | 09:46 |
hberaud[m] | I think we want to wait for Guillaume's acknowledgment before closing this Eventlet issue, though. | 09:47 |
hberaud[m] | thanks | 09:47 |
zigo | Stefano (aka: tumbleweed) bisected it to one of these commits: https://github.com/python/cpython/commit/c32dc47aca6e8fac152699bc613e015c44ccdba9 | 09:49 |
zigo | Gosh, this is highly unreadable... | 09:53 |
gibi | nice work folks! the description of that CPython commit the bisect pointed at feels relevant to the problem we see. | 10:02 |
itamarst | ooh, exciting to see a reproducer | 12:32 |
itamarst | I would suggest adding a note to the CPython issue explaining that this is affecting real-world software | 12:32 |
itamarst | volunteer project and all that but might be motivating | 12:33 |
hberaud[m] | +1 | 12:33 |
hberaud[m] | As I linked https://github.com/python/cpython/issues/135552 into the eventlet issue https://github.com/eventlet/eventlet/issues/1032#issuecomment-2987260478 our story now appears attached on the CPython side of the conversation. | 12:35 |
hberaud[m] | Plus, Victor Stinner is a former oslo/openstack/eventlet maintainer, so I think there's a chance he'll see that it bubbled up on our side. | 12:37 |
hberaud[m] | I talk with Victor from time to time, so I can ping him directly if you want; otherwise we can simply continue the conversation publicly on this existing bug, as you prefer | 12:38 |
hberaud[m] | s/existing CPython bug/ (to be exact) | 12:39 |
itamarst | I'd start with just a comment in the issue, cause other people might see it, and then follow up with reaching out to him? | 13:41 |
*** | croeland1 is now known as croelandt | 13:56 |
gibi | jkulik: what do you think about a potential nova fix along the lines of https://paste.opendev.org/show/bsbO0vM2R1zqsYfJl8Cr/ ? my local testing shows that it works for 3.12 but I don't have a 3.13 env. | 14:21 |
gibi | your original fix with the __dict__.copy() seems to build on the existing notion of grabbing another object's internals and getting surprised by the result. I would rather not touch those internals but use the official plumbing to access it via getattr. | 14:29 |
jkulik | gibi: With that patch, I cannot reproduce the bug, so looks good | 14:30 |
jkulik | I agree that the .copy() isn't a good fix | 14:30 |
gibi | jkulik: thanks for confirming | 14:57 |
gibi | I think we should make this fix in nova regardless of the outcome of the cpython bug | 14:57 |
gibi | I can propose a fix or I can be one of the cores adding +2 ;) | 14:58 |
gibi | s/a fix/the fix/ | 14:58 |
itamarst | working around it is a good idea because even if e.g. it's fixed in 3.13.6, it's difficult to guarantee which patch version people will use in practice | 19:50 |
itamarst | (just as a general rule, I imagine OpenStack deployments are much more controlled environments than average, but still) | 19:51 |
JayF | I think we're more likely than other projects to be run against a patched distro python :) and given my experience with how slow stable updates can be.... yeah | 19:56 |
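
The two approaches discussed above (a wrapper that shares another object's `__dict__` versus one that delegates through `getattr`) can be illustrated with a minimal sketch. The linked pastes are not reproduced here, so everything below is an assumption: `Backend`, `AliasingWrapper`, and `DelegatingWrapper` are hypothetical names and not the actual nova code. The sketch only shows why aliasing `__dict__` produces the endless `base_client.base_client...` chain (a reference cycle) and how delegation avoids touching the wrapped object's internals. It does not reproduce the Python 3.13 behavior itself.

```python
class Backend:
    """Hypothetical stand-in for the wrapped client object."""

    def __init__(self):
        self.endpoint = "http://example.invalid"


class AliasingWrapper:
    """Fragile pattern: reuse the wrapped object's attribute dict."""

    def __init__(self, base_client):
        # Both objects now share one dict ...
        self.__dict__ = base_client.__dict__
        # ... so this assignment also sets base_client.base_client,
        # making the wrapped object reference itself (a cycle).
        self.base_client = base_client


class DelegatingWrapper:
    """Alternative: keep a private dict and forward unknown attributes."""

    def __init__(self, base_client):
        self.base_client = base_client

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails. Guard against
        # infinite recursion if base_client itself is not set yet.
        if name == "base_client":
            raise AttributeError(name)
        return getattr(self.base_client, name)


backend = Backend()
aliased = AliasingWrapper(backend)
# The wrapped object ended up pointing at itself.
print(backend.base_client is backend)
print(aliased.base_client.base_client.base_client is backend)

other_backend = Backend()
delegated = DelegatingWrapper(other_backend)
# Attribute access still works, and the wrapped object is untouched.
print(delegated.endpoint)
print(hasattr(other_backend, "base_client"))
```

With delegation the wrapper never writes into the wrapped object's namespace, so no self-referencing cycle is created for the garbage collector to deal with, whichever CPython patch release is in use.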