zigo | hberaud[m]: Did you tag Eventlet 0.40.1? | 11:25 |
---|---|---|
zigo | We're still affected by https://github.com/eventlet/eventlet/commit/e470c1f493a87e867a36ee779573d7cbe964d53b even after the last merge (i.e., the tip of master still has the bug). | 11:34 |
hberaud[m] | zigo: not yet, the release is pending https://github.com/eventlet/eventlet/issues/1051 | 11:36 |
hberaud[m] | Weird... Guillaume confirmed that it solved his problem https://github.com/eventlet/eventlet/issues/1030#issuecomment-2970250157 | 11:37 |
zigo | hberaud[m]: Different issue, no? | 11:38 |
hberaud[m] | It depends on what you are talking about. I thought you were mostly worried about the Nova issue reported by Guillaume in https://github.com/eventlet/eventlet/issues/1030 | 11:40 |
zigo | hberaud[m]: The blocker for me, when I spawn a VM with nova, is this one: | 11:42 |
zigo | https://github.com/eventlet/eventlet/issues/1032 | 11:42 |
zigo | also reported here: | 11:42 |
zigo | https://bugs.launchpad.net/nova/+bug/2103413 | 11:42 |
zigo | We know it's a gc issue because with gc.disable() it kind of works (though gc.disable() is not a viable solution). | 11:42 |
zigo | If that one is fixed, then probably everything else will start working. | 11:43 |
zigo | I don't think it's useful to do a release if that one isn't fixed. | 11:43 |
zigo | https://github.com/eventlet/eventlet/commit/e470c1f493a87e867a36ee779573d7cbe964d53b was supposed to address it, but I don't see any resolution to https://bugs.launchpad.net/nova/+bug/2103413 (i.e., Nova still can't query Neutron because it loses the reference to its Keystone object...). | 11:44 |
zigo | hberaud[m]: Does this make more sense now? | 11:45 |
hberaud[m] | So for now I think this is a different issue from the initial one reported by Guillaume (the fork one). Guillaume told us that this gc problem is indeed still visible even with both the fork patch and the gc patch. We do not have a solution yet, but I think it is worth releasing eventlet in any case; nothing will stop us from making another release later once we have a solution for this problem. | 11:45 |
zigo | As you like, though as far as I'm concerned, we're still in deep trouble! :) | 11:46 |
hberaud[m] | hahaha | 11:46 |
zigo | Thanks again for your work on this though. :) | 11:47 |
hberaud[m] | Thanks, and thanks for your valuable help | 11:47 |
*** croeland1 is now known as croelandt | 12:18 | |
itamarst | if someone can produce a minimal reproducer that would be very helpful | 13:29 |
JayF | zigo: ^ | 14:19 |
zigo | JayF: The only way I know is setting up OpenStack and trying to spawn a VM. :/ | 14:19 |
zigo | I can give the traceback though. | 14:19 |
JayF | itamarst: zigo: if it's easily reproducible in something like DevStack, I could maybe set up a test harness. | 14:19 |
JayF | But I know the goal is probably to avoid troubleshooting the OpenStack side of the problem at all, because isolating what is eventlet and what is Nova would be very difficult, I imagine | 14:20 |
zigo | That's my traceback: | 14:21 |
zigo | https://paste.opendev.org/show/bMyqNoJ43Kshsj6iXu4Y/ | 14:21 |
zigo | itamarst: Does this help? | 14:25 |
itamarst | not really | 14:25 |
itamarst | it would be useful to know whether or not fork() is being used | 14:25 |
itamarst | if this only happens when using fork()... the solution is to not use fork() | 14:27 |
itamarst | if this happens without fork()... an object's dictionary being wiped is... potentially even a bug in Python | 14:30 |
hberaud[m] | Makes sense | 14:32 |
itamarst | another experiment to try | 14:32 |
itamarst | remove __del__ methods | 14:32 |
zigo | Would it help if I tried to bisect interpreter versions? | 14:32 |
itamarst | I would first rule out fork(), then remove __del__ methods | 14:33 |
zigo | In what class ? | 14:33 |
itamarst | there's one in keystoneauth1.session.Session at minimum | 14:33 |
itamarst | but I would check SessionClient too | 14:34 |
itamarst | (https://github.com/python/cpython/issues/135552 is a bug with __del__ in all Python 3 versions, but it may only be exposed in some situations that 3.13 makes more likely. or it may be completely unrelated to what you are seeing) | 14:34 |
itamarst | also worth testing with latest patch release of 3.13 | 14:36 |
itamarst | (and that specifically means _not_ the distro version since they don't ship most bug fixes) | 14:37 |
zigo | Commenting out the __del__() method in keystoneauth1.session.Session has no effect, at least. | 14:37 |
zigo | If you find a patch to apply to the distro version, it's easy to test. Saying "try the latest" is harder. | 14:38 |
itamarst | uv will download them for you | 14:39 |
itamarst | or you can just download from python.org | 14:39 |
itamarst | on ubuntu there's deadsnakes PPA, etc | 14:40 |
zigo | I can try switching from 3.13.3 to 3.13.5. | 14:41 |
zigo | Nope, that doesn't fix it... :/ | 14:43 |
* zigo goes back home. | 14:45 | |
itamarst | how about fork()? | 14:45 |
jkulik | hm ... I was able to reproduce it with this: https://paste.opendev.org/show/bJW0VtRkzfanKa3N4Zso/ - pretty hacked together and needs a working Neutron + Keystone for now | 15:12 |
jkulik | it started happening once I moved the `neutron.get_client()` calls into the `network()` function. I had passed them into `NetworkInfoAsyncWrapper` as arguments previously and that worked. | 15:15 |
itamarst | great. so no fork(). and it works with older versions of python? | 15:18 |
jkulik | hm ... need to test. currently 3.13.2 | 15:19 |
itamarst | another fun and plausible culprit for this is greenlet | 15:19 |
itamarst | depending on whether I understood the problem correctly. where is SessionClient implemented so I can see its source? | 15:21 |
jkulik | seems to work with Python 3.12.9 - no exception | 15:22 |
jkulik | SessionClient should come from here https://github.com/openstack/python-neutronclient/blob/master/neutronclient/client.py#L305 | 15:24 |
itamarst | so doesn't look like endpoint_override is del'd, not seeing anything with __dict__... so does seem like a weird bug | 15:27 |
itamarst | and for all the terrible stuff eventlet does I don't see how it would cause this | 15:28 |
hberaud[m] | jkulik: thanks for feedback | 15:30 |
jkulik | using `[o for o in gc.get_objects() if isinstance(o, t)]` in (Pdb), I can see 2 clients when the exception gets raised. one (the second one) has a completely empty `__dict__` | 15:30 |
itamarst | that's a pretty good bug | 15:33 |
itamarst | so, eventlet honestly seems like an unlikely source for this kind of bug, so it's more likely either greenlet, or CPython | 15:36 |
jkulik | gc.is_finalized() returns False for both objects, fyi | 15:37 |
jkulik | Cannot reproduce with `gc.disable()` added into the code before the first spawn() | 15:41 |
itamarst | an ideal next step would be to find a reproducer that doesn't rely on third party libraries, or third party servers | 15:43 |
itamarst | I do wonder how greenlet and gc interact in edge cases so will spend a few minutes looking at that in a bit | 15:44 |
itamarst | but it could also be a bug elsewhere in greenlet's 3.13 support | 15:44 |
itamarst | or it could be CPython, somehow somewhere | 15:44 |
itamarst | (but if I were a CPython dev my first reflex would be to ask for a reproducer _without_ eventlet) | 15:45 |
itamarst | er | 15:45 |
itamarst | without greenlet | 15:45 |
itamarst | but in any case a reproducer that uses just eventlet would be a very helpful next step | 15:45 |
jkulik | hm ... I was able to reduce the number of clients it needs. one is enough. what I noticed: I get thrown into the breakpoint after the/a client was used once | 16:09 |
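
The two gc experiments from the log (listing live instances of a class with `gc.get_objects()` to spot one whose `__dict__` is empty, and pausing the collector with `gc.disable()`) could be packaged as small stdlib-only helpers for a future reproducer. This is a minimal sketch, not code from eventlet, greenlet, or neutronclient: `find_instances`, `find_wiped_instances`, `gc_paused` and the `Client` class are hypothetical names, `Client` merely stands in for neutronclient's `SessionClient`, and the wiped `__dict__` is simulated by hand rather than triggered by the actual bug.

```python
import contextlib
import gc


def find_instances(cls):
    """Return every GC-tracked live object that is an instance of cls."""
    # Same expression used from pdb in the log: scan all GC-tracked objects.
    return [o for o in gc.get_objects() if isinstance(o, cls)]


def find_wiped_instances(cls):
    """Return live instances of cls whose instance __dict__ is empty.

    Only meaningful for classes whose __init__ always sets attributes,
    so an empty __dict__ means it was cleared after construction.
    """
    return [o for o in find_instances(cls) if not vars(o)]


@contextlib.contextmanager
def gc_paused():
    """Temporarily disable the cyclic GC (a diagnostic, not a fix)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state instead of unconditionally enabling.
        if was_enabled:
            gc.enable()


class Client:
    """Hypothetical stand-in for neutronclient's SessionClient."""

    def __init__(self, endpoint_override):
        self.endpoint_override = endpoint_override


if __name__ == "__main__":
    healthy = Client("http://neutron.example:9696")
    broken = Client("http://neutron.example:9696")
    broken.__dict__.clear()  # simulate the symptom by hand

    for obj in find_instances(Client):
        # gc.is_finalized() (Python 3.9+) reports whether the GC has already
        # run the object's finalizer; the log reports False for both clients.
        print(vars(obj), "finalized:", gc.is_finalized(obj))

    print("wiped:", len(find_wiped_instances(Client)))

    with gc_paused():
        print("gc enabled inside block:", gc.isenabled())
    print("gc enabled after block:", gc.isenabled())
```

In a real reproducer, one could call `find_wiped_instances()` with the actual client class from a breakpoint, and wrap the suspected code path in `gc_paused()` to mirror the `gc.disable()` experiment.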