itamarst | frickler: re https://github.com/pola-rs/polars/issues/5503, does nova use os.read/os.write? if so, what for, and could it switch to blocking versions? | 14:08 |
---|---|---|
frickler | itamarst: that link looks wrong. also you'll have to check with nova developers, I'm just kind of a bystander | 14:12 |
itamarst | oops, I mean https://github.com/eventlet/eventlet/pull/975 | 14:12 |
itamarst | what made you think that was it, at least? | 14:12 |
hberaud[m] | gibi: FYI ^ | 14:14 |
frickler | I think there was something similar in some other project. and nova does a lot of io with libvirt and other things. but essentially it is just a blind guess | 14:14 |
hberaud[m] | I'm not convinced that this github issue is the root cause of the nova failure, because this change was released with eventlet 0.37.0 and the nova cross functional job was green multiple times with 0.37.0 https://review.opendev.org/c/openstack/requirements/+/933257/3 | 14:19 |
hberaud[m] | this job started to fail later | 14:19 |
hberaud[m] | (If I correctly understand all this history) | 14:20 |
hberaud[m] | It seems that this job started to fail with 0.38+ | 14:21 |
itamarst | I guess I'll go through all PRs | 14:54 |
itamarst | between 0.37 and 0.38 there's https://github.com/eventlet/eventlet/pull/988/files which changed threading somewhat, and a change to http requests with empty body that seems less plausible | 14:56 |
itamarst | https://github.com/eventlet/eventlet/pull/1000/files also changes thread semantics | 14:57 |
hberaud[m] | would be good to have a reproducer to bisect changes | 14:57 |
itamarst | one test that fails (nova.tests.functional.test_servers.ServersTestV280.test_get_migrations_after_live_migrate_server_in_different_project) works for me locally :/ | 15:17 |
hberaud[m] | That was what I assumed :/ | 15:19 |
itamarst | including when I slow down execution a lot | 15:19 |
itamarst | (I have a little shell script that uses cgroups to slow things down, can be useful to catch race conditions) | 15:20 |
hberaud[m] | cool | 15:20 |
hberaud[m] | I put nova folks as "Cc" in the requirements patch https://review.opendev.org/c/openstack/requirements/+/933257/ | 15:21 |
hberaud[m] | I checked all the jobs and the nova job (cross-nova-functional) really started to fail with Patch set 8 and so with 0.38.2 | 15:24 |
hberaud[m] | No failures before | 15:24 |
hberaud[m] | (for this job) | 15:24 |
itamarst | trying the stestr run next, just to see if that makes a difference | 15:25 |
itamarst | it's a lot of tests! | 15:29 |
itamarst | ok, it _does_ fail if run in stestr (as does another test) | 15:32 |
itamarst | stestr with all the tests | 15:32 |
itamarst | so definitely feels very race condition-y | 15:32 |
hberaud[m] | cool | 15:33 |
hberaud[m] | what if we run it with 0.38.1 | 15:34 |
hberaud[m] | ? | 15:34 |
itamarst | that was 0.38.2 | 15:36 |
itamarst | next gonna try 0.37 | 15:36 |
itamarst | first run passed on 0.37, gonna try one more time | 15:44 |
itamarst | but plausibly it's something that happened post-0.37 | 15:44 |
hberaud[m] | yeah I think the problem appeared after 0.38 | 15:44 |
hberaud[m] | there are 3 versions in 0.38 | 15:45 |
hberaud[m] | 0.38.2 fail | 15:45 |
hberaud[m] | 0.38.1 ? | 15:45 |
hberaud[m] | 0.38.0 ? | 15:45 |
hberaud[m] | 0.37 ok | 15:46 |
itamarst | 0.38.1 didn't fail with that particular test, so rerunning 0.38.2 again but looking likely that it's 0.38.2 specifically | 15:59 |
hberaud[m] | yes | 15:59 |
hberaud[m] | that was my feeling | 16:00 |
hberaud[m] | it will be a bit easier to isolate | 16:00 |
hberaud[m] | and surely to fix | 16:00 |
itamarst | it passed on 0.38.2 this time | 16:06 |
itamarst | so it's intermittent enough that I'm gonna have to do multiple runs on older versions :cry | 16:06 |
hberaud[m] | :/ | 16:09 |
itamarst | looks like it's present in 0.38.1 too | 16:14 |
itamarst | next gonna try 0.38.0 | 16:15 |
hberaud[m] | ah | 16:16 |
itamarst | 0.38.0 also intermittently fails | 16:34 |
itamarst | so gonna run 0.37 a few more times and see if it's _really_ ok or it's just intermittent | 16:35 |
itamarst | but if it is, that means the problem was introduced in 0.38.0 | 16:35 |
itamarst | if it is ok I mean | 16:35 |
itamarst | looking promising so far | 16:42 |
itamarst | so most likely reason is https://github.com/eventlet/eventlet/pull/988 | 16:43 |
itamarst | (gonna run with 0.37.0 a few more times though just to be sure) | 16:45 |
itamarst | oops | 16:50 |
itamarst | just triggered the failure in 0.37.0 | 16:50 |
itamarst | so guess I'm going earlier | 16:50 |
itamarst | at a guess, if it happens for every version of eventlet at some point, it's just "hey, you have some issues in nova" and it's plausibly not eventlet's fault | 16:50 |
JayF | Thank you for digging on it | 16:50 |
JayF | 0.36.1 is what shipped with dalmatian | 16:51 |
JayF | that might be a sane point to stop your testing if you were going to draw a line in the sand | 16:51 |
* itamarst nods | 16:52 |
itamarst | if it is 0.37, candidates are the os.read/os.write thing, and the improved RLock upgrading 😢 | 17:00 |
itamarst | if it's the former... I wonder if oslo.log PipeMutex fix I did is relevant (still unreleased I think) since it uses that | 17:09 |
itamarst | fairly certain it's 0.37, 0.36.1 hasn't failed yet | 17:12 |
JayF | looking re: oslo log versions/releases | 17:38 |
JayF | that's a big nope, it's not released | 17:39 |
JayF | you can see if installing it from git fixes the issue, if so hberaud[m] should have rights to approve a release (you are oslo core, yeah?) | 17:40 |
JayF | itamarst: ^ | 17:40 |
itamarst | I am testing version from git | 17:50 |
itamarst | so far getting good vibes that oslo.log from git unbreaks nova when eventlet 0.38.2 is installed | 17:51 |
JayF | I'll propose the release then | 17:53 |
itamarst | let me do a couple more runs | 17:53 |
JayF | please post your findings on the requirements patch that's busted, if you can | 17:53 |
JayF | we should release it either way | 17:53 |
itamarst | I will post my findings once I've successfully gotten no failures a few more times | 17:54 |
JayF | hberaud[m]: itamarst: https://review.opendev.org/c/openstack/releases/+/938683 proposed, we'll have to bump the oslo.log requirement with the eventlet one when the release hits | 17:59 |
itamarst | ok one more run and I'm gonna post | 18:01 |
itamarst | ok, posted: https://review.opendev.org/c/openstack/requirements/+/933257/comments/4158aa65_b9dfad47 | 18:10 |
JayF | A++++ thank you | 18:12 |
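
The version walk in this log (0.38.2, 0.38.1, 0.38.0, 0.37.0, then 0.36.1) ran into the usual problem with a flaky test: a single passing run says little, which is why 0.37.0 looked clean until 16:50. Below is a minimal Python sketch of the arithmetic and of the newest-to-oldest walk, assuming each full test-suite run fails independently with a fixed per-version probability (an idealization; real race conditions are not that tidy). `run_once`, `runs_needed`, `oldest_failing_version` and the failure rates in the demo are all hypothetical illustrations; none of them are part of stestr, nova, or eventlet.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def pass_streak_probability(failure_rate: float, runs: int) -> float:
    """Probability that a version failing independently with `failure_rate`
    per run nevertheless passes `runs` runs in a row."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs after which a version with the given per-run
    failure rate has shown at least one failure with probability >= confidence."""
    if not 0.0 < failure_rate <= 1.0:
        raise ValueError("failure_rate must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    n = 0
    while 1.0 - pass_streak_probability(failure_rate, n) < confidence:
        n += 1
    return n


def oldest_failing_version(
    versions_newest_first: Sequence[str],
    run_once: Callable[[str], bool],
    runs: int,
) -> Optional[str]:
    """Walk from the newest version backwards and return the oldest version
    that failed at least once. Stops at the first version that passes all
    `runs` runs, which is only "probably good", never proven good."""
    oldest_bad: Optional[str] = None
    for version in versions_newest_first:
        # any() stops at the first failing run, so bad versions cost fewer runs.
        failed = any(not run_once(version) for _ in range(runs))
        if not failed:
            break
        oldest_bad = version
    return oldest_bad


if __name__ == "__main__":
    # Hypothetical per-run failure rates, NOT measured values from this log.
    simulated_rates: Dict[str, float] = {
        "0.38.2": 0.3, "0.38.1": 0.3, "0.38.0": 0.3, "0.37.0": 0.3, "0.36.1": 0.0,
    }
    rng = random.Random(0)

    def simulated_run(version: str) -> bool:
        """Stand-in for one full stestr run; True means the run passed."""
        return rng.random() >= simulated_rates[version]

    n = runs_needed(0.3, 0.95)
    print("runs per version:", n)
    print("oldest failing version:",
          oldest_failing_version(list(simulated_rates), simulated_run, n))
```

Under the made-up 30% failure rate, `runs_needed(0.3, 0.95)` returns 9: a version has to pass nine runs in a row before "it passed" is 95% trustworthy, while three clean runs would still happen about 34% of the time (0.7³ ≈ 0.343) on a broken version. The linear walk mirrors what was done by hand here; a true bisection would need the same per-version run count at every step.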