Wednesday, 2025-01-08

14:08 <itamarst> frickler: re https://github.com/pola-rs/polars/issues/5503, does nova use os.read/os.write? if so, what for, and could it switch to blocking versions?
14:12 <frickler> itamarst: that link looks wrong. also you'll have to check with nova developers, I'm just kind of a bystander
14:12 <itamarst> oops, I mean https://github.com/eventlet/eventlet/pull/975
14:12 <itamarst> what made you think that was it, at least?
14:14 <hberaud[m]> gibi: FYI ^
14:14 <frickler> I think there was something similar in some other project. and nova does a lot of io with libvirt and other things. but essentially it is just a blind guess
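For context on the os.read/os.write question above: eventlet ships cooperative replacements for those calls, and a rough sketch of how one might check for the patched versions and still reach the original blocking ones (not nova code, just an illustration) could look like this:

    # Minimal illustration (not nova code): check whether eventlet has
    # replaced os.read/os.write with its cooperative versions, and reach
    # the original blocking ones via eventlet.patcher.original().
    import eventlet
    import eventlet.patcher

    eventlet.monkey_patch(os=True)   # os.read/os.write now yield to the hub
    import os

    print(eventlet.patcher.is_monkey_patched('os'))   # True after patching

    # The unpatched module is still available; these are the "blocking
    # versions" asked about above.
    real_os = eventlet.patcher.original('os')
    r, w = real_os.pipe()
    real_os.write(w, b'x')           # plain blocking write, no greenlet switch
    print(real_os.read(r, 1))        # b'x'
    real_os.close(r)
    real_os.close(w)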
14:19 <hberaud[m]> I'm not convinced that this github issue is the root cause of the nova failure, because this change was released with eventlet 0.37.0 and the nova cross functional job was green multiple times with 0.37.0 https://review.opendev.org/c/openstack/requirements/+/933257/3
14:19 <hberaud[m]> this job started to fail later
14:20 <hberaud[m]> (If I correctly understand all this history)
14:21 <hberaud[m]> It seems that this job started to fail with 0.38+
14:54 <itamarst> I guess I'll go through all PRs
14:56 <itamarst> between 0.37 and 0.38 there's https://github.com/eventlet/eventlet/pull/988/files which changed threading somewhat, and a change to http requests with empty body that seems less plausible
14:57 <itamarst> https://github.com/eventlet/eventlet/pull/1000/files also changes thread semantics
14:57 <hberaud[m]> would be good to have a reproducer to bisect changes
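One shape such a reproducer could take, if the failure can be driven from a script, is a `git bisect run` helper in an eventlet checkout. In the sketch below, "./reproduce.sh" is a hypothetical wrapper that installs the checked-out eventlet revision into the nova test environment and runs the failing tests:

    #!/usr/bin/env python3
    # Hypothetical `git bisect run` helper for an eventlet checkout.
    # "./reproduce.sh" is a placeholder for whatever installs this eventlet
    # revision into the nova venv and runs the failing tests.
    # git bisect treats exit 0 as "good" and exit 1 as "bad".
    import subprocess
    import sys

    RUNS = 5   # the failure is intermittent, so one green run proves little

    for _ in range(RUNS):
        if subprocess.run(["./reproduce.sh"]).returncode != 0:
            sys.exit(1)   # at least one run failed: bad revision
    sys.exit(0)           # all runs passed: good revision

Something like `git bisect start <bad-tag> <good-tag>` followed by `git bisect run ./bisect_helper.py` would then walk the commits automatically (the helper name and tag names here are assumptions).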
15:17 <itamarst> one test that fails (nova.tests.functional.test_servers.ServersTestV280.test_get_migrations_after_live_migrate_server_in_different_project) works for me locally :/
15:19 <hberaud[m]> That was what I assumed :/
15:19 <itamarst> including when I slow down execution a lot
15:20 <itamarst> (I have a little shell script that uses cgroups to slow things down, can be useful to catch race conditions)
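The script itself isn't shown in the log; a rough sketch of the idea (a guess at it, not the actual script) is to drop the test runner into a cgroup v2 group with a tight CPU quota:

    #!/usr/bin/env python3
    # A guess at the idea, not the actual script: throttle a process with a
    # cgroup v2 CPU quota so timing-dependent races surface more often.
    # Needs root, cgroup2 mounted at /sys/fs/cgroup, and the cpu controller
    # enabled for child groups; the group name "slowtests" is made up here.
    import os
    import sys

    pid = sys.argv[1]                 # PID of the test runner to slow down
    cg = "/sys/fs/cgroup/slowtests"
    os.makedirs(cg, exist_ok=True)

    # Allow 10ms of CPU time per 100ms period, i.e. roughly a 10x slowdown.
    with open(os.path.join(cg, "cpu.max"), "w") as f:
        f.write("10000 100000")

    # Move the process into the group; children it spawns later inherit it.
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(pid)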
15:20 <hberaud[m]> cool
15:21 <hberaud[m]> I put nova folks as "Cc" in the requirements patch https://review.opendev.org/c/openstack/requirements/+/933257/
15:24 <hberaud[m]> I checked all the jobs and the nova job (cross-nova-functional) really started to fail with Patch set 8 and so with 0.38.2
15:24 <hberaud[m]> No failure before
15:24 <hberaud[m]> (for this job)
15:25 <itamarst> trying the stestr run next, just to see if that makes a difference
15:29 <itamarst> it's a lot of tests!
15:32 <itamarst> ok, it _does_ fail if run in stestr (as does another test)
15:32 <itamarst> stestr with all the tests
15:32 <itamarst> so definitely feels very race condition-y
15:33 <hberaud[m]> cool
15:34 <hberaud[m]> what if we run it with 0.38.1
15:34 <hberaud[m]> ?
15:36 <itamarst> that was 0.38.2
15:36 <itamarst> next gonna try 0.37
15:44 <itamarst> first run passed on 0.37, gonna try one more time
15:44 <itamarst> but plausibly it's something that happened post-0.37
15:44 <hberaud[m]> yeah I think the problem appeared after 0.38
15:45 <hberaud[m]> there are 3 versions in 0.38
15:45 <hberaud[m]> 0.38.2 fail
15:45 <hberaud[m]> 0.38.1 ?
15:45 <hberaud[m]> 0.38.0 ?
15:46 <hberaud[m]> 0.37 ok
15:59 <itamarst> 0.38.1 didn't fail with that particular test, so rerunning 0.38.2 again, but it's looking likely that it's 0.38.2 specifically
15:59 <hberaud[m]> yes
16:00 <hberaud[m]> that was my feeling
16:00 <hberaud[m]> it will be a bit easier to isolate
16:00 <hberaud[m]> and surely to fix
16:06 <itamarst> it passed on 0.38.2 this time
16:06 <itamarst> so it's intermittent enough that I'm gonna have to do multiple runs on older versions :cry
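Given the intermittency, a small helper that repeats the suite against whatever eventlet version is currently pinned and counts failures is about the only honest way to compare versions. A sketch (the nova path is a placeholder for this illustration):

    #!/usr/bin/env python3
    # Sketch of a flakiness counter: run the nova functional suite N times
    # with whatever eventlet is currently installed and report how many runs
    # failed. The nova checkout path is a placeholder.
    import subprocess
    import sys

    NOVA_DIR = "/path/to/nova"   # hypothetical nova checkout with tests set up
    RUNS = int(sys.argv[1]) if len(sys.argv) > 1 else 5

    failures = 0
    for i in range(RUNS):
        result = subprocess.run(["stestr", "run"], cwd=NOVA_DIR)
        failed = result.returncode != 0
        failures += failed
        print(f"run {i + 1}/{RUNS}: {'FAIL' if failed else 'ok'}")

    print(f"{failures}/{RUNS} runs failed")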
16:09 <hberaud[m]> :/
16:14 <itamarst> looks like it's present in 0.38.1 too
16:15 <itamarst> next gonna try 0.38.0
16:16 <hberaud[m]> ah
16:34 <itamarst> 0.38.0 also intermittently fails
16:35 <itamarst> so gonna run 0.37 a few more times and see if it's _really_ ok or it's just intermittent
16:35 <itamarst> but if it is, that means the problem was introduced in 0.38.0
16:35 <itamarst> if it is ok, I mean
16:42 <itamarst> looking promising so far
16:43 <itamarst> so the most likely cause is https://github.com/eventlet/eventlet/pull/988
16:45 <itamarst> (gonna run with 0.37.0 a few more times though just to be sure)
16:50 <itamarst> oops
16:50 <itamarst> just triggered the failure in 0.37.0
16:50 <itamarst> so I guess I'm going earlier
16:50 <itamarst> my guess is that if it happens for every version of eventlet, at some point it's just "hey, you have some Issues in nova" and it's plausibly not eventlet's fault
16:50 <JayF> Thank you for digging on it
16:51 <JayF> 0.36.1 is what shipped with dalmatian
16:51 <JayF> that might be a sane point to stop your testing if you were going to draw a line in the sand
16:52 * itamarst nods
17:00 <itamarst> if it is 0.37, candidates are the os.read/os.write thing, and the improved RLock upgrading 😢
17:09 <itamarst> if it's the former... I wonder if the oslo.log PipeMutex fix I did is relevant (still unreleased I think), since it uses that
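For context, the pipe-mutex idea that oslo.log's PipeMutex is built around uses os.pipe/os.read/os.write directly, which is exactly why eventlet's treatment of those calls matters here. A stripped-down illustration of the concept (not the oslo.log code):

    # Stripped-down illustration of a pipe-based mutex, not the oslo.log
    # PipeMutex itself: the lock token is a single byte sitting in a pipe.
    # Acquiring reads the byte (blocking, or yielding to the eventlet hub
    # when os.read is green); releasing writes it back.
    import os

    class TinyPipeMutex:
        def __init__(self):
            self.rfd, self.wfd = os.pipe()
            os.write(self.wfd, b'-')    # one token: the lock starts released

        def acquire(self):
            os.read(self.rfd, 1)        # take the token out of the pipe

        def release(self):
            os.write(self.wfd, b'-')    # put the token back

    mutex = TinyPipeMutex()
    mutex.acquire()
    # ... critical section ...
    mutex.release()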
17:12 <itamarst> fairly certain it's 0.37, 0.36.1 hasn't failed yet
17:38 <JayF> looking re: oslo.log versions/releases
17:39 <JayF> that's a big nope, it's not released
17:40 <JayF> you can see if installing it from git fixes the issue, if so hberaud[m] should have rights to approve a release (you are oslo core, yeah?)
17:40 <JayF> itamarst: ^
17:50 <itamarst> I am testing the version from git
17:51 <itamarst> so far getting good vibes that oslo.log from git unbreaks nova when eventlet 0.38.2 is installed
17:53 <JayF> I'll propose the release then
17:53 <itamarst> let me do a couple more runs
17:53 <JayF> please post your findings on the requirements patch that's busted, if you can
17:53 <JayF> we should release it either way
17:54 <itamarst> I will post my findings once I've successfully gotten no failures a few more times
17:59 <JayF> hberaud[m]: itamarst: https://review.opendev.org/c/openstack/releases/+/938683 proposed, we'll have to bump the oslo.log requirement with the eventlet one when the release hits
18:01 <itamarst> ok, one more run and I'm gonna post
18:10 <itamarst> ok, posted: https://review.opendev.org/c/openstack/requirements/+/933257/comments/4158aa65_b9dfad47
18:12 <JayF> A++++ thank you
