*** ryuji is now known as Guest5954 | 02:07 | |
hemanth | Hi team, unfortunately I could not attend weekly meeting yesterday to ask about one of the item which I added in https://etherpad.opendev.org/p/nova-2025.1-status, appreciate if someone can review https://review.opendev.org/c/openstack/nova/+/939044 and provide feedback, thanks! | 02:53 |
---|---|---|
opendevreview | Mengyang Zhang proposed openstack/nova master: Add Burst Length Support to Cinder QoS https://review.opendev.org/c/openstack/nova/+/939310 | 03:28 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Wrap un-proxied listDevices() and listAllDevices() https://review.opendev.org/c/openstack/nova/+/939317 | 06:46 |
melwitt | sean-k-mooney: stacked this on top of your revert fyi ^ | 06:48 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: Using "dnf" instead of "yum" with uncommenting compute_driver option. https://review.opendev.org/c/openstack/nova/+/939325 | 09:39 |
opendevreview | Ivan Anfimov proposed openstack/nova master: doc: Adding link for RabbitMQ installation during nova deployment on controller node. https://review.opendev.org/c/openstack/nova/+/938702 | 09:55 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: Enabling of using LibvirtDriver for compute node. https://review.opendev.org/c/openstack/nova/+/939325 | 10:02 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node. https://review.opendev.org/c/openstack/nova/+/939325 | 10:14 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node. https://review.opendev.org/c/openstack/nova/+/939325 | 10:21 |
*** ykarel_ is now known as ykarel | 13:43 | |
opendevreview | Mengyang Zhang proposed openstack/nova-specs master: Add Burst Length Support to Cinder QoS https://review.opendev.org/c/openstack/nova-specs/+/932653 | 14:47 |
MengyangZhang[m] | <tkajinam> "ok so I'd ask Mengyang Zhang..." <- Yes, the bp is updated with a description. and the code change is here https://review.opendev.org/c/openstack/nova/+/939310, still need to solve the conflicts tho. | 15:00 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Add Burst Length Support to Cinder QoS https://review.opendev.org/c/openstack/nova/+/939310 | 15:08 |
sean-k-mooney | JayF: im actully questioning if the oslo.log patch did actully fix the event let issue or not. | 15:43 |
sean-k-mooney | i swaped form my debuggin patch to the tip of master because i wanted to do a rebase and if i restart the service instead of getting the blockign call in the first 3-5 second like i was reliablly before it s been running for 5 mins witn no issue | 15:45 |
JayF | Given the nature of the eventlet change, I wonder more if it's "existing bug that's coming to the surface more easily" vs it being a direct eventlet issue | 15:46 |
JayF | I wonder if you have any pipemutex-shaped constructs in nova that might need some patching | 15:46 |
JayF | if you can isolate the case, you can see if you can get itamarst in #openstack-eventlet-removal to look if you get stuck :) | 15:47 |
sean-k-mooney | oh the thing im debuging is in watcher | 15:47 |
sean-k-mooney | not nova | 15:47 |
JayF | aha | 15:47 |
JayF | either way, same comment applies, even though TBH our shared downstream doesn't use watcher :D | 15:47 |
sean-k-mooney | well the thing im trying to debug is actuly related to oslo.db and sqlachemy, eventlet complains that you shoudl not make blocking calls on the main tread | 15:48 |
sean-k-mooney | the blockign call is time.sleep(0) form oslo.db _thread_yeild function | 15:48 |
sean-k-mooney | so this https://github.com/openstack/oslo.db/blob/21c484ffb26ed03cbb1ca9dd7fd27fb19a1d4b36/oslo_db/sqlalchemy/engines.py#L43-L52 | 15:49 |
JayF | I wonder if that should be something more like "if eventlet is imported; sleep(0)" | 15:51 |
JayF | time.sleep(0) unpatched technically is a blocking call, yeah? | 15:51 |
sean-k-mooney | https://paste.opendev.org/show/bGPgfURx1cZYOsgmtDyw/ | 15:51 |
JayF | I'm really, really curious if changing that to an eventlet.sleep(0) in your test fixes that | 15:52 |
sean-k-mooney | that is the error but it iss patched as far as i could tell | 15:52 |
JayF | sean-k-mooney: where does the patch happen? | 15:53 |
sean-k-mooney | JayF: this is where the runtim error is comign form https://github.com/eventlet/eventlet/blob/3450e6adb017f4cd47d1c4e06331c9fa26a986d1/eventlet/greenthread.py#L35 | 15:53 |
JayF | sean-k-mooney: I'm looking for it, I see no calls to https://github.com/openstack/watcher/blob/master/watcher/eventlet.py | 15:53 |
JayF | hmm, yeah, this is patching late | 15:53 |
JayF | I suspect this is like, half eventlet patched | 15:54 |
sean-k-mooney | https://github.com/openstack/watcher/blob/master/watcher/cmd/__init__.py#L29 | 15:54 |
sean-k-mooney | JayF: its not patching late | 15:54 |
JayF | is watcher wsgi? | 15:54 |
sean-k-mooney | only the api | 15:54 |
sean-k-mooney | the rest is not | 15:54 |
JayF | hmm | 15:54 |
sean-k-mooney | anyway checkign out master and restarting everything with the new eventlest release im not seeign the problem anymore | 15:55 |
sean-k-mooney | im going to check the latest ci run and see if we see the error in the logs | 15:55 |
JayF | I am always happy to be a sounding board for eventlet stuff, but at some of this level I can't keep it all straight | 15:55 |
sean-k-mooney | ah dambit | 15:56 |
sean-k-mooney | it just triggred in my debug console | 15:56 |
JayF | is it possible this is whatever the opposite of a hesenbug is? the debug console itself causing the headache? | 15:57 |
sean-k-mooney | this is why its so annoying to debug. its been fine without the issue for the 5-10 mins we have been chatting | 15:57 |
sean-k-mooney | we are seeing it in the gate too | 15:57 |
sean-k-mooney | i did a partial fix patch that was enough to make the tempest jobs pass | 15:58 |
sean-k-mooney | then then i was inpsectign the logs and notice that we still had those runtime errors | 15:58 |
sean-k-mooney | so i think my orgianl "fix" just chagned the timing and made it less problematic | 15:58 |
JayF | I'm really having trouble wrapping my head around eventlet code | 16:00 |
JayF | that change you linked | 16:00 |
JayF | eventlet.sleep() is everywhere | 16:01 |
JayF | and making it so we only call that from coroutines (if I read the code right) seems absolutely brutal | 16:01 |
JayF | which is why I assumed at first that it must be somehow unpatched | 16:01 |
sean-k-mooney | right we document if you ever do something long running add eventlet.sleep(0) | 16:01 |
sean-k-mooney | to yeild | 16:01 |
sean-k-mooney | or time.sleep if you want to rely on the monkey patching | 16:02 |
JayF | maybe it's worth asking about that pattern in light of recent code changes in an eventlet issue? | 16:02 |
sean-k-mooney | oh i never looked at the history of this | 16:03 |
sean-k-mooney | https://github.com/eventlet/eventlet/commit/48c259c98edb1eb671afeda9800ca4db17b3ccbf | 16:03 |
JayF | if this is new in this eventlet release, it implies maybe there's a misalignment between how we use it and how they think we use it | 16:03 |
sean-k-mooney | it was maide a runtime error not an assert | 16:03 |
JayF | so it still would've failed, yes? | 16:04 |
sean-k-mooney | no before it was just an assert | 16:04 |
sean-k-mooney | so it woudl have done nothing at runtime | 16:04 |
sean-k-mooney | assert are off unless you run your interperter in debug mode | 16:04 |
sean-k-mooney | this was added in 0.36.0 | 16:05 |
sean-k-mooney | JayF: so oslo.db injects a time.sleep(0) on every "checkin" event emmited by the sqlalchemy engine | 16:07 |
sean-k-mooney | https://github.com/openstack/oslo.db/blob/21c484ffb26ed03cbb1ca9dd7fd27fb19a1d4b36/oslo_db/sqlalchemy/engines.py#L360-L361 | 16:07 |
JayF | aha | 16:07 |
JayF | sean-k-mooney: I'd file a bug against that change then in eventlet | 16:07 |
JayF | sean-k-mooney: it's going to break so much stuff in so many sneaky ways | 16:08 |
sean-k-mooney | JayF: nova's docs have this by the way https://docs.openstack.org/nova/latest/reference/threading.html#yielding-the-thread-in-long-running-tasks | 16:09 |
JayF | so my comment of "replace the time.sleep() with an if eventlet: eventlet.sleep()" was pretty close to a solution, seemingly? | 16:10 |
sean-k-mooney | we have documented the use of time.sleep(0) or greenthread.sleep(0) since before we supproted python3 | 16:10 |
JayF | yeah, so let the eventlet folks know so they can revert the problematic change :) | 16:10 |
JayF | or at least maybe except sleep(0) | 16:10 |
sean-k-mooney | JayF: i dont think it will actully change the behivor | 16:11 |
sean-k-mooney | what i can do is try locally reverting it in my venv | 16:11 |
sean-k-mooney | and see what happens | 16:11 |
sean-k-mooney | well hum i guess that wont tell me much | 16:12 |
sean-k-mooney | it will likely fix the issue but that because asssert are ignored | 16:12 |
sean-k-mooney | JayF: ill file an eventlet bug and push a revert patch, this is the watcher bug i had previously filed for this https://bugs.launchpad.net/watcher/+bug/2086710 | 16:13 |
JayF | I suspect the nice middle ground might be | 16:13 |
JayF | RuntimeError() if the sleep is nonzero | 16:13 |
JayF | but IMBW | 16:14 |
JayF | I also don't fully understand the case they're guarding against in eventlet | 16:14 |
JayF | or maybe I should say; why it's bad to do that | 16:14 |
sean-k-mooney | ya i think that could work. | 16:14 |
sean-k-mooney | what we are really using sleep(0) for is to yeild form teh current corotie back to the hub | 16:14 |
sean-k-mooney | so you could say well we shoudl have done hub.swtich() or the equivlent | 16:15 |
sean-k-mooney | but im pretty sure the eventlet docs used ot docuemtn time.sleep(0) too | 16:15 |
sean-k-mooney | JayF: ah the docsting says it all https://github.com/eventlet/eventlet/blob/3450e6adb017f4cd47d1c4e06331c9fa26a986d1/eventlet/greenthread.py#L25-L27 | 16:16 |
JayF | yeah | 16:17 |
sean-k-mooney | JayF: so ya i think they just forgot to make this condtional | 16:17 |
JayF | ++ | 16:17 |
JayF | If you can file that issue and/or put up a PR; I'll point itamar at it | 16:17 |
JayF | I can't tell him to work on it, but he's generally cooperative :D | 16:18 |
sean-k-mooney | thanks for the fresh persepctive i didnt think to see if this was a eventlet regression just because i expected this to kill nova and other things too | 16:18 |
JayF | I've moved beyond "who regressed" and into "the whole thing is so damn fragile any change can have massive ripple effects" | 16:23 |
JayF | lol | 16:23 |
JayF | is it eventlet's fault? is it openstack's fault? yes. all of us. lol | 16:24 |
sean-k-mooney | well i tought it was an issue with apschduelr because it does nto actully supprot uage with eventlet and is spwaning backgorund threads | 16:24 |
sean-k-mooney | os i tried to fix it by ensuring the backgound thread was monkey patched initally | 16:25 |
sean-k-mooney | JayF: watcher is in one optional part of the code also mixing asyncio with eventlet without usign the asyncio hub | 16:25 |
sean-k-mooney | so i was going down the line "ok watcher is mixing may concurance models so it must be a watcher issue becuase nova is ok" | 16:26 |
MengyangZhang[m] | Hello, I have a question about git review. I was following the Gerrit workflow tutorial https://docs.opendev.org/opendev/infra-manual/latest/sandbox.html#sandbox and ran into this error when running git review. I... (full message at <https://matrix.org/oftc/media/v1/media/download/AYfP68eKdfAp4GdhSOGMJNoxIcgRvFjHe-ES5V35PeyBtFAsEz-1GYjZH3yveUoKSXQ1CvR5Y8ZkyioBMNdqdLtCeUsuVqzQAG1hdHJpeC5vcmcvUHNabUJzV29SVVdnS2FwYUtYYm11d3RR>) | 16:29 |
JayF | MengyangZhang[m]: that's a new one on me, perhaps #opendev would have more insight? | 16:30 |
MengyangZhang[m] | okk thanks for quick response. I will raise it there. | 16:30 |
JayF | good luck! | 16:31 |
sean-k-mooney | MengyangZhang[m] so i think this is related to the verion of git you have installed | 16:31 |
MengyangZhang[m] | mine is 2.39.2 | 16:31 |
sean-k-mooney | i have not hit that but you coudl work around it by doing a local rebase agaisnt master before doing "git revew" | 16:32 |
sean-k-mooney | for what its worth we do nto allow merge commits to be part of a review | 16:32 |
sean-k-mooney | i have git version 2.46.0 | 16:33 |
sean-k-mooney | so your version is not that old | 16:33 |
JayF | I've absolutely used git review w/git 2.39.2 at some point, I think it's unlikely to be that | 16:33 |
JayF | sean-k-mooney may have id'd the issue -- if there are any merge commits in your push | 16:33 |
sean-k-mooney | you probaly dont want pull.rebase=merges in your git cofnig either | 16:34 |
sean-k-mooney | i woudl set that to pull.rebase=true | 16:35 |
sean-k-mooney | or just unset it to get the default behvior | 16:35 |
MengyangZhang[m] | Thanks for the help! So i did a local rebase agains master and tried git review again but still same error | 16:38 |
MengyangZhang[m] | ``` | 16:38 |
MengyangZhang[m] | * Thanks for the help! So i did a local rebase agains master and tried git review again but still same error | 16:39 |
MengyangZhang[m] | git rebase master | 16:39 |
MengyangZhang[m] | Current branch mzhang-branch is up to date. | 16:39 |
JayF | Look at the git config changes that Sean suggested, perhaps? | 16:42 |
sean-k-mooney | so you shoudl do it like this | 16:43 |
sean-k-mooney | git fetch --all | 16:43 |
sean-k-mooney | git rebase -i gerrit/master | 16:43 |
sean-k-mooney | git review | 16:44 |
sean-k-mooney | if your local master is behind origin or gerrit master there may be a remote merge conflict that you are not seeing | 16:44 |
sean-k-mooney | JayF: ill see if i can find time to push a ptch to add teh if check later but i need to step away for a bit, this is the bug https://github.com/eventlet/eventlet/issues/1014 | 16:47 |
JayF | sean-k-mooney: sounds like Itamar agrees with our suggested fix, will work on it soon :) | 16:54 |
sean-k-mooney | oh awsome | 16:55 |
sean-k-mooney | im going to try hackign it in locally to confirm that it does nto fail later but that woudl be good if they had time | 16:56 |
JayF | it's a good fix regardless of if it resolves the watcher issue | 16:59 |
JayF | but hopefully it cleans up both | 16:59 |
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:08 | |
opendevreview | ribaudr proposed openstack/nova-specs master: Fix a line that was not removed before merging the specification https://review.opendev.org/c/openstack/nova-specs/+/939380 | 18:16 |
opendevreview | Dmitriy Chubinidze proposed openstack/nova master: Correct description for AggregateMultiTenancyIsolation. https://review.opendev.org/c/openstack/nova/+/939398 | 21:19 |
opendevreview | Mengyang Zhang proposed openstack/nova master: Add Burst Length Support to Cinder QoS https://review.opendev.org/c/openstack/nova/+/939310 | 21:36 |
opendevreview | melanie witt proposed openstack/nova master: libvirt pwr mgmt: reproduce bug 2085122 https://review.opendev.org/c/openstack/nova/+/932922 | 23:37 |
opendevreview | melanie witt proposed openstack/nova master: pwr mgmt: power down free PCPUS when updating compute node https://review.opendev.org/c/openstack/nova/+/932926 | 23:37 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!