Wednesday, 2025-01-15

*** ryuji is now known as Guest595402:07
hemanthHi team, unfortunately I could not attend weekly meeting yesterday to ask about one of the item which I added in https://etherpad.opendev.org/p/nova-2025.1-status, appreciate if someone can review https://review.opendev.org/c/openstack/nova/+/939044 and provide feedback, thanks!02:53
opendevreviewMengyang Zhang proposed openstack/nova master: Add Burst Length Support to Cinder QoS  https://review.opendev.org/c/openstack/nova/+/93931003:28
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Wrap un-proxied listDevices() and listAllDevices()  https://review.opendev.org/c/openstack/nova/+/93931706:46
melwittsean-k-mooney: stacked this on top of your revert fyi ^06:48
opendevreviewDmitriy Chubinidze proposed openstack/nova master: Using "dnf" instead of "yum" with uncommenting compute_driver option.  https://review.opendev.org/c/openstack/nova/+/93932509:39
opendevreviewIvan Anfimov proposed openstack/nova master: doc: Adding link for RabbitMQ installation during nova deployment on controller node.  https://review.opendev.org/c/openstack/nova/+/93870209:55
opendevreviewDmitriy Chubinidze proposed openstack/nova master: Enabling of using LibvirtDriver for compute node.  https://review.opendev.org/c/openstack/nova/+/93932510:02
opendevreviewDmitriy Chubinidze proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node.  https://review.opendev.org/c/openstack/nova/+/93932510:14
opendevreviewDmitriy Chubinidze proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node.  https://review.opendev.org/c/openstack/nova/+/93932510:21
*** ykarel_ is now known as ykarel13:43
opendevreviewMengyang Zhang proposed openstack/nova-specs master: Add Burst Length Support to Cinder QoS  https://review.opendev.org/c/openstack/nova-specs/+/93265314:47
MengyangZhang[m]<tkajinam> "ok so I'd ask Mengyang Zhang..." <- Yes, the bp is updated with a description. and the code change is here https://review.opendev.org/c/openstack/nova/+/939310, still need to solve the conflicts tho.15:00
opendevreviewTakashi Kajinami proposed openstack/nova master: Add Burst Length Support to Cinder QoS  https://review.opendev.org/c/openstack/nova/+/93931015:08
sean-k-mooneyJayF: im actully questioning if the oslo.log patch did actully fix the event let issue or not. 15:43
sean-k-mooneyi swaped form my debuggin patch to the tip of master because i wanted to do a rebase and if i restart the service instead of getting the blockign call in the first 3-5 second like i was reliablly before it s been running for 5 mins witn no issue15:45
JayFGiven the nature of the eventlet change, I wonder more if it's "existing bug that's coming to the surface more easily" vs it being a direct eventlet issue15:46
JayFI wonder if you have any pipemutex-shaped constructs in nova that might need some patching15:46
JayFif you can isolate the case, you can see if you can get itamarst in #openstack-eventlet-removal to look if you get stuck :) 15:47
sean-k-mooneyoh the thing im debuging is in watcher15:47
sean-k-mooneynot nova15:47
JayFaha15:47
JayFeither way, same comment applies, even though TBH our shared downstream doesn't use watcher :D 15:47
sean-k-mooneywell the thing im trying to debug is actuly related to oslo.db and sqlachemy, eventlet complains that you shoudl not make blocking calls on the main tread15:48
sean-k-mooneythe blockign call is time.sleep(0) form oslo.db _thread_yeild function15:48
sean-k-mooneyso this https://github.com/openstack/oslo.db/blob/21c484ffb26ed03cbb1ca9dd7fd27fb19a1d4b36/oslo_db/sqlalchemy/engines.py#L43-L5215:49
JayFI wonder if that should be something more like "if eventlet is imported; sleep(0)"15:51
JayFtime.sleep(0) unpatched technically is a blocking call, yeah? 15:51
sean-k-mooneyhttps://paste.opendev.org/show/bGPgfURx1cZYOsgmtDyw/15:51
JayFI'm really, really curious if changing that to an eventlet.sleep(0) in your test fixes that15:52
sean-k-mooneythat is the error but it iss patched as far as i could tell15:52
JayFsean-k-mooney: where does the patch happen?15:53
sean-k-mooneyJayF: this is where the runtim error is comign form https://github.com/eventlet/eventlet/blob/3450e6adb017f4cd47d1c4e06331c9fa26a986d1/eventlet/greenthread.py#L3515:53
JayFsean-k-mooney: I'm looking for it, I see no calls to https://github.com/openstack/watcher/blob/master/watcher/eventlet.py15:53
JayFhmm, yeah, this is patching late15:53
JayFI suspect this is like, half eventlet patched15:54
sean-k-mooneyhttps://github.com/openstack/watcher/blob/master/watcher/cmd/__init__.py#L2915:54
sean-k-mooneyJayF: its not patching late15:54
JayFis watcher wsgi?15:54
sean-k-mooneyonly the api15:54
sean-k-mooneythe rest is not15:54
JayFhmm15:54
sean-k-mooneyanyway checkign out master and restarting everything with the new eventlest release im not seeign the problem anymore15:55
sean-k-mooneyim going to check the latest ci run and see if we see the error in the logs15:55
JayFI am always happy to be a sounding board for eventlet stuff, but at some of this level I can't keep it all straight15:55
sean-k-mooneyah dambit15:56
sean-k-mooneyit just triggred in my debug console15:56
JayFis it possible this is whatever the opposite of a hesenbug is? the debug console itself causing the headache?15:57
sean-k-mooneythis is why its so annoying to debug. its been fine without the issue for the 5-10 mins we have been chatting15:57
sean-k-mooneywe are seeing it in the gate too15:57
sean-k-mooneyi did a partial fix patch that was enough to make the tempest jobs pass15:58
sean-k-mooneythen then i was inpsectign the logs and notice that we still had those runtime errors15:58
sean-k-mooneyso i think my orgianl "fix" just chagned the timing and made it less problematic15:58
JayFI'm really having trouble wrapping my head around eventlet code16:00
JayFthat change you linked16:00
JayFeventlet.sleep() is everywhere16:01
JayFand making it so we only call that from coroutines (if I read the code right) seems absolutely brutal16:01
JayFwhich is why I assumed at first that it must be somehow unpatched16:01
sean-k-mooneyright we document if you ever do something long running add eventlet.sleep(0)16:01
sean-k-mooneyto yeild16:01
sean-k-mooneyor time.sleep if you want to rely on the monkey patching16:02
JayFmaybe it's worth asking about that pattern in light of recent code changes in an eventlet issue?16:02
sean-k-mooneyoh i never looked at the history of this16:03
sean-k-mooneyhttps://github.com/eventlet/eventlet/commit/48c259c98edb1eb671afeda9800ca4db17b3ccbf16:03
JayFif this is new in this eventlet release, it implies maybe there's a misalignment between how we use it and how they think we use it16:03
sean-k-mooneyit was maide a runtime error not an assert16:03
JayFso it still would've failed, yes?16:04
sean-k-mooneyno before it was just an assert16:04
sean-k-mooneyso it woudl have done nothing at runtime16:04
sean-k-mooneyassert are  off unless you run your interperter in debug mode16:04
sean-k-mooneythis was added in 0.36.016:05
sean-k-mooneyJayF: so oslo.db injects a time.sleep(0) on every "checkin" event emmited by the sqlalchemy engine16:07
sean-k-mooneyhttps://github.com/openstack/oslo.db/blob/21c484ffb26ed03cbb1ca9dd7fd27fb19a1d4b36/oslo_db/sqlalchemy/engines.py#L360-L36116:07
JayFaha16:07
JayFsean-k-mooney: I'd file a bug against that change then in eventlet16:07
JayFsean-k-mooney: it's going to break so much stuff in so many sneaky ways16:08
sean-k-mooneyJayF: nova's docs have this by the way https://docs.openstack.org/nova/latest/reference/threading.html#yielding-the-thread-in-long-running-tasks16:09
JayFso my comment of "replace the time.sleep() with an if eventlet: eventlet.sleep()" was pretty close to a solution, seemingly?16:10
sean-k-mooneywe have documented the use of time.sleep(0) or greenthread.sleep(0) since before we supproted python316:10
JayFyeah, so let the eventlet folks know so they can revert the problematic change :)16:10
JayFor at least maybe except sleep(0) 16:10
sean-k-mooneyJayF: i dont think it will actully change the behivor16:11
sean-k-mooneywhat i can do is try locally reverting it in my venv16:11
sean-k-mooneyand see what happens16:11
sean-k-mooneywell hum i guess that wont tell me much16:12
sean-k-mooneyit will likely fix the issue but that because asssert are ignored16:12
sean-k-mooneyJayF: ill file an eventlet bug and push a revert patch, this is the watcher bug i had previously filed for this https://bugs.launchpad.net/watcher/+bug/208671016:13
JayFI suspect the nice middle ground might be 16:13
JayFRuntimeError() if the sleep is nonzero16:13
JayFbut IMBW16:14
JayFI also don't fully understand the case they're guarding against in eventlet16:14
JayFor maybe I should say; why it's bad to do that16:14
sean-k-mooneyya i think that could work.16:14
sean-k-mooneywhat we are really using sleep(0) for is to yeild form teh current corotie back to the hub16:14
sean-k-mooneyso you could say well we shoudl have done hub.swtich() or the equivlent16:15
sean-k-mooneybut im pretty sure the eventlet docs used ot docuemtn time.sleep(0) too16:15
sean-k-mooneyJayF: ah the docsting says it all https://github.com/eventlet/eventlet/blob/3450e6adb017f4cd47d1c4e06331c9fa26a986d1/eventlet/greenthread.py#L25-L2716:16
JayFyeah16:17
sean-k-mooneyJayF: so ya i think they just forgot to make this condtional 16:17
JayF++16:17
JayFIf you can file that issue and/or put up a PR; I'll point itamar at it16:17
JayFI can't tell him to work on it, but he's generally cooperative :D 16:18
sean-k-mooneythanks for the fresh persepctive i didnt think to see if this was a eventlet regression just because i expected this to kill nova and other things too16:18
JayFI've moved beyond "who regressed" and into "the whole thing is so damn fragile any change can have massive ripple effects"16:23
JayFlol16:23
JayFis it eventlet's fault? is it openstack's fault? yes. all of us. lol16:24
sean-k-mooneywell i tought it was an issue with apschduelr because it does nto actully supprot uage with eventlet and is spwaning backgorund threads16:24
sean-k-mooneyos i tried to fix it by ensuring the backgound thread was monkey patched initally16:25
sean-k-mooneyJayF: watcher is in one optional part of the code also mixing asyncio with eventlet without usign the asyncio hub16:25
sean-k-mooneyso i was going down the line "ok watcher is mixing may concurance models so it must be a watcher issue becuase nova is ok"16:26
MengyangZhang[m]Hello, I have a question about git review. I was following the Gerrit workflow tutorial https://docs.opendev.org/opendev/infra-manual/latest/sandbox.html#sandbox and ran into this error when running git review. I... (full message at <https://matrix.org/oftc/media/v1/media/download/AYfP68eKdfAp4GdhSOGMJNoxIcgRvFjHe-ES5V35PeyBtFAsEz-1GYjZH3yveUoKSXQ1CvR5Y8ZkyioBMNdqdLtCeUsuVqzQAG1hdHJpeC5vcmcvUHNabUJzV29SVVdnS2FwYUtYYm11d3RR>)16:29
JayFMengyangZhang[m]: that's a new one on me, perhaps #opendev would have more insight? 16:30
MengyangZhang[m]okk thanks for quick response. I will raise it there. 16:30
JayFgood luck!16:31
sean-k-mooney MengyangZhang[m]  so i think this is related to the verion of git you have installed16:31
MengyangZhang[m]mine is 2.39.216:31
sean-k-mooneyi have not hit that but you coudl work around it by doing a local rebase agaisnt master before doing "git revew"16:32
sean-k-mooneyfor what its worth we do nto allow merge commits to be part of a review16:32
sean-k-mooneyi have git version 2.46.016:33
sean-k-mooneyso your version is not that old16:33
JayFI've absolutely used git review w/git 2.39.2 at some point, I think it's unlikely to be that16:33
JayFsean-k-mooney may have id'd the issue -- if there are any merge commits in your push16:33
sean-k-mooneyyou probaly dont want pull.rebase=merges in your git cofnig either16:34
sean-k-mooneyi woudl set that to pull.rebase=true16:35
sean-k-mooneyor just unset it to get the default behvior16:35
MengyangZhang[m]Thanks for the help! So i did a local rebase agains master and tried git review again but still same error16:38
MengyangZhang[m]```16:38
MengyangZhang[m]* Thanks for the help! So i did a local rebase agains master and tried git review again but still same error16:39
MengyangZhang[m]git rebase master16:39
MengyangZhang[m]Current branch mzhang-branch is up to date.16:39
JayFLook at the git config changes that Sean suggested, perhaps?16:42
sean-k-mooneyso you shoudl do it like this16:43
sean-k-mooneygit fetch --all16:43
sean-k-mooneygit rebase -i gerrit/master16:43
sean-k-mooneygit review16:44
sean-k-mooneyif your local master is behind origin or gerrit master there may be a remote merge conflict that you are not seeing16:44
sean-k-mooneyJayF: ill see if i can find time to push a ptch to add teh if check later but i need to step away for a bit, this is the bug https://github.com/eventlet/eventlet/issues/101416:47
JayFsean-k-mooney: sounds like Itamar agrees with our suggested fix, will work on it soon :)16:54
sean-k-mooneyoh awsome16:55
sean-k-mooneyim going to try hackign it in locally to confirm that it does nto fail later but that woudl be good if they had time16:56
JayFit's a good fix regardless of if it resolves the watcher issue16:59
JayFbut hopefully it cleans up both16:59
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server.17:08
opendevreviewribaudr proposed openstack/nova-specs master: Fix a line that was not removed before merging the specification  https://review.opendev.org/c/openstack/nova-specs/+/93938018:16
opendevreviewDmitriy Chubinidze proposed openstack/nova master: Correct description for AggregateMultiTenancyIsolation.  https://review.opendev.org/c/openstack/nova/+/93939821:19
opendevreviewMengyang Zhang proposed openstack/nova master: Add Burst Length Support to Cinder QoS  https://review.opendev.org/c/openstack/nova/+/93931021:36
opendevreviewmelanie witt proposed openstack/nova master: libvirt pwr mgmt: reproduce bug 2085122  https://review.opendev.org/c/openstack/nova/+/93292223:37
opendevreviewmelanie witt proposed openstack/nova master: pwr mgmt: power down free PCPUS when updating compute node  https://review.opendev.org/c/openstack/nova/+/93292623:37

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!