Tuesday, 2025-12-09

*** mhen_ is now known as mhen02:13
opendevreviewArtem Vasilyev proposed openstack/nova master: Fix functional tests and mypy on macOS  https://review.opendev.org/c/openstack/nova/+/93772707:38
opendevreviewMerged openstack/nova master: Remove openSUSE/SLES from install guide  https://review.opendev.org/c/openstack/nova/+/94932409:31
opendevreviewIvan Anfimov proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node  https://review.opendev.org/c/openstack/nova/+/93932509:33
opendevreviewIvan Anfimov proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node  https://review.opendev.org/c/openstack/nova/+/93932509:33
opendevreviewRajesh Tailor proposed openstack/nova master: Add functional reproducer for bug 2096884  https://review.opendev.org/c/openstack/nova/+/97023811:51
DominikDanelski[m]sean-k-mooney: Sorry, I wasn't on the other end of the discussion, so I'm not sure how it works. Are you normally notified on comments with replies that haven't been resolved by them? I refer to https://review.opendev.org/c/openstack/nova/+/969251/comment/7f85f631_a6a7fdbb/ Of course take your time, I only wanted to confirm that I've done it the right way.11:53
sean-k-mooneyDominikDanelski[m]: so when you make the requested change you should mark it as done. if the comment is just general feedback but does not require a change you can mark it as acknowledged, or you can reply and tick resolved in the UI to show the comment is no longer relevant11:55
sean-k-mooneyfor that comment specifically i will get an email when you update the patch but not necessarily on a comment11:55
sean-k-mooneythe reason i asked you to use an explicit flavor is that it is not obvious why the exception should be raised without finding out what the default flavor is11:56
sean-k-mooneyyou do not have any comment explaining that, so as a reviewer i would have to go digging into the test infra to know if the test should raise or not11:56
noonedeadpunkhey folks. after upgrading to Epoxy I've started seeing a stack trace when attempting to create a gpu-passthrough instance, and I somehow link it to https://opendev.org/openstack/nova/commit/78be1679312768383b684fa70ca4d2f5c4e35fa9#diff-8255edc78feecdbe2d14ab15e4580b4b6de916d211:57
noonedeadpunkthe problem is that self.roots is empty here: https://opendev.org/openstack/nova/src/commit/8a4b000216c7a6c2673af78d7eb7f9bf938dc867/nova/compute/provider_tree.py#L43611:58
noonedeadpunkfrom the placement view - resource provider is obviously there: https://paste.openstack.org/show/bn6ekRrWbg9ggNpuGx4h/11:59
sean-k-mooneyself.roots is not referenced in the commit you linked11:59
noonedeadpunkBut I get `ValueError: No such provider 623a0277-62e4-43bc-acae-964dab595905` on the nova-compute side as a result11:59
sean-k-mooneyit is used in the code but it is not in the changed code12:00
sean-k-mooneynoonedeadpunk: you don't have any downstream backports, do you?12:00
noonedeadpunkno, I don't12:00
sean-k-mooneyit kind of sounds like you might have missed a patch12:00
sean-k-mooneyack12:00
noonedeadpunkbut. stack trace goes through the new _remove_managed_rps_from_tree_not_in_view 12:01
noonedeadpunkhttps://paste.openstack.org/show/bqjIms96L7A7ODUPU33W/12:01
sean-k-mooneythat does not reference self.roots either12:01
sean-k-mooneydo you have pci in placement enabled in this deployment?12:02
noonedeadpunkIsn't that default?12:02
sean-k-mooneyno12:02
noonedeadpunkjust let me quickly check....12:02
sean-k-mooneybut once it's enabled we don't support turning it off12:02
noonedeadpunkI was quite sure it is12:03
noonedeadpunkshoot12:03
noonedeadpunkthanks12:03
noonedeadpunkit was fast12:03
sean-k-mooneywe may have turned it on in the last release or so12:03
noonedeadpunkI bet somebody has removed it...12:03
sean-k-mooneybut i dont know if we did that yet or not12:03
noonedeadpunkI have report_in_placement = True but not pci_in_placement12:04
sean-k-mooneystill false https://docs.openstack.org/nova/latest/configuration/config.html#pci.report_in_placement12:04
noonedeadpunksorry for this stupid thing... 12:04
sean-k-mooneyno, that's actually the option that i wanted12:04
sean-k-mooneyso there are two parts12:04
sean-k-mooneyon the compute you set report_in_placement = True12:04
sean-k-mooneyand in the scheduler you set the other flag to include it in the placement query12:05
noonedeadpunkSeems I don't have an override for `pci_in_placement` at all12:05
sean-k-mooneyif you have report_in_placement=true it should enable this codepath but the error is basically saying it can't find the resource provider12:05
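The two-part setup described above could be checked with a small script like the one below. This is only a sketch under stated assumptions: `report_in_placement` and `pci_in_placement` are the option names from the discussion, the `[pci]` section comes from the linked config reference, the `[filter_scheduler]` section for `pci_in_placement` is an assumption that should be verified against the Nova configuration reference for your release, and `option_enabled` plus the inline config snippets are hypothetical stand-ins for the real nova.conf files.

```python
import configparser

# Hypothetical stand-in for nova.conf on the compute node.
COMPUTE_CONF = """
[pci]
report_in_placement = True
"""

# Hypothetical stand-in for nova.conf on the scheduler node.
# The [filter_scheduler] section name is an assumption; verify it.
SCHEDULER_CONF = """
[filter_scheduler]
pci_in_placement = True
"""


def option_enabled(conf_text, section, option):
    """Return True only if the boolean option is present and set to true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback=False covers both a missing section and a missing option.
    return parser.getboolean(section, option, fallback=False)


compute_ok = option_enabled(COMPUTE_CONF, "pci", "report_in_placement")
scheduler_ok = option_enabled(
    SCHEDULER_CONF, "filter_scheduler", "pci_in_placement")
print("compute reports PCI devices to placement:", compute_ok)
print("scheduler includes PCI in placement query:", scheduler_ok)
```

Treating a missing option as disabled mirrors the point made in the discussion that the compute-side option still defaults to false.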
opendevreviewLajos Katona proposed openstack/nova master: WIP: Use SDK for Neutron Ports  https://review.opendev.org/c/openstack/nova/+/96929812:06
noonedeadpunkbut if it's a scheduler part - scheduler seems to be able to find and supply the resource id12:06
noonedeadpunklet me fix scheduler anyway...12:06
sean-k-mooneynoonedeadpunk: can you check if the resource provider for that compute exists in placement12:07
noonedeadpunkso the id in the stack trace is a child12:07
noonedeadpunkhttps://paste.openstack.org/show/bn6ekRrWbg9ggNpuGx4h/12:07
sean-k-mooneyack ya it should be the child for the pci device12:07
sean-k-mooneyright so it's os-compute-gpu01-az2_0000:25:00.0 12:08
noonedeadpunkyeah12:08
sean-k-mooneythe inventory for the gpu12:08
sean-k-mooneyso can you confirm that that is in your pci devspec and that the address is correct12:08
noonedeadpunk25:00.0 3D controller: NVIDIA Corporation12:09
noonedeadpunkfrom lspci12:09
sean-k-mooneyya that looks normal 12:09
sean-k-mooneyi believe self._remove_managed_rps_from_tree_not_in_view(provider_tree) is intended to remove RPs for pci devices that are no longer in the whitelist12:10
sean-k-mooneyso that is why i was suggesting checking the devspec in the nova.conf on the compute to confirm that is correct12:10
sean-k-mooneythere could legitimately be a bug, i just want to rule out config errors12:12
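The address check being asked for here can be sketched as follows. This is a minimal illustration, not Nova code: it assumes the child resource provider name has the form `<hostname>_<PCI address>` (inferred from the name quoted above), that `lspci` omitted the PCI domain because it is `0000`, and both helper functions are hypothetical.

```python
def normalize_pci_address(address, default_domain="0000"):
    """Expand a short lspci address (bus:slot.func) to domain:bus:slot.func."""
    # A full address has two colons; the short lspci form has only one.
    if address.count(":") == 1:
        return "%s:%s" % (default_domain, address)
    return address


def address_from_rp_name(rp_name):
    """Extract the PCI address from an RP name of the assumed form host_address."""
    # rsplit keeps hostnames that contain underscores intact.
    return rp_name.rsplit("_", 1)[1]


rp_address = address_from_rp_name("os-compute-gpu01-az2_0000:25:00.0")
lspci_address = normalize_pci_address("25:00.0")
addresses_match = rp_address == lspci_address
print(rp_address, lspci_address, addresses_match)
```

The same normalized address is what one would then compare against the address in the device spec in nova.conf on the compute node.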
noonedeadpunkso one thing I am suspicious about....12:13
noonedeadpunkIs that we have a custom trait on top of the resource class in there12:13
noonedeadpunkand this custom trait is applied only to the root device12:14
noonedeadpunkhttps://paste.openstack.org/show/bhtdPwgzYHpzCdoIGe32/12:14
sean-k-mooneyam i don't think that should break anything but maybe12:14
noonedeadpunkah, well, it's passed to children as well12:15
sean-k-mooneywe should be adding COMPUTE_MANAGED_PCI_DEVICE as well12:15
sean-k-mooneycan you check if you see both?12:15
noonedeadpunkyes, I see CUSTOM_GPU_PASSTHROUGH and COMPUTE_MANAGED_PCI_DEVICE traits on both parent and children12:16
noonedeadpunkbut not the resource_class12:16
sean-k-mooneywell the resource class should be the resource class in the inventory12:17
noonedeadpunkBut resource class should not be in traits iirc12:17
sean-k-mooneycorrect12:17
sean-k-mooneyif you do an inventory show on the rp you should see it there12:17
noonedeadpunkopenstack resource class show CUSTOM_A10_FULL returns the name, yes12:17
noonedeadpunkI'm kinda failing to understand at what point `self.roots_by_uuid` is expected to be populated12:20
sean-k-mooneyhttps://opendev.org/openstack/nova/src/commit/8a4b000216c7a6c2673af78d7eb7f9bf938dc867/nova/compute/provider_tree.py#L29212:22
noonedeadpunkit does not look like it's cached, or populated during instance create request12:22
noonedeadpunkbut according to trace it's not called?12:22
sean-k-mooneyit can be populated in other ways too12:22
sean-k-mooneythe trace does not actually show a specific error12:23
sean-k-mooneyjust that the child rp was not found12:23
sean-k-mooneyi assume this is reproducible, i.e. if you restart the nova compute agent it happens again12:23
noonedeadpunkwell, I used epdb, and self.roots is dict_values([])12:24
noonedeadpunkyes, sure12:24
sean-k-mooneyit's returning the values mapping of the dict, yes12:24
noonedeadpunkand self.roots_by_uuid is {}12:25
sean-k-mooneythis should be updated in memory as part of init_host12:25
noonedeadpunkso it feels that it's not populated in fact...12:25
sean-k-mooneyso during init_host we end up running update_available_resource which populates the resource tracker and caches the provider tree12:26
sean-k-mooneythis should be populated before we ever get to _build_and_run_instance12:27
noonedeadpunkunless class is not re-inited at some point....12:29
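The failure mode discussed above, an empty `roots_by_uuid` turning every lookup into "No such provider", can be reproduced with a toy model. This is a deliberately simplified sketch and not Nova's `ProviderTree`: the `ToyProvider` and `ToyProviderTree` classes, the `lookup` helper, the root UUID, and the root name are all hypothetical; only the child UUID, the child name, and the error text come from the conversation.

```python
class ToyProvider:
    """A provider node that can hold child providers."""

    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name
        self.children = {}

    def find(self, uuid):
        """Depth-first search of this node and its descendants."""
        if self.uuid == uuid:
            return self
        for child in self.children.values():
            found = child.find(uuid)
            if found is not None:
                return found
        return None


class ToyProviderTree:
    """Toy tree: every lookup starts from the cached roots."""

    def __init__(self):
        self.roots_by_uuid = {}

    @property
    def roots(self):
        return self.roots_by_uuid.values()

    def new_root(self, name, uuid):
        self.roots_by_uuid[uuid] = ToyProvider(uuid, name)

    def new_child(self, name, parent_uuid, uuid):
        self.find(parent_uuid).children[uuid] = ToyProvider(uuid, name)

    def find(self, uuid):
        for root in self.roots:
            found = root.find(uuid)
            if found is not None:
                return found
        # With no roots cached, every lookup ends up here.
        raise ValueError("No such provider %s" % uuid)


def lookup(tree, uuid):
    """Return the provider name, or the error text if the lookup fails."""
    try:
        return tree.find(uuid).name
    except ValueError as exc:
        return str(exc)


CHILD_UUID = "623a0277-62e4-43bc-acae-964dab595905"  # from the stack trace
ROOT_UUID = "00000000-0000-0000-0000-000000000001"   # hypothetical

empty_tree = ToyProviderTree()
empty_result = lookup(empty_tree, CHILD_UUID)

populated = ToyProviderTree()
populated.new_root("os-compute-gpu01-az2", ROOT_UUID)  # assumed root name
populated.new_child("os-compute-gpu01-az2_0000:25:00.0", ROOT_UUID, CHILD_UUID)
populated_result = lookup(populated, CHILD_UUID)

print(empty_result)
print(populated_result)
```

In this model the child exists in Placement's view of the world but is invisible to the lookup because the in-memory roots were never populated, which matches the symptom reported with `self.roots` being `dict_values([])`.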
opendevreviewDominik proposed openstack/nova master: Regression test for Placement allocations remaining during failed schedule  https://review.opendev.org/c/openstack/nova/+/96925112:36
opendevreviewDominik proposed openstack/nova master: Remove Placement allocations in the broken build cleanup  https://review.opendev.org/c/openstack/nova/+/96844612:36
DominikDanelski[m]sean-k-mooney: All right, I added a comment explaining why the exact flavour doesn't matter, so now all discussion is resolved.12:38
noonedeadpunksean-k-mooney: so I reverted 78be1679312768383b684fa70ca4d2f5c4e35fa9 and instances started spawning normally :(12:39
noonedeadpunkwill submit a bug report in the meanwhile I guess12:40
sean-k-mooneyand do they have allocations against the placement RPs for the gpu usage?12:40
sean-k-mooneyi.e. there were no other side effects?12:41
sean-k-mooneyi'm wondering if we are missing a fix for one of the other caching bugs related to OTU devices12:41
sean-k-mooneyin the stable branch12:41
sean-k-mooneygibi and dan were fixing some other bugs around the same time12:41
noonedeadpunkit looks fine? https://paste.openstack.org/show/bj8ZZgxzHSYHZF2hGFvg/12:44
noonedeadpunkhm, maybe I am missing some backport...12:45
noonedeadpunknah, I think we're on current HEAD of 2025.112:45
sean-k-mooneyya that looks correct to me too12:47
noonedeadpunkcreated report https://bugs.launchpad.net/nova/+bug/213446913:00
DominikDanelski[m]sean-k-mooney: Sorry, then do I understand correctly what you wrote at 12:55 that if I replied to the comment but made no code changes, just as I did now, you wouldn't be informed about it?13:21
DominikDanelski[m]I wouldn't want to constantly bug you here, but then I don't know how best to reach you when I reply.13:31
sean-k-mooneyDominikDanelski[m]: so unless you adjust it in your settings, if you're on a review (reviewer or cc) you will get an email every time someone comments or pushes a new patch13:50
sean-k-mooneyDominikDanelski[m]: and it shows up with your name rather than zuul in the from13:51
sean-k-mooneyi.e. Dominik (Code Review)13:51
sean-k-mooneyso i will be notified by either a comment or a new patchset13:51
sean-k-mooneyi just might not see it for a while depending on how often i check my email13:52
nicolairuckelI'm a bit lost trying to figure out which code gets called for a cold migration. There is the `_cold_migrate` function in manager.py but I kind of expected a call to the driver at some point where I could copy the NVRAM but I can't find anything like that. Am I overlooking something or am I on the wrong track to begin with?15:26
tkajinamnicolairuckel, check migrate_disk_and_power_off in LibvirtDriver15:48
tkajinamI think nova internally calls resize method15:48
nicolairuckeltkajinam: Hm, that's what I tried first but it didn't work. I'm going to check it again. Maybe I did something wrong when I was testing it. Thank you.17:15
opendevreviewNicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store  https://review.opendev.org/c/openstack/nova/+/95968218:03
nicolairuckelI pushed my attempt so maybe someone else sees anything that is obviously wrong there.18:04
nicolairuckelThis is what I tried: https://review.opendev.org/c/openstack/nova/+/959682/comment/2c08400d_baac062c/18:56
opendevreviewZhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations  https://review.opendev.org/c/openstack/nova-specs/+/97029819:31
Zhan[m]sean-k-mooney: As we previously discussed I've submitted the spec for https://bugs.launchpad.net/nova/+bug/2128665, unfortunately didn't make it for 2026.1 so this will be in 2026.2.19:35
opendevreviewZhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations  https://review.opendev.org/c/openstack/nova-specs/+/97029819:42
opendevreviewZhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations  https://review.opendev.org/c/openstack/nova-specs/+/97029819:54
