| Nick | Message | Time |
|---|---|---|
| | *** mhen_ is now known as mhen | 02:13 |
| opendevreview | Artem Vasilyev proposed openstack/nova master: Fix functional tests and mypy on macOS https://review.opendev.org/c/openstack/nova/+/937727 | 07:38 |
| opendevreview | Merged openstack/nova master: Remove openSUSE/SLES from install guide https://review.opendev.org/c/openstack/nova/+/949324 | 09:31 |
| opendevreview | Ivan Anfimov proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node https://review.opendev.org/c/openstack/nova/+/939325 | 09:33 |
| opendevreview | Ivan Anfimov proposed openstack/nova master: doc: Enabling of using LibvirtDriver for compute node https://review.opendev.org/c/openstack/nova/+/939325 | 09:33 |
| opendevreview | Rajesh Tailor proposed openstack/nova master: Add functional reproducer for bug 2096884 https://review.opendev.org/c/openstack/nova/+/970238 | 11:51 |
| DominikDanelski[m] | sean-k-mooney: Sorry, I wasn't on the other end of the discussion, so I'm not sure how it works. Are you normally notified on comments with replies that haven't been resolved by them? I refer to https://review.opendev.org/c/openstack/nova/+/969251/comment/7f85f631_a6a7fdbb/ Of course take your time, I only wanted to confirm that I've done it the right way. | 11:53 |
| sean-k-mooney | DominikDanelski[m] so when you make the requested change you should mark it as done. if the comment is just general feedback but does not require a change you can mark it as acknowledged, or you can reply and tick resolved in the ui to show the comment is no longer relevant | 11:55 |
| sean-k-mooney | for that comment specifically i will get an email when you update the patch but not necessarily on a comment | 11:55 |
| sean-k-mooney | the reason i asked you to use an explicit flavor is that it is not obvious why the exception should be raised without finding what the default flavor is | 11:56 |
| sean-k-mooney | you do not have any comment explaining that so as a reviewer i would have to go digging into the test infra to know if the test should raise or not | 11:56 |
| noonedeadpunk | hey folks. after upgrading to Epoxy I've started seeing a stack trace when attempting to create a gpu-passthrough instance, and I somehow link it to https://opendev.org/openstack/nova/commit/78be1679312768383b684fa70ca4d2f5c4e35fa9#diff-8255edc78feecdbe2d14ab15e4580b4b6de916d2 | 11:57 |
| noonedeadpunk | the problem is that self.roots is empty here: https://opendev.org/openstack/nova/src/commit/8a4b000216c7a6c2673af78d7eb7f9bf938dc867/nova/compute/provider_tree.py#L436 | 11:58 |
| noonedeadpunk | from the placement view - resource provider is obviously there: https://paste.openstack.org/show/bn6ekRrWbg9ggNpuGx4h/ | 11:59 |
| sean-k-mooney | self.roots is not referenced in the commit you linked | 11:59 |
| noonedeadpunk | But I get `ValueError: No such provider 623a0277-62e4-43bc-acae-964dab595905` on the nova-compute side as a result | 11:59 |
| sean-k-mooney | it is used in the code but it is not in the changed code | 12:00 |
| sean-k-mooney | noonedeadpunk: you dont have any downstream backports do you? | 12:00 |
| noonedeadpunk | no, I don't | 12:00 |
| sean-k-mooney | it kind of sounds like you might have missed a patch | 12:00 |
| sean-k-mooney | ack | 12:00 |
| noonedeadpunk | but. stack trace goes through the new _remove_managed_rps_from_tree_not_in_view | 12:01 |
| noonedeadpunk | https://paste.openstack.org/show/bqjIms96L7A7ODUPU33W/ | 12:01 |
| sean-k-mooney | that does not reference self.roots either | 12:01 |
| sean-k-mooney | do you have pci in placement enabled in this deployment | 12:02 |
| noonedeadpunk | Isn't that default? | 12:02 |
| sean-k-mooney | no | 12:02 |
| noonedeadpunk | just let me quickly check.... | 12:02 |
| sean-k-mooney | but once its enabled we dont support turning it off | 12:02 |
| noonedeadpunk | I was quite sure it is | 12:03 |
| noonedeadpunk | shoot | 12:03 |
| noonedeadpunk | thanks | 12:03 |
| noonedeadpunk | it was fast | 12:03 |
| sean-k-mooney | we may have turned it on in the last release or so | 12:03 |
| noonedeadpunk | I bet somebody has removed it... | 12:03 |
| sean-k-mooney | but i dont know if we did that yet or not | 12:03 |
| noonedeadpunk | I have report_in_placement = True but not pci_in_placement | 12:04 |
| sean-k-mooney | still false https://docs.openstack.org/nova/latest/configuration/config.html#pci.report_in_placement | 12:04 |
| noonedeadpunk | sorry for this stupid thing... | 12:04 |
| sean-k-mooney | no thats actually the option that i wanted | 12:04 |
| sean-k-mooney | so there are two parts | 12:04 |
| sean-k-mooney | on the compute you set report_in_placement = True | 12:04 |
| sean-k-mooney | and in the scheduler you set the other flag to include it in the placement query | 12:05 |
| noonedeadpunk | Seems I don't have an override for `pci_in_placement` at all | 12:05 |
| sean-k-mooney | if you have report_in_placement=true it should enable this codepath but the error is basically saying it cant find the resource provider | 12:05 |
| opendevreview | Lajos Katona proposed openstack/nova master: WIP: Use SDK for Neutron Ports https://review.opendev.org/c/openstack/nova/+/969298 | 12:06 |
| noonedeadpunk | but if it's a scheduler part - scheduler seems to be able to find and supply the resource id | 12:06 |
| noonedeadpunk | let me fix scheduler anyway... | 12:06 |
| sean-k-mooney | noonedeadpunk: can you check if the resource provider for that compute exists in placement | 12:07 |
| noonedeadpunk | so the id in the stack trace is a child | 12:07 |
| noonedeadpunk | https://paste.openstack.org/show/bn6ekRrWbg9ggNpuGx4h/ | 12:07 |
| sean-k-mooney | ack ya it should be the child for the pci device | 12:07 |
| sean-k-mooney | right so its os-compute-gpu01-az2_0000:25:00.0 | 12:08 |
| noonedeadpunk | yeah | 12:08 |
| sean-k-mooney | the inventory for the gpu | 12:08 |
| sean-k-mooney | so can you confirm that that is in your pci devspec and that the address is correct | 12:08 |
| noonedeadpunk | 25:00.0 3D controller: NVIDIA Corporation | 12:09 |
| noonedeadpunk | from lspci | 12:09 |
| sean-k-mooney | ya that looks normal | 12:09 |
| sean-k-mooney | i believe self._remove_managed_rps_from_tree_not_in_view(provider_tree) is intended to be used to remove RPs for pci devices that are no longer in the whitelist | 12:10 |
| sean-k-mooney | so that is why i was suggesting checking the devspec in the nova.conf on the compute to confirm that is correct | 12:10 |
| sean-k-mooney | there could legitimately be a bug i just want to rule out config errors | 12:12 |
| noonedeadpunk | so one thing I am suspicious about.... | 12:13 |
| noonedeadpunk | Is that we have a custom trait on top of the resource class in there | 12:13 |
| noonedeadpunk | and this custom trait is applied only to the root device | 12:14 |
| noonedeadpunk | https://paste.openstack.org/show/bhtdPwgzYHpzCdoIGe32/ | 12:14 |
| sean-k-mooney | am i dont think that should break anything but maybe | 12:14 |
| noonedeadpunk | ah, well, it's passed to children as well | 12:15 |
| sean-k-mooney | we should be adding COMPUTE_MANAGED_PCI_DEVICE as well | 12:15 |
| sean-k-mooney | can you check if you see both? | 12:15 |
| noonedeadpunk | yes, I see CUSTOM_GPU_PASSTHROUGH and COMPUTE_MANAGED_PCI_DEVICE traits on both parent and children | 12:16 |
| noonedeadpunk | but not the resource_class | 12:16 |
| sean-k-mooney | well the resource class should be the resource class in the inventory | 12:17 |
| noonedeadpunk | But resource class should not be in traits iirc | 12:17 |
| sean-k-mooney | correct | 12:17 |
| sean-k-mooney | if you do an inventory show on the rp you should see it there | 12:17 |
| noonedeadpunk | openstack resource class show CUSTOM_A10_FULL returns the name, yes | 12:17 |
| noonedeadpunk | I'm kinda failing to understand at what point `self.roots_by_uuid` is expected to be populated | 12:20 |
| sean-k-mooney | https://opendev.org/openstack/nova/src/commit/8a4b000216c7a6c2673af78d7eb7f9bf938dc867/nova/compute/provider_tree.py#L292 | 12:22 |
| noonedeadpunk | it does not look like it's cached, or populated during instance create request | 12:22 |
| noonedeadpunk | but according to trace it's not called? | 12:22 |
| sean-k-mooney | it can be populated in other ways too | 12:22 |
| sean-k-mooney | the trace does not actually show a specific error | 12:23 |
| sean-k-mooney | just that the child rp was not found | 12:23 |
| sean-k-mooney | i assume this is reproducible i.e. if you restart the nova compute agent it happens again | 12:23 |
| noonedeadpunk | well, I used epdb, and self.roots is dict_values([]) | 12:24 |
| noonedeadpunk | yes, sure | 12:24 |
| sean-k-mooney | its returning the values mapping of the dict yes | 12:24 |
| noonedeadpunk | and self.roots_by_uuid is {} | 12:25 |
| sean-k-mooney | this should be updated in memory as part of init_host | 12:25 |
| noonedeadpunk | so it feels that it's not populated in fact... | 12:25 |
| sean-k-mooney | so during init_host we end up running update_available_resource which populates the resource tracker and caches the provider tree | 12:26 |
| sean-k-mooney | this should be populated before we ever get to _build_and_run_instance | 12:27 |
| noonedeadpunk | unless class is not re-inited at some point.... | 12:29 |
| opendevreview | Dominik proposed openstack/nova master: Regression test for Placement allocations remaining during failed schedule https://review.opendev.org/c/openstack/nova/+/969251 | 12:36 |
| opendevreview | Dominik proposed openstack/nova master: Remove Placement allocations in the broken build cleanup https://review.opendev.org/c/openstack/nova/+/968446 | 12:36 |
| DominikDanelski[m] | sean-k-mooney: All right, I added a comment explaining why the exact flavour doesn't matter, so now all discussion is resolved. | 12:38 |
| noonedeadpunk | sean-k-mooney: so I reverted 78be1679312768383b684fa70ca4d2f5c4e35fa9 and the instance started spawning normally :( | 12:39 |
| noonedeadpunk | will submit a bug report in the meantime I guess | 12:40 |
| sean-k-mooney | and do they have allocations against the placement RPs for the gpu usage? | 12:40 |
| sean-k-mooney | i.e. there were no other side effects? | 12:41 |
| sean-k-mooney | im wondering if we are missing a fix for one of the other caching bugs related to OTU devices | 12:41 |
| sean-k-mooney | in the stable branch | 12:41 |
| sean-k-mooney | gibi and dan were fixing some other bugs around the same time | 12:41 |
| noonedeadpunk | it looks fine? https://paste.openstack.org/show/bj8ZZgxzHSYHZF2hGFvg/ | 12:44 |
| noonedeadpunk | hm, maybe I am missing some backport... | 12:45 |
| noonedeadpunk | nah, I think we're on current HEAD of 2025.1 | 12:45 |
| sean-k-mooney | ya that looks correct to me too | 12:47 |
| noonedeadpunk | created report https://bugs.launchpad.net/nova/+bug/2134469 | 13:00 |
| DominikDanelski[m] | sean-k-mooney: Sorry, then do I understand correctly what you wrote at 12:55 that if I replied to the comment but made no code changes, just as I did now, you wouldn't be informed about it? | 13:21 |
| DominikDanelski[m] | I wouldn't want to constantly bug you here, but then I don't know how best to reach you when I reply. | 13:31 |
| sean-k-mooney | DominikDanelski[m]: so unless you adjust it in your settings, if you're on a review (reviewer or cc) you will get an email every time someone comments or pushes a new patch | 13:50 |
| sean-k-mooney | DominikDanelski[m]: and it shows up with your name rather than zuul in the from | 13:51 |
| sean-k-mooney | i.e. Dominik (Code Review) | 13:51 |
| sean-k-mooney | so i will be notified by either a comment or a new patchset | 13:51 |
| sean-k-mooney | i just might not see it for a while depending on how often i check my email | 13:52 |
| nicolairuckel | I'm a bit lost trying to figure out which code gets called for a cold migration. There is the `_cold_migrate` function in manager.py but I kind of expected a call to the driver at some point where I could copy the NVRAM but I can't find anything like that. Am I overlooking something or am I on the wrong track to begin with? | 15:26 |
| tkajinam | nicolairuckel, check migrate_disk_and_power_off in LibvirtDriver | 15:48 |
| tkajinam | I think nova internally calls resize method | 15:48 |
| nicolairuckel | tkajinam: Hm, that's what I tried first but it didn't work. I'm going to check it again. Maybe I did something wrong when I was testing it. Thank you. | 17:15 |
| opendevreview | Nicolai Ruckel proposed openstack/nova master: Preserve UEFI NVRAM variable store https://review.opendev.org/c/openstack/nova/+/959682 | 18:03 |
| nicolairuckel | I pushed my attempt so maybe someone else sees anything that is obviously wrong there. | 18:04 |
| nicolairuckel | This is what I tried: https://review.opendev.org/c/openstack/nova/+/959682/comment/2c08400d_baac062c/ | 18:56 |
| opendevreview | Zhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations https://review.opendev.org/c/openstack/nova-specs/+/970298 | 19:31 |
| Zhan[m] | sean-k-mooney: As we previously discussed I've submitted the spec for https://bugs.launchpad.net/nova/+bug/2128665, unfortunately didn't make it for 2026.1 so this will be in 2026.2. | 19:35 |
| opendevreview | Zhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations https://review.opendev.org/c/openstack/nova-specs/+/970298 | 19:42 |
| opendevreview | Zhan Zhang proposed openstack/nova-specs master: Refine network setup procedure in live migrations https://review.opendev.org/c/openstack/nova-specs/+/970298 | 19:54 |
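
The two-part PCI-in-placement setup described around 12:04 (a compute-side `report_in_placement` flag plus a scheduler-side `pci_in_placement` flag) can be checked with a short script. This is a minimal sketch, not part of nova: `pci_placement_flags` and `SAMPLE_CONF` are hypothetical names, the sample config text is invented for illustration, and placing `pci_in_placement` in the `[filter_scheduler]` group with a default of `False` is an assumption not stated in the log.

```python
import configparser

# Hypothetical nova.conf contents, for illustration only; this is not the
# configuration of the deployment discussed in the log.
SAMPLE_CONF = """
[pci]
report_in_placement = True

[filter_scheduler]
# Assumption: the scheduler-side flag lives in this group.
pci_in_placement = False
"""


def pci_placement_flags(conf_text):
    """Return the two PCI-in-placement flags found in an INI-style config.

    Missing sections or options fall back to False (assumed default).
    """
    # Disable interpolation so literal '%' characters in other options
    # do not raise while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        "report_in_placement": parser.getboolean(
            "pci", "report_in_placement", fallback=False),
        "pci_in_placement": parser.getboolean(
            "filter_scheduler", "pci_in_placement", fallback=False),
    }


flags = pci_placement_flags(SAMPLE_CONF)
for name, enabled in flags.items():
    print(f"{name}: {enabled}")
```

Run against the compute and scheduler configs separately, a mismatch like the one in the sample (reporting enabled, scheduler flag unset) would show up as one `True` and one `False`.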