sean-k-mooney[m] | mnaser: that is something that libvirt, not nova, has always been in charge of | 00:31 |
sean-k-mooney[m] | intel can change the cpu flags via microcode, and when it comes to tsx they have on several occasions | 00:31 |
sean-k-mooney[m] | nova for the most part tries to leave all cpu compatibility checking to the hypervisor the virt driver is managing | 00:32 |
sean-k-mooney[m] | the exception to this is the abstraction we have via traits | 00:33 |
sean-k-mooney[m] | nova, specifically the libvirt driver, uses the libvirt api to introspect the cpu to report traits | 00:35 |
sean-k-mooney[m] | we do not use cpuid or msrs to detect this, as it's libvirt's job to unify the feature flags in a vendor-independent way | 00:35 |
sean-k-mooney[m] | and it's also libvirt's job to determine cpu compatibility, for the most part, in live migration. | 00:36 |
sean-k-mooney[m] | nova has never used the current cpu model or cpu flags to make scheduling decisions for migrations | 00:36 |
sean-k-mooney[m] | if you configure required traits in the flavor or image we can take those into account, but the current cpu flags of a vm are not an input into the scheduling decision. | 00:37 |
sean-k-mooney[m] | and they never have been; that has always been delegated to the operator to enforce using host aggregates | 00:38 |
sean-k-mooney[m] | so nova has not changed in this regard since i started working on openstack for the most part. | 00:39 |
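To make the traits point concrete: scheduling only honours traits the operator explicitly requests, never the guest's current CPU flags. The sketch below assumes nova's `trait:<NAME>=required|forbidden` flavor extra spec syntax; the helper names are hypothetical and the matching is a simplification, not placement's actual query logic.

```python
# Hypothetical sketch: which hosts satisfy the trait requests in a flavor.
# Assumes nova's "trait:<NAME>=required|forbidden" extra spec syntax; the
# matching below is a simplification, not placement's real query logic.


def traits_from_extra_specs(extra_specs: dict[str, str]) -> tuple[set[str], set[str]]:
    """Split trait extra specs into (required, forbidden) trait names."""
    required: set[str] = set()
    forbidden: set[str] = set()
    for key, value in extra_specs.items():
        if not key.startswith("trait:"):
            continue  # not a trait request, e.g. hw:cpu_policy
        name = key[len("trait:"):]
        if value == "required":
            required.add(name)
        elif value == "forbidden":
            forbidden.add(name)
    return required, forbidden


def host_matches(host_traits: set[str], extra_specs: dict[str, str]) -> bool:
    """True if the host reports every required trait and no forbidden one.

    Only traits the operator asked for are checked; the guest's current
    CPU flags play no part in the decision.
    """
    required, forbidden = traits_from_extra_specs(extra_specs)
    return required <= host_traits and not (forbidden & host_traits)


if __name__ == "__main__":
    specs = {"trait:HW_CPU_X86_AVX2": "required", "hw:cpu_policy": "dedicated"}
    print(host_matches({"HW_CPU_X86_AVX2", "HW_CPU_X86_AVX512F"}, specs))  # True
    print(host_matches({"HW_CPU_X86_AVX512F"}, specs))  # False
```

Anything not expressed as a trait (for example, a host whose microcode update silently dropped a flag) falls outside this check, which is why host aggregates remain the operator's tool for it.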
opendevreview | Sylvain Bauza proposed openstack/nova master: api: Drop generating a keypair and add special chars to naming https://review.opendev.org/c/openstack/nova/+/849133 | 07:33 |
bauzas | gibi: sean-k-mooney: I'm done with the keypair generation removal | 07:33 |
bauzas | given Uggla's patches seem good for unshelve, I'll rebase my branch on top of Uggla's unshelve API change once he rebases | 07:34 |
bauzas | gibi: sean-k-mooney: I'll actually be off from tonight to next week (I'll also take Friday) | 07:35 |
gibi | bauzas: thanks for moving your stuff on top of Uggla's, that is nice selflessness | 07:38 |
gibi | bauzas: have a nice PTO, do you have something on your PTL table we should keep in mind while you are away? | 07:39 |
bauzas | gibi: nothing in my mind, I'll abandon the yoga open specs next week then | 07:39 |
bauzas | we're on zed-2 on Thursday | 07:40 |
bauzas | July-14 | 07:40 |
bauzas | that will mean we won't accept new specs | 07:40 |
bauzas | but actually, we don't have a lot of them for zed | 07:40 |
gibi | OK, so I guess you will do the official freeze mail when you are back on Monday. That is KO | 07:42 |
gibi | OK | 07:42 |
bauzas | yup | 07:42 |
bauzas | unless you wanna use the axe | 07:42 |
gibi | nope | 07:43 |
gibi | the axe is yours :) | 07:43 |
gibi | and as you said we don't have much open | 07:43 |
gibi | so there is no need for the axe | 07:43 |
bauzas | gibi: yeah, in my email, I'll clarify the situation https://review.opendev.org/q/project:openstack/nova-specs+status:open+file:%255Especs/zed/.* | 07:45 |
bauzas | about ironic's discussion, this won't need to be held by the deadline | 07:46 |
bauzas | so only artom's spec is impacted... unless he's able to revive it before the deadline | 07:46 |
bauzas | I don't know if sean-k-mooney had wheels for https://review.opendev.org/c/openstack/nova-specs/+/821419 | 07:47 |
bauzas | gibi: about the API changes, those are stacking | 07:49 |
bauzas | with the same API microversion | 07:49 |
bauzas | I'm not really worried yet | 07:50 |
bauzas | but maybe next week, I'd propose some etherpad for trying to organize series between them | 07:50 |
bauzas | like, Uggla would take the 2.91 as he's close to being merged | 07:50 |
bauzas | mine would take 2.92 as this is a quite self-contained change | 07:51 |
bauzas | and we would debate on other patches for 2.93 and others | 07:51 |
bauzas | idea being that owners of those patches would have time in advance to rebase | 07:51 |
opendevreview | Sylvain Bauza proposed openstack/nova master: zuul: Put Centos9 Stream job periodic-weekly and experimental https://review.opendev.org/c/openstack/nova/+/849463 | 08:03 |
bauzas | gibi: sean-k-mooney: ^ | 08:03 |
bauzas | Uggla: good morning | 08:05 |
gibi | bauzas: thanks for the summary above. I agree with the plans | 08:06 |
Uggla | bauzas, o/ | 08:06 |
gibi | I'm +2 on the centos9 patch | 08:06 |
gibi | Uggla: o/ | 08:06 |
bauzas | Uggla: as I mentioned above, I'll rebase my keypair generation API change on top of your unshelve API patch | 08:07 |
bauzas | gibi: do you think we actually need to rebase all our branches ? Can't I just write my patch saying "this is 2.92" ? | 08:08 |
bauzas | of course, I would get a merge conflict because gerrit wouldn't be able to rebase the rest api microversion list doc | 08:08 |
bauzas | but this would waaaaay simplify the merge conflict resolution | 08:09 |
Uggla | bauzas, regarding unshelve have you entered your comments ? | 08:10 |
bauzas | Uggla: not yet, that's my next move | 08:10 |
bauzas | Uggla: I have a direct interest in merging your branch | 08:11 |
gibi | bauzas: I'm not sure you can verify your code if it is on 2.92 without 2.91 existing | 08:11 |
gibi | but other than that I'm OK to have 2.92 hanging off 2.90 with a merge conflict | 08:12 |
Uggla | bauzas, ok I'll wait, then I will quickly fix gibi's comments and yours so you could probably merge. | 08:12 |
gibi | I'm here so I can quickly re-review | 08:12 |
bauzas | gibi: yeah the tests will probably fail | 08:14 |
bauzas | but I see this as a security layer in case of a distracted core reviewer | 08:14 |
bauzas | people can work on the latest microversion, wait for Zuul +1ing | 08:15 |
bauzas | and then modify their patches with a placeholder microversion | 08:15 |
bauzas | Zuul would say no, but we'd have evidence this was working before | 08:15 |
bauzas | and a merge resolution would solve it quicker once the concurrent patch merges | 08:16 |
bauzas | I think I'm fool enough to test it on my series | 08:16 |
bauzas | once Zuul blesses my last revision | 08:17 |
gibi | bauzas: yeah if you want then you can test this on the keypair series | 08:17 |
opendevreview | Amit Uniyal proposed openstack/nova master: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/849104 | 08:23 |
bauzas | Uggla: -1 on https://review.opendev.org/c/openstack/nova/+/831507 due to missing UTs on nova.compute.api | 09:33 |
bauzas | you wrote excellent conditionals (kudos to gibi and you) but you don't verify them :) | 09:34 |
bauzas | also, please help poor reviewers by not reindenting tests, that doesn't help to see the bone of the change :) | 09:34 |
gibi | bauzas: there is a bunch of functional coverage that I felt was enough | 09:34 |
bauzas | gibi: yeah but we already have UTs for az | 09:35 |
gibi | ack, I'm not against having extra UTs too, just stated why I ' | 09:35 |
bauzas | and the functests are done on the latter patch | 09:35 |
gibi | why I'm OK as is | 09:35 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down https://review.opendev.org/c/openstack/nova/+/848886 | 09:40 |
Uggla | bauzas, you mean the conditionals with host and az ? They are fully tested with functional tests. Am I missing something ? | 09:42 |
bauzas | Uggla: you test them on the functests in https://review.opendev.org/c/openstack/nova/+/845897/4/nova/tests/functional/test_servers.py | 09:44 |
bauzas | which is the latter patch | 09:44 |
bauzas | Uggla: but you also touch https://review.opendev.org/c/openstack/nova/+/831507/17/nova/tests/unit/compute/test_shelve.py in the compute patch | 09:44 |
bauzas | you're actually just reindenting a few calls | 09:44 |
bauzas | but you could also test the host param in some other tests | 09:45 |
opendevreview | Merged openstack/nova master: Catch an exception in power off procedure https://review.opendev.org/c/openstack/nova/+/817176 | 09:46 |
opendevreview | Merged openstack/nova master: Optimize _local_delete calls by compute unit tests https://review.opendev.org/c/openstack/nova/+/844285 | 09:46 |
sean-k-mooney | bauzas: ill review your api removal patch shortly. i spent a lot of time reviewing this morning before going up to the office so im getting a little burnt out by it but i can do one or two more | 09:47 |
sean-k-mooney | bauzas: the centos 9 patch is on its way to merging | 09:48 |
sean-k-mooney | bauzas: gibi im going to leave the unshelve to host series to ye. ping me if needed but since ye had open comments on them ill let ye take lead on the review of that | 09:49 |
gibi | sean-k-mooney: ack, makes sense | 09:49 |
sean-k-mooney | bauzas: regarding the external power management i have not had time to look at it but its on my todo list for today | 09:49 |
gibi | sean-k-mooney: my only concern is that bauzas is off the rest of this week and I'd like to merge the unshelve | 09:50 |
sean-k-mooney | gibi: i can review, just probably not today | 09:50 |
gibi | sean-k-mooney: superb, thanks | 09:50 |
bauzas | thanks | 09:51 |
sean-k-mooney | if ye agree on the path forward ill review when Uggla respins the patch to address your comments | 09:51 |
bauzas | sean-k-mooney: gibi: we deserve to be humble with Uggla https://review.opendev.org/c/openstack/os-traits/+/832769 | 10:05 |
gibi | Uggla, bauzas: I'm -1 on https://review.opendev.org/c/openstack/os-traits/+/832769 | 10:07 |
bauzas | gibi: excellent point | 10:08 |
gibi | easy to fix :) | 10:08 |
opendevreview | Manuel Bentele proposed openstack/nova-specs master: Add configuration options to set SPICE compression settings https://review.opendev.org/c/openstack/nova-specs/+/849488 | 10:08 |
opendevreview | Manuel Bentele proposed openstack/nova-specs master: Add configuration options to set SPICE compression settings https://review.opendev.org/c/openstack/nova-specs/+/849488 | 10:11 |
frickler | sean-k-mooney: wow, you really get me wondering now why gerrit is sending me mails about a nova patch, which it usually doesn't. finding out that I reviewed it 5 years ago was ... interesting ;) | 10:14 |
opendevreview | Manuel Bentele proposed openstack/nova-specs master: Add configuration options to set SPICE compression settings https://review.opendev.org/c/openstack/nova-specs/+/849488 | 10:15 |
sean-k-mooney | frickler: i have a dashboard that i sometimes use when i want to find patches to review | 10:56 |
sean-k-mooney | i went through some of the small ones this morning then looked for ones with one +2 that were not in merge conflict | 10:57 |
sean-k-mooney | then looked at a few from my normal todo list | 10:57 |
sean-k-mooney | so ya some of those were old | 10:57 |
sean-k-mooney | https://review.opendev.org/dashboard/?foreach=%28+project%3Aopenstack%2Fnova+OR%0Aproject%3Aopenstack%2Fpython-novaclient+OR%0Aproject%3Aopenstack%2Fnova-specs+OR%0Aproject%3Aopenstack%2Fos-vif+OR%0Aproject%3Aopenstack%2Fos-traits+%29%0Astatus%3Aopen%0ANOT+owner%3Aself%0ANOT+label%3AWorkflow%3C%3D-1%0Alabel%3AVerified%3E%3D1%2Czuul%0ANOT+reviewedby%3Aself%0Abranch%3Amaster&title=Nova+Review+Inbox&Small+patches=%28project%3Aopenstack%2Fnova+OR+project%3Aopenstack%2Fpython-novaclient+OR+project%3Aopenstack%2Fos-vif+OR+project%3Aopenstack%2Fos-traits%29+NOT+label%3ACode-Review%3E%3D2%2Cself+NOT+label%3ACode-Review%3C%3D-1%2Cnova-core+NOT+message%3A%22DNM%22+delta%3A%3C%3D10&Needs+final+%2B2=%28project%3Aopenstack%2Fnova+OR+project%3Aopenstack%2Fpython-novaclient+OR+project%3Aopenstack%2Fos-vif+OR+project%3Aopenstack%2Fos-traits%29+NOT+label%3ACode-Review%3E%3D2%2Cself+label%3ACode-Review%3E%3D2+limit%3A50&Bug+fix%2C+Passed+Zuul%2C+No+Negative+Feedback=NOT+label%3ACode-Review%3E%3D2%2Cself+NOT+label%3ACode-Review%3C%3D-1%2Cnova-core+message%3A%22bug%3A+%22+limit%3A50&Wayward+Changes+%28Changes+with+no+code+review+in+the+last+two+days%29=NOT+label%3ACode-Review%3C%3D-1+NOT+label%3ACode-Review%3E%3D1+age%3A2d+limit%3A50&Needs+feedback+%28Changes+older+than+5+days+that+have+not+been+reviewed+by+anyone%29=NOT+label%3ACode-Review%3C%3D-1+NOT+label%3ACode-Review%3E%3D1+age%3A5d+limit%3A50&Passed+Zuul%2C+No+Negative+Feedback=NOT+label%3ACode-Review%3E%3D2+NOT+label%3ACode-Review%3C%3D-1+limit%3A50&Needs+revisit+%28You+were+a+reviewer+but+haven%27t+voted+in+the+current+revision%29=reviewer%3Aself+limit%3A50&Specs=project%3Aopenstack%2Fnova-specs+status%3Aopen+limit%3A20 | 10:58 |
sean-k-mooney | ok thats longer than i thought it was | 10:58 |
sean-k-mooney | its also a little buggy sometimes | 10:58 |
sean-k-mooney | like it sometimes needs to be opened twice to get current data | 10:59 |
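A URL that long is easier to maintain if it is generated. The sketch below is hypothetical and assumes the parameter layout visible in the pasted link: a `title`, a `foreach` query that Gerrit appends to every section's query, and one `<section name>=<query>` pair per section. `urllib.parse.urlencode` uses `quote_plus`, so spaces become `+`, newlines `%0A` and `:` becomes `%3A`, matching the encoding in the link.

```python
from urllib.parse import urlencode


def build_dashboard_url(base: str, title: str, foreach: str,
                        sections: dict[str, str]) -> str:
    """Assemble a Gerrit custom dashboard URL (hypothetical helper).

    Assumes Gerrit's dashboard parameters: ``title``, a ``foreach`` query
    appended to every section, then one ``<section name>=<query>`` pair per
    section, in display order.
    """
    params = [("title", title), ("foreach", foreach)]
    params.extend(sections.items())  # dicts keep insertion order (3.7+)
    # urlencode() quotes with quote_plus: " " -> "+", "\n" -> "%0A", ":" -> "%3A"
    return f"{base}/dashboard/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_dashboard_url(
        "https://review.opendev.org",
        "Nova Review Inbox",
        "project:openstack/nova\nstatus:open\nbranch:master",
        {
            "Needs final +2": "label:Code-Review>=2 limit:50",
            "Specs": "project:openstack/nova-specs status:open limit:20",
        },
    ))
```

Keeping the sections in a dict makes it easy to add or drop a panel without hand-editing percent-encoded text.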
opendevreview | Amit Uniyal proposed openstack/nova master: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/849104 | 11:04 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down https://review.opendev.org/c/openstack/nova/+/848886 | 11:04 |
opendevreview | sean mooney proposed openstack/nova master: Adds check, if admin has set compute service down https://review.opendev.org/c/openstack/nova/+/848886 | 12:01 |
sean-k-mooney | auniyal_: ^ | 12:02 |
sean-k-mooney | that fixes your release note issue | 12:02 |
sean-k-mooney | but now i need to rebase them both | 12:03 |
opendevreview | sean mooney proposed openstack/nova master: add regression test case for bug 1978983 https://review.opendev.org/c/openstack/nova/+/849104 | 12:03 |
opendevreview | sean mooney proposed openstack/nova master: Adds check, if admin has set compute service down https://review.opendev.org/c/openstack/nova/+/848886 | 12:03 |
sean-k-mooney | auniyal_: so now gerrit sees them both as the most recent revision | 12:03 |
auniyal_ | ack | 12:03 |
*** dasm|off is now known as dasm | 13:02 | |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds link in releasenotes for hw machine type bug https://review.opendev.org/c/openstack/nova/+/849532 | 13:23 |
opendevreview | ribaudr proposed openstack/os-traits master: Add 'COMPUTE_STORAGE_VIRTIO_FS', 'COMPUTE_MEM_BACKING_FILE' https://review.opendev.org/c/openstack/os-traits/+/832769 | 13:25 |
Uggla | bauzas, do you have prepared the notes for today's meeting ? | 13:28 |
bauzas | Uggla: done : https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 13:57 |
bauzas | and thanks | 13:57 |
Uggla | bauzas, thx | 14:03 |
Uggla | gibi, you will start the meeting right ? | 14:06 |
bauzas | I have to drop by now | 14:06 |
bauzas | see you folks, you'll be missed | 14:07 |
gibi | bauzas: o/ have a nice one | 14:07 |
gibi | Uggla: as you would like to. I can start and run it until 18:30 CEST and then pass the rest to you. Or you can start from the beginning and I can be just your support running the meeting | 14:07 |
Uggla | gibi, option 1 is fine. | 14:08 |
gibi | OK, then I will start | 14:08 |
Uggla | gibi, I would rather, because sometimes I'm not fully available to start at 18h. | 14:11 |
gibi | sure, no problemo :) | 14:11 |
ralonsoh | sean-k-mooney, https://review.opendev.org/c/openstack/releases/+/849544 | 14:45 |
ralonsoh | is it ok to have a new os-vif version? | 14:45 |
ralonsoh | we need the trunks improvement | 14:45 |
sean-k-mooney | sure we can do one for m2 | 14:46 |
sean-k-mooney | i can propose a patch | 14:46 |
sean-k-mooney | ralonsoh: oh you already have | 14:46 |
ralonsoh | hehehe yes | 14:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds check, if admin has set compute service down https://review.opendev.org/c/openstack/nova/+/848886 | 15:31 |
gibi | folks, weekly nova meeting starts in 15 minutes here in the channel | 15:43 |
gibi | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jul 12 16:00:38 2022 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
*** frickler is now known as frickler_pto | 16:00 | |
gibi | #chairs gibi Uggla | 16:00 |
gibi | #chair gibi Uggla | 16:00 |
opendevmeet | Current chairs: Uggla gibi | 16:00 |
gibi | o/ folks | 16:01 |
Uggla | o/ | 16:01 |
gibi | bauzas is away for the rest of the week so Uggla and I will be your hosts today | 16:01 |
gibi | lets wait a bit and see if there are others here for the meeting :) | 16:01 |
gibi | really? only me and Uggla? then it will be a quick meeting :) | 16:03 |
gibi | #topic Bugs (stuck/critical) | 16:03 |
gibi | #info One Critical bug | 16:04 |
gibi | #link https://bugs.launchpad.net/nova/+bug/1979047 Centos 9 Stream bug failure | 16:04 |
gibi | #link https://review.opendev.org/c/openstack/nova/+/849463 move the C9S job to both experimental and periodic-weekly | 16:04 |
gibi | #action bauzas to track results of this job on nova weekly meeting | 16:04 |
elodilles | o/ | 16:04 |
gibi | I actually closed that critical | 16:04 |
gibi | as we merged the move of the job to our periodic queue today | 16:04 |
gibi | so no need to track this as a critical bug | 16:04 |
gibi | elodilles: o/ | 16:04 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 11 new untriaged bugs (+1 since the last meeting) | 16:05 |
gibi | #link https://storyboard.openstack.org/#!/project/openstack/placement 27 open stories (+0 since the last meeting) in Storyboard for Placement | 16:05 |
elodilles | (sorry for being late) | 16:05 |
gibi | is there any bug we need to talk about here? | 16:05 |
gibi | I assume no | 16:06 |
gibi | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:06 |
gibi | #info Next bug baton is passed to Uggla | 16:06 |
gibi | Uggla: are you OK to take the baton? | 16:06 |
Uggla | yep | 16:06 |
gibi | awesome | 16:06 |
gibi | thanks | 16:06 |
gibi | #topic Gate status | 16:06 |
gibi | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:07 |
gibi | I don't see any new one in that list | 16:07 |
gibi | is there any gate bug we should discuss? | 16:07 |
elodilles | not a bug, but i guess the ovh.net issue ("Payment Required") impacts nova gate as well, doesn't it? | 16:08 |
gibi | I haven't checked but could be | 16:08 |
* gibi was busy hacking k8s operator for the placement service | 16:09 | |
elodilles | it causes POST_FAILURES | 16:09 |
gibi | elodilles: do we have a tracking bug for it? | 16:09 |
elodilles | oh, i see that is fixed | 16:09 |
elodilles | 2022-07-12 15:02:25 UTC Log uploads to OVH's Swift are resuming and our voucher is renewed; thanks again amorin! | 16:09 |
gibi | OK, so POST_FAILURES are OK to recheck now if the failure was Payment Required ;) | 16:10 |
elodilles | (from here: https://wiki.openstack.org/wiki/Infrastructure_Status ) | 16:10 |
gibi | elodilles: thanks for the info | 16:10 |
elodilles | np | 16:10 |
gibi | any other gate issue? | 16:10 |
elodilles | nothing i'm aware of at master branch | 16:10 |
gibi | then moving on | 16:12 |
gibi | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:12 |
gibi | #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs | 16:12 |
gibi | both placement and nova emulation are green | 16:12 |
gibi | from next week we will check centos9 job here as well | 16:12 |
gibi | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:12 |
gibi | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:12 |
gibi | anything else about the gate before I move on? | 16:13 |
elodilles | - | 16:13 |
gibi | #topic Release Planning | 16:13 |
gibi | #link https://releases.openstack.org/zed/schedule.html | 16:13 |
gibi | #info Zed-2 is in 2 days | 16:13 |
gibi | #info we'll stop accepting specs by Monday | 16:13 |
gibi | #action bauzas to send an email on Monday about specs and abandon the yoga specs | 16:13 |
gibi | we have a small amount of open specs | 16:13 |
gibi | if you have one then this is the last chance for Zed | 16:14 |
gibi | feel free to ping me if you need help to land them | 16:14 |
gibi | is there any other Release info to share? | 16:15 |
gibi | #topic Review priorities | 16:16 |
gibi | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1 | 16:16 |
gibi | #link https://review.opendev.org/c/openstack/project-config/+/837595 Gerrit policy for Review-prio contributors flag. We need project-config cores to merge it. | 16:16 |
gibi | #link https://docs.openstack.org/nova/latest/contributor/process.html#what-the-review-priority-label-in-gerrit-are-use-for Documentation we already have | 16:16 |
gibi | at some point we should start to review the list of prio-marked reviews here, but not today as so few of us are here | 16:16 |
gibi | #topic Stable Branches | 16:16 |
gibi | elodilles: ? | 16:16 |
gibi | or more like <mic> -> elodilles | 16:17 |
elodilles | unfortunately not so much news, but let me copy them here | 16:17 |
elodilles | #info stable/train is blocked, fix exists but hasn't merged yet due to intermittent failures | 16:17 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:17 |
gibi | thanks | 16:17 |
elodilles | so in short, the intermittent failures are still causing pain for us :( | 16:18 |
elodilles | np | 16:18 |
gibi | elodilles: so https://review.opendev.org/c/openstack/nova/+/844530 the one we need for train? | 16:18 |
elodilles | yes | 16:18 |
gibi | ack | 16:19 |
gibi | thanks | 16:19 |
gibi | #topic Open discussion | 16:19 |
gibi | (bauzas) Opportunities for low-hanging-fruits, anyone ? (to be punted to next week) | 16:19 |
gibi | I guess we punt this to next week again | 16:19 |
gibi | but if you see low hanging fruits then note them for bauzas | 16:19 |
gibi | any other topic to discuss today? | 16:19 |
elodilles | nothing from me | 16:20 |
gibi | it seems we are in summer mode | 16:20 |
gibi | but at least I can leave in time for a game night ;) | 16:20 |
elodilles | hahh, have fun then! ; | 16:21 |
elodilles | :) | 16:21 |
gibi | thanks | 16:21 |
gibi | so lets close this | 16:21 |
gibi | #endmeeting | 16:21 |
opendevmeet | Meeting ended Tue Jul 12 16:21:35 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:21 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-07-12-16.00.html | 16:21 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-07-12-16.00.txt | 16:21 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-07-12-16.00.log.html | 16:21 |
Uggla | gibi, thx for running the meeting | 16:21 |
gibi | Uggla, elodilles: thanks for joining :) | 16:21 |
* gibi logs off | 16:21 | |
elodilles | o/ | 16:22 |
elodilles | :) | 16:22 |
Uggla | gibi, have fun | 16:22 |
colby_ | Hey All. I've messaged before about our vGPU issues. I've been working on trying to get this to work for a while now and we would really like to be able to offer this product to our users. Someone here mentioned that Nova should reuse the mdev devices that get created. The problem is that it is not. After spinning up vGPU instances and then removing them, the mdev devices stay, but when we try to spin up a new instance it's trying to create another mdev device (I think it is, since it's trying to use a resource provider of a different pci address on the card that already has all the mdevs created). | 17:29 |
colby_ | mdevctl list output: 67e63f1e-07f2-474f-874c-826a024c10ec 0000:21:01.7 nvidia-563 manual | 17:30 |
colby_ | 3a974d13-5dea-4bfc-b034-533f6e754349 0000:21:03.4 nvidia-563 manual | 17:30 |
colby_ | d3586a0a-2e56-421e-923f-20797fe74ab5 0000:21:03.7 nvidia-563 manual | 17:30 |
colby_ | 150c155c-da0b-45a6-8bc1-a8016231b100 0000:21:04.1 nvidia-563 manual | 17:30 |
colby_ | but spinning up a new instance tried to use the resource provider _pci_0000_21_02_4 (and pci 21 is already full) | 17:31 |
colby_ | how does nova detect the already created devices and use those? Should it be using the resource provider of those already created mdevs (eg _pci_0000_21_01_7) | 17:32 |
colby_ | We are on Victoria release, Centos 8 Stream, Nvidia A40 GPU | 17:34 |
opendevreview | Merged openstack/nova master: zuul: Put Centos9 Stream job periodic-weekly and experimental https://review.opendev.org/c/openstack/nova/+/849463 | 17:54 |
sean-k-mooney | colby_: there is definitely a bug with this, we are hitting it downstream too and still investigating | 18:36 |
sean-k-mooney | colby_: i can see if i can get you the link to where we try to reuse the mdev | 18:36 |
sean-k-mooney | colby_: https://github.com/openstack/nova/blob/de65131f92ba5ba812e33e6ff63be0991687413a/nova/virt/libvirt/driver.py#L8261-L8278= | 18:37 |
colby_ | sean-k-mooney: oh good glad to know this is not just us. Is there a bug filed yet that I could follow? | 18:40 |
sean-k-mooney | downstream definitely, ill grab it and see if we have an upstream one. we were still trying to root cause it | 18:40 |
sean-k-mooney | colby_: basically we were QEing cold migration and noticed that the devices were not being reused | 18:41 |
sean-k-mooney | so depending on the order the test ran it either worked or failed | 18:41 |
sean-k-mooney | so we are looking at it as part of https://bugzilla.redhat.com/show_bug.cgi?id=1701281 | 18:42 |
sean-k-mooney | but i think we are going to break this out as a separate upstream and downstream bug | 18:42 |
colby_ | would it be the same root cause as we are seeing just deleting and trying to create new instances? | 18:43 |
sean-k-mooney | a host reboot or deleting the unused mdevs is the workaround we are using right now | 18:43 |
sean-k-mooney | so if you loop over the domain xmls and delete any mdev not used by an xml that "fixes it" temporarily | 18:44 |
colby_ | yea thats what I ended up having to do manually is remove the mdevs that got created then new instances could be spun up | 18:44 |
sean-k-mooney | but thats not the correct fix | 18:44 |
colby_ | I suppose I could create a cron job to do that so we can remove the manual part | 18:45 |
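If the workaround is scripted, the safe part to automate is finding the candidates; the deletion itself is, as noted in the channel, not the correct fix. The sketch below is hypothetical: it assumes the `mdevctl list` column layout pasted earlier (`<uuid> <parent> <type> <start>`) and libvirt's `<hostdev type='mdev'><source><address uuid='…'/></source></hostdev>` domain XML form. It only reports mdevs no domain references and deletes nothing.

```python
import xml.etree.ElementTree as ET


def parse_mdevctl_list(output: str) -> dict[str, str]:
    """Map mdev UUID -> parent device from `mdevctl list` text.

    Assumes the layout pasted in the channel: <uuid> <parent> <type> <start>.
    """
    devices: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            devices[fields[0]] = fields[1]
    return devices


def mdev_uuids_in_domain(domain_xml: str) -> set[str]:
    """Collect UUIDs of mdevs attached to one libvirt domain definition."""
    root = ET.fromstring(domain_xml)
    uuids: set[str] = set()
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue  # skip PCI passthrough and other host devices
        address = hostdev.find("source/address")
        mdev_uuid = address.get("uuid") if address is not None else None
        if mdev_uuid:
            uuids.add(mdev_uuid)
    return uuids


def unused_mdevs(mdevctl_output: str, domain_xmls: list[str]) -> dict[str, str]:
    """Return mdevs that no domain XML references (candidates only)."""
    in_use: set[str] = set()
    for xml in domain_xmls:
        in_use |= mdev_uuids_in_domain(xml)
    return {mdev_uuid: parent
            for mdev_uuid, parent in parse_mdevctl_list(mdevctl_output).items()
            if mdev_uuid not in in_use}
```

The inputs would come from `mdevctl list` and one `virsh dumpxml` per domain. As a precaution, include every defined domain (for example from `virsh list --all`), not just running ones, so the mdev of a stopped instance is not reported as unused.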
sean-k-mooney | colby_: bauzas is on pto tomorrow and friday but i think they are here thursday | 18:45 |
sean-k-mooney | they thought it might be related to how we do the mdev lookup | 18:46 |
sean-k-mooney | but since you have the issue | 18:46 |
sean-k-mooney | could you compare the list of mdevs returned by mdevctl and libvirt via virsh | 18:46 |
sean-k-mooney | libvirt does some caching so one of the guesses we had is it might be getting out of sync | 18:47 |
sean-k-mooney | actually, hmm | 18:47 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/832489/1/nova/virt/libvirt/utils.py | 18:49 |
sean-k-mooney | i wonder if its this ^ | 18:49 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1951656 | 18:49 |
sean-k-mooney | colby_: do you know what version of libvirt you are using | 18:50 |
colby_ | 7.9.0-1 | 18:50 |
colby_ | you just want the output from `virsh nodedev-list` ? | 18:50 |
sean-k-mooney | that was in 7.7 https://github.com/libvirt/libvirt/commit/3bd8181bc5548a0ce81107cbfb480dfdcba5679d | 18:50 |
sean-k-mooney | colby_: yes please nodedev-list should have the names | 18:51 |
sean-k-mooney | and we can check the format | 18:51 |
sean-k-mooney | to see if it has the parent info or not | 18:51 |
colby_ | https://pastebin.com/94YfBRH9 | 18:54 |
sean-k-mooney | there is also https://review.opendev.org/c/openstack/nova/+/838976 as another possible fix | 18:55 |
sean-k-mooney | odd i dont see any mdevs there | 18:55 |
colby_ | ha woops sorry | 18:56 |
colby_ | wrong machine | 18:56 |
colby_ | https://pastebin.com/NAfjxUt7 | 18:56 |
colby_ | mdevctl list output: https://pastebin.com/rXmUftzj | 18:57 |
sean-k-mooney | yep so virsh has the extended names | 18:57 |
sean-k-mooney | so its not a caching issue but it probably is a parsing issue | 18:58 |
sean-k-mooney | since both are consistent | 18:58 |
sean-k-mooney | """Note that the lookup of the mdev device by UUID are needed in order | 18:59 |
sean-k-mooney | to keep the ability to recreate assigned mediated devices on a reboot of | 18:59 |
sean-k-mooney | the compute node | 18:59 |
sean-k-mooney | """ | 18:59 |
sean-k-mooney | but i bet its also needed to be able to reuse the mdevs at all | 18:59 |
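The suspected parsing issue can be illustrated with a small, hypothetical helper (not nova's actual code). It assumes node device names look like `mdev_<uuid with "_" separators>` on older libvirt and gain a `_<parent PCI address>` suffix from libvirt 7.7 on, per the commit linked above. Replacing every `_` after the prefix would turn the extended form into an invalid UUID; slicing a fixed 36-character UUID avoids that. `compare_views` performs the virsh-versus-mdevctl comparison requested in the channel.

```python
import uuid

MDEV_PREFIX = "mdev_"
UUID_LEN = 36  # 32 hex digits plus 4 separators


def nodedev_name_to_mdev_uuid(name: str) -> str:
    """Recover the mdev UUID from a libvirt node device name.

    Assumed formats (hypothetical helper, not nova's code):
      mdev_<uuid with "_" separators>                       (older libvirt)
      mdev_<uuid with "_" separators>_<parent PCI address>  (libvirt >= 7.7)
    Slicing a fixed UUID length ignores any parent suffix.
    """
    if not name.startswith(MDEV_PREFIX):
        raise ValueError(f"not an mdev node device name: {name!r}")
    raw = name[len(MDEV_PREFIX):len(MDEV_PREFIX) + UUID_LEN]
    return str(uuid.UUID(raw.replace("_", "-")))  # raises ValueError if malformed


def compare_views(nodedev_names: list[str],
                  mdevctl_uuids: list[str]) -> tuple[set[str], set[str]]:
    """Return (UUIDs only libvirt sees, UUIDs only mdevctl sees)."""
    from_libvirt = {nodedev_name_to_mdev_uuid(n)
                    for n in nodedev_names if n.startswith(MDEV_PREFIX)}
    from_mdevctl = set(mdevctl_uuids)
    return from_libvirt - from_mdevctl, from_mdevctl - from_libvirt
```

Two empty sets from `compare_views` would match the observation that both views are consistent, which points the investigation at how the names are parsed rather than at libvirt caching.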
sean-k-mooney | colby_: im not 100% sure this will fix it but i have set https://review.opendev.org/c/openstack/nova/+/838976 as a review priority and ill follow up with sylvain when they are back | 19:01 |
colby_ | ok sounds good. Im happy to test out the patches on our system if you want | 19:03 |
colby_ | no one is using this hypervisor right now but the admins | 19:03 |
sean-k-mooney | if you wanted to test https://review.opendev.org/c/openstack/nova/+/838976 and provide feedback on the review, that is the more complete fix | 19:04 |
colby_ | sure. Ill get those in place today and let you know if it helps our case at all | 19:04 |
sean-k-mooney | most of the open comments are about updating the doc strings but the patch should work as is | 19:04 |
sean-k-mooney | we might also add a functional reproducer if we can recreate the mdev reuse issue | 19:05 |
sean-k-mooney | colby_: thanks | 19:05 |
*** dasm is now known as dasm|off | 22:14 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!