16:01:09 #startmeeting nova
16:01:09 Meeting started Tue Feb 11 16:01:09 2025 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:09 The meeting name has been set to 'nova'
16:01:32 o/
16:01:47 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:53 hello!
16:02:06 o/
16:02:46 o/
16:03:01 I guess we can softly start
16:04:26 #topic Bugs (stuck/critical)
16:04:32 #info No Critical bug
16:04:37 #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:45 any bugs people wanna discuss?
16:05:41 apparently not
16:06:25 moving on
16:06:32 #topic Gate status
16:06:38 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:06:44 #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:06:57 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:07:04 all periodics seem to be green
16:07:23 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:07:28 #info Please try to provide a meaningful comment when you recheck
16:07:37 any gate failures to discuss?
16:08:03 I have one actually
16:08:33 when looking at https://review.opendev.org/c/openstack/nova/+/940642, even after rechecking, I got the same exceptions on the nova-next and nova-multi-cell jobs
16:08:37 for the same test
16:09:06 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ba0/940642/5/check/nova-next/ba009c2/testr_results.html
16:09:12 https://6ae015e4c924496dfda3-74c26bb67ca36871e200d07f3210a3f0.ssl.cf1.rackcdn.com/940642/5/check/nova-multi-cell/c2a2f68/testr_results.html
16:09:19 have people seen the same failure?
16:09:43 last time I looked, I saw nova-next failing once out of 5 runs
16:09:59 this would create some problems later in the week
16:10:39 so I wonder whether we should skip that specific test
16:11:26 bauzas: the one I pushed yesterday passed the tests
16:12:01 hmmm
16:12:11 okay, will see
16:12:19 * gibi joining late
16:12:41 maybe you were not just unlucky.
16:13:25 oh, nvm, found the root cause, PEBKAC
16:13:55 https://6ae015e4c924496dfda3-74c26bb67ca36871e200d07f3210a3f0.ssl.cf1.rackcdn.com/940642/5/check/nova-multi-cell/c2a2f68/controller/logs/screen-n-sch.txt
16:14:02 this is because of my patch
16:14:09 so we can move on
16:14:29 any other failures to discuss?
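[Editor's note: as an illustration of the "meaningful recheck comment" guidance above, the convention on OpenStack Gerrit is to state what failed and why you believe it is unrelated to your change, rather than posting a bare "recheck". A hypothetical example; the job, test name, and bug number below are made up:

    recheck nova-next failed in test_example_live_migration, looks like an
    unrelated gate issue, tracked as https://bugs.launchpad.net/nova/+bug/NNNNNNN
]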
16:15:05 #topic Release Planning
16:15:30 #link https://releases.openstack.org/epoxy/schedule.html
16:15:35 #info Nova deadlines are set in the above schedule
16:15:41 #info 2 weeks before Feature Freeze
16:16:01 so maybe we should go straight to the next topic
16:16:07 #topic Review priorities
16:16:14 #link https://etherpad.opendev.org/p/nova-2025.1-status
16:16:39 I'll be honest, I was swamped by other priorities, but I want to review a long list of changes this week
16:16:45 that is becoming super urgent
16:17:35 fyi, I added at line 131 links to the blueprint and gerrit topic wrt scheduler-hints-in-server-details
16:17:45 please take a look when you folks have some time :)
16:18:02 noted, appreciated
16:18:23 given my on-off presence on IRC, I appreciate any way to discuss async
16:18:45 #topic PTG planning
16:19:14 well, I won't run that PTG myself, but this is a reminder that we'll have a PTG
16:19:22 #info Next PTG will be held on Apr 7-11
16:19:39 I started to draft an etherpad for that PTG
16:19:46 #link https://etherpad.opendev.org/p/nova-2025.2-ptg
16:19:55 feel free to add your topics of interest to it ^
16:21:07 #topic Stable Branches
16:21:16 elodilles_pto: oh, he's not here
16:21:30 no worries, let's skip it for this week
16:21:37 #topic vmwareapi 3rd-party CI efforts Highlights
16:21:41 fwiesel: anything to discuss?
16:21:51 Hi, yes... some progress on the last outstanding failure.
16:22:15 (sorry, I'm here, just forgot to change my nick o:) but stable gates seem to be healthy, fwiw)
16:22:30 I suspect I do have a possible patch for that, but now I've run into a bug in Cinder. I've created a bug report for it (https://bugs.launchpad.net/cinder/+bug/2097771) and want to fix that.
16:22:35 elodilles: noted.
16:22:51 fwiesel: ack, thanks for the report
16:22:59 If that works out, I will ask you to review my solution for nova making use of that.
16:23:05 That's from my side.
16:24:35 thanks
16:24:54 #topic Open discussion
16:25:30 we have one topic to discuss today
16:25:38 sp-bmilanov -- 927474: libvirt: Fetch the available vCPUs from the respective cgroup
16:25:40 sp-bmilanov: around?
16:25:44 hi! yep
16:25:50 #link https://review.opendev.org/c/openstack/nova/+/927474
16:26:05 sean-k-mooney: are you here?
16:26:13 yes
16:26:28 the topic is about how to handle the case where Nova is running in a hyperconverged setup where there are CPUs that are not available to Nova
16:26:33 I do not believe nova should do that, but I'll let sp-bmilanov explain
16:26:54 I was hoping to crowd-source alternatives
16:27:27 perhaps some context will help
16:27:49 the fundamental problem IMO is with libvirt -- it assumes all online CPUs == schedulable CPUs, and does not provide an API to query the actually schedulable CPUs
16:27:53 > CPUs that are not available to Nova < that feels like something the existing cpuset config can handle
16:27:54 it's possible to use the cgroups API to subdivide the host and mark some CPUs as reserved for the exclusive use of a slice/partition
16:28:02 gibi: it can
16:28:20 sp-bmilanov would like nova to parse the cgroups topology and try to auto-compute it
16:28:43 one of my biggest issues with that is that libvirt and nova-compute might have different views of the cgroups
16:28:58 i.e. nova is deployed in a podman container and libvirt is installed on the host
16:29:12 sean-k-mooney: that was our initial approach, but we are trying to get Nova to not use CPUs that are not available to it
16:29:19 what is the problem then with setting the cpuset config option to only allow nova to use the CPUs the deployer wants?
16:29:33 sp-bmilanov: we have config options to specify which CPUs nova can use
16:29:49 but you explicitly do not want to use the mechanism we provide for this
16:30:25 yep, we'd like to find a way in which Nova can handle this itself
16:30:49 else it's a runtime error on VM "creation"
16:31:13 sp-bmilanov: why is the cpuset config not enough?
16:31:41 gibi: it does work, but sp-bmilanov does not like the UX, as they perceive it as overly complex to have to configure it
16:31:43 what is the reason you want more automatic discovery of available cpus?
16:32:43 nova has supported specifying which cpus can be used for guests via config since the introduction of vcpu_pin_set in Essex-ish
16:33:18 we currently declare it to be the installer's responsibility to configure nova for the host it is deployed on
16:33:34 yes, partly UX, and that we think it might be possible to do automatically
16:34:13 I don't think it really is, not without libvirt changes IMO
16:34:44 I think we would need libvirt to provide an API that told nova which cpus could be used
16:35:03 we currently use the capabilities API to discover this info from libvirt
16:35:24 that provides us the cpu topology, NUMA affinity, etc.
16:36:09 it does not tell us which cpus are reserved for exclusive use and cannot be used by libvirt to spawn a vm
16:37:11 I agree that this is ultimately an issue in libvirt, but I am concerned that I am not sure how long it would take this to propagate to OpenStack, and in which releases
16:38:01 and if there is something adequate we can do in Nova in the meantime, why not
16:38:03 I don't think it would be a good idea to make nova discover this via parsing /sys or /proc
16:38:16 that's a good question. Usually we ask the reporter to first engage with the libvirt community to see their thoughts
16:38:21 I don't even understand how that _could_ be a thing
16:38:53 the PoC patch looks at a particular part of /sys to get the cpu set allowed by nova's cgroup
16:38:58 dansmith: the use case is hyperconverged setups where there is, e.g., a storage service with reserved CPUs installed on the hypervisor
16:39:05 but as I noted above, that is not the same as libvirt's
16:39:07 if we need to know what libvirt has available for us and we can't ask it, I don't really see what options we have for discovering it
16:39:10 so it would be the wrong info
16:39:20 sp-bmilanov: I get the use-case totally
16:39:43 sean-k-mooney: does /sys's view of available CPUs change by cgroup? I had assumed not
16:40:24 the patch was going to look at cgroup-specific subpaths
16:40:32 https://review.opendev.org/c/openstack/nova/+/927474/10/nova/virt/libvirt/host.py#93
16:40:32 but either way, that doesn't seem like a solution, because we don't know that our cgroup is the same (or configured the same) as libvirt's
16:40:39 /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
16:40:43 ah, I see
16:41:30 yep, that's not ideal, but we can simplify or extend the checks as much as deemed necessary
16:41:34 last time we tried to look at sysfs for CPU management, it created some problems
16:42:02 sp-bmilanov: in general we try to avoid parsing /sys or /proc if at all possible
16:42:05 so I'm afraid some OSes couldn't support that
16:42:11 it creates maintenance issues for us
16:42:23 a feature that works "sometimes" if a bunch of assumptions are made (i.e. libvirt and nova in the same slice) seems worse than no feature to me, in a lot of cases
16:43:03 if we were to do it, I could maybe see doing it if and only if you provide the path or slice as a config option
16:43:13 and only if you configure it
16:43:16 i.e. opt in
16:43:22 not on by default
16:43:24 sean-k-mooney: that sounds good
16:43:38 an opt-in, if-set-like option
16:44:00 a bit like the CPU pinning, but more autonomous
16:44:33 mostly, but I'm not sure how others feel about that
16:44:59 seems yet another knob to me
16:45:11 can we see all the slices in sysfs or only ours?
16:45:18 with a fragile interface and many ways to get it wrong
16:45:29 that depends
16:45:41 if you're in a container you can't see the host one by default
16:45:41 if we can see all of them, then passing the name/path to the one libvirt is in so we can inspect it would be better, but idk.. seems fragile to me, like bauzas says
16:46:00 sean-k-mooney: right, so we likely can't see libvirt's either, right?
16:46:20 not the way tripleo or kolla deploy nova
16:46:31 or our new installer
16:46:45 yeah
16:46:48 sean-k-mooney mentioned in the change that kolla and tripleo are working around it... sean-k-mooney, where do you "not typically run the nova-compute binary with cgroupns_mode: 'host'"?
16:47:11 right, so existing installers only set that on libvirt
16:47:14 not on nova
16:47:21 so nova-compute has a restricted view
16:47:37 ah, right.. but which installer is that?
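[Editor's note: for context on the cgroup-view problem above, a minimal sketch of the kind of introspection the PoC patch proposes, assuming cgroups v2 mounted at /sys/fs/cgroup; the helper name and paths are illustrative, not the patch's actual code:

    # Parse a kernel cpuset list like "0-3,8,10-11" into a set of CPU ids.
    def parse_cpuset(spec):
        cpus = set()
        for part in spec.split(','):
            part = part.strip()
            if not part:
                continue
            if '-' in part:
                lo, hi = part.split('-', 1)
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        return cpus

    # With cgroups v2, this file reflects the CPUs visible to *this
    # process's* cgroup namespace. Inside a container started without
    # cgroupns_mode: "host" (the kolla default for nova-compute), that is
    # the container's restricted view, not the host's -- which is exactly
    # why nova and libvirt can end up disagreeing.
    with open('/sys/fs/cgroup/cpuset.cpus.effective') as f:
        print(sorted(parse_cpuset(f.read().strip())))
]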
16:47:55 (I might be missing Nova context)
16:48:26 (btw, there's the other side of the discussion: given that Nova can currently start and schedule VMs with this misconfiguration, what would be the fix so it does not start at all (as sean-k-mooney suggested in the change comments))
16:49:24 libvirt has it in kolla: https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova-cell/defaults/main.yml#L11
16:49:29 but not nova-compute: https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova-cell/defaults/main.yml#L55
16:49:40 and it's the same in tripleo and the new installer Red Hat wrote
16:50:08 sp-bmilanov: what installer do you use?
16:50:21 time check: 10 mins before end of meeting
16:50:34 dansmith: bauzas: am I right in thinking you agree this is out of scope for nova
16:50:40 and in scope for the installer
16:51:09 we have seen a few, but yes, kolla and no installer (operating system packages)
16:51:20 I mean, that's kinda my point: this seems like us introspecting the system to dynamically configure ourselves, but in a way that can only work in certain situations
16:51:20 I'd tend to say yes, this sounds like a tooling mechanism that could set the config correctly
16:51:51 I would be less against this if libvirt provided an API for it
16:51:57 the concern I have is that this approach doesn't sound generic at all and is very targeted to a specific case
16:52:22 also, those config options we use are meant to be virt-driver independent
16:52:37 right, if libvirt exposes it, that's totally different
16:53:02 in that case, configuring libvirt is the job of the installer, and nova can maybe even lose some config, right? or at least, not always require it to be set
16:53:05 also, it opens the question of what happens if the cgroups config changes while nova-compute is running
16:53:18 fun times :)
16:53:40 at least the current config nova has is validated at nova-compute startup
16:54:11 yep, it won't actually catch this specific case
16:54:29 i.e. we validate the cpus are in the info provided by libvirt
16:54:35 and if they are online, etc.
16:54:48 but we don't have visibility into cgroups at all, really
16:54:53 at least not in this context
16:55:19 we have some basic checks to know if it's cgroups v2 vs v1
16:56:17 time check: 4 mins left.
16:57:11 this is complexity that either lives in the installer or lives in nova; as nova is big enough already, I'd rather see this complexity added in the installer. Sure, we have multiple installers, so that duplicates some effort.
16:57:32 ++
16:57:35 I was wondering if there is also a way to know this outside of reading cgroups that might be more robust
16:57:50 if there is, I'm not aware of one
16:58:05 there is actually
16:58:11 ....and it's talking to libvirt :D
16:58:15 :)
16:58:36 :D
16:58:43 I guess we can't solve that question now
16:59:09 sp-bmilanov: could you at least talk to the libvirt community and see their thoughts about that?
16:59:15 in any case, it sounds like a good idea to initiate a discussion with libvirt devs as well
16:59:16 and then come back to us?
16:59:28 bauzas: yep
16:59:31 cool
16:59:41 thanks then
16:59:52 you're welcome
17:00:04 thanks for the input, all
17:00:08 if you're OK, I think we're done for today
17:00:14 thanks all
17:00:17 #endmeeting
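[Editor's note: the "existing cpuset config" referenced throughout the discussion is nova.conf's [compute] cpu_shared_set / cpu_dedicated_set options (vcpu_pin_set being the legacy predecessor). A minimal sketch for the hyperconverged case discussed above; the CPU ranges are hypothetical and depend on what the co-located service reserves:

    [compute]
    # Host CPUs nova may use for unpinned guest vCPUs (and emulator
    # threads); CPUs 0-3 are left for the co-located storage service.
    cpu_shared_set = 4-15
    # Host CPUs reserved for guests with dedicated (pinned) vCPUs.
    cpu_dedicated_set = 16-31
]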