16:01:09 <bauzas> #startmeeting nova
16:01:09 <opendevmeet> Meeting started Tue Feb 11 16:01:09 2025 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:09 <opendevmeet> The meeting name has been set to 'nova'
16:01:32 <fwiesel> o/
16:01:47 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:53 <sp-bmilanov> hello!
16:02:06 <elodilles_pto> o/
16:02:46 <Uggla> o/
16:03:01 <bauzas> I guess we can softly start
16:04:26 <bauzas> #topic Bugs (stuck/critical)
16:04:32 <bauzas> #info No Critical bug
16:04:37 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:45 <bauzas> any bugs people wanna discuss ?
16:05:41 <bauzas> apparently not
16:06:25 <bauzas> moving on
16:06:32 <bauzas> #topic Gate status
16:06:38 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:06:44 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:06:57 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:07:04 <bauzas> all periodics seem to be green
16:07:23 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:07:28 <bauzas> #info Please try to provide meaningful comment when you recheck
16:07:37 <bauzas> any gate failures to discuss ?
16:08:03 <bauzas> I have one actually
16:08:33 <bauzas> when looking at https://review.opendev.org/c/openstack/nova/+/940642, even after rechecking, I got the same exceptions on nova-next and nova-multi-cell jobs
16:08:37 <bauzas> for the same test
16:09:06 <bauzas> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ba0/940642/5/check/nova-next/ba009c2/testr_results.html
16:09:12 <bauzas> https://6ae015e4c924496dfda3-74c26bb67ca36871e200d07f3210a3f0.ssl.cf1.rackcdn.com/940642/5/check/nova-multi-cell/c2a2f68/testr_results.html
16:09:19 <bauzas> have people seen the same failure ?
16:09:43 <bauzas> last time I looked, I saw nova-next failing once over 5 times
16:09:59 <bauzas> this would create some problems later in the week
16:10:39 <bauzas> so I wonder whether we should skip that specific test
16:11:26 <Uggla> @bauzas, the one I pushed yesterday passes the tests
16:12:01 <bauzas> hmmm
16:12:11 <bauzas> okay, will see
16:12:19 * gibi joining late
16:12:41 <Uggla> maybe you were just unlucky.
16:13:25 <bauzas> oh, nvm found the root cause, PEBKAC
16:13:55 <bauzas> https://6ae015e4c924496dfda3-74c26bb67ca36871e200d07f3210a3f0.ssl.cf1.rackcdn.com/940642/5/check/nova-multi-cell/c2a2f68/controller/logs/screen-n-sch.txt
16:14:02 <bauzas> this is because of my patch
16:14:09 <bauzas> so we can move on
16:14:29 <bauzas> any other failure to discuss ?
16:15:05 <bauzas> #topic Release Planning
16:15:30 <bauzas> #link https://releases.openstack.org/epoxy/schedule.html
16:15:35 <bauzas> #info Nova deadlines are set in the above schedule
16:15:41 <bauzas> #info 2 weeks before Feature Freeze
16:16:01 <bauzas> so maybe we should go straight to the next topic
16:16:07 <bauzas> #topic Review priorities
16:16:14 <bauzas> #link https://etherpad.opendev.org/p/nova-2025.1-status
16:16:39 <bauzas> I'll be honest, I was swamped by other priorities but I want to review a long list of changes this week
16:16:45 <bauzas> that becomes super urgent
16:17:35 <dviroel> fyi, I added at line 131 links to the blueprint and gerrit topic wrt scheduler-hints-in-server-details
16:17:45 <dviroel> please take a look when you folks have some time :)
16:18:02 <bauzas> noted, appreciated
16:18:23 <bauzas> given my on-off presence on IRC, I appreciate any way to discuss async
16:18:45 <bauzas> #topic PTG planning
16:19:14 <bauzas> well, I won't run that PTG by myself but this is a reminder that we'll have a PTG
16:19:22 <bauzas> #info Next PTG will be held on Apr 7-11
16:19:39 <bauzas> I started to draft some etherpad for that PTG
16:19:46 <bauzas> #link https://etherpad.opendev.org/p/nova-2025.2-ptg
16:19:55 <bauzas> feel free to add your topics of interest to it ^
16:21:07 <bauzas> #topic Stable Branches
16:21:16 <bauzas> elodilles_pto: oh he's not here
16:21:30 <bauzas> no worries, let's skip it for this week
16:21:37 <bauzas> #topic vmwareapi 3rd-party CI efforts Highlights
16:21:41 <bauzas> fwiesel: anything to discuss ?
16:21:51 <fwiesel> Hi, yes... some progress on the last outstanding failure.
16:22:15 <elodilles> (sorry, i'm here, just forgot to change my nick o:) but stable gates seem to be healthy, fwiw)
16:22:30 <fwiesel> I suspect I do have a possible patch for that, but now I've run into a bug in Cinder. I've created a bug report for it (https://bugs.launchpad.net/cinder/+bug/2097771) and want to fix that.
16:22:35 <bauzas> elodilles: noted.
16:22:51 <bauzas> fwiesel: ack, thanks for the reporting
16:22:59 <fwiesel> If that works out, I will ask you to review my solution for nova making use of that.
16:23:05 <fwiesel> That's from my side.
16:24:35 <bauzas> thanks
16:24:54 <bauzas> #topic Open discussion
16:25:30 <bauzas> we have one topic to discuss today
16:25:38 <bauzas> sp-bmilanov -- 927474: libvirt: Fetch the available vCPUs from the respective cgroup
16:25:40 <bauzas> sp-bmilanov: around ?
16:25:44 <sp-bmilanov> hi! yep
16:25:50 <bauzas> #link https://review.opendev.org/c/openstack/nova/+/927474
16:26:05 <bauzas> sean-k-mooney: are you here ?
16:26:13 <sean-k-mooney> yes
16:26:28 <sp-bmilanov> the topic is about how to handle the case where Nova is running in a hyperconverged setup where there are CPUs that are not available to Nova
16:26:33 <sean-k-mooney> i do not believe nova should do that but i'll let sp-bmilanov explain
16:26:54 <sp-bmilanov> I was hoping to crowd-source alternatives
16:27:27 <sean-k-mooney> perhaps some context will help
16:27:49 <sp-bmilanov> the fundamental problem IMO is with libvirt -- it assumes all online CPUs == schedulable CPUs, and does not provide an API to query actually schedulable CPUs
16:27:53 <gibi> > CPUs that are not available to Nova < that feels like the existing cpuset config can handle
16:27:54 <sean-k-mooney> it's possible to use the cgroups api to subdivide the host and mark some cpus as reserved for exclusive use by a slice/partition
16:28:02 <sean-k-mooney> gibi: it can
16:28:20 <sean-k-mooney> sp-bmilanov would like nova to parse the cgroups topology and try to auto-compute it
16:28:43 <sean-k-mooney> one of my biggest issues with that is libvirt and nova-compute might have different views of the cgroups
16:28:58 <sean-k-mooney> i.e. nova is deployed in a podman container and libvirt is installed on the host
16:29:12 <sp-bmilanov> sean-k-mooney: that was our initial approach, but we are trying to solve the issue of Nova using CPUs that are not available to it
16:29:19 <gibi> what is the problem then setting the cpuset config option to only allow nova to use the CPUs the deployer wants?
16:29:33 <sean-k-mooney> sp-bmilanov: we have config options to specify which cpus nova can use
16:29:49 <sean-k-mooney> but you explicitly do not want to use the mechanism we provide for this
16:30:25 <sp-bmilanov> yep, we'd like to find a way in which Nova can handle this itself
16:30:49 <sp-bmilanov> else it's a runtime error on VM "creation"
16:31:13 <gibi> sp-bmilanov: why the cpuset config is not enough?
16:31:41 <sean-k-mooney> gibi: it does work but sp-bmilanov does not like the ux as they perceive it as overly complex to have to configure it
16:31:43 <gibi> what is the reason you want more automatic discovery of available cpus?
16:32:43 <sean-k-mooney> nova has supported specifying which cpus can be used for guests via config since the introduction of vcpu_pin_set around essex
16:33:18 <sean-k-mooney> we currently declare it to be the installer's responsibility to configure nova for the host it is deployed on
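[editor's note: the explicit mechanism referenced above can be illustrated with a nova.conf fragment; the CPU ranges below are made-up example values, not a recommendation]

```ini
# Illustrative /etc/nova/nova.conf fragment (example values): the existing,
# operator-driven way to restrict which host CPUs nova-compute may use.
[compute]
# host CPUs usable for unpinned (shared) guest vCPUs
cpu_shared_set = 4-15
# host CPUs usable for pinned (dedicated) guest vCPUs
cpu_dedicated_set = 16-31
```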
16:33:34 <sp-bmilanov> yes, partly UX, and that we think it might be possible to do automatically
16:34:13 <sean-k-mooney> i dont think it really is, not without libvirt changes IMO
16:34:44 <sean-k-mooney> i think we would need libvirt to provide an api that told nova which cpus could be used
16:35:03 <sean-k-mooney> we currently use the capabilities api to discover this info from libvirt
16:35:24 <sean-k-mooney> that provides us the cpu topology, numa affinity, etc.
16:36:09 <sean-k-mooney> it does not tell us which cpus are reserved for exclusive use and cannot be used by libvirt to spawn a vm
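[editor's note: the capabilities lookup discussed above can be sketched as follows; the XML is a made-up, heavily trimmed capabilities-style document, and a real deployment would obtain it via the libvirt bindings (e.g. conn.getCapabilities()) rather than a hard-coded string]

```python
# Sketch: extracting host CPU ids from a libvirt-capabilities-style document.
# Assumption: the XML below is illustrative, not real output from any host.
import xml.etree.ElementTree as ET

SAMPLE_CAPS = """
<capabilities>
  <host>
    <topology>
      <cells num="1">
        <cell id="0">
          <cpus num="4">
            <cpu id="0" socket_id="0" core_id="0" siblings="0"/>
            <cpu id="1" socket_id="0" core_id="1" siblings="1"/>
            <cpu id="2" socket_id="0" core_id="2" siblings="2"/>
            <cpu id="3" socket_id="0" core_id="3" siblings="3"/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

def host_cpu_ids(caps_xml: str) -> set[int]:
    """Collect every host CPU id the capabilities document reports.

    Note: this is exactly the limitation discussed above -- the document
    describes topology and NUMA affinity, but says nothing about CPUs
    reserved for exclusive use by another cgroup/slice.
    """
    root = ET.fromstring(caps_xml)
    return {int(cpu.get("id")) for cpu in root.iterfind(".//cpus/cpu")}

print(sorted(host_cpu_ids(SAMPLE_CAPS)))  # [0, 1, 2, 3]
```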
16:37:11 <sp-bmilanov> I agree that this is ultimately an issue in libvirt, but I am not sure how long it would take for a fix to propagate to OpenStack, and in which releases
16:38:01 <sp-bmilanov> and if there is something adequate we can do in Nova in the meantime, why not
16:38:03 <sean-k-mooney> i dont think it would be a good idea to make nova discover this via parsing /sys or /proc
16:38:16 <bauzas> that's a good question. Usually we ask the reporter to first engage with the libvirt community to see their thoughts
16:38:21 <dansmith> I don't even understand how that _could_ be a thing
16:38:53 <sean-k-mooney> the poc patch looks at a particular part of /sys to get the cpu set allowed by nova's cgroup
16:38:58 <sp-bmilanov> dansmith: the use case is hyperconverged setups where there is, e.g., a storage service with reserved CPUs installed on the hypervisor
16:39:05 <sean-k-mooney> but as i noted above that is not the same as libvirt's
16:39:07 <dansmith> if we need to know what libvirt has available for us and we can't ask it, I don't really see what options we have for discovering it
16:39:10 <sean-k-mooney> so it would be the wrong info
16:39:20 <dansmith> sp-bmilanov: I get the use-case totally
16:39:43 <dansmith> sean-k-mooney: does /sys's view of available CPUs change by cgroup? I had assumed not
16:40:24 <sean-k-mooney> the patch was going to look at cgroup-specific subpaths
16:40:32 <sean-k-mooney> https://review.opendev.org/c/openstack/nova/+/927474/10/nova/virt/libvirt/host.py#93
16:40:32 <dansmith> but either way, that doesn't seem like a solution because we don't know that our cgroup is the same (or configured the same) as libvirt's
16:40:39 <sean-k-mooney> /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
16:40:43 <dansmith> ah I see
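[editor's note: for reference, expanding a cpuset "list format" string such as the contents of that file is straightforward; this is a sketch with a hypothetical input, not the proposed patch, and it does not address the namespace/visibility objections raised above]

```python
# Sketch of parsing a cpuset "list format" string, e.g. the contents of a
# cpuset.cpus file. Input values here are made up for illustration.
def parse_cpuset(spec: str) -> set[int]:
    """Expand e.g. "0-3,8,10-11" into {0, 1, 2, 3, 8, 10, 11}."""
    cpus: set[int] = set()
    spec = spec.strip()
    if not spec:  # an empty file means no CPUs are listed
        return cpus
    for chunk in spec.split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus

print(sorted(parse_cpuset("0-3,8,10-11")))  # [0, 1, 2, 3, 8, 10, 11]
```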
16:41:30 <sp-bmilanov> yep, that's not ideal, but we can simplify or extend the checks as much as deemed necessary
16:41:34 <bauzas> last time we tried to look at sysfs for CPU management, it created some problems
16:42:02 <sean-k-mooney> sp-bmilanov: in general we try to avoid parsing /sys or /proc if at all possible
16:42:05 <bauzas> so I'm afraid some OSes couldn't support that
16:42:11 <sean-k-mooney> it creates maintenance issues for us
16:42:23 <dansmith> a feature that works "sometimes" if a bunch of assumptions are made (i.e. libvirt and nova in the same slice) seems worse than no feature to me, in a lot of cases
16:43:03 <sean-k-mooney> if we were to do it i could maybe see doing it if and only if you provide the path or slice as a config option
16:43:13 <sean-k-mooney> and only if you configure it
16:43:16 <sean-k-mooney> i.e. opt in
16:43:22 <sean-k-mooney> not on by default
16:43:24 <sp-bmilanov> sean-k-mooney that sounds good
16:43:38 <sp-bmilanov> an opt-in, if-set -like option
16:44:00 <sp-bmilanov> a bit like the CPU pinning, but more autonomous
16:44:33 <sean-k-mooney> mostly, but im not sure how others feel about that
16:44:59 <bauzas> seems yet another knob to me
16:45:11 <dansmith> can we see all the slices in sysfs or only ours?
16:45:18 <bauzas> with a fragile interface and many ways to get it wrong
16:45:29 <sean-k-mooney> that depends
16:45:41 <sean-k-mooney> if you're in a container you can't see the host one by default
16:45:41 <dansmith> if we can see all of them, then passing the name/path to the one libvirt is in so we can inspect it would be better, but idk.. seems fragile to me like bauzas says
16:46:00 <dansmith> sean-k-mooney: right, so we likely can't see libvirt's either right?
16:46:20 <sean-k-mooney> not the way tripleo or kolla deploys nova
16:46:31 <sean-k-mooney> or our new installer
16:46:45 <dansmith> yeah
16:46:48 <sp-bmilanov> sean-k-mooney mentioned in the change that kolla and tripleo are working around it... sean-k-mooney, where do you "not typically run the nova-compute binary with cgroupns_mode: "host""?
16:47:11 <sean-k-mooney> right so existing installers only set that on libvirt
16:47:14 <sean-k-mooney> not on nova
16:47:21 <sean-k-mooney> so nova-compute has a restricted view
16:47:37 <sp-bmilanov> ah, right.. but which installer is that?
16:47:55 <sp-bmilanov> (I might be missing Nova context)
16:48:26 <sp-bmilanov> (btw, there's the other side of the discussion: given that Nova can currently start and schedule VMs with this misconfiguration, what would be the fix so it does not start at all (as sean-k-mooney suggested in the change comments))
16:49:24 <sean-k-mooney> libvirt has it in kolla  https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova-cell/defaults/main.yml#L11
16:49:29 <sean-k-mooney> but not nova-compute https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova-cell/defaults/main.yml#L55
16:49:40 <sean-k-mooney> and it's the same in tripleo and the new installer redhat wrote
16:50:08 <sean-k-mooney> sp-bmilanov: what installer do you use
16:50:21 <bauzas> time check : 10 mins before end of meeting
16:50:34 <sean-k-mooney> dansmith: bauzas am i right in thinking you agree this is out of scope for nova
16:50:40 <sean-k-mooney> and in scope for the installer
16:51:09 <sp-bmilanov> we have seen a few, but yes, kolla and no installer (operating system packages)
16:51:20 <dansmith> I mean, that's kinda my point that this seems like us introspecting the system to dynamically configure ourselves, but in a way that can only work in certain situations
16:51:20 <bauzas> I'd tend to say yes, this sounds like a tooling mechanism that could set the config correctly
16:51:51 <sean-k-mooney> i would be less against this if libvirt provided an api for it
16:51:57 <bauzas> the concern I have is that this approach doesn't sound generic at all and very targeted to a specific case
16:52:22 <sean-k-mooney> also those config options we use are meant to be virt-driver independent
16:52:37 <dansmith> right, if libvirt exposes it, that's totally different
16:53:02 <dansmith> in that case, configuring libvirt is the job of the installer, and nova can maybe even lose some config, right? or at least, not always require it to be set
16:53:05 <gibi> also it opens the question what if the cgroups config changes while nova-compute is running
16:53:18 <sean-k-mooney> fun times :)
16:53:40 <gibi> at least the current config nova has is validated at nova-compute startup
16:54:11 <sean-k-mooney> yep it won't actually catch this specific case
16:54:29 <sean-k-mooney> i.e. we validate the cpus are in the info provided by libvirt
16:54:35 <sean-k-mooney> and if they are online, etc.
16:54:48 <sean-k-mooney> but we don't have visibility into cgroups at all really
16:54:53 <sean-k-mooney> at least not in this context
16:55:19 <sean-k-mooney> we have some basic checks to know if it's cgroups v2 vs v1
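[editor's note: the startup validation described above can be sketched as a simple subset check; the function name and inputs are illustrative, not nova's actual internals]

```python
# Minimal sketch of a fail-fast startup check: reject a configured CPU set
# that is not a subset of what the hypervisor reports as available.
# (Hypothetical helper; real nova-compute validation is more involved.)
def validate_cpu_config(configured: set[int], reported: set[int]) -> None:
    missing = configured - reported
    if missing:
        raise ValueError(
            f"Configured CPUs {sorted(missing)} are not reported by the hypervisor")

validate_cpu_config({0, 1}, {0, 1, 2, 3})  # passes silently
try:
    validate_cpu_config({0, 9}, {0, 1, 2, 3})
except ValueError as exc:
    print(exc)  # Configured CPUs [9] are not reported by the hypervisor
```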
16:56:17 <bauzas> time check : 4 mins left.
16:57:11 <gibi> this is complexity that either lives in the installer or lives in nova, as nova is big enough already I rather see this complexity added in the installer. Sure we have multiple installers so that duplicates some effort.
16:57:32 <dansmith> ++
16:57:35 <sp-bmilanov> I was wondering if there is also a way to know this outside of reading cgroups that might be more robust
16:57:50 <sean-k-mooney> if there is im not aware of one
16:58:05 <dansmith> there is actually
16:58:11 <dansmith> ....and it's talking to libvirt :D
16:58:15 <sean-k-mooney> :)
16:58:36 <sp-bmilanov> :D
16:58:43 <bauzas> I guess we can't solve that question now
16:59:09 <bauzas> sp-bmilanov: could you at least talk to the libvirt community and see their thoughts about that ?
16:59:15 <sp-bmilanov> in any case, it sounds like a good idea to initiate a discussion with libvirt devs as well
16:59:16 <bauzas> and then come back to us ?
16:59:28 <sp-bmilanov> bauzas: yep
16:59:31 <bauzas> cool
16:59:41 <bauzas> thanks then
16:59:52 <sp-bmilanov> you're welcome
17:00:04 <sp-bmilanov> thanks for the input, all
17:00:08 <bauzas> if you're OK, I think we're done for today
17:00:14 <bauzas> thanks all
17:00:17 <bauzas> #endmeeting