Tuesday, 2024-08-13

*** bauzas_ is now known as bauzas00:01
opendevreviewTakashi Kajinami proposed openstack/nova master: Detect AMD SEV-ES support  https://review.opendev.org/c/openstack/nova/+/92568502:39
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with SEV-ES memory encryption  https://review.opendev.org/c/openstack/nova/+/92610602:39
*** bauzas_ is now known as bauzas07:13
*** bauzas_ is now known as bauzas09:37
opendevreviewsean mooney proposed openstack/nova master: Skip new image format tests  https://review.opendev.org/c/openstack/nova/+/92621514:23
dansmithgibi_: can you +W this trivial skip patch to unblock the gate (for us and others) ? ^14:26
sean-k-mooneyim not sure that job runs elsewhere14:30
sean-k-mooneybut it might i could check14:30
dansmithit runs in lots of places14:30
dansmithglance for one14:30
dansmithalso in the ceph plugin14:30
sean-k-mooneyoh ok then yes for others too14:31
dansmithI think cinder's ceph import job might be based on it too, I forget14:31
gibi_dansmith: +A thanks for the tempest tests14:49
dansmiththanks!14:49
opendevreviewArtom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0  https://review.opendev.org/c/openstack/nova/+/92621815:44
sean-k-mooney@artom that not what we discussed doing to fix that16:02
sean-k-mooneyartom_: we dicussed makeign usre we first online all the cores at startup. then gather the info and then offline them when using the cpu state policy16:03
*** artom_ is now known as arotm16:04
*** arotm is now known as artom16:04
artomI don't think I was there during those discussions :)16:04
gibi_hehe, just commented that to the review:)16:04
artomBut... the effect is the same, no?16:04
gibi_artom: nova-compute cannot assume that the pcpus are online after a restart16:05
artomAh, good point.16:05
gibi_especially as nova-compute offline them16:05
sean-k-mooneyartom: please also decorrage the get capablities with https://docs.python.org/3/library/functools.html#functools.lru_cache16:06
sean-k-mooneythe content of get_capabilities shoudl never change while the agent is running16:06
sean-k-mooneywe done actully need to call libvirt every time16:07
artomWell that's a different thing16:07
artomAn improvement, but not related to fixing the bug16:07
sean-k-mooneyartom: we need to make this change to fix the bug16:07
sean-k-mooneyno its required16:07
artomExplain yourself :)16:07
sean-k-mooneybecause depenign on the livbirt verion once we turn off the cores the output will change16:07
sean-k-mooneyin some cases it will return the cached values form the first invocation in other version it will update and the info will be wrong16:08
artomOh, so you're saying memoize the entire return value instead of manually preserving the socket IDs?16:08
sean-k-mooneyyep16:08
artomOK... can any other caps change while the agent is running?16:08
sean-k-mooneynot without breaking nova :)16:08
sean-k-mooneywe do not supprot cpu/memory hotplug  or chaning the amount of hugepages while nova-comptue is running or chaning the cpu flags your cpu reprot at runtime16:09
sean-k-mooneywhich is what this funciton is basicaly providing16:10
sean-k-mooneyif you want to do any of the above you shoudl restart nova-compute to pick up the change16:10
gibi_we basically already memoizing caps as there is a self._caps to save it and an early return16:10
sean-k-mooneyyou proably shoudl not do any of the above on a host with instnaces either16:10
sean-k-mooneyoh true16:11
artomThere's a cache size limit, though. What happens when we hit it, and end up actually calling libvirt's caps and potentially getting the wrong socket?16:11
sean-k-mooneyartom: that now how that work but gibi is write 16:11
sean-k-mooneywe are intenrlaly making this a proprty/singelton and early outing already16:12
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/926218/1/nova/virt/libvirt/host.py#80216:12
artomI'll be honest, I understand gibi_'s concern, but the lru_cache thing is confusing me16:13
sean-k-mooneypretent i didn mention it16:14
artomYou -1'ed the patch and are asking for it :P16:14
sean-k-mooneyi -1 because the rest of the patch is wrong16:14
sean-k-mooneyeven without htat16:14
artomIs addressing gibi_'s comment enough to fix it?16:15
sean-k-mooneyyep16:15
artom\o/16:15
sean-k-mooneywe just neeed to make sure this work if you restart the agent multiple times16:15
sean-k-mooneyfrom an upstream point of view i think we can more or less vlaidate this with unit/funcitonal tests16:22
sean-k-mooneywe cant test this in tempest without modifyting the cpu toplogy fo the nodepool vms16:22
sean-k-mooneywe could chat to infa about that16:22
sean-k-mooneywe just need to hw_cpu_sockets=1 hw_cpu_threads=116:23
sean-k-mooneyon the nodepool images so that we do not have multipel sockets per numa node16:23
sean-k-mooneyto fully test it we would also need multiple numa nodes and maybe pci16:23
sean-k-mooneyso realistically i think we are going to have to follow up with third party ci and manual testing16:24
sean-k-mooneyand jsut test this with unit/functional test for now16:24
sean-k-mooneywe may want to consier if we shoudl alter the libvirt fixutre to emulate the observed behavior but im not sure if that is warrented or not16:25
gibi_a functional test for upstream is OK to me and we can manual test / whitbox test is downstream16:25
sean-k-mooneyi feel like that will be overly invasive16:25
gibi_sean-k-mooney: yeah, growing this logic into the fixture is too much, just mock the part of the fixture in the test case to simulate it oncec16:26
*** gibi_ is now known as gibi16:27
sean-k-mooneyack we would have to modify https://github.com/openstack/nova/blob/master/nova/tests/fixtures/libvirt.py#L808-L82216:27
sean-k-mooneyto be aware of the online state16:27
opendevreviewStephen Finucane proposed openstack/nova master: hardware: Correct log  https://review.opendev.org/c/openstack/nova/+/92622316:28
sean-k-mooney... this https://github.com/openstack/nova/blob/master/nova/tests/fixtures/libvirt.py#L808-L84716:28
stephenfinsean-k-mooney: gibi: dead simple log fix there ^16:28
sean-k-mooneystephenfin: ah you also explaine what -2 is16:29
sean-k-mooneyfor a second i tough you were just droping kb16:29
stephenfinyeah, I've just copied what we do in the success case16:29
gibistephenfin: I have a bugreport for that https://bugs.launchpad.net/nova/+bug/207595916:30
sean-k-mooney+1 while i wait of ci results16:30
opendevreviewStephen Finucane proposed openstack/nova master: hardware: Correct log  https://review.opendev.org/c/openstack/nova/+/92622316:32
stephenfingibi: Sweet. Updated with a link16:32
gibistephenfin: +2. It feels like you read my mind and fixed my low hanging todo :)16:33
gibiThanks!16:35
opendevreviewArtom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0  https://review.opendev.org/c/openstack/nova/+/92621816:41
artomgibi, sean-k-mooney, so something like that ^^? A whole lot still missing, obviously, but the basic idea.16:42
artomI wonder if we should be remembering what CPUs were offline? I feel like when we offline the dedicated ones later, that's enough?16:43
gibiartom: the current logic should offline all the unused pcpus, so we can relay on that without remembering explicitly16:45
artomOK, that keeps thing tidy16:45
gibiartom: at a first read what you proposed seems OK to me. I need to look at it once more with a fresh set of eyes probably after we have some test results too16:47
artomFor sure :) Good to know that nothing obvious stands out16:47
artomAnd there's a whitebox DNM patch running, so that should weed out any issues16:47
sean-k-mooneyartom: you proably should not be touching cores outside fo cpu_dedicated_set16:53
sean-k-mooneywe only powermanage cores in cpu_dedicated_set and its a hard error for cores in cpu_shared_set to be offline16:54
sean-k-mooneybut any cores out side of those two sets are valid for the admin to ower on/off as they desire16:55
artomYeah... I guess the only edge case where it might make sense is if the host has been trained, and nova-compute restarted to change the cpu_dedicated_set16:55
sean-k-mooneyas nova is not ment to use them16:55
artoms/trained/drained/16:55
artomAnd so some of the offline cores could be outside the _current_ dedicated set, but part of the _previous_ dedicated set16:55
sean-k-mooneywell not you missing the fact that core outside fo cpu_shared_set adn cpu_dedicated_set can be used for other usecases16:55
sean-k-mooneyartom: so it would be incorrect for use to turn on any cores not in the currnt cpu_dedicated_set16:56
sean-k-mooneyas long as the offline cores are not added to cpu_shared_set we dont care if they were formally in cpu_dedicated_set16:57
artomWe do if they're offline and we read their socket as 0 :)16:57
sean-k-mooneyno we dont16:57
sean-k-mooneybecause nova will never use them16:57
artomOh... yeah? NUMA affinity but with shared CPUs is not a thing?16:58
sean-k-mooneyit is but we have a sperat check for cpu_shared_set to ensure they are online16:58
sean-k-mooneywe dont allow you to have any offlien cores in cpu_shared_set16:58
artomAh ok, in that case it's even tidier, I don't need the new power_up_unconditionally()16:59
sean-k-mooneyright you can resue the existing _power_up although you might want to drop the _ or add a "public" wraper17:00
artomYeah17:03
opendevreviewMerged openstack/nova master: Functional test test_boot_reschedule_with_proper_pci_device_count  https://review.opendev.org/c/openstack/nova/+/76035417:48
opendevreviewArtom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0  https://review.opendev.org/c/openstack/nova/+/92621818:09
opendevreviewsean mooney proposed openstack/nova master: Skip new image format tests  https://review.opendev.org/c/openstack/nova/+/92621518:33
*** bauzas_ is now known as bauzas19:09
*** bauzas_ is now known as bauzas19:46
*** bauzas_ is now known as bauzas23:35

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!