*** bauzas_ is now known as bauzas | 00:01 | |
opendevreview | Takashi Kajinami proposed openstack/nova master: Detect AMD SEV-ES support https://review.opendev.org/c/openstack/nova/+/925685 | 02:39 |
---|---|---|
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with SEV-ES memory encryption https://review.opendev.org/c/openstack/nova/+/926106 | 02:39 |
*** bauzas_ is now known as bauzas | 07:13 | |
*** bauzas_ is now known as bauzas | 09:37 | |
opendevreview | sean mooney proposed openstack/nova master: Skip new image format tests https://review.opendev.org/c/openstack/nova/+/926215 | 14:23 |
dansmith | gibi_: can you +W this trivial skip patch to unblock the gate (for us and others) ? ^ | 14:26 |
sean-k-mooney | im not sure that job runs elsewhere | 14:30 |
sean-k-mooney | but it might i could check | 14:30 |
dansmith | it runs in lots of places | 14:30 |
dansmith | glance for one | 14:30 |
dansmith | also in the ceph plugin | 14:30 |
sean-k-mooney | oh ok then yes for others too | 14:31 |
dansmith | I think cinder's ceph import job might be based on it too, I forget | 14:31 |
gibi_ | dansmith: +A thanks for the tempest tests | 14:49 |
dansmith | thanks! | 14:49 |
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0 https://review.opendev.org/c/openstack/nova/+/926218 | 15:44 |
sean-k-mooney | @artom that not what we discussed doing to fix that | 16:02 |
sean-k-mooney | artom_: we dicussed makeign usre we first online all the cores at startup. then gather the info and then offline them when using the cpu state policy | 16:03 |
*** artom_ is now known as arotm | 16:04 | |
*** arotm is now known as artom | 16:04 | |
artom | I don't think I was there during those discussions :) | 16:04 |
gibi_ | hehe, just commented that to the review:) | 16:04 |
artom | But... the effect is the same, no? | 16:04 |
gibi_ | artom: nova-compute cannot assume that the pcpus are online after a restart | 16:05 |
artom | Ah, good point. | 16:05 |
gibi_ | especially as nova-compute offline them | 16:05 |
sean-k-mooney | artom: please also decorrage the get capablities with https://docs.python.org/3/library/functools.html#functools.lru_cache | 16:06 |
sean-k-mooney | the content of get_capabilities shoudl never change while the agent is running | 16:06 |
sean-k-mooney | we done actully need to call libvirt every time | 16:07 |
artom | Well that's a different thing | 16:07 |
artom | An improvement, but not related to fixing the bug | 16:07 |
sean-k-mooney | artom: we need to make this change to fix the bug | 16:07 |
sean-k-mooney | no its required | 16:07 |
artom | Explain yourself :) | 16:07 |
sean-k-mooney | because depenign on the livbirt verion once we turn off the cores the output will change | 16:07 |
sean-k-mooney | in some cases it will return the cached values form the first invocation in other version it will update and the info will be wrong | 16:08 |
artom | Oh, so you're saying memoize the entire return value instead of manually preserving the socket IDs? | 16:08 |
sean-k-mooney | yep | 16:08 |
artom | OK... can any other caps change while the agent is running? | 16:08 |
sean-k-mooney | not without breaking nova :) | 16:08 |
sean-k-mooney | we do not supprot cpu/memory hotplug or chaning the amount of hugepages while nova-comptue is running or chaning the cpu flags your cpu reprot at runtime | 16:09 |
sean-k-mooney | which is what this funciton is basicaly providing | 16:10 |
sean-k-mooney | if you want to do any of the above you shoudl restart nova-compute to pick up the change | 16:10 |
gibi_ | we basically already memoizing caps as there is a self._caps to save it and an early return | 16:10 |
sean-k-mooney | you proably shoudl not do any of the above on a host with instnaces either | 16:10 |
sean-k-mooney | oh true | 16:11 |
artom | There's a cache size limit, though. What happens when we hit it, and end up actually calling libvirt's caps and potentially getting the wrong socket? | 16:11 |
sean-k-mooney | artom: that now how that work but gibi is write | 16:11 |
sean-k-mooney | we are intenrlaly making this a proprty/singelton and early outing already | 16:12 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/926218/1/nova/virt/libvirt/host.py#802 | 16:12 |
artom | I'll be honest, I understand gibi_'s concern, but the lru_cache thing is confusing me | 16:13 |
sean-k-mooney | pretent i didn mention it | 16:14 |
artom | You -1'ed the patch and are asking for it :P | 16:14 |
sean-k-mooney | i -1 because the rest of the patch is wrong | 16:14 |
sean-k-mooney | even without htat | 16:14 |
artom | Is addressing gibi_'s comment enough to fix it? | 16:15 |
sean-k-mooney | yep | 16:15 |
artom | \o/ | 16:15 |
sean-k-mooney | we just neeed to make sure this work if you restart the agent multiple times | 16:15 |
sean-k-mooney | from an upstream point of view i think we can more or less vlaidate this with unit/funcitonal tests | 16:22 |
sean-k-mooney | we cant test this in tempest without modifyting the cpu toplogy fo the nodepool vms | 16:22 |
sean-k-mooney | we could chat to infa about that | 16:22 |
sean-k-mooney | we just need to hw_cpu_sockets=1 hw_cpu_threads=1 | 16:23 |
sean-k-mooney | on the nodepool images so that we do not have multipel sockets per numa node | 16:23 |
sean-k-mooney | to fully test it we would also need multiple numa nodes and maybe pci | 16:23 |
sean-k-mooney | so realistically i think we are going to have to follow up with third party ci and manual testing | 16:24 |
sean-k-mooney | and jsut test this with unit/functional test for now | 16:24 |
sean-k-mooney | we may want to consier if we shoudl alter the libvirt fixutre to emulate the observed behavior but im not sure if that is warrented or not | 16:25 |
gibi_ | a functional test for upstream is OK to me and we can manual test / whitbox test is downstream | 16:25 |
sean-k-mooney | i feel like that will be overly invasive | 16:25 |
gibi_ | sean-k-mooney: yeah, growing this logic into the fixture is too much, just mock the part of the fixture in the test case to simulate it oncec | 16:26 |
*** gibi_ is now known as gibi | 16:27 | |
sean-k-mooney | ack we would have to modify https://github.com/openstack/nova/blob/master/nova/tests/fixtures/libvirt.py#L808-L822 | 16:27 |
sean-k-mooney | to be aware of the online state | 16:27 |
opendevreview | Stephen Finucane proposed openstack/nova master: hardware: Correct log https://review.opendev.org/c/openstack/nova/+/926223 | 16:28 |
sean-k-mooney | ... this https://github.com/openstack/nova/blob/master/nova/tests/fixtures/libvirt.py#L808-L847 | 16:28 |
stephenfin | sean-k-mooney: gibi: dead simple log fix there ^ | 16:28 |
sean-k-mooney | stephenfin: ah you also explaine what -2 is | 16:29 |
sean-k-mooney | for a second i tough you were just droping kb | 16:29 |
stephenfin | yeah, I've just copied what we do in the success case | 16:29 |
gibi | stephenfin: I have a bugreport for that https://bugs.launchpad.net/nova/+bug/2075959 | 16:30 |
sean-k-mooney | +1 while i wait of ci results | 16:30 |
opendevreview | Stephen Finucane proposed openstack/nova master: hardware: Correct log https://review.opendev.org/c/openstack/nova/+/926223 | 16:32 |
stephenfin | gibi: Sweet. Updated with a link | 16:32 |
gibi | stephenfin: +2. It feels like you read my mind and fixed my low hanging todo :) | 16:33 |
gibi | Thanks! | 16:35 |
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0 https://review.opendev.org/c/openstack/nova/+/926218 | 16:41 |
artom | gibi, sean-k-mooney, so something like that ^^? A whole lot still missing, obviously, but the basic idea. | 16:42 |
artom | I wonder if we should be remembering what CPUs were offline? I feel like when we offline the dedicated ones later, that's enough? | 16:43 |
gibi | artom: the current logic should offline all the unused pcpus, so we can relay on that without remembering explicitly | 16:45 |
artom | OK, that keeps thing tidy | 16:45 |
gibi | artom: at a first read what you proposed seems OK to me. I need to look at it once more with a fresh set of eyes probably after we have some test results too | 16:47 |
artom | For sure :) Good to know that nothing obvious stands out | 16:47 |
artom | And there's a whitebox DNM patch running, so that should weed out any issues | 16:47 |
sean-k-mooney | artom: you proably should not be touching cores outside fo cpu_dedicated_set | 16:53 |
sean-k-mooney | we only powermanage cores in cpu_dedicated_set and its a hard error for cores in cpu_shared_set to be offline | 16:54 |
sean-k-mooney | but any cores out side of those two sets are valid for the admin to ower on/off as they desire | 16:55 |
artom | Yeah... I guess the only edge case where it might make sense is if the host has been trained, and nova-compute restarted to change the cpu_dedicated_set | 16:55 |
sean-k-mooney | as nova is not ment to use them | 16:55 |
artom | s/trained/drained/ | 16:55 |
artom | And so some of the offline cores could be outside the _current_ dedicated set, but part of the _previous_ dedicated set | 16:55 |
sean-k-mooney | well not you missing the fact that core outside fo cpu_shared_set adn cpu_dedicated_set can be used for other usecases | 16:55 |
sean-k-mooney | artom: so it would be incorrect for use to turn on any cores not in the currnt cpu_dedicated_set | 16:56 |
sean-k-mooney | as long as the offline cores are not added to cpu_shared_set we dont care if they were formally in cpu_dedicated_set | 16:57 |
artom | We do if they're offline and we read their socket as 0 :) | 16:57 |
sean-k-mooney | no we dont | 16:57 |
sean-k-mooney | because nova will never use them | 16:57 |
artom | Oh... yeah? NUMA affinity but with shared CPUs is not a thing? | 16:58 |
sean-k-mooney | it is but we have a sperat check for cpu_shared_set to ensure they are online | 16:58 |
sean-k-mooney | we dont allow you to have any offlien cores in cpu_shared_set | 16:58 |
artom | Ah ok, in that case it's even tidier, I don't need the new power_up_unconditionally() | 16:59 |
sean-k-mooney | right you can resue the existing _power_up although you might want to drop the _ or add a "public" wraper | 17:00 |
artom | Yeah | 17:03 |
opendevreview | Merged openstack/nova master: Functional test test_boot_reschedule_with_proper_pci_device_count https://review.opendev.org/c/openstack/nova/+/760354 | 17:48 |
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: Work around lbvirt reporting offline cores as being on socket 0 https://review.opendev.org/c/openstack/nova/+/926218 | 18:09 |
opendevreview | sean mooney proposed openstack/nova master: Skip new image format tests https://review.opendev.org/c/openstack/nova/+/926215 | 18:33 |
*** bauzas_ is now known as bauzas | 19:09 | |
*** bauzas_ is now known as bauzas | 19:46 | |
*** bauzas_ is now known as bauzas | 23:35 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!