Tuesday, 2025-08-05

*** mhen_ is now known as mhen01:40
cardoeHey all. I've brought this spec up before and folks have said they're good with it. There's a number of operators that are patching the nova code with patches to achieve this. I'm trying to advocate for these downstream operators to engage more with upstream OpenStack projects with my TC member hat on. https://review.opendev.org/c/openstack/nova-specs/+/47181502:56
cardoeWe're now seeing associated projects (neutron) marking the patches as abandoned because they don't believe it'll go anywhere.02:56
cardoePlease help me folks.02:57
jkulikNicolaiRuckel: sounds to me like you're using system_metadata in your patches, but the object doesn't have that loaded - and can't load it later on. system_metadata would have to get queried out (or set, since we're talking about tests) when the Instance is fetched from the DB/created in the tests.06:17
opendevreviewStephen Finucane proposed openstack/nova master: tests: Replace keystoneclient with keystoneauth1  https://review.opendev.org/c/openstack/nova/+/95174409:27
NicolaiRuckel jkulik: The weird thing is that I didn't knowingly touch system_metadata at all.09:27
jkulikNicolaiRuckel: then maybe your patch calls some function that does consume it? little hard to tell ;)09:42
NicolaiRuckelWhat surprised me is that the tests (for example functional.libvirt.test_uefi:test_create_server) always had the assertion `self.assertIn('image_hw_machine_type', instance.system_metadata)` and used to pass but somehow I managed to make it not lazy-loadable anymore. Is there any documentation on how system_metadata gets set/defined?09:52
jkulikafaik, there's just code.09:53
NicolaiRuckelI was afraid that this would be the answer. :D09:54
jssfrNicolaiRuckel, maybe you can link / paste the specific patch you're looking at?09:54
NicolaiRuckelone second, I need to make the repository public first09:59
NicolaiRuckelThis seems to be the commit that breaks the test: https://gitlab.com/NicolaiRuckel/nova/-/commit/9d6e6f44d5129a9de3c7f4a636e82411a711e3e310:05
jkulikYour use of `objects.Instance` looks weird. If I understand it right, you're assigning the `uuid` to the class - not an individual Instance instance. Maybe that breaks stuff.10:15
NicolaiRuckelThat sounds reasonable. I'll look into that. Thanks!10:31
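
A minimal sketch of the class-versus-instance pitfall jkulik describes above. `FakeInstance` is a hypothetical stand-in, not nova's `objects.Instance`, and the log does not confirm that this is exactly what the linked commit did; the sketch only shows why assigning an attribute on a class rather than on one object can affect every object of that class.

```python
class FakeInstance:
    """Hypothetical stand-in for illustration; not nova's objects.Instance."""

    def __init__(self, uuid):
        self._uuid = uuid

    @property
    def uuid(self):
        # Per-object value, exposed through a class-level descriptor.
        return self._uuid


first = FakeInstance("1111")
second = FakeInstance("2222")
before = (first.uuid, second.uuid)

# Pitfall: assigning on the class replaces the descriptor itself,
# so every existing and future object now reports the same value.
FakeInstance.uuid = "3333"
after = (first.uuid, second.uuid)

print(before)
print(after)
```

In a test suite this kind of assignment can leak into unrelated tests, because the class object is shared for the lifetime of the process.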
darkhackernchttps://bugs.launchpad.net/nova/+bug/2119126 team can someone look into this11:24
mohsen__hello everyone. I have an issue with resource tracking in openstack.11:32
mohsen__I have set the following configs in my nova compute nova.conf file:11:32
mohsen__[compute]11:32
mohsen__cpu_shared_set = 0-17,36-5311:32
mohsen__cpu_dedicated_set = 18-35,54-7111:32
mohsen__and I enabled the NUMATopologyFilter  and AggregateInstanceExtraSpecsFilter filters so that I can schedule cpu-pinned instances on my desired aggregate of compute hosts. 11:32
mohsen__The issue is that the placement service periodically adds the specified set of dedicated cpus to the inventory as a PCPU section, and after a while, it removes the PCPU section of the inventory unexpectedly, which results in not being able to delete instances and getting the following error:11:32
mohsen__Error during ComputeManager.update_available_resource: nova.exception.ReshapeNeeded: Virt driver indicates that provider inventories need to be moved.11:32
mohsen__I appreciate your ideas in advance11:32
sean-k-mooneymohsen__: is this an existing deployment with workload?11:57
sean-k-mooneymodifying the legacy vcpu_pin_set and newer cpu_*_sets should only be done on empty hosts11:57
sean-k-mooneyim not aware of a bug that would cause the issue you describe but it sounds like the "reshape" code that is run when upgrading to a release that supports pinned and unpinned guests on the same host is trying to fix the allocations/inventories. i dont know what would cause the pcpu inventories to be removed other than perhaps if the resource tracker was broken by vms running on the12:00
sean-k-mooneywrong cores. that can happen if you defined or modified those values when vms were on the hosts12:00
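
A quick sanity check for the two CPU sets quoted above, as a rough sketch. `parse_cpu_set` is a hypothetical helper written for this example, not nova's own parser, and it only handles comma-separated IDs and inclusive ranges. It confirms that the shared and dedicated sets do not overlap and counts the dedicated CPUs, which is presumably the number expected to show up as PCPU inventory.

```python
def parse_cpu_set(spec):
    """Expand a string such as '0-17,36-53' into a set of CPU IDs.

    Hypothetical helper for illustration only: it handles plain IDs and
    inclusive 'low-high' ranges, nothing else.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


shared = parse_cpu_set("0-17,36-53")
dedicated = parse_cpu_set("18-35,54-71")
overlap = shared & dedicated

print(len(shared), len(dedicated), sorted(overlap))
```

An empty overlap only rules out a mistake in the configuration itself; it says nothing about where already-running guests are actually placed, which is the scenario sean-k-mooney raises.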
sean-k-mooneydarkhackernc: if the compute service is down the instance is not expected to be manageable via the nova api12:02
mohsen__sean-k-mooney: Yes. My OpenStack environment already has a couple of instances up and running on it. 12:06
mohsen__Thank you for your helpful response 12:06
sean-k-mooneydarkhackernc: this looks like a masakari issue so i marked the bug as invalid for nova.12:13
NicolaiRuckeljkulik: That really helped. I was able to fix the problem now.12:26
jkulikNicolaiRuckel: nice!12:31
opendevreviewLajos Katona proposed openstack/nova master: WIP Use SDK for Neutron  https://review.opendev.org/c/openstack/nova/+/92802212:36
mohsen__sean-k-mooney: I migrated all the instances from one compute node to other computes, then reconfigured nova with the config mentioned above on the compute host, which no longer has any instances. Then I ran the "openstack resource provider inventory list <compute-host-uuid>" command and found that the same issue still exists. What could be the root cause?13:09
sean-k-mooneymohsen__: you have restarted the compute agent on the host since it was drained?13:10
mohsen__sean-k-mooney: since I used kolla-ansible to reconfigure it, it has a handler which does the restart task after modification. I didn't do it manually but I can test it.13:11
sean-k-mooneyif so and the issue still happens i think you're going to have to file a bug with some logs, i.e. the full error message or warning from the resource tracker and/or periodic task related to the creation or update of the resource provider13:12
sean-k-mooneymohsen__: you did the reconfigure while there were instances on the host correct13:12
sean-k-mooneyhave you restarted it since it was empty13:12
mohsen__sean-k-mooney: I evacuated all the instances and then ran the kolla-ansible to reconfigure it. the restart task had been run after all the instances were evacuated.13:14
mohsen__sean-k-mooney: After evacuating all the instances and after reconfiguration, I manually restarted it and the issue still persists. Sounds like there is no way other than reporting a bug🤔13:18
sean-k-mooneyack, without inspecting the logs and/or error i cant really advise more. i likely wont have time to dig much deeper in the short term unfortunately. i would suggest filing a bug and trying to extract any relevant log messages (ideally at debug level) that are related to the execution of the update_available_resource periodic and the initial startup of the agent related to13:18
sean-k-mooneywhen it initially created the RP inventories with the PCPU inventory. you mentioned it was initially created and then lost, correct? it would be good to try and capture the relevant logs showing both13:18
mohsen__sean-k-mooney: Thank you for your guidance13:21
*** ykarel_ is now known as ykarel14:01
cardoeI know the nova meeting is coming up and I just want to ping on https://review.opendev.org/c/openstack/nova-specs/+/471815 I pinged earlier.14:58
UgglaNova meeting in ~30mn15:30
Uggla#startmeeting nova16:01
opendevmeetMeeting started Tue Aug  5 16:01:39 2025 UTC and is due to finish in 60 minutes.  The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
opendevmeetThe meeting name has been set to 'nova'16:01
UgglaHello everyone16:01
jssfro/16:02
jssfrNicolaiRuckel and me would like to bring up a topic in the "Open Discussion" slot16:02
jssfr(jfyi)16:02
fwieselo/16:02
Ugglajssfr, no pb but this week few people are available.16:02
jssfrthanks for the heads up :)16:03
gmaano/16:03
Ugglaawaiting people to join, but I'm pretty sure we will not have quorum.16:04
UgglaPlease raise your hand to show you are part of the meeting.16:04
NicolaiRuckelo/16:05
dansmitho/16:05
UgglaLet's start16:06
Uggla#topic Bugs (stuck/critical) 16:06
Uggla#info No Critical bug16:06
melwitto/16:06
Uggla#topic Gate status16:07
Ugglahey melwitt, good to see you16:07
Uggla#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:07
Uggla#link https://etherpad.opendev.org/p/nova-ci-failures-minimal16:07
Uggla#link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status16:07
Uggla#info Please look at the gate failures and file a bug report with the gate-failure tag.16:07
Uggla#info Please try to provide a meaningful comment when you recheck16:07
Uggla#info Gate was blocked last week with "openstack.exceptions.SDKException: Image creation failed: 'latin-1' codec can't encode character '\u2019' in position 662: ordinal not in range(256) in openstack.tests.functional.image.v2.test_image.TestImage.test_tags".16:08
Uggla#info https://review.opendev.org/c/openstack/openstacksdk/+/956369 fixed it16:08
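
The error class quoted in the gate note above can be reproduced in isolation, as a small sketch. The sample string is an assumption made up for this example, and nothing here reflects what the linked openstacksdk fix actually changed; it only shows that U+2019 (a typographic apostrophe) has no latin-1 encoding, while UTF-8 handles it.

```python
# Hypothetical sample text containing U+2019, the character named in the error.
text = "it\u2019s"

try:
    text.encode("latin-1")
    failed = False
    reason = None
except UnicodeEncodeError as exc:
    # latin-1 only covers code points 0-255, and U+2019 is outside that range.
    failed = True
    reason = exc.reason

# UTF-8 can represent the same character as a three-byte sequence.
utf8_bytes = text.encode("utf-8")

print(failed, reason, utf8_bytes)
```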
Uggla#topic tempest-with-latest-microversion job status 16:08
Uggla#link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=016:08
Ugglagmaan, do you have something to share ?16:09
gmaanah, no update on this.16:10
Ugglathanks, so moving on16:10
Uggla#topic Release Planning16:10
Uggla#link https://releases.openstack.org/flamingo/schedule.html16:11
Uggla#info Nova deadlines are set in the above schedule16:11
UgglaTime is flying, a bit less than 4 weeks before FF.16:12
Uggla#topic Review priorities 16:12
Uggla#link https://etherpad.opendev.org/p/nova-2025.2-status16:12
UgglaI guess the file is updated, please let me know if something is wrong in it.16:13
Uggla#topic OpenAPI 16:13
Uggla#link: https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned16:13
Uggla#info up from 13 to 32 remaining. (I guess Stephen created new ones and follow up)16:13
UgglaI hope we can see an end about this.16:14
UgglaI'm going to skip stable branch topic because Elod is on pto.16:15
Uggla#topic vmwareapi 3rd-party CI efforts Highlights16:15
Ugglafwiesel, something on our side ?16:15
fwiesel#info fwiesel will be on PTO until the 9/916:15
fwieselOtherwise, no news from me.16:15
fwieselUggla: back to you16:16
Ugglafwiesel, I wish you'll enjoy your PTO.16:16
fwieselThanks!16:16
UgglaAlso skipping Gibi's point about eventlet removal due to Gibi's well deserved pto.16:17
Uggla#topic Open discussion16:17
melwittI have a small topic16:18
UgglaSo jssfr, giving the mic to you.16:18
jssfrthanks!16:18
jssfrso we're currently looking into two topics which are somewhat related:16:18
jssfrfirst we're rebasing https://review.opendev.org/c/openstack/nova/+/62164616:18
jssfrsecond I proposed a draft patch for a vTPM data preservation issue a couple weeks ago: https://review.opendev.org/c/openstack/nova/+/95565716:18
sean-k-mooneyo/ sorry was doing something else16:19
jssfro/16:19
jssfrI also think that sean-k-mooney proposed something similar to the vTPM draft for UEFI NVRAM preservation16:19
jssfrwe'd like to hear whether we're heading the right direction by rebasing #621646 regarding UEFI, or whether there might be a better way.16:20
sean-k-mooneyi was suggesting bringing it up in this meeting yes16:20
melwittoh, good. this was the topic I wanted heh (vTPM data)16:20
sean-k-mooneyor rather that melwitt  bring it up16:20
jssfrI also saw only now that you replied to #955657, because I had my inbox sorting messed up, reading that now.16:20
sean-k-mooneymelwitt: do you want to summarize16:21
jssfrread it16:21
melwittsure16:21
melwittwe are looking for input and consensus about handling the recently raised issue in the community regarding vTPM data that is lost across guest reboots https://bugs.launchpad.net/nova/+bug/2118888 and https://review.opendev.org/c/openstack/nova/+/955657 the tl;dr is that this is a latent problem that originates in libvirt16:21
melwittlibvirt added a flag for this VIR_DOMAIN_UNDEFINE_KEEP_TPM a couple of years after vTPM was added in Nova and jssfr has proposed a patch for using the flag ^16:22
jssfr(which will need a bit of work, such as adding tests)16:22
jssfr(but I can do that)16:22
sean-k-mooneythere is a related but separate issue that relates to the nvram data16:22
sean-k-mooneyagain there are now flags to support keeping or removing the nvram file16:23
jssfr(UEFI NVRAM bug is https://bugs.launchpad.net/nova/+bug/1785123, patch-with-merge-conflicts is https://review.opendev.org/c/openstack/nova/+/621646 )16:23
dansmithso if we do that, we need to do our own deleting of the TPM file when we undefine to actually delete the instance?16:23
jssfrdansmith, no, because we can just not pass that flag when we truly want to delete16:23
sean-k-mooneydansmith: no. we can pass the keep or delete flag as appropriate16:23
jssfrin my draft patch, I tied it to the delete_storage bool IIRC16:23
dansmithoh it's a flag _to_ undefine?16:24
jssfryes :)16:24
melwittyes16:24
dansmithokay16:24
dansmithah I see yes, okay16:24
sean-k-mooneyso we have a number of places where we undefine the vm today where the data should not be deleted, hard-reboot is one but also some volume operations require us to temporarily undefine the domain16:24
melwittwe are thinking about, how to fix this for older versions. the flag addition could be technically seen as a "feature" but IMHO the impact is quite significant and personally I would think about backporting the flag fix to older versions that could be reasonably expected to use libvirt >= 8.9.0 where the flag was introduced16:25
melwittwondering what others think of this16:26
dansmithshifting behaviors in a library that we have to account for is not a feature, IMHO16:26
jssfrI think a backport would be quite appropriate16:26
jssfrand we are going to backport to at least 2024.1 anyway16:26
jssfrso we could contribute those backported patches at least.16:27
sean-k-mooneydansmith: the wrinkle is libvirt unconditionally deleted the data when we added vtpm support and the option to preserve it was added 2 years later16:27
dansmithyeah understand16:27
sean-k-mooneywith that said they didnt really document that, which is partly why we missed it, but i agree we should not require a feature to "do the right thing"16:28
jssfrsean-k-mooney, so you're saying it's not shifting behaviour in libvirt, but simply a bug which existed since forever but which nobody noticed?16:28
sean-k-mooneyjssfr: yes, however that does not mean i believe we should consider it a feature16:28
sean-k-mooneyi support the idea of fixing this conditional on the libvirt version16:28
dansmithit's shifting behavior and our failure to account for it is a bug not a feature16:28
jssfr(I get the impression that "feature" versus "bug" is an important distinction here, but I'm not sure why. backporting/support policies?)16:28
sean-k-mooneyand possibly backporting that based on discussion with the stable team16:29
melwittI agree with dansmith 16:29
sean-k-mooneydansmith: well the default never changed in libvirt. it still deletes on undefine by default. they just added the ability to keep it in a later release16:29
sean-k-mooneyso do we agree to treat this as a logic bug in nova16:29
sean-k-mooneyi.e. that we didnt account for how this actually works16:30
sean-k-mooneyand if so are we ok with using the flags to address that16:30
sean-k-mooneyjssfr: yes the stable policy does not allow feature backports upstream16:30
jssfraha, thanks!16:31
sean-k-mooneyjssfr: there are very limited exceptions to that16:31
jssfrnoted, thanks.16:31
melwittI support backporting use of the flags (with libvirt version conditional)16:32
sean-k-mooneyim +1 on treating it as a but and fixing it by appropriately passing the flags to undefine based on if we are deleting the instance or just undefining for some other reason16:32
NicolaiRuckelthis applies to both vTPM and UEFI, right?16:33
sean-k-mooneys/as a but/as a bug/16:33
sean-k-mooneyso its the same pattern and the same logic bug16:33
sean-k-mooneyso i think we could apply the same direction to both16:34
melwitt+116:34
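
The direction agreed above (pass the keep flag on every undefine that is not a real delete, and only when libvirt is new enough) could look roughly like the sketch below. The function name, its parameters and the version tuple are assumptions made for this example and are not taken from the draft patch. The two constants are local copies that are believed to mirror libvirt's `virDomainUndefineFlagsValues`; real code would use the `libvirt.VIR_DOMAIN_UNDEFINE_*` names from the Python bindings instead.

```python
# Local copies for a self-contained example; believed to match libvirt's
# enum values, but use the constants from the bindings in real code.
VIR_DOMAIN_UNDEFINE_MANAGED_SAVE = 1
VIR_DOMAIN_UNDEFINE_KEEP_TPM = 64

# The log states the keep flag was introduced in libvirt 8.9.0.
MIN_LIBVIRT_KEEP_TPM = (8, 9, 0)


def undefine_flags(libvirt_version, deleting_instance):
    """Return the flags to pass when undefining a domain (hypothetical helper).

    libvirt_version is a (major, minor, micro) tuple; deleting_instance is
    True only when the instance itself is being removed.
    """
    flags = VIR_DOMAIN_UNDEFINE_MANAGED_SAVE
    if not deleting_instance and libvirt_version >= MIN_LIBVIRT_KEEP_TPM:
        # Hard reboot and similar operations re-define the domain right
        # away, so the vTPM state has to survive the undefine.
        flags |= VIR_DOMAIN_UNDEFINE_KEEP_TPM
    return flags


print(undefine_flags((8, 9, 0), deleting_instance=False))
print(undefine_flags((8, 8, 0), deleting_instance=False))
print(undefine_flags((9, 0, 0), deleting_instance=True))
```

On a libvirt older than 8.9.0 the sketch simply falls back to the old behaviour, which matches the "conditional on the libvirt version" approach discussed here; the NVRAM case would follow the same shape with its own flags and minimum version.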
jssfrsean-k-mooney, does this only apply for the accidental deletion on undefine for UEFI, or also for the loss of NVRAM data during migrations?16:34
sean-k-mooneythe specific libvirt versions may differ for the relevant flags16:34
sean-k-mooneyso we should fix both as separate patches16:34
sean-k-mooneyjssfr: nvram data is actually copied during live migration i believe16:35
sean-k-mooneycold migration is separate. we should focus on the non-move operation cases first16:35
jssfrNicolaiRuckel, any evidence of nvram preservation during live migrations when initially rebased https://review.opendev.org/c/openstack/nova/+/621646 ?16:36
sean-k-mooneyim not sure which subset of operations lose nvram data but fixing all of them is not going to be a single bug16:36
jssfrunderstood, thanks.16:36
sean-k-mooneyjssfr: i believe its copied by qemu16:36
jssfraha16:36
jssfrNicolaiRuckel, we'll have to double-check that in the devstack, without the rebased patches.16:37
NicolaiRuckelyeah, I didn't try it yet16:37
NicolaiRuckelThere were so many failing test cases that I wanted to clean that up first.16:37
sean-k-mooneyi would suggest fixing one bug first then the other16:38
sean-k-mooneyi dont really have a preference in order but probably vtpm might be less work16:38
sean-k-mooneyfor nvram there is existing tech debt that needs to be unwound related to rebuild16:38
jssfrsean-k-mooney, okay, I'll get to work on making the vTPM patch ready for merge this week then.16:38
jssfrprobably not going to *finish* it this week though. What is the time we need to submit the polished patch (incl. tests) for it to be realistically merged for Flamingo?16:39
sean-k-mooneyyou have about 3 weeks16:39
jssfrokay, so 1w before FF16:39
jssfrnoted16:39
sean-k-mooneywell you have until FF16:40
jssfroh I thought that was in 4 weeks.16:40
UgglaA bit less than 4 weeks16:40
jssfrright16:40
sean-k-mooneyafter which it would need a backport, but its not a regression introduced in this cycle, so between FF and RC1 we dont fix a lot of latent bugs16:40
sean-k-mooneyafter rc 1 is cut the branch would be reopened for bugfixes again16:40
UgglaFF will be August 28th16:41
jssfrit should be doable to finish it in two weeks, so I'll aim for a submission before or on 2025-08-1916:41
sean-k-mooneyyep thats why i said about 3 weeks16:41
NicolaiRuckelI'll continue with rebasing/fixing the UEFI patch in the meantime.16:41
sean-k-mooneyi think we can move on if there are other topics, unless there are any final objections?16:44
jssfrI'd probably jump out of the meeting then. I'll stay reachable here as usual though if there's any later remarks. Thanks for all the input!16:45
Ugglaok I guess melwitt point was covered ^16:46
Ugglaanything else to discuss ?16:46
jssfrokay, text you all later, thanks!16:47
UgglaAs we are almost on the top of the hour, I'll skip bug scrubbing.16:52
UgglaThanks all, see you next week.16:53
Uggla#endmeeting16:53
opendevmeetMeeting ended Tue Aug  5 16:53:50 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:53
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2025/nova.2025-08-05-16.01.html16:53
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2025/nova.2025-08-05-16.01.txt16:53
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2025/nova.2025-08-05-16.01.log.html16:53
cardoeSo I know its after the meeting but I just wanted to bring up https://review.opendev.org/c/openstack/nova-specs/+/47181517:09
sean-k-mooneycardoe: i still think that would be good to do, in 2026.1 at this point, but its still open because of review bandwidth and process17:31
cardoeSo Ironic has implemented a patch that grabs the metadata from nova and rewrites it to add that data and then ultimately writes that to disk.17:33
sean-k-mooneythat was added for another reason i believe17:34
sean-k-mooneybut yes they have some logic for updating the config drive17:34
sean-k-mooneythats for port_groups in general rather than bonding specifically i think17:35
sean-k-mooneyor perhaps its for bonding rather than for vlan trunking17:35
sean-k-mooneyi vaguely recall the patch but not the details17:35
sean-k-mooneycardoe: https://bugs.launchpad.net/ironic/+bug/210607317:36
sean-k-mooneywhich was addressed by https://review.opendev.org/c/openstack/ironic/+/94667717:37
sean-k-mooneycardoe: it looks like they have chosen to use the uuid for the link id https://review.opendev.org/c/openstack/ironic/+/946677/29/ironic/conductor/configdrive_utils.py#29517:39
sean-k-mooneyeven if that may break some use cases17:39
sean-k-mooneythats effectively what i was suggesting doing too17:39
cardoeWell I can update the spec as well.17:40
sean-k-mooneyack they have some examples of what it will look like in their unit tests https://review.opendev.org/c/openstack/ironic/+/946677/29/ironic/tests/unit/conductor/test_configdrive_utils.py#16917:41
cardoeI'm happy to do whatever you guys would like to see to get it landed.17:55
cardoeI'm merely trying to show users in my company and then at another company (where some former co-workers have gone) that they need to be engaging with upstream instead of downstream patches.17:55
cardoeWhich has morphed into me finding that spec and series which is from another company. And the written patches are written by yet another company.17:56
cardoeWhile my company engages with upstream (hi I'm here) there's other orgs within that don't necessarily.17:56
cardoeThen these other two companies aren't engaging at all.17:57
cardoeYou want devs and contributors, well I gotta convince these companies that it's not pointless to engage upstream on features.17:57
sean-k-mooneycardoe: ack Uggla fyi ^18:10
sean-k-mooneycardoe: its not pointless but someone also needs to actually do the work and champion the idea. you are promoting it well18:10
cardoeI picked this feature cause I thought it'd be the easiest since it had a spec, an implementation, it had tempest tests for nova, neutron, and ironic18:11
sean-k-mooneyright, the context around it is that many core reviewers have been feeling burnt out due to the number of features they are being asked to review, so it was proposed that we should approve fewer specs but try to actively land the ones we do approve 18:13
sean-k-mooneya side effect of that was suggesting that cores should only +2 a spec if they intend to spend time reviewing it18:13
sean-k-mooneythats why this didnt progress. its obvious that there is some interest in doing this18:14
sean-k-mooneyso i think it could happen next cycle18:14
cardoeokay18:25
opendevreviewCallum Dickinson proposed openstack/nova master: Fix image ID in libvirt metadata when unshelving  https://review.opendev.org/c/openstack/nova/+/94297321:56
opendevreviewCallum Dickinson proposed openstack/nova master: Add more flavor metadata to libvirt guest XML  https://review.opendev.org/c/openstack/nova/+/94297421:56
opendevreviewCallum Dickinson proposed openstack/nova master: Add image meta to libvirt XML metadata  https://review.opendev.org/c/openstack/nova/+/94276621:56
Callum027sean-k-mooney: Morning, thanks for getting in touch, I've rebased and adopted the suggestions you made to my patches22:00
sean-k-mooneyCallum027: thanks, im just wrapping up for today, i probably should have stopped about 4 hours ago :) but ill take a look again tomorrow once we get some ci results22:03
Callum027No worries, have a good evening22:03
sean-k-mooneyo/22:04
melwittworking on code near the server group late affinity check I noticed it looks like there has been a regression in the error handling of the check due to a past bug fix that changed what exception the late check raises https://bugs.launchpad.net/nova/+bug/211957823:06
melwittI just filed a bug to capture it23:07
