Thursday, 2025-11-20

01:30 *** Jeff is now known as Guest31677
02:35 *** mhen_ is now known as mhen
06:59 <opendevreview> huanhongda proposed openstack/nova master: live migration: Delete allocation if post_live_migration fails  https://review.opendev.org/c/openstack/nova/+/967652
07:27 <opendevreview> huanhongda proposed openstack/nova master: live migration: Delete allocation if post_live_migration fails  https://review.opendev.org/c/openstack/nova/+/967652
09:48 <opendevreview> Rajesh Tailor proposed openstack/nova master: Fix memory_details units in diagnostics API to match documentation  https://review.opendev.org/c/openstack/nova/+/967505
10:16 *** sfinucan is now known as stephenfin
10:34 <opendevreview> Sahid Orentino Ferdjaoui proposed openstack/nova master: compute/manager: fix live migration issue due to orphaned attachments  https://review.opendev.org/c/openstack/nova/+/967683
10:36 <andreykurilin> Hi folks! I have a question regarding https://bugs.launchpad.net/nova/+bug/2091114 - how are operators supposed to evacuate a hypervisor in case of emergency? Is there any way to bypass the new check?
10:38 <sean-k-mooney> andreykurilin: in an emergency you can disable the security measure on the target hypervisor
10:38 <sean-k-mooney> the flatcar images have been fixed upstream
10:39 <andreykurilin> It does not relate to flatcar images only; some other vendors violated the specifications as well
10:39 <sean-k-mooney> yep
10:39 <sean-k-mooney> so as an operator you have the option of enforcing spec compliance and safety
10:40 <sean-k-mooney> or allowing unsafe images
10:40 <sean-k-mooney> we have taken the stance that all non-spec-compliant images are unsafe
10:41 <sean-k-mooney> you could use scheduler filters to move workloads that use unsafe images to a designated host aggregate
10:42 <andreykurilin> I agree with that, and we want to enable this check for all new images. But at the same time it would be nice to fix existing images in an iterative way without making the cloud less secure (i.e. disabling all the checks)
10:42 <sean-k-mooney> or other means to partition your cloud
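
A hedged sketch of the per-host switch referred to above, assuming it is nova's [workarounds]/disable_deep_image_inspection option (added with the image-format CVE fixes) and that it is set only on hosts in a designated "unsafe images" aggregate:

    [workarounds]
    # Assumption: re-enables acceptance of non-spec-compliant images on
    # this compute host only; leave unset (False) on all other hosts.
    disable_deep_image_inspection = True
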
10:42 <sean-k-mooney> the only way to fix the images would be to modify the data for the image in glance
10:43 <sean-k-mooney> for flatcar that just meant removing a flag on the partition, but you would have to download the image and reupload it
10:43 <sean-k-mooney> if it was an operator-provided image you can do that, but not for a user image
10:44 <sean-k-mooney> the oslo image inspector module supports a way of invoking it from the command line on an image so you can test it locally
10:45 <andreykurilin> oslo image cli note - nice to know, thank you
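
For testing an image locally as suggested above, a minimal Python sketch against the oslo.utils inspector; it assumes the detect_file_format()/safety_check() API and that safety_check() raises SafetyCheckFailed on a violation, as in recent oslo.utils releases:

    import sys

    from oslo_utils.imageutils import format_inspector

    # Detect the on-disk format of a local image file and run the same
    # safety checks that nova/glance apply when the image is used.
    path = sys.argv[1]
    inspector = format_inspector.detect_file_format(path)
    print('detected format: %s' % inspector)
    try:
        inspector.safety_check()
        print('image passed the safety check')
    except format_inspector.SafetyCheckFailed as exc:
        print('image failed the safety check: %s' % exc)
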
10:48 <andreykurilin> I understand that fixing images requires reuploading and interactions with end users, which takes time. Moving all existing VMs onto dedicated HVs with all image checks disabled sounds expensive :) right now we are leaning towards patching the oslo lib to remove this check so we have enough time to work with end users
10:48 <sean-k-mooney> https://github.com/openstack/oslo.utils/commit/8e6cf9556413f4ef3b75c09b3a42896a4d1b9253
10:49 <sean-k-mooney> andreykurilin: that's not something that is upstreamable, but it is something you could do in your cloud
10:52 <andreykurilin> Thank you
10:52 <sean-k-mooney> andreykurilin: i do have a patch to do that semi-safely; however, it was never really intended to merge, just to demonstrate what it would look like https://review.opendev.org/c/openstack/oslo.utils/+/937215/2/oslo_utils/imageutils/format_inspector.py
10:52 <sean-k-mooney> there is another similar patch https://review.opendev.org/c/openstack/oslo.utils/+/964528
10:53 <sean-k-mooney> neither of which we expect to be merged
10:56 <andreykurilin> Got it. Thank you for the helpful notes
10:57 <sean-k-mooney> i don't know if it would make sense to eventually build an "image fixer" type tool that operators could use, but it kind of feels out of scope for oslo.utils
10:58 <sean-k-mooney> maybe, but i don't know of anyone working on that.
10:58 <sean-k-mooney> since legacy or uefi boot is selected as an image property on the image that is uploaded, you will have separate copies of the image for each mode in glance
10:59 <sean-k-mooney> so you could do what was provided as the flatcar workaround. i don't think glance would want to support that itself, but it does have interoperable import plugins for format conversion among other things
11:00 <sean-k-mooney> so that may also be an option eventually. however, that's something you would have to talk to the oslo or glance folks about
11:06 *** BertrandLanson[m] is now known as blanson[m]
11:08 <nelljerram> elodilles: Thanks for pushing my cherry picks along.  You mentioned on https://review.opendev.org/c/openstack/nova/+/967569 that the grenade job needs investigation - can you suggest in more detail how I can do that?  (If there is something more specific than the general guidance at https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures)
11:10 <andreykurilin> sean-k-mooney: yeah, "image fixer" sounds great, but from my noob point of view there is too high a risk of damaging a "nobody-knows-who-or-how-built-this" user image. It may not be worth the effort. And it's definitely out of scope for oslo.utils. Thank you for all your input!
11:27 <opendevreview> John Garbutt proposed openstack/nova master: WIP: Ironic: add logs during provision failures  https://review.opendev.org/c/openstack/nova/+/967823
11:27 <opendevreview> John Garbutt proposed openstack/nova master: WIP: Ironic: on delete wait for cleaning to finish  https://review.opendev.org/c/openstack/nova/+/967824
11:58 <DominikDanelski[m]> Is there any reason why we would want allocations left by a VM that was over quota and never entered ACTIVE to be left in Placement?
12:02 <DominikDanelski[m]> I observed it with unified limits and I'm not sure if that's an error or intended behaviour. It doesn't seem to make sense to me, but maybe there's some underlying reason I'm not aware of.
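
A hedged pointer rather than a diagnosis: leftover allocations like this can be confirmed and cleaned up with nova-manage, which cross-checks placement allocations against existing instances and migrations:

    nova-manage placement audit --verbose           # report orphaned allocations
    nova-manage placement audit --verbose --delete  # also remove them
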
12:54 <elodilles> nelljerram: thanks for looking into the issue! I've commented on the patch, let's see if my grenade patch fixes the issue when it merges
13:07 <nelljerram> elodilles: many thanks, fingers crossed
13:30 *** ykarel_ is now known as ykarel
14:08 <priteau> Hello. We have a customer who has raised an interesting issue: they have some tooling using server groups and affinity/anti-affinity rules and were expecting it to work out of the box in a Nova+Ironic context, with affinity policy preventing scheduling and anti-affinity working. However, I think what is happening is that the affinity/anti-affinity filter uses the nova-compute host as a comparison, correct?
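
An illustrative sketch (not verbatim nova code) of why this surprises people: the server-group filters compare the nova-compute service host recorded for each group member, and with the ironic driver many bare metal nodes sit behind a single nova-compute service, so the comparison happens at the wrong granularity:

    # Hypothetical simplification of nova's ServerGroupAntiAffinityFilter.
    class ServerGroupAntiAffinityFilter:
        def host_passes(self, host_state, spec_obj):
            group = spec_obj.instance_group
            if not group or group.policy != 'anti-affinity':
                return True
            # group.hosts holds nova-compute *service* hosts, not ironic
            # nodes, so every node behind an already-used service fails
            # anti-affinity, and affinity pins to a service, not a machine.
            return host_state.host not in group.hosts
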
14:15 <opendevreview> ribaudr proposed openstack/nova master: Use *_OR_ADMIN policy defaults for server shares  https://review.opendev.org/c/openstack/nova/+/967709
15:08 <opendevreview> Rajesh Tailor proposed openstack/nova master: Fix memory_details units in diagnostics API to match documentation  https://review.opendev.org/c/openstack/nova/+/967505
15:26 <opendevreview> Balazs Gibizer proposed openstack/nova master: Do not fork compute workers in native threading mode  https://review.opendev.org/c/openstack/nova/+/965466
15:26 <opendevreview> Balazs Gibizer proposed openstack/nova master: Compute manager to use thread pools selectively  https://review.opendev.org/c/openstack/nova/+/966016
15:26 <opendevreview> Balazs Gibizer proposed openstack/nova master: Libvirt event handling without eventlet  https://review.opendev.org/c/openstack/nova/+/965949
15:26 <opendevreview> Balazs Gibizer proposed openstack/nova master: Run nova-compute in native threading mode  https://review.opendev.org/c/openstack/nova/+/965467
15:26 <opendevreview> Balazs Gibizer proposed openstack/nova master: WIP:Move eventlet specific libvirt event handling  https://review.opendev.org/c/openstack/nova/+/967178
15:44 <opendevreview> Balazs Gibizer proposed openstack/nova master: DNM:Test with oslo.vmware eventlet removal patches  https://review.opendev.org/c/openstack/nova/+/967487
18:06 <dansmith> melwitt: I just had a thought as I start looking at deployment mode
18:06 <dansmith> melwitt: if I create an instance, snapshot it, and then delete it.. I'm going to lose my TPM key
18:07 <dansmith> since we don't snapshot the TPM with the instance, that's fine now
18:07 <dansmith> but I guess I'm not sure what the long-term intent here is.. maybe we won't ever want to save the TPM with a snapshot for the cloud-fork case, but for the "I snapshot for backup reasons" case it's more relevant...
18:09 <sean-k-mooney> well, that would be required if we ever support shelve
18:09 <sean-k-mooney> but yes, that was one of the open questions
18:09 <sean-k-mooney> do we ever support snapshotting the tpm, and where to put the data
18:11 <sean-k-mooney> we briefly discussed whether you would also need an api change to allow requesting it to be snapshotted and/or restored in rebuild, because you might want to rebuild the root disk without regenerating the tpm
18:12 <dansmith> shelve doesn't suffer from us deleting the tpm key in deployment mode, which is what I'm focusing on here
18:13 <sean-k-mooney> well, at the moment, since we don't store the tpm data, the fact that we delete the tpm key when the vm is deleted is consistent
18:14 <sean-k-mooney> it's a problem for the "boot, set up, snapshot and boot many from snapshot" workflow
18:14 <sean-k-mooney> but that's not really in scope for the live migration feature, is it
18:15 <sean-k-mooney> dansmith: are you suggesting that in the current proposed code, rebuilding would delete the tpm secret?
18:16 <sean-k-mooney> i.e. just doing boot, snapshot, rebuild?
18:16 <sean-k-mooney> without the delete step?
18:18 <dansmith> I'm just saying that we're deleting the secret on instance delete in deployment mode but not the others (of course) and that may be surprising for cases like boot, snapshot, delete, boot-from-snapshot, for whatever reason you might do that (cross-nova rebuilding, boot from backup, etc)
18:18 <dansmith> obviously we're not storing the TPM so that's to be expected now, I'm just noting the difference
18:19 <sean-k-mooney> ack, that's because the secret is owned by nova
18:19 <sean-k-mooney> is the tpm secret ever deleted in the user case
18:19 <sean-k-mooney> i would expect it to be deleted with the vm in that case also
18:19 <dansmith> it can't be
18:20 <sean-k-mooney> why? we have the user's token
18:20 <sean-k-mooney> for the delete, unless an admin did it
18:20 <dansmith> not always.. admin delete, local delete, init_host() finish delete, etc
18:20 <sean-k-mooney> so instead of cleaning it up in the cases we can, we leak it always?
18:21 <sean-k-mooney> i'm not saying that's a bug, but it's surprising to me
18:21 <dansmith> actually, you're right, we are deleting it in those cases, we'll just fail to actually do it in the cases I'm noting
18:21 <sean-k-mooney> so we probably need to document the key deletion semantics clearly for all 3 cases in any event
18:21 <dansmith> melwitt: ^
18:22 <sean-k-mooney> so i know an admin can't read the secret
18:22 <sean-k-mooney> but i thought the admin user can list and delete them in barbican
18:23 <sean-k-mooney> did i imagine that
18:24 <sean-k-mooney> https://github.com/openstack/barbican/blob/master/barbican/common/policies/secrets.py#L41-L49
18:24 <dansmith> maybe? but in local_delete and init_host cleanup we won't even have (or use) that
18:24 <sean-k-mooney> well, we do via the nova credential
18:24 <dansmith> we won't use it, AFAIK
18:24 <sean-k-mooney> i.e. the nova user should have service and admin
18:25 <sean-k-mooney> oh ok
18:25 <dansmith> ideally it won't forever though
18:25 <sean-k-mooney> maybe we could, to make it consistent in all cases. we won't have admin, you mean? just service, right?
18:25 <sean-k-mooney> correct
18:25 <sean-k-mooney> allowing the service role to delete a secret might make sense
18:26 <sean-k-mooney> but that is not in their policy today
18:26 <dansmith> eh, idk, if we just make service do everything admins can do, that also kinda defeats the point
18:27 <sean-k-mooney> ya, i guess i'm not sure which side of the line this falls on
18:28 <sean-k-mooney> in any case it sounds like we can't guarantee cleanup in most cases
18:28 <dansmith> melwitt: so when you read the backscroll, ignore my line about losing the TPM key only under deployment mode, but we might want to consider/test/fix late deletes and make them use your nova credential to do the delete in cases where we don't have local creds anymore
18:28 <sean-k-mooney> although it will work in a subset, so users or admins will need to discover and remove the secrets periodically
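
A hedged example of that periodic cleanup, using the barbican client's OSC plugin and assuming credentials whose barbican policy (linked above) permits listing and deleting the leaked secrets; the URL is a placeholder:

    openstack secret list
    openstack secret delete https://barbican.example.com/v1/secrets/<uuid>
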
19:28 * melwitt reads through the backscroll
19:31 <melwitt> dansmith: what do you mean by local creds? libvirt secrets stored on the compute in the case of 'host' mode? the proposed code tries to look up the libvirt secret first, and if it doesn't find it, it falls back to pulling it from barbican
19:32 <dansmith> melwitt: I meant the nova service user
19:32 <melwitt> ok, the word "local" is confusing me
19:32 <dansmith> melwitt: like, compute was down, user did a delete, when we come back up and finish the local delete we don't have any creds to talk to barbican other than the service user
19:33 <melwitt> ok I see
19:33 <dansmith> probably also for soft/delayed delete
19:34 <melwitt> I'll check through that and add test coverage for it. I remember covering all that when I worked on ephemeral encryption but I'm not sure if I remembered it for this
19:34 <dansmith> ack
19:35 <melwitt> I think the deferred delete is taken care of already but yeah, it would be best to test specifically for that as well. I'll work on covering all of the late delete possibilities
19:36 <dansmith> okay, I didn't see where that would be, but we have loooots of delete indirection going on, so I'm sure I missed it :)
19:54 <melwitt> dansmith: for the deferred delete stuff it's compartmentalized into the "complete_deletion" methods: deleting secrets and all of that happens in "complete_deletion", which does not run until the reclaim periodic task runs; that task checks for deferred delete expiry and "completes deletion" if the time has expired
19:57 <melwitt> (deferred delete meaning the soft delete API)
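
For reference, the periodic task described above is driven by nova's reclaim interval; a minimal nova.conf sketch, with the value purely illustrative:

    [DEFAULT]
    # Soft-deleted instances are kept for this many seconds before the
    # _reclaim_queued_deletes periodic task runs complete_deletion();
    # 0 (the default) disables the soft delete behavior entirely.
    reclaim_instance_interval = 3600
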
20:21 <dansmith> melwitt: right, but if we're in host or user mode we don't have a context with user creds to do the delete in barbican, right?
20:23 <dansmith> like, if we arrive at that code synchronously with a user- or admin-initiated delete, then we will
20:24 <dansmith> but if we're in that code because we're completing a local delete, we won't, unless I'm missing something
20:41 <melwitt> dansmith: ok yeah I see what you're saying now
20:43 <melwitt> the problem must already be present with plain vTPM as it is today.. since it's all user-owned secrets
21:23 <Zhan[m]> hey team, I'm wondering when the next spec freeze is? and is it for 2026.1 or 2026.2? I saw in the agenda that it says "Soft spec freeze November 20th!".
21:29 <melwitt> Zhan[m]: the soft spec freeze today is for 2026.1 and IIUC it means the last day to propose a new spec to openstack/nova-specs: "After 20 November 2025 (23:59 UTC), no new specs will be accepted." https://releases.openstack.org/gazpacho/schedule.html#g-nova-spec-soft-freeze
21:30 <melwitt> and then Dec 4 is the hard spec freeze, meaning the deadline for spec approvals, i.e. no more approvals after Dec 4
21:31 <Zhan[m]> melwitt: Thanks for explaining and the link! I guess my spec needs to go to 2026.2 :P
21:36 <melwitt> realistically yes :) if you have it ready to go and can upload within the next 2 hours before 23:59 UTC then you can try
21:36 <dansmith> melwitt: right exactly, nothing new with this series, just thinking we can now fix it since we have your use-the-service-user stuff
21:37 <melwitt> yeah, gotcha
23:22 <opendevreview> Jay Faulkner proposed openstack/nova master: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/967941
23:24 <opendevreview> Jay Faulkner proposed openstack/nova master: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/967941
23:35 <opendevreview> Merged openstack/nova master: api: Add response body schemas for security group APIs  https://review.opendev.org/c/openstack/nova/+/952973
