Monday, 2024-09-23

bauzashaleyb: sure, shoot08:04
parasitidhi all, i have a question regarding the shelving process. would it be possible to "suspend" then shelve an instance, and unshelve + resume so that i can resume my instance on another host without restarting the OS ? Currently it seems that unshelving an instance reboots the VM and that its state file is lost during the shelving process. Am i correct ? thanks a lot09:05
pas-ha[m]parasitid: I don't think 'shelving' saves a memory dump.. it only saves a disk state. Shelving is literally 'shutdown and create an image', and shelve-offload is '..and de-allocate resources' (removes local disk files, removes allocations in placement but keeps the instance in the DB).09:17
pas-ha[m]And 'unshelving' just means 'rebuild the instance from the image we saved in the shelve command'09:18
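The lifecycle pas-ha[m] describes can be sketched as a toy state machine (an illustrative model only, not nova's actual code; the state names follow nova's vm_states):

```python
# Illustrative sketch of the shelve lifecycle discussed above.
# NOT nova's implementation; state names follow nova's vm_states.

TRANSITIONS = {
    # (current_state, action) -> new_state
    ("ACTIVE", "shelve"): "SHELVED",                     # shutdown + create an image
    ("SHELVED", "shelve_offload"): "SHELVED_OFFLOADED",  # drop local disk + placement allocations
    ("SHELVED", "unshelve"): "ACTIVE",
    ("SHELVED_OFFLOADED", "unshelve"): "ACTIVE",         # rebuild from the saved image: fresh boot
}

def apply(state: str, action: str) -> str:
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action} not allowed from {state}")
```

Note that no (SUSPENDED, shelve) transition carrying memory state appears in this model, which is exactly the gap parasitid ran into: guest RAM is simply not part of what shelve preserves.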
parasitidpas-ha[m]: ok thanks that's what i thought. I hoped that the state machine would differentiate an "active > shelved" transition from a "suspended > shelved" one. it could be nice to offload the instance state file if suspended so that unshelving the instance could restore its state09:22
bauzasparasitid: you need to live-migrate your instance if you want to keep the instance alive09:22
bauzasshelve means "stop my instance and put it in the shelve" :)09:22
parasitidbauzas: well my usecase is : start an instance, use it. shelve it so i'm potentially not billed by the cloud provider. unshelve it and recover the previous working environment09:24
pas-ha[m]I don't think nova has anything similar to libvirt/virt-manager 'save VM' which is kind of like hibernate - dump a memory to a file, and resume from that (and disk) later.09:24
bauzassnapshot it09:25
pas-ha[m]that's what would be required for your use case as you want to save the memory as well09:25
bauzasyou could also suspend the VM09:25
bauzasbut your cloud provider will probably continue to bill you09:26
parasitidbauzas: yes, but suspending a VM keeps the resources allocated, so you're often billed by the cloud provider. whereas shelving it frees the resources09:26
bauzassure, but that's why you need to stop the instance, right?09:27
bauzaswhen you shelve, IIUC, you'll get a memory snapshot09:27
bauzassorry, a disk snapshot I mean09:28
bauzasbut the memory will be lost09:28
parasitidbauzas: ok. couldn't the shelve process consider the VM state and also backup/restore the memory state file if the instance was in the suspended state ?09:30
bauzasI see your usecase tbc09:32
bauzasbut the problem here is that for the moment, AFAIK nova isn't able to save the memory09:32
parasitidbauzas: usecase would be VDI/remote desktop 09:32
bauzasyup,  I got it09:32
bauzasthat's in general *why* people want to save the memory :D09:33
parasitidbauzas: are you sure ? i made a test this morning. when i suspend an instance and resume it, i can ssh back into my instance and tmux attach to my session and recover the vim buffer right where it was before suspending09:34
bauzashonestly, I don't know what to tell you, except that it would need a new feature09:34
bauzassuspend works indeed with the memory, but we don't snapshot it09:35
bauzasthat's why I said above "snapshot it"09:35
parasitidbauzas: ok thanks. i understand perfectly. i jumped here just in case there would be a secret tip to achieve this :)09:35
bauzasbut your usecase is different : you want to /persist/ it 09:35
bauzastbh, VMware supports that AFAIK09:36
bauzas(memory snapshot)09:36
bauzasbut as I said here, we need some new feature and some new API support for it09:36
bauzassean-k-mooney: ^09:36
bauzas(that and live-resize are possibly the most needed features we'd like to have for VMware migration :) )09:38
bauzassean-k-mooney: I wonder, could we help parasitid with file-backed memory?09:39
bauzas(oh I forgot, sean is on PTO this week)09:41
sean-k-mooney[m]we have talked about this in the past, libvirt can do memory snapshots but we can't because we do not guarantee a stable hardware interface09:43
sean-k-mooney[m]when you unshelve you might land on a different host with a different cpu, the pci device order in the guest may or may not be the same, and if you have any sriov devices their memory would not be captured09:44
sean-k-mooney[m]we may be able to make it work in some cases but we would have to save and store the memory as an additional image and also might need to record other info about the guest like the pci device ids instead of allowing libvirt to choose them09:46
sean-k-mooney[m]but no, file backed memory will not help here. libvirt has a call to take a snapshot and include the guest memory09:47
sean-k-mooney[m]so it's not that we can't do that09:47
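For reference, the libvirt call sean-k-mooney[m] mentions is the domain snapshot API; with virsh it can be driven by a snapshot definition that includes a `<memory>` element. A hedged example (the names and paths below are made up for illustration):

```xml
<!-- Hypothetical snapshot definition, fed to: virsh snapshot-create <domain> snap.xml -->
<domainsnapshot>
  <name>shelve-with-memory</name>
  <!-- external memory snapshot: guest RAM is dumped to this file -->
  <memory snapshot='external' file='/var/lib/libvirt/save/guest-mem.save'/>
  <disks>
    <disk name='vda' snapshot='external'>
      <source file='/var/lib/libvirt/images/guest-disk.snap'/>
    </disk>
  </disks>
</domainsnapshot>
```

As discussed in the channel, the hard part is not this libvirt call but guaranteeing the regenerated domain on the destination presents the same guest-visible hardware.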
gibiyeah shelve with memory would be like: hibernate your guest, write the image to disk, bring the image to another hardware, and try to resume from hibernation. As far as I know if I change my hardware while my machine is hibernated and I try to resume I may very well be in kernel panic territory09:48
sean-k-mooney[m]for instances with no passthrough devices like neutron sriov ports or vgpus, that use cpu_mode=custom and a pinned cpu model, it probably would work09:49
gibiwondering what restrictions libvirt puts on resuming from a memory snapshot while the hypervisor is reconfigured in between09:50
sean-k-mooney[m]yep, so if there was a strong demand for this we could consider it, but it's very non cloudy; it is an enterprise virt feature09:50
gibiIf I have to choose what to spend time on between live-resize vs shelve with memory, then the former feels a more generally useful feature09:51
gibi(and both is hard)09:52
sean-k-mooney[m]the former is more cloudy09:53
sean-k-mooney[m]shelve with memory is definitely in the enterprise virt space09:53
bauzas_I agree with the fact there are some technical constraints and limitations09:56
sean-k-mooney[m]anyway yes im on pto this week so totally not going back to working on my home lab.09:56
gibiit is more like we don't seem to know the list of constraints, and coming up with one is not simple09:56
bauzas_but the usecase written as "as a VDI user, I'd like my instance to be shelved with its memory so I can spin my instance up again without problems" sounds like a valid cloud usecase to me09:57
gibisean-k-mooney[m]: enjoy your PTO09:57
sean-k-mooney[m]VDI itself is iffy09:57
gibibauzas_: in a cloud we run cattle. 09:58
sean-k-mooney[m]im open to looking at this again but to be clear we have said no to this exact usecase once before, and to snapshot with memory once also09:58
sean-k-mooney[m]shelve with memory i think is actually more reasonable than generic snapshot with memory09:58
bauzas_if you prefer, someone running an instance with virtual desktop capabilities09:58
*** bauzas_ is now known as bauzas09:58
bauzasor there could be other workloads where saving the memory would be more than just nice09:59
bauzasbut there are security implications for sure, we'd need to keep the memory safe09:59
gibibauzas: I still reject the idea that this is cloudy. In a cloud you should be ready to lose a single VM any time.09:59
sean-k-mooney[m]yep you just need to reconcile the fact that we provide no abi stability for the guest.09:59
sean-k-mooney[m]meaning if you add 2 volumes and remove 110:00
bauzasgibi: well, that ship has already sailed for a while :(10:00
gibisean-k-mooney[m]: yeah we can promise that we restore your memory but we cannot promise the guest kernel won't panic10:00
sean-k-mooney[m]the first one, then the next time we generate the xml the remaining volumes' pci addresses will change10:00
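sean-k-mooney[m]'s volume example can be made concrete with a toy sketch: if slots are assigned by enumerating devices in order each time the XML is generated (a deliberate simplification of what libvirt/nova actually do, with hypothetical helper names), removing one device shifts every later one:

```python
# Toy model of order-based PCI slot assignment when regenerating a domain XML.
# Hypothetical simplification for illustration; real address handling is more involved.

def assign_slots(devices: list[str], first_slot: int = 3) -> dict[str, int]:
    """Give each device the next free PCI slot, in list order."""
    return {dev: first_slot + i for i, dev in enumerate(devices)}

before = assign_slots(["root-disk", "vol-a", "vol-b"])
# detach vol-a, then regenerate the XML from the remaining devices
after = assign_slots(["root-disk", "vol-b"])

# vol-b silently moved to a different slot: a guest resumed from a memory
# snapshot would find its disk on hardware it does not remember.
```

This is why sean-k-mooney[m] suggests persisting the recorded addresses (or the whole guest-visible XML) alongside the memory image instead of letting them be re-chosen.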
bauzaswhile I'm one of the most ferocious guys that say 'sorry, cloud', I tend to admit that /some/ workloads require more care10:01
gibibauzas: we can try to stop on that slippery slope :)10:01
sean-k-mooney[m]we can add this to the list of things people want but i would put vtpm live migration higher personally10:02
bauzasgibi: and ask the foundation to remove the VMware-to-OpenStack migration whitepaper ? :)10:02
gibiunshelve with memory is like a brain transplant while you expect not just the patient to survive but also to keep all its past memories.10:02
sean-k-mooney[m]this has nothing to do with vmware10:02
sean-k-mooney[m]we have a vmware driver and they never cared to enable this usecase10:02
sean-k-mooney[m]we have discussed this in an in-person ptg before10:03
sean-k-mooney[m]so this is an old request10:03
gibibauzas: just because the foundation says so, brain transplant will not work out of the box10:03
bauzasdoesn't vsphere support snapshotting the memory ?10:03
gibix10:03
bauzasI like the idea of "brain transplant"10:03
sean-k-mooney[m]it does but it does not do it as part of shelve10:03
bauzasbut this is more like live neuro-link transplants10:03
bauzasI can consider the brain as disk while the neurotransmitters are the RAM :)10:04
sean-k-mooney[m]libvirt, hyperv and vmware all can do this (so can virtualbox) but none of them ever enabled it in their virt driver10:04
gibithe disk content goes through a full kernel boot process with hw probing etc, the resume from ram tries to continue executing where it left off without a boot process10:05
bauzasthen why would shelve w/ memory be more acceptable than snapshot w/ memory ?10:06
bauzasshit10:06
gibiI think we can ask the libvirt folks how they feel about moving the libvirt memory snapshot, disk, and domain def to another compute and trying to resume; if they say they support it then let's discuss further10:06
bauzassorry, I said "shit" because I was trying to understand exactly the diffs with suspend10:07
bauzasgibi: again, I don't have the energy nor the will to push that more forward, I was just saying that the usecase may sound legit10:08
bauzaslet's not rathole on it,  the answer is as of now "NOT SUPPORTED"10:08
gibias legit as requesting a brain transplant. legit but pointless if the tech is not there10:09
bauzasthe tech already allows us teleportation :)10:10
bauzasbut I hear ya10:10
bauzasthat would be a horrible spec to write and a terrible spec to review10:10
gibiif by teleportation you mean live-migration then yes, libvirt has a bunch of tech implemented to support that; I'm not sure they have the tech to support brain transplant yet.10:11
gibibtw that points to a direction actually. instead of shelving with a memory snapshot, do a memory snapshot on the current hypervisor. The resume is supported there on the same hypervisor. I think the requesting people do not need the move aspect of shelve, they need the resumability only. So give them a way to resume on the same hypervisor10:14
gibiwait we have that, they just get billed as they reserve the space to be able to resume 10:15
gibiso they don't want to get billed, but still want to make sure they can resume in place, that is contradicting10:16
gibior at least we are moving into interruptible instances territory to make space for resume10:16
gibi(or what was the name of that feature that allows killing certain types of instances to make space for reservations)10:17
gibiI guess I'll stop braindumping here as it is not exciting :)10:18
sean-k-mooney[m]the reason shelve is more legit than snapshot is it removes the "snapshot once and create many copies" usecase and it also removes restore via rebuild11:06
sean-k-mooney[m]so shelve/unshelve is much smaller in scope and has fewer sharp edges as a result11:08
sean-k-mooney[m]gibi the reason suspend (which calls managed save) works today is we still have the domain so we can still restore it without worrying about the xml changing; managed save also does not allow pci passthrough devices to be attached to the domain11:10
sean-k-mooney[m]if i was to do shelve with memory my instinct would be to save the xml and the memory as additional images and reuse some of the live migration logic to update the host side of the xml without modifying the guest visible side11:12
sean-k-mooney[m]the same host does not really matter as long as the guest cant tell it moved11:12
gibisean-k-mooney[m]: I agree, move does not matter to the user if the instance works. I'm just pointing out that they also never requested the move, they only requested having a place to resume to without getting billed for that place while suspended.11:32
gibibtw having a vtpm live migration indeed seems even more important to get11:33
sean-k-mooney[m]i mainly mentioned vtpm as another example of needing to store additional data alongside the disk when shelving, which we don't support today11:35
sean-k-mooney[m]but it's also a higher priority in my book; both are parallel efforts11:35
parasitidhi gibi: i think i get your point but there is still lots of very obscure stuff to me because i'm definitely not a specialist on this topic. When you say: "so they don't want to get billed, but still want to make sure they can resume in place, that is contradicting" why wouldn't it be possible to be resumed on a host supporting the same "flavor" ? why does it work in the case of live migration and not unshelving ? And yes i wouldn't 11:37
parasitidgibi: i just want to maximize the chance of being "resumable" while lowering the risk of a No Valid Host found error11:38
parasitidbtw, i didn't want to hijack your backlog by introducing this topic, i only wanted to know if there was a hidden feature/tip to achieve it. as it seems that it's not currently supported i'm fine with it. don't worry.11:40
gibiparasitid: no worries. I'm happy to discuss incoming request. I'm especially happy that it wasn't just a request but you are open to discuss it a bit deeper. 11:41
gibiparasitid: the devil is in the details of a host supporting the same flavor. For nova, supporting a flavor to boot a new VM on a host is different from supporting the host as a target for a live migration. And I assume it will be also different for supporting a host as a target for unshelve with memory. 11:44
parasitidgibi: yes. the starting point of my day was a test where i created an instance, opened vim in a tmux session, then suspended it, shelved it... Up to that point, as nova didn't complain about shelving a suspended instance, i secretly hoped that the resume would work :)11:45
parasitidas i didn't find anything related to this in the docs, i jumped here to ask questions11:45
gibiparasitid: what, nova allowed to shelve a suspended VM? That feels like an API validation bug :) (and also a way to raise false hope)11:46
gibiparasitid: totally valid to jump and ask question. :)11:46
gibiI think I'm debating mostly with bauzas about how acceptable this use case is within our scope.11:47
gibiand pointing out that even if we accept it there might be dependencies on libvirt / qemu we don't have right now to actually support it11:49
gibi(and I think bauzas has a point about scope, the question boils down to how much we want to support enterprise virt vs. cloud)11:50
parasitidby cloud you mean cloud native workloads ?11:52
parasitidcoz opening a vim in a tmux is not a cloud native workload :)11:53
parasitidmoreover.... i'm in the emacs team11:54
gibiby cloud I mean VMs considered as cattle instead of pets12:01
gibiyou lose one, you create a new one12:01
gibino feelings attached12:01
bauzassometimes this is a bit harder than that, like if you have some database service using memory :)12:06
gibiwe have cluster aware DBs like galera. I think galera Active/Active/Active is possible. I assume that survives killing a VM12:07
gibiif you only store your important data in memory then losing that is not on nova12:08
bauzasgibi: (sorry, was upgrading to F40), yeah I don't disagree with your point, I'm just saying that some cases are related to some memory usage12:35
bauzasand just saying 'sorry, but cattle' doesn't help them12:35
gibiI hope that it is OK to send the message that please also try to change the workload to be more cloudy. I hope this helps them in the long run, changing those workloads and enjoying the benefits. 12:37
gibiI do believe saying no sometimes actually helps :)12:38
opendevreviewBalazs Gibizer proposed openstack/nova master: Refactor obj_make_compatible to reduce complexity  https://review.opendev.org/c/openstack/nova/+/92859012:44
opendevreviewBalazs Gibizer proposed openstack/nova master: [ovo]Add igb value to hw_vif_model image property  https://review.opendev.org/c/openstack/nova/+/92845612:44
opendevreviewBalazs Gibizer proposed openstack/nova master: [libvirt]Support hw_vif_model = igb  https://review.opendev.org/c/openstack/nova/+/92858412:44
opendevreviewBalazs Gibizer proposed openstack/nova master: [doc]Developer doc about PCI and SRIOV testing  https://review.opendev.org/c/openstack/nova/+/92883412:44
gibibauzas: can I drop the multipath_id and the novnc subpath topics from the nova meeting agenda? Or do we want to revisit them this week? I'm asking as I'm going to add https://blueprints.launchpad.net/nova/+spec/igb-vif-model to the agenda I noticed that those items might be stale12:55
bauzasgibi: we discussed the multipath_id one12:56
bauzasfor the subpath, I think we also agreed the specless bp12:56
bauzasso yeah12:56
gibiOK, dropping them12:57
gibidone12:57
bauzascoo12:58
bauzascool even12:58
opendevreviewBalazs Gibizer proposed openstack/nova master: [doc]Developer doc about PCI and SRIOV testing  https://review.opendev.org/c/openstack/nova/+/92883413:05
opendevreviewBrian Haley proposed openstack/nova stable/2023.2: libvirt: Cap with max_instances GPU types  https://review.opendev.org/c/openstack/nova/+/91608913:12
*** bauzas_ is now known as bauzas13:17
opendevreviewBalazs Gibizer proposed openstack/nova master: [doc]Developer doc about PCI and SRIOV testing  https://review.opendev.org/c/openstack/nova/+/92883413:52
opendevreviewDoug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660"  https://review.opendev.org/c/openstack/nova/+/90912216:15

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!