| tafkamax | Hi I have a question about live-migration. I am getting this error: Operation not permitted: libvirt.libvirtError: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported: Userfaultfd not available: Operation not permitted | 08:18 |
|---|---|---|
| tafkamax | I am using kolla-ansible and set the options under [libvirt]... (full message at <https://matrix.org/oftc/media/v1/media/download/ARBKYWZG4vFezBf8AdfbKO35RwPbQknzRNnZhzshrYdIgP66r5d8eL5kRgH6dYMm6kvXhVs1S3zU0FLXLsvUWH1CeectI0ogAG1hdHJpeC5vcmcvT1NZR0RscVlSZk9kZkhkR2ZOc3VHeHF5>) | 08:20 |
| tafkamax | running 2025.1 release | 08:20 |
| tafkamax | are these even supported? | 08:20 |
| tafkamax | or how can I find out then? | 08:20 |
| tafkamax | or do i need to enable a kernel module? | 08:21 |
| tafkamax | * or do i need to enable the sysctl option "vm.unprivileged_userfaultfd=1" | 08:22 |
| Mike-- | I checked nova changelog and version 14.0.0 introduced some of those options and should work I'd say | 08:25 |
| tafkamax | I think I need to fool around with the sysctl options | 08:25 |
| Mike-- | https://github.com/openstack/nova/blob/master/releasenotes/source/newton.rst | 08:26 |
| Mike-- | also found: https://github.com/openstack/nova/blob/master/doc/source/admin/configuring-migrations.rst | 08:27 |
| tafkamax | ok thx! | 08:27 |
| tafkamax | those are good docs | 08:28 |
| *** elodilles_pto is now known as elodilles | 08:49 | |
| stephenfin | sean-k-mooney: Great, thanks | 09:49 |
| sean-k-mooney | so in the job that's failing most often i think i can actually turn off the neutron-periodics but im going to chat about that with the neutron folk after i have coffee | 09:51 |
| opendevreview | Ashish Gupta proposed openstack/nova master: tests: file-backed SQLite with WAL in threading mode for Database and CellDatabases Fixtures https://review.opendev.org/c/openstack/nova/+/988583 | 10:30 |
| opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix PEP-765 syntax warning https://review.opendev.org/c/openstack/nova/+/988636 | 10:47 |
| tafkamax | I have a question regarding the live migrations. they take forever to sync the last part of ram. And looking at the logs I don't know if I am looking at the correct thing. | 10:48 |
| tafkamax | nova.virt.libvirt.migration [req-f5188e51-02b9-4338-b3e7-53e162a3f3c2 req-f369ecbf-a79d-4ddf-b065-eea729d54da1 2337b7a2b5e34752a86ab5d61c525327 0530bca34f3f4d2c92a3495d56a1e065 - - default default] [instance: d5cf4b01-d97d-4395-8dd2-c4930b57e0c9] Increasing downtime to 50 ms after 0 sec elapsed time during the start of the migration | 10:49 |
| tafkamax | what downtime is it? because https://docs.openstack.org/nova/2025.1/configuration/config.html#libvirt.live_migration_downtime <- this downtime starts at 500ms | 10:49 |
| tafkamax | And it is constantly hovering at 1% and 0% memory remaining and takes like ~20 minutes to finish | 10:50 |
| tafkamax | Aha it completed faster after changing live_migration_downtime_delay from default 75 to 25 | 10:55 |
| stephenfin | tafkamax: That happens if you have a busy VM. If there are a lot of writes to memory in the VM, the writes can happen faster than libvirt can sync things on the destination | 10:55 |
| stephenfin | You want to look at live_migration_downtime_delay, live_migration_permit_post_copy and live_migration_permit_auto_converge (this one particularly) | 10:56 |
| * tafkamax sent a code block: https://matrix.org/oftc/media/v1/media/download/AcXSe0uulLyz53TnNkkjWT7nZZIsiyEKA6P9odmnIybwFrUkxqsDakmiHf_wKkU129PnuVjpKmu_goKawiqiJjdCeec2DAgAAG1hdHJpeC5vcmcvVFpITGdxSlpOeXJtTk5XQ3JPYnJlaWl3 | 10:56 | |
| tafkamax | I have already setup the auto_converge and post_copy | 10:56 |
| tafkamax | ^ the logs, the first downtime ~11:43 to 12:03 had the default 75seconds | 10:56 |
| tafkamax | changing to 25seconds in ~13:42 made the downtime change faster and then it completed faster | 10:57 |
| tafkamax | what about live_migration_downtime and downtime_steps though? | 10:58 |
| tafkamax | or 500 is the max and it starts from 50ms? | 10:58 |
| tafkamax | I will now move steps from 10 to 6 | 11:06 |
| tafkamax | ok seems I got the hang of how it calculates it now | 11:18 |
| gibi | fyi cores, IOThreads in Gazpacho breaks the live migration of pre-existing VMs with pinned CPUs https://bugs.launchpad.net/nova/+bug/2152697 | 11:53 |
| gibi | it is due to a wrong assumption that iothreadpin always exists in the XML | 11:54 |
| gibi | https://github.com/openstack/nova/commit/53a613d9948826ec9a4cd4a502f7a5d1b2dc87d7#diff-1f1f3f935853a67b1239220cd9f8c28278734732c5c28e8d238f62860391b1ecR273 | 11:54 |
| *** haleyb is now known as haleyb|away | 12:38 | |
| opendevreview | Thibaut Démaret proposed openstack/nova master: libvirt: add disk rotation_rate support for local disks https://review.opendev.org/c/openstack/nova/+/979693 | 12:49 |
| opendevreview | Thibaut Démaret proposed openstack/nova master: libvirt: add disk rotation_rate support for local disks https://review.opendev.org/c/openstack/nova/+/979693 | 13:15 |
| sean-k-mooney | gibi: didnt we fix that | 13:23 |
| sean-k-mooney | i guess we never backported it? | 13:23 |
| gibi | sean-k-mooney: we fixed the pinned cpu case and fixed the live migration case but not the live migration of a pre-existing VM with pinned cpu case :D | 13:26 |
| gibi | the live migration fix introduced the unconditional check for the iothreadpin xml tag | 13:27 |
| gibi | which only exists with VMs that are created (or rebooted) since the IOThread feature | 13:27 |
| sean-k-mooney | im surprised our multinode grenade job didnt pick that up | 13:37 |
| sean-k-mooney | but ok that should not be hard to fix | 13:37 |
| sean-k-mooney | we just need to gate that code by the xml having it | 13:37 |
| gibi | I guess our grenade does not have pinned VMs | 13:43 |
| sean-k-mooney | oh is this only for pinned vms | 13:43 |
| sean-k-mooney | oh no | 13:43 |
| sean-k-mooney | its not pinned exactly | 13:43 |
| sean-k-mooney | its that we are not using cpu_shared_set or cpu_dedicated_set | 13:43 |
| sean-k-mooney | so since we are not using either of them we never generate the element | 13:43 |
| gibi | nope | 13:44 |
| sean-k-mooney | cpu_shared_set would be enough to catch this i think | 13:44 |
| sean-k-mooney | gibi: are you planning to work on a patch to make it conditional? | 13:44 |
| gibi | I'm working on a functional reproducer | 13:44 |
| sean-k-mooney | ack cool im happy to review both so add me or ping me when its up | 13:45 |
| gibi | during live migration we update the src XML, for a pre-existing VM, the src XML does not have any iothreads or iothreadpin field | 13:45 |
| sean-k-mooney | yep we have had this type of issue before | 13:45 |
| sean-k-mooney | i think we have existing functional tests that simulate something similar | 13:46 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce bug/2152697 https://review.opendev.org/c/openstack/nova/+/988769 | 13:51 |
| gibi | sean-k-mooney: this is the reproducer ^^ | 13:52 |
| sean-k-mooney | am `del list(conn._vms.values())[0]._def["iothreads"]` | 13:53 |
| sean-k-mooney | ok i see why that works | 13:53 |
| sean-k-mooney | the list is converting the iterator into a list that contains references to the actual objects in the _vms dict | 13:54 |
| sean-k-mooney | so we are deleting through the list | 13:54 |
| sean-k-mooney | could we do this a little differently | 13:55 |
| sean-k-mooney | vm_domain = next(conn._vms.values()) | 13:55 |
| gibi | we can use next yes | 13:55 |
| sean-k-mooney | del vm_domain._def["..."] | 13:56 |
| sean-k-mooney | i can see why what you have works on a second or third reading but its a little non obvious | 13:56 |
| sean-k-mooney | gibi: over all the reproducer is more or less doing what i expect | 13:57 |
| sean-k-mooney | as noted however this should also manifest if you have just set cpu_shared_set and are not using dedicated | 13:57 |
| sean-k-mooney | so it might be nice to add a second test for floating vms | 13:57 |
| sean-k-mooney | otherwise this looks like a reasonable reproducer. it was more or less what i was expecting you to write | 13:58 |
| gibi | I tried an instance with shared CPUs before and it did not hit the same issue | 13:59 |
| gibi | the problematic code is under a condition | 13:59 |
| gibi | https://github.com/openstack/nova/commit/53a613d9948826ec9a4cd4a502f7a5d1b2dc87d7#diff-1f1f3f935853a67b1239220cd9f8c28278734732c5c28e8d238f62860391b1ecR260-R262 | 13:59 |
| gibi | about pinning | 13:59 |
| sean-k-mooney | we do pin shared cpus | 14:00 |
| sean-k-mooney | if and only if cpu_shared_set is defined | 14:00 |
| sean-k-mooney | oh | 14:01 |
| sean-k-mooney | actually in that case we only set vcpu cpuset=... | 14:01 |
| sean-k-mooney | so we pin it but we dont generate the element | 14:02 |
| sean-k-mooney | ok then ya i see this would only impact pinned vms that were created before iothread | 14:02 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Fix live migration with pinned VM and iothreads https://review.opendev.org/c/openstack/nova/+/988774 | 14:05 |
| sean-k-mooney | gibi: +2 on the reproducer +1 on the fix. ill see if i can try and test this on monday and loop back to the review after the ci has run | 14:08 |
| sean-k-mooney | gibi: we might want to add a release note but thats my only feedback really right now | 14:08 |
| sean-k-mooney | maybe also a unit test for the branching behavior in the migration module | 14:08 |
| sean-k-mooney | just to test _update_numa_xml | 14:09 |
| gibi | yeah I can do that | 14:09 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce bug/2152697 https://review.opendev.org/c/openstack/nova/+/988769 | 14:17 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Fix live migration with pinned VM and iothreads https://review.opendev.org/c/openstack/nova/+/988774 | 14:17 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Fix live migration with pinned VM and iothreads https://review.opendev.org/c/openstack/nova/+/988774 | 14:20 |
| gibi | now with reno and unit test :) | 14:20 |
| gibi | ^^ cc dansmith as we chatted about it downstream | 14:21 |
| sean-k-mooney | gibi: so one nuance about your fix | 14:24 |
| sean-k-mooney | it will work but you intentionally chose not to add an iothread to the running instance | 14:24 |
| sean-k-mooney | on live migrate | 14:24 |
| sean-k-mooney | we could also do that perhaps in a followup patch | 14:25 |
| sean-k-mooney | im personally fine with not doing that and just saying you need to reboot to get the iothread, just noting that qemu/libvirt technically support adding and removing them on a running instance | 14:26 |
| gibi | I think it is safer not to try to add it during live migration | 14:27 |
| sean-k-mooney | ya that was my feeling too | 14:27 |
| sean-k-mooney | just wanted to raise the possibility | 14:27 |
| gibi | so if somebody explicitly wants this then that is a new small feature to me | 14:28 |
| sean-k-mooney | cool well still +1 lets see what ci says and ill loop back | 14:29 |
| sean-k-mooney | gibi: nice find by the way, did you just come across it or was it something you noticed in a ci failure or something | 14:31 |
| sean-k-mooney | oh you were backporting this downstream right | 14:31 |
| gibi | yepp I tested my backport downstream | 14:31 |
| gibi | there it is simpler to test the upgrade | 14:31 |
| gibi | by just changing the nova-compute container image | 14:32 |
| gibi | as we had bugs before with live migration and pinned CPUs, I tested the combination of those :) | 14:32 |
| sean-k-mooney | ack i need to revive and rewrite my ci change | 14:32 |
| opendevreview | Masanori Ueno proposed openstack/nova master: DNM: Add functional test for NUMA live migration overcommit bug https://review.opendev.org/c/openstack/nova/+/988777 | 14:33 |
| sean-k-mooney | that enabled cpu pinning upstream | 14:33 |
| gibi | yeah I have a todo to add some iothread tests to whitebox | 14:33 |
| sean-k-mooney | i never got around to making the calculation of cpu_shared_set and cpu_dedicated_set dynamic | 14:33 |
| sean-k-mooney | whitebox would be good too ya | 14:33 |
| *** gibi is now known as gibi_off | 14:46 | |
| dansmith | gibi_off: sean-k-mooney: but... people might want to use live migration to get iothreads without a guest restart | 16:38 |
| dansmith | I mean, I know the risk is higher but.. I _know_ people will want that :D | 16:38 |
| sean-k-mooney | dansmith: thats why i brought it up. it would be nice if that worked but that is why i was suggesting perhaps a followup | 17:10 |
| sean-k-mooney | i know there is an api to add and remove them at runtime | 17:10 |
| sean-k-mooney | but i dont know if you can do this when live migrating | 17:10 |
| dansmith | ack | 17:11 |
| -opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we restart it onto a new patch release | 18:04 | |
| melwitt | gmaan: sorry to bug you with this but I'm trying to close out a quota bug fix I worked on awhile back that has got one +2 https://review.opendev.org/q/topic:%22bug/2131272%22 the CI results are old but I just rechecked it, if you might have a chance to look at it. it is not urgent | 19:37 |
| gmaan | melwitt: ack, I will take a look | 19:45 |
| melwitt | thanks gmaan | 19:46 |
| gmaan | melwitt: one question. do we allow to set user quota >project quota OR set project quota < user quota ? if yes then that is the problem right? | 20:26 |
| gmaan | so this is case when both are equal and multi users. got it but just wondering if we handle the above case ^^ (i assume yes) | 20:30 |
| *** jcosmao is now known as Guest9508 | 20:44 | |
| melwitt | gmaan: I think it is not allowed to set an individual user's quota to be larger than the quota of their project. And I don't think it's allowed to set project quota lower than an individual user's quota in the project but I am not as sure about that one. /me checks if we have tests for this and if not, it's probably worth adding a couple | 20:49 |
| gmaan | yeah, I was thinking of having tests for those cases. but anyway that is a separate thing that came to my mind. bug#2131272 fix lgtm, waiting for CI results | 20:53 |
| melwitt | gmaan: yeah I think it's a good point. I might add the tests to this patch if I find we don't already have them, since we are waiting anyway | 20:57 |
| melwitt | I would think they should be easy small func tests that just try to set the quota and assert it's rejected in those scenarios | 20:58 |
| opendevreview | melanie witt proposed openstack/nova master: Fix swap disk creation skipped on NFS during cold migration https://review.opendev.org/c/openstack/nova/+/988547 | 21:23 |
| gmaan | ++ thanks | 21:24 |
| opendevreview | melanie witt proposed openstack/nova master: Use tempest_concurrency=1 for nova-vtpm job https://review.opendev.org/c/openstack/nova/+/984864 | 21:27 |
| opendevreview | Merged openstack/nova master: Reproducer for bug 2131272 https://review.opendev.org/c/openstack/nova/+/967147 | 21:42 |
| melwitt | I have a crazy old 4 character bug fix with one +2 on it if anyone wants an easy review https://review.opendev.org/c/openstack/nova/+/904155 | 21:49 |
| opendevreview | melanie witt proposed openstack/nova master: Fix usage count when user-scoped quota is set https://review.opendev.org/c/openstack/nova/+/967148 | 22:25 |
| gmaan | +w | 22:31 |
| melwitt | thanks gmaan ! | 23:07 |
| opendevreview | Merged openstack/nova master: Implementing get_num_instances for ironic virt driver https://review.opendev.org/c/openstack/nova/+/955685 | 23:10 |
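
The downtime stepping that tafkamax worked out between 10:48 and 11:18 can be sketched roughly as below. This is a minimal sketch, assuming a linear ramp from `live_migration_downtime / steps` up to `live_migration_downtime`, which agrees with the logged "Increasing downtime to 50 ms after 0 sec" for a 500 ms maximum and 10 steps. The function name, its parameters, and the scaling of the delay by the amount of guest data in GiB are assumptions for illustration, not a copy of nova's code.

```python
def downtime_steps(max_downtime_ms, steps, delay_s, data_gb):
    """Yield (seconds_elapsed, allowed_downtime_ms) pairs for a linear ramp."""
    # Assumption: the per-step delay scales with the amount of data to copy.
    step_delay = int(delay_s * data_gb)
    # The first step allows max/steps; each later step adds a fixed offset
    # until the configured maximum is reached on the last step.
    base = max_downtime_ms / steps
    offset = (max_downtime_ms - base) / steps
    for i in range(steps + 1):
        yield (step_delay * i, int(base + offset * i))


# Hypothetical guest with 2 GiB to transfer, 500 ms maximum, 10 steps, delay 75.
schedule = list(downtime_steps(500, 10, 75, data_gb=2))
for elapsed, downtime in schedule:
    print(f"after {elapsed:>4} s allow up to {downtime} ms of downtime")
```

Under these assumptions, lowering the delay from 75 to 25 brings the final 500 ms step forward from 1500 s to 500 s, which is consistent with the faster completion reported at 10:55.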
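
On the 13:53 to 13:56 exchange about the reproducer: both spellings delete the key from the object stored in the dict, because `list(...)` and iteration hand back references rather than copies. One caveat is that `dict.values()` returns a view, not an iterator, so `next()` needs an `iter()` around it. The `FakeDomain` class and the `vms` dict below are hypothetical stand-ins for the fixture's `conn._vms`.

```python
class FakeDomain:
    """Hypothetical stand-in for a fake libvirt domain in the test fixture."""

    def __init__(self):
        self._def = {"iothreads": 1, "vcpus": 2}


vms = {"uuid-1": FakeDomain(), "uuid-2": FakeDomain()}

# Form used in the reproducer: the temporary list holds references to the
# same objects as the dict, so the deletion is visible through the dict.
del list(vms.values())[0]._def["iothreads"]

# A values view is iterable but is not itself an iterator.
try:
    next(vms.values())
    bare_next_works = True
except TypeError:
    bare_next_works = False

# Wrapping the view in iter() gives the first domain.
vm_domain = next(iter(vms.values()))
vm_domain._def.pop("vcpus")
```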
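
The fix direction discussed from 13:37 to 13:45, gating the update on the source XML actually having the element, might look something like the sketch below. It uses only the standard library. `update_iothreadpin` and the sample XML strings are hypothetical and are not nova's `_update_numa_xml`.

```python
import xml.etree.ElementTree as ET


def update_iothreadpin(domain_xml, new_cpuset):
    """Rewrite iothreadpin cpusets, but only if the domain already has them."""
    root = ET.fromstring(domain_xml)
    pins = root.findall("./cputune/iothreadpin")
    # A guest defined before the IOThread feature has no such element,
    # so the loop body never runs and the XML is left as it was.
    for pin in pins:
        pin.set("cpuset", new_cpuset)
    return ET.tostring(root, encoding="unicode"), bool(pins)


# Hypothetical pre-existing pinned guest: no iothreads, no iothreadpin.
OLD_GUEST = "<domain><cputune><vcpupin vcpu='0' cpuset='2'/></cputune></domain>"
# Hypothetical guest created (or rebooted) after the feature landed.
NEW_GUEST = (
    "<domain><iothreads>1</iothreads><cputune>"
    "<vcpupin vcpu='0' cpuset='2'/>"
    "<iothreadpin iothread='1' cpuset='2'/>"
    "</cputune></domain>"
)

_, touched_old = update_iothreadpin(OLD_GUEST, "5")
xml_new, touched_new = update_iothreadpin(NEW_GUEST, "5")
print(touched_old, touched_new)
```

As in the 14:24 to 14:28 discussion, this sketch does not add an iothread to a guest that lacks one; it only avoids assuming the element is present.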