Thursday, 2025-02-06

chungwonHello, can you review this?04:49
chungwonreview.opendev.org/c/openstack/nova/+/93992904:49
opendevreviewsuiong ng proposed openstack/nova stable/2024.1: ironic: Fix ConflictException when deleting server  https://review.opendev.org/c/openstack/nova/+/94084605:21
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra spec for sound device.  https://review.opendev.org/c/openstack/nova/+/92612606:42
opendevreviewMichael Still proposed openstack/nova master: Protect older compute managers from sound model requests.  https://review.opendev.org/c/openstack/nova/+/94077006:42
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection.  https://review.opendev.org/c/openstack/nova/+/92735406:42
opendevreviewMichael Still proposed openstack/nova master: Don't calculate the minimum compute version repeatedly.  https://review.opendev.org/c/openstack/nova/+/94084806:42
opendevreviewMichael Still proposed openstack/nova master: libvirt: direct SPICE console object changes  https://review.opendev.org/c/openstack/nova/+/92687606:46
opendevreviewMichael Still proposed openstack/nova master: libvirt: direct SPICE console database changes  https://review.opendev.org/c/openstack/nova/+/92687706:46
opendevreviewMichael Still proposed openstack/nova master: libvirt: allow direct SPICE connections to qemu  https://review.opendev.org/c/openstack/nova/+/92484406:46
mikalI have updated the tracking etherpad at https://etherpad.opendev.org/p/nova-2025.1-status with a brain dump of the current state of the various SPICE VDI patches.06:59
*** ralonsoh_ is now known as ralonsoh07:40
opendevreviewbenlei proposed openstack/nova master: Abort live migration task when stop nova compute service  https://review.opendev.org/c/openstack/nova/+/93822308:25
artem_vasilyevHey, could you review some small linter and test fixes for macOS support, please https://review.opendev.org/c/openstack/nova/+/93772708:50
bauzasdansmith: Uggla: thanks for the reviews on https://review.opendev.org/c/openstack/nova/+/94064210:20
Ugglabauzas, you are welcome.10:27
sean-k-mooneybauzas: i just left my comments too; i don't particularly like the approach you took10:48
sean-k-mooneyit seems complex for what you're trying to do10:48
bauzassean-k-mooney: thanks, I'll look over them after lunch10:57
bauzassean-k-mooney: your idea of getting all the props by a object property is actually a good one10:58
sean-k-mooneyi occasionally have them10:59
bauzasfor your algorithm with union/intersection, let's discuss this later, as I need to explain correctly why it wasn't working when I tested them11:00
sean-k-mooneyi checked ovo and it does not have a function for that but they do it often internally. i would consider adding set_filed to nova's base object class11:00
sean-k-mooneysure. from my perspective the cardinality should not matter11:01
sean-k-mooneyi.e. if 2 instances request the image property on one host and one instance requests it on another, that should not affect the weight11:01
sean-k-mooneyi think you are trying to also include that by using a counter, right?11:02
sean-k-mooneywe can chat later but i was intentionally trying not to have the number of vms on a host affect the weighing of the host11:02
bauzasI tested it, and the problem is that for example, if you have a host that has already 2 instances and one other that only has 1 using the prop, then it would prefer the second11:03
bauzasso I preferred to just return the number of requested properties that are already used on the host11:04
bauzasthis is simpler to calculate and understand11:04
sean-k-mooneyi think that is more complex11:04
bauzasalso, I saw with devstack that when you create an instance, you get more props than the ones you asked for11:04
sean-k-mooneyand harder to understand11:04
sean-k-mooneyyou do, yes. we set some, like the machine type, on boot11:05
sean-k-mooneyor config drive11:05
bauzasindeed, but you haven't asked for it, that's my point11:05
sean-k-mooneybased on the host config. we will be adding the vtpm secret type11:05
sean-k-mooneyyou have not but we should not exclude them11:05
bauzasanyway, I need to go off11:06
sean-k-mooneyack11:06
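The disagreement above about cardinality can be illustrated with a minimal sketch. This is not the code under review; `set_weight`, `counter_weight`, and the property dictionaries are hypothetical names chosen for illustration. The first function ignores how many instances on a host use a property, the second counts each use.

```python
from collections import Counter


def set_weight(requested, host_instance_props):
    """Count requested (key, value) pairs used by at least one instance.

    Hypothetical helper: the number of instances on the host does not
    change the result, only whether a property is present at all.
    """
    used = set()
    for props in host_instance_props:
        used |= set(props.items())
    return len(set(requested.items()) & used)


def counter_weight(requested, host_instance_props):
    """Sum how many instances on the host use each requested pair.

    Hypothetical helper: hosts with more matching instances score higher.
    """
    counts = Counter()
    for props in host_instance_props:
        counts.update(props.items())
    return sum(counts[item] for item in requested.items())


requested = {"hw_machine_type": "q35"}
host_a = [{"hw_machine_type": "q35"}, {"hw_machine_type": "q35"}]
host_b = [{"hw_machine_type": "q35"}]

# Set-based: both hosts weigh the same; counter-based: host_a is preferred.
print(set_weight(requested, host_a), set_weight(requested, host_b))
print(counter_weight(requested, host_a), counter_weight(requested, host_b))
```

With the set-based variant the two hosts tie, which matches the stated goal of not letting the number of VMs on a host affect its weight.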
opendevreviewsean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default  https://review.opendev.org/c/openstack/nova/+/94083512:20
opendevreviewsean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes.  https://review.opendev.org/c/openstack/nova/+/94087312:20
opendevreviewsuiong ng proposed openstack/nova master: Fix parameter order in add_instance_info_to_node  https://review.opendev.org/c/openstack/nova/+/93941113:15
*** ykarel_ is now known as ykarel13:27
opendevreviewsean mooney proposed openstack/nova master: allow discover host to be enabled in multiple schedulers  https://review.opendev.org/c/openstack/nova/+/93852313:53
sean-k-mooneygibi: i just dropped the job change from ^13:54
sean-k-mooneyif you have time to look at https://review.opendev.org/c/openstack/nova/+/939476 as well it would be nice to land that13:55
sean-k-mooney... 13:59
sean-k-mooneymikal:  "libvirt.libvirtError: unsupported configuration: spice graphics are not supported with this QEMU"13:59
sean-k-mooneyit looks like ubuntu 24.04 didn't just remove qxl, they fully removed spice support like rhel913:59
sean-k-mooneyi think debian still supports it14:00
sean-k-mooneyso i could move that job to debian but if centos/rhel 9 and ubuntu 24.04 have fully dropped spice that's probably our last option to test it14:01
sean-k-mooneykolla uses debian containers on all distros now so i would guess that is why it's working for you in kolla-ansible14:01
sean-k-mooneythat or you're still using ubuntu 22.04?14:02
ykarelHi, is it a known issue that tests fail randomly with libvirt.libvirtError: Failed to terminate process 70698 with SIGKILL: Device or resource busy?15:04
ykarellike seen in15:04
ykarelhttps://0c63ab9652170854bf26-a09d1a3317eb4b9b558e42ad19c25861.ssl.cf2.rackcdn.com/940474/1/gate/neutron-ovs-tempest-multinode-full/5184c7f/testr_results.html15:04
ykarelhttps://97ad4d1320a89f3380ef-01f5fc3a5734547a13a0f54725d40b32.ssl.cf5.rackcdn.com/936364/4/gate/nova-multi-cell/3003ee6/testr_results.html15:04
ykarelfound a quite old related patch https://review.opendev.org/c/openstack/nova/+/639091 but that was targeting a specific libvirt version and it has since been removed from nova as those old versions are no longer supported15:06
sean-k-mooneyykarel: rarely the qemu monitor process that is responsible for responding to the shutdown of the guest can become unresponsive15:35
sean-k-mooneybut this is not really a nova issue, it's more of a libvirt/qemu one15:36
sean-k-mooneynova should not need to actually retry the destroy at all15:36
ykarelsean-k-mooney, ohk so it's a known issue? what is the mitigation for this then?15:38
ykarelthe operator/user have to retry the operation?15:39
sean-k-mooneynot really15:39
sean-k-mooneyit's a very rare failure mode in qemu: failing to exit gracefully when asked15:39
sean-k-mooneylibvirt tried to fix this by sending sigkill instead of just sigterm15:39
sean-k-mooneyso the os would eventually reap the process15:39
sean-k-mooneyif sigkill does not work there's nothing nova can do to force it to exit really15:40
ykarelohkk got it, but why can't nova retry like in the above patch if the response is EBUSY?15:42
sean-k-mooneylibvirt is meant to do it internally15:43
ykarelohkk but as per logs that's not helping as there are failures in CI15:45
ykarelohh or you mean libvirt should do it but it's not doing it and libvirt has to be fixed for that?15:46
sean-k-mooneyit's either not doing it or it is doing it and qemu is still not exiting15:46
ykarelohkk no idea then where the issue is, i will start with a nova bug and other points could be checked as part of it15:47
ykarelthx sean-k-mooney for sharing all the insights15:48
sean-k-mooneyi don't think you should start with a nova bug15:48
sean-k-mooneyi also think we still have some retry logic because _destroy does call itself https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L161215:48
ykarelnova people may know more about libvirt/qemu bits so can judge better than me so i thought to start with it :)15:48
sean-k-mooneyi would not personally consider it a nova bug but others might disagree15:51
sean-k-mooneythe original issue was not a nova bug, it was a workaround for a libvirt/qemu one15:52
sean-k-mooneyykarel: as far as i can see we actually retry infinitely https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L160315:55
sean-k-mooneyi'm not a huge fan of how this is written but in destroy we are using loopingcall from oslo.service to execute _wait_for_destroy which checks if the instance is still running. if so it sets kwargs['is_running'] = True15:56
sean-k-mooneywhich causes us to recursively call destroy https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L161215:57
sean-k-mooneythat loop will break however if we get an exception from libvirt15:58
sean-k-mooneyso if we get EBUSY or another libvirt exception in _wait_for_destroy it won't retry16:02
ykarelbut that is_running = true seems not reached as i don't see it in logs16:02
ykarelit's raising at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L156616:02
ykareli.e not seeing in logs "Going to destroy instance again"16:02
sean-k-mooneythat means libvirt could not destroy the instance16:02
ykarelbut you said it will retry indefinitely16:03
sean-k-mooneyyes if there is no error16:03
sean-k-mooneyso if we ask libvirt to destroy the instance and there is no error16:04
sean-k-mooneybut it does not complete in half a second we ask it to do it again16:04
sean-k-mooneyso we retry only when there isn't an error16:04
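The behaviour described above (retry while the guest is still running, stop as soon as the hypervisor raises) can be sketched in plain Python. This is a simplified illustration, not nova's driver code: it does not use oslo.service or libvirt, and `DestroyError`, `destroy_until_stopped`, and the callables passed in are hypothetical stand-ins.

```python
import time


class DestroyError(Exception):
    """Hypothetical stand-in for a hypervisor error such as EBUSY."""


def destroy_until_stopped(destroy, is_running, interval=0.5, sleep=time.sleep):
    """Ask for a destroy repeatedly until the guest is no longer running.

    There is no retry limit on the success path. Any exception raised by
    destroy() propagates immediately, so an error ends the loop.
    """
    attempts = 0
    while True:
        attempts += 1
        destroy()          # an exception here is not caught or retried
        sleep(interval)    # give the guest half a second to go away
        if not is_running():
            return attempts


# Example: the guest needs three destroy requests before it stops.
states = iter([True, True, False])
print(destroy_until_stopped(lambda: None, lambda: next(states),
                            sleep=lambda s: None))
```

In this sketch an EBUSY-style error on the first request means exactly one attempt is made, which is consistent with the "Going to destroy instance again" message never appearing in the logs.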
sean-k-mooneyif you retried the delete or hard reboot etc. it may work if libvirt has corrected its internal error or qemu finally died16:05
sean-k-mooneyif qemu is locked up badly enough that it cannot service a sigkill from libvirt, it's possible the only way to kill it would be a host reboot. trying it again might work but there is no guarantee16:07
sean-k-mooneywe could re-add a limited retry on error but it would really just be papering over a potential libvirt or qemu bug16:07
sean-k-mooneywe have done that in the past but it's kind of tech debt every time we do, which is why we removed this16:08
-opendevstatus- NOTICE: nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/16:08
ykarelthe removal seems to have been done as part of the cleanup of old libvirt versions, but ok, got what you mean16:09
ykarelbut it's good if it could be worked around at least for the cases where it's not a persistent issue on qemu side16:10
sean-k-mooneyif you want to file it as a bug do, if we re-add this i think we would put it behind a workaround flag16:10
ykarelok that should also work16:10
ykarelthx again16:11
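A limited retry on EBUSY behind an opt-in flag, as floated above, might look roughly like the following sketch. Nothing here is an existing nova option or API: `HypervisorError`, `destroy_with_retry`, `retry_on_busy`, `max_attempts`, and `delay` are all hypothetical names, and the flag only stands in for the idea of a workaround setting.

```python
import errno
import time


class HypervisorError(Exception):
    """Hypothetical error type carrying an errno-style code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def destroy_with_retry(destroy, retry_on_busy=False, max_attempts=3,
                       delay=1.0, sleep=time.sleep):
    """Call destroy(), retrying a bounded number of times on EBUSY.

    With retry_on_busy=False (the default) the first error is re-raised,
    so the retry is strictly opt-in. Errors other than EBUSY are never
    retried. Returns the number of attempts that were needed.
    """
    attempts = max_attempts if retry_on_busy else 1
    for attempt in range(1, attempts + 1):
        try:
            destroy()
            return attempt
        except HypervisorError as exc:
            if exc.code != errno.EBUSY or attempt == attempts:
                raise
            sleep(delay)
```

Keeping the retry bounded and disabled by default reflects the concern that it papers over a possible libvirt or qemu bug rather than fixing it.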
sean-k-mooneyi would also look at the failing job16:11
sean-k-mooneythis happens because of very high load16:11
sean-k-mooneyso the job may be using too high a concurrency or have other issues that are triggering it16:11
ykarelhmm underlying hypervisor where it ran might also be overloaded16:12
sean-k-mooneyi believe libvirt is now going to wait at least 40s instead of 15 before returning EBUSY16:13
ykarelyes as per logs i see 40s b/w requesting destroy and returned error16:22
* ykarel away16:34
opendevreviewDoug Szumski proposed openstack/nova master: Stop corrupting ephemeral volumes during live migration  https://review.opendev.org/c/openstack/nova/+/94090017:50
dougszuAny opinions on that would be great ^. I've got another fix for cold migration using block pull to share if it's of interest. And of course, the final bit to close the bug would be not making file systems on the ephemerals at all.17:53
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository18:52
sean-k-mooneydougszu: i see you tried removing the backing file19:04
sean-k-mooneylong term that's definitely the direction we should go19:04
sean-k-mooneywe likely would need to change initial boot/reboot and cold migration or evacuate too19:05
sean-k-mooneyif we were to do this.19:05
sean-k-mooneyi.e. provide a way for nova, on the next lifecycle operation on the guest that moves or redefines the domain, to remove the backing file for ephemeral and swap disks19:06
mikalsean-k-mooney: yeah, I am using Debian containers on Kolla-Ansible, so that's why I didn't notice the Ubuntu behaviour.19:07
sean-k-mooneyi need to push my patch to try the job with debian19:07
sean-k-mooneyi have been getting distracted for like the last 3 hours19:08
mikalHeh, it's ok, I didn't manage to get to tempest for similar reasons yesterday.19:08
mikalI find people dropping SPICE quite frustrating to be entirely honest. The public excuse seems to be H.264 requirements, but RDP has the same requirements and there is a patent license from Cisco for the codec.19:09
opendevreviewsean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default  https://review.opendev.org/c/openstack/nova/+/94083519:09
opendevreviewsean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes.  https://review.opendev.org/c/openstack/nova/+/94087319:09
sean-k-mooneymikal: spice is the better console. vnc is catching up but spice is still better19:09
sean-k-mooneymikal: i don't think it's h.264 related honestly19:10
mikalsean-k-mooney: yeah, RDP is probably about as good as SPICE in terms of features, but qemu doesn't support it at all and the protocol is much more complicated as best as I can tell.19:10
sean-k-mooneywhat's funny is as far as i know it's the default in gnome boxes and a few other things19:13
sean-k-mooneyso it's not like spice was unused19:13
mikaloVirt used it a bit too, as did proxmox.19:14
mikalAlso... it's pretty stable. It hasn't changed much if at all in years.19:14
sean-k-mooneyi think it's still an option in proxmox given that's derived from rhel19:14
mikalSo there's no feature chasing etc.19:14
sean-k-mooney*debian19:14
sean-k-mooneyspeaking of stable i have updated the patch to the latest debian stable (12/bookworm)19:15
sean-k-mooneyactually we do not need grenade and some of the other jobs; i'll quickly drop those out while we are testing19:16
opendevreviewsean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default  https://review.opendev.org/c/openstack/nova/+/94083519:17
opendevreviewsean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes.  https://review.opendev.org/c/openstack/nova/+/94087319:17
sean-k-mooneyok better, it's just running one job now19:18
opendevreviewMichael Still proposed openstack/nova master: Don't calculate the minimum compute version repeatedly.  https://review.opendev.org/c/openstack/nova/+/94084819:18
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra spec for sound device.  https://review.opendev.org/c/openstack/nova/+/92612619:18
opendevreviewMichael Still proposed openstack/nova master: Protect older compute managers from sound model requests.  https://review.opendev.org/c/openstack/nova/+/94077019:18
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection.  https://review.opendev.org/c/openstack/nova/+/92735419:18
mikal^--- fixes a one character error in the first patch in that series19:19
sean-k-mooneyah you have split out the sound/usb ones and the performance improvement, nice19:20
mikalYeah, I wrote up a decoder ring on the etherpad because it's getting complicated -- https://etherpad.opendev.org/p/nova-2025.1-status19:22
mikalBut there are basically two independent series now -- the API changes, and the "VDI changes" (sound, usb, compute version bumps etc).19:23
mikalI agree we should focus on the API changes first, but that's blocking right now on me getting some quality time with tempest.19:23
sean-k-mooneyin its current form both can be merged in either order and in parallel so that will help i think.19:27
sean-k-mooneymikal: https://zuul.opendev.org/t/openstack/build/842a66f2cfe54eb39d856a3bf311bb5c20:14
sean-k-mooneymikal: debian works and the job passes with it20:14
sean-k-mooneyso i'll revert the commented out jobs but we should be able to test in that job20:15
opendevreviewsean mooney proposed openstack/nova master: move nova-ovs-hybrid-plug to deploy with spice and fix qxl default  https://review.opendev.org/c/openstack/nova/+/94083520:27
opendevreviewsean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes.  https://review.opendev.org/c/openstack/nova/+/94087320:27
mikalsean-k-mooney: cool, thank you. I assume we still want to do a tempest test, which I will try to actually get around to attempting today.20:34
sean-k-mooneyi think so but those should help with a job we can run the test in20:35
mikalAgreed20:35
mikalThanks for chasing this bit for me.20:36
sean-k-mooneyi just added those to the etherpad too with some explanations20:36
mikalCool20:36
sean-k-mooneymikal: if nothing else, we have not deprecated spice support and there is literally no spice job anywhere so just for that i consider those to be a win20:36
