chungwon | Hello, can you review this? | 04:49 |
---|---|---|
chungwon | review.opendev.org/c/openstack/nova/+/939929 | 04:49 |
opendevreview | suiong ng proposed openstack/nova stable/2024.1: ironic: Fix ConflictException when deleting server https://review.opendev.org/c/openstack/nova/+/940846 | 05:21 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126 | 06:42 |
opendevreview | Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770 | 06:42 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354 | 06:42 |
opendevreview | Michael Still proposed openstack/nova master: Don't calculate the minimum compute version repeatedly. https://review.opendev.org/c/openstack/nova/+/940848 | 06:42 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: direct SPICE console object changes https://review.opendev.org/c/openstack/nova/+/926876 | 06:46 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: direct SPICE console database changes https://review.opendev.org/c/openstack/nova/+/926877 | 06:46 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: allow direct SPICE connections to qemu https://review.opendev.org/c/openstack/nova/+/924844 | 06:46 |
mikal | I have updated the tracking etherpad at https://etherpad.opendev.org/p/nova-2025.1-status with a brain dump of the current state of the various SPICE VDI patches. | 06:59 |
*** ralonsoh_ is now known as ralonsoh | 07:40 | |
opendevreview | benlei proposed openstack/nova master: Abort live migration task when stop nova compute service https://review.opendev.org/c/openstack/nova/+/938223 | 08:25 |
artem_vasilyev | Hey, could you review some small linter and test fixes for macOS support, please? https://review.opendev.org/c/openstack/nova/+/937727 | 08:50 |
bauzas | dansmith: Uggla: thanks for the reviews on https://review.opendev.org/c/openstack/nova/+/940642 | 10:20 |
Uggla | bauzas, you are welcome. | 10:27 |
sean-k-mooney | bauzas: i just left my comment too; i don't particularly like the approach you took | 10:48 |
sean-k-mooney | it seems complex for what you're trying to do | 10:48 |
bauzas | sean-k-mooney: thanks, I'll look over them after lunch | 10:57 |
bauzas | sean-k-mooney: your idea of getting all the props via an object property is actually a good one | 10:58 |
sean-k-mooney | i occasionally have them | 10:59 |
bauzas | for your algorithm with union/intersection, let's discuss this later, as I need to explain properly why it wasn't working when I tested it | 11:00 |
sean-k-mooney | i checked ovo and it does not have a function for that, but they do it often internally. i would consider adding set_fields to nova's base object class | 11:00 |
sean-k-mooney | sure. from my perspective the cardinality should not matter | 11:01 |
sean-k-mooney | i.e. if 2 instances request the image property on one host and one instance requests it on another, that should not affect the weight | 11:01 |
sean-k-mooney | i think you are trying to also include that by using counter right? | 11:02 |
sean-k-mooney | we can chat later but i was intentionally trying not to have the number of vms on a host affect the weighing of the host | 11:02 |
bauzas | I tested it, and the problem is that, for example, if you have a host that already has 2 instances and another that only has 1 using the prop, then it would prefer the second | 11:03 |
bauzas | so I preferred to just return how many of the requested properties are already used on the host | 11:04 |
bauzas | this is simpler to calculate and understand | 11:04 |
sean-k-mooney | i think that is more complex | 11:04 |
bauzas | also, I saw with devstack that when you create an instance, you get more props than the ones you asked for | 11:04 |
sean-k-mooney | and harder to understand | 11:04 |
sean-k-mooney | you do, yes; we set some, like the machine type, on boot | 11:05 |
sean-k-mooney | or config drive | 11:05 |
bauzas | indeed, but you haven't asked for it, that's my point | 11:05 |
sean-k-mooney | based on the host config. we will be adding the vtpm secret type | 11:05 |
sean-k-mooney | you have not, but we should not exclude them | 11:05 |
bauzas | anyway, I need to go off | 11:06 |
sean-k-mooney | ack | 11:06 |
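The two scoring ideas debated above can be contrasted in a short sketch. This is not nova's weigher: the function names and the data shape (each instance's image properties as a plain dict of strings) are assumptions for illustration. `set_based_score` follows the union/intersection idea, so the number of instances carrying a property does not change the result, while `counter_based_score` is a hypothetical count-based variant in which every matching instance adds weight.

```python
from collections import Counter


def set_based_score(requested, host_instances):
    """Count requested (key, value) pairs present on any instance of the host.

    Instance properties are unioned first, so two instances carrying a
    property count the same as one (cardinality does not matter).
    """
    on_host = set()
    for props in host_instances:
        on_host |= set(props.items())
    return len(set(requested.items()) & on_host)


def counter_based_score(requested, host_instances):
    """Contrasting variant: each instance carrying a requested pair adds 1,
    so hosts with more matching instances weigh more."""
    counts = Counter()
    for props in host_instances:
        counts.update(set(props.items()))
    return sum(counts[item] for item in requested.items())


# Illustrative data: a request, and two hosts described by their instances.
requested = {"hw_machine_type": "q35", "hw_video_model": "virtio"}
host_a = [{"hw_machine_type": "q35"}, {"hw_machine_type": "q35"}]
host_b = [{"hw_machine_type": "q35", "img_config_drive": "mandatory"}]

print(set_based_score(requested, host_a), set_based_score(requested, host_b))
# prints: 1 1
print(counter_based_score(requested, host_a), counter_based_score(requested, host_b))
# prints: 2 1
```

The two hosts tie under the set-based score but not under the count-based one, and host B's extra `img_config_drive` property changes nothing because it was not requested. Properties nova sets on its own (such as a machine type) land in the host's union but only count when the request asks for them.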
opendevreview | sean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default https://review.opendev.org/c/openstack/nova/+/940835 | 12:20 |
opendevreview | sean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes. https://review.opendev.org/c/openstack/nova/+/940873 | 12:20 |
opendevreview | suiong ng proposed openstack/nova master: Fix parameter order in add_instance_info_to_node https://review.opendev.org/c/openstack/nova/+/939411 | 13:15 |
*** ykarel_ is now known as ykarel | 13:27 | |
opendevreview | sean mooney proposed openstack/nova master: allow discover host to be enabled in multiple schedulers https://review.opendev.org/c/openstack/nova/+/938523 | 13:53 |
sean-k-mooney | gibi: i just dropped the job change from ^ | 13:54 |
sean-k-mooney | if you have time to look at https://review.opendev.org/c/openstack/nova/+/939476 as well it would be nice to land that | 13:55 |
sean-k-mooney | ... | 13:59 |
sean-k-mooney | mikal: "libvirt.libvirtError: unsupported configuration: spice graphics are not supported with this QEMU" | 13:59 |
sean-k-mooney | it looks like ubuntu 24.04 didn't just remove qxl, they fully removed spice support like rhel9 | 13:59 |
sean-k-mooney | i think debian still supports it | 14:00 |
sean-k-mooney | so i could move that job to debian, but if centos/rhel 9 and ubuntu 24.04 have fully dropped spice, that's probably our last option to test it | 14:01 |
sean-k-mooney | kolla uses debian containers on all distros now, so i would guess that is why it's working for you in kolla-ansible | 14:01 |
sean-k-mooney | that or you're still using ubuntu 22.04? | 14:02 |
ykarel | Hi, is this a known issue where tests fail randomly with libvirt.libvirtError: Failed to terminate process 70698 with SIGKILL: Device or resource busy? | 15:04 |
ykarel | like seen in | 15:04 |
ykarel | https://0c63ab9652170854bf26-a09d1a3317eb4b9b558e42ad19c25861.ssl.cf2.rackcdn.com/940474/1/gate/neutron-ovs-tempest-multinode-full/5184c7f/testr_results.html | 15:04 |
ykarel | https://97ad4d1320a89f3380ef-01f5fc3a5734547a13a0f54725d40b32.ssl.cf5.rackcdn.com/936364/4/gate/nova-multi-cell/3003ee6/testr_results.html | 15:04 |
ykarel | found quite an old related patch https://review.opendev.org/c/openstack/nova/+/639091 but that was targeting a specific libvirt version, and it has since been removed from nova as those old versions are no longer supported | 15:06 |
sean-k-mooney | ykarel: rarely, the qemu monitor process that is responsible for responding to the shutdown of the guest can become unresponsive | 15:35 |
sean-k-mooney | but this is not really a nova issue, it's more of a libvirt/qemu one | 15:36 |
sean-k-mooney | nova should not need to actually retry the destroy at all | 15:36 |
ykarel | sean-k-mooney, ohk so it's a known issue? what is the mitigation for this then? | 15:38 |
ykarel | the operator/user have to retry the operation? | 15:39 |
sean-k-mooney | not really | 15:39 |
sean-k-mooney | it's a very rare failure mode where qemu does not exit gracefully when asked | 15:39 |
sean-k-mooney | libvirt tried to fix this by sending sigkill instead of just sigterm | 15:39 |
sean-k-mooney | so the os would eventually reap the process | 15:39 |
sean-k-mooney | if sigkill does not work, there's nothing nova can do to force it to exit really | 15:40 |
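The SIGTERM-then-SIGKILL escalation described above can be illustrated with Python's `subprocess` module on a POSIX host. This is a generic sketch, not libvirt's implementation; `stop_process` and its timeouts are hypothetical. It also shows the limit of the approach: if the process cannot be reaped even after SIGKILL (for example while stuck in uninterruptible I/O), waiting simply times out, which is roughly the situation behind the EBUSY error above.

```python
import subprocess
import sys


def stop_process(proc, term_timeout=5.0, kill_timeout=5.0):
    """Stop a child with SIGTERM, escalating to SIGKILL (POSIX semantics).

    Returns "terminated" if the process exited within term_timeout after
    SIGTERM, or "killed" if SIGKILL was needed. If the process still cannot
    be reaped within kill_timeout (e.g. stuck in uninterruptible sleep),
    subprocess.TimeoutExpired propagates to the caller.
    """
    proc.terminate()                 # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=term_timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                  # sends SIGKILL on POSIX
        proc.wait(timeout=kill_timeout)
        return "killed"


if __name__ == "__main__":
    # A child that ignores SIGTERM, so only SIGKILL can stop it.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE, text=True)
    stubborn.stdout.readline()       # wait until the SIGTERM handler is in place
    print(stop_process(stubborn, term_timeout=0.5))  # prints: killed
    stubborn.stdout.close()
```

On Windows `terminate()` and `kill()` are the same call, so the distinction only holds on POSIX systems.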
ykarel | ohkk got it, but why can't nova retry, like in the patch above, if the response is EBUSY? | 15:42 |
sean-k-mooney | libvirt is meant to do it internally | 15:43 |
ykarel | ohkk but as per the logs that's not helping, as there are failures in CI | 15:45 |
ykarel | ohh or you mean libvirt should do it but it's not doing it, and libvirt has to be fixed for that? | 15:46 |
sean-k-mooney | it's either not doing it, or it is doing it and qemu is still not exiting | 15:46 |
ykarel | ohkk, no idea then where the issue is; i will start with a nova bug and the other points can be checked as part of it | 15:47 |
ykarel | thx sean-k-mooney for sharing all the insights | 15:48 |
sean-k-mooney | i don't think you should start with a nova bug | 15:48 |
sean-k-mooney | i also think we still have some retry logic because _destroy does call itself https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1612 | 15:48 |
ykarel | nova people may know more about the libvirt/qemu bits, so can judge better than me; that's why i thought to start with it :) | 15:48 |
sean-k-mooney | i would not personally consider it a nova bug but others might disagree | 15:51 |
sean-k-mooney | the original issue was not a nova bug; it was a workaround for a libvirt/qemu one | 15:52 |
sean-k-mooney | ykarel: as far as i can see we actually retry indefinitely https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1603 | 15:55 |
sean-k-mooney | i'm not a huge fan of how this is written, but in destroy we are using loopingcall from oslo.service to execute _wait_for_destroy, which checks if the instance is still running. if it is, it sets kwargs['is_running'] = True | 15:56 |
sean-k-mooney | which causes us to recursively call destroy https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1612 | 15:57 |
sean-k-mooney | that loop will break, however, if we get an exception from libvirt | 15:58 |
sean-k-mooney | so if we get EBUSY or another libvirt exception in _wait_for_destroy it won't retry | 16:02 |
ykarel | but that is_running = True path doesn't seem to be reached, as i don't see it in the logs | 16:02 |
ykarel | it's raising at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1566 | 16:02 |
ykarel | i.e. not seeing "Going to destroy instance again" in the logs | 16:02 |
sean-k-mooney | that means libvirt could not destroy the instance | 16:02 |
ykarel | but you said it will retry indefinitely | 16:03 |
sean-k-mooney | yes if there is no error | 16:03 |
sean-k-mooney | so if we ask libvirt to destroy the instance and there is no error | 16:04 |
sean-k-mooney | but it does not complete in half a second, we ask it to do it again | 16:04 |
sean-k-mooney | so we retry only when there isn't an error | 16:04 |
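The retry-only-when-there-is-no-error behaviour described above can be modelled in a few lines. Assumptions: nova drives this with oslo.service's `loopingcall`, which is replaced here by a plain loop, and `FakeDomain`, `LibvirtError` and `destroy_until_stopped` are hypothetical stand-ins, not nova or libvirt APIs. A destroy that returns cleanly but leaves the guest running is simply requested again after half a second, while any exception escapes the loop immediately.

```python
import time


class LibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain:
    """Hypothetical guest: stops running after `exit_after` destroy calls;
    destroy() raises once `fail_on` calls have been made (if set)."""

    def __init__(self, exit_after=1, fail_on=None):
        self.calls = 0
        self.exit_after = exit_after
        self.fail_on = fail_on

    def destroy(self):
        self.calls += 1
        if self.fail_on is not None and self.calls >= self.fail_on:
            raise LibvirtError("Failed to terminate process with SIGKILL: "
                               "Device or resource busy")

    def is_running(self):
        return self.calls < self.exit_after


def destroy_until_stopped(domain, interval=0.5, sleep=time.sleep):
    """Re-issue destroy while the guest is still running and no error occurs.

    There is no retry limit. A LibvirtError from destroy() is not caught,
    so it ends the loop on its first occurrence.
    Returns the number of destroy requests issued.
    """
    attempts = 0
    while True:
        attempts += 1
        domain.destroy()             # an exception here escapes the loop
        sleep(interval)              # give qemu half a second to exit
        if not domain.is_running():
            return attempts
        # still running but no error: ask for a destroy again


print(destroy_until_stopped(FakeDomain(exit_after=3), sleep=lambda s: None))  # prints: 3
```

Injecting `sleep` keeps the example fast to run; the asymmetry it shows is the point of the discussion: a silent failure is retried forever, an explicit EBUSY is not retried at all.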
sean-k-mooney | if you retried the delete or hard reboot etc. it may work, if libvirt has corrected its internal error or qemu has finally died | 16:05 |
sean-k-mooney | if qemu is locked up badly enough not to service a sigkill from libvirt, it's possible the only way to kill it would be a host reboot. trying it again might work but there is no guarantee | 16:07 |
sean-k-mooney | we could re-add a limited retry on error, but it would really just be papering over a potential libvirt or qemu bug | 16:07 |
sean-k-mooney | we have done that in the past, but it's kind of tech debt every time we do, which is why we removed this | 16:08 |
-opendevstatus- NOTICE: nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/ | 16:08 | |
ykarel | the removal seems to have been done as part of the cleanup of old libvirt versions, but okk, got what you mean | 16:09 |
ykarel | but it would be good if it could be worked around, at least for the cases where it's not a persistent issue on the qemu side | 16:10 |
sean-k-mooney | if you want to file it as a bug, do; if we re-add this i think we would put it behind a workaround flag | 16:10 |
ykarel | ok that should also work | 16:10 |
ykarel | thx again | 16:11 |
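If a bounded retry were re-added behind a workaround flag, as suggested above, it might look like the following sketch. Everything here is hypothetical: `busy_retries` stands in for a config option that does not exist today, and the `LibvirtError` stand-in carries a plain errno in `code`, whereas real `libvirt.libvirtError` objects expose their error details through accessor methods. Only EBUSY is retried; any other error, or EBUSY once the budget is spent, is re-raised.

```python
import errno


class LibvirtError(Exception):
    """Hypothetical stand-in: carries a plain errno value in `code`."""

    def __init__(self, msg, code):
        super().__init__(msg)
        self.code = code


def destroy_with_busy_retry(destroy, busy_retries=0):
    """Call destroy(), retrying only on EBUSY, at most `busy_retries` times.

    busy_retries=0 keeps the current behaviour (no retry). Returns the number
    of attempts made; re-raises any non-EBUSY error immediately, and the last
    EBUSY error once the retry budget is used up.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            destroy()
            return attempt
        except LibvirtError as exc:
            if exc.code != errno.EBUSY or attempt > busy_retries:
                raise


# Example: a destroy call that reports EBUSY twice, then succeeds.
failures = [errno.EBUSY, errno.EBUSY]


def flaky_destroy():
    if failures:
        raise LibvirtError("Device or resource busy", failures.pop())


print(destroy_with_busy_retry(flaky_destroy, busy_retries=2))  # prints: 3
```

Defaulting `busy_retries` to zero is what would keep this a workaround rather than a behaviour change: deployments that never set it keep the current fail-fast handling.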
sean-k-mooney | i would also look at the failing job | 16:11 |
sean-k-mooney | this happens because of very high load | 16:11 |
sean-k-mooney | so the job may be using too high a concurrency or have other issues that are triggering it | 16:11 |
ykarel | hmm underlying hypervisor where it ran might also be overloaded | 16:12 |
sean-k-mooney | i believe libvirt is now going to wait at least 40s instead of 15s before returning EBUSY | 16:13 |
ykarel | yes, as per the logs i see 40s between requesting destroy and the returned error | 16:22 |
* ykarel away | 16:34 | |
opendevreview | Doug Szumski proposed openstack/nova master: Stop corrupting ephemeral volumes during live migration https://review.opendev.org/c/openstack/nova/+/940900 | 17:50 |
dougszu | Any opinions on that would be great ^. I've got another fix for cold migration using block pull to share if it's of interest. And of course, the final bit to close the bug would be not making file systems on the ephemerals at all. | 17:53 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository | 18:52 | |
sean-k-mooney | dougszu: i see you tried removing the backing file | 19:04 |
sean-k-mooney | long term, that's definitely the direction we should go | 19:04 |
sean-k-mooney | we would likely need to change initial boot/reboot and cold migration or evacuate too | 19:05 |
sean-k-mooney | if we were to do this. | 19:05 |
sean-k-mooney | i.e. provide a way for nova, on the next lifecycle operation on the guest that moves or redefines the domain, to remove the backing file for ephemeral and swap disks | 19:06 |
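One offline way to drop a backing file, shown only as an illustration, is `qemu-img rebase` with an empty backing file name: in its default (safe) mode it copies the data the image still reads from its backing file into the image itself, leaving a standalone disk. This is an assumption about how such a step could be done, not what the patch under review does; the helper name, the qcow2 default and the example path are hypothetical, and the image must not be in use by a running guest. libvirt's block pull, mentioned above, does the equivalent for a running domain.

```python
import shlex


def flatten_disk_cmd(path, fmt="qcow2"):
    """Build argv for rebasing `path` onto no backing file.

    In qemu-img's default (safe) rebase mode, an empty backing file name
    makes qemu-img copy in any data the image still reads from its current
    backing file, so the result is a standalone image. Run it only while no
    guest is using the disk.
    """
    return ["qemu-img", "rebase", "-f", fmt, "-b", "", path]


# Illustrative path only; the real instance directory layout may differ.
cmd = flatten_disk_cmd("/var/lib/nova/instances/UUID/disk.eph0")
print(shlex.join(cmd))
# prints: qemu-img rebase -f qcow2 -b '' /var/lib/nova/instances/UUID/disk.eph0
```

Building the argument list separately from running it (for example with `subprocess.run(cmd, check=True)`) keeps the empty `-b` argument intact without any shell quoting concerns.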
mikal | sean-k-mooney: yeah, I am using Debian containers on Kolla-Ansible, so that's why I didn't notice the Ubuntu behaviour. | 19:07 |
sean-k-mooney | i need to push my patch to try the job with debian | 19:07 |
sean-k-mooney | i have been getting distracted for like the last 3 hours | 19:08 |
mikal | Heh, it's ok, I didn't manage to get to tempest for similar reasons yesterday. | 19:08 |
mikal | I find people dropping SPICE quite frustrating to be entirely honest. The public excuse seems to be H.264 requirements, but RDP has the same requirements and there is a patent license from Cisco for the codec. | 19:09 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default https://review.opendev.org/c/openstack/nova/+/940835 | 19:09 |
opendevreview | sean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes. https://review.opendev.org/c/openstack/nova/+/940873 | 19:09 |
sean-k-mooney | mikal: spice is the better console. vnc is catching up but spice is still better | 19:09 |
sean-k-mooney | mikal: i don't think it's h.264 related honestly | 19:10 |
mikal | sean-k-mooney: yeah, RDP is probably about as good as SPICE in terms of features, but qemu doesn't support it at all and the protocol is much more complicated as best as I can tell. | 19:10 |
sean-k-mooney | what's funny is, as far as i know, it's the default in gnome boxes and a few other things | 19:13 |
sean-k-mooney | so it's not like spice was unused | 19:13 |
mikal | oVirt used it a bit too, as did proxmox. | 19:14 |
mikal | Also... it's pretty stable. It hasn't changed much if at all in years. | 19:14 |
sean-k-mooney | i think it's still an option in proxmox given that it's derived from rhel | 19:14 |
mikal | So there's no feature chasing etc. | 19:14 |
sean-k-mooney | *debian | 19:14 |
sean-k-mooney | speaking of stable i have updated the patch to the latest debian stable (12/bookworm) | 19:15 |
sean-k-mooney | actually we do not need grenade and some of the other jobs; i'll quickly drop those out while we are testing | 19:16 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] move nova-ovs-hybrid-plug to deploy with spice and fix qxl default https://review.opendev.org/c/openstack/nova/+/940835 | 19:17 |
opendevreview | sean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes. https://review.opendev.org/c/openstack/nova/+/940873 | 19:17 |
sean-k-mooney | ok, better; it's just running one job now | 19:18 |
opendevreview | Michael Still proposed openstack/nova master: Don't calculate the minimum compute version repeatedly. https://review.opendev.org/c/openstack/nova/+/940848 | 19:18 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra spec for sound device. https://review.opendev.org/c/openstack/nova/+/926126 | 19:18 |
opendevreview | Michael Still proposed openstack/nova master: Protect older compute managers from sound model requests. https://review.opendev.org/c/openstack/nova/+/940770 | 19:18 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add extra specs for USB redirection. https://review.opendev.org/c/openstack/nova/+/927354 | 19:18 |
mikal | ^--- fixes a one character error in the first patch in that series | 19:19 |
sean-k-mooney | ah, you have split out the sound/usb ones and the performance improvement, nice | 19:20 |
mikal | Yeah, I wrote up a decoder ring on the etherpad because it's getting complicated -- https://etherpad.opendev.org/p/nova-2025.1-status | 19:22 |
mikal | But there's basically two independent series now -- the API changes, and the "VDI changes" (sound, usb, compute version bumps etc). | 19:23 |
mikal | I agree we should focus on the API changes first, but that's blocking right now on me getting some quality time with tempest. | 19:23 |
sean-k-mooney | in its current form both can be merged in either order and in parallel so that will help i think. | 19:27 |
sean-k-mooney | mikal: https://zuul.opendev.org/t/openstack/build/842a66f2cfe54eb39d856a3bf311bb5c | 20:14 |
sean-k-mooney | mikal: debian works and the job passes with it | 20:14 |
sean-k-mooney | so i'll revert the commented-out jobs, but we should be able to test in that job | 20:15 |
opendevreview | sean mooney proposed openstack/nova master: move nova-ovs-hybrid-plug to deploy with spice and fix qxl default https://review.opendev.org/c/openstack/nova/+/940835 | 20:27 |
opendevreview | sean mooney proposed openstack/nova master: Dont deploy n-spice on compute nodes. https://review.opendev.org/c/openstack/nova/+/940873 | 20:27 |
mikal | sean-k-mooney: cool, thank you. I assume we still want to do a tempest test, which I will try to actually get around to attempting today. | 20:34 |
sean-k-mooney | i think so, but those should give us a job we can run the test in | 20:35 |
mikal | Agreed | 20:35 |
mikal | Thanks for chasing this bit for me. | 20:36 |
sean-k-mooney | i just added those to the etherpad too with some explanations | 20:36 |
mikal | Cool | 20:36 |
sean-k-mooney | mikal: if nothing else, we have not deprecated spice support and there is literally no spice job anywhere, so just for that i consider those to be a win | 20:36 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!