*** rpittau|afk is now known as rpittau | 07:13 | |
lyarwood | FWIW https://review.opendev.org/c/openstack/nova/+/803322 has hit both https://bugs.launchpad.net/nova/+bug/1938021 and https://bugs.launchpad.net/nova/+bug/1936849 now, I've not had a chance to look at these really so if anyone has time this morning please take a look. | 08:06 |
---|---|---|
gibi | lyarwood: ack, those bugs are on my radar too, but I have conflicting priorities :/ | 08:50 |
gibi | I will try to look at them this week | 08:50 |
* kashyap wonders if it's worth it at all to do the tedious deprecation stuff for floppy, instead just rip it out. I looked at tens of thousands of guests from OSP data I have, and I have not seen a single floppy disk, bus, or as a boot device. | 08:53 | |
bauzas | lyarwood: i can try to take a look | 08:54 |
bauzas | in the meantime, can someone explain me the weirdo issue with mypy in https://review.opendev.org/c/openstack/nova/+/802918/7/nova/virt/libvirt/driver.py#511 ? | 08:55 |
bauzas | oh, because I used a lambda function | 08:55 |
* bauzas facepalms | 08:56 | |
bauzas | gibi: any idea how I could tell mypy to *not* ask for a type annotation for a lambda function ? | 08:57 |
gibi | bauzas: either you gave it a ty.Any annotation | 08:59 |
gibi | or | 08:59 |
gibi | there is some comment to silence mypy... | 08:59 |
gibi | # type: ignore | 08:59 |
gibi | bauzas: why don't you want to define the type? is it a complicated one? | 08:59 |
bauzas | gibi: I don't get why myty doesn't ask for an annotation here : https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L474 | 09:00 |
bauzas | and why it asks for https://review.opendev.org/c/openstack/nova/+/802918/7/nova/virt/libvirt/driver.py#511 | 09:00 |
bauzas | oh, the type annotation is for the new attribute | 09:03 |
bauzas | I guess because mypy can't statically guess it | 09:03 |
bauzas | as it uses a lambda function | 09:03 |
gibi | bauzas: you can check what mypy can guess reveal_type(<variable>) | 09:03 |
gibi | as far as I see local variables only resolved if the function is resolved by mypy | 09:04 |
gibi | so if the function has no annotation on the signature then mypy will not check the locals | 09:04 |
bauzas | which is the case for __init__ | 09:05 |
bauzas | but | 09:05 |
gibi | would be interesting to see what type self._sysinfo_serial_func got from mypy | 09:05 |
gibi | but anyhow self.mdev_class_mapping is a Dict[str, str] isn't it? | 09:06 |
bauzas | let me run reveal_type on both mdev_class_mapping and pgpu_class_mapping | 09:06 |
bauzas | gibi: exact, annotation is simple | 09:06 |
bauzas | hum, very interesting | 09:08 |
bauzas | gibi: https://paste.opendev.org/show/807878/ | 09:10 |
bauzas | if I ask reveal_type() this forces mypy to introspect local variables | 09:10 |
bauzas | so I guess we now need to add annotations every time | 09:11 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Provide the mdev class for every PCI device https://review.opendev.org/c/openstack/nova/+/802918 | 09:14 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Provide and use other RCs for mdevs if needed https://review.opendev.org/c/openstack/nova/+/803233 | 09:14 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Expose the mdev class https://review.opendev.org/c/openstack/nova/+/801743 | 09:14 |
opendevreview | Sylvain Bauza proposed openstack/nova master: WIP: Cleanup GPU vs. mdev wording https://review.opendev.org/c/openstack/nova/+/803379 | 09:14 |
gibi | bauzas: I think that only means that mypy did not assign any type to either of those variables but mypy needed to use some type for mdev_class_mapping for something else later so mypy asked the typehint from you. When you added reveal_type you forced mypy to try to assign type to both variables hence is asked typehint for both from you | 09:21 |
bauzas | gibi: yup, that's what I found | 09:22 |
bauzas | static typing, my love | 09:22 |
bauzas | anyway, this is fixed in the last rev, I added an annotation | 09:22 |
gibi | ack | 09:23 |
gibi | I'm done with the re-review of the mdev series so far so good :) | 09:25 |
bauzas | see, I haven't grumbled about mypy | 09:25 |
bauzas | sure, it's important to tell that a variable using a collections.defaultdict is a dict :p | 09:26 |
bauzas | just in case people don't know :D | 09:26 |
bauzas | gibi: I replied to your (good) concern https://review.opendev.org/c/openstack/nova/+/803233/3//COMMIT_MSG#17 | 09:41 |
stephenfin | kashyap: not an option, I'm afraid https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html | 09:54 |
kashyap | Yeah, I know "standards" | 09:57 |
sean-k-mooney | kashyap: well in this case its openstacks standard | 10:32 |
sean-k-mooney | which nova does follow | 10:32 |
sean-k-mooney | kashyap: for what its worth if qemu upstream dont remove it im not really in a rush to remove it form nova. deprecating floppy usage sure, but it does not cost us much if anything to keep it and distros can still drop support even if we support it upstream | 10:36 |
kashyap | sean-k-mooney: Hey; I see what you mean, but as noted it is more of a liability than anything at this point | 10:39 |
kashyap | (Given past CVEs. Upstream QEMU too, it's discouraged) | 10:39 |
kashyap | (Downstream distros of OpenStack can also simply declare it out of scope / deprecated.) | 10:40 |
opendevreview | Merged openstack/nova master: zuul: Increase GLANCE_LIMIT_IMAGE_SIZE_TOTAL for nova-lvm https://review.opendev.org/c/openstack/nova/+/803322 | 10:55 |
lyarwood | \o/ | 11:00 |
opendevreview | Lee Yarwood proposed openstack/nova master: libvirt: Handle silent failures to extend volume within os-brick https://review.opendev.org/c/openstack/nova/+/801714 | 11:37 |
opendevreview | Lee Yarwood proposed openstack/nova master: Add functional test for bug 1937375 https://review.opendev.org/c/openstack/nova/+/802011 | 11:37 |
opendevreview | Lee Yarwood proposed openstack/nova master: compute: Avoid duplicate BDMs during reserve_block_device_name https://review.opendev.org/c/openstack/nova/+/801990 | 11:37 |
opendevreview | Lee Yarwood proposed openstack/nova master: fup: Move _wait_for_volume_attach into InstanceHelperMixin https://review.opendev.org/c/openstack/nova/+/802623 | 11:37 |
opendevreview | Lee Yarwood proposed openstack/nova master: Add regression test for bug 1938326 https://review.opendev.org/c/openstack/nova/+/802801 | 11:38 |
opendevreview | Lee Yarwood proposed openstack/nova master: compute: Do not mark disabled but down services as in maintenance https://review.opendev.org/c/openstack/nova/+/802317 | 11:38 |
lyarwood | gibi / bauzas ; now the gate is fixed reviews on ^ would be appreciated this week if you have time | 11:38 |
* lyarwood -> lunch | 11:41 | |
gibi | lyarwood: on it | 11:45 |
gibi | lyarwood: after your lunch, is this extended race scenario possible? https://review.opendev.org/c/openstack/nova/+/801990/6/nova/compute/manager.py#6985 | 12:01 |
lyarwood | gibi: yeah that's also possible but I wonder if we want to treat it as a separate fix? | 12:12 |
lyarwood | actually thinking about it, is it an issue if we have racing requests against the same volume | 12:13 |
lyarwood | multiattach races would be caught by c-api later | 12:14 |
gibi | lyarwood: I would treat it as a separate fix. whay you proposed is a valid fix for a scenario reported int he bug | 12:15 |
gibi | hm, I guess we have the instance.uuid lock as a pattern, we have that for most of the operations | 12:15 |
gibi | but here a volume.id lock would be better | 12:16 |
gibi | I'm not sure we even need the instance.uuid lock here | 12:16 |
gibi | but could be | 12:16 |
stephenfin | bauzas: Can we trade reviews for your generic mdev series and my DB series? I'm really eager to close out as much as I can this week, especially with a few people on PTO next week | 12:27 |
bauzas | stephenfin: for the moment, I'm working on the functest | 12:27 |
bauzas | for the mdev series, asked by gibi | 12:27 |
bauzas | but in one hour, we could do | 12:28 |
sean-k-mooney | gibi: i think we take an instace lock for volume stuff to avoid concurrent voluem attach/detach operation on the same instance | 12:29 |
sean-k-mooney | multi attach is interesting edgecase | 12:30 |
gibi | sean-k-mooney: how a reserve that only create a new bdm can race with an attach that already has the bdm created or a detach that will delete a dbm tha should already exists | 12:30 |
gibi | still, I'm not against keeping the instance.uuid lock | 12:31 |
sean-k-mooney | gibi: well i was just using attach as example not suggesting it would race with it | 12:31 |
gibi | it is a sensible safety measure | 12:31 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Fix incorrect reservation IDs in unit tests https://review.opendev.org/c/openstack/nova/+/803363 | 12:34 |
aarents | Hi nova, | 13:09 |
sean-k-mooney | o/ | 13:09 |
aarents | I'm a bit stuck with this change https://review.opendev.org/c/openstack/nova/+/799606 (see my self comment) which is only a partial fix to the bug https://bugs.launchpad.net/nova/+bug/1934742 I have in operation, | 13:09 |
aarents | I'm not sure how to properly fix that, if someone have idea I can dig.. | 13:09 |
sean-k-mooney | let me read but you having issue with pepole delete port via neutron? rather then port detach right | 13:10 |
sean-k-mooney | technially we dont really support that | 13:10 |
aarents | yep port deletion when attached | 13:11 |
sean-k-mooney | we have the network-vif-deleted event to try and help fix things | 13:11 |
sean-k-mooney | btu deletion when attached is strongly discuraged | 13:11 |
sean-k-mooney | the correct way to fix it would be to either block it in neutron or have nova call detach internally | 13:12 |
sean-k-mooney | aarents: stirckly speaking form a nova perpective deleting an interface that is attach to an instance via neutorn is a user error | 13:14 |
sean-k-mooney | and the code path you are looking at is a workaround we put in place to try an partly handel it | 13:15 |
aarents | So we can imagine calling detach_interface but it will try to to unbind and will probably got 404 from neutron which is not a big deal | 13:15 |
sean-k-mooney | we would have to handel that ya but its something we could handel | 13:16 |
sean-k-mooney | this came up a few weeks ago though in the neutron driver meeting an i was pushing to block deleteing attach ports in the neutron api | 13:16 |
sean-k-mooney | long term that is the correct approch | 13:17 |
sean-k-mooney | this is really a neutorn bug not a nova one but nova can do a better jobs at cleaning up when you incorrectly detelete attach ports | 13:18 |
sean-k-mooney | the same is true for attach volumes by the way | 13:18 |
sean-k-mooney | you should never try to delete an attach volume but in that case i belive cinder will protect you | 13:18 |
sean-k-mooney | gibi: you rememebr that conversation about 2-3 weeks ago in the drivers meeting right? i think the resolution was to write a neutron spec for the new extnsion | 13:19 |
aarents | sean-k-mooney: I see your point about not a nova issue, My issue is I'like to avoid this possible leak because it generates un live_migrable instances in infra.. | 13:19 |
sean-k-mooney | yep im sure it does and yes that is why we try to clean it up | 13:20 |
sean-k-mooney | aarents: it has other proplems it hte port that was delete uses sriov | 13:20 |
sean-k-mooney | since the pci device will not be released properly | 13:20 |
aarents | I see | 13:21 |
sean-k-mooney | aarents: sriov deattach does not work properly in any release before wallaby | 13:21 |
sean-k-mooney | and i think it sitll wont work properly if you delete an attach neturon port although ill admit i have not looked at that code path to check | 13:22 |
sean-k-mooney | i just suspect that we did not test that when we added sriov attach/detach support | 13:22 |
sean-k-mooney | aarents: anyway your partial fix is to just add a lock | 13:23 |
sean-k-mooney | that might help i in limited cases but i dont think it will in general fix this | 13:24 |
aarents | I aggre it is not enought | 13:24 |
aarents | sean-k-mooney: thks! for the usefull info, I will dig more on blocking or try a detach | 13:30 |
sean-k-mooney | aarents: by the way if we cant use detach directly because all files cant be constructed we have 2 options. | 13:30 |
sean-k-mooney | we could have neutron send the full port info in the detach payload or we could contocut as much data as we can and pass incompelte objects | 13:31 |
sean-k-mooney | but blocking it in nueton would be the best approch or having neutron call detach which is rahter messy but would work better | 13:31 |
aarents | "neutron send the full port info in the detach payload" is interesting yes | 13:32 |
sean-k-mooney | one thing we disucssed was moving when neutron sends the event | 13:32 |
sean-k-mooney | if neutorn sent the deleted event and waited for nova to ack it | 13:32 |
sean-k-mooney | then that would also fix it | 13:33 |
sean-k-mooney | to a degree at least | 13:33 |
sean-k-mooney | anyway this is a know issue that we just have not focused on so sorry your are hitting it | 13:33 |
aarents | "could contocut as much data as we can and pass incompelte objects" <- I try that but it needs details like vif_type bridge name this info are no more available when port is deleted, I don't know how regenerate that | 13:34 |
sean-k-mooney | ah yes we do need those | 13:35 |
sean-k-mooney | which may or may not be still in the info cache as you have found | 13:35 |
opendevreview | Merged openstack/nova master: Add functional test for bug 1937375 https://review.opendev.org/c/openstack/nova/+/802011 | 13:35 |
sean-k-mooney | its its a hack but we might be able to reconstuct the required data by inspecting the libvirt xml but i would prefer to avoid that | 13:36 |
aarents | sean-k-mooney: yes.. it was the workaround I was considering, kind of detach_device_by_mac, we intereate on interfaces nad deleted the good one | 13:38 |
sean-k-mooney | ya by mac works for most things excption sriov pfs but in that case you can kind of still figure it out in some cases | 13:39 |
sean-k-mooney | it really depend on how much of the db info we have acess too | 13:39 |
sean-k-mooney | we cant get the pci address form the port since its gone and the info cache may be empty but we recently started recoring the neutron port uuid as the requester id in the pci_devices table i belive | 13:40 |
aarents | the info we can grab in db are entry in virtuales_interfaces: | 13:41 |
sean-k-mooney | which means we should be able to get the pci adress by the port uuid | 13:41 |
aarents | +---------------------+------------+------------+----+--------------------------------------------------------+------------+--------------------------------------+--------------------------------------+---------+------+ | 13:41 |
aarents | | created_at | updated_at | deleted_at | id | address | network_id | uuid | instance_uuid | deleted | tag | | 13:41 |
aarents | +---------------------+------------+------------+----+--------------------------------------------------------+------------+--------------------------------------+--------------------------------------+---------+------+ | 13:41 |
aarents | | 2021-08-03 15:44:23 | NULL | NULL | 62 | fa:16:3e:fb:2d:7b/4ae693fd-23fb-46c6-a8a5-369270029fc6 | NULL | 4ae693fd-23fb-46c6-a8a5-369270029fc6 | d07df8d0-fc98-4087-9b25-3b4a55cdcac4 | 0 | NULL | | 13:41 |
aarents | +---------------------+------------+------------+----+--------------------------------------------------------+------------+--------------------------------------+--------------------------------------+---------+------+ | 13:41 |
sean-k-mooney | yep | 13:41 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Expose the mdev class https://review.opendev.org/c/openstack/nova/+/801743 | 13:42 |
opendevreview | Sylvain Bauza proposed openstack/nova master: WIP: Cleanup GPU vs. mdev wording https://review.opendev.org/c/openstack/nova/+/803379 | 13:42 |
sean-k-mooney | so with that info and the pci_devices table we can with some effort constuct most of what we need to inpect the xml and then figure out what the vif_type ectra was to unplug | 13:42 |
sean-k-mooney | but since the virt dirvers are not really ment to tlak to the db that is tricky to do | 13:43 |
sean-k-mooney | we would have to implement a new fuction in the compute manager ot do that more then likely | 13:43 |
sean-k-mooney | and then extend thet virt driver api | 13:43 |
sean-k-mooney | well it depend on how we approch it | 13:44 |
aarents | sean-k-mooney: it is good to know the topic is open with neutron, in short term I will find a hack to detach interface with few info I have in db | 13:54 |
aarents | sean-k-mooney: thks! | 13:55 |
kashyap | sean-k-mooney: Remind me again, why did you suggest to leave this as-is to 'cirrus'? - https://review.opendev.org/c/openstack/nova/+/798680/3/nova/virt/libvirt/config.py | 14:00 |
sean-k-mooney | because we only want to change the behavior for new instnaces | 14:00 |
sean-k-mooney | also for most code paths this get overriden | 14:00 |
sean-k-mooney | none might actully be a better default | 14:01 |
sean-k-mooney | i dont think this will ever get used currently at least not on x86 | 14:01 |
kashyap | sean-k-mooney: I'm actually going to do a bunch of migration tests with a variant of this patch to see if there's *actual* breakage or are we talking only theoretical stuff | 14:01 |
sean-k-mooney | well what we approved ot proceed was no change to existing instnaces | 14:03 |
sean-k-mooney | if you want to change the default of exisitng instance then we should disssu this cahgne again with the wider team | 14:03 |
kashyap | Well, let's see what breaks, if anything. "What was approved" was all based on theory. I'd like to see some actual evidence | 14:03 |
sean-k-mooney | so i think you should leave it at cirrus or defer this to yoga | 14:03 |
kashyap | Don't worry, I am concerned as much as you to not break any valid cases or upgrades, etc. | 14:04 |
kashyap | Oh, totally forgot: I said this befoere but I've gotten a Red Hat QE to do some tests w/ Cirrus and VirtIO to change for existing instances - for Windows and Linux. | 14:09 |
kashyap | I'll add a note in the review | 14:09 |
sean-k-mooney | they would need to test both migration and hard reboots after the fact withthe updated xmls | 14:10 |
sean-k-mooney | for live migration obviolys we cant change teh device model, it could change on a hard reboot after the migration but we generally want to avoid that | 14:10 |
kashyap | sean-k-mooney: Right, the test that was done so far was this: | 14:10 |
kashyap | Have a Linux and Windows guest on source with 'cirrus', change the video model to 'virtio'; reboot the guest, and then live-migrate -- it all succeeds | 14:10 |
sean-k-mooney | ack. we had discussed that in some cases default to virtio might be fine due to the vga fallback | 14:11 |
sean-k-mooney | have you test this where we have specifed vram | 14:12 |
sean-k-mooney | or other extra specs | 14:12 |
sean-k-mooney | * other image properteis | 14:12 |
kashyap | And on hard-reboot, a guest can pick up new device-related bits some times; can't avoid it in some cases | 14:12 |
kashyap | sean-k-mooney: Can you spell out a bit more on what do you want tested with the vRAM? | 14:12 |
sean-k-mooney | right personally i feel like that is a bug when that happens | 14:13 |
kashyap | (I mean, it's a bug if it breaks anything user-visible; if not, I'd say it's fine) | 14:13 |
sean-k-mooney | depends on who you talk too | 14:13 |
kashyap | sean-k-mooney: Likewise, what props you want to test in this case? How are these other image props related? | 14:13 |
sean-k-mooney | some customer treat it as a bug since they would require recertifcation of the workload other dont care | 14:14 |
sean-k-mooney | hw_video_ram | 14:14 |
sean-k-mooney | i belive virtio is limited to 8MB | 14:14 |
sean-k-mooney | cirrus i think is larger | 14:14 |
sean-k-mooney | i think crrus can support 24-64 mb somethign in that region | 14:15 |
sean-k-mooney | kashyap: that is hte main one im concerned about currently | 14:15 |
kashyap | sean-k-mooney: You mean hw_video_ram is the thing you're concerned about? | 14:16 |
sean-k-mooney | yes | 14:16 |
kashyap | sean-k-mooney: Sure, adding a note to test that too (and summarizing what we talked here on the change) | 14:16 |
sean-k-mooney | i think the max vram that you can use with virtio is less then that for cirrus | 14:16 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/objects/image_meta.py#L417 | 14:16 |
kashyap | sean-k-mooney: What I'm not clear is the impact of hw_video_ram on the switch to virtio | 14:18 |
* kashyap clicks | 14:18 | |
kashyap | Maybe virtio doesn't need that max vRAM. Anyway, to be tested :) | 14:20 |
sean-k-mooney | kashyap: if you had it set to 16 with cirrus it would work but when you swich to virtio you vm would not boot | 14:20 |
sean-k-mooney | kashyap: lets concirm since i tought the virt team said it did no that downstream mail thread today | 14:21 |
kashyap | Another q. is In what scenarios would one bother to set this at all? | 14:21 |
kashyap | Yes, I will ask the virt graphics maintainer about it | 14:21 |
sean-k-mooney | to improve guest performacne | 14:21 |
sean-k-mooney | it allow more frame buffers to be created in the graphic device which is need for higher resolutions | 14:22 |
sean-k-mooney | without a large enough vram allocation you can do double or triple buffing in the graphic device and xrog has to fall back to copying buffers to/from guest ram | 14:23 |
kashyap | I see; I really wonder how many users actually know this, and how many, if any at all, change this.. | 14:24 |
sean-k-mooney | kashyap: from greg " - vga compatibility mode is limited when compared to stdvga. It has | 14:24 |
sean-k-mooney | 8 MB fixed video memory, whereas stdvga has 16 MB by default can | 14:24 |
sean-k-mooney | can be configured to have more if needed (for example in case you | 14:24 |
sean-k-mooney | want use 4k). | 14:24 |
sean-k-mooney | " | 14:24 |
sean-k-mooney | that was in context of virtio-vga | 14:24 |
sean-k-mooney | vs plain vga | 14:25 |
sean-k-mooney | but i think cirrus also support more the 8MB | 14:25 |
sean-k-mooney | as does QXL | 14:25 |
kashyap | Let's confirm if Cirrus actually does | 14:25 |
sean-k-mooney | looking at https://libvirt.org/formatdomain.html#video-devices yes at least 16MB | 14:26 |
sean-k-mooney | For a guest of type "kvm", the default video is: type with value "cirrus", vram with value "16384" and heads with value "1". | 14:26 |
kashyap | sean-k-mooney: Ah, yep | 14:27 |
* kashyap bbaib; thanks for the discussion, sean-k-mooney :) | 14:28 | |
sean-k-mooney | lookign at https://www.kraxel.org/blog/2019/09/display-devices-in-qemu/#qxl-vga by the way qxl defaults to 64MB which was one of the reasons it perfromed better in the past then other options i know much of the improvemnts have now been surpassed by virtio-gpu or std vga | 14:31 |
sean-k-mooney | actully from the std vga section | 14:33 |
sean-k-mooney | "The linux driver supports page-flipping, so having room for 3-4 framebuffers is a good idea. The driver can leave the framebuffers in vram then instead of swapping them in and out. FullHD (1920x1080) for example needs a bit more than 8 MB for a single framebuffer, so 32 or 64 MB would be a good choice for that. " | 14:33 |
sean-k-mooney | you can see that 8MB is not quite large enough for 1080p | 14:33 |
sean-k-mooney | so without the virtio gpu driver you perfromace would likely regress goign form cirrus to virtio-vga if you tried to use a 1080p resolution today | 14:34 |
kashyap | sean-k-mooney: Back ... reading | 14:43 |
kashyap | I really doubt the seriousness of the "regression" from performance going from 'cirrus' to 'virtio-vga'. If people want performance, you better make sure you're not using deadly-old Linux, and got the virtio-gpu driver. | 14:45 |
kashyap | I'll check w/ Gerd (the author of the above post) and post my summary on the change | 14:48 |
sean-k-mooney | ack | 14:49 |
gibi | aarents: sean-k-mooney: here is the neutron drivers meeting log about deleting bound ports https://meetings.opendev.org/meetings/neutron_drivers/2021/neutron_drivers.2021-07-02-14.00.log.html#l-67 | 14:56 |
gibi | the agreement was that it needs specs | 14:57 |
gibi | there is a summary in the rfe https://bugs.launchpad.net/neutron/+bug/1930866 | 15:00 |
sean-k-mooney | gibi: right that is what i recalled | 15:00 |
bauzas | gibi: I tried to look how to verify whether we would have existing allocations for VGPU RC in case the operator would modify the options but honestly it's difficult to do it as we don't provide the allocations when calling update_provider_tree() without ReshapeNeeded | 15:05 |
bauzas | honestly, we already don't support the fact that operators could modify the inventories if they modify the options | 15:06 |
bauzas | but I'll test this | 15:09 |
aarents | gibi: thank you for the links | 15:09 |
gibi | bauzas: if we cannot prevent it then at least a big red warning should be added to the docs. | 15:17 |
bauzas | agreed | 15:17 |
bauzas | I think we would then have orphaned allocations | 15:17 |
gibi | probably until the instance is deleted or migrated | 15:18 |
bauzas | gibi: we *could* try to hardstop in update_provider_tree() | 15:18 |
bauzas | which is called by the compute service just before we start the RPC service | 15:18 |
bauzas | (called by pre_start_hook) | 15:19 |
bauzas | exactly like we do for reshapes | 15:19 |
bauzas | but then we would need to pass allocations even without reshaping | 15:19 |
bauzas | or, using the ReshapeNeeded exception | 15:20 |
bauzas | like a reshape | 15:20 |
bauzas | either way, not sure I could do it in 2 days | 15:20 |
gibi | yeah I don't expect that we initiati a rehape just to reject it :) | 15:22 |
gibi | I suggest to check what will happen (e.g. orphaned allocation) and add a big warning about it in the docs | 15:23 |
bauzas | gibi: that's just what I'm testing :) | 15:36 |
gibi | coolio | 15:36 |
bauzas | fortunately I have an environment <3 | 15:36 |
bauzas | I don't know yet how long, but... :p | 15:36 |
*** whoami-rajat__ is now known as whoami-rajat | 15:37 | |
opendevreview | Alexandre arents proposed openstack/nova master: libvirt: Abort live-migration job when monitoring fails https://review.opendev.org/c/openstack/nova/+/764435 | 16:07 |
*** rpittau is now known as rpittau|afk | 16:14 | |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support move ops with extended resource request https://review.opendev.org/c/openstack/nova/+/800087 | 16:49 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [func test] refactor interface attach with qos https://review.opendev.org/c/openstack/nova/+/800088 | 16:50 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support interaface attach / detach with new resource request format https://review.opendev.org/c/openstack/nova/+/800089 | 16:50 |
bauzas | gibi: still around ? | 16:51 |
gibi | bauzas: a bit | 16:51 |
bauzas | gibi: good news, we refuse to change the RC if the operator modifies it | 16:52 |
bauzas | gibi: https://paste.opendev.org/show/807890/ | 16:52 |
gibi | \o/ | 16:52 |
bauzas | gibi: we accept to do this if there are no allocations, but in case we have some of them, we get this ^ | 16:52 |
gibi | awesome | 16:52 |
gibi | this is what we need | 16:52 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [func test] move unshelve test to the proper place https://review.opendev.org/c/openstack/nova/+/793621 | 16:53 |
gibi | does it casue the compute to refuse to start? | 16:53 |
bauzas | the compute service continues to work, tho | 16:53 |
gibi | ;/ | 16:53 |
bauzas | gibi: no | 16:53 |
bauzas | gibi: but that's fine | 16:53 |
gibi | bauzas: do we only hit it in the periodic but not in the inithost? | 16:54 |
opendevreview | Balazs Gibizer proposed openstack/nova master: WIP support extended res req in heal port allocation https://review.opendev.org/c/openstack/nova/+/802060 | 16:54 |
bauzas | gibi: yup, because of update_provider_tree() | 16:54 |
bauzas | gibi: but this method is also called when restarting the compute | 16:54 |
bauzas | and every 60 secs | 16:54 |
bauzas | and then* | 16:54 |
gibi | OK, so we don't create orphans, we are loud in the log about the issue periodically so the admin will notice it. | 16:56 |
gibi | that is OK to me | 16:56 |
bauzas | also, when trying to delete the instance, we get an error | 16:56 |
bauzas | gibi: https://paste.opendev.org/show/807891/ | 16:57 |
bauzas | so the operator needs to first remodify the options to use again the VGPU RC | 16:57 |
gibi | auch | 16:58 |
gibi | that is ugly but meh | 16:58 |
bauzas | anyway, I'll continue to look at it | 16:58 |
bauzas | we could hardstop the compute by the exception maybe | 16:59 |
gibi | that would be ideal if possible | 16:59 |
gibi | anyhow I will drop of soon. I think you made a good progress with the mdev patches. I can look at them tomorrow again if there are updates | 17:00 |
bauzas | ++ | 17:01 |
sean-k-mooney | hard stoping the compute i guess is due to some configuration erro or something discovered wehn the update_provider_tree is run? | 17:47 |
sean-k-mooney | bauzas: the only way we allow operatros to modify the resouce provider by the way is via provider.yaml | 17:48 |
sean-k-mooney | if they tried to do that via plamcent or similar they are off in undefiend behavior land and get to keep what ever mangical bugs they find there | 17:49 |
*** ricolin_ is now known as ricolin | 18:02 | |
opendevreview | Artom Lifshitz proposed openstack/nova master: WIP: PCI resize tests https://review.opendev.org/c/openstack/nova/+/803506 | 19:01 |
NobodyCam | Good afternoon Nova folks, I am running into a situation where ironic nodes appear to available in both Ironic and Nova but when attempting to provision we are hitting errors with placement records not being cleaned up... My question is there a event we can listen for or api we can check that could be used to verify the node is ready for provisioning placement / inventory wise | 21:06 |
NobodyCam | I should add that we listen for compute.instance.delete.start and compute.instance.delete.end | 21:13 |
opendevreview | Ade Lee proposed openstack/nova master: Add check job for FIPS https://review.opendev.org/c/openstack/nova/+/790519 | 22:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!