*** bauzas2 is now known as bauzas | 00:14 | |
*** bauzas7 is now known as bauzas | 06:20 | |
r-taketn | Hi cores. I would appreciate it if you could review the SEV-ES code (https://review.opendev.org/q/topic:%22bp/amd-sev-es-libvirt-support%22) for the 2025.2 release. The SEV-ES code looks good to me. I apologize if this is already part of your review plan (https://etherpad.opendev.org/p/nova-2025.2-status#L88). | 06:44 |
*** bauzas7 is now known as bauzas | 07:34 | |
opendevreview | Masahito Muroi proposed openstack/nova-specs master: Add spec for virtio-blk multiqueue and iothreads extra_spec https://review.opendev.org/c/openstack/nova-specs/+/951636 | 09:12 |
frickler | mikal: looks like we might need to evict more pkgs from our mirror. or ubuntu needs to fix this by releasing pkgs with a bumped version | 09:16 |
frickler | oh, seems this was discussed in the glance channel already | 09:20 |
mikal | Oh, I'm not in the glance channel so I haven't seen that. | 09:38 |
mikal | Is there anything you need me to do? | 09:38 |
frickler | mikal: iiuc fungi was working on this, I'm not sure if the cleanup was completed already. do you still see new failures currently? | 09:44 |
mikal | I haven't tried for a few hours. Let me try again now. | 09:47 |
mikal | Oh I lie, I kicked off a recheck an hour ago. It's still running. Results will be at https://review.opendev.org/c/openstack/nova/+/926126 when they're done. | 09:48 |
mikal | In other questions... I am unclear on what problem Glean is solving, apart from cloud-init being a bit big and complicated these days. Could someone please explain? | 09:49 |
opendevreview | Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754 | 10:01 |
opendevreview | Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754 | 10:02 |
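The change above swaps eventlet's `Event` for `threading.Event`. One difference worth keeping in mind when reading such a patch: eventlet's `Event.send(value)` delivers a value that `wait()` returns, whereas `threading.Event.wait()` only returns a boolean. A minimal sketch of a hypothetical wrapper that keeps the value-passing behaviour on top of `threading.Event` (not necessarily what the patch does):

```python
import threading


class ResultEvent:
    """Hypothetical one-shot event that carries a value, eventlet-style."""

    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def send(self, value=None):
        # Store the value first, then set the flag, so waiters see the value.
        self._value = value
        self._event.set()

    def wait(self, timeout=None):
        # threading.Event.wait() returns False if the timeout expired.
        if not self._event.wait(timeout):
            raise TimeoutError("event not sent within %s seconds" % timeout)
        return self._value
```

The explicit `TimeoutError` is a design choice of this sketch: it makes a missed signal visible instead of silently returning `None`.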
frickler | mikal: yes, cloud-init pulls in a lot of python dependencies at the distro level, which may conflict with dependencies for OpenStack projects, which is why we try to avoid it for CI purposes | 10:06 |
frickler | also no failures so far, so looks like things might be fixed https://zuul.opendev.org/t/openstack/status?change=926126 | 10:07 |
Uggla | sean-k-mooney, gibi, I know you are busy, but it would be great if you could have a look at 951636: Add spec for virtio-blk multiqueue and iothreads extra_spec | https://review.opendev.org/c/openstack/nova-specs/+/951636. masahito updated it and I reviewed it on my side, so there is still a chance to have this spec approved for this cycle. | 10:16 |
sean-k-mooney | hmm ok... I was going to write a different version of that spec (I'm 2/3 of the way through) since I assumed they were not going to update it at this point, and I had some other ideas on how to do it | 10:21 |
mikal | frickler: that makes sense. So it's only intended for CI workloads? Config drive was meant to be gone by now... | 10:28 |
gibi | Uggla: left a couple of questions there | 10:31 |
opendevreview | sean mooney proposed openstack/nova-specs master: support iothread for nova vms https://review.opendev.org/c/openstack/nova-specs/+/953940 | 11:03 |
sean-k-mooney | that is my version of how to do that ^ | 11:03 |
opendevreview | Balazs Gibizer proposed openstack/nova master: FUP: Translate scatter-gather to futurist https://review.opendev.org/c/openstack/nova/+/953338 | 11:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: FUP: Use futurist for _get_default_green_pool() https://review.opendev.org/c/openstack/nova/+/953339 | 11:24 |
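These follow-ups move nova helpers from eventlet green pools to futurist executors, which largely mirror the `submit()`/future interface of the standard library's `concurrent.futures`. As a hedged illustration of the scatter-gather pattern (run one call per cell concurrently, then collect per-cell results under an overall timeout), here is a stdlib-only sketch; `scatter_gather` and `DID_NOT_RESPOND` are hypothetical names, not nova's implementation.

```python
import concurrent.futures

# Hypothetical sentinel for cells that did not answer within the timeout.
DID_NOT_RESPOND = object()


def scatter_gather(executor, cells, fn, timeout):
    """Run fn(cell) for every cell concurrently and collect results by cell.

    Cells whose call raised get the exception object as their result;
    cells still running after `timeout` seconds get DID_NOT_RESPOND.
    """
    futures = {cell: executor.submit(fn, cell) for cell in cells}
    # One wait over all futures, so the timeout bounds the whole fan-out.
    done, _not_done = concurrent.futures.wait(futures.values(), timeout=timeout)
    results = {}
    for cell, future in futures.items():
        if future not in done:
            future.cancel()  # only succeeds if the call never started
            results[cell] = DID_NOT_RESPOND
        elif future.exception() is not None:
            results[cell] = future.exception()
        else:
            results[cell] = future.result()
    return results
```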
stephenfin | bauzas: We're waiting for you on https://review.opendev.org/c/openstack/nova-specs/+/940440 | 11:44 |
frickler | mikal: in the current state CI is the only remaining use I'm aware of, yes. is config drive deprecated? that's news to me | 11:56 |
stephenfin | gmaan: I've proposed deprecating the v2.0 API for now, and removing in 2 releases. That will cause keystoneauth and Gophercloud to ignore it by default during discovery which is a good first step | 11:57 |
opendevreview | Stephen Finucane proposed openstack/nova-specs master: Deprecate v2.0 API https://review.opendev.org/c/openstack/nova-specs/+/951949 | 12:00 |
opendevreview | sean mooney proposed openstack/nova-specs master: support iothread for nova vms https://review.opendev.org/c/openstack/nova-specs/+/953940 | 12:01 |
sean-k-mooney | frickler: it's not deprecated and we have no plans to remove it | 12:05 |
gibi | stephenfin: didn't we already deprecate v2.0 in the past? | 12:05 |
sean-k-mooney | frickler: we even turn it on by default in our downstream installer | 12:05 |
stephenfin | gibi: It still reports SUPPORTED from the root API doc | 12:05 |
sean-k-mooney | mikal: we have no plans to remove config drive. | 12:06 |
stephenfin | so not from the API perspective, afaict | 12:06 |
frickler | sean-k-mooney: ah, thx for confirming | 12:11 |
sean-k-mooney | we did deprecate https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.config_drive_format a while ago | 12:11 |
sean-k-mooney | but keeping vfat does not really hurt, so we never removed it | 12:12 |
sean-k-mooney | the plan was to only support ISOs once the very, very old libvirt bug was resolved | 12:12 |
sean-k-mooney | it's one of those things where, until it breaks, the effort to remove vfat support is not worth the gain | 12:13 |
gibi | stephenfin: ack, that is a bummer, but then I agree to deprecate | 12:22 |
fungi | mikal: can you expand on what "config drive was meant to be gone by now" means? is there a better metadata source that doesn't depend on flaky/inconsistent networking or hard-coded addresses? see also the latest cloud-init security fix where using configdrive is literally the recommendation for non-x86 guests now | 12:23 |
fungi | the reason we use configdrive with glean is because network-supplied metadata has been pretty terrible in lots of clouds | 12:24 |
gibi | brace for impact... | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add spawn_on https://review.opendev.org/c/openstack/nova/+/948079 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ComputeManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948186 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ConductorManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948187 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make nova.utils.pass_context private https://review.opendev.org/c/openstack/nova/+/948188 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename DEFAULT_GREEN_POOL to DEFAULT_EXECUTOR https://review.opendev.org/c/openstack/nova/+/948086 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make the default executor configurable https://review.opendev.org/c/openstack/nova/+/948087 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Print ThreadPool statistics https://review.opendev.org/c/openstack/nova/+/948340 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Document native threading mode and tuneables https://review.opendev.org/c/openstack/nova/+/949364 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow services to start with threading https://review.opendev.org/c/openstack/nova/+/948311 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Do not yield in threading mode https://review.opendev.org/c/openstack/nova/+/950994 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow to start unit test without eventlet https://review.opendev.org/c/openstack/nova/+/953436 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run unit test with threading mode https://review.opendev.org/c/openstack/nova/+/953475 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [test]RPC using threading or eventlet selectively https://review.opendev.org/c/openstack/nova/+/953815 | 12:24 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Do not yield in threading mode https://review.opendev.org/c/openstack/nova/+/950994 | 12:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-api and -metadata in threaded mode https://review.opendev.org/c/openstack/nova/+/951957 | 12:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Warn on long task wait time for executor https://review.opendev.org/c/openstack/nova/+/952666 | 12:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow to start unit test without eventlet https://review.opendev.org/c/openstack/nova/+/953436 | 12:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run unit test with threading mode https://review.opendev.org/c/openstack/nova/+/953475 | 12:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [test]RPC using threading or eventlet selectively https://review.opendev.org/c/openstack/nova/+/953815 | 12:34 |
gibi | now with sign-offs in place :) | 12:34 |
opendevreview | Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754 | 12:38 |
sean-k-mooney | fungi: there was discussion about removing it a very long time ago; the reason we have kept it is for things like ironic, or where you need to use metadata from the config drive to configure networking before you can actually call the metadata API | 12:47 |
sean-k-mooney | there are also environments where the metadata API is not deployed | 12:48 |
sean-k-mooney | or where it's inaccessible in specific neutron network topologies, like when you are booted directly onto an external network and you are not using OVN or DHCP | 12:48 |
fungi | yeah, and also (at least currently) no means for the host to signal the location of a network metadata source with non-x86 platforms | 12:49 |
sean-k-mooney | how does x86 factor into this? | 12:49 |
fungi | sean-k-mooney: https://bugs.debian.org/1108403 | 12:50 |
sean-k-mooney | is there something arm- or power-specific that would prevent it doing DHCP and using the well-known address? | 12:50 |
sean-k-mooney | ok, so that's not really nova-related; it's a bug in cloud-init | 12:51 |
sean-k-mooney | glean or other first-boot agents are not necessarily impacted, right? | 12:51 |
fungi | i have lost access to the more verbose explanation in the (still private) lp bug about how qemu passes information from host to guest that is x86-specific | 12:52 |
sean-k-mooney | ok, but the point is qemu is not meant to be passing info from the host to the guest in general. config drive is a way to do that | 12:52 |
fungi | nova is able to indicate the address of the metadata server through a host-to-guest communication channel when it's x86 platform, but not on other platforms, so cloud-init had a workaround where it just assumed a specific linklocal address, but this was a security problem if booting on other cloud platforms | 12:53 |
sean-k-mooney | we dont do that however | 12:54 |
fungi | whatever trick qemu provides for that signalling is specific to emulated x86 systems | 12:55 |
sean-k-mooney | so I'm not aware of the mechanism you're referring to | 12:55 |
fungi | so cloud-init queries that when run in x86 guests | 12:55 |
sean-k-mooney | the metadata discovery is meant to go to a hardcoded IP address in cloud-init, or to the default gateway as a fallback | 12:55 |
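A rough sketch of the discovery order described here, assuming the well-known link-local address `169.254.169.254` is tried first and the default gateway only as a fallback. `metadata_candidates` is a hypothetical helper and the ordering is taken from this message, not verified against cloud-init's source; it shows the unconditional probing whose safety the rest of this thread debates.

```python
import ipaddress

# Well-known link-local metadata address used by EC2-style and OpenStack clouds.
WELL_KNOWN_METADATA_IP = "169.254.169.254"
# OpenStack-format metadata document served by nova's metadata API.
OPENSTACK_META_PATH = "/openstack/latest/meta_data.json"


def metadata_candidates(default_gateway=None):
    """Return metadata URLs to try, in order (hypothetical helper)."""
    hosts = [WELL_KNOWN_METADATA_IP]
    if default_gateway:
        gw = ipaddress.ip_address(default_gateway)  # raises ValueError if invalid
        # IPv6 literals need brackets inside a URL.
        host = "[{}]".format(gw) if gw.version == 6 else str(gw)
        if host not in hosts:
            hosts.append(host)
    return ["http://{}{}".format(h, OPENSTACK_META_PATH) for h in hosts]
```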
fungi | really wish they weren't continuing to keep that bug private after referring to it in the commit message of a public security fix | 12:56 |
sean-k-mooney | to be clear, looking at https://github.com/canonical/cloud-init/commit/f43937f0b462734eb9c76700491c18fe4133c8e1 | 12:56 |
sean-k-mooney | I'm not even sure that should be a CVE | 12:56 |
sean-k-mooney | but in any case cloud-init will use the EC2 well-known address in an openstack env | 12:56 |
sean-k-mooney | so that would apply equally to that | 12:57 |
fungi | yeah, my understanding is that they're dropping that functionality with the commit there | 12:57 |
sean-k-mooney | that's a major breaking change | 12:58 |
fungi | because they considered it a security risk if you ran cloud-init in an environment where some other bad actor on a shared network was able to spoof that well-known address | 12:58 |
fungi | yes, it's a breaking change | 12:58 |
sean-k-mooney | note that the DMI interface they are now relying on | 12:58 |
sean-k-mooney | is not part of nova's API contract to the guest runtime | 12:58 |
sean-k-mooney | it's an internal implementation detail of the libvirt driver | 12:58 |
fungi | ah, yes, dmi was how they were saying it worked for x86 guests, so they kept that | 12:59 |
sean-k-mooney | ya, so that is not a thing for ironic, vmware, hyperv, etc. | 12:59 |
fungi | and recommend that any other architectures now rely on configdrive | 12:59 |
sean-k-mooney | so that is not something they can use to detect if it's openstack | 12:59 |
sean-k-mooney | it will work for libvirt with qemu, but it's not generally correct to do | 12:59 |
fungi | right, all the rationale and debate as to available options is still locked up in a private lp bug | 13:00 |
sean-k-mooney | so to me the behavior it had, reaching out to the link-local address, was the correct behavior and not a security issue | 13:02 |
sean-k-mooney | you can disable datasources via the cloud-init config in your image if you want to | 13:02 |
sean-k-mooney | I'm not ok with https://review.opendev.org/c/openstack/nova/+/953732 by the way | 13:05 |
sean-k-mooney | fungi: I guess that is why that was being raised earlier in the week | 13:05 |
sean-k-mooney | fungi: by the way, the DMI info, when it is provided to VMs, is also configurable by the downstream distribution https://github.com/openstack/nova/blob/2c19c07d5e29ca5445fa6dd45a6117542951714c/nova/version.py#L49-L63 | 13:41 |
sean-k-mooney | I don't think we have used this historically downstream | 13:44 |
sean-k-mooney | but it's a thing that can be configured | 13:44 |
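A standalone sketch of that kind of override, assuming (from the linked `nova/version.py`) an INI-style `release` file with a `[Nova]` section whose `vendor` and `product` keys replace the upstream defaults `OpenStack Foundation` and `OpenStack Nova`. `load_release_info` is a hypothetical mimic, not nova's code, and how nova locates the file is left out.

```python
import configparser

# Upstream default strings nova reports when no override is present.
DEFAULTS = {"vendor": "OpenStack Foundation", "product": "OpenStack Nova"}


def load_release_info(path):
    """Return vendor/product strings, overridden by a [Nova] section if present."""
    info = dict(DEFAULTS)
    parser = configparser.RawConfigParser()
    # read() silently skips missing files, so the defaults survive.
    parser.read(path)
    for key in info:
        if parser.has_option("Nova", key):
            info[key] = parser.get("Nova", key)
    return info
```

If a distribution did change the product string this way, anything in the guest that keys off the default DMI product name would presumably stop matching, which is one reason DMI makes a fragile platform signal.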
gibi | I'm sorry, there was a botched rebase at the bottom of the series... | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Use futurist for _get_default_green_pool() https://review.opendev.org/c/openstack/nova/+/948072 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Replace utils.spawn_n with spawn https://review.opendev.org/c/openstack/nova/+/948076 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add spawn_on https://review.opendev.org/c/openstack/nova/+/948079 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ComputeManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948186 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Move ConductorManager to use spawn_on https://review.opendev.org/c/openstack/nova/+/948187 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make nova.utils.pass_context private https://review.opendev.org/c/openstack/nova/+/948188 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Rename DEFAULT_GREEN_POOL to DEFAULT_EXECUTOR https://review.opendev.org/c/openstack/nova/+/948086 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make the default executor configurable https://review.opendev.org/c/openstack/nova/+/948087 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Print ThreadPool statistics https://review.opendev.org/c/openstack/nova/+/948340 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Document native threading mode and tuneables https://review.opendev.org/c/openstack/nova/+/949364 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow services to start with threading https://review.opendev.org/c/openstack/nova/+/948311 | 13:47 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-next with n-sch in threading mode https://review.opendev.org/c/openstack/nova/+/948450 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Do not yield in threading mode https://review.opendev.org/c/openstack/nova/+/950994 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run nova-api and -metadata in threaded mode https://review.opendev.org/c/openstack/nova/+/951957 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Warn on long task wait time for executor https://review.opendev.org/c/openstack/nova/+/952666 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow to start unit test without eventlet https://review.opendev.org/c/openstack/nova/+/953436 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Run unit test with threading mode https://review.opendev.org/c/openstack/nova/+/953475 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: [test]RPC using threading or eventlet selectively https://review.opendev.org/c/openstack/nova/+/953815 | 13:48 |
opendevreview | Balazs Gibizer proposed openstack/nova master: FUP: Translate scatter-gather to futurist https://review.opendev.org/c/openstack/nova/+/953338 | 13:49 |
opendevreview | Kamil Sambor proposed openstack/nova master: Replace eventlet.event.Event with threading.Event https://review.opendev.org/c/openstack/nova/+/949754 | 14:31 |
gibi | eventlet sync https://meet.google.com/bcy-uqoz-hje?authuser=0 | 14:32 |
fungi | sean-k-mooney: thanks, yeah revisiting my recollections of the original bug, it's that cloud-init is guessing whether it's booting in an openstack environment based on the dmi data, but that only works for qemu virtual machines on x86, so on e.g. arm cloud-init can't decide whether it's safe to assume the metadata source is legitimate or potentially spoofed by some attacker | 14:53 |
fungi | because it doesn't know whether it's in an openstack cloud | 14:53 |
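A rough approximation of the DMI check being described, for illustration only: read `/sys/class/dmi/id/product_name` and compare it to known product strings. The product list and the `looks_like_openstack` helper are assumptions, not cloud-init's actual implementation; the point is that the check simply returns False when the guest has no DMI data, which per this discussion is the situation for non-x86 guests today.

```python
from pathlib import Path

# Assumed product strings; "OpenStack Nova" is nova's default, the other is a guess.
KNOWN_PRODUCTS = {"OpenStack Nova", "OpenStack Compute"}


def looks_like_openstack(dmi_dir="/sys/class/dmi/id"):
    """Guess whether we run on OpenStack from DMI data (hypothetical helper)."""
    try:
        product = (Path(dmi_dir) / "product_name").read_text().strip()
    except OSError:
        # No DMI data exposed to the guest: cannot tell, so answer False.
        return False
    return product in KNOWN_PRODUCTS
```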
sean-k-mooney | right, but you can control that via the cloud-init config in the guest when you build the image | 14:56 |
sean-k-mooney | and the expected behavior is to enumerate all data sources that are enabled in the image | 14:56 |
sean-k-mooney | so to me removing the enumeration fundamentally breaks how this was intended to work | 14:57 |
sean-k-mooney | no, I think they are just changing their defaults in their config | 14:57 |
sean-k-mooney | rather than entirely removing it | 14:57 |
sean-k-mooney | i.e. if you explicitly customise an image, enable the openstack data source, and upload it to glance | 14:58 |
sean-k-mooney | you can still use it with vmware or lxc or ironic | 14:58 |
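A sketch of the image-build-time control mentioned here, assuming a cloud-init drop-in under `/etc/cloud/cloud.cfg.d/` that sets `datasource_list`. The helper and file names are hypothetical, and whether this alone re-enables metadata probing on non-x86 after the cloud-init change is not confirmed in this thread.

```python
from pathlib import Path


def write_datasource_override(cfg_dir, datasources=("ConfigDrive", "OpenStack")):
    """Write a cloud-init drop-in pinning which datasources are tried.

    Hypothetical image-build helper; the file name is an arbitrary choice.
    """
    cfg_path = Path(cfg_dir)
    cfg_path.mkdir(parents=True, exist_ok=True)
    target = cfg_path / "99-openstack-datasources.cfg"
    # YAML flow sequence; "None" is cloud-init's fallback datasource.
    names = list(datasources) + ["None"]
    target.write_text("datasource_list: [ {} ]\n".format(", ".join(names)))
    return target
```

In a real image build `cfg_dir` would be `/etc/cloud/cloud.cfg.d` inside the image root.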
sean-k-mooney | I would not be surprised if some distros decide to revert back when building their cloud images | 15:01 |
sean-k-mooney | fungi: glean just uses config drive, right? ye never implemented the metadata REST API? | 15:01 |
fungi | i think so, we prefer dhcp/slaac+ra but if the cloud needs to set network configuration through metadata then we rely on glean to pull that from the configdrive because our experience with network metadata sources has been bad | 15:07 |
gmaan | stephenfin: RE v2.0, thanks for the updates. +W on the deprecation approach. | 15:21 |
stephenfin | ta | 15:22 |
gmaan | gibi: yeah, that was my main concern with removing v2.0 directly, as it was never marked as deprecated or unsupported. at least a deprecation phase gives users (if any) a chance to come forward if they need it for some more time, and to tell us why they are not using v2.1 | 15:23 |
opendevreview | ribaudr proposed openstack/nova-specs master: Enable memfd support for shared memory backing https://review.opendev.org/c/openstack/nova-specs/+/951689 | 15:31 |
gibi | gmaan: cool, I agree that we need a deprecation first | 15:32 |
Uggla | sean-k-mooney, if you can have a look at 951689: Enable memfd support for shared memory backing | https://review.opendev.org/c/openstack/nova-specs/+/951689, let me know if I managed to capture all your inputs. | 15:32 |
Uggla | sean-k-mooney, no urgency on that one. | 15:33 |
gibi | fyi, Sree while verifying PCI in Placement caught a nice, long standing bug in our pci tracker https://bugs.launchpad.net/nova/+bug/2115729 | 15:33 |
opendevreview | Merged openstack/nova-specs master: Deprecate v2.0 API https://review.opendev.org/c/openstack/nova-specs/+/951949 | 15:34 |
Uggla | gibi, I saw this bug yesterday. kudos to Sree. I guess I can triage it as a valid one? | 15:48 |
gibi | I did the triage now that I found the root cause | 15:48 |
gibi | I will push a fix in 30 mins... | 15:49 |
Uggla | gibi, faster than light ! | 15:50 |
gibi | this is a typical bug where finding the root cause is the hard part; fixing it is the easy part | 15:54 |
ratailor | gmaan, could you please provide your feedback on my replies in https://review.opendev.org/c/openstack/nova-specs/+/929780/ ? | 16:05 |
opendevreview | Rajesh Tailor proposed openstack/nova-specs master: Show finish_time field in instance action show https://review.opendev.org/c/openstack/nova-specs/+/929780 | 16:08 |
ratailor | sean-k-mooney, gibi, could you please also review ^^ as tomorrow is the last day for FF. | 16:09 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce that only half of the PCI devs are removed https://review.opendev.org/c/openstack/nova/+/953971 | 16:12 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix pci_tracker.save to delete all removed devs https://review.opendev.org/c/openstack/nova/+/953972 | 16:12 |
gmaan | ratailor: sure, will check today | 16:12 |
ratailor | gmaan, Thanks! | 16:13 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Use updated_at to purge shadow_instance_extra https://review.opendev.org/c/openstack/nova/+/953984 | 18:29 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Use updated_at to purge shadow_instance_extra https://review.opendev.org/c/openstack/nova/+/953984 | 18:33 |
sean-k-mooney | gibi: bauzas Uggla did ye look at my version of the iothreads spec, by the way? https://review.opendev.org/c/openstack/nova-specs/+/953940/2/specs/2025.2/approved/dynamic-vm-iothreads.rst | 18:41 |
sean-k-mooney | we are unlikely to merge either by tomorrow, but I feel like my version is closer | 18:42 |
opendevreview | Merged openstack/nova-specs master: Add flavor-search-by-name spec https://review.opendev.org/c/openstack/nova-specs/+/940440 | 18:50 |
Uggla | sean-k-mooney, yep, I have quickly looked at it. Tbh, from a technical point of view I think it looks great. I'm just concerned about how masahito feels about it. I'd like some kind of co-authoring, and to make sure his needs are fulfilled by the spec too. I will discuss it with him tomorrow, because I'd like to explain that the goal is not to overwrite his spec but to combine the two. | 19:28 |
mikal | fungi: my general recollection of this decade-old conversation aligns with what sean-k-mooney said. Config drive v2 was always meant as a stopgap until everyone was using the metadata service. It really only existed at all because it was copied from AWS. I'm confused by your x86 address detection comments, because the metadata service just uses a hard-coded zeroconf IP address on any of the clouds I can think of. | 19:31 |
mikal | fungi: certainly a lot of the warts in config drive (like it originally not being able to be updated, although I think maybe that's fixed now) were because it was seen as a temporary thing. | 19:32 |
fungi | mikal: i was mis-remembering the bug, looking at the cloud-init commit it's "nova detection" really. they want to only contact the metadata server address for discovery if run in an openstack (or ec2) cloud, so are checking dmi to determine if it's safe to do so, but that check is only viable on x86 | 19:35 |
fungi | so when it comes to booting generic non-x86 distro cloud images with cloud-init included, the recommendation is to rely on configdrive instead | 19:36 |
mikal | Honestly most of that DMI stuff is a bit dumb. That data exists in metadata, why do we ram it into two places one of which doesn't work on other architectures? | 19:37 |
fungi | yeah, and so if nova didn't do that even on x86, then cloud-init would really just recommend distros go back to making openstack-specific images so they could set the override in the kernel command-line or an embedded conffile | 19:38 |
fungi | (or suggest always using configdrive maybe) | 19:38 |
mikal | I'm not here to hate on config drive. My point is more that it's outlived its expected life, had some weird implementation choices because of the time it was done in, and probably needs some love if it's really here forever. | 19:38 |
opendevreview | Brett Holman proposed openstack/nova master: Add DMI data to non-x86 architectures https://review.opendev.org/c/openstack/nova/+/953989 | 19:39 |
mikal | Like for example it execs a command line to build the ISO image, solely because the only library we could find at the time to do it nicely was GPL and we refused to use GPL dependencies. | 19:39 |
fungi | agreed, it has challenges, for example the earlier change at the start of this discussion about using virtio for cdrom devices by default, because debian doesn't include ahci drivers in their genericcloud kernel configs | 19:39 |
mikal | FAT config drives should work too? IIRC it doesn't _have_ to be a CDROM, it just defaults to that? | 19:40 |
mikal | It's been a long time. | 19:40 |
fungi | it probably could be any kind of block device, i think it was virtual cd by convention so not sure if that assumption is baked into places (e.g. in cloud-init) | 19:40 |
fungi | granted, forcing a different device type is still going to be a custom image property on upload or some similar override, if it's not nova's default | 19:42 |
mikal | So cloud-init will only probe for a couple of things to match: the filesystem label matters (config-2), and so does the type. The probing code only looks for ISO 9660 (CD-ROM style) or FAT formatted disks IIRC. | 19:51 |
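A minimal sketch of that matching, assuming the config drive carries nova's default volume label `config-2` and is formatted as either ISO 9660 or VFAT (the two `config_drive_format` choices mentioned earlier). The check looks only at label and filesystem type, not at whether the device is a CD-ROM or a disk. `find_config_drive` is hypothetical and takes pre-parsed `(path, fstype, label)` tuples rather than running `blkid` itself.

```python
# Assumed match criteria for this sketch.
CONFIG_DRIVE_LABELS = {"config-2", "CONFIG-2"}
CONFIG_DRIVE_FSTYPES = {"iso9660", "vfat"}


def find_config_drive(devices):
    """Return device paths that look like a config drive.

    `devices` is a list of (path, fstype, label) tuples, e.g. parsed from
    blkid output; collecting that list is out of scope here.
    """
    return [
        path
        for path, fstype, label in devices
        if label in CONFIG_DRIVE_LABELS and fstype in CONFIG_DRIVE_FSTYPES
    ]
```

For example, given `[("/dev/sr0", "iso9660", "config-2"), ("/dev/vda1", "ext4", "cloudimg-rootfs"), ("/dev/vdb", "vfat", "config-2")]` it returns `["/dev/sr0", "/dev/vdb"]`: the CD-ROM and the plain disk both qualify.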
mikal | Honestly, most of the people I know who really use metadata use it as a poor man's replacement for Nova not having some sort of mature out of band agent thing. They really want AWS SSM agent or equivalent -- https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html. | 19:53 |
mikal | We could do the same with either a custom agent or the qemu agent (although the qemu agent has some pretty big limitations), that bit isn't too hard. The hard bit is marshalling the requests into the instance. That likely involves some sort of per-instance queue that Nova doesn't have right now. | 19:54 |
fungi | yeah, an actual communication channel from host to guest at least, if not bi-directional | 19:58 |
mikal | Yeah, specifically one which does not require networking to be configured and working. | 19:59 |
mikal | I worked for a training startup for a bit... We did all sorts of contortions to rescue VMs when students messed up their networking. This would have made it so much easier. | 19:59 |
fungi | a fifo-like proc interface in the kernel or something would be amazing | 19:59 |
fungi | and/or pseudofiles like device drivers tend to expose | 20:00 |
mikal | So qemu-agent uses a unix domain socket on the outside and virtio-serial on the inside and speaks JSON blobs, which is a bit terrible. Serial means only one talker at a time. | 20:00 |
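For comparison, a host-side sketch of the qemu guest agent style of exchange: one JSON command such as `{"execute": "guest-ping"}` written to the channel, one JSON reply (`{"return": ...}` or `{"error": ...}`) read back. It assumes replies are newline-terminated and that nobody else is talking on the channel, which is exactly the one-talker limitation. `qga_execute` is hypothetical; real tooling normally goes through libvirt (for example `virsh qemu-agent-command`) and issues `guest-sync` first to discard stale replies.

```python
import json
import socket


def qga_execute(sock: socket.socket, command, arguments=None):
    """Send one guest-agent style command and return its "return" payload."""
    request = {"execute": command}
    if arguments is not None:
        request["arguments"] = arguments
    sock.sendall(json.dumps(request).encode() + b"\n")
    # Read until a newline; assumes a single in-flight request on the channel.
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("agent closed the channel")
        buf += chunk
    reply = json.loads(buf)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["return"]
```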
fungi | though at boot time, one talker in the guest is generally sufficient? | 20:01 |
mikal | ShakenFist does this with a unix domain socket on the outside and virtio-vsock on the inside, and speaks protobufs. That's actually really cool, if I do say so myself, because the in-guest agent can just run a socket server like you're used to and serve N requests at the same time, because virtio-vsock does all the multiplexing for you. | 20:01 |
fungi | or maybe not these days, with the advent of parallel boot processing | 20:01 |
mikal | Serialized requests force you to come up with some sort of queueing / locking mechanism to handle the case of multiple requests, even if it's not used much. | 20:02 |
fungi | the protobuf design sounds nice, if only protobuf libraries weren't such shifting sand these days (but maybe they're getting better) | 20:03 |
mikal | https://www.madebymikal.com/virtio-vsock-python-examples-of-running-the-server-in-the-guest/ is what I wrote up when I was playing with virtio-vsock. | 20:03 |
fungi | yeah, i guess low-level socket access like that is probably enough for these cases, you don't need google's own protobuf libs | 20:06 |
fungi | oh, i see what you're saying, shakenfist tunnels protobufs over an interface like that | 20:09 |
fungi | you'd still need separate libs for the serialization i guess | 20:09 |
mikal | Yes, virtio-vsock is the transport, and I talk protobuf down it. Qemu talks JSON in its similar but slightly more terrible case. | 20:12 |
mikal | Notably I do not talk gRPC, just the serialization protocol. | 20:12 |
mikal | But meh, it's just serialization. It could be JSON for all it matters. | 20:12 |
mikal | The in-guest agent could present a REST API for example over virtio-vsock and that would not be insane. | 20:12 |
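A sketch of the multiplexed alternative, under stated assumptions: the guest agent listens on virtio-vsock (Python exposes `socket.AF_VSOCK` on Linux), hands each accepted connection to its own thread, and speaks newline-delimited JSON instead of protobuf, since the serialization is incidental. The request format and function names are hypothetical; this does not reproduce ShakenFist's agent.

```python
import json
import socket
import threading


def make_vsock_listener(port):
    """Guest-side listener on virtio-vsock (Linux guests only)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    # VMADDR_CID_ANY: accept connections addressed to any CID of this guest.
    sock.bind((socket.VMADDR_CID_ANY, port))
    sock.listen()
    return sock


def handle(conn):
    """Answer newline-delimited JSON requests on a single connection."""
    with conn, conn.makefile("rb") as lines:
        for line in lines:
            request = json.loads(line)
            # Hypothetical protocol: echo the requested "op" back.
            reply = {"ok": True, "op": request.get("op")}
            conn.sendall(json.dumps(reply).encode() + b"\n")


def serve(listener):
    """Accept forever; one thread per connection, so clients run concurrently."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

To try it without a VM, pass `serve()` a loopback TCP listener instead of `make_vsock_listener(...)`; `handle()` only needs a stream socket, which is the point: the transport does the multiplexing and the agent code stays ordinary socket-server code.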
fungi | makes sense, and yes grpc is a much bigger mess | 20:15 |
mikal | On reflection, I am also a bit confused about the "tangled python dependencies" answer to why glean is needed compared to cloud-init. Isn't that just a case of using venvs for the various competing components? Especially when modern pythons are super opposed to you installing directly into the system pip? | 20:17 |
fungi | devstack is finally in a position to be able to isolate openstack services from the system python module search path, but that's an extremely recent change | 20:29 |
fungi | i recall clarkb hacked on versions of that improvement for years before it reached a state others in the community were comfortable with | 20:35 |
clarkb | from the CI/OpenDev perspective I think there are two primary reasons we like config drive. The first is that it is incredibly reliable: the boot error rate with the metadata service is >0 and in some cases quite high. The other is that it is simple. Cloud-init configures a lot of stuff, sometimes in destructive ways (it once decided to do bad things to volume mounts). Glean sets up your network and ssh keys and that is about it | 20:36 |
clarkb | today we can only boot ubuntu noble in rackspace classic with config drive. Metadata does not work. This is with the upstream ubuntu noble cloud image which uses cloud init (not glean) | 20:37 |
clarkb | (that is because rackspace doesn't do metadata in the way normal clouds do iirc and they don't have their custom image for noble yet (or ever) so the only way it works with standard tools is config drive) | 20:37 |
clarkb | naively I'm not sure what the issue is. From the end user standpoint config drive is simple and bullet proof | 20:39 |
clarkb | the ability to mount iso9660 or fat32 devices is near universal, and the data is small and static. Using a complicated API service that requires hosts to know how to magically negotiate a protocol more complicated than `mount` is always going to be more error prone. | 20:40 |
clarkb | iirc there are layers of proxies involved on the backend too (which I want to say was the assumed source of the flakiness when we shifted all our CI nodes to config drive) | 20:40 |
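To illustrate how little a consumer needs: once the drive is mounted (for example read-only from `/dev/disk/by-label/config-2`), reading it is plain file and JSON access. A sketch assuming the standard `openstack/latest/meta_data.json` layout; `read_config_drive` and the subset of keys it returns are illustrative choices.

```python
import json
from pathlib import Path


def read_config_drive(mount_point):
    """Read selected OpenStack metadata from an already-mounted config drive."""
    meta_path = Path(mount_point) / "openstack" / "latest" / "meta_data.json"
    meta = json.loads(meta_path.read_text())
    return {
        "uuid": meta.get("uuid"),
        "hostname": meta.get("hostname"),
        # public_keys maps key name -> public key text.
        "ssh_keys": sorted(meta.get("public_keys", {}).values()),
    }
```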
mikal | I am not sure there is an issue, I am just surprised given config drive was meant to be a temporary fix which seems to have "rusted on". If it's genuinely around forever it could probably do with some love. | 21:00 |
clarkb | I think the primary downside to config drive is that it consumes one of your disk slots. I know this is a problem for Xen. Not sure about KVM | 21:07 |
clarkb | with Xen you only get something like 16 disks. One of which is implicitly owned by the root disk then config drive consumes a second. This limits you to 14 devices which may limit the total storage attachable to the instance if you also have volume size limits | 21:08 |
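The arithmetic, as a tiny sketch using the "something like 16" slot figure from this message; the per-volume size limit is a hypothetical input and real limits vary by cloud.

```python
def attachable_volume_capacity_gb(total_slots, volume_size_limit_gb, config_drive=True):
    """Return (free slots, max attachable GB) given one slot per disk device.

    The root disk always takes one slot; the config drive takes another if enabled.
    """
    reserved = 1 + (1 if config_drive else 0)
    free_slots = max(total_slots - reserved, 0)
    return free_slots, free_slots * volume_size_limit_gb
```

With 16 slots and a 1000 GB per-volume cap this gives 14 slots and 14000 GB with a config drive, versus 15 and 15000 without.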
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!