opendevreview | Merged openstack/placement master: improve test logging and replace psycopg2 with psycopg2-binary https://review.opendev.org/c/openstack/placement/+/945487 | 01:04 |
---|---|---|
*** ralonsoh_out is now known as ralonsoh | 05:25 | |
opendevreview | Arnaud Morin proposed openstack/nova master: Fix missing marker with instances in build_requests https://review.opendev.org/c/openstack/nova/+/947804 | 06:33 |
opendevreview | Stefan K proposed openstack/nova-specs master: Add Cloud Hypervisor support spec https://review.opendev.org/c/openstack/nova-specs/+/945549 | 06:34 |
sean-k-mooney | Uggla: there are 2 potential specless blueprints for us to consider. https://blueprints.launchpad.net/nova/+spec/xml-image-meta which has an implementation available https://review.opendev.org/q/topic:%22bp/xml-image-meta%22 is the first | 09:24 |
sean-k-mooney | Uggla: the author reached out during feature freeze and i chatted to them about it and gave some early feedback but they never procedurally asked for the blueprint to be specless and for it to be reviewed | 09:25 |
sean-k-mooney | i suggested that they add it to the next meeting or ask for the approval via the mailing list if they cant attend | 09:26 |
sean-k-mooney | the other is more of a feature request but i think it could be specless https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/NSW3OG5ME5RDPQYLHF4T4RCWPQYG57PK/ | 09:27 |
sean-k-mooney | TLDR libvirt can now auto free ram allocated to a guest when the guest releases it internally. | 09:27 |
sean-k-mooney | you would not really want to do that for realtime vms | 09:27 |
sean-k-mooney | and you cant do it for hugepage/filebacked vms | 09:28 |
sean-k-mooney | but for standard vms we probably should just enable that always | 09:28 |
bauzas | sean-k-mooney: I need to go to the physio in 30 mins but if you have time, it would be nice if you could help me on https://review.opendev.org/c/openstack/nova/+/922140 | 09:28 |
sean-k-mooney | this is more just an fyi that those would both be low hanging fruit features. | 09:29 |
sean-k-mooney | bauzas: from the logs the last time it was pretty clear that the issue was not with the subnode using the wrong commit | 09:29 |
sean-k-mooney | i saw you said that in the team meeting yesterday but that was not the problem previously, we could see the commit that was being used in the job output on the compute and it was correct | 09:30 |
sean-k-mooney | with that said sure i can take a look | 09:30 |
Uggla | sean-k-mooney, Ok I will add them to the doc and we will be able to discuss them in next meeting if it is required. | 09:30 |
sean-k-mooney | bauzas: i assume you have tried this locally? | 09:31 |
sean-k-mooney | bauzas: https://zuul.opendev.org/t/openstack/build/d31c2c01045a409a96d9dfd5e8aabbaa/log/job-output.txt#25980 so its definitely using the expected commit on the compute node | 09:37 |
sean-k-mooney | in the compute node devstack logs we can also see that compilation is enabled 2025-05-20 12:55:42.033 | ++ /opt/stack/nova/devstack/settings:source:2 : NOVA_COMPILE_MDEV_SAMPLES=True | 09:40 |
sean-k-mooney | mdevctl is also installed | 09:40 |
sean-k-mooney | looking at the compile itself there are some warnings but apparently they were compiled and loaded properly | 09:43 |
sean-k-mooney | https://paste.opendev.org/show/bcAgW92YFhDTcFs5QLDD/ | 09:43 |
sean-k-mooney | the problem to me looks more like libvirt has not detected the mdevs yet and we only see that on the compute because there is less of a delay between the devstack stages | 09:45 |
sean-k-mooney | if i had to guess moving the compilation and install earlier in devstack or restarting libvirt might fix this. | 09:46 |
bauzas | sean-k-mooney: sorry I was not notified by my TheLounge instance | 09:50 |
bauzas | my main wonder is why compute1 says mtty_mtty is a wrong device | 09:51 |
bauzas | anyway, I need to go to the physio now :( | 09:51 |
sean-k-mooney | bauzas: ill push a patch while you're away | 09:51 |
sean-k-mooney | we can see if moving the module install before installing libvirt fixes it | 09:51 |
bauzas | I don't think this is the problem but let's see | 09:52 |
sean-k-mooney | i think its due to caching of the device and a race between nova-compute starting and libvirt seeing the device | 09:52 |
sean-k-mooney | bauzas: well i confirmed its using the correct commit, the compile succeeded and the kernel modules were loaded | 09:52 |
sean-k-mooney | so that means the issue is either on the libvirt side or in nova | 09:53 |
opendevreview | sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 10:22 |
sean-k-mooney | if my intuition is correct then ^ will fix it but we may need to keep the compile in install depending on if we have dependency issues | 10:23 |
*** sfinucan is now known as stephenfin | 11:19 | |
opendevreview | sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 12:27 |
gibi | sean-k-mooney: dansmith: bauzas: FYI the oslo.service threading backend patch has been merged https://review.opendev.org/c/openstack/oslo.service/+/945720 | 14:24 |
dansmith | cool | 14:26 |
dansmith | gibi: pci-in-placement (I think) question | 14:30 |
dansmith | ...and I'm getting close to being able to do this again, so if you don't know what I'm talking about, I'll repro to show you exactly, but: | 14:30 |
dansmith | since I don't have a lot of identical pci devices to play with, I tried configuring two different ones with the same name for my flavor (they're equivalent, just not identical) | 14:31 |
dansmith | when I tried to boot with that flavor, I got an error about "only one pci request allowed".. I think it was pci-in-placement specific | 14:31 |
dansmith | I always thought one of the benefits of symbolic naming for the pci devices was so you could use something like "nvme256G" and even if you had multiple generations of hardware, you'd get a suitable device | 14:32 |
dansmith | is that a pci-in-placement restriction or am I mistaken about that normally working? | 14:32 |
gibi | is it two aliases with the same name but different request? | 14:33 |
dansmith | right | 14:33 |
gibi | https://bugs.launchpad.net/nova/+bug/2102038 | 14:34 |
gibi | it definitely does not work with pci in placement. Without it it is accepted but based on Sean this never really fully worked there either | 14:35 |
bauzas | gibi: dansmith: in a current meeting but looking | 14:36 |
dansmith | hmm, okay | 14:36 |
dansmith | gibi: I thought you could also specify devices in alias by address, but maybe that's not right | 14:36 |
gibi | https://review.opendev.org/c/openstack/nova/+/944062 see the comments from sean-k-mooney | 14:36 |
gibi | dansmith: I don't have the full picture of this without PCI in Placement | 14:36 |
gibi | with PCI in Placement it is not trivial to support | 14:37 |
bauzas | are we talking of PCI aliases ? | 14:37 |
dansmith | okay I don't know if I agree with sean's comments about it never being intended to be supported | 14:37 |
sean-k-mooney | dansmith: no the alias does not have an address | 14:37 |
gibi | as we cannot say OR between resource classes in an allocation candidate query | 14:37 |
sean-k-mooney | dansmith: the devspec does but the alias does not | 14:37 |
dansmith | I obviously wasn't closely involved with the early pci stuff, but I remember a conversation at a design summit about addresses specifically so you didn't have to commit all of one vendor/product | 14:37 |
sean-k-mooney | dansmith: that was rejected because the alias is meant to be the same on all hosts | 14:38 |
sean-k-mooney | and the address would vary | 14:38 |
sean-k-mooney | you dont need to commit all of one vendor id and product id | 14:39 |
sean-k-mooney | you do the filtering in the dev spec | 14:39 |
sean-k-mooney | the alias is an abstraction like the resource class in placement | 14:39 |
bauzas | I think you can even star the devices | 14:40 |
bauzas | for the alias | 14:40 |
dansmith | okay digging back through old release config docs, I guess I'm thinking of the whitelist | 14:40 |
sean-k-mooney | ya the whitelists is now the devspec | 14:41 |
dansmith | it seems really unfortunate to not be able to construct a flavor that gets one of a set of multiple identical-enough devices | 14:41 |
bauzas | dansmith: it should | 14:41 |
sean-k-mooney | dansmith: you can | 14:41 |
bauzas | I don't get what we can't do | 14:41 |
bauzas | there are two different things | 14:41 |
bauzas | sec | 14:41 |
dansmith | oh, multiple aliases in the flavor with commas? | 14:42 |
sean-k-mooney | yes | 14:42 |
dansmith | and that will do one-of? | 14:42 |
dansmith | okay, gotcha | 14:42 |
sean-k-mooney | no | 14:42 |
bauzas | https://docs.openstack.org/nova/latest/configuration/config.html#pci.alias | 14:42 |
sean-k-mooney | in the flavor if you use commas that is asking for multiple devices | 14:42 |
sean-k-mooney | but not one of | 14:42 |
bauzas | in the alias, you can request PCI devices by vendor/product IDs | 14:42 |
dansmith | okay, so how do I do one of? | 14:42 |
sean-k-mooney | you cant | 14:43 |
sean-k-mooney | not with out breaking other things | 14:43 |
sean-k-mooney | well | 14:43 |
sean-k-mooney | ok so you can do one of via resource classes and pci in placement | 14:43 |
bauzas | again, I don't see the problem | 14:43 |
dansmith | okay, so it _is_ a limitation that we can't do what I asserted above: a flavor that selects one of a set of identical-enough hardware | 14:43 |
sean-k-mooney | dansmith: identical enough for live migration means the device must be exactly the same | 14:44 |
bauzas | but we can say 'nvme256:1' in the flavor, right? | 14:44 |
sean-k-mooney | potentially down to the firmware level | 14:44 |
dansmith | sean-k-mooney: for things that care about live migration, but (a) I have two devices that are the same vendor/product but different firmwares *and* one supports crypto and the other does not, | 14:44 |
dansmith | so I'm sure qemu has to do more is-this-the-same-for-real checking if we expect that to work robustly | 14:45 |
bauzas | provided that alias would match different hardware based on the same alias combination (either the vendor/product id or a star, IIRC) | 14:45 |
sean-k-mooney | dansmith: right so you can request traits in the alias | 14:45 |
bauzas | but that requires PCI in placement | 14:45 |
dansmith | bauzas: I only care about pci in placement :) | 14:45 |
dansmith | sean-k-mooney: I don't see that in the docs, can you link me? | 14:45 |
bauzas | ah, then provide device_spec with traits | 14:46 |
bauzas | and then request that trait thru the alias | 14:46 |
bauzas | https://docs.openstack.org/nova/latest/configuration/config.html#pci.device_spec | 14:46 |
bauzas | https://docs.openstack.org/nova/latest/configuration/config.html#pci.alias | 14:46 |
dansmith | ... I don't see in the docs how to request a trait with the alias | 14:46 |
sean-k-mooney | dansmith: https://github.com/openstack/nova/blob/master/nova/pci/request.py#L110-L115 | 14:46 |
dansmith | them's not docs :) | 14:46 |
sean-k-mooney | dansmith: it may or may not be but it was added with the pci in placement feature | 14:47 |
bauzas | dansmith: there is a traits dict key that you can add | 14:47 |
sean-k-mooney | so it should be in the spec | 14:47 |
bauzas | traits | 14:47 |
bauzas | An optional comma separated list of Placement trait names requested to be present on the resource provider that fulfills this alias. Each trait can be a standard trait from os-traits lib or it can be an arbitrary string. If it is a non-standard trait then Nova will normalize the trait name by making it upper case, replacing any consecutive | 14:47 |
bauzas | character outside of [A-Z0-9_] with a single ‘_’, and prefixing the name with CUSTOM_ if not yet prefixed. The maximum allowed length of a trait name is 255 character including the prefix. Every trait in traits requested in the alias ensured to be in the list of traits provided in the traits field of the [pci]device_spec when scheduling the | 14:47 |
bauzas | request. This field can only be used only if [filter_scheduler]pci_in_placement is enabled. | 14:47 |
dansmith | https://docs.openstack.org/nova/latest/configuration/extra-specs.html#pci_passthrough:alias | 14:47 |
sean-k-mooney | that is the wrong doc | 14:47 |
bauzas | you don't specify the trait explicitly | 14:47 |
bauzas | I mean in the flavor | 14:47 |
dansmith | ohhh, you mean an alias in _config_ can be a trait instead of a vendor/product? | 14:47 |
bauzas | the flavor only requests an amount of that alias | 14:48 |
bauzas | and that alias is defined by having a traits key in it | 14:48 |
bauzas | dansmith: yes | 14:48 |
bauzas | and device_spec allows you to tag with traits some PCI devices automatically | 14:48 |
sean-k-mooney | dansmith: correct you can use placement resource classes + traits instead of vendor and product id | 14:48 |
bauzas | just add traits to the device_spec dict in question | 14:49 |
dansmith | I see, I thought we were talking about the alias in the request | 14:49 |
sean-k-mooney | in the flavor no | 14:49 |
bauzas | that's transparent to the user or to the API admin | 14:49 |
dansmith | so the doc for alias needs a trait example I guess | 14:49 |
sean-k-mooney | this is all intentionally hidden from the flavor | 14:49 |
bauzas | dansmith: we have the conf docs | 14:49 |
sean-k-mooney | dansmith: ya https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#pci-tracking-in-placement is where i was expecting it | 14:49 |
dansmith | sean-k-mooney: I understand, and that's how I expected it to work, I just didn't see the path to that alias requesting one of multiple things | 14:50 |
bauzas | but indeed the examples don't show it | 14:50 |
sean-k-mooney | or maybe here https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#configuring-pci-aliases-for-users | 14:50 |
dansmith | we need one here: https://docs.openstack.org/nova/latest/configuration/config.html#pci.alias | 14:50 |
dansmith | because that says you can use traits, but doesn't give me an example to see where that is the only thing used | 14:50 |
gibi | I can add that doc if that helps getting reviews on the eventlet series :) | 14:50 |
bauzas | dansmith: at least that's documented below in the list of accepted keys | 14:50 |
dansmith | I'll try this locally and then document if it works | 14:50 |
bauzas | cool | 14:51 |
dansmith | bauzas: I understand (now), but obviously not obvious to me :) | 14:51 |
bauzas | dansmith: we tested that with Uggla when he was working on vfio-pci | 14:51 |
gibi | (I assume it works there are functional test coverage on it) | 14:51 |
bauzas | should work | 14:51 |
bauzas | Uggla can give you some examples | 14:51 |
sean-k-mooney | ya so in the devspec you can advertise them https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html#pci-device-spec-configuration and in the alias you can request them but we dont have that in the doc today | 14:51 |
dansmith | gibi: sorry, I'm not farting around for no reason, I'm trying to get some other important stuff tested and this is in my way.. I promise I'm not ignoring your series for unimportant reasons | 14:51 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html#pci-alias-configuration | 14:51 |
gibi | ack | 14:52 |
dansmith | sean-k-mooney: yeah I think I understand the glue path now | 14:52 |
bauzas | gibi: I'd be more than happy to give a shot to your series | 14:52 |
bauzas | provided my meeting of meetings is done | 14:52 |
sean-k-mooney | so in https://docs.openstack.org/nova/latest/configuration/config.html#pci.alias | 14:53 |
sean-k-mooney | we do mention traits but only provide the example of a resource class | 14:53 |
dansmith | sean-k-mooney: right | 14:53 |
dansmith | which is really just vendor/model (by default) so it doesn't look helpful to someone looking for it | 14:53 |
bauzas | we need one more example, I agree | 14:57 |
bauzas | without the product/vendor ID need | 14:57 |
bauzas | but IIRC, we still need to somehow define them, right Uggla ? | 14:57 |
Uggla | Give me a sec, need to reload the context. | 14:59 |
bauzas | long story short, can we skip the vendor/product ID settings in the pci alias ? I think we had to do something like providing a star (*) | 15:00 |
dansmith | bauzas: star does not help me here | 15:01 |
dansmith | but as I said, I will test this example shortly | 15:01 |
bauzas | dansmith: I guess I understood your case | 15:01 |
bauzas | you wanna request a group of arbitrary pci devices by traits | 15:02 |
bauzas | that's why I'm saying you shouldn't provide vendor and product IDs in the alias definition or you would restrict that list | 15:02 |
gibi | yeah so if you have two different devices with different product id then you can whitelist them via address (with full or partial match on the address) and connect them to the same resource class in the whitelist, you can add traits there as well. Then you can request that resource class in the alias | 15:03 |
bauzas | but IIRC, when we wanted to do that thing, we found some limitation that we fixed somehow I can't recall (maybe a star or something else) | 15:03 |
Uggla | Yes I think in the alias section we can skip product / vendor and only use resource class or trait. | 15:03 |
gibi | the key is that you can use the same RC in multiple device_spec lines | 15:03 |
Uggla | Yep I updated the documentation for the alias section to highlight that product_id / vendor_id are optional. | 15:05 |
dansmith | I don't see that | 15:07 |
dansmith | ...that they are optional.. I do see the class example | 15:07 |
Uggla | dansmith, https://docs.openstack.org/nova/latest/configuration/config.html --> alias --> A JSON dictionary which describe a PCI device. | 15:08 |
dansmith | gibi: traits would be better for me than RC.. if they use the same RC, then I can't do inventory planning based on RC, and also, I might want to still request via RC for specific devices, or traits for generic "just give me a 256G nvme" | 15:08 |
gibi | https://github.com/openstack/nova/blob/221a3e89e8988bc664298106ee691a4e41ca71f9/nova/tests/functional/libvirt/test_pci_in_placement.py#L1759-L1772 | 15:08 |
dansmith | Uggla: yeah I said I see the example, just not that vendor/product is optional (as it says for traits and rc) | 15:08 |
gibi | dansmith: if you don't define the RC then the RC will be based on the product_id so you cannot have a single alias requesting it as a single alias cannot request two different RC | 15:09 |
dansmith | gibi: ... but I can request just a trait no? | 15:09 |
Uggla | dansmith, --> Note that [...] indicates optional field. | 15:10 |
gibi | you can request a set of traits | 15:10 |
gibi | but you need to either provide an RC or a product/vendor id in the alias | 15:10 |
gibi | otherwise we cannot create a placement allocation candidate query | 15:10 |
dansmith | Uggla: fine, but very not as obvious as: "resource_class: The **optional** Placement resource class name" | 15:11 |
dansmith | gibi: okay if that's the case then that's definitely not very well explained in the docs here | 15:12 |
dansmith | lemme try with just the trait so I can see what happens | 15:12 |
gibi | lets improve the doc | 15:12 |
gibi | if you provide just traits in the alias and no RC or vendor/product id then you will get that the alias is invalid | 15:13 |
gibi | (I hope : ) | 15:13 |
dansmith | I do not, on service startup at least | 15:15 |
Uggla | dansmith, I agree it could be better. | 15:15 |
gibi | dansmith: it is validated when the alias is used | 15:18 |
gibi | that was true before PCI in Placement too | 15:19 |
dansmith | gibi: so, I might be missing something but it looks to me like I was able to create a server and it just silently ignored my pci request | 15:19 |
dansmith | https://paste.opendev.org/show/bkADIKJZwFGSAv3UcXCF/ | 15:20 |
gibi | alias = {"traits": "CUSTOM_NVME256G", "device_type":"type-PCI", "name": "nvme256g"} | 15:21 |
gibi | this should have been rejected at server create. if not that is a bug | 15:21 |
gibi | it cannot work as we cannot generate a proper allocation candidate query as the resource class is mandatory there | 15:22 |
dansmith | you see it's running and you can confirm the embedded flavor attempted to request it right? | 15:22 |
gibi | yes I see | 15:22 |
gibi | so it is a bug | 15:22 |
gibi | dansmith: do you use the same nova-pci.conf for both compute and api? | 15:24 |
dansmith | both computes, api and scheduler yeah | 15:24 |
dansmith | that's why it's on mnt :) | 15:24 |
gibi | ack, I tend to make that mistake that I update the alias in the nova-compute conf and not in the nova-api conf | 15:25 |
dansmith | yep, tried to set myself up to avoid that this time | 15:25 |
gibi | (the whole above discussion shows that this is the first time we really started using the PCI in Placement feature so we are finding the real bugs now) | 15:26 |
dansmith | this is a just-rebuilt setup so let me test with the other aliases to make sure pci is working in general | 15:26 |
dansmith | hmm, no actually | 15:28 |
dansmith | I wonder if my pci filter is not getting set, hmm | 15:28 |
dansmith | ah, it's not | 15:29 |
dansmith | so that's probably why it worked before too, hrm | 15:29 |
dansmith | gibi: so, I had [scheduler] instead of [filter_scheduler] | 15:33 |
Uggla | dansmith, not sure but my pci_in_placement is in the filter_scheduler section. | 15:33 |
Uggla | :) | 15:33 |
dansmith | but, still works.. what all reads that? more than n-sch? | 15:33 |
dansmith | conductor maybe? | 15:34 |
gibi | if pci_in_placement is read by the api | 15:36 |
gibi | s/if// | 15:36 |
gibi | report_in_placement is read by the compute | 15:37 |
gibi | (yes it is not optimal I know) | 15:37 |
dansmith | okay api is seeing the new value, still not stopping me | 15:38 |
gibi | I can try to reproduce the issue on my side... | 15:39 |
bauzas | what's the issue ? | 15:39 |
dansmith | but, even using a good alias isn't getting me the device or a failure, so something else must be broken here | 15:40 |
Uggla | bauzas, my understanding is if you only specify a trait in the alias section, pci in placement should block that as it needs RC too. And it is not the case. | 15:41 |
bauzas | yeah, that could be the case | 15:42 |
bauzas | dansmith doesn't see anything in placement ? | 15:42 |
bauzas | about the PCI RPs | 15:42 |
bauzas | (just rebooted due to a wifi crash, <3 F42) | 15:42 |
dansmith | gibi: okay I had a typo in my flavor property so it was just being ignored. but, fixing that, here's what I see as a user: | 15:47 |
dansmith | https://paste.opendev.org/show/bI7lakvj0atefW6KPNkw/ | 15:47 |
dansmith | the log does have something in it, but it's not quite obvious I suspect | 15:47 |
dansmith | I suspect no operator will know what "resources={CUSTOM_PCI_NONE_NONE=1}" means | 15:47 |
gibi | yepp we need to add an explicit validation | 15:49 |
gibi | that either RC or product/vendor needs to be provided in the alias | 15:49 |
dansmith | yeah | 15:50 |
gibi | I can file a bug and link it to the downstream PCI in Placement Jira as well so we will have time for it to fix | 15:51 |
dansmith | gibi: so ... why again can we not support just the trait? You can ask for all providers with that trait, no? | 15:51 |
gibi | you cannot have an allocation candidate without telling placement what resource you are allocating | 15:51 |
dansmith | because you're asking for inventory not a provider? | 15:51 |
dansmith | (and providers have the traits) | 15:52 |
gibi | I'm asking for things I can allocate, I cannot allocate a trait or a provider I can only allocate a piece of inventory | 15:52 |
dansmith | right, that's what I mean | 15:53 |
gibi | yepp | 15:53 |
gibi | in our oversimplified case where a single PF RP only has a single RC with a single inventory, allocating the RP or allocating one piece of that inventory is the same. | 15:54 |
* dansmith nods | 15:54 | |
gibi | so it is easy to make that mental jump that we want to allocate RPs | 15:54 |
gibi | this also shows the effect of the design decisions that we connect traits to RPs not to an inventory of the RP | 15:56 |
gibi | we allocate inventories but we filter on traits that are not on the inventories | 15:57 |
gibi | it is a bit of a mixup | 15:57 |
gibi | so we tend to namespace traits to somehow refer to inventories :) | 15:57 |
gibi | like with the OTU trait, we put PCI in the name to signify it is only related to PCI inventories of the RP | 15:58 |
gibi | anyhow | 15:58 |
gibi | I am going to file a bug | 15:58 |
gibi | based on your paste | 15:58 |
Uggla | gibi, yes but finally the code has blocked the invalid settings. So not so bad. | 16:01 |
gibi | https://bugs.launchpad.net/nova/+bug/2111440 | 16:02 |
gibi | Uggla: yep, but we should consider pre-validating the alias at service startup | 16:02 |
gibi | and also be a bit more instructive in the lgs | 16:02 |
gibi | logs | 16:02 |
dansmith | gibi: yeah there have been a few things in this journey where traits applied to providers to say something about inventory has been a bit weird | 16:02 |
dansmith | gibi: Uggla I also had a typo in my pci alias config (unparseable json) and it didn't get noticed until the first time I tried to boot with a pci-enabled flavor, and we returned json parse complaints to the user in the error :( | 16:03 |
dansmith | which of course is "line 1, character X" which makes no sense since they're all one line | 16:04 |
dansmith | would be much better if we can parse and log on startup which one is the problem | 16:04 |
gibi | filed the upstream bug and a matching downstream jira | 16:05 |
opendevreview | sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 16:07 |
sean-k-mooney | dansmith: thats a typo in the docs | 16:11 |
sean-k-mooney | when we added the docs for pci managed and live migration flags | 16:12 |
dansmith | what's a typo in the docs? | 16:12 |
sean-k-mooney | we missed it was not valid json. i hit the same thing when testing the OTU devices feature and pinged Uggla but i dont know if we filed an upstream bug or not to fix it | 16:13 |
sean-k-mooney | gibi: validating the config on start up is good, we also should be caching it. | 16:13 |
sean-k-mooney | gibi: https://review.opendev.org/c/openstack/nova/+/427145 | 16:15 |
gibi | sean-k-mooney: good point, added the link to the bug | 16:16 |
sean-k-mooney | stephen didnt want the additional complexity but going from 4062 function calls to 3 after caching is a massive saving | 16:17 |
sean-k-mooney | especially since we can just slap functools.cache on it now | 16:18 |
sean-k-mooney | from the blueprint """During the creation of an instance, a list of request is build. One of this requested elements are the PCI devices, both from the API call and from the flavor. | 16:20 |
sean-k-mooney | In the PCI request gathering from the flavor, the variable "pci_alias", stored in the Nova config file, is always parsed. This step is only needed once, because the information is static.""" | 16:20 |
sean-k-mooney | so every time we need to translate a pci alias to a pci request object we parse all the aliases again https://github.com/openstack/nova/blob/master/nova/pci/request.py#L124-L209 | 16:23 |
dansmith | Uggla: (cc gibi) FYI nvme emulation in qemu does work for testing this and is cleanable (via format only, not sanitize) | 16:24 |
Uggla | dansmith, cool ! | 16:25 |
dansmith | that device reports that it supports lots of namespaces (not sure if it really does, haven't tested yet) but that flags a warning I have queued for the script locally about multi-namespace devices being cleaned with format only (so that's a good test) | 16:25 |
sean-k-mooney | dansmith: the only issue with that is while qemu can emulate it libvirt currently does not support that so you have to use qemu directly unless i missed something in the xml for this ? | 16:25 |
dansmith | sean-k-mooney: right, you can configure it in the xml, but only via qemu:commandline | 16:25 |
gibi | cool. So we can use that for local testing. | 16:26 |
dansmith | so not helpful for people testing on top of an existing nova impl, unfortunately | 16:26 |
gibi | not in CI though | 16:26 |
dansmith | gibi: yeah | 16:26 |
dansmith | right | 16:26 |
sean-k-mooney | you can yes | 16:26 |
dansmith | gibi: it's also helpful so you can get a bunch of devices instead of just one (if you only have one) | 16:27 |
sean-k-mooney | i looked into that when we were reviewing the spec but kind of gave up when i realized the gap in libvirt. you could just add the qemu args in virt manager to test it locally or just invoke qemu yourself | 16:27 |
dansmith | https://termbin.com/tm0b | 16:28 |
sean-k-mooney | @gibi: the crd change is in merge conflict https://github.com/openstack-k8s-operators/nova-operator/pull/948 | 17:03 |
sean-k-mooney | gibi: im sure its trivial but it will need to be rebased before it can merge | 17:03 |
sean-k-mooney | oh you already have | 17:03 |
sean-k-mooney | gibi: in that case ill add lgtm and assuming it passes ci it can proceed, does that work for you? | 17:04 |
gibi | yepp | 17:07 |
gibi | works for me | 17:07 |
gibi | (strange place to discuss it but fine :) | 17:08 |
sean-k-mooney | gibi: hehe wrong tab | 17:13 |
* sean-k-mooney has been trying different ways of arranging my windows to be less interrupt driven | 17:14 | |
* sean-k-mooney does not think that will work out for me because i like being interrupt driven | 17:15 | |
-opendevstatus- NOTICE: Gerrit is being updated to the latest 3.10 bugfix release as part of early prep work for an eventual 3.11 upgrade. Gerrit will be offline momentarily while it restarts on the new version. | 17:34 | |
opendevreview | Dan Smith proposed openstack/nova master: Make example OTU cleaner support NVMe sanitize https://review.opendev.org/c/openstack/nova/+/950592 | 17:52 |
sean-k-mooney | gmaan: i didnt get as far as i hoped today but the first 10 patches in stephens series are still good to merge IMO i.e. https://review.opendev.org/c/openstack/nova/+/936365/8 -> https://review.opendev.org/c/openstack/nova/+/937048/11 | 18:42 |
sean-k-mooney | ill take a look at the next couple of patches tomorrow | 18:43 |
gmaan | sean-k-mooney: ack, I will see if I can check those this week but next week I am planning. | 18:44 |
Callum027 | Hi sean-k-mooney, I added a comment to the change but I'd thought I'd let you know here as well - sorry for the inactivity on my changes, been a bit busy with work. We've deployed the changes to our production OpenStack environments and everything seems to be working great, so I'm happy to attend the next Nova meeting so we can discuss the proposal | 18:44 |
Callum027 | and hopefully get it merged for Flamingo. | 18:44 |
sean-k-mooney | Callum027: ack, if you cant just send a mail to the list asking for it to be approved | 18:45 |
sean-k-mooney | i hope its not controversial | 18:45 |
sean-k-mooney | there is at least some other interest from the telemetry folks to enable this with cloud kitty | 18:46 |
sean-k-mooney | so thats positive | 18:46 |
Callum027 | Yeah, I'm sure they'd be interested since this improves the scalability of Ceilometer polling | 18:47 |
Callum027 | I can understand why people wouldn't be happy about it since this is basically putting lipstick on a pig, but anything more radical would require more fundamental design changes and I'm not sure I'm capable of championing that at this stage :) | 18:47 |
sean-k-mooney | well with my nova hat on its replacing Ceilometer polling the nova api with Ceilometer polling the libvirt api so if its a performance issue its not someone elses problem. on the other hand the domain xml is meant to be internal state | 18:49 |
sean-k-mooney | so if we just stop using them or libvirt changed to yaml | 18:49 |
sean-k-mooney | thats not a breaking change from a nova perspective as the xml is not public | 18:49 |
sean-k-mooney | but realisticlly that not going to happen | 18:49 |
sean-k-mooney | so as long as ceilometer is readonly its proably a net win | 18:50 |
Callum027 | I do think it's a relatively elegant solution to the scalability issues Ceilometer faced, with the drawback that changes to the files require all instances to be shelved-and-unshelved to apply the changes (or updated in place using virsh commands) | 18:58 |
sean-k-mooney | no it just requires a hard reboot | 18:59 |
sean-k-mooney | the xml is regenerated on every hard reboot or start/power on call | 19:00 |
Callum027 | Oh, right, of course | 19:00 |
sean-k-mooney | shelve would work too but there are cheaper options | 19:00 |
Callum027 | I should update the release notes on those commits to mention that | 19:00 |
Callum027 | But the main thing is live migrating doesn't work since the XML gets copied over without modification | 19:00 |
sean-k-mooney | it will also get added on a live migrate i think | 19:01 |
Callum027 | In our testing it didn't, but we're using an older version of Nova | 19:01 |
sean-k-mooney | or at least it would if you implemented that in your code change | 19:01 |
sean-k-mooney | well no i mean nova can do that but you need to write the code | 19:01 |
Callum027 | Maybe part of "standardising" this would be that we write a more elegant way of updating the metadata of instances in place | 19:01 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L55-L97 | 19:02 |
Callum027 | For our production environment we wrote a one-off Ansible playbook to do it | 19:02 |
sean-k-mooney | i mean that fair too | 19:02 |
sean-k-mooney | although we dont like operators touching the xml | 19:02 |
Callum027 | Yeah, I definitely would have preferred to use the proper core for it | 19:02 |
sean-k-mooney | so perhaps as a follow up we could add updating of the metadata in the domain on live migrate | 19:03 |
sean-k-mooney | thats not done today in general | 19:03 |
sean-k-mooney | so its not really a bug in your patch | 19:03 |
sean-k-mooney | its just existing behavior. cold migrate would pick up the change as well | 19:03 |
Callum027 | Yeah, it's definitely a useful change though | 19:04 |
Callum027 | For now I've added the item to the agenda, I'll make sure I'm at the next meeting | 19:04 |
sean-k-mooney | ack | 19:04 |
opendevreview | sean mooney proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 19:17 |
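
A minimal sketch of the startup check discussed above (gibi at 15:49, dansmith at 16:04, sean-k-mooney at 16:18): parse every alias once, say which entry is broken, and reject an alias that has neither a resource class nor a vendor/product pair. This is not Nova's code. `check_aliases`, `load_aliases` and the error wording are hypothetical names for illustration, and requiring `vendor_id` and `product_id` together is an assumption.

```python
import functools
import json


def check_aliases(raw_aliases):
    """Parse alias strings once; return (aliases_by_name, error_messages)."""
    aliases, errors = {}, []
    for index, raw in enumerate(raw_aliases):
        try:
            spec = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Name the offending entry instead of only "line 1, character X".
            errors.append(
                f"alias #{index}: invalid JSON ({exc.msg}, column {exc.colno})"
            )
            continue
        if not isinstance(spec, dict) or not spec.get("name"):
            errors.append(f"alias #{index}: expected a JSON object with a 'name'")
            continue
        has_rc = bool(spec.get("resource_class"))
        has_ids = bool(spec.get("vendor_id")) and bool(spec.get("product_id"))
        if not (has_rc or has_ids):
            # Traits alone cannot form an allocation candidate query.
            errors.append(
                f"alias '{spec['name']}': needs resource_class "
                "or vendor_id/product_id"
            )
            continue
        # Simplification: a later alias with the same name overwrites an earlier one.
        aliases[spec["name"]] = spec
    return aliases, errors


@functools.cache
def load_aliases(raw_aliases):
    """Cached wrapper; raw_aliases must be hashable, e.g. a tuple of strings."""
    return check_aliases(raw_aliases)
```

Passing the trait-only alias from 15:21 through `check_aliases` puts it in the error list rather than letting it reach scheduling, and repeated `load_aliases` calls with the same tuple reuse the first parse.
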
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
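
The trait normalization rule that bauzas quoted from the config docs at 14:47 can be sketched as follows. This is an illustration of the quoted wording, not the Nova implementation: `normalize_custom_trait` is a hypothetical name, and the sketch covers only non-standard traits (a standard os-traits name would be used unchanged).

```python
import re

MAX_TRAIT_LENGTH = 255  # limit quoted in the docs, prefix included


def normalize_custom_trait(name):
    """Upper-case, collapse runs of invalid characters, add CUSTOM_ if missing."""
    # Any consecutive run of characters outside [A-Z0-9_] becomes a single "_".
    normalized = re.sub(r"[^A-Z0-9_]+", "_", name.upper())
    if not normalized.startswith("CUSTOM_"):
        normalized = "CUSTOM_" + normalized
    if len(normalized) > MAX_TRAIT_LENGTH:
        raise ValueError(f"trait name longer than {MAX_TRAIT_LENGTH} characters")
    return normalized
```

For example, `normalize_custom_trait("nvme-256g")` gives `CUSTOM_NVME_256G`, while an already prefixed name such as `CUSTOM_NVME256G` comes back unchanged.
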