ykarel | Hi after https://review.opendev.org/c/openstack/nova/+/924731 - Change force_format strategy to catch mismatches | 05:41 |
---|---|---|
ykarel | many tests are failing involving snapshot/rescue, is this something known already? | 05:42 |
ykarel | may be now as don't see any recent bug for it, will report | 05:43 |
ykarel | reported https://bugs.launchpad.net/nova/+bug/2073944 | 05:57 |
*** bauzas_ is now known as bauzas | 06:04 | |
ralonsoh | sean-k-mooney, ^ | 06:11 |
ralonsoh | bauzas, ^ | 06:11 |
ralonsoh | upsss maybe you didn't see the message | 06:11 |
ralonsoh | https://bugs.launchpad.net/nova/+bug/2073944 | 06:11 |
ralonsoh | since https://review.opendev.org/c/openstack/nova/+/924731 | 06:11 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: test downstream cross repo testing https://review.opendev.org/c/openstack/nova/+/924823 | 07:18 |
sean-k-mooney | ralonsoh: is that on neutron ci | 07:59 |
sean-k-mooney | ykarel: ralonsoh: if i recall correctly neutron is using the uec images (ami format) and has not necessarily configured tempest correctly for that | 08:00 |
ralonsoh | let me check that | 08:00 |
sean-k-mooney | https://github.com/openstack/neutron/blob/master/zuul.d/tempest-multinode.yaml#L89-L90 | 08:01 |
ralonsoh | sean-k-mooney, that change was to address https://bugs.launchpad.net/nova/+bug/1939108 | 08:02 |
sean-k-mooney | tempest is by default configured to expect qcow | 08:02 |
ralonsoh | https://review.opendev.org/c/openstack/neutron/+/916629 | 08:02 |
sean-k-mooney | ralonsoh: right, we have since determined that using the uec image does not actually fix the problem | 08:03 |
sean-k-mooney | it makes it less likely but we did it last cycle and the issue didn't go away for us | 08:03 |
ralonsoh | sean-k-mooney, ok, I'll revert the neutron patch then | 08:03 |
sean-k-mooney | reverting is one approach; the other is to set the appropriate disk/container formats in the tempest config | 08:05 |
sean-k-mooney | for 2024.1 we are also using the uec images and i'm currently looking at reverting that to bring it in line with master | 08:05 |
ralonsoh | but if using uec images makes the issue less likely to happen | 08:05 |
ralonsoh | what should I configure in tempest for that? | 08:06 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/924746/2 | 08:06 |
sean-k-mooney | i think that | 08:06 |
sean-k-mooney | but in general the uec image just changes the timing | 08:06 |
sean-k-mooney | which means if you don't land on a slow node you might get lucky | 08:06 |
ralonsoh | yeah, that's another issue | 08:07 |
sean-k-mooney | ralonsoh: so Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option. | 08:07 |
sean-k-mooney | is actually fixed by using cirros 0.6.2 | 08:07 |
sean-k-mooney | that specific kernel panic in https://bugs.launchpad.net/nova/+bug/1939108 was a known kernel bug that was fixed in the cirros 0.6.x series | 08:08 |
ralonsoh | ok, so if this issue is solved in 0.6.2, I could also use qcow2 images too | 08:09 |
ralonsoh | right? | 08:09 |
sean-k-mooney | yep | 08:09 |
ralonsoh | that will fix the current problem | 08:09 |
ralonsoh | perfect then! | 08:09 |
sean-k-mooney | the other kernel panics im hoping have been fixed by https://review.opendev.org/c/openstack/devstack/+/924094 | 08:09 |
bauzas | just saw _colby's question, I probably need more details | 08:10 |
bauzas | like the nvidia driver version they use | 08:10 |
sean-k-mooney | bauzas: ya i was hoping you recalled when nvidia split the driver in 2 | 08:10 |
bauzas | was a long time ago | 08:10 |
sean-k-mooney | they probably didn't make that change for centos8 | 08:10 |
bauzas | GRID16 | 08:11 |
sean-k-mooney | but now that they are on centos 9 they probably pulled the latest drivers | 08:11 |
bauzas | I can lookup the A40 support for GRID16 | 08:11 |
bauzas | yeah | 08:11 |
bauzas | but GRID 16 and 17 transition around RHEL9 | 08:12 |
bauzas | both support some RHEL9 versions, just not the same | 08:12 |
bauzas | GRID16 is also an LTS version | 08:12 |
bauzas | while GRID17 is only a production version, ie. with a 1-year term support | 08:12 |
bauzas | https://docs.nvidia.com/vgpu/16.0/product-support-matrix/index.html#abstract__red-hat-el-kvm | 08:14 |
*** chuanm6 is now known as chuanm | 08:14 | |
ykarel | ralonsoh, sean-k-mooney but we still hit kernel panic with those 0.6.2 qcow images, i think i have never seen that issue with uec images | 08:37 |
ykarel | https://32b7a8914f85d2150a67-dfd2b77d4a65df68dc83e2cbebde05a8.ssl.cf1.rackcdn.com/924004/4/check/tempest-full-py3/861ecb0/testr_results.html | 08:37 |
ykarel | i mean never seen that issue with uec images in neutron jobs that we track | 08:39 |
ralonsoh | ykarel, we can also add https://review.opendev.org/c/openstack/nova/+/924746/2 in our jobs | 08:41 |
ykarel | ralonsoh, yeap +1 to that as we should also avoid those random kernel panics in our jobs | 08:41 |
ralonsoh | ykarel, perfect, I | 08:42 |
ralonsoh | I'll change the neutron patch | 08:42 |
opendevreview | Mark Goddard proposed openstack/nova master: ironic: Let Ironic handle deployment cleanup actions during destroy https://review.opendev.org/c/openstack/nova/+/883411 | 09:02 |
sean-k-mooney | ykarel: we have seen that issue with the uec images, when we swapped to them the kernel panics reduced but they never went away | 09:04 |
sean-k-mooney | ykarel: i will also point out that creating vms with uec images is not how 99% of people use openstack | 09:05 |
sean-k-mooney | it's why nova in particular can't just use uec images even if that was a 100% reliable solution, which it is not. | 09:06 |
*** bauzas_ is now known as bauzas | 09:08 | |
*** bauzas_ is now known as bauzas | 09:21 | |
frickler | sean-k-mooney: iiuc the 2024.1 fix is blocked by https://review.opendev.org/c/openstack/nova/+/923831 now? so it could be rebased on top of it to get the tests to pass and reviewers could approve things in parallel? | 09:38 |
sean-k-mooney | its not blocked | 09:41 |
sean-k-mooney | frickler: well it is | 09:41 |
sean-k-mooney | but rebasing is not the answer | 09:41 |
sean-k-mooney | we can merge https://review.opendev.org/c/openstack/nova/+/923831 | 09:41 |
sean-k-mooney | and then just recheck | 09:41 |
sean-k-mooney | i would prefer to not have to respin the cherry-picks for the sha but i can if needed | 09:42 |
frickler | which takes more time than to rebase now. so anyway, bauzas gibi could you look at ^^ please | 09:42 |
sean-k-mooney | frickler: i was about to ping bauzas, gibi is on pto today but i think we can proceed with this without their review | 09:43 |
frickler | yeah, sorry for being impatient, I was just hoping that with just a single commit per branch this whole thing would be done within hours and not take days again | 09:44 |
sean-k-mooney | frickler: well that's why i started backporting https://review.opendev.org/c/openstack/nova/+/923831 2 weeks ago | 09:45 |
sean-k-mooney | i had hoped to have that in place before this hit | 09:45 |
sean-k-mooney | the previous cve was also impacted by the use of the uec images | 09:45 |
sean-k-mooney | but that ended up being partly mitigated | 09:46 |
ykarel | sean-k-mooney, ack at least in our jobs we have not reproduced that issue with uec image | 10:01 |
frickler | sean-k-mooney: I noticed that I do have stable core powers, so added +2, but leaving the approval for you | 10:06 |
sean-k-mooney | ill give bauzas an hour or so and move forward if he or stephenfin dont have time to review https://review.opendev.org/c/openstack/nova/+/923831 by noon(utc+1) | 10:13 |
bauzas | here now | 10:16 |
bauzas | https://review.opendev.org/c/openstack/nova/+/923831 +Wd | 10:17 |
sean-k-mooney | bauzas: thanks | 10:17 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: Add config option to require secure SPICE. https://review.opendev.org/c/openstack/nova/+/922544 | 10:49 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: allow concurrent access to SPICE consoles. https://review.opendev.org/c/openstack/nova/+/922546 | 10:49 |
opendevreview | Michael Still proposed openstack/nova master: WIP: libvirt: Add guest devices to support SPICE USB. https://review.opendev.org/c/openstack/nova/+/922547 | 10:49 |
opendevreview | Michael Still proposed openstack/nova master: WIP: libvirt: Optionally enable SPICE debug logging. https://review.opendev.org/c/openstack/nova/+/922548 | 10:49 |
opendevreview | Michael Still proposed openstack/nova master: WIP: libvirt: Optionally support sound when using SPICE. https://review.opendev.org/c/openstack/nova/+/922549 | 10:49 |
opendevreview | Michael Still proposed openstack/nova master: libvirt: allow direct SPICE connections to qemu https://review.opendev.org/c/openstack/nova/+/924844 | 10:49 |
mikal | sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/924844/1 is my first swing at the reworking of the Nova APIs for the SPICE consoles. | 10:49 |
sean-k-mooney | mikal: stephenfin is working on a parallel series to add openapi schema validation that might collide with that by the way but we can cross that bridge when we get there | 10:51 |
mikal | sean-k-mooney: yeah, fair enough. Imma just gonna keep rebasing and uploading for now. | 10:51 |
sean-k-mooney | mikal: currently my focus is unfortunately on backporting a cve fix upstream and downstream so while i skimmed that i probably won't be able to start reviewing properly until next week | 10:53 |
mikal | sean-k-mooney: assuming that the bottom three of those pass CI (and the first two already have before), I'll move onto reimplementing the sound model as an extra spec next. There is also a patch in that stack which adds USB redirect devices, I am undecided if I should make the number of devices configurable in extra specs as well. | 10:53 |
sean-k-mooney | hopefully we will get most of the cve patches landed today | 10:53 |
mikal | sean-k-mooney: yeah fair enough, I'm just a bit concerned about the code complete deadline, so I want to get them out there as quick as I can. | 10:54 |
sean-k-mooney | we make the number of serial ports configurable in the flavor extra specs and image properties i believe | 10:54 |
sean-k-mooney | so i don't see a harm in supporting both | 10:54 |
sean-k-mooney | i.e. supporting usb count in both flavor and image | 10:55 |
mikal | I haven't read https://review.opendev.org/c/openstack/nova-specs/+/920687 in depth yet, but I will do so before doing any USB stuff which might conflict. | 10:55 |
sean-k-mooney | just make sure if they disagree that you raise a flavor_image_conflict exception in the api properly | 10:55 |
mikal | Is there a good example of a previous extra spec which does what you want that I can cargo cult? | 10:55 |
sean-k-mooney | mikal: am yes i can probably find one. likely the vPMU or vIOMMU ones | 10:56 |
sean-k-mooney | one sec ill grab the relevant review | 10:56 |
mikal | sean-k-mooney: ok cool. No rush though, its basically my bed time so I wont look at this until tomorrow morning your time anyways. | 10:57 |
sean-k-mooney | mikal: as prior art for allowing it to be set in flavor i would cite https://github.com/openstack/nova/blob/master/nova/api/validation/extra_specs/hw.py#L448-L459 as evidence :) | 10:57 |
sean-k-mooney | mikal: https://github.com/openstack/nova/commit/326bc658eef04576230d5ba90d2a02bf32deee03 | 10:58 |
sean-k-mooney | mikal: that predates the flavor extra spec validation feature by the way | 10:59 |
mikal | sean-k-mooney: oh, that makes me realise I probably should have written release notes for these changes... | 10:59 |
sean-k-mooney | mikal: so when adding flavor extra specs please make sure you add a validator too | 11:00 |
sean-k-mooney | mikal: yes you have two options on that front. start a single release note in the base patch that has a usable feature and update it in the rest | 11:00 |
sean-k-mooney | or have a separate release note for each separate addition of functionality | 11:01 |
mikal | sean-k-mooney: yeah, that's not going to happen tonight. That can be tomorrow Michael's problem. | 11:01 |
sean-k-mooney | :) | 11:01 |
sean-k-mooney | we sometimes put the release note in the api patch that enables the functionality, we normally keep that to the end of the series by the way | 11:02 |
mikal | Either way I'd appreciate feedback, but I get that you're busy. I'll do release notes tomorrow before I move onto extra specs etc. | 11:02 |
sean-k-mooney | mikal: well the main feedback is you put the api change before the driver change that implemented the feature | 11:02 |
sean-k-mooney | that is the wrong order | 11:02 |
mikal | In my head the remaining patches (after the first three) are "more optional". You can live without USB redirection and sound if you need to. The feature is still useful to some people without those. | 11:03 |
sean-k-mooney | you need to put the api change at the end because once the api is introduced the feature should be fully usable | 11:03 |
sean-k-mooney | oh actually | 11:03 |
mikal | The feature should be fully usable after patch three. | 11:03 |
sean-k-mooney | ok ya i see | 11:03 |
mikal | (Technically patch 3 would work on its own, but the others were already up there). | 11:04 |
sean-k-mooney | the rest are sound, debugging and usb | 11:04 |
sean-k-mooney | ok then its fine as is | 11:04 |
mikal | Yeah, and I might drop debugging. I am undecided. | 11:04 |
sean-k-mooney | it probably does not need separate debug log config from our normal debug config | 11:04 |
mikal | Its so verbose (and so weird because its GTK debugging) that its useful sometimes, but you have to be quite sad to turn it on. | 11:04 |
sean-k-mooney | unless your very very verbose | 11:05 |
mikal | Its very very verbose debugs in the libvirt qemu logs. | 11:05 |
mikal | Its orchestrating qemu to write debug logs, not nova. | 11:05 |
mikal | It helped when I was reverse engineering some bits of the protocol and occasionally crashing the hypervisor. | 11:05 |
sean-k-mooney | mikal: oh ... | 11:06 |
sean-k-mooney | you are aware we do not allow qemu command line options to be used in nova right | 11:06 |
mikal | The SPICE protocol docs are... not good. | 11:06 |
mikal | sean-k-mooney: as in what happens in https://review.opendev.org/c/openstack/nova/+/922546/5/nova/virt/libvirt/config.py is banned? | 11:07 |
sean-k-mooney | so we can't proceed with https://review.opendev.org/c/openstack/nova/+/922546/5/nova/virt/libvirt/config.py | 11:07 |
mikal | That is a shame, because that feature is cool. | 11:07 |
mikal | But libvirt only exposed it as a command line flag. | 11:07 |
sean-k-mooney | then we need a libvirt change before we can support configuring that in nova | 11:08 |
sean-k-mooney | i have unfortunately had to do that dance several times | 11:08 |
mikal | Its this -- https://www.spice-space.org/multiple-clients.html | 11:08 |
sean-k-mooney | if we use the qemu command elements directly it marks the domain as tainted and voids all support from most distros' virt teams | 11:09 |
sean-k-mooney | as a result we have had a strict ban on it upstream for as long as i have worked on openstack | 11:09 |
mikal | I thought secure boot required it too though? | 11:09 |
sean-k-mooney | nope | 11:09 |
mikal | Herm, I no longer remember why I believe that <qemu:arg value='-global'/><qemu:arg value='ICH9-LPC.disable_s3=1'/> is required with secure boot. | 11:10 |
mikal | Oh, its required for Windows 11 support I think... https://github.com/quickemu-project/quickemu/issues/220 | 11:10 |
sean-k-mooney | mikal: are you perhaps using secure boot with the pc machine type or something like that | 11:10 |
sean-k-mooney | mikal: because officially secure boot is considered experimental unless you're using q35 | 11:11 |
mikal | sean-k-mooney: no, I'm definitely using q35 with secure boot. This wasn't in nova though, it was another code base. | 11:12 |
mikal | Said other code base just has three examples of qemu command line setting: this ICH9 thing; nvme disks; and concurrent SPICE sessions. | 11:13 |
sean-k-mooney | ich9 i think is either the sound device or the chipset | 11:13 |
mikal | Anyways, I can drop that patch from the series if Nova doesn't want to support concurrent SPICE sessions, it's just a nice feature to have. | 11:13 |
mikal | Yeah, ICH9 is a sound device. | 11:14 |
sean-k-mooney | well we can support it but we need libvirt to support it first | 11:14 |
sean-k-mooney | mikal: ok well we don't add a sound device today if i recall | 11:14 |
mikal | Well, its an IO controller hub, which does sound. Its one of the libvirt sound model options. | 11:14 |
sean-k-mooney | yep i have played with the setting in virt-manager before | 11:14 |
sean-k-mooney | i just dont recall if we generate that upstream or not | 11:15 |
mikal | I'll just drop the SPICE concurrency patch for now. There is zero chance of me pursuing a libvirt patch right now. | 11:15 |
opendevreview | Merged openstack/nova stable/2024.1: Stop using split UEC image (mostly) https://review.opendev.org/c/openstack/nova/+/923831 | 12:46 |
frickler | sean-k-mooney: bauzas: grenade jobs are failing on the 2023.1 patch, but should grenade actually still run there when zed is eom? https://review.opendev.org/c/openstack/nova/+/924734 | 13:14 |
frickler | (seems more of the setuptools woe happening there) | 13:16 |
frickler | also https://zuul.opendev.org/t/openstack/build/c8fbe90f70974d2f918ee7fa6da54575 looks recheckable to me | 13:16 |
sean-k-mooney | frickler: we have debated that before. so it's failing because of the setuptools issue right | 13:19 |
frickler | sean-k-mooney: the grenade failures that I looked at, yes. note the failure is on the controller, the later failure on compute1 is just secondary | 13:21 |
sean-k-mooney | i skimmed them this morning and didn't see anything really related to the patch | 13:21 |
sean-k-mooney | i didn't really look into test_get_list_deleted_instance_actions closely | 13:22 |
sean-k-mooney | Build of instance 38107228-1260-473d-b6a8-7f3148a135ce aborted: Failed to allocate the network(s), not rescheduling.: nova.exception.BuildAbortException: Build of instance 38107228-1260-473d-b6a8-7f3148a135ce aborted: Failed to allocate the network(s), not rescheduling. | 13:23 |
sean-k-mooney | ok so that is also unrelated | 13:23 |
bauzas | .... | 13:24 |
sean-k-mooney | frickler: so yes i think that will pass if we recheck, if the upper constraints bump for packaging has been merged | 13:31 |
sean-k-mooney | to fix the focal issues | 13:31 |
dansmith | ykarel: around? | 13:38 |
dansmith | oh I guess sean-k-mooney and ralonsoh have already discussed the image content failure | 13:39 |
ralonsoh | dansmith, most probably will change the neutron jobs to use qcow2 images, we are still testing | 13:41 |
dansmith | ralonsoh: excellent | 13:41 |
sean-k-mooney | dansmith: long term we should fix tempest but that's the short term solution. for our 2024.1 branch we have merged the revert back to qcow | 14:15 |
sean-k-mooney | and we had previously green result on 2023.2 | 14:16 |
dansmith | long-term we should drop ami/uec/whatever support :) | 14:16 |
sean-k-mooney | :) perhaps | 14:16 |
sean-k-mooney | alpine now provides cloud images by the way with cloud init in qcow format | 14:16 |
sean-k-mooney | so i might look at using those instead of cirros again if we continue to see kernel panics | 14:17 |
sean-k-mooney | dansmith: ah i saw you responded on https://bugs.launchpad.net/cinder/+bug/2073413 as well | 14:19 |
dansmith | yup | 14:19 |
sean-k-mooney | v2 qcow images might be safe if the lack of the feature mask means they can't have backing files | 14:20 |
sean-k-mooney | but if they can and we just didn't have a clean way to detect them before v3 | 14:20 |
sean-k-mooney | then i'm much less open to supporting them going forward | 14:20 |
* frickler would be really interested to see whether alpine works better than cirros | 14:21 | |
dansmith | yeah, I think they're probably safe here and we can exclude them, but I'm in no big rush to enable them.. that's pretty old at this point, none of the things I've tested use that old format (of course) and you can qemu-img convert your way to a v3 very easily | 14:22 |
sean-k-mooney | frickler: i can hack a poc of that, i have a job that pulls from my file share but i can likely just update the url | 14:22 |
dansmith | I'm working on a patch against oslo, but I don't think we need to apply it everywhere first | 14:22 |
ykarel | sean-k-mooney, wrt your alternate solution re. configure tempest for image create/upload | 14:53 |
ykarel | have you checked whoami-rajat comment regarding issue in format inspector? | 14:54 |
dansmith | ykarel: which issue? | 14:54 |
ykarel | as the images are created from snapshot, so tempest config doesn't seem involved here | 14:54 |
ykarel | dansmith, https://bugs.launchpad.net/nova/+bug/2073944 | 14:54 |
dansmith | ah | 14:56 |
dansmith | perhaps bfv instances and cinder is uploading always as raw even if it's qcow2? | 14:57 |
dansmith | or, cinder is setting the disk_format to ami, but uploading a qcow2 I guess | 14:58 |
ykarel | whoami-rajat, once around if you can check/confirm ^ | 15:45 |
opendevreview | Dan Smith proposed openstack/nova master: DNM re-enable UEC images for testing https://review.opendev.org/c/openstack/nova/+/924865 | 16:02 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Remove AMI snapshot format special case https://review.opendev.org/c/openstack/nova/+/924866 | 16:02 |
_colby | sean-k-mooney: bauzas: thanks. We were able to get instances running with the vgpu and the A40. It's still creating all the resource providers for every virtual function (even the ones not allowed/configured in nova) unlike the centos 8 machines. So we might have some issues with vgpu recycling and removing mdevs. Luckily for this host it's going to be a single project using it and the vgpu sizes will stay the same. | 17:04 |
_colby | We are using the latest Nvidia vgpu drivers | 17:05 |
dansmith | melwitt: sean-k-mooney: early signs show that patch combo repro's and fixes the problem | 17:09 |
sean-k-mooney | dansmith: just back. that is good to hear | 17:11 |
dansmith | I need to fix unit tests | 17:13 |
melwitt | sounds good but I think I missed, how does the snapshot metadata path get taken with rescue? | 17:13 |
melwitt | or maybe it's that rescue tempest tests make snapshots? | 17:14 |
melwitt | ok yeah I see that stable rescue test makes a snapshot to create a rescue image | 17:15 |
sean-k-mooney | dansmith: so am i correct in thinking that there may be instances that have had snapshots created in the past for ami guests that are now the wrong format in glance | 17:22 |
dansmith | I failed to parse that question | 17:22 |
sean-k-mooney | do you know if glance allows the format to be updated to reflect reality? | 17:22 |
dansmith | glance does not allow updating the disk_format after creation (except internally due to image conversion) | 17:22 |
sean-k-mooney | ack so if you booted a guest from a uec/ami image and took a snapshot | 17:23 |
sean-k-mooney | that was uploaded as ami | 17:23 |
dansmith | you're asking about existing snapshots that have an incorrect disk_format? that will be a problem for everyone, even not AMI.. like if you have always had isos or qcows registered as raw because nobody checked, those are all now broken | 17:23 |
sean-k-mooney | then we can no longer use that snapshot correct | 17:23 |
sean-k-mooney | right | 17:23 |
dansmith | yes | 17:23 |
sean-k-mooney | ok i'm wondering if we need to consider a tool or something to help with that case in the future | 17:24 |
sean-k-mooney | basically trying to think of whether there is anything we can do for instances that are shelved with the wrong format for example | 17:25 |
dansmith | yeah, I mean hacking the glance DB is probably the only way, other than a glance-manage automation of that | 17:27 |
dansmith | I say we wait until someone demands it | 17:27 |
sean-k-mooney | ya i also have no idea how possible it would even be if you're using glance with cinder as a backend for example | 17:29 |
dansmith | zomg | 17:45 |
dansmith | one of those tests was silently failing because a mock didn't set disk_format | 17:45 |
dansmith | and now that I'm removing the line that was failing and being covered up by the test, we're actually running a test that uses ami and failing for other broken crap | 17:46 |
dansmith | because a missing o.vo object attribute raises NotImplementedError, which the test is ignoring because some drivers don't implement snapshot | 17:47 |
dansmith | *facepalm* | 17:47 |
dansmith | actually, this test wasn't running for libvirt *at all* for that reason, image type aside | 17:48 |
opendevreview | Dan Smith proposed openstack/nova master: Remove AMI snapshot format special case https://review.opendev.org/c/openstack/nova/+/924866 | 18:05 |
dansmith | melwitt: sean-k-mooney: please review carefully ^ | 18:05 |
frickler | bauzas: sean-k-mooney: dansmith: 2024.1 part of the cve fix seems ready now https://review.opendev.org/c/openstack/nova/+/924732 | 18:07 |
dansmith | frickler: that had to wait for the de-AMI-ification yeah? | 18:08 |
dansmith | asking because of its relevance to the above patch, although I guess the ship has sailed now | 18:09 |
frickler | from what I saw, it was only waiting on https://review.opendev.org/c/openstack/nova/+/923831 in order to fix jobs, but I may have missed something | 18:10 |
dansmith | that's the de-AMI-ification of which I speak | 18:11 |
frickler | ah, that's done then, I'd say | 18:11 |
sean-k-mooney | frickler: we will likely keep merging the backport even though there is a regression in snapshot functionality but we may give it a few days before proposing new releases | 18:55 |
sean-k-mooney | to also include https://review.opendev.org/c/openstack/nova/+/924866 | 18:56 |
sean-k-mooney | and its backports | 18:56 |
sean-k-mooney | dansmith: from the commit message i was expecting larger changes which makes me sad as they are just subtler changes instead | 18:58 |
sean-k-mooney | i think i need to test this in devstack properly because i'm not sure i trust just unit test coverage in this case | 19:00 |
dansmith | the unit testing is garbage, as proved by the fact that they've been skipping this case on libvirt for ages | 19:05 |
dansmith | but yeah, please do | 19:05 |
sean-k-mooney | i'm basically concerned we might be really storing other things in the instance_system_metadata | 19:09 |
sean-k-mooney | so i want to boot an ami guest and look at the db | 19:10 |
dansmith | this change only really affects snapshot, fwiw | 19:11 |
*** bauzas_ is now known as bauzas | 19:17 | |
sean-k-mooney | ya so really just need to test snapshot, rebuild from snapshot, boot from snapshot and shelve | 19:29 |
sean-k-mooney | had to fix a few things first but ok i have a guest booted from the ami formatted image | 20:28 |
opendevreview | Merged openstack/nova stable/2024.1: Change force_format strategy to catch mismatches https://review.opendev.org/c/openstack/nova/+/924732 | 20:30 |
sean-k-mooney | so with force_raw_images = True and use_cow_images = True | 20:32 |
sean-k-mooney | it creates the snapshot in qcow format | 20:32 |
sean-k-mooney | which is the format of the root disk of the running vm but not the format of the original glance image or the backing file | 20:33 |
sean-k-mooney | im going to double check that now | 20:33 |
sean-k-mooney | and ill also save the image and check what it actually contains too | 20:34 |
sean-k-mooney | yes backing file is raw and root disk is a qcow and the snapshot was uploaded as a qcow | 20:35 |
sean-k-mooney | and it actually contains a qcow... | 20:38 |
sean-k-mooney | https://paste.opendev.org/show/bgiZgIIqkdSgDuotQFMO/ | 20:38 |
sean-k-mooney | dansmith: right... im testing your patch right now and i should be expecting it to be fixed | 20:38 |
sean-k-mooney | so with force_raw_images = True and use_cow_images = True the new behavior is to snapshot in qcow2 format | 20:40 |
sean-k-mooney | we also flattened the image properly before uploading | 20:40 |
sean-k-mooney | ill quickly do shelve, boot from snapshot and rebuild then flip the config options to not use cow images | 20:41 |
sean-k-mooney | so shelve, rebuild (to ami, to qcow snapshot from ami, to iso) work, as does boot from snapshot | 20:57 |
sean-k-mooney | dansmith: so in the force_raw_images = True and use_cow_images = True case that mostly tracks | 20:57 |
sean-k-mooney | the only thing i was expecting to be different was i expected the snapshot to be raw but since we actually upload a qcow file and it is declared as a qcow then that's working as intended | 20:58 |
*** bauzas_ is now known as bauzas | 21:20 | |
sean-k-mooney | dansmith: i got this when i tried to shelve using raw images https://paste.opendev.org/show/bxtLREMcntQ73IvUdBDq/ | 21:29 |
dansmith | hrm, I wonder if that's the same bug that was fooling the tests | 21:29 |
sean-k-mooney | so snapshot actully works | 21:30 |
dansmith | because it is looking for NotImplemented, which I thought was just the mock missing it | 21:30 |
sean-k-mooney | but snapshot when shelving does not | 21:30 |
sean-k-mooney | im just going to test snapshot on its own separately again quickly | 21:33 |
sean-k-mooney | then ill call it a day and have a look again in the morning | 21:33 |
sean-k-mooney | yep upload fine in raw format as expected | 21:33 |
sean-k-mooney | weird | 21:37 |
sean-k-mooney | so if i do it again it worked | 21:37 |
sean-k-mooney | but the only thing i changed is i deleted the old snapshot from the shelve and the one i did manually before trying again | 21:38 |
sean-k-mooney | dansmith: ya so if i take a snapshot i cant shelve | 21:42 |
sean-k-mooney | if i delete the snapshot i can | 21:42 |
melwitt | sean-k-mooney: when you say using raw images do you mean you set images_type = raw or flat? | 21:44 |
sean-k-mooney | sorry good question | 21:44 |
sean-k-mooney | im using a cirros uec image | 21:45 |
sean-k-mooney | https://paste.opendev.org/show/b5LlnYZfb8Kq5g1RCiLz/ | 21:45 |
sean-k-mooney | and i have set the https://paste.opendev.org/show/824881/ | 21:46 |
sean-k-mooney | i have set the correct image properties for the ram disk | 21:46 |
melwitt | yeah I mean like a case where shelving the instance will result in a raw image being uploaded to glance (rather than qcow2) | 21:47 |
sean-k-mooney | so the kernel is uploaded as disk_format aki, the ramdisk as ari and the root filesystem as ami | 21:47 |
sean-k-mooney | melwitt: so i have use_cow_images = false and force_raw_images=true | 21:47 |
sean-k-mooney | so it should be uploaded as raw | 21:47 |
melwitt | I don't think so.. otherwise we wouldn't have hit the bug right? what's being uploaded is qcow2 | 21:48 |
melwitt | force raw images means force backing files to be raw format | 21:48 |
sean-k-mooney | what's being uploaded matches the format of the root disk with dan's patch | 21:49 |
sean-k-mooney | melwitt: yes but when you don't have use_cow_images=true it also forces the root disk to be raw | 21:49 |
melwitt | so if you boot an instance and it pulls a qcow2 image down from glance if force raw images is true it will convert the image to raw and use it as a backing file and then the instance disk will be qcow2 (assuming images_type = qcow2) | 21:49 |
sean-k-mooney | no | 21:50 |
sean-k-mooney | in that case if you have set use_cow_images = false and force_raw_images=true | 21:50 |
sean-k-mooney | and boot from a qcow in glance | 21:50 |
sean-k-mooney | the vm should use a raw root disk after we convert it with qemu-img | 21:50 |
sean-k-mooney | i have qcows so ill test that now to confirm | 21:51 |
melwitt | I don't think so, given those are the defaults | 21:51 |
melwitt | you get a qcow2 image uploaded to glance when you snapshot, right? | 21:51 |
sean-k-mooney | we default to use_cow_images=true | 21:51 |
sean-k-mooney | not false | 21:51 |
melwitt | yes | 21:51 |
sean-k-mooney | melwitt: no as i said if i have "use_cow_images = false and force_raw_images=true" then the snapshot uploaded is raw | 21:52 |
melwitt | ok, sorry I think I read what you wrote as opposite | 21:52 |
sean-k-mooney | and if i have "use_cow_images = true and force_raw_images=true" the snapshot is qcow | 21:52 |
sean-k-mooney | with dan's patch applied | 21:52 |
melwitt | ok, yes | 21:52 |
sean-k-mooney | in both cases the vm was created from an ami formatted image in glance | 21:53 |
melwitt | ok, cool. I had been thinking about the raw case so if that is working also then that answers what I was wondering | 21:54 |
sean-k-mooney | so i just booted from a qcow in glance with "use_cow_images = false and force_raw_images=true" and that results in a raw root disk as expected | 21:55 |
sean-k-mooney | ill try snapshot and shelve quickly | 21:55 |
sean-k-mooney | yep so snapshot creates a raw snapshot | 21:56 |
melwitt | ok. yeah I was thinking about it because of that old code comment "glance forces ami disk format to be ami" I don't understand what it meant. so I wondered if a raw is uploaded and not labeled "ami" will glance somehow have a problem with it | 21:56 |
melwitt | like have a problem creating an instance from it | 21:57 |
sean-k-mooney | i think its just wrong | 21:57 |
sean-k-mooney | or at least it is today | 21:57 |
sean-k-mooney | i dont think glance has any idea what ami is | 21:58 |
sean-k-mooney | i can upload the 3 parts of the image as raw and it wont care | 21:58 |
melwitt | ok. mystery comment then I guess :) | 21:58 |
sean-k-mooney | so i get the same odd behavior when shelving an instance created from a glance qcow when i also have a snapshot | 21:59 |
sean-k-mooney | so it looks like for some reason the image backend is trying to use direct_snapshot | 22:02 |
sean-k-mooney | that fails and then we abort | 22:03 |
sean-k-mooney | https://paste.opendev.org/show/bKsmlsy1ClyGhEg1XTVp/ | 22:03 |
melwitt | it actually does that for everything, tries direct_snapshot first and then falls back on regular snapshot if not implemented | 22:03 |
sean-k-mooney | we are not on ceph so we do not expect direct_snapshot to work | 22:04 |
melwitt | it's weird, not sure why it's done that way | 22:04 |
sean-k-mooney | its because for ceph we can sometimes do it and other times not | 22:04 |
melwitt | https://github.com/openstack/nova/blob/df39222b106326a4c28dee26b7127a61174d6b51/nova/virt/libvirt/driver.py#L3205 | 22:04 |
sean-k-mooney | depending on if glance and nova are on the same cluster i think | 22:04 |
melwitt | ack | 22:05 |
sean-k-mooney | but the odd part is dan is not changing any of that in https://review.opendev.org/c/openstack/nova/+/924866 | 22:05 |
sean-k-mooney | it also does not make sense why this works if i dont already have a snapshot and why creating a snapshot separately also works | 22:07 |
melwitt | huh you keep getting bad gateway from glance. are there any errors in the g-api log? | 22:07 |
sean-k-mooney | oh you think we are just logging incorrectly | 22:07 |
sean-k-mooney | ill check | 22:07 |
melwitt | I don't see how 502 bad gateway can be related to your test or the patch but I have seen weird things in upstream CI before | 22:08 |
sean-k-mooney | i might have hit an image size quota | 22:09 |
melwitt | ohhhh yeah that's what it is | 22:09 |
sean-k-mooney | DEBUG oslo.limit.limit [None req-635da87a-1d0b-45e9-b3d1-df7b8b2decf2 admin admin] hit limit for project: [Resource image_size_total is over limit of 1000 due to current usage 1849 and delta 0] | 22:09 |
melwitt | when you exceed quota, glance gives a 502, and it confused me so much | 22:09 |
sean-k-mooney | that is really incorrect usage of http response codes | 22:09 |
melwitt | and every time I guess I have to re figure it out to find that it's a quota limit issue | 22:10 |
melwitt | yeah, seriously | 22:10 |
sean-k-mooney | i have horizon im just setting them to -1 for now | 22:11 |
sean-k-mooney | lol of course the image quotas are not there | 22:12 |
melwitt | to do it manually it's openstack --os-cloud devstack-system-admin registered limit set --default-limit $limit --resource-name $name $registered_limit_id | 22:14 |
melwitt | to get the list openstack --os-cloud devstack registered limit list | 22:15 |
sean-k-mooney | isnt that for unified limits? | 22:15 |
sean-k-mooney | i guess glance uses that already | 22:15 |
melwitt | yeah, glance uses unified limits for its quotas | 22:15 |
melwitt | so I think -1 won't work as unlimited, but you can try if you want | 22:16 |
sean-k-mooney | it wont any more no | 22:16 |
sean-k-mooney | why is the default 1000 | 22:16 |
sean-k-mooney | which is 1G | 22:17 |
melwitt | qcow2 favoritism | 22:17 |
melwitt | (I don't actually know) | 22:17 |
sean-k-mooney | that only works for like cirros, alpine or tinycore | 22:17 |
sean-k-mooney | like most cloud images are under a gig but barely | 22:18 |
melwitt | yeah there's that too | 22:18 |
melwitt | you don't actually shouldn't pass --resource-name because it's by id and --resource-name is if you want to update the name. I made a mistake in the nova docs | 22:20 |
melwitt | *you actually shouldn't | 22:21 |
sean-k-mooney | ya i got a duplicate key error | 22:21 |
sean-k-mooney | Conflict occurred attempting to store registered_limit - Duplicate entry. (HTTP 409) (Request-ID: req-f0f478fe-f682-4081-8837-bc89a09e79f2) | 22:21 |
sean-k-mooney | but i have updated it | 22:22 |
melwitt | yeah, sorry. I need to fix the doc I'm referencing | 22:22 |
sean-k-mooney | no worries i would have given up already | 22:23 |
sean-k-mooney | but i want to see if this works | 22:23 |
sean-k-mooney | if it does then we have another bug to fix but dan's patch looks good otherwise | 22:23 |
sean-k-mooney | ok its working for me locally | 22:27 |
melwitt | ok nice | 22:27 |
sean-k-mooney | i know the ship has kind of sailed on -1 but i really see that as a downgrade with unified limits... | 22:27 |
sean-k-mooney | i need a new way to write max_int with brevity and style | 22:28 |
melwitt | yeah, it's weird bc the keystone docs say specifically that -1 means unlimited but oslo.limit does something else | 22:28 |
sean-k-mooney | like --im-an-admin-do-what-i-say akek --sudo | 22:29 |
melwitt | and style 😂 | 22:29 |
sean-k-mooney | *aka | 22:29 |
melwitt | ceph has at least one funny CLI option like that | 22:29 |
sean-k-mooney | of course i could personally do with a --do-what-i-ment-not-what-typed option too | 22:30 |
sean-k-mooney | the one for when you're deleting things | 22:30 |
sean-k-mooney | ya i always liked that | 22:30 |
melwitt | yeah. I love that | 22:30 |
sean-k-mooney | i feel like if we added that to reset-state i would feel a little better about that existing | 22:31 |
melwitt | same | 22:31 |
sean-k-mooney | ok im getting tired so im going to go get some rest and ill take a look at dansmith's patch again in the morning | 22:31 |
melwitt | ok, seeya o/ | 22:32 |
opendevreview | melanie witt proposed openstack/nova master: docs: Correct unified limits CLI commands https://review.opendev.org/c/openstack/nova/+/924888 | 22:59 |
opendevreview | Merged openstack/nova master: [CI] Replace deprecated regex https://review.opendev.org/c/openstack/nova/+/922212 | 23:38 |
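
As a compact recap of the two configuration combinations tested in the discussion above, the following Python sketch records the snapshot formats that were reported. The table and function names are hypothetical and are not part of nova. Only the two combinations exercised in the chat are listed, and both results were reported with the patch under review applied.

```python
# Snapshot formats as reported in the chat (patch under review applied).
# Keys are (use_cow_images, force_raw_images). Combinations that nobody
# tested in the conversation are deliberately left out.
OBSERVED_SNAPSHOT_FORMAT = {
    (False, True): "raw",    # root disk converted to raw, snapshot uploaded as raw
    (True, True): "qcow2",   # root disk is a qcow2 overlay, snapshot uploaded as qcow2
}


def observed_snapshot_format(use_cow_images, force_raw_images):
    """Return the reported snapshot format, or None if that combination was not tested."""
    return OBSERVED_SNAPSHOT_FORMAT.get((use_cow_images, force_raw_images))


if __name__ == "__main__":
    for combo in [(False, True), (True, True), (False, False)]:
        print(combo, "->", observed_snapshot_format(*combo))
```

The lookup returns None for untested combinations rather than guessing, since the conversation does not say what happens when force_raw_images is false.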