Wednesday, 2024-07-24

ykarelHi after https://review.opendev.org/c/openstack/nova/+/924731 - Change force_format strategy to catch mismatches05:41
ykarelmany tests are failing involving snapshot/rescue, is this something known already?05:42
ykarelmaybe not, as i don't see any recent bug for it, will report05:43
ykarelreported https://bugs.launchpad.net/nova/+bug/207394405:57
*** bauzas_ is now known as bauzas06:04
ralonsohsean-k-mooney, ^06:11
ralonsohbauzas, ^06:11
ralonsohupsss maybe you didn't see the message06:11
ralonsohhttps://bugs.launchpad.net/nova/+bug/207394406:11
ralonsohsince https://review.opendev.org/c/openstack/nova/+/92473106:11
opendevreviewBalazs Gibizer proposed openstack/nova master: DNM: test downstream cross repo testing  https://review.opendev.org/c/openstack/nova/+/92482307:18
sean-k-mooneyralonsoh: is that on neutron ci07:59
sean-k-mooneyykarel: ralonsoh: if i recall correctly neutron is using the uec images (ami format) and has not necessarily configured tempest correctly for that08:00
ralonsohlet me check that08:00
sean-k-mooneyhttps://github.com/openstack/neutron/blob/master/zuul.d/tempest-multinode.yaml#L89-L9008:01
ralonsohsean-k-mooney, that change was to address https://bugs.launchpad.net/nova/+bug/193910808:02
sean-k-mooneytempest is by default configured to expect qcow08:02
ralonsohhttps://review.opendev.org/c/openstack/neutron/+/91662908:02
sean-k-mooneyralonsoh: right, we have since determined that using the uec image does not actually fix the problem08:03
sean-k-mooneyit makes it less likely but we did it last cycle and the issue didnt go away for us08:03
ralonsohsean-k-mooney, ok, I'll revert the neutron patch then08:03
sean-k-mooneyreverting is one approach, the other is to set the appropriate disk/container formats in the tempest config08:05
sean-k-mooneyfor 2024.1 we are also using the uec images and im currently looking at reverting that to bring it inline with master08:05
ralonsohbut if using uec images makes the issue less likely to happen08:05
ralonsohwhat should I configure in tempest for that?08:06
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/924746/208:06
sean-k-mooneyi think that08:06
sean-k-mooneybut in general the uec images just change the timing08:06
sean-k-mooneywhich means if you dont land on a slow node you might get lucky08:06
ralonsohyeah, that's another issue08:07
sean-k-mooneyralonsoh: so Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.08:07
sean-k-mooneyis actually fixed by using cirros 0.6.208:07
sean-k-mooneythat specific kernel panic in https://bugs.launchpad.net/nova/+bug/1939108 was a known kernel bug that was fixed in the cirros 0.6.x series08:08
ralonsohok, so if this issue is solved in 0.6.2, I could also use qcow2 images too08:09
ralonsohright?08:09
sean-k-mooneyyep08:09
ralonsohthat will fix the current problem08:09
ralonsohperfect then!08:09
sean-k-mooneythe other kernel panics im hoping have been fixed by https://review.opendev.org/c/openstack/devstack/+/92409408:09
bauzasjust saw _colby's question, I probably need more details08:10
bauzaslike the nvidia driver version they use 08:10
sean-k-mooneybauzas: ya i was hoping you recalled when nvidia split the driver in 208:10
bauzaswas a long time ago08:10
sean-k-mooneythey probably didnt make that change for centos808:10
bauzasGRID1608:11
sean-k-mooney but now that they are on centos 9 they probably pulled the latest drivers08:11
bauzasI can lookup the A40 support for GRID16 08:11
bauzasyeah08:11
bauzasbut GRID 16 and 17 transition around RHEL9 08:12
bauzasboth support some RHEL9 versions, just not the same08:12
bauzasGRID16 is also a LTE version08:12
bauzaswhile GRID17 is only a production version, ie. with a 1-year term support08:12
bauzashttps://docs.nvidia.com/vgpu/16.0/product-support-matrix/index.html#abstract__red-hat-el-kvm08:14
*** chuanm6 is now known as chuanm08:14
ykarelralonsoh, sean-k-mooney but we still hit kernel panic with those 0.6.2 qcow images, i think i have never seen that issue with uec images08:37
ykarelhttps://32b7a8914f85d2150a67-dfd2b77d4a65df68dc83e2cbebde05a8.ssl.cf1.rackcdn.com/924004/4/check/tempest-full-py3/861ecb0/testr_results.html08:37
ykareli mean never seen that issue with uec images in neutron jobs that we track08:39
ralonsohykarel, we can also add https://review.opendev.org/c/openstack/nova/+/924746/2 in our jobs08:41
ykarelralonsoh, yeap +1 to that as we should also avoid those random kernel panics in our jobs08:41
ralonsohykarel, perfect, I08:42
ralonsohI'll change the neutron patch08:42
opendevreviewMark Goddard proposed openstack/nova master: ironic: Let Ironic handle deployment cleanup actions during destroy  https://review.opendev.org/c/openstack/nova/+/88341109:02
sean-k-mooneyykarel: we have seen that issue with the uec images, when we swapped to them the kernel panics reduced but they never went away09:04
sean-k-mooneyykarel: i will also point out that creating vms with uec images is not how 99% of people use openstack09:05
sean-k-mooneyits why nova in particular cant just use uec images even if that was a 100% reliable solution which it is not.09:06
*** bauzas_ is now known as bauzas09:08
*** bauzas_ is now known as bauzas09:21
fricklersean-k-mooney: iiuc the 2024.1 fix is blocked by https://review.opendev.org/c/openstack/nova/+/923831 now? so it could be rebased on top of it to get the tests to pass and reviewers could approve things in parallel?09:38
sean-k-mooneyits not blocked09:41
sean-k-mooneyfrickler: well it is09:41
sean-k-mooneybut rebasing is not the answer09:41
sean-k-mooneywe can merge https://review.opendev.org/c/openstack/nova/+/92383109:41
sean-k-mooneyand then just recheck09:41
sean-k-mooneyi would prefer to not have to respin the cherry-picks for the sha but i can if needed09:42
fricklerwhich takes more time than to rebase now. so anyway, bauzas gibi could you look at ^^ please09:42
sean-k-mooneyfrickler: i was about to ping bauzas, gibi is on pto today but i think we can proceed with this without their review09:43
frickleryeah, sorry for being impatient, I was just hoping that with just a single commit per branch this whole thing would be done within hours and not take days again09:44
sean-k-mooneyfrickler: well that's why i started backporting https://review.opendev.org/c/openstack/nova/+/923831 2 weeks ago09:45
sean-k-mooneyi had hoped to have that in place before this hit09:45
sean-k-mooneythe previous cve was also impacted by the use of the uec images09:45
sean-k-mooneybut that ended up being partly mitigated 09:46
ykarelsean-k-mooney, ack atleast in our jobs we have not reproduced that issue with uec image10:01
fricklersean-k-mooney: I noticed that I do have stable core powers, so added +2, but leaving the approval for you10:06
sean-k-mooneyill give bauzas an hour or so and move forward if he or stephenfin dont have time to review https://review.opendev.org/c/openstack/nova/+/923831 by noon(utc+1)10:13
bauzashere now10:16
bauzashttps://review.opendev.org/c/openstack/nova/+/923831 +Wd10:17
sean-k-mooneybauzas: thanks10:17
opendevreviewMichael Still proposed openstack/nova master: libvirt: Add config option to require secure SPICE.  https://review.opendev.org/c/openstack/nova/+/92254410:49
opendevreviewMichael Still proposed openstack/nova master: libvirt: allow concurrent access to SPICE consoles.  https://review.opendev.org/c/openstack/nova/+/92254610:49
opendevreviewMichael Still proposed openstack/nova master: WIP: libvirt: Add guest devices to support SPICE USB.  https://review.opendev.org/c/openstack/nova/+/92254710:49
opendevreviewMichael Still proposed openstack/nova master: WIP: libvirt: Optionally enable SPICE debug logging.  https://review.opendev.org/c/openstack/nova/+/92254810:49
opendevreviewMichael Still proposed openstack/nova master: WIP: libvirt: Optionally support sound when using SPICE.  https://review.opendev.org/c/openstack/nova/+/92254910:49
opendevreviewMichael Still proposed openstack/nova master: libvirt: allow direct SPICE connections to qemu  https://review.opendev.org/c/openstack/nova/+/92484410:49
mikalsean-k-mooney: https://review.opendev.org/c/openstack/nova/+/924844/1 is my first swing at the reworking of the Nova APIs for the SPICE consoles.10:49
sean-k-mooneymikal: stephenfin is working on a parallel series to add openapi schema validation that might collide with that by the way but we can cross that bridge when we get there10:51
mikalsean-k-mooney: yeah, fair enough. Imma just gonna keep rebasing and uploading for now.10:51
sean-k-mooneymikal: currently my focus is unfortunately on backporting a cve fix upstream and downstream so while i skimmed that i probably wont be able to start reviewing properly until next week10:53
mikalsean-k-mooney: assuming that the bottom three of those pass CI (and the first two already have before), I'll move onto reimplementing the sound model as an extra spec next. There is also a patch in that stack which adds USB redirect devices, I am undecided if I should make the number of devices configurable in extra specs as well.10:53
sean-k-mooneyhopefully we will get most of the cve patches landed today10:53
mikalsean-k-mooney: yeah fair enough, I'm just a bit concerned about the code complete deadline, so I want to get them out there as quick as I can.10:54
sean-k-mooneywe make the number of serial ports configurable in the flavor extra specs and image properties i believe10:54
sean-k-mooneyso i dont see a harm in supporting both10:54
sean-k-mooneyi.e. supporting usb count in both flavor and image10:55
mikalI haven't read https://review.opendev.org/c/openstack/nova-specs/+/920687 in depth yet, but I will do so before doing any USB stuff which might conflict.10:55
sean-k-mooneyjust make sure if they disagree that you raise a flavor/image conflict exception in the api properly10:55
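
[editor's note] The flavor-versus-image rule described above could be sketched roughly as follows. This is a minimal illustration, not nova's code: the `FlavorImageConflict` class and `resolve_device_count` helper are hypothetical stand-ins, and no real extra spec or image property name is assumed.

```python
class FlavorImageConflict(Exception):
    """Hypothetical stand-in for the conflict error raised at the API layer."""


def resolve_device_count(flavor_value, image_value, default=0):
    """Pick the effective count when it may be set in flavor, image, or both.

    Both inputs are optional strings, mirroring how extra specs and image
    properties arrive as text. If both are set they must agree.
    """
    if flavor_value is not None and image_value is not None:
        if int(flavor_value) != int(image_value):
            raise FlavorImageConflict(
                f"flavor requests {flavor_value} but image requests {image_value}"
            )
        return int(flavor_value)
    if flavor_value is not None:
        return int(flavor_value)
    if image_value is not None:
        return int(image_value)
    return default
```

Failing at the API, before any scheduling happens, means the user sees the disagreement as a request error instead of a failed build.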
mikalIs there a good example of a previous extra spec which does what you want that I can cargo cult?10:55
sean-k-mooneymikal: am yes i can probably find one. likely the vPMU or vIOMMU ones10:56
sean-k-mooneyone sec ill grab the relevant review10:56
mikalsean-k-mooney: ok cool. No rush though, its basically my bed time so I wont look at this until tomorrow morning your time anyways.10:57
sean-k-mooneymikal: as prior art for allowing it to be set in flavor i would cite https://github.com/openstack/nova/blob/master/nova/api/validation/extra_specs/hw.py#L448-L459 as evidence :)10:57
sean-k-mooneymikal: https://github.com/openstack/nova/commit/326bc658eef04576230d5ba90d2a02bf32deee0310:58
sean-k-mooneymikal: that predates the flavor extra spec validation feature by the way10:59
mikalsean-k-mooney: oh, that makes me realise I probably should have written release notes for these changes...10:59
sean-k-mooneymikal: so when adding flavor extra specs please make sure you add a validator too11:00
sean-k-mooneymikal: yes you have two options on that front. start a single release note in the base patch that has a usable feature and update it in the rest11:00
sean-k-mooneyor have a separate release note for each separate addition of functionality11:01
mikalsean-k-mooney: yeah, that's not going to happen tonight. That can be tomorrow Michael's problem.11:01
sean-k-mooney:)11:01
sean-k-mooneywe sometimes put the release note in the api patch that enables the functionality, we normally keep that to the end of the series by the way11:02
mikalEither way I'd appreciate feedback, but I get that you're busy. I'll do release notes tomorrow before I move onto extra specs etc.11:02
sean-k-mooneymikal: well the main feedback is you put the api change before the driver change that implemented the feature11:02
sean-k-mooneythat is the wrong order 11:02
mikalIn my head the remaining patches (after the first three) are "more optional". You can live without USB redirection and sound if you need to. The feature is still useful to some people without those.11:03
sean-k-mooneyyou need to put the api change at the end because once the api is introduced the feature should be fully usable11:03
sean-k-mooneyoh actually11:03
mikalThe feature should be fully usable after patch three.11:03
sean-k-mooneyok ya i see11:03
mikal(Technically patch 3 would work on its own, but the others were already up there).11:04
sean-k-mooneythe rest are sound, debugging and usb11:04
sean-k-mooneyok then its fine as is11:04
mikalYeah, and I might drop debugging. I am undecided.11:04
sean-k-mooneyit probably does not need separate debug log config from our normal debug config11:04
mikalIts so verbose (and so weird because its GTK debugging) that its useful sometimes, but you have to be quite sad to turn it on.11:04
sean-k-mooneyunless your very very verbose11:05
mikalIts very very verbose debugs in the libvirt qemu logs.11:05
mikalIts orchestrating qemu to write debug logs, not nova.11:05
mikalIt helped when I was reverse engineering some bits of the protocol and occasionally crashing the hypervisor.11:05
sean-k-mooneymikal: oh ...11:06
sean-k-mooneyyou are aware we do not allow qemu command line options to be used in nova right11:06
mikalThe SPICE protocol docs are... not good.11:06
mikalsean-k-mooney: as in what happens in https://review.opendev.org/c/openstack/nova/+/922546/5/nova/virt/libvirt/config.py is banned?11:07
sean-k-mooneyso we cant proceed with https://review.opendev.org/c/openstack/nova/+/922546/5/nova/virt/libvirt/config.py11:07
mikalThat is a shame, because that feature is cool.11:07
mikalBut libvirt only exposed it as a command line flag.11:07
sean-k-mooneythen we need a libvirt change before we can support configuring that in nova11:08
sean-k-mooneyi have unfortunately had to do that dance several times11:08
mikalIts this -- https://www.spice-space.org/multiple-clients.html11:08
sean-k-mooneyif we use the qemu command elements directly it marks the domain as tainted and voids all support from most distros' virt teams11:09
sean-k-mooneyas a result we have had a strict ban on it upstream for as long as i have worked on openstack11:09
mikalI thought secure boot required it too though?11:09
sean-k-mooneynope11:09
mikalHerm, I no longer remember why I believe that <qemu:arg value='-global'/><qemu:arg value='ICH9-LPC.disable_s3=1'/> is required with secure boot.11:10
mikalOh, its required for Windows 11 support I think... https://github.com/quickemu-project/quickemu/issues/22011:10
sean-k-mooneymikal: are you perhaps using secure boot with the pc machine type or something like that11:10
sean-k-mooneymikal: because officially secure boot is considered experimental unless you're using q3511:11
mikalsean-k-mooney: no, I'm definitely using q35 with secure boot. This wasn't in nova though, it was another code base.11:12
mikalSaid other code base just has three examples of qemu command line setting: this ICH9 thing; nvme disks; and concurrent SPICE sessions.11:13
sean-k-mooneyich9 i think is either the sound device or the chipset11:13
mikalAnyways, I can drop that patch from the series if Nova doesn't want to support concurrent SPICE sessions, its just a nice feature to have.11:13
mikalYeah, ICH9 is a sound device.11:14
sean-k-mooneywell we can support it but we need libvirt to support it first11:14
sean-k-mooneymikal: ok well we dont add a sound device today if i recall11:14
mikalWell, its an IO controller hub, which does sound. Its one of the libvirt sound model options.11:14
sean-k-mooneyyep i have played with the setting in virt-manager before11:14
sean-k-mooneyi just dont recall if we generate that upstream or not11:15
mikalI'll just drop the SPICE concurrency patch for now. There is zero chance of me pursuing a libvirt patch right now.11:15
opendevreviewMerged openstack/nova stable/2024.1: Stop using split UEC image (mostly)  https://review.opendev.org/c/openstack/nova/+/92383112:46
fricklersean-k-mooney: bauzas: grenade jobs are failing on the 2023.1 patch, but should grenade actually still run there when zed is eom? https://review.opendev.org/c/openstack/nova/+/92473413:14
frickler(seems more of the setuptools woe happening there)13:16
frickleralso https://zuul.opendev.org/t/openstack/build/c8fbe90f70974d2f918ee7fa6da54575 looks recheckable to me13:16
sean-k-mooneyfrickler: we have debated that before. so its failing because of the setuptools issue right13:19
fricklersean-k-mooney: the grenade failures that I looked at, yes. note the failure is on the controller, the later failure on compute1 is just secondary13:21
sean-k-mooneyi skimmed them this morning and didnt see anything really related to the patch13:21
sean-k-mooneyi didnt really look into test_get_list_deleted_instance_actions closely13:22
sean-k-mooneyBuild of instance 38107228-1260-473d-b6a8-7f3148a135ce aborted: Failed to allocate the network(s), not rescheduling.: nova.exception.BuildAbortException: Build of instance 38107228-1260-473d-b6a8-7f3148a135ce aborted: Failed to allocate the network(s), not rescheduling.13:23
sean-k-mooneyok so that is also unrelated13:23
bauzas.... 13:24
sean-k-mooneyfrickler: so yes i think that will pass if we recheck, if the upper constraints bump for packaging has been merged13:31
sean-k-mooneyto fix the focal issues13:31
dansmithykarel: around?13:38
dansmithoh I guess sean-k-mooney and ralonsoh have already discussed the image content failure13:39
ralonsohdansmith, most probably will change the neutron jobs to use qcow2 images, we are still testing13:41
dansmithralonsoh: excellent13:41
sean-k-mooneydansmith: longterm we should fix tempest but that's the short term solution. for our 2024.1 branch we have merged the revert back to qcow14:15
sean-k-mooneyand we had previously green result on 2023.214:16
dansmithlong-term we should drop ami/uec/whatever support :)14:16
sean-k-mooney:) perhaps14:16
sean-k-mooneyalpine now provides cloud images by the way with cloud init in qcow format14:16
sean-k-mooneyso i might look at using those instead of cirros again if we continue to see kernel panics14:17
sean-k-mooneydansmith: ah i saw you responded on https://bugs.launchpad.net/cinder/+bug/2073413 as well14:19
dansmithyup14:19
sean-k-mooneyv2 qcow images might be safe if the lack of the feature mask means they cant have backing files14:20
sean-k-mooneybut if they can and we just didnt have a clean way to detect them before v314:20
sean-k-mooneythen im much less open to supporting them going forward14:20
* frickler would be really interested to see whether alpine works better than cirros14:21
dansmithyeah, I think they're probably safe here and we can exclude them, but I'm in no big rush to enable them.. that's pretty old at this point, none of the things I've tested use that old format (of course) and you can qemu-img convert your way to a v3 very easily14:22
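
[editor's note] For readers unfamiliar with the v2/v3 distinction discussed above: the qcow2 version and the backing-file offset both live in the first 20 bytes of the file, so they can be read without qemu. The sketch below is only an illustration of those header fields; it is not the format inspector being patched in oslo or nova, and `inspect_qcow2_header` is a hypothetical helper name.

```python
import struct

QCOW_MAGIC = b"QFI\xfb"  # first four bytes of every qcow2 file


def inspect_qcow2_header(data: bytes):
    """Return version and backing-file presence, or None if not qcow2.

    Layout assumed here (big-endian): magic at bytes 0-3, version at 4-7,
    backing_file_offset at 8-15, backing_file_size at 16-19.
    """
    if len(data) < 20 or data[:4] != QCOW_MAGIC:
        return None
    version, backing_offset, _backing_size = struct.unpack(">IQI", data[4:20])
    return {
        "version": version,
        # A zero offset means the image declares no backing file.
        "has_backing_file": backing_offset != 0,
    }
```

A usage example would be `inspect_qcow2_header(open(path, "rb").read(20))`, where `path` is whatever image file is being checked.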
sean-k-mooneyfrickler: i can hack a poc of that, i have a job that pulls from my file share but i can likely just update the url14:22
dansmithI'm working on a patch against oslo, but I don't think we need to apply it everywhere first14:22
ykarelsean-k-mooney, wrt your alternate solution re. configure tempest for image create/upload14:53
ykarelhave you checked whoami-rajat comment regarding issue in format inspector?14:54
dansmithykarel: which issue?14:54
ykarelas the images are created from snapshot, so tempest config doesn't seem involved here14:54
ykareldansmith, https://bugs.launchpad.net/nova/+bug/207394414:54
dansmithah14:56
dansmithperhaps bfv instances and cinder is uploading always as raw even if it's qcow2?14:57
dansmithor, cinder is setting the disk_format to ami, but uploading a qcow2 I guess14:58
ykarelwhoami-rajat, once around if you can check/confirm ^15:45
opendevreviewDan Smith proposed openstack/nova master: DNM re-enable UEC images for testing  https://review.opendev.org/c/openstack/nova/+/92486516:02
opendevreviewDan Smith proposed openstack/nova master: WIP: Remove AMI snapshot format special case  https://review.opendev.org/c/openstack/nova/+/92486616:02
_colbysean-k-mooney: bauzas: thanks. We were able to get instances running with the vgpu and the A40. Its still creating all the resource providers for every virtual function (event the ones not allowed/configured in nova) unlike the centos 8 machines. So we might have some issues with vgpu recycling and removing mdevs. Luckily for this host its going to be a single project using it and the vgpu sizes will stay the same.17:04
_colbyWe are using the latest Nvidia vgpu drivers17:05
dansmithmelwitt: sean-k-mooney: early signs show that patch combo repro's and fixes the problem17:09
sean-k-mooneydansmith: just back. that is good to hear17:11
dansmithI need to fix unit tests17:13
melwittsounds good but I think I missed, how does the snapshot metadata path get taken with rescue?17:13
melwittor maybe it's that rescue tempest tests make snapshots? 17:14
melwittok yeah I see that stable rescue test makes a snapshot to create a rescue image17:15
sean-k-mooneydansmith: so am i correct in thinking that there may be instance that have had snapshot created in the past for ami guess that are not the wrong format in glance17:22
dansmithI failed to parse that question17:22
sean-k-mooneydo you know if glance allow the format to be updated to reflect reality?17:22
dansmithglance does not allow updating the disk_format after creation (except internally due to image conversion)17:22
sean-k-mooneyack so if you booted a guest from a uec/ami image and took a snapshot17:23
sean-k-mooneythat was uploaded as ami17:23
dansmithyou're asking about existing snapshots that have an incorrect disk_format? that will be a problem for everyone, even not AMI.. like if you have always had isos or qcows registered as raw because nobody checked, those are all now broken17:23
sean-k-mooneythen we can nologner use that snapshot correct17:23
sean-k-mooneyright17:23
dansmithyes17:23
sean-k-mooneyok im wondering if we need to consider a tool or something to help with that case in the future17:24
sean-k-mooneybasically trying to think of if there is anything we can do for instances that are shelved with the wrong format for example17:25
dansmithyeah, I mean hacking the glance DB is probably the only way, other than a glance-manage automation of that17:27
dansmithI say we wait until someone demands it17:27
sean-k-mooneyya i also have no idea how possible it would even be if you're using glance with cinder as a backend for example17:29
dansmithzomg17:45
dansmithone of those tests was silently failing because a mock didn't set disk_format17:45
dansmithand now that I'm removing the line that was failing and being covered up by the test, we're actually running a test that uses ami and failing for other broken crap17:46
dansmithbecause a missing o.vo object attribute raises NotImplementedError, which the test is ignoring because some drivers don't implement snapshot17:47
dansmith*facepalm*17:47
dansmithactually, this test wasn't running for libvirt *at all* for that reason, image type aside17:48
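
[editor's note] The failure mode described above, a test that treats NotImplementedError as "driver does not support snapshot" and so hides an unset attribute on a mock, can be reproduced in miniature. Everything below is a hypothetical stand-in (`FakeImageMeta`, `snapshot`, `lenient_test`), not the actual nova test or object code.

```python
class FakeImageMeta:
    """Hypothetical object whose unset attributes raise NotImplementedError,
    imitating the versioned-object behaviour mentioned in the discussion."""

    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise NotImplementedError(f"Cannot load '{name}'")


def snapshot(image_meta):
    """Stand-in for a driver snapshot call that reads disk_format."""
    return {"disk_format": image_meta.disk_format}


def lenient_test(image_meta):
    """A test body that skips when the driver 'does not implement' snapshot."""
    try:
        snapshot(image_meta)
    except NotImplementedError:
        return "skipped"  # the missing attribute is silently mistaken for this
    return "ran"
```

With `FakeImageMeta()` the test reports "skipped" even though the driver does implement snapshot; only once `disk_format` is set does the body actually run.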
opendevreviewDan Smith proposed openstack/nova master: Remove AMI snapshot format special case  https://review.opendev.org/c/openstack/nova/+/92486618:05
dansmithmelwitt: sean-k-mooney: please review carefully ^18:05
fricklerbauzas: sean-k-mooney: dansmith: 2024.1 part of the cve fix seems ready now https://review.opendev.org/c/openstack/nova/+/92473218:07
dansmithfrickler: that had to wait for the de-AMI-ification yeah?18:08
dansmithasking because of its relevance to the above patch, although I guess the ship has sailed now18:09
fricklerfrom what I saw, it was only waiting on https://review.opendev.org/c/openstack/nova/+/923831 in order to fix jobs, but I may have missed something18:10
dansmiththat's the de-AMI-ification of which I speak18:11
fricklerah, that's done then, I'd say18:11
sean-k-mooneyfrickler: we will likely keep merging the backport even though there is a regresion in snapshot functionaltiy but we may give it a few days before proposing new releases18:55
sean-k-mooneyto also  include https://review.opendev.org/c/openstack/nova/+/92486618:56
sean-k-mooneyand its backports18:56
sean-k-mooneydansmith: from the commit message i was expecting larger changes which makes me sad as they are just subtler changes instead18:58
sean-k-mooneyi think i need to test this in devstack properly because im not sure i trust just unit test coverage in this case19:00
dansmiththe unit testing is garbage, as proved by the fact that they've been skipping this case on libvirt for ages19:05
dansmithbut yeah, please do19:05
sean-k-mooneyim basically concerned we might be really storing other things in the instance_system_metadata19:09
sean-k-mooneyso i want to boot an ami guest and look at the db19:10
dansmiththis change only really affects snapshot, fwiw19:11
*** bauzas_ is now known as bauzas19:17
sean-k-mooneyya so really just need to test snapshot, rebuild from snapshot, boot from snapshot and shelve19:29
sean-k-mooneyhad to fix a few things first but ok i have a guest booted from the ami formatted image20:28
opendevreviewMerged openstack/nova stable/2024.1: Change force_format strategy to catch mismatches  https://review.opendev.org/c/openstack/nova/+/92473220:30
sean-k-mooneyso with force_raw_images = True and use_cow_images = True20:32
sean-k-mooneyit creates the snapshot in qcow format20:32
sean-k-mooneywhich is the format of the root disk of the running vm but not the format of the original glance image or the backing file20:33
sean-k-mooneyim going to double check that now20:33
sean-k-mooneyand ill also save the image and check what it actually contains too20:34
sean-k-mooneyyes backing file is raw and root disk is a qcow and the snapshot was uploaded as a qcow20:35
sean-k-mooneyand it actually contains a qcow...20:38
sean-k-mooneyhttps://paste.opendev.org/show/bgiZgIIqkdSgDuotQFMO/20:38
sean-k-mooneydansmith: right... im testing your patch right now and i should be expecting it to be fixed 20:38
sean-k-mooneyso with force_raw_images = True and use_cow_images = True the new behavior is to snapshot in qcow2 format20:40
sean-k-mooneywe also flattened the image properly before uploading20:40
sean-k-mooneyill quickly do shelve, boot from snapshot and rebuild then flip the config options to not use cow images20:41
sean-k-mooneyso shelve, rebuild (to ami, to qcow snapshot from ami, to iso) work, as does boot from snapshot20:57
sean-k-mooneydansmith: so in the force_raw_images = True and use_cow_images = True case that mostly tracks20:57
sean-k-mooneythe only thing i was expecting to be different was i expected the snapshot to be raw but since we actually upload a qcow file and it is declared as a qcow then that's working as intended20:58
*** bauzas_ is now known as bauzas21:20
sean-k-mooneydansmith: i got this when i tried to shelve using raw images https://paste.opendev.org/show/bxtLREMcntQ73IvUdBDq/21:29
dansmithhrm, I wonder if that's the same bug that was fooling the tests21:29
sean-k-mooneyso snapshot actully works21:30
dansmithbecause it is looking for NotImplemented, which I thought was just the mock missing it21:30
sean-k-mooneybut snapshot when shelving does not21:30
sean-k-mooneyim just going to test snapshot on its own separately again quickly21:33
sean-k-mooneythen ill call it a day and have a look again in the morning21:33
sean-k-mooneyyep uploads fine in raw format as expected21:33
sean-k-mooneyweird21:37
sean-k-mooneyso if i do it again it works21:37
sean-k-mooneybut the only thing i changed is i deleted the old snapshot from the shelve and the one i did manually before trying again21:38
sean-k-mooneydansmith: ya so if i take a snapshot i cant shelve21:42
sean-k-mooneyif i delete the snapshot i can21:42
melwittsean-k-mooney: when you say using raw images do you mean you set images_type = raw or flat?21:44
sean-k-mooneysorry good question 21:44
sean-k-mooneyim using a cirros uec image21:45
sean-k-mooneyhttps://paste.opendev.org/show/b5LlnYZfb8Kq5g1RCiLz/21:45
sean-k-mooneyand i have set the https://paste.opendev.org/show/824881/21:46
sean-k-mooneyi have set the correct image properties for the ram disk21:46
melwittyeah I mean like a case where shelving the instance will result in a raw image being uploaded to glance (rather than qcow2)21:47
sean-k-mooneyso the kernel is uploaded as disk_format aki, the ramdisk as ari and the root filesystem as ami21:47
sean-k-mooneymelwitt: so i have use_cow_images = false and force_raw_images = true21:47
sean-k-mooneyso it should be uploaded as raw21:47
melwittI don't think so.. otherwise we wouldn't have hit the bug right? what's being uploaded is qcow221:48
melwittforce raw images means force backing files to be raw format21:48
sean-k-mooneywhat's being uploaded matches the format of the root disk with dan's patch21:49
sean-k-mooneymelwitt: yes but when you dont have use_cow_images = true it also forces the root disk to be raw21:49
melwittso if you boot an instance and it pulls a qcow2 image down from glance if force raw images is true it will convert the image to raw and use it as a backing file and then the instance disk will be qcow2 (assuming images_type = qcow2)21:49
sean-k-mooneyno21:50
sean-k-mooneyin that case if you have set use_cow_images = false and force_raw_images = true21:50
sean-k-mooneyand boot from a qcow in glance21:50
sean-k-mooneythe vm should use a raw root disk after we convert it with qemu-img21:50
sean-k-mooneyi have qcows so ill test that now to confirm21:51
melwittI don't think so, given those are the defaults21:51
melwittyou get a qcow2 image uploaded to glance when you snapshot, right?21:51
sean-k-mooneywe default to use_cow_images = true21:51
sean-k-mooneynot false21:51
melwittyes21:51
sean-k-mooneymelwitt: no as i said if i have "use_cow_images = false and force_raw_images = true" then the snapshot uploaded is raw21:52
melwittok, sorry I think I read what you wrote as opposite21:52
sean-k-mooneyand if i have "use_cow_images = true and force_raw_images = true" the snapshot is qcow 21:52
sean-k-mooneywith dans patch applied21:52
melwittok, yes21:52
sean-k-mooneyin both cases the vm was created from an ami formatted image in glance21:53
melwittok, cool. I had been thinking about the raw case so if that is working also then that answers what I was wondering21:54
sean-k-mooneyso i just booted from a qcow in glance with "use_cow_images = false and force_raw_images = true" and that results in a raw root disk as expected21:55
sean-k-mooneyill try snapshot and shelve quickly21:55
sean-k-mooneyyep so snapshot creates a raw snapshot21:56
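
[editor's note] The two configurations tested above, with the snapshot format observed for each once the patch under review is applied, can be written down as a small lookup. This records only what was reported in this conversation; combinations nobody tested are deliberately left out, and `observed_snapshot_format` is a hypothetical helper, not a nova function.

```python
# Keys are (force_raw_images, use_cow_images); values are the disk_format
# of the snapshot that was uploaded to glance in the manual test.
OBSERVED = {
    (True, True): "qcow2",  # raw backing file, qcow2 root disk
    (True, False): "raw",   # root disk converted to raw
}


def observed_snapshot_format(force_raw_images: bool, use_cow_images: bool) -> str:
    """Return the snapshot format reported for a tested option pair."""
    try:
        return OBSERVED[(force_raw_images, use_cow_images)]
    except KeyError:
        raise ValueError("combination not covered in the discussion") from None
```

In both tested cases the snapshot format follows the instance's root disk, regardless of whether the source image in glance was ami or qcow.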
melwittok. yeah I was thinking about it because of that old code comment "glance forces ami disk format to be ami" I don't understand what it meant. so I wondered if a raw is uploaded and not labeled "ami" will glance somehow have a problem with it21:56
melwittlike have a problem creating an instance from it21:57
sean-k-mooneyi think its just wrong21:57
sean-k-mooneyor at least it is today21:57
sean-k-mooneyi dont think glance has any idea what ami is21:58
sean-k-mooneyi can upload the 3 parts of the image as raw and it wont care21:58
melwittok. mystery comment then  I guess :)21:58
sean-k-mooneyso i get the same odd behavior when shelving an instance created from a glance qcow when i also have a snapshot21:59
sean-k-mooneyso it looks like for some reason the image backend is trying to use direct_snapshot22:02
sean-k-mooneythat fails and then we abort22:03
sean-k-mooneyhttps://paste.opendev.org/show/bKsmlsy1ClyGhEg1XTVp/22:03
melwittit actually does that for everything, tries direct_snapshot first and then falls back on regular snapshot if not implemented22:03
sean-k-mooneywe are not on ceph so we do not expect direct_snapshot to work22:04
melwittit's weird, not sure why it's done that way22:04
sean-k-mooneyits because for ceph we can sometimes do it and other times not22:04
melwitthttps://github.com/openstack/nova/blob/df39222b106326a4c28dee26b7127a61174d6b51/nova/virt/libvirt/driver.py#L320522:04
sean-k-mooneydepending on if glance and nova are on the same cluster i think22:04
melwittack22:05
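
[editor's note] The "try direct_snapshot first, fall back if not implemented" pattern described above looks roughly like this in isolation. The backend classes and `take_snapshot` are hypothetical stand-ins for illustration; the real logic is in the driver.py line linked above and handles more cases than shown here.

```python
class FlatBackend:
    """Stand-in for a local-disk backend with no direct snapshot support."""

    def direct_snapshot(self):
        raise NotImplementedError()

    def snapshot_extract(self):
        return "extracted-and-uploaded"


class CloneBackend(FlatBackend):
    """Stand-in for a backend (e.g. shared storage) that can snapshot in place."""

    def direct_snapshot(self):
        return "direct"


def take_snapshot(backend):
    """Prefer the direct path; fall back to extract-and-upload otherwise."""
    try:
        return backend.direct_snapshot()
    except NotImplementedError:
        # Expected on backends without direct support, so not an error.
        return backend.snapshot_extract()
```

This is why a "direct snapshot failed" message in the logs is expected on a non-ceph deployment and is not, by itself, the cause of the shelve failure.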
sean-k-mooneybut the odd part is dan is not changing any of that in https://review.opendev.org/c/openstack/nova/+/92486622:05
sean-k-mooneyit also does not make sense why this works if i dont already have a snapshot and why creating a snapshot separately also works22:07
melwitthuh you keep getting bad gateway from glance. are there any errors in the g-api log?22:07
sean-k-mooneyoh you think we are just logging incorrectly22:07
sean-k-mooneyill check22:07
melwittI don't see how 502 bad gateway can be related to your test or the patch but I have seen weird things in upstream CI before22:08
sean-k-mooneyi might have hit an image size quota22:09
melwittohhhh yeah that's what it is22:09
sean-k-mooneyDEBUG oslo.limit.limit [None req-635da87a-1d0b-45e9-b3d1-df7b8b2decf2 admin admin] hit limit for project: [Resource image_size_total is over limit of 1000 due to current usage 1849 and delta 0] 22:09
melwittwhen you exceed quota, glance gives a 502, and it confused me so much22:09
sean-k-mooneythat is really incorrect usage of http response codes22:09
melwittand every time I guess I have to re figure it out to find that it's a quota limit issue22:10
melwittyeah, seriously22:10
sean-k-mooneyi have horizon im just setting them to -1 for now22:11
sean-k-mooneylol of course the image quotas are not there22:12
melwittto do it manually it's openstack --os-cloud devstack-system-admin registered limit set --default-limit $limit --resource-name $name $registered_limit_id22:14
melwittto get the list openstack --os-cloud devstack registered limit list22:15
sean-k-mooneyisnt that for unified limits?22:15
sean-k-mooneyi guess glance uses that already22:15
melwittyeah, glance uses unified limits for its quotas22:15
melwittso I think -1 won't work as unlimited, but you can try if you want22:16
sean-k-mooneyit wont any more no22:16
sean-k-mooneywhy is the default 100022:16
sean-k-mooneywhich is 1G22:17
melwittqcow2 favoritism22:17
melwitt(I don't actually know)22:17
sean-k-mooneythat only works for like cirros, alpine or tinycore22:17
sean-k-mooneylike most cloud images are under a gig but barely22:18
melwittyeah there's that too22:18
melwittyou don't actually shouldn't pass --resource-name because it's by id and --resource-name is if you want to update the name. I made a mistake in the nova docs22:20
melwitt*you actually shouldn't22:21
sean-k-mooneyya i got a duplicate key error22:21
sean-k-mooneyConflict occurred attempting to store registered_limit - Duplicate entry. (HTTP 409) (Request-ID: req-f0f478fe-f682-4081-8837-bc89a09e79f2)22:21
sean-k-mooneybut i have updated it22:22
melwittyeah, sorry. I need to fix the doc I'm referencing 22:22
sean-k-mooneyno worries i would have given up already22:23
sean-k-mooneybut i want to see if this works22:23
sean-k-mooneyif it does then we have another bug to fix but dans patch looks good otherwise22:23
sean-k-mooneyok its working for me locally22:27
melwittok nice22:27
sean-k-mooneyi know the ship has kind of sailed on -1 but i really see that as a downgrade with unified limits...22:27
sean-k-mooneyi need a new way to write max_int with brevity and style22:28
melwittyeah, it's weird bc the keystone docs say specifically that -1 means unlimited but oslo.limit does something else22:28
sean-k-mooneylike --im-an-admin-do-what-i-say akek --sudo22:29
melwittand style 😂 22:29
sean-k-mooney*aka22:29
melwittceph has at least one funny CLI option like that22:29
sean-k-mooneyof course i could personally do with a --do-what-i-ment-not-what-typed option too22:30
sean-k-mooneythe one for when you're deleting things22:30
sean-k-mooneyya i always liked that22:30
melwittyeah. I love that22:30
sean-k-mooneyi feel like if we added that to reset-state i would feel a little better about that existing22:31
melwittsame22:31
sean-k-mooneyok im getting tired so im going to go get some rest and ill take a look at dansmith's patch again in the morning22:31
melwittok, seeya o/22:32
opendevreviewmelanie witt proposed openstack/nova master: docs: Correct unified limits CLI commands  https://review.opendev.org/c/openstack/nova/+/92488822:59
opendevreviewMerged openstack/nova master: [CI] Replace deprecated regex  https://review.opendev.org/c/openstack/nova/+/92221223:38
