Tuesday, 2025-11-25

*** mhen_ is now known as mhen02:30
ykarelgmaan, ack updated06:10
fricklergmaan: nova-ceph-multistore seems to be consistently failing now on https://review.opendev.org/c/openstack/devstack/+/968076 . I hope we didn't introduce a real regression due to lack of test coverage. Still, I'm wary of simply making the job non-voting now08:25
frickleryes, it is passing on other changes, like https://zuul.opendev.org/t/openstack/build/1aa05ba4148a46ba9f1bd547cb95c145 . :-( sean-k-mooney ^^ maybe you can check this?08:36
sean-k-mooneyfrickler: just getting started, but I'll take a look shortly09:13
sean-k-mooneytest_boot_cloned_encrypted_volume09:29
sean-k-mooneyfrickler: so we have passing runs like my patch above where that passes https://review.opendev.org/c/openstack/devstack/+/96824909:31
sean-k-mooneyso it's by no means universally broken; I guess the concern is that https://review.opendev.org/c/openstack/devstack/+/968076/1/lib/tempest#82909:32
sean-k-mooneyis a behavioral change that is causing the test case to fail09:32
sean-k-mooneylet me see if that test is running on the passing version or if it's skipped09:32
sean-k-mooneyhmm, this is also from cinder_tempest_plugin09:33
sean-k-mooneyI didn't think we ran that plugin in nova-ceph-multistore09:33
sean-k-mooneyI wonder if the test is new09:33
sean-k-mooneyno, it's running and passing on my patch09:35
sean-k-mooneyfrickler: ah, so this is a known issue we hit a few months ago09:38
sean-k-mooneyhttps://pb.teim.app/?90a8d20013a7b570#HH3B9MsRVHJAuFjtDvUKkLduZtwnELkFNAr5hYQFBHbq09:38
sean-k-mooneythe tl;dr is that cinder is using a cgroup-like mechanism, prlimit or something, to enforce some runtime restriction on the processutils.exec calls to qemu-img09:40
sean-k-mooneyin some cases that fails with 'qemu-img: /opt/stack/data/cinder/conversion/tmpzj1a9mr9.luks: error while converting luks: Unable to get accurate CPU usage\n'09:40
sean-k-mooneyit's the throttle_cmd bit of convert_image(tuple(throttle_cmd['prefix']), in the trace: File "/opt/stack/cinder/cinder/image/image_utils.py", line 553, in convert_image09:41
sean-k-mooneyhttps://tinyurl.com/3wjwjpuu09:45
sean-k-mooneyI don't know how long our logs are kept, but it looks like 250-ish failures in the last 2 weeks09:45
sean-k-mooneyfrickler: for now I have rechecked it again. the one thing that I do find interesting is that all the failures are on raxflex-*09:50
sean-k-mooneythose are KVM hosts using the pc machine type and a pretty old BIOS firmware. the passing case is on Xen on the older rax cloud09:53
sean-k-mooneyI don't know if I can trust09:54
sean-k-mooney ansible_bios_date: 04/01/201409:55
sean-k-mooney  ansible_bios_vendor: SeaBIOS09:55
sean-k-mooney  ansible_bios_version: 1.16.2-debian-1.16.2-109:55
sean-k-mooneythat would be the bookworm version https://packages.debian.org/source/oldstable/seabios09:56
sean-k-mooneywhich I guess may make sense09:56
sean-k-mooneyI was originally wondering if this was that Xen bug where sometimes we don't get all the CPUs we expect, but apparently not09:58
opendevreviewMerged openstack/devstack master: Revert "Cap stable/2025.2 network, swift, volume api_extensions for tempest"  https://review.opendev.org/c/openstack/devstack/+/96807614:01
fricklersean-k-mooney: oh, so completely unrelated to the change after all, just bad luck, thx for checking. and logs expire after 10 days or so for the opensearch access iirc14:23
fricklermaybe we can check with rax whether we can use a different machine type or something, I guess this could affect other customers of them, too?14:24
frickleralso do you know whether there might be some option for qemu-img that could be added as a workaround?14:27
sean-k-mooneyI doubt this is failing 100% of the time14:27
sean-k-mooneyi.e. I suspect we had more than 250 jobs run on the rax flex nodes in those 10 days14:28
sean-k-mooneyi.e. with that test14:28
sean-k-mooneybut yeah, it looks like bad timing, but I also think we need to eventually fix this in cinder14:28
sean-k-mooneyfrickler: with that said, I think there may be a qemu-img bug at play as well14:29
* sean-k-mooney is trying to remember all the context from a few weeks ago14:30
sean-k-mooneyfrickler: ok, so yeah, to work around a qemu-img bug related to address-space usage with their new async mode in librbd14:31
opendevreviewMaxim Sava proposed openstack/tempest master: Add image decompression import tests  https://review.opendev.org/c/openstack/tempest/+/96588914:31
sean-k-mooneywe moved the ceph job to debian 1214:31
sean-k-mooneyfrickler: then we hit this bug, which is fixed in the version in debian backports14:32
sean-k-mooneybut I never got around to enabling that14:32
sean-k-mooneyso a fix for this may be to move this job to debian 13 now that we have that14:32
sean-k-mooneyfrickler: I'll push a change for that to review and we can see if we want to proceed14:32
sean-k-mooneyfrickler: https://github.com/openstack/devstack-plugin-ceph/commit/536ebea559673d694c8f8deeafe1362e2c41c02114:33
sean-k-mooneyhttps://bugs.launchpad.net/ceph/+bug/211685214:33
frickler.oO(distro hopping ;)14:34
sean-k-mooneyyep, to avoid the buggy version of qemu-img14:34
sean-k-mooneythis was intended to be temporary, to be fair, but other than this rare failure it's been just as stable as ubuntu14:34
sean-k-mooneywe went from an almost-always failure to a very rare one14:35
fricklerclarkb: fungi: not sure if you're reading along anyway, but fyi the tinyurl from sean above shows yet another case of "raxflex is different and can break things", which we might want to forward to rax, too14:37
fungii was not following, catching up now sorry14:38
fricklerthe failure likely got more frequent recently as we enabled more capacity on raxflex14:38
fungiso maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=2336437 i guess?14:43
fungi"...a subtle bug on machines which are very fast..." (from dan berrangé's related qemu-devel post)14:49
fungiseems like the way rackspace flex may be different in this case is by having faster processors (which we already know)14:49
fungiif it needs https://gitlab.com/qemu-project/qemu/-/commit/145f12e then that will require qemu 10.0.0 or later, looks like?14:58
fungifrickler: sean-k-mooney: am i understanding the concern correctly?14:59
sean-k-mooneysorry, was looking at something else; reading back15:00
fungialso the error message in your opensearch query was first introduced by qemu 9.2.0, looks like15:01
sean-k-mooneyqemu-9.2.0-3.fc4215:02
fungiso in theory you'd start seeing it with qemu >=9.2,<10 if the error is related to the bug i mentioned15:02
sean-k-mooneyyeah, so there are 2 related issues15:03
sean-k-mooneylibrados15:03
fungiwell, qemu-9.2.0-3.fc42 is the fedora backport of the fix mentioned in the rh bug, not the upstream version it's fixed in15:03
sean-k-mooneyis internally rewriting how it works to use C++ coroutines and an async executor15:03
sean-k-mooneythat increased the virtual memory usage but not necessarily the RSS15:04
sean-k-mooneycausing a check in nova to hard fail15:04
sean-k-mooneyto work around that we swapped to debian 12 to have a version that does not have that behavior15:04
sean-k-mooneywhich uncovered this other bug15:04
sean-k-mooneydebian backports for 12 has a newer qemu-img that should have a fix for https://bugzilla.redhat.com/show_bug.cgi?id=2336437 as should 1315:05
sean-k-mooneyon the nova side, I have not gotten around to fixing our usage to check RSS rather than virtual memory15:05
sean-k-mooneyon the cinder side15:05
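A minimal sketch of the virtual-memory vs RSS distinction above, assuming the check reads the Linux /proc/<pid>/status fields VmSize (virtual size) and VmRSS (resident set). The function names and the 1 GiB limit are hypothetical and not nova's code. An address-space style limit counts the whole virtual size, so a process that reserves a lot of address space but touches little of it can trip such a limit while its RSS stays small.

```python
def parse_status_kib(status_text):
    """Return the kB-valued fields of a /proc/<pid>/status dump as ints (KiB)."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        # Memory fields look like "VmRSS:     12345 kB"; skip everything else.
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[key.strip()] = int(parts[0])
    return fields


def exceeds_limit(status_text, limit_kib, use_rss=True):
    """Hypothetical check: compare RSS (default) or virtual size to a limit."""
    fields = parse_status_kib(status_text)
    measured = fields["VmRSS"] if use_rss else fields["VmSize"]
    return measured > limit_kib


# Made-up sample: a large virtual reservation with a small resident set.
SAMPLE = """Name:\tqemu-img
VmSize:\t 4194304 kB
VmRSS:\t   65536 kB
"""

ONE_GIB_KIB = 1024 * 1024
print(exceeds_limit(SAMPLE, ONE_GIB_KIB, use_rss=False))  # True: 4 GiB virtual
print(exceeds_limit(SAMPLE, ONE_GIB_KIB))                 # False: 64 MiB resident
```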
fungilooks like debian 23 (trixie) has qemu 10 so would include dan's upstream fix15:06
fungier, debian 1215:06
sean-k-mooneyyou mean 13 (trixie)15:06
sean-k-mooneyyeah, so as a first step I was going to push a patch to just move the ceph jobs from 12->1315:06
sean-k-mooneyand see if that passes a few times15:06
fungioh, sorry yes. debian 12 (bookworm) has qemu 7.2 which pre-dates that check, but also bookworm-backports has qemu 1015:07
sean-k-mooneyif so, that should sidestep most of the issues, but I should also do a draft patch to change the nova check15:07
fungiso depending on whether or not you're installing qemu packages from -backports, you either have a version from before the check was added or a version which includes the fix for it15:07
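The version reasoning above can be sketched as a small classifier, assuming fungi's reading is right: the "Unable to get accurate CPU usage" check first appeared in qemu 9.2.0 and the upstream fix needs qemu 10.0.0. It only looks at upstream version numbers; distro builds such as Fedora's qemu-9.2.0-3.fc42 carry backported fixes that this cannot see. The function name is hypothetical.

```python
def classify_qemu(version):
    """Place an upstream qemu version relative to the CPU-usage check and its fix."""
    # Compare (major, minor) numerically so that "10.0.0" sorts after "9.2.0".
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (9, 2):
        return "predates check"      # e.g. bookworm's qemu 7.2
    if (major, minor) < (10, 0):
        return "possibly affected"   # the >=9.2,<10 range discussed above
    return "includes fix"            # e.g. qemu 10 from bookworm-backports or trixie


for v in ("7.2.0", "9.2.0", "10.0.0"):
    print(v, classify_qemu(v))
```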
opendevreviewFernando Ferraz proposed openstack/devstack-plugin-nfs master: [DNM] Test Glance over Cinder/NFS with NFS driver fixes  https://review.opendev.org/c/openstack/devstack-plugin-nfs/+/96540915:07
sean-k-mooneyfungi: yes15:08
fungiand yeah, switching to trixie you'll have a similar qemu version to what's in bookworm-backports15:09
sean-k-mooneywe discussed using backports a while ago, but it's not exactly clean. it requires some changes to devstack that are more invasive than just updating the package in files/ ...15:10
sean-k-mooneywe are mirroring backports, but debian backports require you to opt in per package to install from them15:11
sean-k-mooneyso it's not just a case of enabling the repo15:11
sean-k-mooneywe either need to pass -t backports? or similar, or install the package with <name>/backports or similar15:12
fungiwell, you don't *have* to opt in per-package, you can do it per invocation of apt by using -t, but generally yes it requires some adjustment because backports isn't meant to be something you install everything from15:12
sean-k-mooneyyep, so I tried the targeted approach when I had a free afternoon but ran out of time because I needed to install each dep as well15:13
fungitechnically you can make that happen by overriding the priority for that suite in apt's configuration, but i don't recommend it. though you can configure apt pinning to backports for specific packages too15:13
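The two per-invocation options above, sketched as apt-get argument lists: apt's -t (target release) option, or the package/release suffix on individual names. The suite name bookworm-backports and the package names are assumptions for illustration; sean-k-mooney's experience above suggests dependencies are the sticking point with the per-package form.

```python
def apt_install_cmd(packages, release=None, per_package=False):
    """Build an apt-get argv that installs packages, optionally from a given release."""
    cmd = ["apt-get", "install", "-y"]
    if release is None:
        return cmd + list(packages)
    if per_package:
        # name/release selects that release only for the named packages.
        return cmd + [f"{name}/{release}" for name in packages]
    # -t makes the whole release the target release for this one invocation.
    return cmd + ["-t", release] + list(packages)


print(apt_install_cmd(["qemu-system-x86", "qemu-utils"], "bookworm-backports"))
print(apt_install_cmd(["qemu-utils"], "bookworm-backports", per_package=True))
```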
sean-k-mooneygibi got a PoC working that way15:14
sean-k-mooneybut the simple way to do this via backports would be to have an extra if here15:14
sean-k-mooneyhttps://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt#L7215:14
sean-k-mooneyfor debian, if it's debian 1215:14
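A sketch of the extra condition proposed for functions-libvirt, written in Python rather than devstack's bash and assuming the distro is identified from the ID and VERSION_ID fields of /etc/os-release. The function name is hypothetical; devstack has its own distro detection, which this does not reproduce.

```python
def needs_qemu_backports(os_release_text):
    """Return True only on Debian 12 (bookworm), where the newer qemu is in backports."""
    info = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # os-release values may be quoted, e.g. VERSION_ID="12".
            info[key.strip()] = value.strip().strip('"')
    return info.get("ID") == "debian" and info.get("VERSION_ID") == "12"


BOOKWORM = 'ID=debian\nVERSION_ID="12"\n'
TRIXIE = 'ID=debian\nVERSION_ID="13"\n'
print(needs_qemu_backports(BOOKWORM), needs_qemu_backports(TRIXIE))
```

Trixie is excluded because, per the discussion, its own qemu is already new enough.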
fungiyeah, you could instead drop a conffile in /etc/apt/apt.conf.d/ that basically says always install qemu packages from backports15:16
sean-k-mooneydo you have an example of that? I could not find one, but if there is one that would be a path forward15:17
sean-k-mooneywe could do that in the fixup-stuff methods15:17
fungisean-k-mooney: i'm looking for a more concise example, but https://linuxconfig.org/debian-pinning-howto is pretty close15:20
fungiand it's /etc/apt/preferences.d/ you'd put it under actually15:21
opendevreviewSean Mooney proposed openstack/devstack-plugin-ceph master: move ceph jobs to debian 13  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/96833515:22
sean-k-mooneyah 15:23
sean-k-mooneyPackage: *15:23
sean-k-mooneyPin: release a=stable15:23
sean-k-mooneyPin-Priority: 90015:23
sean-k-mooneyso I could probably do *qemu* and *libvirt* or similar15:23
sean-k-mooneyah via /etc/apt/preferences15:24
fungiyes, i recommend using the /etc/apt/preferences.d/ directory so you don't have to worry about file-level overwrites/conflicts with your config additions15:25
sean-k-mooneyok, I can also try that. I still think moving to debian 13 would make sense, but we can do both to have something we could backport for stable branches15:26
fungithe default priorities for each suite are encoded into their release files, so you only have to worry about installing configuration for your overrides15:26
sean-k-mooneyyep15:26
fungiand yes, if this is for testing openstack master branches then debian 13 is what the pti says we're going to test anyway15:27
funginow that our package mirror network should be working for it, hopefully it's pretty trivial to get going15:27
sean-k-mooneyok, well, we can review the patch above with a view to this cycle, but let me spend 20 mins creating a patch for devstack to use the backports qemu on debian 12 separately15:28
fungihttps://superuser.com/questions/678396/apt-pinning-several-packages-in-one-section has some examples15:29
fungifrom over a decade ago, but things haven't changed much with this mechanism15:29
sean-k-mooneyI'd say it should be something like this https://pb.teim.app/?6bec1eccfb89cf14#4P3UUs6JMJt3jcXpNc3e1C9cEmwDeN9mxj6Eh1Gmw7de15:29
sean-k-mooneybased on https://manpages.debian.org/testing/apt/apt_preferences.5.en.html15:30
sean-k-mooneywhich seems to be consistent with https://manpages.debian.org/testing/apt/apt_preferences.5.en.html#Matching_packages_in_the_Package_field15:30
sean-k-mooneyPackage is a space-separated list with shell-style glob support15:30
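A sketch of generating such a stanza for a file under /etc/apt/preferences.d/, as fungi recommends. It assumes the bookworm-backports suite (matched here by codename with n=) and the 990 priority discussed next. Python's fnmatch is used only to preview which package names the patterns would cover and merely approximates apt's own matching; the package names are assumptions.

```python
from fnmatch import fnmatchcase


def render_pin(patterns, release, priority):
    """Render one apt_preferences(5) stanza pinning matching packages to a release."""
    return (
        f"Package: {' '.join(patterns)}\n"
        f"Pin: release n={release}\n"
        f"Pin-Priority: {priority}\n"
    )


def preview_matches(patterns, package_names):
    """Approximate which package names the glob patterns would select."""
    return sorted(
        name for name in package_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


PATTERNS = ["*qemu*", "*libvirt*"]
print(render_pin(PATTERNS, "bookworm-backports", 990), end="")
print(preview_matches(PATTERNS, ["qemu-utils", "libvirt-daemon", "python3-libvirt", "ceph-common"]))
```

Note that a leading * also catches names like python3-libvirt; whether that is wanted is a judgment call for the real preferences file.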
fungiyeah, i'd have to refresh my memory on whether 900 is high enough priority to overcome the stable default15:31
sean-k-mooneyI don't know if you have tried perplexity.ai as a search tool, but I like that it provides citations for the sources it used to make the recommendation, so you can click in and verify them15:32
fungiah, yeah priority >=1000 is only needed if you want apt to downgrade to the target version (otherwise it avoids downgrading)15:33
sean-k-mooneythe man pages have coverage of the priorities https://manpages.debian.org/testing/apt/apt_preferences.5.en.html#How_APT_Interprets_Priorities15:33
sean-k-mooneyso it looks like 99015:33
sean-k-mooneymight be more correct15:33
sean-k-mooneyalthough I think anything over 500 would work for us15:34
fungiyes, i think you want >=990 according to the manpage15:34
fungi"causes a version to be installed even if it does not come from the target release, unless the installed version is more recent"15:35
sean-k-mooneyyeah, and we are not passing a version15:35
sean-k-mooneyso I think we need to raise it to 990 as a result15:35
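A simplified model of the candidate selection behind the 900 vs 990 question. It assumes the documented defaults: 500 for an ordinary release, 100 for a NotAutomatic/ButAutomaticUpgrades suite such as backports, and 990 for whatever -t names. apt picks the available version with the highest pin priority (ties go to the higher version) and will not downgrade below the installed version unless the priority is at least 1000. This ignores the 0 < P < 100 band and the rest of apt's resolver, and the version tuples are made up.

```python
def pick_candidate(available, installed=None):
    """Pick the version apt would install from (version, priority) pairs.

    Simplified: the highest priority wins, ties go to the higher version,
    non-positive priorities are never chosen, and a version older than the
    installed one is only eligible when its priority is >= 1000.
    """
    eligible = [
        (priority, version)
        for version, priority in available
        if priority > 0
        and (installed is None or version >= installed or priority >= 1000)
    ]
    if not eligible:
        return installed
    # max() compares priority first, then version, matching the tie-break rule.
    return max(eligible)[1]


STABLE = ((7, 2), 500)       # made-up: qemu from bookworm
BACKPORTS = ((10, 0), 100)   # made-up: qemu from bookworm-backports, default pin

print(pick_candidate([STABLE, BACKPORTS]))                   # (7, 2)
print(pick_candidate([STABLE, ((10, 0), 900)]))              # (10, 0)
print(pick_candidate([STABLE], installed=(10, 0)))           # (10, 0)
print(pick_candidate([((7, 2), 1001)], installed=(10, 0)))   # (7, 2)
```

Under this model a 900 pin already beats the 500 default, which matches "anything over 500"; 990 only makes a difference if some other source is raised to 990 via -t.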
sean-k-mooneyI mean, we will see in either case, although I'm going to create a debian 12 VM to test this locally to avoid the CI round trip15:36
sean-k-mooneyfungi: it's kind of a novel experience to have an issue because our CI VMs are too fast, for once15:42
sean-k-mooneynot that I'm particularly complaining, it's a good problem to have15:43
fungiheh, indeed!15:44
sean-k-mooneylol, for the first time ever I hit the kernel panic that cirros gets, locally, while booting a debian 12 image15:55
sean-k-mooneyand of course it booted fine on a reboot15:55
clarkbfrickler: fungi: I'm not sure complaining to a cloud that their cpus are too fast is something I want to do16:04
clarkbthe last time someone did that HPCloud instituted draconian anti noisy neighbor measures that made the cloud useless for us16:04
sean-k-mooneyclarkb: no one is suggesting that, at least not seriously16:05
sean-k-mooneythis is a qemu bug16:05
sean-k-mooneynot an openstack one16:05
fricklerclarkb: yes, that's not what I meant, I was assuming earlier from sean's comments that the old bios or machine type might be part of the trigger for this16:05
clarkbgot it, so we thought maybe it was a config thing, but now we realize it's a software bug in qemu and we should focus on addressing that16:06
sean-k-mooneyclarkb: I thought it might have been failing on the Xen hosts where we had the kernel issue where not all the CPU cores were available16:06
sean-k-mooneybut it was the opposite: it was passing on Xen on the old rax hardware16:07
fungibut in this case it's a kvm provider not xen16:07
fungiright16:07
sean-k-mooneycorrect, it was passing on the slow CPUs because slow CPUs don't trigger the qemu bug (apparently)16:07
clarkbI guess luks is trying to determine how many cycles it needs to hash secrets based on cpu speed? I wonder if you can configure it to a value instead and skip that step?16:08
sean-k-mooneyI'm not sure if that is the reason16:09
sean-k-mooneybut I don't think we could without changing cinder/nova to invoke qemu-img differently in any case16:09
sean-k-mooneyi.e. I don't think qemu-img has a global config file to change that behavior16:10
clarkback16:10
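Assuming clarkb's guess is right and the error comes from qemu timing a PBKDF2 run against CPU time to choose an iteration count, the failure mode can be illustrated with a small Python analogue: if the sample finishes before any CPU time is measured, no rate can be computed. This sketch uses hashlib.pbkdf2_hmac and time.process_time and is not qemu's algorithm; the sample sizes are arbitrary and the error string is borrowed from the log purely for illustration.

```python
import hashlib
import time


def calibrate_iterations(target_seconds=0.05, min_sample_seconds=0.01,
                         max_iterations=1 << 24, clock=time.process_time):
    """Estimate how many PBKDF2-SHA256 iterations take about target_seconds of CPU."""
    iterations = 1 << 10
    while iterations <= max_iterations:
        start = clock()
        hashlib.pbkdf2_hmac("sha256", b"passphrase", b"salt", iterations)
        elapsed = clock() - start
        if elapsed >= min_sample_seconds:
            # Scale the measured rate to the requested amount of CPU time.
            return max(1, int(iterations * target_seconds / elapsed))
        # Too fast to measure reliably: giving up here would be the failure
        # mode; instead retry with a larger sample.
        iterations *= 2
    raise RuntimeError("Unable to get accurate CPU usage")


print(calibrate_iterations())
```

Passing a clock that never advances (for example lambda: 0.0) with a small max_iterations reproduces the error path.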
opendevreviewMerged openstack/grenade stable/2025.1: Update grenade-skip-level-always job FROM branch  https://review.opendev.org/c/openstack/grenade/+/96371416:20
opendevreviewMaxim Sava proposed openstack/tempest master: Add image decompression import tests  https://review.opendev.org/c/openstack/tempest/+/96588916:47
sean-k-mooneyfungi: creating a debian VM took way longer than I planned because of dumb reasons, but I think I have a preference file that works locally; I just need to make devstack generate it and test it end to end.16:53
clarkbsean-k-mooney: are we doing luks in a nested VM? or does qemu-img run this stuff unconditionally?16:54
sean-k-mooneyclarkb: this is failing in the ceph job that tests encrypted ceph volumes16:54
clarkbthe other thought that occurred to me is whether we should be using luks at all, but it isn't clear to me if we do that intentionally16:54
clarkbaha ok so it is a necessary prereq of the test case16:55
sean-k-mooneyyep, it's failing in the test that confirms you can clone luks-encrypted volumes16:55
sean-k-mooneybut that's also why only 1 test fails16:55
clarkbit is probably possible to use something other than qemu-img to manage luks volumes/devices16:59
clarkbbut that also probably implies big changes to nova?16:59
sean-k-mooneyprobably, but why would we do that for a bug that is already fixed upstream?16:59
sean-k-mooneyi.e. why replace something that works because of a temporary bug17:00
clarkbthe primary reason would be because the bug isn't expected to get fixed in the platforms people deploy on17:01
clarkb(I don't know if that is the case but it is common for distros to not fix issues in their stable releases unless they are security issues or hard crashes)17:01
sean-k-mooneyso the command that is failing is the image conversion17:02
sean-k-mooneysudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O luks -f raw -o cipher-alg=aes-256,cipher-mode=xts,ivgen-alg=plain64 --object secret,id=luks_sec,format=raw,file=/opt/stack/data/cinder/conversion/luks_qj1kdpp8 -o key-secret=*** /opt/stack/data/cinder/conversion/tmpzj1a9mr9 /opt/stack/data/cinder/conversion/tmpzj1a9mr9.luks17:02
clarkbI guess with debian the fix may be via the backports repo17:02
clarkbwhich is probably sufficient for our needs17:03
sean-k-mooneyso it's converting the raw image into a luks-encrypted data stream that is written to the backing volume17:03
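The failing conversion, rebuilt as an argument list in a minimal sketch. The optional prefix stands in for tuple(throttle_cmd['prefix']) from the convert_image trace mentioned earlier. The function and parameter names are hypothetical, and the key-secret value is a parameter because the log masks it as ***. Building a list rather than a shell string avoids quoting problems with the comma-separated --object and -o values.

```python
def luks_convert_cmd(src, dst, secret_file, secret_id, prefix=()):
    """Build a qemu-img argv converting a raw image into a LUKS-encrypted one."""
    return tuple(prefix) + (
        "qemu-img", "convert",
        "-O", "luks",   # output format
        "-f", "raw",    # input format
        "-o", "cipher-alg=aes-256,cipher-mode=xts,ivgen-alg=plain64",
        # The passphrase is read from a file through a secret object...
        "--object", f"secret,id={secret_id},format=raw,file={secret_file}",
        # ...which the LUKS output refers to by that object's id.
        "-o", f"key-secret={secret_id}",
        src, dst,
    )


print(" ".join(luks_convert_cmd("/tmp/image.raw", "/tmp/image.luks", "/tmp/luks_key", "sec0")))
```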
fungiclarkb: yes, the fix is in the version in bookworm-backports and also the version in trixie proper17:03
clarkbfungi: ack thanks17:03
fungiat least if it's the bug we think it is17:04
fungiwhich seems probable17:04
sean-k-mooneyit was not in ubuntu last summer, but it may be now; we would need to check if it ever made it to noble17:04
sean-k-mooneys/last summer/in august/17:05
sean-k-mooneyfungi: https://termbin.com/t48q seems to work as the preference file content17:06
sean-k-mooneyI'm thinking of doing this in https://github.com/openstack/devstack/blob/master/tools/fixup_stuff.sh17:07
sean-k-mooneybut do you have any other suggestion?17:08
sean-k-mooneyi could just put it in https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt17:08
sean-k-mooneywith the actual package installation as well17:08
sean-k-mooneyI guess we enable the virt preview repo for fedora there17:08
fungii don't really have any architectural opinions on devstack itself, since i'm not helping maintain it17:08
sean-k-mooneyso it's kind of the same17:08
sean-k-mooneyI think I'll leave it in functions-libvirt as it's a little more discoverable17:09
opendevreviewSean Mooney proposed openstack/devstack master: use qemu/libvirt from backport repos on debian 12  https://review.opendev.org/c/openstack/devstack/+/96835417:34
sean-k-mooney1} cinder_tempest_plugin.scenario.test_volume_encrypted.TestEncryptedCinderVolumes.test_boot_cloned_encrypted_volume [138.328944s] ... ok18:29
sean-k-mooneythat's still running, and 1 success for a race condition does not prove anything18:29
sean-k-mooneybut https://review.opendev.org/c/openstack/devstack/+/968354 seems to be working18:29
opendevreviewMerged openstack/grenade master: Drop reference to removed services  https://review.opendev.org/c/openstack/grenade/+/93547418:31
fricklersean-k-mooney: nice, we could backport that to stable branches and switch to trixie for master anyway I guess18:54
sean-k-mooneyyep, I'll do a DNM from nova to get a little more coverage18:56
sean-k-mooneymaybe also cinder18:56
sean-k-mooneyfor both patches, but that was what I was thinking18:57
frickler+118:57
sean-k-mooneyfrickler: do you know how often we typically bump the ceph version in these jobs? I don't think that's currently covered by our testing runtimes, is it?18:58
sean-k-mooneyit looks like we did it 8 months ago https://github.com/openstack/devstack-plugin-ceph/commit/9820be32934a4a074415e320ce757a96c973bc2818:59
sean-k-mooneyI ask because the "tentacle" release replaced "squid" as the most recent release last week (2025-11-18)19:01
sean-k-mooneyso I was wondering if I should propose a mail or something to the dev list to consider moving to that this cycle?19:02
sean-k-mooneywe install ceph from a container, and you can override the version in the job if you like, so it's not urgent either way19:03
fricklerI think gouthamr was doing more on this. I personally would be very conservative in terms of upgrading ceph, at least for production environments; I wouldn't switch to a new ceph release until at least 6 months have passed19:27
*** haleyb is now known as haleyb|out22:51
