Tuesday, 2025-11-25

*** mhen_ is now known as mhen		02:30
ykarel	gmaan, ack updated	06:10
frickler	gmaan: nova-ceph-multistore seems to be consistently failing now on https://review.opendev.org/c/openstack/devstack/+/968076 , I hope we didn't introduce a real regression due to lack of test coverage. still I'm wary of simply making the job non-voting now	08:25
frickler	yes, it is passing on other changes, like https://zuul.opendev.org/t/openstack/build/1aa05ba4148a46ba9f1bd547cb95c145 . :-( sean-k-mooney ^^ maybe you can check this?	08:36
sean-k-mooney	frickler: just getting started but i'll take a look shortly	09:13
sean-k-mooney	test_boot_cloned_encrypted_volume	09:29
sean-k-mooney	frickler: so we have passing runs like my patch above where that passes https://review.opendev.org/c/openstack/devstack/+/968249	09:31
sean-k-mooney	so it's by no means universally broken, i guess the concern is https://review.opendev.org/c/openstack/devstack/+/968076/1/lib/tempest#829	09:32
sean-k-mooney	is the behavioral change that is causing the test case to fail	09:32
sean-k-mooney	let me see if that test is running on the passing version or if it's skipped	09:32
sean-k-mooney	hmm, this is also from cinder_tempest_plugin	09:33
sean-k-mooney	i didn't think we ran that plugin in nova-ceph-multistore	09:33
sean-k-mooney	i wonder if the test is new	09:33
sean-k-mooney	no, it's running and passing on my patch	09:35
sean-k-mooney	frickler: ah so this is a known issue we hit a few months ago	09:38
sean-k-mooney	https://pb.teim.app/?90a8d20013a7b570#HH3B9MsRVHJAuFjtDvUKkLduZtwnELkFNAr5hYQFBHbq	09:38
sean-k-mooney	the tldr is cinder is using a cgroup-like mechanism, prlimit or something, to enforce some runtime restrictions on the processutils.exec calls to qemu-img	09:40
sean-k-mooney	in some cases that fails with 'qemu-img: /opt/stack/data/cinder/conversion/tmpzj1a9mr9.luks: error while converting luks: Unable to get accurate CPU usage\n'	09:40
sean-k-mooney	it's the throttle_cmd bit of convert_image(tuple(throttle_cmd['prefix']), in the trace File "/opt/stack/cinder/cinder/image/image_utils.py", line 553, in convert_image	09:41
sean-k-mooney	https://tinyurl.com/3wjwjpuu	09:45
sean-k-mooney	i don't know how long our logs are kept, but it looks like 250-ish failures in the last 2 weeks	09:45
sean-k-mooney	frickler: for now i have rechecked it again. the one thing that i do find interesting is all the failures are on raxflex-*	09:50
sean-k-mooney	those are kvm hosts using the pc machine type and a pretty old bios firmware. the passing case is on xen on the older rax cloud	09:53
sean-k-mooney	i don't know if i can trust	09:54
sean-k-mooney	ansible_bios_date: 04/01/2014	09:55
sean-k-mooney	ansible_bios_vendor: SeaBIOS	09:55
sean-k-mooney	ansible_bios_version: 1.16.2-debian-1.16.2-1	09:55
sean-k-mooney	that would be the bookworm version https://packages.debian.org/source/oldstable/seabios	09:56
sean-k-mooney	which i guess may make sense	09:56
sean-k-mooney	i was originally wondering if this was that xen bug where sometimes we don't get all the cpus we expect, but apparently not	09:58
opendevreview	Merged openstack/devstack master: Revert "Cap stable/2025.2 network, swift, volume api_extensions for tempest" https://review.opendev.org/c/openstack/devstack/+/968076	14:01
frickler	sean-k-mooney: oh, so completely unrelated to the change after all, just bad luck, thx for checking. and logs expire after 10 days or so for the opensearch access iirc	14:23
frickler	maybe we can check with rax whether we can use a different machine type or something, I guess this could affect other customers of them, too?	14:24
frickler	also do you know whether there might be some option for qemu-img that could be added as a workaround?	14:27
sean-k-mooney	i doubt this is failing 100% of the time	14:27
sean-k-mooney	i.e. i suspect we had more than 250 jobs run on the rax flex nodes in those 10 days	14:28
sean-k-mooney	i.e. with that test	14:28
sean-k-mooney	but ya it looks like bad timing, but i also think we need to eventually fix this in cinder	14:28
sean-k-mooney	frickler: with that said i think there may be a qemu-img bug at play as well	14:29
* sean-k-mooney is trying to remember all the context from a few weeks ago		14:30
sean-k-mooney	frickler: ok so ya, to work around a qemu-img bug related to address space usage with their new async mode in librbd	14:31
opendevreview	Maxim Sava proposed openstack/tempest master: Add image decompression import tests https://review.opendev.org/c/openstack/tempest/+/965889	14:31
sean-k-mooney	we moved the ceph job to debian 12	14:31
sean-k-mooney	frickler: then we hit this bug which is fixed in the version in debian backports	14:32
sean-k-mooney	but i never got around to enabling that	14:32
sean-k-mooney	so a fix for this may be to move this job to debian 13 now that we have that	14:32
sean-k-mooney	frickler: i'll push a change for that to review and we can see if we want to proceed	14:32
sean-k-mooney	frickler: https://github.com/openstack/devstack-plugin-ceph/commit/536ebea559673d694c8f8deeafe1362e2c41c021	14:33
sean-k-mooney	https://bugs.launchpad.net/ceph/+bug/2116852	14:33
frickler	.oO(distro hopping ;)	14:34
sean-k-mooney	yep, to avoid the buggy version of qemu-img	14:34
sean-k-mooney	this was intended to be temporary to be fair, but other than this rare failure it's been just as stable as ubuntu	14:34
sean-k-mooney	we went from an almost-always failure to a very rare one	14:35
frickler	clarkb: fungi: not sure if you're reading along anyway, but fyi the tinyurl from sean above shows yet another case of "raxflex is different and can break things", which we might want to forward to rax, too	14:37
fungi	i was not following, catching up now sorry	14:38
frickler	the failure likely got more frequent recently as we enabled more capacity on raxflex	14:38
fungi	so maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=2336437 i guess?	14:43
fungi	"...a subtle bug on machines which are very fast..." (from dan berrangé's related qemu-devel post)	14:49
fungi	seems like the way rackspace flex may be different in this case is by having faster processors (which we already know)	14:49
fungi	if it needs https://gitlab.com/qemu-project/qemu/-/commit/145f12e then that will require qemu 10.0.0 or later, looks like?	14:58
fungi	frickler: sean-k-mooney: am i understanding the concern correctly?	14:59
sean-k-mooney	sorry, was looking at something else, reading back	15:00
fungi	also the error message in your opensearch query was first introduced by qemu 9.2.0, looks like	15:01
sean-k-mooney	qemu-9.2.0-3.fc42	15:02
fungi	so in theory you'd start seeing it with qemu >=9.2,<10 if the error is related to the bug i mentioned	15:02
sean-k-mooney	ya so there are 2 related issues	15:03
sean-k-mooney	librados	15:03
fungi	well, qemu-9.2.0-3.fc42 is the fedora backport of the fix mentioned in the rh bug, not the upstream version it's fixed in	15:03
sean-k-mooney	is internally rewriting how it works to use c++ coroutines and an async executor	15:03
sean-k-mooney	that increased the virtual memory usage but not necessarily the RSS	15:04
sean-k-mooney	causing a check in nova to hard fail	15:04
sean-k-mooney	to work around that we swapped to debian 12 to have a version that does not have that behavior	15:04
sean-k-mooney	which uncovered this other bug	15:04
sean-k-mooney	debian backports in 12 has a newer qemu-img that should have a fix for https://bugzilla.redhat.com/show_bug.cgi?id=2336437 as should 13	15:05
sean-k-mooney	on the nova side i have not got around to fixing our usage to check RSS not virtual memory	15:05
sean-k-mooney	on the cinder side	15:05
fungi	looks like debian 23 (trixie) has qemu 10 so would include dan's upstream fix	15:06
fungi	er, debian 12	15:06
sean-k-mooney	you mean 13 (trixie)	15:06
sean-k-mooney	ya so first step i was going to push a patch to just move the ceph jobs from 12->13	15:06
sean-k-mooney	and see if that passes a few times	15:06
fungi	oh, sorry yes. debian 12 (bookworm) has qemu 7.2 which pre-dates that check, but also bookworm-backports has qemu 10	15:07
sean-k-mooney	if so that should sidestep most of the issues, but i should also do a draft patch to change the nova check	15:07
fungi	so depending on whether or not you're installing qemu packages from -backports, you either have a version from before the check was added or a version which includes the fix for it	15:07
opendevreview	Fernando Ferraz proposed openstack/devstack-plugin-nfs master: [DNM] Test Glance over Cinder/NFS with NFS driver fixes https://review.opendev.org/c/openstack/devstack-plugin-nfs/+/965409	15:07
sean-k-mooney	fungi: yes	15:08
fungi	and yeah, switching to trixie you'll have a similar qemu version to what's in bookworm-backports	15:09
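The version reasoning above can be sketched in Python, purely for illustration. The function names and result labels are hypothetical, and the thresholds (check introduced in 9.2, upstream fix in 10.0) are taken from this discussion rather than checked against qemu's history. A distro build inside the 9.2 range may already carry a backported fix, as noted for qemu-9.2.0-3.fc42, so that range is only labelled "possibly-affected".

```python
import re


def parse_upstream_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a string like '9.2.0' or '1:10.0.2+ds-2'."""
    # Drop a Debian-style epoch prefix ("1:") if one is present.
    upstream = version.split(":", 1)[-1]
    match = re.match(r"(\d+)\.(\d+)", upstream)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def classify_qemu(version: str) -> str:
    """Classify a qemu version using the thresholds assumed in the chat."""
    major_minor = parse_upstream_version(version)
    if major_minor < (9, 2):
        # Older than the release said to have introduced the check.
        return "predates-check"
    if major_minor < (10, 0):
        # Has the check; the fix is only present if the distro backported it.
        return "possibly-affected"
    return "includes-fix"
```

Under these assumptions, a 7.2 build such as the one in plain bookworm would classify as "predates-check", while a 10.x build from bookworm-backports or trixie would classify as "includes-fix".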
sean-k-mooney	we discussed using backports a while ago but it's not exactly clean. it requires some changes to devstack that are more invasive than just updating the package in files/ ...	15:10
sean-k-mooney	we are mirroring backports but debian backports require you to opt in per package to install from them	15:11
sean-k-mooney	so it's not just a case of enabling the repo	15:11
sean-k-mooney	we either need to pass -t backports? or similar, or install the package with <name>/backports or similar	15:12
fungi	well, you don't have to opt in per-package, you can do it per invocation of apt by using -t, but generally yes it requires some adjustment because backports isn't meant to be something you install everything from	15:12
sean-k-mooney	yep so i tried the targeted approach when i had a free afternoon but ran out of time because i needed to install each dep as well	15:13
fungi	technically you can make that happen by overriding the priority for that suite in apt's configuration, but i don't recommend it. though you can configure apt pinning to backports for specific packages too	15:13
sean-k-mooney	gibi got a poc working that way	15:14
sean-k-mooney	but the simple way to do this via backports would be to have an extra if here	15:14
sean-k-mooney	https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt#L72	15:14
sean-k-mooney	for debian if its debian 12	15:14
fungi	yeah, you could instead drop a conffile in /etc/apt/apt.conf.d/ that basically says always install qemu packages from backports	15:16
sean-k-mooney	do you have an example of that? because i could not find one, but if there is one that would be a path forward	15:17
sean-k-mooney	we could do that in the fixup-stuff methods	15:17
fungi	sean-k-mooney: i'm looking for a more concise example, but https://linuxconfig.org/debian-pinning-howto is pretty close	15:20
fungi	and it's /etc/apt/preferences.d/ you'd put it under actually	15:21
opendevreview	Sean Mooney proposed openstack/devstack-plugin-ceph master: move ceph jobs to debian 13 https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/968335	15:22
sean-k-mooney	ah	15:23
sean-k-mooney	Package: *	15:23
sean-k-mooney	Pin: release a=stable	15:23
sean-k-mooney	Pin-Priority: 900	15:23
sean-k-mooney	so i could probably do qemu and libvirt or similar	15:23
sean-k-mooney	ah via /etc/apt/preferences	15:24
fungi	yes, i recommend using the /etc/apt/preferences.d/ directory so you don't have to worry about file-level overwrites/conflicts with your config additions	15:25
sean-k-mooney	ok i can also try that. i still think moving to debian 13 would make sense but we can do both to have something we could backport for stable branches	15:26
fungi	the default priorities for each suite are encoded into their release files, so you only have to worry about installing configuration for your overrides	15:26
sean-k-mooney	yep	15:26
fungi	and yes, if this is for testing openstack master branches then debian 13 is what the pti says we're going to test anyway	15:27
fungi	now that our package mirror network should be working for it, hopefully it's pretty trivial to get going	15:27
sean-k-mooney	ok well we can review the patch above with a view to this cycle, but let me spend 20 mins creating a patch for devstack to use the backport of qemu on debian 12 separately	15:28
fungi	https://superuser.com/questions/678396/apt-pinning-several-packages-in-one-section has some examples	15:29
fungi	from over a decade ago, but things haven't changed much with this mechanism	15:29
sean-k-mooney	ai says it should be something like this https://pb.teim.app/?6bec1eccfb89cf14#4P3UUs6JMJt3jcXpNc3e1C9cEmwDeN9mxj6Eh1Gmw7de	15:29
sean-k-mooney	based on https://manpages.debian.org/testing/apt/apt_preferences.5.en.html	15:30
sean-k-mooney	which seems to be consistent with https://manpages.debian.org/testing/apt/apt_preferences.5.en.html#Matching_packages_in_the_Package_field	15:30
sean-k-mooney	packages is a space-separated list with bash glob support	15:30
fungi	yeah, i'd have to refresh my memory on whether 900 is high enough priority to overcome the stable default	15:31
sean-k-mooney	i don't know if you have tried perplexity.ai as a search tool but i like that it provides citations for the sources that it used to make the recommendation so you can click in and verify it	15:32
fungi	ah, yeah priority >=1000 is only needed if you want apt to downgrade to the target version (otherwise it avoids downgrading)	15:33
sean-k-mooney	the man pages have coverage of the priorities https://manpages.debian.org/testing/apt/apt_preferences.5.en.html#How_APT_Interprets_Priorities	15:33
sean-k-mooney	so it looks like 990	15:33
sean-k-mooney	might be more correct	15:33
sean-k-mooney	although i think anything over 500 would work for us	15:34
fungi	yes, i think you want >=990 according to the manpage	15:34
fungi	"causes a version to be installed even if it does not come from the target release, unless the installed version is more recent"	15:35
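The priority bands being discussed can be written out as a small lookup, sketched here in Python. The function name is hypothetical, and the wording paraphrases the "How APT Interprets Priorities" section of apt_preferences(5) linked above, so the man page remains the authority for exact semantics.

```python
def describe_pin_priority(priority: int) -> str:
    """Summarise how apt treats a Pin-Priority value (paraphrased from the man page)."""
    if priority >= 1000:
        return "installed even if this constitutes a downgrade"
    if priority >= 990:
        return ("installed even if not from the target release, "
                "unless the installed version is more recent")
    if priority >= 500:
        return ("installed unless a version from the target release is available "
                "or the installed version is more recent")
    if priority >= 100:
        return ("installed unless a version from some other distribution is available "
                "or the installed version is more recent")
    if priority > 0:
        return "installed only if no version of the package is installed"
    if priority < 0:
        return "never installed"
    # The man page says a priority of exactly 0 has undefined behaviour.
    raise ValueError("a Pin-Priority of 0 has undefined behaviour")
```

Read this way, 900 and 990 fall into different bands, which is the distinction the 990 suggestion rests on.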
sean-k-mooney	ya and we are not passing a version	15:35
sean-k-mooney	so i think we need to raise to 990 as a result	15:35
sean-k-mooney	i mean we will see in either case, although i'm going to create a debian 12 vm to test this locally to avoid the ci round trip	15:36
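A minimal sketch of what generating such a pin stanza might look like, in Python for illustration only. The function name, the `qemu-*` and `libvirt-*` globs, and the use of `n=` (codename) matching are assumptions; the actual file contents live in the pastes linked above and are not reproduced here. The rendered text would go in a file under /etc/apt/preferences.d/, as fungi recommends.

```python
def render_pin(packages: list[str], release: str, priority: int) -> str:
    """Render one apt_preferences stanza pinning packages to a release codename."""
    if not packages:
        raise ValueError("at least one package name or glob is required")
    return (
        # Package accepts a space-separated list; globs are allowed.
        f"Package: {' '.join(packages)}\n"
        # n= matches the release codename (assumed here; a= matches the archive).
        f"Pin: release n={release}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # Hypothetical package globs; 990 is the value settled on in the chat.
    print(render_pin(["qemu-*", "libvirt-*"], "bookworm-backports", 990), end="")
```

Keeping the priority as a parameter makes it cheap to compare 900 against 990 on a local debian 12 vm before committing to either.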
sean-k-mooney	fungi: it's kind of a novel experience to have an issue because our ci vms are too fast for once	15:42
sean-k-mooney	not that i'm particularly complaining, it's a good problem to have	15:43
fungi	heh, indeed!	15:44
sean-k-mooney	lol, for the first time ever i hit the kernel panic that cirros gets, locally, booting a debian 12 image	15:55
sean-k-mooney	and of course it booted fine on a reboot	15:55
clarkb	frickler: fungi: I'm not sure complaining to a cloud that their cpus are too fast is something I want to do	16:04
clarkb	the last time someone did that HPCloud instituted draconian anti noisy neighbor measures that made the cloud useless for us	16:04
sean-k-mooney	clarkb: no one is suggesting that at least not seriously	16:05
sean-k-mooney	this is a qemu bug	16:05
sean-k-mooney	not an openstack one	16:05
frickler	clarkb: yes, that's not what I meant, I was assuming earlier from sean's comments that the old bios or machine type might be part of the trigger for this	16:05
clarkb	got it so we thought maybe it was a config thing but now we realize its a software bug in qemu and we should focus on addressing that	16:06
sean-k-mooney	clarkb: i thought it might have been failing on the xen hosts where we had the kernel issue where not all the cpu cores were available	16:06
sean-k-mooney	but it was the opposite, it was passing on xen on the old rax hardware	16:07
fungi	but in this case it's a kvm provider not xen	16:07
fungi	right	16:07
sean-k-mooney	correct, it was passing on the slow cpus because slow cpus don't trigger the qemu bug (apparently)	16:07
clarkb	I guess luks is trying to determine how many cycles it needs to hash secrets based on cpu speed? I wonder if you can configure it to a value instead and skip that step?	16:08
sean-k-mooney	i'm not sure if that is the reason	16:09
sean-k-mooney	but i don't think we could without changing cinder/nova to invoke qemu-img differently in any case	16:09
sean-k-mooney	i.e. i don't think qemu-img has a global config file to change that behavior	16:10
clarkb	ack	16:10
opendevreview	Merged openstack/grenade stable/2025.1: Update grenade-skip-level-always job FROM branch https://review.opendev.org/c/openstack/grenade/+/963714	16:20
opendevreview	Maxim Sava proposed openstack/tempest master: Add image decompression import tests https://review.opendev.org/c/openstack/tempest/+/965889	16:47
sean-k-mooney	fungi: creating a debian vm took way longer than i planned because of dumb reasons, but i think i have a preference file that works locally. i just need to make devstack generate it and test it end to end.	16:53
clarkb	sean-k-mooney: are we doing luks in a nested vm? or does qemu-img run this stuff unconditionally?	16:54
sean-k-mooney	clarkb: this is failing in the ceph job that tests encrypted ceph volumes	16:54
clarkb	the other thought that occurred to me is should we be using luks but it isn't clear to me if we do that intentionally	16:54
clarkb	aha ok so it is a necessary prereq of the test case	16:55
sean-k-mooney	yep, it's failing in the test that confirms you can clone luks encrypted volumes	16:55
sean-k-mooney	but that's also why only 1 test fails	16:55
clarkb	it is probably possible to use something other than qemu-img to manage luks volumes/devices	16:59
clarkb	but that also probably implies big changes to nova?	16:59
sean-k-mooney	probably, but why would we do that for a bug that is already fixed upstream?	16:59
sean-k-mooney	i.e. why replace something that works for a temporary bug	17:00
clarkb	the primary reason would be because the bug isn't expected to get fixed in the platforms people deploy on	17:01
clarkb	(I don't know if that is the case but it is common for distros to not fix issues in their stable releases unless they are security issues or hard crashes)	17:01
sean-k-mooney	so the command that is failing is the image conversion	17:02
sean-k-mooney	sudo cinder-rootwrap /etc/cinder/rootwrap.conf qemu-img convert -O luks -f raw -o cipher-alg=aes-256,cipher-mode=xts,ivgen-alg=plain64 --object secret,id=luks_sec,format=raw,file=/opt/stack/data/cinder/conversion/luks_qj1kdpp8 -o key-secret=*** /opt/stack/data/cinder/conversion/tmpzj1a9mr9 /opt/stack/data/cinder/conversion/tmpzj1a9mr9.luks	17:02
clarkb	I guess with debian the fix may be via the backports repo	17:02
clarkb	which is probably sufficient for our needs	17:03
sean-k-mooney	so it's converting the raw image into a luks encrypted data stream that's written to the backing volume	17:03
fungi	clarkb: yes, the fix is in the version in bookworm-backports and also the version in trixie proper	17:03
clarkb	fungi: ack thanks	17:03
fungi	at least if it's the bug we think it is	17:04
fungi	which seems probable	17:04
sean-k-mooney	it was not in ubuntu last summer but it may be now. we would need to check if it ever made it to noble	17:04
sean-k-mooney	s/last summer/in august/	17:05
sean-k-mooney	fungi: https://termbin.com/t48q seems to work as the preference file content	17:06
sean-k-mooney	i'm thinking of doing this in https://github.com/openstack/devstack/blob/master/tools/fixup_stuff.sh	17:07
sean-k-mooney	but do you have any other suggestion?	17:08
sean-k-mooney	i could just put it in https://github.com/openstack/devstack/blob/master/lib/nova_plugins/functions-libvirt	17:08
sean-k-mooney	with the actual package installation as well	17:08
sean-k-mooney	i guess we enable the virt preview repo for fedora there	17:08
fungi	i don't really have any architectural opinions on devstack itself, since i'm not helping maintain it	17:08
sean-k-mooney	so it's kind of the same	17:08
sean-k-mooney	i think i'll leave it in functions-libvirt as it's a little more discoverable	17:09
opendevreview	Sean Mooney proposed openstack/devstack master: use qemu/libvirt from backport repos on debian 12 https://review.opendev.org/c/openstack/devstack/+/968354	17:34
sean-k-mooney	1} cinder_tempest_plugin.scenario.test_volume_encrypted.TestEncryptedCinderVolumes.test_boot_cloned_encrypted_volume [138.328944s] ... ok	18:29
sean-k-mooney	that's still running, and 1 success for a race condition does not prove anything	18:29
sean-k-mooney	but https://review.opendev.org/c/openstack/devstack/+/968354 seems to be working	18:29
opendevreview	Merged openstack/grenade master: Drop reference to removed services https://review.opendev.org/c/openstack/grenade/+/935474	18:31
frickler	sean-k-mooney: nice, we could backport that to stable branches and switch to trixie for master anyway I guess	18:54
sean-k-mooney	yep, i'll do a dnm from nova to get a little more coverage	18:56
sean-k-mooney	maybe also cinder	18:56
sean-k-mooney	for both patches, but that was what i was thinking	18:57
frickler	+1	18:57
sean-k-mooney	frickler: do you know how often we typically bump the ceph version in these jobs? i don't think that's currently covered by our testing runtimes, is it?	18:58
sean-k-mooney	it looks like we did it 8 months ago https://github.com/openstack/devstack-plugin-ceph/commit/9820be32934a4a074415e320ce757a96c973bc28	18:59
sean-k-mooney	i ask because the "tentacle" release replaced "squid" as the most recent release last week 2025-11-18	19:01
sean-k-mooney	so i was wondering if i should propose a mail or something to the dev list to consider moving to that this cycle?	19:02
sean-k-mooney	we install ceph from containers and you can override the version in the job if you like, so it's not urgent either way	19:03
frickler	I think gouthamr was doing more on this, I personally would be very conservative in terms of upgrading ceph at least for productive environments, I wouldn't switch to a new ceph release until at least 6 months have passed	19:27
*** haleyb is now known as haleyb|out		22:51
