Thursday, 2026-06-04

opendevreviewsean mooney proposed openstack/tempest master: fix nova 2.80 responce schema  https://review.opendev.org/c/openstack/tempest/+/99155610:15
opendevreviewsean mooney proposed openstack/tempest master: fix nova 2.80 responce schema  https://review.opendev.org/c/openstack/tempest/+/99155610:17
opendevreviewTakashi Kajinami proposed openstack/devstack master: neutron: Remove [DEFAULT] bind_host  https://review.opendev.org/c/openstack/devstack/+/99156310:23
opendevreviewStephen Finucane proposed openstack/tempest master: Remove compute API schemas  https://review.opendev.org/c/openstack/tempest/+/99158111:27
stephenfinsean-k-mooney: alternative approach to your fix ☝️11:28
sean-k-mooneyi am also ok with that :)11:28
sean-k-mooneyi just didnt know how to do that properly11:29
sean-k-mooneyso i did the 1 line fix11:29
sean-k-mooneygmaan: ^ tldr 2.80 has a trivial bug but since nova now has completed the openapi series and has full response schema validation we could remove the tempest test11:30
sean-k-mooneygmaan: if im not mistaken you wanted to wait until the last stable branch that didnt have the response validation was eol/unmaintained11:30
sean-k-mooneyto drop the validation in tempest but i dont know if that is really required11:30
sean-k-mooneyso i have no issues with fixing the case that is broken or taking stephens approach and just nuking the validation in tempest and moving on with our lives11:31
stephenfinI don't see any reason to keep it given (as I noted in the patch) that we don't/can't backport API changes 11:36
*** pdeore_ is now known as pdeore14:01
gmaaneven if we do not backport API changes, regressions can still happen and the tempest master schema run against those stable branches catches them15:47
gmaananyways, I am in no rush on removing the tempest schemas but we can discuss it once the coverage of the same is in all stable branches on the nova side15:48
gmaanapproved the 2.80 schema bug, thanks sean-k-mooney for fixing15:49
sean-k-mooneyits one of the many intermittent failures i saw in my cirros experiment15:50
sean-k-mooneyi have no idea why that does not fail 100%15:50
sean-k-mooneystephen suggested the message might be misleading and it failed on something else but i could not see anything obviously wrong15:50
gmaansean-k-mooney: failure link, I can check. it is strange that it is not failing 100%16:03
sean-k-mooneysure one sec ill get the tab16:04
sean-k-mooneyhttps://1891d9588d40ea5325f3-1988f1bc3d637497f7692396b58d77ce.ssl.cf5.rackcdn.com/openstack/099abcc112ac4e5c85e99755024e110e/testr_results.html  https://zuul.opendev.org/t/openstack/build/099abcc112ac4e5c85e99755024e110e16:06
sean-k-mooneythat was a one-off failure in nova-grenade-multinode 16:06
gmaansean-k-mooney: ok, so the test is not guaranteed to get any migrations from nova, and if the list is empty it skips the test rather than failing. that explains why it did not fail 100% https://github.com/openstack/tempest/blob/bda57c90dfe02de4e7206b0252701b80dde93939/tempest/api/compute/admin/test_live_migration.py#L49716:11
sean-k-mooneyah so the test is also a little non deterministic16:12
sean-k-mooneybased on timing16:12
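Editor's sketch of the behaviour described above, not the actual tempest code: a check that skips when the migration list happens to be empty only validates the schema on the runs where timing lets a migration show up, which is why a real schema bug fails intermittently. The helper name `validate_migrations` and the `validate` callback are hypothetical.

```python
import unittest


def validate_migrations(migrations, validate):
    """Validate each migration record, skipping when there is nothing to check.

    Hypothetical helper: `migrations` is whatever list the API returned and
    `validate` raises on a record that does not match the expected schema.
    """
    if not migrations:
        # An empty list is reported as a skip, not a failure, so a schema bug
        # stays hidden on every run where no migration was listed in time.
        raise unittest.SkipTest("no migrations returned; nothing to validate")
    for migration in migrations:
        validate(migration)
    return len(migrations)
```

With a non-empty list the schema bug surfaces as a failure; with an empty one the same run is recorded as skipped.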
gmaanyeah16:15
sean-k-mooney whether or not we actually merge my cirros iso experiment patch it has been very good at finding unrelated ci bugs16:17
sean-k-mooneythere was a non deterministic failure in the nova shutdown job too by the way and in the nova vtpm job16:18
gmaan++16:18
sean-k-mooneythe graceful shutdown one was it took a little longer than we waited for it to shutdown16:18
sean-k-mooneygmaan: do you recall if we have a buffer/grace period built into that job?16:19
sean-k-mooneyfor the vtpm one we got scheduled to an amd host and nova rejected the host as not being compatible with the requested cpu model16:19
sean-k-mooneyi have not tried to fix either of those yes16:19
sean-k-mooney*yet16:20
gmaansean-k-mooney: yeah we wait for 180 + 30 sec and if the service has not stopped then it fails16:21
gmaani think that is enough time and if it fails to stop then there is some other issue16:22
sean-k-mooneyack https://zuul.opendev.org/t/openstack/build/b33251f50495423088e47c8058a8760c that was actually on the AGENT.md change16:22
gmaan180 sec is what compute waits for shutdown before returning to service.stop, which I think should complete in 30 sec. so the 30 sec wait is something that can be increased if it is a valid timeout16:23
gmaanchecking16:23
sean-k-mooneyTimed out waiting for compute service on np1108b6f6f5e84 to be inactive (current: failed)16:25
sean-k-mooneyso i wonder if you need to also account for the heartbeat interval16:25
sean-k-mooneywell no16:26
sean-k-mooneythis is actually the n-cpu16:26
sean-k-mooneyoh16:26
sean-k-mooneybut it went to failed16:26
sean-k-mooneyand its expecting inactive16:26
sean-k-mooneyso maybe it needs to handle both16:26
gmaanit hanged on 2nd RPC server16:27
gmaan"nova-compute service stopping RPC server on topic: compute-alt"16:27
gmaanbut it did not stop completely before the timeout happened16:27
gmaanhttps://zuul.opendev.org/t/openstack/build/b33251f50495423088e47c8058a8760c/log/controller/logs/screen-n-cpu.txt#2827716:27
gmaanmeans there is still something running on 2nd RPC server16:28
sean-k-mooneyso ya it legitimately didnt finish in the time allowed16:28
sean-k-mooneyand then hard shutdown i guess16:28
sean-k-mooneyso oslo killed it16:29
sean-k-mooneyand it went to failed instead of inactive16:29
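Editor's sketch of the "handle both" idea floated above, under stated assumptions: the job polls some unit state until it stops, and the states "inactive" and "failed" are the ones quoted in the error message. The function `wait_for_stopped` and its parameters are hypothetical and are not the job's real code. Whether "failed" should be accepted is left to the caller, since here it meant the graceful shutdown timeout was genuinely exceeded.

```python
import time


def wait_for_stopped(get_state, timeout, accepted=("inactive",),
                     interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns one of `accepted` or `timeout` expires.

    Returns the terminal state that was seen so the caller can tell a clean
    stop ("inactive") apart from a forced exit ("failed").
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in accepted:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"timed out waiting for service to stop (current: {state})")
        sleep(interval)
```

A caller that wants the stricter behaviour keeps the default `accepted=("inactive",)`; one that only cares that the process is gone could pass `accepted=("inactive", "failed")` and log the returned state.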
sean-k-mooneybased on 16:29
sean-k-mooneyMay 29 16:00:06.284694 np1108b6f6f5e84 nova-compute[96531]: INFO oslo_service.backend._threading.service [None req-90c75c07-595f-4ef0-9290-6d50665dc3a9 None None] Graceful shutdown timeout exceeded, instantaneous exiting16:29
sean-k-mooneyMay 29 16:00:06.299677 np1108b6f6f5e84 systemd[1]: devstack@n-cpu.service: Main process exited, code=exited, status=1/FAILURE16:29
gmaan2nd RPC server got 19 sec to finish before the timeout happened16:29
gmaanand it did not16:29
sean-k-mooneyi guess there is a potential delay in the signal handler processing the sigterm16:30
gmaanyeah so 30 sec of timeout is finished before 2nd RPC server finished in 19 sec16:30
gmaanmaybe16:30
sean-k-mooneyso it looks like update available resource was running in the background16:31
sean-k-mooneywhen we did this16:31
gmaanbut we wait enough for SIGTERM to arrive, revert resize to finish and then 180 sec wait16:31
gmaanI think 30 sec is little on neck to neck here, maybe we can increase that to 6016:32
sean-k-mooneybut we dont prevent periodics from starting16:32
sean-k-mooneyor running  after we get the signal do we16:32
gmaanyeah, that is what I am doing as part of phase-2. 'no new periodic tasks after shutdown is initiated'16:33
sean-k-mooneyah ok16:33
sean-k-mooneyso thats a known gap16:33
sean-k-mooneymakes sense16:33
gmaanI was in doubt when i did 30 sec timeout but making it to 60 make sense to avoid these timing things16:34
sean-k-mooneywe could extend it a bit more16:34
sean-k-mooneythe job is short but it also rarely fails this way16:34
sean-k-mooneyso we can keep an eye on it16:34
sean-k-mooneyif youre working on this anyway in phase 2 16:34
sean-k-mooneythen we dont need to rush unless we see it start to fail often16:35
gmaanyeah, that also fine as phase-2 makes it more better on wait16:35
gmaanack16:35
sean-k-mooneyi guess that just leaves the vtpm failure to look into but i can follow up with that separately16:36
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/screen-n-cpu.txt#149416:39
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/etc/nova/nova-cpu_conf.txt#132-14016:40
sean-k-mooneywe are tweaking the cpu flags there 16:40
sean-k-mooneybut that combination should actually work on intel or amd16:40
sean-k-mooneyfor the vtpm usecase we could disable that override since we are not running the tests that use them16:41
sean-k-mooneyor change the values. we originally chose those as options that should always work on any cpu16:41
sean-k-mooneyi believe we actually have a known bug related to this too so im not really worried but obviously we dont want the false positive here16:43
sean-k-mooneyoh 16:44
sean-k-mooney] Error from libvirt when retrieving domain capabilities for arch x86_64 / virt_type kvm / machine_type pc: [Error Code 8]: invalid argument: the accel 'kvm' is not supported by '/usr/bin/qemu-system-x86_64' on this host {{(pid=80813) _add_to_domain_capabilities /opt/stack/nova/nova/virt/libvirt/host.py:1058}}16:44
sean-k-mooneyi think this is a nova bug16:45
sean-k-mooneyi think we should not be passing kvm16:45
sean-k-mooneyor it should be KVM or something like that16:45
sean-k-mooneyah16:54
sean-k-mooneyJun 03 18:07:52.710361 npa5a9814153f44 nova-compute[80813]:       <model>kvm64</model>16:54
sean-k-mooneyJun 03 18:07:52.710361 npa5a9814153f44 nova-compute[80813]:       <vendor>AMD</vendor>16:54
sean-k-mooneyso the host this ran on is also using a custom model16:54
sean-k-mooneybased on kvm6416:54
sean-k-mooneythey have obviously enabled a bunch of cpu flags in the host nova16:55
sean-k-mooneybut this is effectively an entirely custom cpu from the looks of it16:55
melwittisn't that just the nested virt requirement? the nodeset required by the job should have nested bc recall it would not work in practice without nested virt even though we would have expected it to16:56
sean-k-mooneyit does16:57
sean-k-mooneybut the whitebox tempest plugin makes some assumptions about what the cpus support16:57
melwittoh ok16:58
sean-k-mooneymelwitt: so either this host is not actually supporting nested virt16:58
sean-k-mooneyor it is but because its a custom kvm64 host 16:58
sean-k-mooneyvm the kvm module is not loading automatically in nested virt mode16:58
sean-k-mooneyor the model we are requesting is actually incompatible16:59
sean-k-mooneymelwitt: so its not an issue with your vtpm work16:59
sean-k-mooneythis is just a mismatch between what that provider is giving and the job expectations16:59
sean-k-mooneymelwitt: for example i know that enabling nested virt by default happened in different kernel versions for intel and amd17:00
melwittok I see. trying to keep my eyes peeled for stuff I might have broke 😬 17:00
sean-k-mooneyalso because intel and amd cant agree on anything the flag on intel is vmx and its svm on amd17:01
melwittdo they ever agree on flags17:01
sean-k-mooneyrarely17:01
sean-k-mooneyif we look at the host flags in this job https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/screen-n-cpu.txt#3017:02
sean-k-mooneyi dont see svm17:02
sean-k-mooneyso the nested virt vm does not actually support nested virt17:03
sean-k-mooneythats why it exploded when we tried to use kvm17:03
sean-k-mooneythis ran on raxflex-dfw3-main17:04
sean-k-mooneyso that provider is sometimes not providing nested virt capable vms for the nested-virt-ubuntu-noble label17:04
sean-k-mooneytaking libvirt/nova out of the picture we can see the same in the ansible facts for the host https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/zuul-info/host-info.controller.yaml#41017:07
sean-k-mooneysvm and vmx are missing so kvm wont work17:07
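Editor's sketch of the check being done by eye above: look for `vmx` (Intel) or `svm` (AMD) in the CPU flags a guest exposes. The function name `has_hw_virt` is hypothetical, and the input is assumed to be text in the Linux `/proc/cpuinfo` layout, where each processor has a `flags : ...` line.

```python
def has_hw_virt(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD).

    Without one of these flags inside the guest, KVM cannot be used there,
    which matches the "accel 'kvm' is not supported" error quoted earlier.
    """
    wanted = {"vmx", "svm"}
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only the 'flags' field counts; a model name containing "svm" must not.
        if key.strip() == "flags" and wanted & set(value.split()):
            return True
    return False
```

On a Linux node this could be called as `has_hw_virt(open("/proc/cpuinfo").read())`; a `False` result on a node booted from a nested-virt label would point at the provider rather than at nova or libvirt.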
melwittis there a way to fix it, like is it that some config is showing flags it does not support for that provider or something?17:11
sean-k-mooneyya so there are 2 things17:11
sean-k-mooneywe should follow up with the infra folks and reach out to rackspace17:11
sean-k-mooneythis could be an issue with a few computes in the wrong host aggregate17:12
melwittah ok17:12
sean-k-mooneyand we can temporarily remove the label from the DFW provider in the zuul config17:12
melwittcc clarkb ^17:12
sean-k-mooneythe passing runs seem to be on vexxhost17:12
sean-k-mooneyim checking if any of the recent passing runs were on that provider17:12
clarkbwhich provider isn't providing nested virt?17:13
sean-k-mooneyraxflex-dfw3-main17:14
clarkbweird. That is the newest cloud we talk to. I wonder if they removed nested virt support?17:14
clarkbor maybe they have misconfigured hypervisors17:14
sean-k-mooneythey are using an entirely custom cpu model17:15
sean-k-mooneyand they forgot to add the flag for nested virt17:15
clarkbya so maybe they changed the cpu models and that happened17:15
sean-k-mooneyor intentionally left it out17:15
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/19d3f49f01e14fd79bbd56155e810c67/log/controller/logs/screen-n-cpu.txt#3217:15
clarkbcardoe maybe you know ^17:15
sean-k-mooneyso ya they are presenting as kvm64 cpu + a bunch of additional flags17:15
sean-k-mooneyso they have cpu_model=kvm64 + cpu_mode=custom and then they listed a lot of cpu flags17:16
sean-k-mooneymy guess is this is to work around spectre and a bunch of those silicon bugs17:16
clarkbhttps://zuul.opendev.org/t/openstack/build/19d3f49f01e14fd79bbd56155e810c67/log/zuul-info/host-info.compute-host.yaml#410-52217:16
sean-k-mooney yep i checked that too17:17
sean-k-mooneyso if im not mistaken we should see svm there17:17
sean-k-mooneyfor amd hosts and vmx for intel17:17
clarkbyes I think that is the case17:18
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/ae9f2247a9584d3db324e6548753ae6c/log/zuul-info/host-info.controller.yaml#371 maybe not?17:19
sean-k-mooneythats an amd host on vexxhost where the job worked17:19
sean-k-mooneyoh no17:19
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/ae9f2247a9584d3db324e6548753ae6c/log/zuul-info/host-info.controller.yaml#42517:19
sean-k-mooneyits there i just missed it  so ya thats the problem17:19
sean-k-mooneyso ya cardoe the fix is to add svm to cpu_model_extra_flags with the rest or remove the label from the provider17:23
clarkbspot checking both dfw3 and iad3 none of them have svm or vmx in the flags (sjc3 is currently disabled for unrelated reasons)17:23
clarkbI'll put a change up to drop the nested virt labels from rax flex17:23
sean-k-mooneyack so only vexxhost is working for now17:24
sean-k-mooneyfor that label17:24
sean-k-mooneymelwitt: as an aside i know qemu was crashing without it but we may want to try debian-13 or ubuntu 26.04 again in the future and see if we can remove the need for kvm17:25
melwittok sure17:25
clarkbovh also does nested virt17:25
sean-k-mooneyclarkb: oh good to know17:25
melwittit would be ideal to not need it17:25
clarkbraxflex, ovh, and vexxhost support nested virt right now. I'm removing raxflex based on the evidence collected above17:25
sean-k-mooneythanks17:26
sean-k-mooneybrb17:26
clarkbhttps://review.opendev.org/c/opendev/zuul-providers/+/99170117:27
opendevreviewRajat Dhasmana proposed openstack/devstack master: Add cinder config for multi-ceph setup  https://review.opendev.org/c/openstack/devstack/+/95208818:29
opendevreviewRajat Dhasmana proposed openstack/devstack master: Add support for Cinder replication  https://review.opendev.org/c/openstack/devstack/+/95304518:29
whoami-rajatthanks sean-k-mooney for catching the rebase error i made, fixed it now https://review.opendev.org/c/openstack/devstack/+/95208818:31
sean-k-mooney+218:35
sean-k-mooneylet me check the depends-on18:35
sean-k-mooneydid your dnm report18:35
sean-k-mooneyhum no18:35
sean-k-mooneywhy did https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 not trigger 18:36
sean-k-mooneyoh it did https://zuul.opendev.org/t/openstack/builds?change=952107&skip=018:37
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/c75ad27c92c74b3894017f773549cc8c18:37
sean-k-mooneyoh no thats a year ago18:37
sean-k-mooneyoh i see18:38
sean-k-mooneyits because it needs a rebase18:38
sean-k-mooneywhen it says not current  on the right in orange18:38
sean-k-mooneyit wont actually trigger18:38
sean-k-mooneyand because its missing signed off by you cant rebase via the ui18:39
sean-k-mooneywhoami-rajat: would you mind updating https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95210718:39
opendevreviewMerged openstack/tempest master: fix nova 2.80 responce schema  https://review.opendev.org/c/openstack/tempest/+/99155618:41
gmaansean-k-mooney: those are tested in https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/952882 18:45
sean-k-mooneyi think thats more testing the cinder replication feature which uses it as a side effect but you are correct18:48
sean-k-mooneygmaan: i was hoping we could make the DNM an actual real job to test changes to the devstack plugin18:49
sean-k-mooneygiven https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810/7 and https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/953042/2 are otherwise untested in that ci18:50
gmaan++ yeah running multi_ceph testing on ceph plugin is good idea18:50
opendevreviewRajat Dhasmana proposed openstack/devstack-plugin-ceph master: Test multi ceph + replication configuration  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95210718:50
sean-k-mooneymy main concern with https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810 beside the testing was more the memory usage of 2 ceph clusters on one host18:51
sean-k-mooneybut its probably ok18:51
opendevreviewRajat Dhasmana proposed openstack/devstack-plugin-ceph master: Test multi ceph configuration  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95210718:51
sean-k-mooneyespecially since that job is not really testing nova vms its just testing the volume replication18:52
whoami-rajatsean-k-mooney, gmaan done https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95210718:52
sean-k-mooneymemory_tracker low_point: 85500818:53
sean-k-mooneythats actually not that terrible18:53
whoami-rajatI don't think running 2x ceph services would be a big impact on memory, just the disk space needs to be considered since we're creating 2 ceph clusters18:55
sean-k-mooneyso for one osd its 18:57
sean-k-mooney   Memory: 181.2M (peak: 449.9M, swap: 32.1M, swap peak: 37.3M, zswap: 3.3M)18:57
sean-k-mooneyand the second is 18:58
sean-k-mooneyMemory: 143.4M (peak: 471.9M, swap: 16M, swap peak: 16.3M, zswap: 2M)18:58
sean-k-mooneyso call it 150M per osd with a 450M peak likely during startup or the initial provisioning18:58
sean-k-mooneywhoami-rajat: gmaan  so do either of ye have a concern with proceeding with https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95181019:09
sean-k-mooneyif not then we can likely approve that based on the cinder-tempest-plugin results19:09
sean-k-mooneyand allow this to start merging19:10
gmaansean-k-mooney: m good as long as job verify it (952107) so go ahead once results are there19:16
sean-k-mooneyhttps://zuul.openstack.org/stream/a5d0bab3b1844aecadf58a84ca73ad20?logfile=console.log its passing so ill just wait for it to complete19:26
sean-k-mooneyand by passing i mean tempest is running and succeeding19:26
sean-k-mooneybut i want to confirm that both clusters are running first 19:27
*** elodilles is now known as elodilles_OoO20:17
opendevreviewEric Harney proposed openstack/devstack-plugin-ceph master: Enable scheduled backend trash purging  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/91953921:01
cardoeclarkb: I’m on PTO with no work access. Maybe jamesdenton or cloudnull are on somewhere.21:21
clarkback sorry to bother21:51
opendevreviewMerged openstack/devstack-plugin-ceph master: Add support to deploy two ceph clusters  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95181022:22
opendevreviewMerged openstack/devstack-plugin-ceph master: Add support for ceph replication  https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/95304222:44
