| nick | message | time |
|---|---|---|
| opendevreview | sean mooney proposed openstack/tempest master: fix nova 2.80 responce schema https://review.opendev.org/c/openstack/tempest/+/991556 | 10:15 |
| opendevreview | sean mooney proposed openstack/tempest master: fix nova 2.80 responce schema https://review.opendev.org/c/openstack/tempest/+/991556 | 10:17 |
| opendevreview | Takashi Kajinami proposed openstack/devstack master: neutron: Remove [DEFAULT] bind_host https://review.opendev.org/c/openstack/devstack/+/991563 | 10:23 |
| opendevreview | Stephen Finucane proposed openstack/tempest master: Remove compute API schemas https://review.opendev.org/c/openstack/tempest/+/991581 | 11:27 |
| stephenfin | sean-k-mooney: alternative approach to your fix ☝️ | 11:28 |
| sean-k-mooney | i am also ok with that :) | 11:28 |
| sean-k-mooney | i just didn't know how to do that properly | 11:29 |
| sean-k-mooney | so i did the 1 line fix | 11:29 |
| sean-k-mooney | gmaan: ^ tl;dr 2.80 has a trivial bug, but since nova has now completed the openapi series and has full response schema validation we could remove the tempest test | 11:30 |
| sean-k-mooney | gmaan: if i'm not mistaken you wanted to wait until the last stable branch that didn't have the response validation was eol/unmaintained | 11:30 |
| sean-k-mooney | to drop the validation in tempest, but i don't know if that is really required | 11:30 |
| sean-k-mooney | so i have no issues with fixing the case that is broken or taking stephen's approach and just nuking the validation in tempest and moving on with our lives | 11:31 |
| stephenfin | I don't see any reason to keep it given (as I noted in the patch) that we don't/can't backport API changes | 11:36 |
| *** | pdeore_ is now known as pdeore | 14:01 |
| gmaan | even if we do not backport API changes, regressions can still happen, and the tempest master schema run against those stable branches catches them | 15:47 |
| gmaan | anyway, I am in no rush to remove the tempest schemas, but we can discuss it once the same coverage is in all stable branches on the nova side | 15:48 |
| gmaan | approved the 2.80 schema bug fix, thanks sean-k-mooney for fixing it | 15:49 |
| sean-k-mooney | it's one of the many intermittent failures i saw in my cirros experiment | 15:50 |
| sean-k-mooney | i have no idea why that does not fail 100% | 15:50 |
| sean-k-mooney | stephen suggested the message might be misleading and it failed on something else, but i could not see anything obviously wrong | 15:50 |
| gmaan | sean-k-mooney: failure link? I can check. it is strange that it is not failing 100% | 16:03 |
| sean-k-mooney | sure, one sec, i'll get the tab | 16:04 |
| sean-k-mooney | https://1891d9588d40ea5325f3-1988f1bc3d637497f7692396b58d77ce.ssl.cf5.rackcdn.com/openstack/099abcc112ac4e5c85e99755024e110e/testr_results.html https://zuul.opendev.org/t/openstack/build/099abcc112ac4e5c85e99755024e110e | 16:06 |
| sean-k-mooney | that was a one-off failure in nova-grenade-multinode | 16:06 |
| gmaan | sean-k-mooney: ok, so the test is written such that it is not guaranteed to get the migrations from nova, and if the list is empty it skips the test rather than failing. that explains why it did not fail 100% https://github.com/openstack/tempest/blob/bda57c90dfe02de4e7206b0252701b80dde93939/tempest/api/compute/admin/test_live_migration.py#L497 | 16:11 |
| sean-k-mooney | ah so the test is also a little non-deterministic | 16:12 |
| sean-k-mooney | based on timing | 16:12 |
| gmaan | yeah | 16:15 |
| sean-k-mooney | whether or not we actually merge my cirros iso experiment patch, it has been very good at finding unrelated ci bugs | 16:17 |
| sean-k-mooney | there was a non-deterministic failure in the nova shutdown job too by the way, and in the nova vtpm job | 16:18 |
| gmaan | ++ | 16:18 |
| sean-k-mooney | the graceful shutdown one was that it took a little longer than we waited for it to shut down | 16:18 |
| sean-k-mooney | gmaan: do you recall if we have a buffer/grace period built into that job? | 16:19 |
| sean-k-mooney | for the vtpm one we got scheduled to an amd host and nova rejected the host as not being compatible with the requested cpu model | 16:19 |
| sean-k-mooney | i have not tried to fix either of those yes | 16:19 |
| sean-k-mooney | *yet | 16:20 |
| gmaan | sean-k-mooney: yeah, we wait for 180 + 30 sec and if the service has not stopped by then it fails | 16:21 |
| gmaan | i think that is enough time and if it fails to stop then there is some other issue | 16:22 |
| sean-k-mooney | ack https://zuul.opendev.org/t/openstack/build/b33251f50495423088e47c8058a8760c that was actually on the AGENT.md change | 16:22 |
| gmaan | 180 sec is what compute waits for shutdown before returning to service.stop, which I think should complete in 30 sec. so the 30 sec wait is something that can be increased if it is a valid timeout | 16:23 |
| gmaan | checking | 16:23 |
| sean-k-mooney | Timed out waiting for compute service on np1108b6f6f5e84 to be inactive (current: failed) | 16:25 |
| sean-k-mooney | so i wonder if you need to also account for the heartbeat interval | 16:25 |
| sean-k-mooney | well no | 16:26 |
| sean-k-mooney | this is actually the n-cpu | 16:26 |
| sean-k-mooney | oh | 16:26 |
| sean-k-mooney | but it went to failed | 16:26 |
| sean-k-mooney | and it's expecting inactive | 16:26 |
| sean-k-mooney | so maybe it needs to handle both | 16:26 |
| gmaan | it hung on the 2nd RPC server | 16:27 |
| gmaan | "nova-compute service stopping RPC server on topic: compute-alt" | 16:27 |
| gmaan | but it did not stop it completely before the timeout happened | 16:27 |
| gmaan | https://zuul.opendev.org/t/openstack/build/b33251f50495423088e47c8058a8760c/log/controller/logs/screen-n-cpu.txt#28277 | 16:27 |
| gmaan | which means there is still something running on the 2nd RPC server | 16:28 |
| sean-k-mooney | so ya it legitimately didn't finish in the time allowed | 16:28 |
| sean-k-mooney | and then hard shutdown i guess | 16:28 |
| sean-k-mooney | so oslo killed it | 16:29 |
| sean-k-mooney | and it went to failed instead of inactive | 16:29 |
| sean-k-mooney | based on | 16:29 |
| sean-k-mooney | May 29 16:00:06.284694 np1108b6f6f5e84 nova-compute[96531]: INFO oslo_service.backend._threading.service [None req-90c75c07-595f-4ef0-9290-6d50665dc3a9 None None] Graceful shutdown timeout exceeded, instantaneous exiting | 16:29 |
| sean-k-mooney | May 29 16:00:06.299677 np1108b6f6f5e84 systemd[1]: devstack@n-cpu.service: Main process exited, code=exited, status=1/FAILURE | 16:29 |
| gmaan | the 2nd RPC server got 19 sec to finish before the timeout happened | 16:29 |
| gmaan | and it did not | 16:29 |
| sean-k-mooney | i guess there is a potential delay in the signal handler processing the sigterm | 16:30 |
| gmaan | yeah so the 30 sec timeout ran out before the 2nd RPC server could finish in its 19 sec | 16:30 |
| gmaan | maybe | 16:30 |
| sean-k-mooney | so it looks like update available resource was running in the background | 16:31 |
| sean-k-mooney | when we did this | 16:31 |
| gmaan | but we wait long enough for SIGTERM to arrive and for the revert resize to finish, and then there is the 180 sec wait | 16:31 |
| gmaan | I think 30 sec is cutting it a little close here, maybe we can increase that to 60 | 16:32 |
| sean-k-mooney | but we don't prevent periodics from starting | 16:32 |
| sean-k-mooney | or running after we get the signal, do we | 16:32 |
| gmaan | yeah, that is what I am doing as part of phase-2. 'no new periodic tasks after shutdown is initiated' | 16:33 |
| sean-k-mooney | ah ok | 16:33 |
| sean-k-mooney | so that's a known gap | 16:33 |
| sean-k-mooney | makes sense | 16:33 |
| gmaan | I was in doubt when I chose the 30 sec timeout, but making it 60 makes sense to avoid these timing issues | 16:34 |
| sean-k-mooney | we could extend it a bit more | 16:34 |
| sean-k-mooney | the job is short but it also rarely fails this way | 16:34 |
| sean-k-mooney | so we can keep an eye on it | 16:34 |
| sean-k-mooney | if you're working on this anyway in phase 2 | 16:34 |
| sean-k-mooney | then we don't need to rush unless we see it start to fail often | 16:35 |
| gmaan | yeah, that is also fine as phase-2 makes the wait better | 16:35 |
| gmaan | ack | 16:35 |
| sean-k-mooney | i guess that just leaves the vtpm failure to look into, but i can follow up on that separately | 16:36 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/screen-n-cpu.txt#1494 | 16:39 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/etc/nova/nova-cpu_conf.txt#132-140 | 16:40 |
| sean-k-mooney | we are tweaking the cpu flags there | 16:40 |
| sean-k-mooney | but that combination should actually work on intel or amd | 16:40 |
| sean-k-mooney | for the vtpm usecase we could disable that override since we are not running the tests that use them | 16:41 |
| sean-k-mooney | or change the values. we originally chose those as options that should always work on any cpu | 16:41 |
| sean-k-mooney | i believe we actually have a known bug related to this too so i'm not really worried, but obviously we don't want the false positive here | 16:43 |
| sean-k-mooney | oh | 16:44 |
| sean-k-mooney | ] Error from libvirt when retrieving domain capabilities for arch x86_64 / virt_type kvm / machine_type pc: [Error Code 8]: invalid argument: the accel 'kvm' is not supported by '/usr/bin/qemu-system-x86_64' on this host {{(pid=80813) _add_to_domain_capabilities /opt/stack/nova/nova/virt/libvirt/host.py:1058}} | 16:44 |
| sean-k-mooney | i think this is a nova bug | 16:45 |
| sean-k-mooney | i think we should not be passing kvm | 16:45 |
| sean-k-mooney | or we should be passing it as KVM or something like that | 16:45 |
| sean-k-mooney | ah | 16:54 |
| sean-k-mooney | Jun 03 18:07:52.710361 npa5a9814153f44 nova-compute[80813]: <model>kvm64</model> | 16:54 |
| sean-k-mooney | Jun 03 18:07:52.710361 npa5a9814153f44 nova-compute[80813]: <vendor>AMD</vendor> | 16:54 |
| sean-k-mooney | so the host this ran on is also using a custom model | 16:54 |
| sean-k-mooney | based on kvm64 | 16:54 |
| sean-k-mooney | they have obviously enabled a bunch of cpu flags in the host nova | 16:55 |
| sean-k-mooney | but this is effectively an entirely custom cpu from the looks of it | 16:55 |
| melwitt | isn't that just the nested virt requirement? the nodeset required by the job should have nested bc recall it would not work in practice without nested virt even though we would have expected it to | 16:56 |
| sean-k-mooney | it does | 16:57 |
| sean-k-mooney | but the whitebox tempest plugin makes some assumptions about what the cpus support | 16:57 |
| melwitt | oh ok | 16:58 |
| sean-k-mooney | melwitt: so either this host is not actually supporting nested virt | 16:58 |
| sean-k-mooney | or it is, but because it's a custom kvm64 host | 16:58 |
| sean-k-mooney | vm the kvm module is not loading automatically in nested virt mode | 16:58 |
| sean-k-mooney | or the model we are requesting is actually incompatible | 16:59 |
| sean-k-mooney | melwitt: so it's not an issue with your vtpm work | 16:59 |
| sean-k-mooney | this is just a mismatch between what that provider is giving and the job expectations | 16:59 |
| sean-k-mooney | melwitt: for example i know that enabling nested virt by default happened in different kernel versions for intel and amd | 17:00 |
| melwitt | ok I see. trying to keep my eyes peeled for stuff I might have broken 😬 | 17:00 |
| sean-k-mooney | also because intel and amd can't agree on anything, the flag on intel is vmx and it's svm on amd | 17:01 |
| melwitt | do they ever agree on flags | 17:01 |
| sean-k-mooney | rarely | 17:01 |
| sean-k-mooney | if we look at the host flags in this job https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/controller/logs/screen-n-cpu.txt#30 | 17:02 |
| sean-k-mooney | i don't see svm | 17:02 |
| sean-k-mooney | so the nested virt vm does not actually support nested virt | 17:03 |
| sean-k-mooney | that's why it exploded when we tried to use kvm | 17:03 |
| sean-k-mooney | this ran on raxflex-dfw3-main | 17:04 |
| sean-k-mooney | so that provider is sometimes not providing nested virt capable vms for the nested-virt-ubuntu-noble label | 17:04 |
| sean-k-mooney | taking libvirt/nova out of the picture we can see the same in the ansible facts for the host https://zuul.opendev.org/t/openstack/build/0e2b96f24dc64c71b6c5ce412205c0a1/log/zuul-info/host-info.controller.yaml#410 | 17:07 |
| sean-k-mooney | svm and vmx are missing so kvm won't work | 17:07 |
| melwitt | is there a way to fix it, like is it that some config is showing flags it does not support for that provider or something? | 17:11 |
| sean-k-mooney | ya so there are 2 things | 17:11 |
| sean-k-mooney | we should follow up with the infra folks and reach out to rackspace | 17:11 |
| sean-k-mooney | this could be an issue with a few computes in the wrong host aggregate | 17:12 |
| melwitt | ah ok | 17:12 |
| sean-k-mooney | and we can temporarily remove the label from the DFW provider in the zuul config | 17:12 |
| melwitt | cc clarkb ^ | 17:12 |
| sean-k-mooney | the passing runs seem to be on vexxhost | 17:12 |
| sean-k-mooney | i'm checking if any of the recent passing runs were on that provider | 17:12 |
| clarkb | which provider isn't providing nested virt? | 17:13 |
| sean-k-mooney | raxflex-dfw3-main | 17:14 |
| clarkb | weird. That is the newest cloud we talk to. I wonder if they removed nested virt support? | 17:14 |
| clarkb | or maybe they have misconfigured hypervisors | 17:14 |
| sean-k-mooney | they are using an entirely custom cpu model | 17:15 |
| sean-k-mooney | and they forgot to add the flag for nested virt | 17:15 |
| clarkb | ya so maybe they changed the cpu models and that happened | 17:15 |
| sean-k-mooney | or intentionally left it out | 17:15 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/19d3f49f01e14fd79bbd56155e810c67/log/controller/logs/screen-n-cpu.txt#32 | 17:15 |
| clarkb | cardoe maybe you know ^ | 17:15 |
| sean-k-mooney | so ya they are presenting as a kvm64 cpu + a bunch of additional flags | 17:15 |
| sean-k-mooney | so they have cpu_model=kvm64 + cpu_mode=custom and then they listed a lot of cpu flags | 17:16 |
| sean-k-mooney | my guess is this is to work around spectre and a bunch of those silicon bugs | 17:16 |
| clarkb | https://zuul.opendev.org/t/openstack/build/19d3f49f01e14fd79bbd56155e810c67/log/zuul-info/host-info.compute-host.yaml#410-522 | 17:16 |
| sean-k-mooney | yep i checked that too | 17:17 |
| sean-k-mooney | so if i'm not mistaken we should see svm there | 17:17 |
| sean-k-mooney | for amd hosts and vmx for intel | 17:17 |
| clarkb | yes I think that is the case | 17:18 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/ae9f2247a9584d3db324e6548753ae6c/log/zuul-info/host-info.controller.yaml#371 maybe not? | 17:19 |
| sean-k-mooney | that's an amd host on vexxhost where the job worked | 17:19 |
| sean-k-mooney | oh no | 17:19 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/ae9f2247a9584d3db324e6548753ae6c/log/zuul-info/host-info.controller.yaml#425 | 17:19 |
| sean-k-mooney | it's there, i just missed it, so ya that's the problem | 17:19 |
| sean-k-mooney | so ya cardoe the fix is to add svm to cpu_model_extra_flags with the rest, or remove the label from the provider | 17:23 |
| clarkb | spot checking both dfw3 and iad3 none of them have svm or vmx in the flags (sjc3 is currently disabled for unrelated reasons) | 17:23 |
| clarkb | I'll put a change up to drop the nested virt labels from rax flex | 17:23 |
| sean-k-mooney | ack so only vexxhost is working for now | 17:24 |
| sean-k-mooney | for that label | 17:24 |
| sean-k-mooney | melwitt: as an aside i know qemu was crashing without it, but we may want to try debian-13 or ubuntu 26.04 again in the future and see if we can remove the need for kvm | 17:25 |
| melwitt | ok sure | 17:25 |
| clarkb | ovh also does nested virt | 17:25 |
| sean-k-mooney | clarkb: oh good to know | 17:25 |
| melwitt | it would be ideal to not need it | 17:25 |
| clarkb | raxflex, ovh, and vexxhost support nested virt right now. I'm removing raxflex based on the evidence collected above | 17:25 |
| sean-k-mooney | thanks | 17:26 |
| sean-k-mooney | brb | 17:26 |
| clarkb | https://review.opendev.org/c/opendev/zuul-providers/+/991701 | 17:27 |
| opendevreview | Rajat Dhasmana proposed openstack/devstack master: Add cinder config for multi-ceph setup https://review.opendev.org/c/openstack/devstack/+/952088 | 18:29 |
| opendevreview | Rajat Dhasmana proposed openstack/devstack master: Add support for Cinder replication https://review.opendev.org/c/openstack/devstack/+/953045 | 18:29 |
| whoami-rajat | thanks sean-k-mooney for catching the rebase error i made, fixed it now https://review.opendev.org/c/openstack/devstack/+/952088 | 18:31 |
| sean-k-mooney | +2 | 18:35 |
| sean-k-mooney | let me check the depends-on | 18:35 |
| sean-k-mooney | did your dnm report | 18:35 |
| sean-k-mooney | hum no | 18:35 |
| sean-k-mooney | why did https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 not trigger | 18:36 |
| sean-k-mooney | oh it did https://zuul.opendev.org/t/openstack/builds?change=952107&skip=0 | 18:37 |
| sean-k-mooney | https://zuul.opendev.org/t/openstack/build/c75ad27c92c74b3894017f773549cc8c | 18:37 |
| sean-k-mooney | oh no that's a year ago | 18:37 |
| sean-k-mooney | oh i see | 18:38 |
| sean-k-mooney | it's because it needs a rebase | 18:38 |
| sean-k-mooney | when it says not current on the right in orange | 18:38 |
| sean-k-mooney | it won't actually trigger | 18:38 |
| sean-k-mooney | and because it's missing signed-off-by you can't rebase via the ui | 18:39 |
| sean-k-mooney | whoami-rajat: would you mind updating https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 | 18:39 |
| opendevreview | Merged openstack/tempest master: fix nova 2.80 responce schema https://review.opendev.org/c/openstack/tempest/+/991556 | 18:41 |
| gmaan | sean-k-mooney: those are tested in https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/952882 | 18:45 |
| sean-k-mooney | i think that's more testing the cinder replication feature, which uses it as a side effect, but you are correct | 18:48 |
| sean-k-mooney | gmaan: i was hoping we could make the DNM an actual real job to test changes to the devstack plugin | 18:49 |
| sean-k-mooney | given https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810/7 and https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/953042/2 are otherwise untested in that ci | 18:50 |
| gmaan | ++ yeah running multi_ceph testing on the ceph plugin is a good idea | 18:50 |
| opendevreview | Rajat Dhasmana proposed openstack/devstack-plugin-ceph master: Test multi ceph + replication configuration https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 | 18:50 |
| sean-k-mooney | my main concern with https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810 besides the testing was more the memory usage of 2 ceph clusters on one host | 18:51 |
| sean-k-mooney | but it's probably ok | 18:51 |
| opendevreview | Rajat Dhasmana proposed openstack/devstack-plugin-ceph master: Test multi ceph configuration https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 | 18:51 |
| sean-k-mooney | especially since that job is not really testing nova vms, it's just testing the volume replication | 18:52 |
| whoami-rajat | sean-k-mooney, gmaan done https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/952107 | 18:52 |
| sean-k-mooney | memory_tracker low_point: 855008 | 18:53 |
| sean-k-mooney | that's actually not that terrible | 18:53 |
| whoami-rajat | I don't think running 2x ceph services would be a big impact on memory, just the disk space needs to be considered since we're creating 2 ceph clusters | 18:55 |
| sean-k-mooney | so for one osd it's | 18:57 |
| sean-k-mooney | Memory: 181.2M (peak: 449.9M, swap: 32.1M, swap peak: 37.3M, zswap: 3.3M) | 18:57 |
| sean-k-mooney | and the second is | 18:58 |
| sean-k-mooney | Memory: 143.4M (peak: 471.9M, swap: 16M, swap peak: 16.3M, zswap: 2M) | 18:58 |
| sean-k-mooney | so call it 150M per osd with a 450M peak, likely during startup or the initial provisioning | 18:58 |
| sean-k-mooney | whoami-rajat: gmaan so do either of ye have a concern with proceeding with https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810 | 19:09 |
| sean-k-mooney | if not then we can likely approve that based on the cinder-tempest-plugin results | 19:09 |
| sean-k-mooney | and allow this to start merging | 19:10 |
| gmaan | sean-k-mooney: I'm good as long as the job verifies it (952107), so go ahead once results are there | 19:16 |
| sean-k-mooney | https://zuul.openstack.org/stream/a5d0bab3b1844aecadf58a84ca73ad20?logfile=console.log it's passing so i'll just wait for it to complete | 19:26 |
| sean-k-mooney | and by passing i mean tempest is running and succeeding | 19:26 |
| sean-k-mooney | but i want to confirm that both clusters are running first | 19:27 |
| *** | elodilles is now known as elodilles_OoO | 20:17 |
| opendevreview | Eric Harney proposed openstack/devstack-plugin-ceph master: Enable scheduled backend trash purging https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/919539 | 21:01 |
| cardoe | clarkb: I’m on PTO with no work access. Maybe jamesdenton or cloudnull are on somewhere. | 21:21 |
| clarkb | ack sorry to bother | 21:51 |
| opendevreview | Merged openstack/devstack-plugin-ceph master: Add support to deploy two ceph clusters https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/951810 | 22:22 |
| opendevreview | Merged openstack/devstack-plugin-ceph master: Add support for ceph replication https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/953042 | 22:44 |
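
The nested virt diagnosis in the log (neither `svm` nor `vmx` in the guest's CPU flags, so KVM cannot be used inside it) can be sketched as a small check. This is only an illustration and not something taken from the discussion or from any OpenStack tool: the function names `parse_cpu_flags` and `nested_virt_vendor` are hypothetical, and the sketch assumes x86 Linux `/proc/cpuinfo`-style text in which a `flags` line lists the CPU features.

```python
"""Sketch: look for hardware virtualization flags in cpuinfo-style text."""

# As noted in the log, the flag is vmx on Intel and svm on AMD.
VIRT_FLAGS = {"vmx": "intel", "svm": "amd"}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line as a set (empty if none)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def nested_virt_vendor(cpuinfo_text):
    """Return 'intel' or 'amd' if vmx/svm is exposed, otherwise None."""
    flags = parse_cpu_flags(cpuinfo_text)
    for flag, vendor in VIRT_FLAGS.items():
        if flag in flags:
            return vendor
    return None


if __name__ == "__main__":
    # Assumption: running on Linux; fall back to empty text elsewhere.
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""
    vendor = nested_virt_vendor(text)
    if vendor is None:
        print("no vmx/svm flag found: kvm acceleration is unlikely to work")
    else:
        print(f"hardware virt flag present ({vendor})")
```

Run on a node from a provider label that is expected to offer nested virt, a `None` result would match the failure described above, where the guest presented a custom kvm64 model without `svm`.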