| opendevreview | Goutham Pacha Ravi proposed openstack/nova-specs master: Add virtiofs cold migration spec https://review.opendev.org/c/openstack/nova-specs/+/985738 | 01:43 |
|---|---|---|
| opendevreview | Merged openstack/os-vif master: typing: Updates for typed oslo.config https://review.opendev.org/c/openstack/os-vif/+/989206 | 10:47 |
| opendevreview | Merged openstack/os-vif master: Remove linux bridge plugin https://review.opendev.org/c/openstack/os-vif/+/941586 | 10:52 |
| opendevreview | Elod Illes proposed openstack/nova master: Do not log metadata proxy shared secret https://review.opendev.org/c/openstack/nova/+/988501 | 13:05 |
| elodilles | sean-k-mooney: i've extended a unit test ^^^ | 13:06 |
| sean-k-mooney | self.assertIn('mismatched_signature', str(warning_calls)) | 13:07 |
| sean-k-mooney | should that not be assertNotIn | 13:08 |
| sean-k-mooney | oh | 13:08 |
| sean-k-mooney | no | 13:08 |
| sean-k-mooney | that's the value that was passed in the header | 13:08 |
| sean-k-mooney | ok and signed is the config value | 13:08 |
| sean-k-mooney | elodilles: ya ok that's correct +2 | 13:08 |
| sean-k-mooney | you could argue you should not log either value but i think at that point it starts to be a little hard to debug | 13:10 |
| sean-k-mooney | elodilles: i'm ok with this as an ok compromise for operational debuggability | 13:11 |
| sean-k-mooney | at some point we should split that test up into multiple tests but that's out of scope of your change | 13:12 |
| elodilles | ACK, thanks :) | 13:16 |
| opendevreview | Masanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected https://review.opendev.org/c/openstack/nova/+/989378 | 13:21 |
| opendevreview | Masanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected https://review.opendev.org/c/openstack/nova/+/989378 | 13:23 |
| sean-k-mooney | elodilles: by the way the os-vif changes finally merged if you want to proceed with the release and have not already done so | 13:24 |
| sean-k-mooney | we might need to update the sha. if you already did the m1 release there is no rush to release again | 13:24 |
| elodilles | sean-k-mooney: oh, cool, i've been waiting for the patch to merge. but then it did :) let me update the hash on the rel patch | 13:25 |
| elodilles | sean-k-mooney: hash and version number is updated -> https://review.opendev.org/c/openstack/releases/+/988061 | 13:29 |
| sean-k-mooney | that looks correct to me, i have commented the same on the patch | 13:32 |
| elodilles | Uggla: when you have some time, could you please take a quick look at the os-vif release patch? Sean and Stephen have both +1'd it already ;) https://review.opendev.org/c/openstack/releases/+/988061 | 14:24 |
| Uggla | elodilles, so I guess Stephen includes what he wanted in os-vif right ? | 14:25 |
| elodilles | yepp | 14:27 |
| elodilles | that patch has merged | 14:27 |
| auniyal | https://review.opendev.org/c/openstack/requirements/+/988412 | 14:44 |
| auniyal | gmaan this might be the reason for tempest bug - attachment issue | 14:44 |
| auniyal | :( seems nope | 14:58 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: Drop description for ancient horizon https://review.opendev.org/c/openstack/nova/+/989419 | 15:46 |
| opendevreview | Takashi Kajinami proposed openstack/nova master: Drop description for ancient horizon https://review.opendev.org/c/openstack/nova/+/989419 | 15:48 |
| gmaan | auniyal: thanks, let's chat on cinder channel as it is not related to nova | 16:57 |
| Uggla | sean-k-mooney, does the PCI tracker only read the libvirt pci device "inventory" at startup ? | 17:00 |
| sean-k-mooney | it technically does not read it at all | 17:02 |
| sean-k-mooney | the pci tracker itself is virt driver independent | 17:03 |
| sean-k-mooney | and each of the virt drivers reads the pci information internally and presents it to the tracker in a normalised form | 17:03 |
| sean-k-mooney | but yes it's effectively cached | 17:03 |
| sean-k-mooney | we don't support changing the devices while nova-compute is running | 17:04 |
| Uggla | I mean update_devices_from_hypervisor_resources is only called at boot ? | 17:04 |
| Uggla | s/boot/nova startup/ | 17:04 |
| sean-k-mooney | we do refresh it as part of update available resources | 17:04 |
| sean-k-mooney | but with cached data or with data that should be cached | 17:04 |
| Uggla | sean-k-mooney thx. | 17:06 |
| melwitt | gmaan: since your comment on the nova-vtpm job patch, I went looking through the NoValidHost failures again (the reason inspiring the patch) and found that it looks like what happens in that case is that a "Connection event '0' reason 'Connection to libvirt lost: 1'" happens on a compute node which causes the COMPUTE_STATUS_DISABLED trait to be added to that compute and _that's_ what ends up causing the NoValidHost ... as the other | 17:22 |
| melwitt | compute node shows as temporarily disabled, | 17:22 |
| melwitt | so now I'm questioning whether setting tempest_concurrency=1 would actually help anything ... seems like maybe not | 17:22 |
| melwitt | I'm not sure how or why it could be more common to cause a failure in nova-vtpm ... perhaps because there are so few tests total and two of them involve resizes which have to go to the other host | 17:23 |
| sean-k-mooney | melwitt: if the libvirt connection is lost | 17:29 |
| sean-k-mooney | melwitt: we mark the compute service as disabled automatically | 17:30 |
| sean-k-mooney | and re-enable it when it comes back | 17:30 |
| melwitt | sean-k-mooney: yes that's what I meant is I can see why it is causing the NoValidHost, if it happens at an inopportune time | 17:30 |
| sean-k-mooney | so the question is why is that disconnect happening, are there OOM events | 17:31 |
| sean-k-mooney | that normally only happens if libvirt restarts | 17:31 |
| melwitt | hm ok, let me check that | 17:31 |
| sean-k-mooney | the nova-vtpm job looks pretty stable by the way | 17:32 |
| sean-k-mooney | i assume you're debugging https://zuul.openstack.org/build/61181a17a11a41feb198f6d817337ca1 | 17:33 |
| melwitt | yeah.. agreed. at the time I proposed the patch to reduce concurrency to 1, it was failing with NoValidHost more often and I had thought it was in part due to tests running in parallel. but now that I see this libvirt connection loss thing, I am questioning that thought | 17:33 |
| melwitt | no I'm using https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf bc that's where I see NoValidHost. the one you linked looked to be a different problem unless I missed something | 17:34 |
| sean-k-mooney | well it could be concurrency related | 17:34 |
| melwitt | for the libvirt connection lost? | 17:34 |
| sean-k-mooney | well if the connection loss was because of an OOM | 17:35 |
| sean-k-mooney | but i don't see that in the logs for that job | 17:35 |
| melwitt | gotcha. yeah I'm not seeing something like that either so far | 17:35 |
| sean-k-mooney | libvirt.libvirtError: internal error: client socket is closed | 17:38 |
| sean-k-mooney | that is weird | 17:38 |
| sean-k-mooney | that is saying that nova could not create the vm in this case because the libvirt socket was closed from libvirt's side | 17:39 |
| sean-k-mooney | or at least we received a disconnect on the socket we had opened to talk to it | 17:39 |
| sean-k-mooney | in this case it happens to be coming from libvirt_secret.undefine() | 17:40 |
| sean-k-mooney | based on the traceback https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf/log/compute-host/logs/screen-n-cpu.txt#5203 | 17:40 |
| melwitt | this is the one I found that caused a NoValidHost https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf/log/controller/logs/screen-n-cpu.txt#4543 | 17:43 |
| sean-k-mooney | so i think the libvirt log timestamps are off by an hour from nova's | 17:47 |
| sean-k-mooney | 2026-05-12 22:08:17.608+0100: 29111: debug : virThreadJobClear:118 : Thread 29111 (rpc-libvirtd) finished job remoteDispatchNodeGetCPUMap with ret=0 | 17:47 |
| sean-k-mooney | 2026-05-12 22:08:29.663+0100: 29119: debug : virThreadJobSet:93 : Thread 29119 (prio-rpc-libvirtd) is now running job remoteDispatchConnectGetLibVersion | 17:48 |
| sean-k-mooney | but there appears to be a dead space in the log at that point | 17:48 |
| sean-k-mooney | oh i was looking at the compute | 17:50 |
| sean-k-mooney | there is a 20 second gap just before then | 17:52 |
| sean-k-mooney | but nothing really stands out | 17:52 |
| sean-k-mooney | i will say it's a little strange that libvirt prints its version | 17:56 |
| sean-k-mooney | 2026-05-12 22:08:25.052+0100: 96003: info : libvirt version: 10.0.0, package: 10.0.0-2ubuntu8.13 (Ubuntu) | 17:56 |
| sean-k-mooney | and then starts loading a lot of modules and doing some network config not long after | 17:56 |
| sean-k-mooney | that kind of looks like it restarted | 17:57 |
| sean-k-mooney | melwitt: ok ya it was stopped and started | 17:59 |
| sean-k-mooney | May 12 21:08:38 npe61172270ee04 sudo[44964]: stack : PWD=/opt/stack ; USER=root ; COMMAND=/usr/bin/systemctl stop libvirtd | 17:59 |
| sean-k-mooney | May 12 21:08:38 npe61172270ee04 sudo[44964]: pam_unix(sudo:session): session opened for user root(uid=0) by stack(uid=1001) | 17:59 |
| sean-k-mooney | May 12 21:08:38 npe61172270ee04 sudo[44964]: pam_unix(sudo:session): session closed for user root | 17:59 |
| sean-k-mooney | May 12 21:08:39 npe61172270ee04 sudo[45008]: stack : PWD=/opt/stack ; USER=root ; COMMAND=/usr/bin/systemctl start libvirtd | 17:59 |
| sean-k-mooney | May 12 21:08:39 npe61172270ee04 sudo[45008]: pam_unix(sudo:session): session opened for user root(uid=0) by stack(uid=1001) | 17:59 |
| sean-k-mooney | melwitt: so this is likely one of 2 things | 18:01 |
| sean-k-mooney | the whitebox tempest plugin needs to run with concurrency 1 because it has tests that do things like restart libvirt | 18:01 |
| sean-k-mooney | so it could be a whitebox test | 18:01 |
| sean-k-mooney | or this could be because of a post playbook | 18:02 |
| melwitt | sean-k-mooney: oh huh ok. let me check if there is that | 18:02 |
| melwitt | thanks | 18:02 |
| sean-k-mooney | i used `curl https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1ef/openstack/1efc16477b414adbaf673b00520aa5cf/compute-host/logs/syslog.txt \| zcat \| lnav` | 18:03 |
| sean-k-mooney | to look at the syslog locally | 18:03 |
| sean-k-mooney | but it's clearly showing that sudo was used to invoke sysclt | 18:03 |
| sean-k-mooney | *systemctl | 18:03 |
| sean-k-mooney | test_vtpm_creation_after_virtqemud_restart | 18:04 |
| sean-k-mooney | that is probably the test i would look at first | 18:04 |
| sean-k-mooney | yep | 18:05 |
| sean-k-mooney | https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute/test_vtpm.py#L168 | 18:05 |
| melwitt | ugh ok | 18:05 |
| sean-k-mooney | melwitt: gmaan so we have never gotten around to making the serial decorator from tempest proper | 18:05 |
| sean-k-mooney | work in whitebox | 18:05 |
| melwitt | that makes it all make sense. thanks | 18:06 |
| sean-k-mooney | so you either need to split the job into 2 jobs or run it serially. | 18:06 |
| sean-k-mooney | we have done both at different times but we currently set concurrency 1 | 18:07 |
| sean-k-mooney | https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L176 | 18:07 |
| sean-k-mooney | ah https://github.com/openstack/nova/blob/master/.zuul.yaml#L484 | 18:08 |
| melwitt | yeah, I had seen that but did not know the root reason why /facepalm. I'm gonna include this specifically in my code comment | 18:08 |
| sean-k-mooney | we should just make whitebox use the serial decorator properly eventually | 18:08 |
| sean-k-mooney | maybe let ai spin on that for a while | 18:08 |
| melwitt | yeah... I know. that was my bad. but I did not see why these would have to be serial, I completely missed the restart thing | 18:08 |
| sean-k-mooney | but ya it's one of those things that you only know about because you have the scars | 18:09 |
| sean-k-mooney | melwitt: not at all this is very easy to miss | 18:09 |
| melwitt | "who wouldn't want concurrency!" "oh." | 18:10 |
| gmaan | sean-k-mooney: yeah but is there any known issue for not using serial decorator in whitebox? | 18:10 |
| melwitt | well, mystery solved haha | 18:10 |
| sean-k-mooney | gmaan: jparker tried, hit an issue and we never got around to it | 18:10 |
| sean-k-mooney | so gmaan no | 18:10 |
| sean-k-mooney | we tried it once, there was a bug, and we reverted back to not using it | 18:11 |
| gmaan | ohk, I can try but sometime start of next month or when get time | 18:11 |
| sean-k-mooney | https://opendev.org/openstack/whitebox-tempest-plugin/commit/a8986a86c03e32daeebcf4e0fb65c91aede2248a | 18:11 |
| melwitt | that means it would let one test be serial test without having to make all the others? | 18:11 |
| sean-k-mooney | i'm not convinced it was enabled properly in the first place | 18:12 |
| sean-k-mooney | # Decorator support for serial does not land into tempest until 34.0.0. | 18:12 |
| sean-k-mooney | lol ok well that's not a problem any more | 18:12 |
| sean-k-mooney | melwitt: yes | 18:12 |
| melwitt | sounds like a good decorator | 18:13 |
| gmaan | sean-k-mooney: ack, i was searching the link | 18:13 |
| sean-k-mooney | melwitt: it uses a file based reader writer lock | 18:13 |
| sean-k-mooney | melwitt: basically when you use this all the tests that don't need to run serially acquire the reader lock and the ones that do acquire the writer lock | 18:13 |
| sean-k-mooney | and then the filesystem lock is synchronised across the workers | 18:14 |
| melwitt | makes sense | 18:14 |
| sean-k-mooney | but in addition to that we run the serial tests last | 18:14 |
| sean-k-mooney | because of how we order tests based on name | 18:14 |
| sean-k-mooney | at least in upstream tempest | 18:14 |
| sean-k-mooney | melwitt: https://github.com/openstack/tempest/commit/dfb304355b46882696ef26386637836577be8db7 | 18:15 |
| sean-k-mooney | it was an optimisation that gibi helped implement a few years ago | 18:15 |
| gmaan | yeah, that resolved the aggregates tests issue | 18:16 |
| melwitt | very cool | 18:16 |
| sean-k-mooney | hehe https://github.com/openstack/tempest/commit/73ba33773daf1df1be792b616842dd389fd325bc | 18:16 |
| sean-k-mooney | looks like you have used it before | 18:16 |
| melwitt | uhhh lol | 18:17 |
| gmaan | so for now, concurrency=1 will work for VTPM case as they are from same test class | 18:17 |
| sean-k-mooney | yep also we run like 8 tests in that job | 18:17 |
| sean-k-mooney | so concurrency 1 is not a big issue | 18:17 |
| melwitt | well, I'm not gonna forget it again now haha | 18:17 |
| melwitt | I put it at its own job bc it needs nested virt and was wary to tie up those machines too much with other tests that don't need it | 18:18 |
| sean-k-mooney | for whitebox in general it would be nicer to not need that obviously as many of the tests can run in parallel | 18:18 |
| gmaan | yeah | 18:18 |
| sean-k-mooney | well mystery solved | 18:19 |
| sean-k-mooney | feel free to ping and i can review the zuul change quickly | 18:19 |
| melwitt | yeah. this is good. I was struggling to write the code comment for needing concurrency=1 so I was looking at details again. bc I was not seeing why. ugh. | 18:19 |
| opendevreview | melanie witt proposed openstack/nova master: Use tempest_concurrency=1 for nova-vtpm job https://review.opendev.org/c/openstack/nova/+/984864 | 18:29 |
| melwitt | gmaan, sean-k-mooney: ^ | 18:30 |
| gmaan | melwitt: ack, thanks | 18:39 |
| sean-k-mooney | looks good to me | 18:46 |
| melwitt | I'm sure someone already said this but https://bugs.launchpad.net/tempest/+bug/2153382 is affecting nova-next too Details: {'code': 401, 'title': 'Unauthorized', 'message': 'The request you have made requires authentication.'} | 19:41 |
| melwitt | (just replied on the ML) | 19:53 |
| gmaan | ohk, I did not realize nova-next also run that test | 19:54 |
| gmaan | anyways fix is in gate, I will merge and update ML once test results are out https://review.opendev.org/c/openstack/tempest/+/938766 | 19:55 |
| melwitt | thanks gmaan | 19:56 |
| melwitt | nova-multi-cell too. sent my reply too hastily | 19:57 |
| sean-k-mooney | you know our gate would pass more frequently if we just turned off cinder :P | 20:01 |
| sean-k-mooney | gmaan: was the change from a 403 to a 401 intentional by the way | 20:01 |
| gmaan | it was due to a change in the default value of service_token_roles_required in keystonemiddleware | 20:02 |
| sean-k-mooney | it's not really a 401 Unauthorized Error as i understand it. the service token is a valid token but it does not have the required roles right? | 20:02 |
| gmaan | and the tempest test did not handle it because it was disabled in CI. | 20:02 |
| gmaan | yes | 20:03 |
| sean-k-mooney | so it really should be a 403 Forbidden Error | 20:03 |
| sean-k-mooney | because it's an authorization issue not authentication | 20:03 |
| sean-k-mooney | so the real fix would be in cinder to make it return a 403 right? | 20:04 |
| gmaan | I am not sure, if the service token does not have a required role then the request does not reach cinder itself to decide whether the operation is forbidden or not. | 20:05 |
| gmaan | so that is why 401 seems a valid one, as the passed token and required roles are not valid | 20:05 |
| sean-k-mooney | but 401 is not about roles or permission | 20:06 |
| sean-k-mooney | it's about whether the token is expired/valid | 20:06 |
| melwitt | I thought that would include roles too though, no? | 20:06 |
| sean-k-mooney | a 401 is an indication to the client that it should reauthenticate | 20:06 |
| sean-k-mooney | and retry | 20:06 |
| sean-k-mooney | but you do not retry a 403 | 20:07 |
| sean-k-mooney | from https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/401 | 20:07 |
| sean-k-mooney | A 401 Unauthorized is similar to the 403 Forbidden response, except that a 403 is returned when a request contains valid credentials, but the client does not have permissions to perform a certain action. | 20:07 |
| sean-k-mooney | and from https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/403 """Clients that receive a 403 response should expect that repeating the request without modification will fail with the same error. Server owners may decide to send a 404 response instead of a 403 if acknowledging the existence of a resource to clients with insufficient privileges is not | 20:08 |
| sean-k-mooney | desired.""" | 20:08 |
| sean-k-mooney | melwitt: in general no | 20:09 |
| sean-k-mooney | it can | 20:09 |
| gmaan | so in this case, token is considered as invalid right | 20:09 |
| sean-k-mooney | but it's more that the retry behavior is different | 20:09 |
| sean-k-mooney | the token is valid | 20:09 |
| sean-k-mooney | but it has insufficient privileges | 20:10 |
| melwitt | 401 maybe you need to re-auth as a different project or role | 20:10 |
| sean-k-mooney | i guess a client could choose to do that | 20:11 |
| sean-k-mooney | eventually nova should just be calling cinder with its own token with the service role on the user token | 20:12 |
| gmaan | I think it is different from the service token perspective. keystonemiddleware considers a service token valid if it has all the required roles | 20:12 |
| sean-k-mooney | and we should not be looking at the service_token at all for permissions in this case | 20:12 |
| gmaan | otherwise a user token which is valid can always be a valid token but that is not the case if we see that as a service token | 20:12 |
| sean-k-mooney | maybe | 20:13 |
| gmaan | it just validates whether the service token sent has the required role to be considered as 'service' and if not then it is an invalid "SERVICE TOKEN" | 20:13 |
| sean-k-mooney | well | 20:13 |
| gmaan | from the user token perspective i agree on 401 vs 403 | 20:13 |
| sean-k-mooney | it needs to do 2 things, validate it has the expected roles and that it has not expired | 20:13 |
| sean-k-mooney | if both are true then the service token is valid | 20:13 |
| gmaan | yes | 20:14 |
| sean-k-mooney | anyway did i understand the fix in tempest is to accept either 401 or 403 | 20:14 |
| sean-k-mooney | so that you can tolerate either behavior | 20:14 |
| gmaan | yes | 20:14 |
| gmaan | it was only 403 previously bcz service token roles were not validated before | 20:15 |
| sean-k-mooney | that's fair, you have a few whitespace issues by the way | 20:15 |
| sean-k-mooney | well yes and no it was optional and off by default | 20:15 |
| sean-k-mooney | but for the cve | 20:15 |
| sean-k-mooney | we added validation in the code separate from the policy layer | 20:16 |
| gmaan | i thought of adding a new config option to check 401 but that will be unnecessary as service_token_roles_required is a temp config option and should be removed at the end | 20:16 |
| gmaan | it was added for migration purpose and never got moment to be default to True and then removed | 20:16 |
| gmaan | same as enforce_scope in RBAC | 20:17 |
| sean-k-mooney | same as the fallback for treating pcpus as vcpus | 20:17 |
| sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/975779 | 20:17 |
| gmaan | yeah, i do not know if it is good to be less aggressive on those things or bad :) | 20:17 |
| sean-k-mooney | well ^ was deprecated in nova 20.0.0 | 20:18 |
| sean-k-mooney | which was train | 20:19 |
| sean-k-mooney | i think that one is more than overdue to be disabled by default and we probably should delete it :) | 20:20 |
| sean-k-mooney | we have a habit of not cleaning up these migration paths for many many many years after they were meant to be removed | 20:21 |
| gmaan | yeah | 20:23 |
| gmaan | and by then they become a valid thing for operators than just migration path :) | 20:24 |
| gmaan | I will be more than happy if i can remove enforce_scope this cycle but I know it will fail many project tests and they will not fix it on time | 20:25 |
| gmaan | and breaking them might be the only option to proceed | 20:25 |
| sean-k-mooney | gmaan: if you do, it's kinder to do it early | 20:35 |
| sean-k-mooney | i.e. at m1 or m2 | 20:35 |
| gmaan | m1 was too early to fix the things so i sent m2 as deadline in ML | 20:37 |
| gmaan | doing it in m1 could be better but then it would not give time for projects to fix tests or change default who still disable it | 20:38 |
| sean-k-mooney | that's fair. on the other hand the default was changed in oslo a few cycles ago right | 20:40 |
| sean-k-mooney | that was meant to be the time to adapt | 20:40 |
| gmaan | yes | 20:40 |
| opendevreview | Masanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected https://review.opendev.org/c/openstack/nova/+/989378 | 23:21 |
| opendevreview | Merged openstack/nova master: Add reproducer test for bug 2105896 https://review.opendev.org/c/openstack/nova/+/946945 | 23:22 |
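
The file-based reader/writer lock described at 18:13 can be sketched roughly as follows. This is not tempest's implementation: it swaps in the standard library's `fcntl.flock` (POSIX only), and `LOCK_PATH`, `run_lock` and `can_acquire` are hypothetical names chosen for illustration. Tests that can run in parallel take the shared (reader) lock, while a test that must run alone, such as one that restarts libvirt, takes the exclusive (writer) lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file location, NOT the path tempest uses.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "serial_tests_sketch.lock")


@contextmanager
def run_lock(serial, path=LOCK_PATH):
    """Hold the writer lock for serial tests, the reader lock otherwise."""
    mode = fcntl.LOCK_EX if serial else fcntl.LOCK_SH
    with open(path, "a") as handle:
        # Blocks until the lock is compatible with what other workers hold.
        fcntl.flock(handle, mode)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def can_acquire(serial, path=LOCK_PATH):
    """Non-blocking probe: would a worker of this kind get the lock now?"""
    mode = fcntl.LOCK_EX if serial else fcntl.LOCK_SH
    with open(path, "a") as handle:
        try:
            fcntl.flock(handle, mode | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(handle, fcntl.LOCK_UN)
        return True
```

While any parallel test holds the reader lock, other parallel tests can still start but a serial test has to wait; while a serial test holds the writer lock, nothing else can start. Because the lock lives on the filesystem, it is shared by all worker processes on the same host.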