Wednesday, 2026-05-20

opendevreviewGoutham Pacha Ravi proposed openstack/nova-specs master: Add virtiofs cold migration spec  https://review.opendev.org/c/openstack/nova-specs/+/98573801:43
opendevreviewMerged openstack/os-vif master: typing: Updates for typed oslo.config  https://review.opendev.org/c/openstack/os-vif/+/98920610:47
opendevreviewMerged openstack/os-vif master: Remove linux bridge plugin  https://review.opendev.org/c/openstack/os-vif/+/94158610:52
opendevreviewElod Illes proposed openstack/nova master: Do not log metadata proxy shared secret  https://review.opendev.org/c/openstack/nova/+/98850113:05
elodillessean-k-mooney: i've extended a unit test ^^^13:06
sean-k-mooneyself.assertIn('mismatched_signature', str(warning_calls))13:07
sean-k-mooneyshould that not be assertNotIn13:08
sean-k-mooneyoh13:08
sean-k-mooneyno13:08
sean-k-mooneythat's the value that was passed in the header13:08
sean-k-mooneyok and signed is the config value13:08
sean-k-mooneyelodilles: ya ok that's correct +213:08
sean-k-mooneyyou could argue you should not log either value but i think at that point it starts to be a little hard to debug13:10
sean-k-mooneyelodilles: i'm ok with this as a compromise for operational debuggability13:11
sean-k-mooneyat some point we should split that test up into multiple tests but that's out of scope of your change13:12
elodillesACK, thanks :)13:16
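A minimal sketch of the kind of check discussed above, assuming a hypothetical `validate_signature` helper rather than nova's actual metadata handler. On a mismatch it logs only the signature the caller sent, and a mock on the logger lets a test confirm that the header value appears in the warning while the shared secret does not. All names here are illustrative assumptions.

```python
import hashlib
import hmac
import logging
from unittest import mock

LOG = logging.getLogger("metadata_sketch")  # hypothetical logger name


def validate_signature(instance_id: str, provided: str, shared_secret: str) -> bool:
    """Return True when the provided signature matches the expected HMAC."""
    expected = hmac.new(
        shared_secret.encode(), instance_id.encode(), hashlib.sha256
    ).hexdigest()
    if hmac.compare_digest(expected, provided):
        return True
    # Log only what the caller sent; never the secret or the expected value.
    LOG.warning(
        "Signature %s does not match for instance %s", provided, instance_id
    )
    return False


def logged_warnings(provided: str, shared_secret: str) -> str:
    """Run the check with the logger mocked and return the recorded calls."""
    with mock.patch.object(LOG, "warning") as warning:
        validate_signature("fake-instance", provided, shared_secret)
    return str(warning.call_args_list)
```

A test can then pair an `assertIn` on the header value with an `assertNotIn` on the secret, which is the trade-off between debuggability and leaking configuration that the review settled on.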
opendevreviewMasanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected  https://review.opendev.org/c/openstack/nova/+/98937813:21
opendevreviewMasanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected  https://review.opendev.org/c/openstack/nova/+/98937813:23
sean-k-mooneyelodilles: by the way the os-vif changes finally merged if you want to proceed with the release and have not already done so13:24
sean-k-mooneywe might need to update the sha. if you already did the m1 release there is no rush to release again13:24
elodillessean-k-mooney: oh, cool, i've been waiting for the patch to merge. but then it did :) let me update the hash on the rel patch13:25
elodillessean-k-mooney: hash and version number is updated -> https://review.opendev.org/c/openstack/releases/+/98806113:29
sean-k-mooneythat looks correct to me, i have commented the same on the patch13:32
elodillesUggla: when you have some time, could you please take a quick look at the os-vif release patch? Sean and Stephen have both +1'd it already ;) https://review.opendev.org/c/openstack/releases/+/98806114:24
Ugglaelodilles, so I guess Stephen includes what he wanted in os-vif right ?14:25
elodillesyepp14:27
elodillesthat patch has merged14:27
auniyalhttps://review.opendev.org/c/openstack/requirements/+/98841214:44
auniyalgmaan this might be the reason for tempest bug - attachment issue14:44
auniyal:( seems nope 14:58
opendevreviewTakashi Kajinami proposed openstack/nova master: Drop description for ancient horizon  https://review.opendev.org/c/openstack/nova/+/98941915:46
opendevreviewTakashi Kajinami proposed openstack/nova master: Drop description for ancient horizon  https://review.opendev.org/c/openstack/nova/+/98941915:48
gmaanauniyal: thanks, let's chat on cinder channel as it is not related to nova16:57
Ugglasean-k-mooney, does the PCI tracker only read the libvirt pci device "inventory" at startup ?17:00
sean-k-mooneyit technically does not read it at all17:02
sean-k-mooneythe pci tracker itself is virt driver independent17:03
sean-k-mooneyand each of the virt drivers reads the pci information internally and presents it to the tracker in a normalised form17:03
sean-k-mooneybut yes it's effectively cached 17:03
sean-k-mooneywe don't support changing the devices while nova-compute is running17:04
UgglaI mean update_devices_from_hypervisor_resources is only called at boot ?17:04
Ugglas/boot/nova startup/17:04
sean-k-mooneywe do refresh it as part of update available resources17:04
sean-k-mooneybut with cached data or with data that should be cached17:04
Ugglasean-k-mooney thx.17:06
melwittgmaan: since your comment on the nova-vtpm job patch, I went looking through the NoValidHost failures again (the reason inspiring the patch) and found that it looks like what happens in that case is that a "Connection event '0' reason 'Connection to libvirt lost: 1'" happens on a compute node which causes the COMPUTE_STATUS_DISABLED trait to be added to that compute and _that's_ what ends up causing the NoValidHost ... as the other17:22
melwitt compute node shows as temporarily disabled,17:22
melwittso now I'm questioning whether setting tempest_concurrency=1 would actually help anything ... seems like maybe not17:22
melwittI'm not sure how or why it could be more common to cause a failure in nova-vtpm ... perhaps because there are so few tests total and two of them involve resizes which have to go to the other host17:23
sean-k-mooneymelwitt: if the libvirt connection is lost17:29
sean-k-mooneymelwitt: we mark the compute service as disabled automatically17:30
sean-k-mooneyand re-enable it when it comes back17:30
melwittsean-k-mooney: yes that's what I meant is I can see why it is causing the NoValidHost, if it happens at an inopportune time17:30
sean-k-mooneyso the question is why is that disconnect happening, are there OOM events17:31
sean-k-mooneythat normally only happens if libvirt restarts17:31
melwitthm ok, let me check that17:31
sean-k-mooneythe nova-vtpm job looks pretty stable by the way17:32
sean-k-mooneyi assume you're debugging https://zuul.openstack.org/build/61181a17a11a41feb198f6d817337ca117:33
melwittyeah.. agreed. at the time I proposed the patch to reduce concurrency to 1, it was failing with NoValidHost more often and I had thought it was in part due to tests running in parallel. but now that I see this libvirt connection loss thing, I am questioning that thought17:33
melwittno I'm using https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf bc that's where I see NoValidHost. the one you linked looked to be a different problem unless I missed something17:34
sean-k-mooneywell it could be concurrency related17:34
melwittfor the libvirt connection lost?17:34
sean-k-mooneywell if the connection loss was because of an OOM17:35
sean-k-mooneybut i don't see that in the logs for that job17:35
melwittgotcha. yeah I'm not seeing something like that either so far17:35
sean-k-mooneylibvirt.libvirtError: internal error: client socket is closed17:38
sean-k-mooneythat is weird17:38
sean-k-mooneythat is saying that nova could not create the vm in this case because the libvirt socket was closed from libvirt's side17:39
sean-k-mooneyor at least we received a disconnect on the socket we had opened to talk to it17:39
sean-k-mooneyin this case it happens to be coming from libvirt_secret.undefine()17:40
sean-k-mooneybased on the traceback https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf/log/compute-host/logs/screen-n-cpu.txt#520317:40
melwittthis is the one I found that caused a NoValidHost https://zuul.opendev.org/t/openstack/build/1efc16477b414adbaf673b00520aa5cf/log/controller/logs/screen-n-cpu.txt#454317:43
sean-k-mooneyso i think the libvirt log timestamps are off by an hour from nova's17:47
sean-k-mooney2026-05-12 22:08:17.608+0100: 29111: debug : virThreadJobClear:118 : Thread 29111 (rpc-libvirtd) finished job remoteDispatchNodeGetCPUMap with ret=017:47
sean-k-mooney2026-05-12 22:08:29.663+0100: 29119: debug : virThreadJobSet:93 : Thread 29119 (prio-rpc-libvirtd) is now running job remoteDispatchConnectGetLibVersion17:48
sean-k-mooneybut there appears to be a dead space in the log at that point17:48
sean-k-mooneyoh i was looking at the compute17:50
sean-k-mooneythere is a 20 second gap just before then17:52
sean-k-mooneybut nothing really stands out17:52
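When correlating logs whose timestamps differ by an hour, it can help to normalise them first. A minimal sketch, assuming the offset comes from the `+0100` suffix in the libvirt timestamps and that the other logs are in UTC; `libvirt_ts_to_utc` is a hypothetical helper name, not part of any tool used above.

```python
from datetime import datetime, timezone


def libvirt_ts_to_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS.mmm+HHMM' timestamp and convert it to UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z")
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    # Timestamp copied from the libvirt debug line quoted in the log.
    print(libvirt_ts_to_utc("2026-05-12 22:08:17.608+0100").isoformat())
```

With both logs expressed in UTC, gaps and restarts can be lined up against each other directly.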
sean-k-mooneyi will say it's a little strange that libvirt prints its version17:56
sean-k-mooney2026-05-12 22:08:25.052+0100: 96003: info : libvirt version: 10.0.0, package: 10.0.0-2ubuntu8.13 (Ubuntu)17:56
sean-k-mooneyand then starts loading a lot of modules and doing some network config not long after17:56
sean-k-mooneythat kind of looks like it restarted17:57
sean-k-mooneymelwitt: ok ya it was stopped and started17:59
sean-k-mooneyMay 12 21:08:38 npe61172270ee04 sudo[44964]:    stack : PWD=/opt/stack ; USER=root ; COMMAND=/usr/bin/systemctl stop libvirtd17:59
sean-k-mooneyMay 12 21:08:38 npe61172270ee04 sudo[44964]: pam_unix(sudo:session): session opened for user root(uid=0) by stack(uid=1001)17:59
sean-k-mooneyMay 12 21:08:38 npe61172270ee04 sudo[44964]: pam_unix(sudo:session): session closed for user root17:59
sean-k-mooneyMay 12 21:08:39 npe61172270ee04 sudo[45008]:    stack : PWD=/opt/stack ; USER=root ; COMMAND=/usr/bin/systemctl start libvirtd17:59
sean-k-mooneyMay 12 21:08:39 npe61172270ee04 sudo[45008]: pam_unix(sudo:session): session opened for user root(uid=0) by stack(uid=1001)17:59
sean-k-mooneymelwitt: so this is likely one of 2 things18:01
sean-k-mooneythe whitebox tempest plugin needs to run with concurrency 1 because it has tests that do things like restart libvirt18:01
sean-k-mooneyso it could be a whitebox test18:01
sean-k-mooneyor this could be because of a post playbook18:02
melwittsean-k-mooney: oh huh ok. let me check if there is that18:02
melwittthanks18:02
sean-k-mooneyi used  `curl https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1ef/openstack/1efc16477b414adbaf673b00520aa5cf/compute-host/logs/syslog.txt | zcat | lnav`18:03
sean-k-mooneyto look at the syslog locally18:03
sean-k-mooneybut this clearly shows that sudo was used to invoke sysclt18:03
sean-k-mooney*systemctl18:03
sean-k-mooneytest_vtpm_creation_after_virtqemud_restart18:04
sean-k-mooneythat is probably the test i would look at first18:04
sean-k-mooneyyep18:05
sean-k-mooneyhttps://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute/test_vtpm.py#L16818:05
melwittugh ok18:05
sean-k-mooneymelwitt: gmaan so we have never gotten around to making the serial decorator from tempest proper18:05
sean-k-mooneywork in whitebox18:05
melwittthat makes it all make sense. thanks 18:06
sean-k-mooneyso you either need to split the job into 2 jobs or run it serially.18:06
sean-k-mooneywe have done both at different times but we currently set concurrency 118:07
sean-k-mooneyhttps://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L17618:07
sean-k-mooneyah https://github.com/openstack/nova/blob/master/.zuul.yaml#L48418:08
melwittyeah, I had seen that but did not know the root reason why /facepalm. I'm gonna include this specifically in my code comment18:08
sean-k-mooneywe should just make whitebox use the serial decorator properly eventually18:08
sean-k-mooneymaybe let ai spin on that for a while18:08
melwittyeah... I know. that was my bad. but I did not see why these would have to be serial, I completely missed the restart thing18:08
sean-k-mooneybut ya it's one of those things that you only know about because you have the scars18:09
sean-k-mooneymelwitt: not at all this is very easy to miss18:09
melwitt"who wouldn't want concurrency!" "oh."18:10
gmaansean-k-mooney: yeah but is there any known issue for not using serial decorator in whitebox? 18:10
melwittwell, mystery solved haha18:10
sean-k-mooneygmaan: jparker tried, hit an issue and we never got around to it18:10
sean-k-mooneyso gmaan no18:10
sean-k-mooneywe tried it once, there was a bug, and we reverted back to not using it18:11
gmaanohk, I can try but sometime start of next month or when get time18:11
sean-k-mooneyhttps://opendev.org/openstack/whitebox-tempest-plugin/commit/a8986a86c03e32daeebcf4e0fb65c91aede2248a18:11
melwittthat means it would let one test be serial test without having to make all the others?18:11
sean-k-mooneyi'm not convinced it was enabled properly in the first place18:12
sean-k-mooney   # Decorator support for serial does not land into tempest until 34.0.0.18:12
sean-k-mooneylol ok well that's not a problem any more18:12
sean-k-mooneymelwitt: yes18:12
melwittsounds like a good decorator18:13
gmaansean-k-mooney: ack, i was searching the link18:13
sean-k-mooneymelwitt: it uses a file based reader writer lock18:13
sean-k-mooneymelwitt: basically when you use this all the tests that don't need to run serially acquire the reader lock and the ones that do acquire the writer lock18:13
sean-k-mooneyand then the filesystem lock is synchronised across the workers18:14
melwittmakes sense18:14
sean-k-mooneybut in addition to that we run the serial tests last18:14
sean-k-mooneybecause of how we order tests based on name18:14
sean-k-mooneyat least in upstream tempest18:14
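The reader/writer scheme described above can be sketched with POSIX advisory locks. This is an illustration under stated assumptions, not tempest's implementation: `worker_lock`, `can_acquire` and `LOCK_PATH` are hypothetical names, and the sketch relies on `fcntl.flock`, so it only applies on Unix-like hosts. Tests that may run in parallel take a shared lock, and a serial test takes the exclusive lock, which blocks until every shared holder has released.

```python
import contextlib
import fcntl
import os
import tempfile

# Hypothetical lock file location shared by all test workers.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "serial_sketch.lock")


@contextlib.contextmanager
def worker_lock(serial: bool, path: str = LOCK_PATH):
    """Hold the writer (exclusive) lock for serial tests, else the reader lock."""
    mode = fcntl.LOCK_EX if serial else fcntl.LOCK_SH
    with open(path, "a") as handle:
        fcntl.flock(handle, mode)  # blocks until the lock can be granted
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def can_acquire(serial: bool, path: str = LOCK_PATH) -> bool:
    """Non-blocking probe: could a worker take this kind of lock right now?"""
    mode = (fcntl.LOCK_EX if serial else fcntl.LOCK_SH) | fcntl.LOCK_NB
    with open(path, "a") as handle:
        try:
            fcntl.flock(handle, mode)
        except BlockingIOError:
            return False
        fcntl.flock(handle, fcntl.LOCK_UN)
        return True
```

Because the lock lives in the filesystem, it coordinates separate worker processes and not only threads, which matches the point made above about synchronising across workers.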
sean-k-mooneymelwitt: https://github.com/openstack/tempest/commit/dfb304355b46882696ef26386637836577be8db718:15
sean-k-mooneyit was an optimisation that gibi helped implement a few years ago18:15
gmaanyeah, that resolved the aggregates tests issue18:16
melwittvery cool18:16
sean-k-mooneyhehe https://github.com/openstack/tempest/commit/73ba33773daf1df1be792b616842dd389fd325bc18:16
sean-k-mooneylooks like you have used it before18:16
melwittuhhh lol18:17
gmaanso for now, concurrency=1 will work for VTPM case as they are from same test class 18:17
sean-k-mooneyyep also we run like 8 tests in that job18:17
sean-k-mooneyso concurrency 1 is not a big issue18:17
melwittwell, I'm not gonna forget it again now haha18:17
melwittI put it at its own job bc it needs nested virt and was wary to tie up those machines too much with other tests that don't need it18:18
sean-k-mooneyfor whitebox in general it would be nicer to not need that obviously as many of the tests can run in parallel18:18
gmaanyeah18:18
sean-k-mooneywell mystery solved18:19
sean-k-mooneyfeel free to ping and i can review the zuul change quickly18:19
melwittyeah. this is good. I was struggling to write the code comment for needing concurrency=1 so I was looking at details again. bc I was not seeing why. ugh.18:19
opendevreviewmelanie witt proposed openstack/nova master: Use tempest_concurrency=1 for nova-vtpm job  https://review.opendev.org/c/openstack/nova/+/98486418:29
melwittgmaan, sean-k-mooney: ^18:30
gmaanmelwitt: ack, thanks18:39
sean-k-mooneylooks good to me18:46
melwittI'm sure someone already said this but https://bugs.launchpad.net/tempest/+bug/2153382 is affecting nova-next too Details: {'code': 401, 'title': 'Unauthorized', 'message': 'The request you have made requires authentication.'}19:41
melwitt(just replied on the ML)19:53
gmaanohk, I did not realize nova-next also run that test19:54
gmaananyways fix is in gate, I will merge and update ML once test results are out https://review.opendev.org/c/openstack/tempest/+/93876619:55
melwittthanks gmaan 19:56
melwittnova-multi-cell too. sent my reply too hastily19:57
sean-k-mooneyyou know our gate would pass more frequently if we just turned off cinder :P20:01
sean-k-mooneygmaan: was the change from a 403 to a 401 intentional by the way20:01
gmaanit was due to change in default value of service_token_roles_required in keystonemiddleware20:02
sean-k-mooneyit's not really a 401 Unauthorized Error as i understand it. the service token is a valid token but it does not have the required roles right?20:02
gmaanand tempest test did not handle it because it was disabled in CI.20:02
gmaanyes20:03
sean-k-mooneyso it really should be a 403 Forbidden Error20:03
sean-k-mooneybecause it's an authorization issue not authentication20:03
sean-k-mooneyso the real fix would be in cinder to make it return a 403 right?20:04
gmaanI am not sure, if service token does not have a required token then it does not come to cinder itself that if operation is forbidden or not. 20:05
gmaanso that is why 401 seems valid one as passed token and required roles are not valid20:05
sean-k-mooneybut 401 is not about roles or permissions20:06
sean-k-mooneyit's about whether the token is expired/valid20:06
melwittI thought that would include roles too though, no?20:06
sean-k-mooneya 401 is an indication to the client that it should reauthenticate20:06
sean-k-mooneyand retry20:06
sean-k-mooneybut you do not retry a 40320:07
sean-k-mooneyfrom https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/401 20:07
sean-k-mooneyA 401 Unauthorized is similar to the 403 Forbidden response, except that a 403 is returned when a request contains valid credentials, but the client does not have permissions to perform a certain action.20:07
sean-k-mooneyand from https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/403 """Clients that receive a 403 response should expect that repeating the request without modification will fail with the same error. Server owners may decide to send a 404 response instead of a 403 if acknowledging the existence of a resource to clients with insufficient privileges is not20:08
sean-k-mooneydesired."""20:08
sean-k-mooneymelwitt: in general no20:09
sean-k-mooneyit can20:09
gmaanso in this case, token is considered as invalid right20:09
sean-k-mooneybut it's more that the retry behavior is different20:09
sean-k-mooneythe token is valid20:09
sean-k-mooneybut it has insufficient privileges20:10
melwitt401 maybe you need to re-auth as a different project or role20:10
sean-k-mooneyi guess a client could choose to do that20:11
sean-k-mooneyeventually nova should just be calling cinder with its own token with the service role on the user token20:12
gmaanI think it is different from service token perspective. keystonemiddleware considers a service token valid only if it has all required roles20:12
sean-k-mooneyand we should not be looking at the service_token at all for permissions in this case20:12
gmaanotherwise a user token which is valid can always be a valid token but that is not the case if we see that as a service token20:12
sean-k-mooneymaybe20:13
gmaanit just validates if the service token sent has the required role to be considered as 'service' and if not then it is an invalid "SERVICE TOKEN"20:13
sean-k-mooneywell 20:13
gmaanfrom user token perspective i agree on 401 vs 40320:13
sean-k-mooneyit needs to do 2 things, validate it has the expected roles and that it has not expired20:13
sean-k-mooneyif both are true then the service token is valid 20:13
gmaanyes20:14
sean-k-mooneyanyway did i understand the fix in tempest is to accept either 401 or 40320:14
sean-k-mooneyso that you can tolerate either behavior20:14
gmaanyes20:14
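A small sketch of the two ideas in this exchange, with all names hypothetical and no claim to mirror tempest or keystonemiddleware code. `status_by_http_semantics` encodes the reading of the status codes argued above (invalid or expired credentials give 401, valid credentials lacking roles give 403), and `is_tolerated_rejection` shows a check that accepts either code so a test passes under both behaviors.

```python
# Status codes a tolerant test accepts for a rejected request (assumption).
TOLERATED_REJECTIONS = frozenset({401, 403})


def status_by_http_semantics(credentials_valid: bool, has_required_roles: bool) -> int:
    """Pick a status code following the 401-versus-403 distinction."""
    if not credentials_valid:
        return 401  # client should re-authenticate and retry
    if not has_required_roles:
        return 403  # retrying the same request will fail again
    return 200


def is_tolerated_rejection(status_code: int) -> bool:
    """True when the response is one of the accepted rejection codes."""
    return status_code in TOLERATED_REJECTIONS
```

Accepting a set of codes keeps the test independent of whether the middleware treats a service token without the required roles as invalid or as merely unprivileged.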
gmaanit was only 403 previously bcz service token roles were not validated before20:15
sean-k-mooneythat's fair. you have a few whitespace issues by the way20:15
sean-k-mooneywell yes and no it was optional and off by default20:15
sean-k-mooneybut for the cve20:15
sean-k-mooneywe added validation in the code separate from the policy layer20:16
gmaani thought of adding a new config option to check 401 but that will be unnecessary as service_token_roles_required is a temp config option and should be removed at the end20:16
gmaanit was added for migration purposes and never got the moment to default to True and then be removed20:16
gmaansame as enforce_scope in RBAC20:17
sean-k-mooneysame as the fallback for treating pcpus as vcpus20:17
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/97577920:17
gmaanyeah, i do not know if it is good to be less aggressive on those things or bad :)20:17
sean-k-mooneywell  ^ was deprecated in nova 20.0.020:18
sean-k-mooneywhich was train20:19
sean-k-mooneyi think that one is more than overdue to be disabled by default and we probably should delete it :)20:20
sean-k-mooneywe have a habit of not cleaning up these migration paths for many many many years after they were meant to be removed20:21
gmaanyeah20:23
gmaanand by then they become a valid thing for operators than just migration path :)20:24
gmaanI will be more than happy if i can remove enforce_scope this cycle but I know it will fail many project tests and they will not fix it on time20:25
gmaanand breaking them might be the only option to proceed 20:25
sean-k-mooneygmaan: if you do it, it's kinder to do it early20:35
sean-k-mooneyi.e. at m1 or m220:35
gmaanm1 was too early to fix the things so i sent m2 as deadline in ML20:37
gmaandoing it in m1 could be better but then it would not give time for projects to fix tests or change default who still disable it20:38
sean-k-mooneythat's fair. on the other hand the default was changed in oslo a few cycles ago right20:40
sean-k-mooneythat was meant to be the time to adapt20:40
gmaanyes20:40
opendevreviewMasanori Ueno proposed openstack/nova master: NUMA live-migration: ensure allocation_ratio is respected  https://review.opendev.org/c/openstack/nova/+/98937823:21
opendevreviewMerged openstack/nova master: Add reproducer test for bug 2105896  https://review.opendev.org/c/openstack/nova/+/94694523:22
