Wednesday, 2025-11-19

opendevreviewMerged openstack/ironic stable/2025.1: Fix storing inventory and plugin data in Swift  https://review.opendev.org/c/openstack/ironic/+/96668301:39
opendevreviewMerged openstack/networking-generic-switch master: devstack: Drop explicit etcd api version  https://review.opendev.org/c/openstack/networking-generic-switch/+/95720701:43
*** logan_ is now known as Guest3156103:28
rpittaugood morning ironic! o/07:49
rpittauhey hey kind of an urgent fix when someone has a moment https://review.opendev.org/c/openstack/ironic-python-agent/+/967535 thanks! :)08:17
opendevreviewMerged openstack/ironic-tempest-plugin master: Deprecate options for ironic-inspector tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/96751409:37
opendevreviewMerged openstack/ironic master: Improve pre-commit configuration  https://review.opendev.org/c/openstack/ironic/+/96725709:37
opendevreviewMerged openstack/ironic master: Follow-up: Document `noop` deploy interface  https://review.opendev.org/c/openstack/ironic/+/96758609:37
opendevreviewMerged openstack/ironic master: Nit: Fix typo in Migrating from ironic-inspector  https://review.opendev.org/c/openstack/ironic/+/96689709:38
opendevreviewVerification of a change to openstack/ironic master failed: Trait Based Networking Filter Expression Parsing and Base Models  https://review.opendev.org/c/openstack/ironic/+/96149809:58
opendevreviewVerification of a change to openstack/ironic master failed: Update devstack guides to raise RAM requirement  https://review.opendev.org/c/openstack/ironic/+/96708710:01
opendevreviewMerged openstack/ironic stable/2025.1: Handle HTTP 400 and 409 race condition in Redfish power operations  https://review.opendev.org/c/openstack/ironic/+/96629412:11
opendevreviewMerged openstack/ironic-python-agent master: Fix API URL reachability test to use full URL with port  https://review.opendev.org/c/openstack/ironic-python-agent/+/96753512:36
*** hroy_ is now known as hroy12:55
dtantsurbuild jobs will soon finish for ^^^, after that a recheck party13:09
rpittaustarting with the backport13:26
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent stable/2025.2: Test advertised ip reachability before assigning it  https://review.opendev.org/c/openstack/ironic-python-agent/+/96667113:49
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent stable/2025.1: Test advertised ip reachability before assigning it  https://review.opendev.org/c/openstack/ironic-python-agent/+/96677613:52
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent bugfix/11.1: Test advertised ip reachability before assigning it  https://review.opendev.org/c/openstack/ironic-python-agent/+/96677413:53
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent bugfix/11.0: Test advertised ip reachability before assigning it  https://review.opendev.org/c/openstack/ironic-python-agent/+/96677513:54
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry  https://review.opendev.org/c/openstack/bifrost/+/96138814:16
TheJuliagood morning14:26
opendevreviewDavid Nwosu proposed openstack/ironic master: Create getting-started.rst and link to other guides  https://review.opendev.org/c/openstack/ironic/+/96518914:29
opendevreviewDavid Nwosu proposed openstack/ironic master: Create getting-started.rst and link to other guides  https://review.opendev.org/c/openstack/ironic/+/96518914:33
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry  https://review.opendev.org/c/openstack/bifrost/+/96138815:31
dtantsurBoth build jobs failed, so no recheck party for now :(15:35
rpittau:/15:36
JayFhttps://review.opendev.org/c/openstack/requirements/+/967699 check_requirements job is busted until this lands15:43
JayFIDK if it runs against everything, but it's blocking clif's patch r/n (which I rechecked before I saw this)15:44
opendevreviewVerification of a change to openstack/ironic master failed: Trait Based Networking Filter Expression Parsing and Base Models  https://review.opendev.org/c/openstack/ironic/+/96149815:47
dtantsurJayF: it only runs on related changes16:01
JayFack16:12
opendevreviewNahian Pathan proposed openstack/ironic master: Reduce API calls when collecting sensor data with redfish  https://review.opendev.org/c/openstack/ironic/+/95548416:18
opendevreviewMarcus Furlong proposed openstack/sushy master: Don't require Boot and Actions for Systems  https://review.opendev.org/c/openstack/sushy/+/96771516:37
dtantsurBuild jobs are running again, fingers crossed16:58
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry  https://review.opendev.org/c/openstack/bifrost/+/96138817:16
dtantsurwe have new builds, recheck away :)17:26
rpittau\o/17:28
opendevreviewcid proposed openstack/ironic master: Support segmented port ranges  https://review.opendev.org/c/openstack/ironic/+/96772718:05
opendevreviewcid proposed openstack/ironic master: Support segmented port ranges  https://review.opendev.org/c/openstack/ironic/+/96772718:09
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry  https://review.opendev.org/c/openstack/bifrost/+/96138818:27
JayFI think cid and I just found a really gnarly bug in our devstack config19:39
JayFbut I'd love a sanity check if someone can join up and look now it'd be 10/10 awesome19:39
JayFtl;dr: devstack deploys nova-cpu.conf with status_code_retries = 300 which means, we believe, that it will only retry on status_code 300 19:39
JayFbased on what I am seeing on how the config is written19:39
JayFnevermind us19:52
JayFwe figured out the misunderstanding we were having19:52
JayF(I was having, tbh)19:52
JayFI think we need to add SERV* to nova.virt.ironic.driver._UNPROVISION_STATES19:56
cardoecid: https://review.opendev.org/c/openstack/ironic/+/967727 you mentioned some tests generated with AI. How would ya feel if we standardized a header to convey this?20:40
JayFhttps://bugs.launchpad.net/ironic/+bug/213196020:45
cidcardoe, I actually have thought of that. In fact, in a broader sense, a way to specify to what extent AI assisted.20:47
cidBut that might just be overkill20:47
cidI'm ++ to that20:48
cardoeSo I'm the TC member tasked with studying this. So just fishing for people's comments.20:50
JayFI'd rather the TC stay out of legal/foundation scope stuff like AI policy20:51
JayFthat's my $.0220:52
TheJuliaJayF: https://bugs.launchpad.net/ironic/+bug/213196220:53
cardoeJayF: So we've had a few projects say that they weren't happy with the Assisted-By: header and that they'd like the contributor to convey if they wrote the code or the tests with AI or both.21:08
cardoeSo I have a weird reproducer on 2025.221:08
JayFIt seems like an overreach if the TC adds or modifies those requirements, in a way that seems wrong?21:08
JayFPlease no more hurdles :( 21:09
cardoeQuite the opposite of what I'm going to do.21:10
JayFI guess I just think of 'studying AI policy' as having two outcomes: existing policy, or something more restrictive than existing policy. So strange to be on the side of arguing status quo21:10
cardoeSo the 2025.2 bug...21:25
cardoeThe node goes back to available but Nova doesn't see it as un-reserved21:25
cardoeit's still got the instance_uuid on it21:26
cardoeNow neutron-server crashed during deployment so I chalked it up to that. But I made it crash again and it did it the same way again21:27
TheJuliawhaaat?!21:28
TheJulia(so, funny you had a crash too, I just had to reboot my devstack laptop, io errors and all force it to be a mashing of the power button)21:28
cardoeInventory has not changed for provider 86eb7354-cc10-4173-8ff2-d1ac2ea6befd based on inventory data: {'CUSTOM_M1_SMALL': {'total': 1, 'reserved': 1, 'min_unit': 1, 'max_unit': 1, 'step_size': 1, 'allocation_ratio': 1.0}} set_inventory_for_provider21:28
TheJuliaYeah, if the instance_uuid is still set, it's never going to show up in the list of hardware nova considers usable21:29
TheJuliaThe key question: why is the instance_uuid still set21:29
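A minimal sketch of the rule described above, assuming a simplified node record: a node that still carries an `instance_uuid` is not treated as usable, even when its provision state is back to `available`. The `Node` dataclass and the `usable_nodes` helper are hypothetical names for illustration and do not reproduce Nova's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified view of an Ironic node."""
    uuid: str
    provision_state: str
    instance_uuid: Optional[str] = None


def usable_nodes(nodes: List[Node]) -> List[Node]:
    """Return nodes that are available and not tied to any instance."""
    return [
        n for n in nodes
        if n.provision_state == "available" and n.instance_uuid is None
    ]


nodes = [
    # Back in "available", but the instance_uuid was never cleared.
    Node("86eb7354-cc10-4173-8ff2-d1ac2ea6befd", "available",
         instance_uuid="stale-instance"),
    # A clean node (made-up identifier).
    Node("example-clean-node", "available"),
]
free = usable_nodes(nodes)
```

In this sketch only the second node is returned, which matches the symptom reported here: the node looks available in Ironic, yet stays reserved from Nova's point of view.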
cardoeIt crashed for me cause I used the logger mechanism in my neutron-server to dump into the logs all the network requests and the logger mechanism in 2025.2 apparently calls eventlet.monkey_patch() somehow21:29
TheJuliadid lightning strike the conductor? Gray beings make the conductor hover away? Are we the mirror universe?!?21:30
* TheJulia blinks21:30
TheJuliaWUT21:30
cardoeSo from ironic-api logs there's a DELETE /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd/vifs/75a8d0f1-9928-4280-b3aa-cb2aad4eea89 which Client-side error: Node 86eb7354-cc10-4173-8ff2-d1ac2ea6befd is locked by host 1327175-hp3, please retry after the current operation is completed.21:33
cardoeSimilarly right after that there's a PATCH /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd that throws the same thing.21:34
TheJuliaYeah, that's nova trying to unwind extra fields/values set because they want to control it21:34
TheJuliaeven though we wipe it all out in the end21:35
TheJuliabut21:35
cardoeI can grab conductor logs. They've just rolled off at this point cause I had debug set to 1121:35
JayFuh, there's a bug here I think21:35
TheJuliaif the conductor had a lock on the node, started things but instance_uuid was not removed and somewhere in there the conductor fell over, then I could see it leaving the value and you getting into some super weird state then21:35
JayFthis is relying on Ironic to remove the instance uuid 21:36
JayFso if teardown fails21:36
TheJuliabecause when conductor restarts, lock gets removed21:36
JayFhttps://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L128521:36
cardoeSo I cannot clear that instance_uuid in anyway.21:36
JayFit never attempts to clear it manually21:36
JayFhttps://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L138021:36
JayFI almost want a "validate that we don't have an instance_uuid on there anymore" step, but I guess it's a tradeoff between Nova API responsiveness to users and correctness on the backend21:37
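The "validate" step mentioned above could look roughly like the following sketch. It is an assumption about one possible shape, not code from Nova or Ironic: `get_node` is a hypothetical callable returning a dict with an `instance_uuid` key, and the attempt count and delay are made-up defaults. The polling loop is also where the tradeoff shows up, since every extra attempt delays the response to the user.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_instance_uuid_cleared(
    get_node: Callable[[], Dict[str, Optional[str]]],
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node no longer reports an instance_uuid.

    Returns True once the field is cleared, False if it is still set
    after the given number of attempts.
    """
    for attempt in range(attempts):
        node = get_node()
        if node.get("instance_uuid") is None:
            return True
        # Only wait when another attempt will follow.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A caller would treat a `False` result as a failed teardown and surface an error instead of assuming the node is free.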
TheJuliayeah21:37
TheJuliaI guess where I'm struggling with this is the conductor should still clear it21:37
JayFthe conductor never knows it got a request21:38
JayFif it 409's everything from Nova21:38
TheJuliaoh21:38
TheJuliaso, the unprovision never actually made it to the conductor?21:38
JayF'DELETE /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd/vifs/75a8d0f1-9928-4280-b3aa-cb2aad4eea89 which Client-side error: Node 86eb7354-cc10-4173-8ff2-d1ac2ea6befd is locked by host 1327175-hp3, please retry after the current operation is completed.' 'after that there's a PATCH /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd that throws the same thing'21:38
JayFthat actually isn't in destroy(), is it?21:39
TheJuliaI don't believe so21:39
JayFhttps://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L118121:40
JayFwould show this same call pattern21:40
TheJuliamaybe easier just to talk through it?21:41
JayFI can if desired21:42
TheJuliahttps://meet.google.com/ucs-wsgg-wrt21:42
JayFI have achieved subway so my lunch break will not suffer21:43
cardoeCould not seed network configuration ...... 500 error from neutron cause I crashed it21:44
cardoeconductor detached and cleans up the node at this point and then there's just power sync checks21:44
cardoeso that's where instance_uuid got left on there21:44
JayFcardoe: you wanna jump in21:44
opendevreviewcid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound  https://review.opendev.org/c/openstack/ironic/+/96777821:44
JayFcardoe: can you join the meet?21:50
JayFcardoe: I think we have questions for you and plasible explanations21:50
JayF**plausible21:50
TheJuliaTrying to understand exactly what is going on because you have me super worried too :)21:57
JayFcardoe: I think we need full conductor logs and a full node show of that node22:02
opendevreviewcid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound  https://review.opendev.org/c/openstack/ironic/+/96777822:05
TheJuliaYeah, if you can get us some logs or walk through it, that would help, since nova independently tries to do some stuff on the unwind, while ironic is holding a lock. tl;dr nova side it's expected behavior, ironic should ensure it's the end case22:06
JayFif it's a bug we really should fix it22:06
JayFbut it'd be /our/ (conductor) bug22:06
TheJuliaI'm really not seeing it walking the code, but maybe I'm missing something or somehow task.node.save() failed22:07
TheJuliacall closed, but totally willing to jump in and walk through it if you've got more info 22:08
opendevreviewcid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound  https://review.opendev.org/c/openstack/ironic/+/96777822:14
TheJuliacid: i love you had to add a port to fix a test, but I think we need a negative test too :)22:17
cid++, on it22:17
cardoesorry kids came in the door and got dragged into that22:19
JayFI can jump back in r/n if TheJulia can22:20
cardoeI've got the logs pulled up 22:21
TheJuliajoining22:21
opendevreviewcid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound  https://review.opendev.org/c/openstack/ironic/+/96777822:31
TheJulia"trying to delineate"23:22
JayFjulia wins with the last word23:22
cardoelol yep23:22
JayFcongratulations; you win an AI to file a bug23:22
JayFfeel free to ask an AI to perform your AI 23:22
JayFlol23:22
TheJuliaI'll need a little more understanding before I can attempt to execute on that AI ;()23:23
TheJuliaerr ;)23:23
TheJuliacardoe: do you see "Failed spawn cleanup for instance" in your nova-compute service log23:30
TheJuliaif so, nova.virt is actually catching the failure properly and that should match what is expected, it's just the reset which is failing23:31
TheJuliacardoe: I guess what is needed is any exceptions/errors on that req-id being unwound23:34
TheJuliacardoe: nova.virt.ironic should be raising a "Failed to remove deploy parameters from node", just need to confirm it is missing. or not. Ultimately if that node is locked I can see it and I bet the threading changes might have shifted some of these sort of race conditions around as well23:35
*** diablo_rojo_phone is now known as Guest3164723:58
