| Nick | Message | Time |
|---|---|---|
| opendevreview | Merged openstack/ironic stable/2025.1: Fix storing inventory and plugin data in Swift https://review.opendev.org/c/openstack/ironic/+/966683 | 01:39 |
| opendevreview | Merged openstack/networking-generic-switch master: devstack: Drop explicit etcd api version https://review.opendev.org/c/openstack/networking-generic-switch/+/957207 | 01:43 |
| *** | logan_ is now known as Guest31561 | 03:28 |
| rpittau | good morning ironic! o/ | 07:49 |
| rpittau | hey hey kind of an urgent fix when someone has a moment https://review.opendev.org/c/openstack/ironic-python-agent/+/967535 thanks! :) | 08:17 |
| opendevreview | Merged openstack/ironic-tempest-plugin master: Deprecate options for ironic-inspector tests https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/967514 | 09:37 |
| opendevreview | Merged openstack/ironic master: Improve pre-commit configuration https://review.opendev.org/c/openstack/ironic/+/967257 | 09:37 |
| opendevreview | Merged openstack/ironic master: Follow-up: Document `noop` deploy interface https://review.opendev.org/c/openstack/ironic/+/967586 | 09:37 |
| opendevreview | Merged openstack/ironic master: Nit: Fix typo in Migrating from ironic-inspector https://review.opendev.org/c/openstack/ironic/+/966897 | 09:38 |
| opendevreview | Verification of a change to openstack/ironic master failed: Trait Based Networking Filter Expression Parsing and Base Models https://review.opendev.org/c/openstack/ironic/+/961498 | 09:58 |
| opendevreview | Verification of a change to openstack/ironic master failed: Update devstack guides to raise RAM requirement https://review.opendev.org/c/openstack/ironic/+/967087 | 10:01 |
| opendevreview | Merged openstack/ironic stable/2025.1: Handle HTTP 400 and 409 race condition in Redfish power operations https://review.opendev.org/c/openstack/ironic/+/966294 | 12:11 |
| opendevreview | Merged openstack/ironic-python-agent master: Fix API URL reachability test to use full URL with port https://review.opendev.org/c/openstack/ironic-python-agent/+/967535 | 12:36 |
| *** | hroy_ is now known as hroy | 12:55 |
| dtantsur | build jobs will soon finish for ^^^, after that a recheck party | 13:09 |
| rpittau | starting with the backport | 13:26 |
| opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent stable/2025.2: Test advertised ip reachability before assigning it https://review.opendev.org/c/openstack/ironic-python-agent/+/966671 | 13:49 |
| opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent stable/2025.1: Test advertised ip reachability before assigning it https://review.opendev.org/c/openstack/ironic-python-agent/+/966776 | 13:52 |
| opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent bugfix/11.1: Test advertised ip reachability before assigning it https://review.opendev.org/c/openstack/ironic-python-agent/+/966774 | 13:53 |
| opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent bugfix/11.0: Test advertised ip reachability before assigning it https://review.opendev.org/c/openstack/ironic-python-agent/+/966775 | 13:54 |
| opendevreview | Dmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry https://review.opendev.org/c/openstack/bifrost/+/961388 | 14:16 |
| TheJulia | good morning | 14:26 |
| opendevreview | David Nwosu proposed openstack/ironic master: Create getting-started.rst and link to other guides https://review.opendev.org/c/openstack/ironic/+/965189 | 14:29 |
| opendevreview | David Nwosu proposed openstack/ironic master: Create getting-started.rst and link to other guides https://review.opendev.org/c/openstack/ironic/+/965189 | 14:33 |
| opendevreview | Dmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry https://review.opendev.org/c/openstack/bifrost/+/961388 | 15:31 |
| dtantsur | Both build jobs failed, so no recheck party for now :( | 15:35 |
| rpittau | :/ | 15:36 |
| JayF | https://review.opendev.org/c/openstack/requirements/+/967699 check_requirements job is busted until this lands | 15:43 |
| JayF | IDK if it runs against everything, but it's blocking clif's patch r/n (which I rechecked before I saw this) | 15:44 |
| opendevreview | Verification of a change to openstack/ironic master failed: Trait Based Networking Filter Expression Parsing and Base Models https://review.opendev.org/c/openstack/ironic/+/961498 | 15:47 |
| dtantsur | JayF: it only runs on related changes | 16:01 |
| JayF | ack | 16:12 |
| opendevreview | Nahian Pathan proposed openstack/ironic master: Reduce API calls when collecting sensor data with redfish https://review.opendev.org/c/openstack/ironic/+/955484 | 16:18 |
| opendevreview | Marcus Furlong proposed openstack/sushy master: Don't require Boot and Actions for Systems https://review.opendev.org/c/openstack/sushy/+/967715 | 16:37 |
| dtantsur | Build jobs are running again, fingers crossed | 16:58 |
| opendevreview | Dmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry https://review.opendev.org/c/openstack/bifrost/+/961388 | 17:16 |
| dtantsur | we have new builds, recheck away :) | 17:26 |
| rpittau | \o/ | 17:28 |
| opendevreview | cid proposed openstack/ironic master: Support segmented port ranges https://review.opendev.org/c/openstack/ironic/+/967727 | 18:05 |
| opendevreview | cid proposed openstack/ironic master: Support segmented port ranges https://review.opendev.org/c/openstack/ironic/+/967727 | 18:09 |
| opendevreview | Dmitry Tantsur proposed openstack/bifrost master: WIP add an OCI artifact registry https://review.opendev.org/c/openstack/bifrost/+/961388 | 18:27 |
| JayF | I think cid and I just found a really gnarly bug in our devstack config | 19:39 |
| JayF | but I'd love a sanity check if someone can join up and look now it'd be 10/10 awesome | 19:39 |
| JayF | tl;dr: devstack deploys nova-cpu.conf with status_code_retries = 300 which means, we believe, that it will only retry on status_code 300 | 19:39 |
| JayF | based on what I am seeing on how the config is written | 19:39 |
| JayF | nevermind us | 19:52 |
| JayF | we figured out the misunderstanding we were having | 19:52 |
| JayF | (I was having, tbh) | 19:52 |
| JayF | I think we need to add SERV* to nova.virt.ironic.driver._UNPROVISION_STATES | 19:56 |
| cardoe | cid: https://review.opendev.org/c/openstack/ironic/+/967727 you mentioned some tests generated with AI. How would ya feel if we standardized a header to convey this? | 20:40 |
| JayF | https://bugs.launchpad.net/ironic/+bug/2131960 | 20:45 |
| cid | cardoe, I actually have thought of that. In fact, in a broader sense, a way to specify to what extent AI assisted. | 20:47 |
| cid | But that might just be overkill | 20:47 |
| cid | I'm ++ to that | 20:48 |
| cardoe | So I'm the TC member tasked with studying this. So just fishing for people's comments. | 20:50 |
| JayF | I'd rather the TC stay out of legal/foundation scope stuff like AI policy | 20:51 |
| JayF | that's my $.02 | 20:52 |
| TheJulia | JayF: https://bugs.launchpad.net/ironic/+bug/2131962 | 20:53 |
| cardoe | JayF: So we've had a few projects say that they weren't happy with the Assisted-By: header and that they'd like the contributor to convey if they wrote the code or the tests with AI or both. | 21:08 |
| cardoe | So I have a weird reproducer on 2025.2 | 21:08 |
| JayF | It seems like an overreach if the TC adds or modifies that requirement, in a way that seems wrong? | 21:08 |
| JayF | Please no more hurdles :( | 21:09 |
| cardoe | Quite the opposite of what I'm going to do. | 21:10 |
| JayF | I guess I just think of 'studying AI policy' as having two outcomes: existing policy, or something more restrictive than existing policy. So strange to be on the side of arguing status quo | 21:10 |
| cardoe | So the 2025.2 bug... | 21:25 |
| cardoe | The node goes back to available but Nova doesn't see it as un-reserved | 21:25 |
| cardoe | it's still got the instance_uuid on it | 21:26 |
| cardoe | Now neutron-server crashed during deployment so I chalked it up to that. But I made it crash again and it did it the same way again | 21:27 |
| TheJulia | whaaat?! | 21:28 |
| TheJulia | (so, funny you had a crash too, I just had to reboot my devstack laptop, io errors and all force it to be a mashing of the power button) | 21:28 |
| cardoe | Inventory has not changed for provider 86eb7354-cc10-4173-8ff2-d1ac2ea6befd based on inventory data: {'CUSTOM_M1_SMALL': {'total': 1, 'reserved': 1, 'min_unit': 1, 'max_unit': 1, 'step_size': 1, 'allocation_ratio': 1.0}} set_inventory_for_provider | 21:28 |
| TheJulia | Yeah, if the instance_uuid is still set, it's never going to show up in the list of hardware nova considers usable | 21:29 |
| TheJulia | The key question: why is the instance_uuid still set | 21:29 |
| cardoe | It crashed for me cause I used the logger mechanism in my neutron-server to dump into the logs all the network requests and the logger mechanism in 2025.2 apparently calls eventlet.monkey_patch() somehow | 21:29 |
| TheJulia | did lightning strike the conductor? Gray beings make the conductor hover away? Are we the mirror universe?!? | 21:30 |
| * | TheJulia blinks | 21:30 |
| TheJulia | WUT | 21:30 |
| cardoe | So from ironic-api logs there's a DELETE /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd/vifs/75a8d0f1-9928-4280-b3aa-cb2aad4eea89 which Client-side error: Node 86eb7354-cc10-4173-8ff2-d1ac2ea6befd is locked by host 1327175-hp3, please retry after the current operation is completed. | 21:33 |
| cardoe | Similarly right after that there's a PATCH /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd that throws the same thing. | 21:34 |
| TheJulia | Yeah, that's nova trying to unwind extra fields/values set because they want to control it | 21:34 |
| TheJulia | even though we wipe it all out in the end | 21:35 |
| TheJulia | but | 21:35 |
| cardoe | I can grab conductor logs. They've just rolled off at this point cause I had debug set to 11 | 21:35 |
| JayF | uh, there's a bug here I think | 21:35 |
| TheJulia | if the conductor had a lock on the node, started things but instance_uuid was not removed and somewhere in there the conductor fell over, then I could see it leaving the value and you getting into some super weird state then | 21:35 |
| JayF | this is relying on Ironic to remove the instance uuid | 21:36 |
| JayF | so if teardown fails | 21:36 |
| TheJulia | because when conductor restarts, lock gets removed | 21:36 |
| JayF | https://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L1285 | 21:36 |
| cardoe | So I cannot clear that instance_uuid in any way. | 21:36 |
| JayF | it never attempts to clear it manually | 21:36 |
| JayF | https://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L1380 | 21:36 |
| JayF | I almost want a "validate that we don't have an instance_uuid on there anymore" step, but I guess it's a tradeoff between Nova API responsiveness to users and correctness on the backend | 21:37 |
| TheJulia | yeah | 21:37 |
| TheJulia | I guess where I'm struggling with this is the conductor should still clear it | 21:37 |
| JayF | the conductor never knows it got a request | 21:38 |
| JayF | if it 409's everything from Nova | 21:38 |
| TheJulia | oh | 21:38 |
| TheJulia | so, the unprovision never actually made it to the conductor? | 21:38 |
| JayF | 'DELETE /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd/vifs/75a8d0f1-9928-4280-b3aa-cb2aad4eea89 which Client-side error: Node 86eb7354-cc10-4173-8ff2-d1ac2ea6befd is locked by host 1327175-hp3, please retry after the current operation is completed.' 'after that there's a PATCH /v1/nodes/86eb7354-cc10-4173-8ff2-d1ac2ea6befd that throws the same thing' | 21:38 |
| JayF | that actually isn't in destroy(), is it? | 21:39 |
| TheJulia | I don't believe so | 21:39 |
| JayF | https://opendev.org/openstack/nova/src/commit/53aadaf967b708bfb03616535d45f6378a21cae0/nova/virt/ironic/driver.py#L1181 | 21:40 |
| JayF | would show this same call pattern | 21:40 |
| TheJulia | maybe easier just to talk through it? | 21:41 |
| JayF | I can if desired | 21:42 |
| TheJulia | https://meet.google.com/ucs-wsgg-wrt | 21:42 |
| JayF | I have achieved subway so my lunch break will not suffer | 21:43 |
| cardoe | Could not seed network configuration ...... 500 error from neutron cause I crashed it | 21:44 |
| cardoe | conductor detached and cleans up the node at this point and then there's just power sync checks | 21:44 |
| cardoe | so that's where instance_uuid got left on there | 21:44 |
| JayF | cardoe: you wanna jump in | 21:44 |
| opendevreview | cid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound https://review.opendev.org/c/openstack/ironic/+/967778 | 21:44 |
| JayF | cardoe: can you join the meet? | 21:50 |
| JayF | cardoe: I think we have questions for you and plasible explanations | 21:50 |
| JayF | **plausible | 21:50 |
| TheJulia | Trying to understand exactly what is going on because you have me super worried too :) | 21:57 |
| JayF | cardoe: I think we need full conductor logs and a full node show of that node | 22:02 |
| opendevreview | cid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound https://review.opendev.org/c/openstack/ironic/+/967778 | 22:05 |
| TheJulia | Yeah, if you can get us some logs or walk through it, that would help, since nova independently tries to do some stuff on the unwind, while ironic is holding a lock. tl;dr nova side it's expected behavior, ironic should ensure it's the end case | 22:06 |
| JayF | if it's a bug we really should fix it | 22:06 |
| JayF | but it'd be /our/ (conductor) bug | 22:06 |
| TheJulia | I'm really not seeing it walking the code, but maybe I'm missing something or somehow task.node.save() failed | 22:07 |
| TheJulia | call closed, but totally willing to jump in and walk through it if you've got more info | 22:08 |
| opendevreview | cid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound https://review.opendev.org/c/openstack/ironic/+/967778 | 22:14 |
| TheJulia | cid: i love you had to add a port to fix a test, but I think we need a negative test too :) | 22:17 |
| cid | ++, on it | 22:17 |
| cardoe | sorry kids came in the door and got dragged into that | 22:19 |
| JayF | I can jump back in r/n if TheJulia can | 22:20 |
| cardoe | I've got the logs pulled up | 22:21 |
| TheJulia | joining | 22:21 |
| opendevreview | cid proposed openstack/ironic master: Fail flat driver rebind when no VIFs are bound https://review.opendev.org/c/openstack/ironic/+/967778 | 22:31 |
| TheJulia | "trying to delineate" | 23:22 |
| JayF | julia wins with the last word | 23:22 |
| cardoe | lol yep | 23:22 |
| JayF | congratulations; you win an AI to file a bug | 23:22 |
| JayF | feel free to ask an AI to perform your AI | 23:22 |
| JayF | lol | 23:22 |
| TheJulia | I'll need a little more understanding before I can attempt to execute on that AI ;() | 23:23 |
| TheJulia | err ;) | 23:23 |
| TheJulia | cardoe: do you see "Failed spawn cleanup for instance" in your nova-compute service log | 23:30 |
| TheJulia | if so, nova.virt is actually catching the failure properly and that should match what is expected, it's just the reset which is failing | 23:31 |
| TheJulia | cardoe: I guess what is needed is any exceptions/errors on that req-id being unwound | 23:34 |
| TheJulia | cardoe: nova.virt.ironic should be raising a "Failed to remove deploy parameters from node", just need to confirm it is missing. or not. Ultimately if that node is locked I can see it and I bet the threading changes might have shifted some of these sort of race conditions around as well | 23:35 |
| *** | diablo_rojo_phone is now known as Guest31647 | 23:58 |
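
Editor's note: the 21:33 to 21:38 exchange describes a client whose DELETE and PATCH requests are rejected because the node is locked ("please retry after the current operation is completed"), which JayF refers to as a 409. As a rough illustration of the general pattern only, here is a minimal Python sketch of a bounded retry loop around such a call. It is not Nova or Ironic code: `NodeLocked`, `call_with_retries`, and all parameters are hypothetical names invented for this example, and the exception merely stands in for an HTTP 409 response.

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for an HTTP 409 'node is locked' response."""


def call_with_retries(operation, max_attempts=5, delay=0.0, sleep=time.sleep):
    """Call operation(); retry while it raises NodeLocked.

    Gives up after max_attempts tries and re-raises the last NodeLocked,
    so the caller can see that the cleanup never went through.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except NodeLocked as exc:
            last_error = exc
            # Only wait if another attempt is still allowed.
            if attempt < max_attempts:
                sleep(delay)
    raise last_error
```

If the lock outlives the retry budget, the final exception still propagates, which matches the concern raised in the log: the caller has to handle the case where the cleanup request never succeeded rather than assume the field was cleared.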