Thursday, 2026-01-08

opendevreviewOpenStack Proposal Bot proposed openstack/ironic-ui master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/97261502:18
TheJulia…. I think we need to check the upgrade stamping code to make sure we’re iterating through and stamping the table. At a hockey game and it hit me as a possible cause/source04:34
TheJuliacardoe: if you could dump an offending related row out of the database, if you're willing, that will set my paranoia to the side or make me think we have an upgrade bug04:36
rpittaugood morning ironic! o/07:55
opendevreviewMerged openstack/ironic stable/2025.2: fix: bios fields could not be fetched via the API  https://review.opendev.org/c/openstack/ironic/+/97260209:18
opendevreviewMerged openstack/bifrost master: Document IPA image download options  https://review.opendev.org/c/openstack/bifrost/+/96414509:19
opendevreviewJacob Anders proposed openstack/ironic master: Make post-firmware-update reboot conditional on component  https://review.opendev.org/c/openstack/ironic/+/96634409:33
opendevreviewJacob Anders proposed openstack/ironic master: Make post-firmware-update reboot conditional on component  https://review.opendev.org/c/openstack/ironic/+/96634410:37
abongalegood morning o/10:49
ContinuityMorning all10:54
*** dmellado7 is now known as dmellado11:19
*** hroy_ is now known as hroy11:44
TheJuliagood morning14:04
TheJuliaso the online data migrations seem okay to me as they use the object list14:13
TheJuliain the release mapping14:14
cardoegetting into the DB14:20
TheJuliacardoe: much appreciated14:23
TheJuliaContinuity: Regarding the networking reset question, I've put it on the radar of my downstream product owner because it seems like it's a feature we absolutely need moving forward. I next chat with them Monday around noon US Pacific time. I can't promise anyone will jump on it immediately, but the more I dig into VXLAN, the more I'm convinced it's a necessity to have.14:28
ContinuityTheJulia: thanks, I appreciate it. I am going to *try* to get more involved this year with actually writing code and patching stuff. So if I can I will attempt to help out.14:29
ContinuityI also want my org to get more involved, so we are scaling engineers as we speak 14:30
TheJuliavery cool14:30
cardoehttps://www.irccloud.com/pastebin/hYIdU2Bx/14:38
cardoeTheJulia: ^14:38
cardoebleh node_id is an int14:39
TheJuliaso this was *purely* when it's going to see and try to hydrate an empty object?14:39
TheJuliaoh14:39
TheJuliaheh14:39
TheJuliayeah14:39
TheJuliathat is right14:39
cardoegah someone isn't on IRC for me to emote stabbing them..... they deleted the specific box I reported and used it for other testing.14:44
cardoeI'm honestly not sure which field would be the problem.14:44
cardoeI can dump everything from one node. But it's gonna unfortunately be different.14:48
TheJuliaugh14:48
TheJuliait sounds like on upgrade you were able to reproduce it immediately based upon the logs you mentioned? is there another node where it was observed?14:49
TheJuliaas long as it hasn't been updated, it could help clarity wise14:49
TheJuliaI really just want to make sure I'm not incorrectly interpreting what is going on14:50
cardoeokay found another one.14:52
TheJuliais it the magical unicorn database record we could hope for?!14:55
cardoehttps://gist.github.com/cardoe/3834eb3c7e55a2a323d22b082cccc21814:56
cardoeIt's even better cause that's a totally plain box that doesn't look like anyone's configured it before.14:56
TheJuliaBut all at 1.114:59
TheJuliawtaf15:00
cardoeSo that DB is older but not too terribly old.15:01
cardoeAs far as I can tell from logs it was deployed with 2024.2 originally.15:02
cardoeMaybe I missed something in the db upgrade?15:02
cardoeOpenStack Helm runs ironic-dbsync upgrade15:02
TheJuliadoes it not use online-data-migrations ?15:03
TheJuliait's a two-step process, upgrade creates the schema, but it doesn't roll/update versions/fields15:04
TheJuliagranted, you're 1.1 on that record, which is still weird if it's reproducing15:04
* TheJulia smells the additional fresh coffee15:04
dtantsurhuh, is cotyledon broken on python 3.13?15:11
cardoeYeah so that's the issue.15:11
cardoeOpenStack Helm isn't running online-data-migrations15:11
cardoeWe do that ourselves as a follow on job.15:12
cardoeBut it missed on this DB. I just ran it and everything is 1.215:12
TheJuliadtantsur: ... I think I was on 3.13 when I was doing the work originally last summer, did it break  and if so how?15:14
TheJuliacardoe: okay, that is starting to make more sense then15:14
dtantsurTheJulia: sorry, it was 3.14, I'm just stupid and cannot type15:14
TheJuliaOH!15:14
TheJuliaokay15:14
dtantsurTheJulia: https://paste.opendev.org/show/b3U0gV1ZC5KFd7lvJGum/15:15
TheJuliaI've not actually tried starting ironic with 3.14 yet considering not everything in our code base is ready and project as a whole hasn't moved devstack to 3.14 supporting state as 3.13 is the version it is supporting15:15
dtantsuryeah, it's just the default in Fedora15:15
TheJuliaThat being said -> I was likely going to hit it next week if my hope for plans next week don't get quashed15:15
dtantsurNow, let's try to collect sensors from 3500 nodes using Ironic with 4000 conductor workers in a synchronous fashion. I'm sure nothing will go wrong (if I even manage to enroll all this...).15:17
TheJuliaewww15:17
TheJuliayeah15:17
TheJuliaeww with such a config/state15:17
TheJuliathe first eww was the backtrace15:18
dtantsuryeah15:18
dtantsurOn the other hand, it will be quite silly to go ahead with an asynchronous implementation without being sure that native threads won't work.15:18
TheJuliaWell, step 0 is likely giving such a spin on 3.1315:20
TheJuliaIf the computer melts, I'm sorry. :\ :)15:20
dtantsurYou never know until you do :D15:21
cardoeTheJulia: yeah so now everything is good... It was 1.1 cause you added the RPC stuff which bumped it to 1.2 in 2025.215:26
TheJuliaYup, and we expect online migrations to run as well, so Joy!15:27
TheJuliaokay, I'm sorry about that, I feel kind of bad, but Helm *not* running the migrations is problematic and has me wondering if *ironic* itself needs to self attempt to run record upgrades15:27
TheJulia(I was at a shop where we did that like 14 years ago, so the fact we have a command makes me want to scream)15:28
cardoeMy patch isn't wrong though cause the version convert code was wrong.15:32
TheJuliacorrect15:34
TheJuliabut that was why I was really soft of freaking out because a happy state upgraded environment *shouldn't* be unhappy15:35
TheJuliasort, not soft15:35
TheJuliabrraaain15:35
cardoeah yeah makes sense15:36
cardoehttps://bugs.launchpad.net/nova/+bug/2137673 (and I've written a patch which they're hopefully okay with) is probably just ironic specific since we're the only ones that actually do something in that call. But that should be another reason why the error messages will suck less with nova.15:38
TheJuliaYeah, I noticed and I hope they do accept it15:39
TheJuliathey have been sort of resistant to such in the past but times change15:39
dtantsurdear lord, why does sushy need to log everything it touches...15:48
dtantsurOkay, this is interesting. Our thread pool implementation hardly goes beyond 400 threads even if 4000 is possible. I'm curious why.15:58
dtantsurHmm, it's possible that idle thread detection in futurist is utter bullshit16:02
TheJuliaThat, right there16:07
dtantsurThe more I'm looking at futurist logs, the more insane I'm getting here16:07
TheJuliaAlso, just because you have the thread configured doesn't mean it is actually needed/used16:08
TheJuliaand can get nuked before. The workload itself needs to push it16:08
dtantsurOh, it's very needed, I'm running sensor data with worker count == node count16:08
TheJuliawe aggressively try to keep that pool from sitting idle16:08
TheJuliaoh my16:08
dtantsurmakes this makes sense: https://paste.opendev.org/show/b2v2pMF8oXkcJkAgWjoj/16:09
TheJuliaI'd add some extra logging just to make sure we're actually trying to launch that many16:09
dtantsurhow can queue size jump between 21 and 1 for many-many iterations here?16:09
dtantsurI'm looking at pages of this back-and-forth16:09
TheJuliaLocally I added quite a bit of additional logging into futurist to keep me from going too crazy16:10
TheJuliayou may want to do the same as you ride the insane bus16:10
dtantsurI may indeed, it's just not clear what to log16:11
TheJulia_maybe_spin_up, the method which calls it, and the call which tosses more threads on the pile16:11
dtantsurRight, let's also log when no changes are needed..16:13
TheJuliayup16:13
* dtantsur needs a logging level below debug (like rust-log has trace)16:13
TheJuliayeah, pretty much16:13
dtantsurOne thing I can say for sure: in its current state Ironic is not capable of handling sensors for any large number of nodes quickly16:18
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: [WIP] Asynchronous sensor data collection  https://review.opendev.org/c/openstack/ironic-specs/+/97275416:28
dtantsurThis is what I'm working on btw ^^^^16:28
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: [WIP] Asynchronous sensor data collection  https://review.opendev.org/c/openstack/ironic-specs/+/97275416:28
JayFI filed an RFE: https://bugs.launchpad.net/ironic/+bug/2137729 -- basically bootc deploy_interface switching, except for ramdisk driver16:49
dtantsurJayF: note that you may also need Nova changes to populate different instance_info fields16:50
JayFI'm 99% sure I pass on all image metadata16:50
JayFchecking16:50
dtantsurI mean, setting boot_iso instead of image_source, etc16:51
JayFI'm thinking we'd own that in the auto swapper16:51
dtantsurah, interesting16:51
JayFI want to avoid adding code to the nova driver for this pattern as possible16:51
JayF*if possible16:51
JayFbecause it enables us to do more interesting things16:51
JayFWe don't pass on full image metadata in the virt driver patcher; only really image_source; https://opendev.org/openstack/nova/src/commit/4b90fdf9af8de53fb6536d27b8fc654a0c011e2f/nova/virt/ironic/patcher.py#L6116:55
JayFWhich means we must be roundtripping glance ourself for metadata; which is good news for not having to modify nova (if we're OK to fudge the instance_info ourselves at deploy time)16:55
TheJuliadtantsur: entirely do-able with just the url and the data lookup, fwiw.17:00
TheJuliaif it's an OCI url17:00
TheJuliaoh17:00
TheJuliafor ramdisk, disregard me17:00
TheJuliaanyway, it should be entirely doable with image metadata and I bet nova would prefer that be the case of use17:07
JayFI'm chatting in #openstack-glance17:07
JayFI'd prefer strongly that we keep the logic in Ironic as much as possible17:07
TheJuliaJayF: also, keeping additional code out of nova allows for a direct api consumer to request a machine to be deployed without modifying interfaces manually17:07
JayFyes17:08
JayFyes yes yes17:08
TheJuliaagree17:08
JayFI think we can do this with 0 Nova and Glance code changes17:08
JayFexcept a glance doc update as a follow-on17:08
JayFafter looking at it all and chatting in #-glance17:08
TheJuliaThat is entirely what I would expect17:09
JayFMe too, but I'm used to having my expectations not met :P 17:09
JayFthis is a nice surprise17:09
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: WIP: l2vni plug case with Cisco NXOS  https://review.opendev.org/c/openstack/networking-generic-switch/+/96837717:18
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: Update hacking to 7.0.0  https://review.opendev.org/c/openstack/networking-generic-switch/+/97276217:18
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: WIP: Arista EOS and vendor neutral SONiC support for VXLAN attachments  https://review.opendev.org/c/openstack/networking-generic-switch/+/97276317:18
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: WIP: VXLAN: Add Junos, Cumulus NVUE, and denote Dell OS10 as unsupported  https://review.opendev.org/c/openstack/networking-generic-switch/+/97276417:18
opendevreviewJulia Kreger proposed openstack/networking-generic-switch master: WIP: OVS testing patch for 'vxlan' binding model  https://review.opendev.org/c/openstack/networking-generic-switch/+/97276517:18
opendevreviewJulia Kreger proposed openstack/ironic-specs master: VXLAN networking  https://review.opendev.org/c/openstack/ironic-specs/+/95940119:39
TheJuliacardoe and anyone else interested in vxlan ^^^ - Now simpler!19:39
cardoeThat’s my intended afternoon19:40
TheJuliavery cool, thanks19:54
cardoeTheJulia: so your create_network_postcommit that's gonna create the VNI... which switch is it creating the VNI on?21:53
TheJuliaso... 21:53
TheJuliaI think that should actually be the port bind now21:53
cardoeSo bind_port() shouldn't touch the switches21:53
cardoebind_port() makes the decision21:54
TheJuliaI mean, the post bind postcommit21:54
TheJuliasorry, juggling a couple conversations here21:54
cardoeI can take the backburner21:54
cardoeSo update_port_postcommit() is where we do it. 21:55
cardoeThat's how my initial patch worked.21:55
TheJuliayeah21:55
TheJuliaand I think that is actually correct21:55
cardoeif it's a vxlan segment, call add_vxlan_network(), if it's a vlan segment... do the existing stuff21:56
TheJuliaI just -1'ed the spec with that note, fwiw21:56
TheJuliaI think we should do it as part of port add/delete, that way we're also not adding the VNI to switches which don't need it21:57
TheJuliawith some hindsight of playing with patches, I think your approach was actually the best approach in regards to the ml2 interaction21:58
TheJuliaThe NGS way of VLANs poisons the thoughts though21:58
TheJuliaso we need to disambiguate a little bit21:58
TheJuliaAnd that's okay21:58
cardoeonly other nits are that in some places you bolded or italics so far. the hierarchical port binding is dense but I don't have any criticisms21:58
cardoeI can update the style stuff if you're okay with that. I downloaded it locally and built the HTML and read that version.21:58
cardoeI've also got a solution to detect when the VNI is no longer needed on that switch as well.21:59
cardoeAt least it's written on my whiteboard on my wall after tracing through the neutron code.21:59
TheJuliacardoe: oh, by all means, but do take a glance at the ngs patch I posted21:59
TheJuliaI've not given it a spin yet, but keeping that with port binding tightly couples it22:00
cardoeIt'd be infinitely easier to accomplish if I could figure out what to tell Nidhi to change in https://review.opendev.org/c/openstack/python-openstackclient/+/963947 so that the patch that includes if the network is l2 or not comes back to Ironic 22:01
cardoeActually no. It only matters for the trunk plugin support in Ironic22:02
TheJuliaWell, we can kind of *know* it based upon everything else22:02
TheJuliayeah22:02
TheJuliaThere is a dividing line of logical support there, and yes it is weird22:02
cardoeThat's why I went and stole my kids dry erase board to sketch this all out.22:03
TheJuliaThe big thing I feel the spec is missing is a super clear understanding of the port binding on the router side, that feels like, if my understanding is correct, something we might be able to just sort of sort out with some more mech driver code22:03
cardoeI feel like that meme where the guy has all the red strings.22:03
TheJuliaWhen are we getting Incognito, Inc. badges?22:03
TheJuliaor, was it cognito22:04
TheJuliaI don't remember22:04
TheJuliaSee: Inside Job22:04
TheJuliaanyway, decoupling the concept of the network add/delete from the VXLAN VNI is entirely the right thing to do22:05
TheJulia*because* the bottom binding segment doesn't get created until we go to do the attachment22:05
TheJuliaso if we just check/delete when no longer needed, we keep the switch relatively clean22:06
cardoeYep22:06
TheJuliathat matches the patches I got claude to whip up22:08
TheJuliaIf you want to rev the spec, you're welcome to, I can also do it. Great catch regarding the method and modeling for the port/network config22:09
TheJuliaThe big challenge right now is to not feel like the crazy person with the board with red yarn all over it22:09
TheJuliabecause I've managed to start to put together your perception and I get a lot of it now22:09
cardoeYeah.22:09
cardoeI've struggled at how to convey it.22:09
TheJuliaI'm a little vague on your details/context, but it is making a ton more sense now22:10
cardoeI'm happy to fill in.22:10
cardoeI'll write the trunk stuff down now.22:10
TheJuliaThat would be amazing22:10
TheJuliabecause I think that is where there is room to improve or at least best understand and if we can do *that* we might be able to get out of the router bit as it is related22:10
TheJulia... hopefully :)22:10
-opendevstatus- NOTICE: Zuul will be shutdown for maintenance work. See https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/WBBLBI6ZS6FA6Q5ZMH4C2MWPL3WG3H24/ for more details.23:43
