15:04:31 <dtantsur> #startmeeting ironic
15:04:31 <opendevmeet> Meeting started Mon Nov 10 15:04:31 2025 UTC and is due to finish in 60 minutes. The chair is dtantsur. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:31 <opendevmeet> The meeting name has been set to 'ironic'
15:04:38 <janders> o/
15:04:48 <dtantsur> Well, hello everyone, I'm not even sure we have the meeting today, but I'm going to chair it!
15:04:56 <TheJulia> o/
15:04:59 <dtantsur> Who is here for some ironic conversation?
15:05:07 <TheJulia> \o
15:05:12 <alegacy> o/
15:05:15 * TheJulia is dancing in the corner
15:05:26 <dtantsur> Our agenda is where it usually is:
15:05:33 <dtantsur> #link https://wiki.openstack.org/wiki/Meetings/Ironic Meeting agenda
15:05:40 <rpittau> o/
15:06:07 <dtantsur> I'm giving the obviously undercaffeinated community members a couple more minutes to join
15:06:19 <kubajj> o/
15:06:30 <mostepha[m]> \o
15:07:09 <dtantsur> #topic Announcements / Reminders
15:07:21 <cid> o/
15:07:48 <dtantsur> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/FQJSYWZIUX2LJKUGTVB274K4GB3T2QPS/ there is a proposal to expand the core team, current cores please take a look
15:09:04 <TheJulia> Does the agenda need a coffee break?
15:09:25 <dtantsur> #link https://specs.openstack.org/openstack/ironic-specs/priorities/2026-1-workitems.html here are the agreed workitems for 2026.1
15:09:34 <dtantsur> (please bear with me, I was not prepared to chair this meeting :)
15:09:59 <dtantsur> Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash
15:10:23 <dtantsur> Finally, 2026.1 Gazpacho Release Schedule https://releases.openstack.org/gazpacho/schedule.html
15:10:35 <dtantsur> Anything else to announce or remind of?
15:10:46 * dtantsur sprinkles virtual coffee in the air
15:11:27 <dtantsur> I wonder how many people are still confused by the daylight changes :)
15:11:31 <dtantsur> anyway
15:11:41 <dtantsur> #topic Working Group Updates
15:12:10 <dtantsur> TheJulia: should we close the eventlet WG or reuse it for further work like delayed tasks?
15:12:15 <TheJulia> I just removed the line item
15:12:47 <TheJulia> I'll try to rev the delayed task spec this week, fwiw
15:13:11 <dtantsur> Good, please ping me if I miss the notification (I most certainly will)
15:13:30 <dtantsur> alegacy: anything on standalone networking, except for the large patch chain to review?
15:13:57 <dtantsur> #link https://review.opendev.org/q/topic:%22feature/standalone-networking%22 standalone networking patches
15:13:59 <alegacy> No, no further updates. I managed to split the large patch into smaller chunks. I hope that's a better configuration.
15:14:11 <dtantsur> Thanks, appreciated!
15:14:36 <dtantsur> Meanwhile, I don't have anything new on the asyncio front (dragged into downstream priorities). cid, do you have anything?
15:14:50 <cid> Not at all.
15:15:02 <dtantsur> okie cool
15:15:10 <dtantsur> #topic Bug Deputy Updates
15:15:14 <dtantsur> cid: that's you again :)
15:15:22 <cid> Yup :)
15:15:48 <cid> There were a total of 12 bugs, 5 of which were RFEs
15:16:02 <TheJulia> Clearly bug filing occurs with caffeine
15:16:04 <TheJulia> :)
15:16:07 <dtantsur> heh
15:16:13 <dtantsur> Should we review the RFEs, do you think?
15:16:17 <cid> :)
15:16:18 <cid> I will just drop the full list here
15:16:26 <cid> https://bugs.launchpad.net/networking-generic-switch/+bug/2130384: Treat a DPU like a switch
15:16:26 <cid> https://bugs.launchpad.net/ironic/+bug/2130646: Enable the agent deploy interface to auto-switch to bootc
15:16:26 <cid> https://bugs.launchpad.net/ironic/+bug/2130667: Enable Redfish interactions with network device functions
15:16:26 <cid> https://bugs.launchpad.net/ironic/+bug/2129469: Add PostgreSQL database backend support
15:16:26 <cid> https://bugs.launchpad.net/ironic/+bug/2129690: Make all nodes of a conductor-group write grub config
15:16:57 <dtantsur> TheJulia: DPU-as-a-switch, is it spec worthy?
15:16:57 <TheJulia> The first one, I filed; the broad idea is the ability for NGS to begin to treat a DPU like a switch
15:17:05 <TheJulia> I don't think so, I think it's more just a driver
15:17:07 <rpittau> I think we can just close the postgresql one as cid suggested before
15:17:09 <TheJulia> or drivers
15:17:09 <dtantsur> (I know that we don't usually do specs for n-g-s)
15:17:37 <TheJulia> Issue is, it can only be developed by someone with their hands on a card
15:17:43 <dtantsur> Yeah, let's prepare a polite response for PostgreSQL. Something along the lines of "Alas, OpenStack moved on from it, and we cannot maintain another database on our own"
15:17:46 <TheJulia> which is similar, really, to physical switches
15:17:48 <opendevreview> Riccardo Pittau proposed openstack/ironic-python-agent master: Test advertised ip reachability before assigning it https://review.opendev.org/c/openstack/ironic-python-agent/+/963670
15:18:00 <dtantsur> TheJulia: I assume you don't have one handy?
15:18:03 <TheJulia> dtantsur: w/r/t postgres, concur
15:18:15 <TheJulia> dtantsur: I don't, I might be able to grab one out of a lab personally
15:18:32 <TheJulia> it's more a broad idea than anything else
15:18:37 <TheJulia> since each card will be different in the end
15:18:45 * cid Goes to update the bug
15:19:28 <TheJulia> the second one, regarding auto-switching the driver, I think stevebaker is taking a look at.
15:19:30 <dtantsur> Yeah, I see. I don't mind rfe-approved on this, assuming we don't need to change any data model
15:19:47 <dtantsur> (this was about the DPU one)
15:20:01 <TheJulia> I don't think so, I think the existing model works for that
15:20:17 <dtantsur> I'm fine with the bootc one, but I know that JayF had reservations, so I'd not approve it until I hear from him
15:20:23 * TheJulia wonders if we should have sifted through these one at a time
15:20:41 <cardoe> gah time change :/
15:20:54 <TheJulia> Daylight savings time is indeed evil.
15:20:59 <TheJulia> Possibly the root of all evil.
15:21:14 <rpittau> hopefully it will disappear from EU next year
15:21:46 <TheJulia> (We voted it away here in California, but the US seems hellbent on ignoring our resolution)
15:22:04 <dtantsur> heh
15:22:30 <cardoe> We did as well in Alabama (to my shock) but we gotta wait just like CA.
15:22:34 <TheJulia> Next up in the category of roots of all evil, cats seeking to cuddle with the cables behind the laptop
15:22:47 * TheJulia can't spell today
15:22:55 <TheJulia> So, back to RFEs
15:23:02 <dtantsur> Cats tend to be.. energized ;)
15:23:21 <cardoe> So I wanted to get to the changing of the deploy interface….
15:23:51 <cardoe> It feels like we need a way to change that dynamically-ish. From the nova-ironic side it could be an image property.
15:23:54 <TheJulia> The third, network device functions: I see that as, first step, spending some time in sushy modeling the options (the work is largely in sushy), but it also proposes the general idea of some clean steps to toggle the options around.
15:24:20 <TheJulia> for going to something like anaconda, I think that makes a lot of sense
15:24:26 <TheJulia> the same general idea is present though
15:24:39 <dtantsur> Network device functions is cool, I wonder if you know how much of a mess NetworkInterfaces are...
15:24:42 * dtantsur summons janders
15:24:55 <TheJulia> aiui, fairly messy, unfortunately.
15:24:57 <cardoe> He's in EU time zone now?
15:25:01 <janders> yes
15:25:28 <janders> we had a chat with TheJulia in Paris about the NetworkAdapters bug I'm tackling
15:25:40 <dtantsur> ah, NetworkAdapters. Are these different?
15:25:43 <TheJulia> I was largely thinking we might be able to model it on a single platform
15:25:45 <TheJulia> so..
15:25:59 <TheJulia> you basically need to walk from the system -> NetworkAdapters -> NetworkDeviceFunctions
15:26:25 <dtantsur> aha, so you may be forced to ensure that something (IPA?) is running on the machine
15:26:29 * janders is looking at BMC mockups to see where things are tied together
15:26:42 <janders> ++
15:26:52 <janders> or at least have some sort of fallback for pathological cases
15:26:57 <TheJulia> dtantsur: based upon the model, not really. That is, if vendors have tied things cleanly and/or we already have the MAC addresses
15:27:18 <dtantsur> I thought the resource was not accessible when the OS is not running?
15:27:34 <TheJulia> the host might need to be "on"
15:27:40 <janders> dtantsur: on some servers, yes
15:28:05 <TheJulia> it really depends on the level of firmware and underlay architecture when it comes to what is done for the BMC to talk to other devices
15:28:24 <janders> so we could either 1) have a requirement to have the OS running when we do work on NetworkInterfaces or 2) try and see; on failure -> boot IPA
15:28:39 <TheJulia> But the base idea is to take the devices, model them matching the network ports, and toggle network ports out of a networking function
15:28:45 <TheJulia> and towards a storage function
15:29:23 <TheJulia> Ultimately, it will all depend on what BMCs return and the error handling which is needed there
15:29:24 <dtantsur> Does it make sense to have something on the Port API to mark it? so that the steps themselves only enforce the value?
15:29:30 <dtantsur> (similarly to target_raid_config?)
15:29:46 <TheJulia> I did poke at an idrac10 which was off, and it seemed that the adapters were visible
15:29:56 <janders> I concur
15:30:00 <dtantsur> yeah, the problem is with HPE machines
15:30:04 <janders> exactly
15:30:11 <TheJulia> I was largely thinking we just use the category function and toggle it to "storage" or some other preconfigurable value
15:30:23 <TheJulia> and then just dissuade from trying to use the interfaces for anything else.
15:30:33 <TheJulia> which ties into TBN
15:30:45 <dtantsur> ah, right, we have many more fields now that I'm not aware of :)
15:31:15 <dtantsur> okay, I personally like the idea, but I'd prefer a bit more technical detail (and maybe an RFE per step)
15:31:46 <TheJulia> I'm not sure if I'll be able to start on it this year or not, still trying to determine the priorities until EoY
15:32:20 <dtantsur> ack
15:32:20 <TheJulia> It is seeming like VXLAN networking is more important on my front, realistically.
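[Editor's sketch, for context: the walk described above (system/chassis -> NetworkAdapters -> NetworkDeviceFunctions) could look roughly like the following. This is a hypothetical illustration over pre-fetched Redfish-style JSON, not ironic or sushy code; the `find_ethernet_device_functions` helper and the URIs are invented, though the resource names and the `NetDevFuncType` property follow the Redfish schema (which hangs NetworkAdapters off Chassis).]

```python
# Hedged sketch (not ironic/sushy code): walk a Redfish-style resource tree
# from a chassis down to its NetworkDeviceFunctions and list the ones that
# are currently typed as Ethernet, i.e. candidates for retargeting towards
# a storage function. `resources` stands in for GET responses keyed by URI;
# a real implementation would fetch these from the BMC.

def find_ethernet_device_functions(resources, chassis_uri):
    """Return URIs of NetworkDeviceFunctions whose NetDevFuncType is Ethernet."""
    chassis = resources[chassis_uri]
    adapters = resources[chassis["NetworkAdapters"]["@odata.id"]]
    found = []
    for adapter_ref in adapters["Members"]:
        adapter = resources[adapter_ref["@odata.id"]]
        funcs = resources[adapter["NetworkDeviceFunctions"]["@odata.id"]]
        for func_ref in funcs["Members"]:
            func = resources[func_ref["@odata.id"]]
            if func.get("NetDevFuncType") == "Ethernet":
                found.append(func_ref["@odata.id"])
    return found


# Mocked responses for one chassis with one adapter and two functions.
NIC = "/redfish/v1/Chassis/1/NetworkAdapters/NIC1"
MOCK = {
    "/redfish/v1/Chassis/1": {
        "NetworkAdapters": {"@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters"}
    },
    "/redfish/v1/Chassis/1/NetworkAdapters": {
        "Members": [{"@odata.id": NIC}]
    },
    NIC: {
        "NetworkDeviceFunctions": {"@odata.id": NIC + "/NetworkDeviceFunctions"}
    },
    NIC + "/NetworkDeviceFunctions": {
        "Members": [
            {"@odata.id": NIC + "/NetworkDeviceFunctions/1"},
            {"@odata.id": NIC + "/NetworkDeviceFunctions/2"},
        ]
    },
    NIC + "/NetworkDeviceFunctions/1": {"NetDevFuncType": "Ethernet"},
    NIC + "/NetworkDeviceFunctions/2": {"NetDevFuncType": "iSCSI"},
}

print(find_ethernet_device_functions(MOCK, "/redfish/v1/Chassis/1"))
```

[An actual toggle step would then PATCH the chosen function's type; as the discussion notes, whether that works with the host off varies by BMC and firmware.]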
15:32:42 <dtantsur> Then let's table it for now and get back once you have cycles?
15:33:36 <TheJulia> I just wanted to get something recorded that is the seed of the idea first
15:33:41 <TheJulia> onward!
15:34:02 <dtantsur> So, the last is PXE HA
15:34:09 <dtantsur> https://bugs.launchpad.net/ironic/+bug/2129690
15:34:27 <dtantsur> I'd *love* to have this for metal3 HA. I think you even had a draft spec some time ago, TheJulia?
15:35:42 <dtantsur> Like before, I doubt anyone has cycles for that
15:35:45 <TheJulia> Yes, I know exactly how this would need to work
15:36:17 <TheJulia> Realistically, it would take a Conductor RPC call to the remote conductors
15:36:45 <dtantsur> I'm still worried about RPC multiplication, but I guess we can get back to it on the actual spec if anyone revives it..
15:36:48 <TheJulia> for them to do the needful in an abridged form with some additional details in terms of networking
15:37:22 <TheJulia> Where it would need to be wired in, it's actually not a big deal, and if we have deferred tasks then it's feasible to just use that as a mechanism as well
15:37:48 <TheJulia> The key difference is somehow the other conductors need to do some amount of work, and it will need to be a shared task under the hood.
15:38:01 <dtantsur> yeah, there are interesting nuances
15:38:30 <dtantsur> I'd also prefer the secondary conductors to refer to images from the primary one, not download them from the internet
15:38:44 <dtantsur> (which is not fully HA though, so I dunno if it fixes the issue in question)
15:38:58 <TheJulia> oh, yeah
15:39:05 <TheJulia> that was something I had baked into the previous idea
15:39:09 <TheJulia> just include a pointer to the other node
15:39:39 <TheJulia> but the broad fear of a conductor triggering 1 to N additional messages over RPC was sort of the hurdle the discussion died at
15:39:44 <cardoe> So do we just need like "conductor can upload to an object storage that's not swift"?
15:39:52 <TheJulia> for a super short lived "create a pointer", that's likely not a big deal
15:40:16 <TheJulia> So
15:40:22 <TheJulia> that goes into conductor failover theory
15:40:34 <TheJulia> when a failure occurs, the conductor will automatically try to download artifacts
15:41:24 <dtantsur> Another approach would be for secondary conductors to download images from the primary one
15:41:40 <TheJulia> that is a greater multiplication risk, really
15:41:52 <dtantsur> it is
15:42:20 <TheJulia> the original idea was to just generate a pointer to load config/artifacts from the first, and should failure occur and takeover trigger, then the conductor and presumably new secondaries would get calls to create new records and new pointers
15:42:33 <TheJulia> I guess
15:42:36 <dtantsur> yeah, fair
15:42:43 <TheJulia> the challenge is how much of a lag are we okay with... or not.
15:42:50 <dtantsur> now someone needs to do the hardest parts: write/update the spec and the code :D
15:42:59 <opendevreview> Verification of a change to openstack/ironic master failed: Fix storing inventory and plugin data in Swift https://review.opendev.org/c/openstack/ironic/+/966259
15:44:39 <TheJulia> yeah
15:44:50 <dtantsur> Can we agree that the last one definitely needs-spec?
15:44:55 <TheJulia> yeah, definitely
15:45:15 <TheJulia> It's totally doable, but requires agreement on which time/risk gap we're focused on as well
15:45:29 * dtantsur nods
15:45:35 <dtantsur> cid: anything else on bugs or rfes?
15:45:38 <TheJulia> (this quickly turns into one of those "perfection is the enemy of good" things as well.)
15:45:45 <dtantsur> also true
15:45:53 * dtantsur suspects everyone else has fallen asleep
15:45:59 <cid> That's all on bug deputy updates
15:46:03 <dtantsur> Thanks!
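[Editor's sketch: the "pointer" idea discussed above (secondary conductors referencing artifacts held by the conductor that wrote the config, and re-pointing on takeover) could be pictured roughly as below. The helper names, port, and URL layout are all invented for illustration; this is not ironic code.]

```python
# Hypothetical illustration of the pointer scheme: instead of each secondary
# conductor re-downloading artifacts from the internet, it records a URL
# referencing the conductor that already holds them.

def artifact_pointer(primary_host, node_uuid, artifact, port=8080):
    """Build a reference to an artifact served by the primary conductor."""
    return f"http://{primary_host}:{port}/{node_uuid}/{artifact}"


def retarget(pointers, failed_host, new_host):
    """On takeover, rewrite pointers that referenced the failed conductor.

    A naive string substitution; a real implementation would update the
    underlying records and regenerate configs for affected nodes.
    """
    prefix = f"http://{failed_host}:"
    return {
        node: url.replace(failed_host, new_host) if url.startswith(prefix) else url
        for node, url in pointers.items()
    }


ptr = artifact_pointer("cond1.example.com", "node-uuid", "deploy_kernel")
print(ptr)  # http://cond1.example.com:8080/node-uuid/deploy_kernel
print(retarget({"node-uuid": ptr}, "cond1.example.com", "cond2.example.com"))
```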
15:46:15 <dtantsur> #topic Open discussion
15:46:24 <dtantsur> The floor is yours, crew
15:46:33 <rpittau> I have one thing
15:46:56 <dtantsur> you have our attention
15:46:58 <TheJulia> dtantsur: mandatory coffee break at the beginning of meetings?
15:47:01 <dtantsur> ++
15:47:16 <dtantsur> if it was not 4pm here..
15:47:26 <rpittau> when moving from tinyipa to DIB for integration tests in bifrost I just realized that the old GRUB can't really work well with big vmedia devices when in UEFI
15:47:29 <TheJulia> dtantsur: agenda updated.
15:47:52 <rpittau> meaning that the vmedia jobs will be broken for bookworm and jammy
15:47:52 <rpittau> and not feeling very well for noble
15:47:59 <TheJulia> Well, that is UEFI in general on some hardware as well.
15:48:05 <dtantsur> will it work anywhere at all?
15:48:06 <TheJulia> Is it the 512MB barrier?
15:48:10 <rpittau> they run ok on centos10 as it has a more recent version of grub
15:48:15 <rpittau> it's smaller
15:48:23 <TheJulia> SRSLY?
15:48:34 * TheJulia waits for a YA'RLY
15:48:34 <dtantsur> hmm, don't we have vmedia coverage in the devstack CI? the standalone job?
15:48:38 <rpittau> anyway just wanted to say that I'm going to test with debian DIB
15:48:38 <rpittau> but I would also like to propose alpine as an alternative
15:48:45 <TheJulia> dtantsur: centos based afaik
15:48:49 <dtantsur> TheJulia: then you had to ask O'RLY?
15:49:00 <JayF> Alpine will introduce a new testing surface to us as it uses musl
15:49:00 <dtantsur> TheJulia: ehhmmm? we have devstack jobs on centos??
15:49:17 <TheJulia> well, okay, ubuntu but they boot centos
15:49:25 <JayF> I don't think it'll matter but it's worth noting
15:49:37 <rpittau> true
15:49:51 <rpittau> the alternative is to drop bookworm and ubuntu jammy entirely
15:50:03 <rpittau> if debian based can't work
15:50:14 <dtantsur> rpittau: hold on, does it mean that only DIB-building jobs are affected?
15:50:21 <rpittau> yep
15:50:31 <rpittau> I wrote that at the beginning :D
15:50:45 <dtantsur> you.. did not?
15:50:59 <dtantsur> DIB building jobs don't even use vmedia, so I'm still confused
15:51:24 <rpittau> I wrote "when moving from tinyipa to DIB for integration tests in bifrost"
15:51:46 <dtantsur> right, but not all integration tests are affected, only dibipa ones
15:51:51 <dtantsur> which don't use vmedia, so... wut?
15:52:18 <rpittau> I probably can't explain myself
15:52:18 <rpittau> I'm converting ALL the jobs to DIBIPA
15:52:22 <rpittau> https://review.opendev.org/c/openstack/bifrost/+/964404
15:52:26 <dtantsur> wait, why?
15:52:36 <rpittau> because we don't support tinyipa anymore
15:52:39 <dtantsur> we have IPA images published, we don't need to build them in literally every job
15:52:45 <rpittau> we don't
15:52:51 <TheJulia> why not?
15:52:53 <dtantsur> (dibipa jobs build IPA using DIB, hence their name)
15:53:00 <rpittau> in the integration jobs we don't build the images
15:53:03 <rpittau> we used the published ones
15:53:13 <TheJulia> So why don't published images work?
15:53:32 <dtantsur> Ah, so you mean if we switch our official images to Debian, then.. oh
15:53:37 <rpittau> because they're too big for uefi
15:53:56 <rpittau> so
15:54:11 <rpittau> if we switch to dib with cs9/10 they're super big, >500 MB
15:54:17 <rpittau> with debian we're around 300MB I think
15:54:28 <dtantsur> CS9/10 is already known to work
15:54:42 <rpittau> with a recent version of grub, yes
15:54:53 <rpittau> with grub on bookworm or jammy, nope
15:55:39 <dtantsur> does it mean that ironic's standalone jobs still use tinyipa?
15:55:44 <opendevreview> Merged openstack/networking-generic-switch master: Drop remaining logic for linuxbridge-agent https://review.opendev.org/c/openstack/networking-generic-switch/+/965149
15:55:48 <TheJulia> Are the ipa-b results still 500+ MB?
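[Editor's sketch: the size problem under debate can be checked with back-of-the-envelope arithmetic. The 512 MiB threshold mirrors the "512MB barrier" TheJulia asks about and is an assumption, not a verified GRUB limit; the helper name is invented.]

```python
# Rough check: does a kernel + ramdisk pair fit under the suspected
# old-GRUB UEFI vmedia size barrier? limit_mib is an assumption echoing
# the "512MB barrier" question, not a value taken from GRUB sources.

MIB = 1024 * 1024

def fits_vmedia(kernel_bytes, initramfs_bytes, limit_mib=512):
    """True if the combined image size stays under the assumed barrier."""
    return (kernel_bytes + initramfs_bytes) / MIB < limit_mib

# Ramdisk figures quoted in the discussion: cs9 ~490 MB, cs10 ~590 MB
# (kernel size here is a placeholder).
print(fits_vmedia(12 * MIB, 490 * MIB))  # cs9-sized: just under the barrier
print(fits_vmedia(12 * MIB, 590 * MIB))  # cs10-sized: over the barrier
```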
15:55:58 <rpittau> I'm ok making bookworm and jammy non-voting for the time being just to switch to DIB
15:55:58 <rpittau> but I'm not sure debian based still makes it work
15:56:00 <TheJulia> I thought I posted patches to remove like 90+ MB
15:56:24 * TheJulia senses we're in the weeds and doing ourselves a disservice with asynchronous discussion
15:56:34 <janders> I'd like to briefly touch on iRMC deprecation/removal before we wrap up
15:56:46 <dtantsur> rpittau: could you check it? I think we might have pre-built images in the ipa-b location
15:56:50 <TheJulia> rpittau: maybe some deep synchronous discussion after the meeting?
15:56:57 <dtantsur> if debian IPA images are a workaround, we're fine
15:57:02 <TheJulia> dtantsur: we do afaik
15:57:15 <dtantsur> we don't need a call, we need more information
15:57:17 <TheJulia> it might be the IPA location is not updated yet if we've not merged anything in that repo yet
15:57:30 <rpittau> dtantsur, TheJulia: I checked the images, cs9 is 490MB, cs10 is 590MB
15:57:30 <rpittau> https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/
15:57:47 <dtantsur> rpittau: let's create a variation of your patch that uses https://tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/files/ipa-debian-master.initramfs
15:57:50 <dtantsur> and see if it passes
15:58:02 <dtantsur> if yes, we're golden, just need to switch our official images
15:58:10 <dtantsur> if not, we may need to actually seriously talk tomorrow
15:58:19 <rpittau> yeah, that's what I was going to do
15:58:19 <rpittau> just wanted to check if we're open to the alpine alternative for CI
15:58:31 <rpittau> alright
15:58:44 <TheJulia> https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/965984 will help, a lot.
15:58:45 <dtantsur> we have some crazy options like using a different grub binary... but let's wait for the results
15:58:55 <TheJulia> janders: might as well :)
15:59:08 <janders> I made initial contact with Fujitsu.
Most of the discussion is downstream related (TheJulia, I will loop you into this for the OSP part). The upstream-relevant part is that FJ are open to deprecation/removal of iRMC from master if it remains in stable branches.
15:59:18 <janders> question 1) is this doable
15:59:29 * dtantsur issues a 1 minute warning
15:59:42 <janders> question 2) how would they be proposing critical fixes if needed? straight to stable/whatever branch?
15:59:58 <dtantsur> it's definitely doable, and we definitely won't remove code from stable branches
16:00:15 * dtantsur doubts they'll propose any fixes now
16:00:28 <TheJulia> 1) yes
16:00:32 <dtantsur> I've already let janders know my opinion, so please others weigh in
16:00:33 <TheJulia> 2) straight to the branch
16:00:49 <janders> dtantsur: I agree, unless there's some realisation about CVEs
16:00:59 <dtantsur> like in ancient pysnmp? ;)
16:01:11 <janders> haha they may need to get creative
16:01:22 <janders> it's damage control at this stage anyway
16:01:23 <dtantsur> anyway. does anyone object to the approach that janders is proposing?
16:01:52 <TheJulia> not at all, it is what it is
16:01:54 <janders> sounds like we have a way forward so I will keep pursuing that - there will be some downstream discussion followed by upstream comms once things are clear
16:02:02 <janders> thank you for your insights
16:02:06 <TheJulia> Thanks!
16:02:12 <dtantsur> We need to wrap up, thank you folks!
16:02:16 <janders> o/
16:02:20 <dtantsur> #endmeeting