15:00:16 <rpittau> #startmeeting ironic
15:00:16 <opendevmeet> Meeting started Mon Jul 29 15:00:16 2024 UTC and is due to finish in 60 minutes.  The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:16 <opendevmeet> The meeting name has been set to 'ironic'
15:00:27 <rpittau> Hello everyone!
15:00:27 <rpittau> Welcome to our weekly meeting!
15:00:27 <rpittau> The meeting agenda can be found here:
15:00:27 <rpittau> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_July_29.2C_2024
15:01:03 <TheJulia> o/
15:01:50 <mohammed> o/
15:02:00 <JayF> O/
15:02:48 <cardoe> o/
15:03:31 <rpittau> alright let's start
15:03:34 <masghar> o/
15:03:35 <rpittau> #topic Announcements/Reminders
15:03:42 <cid> o/
15:03:53 <rpittau> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio
15:03:53 <rpittau> #link https://tinyurl.com/ironic-weekly-prio-dash
15:03:56 <iurygregory> o/
15:04:20 <rpittau> 2-3 patches that look ready there
15:05:01 <rpittau> #info 2024.2 Dalmatian Release Schedule
15:05:02 <rpittau> #link  https://releases.openstack.org/dalmatian/schedule.html
15:05:02 <rpittau> new bugfix branches will be requested this week
15:05:31 <rpittau> we're at R-9, 3 weeks to release for non-client libraries
15:06:06 <rpittau> #info The next OpenInfra PTG will take place October 21-25, 2024 virtually! Registration is now open!
15:06:06 <rpittau> #link https://ptg.openinfra.dev
15:06:06 <rpittau> the ironic team has been registered!
15:06:17 <rpittau> and JayF prepared an etherpad page
15:06:24 <rpittau> #link https://etherpad.opendev.org/p/ironic-ptg-october-2024
15:06:31 <TheJulia> ... 21st through 25th... right when I was planning on being on vacation :(
15:06:37 <TheJulia> le-sigh
15:06:39 <rpittau> put your ideas/suggestions/comments there
15:06:48 <rpittau> TheJulia: d'oh
15:07:13 <rpittau> I may miss a couple of days too, probably half of that week
15:07:52 <rpittau> well, we'll see
15:08:27 <rpittau> anything else we want to announce/remind ?
15:08:52 <JayF> I'll note that that ptg schedule is also within a stone's throw of my brother's wedding. I don't have a direct conflict, but I could see enough of us having trouble that we may want to have some type of off-cycle ironic sync
15:09:42 <rpittau> JayF: let's see how many people are available and then we can definitely think about having a separate sync, maybe reducing the official PTG time
15:09:43 <TheJulia> Honestly, I think more high bandwidth/sync communication is better than once every six months
15:10:04 <TheJulia> so if we have a pre-ptg sync, we could at least get some stuff raised up/identified for wider discussion
15:10:10 <JayF> That's actually a pretty good observation. At a minimum, consider bringing back spuc... We could also consider doing a once-monthly video meeting similar to what the TC does
15:10:33 <TheJulia> which would better enable us to move forward with one person who might have previously been critical for the discussion to take place
15:10:59 <rpittau> sure, although IMHO the PTG helps on planning for the next release, so we should still decide on priorities for the next cycle
15:11:06 <TheJulia> Sanity preservation unconference often did end up being some deep technical discussion
15:11:18 <TheJulia> rpittau: agree completely
15:11:45 <TheJulia> I just don't think we should shoehorn all activities including problem definition into the PTG and expect success to result
15:12:37 <JayF> ++
15:12:38 <TheJulia> like Adam's whole disk encryption proposal.... that might need some advance discussion before planning or at least base context sharing
15:12:42 <rpittau> I guess we can come up with the priorities during other discussions and then just write them down to have them ready for the next cycle
15:12:56 <JayF> TheJulia: I just wanna talk with him about it b/c it's so dang cool :D
15:13:27 <TheJulia> Well, spreading understanding is distinctly different than defining and deciding on priorities
15:13:40 <TheJulia> JayF: oh my, I haven't even read it yet because insanely busy
15:13:46 <JayF> we've often combined what and how in PTG
15:13:48 * TheJulia resists for now
15:13:50 <JayF> splitting them is wise
15:14:06 <TheJulia> we don't have to entirely split, just... spread context, and it doesn't have to be formal
15:14:13 <JayF> TheJulia: eh, more that I'm a sicko who appreciates seeing crap I've manually done in a gentoo desktop get mainstream enough that I might get it for my day job :D
15:14:42 <TheJulia> eh, that is more Freaky, TBH :)
15:14:46 <cardoe> Someone likes self flogging.
15:14:59 <TheJulia> heh
15:15:00 <JayF> not exactly :D
15:15:10 <TheJulia> Okay, back to the topic at hand!
15:15:14 <JayF> It's not that tough anyway, dracut supports it well
15:15:24 <rpittau> let's see how many topics we have for the PTG first
15:15:57 <TheJulia> I do think bringing back the SPUC is worthwhile just from a contributor connection/relationship building standpoint
15:16:15 <JayF> ++ especially with new folks in the community
15:16:35 <cardoe> Can someone define SPUC? (sorry newbie here)
15:16:42 <TheJulia> Sanity Preservation Un-Conference
15:16:49 <masghar> (Also listening)
15:16:59 <TheJulia> A weekly call we did during the pandemic to give people an outlet to vent/connect/relate
15:17:09 <TheJulia> And discuss any topics in high bandwidth
15:17:11 <masghar> Oh that sounds nice
15:17:44 <TheJulia> Keep in mind, when we started it, most of the core contributors were losing all remaining sanity
15:17:48 <TheJulia> But it really did help
15:18:23 <JayF> We eventually gave up on sanity preservation
15:18:30 <JayF> but friendship is valuable even for those who have no sanity :)
15:18:35 <TheJulia> ++
15:18:37 <rpittau> I agree
15:18:37 <rpittau> if we bring that back maybe we could do it once per month or something like that
15:18:42 <TheJulia> Anyway, let's get the meeting back on track
15:18:59 <rpittau> yep
15:19:11 <rpittau> #topic Review Ironic CI status
15:19:16 <TheJulia> Anything that is not weekly makes it hard to keep it in mind, fwiw.
15:19:28 <rpittau> #info ironic CI was impacted by an issue caused by the removal of simplejson from osc reqs
15:19:59 <JayF> stevebaker[m] and I are doing a stable branch CI audit, I think much of that is in better shape than usual
15:20:04 <JayF> but I need to check in to see how far we've gotten
15:20:22 <rpittau> JayF: I think I reviewed most of the changes already
15:20:55 <rpittau> btw this is the etherpad link
15:20:55 <rpittau> #link https://etherpad.opendev.org/p/ironic-ci-audit-july-2024
15:22:12 <JayF> yeah I just haven't checked that in a couple days, had other distractions
15:22:17 <rpittau> I'll have another look at the patches during the week
15:22:58 <rpittau> probably we need to clarify that anything that still uses CS8 must be removed or made non-voting
15:23:21 <rpittau> not sure it's worth upgrading to CS9 unless we still maintain the branch
15:23:45 <TheJulia> Removed, I believe
15:24:00 <TheJulia> since the base images are also gone and the jobs just get ignored as a result
15:24:12 <rpittau> yeah, actually better not waste CI resources
15:24:17 <TheJulia> ++
15:24:47 <rpittau> anything else CI related ?
15:25:41 <rpittau> alright so we don't have discussion topics, unless someone has something to bring up
15:26:08 <JayF> I think cardoe had something for open discussion
15:26:18 <rpittau> sure thing
15:26:27 <cardoe> Hello all. Got a few asks.
15:27:02 <cardoe> So working on an effort around a sizeable amount of machines living in Ironic (multiple ones that we'll have as regions)
15:27:17 <cardoe> Our deployment is centered around Kubernetes and we're using OpenStack Helm.
15:27:38 <cardoe> We're letting people consume the hardware via Nova and some like jamesdenton consume it via baremetal directly.
15:27:43 <cardoe> Just wanted to share that context.
15:27:58 <TheJulia> Interesting....
15:28:04 <TheJulia> context++
15:28:14 <JayF> oh, you're working on my old cluster, or something adjacent to it :D
15:28:15 <cardoe> Happy to give more details anytime as well.
15:28:40 <cardoe> So we've tried to use the redfish inspector via sushy and there's quite a bit of bits lacking.
15:28:45 <JayF> I'm sorry and/or you're welcome as appropriate :)
15:28:57 <chris218> Hi is the meeting still ongoing?
15:29:05 <TheJulia> chris218: yes, cardoe has the floor
15:29:07 <cardoe> I've been monkey patching some more endpoints into sushy because I know there's a specific targeted profile, but how much appetite is there for extensions there?
15:29:21 <cardoe> e.g. ethernet_interfaces is busted on a bunch of hardware.
15:29:32 <cardoe> https://review.opendev.org/c/openstack/ironic/+/924943
15:29:45 <TheJulia> cardoe: so there is a lot of history here, so I guess to sort of start off
15:30:10 <cardoe> e.g. Dell R76xx family of hardware with iDRAC 9 7.x.y.z returns only the 2 on board 1GB interfaces and then an empty MAC address and none of the other NIC cards plugged in.
15:30:13 <TheJulia> The out-of-band introspection is minimal information, but I think we would accept patches to obtain/match/identify more within reason
15:30:22 <TheJulia> ethernet_interfaces themselves has a long, painful, history
15:30:33 <cardoe> Right. I've switched us to using Ports
15:30:37 <TheJulia> cool cool
15:30:57 <dtantsur> You have a fix though, it just needs to be moved to sushy?
15:31:40 <cardoe> Well the fix for that change request is to just filter out the empty MAC. Which is done in a few places in Ironic code. Probably should be centralized.
15:31:40 <TheJulia> so the other challenge there is depending on the cards, the interface between the BMC and the firmware (i.e. is it dell's version of the firmware, or chipset OEM's card, or even third party firmware), you're going to get different BMC reporting behavior *as well*
15:31:58 <cardoe> My "full" fix is to stop using ethernet_interfaces and use the ports endpoint in Redfish, but that's a big change to the interop profile.
15:32:00 <TheJulia> filtering the empty mac makes sense
15:32:26 <dtantsur> cardoe: yeah, also we cannot stop using them - we'll definitely find a hardware that only has EthernetInterfaces...
15:32:28 <cardoe> I've got piles of HP and Dell gear only that I'm testing against. So that's gonna be my bias filter.
15:32:35 <dtantsur> (we've been in similar positions several times)
15:32:37 <TheJulia> the underlying problem is.... I'm trying to think of the protocol used for out of band communications between the BMC and the cards, it knows it is a nic, but not enough information/support to pass along the mac address
15:32:46 <TheJulia> so filtering empty makes *tons* of sense
15:33:06 <dtantsur> but we definitely need to fix sushy to rule out empty strings. if you're not planning on it, we should probably Just Do It ourselves
15:33:22 <cardoe> So would there be an appetite for adding a "ports" interface to sushy? Ironic doesn't have to depend on it.
15:33:32 <dtantsur> TheJulia: we already have "is not None" btw, but that does not capture empty strings
15:33:35 <TheJulia> I would be happy to review such patches
15:34:02 <cardoe> You don't have to do it. You gimme $TIME and I'll cherry-pick them out and push them to gerrit
15:34:04 <dtantsur> cardoe: we're open for any standard-compliant extensions that open new possibilities for Ironic or adjacent projects
15:34:05 <TheJulia> dtantsur: yeah, that was sort of what I was figuring
15:34:34 <cardoe> Yeah so my Ports implementation follows the DMTF spec
15:35:02 <rpittau> that's a great selling point :)
15:35:06 <cardoe> I tried to make it as identical as possible to how you guys have done the others. But I'm sure there's some stuff I've missed and don't know. I'm happy to get review and do the lift to get it landed.
15:35:46 <JayF> that sounds like about as easy of a change as we'll ever have to approve from a theoretical standpoint. A standards-compliant implementation of a redfish endpoint mimicking existing style :D
15:35:55 <cardoe> Essentially I'm trying to add as much of what Ironic Inspector did into the Redfish inspector.
15:36:07 <cardoe> To have as much out of band as possible.
15:36:24 <dtantsur> ++
15:36:42 <cardoe> When I send a human into the DC to fix up a box, I wanna quickly out of band make sure the hardware they swapped is where it's supposed to be.
15:37:00 <cardoe> and before they get back to their desk be able to tell them "no go back and fix it"
15:37:05 <TheJulia> The one thing to keep in mind, you may still want a full OS to boot and look for any devices, because BMC support and reporting relies on firmware on the card behaving on the i2c bus
15:37:44 <TheJulia> ++
15:37:45 <cardoe> 100%. We're doing out-of-band not using the ironic redfish inspector today. But I'm pushing my team to use sushy (our fork) for all of that.
15:38:04 <cardoe> Then we're flipping it to ironic inspector to do the more complete inspection.
15:38:10 <TheJulia> very cool
15:38:17 <TheJulia> Sounds good
15:38:26 <cardoe> Switching to the "agent" currently.
15:38:57 <cardoe> Just wondering if we can maybe contribute this to the redfish inspector eventually.
15:39:07 <cardoe> Or land our sushy bits into actual upstream.
15:39:18 <JayF> I think the answer is very yes
15:39:30 <cardoe> We did some Dell specific bits in our sushy fork, which now has us wanting to fork sushy-oem-idrac and get it there.
15:39:33 <cardoe> Okay good.
15:39:45 <JayF> I think we've considered in the past rolling sushy-oem-idrac back into sushy
15:39:53 <JayF> might be worth having a larger discussion about that if it makes what you're doing easier, too
15:39:59 <cardoe> The other issue we've got is for new hardware out of the box (or maybe pallet? or shipping container? I dunno never seen the gear in person)
15:40:18 <cardoe> A lot of Dell stuff doesn't PXE boot for example.
15:40:32 <TheJulia> sushy-oem-idrac is largely for the super-specific behavior dell doesn't intend to change/fix/make compliant
15:40:35 <cardoe> We do some Redfish calls to fix it up
15:40:37 <dtantsur> is virtual media an option (instead of pxe)?
15:40:44 <TheJulia> will the dell gear do httpboot out of the box?
15:41:04 <cardoe> Not sure. I need to still test with that.
15:41:23 <cardoe> That was my backport question cause the idrac driver doesn't allow redfish-https
15:41:47 <TheJulia> oh, to send a network boot url out of band?
15:42:12 <cardoe> yeah. It's on a REAL soon sprint for someone to work that.
15:42:17 <cardoe> And confirm
15:42:25 <TheJulia> yeah, that is likely just a bug from my point of view
15:42:36 <cardoe> I assume httpboot you meant redfish-https
15:42:44 <TheJulia> yeah, one of the flavors
15:42:53 <TheJulia> there are two distinct http boot flavors
15:43:22 <cardoe> So we poke the BIOS settings via redfish
15:43:25 <dtantsur> virtual media exists for somewhat longer though
15:43:37 <TheJulia> I think dell gear from the factory ships leaning towards httpboot out of the box, but I've never seen a fresh from the factory box which has not been modified by well meaning humans
15:43:48 <cardoe> yeah virtual media doesn't work on the last dozen? or more racks we've gotten from Dell.
15:44:07 <TheJulia> oh joy
15:44:28 <cardoe> https://paste.opendev.org/show/bpoxYEvFBbqHpsGTy1IF/ https://paste.opendev.org/show/bUZln4wKQ1eINnjpTscw/
15:44:28 <TheJulia> cardoe: if you can get us details in a bug, we might be able to assist with that
15:44:35 <dtantsur> huh, interesting. we have bugs from time to time, but never seen any issues at scale
15:44:35 <cardoe> Will do.
15:44:56 <cardoe> I mean I didn't try EVERY server in the rack
15:45:12 <TheJulia> I think this one has appeared in very recent firmware
15:45:14 <TheJulia> "Virtual Media is detached or Virtual Media devices are already in use."
15:45:19 <cardoe> But n is definitely n>1
15:45:29 <cardoe> yes. These boxes all came with iDRAC9 7.00.60.00
15:45:30 <TheJulia> I ?think? Iury was starting to look a week or two ago
15:45:43 * iurygregory looks
15:45:51 <cardoe> Whatever the 6.x.y.z version was didn't do this.
15:46:06 <TheJulia> yeah
15:46:16 <cardoe> So to that effect, I tried to use the bios config of the redfish driver in Ironic.
15:46:19 <iurygregory> oh I saw this error once
15:46:24 <TheJulia> Anything else cardoe? I think chris218 was next up :)
15:46:24 <cardoe> But it doesn't allow for disable_ramdisk=True
15:46:31 <iurygregory> but it was also a networking problem
15:46:50 <dtantsur> cardoe: re disable_ramdisk, janders and I have mid-term plans to look into that (in the context of servicing)
15:46:56 <iurygregory> ironic couldn't reach the address where I had the vmedia
15:47:05 <cardoe> okay I'll work with janders on that.
15:47:14 <cardoe> Perfect. I'll give up the floor. Thank you all.
15:47:19 <rpittau> cardoe:  would be nice to continue the discussion, please open a bug and we'll look into it :)
15:47:20 <rpittau> thanks!
15:47:20 <TheJulia> Thanks cardoe!
15:47:53 <rpittau> any other discussion topics?
15:47:56 <cardoe> Just wanna say thank you all for the hard work. I wanna make sure we contribute back and get my folks engaged so thanks for entertaining me and my questions.
15:48:26 <TheJulia> chris218: you were wondering if the meeting was still in progress? Do you have a topic for discussion?
15:50:48 <TheJulia> I guess something got their attention.
15:50:54 * TheJulia shrugs
15:51:02 <cid> chris218 is probably away from the keyboard.
15:51:04 <rpittau> let's move on
15:51:19 <rpittau> #topic Bug Deputy Updates
15:51:24 <rpittau> cid: anything to report?
15:51:41 <cid> Nothing really.
15:51:59 <cid> Only a new bug was filed and someone helped me with triaging it.
15:52:10 * cid one
15:52:35 <rpittau> ok cool
15:52:46 <rpittau> any volunteer for bug deputy for this week ?
15:53:10 <iurygregory> I can
15:53:15 <rpittau> thanks iurygregory :)
15:53:49 <rpittau> #topic RFE review
15:54:00 <rpittau> we have an rfe to discuss apparently https://review.opendev.org/c/openstack/sushy-tools/+/923111
15:54:27 <JayF> There seems to be a lot of ... undocumented context around the direction of sushy-tools?
15:54:38 <mohammed> a tiny patch on sushy-tools. It's a minimal and generic hook that can be enabled to send status changes to a pluggable component's API endpoint
15:54:47 <JayF> I don't have a strong care about it, other than worrying about if this complexity could be maintained if the folks interested in these advanced features went away
15:54:56 <dtantsur> JayF: not really a lot, the root is very simple: the group of us wants to adapt sushy-tools for scale-testing Ironic
15:55:02 <dtantsur> like, scale scale
15:55:28 <JayF> That sounds like a worthy thing, and exciting, but without the plan laid out for upstream about how it's being done, it does make it difficult to review
15:55:31 <dtantsur> then there is some history on how we came to this specific proposal
15:55:53 <TheJulia> I think the problem is the original discussion didn't spread wide so ongoing context is lost
15:56:08 <dtantsur> JayF: have you seen the updated docs? https://review.opendev.org/c/openstack/sushy-tools/+/923111/14/doc/source/user/dynamic-emulator.rst
15:56:09 <TheJulia> (Hey, this is the sort of thing SPUC would have helped with! *ducks*)
15:56:19 <dtantsur> SPUUUC \o/
15:56:21 <mohammed> Why do we need it? It extends the fake system, which is currently insufficient for testing end-to-end Ironic deployments.
15:56:26 <masghar> SPUC++
15:56:41 <JayF> I guess I just wish I could see the whole picture
15:56:50 <mohammed> who will use it? For now, but not limited to, the fake IPA (which mocks some real IPA command executions). This component is in progress on the Metal3 org, and we plan to integrate it with the fake system using this interface.
15:56:55 <JayF> right now it feels like I'm reviewing the "y" of the xy problem on all these changes because I can't see the bigger picture
15:57:08 <chris218> TheJulia: nah I had a question about implementing custom ironic driver
15:57:10 <JayF> which is mostly fine, I rarely review sushy* stuff anyway, but it seems to need the review attention now
15:57:30 <mohammed> Note: Sushy-tools emulates Redfish-compliant hardware, creating a "fake" environment to mimic the behavior of physical hardware for testing and development purposes. The fake system within sushy-tools adds another layer of simulation, creating fake drivers and components on top of the already emulated hardware environment. Essentially, the fake system is a "fake of the fake" and is intended only for testing purposes. I
15:57:30 <mohammed> just want to use it, and if there is no motivation to merge it into sushy-tools, I can host it elsewhere to continue using it effectively.
15:58:17 <cardoe> I mean I've thought about using something similar to that for some "fast" tests.
15:58:20 <TheJulia> I think it is likely okay
15:58:48 <TheJulia> it being the doing a thing, not having looked at the content yet
15:58:55 <cardoe> But our concern has definitely been to not pull a Crowd Strike and implement our behavior to the fake implementation and then have issues in prod.
15:59:30 <TheJulia> .... too soon
15:59:42 * TheJulia had to deal with airline flight drama
16:00:14 <mohammed> i do not think sushy will run in any prod env
16:00:22 <dtantsur> cardoe: sushy-tools is something that should not exist in production hence we're trying to put everything there
16:00:33 <dtantsur> mohammed: please avoid confusion: sushy vs sushy-tools
16:00:41 <dtantsur> sushy is a Python library that Ironic uses
16:00:43 <JayF> mohammed: yeah, mainly my question in the review was in an attempt to have this conversation happen
16:01:02 <JayF> mohammed: the conversation has happened, people are more on the same page, I think you'll get this all landed
16:01:13 <TheJulia> JayF: ++
16:01:14 <dtantsur> I think the patch looks solid, I just spotted one issue with power state handling
16:01:23 <rpittau> alright then :)
16:02:08 <rpittau> going to close the meeting, we're a couple of minutes out
16:02:10 <rpittau> thanks all!
16:02:14 <rpittau> #endmeeting
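
Note on the empty-MAC discussion (15:31 to 15:33): dtantsur pointed out that an "is not None" check does not catch empty strings. The following is only a minimal sketch of that point, not the actual Ironic or sushy code. The function name usable_macs and the sample data are hypothetical; the interfaces are modelled as plain dicts carrying the Redfish MACAddress property rather than real sushy objects.

```python
def usable_macs(interfaces):
    """Return lower-cased MACs, skipping None, empty, or whitespace-only values.

    `interfaces` is assumed to be an iterable of dicts shaped like Redfish
    EthernetInterface resources (hypothetical stand-ins, not sushy objects).
    """
    macs = []
    for iface in interfaces:
        mac = iface.get("MACAddress")
        # "is not None" alone would let "" through, so also reject blank strings.
        if mac is None or not mac.strip():
            continue
        macs.append(mac.strip().lower())
    return macs


# Hypothetical data resembling the BMC behavior described in the meeting.
sample = [
    {"Id": "NIC.Embedded.1", "MACAddress": "AA:BB:CC:DD:EE:01"},
    {"Id": "NIC.Embedded.2", "MACAddress": ""},
    {"Id": "NIC.Slot.1", "MACAddress": None},
    {"Id": "NIC.Slot.2"},
]
print(usable_macs(sample))
```

Putting the check in one helper like this is one way to get the centralization cardoe suggested, instead of repeating the filter in several places.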