15:00:26 <rpittau> #startmeeting ironic
15:00:26 <opendevmeet> Meeting started Mon Jun 17 15:00:26 2024 UTC and is due to finish in 60 minutes.  The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:26 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 <opendevmeet> The meeting name has been set to 'ironic'
15:00:33 <JayF> o/
15:00:35 <TheJulia> o/
15:00:55 <iurygregory> oi/
15:01:00 <rpittau> Hello everyone!
15:01:00 <rpittau> Welcome to our weekly meeting!
15:01:00 <rpittau> The meeting agenda can be found here:
15:01:00 <rpittau> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_June_17.2C_2024
15:02:01 <rpittau> #topic Announcements/Reminders
15:02:18 <rpittau> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio:
15:02:19 <dtantsur> o/
15:02:25 <rpittau> #link https://tinyurl.com/ironic-weekly-prio-dash
15:03:09 <rpittau> #info The Ironic/Bare Metal SIG meetup at CERN happened on June 5, leaving the link to the notes here for one more week
15:03:18 <rpittau> #link  https://etherpad.opendev.org/p/baremetal-sig-cern-june-5-2024
15:03:25 <JayF> I filed several RFEs out of the feedback from that
15:03:35 <JayF> I haven't added them to RFE review in the meeting though, I missed that
15:03:38 <rpittau> JayF: I saw that, thanks
15:03:48 <JayF> we can potentially go over them at the appropriate time if desired
15:03:58 <rpittau> sounds good
15:04:34 <rpittau> #info 2024.2 Dalmatian Release Schedule
15:04:34 <rpittau> #link  https://releases.openstack.org/dalmatian/schedule.html
15:05:40 <rpittau> anything else to announce/remind ?
15:06:18 <rpittau> ok, moving on
15:06:21 <rpittau> #topic Review Ironic CI status
15:06:32 <rpittau> #info metal3 integration job is broken again \o/
15:07:15 <rpittau> probably worth discussing why we didn't see CI breaking in metal3-dev-env during the metal3 meeting on Wednesday
15:08:11 <rpittau> anything else CI related ?
15:08:23 <JayF> It sounds like the pySNMP change hurt us, too
15:08:33 <JayF> I'll look at 912328 and see if I can get it over the line with some time today
15:08:40 <TheJulia> https://d2e31a25148547148788-51f9901de94768cc4d0e03f07c031664.ssl.cf1.rackcdn.com/921966/2/gate/ironic-tempest-ramdisk-bios-snmp-pxe/7f972ac/controller/logs/screen-ir-cond.txt <-- confirmed
15:09:05 <TheJulia> somewhere driver loading now breaks
15:09:18 <rpittau> I left a message for Lex Li in the original thread on github https://github.com/lextudio/pysnmp/issues/49
15:09:19 <TheJulia> Haven't gotten far enough to figure it out exactly
15:09:28 <sylvr> dtantsur: I found the issue I had, and yes it was a misconfiguration...
15:09:48 <dtantsur> good to know!
15:10:15 <sylvr> etc/kayobe/kolla/config/bifrost/bifrost.yml contained the wrong driver, I corrected it and now it works
15:10:30 <rpittau> TheJulia: maybe some hints from the failing unit tests?
15:11:17 <TheJulia> with the patch to bump our local state, it's 2000+ failing unit tests for me locally
15:11:39 <JayF> it was only 6 in the gate, weird
15:11:42 <TheJulia> but that has been the path I've been looking at
15:11:44 <sylvr> I added the kolla dir to my .gitignore to avoid pushing secrets to the git repo, but then I didn't track this file... so it was hidden from my config..
15:11:49 <sylvr> thanks!
15:12:14 <TheJulia> JayF: I should check the eventlet version of the gate...
15:12:38 <TheJulia> Anyway, its all broken, we can move the meeting along
15:12:48 <TheJulia> its going to take time to figure out what is going on exactly
15:13:46 <JayF> TheJulia: seeing 6 locally here on a clean tox run too, fwiw; I'll take a look at getting those to pass locally post-meeting
15:14:01 <rpittau> TheJulia: I can propose a revert for the uc change until our snmp job passes
15:14:30 <TheJulia> Lets wait and see, I think it is just worse with lextudio's patch, at least on debian
15:14:56 <rpittau> ok
15:15:00 <TheJulia> btw, was that count with just tox -epy3, or did you try tox -ecover ?
15:15:24 <JayF> TheJulia: tox -epy3 on python3.12
15:15:35 <JayF> ^ -r
15:15:39 <TheJulia> I ran cover with 3.11
15:15:46 <TheJulia> ack
15:15:55 <JayF> cover can be a bit ... misleading when there are certain kinds of failures ime
15:16:12 <JayF> e.g. some failures can cascade
15:16:16 <JayF> but we should probably move on the meeting :D
15:16:20 <rpittau> let's move on!
15:16:23 <rpittau> :)
15:16:25 <rpittau> #Discussion topics
15:16:30 <rpittau> heh
15:16:33 <rpittau> #topic Discussion topics
15:16:47 <rpittau> JayF: you have something there :)
15:17:09 <JayF> So Reverbverbverb has a draft up of his report on Ironic documentation in google docs
15:17:24 <JayF> I implore you, please find time in the next week to go over it and make comments
15:17:46 <JayF> we will be converting this into "final version" and filing bugs around the action items as early as this time next week barring feedback to the contrary
15:17:58 <rpittau> #action review reverbverbverb Ironic documentation audit update & results
15:17:58 <rpittau> #link https://docs.google.com/document/d/1e9URuPHKNTx5QXdkCFAzsS0EAAxPJ-xQC4BEpDwESp4/edit
15:18:21 <JayF> ty for that, you got it copied before I did :D
15:18:23 <rpittau> thanks Reverbverbverb for that!
15:18:49 <Reverbverbverb> My pleasure. Please comment in the doc.
15:18:56 <rpittau> will do :)
15:19:42 <rpittau> anything else to add?
15:20:25 <Reverbverbverb> My review of the doc is necessarily big-picture, but please feel free to comment on anything about the doc
15:21:14 <JayF> I think that's all for our action item
15:21:15 <Reverbverbverb> I'll be breaking the analysis down to (I hope) actionable items, so anything you have to say will help
15:21:38 <rpittau> alright, thanks again
15:21:41 <Reverbverbverb> 👍
15:21:43 <dtantsur> Thanks Reverbverbverb!
15:22:00 <rpittau> #topic Bug Deputy Updates
15:22:34 <rpittau> so that was me last week
15:22:34 <rpittau> haven't seen a lot of movement in terms of bugs but this popped up on Friday
15:22:34 <rpittau> #link  https://bugs.launchpad.net/ironic/+bug/2069413
15:23:23 <TheJulia> Looks like they proposed a fix, I'll try to take a look this week
15:23:28 <rpittau> thanks TheJulia
15:23:46 <rpittau> any volunteer for this week's bug deputy ?
15:24:26 <JayF> give it to me
15:24:32 <rpittau> thanks JayF :)
15:24:50 <rpittau> moving on!
15:24:57 <rpittau> #topic RFE Review
15:25:18 <rpittau> JayF: you want to mention the RFEs you opened ?
15:25:35 <JayF> yeah I have a bunch of recent rfes that need review, not all are mine but all need attention
15:25:49 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2069085
15:25:53 <JayF> RFE: Add a burnin_gpu step
15:26:09 <JayF> I'll note all these are against *ironic* even though for many of them, IPA commits are all that are needed
15:26:37 <JayF> That's pretty straightforward; as requested by operators at the cern meetup, we can hook up the stress-ng support for GPUs to a burnin_gpu step
15:26:41 <rpittau> ok
15:26:41 <rpittau> we can always add the project there
15:26:49 <JayF> I don't think there's anything controversial here and it's a good first issue
15:27:04 <rpittau> that looks ok to me
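A rough sketch of what the burnin_gpu step discussed above could look like in IPA's hardware manager, mirroring the existing burnin_cpu pattern of shelling out to stress-ng. The function names, the use of plain subprocess, and the stress-ng `gpu` stressor arguments are all assumptions here, not merged code:

```python
# Hypothetical sketch of a burnin_gpu clean step for IPA, modeled on
# the existing stress-ng-based burnin_cpu step. Nothing here is merged
# code; names and options are illustrative only.
import subprocess


def build_burnin_gpu_cmd(timeout_seconds=86400):
    """Build the stress-ng invocation for a GPU burn-in run."""
    # Recent stress-ng releases ship a 'gpu' stressor; '0' workers
    # follows the cpu-stressor convention of one per detected unit.
    return ['stress-ng', '--gpu', '0',
            '--timeout', str(timeout_seconds),
            '--metrics-brief']


def burnin_gpu(node, timeout_seconds=86400):
    """Hypothetical clean step: stress the GPUs, fail loudly on error."""
    cmd = build_burnin_gpu_cmd(timeout_seconds)
    try:
        subprocess.run(cmd, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        raise RuntimeError('GPU burn-in failed: %s' % exc)
```

The real step would use IPA's own execution and error helpers rather than raw subprocess; this just shows why it qualifies as a good first issue.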
15:27:21 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2069083
15:27:25 <JayF> RFE: Use results of burnin_* methods for inspection
15:27:36 <JayF> I'll be honest, this probably needs more design than has been done in the bug currently
15:27:59 <JayF> but at a high level; operators want us to optionally report metrics from burnin_ methods to inspection so they can compare performance
15:28:41 <TheJulia> ... that seems fairly nebulous
15:29:02 <TheJulia> or at least, at a high level a bit outside of what we can do with the stack with where we're going
15:29:16 <TheJulia> since surely they mean inspector
15:29:45 <JayF> how is "they mean inspector" different than what I said?
15:30:06 <TheJulia> we're on a path to deprecate inspector
15:30:26 <dtantsur> but we can still report stuff? the functional difference is not that huge
15:30:27 <TheJulia> and inspector really only holds value for introspection runs, so we're talking about storing more data and extending it
15:30:31 <JayF> Yes, but we have similar functionality in an ironic-forward way?
15:30:49 <JayF> I specifically didn't talk about internal ironic vs external inspector because AIUI it doesn't make much of a difference at that level in ipa
15:30:50 * dtantsur is writing a migration guide by the way
15:30:52 <TheJulia> I guess I'm semi -1 to extending expector
15:30:58 <TheJulia> inspector
15:31:03 * TheJulia needs way more coffee this morning
15:31:20 <rpittau> we can always hold until the migration has been completed
15:31:22 <dtantsur> well, yeah, inspector is frozen at this point
15:31:25 <rpittau> or add the feature to the agent in ironic
15:31:27 <JayF> I don't comprehend how this extends inspector -- if I were implementing this, I'd focus on new ironic-based inspection
15:31:43 <rpittau> yep, exactly
15:31:45 <dtantsur> same
15:31:45 <JayF> we will still have the ability to inspect stuff and report results
15:31:48 <TheJulia> okay then
15:32:08 <JayF> I'm OK with us not approving it because it *is* very vague and might need design
15:32:14 <rpittau> it actually makes sense to have that in the ironic project :)
15:32:18 <JayF> but it's 100% not the intention for this to be an `ironic-inspector` features
15:32:29 <dtantsur> also, if what gets extended is IPA, it does not matter what the server side is
15:32:35 <rpittau> yeah
15:32:45 <dtantsur> if something gets into the inspection data, it will be stored either way
15:33:07 <TheJulia> so I guess going back to my lack of caffeinated state, I read Jay's summary as they want us to compare the results
15:33:15 <TheJulia> which sent my brain a step further
15:33:26 <JayF> Ah, I'll be 100% clear we're talking about storing metrics from the run, and nothing else
15:33:37 <JayF> the use case brought up IRL was "sometimes machines just randomly perform out of spec"
15:33:43 <dtantsur> Well, we could write an inspection hook to compare the old data with the new one.. that would depend on which server accepts the data
15:33:48 <JayF> with a story about NVMe drives that *over performed* by a factor of 2s
15:33:50 <JayF> *2x
15:34:20 <JayF> dtantsur: I'd suggest that be a separate, next step
15:34:25 * dtantsur nods
15:34:26 <JayF> dtantsur: part of the goal here is to have some low-hanging-fruit
15:34:58 <rpittau> it clearly needs more info, but I'm ok with that
15:35:02 <JayF> I put a clarifying comment in 2069083, going to move on without marking as approved since it needs more info
15:35:12 <rpittau> yep, thanks
15:35:24 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2068530
15:35:29 <JayF> [RFE] Allow an operator to block all future bios deployments
15:35:49 <TheJulia> That was an idea which came up in discussion which I kind of liked
15:35:51 <JayF> The idea here is to have a conductor-level setting which prevents new nodes from being created / nodes from being updated to use BIOS boot mode
15:36:06 <JayF> In order for places to enforce UEFI-only boot mode
15:36:34 <JayF> Hmm that's not exactly how it is written
15:36:46 <JayF> how it's written is to block *deployments* if in bios boot mode, that's not exactly the same thing
15:36:47 <rpittau> this kind of means that we're deprecating legacy BIOS deployments
15:37:03 <JayF> rpittau: it means we're allowing some operators to flip a switch to disable them *in their environment* if they have a separate requirement
15:37:17 <JayF> rpittau: it's just convenient we can flip it to default-enabled when we're ready to deprecate
15:37:29 <rpittau> yes, I remember the discussion now, and I'm all for it
15:37:45 <TheJulia> I noted it as deployment in large part because you can have machines in different states
15:38:04 <TheJulia> and the logical place to prevent that sort of thing is in the deployment pathway code because you want to be able to unprovision machines
15:38:15 <rpittau> of course, makes sense
15:38:16 <TheJulia> sort of like the retire logic with cleaning
15:38:29 <JayF> well, that doesn't exactly match what the operator in the room was saying
15:38:43 <JayF> they were saying they had an issue where a device would be misconfigured at enrollment with a bios boot mode
15:38:58 <JayF> setting back their efforts to "drain" bios booting out of the environment
15:39:15 <JayF> I think there's potential value in checking at the API level (you can't explicitly set to bios) as well as at deployment, but IMBW
15:39:29 <JayF> at least that's how I understood it
15:40:07 <TheJulia> So, we're semi-seeing such issues with some hardware, but it doesn't really show until cleaning/deployment
15:40:26 <TheJulia> specifically hardware which *has* to be requested to be in bios mode, but we shouldn't model the universe around the exception
15:40:34 <JayF> I guess I'm wondering if we'd ever have operators who'd say "I don't even want a ramdisk to boot via bios"
15:40:57 <TheJulia> quite possible, but they are going to have to somehow drain out their current state
15:41:00 <JayF> that seems to be the meaningful difference between our two perspectives, yeah?
15:41:13 <TheJulia> somewhat, really it is likely two knobs
15:41:18 <JayF> ack
15:41:18 <TheJulia> and two distinct checks at different points
15:41:25 <JayF> I'll update the RFE with that comment
15:41:30 <JayF> I think it's clear we are onboard to do something like this?
15:41:40 <TheJulia> one early on in the entry flow to move a node out of enrollment, and likely later on with deployments
15:41:56 <TheJulia> so if you switch both, you'll just end up with nodes which cannot be deployed anymore and you can't add any more
15:42:05 <TheJulia> well, until you fix them or replace them :)
15:42:34 <JayF> updated with that comment, marking approved
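The "two knobs, two checks" shape agreed above could look roughly like this; the option names and exception are invented for the sketch, and the key design point from the discussion is preserved: unprovisioning is never blocked, so operators can still drain BIOS nodes out, much like the retirement logic with cleaning:

```python
# Hypothetical sketch of the two BIOS-blocking knobs discussed above.
# Option names (disallow_bios_enrollment / disallow_bios_deploy) are
# made up; the real thing would be oslo.config conductor options.

class BIOSBootBlocked(Exception):
    pass


def check_boot_mode(boot_mode, *, entering_manageable, conf):
    """Raise if this action is blocked for legacy-BIOS nodes.

    Called at two points: early in the entry flow when a node moves
    out of enrollment, and later in the deployment pathway. Teardown
    is deliberately never checked, so BIOS nodes can still be drained.
    """
    if boot_mode != 'bios':
        return
    if entering_manageable and conf.get('disallow_bios_enrollment'):
        raise BIOSBootBlocked('new BIOS-mode nodes are not accepted')
    if not entering_manageable and conf.get('disallow_bios_deploy'):
        raise BIOSBootBlocked('deployments in BIOS mode are disabled')
```

With both knobs flipped, existing BIOS nodes can no longer be deployed and new ones cannot be added, exactly the end state described above.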
15:42:58 <JayF> TheJulia: this is one of yours
15:43:00 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2067073
15:43:04 <TheJulia> by out of enrollment, I mean upon managing one node
15:43:06 <JayF> [RFE] HTTP ISO Boot via Network (UEFI) HTTP Boot
15:44:20 <JayF> This seems like a pretty straightforward extension of the new http-* based boot interfaces
15:44:47 <TheJulia> So this one is unrelated to the meetup, but the idea is "what if folks didn't want ipxe or a network-style PXE loader anymore", i.e. is there any way to direct-boot an ISO via the network. Today there is no such option in how the httpboot interfaces are delineated: the pure network+dhcp boot path leverages a network bootloader, or you provide the entire url in advance to the BMC
15:45:13 <TheJulia> yeah, pretty much just a slightly different option so you don't need a network bootloader at all
15:45:21 <rpittau> that is very interesting
15:45:45 <TheJulia> It will only really be useful with a managed dhcp server though
15:47:49 <JayF> I think silence is acceptance?
15:48:06 <TheJulia> I suspect so
15:48:14 <rpittau> good for me
15:48:16 <JayF> that's the end of the list
15:48:19 <rpittau> thanks!
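For the managed-DHCP dependency TheJulia mentions, the idea could be served by something like the dnsmasq fragment below. This is an assumption-laden illustration, not anything from the RFE: client-arch 16 is the IANA value for x64 UEFI HTTP boot, the firmware expects the vendor class "HTTPClient" echoed back, and the URL is an example only.

```
# Hypothetical dnsmasq fragment: hand UEFI HTTP boot clients an ISO
# URL directly, with no network bootloader involved.
dhcp-match=set:httpboot,option:client-arch,16
dhcp-option-force=tag:httpboot,60,HTTPClient
dhcp-boot=tag:httpboot,"https://ironic.example.com/images/boot.iso"
```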
15:48:38 <JayF> how refreshing! Features people asked for, in real life
15:48:46 <JayF> not filtered through someone else's product team :P
15:48:58 <rpittau> heh :)
15:49:11 <rpittau> #topic Open Discussion
15:49:17 <TheJulia> eh... that last one is sort of me saving my sanity from a product team :)
15:49:33 <JayF> As mentioned Friday, we're probably going to start piloting asyncio-based solutions for removing eventlet from IPA
15:49:36 <rpittau> we still have 10 minutes in case someone has something to discuss about
15:49:56 <JayF> I ask folks who may have an opinion to please have it early, I'll make sure stuff gets posted for review quickly
15:51:15 <TheJulia> I really don't have an opinion formed right now, but asyncio seems fine to me
15:51:15 <JayF> that's all I had for open discussion
15:51:32 <JayF> TheJulia: my sincere hope is it's a giant nothingburger for something the size of IPA
15:52:01 <TheJulia> it really should be, honestly
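A minimal sketch of the eventlet-to-asyncio direction being piloted: the kind of periodic heartbeat loop that an eventlet green thread handles in IPA today, rewritten as an asyncio coroutine. All names here are illustrative; the real agent's heartbeater is more involved:

```python
# Hypothetical asyncio rewrite of a periodic heartbeat loop, the sort
# of thing eventlet green threads do in IPA today. Illustrative only.
import asyncio


async def heartbeat(send, interval, stop):
    """Send heartbeats until asked to stop."""
    beats = 0
    while not stop.is_set():
        send()
        beats += 1
        try:
            # Sleep for 'interval', but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return beats


def run_agent_demo(n_beats=3):
    """Drive the loop briefly; a real agent runs until shutdown."""
    sent = []

    async def main():
        stop = asyncio.Event()

        def send():
            sent.append('beat')
            if len(sent) >= n_beats:
                stop.set()

        return await heartbeat(send, interval=0.01, stop=stop)

    return asyncio.run(main()), sent
```

For something the size of IPA this pattern covers most of what eventlet provides, which is why the nothingburger hope above seems plausible.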
15:52:39 <rpittau> anything else to discuss today?
15:53:05 <TheJulia> we could discuss pysnmp fun and my many broken tests on python 3.11
15:53:16 <TheJulia> but that doesn't seem *that* important :)
15:54:14 <JayF> I want to look at that with an IDE up and not IRC up before talking about it more
15:55:23 <TheJulia> fwiw, I think its presence angers eventlet, but we can end the meeting
15:56:01 <rpittau> thanks everyone!
15:56:01 <rpittau> #endmeeting