15:00:26 <rpittau> #startmeeting ironic
15:00:26 <opendevmeet> Meeting started Mon Jun 17 15:00:26 2024 UTC and is due to finish in 60 minutes. The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:26 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 <opendevmeet> The meeting name has been set to 'ironic'
15:00:33 <JayF> o/
15:00:35 <TheJulia> o/
15:00:55 <iurygregory> oi/
15:01:00 <rpittau> Hello everyone!
15:01:00 <rpittau> Welcome to our weekly meeting!
15:01:00 <rpittau> The meeting agenda can be found here:
15:01:00 <rpittau> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_June_17.2C_2024
15:02:01 <rpittau> #topic Announcements/Reminders
15:02:18 <rpittau> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio:
15:02:19 <dtantsur> o/
15:02:25 <rpittau> #link https://tinyurl.com/ironic-weekly-prio-dash
15:03:09 <rpittau> #info The Ironic/Bare Metal SIG meetup at CERN happened on June 5, leaving the link to the notes up for this week as well
15:03:18 <rpittau> #link https://etherpad.opendev.org/p/baremetal-sig-cern-june-5-2024
15:03:25 <JayF> I filed several RFEs out of the feedback from that
15:03:35 <JayF> I haven't added them to RFE review in the meeting though, I missed that
15:03:38 <rpittau> JayF: I saw that, thanks
15:03:48 <JayF> we can potentially go over them at the appropriate time if desired
15:03:58 <rpittau> sounds good
15:04:34 <rpittau> #info 2024.2 Dalmatian Release Schedule
15:04:34 <rpittau> #link https://releases.openstack.org/dalmatian/schedule.html
15:05:40 <rpittau> anything else to announce/remind ?
15:06:18 <rpittau> ok, moving on
15:06:21 <rpittau> #topic Review Ironic CI status
15:06:32 <rpittau> #info metal3 integration job is broken again \o/
15:07:15 <rpittau> probably worth discussing why we didn't see CI breaking in metal3-dev-env during the metal3 meeting on Wednesday
15:08:11 <rpittau> anything else CI related ?
15:08:23 <JayF> It sounds like the pySNMP change hurt us, too
15:08:33 <JayF> I'll look at 912328 and see if I can get it over the line with some time today
15:08:40 <TheJulia> https://d2e31a25148547148788-51f9901de94768cc4d0e03f07c031664.ssl.cf1.rackcdn.com/921966/2/gate/ironic-tempest-ramdisk-bios-snmp-pxe/7f972ac/controller/logs/screen-ir-cond.txt <-- confirmed
15:09:05 <TheJulia> somewhere driver loading now breaks
15:09:18 <rpittau> I left a message for Lex Li in the original thread on GitHub https://github.com/lextudio/pysnmp/issues/49
15:09:19 <TheJulia> Haven't gotten far enough to figure it out exactly
15:09:28 <sylvr> dtantsur: I found the issue I had, and yes it was a misconfiguration...
15:09:48 <dtantsur> good to know!
15:10:15 <sylvr> etc/kayobe/kolla/config/bifrost/bifrost.yml contained the wrong driver, I corrected it and now it works
15:10:30 <rpittau> TheJulia: maybe some hints from the failing unit tests?
15:11:17 <TheJulia> with the patch to bump our local state, it's 2000+ failing unit tests for me locally
15:11:39 <JayF> it was only 6 in the gate, weird
15:11:42 <TheJulia> but that has been the path I've been looking at
15:11:44 <sylvr> I added the kolla dir to my .gitignore to avoid pushing secrets to the git repo, but then I didn't track this file... so it was hidden from my config..
15:11:49 <sylvr> thanks!
15:12:14 <TheJulia> JayF: I should check the eventlet version of the gate...
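
(For readers retracing this later: a trivial, purely illustrative way to confirm which pysnmp and eventlet versions an environment actually resolved, since the breakage discussed above hinges on that combination. Nothing here is Ironic code; it uses only the standard library.)

```python
# Illustrative only: print the pysnmp and eventlet versions this
# environment resolved, which is the first thing worth checking when
# comparing local failures against the gate.
import importlib.metadata as metadata

for pkg in ("pysnmp", "eventlet"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```
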
15:12:38 <TheJulia> Anyway, it's all broken, we can move the meeting along
15:12:48 <TheJulia> it's going to take time to figure out what is going on exactly
15:13:46 <JayF> TheJulia: seeing 6 locally here on a clean tox run too, fwiw; I'll take a look at getting those to pass locally post-meeting
15:14:01 <rpittau> TheJulia: I can propose a revert for the uc change until our snmp job passes
15:14:30 <TheJulia> Let's wait and see, I think it is just worse with lextudio's patch, at least on Debian
15:14:56 <rpittau> ok
15:15:00 <TheJulia> btw, was that count with just tox -epy3, or did you try tox -ecover ?
15:15:24 <JayF> TheJulia: tox -epy3 on python3.12
15:15:35 <JayF> ^ -r
15:15:39 <TheJulia> I ran cover with 3.11
15:15:46 <TheJulia> ack
15:15:55 <JayF> cover can be a bit ... misleading when there are certain kinds of failures ime
15:16:12 <JayF> e.g. some failures can cascade
15:16:16 <JayF> but we should probably move the meeting along :D
15:16:20 <rpittau> let's move on!
15:16:23 <rpittau> :)
15:16:25 <rpittau> #Discussion topics
15:16:30 <rpittau> heh
15:16:33 <rpittau> #topic Discussion topics
15:16:47 <rpittau> JayF: you have something there :)
15:17:09 <JayF> So Reverbverbverb has a draft up of his report on Ironic documentation in Google Docs
15:17:24 <JayF> I implore you, please find time in the next week to go over it and make comments
15:17:46 <JayF> we will be converting this into "final version" and filing bugs around the action items as early as this time next week barring feedback to the contrary
15:17:58 <rpittau> #action Review Reverbverbverb's Ironic documentation audit update & results
15:17:58 <rpittau> #link https://docs.google.com/document/d/1e9URuPHKNTx5QXdkCFAzsS0EAAxPJ-xQC4BEpDwESp4/edit
15:18:21 <JayF> ty for that, you got it copied before I did :D
15:18:23 <rpittau> thanks Reverbverbverb for that!
15:18:49 <Reverbverbverb> My pleasure. Please comment in the doc.
15:18:56 <rpittau> will do :)
15:19:42 <rpittau> anything else to add?
15:20:25 <Reverbverbverb> My review of the doc is necessarily big-picture, but please feel free to comment on anything about the doc
15:21:14 <JayF> I think that's all for our action item
15:21:15 <Reverbverbverb> I'll be breaking the analysis down into (I hope) actionable items, so anything you have to say will help
15:21:38 <rpittau> alright, thanks again
15:21:41 <Reverbverbverb> 👍
15:21:43 <dtantsur> Thanks Reverbverbverb!
15:22:00 <rpittau> #topic Bug Deputy Updates
15:22:34 <rpittau> so that was me last week
15:22:34 <rpittau> haven't seen a lot of movement in terms of bugs, but this popped up on Friday
15:22:34 <rpittau> #link https://bugs.launchpad.net/ironic/+bug/2069413
15:23:23 <TheJulia> Looks like they proposed a fix, I'll try to take a look this week
15:23:28 <rpittau> thanks TheJulia
15:23:46 <rpittau> any volunteer for this week's bug deputy ?
15:24:26 <JayF> give it to me
15:24:32 <rpittau> thanks JayF :)
15:24:50 <rpittau> moving on!
15:24:57 <rpittau> #topic RFE Review
15:25:18 <rpittau> JayF: you want to mention the RFEs you opened ?
15:25:35 <JayF> yeah, I have a bunch of recent RFEs that need review; not all are mine but all need attention
15:25:49 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2069085
15:25:53 <JayF> RFE: Add a burnin_gpu step
15:26:09 <JayF> I'll note all these are against *ironic* even though for many of them, IPA commits are all that are needed
15:26:37 <JayF> That's pretty straightforward; as requested by operators at the CERN meetup, we can hook up the stress-ng support for GPUs to a burnin_gpu step
15:26:41 <rpittau> ok
15:26:41 <rpittau> we can always add the project there
15:26:49 <JayF> I don't think there's anything controversial here and it's a good first issue
15:27:04 <rpittau> that looks ok to me
15:27:21 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2069083
15:27:25 <JayF> RFE: Use results of burnin_* methods for inspection
15:27:36 <JayF> I'll be honest, this probably needs more design than has been done in the bug currently
15:27:59 <JayF> but at a high level, operators want us to optionally report metrics from burnin_ methods to inspection so they can compare performance
15:28:41 <TheJulia> ... that seems fairly nebulous
15:29:02 <TheJulia> or at least, at a high level, a bit outside of what we can do with the stack given where we're going
15:29:16 <TheJulia> since surely they mean inspector
15:29:45 <JayF> how is "they mean inspector" different from what I said?
15:30:06 <TheJulia> we're on a path to deprecate inspector
15:30:26 <dtantsur> but we can still report stuff? the functional difference is not that huge
15:30:27 <TheJulia> and inspector really only holds value for introspection runs, so we're talking about storing more data and extending it then
15:30:31 <JayF> Yes, but we have similar functionality in an ironic-forward way?
15:30:49 <JayF> I specifically didn't talk about internal ironic vs external inspector because AIUI it doesn't make much of a difference at that level in IPA
15:30:50 * dtantsur is writing a migration guide by the way
15:30:52 <TheJulia> I guess I'm semi -1 to extending expector
15:30:58 <TheJulia> inspector
15:31:03 * TheJulia needs way more coffee this morning
15:31:20 <rpittau> we can always hold until the migration has been completed
15:31:22 <dtantsur> well, yeah, inspector is frozen at this point
15:31:25 <rpittau> or add the feature to the agent in ironic
15:31:27 <JayF> I don't comprehend how this extends inspector -- if I were implementing this, I'd focus on new ironic-based inspection
15:31:43 <rpittau> yep, exactly
15:31:45 <dtantsur> same
15:31:45 <JayF> we will still have the ability to inspect stuff and report results
15:31:48 <TheJulia> okay then
15:32:08 <JayF> I'm OK with us not approving it because it *is* very vague and might need design
15:32:14 <rpittau> it actually makes sense to have that in the ironic project :)
15:32:18 <JayF> but it's 100% not the intention for this to be an `ironic-inspector` feature
15:32:29 <dtantsur> also, if what gets extended is IPA, it does not matter what the server side is
15:32:35 <rpittau> yeah
15:32:45 <dtantsur> if something gets into the inspection data, it will be stored either way
15:33:07 <TheJulia> so I guess, going back to my under-caffeinated state, I read Jay's summary as they want us to compare the results
15:33:15 <TheJulia> which sent my brain a step further
15:33:26 <JayF> Ah, I'll be 100% clear: we're talking about storing metrics from the run, and nothing else
15:33:37 <JayF> the use case brought up IRL was "sometimes machines just randomly perform out of spec"
15:33:43 <dtantsur> Well, we could write an inspection hook to compare the old data with the new one.. that would depend on which server accepts the data
15:33:48 <JayF> with a story about NVMe drives that *over performed* by a factor of 2s
15:33:50 <JayF> *2x
15:34:20 <JayF> dtantsur: I'd suggest that be a separate, next step
15:34:25 * dtantsur nods
15:34:26 <JayF> dtantsur: part of the goal here is to have some low-hanging-fruit
15:34:58 <rpittau> it clearly needs more info, but I'm ok with that
15:35:02 <JayF> I put a clarifying comment in 2069083, going to move on without marking as approved since it needs more info
15:35:12 <rpittau> yep, thanks
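
(To make the two burn-in RFEs above a bit more concrete, here is a minimal sketch of what a GPU burn-in helper that also keeps its metrics around might look like. It assumes the installed stress-ng provides a gpu stressor; the function name, driver_info keys, and defaults are hypothetical, modeled loosely on the existing agent_burnin_* helpers in ironic-python-agent, and are not any agreed implementation.)

```python
# A minimal, illustrative sketch only -- not the IPA implementation.
# Assumes the installed stress-ng provides a 'gpu' stressor.
from oslo_concurrency import processutils
from oslo_log import log

LOG = log.getLogger(__name__)


def burnin_gpu(node):
    """Run a GPU burn-in and return the stress-ng metrics output.

    The driver_info keys below are hypothetical, following the naming
    style of IPA's other agent_burnin_* options.
    """
    info = node.get('driver_info', {})
    workers = info.get('agent_burnin_gpu_workers', 1)
    timeout = info.get('agent_burnin_gpu_timeout', 86400)

    args = ('stress-ng', '--gpu', str(workers),
            '--timeout', str(timeout), '--metrics-brief')
    LOG.debug('Burn-in GPU command: %s', args)
    try:
        out, err = processutils.execute(*args)
    except (processutils.ProcessExecutionError, OSError) as e:
        raise RuntimeError('GPU burn-in failed: %s' % e)

    # Capture both streams rather than guessing where stress-ng puts its
    # --metrics-brief summary; keeping this around is one way the
    # "use burn-in results for inspection" RFE could surface data later.
    metrics = (out or '') + (err or '')
    LOG.info('Burn-in GPU metrics: %s', metrics)
    return metrics
```
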
15:35:24 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2068530
15:35:29 <JayF> [RFE] Allow an operator to block all future bios deployments
15:35:49 <TheJulia> That was an idea which came up in discussion which I kind of liked
15:35:51 <JayF> The idea here is to have a conductor-level setting which prevents new nodes from being created / nodes from being updated to use BIOS boot mode
15:36:06 <JayF> In order for places to enforce UEFI-only boot mode
15:36:34 <JayF> Hmm that's not exactly how it is written
15:36:46 <JayF> how it's written is to block *deployments* if in bios boot mode, that's not exactly the same thing
15:36:47 <rpittau> this kind of means that we're deprecating legacy BIOS deployments
15:37:03 <JayF> rpittau: it means we're allowing some operators to flip a switch to disable them *in their environment* if they have a separate requirement
15:37:17 <JayF> rpittau: it's just convenient we can flip it to default-enabled when we're ready to deprecate
15:37:29 <rpittau> yes, I remember the discussion now, and I'm all for it
15:37:45 <TheJulia> I noted it as deployment in large part because you can have machines in different states
15:38:04 <TheJulia> and the logical place to prevent that sort of thing is in the deployment pathway code because you want to be able to unprovision machines
15:38:15 <rpittau> of course, makes sense
15:38:16 <TheJulia> sort of like the retire logic with cleaning
15:38:29 <JayF> well, that doesn't exactly match what the operator in the room was saying
15:38:43 <JayF> they were saying they had an issue where a device would be misconfigured at enrollment with a bios boot mode
15:38:58 <JayF> setting back their efforts to "drain" bios booting out of the environment
15:39:15 <JayF> I think there's potential value in checking at the API level (you can't explicitly set to bios) as well as at deployment, but IMBW
15:39:29 <JayF> at least that's how I understood it
15:40:07 <TheJulia> So, we're semi-seeing such issues with some hardware, but it doesn't really show until cleaning/deployment
15:40:26 <TheJulia> specifically hardware which *has* to be requested to be in bios mode, but we shouldn't model the universe around the exception
15:40:34 <JayF> I guess I'm wondering if we'd ever have operators who'd say "I don't even want a ramdisk to boot via bios"
15:40:57 <TheJulia> quite possible, but they are going to have to somehow drain out their current state
15:41:00 <JayF> that seems to be the meaningful difference between our two perspectives, yeah?
15:41:13 <TheJulia> somewhat, really it is likely two knobs
15:41:18 <JayF> ack
15:41:18 <TheJulia> and two distinct checks at different points
15:41:25 <JayF> I'll update the RFE with that comment
15:41:30 <JayF> I think it's clear we are onboard to do something like this?
15:41:40 <TheJulia> one early on in the entry flow to move a node out of enrollment, and likely later on with deployments
15:41:56 <TheJulia> so if you switch both, you'll just end up with nodes which cannot be deployed anymore and you can't add any more
15:42:05 <TheJulia> well, until you fix them or replace them :)
15:42:34 <JayF> updated with that comment, marking approved
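
(Purely as an illustration of the "two knobs" idea discussed above, and not an agreed design: a sketch of what such conductor-level options could look like with oslo.config. The option names, group, defaults, and help text are all hypothetical.)

```python
from oslo_config import cfg

# Hypothetical knobs only -- names, group and defaults are illustrative,
# not part of any approved Ironic design.
opts = [
    cfg.BoolOpt('permit_bios_enrollment', default=True,
                help='If False, refuse to move a node out of the enroll '
                     'state while it is configured for legacy BIOS boot.'),
    cfg.BoolOpt('permit_bios_deployment', default=True,
                help='If False, refuse new deployments of nodes whose '
                     'boot mode resolves to legacy BIOS; already-deployed '
                     'nodes can still be unprovisioned.'),
]

CONF = cfg.CONF
CONF.register_opts(opts, group='conductor')
```
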
15:42:58 <JayF> TheJulia: this is one of yours
15:43:00 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2067073
15:43:04 <TheJulia> by out of enrollment, I mean upon managing one node
15:43:06 <JayF> [RFE] HTTP ISO Boot via Network (UEFI) HTTP Boot
15:44:20 <JayF> This seems like a pretty straightforward extension of the new http-* based boot interfaces
15:44:47 <TheJulia> So this one is unrelated to the meetup, but the idea is "what if folks didn't want iPXE or a network-style PXE loader anymore" -- is there any way to direct-boot an ISO via the network? Today that doesn't exist: as the httpboot interfaces are delineated, the pure network+DHCP boot path leverages a network bootloader, or the entire URL is provided in advance to the BMC
15:45:13 <TheJulia> yeah, pretty much just a slightly different option so you don't need a network bootloader at all
15:45:21 <rpittau> that is very interesting
15:45:45 <TheJulia> It will only really be useful with a managed DHCP server though
15:47:49 <JayF> I think silence is acceptance?
15:48:06 <TheJulia> I suspect so
15:48:14 <rpittau> good for me
15:48:16 <JayF> that's the end of the list
15:48:19 <rpittau> thanks!
15:48:38 <JayF> how refreshing! Features people asked for, in real life
15:48:46 <JayF> not filtered through someone else's product team :P
15:48:58 <rpittau> heh :)
15:49:11 <rpittau> #topic Open Discussion
15:49:17 <TheJulia> eh... that last one is sort of me saving my sanity from a product team :)
15:49:33 <JayF> As mentioned Friday, we're probably going to start piloting asyncio-based solutions for removing eventlet from IPA
15:49:36 <rpittau> we still have 10 minutes in case someone has something to discuss
15:49:56 <JayF> I ask folks who may have an opinion to please have it early, I'll make sure stuff gets posted for review quickly
15:51:15 <TheJulia> I really don't have an opinion formed right now, but asyncio seems fine to me
15:51:15 <JayF> that's all I had for open discussion
15:51:32 <JayF> TheJulia: my sincere hope is it's a giant nothingburger for something the size of IPA
15:52:01 <TheJulia> it really should be, honestly
15:52:39 <rpittau> anything else to discuss today?
15:53:05 <TheJulia> we could discuss pysnmp fun and my many broken tests on python 3.11
15:53:16 <TheJulia> but that doesn't seem *that* important :)
15:54:14 <JayF> I want to look at that with an IDE up and not IRC up before talking about it more
15:55:23 <TheJulia> fwiw, I think its presence angers eventlet, but we can end the meeting
15:56:01 <rpittau> thanks everyone!
15:56:01 <rpittau> #endmeeting