15:00:26 #startmeeting ironic
15:00:26 Meeting started Mon Jun 17 15:00:26 2024 UTC and is due to finish in 60 minutes. The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 The meeting name has been set to 'ironic'
15:00:33 o/
15:00:35 o/
15:00:55 oi/
15:01:00 Hello everyone!
15:01:00 Welcome to our weekly meeting!
15:01:00 The meeting agenda can be found here:
15:01:00 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_June_17.2C_2024
15:02:01 #topic Announcements/Reminders
15:02:18 #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio:
15:02:19 o/
15:02:25 #link https://tinyurl.com/ironic-weekly-prio-dash
15:03:09 #info The Ironic/Bare Metal SIG meetup at CERN happened on June 5; leaving the link to the notes up for this week
15:03:18 #link https://etherpad.opendev.org/p/baremetal-sig-cern-june-5-2024
15:03:25 I filed several RFEs out of the feedback from that
15:03:35 I haven't added them to RFE review in the meeting though, I missed that
15:03:38 JayF: I saw that, thanks
15:03:48 we can potentially go over them at the appropriate time if desired
15:03:58 sounds good
15:04:34 #info 2024.2 Dalmatian Release Schedule
15:04:34 #link https://releases.openstack.org/dalmatian/schedule.html
15:05:40 anything else to announce/remind?
15:06:18 ok, moving on
15:06:21 #topic Review Ironic CI status
15:06:32 #info metal3 integration job is broken again \o/
15:07:15 probably worth discussing why we didn't see CI breaking in metal3-dev-env during the metal3 meeting on Wednesday
15:08:11 anything else CI related?
15:08:23 It sounds like the pySNMP change hurt us, too
15:08:33 I'll look at 912328 and see if I can get it over the line with some time today
15:08:40 https://d2e31a25148547148788-51f9901de94768cc4d0e03f07c031664.ssl.cf1.rackcdn.com/921966/2/gate/ironic-tempest-ramdisk-bios-snmp-pxe/7f972ac/controller/logs/screen-ir-cond.txt <-- confirmed
15:09:05 somewhere driver loading now breaks
15:09:18 I left a message for Lex Li in the original thread on github https://github.com/lextudio/pysnmp/issues/49
15:09:19 Haven't gotten far enough to figure it out exactly
15:09:28 dtantsur: I found the issue I had, and yes it was a misconfiguration...
15:09:48 good to know!
15:10:15 etc/kayobe/kolla/config/bifrost/bifrost.yml contained the wrong driver, I corrected it and now it works
15:10:30 TheJulia: maybe some hints from the failing unit tests?
15:11:17 with the patch to bump our local state, I get 2000+ failing unit tests locally
15:11:39 it was only 6 in the gate, weird
15:11:42 but that has been the path I've been looking at
15:11:44 I added the kolla dir to my .gitignore to avoid pushing some secrets to the git repo, but then I didn't track this file... so it was hidden from my config..
15:11:49 thanks!
15:12:14 JayF: I should check the eventlet version of the gate...
15:12:38 Anyway, it's all broken, we can move the meeting along
15:12:48 it's going to take time to figure out what is going on exactly
15:13:46 TheJulia: seeing 6 locally here on a clean tox run too, fwiw; I'll take a look at getting those to pass locally post-meeting
15:14:01 TheJulia: I can propose a revert for the uc change until our snmp job passes
15:14:30 Let's wait and see, I think it is just worse with lextudio's patch, at least on debian
15:14:56 ok
15:15:00 btw, was that count with just tox -epy3, or did you try tox -ecover?
15:15:24 TheJulia: tox -epy3 on python3.12
15:15:35 ^ -r
15:15:39 I ran cover with 3.11
15:15:46 ack
15:15:55 cover can be a bit ... misleading when there are certain kinds of failures ime
15:16:12 e.g. some failures can cascade
15:16:16 but we should probably move the meeting on :D
15:16:20 let's move on!
15:16:23 :)
15:16:25 #Discussion topics
15:16:30 heh
15:16:33 #topic Discussion topics
15:16:47 JayF: you have something there :)
15:17:09 So Reverbverbverb has a draft up of his report on Ironic documentation in google docs
15:17:24 I implore you, please find time in the next week to go over it and make comments
15:17:46 we will be converting this into "final version" and filing bugs around the action items as early as this time next week barring feedback to the contrary
15:17:58 #action review reverbverbverb Ironic documentation audit update & results
15:17:58 #link https://docs.google.com/document/d/1e9URuPHKNTx5QXdkCFAzsS0EAAxPJ-xQC4BEpDwESp4/edit
15:18:21 ty for that, you got it copied before I did :D
15:18:23 thanks Reverbverbverb for that!
15:18:49 My pleasure. Please comment in the doc.
15:18:56 will do :)
15:19:42 anything else to add?
15:20:25 My review of the doc is necessarily big-picture, but please feel free to comment on anything about the doc
15:21:14 I think that's all for our action item
15:21:15 I'll be breaking the analysis down to (I hope) actionable items, so anything you have to say will help
15:21:38 alright, thanks again
15:21:41 👍
15:21:43 Thanks Reverbverbverb!
15:22:00 #topic Bug Deputy Updates
15:22:34 so that was me last week
15:22:34 haven't seen a lot of movement in terms of bugs but this popped up on Friday
15:22:34 #link https://bugs.launchpad.net/ironic/+bug/2069413
15:23:23 Looks like they proposed a fix, I'll try to take a look this week
15:23:28 thanks TheJulia
15:23:46 any volunteer for this week's bug deputy?
15:24:26 give it to me
15:24:32 thanks JayF :)
15:24:50 moving on!
15:24:57 #topic RFE Review
15:25:18 JayF: you want to mention the RFEs you opened?
15:25:35 yeah I have a bunch of recent rfes that need review, not all are mine but all need attention
15:25:49 #link https://bugs.launchpad.net/ironic/+bug/2069085
15:25:53 RFE: Add a burnin_gpu step
15:26:09 I'll note all these are against *ironic* even though for many of them, IPA commits are all that are needed
15:26:37 That's pretty straightforward; as requested by operators at the cern meetup, we can hook up the stress-ng support for GPUs to a burnin_gpu step
15:26:41 ok
15:26:41 we can always add the project there
15:26:49 I don't think there's anything controversial here and it's a good first issue
15:27:04 that looks ok to me
15:27:21 #link https://bugs.launchpad.net/ironic/+bug/2069083
15:27:25 RFE: Use results of burnin_* methods for inspection
15:27:36 I'll be honest, this probably needs more design than has been done in the bug currently
15:27:59 but at a high level; operators want us to optionally report metrics from burnin_ methods to inspection so they can compare performance
15:28:41 ... that seems fairly nebulous
15:29:02 or at least, at a high level a bit outside of what we can do with the stack with where we're going
15:29:16 since surely they mean inspector
15:29:45 how is "they mean inspector" different than what I said?
15:30:06 we're on a path to deprecate inspector
15:30:26 but we can still report stuff? the functional difference is not that huge
15:30:27 and inspector really only holds value introspection runs, so we're talking about storing more data and extending then
15:30:31 Yes, but we have similar functionality in an ironic-forward way?
15:30:49 I specifically didn't talk about internal ironic vs external inspector because AIUI it doesn't make much of a difference at that level in ipa
15:30:50 * dtantsur is writing a migration guide by the way
15:30:52 I guess I'm semi -1 to extending expector
15:30:58 inspector
15:31:03 * TheJulia needs way more coffee this morning
15:31:20 we can always hold until the migration has been completed
15:31:22 well, yeah, inspector is frozen at this point
15:31:25 or add the feature to the agent in ironic
15:31:27 I don't comprehend how this extends inspector -- if I were implementing this, I'd focus on new ironic-based inspection
15:31:43 yep, exactly
15:31:45 same
15:31:45 we will still have the ability to inspect stuff and report results
15:31:48 okay then
15:32:08 I'm OK with us not approving it because it *is* very vague and might need design
15:32:14 it actually makes sense to have that in the ironic project :)
15:32:18 but it's 100% not the intention for this to be an `ironic-inspector` feature
15:32:29 also, if what gets extended is IPA, it does not matter what the server side is
15:32:35 yeah
15:32:45 if something gets into the inspection data, it will be stored either way
15:33:07 so I guess going back to my lack of caffeinated state, I read Jay's summary as they want us to compare the results
15:33:15 which sent my brain a step further
15:33:26 Ah, I'll be 100% clear we're talking about storing metrics from the run, and nothing else
15:33:37 the use case brought up IRL was "sometimes machines just randomly perform out of spec"
15:33:43 Well, we could write an inspection hook to compare the old data with the new one.. that would depend on which server accepts the data
15:33:48 with a story about NVMe drives that *over performed* by a factor of 2s
15:33:50 *2x
15:34:20 dtantsur: I'd suggest that be a separate, next step
15:34:25 * dtantsur nods
15:34:26 dtantsur: part of the goal here is to have some low-hanging-fruit
15:34:58 it clearly needs more info, but I'm ok with that
15:35:02 I put a clarifying comment in 2069083, going to move on without marking as approved since it needs more info
15:35:12 yep, thanks
15:35:24 #link https://bugs.launchpad.net/ironic/+bug/2068530
15:35:29 [RFE] Allow an operator to block all future bios deployments
15:35:49 That was an idea which came up in discussion which I kind of liked
15:35:51 The idea here is to have a conductor-level setting which prevents new nodes from being created / nodes from being updated to use BIOS boot mode
15:36:06 In order for places to enforce UEFI-only boot mode
15:36:34 Hmm that's not exactly how it is written
15:36:46 how it's written is to block *deployments* if in bios boot mode, that's not exactly the same thing
15:36:47 this kind of means that we're deprecating legacy BIOS deployments
15:37:03 rpittau: it means we're allowing some operators to flip a switch to disable them *in their environment* if they have a separate requirement
15:37:17 rpittau: it's just convenient we can flip it to default-enabled when we're ready to deprecate
15:37:29 yes, I remember the discussion now, and I'm all for it
15:37:45 I noted it as deployment in large part because you can have machines in different states
15:38:04 and the logical place to prevent that sort of thing is in the deployment pathway code because you want to be able to unprovision machines
15:38:15 of course, makes sense
15:38:16 sort of like the retire logic with cleaning
15:38:29 well, that doesn't exactly match what the operator in the room was saying
15:38:43 they were saying they had an issue where a device would be misconfigured at enrollment with a bios boot mode
15:38:58 setting back their efforts to "drain" bios booting out of the environment
15:39:15 I think there's potential value in checking at the API level (you can't explicitly set to bios) as well as at deployment, but IMBW
15:39:29 at least that's how I understood it
15:40:07 So, we're semi-seeing such issues with some hardware, but it doesn't really show until cleaning/deployment
15:40:26 specifically hardware which *has* to be requested to be in bios mode, but we shouldn't model the universe around the exception
15:40:34 I guess I'm wondering if we'd ever have operators who'd say "I don't even want a ramdisk to boot via bios"
15:40:57 quite possible, but they are going to have to somehow drain out their current state
15:41:00 that seems to be the meaningful difference between our two perspectives, yeah?
15:41:13 somewhat, really it is likely two knobs
15:41:18 ack
15:41:18 and two distinct checks at different points
15:41:25 I'll update the RFE with that comment
15:41:30 I think it's clear we are onboard to do something like this?
15:41:40 one early on in the entry flow to move a node out of enrollment, and likely later on with deployments
15:41:56 so if you switch both, you'll just end up with nodes which cannot be deployed anymore and can't add anymore
15:42:05 well, until you fix them or replace them :)
15:42:34 updated with that comment, marking approved
15:42:58 TheJulia: this is one of yours
15:43:00 #link https://bugs.launchpad.net/ironic/+bug/2067073
15:43:04 by out of enrollment, I mean upon managing one node
15:43:06 [RFE] HTTP ISO Boot via Network (UEFI) HTTP Boot
15:44:20 This seems like a pretty straightforward extension of the new http-* based boot interfaces
15:44:47 So this one is unrelated to the meetup, but the idea is "what if folks didn't want ipxe or a network style PXE loader anymore", and is there any way to direct boot an ISO via the network. Today there is not a delineation as the httpboot interfaces are delineated and the pure network+dhcp boot path leverages a network bootloader, or provide the entire url in advance to the BMC
15:45:13 yeah, pretty much just a slightly different option so you don't need a network bootloader at all
15:45:21 that is very interesting
15:45:45 It will only really be useful with a managed dhcp server though
15:47:49 I think silence is acceptance?
15:48:06 I suspect so
15:48:14 good for me
15:48:16 that's the end of the list
15:48:19 thanks!
15:48:38 how refreshing! Features people asked for, in real life
15:48:46 not filtered through someone else's product team :P
15:48:58 heh :)
15:49:11 #topic Open Discussion
15:49:17 eh... that last one is sort of me saving my sanity from a product team :)
15:49:33 As mentioned Friday, we're probably going to start piloting asyncio-based solutions for removing eventlet from IPA
15:49:36 we still have 10 minutes in case someone has something to discuss
15:49:56 I ask folks who may have an opinion to please have it early, I'll make sure stuff gets posted for review quickly
15:51:15 I really don't have an opinion formed right now, but asyncio seems fine to me
15:51:15 that's all I had for open discussion
15:51:32 TheJulia: my sincere hope is it's a giant nothingburger for something the size of IPA
15:52:01 it really should be, honestly
15:52:39 anything else to discuss today?
15:53:05 we could discuss pysnmp fun and my many broken tests on python 3.11
15:53:16 but that doesn't seem *that* important :)
15:54:14 I want to look at that with an IDE up and not IRC up before talking about it more
15:55:23 fwiw, I think its presence angers eventlet, but we can end the meeting
15:56:01 thanks everyone!
15:56:01 #endmeeting
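For readers following up on RFE 2068530, the "two knobs, two distinct checks" idea from the RFE Review could be sketched roughly as below. This is only an illustration under stated assumptions: the option names `block_bios_on_manage` and `block_bios_on_deploy`, the `Node` and `BiosPolicy` classes, and the `check_*` helpers are all hypothetical and are not Ironic code or settled configuration options. The sketch keeps unprovisioning unblocked, since the discussion stressed that operators must still be able to drain existing BIOS nodes.

```python
from dataclasses import dataclass


class BiosBootBlocked(Exception):
    """Raised when a (hypothetical) policy check rejects a BIOS-mode node."""


@dataclass(frozen=True)
class BiosPolicy:
    # Hypothetical knob 1: checked when a node is moved out of enrollment.
    block_bios_on_manage: bool = False
    # Hypothetical knob 2: checked in the deployment path.
    block_bios_on_deploy: bool = False


@dataclass
class Node:
    name: str
    boot_mode: str  # "bios" or "uefi"


def check_manage(node: Node, policy: BiosPolicy) -> None:
    """Early check: refuse to manage a node that is set to BIOS boot mode."""
    if policy.block_bios_on_manage and node.boot_mode == "bios":
        raise BiosBootBlocked(f"{node.name}: BIOS boot mode blocked on manage")


def check_deploy(node: Node, policy: BiosPolicy) -> None:
    """Later check: refuse new deployments on a BIOS-mode node."""
    if policy.block_bios_on_deploy and node.boot_mode == "bios":
        raise BiosBootBlocked(f"{node.name}: BIOS boot mode blocked on deploy")


def check_unprovision(node: Node, policy: BiosPolicy) -> None:
    """Never blocked, so existing BIOS nodes can still be drained."""
    return None


if __name__ == "__main__":
    strict = BiosPolicy(block_bios_on_manage=True, block_bios_on_deploy=True)
    legacy = Node(name="legacy-01", boot_mode="bios")
    check_unprovision(legacy, strict)  # allowed: draining still works
    try:
        check_deploy(legacy, strict)
    except BiosBootBlocked as exc:
        print(f"rejected: {exc}")
```

Keeping the two checks as separate functions mirrors the point made in the meeting that the enrollment-time check and the deployment-time check happen at different points and may be enabled independently.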