15:01:46 <TheJulia> #startmeeting ironic
15:01:46 <opendevmeet> Meeting started Mon Aug 18 15:01:46 2025 UTC and is due to finish in 60 minutes.  The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:46 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:46 <opendevmeet> The meeting name has been set to 'ironic'
15:01:47 <TheJulia> o/
15:01:51 * TheJulia brews coffeeeeeeee
15:01:53 <JayF> #startmeeting ir.. jinx :D
15:01:59 <JayF> o/
15:02:04 <janders> o/
15:02:08 <iurygregory> o/
15:02:10 <TheJulia> #chair JayF
15:02:10 <opendevmeet> Current chairs: JayF TheJulia
15:02:13 <TheJulia> there ;)
15:02:21 <JayF> #info  Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash
15:02:24 <cid> o/
15:02:32 <rpittau> o/
15:02:35 <darkhackernc> o/
15:02:50 <JayF> #info https://releases.openstack.org/flamingo/schedule.html it's R-6. Time to land the things you want in this release!
15:02:57 <TheJulia> ++++++
15:03:14 <JayF> #topic Working group: Standalone Networking
15:03:21 <JayF> any updates from standalone networking NG?
15:03:48 <TheJulia> alegacy_: o/
15:04:08 <TheJulia> I think alegacy_ indicated last week he was hitting some challenges with the retooling of jsonrpc stuffs
15:04:19 <TheJulia> as it relates to the local conductor flow, but that is all I'm aware of
15:04:38 <JayF> Is there a etherpad or something with the historical info so I can fill in gaps on what we're doing?
15:05:45 <TheJulia> https://etherpad.opendev.org/p/ironic-standalone-networking
15:05:54 <JayF> #link https://etherpad.opendev.org/p/ironic-standalone-networking
15:06:01 <JayF> #topic Working group: Eventlet removal
15:06:08 <JayF> have we an eventlet still?
15:06:13 <TheJulia> Things should begin to merge in a few minutes
15:06:21 <JayF> for ironic-proper
15:06:24 <JayF> ipa is already eventletless
15:06:32 <JayF> do we know if the neutron pieces of ngs/nbm are eventlet-free yet?
15:06:43 <JayF> curious if we're actually getting eventlet-zero across the capital-P project
15:06:54 <TheJulia> I need a few reviews later in the chain, but we're on the path to be eventlet free sometime in the next 24 hours
15:06:55 <opendevreview> Merged openstack/ironic master: ci: temporary metal3 integration job disable  https://review.opendev.org/c/openstack/ironic/+/956952
15:07:09 <TheJulia> ngs and nbm area already eventlet free afaik
15:07:23 <JayF> they are in code we own for sure
15:07:25 <TheJulia> they are much more dependent upon their launchers, neutron depending on the code path
15:07:35 <JayF> yeah that's more what I was curious, if their launchers were eventlet-free
15:07:41 <JayF> very possible b/c neutron has been making good progress
15:07:44 <JayF> I might dig at that today
15:08:14 <opendevreview> Merged openstack/ironic master: Replace GreenThreadPoolExecutor in conductor  https://review.opendev.org/c/openstack/ironic/+/952939
15:08:18 <JayF> Good work to everyone on the eventlet removal, special thanks to cid TheJulia dtantsur hberaud and others who has been directly working on that stuff :D
15:08:21 <opendevreview> Merged openstack/ironic master: Set the backend to threading  https://review.opendev.org/c/openstack/ironic/+/953683
15:08:39 <JayF> lol thanks for the informational input, opendevreview :D
15:08:41 <TheJulia> speaking of!
15:09:06 <JayF> #topic Discussion topics: Clean exit from service fail
15:09:11 <JayF> janders this had your name attached
15:09:17 <janders> that's right
15:09:28 <janders> we spent a fair bit of time talking about it last week
15:09:57 <janders> we have two patches up, taking different approaches - fixing existing abort verb for servicing - and a new verb, unservice
15:10:16 <JayF> I think the biggest issue I had with the unservice patch was us being certain no agent was booted if we're forgoing a reboot
15:10:23 <janders> I raised my hand to talk about this because I am leading downstream work which depends on having a clean exit from service failed
15:10:30 <JayF> I don't have any issues with us trying to land the things we need as long as we are sure we're not adding in a "leave agent running" bug\
15:10:46 <TheJulia> JayF: yeah, and I think that will require adding a bit more logic around "did we ever launch", which is fine.
15:11:00 <TheJulia> I think the concern is ability to consume something before the cycle is out
15:11:01 <JayF> the more ham-fisted "reboot everytime" option is likely a safe patch for this release
15:11:09 <janders> my concern is whether we're able to land the "unservice" version in this cycle?
15:11:10 <JayF> and we could add the "feature" to avoid the reboot as a follow-on next cycle
15:11:12 <TheJulia> that path leaves abort, really as a middle ground
15:11:34 <janders> it does feel like the abort pathway is more likely to make it
15:11:43 <TheJulia> and thinking about it, I've reached this conclusion that it might not be awful to be able to abort a rescue in a stable state
15:11:52 <rpittau> the abort path sounds like a good compromise at the moment
15:12:15 <TheJulia> dunno, just thinking a middle ground of "we get what they are asking" makes sense over perfection which we can still strive towards as time progresses
15:12:21 <JayF> I don't have a strong opinion what verb as long as we steer clear of the security-scary no-reboot bit (or take the time to do it very well)
15:12:44 <rpittau> yeah, we can always keep  working on the unservice verb during the next cycle
15:12:44 <TheJulia> JayF: yeah, we need to do the same with abort, add "if we ever tried to boot the agent" logic
15:12:57 <TheJulia> which is fine, its a centralized code path, we just need to wire it up
15:13:34 <janders> to me it feels like if we remove the concern about the verb and related API change, we will be able to better focus on pinning down the security concern
15:13:39 <janders> would you agree?
15:13:42 <JayF> Consensus is: land a patch with a conservative approach; treat reboot-optimization as a secondary feature?
15:14:03 <JayF> janders: I see them as separate. I don't think it'd be an api break for us to hook up unservice in a "it always reboots" kinda way, then in 2026.1 remove that reboot as optimization
15:14:04 <janders> I tend to agree
15:14:14 <TheJulia> I'm thinking the reboot handing is really trivial
15:14:35 <JayF> the tough case is "we booted into an agent that couldn't lookup for some reason"
15:14:37 <TheJulia> the issue is more which direction do we focus on in that adding an unservice verb is a heavier lift because we're needing to touch api+rpc as well
15:14:45 <TheJulia> and we've got a lot we're trhing to land right now
15:14:58 <TheJulia> i.e. the bigger the thing, the harder it will be over the next couple of weeks
15:15:04 <janders> to me this is a vote towards abort
15:15:14 <rpittau> I wold just go for the abort at this point
15:15:21 <iurygregory> I would go with the abort approach at this point also
15:15:21 <janders> if there are no strong objections, maybe let's go this way?
15:15:31 <JayF> sure. For bonus points maybe hook unrescue up similarly for consistency :)
15:15:36 <TheJulia> JayF: I have a solution for that specifically, its not a big deal in my mind, but I do get the concern and both patches need to handle it
15:15:46 <TheJulia> I'm thinking we still do unrescue, tbh
15:15:54 <iurygregory> I like the unservice, but due to the time we have for things landing eventlet removal etc it can be a pain
15:15:57 <TheJulia> but... be ready for that not to land until next cycle
15:16:06 <TheJulia> if it lands this cycle, cool
15:16:12 <TheJulia> but a LOT is going on
15:16:17 <iurygregory> ++
15:16:18 <JayF> yup, land the simple to get the time pressure off
15:16:24 <janders> ++
15:16:28 <TheJulia> cool cool
15:16:38 <TheJulia> okay, then I'll take the action item to update the abort patch later today
15:16:40 <janders> that will be huge help for operators - and will allow me to wire in our downstream feature into it
15:16:44 <janders> which would be great
15:16:50 <JayF> #agreed Start with an abort-only patch for "unservice" so we can close the bug, do "unservice" verb later.
15:16:51 <TheJulia> and foretell the possibility of an unservice verb in the future
15:17:08 <JayF> Can we move on?
15:17:09 <TheJulia> cool cool
15:17:10 <TheJulia> ++
15:17:12 <janders> this is my top priority at the moment, so let me know how I can best help TheJulia
15:17:13 <janders> yes
15:17:15 <janders> thank you!
15:17:25 <TheJulia> janders: I should have a patch up for you by your morning
15:17:31 <JayF> #topic Discussion: Type annotations and checking
15:17:31 <TheJulia> unless something else explodes
15:17:34 <JayF> cid: you had brought this up
15:17:47 <cid> Yeah. Trying to get the feel of the community about adding annotations and type checking in Ironic.
15:17:51 <JayF> I am generally in favor at the face of it, I think this is a thing where the devil will be in the details/implementation
15:18:59 <JayF> if you have strong opinions against or to what implementation it'd be good to mention ...
15:19:13 <JayF> I do think there's some prior art in this area, maybe in SDKs and nova? I think nova did separate .pyi files
15:21:05 <cid> Yeah. It's one of those things, if we think that's something we're open to. The implemention approach could be shaped during reviews.
15:21:05 <JayF> I look forward to the quick
15:21:12 <JayF> ***quick +2s on cid's changes to add typing :P
15:21:19 <cid> ++ :D
15:21:28 <JayF> cid: I'd say go for it; might be nice to pick something smaller like IPA for a test
15:21:33 <TheJulia> ++
15:21:36 <JayF> cid: and that way we stay outta the hair of eventlet removal/apis/etc
15:21:42 <TheJulia> get reviewers familiar and such
15:21:43 <rpittau> yep, let's start from something small :)
15:21:56 <TheJulia> and yeah, I suspect we won't be able to get it done "this cycle"
15:22:40 <kubajj> o/
15:22:41 <JayF> cid: I strongly suggest you look at other openstack projects and follow their lead or have a good reason to be different :D
15:22:53 <JayF> #agreed cid will pilot some type-checking changes in a smaller repo, like IPA
15:22:56 <cid> Totally not this cycle. :!
15:23:01 <JayF> eventually :D
15:23:02 <JayF> back burner
15:23:03 <JayF> etc
15:23:22 <JayF> #topic Bug Deputy
15:23:30 <JayF> cid was bug deputy, notes 3 new bugs none of which are rfes
15:23:34 <JayF> anything interesting in those bugs?
15:23:38 <JayF> who wants to be the next deputy?
15:23:45 <cid> That was pretty much it
15:23:46 <iurygregory> it will be me this week
15:23:51 <cid> \o/
15:24:02 <JayF> #info iurygregory to be bug deputy next week
15:24:08 <JayF> No RFEs for review, skipping that meeting portion.
15:24:13 <JayF> #topic Open Discussion
15:24:14 <iurygregory> wouldn't be "this week"? :D
15:24:19 <iurygregory> lol
15:24:24 <JayF> anything not on the agenda for discussion?
15:24:30 <JayF> iurygregory: I signed you up for two weeks /s
15:24:37 <alegacy_> Hi... sorry I arrived late and didn't give my update earlier for standalone networking
15:24:37 <iurygregory> ack :D
15:24:48 <JayF> if you have something to add alegacy_ now is a great time
15:24:57 <alegacy_> I've resolved the RPC issues.  Continuing to test various scenarios.
15:25:16 <alegacy_> Specifically, I've been prototyping some integration into OpenShift to make sure it would handle the scenarios there
15:25:21 <alegacy_> so far so good.
15:25:49 <alegacy_> I'll be on PTO for the next two weeks, but I'm hoping by this Friday I'll have things in a good place to start to open change requests when i'm back.
15:26:08 <alegacy_> i've identified some issues/things that need to be discussed so i'll update the etherpad
15:26:28 <TheJulia> cool, as an FYI then, we're likely going to end up cutting the release around R-3 for this cycle
15:26:47 <alegacy_> ok, noted
15:27:00 * TheJulia is just guessing R-3 based upon the time left and the historical "oh noes, we need to fix x" last minute details
15:27:15 <JayF> Nice, thanks for the update alegacy_
15:27:22 <JayF> Anyone else have a topic for open discussion?
15:27:47 <TheJulia> did you start a ptg etherpad?
15:28:05 <JayF> I'm pretty sure that fell off a long list of items to do in my head /o\
15:28:06 <TheJulia> Speaking of etherpads, last week you wanted to begin discussing eventlet removal related testing
15:28:07 * JayF doin it now
15:28:42 <kubajj> I wanted to quickly ask while most people are here. For the safeguards, can we force having a volume name for all logical disks in the target raid config? (i.e. we would complain if it is not included if skip_block_devices mentions any volume name)
15:28:43 <TheJulia> https://etherpad.opendev.org/p/ironic-eventlet-removal#56 has many questionmarks :)
15:29:18 <TheJulia> kubajj: force it as a requirement now when it was not previously a requirement?
15:30:14 <kubajj> TheJulia: yes, exactly (but only if the user mentions volume name in the skip list property - i.e. we would make it fail in validation)
15:30:27 <TheJulia> This reminds me of the other person who has... VROCs. We likely need to patch the initial checking logic to go "no, don't need to do anything" with these devices/volumes
15:30:48 <JayF> can we just add some concept of "skip_unnamed_volumes" as an additional filter
15:30:51 <JayF> instead of making it magic?
15:30:53 <TheJulia> kubajj: I guess the issue is going to be framing because the cat is sort of "out of the bag"
15:31:40 <TheJulia> I guess I'm not opposed, but the concern which comes to mind is around upgrades or folks with prior schema definitions which makes me think it should be opt in or sort of like JayF has noted
15:31:46 <kubajj> now we just add a generic volume name to lists (md127, or whatever its index is)
15:31:47 <JayF> https://etherpad.opendev.org/p/ironic-ptg-2026.1 and is added to the whiteboard. I can't spell the name of our next release though so it's "G".
15:32:10 <JayF> whoever decided that complex-to-spell name was a good release name owes me a dictionary or seventeen :D lol
15:32:42 <kubajj> We would like to enforce all RAIDs to have a volume name (if there is a skip list with volume name on it) because of how we are planning to prevent cleaning them
15:33:37 <kubajj> JayF: I am pretty sure that the Spanish community in an unnamed research institute manipulated the vote
15:34:07 <JayF> kubajj and I thought the dalmatian-is-spelled-with-an-a was tough to learn :D
15:34:08 <TheJulia> kubajj: so framing it as "this is a good practice to take on", then maybe with a knob and release note could be acceptable
15:34:20 <janders> a good Gazpacho ain't bad
15:34:28 <TheJulia> oh my
15:35:09 <TheJulia> kubajj: is this so root device hints can be directly mapped by the user of software raid?
15:38:30 <kubajj> TheJulia: this is for the scenario when there are two logical disks on a certain set of holder disks (let's say root with 100 GB and data with the rest). the volume data will be on skip list, but we want to wipe root. As a workaround, we want to keep the root array, but wipe its data, but then when we create configuration, we want to make sure that it is the correct logical disk to existing raid device pair - an alternative
15:38:30 <kubajj> to enforcing volume names would be to check raid level and size, but checking size is difficult as some sizes are 'MAX' and even if they are not, from testing 100 GB is not actually 100 GB
15:38:58 <TheJulia> okay
15:38:59 <JayF> are you trying to avoid a case where a label drops off a data disk and you lose data?
15:39:54 <kubajj> JayF: the scenario that we want is to have a hypervisor which has two logical disks on two SSDs (root is 100 and the rest is data). We can reinstall root, but keep data
15:40:22 <kubajj> I am not 100% sure when we would want to do that, but we want to have the option :D
15:40:35 * TheJulia waits for the rebuild verb to re-appear
15:41:31 <kubajj> I guess it could be called rebuild, yes
15:41:58 <kubajj> I just usually only evacuate the whole node in case something is wrong, not rebuild :D
15:42:36 <JayF> This is what rebuild --preserve-ephemeral is for
15:42:49 <TheJulia> yeah, but that whole preserve-ephemeral logic is different
15:42:51 <JayF> I don't love the "magically save disks from cleaning" pattern in general so I'm probably not the best person to pitch this with tbh.
15:42:54 <TheJulia> different purpose
15:43:04 <JayF> I'm less scared of data loss; more scared of data leakage
15:43:44 <TheJulia> yeah, I think what they are doing makes sense for their use case since they are taking a target pool of nodes and very specifically managing them because they have a ton of data on those other disks that re-copying would be a nightmare since its not really a general BMaaS pool machine
15:43:53 <kubajj> JayF: we already have the removal of the flag in our decommissioning process, just need to actually make the "rebuild" not delete the data
15:44:23 <TheJulia> "hi, my pool of 48 disks is all radio telescope data that we don't want to try and push over a wire every time"
15:44:40 <JayF> TheJulia: your hadoop cluster can't sync over the data fast enough? is that what you said /s
15:44:46 <kubajj> TheJulia: exactly, I think our storage team would like to have them on their quads with 4 JBODs each, just to prevent accidents
15:44:48 <JayF> (same problems ten years later) :D
15:45:32 <TheJulia> yeah, I think this exact same challenge has come up in past discussions regarding the SKA cluster
15:45:46 <kubajj> anyway, I will take this as "yes, you can try to propose such a change" and finish the implementation
15:45:56 <JayF> To be clear; not saying I'd -1 or -2 this, just maybe more reflecting why I don't review those IPA changes, I can't really bring myself to like that approach but don't wanna block it either
15:46:48 <TheJulia> JayF: Yeah, I'm sort of in the same boat but also more inclined to review such changes because I do totally get where they are coming from.
15:48:26 <JayF> better to have software that works for people than a perfect model of cleaning
15:48:42 <TheJulia> unfortunately, yeah
15:49:14 <TheJulia> and on a plus side, the folks that do this sort of thing do know what they are doing in that the managing very specific nodes for those purposes
15:49:41 <JayF> TBH with stuff like this, I'm way more scared of the features' existance being an attractive nuisance to someone who only half-understands it
15:49:55 <JayF> we have a handful of those in ironic already, but I can't pretend like it's easy enough to use someone is likely to not be filtered by that point
15:49:57 <TheJulia> this is why I like very big warnings
15:50:20 * cardoe shakes off the crushing weight of Teams calls.
15:50:44 * JayF sees the beginnings of a "Ironic Ghost Stories" blogpost/talk with a list of all the scary things you could misuse ironic for ;)
15:50:49 * TheJulia hands cardoe a cup of coffee to remedy this weight of Teams calls.
15:50:52 <JayF> I think we're to the end of kubajj's questions though
15:50:59 <JayF> 10 minutes left, anything else for open discussion?
15:51:08 <TheJulia> JayF: concur :)
15:52:01 <JayF> Last call for topics for the meeting?
15:52:33 <TheJulia> I've got nothing
15:52:43 <cardoe> Last call? Was I on Teams calls that long. :(
15:52:51 <cardoe> I've got nothing.
15:53:06 <JayF> Thanks o/
15:53:07 <JayF> #endmeeting