15:01:46 #startmeeting ironic
15:01:46 Meeting started Mon Aug 18 15:01:46 2025 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:46 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:46 The meeting name has been set to 'ironic'
15:01:47 o/
15:01:51 * TheJulia brews coffeeeeeeee
15:01:53 #startmeeting ir.. jinx :D
15:01:59 o/
15:02:04 o/
15:02:08 o/
15:02:10 #chair JayF
15:02:10 Current chairs: JayF TheJulia
15:02:13 there ;)
15:02:21 #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash
15:02:24 o/
15:02:32 o/
15:02:35 o/
15:02:50 #info https://releases.openstack.org/flamingo/schedule.html it's R-6. Time to land the things you want in this release!
15:02:57 ++++++
15:03:14 #topic Working group: Standalone Networking
15:03:21 any updates from the standalone networking WG?
15:03:48 alegacy_: o/
15:04:08 I think alegacy_ indicated last week he was hitting some challenges with the retooling of jsonrpc stuffs
15:04:19 as it relates to the local conductor flow, but that is all I'm aware of
15:04:38 Is there an etherpad or something with the historical info so I can fill in gaps on what we're doing?
15:05:45 https://etherpad.opendev.org/p/ironic-standalone-networking
15:05:54 #link https://etherpad.opendev.org/p/ironic-standalone-networking
15:06:01 #topic Working group: Eventlet removal
15:06:08 have we an eventlet still?
15:06:13 Things should begin to merge in a few minutes
15:06:21 for ironic-proper
15:06:24 ipa is already eventletless
15:06:32 do we know if the neutron pieces of ngs/nbm are eventlet-free yet?
15:06:43 curious if we're actually getting eventlet-zero across the capital-P project
15:06:54 I need a few reviews later in the chain, but we're on the path to be eventlet free sometime in the next 24 hours
15:06:55 Merged openstack/ironic master: ci: temporary metal3 integration job disable https://review.opendev.org/c/openstack/ironic/+/956952
15:07:09 ngs and nbm are already eventlet free afaik
15:07:23 they are in code we own for sure
15:07:25 they are much more dependent upon their launchers, neutron depending on the code path
15:07:35 yeah that's more what I was curious about, if their launchers were eventlet-free
15:07:41 very possible b/c neutron has been making good progress
15:07:44 I might dig at that today
15:08:14 Merged openstack/ironic master: Replace GreenThreadPoolExecutor in conductor https://review.opendev.org/c/openstack/ironic/+/952939
15:08:18 Good work to everyone on the eventlet removal, special thanks to cid TheJulia dtantsur hberaud and others who have been directly working on that stuff :D
15:08:21 Merged openstack/ironic master: Set the backend to threading https://review.opendev.org/c/openstack/ironic/+/953683
15:08:39 lol thanks for the informational input, opendevreview :D
15:08:41 speaking of!
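[Editor's note: a minimal sketch of the general shape of an eventlet-removal change like the "Replace GreenThreadPoolExecutor in conductor" patch merged above. Ironic's real code uses the futurist library's executors; this sketch uses the stdlib concurrent.futures equivalent so it is self-contained, and the task function is hypothetical.]

```python
# Illustrative only: swap a green-thread pool for a native-thread pool with
# the same submit/result interface. Before: futurist.GreenThreadPoolExecutor
# (eventlet green threads). After: real OS threads, as in the stdlib pool here.
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_node(node_id):
    # Stand-in for a per-node conductor task (name is hypothetical).
    return (node_id, "ok")

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(check_node, n) for n in range(8)]
    results = dict(f.result() for f in as_completed(futures))

print(sorted(results))  # all 8 node ids completed, each on a real OS thread
```

Because futurist's executors deliberately mirror the concurrent.futures API, much of the eventlet removal reduces to choosing a different executor class and auditing for eventlet-specific assumptions (monkey-patching, implicit yields) in the code that runs inside the pool.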
15:09:06 #topic Discussion topics: Clean exit from service fail
15:09:11 janders this had your name attached
15:09:17 that's right
15:09:28 we spent a fair bit of time talking about it last week
15:09:57 we have two patches up, taking different approaches - fixing the existing abort verb for servicing - and a new verb, unservice
15:10:16 I think the biggest issue I had with the unservice patch was us being certain no agent was booted if we're forgoing a reboot
15:10:23 I raised my hand to talk about this because I am leading downstream work which depends on having a clean exit from service failed
15:10:30 I don't have any issues with us trying to land the things we need as long as we are sure we're not adding in a "leave agent running" bug
15:10:46 JayF: yeah, and I think that will require adding a bit more logic around "did we ever launch", which is fine.
15:11:00 I think the concern is the ability to consume something before the cycle is out
15:11:01 the more ham-fisted "reboot every time" option is likely a safe patch for this release
15:11:09 my concern is whether we're able to land the "unservice" version in this cycle?
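[Editor's note: a hypothetical sketch of the "did we ever launch the agent" guard discussed above; none of these function or argument names exist in Ironic. The conservative rule: skipping the reboot is only safe when we are certain no agent ramdisk was ever booted.]

```python
# Hypothetical illustration of the safety logic, not real Ironic code.

def exit_service_failed(agent_was_launched: bool) -> str:
    """Decide how to leave SERVICE FAILED without leaving an agent running."""
    if agent_was_launched:
        # Conservative path: the agent may be up (even one that booted but
        # failed lookup), so reboot to guarantee no agent is left running.
        return "reboot"
    # Only skip the reboot when we never tried to boot the agent at all.
    return "no-reboot"

assert exit_service_failed(agent_was_launched=True) == "reboot"
assert exit_service_failed(agent_was_launched=False) == "no-reboot"
```

The "booted but couldn't lookup" case mentioned below is why the guard has to track launch attempts rather than successful agent heartbeats.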
15:11:10 and we could add the "feature" to avoid the reboot as a follow-on next cycle
15:11:12 that path leaves abort, really, as a middle ground
15:11:34 it does feel like the abort pathway is more likely to make it
15:11:43 and thinking about it, I've reached this conclusion that it might not be awful to be able to abort a rescue in a stable state
15:11:52 the abort path sounds like a good compromise at the moment
15:12:15 dunno, just thinking a middle ground of "we get what they are asking" makes sense over perfection, which we can still strive towards as time progresses
15:12:21 I don't have a strong opinion on which verb as long as we steer clear of the security-scary no-reboot bit (or take the time to do it very well)
15:12:44 yeah, we can always keep working on the unservice verb during the next cycle
15:12:44 JayF: yeah, we need to do the same with abort, add "if we ever tried to boot the agent" logic
15:12:57 which is fine, it's a centralized code path, we just need to wire it up
15:13:34 to me it feels like if we remove the concern about the verb and related API change, we will be able to better focus on pinning down the security concern
15:13:39 would you agree?
15:13:42 Consensus is: land a patch with a conservative approach; treat reboot-optimization as a secondary feature?
15:14:03 janders: I see them as separate. I don't think it'd be an API break for us to hook up unservice in a "it always reboots" kinda way, then in 2026.1 remove that reboot as an optimization
15:14:04 I tend to agree
15:14:14 I'm thinking the reboot handling is really trivial
15:14:35 the tough case is "we booted into an agent that couldn't lookup for some reason"
15:14:37 the issue is more which direction we focus on, in that adding an unservice verb is a heavier lift because we're needing to touch api+rpc as well
15:14:45 and we've got a lot we're trying to land right now
15:14:58 i.e. the bigger the thing, the harder it will be over the next couple of weeks
15:15:04 to me this is a vote towards abort
15:15:14 I would just go for the abort at this point
15:15:21 I would go with the abort approach at this point also
15:15:21 if there are no strong objections, maybe let's go this way?
15:15:31 sure. For bonus points maybe hook unrescue up similarly for consistency :)
15:15:36 JayF: I have a solution for that specifically, it's not a big deal in my mind, but I do get the concern and both patches need to handle it
15:15:46 I'm thinking we still do unrescue, tbh
15:15:54 I like the unservice, but due to the time we have for things landing, eventlet removal etc, it can be a pain
15:15:57 but... be ready for that not to land until next cycle
15:16:06 if it lands this cycle, cool
15:16:12 but a LOT is going on
15:16:17 ++
15:16:18 yup, land the simple to get the time pressure off
15:16:24 ++
15:16:28 cool cool
15:16:38 okay, then I'll take the action item to update the abort patch later today
15:16:40 that will be a huge help for operators - and will allow me to wire our downstream feature into it
15:16:44 which would be great
15:16:50 #agreed Start with an abort-only patch for "unservice" so we can close the bug, do the "unservice" verb later.
15:16:51 and foretell the possibility of an unservice verb in the future
15:17:08 Can we move on?
15:17:09 cool cool
15:17:10 ++
15:17:12 this is my top priority at the moment, so let me know how I can best help TheJulia
15:17:13 yes
15:17:15 thank you!
15:17:25 janders: I should have a patch up for you by your morning
15:17:31 #topic Discussion: Type annotations and checking
15:17:31 unless something else explodes
15:17:34 cid: you had brought this up
15:17:47 Yeah. Trying to get a feel of the community about adding annotations and type checking in Ironic.
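[Editor's note: a hypothetical example of the kind of small, incremental typing change being proposed; the function and its semantics are made up, not real Ironic/IPA code. The idea is to annotate one helper at a time and check it with a tool such as mypy.]

```python
# Illustrative only: annotate a small helper, then verify with `mypy module.py`.
from typing import Optional

def parse_capability(raw: str) -> Optional[tuple[str, str]]:
    """Split a 'key:value' capability string, returning None if malformed."""
    key, sep, value = raw.partition(":")
    if not sep or not key:
        return None
    return (key, value)

print(parse_capability("boot_mode:uefi"))  # ('boot_mode', 'uefi')
print(parse_capability("malformed"))       # None
```

An alternative, used by some projects (nova is mentioned below), is to ship separate `.pyi` stub files so type information can be added without touching the runtime modules at all.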
15:17:51 I am generally in favor on the face of it; I think this is a thing where the devil will be in the details/implementation
15:18:59 if you have strong opinions against, or on what implementation, it'd be good to mention ...
15:19:13 I do think there's some prior art in this area, maybe in SDKs and nova? I think nova did separate .pyi files
15:21:05 Yeah. It's one of those things, if we think that's something we're open to. The implementation approach could be shaped during reviews.
15:21:05 I look forward to the quick
15:21:12 ***quick +2s on cid's changes to add typing :P
15:21:19 ++ :D
15:21:28 cid: I'd say go for it; might be nice to pick something smaller like IPA for a test
15:21:33 ++
15:21:36 cid: and that way we stay outta the hair of eventlet removal/apis/etc
15:21:42 get reviewers familiar and such
15:21:43 yep, let's start from something small :)
15:21:56 and yeah, I suspect we won't be able to get it done "this cycle"
15:22:40 o/
15:22:41 cid: I strongly suggest you look at other openstack projects and follow their lead or have a good reason to be different :D
15:22:53 #agreed cid will pilot some type-checking changes in a smaller repo, like IPA
15:22:56 Totally not this cycle. :!
15:23:01 eventually :D
15:23:02 back burner
15:23:03 etc
15:23:22 #topic Bug Deputy
15:23:30 cid was bug deputy, notes 3 new bugs none of which are rfes
15:23:34 anything interesting in those bugs?
15:23:38 who wants to be the next deputy?
15:23:45 That was pretty much it
15:23:46 it will be me this week
15:23:51 \o/
15:24:02 #info iurygregory to be bug deputy next week
15:24:08 No RFEs for review, skipping that meeting portion.
15:24:13 #topic Open Discussion
15:24:14 wouldn't it be "this week"? :D
15:24:19 lol
15:24:24 anything not on the agenda for discussion?
15:24:30 iurygregory: I signed you up for two weeks /s
15:24:37 Hi... sorry I arrived late and didn't give my update earlier for standalone networking
15:24:37 ack :D
15:24:48 if you have something to add alegacy_ now is a great time
15:24:57 I've resolved the RPC issues. Continuing to test various scenarios.
15:25:16 Specifically, I've been prototyping some integration into OpenShift to make sure it would handle the scenarios there
15:25:21 so far so good.
15:25:49 I'll be on PTO for the next two weeks, but I'm hoping by this Friday I'll have things in a good place to start to open change requests when I'm back.
15:26:08 I've identified some issues/things that need to be discussed so I'll update the etherpad
15:26:28 cool, as an FYI then, we're likely going to end up cutting the release around R-3 for this cycle
15:26:47 ok, noted
15:27:00 * TheJulia is just guessing R-3 based upon the time left and the historical "oh noes, we need to fix x" last minute details
15:27:15 Nice, thanks for the update alegacy_
15:27:22 Anyone else have a topic for open discussion?
15:27:47 did you start a ptg etherpad?
15:28:05 I'm pretty sure that fell off a long list of items to do in my head /o\
15:28:06 Speaking of etherpads, last week you wanted to begin discussing eventlet removal related testing
15:28:07 * JayF doin it now
15:28:42 I wanted to quickly ask while most people are here. For the safeguards, can we force having a volume name for all logical disks in the target raid config? (i.e. we would complain if it is not included if skip_block_devices mentions any volume name)
15:28:43 https://etherpad.opendev.org/p/ironic-eventlet-removal#56 has many question marks :)
15:29:18 kubajj: force it as a requirement now when it was not previously a requirement?
15:30:14 TheJulia: yes, exactly (but only if the user mentions a volume name in the skip list property - i.e. we would make it fail in validation)
15:30:27 This reminds me of the other person who has... VROCs. We likely need to patch the initial checking logic to go "no, don't need to do anything" with these devices/volumes
15:30:48 can we just add some concept of "skip_unnamed_volumes" as an additional filter
15:30:51 instead of making it magic?
15:30:53 kubajj: I guess the issue is going to be framing because the cat is sort of "out of the bag"
15:31:40 I guess I'm not opposed, but the concern which comes to mind is around upgrades or folks with prior schema definitions, which makes me think it should be opt in or sort of like JayF has noted
15:31:46 now we just add a generic volume name to lists (md127, or whatever its index is)
15:31:47 https://etherpad.opendev.org/p/ironic-ptg-2026.1 and is added to the whiteboard. I can't spell the name of our next release though so it's "G".
15:32:10 whoever decided that complex-to-spell name was a good release name owes me a dictionary or seventeen :D lol
15:32:42 We would like to enforce all RAIDs to have a volume name (if there is a skip list with a volume name on it) because of how we are planning to prevent cleaning them
15:33:37 JayF: I am pretty sure that the Spanish community in an unnamed research institute manipulated the vote
15:34:07 kubajj and I thought the dalmatian-is-spelled-with-an-a was tough to learn :D
15:34:08 kubajj: so framing it as "this is a good practice to take on", then maybe with a knob and release note could be acceptable
15:34:20 a good Gazpacho ain't bad
15:34:28 oh my
15:35:09 kubajj: is this so root device hints can be directly mapped by the user of software raid?
15:38:30 TheJulia: this is for the scenario when there are two logical disks on a certain set of holder disks (let's say root with 100 GB and data with the rest). the volume data will be on the skip list, but we want to wipe root. As a workaround, we want to keep the root array, but wipe its data, but then when we create the configuration, we want to make sure that it is the correct logical disk to existing raid device pair - an alternative to enforcing volume names would be to check raid level and size, but checking size is difficult as some sizes are 'MAX' and even if they are not, from testing, 100 GB is not actually 100 GB
15:38:58 okay
15:38:59 are you trying to avoid a case where a label drops off a data disk and you lose data?
15:39:54 JayF: the scenario that we want is to have a hypervisor which has two logical disks on two SSDs (root is 100 and the rest is data). We can reinstall root, but keep data
15:40:22 I am not 100% sure when we would want to do that, but we want to have the option :D
15:40:35 * TheJulia waits for the rebuild verb to re-appear
15:41:31 I guess it could be called rebuild, yes
15:41:58 I just usually only evacuate the whole node in case something is wrong, not rebuild :D
15:42:36 This is what rebuild --preserve-ephemeral is for
15:42:49 yeah, but that whole preserve-ephemeral logic is different
15:42:51 I don't love the "magically save disks from cleaning" pattern in general so I'm probably not the best person to pitch this with tbh.
15:42:54 different purpose
15:43:04 I'm less scared of data loss; more scared of data leakage
15:43:44 yeah, I think what they are doing makes sense for their use case since they are taking a target pool of nodes and very specifically managing them because they have a ton of data on those other disks that re-copying would be a nightmare, since it's not really a general BMaaS pool machine
15:43:53 JayF: we already have the removal of the flag in our decommissioning process, just need to actually make the "rebuild" not delete the data
15:44:23 "hi, my pool of 48 disks is all radio telescope data that we don't want to try and push over a wire every time"
15:44:40 TheJulia: your hadoop cluster can't sync over the data fast enough? is that what you said /s
15:44:46 TheJulia: exactly, I think our storage team would like to have them on their quads with 4 JBODs each, just to prevent accidents
15:44:48 (same problems ten years later) :D
15:45:32 yeah, I think this exact same challenge has come up in past discussions regarding the SKA cluster
15:45:46 anyway, I will take this as "yes, you can try to propose such a change" and finish the implementation
15:45:56 To be clear; not saying I'd -1 or -2 this, just maybe more reflecting why I don't review those IPA changes, I can't really bring myself to like that approach but don't wanna block it either
15:46:48 JayF: Yeah, I'm sort of in the same boat but also more inclined to review such changes because I do totally get where they are coming from.
15:48:26 better to have software that works for people than a perfect model of cleaning
15:48:42 unfortunately, yeah
15:49:14 and on the plus side, the folks that do this sort of thing do know what they are doing, in that they are managing very specific nodes for those purposes
15:49:41 TBH with stuff like this, I'm way more scared of the feature's existence being an attractive nuisance to someone who only half-understands it
15:49:55 we have a handful of those in ironic already, but I can't pretend like it's easy enough to use that someone is likely to not be filtered by that point
15:49:57 this is why I like very big warnings
15:50:20 * cardoe shakes off the crushing weight of Teams calls.
15:50:44 * JayF sees the beginnings of an "Ironic Ghost Stories" blogpost/talk with a list of all the scary things you could misuse ironic for ;)
15:50:49 * TheJulia hands cardoe a cup of coffee to remedy this weight of Teams calls.
15:50:52 I think we're at the end of kubajj's questions though
15:50:59 10 minutes left, anything else for open discussion?
15:51:08 JayF: concur :)
15:52:01 Last call for topics for the meeting?
15:52:33 I've got nothing
15:52:43 Last call? Was I on Teams calls that long. :(
15:52:51 I've got nothing.
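[Editor's note: a sketch of the validation kubajj proposed above, not Ironic's actual code. The rule discussed: if skip_block_devices references any volume names, every logical disk in the target RAID config must carry a volume_name, so skipped arrays can be matched unambiguously (matching on raid level/size is unreliable since sizes can be 'MAX' or inexact). The function name and error message are made up.]

```python
# Hypothetical validation sketch for the proposed safeguard.

def validate_raid_skip_list(target_raid_config: dict,
                            skip_block_devices: list) -> None:
    """Fail if the skip list names volumes but some logical disks are unnamed."""
    skip_names = {d["volume_name"] for d in skip_block_devices
                  if "volume_name" in d}
    if not skip_names:
        return  # no volume names in the skip list; nothing to enforce
    unnamed = [ld for ld in target_raid_config.get("logical_disks", [])
               if not ld.get("volume_name")]
    if unnamed:
        raise ValueError(
            "skip_block_devices references volume names, so every logical "
            "disk in target_raid_config must set volume_name "
            "(%d logical disk(s) are unnamed)" % len(unnamed))

# Example: root (100 GiB) gets wiped on rebuild, "data" is kept via the skip list.
config = {"logical_disks": [
    {"size_gb": 100, "raid_level": "1", "volume_name": "root"},
    {"size_gb": "MAX", "raid_level": "1", "volume_name": "data"},
]}
validate_raid_skip_list(config, [{"volume_name": "data"}])  # passes
```

Making this opt-in (a knob plus a release note, as suggested above) would avoid breaking operators whose existing RAID configs predate the requirement.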
15:53:06 Thanks o/
15:53:07 #endmeeting