15:01:28 <JayF> #startmeeting ironic
15:01:28 <opendevmeet> Meeting started Mon Dec 18 15:01:28 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:28 <opendevmeet> The meeting name has been set to 'ironic'
15:01:33 <JayF> #topic Announcements/Reminder
15:01:55 <JayF> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag your patches; https://tinyurl.com/ironic-weekly-prio-dash
15:02:09 <JayF> #info The next two Ironic meetings (Dec 25, Jan 1 2024) are cancelled.
15:02:27 <JayF> #topic Review Action Items
15:02:32 <JayF> #info JayF emailed list about cancelled meetings
15:02:37 <iurygregory> o/
15:02:40 <rpittau> o/
15:02:44 <JayF> #topic Caracal Release Schedule
15:02:52 <JayF> #info Next Milestone R-17, Caracal-2 on Jan 11
15:03:01 <JayF> Any other comments on the release schedule? Anything we need to consider?
15:03:07 <dtantsur> o/
15:03:26 <dtantsur> where do we stand with intermediate releases?
15:03:36 <JayF> I've cut none.
15:03:48 <dtantsur> rpittau: ^^
15:04:06 <rpittau> we cut bugfix releases in December
15:04:17 <rpittau> the next ones will be at the end of February
15:04:21 <dtantsur> \o/
15:04:34 <rpittau> and thanks to that we also released ironic-image in metal3 :)
15:04:41 <JayF> Ack, sounds like we're on track then.
15:04:45 <rpittau> yup
15:04:53 <JayF> On this general topic: how is bugfix support in release automation / retiring the old ones going?
15:05:05 <JayF> I know that was in process but sorta lost the thread on it during my vacation
15:05:28 <rpittau> JayF: I've opened a patch for that, but I haven't gotten the conversation with the release team going beyond an initial discussion
15:05:49 <rpittau> this is the patch btw https://review.opendev.org/c/openstack/releases/+/900810
15:06:08 <JayF> ack; so in progress just low priority and not moving quickly it seems
15:06:12 <JayF> basically what I expected
15:06:16 <rpittau> yeah :/
15:06:27 <JayF> #topic OpenInfra Meetup at CERN June 6 2024
15:06:41 <JayF> Looks like someone added an item suggesting an Ironic meetup be held during this event.
15:06:45 <rpittau> yes!
15:06:59 <JayF> Sounds like a good idea. I would try to go but will note that "I will try to go" still means very low likelihood
15:07:03 <JayF> so please someone else own this :D
15:07:09 <rpittau> :D
15:07:18 <rpittau> I proposed it, I will own it :)
15:07:24 <JayF> awesome \o/
15:07:33 <JayF> Gotta go see some protons go boom
15:07:40 <rpittau> arne_wiebalck: this ^ probably is of your interest
15:08:37 <rpittau> I guess a good date would be June 5 as people will probably travel on Friday (June 7)
15:08:41 <iurygregory> a meetup is probably complicated I would say .-., even getting $budget for the Summit is complicated
15:08:57 <JayF> I would say picking a date is probably getting ahead of ourselves
15:09:02 <JayF> maybe just send out an email and put out feelers?
15:09:25 <rpittau> yeah, that's the intention, I was just thinking out loud
15:09:29 <JayF> I know if I went, I'd probably have to combine a UK trip with it, so I might actually be more able to go on the 7th
15:10:18 <JayF> Anything else on this topic?
15:11:20 <JayF> #topic Review Ironic CI Status
15:11:45 <dtantsur> Bifrost DHCP jobs are broken, presumably since updating ansible-collection-openstack. We don't know why.
15:11:57 <JayF> I'll note the gate broke for a couple of days last week because an Ironic<>Nova driver chain was being tested on the *tip*, and an intermediate patch in that chain was broken.
15:12:13 <JayF> Now that whole chain of the openstacksdk migration has landed and those jobs are happy
15:12:30 <rpittau> dtantsur: can we rebase the revert on top of https://review.opendev.org/c/openstack/bifrost/+/903755 to collect the dnsmasq config ?
15:12:42 <dtantsur> doing
15:12:45 <rpittau> tnx
15:12:54 <JayF> #info Bifrost DHCP jobs broke by ansible-collection-openstack upgrade; revert and investigation in progress.
15:13:01 <JayF> Anything else on the gate?
15:13:05 <opendevreview> Dmitry Tantsur proposed openstack/bifrost master: DNM Revert "Support ansible-collections-openstack 2 and later"  https://review.opendev.org/c/openstack/bifrost/+/903694
15:14:29 <JayF> #topic Bug Deputy
15:14:37 <JayF> rpittau was bug deputy this week; anything interesting to report?
15:15:08 <rpittau> nothing new, it was really calm, I triaged a couple of old things
15:15:25 <JayF> Any volunteers to take the baton this week?
15:15:43 <JayF> If not, I think it is reasonable to say the "community" can do it through the holidays?
15:16:07 <dtantsur> yeah
15:16:09 <rpittau> yep
15:16:14 <JayF> #info No specific bug deputy assigned through holiday weeks; Ironic community members encouraged to triage as they are working and have time.
15:16:21 <JayF> #topic RFE Review
15:16:23 <JayF> One for dtantsur
15:16:43 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2046428 Move configdrive to an auxiliary table
15:16:53 <JayF> dtantsur: my big concern about this is how nasty is the migration going to be
15:16:56 <dtantsur> It's a small one, but it has API visibility
15:16:59 <dtantsur> well
15:17:14 <dtantsur> We won't migrate existing configdrives; the code will need to handle both locations for a good while
15:17:16 <JayF> I don't think it's going to be small for scaled up deployments with lots of active configdrive instances :)
15:17:30 <JayF> oooh, so we're not going to migrate the field outta node?
15:17:37 <dtantsur> Well, there is no "field"
15:17:43 <dtantsur> It's just something in instance_info currently
15:18:06 <JayF> *pulls up an api ref*
15:18:12 <dtantsur> So, new code will stop inserting configdrive into instance_info, but will keep reading it from both places
15:18:22 <JayF> This is ~trivial
15:18:33 <JayF> instance_info is not microversioned, we really can't microversion it
15:18:41 <dtantsur> *nod*
15:18:44 <JayF> unless we want to make changes in our nova/ironic driver harder than they already are
15:19:17 <dtantsur> :D
15:19:19 <JayF> Would we still support storing configdrives in swift?
15:19:26 <dtantsur> Absolutely
15:19:32 <JayF> Would we ever use this table in that case?
15:19:38 <JayF> e.g. I can't reach swift; is this table now a fallback?
15:19:47 <dtantsur> I don't know how many people store that in swift, to be honest. It's opt-in.
15:20:04 <JayF> That's fair. I was thinking from that perspective because my two largest environments did
15:20:09 <JayF> but I'm sure my downstream now doesn't
15:20:14 <JayF> and swift usage is much lower
15:20:27 <dtantsur> my downstream definitely does not either :)
15:20:35 <JayF> I am +2 on the feature, and like, +.999999 to it without a spec
15:20:49 <JayF> let me put it this way: there's no way I'd be able to implement this safely without a spec
15:20:54 <JayF> but you may be able to
15:21:21 <dtantsur> The patch is likely going to be shorter than even a short spec.
15:21:32 <rpittau> not sure about the spec either, but probably not needed
15:22:04 <JayF> I think my big concern is more around code we might need to write but don't know about yet than the code we know we'd need to write :)
15:22:12 <JayF> but you can't reduce my concern around unknown unknowns lol
15:22:24 <dtantsur> I'm afraid I cannot :D
15:22:25 <JayF> any objection to an approval as it sits, then?
15:22:34 <iurygregory> none from me
15:22:51 <JayF> #info RFE 2046428 approved
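(Aside on the RFE just approved: a minimal sketch, assuming hypothetical names, of the dual-location read dtantsur describes above -- new code stops writing the configdrive into instance_info and stores it in the auxiliary table, but keeps reading from both places for nodes deployed before the change. The stand-in table and helper names below are illustrative only, not from the patch.)

    # Minimal sketch of the dual-location lookup; the dict stands in for the
    # new auxiliary DB table and is not the real object/DB-API layer.
    _configdrive_table = {}  # node_id -> configdrive blob

    def store_configdrive(node_id, configdrive):
        # New writes go only to the auxiliary table, not into instance_info.
        _configdrive_table[node_id] = configdrive

    def get_configdrive(node):
        # Prefer the new table; fall back to the legacy instance_info location
        # for nodes whose configdrive predates the change.
        if node.id in _configdrive_table:
            return _configdrive_table[node.id]
        return node.instance_info.get('configdrive')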
15:23:40 <JayF> #topic Open Discussion
15:23:43 <JayF> Anything for open discussion?
15:23:48 <dtantsur> Wanna chat about https://review.opendev.org/c/openstack/ironic/+/902801 ?
15:24:18 <dtantsur> I may be missing the core of your objections to it
15:24:20 <JayF> I don't like the shape of that change and I don't know how to express it
15:24:33 <JayF> I think you are
15:24:38 <JayF> and I think I am, to an extent
15:24:49 <dtantsur> (and would happily hear other opinions; no need to read the code, the summary should be enough)
15:24:51 <JayF> So basically we have a pie of threads
15:25:16 <JayF> right now, we have AFAICT, two config options to control how that pie is setup
15:25:25 <dtantsur> one?
15:25:28 <JayF> "how big is the pie" (how many threads) and "how much of the pie do periodic workers get to use"
15:25:37 <dtantsur> the latter is not a thing
15:25:47 <JayF> that is untrue, I looked it up, gimme a sec and I'll link
15:26:19 <dtantsur> https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conductor/base_manager.py#335
15:26:36 <dtantsur> that's the same executor...
15:26:37 <JayF> https://opendev.org/openstack/ironic/src/branch/master/ironic/conf/conductor.py#L89
15:26:49 <JayF> it's the same executor, but we allow you to limit how much of that executor the periodics will use
15:26:56 <dtantsur> *each periodic*
15:27:05 <JayF> OH
15:27:10 <dtantsur> 1 periodic can use 8 threads. 100 periodics can use 800 threads.
15:27:16 <dtantsur> This was done for power sync IIRC
15:27:27 <JayF> This conversation helps me get to the core of my point though, actually, which is nice
15:28:01 <dtantsur> it's used like this https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1415-L1424
15:28:30 <JayF> I worry that we are going to make it extremely difficult to figure out sane values for this in scaled up environments
15:28:49 <JayF> hmm but you didn't want it to be configurable
15:29:02 <dtantsur> I do have a percentage
15:29:02 <JayF> you just wanted to reserve 5% of the pie at all times for user-interactive-apis
15:29:21 <dtantsur> it's a config https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conf/conductor.py#31
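(Aside for reference: a small worked illustration, in Python, of how the existing [conductor] pool options and the proposed reservation interact. workers_pool_size and periodic_max_workers are existing options; the reservation option comes from the patch under review, so its exact name may differ.)

    # Worked numbers behind the discussion above.
    workers_pool_size = 300     # [conductor]workers_pool_size: shared worker pool
    periodic_max_workers = 8    # [conductor]periodic_max_workers: cap *per* periodic task
    reserved_percent = 5        # proposed reservation for API requests (name illustrative)

    # Each periodic gets its own cap, so e.g. 100 periodics could ask for
    # 100 * 8 = 800 workers, far more than the pool itself holds.
    reserved_for_api = workers_pool_size * reserved_percent // 100
    print(reserved_for_api)     # -> 15 threads kept free for user-facing API work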
15:29:50 <JayF> I'm going to reorient my question
15:29:57 <dtantsur> 5% of the default 300 is 15, which matches my personal definition of "several" :)
15:29:58 <JayF> Do these configs exist in a post-eventlet world?
15:30:09 <dtantsur> Possibly?
15:30:13 <JayF> As laid out in the current draft in governance (if you've read it)
15:30:28 <dtantsur> We may want to limit the concurrency for any asynchronous approach we take
15:30:51 <dtantsur> Otherwise, we may land in the situation where Ironic is doing so much in parallel that it never gets to the bottom of its backlog
15:31:07 <JayF> I think I'm just trying to close the barn door when the horse has already escaped w/r/t operational complexity :(
15:31:35 <JayF> and every time we add something like this, it gets a little harder for a new user to understand how Ironic performs, and we'll never get rid of it
15:31:50 <dtantsur> I cannot fully agree with either statement
15:32:07 <dtantsur> We *can* get rid of configuration options for sure. Removing eventlet will have a huge impact already.
15:32:12 <JayF> well agree or not, it's basically an exasperated "I give up" because I don't have a better answer and I don't want to stand in your way
15:32:42 <dtantsur> Well, it's not super critical for me. If nobody thinks it's a good idea, I'll happily walk away from it.
15:32:54 <dtantsur> (Until the next time someone tries to deploy 3500 nodes within a few hours, lol)
15:33:02 <JayF> I think it's a situation where we're maybe putting a bandaid on a wound that needs stitches, right?
15:33:18 <JayF> but the last thing we need is another "lets take a look at this from another angle" sorta thing
15:33:33 <JayF> and with eventlet's retirement from openstack on the horizon, there's no point
15:33:40 <dtantsur> "on the horizen" ;)
15:34:05 <dtantsur> I keep admiring your optimism :)
15:34:11 <JayF> so kicking the can down the road is probably the right call; whether that means I stop fighting and drop my -1, or we just accept the concurrency chokeout bug until it's gone
15:34:14 <JayF> dtantsur: we don't have a choice
15:34:20 <JayF> dtantsur: have you looked at how bad eventlet is on 3.12?
15:34:34 <dtantsur> Not beyond what you shared with us
15:34:35 <JayF> dtantsur: I have optimism only because staying on eventlet is harder than migrating off in the medium term
15:34:45 <dtantsur> I know we must do it; I just don't know if we can practically do it
15:34:52 <JayF> which isn't exactly "optimism" so much as "out of the fire and into the pan"
15:34:57 <dtantsur> :D
15:35:08 <JayF> dtantsur: smart people have already answered the question "yes we can, and here's how"
15:35:13 <dtantsur> \o/
15:35:18 <JayF> I think code is already written which makes asyncio and eventlet code work together
15:35:24 <JayF> using eventlet/aiohub (iirc)
15:35:33 <JayF> https://github.com/eventlet/aiohub
15:35:53 * dtantsur doesn't want to imagine potential issues that may arise from it...
15:35:57 <JayF> hberaud is working on it, along with some others (including itamarst from GR-OSS)
15:36:18 <JayF> dtantsur: I'm thinking the opposite. I'm looking at the other side of this, and seeing any number of "recheck random BS failure" things disappearing
15:36:23 <dtantsur> But.. if eventlet stays in some form, so do these options?
15:36:26 <JayF> dtantsur: I'm telling you, eventlet's status today is miserable
15:36:31 <JayF> dtantsur: probably, yeah :/
15:36:49 <JayF> dtantsur: so I am like, going to pull my -1 off that. I'm not +1/+2 to the change but don't have a better idea
15:37:23 <dtantsur> Okay, let's see what the quiet people here say :) if someone actually decided it's a good idea, we'll do it. otherwise, I'll silently abandon it the next time I clean up my backlog.
15:38:21 <JayF> As another note for open discussion
15:38:30 <JayF> I believe I'm meeting downstream with a potential doc contractor
15:38:38 <JayF> that we sorta put in motion with my downstream a few weeks ago
15:38:39 <dtantsur> \o/
15:38:49 <JayF> maybe I'll ask them how to make a decoder ring for 902801 :P
15:38:56 <JayF> Anything else for open discussion?
15:40:20 <JayF> Thanks everyone, have a good holiday o/
15:40:22 <JayF> #endmeeting