15:01:28 <JayF> #startmeeting ironic
15:01:28 <opendevmeet> Meeting started Mon Dec 18 15:01:28 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:28 <opendevmeet> The meeting name has been set to 'ironic'
15:01:33 <JayF> #topic Announcements/Reminders
15:01:55 <JayF> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag your patches; https://tinyurl.com/ironic-weekly-prio-dash
15:02:09 <JayF> #info The next two Ironic meetings (Dec 25, Jan 1 2024) are cancelled.
15:02:27 <JayF> #topic Review Action Items
15:02:32 <JayF> #info JayF emailed the list about the cancelled meetings
15:02:37 <iurygregory> o/
15:02:40 <rpittau> o/
15:02:44 <JayF> #topic Caracal Release Schedule
15:02:52 <JayF> #info Next Milestone R-17, Caracal-2, on Jan 11
15:03:01 <JayF> Any other comments on the release schedule? Anything we need to consider?
15:03:07 <dtantsur> o/
15:03:26 <dtantsur> where do we stand with intermediate releases?
15:03:36 <JayF> I've cut none.
15:03:48 <dtantsur> rpittau: ^^
15:04:06 <rpittau> we cut bugfix releases in December
15:04:17 <rpittau> the next ones will be at the end of February
15:04:21 <dtantsur> \o/
15:04:34 <rpittau> and thanks to that we also released ironic-image in metal3 :)
15:04:41 <JayF> Ack, sounds like we're on track then.
15:04:45 <rpittau> yup
15:04:53 <JayF> On this general topic: how is bugfix support in release automation / retiring the old ones?
15:05:05 <JayF> I know that was in process but I sorta lost the thread on it during my vacation
15:05:28 <rpittau> JayF: I've opened a patch for that, but I didn't get the talk going with the release team after an initial discussion
15:05:49 <rpittau> this is the patch btw https://review.opendev.org/c/openstack/releases/+/900810
15:06:08 <JayF> ack; so in progress, just low priority and not moving quickly it seems
15:06:12 <JayF> basically what I expected
15:06:16 <rpittau> yeah :/
15:06:27 <JayF> #topic OpenInfra Meetup at CERN June 6 2024
15:06:41 <JayF> Looks like someone added an item suggesting an Ironic meetup be held during this.
15:06:45 <rpittau> yes!
15:06:59 <JayF> Sounds like a good idea. I would try to go, but will note that "I will try to go" still means a very low likelihood
15:07:03 <JayF> so please someone else own this :D
15:07:09 <rpittau> :D
15:07:18 <rpittau> I proposed it, I will own it :)
15:07:24 <JayF> awesome \o/
15:07:33 <JayF> Gotta go see some protons go boom
15:07:40 <rpittau> arne_wiebalck: this ^ is probably of interest to you
15:08:37 <rpittau> I guess a good date would be June 5, as people will probably travel on Friday (June 7)
15:08:41 <iurygregory> A meetup is probably complicated, I would say .-., even the Summit is complicated to get $budget for
15:08:57 <JayF> I would say picking a date is probably getting ahead of ourselves
15:09:02 <JayF> maybe just send out an email and get feelers?
15:09:25 <rpittau> yeah, that's the intention, I was just thinking out loud
15:09:29 <JayF> I know if I went, I'd probably have to combine a UK trip with it, so I might actually be more able to go on the 7th
15:10:18 <JayF> Anything else on this topic?
15:11:20 <JayF> #topic Review Ironic CI Status
15:11:45 <dtantsur> Bifrost DHCP jobs are broken, presumably since updating ansible-collection-openstack. We don't know why.
15:11:57 <JayF> I'll note CI also broke for a couple of days last week: an Ironic<>Nova driver chain was being tested on the *tip*, and an intermediate patch broke it.
15:12:13 <JayF> Now that whole chain of the openstacksdk migration has landed and those jobs are happy
15:12:30 <rpittau> dtantsur: can we rebase the revert on top of https://review.opendev.org/c/openstack/bifrost/+/903755 to collect the dnsmasq config?
15:12:42 <dtantsur> doing
15:12:45 <rpittau> tnx
15:12:54 <JayF> #info Bifrost DHCP jobs broken by the ansible-collection-openstack upgrade; revert and investigation in progress.
15:13:01 <JayF> Anything else on the gate?
15:13:05 <opendevreview> Dmitry Tantsur proposed openstack/bifrost master: DNM Revert "Support ansible-collections-openstack 2 and later" https://review.opendev.org/c/openstack/bifrost/+/903694
15:14:29 <JayF> #topic Bug Deputy
15:14:37 <JayF> rpittau was bug deputy this week; anything interesting to report?
15:15:08 <rpittau> nothing new, it was really calm, I triaged a couple of old things
15:15:25 <JayF> Any volunteers to take the baton this week?
15:15:43 <JayF> If not, I think it is reasonable to say the "community" can do it through the holiday?
15:16:07 <dtantsur> yeah
15:16:09 <rpittau> yep
15:16:14 <JayF> #info No specific bug deputy assigned through the holiday weeks; Ironic community members are encouraged to triage as they are working and have time.
15:16:21 <JayF> #topic RFE Review
15:16:23 <JayF> One for dtantsur
15:16:43 <JayF> #link https://bugs.launchpad.net/ironic/+bug/2046428 Move configdrive to an auxiliary table
15:16:53 <JayF> dtantsur: my big concern about this is how nasty the migration is going to be
15:16:56 <dtantsur> It's a small one, but it has API visibility
15:16:59 <dtantsur> well
15:17:14 <dtantsur> We won't migrate existing configdrives; the code will need to handle both locations for a good while
15:17:16 <JayF> I don't think it's going to be small for scaled-up deployments with lots of active configdrive instances :)
15:17:30 <JayF> oooh, so we're not going to migrate the field outta node?
15:17:37 <dtantsur> Well, there is no "field"
15:17:43 <dtantsur> It's just something in instance_info currently
15:18:06 <JayF> *pulls up an api ref*
15:18:12 <dtantsur> So, new code will stop inserting the configdrive into instance_info, but will keep reading it from both places
15:18:22 <JayF> This is ~trivial
15:18:33 <JayF> instance_info is not microversioned, and we really can't microversion it
15:18:41 <dtantsur> *nod*
15:18:44 <JayF> unless we want to make changes in our nova/ironic driver harder than they already are
15:19:17 <dtantsur> :D
15:19:19 <JayF> Would we still support storing configdrives in swift?
15:19:26 <dtantsur> Absolutely
15:19:32 <JayF> Would we ever use this table in that case?
15:19:38 <JayF> e.g. I can't reach swift; is this table now a fallback?
15:19:47 <dtantsur> I don't know how many people store that in swift, to be honest. It's opt-in.
15:20:04 <JayF> That's fair. I think of it from that perspective because my two largest environments did
15:20:09 <JayF> but I'm sure my downstream now doesn't
15:20:14 <JayF> and swift usage is much lower
15:20:27 <dtantsur> my downstream definitely does not either :)
15:20:35 <JayF> I am +2 on the feature, and like, +.999999 to it without a spec
15:20:49 <JayF> let me put it this way: there's no way I'd be able to implement this safely without a spec
15:20:54 <JayF> but you may be able to
15:21:21 <dtantsur> The patch is likely going to be shorter than even a short spec.
15:21:32 <rpittau> not sure about the spec either, but probably not needed
15:22:04 <JayF> I think my big concern is more around code we might need to write but don't know about than the code we know we'd need to write :)
15:22:12 <JayF> but you can't reduce my concern around unknown unknowns lol
15:22:24 <dtantsur> I'm afraid I cannot :D
15:22:25 <JayF> any objection to approving it as it sits, then?
15:22:34 <iurygregory> none from me
15:22:51 <JayF> #info RFE 2046428 approved
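For readers following along, the "keep reading from both places" approach dtantsur describes above for RFE 2046428 might look roughly like the sketch below; the db_api helper, its name, and the record layout are illustrative assumptions, not the actual Ironic code or patch.

    # Rough sketch of the dual-location read for RFE 2046428 (assumptions only).
    def get_configdrive(node, db_api):
        # Preferred location: the new auxiliary table, keyed by node UUID.
        record = db_api.get_node_configdrive(node.uuid)  # hypothetical helper
        if record is not None:
            return record.configdrive
        # Fallback: older deployments that still carry it in instance_info.
        return node.instance_info.get('configdrive')

Under this scheme new writes would go only to the auxiliary table (or to swift where that opt-in is configured), so instance_info stops accumulating configdrives while existing nodes keep working without a data migration.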
15:23:40 <JayF> #topic Open Discussion
15:23:43 <JayF> Anything for open discussion?
15:23:48 <dtantsur> Wanna chat about https://review.opendev.org/c/openstack/ironic/+/902801 ?
15:24:18 <dtantsur> I may be missing the core of your objections to it
15:24:20 <JayF> I don't like the shape of that change and I don't know how to express it
15:24:33 <JayF> I think you are
15:24:38 <JayF> and I think I am, to an extent
15:24:49 <dtantsur> (and would happily hear other opinions; no need to read the code, the summary should be enough)
15:24:51 <JayF> So basically we have a pie of threads
15:25:16 <JayF> right now we have, AFAICT, two config options to control how that pie is set up
15:25:25 <dtantsur> one?
15:25:28 <JayF> "how big is the pie" (how many threads) and "how much of the pie do periodic workers get to use"
15:25:37 <dtantsur> the latter is not a thing
15:25:47 <JayF> that is untrue, I looked it up, gimme a sec and I'll link
15:26:19 <dtantsur> https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conductor/base_manager.py#335
15:26:36 <dtantsur> that's the same executor...
15:26:37 <JayF> https://opendev.org/openstack/ironic/src/branch/master/ironic/conf/conductor.py#L89
15:26:49 <JayF> it's the same executor, but we allow you to limit how much of that executor the periodics will use
15:26:56 <dtantsur> *each periodic*
15:27:05 <JayF> OH
15:27:10 <dtantsur> 1 periodic can use 8 threads. 100 periodics can use 800 threads.
15:27:16 <dtantsur> This was done for power sync IIRC
15:27:27 <JayF> This conversation helps me get to the core of my point though, actually, which is nice
15:28:01 <dtantsur> it's used like this https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1415-L1424
15:28:30 <JayF> I worry that we are going to make it extremely difficult to figure out sane values for this in scaled-up environments
15:28:49 <JayF> hmm, but you didn't want it to be configurable
15:29:02 <dtantsur> I do have a percentage
15:29:02 <JayF> you just wanted to reserve 5% of the pie at all times for user-interactive APIs
15:29:21 <dtantsur> it's a config https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conf/conductor.py#31
15:29:50 <JayF> I'm going to reorient my question
15:29:57 <dtantsur> 5% of the default 300 is 15, which matches my personal definition of "several" :)
15:29:58 <JayF> Do these configs exist in a post-eventlet world?
15:30:09 <dtantsur> Possibly?
15:30:13 <JayF> As laid out in the current draft in governance (if you've read it)
15:30:28 <dtantsur> We may want to limit the concurrency for any asynchronous approach we take
15:30:51 <dtantsur> Otherwise, we may land in a situation where Ironic is doing so much in parallel that it never gets to the bottom of its backlog
15:31:07 <JayF> I think I'm just trying to close the barn door when the horse has already escaped w/r/t operational complexity :(
15:31:35 <JayF> and every time we add something like this, it gets a little harder for a new user to understand how Ironic performs, and we'll never get rid of it
15:31:50 <dtantsur> I cannot fully agree with both statements
15:32:07 <dtantsur> We *can* get rid of configuration options for sure. Removing eventlet will have a huge impact already.
15:32:12 <JayF> well, agree or not, it's basically an exasperated "I give up" because I don't have a better answer and I don't want to stand in your way
15:32:42 <dtantsur> Well, it's not super critical for me. If nobody thinks it's a good idea, I'll happily walk away from it.
15:32:54 <dtantsur> (Until the next time someone tries to deploy 3500 nodes within a few hours, lol)
15:33:02 <JayF> I think it's a situation where we're maybe putting a bandaid on a wound that needs stitches, right?
15:33:18 <JayF> but the last thing we need is another "let's take a look at this from another angle" sorta thing
15:33:33 <JayF> and with eventlet's retirement from openstack on the horizon, there's no point
15:33:40 <dtantsur> "on the horizon" ;)
15:34:05 <dtantsur> I keep admiring your optimism :)
15:34:11 <JayF> so kicking the can down the road is probably the right call; whether that means I stop fighting and drop my -1, or we're just OK with the concurrency chokeout bug until it's gone
15:34:14 <JayF> dtantsur: we don't have a choice
15:34:20 <JayF> dtantsur: have you looked at how bad eventlet is on 3.12?
15:34:34 <dtantsur> Not beyond what you shared with us
15:34:35 <JayF> dtantsur: I have optimism only because staying on eventlet is harder than migrating off it in the medium term
15:34:45 <dtantsur> I know we must do it; I just don't know if we can practically do it
15:34:52 <JayF> which isn't exactly "optimism" so much as "out of the fire and into the pan"
15:34:57 <dtantsur> :D
15:35:08 <JayF> dtantsur: smart people have already answered the question "yes we can, and here's how"
15:35:13 <dtantsur> \o/
15:35:18 <JayF> I think code is already written which makes asyncio and eventlet code work together
15:35:24 <JayF> using eventlet/aiohub (iirc)
15:35:33 <JayF> https://github.com/eventlet/aiohub
15:35:53 * dtantsur doesn't want to imagine the potential issues that may arise from it...
15:35:57 <JayF> hberaud is working on it, along with some others (including itamarst from GR-OSS)
15:36:18 <JayF> dtantsur: I'm thinking the opposite. I'm looking at the other side of this and seeing any number of "recheck random BS failure" things disappearing
15:36:23 <dtantsur> But... if eventlet stays in some form, so do these options?
15:36:26 <JayF> dtantsur: I'm telling you, eventlet's status today is miserable
15:36:31 <JayF> dtantsur: probably, yeah :/
15:36:49 <JayF> dtantsur: so I am, like, going to pull my -1 off that. I'm not +1/+2 to the change, but I don't have a better idea
15:37:23 <dtantsur> Okay, let's see what the quiet people here say :) if someone actually decides it's a good idea, we'll do it. otherwise, I'll silently abandon it the next time I clean up my backlog.
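As a rough illustration of the reservation discussed above for 902801 (the option name, the dispatch hook, and the check itself are assumptions for this sketch, not the patch under review), reserving 5% of the default 300-thread pool keeps roughly 15 threads free for API-triggered work:

    # Sketch only: reserve a slice of the shared conductor executor for
    # interactive API requests; background/periodic work is turned away once
    # the free slots drop below the reserved percentage. Names are illustrative.
    RESERVED_PERCENT = 5   # hypothetical config knob, per the 5% in the discussion
    POOL_SIZE = 300        # the default workers_pool_size mentioned above

    def can_start_background_task(busy_threads):
        reserved = POOL_SIZE * RESERVED_PERCENT // 100   # 5% of 300 -> 15
        return busy_threads < POOL_SIZE - reserved

    # Example: with 290 threads busy, a periodic task is refused so the
    # remaining slots stay available for user-facing API requests.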
15:38:21 <JayF> As another note for open discussion
15:38:30 <JayF> I believe I'm meeting downstream with a potential doc contractor
15:38:38 <JayF> that we sorta put in motion with my downstream a few weeks ago
15:38:39 <dtantsur> \o/
15:38:49 <JayF> maybe I'll ask them how to make a decoder ring for 902801 :P
15:38:56 <JayF> Anything else for open discussion?
15:40:20 <JayF> Thanks everyone, have a good holiday o/
15:40:22 <JayF> #endmeeting