15:01:28 #startmeeting ironic
15:01:28 Meeting started Mon Dec 18 15:01:28 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:28 The meeting name has been set to 'ironic'
15:01:33 #topic Announcements/Reminders
15:01:55 #info Standing reminder to review patches tagged ironic-week-prio and to hashtag your patches; https://tinyurl.com/ironic-weekly-prio-dash
15:02:09 #info The next two Ironic meetings (Dec 25, Jan 1 2024) are cancelled.
15:02:27 #topic Review Action Items
15:02:32 #info JayF emailed the list about the cancelled meetings
15:02:37 o/
15:02:40 o/
15:02:44 #topic Caracal Release Schedule
15:02:52 #info Next Milestone R-17, Caracal-2, on Jan 11
15:03:01 Any other comments on the release schedule? Anything we need to consider?
15:03:07 o/
15:03:26 where do we stand with intermediate releases?
15:03:36 I've cut none.
15:03:48 rpittau: ^^
15:04:06 we cut bugfix releases in December
15:04:17 the next ones will be at the end of February
15:04:21 \o/
15:04:34 and thanks to that we also released ironic-image in metal3 :)
15:04:41 Ack, sounds like we're on track then.
15:04:45 yup
15:04:53 On this general topic: how is bugfix support in release automation / retiring the old ones going?
15:05:05 I know that was in process, but I sorta lost the thread on it during my vacation
15:05:28 JayF: I've opened a patch for that, but I didn't get the talk going with the release team after an initial discussion
15:05:49 this is the patch btw https://review.opendev.org/c/openstack/releases/+/900810
15:06:08 ack; so in progress, just low priority and not moving quickly, it seems
15:06:12 basically what I expected
15:06:16 yeah :/
15:06:27 #topic OpenInfra Meetup at CERN June 6 2024
15:06:41 Looks like someone added an item suggesting an Ironic meetup be held during this.
15:06:45 yes!
15:06:59 Sounds like a good idea. I would try to go, but will note that "I will try to go" still means a very low likelihood
15:07:03 so please, someone else own this :D
15:07:09 :D
15:07:18 I proposed it, I will own it :)
15:07:24 awesome \o/
15:07:33 Gotta go see some protons go boom
15:07:40 arne_wiebalck: this ^ is probably of interest to you
15:08:37 I guess a good date would be June 5, as people will probably travel on Friday (June 7)
15:08:41 A meetup is probably complicated, I would say .-. , even the Summit is complicated to get $budget for
15:08:57 I would say picking a date is probably getting ahead of ourselves
15:09:02 maybe just send out an email and get feelers?
15:09:25 yeah, that's the intention, I was just thinking out loud
15:09:29 I know if I went, I'd probably have to combine a UK trip with it, so I might actually be more able to go on the 7th
15:10:18 Anything else on this topic?
15:11:20 #topic Review Ironic CI Status
15:11:45 Bifrost DHCP jobs are broken, presumably since updating ansible-collection-openstack. We don't know why.
15:11:57 I'll note it also broke for a couple of days last week because an Ironic<>Nova driver chain was being tested on the *tip* and an intermediate patch broke it.
15:12:13 Now the whole chain of the openstacksdk migration has landed and those jobs are happy
15:12:30 dtantsur: can we rebase the revert on top of https://review.opendev.org/c/openstack/bifrost/+/903755 to collect the dnsmasq config?
15:12:42 doing
15:12:45 tnx
15:12:54 #info Bifrost DHCP jobs broken by the ansible-collection-openstack upgrade; revert and investigation in progress.
15:13:01 Anything else on the gate?
15:13:05 Dmitry Tantsur proposed openstack/bifrost master: DNM Revert "Support ansible-collections-openstack 2 and later" https://review.opendev.org/c/openstack/bifrost/+/903694
15:14:29 #topic Bug Deputy
15:14:37 rpittau was bug deputy this week; anything interesting to report?
15:15:08 nothing new, it was really calm, I triaged a couple of old things
15:15:25 Any volunteers to take the baton this week?
15:15:43 If not, I think it is reasonable to say the "community" can do it through the holiday?
15:16:07 yeah
15:16:09 yep
15:16:14 #info No specific bug deputy assigned through the holiday weeks; Ironic community members are encouraged to triage as they are working and have time.
15:16:21 #topic RFE Review
15:16:23 One for dtantsur
15:16:43 #link https://bugs.launchpad.net/ironic/+bug/2046428 Move configdrive to an auxiliary table
15:16:53 dtantsur: my big concern about this is how nasty the migration is going to be
15:16:56 It's a small one, but it has API visibility
15:16:59 well
15:17:14 We won't migrate existing configdrives; the code will need to handle both locations for a good while
15:17:16 I don't think it's going to be small for scaled-up deployments with lots of active configdrive instances :)
15:17:30 oooh, so we're not going to migrate the field outta node?
15:17:37 Well, there is no "field"
15:17:43 It's just something in instance_info currently
15:18:06 *pulls up an api ref*
15:18:12 So, new code will stop inserting the configdrive into instance_info, but will keep reading it from both places
15:18:22 This is ~trivial
15:18:33 instance_info is not microversioned, we really can't microversion it
15:18:41 *nod*
15:18:44 unless we want to make changes in our nova/ironic driver harder than they already are
15:19:17 :D
15:19:19 Would we still support storing configdrives in swift?
15:19:26 Absolutely
15:19:32 Would we ever use this table in that case?
15:19:38 e.g. I can't reach swift; is this table now a fallback?
15:19:47 I don't know how many people store that in swift, to be honest. It's opt-in.
15:20:04 That's fair. I was thinking from that perspective because my two largest environments did
15:20:09 but I'm sure my downstream now doesn't
15:20:14 and swift usage is much lower
15:20:27 my downstream definitely does not either :)
15:20:35 I am +2 on the feature, and like, +.999999 to it without a spec
15:20:49 let me put it this way: there's no way I'd be able to implement this safely without a spec
15:20:54 but you may be able to
15:21:21 The patch is likely going to be shorter than even a short spec.
15:21:32 not sure about the spec either, but probably not needed
15:22:04 I think my big concern is more around code we might need to write but don't know about than the code we know we need to write :)
15:22:12 but you can't reduce my concern around unknown unknowns lol
15:22:24 I'm afraid I cannot :D
15:22:25 any objection to an approval as it sits, then?
15:22:34 none from me
15:22:51 #info RFE 2046428 approved
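A minimal sketch of the lookup behaviour discussed in this RFE, assuming a hypothetical auxiliary-table accessor (get_node_configdrive()) and record layout that are not Ironic's real DB API. It only illustrates the approach agreed above: new code stops writing the configdrive into instance_info, while reads fall back to instance_info for nodes deployed before the change, so no data migration is required.

    def get_configdrive(node, db_api):
        """Return a node's configdrive, checking both storage locations."""
        # Preferred location: the new auxiliary table (hypothetical accessor).
        record = db_api.get_node_configdrive(node.uuid)
        if record is not None:
            return record.data
        # Fallback: nodes deployed before the change still carry the
        # configdrive inside instance_info.
        return node.instance_info.get('configdrive')

The opt-in Swift storage path mentioned above is orthogonal to this sketch and would remain supported.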
15:23:40 #topic Open Discussion
15:23:43 Anything for open discussion?
15:23:48 Wanna chat about https://review.opendev.org/c/openstack/ironic/+/902801 ?
15:24:18 I may be missing the core of your objections to it
15:24:20 I don't like the shape of that change and I don't know how to express it
15:24:33 I think you are
15:24:38 and I think I am, to an extent
15:24:49 (and would happily hear other opinions; no need to read the code, the summary should be enough)
15:24:51 So basically we have a pie of threads
15:25:16 right now we have, AFAICT, two config options to control how that pie is set up
15:25:25 one?
15:25:28 "how big is the pie" (how many threads) and "how much of the pie do periodic workers get to use"
15:25:37 the latter is not a thing
15:25:47 that is untrue, I looked it up, gimme a sec and I'll link
15:26:19 https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conductor/base_manager.py#335
15:26:36 that's the same executor...
15:26:37 https://opendev.org/openstack/ironic/src/branch/master/ironic/conf/conductor.py#L89
15:26:49 it's the same executor, but we allow you to limit how much of that executor the periodics will use
15:26:56 *each periodic*
15:27:05 OH
15:27:10 1 periodic can use 8 threads. 100 periodics can use 800 threads.
15:27:16 This was done for power sync IIRC
15:27:27 This conversation helps me get to the core of my point though, actually, which is nice
15:28:01 it's used like this https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1415-L1424
15:28:30 I worry that we are going to make it extremely difficult to figure out sane values for this in scaled-up environments
15:28:49 hmm, but you didn't want it to be configurable
15:29:02 I do have a percentage
15:29:02 you just wanted to reserve 5% of the pie at all times for user-interactive APIs
15:29:21 it's a config https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conf/conductor.py#31
15:29:50 I'm going to reorient my question
15:29:57 5% of the default 300 is 15, which matches my personal definition of "several" :)
15:29:58 Do these configs exist in a post-eventlet world?
15:30:09 Possibly?
15:30:13 As laid out in the current draft in governance (if you've read it)
15:30:28 We may want to limit the concurrency for any asynchronous approach we take
15:30:51 Otherwise, we may land in a situation where Ironic is doing so much in parallel that it never gets to the bottom of its backlog
15:31:07 I think I'm just trying to close the barn door when the horse has already escaped w/r/t operational complexity :(
15:31:35 and every time we add something like this, it gets a little harder for a new user to understand how Ironic performs, and we'll never get rid of it
15:31:50 I cannot fully agree with both statements
15:32:07 We *can* get rid of configuration options for sure. Removing eventlet will have a huge impact already.
15:32:12 well, agree or not, it's basically an exasperated "I give up" because I don't have a better answer and I don't want to stand in your way
15:32:42 Well, it's not super critical for me. If nobody thinks it's a good idea, I'll happily walk away from it.
15:32:54 (Until the next time someone tries to deploy 3500 nodes within a few hours, lol)
15:33:02 I think it's a situation where we're maybe putting a bandaid on a wound that needs stitches, right?
15:33:18 but the last thing we need is another "let's take a look at this from another angle" sorta thing
15:33:33 and with eventlet's retirement from openstack on the horizon, there's no point
15:33:40 "on the horizon" ;)
15:34:05 I keep admiring your optimism :)
15:34:11 so kicking the can down the road is probably the right call; whether that means for me to stop fighting and drop my -1, or for us to just be OK with the concurrency chokeout bug until it's gone
15:34:14 dtantsur: we don't have a choice
15:34:20 dtantsur: have you looked at how bad eventlet is on 3.12?
15:34:34 Not beyond what you shared with us
15:34:35 dtantsur: I have optimism only because staying on eventlet is harder than migrating off in the medium term
15:34:45 I know we must do it; I just don't know if we can practically do it
15:34:52 which isn't exactly "optimism" so much as "out of the fire and into the pan"
15:34:57 :D
15:35:08 dtantsur: smart people have already answered the question "yes we can, and here's how"
15:35:13 \o/
15:35:18 I think code is already written which makes asyncio and eventlet code work together
15:35:24 using eventlet/aiohub (iirc)
15:35:33 https://github.com/eventlet/aiohub
15:35:53 * dtantsur doesn't want to imagine the potential issues that may arise from it...
15:35:57 hberaud is working on it, along with some others (including itamarst from GR-OSS)
15:36:18 dtantsur: I'm thinking the opposite. I'm looking at the other side of this and seeing any number of "recheck random BS failure" things disappearing
15:36:23 But.. if eventlet stays in some form, so do these options?
15:36:26 dtantsur: I'm telling you, eventlet's status today is miserable
15:36:31 dtantsur: probably, yeah :/
15:36:49 dtantsur: so I am, like, going to pull my -1 off that. I'm not +1/+2 to the change, but I don't have a better idea
15:37:23 Okay, let's see what the quiet people here say :) if someone actually decides it's a good idea, we'll do it; otherwise, I'll silently abandon it the next time I clean up my backlog.
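A rough sketch, not the code under review in 902801, of the reservation idea discussed above: keep a configurable slice of the conductor's worker pool free for user-interactive API work so that periodic and background tasks cannot exhaust it. The option and helper names below are illustrative only; the pool size (300 by default) and the 5% reserve (15 threads) are the figures cited in the discussion.

    import math

    WORKERS_POOL_SIZE = 300        # "how big is the pie" (default cited above)
    RESERVED_WORKERS_PERCENT = 5   # hypothetical knob; 5% of 300 = 15 threads

    def reserved_workers():
        """Number of threads kept back for user-facing API requests."""
        return math.ceil(WORKERS_POOL_SIZE * RESERVED_WORKERS_PERCENT / 100)

    def can_start_background_task(threads_in_use):
        """Admit a periodic/background task only while the reserve stays free."""
        return threads_in_use < WORKERS_POOL_SIZE - reserved_workers()

Each periodic is separately capped (the "1 periodic can use 8 threads" limit above), which is why 100 periodics could still consume 800 threads without some pool-wide reserve like this.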
15:38:21 As another note for open discussion
15:38:30 I believe I'm meeting downstream with a potential doc contractor
15:38:38 that we sorta put in motion with my downstream a few weeks ago
15:38:39 \o/
15:38:49 maybe I'll ask them how to make a decoder ring for 902801 :P
15:38:56 Anything else for open discussion?
15:40:20 Thanks everyone, have a good holiday o/
15:40:22 #endmeeting