| abongale | good morning ironic o/ | 08:23 |
|---|---|---|
| FreemanBoss[m] | good morning ironic | 12:39 |
| cardoe | JayF: naming is hard | 13:07 |
| TheJulia | good morning | 13:09 |
| dtantsur | Hi folks. Does anyone know the CI situation? It looks like we have quite a few of DIB failures, maybe more. | 13:15 |
| TheJulia | I've not looked yet this morning, is it errors reading a qcow image to extract it? | 13:15 |
| dtantsur | I think so | 13:16 |
| TheJulia | presently a known issue, I'll check the ticket once I'm logged into jira | 13:16 |
| TheJulia | hmm, last update from the 21st noting vagrant-virtualbox, and ec2 images are similarly impacted | 13:18 |
| TheJulia | https://issues.redhat.com/browse/CS-2983 | 13:21 |
| TheJulia | But most jobs on friday worked | 13:22 |
| cardoe | TheJulia: so I'm trying to do what you suggested with the notify_controller to kick off the continue_inspection from inspect_hardware. Wondering if I should drop the node lock in inspect_hardware so that continue_inspection can grab it or if I should use after_spawn | 13:39 |
| * dtantsur | is very scared | 13:40 |
| cardoe | And no rush on answering. Just wanted to throw this out there. | 13:40 |
| TheJulia | That seems a bit excessive | 13:40 |
| cardoe | dtantsur: well the goal was that redfish inspection now behaves identical to agent inspection | 13:40 |
| TheJulia | you're scaring the owlet ;) | 13:40 |
| TheJulia | ... or am I?!? | 13:41 |
| * TheJulia | is worried now | 13:41 |
| dtantsur | cardoe: the problem with all these notify_* calls is that if they fail, nothing will ever retry them. | 13:41 |
| dtantsur | for the agent inspection, IPA will | 13:41 |
| dtantsur | if you use it from Ironic code.. well.. it's on you somehow | 13:41 |
| * dtantsur | expands the feathers to look bigger | 13:42 |
| TheJulia | heh | 13:42 |
| cardoe | dtantsur: well unfortunately a lot of inspection behavior that should likely be common is now embedded in the rpc side of the conductor in the continue_inspection call. | 13:43 |
| cardoe | I can pull that out to something common if that's more preferred. | 13:43 |
| dtantsur | I vote for refactoring, yeah | 13:43 |
| cardoe | I've got a pile of refactors then so help me by merging some cause they're all very boring mechanical moving ones. | 13:44 |
| cardoe | So like here's one that I think is more correct testing wise... but then again maybe not? https://review.opendev.org/c/openstack/ironic/+/958370 | 13:45 |
| cardoe | I noticed agent inspection has a bug in some of the tests that check things of the task.node object but manipulations are made after the last node.save() and therefore the test case isn't what's actually written to the DB. | 13:45 |
| cardoe | So that patch attempts to avoid that condition in redfish. | 13:46 |
| cardoe | I've got another one where I add _get_$THING_info() for each $THING in the redfish. | 13:47 |
| cardoe | I think Julia suggested that one. | 13:47 |
| cardoe | My last big scary one is that I refactored the whole inspector folder and renamed a bunch of stuff. | 13:47 |
| cardoe | inspector/interface.py -> inspector/base.py since we import it as common and base everywhere. I made base just have the abstract class and added a bunch of tests. | 13:48 |
| dtantsur | Rename all the things \o/ | 13:48 |
| cardoe | In that refactor I moved the property validation to be a common operation so that all inspection interfaces must comply with the ESSENTIAL_PROPERTIES check. | 13:48 |
| cardoe | Which removed code out of all the other inspectors except agent inspector since it didn't validate them. | 13:49 |
| cardoe | But Julia said we probably don't need to validate properties anymore. | 13:49 |
| cardoe | But the behavior on agent was different than the other inspectors because it didn't ensure properties got set. | 13:49 |
| cardoe | Anyway, just need an idea of which direction to go. | 13:49 |
| cardoe | I wanna get as much of this upstream and out of our downstream as possible. | 13:50 |
| cardoe | We're gonna be adding out-of-band inspection as a service step as well. | 13:51 |
| dtantsur | Inspection as a service step is a neat idea, although I'm curious what you're going to do with ports | 14:01 |
| * TheJulia | looses he rbrain | 14:02 |
| TheJulia | brain | 14:02 |
| opendevreview | cid proposed openstack/ironic master: Fix improper HTTP status code usage (RFC 7231) https://review.opendev.org/c/openstack/ironic/+/958454 | 14:05 |
| cardoe | dtantsur: so we want to have two different types of inspections... one which creates ports and updates them for an "initial discovery" type inspection and then another that's the "service step inspection" | 14:05 |
| iurygregory | I've updated the report for the bug triage, I probably wont be able to join the upstream meeting (I have a conflict with a downstream meeting) | 14:06 |
| cardoe | The idea is someone in the DC might have just touched the box or maybe we periodically do it just cause. And we just make sure that things are still the same. | 14:06 |
| cardoe | iurygregory: the IPE patch that I wanna see land too is held up because of CentOS 9 stuff. | 14:06 |
| iurygregory | cardoe, hey o/ | 14:11 |
| dtantsur | cardoe: yeah, I get requests for updating inspection data too | 14:11 |
| iurygregory | I will update a few things on it today | 14:11 |
| iurygregory | to re-use redfish part a bit more | 14:12 |
| opendevreview | Merged openstack/ironic master: Add request logging middleware for API requests https://review.opendev.org/c/openstack/ironic/+/958103 | 14:18 |
| opendevreview | Merged openstack/ironic bugfix/30.0: Memoize calls to bcrypt.checkpw https://review.opendev.org/c/openstack/ironic/+/958315 | 14:18 |
| opendevreview | Merged openstack/ironic master: Orphaned accelerators after devices removed https://review.opendev.org/c/openstack/ironic/+/956060 | 14:19 |
| opendevreview | Merged openstack/ironic master: redfish: mechanical moves of inspection tests https://review.opendev.org/c/openstack/ironic/+/958370 | 14:19 |
| cardoe | oh hey that merged. Thanks all. :-D | 14:25 |
| cardoe | If ya want another super simple one https://review.opendev.org/c/openstack/ironic/+/958371 that'll make my use of mypy across the inspection code happy. | 14:26 |
| * TheJulia | blinks | 14:32 |
| opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955625 | 14:58 |
| opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955713 | 15:00 |
| TheJulia | hmm | 15:03 |
| TheJulia | #startmeeting ironic | 15:03 |
| opendevmeet | Meeting started Mon Aug 25 15:03:46 2025 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:03 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:03 |
| opendevmeet | The meeting name has been set to 'ironic' | 15:03 |
| TheJulia | o/ | 15:03 |
| clif | o/ | 15:03 |
| kubajj | o/ | 15:04 |
| cardoe | o/ | 15:05 |
| TheJulia | So who is going to chair our meeting this week? Or are we all going to need to have more coffeeeeee? | 15:05 |
| mostepha | o/ | 15:05 |
| TheJulia | \o | 15:05 |
| TheJulia | I guess I'm chairing today | 15:05 |
| TheJulia | Welcome to this week's Ironic meeting! | 15:06 |
| TheJulia | Our agenda can be found on the wiki! | 15:06 |
| TheJulia | #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_August_25.2C_2025 | 15:06 |
| cid | o/ | 15:06 |
| JayF | o/ | 15:06 |
| TheJulia | As for our standard reminder, please review priority items labeled with the hashtag ironic-week-prio. A dashboard link is available for this purpose. | 15:06 |
| TheJulia | #link https://review.opendev.org/q/hashtag:%22ironic-week-prio%22+(status:open) | 15:06 |
| TheJulia | This week is R-5 on the release schedule. | 15:07 |
| TheJulia | #link https://releases.openstack.org/flamingo/schedule.html | 15:07 |
| TheJulia | As a reminder! This week other projects freeze. Final release for client libraries occur, requirements also freeze, and well, yeah. You can see the full list on the schedule | 15:08 |
| TheJulia | Next week! Cycle highlights are due, so we should also consider composing our prelude for this release | 15:08 |
| TheJulia | Based upon my experience over the years, we're likely going to have to release in 2 weeks. | 15:09 |
| JayF | I'll take a look at cycle highlights | 15:09 |
| JayF | #action JayF to do cycle highlights | 15:09 |
| TheJulia | #chair jayf | 15:09 |
| opendevmeet | Current chairs: TheJulia jayf | 15:09 |
| JayF | #action JayF to do cycle highlights | 15:09 |
| TheJulia | Does anyone else have anything to remind us of or announce this week? | 15:09 |
| TheJulia | I guess not, onward! | 15:10 |
| TheJulia | #topic Working Group Updates | 15:10 |
| TheJulia | Our first update I'll be proxying for the Standalone networking working group. | 15:10 |
| TheJulia | alegacy messaged me with a scheduled message and let me know that he made progress but feels he will need to revise the spec. Basically they realized a chicken/egg issue that we're already aware of in ironic, but I suspect we never clearly documented. tl;dr how do you get the data automatically. | 15:12 |
| TheJulia | In terms of Eventlet removal! | 15:12 |
| JayF | I move that we rename the Eventlet WG the Eventlet is dead partytime group | 15:13 |
| JayF | ;) | 15:13 |
| TheJulia | It feels like we're in final clean-up stage, where we may be partying and forgetting any final clean-ups. | 15:13 |
| TheJulia | For any final cleanups, let's go ahead and get them tracked and raise awareness if we're aware of any. | 15:13 |
| JayF | request logging fix landed this morning, there's a couple of dev environment cleanups I have up that (I think?) have not yet landed | 15:13 |
| TheJulia | That matches my awareness | 15:14 |
| JayF | https://review.opendev.org/c/openstack/ironic/+/958249 | 15:14 |
| TheJulia | I think we were also going to pull the wsgi entry point? | 15:14 |
| JayF | Ah, that's a good one too | 15:14 |
| TheJulia | or am I incorrectly presenting that one? | 15:14 |
| TheJulia | (why? I don't know?!) | 15:14 |
| JayF | I think that's accurate, but I don't think it's tied with eventlet directly | 15:14 |
| JayF | so much as tied with "remove old crap" :) | 15:15 |
| TheJulia | Yeah | 15:15 |
| TheJulia | Okay, anything else regarding working group updates? | 15:15 |
| JayF | For eventlet WG | 15:15 |
| JayF | we should mention we are encouraging contributors to set aside some meaningful time | 15:15 |
| JayF | to perform basic testing locally, if you have hardware test against it | 15:16 |
| JayF | the weirder the use case tested the better :) | 15:16 |
| TheJulia | I concur | 15:16 |
| JayF | also 👏 to our CNCF counterparts for helping us find the first (only?) major regression | 15:16 |
| TheJulia | #agreed Contributors should try to take time to exercise the "weirder" use cases to observe if they have found any regressions. | 15:17 |
| TheJulia | ++ | 15:17 |
| TheJulia | FWIW, I did sit down and just verify the vnc console stuff was working still last week and I was able to confirm it was happy as a single process. | 15:17 |
| JayF | I would *really* like testing against ilo driver | 15:18 |
| JayF | but I don't personally have access to that hardware so :/ | 15:18 |
| TheJulia | Unfortunately, I have no ilo gear | 15:18 |
| TheJulia | I *suspect* we may have another issue brewing, of sorts, with the redfish cache, but I've not sat down to explore or inspect logs | 15:19 |
| TheJulia | (session cache) | 15:19 |
| kubajj | TheJulia: I recently figured out that we do have some ilo nodes we could test stuff on | 15:20 |
| TheJulia | Anyhow, it looks like we have no discussion topics today, so we can proceed to the Bug Deputy updates if there is nothing else | 15:20 |
| JayF | it might be a good idea to chat about portgroup.physical_network; but that can wait until open discussion | 15:21 |
| TheJulia | Yeah, it would :) | 15:21 |
| TheJulia | kubajj: if y'all can, that would be much appreciated. | 15:21 |
| * TheJulia | waits for a moment to see if there is anything eventlet/testing related before we proceed | 15:21 |
| kubajj | TheJulia: although I am not 100% familiar with the potential issues that could arise | 15:22 |
| TheJulia | kubajj: largely the removal of eventlet changes the threading model to use... actual threads. It increases memory usage and may expose issues in drivers doing anything odd. Because threads will basically become blocking, the aspects eventlet monkeypatches, largely around socket/thread interactions, can appear as pain points | 15:23 |
| kubajj | TheJulia: I assume this involves both ironic and the IPA? | 15:24 |
| TheJulia | kubajj: Eventlet was removed from both, IPA is less multi-threaded and is much simpler in many ways. Ironic at its core is *really* where it is at. | 15:25 |
| JayF | It's hard for me to imagine we have any edges in IPA that haven't been tested | 15:26 |
| kubajj | TheJulia: ok, we will try to build Ironic and IPA from master and see what fails | 15:26 |
| JayF | We made a point to test image streaming + failures when building that feature which I think is the edgiest case there | 15:26 |
| JayF | but IMBW :) | 15:26 |
| TheJulia | kind of yeah, and also there is an easy fallback path for IPA, load a slightly older image up and roll forward while we work a bug ;) | 15:26 |
| TheJulia | #topic Bug Deputy Updates | 15:27 |
| TheJulia | iurygregory: you're up if you're around | 15:27 |
| iurygregory | quick updates | 15:27 |
| iurygregory | 3 new bugs during the week | 15:27 |
| iurygregory | all have patches up =) | 15:27 |
| TheJulia | I think 2 are actually "resolved" for now | 15:27 |
| iurygregory | some are even merged | 15:27 |
| iurygregory | yeah | 15:27 |
| TheJulia | cool cool | 15:27 |
| TheJulia | Anything else? Would anyone like to be our bug deputy for this week? | 15:29 |
| clif | I'll volunteer | 15:29 |
| clif | assuming no one else is in line already | 15:30 |
| TheJulia | #action clif to be our bug deputy for this week! | 15:30 |
| clif | \o/ | 15:30 |
| TheJulia | Well, since we have no RFEs to review this week! | 15:31 |
| TheJulia | #topic Open Discussion | 15:31 |
| kubajj | anybody knows what's up with IPA CI? | 15:31 |
| TheJulia | kubajj: is it blowing up a lot in dib's extract-image logic ? | 15:32 |
| JayF | #note CI for anything that needs a CentOS image, including IPA CI and 4k test jobs in Ironic, are failing due to upstream CentOS issues. | 15:32 |
| JayF | TheJulia: yes | 15:32 |
| TheJulia | yeah, that! | 15:32 |
| JayF | I noticed it last week when 4k job was busted | 15:32 |
| JayF | figured no need to pile on and be too loud about it | 15:32 |
| TheJulia | They already have multiple groups of folks being loud on the ticket | 15:32 |
| TheJulia | #link https://issues.redhat.com/browse/CS-2983 | 15:33 |
| clif | I'm a bit perturbed it hasn't been addressed yet | 15:33 |
| kubajj | I am not sure if it is that, but could be https://zuul.opendev.org/t/openstack/build/5a5e32685bb14ace847a3b9f8fa0d07a | 15:33 |
| TheJulia | So, as for that physical_network topic! | 15:33 |
| TheJulia | clif: as am I.... | 15:33 |
| JayF | looking at https://docs.openstack.org/ironic/latest/admin/networking.html#terminology | 15:33 |
| JayF | plus the fact that we can loosen things up later, but it's not as easy to tighten them up later | 15:34 |
| JayF | I think it makes sense to make portgroup.physical_network either be inherited or be limited to values present on their underlying ports | 15:34 |
| JayF | the part that makes this hairy is that I suspect it's possible to have portgroup X containing ports A and B, and having ports A and B have different physical networks | 15:35 |
| JayF | even though that config wouldn't make sense, if the data exists we'd have to handle it | 15:35 |
| JayF | does that match cardoe and TheJulia concerns/thoughts? | 15:35 |
| * JayF | tried to catch up this morning :D | 15:35 |
| TheJulia | I think having a portgroup spanning two physically different portgroups is *really* a bad idea... but it's sort of kind of valid modeling wise | 15:37 |
| TheJulia | because you *could* sort of say the portgroup exists on *both* | 15:37 |
| TheJulia | I mean... not ideal, likely model breaking overall, but sort of possible. | 15:37 |
| TheJulia | And even then, odds are you *really* wouldn't have a proper LACP config in such a case | 15:38 |
| JayF | my question is | 15:38 |
| JayF | is there ever a case where that ^^ makes sense | 15:38 |
| JayF | that doesn't involve "operator physically moves the cable from X fabric to Y" | 15:38 |
| JayF | I can't find one. | 15:38 |
| JayF | and if the physical world changes, asking people to update the API isn't just OK; it's the most sensible thing | 15:38 |
| TheJulia | I think if someone came with a case where they said it makes sense, I'd challenge them to explain why. I mean, I guess I can see a sort of meta question to this argument, I'm viewing it as a shorthand, but there are other fields in this model they could also use | 15:39 |
| TheJulia | I'm thinking inherit and enforce, and maybe we put a config option around the validation | 15:39 |
| JayF | I'd say just lock down by default | 15:39 |
| TheJulia | and if someone does something like that blow up/detonate/scream/etc | 15:39 |
| TheJulia | and re-iterate in the docs | 15:39 |
| TheJulia | cardoe: thoughts^ | 15:39 |
| JayF | if you wanna set portgroup.physical_network, it needs to be one that's on port.physical_network | 15:39 |
| TheJulia | I'm good with that, personally | 15:39 |
| JayF | if there's ever a mismatch when we go to do things, go boom | 15:39 |
| TheJulia | yeah | 15:40 |
| JayF | *then if later needed* add the toggle switch and implement the other case | 15:40 |
| JayF | I'd rather be able to make assumptions about the data if the use case isn't obvious | 15:40 |
| TheJulia | fair | 15:41 |
| TheJulia | clif: seems like you'd need to add more validation logic (if you haven't already) | 15:41 |
| TheJulia | (It occurs to me cardoe has a meeting overlap around now on mondays as well...( | 15:42 |
| TheJulia | ) | 15:42 |
| cardoe | Sorry I had to step away and getting caught up. | 15:42 |
| JayF | TheJulia: clif: should we also add enforcement the other way: if someone updates port.physical_network to a different value than its portgroup, is that OK? Should that blank the portgroup value? | 15:42 |
| cardoe | Yes. I've got the show slide decks to senior leadership about project status call right now. | 15:42 |
| JayF | Or is this just a safeguard on the way in | 15:42 |
| clif | can a port belong to more than one portgroup? | 15:43 |
| clif | and once a port is in a portgroup what do we currently do if the port changes somehow? | 15:43 |
| cardoe | JayF: you summed it up well. | 15:43 |
| TheJulia | clif: no, it can't belong to more than one afaik | 15:44 |
| cardoe | yeah it needs to be removed afaik | 15:44 |
| TheJulia | so there is more guardrail logic needed on the port, but I think that could also be added separately | 15:44 |
| TheJulia | as long as we don't forget it. | 15:44 |
| cardoe | I'd like it if, when 2 ports are part of the same PortGroup and someone tries to update the physical_network on 1 Port to differ from the other Port, they get an error. That's actually a real jira I've got here. | 15:45 |
| TheJulia | Actually, that... could make a lot of sense, use the portgroup as the aggregate to reset the ports | 15:46 |
| clif | so if physical network is something derived/inherited from member ports, is category as well? | 15:46 |
| TheJulia | since... changing them is sort of a PITA separately. | 15:46 |
| cardoe | TheJulia: That would be good with me. | 15:46 |
| TheJulia | category is for humans, not the code ;) | 15:46 |
| TheJulia | at least, I think it is for humans | 15:46 |
| JayF | my real concern about this suggestion | 15:47 |
| JayF | is that we don't have an easy way in the client to do that update atomically, do we? | 15:47 |
| JayF | I guess you can't update portgroup/port atomically in our API at all | 15:47 |
| TheJulia | Going back to the other open discussion topic, I've politely asked for an ETR as to when they expect to have the mirrors resolved | 15:47 |
| * TheJulia | is tempted to also increase the issue's priority | 15:47 |
| JayF | so it'd have to be a side-effect of updating portgroup.physical_network to update ports[].physical_network ... which seems OK? | 15:47 |
| JayF | as long as we document it loudly :) | 15:48 |
| TheJulia | I think it would be a super logical side-effect | 15:48 |
| cardoe | JayF: yeah if we said to update physical_network on Ports in a PortGroup, you must update the PortGroup. | 15:48 |
| TheJulia | And doing that is also way more user friendly, really | 15:48 |
| TheJulia | Just, needs to be documented, very loudly. | 15:48 |
| JayF | would we allow port.physical_network to be manually updated if on the relevant microversion? | 15:48 |
| JayF | My feeling is NO | 15:48 |
| JayF | because that could cause inconsistency | 15:49 |
| cardoe | I'm a no as well. | 15:49 |
| TheJulia | Also likely with a "expect to wait for a little bit for the ironic-neutron-agent to pickup the change | 15:49 |
| JayF | but we /could/ permanently break some (weird and maybe unrealistic, as noted above) workflows that might need ports on different physnets in a group | 15:49 |
| TheJulia | Portgroups are a bit of a weird edge case, so I think its a "if set on the port but not portgroup, its okay to reset/change" | 15:50 |
| TheJulia | but the moment its set it would need to be set the same across all ports | 15:50 |
| TheJulia | and if you're changing one, you need to change them all | 15:50 |
| JayF | To summarize: | 15:50 |
| JayF | - When we update portgroup.physical_network, update member ports.physical_network | 15:50 |
| JayF | - On that same microversion boundary, disallow update of port.physical_network when a member of a portgroup (with a friendly directive error IMO) | 15:50 |
| JayF | - Document the hell out of this behavior change in release notes and anywhere we mention physical_network | 15:50 |
| TheJulia | *unless* you remove the port from the portgroup | 15:50 |
| JayF | so there's one case left that I'm unsure about | 15:51 |
| TheJulia | I believe that sums it up, aside from possibly permitting in the case of if portgroup.physical_network is not already set | 15:51 |
| TheJulia | They are not in a great state, but it just means they aren't using the portgroup.physical_network field at all | 15:52 |
| TheJulia | (That is less likely to be breaking to anyone as well) | 15:52 |
| JayF | if I have a preexisting db with portgroup X containing A and B ports, and they are *already on* different physical_networks .... it's just okay? | 15:52 |
| opendevreview | Jakub Jelinek proposed openstack/ironic-python-agent master: Fix skip block devices for RAID arrays https://review.opendev.org/c/openstack/ironic-python-agent/+/937342 | 15:52 |
| JayF | sounds like based on what you said, we allow that | 15:52 |
| TheJulia | JayF: I'll file under "not great" | 15:52 |
| clif | so are we saying that portgroup.physical_network is its own attribute/member but its value(s) are logically limited to its members | 15:52 |
| clif | ? | 15:52 |
| JayF | well you want to enable not-great as long as portgroup.physical_network is null (which makes sense and is less breaky) | 15:52 |
| TheJulia | yeah, its "not great" and if they try to use this feature then we need to break... err... I mean guide them | 15:52 |
| JayF | clif: so think about physical network as actually reflecting a switch (fabric) in the real world | 15:53 |
| JayF | there is *never* a case we can figure out where two ports would be bonded (portgrouped) without being on the same switch fabric | 15:53 |
| JayF | but we cannot assume there's not some downstream somewhere abusing that field for something else, or using it in a patch, etc, so we're trying to be as not-heavy-handed as possible | 15:53 |
| JayF | update of this item: - On that same microversion boundary, disallow update of port.physical_network when a member of a portgroup that has portgroup.physical_network set (with a friendly directive error IMO) | 15:54 |
| cardoe | Agreed. | 15:54 |
| JayF | cardoe might be that downstream /s | 15:54 |
| JayF | lol | 15:54 |
| TheJulia | time check, 5 minutes | 15:54 |
| JayF | clif: did that help? | 15:55 |
| clif | I think so | 15:55 |
| JayF | in a perfect world modelled like we expect ports[].physical_network to never be different but we're used to people abusing our fields for other things | 15:55 |
| JayF | SGTM. If you'll update that patch we can have a look. | 15:56 |
| JayF | I think this issue is at an end, and probably the whole meeting? | 15:56 |
| JayF | Anything else on this topic or open discussion in general? | 15:56 |
| TheJulia | I *suspect* we're good.... | 15:56 |
| * TheJulia | expects someone to do the typical new topic right as we end the meeting ;) | 15:57 |
| JayF | Oh, one more thing! | 15:57 |
| cid | :D | 15:57 |
| JayF | #endmeeting | 15:57 |
| JayF | #endmeeting ironic | 15:57 |
| JayF | I thought I was chair? | 15:57 |
| TheJulia | ohh, interesting | 15:57 |
| TheJulia | I think because it is case matching | 15:57 |
| TheJulia | doh! | 15:57 |
| TheJulia | #endmeeting | 15:57 |
| opendevmeet | Meeting ended Mon Aug 25 15:57:52 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:57 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-25-15.03.html | 15:57 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-25-15.03.txt | 15:57 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-25-15.03.log.html | 15:57 |
| TheJulia | its all good | 15:58 |
| JayF | oh wow lol | 15:58 |
| TheJulia | It recorded your stuffs in the notes so \o/ | 15:58 |
| JayF | https://review.opendev.org/c/openstack/ironic/+/958249 cores please land these? Trivial review... | 16:13 |
| TheJulia | done | 16:24 |
| JayF | the second patch in that stack is a regression fix, too | 16:25 |
| TheJulia | JayF: why are you giving 2M to each thread? | 16:25 |
| TheJulia | is your machine *really* needing 2M per thread to launch?! | 16:26 |
| JayF | I took the number and made it bigger | 16:26 |
| JayF | that's as far as I went to pick the right number | 16:26 |
| * TheJulia | blinks | 16:26 |
| TheJulia | wow | 16:26 |
| TheJulia | That is a bit frightening | 16:27 |
| TheJulia | but its just for local testing | 16:27 |
| JayF | I assure you for non-janky-tox-dev-environment I pay more attention | 16:27 |
| TheJulia | folks can scream if we need to increase the default | 16:27 |
| JayF | also I assumed that was in bytes | 16:27 |
| TheJulia | yeah, it is | 16:27 |
| TheJulia | so, 2.5-ish | 16:28 |
| JayF | 39 times the default, yeah, that was an off-by-order-of-magnitude error and not intentional | 16:29 |
| JayF | and 2x does the trick, updating the patch | 16:29 |
| TheJulia | ok | 16:30 |
| JayF | I was so flippant because I had a memory of doing like, 4x or something, then I did the math and realized it was ridic | 16:30 |
| opendevreview | Jay Faulkner proposed openstack/ironic master: Fix local-ironic-dev by setting stack size https://review.opendev.org/c/openstack/ironic/+/958250 | 16:31 |
| JayF | yep, two off-by-one errors: 65535*4 + add a zero on the end = the number that we have. So I did 4x the wrong number (off by one) and added a digit (off by another one) | 16:33 |
| TheJulia | heh | 16:34 |
| TheJulia | okay | 16:34 |
| TheJulia | I *know* kernels can greatly impact thread stack starting size | 16:34 |
| JayF | I'm using gentoo-kernel-bin | 16:35 |
| JayF | which uses a very lightly modified fedora config | 16:35 |
| TheJulia | I had what was functionally a single threaded database which would launch the engine in its own thread but then mutex across the client connections and it was super sensitive to that | 16:35 |
| TheJulia | my debian machine seems to be fine with it, but my fedora machine does seem to want it to be slightly higher | 16:36 |
| TheJulia | but... it worked on my fedora machine without issues previously | 16:36 |
| JayF | I wonder if that is an indication we should bump the default overall. | 16:36 |
| TheJulia | I'm wondering the same | 16:36 |
| JayF | That's basically "the smallest possible memory footprint of a distinct thread", yea? | 16:37 |
| JayF | I don't think it's silly to pay 64k*300 threads (19.2MB) RAM to make it work by default in more places | 16:37 |
| TheJulia | sort of, its the default starting amount | 16:38 |
| TheJulia | which then gets allocated and not necessarily used | 16:38 |
| JayF | so as I said, in default thread count, it's ~20MB of ram additional | 16:38 |
| TheJulia | So, its 64k now, afaik | 16:38 |
| JayF | yeah, so 19.2MB of ram in stack sizes for all 300 threads if we are at max | 16:38 |
| TheJulia | so, doubling it seems reasonable to me, fwiw | 16:38 |
| JayF | and 2x that (an extra 19.2M) beyond that is worth it | 16:39 |
| JayF | ack I'll abandon the local-ironic-dev change and push that update now() | 16:39 |
| TheJulia | k | 16:39 |
| opendevreview | Jay Faulkner proposed openstack/ironic master: Raise default IRONIC_DEFAULT_THREAD_SIZE https://review.opendev.org/c/openstack/ironic/+/958461 | 16:42 |
| *** | melwitt_ is now known as jgwentworth | 16:43 |
| *** | jgwentworth is now known as melwitt | 16:44 |
| * dtantsur | just out of a reboot and with little scrollback because bloody computers | 16:48 |
| JayF | it's okay, we just got done agreeing to rewrite in ruby on rails | 16:48 |
| JayF | otherwise you missed nothing important | 16:48 |
| JayF | ;) | 16:48 |
| TheJulia | JayF: you forgot the best part, the explicit mandate to not have any single way to do addition. | 16:53 |
| JayF | thanks for discussion.add(that comment) | 16:54 |
| JayF | The first "real" programming language I dabbled in was ruby: I said they had "optimistic driven design" meaning if you imagine a .whatever() exists on an object, it does. | 16:54 |
| JayF | lol | 16:54 |
| dtantsur | Hallucinated things into existence before it became mainstream, right? | 16:59 |
| JayF | lol | 17:01 |
| JayF | I will say: I think `unless` is great | 17:01 |
| * dtantsur | nods | 17:01 |
| dtantsur | TheJulia: I'm looking at this logging snippet from the Ironic start-up and having a nagging feeling that even local calls now go through RPC: https://paste.opendev.org/raw/b08Twv3p9IZtL67Ge6YO/ | 17:06 |
| dtantsur | Please prove me wrong :) | 17:06 |
| dtantsur | Well, I cannot be wrong, can I? It's a singleprocess Ironic, there is no way to distinguish between API and conductor. | 17:07 |
| dtantsur | So, anything that does objects.BlahBlah.blah_blah goes through RPC. Even the conductor start-up code that checks allocations and stuff. | 17:07 |
| * dtantsur | still hopes TheJulia proves him wrong somehow | 17:08 |
| TheJulia | you're not wrong | 17:08 |
| dtantsur | damn | 17:08 |
| TheJulia | And that is exactly what is happening because you cannot distinguish between the two, really | 17:09 |
| TheJulia | There is a catch in the request handling logic for this, but yeah | 17:09 |
| TheJulia | The entire model technically allows *remote* conductors as well, but that is something we've never really done any thinking about aside from crazy idea pondering back in ?2023? | 17:09 |
| dtantsur | Okay, so I guess my idea with local RPC was pretty bad in the end, even if it let us progress quickly. I'll find a way to reap it out. | 17:10 |
| TheJulia | honestly... | 17:10 |
| dtantsur | I don't think remote conductors will work in practice: our hash ring code also relies on database/objects, and hash ring is required for RPC :) | 17:10 |
| dtantsur | Chicken->JSON->Egg | 17:10 |
| TheJulia | I'd update the conductor launch process to reset the indirection flag on the primary object in the conductor start service | 17:11 |
| dtantsur | I'll do this if I don't find a way to get rid of all this mess :) | 17:11 |
| opendevreview | Dmitry Tantsur proposed openstack/ironic master: Reduce the number of RPC calls to traits API https://review.opendev.org/c/openstack/ironic/+/958226 | 17:12 |
| TheJulia | effectively undo https://github.com/openstack/ironic/blob/master/ironic/command/singleprocess.py#L63-L64 around https://github.com/openstack/ironic/blob/master/ironic/conductor/base_manager.py#L90 | 17:12 |
| TheJulia | Set it to None, and the objects shouldn't really be loaded yet in the conductor service process | 17:12 |
| TheJulia | but was there *for* the launch of the api surface | 17:13 |
| opendevreview | Merged openstack/ironic master: Fix setting IRONIC_THREAD_STACK_SIZE https://review.opendev.org/c/openstack/ironic/+/958249 | 17:13 |
| dtantsur | Makes sense. But if I can stop doing network calls on every sneeze, I'll go down that path. | 17:13 |
| TheJulia | it just turns off the indirection | 17:13 |
| TheJulia | and that's fine for the conductor, it's the cost to pay for the API | 17:13 |
| TheJulia | I can put up a patch a little later, I'm working on a backport right now | 17:17 |
| dtantsur | TheJulia: please wait, I don't think we should go down this path at all | 17:17 |
| opendevreview | Dmitry Tantsur proposed openstack/ironic master: PoC: launch API in the same process as conductor https://review.opendev.org/c/openstack/ironic/+/958462 | 17:18 |
| dtantsur | Let's see what this ^^ shows | 17:18 |
| dtantsur | Heh, it seems to actually work. This is promising. I'll develop this idea further tomorrow. | 18:03 |
| TheJulia | ok | 18:03 |
| opendevreview | Merged openstack/networking-generic-switch master: Document advanced Netmiko parameters https://review.opendev.org/c/openstack/networking-generic-switch/+/958080 | 19:53 |
| iurygregory | TheJulia, can you W -1 https://review.opendev.org/c/openstack/ironic-prometheus-exporter/+/954870 ? | 19:56 |
| TheJulia | done | 19:56 |
| iurygregory | I'm updating to re-use most of the redfish part | 19:56 |
| iurygregory | tks :D | 19:57 |
| TheJulia | k | 19:57 |
| TheJulia | cid: are you planning on picking up 951055 this week ? | 20:00 |
| * cid goes looking | 20:00 | |
| cid | So that is both a valid LP and a changeset, I guess you mean the change | 20:02 |
| TheJulia | I guess it feels weird because it is only changing the base config, so nothing else actually runs | 20:02 |
| TheJulia | yeah, the change, sorry | 20:02 |
| cid | So JayF wanted to take over that change, if I'm not mistaken | 20:03 |
| JayF | yeah I offered to take it over | 20:05 |
| JayF | did some work on it without a lot of success | 20:05 |
| JayF | I know what the path forward is, writing it on my todo list so I get around to it | 20:06 |
| TheJulia | okay | 20:06 |
| cid | ++ | 20:10 |
| cid | Depending on how much you have on your plate, if you happen to only go as far as having the path forward as a comment on the change, I can also get it in. | 20:10 |
| JayF | the problem is the path forward basically involves deleting and rewriting all our wsgi docs | 20:17 |
| opendevreview | Merged openstack/ironic master: inspection: fix None case for inventory data https://review.opendev.org/c/openstack/ironic/+/958371 | 20:27 |
| TheJulia | yeah | 20:31 |
| opendevreview | Merged openstack/ironic master: Raise default IRONIC_DEFAULT_THREAD_SIZE https://review.opendev.org/c/openstack/ironic/+/958461 | 20:32 |
| JayF | (this also is #ironic-week-prio) RFR https://review.opendev.org/c/openstack/releases/+/958489 [ironic] Cycle highlights for Flamingo/2025.2 | 20:52 |
| TheJulia | I wonder if we can get steve's ngs work landed soon :) | 20:53 |
| TheJulia | (to mention, because he is really moving the needle there, overall) | 20:54 |
| JayF | I had the same thought when I saw it wasn't ready for highlights | 20:54 |
| TheJulia | I think it's more review starvation since... it's ngs | 20:55 |
| JayF | yep | 20:56 |
| JayF | I've taken a look, I don't even remember if I +2'd because I was on the edge of my knowledge the whole time | 20:56 |
| TheJulia | Yeah, I saw, thanks! | 20:56 |
| TheJulia | I need a "don't forget our networking stuffs" sign | 20:56 |
| opendevreview | Verification of a change to openstack/ironic-tempest-plugin master failed: Replace deprecated assertItemsEqual https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/953589 | 21:13 |
| opendevreview | Merged openstack/ironic master: Direct return of vmedia action during in power failure https://review.opendev.org/c/openstack/ironic/+/958147 | 21:22 |
| opendevreview | Doug Goldstein proposed openstack/ironic master: redfish: process inspection rules during inspection https://review.opendev.org/c/openstack/ironic/+/957609 | 23:17 |
| cardoe | uhh I think that delta got really small like that now. | 23:21 |
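
Editor's note on the 17:06 to 17:13 thread: the behaviour dtantsur and TheJulia discuss (every `objects.BlahBlah.blah_blah` call going through RPC in single-process mode, and the suggestion to set the indirection flag to None in the conductor) can be pictured with a toy model. The sketch below is not Ironic or oslo.versionedobjects code; `FakeRPC`, `BaseObject`, `remotable_classmethod` and `Node.list_names` are hypothetical names made up for illustration, and only the general idea of a class-level indirection attribute comes from the conversation.

```python
import functools


class FakeRPC:
    """Hypothetical stand-in for an RPC client; records what was routed."""

    def __init__(self):
        self.calls = []

    def call(self, func, cls, args, kwargs):
        # A real client would serialize the request and send it over the
        # network to a conductor. Here we only record the call and run it.
        self.calls.append(func.__name__)
        return func(cls, *args, **kwargs)


class BaseObject:
    # Class-level switch shared by all objects in the process. When it is
    # not None, remotable methods are forwarded through it.
    indirection_api = None


def remotable_classmethod(func):
    """Route a class method through the indirection API when one is set."""

    @functools.wraps(func)
    def wrapper(cls, *args, **kwargs):
        api = BaseObject.indirection_api
        if api is not None:
            return api.call(func, cls, args, kwargs)
        return func(cls, *args, **kwargs)

    return classmethod(wrapper)


class Node(BaseObject):
    @remotable_classmethod
    def list_names(cls):
        # Pretend this is a direct database query.
        return ["node-0", "node-1"]


if __name__ == "__main__":
    rpc = FakeRPC()

    # API side: indirection is on, so the call is routed via "RPC".
    BaseObject.indirection_api = rpc
    Node.list_names()

    # Conductor side after the reset TheJulia describes: the call is local.
    BaseObject.indirection_api = None
    Node.list_names()

    print(len(rpc.calls), "call(s) went through the fake RPC client")
```

Because the switch in this toy lives on one shared class, a single process cannot have it both set for the API and unset for the conductor at the same time, which is the difficulty dtantsur points out at 17:07.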