opendevreview | OpenStack Proposal Bot proposed openstack/ironic-ui master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/956852 | 03:34 |
---|---|---|
janders | TheJulia I tested the patch, thank you for working on this. With the patch, when I run abort against a node in "service failed" it lands right back in "active" without any node reboots. Without the patch using abort wasn't possible. Brief test report: https://paste.openstack.org/show/buxFPAzyKBdaJmpOSPmr/ | 05:40 |
janders | I can't test servicing end to end on latest master due to some regression (likely NIC firmware update related, I will try to track this down with Iury this week). However I wonder if we need to think about a case where someone may use this to interrupt an in-flight firmware upgrade. Do you think there is any scenario where running an "abort" command | 05:44 |
janders | can trigger a node reboot when it's performing a BIOS upgrade with a big "update in progress, DO NOT REBOOT" sign across the console? | 05:44 |
janders | In my testing I saw no reboots, just a state transition. But I wasn't able to test this 100% due to the current (unrelated) regression. | 05:44 |
janders | Abort won't be accepted in the -ing state. It works in service failed state. I expect it will work in service wait state (which I think is the state we are in when the actual BIOS update is running) | 05:45 |
janders | in any case this is not necessarily a direct concern about the patch, we may just need to put a big red warning in the doco advising the user to observe console output when using this verb. It's not much different from manual recovery from a manual firmware upgrade really. | 05:49 |
rpittau | good morning ironic! o/ | 06:31 |
abongale | good morning | 07:38 |
queensly[m] | Good morning o/ | 07:43 |
iurygregory | good morning ironic | 12:04 |
opendevreview | Merged openstack/sushy-tools master: Add SUSHY_EMULATOR_VIRTUAL_MEDIA_IP_FAMILY environment variable support https://review.opendev.org/c/openstack/sushy-tools/+/956904 | 12:04 |
TheJulia | janders: as-is I don't see a reason to reboot the node nor one that it would reboot a node in such a case. that being said, at a minimum it might be wise to add comments as such into the code | 13:04 |
TheJulia | janders: on some level we can't perfectly guard every user, the vendors on some level need to self guard issues like rebooting in their BMCs if the software update is in a critical spot | 13:05 |
TheJulia | also, good morning folks | 13:18 |
JayF | janders: we will need to ensure that, for instance, if we go into service fail while booted into the ramdisk that we do get rebooted back into the tenant OS. Similarly with networking. | 13:27 |
JayF | janders: so I don't know if the no reboot was considered a good or bad case, but it just sounded scary reading it and passing | 13:27 |
JayF | **in passing | 13:27 |
TheJulia | Well | 13:27 |
TheJulia | we kick the node back, we toggle the networking, we don't know the OS state nor the steps supplied so we can't make assumptions. so really it is all we can do because we don't know what was done | 13:28 |
JayF | Then we have to reboot every time. | 13:28 |
TheJulia | in janders' case, an update is still running | 13:28 |
JayF | Putting the machine in active while it still has a RamDisk booted is a security risk | 13:28 |
TheJulia | and we simply can't say "reboot" without risking the overall health of the machine | 13:28 |
TheJulia | well, not really. It's locked out | 13:28 |
JayF | Because you're exposing the IPA ramdisk to the tenant's Network | 13:29 |
TheJulia | which needs credentials the user can't get which are then also no longer valid on the conductor | 13:29 |
TheJulia | and there is no guarantee the host ever had IPA running | 13:29 |
JayF | You're making a really invalid assumption though: in my environment The IPA ram disk is not always held to the same security requirements as a tenant OS. Locked out or not, an IPA booted node ending up in a tenant Network would likely be considered a security incident | 13:30 |
JayF | So it's not a matter of if we know IPA is running or not, in my mind it's a matter of we cannot allow any cases where we hand the node back to the customer network with IPA running on it | 13:30 |
JayF | At least in any multi-tenant networking setups | 13:30 |
TheJulia | Fair, so then there is no way out short of explicitly rebooting | 13:30 |
JayF | Yeah and I know that is not awesome for a number of reasons... | 13:31 |
TheJulia | I guess | 13:33 |
TheJulia | actually there is a way | 13:33 |
TheJulia | If we see a token, then we know we had an agent | 13:33 |
TheJulia | if we don't, then we can assume OOB steps were only executed | 13:33 |
TheJulia | That likely requires a specific flag or something, but the same basic idea | 13:34 |
TheJulia | Which might already exist, actually | 13:34 |
JayF | That sounds like a very good guardrail | 13:51 |
JayF | Not to mention just generally ironic should have some idea whether it tried to boot an agent or not | 13:52 |
JayF | And if we don't have that idea, we can add like an agent booted at timestamp somewhere or something 😂 | 13:52 |
TheJulia | we definitely have some, but yeah | 13:58 |
dtantsur | We have last heartbeat timestamp? | 13:58 |
TheJulia | we do | 13:58 |
JayF | The thing that's really nice about choosing a timestamp is you can potentially make it configurable (even with a patch if we didn't want to do it upstream) to adjust the threshold to match your security paranoia | 13:59 |
JayF | Or I guess you could say was last heartbeat before provision state updated at | 13:59 |
JayF | But I don't think we keep that information separately | 13:59 |
TheJulia | I think the idea might be "if it has ever heartbeated, it must be rebooted" | 14:04 |
JayF | even better | 14:52 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: grenade: restart neutron services https://review.opendev.org/c/openstack/ironic/+/956801 | 14:56 |
TheJulia | #startmeeting ironic | 15:00 |
opendevmeet | Meeting started Mon Aug 11 15:00:33 2025 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'ironic' | 15:00 |
TheJulia | o/ | 15:00 |
alegacy_ | p | 15:00 |
kubajj | o/ | 15:00 |
alegacy_ | o/ | 15:00 |
TheJulia | who is chairing today? :) | 15:00 |
iurygregory | o/ | 15:01 |
iurygregory | I can | 15:02 |
queensly[m] | o/ | 15:02 |
TheJulia | #chair iurygregory | 15:02 |
opendevmeet | Current chairs: TheJulia iurygregory | 15:02 |
iurygregory | Hello everyone, welcome to our weekly meeting o/ | 15:02 |
JayF | o/ | 15:02 |
iurygregory | you can find our agenda in the wiki | 15:02 |
iurygregory | #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting | 15:02 |
iurygregory | #topic Announcements / Reminders | 15:03 |
iurygregory | #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash | 15:03 |
rpittau | o/ | 15:03 |
iurygregory | #info 2025.2 Flamingo Release Schedule https://releases.openstack.org/flamingo/schedule.html We are in the R-7 week, last week we released bugfix branches for some projects | 15:04 |
iurygregory | #info next week (R-6) is the final release for non-client libraries | 15:05 |
iurygregory | Does anyone have something to add for Announcements / Reminders ? | 15:05 |
iurygregory | ok, moving on | 15:06 |
JayF | We don't have anything that counts on that deadline, right? | 15:06 |
TheJulia | sushy, really. | 15:06 |
cid | o/ | 15:07 |
JayF | I didn't know if we treated that via those rules, given it's not shared, but that's what I was curious on really | 15:07 |
iurygregory | yeah | 15:07 |
iurygregory | well, we have cases in the past where we did releases after the deadline .. | 15:07 |
TheJulia | release holds us to those rules, fwiw | 15:07 |
TheJulia | so, if we need anything there, we need to get it sorted. | 15:08 |
TheJulia | If not, well, yeah | 15:08 |
iurygregory | yup | 15:08 |
rpittau | sushy was released one month ago FYI | 15:08 |
iurygregory | tks rpittau ! | 15:08 |
iurygregory | let's just check if there is something we will need to land and see about a new release, but I think we are good | 15:09 |
iurygregory | moving on | 15:09 |
iurygregory | #topic Working Group Updates | 15:09 |
iurygregory | #info Standalone networking https://etherpad.opendev.org/p/ironic-standalone-networking | 15:10 |
alegacy_ | Making good progress. Was hoping to get some WIP branches out this week but realized I needed to redo some of the RPC stuff to compensate for the local-rpc changes. | 15:10 |
alegacy_ | Hoping to test out a few more scenarios and then I will be able to do that. | 15:10 |
alegacy_ | note that I will be on PTO for 2 weeks starting Monday August 25 | 15:11 |
TheJulia | cool cool | 15:11 |
iurygregory | tks alegacy_ | 15:12 |
TheJulia | as an fyi, I will also likely take time off the second week of September. | 15:12 |
iurygregory | #info Eventlet Removal https://etherpad.opendev.org/p/ironic-eventlet-removal | 15:12 |
TheJulia | Yes, I need to update the etherpad | 15:13 |
TheJulia | the tl;dr is we have a whole stack of patches to review which moves ironic to threaded, removes eventlet, and ultimately enables object redirection | 15:13 |
TheJulia | The stack starts at: https://review.opendev.org/c/openstack/ironic/+/952939 | 15:14 |
iurygregory | I will give some time this week to review the patches | 15:14 |
TheJulia | (and, everything is presently passing CI at this time) | 15:14 |
rpittau | that's great | 15:15 |
JayF | We've also talked some about how this changes the performance shape of Ironic, and how we might want to take a more structured approach to QA this release during the final month-ish | 15:15 |
rpittau | I'll find the time to review | 15:15 |
iurygregory | JayF, yeah ++ totally agree | 15:15 |
JayF | If anyone wants to create an etherpad with a checklist/ideas/etc please do. I'll do it closer to time if nobody else has; but my QA experience is near-zero. | 15:15 |
iurygregory | We don't have any discussion topics, so I will skip it, any discussion we can take in the Open discussion | 15:17 |
iurygregory | moving on | 15:17 |
iurygregory | #topic Bug Deputy Updates | 15:17 |
cid | That's me | 15:18 |
iurygregory | #info Bug Deputy (cid) New Bugs: 3 new New RFEs: 0 new | 15:18 |
cid | There were 3 new bugs, quite a couple of existing ones are in progress. Overall quite week | 15:18 |
iurygregory | Who is the next bug deputy? | 15:18 |
iurygregory | I can take the role next week | 15:19 |
iurygregory | any volunteer for this week? | 15:20 |
cid | * cid \o/, *quiet :) | 15:20 |
cid | I would be happy to, iurygregory | 15:20 |
iurygregory | ack cid, tks! | 15:21 |
iurygregory | #info cid is the bug deputy this week, iurygregory will be the deputy next week | 15:21 |
iurygregory | We don't have RFEs so skipping the topic | 15:22 |
iurygregory | #topic Open discussion | 15:22 |
iurygregory | If anyone has something they would like to discuss this is the time =) | 15:22 |
JayF | I'm back, if you have something that specifically needs my attention please ping me directly :) | 15:23 |
TheJulia | I'm updating the eventlet etherpad, fwiw | 15:24 |
iurygregory | so I think we are good for today | 15:25 |
iurygregory | thanks everyone! | 15:25 |
iurygregory | #endmeeting | 15:26 |
opendevmeet | Meeting ended Mon Aug 11 15:26:04 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:26 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-11-15.00.html | 15:26 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-11-15.00.txt | 15:26 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2025/ironic.2025-08-11-15.00.log.html | 15:26 |
TheJulia | Hmm, looks like we have an issue with novncproxy and signals being registered too late :( | 17:34 |
TheJulia | Anyway! https://etherpad.opendev.org/p/ironic-eventlet-removal updated with a ton of details/info. And patch list! | 17:36 |
guilhermesp | dtantsur: yeah currently trying to explode software raid via ipa with an image built with dib ( https://paste.openstack.org/raw/bQyhBQ0NiAnEIUDOHQg0/ ) but hitting this https://bugs.launchpad.net/ironic-python-agent/+bug/2090993 -- it's been quite a fun journey :P | 17:38 |
frickler | ah, I've seen that, too, wasn't aware that there's a bug for it. my workaround is to use a partition with 549MB only https://github.com/osism/openstack-ironic-images/blob/main/elements/block-device-osism-efi/block-device-default.yaml#L11 | 17:45 |
guilhermesp | thanks for sharing frickler ! | 17:49 |
TheJulia | We might have a blocker with the websocket proxy for graphical consoles with eventlet. An underlying library appears to try to set up its own signal handling for threading, I guess written in the model/mindset of being run via eventlet. I'll try to discuss it with steve, I think the cleanish path is to lift two methods from the library and execute rewritten copies which are more pertinent to our situation. | 18:15 |
TheJulia | oh, lgpl | 18:17 |
TheJulia | nevermind | 18:17 |
JayF | you could greenfield rewrite them if you haven't looked too hard, or spec it for someone else if you have | 18:17 |
TheJulia | it's several hundred lines of code :\ | 18:18 |
JayF | ugh | 18:19 |
JayF | can we fork the proxy instead of running in a thread? | 18:19 |
JayF | actual subprocess? | 18:19 |
TheJulia | we might be able to.... | 18:20 |
TheJulia | maybe | 18:20 |
TheJulia | looks like what it is largely doing is all geared around DIYing socket handling for the proxy stuffs | 18:20 |
TheJulia | so, https://github.com/openstack/oslo.service/blame/master/oslo_service/backend/_threading/service.py#L234-L238 might work | 18:28 |
TheJulia | except, it won't work in single process mode, but maybe that is okay?! | 18:31 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM For Science - try no fork?! https://review.opendev.org/c/openstack/ironic/+/957044 | 18:32 |
TheJulia | curious to see if that would work, I don't remember seeing the no_fork option in the launcher, but it's a relatively recent addition | 18:34 |
JayF | I think it's possible to say definitionally since gfx console requires a subprocess, we can say unsupported in single process mode :) | 18:52 |
JayF | but I don't know if that causes specific pain to any of our consumers | 18:52 |
TheJulia | yeah, dunno | 18:55 |
TheJulia | I mean, we have a possible path to make it work | 18:55 |
TheJulia | but I think it was just added to the single process launcher as a nicety overall even though the default is false | 18:56 |
TheJulia | at least on a standalone process it seems to work with nofork | 19:14 |
JayF | \o/ | 19:15 |
TheJulia | *seems*, I've not tried exercising it. The thing to disambiguate is single process mode and CI + github == sadness right now | 19:15 |
opendevreview | Verification of a change to openstack/ironic-prometheus-exporter master failed: [IPE] Support iDRAC driver metrics https://review.opendev.org/c/openstack/ironic-prometheus-exporter/+/954870 | 19:47 |
opendevreview | Verification of a change to openstack/bifrost master failed: Do not pass empty values to instance_info https://review.opendev.org/c/openstack/bifrost/+/953336 | 19:52 |
opendevreview | Merged openstack/ironic master: Fix broken <range-in> in root device hints https://review.opendev.org/c/openstack/ironic/+/955618 | 20:04 |
TheJulia | I talked with steve, he is good if we just remove the vnc stuffs from the single process launcher as well | 21:04 |
TheJulia | but I'm going to recheck my no_fork change | 21:04 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM For Science - try no fork?! https://review.opendev.org/c/openstack/ironic/+/957044 | 21:11 |
janders | TheJulia (CC JayF) w/r/t failed updates: in my case (Redfish driver and SimpleUpdate API) the failure happens before boot into IPA ( https://review.opendev.org/c/openstack/ironic/+/954311 makes it possible cause with this patch SimpleUpdate gets called before the first reboot) | 21:15 |
JayF | yeah, for that failure case it's safe, but I think there will be cases it's not safe in | 21:15 |
JayF | we just need to add that guardrail we talked about in here to ensure no new heartbeat | 21:15 |
janders | so the node boots from image to image and abort clears the error state | 21:16 |
janders | I agree | 21:16 |
janders | given I only ever used firmware updates with Redfish I can't easily visualise what the other cases look like | 21:16 |
janders | but I suppose we would be looking at a case where firmware update is done with a BLOB run from CLI from Ramdisk | 21:17 |
janders | or even using a non-Redfish driver that does it out-of-band but doesn't go through codepaths from 954311 so still reboots into IPA before sending the update call to the BMC | 21:17 |
JayF | I mean, imagine any case which can cause a random cleaning failure in IPA happening during servicing | 21:18 |
JayF | anything from a temporary network blip doing callbacks to ironic to an actual failure mid-operation | 21:18 |
janders | I will do some more testing today with the current iteration of the service abort patch, I think we tracked down the suspected regression (and it was a lab issue) | 21:20 |
janders | may have a useful insight or two out of that | 21:20 |
JayF | dtantsur: so fun thing, I'm finishing the wsgi script removal originally started by cid ... weirdly enough; it looks to me like it's *not possible* to use apache2 mod_wsgi with a module entrypoint. This implies to me we may need to change our approach and/or remove apache2 use from our docs? Am I missing something? | 22:20 |
opendevreview | Julia Kreger proposed openstack/ironic master: Launch vnc proxy with no_fork https://review.opendev.org/c/openstack/ironic/+/957044 | 22:27 |
JayF | dtantsur: I find myself forgetting, more generally, why we wanted to remove the script-y method anyway... | 22:27 |
TheJulia | w/r/t the no_fork change, I need to sit down and test to ensure it still works, but I have relatively high confidence it should still just work unless websockify was entirely written for eventlet (hopefully not!) | 22:32 |
TheJulia | which doesn't seem to be the case | 22:33 |
TheJulia | fwiw, seems to work but didn't do it on single process because systemd doesn't think it has started... really. | 22:35 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: grenade: restart neutron services https://review.opendev.org/c/openstack/ironic/+/956801 | 22:42 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Fix the ability to escape service fail https://review.opendev.org/c/openstack/ironic/+/956972 | 23:07 |
TheJulia | janders: JayF: Does something along the lines of ^ make everyone happy? | 23:07 |
iurygregory | my reaction after seeing a bump from 1.2.0 to 3.2.0 for futurist was WOW <O> https://review.opendev.org/c/openstack/ironic/+/952939/20/requirements.txt#38 | 23:30 |
janders | g'day Ironic o/ | 23:31 |
janders | TheJulia looking | 23:31 |
janders | LGTM from a quick look, will try lab-test this now. Lab needs some work so Bear with me (pun intended) | 23:34 |
janders | it would be awesome to have an Ironic catchup somewhere in Kyushu, Japan and make an Ironic themed Kumamoto Bear event mascot | 23:35 |
iurygregory | +1 to this idea :D | 23:40 |
TheJulia | Only if we can get lost on the corporate dime. :) | 23:42 |
janders | Agreed. Having said that I've been watching Fukuoka become a little bit of a conference venue hotspot over the years and it's a fair bit cheaper than tier-one cities. Being the home town of Ichiran ramen means we can save on meals and just live on Ichiran. It's no sacrifice LOL. | 23:46 |
janders | it's likely an AUS-centric perspective though. Being far from everything kinda makes Japan relatively close and affordable (since both AUD and JPY tanked recently). | 23:47 |
iurygregory | corporate dime ++ :D | 23:51 |
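
The guardrail discussed between 13:33 and 14:04 ("if it has ever heartbeated, it must be rebooted") can be sketched roughly as below. This is only an illustration, not the code in the proposed patch: the `Node` class is a hypothetical stand-in for a node record, and the keys `agent_secret_token` and `agent_last_heartbeat` are assumed names for the evidence that an agent ramdisk was running.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """Hypothetical stand-in for a node record; not Ironic's real object."""
    provision_state: str
    # Assumed bag of per-node internal data kept by the conductor.
    driver_internal_info: Dict[str, Any] = field(default_factory=dict)


def must_reboot_on_abort(node: Node) -> bool:
    """Decide whether aborting out of servicing needs an explicit reboot.

    If there is any sign that an agent ramdisk ran (a token was issued or a
    heartbeat was recorded), the node must not be handed back to the tenant
    network without a reboot. With no such sign, only out-of-band steps are
    assumed to have run, so the reboot can be skipped.
    """
    info = node.driver_internal_info
    # Both key names are assumptions made for this sketch.
    had_token = bool(info.get("agent_secret_token"))
    had_heartbeat = bool(info.get("agent_last_heartbeat"))
    return had_token or had_heartbeat
```

A caller would check `must_reboot_on_abort(node)` before moving the node back to active. The check deliberately has no time threshold, matching the stricter "ever heartbeated" idea rather than the configurable-timestamp variant floated at 13:59.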
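
One plausible reason a proxy library that installs its own signal handlers is awkward to run in a thread (the 18:15 to 18:20 exchange) is a CPython constraint: `signal.signal()` may only be called from the main thread. The sketch below demonstrates just that constraint. It does not use websockify or oslo.service, and whether this is the exact failure seen in Ironic is an assumption.

```python
import signal
import threading
from typing import Dict, Optional


def install_handler_in_thread() -> Optional[str]:
    """Try to register a SIGTERM handler from a worker thread.

    Returns the error message if registration was refused, or None if it
    unexpectedly succeeded.
    """
    outcome: Dict[str, Optional[str]] = {}

    def worker() -> None:
        try:
            # CPython only allows this call from the main thread.
            signal.signal(signal.SIGTERM, lambda signum, frame: None)
            outcome["error"] = None
        except ValueError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


if __name__ == "__main__":
    print(install_handler_in_thread())
```

Running the proxy as a real subprocess, as suggested at 18:19, sidesteps this because the child process has its own main thread in which handlers can be registered.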