Nick | Message | Time |
---|---|---|
opendevreview | Steve Baker proposed openstack/ironic-specs master: Graphical Console Support https://review.opendev.org/c/openstack/ironic-specs/+/938526 | 01:15 |
opendevreview | Verification of a change to openstack/ironic master failed: change ambiguous variable name https://review.opendev.org/c/openstack/ironic/+/937270 | 01:40 |
rpittau | good morning ironic! o/ | 07:51 |
kubajj | good morning rpittau, and ironic! o/ | 07:54 |
opendevreview | Adam Rozman proposed openstack/ironic master: disable ISO cache image format and safety checks https://review.opendev.org/c/openstack/ironic/+/938363 | 09:17 |
opendevreview | Merged openstack/ironic master: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/935992 | 12:45 |
TheJulia | good morning | 14:04 |
TheJulia | iurygregory: o/ I left some comments on your bmc goes AWOL change after firmware updates | 14:37 |
TheJulia | iurygregory: not necessarily a -1, just thinking maybe we should do that level of check close to where we already do the same basic thing for connections timing out | 14:38 |
TheJulia | I *suspect* it would better guard things, but we would need to log the failure and whatnot, because we do get cases where BadRequest is also surfaced by BMCs being evil. Think one of the DPU threads I've recently commented on in slack. | 14:39 |
TheJulia | Guarding at a lower level would also help soften the edges around the power status change/update cases I've had some reports of | 14:40 |
iurygregory | TheJulia, ack, will look in a bit o/ | 14:41 |
TheJulia | It is also an odd thing to guard against because it could be the BMC is doing something really wrong, but verbose logging is likely critical then | 14:41 |
TheJulia | .... (including a node history entry most likely...) | 14:41 |
TheJulia | ((record that bad bmc behavior!)) | 14:41 |
iurygregory | I talked with dtantsur about it, last week, maybe the location for retrying would be in https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/redfish/firmware.py#L184 (adding a call to try to poke the system/managers list and retry) wdyt? | 14:43 |
iurygregory | ++ to include information in the node history | 14:44 |
TheJulia | I thought that as well, I'm definitely influenced by the issues I hear grumbled about by NobodyCam | 14:47 |
iurygregory | ack, I will take a closer look at your comments after lunch :D | 14:47 |
TheJulia | The thing I wonder, is how long do we block/hold/wait there | 14:47 |
TheJulia | for example, idracs, you ask to update the firmware, they still respond semi-normally for a little bit while it is unpacking/checking the firmware | 14:48 |
iurygregory | yeah, I totally agree we should add configs for that | 14:48 |
iurygregory | yup, and will stop answering after some time till the update is in place | 14:49 |
TheJulia | ... and can actually break horribly if the IP address and reverse dns doesn't match the DNS name being used | 14:49 |
TheJulia | yup | 14:49 |
iurygregory | I wasn't even thinking about this scenario, but yeah... | 14:49 |
TheJulia | I've been burned by it a few times | 14:55 |
TheJulia | Greetings folks, welcome abongale, also known as Abhishek. He is a new member of my team | 14:58 |
rpittau | welcome abongale :) | 14:58 |
abongale | Hello Guys:) | 14:59 |
JayF | abongale: welcome | 15:17 |
cardoe | abongale: welcome! | 15:30 |
TheJulia | coffeeeeeeeeeeeeeeeeeeeee | 16:20 |
masghar | Welcome abongale! | 16:31 |
opendevreview | Merged openstack/ironic-python-agent-builder master: Move jobs and DIB builds to ubuntu noble https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/938115 | 16:33 |
JayF | ironic-lib removal patches for both IPA and Ironic pass CI and have been hashtagged ironic-week-prio | 16:38 |
opendevreview | Merged openstack/ironic master: change ambiguous variable name https://review.opendev.org/c/openstack/ironic/+/937270 | 16:45 |
rpittau | if anyone has a moment please have a look at https://review.opendev.org/c/openstack/ironic-python-agent/+/937042 thanks! | 16:48 |
JayF | on it | 16:50 |
rpittau | JayF: just realized that the ipa patch that removes ironic-lib potentially conflicts with my change, probably easier to merge yours first | 16:52 |
JayF | eh, doesn't really matter | 16:52 |
JayF | if it causes a rebase it won't be too bad | 16:52 |
JayF | I'd be pleasantly surprised if it doesn't need revision anyway | 16:52 |
rpittau | looks good at a glance | 16:54 |
JayF | IPA one was simpler than Ironic, too | 16:55 |
rpittau | yeah, didn't get to that yet | 16:55 |
rpittau | just wondering if we want to release ipa and ironic before removing ironic-lib | 16:57 |
JayF | why? | 16:58 |
JayF | I was specifically trying to get these changes in before next release | 16:58 |
rpittau | ok, just thinking out loud | 16:58 |
rpittau | next bugfix is in ~3 weeks anyway | 16:58 |
JayF | yeah my hope was we've made the final release of ironic-lib :D | 16:59 |
rpittau | yep :D | 16:59 |
rpittau | alright, time to go, good night! o/ | 17:01 |
cardoe | iurygregory/TheJulia: part of the BMC goes AWOL on firmware updates... we don't grab the Update Job (what's the right term?) from Redfish and track that job to completion or failure. We wait for the box to reboot back into IPA. Which isn't necessarily correct. | 18:04 |
cardoe | I believe we need to grab that job once it appears (some systems it only appears for us once it's told to boot up, I originally wanted to make the step IPA-less). And follow that job. We need to give the BMC some non-responsive grace period and then track that job. | 18:04 |
cardoe | I think I added these notes to a bug but I honestly don't recall. Do we have an open bug for this? | 18:05 |
TheJulia | cardoe: so, I think the issue is differences in hardware behavior and if you update the bmc itself, you lose the session/data to the bmc | 18:26 |
TheJulia | I think we do have an open bug, and that is a great question. I think we're sort of focusing around several distinctly different but related bugs at the same time | 18:27 |
cardoe | Right. That's why I'm saying the job is what we should ultimately track. | 18:28 |
TheJulia | I think so, yes. But we can't track a thing if we can't talk to or invalidate the bmc | 18:28 |
TheJulia | and the issue here is the bmc is giving "BadRequest" back when it is still booting up | 18:29 |
TheJulia | That is part of the issue, we do the thing, next interaction fails because BadRequest | 18:30 |
TheJulia | ... and funny thing is, we've got a downstream report of a DPU class device which is giving bad json or bogus responses after the power state is changed :( | 18:30 |
TheJulia | Theory is... it is invalidating the client session | 18:32 |
cardoe | Yeah that's what I really think is what that BadRequest is. | 18:36 |
opendevreview | Scott Solkhon proposed openstack/ironic master: Update hardware burn-in docs https://review.opendev.org/c/openstack/ironic/+/938606 | 18:37 |
iurygregory | cardoe, sometimes the Task is faster and we can't get the information from it, if I recall | 19:09 |
cardoe | Hmm. Good old bare metal. On the Dell’s I’ve got it’s been opposite. | 19:11 |
iurygregory | for me it was on a R640 | 19:11 |
iurygregory | not sure how old it was =) | 19:11 |
opendevreview | Jay Faulkner proposed openstack/ironic-tempest-plugin master: Validate automatic lessee https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/927545 | 19:13 |
TheJulia | cardoe: it might make sense to figure out when to use BadRequest to invalidate the session (If... I didn't already do that.) | 19:41 |
cardoe | iurygregory: so check those R640s and lemme know if there's two jobs. | 19:47 |
TheJulia | when I've done it manually, always one job | 19:47 |
TheJulia | unless I had a stalled/broken job | 19:47 |
TheJulia | and then there was no guarantee the bmc would update anyway | 19:47 |
cardoe | There's 1 job that appears pretty quick and then there's another one that doesn't appear until the BIOS posts | 19:47 |
TheJulia | system bios? | 19:47 |
cardoe | Ironic makes two for me on that class of hardware. | 19:48 |
cardoe | One generic and one that has BIOS in a field. | 19:48 |
cardoe | The generic one is the download and extraction | 19:48 |
cardoe | When you use their HTTP interface or tools the generic job doesn't appear. | 19:48 |
iurygregory | cardoe, will do tomorrow, I'm wrapping up for today o/ | 19:48 |
cardoe | My guess is cause their tools and the HTTP interface do the HTTP PUT while Ironic tells the BMC to fetch it. | 19:49 |
cardoe | I don't have anymore R640s but I've got R740s that I'll try against. | 19:50 |
stevebaker[m] | Good morning | 20:08 |
TheJulia | o/ | 20:09 |
TheJulia | Good morning stevebaker[m] | 20:10 |
stevebaker[m] | TheJulia: I guess this should be backported as far as the CVE change was? https://review.opendev.org/c/openstack/ironic/+/935992?usp=email | 20:11 |
JayF | yes | 20:13 |
TheJulia | as is possible, yes | 20:13 |
JayF | https://review.opendev.org/c/openstack/ironic-python-agent/+/934091 similarly needs to go back | 20:14 |
JayF | but it looks like stable/2023.2 has CI issues and that's where it hung up | 20:14 |
opendevreview | Steve Baker proposed openstack/ironic stable/2024.2: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938617 | 21:12 |
opendevreview | Steve Baker proposed openstack/ironic stable/2024.1: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938618 | 21:13 |
opendevreview | Steve Baker proposed openstack/ironic stable/2023.2: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938619 | 21:14 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/2023.1: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938621 | 21:16 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/zed: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938622 | 21:16 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/yoga: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938623 | 21:17 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/xena: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938624 | 21:17 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/wallaby: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938625 | 21:18 |
opendevreview | Steve Baker proposed openstack/ironic unmaintained/victoria: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938626 | 21:18 |
opendevreview | Steve Baker proposed openstack/ironic bugfix/24.0: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938627 | 21:19 |
opendevreview | Steve Baker proposed openstack/ironic bugfix/26.0: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938628 | 21:19 |
opendevreview | Steve Baker proposed openstack/ironic bugfix/25.0: Calculate missing checksum for file:// based images https://review.opendev.org/c/openstack/ironic/+/938629 | 21:20 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP OCI container adjacent artifact support https://review.opendev.org/c/openstack/ironic/+/937896 | 21:34 |
opendevreview | Julia Kreger proposed openstack/ironic master: WIP - A very early wip of bootc deployment on the ironic side https://review.opendev.org/c/openstack/ironic/+/937897 | 21:34 |
cardoe | huh? https://zuul.opendev.org/t/openstack/build/9295c667155b40d497bf6668d55d88d2 | 21:41 |
cardoe | I hate to just blindly recheck... https://review.opendev.org/c/openstack/ironic/+/937271/ | 21:42 |
JayF | I *think* that length too long is a red herring | 22:06 |
JayF | it likely had a real failure somewhere | 22:06 |
TheJulia | oh, I've seen that | 22:07 |
TheJulia | downstream | 22:07 |
TheJulia | ... uhhhh | 22:07 |
TheJulia | so semi-red herring, but yeah, it is frustrating | 22:08 |
* TheJulia | tries to recall | 22:08 |
* TheJulia | lets new devstack machine build... slowly | 22:13 |
cardoe | That change is just moving imports above a comment so definitely don't think it's that change. | 22:27 |
cardoe | But hoping to get to the root of some of our repeat failures. | 22:27 |
opendevreview | Merged openstack/sushy unmaintained/2023.1: Update .gitreview for unmaintained/2023.1 https://review.opendev.org/c/openstack/sushy/+/936695 | 23:40 |
janders | Good morning Ironic o/ Happy New Year 2025 | 23:45 |
JayF | \o | 23:45 |
janders | since I saw all the key people I wanted to touch base about a topic that raised some interest for us internally - and that is exploring some simple hardware health monitoring in Ironic with a view to use it in metal3 | 23:58 |
janders | with at least some vendors, System has Status.Health and Status.HealthRollup properties which aim to give a general view of how healthy or unhealthy a server is | 23:59 |
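
As a rough illustration of the health properties janders mentions, the sketch below reduces a Redfish System resource, already parsed into a dict, to a single value by taking the worse of `Status.Health` and `Status.HealthRollup`. The helper name `summarize_health` and the worst-of-two policy are assumptions made for this example; nothing here is taken from Ironic, sushy, or metal3.

```python
from typing import Any, Mapping, Optional

# Health values used by the Redfish Status object, ordered by severity.
_SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def summarize_health(system: Mapping[str, Any]) -> Optional[str]:
    """Return the worse of Status.Health and Status.HealthRollup.

    Hypothetical helper: returns None when the BMC reports neither
    property (or reports values outside the known set).
    """
    status = system.get("Status") or {}
    reported = [status.get("Health"), status.get("HealthRollup")]
    known = [value for value in reported if value in _SEVERITY]
    if not known:
        return None
    return max(known, key=_SEVERITY.__getitem__)
```

Returning None rather than "OK" for a missing Status keeps "the vendor does not report health" distinguishable from "the server is healthy", which matters given that only some vendors populate these fields.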
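
The firmware-update thread in this log (cardoe's suggestion to follow the update job while giving the BMC a non-responsive grace period) could be sketched roughly as below. This is a minimal illustration, not Ironic's implementation: `wait_for_task`, its parameters, and the `fetch_state` callable are hypothetical, and the terminal states are the Redfish `TaskState` values that indicate a task has stopped. Any exception from `fetch_state` (a timeout, or the BadRequest mentioned in the discussion) is tolerated until the BMC has been silent for longer than the grace period.

```python
import time
from typing import Callable

# Redfish TaskState values after which a task no longer progresses.
TERMINAL_STATES = {"Completed", "Exception", "Killed", "Cancelled"}


def wait_for_task(
    fetch_state: Callable[[], str],
    grace_period: float = 300.0,
    timeout: float = 1800.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it reaches a terminal state.

    fetch_state is a hypothetical callable that returns the current
    TaskState or raises while the BMC is unreachable or misbehaving.
    """
    start = clock()
    last_ok = start  # last time the BMC gave a usable answer
    while True:
        now = clock()
        if now - start > timeout:
            raise TimeoutError("task did not finish before the timeout")
        try:
            state = fetch_state()
        except Exception as exc:
            # Tolerate failures only while inside the grace period.
            if now - last_ok > grace_period:
                raise TimeoutError("BMC unresponsive past grace period") from exc
        else:
            last_ok = now
            if state in TERMINAL_STATES:
                return state
        sleep(interval)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting; a caller would also want to log each tolerated failure, in line with the node-history suggestion made in the discussion.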