Tuesday, 2025-01-07

[01:15] <opendevreview> Steve Baker proposed openstack/ironic-specs master: Graphical Console Support  https://review.opendev.org/c/openstack/ironic-specs/+/938526
[01:40] <opendevreview> Verification of a change to openstack/ironic master failed: change ambiguous variable name  https://review.opendev.org/c/openstack/ironic/+/937270
[07:51] <rpittau> good morning ironic! o/
[07:54] <kubajj> good morning rpittau, and ironic! o/
[09:17] <opendevreview> Adam Rozman proposed openstack/ironic master: disable ISO cache image format and safety checks  https://review.opendev.org/c/openstack/ironic/+/938363
[12:45] <opendevreview> Merged openstack/ironic master: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/935992
[14:04] <TheJulia> good morning
[14:37] <TheJulia> iurygregory: o/ I left some comments on your bmc goes AWOL change after firmware updates
[14:38] <TheJulia> iurygregory: not necessarily a -1, just thinking maybe we should do that level of check close to where we already do the same basic thing for connections timing out
[14:39] <TheJulia> I *suspect* it would better guard things, but we would need to log the failure and whatnot, because we do get cases where BadRequest is also surfaced by BMCs being evil. Think one of the DPU threads I've recently commented on in slack.
[14:40] <TheJulia> Guarding at a lower level would also help soften the edges around the power status change/update cases I've had some reports of
[14:41] <iurygregory> TheJulia, ack, will look in a bit o/
[14:41] <TheJulia> It is also an odd thing to guard against because it could be the BMC is doing something really wrong, but verbose logging is likely critical then
[14:41] <TheJulia> .... (including a node history entry most likely...)
[14:41] <TheJulia> ((record that bad bmc behavior!))
[14:43] <iurygregory> I talked with dtantsur about it last week, maybe the location for retrying would be in https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/redfish/firmware.py#L184 (adding a call to try to poke the system/managers list and retry) wdyt?
[14:44] <iurygregory> ++ to include information in the node history
[14:47] <TheJulia> I thought that as well, I'm definitely influenced by the issues I hear grumbled about by NobodyCam
[14:47] <iurygregory> ack, I will take a closer look at your comments after lunch :D
[14:47] <TheJulia> The thing I wonder is how long do we block/hold/wait there
[14:48] <TheJulia> for example, idracs, you ask to update the firmware, they still respond semi-normally for a little bit while it is unpacking/checking the firmware
[14:48] <iurygregory> yeah, I totally agree we should add configs for that
[14:49] <iurygregory> yup, and will stop answering after some time till the update is in place
[14:49] <TheJulia> ... and can actually break horribly if the IP address and reverse dns don't match the DNS name being used
[14:49] <TheJulia> yup
[14:49] <iurygregory> I wasn't even thinking about this scenario, but yeah...
[14:55] <TheJulia> I've been burned by it a few times
[14:58] <TheJulia> Greetings folks, welcome abongale, also known as Abhishek. He is a new member of my team
[14:58] <rpittau> welcome abongale :)
[14:59] <abongale> Hello Guys :)
[15:17] <JayF> abongale: welcome
[15:30] <cardoe> abongale: welcome!
[16:20] <TheJulia> coffeeeeeeeeeeeeeeeeeeeee
[16:31] <masghar> Welcome abongale!
[16:33] <opendevreview> Merged openstack/ironic-python-agent-builder master: Move jobs and DIB builds to ubuntu noble  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/938115
[16:38] <JayF> ironic-lib removal patches for both IPA and Ironic pass CI and have been hashtagged ironic-week-prio
[16:45] <opendevreview> Merged openstack/ironic master: change ambiguous variable name  https://review.opendev.org/c/openstack/ironic/+/937270
[16:48] <rpittau> if anyone has a moment please have a look at https://review.opendev.org/c/openstack/ironic-python-agent/+/937042 thanks!
[16:50] <JayF> on it
[16:52] <rpittau> JayF: just realized that the ipa patch that removes ironic-lib potentially conflicts with my change, probably easier to merge yours first
[16:52] <JayF> eh, doesn't really matter
[16:52] <JayF> if it causes a rebase it won't be too bad
[16:52] <JayF> I'd be pleasantly surprised if it doesn't need revision anyway
[16:54] <rpittau> looks good at a glance
[16:55] <JayF> IPA one was simpler than Ironic, too
[16:55] <rpittau> yeah, didn't get to that yet
[16:57] <rpittau> just wondering if we want to release ipa and ironic before removing ironic-lib
[16:58] <JayF> why?
[16:58] <JayF> I was specifically trying to get these changes in before next release
[16:58] <rpittau> ok, just thinking out loud
[16:58] <rpittau> next bugfix is in ~3 weeks anyway
[16:59] <JayF> yeah my hope was we've made the final release of ironic-lib :D
[16:59] <rpittau> yep :D
[17:01] <rpittau> alright, time to go, good night! o/
[18:04] <cardoe> iurygregory/TheJulia: part of the BMC goes AWOL on firmware updates... we don't grab the Update Job (what's the right term?) from Redfish and track that job to completion or failure. We wait for the box to reboot back into IPA. Which isn't necessarily correct.
[18:04] <cardoe> I believe we need to grab that job once it appears (on some systems it only appears for us once it's told to boot up, I originally wanted to make the step IPA-less). And follow that job. We need to give the BMC some non-responsive grace period and then track that job.
[18:05] <cardoe> I think I added these notes to a bug but I honestly don't recall. Do we have an open bug for this?
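Editor's sketch of the approach cardoe describes above: follow the update job to a terminal state while tolerating a bounded stretch of BMC silence. Everything named here (`track_update_job`, `fetch_state`, `BMCUnreachable`, `grace_period`, `poll_interval`) is hypothetical and is not an Ironic or sushy API. The terminal state names are assumed to follow the Redfish TaskState values, and the defaults are placeholders, not recommendations.

```python
import time
from typing import Callable, Optional


class BMCUnreachable(Exception):
    """Hypothetical error: the BMC did not answer, or answered with garbage."""


# Assumption: terminal values modelled on the Redfish TaskState enum.
TERMINAL_STATES = {"Completed", "Exception", "Killed", "Cancelled"}


def track_update_job(
    fetch_state: Callable[[], str],
    grace_period: float = 600.0,
    poll_interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an update job until it reaches a terminal state.

    fetch_state returns the job's state string or raises BMCUnreachable.
    Consecutive failures are tolerated until the BMC has been silent for
    longer than grace_period seconds, at which point the error propagates.
    """
    silent_since: Optional[float] = None
    while True:
        try:
            state = fetch_state()
        except BMCUnreachable:
            now = clock()
            if silent_since is None:
                # First failure in this run of silence: start the timer.
                silent_since = now
            elif now - silent_since > grace_period:
                # Silent for too long: give up and surface the error.
                raise
        else:
            # Any valid answer resets the silence timer.
            silent_since = None
            if state in TERMINAL_STATES:
                return state
        sleep(poll_interval)
```

The clock and sleep functions are injectable only so the loop can be exercised without waiting. A real caller would presumably also log each failure and add a node history entry, as suggested earlier in the log.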
[18:26] <TheJulia> cardoe: so, I think the issue is differences in hardware behavior and if you update the bmc itself, you lose the session/data to the bmc
[18:27] <TheJulia> I think we do have an open bug, and that is a great question. I think we're sort of focusing around several distinctly different but related bugs at the same time
[18:28] <cardoe> Right. That's why I'm saying the job is what we should ultimately track.
[18:28] <TheJulia> I think so, yes. But we can't track a thing if we can't talk to or invalidate the bmc
[18:29] <TheJulia> and the issue here is the bmc is giving "BadRequest" back when it is still booting up
[18:30] <TheJulia> That is part of the issue, we do the thing, next interaction fails because BadRequest
[18:30] <TheJulia> ... and funny thing is, we've got a downstream report of a DPU class device which is giving bad json or bogus responses after the power state is changed :(
[18:32] <TheJulia> Theory is... it is invalidating the client session
[18:36] <cardoe> Yeah that's what I really think that BadRequest is.
[18:37] <opendevreview> Scott Solkhon proposed openstack/ironic master: Update hardware burn-in docs  https://review.opendev.org/c/openstack/ironic/+/938606
[19:09] <iurygregory> cardoe, sometimes the Task is faster and we can't get the information from it, if I recall
[19:11] <cardoe> Hmm. Good old bare metal. On the Dells I've got it's been the opposite.
[19:11] <iurygregory> for me it was on a R640
[19:11] <iurygregory> not sure how old it was =)
[19:13] <opendevreview> Jay Faulkner proposed openstack/ironic-tempest-plugin master: Validate automatic lessee  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/927545
[19:41] <TheJulia> cardoe: it might make sense to figure out when to use BadRequest to invalidate the session (If... I didn't already do that.)
[19:47] <cardoe> iurygregory: so check those R640s and lemme know if there's two jobs.
[19:47] <TheJulia> when I've done it manually, always one job
[19:47] <TheJulia> unless I had a stalled/broken job
[19:47] <TheJulia> and then there was no guarantee the bmc would update anyway
[19:47] <cardoe> There's 1 job that appears pretty quick and then there's another one that doesn't appear until the BIOS posts
[19:47] <TheJulia> system bios?
[19:48] <cardoe> Ironic makes two for me on that class of hardware.
[19:48] <cardoe> One generic and one that has BIOS in a field.
[19:48] <cardoe> The generic one is the download and extraction
[19:48] <cardoe> When you use their HTTP interface or tools the generic job doesn't appear.
[19:48] <iurygregory> cardoe, will do tomorrow, I'm wrapping up for today o/
[19:49] <cardoe> My guess is cause their tools and the HTTP interface do the HTTP PUT while Ironic tells the BMC to fetch it.
[19:50] <cardoe> I don't have any more R640s but I've got R740s that I'll try against.
[20:08] <stevebaker[m]> Good morning
[20:09] <TheJulia> o/
[20:10] <TheJulia> Good morning stevebaker[m]
[20:11] <stevebaker[m]> TheJulia: I guess this should be backported as far as the CVE change was? https://review.opendev.org/c/openstack/ironic/+/935992?usp=email
[20:13] <JayF> yes
[20:13] <TheJulia> as is possible, yes
[20:14] <JayF> https://review.opendev.org/c/openstack/ironic-python-agent/+/934091 similarly needs to go back
[20:14] <JayF> but it looks like stable/2023.2 has CI issues and that's where it hung up
[21:12] <opendevreview> Steve Baker proposed openstack/ironic stable/2024.2: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938617
[21:13] <opendevreview> Steve Baker proposed openstack/ironic stable/2024.1: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938618
[21:14] <opendevreview> Steve Baker proposed openstack/ironic stable/2023.2: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938619
[21:16] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/2023.1: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938621
[21:16] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/zed: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938622
[21:17] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/yoga: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938623
[21:17] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/xena: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938624
[21:18] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/wallaby: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938625
[21:18] <opendevreview> Steve Baker proposed openstack/ironic unmaintained/victoria: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938626
[21:19] <opendevreview> Steve Baker proposed openstack/ironic bugfix/24.0: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938627
[21:19] <opendevreview> Steve Baker proposed openstack/ironic bugfix/26.0: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938628
[21:20] <opendevreview> Steve Baker proposed openstack/ironic bugfix/25.0: Calculate missing checksum for file:// based images  https://review.opendev.org/c/openstack/ironic/+/938629
[21:34] <opendevreview> Julia Kreger proposed openstack/ironic master: WIP OCI container adjacent artifact support  https://review.opendev.org/c/openstack/ironic/+/937896
[21:34] <opendevreview> Julia Kreger proposed openstack/ironic master: WIP - A very early wip of bootc deployment on the ironic side  https://review.opendev.org/c/openstack/ironic/+/937897
[21:41] <cardoe> huh? https://zuul.opendev.org/t/openstack/build/9295c667155b40d497bf6668d55d88d2
[21:42] <cardoe> I hate to just blindly recheck... https://review.opendev.org/c/openstack/ironic/+/937271/
[22:06] <JayF> I *think* that length too long is a red herring
[22:06] <JayF> it likely had a real failure somewhere
[22:07] <TheJulia> oh, I've seen that
[22:07] <TheJulia> downstream
[22:07] <TheJulia> ... uhhhh
[22:08] <TheJulia> so semi-red herring, but yeah, it is frustrating
[22:08] * TheJulia tries to recall
[22:13] * TheJulia lets new devstack machine build... slowly
[22:27] <cardoe> That change is just moving imports above a comment so definitely don't think it's that change.
[22:27] <cardoe> But hoping to get to the root of some of our repeat failures.
[23:40] <opendevreview> Merged openstack/sushy unmaintained/2023.1: Update .gitreview for unmaintained/2023.1  https://review.opendev.org/c/openstack/sushy/+/936695
[23:45] <janders> Good morning Ironic o/ Happy New Year 2025
[23:45] <JayF> \o
[23:58] <janders> since I saw all the key people I wanted to touch base about a topic that raised some interest for us internally - and that is exploring some simple hardware health monitoring in Ironic with a view to use it in metal3
[23:59] <janders> with at least some vendors, System has Status.Health and Status.HealthRollup properties which aim to give a general view of how healthy or unhealthy a server is
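Editor's sketch of the kind of check janders mentions: collapse the two status properties of a Redfish System resource into one value. `summarize_health` is a hypothetical helper, not an Ironic, sushy, or metal3 API. It assumes the System payload is already parsed into a dict, that the properties use the Redfish health values OK, Warning, and Critical, and that the second property is spelled `HealthRollup` as in the Redfish schema. Some vendors omit either property, so the helper returns None when neither is usable.

```python
from typing import Any, Mapping, Optional

# Redfish health values, ordered from least to most severe.
_SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def summarize_health(system: Mapping[str, Any]) -> Optional[str]:
    """Return the worst of Status.Health and Status.HealthRollup.

    Returns None when neither property is present or recognised.
    """
    status = system.get("Status") or {}
    candidates = [status.get("Health"), status.get("HealthRollup")]
    # Keep only recognised string values; missing keys yield None.
    known = [v for v in candidates if isinstance(v, str) and v in _SEVERITY]
    if not known:
        return None
    return max(known, key=_SEVERITY.__getitem__)
```

Taking the worse of the two values is a design choice for this sketch. A real integration might report both, since the rollup value reflects subordinate resources and can differ from the system's own health.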
