Friday, 2025-05-09

dxterslabTheJulia: pyghmi SOL to supermicro and then proxied to a websocket and using xterm.js for interactive shell works00:49
TheJuliaimpressive00:50
dxterslabthe good thing is that sol is generic and not oem specific. I will be trying the same approach with dell soon01:19
TheJuliacool01:36
JayFThanks for the new xterm-ipmi console interface ;) 02:51
dxterslabThe current approach I believe I will be taking is having a proxy that does IPMI SOL in the backend and websocket on the front end. I heavily rely on metal3, so my assumption is that ironic would own this container/app, and metal3 should proxy the websocket for Kubernetes?02:57
JayFLook at the recent redfish graphical console code, it follows a similar pattern03:05
JayFI also suspect we need to write different ways to spawn the proxy container there, too.03:05
JayFhttps://opendev.org/openstack/ironic/commit/25a3dd076a0a8d3f4bbb5886252f6d08d78e33f9 Is the last commit in that series.03:08
dxterslabI saw the graphical-console. It is a selenium browser wrapper. I tried to interact with the websockets provided by redfish, first challenge is the custom RFB used by SMC and their BMC hardware, second challenge, Dell websocket didn’t easily work with vanilla novnc… because of time I have parked this option, but will revisit in the future. Also, to my understanding, novnc doesn’t allow for copy paste for text nor 03:09
dxterslabkeeps shell history03:09
JayFWell I'm saying it uses a sidebar container03:21
JayFYou could follow the pattern, and just put your stuff into the container instead of the VNC stuff03:21
JayF**sidecar03:22
dxterslabAh! Gotcha03:28
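The proxy shape discussed above (IPMI SOL on the back end, a websocket on the front end) comes down to relaying bytes in both directions until one side closes. A minimal, transport-agnostic sketch follows. The four callables `sol_read`, `sol_write`, `ws_read` and `ws_write` are hypothetical stand-ins for whatever pyghmi and a websocket library would provide; nothing here is taken from the actual proxy or from Ironic.

```python
import asyncio


async def pump(read, write):
    """Copy chunks from read() to write() until read() returns b''."""
    while True:
        chunk = await read()
        if not chunk:
            break
        await write(chunk)


async def bridge(sol_read, sol_write, ws_read, ws_write):
    """Relay SOL <-> websocket traffic until either direction ends.

    All four arguments are hypothetical async callables: the *_read ones
    return bytes (b'' on close), the *_write ones accept bytes.
    """
    tasks = [
        asyncio.create_task(pump(sol_read, ws_write)),  # console output to browser
        asyncio.create_task(pump(ws_read, sol_write)),  # keystrokes to console
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # stop the other direction once one side has closed
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # surface any error from the side that finished
```

In a sidecar layout like the one JayF describes, a loop of this kind would run inside the container, one bridge per console session.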
rpittaugood morning ironic! o/07:15
queensly[m]Good morning 08:23
AmarachiOrdor[m]Good morning everyone!08:26
freemanboss[m]Good afternoon everyone 11:17
opendevreviewJay Faulkner proposed openstack/ironic stable/2024.2: OSSA-2025-001: Disallow unsafe image file:// paths  https://review.opendev.org/c/openstack/ironic/+/94917414:05
opendevreviewJay Faulkner proposed openstack/ironic stable/2024.1: OSSA-2025-001: Disallow unsafe image file:// paths  https://review.opendev.org/c/openstack/ironic/+/94917514:05
JayFAnyone poked at vmedia+openbmc based BMCs yet?14:45
JayFI found what looks like an old spec from their GitHub that looks like they may have chosen a cifs-based implementation at least as an option, I'm really hoping that's not the case14:58
TheJuliaI've had a few off-hand conversations about doing cifs on weird vendors in the past15:01
JayFirmc driver supports it15:02
TheJuliaand the consensus was maybe having a flag in ironic which could know to "send it a cifs url" would be ideal, but yeah....15:02
JayFfor their mc15:02
TheJuliafun15:02
TheJuliayeah15:02
TheJuliaI think that is sort of predicated upon the idea that we change the output url for where to grab the artifact from, but don't change where we write it; in other words, make it the expectation that the operator sets up a cifs share with it15:03
JayFyeah, what I'm looking at is just rejecting an http url outright (not even trying to GET/HEAD it) so I think I might have hit the reverse lottery :| 15:03
JayFmakes sense15:03
TheJuliaor a cifs share which points to the vmedia folder, or something15:03
JayFand gives me ideas on what to try with sushy directly15:03
TheJuliaI think it makes a ton of sense to detect the failure, and then try submitting a changed url15:03
TheJuliafwiw15:03
JayFprobably trivial to hack in for seeing if I can get it to work15:04
JayFack, thanks for the suggestion, I wouldn't have considered just doing a basic fallback15:04
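The "detect the failure, then try a changed url" idea can be sketched as a small fallback loop. This is an illustration only: `MediaRejected`, `insert_media`, `rebase` and the example host names are all hypothetical, and it assumes the operator has already exposed the same image file on an alternate share, as TheJulia suggests. It is not the code in ironic/drivers/modules/redfish/boot.py or in sushy.

```python
import posixpath
from urllib.parse import urlsplit


class MediaRejected(Exception):
    """Hypothetical error raised when the BMC refuses a virtual media URL."""


def rebase(url, base):
    """Keep the file name from `url` but point it at the share `base`."""
    name = posixpath.basename(urlsplit(url).path)
    return base.rstrip("/") + "/" + name


def insert_with_fallback(insert_media, url, alternate_bases):
    """Try `url` first, then the same file under each alternate base.

    Returns the URL the BMC accepted; re-raises the last rejection otherwise.
    """
    candidates = [url] + [rebase(url, base) for base in alternate_bases]
    last_error = None
    for candidate in candidates:
        try:
            insert_media(candidate)
            return candidate
        except MediaRejected as exc:
            last_error = exc
    raise last_error


def picky_bmc(url):
    """Stand-in for a BMC that refuses plain http:// outright."""
    if url.startswith("http://"):
        raise MediaRejected("unsupported scheme")


accepted = insert_with_fallback(
    picky_bmc,
    "http://conductor.example/redfish/boot-1234.iso",
    ["cifs://files.example/vmedia"],
)
print(accepted)
```

Only the URL handed to the BMC changes; where the conductor writes the image stays the same, which matches the split TheJulia describes.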
okamitok[m]So I've got a just general question on flow that maybe I'm not understanding.... (full message at <https://matrix.org/oftc/media/v1/media/download/AcoWRZe3OD7jr9wyHvi2mt8-77bes5aYjqOQzgAtD2vD-iC81itD_sAZP-6GYZt2am593eTX5PkQiLs-tq_ZjbRCeW_bLnzwAG1hdHJpeC5vcmcvR3ROT3NMZHBzbUdma01rTWVrQW9QQ1Rq>)15:11
JayFto your last line; yes15:12
JayFthat's exactly what happens in flat or neutron network interface15:12
okamitok[m]Got it so that's where my issue is then, everything is working except that last step.15:14
okamitok[m]I'll need to dive into the logs, thanks.15:14
* TheJulia attempts to put brain into policy writing mode15:24
*** MichaelSherman[m] is now known as shermanm[m]15:38
shermanm[m]now that service_steps are a thing, are there any plans around snapshot support? I've got a student working on our internal mechanism for it this summer, and thought it would be nice to try and align with an approach that could get upstreamed15:41
opendevreviewJay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths  https://review.opendev.org/c/openstack/ironic/+/94918615:44
TheJuliaJayF: it occurs to me someone came in with the exact same issue recently16:18
JayFMy current plan is to hack in ironic/drivers/modules/redfish/boot.py:288 and last-minute swap to an smb url in this test env16:19
JayFto see if it will take samba16:19
TheJuliashermanm[m]:  would love to do it and get it into place, that being said I don't have time to do it. I think we could loosely collaborate/mentor if that might help on an upstream-friendly change16:19
TheJuliaJayF: I think that is exactly what that person did based upon the failure from the bmc16:19
JayFif not, I will likely email a vendor to get more info16:19
opendevreviewJulia Kreger proposed openstack/ironic master: Patch configdrive metadata  https://review.opendev.org/c/openstack/ironic/+/94667716:20
TheJulia^ was painful.16:20
TheJuliaBut more so from stop/start multiple times16:20
cardoeSo I think we should look at implementing the upload to BMC instead of instructing the BMC to download it.16:27
opendevreviewJulia Kreger proposed openstack/ironic master: Patch configdrive metadata  https://review.opendev.org/c/openstack/ironic/+/94667716:27
cardoeI mean sure extend how ya need.16:27
cardoeBut we should add the upload to BMC and nudge folks to try that first.16:27
JayFAt this point I'm not thinking in theory, I wanna get a thing working on a specific piece of hardware16:29
cardoeYeah that's why I said "yes extend how ya need"16:30
cardoeJust tossing out a suggestion for a longer term thing.16:30
opendevreviewJay Faulkner proposed openstack/ironic bugfix/26.0: [bugfix-only] Further docs build fixes  https://review.opendev.org/c/openstack/ironic/+/94937316:39
JayFThe docs build failures that continue are incredibly perplexing16:40
* JayF going to maybe try one alternate approach16:40
opendevreviewJay Faulkner proposed openstack/ironic bugfix/26.0: [bugfix-only] Ensure u-c is applied  https://review.opendev.org/c/openstack/ironic/+/94937416:42
JayFtwo different approaches, if the second works it'll be better16:42
JayFalternatively if cores are +1, I can ask infra to mash force merge on these security patches and stop the urgency-clock on fixing docs16:43
shermanm[m]TheJulia: that's exactly what I was hoping for, just broad guidance on footguns to avoid, and a "shape" that might be acceptable. And maybe it just turns into a spec after we've discovered some tradeoffs16:49
Sandzwerg[m]hello ironic. Has anyone heard about race conditions with nvme naming? I see a case where the IPA identified the root disk correctly as the smaller disk (nvme1n1). According to the IPA log the device is <900GiB. But in the booted instance the OS is installed on nvme1n1 (6TB), which was detected as nvme0n1 in the IPA log. I mean I get that different OSes might have different naming schemes but it seems to me right now that IPA is16:50
Sandzwerg[m]writing to the bigger disk but detects that disk as the other, smaller one. 16:50
TheJuliashermanm[m]: Yeah, a spec is a great starting point to try and get to the same place16:51
TheJulia... That is weird, since if I'm remembering the logic correctly the OS should be targeted to the smallest disk greater than 4 GB16:54
TheJuliaSandzwerg[m]: ^16:54
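The default selection rule TheJulia recalls (no hint set: pick the smallest disk above a 4 GB floor) can be written out as a short sketch. The `Disk` type, the function name and the example sizes are made up for illustration, the floor is assumed here to be 4 GiB inclusive, and this is not IPA's actual code.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Disk:
    name: str
    size_bytes: int


def default_root_disk(disks, min_size=4 * GIB):
    """Return the smallest disk that is at least `min_size` bytes."""
    eligible = [d for d in disks if d.size_bytes >= min_size]
    if not eligible:
        raise ValueError("no disk is large enough for the OS")
    return min(eligible, key=lambda d: d.size_bytes)


disks = [
    Disk("/dev/nvme0n1", 6000 * 10 ** 9),  # the 6 TB data disk
    Disk("/dev/nvme1n1", 894 * GIB),       # the small OS disk
    Disk("/dev/sda", 2 * GIB),             # too small, ignored
]
print(default_root_disk(disks).name)
```

The rule depends only on size, so it picks the same physical disk regardless of which name the kernel happened to assign on that boot.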
Sandzwerg[m]We set root hints, but the hint is correct if I can believe the log. According to the log the OS is installed on the correct, so the smaller, disk. But when booting it appears the OS was installed on the wrong disk. But the device name in IPA and OS is swapped. So IPA: nvme0 big disk, nvme1 small disk, OS: nvme0 small disk, nvme1 big disk16:57
Sandzwerg[m]I only have a couple of nodes, all from the same order, that show this behavior. 16:58
TheJuliaoh wow16:59
TheJuliacould it be initialization order differences?16:59
TheJuliaGoing back to.. ?2022? distributions started shipping kernels which did async device initialization17:00
Sandzwerg[m]I'm not sure what it is but it feels like something like that yeah17:01
TheJuliawhich makes actual device by name matching/ordering unreliable across reboots17:01
TheJuliaunless you only have a single such device17:01
Sandzwerg[m]I would expect that even if both OS order/name the devices differently if they get the facts from these devices these should be stable. But right now it feels like IPA mixes the naming at first, then continues to use the mixed names but the device naming itself is changed while that is happening17:03
TheJuliaJayF: I'm good with force merging fixes in for broken doc builds, as long as we work the doc issues17:03
TheJuliaordering for the kernel is purely based upon which device responds first17:04
TheJuliawhat is your root device hint set to?17:04
JayFTheJulia: the thing is, I only see the docs job failing on the patch changes, and IDK why17:06
Sandzwerg[m]The root device hint is <900 and the smaller disk fits to that (894GiB)17:08
TheJuliaso no hint?17:11
JayFGB vs GiB shenanigans?17:11
TheJuliano, one boot the device will be /dev/nvme0, reboot it might be /dev/nvme1 or /dev/nvme017:12
TheJuliaJayF: docs run on a different nodeset17:13
Sandzwerg[m]> so no hint?17:13
Sandzwerg[m]No the hint is "root_device: size <=900". That is correct.17:13
Sandzwerg[m]Should FS UUIDs be stable? Maybe this is an old install, because the FS UUIDs I see in the IPA log and on the OS are different. But then the install to the smaller disk is completely gone17:14
Sandzwerg[m]ohhhhh, the metadata uses the date & time of deployment as UUID and that shows that the partitions on the bigger disk are two months old. It does not explain why all the partitions that IPA wrote to the smaller disk are no longer visible in lsblk but maybe wiping everything is enough. (Yes I know I reaaally need cleaning but we had issues with it last time we tried it)17:20
TheJuliaSandzwerg[m]: UUIDs/WWID/Serial numbers are stable (except on some raid controllers which fake a serial number)17:25
Sandzwerg[m]Yeah these were also not matching, that was the other hint. I'll wipe both disks and try again17:26
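To make the naming point concrete: if two boots enumerate the same pair of NVMe devices in a different order, a lookup by kernel name lands on different hardware, while a lookup by serial number does not. The inventories, serial numbers and sizes below are invented for illustration, and `find_by_serial` is a hypothetical helper, not an IPA function.

```python
def find_by_serial(disks, serial):
    """Return the single disk whose serial matches, whatever its name is."""
    matches = [d for d in disks if d["serial"] == serial]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one disk with serial {serial!r}")
    return matches[0]


# Same two physical disks, enumerated in a different order on each boot.
ipa_boot = [
    {"name": "/dev/nvme0n1", "serial": "BIG-0001", "size_gib": 5588},
    {"name": "/dev/nvme1n1", "serial": "SMALL-0001", "size_gib": 894},
]
os_boot = [
    {"name": "/dev/nvme0n1", "serial": "SMALL-0001", "size_gib": 894},
    {"name": "/dev/nvme1n1", "serial": "BIG-0001", "size_gib": 5588},
]

in_ipa = find_by_serial(ipa_boot, "SMALL-0001")
in_os = find_by_serial(os_boot, "SMALL-0001")
print(in_ipa["name"], in_os["name"])
```

The name differs between the two boots, but the serial and size identify the same device, which is why comparing serials or WWNs between the IPA log and the running OS is a quicker check than comparing device names.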
shermanm[m]Sandzwerg: if wiping fixes it, I'd try to get cleaning enabled if at all possible. we were having similar issues here: https://bugs.launchpad.net/ironic/+bug/2084565, https://bugs.launchpad.net/ironic/+bug/2084852 . worst case, you should be able to use deploy templates to trigger erase_devices_metadata during deployment, instead of as a separate cleaning step.17:29
Sandzwerg[m]The issue is that for some of our deployments we don't want to wipe all disks normally as in these cases the bigger disks hold data that ideally shouldn't get deleted during (re)deployments. I think the issue here was that the root device hint at first was wrong so the first deployment went to a wrong disk and since then it didn't really recover. But I will need to look at cleaning again. I'll look into the links thanks. Maybe17:33
Sandzwerg[m]deploy templates work for us, need to look them up. 17:33
shermanm[m]one thing that I did discover on the last go-around, the rebuild action doesn't trigger automated cleaning17:36
shermanm[m]not sure off the top of my head if it's possible to blacklist some disks from cleaning17:36
JayFTheJulia: in case the vmedia question comes up again: http:// urls (nopearoni, fast error) https:// urls, it actually tries to hit17:37
JayFTheJulia: so I think it's requiring https17:37
TheJuliaoh, nice17:39
JayF(it also liked smb:// urls, but I was unable to get one to work)17:39
TheJuliacifs:// perhaps?17:40
JayFcifs is rejected like http is17:40
JayFhttp/cifs: fast, non-retryable error17:40
JayFsmb/https: appears to try something (we couldn't get smb:// to ever connect, but we got https:// to connect)17:41
Sandzwerg[m]<shermanm[m]> "one thing that I did discover on..." <- hmm somewhere in the back of my head I've heard of that command but never used it. Might be something I need to look into as well. Thanks for the idea.17:44
Sandzwerg[m]<shermanm[m]> "not sure off the top of my..." <- I think back then the only way I found was to write our own hardware(device?)manager and I never got around to do that. But the issues we had were more with ports that were created but not at the correct place (project) and then cleaning would fail.17:46
Sandzwerg[m]BTW I met Aeva last weekend and she mentioned she worked on ironic back then. I'm sure I should greet you all :)17:58
JayFAeva is wonderful people18:04
TheJuliacool cool18:27
TheJuliashermanm[m]: yeah, rebuild was originally modeled on just redeploy the node, don't clean the state because in the partition image days we had this preserve_ephemeral context18:29
* TheJulia twitches18:29
* JayF uses rebuild downstream for "get a new OS image but keep the same physical hardware"18:31
JayFno nonsensical preserving of ephemeral here :D 18:32
JayFlike trying to capture steam in a butterfly net; the preserved ephemeral :D 18:32
TheJuliaheh18:36
TheJuliaWhy would anyone want to capture a butterfly?!18:36
* TheJulia tries to not draw lines and things there18:40
Sandzwerg[m]<JayF> "Aeva is wonderful people" <- I agree, so are you all. I always enjoy talking to you all <318:50
Sandzwerg[m]<JayF> "uses rebuild downstream for "get..." <- We currently do that with a new deployment but rebuild might be better suited 18:51
TheJuliaSandzwerg[m]: you'll want a bit more specifics on the root device hint, just in case since you have multiple devices19:00
Sandzwerg[m]For us it works as we only have two (or three) disks in a single node. A small(ish) disk for the OS (usually a raid from some onboard raid controller) and one or more bigger disks for data, which all have the same size. The OS disk is usually <1TiB and I think the data disks are usually >2TiB. So it's relatively clear.19:03
Sandzwerg[m]Adding other properties might also be hard, because of the raid controller there is no manufacturer reported, and since all disks on new hardware are SSD/NVMe some of the other properties one could use to differentiate are not helpful anymore19:06
TheJuliayeah19:22
TheJuliathat is a common problem with hardware raid controllers19:22
opendevreviewJay Faulkner proposed openstack/ironic master: Inspection throws exception on CPU-less systems  https://review.opendev.org/c/openstack/ironic/+/94909019:53
opendevreviewJay Faulkner proposed openstack/ironic master: Inspection throws exception on CPU-less systems  https://review.opendev.org/c/openstack/ironic/+/94909019:54
opendevreviewJulia Kreger proposed openstack/ironic master: Patch configdrive metadata  https://review.opendev.org/c/openstack/ironic/+/94667720:21
opendevreviewJay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths  https://review.opendev.org/c/openstack/ironic/+/94918620:52
opendevreviewJay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths  https://review.opendev.org/c/openstack/ironic/+/94918621:03
opendevreviewJulia Kreger proposed openstack/ironic master: provide host_id to neutron early on  https://review.opendev.org/c/openstack/ironic/+/94637821:06
opendevreviewJulia Kreger proposed openstack/ironic master: Patch configdrive metadata  https://review.opendev.org/c/openstack/ironic/+/94667721:06
opendevreviewJulia Kreger proposed openstack/ironic master: Consider missing MTU invalid metadata  https://review.opendev.org/c/openstack/ironic/+/94938521:06
JayFI am at my wits end: https://review.opendev.org/c/openstack/ironic/+/949186 fails the docs job but https://review.opendev.org/c/openstack/ironic/+/949373 passes it. I've not touched *any* of the config in question in the backported patch21:22
JayFIt almost feels like it's running different code in the docs job for my change21:22
JayFAnyone with any ideas, please share them. I'm to the point where Monday I'll get a held node and look at what's going on21:31
okamitok[m]Hey everyone, so I did some more digging and one thing I'm not sure about is when the inspection should happen?21:48
okamitok[m]I was reading some documentation that says to do a baremetal node inspect before provide to make it available for Nova.21:48
okamitok[m]That throws an error about not being supported by ipmi.21:48
okamitok[m]But if I do a baremetal introspection start node_id it powers on and removes the ignore from the ports in dnsmasq.21:49
okamitok[m]Once it finishes though the ignore gets added back and even doing a server create it stays ignored.21:49
JayFI suggest paying less attention to the DHCP config behind the stage :D 21:50
JayFSo inspection, like a lot of things in ironic, can be done different ways21:50
JayFin your config file and in the nodes you'll have an inspect_interface referenced21:50
JayFfor an IPMI node, that'd likely be set to "agent" which means we boot a ramdisk and perform inspection when you call `baremetal node inspect $name`21:51
JayF\21:51
JayFthat's the dhcp actions you saw21:51
JayFdid the node go back into manageable once complete? 21:51
JayFmost of the information you need will be in the node, fields of last_error, provision_state, target_provision_state among others21:51
JayFalso https://docs.openstack.org/ironic/latest/admin/node-history.html can perhaps give you insight into previous failures if you've not been checking last_error21:52
