Friday, 2025-05-09

dxterslab	TheJulia: pyghmi SOL to supermicro and then proxied to a websocket and using xterm.js for interactive shell works	00:49
TheJulia	impressive	00:50
dxterslab	the good thing is that sol is generic and not oem specific. I will be trying the same approach with dell soon	01:19
TheJulia	cool	01:36
JayF	Thanks for the new xterm-ipmi console interface ;)	02:51
dxterslab	The current approach I believe I will be taking is having a proxy that does IPMI SOL in the backend and websocket on the front end. I heavily rely on metal3, so my assumption is that ironic would own this container/app, and metal3 should proxy the websocket for Kubernetes?	02:57
JayF	Look at the recent redfish graphical console code, it follows a similar pattern	03:05
JayF	I also suspect we need to write different ways to spawn the proxy container there, too.	03:05
JayF	https://opendev.org/openstack/ironic/commit/25a3dd076a0a8d3f4bbb5886252f6d08d78e33f9 Is the last commit in that series.	03:08
dxterslab	I saw the graphical-console. It is a selenium browser wrapper. I tried to interact with the websockets provided by redfish, first challenge is the custom RFB used by SMC and their BMC hardware, second challenge, Dell websocket didn’t easily work with vanilla novnc… because of time I have parked this option, but will revisit in the future. Also, to my understanding, novnc doesn’t allow for copy paste for text nor	03:09
dxterslab	keeps shell history	03:09
JayF	Well I'm saying it uses a sidebar container	03:21
JayF	You could follow the pattern, and just put your stuff into the container instead of the VNC stuff	03:21
JayF	**sidecar	03:22
dxterslab	Ah! Gotcha	03:28
rpittau	good morning ironic! o/	07:15
queensly[m]	Good morning	08:23
AmarachiOrdor[m]	Good morning everyone!	08:26
freemanboss[m]	Good afternoon everyone	11:17
opendevreview	Jay Faulkner proposed openstack/ironic stable/2024.2: OSSA-2025-001: Disallow unsafe image file:// paths https://review.opendev.org/c/openstack/ironic/+/949174	14:05
opendevreview	Jay Faulkner proposed openstack/ironic stable/2024.1: OSSA-2025-001: Disallow unsafe image file:// paths https://review.opendev.org/c/openstack/ironic/+/949175	14:05
JayF	Anyone poked at vmedia+openbmc based BMCs yet?	14:45
JayF	I found what looks like an old spec from their GitHub that looks like they may have chosen a cifs-based implementation at least as an option, I'm really hoping that's not the case	14:58
TheJulia	I've had a few off-hand conversations about doing cifs on weird vendors in the past	15:01
JayF	irmc driver supports it	15:02
TheJulia	and the consensus was maybe having a flag in ironic which could know to "send it a cifs url" would be ideal, but yeah....	15:02
JayF	for their mc	15:02
TheJulia	fun	15:02
TheJulia	yeah	15:02
TheJulia	I think that is sort of predicated upon the idea that we change the output url for where to grab the artifact from, but don't change where we write it; in other words, make it the expectation that the operator sets up a cifs share with it	15:03
JayF	yeah, what I'm looking at is just rejecting an http url outright (not even trying to GET/HEAD it) so I think I might have hit the reverse lottery :|	15:03
JayF	makes sense	15:03
TheJulia	or a cifs share which points to the vmedia folder, or something	15:03
JayF	and gives me ideas on what to try with sushy directly	15:03
TheJulia	I think it makes a ton of sense to detect the failure, and then try submitting a changed url	15:03
TheJulia	fwiw	15:03
JayF	probably trivial to hack in for seeing if I can get it to work	15:04
JayF	ack, thanks for the suggestion, I wouldn't have considered just doing a basic fallback	15:04
okamitok[m]	So I've got a just general question on flow that maybe I'm not understanding.... (full message at <https://matrix.org/oftc/media/v1/media/download/AcoWRZe3OD7jr9wyHvi2mt8-77bes5aYjqOQzgAtD2vD-iC81itD_sAZP-6GYZt2am593eTX5PkQiLs-tq_ZjbRCeW_bLnzwAG1hdHJpeC5vcmcvR3ROT3NMZHBzbUdma01rTWVrQW9QQ1Rq>)	15:11
JayF	to your last line; yes	15:12
JayF	that's exactly what happens in flat or neutron network interface	15:12
okamitok[m]	Got it so that's where my issue is then, everything is working except that last step.	15:14
okamitok[m]	I'll need to dive into the logs, thanks.	15:14
* TheJulia attempts to put brain into policy writing mode		15:24
*** MichaelSherman[m] is now known as shermanm[m]		15:38
shermanm[m]	now that service_steps are a thing, are there any plans around snapshot support? I've got a student working on our internal mechanism for it this summer, and thought it would be nice to try and align with an approach that could get upstreamed	15:41
opendevreview	Jay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths https://review.opendev.org/c/openstack/ironic/+/949186	15:44
TheJulia	JayF: it occurs to me someone came in with the exact same issue recently	16:18
JayF	My current plan is to hack in ironic/drivers/modules/redfish/boot.py:288 and last-minute swap to an smb url in this test env	16:19
JayF	to see if it will take samba	16:19
TheJulia	shermanm[m]: would love to do it and get it into place, that being said I don't have time to do it. I think we could loosely collaborate/mentor if that might help on a upstream friendly change	16:19
TheJulia	JayF: I think that is exactly what that person did based upon the failure from the bmc	16:19
JayF	if not, I will likely email a vendor to get more info	16:19
opendevreview	Julia Kreger proposed openstack/ironic master: Patch configdrive metadata https://review.opendev.org/c/openstack/ironic/+/946677	16:20
TheJulia	^ was painful.	16:20
TheJulia	But more so from stop/start multiple times	16:20
cardoe	So I think we should look at implementing the upload to BMC instead of instructing the BMC to download it.	16:27
opendevreview	Julia Kreger proposed openstack/ironic master: Patch configdrive metadata https://review.opendev.org/c/openstack/ironic/+/946677	16:27
cardoe	I mean sure extend how ya need.	16:27
cardoe	But we should add the upload to BMC and nudge folks to try that first.	16:27
JayF	At this point I'm not thinking in theory, I wanna get a thing working on a specific piece of hardware	16:29
cardoe	Yeah that's why I said "yes extend how ya need"	16:30
cardoe	Just tossing out a suggestion for a longer term thing.	16:30
opendevreview	Jay Faulkner proposed openstack/ironic bugfix/26.0: [bugfix-only] Further docs build fixes https://review.opendev.org/c/openstack/ironic/+/949373	16:39
JayF	The docs build failures that continue are incredibly perplexing	16:40
* JayF going to maybe try one alternate approach		16:40
opendevreview	Jay Faulkner proposed openstack/ironic bugfix/26.0: [bugfix-only] Ensure u-c is applied https://review.opendev.org/c/openstack/ironic/+/949374	16:42
JayF	two different approaches, if the second works it'll be better	16:42
JayF	alternatively if cores are +1, I can ask infra to mash force merge on these security patches and stop the urgency-clock on fixing docs	16:43
shermanm[m]	TheJulia: that's exactly what I was hoping for, just broad guidance on footguns to avoid, and a "shape" that might be acceptable. And maybe it just turns into a spec after we've discovered some tradeoffs	16:49
Sandzwerg[m]	hello ironic. Has anyone heard about race conditions with nvme naming? I see a case where the IPA identified the root disk correctly as the smaller disk (nvme1n1). According to the IPA log the device is <900GiB. But in the booted instance the OS is installed on nvme1n1 (6TB) which was detected as nvme0n1 in the IPA log. I mean I get that different OS'es might have different naming schemes but it seems to me right now that IPA is	16:50
Sandzwerg[m]	writing to the bigger disk but detects that disk as the other, smaller one.	16:50
TheJulia	shermanm[m]: Yeah, a spec is a great starting point to try and get to the same place	16:51
TheJulia	... That is weird, since if I'm remembering the logic correctly the OS should be targeted to the smallest disk greater than 4 GB	16:54
TheJulia	Sandzwerg[m]: ^	16:54
Sandzwerg[m]	We set root hints, but the hint is correct if I can believe the log. According to the log the OS is installed on the correct, so the smaller, disk. But when booting it appears the OS was installed on the wrong disk. But the device name in IPA and OS is swapped. So IPA: nvme0 big disk, nvme1 small disk, OS: nvme0 small disk, nvme1 big disk	16:57
Sandzwerg[m]	I only have a couple of nodes, all from the same order, that show this behavior.	16:58
TheJulia	oh wow	16:59
TheJulia	could it be initialization order differences?	16:59
TheJulia	Going back to.. ?2022? distributions started shipping kernels which did async device initialization	17:00
Sandzwerg[m]	I'm not sure what it is but it feels like something like that yeah	17:01
TheJulia	which makes actual device by name matching/ordering unreliable across reboots	17:01
TheJulia	unless you only have a single such device	17:01
Sandzwerg[m]	I would expect that even if both OS order/name the devices differently if they get the facts from these devices these should be stable. But right now it feels like IPA mixes the naming at first, then continues to use the mixed names but the device naming itself is changed while that is happening	17:03
TheJulia	JayF: I'm good with force merging fixes in for broken doc builds, as long as we work the doc issues	17:03
TheJulia	ordering for the kernel is purely based upon which device responds first	17:04
TheJulia	what is your root device hint set to?	17:04
JayF	TheJulia: the thing is, I only see the docs job failing on the patch changes, and IDK why	17:06
Sandzwerg[m]	The root device hint is <900 and the smaller disk fits to that (894GiB)	17:08
TheJulia	so no hint?	17:11
JayF	GB vs GiB shenanigans?	17:11
TheJulia	no, one boot the device will be /dev/nvme0, reboot it might be /dev/nvme1 or /dev/nvme0	17:12
TheJulia	JayF: docs run on a different nodeset	17:13
Sandzwerg[m]	> so no hint?	17:13
Sandzwerg[m]	No the hint is "root_device: size <=900". That is correct.	17:13
Sandzwerg[m]	Should FS UUIDs be stable? Maybe this is an old install, because the FS UUIDs I see in the IPA log and on the OS are different. But then the install to the smaller disk is completely gone	17:14
Sandzwerg[m]	ohhhhh, the metadata uses the date & time of deployment as UUID and that shows that the partitions on the bigger disk are two months old. It does not explain why all the partitions that IPA wrote to the smaller disk are no longer visible in lsblk but maybe wiping everything is enough. (Yes I know I reaaally need cleaning but we had issues with it last time we tried it)	17:20
TheJulia	Sandzwerg[m]: UUIDs/WWID/Serial numbers are stable (except on some raid controllers which fake a serial number)	17:25
Sandzwerg[m]	Yeah these were also not matching, that was the other hint. I'll wipe both disks and try again	17:26
shermanm[m]	Sandzwerg: if wiping fixes it, I'd try to get cleaning enabled if at all possible. we were having similar issues here: https://bugs.launchpad.net/ironic/+bug/2084565, https://bugs.launchpad.net/ironic/+bug/2084852 . worst case, you should be able to use deploy templates to trigger erase_devices_metadata during deployment, instead of as a separate cleaning step.	17:29
Sandzwerg[m]	The issue is that for some of our deployments we don't want to wipe all disks normally as in these cases the bigger disks hold data that ideally shouldn't get deleted during (redeployments). I think the issue here was that the root device hint at first was wrong so the first deployment went to a wrong disk and since then it didn't really recover. But I will need to look at cleaning again. I'll look into the links thanks. Maybe	17:33
Sandzwerg[m]	deploy templates work for us, need to look them up.	17:33
shermanm[m]	one thing that I did discover on the last go-around, the rebuild action doesn't trigger automated cleaning	17:36
shermanm[m]	not sure off the top of my head if it's possible to blacklist some disks from cleaning	17:36
JayF	TheJulia: in case the vmedia question comes up again: http:// urls (nopearoni, fast error) https:// urls, it actually tries to hit	17:37
JayF	TheJulia: so I think it's requiring https	17:37
TheJulia	oh, nice	17:39
JayF	(it also liked smb:// urls, but I was unable to get one to work)	17:39
TheJulia	cifs:// perhaps?	17:40
JayF	cifs is rejected like http is	17:40
JayF	http/cifs: fast, non-retryable error	17:40
JayF	smb/https: appears to try something (we couldn't get smb:// to ever connect, but we got https:// to connect)	17:41
Sandzwerg[m]	<shermanm[m]> "one thing that I did discover on..." <- hmm somewhere in the back of my head I've heard of that command but never used it. Might be something I need to look into as well. Thanks for the idea.	17:44
Sandzwerg[m]	<shermanm[m]> "not sure off the top of my..." <- I think back then the only way I found was to write our own hardware(device?)manager and I never got around to do that. But the issues we had were more with ports that were created but not at the correct place (project) and then cleaning would fail.	17:46
Sandzwerg[m]	BTW I met Aeva last weekend and she mentioned she worked on ironic back then. I'm sure I should greet you all :)	17:58
JayF	Aeva is wonderful people	18:04
TheJulia	cool cool	18:27
TheJulia	shermanm[m]: yeah, rebuild was originally modeled on just redeploy the node, don't clean the state because in the partition image days we had this preserve_ephemeral context	18:29
* TheJulia twitches		18:29
* JayF uses rebuild downstream for "get a new OS image but keep the same physical hardware"		18:31
JayF	no nonsensical preserving of ephemeral here :D	18:32
JayF	like trying to capture steam in a butterfly net; the preserved ephemeral :D	18:32
TheJulia	heh	18:36
TheJulia	Why would anyone want to capture a butterfly?!	18:36
* TheJulia tries to not draw lines and things there		18:40
Sandzwerg[m]	<JayF> "Aeva is wonderful people" <- I agree, so are you all. I always enjoy talking to you all <3	18:50
Sandzwerg[m]	<JayF> "uses rebuild downstream for "get..." <- We currently do that with a new deployment but rebuild might be better suited	18:51
TheJulia	Sandzwerg[m]: you'll want a bit more specifics on the root device hint, just in case since you have multiple devices	19:00
Sandzwerg[m]	For us it works as we only have two (or three) disks in a single node. A small(ish) disk for the OS (usually a raid from some onboard raid controller) and one or more bigger disks for data, which all have the same size. The OS disk is usually <1TiB and I think the data disks are usually >2TiB. So it's relatively clear.	19:03
Sandzwerg[m]	Adding other properties might also be hard, because of the raid controller there is no manufacturer reported, and since all disks on new hardware are SSD/NVMe some of the other properties one could use to differentiate are not helpful anymore	19:06
TheJulia	yeah	19:22
TheJulia	that is a common problem with hardware raid controllers	19:22
opendevreview	Jay Faulkner proposed openstack/ironic master: Inspection throws exception on CPU-less systems https://review.opendev.org/c/openstack/ironic/+/949090	19:53
opendevreview	Jay Faulkner proposed openstack/ironic master: Inspection throws exception on CPU-less systems https://review.opendev.org/c/openstack/ironic/+/949090	19:54
opendevreview	Julia Kreger proposed openstack/ironic master: Patch configdrive metadata https://review.opendev.org/c/openstack/ironic/+/946677	20:21
opendevreview	Jay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths https://review.opendev.org/c/openstack/ironic/+/949186	20:52
opendevreview	Jay Faulkner proposed openstack/ironic bugfix/26.0: OSSA-2025-001: Disallow unsafe image file:// paths https://review.opendev.org/c/openstack/ironic/+/949186	21:03
opendevreview	Julia Kreger proposed openstack/ironic master: provide host_id to neutron early on https://review.opendev.org/c/openstack/ironic/+/946378	21:06
opendevreview	Julia Kreger proposed openstack/ironic master: Patch configdrive metadata https://review.opendev.org/c/openstack/ironic/+/946677	21:06
opendevreview	Julia Kreger proposed openstack/ironic master: Consider missing MTU invalid metadata https://review.opendev.org/c/openstack/ironic/+/949385	21:06
JayF	I am at my wits end: https://review.opendev.org/c/openstack/ironic/+/949186 fails the docs job but https://review.opendev.org/c/openstack/ironic/+/949373 passes it. I've not touched any of the config in question in the backported patch	21:22
JayF	It almost feels like it's running different code in the docs job for my change	21:22
JayF	Anyone with any ideas, please share them. I'm to the point where Monday I'll get a held node and look at what's going on	21:31
okamitok[m]	Hey everyone, so I did some more digging and one thing I'm not sure about is when the inspection should happen?	21:48
okamitok[m]	I was reading some documentation that says to do a baremetal node inspect before provide to make it available for Nova.	21:48
okamitok[m]	That throws an error about not being supported by ipmi.	21:48
okamitok[m]	But if I do a baremetal introspection start node_id it powers on and removes the ignore from the ports in dnsmasq.	21:49
okamitok[m]	Once it finishes though the ignore gets added back and even doing a server create it stays ignored.	21:49
JayF	I suggest paying less attention to the DHCP config behind the stage :D	21:50
JayF	So inspection, like a lot of things in ironic, can be done different ways	21:50
JayF	in your config file and in the nodes you'll have an inspect_interface referenced	21:50
JayF	for an IPMI node, that'd likely be set to "agent" which means we boot a ramdisk and perform inspection when you call `baremetal node $name inspect`	21:51
JayF	\	21:51
JayF	that's the dhcp actions you saw	21:51
JayF	did the node go back into manageable once complete?	21:51
JayF	most of the information you need will be in the node, fields of last_error, provision_state, target_provision_state among others	21:51
JayF	also https://docs.openstack.org/ironic/latest/admin/node-history.html can perhaps give you insight into previous failures if you've not been checking last_error	21:52