Friday, 2024-10-18

gmannJayF: cool. btw this grenade GLOBAL_VENV is ready https://review.opendev.org/c/openstack/ironic/+/93201603:56
gmanniurygregory: ^^03:56
rpittaugood morning ironic! happy friday! o/06:52
kubajjgood morning rpittau, and ironic! o/06:52
rpittauhey kubajj :)06:57
opendevreviewMerged openstack/sushy master: bump pbr to match what pyproject.toml requests  https://review.opendev.org/c/openstack/sushy/+/93263807:43
TheJuliagood morning!12:17
rpittauhey TheJulia :)12:18
iurygregoryhappy friday ironic o/12:47
opendevreviewDmitry Tantsur proposed openstack/ironic master: Replace image_format_inspector with its oslo.utils version  https://review.opendev.org/c/openstack/ironic/+/92990413:22
opendevreviewcid proposed openstack/ironic master: Gracefully handle bad request exception  https://review.opendev.org/c/openstack/ironic/+/93184913:40
opendevreviewcid proposed openstack/ironic-specs master: Add a Kea DHCP backend  https://review.opendev.org/c/openstack/ironic-specs/+/93102513:43
cidI'm going out of Keyboard for a while, but if anyone could confirm/debunk my hypothesis here https://review.opendev.org/c/openstack/ironic/+/931849, I will appreciate it. Just leave the feedback on there.14:16
TheJuliacid: I suspect that is a good change. I think I've seen something like that befor314:51
TheJuliabefore14:51
shermanmI had some good discussion on the realities of node-cleaning yesterday, but had some questions come up regarding instance rebuilds.15:17
shermanmdo rebuilds use the same root disk as the prior instance? or does it run through the disk selection logic every time?15:17
shermanmmy concern is that if it hits the same "pick a root disk of minimum size sort of randomly" logic as we've been seeing, then rebuilds would be vulnerable to the "stale partitions / configdrive" on secondary disks issue that i've been working around with automated cleaning disabled15:24
rpittaubye everyone, have a great weekend! o/15:33
opendevreviewDmitry Tantsur proposed openstack/ironic master: Actually ignore [inspector]power_off with fast track  https://review.opendev.org/c/openstack/ironic/+/93272015:34
TheJuliashermanm: if you set a static root device hint, yes, however it does run through selection *each* time as well and the algorithm is static. The last time we changed it was to logically handle device mapper/multi-devices and that was done as filtering a while back.16:08
TheJuliashermanm: I used to know this better off the top of my head, but it is the largest device smaller than like 10GB or something funky like that. If you want I can dig up the code in the ironic-python-agent's hardware manager, but *generally* we expect it to be static. If you've got changing, could the devices be the same size? We *have* seen that happen.16:09
shermanmyes, that's exactly the issue we're working around16:12
shermanma bunch of nodes with N disks of same, smallest size16:12
shermanmand wanting to avoid needing to set a static hint on every node16:13
shermanmcommon case is that a node was delivered with raid1 boot disk, and was then reconfigured as separate disks16:13
TheJuliais the device naming at leas consistent? 16:13
TheJulialeast16:13
TheJuliai.e. no rhel 9.216:13
shermanmthe naming inside e.g. ubuntu seems to be relatively consistent, but we keep ending up with e.g. `/` on /dev/sda, and `/efi` on /dev/sdb16:14
shermanmor cloud-init reading an old config-drive from sdb (explicitly with automated cleaning disabled)16:14
shermanmbut I haven't tested the rebuild case, I think it would run into the same issue16:14
shermanmmaybe we could do some kind of consistent hashing off of the wwn as a tie-breaker?16:15
TheJuliaeww, yeah, one moment16:17
TheJuliahttps://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L1718-L178016:18
cidTheJulia: noted, thanks.16:19
TheJuliahttps://github.com/openstack/ironic-lib/blob/master/ironic_lib/utils.py#L449-L48716:19
TheJuliashermanm: so realistically, we would likely need to add some further smarts to do tie breaking around https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L1774C1-L1774C3816:20
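For illustration only, the tie-breaking idea discussed above could look something like the sketch below. It is not the ironic-python-agent code: the `Disk` type, the `pick_root_disk` helper, and the 4 GiB threshold are all hypothetical, and it uses a plain sort on the WWN rather than the consistent hashing shermanm floated. The point is only that a stable identifier as a secondary sort key makes the choice independent of kernel enumeration order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

GIB = 1024 ** 3


@dataclass(frozen=True)
class Disk:
    """Hypothetical stand-in for a block device record."""
    name: str                  # kernel name, may change between boots
    size: int                  # size in bytes
    wwn: Optional[str] = None  # stable identifier, if the device reports one


def pick_root_disk(disks: Iterable[Disk], min_size: int = 4 * GIB) -> Disk:
    """Pick the smallest disk of at least min_size, breaking ties by WWN.

    The min_size default is an assumption made for this sketch.
    """
    candidates = [d for d in disks if d.size >= min_size]
    if not candidates:
        raise ValueError("no disk is large enough")
    # Sort key: size first, then disks that have a WWN before those that
    # do not, then the WWN itself, and the kernel name only as a last resort.
    return min(candidates, key=lambda d: (d.size, d.wwn is None, d.wwn or "", d.name))


# Two identically sized disks whose kernel names swap between boots.
boot_one = [Disk("sda", 480 * GIB, "0x5001"), Disk("sdb", 480 * GIB, "0x5002")]
boot_two = [Disk("sda", 480 * GIB, "0x5002"), Disk("sdb", 480 * GIB, "0x5001")]
same_pick = pick_root_disk(boot_one).wwn == pick_root_disk(boot_two).wwn
```

With this ordering both boots select the disk with the lower WWN, whichever of /dev/sda or /dev/sdb it shows up as. It would not help on hardware that reports no WWN at all, where the sketch falls back to the kernel name.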
shermanmalternatively, I had a proposed fix which just runs `erase_devices_metadata` prior to writing the new disk image, but I'm unsure how that would interact with things like the `preserve_ephemeral` flag for rebuild16:22
TheJuliait would destroy the structure16:26
TheJuliabecause AIUI, we don't recreate/re-save the structure16:26
TheJuliaand it also goes into the initial metadata sections16:27
shermanmyeah, I don't think we've ever tried that flag, but it jumped out to me as an edge case that makes it messy16:27
TheJuliaWe do have a thing where some of this happens, but obviously we are all about "please clean" but on rebuilds this makes a lot of sense as an issue if the match can choose a matching identically sized disk16:28
shermanmI tested the approach with deploy templates, and it was working for "normal" deploys, I've got some rebuilds testing now to double-check16:28
TheJuliaI still think there is a fundamental issue with similarly sized devices, just never had anyone doing it with rebuilds, really16:30
TheJuliaand typically.16:30
TheJuliatypically, it should still end up on the same device16:30
TheJuliasince the data gets serialized out16:30
TheJuliabut... maybe the kernel in the ramdisk changing might impact that16:30
* TheJulia knows it would with newer rhel16:30
shermanmyeah, I don't have a root cause for the ordering change16:30
shermanmbut we were having ~50% failure rates on our nodes with 2 matching disks16:31
shermanmeven after a manual cleaning to wipe out prior state16:31
TheJuliaif there is a bug with the ironic-python-agent logs from 1 or more attempts, that might help16:31
TheJuliaeek16:31
shermanmyeah, let me see what I can pull out from the logs16:32
shermanmthese are all on dell nodes btw. A bunch of them have those BOSS dual m.2 boot drives, maybe there's an issue with how those enumerate?16:33
shermanmin any case, I've got this bug open https://bugs.launchpad.net/ironic/+bug/2084565 , and I'll dig up some more context16:34
cardoeshermanm: we (rackspace) need vlan trunks as well on baremetal so it's something we're interested in as well. James gave the talk but you can ping me about the inspection stuff. I'm working to get rid of all the special bits and make it part of an ironic flow as much as possible and where it's not have some docs around how it could work.16:37
shermanmnice! i'll definitely be in touch, I've been working on our own automation about inspection (combination of in-band and out-of band), and how to use that data not just for node enrollment, but to detect divergence from baseline (e.g. firmware versions have changed, a disk is missing, etc)16:39
shermanmright now it runs ~weekly on nodes when they're not in use, and makes a PR against a big git repo of json data when a change is detected16:39
shermanmbut could clearly be more integrated16:40
TheJuliashermanm: possibly, we generally see folks raid-1 them since... that is the point of the controller. It also does that out of the box16:44
TheJuliaor at least, historically did reset the device after a full system reset/wipe and assembled two m.2 devices on a BOSS to a raid116:44
opendevreviewMichael Sherman proposed openstack/ironic master: allow disk cleaning during deploy  https://review.opendev.org/c/openstack/ironic/+/93273116:53
shermanmsorry, I will eventually figure out the correct use of gerrit instead of making new issues by accident16:55
clarkbshermanm: the change id in the footer of the commit message is what gerrit uses to tie a new push to an existing change. If project + target branch + change id match an existing change you get a new version rather than a new change16:56
clarkban easy way to update things is to git commit --amend but you can also copy and paste change ids around etc16:56
shermanmah, ok. but if I needed to change the target branch (because submitted to the wrong one initially), that would make a new change for the same ID?16:57
clarkbcorrect the unique identifier is the project + target branch + changeid tuple. So moving branches requires a new change in gerrit16:58
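A small model of the rule clarkb describes, as a sketch only: it is not Gerrit's implementation, and `ChangeKey`, `classify_push`, and the Change-Id value are made up for illustration. It shows why pushing the same Change-Id to a different target branch creates a new change rather than a new patchset.

```python
from typing import NamedTuple, Set


class ChangeKey(NamedTuple):
    """The (project, target branch, Change-Id) tuple that identifies a change."""
    project: str
    branch: str
    change_id: str


def classify_push(existing: Set[ChangeKey], key: ChangeKey) -> str:
    """Say whether a push would update an existing change or open a new one."""
    return "new patchset" if key in existing else "new change"


# Hypothetical Change-Id; real ones come from the commit message footer.
existing = {ChangeKey("openstack/ironic", "stable/2024.2", "I0123abcd")}

amended = classify_push(existing, ChangeKey("openstack/ironic", "stable/2024.2", "I0123abcd"))
retargeted = classify_push(existing, ChangeKey("openstack/ironic", "master", "I0123abcd"))
```

Here `amended` is "new patchset" and `retargeted` is "new change": only the branch differs, but that is enough to miss the existing tuple.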
TheJuliashermanm: just cherry-pick your old one onto the master branch17:07
TheJuliaoh, I guess you did17:08
TheJuliaIt's all good17:08
shermanmI guess I can just restore the abandoned change? didn't want to lose the context in the comments17:08
TheJuliait's not lost even when abandoned17:09
TheJuliajust slightly harder to see17:09
opendevreviewDoug Goldstein proposed openstack/ironic-python-agent master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/ironic-python-agent/+/93273417:14
opendevreviewDoug Goldstein proposed openstack/ironic master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/ironic/+/93273517:16
opendevreviewDoug Goldstein proposed openstack/networking-baremetal master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/networking-baremetal/+/93273617:17
opendevreviewDoug Goldstein proposed openstack/networking-generic-switch master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/networking-generic-switch/+/93273717:19
cardoeYou know what I just realized as I'm spamming this...17:19
cardoethe requirements.txt dependency is wrong.17:19
cardoeWe don't have a run time dependency on that.17:20
opendevreviewDoug Goldstein proposed openstack/ironic-lib master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/ironic-lib/+/93273817:21
cardoeI'll follow that up in a bit.17:21
cardoeOkay that's the last one... what'd I miss that we wanna for sure keep going?17:22
cardoeI wasn't gonna touch ironic-inspector17:26
JayFI'll look at those, but it needs to be done everywhere19:03
JayFsadly the ironic change looks like it needs more19:07
JayFany of our jobs that use wsgi just don't work19:08
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Migrate to oslo.utils-based format_inspector  https://review.opendev.org/c/openstack/ironic-python-agent/+/92846319:26
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Cleanup usage of imported-from-ironic-lib disk_utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/92846619:26
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Remove use of ironic_lib i18n module  https://review.opendev.org/c/openstack/ironic-python-agent/+/93008019:26
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Migrate more trivial code from ironic-lib  https://review.opendev.org/c/openstack/ironic-python-agent/+/92877919:26
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Migrate to oslo.utils-based format_inspector  https://review.opendev.org/c/openstack/ironic-python-agent/+/92846319:33
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Cleanup usage of imported-from-ironic-lib disk_utils  https://review.opendev.org/c/openstack/ironic-python-agent/+/92846619:33
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Remove use of ironic_lib i18n module  https://review.opendev.org/c/openstack/ironic-python-agent/+/93008019:33
opendevreviewJay Faulkner proposed openstack/ironic-python-agent master: Migrate more trivial code from ironic-lib  https://review.opendev.org/c/openstack/ironic-python-agent/+/92877919:33
TheJuliao/ folks, heading out for the weekend19:39
iurygregoryhave a great weekend TheJulia o/19:42
JayF\o19:43
*** ubuntu is now known as Guest676021:40
JayFI got some of these heading my way: https://wiki.sipeed.com/hardware/en/kvm/NanoKVM/introduction.html22:17
JayFlikely will be my dev bmc for redfish stuff that's not oem-specific22:17
