gmann | JayF: cool. btw this grenade GLOBAL_VENV is ready https://review.opendev.org/c/openstack/ironic/+/932016 | 03:56 |
---|---|---|
gmann | iurygregory: ^^ | 03:56 |
rpittau | good morning ironic! happy friday! o/ | 06:52 |
kubajj | good morning rpittau, and ironic! o/ | 06:52 |
rpittau | hey kubajj :) | 06:57 |
opendevreview | Merged openstack/sushy master: bump pbr to match what pyproject.toml requests https://review.opendev.org/c/openstack/sushy/+/932638 | 07:43 |
TheJulia | good morning! | 12:17 |
rpittau | hey TheJulia :) | 12:18 |
iurygregory | happy friday ironic o/ | 12:47 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Replace image_format_inspector with its oslo.utils version https://review.opendev.org/c/openstack/ironic/+/929904 | 13:22 |
opendevreview | cid proposed openstack/ironic master: Gracefully handle bad request exception https://review.opendev.org/c/openstack/ironic/+/931849 | 13:40 |
opendevreview | cid proposed openstack/ironic-specs master: Add a Kea DHCP backend https://review.opendev.org/c/openstack/ironic-specs/+/931025 | 13:43 |
cid | I'm going out of Keyboard for a while, but if anyone could confirm/debunk my hypothesis here https://review.opendev.org/c/openstack/ironic/+/931849, I will appreciate it. Just leave the feedback on there. | 14:16 |
TheJulia | cid: I suspect that is a good change. I think I've seen something like that before | 14:51 |
shermanm | I had some good discussion on the realities of node-cleaning yesterday, but had some questions come up regarding instance rebuilds. | 15:17 |
shermanm | do rebuilds use the same root disk as the prior instance? or does it run through the disk selection logic every time? | 15:17 |
shermanm | my concern is that if it hits the same "pick a root disk of minimum size sort of randomly" logic as we've been seeing, then rebuilds would be vulnerable to the "stale partitions / configdrive" on secondary disks issue that i've been working around with automated cleaning disabled | 15:24 |
rpittau | bye everyone, have a great weekend! o/ | 15:33 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Actually ignore [inspector]power_off with fast track https://review.opendev.org/c/openstack/ironic/+/932720 | 15:34 |
TheJulia | shermanm: if you set a static root device hint, yes, however it does run through selection *each* time as well and the algorithm is static. The last time we changed it was to logically handle device mapper/multi-devices and that was done as filtering a while back. | 16:08 |
TheJulia | shermanm: I used to know this better off the top of my head, but it is the largest device smaller than like 10GB or something funky like that. If you want I can dig up the code in the ironic-python-agent's hardware manager, but *generally* we expect it to be static. If you've got it changing, could the devices be the same size? We *have* seen that happen. | 16:09 |
shermanm | yes, that's exactly the issue we're working around | 16:12 |
shermanm | a bunch of nodes with N disks of same, smallest size | 16:12 |
shermanm | and wanting to avoid needing to set a static hint on every node | 16:13 |
shermanm | common case is that a node was delivered with raid1 boot disk, and was then reconfigured as separate disks | 16:13 |
TheJulia | is the device naming at least consistent? | 16:13 |
TheJulia | i.e. no rhel 9.2 | 16:13 |
shermanm | the naming inside e.g. ubuntu seems to be relatively consistent, but we keep ending up with e.g. `/` on /dev/sda, and `/efi` on /dev/sdb | 16:14 |
shermanm | or cloud-init reading an old config-drive from sdb (explicitly with automated cleaning disabled) | 16:14 |
shermanm | but I haven't tested the rebuild case, I think it would run into the same issue | 16:14 |
shermanm | maybe we could do some kind of consistent hashing off of the wwn as a tie-breaker? | 16:15 |
TheJulia | eww, yeah, one moment | 16:17 |
TheJulia | https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L1718-L1780 | 16:18 |
cid | TheJulia: noted, thanks. | 16:19 |
TheJulia | https://github.com/openstack/ironic-lib/blob/master/ironic_lib/utils.py#L449-L487 | 16:19 |
TheJulia | shermanm: so realistically, we would likely need to add some further smarts to do tie breaking around https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L1774C1-L1774C38 | 16:20 |
shermanm | alternatively, I had a proposed fix which just runs `erase_devices_metadata` prior to writing the new disk image, but I'm unsure how that would interact with things like the `preserve_ephemeral` flag for rebuild | 16:22 |
TheJulia | it would destroy the structure | 16:26 |
TheJulia | because AIUI, we don't recreate/re-save the structure | 16:26 |
TheJulia | and it also goes into the initial metadata sections | 16:27 |
shermanm | yeah, I don't think we've ever tried that flag, but it jumped out to me as an edge case that makes it messy | 16:27 |
TheJulia | We do have a thing where some of this happens, but obviously we are all about "please clean" but on rebuilds this makes a lot of sense as an issue if the match can choose a matching identically sized disk | 16:28 |
shermanm | I tested the approach with deploy templates, and it was working for "normal" deploys, I've got some rebuilds testing now to double-check | 16:28 |
TheJulia | I still think there is a fundamental issue with similarly sized devices, just never had anyone doing it with rebuilds, really | 16:30 |
TheJulia | and typically. | 16:30 |
TheJulia | typically, it should still end up on the same device | 16:30 |
TheJulia | since the data gets serialized out | 16:30 |
TheJulia | but... maybe the kernel in the ramdisk changing might impact that | 16:30 |
TheJulia | * knows it would with newer rhel | 16:30 |
shermanm | yeah, I don't have a root cause for the ordering change | 16:30 |
shermanm | but we were having ~50% failure rates on our nodes with 2 matching disks | 16:31 |
shermanm | even after a manual cleaning to wipe out prior state | 16:31 |
TheJulia | if there is a bug with the ironic-python-agent logs from 1 or more attempts, that might help | 16:31 |
TheJulia | eek | 16:31 |
shermanm | yeah, let me see what I can pull out from the logs | 16:32 |
shermanm | these are all on dell nodes btw. A bunch of them have those BOSS dual m.2 boot drives, maybe there's an issue with how those enumerate? | 16:33 |
shermanm | in any case, I've got this bug open https://bugs.launchpad.net/ironic/+bug/2084565 , and I'll dig up some more context | 16:34 |
cardoe | shermanm: we (rackspace) need vlan trunks as well on baremetal so it's something we're interested in as well. James gave the talk but you can ping me about the inspection stuff. I'm working to get rid of all the special bits and make it part of an ironic flow as much as possible and where it's not have some docs around how it could work. | 16:37 |
shermanm | nice! i'll definitely be in touch, I've been working on our own automation about inspection (combination of in-band and out-of band), and how to use that data not just for node enrollment, but to detect divergence from baseline (e.g. firmware versions have changed, a disk is missing, etc) | 16:39 |
shermanm | right now it runs ~weekly on nodes when they're not in use, and makes a PR against a big git repo of json data when a change is detected | 16:39 |
shermanm | but could clearly be more integrated | 16:40 |
TheJulia | shermanm: possibly, we generally see folks raid-1 them since... that is the point of the controller. It also does that out of the box | 16:44 |
TheJulia | or at least, historically did reset the device after a full system reset/wipe and assembled two m.2 devices on a BOSS to a raid1 | 16:44 |
opendevreview | Michael Sherman proposed openstack/ironic master: allow disk cleaning during deploy https://review.opendev.org/c/openstack/ironic/+/932731 | 16:53 |
shermanm | sorry, I will eventually figure out the correct use of gerrit instead of making new issues by accident | 16:55 |
clarkb | shermanm: the change id in the footer of the commit message is what gerrit uses to tie a new push to an existing change. If project + target branch + change id match an existing change you get a new version rather than a new change | 16:56 |
clarkb | an easy way to update things is to git commit --amend but you can also copy and paste change ids around etc | 16:56 |
shermanm | ah, ok. but if I needed to change the target branch (because submitted to the wrong one initially), that would make a new change for the same ID? | 16:57 |
clarkb | correct the unique identifier is the project + target branch + changeid tuple. So moving branches requires a new change in gerrit | 16:58 |
TheJulia | shermanm: just cherry-pick your old one onto the master branch | 17:07 |
TheJulia | oh, I guess you did | 17:08 |
TheJulia | It's all good | 17:08 |
shermanm | I guess I can just restore the abandoned change? didn't want to lose the context in the comments | 17:08 |
TheJulia | it's not lost even when abandoned | 17:09 |
TheJulia | just slightly harder to see | 17:09 |
opendevreview | Doug Goldstein proposed openstack/ironic-python-agent master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic-python-agent/+/932734 | 17:14 |
opendevreview | Doug Goldstein proposed openstack/ironic master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic/+/932735 | 17:16 |
opendevreview | Doug Goldstein proposed openstack/networking-baremetal master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/networking-baremetal/+/932736 | 17:17 |
opendevreview | Doug Goldstein proposed openstack/networking-generic-switch master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/networking-generic-switch/+/932737 | 17:19 |
cardoe | You know what I just realized as I'm spamming this... | 17:19 |
cardoe | the requirements.txt dependency is wrong. | 17:19 |
cardoe | We don't have a run time dependency on that. | 17:20 |
opendevreview | Doug Goldstein proposed openstack/ironic-lib master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic-lib/+/932738 | 17:21 |
cardoe | I'll follow that up in a bit. | 17:21 |
cardoe | Okay that's the last one... what'd I miss that we wanna for sure keep going? | 17:22 |
cardoe | I wasn't gonna touch ironic-inspector | 17:26 |
JayF | I'll look at those, but it needs to be done everywhere | 19:03 |
JayF | sadly the ironic change looks like it needs more | 19:07 |
JayF | any of our jobs that use wsgi just don't work | 19:08 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Migrate to oslo.utils-based format_inspector https://review.opendev.org/c/openstack/ironic-python-agent/+/928463 | 19:26 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Cleanup usage of imported-from-ironic-lib disk_utils https://review.opendev.org/c/openstack/ironic-python-agent/+/928466 | 19:26 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Remove use of ironic_lib i18n module https://review.opendev.org/c/openstack/ironic-python-agent/+/930080 | 19:26 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Migrate more trivial code from ironic-lib https://review.opendev.org/c/openstack/ironic-python-agent/+/928779 | 19:26 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Migrate to oslo.utils-based format_inspector https://review.opendev.org/c/openstack/ironic-python-agent/+/928463 | 19:33 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Cleanup usage of imported-from-ironic-lib disk_utils https://review.opendev.org/c/openstack/ironic-python-agent/+/928466 | 19:33 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Remove use of ironic_lib i18n module https://review.opendev.org/c/openstack/ironic-python-agent/+/930080 | 19:33 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Migrate more trivial code from ironic-lib https://review.opendev.org/c/openstack/ironic-python-agent/+/928779 | 19:33 |
TheJulia | o/ folks, heading out for the weekend | 19:39 |
iurygregory | have a great weekend TheJulia o/ | 19:42 |
JayF | \o | 19:43 |
*** | ubuntu is now known as Guest6760 | 21:40 |
JayF | I got some of these heading my way: https://wiki.sipeed.com/hardware/en/kvm/NanoKVM/introduction.html | 22:17 |
JayF | likely will be my dev bmc for redfish stuff that's not oem-specific | 22:17 |
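
The tie-breaking idea raised around 16:15-16:20 (shermanm's "consistent hashing off of the wwn", TheJulia's "further smarts to do tie breaking") could look roughly like the sketch below. It is not the ironic-python-agent code: `Disk`, `pick_root_disk`, and `min_size_bytes` are hypothetical names, and the selection rule is assumed to be "smallest disk that meets a minimum size", as shermanm describes it. The only point is that when several disks share the smallest size, the winner is chosen by a stable identifier rather than by enumeration order or device name.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Disk:
    """Hypothetical stand-in for a block device record."""
    name: str            # e.g. "/dev/sda"; may change between boots
    size_bytes: int
    wwn: Optional[str]   # expected to stay the same across boots


def _tie_break_key(disk: Disk) -> str:
    # Hash the WWN so the ordering depends only on the stable identifier.
    # Fall back to the device name when no WWN is reported (not stable).
    ident = disk.wwn or disk.name
    return hashlib.sha256(ident.encode("utf-8")).hexdigest()


def pick_root_disk(disks: Iterable[Disk], min_size_bytes: int) -> Optional[Disk]:
    """Pick the smallest disk >= min_size_bytes, breaking ties by WWN hash."""
    candidates = [d for d in disks if d.size_bytes >= min_size_bytes]
    if not candidates:
        return None
    # Sort key: size first, then the deterministic tie-breaker.
    return min(candidates, key=lambda d: (d.size_bytes, _tie_break_key(d)))


if __name__ == "__main__":
    gib = 1024 ** 3
    boot_a = Disk("/dev/sda", 447 * gib, "0x5000c500a1b2c3d4")
    boot_b = Disk("/dev/sdb", 447 * gib, "0x5000c500a1b2c3d5")
    data = Disk("/dev/sdc", 1788 * gib, "0x5000c500deadbeef")
    chosen = pick_root_disk([boot_a, boot_b, data], min_size_bytes=4 * gib)
    print(chosen.wwn if chosen else "no suitable disk")
```

Sorting on the raw WWN string would be just as deterministic; the hash only mirrors the suggestion in the log. A stable choice like this would keep rebuilds on the same disk, but it would not by itself remove a stale config drive or partition table left on the other disk.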