Wednesday, 2025-08-13

opendevreviewVerification of a change to openstack/ironic master failed: api: Add schema for allocations API (requests)  https://review.opendev.org/c/openstack/ironic/+/945218 00:30
opendevreviewJacob Anders proposed openstack/ironic master: Fix servicing abort to respect abortable flag  https://review.opendev.org/c/openstack/ironic/+/957189 00:54
jandersTheJulia ^^ is my attempt to try a fix for the issue of abortable=false being ignored in servicewait, while making a step towards completing my AI performance goal00:57
jandersI will lab-test this now and report back00:57
jandersit seems to work, avoiding two problems: 1) firmware update being interrupted (it nicely queued the abort till after it finished) and 2) having to double-tap abort to get out from service wait to service failed to active01:20
jandersonce we're happy with your change I will re-spin mine with the correct verb/ in-code naming scheme01:20
janderswill be curious what you folks think (also CC dtantsur and JayF since we discussed "abortability")01:21
opendevreviewcid proposed openstack/ironic master: Add periodic cleanup of stale conductors  https://review.opendev.org/c/openstack/ironic/+/956500 01:28
opendevreviewcid proposed openstack/ironic master: Add periodic cleanup of stale conductors  https://review.opendev.org/c/openstack/ironic/+/956500 01:42
opendevreviewSteve Baker proposed openstack/networking-generic-switch master: WIP Add security group support to ovs  https://review.opendev.org/c/openstack/networking-generic-switch/+/956519 04:41
opendevreviewTakashi Kajinami proposed openstack/networking-generic-switch master: devstack: Drop explicit etcd api version  https://review.opendev.org/c/openstack/networking-generic-switch/+/957207 07:49
tkajinamI wonder if author information of ngs needs to be updated https://github.com/openstack/networking-generic-switch/blob/master/setup.cfg#L3-L4 . I'm not quite sure if we can/should update the author but I double the email is still active07:54
opendevreviewTakashi Kajinami proposed openstack/networking-generic-switch master: Update author information  https://review.opendev.org/c/openstack/networking-generic-switch/+/957208 07:58
tkajinamI *doubt*07:58
abongalegood morning ironic o/08:28
queensly[m]Good morning o/08:32
kubajjgood morning abongale , queensly[m] , and Ironic! o/08:32
darkhackernchttps://bugs.launchpad.net/ironic/+bug/2120542 09:34
darkhackerncteam any help on this 09:34
FreemanBoss[m]Good morning everyone 09:37
darkhackerncmorning09:46
dtantsurdarkhackernc: "request was made with project scope" is the most important hint. With new RBAC, some commands, including this one, require a system-wide token, not a project one. 09:54
darkhackerncThanks dimitry09:54
darkhackerncdtantsur, will that block the provisioning??? 09:57
dtantsurdarkhackernc: normal provisioning actions can be done with a project token10:09
darkhackerncdtantsur, perfect, my nodes are starting, which means should not be a blocker for me. 10:10
darkhackerncI have 2 more bugz 10:10
darkhackernchttps://bugs.launchpad.net/ironic/+bug/2120549 10:15
dtantsurWell, not broken, just not implemented :) And the bug should go to Nova, not to Ironic proper.10:19
darkhackernccool thank you Dimitry10:49
darkhackerncteam can you please share some hints for this one - https://pasteboard.co/pnSTaBENm20F.png 11:32
dtantsurdarkhackernc: it's a very generic problem. Basically, the node cannot DHCP on boot. Maybe it does not reach neutron or whatever you're using for DHCP. Using tcpdump may help, also check the neutron's resources.12:00
darkhackerncdtantsur, yes, I am replicating and collecting the logs 12:14
darkhackerncwill share that 12:14
TheJuliagood morning12:57
TheJuliatkajinam: it really should be who maintains it at this point, so broadly speaking that would be OpenStack13:13
tkajinamTheJulia, yeah that's what I thought.13:15
TheJuliadarkhackernc: fwiw, I put in comments aligned with the above discussion into the bugs and updated their status 13:18
TheJuliadarkhackernc: If you manually created the baremetal port(s), I'd double check the MACs are right as a first step, then move on to ensuring the physical network the port is wired to is attached to the correct physical network on the controller nodes, and then find the dhcp service and ensure packets are making it to neutron.13:19
TheJuliaBut as Dmitry said, this is very common and often just... a typo or an incorrect port tagging, or the wrong physical network port being used on the controllers 13:20
TheJuliaWhich just needs to be worked through one step at a time :(13:20
darkhackerncTheJulia++ thank you very much 13:21
darkhackerncin the logs I am getting some uncommon messages and misleading info, I'll pass that on in a few minutes 13:21
TheJuliadtantsur: regarding eventlet, steve is suggesting splitting out the api object and indirection stuff out separately and merging. Which could make sense, if you have any strong opinion lmk13:23
TheJuliaJayF: Quick question, regarding eventlet, are the +1s more just hesitancy in general or are you expecting them to be treated as or upgraded to +2 when the time comes? Just trying to make sure folks are on the same page and given the timeline for end of cycle I unfortunately need to be a little blunt :) 13:26
TheJuliaFolks, If I can get a single core to review https://review.opendev.org/c/openstack/ironic/+/956801, it would be appreciated. It gets grenade back in passing shape. 13:36
kubajjHello, we are looking into fixing the other part of the RAID skip_block_devices. We think that we need to write a function which retrieves raid_level of an existing RAID and so we looked into the possible values and how they can be stored in mdadm --detail. We are a bit confused about the options. The Ironic docs say that we support value of raid_level 2, but the manual for mdadm does not list it as a possible level value.13:45
kubajjhttps://docs.openstack.org/ironic/latest/admin/raid.html#mandatory-properties 13:45
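As a rough illustration of the helper kubajj describes, the sketch below pulls the level out of `mdadm --detail` text. The function name `parse_raid_level`, the sample text, and the `raid10` to `1+0` mapping are assumptions made for this example, not existing Ironic or IPA code.

```python
import re
from typing import Optional

# Illustrative excerpt in the style of `mdadm --detail` output;
# it was not captured from a real system.
SAMPLE_DETAIL = """\
/dev/md0:
           Version : 1.2
        Raid Level : raid1
        Array Size : 1046528 (1022.00 MiB 1071.64 MB)
"""

# Assumed translation from mdadm spelling to the raid_level spelling
# used in a target RAID config; extend as needed.
MDADM_TO_RAID_LEVEL = {"10": "1+0"}


def parse_raid_level(detail_text: str) -> Optional[str]:
    """Return the RAID level found on the 'Raid Level' line, or None."""
    match = re.search(r"^\s*Raid Level\s*:\s*(\S+)", detail_text, re.MULTILINE)
    if match is None:
        return None
    level = match.group(1).lower()
    # mdadm prints e.g. "raid1"; drop the prefix to get "1".
    if level.startswith("raid"):
        level = level[len("raid"):]
    return MDADM_TO_RAID_LEVEL.get(level, level)


if __name__ == "__main__":
    print(parse_raid_level(SAMPLE_DETAIL))
```

In a real agent the text would come from running `mdadm --detail` on the md device; that plumbing is left out here on purpose.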
JayFTheJulia: mainly sticking to what I told you -- I think I may have missed too much context to get the confidence to be a core review on that. I would hope that my vote doesn't become something that's absolutely needed, because I hope that this is a big enough change we get lots of cores looking at it13:54
opendevreviewJulia Kreger proposed openstack/ironic master: Replace GreenThreadPoolExecutor in conductor  https://review.opendev.org/c/openstack/ironic/+/952939 13:55
opendevreviewJulia Kreger proposed openstack/ironic master: Set the backend to threading  https://review.opendev.org/c/openstack/ironic/+/953683 13:55
opendevreviewJulia Kreger proposed openstack/ironic master: Launch vnc proxy with no_fork  https://review.opendev.org/c/openstack/ironic/+/957044 13:55
opendevreviewJulia Kreger proposed openstack/ironic master: Remove direct mapping from API -> DB  https://review.opendev.org/c/openstack/ironic/+/956512 13:55
opendevreviewJulia Kreger proposed openstack/ironic master: Optional indirection API use  https://review.opendev.org/c/openstack/ironic/+/956504 13:55
TheJuliaJayF: okay, glad you said that in public channel so other reviewers can be aware of that with their reviews so they don't take to thinking we're all using +1s instead or something like that13:57
opendevreviewJulia Kreger proposed openstack/ironic master: Set the backend to threading  https://review.opendev.org/c/openstack/ironic/+/953683 13:57
opendevreviewJulia Kreger proposed openstack/ironic master: Revert "ci: temporary metal3 integration job disable"  https://review.opendev.org/c/openstack/ironic/+/956953 13:58
opendevreviewJulia Kreger proposed openstack/ironic master: Clean-up misc eventlet references  https://review.opendev.org/c/openstack/ironic/+/955632 13:58
darkhackerncTheJulia, https://bugs.launchpad.net/ironic/+bug/2120567 14:09
TheJuliaso you're deploying VMs? 14:12
TheJuliaso slight out of the box problem, ironic leans hard into UEFI by default. Octavia doesn't ship an amphora image which is UEFI compatible.14:13
TheJuliaso14:14
TheJuliayou really shouldn't be pre-creating your neutron port with the same mac as the baremetal host14:15
TheJuliaIronic will change it properly14:15
TheJuliaand you're using the network interface flat which means it should still work I guess, but any changes from the default to the mac are expected by design 14:15
TheJuliadarkhackernc: 1) check the neutron-dhcp-agent log to make sure it is working and it is processing the updates. 2) Make sure the network it is attaching a dhcp service to *is* the same physical network the VM is attached to. Most commonly the latter is the issue14:19
TheJuliaalso, make sure dnsmasq is not crashing/erroring/restarting. That has been a pain as of relative recentness14:27
dtantsurTheJulia: no objections re splitting indirection. Testing it with eventlet still present may get a useful data point.14:40
TheJuliaoh yeah, that is why I stacked them together since we don't want lock conflicts to creep into being a thing. I've noticed the before jobs seem to run a little longer by like 25% but that is still in the swing of CI in general14:41
TheJuliabefore jobs being eventlet removed + metal3-integration14:41
TheJuliabut after, at least what I've looked at might be the fastest metal3-integration executions I've seen to date14:41
TheJulia(31 minutes)14:50
TheJulia32 in the last run14:51
darkhackerncTheJulia, I am simulating in a virtualized environment, so yes I am using vm's as a vBMC using vbmc driver to manage IPMI calls. 14:51
drannouHello.  I'm playing with soft raid, doing two distinct raid 1 with the classical json. I see in the doc that the "is_root_volume" is not working with mdadm implementation, so how do you tell IPA which raid it should put the system on? Obviously my tests show that most of the time, it takes the wrong one :p 14:52
darkhackerncand I checked everything no difference in the service, if you glance at https://pastebin.com/raw/K0Ymh6e8 you will see I have compiled all the relevant logs 14:52
TheJuliadarkhackernc: you don't have neutron-dhcp-agent logs there14:53
darkhackerncTheJulia, ohhh14:54
TheJuliadrannou: so, the model with software raid configured by ironic is. I think.. and it has been a long time, but its always the first device14:54
darkhackernclet me compile that as well 14:54
TheJuliaso first raid device configured14:54
TheJuliadarkhackernc: you should see leases getting logged as well, in theory. That gives you a solid indication if dhcp is working or not and starts to also point you in the direction of the root problem14:54
darkhackerncdrannou, long time back when I was in RedHat I tried that, let me check my git repo, 14:55
drannouHello Ironic team! I'm playing with soft raid, doing Two distinct raid 1 with the classical json14:56
darkhackerncdrannou, https://github.com/NileshChandekar/rhosp16_2_sw_raid 14:56
darkhackernccheck if this helps you 14:56
drannouSorry buffer issue, I sent the same message multiple times :p 14:56
TheJuliadrannou: no worries14:58
drannoudarkhackernc: you also define "is_root_volume", but there https://docs.openstack.org/ironic/latest/admin/raid.html#optional-properties we said that it's not supported14:59
drannouAnd I don't see anything implemented for that in IPA15:00
drannouI was checking to implement it (by using the volume_name), but I was wondering if it was not the wrong direction, maybe someone already had it and fixed it 15:00
darkhackerncdrannou, then sorry no idea, this is 3 year old stuff15:02
drannouTheJulia: I was also thinking that it would be the first device, but on my tests it doesn't seem to be the case, but maybe I can put the ROOT raid at the end to verify :) 15:03
TheJuliaI feel like kubajj is the new software raid expert15:03
opendevreviewJulia Kreger proposed openstack/ironic master: Clean-up misc eventlet references  https://review.opendev.org/c/openstack/ironic/+/955632 15:03
kubajjTheJulia: oh, no15:03
* TheJulia grins evilly :)15:03
kubajjTheJulia: I think I also came across the issue, but then solved it with forcing a root device hint15:04
TheJuliadrannou: I don't think is_root_volume is cared about/honored by the agent software raid15:04
kubajjCan have a look into the 'is_root_volume' if needed15:04
TheJuliaideally, yeah, you root device hint it instead, but maybe there is a path where it should be honored. I guess it is also much harder to draw a direct line there 15:05
drannouThe documentation seems to be outdated, as volume_name in soft raid is working: I named my raid devices and I can see them on the host15:06
TheJuliadrannou: written by different folks with different focuses. Most of the raid docs were written before software raid was a thing.15:07
kubajjdrannou: I don't think we need the volume name for the implementation, no?!15:07
drannouI was checking on hardware.py on IPA to patch and select the corresponding named raid in get_os_install_device, but it seems that cached_node does not have the raid partitioning 15:08
drannoukubajj: this was just for a short test, I'm sure that (as an expert as TheJulia said :p) you can do without it ;)15:09
kubajjdrannou: ah, makes sense. When creating RAID, you could add its volume name to the root device hints15:09
TheJuliawell, it can be refreshed, the issue at hand is unless the device is explicitly named you can't tie it to an automatic device determination short of somehow saving the intent to device name mapping15:10
kubajjdrannou: I am no expert :D just spent quite a lot of time trying to prevent RAIDs from being deleted when reinstalling15:10
drannoukubajj: yes, and when we write the image, we just have to find the corresponding name15:10
kubajjdrannou: for finding the name, there is already a function raid_utils.get_volume_name_of_raid_device15:11
drannoukubajj: I saw but I miss the node database information (or may be I miss it)15:11
opendevreviewDoug Szumski proposed openstack/ironic-python-agent-builder master: Don't fail early if no config drive found  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/957253 15:12
kubajjTheJulia: I am not sure if we have a different unique field of the raid which would be consistent (I am not sure if the indices of the md partitions could change, for example)15:14
kubajjvolume name should be unique15:14
opendevreviewDoug Szumski proposed openstack/ironic-python-agent-builder master: Don't fail early if no config drive found  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/957253 15:14
darkhackernc    https://bugzilla.redhat.com/show_bug.cgi?id=2032243 15:14
darkhackernc    https://bugzilla.redhat.com/show_bug.cgi?id=1317918 15:14
TheJuliakubajj: I know the device name can change across reboots and that is sort of a know your platform detail. I've never done anything with custom volume names, closest I've gotten to software raid recently was trying to access my old NAS's hard disks which were an LVM mirror set.15:15
drannoukubajj: the problem is that we might need to force the customer to add a (unique) name, if not the matching might be complicated (or we need to store the information). On my side that would not be a problem because for security reasons, I changed a little bit the way raid is configured: raid is destroyed when the instance is deleted and re-created during the spawn 15:18
drannouthis way we prevent a case where a customer A change the raid configuration, and a customer B get this new one (even if the host in DB is mark with the old RAID)15:19
kubajjdrannou: the current default behavior is that if you change target raid config then you need to go to cleaning anyway, no? so should not be a problem15:22
kubajjvolume names need to be unique, enforced by https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L3370-L3377 15:24
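A minimal sketch of the kind of uniqueness check kubajj refers to, assuming a target RAID config with a `logical_disks` list whose entries may carry an optional `volume_name`. The function `find_duplicate_volume_names` is hypothetical and is not the code at the linked IPA lines.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_volume_names(target_raid_config: Dict) -> List[str]:
    """Return volume names that appear on more than one logical disk."""
    names = [
        disk["volume_name"]
        for disk in target_raid_config.get("logical_disks", [])
        if disk.get("volume_name")  # unnamed disks cannot clash
    ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Example config with two software RAID 1 volumes sharing a name.
    config = {
        "logical_disks": [
            {"size_gb": 100, "raid_level": "1", "controller": "software",
             "volume_name": "root"},
            {"size_gb": "MAX", "raid_level": "1", "controller": "software",
             "volume_name": "root"},
        ]
    }
    print(find_duplicate_volume_names(config))
```

A validator could reject the config whenever this list is non-empty; unnamed logical disks are simply skipped in this sketch.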
drannoukubajj: depends on who is managing the infrastructure. In our case there is a huge separation between admin (that configures raid) and final customer (public cloud way of work). In that case we want to be SURE that if the customer puts the host in rescue and changes the raid configuration, we will destroy his change at next recycle 15:24
kubajjdrannou: so the target raid config would not match the actual raid config then? 15:29
drannoukubajj: if custA changes it manually? yes exactly. And as long as there is no check during recycle, you won't see it. Another problem we had (if I remember well) is that keeping raid during recycle creates a huge usage of shred, instead of NVMe instant erase (We only have NVMe hosts) 15:32
darkhackerncTheJulia++ thanks, I will look into that tomorrow 15:34
kubajjdrannou: I understand, our use case is for our hypervisors. If we need to reinstall a hypervisor with RAID array, we want to keep the data of VMs, but reinstall the OS15:34
drannoukubajj: make sense15:39
kubajjdrannou: anyway, do you want me to have a look into the is_root_volume once I am done with debugging? https://review.opendev.org/c/openstack/ironic-python-agent/+/937342 15:40
drannoukubajj: you might be better than me to fix that upstream, but I will keep going in my read/patch of the code to better understand this part. How do you think to retrieve the volume_name? Add it in the cache_node information? 15:44
TheJuliacached node information can be replaced15:45
TheJuliaand comes from the api which you can't write to15:45
TheJuliaI think an internal mapping might make sense15:45
TheJuliaor to internally teach root device selection to check for software raid and attempt to identify is_root_volume15:45
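One way to picture the "internal mapping" idea is the sketch below: it reads the intent from a target RAID config and returns the name of the logical disk flagged `is_root_volume`. The function `root_volume_name` is hypothetical; per the discussion above, the agent's software RAID path does not honor `is_root_volume` today, so this is only an illustration of the approach.

```python
from typing import Dict, Optional


def root_volume_name(target_raid_config: Dict) -> Optional[str]:
    """Return the volume_name of the disk flagged is_root_volume, if any.

    Raises ValueError when more than one logical disk is flagged, or when
    the flagged disk has no volume_name to match against.
    """
    flagged = [
        disk for disk in target_raid_config.get("logical_disks", [])
        if disk.get("is_root_volume")
    ]
    if not flagged:
        return None  # caller falls back to its default device selection
    if len(flagged) > 1:
        raise ValueError("more than one logical disk has is_root_volume set")
    name = flagged[0].get("volume_name")
    if not name:
        raise ValueError("root volume has no volume_name to match against")
    return name


if __name__ == "__main__":
    config = {
        "logical_disks": [
            {"size_gb": 100, "raid_level": "1", "controller": "software",
             "volume_name": "system", "is_root_volume": True},
            {"size_gb": "MAX", "raid_level": "1", "controller": "software",
             "volume_name": "data"},
        ]
    }
    print(root_volume_name(config))
```

The returned name could then be compared with the names of the md devices actually present, for instance using the raid_utils.get_volume_name_of_raid_device function mentioned earlier in the discussion.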
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object  https://review.opendev.org/c/openstack/ironic/+/954966 16:01
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Port object  https://review.opendev.org/c/openstack/ironic/+/955447 16:11
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object  https://review.opendev.org/c/openstack/ironic/+/954966 16:15
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Port object  https://review.opendev.org/c/openstack/ironic/+/955447 16:20
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/955625 16:34
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/955625 16:36
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/955713 16:41
opendevreviewClif Houck proposed openstack/ironic-tempest-plugin master: Change Portgroup minimum microversion to 1.26  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/955799 16:43
opendevreviewMerged openstack/ironic-python-agent master: Fix for motherboards where efibootmgr returns UTF-8.  https://review.opendev.org/c/openstack/ironic-python-agent/+/956068 16:45
opendevreviewMerged openstack/ironic master: Initialize variable to prevent an error  https://review.opendev.org/c/openstack/ironic/+/956629 17:24
opendevreviewNahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish  https://review.opendev.org/c/openstack/sushy/+/955211 18:06
opendevreviewNahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish  https://review.opendev.org/c/openstack/sushy/+/955211 18:10
opendevreviewNahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish  https://review.opendev.org/c/openstack/sushy/+/955211 19:39
jandersgood morning Ironic o/20:53
jandersTheJulia couple questions regarding https://review.opendev.org/c/openstack/ironic/+/956972 before it gets too late in your TZ20:55
janders1) is it right to say we need to move from abort to unrescue as the verb and otherwise the code is close to merge ready?20:56
janders2) to me it feels like we also need something similar to https://review.opendev.org/c/openstack/ironic/+/957189 to enforce abortable flag in-step to be respected. Would you agree?20:57
jandersI'm working on the downstream side to this in BMO, we've got some time pressures so trying to test against in-flight patches, which is fine as long as behaviour doesn't change in a major way (can change verbs in the BMO code later).20:58
janderscoffee time in my TZ now21:05
opendevreviewcid proposed openstack/ironic master: Fix service failed state transitions for wait/hold  https://review.opendev.org/c/openstack/ironic/+/957290 22:19
