15:00:15 <rpittau> #startmeeting ironic
15:00:15 <opendevmeet> Meeting started Mon Aug 12 15:00:15 2024 UTC and is due to finish in 60 minutes. The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:15 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 <opendevmeet> The meeting name has been set to 'ironic'
15:00:18 <iurygregory> o/
15:00:23 <dtantsur> o/
15:00:30 <rpittau> Hello everyone!
15:00:30 <rpittau> Welcome to our weekly meeting!
15:00:30 <rpittau> The meeting agenda can be found here:
15:00:30 <rpittau> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_August_12.2C_2024
15:00:31 <TheJulia> o/
15:00:32 <JayF> o/
15:01:09 <rpittau> #topic Announcements/Reminders
15:01:25 <rpittau> #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio
15:01:25 <rpittau> #link https://tinyurl.com/ironic-weekly-prio-dash
15:01:42 <cardoe> o/
15:01:57 <rpittau> it's not looking bad, lots of new patches, some from last week
15:02:01 <masghar> o/
15:02:17 <rpittau> #info 2024.2 Dalmatian Release Schedule
15:02:17 <rpittau> #link https://releases.openstack.org/dalmatian/schedule.html
15:02:30 <rpittau> we're at week -7 !
15:02:37 <cid> o/
15:02:46 <rpittau> we should start thinking about what's missing in the non-client and client libraries
15:02:55 <rpittau> FF is in 2 weeks also
15:03:14 <rpittau> #info PTL nominations period between August 14 and August 28
15:03:29 <rpittau> I have decided to run for a second mandate as PTL
15:03:41 <iurygregory> nice =D
15:03:52 <rpittau> :)
15:04:18 <rpittau> sorry for the late notice, just life in the middle of everything, it's been a long summer and it hasn't ended yet!
15:04:37 <JayF> Thanks Riccardo!
15:04:48 <TheJulia> Awesome, thanks!
15:04:53 <rpittau> my pleasure, really :)
15:05:01 <rpittau> I'm glad I can do it
15:05:32 <rpittau> #info the next OpenInfra PTG will take place October 21-25, 2024 virtually! Registration is now open!
15:05:32 <rpittau> #link https://ptg.openinfra.dev/
15:05:38 <rpittau> the ironic team has been registered
15:05:54 <rpittau> please add your name and topics here
15:05:54 <rpittau> https://etherpad.opendev.org/p/ironic-ptg-october-2024
15:06:48 <rpittau> we still have some time for topics
15:07:12 <rpittau> anything else to announce/remind ?
15:07:58 <rpittau> okey dokey, onward!
15:08:00 <rpittau> #topic Review Ironic CI status
15:08:49 <rpittau> anything worth mentioning from last week?
15:08:49 <rpittau> I've only seen some instability on pkgs repos, but it recovered pretty quickly
15:09:36 <rpittau> alright, great week for CI :D
15:09:45 <rpittau> #topic Discussions
15:09:57 <rpittau> nothing to discuss, unless anyone has anything to mention?
15:10:17 <JayF> Is someone making sure old bugfix branches still get retired?
15:10:23 <rpittau> JayF: I am :)
15:10:27 <JayF> Awesome, thanks :)
15:10:39 <rpittau> np! :)
15:11:23 <rpittau> actually maybe I should write something down about the procedure, I'll take a note on that
15:11:29 <TheJulia> ++
15:11:57 <rpittau> ok!
15:11:57 <rpittau> anything else?
15:12:06 <masghar> I would like to mention something
15:12:12 <rpittau> masghar: please go ahead :)
15:12:36 <masghar> Unfortunately I have not been able to complete the inspection rules work so far
15:12:59 <masghar> (Have been a little overwhelmed with bugs and such downstream)
15:13:10 <TheJulia> No worries, it happens :)
15:13:14 <masghar> (Just a heads-up)
15:13:25 <rpittau> no worries masghar :)
15:13:31 <masghar> Thanks, will try to carve out time for it
15:13:49 <rpittau> thanks for the heads-up
15:14:20 <masghar> No problem, and thanks
15:14:54 <masghar> (That's it from me)
15:14:57 <JayF> masghar: If we can document, in detail, what's done and what's left to be done, I think cid is willing to help pick up some of that work. I'm unsure if the knowledge transfer is worth it :)
15:15:13 <JayF> masghar: so know that's an option at hand if you'd like to exercise it
15:15:34 <masghar> I have a very tiny patch that I started a few months ago
15:15:58 <masghar> I can explain my thought process to cid or whoever asks
15:16:25 <masghar> I appreciate the offer of help :)
15:16:57 <cid> masghar, JayF, In my todo for the week
15:17:14 <rpittau> thanks cid :)
15:17:19 <masghar> Thanks cid!
15:17:32 <cid> nop!
15:18:20 * cid that should be no problems :)
15:18:45 <rpittau> anything else? otherwise I have one quick thing
15:19:50 <rpittau> ok
15:19:51 <rpittau> I forgot to mention that it's almost time for the highlights
15:19:51 <rpittau> I will take care of them, at least start them, I'll be out at the beginning of September when they're due, but I'll do my best to finish them before I leave for my PTO
15:19:51 <rpittau> if you have anything you want to mention about this development cycle please let me know
15:20:15 <iurygregory> ack
15:20:52 <rpittau> alright, moving on
15:20:54 <rpittau> #topic Bug Deputy Updates
15:21:05 <rpittau> cid: thanks for taking care of that, anything to mention?
15:21:44 <cid> So, I needed help triaging 4 bugs, I think two can be considered done, except these other two:
15:21:44 <cid> https://bugs.launchpad.net/sushy/+bug/2075979
15:21:44 <cid> https://bugs.launchpad.net/sushy/+bug/2075980
15:21:58 <cardoe> So the first one is mine.
15:22:33 <TheJulia> I suspect yours is a valid bug, but I'm having trouble understanding exactly what the partition in the API is
15:22:40 <cardoe> Essentially sushy does not see those NICs. Dell says they conform to Redfish, etc.
15:22:44 <rpittau> cardoe: which version of iDRAC9 firmware are you using ? cause that could be a firmware issue
15:23:07 <cardoe> It doesn't matter. It's the same for at least half a dozen versions. Both 6.x and 7.x
15:23:30 <rpittau> ok, that's what I wanted to exclude :)
15:23:39 <TheJulia> it's sort of a framing issue, and my comment sort of reflects this, we need to understand how to frame it and I think we're missing context.
15:24:11 <cardoe> Yeah I'm sure I'm not providing the right details.
15:24:38 <TheJulia> Well, you're providing what you have :)
15:24:51 <cardoe> So basically when ironic calls `ethernet_interfaces.summary` on Sushy, those NICs are excluded.
15:25:14 <TheJulia> I guess sushy needs to understand "which one is actually important"
15:25:26 <TheJulia> and identify when to check/reconcile them together
15:27:30 <cardoe> So the ramdisk inspector sees NIC.Slot.1-1 as ens2f0np0 for example. Which would imply NIC.Slot.1-1-1
15:27:51 <dtantsur> I see a big problem in that there is no link between the two EthernetInterface objects
15:28:16 <cardoe> But then the other port on that card is NIC.Slot.1-2 for example... it's called ens2f1np1 which implies NIC.Slot.1-2-2
15:29:05 <TheJulia> dtantsur: ... that is indeed a huge issue
15:29:06 <dtantsur> A workaround would be to ignore the Health record if its values are null (as opposed to unhealthy)
15:29:34 <TheJulia> but, 1-1-1 is a noted partition, but still sort of goes back to what is the partition in the context
15:29:47 <cardoe> I'm gonna get someone from Dell's firmware team on a call.
15:29:54 <TheJulia> +++++++
15:30:08 <dtantsur> ++
15:30:15 <dtantsur> "InterfaceEnabled": false is very concerning (but we don't look at it.. yet?)
15:30:38 <cardoe> But I did want to at least bubble this up to you guys and see if I could come up with an acceptable way to map them and put that in sushy-oem-idrac or something.
15:30:51 <TheJulia> I guess that explains why it fails
15:30:54 <TheJulia> the partition has no mac
15:30:59 <cardoe> Right
15:32:11 <TheJulia> so more info needed, I guess next item?
15:32:25 <cardoe> So another weird wrinkle
15:32:42 <cardoe> When I set the devices as PXE or HTTPS bootable from Dell's HTTP UI
15:32:47 <cid> Next item is: https://bugs.launchpad.net/sushy/+bug/2075980
15:32:47 <cid> Then, one last one from TheJulia: https://bugs.launchpad.net/ironic-python-agent/+bug/2076367 (sounded like something that needs to be discussed, so...)
15:33:04 <cardoe> The values that Ironic pulls out from the BIOS are "NIC.Slot.1-1" and "NIC.Embedded.1-1-1"
15:33:05 <dtantsur> Ahh ohh. 2075980 is a long standing pain point
15:33:31 <cardoe> Yeah 2075980 is mine as well. Happy to write patches.
15:34:02 <dtantsur> cardoe: what's your timezone? I'm happy to chat about the potential fix when I'm not boiling alive in this bloody weather.
15:34:05 <JayF> Is there something sneakily not straightforward about 2075980?
15:34:20 <cardoe> I'm CST (or is it CDT right now?)
15:34:57 <dtantsur> JayF: yes, figuring out what to use instead of IPA to detect the finish of the operation and the subsequent reboot.
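Note on dtantsur's 15:29:06 workaround for 2075979: a minimal Python sketch of "ignore the Health record if its values are null", assuming the interfaces are plain dicts shaped like Redfish EthernetInterface payloads (`MACAddress`, `Status.Health`, `Status.State`). The function name `summarize_interfaces` is hypothetical and this is not sushy's actual `summary` code; entries without a MAC are still skipped because there is nothing to key them on.

```python
from typing import Any, Dict, List, Optional


def summarize_interfaces(
    interfaces: List[Dict[str, Any]],
) -> Dict[str, Optional[str]]:
    """Map MAC address -> state, dropping only explicitly unhealthy NICs.

    Hypothetical helper operating on raw Redfish-style dicts (assumption).
    """
    summary: Dict[str, Optional[str]] = {}
    for eth in interfaces:
        mac = eth.get("MACAddress")
        if not mac:
            # No MAC reported (e.g. a partition entry): cannot be keyed.
            continue
        status = eth.get("Status") or {}
        health = status.get("Health")
        # A null Health is treated as "unknown", not as "unhealthy".
        if health is not None and health != "OK":
            continue
        summary[mac] = status.get("State")
    return summary
```

Whether a null Health should really be trusted is exactly the open question in the discussion, so this only illustrates the shape of the check.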
15:35:30 <dtantsur> it could be simple, but we need to take a careful look (and ideally involve janders and iurygregory)
15:36:02 <JayF> why is "flip the power on and wait" just like we do for going ACTIVE on deployment not sufficient?
15:36:10 <dtantsur> JayF: wait for what?
15:36:21 <cardoe> So when I locally patched it, Ironic got mad that the node went away for a while.
15:36:29 <TheJulia> power doesn't even necessarily need to be on.
15:36:32 <JayF> cardoe just pointed at the thing I was trying to see
15:36:51 <dtantsur> Yeah, during BIOS settings or firmware updates, the machine is doing $weird_things for several minutes
15:36:57 <iurygregory> yeah
15:37:14 <iurygregory> in firmware updates the bmc will be unresponsive for some time also
15:37:23 <cardoe> Ironic was happy knowing that it would come back to the IPA after a while.
15:37:33 <iurygregory> at least with iDRAC the UI goes down XD
15:37:49 <dtantsur> yeah, we cannot even be sure the BMC behaves during the process
15:38:07 <dtantsur> so it's "wait for $something and retry if the BMC is not reachable or returns HTTP 5xx"
15:38:24 <TheJulia> I think we're going down a rabbit hole
15:38:31 <dtantsur> PTG topic? :)
15:38:41 <TheJulia> but maybe disjoint it from the overall issue
15:38:42 <rpittau> sounds like it :)
15:38:53 <TheJulia> Yes, there are issues, but that is not a blocker to trying to fix cardoe's issue
15:39:03 <TheJulia> or at least, cardoe and his efforts are input into that discussion
15:39:13 <TheJulia> so we shouldn't inadvertently block
15:39:47 <cardoe> So if I just updated 1 setting and let it expect to come back after a while that was fine.
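Note on dtantsur's 15:38:07 description ("wait for $something and retry if the BMC is not reachable or returns HTTP 5xx"): a rough Python sketch of that loop. Everything named here is an assumption for illustration: `BMCUnavailable`, `wait_for`, and the `check` callable are hypothetical, and what `check` should actually test is the open "$something" from the discussion.

```python
import time
from typing import Callable


class BMCUnavailable(Exception):
    """Hypothetical error: BMC unreachable or it answered with HTTP 5xx."""


def wait_for(
    check: Callable[[], bool],
    timeout: float = 1800.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or the timeout expires.

    BMCUnavailable is swallowed and retried, since the BMC may be
    unresponsive for a while during BIOS or firmware operations.
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except BMCUnavailable:
            pass  # expected while the BMC is busy; try again later
        if clock() >= deadline:
            return False
        sleep(interval)
```

The timeout still has to be chosen, and as noted in the meeting, reboot times after BIOS/BMC updates are unpredictable, so this does not remove the need for the wider discussion.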
15:39:59 <JayF> my question of "where is the hard here" has been more than sufficiently answered :D
15:40:03 <cardoe> But you can't chain 2 clean/service steps
15:40:24 <TheJulia> cardoe: quite possibly, but you'll need to provide input to the discussions :)
15:40:53 <cardoe> Kick the bug back to me to write up more details when running with a patched Ironic allowing that behavior
15:40:58 <cardoe> Fair?
15:41:03 <iurygregory> i'm a bit confused by "you can't chain 2 clean/service steps" .-.
15:41:15 <dtantsur> yeah, let's start with that, and maybe let's have a high-bandwidth discussion afterwards
15:41:17 <TheJulia> iurygregory: i think it means, the second step might fail
15:41:29 <iurygregory> TheJulia, oh ok!
15:41:31 <TheJulia> iurygregory: because we don't have a complete understanding
15:41:54 <JayF> Yeah it sounds like the workaround is setting cleaning timeout so high it just works
15:42:03 <JayF> I've seen similar workaround for in-band steps that rebooted outta band of ironic
15:42:42 <rpittau> or it doesn't :)
15:42:42 <rpittau> reboot time is so unpredictable with bios/bmc updates
15:42:51 <iurygregory> well, I had a funny iLO bug when doing firmware update (but only happened when doing between two specific versions)
15:43:12 <dtantsur> I think TheJulia has rightfully hinted that we should collect more information and thoughts before coming up with a solution :)
15:43:16 <iurygregory> even increasing the timeout the node went to clean failed because it failed to power on after reboot
15:43:20 <rpittau> yes, I agree
15:43:22 <TheJulia> dtantsur: bingo
15:43:28 <iurygregory> ++
15:45:02 <TheJulia> onward?
15:45:05 <rpittau> yeah
15:45:14 <rpittau> any other bug to check ?
15:45:14 <cid> ++
15:45:36 <rpittau> otherwise I think we're good for today :)
15:45:50 <rpittau> oh wait
15:46:02 <rpittau> any volunteer for the bug deputy this week ?
15:46:40 <cid> Happy to do it again
15:46:42 <cid> This could be a topic for another day:
15:46:42 <cid> https://bugs.launchpad.net/ironic-python-agent/+bug/2076367
15:46:42 <cid> And my observation is worthy of note too: A non-core bug deputy might need to be able to revert the status of a bug that shows as 'In Progress' when the assignee has abandoned it.
15:47:00 <rpittau> right
15:47:03 <JayF> I am surprised the ironic-drivers group we added you to doesn't have that ability
15:47:31 <JayF> cid: I'd say 2076367 is Low
15:47:41 <dtantsur> JayF: me too
15:47:53 <TheJulia> Yeah, there is an opportunity to make IPA a little smarter there, but definitely low priority
15:47:54 <JayF> It's behavior that's existed for years, that's mildly annoying but the real price paid is minimal (5 seconds?)
15:48:13 <TheJulia> eh... 20+ locally
15:48:21 <TheJulia> at least, it feels like 20+
15:48:23 <TheJulia> in a VM
15:48:25 <JayF> it's a trivial enough fix in any event ... if os.path.exists()
15:48:36 <JayF> (on the various ipmi device locations)
15:48:56 <rpittau> looks like a low priority indeed, and a quick fix
15:49:27 * cid Updated 2076367 bug's importance.
15:49:39 <rpittau> cid: thanks for volunteering, again! :D
15:50:28 <rpittau> I forgot one more thing!
15:50:28 <rpittau> I will be out next monday, so someone will have to take care of the meeting and meeting notes, please :)
15:50:28 <cid> no p
15:51:41 <JayF> I can run it if you want
15:51:56 <rpittau> thanks JayF, much appreciated
15:52:10 <rpittau> alright, I think that's it for today
15:52:19 <rpittau> thanks everyone!
15:52:23 <rpittau> #endmeeting
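Note on JayF's 15:48:25 suggestion for 2076367 ("if os.path.exists()" on the various ipmi device locations): a minimal Python sketch of such a guard. The helper name `ipmi_device_present` is hypothetical and is not IPA code, and the path list is an assumption based on the device nodes commonly used by the Linux IPMI driver.

```python
import os
from typing import Callable, Iterable

# Assumed device node locations; adjust for the target platform.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def ipmi_device_present(
    paths: Iterable[str] = IPMI_DEVICE_PATHS,
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Return True if any known IPMI device node exists."""
    return any(exists(path) for path in paths)
```

A caller could skip slow IPMI probing when this returns False, for example in a VM without a BMC, which is where the delay of several seconds mentioned above would be saved.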