15:00:15 #startmeeting ironic
15:00:15 Meeting started Mon Aug 12 15:00:15 2024 UTC and is due to finish in 60 minutes. The chair is rpittau. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 The meeting name has been set to 'ironic'
15:00:18 o/
15:00:23 o/
15:00:30 Hello everyone!
15:00:30 Welcome to our weekly meeting!
15:00:30 The meeting agenda can be found here:
15:00:30 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_August_12.2C_2024
15:00:31 o/
15:00:32 o/
15:01:09 #topic Announcements/Reminders
15:01:25 #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio
15:01:25 #link https://tinyurl.com/ironic-weekly-prio-dash
15:01:42 o/
15:01:57 it's not looking bad, lots of new patches, some from last week
15:02:01 o/
15:02:17 #info 2024.2 Dalmatian Release Schedule
15:02:17 #link https://releases.openstack.org/dalmatian/schedule.html
15:02:30 we're at week -7 !
15:02:37 o/
15:02:46 we should start thinking about what's missing in the non-client and client libraries
15:02:55 FF is in 2 weeks also
15:03:14 #info PTL nominations period between August 14 and August 28
15:03:29 I have decided to run for a second mandate as PTL
15:03:41 nice =D
15:03:52 :)
15:04:18 sorry for the late notice, just life in the middle of everything, it's been a long summer and it hasn't ended yet!
15:04:37 Thanks Riccardo!
15:04:48 Awesome, thanks!
15:04:53 my pleasure, really :)
15:05:01 I'm glad I can do it
15:05:32 #info The next OpenInfra PTG will take place October 21-25, 2024, virtually! Registration is now open!
15:05:32 #link https://ptg.openinfra.dev/
15:05:38 the ironic team has been registered
15:05:54 please add your name and topics here
15:05:54 https://etherpad.opendev.org/p/ironic-ptg-october-2024
15:06:48 we still have some time for topics
15:07:12 anything else to announce/remind ?
15:07:58 okey dokey, onward!
15:08:00 #topic Review Ironic CI status
15:08:49 anything worth mentioning from last week?
15:08:49 I've only seen some instability on pkgs repos, but it recovered pretty quickly
15:09:36 alright, great week for CI :D
15:09:45 #topic Discussions
15:09:57 nothing to discuss, unless anyone has anything to mention?
15:10:17 Is someone making sure old bugfix branches still get retired?
15:10:23 JayF: I am :)
15:10:27 Awesome, thanks :)
15:10:39 np! :)
15:11:23 actually maybe I should write something down about the procedure, I'll take a note on that
15:11:29 ++
15:11:57 ok!
15:11:57 anything else?
15:12:06 I would like to mention something
15:12:12 masghar: please go ahead :)
15:12:36 Unfortunately I have not been able to complete the inspection rules work so far
15:12:59 (Have been a little overwhelmed with bugs and such downstream)
15:13:10 No worries, it happens :)
15:13:14 (Just a heads-up)
15:13:25 no worries masghar :)
15:13:31 Thanks, will try to carve out time for it
15:13:49 thanks for the heads-up
15:14:20 No problem, and thanks
15:14:54 (That's it from me)
15:14:57 masghar: If we can document, in detail, what's done and what's left to be done, I think cid is willing to help pick up some of that work. I'm unsure if the knowledge transfer is worth it :)
15:15:13 masghar: so know that's an option at hand if you'd like to exercise it
15:15:34 I have a very tiny patch that I started a few months ago
15:15:58 I can explain my thought process to cid or whoever asks
15:16:25 I appreciate the offer of help :)
15:16:57 masghar, JayF, In my todo for the week
15:17:14 thanks cid :)
15:17:19 Thanks cid!
15:17:32 nop!
15:18:20 * cid that should be no problems :)
15:18:45 anything else? otherwise I have one quick thing
15:19:50 ok
15:19:51 I forgot to mention that it's almost time for the highlights
15:19:51 I will take care of them, at least start them, I'll be out at the beginning of September when they're due, but I'll do my best to finish them before I leave for my PTO
15:19:51 if you have anything you want to mention about this development cycle please let me know
15:20:15 ack
15:20:52 alright, moving on
15:20:54 #topic Bug Deputy Updates
15:21:05 cid: thanks for taking care of that, anything to mention?
15:21:44 So, I needed help triaging 4 bugs, I think two can be considered done, except these other two:
15:21:44 https://bugs.launchpad.net/sushy/+bug/2075979
15:21:44 https://bugs.launchpad.net/sushy/+bug/2075980
15:21:58 So the first one is mine.
15:22:33 I suspect yours is a valid bug, but I'm having trouble understanding exactly what the partition in the API is
15:22:40 Essentially sushy does not see those NICs. Dell says they conform to Redfish, etc.
15:22:44 cardoe: which version of iDRAC9 firmware are you using ? cause that could be a firmware issue
15:23:07 It doesn't matter. It's the same for at least half a dozen versions. Both 6.x and 7.x
15:23:30 ok, that's what I wanted to exclude :)
15:23:39 it's sort of a framing issue, and my comment sort of reflects this, we need to understand how to frame it and I think we're missing context.
15:24:11 Yeah I'm sure I'm not providing the right details.
15:24:38 Well, you're providing what you have :)
15:24:51 So basically when ironic calls `ethernet_interfaces.summary` on Sushy, those NICs are excluded.
15:25:14 I guess sushy needs to understand "which one is actually important"
15:25:26 and identify when to check/reconcile them together
15:27:30 So the ramdisk inspector sees NIC.Slot.1-1 as ens2f0np0 for example.
Which would imply NIC.Slot.1-1-1
15:27:51 I see a big problem in that there is no link between the two EthernetInterface objects
15:28:16 But then the other port on that card is NIC.Slot.1-2 for example... it's called ens2f1np1 which implies NIC.Slot.1-2-2
15:29:05 dtantsur: ... that is indeed a huge issue
15:29:06 A workaround would be to ignore the Health record if its values are null (as opposed to unhealthy)
15:29:34 but, 1-1-1 is a noted partition, but still sort of goes back to what is the partition in the context
15:29:47 I'm gonna get someone from Dell's firmware team on a call.
15:29:54 +++++++
15:30:08 ++
15:30:15 "InterfaceEnabled": false is very concerning (but we don't look at it.. yet?)
15:30:38 But I did want to at least bubble this up to you guys and see if I could come up with an acceptable way to map them and put that in sushy-oem-idrac or something.
15:30:51 I guess that explains why it fails
15:30:54 the partition has no mac
15:30:59 Right
15:32:11 so more info needed, I guess next item?
15:32:25 So another weird wrinkle
15:32:42 When I set the devices as PXE or HTTPS bootable from Dell's HTTP UI
15:32:47 Next item is: https://bugs.launchpad.net/sushy/+bug/2075980
15:32:47 Then, one last one from TheJulia: https://bugs.launchpad.net/ironic-python-agent/+bug/2076367 (sounded like something that needs to be discussed, so...)
15:33:04 The values that Ironic pulls out from the BIOS are "NIC.Slot.1-1" and "NIC.Embedded.1-1-1"
15:33:05 Ahh ohh. 2075980 is a long standing pain point
15:33:31 Yeah 2075980 is mine as well. Happy to write patches.
15:34:02 cardoe: what's your timezone? I'm happy to chat about the potential fix when I'm not boiling alive in this bloody weather.
15:34:05 Is there something sneakily not straightforward about 2075980?
15:34:20 I'm CST (or is it CDT right now?)
15:34:57 JayF: yes, figuring out what to use instead of IPA to detect the finish of the operation and the subsequent reboot.
15:35:30 it could be simple, but we need to take a careful look (and ideally involve janders and iurygregory)
15:36:02 why is "flip the power on and wait" just like we do for going ACTIVE on deployment not sufficient?
15:36:10 JayF: wait for what?
15:36:21 So when I locally patched it, Ironic got mad that the node went away for a while.
15:36:29 power doesn't even necessarily need to be on.
15:36:32 cardoe just pointed at the thing I was trying to see
15:36:51 Yeah, during BIOS settings or firmware updates, the machine is doing $weird_things for several minutes
15:36:57 yeah
15:37:14 in firmware updates the bmc will be unresponsive for some time also
15:37:23 Ironic was happy knowing that it would come back to the IPA after a while.
15:37:33 at least with iDRAC the UI goes down XD
15:37:49 yeah, we cannot even be sure the BMC behaves during the process
15:38:07 so it's "wait for $something and retry if the BMC is not reachable or returns HTTP 5xx"
15:38:24 I think we're going down a rabbit hole
15:38:31 PTG topic? :)
15:38:41 but maybe disjoint it from the overall issue
15:38:42 sounds like it :)
15:38:53 Yes, there are issues, but that is not blocking trying to fix cardoe's issue
15:39:03 or at least, cardoe and his efforts are input into that discussion
15:39:13 so we shouldn't inadvertently block
15:39:47 So if I just updated 1 setting and let it expect to come back after a while that was fine.
15:39:59 my question of "where is the hard here" has been more than sufficiently answered :D
15:40:03 But you can't chain 2 clean/service steps
15:40:24 cardoe: quite possibly, but you'll need to provide input to the discussions :)
15:40:53 Kick the bug back to me to write up more details when running with a patched Ironic allowing that behavior
15:40:58 Fair?
15:41:03 i'm a bit confused by "you can't chain 2 clean/service steps" .-.
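A minimal sketch of the "wait for $something and retry" idea from 15:38:07, not Ironic's actual implementation. The `probe` callable, the function name `wait_for_bmc`, and the default timeout and interval are all hypothetical; `probe` is assumed to return an HTTP status code, or to raise an `OSError` subclass such as `ConnectionError` when the BMC is unreachable.

```python
import time


def wait_for_bmc(probe, timeout=1800.0, interval=30.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a BMC until it answers with a non-5xx status or time runs out.

    probe: hypothetical callable returning an HTTP status code (int), or
    raising OSError (ConnectionError, TimeoutError, ...) when the BMC
    cannot be reached at all.
    """
    deadline = clock() + timeout
    while True:
        try:
            status = probe()
        except OSError:
            # Unreachable BMC: treat it like a transient failure and retry.
            status = None
        if status is not None and status < 500:
            return status
        if clock() >= deadline:
            raise TimeoutError("BMC did not become responsive in time")
        sleep(interval)
```

As the discussion notes, a sketch like this only covers BMC availability; it says nothing about how to detect that the BIOS or firmware operation itself has finished.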
15:41:15 yeah, let's start with that, and maybe let's have a high-bandwidth discussion afterwards
15:41:17 iurygregory: i think it means, the second step might fail
15:41:29 TheJulia, oh ok!
15:41:31 iurygregory: because we don't have a complete understanding
15:41:54 Yeah it sounds like the workaround is setting cleaning timeout so high it just works
15:42:03 I've seen a similar workaround for in-band steps that rebooted outta band of ironic
15:42:42 or it doesn't :)
15:42:42 reboot time is so unpredictable with bios/bmc updates
15:42:51 well, I had a funny iLO bug when doing firmware update (but only happened when doing between two specific versions)
15:43:12 I think TheJulia has rightfully hinted that we should collect more information and thoughts before coming up with a solution :)
15:43:16 even after increasing the timeout, the node went to clean failed because it failed to power on after reboot
15:43:20 yes, I agree
15:43:22 dtantsur: bingo
15:43:28 ++
15:45:02 onward?
15:45:05 yeah
15:45:14 any other bug to check ?
15:45:14 ++
15:45:36 otherwise I think we're good for today :)
15:45:50 oh wait
15:46:02 any volunteer for the bug deputy this week ?
15:46:40 Happy to do it again
15:46:42 This could be a topic for another day:
15:46:42 https://bugs.launchpad.net/ironic-python-agent/+bug/2076367
15:46:42 And my observation is worthy of note too: A non-core bug deputy might need to be able to revert the status of a bug that shows as 'In Progress' when the assignee has abandoned it.
15:47:00 right
15:47:03 I am surprised the ironic-drivers group we added you to doesn't have that ability
15:47:31 cid: I'd say 2076367 is Low
15:47:41 JayF: me too
15:47:53 Yeah, there is an opportunity to make IPA a little smarter there, but definitely low priority
15:47:54 It's behavior that's existed for years, that's mildly annoying but the real price paid is minimal (5 seconds?)
15:48:13 eh...
20+ locally
15:48:21 at least, it feels like 20+
15:48:23 in a VM
15:48:25 it's a trivial enough fix in any event ... if os.path.exists()
15:48:36 (on the various ipmi device locations)
15:48:56 looks like a low priority indeed, and a quick fix
15:49:27 * cid Updated bug 2076367's importance.
15:49:39 cid: thanks for volunteering, again! :D
15:50:28 I forgot one more thing!
15:50:28 I will be out next monday, so someone will have to take care of the meeting and meeting notes, please :)
15:50:28 no p
15:51:41 I can run it if you want
15:51:56 thanks JayF, much appreciated
15:52:10 alright, I think that's it for today
15:52:19 thanks everyone!
15:52:23 #endmeeting
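A rough sketch of the `os.path.exists()` check suggested at 15:48:25 for bug 2076367, not the actual IPA patch. The function name `ipmi_device_present` is hypothetical, and the device paths listed are an assumption (the locations IPMI tooling commonly probes); the log itself only says "the various ipmi device locations".

```python
import os

# Assumed device node locations; not taken from the meeting log.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def ipmi_device_present(paths=IPMI_DEVICE_PATHS, exists=os.path.exists):
    """Return True if any candidate IPMI device node exists.

    Hypothetical helper: a caller could skip slow IPMI commands entirely
    when this returns False, for example inside a VM with no BMC.
    """
    return any(exists(path) for path in paths)
```

A caller would check `ipmi_device_present()` before attempting any IPMI command, which would avoid the multi-second delay mentioned in the discussion on hosts with no IPMI device.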