scott_ | Hey Ironic team! Quick question -- I'm trying to deploy OKD 4.12 via an IPI bare metal install in a home lab that consists of 3 PowerEdge R720s with enterprise iDRAC7s running the latest firmware. I've got the OKD portion all set up with a provisioner FCOS VM running in VMware Fusion and the bootstrap VM running inside it (with the Ironic pod running inside that). No provisioning network, just the baremetal network. | 03:26 |
scott_ | To cut to the point, I've run the openshift install script a dozen times now and get different results every time. On any given run, 0 or 1 of the 3 R720s gets provisioned correctly with FCOS running on the bare metal, but the other 2 don't. Each of the 3 has gotten it provisioned correctly at least once -- but never more than 1 server gets it on a single run of the install script | 03:28 |
scott_ | The error I get for the other 2 (or sometimes all 3 servers) is "inspect failed" with 'Message': 'Server is already powered ON.', 'MessageArgs': [], 'MessageArgs@odata.count': 0, 'MessageId': 'IDRAC.1.6.PSU501'... | 03:31 |
scott_ | I've tried resetting all the iDRACs to factory defaults, only powering on one server at a time during install, looking through the logs for any other hints, setting static IPs (instead of the usual static-mapped DHCP), etc etc and have had absolutely 0 luck finding any rhyme or reason | 03:33 |
scott_ | I fully understand the iDRAC7s don't seem to be supported at this point and am pretty close to switching to an IPMI install or just a manual OKD UPI install, but thought I would see if anyone has thoughts before I give up on it. It's very odd to me that 1 of 3 seems to regularly work and that the 1 is different every run. | 03:36 |
scott_ | I feel very comfortable digging around through the containers or logs if anyone has ideas and would much appreciate the thoughts! I tried to manually kick off the introspection process to make debugging a bit quicker and easier (than running through the entire installer) but couldn't figure out how to get past the Keystone authentication bit quickly/easily. | 03:39 |
scott_ | Additionally, I'm using "idrac-virtualmedia" as the BMC address and Ironic is 100% able to hit all 3 iDRACs -- all 3 power cycle correctly on every install, have their Boot option set to Virtual CD, etc -- but at most one makes it past introspection on to success | 03:44 |
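A minimal sketch (not from the conversation itself) of how one might double-check what each iDRAC actually advertises for virtual media over Redfish, using sushy, the Redfish client library Ironic itself uses; hostnames and credentials below are placeholders:

```python
# Hedged sketch: list each BMC's virtual media devices via Redfish with sushy.
# Hostnames and credentials are placeholders for the three iDRAC7s.
import sushy

for host in ("idrac-1.lab", "idrac-2.lab", "idrac-3.lab"):
    root = sushy.Sushy(f"https://{host}/redfish/v1",
                       username="root", password="calvin", verify=False)
    for manager in root.get_manager_collection().get_members():
        for vmedia in manager.virtual_media.get_members():
            # An "idrac-virtualmedia" style deployment needs a device whose
            # media_types include 'CD' or 'DVD' to attach the boot ISO to.
            print(host, vmedia.identity, vmedia.media_types, vmedia.inserted)
```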
scott_ | Thanks in advance!! And if the answer to all of this is simply that iDRAC7s are not supported, thanks in advance for that too! Would honestly be a bit of a relief to just go with a different method at this point haha | 03:50 |
samuelkunkel[m] | Hi, | 06:39 |
samuelkunkel[m] | If you're seeing introspection failures, I would ssh into the IPA ramdisk and have a look at the logs there. | 06:39 |
samuelkunkel[m] | On the controller side these tasks are handled by the conductor, so those logs should be of interest. | 06:39 |
kaloyank | o/ | 07:31 |
kaloyank | 2 questions here: | 07:35 |
kaloyank | 1. I've never attended a PTG meeting, I see that the full schedule is yet to be announced, where do I look for it once it's published? | 07:38 |
kaloyank | 2. I have a vague memory that there was a draft about saving local disk deployments as images in Glance but I can't find the spec. Does such a spec even exist? | 07:38 |
arne_wiebalck | Good morning, Ironic! | 08:03 |
arne_wiebalck | kaloyank: This is the PTG etherpad https://etherpad.opendev.org/p/ironic-bobcat-ptg, the whiteboard says schedule to come (https://etherpad.opendev.org/p/IronicWhiteBoard) | 08:07 |
arne_wiebalck | kaloyank: for your 2nd question, you mean snapshotting bare metal instances? | 08:07 |
vanou | good morning ironic | 10:45 |
iurygregory | good morning ironic | 11:17 |
iurygregory | kaloyank, regarding the PTG schedule we will add the information this week to the etherpad that arne_wiebalck pasted, the initial proposal for the schedule is on openstack-discuss so people can give feedback https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032641.html | 11:19 |
kaloyank | iurygregory: thanks :) | 11:56 |
iurygregory | yw | 11:56 |
Nisha_Agarwal | JayF, dtantsur Is there any action required from my side on https://review.opendev.org/c/openstack/ironic/+/860820? I see the merge fails even after the recheck done by JayF... Shall I do a recheck one more time? | 12:13 |
dtantsur | Nisha_Agarwal: someone needs to check why the failure happened. If you don't have time to wait, you may want to do it yourself. | 12:14 |
Nisha_Agarwal | dtantsur, I saw the py310 gate failed, but I don't see the same gate failing on https://review.opendev.org/c/openstack/ironic/+/860821/5 even though that patch is dependent on the first one... | 12:16 |
Nisha_Agarwal | dtantsur, don't know, but it looks like a recheck may pass; otherwise that gate should fail for all ironic patches, as the failing code has nothing to do with the patch code | 12:17 |
* TheJulia tries to wake up | 13:22 |
TheJulia | kaloyank: typically we also update the original etherpad with a schedule so it is all in one place | 13:25 |
TheJulia | kaloyank: snapshots have come up as a topic a few times, but the idea has never really gotten past an idea/initial phase. If you would be willing I would encourage you to start a spec if you have thoughts/interest in the topic. | 14:12 |
JayF | scott_: I don't specifically know if iDRAC 7 is supported; I'd be surprised if not. I don't have specific troubleshooting tips in your case though. | 14:19 |
JayF | scott_: what samuelkunkel[m] said about looking in logs is a good suggestion though | 14:19 |
dtantsur | JayF, scott_, probably works, but probably not with virtual media | 14:19 |
TheJulia | JayF: scott_ was using something which was Redfish based and only works on iDRAC8/9 :( | 14:20 |
JayF | aha | 14:20 |
dtantsur | yeah, I don't think wsman is supported in metal3. so IPMI it is. | 14:20 |
TheJulia | ... yeah | 14:22 |
scott_ | Thanks samuelkunkel[m], JayF, dtantsur, and TheJulia -- running through an install now to get at the logs. Happy to move on to IPMI but just anecdotally considering that it seems to work on 1 out of 3 fairly regularly, wouldn't that indicate that it's working? I was figuring it may have been a lock, timing, or some kind of multicast issue | 14:23 |
scott_ | Nonetheless, thanks a lot for your help, guys, and I'll look at these logs here soon | 14:23 |
dtantsur | My bet is on timing. | 14:23 |
TheJulia | scott_: didn't see you re-appear! I saw you were not around last night and didn't respond | 14:23 |
scott_ | hahaha sorry using a web client and just reading through the archives if I go down! | 14:24 |
TheJulia | note taken for future reference | 14:24 |
scott_ | @dtantsur, I was thinking timing too and thought maybe booting up one at a time would fix that issue but that didn't seem to do any better either | 14:25 |
TheJulia | scott_: an idrac7? Same firmware? | 14:26 |
* TheJulia wonders if dell backported the vmedia capabilities | 14:26 |
scott_ | Yup! iDrac7 on 2.65.65.65 | 14:26 |
TheJulia | ... in their firmware | 14:27 |
TheJulia | oh! | 14:27 |
TheJulia | you know what, 2.65.65.65 *is* the literal minimum version | 14:27 |
scott_ | hahaha yeah im just scraping by here | 14:27 |
TheJulia | so, in theory, it should be working. Redfish does need to be enabled for it to work. Worth checking if it is in the settings | 14:28 |
scott_ | yeah, Redfish is enabled on each and each of the 3 machines has provisioned correctly at least once -- just not all 3 of them in the same install | 14:29 |
TheJulia | oh joy :( | 14:29 |
JayF | Do we still have someone from Dell in the community we could point at a bug? | 14:30 |
TheJulia | since it doesn't fail, I suspect attachment is succeeding, I wonder if the BMC is just failing internally | 14:30 |
TheJulia | ... this feels familiar unfortunately | 14:31 |
kaloyank | TheJulia | 14:31 |
kaloyank | I'd love to start a spec as I have a real use-case for this feature | 14:32 |
TheJulia | scott_: Are they all in the same boot mode to start? | 14:32 |
scott_ | I think @dtantsur may be right on timing being the issue since I can't find any other consistency to this | 14:33 |
scott_ | yup -- I've tried them all with NormalBoot set and with VirtualCD set on different runs | 14:33 |
TheJulia | yeah, I'm thinking the same, the boot mode reset might be causing a configuration reset which would cause the power to cycle through once beforehand | 14:33 |
TheJulia | UEFI boot mode? | 14:33 |
scott_ | ahhhh interesting! | 14:33 |
scott_ | it appears Ironic is setting it to UEFI but I haven't manually set it at all | 14:34 |
TheJulia | Yeah, pre-set them to UEFI and see what happens | 14:34 |
scott_ | will try that now | 14:34 |
dtantsur | both metal3 and ironic default to uefi, yeah | 14:35 |
TheJulia | it could be the config is alternating somehow | 14:35 |
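A rough way to check that theory between install runs is to snapshot each System resource over Redfish and see whether the boot settings flip; a hedged sketch, assuming the usual iDRAC system ID of System.Embedded.1 and placeholder hosts/credentials:

```python
# Hedged sketch: snapshot each iDRAC's power state and boot settings so
# consecutive install runs can be compared; hosts/credentials are placeholders.
import requests
from requests.auth import HTTPBasicAuth

for host in ("idrac-1.lab", "idrac-2.lab", "idrac-3.lab"):
    url = f"https://{host}/redfish/v1/Systems/System.Embedded.1"
    system = requests.get(url, auth=HTTPBasicAuth("root", "calvin"),
                          verify=False).json()
    # Boot contains BootSourceOverrideMode/Target; if these differ between
    # runs (or between the node that succeeds and the ones that fail), that
    # supports the "alternating config" theory.
    print(host, system.get("PowerState"), system.get("Boot"))
```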
TheJulia | If we can identify a firm bug and write it up, I can email my dell contacts | 14:37 |
scott_ | done! all 3 set to UEFI -- will let you guys know in 20-30 how it plays out and then i can parse through some logs as well | 14:38 |
scott_ | @TheJulia sounds great | 14:38 |
TheJulia | There is an issue stevebaker[m] found with the vmedia URL in *much* later versions of firmware, and we reached out to Dell engineering for an answer, but in that case it is definitely not our code nor something we can work around... i.e. it has to go to the firmware devs. | 14:39 |
scott_ | gotcha | 14:44 |
scott_ | @TheJulia -- hmm UEFI didn't seem to work -- straight to "inspect failed" for all 3 of them. Will start digging through the logs as samuelkunkel[m] suggested but happy to try any other suggestions as well! | 15:00 |
JayF | Good morning folks, it's meeting time but hopefully we'll wrap it up quick and get back to DRAC'in :D | 15:01 |
JayF | #startmeeting Ironic | 15:01 |
opendevmeet | Meeting started Mon Mar 20 15:01:21 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'ironic' | 15:01 |
TheJulia | o/ | 15:01 |
iurygregory | o/ | 15:01 |
matfechner | o/ | 15:01 |
JayF | Who all is here this morning? | 15:01 |
vanou | o/ | 15:01 |
hjensas | o/ | 15:01 |
janders | o/ | 15:01 |
JayF | #topic Announcements/Reminder | 15:02 |
JayF | Please hashtag your ready-for-review stuff with #ironic-week-prio; and prioritize reviews in the priority dashboard in the Ironic Whiteboard @ http://bit.ly/ironic-whiteboard | 15:03 |
JayF | We have 2023.1 branches cut of everything; master commits now go to 2023.2. Coordinated release is Wednesday. | 15:03 |
dtantsur | o/ | 15:03 |
JayF | Congratulations on yet-another-successful integrated release including Ironic \o/ | 15:03 |
vanou | \o/ | 15:04 |
JayF | There were no action items from previous meeting; skipping the related agenda item. | 15:05 |
JayF | #topic Ironic CI status | 15:05 |
JayF | Do we have any observations about CI? | 15:05 |
TheJulia | I didn't see any issues last week | 15:05 |
JayF | From me, I'm pretty sure we have a flaky test in py310 CI; I might try to find time to look in depth after PTG (literally time-wise after the PTG meetings on those days) | 15:05 |
JayF | I'll also note, metal3 CI is in master now | 15:06 |
JayF | and I think it's almost to the point of running out of the (shared) metal3-dev-env repo; once that change hits I will propose we backport that CI job to 2023.1 | 15:06 |
JayF | to ensure we keep things working for our metal3 friends + sqlite users | 15:07 |
JayF | If no other comments moving on | 15:07 |
JayF | #topic VirtualPDU | 15:08 |
JayF | Reminder: repo move scheduled for Apr 7, then it'll be under openstack/ and we'll have full management of it (not just paper-governance lol) | 15:08 |
JayF | #topic Ironic Bobcat vPTG | 15:08 |
JayF | Please stick around after the meeting; we'll be doing a sync to schedule PTG items. | 15:08 |
JayF | Please join in either the zoom room I will link post-meeting, or just async by being in the etherpad ( https://etherpad.opendev.org/p/ironic-bobcat-ptg ) making comments. | 15:09 |
JayF | If there are any requirements you have for PTG scheduling: requested times for certain topics, topics not listed in the etherpad, etc | 15:09 |
JayF | right now is more or less your last chance to make noise about that :) so please do | 15:09 |
TheJulia | hopefully that will start promptly, I have another meeting starting at the top of the hour | 15:10 |
JayF | yep I'll hurry on then :D | 15:10 |
JayF | #topic Ironic VMT | 15:10 |
JayF | Going to give a quick update here; essentially the only piece we're missing is giving VMT group exclusive access to Ironic security bugs | 15:10 |
JayF | but because we're sorta in storyboard/LP limbo, I'm unsure where to go next | 15:11 |
JayF | we should probably just configure it in storyboard and get VMT managed, and ensure LP is configured correctly when that migration happens? I just haven't prioritized making time for that migration | 15:11 |
JayF | I'll probably go that route unless there are objections | 15:12 |
TheJulia | JayF: They can already see them in storyboard AFAIK | 15:12 |
TheJulia | and interact with them | 15:12 |
JayF | They have to have exclusive access | 15:12 |
JayF | e.g. VMT sees them but Ironic cores can't | 15:12 |
TheJulia | Yeah, they have that afaik | 15:12 |
TheJulia | the reporter otherwise has to explicitly grant access in storyboard | 15:12 |
JayF | well that makes this easier; I'll move on VMT this week | 15:12 |
JayF | moving on so we can get to PTG planning | 15:12 |
JayF | #topic Hosting full IPA images | 15:12 |
JayF | dtantsur: this is your item | 15:12 |
dtantsur | That's a past one, sorry, should have removed | 15:13 |
JayF | ack; no problem | 15:13 |
JayF | what was the decision outta that? | 15:13 |
JayF | we going to add extra-hardware? | 15:13 |
JayF | 20M didn't seem like much in context of a huge modern image? | 15:13 |
dtantsur | I want to investigate getting rid of the dependency on extra-hardware in baremetal-operator | 15:13 |
JayF | nice | 15:13 |
JayF | #topic Open Discussion | 15:13 |
dtantsur | It may involve adding something to the IPA inventory | 15:13 |
JayF | Anything for open discussion? Speak quickly or else I'm going to close the meeting so we can shift to PTG planning | 15:13 |
vanou | I have | 15:14 |
JayF | dtantsur: neat; I'll be interested to see what comes out of that | 15:14 |
JayF | vanou: awesome; go ahead | 15:14 |
vanou | I'm +2 on moving to the VMT process for Ironic vulnerabilities | 15:14 |
vanou | However, regarding a vulnerability which affects both Ironic and a vendor library, I think we need to add a vulnerability-handling note to the Ironic docs. | 15:14 |
vanou | Just putting 2 things in the doc is enough, I think: if the Ironic community is asked by the owner of an unofficial library, | 15:14 |
vanou | 1) the Ironic community is open and willing to collaborate to solve such a rare vulnerability | 15:14 |
vanou | 2) the Ironic community is willing to collaborate in a reasonable manner, which means following good practice for handling vulnerabilities (e.g. crafting the fix in private until it is published), to resolve the vulnerability. | 15:14 |
JayF | I think we're willing in general to do those things; but like I suggested when this was brought up outside a meeting in IRC, I think there's value in getting that added to the OpenStack-wide VMT documentation | 15:15 |
JayF | because Ironic is not the only project that has vendor drivers which may require coordinated disclosure | 15:15 |
JayF | and I suspect the reality would look like what you lay out; but if you're concerned about getting that in writing, it's probably best to put that in OpenStack-level docs since Ironic is going to hook into the OpenStack-level VMT | 15:15 |
vanou | I see. | 15:15 |
vanou | you mean it is better to consult about this on the OpenStack ML, | 15:16 |
vanou | like you said, on openstack-discuss? | 15:16 |
JayF | Or with the security SIG in #openstack-security; or both | 15:16 |
JayF | It's an openstack-wide problem so I'd prefer not to solve it at a project level | 15:17 |
vanou | OK. I'll contact through that channel | 15:17 |
JayF | Is there anything else for Open Discussion? | 15:17 |
JayF | Alright, thank you everyone. Stay tuned for PTG planning. | 15:19 |
JayF | #endmeeting | 15:19 |
opendevmeet | Meeting ended Mon Mar 20 15:19:08 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:19 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.html | 15:19 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.txt | 15:19 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.log.html | 15:19 |
JayF | PTG planning Zoom -> https://us06web.zoom.us/j/89245276276?pwd=QS83YUh1K1ZoSUpicFM1ZFFNbGJ4dz09 | 15:19 |
JayF | if you don't wanna get in the zoom, you can open etherpad and comment along | 15:19 |
JayF | https://etherpad.opendev.org/p/ironic-bobcat-ptg | 15:19 |
kubajj | o/ | 15:21 |
JayF | >>>>> PTG planning Zoom -> https://us06web.zoom.us/j/89245276276?pwd=QS83YUh1K1ZoSUpicFM1ZFFNbGJ4dz09 <<<<< | 15:22 |
opendevreview | Merged openstack/ironic master: Fixes Secureboot with Anaconda deploy https://review.opendev.org/c/openstack/ironic/+/860820 | 15:32 |
scott_ | Seems like the meeting is over in here? So I don't see any IPA logs -- the only ironic pods running in the bootstrap system are "ironic", "ironic-inspector" and "ironic-ramdisk-logs" -- they all seem to be pushing out to journald on the host. | 15:47 |
scott_ | Of note -- I may have been overlooking this previously, but the two errors that pop out of the Terraform that OKD is running are the "Error: could not inspect" with 'Server is already powered ON.' as I mentioned, but I also just noticed the second error: "...node is currently 'inspect failed', last error was 'Failed to inspect hardware. Reason: unable to start inspection: No suitable virtual media device found'" | 15:49 |
scott_ | I had assumed the "powered ON" issue was the underlying one, since that's where the ironic.drivers.modules.inspector stack trace pops up in the logs, but maybe that's not it? | 15:54 |
dtantsur | scott_: ironic-ramdisk-logs are IPA logs | 15:56 |
scott_ | ah gotcha! thanks @dtantsur | 15:57 |
dtantsur | scott_: "no suitable virtual media device" may mean that the hardware does not support virtual media, at least the standard way | 15:58 |
TheJulia | it might not have an attachment? | 15:58 |
scott_ | hmmm -- it has definitely worked, oddly enough -- all 3 of these machines had Windows Server on them before my dozen provisioning attempts and now they all have FCOS | 15:59 |
scott_ | there just doesn't seem to be any particular regularity to them provisioning correctly | 16:00 |
JayF | Even if vmedia is your end goal; I'd be curious if a pxe-based driver would work in your case | 16:01 |
JayF | just so we can narrow it down? | 16:01 |
TheJulia | scott_: is there a 2.75.75.75 available? | 16:03 |
TheJulia | for firmware | 16:03 |
scott_ | I think I may end up having to head that direction anyway -- was just trying to avoid adding a separate provisioning network | 16:03 |
TheJulia | I do seem to remember some issues on that first version ages ago | 16:03 |
scott_ | @TheJulia -- I hadn't seen a newer available version but let me look! | 16:03 |
JayF | Even with vmedia it's a good practice to have a separate provisioning network, fwiw :D (although significantly less dangerous than it was pre-agent-token) | 16:04 |
scott_ | from Dell's website "iDRAC7 has reached both the End of Sale as of February 2017 and End of Software Maintenance as of February 2020. The last release of iDRAC7 firmware is version 2.65.65.65. " leads me to believe no unfortunately | 16:05 |
scott_ | @JayF hahaha yeah understandable -- just trying to keep things pretty minimal in this lab environment but may just need to pull the trigger on adding a new network -- will look into doing it via VLANs if that's an option | 16:07 |
JayF | Just making it clear that it still will hit the network during provisioning :D | 16:07 |
JayF | I know that "good practice" sometimes is code for "a pain in the rear which might not be worth it" lol | 16:07 |
scott_ | hahahaha yeah definitely | 16:08 |
TheJulia | scott_: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=krcxx | 16:09 |
* JayF adds "Dell whisperer" to his list of TheJulia's magical powers | 16:11 | |
scott_ | TheJulia: :O -- let me double check this will work before I brick an iDrac but thanks so much if it does!!! haha | 16:11 |
TheJulia | I *believe* that magical power is hjensas's | 16:11 |
TheJulia | but some of it may have rubbed off on me | 16:11 |
scott_ | hmmm when I put in my service tag on that page it returns "This driver is not compatible" but I'm certainly not opposed to trying it if you think it may work | 16:14 |
TheJulia | dunno, I know we have applied it to some of our hardware, but there is a variety of hardware out there | 16:21 |
TheJulia | honestly, if you're super worried about the risk of bricking the BMC (because, I know from experience, it is just not fun to recover from), then PXE is the path forward, since you're on a version that was in use when the very initial development was taking place | 16:22 |
scott_ | TheJulia: I love a little excitement and risk taking in my day! Going to give 2.75.75.75 a go -- will report back shortly! Assuming it does brick my device or simply doesn't work, will likely go the PXE route as you, JayF, and dtantsur have mentioned | 16:34 |
scott_ | Yup, appears to be no luck with 2.75.75.75 -- "Unable to extract payloads from Update Package." | 16:37 |
* JayF wonders if it's > if serviceTagWarrantyExpire: raise SomeError | 16:38 |
JayF | I'm maybe a little more cynical than needed though :P | 16:39 |
scott_ | hahahahaha that would be very unfortunate but since it works 1 out of 5 times, seems unlikely | 16:39 |
JayF | I meant more the inability to upgrade but yeah, unlikely anyway :P | 16:40 |
scott_ | ah gotcha | 16:40 |
scott_ | Well thank you guys regardless! Will try a couple variations on what I've been trying repeatedly (on a separate provisioning network, via the IPMI option, etc) before moving on to PXE | 16:40 |
JayF | good luck :) sorry we were unable to get it working the way you wanted | 16:41 |
scott_ | not at all! was helpful nonetheless! much appreciated | 16:41 |
scott_ | will let you guys know if it unexpectedly seems to work with some minor tweak for archives sake | 16:42 |
JayF | feel free to hang out in any event :) It means you get to be the first vote anytime we talk about what an operator would actually want ;) | 16:45 |
opendevreview | Dmitry Tantsur proposed openstack/ironic-specs master: [WIP] Merge Inspector into Ironic https://review.opendev.org/c/openstack/ironic-specs/+/878001 | 18:57 |
dtantsur | ^^^ Long overdue, I know :) | 18:57 |
dtantsur | I've been delaying it until.. well, until I realized that if I don't write it down, I'll keep repeating it over and over again. | 18:57 |
dtantsur | Not finished, but comments on the motivation section, as well as performance/scaling/upgrades are welcome | 18:58 |
dtantsur | on this positive note, wishing y'all a good night | 18:58 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent-builder master: Add checksum generation support https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/878009 | 21:06 |
JayF | TheJulia: dtantsur: I think we're going to need another hour for vPTG to fit it all in, unless we want to cut some things | 21:32 |
TheJulia | I think it is reasonable | 21:32 |
TheJulia | to add an hour, that is | 21:32 |
JayF | TheJulia: dtantsur: I'm thinking adding a 1600-1700 UTC slot on wednesday, use it as a lightning round for all of the things we want to talk about quickly and move on (e.g. ARM CI/image publishing, kernel/ramdisk multiarch) | 21:32 |
TheJulia | sounds good to me | 21:56 |