opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic-specs master: Firmware Interface https://review.opendev.org/c/openstack/ironic-specs/+/878505 | 01:50 |
---|---|---|
rpittau | good morning ironic! o/ | 07:22 |
opendevreview | Stephen Finucane proposed openstack/ironic master: db: Resolve SAWarning warnings https://review.opendev.org/c/openstack/ironic/+/856349 | 09:44 |
opendevreview | Stephen Finucane proposed openstack/ironic master: tests: Replace invalid UUIDs https://review.opendev.org/c/openstack/ironic/+/856347 | 09:44 |
iurygregory | good morning Ironic | 11:32 |
iurygregory | if possible a new round of reviews in https://review.opendev.org/c/openstack/ironic-specs/+/878505 would be good =) | 11:56 |
opendevreview | Merged openstack/ironic-python-agent-builder master: Move ubuntu jobs to jammy https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/879538 | 12:32 |
TheJulia | good morning | 13:13 |
iurygregory | good morning TheJulia | 13:18 |
TheJulia | ugh, continued v6 issues this morning | 13:56 |
iurygregory | ouch >.< | 14:01 |
iurygregory | sorry to hear that | 14:01 |
iurygregory | v4 issues no? v6 is working | 14:01 |
TheJulia | well, they are taking different paths | 14:02 |
TheJulia | but v6 seems to just be getting crushed with packet loss | 14:02 |
iurygregory | oh god .-. | 14:02 |
dtantsur | morning TheJulia! I recall you wanted to discuss something? | 14:05 |
TheJulia | Yeah, would at the bottom of the hour work for you? | 14:06 |
dtantsur | TheJulia: should do. I'm listening to an all-hands now. | 14:07 |
jrosser | is `redfish_verify_ca: false` the right way to disable certificate verification for redfish driver? | 14:08 |
TheJulia | dtantsur: afterwards? | 14:08 |
samuelkunkel[m] | jrosser: yes | 14:08 |
dtantsur | yep, should have a bit of time | 14:08 |
TheJulia | dtantsur: just let me know after it is over, and a 10m heads up would be awesome so I can make coffee :) | 14:09 |
dtantsur | ++ | 14:09 |
jrosser | hmm i still get an SSL verification error https://paste.opendev.org/show/btDy9vFZFJTnm7mx6dae/ | 14:10 |
samuelkunkel[m] | which version of sushy do you use? | 14:10 |
TheJulia | ... jrosser... have you tried to restart the conductor service? | 14:11 |
jrosser | oh - no i just updated the node | 14:11 |
TheJulia | give that a try | 14:11 |
TheJulia | if it suddenly works, I can explain why :) | 14:11 |
TheJulia | and then I'll kindly ask for a bug so we can fix it :) | 14:11 |
samuelkunkel[m] | as this is a container, do you have REQUESTS_CA_BUNDLE set? | 14:12 |
samuelkunkel[m] | thinking about this https://review.opendev.org/c/openstack/sushy/+/870888 | 14:12 |
dtantsur | TheJulia: suspecting session caching? | 14:12 |
samuelkunkel[m] | and if your sushy is "too old" you can also run into this where the flag is just invalidated | 14:12 |
jrosser | it still looks unhappy | 14:12 |
jrosser | yes there is a REQUESTS_CA_BUNDLE set in order to talk to in keystone etc | 14:13 |
jrosser | *in order to | 14:13 |
jrosser | this is Zed | 14:13 |
TheJulia | dtantsur: that was what I was thinking, but verify_ca is covered in the cache | 14:14 |
TheJulia | ohn oes | 14:14 |
TheJulia | noes | 14:14 |
TheJulia | https://github.com/openstack/ironic/blame/master/ironic/drivers/modules/redfish/utils.py#L216 | 14:14 |
* TheJulia suspects dmitry will spot it | 14:14 | |
samuelkunkel[m] | which sushy version? as this was merged not long ago | 14:14 |
samuelkunkel[m] | but hopefully I am wrong :) | 14:15 |
jrosser | sushy==4.3.2 | 14:15 |
TheJulia | err, hmm, maybe not | 14:16 |
dtantsur | I cannot spot anything wrong, but I'm also half occupied by a meeting | 14:16 |
samuelkunkel[m] | I run sushy==4.4.1 in my zed conductor pods | 14:17 |
samuelkunkel[m] | can you give it a try? | 14:17 |
jrosser | i can manually install that i think - though i expect upper-constraints for zed has things to say about that | 14:17 |
TheJulia | hmmmm | 14:18 |
TheJulia | dtantsur: yeah, I thought we had the wrong field name but I traced it further up the code and it looks fine | 14:18 |
TheJulia | jrosser: do you have debug logging turned up? | 14:19 |
jrosser | i do | 14:20 |
samuelkunkel[m] | ah, we install sushy 4.4.1 and we also backport the patch | 14:21 |
samuelkunkel[m] | from the linked PR | 14:21 |
samuelkunkel[m] | had just a look into the container image | 14:21 |
samuelkunkel[m] | https://gitlab.com/yaook/images/infra-ironic/-/blob/devel/Dockerfile | 14:22 |
jrosser | this is what i get in the debug log https://paste.opendev.org/show/bGIVBSCCpNYKCBHyA0Vj/ | 14:23 |
samuelkunkel[m] | That is the bug mentioned in the PR | 14:24 |
jrosser | and 870888 is part of sushy 4.4.1? | 14:24 |
jrosser | oh or not - if you have to backport the patch | 14:25 |
samuelkunkel[m] | y | 14:25 |
samuelkunkel[m] | we use 4.4.1 and backport into 4.4.1 | 14:25 |
samuelkunkel[m] | not sure in what it is included | 14:25 |
samuelkunkel[m] | that's why, back when I built the zed image, I just took 4.4.1 and backported into that ^^ | 14:25 |
jrosser | ahha maybe 4.4.2 is including it | 14:26 |
samuelkunkel[m] | I did not check, but possible :) | 14:26 |
rpittau | samuelkunkel[m], jrosser, 870888 is indeed in 4.4.2 | 14:28 |
samuelkunkel[m] | then give it a try :) | 14:28 |
jrosser | ah i see it's now not failing on SSL errors with 4.4.2 | 14:28 |
samuelkunkel[m] | (note to me, adjust container build) | 14:28 |
jrosser | is 870888 backportable to stable/zed? | 14:29 |
opendevreview | Riccardo Pittau proposed openstack/sushy stable/zed: workaround: requests verify handling if env is set https://review.opendev.org/c/openstack/sushy/+/880832 | 14:31 |
rpittau | jrosser: cherry-picked :) | 14:31 |
jrosser | thankyou - i can confirm it's fixed the ssl errors on Zed here | 14:32 |
rpittau | awesome | 14:32 |
samuelkunkel[m] | nice :) | 14:32 |
jrosser | well, i mean 4.4.2 has :) | 14:32 |
rpittau | yeah, I think we can release 4.3.4 after that merges, we already have some fixes there | 14:32 |
rpittau | 4.3.4 being zed of course | 14:33 |
jrosser | thats great - i have to jump some hoops to install outside of u-c | 14:33 |
samuelkunkel[m] | really? :D | 14:41 |
TheJulia | https://meet.google.com/dip-wrpc-jwe if anyone wants to discuss cross conductor rpc/survivability of actions | 14:41 |
samuelkunkel[m] | jrosser: are you happy with the supermicro arm server? | 14:46 |
jrosser | kind of +/- i think | 14:47 |
jrosser | we have them working for openstack compute nodes and controllers just fine, but in that situation the bmc DHCPs its address and everything just works (tm) | 14:48 |
samuelkunkel[m] | we try the HPE RL300. They dont provide inband IPMI access at all | 14:48 |
samuelkunkel[m] | so the ipa does not find any useable bmc addresses | 14:48 |
jrosser | where we want to set static BMC addresses for ironic nodes the setup screen seems barely tested | 14:48 |
samuelkunkel[m] | so we had to build a workaround by looking up their DDNS names | 14:49 |
samuelkunkel[m] | oh | 14:49 |
jrosser | and there is no way to disable the BMC sharing one/other of the onboard ports which seems also to have really unusual behaviour compared to x86 supermicro where you can just turn it off | 14:49 |
jrosser | also some uefi boot order bugs | 14:50 |
jrosser | having said all that, ampere/supermicro are being pretty responsive and have fixed stuff and given us new firmware | 14:50 |
jrosser | and this was also pre-release hardware and beta-ish motherboards, so i can't complain too much | 14:51 |
jrosser | the reason i'm trying redfish is that i've just had two more supermicro/ampere nodes delivered and those don't respect the ipmi "next boot should be PXE" at all | 14:53 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent-builder master: Add a non-voting ubuntu arm64 bnuild check job https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/880854 | 14:56 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent-builder master: Add a non-voting ubuntu arm64 build check job https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/880854 | 14:57 |
TheJulia | jrosser: so, we try the old documented raw values, can you reproduce that directly with just ipmi? | 15:34 |
TheJulia | and not the previously documented by supermicro values | 15:34 |
jrosser | i'm not sure what to do actually | 15:35 |
jrosser | if i stab F12 at the console when it reboots it correctly PXE boots into the ironic agent and does the cleaning | 15:35 |
jrosser | if i dont do that, i get whatever was last on the ssd | 15:36 |
jrosser | and i'm seeing similar behaviour now that i've tried the redfish driver too, like the setting of boot order is just ignored | 15:36 |
JayF | I assume you've taken the grand tour of the firmware settings, and looked for something like "BMC boot control [disabled]" or similar? | 15:37 |
jrosser | i will take another look | 15:37 |
JayF | you indicated this is prereleaseish hardware, yeah? | 15:38 |
TheJulia | I remember you mentioned the redfish case. And we explicitly set the default value with efibootmgr as well | 15:38 |
jrosser | this one allegedly is a production version | 15:38 |
TheJulia | Ugh | 15:39 |
TheJulia | So, even to just deploy we can’t signal it with either… | 15:39 |
TheJulia | To boot to network? | 15:39 |
jrosser | my hunch is that happens by accident | 15:40 |
JayF | if the ssd happens to be blank, it falls thru to net booting | 15:40 |
JayF | so we can boot exactly one agent on it | 15:40 |
JayF | yeah? | 15:40 |
jrosser | right, and i've succeeded once in deploying an OS | 15:40 |
jrosser | and now get that all the time | 15:41 |
opendevreview | Mark Goddard proposed openstack/tenks master: Add retries for get_url and package tasks https://review.opendev.org/c/openstack/tenks/+/880759 | 15:43 |
opendevreview | Mark Goddard proposed openstack/tenks master: Fix CI failures https://review.opendev.org/c/openstack/tenks/+/880866 | 15:43 |
TheJulia | jrosser: with redfish, do you see if the field settings change for the 'Boot' field under the system? | 15:43 |
* TheJulia wonders if we're just getting some sort of static data blob back | 15:44 | |
jrosser | like device: pxe ? | 15:45 |
TheJulia | yeah | 15:45 |
jrosser | how do i show that with the cli - i see it in horizon | 15:47 |
TheJulia | this would be in debug logs from sushy showing responses | 15:48 |
* jrosser reboots the BMC | 15:57 | |
opendevreview | Merged openstack/ironic stable/zed: [iRMC] Handle IPMI incompatibility in iRMC S6 2.x https://review.opendev.org/c/openstack/ironic/+/870881 | 15:57 |
rpittau | good night! o/ | 16:15 |
jrosser | ok so i have at least two compounded issues | 17:23 |
jrosser | the bmc was in a weird state, that is cleared up for now by resetting it <- i've seen this before | 17:24 |
jrosser | and my ipa image built for zed just sits there and seems to do nothing, though sadly i've not included dynamic-login so cant jump in and see whats going on | 17:25 |
jrosser | the centos8/yoga ipa i have is working though for clean & deploy | 17:26 |
TheJulia | jrosser: is there any ipa output to the console? | 17:36 |
samuelkunkel[m] | So the issue with the bmc I only had on pre Victoria Release using ipmi with x86 supermicro - with the current zed + redfish setting boot order works fine. | 17:57 |
samuelkunkel[m] | I would have assumed that they use the same ASpeed bmc on these platforms? | 17:57 |
samuelkunkel[m] | So why would the bmc behave so differently? | 17:57 |
samuelkunkel[m] | That sounds really not good | 17:58 |
jrosser | samuelkunkel[m]: it’s open bmc on these | 18:16 |
jrosser | TheJulia: there would be console output but that’s an open question we have with ampere/sm that it’s only going to the serial port, not the screen | 18:17 |
jrosser | that’s quite possibly a more general arm platform thing, but would be interested to know if the agent output is shown on the screen / kvm in your systems samuelkunkel[m] | 18:18 |
TheJulia | jrosser: I guess serial port might be a thing to look at if you've got connectivity to it | 18:20 |
JayF | jrosser: I wonder if you need something like console=/dev/ttySx on the kernel command line | 18:20 |
TheJulia | and what JayF said | 18:20 |
JayF | jrosser: sometimes those machines can have >1 serial console, where one is physical and one is Serial-over-LAN | 18:20 |
jrosser | the physical one is usb /o\ | 18:21 |
jrosser | so all my lantronix console server ports are no good | 18:21 |
TheJulia | only way to know is to try and iterate I guess | 18:21 |
TheJulia | oh, wait | 18:21 |
TheJulia | so the bmc does not sniff the serial port? | 18:21 |
JayF | TheJulia: on the opencompute machines we had at rax it was the same way; two different OS-facing ports, one was physical serial one was IPMI-serial | 18:22 |
jrosser | yes, there’s a graphical SOL interface as well as the vga kvm | 18:22 |
JayF | oh yeah, and IPMI-serial was not the same as the graphical SoL interface (which wasn't installed in our machines but could still be output to, sadly) | 18:22 |
TheJulia | yeah, I guess it would just take iterating | 18:23 |
TheJulia | :( | 18:23 |
jrosser | yep, I’m in the lab in person tomorrow so this will all be much easier to poke | 18:23 |
JayF | that's how I solved the problem then. Lots of "send the vendor an email; solve the problem with guess-and-check while waiting for a response" | 18:23 |
JayF | although that vendor is the one that gave us the IPMI raw bits which (maybe?) are still in the IPMI driver | 18:23 |
samuelkunkel[m] | jrosser: yes, partly. So for HPE you have the ILO (which is basically HPEs proprietary bmc). This one comes with a html5 console (and java). And this console works. (Until you boot the final Ubuntu image) | 18:24 |
samuelkunkel[m] | So I can see stuff like Post, BIOS, pxe, ipxe | 18:24 |
jrosser | you see the output from ipa there? | 18:25 |
samuelkunkel[m] | Yes | 18:25 |
samuelkunkel[m] | We use stream 9 ipa (as debian 12 efibootmgr failed me) | 18:25 |
samuelkunkel[m] | For debugging I have an IPA with a devuser / password | 18:25 |
samuelkunkel[m] | And I can log in via the console | 18:25 |
jrosser | yes - I forgot to include that in my stream 9 ipa | 18:26 |
jrosser | will fix that tomorrow | 18:26 |
samuelkunkel[m] | (Mostly I only grab the dhcp ip from there and then ssh into it) | 18:26 |
JayF | you can get the dhcp IP from Ironic, fwiw samuelkunkel[m] | 18:26 |
samuelkunkel[m] | Only if it is registered already | 18:27 |
JayF | samuelkunkel[m]: look at the driver_internal_info[agent_url] iirc (I may have slightly misremembered the location; but it's in there) | 18:27 |
JayF | samuelkunkel[m]: oh yeah, good point, only if the first lookup worked | 18:27 |
samuelkunkel[m] | Yep :) | 18:27 |
jrosser | tbh one of the reasons I was sticking with the ipmi driver was ipmitool-socat console access | 18:27 |
samuelkunkel[m] | But you're right, I also do it if it fails on cleaning and such ;) | 18:28 |
jrosser | it seems odd to me that I can’t mix that with redfish/drac/etc for everything else | 18:28 |
samuelkunkel[m] | That is something I never tried | 18:29 |
samuelkunkel[m] | But is the console also that weird on other supermicro server? | 18:30 |
samuelkunkel[m] | For the few we have for testing I recall using the html console via the BMC | 18:30 |
jrosser | it’s not that it’s weird, it’s just not particularly accessible network wise | 18:31 |
jrosser | so it’s an ssh -D / reconfigure browser proxy settings away | 18:31 |
jrosser | but that’s a local issue here with how the networks are partitioned | 18:32 |
samuelkunkel[m] | There is an advisory for hpe about the black screen on the console. So this explains for my nodes why it's black once it's booted into the OS. Something like GRUB_CMDLINE_LINUX_DEFAULT="initcall_blacklist=sysfb_init" is required | 18:32 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent master: WIP: Disable md5 https://review.opendev.org/c/openstack/ironic-python-agent/+/865190 | 18:32 |
TheJulia | ^^^^ == PAAAAAIIINNNNN | 18:32 |
samuelkunkel[m] | Then the console should work, according to hpe | 18:32 |
jrosser | I think we have seen similar, intermittently | 18:32 |
samuelkunkel[m] | But it seems like all of them struggle with their arm platforms | 18:32 |
samuelkunkel[m] | Would love to touch Gigabyte's 2 socket 2U4N platform | 18:33 |
samuelkunkel[m] | 1024 Cores on 2U | 18:33 |
samuelkunkel[m] | Does anyone know if there is a dib in diskimage-builder to set GRUB_CMDLINE Stuff? | 18:34 |
TheJulia | dtantsur: btw, I ended up making the checksum stuff smarter to figure out based upon length, because anything else is maddening | 18:39 |
TheJulia | also hides the glance field name detail nicely now | 18:40 |
TheJulia | unless you want to do something like sha3-512 | 18:40 |
TheJulia | I'm semi-stepping away, a mechanic will be arriving soon to take a look at the diesel engine which refuses to turn over | 19:32 |
TheJulia | for clarity, that is for the day, and I'm off the next two days | 19:34 |
-opendevstatus- NOTICE: The Etherpad service on etherpad.opendev.org will be offline for the next 90 minutes for a server replacement and operating system upgrade | 21:58 | |
JayF | For virtualpdu; needed to fix the github sync (we have to land anything at all) https://docs.openstack.org/virtualpdu/latest/ | 23:02 |
JayF | bad link | 23:02 |
JayF | https://review.opendev.org/c/openstack/virtualpdu/+/880894 Fix gitreview; Minor update to readme [NEW] | 23:02 |
JayF | good link | 23:02 |
JayF | I also just pushed https://review.opendev.org/c/openstack/project-config/+/880895 to make virtualpdu changes alert in here | 23:04 |
JayF | o/ | 23:04 |
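
The 18:39 remark about working out the checksum type from its length can be illustrated with a minimal sketch. This is not the code from the ironic-python-agent change; the helper names `guess_algorithm` and `verify` and the length-to-algorithm table are assumptions made for illustration only. It also shows why something like sha3-512 is the awkward case: its hex digest is the same length as sha512, so length alone cannot tell the two apart.

```python
import hashlib

# Hex digest lengths for a few common algorithms. This table is an
# illustrative assumption, not the mapping used by ironic-python-agent.
_ALGORITHM_BY_HEX_LENGTH = {
    32: "md5",
    40: "sha1",
    64: "sha256",
    128: "sha512",
}


def guess_algorithm(checksum: str) -> str:
    """Guess the hash algorithm from the length of a hex digest.

    Hypothetical helper: raises ValueError if the value is not hex or
    its length does not match a known algorithm.
    """
    value = checksum.strip().lower()
    if not value or any(c not in "0123456789abcdef" for c in value):
        raise ValueError("checksum is not a hex string")
    try:
        return _ALGORITHM_BY_HEX_LENGTH[len(value)]
    except KeyError:
        raise ValueError(
            f"cannot infer algorithm from length {len(value)}"
        ) from None


def verify(data: bytes, checksum: str) -> bool:
    """Check data against a checksum whose algorithm is inferred by length."""
    algorithm = guess_algorithm(checksum)
    return hashlib.new(algorithm, data).hexdigest() == checksum.strip().lower()
```

A sha3-512 digest would be treated as sha512 here and fail verification, which is the limitation mentioned at 18:40; a real implementation would need an explicit way to name the algorithm for such cases.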
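
The 18:27 suggestion to read the agent address from `driver_internal_info[agent_url]` can be sketched as below. The node is represented as a plain dict, which is an assumption for illustration (in practice it would come from the Ironic API or CLI output), and `agent_host` is a hypothetical helper name. As noted in the discussion, the value only exists once the agent's first lookup has succeeded, so the sketch returns `None` otherwise.

```python
from urllib.parse import urlsplit


def agent_host(node: dict):
    """Return the host part of the node's agent URL, or None if absent.

    `node` is assumed to be a dict shaped like an Ironic node record,
    with an optional driver_internal_info["agent_url"] entry.
    """
    info = node.get("driver_internal_info") or {}
    agent_url = info.get("agent_url")
    if not agent_url:
        # No successful agent lookup has been recorded yet.
        return None
    return urlsplit(agent_url).hostname
```

For example, a node whose `agent_url` is `http://192.0.2.10:9999` (a documentation address, not taken from the log) would yield `192.0.2.10`, which can then be used for ssh.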