Wednesday, 2023-07-05

[00:19] <JayF> thank you
[02:58] <iurygregory> https://zuul.opendev.org/t/openstack/build/566e49467d9344c4a890fb748bfefaef/log/controller/logs/ironic-bm-logs/node-1_console_log.txt doesn't look good, trying to find more info in the logs to see how we can fix it
[02:58] <iurygregory> ideas are welcome ofc
[03:15] <TheJulia> Jammy ipxe whee
[03:15] <TheJulia> I suspect dhcp, but it would need to be checked in the neutron dhcp logs
[06:24] <rpittau> good morning ironic! o/
[08:07] <opendevreview> Dmitry Tantsur proposed openstack/bifrost stable/yoga: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/887656
[08:07] <dtantsur> next one ^^
[08:10] <samuelkunkel[m]> Good morning ironic
[08:11] <samuelkunkel[m]> I have a question: we have nodes that cannot register themselves to ironic via the IPA because we have some other nodes in maintenance (they are getting their iLOs exchanged for a "combined iLO card", HPE stuff). During that time a Dell node cannot register itself, logging:
[08:11] <rpittau> dtantsur: approved, yoga is still in Maintained status, so it should have upgrade jobs, I guess
[08:11] <dtantsur> yep (if they don't pass, we'll think)
[08:11] <samuelkunkel[m]> The following failures happened during running pre-processing hooks:\nNode not found hook failed: Failed to resolve the hostname (ilocz22410kgd.mc.infra.eu01.int.stackit.cloud) for node 8660cb5e-6308-4d2d-9d28-6dff2f08a6e3.
[08:12] <samuelkunkel[m]> So the Inspector complains that there is an HPE node not reachable via its redfish (iLO)
[08:12] <samuelkunkel[m]> and therefore responds with a 400 to a Dell node wanting to register itself
[08:12] <dtantsur> I don't remember this code well, but it probably tries to de-duplicate nodes
[08:12] <samuelkunkel[m]> is this intended because the BMC is not reachable?
[08:12] <dtantsur> And probably tries to resolve hostnames for all Ironic nodes
[08:12] <samuelkunkel[m]> ahh
[08:17] <dtantsur> samuelkunkel[m]: we definitely need a better error message. I don't know if we can safely ignore the error though...
[08:25] <samuelkunkel[m]> for now I'm removing all the nodes in maintenance
[08:25] <samuelkunkel[m]> as I will redeploy them afterwards anyway
[08:26] <samuelkunkel[m]> after removing the nodes in question (16), everything works fine
[08:26] <samuelkunkel[m]> note to self: having nodes without redfish / BMC connectivity is not a good state
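dtantsur's guess above is that the lookup hook resolves the BMC hostnames of all Ironic nodes, so a single unresolvable iLO hostname breaks registration for unrelated nodes. A minimal sketch of a more tolerant resolution loop, assuming that guess is right: `resolve_bmc_addresses` is a hypothetical helper, not ironic-inspector code, and whether skipping unresolvable nodes is actually safe is exactly the open question dtantsur raises.

```python
import logging
import socket

LOG = logging.getLogger(__name__)


def resolve_bmc_addresses(nodes, resolver=socket.gethostbyname):
    """Map node UUID to resolved BMC address (hypothetical helper).

    ``nodes`` maps node UUID -> BMC hostname. Nodes whose hostname does not
    resolve are logged and skipped instead of failing the whole lookup.
    """
    resolved = {}
    for uuid, hostname in nodes.items():
        try:
            resolved[uuid] = resolver(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            LOG.warning("Failed to resolve the hostname (%s) for node %s, "
                        "skipping it: %s", hostname, uuid, exc)
    return resolved
```

The `resolver` parameter exists only so the lookup can be swapped out for testing. A real fix would still have to decide what happens when the skipped node is the one a duplicate check needed.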
[10:20] <opendevreview> Jacob Anders proposed openstack/ironic master: [WIP] Retry connecting vmedia through a DVD device if available.  https://review.opendev.org/c/openstack/ironic/+/887665
[10:49] <opendevreview> Jacob Anders proposed openstack/ironic master: [WIP] Retry connecting vmedia through a DVD device if available.  https://review.opendev.org/c/openstack/ironic/+/887665
[10:59] <dtantsur> My first in-band inspection without ironic-inspector succeeded \o/
[10:59] <dtantsur> (Ports only, masghar is looking into the remaining hooks)
[11:01] <janders> \o/
[11:03] <opendevreview> Dmitry Tantsur proposed openstack/bifrost master: Make inspector.ipxe respect inspector_debug  https://review.opendev.org/c/openstack/bifrost/+/887667
[11:06] <dtantsur> Error: iLO get_power_status failed, error: RIBCL is disabled
[11:06] <dtantsur> So, apparently the ilo hardware type may not work with ilo6 at all. TheJulia ^^^
[11:31] <iurygregory> good morning Ironic
[11:33] <iurygregory> oh WOW O.o
[11:33] <iurygregory> re ilo hardware type may not work with ilo6
[11:33] <samuelkunkel[m]> generic redfish works pretty well with ilo6
[11:34] <iurygregory> thank god
[11:35] <iurygregory> \o/
[11:37] <opendevreview> Verification of a change to openstack/bifrost stable/yoga failed: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/887656
[12:17] <rpittau> samuelkunkel[m]: I really hope that's true :)
[12:21] <dtantsur> samuelkunkel[m]: nice, thanks for confirming! I wonder if we need to update the docs...
[12:22] <samuelkunkel[m]> So I currently have only 5 iLO6 nodes, and TheJulia did a fix (https://bugs.launchpad.net/sushy/+bug/2016307).
[12:22] <samuelkunkel[m]> Since then I have not seen any issues.
[12:29] <dtantsur> aha, so there was a bug fixed, good to know
[12:29] <dtantsur> (someone is asking me about wallaby)
[12:29] <samuelkunkel[m]> we are using zed, so not really able to comment on that
[12:30] <dtantsur> samuelkunkel[m]: is this the fix in question? https://review.opendev.org/q/Ib78198a60a8924de934bda0c9a0b9298541496cf
[12:30] <samuelkunkel[m]> yes
[12:32] <samuelkunkel[m]> I have backported it in our environment if I recall correctly
[12:32] <samuelkunkel[m]> as it's not merged in the zed equivalent of sushy
[12:36] <dtantsur> samuelkunkel[m]: yeah, I'm looking at it, but it's causing merge conflicts
[12:37] <dtantsur> not even this patch, but the patch that is required for it
[12:37] <dtantsur> https://review.opendev.org/c/openstack/sushy/+/867675
[12:43] <opendevreview> Dmitry Tantsur proposed openstack/sushy stable/zed: Retry on ilo state error  https://review.opendev.org/c/openstack/sushy/+/882746
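The sushy backport above is titled "Retry on ilo state error" and relates to https://bugs.launchpad.net/sushy/+bug/2016307. Its internals are not discussed in the channel, so what follows is only a generic sketch of the retry-on-transient-BMC-error pattern the title suggests. `TransientBMCError` and `call_with_retries` are hypothetical names, and the real sushy code may choose differently what counts as retryable and how long to wait.

```python
import time


class TransientBMCError(Exception):
    """Hypothetical stand-in for a BMC error that should clear on its own."""


def call_with_retries(func, attempts=5, delay=1.0,
                      retry_on=(TransientBMCError,), sleep=time.sleep):
    """Call ``func`` and retry on ``retry_on`` errors with linear backoff.

    The last error is re-raised once ``attempts`` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # wait 1x, 2x, 3x ... delay between tries
```

Passing `sleep` in makes the backoff testable without real waiting; narrowing `retry_on` to a specific error keeps genuine failures from being retried.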
[13:28] <iurygregory> does anyone remember why we have IRONIC_VM_COUNT: 4 in standalone jobs, while our ironic-base is defined with 2? grenade also has 4 and most other jobs have IRONIC_VM_COUNT as 3 (some inspector jobs only have 1)
[13:28] <dtantsur> iurygregory: grenade and standalone jobs run the most tests
[13:29] <TheJulia> good morning
[13:29] <TheJulia> dtantsur: wow, I've kind of floated that we detect and warn on ilo usage
[13:30] <dtantsur> Or even outright refuse to operate if we can confirm that ilo6 is not going to work.
[13:30] <TheJulia> The other challenge is that the next generation ProLiants can be ordered with OpenBMC instead
[13:30] <dtantsur> which also won't have RIBCL, I assume?
[13:31] <TheJulia> correct
[13:31] <rpittau> TheJulia: so it's confirmed next gen proliantutils will support OpenBMC?
[13:31] <samuelkunkel[m]> you can order ProLiant G11 with either iLO or OpenBMC
[13:31] <samuelkunkel[m]> it's basically up to the buyer
[13:31] <TheJulia> rpittau: no, it is not confirmed
[13:31] <samuelkunkel[m]> the HPE rep "strongly advises" not to do this ;)
[13:32] <rpittau> :D
[13:32] <TheJulia> heh
[13:32] <samuelkunkel[m]> do we have any case of people already doing that?
[13:32] <TheJulia> The hardware for it is just not available in the general market
[13:32] <rpittau> I'm not aware
[13:32] <TheJulia> sometime in Q3 AIUI
[13:42] <TheJulia> dtantsur: was that an error/exception we could catch w/r/t the ilo6?!
[13:43] <dtantsur> possibly, I don't have many details
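TheJulia asks whether the "RIBCL is disabled" failure could be caught so Ironic can warn (or, as dtantsur suggests, refuse to operate). A minimal sketch that turns that exact error text into an operator hint pointing at the redfish hardware type, which samuelkunkel[m] reports working with ilo6. `explain_ilo_failure` is hypothetical and assumes the message text is stable; matching on message text is brittle, and a real driver would prefer a specific exception type if the underlying library exposes one.

```python
RIBCL_DISABLED = "RIBCL is disabled"  # text from the error quoted above


def explain_ilo_failure(error_message):
    """Return an operator-facing hint for a known iLO failure, else None.

    Hypothetical helper: it only pattern-matches the message text.
    """
    if RIBCL_DISABLED in error_message:
        return ("The BMC reports that RIBCL is disabled, so the ilo hardware "
                "type is unlikely to work with it (for example iLO 6). "
                "Consider the redfish hardware type instead.")
    return None
```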
[13:51] <iurygregory> TheJulia, I also thought about dhcp, I saw a few errors in the neutron dhcp logs but after each error there was a warning https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_566/885276/6/check/ironic-standalone-redfish/566e494/controller/logs/screen-q-dhcp.txt "ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid 323e34a1-6ce9-4413-84f6-6724b78fb51e not found. The process
[13:51] <iurygregory> should not have died"  "WARNING neutron.agent.linux.external_process [-] Respawning dnsmasq for uuid 323e34a1-6ce9-4413-84f6-6724b78fb51e"
[14:04] <opendevreview> Merged openstack/bifrost stable/yoga: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/887656
[14:05] <opendevreview> Dmitry Tantsur proposed openstack/bifrost stable/zed: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/887699
[14:05] <dtantsur> next one ^^^
[14:05] <dtantsur> see you tomorrow o/
[14:17] <TheJulia> oh, lovely
[14:17] <TheJulia> iurygregory: if it did die, that would totally do it
[14:17] <iurygregory> even if it restarts right after? humm
[14:18] <iurygregory> we have the chance to fail during that time...
[14:23] <opendevreview> Julia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/885087
[14:23] <TheJulia> indeed
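Each ERROR/WARNING pair iurygregory quotes marks a window in which, as discussed, a node's DHCP request could go unanswered. A minimal triage sketch for counting those events per network in a downloaded screen-q-dhcp.txt; the regular expressions are copied from the two quoted messages, and `dnsmasq_restarts` is a hypothetical helper that only counts events, it does not measure how long dnsmasq was down.

```python
import re
from collections import Counter

# Message formats copied from the neutron DHCP agent log linked above.
DIED = re.compile(r"dnsmasq for dhcp with uuid (?P<uuid>[0-9a-f-]+) not found")
RESPAWNED = re.compile(r"Respawning dnsmasq for uuid (?P<uuid>[0-9a-f-]+)")


def dnsmasq_restarts(lines):
    """Return {network uuid: (times found dead, times respawned)}."""
    died, respawned = Counter(), Counter()
    for line in lines:
        if (match := DIED.search(line)):
            died[match["uuid"]] += 1
        elif (match := RESPAWNED.search(line)):
            respawned[match["uuid"]] += 1
    return {uuid: (died[uuid], respawned[uuid])
            for uuid in died.keys() | respawned.keys()}
```

Usage: `with open("screen-q-dhcp.txt") as f: print(dnsmasq_restarts(f))`. Correlating the per-line timestamps with the node's PXE attempts would be the next step.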
[14:25] <iurygregory> I shouldn't have opened devstack/lib/ironic in this patch...
[14:26] <TheJulia> heh
[14:26] <TheJulia> I'd really like to get us mainly off of dnsmasq-based jobs for stability reasons
[14:26] <iurygregory> ++
[14:42] <opendevreview> Daniel Bengtsson proposed openstack/metalsmith master: Display a message if the undercloudrc file is not loaded  https://review.opendev.org/c/openstack/metalsmith/+/887704
[15:30] <rpittau> good night! o/
[15:41] <TheJulia> dtantsur: was that RIBCL disabled error from the ilo hardware type or the ilo5 hardware type?
[16:23] <JayF> Are there any changes for Ironic libraries we need to land before b-2 tomorrow?
[16:40] <TheJulia> I don't believe so
[16:40] <JayF> good stuff, I'm going to make sure all the release PRs are up to date then
[16:40] <TheJulia> k
[16:41] <JayF> I'm going to be gone some this afternoon; flexing out some time since I fly to the UK early Saturday :|
[16:51] <TheJulia> Makes sense, have a good flight
[16:52] <JayF> Yeah, will be glad to have travel done for the rest of the year
[16:52] <JayF> I'm checking r/n to make sure nothing about travelling internationally by air has changed in the years since I've done it lol
[16:53] <TheJulia> eek, for the UK too
[16:53] * TheJulia wonders if you need an ETA
[16:56] <JayF> everything I've seen says I need no advance anything
[16:58] <JayF> ETA is a useful keyword though, now I'm in another rabbit hole lol
[17:00] <JayF> looks like it's not required yet, but I can get one to make my life easier (which I will)
[17:02] <JayF> not yet, but soon you can get them
[17:02] <JayF> so I should check each time I've got to go, apparently
[17:06] <opendevreview> Julia Kreger proposed openstack/ironic-python-agent master: DNM: Test logging number of bytes downloaded  https://review.opendev.org/c/openstack/ironic-python-agent/+/887729
[17:07] <TheJulia> so, it looks like if the connection breaks mid-download, we don't actually detect anything (nor can we actually figure it out)
[17:07] <TheJulia> we think the download is done thanks to HTTP
[17:11] <TheJulia> ... also this is a gray area because we don't have much we can do and we don't test python-requests returning content, so the only way for me to validate that my change to just log works is to put it through CI :\
[17:23] <TheJulia> 2023-07-05T15:21:31.595Z|00018|lflow|WARN|error parsing actions "reg0[3] = put_dhcp_opts(offerip = 10.1.0.17, bootfile_name = "http://173.231.255.103:3928/boot.ipxe", bootfile_name_alt = "undionly.kpxe", [trim] ; next;": Syntax error at `bootfile_name_alt' expecting DHCPv4 option name.
[17:24] <opendevreview> Julia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/885087
[18:16] <TheJulia> iurygregory: you around?
[18:31] <opendevreview> Julia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/885087
[20:25] <stevebaker[m]> Good morning
[20:25] <TheJulia> good morning stevebaker[m]
[21:50] <opendevreview> Julia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/885087
[22:19] <iurygregory> TheJulia, now I am
[22:20] <iurygregory> I was at the doctor
[22:21] <iurygregory> the errors above are from the attempt to use OVN?
[22:29] <TheJulia> no worries
[22:29] <TheJulia> yeah, from OVN without the latest version
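TheJulia attributes the lflow warning quoted earlier to an OVN that is not the latest version: it does not recognize `bootfile_name_alt` as a DHCPv4 option name in `put_dhcp_opts`. A minimal sketch for pulling the rejected option name out of such ovn-controller log lines; the pattern is copied from that single quoted line and may not cover other forms of lflow parse errors, and `rejected_dhcp_option` is a hypothetical helper.

```python
import re

LFLOW_SYNTAX_ERROR = re.compile(
    r"Syntax error at `(?P<token>[^']+)' expecting DHCPv4 option name")


def rejected_dhcp_option(log_line):
    """Return the DHCPv4 option name OVN failed to parse, or None."""
    match = LFLOW_SYNTAX_ERROR.search(log_line)
    return match["token"] if match else None
```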
[22:40] <opendevreview> Julia Kreger proposed openstack/ironic-python-agent master: Log the number of bytes downloaded  https://review.opendev.org/c/openstack/ironic-python-agent/+/887729
[22:51] <TheJulia> so ^^^ is the result of a customer case where I can't tell if a download is getting interrupted mid-stream, but it sure looks like it
[22:51] <TheJulia> checksums don't match and that is the simplest answer I can think of
[22:51] <iurygregory> I think I saw this discussion on Slack =)
[22:52] <TheJulia> iurygregory: I found where they tried iscsi, and the transfer I/O errored about halfway
[22:53] <iurygregory> ouch >.<
[22:53] <iurygregory> will add the patch to my review list
[22:53] <TheJulia> thanks
[22:54] <TheJulia> we really have no way to go "we know what the size is" up front either
[22:54] <TheJulia> and the body/content size header is optional
[22:54] <TheJulia> so.....
[22:54] <iurygregory> "it's fine"
[22:54] <TheJulia> logging seems like the minimal and maximal
[22:54] <TheJulia> pretty much
[22:54] <iurygregory> <insert the gif here>
[22:54] <iurygregory> yup, log would totally help
[22:54] * TheJulia adds some more fire
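TheJulia's IPA change (https://review.opendev.org/c/openstack/ironic-python-agent/+/887729) logs the number of bytes downloaded, because an HTTP stream that breaks midway can look like a finished download and the size header is optional. A minimal sketch of that idea, assuming the image arrives as an iterable of byte chunks; `copy_and_count` and `TruncatedDownload` are hypothetical names, not IPA's code. It logs the count, compares it with `Content-Length` only when the server sent one, and returns a SHA-256 so the count can be read alongside the checksum mismatch.

```python
import hashlib
import logging

LOG = logging.getLogger(__name__)


class TruncatedDownload(Exception):
    """Hypothetical error: the byte count differs from Content-Length."""


def copy_and_count(chunks, write, content_length=None):
    """Write each chunk via ``write`` and return (byte count, sha256 hex).

    ``content_length`` is the raw Content-Length header value or None,
    since the header is optional. Without it, a short read cannot be
    distinguished from a complete one; only the logged count remains.
    """
    total = 0
    digest = hashlib.sha256()
    for chunk in chunks:
        write(chunk)
        digest.update(chunk)
        total += len(chunk)
    LOG.info("Downloaded %d bytes", total)
    if content_length is not None and total != int(content_length):
        raise TruncatedDownload(
            "Expected %s bytes but received %d" % (content_length, total))
    return total, digest.hexdigest()
```

With python-requests one might call `copy_and_count(resp.iter_content(chunk_size=1024 * 1024), f.write, resp.headers.get("Content-Length"))` on a response fetched with `stream=True`. One caveat: `iter_content` transparently decodes a gzip or deflate `Content-Encoding`, while `Content-Length` counts the encoded bytes, so the comparison is only meaningful for unencoded responses.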
[23:09] <JayF> I want the Ironic InstallShield(tm) progress bar
[23:10] <JayF> where it goes 1%, up through 10-15% in small steps, sorta stays there, jumps to 45% where it stays for three hours, then jumps to 99% and stays there for another hour, then it's done
[23:10] * JayF wonders if anyone is old enough to remember the days of a software install taking hours
[23:13] <TheJulia> “Please insert diskette 23”
[23:15] <JayF> I got an old PowerBook a couple years back; the only real way to interact with it is via floppies, and I have the hardware to image them
[23:15] <opendevreview> Merged openstack/bifrost stable/zed: CI: Update cached cirros image to 0.5.3  https://review.opendev.org/c/openstack/bifrost/+/887699
[23:15] <JayF> so I did a few of those, getting a newer Mac OS system and old apps onto it
[23:16] <JayF> fun story: only about 9 out of 10 new-old-stock floppies still work, and those are mostly gone. Nowadays you just buy giant bulk quantities of floppies and get very, very low yields. yay.
[23:16] <TheJulia> My parents had an AutoCAD license…. We had boxes of diskettes
[23:17] <JayF> we had a family friend who believed in copying that floppy LOL
[23:17] <JayF> I think I have a sealed OEM floppy copy of Windows NT 3.51 somewhere in my retro stuff
[23:19] <TheJulia> we could copy the floppies, but when you had hardware keys...
[23:19] <JayF> dead serious, I remember my dad taking NASCAR Racing (by Papyrus, the best NASCAR game made, which ended up turning into iRacing, yes, seriously)
[23:19] <JayF> copying the floppies, but the copy protection was a brochure about NASCAR tracks
[23:20] <JayF> with a dark-red background and black text so you couldn't use a B&W copier
[23:20] <JayF> my dad went down to the print shop and paid like $3 a copy to make copies of that guide for all his work buddies lol
[23:20] <JayF> I can still identify the outline of some tracks from getting through the copy protection when loading that game
