Tuesday, 2025-08-19

TheJuliahappy day!00:00
cardoejanders: come over to my little special corner. The CI just gives us cat poop in little baggies.00:00
TheJuliacardoe: I thought our CI system was like a Litter-Robot... so BIG baggies?!00:01
opendevreviewMerged openstack/ironic master: Remove direct mapping from API -> DB  https://review.opendev.org/c/openstack/ironic/+/95651200:03
cardoeThey can be quite large.00:05
TheJuliacardoe: I haven't had any time to cycle over to the issue you're looking at00:07
TheJuliaunfortunately00:07
TheJuliaMaybe tomorrow, I've got to rebuild my test VM and make sure something is happy00:07
cardoeIt's all good. I haven't either.00:09
iurygregoryTheJulia, shouldn't we remove wip from the commit title https://review.opendev.org/c/openstack/ironic/+/956972 ?00:10
TheJuliaiurygregory: I need to sit down to self review it at this point and likely make doc changes, I can do that unless you or jacob want to sanity review it00:10
TheJuliaJacob has already commented in irc though, which is a positive sign00:11
iurygregoryyeah I was about to mention that00:11
TheJuliaI'm going to go begin to prepare dinner here00:12
iurygregoryenjoy :D00:12
iurygregoryI just had lasagna here 00:12
opendevreviewVerification of a change to openstack/ironic master failed: Fix service failed state transitions for wait/hold  https://review.opendev.org/c/openstack/ironic/+/95729000:15
opendevreviewMerged openstack/ironic master: Optional indirection API use  https://review.opendev.org/c/openstack/ironic/+/95650400:25
jandersTheJulia ++ for removing WIP00:25
jandersshall I start working on the doco change related to aborting servicing? Happy to00:25
jandershalf of gardening done, just dropped in to check messages, off for another 20 mins and then back properly00:26
opendevreviewMerged openstack/ironic master: Revert "ci: temporary metal3 integration job disable"  https://review.opendev.org/c/openstack/ironic/+/95695300:35
opendevreviewMerged openstack/ironic master: Clean-up misc eventlet references  https://review.opendev.org/c/openstack/ironic/+/95563200:35
opendevreviewJacob Anders proposed openstack/ironic master: Fix servicing abort to respect abortable flag  https://review.opendev.org/c/openstack/ironic/+/95718902:12
opendevreviewJacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort.  https://review.opendev.org/c/openstack/ironic/+/95782502:34
jandersTheJulia when you're online in the morning, please have a look, ^^ is my initial attempt at service-abort doco change. Question: do we update the state machine svg by hand or is it auto-generated?02:35
TheJuliait's a command to do it, I can do it tomorrow03:03
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Port object  https://review.opendev.org/c/openstack/ironic/+/95544703:03
TheJuliait's in tox.ini, fwiw03:03
jandersthank you! :)03:04
opendevreviewMerged openstack/ironic master: Fix service failed state transitions for wait/hold  https://review.opendev.org/c/openstack/ironic/+/95729003:10
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/95562503:22
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-ui master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95782903:33
opendevreviewJacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort.  https://review.opendev.org/c/openstack/ironic/+/95782503:34
jandersfurther to above I have a general doco contributions question: do we write Ironic docs manually or is Ironic documentation auto-generated?05:07
rpittaugood morning ironic! o/06:47
jandershey rpittau o/06:47
rpittaujanders: re ironic docs: it is usually manually written :)06:48
jandersthank you rpittau06:48
jandersprobably a good job for Claude, however I did jump the gun with the above change (all manual). Was all done before I thought about it :)06:48
rpittauI'm sure Claude will be happy to help :D07:20
opendevreviewJacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort.  https://review.opendev.org/c/openstack/ironic/+/95782507:42
opendevreviewJacob Anders proposed openstack/ironic master: WIP: update documentation to include servicing abort.  https://review.opendev.org/c/openstack/ironic/+/95782507:44
janders^^ rebased on service-abort patch AND regenerated the state machine diagram07:45
janderslooks better now07:45
rpittauFYI final release for sushy for Flamingo has been requested https://review.opendev.org/c/openstack/releases/+/95774207:59
opendevreviewRiccardo Pittau proposed openstack/bifrost master: Deprecate support for Debian 11 Bullseye  https://review.opendev.org/c/openstack/bifrost/+/95784708:38
rpittauforgot debian 11 has python 3.9 by default! ^08:39
saHi all,   We are seeing an issue with InsertMedia via Redfish on HPE Compute Scale-up Server 3200: The path .../VirtualMedia/0/Actions/VirtualMedia.InsertMedia exists, but calls fail with sushy.exceptions.ResourceNotFoundError. We are using the current sushy and ironic versions with the latest merged patches.   Could you advise: Are there any known conditions where InsertMedia fails even when the resource path exists? Are th09:51
saAre there limitations on image URL types or lengths for VirtualMedia on this platform? Any suggestions to reliably insert a UEFI boot ISO in this setup?   Thank you for your guidance.   Best regards, Pooja Sangle09:51
Sandzwerg[m]Morning Ironic. Regarding the request above: We now found out that these machines don't support HTTP(S) for virtual media, only CIFS or NFS. https://docs.openstack.org/ironic/latest/admin/drivers/redfish.html#redfish-virtual-media mentions "The idea behind virtual media boot is that BMC gets hold of the boot image one way or the other (e.g. by HTTP GET, other methods are defined in the standard), then “inserts” it into node’s10:59
Sandzwerg[m]virtual drive as if it was burnt on a physical CD/DVD." but does ironic support anything else apart from HTTP(S)?10:59
fricklerSandzwerg[m]: I have a similar issue and already opened an RFE bug https://bugs.launchpad.net/ironic/+bug/2119212, planning to submit some code for it real soon(tm)11:35
Sandzwerg[m]Sounds promising. Thanks. I'll follow that bug :)12:02
TheJuliagood morning13:13
darkhackernc0/13:13
cardoeJayF: I do wanna talk about the quirks thing at some point13:23
cardoeI'm just neck deep in cinder right now.13:23
rpittaucardoe: I hope not literally :D13:27
cardoeheh. it might be more fun13:28
cardoeJust diving into a new to me code base is always fraught with battles.13:28
opendevreviewMorten Stephansen proposed openstack/ironic-python-agent stable/2025.1: Fix for motherboards where efibootmgr returns UTF-8.  https://review.opendev.org/c/openstack/ironic-python-agent/+/95790913:30
JayFcardoe: let's have the conversation async in the etherpad so it can be a jumping off point for the PTG13:41
rpittauJayF: you have the etherpad for the PTG already created ?13:44
cardoeJayF: good call.13:47
rpittaummm PRC connection not working well with Python 3.10? jammy  does not like it13:56
rpittauhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2d0/openstack/2d0742476cfc4b12aa93e91ab0e67ea3/logs/ironic.log13:57
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/95571314:12
JayFrpittau: yep it's linked on the whiteboard14:21
rpittauJayF: thanks14:21
opendevreviewcid proposed openstack/ironic master: Image cached on deployment failure  https://review.opendev.org/c/openstack/ironic/+/95761314:45
opendevreviewcid proposed openstack/ironic master: Clear image cache on deployment failure  https://review.opendev.org/c/openstack/ironic/+/95761314:53
*** darmach47 is now known as darmach414:58
TheJuliaI've already had someone reach out for eventlet help, success?!?15:00
JayFoh yeah that's right, it's party day15:37
JayF\o/ /o\ \o/ /o\15:37
TheJuliaIndeed!15:38
opendevreviewStanislav Dmitriev proposed openstack/ironic-python-agent stable/2024.2: Fix for motherboards where efibootmgr returns UTF-8.  https://review.opendev.org/c/openstack/ironic-python-agent/+/95794715:56
opendevreviewStanislav Dmitriev proposed openstack/ironic-python-agent stable/2024.1: Fix for motherboards where efibootmgr returns UTF-8.  https://review.opendev.org/c/openstack/ironic-python-agent/+/95794815:57
opendevreviewMerged openstack/ironic-ui master: Fix small mistake in text  https://review.opendev.org/c/openstack/ironic-ui/+/95661416:16
clifIt seems like something may have broken with downloading the CentOS GenericCloud-9-latest image? It is listed as last updated today and at least one voting test is failing to download it: https://zuul.opendev.org/t/openstack/build/7e229768b74e4e8c86805e984e429847/log/job-output.txt?severity=0#2475816:47
JayFlooking16:47
clifmaybe not downloading but something with repacking the base image16:48
JayF2025-08-19 03:27:45.681573 | controller | 2025-08-19 03:27:45.680 | qemu-img: error while reading at byte 6186532864: Input/output error16:48
JayFthat is ... weird16:48
JayFimplies a bad download or full disk or something along those lines I'd suspect16:48
clifso download succeeds, then it barfs trying to do `qemu-img convert`16:48
JayFso a few things I usually check in this case:16:50
JayFother changes, is that same job failing or passing16:50
JayFif it's passing; is it a machine in the same cloud16:50
JayF(you can usually tell by mirror urls but I think it's in metadata somewhere)16:51
JayFif the answer is "it's passing on a machine in the same cloud" I'd recheck16:51
JayFif the answer is "it's passing on a machine in a different cloud" I'd recheck and note the additional datapoint16:51
JayFalso checking the system output to see if any of the other things (e.g. full disk) happened16:51
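The triage checklist JayF outlines above can be sketched as a small decision helper. This is only an illustration: the function name, its boolean parameters, and the returned strings are hypothetical, and the order in which the checks are applied is an assumption, not something stated in the channel.

```python
def triage_job_failure(passing_elsewhere, same_cloud, node_looks_healthy):
    """Suggest a next step for a failing CI job (hypothetical helper).

    passing_elsewhere:  the same job passes on other changes
    same_cloud:         that passing run used a node in the same cloud
    node_looks_healthy: system output shows no node problem (e.g. full disk)
    """
    if not node_looks_healthy:
        # Assumption: a node-level problem is looked at before anything else.
        return "investigate the node (e.g. full disk)"
    if passing_elsewhere and same_cloud:
        return "recheck"
    if passing_elsewhere:
        return "recheck and note the additional datapoint"
    # The job fails on other changes too, so it is probably not a fluke.
    return "spot check a few failures and document them in a bug"


print(triage_job_failure(True, False, True))
```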
JayFhttps://6d6df89134025ef5b0e9-648722ac87374da2f576895eac8df5a8.ssl.cf5.rackcdn.com/openstack/7e229768b74e4e8c86805e984e429847/controller/logs/worlddump-latest.txt yeah the systems info looks good16:53
JayFthe thing that job specifically tests is weird though: you have to have special disk images for things to work on 4k16:53
JayFso there's also a possibility something changed about converting the images to 4k breaking that job16:53
JayFbut asking if other jobs are passing will help answer that16:54
clifwhat is 4k in this context?16:54
JayFhttps://zuul.opendev.org/t/openstack/builds16:54
JayFdisk block size16:54
JayFas opposed to 512, the standard16:54
JayF(well, "standard" meaning how it's been done a long long time, I'm sure it's all standard to someone)16:54
clifhere is another job doing the same thing: https://zuul.opendev.org/t/openstack/build/a827f19c81bb49baac21aa0777854d3a/log/job-output.txt#2481816:55
JayFhttps://zuul.opendev.org/t/openstack/builds?job_name=ironic-tempest-uefi-redfish-vmedia-4k&skip=0 clif that looks bad16:55
clifaha16:55
clifyea they started failing around the time that new image was published16:55
JayFso I'd spot check those, maybe 3-4 of them, make sure it's a similar failure16:55
clifhttps://cloud.centos.org/centos/9-stream/x86_64/images/16:55
JayFthen we beg TheJulia for her 4k voodoo knowledge16:55
JayFand/or mark it nonvoting temporarily while we figure it out16:55
clifI mean I already have found two with the exact same failure16:56
JayFI think we have a hypothesis16:56
clifI agree16:56
JayFif you want, you can document this in a bug, file a review that temporarily marks this job nonvoting16:56
* TheJulia hides16:56
JayFwe won't merge it immediately, but we will if there's no path to fix soon16:56
clifsure16:56
TheJuliagive me a few to context switch, I'm digging into another issue right now16:56
clifeither non-voting, or temporarily peg the image to previous version?16:56
JayFif you can peg to previous version you get three gold stars16:57
JayFbut make sure you document it in a bug so we don't have it pinned to august 18, 2025 on august 18, 2026 :D16:57
clifyea true, well I'll see how easy/hard that is to do16:58
clifdoes this go against ironic or tempest in the bug tracker?16:59
JayFironic17:00
JayFwe have our own ironic-tempest-plugin as well17:00
JayFbut a bug would only go there if the *test itself* was broken17:00
JayFin this case, we're breaking in the devstack setup inside ironic/devstack/lib/ironic17:00
opendevreviewStephen Finucane proposed openstack/ironic master: api: Allow more types for updates  https://review.opendev.org/c/openstack/ironic/+/95796017:00
JayFclearly in ironic's land17:00
JayFbut we might also discover, for instance, a new DIB release is impacting (doubtful given failures) or qemu-img (again doubtful)17:00
*** sfinucan is now known as stephenfin17:02
stephenfinrpittau: cid: Small follow-up for https://review.opendev.org/c/openstack/ironic/+/945218 there ^17:04
stephenfinSpotted it in the SDK CI https://zuul.opendev.org/t/openstack/builds?job_name=openstacksdk-functional-devstack-ironic&project=openstack/openstacksdk17:04
JayFlookin17:05
opendevreviewStephen Finucane proposed openstack/ironic master: api: Add schema for bios API (responses)  https://review.opendev.org/c/openstack/ironic/+/95214917:05
cliffiled: https://bugs.launchpad.net/ironic/+bug/212097417:05
JayFgood stuff, if you can get the version pin done I can land that for a CI fix, otherwise do the voting:false patch and we'll have that as an escape hatch 17:05
cliflooking17:09
opendevreviewClif Houck proposed openstack/ironic master: Make ironic-tempest-uefi-redfish-vmedia-4k non-voting  https://review.opendev.org/c/openstack/ironic/+/95796217:14
JayFyeah I was afraid it wouldn't be pinnable :( 17:15
clifproposing that for now, will look at doing the version pin too if that's preferable17:15
JayFack17:15
JayFthat would be ideal17:16
opendevreviewClif Houck proposed openstack/ironic master: Make ironic-tempest-uefi-redfish-vmedia-4k non-voting  https://review.opendev.org/c/openstack/ironic/+/95796217:16
clifI think we would have to patch diskimage_builder somehow either in its tree or however we pull it into the devstack environment in order to peg or point it to a previous version17:37
clifwhich seems like a lot of work for something that may be fixed upstream in centos land17:37
clifso unless it's incredibly important I propose we just make it non-voting for now and watch for another centos image release17:38
JayFmy bigger concern is that it's not a *bug* in the centos image, it's some kind of intended-change that has side effects on us17:43
JayFI'd suggest we give some time for TheJulia or stevebaker[m] to have a look before we mark it -nv17:43
TheJuliawhat is going on?17:44
JayF4k job failing since ~3am (when the timestamp for the updated centos image is) erroring with 2025-08-19 03:27:45.681573 | controller | 2025-08-19 03:27:45.680 | qemu-img: error while reading at byte 6186532864: Input/output error17:44
TheJuliayeah, we've seen that before17:44
JayFduring the 4k image conversion piece17:44
TheJuliathe mirror is bad17:44
JayFaha17:44
TheJuliait clears up eventually once the copy gets resynced17:45
TheJuliaor it's a partial image on a mirror17:45
JayFis there room for us to do something like add sha1/md5 checking to avoid a wild goose chase in the future?17:45
JayFor is the sha1/md5 right and the image is just bad17:45
TheJuliaDIB would need to do that17:45
TheJuliabut I think that is definitely something which we should do and if we get a crazy error code... I dunno, skip the job17:45
JayFclif: if you wanna add that feature to dib it would be nice to have and downstream would likely benefit too; up to you17:46
JayFand/or implementing TheJulia's suggestion; but I have no idea how to make a devstack setup fail in a way that passes the job17:46
* TheJulia finishes pinning down a customer complaint regarding proliantutils17:47
JayFyou know, I wouldn't mind keeping the lights on for that if they had given us the keys17:57
TheJuliayeaaah17:57
TheJuliaThis customer is unhappy that nic0 is the pxe nic, but when proliantutils uses redfish the bmc somehow boots nic118:00
TheJuliawith PXE18:00
JayFwell, good luck patching it.18:02
TheJuliayeah, no18:02
JayFI had a patch up, they asked after 4 months for a unit test18:02
JayFand I decided not to waste my time18:03
TheJuliaheh18:03
clifJayF: which feature? trying a different mirror? 18:04
JayFDIB checking sha1/md5 on the image and/or us configuring it to do so if it already supports it18:05
clifI'll take a look18:05
JayFjust generally trying to turn that awful error into "yeah it's a bad image download"18:05
TheJuliaand also likely detecting such a failure and likely blowing up the job in a way that we can know "oh, it was this"18:05
JayFthe first thing I'd do is validate the assumption that the hash woulda shown this18:05
TheJuliaJayF: like 2 weeks ago, the image on mirrors was like 268 MB for a few hours and that did the same exact thing.18:06
clifI'd be surprised if it doesn't already do that18:06
JayFbut in general, yes, just somehow make our devstack failure loud in the right way so you and I, or some other victim in the future, don't lose time digging into a known issue18:06
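A minimal sketch of the kind of check being discussed, turning a bad download into a clearly labelled error before any image conversion runs. Everything here is an assumption for illustration: the helper names and the `BadImageDownload` exception are hypothetical, SHA-256 is used instead of the sha1/md5 mentioned above, and the checksum listing is assumed to use coreutils-style `<digest>  <filename>` lines, which may not match what a given mirror publishes. It is not how diskimage-builder implements anything.

```python
import hashlib


class BadImageDownload(Exception):
    """Hypothetical error type: the downloaded image cannot be trusted."""


def parse_checksum(checksum_text, filename):
    """Return the hex digest listed for filename in a checksum listing."""
    if not checksum_text.strip():
        # An empty listing is reported separately from a plain mismatch.
        raise BadImageDownload("checksum listing is empty")
    for line in checksum_text.splitlines():
        parts = line.split()
        # Assumed format: "<hexdigest>  <filename>" (binary mode adds "*").
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    raise BadImageDownload(f"no checksum entry for {filename}")


def verify_image(path, expected_hex, chunk_size=1024 * 1024):
    """Hash the file in chunks and compare against the expected digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.lower():
        raise BadImageDownload(
            f"bad image download: expected {expected_hex}, got {actual}"
        )
    return actual
```

Raising one dedicated exception type keeps the failure message easy to search for in a job log, which is the point JayF and TheJulia make about failing loudly in the right way.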
JayFclif: me too, but sometimes things are surprising, especially in tools like DIB which usually get updated as-needed18:06
clifperhaps it does not after an initial skim of git grep md5/sha18:08
clifI'll dig some more18:09
JayFyeah like I said up to you18:09
JayFthese "make CI failures less annoying" rabbitholes are infinitely deep18:09
JayFso sometimes you take the shot, sometimes you move on :D 18:09
clifI'm happy to take the shot, occasionally at least18:09
JayF++18:13
JayFmake sure to link the PR into here and/or to me18:13
TheJuliaokay, brains18:31
TheJuliaword from centos land20:46
TheJulia( I prodded a centos board member after being on a call with someone expressing a very similar issue. )20:47
TheJuliaThey are working on it, they are aware of it. No ETR but they have identified the root cause and are going to try and fix the root issue.20:49
JayFgood stuff, thank you for closing that loop21:01
clifhttps://review.opendev.org/c/openstack/diskimage-builder/+/95798321:21
clifworks and in the process I discovered that a bunch of the most recent sha256sums for centos images are missing/zero length so that's fun21:22
clifidk what's going on with their infra/mirrors21:22
clifI have taken psychic damage21:23
JayFTheJulia: ^ perhaps more data for your CentOS board member contact21:37
JayFclif: it's okay, we have a cleric, she'll help you21:37
jandersgood morning Ironic o/21:46
jandersw/r/t repos/mirrors (we are also being hit by this downstream) - would there be any point in considering running our own?21:46
JayFopendev does mirror some items, but not everything21:47
JayFand often our image needs don't align with the rest of the community21:47
JayFso it's sorta a more complex question than it should be tbh21:47
jandersI used to hit this waay too often in my devops days, got annoyed, set up my own with snapshotting mechanism and never looked back21:47
jandersI would have paths with "live" mirrors as well as "frozen" ones snapshotted at a certain date21:48
jandersin case of garbage landing in repos breaking stuff I could just repoint either the client or the "default" symlink21:48
jandersthere is complexity involved but the tradeoff is fixing this whole class of problems21:49
jandersfor those familiar with RH Satellite channel concept, I am thinking something similar but more lightweight and using CentOS tooling21:50
janders"only" drawbacks : 1) setup/maintenance effort 2) this needs a few TB disk space21:51
jandersbut such approach does a pretty good job of swapping random, unpredictable but intense CI pain to constant background pain that can be lived with so to speak :)21:52
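The "live" versus "frozen" layout janders describes can be illustrated with a short sketch of repointing a "default" symlink at a chosen snapshot directory. The directory layout, the function name, and the snapshot names are hypothetical, and the sketch assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```python
import os


def repoint_default(mirror_root, snapshot_name, link_name="default"):
    """Point <mirror_root>/<link_name> at <mirror_root>/<snapshot_name>."""
    target = os.path.join(mirror_root, snapshot_name)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"no such snapshot: {target}")
    link_path = os.path.join(mirror_root, link_name)
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new link under a temporary name, then rename it over the
    # old one so clients never see a missing "default" path.
    os.symlink(snapshot_name, tmp_link)
    os.replace(tmp_link, link_path)
    return os.readlink(link_path)
```

For example, `repoint_default("/srv/mirror", "2025-08-18")` would switch clients that follow `/srv/mirror/default` to a snapshot frozen on that date; both the path and the snapshot name are made up for illustration.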
TheJuliaI'm 95% sure opendev doesn't actually mirror the qcow2 images22:21
TheJuliaand they are trying to hold without expanding the AFS mirrors22:22
TheJuliaso centos is sort of the first victim to suffer mirroring wise22:22
TheJuliaclif: that is the issue, basically a race condition aiui22:31
opendevreviewJulia Kreger proposed openstack/ironic master: trivial: fix benchmark data generation script  https://review.opendev.org/c/openstack/ironic/+/95509922:57
opendevreviewJulia Kreger proposed openstack/ironic master: Fix the ability to escape service fail  https://review.opendev.org/c/openstack/ironic/+/95697223:08
TheJuliajanders: if you want to propose a doc change after ^, that would help. Just to keep clarity23:09
jandersTheJulia ACK. I stacked the doco change on top of ^^, will have a look at the latest revision of 956972 shortly23:13
janders(I forgot that initially and was wondering why state machine diagram didn't update - doh! :) )23:14
TheJuliaoh, let me do that as a fresh change23:50
TheJuliauhhhhhhhhh23:50
opendevreviewJulia Kreger proposed openstack/ironic master: Update the state machine diagram  https://review.opendev.org/c/openstack/ironic/+/95799023:51
TheJuliajanders: ^23:51
opendevreviewSteve Baker proposed openstack/networking-generic-switch master: WIP Add security group support to ovs  https://review.opendev.org/c/openstack/networking-generic-switch/+/95651923:56
