Wednesday, 2024-09-11

adam-metal3hello Ironic o705:50
adam-metal3Do you also experience inspection failures if you use an IPA that has this change? https://opendev.org/openstack/ironic-python-agent/commit/ab99f36baa1d89c7db1502be4ca36ebd91446a8c , I have built an IPA that has everyting up until the change before this and it works fine but if I include this very fresh commit I get an inspection failure. THe failure is a 400 error code from the inspector (conductor) 05:57
adam-metal3that the inspection data format is not OK str is not None I will reproduce and give the you the exact error message soon05:57
adam-metal3There are new None return values around L2399, L2440 and L2490 of the hardware.py I suspect something funny there05:59
adam-metal3this is the error I am seeing if the mentioned commit is present in the code: https://pastebin.com/embed_js/ZKVyCfnh?theme=dark06:31
dtantsuradam-metal3: we see issues in the metal3 job but not in our regular jobs07:50
dtantsurmmm yeah, the commit probably changed what is returned on a missing IPMI address: None instead of something silly like 0.0.0.007:51
dtantsurcid: ^^^07:52
dtantsuras much as I'd prefer to change to None, we might want to change back to whatever placeholder values we used to return07:52
dtantsur(same for MAC's)07:52
adam-metal3dtantsur, it is a bit strange because in the respective Exception blocks there was a pssibilityto return None already, O_O so are those correct? because in here Ironoic as you have said always excepts string https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controllers/v1/ramdisk.py#L28807:55
dtantsurI don't think it was actually possible to return None before, we always had a fallback value07:57
dtantsurJayF: FYI the metal3 job failure could actually be a regression on our side07:59
opendevreviewDmitry Tantsur proposed openstack/ironic master: Fix inspection if bmc_address or bmc_v6address is None  https://review.opendev.org/c/openstack/ironic/+/92888508:03
dtantsuradam-metal3: ^^^08:03
cidThat section of the code is just weird.08:03
cidIt might just be a case where nothing really ever went wrong, because the earlier code returned both a default value and None too.08:03
ciddtantsur: ty08:04
adam-metal3dtantsur: thx08:04
cidI remember there was a weird error with also returning defaults.08:04
dtantsurcid: I guess it rarely returned None in practice08:05
cidExactly 08:05
cidProbably the same for defaults too. But let's see what CI thinks.08:06
opendevreviewVerification of a change to openstack/ironic master failed: Set node "alive" when inspection finished  https://review.opendev.org/c/openstack/ironic/+/92782808:11
adam-metal3dtantsur: wdyt could this be backported to supported branches? because technically as I understand Ironic was expected to handle these None values by design and the so older Ironic release branches have become incomptible with the new IPAs.08:36
dtantsuradam-metal3: I see no issues with backporting this patch08:53
adam-metal3dtantsur: nice, thanks should I do it?09:01
dtantsuradam-metal3: I can request the backports once the master change merges09:48
adam-metal3dtantsur: sounds good, thanks!10:05
dtantsurCould anyone who used to be excited above skip-level upgrade testing (JayF? TheJulia?) please fix it? https://zuul.opendev.org/t/openstack/builds?job_name=ironic-grenade-skip-level&project=openstack/ironic10:39
dtantsurotherwise, I'll move the job to experimental to stop wasting CI resources.10:39
dtantsurit only ever passes on 2024.1, apparently10:40
* dtantsur just realized bifrost fails on ubuntu-noble, send help!10:42
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP: start working on Ubuntu 24.04 support  https://review.opendev.org/c/openstack/bifrost/+/92889510:45
opendevreviewDmitry Tantsur proposed openstack/bifrost master: CI: pin the benchmark job to ubuntu-jammy  https://review.opendev.org/c/openstack/bifrost/+/92889610:47
opendevreviewDmitry Tantsur proposed openstack/ironic master: Move the benchmark job to the experimental pipeline  https://review.opendev.org/c/openstack/ironic/+/92889710:54
dtantsurWe're hit pretty hard by something in Neutron's devstack plugin (see their channel)10:54
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP: start working on Ubuntu 24.04 support  https://review.opendev.org/c/openstack/bifrost/+/92889510:59
JayFdtantsur: we cannot remove that job for policy reasons. I'll have a look when I'm working12:40
dtantsurJayF: there is no difference, policy or not, between a job that is neglected because it's experimental and a job that is neglected because it's non-voting and permafailing12:41
dtantsuron top of that, the CI is in a very bad shape - see #openstack-neutron12:43
JayFI just realized we might not need it for this one. Because this would be upgrading from 2023.2 to 2024.2 which is not supported13:12
dtantsurI guess we don't, but it's good to keep it working13:12
JayFWell I wouldn't be surprised if it's broken for actual reasons if someone did like a breaking migration13:14
JayFOr due to machinations in things like requirements or similar shared config or resources13:14
TheJuliadtantsur: I was never excited by it13:26
* TheJulia feels a bit rough today13:27
masgharTake care, Julia!14:17
masgharAnybody looking at the metal3 intergration job?14:17
dtantsurmasghar: yep https://review.opendev.org/c/openstack/ironic/+/92888514:19
masgharAh, I thought that might be it (saw inspection failing in the job logs I think)14:20
masgharalso, thanks!14:20
opendevreviewStephen Finucane proposed openstack/ironic master: api: Introduce new mechanism for API versioning  https://review.opendev.org/c/openstack/ironic/+/92891914:25
opendevreviewStephen Finucane proposed openstack/ironic master: WIP: api: Add schema validation framework  https://review.opendev.org/c/openstack/ironic/+/92892014:25
opendevreviewStephen Finucane proposed openstack/ironic master: WIP: api: Add schema for allocations API  https://review.opendev.org/c/openstack/ironic/+/92892114:25
stephenfinJayF: I was hoping to get tests passing on one API but ran out of time. However, that's a prototype for how you could do the JSONSchema integration that is very similar to how we're approaching it elsewhere ^14:26
stephenfinNot sure how far adamcarthur5 has gotten but it might be helpful14:27
JayFYeah this is what I figured those changes would look like, but I have significant concerns about upending our API in that manner without causing any regression14:35
JayFPlus, even if cores agreed to a migration like that we would need to ensure it happened in a single release cycle. I don't want us to end up in a half-migrated limbo like what happened with the virt driver14:36
JayFSo I just kind of want us to be deliberate about how we approach it to avoid the regressions and make sure we can get it in one shot14:36
dtantsurJFYI bifrost is also broken by the same regression as metal314:39
masghar:its-fine-fire:14:39
dtantsurprobably everything running new inspection on virtual machines does not work14:39
masgharDue to the same error as fixed in 928885?14:40
dtantsuryep14:40
masgharok14:40
dtantsuron the "bright" side, bifrost on ubuntu 24.04 is borken waaaaaay before inspection :D14:41
masghar:its-fine-fire-2: 14:41
dtantsuremojis are not enough for this situation. animated gif support in irc when?14:42
masgharGood question14:42
JayFHonestly this result makes me wonder if we need to add a job on the IPA side or something to catch these things14:46
JayFAlthough I do feel slightly good knowing that we at least resolved a bug through this process since it was always possible for that to return none in rare cases14:46
dtantsura voting job with inspection, yeah14:46
JayFIt might not be a bad idea at ptg to have a short session on how to troubleshoot that metal3 job14:47
JayFBecause I took a look at it and all I ended up seeing was something that looked like a leader election failed and combining that with the extremely long time out on failure I extrapolated (incorrectly)14:47
masgharI didnt know where the logs went (but I didnt dig much)14:48
masgharSo I just popped in here and asked if someone was taking a look14:48
JayFRealistically, given how many new people we have in the community who have less ops experience it might be smart to have a session about troubleshooting CI jobs in general14:48
masgharI would be all eyes and ears, tbh14:49
opendevreviewDmitry Tantsur proposed openstack/bifrost master: WIP: start working on Ubuntu 24.04 support  https://review.opendev.org/c/openstack/bifrost/+/92889515:01
TheJuliabrraaaains15:23
opendevreviewStephen Finucane proposed openstack/ironic master: WIP: api: Add schema validation framework  https://review.opendev.org/c/openstack/ironic/+/92892015:34
opendevreviewStephen Finucane proposed openstack/ironic master: WIP: api: Add schema for allocations API  https://review.opendev.org/c/openstack/ironic/+/92892115:34
opendevreviewDmitry Tantsur proposed openstack/ironic master: Try limiting MTU to at least 1280  https://review.opendev.org/c/openstack/ironic/+/92893115:38
JayFthe more I look at stephen's proposed changes, the more I like it tbh16:02
JayFI'm just scared of breaking the API :D 16:02
TheJuliaThere is a required portion of fear, uncertinty, and doubt in all things. But then there is the cost to that... and yeah16:05
JayFHonestly, it's one of those things where I think I thought the way we handled microversinos was fine16:08
JayFuntil I had to explain it to two different junior devs (Boushra then C I D)16:08
JayFand seeing a potentially clearer way was nice ... but the node object is going to be crazy hard16:09
JayFbtw; the missing context in this channel for this: if we did this migration, we get openapi for free 16:09
JayFand that means we get a lot of clients for free16:09
JayFthat's the real problem we're solving for16:10
dtantsurweeelll... I'm highly doubtful about the "lot of clients" part16:15
dtantsurbut it will definitely help SDK authors16:15
JayFI was told at least there's a rust SDK generator? 16:17
JayFI believe the exchange was16:17
JayFadamcarthur5: "We can get a rust SDK for free with OpenAPI specs!" me: "We already do, dtantsur made it for us :D" ;)16:17
dtantsurSDK folks have some hopes. From my experience, it's auxiliary best case.16:18
dtantsurIt's like with AI tools: you can generate *something*, you need to apply an engineer to generate *something good*.16:18
JayFWhat are you talking about? LLM code generation is going to take all of our jobs, and they'll only need a middle manager for code review /s 16:19
JayF:D 16:19
JayFMakes sense; I am always skeptical of any code generation stuff generally16:19
JayFbut my opinion on making that API change is snowballing towards a "I'm probably for it" given the number of small benefits we pick up at each step16:20
dtantsurit'll definitely save me from the mundane work of copying fields from the docs into rust16:21
dtantsurJayF: ehhhhmmmm https://opendev.org/openstack/ironic-python-agent/src/branch/stable/2023.2/ironic_python_agent/hardware.py#L3216:40
dtantsurthis looks like a problem to me16:40
JayFđź‘€16:40
dtantsurI don't understand how it works upstream, in our downstream builds we now get oslo.config complaining about duplicate options16:41
JayFdtantsur: those are safe calls16:41
JayFdtantsur: oh16:41
dtantsuroslo_config.cfg.DuplicateOptError: duplicate option: efi_system_partition_size16:42
JayFhttps://review.opendev.org/c/openstack/ironic-python-agent/+/927978/3#message-cfacb46d3913a35e6e754e1ef68f654a89913705 it passed CI here16:42
JayFI wonder if this is a ironic_lib version issue?16:42
JayFone of us testing against the wrong version, perhaps?16:43
dtantsuror oslo.config?16:43
TheJuliadtantsur: your version is too old then16:43
TheJuliadtantsur: they changed oslo.config at some point to ignore dupes16:43
TheJuliacode.. like.. wallaby age *does* throw an error16:43
dtantsurTheJulia: we have python3-oslo-config-2:9.1.1-0.20230608145954.515daab16:46
dtantsurwhich seems to match upper-constraints for 2023.1, for instance16:46
TheJuliauhhhhh lovely16:47
TheJuliaI thought it changed behavior post wallaby, but didn't pin down the date16:47
dtantsurwell, somehow it works upstream16:47
dtantsurI'll figure it out tomorrow, no brain power left16:49
TheJulia++16:49
TheJuliaget some rest16:49
dtantsurintending to lift some weights as a brain-clearing measure16:49
JayFhappy to patch upstream if it's broken, somehow, too, just don't see where it would be and TBH I don't think my digging in would be valuable as I am so satiated with thoughts of that code in multiple different versions already16:53
TheJuliaThere *IS* definitely a point where oslo.config was very not friendly to duplicated options being defined due to delayed invocation16:58
JayFI suspect this may be a weird SLURP/IPA interaction16:59
JayFSLURP *services* have to work on 2023.1 dependencies16:59
JayFwe've never enforced that for IPA since it has to be co-installed16:59
JayF(for stable/2023.2, they have to work on 2023.1 requirements AIUI)17:00
JayFs/it has/it doesn't have/17:03
JayFI reversed the bool in that statement17:03
JayFdtantsur: efi_system_partition_size is not in https://review.opendev.org/c/openstack/ironic-python-agent/+/927978/3/ironic_python_agent/config.py17:10
JayFdtantsur: bad rebase, I'd guess17:10
opendevreviewcid proposed openstack/ironic-tempest-plugin master: Add tempest tests for runbooks  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/92895817:26
TheJuliait *could* also be the version of ironic-lib has the options setup for when the class is loaded as upposed to upfront registration17:34
TheJuliaso it could be just differences at that level?!17:34
JayFI was having to be selective about what configs I pulled17:34
JayFbased on version of ironic_lib in that constraints17:34
JayFbecause I intentionally didn't change non-image-related-code out of ironic-lib for backport safety17:35
TheJuliawell, a better question might be in the short term, is that option explicitly inside ironic-lib?17:35
cardoegood morning ironic....18:09
TheJuliagood morning cardoe 18:10
TheJuliaHow goes your morning?18:11
cardoeWell its 1pm and I'm now looking at starting what I planned on doing first thing today. So instead I'm on IRC declaring bankruptcy on the day.18:11
JayFwhen you hit that bankrupt, you gotta just grab that wheel and give it another spin18:13
cardoeLooks like some CI fun today thus far for you guys.18:13
JayFthis time you'll hit that shiny $5,000 space, I know it18:13
TheJuliacardoe: Have you watched the re-imagined battlestar galactica series?18:13
cardoeThere's a new one?18:13
TheJulia2005-ish?18:13
cardoeomg. that's the re-imagined!18:14
cardoederp. yes.18:14
TheJulia"as foretold in the sacred scrolls"18:14
TheJulia^^^ my favorite response to stuff like that18:15
cardoeheh yes that is spot on18:15
cardoeBSG as a whole sums up so many things in computing.18:17
cardoeI was gonna say everything looks great. No patches sitting at +2/+118:19
cardoeThen I look at Zuul with metal3 sitting at 3+ hours18:20
TheJuliaI think there is an ironic issue, and then a mtu issue related to CI it seems18:20
TheJuliaI'm focused on a backport to the days of train18:21
cardoeooo that'll be fun.18:21
* TheJulia gets out the archaeology tool to figure out where something went between Train and Wallaby18:22
cardoeI would have traded today though.18:22
cardoeVP's wife was sick. Director is on jury duty. There was an all-hands and I had to fill in for slides I've never seen. My son forgot his lunch at home. So after that rodeo I took it up to school. I'm in the lobby waiting and massive police response shows up. Someone made a threat against the school online and we couldn't go anywhere.18:24
cardoeI just wanted to work on my sushy patches!18:24
cardoeI did add this item to the PTG "We would like to use inspection hooks or modify the node data based on inspection data coming from the redfish inspector" at the end of that's okay.18:26
cardoeJayF: I see you talking about SRE types for Ironic. One the things I was wondering about was strengthing a story around metrics / monitoring.18:30
TheJuliacardoe: I would highly recommend details of specifics about what one wishes to obtain18:33
TheJuliaso the overall discussion is appropriately framed18:34
cardoeOkay added context.18:49
TheJuliaawesome18:54
TheJuliawheeee: oslo_config.cfg.DuplicateOptError: duplicate option: image_convert_memory_limit18:54
JayFcid and I are pairing on virtualpdu + pysnmp stuff19:13
JayFjust hit my first IRL clear heisenbug19:13
JayFunit tests happily pass under debugger19:13
TheJuliaheisenbug? is that like a Schrödinger's unit test?19:18
* TheJulia tears up at the same number of unit tests failing19:20
JayFyeah basically the tests were passing in debug but not in full runs20:44
JayFwe nailed it down to basically threading.Lock(), and not just our use of it, the use of it in threading.Thread that is subclassed for the SNMPPDUHardness is causing issues20:44
JayFwe were able to fix the asyncio unhappiness, but without extracting all uses of threading (or changing them in a way I don't understand) it will deadlock eventually due to the locks not being event-loop-safe20:45
JayFwhat I proposed to cid was, given pysnmp is BSD licensed, we try to import the asyncore version of the older APIs directly into virtualpdu, and migrate them to use pyasyncore20:45
JayFbecause of the other thing I unearthed: pysnmp removed the low-level APIs we used, and AFAICT were the only way, to create an SNMP server around  6.4.3 release or so20:46
JayFand they are up to 7.0.220:46
JayFso I do not see a long-term happy path with pysnmp + virtualpdu20:46
JayFbut we might be able to keep it on life support for a while if we import only the bits of pysnmp we need into our project directly20:46
JayFcid: ^ if you have anything to add, please do20:47
JayFso tl;dr: 1) threading.Thread/threading.Lock incompatabilities with asyncio event loops (and modern pysnmp's use of em) is the immediate problem, but 2) pysnmp removed the interfaces we need for virtualpdu in recent releases, which makes solving #1 a bit of a moot point20:48
JayFTheJulia: tl;dr basically https://en.wikipedia.org/wiki/Heisenbug  > In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it20:57
JayFin this case, stepping thru tests 1-by-1 removed any thread-unsafety element :/ 20:58
cidJayF: ++, that's pretty much it. The path to a probable solution, or ... :D21:04
* cid I will be nipping out now21:04
JayFTheJulia: JFYI; your follow-up unmaintained/xena patch tested good in my devstack; I reproduced the original issue then verified the fix. Thank you for that, I'll spend the rest of the day patting CI on the head for them :D22:25
TheJuliaJayF: thank you so so so much for the repro and the verification22:26
JayFwe really do put the file down unmodified, matching checksum, if we don't convert22:26
JayFplus those empty qcow test images convert from like, ~150k -> 4k 22:26
JayFit's pretty hilarious22:26
adamcarthur5stephenfin Thank you sharing those commits in Ironic. I haven't gotten much further than the discussions point, but I am going to look into https://review.opendev.org/c/openstack/codegenerator/+/928664 and try and scope exactly what still needs done.22:30
TheJuliaJayF: I'm not sure what you mean by by put the file down unmodified matching checksum, if we don't convert.22:31
JayFI mean literally md5sum good-qcow (or whatever) on my http server ==== md5sum /var/lib/ironic/images/{blah}22:32
JayFwhich made the validation pretty darn convincing 22:32
TheJuliaoh, heh22:33
TheJuliayeah22:33
JayFbingo I think I might have just gotten yoga ci22:46
JayFor at least now it's throwing over 9000 warnings instead of failing at install time :D 22:46
adamcarthur5stephenfin (And anyone else interested) I added a few notes in the PTG etherpad https://etherpad.opendev.org/p/ironic-ptg-april-2024 w.r.t OpenAPI. let me know if you have questions / anything else you want me to put in there22:48
JayFadamcarthur5: that's the last PTG etherpad22:49
JayFadamcarthur5: you likely meant to update the oct one22:49
JayFTheJulia: https://review.opendev.org/c/openstack/ironic/+/928974 [stable-only] Cap versions to allow CI to pass [NEW] 22:51
JayFyou'll meed to  pull the requirements change from your patch22:52
JayFmine obviates it22:52
adamcarthur5yep.. thank you, here is the October one: https://etherpad.opendev.org/p/ironic-ptg-october-2024. I will add OpenAPI as a topic since it is not currently there22:53
JayFI documented cid's and my findings in the etherpad entry about the SNMP driver, too23:08
cardoeYeah I wanna talk about the OpenAPI bits as well.23:50
cardoeLike what’s the goal of the code generation. Cause IMHO keeping OpenAPI and generating server code that then doesn’t need modifications is hard. Then how do you know what’s a difference between modified code and altered generated code.23:52
cardoeI personally prefer to write the server code and annotate it so that the server can generate the OpenAPI for you.23:52

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!