Wednesday, 2025-07-23

rpittaugood morning ironic! o/06:48
mdfrgood morning ironic! :)07:56
queensly[m]Good morning08:00
opendevreviewRiccardo Pittau proposed openstack/bifrost master: Move all CS9 jobs to non-voting  https://review.opendev.org/c/openstack/bifrost/+/95566008:18
opendevreviewQueensly Kyerewaa Acheampongmaa proposed openstack/ironic master: Add manual clean and automated verify steps to set BMC clock via Redfish Manager  https://review.opendev.org/c/openstack/ironic/+/95347708:22
opendevreviewMerged openstack/ironic master: [docs] Add NIC Firmware Update to Firmware Interface Docs  https://review.opendev.org/c/openstack/ironic/+/95555108:42
opendevreviewMerged openstack/bifrost stable/2025.1: Fix Keystone after their migration from WSGI scripts  https://review.opendev.org/c/openstack/bifrost/+/95196310:08
dtantsurTheJulia: I don't think I'm following. Why is it better to have the current thread simply wait instead of doing its share of work?11:10
dtantsurcardoe: hi! Do you prefer the commit message of https://review.opendev.org/c/openstack/ironic/+/954755 updated or are you fine with my explanations inline?12:00
cardoeIt’s fine.12:13
opendevreviewJacob Anders proposed openstack/ironic master: Skip initial reboot to IPA when updating firmware out-of-band  https://review.opendev.org/c/openstack/ironic/+/95431113:06
TheJuliadtantsur: I think the queue being on the launching thread means that when it gets blocked, the queue gets frozen13:12
dtantsurTheJulia: that does not match my understanding of queues at all, but my understanding is not 100% solid.13:13
dtantsurWhat do you even mean by "on the launching thread" in context of a variable?13:14
TheJuliaI noticed it as like "well that is weird and inconsistent", removed the extra worker on that thread and was like "oh wow"13:14
*** darmach1 is now known as darmach13:15
dtantsurWell, I did notice really awkward queue behaviors when writing unit tests for my futurist patch13:15
dtantsurNoticed some sleep(0) there?13:15
dtantsurhttps://review.opendev.org/c/openstack/futurist/+/955217/7/futurist/tests/test_executors.py#22713:15
dtantsurSo yeah, something is off, maybe you're right? I see no traces of that in the docs though, I wonder if it's more about GIL.13:16
TheJuliaso periodic triggers the power state sync, its say thread 113. Thread 113 creates the queue... then launches thread 114-124 to work the queue it gets provided from thread 11313:16
TheJuliaI think it is totally the GIL biting us, at least thinking of it in hind sight13:16
TheJuliabecause each entry popped out of the queue has to be reconciled out somewhere13:16
dtantsurYeah, I operated under impression that things like network calls, subprocess calls or queue waits lifted the GIL. It might not be so easy.13:17
TheJuliaat least, that is the easiest explaination13:17
dtantsurGoogling now. I'm getting a gut feeling that Queue.get may not release GIL in the end...13:20
TheJuliaack13:20
TheJulia(FWIW, we're likely to find all sorts of weird fun things like this as time progresses, but this is one that stood out to me last week)13:20
dtantsurGemini is sure that GIL does get released :)13:20
TheJulia(because, I needed a consistent spread for sanity)13:21
dtantsurbut the example it provides uses time.sleep which is cheating :)13:21
TheJuliawell, fwiw, that is how I simulated the hanging threads, utilizing the deploy option in the fake drivers13:24
TheJuliaerr13:24
TheJuliadelay options13:24
* TheJulia drinks more coffee13:24
TheJuliaERRTOOMANYMESSAGES <-- the exception my brain is tossing this morning13:28
dtantsurTheJulia: I think the actual tl;dr is that we should never use Queue.wait without a short timeout and retries13:46
queensly[m]Hi everyone, I’m seeing an ironic-grenade failure on my patch:... (full message at <https://matrix.org/oftc/media/v1/media/download/AXkoTQFW_ANPIPaGILlOYBESiPer5HpapTeo0-yWot75xdjnEm7tuTNtT5oHgbGBwB7v1wckNySbs3Ks9fonb1FCeYf6KimwAG1hdHJpeC5vcmcveklIWFdqaFp3dENtb2ZHZXdFeVVWTVZG>)13:47
TheJuliadtantsur: uhh, not sure we do, but I’ll look when I context switch back.13:48
dtantsuraha, I see https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1615-L162014:00
TheJuliaoh! I see!14:03
TheJuliadtantsur: in the grand scheme of things, we at least sort the order based upon age since last seen, so if we fail to check one its *really* not a huge deal, but it makes sense to try and soften any issue around that14:12
dtantsurTheJulia: importantly, the pattern we're using does not release GIL, at least not the Queue part. The nowait bit ensures that. We do have sleep(0) though.14:13
TheJuliaI was sort of hoping the existence of sleep(0) would entirely go away post-eventlet14:13
dtantsurTheJulia: if you change 0 to a small value, like 0.01 or 0.1, does it change anything?14:13
dtantsurWell.... it won't :)14:13
TheJuliale sigh14:14
TheJuliaI'll have to setup to give it a try, I restacked the code everything yesterday and had to re-take memory footprint notes to  write the release note14:14
dtantsurI'm still puzzled why the actual implementation of get_power_state is not enough to release the GIL for a while.14:14
TheJulias/everything//14:14
dtantsurYou mentioned you're using the fake driver, right? But it has a sleep somewhere?14:15
cardoeWhat's the git ref we're suppose to use when we used AI to help us write the code?14:15
TheJuliaI mean, it might be, but its sort of an issue where we want and really can only focus on the trees, but the spread/behavior we observe from the forest14:15
cardoeCo-Authored-By?14:15
dtantsurGenerated-By?14:15
dtantsurbetter double-check14:15
cardoeadamcarthur5: you've been doing it.14:15
TheJuliaor Assisted-By... I almost have all the votes to approve it14:15
dtantsurcardoe: https://openinfra.org/legal/ai-policy14:16
TheJulia(and some projects are already using it14:16
TheJuliadtantsur: thanks!14:16
cardoeAssisted-By would probably fit me better.14:16
cardoeI wrote it and then asked Claude code to give me some feedback on it and work through it.14:16
adamcarthur5Generated-By is what I have been using 14:17
TheJuliaThat would align with the label14:17
cardoeThat's my usual approach. "This is a thing I did. Let's walk through the documentation and the tests and you ask me questions or concerns that a senior software engineer would have in reviewing it."14:17
adamcarthur5But yeah maybe a month ago I'd have said it was more assisted by... But tbh it's getting to the point now where I'm letting it ride and being much more of a reviewer 14:17
TheJuliadtantsur: oh, btw. I didn't have to change by systemd unit files at all with the switch to threading14:17
dtantsurTheJulia: so, in your testing case, does the driver actually release GIL? via a sleep or via a network call14:17
cardoeadamcarthur5: yeah there's certainly a bunch of that that I do as well.14:18
TheJuliadtantsur: everything has sleep(1) for get_power_state right now14:18
TheJuliabut, that is also 1 second, I've been looking for consistent chaos14:18
cardoeAnd that's the dang truth with Kiro. It's ask it to do something... leave for lunch... maybe its finished thinking about the problem and given me a response by the time I'm back.14:18
dtantsurhmmm, sleep definitely releases GIL, so we have sleep(1) in each iteration14:18
dtantsurTheJulia: entirely unrelated, any chances to see you at the OpenInfra Europe?14:20
dtantsuranyone else present?14:20
TheJuliadtantsur: possibly, TBD. It will be a topic with my manager and we may hold some level of boardish meeting. Its a bit like herding cats who are hearing can openers elsewhere.14:20
TheJuliaerr, topic with my manager tomorrow14:21
dtantsur++14:21
* TheJulia clearly needs MOAR COFFEE14:21
dtantsurMaybe we'll have a chance to debug all them threads with a large bottle of brandy14:21
mumesan[m]lol14:22
TheJuliaNot Scotch?!14:22
TheJulia:)14:22
queensly[m]Just following up in case my earlier message got missed:14:22
queensly[m]I'm seeing an ironic-grenade failure on my patch:14:22
queensly[m]https://review.opendev.org/c/openstack/ironic/+/95347714:22
dtantsurIn France??14:22
opendevreviewDoug Goldstein proposed openstack/sushy master: feat: add quirks module for handling BMC differences  https://review.opendev.org/c/openstack/sushy/+/95570114:22
TheJuliadtantsur: true!14:22
cardoedtantsur: ^ super rough swag of the quirks stuff.14:23
TheJuliadtantsur: although, at my first summit I was in a bar with a bunch of HP-ers and ordered scotch and chjones screamed out "Kreger! Nice choice!"14:23
dtantsurohh, so many cool things to check out before the PTO.. I'll try to get to it cardoe!14:23
cardoeI'll be super honest too... I wrote it a while back and you poked me about it and I ran though it a little bit. But then just asked Claude this morning to review it and criticize it and the docs too.14:23
cardoeI committed what it modified without actually reading it because I don't intend on this being the actual implementation but a starting point to have a conversation.14:24
* dtantsur is still pissed at Gemini for bullshitting him about Queue and GIL14:24
TheJuliadtantsur: its AI, what do you expect?14:25
TheJuliaits going to tell you want you want to hear!14:25
dtantsurI tried really hard to avoid making assumptions...14:25
TheJuliaand not tell you how much funky yoga it will take to get there and how much (bad) pain you'll be in afterwards.14:26
dtantsurAlso, quite a few people expect AI to write acceptable code ;)14:26
dtantsurhehe14:26
TheJuliale sigh14:26
queensly[m]From the logs, I found this error:14:26
queensly[m]HttpException: 501: Server Error for url: http://10.0.18.29/compute/v2.1/servers/d40d62c5-f0d0-4485-8d76-eb84b2e94f86/action, The requested functionality is not supported.14:26
queensly[m]Is this a known infra-related or intermittent issue?14:26
dtantsurqueensly[m]: it does look like something entirely unrelated to your patch at lease.14:27
TheJuliaYeah, that seems... Odd14:27
queensly[m]Yeah, I was wondering if a "recheck" could resolve it since it's not related to my patch. Or is there something I could do?14:29
TheJuliaTypically we would try to identify what is going on, why it is going on, but sometimes if it seems intermittent a recheck with additional text of the error from the compute service might be good, that does seem.. weird.14:30
queensly[m]Alright. I will recheck and include the error message in the comment in case it helps spot any pattern.14:36
TheJulialast call on https://review.opendev.org/c/openstack/ironic/+/95475514:41
dtantsurIntersting, Gemini was probably not wrong, just outdated. In the 3.12 version, GIL is released inside locks.14:44
TheJuliaThere were a number of change around that, so entirely possible in the grand scheme of things14:51
dtantsurTL;DR locks should release GIL, it's just hard to find proofs of that14:51
TheJuliaokay, you wanted me to put a sleep right before the thread got a node from the queue?14:54
TheJuliarunning again with the restacked state i uploaded into gerrit yesterday14:54
dtantsurSigh, I don't know what I want any more...14:54
* TheJulia gives dtantsur a chocolate cookie14:54
dtantsurnomnom14:55
TheJuliaAt least it is running with the latest state matching gerrit since we can't run any of that until we have a new futurist release14:55
TheJuliaand I'll check the spread after some meetings this morning14:56
dtantsurTheJulia: I guess I'm still curious if changing time.sleep(0) to slightly higher numbers changes anything14:58
TheJuliaack14:58
cardoeI gave it +215:03
cardoeSo good gosh Claude loves trailing whitespace... https://review.opendev.org/c/openstack/sushy/+/955701 But anyway I would appreciate some feedback on the idea of a module like that.15:04
opendevreviewNahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish  https://review.opendev.org/c/openstack/sushy/+/95521115:20
dtantsurcardoe: whitespace everywhere indeed :D I wonder if instead of hardcoding with_vendorname_quirks we want to use something similar to oem plugins16:43
opendevreviewClif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object  https://review.opendev.org/c/openstack/ironic/+/95571317:19
opendevreviewVerification of a change to openstack/ironic master failed: Switch from local RPC to automated JSON RPC on localhost  https://review.opendev.org/c/openstack/ironic/+/95475517:44
opendevreviewMerged openstack/ironic-ui master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-ui/+/95483317:52
opendevreviewMerged openstack/ironic-ui master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/ironic-ui/+/95235217:52
adamcarthur5TheJulia I apologies for not getting on the spec. I've been hit with a multi-country deadline in the PhD and upstream work on the other side. I will get to it!18:02
TheJuliaadamcarthur5: no worries!18:07
TheJuliadtantsur: 12-14 minutes per check without removal and keeping the 9th worker in place, sleeps added, I'll let you know. In the mean time lunch()18:16
opendevreviewVerification of a change to openstack/bifrost master failed: Fix deployment image checksum validation  https://review.opendev.org/c/openstack/bifrost/+/95308019:00
opendevreviewMerged openstack/ironic-python-agent master: Vendor own option for tls cert file and key file  https://review.opendev.org/c/openstack/ironic-python-agent/+/95418219:02

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!