rpittau | good morning ironic! o/ | 06:48 |
---|---|---|
mdfr | good morning ironic! :) | 07:56 |
queensly[m] | Good morning | 08:00 |
opendevreview | Riccardo Pittau proposed openstack/bifrost master: Move all CS9 jobs to non-voting https://review.opendev.org/c/openstack/bifrost/+/955660 | 08:18 |
opendevreview | Queensly Kyerewaa Acheampongmaa proposed openstack/ironic master: Add manual clean and automated verify steps to set BMC clock via Redfish Manager https://review.opendev.org/c/openstack/ironic/+/953477 | 08:22 |
opendevreview | Merged openstack/ironic master: [docs] Add NIC Firmware Update to Firmware Interface Docs https://review.opendev.org/c/openstack/ironic/+/955551 | 08:42 |
opendevreview | Merged openstack/bifrost stable/2025.1: Fix Keystone after their migration from WSGI scripts https://review.opendev.org/c/openstack/bifrost/+/951963 | 10:08 |
dtantsur | TheJulia: I don't think I'm following. Why is it better to have the current thread simply wait instead of doing its share of work? | 11:10 |
dtantsur | cardoe: hi! Do you prefer the commit message of https://review.opendev.org/c/openstack/ironic/+/954755 updated or are you fine with my explanations inline? | 12:00 |
cardoe | It’s fine. | 12:13 |
opendevreview | Jacob Anders proposed openstack/ironic master: Skip initial reboot to IPA when updating firmware out-of-band https://review.opendev.org/c/openstack/ironic/+/954311 | 13:06 |
TheJulia | dtantsur: I think the queue being on the launching thread means that when it gets blocked, the queue gets frozen | 13:12 |
dtantsur | TheJulia: that does not match my understanding of queues at all, but my understanding is not 100% solid. | 13:13 |
dtantsur | What do you even mean by "on the launching thread" in context of a variable? | 13:14 |
TheJulia | I noticed it as like "well that is weird and inconsistent", removed the extra worker on that thread and was like "oh wow" | 13:14 |
*** darmach1 is now known as darmach | 13:15 | |
dtantsur | Well, I did notice really awkward queue behaviors when writing unit tests for my futurist patch | 13:15 |
dtantsur | Noticed some sleep(0) there? | 13:15 |
dtantsur | https://review.opendev.org/c/openstack/futurist/+/955217/7/futurist/tests/test_executors.py#227 | 13:15 |
dtantsur | So yeah, something is off, maybe you're right? I see no traces of that in the docs though, I wonder if it's more about GIL. | 13:16 |
TheJulia | so periodic triggers the power state sync, its say thread 113. Thread 113 creates the queue... then launches thread 114-124 to work the queue it gets provided from thread 113 | 13:16 |
TheJulia | I think it is totally the GIL biting us, at least thinking of it in hind sight | 13:16 |
TheJulia | because each entry popped out of the queue has to be reconciled out somewhere | 13:16 |
dtantsur | Yeah, I operated under impression that things like network calls, subprocess calls or queue waits lifted the GIL. It might not be so easy. | 13:17 |
TheJulia | at least, that is the easiest explaination | 13:17 |
dtantsur | Googling now. I'm getting a gut feeling that Queue.get may not release GIL in the end... | 13:20 |
TheJulia | ack | 13:20 |
TheJulia | (FWIW, we're likely to find all sorts of weird fun things like this as time progresses, but this is one that stood out to me last week) | 13:20 |
dtantsur | Gemini is sure that GIL does get released :) | 13:20 |
TheJulia | (because, I needed a consistent spread for sanity) | 13:21 |
dtantsur | but the example it provides uses time.sleep which is cheating :) | 13:21 |
TheJulia | well, fwiw, that is how I simulated the hanging threads, utilizing the deploy option in the fake drivers | 13:24 |
TheJulia | err | 13:24 |
TheJulia | delay options | 13:24 |
* TheJulia drinks more coffee | 13:24 | |
TheJulia | ERRTOOMANYMESSAGES <-- the exception my brain is tossing this morning | 13:28 |
dtantsur | TheJulia: I think the actual tl;dr is that we should never use Queue.wait without a short timeout and retries | 13:46 |
queensly[m] | Hi everyone, I’m seeing an ironic-grenade failure on my patch:... (full message at <https://matrix.org/oftc/media/v1/media/download/AXkoTQFW_ANPIPaGILlOYBESiPer5HpapTeo0-yWot75xdjnEm7tuTNtT5oHgbGBwB7v1wckNySbs3Ks9fonb1FCeYf6KimwAG1hdHJpeC5vcmcveklIWFdqaFp3dENtb2ZHZXdFeVVWTVZG>) | 13:47 |
TheJulia | dtantsur: uhh, not sure we do, but I’ll look when I context switch back. | 13:48 |
dtantsur | aha, I see https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1615-L1620 | 14:00 |
TheJulia | oh! I see! | 14:03 |
TheJulia | dtantsur: in the grand scheme of things, we at least sort the order based upon age since last seen, so if we fail to check one its *really* not a huge deal, but it makes sense to try and soften any issue around that | 14:12 |
dtantsur | TheJulia: importantly, the pattern we're using does not release GIL, at least not the Queue part. The nowait bit ensures that. We do have sleep(0) though. | 14:13 |
TheJulia | I was sort of hoping the existence of sleep(0) would entirely go away post-eventlet | 14:13 |
dtantsur | TheJulia: if you change 0 to a small value, like 0.01 or 0.1, does it change anything? | 14:13 |
dtantsur | Well.... it won't :) | 14:13 |
TheJulia | le sigh | 14:14 |
TheJulia | I'll have to setup to give it a try, I restacked the code everything yesterday and had to re-take memory footprint notes to write the release note | 14:14 |
dtantsur | I'm still puzzled why the actual implementation of get_power_state is not enough to release the GIL for a while. | 14:14 |
TheJulia | s/everything// | 14:14 |
dtantsur | You mentioned you're using the fake driver, right? But it has a sleep somewhere? | 14:15 |
cardoe | What's the git ref we're suppose to use when we used AI to help us write the code? | 14:15 |
TheJulia | I mean, it might be, but its sort of an issue where we want and really can only focus on the trees, but the spread/behavior we observe from the forest | 14:15 |
cardoe | Co-Authored-By? | 14:15 |
dtantsur | Generated-By? | 14:15 |
dtantsur | better double-check | 14:15 |
cardoe | adamcarthur5: you've been doing it. | 14:15 |
TheJulia | or Assisted-By... I almost have all the votes to approve it | 14:15 |
dtantsur | cardoe: https://openinfra.org/legal/ai-policy | 14:16 |
TheJulia | (and some projects are already using it | 14:16 |
TheJulia | dtantsur: thanks! | 14:16 |
cardoe | Assisted-By would probably fit me better. | 14:16 |
cardoe | I wrote it and then asked Claude code to give me some feedback on it and work through it. | 14:16 |
adamcarthur5 | Generated-By is what I have been using | 14:17 |
TheJulia | That would align with the label | 14:17 |
cardoe | That's my usual approach. "This is a thing I did. Let's walk through the documentation and the tests and you ask me questions or concerns that a senior software engineer would have in reviewing it." | 14:17 |
adamcarthur5 | But yeah maybe a month ago I'd have said it was more assisted by... But tbh it's getting to the point now where I'm letting it ride and being much more of a reviewer | 14:17 |
TheJulia | dtantsur: oh, btw. I didn't have to change by systemd unit files at all with the switch to threading | 14:17 |
dtantsur | TheJulia: so, in your testing case, does the driver actually release GIL? via a sleep or via a network call | 14:17 |
cardoe | adamcarthur5: yeah there's certainly a bunch of that that I do as well. | 14:18 |
TheJulia | dtantsur: everything has sleep(1) for get_power_state right now | 14:18 |
TheJulia | but, that is also 1 second, I've been looking for consistent chaos | 14:18 |
cardoe | And that's the dang truth with Kiro. It's ask it to do something... leave for lunch... maybe its finished thinking about the problem and given me a response by the time I'm back. | 14:18 |
dtantsur | hmmm, sleep definitely releases GIL, so we have sleep(1) in each iteration | 14:18 |
dtantsur | TheJulia: entirely unrelated, any chances to see you at the OpenInfra Europe? | 14:20 |
dtantsur | anyone else present? | 14:20 |
TheJulia | dtantsur: possibly, TBD. It will be a topic with my manager and we may hold some level of boardish meeting. Its a bit like herding cats who are hearing can openers elsewhere. | 14:20 |
TheJulia | err, topic with my manager tomorrow | 14:21 |
dtantsur | ++ | 14:21 |
* TheJulia clearly needs MOAR COFFEE | 14:21 | |
dtantsur | Maybe we'll have a chance to debug all them threads with a large bottle of brandy | 14:21 |
mumesan[m] | lol | 14:22 |
TheJulia | Not Scotch?! | 14:22 |
TheJulia | :) | 14:22 |
queensly[m] | Just following up in case my earlier message got missed: | 14:22 |
queensly[m] | I'm seeing an ironic-grenade failure on my patch: | 14:22 |
queensly[m] | https://review.opendev.org/c/openstack/ironic/+/953477 | 14:22 |
dtantsur | In France?? | 14:22 |
opendevreview | Doug Goldstein proposed openstack/sushy master: feat: add quirks module for handling BMC differences https://review.opendev.org/c/openstack/sushy/+/955701 | 14:22 |
TheJulia | dtantsur: true! | 14:22 |
cardoe | dtantsur: ^ super rough swag of the quirks stuff. | 14:23 |
TheJulia | dtantsur: although, at my first summit I was in a bar with a bunch of HP-ers and ordered scotch and chjones screamed out "Kreger! Nice choice!" | 14:23 |
dtantsur | ohh, so many cool things to check out before the PTO.. I'll try to get to it cardoe! | 14:23 |
cardoe | I'll be super honest too... I wrote it a while back and you poked me about it and I ran though it a little bit. But then just asked Claude this morning to review it and criticize it and the docs too. | 14:23 |
cardoe | I committed what it modified without actually reading it because I don't intend on this being the actual implementation but a starting point to have a conversation. | 14:24 |
* dtantsur is still pissed at Gemini for bullshitting him about Queue and GIL | 14:24 | |
TheJulia | dtantsur: its AI, what do you expect? | 14:25 |
TheJulia | its going to tell you want you want to hear! | 14:25 |
dtantsur | I tried really hard to avoid making assumptions... | 14:25 |
TheJulia | and not tell you how much funky yoga it will take to get there and how much (bad) pain you'll be in afterwards. | 14:26 |
dtantsur | Also, quite a few people expect AI to write acceptable code ;) | 14:26 |
dtantsur | hehe | 14:26 |
TheJulia | le sigh | 14:26 |
queensly[m] | From the logs, I found this error: | 14:26 |
queensly[m] | HttpException: 501: Server Error for url: http://10.0.18.29/compute/v2.1/servers/d40d62c5-f0d0-4485-8d76-eb84b2e94f86/action, The requested functionality is not supported. | 14:26 |
queensly[m] | Is this a known infra-related or intermittent issue? | 14:26 |
dtantsur | queensly[m]: it does look like something entirely unrelated to your patch at lease. | 14:27 |
TheJulia | Yeah, that seems... Odd | 14:27 |
queensly[m] | Yeah, I was wondering if a "recheck" could resolve it since it's not related to my patch. Or is there something I could do? | 14:29 |
TheJulia | Typically we would try to identify what is going on, why it is going on, but sometimes if it seems intermittent a recheck with additional text of the error from the compute service might be good, that does seem.. weird. | 14:30 |
queensly[m] | Alright. I will recheck and include the error message in the comment in case it helps spot any pattern. | 14:36 |
TheJulia | last call on https://review.opendev.org/c/openstack/ironic/+/954755 | 14:41 |
dtantsur | Intersting, Gemini was probably not wrong, just outdated. In the 3.12 version, GIL is released inside locks. | 14:44 |
TheJulia | There were a number of change around that, so entirely possible in the grand scheme of things | 14:51 |
dtantsur | TL;DR locks should release GIL, it's just hard to find proofs of that | 14:51 |
TheJulia | okay, you wanted me to put a sleep right before the thread got a node from the queue? | 14:54 |
TheJulia | running again with the restacked state i uploaded into gerrit yesterday | 14:54 |
dtantsur | Sigh, I don't know what I want any more... | 14:54 |
* TheJulia gives dtantsur a chocolate cookie | 14:54 | |
dtantsur | nomnom | 14:55 |
TheJulia | At least it is running with the latest state matching gerrit since we can't run any of that until we have a new futurist release | 14:55 |
TheJulia | and I'll check the spread after some meetings this morning | 14:56 |
dtantsur | TheJulia: I guess I'm still curious if changing time.sleep(0) to slightly higher numbers changes anything | 14:58 |
TheJulia | ack | 14:58 |
cardoe | I gave it +2 | 15:03 |
cardoe | So good gosh Claude loves trailing whitespace... https://review.opendev.org/c/openstack/sushy/+/955701 But anyway I would appreciate some feedback on the idea of a module like that. | 15:04 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 15:20 |
dtantsur | cardoe: whitespace everywhere indeed :D I wonder if instead of hardcoding with_vendorname_quirks we want to use something similar to oem plugins | 16:43 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955713 | 17:19 |
opendevreview | Verification of a change to openstack/ironic master failed: Switch from local RPC to automated JSON RPC on localhost https://review.opendev.org/c/openstack/ironic/+/954755 | 17:44 |
opendevreview | Merged openstack/ironic-ui master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-ui/+/954833 | 17:52 |
opendevreview | Merged openstack/ironic-ui master: add pyproject.toml to support pip 23.1 https://review.opendev.org/c/openstack/ironic-ui/+/952352 | 17:52 |
adamcarthur5 | TheJulia I apologies for not getting on the spec. I've been hit with a multi-country deadline in the PhD and upstream work on the other side. I will get to it! | 18:02 |
TheJulia | adamcarthur5: no worries! | 18:07 |
TheJulia | dtantsur: 12-14 minutes per check without removal and keeping the 9th worker in place, sleeps added, I'll let you know. In the mean time lunch() | 18:16 |
opendevreview | Verification of a change to openstack/bifrost master failed: Fix deployment image checksum validation https://review.opendev.org/c/openstack/bifrost/+/953080 | 19:00 |
opendevreview | Merged openstack/ironic-python-agent master: Vendor own option for tls cert file and key file https://review.opendev.org/c/openstack/ironic-python-agent/+/954182 | 19:02 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!