opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object https://review.opendev.org/c/openstack/ironic/+/954966 | 00:26 |
opendevreview | Merged openstack/ironic-python-agent master: Fix missing [mdns] options https://review.opendev.org/c/openstack/ironic-python-agent/+/954183 | 01:56 |
rpittau | good morning ironic! o/ | 06:10 |
queensly[m] | Good morning | 08:06 |
opendevreview | Merged openstack/ironic master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic/+/954844 | 08:35 |
tkajinam | rpittau, so giving it another thought I noticed the real problem triggered by python 3.9 removal is that it breaks compatibility with c9s, not ubuntu jammy, because c9s uses python 3.9 as its default. | 08:57 |
rpittau | tkajinam: yeah, you're right, I was wondering about jammy too | 08:58 |
rpittau | in any case, we're using python 3.12 on CS9 | 08:58 |
tkajinam | I'm unsure why https://review.opendev.org/c/openstack/bifrost/+/955061 does not break c9s jobs, but I see these jobs attempt to install ironic services in c9s (with py3.9 used). If these attempt to install master, not stable releases, then removing py39 support may kill these | 08:58 |
tkajinam | ah, ok | 08:58 |
tkajinam | hmm. I know we use py3.12 for devstack jobs but I wasn't sure if the same switch has been done for bifrost jobs | 08:59 |
tkajinam | that's what I was about to ask | 08:59 |
rpittau | tkajinam: nvm, I confused it with another thing, we actually tried to use Py 3.12 in bifrost but to no avail | 09:02 |
rpittau | we'll have to abandon CS9 and switch to CS10 when ready | 09:02 |
rpittau | we're pinning UC for Py3.9 compatibility for the time being | 09:03 |
rpittau | I think I'm going to move the cs9 jobs to non-voting | 09:23 |
tkajinam | rpittau, yeah or use 2025.1 branch for c9s jobs | 10:36 |
rpittau | tkajinam: or even both, we're not supposed to support cs9/py3.9 during this cycle anyway | 10:51 |
tkajinam | yeah | 10:51 |
dtantsur | TheJulia: if futurist relies on eventlet in code that is not explicitly called GreenSomething, it's a bug, and we can fix it | 11:12 |
dtantsur | (I think I still have +2 on futurist heh) | 11:12 |
opendevreview | Riccardo Pittau proposed openstack/bifrost master: Updated pinned upper-constraints for Python 3.9 https://review.opendev.org/c/openstack/bifrost/+/955181 | 11:17 |
rpittau | we'll have to start pinning jobs to cs10 compatible nodes, starting to see "Fatal glibc error: CPU does not support x86-64-v3" | 11:25 |
iurygregory | this is fine | 11:26 |
opendevreview | Verification of a change to openstack/ironic master failed: Advanced vmedia deployment test ops https://review.opendev.org/c/openstack/ironic/+/898010 | 12:10 |
TheJulia | dtantsur: so, the eventlet use looks like it could be excised. The thread exhaustion I'm seeing really makes me wonder :\ | 13:05 |
TheJulia | rpittau: are we seeing that leak into cs9 ? or are we trying to run cs10 now without config to pin the jobs to providers? | 14:34 |
opendevreview | Merged openstack/ironic master: Advanced vmedia deployment test ops https://review.opendev.org/c/openstack/ironic/+/898010 | 14:34 |
rpittau | TheJulia: I'm seeing that when trying to build a CS10 image on a noble node | 14:34 |
TheJulia | oh yeah, that makes sense | 14:34 |
TheJulia | and would be expected, we need to explicitly pin those nodes | 14:34 |
rpittau | unfortunately it does | 14:34 |
rpittau | yeah | 14:35 |
rpittau | btw I'm seeing this error when trying to run ironic with Python 3.12 "RLock(s) were not greened, to fix this error make sure you run eventlet.monkey_patch() before importing any other modules." | 14:36 |
rpittau | I believe it's a bug in an older version of eventlet, but if anyone has any hint that would be great :) | 14:36 |
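A minimal sketch of the import ordering that error message asks for, assuming a standalone entrypoint script rather than Ironic's actual command modules: the monkey patch has to run before anything else imports modules that create locks.

    # Minimal sketch, assuming a hand-rolled entrypoint (not Ironic's real cmd layout):
    # the monkey patch must run before any other import creates an RLock.
    import eventlet
    eventlet.monkey_patch()

    import threading  # noqa: E402  -- safe now; locks created from here on are greened

    lock = threading.RLock()
    with lock:
        print("RLock created after monkey_patch, so it is greened")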
TheJulia | yeah, the packaged version is too old | 14:43 |
TheJulia | we'll need to get that to ?0.30? | 14:43 |
TheJulia | I think | 14:43 |
cardoe | TheJulia: I had mentioned disabling a port due to a bad link or something to you the other day... it seems like maybe we should add that into this? https://review.opendev.org/c/openstack/ironic-specs/+/940861 | 14:47 |
rpittau | TheJulia: thanks, I think I will give it a try with the latest, we're currently running 0.33.1 | 14:55 |
TheJulia | cardoe: could you elaborate on what you mean? | 14:59 |
TheJulia | I may finally be escaping the shop soon too :) | 14:59 |
* TheJulia watches magical wash machine do its thing | 14:59 | |
TheJulia | okay, it looks like we are orphaning threads from the periodic worker launches, and the futurist code path just keeps creating new workers. I guess it was written for eventlet-loaded systems where it might orphan them or consider them abandoned, but the overall model is different now; it could be as simple as just calling stop on the current thread executor. I think what we need to do as a first step is actually save a name | 15:25 |
TheJulia | on each thread worker | 15:25 |
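One illustrative way to get a per-job name onto the executing thread for debugging, using only the standard library; run_named is a hypothetical wrapper, not a futurist API, and says nothing about how futurist itself would expose this.

    # Hedged sketch: tag the executing thread with the job it is running so
    # orphaned or idle workers are easier to identify in thread dumps.
    import threading

    def run_named(func, *args, **kwargs):
        thread = threading.current_thread()
        old_name = thread.name
        thread.name = f"{old_name}:{func.__name__}"  # visible in faulthandler/py-spy output
        try:
            return func(*args, **kwargs)
        finally:
            thread.name = old_name

    # hypothetical usage: executor.submit(run_named, sync_power_states, task)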
dtantsur | Orphaning threads, omg | 15:46 |
TheJulia | in the eventlet world, it's sort of entirely free. It does look possible to nuke a thread, although futurist doesn't quite like that and new periodics seem to not do so well :) | 15:48 |
TheJulia | but that is also me going "is this idle, if so, get rid of it" | 15:48 |
TheJulia | dtantsur: and that's combined with futurist's default model being "let's add a new thread" | 15:49 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object https://review.opendev.org/c/openstack/ironic/+/954966 | 15:58 |
* dtantsur dives into the futurist code | 16:04 | |
* dtantsur stares at vim in disbelief | 16:07 | |
dtantsur | TheJulia: well, it absolutely does grow to max_workers on each submission because ThreadWorkers only exit on shutdown | 16:08 |
TheJulia | oh, they never actually exit | 16:09 |
TheJulia | unless you *ask* them to exit | 16:09 |
dtantsur | they do not indeed | 16:09 |
TheJulia | and there are totally valid reasons there | 16:09 |
dtantsur | https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L154-L155 | 16:09 |
dtantsur | self._workers never shrinks | 16:09 |
TheJulia | indeed | 16:10 |
TheJulia | yup | 16:10 |
dtantsur | TheJulia: I highly suspect that the intention of MAX_IDLE_FOR here https://github.com/openstack/futurist/blob/master/futurist/_thread.py#L84 was to exit the thread once it's reached | 16:10 |
dtantsur | i.e. line 86 should be dropped | 16:11 |
TheJulia | quite possibly | 16:12 |
dtantsur | That, of course, can cause races with the growth logic | 16:12 |
dtantsur | A saner idea, probably, is to replace this logic https://github.com/openstack/futurist/blob/master/futurist/_futures.py#L154-L155 with something based on the queue size and allow shrinking | 16:14 |
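A rough sketch of the shrink-on-idle behaviour being proposed here. The names (MAX_IDLE_FOR, the workers list) mirror the futurist code linked above, but this is a simplified stand-in, not the actual implementation: the worker exits and removes itself once the idle timeout fires instead of looping forever.

    import queue

    MAX_IDLE_FOR = 1  # seconds a worker may sit idle before giving up

    def worker_loop(work_queue, workers, me):
        # Simplified stand-in for a thread worker loop: on idle timeout,
        # deregister and return so the pool can shrink.
        while True:
            try:
                work = work_queue.get(timeout=MAX_IDLE_FOR)
            except queue.Empty:
                workers.remove(me)
                return
            work()

The race mentioned above is that a submission can land just after the timeout fires, so the growth logic has to be willing to start a replacement worker when the queue is non-empty.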
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 16:14 |
dtantsur | I have a strong deja vu, I literally wrote https://github.com/cherrypy/cheroot/issues/190#issuecomment-2883903045 already | 16:15 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 16:18 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 16:22 |
dtantsur | TheJulia: while oslo folks are hopefully thinking, let's take a step back. Okay, the workers count will necessarily reach max_workers and stop growing. That should not prevent you from submitting more work though. | 16:26 |
dtantsur | I wonder if the slow-down is simply due to the crazy number of threads doing roughly nothing | 16:27 |
TheJulia | I think the other issue is that we're trying to create more workers by default to meet the new request, and then we drop into the backup worker path. I guess a starting point is to try and record a name to at least help us rationalize what is going on | 16:34 |
TheJulia | and then sort of make sure work is cleaning itself up | 16:34 |
TheJulia | *alternatively* we could just have a periodic which individually launches threads | 16:34 |
TheJulia | which would address power sync specifically and then close that out | 16:35 |
TheJulia | that might be awful though | 16:35 |
dtantsur | From the openstack-oslo discussion, maybe I just fix futurist.. | 16:38 |
dtantsur | Yeah, the reserved pool is probably biting us here | 16:39 |
dtantsur | Hmm or not. The rejection logic uses the queue size, not the workers number | 16:39 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955215 | 16:45 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 16:53 |
TheJulia | nah, it's the rejection upon trying to schedule new work that is biting us | 17:05 |
TheJulia | so what happens | 17:05 |
TheJulia | the power sync periodic starts | 17:06 |
TheJulia | it basically tries to launch 8 threads to work the queue | 17:06 |
TheJulia | futurist rejects it because it can't create 8 more | 17:06 |
TheJulia | so we get, for example 2 | 17:06 |
TheJulia | which then begins to create this overall state where okay, we were looping every 6-8 minutes reliably for power sync on say 5k nodes. Suddenly that is 13+ minutes | 17:07 |
TheJulia | because we only got like half the workers we wanted when we reach the end of the normal worker pool | 17:07 |
TheJulia | hopefully that makes sense | 17:07 |
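A standalone sketch of the rejection path being described, using futurist's public API (reject_when_reached and RejectedSubmission are real futurist names); the pool size, backlog limit, and node count here are made up for illustration and do not reflect Ironic's actual configuration.

    import futurist
    from futurist import rejection

    # reject new submissions once the backlog of queued work reaches 8
    executor = futurist.ThreadPoolExecutor(
        max_workers=8,
        check_and_reject=rejection.reject_when_reached(8),
    )

    def sync_power_state(node):
        pass  # stand-in for the per-node power sync call

    accepted = 0
    for node in range(20):  # pretend 20 nodes need a sync this pass
        try:
            executor.submit(sync_power_state, node)
            accepted += 1
        except futurist.RejectedSubmission:
            # the periodic only gets whatever capacity was left, so the rest
            # of the pass spills into later iterations and the loop time grows
            break
    print(f"accepted {accepted} of 20 submissions this pass")
    executor.shutdown()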
TheJulia | I'm going to go take some meds and hopefully hit the road shortly | 17:08 |
TheJulia | (in any event, I suspect futurist could likely pass/set a name on a thread as an optional argument; I'll give it a spin locally, most likely tomorrow, just so we can make debugging easier) | 17:08 |
dtantsur | TheJulia: safe travels! I don't think I understand why it rejects work when there is still capacity.. but it's a bit late, I need a fresh head for that | 17:16 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 17:17 |
TheJulia | Cool cool, sort of still trying to understand it myself | 17:27 |
cardoe | TheJulia: sorry. so that's in relation to knowing a port is bad, for example | 18:42 |
cardoe | we don't wanna schedule on a disconnected port | 18:42 |
cardoe | So a node has maintenance mode but a port does not. | 18:42 |
iurygregory | in case someone is interested in nic firmware updates https://review.opendev.org/c/openstack/ironic/+/953394 o/ | 19:39 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 19:46 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 21:30 |
opendevreview | Queensly Kyerewaa Acheampongmaa proposed openstack/sushy-tools master: Validate JSON content type before parsing manager PATCH requests https://review.opendev.org/c/openstack/sushy-tools/+/954945 | 22:28 |
opendevreview | Steve Baker proposed openstack/networking-generic-switch master: Add security group support to netmiko_sonic https://review.opendev.org/c/openstack/networking-generic-switch/+/955252 | 23:01 |