| *** jgilaber_ is now known as jgilaber | 08:54 | |
| opendevreview | Douglas Viroel proposed openstack/watcher master: [WIP] Add optional lock-free mode for cluster data models https://review.opendev.org/c/openstack/watcher/+/987492 | 12:08 |
|---|---|---|
| opendevreview | Douglas Viroel proposed openstack/watcher master: [WIP] Add optional lock-free mode for cluster data models https://review.opendev.org/c/openstack/watcher/+/987492 | 12:14 |
| dviroel | amoralej: ^ | 12:18 |
| dviroel | seems to improve a lot, at least on my env, was ~89% faster | 12:18 |
| amoralej | cool! in the watcher_stabilization case? | 12:23 |
| amoralej | we discussed that in the past but i didn't think that'd be such a great improvement for that particular one | 12:24 |
| dviroel | amoralej: yes, i was testing with workload_stabilization | 12:32 |
| amoralej | that's nice, that would automatically improve all strategies | 12:33 |
| amoralej | I'll give it a try in a local env | 12:37 |
| sean-k-mooney | dviroel: we need to be very very careful if we allow not taking a lock | 12:59 |
| sean-k-mooney | i have not reviewed your WIP yet | 13:00 |
| sean-k-mooney | but we need to ensure it's correct in all usages | 13:00 |
| sean-k-mooney | my inclination is to say instead of adding a nolock version we should assess why the locks were there in the first place | 13:02 |
| sean-k-mooney | skimming it i'm borderline -2 because the use of locks is encoded as a private internal on the model root | 13:03 |
| sean-k-mooney | that is very non-obvious and will likely cause issues with debugging in the future | 13:03 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher stable/2026.1: Add debug logging for host rejection in workload_balance https://review.opendev.org/c/openstack/watcher/+/987497 | 13:06 |
| dviroel | yeah, agree, we need to make sure that we only use a no-lock version in very specific cases, which may need more investigation as part of this solution | 13:06 |
| dviroel | i would say that the lock is originally there since there are two different threads that may change the models: the collectors and the notifications | 13:07 |
| sean-k-mooney | if we know it's safe to call a specific function without a lock i would prefer to have a locked and an unlocked version | 13:07 |
| dviroel | sean-k-mooney: ack, we can discuss a different approach instead of the property | 13:08 |
| amoralej | actually, it may be a good opportunity to extend documentation about how to use the model in the strategies, | 13:08 |
| dviroel | amoralej: valid point | 13:08 |
| sean-k-mooney | well even in eventlet mode you still need to lock shared data structures like that | 13:08 |
| sean-k-mooney | so it's not so much that we have multiple threads but concurrent access | 13:10 |
| sean-k-mooney | @lockutils.synchronized("model_root") feels very heavy for this usecase | 13:11 |
| dviroel | yes, that's what I meant to say in the end | 13:11 |
| sean-k-mooney | @lockutils.synchronized("model_root") is a filesystem lock | 13:11 |
| amoralej | locking for the shared model, which is updated by the notification and collector, while not locking the in-audit execution copies looks like a good starting point, and that's what the patch implements | 13:12 |
| sean-k-mooney | it would be better to have a lock in the instance (self.model_lock) and have a decorator that acquires the lock for the current instance of the object | 13:12 |
| sean-k-mooney | then when we make a copy they can have separate locks | 13:12 |
| amoralej | but yep, needs to be treated carefully ... | 13:12 |
| dviroel | indeed a filesystem lock is expensive | 13:13 |
| sean-k-mooney | amoralej: right but we are making a copy so that it won't be updated by the collector | 13:13 |
| sean-k-mooney | the issue here is that because it's a filesystem lock each copy still shares the lock | 13:13 |
| amoralej | we don't want the collector to update a copy of the model which is in use | 13:13 |
| amoralej | i mean by an audit | 13:13 |
| sean-k-mooney | if we make the lock per instance of the object that won't be a problem any more | 13:13 |
| amoralej | ah, sorry, about that, yep, makes sense | 13:14 |
| sean-k-mooney | amoralej: right but we can handle that by doing a CAS (compare-and-exchange) when updating the shared instance | 13:14 |
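The compare-and-exchange update mentioned here can be sketched in plain Python. This is a hypothetical illustration of the idea, not Watcher's actual collector code; `SharedModelRef`, its version counter, and the method names are all invented for the example:

```python
import threading

class SharedModelRef:
    """Holds the current shared model; a writer's update is applied only
    if it saw the latest version (compare-and-swap style)."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._model = model
        self._version = 0

    def get(self):
        # Return the current model together with its version token.
        with self._lock:
            return self._model, self._version

    def compare_and_set(self, new_model, expected_version):
        # Replace the model only if nobody updated it in the meantime;
        # a stale writer gets False back and must re-read and retry.
        with self._lock:
            if self._version != expected_version:
                return False
            self._model = new_model
            self._version += 1
            return True

ref = SharedModelRef({"hosts": 1})
model, version = ref.get()
print(ref.compare_and_set({"hosts": 2}, version))  # True: version matched
print(ref.compare_and_set({"hosts": 3}, version))  # False: stale version
```

The point of the CAS is that readers (audit strategies working on a copy) never block writers; only concurrent writers to the shared instance have to coordinate, and the loser simply retries.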
| amoralej | i didn't know @lockutils.synchronized was a filesystem lock tbh | 13:14 |
| sean-k-mooney | yes it's effectively a full inter-process level lock | 13:15 |
| sean-k-mooney | https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L469 | 13:16 |
| sean-k-mooney | to be fair that depends on whether external is set to true or not | 13:18 |
| sean-k-mooney | it is in nova but the default in oslo is false | 13:18 |
| sean-k-mooney | so this might not be a file-level lock but we are still sharing it when we should not be because of how this is implemented | 13:18 |
| dviroel | yeah, I see your point, all copies would be sharing the same lock, which doesn't make too much sense | 13:21 |
| sean-k-mooney | it won't be that hard to create a new decorator to fix this | 13:24 |
| sean-k-mooney | but that's how i would do it: put the lock in the instance of the data model and have the decorator acquire that via self | 13:24 |
| sean-k-mooney | that will allow us to use the same pattern of decorating the methods, and if a single instance of the object is ever shared across threads or greenlets it will just work | 13:25 |
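A minimal sketch of the per-instance-lock decorator described above, assuming a deepcopy hook gives each copy a fresh lock. `synchronized_on_instance`, `ModelRoot`, `_lock`, and `add_node` are hypothetical names for illustration, not Watcher's actual code:

```python
import copy
import threading
from functools import wraps

def synchronized_on_instance(func):
    """Acquire the lock stored on the instance itself, so each copy of the
    model synchronizes independently instead of sharing one named lock."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return func(self, *args, **kwargs)
    return wrapper

class ModelRoot:
    def __init__(self):
        self._lock = threading.RLock()  # RLock: decorated methods may nest
        self._nodes = {}

    @synchronized_on_instance
    def add_node(self, name, data):
        self._nodes[name] = data

    def __deepcopy__(self, memo):
        # A fresh instance gets a fresh lock; only the data is copied,
        # so an in-audit copy never contends with the live model.
        new = ModelRoot()
        new._nodes = copy.deepcopy(self._nodes, memo)
        return new

live = ModelRoot()
live.add_node("compute-0", {"vcpus": 8})
audit_copy = copy.deepcopy(live)
print(live._lock is audit_copy._lock)  # False: copies no longer contend
```

With this pattern the methods keep the same decorator-based shape, but contention only exists between callers of the *same* object, which matches the point about a shared instance across threads or greenlets "just working".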
| amoralej | anyway, aren't they two different issues?, 1. where to lock and where to non-lock, and 2. what's the best to way to lock and the scope when locking | 13:25 |
| sean-k-mooney | no need to opt in or out | 13:25 |
| sean-k-mooney | amoralej: well yes and no; if there is no contention on a lock it's basically free | 13:26 |
| sean-k-mooney | so if we stop sharing the locks between copies i suspect the majority of the performance impact will go away | 13:26 |
| amoralej | we could easily test it | 13:26 |
| dviroel | we would need to test and get some results | 13:26 |
| amoralej | yep | 13:26 |
| sean-k-mooney | give claude the irc logs and ask it to create a poc | 13:26 |
| sean-k-mooney | :) | 13:27 |
| amoralej | but not locking if it's not needed sounds also good to me :) | 13:27 |
| sean-k-mooney | yep that needs more thought | 13:27 |
| sean-k-mooney | i would be reluctant to backport the lock removal | 13:27 |
| sean-k-mooney | changing to a better lock i think is much simpler to justify | 13:28 |
| sean-k-mooney | so we can do both | 13:28 |
| amoralej | that's right | 13:28 |
| dviroel | i'll give it a try | 13:31 |
| sean-k-mooney | i assume this is related to the decision engine scaling work or the eventlet removal work? | 13:34 |
| dviroel | related to the scaling work | 13:37 |
| dviroel | this LP https://bugs.launchpad.net/watcher/+bug/2141951 | 13:37 |
| opendevreview | Ivan Anfimov proposed openstack/watcher-dashboard master: Update Babel configuration https://review.opendev.org/c/openstack/watcher-dashboard/+/987518 | 14:09 |
| opendevreview | Ivan Anfimov proposed openstack/watcher-dashboard master: Update Babel configuration https://review.opendev.org/c/openstack/watcher-dashboard/+/987518 | 14:10 |
| dviroel | sean-k-mooney: amoralej: so yeah, it seems that replacing the lockutils.synchronized with a simple instance lock already improves the execution, to an execution time close to the no-lock proposal | 15:07 |
| amoralej | cool, will you send a separate review with that? I may test both | 15:09 |
| dviroel | amoralej: yes | 15:10 |
| sean-k-mooney | sweet | 15:17 |
| dviroel | btw, sean-k-mooney, if you have some time today, would you like to take a look at https://review.opendev.org/c/openstack/watcher/+/986486? | 15:23 |
| sean-k-mooney | im reviewing the tempest tests first | 15:31 |
| sean-k-mooney | but yes i have that open | 15:31 |
| opendevreview | Douglas Viroel proposed openstack/watcher master: Replace global lockutils locks with per-instance RLock https://review.opendev.org/c/openstack/watcher/+/987540 | 15:36 |
| dviroel | forgot to WIP it | 15:38 |
| dviroel | amoralej: ^ | 15:39 |
| dviroel | claude thinks that it is ready for production XD | 15:39 |
| amoralej | np it may be! :) | 15:41 |
| amoralej | add a -W if you prefer not to resend with wip | 15:41 |
| sean-k-mooney | i mean obviously just ship it, what's the worst that can happen when changing concurrency primitives | 15:43 |
| sean-k-mooney | i wonder if teim will agree with claude while using claude (with glm) to review what claude did :) | 15:43 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher stable/2026.1: Add debug logging for host rejection in workload_balance https://review.opendev.org/c/openstack/watcher/+/987497 | 16:26 |
| sean-k-mooney | dviroel: i won't push my comments on https://review.opendev.org/c/openstack/watcher-specs/+/969840 this evening because they are from PS 7 pre-PTG, but i'll see if i can review it this week | 17:17 |
| dviroel | ok, thanks sean-k-mooney | 17:18 |
| sean-k-mooney | i also finished reviewing the bfv code. the code itself looks reasonable but there are gaps in the testing, at least on the tempest side | 17:18 |
| sean-k-mooney | there is no end-to-end bfv coverage with an actual cold/live migration to confirm watcher actually calls nova correctly, nor is there any testing with flavors with non-zero swap/ephemeral | 17:20 |
| sean-k-mooney | i'm going to grab dinner | 17:21 |
| dviroel | indeed, i was assuming the migration would behave the same as the existing ones, but we don't have any bfv coverage in tempest; we need that before we actually say it is supported | 17:36 |
| dviroel | the test is creating the instance, but not migrating it | 17:37 |
| opendevreview | Merged openstack/watcher-dashboard master: test: Restructure test directory with unit/ and local_fixtures/ https://review.opendev.org/c/openstack/watcher-dashboard/+/978257 | 18:19 |
Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!