#openstack-watcher log

12:01:22 <dviroel> #startmeeting watcher
12:01:22 <opendevmeet> Meeting started Thu Oct  2 12:01:22 2025 UTC and is due to finish in 60 minutes.  The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:22 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:22 <opendevmeet> The meeting name has been set to 'watcher'
12:01:32 <dviroel> hi all o/
12:01:44 <dviroel> who is around today?
12:02:02 <jgilaber> o/
12:02:13 <morenod> o/
12:02:38 <amoralej> o/
12:02:43 <dviroel> courtesy ping: sean-k-mooney chandankumar rlandy
12:02:52 <dviroel> ok, let's start with today's meeting agenda
12:02:59 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L28 (Meeting agenda)
12:03:12 <dviroel> feel free to add your own topics to our agenda
12:03:26 <sean-k-mooney> o/
12:03:40 <dviroel> #topic Announcements
12:03:58 <dviroel> one quick announcement
12:04:04 <dviroel> which you should already know
12:04:06 <dviroel> OpenStack Flamingo is now released!
12:04:12 <dviroel> https://lists.openstack.org/archives/list/openstack-announce@lists.openstack.org/thread/7ANVEGX7NMVJ7A6ROGCPGXGC7NWQ4UBT/
12:05:07 <dviroel> that reminds us that we can get our backports to stable/2025.2
12:05:16 <dviroel> and continue with backports to older releases
12:05:27 <sean-k-mooney> yep that is very true
12:05:40 <dviroel> i also added that comment in reviews topic
12:06:20 <dviroel> any other announcement before we move on?
12:06:48 <dviroel> ok
12:06:49 <dviroel> #topic PTG schedule for Watcher
12:07:00 <dviroel> we started this discussion last week
12:07:18 <dviroel> we just need to define our schedule, so I can book our room
12:07:35 <dviroel> there were 2 proposals from last week
12:07:44 <dviroel> 1- Wed-Fri (13 UTC, 14 UTC, 15 UTC)
12:07:49 <dviroel> 2 - Tue-Thu (13 UTC, 14 UTC, 15 UTC)
12:08:04 <dviroel> both of them will conflict with other projects, like noa
12:08:09 <dviroel> s/noa/nova
12:09:00 <dviroel> I see that sean-k-mooney also proposes to go with Tue-Thu
12:09:11 <sean-k-mooney> 2 would be my preference yes
12:09:16 <sean-k-mooney> either are ok
12:09:22 <amoralej> +1 to Tue-Thu
12:09:24 <dviroel> this gives Fri as backup, or to join any other session
12:10:06 <dviroel> I am also ok with Tue-Thu, and we can still block the time slots to join other teams' discussions
12:10:37 <dviroel> so lets keep Tue-Thu (13 UTC, 14 UTC, 15 UTC) and if needed, we add Fri to the list
12:10:44 <jgilaber> +1 to option 2
12:11:45 <dviroel> #agree watcher ptg schedule to be booked: Tuesday, Wednesday and Thurdsays, 13 UTC, 14 UTC, 15 UTC
12:12:11 <dviroel> #undo
12:12:11 <opendevmeet> Removing item from minutes: #agreed watcher ptg schedule to be booked: Tuesday, Wednesday and Thurdsays, 13 UTC, 14 UTC, 15 UTC
12:12:29 <dviroel> #agreed watcher ptg schedule to be booked: Tuesday, Wednesday and Thursday, at 13 UTC, 14 UTC, 15 UTC
12:12:50 <dviroel> ok,  our planning etherpad is here:
12:12:53 <dviroel> #link https://etherpad.opendev.org/p/watcher-2026.1-ptg
12:13:12 <dviroel> please add any other topics that you plan to cover, with your name
12:13:31 <dviroel> i should work on setting the slots for each
12:14:10 <dviroel> #action dviroel to book ptg slots for watcher
12:14:20 <dviroel> ok, anything else in this topic?
12:14:45 <dviroel> #topic Unused code to handle Active/Active decision-engine services
12:14:55 <amoralej> that's mine
12:15:01 <dviroel> all yours
12:15:35 <amoralej> so, while digging into the A/A solution for decision-engine i found this https://github.com/openstack/watcher/blob/master/watcher/api/scheduling.py
12:16:08 <amoralej> that implements a basic approach to move audits to alive decision-engines when one is detected as dead
12:16:27 <sean-k-mooney> ya that's more of a stopgap than a real solution
12:16:35 <amoralej> yes
12:16:52 <sean-k-mooney> that does not mean we can't start with that but it should not be in the api
12:16:56 <amoralej> so, i have two questions, could this be a short-term fix?
12:17:06 <amoralej> if so, how/where should we run it?
12:17:22 <sean-k-mooney> if we add a new watcher-scheduler console script entry point that just runs this then yes
12:17:34 <amoralej> yes, that's what i was thinking
12:17:49 <sean-k-mooney> if we do that we can then build on that to build a real solution
12:18:30 <amoralej> that'd be fine, but in that case, we should also define what would be the long-term real solution (that may be the discussion in ptg)
12:18:51 <sean-k-mooney> i mentioned this in the ptg topic but my approach would be to have the scheduler dispatch work to the decision engine via an rpc to a shared queue that the decision engines plural listen on
12:19:02 <amoralej> another option would be to run it as a thread in the decision-engine, but i don't know if i like that
12:19:22 <sean-k-mooney> so decision engines would not poll and would not do any scheduling of the continuous audits at all
12:19:43 <sean-k-mooney> amoralej: we could but if we did that
12:19:46 <amoralej> decision-engine would just get a rpc at the time to execute it and would execute it, right?
12:19:54 <sean-k-mooney> then we would need to also have a distributed lock/memory view
12:20:05 <sean-k-mooney> so that only one of the services did the reshuffle
12:20:16 <sean-k-mooney> amoralej: yes that is my idea
12:20:20 <amoralej> yes, that's another question, we should only run one instance of the scheduler
12:20:27 <amoralej> i mean, in the short term solution
12:20:31 <amoralej> with the simple approach
12:20:35 <sean-k-mooney> we would remove the use of apscheduler entirely from applier and decision engine
12:20:39 <amoralej> or do some kind of locking
12:20:58 <sean-k-mooney> and make them purely workers that execute actions or audits in response to an rpc message
12:21:16 <sean-k-mooney> amoralej: we can also do a very simple form of leader election
12:21:24 <sean-k-mooney> i did that in nova's scheduler recently
12:21:43 <amoralej> based on something in db?
12:21:43 <sean-k-mooney> so all decision engines could run it in a new thread
12:22:07 <sean-k-mooney> yes i can find the patch and link it to you after the meeting
12:22:22 <sean-k-mooney> its a very simple approach based on a rendezvous hash
12:22:26 <amoralej> no, no, no problem, i was just thinking about whether we would need some other infra service
12:22:55 <amoralej> actually, i think something simple would be enough
12:23:03 <sean-k-mooney> https://github.com/openstack/nova/commit/e98393c5c26743ec4c862af3e1a4beaa7f2d174b
12:23:39 <amoralej> is it worth discussing and sending a patch for the 1st step intermediate solution before PTG or better to hold on and do the full discussion in PTG?
12:23:56 <sean-k-mooney> i think having a poc to look at is good
12:24:02 <sean-k-mooney> so if you have time why not
12:24:14 <sean-k-mooney> but if you look at https://github.com/openstack/nova/commit/e98393c5c26743ec4c862af3e1a4beaa7f2d174b#diff-ce71048fd132b0db262fad40554388acaa624bbb621e1141020c30840cdc1472R112
12:24:37 <amoralej> thanks
12:24:45 <sean-k-mooney> i literally just looked up the set of schedulers (in our case decision engines) filtered them by up and sorted them to make it stable
12:24:56 <sean-k-mooney> and then the first one does the work and the rest return early
12:25:10 <sean-k-mooney> we could do the same thing with this rebalance approach
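A minimal sketch of the "filter by up, sort, first one does the work" election described above. The Service class, is_leader and rebalance_if_leader are hypothetical names chosen for illustration; they are not taken from the nova commit or from watcher, and how the service list and its up/down state are fetched is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical view of a decision-engine service record."""
    host: str
    up: bool


def is_leader(my_host, services):
    """Return True if my_host is first among the 'up' services.

    Sorting by host name makes the choice stable, so every service
    that sees the same list agrees on the same leader.
    """
    alive = sorted(s.host for s in services if s.up)
    return bool(alive) and alive[0] == my_host


def rebalance_if_leader(my_host, services, rebalance):
    """Run the rebalance callable only on the elected leader."""
    if not is_leader(my_host, services):
        return False  # everyone else returns early
    rebalance()
    return True
```

Each decision engine could call rebalance_if_leader periodically from its own thread. If two services briefly disagree about who is up, the worst case in this sketch is a skipped or repeated rebalance until the next period.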
12:25:26 <amoralej> yes, i think that would be enough
12:25:42 <dviroel> +1
12:25:53 <amoralej> so, i understand we are fine with going for this intermediate solution before implementing a rpc based one
12:26:05 <sean-k-mooney> this basically relies on the fact that if it takes a few seconds for the leader to change it's fine because it will heal over time
12:26:06 <amoralej> which will likely need more changes
12:26:29 <sean-k-mooney> amoralej: i think yes. its better to make incremental progress
12:26:40 <sean-k-mooney> as long as that does not block a more complete solution down the road
12:26:52 <sean-k-mooney> i dont really see this as creating any tech debt
12:27:04 <amoralej> i tested the current approach by running a watcher-api standalone and it worked fine, i mean audits were rebalanced and the new decision-engine picked them up in the next execution
12:27:27 <amoralej> ack, thanks
12:27:47 <sean-k-mooney> i think we should proceed with that as an initial step but in the ptg we should discuss what the next step would be
12:27:50 <amoralej> from reporting pov, may i report this as a bug?
12:28:23 <sean-k-mooney> yes in that the functionality is not supported in the wsgi mode and we have deprecated the eventlet standalone version of the api
12:28:24 <amoralej> scheduling.APISchedulingService() is not executed when running as wsgi?
12:28:26 <sean-k-mooney> so its a regression
12:28:37 <amoralej> yep, agreed
12:28:47 <amoralej> so, i think that's it about this topic
12:28:50 <sean-k-mooney> and as such could be a bug. even if its a borderline feature.
12:28:57 <dviroel> amoralej: no it is not, afaik
12:29:40 <amoralej> i meant the bug would be "scheduling.APISchedulingService() is not executed when running as wsgi"
12:29:45 <amoralej> sorry i was not clear :)
12:30:09 <amoralej> that can be understood as a bug although borderline feature, as Sean said
12:30:23 <dviroel> i think that the functionality of migrating audits in the end
12:31:20 <dviroel> but indeed, this will be a great topic for  our PTG
12:31:21 <sean-k-mooney> i think dviroel meant no its not called in wsgi mode
12:31:32 <dviroel> yes ^
12:32:06 <amoralej> it's in the list
12:32:15 <dviroel> ok, thanks amoralej
12:32:32 <dviroel> ok, lets move to the next topic
12:32:44 <dviroel> we can cover this with more details at the ptg
12:32:49 <dviroel> #topic what about testing the end-to-end strategy execution as unit tests?
12:33:19 <amoralej> #link https://review.opendev.org/c/openstack/watcher/+/962784
12:33:27 <amoralej> i sent that WIP patch to get feedback
12:33:30 <dviroel> it is a patch that you just pushed, right amoralej ?
12:33:34 <amoralej> yes
12:34:07 <amoralej> i realized that at least in some of the strategies, our unit tests are focused on each method
12:34:09 <amoralej> which is fine
12:34:43 <amoralej> but, i was wondering if we should also run tests which execute the entire strategy for a predefined set of metrics and model
12:34:51 <amoralej> and check the resulting solution
12:35:08 <sean-k-mooney> amoralej: so technically that would not be a unit test
12:35:09 <amoralej> i called this end-to-end strategy testing (a better name may exist)
12:35:15 <amoralej> yes, that's my doubt
12:35:15 <sean-k-mooney> but it would be a functional test in nova parlance
12:35:27 <sean-k-mooney> and i want to build a functional test suite
12:35:35 <amoralej> it would allow us to test much more complex cases than we do in tempest
12:35:49 <sean-k-mooney> so im ok with this type of testing but there is more test setup we need to do to do it properly
12:35:52 <amoralej> testing with hundreds of computes and vms
12:36:04 <sean-k-mooney> yep
12:36:23 <amoralej> what would properly mean, from your pov?
12:36:47 <amoralej> in my patch the coverage scope is restricted to the strategy itself
12:37:06 <sean-k-mooney> one of the tenets of doing correct functional testing is you minimize any mocking of the watcher project but use fixtures for external services
12:37:43 <amoralej> you mean, simulate prometheus, nova, etc... and run watcher in "real mode" with  no mocks, right?
12:37:59 <sean-k-mooney> so the way you do that is you start a watcher api, decision engine and applier in the test using the oslo messaging in-memory message bus and sqlite
12:38:23 <sean-k-mooney> then your test actually calls the api with a post to trigger the audit
12:38:34 <sean-k-mooney> but you use fixtures to emulate the responses from nova etc
12:38:46 <sean-k-mooney> so yes
12:38:56 <sean-k-mooney> i think you have a middle ground
12:38:58 <amoralej> got it, what i proposed is more than unit tests but less than functional tests ...
12:39:02 <amoralej> exactly
12:39:02 <sean-k-mooney> so what i would suggest for now
12:39:17 <sean-k-mooney> is we add watcher.test.unit.scenario
12:39:37 <sean-k-mooney> but eventually i would like to have watcher.test.functional.*
12:39:46 <sean-k-mooney> which will do even less mocking
12:40:16 <amoralej> i like the idea of moving these intermediate tests to their own folder and classes
12:40:22 <amoralej> makes sense
12:40:36 <amoralej> i will
12:41:15 <jgilaber> there are similar tests to what amoralej proposed in the zone_migration https://github.com/openstack/watcher/blob/03073a1b0d8dacfc49b2d220a1120be381d831d1/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py#L678
12:41:21 <jgilaber> I don't know if also for other strategies
12:42:27 <amoralej> i checked the workload_balancing and i think some others, but yeah, not all
12:42:55 <sean-k-mooney> so as a general rule or guideline
12:43:06 <amoralej> i think it may deserve a full check on all the strategies, at least the non-experimental ones
12:43:25 <sean-k-mooney> unit tests should test one thing, they should mock any calls that are to functions in a different module and any functions in the current module that have side effects
12:43:49 <sean-k-mooney> that does not mean they have to test exactly one function but they should be small and targeted
12:44:02 <sean-k-mooney> having unit.scenario tests in their own folder
12:44:11 <sean-k-mooney> makes it clear that they are not following the normal pattern
12:44:18 <sean-k-mooney> and makes maintaining/reviewing them simpler
12:44:45 <sean-k-mooney> so if we want to add more scenario tests im fine with that
12:44:49 <jgilaber> +1 to moving these existing tests to dedicated folder
12:44:53 <amoralej> +1
12:45:06 <sean-k-mooney> but we should still write the simple unit tests to test the relevant functions on their own too
12:45:45 <amoralej> i also created a way to define the metrics we want to get for each host and instance, instead of having them hardcoded as we have today
12:45:50 <sean-k-mooney> dviroel: any input on ^
12:45:52 <amoralej> based on the uuid
12:46:08 <amoralej> it's not very elegant, tbh
12:46:19 <amoralej> but i'd like to get your feedback on that too
12:46:20 <sean-k-mooney> amoralej: cool that could become the basis for a test fixture in the future
12:46:42 <dviroel> i am not sure if .scenario or e2e are the best names in this situation, but I also don't have any other idea
12:46:57 <amoralej> https://review.opendev.org/c/openstack/watcher/+/962784/1/watcher/tests/decision_engine/model/gnocchi_metrics.py
12:47:02 <dviroel> but I am +1 on moving to a different directory
12:47:31 <amoralej> "ComputeNode hostname="hostname_1" uuid="Node_1_CPU_5_RAM_46" :)
12:47:54 <amoralej> i couldn't find a better way to add arbitrary metadata into the model
12:48:06 <sean-k-mooney> oh i thought you meant it was done via a map lookup
12:48:48 <sean-k-mooney> i was thinking more that in the test you would do something like metrics_fixture.register_metrics(uuid, {...})
12:48:54 <amoralej> currently, there are hardcoded values for specific compute names and instances in https://review.opendev.org/c/openstack/watcher/+/962784/1/watcher/tests/decision_engine/model/gnocchi_metrics.py
12:49:07 <sean-k-mooney> and then when you used the metrics client it would return those metrics etc
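A rough sketch of the map-lookup idea above, assuming a fixture object that tests populate per resource uuid and that a fake metrics client reads from. MetricsFixture, FakeMetricsClient and get_metric are hypothetical names; they are not the watcher datasource API or anything in the patch under review.

```python
class MetricsFixture:
    """Hypothetical store of per-resource metrics registered by a test."""

    def __init__(self):
        self._metrics = {}

    def register_metrics(self, uuid, metrics):
        # Merge into any metrics already registered for this uuid.
        self._metrics.setdefault(uuid, {}).update(metrics)

    def lookup(self, uuid, name):
        return self._metrics[uuid][name]


class FakeMetricsClient:
    """Hypothetical stand-in for a metrics client used by a strategy."""

    def __init__(self, fixture):
        self._fixture = fixture

    def get_metric(self, uuid, name, default=None):
        # Return the registered value, or a default for unknown entries.
        try:
            return self._fixture.lookup(uuid, name)
        except KeyError:
            return default
```

In a test this would let each case declare its own values, for example fixture.register_metrics("node-1", {"cpu_util": 80.0}), instead of encoding them in the uuid or hardcoding them in a shared helper.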
12:49:38 <amoralej> i can look for a better way, i can think of something like that
12:49:46 <sean-k-mooney> so we dont need to design this now
12:49:51 <sean-k-mooney> before we move on
12:50:07 <sean-k-mooney> one thing i wanted to do to make room for experiments and improvements
12:50:07 <dviroel> we may want to have a topic to discuss more about our unit tests and functional tests at the PTG - in case someone wants to take it
12:50:11 <amoralej> but i liked the idea of having both the metrics and model defined together in that xml
12:50:21 <amoralej> i will add it dviroel
12:50:24 <sean-k-mooney> is move all the tests from watcher/tests/ to watcher/tests/unit/
12:50:36 <amoralej> but anyone can take it
12:50:42 <dviroel> amoralej: nice, thanks
12:50:53 <sean-k-mooney> because i want to add watcher/tests/functional and watcher/tests/watcher_fixtures/
12:51:23 <sean-k-mooney> we can discuss that more in the ptg too but does ^ sound ok to folks
12:51:31 <dviroel> makes sense
12:51:43 <jgilaber> +1
12:51:48 <amoralej> sounds good
12:52:22 <sean-k-mooney> cool anything more on this topic for today?
12:52:23 <dviroel> ok, lets move on, and add more feedback in the patch
12:52:28 <amoralej> not from my side
12:52:31 <dviroel> ack
12:52:50 <dviroel> tks amoralej
12:52:59 <dviroel> #topic Reviews
12:53:08 <dviroel> there is nothing new there
12:53:22 <dviroel> I just added a reminder to our 2025.2 open backports
12:53:29 <dviroel> #link https://review.opendev.org/q/project:openstack/watcher+branch:stable/2025.2+is:open
12:53:47 <sean-k-mooney> i guess one minor update
12:53:57 <dviroel> yes
12:54:01 <sean-k-mooney> i approved the watcher-spec patch to create the 2026.1 folder yesterday
12:54:20 <sean-k-mooney> so if we need to create them that's not possible
12:54:23 <sean-k-mooney> *now
12:54:44 <sean-k-mooney> thanks dviroel for proposing that
12:55:11 <dviroel> np, we also agree at some point, for the next release, we would split that change
12:55:20 <dviroel> and create the 2026.2 earlier
12:55:35 <dviroel> so we can decide to move specs to next release earlier too
12:55:54 <sean-k-mooney> right we should create the new folder at m2 and do the approved -> implemented move at m3
12:56:01 <dviroel> +!
12:56:04 <dviroel> +1
12:56:25 <dviroel> any other review that someone wants to bring to this meeting?
12:56:39 <dviroel> there are some open changes in watcher-tempest-plugin
12:56:47 <dviroel> but all under review I think
12:56:52 <sean-k-mooney> jgilaber: i think you had 3?
12:57:21 <dviroel> #link https://review.opendev.org/q/project:openstack/watcher-tempest-plugin+status:open
12:57:37 <jgilaber> I have 2 currently in the tempest plugin
12:57:44 <dviroel> I have been working on some refactoring too:
12:57:46 <jgilaber> one was merged
12:57:47 <dviroel> #link https://review.opendev.org/q/topic:%22organize_tests%22
12:57:51 <sean-k-mooney> and there is an nfs patch too?
12:58:17 <jgilaber> not upstream
12:58:32 <sean-k-mooney> ah. so going to dviroel's series
12:58:47 <dviroel> tks
12:59:18 <sean-k-mooney> so the first is just reorganising where tests are located to group them more logically
12:59:18 <dviroel> we only have 1 minute
12:59:35 <amoralej> i already commented in https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/960310, i think that refactor is good
12:59:44 <amoralej> thanks for taking care of it
12:59:46 <dviroel> yes, there are duplicated tests
12:59:58 <dviroel> the ones in test_execute_strategies
13:00:34 <sean-k-mooney> ya i see you are also changing the decorators.idempotent_id
13:00:51 <sean-k-mooney> so normally we dont change that if we are renaming a test
13:00:58 <dviroel> since there are duplicated tests, I was trying to keep one of them
13:01:37 <dviroel> so it is possible that is moving the test and the id
13:01:48 <sean-k-mooney> yep that is fine but we should keep one of the two idempotent_id
13:01:54 <dviroel> but requires some review on that too yes
13:02:03 <dviroel> to make sure that is correct
13:02:08 <sean-k-mooney> on the downstream side those IDs are what is used to track things in polarion
13:02:34 <sean-k-mooney> so that when we go from version to version even if the test is renamed we have continuity of the test cases
13:02:49 <morenod> try to keep the latest tests, they are the ones being executed and tracked in polarion
13:03:06 <dviroel> good point, i will double check that
13:03:17 <dviroel> thanks for raising this
13:03:20 <opendevreview> Joan Gilabert proposed openstack/watcher-tempest-plugin master: Test zone migration volume and compute migrations  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/962702
13:03:39 <dviroel> ok, we are over time
13:03:58 <dviroel> please add your questions and comments in the patch, I will get to them asap
13:04:02 <dviroel> anything else?
13:04:29 <dviroel> ok, we will meet again next week
13:05:02 <dviroel> thank you all for participating
13:05:06 <dviroel> #endmeeting