12:01:02 <dviroel> #startmeeting watcher
12:01:02 <opendevmeet> Meeting started Thu Oct  9 12:01:02 2025 UTC and is due to finish in 60 minutes.  The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:02 <opendevmeet> The meeting name has been set to 'watcher'
12:01:16 <dviroel> hi all, lets see who is around today
12:01:22 <jgilaber_> o/
12:01:30 <chandankumar> o/
12:01:46 <amoralej> o/
12:02:18 <dviroel> courtesy ping: sean-k-mooney morenod rlandy
12:02:24 <dviroel> let's start with today's meeting agenda
12:02:37 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L27 (Meeting agenda)
12:02:42 <dviroel> feel free to add your own topics to the agenda
12:02:56 <sean-k-mooney> o/
12:03:04 <dviroel> there are topics to place your changes that requires attention from reviewers
12:03:18 <dviroel> lets start with
12:03:24 <dviroel> #topic Announcements
12:03:38 <dviroel> PTG Schedule
12:03:46 <dviroel> #link https://ptg.opendev.org/ptg.html
12:04:01 <dviroel> based on our decisions from last meeting
12:04:15 <dviroel> i booked Tue, Wed and Thu for watcher
12:04:29 <dviroel> 13 - 15 UTC
12:04:54 <dviroel> one week before, I plan to organise our topics
12:05:25 <dviroel> if you have any topic that you want to discuss, please add it to our PTG etherpad
12:05:31 <dviroel> #link https://etherpad.opendev.org/p/watcher-2026.1-ptg
12:05:32 <sean-k-mooney> 13 - 15 UTC being an incluse range so we have 3 hours each day
12:05:45 <dviroel> correct
12:07:27 <sean-k-mooney> there are sever topic i think can be merged togther into sessions
12:08:03 <sean-k-mooney> i.e. many of the curernt topic have similar theme so we can group the related topics toghther
12:08:31 <dviroel> agreed
12:08:46 <amoralej> +1
12:08:52 <dviroel> I plan to group similar theme
12:09:05 <dviroel> and also merge some of the topics that I plan to bring
12:09:19 <sean-k-mooney> we should invite the telemetry team to https://etherpad.opendev.org/p/watcher-2026.1-ptg#L35 : Future of datasouces backends and untested integrations
12:09:36 <amoralej> I added one about "Improving testing coverage for strategies", that should be in the CI testing/coverage block i think
12:09:44 <sean-k-mooney> they are nto required but they may have input into the Aetos vs promethose topic
12:09:58 <dviroel> ack
12:10:07 <sean-k-mooney> there are at least 3 diffent ci/testing topics
12:10:33 <sean-k-mooney> and i wote 2 specs/impelation plans with ai over the last few days to help with that as well
12:11:09 <dviroel> so yeah, for now, we can just drop topics there in the etherpad
12:11:13 <amoralej> I just moved the one i created
12:11:30 <dviroel> we can then group them and find a slot for it
12:11:31 <amoralej> thanks Sean for adding the link to your spec there
12:12:01 <chandankumar> Do we want to merge watcher-dashboard ci topic under other CI topics?
12:12:38 <jgilaber> I think the dashboard case might be different enough to merit being a separate topic
12:12:58 <jgilaber> because of the stack used (django, html, etc)
12:13:00 <dviroel> yeah, but we can have both in a sequence
12:13:17 <chandankumar> +1 to keep in sequence
12:13:25 <jgilaber> +1
12:13:30 <amoralej> lgtm
12:14:39 <dviroel> ack, anything else related to the vPTG?
12:15:00 <dviroel> any other announcement?
12:15:23 <dviroel> #topic Moving decision-engine monitoring to the decision-engine service
12:15:33 <dviroel> hey amoralej, seems that this one is yours
12:15:37 <amoralej> yes
12:15:55 <dviroel> #link https://review.opendev.org/q/topic:%22bug-2126767%22
12:15:56 <amoralej> #link https://review.opendev.org/q/topic:%22bug-2126767%22
12:15:59 <dviroel> :)
12:16:00 <amoralej> damn :)
12:16:23 <amoralej> so, i sent a couple of patches related to moving the decision engine monitoring to the decision-engine service
12:16:35 <amoralej> I have several questions
12:17:05 <amoralej> I moved it from the watcher-api to the decision-engine inconditionally
12:17:16 <amoralej> i mean, no deprecation, no config parameter, etc...
12:17:29 <amoralej> i think this is transparent for the users
12:17:34 <amoralej> and even to the admin
12:17:48 <amoralej> so i don't see the need to do something more complex than adding a release note
12:17:51 <amoralej> wdyt ?
12:18:56 <dviroel> in the end you are really fixing a bug
12:19:02 <amoralej> an user running watcher-api as a wsgi will get a bug fixed when updates, an user running eventlet based watcher-api will keep same functionality but in a different service
12:19:19 <jgilaber> iiuc the monitor service is currently not running by default right?
12:19:22 <amoralej> yes, that's also my opinion
12:19:39 <amoralej> it is running when running watcher-api as a standalone service only
12:19:44 <amoralej> not as wsgi server
12:20:12 <jgilaber> we don't have any env like that in ci right?
12:20:55 <dviroel> the standalone? i don't think so
12:21:01 <sean-k-mooney> sorry had to leave for a phone call
12:21:03 <sean-k-mooney> back
12:21:17 <amoralej> np
12:21:27 <amoralej> right, devstack runs it with uwsgi iiuc
12:21:43 * dviroel look at ci failure
12:22:00 * dviroel not related
12:22:06 <jgilaber> it doesn't look like we lose any functionality to me
12:23:04 <sean-k-mooney> well the watcher-api evently proicess is due to be delete perhaps even this cycle althoug definlty in 2026.2
12:23:34 <sean-k-mooney> so i think this is not really a regression. however we coudl allow it to work form either location
12:23:57 <amoralej> yeah, but then we'd have to add parameters, etc... dunno if it's worthy tbh
12:24:18 <amoralej> that was my doubt
12:24:52 <sean-k-mooney> ack
12:24:54 <amoralej> if we would allow to run in either or both, we'd need to include the api also in the leader election process etc..
12:25:02 <sean-k-mooney> we may or may not want to have a config opiont
12:25:06 <dviroel> to keep the current behavior, you would need to add a config to enable/disable the functionality in dec-eng, with the later disabled by default
12:25:10 <sean-k-mooney> *option
12:25:48 <sean-k-mooney> right so we likely would want to have the config option in either case
12:25:56 <amoralej> yes, that'd be like the way to keep current behavior by default and adding the new one
12:26:05 <sean-k-mooney> we can have 1 that is shared between both services
12:26:22 <amoralej> but watcher-api is not registeres as a service in watcher
12:26:27 <amoralej> i mean, for the leader election
12:26:37 <sean-k-mooney> ah that is a good point
12:26:53 <sean-k-mooney> the implemtin woudl not be identical althogu i guess it does not have to be
12:27:12 <sean-k-mooney> we could copy rather then move the currnt code and deprecate teh nova-api version
12:27:13 <amoralej> the question is if it's worthy
12:27:21 <sean-k-mooney> again im not sure its worht the effort either
12:27:46 <amoralej> yes, that'd be an option sean-k-mooney, but again if an user runs it in both sides, may have unexpected
12:27:49 <sean-k-mooney> so i think we can just deprecate teh nova-api version and docuemtn it and remove it
12:28:17 <amoralej> actually, i realized i added it as a fix in releasenote, and may be also deprecated
12:28:32 <sean-k-mooney> well this is also not a bug
12:28:41 <sean-k-mooney> its a feature so it shoud have a bluepritn at a minium
12:28:54 <amoralej> we discussed in lat minute it could be a bug
12:29:00 <amoralej> last meeting, i meant
12:29:21 <amoralej> given that it's regression on moving from eventlet to wsgi
12:29:31 <sean-k-mooney> it coudl be a whislise bug to minize paper work but its not somethign we woudl consider backporting upstream
12:29:59 <amoralej> ah, ok, i wasn't thinking in backporting at this point tbh
12:30:40 <sean-k-mooney> i guess we can move forward with thsi and dicuss this more in the implementaion
12:30:40 <dviroel> yeah, doesn't seems safe to backport
12:30:43 <amoralej> let's keep the discussion about backporting as a separate one
12:30:52 <amoralej> ok
12:31:12 <amoralej> so next question is, i added the leader election in spearate review, dunnow if you prefer me to squash?
12:31:54 <amoralej> i found easier to review 1. move 2. add leader election
12:32:05 <amoralej> but can squash if you prefer
12:32:27 <jgilaber> +1 to have two patches, they are easier to review and smaller
12:32:45 <sean-k-mooney> i think either is fien it woueld be more correct to squash but we can just have a single release note for both patches and update it in the second one to note it is now safe to enable if you have more then one desion engine
12:33:10 <amoralej> i like that
12:33:14 <dviroel> ok to split if the first one works properly without the second one
12:33:34 <sean-k-mooney> the first patch shoudl have a cofnig option to opt into it and say this is currently only supproted on one desion engine at a time and you remove that limitation in the second patch
12:33:35 <amoralej> that's the point, the first one only may be problematic if running multiple dec-eng
12:34:05 <amoralej> then i will squash :)
12:34:37 <sean-k-mooney> ok well the other thing is did you look at why watcher-prometheus-integration-threading failed
12:34:59 <sean-k-mooney> ah it was just 1 test
12:34:59 <dviroel> yes, just checked
12:34:59 <amoralej> i think it's unrelated, it's a tempest test failed
12:35:02 <sean-k-mooney> so likely not relatd
12:35:18 <amoralej> but yeah, i will check why that on e failed
12:35:27 <dviroel> with the recent refactoring on tempest
12:35:32 <jgilaber> that job seems a bit flaky, I've seen it fail in a different patch https://zuul.opendev.org/t/openstack/builds?job_name=watcher-prometheus-integration-threading&project=openstack/watcher
12:35:41 <sean-k-mooney> we should look at enabling a second desion engine in the job also to test this properly
12:35:48 <dviroel> i may see some test that sometimes don't generate a Recommended action plan, and will fail
12:35:59 <amoralej> that's another question sean-k-mooney, lemme go to that at the end :)
12:36:03 <dviroel> so we would need to improve these tests
12:36:14 <dviroel> to avoid failures like that
12:36:29 <amoralej> another question i have, current code only takes care of audit migrations when there is a change in the state of a decision-engine. It saves some api calls and processing but may have corner cases
12:36:40 <sean-k-mooney> dviroel: ya i have notice at least one test failing semi often because of not actions in the plan
12:36:54 <amoralej> i.e. decision-engine-0 dies while decision-engine-1 is not running. Then decision-engine-1 is started
12:37:34 <sean-k-mooney> well the way this shoudl work is 1 the perodic shoudl run on agent start and 2 it shoudl set that it is the leader
12:37:36 <amoralej> if a decision-engine is detected as failed at start, then it will not look for assigned audits to it
12:37:48 <sean-k-mooney> so decision-engine-1 on start shoudl rebalance
12:37:59 <amoralej> not with current implementation
12:38:07 <sean-k-mooney> right but that need to be fixed in the first patch
12:38:08 <amoralej> current = the one in watcher-api
12:38:21 <amoralej> ack, I will, i was unsure that was a bug or a feature :)
12:38:42 <sean-k-mooney> well i was leaning betwen blueprint and specless blueprint so im definlty in hte feature camp for this over all
12:38:50 <dviroel> so, maybe another reason to consider a new implementation in dec-eng, and deprecate the api version one
12:39:02 <sean-k-mooney> im ok with it as a bug because of the eventlet removal btu if we are going to supprot this protperly in the descion engin
12:39:18 <sean-k-mooney> i dont wnat to claim supprot for this until the know issuee are adressed
12:40:22 <sean-k-mooney> that does nto mean it all has to happen in one patch
12:40:34 <sean-k-mooney> but it does mean that the first patch in its current form should not have Closes-Bug: #2126767
12:40:41 <sean-k-mooney> it should have partial-bug
12:40:56 <amoralej> i was thinking in squashing
12:41:28 <amoralej> would it be fine if i squash both patches + fix the logic in the new implementation for the issue i mentioned?
12:42:14 <dviroel> hum
12:42:50 <amoralej> feel free to propose what'd be better
12:43:21 <dviroel> to  use the same bug, would be good to add more details to the LP
12:43:49 <amoralej> or i can treat the second issue as a separate bug
12:44:15 <sean-k-mooney> so you coudl fix the issue of not handling the down service in the api version first
12:44:25 <sean-k-mooney> before the prot/refactor that your are doing
12:44:52 <amoralej> ah, that'd be another option... right
12:46:02 <dviroel> split the issues in two LP seems better imho
12:46:05 <amoralej> report it as a new bug (bug2), 1st review: fix bug2 in watcher-api, 2nd review: move + leader election
12:46:25 <sean-k-mooney> that sound reasonable to me +1
12:47:10 <dviroel> ok, amoralej anything else in this topic? I think that we can get more feedbacks in the reviews
12:47:22 <amoralej> wrt testing
12:47:29 <dviroel> sure
12:47:37 <amoralej> dunno if today or in ptg
12:47:50 <amoralej> how/if we test high availability upstream
12:48:16 <amoralej> but, we are short on time, we can discuss that in next meeting or ptg, no problem
12:48:24 <dviroel> i think that we can discuss at the ptg? it may require more time to discuss
12:48:30 <amoralej> yeah +1
12:48:40 <amoralej> i'm done then, thanks for your feedback!
12:48:50 <dviroel> thanks amoralej@
12:48:58 <dviroel> s/@/!
12:49:19 <dviroel> we have 10m, so lets quickly cover some open reviews
12:49:24 <dviroel> #topic Reviews
12:49:36 <dviroel> i will cover mine first
12:49:45 <dviroel> 960381: Fail if action plan cannot be execute | https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/960381
12:50:02 <dviroel> this is kind of one of the things that i wanted to fix
12:50:16 <dviroel> test were just retuning sucess if no action plan is generated
12:50:43 <sean-k-mooney> amoralej: we do not need to test the fail over but i do wnat to see 2 things, 1 we shoudl have 2 dession engine enabled in a job and 2 i want to see a debug log showing that one was elected the leader and the other was not.
12:50:46 <dviroel> which doesn't actually properly test the outcome of a strategy
12:50:59 <sean-k-mooney> "Fail if action plan cannot be execute"
12:51:12 <sean-k-mooney> ya that is an important fix IMO
12:51:46 <dviroel> so we might see some test that sometimes don't generate an action plan, to fail in CI
12:51:53 <sean-k-mooney> it may uncover other flaky tests but that a good thing
12:51:56 <dviroel> like the one in amoralej patches above
12:52:02 <sean-k-mooney> ya
12:52:05 <dviroel> which is likely the case
12:52:30 <jgilaber> +1
12:52:33 <dviroel> and then we can work on improving the test, to keep ci more stable
12:52:44 <dviroel> ok, next one
12:52:57 <sean-k-mooney> i just reviewd it now and approved it FYI
12:52:59 <dviroel> one from eventlet removal that was pending at my side
12:53:02 <dviroel> sean-k-mooney: tks
12:53:07 <dviroel> 960881: Add ThreadPool statistics debug log | https://review.opendev.org/c/openstack/watcher/+/960881
12:53:33 <dviroel> this is a similar approach from nova code, that print threadpool statistics
12:54:23 <dviroel> which would be helpfull for debug
12:54:57 <sean-k-mooney> ya i agree that it should eb off by default and have its own config option
12:54:57 <dviroel> since watcher has couple of thread pools around
12:55:01 <sean-k-mooney> i think print_thread_pool_stats is ok
12:55:23 <sean-k-mooney> and i see you enabled it in the zuul jobs
12:55:40 <sean-k-mooney> ill review teh patch fully and the job output later but i think this looks good overall
12:55:47 <dviroel> ack, thanks
12:55:53 <dviroel> thanks jgilaber for the review ther too
12:55:54 <dviroel> next one
12:56:03 <dviroel> 955472: Add Continuous audit dummy strategy scenario test | https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955472
12:56:25 <chandankumar> It adds coverage for continuous audit using dummy strategies in scenario test, validating all state of Audit, Updating start/end time/interval with sec and crontab.
12:56:45 <chandankumar> It will make sure everything is working for continous audit.
12:56:56 <dviroel> chandankumar: ack, i already start reviewing it
12:57:14 <chandankumar> dviroel: thank you!
12:57:15 <dviroel> we are moving from nothing to many test for continuous audits
12:57:21 <dviroel> :)
12:57:37 <dviroel> thanks for proposing
12:57:40 <dviroel> next one
12:57:47 <dviroel> 958644: Add test for volume migrate with zone migration | https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/958644
12:58:17 <jgilaber> this adds a test for zone migration where it migrates volumes
12:58:18 <dviroel> jgilaber: is proposing some new volume migrate tests
12:58:21 <dviroel> ++
12:58:26 <jgilaber> we already merged the one with retype
12:58:53 <dviroel> ack, S change, I can add to my list too
12:58:55 <dviroel> thanks
12:59:00 <jgilaber> this test assumes that there are at least two cinder pools configured with the same volume_backend_name
12:59:24 <jgilaber> otherwise it's skipped
12:59:49 <dviroel> ack, I see that you rebased based on the refactoring ++
12:59:58 <jgilaber> I'm looking to add a similar test with multiple backends
13:00:24 <dviroel> ack
13:00:32 <dviroel> 960265: Handle missing fields building storage model XML | https://review.opendev.org/c/openstack/watcher/+/960265
13:00:52 <dviroel> this is a fix for https://bugs.launchpad.net/watcher/+bug/2121147
13:00:58 <jgilaber> this and the other two patches are all related to the storage model
13:01:17 <dviroel> 961667: Handle optional pool fields in Cinder notification | https://review.opendev.org/c/openstack/watcher/+/961667
13:01:23 <jgilaber> it logs some exceptions that are not really errors and can be confusing for the user
13:01:26 <dviroel> 958781: Add some debug logs to storage model | https://review.opendev.org/c/openstack/watcher/+/958781
13:01:55 <jgilaber> last one adds logs to clarify, the other two prevents erros over optional fields
13:02:07 <dviroel> ack, so it would be good to review all them together
13:02:17 <sean-k-mooney> jgilaber: ok so you moved it form log.excption which is actully at error level to log.debug
13:02:40 <jgilaber> I would say they are not super urgent, mainly UX improvements
13:02:57 <dviroel> ack
13:03:19 <sean-k-mooney> im not sure https://review.opendev.org/c/openstack/watcher/+/960265 is the write way to fix that but i have not looked deeply enought to provdie a difetn suggestion
13:03:36 <sean-k-mooney> to me that is just hiding the issue rather then fixing it
13:03:49 <jgilaber> sean-k-mooney, yes, since it was not really an error ( the storage model is building correctly)
13:03:55 <dviroel> yeah, I can't provide a feedback now, I will need to spend some time on it, even being XS change :)
13:04:23 <jgilaber> I think the right fix is to remove those optional fields
13:04:42 <jgilaber> but this is a small improvement for the logs
13:05:12 <dviroel> ack, we are over time, going to close the meeting, but feel free to continue the discussion in the channel
13:05:31 <dviroel> thanks jgilaber for volunteering to chair next meeting
13:05:42 <sean-k-mooney> jgilaber: i fell like we shoudl do the real fix rather then the logging change
13:05:48 <dviroel> let's wrap up
13:05:53 <dviroel> thank you all for participating
13:05:57 <dviroel> #endmeeting