12:00:47 <jgilaber> #startmeeting watcher
12:00:47 <opendevmeet> Meeting started Thu Nov 13 12:00:47 2025 UTC and is due to finish in 60 minutes.  The chair is jgilaber. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:47 <opendevmeet> The meeting name has been set to 'watcher'
12:01:06 <chandankumar> o/
12:01:26 <rlandy_> o/
12:01:36 <morenod> o/
12:01:52 <jgilaber> courtesy ping: dviroel amoralej sean-k-mooney
12:02:04 <amoralej> o/
12:02:07 <sean-k-mooney> o/
12:02:09 <dviroel> o/
12:02:46 <jgilaber> ok let's get started with today's agenda, here's the link in case someone has a last-minute topic https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
12:02:58 <jgilaber> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
12:03:19 <jgilaber> we have some topics already
12:03:27 <jgilaber> I added the first
12:03:41 <jgilaber> #topic 2026.1 status etherpad review
12:03:50 <jgilaber> #link https://etherpad.opendev.org/p/watcher-2026.1-status
12:04:01 <sean-k-mooney> i need to step away for 5-10 minutes, sorry, brb
12:04:08 <jgilaber> we have two specs ready for review
12:04:21 <jgilaber> two active blueprints
12:04:33 <jgilaber> and two patches
12:04:49 <jgilaber> *three actually
12:05:06 <jgilaber> does anyone want to highlight something?
12:05:50 <amoralej> I added a topic to the agenda to discuss one of the blueprints
12:06:02 <jgilaber> ack thanks amoralej
12:06:21 <jgilaber> I wanted to quickly point out that two of the patches are action items from the PTG
12:06:34 <jgilaber> 1. deprecate prometheus datasource in favor of aetos
12:06:47 <jgilaber> #link https://review.opendev.org/c/openstack/watcher/+/966672
12:06:57 <jgilaber> and 2. remove monasca datasource
12:07:02 <jgilaber> #link https://review.opendev.org/c/openstack/watcher/+/966786
12:07:08 <dviroel> ++ thanks jgilaber
12:07:21 <jgilaber> IIUC before proceeding with the deprecation we need to announce it in the ML
12:07:28 <jgilaber> do we need to do the same for the removal?
12:09:08 <dviroel> I am not sure, would need to check governance docs, but I don't think we need
12:09:26 <jgilaber> ack thanks dviroel, I can check that later
12:09:39 <jgilaber> if there are no more thoughts on this topic we can move to the next
12:09:51 <dviroel> one thing
12:09:59 <dviroel> just to mention, in the same line
12:10:12 <amoralej> for monasca, i think it's fine given that monasca is marked as inactive
12:10:35 <dviroel> that yesterday I sent another call for MAAS maintainers
12:10:38 <amoralej> for prometheus, it may be good to ask in the ML
12:10:50 <dviroel> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/VIY3V2XD5H2KSSBQXLBS2QHKG24JKCQZ/
12:11:33 <dviroel> just saw that we have a reply on that topic, but dmitriy still doesn't plan to work on it soon
12:11:47 <dviroel> i will reply after our meeting
12:11:59 <jgilaber> is keeping the MAAS integration a hard blocker for the eventlet removal?
12:12:18 <jgilaber> could we, for example, keep it working only with eventlet?
12:12:24 <dviroel> we need to remove the MAAS code since it contains eventlet-dependent code
12:12:55 <dviroel> so we would deprecate it this cycle
12:13:13 <dviroel> we can't modify that code since we can't validate it
12:13:17 <sean-k-mooney> back
12:13:50 <dviroel> so the final move in eventlet-removal is to really remove eventlet code :)
12:14:11 <dviroel> and the MAAS code will need to go with it unless someone fixes it
12:14:23 <jgilaber> ack, I think I got confused because the proposal here is to mark it as deprecated, not to remove it, in this cycle
12:14:32 <dviroel> correct
12:14:40 <jgilaber> so +1 for me to continue with the plan despite the reply
12:14:46 <dviroel> we need to deprecate in a SLURP release
12:14:55 <dviroel> and remove in a future release
12:15:07 <dviroel> we don't need to remove all code in 2026.1
12:15:21 <sean-k-mooney> technically we do not need to announce the deprecation on the mailing list, but we can for more visibility. however, as part of the deprecation of the direct backend we should provide upgrade instructions on how to adopt aetos. we may want to do that in the deprecation patch, but it should happen before the end of the cycle in either case
12:15:48 <sean-k-mooney> jgilaber: it's not a hard blocker as maas is an optional integration
12:15:52 <opendevreview> David proposed openstack/watcher-tempest-plugin master: Add test for skipped actions  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/966860
12:16:02 <jgilaber> sean-k-mooney, thanks, I did include a migration guide in the patch
12:16:03 <sean-k-mooney> in threaded mode it just won't work properly
12:16:45 <sean-k-mooney> what it is a blocker for would be removing eventlet entirely, unless we port it or remove it
12:17:06 <sean-k-mooney> but we should be able to achieve this cycle's goal of being able to run all components in threaded mode
12:17:23 <sean-k-mooney> i decided not to reply
12:17:26 <jgilaber> the final deadline for the eventlet removal is 2027.1?
12:17:28 <dviroel> about eventlet-removal schedule
12:17:30 <dviroel> #link https://governance.openstack.org/tc//goals/selected/remove-eventlet.html#completion-criteria
12:17:46 <dviroel> 2027.2
12:17:49 <sean-k-mooney> but even without the eventlet dependency, we are not currently planning to use asyncio
12:17:55 <jgilaber> thanks for the link dviroel
12:18:03 <sean-k-mooney> so the fact that the MAAS code does use it is also a problem from my point of view
12:18:20 <sean-k-mooney> meaning it may need to be rewritten even if we kept the integration
12:18:57 <sean-k-mooney> 2027.1 for services
12:19:03 <sean-k-mooney> 2027.2 for oslo
12:19:13 <sean-k-mooney> our goal is 2026.2
12:19:31 <sean-k-mooney> for removal; defaulting to threaded should ideally happen this cycle
12:20:03 <sean-k-mooney> we have a full cycle of buffer, so it's ok if our schedule gets delayed a bit
12:20:23 <jgilaber> so we should deprecate MAAS in this cycle and remove in the next?
12:20:28 <sean-k-mooney> yep
12:20:40 <sean-k-mooney> that would be my preference in any case
12:20:43 <dviroel> that's the plan
12:20:57 <jgilaber> wfm, thanks!
12:21:08 <jgilaber> any other thoughts on this topic?
12:21:29 <dviroel> lets move :)
12:21:42 <jgilaber> ok, moving on chandankumar looks like the next topic is yours
12:21:50 <jgilaber> #topic Updates on Pytest vs PTI discussion
12:22:02 <chandankumar> As we know, the watcher-dashboard tests are broken and need rewriting.
12:22:10 <chandankumar> I sent an email < https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/3V3CNPQLB77SKFVLZ6LXJ5NPNYWW4QFD/ > asking for TC guidance on using Horizon's pytest fixtures, given what the PTI doc says.
12:22:20 <jgilaber> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/3V3CNPQLB77SKFVLZ6LXJ5NPNYWW4QFD/
12:22:24 <chandankumar> The PTI recommends using stestr as the test runner via tox, and the unittest module for tests.
12:22:25 <chandankumar> Below is the summary of the thread.
12:22:32 <chandankumar> Many people support adding pytest to the PTI doc, as the tool has evolved over time. Some do not.
12:22:40 <chandankumar> Before modifying the PTI doc, we also need to emphasize the importance of understanding why the rules exist before changing them.
12:22:48 <chandankumar> Key points discussed:
12:22:55 <chandankumar> Horizon uses pytest because its tests are more maintainable and less flaky in CI. The Horizon team was not aware of the PTI.
12:23:03 <chandankumar> Horizon has not used stestr for running integration tests since 2018. Some exceptions were made in the past around nosetest usage in Horizon < https://opendev.org/openstack/governance/commit/759c42b10cb3728f5549b05f68e826b1c62a968c >
12:23:11 <chandankumar> Rally is using the pytest-fixtures module and pytest as the test runner.
12:23:18 <chandankumar> pytest lacks parallel test execution, but that seems to be solved by pytest-xdist; someone needs to verify whether it works with shared resources.
12:23:26 <chandankumar> pytest's fixture dependency injection mechanism is non-standard and locks tests into requiring pytest as the runner.
12:23:27 <sean-k-mooney> the tests being more maintainable and less flaky is not because of pytest, it's because of how they were rewritten
12:23:39 <chandankumar> If we start using pytest, devs will add multiple pytest plugins, which might create a headache for packagers.
12:23:45 <chandankumar> Not all of the current unittest-based integration tests are run via pytest (I have not verified this).
12:23:51 <chandankumar> [other topic from this thread] replacing Horizon's local_settings.py with oslo.config-based configuration was discussed a long time back; no one is working on that
12:23:54 <chandankumar> that's the summary
12:24:02 <chandankumar> I need some help drafting a further response on the following points
12:24:10 <chandankumar> [1]. Should we use Horizon's pytest work (they are not going to rewrite it in unittest) OR build unittest-based tests for consistency with the main watcher project (which avoids learning a new test framework)? We will need a spec for the integration tests.
12:24:30 <chandankumar> or, since we started the thread, should we propose a patch to update the PTI doc to cover pytest? or let the TC decide on this.
12:24:39 <sean-k-mooney> right now i am -1 on using their test framework
12:24:50 <sean-k-mooney> -2 if it's not in the pti
12:24:53 <jgilaber> thanks for the summary chandankumar, looks like there are a few related threads
12:25:54 <chandankumar> yes, there was one similar thread related to keystone.
12:26:19 <sean-k-mooney> do you mean the rust one?
12:26:23 <chandankumar> yes
12:26:25 <sean-k-mooney> because that is very diffent
12:26:36 <jgilaber> re the packaging, I'm not sure I understand the problem if it's already in Horizon
12:26:36 <sean-k-mooney> ya so that is because the person wanting to use rust actively did not follow the process to have it proposed as a new language
12:27:05 <chandankumar> jgilaber: that was raised by debian packager.
12:27:22 <sean-k-mooney> jgilaber: so it might be in horizon, but that does not mean it's in rdo or in other distros
12:27:39 <sean-k-mooney> jgilaber: there was an explicit rule that it should not be allowed in the requirements repo (pytest)
12:27:52 <chandankumar> We also need to choose between selenium and playwright for the integration tests.
12:27:58 <sean-k-mooney> the test runner was allowed, but it was explicitly not
12:28:07 <jgilaber> but if those distros package horizon they must have dealt with it in some way, no?
12:28:08 <chandankumar> I need help on deciding next course of action here
12:28:09 <dviroel> is it going to be too hard to write our own tests and not depend on horizon? there doesn't seem to be too much code in there tbh
12:28:38 <chandankumar> nothing is hard, but whatever we write, we need to maintain it in the longer term
12:29:06 <jgilaber> +1 to what dviroel said, looking at the amount of replies in the mailing list, this looks like a complicated topic
12:29:19 <sean-k-mooney> chandankumar: so we could do a poc of both approaches and review them, however i think we also should have a spec for this topic
12:29:59 <sean-k-mooney> adopting playwright also has packaging implications, but unlike unittest, the browser test framework is not specified in the pti
12:30:15 <sean-k-mooney> so we can select the one that is best suited
12:31:07 <chandankumar> one question: for the spec, do we want to do the poc first and then the spec?
12:31:34 <chandankumar> a poc with selenium and unittest, and a poc with playwright and unittest?
12:31:44 <jgilaber> I think in this case it would be helpful to have pocs while reviewing the spec
12:31:54 <sean-k-mooney> so normally you do the spec first to agree on the requirements and problem statement, and you may do a very simple poc to support the spec
12:32:12 <sean-k-mooney> but obviously you would not start the main bulk of the implementation until after the spec is approved
12:33:25 <chandankumar> ok, I will take the smallest bit and do the poc, along with a spec, for both selenium and playwright
12:34:19 <chandankumar> taking this small bit as an example https://review.opendev.org/c/openstack/watcher-dashboard/+/959189: Fixed incorrect use of status_choices in statetable
12:34:46 <chandankumar> for testing via poc
12:35:08 <chandankumar> or do we need a different example?
12:36:56 <jgilaber> not sure about that, but I think that any example will do to show the pattern the tests should follow
12:38:23 <sean-k-mooney> that could work, but it's not what i would start with
12:39:11 <sean-k-mooney> i would start with the basics of navigating to a panel, for example the action plan, template or goal panel or similar
12:39:39 <sean-k-mooney> and show how we will test that the relevant elements are there.
12:40:04 <sean-k-mooney> for example we could start with creating an audit, then check that an action plan is created
12:40:41 <chandankumar> sean-k-mooney: that sounds like a good example, thank you!
12:40:50 <jgilaber> thanks chandankumar, any more thoughts on this topic?
12:40:59 <chandankumar> Now I have the next course of action.
12:41:21 <sean-k-mooney> i might spend a day and try my own poc just to get a feel for it depending on how much time i have next week
12:41:34 <jgilaber> ack, moving to the next topic from dviroel
12:41:39 <jgilaber> #topic Issues with Taskflow parallel engine in threading mode
12:41:49 <dviroel> tks
12:41:49 <sean-k-mooney> what i really want to see is what the ux of creating a test is, what output we get, and how this works end to end
12:42:01 <sean-k-mooney> but ya lets move on
12:42:03 <chandankumar> thank you everyone for input on this thread
12:42:10 <dviroel> ok, let me add some background on that issue/topic
12:42:25 <dviroel> while working on eventlet-removal
12:42:37 <dviroel> more specifically, while adding support for native thread mode, we started to see SQL errors in our unit tests when executing action plans, e.g.:
12:42:44 <dviroel> #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f82/openstack/f827ad85b1294c43bae471abcbb69d2d/testr_results.html
12:42:49 <dviroel> "error: sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) cannot start a transaction within a transaction"
12:43:14 <dviroel> something that doesn't reproduce in the integration tests
12:43:28 <dviroel> but it is still reproducible in the unit tests
12:43:48 <dviroel> it turns out that our applier works with multiple threadpools today
12:44:20 <dviroel> the processing of starting action plans is handled by a threadpool executor, where each action plan creates a new taskflow engine that, when working in parallel mode, creates another threadpool
12:44:36 <dviroel> in addition to that, we also spawn a thread for each action, so it can be killed when running in eventlet mode
12:45:05 <dviroel> not an easy thing to debug or follow
12:45:37 <dviroel> while investigating the issue, i asked sean-k-mooney for some help (tks!), and we started to dig into the details of how watcher currently implements db access using oslo.db
12:46:13 <dviroel> the current implementation in our db api, which uses a threading.local object as context, may not be the best approach
12:46:30 <dviroel> and we may need to rework everything there, but there is still no guarantee that this would solve our concurrency problems
12:47:16 <dviroel> so we want to propose a short-term solution for now to unblock the eventlet-removal work
12:47:48 <dviroel> which is implemented in
12:47:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/966226
12:48:27 <dviroel> where we configure the taskflow engine to use "serial" mode when native threading is configured
12:49:08 <dviroel> with that, we would lose the parallelism across the actions of an action plan when running in native thread mode (for now)
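A minimal sketch of the short-term workaround described above, assuming a hypothetical `native_threading` flag and a hypothetical helper name; it is not the code in the linked review. It only builds the keyword arguments that would be passed to `taskflow.engines.load()`: the "serial" engine in native thread mode, and a parallel engine on green threads under eventlet. The `executor="greenthreaded"` and `max_workers` values are assumptions about the eventlet-mode configuration.

```python
def taskflow_engine_options(native_threading: bool, max_workers: int = 4) -> dict:
    """Hypothetical helper: keyword arguments intended for taskflow.engines.load().

    Not Watcher's actual code; the option values are assumptions.
    """
    if native_threading:
        # Short-term workaround: run the actions of one action plan one after
        # another, so the engine never creates its own nested thread pool.
        return {"engine": "serial"}
    # eventlet mode keeps running actions in parallel on green threads.
    return {
        "engine": "parallel",
        "executor": "greenthreaded",
        "max_workers": max_workers,
    }
```

Hypothetical usage: `engine = taskflow.engines.load(flow, **taskflow_engine_options(native_threading=True))`. The serial engine avoids the nested thread pool inside each action plan; the cost is the loss of intra-plan parallelism mentioned above.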
12:49:18 <sean-k-mooney> process mode would also likely work, but that feels too heavyweight if we are going to allow more than 1 action plan to be processed at once per applier
12:50:01 <dviroel> sean-k-mooney: agree
12:50:11 <sean-k-mooney> the fact that we have potentially 2 levels of concurrency (concurrent actions and concurrent action plans)
12:50:27 <dviroel> so with that we can continue working on a better solution to enable parallelism, which may need some rework of the db api and the applier
12:50:31 <sean-k-mooney> makes the scaling and resource usage harder to reason about
12:51:21 <amoralej> losing parallelism in an actionplan is a bad thing, tbh
12:51:31 <amoralej> but if there are no alternative ...
12:51:34 <sean-k-mooney> my current thinking is as follows
12:51:55 <dviroel> amoralej: yes, but this is the short-term solution while we improve watcher
12:52:16 <sean-k-mooney> i thiks that we need ot factor out the management of the action plan and the concurance of actions within that into a seperate compoent
12:52:40 <sean-k-mooney> and then an applier will be able to execute multipel actions but each action woudl be a result fo an rpc call
12:53:26 <sean-k-mooney> so we woudl not use tasks flows engine to manage the concurancy
12:53:37 <sean-k-mooney> at least not the way we do today
12:54:23 <amoralej> so having an action-dispatcher
12:54:35 <dviroel> this could be considered as part of the scaling topic that amoralej is working on too, since it may solve more problems
12:54:46 <sean-k-mooney> yes, each node in the graph would effectively just be dispatching an rpc to be processed by the applier
12:54:55 <amoralej> yes, that's what i'm thinking about
12:55:13 <amoralej> i think that's a good solution, but also it will take time to develop
12:55:14 <dviroel> and that's why we wouldn't be fixing it now in the eventlet-removal effort
12:55:38 <sean-k-mooney> right the other thing is if we can run 50 appliers
12:55:42 <amoralej> that's why i'm afraid that the short-term solution may stay for some time ...
12:55:51 <sean-k-mooney> and load balance action plans over appliers
12:56:02 <amoralej> i think that's better actually, as a short term
12:56:10 <sean-k-mooney> we may or may not need the 2 levels of concurrency we have today, even without the action dispatcher
12:56:41 <amoralej> if each applier can only run an actionplan at a time, that would also fix the issue?
12:56:50 <sean-k-mooney> no
12:57:02 <amoralej> ah, I misunderstood then :)
12:57:11 <sean-k-mooney> the problem is that each action is sharing a single db connection
12:57:21 <sean-k-mooney> we effectively need one connection per thread
12:57:36 <sean-k-mooney> if we want to re-enable the threadpool engine
12:57:53 <sean-k-mooney> that, or have all db operations done by the top-level thread
12:57:56 <amoralej> ok, that's the change in db access that dviroel mentioned before
12:58:01 <sean-k-mooney> that's the actual problem we have today
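A minimal sketch of the "one connection per thread" idea, using the standard-library `sqlite3` and `threading.local` as stand-ins for Watcher's oslo.db sessions (an assumption; Watcher's db api is not shown here, and every name below is hypothetical). Each worker thread lazily opens its own connection the first time it needs one, so no two threads ever share a transaction.

```python
import sqlite3
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()       # per-thread storage
_opened = []                     # keep connections alive so id() values stay unique
_opened_lock = threading.Lock()


def get_connection() -> sqlite3.Connection:
    """Return the calling thread's own connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # check_same_thread=True (the default) makes sqlite3 reject use of
        # this connection from any other thread.
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
        with _opened_lock:
            _opened.append(conn)
    return conn


def run_action(action_id: int):
    """Hypothetical action: write one row using this thread's connection."""
    conn = get_connection()
    with conn:  # commit on success, roll back on error
        conn.execute("CREATE TABLE IF NOT EXISTS actions (id INTEGER)")
        conn.execute("INSERT INTO actions (id) VALUES (?)", (action_id,))
    return threading.get_ident(), id(conn)


def connections_per_thread(n_actions: int = 20, workers: int = 4) -> dict:
    """Map each worker thread to the set of connection ids it used."""
    seen = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for thread_id, conn_id in pool.map(run_action, range(n_actions)):
            seen[thread_id].add(conn_id)
    return dict(seen)
```

The lookup has to happen inside each worker thread; a connection opened once by a parent thread and handed to its workers would be shared again.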
12:58:22 <sean-k-mooney> ya, there is maybe one other hack we could do in the short-to-medium term
12:58:44 <sean-k-mooney> we could introduce a transaction lock
12:59:05 <sean-k-mooney> effectively to serialise all db transactions
12:59:20 <sean-k-mooney> my concern with that is obviously, again, performance and possible deadlocks
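A minimal sketch of the transaction-lock hack, again with `sqlite3` standing in for oslo.db and with hypothetical names throughout: one connection shared by all worker threads, and a process-wide lock held for the whole of each transaction, so no thread can start a transaction while another one is still open.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# One connection shared by every worker thread, as in the problem above.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE actions (id INTEGER PRIMARY KEY, state TEXT)")
conn.commit()

# Hypothetical global transaction lock. threading.Lock is not re-entrant, so a
# thread that opens a nested serialized_transaction() deadlocks on itself.
_txn_lock = threading.Lock()


@contextmanager
def serialized_transaction():
    with _txn_lock:   # only one thread may be inside a transaction at a time
        with conn:    # commit on success, roll back on error
            yield conn


def mark_succeeded(action_id: int) -> None:
    """Hypothetical action update performed under the global lock."""
    with serialized_transaction() as c:
        c.execute("INSERT INTO actions (id, state) VALUES (?, ?)",
                  (action_id, "SUCCEEDED"))


def run(n_actions: int = 100, workers: int = 8) -> int:
    """Update n_actions rows from a thread pool; return the resulting row count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(mark_succeeded, range(n_actions)))
    return conn.execute("SELECT COUNT(*) FROM actions").fetchone()[0]
```

The sketch also shows both concerns raised above: throughput is limited to one transaction at a time, and because `threading.Lock` is not re-entrant, a nested transaction on the same thread blocks forever. A `threading.RLock` would avoid that self-deadlock, but not lock-ordering deadlocks with other locks.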
12:59:32 <amoralej> to be clear, i don't mind having a short-term solution that reduces concurrency if we are able to do a better fix during the G release
12:59:59 <sean-k-mooney> well, for now this would only be enabled if you disable eventlet
13:00:15 <amoralej> that's right.... good point
13:00:15 <sean-k-mooney> so there is actually no regression unless you opt into threaded mode
13:00:26 <sean-k-mooney> so we really have until the end of 2026.2
13:00:28 <dviroel> ack, and we don't need to enable threading mode for the applier as default in this release
13:00:30 <amoralej> right
13:00:42 <sean-k-mooney> i.e. until we remove the eventlet option, or even 2027.1 if we really need it
13:01:09 <sean-k-mooney> so to summarise the general proposal
13:01:10 <amoralej> that makes sense
13:01:18 <sean-k-mooney> 1 in thread mode use the serial engine for now
13:01:27 <sean-k-mooney> 2 continue to support the greenpool when using eventlet
13:01:56 <sean-k-mooney> 3 before we remove eventlet support, we should develop a way to do parallel action execution in threaded mode
13:02:10 <sean-k-mooney> 4 defer making it the default until 3 is done
13:02:20 <amoralej> wfm +1
13:02:37 <dviroel> yeah, sounds like a good plan
13:02:42 <jgilaber> +1
13:03:01 <jgilaber> that was a good discussion, but we're overtime and have one more topic to cover
13:03:11 <sean-k-mooney> ack lets move on
13:03:12 <jgilaber> we can continue in irc or the patch
13:03:25 <dviroel> ack, yes jgilaber
13:03:30 <jgilaber> last topic, we can leave the bug triage for next week
13:03:35 <jgilaber> #topic  new blueprint for automatic skip actions on pre_condition
13:03:38 <jgilaber> from amoralej
13:03:44 <amoralej> #link  https://blueprints.launchpad.net/watcher/+spec/skip-actions-in-pre-condition
13:04:04 <amoralej> as discussed, I've created the blueprint for automatically skipping actions
13:04:21 <amoralej> and sent an initial patch for the migrate action https://review.opendev.org/c/openstack/watcher/+/966699
13:04:29 <amoralej> i plan to send one review per action
13:05:04 <amoralej> I'm not sure what the review process for blueprints is, given that there is no peer review
13:05:10 <amoralej> lemme know if i should add something
13:05:27 <sean-k-mooney> amoralej: the normal process is to present them in a team meeting
13:05:37 <sean-k-mooney> where we would agree it can proceed without a spec or ask for one
13:05:37 <amoralej> ok, so, that was my plan for today :)
13:05:52 <sean-k-mooney> and then if we agree we mark it as approved for the cycle and leave a comment in the blueprint
13:05:54 <amoralej> we can discuss on next mtg anyway
13:06:07 <dviroel> +1
13:06:21 <sean-k-mooney> ok we can come back to it then
13:06:46 <amoralej> sure, no problem
13:06:57 <amoralej> I'll add the topic to the agenda
13:07:01 <jgilaber> ack, thanks amoralej
13:07:21 <jgilaber> last topic
13:07:24 <jgilaber> #topic Volunteers to chair next meeting
13:07:41 <jgilaber> any volunteer?
13:08:17 <dviroel> i can't :( - local holiday next thursday
13:08:41 <rlandy> I'll do it
13:08:49 <jgilaber> np dviroel thanks rlandy
13:09:04 <jgilaber> that's it for today, thanks all for participating!
13:09:11 <dviroel> thank you all
13:09:12 <jgilaber> #endmeeting