12:00:47 <jgilaber> #startmeeting watcher
12:00:47 <opendevmeet> Meeting started Thu Nov 13 12:00:47 2025 UTC and is due to finish in 60 minutes. The chair is jgilaber. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:47 <opendevmeet> The meeting name has been set to 'watcher'
12:01:06 <chandankumar> o/
12:01:26 <rlandy_> o/
12:01:36 <morenod> o/
12:01:52 <jgilaber> courtesy ping: dviroel amoralej sean-k-mooney
12:02:04 <amoralej> o/
12:02:07 <sean-k-mooney> o/
12:02:09 <dviroel> o/
12:02:46 <jgilaber> ok let's get started with today's agenda, here's the link in case someone has a last-minute topic https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
12:02:58 <jgilaber> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
12:03:19 <jgilaber> we have some topics already
12:03:27 <jgilaber> I added the first
12:03:41 <jgilaber> #topic 2026.1 status etherpad review
12:03:50 <jgilaber> #link https://etherpad.opendev.org/p/watcher-2026.1-status
12:04:01 <sean-k-mooney> i need to step away for 5-10 minutes, sorry, brb
12:04:08 <jgilaber> we have two specs ready for review
12:04:21 <jgilaber> two active blueprints
12:04:33 <jgilaber> and two patches
12:04:49 <jgilaber> *three actually
12:05:06 <jgilaber> does anyone want to highlight something?
12:05:50 <amoralej> I added a topic to the agenda to discuss one of the blueprints
12:06:02 <jgilaber> ack thanks amoralej
12:06:21 <jgilaber> I wanted to quickly point out that two of the patches are action items from the PTG
12:06:34 <jgilaber> 1. deprecate prometheus datasource in favor of aetos
12:06:47 <jgilaber> #link https://review.opendev.org/c/openstack/watcher/+/966672
12:06:57 <jgilaber> and 2. remove monasca datasource
12:07:02 <jgilaber> #link https://review.opendev.org/c/openstack/watcher/+/966786
12:07:08 <dviroel> ++ thanks jgilaber
12:07:21 <jgilaber> IIUC before proceeding with the deprecation we need to announce it in the ML
12:07:28 <jgilaber> do we need to do the same for the removal?
12:09:08 <dviroel> I am not sure, would need to check governance docs, but I don't think we need to
12:09:26 <jgilaber> ack thanks dviroel, I can check that later
12:09:39 <jgilaber> if there are no more thoughts on this topic we can move to the next
12:09:51 <dviroel> one thing
12:09:59 <dviroel> just to mention, along the same lines
12:10:12 <amoralej> for monasca, i think it's fine given that monasca is marked as inactive
12:10:35 <dviroel> that yesterday I sent another call for MAAS maintainers
12:10:38 <amoralej> for prometheus, it may be good to ask in the ML
12:10:50 <dviroel> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/VIY3V2XD5H2KSSBQXLBS2QHKG24JKCQZ/
12:11:33 <dviroel> just saw that we have a reply on that topic, but dmitriy still doesn't plan to work on it soon
12:11:47 <dviroel> i will reply after our meeting
12:11:59 <jgilaber> is keeping the deprecated MAAS integration a hard blocker for the eventlet removal?
12:12:18 <jgilaber> could we, for example, keep it working only with eventlet?
12:12:24 <dviroel> we need to remove the MAAS code since it has eventlet-dependent code
12:12:55 <dviroel> so we would deprecate it this cycle
12:13:13 <dviroel> we can't modify that code since we can't validate it
12:13:17 <sean-k-mooney> back
12:13:50 <dviroel> so the final move in eventlet-removal is to really remove eventlet code :)
12:14:11 <dviroel> and the MAAS code will need to go with it unless someone fixes it
12:14:23 <jgilaber> ack, I think I got confused because the proposal here is to mark it as deprecated, not remove it, in this cycle
12:14:32 <dviroel> correct
12:14:40 <jgilaber> so +1 from me to continue with the plan despite the reply
12:14:46 <dviroel> we need to deprecate in a SLURP release
12:14:55 <dviroel> and remove in a future release
12:15:07 <dviroel> we don't need to remove all code in 2026.1
12:15:21 <sean-k-mooney> technically we do not need to announce the deprecation on the mailing list, but we can for more visibility. however, as part of the deprecation of the direct backend we should provide upgrade instructions on how to adopt aetos. we may want to do that in the deprecation patch, but it should happen before the end of the cycle in either case
12:15:48 <sean-k-mooney> jgilaber: it's not a hard blocker as maas is an optional integration
12:15:52 <opendevreview> David proposed openstack/watcher-tempest-plugin master: Add test for skipped actions https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/966860
12:16:02 <jgilaber> sean-k-mooney, thanks, I did include a migration guide in the patch
12:16:03 <sean-k-mooney> in threaded mode it just won't work properly
12:16:45 <sean-k-mooney> what it is a blocker for would be the removal of eventlet entirely, unless we port it or remove it
12:17:06 <sean-k-mooney> but we should be able to achieve this cycle's goal of being able to run all components in threaded mode
12:17:23 <sean-k-mooney> i decided not to reply
12:17:26 <jgilaber> the final deadline for the eventlet removal is 2027.1?
12:17:28 <dviroel> about the eventlet-removal schedule
12:17:30 <dviroel> #link https://governance.openstack.org/tc//goals/selected/remove-eventlet.html#completion-criteria
12:17:46 <dviroel> 2027.2
12:17:49 <sean-k-mooney> but even without the eventlet dependency, we are not currently planning to use asyncio
12:17:55 <jgilaber> thanks for the link dviroel
12:18:03 <sean-k-mooney> so the fact that the MAAS code does use that is also a problem from my point of view
12:18:20 <sean-k-mooney> meaning it may need to be rewritten even if we kept the integration
12:18:57 <sean-k-mooney> 2027.1 for services
12:19:03 <sean-k-mooney> 2027.2 for oslo
12:19:13 <sean-k-mooney> our goal is 2026.2
12:19:31 <sean-k-mooney> for removal, and defaulting to threaded should ideally happen this cycle
12:20:03 <sean-k-mooney> we have a full cycle of buffer so it's ok if our schedule gets delayed a bit
12:20:23 <jgilaber> so we should deprecate MAAS in this cycle and remove it in the next?
12:20:28 <sean-k-mooney> yep
12:20:40 <sean-k-mooney> that would be my preference in any case
12:20:43 <dviroel> that's the pan
12:20:54 <dviroel> s/pan/plan
12:20:57 <jgilaber> wfm, thanks!
12:21:08 <jgilaber> any other thoughts on this topic?
12:21:29 <dviroel> let's move on :)
12:21:42 <jgilaber> ok, moving on, chandankumar, looks like the next topic is yours
12:21:50 <jgilaber> #topic Updates on Pytest vs PTI discussion
12:22:02 <chandankumar> As we know, watcher-dashboard tests are broken and need rewriting.
12:22:10 <chandankumar> I sent an email < https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/3V3CNPQLB77SKFVLZ6LXJ5NPNYWW4QFD/ > asking for TC guidance on whether we can use Horizon's pytest fixtures given the PTI doc.
12:22:20 <jgilaber> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/3V3CNPQLB77SKFVLZ6LXJ5NPNYWW4QFD/
12:22:24 <chandankumar> The PTI recommends using stestr as the test runner via tox and the unittest module for tests.
12:22:25 <chandankumar> Below is the summary of the thread.
12:22:32 <chandankumar> Many people support adding pytest to the PTI doc, as the tool has evolved over time. Some do not.
12:22:40 <chandankumar> Before modifying the PTI doc, we also need to emphasize the importance of understanding why the rules exist before changing them.
12:22:48 <chandankumar> Key points discussed:
12:22:55 <chandankumar> Horizon is using pytest because the tests are more maintainable and less flaky in CI. The Horizon team was not aware of the PTI.
12:23:03 <chandankumar> Horizon has not used stestr to run integration tests since 2018. An exception was made in the past for nosetests usage in Horizon < https://opendev.org/openstack/governance/commit/759c42b10cb3728f5549b05f68e826b1c62a968c >
12:23:11 <chandankumar> Rally uses the pytest-fixtures module and pytest as its test runner.
12:23:18 <chandankumar> pytest lacks parallel test execution, but that seems to be solved by pytest-xdist; someone needs to verify whether it works with shared resources.
12:23:26 <chandankumar> The pytest fixture dependency injection mechanism is not standard and locks tests into requiring pytest as the runner.
12:23:27 <sean-k-mooney> the tests being more maintainable and less flaky is not because of pytest, it's because of how they were rewritten
12:23:39 <chandankumar> If we start using pytest, devs will add multiple pytest plugins, which might create a headache for packagers.
12:23:45 <chandankumar> Not all the current unittest-based integration tests are run via pytest (I have not verified this).
12:23:51 <chandankumar> [other topic from this thread] replacing Horizon's local_settings.py with oslo.config-based configuration was discussed a long time ago; no one is working on that
12:23:54 <chandankumar> that's the summary
12:24:02 <chandankumar> I need some help to draft a further response covering the following points
12:24:10 <chandankumar> [1]. Should we use Horizon's pytest work (they are not going to rewrite it in unittest) OR build unittest tests for consistency with the main watcher project (which avoids learning a new test framework)? We will need a spec for the integration tests.
12:24:30 <chandankumar> or, since we started the thread, should we propose a patch to update the PTI doc around using pytest, or let the TC decide on this?
12:24:39 <sean-k-mooney> right now i am -1 on using their test framework
12:24:50 <sean-k-mooney> -2 if it's not in the pti
12:24:53 <jgilaber> thanks for the summary chandankumar, looks like there are a few related threads
12:25:54 <chandankumar> yes, there was one similar thread related to keystone.
12:26:19 <sean-k-mooney> do you mean the rust one?
12:26:23 <chandankumar> yes
12:26:25 <sean-k-mooney> because that is very different
12:26:36 <jgilaber> re the packaging, I'm not sure I understand the problem if it's already in Horizon
12:26:48 <sean-k-mooney> ya so that is a case of the person wanting to use rust actively not following the process to have it proposed as a new language
12:27:05 <chandankumar> jgilaber: that was raised by a debian packager.
12:27:22 <sean-k-mooney> jgilaber: so it might be in horizon but that does not mean it's in rdo or in other distros
12:27:39 <sean-k-mooney> jgilaber: there was an explicit rule that it should not be allowed in the requirements repo (pytest)
12:27:52 <chandankumar> We also need to choose between selenium and playwright for the integration tests.
12:27:58 <sean-k-mooney> the test runner was allowed but it was explicitly not
12:28:07 <jgilaber> but if those distros package horizon they must have dealt with it in some way, no?
12:28:08 <chandankumar> I need help deciding the next course of action here
12:28:09 <dviroel> is it going to be too hard to write our own tests and not depend on horizon? there doesn't seem to be too much code in there tbh
12:28:38 <chandankumar> nothing is hard, whatever we write, we need to maintain it for the long term
12:29:06 <jgilaber> +1 to what dviroel said, looking at the number of replies on the mailing list, this looks like a complicated topic
12:29:19 <sean-k-mooney> chandankumar: so we could do a poc of both approaches and review them, however i think we also should have a spec for this topic
12:29:59 <sean-k-mooney> adopting playwright also has packaging implications, but unlike unittest, the browser test framework is not specified in the pti
12:30:15 <sean-k-mooney> so we can select the one that is best suited
12:31:07 <chandankumar> one question: for the spec, do we want to do the poc first and then the spec?
12:31:34 <chandankumar> a poc with selenium with unittest and a poc with playwright with unittest?
12:31:44 <jgilaber> I think in this case it would be helpful to have pocs while reviewing the spec
12:31:54 <sean-k-mooney> so normally you do the spec first to agree on the requirements and problem statement, and you may do a very simple poc to support the spec
12:32:12 <sean-k-mooney> but obviously you would not start the main bulk of the implementation until after the spec is approved
12:33:25 <chandankumar> ok, I will take a look at the smallest bit and do the poc along with a spec for both selenium and playwright
12:34:19 <chandankumar> taking this small bit as an example https://review.opendev.org/c/openstack/watcher-dashboard/+/959189: Fixed incorrect use of status_choices in statetable
12:34:46 <chandankumar> for testing via the poc
12:35:08 <chandankumar> or do we need a different example?
12:36:56 <jgilaber> not sure about that, but I think that any example will do to show the pattern the tests should follow
12:38:23 <sean-k-mooney> that could work but it's not what i would start with
12:39:11 <sean-k-mooney> i would start with the basics of navigating to a panel, for example the action plan template or goal or similar
12:39:39 <sean-k-mooney> and show how we will test that the relevant elements are there.
12:40:04 <sean-k-mooney> for example we could start with creating an audit, then check that an action plan is created
12:40:41 <chandankumar> sean-k-mooney: that sounds like a good example, thank you!
12:40:50 <jgilaber> thanks chandankumar, any more thoughts on this topic?
12:40:59 <chandankumar> Now I have the next course of action.
12:41:21 <sean-k-mooney> i might spend a day and try my own poc just to get a feel for it, depending on how much time i have next week
12:41:34 <jgilaber> ack, moving to the next topic from dviroel
12:41:39 <jgilaber> #topic Issues with Taskflow parallel engine in threading mode
12:41:49 <dviroel> tks
12:41:49 <sean-k-mooney> what i really want to see is what the ux of creating a test is, what output we get, and how this works end to end
12:42:01 <sean-k-mooney> but ya let's move on
12:42:03 <chandankumar> thank you everyone for the input on this thread
12:42:10 <dviroel> ok, let me add some background on that issue/topic
12:42:25 <dviroel> while working on eventlet-removal
12:42:37 <dviroel> but first adding support for native thread mode, we started to see sql errors in our unit tests when executing action plans, e.g.:
12:42:44 <dviroel> #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f82/openstack/f827ad85b1294c43bae471abcbb69d2d/testr_results.html
12:42:49 <dviroel> "error: sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) cannot start a transaction within a transaction"
12:43:14 <dviroel> something that doesn't reproduce in the integration tests
12:43:28 <dviroel> but is still reproducible in unit tests
12:43:48 <dviroel> it happens that our applier works with multiple threadpools today
12:44:20 <dviroel> the processing of starting action plans is handled by a threadpool executor, where each action plan creates a new taskflow engine that, when working in parallel mode, creates another threadpool
12:44:36 <dviroel> in addition to that we also have a thread spawned for each action, so it can be killed when running in eventlet mode
12:45:05 <dviroel> not an easy thing to debug or follow
12:45:37 <dviroel> while investigating the issue, i asked sean-k-mooney for some help (tks!), and we started to dig into the details of how watcher currently implements db access using oslo.db
12:46:13 <dviroel> the current implementation in our db api, using a threading.local object as the context, may not be the best approach
12:46:30 <dviroel> and we may need to rework everything there, but there is still no guarantee that this would solve our concurrency problems
12:47:16 <dviroel> so we want to propose a short-term solution for now to unblock the eventlet-removal work
12:47:48 <dviroel> which is implemented in
12:47:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/966226
12:48:27 <dviroel> where we configure the taskflow engine to work in "serial" mode when native threading is configured
12:49:08 <dviroel> with that, we would lose the parallelism across the actions of an action plan when running in native thread mode (for now)
12:49:18 <sean-k-mooney> process mode would also likely work, but that feels too heavyweight if we are going to allow more than 1 action plan to be processed at once per applier
12:50:01 <dviroel> sean-k-mooney: agree
12:50:11 <sean-k-mooney> the fact that we have potentially 2 levels of concurrency (concurrent actions and concurrent action plans)
12:50:27 <dviroel> so with that we can continue working on a better solution to enable parallelism, which may need some rework of the db api and the applier
12:50:31 <sean-k-mooney> makes the scaling and resource usage harder to reason about
12:51:21 <amoralej> losing parallelism in an actionplan is a bad thing, tbh
12:51:31 <amoralej> but if there are no alternatives ...
12:51:34 <sean-k-mooney> my current thinking is as follows
12:51:55 <dviroel> amoralej: yes, but this is the short-term solution while we improve watcher
12:52:16 <sean-k-mooney> i think that we need to factor out the management of the action plan and the concurrency of actions within it into a separate component
12:52:40 <sean-k-mooney> and then an applier will be able to execute multiple actions, but each action would be the result of an rpc call
12:53:26 <sean-k-mooney> so we would not use the taskflow engine to manage the concurrency
12:53:37 <sean-k-mooney> at least not the way we do today
12:54:23 <amoralej> so having an action-dispatcher
12:54:35 <dviroel> this could be considered as part of the scaling topic that amoralej is working on too, since it may solve more problems
12:54:46 <sean-k-mooney> yes, each node in the graph would effectively just be dispatching an rpc to be processed by the applier
12:54:55 <amoralej> yes, that's what i'm thinking about
12:55:13 <amoralej> i think that's a good solution, but also it will take time to develop
12:55:14 <dviroel> and that's why we wouldn't be fixing it now in the eventlet-removal effort
12:55:38 <sean-k-mooney> right, the other thing is if we can run 50 appliers
12:55:42 <amoralej> that's why i'm afraid that the short-term solution may stay for some time ...
12:55:51 <sean-k-mooney> and load balance action plans over appliers
12:56:02 <amoralej> i think that's better actually, as a short term
12:56:10 <sean-k-mooney> we may or may not need the 2 levels of concurrency we have today even without the action dispatcher
12:56:41 <amoralej> if each applier can only run an actionplan at a time, that would also fix the issue?
12:56:50 <sean-k-mooney> no
12:57:02 <amoralej> ah, I misunderstood then :)
12:57:11 <sean-k-mooney> the problem is that each action is sharing a single db connection
12:57:21 <sean-k-mooney> we effectively need one connection per thread
12:57:36 <sean-k-mooney> if we want to re-enable the threadpool engine
12:57:53 <sean-k-mooney> that or have all db operations done by the top-level thread
12:57:56 <amoralej> ok, that's the change in db access that dviroel mentioned before
12:58:01 <sean-k-mooney> that's the actual problem we have today
12:58:22 <sean-k-mooney> ya there is maybe one other hack we could do in the short-medium term
12:58:44 <sean-k-mooney> we could introduce a transaction lock
12:59:05 <sean-k-mooney> effectively to serialise all db transactions
12:59:20 <sean-k-mooney> my concern with that is obviously again performance and possible deadlocks
12:59:32 <amoralej> to be clear, i don't mind having a short-term solution that reduces concurrency if we are able to do a better fix during the G release
12:59:59 <sean-k-mooney> well for now this would only be enabled if you disable eventlet
13:00:15 <amoralej> that's right.... good point
13:00:15 <sean-k-mooney> so there is actually no regression unless you opt into threaded mode
13:00:26 <sean-k-mooney> so we really have until the end of 2026.2
13:00:28 <dviroel> ack, and we don't need to enable threading mode for the applier by default in this release
13:00:30 <amoralej> right
13:00:42 <sean-k-mooney> i.e. when we remove the eventlet option, or even 2027.1 if we really need it
13:01:09 <sean-k-mooney> so to summarise the general proposal
13:01:10 <amoralej> that makes sense
13:01:18 <sean-k-mooney> 1. in thread mode use the serial engine for now
13:01:27 <sean-k-mooney> 2. continue to support the greenpool when using eventlet
13:01:56 <sean-k-mooney> 3. before we remove eventlet support we should develop a way to do parallel action execution in threaded mode
13:02:10 <sean-k-mooney> 4. defer making it the default until 3 is done
13:02:20 <amoralej> wfm +1
13:02:37 <dviroel> yeah, sounds like a good plan
13:02:42 <jgilaber> +1
13:03:01 <jgilaber> that was a good discussion, but we're over time and have one more topic to cover
13:03:11 <sean-k-mooney> ack let's move on
13:03:12 <jgilaber> we can continue in irc or on the patch
13:03:25 <dviroel> ack, yes jgilaber
13:03:30 <jgilaber> last topic, we can leave the bug triage for next week
13:03:35 <jgilaber> #topic new blueprint for automatic skip actions on pre_condition
13:03:38 <jgilaber> from amoralej
13:03:44 <amoralej> #link https://blueprints.launchpad.net/watcher/+spec/skip-actions-in-pre-condition
13:04:04 <amoralej> as discussed, I've created the blueprint for automatically skipping actions
13:04:21 <amoralej> and sent an initial patch for the migrate action https://review.opendev.org/c/openstack/watcher/+/966699
13:04:29 <amoralej> i plan to send one review per action
13:05:04 <amoralej> I'm not sure what the review process for blueprints is, given that there is no peer review
13:05:10 <amoralej> lemme know if i should add something
13:05:27 <sean-k-mooney> amoralej: the normal process is to present them in a team meeting
13:05:37 <sean-k-mooney> where we would agree it can proceed without a spec or ask for one
13:05:37 <amoralej> ok, so, that was my plan for today :)
13:05:52 <sean-k-mooney> and then if we agree we mark it as approved for the cycle and leave a comment in the blueprint
13:05:54 <amoralej> we can discuss it in the next mtg anyway
13:06:05 <dviroel> +!
13:06:07 <dviroel> +1
13:06:21 <sean-k-mooney> ok we can come back to it then
13:06:46 <amoralej> sure, no problem
13:06:57 <amoralej> I'll add the topic to the agenda
13:07:01 <jgilaber> ack, thanks amoralej
13:07:21 <jgilaber> last topic
13:07:24 <jgilaber> #topic Volunteers to chair next meeting
13:07:41 <jgilaber> any volunteers?
13:08:17 <dviroel> i can't :( - local holiday next thursday
13:08:41 <rlandy> I'll do it
13:08:49 <jgilaber> np dviroel thanks rlandy
13:09:04 <jgilaber> that's it for today, thanks all for participating!
13:09:11 <dviroel> thank you all
13:09:12 <jgilaber> #endmeeting
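During the Taskflow topic, sean-k-mooney suggested factoring action-plan management out so that each node in the action graph just dispatches its action to an applier, instead of relying on Taskflow's parallel engine. The following is a minimal sketch of such a dispatch loop, not Watcher code. It assumes an action plan can be modelled as a mapping from each action to the actions it depends on, uses Python's standard-library `graphlib.TopologicalSorter` to find ready actions, and uses a `ThreadPoolExecutor` as a stand-in for the RPC to an applier. `run_action_plan` and `execute_action` are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Mapping, Set


def run_action_plan(
    graph: Mapping[str, Set[str]],
    execute_action: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Dispatch each action once all of the actions it depends on succeeded.

    Hypothetical sketch: in the design discussed in the meeting,
    execute_action would send an RPC to an applier; a thread pool
    stands in for that here.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    states: Dict[str, str] = {}  # only written by this (dispatching) thread
    in_flight: Dict[Future, str] = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Dispatch every action whose dependencies are all done.
            for action in sorter.get_ready():
                states[action] = "ONGOING"
                in_flight[pool.submit(execute_action, action)] = action
            # Block until at least one dispatched action finishes.
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in finished:
                action = in_flight.pop(future)
                future.result()  # re-raise a failure from the worker
                states[action] = "SUCCEEDED"
                sorter.done(action)  # may make dependent actions ready
    return states
```

Action states are only written by the dispatching thread, which loosely mirrors the other option raised in the meeting of doing all DB operations from a single top-level thread. A fuller version would record a failed action and stop dispatching its dependents rather than re-raising.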