12:00:43 <dviroel> #startmeeting watcher 12:00:43 <opendevmeet> Meeting started Thu Jun 12 12:00:43 2025 UTC and is due to finish in 60 minutes. The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot. 12:00:43 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 12:00:43 <opendevmeet> The meeting name has been set to 'watcher' 12:00:58 <dviroel> hi all, who's is around today? 12:01:29 <sean-k-mooney> o/ 12:01:37 <sean-k-mooney> im doing spec review but im here 12:01:43 <dviroel> courtesy ping: amoralej jgilaber chandankumar morenod 12:01:46 <jgilaber> o/ 12:01:51 <mtembo> o/ 12:02:01 <opendevreview> David proposed openstack/watcher-tempest-plugin master: Add tests for workload_balance with injected data https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949722 12:02:04 <amoralej> o/ 12:02:28 <dviroel> alright, let's start with today's meeting agenda 12:02:33 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L45 (Meeting agenda) 12:02:51 <dviroel> feel free to add your own topics to the agenda 12:03:00 <dviroel> there are topics to place your changes that requires attention from reviewers 12:03:10 <dviroel> lets start with the first one 12:03:27 <dviroel> #topic Eventlet Removal 12:03:35 <dviroel> #link https://etherpad.opendev.org/p/watcher-eventlet-removal 12:03:40 <rlandy> o/ 12:03:50 <dviroel> not too much to add i think 12:03:51 <dviroel> I updated the patch to move eventlet commands to a different directory 12:04:06 <dviroel> and propose a WIP patch for decision engine 12:04:12 <dviroel> link for related patches: 12:04:18 <dviroel> #link https://etherpad.opendev.org/p/watcher-eventlet-removal#L11 12:04:37 <dviroel> the WIP patch for decision engine, which is now running on CI, is failing :P 12:04:59 <dviroel> the patch allows to replace usage of green thread pools and test against oslo.service new threading backend 12:05:20 <dviroel> as for now, there is an issue with cluster data model updates which I'm investigating and working on a fix on my env 12:05:49 <dviroel> i will bring more updates about that soon 12:06:26 <dviroel> any question or comments on this topic? 12:07:27 <dviroel> ok, moving to the next topic 12:07:36 <amoralej> the plan is to make the change configurable and disabled by default? 12:07:47 <amoralej> or moving to the new threading backend in oslo.service by default? 12:08:02 <sean-k-mooney> different project are taking diffent approches 12:08:12 <sean-k-mooney> nova is making it configurable via a ENV var 12:08:29 <sean-k-mooney> ironic and neutron just ripped out supprot i think 12:08:30 <dviroel> yes amoralej, and have a CI job running with eventlet disabled for the services that we add support for it 12:09:06 <dviroel> i think that, for now we could go with the ENV var to enable/disable eventlet, like nova 12:09:44 <amoralej> i ask because i just found that the oslo.service supporting the threading backend is blocked in RDO because it requires some other updates in libs 12:09:48 <amoralej> cotyledon, iirc 12:10:04 <amoralej> nothing strategic 12:10:12 <sean-k-mooney> yes it does 12:10:19 <sean-k-mooney> alhtouhg that is not a new requriement in general 12:10:30 <sean-k-mooney> cotyledon was already used in some projects in openstack 12:10:33 <amoralej> yes, but it requires > 2.0 12:10:37 <sean-k-mooney> that shoudl already be packaged in rdo 12:10:41 <amoralej> and there is 1.7 or something like that 12:10:55 <amoralej> not big thing, but needs to be done 12:11:00 <sean-k-mooney> ack but that would have been requireed for the other service that nativly use it already 12:11:03 <amoralej> also new backend is added as an extras 12:11:14 <amoralej> so, it needs to provide an extra optional rpm 12:11:37 <sean-k-mooney> good to know 12:11:54 <dviroel> right, good point amoralej 12:11:55 <sean-k-mooney> once we start merging support in watcher we wil want to make it a required dep for the watcher rpm 12:12:23 <amoralej> it has automatic deps, so if we add it to requiremets.txt, it should be automatically in the rpm 12:12:26 <sean-k-mooney> but that can happen later in the cycle once we actully have it woring 12:12:33 <amoralej> yeap 12:13:14 <dviroel> thanks for the heads up amoralej 12:13:33 <sean-k-mooney> dviroel: this is not the way to test that by the way https://review.opendev.org/c/openstack/watcher/+/952257/1/watcher/eventlet.py 12:13:52 <sean-k-mooney> you can tweak the job to set the env var for the relevent service 12:14:12 <sean-k-mooney> basicly you shoudl not need to comemnt and hard code the behvior 12:14:40 <sean-k-mooney> also returnign FALSE here https://review.opendev.org/c/openstack/watcher/+/952257/1/watcher/eventlet.py#26 12:14:45 <sean-k-mooney> is semanticly incorrect 12:15:13 <dviroel> it checks the MONKEY_PATCHED for that 12:15:35 <dviroel> but we can discuss more about there in the change, for sure 12:15:43 <dviroel> the idea is not to hardcoded things 12:16:08 <dviroel> but work based on OS_WATCHER_DISABLE_EVENTLET_PATCHING 12:16:26 <sean-k-mooney> i know but monkey patch shoudl return None 12:16:40 <sean-k-mooney> i dont want to use if _monkey_patch(): 12:17:17 <sean-k-mooney> if is_patching_enabled(): is fine 12:17:57 <sean-k-mooney> but dont conflate the two "_monkey_patch" shoudl patch if not allreay patched it shoudl not be used to make desions 12:18:08 <sean-k-mooney> anyway lets move on 12:18:31 <dviroel> ack, we can discuss more about this in the patch on later in the channel 12:18:38 <dviroel> thanks sean-k-mooney 12:18:58 <sean-k-mooney> also is_patched is the function that you shoudl be useing to make desciosn outside this module 12:19:06 <sean-k-mooney> ill leave thos comment on the review 12:19:18 <dviroel> lets move on to the next topic, and feel free to comment on the WIP patch too, maybe I'm missing some details there 12:19:21 <sean-k-mooney> is_patching_enabled shoudl be private and internal. 12:19:21 <dviroel> sean-k-mooney: ty 12:20:18 <dviroel> #topic Call for reviews 12:20:31 <dviroel> thre is some doc reviews open 12:20:42 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/951002 Update Host Maintenance strategy documentation 12:21:08 <dviroel> already has some votes on it ^ 12:22:15 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/951025 Update Workload Balance strategy documentation 12:22:46 <dviroel> rlandy: any comment on this one? 12:23:04 <rlandy> not really - just needs some more votes 12:23:18 <rlandy> the only ones for further discussion are the tables ones 12:23:28 <dviroel> ack, it was recently update, i will review again 12:23:33 <rlandy> those two should match 12:23:38 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/951440 Add table - level of test/usage per strategy 12:23:49 <dviroel> and 12:23:53 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/951699 Add Integrations doc page with support matrix 12:25:01 <dviroel> ack, i see that you moved the status from Production -> Supported 12:25:14 <dviroel> to match with the Integration table 12:25:28 <dviroel> they look that are in sync now 12:26:17 <dviroel> going to review it again and vote 12:26:33 <dviroel> pls all, take some time to review these doc updates ^ 12:26:49 <dviroel> there is one more 12:26:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/951899 Add doc clarifications for Zone Migration 12:27:29 <dviroel> ty all for helping improving the docs (and reviewing them) 12:27:51 <dviroel> any other doc patches? 12:28:25 <dviroel> now about specs 12:28:41 <sean-k-mooney> ill try and review those all after the meeting 12:28:58 <dviroel> the spec: feat(spec): disable cold/live migration for host maintenance strategy merged minutes ago 12:29:08 <sean-k-mooney> yep 12:29:11 <dviroel> we still have 3 open for review 12:29:19 <dviroel> #link https://review.opendev.org/c/openstack/watcher-specs/+/947282 (Adds spec for extend compute model attributes) 12:30:01 <dviroel> which I recently updated to reduce the scope ^ - we would end now with 2 specs, one for extending the model, and anothe one for using this new info in strategies 12:30:29 <sean-k-mooney> ya we likely will be able to complete the first this cycle i think 12:30:30 <dviroel> going to propose another one to cover the strategies part 12:30:51 <sean-k-mooney> the second might be next cycle as im not sure we can design it in time for sepc freeze 12:31:03 <sean-k-mooney> we can see how things go 12:31:10 <dviroel> correct, there is a lot to discuss in the second one 12:31:37 <dviroel> ty for all for comments/reviews 12:31:41 <dviroel> next spec 12:31:44 <dviroel> #link https://review.opendev.org/c/openstack/watcher-specs/+/949528 (Add spec for skip actions from action plan executions) 12:31:59 <amoralej> about that, i still need to update it with your comments 12:32:18 <dviroel> i need to revisit the last ps too 12:32:32 <amoralej> but there is an opened topic from gmaan , which is the status of the actionplan when one ore more actions are skipped 12:32:32 <dviroel> it is on my spec review list 12:33:20 <amoralej> my initial approach was keeping it as succeeded but he challenged if succeeded is the same as succeeded_with_skipped 12:33:33 <amoralej> wdyt? should we add a new state to the actionplan? 12:33:55 <amoralej> or keep as succeeded with some note or something visible? 12:34:17 <sean-k-mooney> thats an interesting question 12:34:27 <amoralej> it is :) 12:34:31 <sean-k-mooney> i dont htink it shoudl be a uniqe state 12:35:10 <sean-k-mooney> it could be but then it calles into question the diffent betwen skiped intentionally and skipped by precondtions 12:35:54 <sean-k-mooney> it feels incorrect to skip actions before executing the plann and then to say succeeded_with_skipped 12:36:06 <sean-k-mooney> if not addtional actions were skipped during the execution 12:36:27 <sean-k-mooney> to feel missleadign to me as all the actions i asked to executed were executed 12:36:49 <amoralej> that's a good point, so skipping manually actions would not affect to the succeeded status, but automatically would 12:37:11 <sean-k-mooney> so i woudl only add succeeded_with_skiped if you explcitly exluded ones that were skipped before we started the action plan and that feels overly compelxt IMO 12:37:34 <sean-k-mooney> can you capture this in the spec. i can try an loop back 12:37:42 <sean-k-mooney> and we can discuss it more with gmaan if they have time 12:37:55 <amoralej> so, what would be the case if the action skips automatically based on pre-condution? 12:38:05 <sean-k-mooney> i think we could do it based on th reason filed 12:38:10 <amoralej> yes, we can keep discussing in the review 12:38:10 <jgilaber> amoralej: I would still consider an succesful actionplan that had actions skipped automatically as success 12:38:25 <sean-k-mooney> basiclaly if we set it to "skipped by user" or some knwo string when skipping ebfore the execution 12:39:02 <sean-k-mooney> then we could perhaps have successs_with_skipped if it only was reported that way when there is skipped_by_precondtion in the reaons 12:39:10 <sean-k-mooney> that might be the way forward 12:40:09 <sean-k-mooney> amoralej: in either case we shoudl capture this in the alternitive section regardles of which path we take 12:40:16 <sean-k-mooney> to record this design choice 12:40:18 <amoralej> ack 12:41:19 <dviroel> ack, so for those that want to know more, pls check the spec, there are some discussions in there already 12:41:30 <amoralej> i will do, but instead of using the text reason field, i wonder if that should be a new one 12:41:48 <amoralej> anyway, yes, let's keep the discussion in the spec 12:41:57 <amoralej> any feedback is appreciated 12:42:10 <dviroel> tks amoralej for the updates 12:42:21 <dviroel> thre is one more spec open 12:42:28 <dviroel> #link https://review.opendev.org/c/openstack/watcher-specs/+/949804 (Support multitenancy with Prometheus datasource) 12:42:45 <dviroel> this one I still need to review :( 12:43:30 <dviroel> but sean-k-mooney already raise some discussion/suggestions there 12:43:31 <dviroel> ++ 12:43:53 <sean-k-mooney> ya im not partically happy with how the design of this is beign done 12:44:13 <sean-k-mooney> metta context as well i think they are assumeign we would backprot this supprot downstream 12:44:34 <sean-k-mooney> which they have not discuss with us and i did not plan to do 12:44:51 <sean-k-mooney> that a donwstream dicsusion but 12:45:08 <sean-k-mooney> form my perspecitve they are not properly doing open desgins and following upsream convestion which im annoyed by 12:46:44 <dviroel> ack, thanks for raising that and for requesting a spec, which allow us to discuss more about the design 12:47:02 <dviroel> i will try to get some eyes on this one too 12:47:18 <dviroel> if someone else have some time, pls also take a look 12:47:49 <sean-k-mooney> as it stand the current issue i think is a blocker is changing the obeserablity client to require that we alwasy pass a keystone session 12:48:04 <sean-k-mooney> that shoudl only be done if Ateos is used 12:48:10 <dviroel> +1 12:48:32 <sean-k-mooney> i condiser it to be a security issue 12:48:41 <dviroel> #link https://opendev.org/openstack/aetos fyi 12:48:41 <sean-k-mooney> to pass a admin toke to prometiosu if it is not required 12:48:50 <sean-k-mooney> because if that is not authenicated viah https 12:48:58 <sean-k-mooney> then it can be sniffed on the wire 12:49:04 <sean-k-mooney> and now you have full admin rights 12:49:33 <sean-k-mooney> so it is not accpable to pass it when it is not used and it need to be opt in for upgrade reasons 12:50:33 <dviroel> upgrade reasons is a important point to raise, for sure 12:50:58 <dviroel> and also the security concerns 12:51:27 <dviroel> thanks for the discussions sean-k-mooney - we should all ready the spec and the comments there 12:51:45 <dviroel> since it is a important design change for watcher 12:52:08 <dviroel> going to move forward, and we should bring these specs again in the next meetings.. 12:52:33 <dviroel> that's all for now in specs 12:52:39 <dviroel> there are some review request for bug fixes too 12:52:48 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952212 bug in workload stabilization 12:53:18 <amoralej> i added a those, which are more urgent and i expect are easy to review 12:53:29 <dviroel> ack, going to take a look 12:53:46 <dviroel> and 12:53:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952364 Aggregate by label when querying instance cpu usage in prometheus 12:54:10 <dviroel> this second one I need some time to process the query:) 12:54:52 <opendevreview> David proposed openstack/watcher-tempest-plugin master: Add tests for workload_balance with injected data https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949722 12:55:00 <amoralej> sure :) 12:55:26 <dviroel> but it is something important to review 12:56:12 <dviroel> any question on these patches? 12:56:23 <amoralej> it's finally only changing the "avg by (instance)" to "avg by (resource)" 12:56:43 <amoralej> but i took the oportunity to move the query format to dict which i think it's better 12:56:55 <amoralej> from %s to dict, i mean 12:56:55 <morenod> I confirm that after that change, the workload_balance under prometheus is consistent 12:57:30 <dviroel> ack, it is a small change, but it is also a dangerous one :) 12:57:39 <dviroel> so it requires more eyes 12:58:11 <dviroel> i will take a look 12:58:22 <amoralej> yep, we can break it all by chaging prometheus queries, that's right 12:58:33 <amoralej> :) 12:58:56 <dviroel> we are almost out of time 12:59:49 <dviroel> we can review the bugs listed after the meeting, or someone want to highlight something? 13:01:04 <dviroel> thanks mtembo for volunteer to chair next week meeting 13:01:30 <dviroel> I won't join next week's meeting, since it is a public holiday here 13:01:38 <mtembo> thanks too dviroel 13:02:22 <dviroel> ok, i will move the bugs listed to next week meeting, and close for today 13:02:42 <dviroel> thank you all for joining 13:02:56 <dviroel> #endmeeting