12:00:43 #startmeeting watcher 12:00:43 Meeting started Thu Jun 12 12:00:43 2025 UTC and is due to finish in 60 minutes. The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot. 12:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 12:00:43 The meeting name has been set to 'watcher' 12:00:58 hi all, who's is around today? 12:01:29 o/ 12:01:37 im doing spec review but im here 12:01:43 courtesy ping: amoralej jgilaber chandankumar morenod 12:01:46 o/ 12:01:51 o/ 12:02:01 David proposed openstack/watcher-tempest-plugin master: Add tests for workload_balance with injected data https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949722 12:02:04 o/ 12:02:28 alright, let's start with today's meeting agenda 12:02:33 #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L45 (Meeting agenda) 12:02:51 feel free to add your own topics to the agenda 12:03:00 there are topics to place your changes that requires attention from reviewers 12:03:10 lets start with the first one 12:03:27 #topic Eventlet Removal 12:03:35 #link https://etherpad.opendev.org/p/watcher-eventlet-removal 12:03:40 o/ 12:03:50 not too much to add i think 12:03:51 I updated the patch to move eventlet commands to a different directory 12:04:06 and propose a WIP patch for decision engine 12:04:12 link for related patches: 12:04:18 #link https://etherpad.opendev.org/p/watcher-eventlet-removal#L11 12:04:37 the WIP patch for decision engine, which is now running on CI, is failing :P 12:04:59 the patch allows to replace usage of green thread pools and test against oslo.service new threading backend 12:05:20 as for now, there is an issue with cluster data model updates which I'm investigating and working on a fix on my env 12:05:49 i will bring more updates about that soon 12:06:26 any question or comments on this topic? 12:07:27 ok, moving to the next topic 12:07:36 the plan is to make the change configurable and disabled by default? 12:07:47 or moving to the new threading backend in oslo.service by default? 12:08:02 different project are taking diffent approches 12:08:12 nova is making it configurable via a ENV var 12:08:29 ironic and neutron just ripped out supprot i think 12:08:30 yes amoralej, and have a CI job running with eventlet disabled for the services that we add support for it 12:09:06 i think that, for now we could go with the ENV var to enable/disable eventlet, like nova 12:09:44 i ask because i just found that the oslo.service supporting the threading backend is blocked in RDO because it requires some other updates in libs 12:09:48 cotyledon, iirc 12:10:04 nothing strategic 12:10:12 yes it does 12:10:19 alhtouhg that is not a new requriement in general 12:10:30 cotyledon was already used in some projects in openstack 12:10:33 yes, but it requires > 2.0 12:10:37 that shoudl already be packaged in rdo 12:10:41 and there is 1.7 or something like that 12:10:55 not big thing, but needs to be done 12:11:00 ack but that would have been requireed for the other service that nativly use it already 12:11:03 also new backend is added as an extras 12:11:14 so, it needs to provide an extra optional rpm 12:11:37 good to know 12:11:54 right, good point amoralej 12:11:55 once we start merging support in watcher we wil want to make it a required dep for the watcher rpm 12:12:23 it has automatic deps, so if we add it to requiremets.txt, it should be automatically in the rpm 12:12:26 but that can happen later in the cycle once we actully have it woring 12:12:33 yeap 12:13:14 thanks for the heads up amoralej 12:13:33 dviroel: this is not the way to test that by the way https://review.opendev.org/c/openstack/watcher/+/952257/1/watcher/eventlet.py 12:13:52 you can tweak the job to set the env var for the relevent service 12:14:12 basicly you shoudl not need to comemnt and hard code the behvior 12:14:40 also returnign FALSE here https://review.opendev.org/c/openstack/watcher/+/952257/1/watcher/eventlet.py#26 12:14:45 is semanticly incorrect 12:15:13 it checks the MONKEY_PATCHED for that 12:15:35 but we can discuss more about there in the change, for sure 12:15:43 the idea is not to hardcoded things 12:16:08 but work based on OS_WATCHER_DISABLE_EVENTLET_PATCHING 12:16:26 i know but monkey patch shoudl return None 12:16:40 i dont want to use if _monkey_patch(): 12:17:17 if is_patching_enabled(): is fine 12:17:57 but dont conflate the two "_monkey_patch" shoudl patch if not allreay patched it shoudl not be used to make desions 12:18:08 anyway lets move on 12:18:31 ack, we can discuss more about this in the patch on later in the channel 12:18:38 thanks sean-k-mooney 12:18:58 also is_patched is the function that you shoudl be useing to make desciosn outside this module 12:19:06 ill leave thos comment on the review 12:19:18 lets move on to the next topic, and feel free to comment on the WIP patch too, maybe I'm missing some details there 12:19:21 is_patching_enabled shoudl be private and internal. 12:19:21 sean-k-mooney: ty 12:20:18 #topic Call for reviews 12:20:31 thre is some doc reviews open 12:20:42 #link https://review.opendev.org/c/openstack/watcher/+/951002 Update Host Maintenance strategy documentation 12:21:08 already has some votes on it ^ 12:22:15 #link https://review.opendev.org/c/openstack/watcher/+/951025 Update Workload Balance strategy documentation 12:22:46 rlandy: any comment on this one? 12:23:04 not really - just needs some more votes 12:23:18 the only ones for further discussion are the tables ones 12:23:28 ack, it was recently update, i will review again 12:23:33 those two should match 12:23:38 #link https://review.opendev.org/c/openstack/watcher/+/951440 Add table - level of test/usage per strategy 12:23:49 and 12:23:53 #link https://review.opendev.org/c/openstack/watcher/+/951699 Add Integrations doc page with support matrix 12:25:01 ack, i see that you moved the status from Production -> Supported 12:25:14 to match with the Integration table 12:25:28 they look that are in sync now 12:26:17 going to review it again and vote 12:26:33 pls all, take some time to review these doc updates ^ 12:26:49 there is one more 12:26:52 #link https://review.opendev.org/c/openstack/watcher/+/951899 Add doc clarifications for Zone Migration 12:27:29 ty all for helping improving the docs (and reviewing them) 12:27:51 any other doc patches? 12:28:25 now about specs 12:28:41 ill try and review those all after the meeting 12:28:58 the spec: feat(spec): disable cold/live migration for host maintenance strategy merged minutes ago 12:29:08 yep 12:29:11 we still have 3 open for review 12:29:19 #link https://review.opendev.org/c/openstack/watcher-specs/+/947282 (Adds spec for extend compute model attributes) 12:30:01 which I recently updated to reduce the scope ^ - we would end now with 2 specs, one for extending the model, and anothe one for using this new info in strategies 12:30:29 ya we likely will be able to complete the first this cycle i think 12:30:30 going to propose another one to cover the strategies part 12:30:51 the second might be next cycle as im not sure we can design it in time for sepc freeze 12:31:03 we can see how things go 12:31:10 correct, there is a lot to discuss in the second one 12:31:37 ty for all for comments/reviews 12:31:41 next spec 12:31:44 #link https://review.opendev.org/c/openstack/watcher-specs/+/949528 (Add spec for skip actions from action plan executions) 12:31:59 about that, i still need to update it with your comments 12:32:18 i need to revisit the last ps too 12:32:32 but there is an opened topic from gmaan , which is the status of the actionplan when one ore more actions are skipped 12:32:32 it is on my spec review list 12:33:20 my initial approach was keeping it as succeeded but he challenged if succeeded is the same as succeeded_with_skipped 12:33:33 wdyt? should we add a new state to the actionplan? 12:33:55 or keep as succeeded with some note or something visible? 12:34:17 thats an interesting question 12:34:27 it is :) 12:34:31 i dont htink it shoudl be a uniqe state 12:35:10 it could be but then it calles into question the diffent betwen skiped intentionally and skipped by precondtions 12:35:54 it feels incorrect to skip actions before executing the plann and then to say succeeded_with_skipped 12:36:06 if not addtional actions were skipped during the execution 12:36:27 to feel missleadign to me as all the actions i asked to executed were executed 12:36:49 that's a good point, so skipping manually actions would not affect to the succeeded status, but automatically would 12:37:11 so i woudl only add succeeded_with_skiped if you explcitly exluded ones that were skipped before we started the action plan and that feels overly compelxt IMO 12:37:34 can you capture this in the spec. i can try an loop back 12:37:42 and we can discuss it more with gmaan if they have time 12:37:55 so, what would be the case if the action skips automatically based on pre-condution? 12:38:05 i think we could do it based on th reason filed 12:38:10 yes, we can keep discussing in the review 12:38:10 amoralej: I would still consider an succesful actionplan that had actions skipped automatically as success 12:38:25 basiclaly if we set it to "skipped by user" or some knwo string when skipping ebfore the execution 12:39:02 then we could perhaps have successs_with_skipped if it only was reported that way when there is skipped_by_precondtion in the reaons 12:39:10 that might be the way forward 12:40:09 amoralej: in either case we shoudl capture this in the alternitive section regardles of which path we take 12:40:16 to record this design choice 12:40:18 ack 12:41:19 ack, so for those that want to know more, pls check the spec, there are some discussions in there already 12:41:30 i will do, but instead of using the text reason field, i wonder if that should be a new one 12:41:48 anyway, yes, let's keep the discussion in the spec 12:41:57 any feedback is appreciated 12:42:10 tks amoralej for the updates 12:42:21 thre is one more spec open 12:42:28 #link https://review.opendev.org/c/openstack/watcher-specs/+/949804 (Support multitenancy with Prometheus datasource) 12:42:45 this one I still need to review :( 12:43:30 but sean-k-mooney already raise some discussion/suggestions there 12:43:31 ++ 12:43:53 ya im not partically happy with how the design of this is beign done 12:44:13 metta context as well i think they are assumeign we would backprot this supprot downstream 12:44:34 which they have not discuss with us and i did not plan to do 12:44:51 that a donwstream dicsusion but 12:45:08 form my perspecitve they are not properly doing open desgins and following upsream convestion which im annoyed by 12:46:44 ack, thanks for raising that and for requesting a spec, which allow us to discuss more about the design 12:47:02 i will try to get some eyes on this one too 12:47:18 if someone else have some time, pls also take a look 12:47:49 as it stand the current issue i think is a blocker is changing the obeserablity client to require that we alwasy pass a keystone session 12:48:04 that shoudl only be done if Ateos is used 12:48:10 +1 12:48:32 i condiser it to be a security issue 12:48:41 #link https://opendev.org/openstack/aetos fyi 12:48:41 to pass a admin toke to prometiosu if it is not required 12:48:50 because if that is not authenicated viah https 12:48:58 then it can be sniffed on the wire 12:49:04 and now you have full admin rights 12:49:33 so it is not accpable to pass it when it is not used and it need to be opt in for upgrade reasons 12:50:33 upgrade reasons is a important point to raise, for sure 12:50:58 and also the security concerns 12:51:27 thanks for the discussions sean-k-mooney - we should all ready the spec and the comments there 12:51:45 since it is a important design change for watcher 12:52:08 going to move forward, and we should bring these specs again in the next meetings.. 12:52:33 that's all for now in specs 12:52:39 there are some review request for bug fixes too 12:52:48 #link https://review.opendev.org/c/openstack/watcher/+/952212 bug in workload stabilization 12:53:18 i added a those, which are more urgent and i expect are easy to review 12:53:29 ack, going to take a look 12:53:46 and 12:53:52 #link https://review.opendev.org/c/openstack/watcher/+/952364 Aggregate by label when querying instance cpu usage in prometheus 12:54:10 this second one I need some time to process the query:) 12:54:52 David proposed openstack/watcher-tempest-plugin master: Add tests for workload_balance with injected data https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949722 12:55:00 sure :) 12:55:26 but it is something important to review 12:56:12 any question on these patches? 12:56:23 it's finally only changing the "avg by (instance)" to "avg by (resource)" 12:56:43 but i took the oportunity to move the query format to dict which i think it's better 12:56:55 from %s to dict, i mean 12:56:55 I confirm that after that change, the workload_balance under prometheus is consistent 12:57:30 ack, it is a small change, but it is also a dangerous one :) 12:57:39 so it requires more eyes 12:58:11 i will take a look 12:58:22 yep, we can break it all by chaging prometheus queries, that's right 12:58:33 :) 12:58:56 we are almost out of time 12:59:49 we can review the bugs listed after the meeting, or someone want to highlight something? 13:01:04 thanks mtembo for volunteer to chair next week meeting 13:01:30 I won't join next week's meeting, since it is a public holiday here 13:01:38 thanks too dviroel 13:02:22 ok, i will move the bugs listed to next week meeting, and close for today 13:02:42 thank you all for joining 13:02:56 #endmeeting