12:01:25 #startmeeting Watcher IRC meeting February 27, 2025
12:01:25 Meeting started Thu Feb 27 12:01:25 2025 UTC and is due to finish in 60 minutes. The chair is rlandy. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:25 The meeting name has been set to 'watcher_irc_meeting_february_27__2025'
12:01:37 hi all
12:01:41 who is around?
12:01:49 o/
12:02:00 hello o/
12:02:02 o/
12:02:33 please add any additional meeting topics to: #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L65
12:03:16 o/
12:03:35 ok ... let's begin with the topics on the etherpad ...
12:03:53 #topic Code freeze and RC
12:04:19 reminder that those dates are here or coming soon
12:04:43 code freeze is technically today for features
12:04:56 bug fixes can merge until rc1
12:05:06 although we should avoid large changes at this point
12:05:14 the same is true for ci changes
12:05:38 they can still merge until rc1 but we should make sure they are stable
12:06:11 o/ sorry I'm late
12:06:20 master will reopen for feature development once the stable branches are cut at rc1
12:06:20 thank you sean-k-mooney for the additional information on what can/should go in
12:06:28 the changes related to prometheus testing in tempest should be merged before rc 1?
12:06:44 they can be yes
12:06:58 yes ... that brings us to the next topic ...
12:07:00 so we should not add new jobs in stable, we can extend existing ones however
12:07:33 which is the outstanding reviews
12:08:09 so let's go there so we have enough time to cover them
12:08:22 #topic Reviews that need attention:
12:08:54 dviroel: would you like to take the ones you have on the list?
12:09:03 sure
12:09:14 1st one
12:09:25 #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308 (check for instances in the compute model)
12:09:51 there is more info in the bug report, which explains the issue that we are trying to solve there
12:10:11 it should make the CI more stable since it waits for compute model updates before starting the strategy execution and also at the end of the test, after deleting the instances
12:10:53 ci should vote again, now that the issue with oslo.service regression is solved
12:11:05 any comments on this one?
12:11:15 if CI passes, this is mergeable?
12:11:38 it should be yes
12:12:03 yes
12:12:10 for ci only changes we can continue to merge as normal for the next week and a half
12:12:17 there is another change that is testing it:
12:12:25 i would prefer to stop merging ci changes a day or two before we cut rc1
12:12:32 #link https://review.opendev.org/c/openstack/watcher/+/942150 (Enable prometheus datasource in watcher-prometheus-integration job)
12:12:49 ^ depends on the 1st patch
12:13:48 ok, so the next one is
12:14:00 #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (support for prometheus datasource in scenario tests)
12:14:07 this one was already mentioned last week, change was updated based on comments from reviewers
12:14:18 ty for reviewing it :)
12:14:35 it is also being tested on https://review.opendev.org/c/openstack/watcher/+/942150 (Enable prometheus datasource in watcher-prometheus-integration job)
12:15:09 where is changes the datasource of 'watcher-prometheus-integration' job to 'prometheus' and enable the strategy execution tests
12:15:29 s/where is/where it
12:16:29 if we merge the prometheus datasource one, it would be good to merge the ci change as well, since it is going to be testing it
12:17:22 so we got the new oslo.service pulled into upper-constraints which seems to fix the breakage.
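Note on the 12:10:11 message: the "wait for compute model updates" idea can be illustrated with a small polling helper. This is only a sketch under stated assumptions, not the code in change 942308. The function name, its parameters, and the `get_model_instance_ids` callable are hypothetical and stand in for whatever call the test uses to list instances in the compute model.

```python
import time


def wait_for_instances_in_model(get_model_instance_ids, expected_ids,
                                timeout=60.0, interval=2.0, present=True,
                                sleep=time.sleep, clock=time.monotonic):
    """Poll a (hypothetical) model listing until it matches expectations.

    With present=True, wait until every id in expected_ids is in the model
    (e.g. before starting the strategy). With present=False, wait until none
    of them is in the model (e.g. after deleting the test instances).
    Returns True on success and False if the timeout expires first.
    """
    expected = set(expected_ids)
    deadline = clock() + timeout
    while True:
        current = set(get_model_instance_ids())
        if present and expected <= current:
            return True
        if not present and not (expected & current):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would call it once with `present=True` after creating instances and once with `present=False` after deleting them, so that the strategy never runs against a model that has not caught up yet.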
12:17:45 yep
12:17:52 tkajinam: yes, thanks for proposing the fixes
12:17:52 it did
12:18:27 any comments on the prometheus datasource change?
12:18:37 dviroel: as we have been discussing these i have been reviewing them and have approved the first prom datasource patch
12:18:43 and the one for waiting on the model
12:18:47 both look good to me
12:19:09 sean-k-mooney: great, thanks for reviewing, i know that you have a huge list of reviews
12:19:32 https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942481 looks like the one that needs review attention
12:19:47 ya so i wont get to the second prom tool patch today but i might loop back to https://review.opendev.org/c/openstack/watcher/+/942150 once the others have merged
12:20:03 rlandy: right, this is a follow up for prometheus datasource
12:20:03 well there are more of us than just sean-k-mooney
12:20:14 ya so i wont get to that today
12:20:49 but if others can review ill try and get to it early next week
12:21:08 thank you
12:21:28 and thank you tkajinam for joining and for the fixes
12:22:09 there is also a link for test results in the etherpad for the (run promtool in a podified deployment) change
12:22:48 rlandy: that's what I have
12:22:52 anyone have any more questions for dviroel on the reviews discussed?
12:23:03 dviroel++ great work in a short amount of time
12:23:34 yeah, it's great work dviroel
12:23:45 (⊙‿⊙)
12:23:51 I'll review https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942481 today
12:24:08 ty all for the reviews
12:25:02 ok ... let's move on ...
12:25:18 #topic Bug triage
12:25:40 we have a few in the list today - the first of which I did not add ...
12:25:54 #link https://bugs.launchpad.net/watcher/+bug/2098984
12:26:08 Zone Migration Strategy failing to build a list of instances for migration
12:26:20 newly created
12:26:27 ah, I reported this one, i saw this failure while running tests with prometheus datasource
12:26:40 zone migration is failing with a KeyError in some cases
12:26:55 right - you reported that last week
12:27:03 repeated test failures
12:27:14 if you check the link [3] in the Bug
12:27:27 you will see that zone migration is looking directly in nova, to get a list of instances and filtering based on the compute model
12:27:42 #link https://github.com/openstack/watcher/blob/7fcca0cc469b89957fd3821c72c3bb2d167a23ba/watcher/decision_engine/strategy/strategies/zone_migration.py#L528-L530
12:27:49 yes, tks
12:27:57 it is not catching the KeyError exception raised when the instance does not exist in the model
12:28:05 and fails
12:28:31 but I am not even sure if the strategy should be getting instances from nova, instead of looking to the model only
12:28:39 it seems to be the only strategy doing that
12:29:19 i'd say strategies should rely only on the collected model
12:29:39 in the end, CI should not fail anymore with the https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308 (check for instances in the compute model) fix
12:29:41 maybe some metadata is missing there?
12:30:05 but still, we should verify the behavior of this strategy, so I think that the Bug is still valid
12:30:16 i think so
12:30:20 +1
12:30:33 out of ci, model may be out of sync
12:30:39 correct
12:31:22 so let's up the importance
12:31:28 and moving to triaged
12:31:58 I'll put that on high
12:32:04 so we are sure to come back to it
12:32:45 in the end, it fails, but running the audit again, it can propose a solution
12:33:12 so it is not a high/critical imo
12:33:50 I was thinking if we may be able to improve the way it manages the model, may be a topic for the ptg
12:34:16 also, it only affects one strategy, so i agree, non critical
12:34:19 using the model, it can be out of sync so ya the strategies should handle not found etc
12:34:29 amoralej: +1
12:34:29 but i think we need to reflect on the error handling in general
12:34:44 i suspect this is not the only place where it assumes no external change when running
12:35:05 one suggestion i have and we can go into it more in the ptg
12:35:15 is watcher can consume nova notifications
12:35:25 nova sends notification for instance create and delete
12:35:29 and other things
12:35:42 so we could hook those notifications to update the model reactively
12:36:12 ok - whoever is editing the PTG topics - thank you for adding to that list
12:36:40 I'll move the bug to medium
12:37:04 sean-k-mooney: i think that watcher already has some notification processing, according to some debug messages in logs
12:37:17 but yeah, lets discuss that in more detail at ptg
12:37:33 sean-k-mooney, I'm adding the topic for ptg, feel free to add your ideas there
12:38:02 this-> "DEBUG watcher.decision_engine.model.notification.nova Mapped instance eb46da3f-5b4d-41c4-95f7-648ebc0d8162 to cf119883-9445-4660-967b-e5c49ff08495 {{(pid=97523) update_instance_mapping /opt/stack/watcher/watcher/decision_engine/model/notification/nova.py:237}}..."
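Note on bug 2098984: the failure mode described above (an instance listed by nova but missing from the compute model raises an uncaught KeyError) and the suggestion that strategies should handle "not found" can be sketched as follows. This is an illustration only, not the Watcher code or a proposed patch. `FakeComputeModel`, `lookup_instance`, and `instances_known_to_model` are hypothetical names, and the real model API may raise a different exception.

```python
import logging

LOG = logging.getLogger(__name__)


class FakeComputeModel:
    """Hypothetical stand-in for the compute model (not Watcher's class)."""

    def __init__(self, instances):
        self._instances = dict(instances)

    def lookup_instance(self, uuid):
        # Assumption: a missing instance surfaces as KeyError, as in the bug.
        return self._instances[uuid]


def instances_known_to_model(nova_instance_ids, model):
    """Keep only the instances that the model also knows about.

    Instances that nova reports but the model has not yet collected are
    skipped with a debug message instead of aborting the whole strategy.
    """
    known = []
    for uuid in nova_instance_ids:
        try:
            known.append(model.lookup_instance(uuid))
        except KeyError:
            LOG.debug("Instance %s not in compute model, skipping", uuid)
    return known
```

Skipping unknown instances matches the observation at 12:32:45 that a later audit can still propose a solution once the model has caught up. Whether the strategy should query nova at all is left for the PTG discussion.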
12:38:21 #action continue discussion at PTG
12:38:29 dviroel: it does i just dont know how connected it is.
12:38:46 i.e. i have not looked to see what we are using them for yet
12:38:55 we can likely move on
12:38:58 sean-k-mooney: yeah, needs further investigation
12:40:20 thank you for raising this ... to be continued at PTG
12:40:35 ok - let's move on to the rest of the bugs listed
12:40:46 these few are the last few without any recent update
12:41:10 most I think are test related or already fixed so we should be able to action them
12:41:27 #link https://bugs.launchpad.net/watcher/+bug/1790129
12:42:05 2018, python 2.7
12:42:20 yep - very old
12:42:27 paste still exists though :)
12:42:52 objections to closing this as old version?
12:43:32 or incomplete, no way to reproduce
12:44:08 the test is passing now, so I think we can close this one
12:44:14 for example in https://702b7e8f253d29e679a6-2fe3f6c342189909aad5220492fb4721.ssl.cf1.rackcdn.com/942150/6/check/openstack-tox-py39/2c67349/testr_results.html
12:44:54 thanks jgilaber__
12:45:07 the paste also points to ceilometer as backend...
12:45:22 lets close as incomplete
12:46:04 marked incomplete ... moving on
12:46:16 #link https://bugs.launchpad.net/watcher/+bug/1829075
12:46:29 watcher-tls-test intermittently fails cloning gnocchi from github
12:47:37 I don't think we have hit clone issues lately?
12:47:51 was this job removed? I can't find its history
12:48:14 2019
12:48:21 possible - long time ago
12:48:32 ya so we are not meant to clone in the jobs
12:48:53 i suspect this was fixed by either removing the job or making it use gnocchi cloned by zuul
12:49:17 actually i may have even fixed this indirectly
12:50:00 https://review.opendev.org/c/openstack/ceilometer/+/872332
12:50:10 https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/872350
12:50:37 ok - so fix released?
12:50:37 i made the plugin that installed it now actually use zuul to handle the cloning
12:50:43 yep or invalid
12:50:46 you might have to add gnocchi to required-projects
12:50:48 it wasnt a bug in watcher
12:51:50 ok - moving on
12:51:54 #link https://bugs.launchpad.net/watcher/+bug/1828598
12:52:06 test_execute_workload_stabilization intermittently fails because server is deleted before live migration is complete
12:52:20 again 2019
12:53:09 https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/658725 was proposed as a possible fix
12:53:09 possible fix is listed
12:53:15 ack
12:53:48 we should be running a workload stabilization test
12:53:54 so have a recent history?
12:54:20 we are and its passing as far as i am aware
12:54:26 so i think this could be marked fix released
12:55:44 updating bug
12:55:53 +
12:55:56 +1
12:56:01 and last one ...
12:56:05 #link https://bugs.launchpad.net/watcher/+bug/1807180
12:56:16 OSC watcher plugin doesn't use updated API versioning
12:56:25 This issue was fixed in the openstack/python-watcherclient 2.3.0 release.
12:56:30 per bug
12:57:25 right, we don't see that issue happening in our env
12:57:41 updating per that comment
12:58:00 ok - we're coming up to time so ..
12:58:12 #topic Volunteers to chair March 6th meeting
12:58:19 thank you dviroel
12:58:26 o/
12:58:36 any other topics to raise before we close out?
12:59:17 I've been adding topics to the list of items for ptg, to track conversations we've had in last weeks/months, feel free to comment on them, we can filter the list when ptg is closer
12:59:32 thank you amoralej
13:00:04 ok - thanks all ...
13:00:06 #endmeeting