12:01:25 <rlandy> #startmeeting Watcher IRC meeting February 27, 2025
12:01:25 <opendevmeet> Meeting started Thu Feb 27 12:01:25 2025 UTC and is due to finish in 60 minutes.  The chair is rlandy. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:25 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:25 <opendevmeet> The meeting name has been set to 'watcher_irc_meeting_february_27__2025'
12:01:37 <rlandy> hi all
12:01:41 <rlandy> who is around?
12:01:49 <amoralej> o/
12:02:00 <marios> hello o/
12:02:02 <dviroel> o/
12:02:33 <rlandy> please add any additional meeting topics to: #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L65
12:03:16 <sean-k-mooney> o/
12:03:35 <rlandy> ok ... let's begin with the topics on the etherpad ...
12:03:53 <rlandy> #topic Code freeze and RC
12:04:19 <rlandy> reminder that those dates are here or coming soon
12:04:43 <sean-k-mooney> code freeze is technically today for features
12:04:56 <sean-k-mooney> bug fixes can merge until rc1
12:05:06 <sean-k-mooney> although we should avoid large changes at this point
12:05:14 <sean-k-mooney> the same is true for ci changes
12:05:38 <sean-k-mooney> they can still merge until rc1 but we should make sure they are stable
12:06:11 <jgilaber__> o/ sorry I'm late
12:06:20 <sean-k-mooney> master will reopen for feature development once the stable branches are cut at rc1
12:06:20 <rlandy> thank you sean-k-mooney for the additional information on what can/should go in
12:06:28 <amoralej> the changes related to prometheus testing in tempest should be merged before rc 1?
12:06:44 <sean-k-mooney> they can be yes
12:06:58 <rlandy> yes ... that brings us to the next topic ...
12:07:00 <sean-k-mooney> so we should not add new jobs in stable, we can extend existing ones however
12:07:33 <rlandy> which is the outstanding reviews
12:08:09 <rlandy> so let's go there so we have enough time to cover them
12:08:22 <rlandy> #topic Reviews that need attention:
12:08:54 <rlandy> dviroel: would you like to take the ones you have on the list?
12:09:03 <dviroel> sure
12:09:14 <dviroel> 1st one
12:09:25 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308  (check for instances in the compute model)
12:09:51 <dviroel> there is more info in the bug report, which explains the issue that we are trying to solve there
12:10:11 <dviroel> it should make the CI more stable since it waits for compute model updates before starting the strategy execution and also at the end of the test, after deleting the instances
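A minimal sketch of the wait-before-acting idea described above, not the code in the change under review: poll until every expected instance UUID appears in a snapshot of the compute model, or give up after a timeout. The names `wait_for_instances_in_model` and `fetch_model_uuids` are hypothetical, and the real tempest plugin helpers may look quite different.

```python
import time


def wait_for_instances_in_model(fetch_model_uuids, expected_uuids,
                                timeout=60.0, interval=1.0):
    """Poll until all expected UUIDs show up in the model, or time out.

    fetch_model_uuids is a hypothetical callable returning the instance
    UUIDs currently present in the compute model.
    """
    expected = set(expected_uuids)
    deadline = time.monotonic() + timeout
    while True:
        missing = expected - set(fetch_model_uuids())
        if not missing:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "instances missing from model: %s" % sorted(missing))
        time.sleep(interval)
```

The same loop, with the set difference reversed, could cover the cleanup case mentioned above (waiting until deleted instances have left the model).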
12:10:53 <dviroel> ci should vote again, now that the issue with the oslo.service regression is solved
12:11:05 <dviroel> any comments on this one?
12:11:15 <rlandy> if CI passes, this is mergeable?
12:11:38 <sean-k-mooney> it should be yes
12:12:03 <dviroel> yes
12:12:10 <sean-k-mooney> for ci only changes we can continue to merge as normal for the next week and a half
12:12:17 <dviroel> there is another change that is testing it:
12:12:25 <sean-k-mooney> i would prefer to stop merging ci changes a day or two before we cut rc1
12:12:32 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/942150 (Enable prometheus datasource in watcher-prometheus-integration job)
12:12:49 <dviroel> ^ depends on the 1st patch
12:13:48 <dviroel> ok, so the next one is
12:14:00 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (support for prometheus datasource in scenario tests)
12:14:07 <dviroel> this one was already mentioned last week, change was updated based on comments from reviewers
12:14:18 <dviroel> ty for reviewing it :)
12:14:35 <dviroel> it is also being tested on https://review.opendev.org/c/openstack/watcher/+/942150 (Enable prometheus datasource in watcher-prometheus-integration job)
12:15:09 <dviroel> where it changes the datasource of 'watcher-prometheus-integration' job to 'prometheus' and enables the strategy execution tests
12:16:29 <dviroel> if we merge the prometheus datasource one, it would be good to merge the ci change as well, since it is going to be testing it
12:17:22 <tkajinam> so we got the new oslo.service pulled into upper-constraints which seems to fix the breakage.
12:17:45 <sean-k-mooney> yep
12:17:52 <dviroel> tkajinam: yes, thanks for proposing the fixes
12:17:52 <sean-k-mooney> it did
12:18:27 <dviroel> any comments on the prometheus datasource change?
12:18:37 <sean-k-mooney> dviroel: as we have been discussing these i have been reviewing them and have approved the first prom datasource patch
12:18:43 <sean-k-mooney> and the one for waiting on the model
12:18:47 <sean-k-mooney> both look good to me
12:19:09 <dviroel> sean-k-mooney: great, thanks for reviewing, i know that you have a huge list of reviews
12:19:32 <rlandy> https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942481 looks like the one that needs review attention
12:19:47 <sean-k-mooney> ya so i wont get to the second prom tool patch today but i might loop back to https://review.opendev.org/c/openstack/watcher/+/942150 once the others have merged
12:20:03 <dviroel> rlandy: right, this is a follow up for prometheus datasource
12:20:03 <rlandy> well there are more of us than just sean-k-mooney
12:20:14 <sean-k-mooney> ya so i wont get to that today
12:20:49 <sean-k-mooney> but if others can review i'll try and get to it early next week
12:21:08 <rlandy> thank you
12:21:28 <rlandy> and thank you tkajinam for joining and for the fixes
12:22:09 <dviroel> there is also a link for test results in the etherpad for the (run promtool in a podified deployment) change
12:22:48 <dviroel> rlandy: that's what I have
12:22:52 <rlandy> anyone have any more questions for dviroel on the reviews discussed?
12:23:03 <rlandy> dviroel++ great work in a short amount of time
12:23:34 <amoralej> yeah, it's great work dviroel
12:23:45 <dviroel> (⊙‿⊙)
12:23:51 <amoralej> I'll review https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942481 today
12:24:08 <dviroel> ty all for the reviews
12:25:02 <rlandy> ok ... let's move on ...
12:25:18 <rlandy> #topic Bug triage
12:25:40 <rlandy> we have a few in the list today - the first of which I did not add ...
12:25:54 <rlandy> #link https://bugs.launchpad.net/watcher/+bug/2098984
12:26:08 <rlandy> Zone Migration Strategy failing to build a list of instances for migration
12:26:20 <rlandy> newly created
12:26:27 <dviroel> ah, I reported this one, i saw this failure while running tests with prometheus datasource
12:26:40 <dviroel> zone migration is failing with a KeyError in some cases
12:26:55 <rlandy> right - you reported that last week
12:27:03 <rlandy> repeated test failures
12:27:14 <dviroel> if you check the link [3] in the Bug
12:27:27 <dviroel> you will see that zone migration is looking directly in nova, to get a list of instances and filtering based on the compute model
12:27:42 <rlandy> #link https://github.com/openstack/watcher/blob/7fcca0cc469b89957fd3821c72c3bb2d167a23ba/watcher/decision_engine/strategy/strategies/zone_migration.py#L528-L530
12:27:49 <dviroel> yes, tks
12:27:57 <dviroel> it is not catching the KeyError exception raised when the instance does not exist in the model
12:28:05 <dviroel> and fails
12:28:31 <dviroel> but I am not even sure if the strategy should be getting instances from nova, instead of looking to the model only
12:28:39 <dviroel> it seems to be the only strategy doing that
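A minimal sketch of the failure mode under discussion and one defensive way around it, assuming a plain dict stands in for the compute model (the real Watcher model is not a dict, and `instances_known_to_model` is a hypothetical name, not a function from the strategy). It shows instances listed by nova being skipped, rather than raising, when the model has not caught up with them yet.

```python
def instances_known_to_model(nova_instance_ids, model_instances):
    """Keep only instances present in the model, skipping stale ones.

    model_instances is a plain dict standing in for the compute model
    (an assumption for illustration only).
    """
    known, skipped = [], []
    for instance_id in nova_instance_ids:
        try:
            known.append(model_instances[instance_id])
        except KeyError:
            # The instance exists in nova but the model has not caught up.
            skipped.append(instance_id)
    return known, skipped
```

Returning the skipped IDs as well keeps the mismatch visible, so a caller could log it instead of silently dropping instances.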
12:29:19 <amoralej> i'd say strategies should rely only on the collected model
12:29:39 <dviroel> in the end, CI should not fail anymore with the https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308 (check for instances in the compute model) fix
12:29:41 <amoralej> maybe some metadata is missing there?
12:30:05 <dviroel> but still, we should verify the behavior of this strategy, so I think that the Bug is still valid
12:30:16 <amoralej> i think so
12:30:20 <rlandy> +1
12:30:33 <amoralej> out of ci, model may be out of sync
12:30:39 <dviroel> correct
12:31:22 <rlandy> so let's up the importance
12:31:28 <rlandy> and moving to triaged
12:31:58 <rlandy> I'll put that on high
12:32:04 <rlandy> so we are sure to come back to it
12:32:45 <dviroel> in the end, it fails, but running the audit again, it can propose a solution
12:33:12 <dviroel> so it is not a high/critical imo
12:33:50 <amoralej> I was thinking if we may be able to improve the way it manages the model, may be a topic for the ptg
12:34:16 <amoralej> also, it only affects one strategy, so i agree, non critical
12:34:19 <sean-k-mooney> the model can be out of sync so ya the strategies should handle not found etc
12:34:29 <dviroel> amoralej: +1
12:34:29 <sean-k-mooney> but i think we need to reflect on the error handling in general
12:34:44 <sean-k-mooney> i suspect this is not the only place where it assumes no external change when running
12:35:05 <sean-k-mooney> one suggestion i have and we can go into it more in the ptg
12:35:15 <sean-k-mooney> is watcher can consume nova notifications
12:35:25 <sean-k-mooney> nova sends notifications for instance create and delete
12:35:29 <sean-k-mooney> and other things
12:35:42 <sean-k-mooney> so we could hook those notifications to update the model reactively
12:36:12 <rlandy> ok - whoever is editing the PTG topics - thank you for adding to that list
12:36:40 <rlandy> I'll move the bug to medium
12:37:04 <dviroel> sean-k-mooney: i think that watcher already has some notification processing, according to some debug messages in logs
12:37:17 <dviroel> but yeah, let's discuss that in more detail at ptg
12:37:33 <amoralej> sean-k-mooney, I'm adding the topic for ptg, feel free to add your ideas there
12:38:02 <dviroel> this-> "DEBUG watcher.decision_engine.model.notification.nova Mapped instance eb46da3f-5b4d-41c4-95f7-648ebc0d8162 to cf119883-9445-4660-967b-e5c49ff08495 {{(pid=97523) update_instance_mapping /opt/stack/watcher/watcher/decision_engine/model/notification/nova.py:237}}..."
12:38:21 <rlandy> #action continue discussion at PTG
12:38:29 <sean-k-mooney> dviroel: it does i just don't know how connected it is.
12:38:46 <sean-k-mooney> i.e. i have not looked to see what we are using them for yet
12:38:55 <sean-k-mooney> we can likely move on
12:38:58 <dviroel> sean-k-mooney: yeah, needs further investigation
12:40:20 <rlandy> thank you for raising this ... to be continued at PTG
12:40:35 <rlandy> ok - let's move on to the rest of the bugs listed
12:40:46 <rlandy> these few are the last few without any recent update
12:41:10 <rlandy> most I think are test related or already fixed so we should be able to action them
12:41:27 <rlandy> #link https://bugs.launchpad.net/watcher/+bug/1790129
12:42:05 <dviroel> 2018, python 2.7
12:42:20 <rlandy> yep - very old
12:42:27 <rlandy> paste still exists though :)
12:42:52 <rlandy> objections to closing this as old version?
12:43:32 <amoralej> or incomplete , no way to reproduce
12:44:08 <jgilaber__> the test is passing now, so I think we can close this one
12:44:14 <jgilaber__> for example in https://702b7e8f253d29e679a6-2fe3f6c342189909aad5220492fb4721.ssl.cf1.rackcdn.com/942150/6/check/openstack-tox-py39/2c67349/testr_results.html
12:44:54 <rlandy> thanks jgilaber__
12:45:07 <dviroel> the paste also points to ceilometer as backend...
12:45:22 <sean-k-mooney> let's close as incomplete
12:46:04 <rlandy> marked incomplete ... moving on
12:46:16 <rlandy> #link https://bugs.launchpad.net/watcher/+bug/1829075
12:46:29 <rlandy> watcher-tls-test intermittently fails cloning gnocchi from github
12:47:37 <rlandy> I don't think we have hit clone issues lately?
12:47:51 <jgilaber__> was this job removed? I can't find its history
12:48:14 <rlandy> 2019
12:48:21 <rlandy> possible - long time ago
12:48:32 <sean-k-mooney> ya so we are not meant to clone in the jobs
12:48:53 <sean-k-mooney> i suspect this was fixed by either removing the job or making it use gnocchi cloned by zuul
12:49:17 <sean-k-mooney> actually i may have even fixed this indirectly
12:50:00 <sean-k-mooney> https://review.opendev.org/c/openstack/ceilometer/+/872332
12:50:10 <sean-k-mooney> https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/872350
12:50:37 <rlandy> ok - so fix released?
12:50:37 <sean-k-mooney> i made the plugin that installed it now actually use zuul to handle the cloning
12:50:43 <sean-k-mooney> yep or invalid
12:50:46 <tkajinam> you might have to add gnocchi to required-projects
12:50:48 <sean-k-mooney> it wasnt a bug in watcher
12:51:50 <rlandy> ok - moving on
12:51:54 <rlandy> #link https://bugs.launchpad.net/watcher/+bug/1828598
12:52:06 <rlandy> test_execute_workload_stabilization intermittently fails because server is deleted before live migration is complete
12:52:20 <rlandy> again 2019
12:53:09 <jgilaber__> https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/658725 was proposed as a possible fix
12:53:09 <rlandy> possible fix is listed
12:53:15 <rlandy> ack
12:53:48 <rlandy> we should be running a workload stabilization test
12:53:54 <rlandy> so have a recent history?
12:54:20 <sean-k-mooney> we are and it's passing as far as i am aware
12:54:26 <sean-k-mooney> so i think this could be marked fix released
12:55:44 <rlandy> updating bug
12:55:53 <dviroel> +
12:55:56 <dviroel> +1
12:56:01 <rlandy> and last one ...
12:56:05 <rlandy> #link https://bugs.launchpad.net/watcher/+bug/1807180
12:56:16 <rlandy> OSC watcher plugin doesn't use updated API versioning
12:56:25 <rlandy> This issue was fixed in the openstack/python-watcherclient 2.3.0 release.
12:56:30 <rlandy> per bug
12:57:25 <dviroel> right, we don't see that issue happening in our env
12:57:41 <rlandy> updating per that comment
12:58:00 <rlandy> ok - we're coming up to time so ..
12:58:12 <rlandy> #topic Volunteers to chair March 6th meeting
12:58:19 <rlandy> thank you dviroel
12:58:26 <dviroel> o/
12:58:36 <rlandy> any other topics to raise before we close out?
12:59:17 <amoralej> I've been adding topics to the list of items for ptg, to track conversations we've had in the last weeks/months, feel free to comment on them, we can filter the list when ptg is closer
12:59:32 <rlandy> thank you amoralej
13:00:04 <rlandy> ok - thanks all ...
13:00:06 <rlandy> #endmeeting