12:00:38 <dviroel> #startmeeting watcher
12:00:38 <opendevmeet> Meeting started Thu Mar  6 12:00:38 2025 UTC and is due to finish in 60 minutes.  The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:38 <opendevmeet> The meeting name has been set to 'watcher'
12:01:00 <dviroel> hi all, who's around today?
12:01:05 <rlandy> o/
12:01:09 <marios> o/
12:01:14 <mtembo> o/
12:01:14 <jgilaber> o/
12:01:48 <amoralej> o/
12:02:02 <dviroel> o/  thanks for joining
12:02:09 <dviroel> let's start with today's meeting agenda
12:02:16 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting (Meeting agenda)
12:02:44 <dviroel> please feel free to add your own topics to the agenda if you want to highlight something :)
12:03:15 <dviroel> first one
12:03:26 <dviroel> #topic Feature Freeze
12:03:42 <dviroel> #link https://releases.openstack.org/epoxy/schedule.html (2025.1 Epoxy Release Schedule)
12:03:55 <dviroel> we are now in feature freeze period
12:04:17 <dviroel> meaning that no new features, configuration changes or even string changes should be merged from now on, or they should at least be avoided
12:05:04 <dviroel> we also don't have any FFE requests so far, so we should be ok for next week
12:05:40 <dviroel> which is RC1 target
12:06:15 <chandankumar> o/
12:06:24 <dviroel> let me highlight some patches from sean-k-mooney
12:06:35 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/943206 (Add epoxy prelude)
12:07:07 <dviroel> and
12:07:19 <dviroel> #link https://review.opendev.org/c/openstack/releases/+/943489 (Add cycle highlights for watcher in 2025.1)
12:08:10 <dviroel> thanks sean-k-mooney for proposing these patches
12:08:32 <sean-k-mooney> o/
12:08:42 <dviroel> if you have any comment or addition to the epoxy prelude, pls add your review there
12:09:17 <sean-k-mooney> i would like to merge the prelude patch early next week
12:09:21 <sean-k-mooney> i.e. monday
12:09:26 <sean-k-mooney> so that its included in RC1
12:09:35 <sean-k-mooney> so please provide any feedback before then
12:09:57 <dviroel> +1
12:10:48 <dviroel> any other comments for this topic?
12:11:04 <dviroel> ok, next one
12:11:13 <dviroel> #topic (rlandy) we need to sort/prioritize PTG topics
12:12:05 <rlandy> thanks dviroel
12:12:16 <rlandy> we have a lot of topics on the etherpad
12:12:31 <rlandy> some of which may need sessions with other DFGs
12:12:42 <rlandy> or groups I should say
12:13:06 <rlandy> so proposing we prioritize this list ... on line or async
12:13:15 <rlandy> whatever the group chooses
12:13:43 <rlandy> ie: we could do this by voting or meeting so that we can settle on a list and request the meeting slots
12:13:44 <marios> i added a sub-point under your agenda item which is related Ronelle
12:13:57 <rlandy> marios - pls go ahead then
12:14:07 <marios> we need to book slots... so perhaps we can book the times, and then folks can start to request particular slots if they want them
12:14:39 <marios> i got an email about the need for us to choose meeting slots (since I registered the team a few weeks back and they needed a specific contact)
12:14:42 <rlandy> well we don't want a chicken/egg situation with slots and topics
12:14:51 <rlandy> one needs to be sorted first
12:14:52 <marios> we need to select slots in https://ptg.opendev.org/ptg.html
12:15:43 <marios> i would propose we book 1300 to 1600 UTC on mon/tue/wed ? ... then we have specific times sorted if it works (we can discuss the proposal but i think that roughly lines up with the folks that have been active thus far at least)
12:15:56 <marios> we can always cancel slots if we don't use them
12:16:41 <marios> #info etherpads are live https://ptg.opendev.org/etherpads.html
12:16:52 <marios> #link https://etherpad.opendev.org/p/apr2025-ptg-watcher
12:17:08 <marios> so any objections to 1300-1600 UTC for mon/tue/wed?
12:17:13 <marios> otherwise i will book those
12:18:17 <dviroel> works for me
12:19:02 <marios> rlandy: for the topics. either one person (or few) can try and group these and then place them into 'slots' (eg we can vary the time but something like 45 mins per topic to allow 10 mins break or whatever way we decide to split it), OR
12:19:02 <chandankumar> above timing works for me too
12:19:02 <jgilaber> sounds good
12:19:08 <amoralej> wfm, i think it's probably more than we need, but if we can cancel, no problem
12:19:28 <marios> rlandy: OR, we ask the folks who added each of those topics, to add them into the PTG etherpad and we take them in order/as listed
12:19:41 <dviroel> sean-k-mooney: not sure if we can avoid conflicts with nova, since there are slots in all days :)
12:19:44 <marios> amoralej: yes exactly better to book and not use it
12:19:51 <marios> also if we don't book we can still decide to meet ;)
12:20:09 <sean-k-mooney> dviroel: there will be conflict but i think that will be manageable
12:20:11 <marios> it just means it wont be listed and available to anyone that wants to follow
12:20:14 <amoralej> then wfm
12:20:26 <sean-k-mooney> im not sure either team will need the full slots on all days
12:20:33 <rlandy> marios: fine with either - as long as we have a system that works to get the most out of the meetings
12:21:19 <sean-k-mooney> i do know that nova was considering mon-wed also because some cores are not around later in the week or are also part of the tc
12:21:44 <marios> i'm wondering
12:22:01 <marios> sean-k-mooney: should we book  thu too, in case we want to schedule something with other teams who are busy on mon-wed
12:22:12 <marios> but i don't know if anyone has plans to invite say nova folks
12:22:29 <sean-k-mooney> i was planning to attend their session but we could do it the other way around
12:23:01 <sean-k-mooney> lets stick with the current schedule for now and i can reach out to them
12:23:17 <sean-k-mooney> the other team i think we should try and sync with is horizon and/or telemetry
12:23:51 <sean-k-mooney> im not sure if we will have topics for either but we should look at when they will have their sessions and see if we can organise cross project topics if they are relevant
12:24:22 <dviroel> ok, so we agree with initial marios slot proposal
12:24:39 <marios> thanks dviroel i will book mon-wed 1300-1600 slots
12:24:53 <dviroel> #action marios to book ptg slots: mon-wed 1300-1600
12:25:27 <marios> rlandy: i guess we will re-visit topics next week? or between now and then
12:25:42 <rlandy> marios: ack - sure
12:25:57 <rlandy> people can vote or self order in the mean time
12:26:06 <rlandy> thank you dviroel  - that is all from my side
12:26:35 <dviroel> ack, thanks for bringing this topic
12:26:46 <dviroel> also tks marios o/
12:27:07 <dviroel> ok, lets move on
12:27:24 <dviroel> #topic Bug Triage
12:27:56 <dviroel> first one
12:28:01 <dviroel> https://bugs.launchpad.net/watcher/+bug/2098374
12:29:15 <dviroel> there are kind of 2 issues in there
12:29:36 <dviroel> one with the audit
12:29:48 <dviroel> other one with the strategy itself?
12:31:28 <jgilaber> the first issue could maybe be related to the eventlet issue?
12:33:12 <amoralej> workload_cache is missing info about this instance.uuid apparently
12:33:40 <jgilaber> the second issue seems to happen in https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L162
12:33:46 <amoralej> may also be related to sync between model and something else ...
12:34:28 <jgilaber> the workload_cache is generated by https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L206
12:34:49 <dviroel> an exception in the strategy will move the audit to failed, and from there the audit can be deleted only
12:34:51 <jgilaber> there is some logging in that method, more logs could maybe tell us more
12:35:54 <dviroel> it seems that is working as expected and documented
12:35:58 <sean-k-mooney> audits are not mutable today
12:36:00 <dviroel> #link https://docs.openstack.org/watcher/latest/_images/audit_state_machine.png
12:36:05 <amoralej> everything seems to be coming from the model ...
12:36:22 <sean-k-mooney> so if there is an exception i think its expected to move to failed and then only delete or archive would make sense
12:36:50 <sean-k-mooney> so ya i think this is expected
12:37:15 <dviroel> the issue is really the workload_balance KeyError exception
12:37:25 <dviroel> but it is not what was reported in this bug
12:37:43 <sean-k-mooney> well i think there are two things
12:38:04 <sean-k-mooney> if there is an unrecoverable error its expected to go to failed
12:38:35 <sean-k-mooney> but if we are not properly handling effectively "instance not found"
12:38:48 <sean-k-mooney> that is a logic bug
12:39:08 <amoralej> but it's continuous, it should keep as ongoing and run on next planned execution
12:39:56 <sean-k-mooney> yes and no. really a continuous audit should be two separate concepts
12:40:16 <sean-k-mooney> in the api we should have a distinction between the trigger/scheduler and the actual audit
12:40:47 <sean-k-mooney> since we dont have that distinction today
12:41:08 <sean-k-mooney> it would be valid for the continuous audit type to perhaps reset to pending
12:41:15 <sean-k-mooney> when the next interval elapses
12:41:53 <sean-k-mooney> but fundamentally this is a design flaw in the api IMO
12:42:23 <amoralej> i doubt if pending or ongoing, tbh, once an audit is assigned into a decision-engine it's ongoing, iiuc
12:42:32 <amoralej> but yeah, api is confusing
12:42:43 <sean-k-mooney> we are currently conflating when to do an audit and what to audit
12:44:11 <sean-k-mooney> allowing a transition to ongoing may be valid but im not sure its a bug
12:44:44 <sean-k-mooney> its a semantic api change so i dont think this would be backportable
12:45:01 <sean-k-mooney> we could convert this into a feature
12:45:13 <dviroel> +1
12:45:44 <sean-k-mooney> and add a "recoverable" property to the audit to opt in to requeuing on failure, perhaps with an upper limit on retries without success
12:45:56 <amoralej> so you mean in having a separate api element to represent audit execution ?
12:46:00 <sean-k-mooney> i.e. add retry=5
12:46:10 <sean-k-mooney> amoralej: well that's another option yes
12:46:52 <amoralej> so, audit_execution would be FAILED, but continuous AUDIT would stay ONGOING meaning it will do a new execution according to schedule
12:46:58 <dviroel> i would say that the current bug report is invalid, and a RFE should be created
12:47:22 <dviroel> and maybe a new bug for the workload_balance should also be open
12:47:33 <sean-k-mooney> we could triage this as wishlist, with the rfe tag and say we should adress this with a spec
12:47:38 <amoralej> something interesting would be max_failures for an continous audit
12:47:54 <sean-k-mooney> amoralej: thats what retry=5 was intened to be
12:48:00 <amoralej> which is similar to that, yp
12:48:45 <sean-k-mooney> dviroel: i woudl suggest markign this as opipion, whishlist and add rfe to the tags if no one objects ?
12:49:33 <sean-k-mooney> dviroel: and perhasp have another bug for the sepcific key error
12:49:38 <dviroel> ack, I can also add meeting logs to the bug
12:49:39 <sean-k-mooney> as you suggested
12:49:44 <amoralej> and wrt the KeyError issue, i don't know how it hit that, tbh
12:50:00 <amoralej> +1 to split in two
12:50:02 <sean-k-mooney> well the instance was likely deleted
12:50:21 <amoralej> but it's getting the data from the model not from live nova
12:50:36 <sean-k-mooney> its getting a mix in some case i think
12:50:40 <amoralej> so, unless it deleted and resynced the model in the middle of the execution ...
12:51:03 <amoralej> or I may be misreading the code
12:51:18 <sean-k-mooney> the key error is probably a valid bug to go fix
12:51:40 <dviroel> yep, requires more investigation
12:52:02 <dviroel> lets move on, I can do the split after the meeting, and update the current one
12:52:25 <dviroel> #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2100741
12:52:25 <sean-k-mooney> ack
12:52:47 <dviroel> chandankumar: is working on this
12:52:58 <chandankumar> dviroel: I have opened this one. functional tests should live in python-watcherclient
12:53:33 <chandankumar> I have all the reviews up and ci jobs are passing.
12:53:43 <dviroel> chandankumar: ack, can you update the bug report? with status, assignee, progress
12:54:17 <chandankumar> sure
12:54:44 <dviroel> any other concern on this one?
12:55:14 <dviroel> ok, just one more for today
12:55:20 <dviroel> #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2090853
12:55:56 <dviroel> sean-k-mooney: ^ do you think that we already covered this one? or there is place for more?
12:57:50 <dviroel> just fyi, we added support for prometheus in make_instance_statistic method
12:57:56 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (Add support for prometheus datasource in scenario tests)
12:58:48 <sean-k-mooney> sorry got pinged for something else
12:59:09 <sean-k-mooney> not entirely
12:59:16 <sean-k-mooney> we have to some degree
12:59:30 <sean-k-mooney> we can probably close it
12:59:49 <sean-k-mooney> but i still think we need to refactor some of the tests that dont need metrics to not use them
12:59:50 <dviroel> I can add more info to the bug, wrt the changes merged recently
13:00:01 <dviroel> ack, agree
13:00:09 <sean-k-mooney> the hardcoding has been addressed
13:00:27 <sean-k-mooney> so i think we can likely close this and file a separate bug for "tests that inject metrics they do not use"
13:00:52 <dviroel> sean-k-mooney: ack
13:00:59 * dviroel time check
13:01:06 <dviroel> going to skip the next topic, since we already have a volunteer to chair next week meeting
13:01:17 <dviroel> thanks mtembo o/
13:01:45 <dviroel> let's wrap up for today
13:01:54 <dviroel> thank you all for participating
13:02:02 <dviroel> #endmeeting