12:00:38 <dviroel> #startmeeting watcher
12:00:38 <opendevmeet> Meeting started Thu Mar 6 12:00:38 2025 UTC and is due to finish in 60 minutes. The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:38 <opendevmeet> The meeting name has been set to 'watcher'
12:01:00 <dviroel> hi all, who's around today?
12:01:05 <rlandy> o/
12:01:09 <marios> o/
12:01:14 <mtembo> o/
12:01:14 <jgilaber> o/
12:01:48 <amoralej> o/
12:02:02 <dviroel> o/ thanks for joining
12:02:09 <dviroel> let's start with today's meeting agenda
12:02:16 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting (Meeting agenda)
12:02:44 <dviroel> please feel free to add your own topics to the agenda if you want to highlight something :)
12:03:15 <dviroel> first one
12:03:26 <dviroel> #topic Feature Freeze
12:03:42 <dviroel> #link https://releases.openstack.org/epoxy/schedule.html (2025.1 Epoxy Release Schedule)
12:03:55 <dviroel> we are now in the feature freeze period
12:04:17 <dviroel> meaning that no new features, configuration changes or even string changes should be merged from now on, or at least they should be avoided
12:05:04 <dviroel> we also don't have any FFE requests so far, so we should be ok for next week
12:05:40 <dviroel> which is the RC1 target
12:06:15 <chandankumar> o/
12:06:24 <dviroel> let me highlight some patches from sean-k-mooney
12:06:35 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/943206 (Add epoxy prelude)
12:07:07 <dviroel> and
12:07:19 <dviroel> #link https://review.opendev.org/c/openstack/releases/+/943489 (Add cycle highlights for watcher in 2025.1)
12:08:10 <dviroel> thanks sean-k-mooney for proposing these patches
12:08:32 <sean-k-mooney> o/
12:08:42 <dviroel> if you have any comment or addition to the epoxy prelude, pls add your review there
12:09:17 <sean-k-mooney> i would like to merge the prelude patch early next week
12:09:21 <sean-k-mooney> i.e. monday
12:09:26 <sean-k-mooney> so that it's included in RC1
12:09:35 <sean-k-mooney> so please provide any feedback before then
12:09:57 <dviroel> +1
12:10:48 <dviroel> any other comments for this topic?
12:11:04 <dviroel> ok, next one
12:11:13 <dviroel> #topic (rlandy) we need to sort/prioritize PTG topics
12:12:05 <rlandy> thanks dviroel
12:12:16 <rlandy> we have a lot of topics on the etherpad
12:12:31 <rlandy> some of which may need sessions with other DFGs
12:12:42 <rlandy> or groups I should say
12:13:06 <rlandy> so proposing we prioritize this list ... online or async
12:13:15 <rlandy> whatever the group chooses
12:13:43 <rlandy> ie: we could do this by voting or meeting so that we can settle on a list and request the meeting slots
12:13:44 <marios> i added a sub-point under your agenda item which is related Ronelle
12:13:57 <rlandy> marios - pls go ahead then
12:14:07 <marios> we need to book slots... so perhaps we can book the times, and then folks can start to request particular slots if they want them
12:14:39 <marios> i got an email about the need for us to choose meeting slots (since I registered the team a few weeks back and they needed a specific contact)
12:14:42 <rlandy> well we don't want a chicken/egg situation with slots and topics
12:14:51 <rlandy> one needs to be sorted first
12:14:52 <marios> we need to select slots in https://ptg.opendev.org/ptg.html
12:15:43 <marios> i would propose we book 1300 to 1600 UTC on mon/tue/wed ? ... then we have specific times sorted if it works (we can discuss the proposal but i think that roughly lines up with the folks that have been active thus far at least)
12:15:56 <marios> we can always cancel slots if we don't use them
12:16:41 <marios> #info etherpads are live https://ptg.opendev.org/etherpads.html
12:16:52 <marios> #link https://etherpad.opendev.org/p/apr2025-ptg-watcher
12:17:08 <marios> so any objections to 1300-1600 UTC for mon/tue/wed?
12:17:13 <marios> otherwise i will book those
12:18:17 <dviroel> works for me
12:19:02 <marios> rlandy: for the topics. either one person (or a few) can try and group these and then place them into 'slots' (eg we can vary the time but something like 45 mins per topic to allow a 10 min break or whatever way we decide to split it), OR
12:19:02 <chandankumar> above timing works for me too
12:19:02 <jgilaber> sounds good
12:19:08 <amoralej> wfm, i think it's probably more than we need, but if we can cancel, no problem
12:19:28 <marios> rlandy: OR, we ask the folks who added each of those topics to add them into the PTG etherpad and we take them in order/as listed
12:19:41 <dviroel> sean-k-mooney: not sure if we can avoid conflicts with nova, since there are slots on all days :)
12:19:44 <marios> amoralej: yes exactly, better to book and not use it
12:19:51 <marios> also if we don't book we can still decide to meet ;)
12:20:09 <sean-k-mooney> dviroel: there will be conflicts but i think that will be manageable
12:20:11 <marios> it just means it won't be listed and available to anyone that wants to follow
12:20:14 <amoralej> then wfm
12:20:26 <sean-k-mooney> i'm not sure either team will need the full slots on all days
12:20:33 <rlandy> marios: fine with either - as long as we have a system that works to get the most out of the meetings
12:21:19 <sean-k-mooney> i do know that nova was considering mon-wed also because some cores are not around later in the week or are also part of the tc
12:21:44 <marios> i'm wondering
12:22:01 <marios> sean-k-mooney: should we book thu too, in case we want to schedule something with other teams who are busy on mon-wed
12:22:12 <marios> but i don't know if anyone has plans to invite say nova folks
12:22:29 <sean-k-mooney> i was planning to attend their session but we could do it the other way around
12:23:01 <sean-k-mooney> let's stick with the current schedule for now and i can reach out to them
12:23:17 <sean-k-mooney> the other team i think we should try and sync with is horizon and/or telemetry
12:23:51 <sean-k-mooney> i'm not sure if we will have topics for either but we should look at when they will have their sessions and see if we can organise cross-project topics if they are relevant
12:24:22 <dviroel> ok, so we agree with marios' initial slot proposal
12:24:39 <marios> thanks dviroel i will book mon-wed 1300-1600 slots
12:24:53 <dviroel> #action marios to book ptg slots: mon-wed 1300-1600
12:25:27 <marios> rlandy: i guess we will re-visit topics next week? or between now and then
12:25:42 <rlandy> marios: ack - sure
12:25:57 <rlandy> people can vote or self order in the meantime
12:26:06 <rlandy> thank you dviroel - that is all from my side
12:26:35 <dviroel> ack, thanks for bringing this topic
12:26:46 <dviroel> also tks marios o/
12:27:07 <dviroel> ok, let's move on
12:27:24 <dviroel> #topic Bug Triage
12:27:56 <dviroel> first one
12:28:01 <dviroel> https://bugs.launchpad.net/watcher/+bug/2098374
12:29:15 <dviroel> there are kind of 2 issues in there
12:29:36 <dviroel> one with the audit
12:29:48 <dviroel> the other one with the strategy itself?
12:31:28 <jgilaber> the first issue could maybe be related to the eventlet issue?
12:33:12 <amoralej> workload_cache is missing info about this instance.uuid apparently
12:33:40 <jgilaber> the second issue seems to happen in https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L162
12:33:46 <amoralej> may also be related to sync between model and something else ...
12:34:28 <jgilaber> the workload_cache is generated by https://github.com/openstack/watcher/blob/77a30ef28140ec6c7748153a733b06d5d5ea55df/watcher/decision_engine/strategy/strategies/workload_balance.py#L206
12:34:49 <dviroel> an exception in the strategy will move the audit to failed, and from there the audit can only be deleted
12:34:51 <jgilaber> there is some logging in that method, more logs could maybe tell us more
12:35:54 <dviroel> it seems that it is working as expected and documented
12:35:58 <sean-k-mooney> audits are not mutable today
12:36:00 <dviroel> #link https://docs.openstack.org/watcher/latest/_images/audit_state_machine.png
12:36:05 <amoralej> everything seems to be coming from the model ...
12:36:22 <sean-k-mooney> so if there is an exception i think it's expected to move to failed and then only delete or archive would make sense
12:36:50 <sean-k-mooney> so ya i think this is expected
12:37:15 <dviroel> the issue is really the workload_balance KeyError exception
12:37:25 <dviroel> but it is not what was reported in this bug
12:37:43 <sean-k-mooney> well i think there are two things
12:38:04 <sean-k-mooney> if there is an unrecoverable error it's expected to go to failed
12:38:35 <sean-k-mooney> but if we are not properly handling effectively "instance not found"
12:38:48 <sean-k-mooney> that is a logic bug
12:39:08 <amoralej> but it's continuous, it should stay ongoing and run on the next planned execution
12:39:56 <sean-k-mooney> yes and no. really, continuous audit should be two separate concepts
12:40:16 <sean-k-mooney> in the api we should have a distinction between the trigger/scheduler and the actual audit
12:40:47 <sean-k-mooney> since we don't have that distinction today
12:41:08 <sean-k-mooney> it would be valid for the continuous audit type to perhaps reset to pending
12:41:15 <sean-k-mooney> when the next interval elapses
12:41:53 <sean-k-mooney> but fundamentally this is a design flaw in the api IMO
12:42:23 <amoralej> i doubt if pending or ongoing, tbh, once an audit is assigned to a decision-engine it's ongoing, iiuc
12:42:32 <amoralej> but yeah, api is confusing
12:42:43 <sean-k-mooney> we are currently conflating when to do an audit and what to audit
12:44:11 <sean-k-mooney> allowing a transition to ongoing may be valid but i'm not sure it's a bug
12:44:44 <sean-k-mooney> it's a semantic api change so i don't think this would be backportable
12:45:01 <sean-k-mooney> we could convert this into a feature
12:45:13 <dviroel> +1
12:45:44 <sean-k-mooney> and add a "recoverable" property to the audit to opt in to requeuing on failure, perhaps with an upper limit on retries without success
12:45:56 <amoralej> so you mean having a separate api element to represent audit execution ?
12:46:00 <sean-k-mooney> i.e. add retry=5
12:46:10 <sean-k-mooney> amoralej: well that's another option yes
12:46:52 <amoralej> so, audit_execution would be FAILED, but continuous AUDIT would stay ONGOING meaning it will do a new execution according to schedule
12:46:58 <dviroel> i would say that the current bug report is invalid, and an RFE should be created
12:47:22 <dviroel> and maybe a new bug for the workload_balance should also be opened
12:47:33 <sean-k-mooney> we could triage this as wishlist, with the rfe tag and say we should address this with a spec
12:47:38 <amoralej> something interesting would be max_failures for a continuous audit
12:47:54 <sean-k-mooney> amoralej: that's what retry=5 was intended to be
12:48:00 <amoralej> which is similar to that, yep
12:48:45 <sean-k-mooney> dviroel: i would suggest marking this as opinion, wishlist and adding rfe to the tags if no one objects ?
12:49:33 <sean-k-mooney> dviroel: and perhaps have another bug for the specific key error
12:49:38 <dviroel> ack, I can also add meeting logs to the bug
12:49:39 <sean-k-mooney> as you suggested
12:49:44 <amoralej> and wrt the KeyError issue, i don't know how it hit that, tbh
12:50:00 <amoralej> +1 to split in two
12:50:02 <sean-k-mooney> well the instance was likely deleted
12:50:21 <amoralej> but it's getting the data from the model not from live nova
12:50:36 <sean-k-mooney> it's getting a mix in some cases i think
12:50:40 <amoralej> so, unless it deleted and resynced the model in the middle of the execution ...
12:51:03 <amoralej> or I may be misreading the code
12:51:18 <sean-k-mooney> the key error is probably a valid bug to go fix
12:51:40 <dviroel> yep, requires more investigation
12:52:02 <dviroel> let's move on, I can do the split after the meeting, and update the current one
12:52:25 <dviroel> #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2100741
12:52:25 <sean-k-mooney> ack
12:52:47 <dviroel> chandankumar: is working on this
12:52:58 <chandankumar> dviroel: I have opened this one. functional tests should live in python-watcherclient
12:53:33 <chandankumar> I have all the reviews up and ci jobs are passing.
12:53:43 <dviroel> chandankumar: ack, can you update the bug report? with status, assignee, progress
12:54:17 <chandankumar> sure
12:54:44 <dviroel> any other concern on this one?
12:55:14 <dviroel> ok, just one more for today
12:55:20 <dviroel> #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2090853
12:55:56 <dviroel> sean-k-mooney: ^ do you think that we already covered this one? or is there room for more?
12:57:50 <dviroel> just fyi, we added support for prometheus in the make_instance_statistic method
12:57:56 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141 (Add support for prometheus datasource in scenario tests)
12:58:48 <sean-k-mooney> sorry got pinged for something else
12:59:09 <sean-k-mooney> not entirely
12:59:16 <sean-k-mooney> we have to some degree
12:59:30 <sean-k-mooney> we can probably close it
12:59:49 <sean-k-mooney> but i still think we need to refactor some of the tests that don't need metrics to not use them
12:59:50 <dviroel> I can add more info to the bug, wrt the changes merged recently
13:00:01 <dviroel> ack, agree
13:00:09 <sean-k-mooney> the hardcoding has been addressed
13:00:27 <sean-k-mooney> so i think we can likely close this and file a separate bug for "tests that inject metrics they do not use"
13:00:52 <dviroel> sean-k-mooney: ack
13:00:59 * dviroel time check
13:01:06 <dviroel> going to skip the next topic, since we already have a volunteer to chair next week's meeting
13:01:17 <dviroel> thanks mtembo o/
13:01:45 <dviroel> let's wrap up for today
13:01:54 <dviroel> thank you all for participating
13:02:02 <dviroel> #endmeeting