12:01:33 <amoralej> #startmeeting watcher meeting - 10-July-2025
12:01:33 <opendevmeet> Meeting started Thu Jul 10 12:01:33 2025 UTC and is due to finish in 60 minutes.  The chair is amoralej. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:33 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:33 <opendevmeet> The meeting name has been set to 'watcher_meeting___10_july_2025'
12:01:56 <rlandy> o/
12:02:39 <amoralej> courtesy ping: sean-k-mooney jgilaber
12:03:30 <amoralej> let's start with the agenda topics
12:03:52 <amoralej> #topic Eventlet Removal Updates
12:04:05 <amoralej> #link https://etherpad.opendev.org/p/watcher-eventlet-removal
12:04:15 <amoralej> #link https://review.opendev.org/c/openstack/watcher/+/952257
12:04:21 <amoralej> all yours dviroel
12:04:30 <dviroel> o/
12:04:32 <jgilaber> o/
12:04:52 <dviroel> the watcher patch mentioned is marked as wip
12:04:57 <dviroel> but it has some progress
12:05:22 <dviroel> latest issue found was with the continuous audit handler
12:05:42 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952257/6/watcher/decision_engine/messaging/audit_endpoint.py
12:06:17 <dviroel> since it is being started in the audit endpoint constructor
12:06:38 <dviroel> it was causing a problem with its jobs running on a different process
12:06:58 <dviroel> in the last patch set, I am testing it now in the decision engine service
12:07:08 <dviroel> the one that we are creating here:
12:07:23 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952499 (Merge decision engine services into a single one)
12:08:04 <dviroel> I created new scenario tests, just to validate continuous audit, which are missing from our plugin today
12:08:14 <dviroel> (we only have api tests)
12:08:21 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264
12:08:57 <dviroel> I think that we can discuss more if we should duplicate tests to cover continuous audits, in a different test file
12:09:14 <dviroel> or if we can change existing tests to cover continuous audits
12:09:21 <dviroel> the thing is:
12:09:38 <chandankumar> dviroel: ++ thank you for adding the scenario tempest test for continuous audit.
12:09:51 <amoralej> yep, that was a gap
12:10:10 <dviroel> we need to create compute resources to populate the model, since the continuous audit thread will need this model to execute a strategy
12:10:42 <dviroel> but in the end, we don't need to execute the action plan to validate the continuous audit code
12:10:58 <dviroel> that's why this tempest change is not really executing the action plan
12:11:11 <amoralej> yes, no need
12:11:21 <amoralej> but actually, we may use the nop or sleep actions
12:11:30 <amoralej> i guess those do not need any model?
12:11:35 <dviroel> correct
12:11:48 <dviroel> we need strategies that consume model info
12:11:55 <dviroel> to reproduce the issue that I found
12:12:08 <amoralej> ahhhhh, sorry i was missing the point totally
12:12:10 <amoralej> now i got it
12:12:33 <dviroel> the issue with continuous audit running in a different process, in threading mode :)
12:12:56 <dviroel> so in the end, it is important for the continuous audit to consume real model info
12:12:58 <amoralej> yeah, i wasn't understanding, I do now
12:13:01 <amoralej> yeap
12:13:10 <amoralej> even if we don't run the action plan
12:13:16 <dviroel> yes
12:13:30 <amoralej> makes sense
12:13:40 <dviroel> I was able to reproduce the issue in CI here
12:13:43 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/954364
12:13:59 <dviroel> results:
12:14:04 <dviroel> #link https://77b011d758712ead8b20-de6b79a0bbc85dd849a2bc7008d89fe0.ssl.cf2.rackcdn.com/openstack/a493d2d23f764b198bd434dbfe5451fd/testr_results.html
12:14:18 <dviroel> see that we also need 2 tests to reproduce the issue
12:14:45 <dviroel> since the first run will trigger a model update
12:14:57 <dviroel> but not the second test, which will not get an updated model
12:15:36 <dviroel> in the end, just to point out that it is important to test the continuous audit in CI
12:15:53 <dviroel> I think that was jgilaber that raised that in previous meetings/chat
12:16:02 <dviroel> jgilaber++
12:16:42 <dviroel> ok, so I will continue my work on the decision-engine patch, now working on fixing/adding more unit tests
12:16:47 <dviroel> to be ready for a review too
12:16:52 <amoralej> actually, one test we may do is create a workload balance audit or something like that: in the first execution, no vms are created, so the action plan is created empty; then create two vms on the same host, and the following execution should create a non-empty one
12:17:33 <dviroel> right, we can do everything in a single test, to reproduce the issue, it is a good idea
12:18:02 <amoralej> it may be tricky timing-wise, given that it's hard to predict exactly when the model is updated, but i think it'd be doable
12:18:41 <dviroel> i will give it a try, it won't take too much time
12:18:47 <amoralej> we may need an interval > than 10 seconds :)
12:19:15 <sean-k-mooney> o/
12:20:03 <amoralej> we may use any other strategy anyway, host_maintenance, etc... maybe that would be better, but the idea is the same
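[Editor's note: the single-test flow proposed above could look roughly like the following toy pure-Python simulation. `FakeContinuousAudit` and its model format are invented for illustration only; this is not watcher or tempest code.]

```python
import threading
import time

class FakeContinuousAudit:
    """Toy stand-in for a continuous audit (invented, not watcher code):
    every interval it reads the shared 'model' and records an action plan,
    empty unless some host has 2+ vms to rebalance."""
    def __init__(self, model, interval=0.02):
        self.model = model      # shared mutable list of (vm, host) pairs
        self.plans = []         # action plans produced so far
        self._interval = interval
        self._stop = threading.Event()

    def _make_plan(self):
        hosts = [h for _, h in self.model]
        return ["migrate"] if any(hosts.count(h) > 1 for h in set(hosts)) else []

    def run(self):
        # periodic loop: wait() returns False on timeout, True once stopped
        while not self._stop.wait(self._interval):
            self.plans.append(self._make_plan())

    def stop(self):
        self._stop.set()

# phase 1: no vms yet -> the periodic audit produces empty plans
model = []
audit = FakeContinuousAudit(model)
threading.Thread(target=audit.run, daemon=True).start()
time.sleep(0.06)
# phase 2: two vms land on the same host -> later plans become non-empty
model.extend([("vm1", "host0"), ("vm2", "host0")])
time.sleep(0.06)
audit.stop()
```

The real tempest test would replace the fake model with actual compute resources and poll the created action plans, but the two-phase empty/non-empty assertion is the same.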
12:21:42 <amoralej> so the continuous handler will be a thread within the same process?
12:21:58 <dviroel> yes, correct
12:23:10 <amoralej> ah, here https://review.opendev.org/c/openstack/watcher/+/952257/6/watcher/decision_engine/service.py
12:23:47 <dviroel> yep, and it will be started together with the other handlers/schedulers
12:23:59 <dviroel> and not in the audit endpoint constructor
12:24:29 <amoralej> looks much cleaner
12:24:43 <dviroel> and note that the continuous audit handler actually uses the background scheduler
12:25:18 <dviroel> which is also created/init there
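[Editor's note: the fix described here, moving the handler start out of the endpoint constructor and into the service's startup, can be sketched in plain Python. Class and method names are hypothetical, not watcher's actual API, and a daemon thread stands in for the background scheduler.]

```python
import threading
import time

class ContinuousAuditHandler:
    """Hypothetical stand-in for the continuous audit handler."""
    def __init__(self):
        self.runs = 0
        self._stop = threading.Event()

    def start(self, interval=0.01):
        # periodic job on a plain daemon thread, standing in for the
        # background scheduler the real handler uses
        def loop():
            while not self._stop.wait(interval):
                self.runs += 1
        threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        self._stop.set()

class DecisionEngineService:
    """The constructor only wires objects together; all handlers are
    started in start(), so their threads live in the service's own
    process instead of whichever process built the endpoint."""
    def __init__(self):
        self.audit_handler = ContinuousAuditHandler()  # no side effects here

    def start(self):
        self.audit_handler.start()

    def stop(self):
        self.audit_handler.stop()

svc = DecisionEngineService()
svc.start()
time.sleep(0.1)
svc.stop()
```

Starting background jobs only in `start()` avoids the reported problem of jobs running in a different process from the service that owns them.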
12:26:19 <dviroel> any other questions/comments on that? we can continue our discussion in the patch
12:26:28 <amoralej> thanks for the update
12:26:33 <dviroel> one more thing
12:26:41 <amoralej> and thanks for the work you are doing on that
12:26:45 <dviroel> there is a related patch ready for review:
12:26:48 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952499
12:26:59 <dviroel> which most of you already reviewed/approved
12:27:04 <dviroel> but I had to rebase
12:27:13 <dviroel> and lost your votes
12:27:41 <dviroel> because we merged this
12:27:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/949641 (Move eventlet command scripts to a different dir)
12:28:02 <dviroel> which is also part of the effort
12:28:13 <dviroel> ty all
12:28:28 <dviroel> that's everything I have
12:29:08 <amoralej> ack
12:29:46 <amoralej> let's check if there is any recent bug
12:29:54 <amoralej> #topic bug triage
12:30:48 <sean-k-mooney> ah that makes sense, i'll loop back to that today, fyi i won't be around tomorrow to review
12:30:52 <amoralej> #link https://bugs.launchpad.net/watcher/+bug/2116304
12:31:10 <amoralej> that's about croniter, it is set as triaged already by chandankumar
12:31:47 <chandankumar> I checked the backlog, no bug was found, so I added it.
12:32:10 <sean-k-mooney> ya so that was a leftover from epoxy
12:32:31 <sean-k-mooney> we should fix that and backport it when we have time
12:32:43 <sean-k-mooney> it became less urgent because a new maintainer took it over
12:32:57 <sean-k-mooney> but for the very minimal usage we have, parsing the interval
12:32:58 <chandankumar> yes, I am working on a fix, will assign it to myself
12:33:05 <sean-k-mooney> there is no reason to keep it as an extra dep
12:33:37 <amoralej> i have one question
12:34:20 <amoralej> according to the croniter doc there is a syntax that i was not aware of
12:34:29 <amoralej> sat#1,sun#2 = 1st Saturday and 2nd Sunday of the month
12:34:34 <amoralej> is that standard cron?
12:35:02 <sean-k-mooney> im not sure
12:35:11 <sean-k-mooney> i think maybe yes
12:35:12 <amoralej> and will apscheduler know how to manage that?
12:35:20 <sean-k-mooney> the real question is what do we document as supported
12:35:43 <sean-k-mooney> i'm pretty sure we don't say croniter format
12:36:03 <amoralej> right we say "Can be set either in seconds or cron syntax"
12:36:27 <amoralej> actually my question is if there are different flavors of cron formats :)
12:36:32 <amoralej> as i had never seen that
12:37:08 <chandankumar> https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html will take care of triggering based on cron format
12:37:39 <sean-k-mooney> you can use sun as an alias
12:37:52 <sean-k-mooney> so i think this is just an alternative encoding
12:38:20 <sean-k-mooney> https://linux.die.net/man/5/crontab
12:38:27 <sean-k-mooney> 0-7 (0 or 7 is Sun, or use names)
12:38:48 <amoralej> yes, my doubt was the #1 or #2
12:39:14 <amoralej> as 1st saturday of the month, i.e.
12:39:23 <sean-k-mooney> i am not sure
12:39:42 <sean-k-mooney> normally i use the / syntax
12:39:50 <sean-k-mooney> but # might be a normal thing
12:39:52 <amoralej> yep
12:40:12 <amoralej> anyway, it's just a minor detail
12:40:40 <sean-k-mooney> google says it is
12:40:46 <sean-k-mooney> https://www.netiq.com/documentation/cloud-manager-2-5/ncm-reference/data/bexyssf.html
12:40:59 <sean-k-mooney> but only for day of the week
12:41:05 <sean-k-mooney> Day of the Week | Yes | 1-7 OR SUN-SAT | allowed special characters: , - * ? / L #
12:41:41 <sean-k-mooney> amoralej: so i think we are good
12:41:47 <amoralej> good
12:41:51 <amoralej> thanks for checking
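[Editor's note: the nth-weekday semantics of the `#` syntax discussed above (`sat#1` = first Saturday of the month) can be illustrated with the stdlib. This only demonstrates what the expression means; it is not croniter or apscheduler code.]

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (0=Mon .. 6=Sun) in a month,
    mirroring the cron nth-weekday syntax, e.g. 'sat#1'."""
    first = date(year, month, 1)
    # days until the first occurrence of the requested weekday
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

# 'sat#1' for July 2025 -> first Saturday
print(nth_weekday(2025, 7, 5, 1))  # 2025-07-05
```

Whether apscheduler accepts the `#` spelling itself would still need checking; apscheduler expresses the same idea through its own cron trigger fields.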
12:42:15 <amoralej> so i think there are no more bugs to discuss about
12:42:38 <amoralej> #link https://bugs.launchpad.net/watcher/+bug/2115058
12:42:53 <amoralej> there is also that, reported by jgilaber, but also marked as triaged
12:43:05 <jgilaber> we have a couple untriaged
12:43:11 <jgilaber> #link https://bugs.launchpad.net/watcher/+bug/2108855
12:43:18 <jgilaber> #link https://bugs.launchpad.net/watcher/+bug/2108994
12:43:19 <sean-k-mooney> the internal az is a special az in nova
12:43:34 <sean-k-mooney> it's not a real one and i don't think it has any equivalent in cinder
12:44:08 <amoralej> right, sorry i was checking only the last ones
12:44:53 <sean-k-mooney> my guess is there is some special handling for internal
12:44:54 <jgilaber> ack sean-k-mooney, then I'll try at some point to check creating a new az and see if the problem still persists
12:44:57 <amoralej> is internal az kind of a default one if there are no explicit ones?
12:45:02 <sean-k-mooney> since it's not an az you are ever meant to use in an api request
12:45:16 <sean-k-mooney> amoralej: no, the default az is called nova
12:45:35 <sean-k-mooney> internal is used for things that are not computes, like the metadata api
12:45:45 <sean-k-mooney> it's a weird legacy thing
12:45:47 <amoralej> ah, got it
12:46:25 <amoralej> maybe we could even exclude from the model, then
12:47:08 <sean-k-mooney> so for nova that's likely ok, i am not sure if they use internal for the same thing in cinder
12:47:35 <sean-k-mooney> since this is about the storage model we should check with them first
12:47:43 <amoralej> good point
12:48:16 <amoralej> but, for the regular case, shouldn't az names for cinder and nova match?
12:48:19 <sean-k-mooney> https://docs.openstack.org/api-ref/compute/#id291
12:48:39 <sean-k-mooney> in case you're interested, the scheduler and conductors in nova are part of the internal zone
12:50:17 <amoralej> wrt https://bugs.launchpad.net/watcher/+bug/2108855 now that the spec is merged, can we set it as triaged?
12:51:04 <sean-k-mooney> we can close it as invalid with a link to the spec since it was a feature request, not a bug
12:51:35 <sean-k-mooney> i'll do that now
12:51:45 <amoralej> ack, thanks
12:52:01 <amoralej> about bugs.launchpad.net/watcher/+bug/2108994
12:54:02 <sean-k-mooney> i'm not entirely sure about that, but it's for 2023.2 which is now end of life
12:54:19 <sean-k-mooney> so i think we just close it unless we see the problem in later releases
12:54:34 <jgilaber> I did try to reproduce it at some point
12:54:39 <jgilaber> but I couldn't
12:55:03 <sean-k-mooney> any objection if i set it to won't fix?
12:55:16 <amoralej> or invalid
12:55:21 <jgilaber> +1 from me
12:55:27 <amoralej> the reporter didn't reply in almost 2 months...
12:55:45 <sean-k-mooney> it went eol 3 months ago
12:56:39 <amoralej> i think it's good to move to some other state
12:56:45 <amoralej> actually, maybe incomplete
12:58:10 <sean-k-mooney> so the branch it's reported for is eol, so we can't fix it on 2023.2 even if we wanted to
12:58:20 <sean-k-mooney> that's why i was saying won't fix rather than incomplete
12:58:25 <sean-k-mooney> but i can update it if you like
12:58:40 <amoralej> i just did :)
12:59:01 <amoralej> damn, we did it in parallel :)
12:59:08 <sean-k-mooney> yours came second
12:59:11 <sean-k-mooney> so you won
12:59:19 <sean-k-mooney> its fine we can leave it as it is
12:59:24 <amoralej> ok
12:59:34 <amoralej> so i think that was it about bugs
13:00:00 <amoralej> #topic volunteers to chair next meeting
13:00:06 <amoralej> any?
13:00:23 <rlandy> I'll do it - it's been a while
13:00:35 <amoralej> #action rlandy will chair next week
13:00:42 <amoralej> just in time
13:01:00 <amoralej> unless someone has some last minute item, i'm closing the meeting
13:01:34 <amoralej> thank you all for joining!
13:01:43 <amoralej> #endmeeting