12:01:33 <amoralej> #startmeeting watcher meeting - 10-July-2025
12:01:33 <opendevmeet> Meeting started Thu Jul 10 12:01:33 2025 UTC and is due to finish in 60 minutes. The chair is amoralej. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:01:33 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:01:33 <opendevmeet> The meeting name has been set to 'watcher_meeting___10_july_2025'
12:01:56 <rlandy> o/
12:02:39 <amoralej> courtesy ping: sean-k-mooney jgilaber
12:03:30 <amoralej> let's start with the agenda topics
12:03:52 <amoralej> #topic Eventlet Removal Updates
12:04:05 <amoralej> #link https://etherpad.opendev.org/p/watcher-eventlet-removal
12:04:15 <amoralej> #link https://review.opendev.org/c/openstack/watcher/+/952257
12:04:21 <amoralej> all yours dviroel
12:04:30 <dviroel> o/
12:04:32 <jgilaber> o/
12:04:52 <dviroel> the watcher patch mentioned is marked as WIP
12:04:57 <dviroel> but it has some progress
12:05:22 <dviroel> the latest issue found was with the continuous audit handler
12:05:42 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952257/6/watcher/decision_engine/messaging/audit_endpoint.py
12:06:17 <dviroel> since it is being started in the audit endpoint constructor
12:06:38 <dviroel> it was causing a problem with its jobs running on a different process
12:06:58 <dviroel> in the last patch set I am testing it in the decision engine service
12:07:08 <dviroel> the one that we are creating here:
12:07:23 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952499 (Merge decision engine services into a single one)
12:08:04 <dviroel> I created new scenario tests, just to validate continuous audits, which are missing from our plugin today
12:08:14 <dviroel> (we only have API tests)
12:08:21 <dviroel> #link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264
12:08:57 <dviroel> I think that we can discuss more if we should duplicate tests to cover continuous audits, in a different test file
12:09:14 <dviroel> or if we can change existing tests to cover continuous audits
12:09:21 <dviroel> the thing is:
12:09:38 <chandankumar> dviroel: ++ thank you for adding the tempest scenario test for the continuous audit.
12:09:51 <amoralej> yep, that was a gap
12:10:10 <dviroel> we need to create compute resources to populate the model, since the continuous audit thread will need this model to execute a strategy
12:10:42 <dviroel> but in the end, we don't need to execute the action plan to validate the continuous audit code
12:10:58 <dviroel> that's why this tempest change is not really executing the action plan
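[editor's note] A minimal sketch of the scenario-test flow dviroel describes above: boot a couple of servers so the compute data model has something for the strategy to consume, create a CONTINUOUS audit, and only check that action plans appear, without ever executing them. The nova/watcher client objects and their method names below are hypothetical placeholders, not the actual watcher-tempest-plugin API.

    import time

    def check_continuous_audit(nova, watcher, goal='workload_balancing',
                               interval=30, timeout=600):
        # Boot servers so the compute data model is populated; the strategy
        # run by the continuous audit needs model data to work on.
        for i in range(2):
            nova.create_server(name='watcher-scenario-%d' % i)

        # CONTINUOUS audit with a short interval so CI does not wait long.
        template = watcher.create_audit_template(goal=goal)
        audit = watcher.create_audit(template['uuid'],
                                     audit_type='CONTINUOUS',
                                     interval=str(interval))

        # Seeing an action plan appear is enough: the plan is never executed,
        # since the point is exercising the continuous audit handler itself.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if watcher.list_action_plans(audit=audit['uuid']):
                return True
            time.sleep(interval)
        raise AssertionError('no action plan generated by the continuous audit')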
12:11:11 <amoralej> yes, no need
12:11:21 <amoralej> but actually, we may use the nop or sleep actions
12:11:30 <amoralej> i guess those do not need any model?
12:11:35 <dviroel> correct
12:11:48 <dviroel> we need strategies that consume model info
12:11:55 <dviroel> to reproduce the issue that I found
12:12:08 <amoralej> ahhhhh, sorry i was missing the point totally
12:12:10 <amoralej> now i got it
12:12:33 <dviroel> the issue with the continuous audit running in a different process, in threading mode :)
12:12:56 <dviroel> so in the end, it is important for the continuous audit to consume real model info
12:12:58 <amoralej> yeah, i wasn't understanding, I do now
12:13:01 <amoralej> yeap
12:13:10 <amoralej> even if we don't run the action plan
12:13:16 <dviroel> yes
12:13:30 <amoralej> makes sense
12:13:40 <dviroel> I was able to reproduce the issue in CI here
12:13:43 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/954364
12:13:59 <dviroel> results:
12:14:04 <dviroel> #link https://77b011d758712ead8b20-de6b79a0bbc85dd849a2bc7008d89fe0.ssl.cf2.rackcdn.com/openstack/a493d2d23f764b198bd434dbfe5451fd/testr_results.html
12:14:18 <dviroel> see that we also need 2 tests to reproduce the issue
12:14:45 <dviroel> since the first run will trigger a model update
12:14:57 <dviroel> but not the second test; it will not get an updated model
12:15:36 <dviroel> in the end, just to point out that it is important to test the continuous audit in CI
12:15:53 <dviroel> I think it was jgilaber who raised that in previous meetings/chat
12:16:02 <dviroel> jgilaber++
12:16:42 <dviroel> ok, so I will continue my work on the decision-engine patch, now working on fixing/adding more unit tests
12:16:47 <dviroel> to be ready for a review too
12:16:52 <amoralej> actually, one test we may do is create a workload balance audit or something like that; in the first execution no vms are created, so the action plan is created empty, then create two vms on the same host, and the following execution should create a non-empty one
12:17:33 <dviroel> right, we can do everything in a single test, to reproduce the issue, it is a good idea
12:18:02 <amoralej> it may be tricky timing-wise given that it's hard to predict exactly when the model is updated, but i think it'd be doable
12:18:41 <dviroel> i will give it a try, it won't take too much time
12:18:47 <amoralej> we may need an interval > 10 seconds :)
12:19:15 <sean-k-mooney> o/
12:20:03 <amoralej> we may use any other strategy anyway, host_maintenance, etc... maybe that would be better, but the idea is the same
12:21:42 <amoralej> so the continuous handler will be a thread within the same process?
12:21:58 <dviroel> yes, correct
12:23:10 <amoralej> ah, here https://review.opendev.org/c/openstack/watcher/+/952257/6/watcher/decision_engine/service.py
12:23:47 <dviroel> yep, and it will be started together with the other handlers/schedulers
12:23:59 <dviroel> and not in the audit endpoint constructor
12:24:29 <amoralej> looks much cleaner
12:24:43 <dviroel> and note that the continuous audit handler actually uses the background scheduler
12:25:18 <dviroel> which is also created/init there
12:26:19 <dviroel> any other question/comments on that? we can continue our discussion in the patch
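[editor's note] A rough sketch (not the actual Watcher code) of the restructuring described above: the background scheduler and the continuous audit handler are created and started by the decision engine service itself, together with the other handlers, rather than as a side effect of constructing the audit RPC endpoint, so the periodic jobs run in the worker process that actually serves the service. Class and method names are illustrative; only BackgroundScheduler and its add_job/start calls are real APScheduler API.

    from apscheduler.schedulers.background import BackgroundScheduler

    class ContinuousAuditHandler(object):
        """Periodically triggers CONTINUOUS audits via a shared scheduler."""

        def __init__(self, scheduler):
            self.scheduler = scheduler

        def start(self):
            # Jobs are registered only when the owning service starts, so
            # they live in the same process/threads as the service itself.
            self.scheduler.add_job(self.launch_audits_periodically,
                                   'interval', seconds=30)

        def launch_audits_periodically(self):
            # Look up ONGOING continuous audits and run their strategies
            # against the cluster data model (omitted in this sketch).
            pass

    class DecisionEngineService(object):
        def __init__(self):
            # The background scheduler is created/initialized here and shared
            # with the continuous audit handler, as noted above.
            self.bg_scheduler = BackgroundScheduler()
            self.continuous_handler = ContinuousAuditHandler(self.bg_scheduler)

        def start(self):
            # Everything that owns threads starts here, after any forking,
            # not inside the audit endpoint constructor.
            self.bg_scheduler.start()
            self.continuous_handler.start()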
12:26:28 <amoralej> thanks for the update
12:26:33 <dviroel> one more thing
12:26:41 <amoralej> and thanks for the work you are doing on that
12:26:45 <dviroel> there is a related patch ready for review:
12:26:48 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952499
12:26:59 <dviroel> which most of you already reviewed/approved
12:27:04 <dviroel> but I had to rebase
12:27:13 <dviroel> and lost your votes
12:27:41 <dviroel> because we merged this
12:27:52 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/949641 (Move eventlet command scripts to a different dir)
12:28:02 <dviroel> which is also part of the effort
12:28:13 <dviroel> ty all
12:28:28 <dviroel> that's everything I have
12:29:08 <amoralej> ack
12:29:46 <amoralej> let's check if there is any recent bug
12:29:54 <amoralej> #topic bug triage
12:30:48 <sean-k-mooney> ah that makes sense, i'll loop back to that today, fyi i won't be around tomorrow to review
12:30:52 <amoralej> #link https://bugs.launchpad.net/watcher/+bug/2116304
12:31:10 <amoralej> that's about croniter, it is set as triaged already by chandankumar
12:31:47 <chandankumar> I checked the backlog, no bug was found, so I added it.
12:32:10 <sean-k-mooney> ya so that was a leftover from epoxy
12:32:31 <sean-k-mooney> we should fix that and backport it when we have time
12:32:43 <sean-k-mooney> it became less urgent because a new maintainer took it over
12:32:57 <sean-k-mooney> but for the very minimal usage we have, to parse the interval
12:32:58 <chandankumar> yes, I am working on a fix, will assign it to myself
12:33:05 <sean-k-mooney> there is no reason to keep it as an extra dep
12:33:37 <amoralej> i have one question
12:34:20 <amoralej> according to the croniter doc there is a syntax that i was not aware of
12:34:29 <amoralej> sat#1,sun#2 = # 1st Saturday, and 2nd Sunday of the month
12:34:34 <amoralej> is that standard cron?
12:35:02 <sean-k-mooney> im not sure
12:35:11 <sean-k-mooney> i think maybe yes
12:35:12 <amoralej> and will apscheduler know how to manage that?
12:35:20 <sean-k-mooney> the real question is what do we document as supported
12:35:43 <sean-k-mooney> im pretty sure we dont say croniter format
12:36:03 <amoralej> right, we say "Can be set either in seconds or cron syntax"
12:36:27 <amoralej> actually my question is if there are different flavors of cron formats :)
12:36:32 <amoralej> as i had never seen that
12:37:08 <chandankumar> https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html will take care of triggering based on cron format
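[editor's note] A small example of the APScheduler cron trigger chandankumar links above, to illustrate the syntax question being discussed. CronTrigger.from_crontab() parses standard 5-field crontab expressions; for "nth weekday of the month", APScheduler documents the 'xth y' form in the day field, and whether the croniter-style 'sat#1' form is accepted by from_crontab() would need to be verified separately.

    from apscheduler.triggers.cron import CronTrigger

    # Standard 5-field crontab syntax, e.g. every Sunday at 03:00:
    weekly = CronTrigger.from_crontab('0 3 * * sun')
    print(weekly)

    # APScheduler expresses "first Saturday of the month" with the 'xth y'
    # form in the day field rather than the croniter 'sat#1' extension:
    first_saturday = CronTrigger(day='1st sat', hour=3)
    print(first_saturday)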
12:37:39 <sean-k-mooney> you can use sun as an alias
12:37:52 <sean-k-mooney> so i think this is just an alternative encoding
12:38:20 <sean-k-mooney> https://linux.die.net/man/5/crontab
12:38:27 <sean-k-mooney> 0-7 (0 or 7 is Sun, or use names)
12:38:48 <amoralej> yes, my doubt was the #1 or #2
12:39:14 <amoralej> as 1st saturday of the month, i.e.
12:39:23 <sean-k-mooney> i am not sure
12:39:42 <sean-k-mooney> normally i use the / syntax
12:39:50 <sean-k-mooney> but # might be a normal thing
12:39:52 <amoralej> yep
12:40:12 <amoralej> anyway, it's just a minor detail
12:40:40 <sean-k-mooney> google says it is
12:40:46 <sean-k-mooney> https://www.netiq.com/documentation/cloud-manager-2-5/ncm-reference/data/bexyssf.html
12:40:59 <sean-k-mooney> but only for day of the week
12:41:05 <sean-k-mooney> Day of the Week
12:41:09 <sean-k-mooney> Yes
12:41:13 <sean-k-mooney> 1-7 OR SUN-SAT
12:41:17 <sean-k-mooney> , - * ? / L #
12:41:41 <sean-k-mooney> amoralej: so i think we are good
12:41:47 <amoralej> good
12:41:51 <amoralej> thanks for checking
12:42:15 <amoralej> so i think there are no more bugs to discuss
12:42:38 <amoralej> #link https://bugs.launchpad.net/watcher/+bug/2115058
12:42:53 <amoralej> there is also that one, reported by jgilaber, but also marked as triaged
12:43:05 <jgilaber> we have a couple untriaged
12:43:11 <jgilaber> #link https://bugs.launchpad.net/watcher/+bug/2108855
12:43:18 <jgilaber> #link https://bugs.launchpad.net/watcher/+bug/2108994
12:43:19 <sean-k-mooney> the internal az is a special az in nova
12:43:34 <sean-k-mooney> it's not a real one and i don't think it has any equivalent in cinder
12:44:08 <amoralej> right, sorry i was checking only the last ones
12:44:53 <sean-k-mooney> my guess is there is some special handling for internal
12:44:54 <jgilaber> ack sean-k-mooney, then I'll try at some point to check creating a new az and see if the problem still persists
12:44:57 <amoralej> is the internal az kind of a default one if there are no explicit ones?
12:45:02 <sean-k-mooney> since it's not an az you are ever meant to use in an api request
12:45:16 <sean-k-mooney> amoralej: no, the default az is called nova
12:45:35 <sean-k-mooney> internal is used for things that are not computes, like the metadata api
12:45:45 <sean-k-mooney> its a weird legacy thing
12:45:47 <amoralej> ah, got it
12:46:25 <amoralej> maybe we could even exclude it from the model, then
12:47:08 <sean-k-mooney> so for nova that's likely ok, i am not sure if they use internal for the same thing in cinder
12:47:35 <sean-k-mooney> since this is about the storage model we should check with them first
12:47:43 <amoralej> good point
12:48:16 <amoralej> but, for the regular case, az names for cinder and nova should not match?
12:48:19 <sean-k-mooney> https://docs.openstack.org/api-ref/compute/#id291
12:48:39 <sean-k-mooney> in case you're interested, the scheduler and conductors in nova are part of the Internal zone
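[editor's note] For context on the internal zone discussed above, a minimal sketch, assuming openstacksdk's compute proxy exposes availability_zones(details=True) (matching the API reference sean-k-mooney links): it lists zones and shows why 'internal' only contains non-compute services. Whether Watcher should drop it from the storage data model is still open, pending the cinder check mentioned above; the cloud name used here is hypothetical.

    import openstack

    conn = openstack.connect(cloud='devstack-admin')  # hypothetical cloud name

    for zone in conn.compute.availability_zones(details=True):
        # 'internal' typically lists nova-conductor/nova-scheduler/metadata
        # services, never compute hosts, so it is not a placement target.
        print(zone.name, sorted((zone.hosts or {}).keys()))

    # The idea floated above: ignore it when building the data model.
    usable = [z.name for z in conn.compute.availability_zones()
              if z.name != 'internal']
    print(usable)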
12:50:17 <amoralej> wrt https://bugs.launchpad.net/watcher/+bug/2108855 now that the spec is merged, can we set it as triaged?
12:51:04 <sean-k-mooney> we can close it as invalid with a link to the spec since it was a feature request, not a bug
12:51:35 <sean-k-mooney> i'll do that now
12:51:45 <amoralej> ack, thanks
12:52:01 <amoralej> about bugs.launchpad.net/watcher/+bug/2108994
12:54:02 <sean-k-mooney> i'm not entirely sure about that but it's for 2023.2, which is now end of life
12:54:19 <sean-k-mooney> so i think we just close it unless we see the problem in later releases
12:54:34 <jgilaber> I did try to reproduce it at some point
12:54:39 <jgilaber> but I couldn't
12:55:03 <sean-k-mooney> any objection if i set it to wont fix?
12:55:16 <amoralej> or invalid
12:55:21 <jgilaber> +1 from me
12:55:27 <amoralej> the reporter didn't reply in almost 2 months...
12:55:45 <sean-k-mooney> it went eol 3 months ago
12:56:39 <amoralej> i think it's good to move it to some other state
12:56:45 <amoralej> actually, maybe incomplete
12:58:10 <sean-k-mooney> so the branch it's reported for is eol, so we can't fix it on 2023.2 even if we wanted to
12:58:20 <sean-k-mooney> that's why i was saying wont fix rather than incomplete
12:58:25 <sean-k-mooney> but i can update it if you like
12:58:40 <amoralej> i just did :)
12:59:01 <amoralej> damn, we did it in parallel :)
12:59:08 <sean-k-mooney> yours came second
12:59:11 <sean-k-mooney> so you won
12:59:19 <sean-k-mooney> it's fine, we can leave it as it is
12:59:24 <amoralej> ok
12:59:34 <amoralej> so i think that was it about bugs
13:00:00 <amoralej> #topic volunteers to chair next meeting
13:00:06 <amoralej> any?
13:00:23 <rlandy> I'll do it - it's been a while
13:00:35 <amoralej> #action rlandy will chair next week
13:00:42 <amoralej> just in time
13:01:00 <amoralej> unless someone has some last minute item, i'm closing the meeting
13:01:34 <amoralej> thank you all for joining!
13:01:43 <amoralej> #endmeeting