12:00:19 <dviroel> #startmeeting watcher
12:00:19 <opendevmeet> Meeting started Thu Aug 7 12:00:19 2025 UTC and is due to finish in 60 minutes. The chair is dviroel. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:00:19 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:00:19 <opendevmeet> The meeting name has been set to 'watcher'
12:00:26 <dviroel> who is around today?
12:00:36 <amoralej> o/
12:01:18 <morenod> o/
12:02:01 <dviroel> courtesy ping: jgilaber sean-k-mooney chandankumar rlandy
12:02:11 <rlandy> o/
12:02:21 <dviroel> alright, let's start with today's meeting agenda
12:02:36 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L40 (Meeting agenda)
12:02:43 <dviroel> feel free to add your own topics to the agenda
12:03:17 <dviroel> #topic Announcements
12:03:48 <dviroel> Flamingo release schedule
12:03:51 <dviroel> #link https://releases.openstack.org/flamingo/schedule.html
12:04:24 <dviroel> adding it here just to mention that we are 3 weeks from feature freeze
12:05:06 <dviroel> and a reminder that no features should be merged after this milestone
12:05:08 <dviroel> so please add your changes to the review list in our meeting agenda
12:05:28 <dviroel> today I added a section in the etherpad
12:05:36 <sean-k-mooney> o/
12:05:53 <dviroel> to help us track the main topics that we plan to review
12:06:09 <dviroel> #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L18
12:06:36 <dviroel> i added 3 proposed features by their topics
12:07:18 <dviroel> feel free to comment, add status updates and so on
12:07:57 <dviroel> sorry jwysogla for missing the Aetos topic :(
12:08:03 <dviroel> thanks for adding
12:08:04 <sean-k-mooney> i guess we never created a watcher status etherpad
12:08:32 <dviroel> sean-k-mooney: yeah, we may create one, to include everything, bugfixes, features, etc..
12:08:58 <sean-k-mooney> well we can use the meeting one but this is what nova does https://etherpad.opendev.org/p/nova-2025.2-status
12:09:15 <sean-k-mooney> i thought we did this last cycle but maybe not
12:09:25 <sean-k-mooney> im ok using the irc etherpad for now
12:09:33 <sean-k-mooney> but it's nice to have a per cycle one as well
12:09:56 <sean-k-mooney> less noise and it's useful for cycle highlights and the release note prelude
12:10:00 <dviroel> yes it is, since the number of patches is also growing
12:10:58 <dviroel> ok, any other announcement?
12:11:00 <sean-k-mooney> let's create one async and start using it for the remainder of the cycle and put a link at the top of the irc etherpad
12:11:09 <dviroel> sean-k-mooney: +1
12:11:49 <dviroel> i can start one after the meeting
12:11:55 <rlandy> +1
12:12:30 <dviroel> #action dviroel to start a watcher status etherpad and link it at the meeting etherpad
12:13:33 <dviroel> ok so, next topic in the list
12:13:44 <dviroel> (dviroel): Eventlet Removal Updates
12:13:57 <dviroel> #topic Eventlet Removal Updates
12:14:12 <dviroel> #link https://etherpad.opendev.org/p/watcher-eventlet-removal (watcher eventlet removal etherpad)
12:14:35 <dviroel> there is good news
12:14:48 <dviroel> the patch that extends the decision engine to support threading mode merged
12:14:58 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/952257 (Extend decision engine to support threading mode)
12:15:09 <dviroel> ty all for the reviews
12:15:24 <dviroel> note that this patch added a new voting job
12:15:44 <dviroel> watcher-prometheus-integration-threading runs with decision-engine in threading mode
12:16:00 <dviroel> we decided to keep it voting, since it was stable enough
12:16:08 <amoralej> cool
12:16:32 <dviroel> let us know if your patch starts hitting any issue with that specific job, and we can help with debugging, or just make it non-voting if required
12:16:54 <dviroel> there is another job proposed in
12:17:01 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/955097 (Add a new tox environment to run unit tests in threading mode)
12:17:25 <dviroel> that adds a new tox environment to run unit tests in threading mode too
12:17:39 <dviroel> which should be merging soon too, i think
12:17:49 <sean-k-mooney> i left that unapproved to allow others to review
12:17:55 <dviroel> ++
12:18:09 <sean-k-mooney> but ya likely early next week if there are no objections
12:18:25 <dviroel> from the decision-engine side, I still expect to go through the default number of threads of all threadpools
12:18:36 <dviroel> and propose an update if needed
12:18:46 <sean-k-mooney> so im not sure if that will be
12:19:08 <sean-k-mooney> looking at the eventlet code they were previously limiting the eventlet pool to 4 eventlets i think
12:19:23 <sean-k-mooney> i have not checked all code paths but if that was in fact correct for all pools
12:19:34 <sean-k-mooney> then i dont think there will be an issue using 4 real os threads
12:19:57 <sean-k-mooney> the concern with using real os threads was mainly for projects where that default limit is 10,000
12:19:58 <dviroel> right, that's the conclusion that I would expect to reach, but it's still an in-progress task on my side
12:20:02 <sean-k-mooney> which i think it was in nova
12:20:20 <sean-k-mooney> ack
12:20:40 <dviroel> we may also want to include threadpool stats in debug logs, to help us with debugging
12:20:52 <dviroel> I think that nova is also adding something similar
12:21:03 <sean-k-mooney> ya gibi has added a hook for futurist's stats mechanism
12:21:06 <dviroel> but that is not only for decision-engine, but for all
12:21:11 <sean-k-mooney> so you can likely port that
12:21:17 <dviroel> +1
12:21:30 <dviroel> next thing would be to continue the work in the applier
12:21:44 <dviroel> I will provide more updates as soon as I have something
12:21:46 <sean-k-mooney> https://review.opendev.org/c/openstack/nova/+/948340
12:21:54 <dviroel> or on any issue that we find along the way
12:22:09 <dviroel> sean-k-mooney: yeah, this one
12:22:15 <sean-k-mooney> the applier will be a little trickier
12:22:33 <sean-k-mooney> so the applier is implementing cancellation of the task by killing the greenthread
12:22:42 <dviroel> yes!
12:22:43 <sean-k-mooney> that is not something you can do with an OS pthread
12:23:00 <sean-k-mooney> so we may have to consider using a process pool but that has other problems
12:23:16 <sean-k-mooney> or we may have to rethink how that works in general
12:23:26 <dviroel> indeed, the applier logic needs to be evaluated
12:23:38 <sean-k-mooney> if we did not have time to do that this cycle you have still made a lot of progress
12:24:47 <sean-k-mooney> dviroel: have we ensured that there is no eventlet usage in the api?
12:25:12 <sean-k-mooney> i know you marked the deprecation of the console script as done
12:25:39 <dviroel> there is a usage only with the console script
12:25:40 <sean-k-mooney> but when we run under uwsgi have you confirmed we never use eventlet in the api
12:25:56 <sean-k-mooney> ya the console script is not a problem because we are just going to delete that
12:26:25 <sean-k-mooney> its the wsgi application that is the main part we need to validate
12:26:41 <dviroel> sean-k-mooney: only digging through the code, and I didn't find anything
12:26:47 <sean-k-mooney> ack
12:27:40 <dviroel> but worth revisiting yeah
12:27:49 <dviroel> thanks for the reminder
12:28:01 <dviroel> any other question for this topic?
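Note on the applier cancellation point raised above (12:22:33 to 12:23:16): an OS thread cannot be killed from outside the way a greenthread can, so one possible way to "rethink how that works" is cooperative cancellation, where the task itself polls a flag. The sketch below only illustrates that general technique with the Python standard library. It is not Watcher's applier code; `run_action`, `demo`, and the status strings are hypothetical names, and the pool size of 4 just mirrors the number mentioned in the discussion.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_action(cancel_requested: threading.Event, steps: int = 50) -> str:
    """Hypothetical long-running action that polls a cancellation flag."""
    for _ in range(steps):
        if cancel_requested.is_set():
            # The worker stops itself; nothing kills the thread from outside.
            return "CANCELLED"
        time.sleep(0.01)  # stand-in for one unit of real work
    return "SUCCEEDED"


def demo() -> str:
    """Start one action in a small thread pool, then ask it to stop."""
    cancel_requested = threading.Event()
    # Four workers, mirroring the pool size mentioned in the discussion.
    with ThreadPoolExecutor(max_workers=4) as pool:
        future = pool.submit(run_action, cancel_requested)
        time.sleep(0.05)  # let the action make some progress
        cancel_requested.set()  # request a stop at the action's next check
        return future.result(timeout=5)
```

The trade-off is that cancellation only takes effect at the points where the action checks the flag, so a blocking call in the middle of a step still has to finish or time out on its own.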
12:28:27 <amoralej> you are making great progress on this
12:28:32 <dviroel> moving forward then, since we have things to cover
12:28:38 * dviroel thanks amoralej
12:28:57 <dviroel> #topic (dviroel) Fix for bug #2098984
12:29:18 <dviroel> i was recently hitting this bug, with a continuous audit
12:29:20 <dviroel> #link https://bugs.launchpad.net/watcher/+bug/2098984 (Zone Migration Strategy failing to build a list of instances for migration)
12:29:31 <dviroel> the tl;dr is
12:30:07 <dviroel> zone migration strategy uses nova and cinder clients to retrieve instances and volumes, and then checks if they exist in the compute and storage models
12:30:51 <dviroel> when the instance or volume doesn't exist in the model, it raises an exception, which impacts the audit, setting it to Failure
12:31:05 <dviroel> so the audit doesn't run anymore
12:31:11 <dviroel> there is a patch
12:31:30 <dviroel> #link https://review.opendev.org/c/openstack/watcher/+/956198
12:32:03 <dviroel> where I try to fix this issue with a simple try/except to ignore the instance/volume
12:32:14 <dviroel> which is the expected behavior in the current code
12:32:42 <dviroel> amoralej raised a question: should we instead fix the strategy to only look at the model, and not use the clients
12:33:05 <sean-k-mooney> ya so there are multiple parts to this. 1 your patch is correct and valid, 2 amoralej is also correct that the strategy should be using the data from the model not hitting the api directly
12:33:07 <dviroel> so the idea here is to get feedback on that
12:33:24 <amoralej> it's fine for me to fix in two steps
12:33:24 <sean-k-mooney> i think we will want to fix both issues as separate bugs
12:33:29 <amoralej> wfm
12:34:14 <amoralej> 1st merge the try/except from dviroel and file a new bug with "zone_migration uses nova client instead of the model to retrieve instances"
12:34:18 <dviroel> i also agree that we should [not] be using these clients directly in the strategy
12:35:06 <sean-k-mooney> well using the client to take actions is obviously fine as part of the application of the action plan.
12:35:19 <sean-k-mooney> in limited cases it may be ok in the decision engine
12:35:32 <amoralej> theoretically, consuming from model should avoid hitting the other issue, but there may be race conditions with model updates, so good to handle exceptions also
12:35:33 <sean-k-mooney> to enrich the objects from the model with extra data
12:35:40 <sean-k-mooney> but we should ideally avoid that
12:36:10 <dviroel> the 2nd fix changes the behavior a little bit
12:36:10 <sean-k-mooney> amoralej: well the race can always happen and we need to be tolerant of that
12:36:21 <sean-k-mooney> which is where the skip feature comes in
12:36:29 <amoralej> yes
12:36:44 <amoralej> but in that case it is between running the audit and executing the action
12:36:49 <dviroel> by getting the list of instances/volumes from the api, we also avoid adding already deleted elements, that are still in the model
12:36:56 <amoralej> while this particular bug is while running the audit, not the action
12:37:21 <dviroel> but yes, this can be mitigated by the migration action, in the pre_ methods
12:37:27 <sean-k-mooney> dviroel: that is correct but we have to handle that the element might be deleted in the applier anyway
12:37:34 <dviroel> ++
12:37:36 <amoralej> yes
12:37:39 <sean-k-mooney> because between the action plan being created and it being approved it could happen
12:37:46 <dviroel> true
12:37:58 <sean-k-mooney> dviroel: so where i am with this is i think im pretty comfortable backporting your initial patch
12:38:13 <sean-k-mooney> im less sure about changing to using the model
12:38:35 <amoralej> note that when i said to use only the model (unless we can't), i meant in the strategy execution, not in the action execution
12:38:55 <dviroel> so we agree with the current patch, file a new bug to replace the current implementation with model only, and we can discuss if it should be backported afterwards or not
12:39:03 <amoralej> +1
12:39:04 <sean-k-mooney> that's a bigger change and we need to make sure it does not result in other bugs before considering backporting it. that's the other reason i would prefer 2 patches and 2 bugs
12:39:15 <sean-k-mooney> +1
12:40:03 <dviroel> #action dviroel to file a new bug to explain the current behavior with zone_migration using nova/cinder clients to get info about resources
12:40:41 <dviroel> #agreed to continue with the fix https://review.opendev.org/c/openstack/watcher/+/956198
12:41:05 <dviroel> thanks for the feedback on that o/
12:41:29 <dviroel> moving to next topic then
12:41:36 <dviroel> #topic (amoralej) Update about skipped actions feature
12:41:48 <dviroel> amoralej: want to highlight your latest changes?
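Note on the bug #2098984 fix discussed above (12:30:07 to 12:32:14): the idea of ignoring an instance or volume that the API returns but the model does not contain can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch under review: `ComputeModel`, `InstanceNotFound`, `get_instance_by_uuid`, and `instances_in_model` are hypothetical stand-ins defined here only so the example runs on its own.

```python
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Hypothetical error raised when an instance is absent from the model."""


class ComputeModel:
    """Hypothetical stand-in for a compute data model keyed by UUID."""

    def __init__(self, instances):
        self._instances = dict(instances)

    def get_instance_by_uuid(self, uuid):
        try:
            return self._instances[uuid]
        except KeyError:
            raise InstanceNotFound(uuid) from None


def instances_in_model(api_uuids, model):
    """Return model entries for the given UUIDs, skipping unknown ones."""
    found = []
    for uuid in api_uuids:
        try:
            found.append(model.get_instance_by_uuid(uuid))
        except InstanceNotFound:
            # Skip instead of failing the whole audit on one missing entry.
            LOG.debug("Instance %s is not in the model, skipping", uuid)
    return found
```

For example, with a model holding only "a" and "c", `instances_in_model(["a", "b", "c"], model)` returns the entries for "a" and "c" and logs a debug message for "b".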
12:41:51 <amoralej> #link https://review.opendev.org/q/topic:%22blueprint-add-skip-actions%22
12:42:09 <amoralej> so, i've reorganized the changes as discussed a couple of meetings back
12:42:32 <amoralej> main change is that all the api related changes are in the last one https://review.opendev.org/c/openstack/watcher/+/955753/
12:42:52 <amoralej> including status_message visibility and new actions patch
12:43:04 <amoralej> both covered by the same microversion 1.5
12:43:19 <amoralej> instead of in separate ones, as in my previous patches
12:43:41 <amoralej> other than that, i think the organization now is much clearer
12:43:44 <amoralej> and easy to review
12:44:17 <dviroel> amoralej++
12:44:35 <dviroel> already started to review them, based on the relation chain
12:44:36 <amoralej> also, I sent a new tempest test https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955775 which is using the microversion support
12:44:40 <amoralej> that dviroel added
12:44:43 <dviroel> ++
12:44:49 <amoralej> and worked great
12:45:00 * dviroel needs to update its own tempest patch to use that
12:45:11 <amoralej> running in the functional master job, skipped in the functional-<stable> ones
12:45:57 <amoralej> I think we discussed pretty much the implementation details in a previous mtg, so this was just a heads-up for reviews
12:46:04 <dviroel> thanks amoralej
12:46:10 <amoralej> the only missing part is the watcherclient and watcher-dashboard one
12:46:28 <amoralej> I'll work on that too
12:46:44 <dviroel> so please folks, review these changes when you have a chance
12:47:10 <dviroel> amoralej: anything else?
12:47:15 <sean-k-mooney> i have approved the parent
12:47:24 <sean-k-mooney> https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/956306/3 so that should merge soon
12:47:26 <amoralej> nothing else from my side
12:47:29 <amoralej> thanks
12:48:08 <dviroel> nice, thanks sean-k-mooney
12:48:11 <dviroel> thanks amoralej
12:48:24 <dviroel> #topic Reviews
12:48:34 <dviroel> again, no specific reviews to mention here
12:48:44 <dviroel> just take a look at the m-3 review list
12:48:57 <sean-k-mooney> jwysogla: probably want to bring up theirs?
12:49:01 <dviroel> see what you can help us to move forward
12:49:45 <sean-k-mooney> i did a first pass on the aetos change yesterday https://review.opendev.org/c/openstack/watcher/+/955608
12:49:58 <sean-k-mooney> its in merge conflict because of some other stuff we landed
12:50:08 <sean-k-mooney> but overall it looks pretty reasonable
12:50:31 <dviroel> great, going to take a look soon too
12:50:34 <sean-k-mooney> the main risks here are on the timing of releasing the new version of the observability client and getting that into upper constraints
12:51:18 <sean-k-mooney> jwysogla: what are the chances of getting jaunLarriba to approve the release patch and getting that done this week?
12:51:50 <sean-k-mooney> oh they already have https://review.opendev.org/c/openstack/releases/+/956657
12:52:09 <sean-k-mooney> ok so this is just pending the release team to approve
12:53:08 <dviroel> ok, so we need to follow up on that too
12:53:30 <dviroel> we can add it under its change in the status etherpad
12:53:46 <dviroel> jwysogla: pls provide us feedback on that, once you are back online
12:54:07 <jwysogla> Sorry, I was in another meeting
12:54:10 <dviroel> sean-k-mooney: thanks for raising this concern
12:54:19 <sean-k-mooney> jwysogla: no worries
12:54:44 <sean-k-mooney> once the release is done a bot will propose an update to the requirements repo to bump it in upper constraints
12:54:53 <jwysogla> I hope the observabilityclient releases soon. Then I can finish up the patch and make the CI pass. I ran all the tests locally and they were working, so I hope there is not much work left.
12:55:01 <sean-k-mooney> until that is also merged the watcher patch will be blocked
12:55:13 <sean-k-mooney> but that does not mean we cant review in advance of that
12:55:25 <jwysogla> yes, thanks
12:55:33 <dviroel> correct
12:55:49 <sean-k-mooney> jwysogla: there are some followups that we may want to discuss after or at the ptg
12:55:50 <jwysogla> I should also mention that I'll be on PTO for the last 2 weeks of August, which is quite unfortunate timing.
12:56:22 <jwysogla> But until then, that patch will be my top priority.
12:57:12 <dviroel> ok, this is important to know
12:57:15 <sean-k-mooney> jwysogla: basically next cycle if we can i would like to see if we can evolve our jobs away from sgcore to use the new prometheus scrape endpoint. i think that would be a good topic to discuss in the telemetry ptg session or any time before then after we have cut RC1
12:57:43 <jwysogla> +1 to that
12:57:56 <sean-k-mooney> we also need to discuss the future of gnocchi support in ceilometer
12:58:29 <morenod> +1
12:58:32 <sean-k-mooney> is it correct that there is a plan to retire the way ceilometer feeds data to gnocchi today
12:58:34 <jwysogla> Yes, that's also a good topic for the PTG.
12:59:03 <sean-k-mooney> we had a related topic that we want to discuss at the ptg about the future of the gnocchi backend and that would obviously be a factor
12:59:50 <jwysogla> I don't think we have any official "plan". In my personal opinion, gnocchi should go away at some point in the future. But afaik most users of openstack telemetry outside of Red Hat currently use gnocchi, so we can't just deprecate it.
12:59:52 <sean-k-mooney> jwysogla: is this something you can feed back to the rest of the telemetry team (i.e. our interest in discussing this)
13:00:11 <jwysogla> yes
13:00:36 <sean-k-mooney> gnocchi relies on sg_core today, is that correct?
13:00:45 <sean-k-mooney> is there an intent to deprecate sg_core
13:01:26 <sean-k-mooney> or the ceilometer part. actually we can leave that to a future time
13:01:32 <sean-k-mooney> we are at the top of the hour
13:01:42 <dviroel> yeah
13:01:50 <jwysogla> sg-core is used purely for Prometheus. I think (I'd need to check to be sure) you can currently use the Prometheus exporter just for compute metrics, so ceilometer-central still needs to go through the notification agent and sg-core
13:01:55 <dviroel> sean-k-mooney: ok to move your topic to the next meeting?
13:02:14 <sean-k-mooney> yep
13:02:22 <dviroel> rlandy: thanks for volunteering to be the next chair
13:02:41 <dviroel> ok, so let's wrap up for today
13:02:49 <dviroel> we will meet again next week
13:02:55 <dviroel> thank you all for participating
13:03:04 <dviroel> #endmeeting