12:03:21 #startmeeting Watcher meeting - 2025-01-16
12:03:21 Meeting started Thu Jan 16 12:03:21 2025 UTC and is due to finish in 60 minutes. The chair is amoralej. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:03:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:03:21 The meeting name has been set to 'watcher_meeting___2025_01_16'
12:03:48 please, add your topics to https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
12:04:42 o/
12:04:46 #link https://etherpad.opendev.org/p/openstack-watcher-irc-meeting meeting agenda
12:04:53 let's start with the first topic
12:05:11 #topic (rlandy): with Martin Kopec changing roles, we will need new cores for watcher-tempest-plugin
12:05:45 rlandy, do you want to introduce the topic?
12:06:02 martin has switched roles
12:06:28 as such we will need to propose other cores for watcher-tempest-plugin
12:06:38 in time
12:06:44 this is just a team fyi
12:07:10 i think we have a general problem with lack of cores (basically we now have just one active core, sean-k-mooney)
12:07:30 in the tempest-plugin case, until now we had martin as well (it is the exception), so here we are also down to one core
12:07:35 fetching the group for reference...
12:08:01 well, realistically until late december we didn't have any active cores
12:08:18 so martin has only been in the list for a few weeks, and they were on pto for part of that
12:09:00 i added them on the 28th of november
12:09:12 #info https://review.opendev.org/admin/groups/09a91d8e24af9ce44b80062c4851a1d2fa3d4d14 watcher-tempest-core gerrit group
12:09:44 #info https://review.opendev.org/admin/groups/09a91d8e24af9ce44b80062c4851a1d2fa3d4d14,members watcher-tempest-core members
12:09:52 so i was going to propose that we do a review of core membership the first week of february
12:10:04 +1
12:10:09 +1
12:10:38 sounds good, i think we have discussed doing this around this timeframe before (in the context of things not being able to merge in watcher and dashboard repos)
12:10:45 my plan was to review the review stats in https://openstack.biterg.io/app/dashboards and propose a set of potential cores for each of the watcher groups to the mailing list
12:11:23 if we want i can try and prepare that email before the next meeting
12:11:45 and then we can wait for feedback and discuss in the next meeting
12:12:09 #agreed we will do a review of core membership the first week of february
12:12:09 if there are no objections by the meeting after that, on jan 30th
12:12:14 i can implement the changes
12:12:22 i guess that's a good plan
12:13:08 so, can we move to the next topic?
12:14:40 i will open the next one
12:14:47 #topic (marios): update on prometheus datasource
12:14:56 thanks amoralej
12:14:59 #link https://review.opendev.org/c/openstack/watcher/+/934423
12:15:36 as discussed last week there were some requested changes around the auth options, making the fqdn_instance_map more like a cache with a rebuild & retry at least once
12:15:57 those and some other smaller bits were implemented now (including removing the 'prometheus_' prefix on the config options)
12:16:36 there have been no further comments or requests yet, and we have a +2 from sean-k-mooney and +1 from various other folks who had requested changes
12:16:48 thanks again to everyone for all your suggestions and improvements.
12:17:10 since we have the core issue, I would propose that if there are no negative comments by end of next week we can merge it?
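For context on the fqdn_instance_map change mentioned above, here is a minimal sketch of the cache-with-rebuild-and-retry idea; the class and callable names are illustrative assumptions and not the actual code in https://review.opendev.org/c/openstack/watcher/+/934423.

```python
# Illustrative sketch only (not the patch code): treat the instance-to-FQDN
# map as a cache that is rebuilt and the lookup retried once on a miss.
class FQDNInstanceMap:
    def __init__(self, build_map):
        # build_map is a hypothetical callable returning
        # {instance_uuid: compute_node_fqdn}, e.g. built from Prometheus/Nova data
        self._build_map = build_map
        self._map = build_map()

    def lookup(self, instance_uuid):
        try:
            return self._map[instance_uuid]
        except KeyError:
            # Cache miss: rebuild once and retry, so instances created after
            # the map was first built can still be resolved without a restart.
            self._map = self._build_map()
            return self._map[instance_uuid]
```

A caller would construct this with whatever function builds the map today and use lookup() wherever the map is consulted; a second KeyError after the rebuild then propagates as a genuine "unknown instance" error.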
12:17:17 * marios checks date on the patch
12:17:27 (I discussed that bit about merge with sean-k-mooney privately already)
12:17:47 yep
12:17:59 yeah so i updated that jan 10th
12:18:00 +1 to merge it asap
12:18:09 so for single-core approval i want to 1) leave time for others to review, ideally at least 2 weeks
12:18:10 i'd say tomorrow is 1 week, so next friday sound good sean-k-mooney?
12:18:22 2) see reviews from non-cores with no objections
12:18:36 btw, i started some work to integrate that in a deployment tool and I'm already relying on the config options set in the latest PS :)
12:18:40 and 3) address it by building out the core team so it is not required long term
12:19:56 @marios yes, the end of next week was what i had in mind, so either after the next team meeting or friday
12:20:27 #info planning to merge https://review.opendev.org/c/openstack/watcher/+/934423 by next friday, the 24th, unless there are objections
12:21:01 so amoralej has already started iterating with the instance work on top
12:21:18 i wouldn't want it slipping further into february to merge, i mean
12:21:35 thanks, that's all i had on this topic amoralej, if there are no further comments we can move on?
12:21:47 actually my instance work is the next topic :)
12:22:12 #topic (amoralej) add instance metrics into prometheus datasource
12:22:25 #link https://review.opendev.org/c/openstack/watcher/+/938893/
12:22:30 this is mainly a call for review
12:22:57 Given that the merge of the previous one is approaching, I'd like to also get this one reviewed when you have a chance
12:23:11 i think it's already looking good amoralej, thanks for jumping on that
12:24:29 it is much simpler than the one adding the datasource so i hope it will be faster to review
12:25:43 the main thing that i think is missing (and could be in a follow-up patch)
12:26:04 is i would like us to also extend the new tempest job to start testing with the new datasource
12:26:38 that does require work in the tempest plugin, but we should at least enable/configure the new datasource sooner rather than later once it's merged
12:27:02 yes big +1
12:27:21 amoralej has been testing on his env with the datasource but we should get it into ci asap
12:27:23 yes, +1 for me too
12:27:45 i'm ok to defer the decision on whether we start doing that in https://review.opendev.org/c/openstack/watcher/+/938893 or a follow-up patch
12:27:48 as you said, I'd propose to make that a follow-up patch
12:27:58 but it would be nice to work on that before we merge it
12:28:00 yeah i think it can/should be a different patch
12:28:03 chandankumar is close to being able to add prometheus metrics in the plugin
12:28:05 ack
12:28:15 so we should be able to extend the test shortly
12:28:40 for me it is a matter of time, if we can add it soon, no problem including it in the patch
12:28:47 ok, we can defer this to gerrit review and see how the various efforts come together
12:28:58 i think we are generally in agreement on the direction
12:29:14 yes
12:30:23 ok
12:30:27 so that's it for this one
12:30:31 moving to the next one
12:30:55 #topic (rlandy) reminder bug triage continuation on Tuesday, January 21, 2025 12pm UTC (gmeet and IRC as sent on the ML)
12:31:04 i guess this is just a reminder
12:31:48 #info next bug triage session is on Tuesday, January 21, 2025 12pm UTC, details for anyone interested are in the mailing list
12:31:51 rlandy: did a great job running that but... do you want to do it again or would you like us to rotate?
12:32:37 #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/5KZUHLXUOGTBXMQ4HERO52XB7I5A3HXI/
12:33:20 rlandy: do you want us to rotate the chair on that? ^^
12:34:01 either way
12:34:27 I can finish off this one - and then next time we do it, someone else can take it
12:34:39 ie: I'll take this Tuesday's
12:34:48 sounds good, thank you rlandy
12:34:59 we should be able to get through a fair chunk after next meeting
12:35:06 we had some overhead on the first one ;)
12:35:13 btw, i found it a great exercise to keep learning about the status of watcher
12:35:42 yep, we will go faster next time :)
12:36:49 #topic next chair
12:37:06 any volunteer to chair the next meeting? i don't want to forget this time
12:37:08 :)
12:37:20 i didn't do this for a while, but if there is someone who wants to / didn't go yet i will not fight you for it
12:38:15 looks like it's yours marios
12:38:21 yup
12:38:30 #action marios will chair next meeting
12:38:35 #topic open floor
12:39:07 out of the ones in the agenda, is there some other topic you'd like to discuss?
12:39:22 i have one minor update
12:39:28 not in the agenda
12:39:31 * dviroel proposing myself to chair the meeting on the 30th
12:39:39 thanks dviroel
12:39:48 ok, sean-k-mooney go ahead please
12:40:13 #link https://bugs.launchpad.net/watcher/+bug/2086710
12:40:25 i have been looking into ^ on and off for a while
12:40:35 that's an important one ...
12:40:45 chatting to jayF yesterday i took a look at eventlet
12:40:58 and found that the behavior was changed in
12:41:02 #link https://github.com/eventlet/eventlet/pull/932
12:41:02 i think this randomly fails in the -strategies job, right?
12:41:13 so i have filed
12:41:17 #link https://github.com/eventlet/eventlet/issues/1014
12:41:28 marios: no, this is not related to -strategies
12:41:37 that is a bug in the tempest plugin
12:42:05 the -strategies job failure is
12:42:09 #link https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2090854
12:42:40 thank you, looking at that
12:42:52 sean-k-mooney, so we need to wait on the eventlet patch, right? no fix from the watcher side?
12:42:55 anyway, my update is that the runtime errors were previously asserts in eventlet prior to eventlet 0.36.0
12:43:21 at least one of the new exceptions seems to have been incorrect to raise
12:43:35 the other exception may be valid and may be a sqlalchemy 2.0 issue
12:44:11 so right now it's not clear if we will have to fix anything more in watcher for this or if we will need to address this in eventlet/sqlalchemy
12:44:35 i'll continue to follow this and update folks, but that was what i wanted to highlight
12:45:12 freeze for non-client libs is Feb 17 - Feb 21, i hope it arrives on time
12:45:31 we probably can manage it as some kind of exception otherwise
12:46:12 it's also unclear why we are seeing different behavior on 3.9 vs 3.12
12:46:28 but yes, we will have to see how we proceed if we do not have a resolution by then
12:47:05 i'll also note that i would have expected this to also impact other services like nova
12:47:19 the fact it's not implies there is some watcher-specific context to this
12:47:31 which is why i'm going to continue to look into this in parallel
12:48:04 as you said before, probably the specific thing is using APScheduler, right?
12:48:15 yes
12:48:25 i did a review of which projects use it in openstack
12:48:28 there are 5
12:48:34 3 are dead
12:48:49 zuul does not use eventlet
12:49:09 so watcher is the only "active" project with eventlet + apscheduler
12:49:32 i also think it requires the sqlalchemy datastore to be used with apscheduler too
12:49:51 i'm planning to try and create a smaller reproducer with that combination today
12:49:56 so, it'd make sense that the issue is in the combination... this is a hard one sean-k-mooney, thanks for investigating it
12:50:33 a possible PTG topic might be: should we remove/replace the usage of apscheduler entirely
12:51:07 that's not something i think we can do this cycle however. removing eventlet would also be an option but again too large to consider for 2025.1
12:51:53 as a community it is something we should evaluate however, as long term apscheduler is working on a 4.0 release that is not backwards compatible
12:52:00 we probably can follow what other active projects with a similar usecase are doing
12:52:13 so we would have to do a large migration in the next cycle or two anyway from 3.x
12:52:47 what are the alternatives to apscheduler?
12:53:04 which projects may have a similar usecase?
12:53:13 :) good question. today we are using it for two things
12:53:23 1) running periodic tasks that don't need to use it
12:53:30 and 2) the continuous audit
12:53:42 the first usecase is easy to remove
12:53:49 the second is why it was added in the first place
12:53:54 yeah
12:54:13 there is also a scalability element
12:54:33 currently it's not clear if we can horizontally scale watcher
12:54:48 i.e. to distribute the continuous audits between daemons
12:55:00 anyway, i think we can take this out of the meeting
12:55:07 right
12:55:22 but it is an interesting conversation to have, out of the mtg :)
12:55:30 also, we only have 5 more minutes
12:55:47 anything else you want to add before closing?
12:57:52 I am closing then
12:58:01 Thanks all for joining!
12:58:06 thank you amoralej
12:58:12 thanks!
12:58:15 thanks all and thanks for running amoralej o/
12:58:23 #endmeeting
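As a footnote to the eventlet/APScheduler discussion above, this is a minimal sketch of the kind of standalone reproducer mentioned for the suspected combination (eventlet monkey patching plus an APScheduler SQLAlchemy jobstore); it is an assumption about the shape of such a reproducer, not the actual Watcher code path, and whether it actually triggers bug 2086710 has not been verified.

```python
# Hypothetical reproducer sketch: eventlet monkey patching combined with an
# APScheduler SQLAlchemy jobstore, the combination suspected in the discussion.
import eventlet
eventlet.monkey_patch()  # Watcher services still run under eventlet today

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore


def tick():
    print("periodic job ran")


scheduler = BackgroundScheduler(
    jobstores={"default": SQLAlchemyJobStore(url="sqlite:///jobs.sqlite")}
)
scheduler.add_job(tick, "interval", seconds=1, id="tick")
scheduler.start()

# Let the scheduler run for a while; with eventlet >= 0.36.0 the internal
# asserts that became RuntimeErrors may surface around greenlet switching.
eventlet.sleep(10)
scheduler.shutdown()
```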