12:05:22 #startmeeting Watcher IRC meeting, August 21st, 2025
12:05:22 Meeting started Thu Aug 21 12:05:22 2025 UTC and is due to finish in 60 minutes. The chair is amoralej. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:05:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:05:22 The meeting name has been set to 'watcher_irc_meeting__august_21st__2025'
12:05:28 o/
12:06:07 hi o/
12:06:26 courtesy ping: sean-k-mooney chandankumar
12:06:37 o/
12:07:02 let's start with the first topic
12:07:25 I will go through all the features work in https://etherpad.opendev.org/p/watcher-flamingo-status#L13
12:08:10 #topic (Flamingo features) Host Maintenance Strategy - Disable Migration
12:08:16 o/
12:08:19 #link https://review.opendev.org/q/topic:%22bp//host-maintenance-strategy-disable-migration%22
12:08:20 just a reminder that feature freeze is next week, so these features need to merge to get into 2025.2
12:08:48 so for that, the implementation already has a +2 https://review.opendev.org/c/openstack/watcher/+/952538
12:09:02 so, I assume pending on core review
12:09:08 yeah, we discussed a lot in the change
12:09:30 yep
12:09:31 looks good to merge I think
12:09:38 +1
12:10:01 sean-k-mooney also +1 in past reviews
12:10:23 for the tempest tests, https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954214 some changes have been requested although it's in good shape
12:10:24 yes i was waiting for dviroel to re-review
12:10:48 i wanted to finish looking at the tempest test but i will likely approve it later today since dviroel is now happy with it too
12:11:04 we may need some improvements/fixes in the tempest patch, but that will not block the main patch
12:11:09 there are a number of follow-up discussions we should have at the PTG but i think it's in a good position to move forward
12:11:24 yes, a lot of good questions appeared on that
12:11:41 dviroel: ya i saw your comments about the assertions
12:11:54 status is properly tracked in the reviews, so let's move to the next one if that's fine for you
12:11:56 and i agree with them in general but also that it should not block the feature from landing
12:11:58 sean-k-mooney: also, I think that we need to block the test from running on stable branches
12:12:05 otherwise it will break them
12:12:25 right ^
12:12:27 but I need to check that
12:12:39 dviroel: yes the test will need to be skipped on stable, we should do this via a tempest plugin config option
12:12:49 default it to false but set it to true on master
12:12:56 upstream we only run api tests on stable releases, but if running the scenario ones, it would fail
12:12:59 ack, since there is no new api microversion for them
12:13:12 amoralej: that's something we also need to fix
12:13:21 amoralej: yeah, right!
12:13:22 we should be running the scenario tests on stable too
12:13:24 yeah, we already discussed that, i'm +1 on it
12:13:26 but that's separate
12:13:31 yep
12:13:49 next feature?
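For reference, a minimal sketch of the skip mechanism described above, assuming a hypothetical `run_host_maintenance_tests` boolean option in the plugin's `optimize` config group; the real option name, group, and base class are whatever the watcher-tempest-plugin patch defines.

```python
# Illustrative only: gate the new scenario test behind a plugin config option
# that defaults to False and is flipped to True on master, so stable branches
# (which lack the strategy) simply skip it.
from oslo_config import cfg
from tempest import config
import testtools

CONF = config.CONF

# Option the plugin would register alongside its existing ones (hypothetical name).
OPTS = [
    cfg.BoolOpt('run_host_maintenance_tests',
                default=False,
                help='Enable the host maintenance scenario tests; only set '
                     'to True on branches where the strategy exists.'),
]


class TestHostMaintenanceStrategy(testtools.TestCase):

    def setUp(self):
        super().setUp()
        # Skip instead of failing when the option is left at its default.
        if not CONF.optimize.run_host_maintenance_tests:
            self.skipTest('host maintenance scenario tests are disabled')

    def test_disable_migration(self):
        """Run the strategy through an audit and assert on the action plan."""
```

The master gate job would then set the option to true in tempest.conf, while stable jobs keep the default and skip the test.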
12:14:00 so i'll keep the patch open in a tab and loop back to it after the meeting, but yes we can move on
12:14:17 +1 to move
12:14:20 #topic (Flamingo features) Add Skip Actions
12:14:25 #link https://review.opendev.org/q/topic:%22blueprint-add-skip-actions%22
12:14:44 thanks for all the reviews there, jgilaber and dviroel
12:15:18 i got reviews for most of the implementation ones and did some fixes
12:15:23 i am missing the review in the client and finishing the API one, but looks good so far
12:15:44 wrt the question in https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955775
12:16:01 tbh, I'm not sure if the test i added should be in the scenario ones instead of api
12:16:07 ah yeah, I have that one open to reply
12:16:26 correct, the test is testing the functionality, I missed that
12:16:29 i execute an entire workflow using dummy
12:16:51 mainly because we don't usually test the functionality in api tests I think
12:16:54 so, maybe it should be a scenario? i saw similar ones in api, that's why i added it there, but i was unsure
12:17:11 I'll wait for your feedback
12:17:48 but I was able to check in the logs that your feature is working
12:17:48 wrt the watcher-dashboard change, it relies on the watcherclient patch https://review.opendev.org/c/openstack/python-watcherclient/+/956911
12:18:40 ya so the merge order will be watcher, then client, then dashboard
12:18:41 so, i was thinking of waiting for it to be merged and released so that i can update the minimal version of the client
12:19:12 and the client release is next week already
12:19:16 but I'm afraid that will be after feature freeze
12:19:32 wdyt? would that be backportable if we miss this release?
12:19:33 we can do an early release of the client before FF
12:19:42 amoralej: no
12:19:54 but we may allow it to merge in the RC period
12:20:04 that'd be good
12:20:41 if we get the feature landed today/tomorrow we can aim for a client release monday and proceed with the dashboard before FF
12:21:21 unfortunately, I'm going to be out the next weeks, it will be hard to make, unless someone is willing to take care of it
12:21:40 I have a wip patch locally, but it still needs some work
12:22:15 ack we can see how it plays out
12:22:34 ack, let's see if i can send something today
12:22:40 it's not the end of the world if this lands early next cycle in the horizon plugin
12:23:15 the only thing that we need in the plugin is the ability to mark something as skipped preemptively, right
12:23:28 and show status_message
12:23:33 amoralej: If you get a wip of watcher-dashboard, I will iterate over that since I am currently already working in that area
12:23:47 ah yes, but they could see the skipped state when it's auto-skipped
12:23:59 yes
12:24:25 actually, when skipping manually, i'm presenting a form to ask for an optional reason that will be added to the status_message
12:24:37 thanks chandankumar, let's sync later
12:24:55 we use the cycle-with-intermediary release model for watcher
12:25:06 so we can also do an early release next cycle if we choose to
12:25:18 ok, sounds good
12:25:29 anything else wrt this feature?
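As a side note on the client floor bump mentioned above, it amounts to a one-line change in the dashboard plugin's requirements once the client release exists; X.Y.0 below is a placeholder, not a real version.

```
# watcher-dashboard requirements.txt (sketch): raise the minimum once the
# skip-actions support is in a released python-watcherclient.
python-watcherclient>=X.Y.0
```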
12:26:44 #topic (Flamingo features) Extend Compute Model Attributes
12:26:54 #link https://review.opendev.org/q/topic:%22bp/extend-compute-model-attributes%22
12:27:00 ah
12:27:22 so this one was facing an issue
12:27:26 which was reported here
12:27:38 #link https://bugs.launchpad.net/watcher/+bug/2120586
12:28:09 and sean-k-mooney noticed that actually the python-novaclient has its max api version frozen to 2.96
12:28:24 https://bugs.launchpad.net/watcher/+bug/2120586/comments/1
12:28:44 so, that forces us to move to openstacksdk ?
12:28:47 ya so the bug itself is trivial to fix, and as it's a bug it can be done before or after FF, up until RC1
12:28:59 we could still fix the mentioned bug, and force the api microversion to 2.100 if we want to
12:29:09 the client support is harder to address and needs us to swap to the sdk
12:29:16 which we should do next cycle
12:29:20 but that's not aligned with the version supported by the novaclient
12:29:50 so yes, the proper thing to do is to be limited to 2.96 for now
12:30:11 and address it with the openstacksdk in the future
12:30:33 so I will propose, for this cycle, to partially implement the extended attributes
12:30:46 so i'm +2 on the 2 preceding patches and was going to merge them after the meeting. if you want to respin the final patch to limit to 2.96 i would be ok with that for this cycle
12:30:57 we will miss the scheduler_hints since it was added in 2.100
12:31:13 i am planning to propose the update today
12:31:29 +1
12:31:40 I think that we can already benefit from the pinned_az and the flavor extra_specs
12:31:50 for future strategy developments
12:32:05 this will likely be api 1.6
12:32:14 +1 sounds good
12:33:25 anything else wrt this one?
12:33:29 you will need to propose another spec for next cycle
12:33:48 but i think it may make sense anyway so you can describe which strategies will make use of the new info in the same one
12:34:11 so, https://review.opendev.org/c/openstack/watcher-specs/+/955921 will be covered this cycle?
12:34:38 or is it part of next cycle work?
12:34:39 oh yes, there is the addition of the flavor extra_specs, which will be covered
12:34:45 this cycle
12:35:04 ya so can you update that to note the hint won't be done this cycle as well
12:35:28 amoralej: thanks for reminding me about that, i thought we merged the amendment already
12:35:28 yes
12:36:49 ok, moving on to the next ...
12:36:51 that's it then, I will be working on the updates today - thanks for reviewing
12:37:15 #topic (Flamingo features) Eventlet Removal [MERGED]
12:37:28 so, i assume you are done with the eventlet removal for this cycle?
12:37:39 yes, the proposed patches are all merged
12:37:47 for this cycle
12:37:57 next thing is to work on the applier ones
12:38:06 #info all the expected work for eventlet removal in flamingo is merged!
12:38:31 dviroel: one thing i might ask you to do is to write up a short summary of the work so far for the cycle highlights
12:38:57 sean-k-mooney: sure I can do that
12:39:21 great, there is no rush but we normally start those after FF
12:40:00 #topic Aetos datasource [MERGED]
12:40:09 that's also done, right?
12:40:15 any pending task?
12:40:45 #info new datasource aetos has been merged
12:40:50 ++
12:41:08 yep there is still one followup
12:41:16 to add the aetos job to the tempest plugin
12:41:41 but the feature itself is complete and that is not bound by the FF deadline
12:41:43 do we expect to get this during this cycle?
12:41:55 it's a trivial patch so yes
12:42:17 it's already running on master for watcher, we just need to add it to check in the tempest plugin
12:42:22 is there support to install and run aetos in devstack ?
12:42:30 yes
12:42:32 this one https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/957556 ?
12:42:35 ah, cool
12:42:46 yep that's the one, oh it's merged
12:42:58 I totally missed that, sorry
12:43:04 then cool, there is nothing more for that this cycle then
12:43:32 at the ptg we need to plan a session with telemetry to discuss the future of the prometheus direct plugin
12:43:34 so that was it wrt flamingo features
12:43:48 +1
12:44:08 #topic Bug Triage
12:44:14 there are a couple of new bugs
12:44:41 #link https://bugs.launchpad.net/watcher/+bug/2120666
12:45:16 i created that one. summary: when adding a new volume from a notification, it complains that one field is missing
12:45:35 the field is `multiattach`
12:46:00 the volume is added to the model, so the strategies work correctly (as multiattach is currently not used)
12:46:07 but an error is displayed
12:46:44 I couldn't check if the missing field in the notification is expected on cinder notifications, or if it may be a bug in cinder
12:46:54 we would need to check the documentation in cinder, to check where the error is
12:47:19 multiattach is not a field
12:47:47 so there are properties or capabilities on volumes and that is effectively a dict of string to string
12:48:02 and multiattach i believe is one of the ones that can be set on the volume type
12:48:21 so we should not assume it's there as 99% of volumes won't have it set
12:48:22 according to the api-ref on volumes https://bugs.launchpad.net/watcher/+bug/2120666
12:48:31 https://docs.openstack.org/api-ref/block-storage/v3/index.html#list-accessible-volumes-with-details
12:48:56 multiattach is a mandatory boolean parameter on volumes
12:49:05 field i meant
12:49:24 "If true, this volume can attach to more than one instance."
12:49:27 is it though, or is it sometimes null?
12:49:52 actually, to be clear, on the volume i created where i got that error, the api reported it as false
12:50:17 hum ok, this may have changed in v3
12:50:42 multiattach means it's multiattach capable, not that it's actually multiattached
12:50:44 i think in v2 it was not mandatory but if it's always set now we should handle that
12:51:35 but the bug comes from a notification, I did not see any documentation of the cinder notification format
12:51:44 so, we may need to call the cinder api during notification handling if we don't get it in the notification
12:51:52 the reason i'm doubting it is i normally interact with cinder via horizon and it does not show it, but that does not mean it's not returned
12:52:00 yes, i couldn't find it either, although i probably didn't research enough
12:52:20 well the notifications have nothing to do with the rest apis
12:52:26 yeah
12:52:45 the fields of the notification do not use the same object as the rest apis, so while it may be in the api it might not be in the notification
12:52:57 ideally we would not call the cinder api in response to a notification
12:53:11 yep, but i don't see any other way
12:53:19 we would heal any missing information the next time the periodic runs
12:53:30 yes, it does
12:53:46 i forgot that, that's correct, the information is added in the periodic run
12:53:58 periodic collection
12:54:02 https://paste.opendev.org/show/bxLFmfBTHzTrjzlel9TA/
12:54:33 if we accept that, we may only want to manage that exception and replace that error with a more friendly warning message
12:54:34 so ya, it's definitely there in the api response so we just need to be more tolerant of that internally
12:55:03 well we could also ask cinder to include it in the notification object going forward
12:55:36 my doubt is, what if at some point we rely on that field? should we manage the lack of it in the strategy? do some default when handling the notification?
12:55:53 to avoid errors until the next periodic collection
12:56:04 so you can't actually migrate multiattach volumes if they are in use
12:56:11 if it is a mandatory attribute for a strategy, we may need to 1) use the cinder api to get the missing information from the notification OR 2) call the collection before running the strategy
12:56:14 so it is something we should be aware of
12:56:32 i think if it's not present we would have to skip the volume
12:56:48 that may be a good option
12:56:56 since we are trying to avoid doing api calls during the evaluation of the audits
12:57:37 again, this is probably something we should raise with the cinder team as a bug/mini feature and see if we or they can implement it in the notification object next cycle
12:57:44 in any case, if we call migrate on multiattached volumes, cinder will reject it and it would fail, but better to skip
12:58:00 let's keep discussing on the bug
12:58:02 cinder or nova depending
12:58:09 i'd like to present the next bug, which is related
12:58:19 #info https://bugs.launchpad.net/watcher/+bug/2121147
12:58:20 you can migrate a multiattach volume if it's only attached to 1 vm
12:58:50 right, actually what we should check is the attachments, then, not the capability
12:59:24 this bug was reported by jgilaber and it's related to an error because an optional field is not found
12:59:26 not quite, it can have more than one attachment for other reasons.
12:59:32 mainly bugs, but it can happen
12:59:51 but ya, going back to the next bug
13:00:02 I found that this morning, when building the xml representation of the storage model
13:00:05 this again seems potentially valid
13:00:30 watcher assumes all pools have a 'total_volumes' capability, but pretty much only the lvm driver reports it
13:00:49 so we noted before that we can't rely on backends having pools
13:00:53 while building the model it's fine because it catches the error https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/collector/cinder.py#L250
13:01:03 and i guess when they do we can't rely on the total being reported
13:01:09 but when building the xml it does not https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/element/base.py#L56
13:01:25 but i wonder, are there other api requests we can make to list the volumes in a pool/backend and count them ourselves
13:02:15 I think the solution here is to catch the error in element/base.py
13:02:24 currently, i think that field is not used anywhere, so i think we should make watcher able to smoothly handle the case where no value is reported
13:02:47 yes, AFAIK that field is only printed in the xml
13:02:56 amoralej: is it not used for the volume balancing strategy
13:03:05 i guess that might just use the size rather than the count
13:03:44 from a quick glance I don't think it's used
13:03:49 yes
13:03:57 i guess for now we can just make this skip if the value is not available
13:03:59 I don't see any reference in https://github.com/openstack/watcher/blob/master/watcher/decision_engine/strategy/strategies/storage_capacity_balance.py
13:04:03 it uses total_capacity_gb
13:04:23 ack
13:04:42 what is as_xml_element used for
13:04:58 https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/notification/cinder.py#L42
13:05:05 which comes from a different field
13:05:23 sorry, we are running out of time
13:05:39 sean-k-mooney, to generate an xml string of the model e.g. https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/model_root.py#L225
13:05:48 ya i was just looking at that
13:05:59 i guess we are storing the model as xml blobs in the db then
13:06:13 it's used for to_string
13:06:19 as well
13:06:49 i think this might mainly be used for debugging
13:06:57 but we should look into this more separately
13:06:58 I'd be in favor of following the approach from jgilaber of managing the exception, or even removing it from the model, it can be risky to have fields that are not implemented in some backends
13:07:04 yes, all uses I've seen are for logging
13:07:35 so a more pythonic approach would be to implement proper __repr__ and __str__ functions
13:07:45 so if we confirm it's only for logging
13:07:53 i would be inclined to rip out all the xml logic
13:08:07 and just make the python object pretty-printable instead
13:08:30 if we need it for storing the objects in the db then we can revisit that separately
13:09:00 amoralej: and yes i agree we should avoid having backend-dependent fields
13:09:53 ok, i'm closing the mtg, we can keep discussing in the bug or in irc later
13:09:53 if it's part of the model and not directly returned in the current api query we should either consider removing it or calculating it another way
13:10:00 +1
13:10:18 yeah, that wfm
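To illustrate the tolerant behaviour discussed for bug 2121147 (and, by extension, the missing multiattach field from bug 2120666), here is a minimal sketch of skipping unset optional fields when serializing a model element; it is not the actual code in element/base.py, which may surface the missing value as an exception rather than None and would then need a try/except instead.

```python
# Illustrative sketch only: build the XML element for a storage pool/volume
# while tolerating optional fields (e.g. 'total_volumes') that only some
# backends report, instead of raising while dumping the model.
from lxml import etree
from oslo_log import log

LOG = log.getLogger(__name__)


def as_xml_element(element_name, fields):
    """Serialize a mapping of model fields, skipping values never reported."""
    xml_el = etree.Element(element_name)
    for name, value in sorted(fields.items()):
        if value is None:
            # Not reported by this backend: log and continue so the rest of
            # the model can still be dumped; a later periodic collection can
            # fill the value in if it ever becomes available.
            LOG.debug("Skipping unset field %s on %s", name, element_name)
            continue
        xml_el.set(name, str(value))
    return xml_el
```

If the XML dump really is only used for logging, the alternative raised in the meeting, ripping out the XML logic and giving the model elements proper __repr__/__str__ implementations, would remove the problem entirely.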
13:10:25 thanks all for joining
13:10:41 and thanks dviroel for volunteering to chair the next one
13:11:07 and sorry for not managing time properly! :)
13:11:31 #endmeeting