12:05:22 <amoralej> #startmeeting Watcher IRC meeting, August 21st, 2025
12:05:22 <opendevmeet> Meeting started Thu Aug 21 12:05:22 2025 UTC and is due to finish in 60 minutes.  The chair is amoralej. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:05:22 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:05:22 <opendevmeet> The meeting name has been set to 'watcher_irc_meeting__august_21st__2025'
12:05:28 <rlandy> o/
12:06:07 <dviroel> hi o/
12:06:26 <amoralej> courtesy ping: sean-k-mooney chandankumar
12:06:37 <chandankumar> o/
12:07:02 <amoralej> let's start with the first topic
12:07:25 <amoralej> I will go throug all the features work in https://etherpad.opendev.org/p/watcher-flamingo-status#L13
12:08:10 <amoralej> #topic (Flamingo features) Host Maintenance Strategy - Disable Migration
12:08:16 <sean-k-mooney> o/
12:08:19 <amoralej> #link https://review.opendev.org/q/topic:%22bp//host-maintenance-strategy-disable-migration%22
12:08:20 <dviroel> just a reminder that, feature freeze is next week, so these feature need to merge to get into 2025.2
12:08:48 <amoralej> so for that implementation has already a +2 https://review.opendev.org/c/openstack/watcher/+/952538
12:09:02 <amoralej> so, I assume pending on core review
12:09:08 <dviroel> yeah, we discussed a lot in the change
12:09:30 <amoralej> yep
12:09:31 <dviroel> looks good to merge I think
12:09:38 <jgilaber> +1
12:10:01 <dviroel> sean-k-mooney also +1 in past reviews
12:10:23 <amoralej> for the tempest tests, https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954214 some changes have been requested although it's in good shape
12:10:24 <sean-k-mooney> yes i was waitign for dviroel  to re review
12:10:48 <sean-k-mooney> i wanted to finish lookign at the tempest test but i will likely approve it later today since dviroel is now happy with it too
12:11:04 <dviroel> we may need some improvements/fixes in the tempest patch, but that will not block the main patch
12:11:09 <sean-k-mooney> there are a number of follow up dicssion we shoudl have at teh ptg but i tihnk its in a good postion to move forward
12:11:24 <amoralej> yes, a lot of good questions appeared on that
12:11:41 <sean-k-mooney> dviroel: ya i saw your comments about the assertions
12:11:54 <amoralej> status is properly tracked in the reviews, so let's move to next one if that's fine for you
12:11:56 <sean-k-mooney> and i agree with them in genral but also that it shoudl nto block the feature form landing
12:11:58 <dviroel> sean-k-mooney: also, I think that we need to block the test to run on stable branches
12:12:05 <dviroel> otherwise it will break them
12:12:25 <amoralej> right ^
12:12:27 <dviroel> but I need to check that
12:12:39 <sean-k-mooney> dviroel: yes the test will ned to be skiped on stable we shoudl do this via a tempet plugin config option
12:12:49 <sean-k-mooney> default it to false but set it to true on master
12:12:56 <amoralej> upstream we only run api tests in stable releases, but if running scenarios ones, it would fail
12:12:59 <dviroel> ack, since there is no new api microversion for them
12:13:12 <sean-k-mooney> amoralej: that something we also need to fix
12:13:21 <dviroel> amoralej: yeah, right!
12:13:22 <sean-k-mooney> we shoudl be runnign the senario tests on stable too
12:13:24 <amoralej> yeah, we discussed already i +1 to that
12:13:26 <sean-k-mooney> but that seperate
12:13:31 <amoralej> yep
12:13:49 <amoralej> next feature?
12:14:00 <sean-k-mooney> so ill keep the patch open in a table and loop back to it after the meeting but yes we can move on
12:14:17 <dviroel> +1 to move
12:14:20 <amoralej> #topic (Flamingo features) Add Skip Actions
12:14:25 <amoralej> #link https://review.opendev.org/q/topic:%22blueprint-add-skip-actions%22
12:14:44 <amoralej> thanks for all the reviews there, jgilaber and dviroel
12:15:18 <amoralej> i got reviews for most of the implementation ones and did some fixes
12:15:23 <dviroel> i am missing the review in the client and finishing the API one, but looks good so far
12:15:44 <amoralej> wrt question in https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955775
12:16:01 <amoralej> tbh, I'm not sure if the test i added should be in the scenario instead of api
12:16:07 <dviroel> ah yeah, I have that one open to reply
12:16:26 <dviroel> correct, the test is testing the functionallity, I missed that
12:16:29 <amoralej> i execute an entire workflow using dummy
12:16:51 <dviroel> mainly because we don't usually test the funcionality in api tests I think
12:16:54 <amoralej> so, maybe it should be an scenario? i saw similar ones in api, that's why i added there, but i was unsure
12:17:11 <amoralej> I'll wait for your feedback
12:17:48 <dviroel> but I was able to check in the logs that your feature is working
12:17:48 <amoralej> wrt the watcher-dashboard change, it relies on watcherclient patch https://review.opendev.org/c/openstack/python-watcherclient/+/956911
12:18:40 <sean-k-mooney> ya so the merge order will be watcher then client then dashbaord
12:18:41 <amoralej> so, i was thinking in waiting for it to be merged and released so that i can update the minimal version of the client
12:19:12 <dviroel> and client release is next week already
12:19:16 <amoralej> but I'm afraid that will be after feature freezee
12:19:32 <amoralej> wdyt? would that be backportable if we miss this release?
12:19:33 <sean-k-mooney> we can do an early release of the client before FF
12:19:42 <sean-k-mooney> amoralej: no
12:19:54 <sean-k-mooney> but we may allow it to merge in the RC period
12:20:04 <amoralej> that'd be good
12:20:41 <sean-k-mooney> if we get the feature landed today/tomrrow we can aim for a client release monday and proceed with the dashbaord before FF
12:21:21 <amoralej> unfortunatelly, I'm going to be out next weeks, will be hard to make, unless someone is willing to take care
12:21:40 <amoralej> I have a wip patch locally, but still needs some work
12:22:15 <sean-k-mooney> ack we can see how it plays out
12:22:34 <amoralej> ack, let's see if i can send something today
12:22:40 <sean-k-mooney> its not the end fo the world it this land early next cycle in the horizon plugin
12:23:15 <sean-k-mooney> the only thing that we need in the plugin is the ablity to mark someting as skiped premetivly right
12:23:28 <amoralej> and show status_message
12:23:33 <chandankumar> amoralej: If you get wip of watcher-dashboard, I will iterate over that since currently I am already working in that area
12:23:47 <sean-k-mooney> ah yes but they coudl see the skipped state when its auto skipped
12:23:59 <amoralej> yes
12:24:25 <amoralej> actually, when skipping manually, i'm presenting a form to ask for an optional reason that will be added to the status_message
12:24:37 <amoralej> thanks chandankumar, let's sync later
12:24:55 <sean-k-mooney> we use the release with intermediary release cycle for watcher
12:25:06 <sean-k-mooney> so we can also do an early release next cycle if we chosoe too
12:25:18 <amoralej> ok, sounds good
12:25:29 <amoralej> anything else wrt this feature?
12:26:44 <amoralej> #topic (Flamingo features) Extend Compute Model Attributes
12:26:54 <amoralej> #link https://review.opendev.org/q/topic:%22bp/extend-compute-model-attributes%22
12:27:00 <dviroel> ah
12:27:22 <dviroel> so this one was facing an issue
12:27:26 <dviroel> which was reported here
12:27:38 <dviroel> #link https://bugs.launchpad.net/watcher/+bug/2120586
12:28:09 <dviroel> an sean-k-mooney noticed that actually, the python-novaclient has its max api version frozen to 2.96
12:28:24 <dviroel> https://bugs.launchpad.net/watcher/+bug/2120586/comments/1
12:28:44 <amoralej> so, that forces us to move to openstacksdk ?
12:28:47 <sean-k-mooney> ya so the bug itself it trival to fix. and as it a bug that can be doen before or after FF up until RC1
12:28:59 <dviroel> we could still fix the mentioned bug, and force the api microversion to 2.100 if we want too
12:29:09 <sean-k-mooney> the client supprot is harder to adress and need us to swap to the sdk
12:29:16 <sean-k-mooney> which we shoudl do next cycle
12:29:20 <dviroel> but that's not aligned with the version supported by the novaclient
12:29:50 <dviroel> so yes, the proper thing to do is to be limited to 2.96 for now
12:30:11 <dviroel> and adress with the openstacksdk in the future
12:30:33 <dviroel> so I will propose for this cycle, to partially implement the extended attributes
12:30:46 <sean-k-mooney> so im +2 on the 2 preceeding patches and was going to merge them after the meeting i fyou want to respine the final patch to limit to 2.96 i woudl be ok with that for this cycle
12:30:57 <dviroel> we will miss the scheduler_hints since it was added in 2.100
12:31:13 <dviroel> i am planning to propose the update today
12:31:29 <sean-k-mooney> +1
12:31:40 <dviroel> I think that we can already benefit from the pinned_az and the flavor extra_specs
12:31:50 <dviroel> for future strategies developments
12:32:05 <dviroel> this will be likely the api 1.6
12:32:14 <amoralej> +1 sounds good
12:33:25 <amoralej> anything else wrt this one?
12:33:29 <sean-k-mooney> you will need to propose another spec for next cycle
12:33:48 <sean-k-mooney> but i think it may make sense anyway so you can desicbe which stragies will make use of the new info in the same one
12:34:11 <amoralej> so, https://review.opendev.org/c/openstack/watcher-specs/+/955921 will be cover this cycle?
12:34:38 <amoralej> or it's part of next cycle work?
12:34:39 <dviroel> oh yes, there is the addition of the flavor extra_specs, which will be covered
12:34:45 <dviroel> this cycle
12:35:04 <sean-k-mooney> ya so can you update that to note the hint wont be done this cycle as well
12:35:28 <sean-k-mooney> amoralej: thanks for reminding me about that i tought we merged the amened ment already
12:35:28 <dviroel> yes
12:36:49 <amoralej> ok, moving on to next ...
12:36:51 <dviroel> that's it then, I will be working on the updates today - thanks for reviewing
12:37:15 <amoralej> #topic (Flamingo features) Eventlet Removal [MERGED]
12:37:28 <amoralej> so, i assume you are done with the eventlet removal for this cycle?
12:37:39 <dviroel> yes, the proposed patches are all merged
12:37:47 <dviroel> for this cycle
12:37:57 <dviroel> next thing is to work in the applier ones
12:38:06 <amoralej> #info all the expected work for eventlet removal in flamingo is merged!
12:38:31 <sean-k-mooney> dviroel: one think i might ask you do do is to write up a short summary of the work so far for the cycle highlights
12:38:57 <dviroel> sean-k-mooney: sure I can do that
12:39:21 <sean-k-mooney> great there is no rush but we normally strat those after FF
12:40:00 <amoralej> #topic Aetos datasource [MERGED]
12:40:09 <amoralej> that's also done, right?
12:40:15 <amoralej> any pending task?
12:40:45 <amoralej> #info new datasource aetos has been merged
12:40:50 <dviroel> ++
12:41:08 <sean-k-mooney> yep there is still one followup
12:41:16 <sean-k-mooney> to add the aetos job toe the tempest plugin
12:41:41 <sean-k-mooney> but the feature itself is complete and that is not bound by the FF deadline
12:41:43 <amoralej> do we expect to get this during this cycle?
12:41:55 <sean-k-mooney> its a trivial patch so yes
12:42:17 <sean-k-mooney> its already running on master for watcher we just need to add it to check in the tempest plugin
12:42:22 <amoralej> is there support to install and run aetos in devstack ?
12:42:30 <sean-k-mooney> yes
12:42:32 <dviroel> this one https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/957556 ?
12:42:35 <amoralej> ah, cool
12:42:46 <sean-k-mooney> yep thats the one oh its merged
12:42:58 <amoralej> I totally missed that, sorry
12:43:04 <sean-k-mooney> then cool there is nothign mroe for that this cycle then
12:43:32 <sean-k-mooney> at the ptg we need to plan a session with telemetry to discuss the future of the promethus direct plugin
12:43:34 <amoralej> so that was it wrt flamingo features
12:43:48 <dviroel> +1
12:44:08 <amoralej> #topic Bug Triage
12:44:14 <amoralej> there are a couple of new bugs
12:44:41 <amoralej> #link https://bugs.launchpad.net/watcher/+bug/2120666
12:45:16 <amoralej> i created that, summary, when adding a new volume from notification, it complains that one field is missing
12:45:35 <amoralej> field is `multiattach`
12:46:00 <amoralej> the volume is added to the model, so the strategies works correctly (as multiattach is currently not used)
12:46:07 <amoralej> but an error is displayed
12:46:44 <amoralej> I couldn't check if the missing field in the notification is expected on cinder notifications, or it may be a bug in cinder
12:46:54 <dviroel> we would need to check the documentation in cinder, to check where is the error
12:47:19 <sean-k-mooney> multiattach is not a filed
12:47:47 <sean-k-mooney> so there are propeties or cablaities on voluem and that is a dict of string to string effectivly
12:48:02 <sean-k-mooney> and multi attaahc i belive is one of hte ones that can be set on teh volume type
12:48:21 <sean-k-mooney> so we shoudl not assume its there as 99% of voluems wont have it set
12:48:22 <amoralej> according to api-ref on volumes https://bugs.launchpad.net/watcher/+bug/2120666
12:48:31 <amoralej> https://docs.openstack.org/api-ref/block-storage/v3/index.html#list-accessible-volumes-with-details
12:48:56 <amoralej> multiattach is a mandatory boolean parameter on volumes
12:49:05 <amoralej> field i meant
12:49:24 <amoralej> "If true, this volume can attach to more than one instance."
12:49:27 <sean-k-mooney> is it though or is it somethimes null
12:49:52 <amoralej> actually, to be clear, on the volume created that i got that error, the api reported it as false
12:50:17 <sean-k-mooney> hum ok this may have chagned in v3
12:50:42 <amoralej> multiattach means it's multiattach capable, not that it's actually mutliattched
12:50:44 <sean-k-mooney> i think in v2 it was not mandaroy but if its alwasy set now we shoudl handel that
12:51:35 <jgilaber> but the bug comes from a notification, I did not see any documentation of the cinder notification format
12:51:44 <amoralej> so, we may need to call the cinder api during notification handling if we don't get it in the notification
12:51:52 <sean-k-mooney> the reason im doubting it is i normally interact with cidner via horizon and it does not show it but that does nto eman it not returned
12:52:00 <amoralej> yes, i couldn't find it also, although i probably didn't research enough
12:52:20 <sean-k-mooney> well the noticaiotn have nothitng to do with the rest apis
12:52:26 <amoralej> yeah
12:52:45 <sean-k-mooney> the files fo the noticiaton do not use the same obejct as the rest apis so while it may be in the api it migh tnot be in the notifcaion
12:52:57 <sean-k-mooney> idealy we woudl not call the cinder api in respocne to a notificiton
12:53:11 <amoralej> yep, but i don't see any other way
12:53:19 <sean-k-mooney> we woudl heal any missing information then ext time the perodic runs
12:53:30 <amoralej> yes, it does
12:53:46 <amoralej> i fogot that, that's correct, the information is added in the periodic run
12:53:58 <amoralej> periodic collection
12:54:02 <sean-k-mooney> https://paste.opendev.org/show/bxLFmfBTHzTrjzlel9TA/
12:54:33 <amoralej> if we accept that, we may only want to manage that exception and replace that error by a more friendly warning message
12:54:34 <sean-k-mooney> so ya its defintly there in the api responce so we jsut need to be more tolernet to that internally
12:55:03 <sean-k-mooney> well we coudl also ask cidner to include it in the notificaiotn object going forward
12:55:36 <amoralej> my doubt is, what if at some point we rely on that field? should we manage the lack of it in the strategy? do some default when handling the notification?
12:55:53 <amoralej> to avoid errors until next periodic collection
12:56:04 <sean-k-mooney> so you cant actully migrate multi attach voluems if they are in use
12:56:11 <dviroel> if is a mandatory attribute for a strategy, we may need to 1) user cinder api to get the missing information from the notification OR 2) call the collection before running the strategy
12:56:14 <sean-k-mooney> so it is somethign we shoudl be aware of
12:56:32 <sean-k-mooney> i think if its not present we woudl have to skip the volume
12:56:48 <amoralej> that may be a good option
12:56:56 <sean-k-mooney> since we are tryign to avoid doing api calls during the evacluation fo the audits
12:57:37 <sean-k-mooney> again this is proably somethign we should raise with the cidner team as a bug/mini feature and see if we or they can implemtne it in the notificaion object next cycle
12:57:44 <amoralej> in any case, if we call migrate on multiattached volumes, cinder will reject and it would fail, but better to skip
12:58:00 <amoralej> let's keep discussing on the bug
12:58:02 <sean-k-mooney> cidner or nova depending
12:58:09 <amoralej> i'd like to present next bug, which is related
12:58:19 <amoralej> #info https://bugs.launchpad.net/watcher/+bug/2121147
12:58:20 <sean-k-mooney> you can migrate a multi atach volume if its only attached to 1 vm
12:58:50 <amoralej> right, actually, what we should check is the attachments, then, not the capability
12:59:24 <amoralej> this bug was reported by jgilaber and it's related to an error because of optional field is not found
12:59:26 <sean-k-mooney> not quite it can have more then one attachmetn for other reasons.
12:59:32 <sean-k-mooney> mainly bugs but it can happen
12:59:51 <sean-k-mooney> but ya going back to the next bug
13:00:02 <jgilaber> I found that this morning, when building the xml representation of the storage model
13:00:05 <sean-k-mooney> this again seam potenally vallid
13:00:30 <jgilaber> watcher assumes all pools have a 'total_volumes' capability, but pretty much only the lvm driver reports it
13:00:49 <sean-k-mooney> so we noted before that we cant rely on backend having pools
13:00:53 <jgilaber> while building the model it's fine because it catches the error  https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/collector/cinder.py#L250
13:01:03 <sean-k-mooney> and i guess when they do we cant rely on the tootal being reported
13:01:09 <jgilaber> but when building the xml it does not https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/element/base.py#L56
13:01:25 <sean-k-mooney> but i wonder are there other api requests we can make to list the voluem in a pool/backend and count them ourselves
13:02:15 <jgilaber> I think the solution here is to catch the error in the element/base.py
13:02:24 <amoralej> currently, i think that field is not used anywhere, so i think we should make watcher being able to smoothly manage if there is no value rpeported
13:02:47 <jgilaber> yes, AFAIK that field is only printed in the xml
13:02:56 <sean-k-mooney> amoralej: is it not used for the volume blancing stragey
13:03:05 <sean-k-mooney> i guess that might just use teh size rather then count
13:03:44 <jgilaber> from a quick glance I don't think it's used
13:03:49 <amoralej> yes
13:03:57 <sean-k-mooney> i guess for now we can just make this skip if the value is not aviable
13:03:59 <jgilaber> I don't see any reference in https://github.com/openstack/watcher/blob/master/watcher/decision_engine/strategy/strategies/storage_capacity_balance.py
13:04:03 <amoralej> it uses total_capacity_gb
13:04:23 <sean-k-mooney> ack
13:04:42 <sean-k-mooney> what is as_xml_element used for
13:04:58 <amoralej> https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/notification/cinder.py#L42
13:05:05 <amoralej> which comes from a different field
13:05:23 <amoralej> sorry, we are running out of time
13:05:39 <jgilaber> sean-k-mooney, to generate an xml string of the model e.g https://github.com/openstack/watcher/blob/90f0c2264c4243b4bfa493e4aa371c5315ce163c/watcher/decision_engine/model/model_root.py#L225
13:05:48 <sean-k-mooney> ya i was just looking at that
13:05:59 <sean-k-mooney> i guess we are storign the model as xml blobs in the db then
13:06:13 <sean-k-mooney> its use dfor to_string
13:06:19 <sean-k-mooney> as well
13:06:49 <sean-k-mooney> i think this might mainly be used for debuging
13:06:57 <sean-k-mooney> but we shoudl look into thi smore seperatly
13:06:58 <amoralej> I'd be in favor of following the approach from jgilaber of managing the exception or even remove it from the model, it can be risky to have fields that are not implemented in some backends
13:07:04 <jgilaber> yes, all uses I've seen are for logging
13:07:35 <sean-k-mooney> so a more python approch woudl eb to implemnet proper __repr__ and __str__ functions
13:07:45 <sean-k-mooney> so if we confirm its only for logging
13:07:53 <sean-k-mooney> i woudl be incldine to rip out all the xml logic
13:08:07 <sean-k-mooney> and jsut make the pyton object pretty printable instead
13:08:30 <sean-k-mooney> if we need it for storing the objects in the db then we can revisti that seperatly
13:09:00 <sean-k-mooney> amoralej: and yes i agree wew shoudl avoid havign backend depent feild
13:09:53 <amoralej> ok, i'm closing the mtg, we can keep discussing in the bug or in irc later
13:09:53 <sean-k-mooney> if its part fo the model adn not directly retured int eh current api query we shoudl either consdier removing it and or calulating it another way
13:10:00 <sean-k-mooney> +1
13:10:18 <amoralej> yeah, that wfm
13:10:25 <amoralej> thanks all for joining
13:10:41 <amoralej> and thanks dviroel for volunteering to chair next one
13:11:07 <amoralej> and sorry for not managing time properly! :)
13:11:31 <amoralej> #endmeeting