Thursday, 2025-09-18

*** haleyb is now known as haleyb|out00:31
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Show parameter spec in the strategy info page  https://review.opendev.org/c/openstack/watcher-dashboard/+/96023204:14
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: List strategies based on selected goal  https://review.opendev.org/c/openstack/watcher-dashboard/+/96036304:14
opendevreviewJoan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume migrate with zone migration  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/95864408:13
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Start and END time fields for continuous audit  https://review.opendev.org/c/openstack/watcher-dashboard/+/95723208:56
rlandyhello all ... Watcher IRC Weekly meeting will be in 1.5 hours. Please add any topics to https://etherpad.opendev.org/p/openstack-watcher-irc-meeting#L3810:30
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Start and END time fields for continuous audit  https://review.opendev.org/c/openstack/watcher-dashboard/+/95723210:43
opendevreviewDouglas Viroel proposed openstack/watcher stable/2025.2: Add missing 1.6 API doc in rest version history  https://review.opendev.org/c/openstack/watcher/+/96162511:32
chandankumarsean-k-mooney: dviroel please add these two reviews to the queue https://review.opendev.org/c/openstack/watcher-dashboard/+/957232 and https://review.opendev.org/c/openstack/watcher-dashboard/+/960363 thank you! Few improvements to strategies.11:32
opendevreviewDouglas Viroel proposed openstack/watcher-tempest-plugin master: Add a scenario test with continuous audit  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/95426411:46
opendevreviewDouglas Viroel proposed openstack/watcher-tempest-plugin master: Add a scenario test with continuous audit  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/95426411:47
opendevreviewDouglas Viroel proposed openstack/watcher stable/2025.2: Add unit tests for instance and volume not found in model  https://review.opendev.org/c/openstack/watcher/+/96163211:56
opendevreviewDouglas Viroel proposed openstack/watcher stable/2025.2: Fix zone migration instance not found issue  https://review.opendev.org/c/openstack/watcher/+/96163511:57
rlandy#startmeeting Watcher IRC Weekly Meeting - September 18, 202512:01
opendevmeetMeeting started Thu Sep 18 12:01:39 2025 UTC and is due to finish in 60 minutes.  The chair is rlandy. Information about MeetBot at http://wiki.debian.org/MeetBot.12:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.12:01
opendevmeetThe meeting name has been set to 'watcher_irc_weekly_meeting___september_18__2025'12:01
rlandyo/ ... who's around today?12:01
morenodo/12:01
amoralejo/12:02
chandankumaro/12:02
dviroelo/12:03
jgilabero/12:03
rlandycourtesy ping ... sean-k-mooney 12:03
rlandylet's begin ...12:04
rlandy#topic: Announcements12:04
rlandy#link https://etherpad.opendev.org/p/watcher-2026.1-ptg#L3312:05
rlandyPTG topics12:05
rlandywe have quite a few already listed12:05
rlandynotice to add others if there are any12:05
dviroeloh nice, there are always new ones12:06
rlandyany questions/concerns about PTG?12:06
dviroelmore info about PTG at12:07
dviroel#link https://openinfra.org/ptg/12:07
rlandymoving on to the next announcement12:08
rlandy#topic: Core team updates12:08
rlandy removals, reminders, and new addition12:08
dviroelI just added this one, as follow up from previous meeting12:08
rlandycongrats jgilaber on new core status12:09
dviroelsean-k-mooney updated the teams, removed the members mentioned in the ML thread12:09
dviroeljgilaber++12:09
dviroelwelcome jgilaber 12:09
chandankumarcongratulations jgilaber 12:10
amoralejjgilaber++ congrats!12:10
jgilaberthanks all!12:10
rlandythanks sean-k-mooney and dviroel for taking care of these updates12:10
rlandyanything more for announcements?12:11
rlandymoving on ...12:12
rlandy#topic: reviews12:12
rlandyjgilaber, are those your reviews?12:12
jgilaberTwo of the three are mine, the other is from amoralej 12:12
rlandysome combination of jgilaber and amoralej - rights12:12
jgilaberI grouped the first two because they are related12:13
rlandyjgilaber, do you want to take this? otherwise we can go one by one?12:13
jgilaberThey fix errors shown in the decision engine logs that are not actually problems12:13
jgilabersure12:13
rlandygo4it12:13
jgilaber#link https://review.opendev.org/c/openstack/watcher/+/95744112:14
jgilaberthis one prevents an error when a cinder notification is processed but the storage model is not built (there is no storage audit)12:14
dviroelyou can upgrade your vote there now :) 12:14
jgilaberit's copying the same behaviour that the compute model has12:14
amoralejit can be your first +2 :)12:15
jgilaberdviroel, yes I just realized that I had +1 some time ago :)12:15
* dviroel is going to take a look12:15
jgilaberack moving to the second12:16
jgilaber#link https://review.opendev.org/c/openstack/watcher/+/96026512:16
jgilaberwatcher assumes that all cinder drivers report the same attributes for fields, but this is wrong12:16
jgilabersince upstream we only test with lvms we have not noticed this, but when using other storage backends like nfs some fields like total_volumes are not reported12:17
jgilaberthis patch adds a check in the code that builds the xml representation to avoid the error when a field is not present12:17
jgilaberand instead logs that it could not be found12:18
jgilaberthis particular instance is not a problem when building a model but I opened a new bug for a different case where I think it prevents the model being built correctly12:19
jgilaberbut I'm still working on that I'll bring that patch in another meeting12:19
dviroelack12:19
jgilaberif there are no questions I can move to the next one12:19
amoraleji think that's fine but, as related topic, we should try to reduce the models to contain only fields which are common to all backends12:19
amoralejanyway, that's a different topic12:20
jgilaber+1, I think we should do that in a followup12:20
amoralej+112:20
dviroel+1, it needs to be reviewed 12:20
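The missing-field check described for review 960265 could look roughly like the sketch below. It is not the code from the patch: `POOL_FIELDS`, `pool_to_xml` and the `StoragePool` element name are hypothetical, and pools are assumed to be plain dicts. The idea is only that a field a backend does not report is logged and skipped instead of raising while the XML representation is built.

```python
import logging
import xml.etree.ElementTree as ET

LOG = logging.getLogger(__name__)

# Hypothetical field list; the real storage model may use different names.
POOL_FIELDS = ("name", "total_volumes", "total_capacity_gb", "free_capacity_gb")


def pool_to_xml(pool):
    """Build an XML element from a pool dict, skipping fields it lacks."""
    element = ET.Element("StoragePool")
    for field in POOL_FIELDS:
        if field not in pool:
            # Log instead of raising, so one missing field does not abort the build.
            LOG.debug("Pool field %s not reported by backend, skipping", field)
            continue
        element.set(field, str(pool[field]))
    return element


# An LVM-like pool reports every field; an NFS-like pool omits total_volumes.
lvm_pool = {"name": "lvm#pool", "total_volumes": 3,
            "total_capacity_gb": 100, "free_capacity_gb": 40}
nfs_pool = {"name": "nfs#pool", "total_capacity_gb": 500, "free_capacity_gb": 450}
```

Calling `pool_to_xml(nfs_pool)` returns an element without a `total_volumes` attribute rather than failing with a `KeyError`.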
jgilaberlast patch12:21
jgilaber#link https://review.opendev.org/c/openstack/watcher/+/95876612:21
jgilaberthis is a followup from a patch we already merged where I added the channel that cinder uses for the notifications by default12:21
jgilaberin this one I remove the old default of 'watcher.watcher_notifications' that is not used anywhere12:22
dviroelack, this one is the one that we don't want to backport right12:22
jgilabercorrect12:22
jgilaberthat's all I have, if there are no more questions we can move on12:24
jgilaberthanks!12:24
rlandythank you jgilaber for bringing those to the team's attention12:24
rlandythat's it for reviews ...12:25
rlandymoving on ...12:25
rlandy#topic Bug Triage12:25
rlandy#link https://bugs.launchpad.net/watcher/+bug/2122148 (Workflow Engine is not reverting Actions for failed Action Plans)12:25
dviroelo/12:26
rlandypls go ahead12:26
dviroelI opened this one, while trying to validate the rollback of an action plan, to test the revert of actions12:26
dviroelas we can see, we have a config option in watcher12:26
dviroelthat enables/disables this rollback12:27
dviroel#link https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_applier.rollback_when_actionplan_failed12:27
dviroelwhich defaults to false12:27
dviroelbut when enabled, it doesn't work12:27
dviroelthe revert() is never called12:27
dviroelI point out in the bug report why this is probably happening 12:28
dviroeland it seems it has not been working for a long time12:28
dviroelit is not simple to fix12:28
dviroelso I added a topic in the PTG to discuss how we can proceed with this feature in the future12:29
rlandycan we put that on Triaged and High/medium?12:29
amoralejthe expected behavior is that it rolls back the entire action plan, right?12:29
dviroelamoralej: yes12:29
amoralejnot only the failed action12:29
dviroelamoralej: correct12:29
dviroelit is possible to start the rollback with a small code change12:29
dviroelbut since it has not been tested for a long time12:30
dviroelthe revert also fails in other parts, at least for the strategy that I was using12:30
sean-k-mooneythat may be how it was originally intended12:30
sean-k-mooneybut i don't believe we have the correct logic to support that12:30
dviroelso at least in this cycle, we should take an action on that12:31
sean-k-mooneydviroel certainly found at least one case, i.e. compute node disabled, where we do not12:31
dviroellike deprecate this feature/config at least, and plan a better way to rollback things12:31
sean-k-mooneyright i personally do not think that config-driven automatic rollback is something we should support12:31
sean-k-mooneyand yes we should consider how to do this a better way12:32
sean-k-mooneyi'm not going to say what that should be now but it should not be config driven12:32
chandankumar[rollback mechanism](https://review.opendev.org/c/openstack/watcher/+/746845) abandoned cr from past, might be useful12:32
sean-k-mooneyconfig driven api behavior is not something we should do in general12:32
sean-k-mooneychandankumar: yes that was an approach12:33
sean-k-mooneynever finalised but it's one option we could explore12:33
sean-k-mooneythat at least made it api driven which is a better direction12:33
dviroel+112:33
dviroelso for now, rollback is not working and is a known bug12:34
amoralejyou mean make it a config option in audits, i.e.?12:34
sean-k-mooneyif i was to design this now, once an action plan goes to failed i would provide an option to calculate a new action plan to roll back the deployment, which you could then modify via the skip feature to only roll back the part you need12:34
sean-k-mooneydviroel: i think we can consider this a known issue yes12:35
sean-k-mooneystill a valid bug but the resolution might be a new feature12:35
sean-k-mooneyrather than a backportable bugfix12:35
dviroelimportance? medium/high?12:35
amoralejyep, good to discuss it in ptg12:35
dviroelyeah, maybe we would just deprecate and start a new rollback feature in the end12:36
sean-k-mooneyfixing the config option i think low, providing a future rollback capability high12:36
sean-k-mooneyi would expect this to be its own feature with a detailed spec12:36
rlandydviroel: do you have what you need to triage and take the next steps?12:37
jgilaberoverall I think this bug should be high12:37
dviroelrlandy: yes12:37
dviroelI will also assign to myself12:38
dviroeli will set to medium/high here, to shed more light on it, since it is something to follow up12:38
rlandyok - thanks dviroel 12:38
rlandynext ...12:39
rlandy#link: https://bugs.launchpad.net/watcher/+bug/2122149 (Host Maintenance does not create Migrate action properly)12:39
dviroelthe next one was found when testing the revert12:39
rlandyin progress?12:39
dviroelhost maintenance sets source node by its uuid, and the destination by hostname12:39
dviroelboth should be hostname12:39
dviroelit doesn't fail because the migration only uses the destination12:40
dviroelbut when you try to revert, it fails :) 12:40
dviroeli proposed a fix 12:40
dviroelwe should at least build the action properly12:41
dviroelthis is the only strategy that is doing that12:41
sean-k-mooneyyep agreed12:41
dviroel#link https://review.opendev.org/c/openstack/watcher/+/95988912:41
sean-k-mooneyi think the patch is in a reasonably good state overall12:41
sean-k-mooneybut i need to actually review it properly12:42
dviroelthis one I already triaged, set to low12:42
dviroelwe can move to the next bug I think12:42
rlandydviroel: you are on a roll today ... next is yours as well ...12:42
rlandy#link: https://bugs.launchpad.net/watcher/+bug/2122362 (Host Maintenance rollback is not possible due to source COMPUTE_STATUS_DISABLED trait)12:42
dviroelwhich is also host maintenance 12:42
sean-k-mooneythe one thing i was unsure about is we are changing the node from a uuid to a hostname12:42
sean-k-mooneythat implies we do not validate that in the schema today12:43
sean-k-mooneyso there might be other work to do there later12:43
dviroelsean-k-mooney: correct, there is no validation in the action schema12:43
sean-k-mooneywell there is but it's very very loose12:43
sean-k-mooneythey are defined as jsonschemas12:43
sean-k-mooney i want to tighten them up in the future but that's a separate conversation12:44
dviroelmaybe because the uuid is also a string?12:44
sean-k-mooneycorrect but we should have validation for hostname-like vs uuid-like if we can only support one of the two12:44
sean-k-mooneyso that is what i meant by very very loose12:44
sean-k-mooneyboth are valid strings but only one is a valid input12:44
dviroelwould someone use an uuid as hostname? 12:45
sean-k-mooneyanyway the next bug is what i mentioned before.12:45
dviroelack12:45
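The hostname-like vs uuid-like distinction discussed above can be illustrated with a small sketch. It is not Watcher's action schema: `classify_node_ref` and `HOSTNAME_RE` are hypothetical, and the hostname pattern is a simplified assumption. Since a UUID string is itself a syntactically valid hostname, the sketch checks the UUID form first.

```python
import re
import uuid

# Simplified hostname pattern (dot-separated labels of letters, digits, hyphens).
# This is an assumption for illustration, not Watcher's schema.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)([A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)"
    r"(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$"
)


def is_uuid_like(value):
    """Return True if the string parses as a UUID."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def classify_node_ref(value):
    """Return 'uuid', 'hostname', or 'invalid' for a node reference string."""
    # Check UUID first: a UUID string would also match the hostname pattern.
    if is_uuid_like(value):
        return "uuid"
    if HOSTNAME_RE.match(value):
        return "hostname"
    return "invalid"
```

A schema that accepts only hostnames could reject any value classified as `uuid`, at the cost of refusing a host that really is named after a UUID.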
dviroel2122362 now12:45
dviroelwhen trying the rollback, with other fixes12:45
dviroelit failed to revert again12:45
sean-k-mooneyya so this will be dependent on the strategy used12:46
dviroelmainly because the Action Plan sets the maintenance compute node to Disabled12:46
sean-k-mooneyworkload balancing for example will not disable 12:46
sean-k-mooneyexactly12:46
dviroeland when we try to rollback, it fails, because the node is disabled12:46
dviroelthis is a more complex rollback scenario12:46
dviroelwhich would need to rollback the compute node state first12:46
sean-k-mooneyso this is an example of how the actions and the decision engine strategies do not actually support orchestrating a rollback today12:47
dviroelyes12:47
sean-k-mooneyand that is why i was suggesting a rollback should be its own calculated action plan12:47
sean-k-mooneyso it can actually compute the required actions12:47
dviroelthis shows how the rollback can be a more complex solution, other than just reverting things12:47
dviroelsean-k-mooney already triaged this one, but there is no easy solution for it right now12:48
dviroelbut it is a good example to bring to PTG discussion around rollbacks12:48
sean-k-mooneyya this example is effectively one concrete usecase that we would need the rollback solution to handle12:49
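The idea of a rollback being its own calculated action plan, rather than a plain revert, can be sketched as below. Everything here is hypothetical: the action dicts, `inverse_of` and `build_rollback_plan` are illustrative and do not reflect Watcher's applier or action classes. The sketch assumes the original plan disabled the node before migrating instances away, so simply reversing the order would try to migrate back to a node that is still disabled.

```python
# Hypothetical action shapes; Watcher's real actions and parameters may differ.
def inverse_of(action):
    """Return the action that undoes a single completed action."""
    kind = action["type"]
    if kind == "migrate":
        return {"type": "migrate", "instance": action["instance"],
                "source": action["destination"],
                "destination": action["source"]}
    if kind == "disable_node":
        return {"type": "enable_node", "node": action["node"]}
    raise ValueError("no known inverse for %r" % kind)


def build_rollback_plan(completed_actions):
    """Compute a new plan that undoes the completed actions.

    Node re-enabling is moved to the front so that migrations back to
    the node are not rejected because the node is still disabled.
    """
    inverses = [inverse_of(a) for a in reversed(completed_actions)]
    enables = [a for a in inverses if a["type"] == "enable_node"]
    others = [a for a in inverses if a["type"] != "enable_node"]
    return enables + others


completed = [
    {"type": "disable_node", "node": "compute-1"},
    {"type": "migrate", "instance": "vm-1",
     "source": "compute-1", "destination": "compute-2"},
]
rollback_plan = build_rollback_plan(completed)
```

Because the result is an ordinary list of actions, an operator could in principle drop entries from it before execution, which is the role the skip feature plays in the proposal above.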
dviroelif you want to check where it started, i have a W-1 patch with a rollback scenario here:12:50
dviroel#link https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/95957012:50
dviroelok, so we can discuss more at PTG I think12:50
rlandyanything more on this bug?12:51
dviroeli think we can move on12:51
rlandy#link https://bugs.launchpad.net/watcher/+bug/2121870 (Zone migration strategy accepts multiple input that are slightly different)12:52
rlandyjgilaber: ^^ yours?12:52
jgilaberyes, just filed this today12:52
jgilabersorry the last one is from today12:52
jgilaberthis one came up during a review of another patch12:52
rlandylast one ... as in https://bugs.launchpad.net/watcher/+bug/2125060 ?12:53
rlandythere are two on the etherpad12:53
jgilaberyes12:53
rlandydo you want to discuss here or triage?12:54
jgilaberabout 2121870, the zone migration strategy only has a very simple schema validation12:54
dviroelyeah, in 2121870 the strategy could validate that src_* is not duplicated, since it doesn't use both when proposing a solution.12:54
dviroeli.e. it doesn't use both dest_* as possible destinations, only the first one12:55
jgilaberexactly, I can't think of any valid use case for such an input12:55
sean-k-mooneyso for https://bugs.launchpad.net/watcher/+bug/212187012:55
jgilaberbut it could happen by mistake, e.g a type in the second line12:55
sean-k-mooneythat should result in a 400 invalid request12:55
sean-k-mooneyto me that is an invalid request since we do not support a list of destination pools for the same source pool12:56
jgilabers/type/typo12:56
sean-k-mooneyso https://bugs.launchpad.net/watcher/+bug/2121870 is valid and medium to high in my view12:56
jgilaber+1 to that12:56
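A duplicate-source check of the kind discussed for bug 2121870 might look like the sketch below. The `src_pool` and `dst_pool` keys, the `InvalidParameter` exception and `check_unique_sources` are assumptions made for illustration, not the strategy's actual code; the point is only that the same source listed twice is rejected up front, which an API layer could then report as a 400.

```python
from collections import Counter


class InvalidParameter(ValueError):
    """Hypothetical error that an API layer could map to HTTP 400."""


def check_unique_sources(entries, src_key):
    """Raise InvalidParameter if a source appears in more than one entry."""
    counts = Counter(entry[src_key] for entry in entries)
    duplicates = sorted(src for src, n in counts.items() if n > 1)
    if duplicates:
        raise InvalidParameter(
            "%s listed more than once: %s" % (src_key, ", ".join(duplicates)))


storage_pools = [
    {"src_pool": "pool-a", "dst_pool": "pool-b"},
    {"src_pool": "pool-a", "dst_pool": "pool-c"},  # likely a typo in the input
]
```

With this input, `check_unique_sources(storage_pools, "src_pool")` raises instead of silently using only the first destination.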
sean-k-mooneyregarding https://bugs.launchpad.net/watcher/+bug/212506012:57
dviroelagree12:57
sean-k-mooneyi think https://bugs.launchpad.net/watcher/+bug/2125060 is also valid but medium12:57
sean-k-mooneythis is a similar type of issue we hit for nova notifications too right12:57
jgilaberactually I think that one could be preventing building the model correctly12:57
sean-k-mooneyif the model does not yet have an instance and we get a notification that would update it then it had a bug like this in the past12:58
sean-k-mooneyoh actually this is slightly different12:58
jgilaberyes, this is not the same12:58
sean-k-mooneyin this case it's because the total_volumes is not present12:58
dviroelso in the end it doesn't update the model?12:58
jgilaberyes, and possibly other fields as well12:58
sean-k-mooneyso when a backend/pool does not provide total volumes we need to calculate that differently or be more graceful in general12:58
jgilaberI'm working on a patch to treat those fields as optional12:59
jgilaberIIUC this bug prevents the notification being fully processed12:59
sean-k-mooneythat might be a short term mitigation12:59
jgilabersome of the optional fields like total_volumes I think can be removed13:00
sean-k-mooneyi think long term at least for total_volumes we need to heal the value if not present13:00
jgilaberI did a quick grep and I did not see any usage other than the model13:00
sean-k-mooneyeither async or directly when processing it by listing the volumes for a backend/pool via cinder's api13:00
jgilaberothers related to capacity are used in the storage balance strategy13:00
sean-k-mooneyack if we have no real usage of it today then yes removing is also fine13:01
dviroelwe need to review all fields from storage model at some point13:01
sean-k-mooney+113:01
jgilabermy plane was to have a short term fix and a longer term review and cleaning of fields13:01
jgilaber*plan13:01
dviroelset to high in this case?13:03
jgilaberI think so 13:03
sean-k-mooneyso the criteria we should be using is effectively https://docs.openstack.org/project-team-guide/bugs.html#importance13:04
sean-k-mooneythis is a "Failure of a significant feature, no workaround" as it breaks the ability to update the storage model via notifications13:04
dviroel+113:04
jgilabermaybe, although there are certain conditions to be met for the problem to appear13:04
jgilabere.g using a storage backend different than lvms13:05
sean-k-mooneywell the main one is using a storage backend that does not provide the expected fields13:05
sean-k-mooneyright like ceph right?13:05
jgilaberyes, I've seen it with ceph and nfs13:05
sean-k-mooneyright so ceph is by far the most common storage backend for openstack13:05
sean-k-mooneylike 50-60% of all deployments13:05
sean-k-mooneywell that have cinder at least13:06
sean-k-mooneyso breakages of ceph are more impactful13:06
jgilaberfair point, but what about the note 13:06
jgilaber"Note that presence of Critical bugs will delay the release."13:06
rlandywe're over time folks ... so we should call it. Please continue if needed on channel13:07
jgilaberdoes that mean that if we set to critical it will force us to include a bug fix in a new rc for flamingo?13:07
amoralejit really breaks adding the volume to the model? i've seen similar errors which didn't impede the volume to be added13:07
jgilaberIt adds the volume but not under the pool13:08
dviroelouch13:08
jgilaberI have not verified the full impact yet but it could13:08
sean-k-mooneythat's because pools are optional13:09
sean-k-mooneyand ceph does not use them by default13:09
sean-k-mooneyi think13:09
sean-k-mooneyso we should not really assume there are bools13:09
sean-k-mooney*pools13:09
dviroelwe can revisit this bug next meeting too13:10
amoralejit's a bit weird that it calls update_pool or create_pool when adding a new volume13:10
sean-k-mooney+113:10
jgilaberI've seen it with nfs as well13:10
jgilaberbut I suspect (need to verify) that it might fail the audit13:10
dviroelrlandy: there is only my nick in the volunteers to chair :) 13:11
sean-k-mooneyamoralej: well that may be because they assumed that volumes always live in a pool and want to keep the two in sync13:11
dviroeli will add this bug to next meeting agenda 13:11
amoralejcould be13:11
rlandydviroel: ack - thank you - next meeting is yours13:11
rlandyI'll close this one and you can take the bug forward13:11
rlandythanks all13:12
rlandy#endmeeting13:12
opendevmeetMeeting ended Thu Sep 18 13:12:13 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)13:12
opendevmeetMinutes:        https://meetings.opendev.org/meetings/watcher_irc_weekly_meeting___september_18__2025/2025/watcher_irc_weekly_meeting___september_18__2025.2025-09-18-12.01.html13:12
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/watcher_irc_weekly_meeting___september_18__2025/2025/watcher_irc_weekly_meeting___september_18__2025.2025-09-18-12.01.txt13:12
opendevmeetLog:            https://meetings.opendev.org/meetings/watcher_irc_weekly_meeting___september_18__2025/2025/watcher_irc_weekly_meeting___september_18__2025.2025-09-18-12.01.log.html13:12
dviroeltks13:12
opendevreviewDouglas Viroel proposed openstack/watcher stable/2025.2: Add unit tests for instance and volume not found in model  https://review.opendev.org/c/openstack/watcher/+/96163213:27
opendevreviewDouglas Viroel proposed openstack/watcher stable/2025.2: Fix zone migration instance not found issue  https://review.opendev.org/c/openstack/watcher/+/96163513:27
jgilaberI think I was wrong on my assumption, after taking a better look on the failures on nfs I don't think https://bugs.launchpad.net/watcher/+bug/2125060 is making the audit fail13:39
jgilaberthere are two tests failing, one is hitting https://bugs.launchpad.net/watcher/+bug/2088118 and the other looks like some misconfiguration of the volume types 13:41
jgilaberor some other bug in the cinder helper13:41
opendevreviewJoan Gilabert proposed openstack/watcher master: [WIP] Handle optional pool fields in Cinder notification  https://review.opendev.org/c/openstack/watcher/+/96166714:04
opendevreviewJoan Gilabert proposed openstack/watcher master: [WIP] Handle optional pool fields in Cinder notification  https://review.opendev.org/c/openstack/watcher/+/96166714:06
opendevreviewJoan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume migrate with zone migration  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/95864416:00
