Tuesday, 2025-05-13

opendevreviewMerged openstack/watcher master: Set number of decimal digits in efficacy indicator  https://review.opendev.org/c/openstack/watcher/+/94519900:06
opendevreviewchandan kumar proposed openstack/watcher master: Drop nova command reference from the code  https://review.opendev.org/c/openstack/watcher/+/94782803:26
opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: [wip] Use self.get_host_for_server  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/94955703:47
opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: [wip] Use self.get_host_for_server  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/94955706:08
opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: Use get_host_for_server to get server host  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/94955709:14
amoralej@dviroel, I've created https://bugs.launchpad.net/watcher/+bug/2110538 , iiuc this is one of the problems related to watcher wrongly managing parameters validation, right? 10:14
opendevreviewJoan Gilabert proposed openstack/watcher master: Check logs in some cinder and nova helper tests  https://review.opendev.org/c/openstack/watcher/+/94918710:32
opendevreviewJoan Gilabert proposed openstack/watcher master: Fix incorrect logging format  https://review.opendev.org/c/openstack/watcher/+/82255910:32
opendevreviewJoan Gilabert proposed openstack/watcher master: Check logs in some cinder and nova helper tests  https://review.opendev.org/c/openstack/watcher/+/94918711:45
opendevreviewJoan Gilabert proposed openstack/watcher master: Fix incorrect logging format  https://review.opendev.org/c/openstack/watcher/+/82255911:45
dviroelamoralej: correct, api responding 500 \11:45
opendevreviewMerged openstack/watcher master: Drop nova command reference from the code  https://review.opendev.org/c/openstack/watcher/+/94782812:39
sean-k-mooneyCANCELLED is or was in the state diagram but i belive that is diffent then SKIPPED12:53
sean-k-mooneyCANCELLED to me woudl only make sense if we approved applying the action plan and then canceled it before it ran12:54
sean-k-mooneywhere as SKIPPED woudl be doen either automaticlly beased on preconditions or proactivly before the action plan is approvved12:55
sean-k-mooneya SKIPPED in my mind woudl either result in teh action never enterign pending or perhaps ebing created in the case where we mark an action as skipped in the action plan before we approve. or it is the state after pedning instead fo ONGOING where we skip due to a failed precondtion12:57
sean-k-mooneyfor exampel when the applier dequeues a pending live migraiton action before it got to ongoing if the vm has been deleted it woudl transition to skipped12:58
sean-k-mooneywhere as the task woudl transtion to cnacheld if teh admin canceld the action plan while it was running12:59
sean-k-mooneyi.e. if they wanted to migrate 100 instnaces and ran out of time in a mantaince window and cancled the remaing operations12:59
sean-k-mooneyamoralej: so i would not reuse CANCELED for SKIPPED ^13:00
sean-k-mooneyjgilaber: ^13:00
amoralejreading back13:00
sean-k-mooneyi think in either case we shoudl capture this question in the spec 13:00
sean-k-mooneyi would lean toward listing using the canasled state as an alternitive in the spec but if we decied to pivort and use canceled we can list SKIPPED as the alternitive13:01
amoralejsean-k-mooney, what lead me to reuse CANCELLED was the wording in https://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/api/controllers/v1/action.py#L49-L5113:03
sean-k-mooneythe wrodign there specificly say cancedl by the administror13:03
sean-k-mooneywhich mean allowing a task to enter that state because of a preconsition is exiplcity out of scope13:04
amoralejthat's true13:04
sean-k-mooneyand when we discuss skiped it was not a state you woudl reach form ONGOING13:04
sean-k-mooneyi.e. you could not skip an inporgress action13:04
amoralejcorrect, in this case transition from ONGOING to CANCELLED is used when the applier dies while an action is ongoing and it's started13:05
sean-k-mooneyso we could use canceld if we belive the UX is better but im not sold that it is13:05
amoralejthen the action is set to CANCELLED13:05
sean-k-mooneyhum13:05
amoralejso, iiuc, you are right with the two current use cases of CANCELLED, right?13:05
sean-k-mooneyim ok with the exiting usage of canceled13:05
sean-k-mooneybut i think we shoudl also have skipped13:06
sean-k-mooneyfor cases where we never ty to apply the action 13:06
sean-k-mooneyi guess in teh case where the appier dies it only goes to cancheld once you restart it right13:06
amoralejbut one of the current cases implies also the action was never applied, which is when the entire action plan is cancelled from PENDING13:07
sean-k-mooneyright13:07
amoralejin that case is also moving to CANCELLED13:07
sean-k-mooneyand to me that is not the same as SKIPPED13:07
sean-k-mooneyso if we mark an action as skipped in teh action_plan13:07
sean-k-mooneyi do not expect that action to be created for the applier to apply13:08
sean-k-mooneyi would expect the decsion engin to omit it form the set of actions13:08
amoralejuh, that's different to what i was thinking ...13:08
sean-k-mooneyand an action woudl only ever end up in SKIPPED if it failed the preconditoin13:08
sean-k-mooneywe coudl do it differntly 13:08
sean-k-mooneyi have not had time to read your sepc yet13:09
sean-k-mooneyso im open to other approches. im just stating how i would expect it to work based on our high level converstaion13:09
amoralejtbc, I'm not totally against adding new SKIPPED action, but i think it's good to discuss about options13:10
sean-k-mooney:) that why we create specs. so we can capture the diffent optins and capture why we did tno take the alternitives before then moving on to explain how we will implement the proposed solution :)13:11
amoralejin my proposal, when an AP is created, no actions can be added/deleted, but when we mark it as SKIPPED or CANCELLED (whatever we decide), the applier will simply do nothing and will jump to next one13:11
amoralejotherwise, if we modify the AP graph definition, we need to redefine parent relations and so on, which will make it more complex with small gain, imo13:12
sean-k-mooneythats doable but i think CANCELD is not a good name in the case where an admin does nto want it to be applied in that case as it was never accpated at that point13:12
sean-k-mooneywell i was trying to avoid ever creatign the action objects and havign to tack there statte.13:13
amoralejthat's reasonable, what made me move on with the CANCELLED state was the analogy with the resulting state of the actions when an entire AP is cancelled13:14
amoralejin that case, the action was also never tried13:14
sean-k-mooneyright and i dont really like conflating the two.13:14
sean-k-mooneybecause the action could have been in the ongoing state 13:14
amoralejthat's right13:14
amoralejAPs can be cancelled while ongoing13:15
sean-k-mooneyif cancled only applied to PENDING 13:15
sean-k-mooneythen it woudl be equivclent ot SKIPPED 13:15
sean-k-mooneyso there are some operation that cant be aborted once started13:15
sean-k-mooneyso you cant treat cancelign in ongoing and pending the same in general13:16
amoralejyeah, i don't want to allow to skip actions while ongoing, only while pending13:16
amoralejack, that makes sense13:16
sean-k-mooneyso if we cancel an ongoing live migration we would need to abort the migration (well or force compelte it)13:16
amoralejI will wait for feedback in the review so that others can also add their feedback and i will rewrite13:17
amoralejyeah, +1 to forbid to cancel an ongoing action13:17
sean-k-mooneywell its allowed todya right13:17
sean-k-mooneyits more that cancelign an ongoing action required roleback logic13:18
sean-k-mooneythere was a proposal to add that in the past but never completed13:18
sean-k-mooneycanel basically cant be a noop in many cases13:18
amoralejyes, but i mean from API point of view13:18
sean-k-mooneywe need to ahve a well defiend cancel action that the applyer will invoke for any cancelation13:18
amoralejmoved to SKIPPED (or CANCELLED in the new case) whould be only allowed from PENDING13:19
sean-k-mooneyin the api you can cancel an action plan today https://docs.openstack.org/api-ref/resource-optimization/#cancel-action-plan13:19
amoralejyes13:19
amoralejyeah, but for the new use case, i mean13:20
amoralejactually, to cancel an AP, I'm not sure how it manage the ongoing actions ...13:21
sean-k-mooneythe new case being canceling sepcific action within the plan13:21
amoralejyes13:21
sean-k-mooneyfor live migration at least and some other actions there si an abort method https://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/applier/actions/migration.py#L19113:23
amoralejfor the case where an AP is cancelled while ongoing, I'd say only the pending actions are cancelled, the ongoing ones are finished13:24
amoralejhttps://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/applier/action_plan/default.py#L4613:24
amoraleji'm not totally sure, tbh13:24
amoralejthat's where the state of the parent AP state is checked before starting an action13:24
jgilaberreading back, I agree that if we use CANCELLED only there would be some ambiguity, we discussed briefly yesterday with amoralej, so if using CANCELLED we should have some 'reason' field to distinguish the different cases 13:27
jgilaberI need to look at the state diagram again, I'm not too familiar with it13:28
sean-k-mooneyi have not looked at this in detail but i would expect ther eto be an rpc call form the api to the applier or decision engin to cancel the ongoign execution13:28
amoralejanyway, for the new case for Actions, I'd go with PENDING->SKIPPED as the only supported transition to SKIPPED for administrator override13:28
amoralejwe can comment in the spec13:28
sean-k-mooneyjgilaber: ack a reason may be nice in anycase but i aggeree we would need ot treat canceled by api request and precongtion failure differntly13:29
sean-k-mooneyamoralej: so i think canceld takes effect in two places https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L16313:31
sean-k-mooneyfirst in pre_execute13:31
sean-k-mooneythats the pending->cancelded transtion13:31
sean-k-mooneyand then during the execution https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L23813:32
sean-k-mooneyhttps://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L203-L23813:33
sean-k-mooneyso the applier spawns the action into a backgroup greenthread13:33
amoralejright!13:34
amoralejthanks for the pointer13:34
sean-k-mooneyand then it polls it once a second to see if it finished or if they action plan is cnacheld13:34
amoralejso that's the CANCEL an ongoing ACTION in a cancelled AP13:34
amoralejand you were right, it's calling the abort there13:35
amoralejmmm, sorry, only checks if the action can be aborted13:36
amoralejtaskflow would take care of launching the abort13:37
sean-k-mooneyya it checks im not sure where abort is beign calle dbut it might be done by taskflow13:37
sean-k-mooneyor perhaps its done here https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L258-L26613:38
amoralejaccordinbg to comment https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L233-L23513:38
amoralej"catch the greenlet exit exception due to thread kill, taskflow will call revert for the action, we will redirect it to abort.13:38
amoralejbut it's just a comment ...13:38
sean-k-mooneyright so if you look at the line i lined to revert belwo13:38
sean-k-mooneyit has handleign for the cancedl state13:39
sean-k-mooneyhttps://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L271-L27813:39
amoralejyeah, that's the code13:39
sean-k-mooneyso for live migation it will properly abort13:39
sean-k-mooneyfor cold migration... 13:40
amoralejso, you think it's worthy to support SKIPPING actions which are ongoing for the new feature?13:40
sean-k-mooneyit might fix things but we should look into that later13:40
sean-k-mooneyno13:40
amoralej+1 :)13:40
sean-k-mooneyi dont think skipped shoudl ever be a transtion form ongoing13:40
sean-k-mooneyonly form pending or we could set actions directly to Skipped if you mark them as such in the action plan13:41
sean-k-mooneyi.e. skip pending and go directly to skipped instead of updating the graph to remove them13:41
amoralejyes13:42
amoralejthat was my idea13:42
sean-k-mooneyi htink that could be fine we 13:42
amoralejin the pre-condition failure case, will go from ONGOING to SKIPPED, however13:42
sean-k-mooneyno13:42
sean-k-mooneywe would go form pending ot skiped13:42
sean-k-mooneypreconditions should be checked in pre_execute13:43
sean-k-mooneyi think13:43
amoralejyes, my doubt is if state transition to ONGOING is before of after running pre_execute13:43
sean-k-mooneyi guyess we can look at the detail fo this and see if the ONGOING->SKIPPED is required13:43
sean-k-mooneyi woudl only consier that valid if we do that tanstion before actully trying to assk the other service to do somehting13:44
sean-k-mooneywell before it accpet it13:44
sean-k-mooneyif we got a 400 form nova when we call live migrate13:44
sean-k-mooneybecause the instance was gone13:44
sean-k-mooneywe could amrk it as FAILED or SKIPPED13:44
sean-k-mooneybut if we get back a 200 or 20213:45
sean-k-mooneythen at that point i think we coudl only transtion to FAILED not SKIPPED13:45
amoralejhttps://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L140-L14313:45
sean-k-mooneybut that just my inclination i have not fully considered all the failure modes13:45
amoralejfor me a migrate action for a non-existing vm would be a SKIPPED, actually, that's the case i use as example :)13:46
sean-k-mooneyamoralej: is notify saving the object back to the db13:47
sean-k-mooneyah it is13:47
sean-k-mooneythat now hoe notificaton work in nova13:47
sean-k-mooneynotificaiton do not alther the state of the obejct in the db in nova but they do in watcher13:47
amoralejyep13:47
sean-k-mooneyi was expecting the state to be saved after that function retured13:48
amoralejI'm learning a lot about watcher internals last days :)13:48
sean-k-mooneywe may want to reorder the notifiction13:48
sean-k-mooneyto be sent after we call the actions pre_execute13:48
sean-k-mooneyif it returns ture and set it to skipped if it returns false13:48
amoralejanyway, iiuc the transition of the action from pre_condition would go by directly calling notification or setting state, not through the api call, right?13:49
sean-k-mooneycorrect13:49
amoraleji mean, we can restrict api call to allow only from PENDING->SKIPPED13:49
sean-k-mooneyservice never call there own api13:49
amoralejwhile the pre-condition failure may do from ONGOING to SKIPPED13:50
sean-k-mooneywe can just reorder https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L140-L14313:50
amoraleji mean, reordering is also a valid discussion, but different one13:50
amoralejmy doubt is in that case is not confusing that the applier started running action things (pre_condition) while in PENDING13:51
amoralejfor me, that'd be missleading too13:51
amoralejunless we create another state only for that, what may be overcomplicating things13:51
sean-k-mooneyhttps://paste.opendev.org/show/b6V0ESLaohDBLYL98ScH/13:52
sean-k-mooneysomething like this13:52
sean-k-mooneyhave pre_condition() return the next state i.e. skipped or Ongoing13:52
amoralejor FAILED if returns a different exception13:53
sean-k-mooneyright13:53
sean-k-mooneyso to me the preconstions are gardign transtioning from pending to one of the next valid states so they happen before you endter ONGOING, SKIPPED or FAILED13:54
sean-k-mooneythat not quite the current logic13:54
sean-k-mooneybut i think its a valid evolution of the current logic13:54
amoralejthat may be part of this same spec? or a different one13:55
sean-k-mooneysame spec13:55
sean-k-mooneyi think this is just an implemetion detial or how we add skipped13:55
amoralejack13:55
amoralejno problem for me13:55
sean-k-mooneyamoralej: are you ok with capturing a sumemry of this conversation in the spec perhaps with a refence to the irc logs in an comment and then we can decdied how to proceed. if not i can try and do the same wehn i get tiem to review later this week13:57
amoralejyes, I will do13:57
amoralejor i will ask some IA to summarize it :D13:57
sean-k-mooneymy guess is i wont get time to do a proper review until thrudate or friday but ill try to do it before the ned of the week13:58
amoralejthat'd be good13:58
sean-k-mooneyim happy to chat about things in real time here before then but im goign to step away for a minute and go grab coffee13:58
amoralejsure, thanks for your feedback13:58
amoralejjgilaber, anyway, although we move to the new SKIPPED state, i will maintain the new field skip_reason in the spec14:18
jgilaber+1, I think that will be useful, thanks14:18
amoralejyep, i agree14:19
jgilaberdoes anyone know the right way to enable logging in some tests? I managed to reproduce https://bugs.launchpad.net/watcher/+bug/2110149 in unit tests but it feels really hacky CI in https://review.opendev.org/c/openstack/watcher/+/949187 shows the error14:31
opendevreviewJoan Gilabert proposed openstack/watcher master: Migrate value column of efficacy indicator on load  https://review.opendev.org/c/openstack/watcher/+/94964014:35
opendevreviewDouglas Viroel proposed openstack/watcher master: Move eventlet command scripts to a different dir  https://review.opendev.org/c/openstack/watcher/+/94964114:36
opendevreviewJoan Gilabert proposed openstack/watcher master: Migrate value column of efficacy indicator on load  https://review.opendev.org/c/openstack/watcher/+/94964014:38
amoralejjgilaber, i'm not entirely sure about your question, but you mean something like https://github.com/openstack/watcher/blob/master/watcher/tests/api/test_hooks.py#L179-L182 ?16:12
amoralejiiuc that would enable debug for that test16:12
jgilaberamoralej: basically I need to do 'LOG.logger.disabled = False' in the tests to get them to actually run the LOG.debug call in code to reproduce the bug16:17
jgilaberI tried setting 'cfg.CONF.set_override('debug', True)' but that still does not execute the debug log16:18
amoralejno idea, then16:22
opendevreviewDavid proposed openstack/watcher-tempest-plugin master: Refactor of wait_for_instances_in_model function to wait until all instances are in the correct compute node on the model  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/94965717:15
opendevreviewDavid proposed openstack/watcher-tempest-plugin master: Refactor of wait_for_instances_in_model function to wait until all instances are in the correct compute node on the model  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/94965717:18
opendevreviewAlfredo Moralejo proposed openstack/watcher-specs master: Add spec for skip actions from action plan executions  https://review.opendev.org/c/openstack/watcher-specs/+/94952817:18
amoralejsean-k-mooney ^ 17:19
sean-k-mooneyjgilaber: i dont have time to check why you need to do that today but i can take a look tomorow if you remind me17:20
opendevreviewDouglas Viroel proposed openstack/watcher master: Remove deprecated executor in message handling servers  https://review.opendev.org/c/openstack/watcher/+/94965817:28
jgilaberack thanks sean-k-mooney I'll do that19:08

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!