opendevreview | Merged openstack/watcher master: Set number of decimal digits in efficacy indicator https://review.opendev.org/c/openstack/watcher/+/945199 | 00:06 |
---|---|---|
opendevreview | chandan kumar proposed openstack/watcher master: Drop nova command reference from the code https://review.opendev.org/c/openstack/watcher/+/947828 | 03:26 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: [wip] Use self.get_host_for_server https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949557 | 03:47 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: [wip] Use self.get_host_for_server https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949557 | 06:08 |
opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Use get_host_for_server to get server host https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949557 | 09:14 |
amoralej | @dviroel, I've created https://bugs.launchpad.net/watcher/+bug/2110538 , iiuc this is one of the problems related to watcher wrongly managing parameters validation, right? | 10:14 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Check logs in some cinder and nova helper tests https://review.opendev.org/c/openstack/watcher/+/949187 | 10:32 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Fix incorrect logging format https://review.opendev.org/c/openstack/watcher/+/822559 | 10:32 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Check logs in some cinder and nova helper tests https://review.opendev.org/c/openstack/watcher/+/949187 | 11:45 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Fix incorrect logging format https://review.opendev.org/c/openstack/watcher/+/822559 | 11:45 |
dviroel | amoralej: correct, api responding 500 \ | 11:45 |
opendevreview | Merged openstack/watcher master: Drop nova command reference from the code https://review.opendev.org/c/openstack/watcher/+/947828 | 12:39 |
sean-k-mooney | CANCELLED is or was in the state diagram but i belive that is diffent then SKIPPED | 12:53 |
sean-k-mooney | CANCELLED to me woudl only make sense if we approved applying the action plan and then canceled it before it ran | 12:54 |
sean-k-mooney | where as SKIPPED woudl be doen either automaticlly beased on preconditions or proactivly before the action plan is approvved | 12:55 |
sean-k-mooney | a SKIPPED in my mind woudl either result in teh action never enterign pending or perhaps ebing created in the case where we mark an action as skipped in the action plan before we approve. or it is the state after pedning instead fo ONGOING where we skip due to a failed precondtion | 12:57 |
sean-k-mooney | for exampel when the applier dequeues a pending live migraiton action before it got to ongoing if the vm has been deleted it woudl transition to skipped | 12:58 |
sean-k-mooney | where as the task woudl transtion to cnacheld if teh admin canceld the action plan while it was running | 12:59 |
sean-k-mooney | i.e. if they wanted to migrate 100 instnaces and ran out of time in a mantaince window and cancled the remaing operations | 12:59 |
sean-k-mooney | amoralej: so i would not reuse CANCELED for SKIPPED ^ | 13:00 |
sean-k-mooney | jgilaber: ^ | 13:00 |
amoralej | reading back | 13:00 |
sean-k-mooney | i think in either case we shoudl capture this question in the spec | 13:00 |
sean-k-mooney | i would lean toward listing using the canasled state as an alternitive in the spec but if we decied to pivort and use canceled we can list SKIPPED as the alternitive | 13:01 |
amoralej | sean-k-mooney, what lead me to reuse CANCELLED was the wording in https://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/api/controllers/v1/action.py#L49-L51 | 13:03 |
sean-k-mooney | the wrodign there specificly say cancedl by the administror | 13:03 |
sean-k-mooney | which mean allowing a task to enter that state because of a preconsition is exiplcity out of scope | 13:04 |
amoralej | that's true | 13:04 |
sean-k-mooney | and when we discuss skiped it was not a state you woudl reach form ONGOING | 13:04 |
sean-k-mooney | i.e. you could not skip an inporgress action | 13:04 |
amoralej | correct, in this case transition from ONGOING to CANCELLED is used when the applier dies while an action is ongoing and it's started | 13:05 |
sean-k-mooney | so we could use canceld if we belive the UX is better but im not sold that it is | 13:05 |
amoralej | then the action is set to CANCELLED | 13:05 |
sean-k-mooney | hum | 13:05 |
amoralej | so, iiuc, you are right with the two current use cases of CANCELLED, right? | 13:05 |
sean-k-mooney | im ok with the exiting usage of canceled | 13:05 |
sean-k-mooney | but i think we shoudl also have skipped | 13:06 |
sean-k-mooney | for cases where we never ty to apply the action | 13:06 |
sean-k-mooney | i guess in teh case where the appier dies it only goes to cancheld once you restart it right | 13:06 |
amoralej | but one of the current cases implies also the action was never applied, which is when the entire action plan is cancelled from PENDING | 13:07 |
sean-k-mooney | right | 13:07 |
amoralej | in that case is also moving to CANCELLED | 13:07 |
sean-k-mooney | and to me that is not the same as SKIPPED | 13:07 |
sean-k-mooney | so if we mark an action as skipped in teh action_plan | 13:07 |
sean-k-mooney | i do not expect that action to be created for the applier to apply | 13:08 |
sean-k-mooney | i would expect the decsion engin to omit it form the set of actions | 13:08 |
amoralej | uh, that's different to what i was thinking ... | 13:08 |
sean-k-mooney | and an action woudl only ever end up in SKIPPED if it failed the preconditoin | 13:08 |
sean-k-mooney | we coudl do it differntly | 13:08 |
sean-k-mooney | i have not had time to read your sepc yet | 13:09 |
sean-k-mooney | so im open to other approches. im just stating how i would expect it to work based on our high level converstaion | 13:09 |
amoralej | tbc, I'm not totally against adding new SKIPPED action, but i think it's good to discuss about options | 13:10 |
sean-k-mooney | :) that why we create specs. so we can capture the diffent optins and capture why we did tno take the alternitives before then moving on to explain how we will implement the proposed solution :) | 13:11 |
amoralej | in my proposal, when an AP is created, no actions can be added/deleted, but when we mark it as SKIPPED or CANCELLED (whatever we decide), the applier will simply do nothing and will jump to next one | 13:11 |
amoralej | otherwise, if we modify the AP graph definition, we need to redefine parent relations and so on, which will make it more complex with small gain, imo | 13:12 |
sean-k-mooney | thats doable but i think CANCELD is not a good name in the case where an admin does nto want it to be applied in that case as it was never accpated at that point | 13:12 |
sean-k-mooney | well i was trying to avoid ever creatign the action objects and havign to tack there statte. | 13:13 |
amoralej | that's reasonable, what made me move on with the CANCELLED state was the analogy with the resulting state of the actions when an entire AP is cancelled | 13:14 |
amoralej | in that case, the action was also never tried | 13:14 |
sean-k-mooney | right and i dont really like conflating the two. | 13:14 |
sean-k-mooney | because the action could have been in the ongoing state | 13:14 |
amoralej | that's right | 13:14 |
amoralej | APs can be cancelled while ongoing | 13:15 |
sean-k-mooney | if cancled only applied to PENDING | 13:15 |
sean-k-mooney | then it woudl be equivclent ot SKIPPED | 13:15 |
sean-k-mooney | so there are some operation that cant be aborted once started | 13:15 |
sean-k-mooney | so you cant treat cancelign in ongoing and pending the same in general | 13:16 |
amoralej | yeah, i don't want to allow to skip actions while ongoing, only while pending | 13:16 |
amoralej | ack, that makes sense | 13:16 |
sean-k-mooney | so if we cancel an ongoing live migration we would need to abort the migration (well or force compelte it) | 13:16 |
amoralej | I will wait for feedback in the review so that others can also add their feedback and i will rewrite | 13:17 |
amoralej | yeah, +1 to forbid to cancel an ongoing action | 13:17 |
sean-k-mooney | well its allowed todya right | 13:17 |
sean-k-mooney | its more that cancelign an ongoing action required roleback logic | 13:18 |
sean-k-mooney | there was a proposal to add that in the past but never completed | 13:18 |
sean-k-mooney | canel basically cant be a noop in many cases | 13:18 |
amoralej | yes, but i mean from API point of view | 13:18 |
sean-k-mooney | we need to ahve a well defiend cancel action that the applyer will invoke for any cancelation | 13:18 |
amoralej | moved to SKIPPED (or CANCELLED in the new case) whould be only allowed from PENDING | 13:19 |
sean-k-mooney | in the api you can cancel an action plan today https://docs.openstack.org/api-ref/resource-optimization/#cancel-action-plan | 13:19 |
amoralej | yes | 13:19 |
amoralej | yeah, but for the new use case, i mean | 13:20 |
amoralej | actually, to cancel an AP, I'm not sure how it manage the ongoing actions ... | 13:21 |
sean-k-mooney | the new case being canceling sepcific action within the plan | 13:21 |
amoralej | yes | 13:21 |
sean-k-mooney | for live migration at least and some other actions there si an abort method https://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/applier/actions/migration.py#L191 | 13:23 |
amoralej | for the case where an AP is cancelled while ongoing, I'd say only the pending actions are cancelled, the ongoing ones are finished | 13:24 |
amoralej | https://github.com/openstack/watcher/blob/f38ab70ba46756b2c3ae74b1a2fafdb39ac58cc7/watcher/applier/action_plan/default.py#L46 | 13:24 |
amoralej | i'm not totally sure, tbh | 13:24 |
amoralej | that's where the state of the parent AP state is checked before starting an action | 13:24 |
jgilaber | reading back, I agree that if we use CANCELLED only there would be some ambiguity, we discussed briefly yesterday with amoralej, so if using CANCELLED we should have some 'reason' field to distinguish the different cases | 13:27 |
jgilaber | I need to look at the state diagram again, I'm not too familiar with it | 13:28 |
sean-k-mooney | i have not looked at this in detail but i would expect ther eto be an rpc call form the api to the applier or decision engin to cancel the ongoign execution | 13:28 |
amoralej | anyway, for the new case for Actions, I'd go with PENDING->SKIPPED as the only supported transition to SKIPPED for administrator override | 13:28 |
amoralej | we can comment in the spec | 13:28 |
sean-k-mooney | jgilaber: ack a reason may be nice in anycase but i aggeree we would need ot treat canceled by api request and precongtion failure differntly | 13:29 |
sean-k-mooney | amoralej: so i think canceld takes effect in two places https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L163 | 13:31 |
sean-k-mooney | first in pre_execute | 13:31 |
sean-k-mooney | thats the pending->cancelded transtion | 13:31 |
sean-k-mooney | and then during the execution https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L238 | 13:32 |
sean-k-mooney | https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L203-L238 | 13:33 |
sean-k-mooney | so the applier spawns the action into a backgroup greenthread | 13:33 |
amoralej | right! | 13:34 |
amoralej | thanks for the pointer | 13:34 |
sean-k-mooney | and then it polls it once a second to see if it finished or if they action plan is cnacheld | 13:34 |
amoralej | so that's the CANCEL an ongoing ACTION in a cancelled AP | 13:34 |
amoralej | and you were right, it's calling the abort there | 13:35 |
amoralej | mmm, sorry, only checks if the action can be aborted | 13:36 |
amoralej | taskflow would take care of launching the abort | 13:37 |
sean-k-mooney | ya it checks im not sure where abort is beign calle dbut it might be done by taskflow | 13:37 |
sean-k-mooney | or perhaps its done here https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L258-L266 | 13:38 |
amoralej | accordinbg to comment https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L233-L235 | 13:38 |
amoralej | "catch the greenlet exit exception due to thread kill, taskflow will call revert for the action, we will redirect it to abort. | 13:38 |
amoralej | but it's just a comment ... | 13:38 |
sean-k-mooney | right so if you look at the line i lined to revert belwo | 13:38 |
sean-k-mooney | it has handleign for the cancedl state | 13:39 |
sean-k-mooney | https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/base.py#L271-L278 | 13:39 |
amoralej | yeah, that's the code | 13:39 |
sean-k-mooney | so for live migation it will properly abort | 13:39 |
sean-k-mooney | for cold migration... | 13:40 |
amoralej | so, you think it's worthy to support SKIPPING actions which are ongoing for the new feature? | 13:40 |
sean-k-mooney | it might fix things but we should look into that later | 13:40 |
sean-k-mooney | no | 13:40 |
amoralej | +1 :) | 13:40 |
sean-k-mooney | i dont think skipped shoudl ever be a transtion form ongoing | 13:40 |
sean-k-mooney | only form pending or we could set actions directly to Skipped if you mark them as such in the action plan | 13:41 |
sean-k-mooney | i.e. skip pending and go directly to skipped instead of updating the graph to remove them | 13:41 |
amoralej | yes | 13:42 |
amoralej | that was my idea | 13:42 |
sean-k-mooney | i htink that could be fine we | 13:42 |
amoralej | in the pre-condition failure case, will go from ONGOING to SKIPPED, however | 13:42 |
sean-k-mooney | no | 13:42 |
sean-k-mooney | we would go form pending ot skiped | 13:42 |
sean-k-mooney | preconditions should be checked in pre_execute | 13:43 |
sean-k-mooney | i think | 13:43 |
amoralej | yes, my doubt is if state transition to ONGOING is before of after running pre_execute | 13:43 |
sean-k-mooney | i guyess we can look at the detail fo this and see if the ONGOING->SKIPPED is required | 13:43 |
sean-k-mooney | i woudl only consier that valid if we do that tanstion before actully trying to assk the other service to do somehting | 13:44 |
sean-k-mooney | well before it accpet it | 13:44 |
sean-k-mooney | if we got a 400 form nova when we call live migrate | 13:44 |
sean-k-mooney | because the instance was gone | 13:44 |
sean-k-mooney | we could amrk it as FAILED or SKIPPED | 13:44 |
sean-k-mooney | but if we get back a 200 or 202 | 13:45 |
sean-k-mooney | then at that point i think we coudl only transtion to FAILED not SKIPPED | 13:45 |
amoralej | https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L140-L143 | 13:45 |
sean-k-mooney | but that just my inclination i have not fully considered all the failure modes | 13:45 |
amoralej | for me a migrate action for a non-existing vm would be a SKIPPED, actually, that's the case i use as example :) | 13:46 |
sean-k-mooney | amoralej: is notify saving the object back to the db | 13:47 |
sean-k-mooney | ah it is | 13:47 |
sean-k-mooney | that now hoe notificaton work in nova | 13:47 |
sean-k-mooney | notificaiton do not alther the state of the obejct in the db in nova but they do in watcher | 13:47 |
amoralej | yep | 13:47 |
sean-k-mooney | i was expecting the state to be saved after that function retured | 13:48 |
amoralej | I'm learning a lot about watcher internals last days :) | 13:48 |
sean-k-mooney | we may want to reorder the notifiction | 13:48 |
sean-k-mooney | to be sent after we call the actions pre_execute | 13:48 |
sean-k-mooney | if it returns ture and set it to skipped if it returns false | 13:48 |
amoralej | anyway, iiuc the transition of the action from pre_condition would go by directly calling notification or setting state, not through the api call, right? | 13:49 |
sean-k-mooney | correct | 13:49 |
amoralej | i mean, we can restrict api call to allow only from PENDING->SKIPPED | 13:49 |
sean-k-mooney | service never call there own api | 13:49 |
amoralej | while the pre-condition failure may do from ONGOING to SKIPPED | 13:50 |
sean-k-mooney | we can just reorder https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L140-L143 | 13:50 |
amoralej | i mean, reordering is also a valid discussion, but different one | 13:50 |
amoralej | my doubt is in that case is not confusing that the applier started running action things (pre_condition) while in PENDING | 13:51 |
amoralej | for me, that'd be missleading too | 13:51 |
amoralej | unless we create another state only for that, what may be overcomplicating things | 13:51 |
sean-k-mooney | https://paste.opendev.org/show/b6V0ESLaohDBLYL98ScH/ | 13:52 |
sean-k-mooney | something like this | 13:52 |
sean-k-mooney | have pre_condition() return the next state i.e. skipped or Ongoing | 13:52 |
amoralej | or FAILED if returns a different exception | 13:53 |
sean-k-mooney | right | 13:53 |
sean-k-mooney | so to me the preconstions are gardign transtioning from pending to one of the next valid states so they happen before you endter ONGOING, SKIPPED or FAILED | 13:54 |
sean-k-mooney | that not quite the current logic | 13:54 |
sean-k-mooney | but i think its a valid evolution of the current logic | 13:54 |
amoralej | that may be part of this same spec? or a different one | 13:55 |
sean-k-mooney | same spec | 13:55 |
sean-k-mooney | i think this is just an implemetion detial or how we add skipped | 13:55 |
amoralej | ack | 13:55 |
amoralej | no problem for me | 13:55 |
sean-k-mooney | amoralej: are you ok with capturing a sumemry of this conversation in the spec perhaps with a refence to the irc logs in an comment and then we can decdied how to proceed. if not i can try and do the same wehn i get tiem to review later this week | 13:57 |
amoralej | yes, I will do | 13:57 |
amoralej | or i will ask some IA to summarize it :D | 13:57 |
sean-k-mooney | my guess is i wont get time to do a proper review until thrudate or friday but ill try to do it before the ned of the week | 13:58 |
amoralej | that'd be good | 13:58 |
sean-k-mooney | im happy to chat about things in real time here before then but im goign to step away for a minute and go grab coffee | 13:58 |
amoralej | sure, thanks for your feedback | 13:58 |
amoralej | jgilaber, anyway, although we move to the new SKIPPED state, i will maintain the new field skip_reason in the spec | 14:18 |
jgilaber | +1, I think that will be useful, thanks | 14:18 |
amoralej | yep, i agree | 14:19 |
jgilaber | does anyone know the right way to enable logging in some tests? I managed to reproduce https://bugs.launchpad.net/watcher/+bug/2110149 in unit tests but it feels really hacky CI in https://review.opendev.org/c/openstack/watcher/+/949187 shows the error | 14:31 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Migrate value column of efficacy indicator on load https://review.opendev.org/c/openstack/watcher/+/949640 | 14:35 |
opendevreview | Douglas Viroel proposed openstack/watcher master: Move eventlet command scripts to a different dir https://review.opendev.org/c/openstack/watcher/+/949641 | 14:36 |
opendevreview | Joan Gilabert proposed openstack/watcher master: Migrate value column of efficacy indicator on load https://review.opendev.org/c/openstack/watcher/+/949640 | 14:38 |
amoralej | jgilaber, i'm not entirely sure about your question, but you mean something like https://github.com/openstack/watcher/blob/master/watcher/tests/api/test_hooks.py#L179-L182 ? | 16:12 |
amoralej | iiuc that would enable debug for that test | 16:12 |
jgilaber | amoralej: basically I need to do 'LOG.logger.disabled = False' in the tests to get them to actually run the LOG.debug call in code to reproduce the bug | 16:17 |
jgilaber | I tried setting 'cfg.CONF.set_override('debug', True)' but that still does not execute the debug log | 16:18 |
amoralej | no idea, then | 16:22 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Refactor of wait_for_instances_in_model function to wait until all instances are in the correct compute node on the model https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949657 | 17:15 |
opendevreview | David proposed openstack/watcher-tempest-plugin master: Refactor of wait_for_instances_in_model function to wait until all instances are in the correct compute node on the model https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/949657 | 17:18 |
opendevreview | Alfredo Moralejo proposed openstack/watcher-specs master: Add spec for skip actions from action plan executions https://review.opendev.org/c/openstack/watcher-specs/+/949528 | 17:18 |
amoralej | sean-k-mooney ^ | 17:19 |
sean-k-mooney | jgilaber: i dont have time to check why you need to do that today but i can take a look tomorow if you remind me | 17:20 |
opendevreview | Douglas Viroel proposed openstack/watcher master: Remove deprecated executor in message handling servers https://review.opendev.org/c/openstack/watcher/+/949658 | 17:28 |
jgilaber | ack thanks sean-k-mooney I'll do that | 19:08 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!