*** ruijie has joined #senlin | 00:58 | |
*** yanyanhu has joined #senlin | 01:41 | |
yanyanhu | hi, Qiming, I just got a response from Heidi last weekend; she said they are still waiting for the illustrator to give them the updated version, since they didn't think the first version was good enough. | 01:43 |
---|---|---|
Qiming | ok | 01:44 |
yanyanhu | and she will message me as soon as she gets it | 01:44 |
*** elynn has joined #senlin | 01:44 | |
Qiming | as long as they are still working on it, it is fine | 01:44 |
yanyanhu | yep, looks so | 01:46 |
*** elynn has quit IRC | 01:47 | |
*** elynn has joined #senlin | 01:49 | |
*** XueFeng has joined #senlin | 01:56 | |
*** elynn has joined #senlin | 02:11 | |
openstackgerrit | Merged openstack/python-senlinclient: Support "global_project" arguments for action-list https://review.openstack.org/397805 | 02:11 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Move notifications object down one level https://review.openstack.org/400024 | 02:15 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: Cluster collect display error https://review.openstack.org/400026 | 02:25 |
*** yuanying has quit IRC | 02:50 | |
*** yuanying has joined #senlin | 02:51 | |
openstackgerrit | Qiming Teng proposed openstack/senlin: Registry support for notification classes https://review.openstack.org/400033 | 02:54 |
Qiming | are we suffering from the same problem? https://review.openstack.org/#/c/378572/ | 03:19 |
*** zhurong has joined #senlin | 03:27 | |
*** yuanying has quit IRC | 03:46 | |
*** elynn has quit IRC | 03:48 | |
*** yuanying has joined #senlin | 03:48 | |
*** zhurong has quit IRC | 03:58 | |
*** zhurong has joined #senlin | 03:59 | |
*** gongysh2 has quit IRC | 04:01 | |
*** shu-mutou-AWAY is now known as shu-mutou | 04:13 | |
*** zhurong has quit IRC | 04:26 | |
*** gongysh2 has joined #senlin | 04:27 | |
*** rasmus has joined #senlin | 04:32 | |
*** rasmus has quit IRC | 04:36 | |
*** elynn has joined #senlin | 04:52 | |
yanyanhu | hi, Qiming, I don't think we are. We inherit from service.Service for three services (engine, dispatcher and health manager) and I think all of them are used with threadgroup | 04:55 |
Qiming | okay, good to know | 04:55 |
*** elynn has quit IRC | 04:57 | |
*** elynn has joined #senlin | 04:58 | |
*** zhurong has joined #senlin | 05:11 | |
openstackgerrit | Merged openstack/senlin: Add TODO item about referencing existing pool https://review.openstack.org/398872 | 05:12 |
*** zhurong has quit IRC | 05:31 | |
*** zhurong has joined #senlin | 05:35 | |
openstackgerrit | Merged openstack/senlin: Add request object for event-get https://review.openstack.org/399835 | 05:51 |
openstackgerrit | Merged openstack/senlin: Engine support for profile-validate2 https://review.openstack.org/398863 | 05:53 |
*** elynn has quit IRC | 06:09 | |
*** elynn has joined #senlin | 06:31 | |
openstackgerrit | lvdongbing proposed openstack/senlin: API support for profile-validate2 https://review.openstack.org/400062 | 06:31 |
*** elynn has quit IRC | 06:36 | |
*** elynn has joined #senlin | 06:36 | |
Qiming | yanyanhu, free for a quick discussion? | 06:46 |
yanyanhu | Qiming, sure | 06:46 |
Qiming | working on versioned notification ... | 06:46 |
yanyanhu | ok | 06:46 |
Qiming | I don't think we will have a big problem unifying the logging interface among database, message, file, etc | 06:47 |
Qiming | I'm deferring that work till we get a poc implementation for versioned notification | 06:47 |
Qiming | once the interface is proved to be flexible, generic enough for both database and message, we can work on the generalization step | 06:48 |
yanyanhu | yes, that makes sense. Once the poc implementation is ready, it will be easy to add more backends | 06:48 |
Qiming | before we are there, we need to get the versioned notification thing done | 06:48 |
Qiming | I'm somewhat blocked by the granularity problem when modeling events | 06:49 |
Qiming | it is not like the LOG.info, LOG.error, ... which we treated as free | 06:49 |
Qiming | for notifications, if no one is receiving and processing them, it seems that they will accumulate in the message queue for a long time | 06:50 |
yanyanhu | you're worried about the overhead | 06:50 |
yanyanhu | oh, I see | 06:50 |
Qiming | I just experienced that when reinstalling devstack and adding gnocchi and aodh | 06:50 |
yanyanhu | agree with this. Actually I think amqp is designed for runtime message delivery | 06:51 |
yanyanhu | with an assumption that the consumer is always online to receive and handle messages | 06:51 |
Qiming | and ... previous experience debugging some enterprise middleware ... do NOT overly log, do NOT overly notify ... | 06:51 |
yanyanhu | this is different from log type of message, e.g. kafka | 06:51 |
yanyanhu | yes, it is | 06:52 |
Qiming | alright, I did consider this when drafting the spec file, so eventually, we will expose some switches in the config file for users to customize ... | 06:52 |
Qiming | what level of events should be fired, what kind of events should be masked etc ... | 06:53 |
yanyanhu | yep | 06:53 |
Qiming | that would be ... complex to use, but flexible enough to meet requirements we are not anticipating | 06:53 |
Qiming | okay, enough recap | 06:53 |
Qiming | the problem is ... we have too many choices to send event notifications | 06:53 |
Qiming | we have to make some design decisions on this | 06:54 |
Qiming | we are not supposed to emit a notification whenever we just need some debug info | 06:54 |
Qiming | even with user customization, we will only decide whether to emit an event at the last moment | 06:55 |
yanyanhu | last moment, you mean? | 06:55 |
Qiming | we are not supposed to place a lot of 'if ... else ..' calls at the call site | 06:55 |
yanyanhu | Qiming, that's for sure... | 06:56 |
Qiming | take the LOG.info calls as an example | 06:56 |
Qiming | we are calling it everywhere | 06:56 |
Qiming | if we want to do conditional logging, we are not supposed to add 'if (info is allowed for this module, for this action) then LOG.info' everywhere | 06:57 |
Qiming | we will keep the call site simple, just a single line, LOG.info(...) | 06:57 |
yanyanhu | yes | 06:57 |
yanyanhu | so there should be a filter for this purpose? | 06:58 |
Qiming | then in the driver layer, we decide whether we will actually generate an event notification (or db record) | 06:58 |
Qiming | called a filter or a filter chain if you want | 06:58 |
Qiming | but that filtering logic is not supposed to be placed at the call site, instead it should be placed inside the 'info' call | 06:59 |
yanyanhu | yes | 06:59 |
Qiming | in other words, the 'info' call should be smart enough to handle these customizations, correct? | 06:59 |
yanyanhu | right | 06:59 |
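
The idea agreed on above can be sketched roughly as follows: the call site stays a single line, and the decision whether anything is actually emitted is taken inside the call, at the last moment. This is only an illustration. `EventFilter`, `EventDispatcher`, `min_level` and `masked` are hypothetical names, not Senlin's real classes or config options, and the backends are plain callables standing in for a database or message driver.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class EventFilter:
    """Hypothetical user-facing switches (not Senlin's real options)."""
    min_level: int = logging.INFO              # lowest level that gets emitted
    masked: set = field(default_factory=set)   # event types that are never emitted

    def allows(self, level, event_type):
        return level >= self.min_level and event_type not in self.masked


class EventDispatcher:
    """Call sites use info()/warning()/error(); filtering lives in here."""

    def __init__(self, event_filter, backends):
        self.event_filter = event_filter
        self.backends = list(backends)         # e.g. database, message queue

    def _emit(self, level, event_type, payload):
        # The single place where the emit-or-skip decision is made.
        if not self.event_filter.allows(level, event_type):
            return False
        for backend in self.backends:
            backend(level, event_type, payload or {})
        return True

    def info(self, event_type, payload=None):
        return self._emit(logging.INFO, event_type, payload)

    def warning(self, event_type, payload=None):
        return self._emit(logging.WARNING, event_type, payload)

    def error(self, event_type, payload=None):
        return self._emit(logging.ERROR, event_type, payload)
```

With this shape, a call such as `events.info("some.event", {...})` needs no surrounding `if`; changing the filter changes what is emitted without touching any call site.
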
Qiming | okay, then ... where do we place those 'info/warn/error/' calls? | 07:00 |
Qiming | (suppose we can filter them eventually, efficiently, at the last moment) | 07:00 |
yanyanhu | each key point of workflow I guess? | 07:00 |
yanyanhu | e.g. action starts, succeeds | 07:01 |
Qiming | right, the question lies in the definition of "key point of workflows" | 07:01 |
Qiming | say, cluster-scale-out as a workflow | 07:01 |
yanyanhu | tough question :) | 07:01 |
Qiming | where do we call event generation? | 07:01 |
yanyanhu | inside engine, I think service call, action building, policy taking effect, action scheduling/executing/finishing? | 07:02 |
yanyanhu | and those points inside each sub action | 07:03 |
yanyanhu | It's hard to ask end users to make the decision, I feel | 07:04 |
Qiming | we can emit events at the following places: 1) rpc request received and validated; 2) cluster_scale_out action queued; 3) cluster_scale_out action starts execution; 4) cluster_scale_out action forks node_create action; 5) node_create action queued; 6) node_create action starts execution; 7) node_create action fails/succeeds; 8) cluster_scale_out action fails/succeeds; 9) the original request reaches a conclusion, i.e. cluster was scaled or not (status changes) | 07:04 |
Qiming | I do see every step a key point in the workflow | 07:05 |
yanyanhu | yes, those events should be emitted | 07:05 |
Qiming | but I don't think we need to log them all | 07:05 |
Qiming | it is too heavy | 07:05 |
Qiming | completely ruining the idea of notification | 07:06 |
yanyanhu | Qiming, that's true | 07:06 |
yanyanhu | especially consider the overhead from interacting with event backend | 07:06 |
*** guoshan has joined #senlin | 07:06 | |
Qiming | correct, 5), 6), 7) above are proportional to the scale of a cluster operation | 07:07 |
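
A back-of-the-envelope count makes the scaling concern concrete. The sketch below assumes that every numbered step in the list emits exactly one event, that steps 5) to 7) repeat once per node being created, and that the remaining steps happen once per request. Those assumptions and the function name are illustrative only.

```python
CLUSTER_LEVEL_STEPS = 6   # steps 1, 2, 3, 4, 8 and 9: assumed once per request
PER_NODE_STEPS = 3        # steps 5, 6 and 7: assumed once per node_create action


def events_per_scale_out(node_count):
    """Events emitted by one scale-out request if every step is reported."""
    return CLUSTER_LEVEL_STEPS + PER_NODE_STEPS * node_count


for count in (1, 10, 100):
    print(count, events_per_scale_out(count))
# prints:
# 1 9
# 10 36
# 100 306
```
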
Qiming | after drawing this on paper, I'm astonished ... | 07:08 |
Qiming | we cannot afford logging so many events where each event will carry a lot of payload (based on my current design) | 07:08 |
Qiming | oslo versioned objects, when dumped, are already generating a lot of overhead regarding bytes added | 07:09 |
yanyanhu | yes | 07:09 |
yanyanhu | at large scale, that could be very inefficient | 07:09 |
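
To give a feel for the byte overhead mentioned above, the sketch below wraps the same data in a hand-written envelope that imitates the primitive form of an oslo.versionedobjects object (the `versioned_object.*` keys) and compares the serialized sizes. It does not import the library; the object names, the `senlin` namespace value and all field values are made up for illustration.

```python
import json


def as_primitive(name, version, data, namespace="senlin"):
    """Imitate the envelope of a dumped versioned object (hand-written)."""
    return {
        "versioned_object.name": name,
        "versioned_object.namespace": namespace,   # assumed namespace value
        "versioned_object.version": version,
        "versioned_object.data": data,
    }


# Made-up cluster and action properties.
cluster = {"id": "c1", "name": "web", "status": "ACTIVE"}
action = {"id": "a1", "name": "cluster_scale_out", "status": "RUNNING"}

plain = {"cluster": cluster, "action": action}
wrapped = as_primitive("ExamplePayload", "1.0", {
    "cluster": as_primitive("ExampleCluster", "1.0", cluster),
    "action": as_primitive("ExampleAction", "1.0", action),
})

plain_size = len(json.dumps(plain))
wrapped_size = len(json.dumps(wrapped))
print(plain_size, wrapped_size)   # the wrapped form is noticeably larger
```

Every nested object repeats the envelope keys, so the fixed cost is paid once per object and once more per event that carries it.
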
Qiming | suppose we dump the cluster properties for all the events above, and all the action properties for these events | 07:09 |
Qiming | if we don't dump all the properties, we will be challenged ... why the cluster_scale_out event didn't tell me when it was started and when it was stopped? I want to compute the duration of its execution ... | 07:11 |
Qiming | so ... a tough decision, right? | 07:11 |
yanyanhu | yes | 07:11 |
yanyanhu | if so, maybe we start from coarse granularity (e.g. only logging cluster-level events), then try finer granularity and evaluate the increase in overhead? | 07:11 |
Qiming | here is my current proposal | 07:11 |
Qiming | we don't dump action details | 07:12 |
Qiming | we think from end user's perspective | 07:12 |
Qiming | they shouldn't care about the asynchronous/synchronous execution of cluster operations ... | 07:13 |
Qiming | it was ... senlin ... that makes things a "mess" | 07:13 |
Qiming | take node_create as an example | 07:14 |
Qiming | if it is a derived action, not one originated from RPC request, we don't have to expose that detail to users | 07:14 |
Qiming | we instead should strive and focus on exposing information on the cluster operation itself ... even if it fails, we let the users know why it failed .. | 07:15 |
openstackgerrit | xu-haiwei proposed openstack/senlin: Update host node 'dependents' when create/delete container node https://review.openstack.org/396016 | 07:15 |
Qiming | that is the 'original' goal of events or notifications | 07:15 |
yanyanhu | Qiming, yep, totally makes sense. They are events not "debug" info | 07:16 |
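
The rule stated above (notify only for actions that originate from an RPC request, stay silent for derived ones) could look roughly like this. The `Action` class, the `cause` field and the two cause labels are assumptions made for this sketch, not a copy of Senlin's action model.

```python
from dataclasses import dataclass

# Assumed labels describing how an action came to exist.
CAUSE_RPC = "RPC Request"
CAUSE_DERIVED = "Derived Action"


@dataclass
class Action:
    name: str
    cause: str


def should_notify(action):
    """Only actions requested by a user over RPC produce notifications."""
    return action.cause == CAUSE_RPC


scale_out = Action("cluster_scale_out", CAUSE_RPC)
forked = Action("node_create", CAUSE_DERIVED)      # forked by the scale-out
requested = Action("node_create", CAUSE_RPC)       # asked for directly by a user

print([should_notify(a) for a in (scale_out, forked, requested)])
# prints: [True, False, True]
```
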
*** zhurong has quit IRC | 07:16 | |
Qiming | back to the list above | 07:16 |
Qiming | I'd like to focus on 3), 9) only | 07:16 |
yanyanhu | isn't 8) a duplicate of action list/get? | 07:17 |
Qiming | in terms of event notification, there will be three types of events for this operation: cluster.scale_out.start, cluster.scale_out.end, cluster.scale_out.error | 07:18 |
Qiming | and .. that is ALL | 07:18 |
yanyanhu | that's reasonable | 07:18 |
Qiming | oh, 8) is inside the 'do_scale_out' function, and 9) is at the end of the '_execute' function | 07:19 |
yanyanhu | I see | 07:19 |
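
One way to guarantee that an operation produces a start event followed by exactly one end or error event is to wrap it in a context manager, as sketched below. The `notify_operation` helper, the `emit` callback signature and the `reason` and `duration` payload keys are hypothetical. The duration is included because it answers the "when did it start and stop" question without dumping action details.

```python
import time
from contextlib import contextmanager


@contextmanager
def notify_operation(emit, operation, **payload):
    """Emit <operation>.start, then exactly one of .end or .error."""
    started = time.monotonic()
    emit(operation + ".start", dict(payload))
    try:
        yield
    except Exception as exc:
        # Report why the operation failed, then let the error propagate.
        emit(operation + ".error",
             dict(payload, reason=str(exc),
                  duration=time.monotonic() - started))
        raise
    else:
        emit(operation + ".end",
             dict(payload, duration=time.monotonic() - started))


events = []
with notify_operation(lambda etype, data: events.append(etype),
                      "cluster.scale_out", cluster="c1"):
    pass  # the real work, including any derived node_create actions

print(events)
# prints: ['cluster.scale_out.start', 'cluster.scale_out.end']
```
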
Qiming | it is gonna complicate the event generation a little bit, regarding the derivation of "status reason", but ... | 07:20 |
Qiming | the simplification of overall infrastructure may justify that effort, I hope | 07:20 |
yanyanhu | it will I think | 07:21 |
Qiming | okay, will proceed on this | 07:21 |
yanyanhu | otherwise, the overhead could be unaffordable | 07:21 |
yanyanhu | great, thanks for those explanations :) | 07:21 |
Qiming | and try to apply the same principle to node operations (those derived from RPC) | 07:21 |
Qiming | em ... actually, it may not be that complex | 07:22 |
Qiming | we have been working very hard to reduce error messages into the action.status and even into cluster status | 07:22 |
Qiming | that includes failures of policy checks ... | 07:23 |
Qiming | so we will see if we have to emit something when a policy check has failed | 07:23 |
yanyanhu | ok | 07:23 |
yanyanhu | sounds feasible | 07:23 |
Qiming | it is not that interesting either, if we have recorded the reason why a cluster operation has failed | 07:24 |
Qiming | okay, thx for ur time, :) | 07:24 |
yanyanhu | event can be used together with action get I think | 07:24 |
yanyanhu | my pleasure | 07:24 |
yanyanhu | hope this digging can help the team better understand the design principles | 07:25 |
yanyanhu | :) | 07:25 |
openstackgerrit | lvdongbing proposed openstack/senlin: Engine support for profile-create2 https://review.openstack.org/400075 | 07:25 |
Qiming | will try to document these design considerations when creating developer docs | 07:25 |
yanyanhu | great | 07:27 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: The default value of "--list" in cluster-collect's help message displays error https://review.openstack.org/400076 | 07:30 |
openstackgerrit | lvdongbing proposed openstack/senlin: API support for profile-create2 https://review.openstack.org/400079 | 07:50 |
openstackgerrit | Yanyan Hu proposed openstack/senlin: Fix an error in integration test https://review.openstack.org/400081 | 07:53 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: Revise the help message of cluster-collect https://review.openstack.org/400076 | 08:06 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: Revise the help message of cluster-collect https://review.openstack.org/400076 | 08:12 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: Revise the help info of cluster collect https://review.openstack.org/400026 | 08:19 |
openstackgerrit | lvdongbing proposed openstack/senlin: Remove dead code related to profile-get in engine layer https://review.openstack.org/400093 | 08:22 |
openstackgerrit | lvdongbing proposed openstack/senlin: Remove dead code related to profile-update in engine layer https://review.openstack.org/400104 | 08:30 |
openstackgerrit | Shan Guo proposed openstack/senlin: Modify the cli in doc of policy attach command https://review.openstack.org/400105 | 08:31 |
*** gongysh2 has quit IRC | 08:43 | |
openstackgerrit | lvdongbing proposed openstack/senlin: Remove dead code related to profile-delete in engine layer https://review.openstack.org/400114 | 08:44 |
openstackgerrit | Merged openstack/senlin: Add engine support for event_get2 https://review.openstack.org/399836 | 08:50 |
openstackgerrit | Merged openstack/senlin: Api support for event_get2 https://review.openstack.org/399841 | 08:52 |
openstackgerrit | RUIJIE YUAN proposed openstack/senlin: prepare for "destory" parameter in cluster-replace-nodes https://review.openstack.org/400129 | 09:04 |
openstackgerrit | miaohb proposed openstack/python-senlinclient: Fix error in cluster collect https://review.openstack.org/400133 | 09:15 |
openstackgerrit | Yanyan Hu proposed openstack/senlin: Versioned request object for receiver-delete https://review.openstack.org/400135 | 09:19 |
openstackgerrit | Yanyan Hu proposed openstack/senlin: Engine support for receiver_delete2 https://review.openstack.org/400136 | 09:19 |
*** yanyanhu has quit IRC | 09:24 | |
*** shu-mutou is now known as shu-mutou-AWAY | 09:25 | |
openstackgerrit | Merged openstack/python-senlinclient: Updated from global requirements https://review.openstack.org/395377 | 09:30 |
openstackgerrit | lvdongbing proposed openstack/senlin: Versioned request objects for profile_type https://review.openstack.org/400148 | 09:39 |
*** elynn has quit IRC | 09:51 | |
*** guoshan has quit IRC | 10:40 | |
*** guoshan has joined #senlin | 11:41 | |
*** guoshan has quit IRC | 11:46 | |
-openstackstatus- NOTICE: We are currently having capacity issues with our ubuntu-xenial nodes. We have addressed the issue but will be another few hours before new images have been uploaded to all cloud providers. | 12:20 | |
*** catintheroof has joined #senlin | 12:31 | |
openstackgerrit | XueFeng Liu proposed openstack/senlin: Fix nova resource leak https://review.openstack.org/400232 | 12:40 |
*** guoshan has joined #senlin | 12:42 | |
*** guoshan has quit IRC | 12:47 | |
openstackgerrit | Merged openstack/senlin: Update host node 'dependents' when create/delete container node https://review.openstack.org/396016 | 13:33 |
*** guoshan has joined #senlin | 13:43 | |
*** bran has quit IRC | 13:44 | |
*** guoshan has quit IRC | 13:47 | |
openstackgerrit | Qiming Teng proposed openstack/senlin: Remove NotificationPayloadBase class https://review.openstack.org/400266 | 14:20 |
openstackgerrit | Qiming Teng proposed openstack/senlin: New fields for versioned notification https://review.openstack.org/400267 | 14:20 |
openstackgerrit | Qiming Teng proposed openstack/senlin: New fields for versioned notification https://review.openstack.org/400267 | 14:21 |
*** guoshan has joined #senlin | 14:44 | |
*** guoshan has quit IRC | 14:48 | |
*** elynn has joined #senlin | 14:50 | |
*** elynn has quit IRC | 15:09 | |
*** guoshan has joined #senlin | 15:44 | |
*** guoshan has quit IRC | 15:49 | |
*** guoshan has joined #senlin | 16:45 | |
*** guoshan has quit IRC | 16:50 | |
*** guoshan has joined #senlin | 17:46 | |
*** guoshan has quit IRC | 17:51 | |
*** guoshan has joined #senlin | 19:01 | |
*** guoshan has quit IRC | 19:05 | |
*** guoshan has joined #senlin | 20:02 | |
*** guoshan has quit IRC | 20:06 | |
*** guoshan has joined #senlin | 21:03 | |
*** guoshan has quit IRC | 21:07 | |
*** shu-mutou-AWAY has quit IRC | 21:41 | |
*** guoshan has joined #senlin | 22:03 | |
*** guoshan has quit IRC | 22:08 | |
*** guoshan has joined #senlin | 23:04 | |
*** guoshan has quit IRC | 23:09 | |
*** openstack has joined #senlin | 23:47 |