*** lkarm has joined #senlin | 00:10 | |
*** lkarm has quit IRC | 00:15 | |
*** Qiming has joined #senlin | 00:47 | |
Qiming | morning | 01:01 |
---|---|---|
xuhaiwei | Qiming, morning | 01:15 |
*** Yanyanhu has joined #senlin | 01:15 | |
*** branw has quit IRC | 01:20 | |
*** Yanyan has joined #senlin | 01:22 | |
Yanyan | hi, xuhaiwei, about the failure in these two patches, will check whether we need a fix in the test case | 01:24 |
Yanyan | https://review.openstack.org/213964 and https://review.openstack.org/213944 | 01:25 |
*** Yanyanhu has quit IRC | 01:25 | |
xuhaiwei | Yanyan, I am checking them too, it seems not the test case's problem | 01:26 |
Yanyan | ok | 01:26 |
Yanyan | it happened locally after I recreated my test environment | 01:27 |
Yanyan | I think it is caused by the version update of some packages, e.g. oslo.db | 01:27 |
xuhaiwei | maybe | 01:27 |
Yanyan | I guess they changed the exception msg raised when an invalid sort_dir is provided | 01:29 |
Yanyan | thus caused these failures | 01:29 |
Yanyan | will make a test | 01:29 |
xuhaiwei | ok | 01:29 |
Yanyan | hi, xuhaiwei, I think it is the reason | 01:32 |
Yanyan | just feel the new error msg is a little weird | 01:32 |
Yanyan | 'Unknown sort direction, must be one of: asc-nullsfirst, asc-nullslast, desc-nullsfirst, desc-nullslast' | 01:32 |
Yanyan | why is there a 'null' string here | 01:32 |
xuhaiwei | have no idea | 01:33 |
Yanyan | hmm, seems they do use these strings in the latest code | 01:35 |
xuhaiwei | how to reproduce this error? | 01:37 |
Yanyan | just run tox -epy27 -r | 01:37 |
Yanyan | I think one of the recent package version updates caused this problem | 01:38 |
Yanyan | will propose a fix for this | 01:38 |
*** Qiming has quit IRC | 01:39 | |
*** mathspanda has joined #senlin | 01:43 | |
xuhaiwei | see this Yanyan, https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/utils.py#L143 | 01:45 |
xuhaiwei | it's due to oslo.db's update | 01:46 |
Yanyan | yes | 01:46 |
Yanyan | and we didn't update our local package and thus didn't find it | 01:46 |
*** ChrisSen has joined #senlin | 01:52 | |
*** Qiming has joined #senlin | 01:53 | |
*** elynn has joined #senlin | 01:54 | |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Fix some test cases about illegal sort_dir https://review.openstack.org/214416 | 01:56 |
xuhaiwei | Yanyan, don't you think it's a little strange? we give sort_dir='desc', but the error message calls for 'desc-nullsfirst'? | 01:56 |
Yanyan | hi, xuhaiwei, the patch has been proposed here, let's see whether it can fix the problem, thanks | 01:56 |
Yanyan | yes, that's why I feel the msg is a little weird... | 01:57 |
Yanyan | don't understand why there is a 'nulls' here | 01:57 |
xuhaiwei | it seems both 'desc' and 'desc-nullsfirst' work | 01:58 |
Yanyan | yes | 01:58 |
Yanyan | and oslo_db also uses this word in their own test cases for the utils module | 01:59 |
Yanyan | like this: http://git.openstack.org/cgit/openstack/oslo.db/tree/oslo_db/tests/sqlalchemy/test_utils.py#n236 | 02:00 |
Yanyan | oh | 02:01 |
Yanyan | I guess desc is now equal to desc-nullsfirst + desc-nullslast | 02:01 |
xuhaiwei | yes | 02:02 |
xuhaiwei | maybe they want to be more specific | 02:02 |
Yanyan | they provide more accurate support for query | 02:02 |
Yanyan | right | 02:02 |
Yanyan | hmm, good news :) | 02:02 |
Yanyan | ok, let's see whether this fix works. If so, we can try to rebase those blocked patches. | 02:03 |
*** Qiming_ has joined #senlin | 02:03 | |
xuhaiwei | ok | 02:03 |
Yanyan | hello, Qiming | 02:04 |
Yanyan | seems your network is not stable ;) | 02:04 |
*** Qiming has quit IRC | 02:05 | |
*** Qiming__ has joined #senlin | 02:05 | |
xuhaiwei | it seems he is having some meetup | 02:06 |
*** Qiming_ has quit IRC | 02:06 | |
Yanyan | yes | 02:07 |
*** Qiming__ has quit IRC | 02:15 | |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Check size limitation in cluster scale in/out action https://review.openstack.org/213964 | 02:18 |
*** jdandrea has quit IRC | 02:20 | |
*** jroyal has joined #senlin | 02:22 | |
*** jroyal has quit IRC | 02:26 | |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Add functional test for listing profile_type https://review.openstack.org/213040 | 02:34 |
openstackgerrit | Merged stackforge/senlin: Fix some test cases about illegal sort_dir https://review.openstack.org/214416 | 02:35 |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Add functional test for listing policy_types https://review.openstack.org/213626 | 02:35 |
*** lkarm has joined #senlin | 02:37 | |
openstackgerrit | xu-haiwei proposed stackforge/senlin: Revise cluster-scale-in/out default value https://review.openstack.org/213944 | 02:38 |
openstackgerrit | xu-haiwei proposed stackforge/senlin: Handle exceptions in keystone_v3 driver https://review.openstack.org/213569 | 02:38 |
*** Qiming has joined #senlin | 02:41 | |
Qiming | sigh, network is very limited | 02:42 |
*** lkarm has quit IRC | 02:42 | |
Yanyan | yes | 02:43 |
Yanyan | the meetup has started? | 02:43 |
Qiming | yes | 02:45 |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Add functional test for listing profile_type https://review.openstack.org/213040 | 02:45 |
Yanyan | the hackathon is tomorrow? | 02:45 |
Qiming | it already started | 02:45 |
Yanyan | oh | 02:45 |
Qiming | fixing/reviewing bugs | 02:45 |
Yanyan | nice :) | 02:46 |
xuhaiwei | which project? | 02:51 |
Yanyan | I guess in several projects | 02:53 |
*** Qiming has quit IRC | 02:55 | |
*** mathspanda has quit IRC | 02:57 | |
*** Qiming has joined #senlin | 03:03 | |
openstackgerrit | xu-haiwei proposed stackforge/senlin: Fix some exception mapping miss https://review.openstack.org/214431 | 03:04 |
*** elynn_ has joined #senlin | 03:09 | |
*** elynn has quit IRC | 03:13 | |
Qiming | hi | 03:44 |
Qiming | the sort_dir patch | 03:45 |
Yanyan | hello | 03:45 |
Qiming | I was wondering if it is affecting | 03:45 |
Yanyan | you mean? | 03:46 |
Qiming | what is your oslo.db version? | 03:47 |
Yanyan | let me check | 03:47 |
Yanyan | 2.4.0 for tox | 03:48 |
Yanyan | and 2.1.0 in local | 03:48 |
Yanyan | seems gate uses a newer version than the one defined in requirement | 03:50 |
*** Qiming has quit IRC | 03:55 | |
*** Qiming has joined #senlin | 03:56 | |
Qiming | ah, I see, gate is using oslo.db 2.4.0 | 04:01 |
Yanyan | yes | 04:01 |
Yanyan | so it broke our tests | 04:01 |
*** Qiming has quit IRC | 04:06 | |
openstackgerrit | xu-haiwei proposed stackforge/senlin: Fix some exception mapping miss https://review.openstack.org/214431 | 04:16 |
*** mathspanda has joined #senlin | 04:26 | |
*** mathspanda has quit IRC | 04:30 | |
*** mathspanda has joined #senlin | 04:31 | |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Use wait_for_delete to wait for nova server deletion https://review.openstack.org/214448 | 04:56 |
*** Qiming has joined #senlin | 05:14 | |
*** lkarm has joined #senlin | 05:20 | |
*** lkarm has quit IRC | 05:24 | |
*** Qiming has quit IRC | 05:26 | |
*** Qiming has joined #senlin | 05:29 | |
*** Qiming has quit IRC | 05:29 | |
*** Qiming has joined #senlin | 05:30 | |
openstackgerrit | LinPeiyu proposed stackforge/senlin: Fix misleading document for webhooks usage https://review.openstack.org/214455 | 05:39 |
*** Qiming has quit IRC | 05:43 | |
*** Qiming has joined #senlin | 05:43 | |
*** Qiming_ has joined #senlin | 05:46 | |
*** Qiming_ has quit IRC | 05:47 | |
openstackgerrit | xu-haiwei proposed stackforge/senlin: Fix some exception mapping miss https://review.openstack.org/214431 | 05:48 |
*** Qiming has quit IRC | 05:50 | |
*** Qiming has joined #senlin | 05:55 | |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Allow NODE_DELETE action to steal node lock https://review.openstack.org/214459 | 06:00 |
Yanyan | hi, Qiming, free to talk? | 06:00 |
Qiming | not now, about to present | 06:00 |
Yanyan | ok, talk later | 06:01 |
Yanyan | have a good lecture :) | 06:01 |
Qiming | will do my best | 06:01 |
Qiming | Ken is presenting Neutron, I'm kinda lost at the moment | 06:02 |
Yanyan | neutron is complicated... | 06:02 |
mathspanda | hi, xuhaiwei. | 06:06 |
mathspanda | '-c' is for the specified cluster, but '-C' is for credential. | 06:06 |
xuhaiwei | yes | 06:07 |
mathspanda | the example i wrote is '-C' | 06:07 |
mathspanda | oh. i know what's wrong. | 06:08 |
xuhaiwei | :) | 06:08 |
mathspanda | thanks.:) | 06:08 |
xuhaiwei | nope | 06:08 |
openstackgerrit | LinPeiyu proposed stackforge/senlin: Fix misleading document for webhooks usage https://review.openstack.org/214455 | 06:10 |
openstackgerrit | Merged stackforge/senlin: Add functional test for listing profile_type https://review.openstack.org/213040 | 06:25 |
openstackgerrit | Merged stackforge/senlin: Fix some exception mapping miss https://review.openstack.org/214431 | 06:38 |
openstackgerrit | Merged stackforge/senlin: Fix misleading document for webhooks usage https://review.openstack.org/214455 | 06:38 |
*** ChrisSen has quit IRC | 06:54 | |
Qiming | presentation done | 07:02 |
xuhaiwei | about what | 07:03 |
Yanyan | how about it? | 07:03 |
Yanyan | seems it's the heat's meeting time | 07:03 |
Qiming | yes | 07:04 |
xuhaiwei | what kind of people have joined? | 07:04 |
Yanyan | I'm gonna join openstack-meeting channel to listen :) | 07:07 |
Yanyan | hi, xuhaiwei, I think it's a China openstack community activity | 07:08 |
*** xuhaiwei_ has joined #senlin | 07:10 | |
xuhaiwei_ | it seems there are many meetups of this kind in China | 07:11 |
*** xuhaiwei has quit IRC | 07:13 | |
Yanyan | xuhaiwei_, yes | 07:13 |
*** jroyal has joined #senlin | 07:16 | |
*** jroyal has quit IRC | 07:20 | |
Yanyan | hi, Qiming, are you free now? | 07:41 |
Qiming | fine | 07:41 |
Yanyan | just pushed some comments on the node lock patch | 07:41 |
Yanyan | I think you're right that it's not safe to steal the node lock in most cases | 07:42 |
Yanyan | I think the only safe case is when the node's old owner action is gone | 07:42 |
Qiming | if you have some nodes that cannot be deleted | 07:42 |
Qiming | you will first investigate why it is still locked | 07:42 |
Qiming | there could be some bugs in the code | 07:43 |
Yanyan | yep | 07:43 |
Yanyan | so if the code is well written, this should never happen | 07:43 |
Qiming | if you are allowing node deletion unconditionally, a lot of bugs will be masked | 07:43 |
Yanyan | unless you kill/restart the engine | 07:43 |
Yanyan | yes | 07:43 |
Yanyan | so maybe we only allow lock stealing when we can be sure the parent engine of the node's owner action is gone | 07:44 |
Yanyan | in other cases, we don't allow it | 07:44 |
Qiming | when you find some nodes cannot be deleted, you will first look into why that happened | 07:44 |
*** mathspanda has quit IRC | 07:44 | |
Yanyan | yes, if it happens accidentally, this should be a bug | 07:45 |
Qiming | there are two cases: bug in code (e.g. exception not caught) leaving node still locked, need to be fixed | 07:45 |
Qiming | or there are cases beyond our control, if that is the case, we check the node status, and decide whether to force a steal | 07:46 |
Yanyan | about checking the node status, you mean check the 'status' attr of node? | 07:47 |
Qiming | stealing locks unconditionally for NODE_DELETE action is bad, it is like the HARestarter resource | 07:49 |
Qiming | node status | 07:49 |
Qiming | under certain conditions, we may find that a node must be deleted forcibly, if that is the case, we will check node status for a decision | 07:50 |
Qiming | maybe there will be other cases for testing | 07:50 |
Qiming | other conditions, sorry | 07:50 |
Yanyan | you mean we should delete node when it is in status like 'ACTIVE' 'INIT'? | 07:53 |
Yanyan | but not 'CREATING' 'DELETING' or 'UPDATING' | 07:54 |
Yanyan | or some logic like this? | 07:55 |
Qiming | yes | 08:04 |
Qiming | we will find out which status is safe to delete, which status is not | 08:04 |
Qiming | the basic assumption (starting point) would be: under no condition will we forcibly delete a node, unless we cannot find out a solution | 08:05 |
Yanyan | agree with the assumption, but I think we may not be able to make the decision from just checking the status attr of node | 08:06 |
Qiming | saw your comments | 08:07 |
Yanyan | unless we know the detail of physical resource behind the node | 08:07 |
Qiming | right, we need to deal with them case by case | 08:07 |
Yanyan | yes | 08:07 |
Yanyan | so before that, only case we can handle is the engine dying | 08:08 |
Qiming | about the multi-engine case, some special logic is needed when an engine starts | 08:08 |
Qiming | it was documented in the FEATURES.rst as 'scavenger' process | 08:08 |
Yanyan | yes, something like a scan | 08:08 |
Qiming | i.e. when an engine starts up, it will look for nodes/clusters .... those that are stuck in a hung status and recover them | 08:09 |
Yanyan | yes. this will be the complete solution | 08:10 |
Yanyan | ok, will add support for engine alive check to help decide whether we need node lock stealing before we can support more cases | 08:15 |
Qiming | we need a design here | 08:15 |
Yanyan | hmm, for scavenger | 08:15 |
Qiming | currently, we don't have multi-engine support, right? | 08:15 |
Yanyan | right | 08:15 |
Qiming | it was designed, but not yet implemented | 08:16 |
Qiming | so there is a priority here | 08:16 |
Qiming | either we add multi-engine support first, then add scavenger | 08:16 |
Qiming | if we plan like this, the scavenger would be a complete design | 08:16 |
Qiming | on the other hand, if we add scavenger now, and implement multi-engine support later | 08:17 |
Qiming | the scavenger will have to be rewritten | 08:17 |
Qiming | s/will/may | 08:17 |
Yanyan | hmm, actually, our current implementation should support multiple engines theoretically, we just didn't test it before | 08:19 |
Yanyan | but it may not be able to work correctly since some parts like dispatcher may not support it | 08:20 |
xuhaiwei_ | though don't understand the 'scavenger' well, do we need it now? I think the second way sounds better | 08:21 |
Yanyan | xuhaiwei_, we may not need to add it now, but if we want to support multiple engines, it is necessary I think | 08:22 |
xuhaiwei_ | just saw the FEATURES.rst, 'scavenger process' is in the High priority list | 08:23 |
Qiming | My experience writing test cases for the scheduler and dispatcher module told me that multi-engine is not finished | 08:23 |
Qiming | xuhaiwei_, it was there as high because we were assuming that multi-engine support is ready | 08:24 |
Yanyan | yes, that's true | 08:24 |
Yanyan | Qiming, just as you said, we need a plan for this feature | 08:25 |
xuhaiwei_ | what do you mean by 'multi-engine is not finished' | 08:25 |
Yanyan | maybe not in liberty-3, but we need a timeline for it | 08:25 |
*** LiuWei has joined #senlin | 08:26 | |
Qiming | xuhaiwei_, it means you start two senlin-engine processes to service user requests | 08:27 |
Yanyan | hi, xuhaiwei_, we actually only run a single engine thread now | 08:27 |
Qiming | multi-engine setup is a workaround for Python's GIL (Global Interpreter Lock) problem | 08:28 |
xuhaiwei_ | as I understand it, a new engine service is started when a new request comes in, so if there is only one request, only one engine service is started, right? | 08:29 |
Qiming | engine service is a process | 08:31 |
Qiming | we handle requests using eventlets -- a Python emulation of multi-threads, as other projects do | 08:31 |
xuhaiwei_ | so multi-engine processes are already started before request comes? | 08:33 |
Qiming | yes | 08:36 |
Qiming | these engines will share requests forwarded by the senlin-api process | 08:36 |
xuhaiwei_ | got it | 08:36 |
xuhaiwei_ | just confirmed heat started 9 engine processes by default | 08:37 |
Yanyan | you can check this option in senlin.conf #num_engine_workers = 1 | 08:38 |
Yanyan | it should be 1 by default | 08:38 |
xuhaiwei_ | yes | 08:39 |
Qiming | xuhaiwei_, it depends on your number of processors I think | 08:39 |
xuhaiwei_ | oh | 08:40 |
Yanyan | so, Qiming, what is your opinion about it? | 08:45 |
*** mathspanda has joined #senlin | 08:45 | |
Yanyan | should we start working on multiple engine support first? | 08:45 |
Qiming | it would be great if we can double confirm the multi-engine support | 08:45 |
Yanyan | hmm, yes, we can do some tests about it | 08:46 |
Qiming | then we base the scavenger work on it | 08:46 |
Qiming | great, thanks | 08:46 |
Yanyan | no problem. And do we need the interim solution before we can support this feature? for engine died case | 08:47 |
Yanyan | or we can handle it manually | 08:48 |
Yanyan | since the work will be replaced after scavenger is supported | 08:48 |
Qiming | I don't think we need to do it | 08:48 |
Yanyan | ok | 08:49 |
Yanyan | will do some tests with multiple engines, hope there are not many holes there :) | 08:49 |
openstackgerrit | Yanyan Hu proposed stackforge/senlin: Use wait_for_delete to wait for nova server deletion https://review.openstack.org/214448 | 09:04 |
Yanyan | ran some tests using concurrent cluster creation and deletion with two engine threads, seems the basic workflow is ok :) | 09:19 |
Yanyan | the node_create and cluster_create actions were assigned to the two engines nearly equally | 09:20 |
*** Qiming has quit IRC | 09:30 | |
*** mathspanda has quit IRC | 09:35 | |
xuhaiwei_ | cool, Yanyan | 09:36 |
Yanyan | looks good, created and deleted 4 clusters with 14 nodes :) | 09:36 |
Yanyan | of course, no exception happened during node creation and deletion | 09:36 |
Yanyan | otherwise, errors could have happened | 09:37 |
Yanyan | I guess maybe we can enable two engine threads by default when doing daily development work | 09:37 |
Yanyan | can help to find problem :) | 09:37 |
xuhaiwei_ | ok | 09:38 |
*** Qiming has joined #senlin | 09:38 | |
Yanyan | let me increase the cluster size to 20 | 09:38 |
Qiming | okay | 09:40 |
Yanyan | looks pretty good ;) | 09:40 |
Yanyan | 5 concurrent clusters with 40 nodes: 36 heat stacks and 4 nova servers | 09:40 |
Qiming | just for creation? | 09:41 |
Yanyan | concurrent creation and deletion :) | 09:41 |
Yanyan | wrote a shell script, sleep 1 second before each step | 09:41 |
Yanyan | let me make more tests | 09:42 |
Qiming | okay | 09:42 |
Yanyan | but the api response became noticeably slower... | 09:43 |
Yanyan | cost about 2 seconds to get response | 09:44 |
Yanyan | ah, this time, a cluster deletion failed although all its nodes had been deleted | 09:45 |
Qiming | okay, is that a concurrency problem? | 09:48 |
Qiming | I'm afraid we have such problems when dealing with locks | 09:48 |
Yanyan | hmm, guess so, but the second deletion succeeded | 09:48 |
Yanyan | yes, also think so | 09:49 |
Yanyan | guess it is caused by lock competition between cluster creating and deleting action | 09:49 |
Qiming | em | 09:50 |
Qiming | I'm feeling we have some hidden bug there | 09:50 |
Qiming | the action logics | 09:50 |
Yanyan | since in the test, I deleted cluster just a second after the creating request was sent out | 09:50 |
Qiming | those APIs were written without a careful design and it was not thoroughly revised | 09:51 |
Yanyan | em, need more tests here | 09:51 |
Yanyan | oh, a question is should we focus on this issue before l-3 deadline? | 09:53 |
Qiming | As I can recall, Zhai HF has done some tests there, he told me it is not stable somehow | 09:53 |
Qiming | if it is an action api problem it should be solved asap | 09:53 |
Yanyan | that's true | 09:54 |
Yanyan | so maybe we enable multiple engine by default, I think this can help us find problem | 09:54 |
Qiming | right | 09:56 |
Yanyan | em, will use multi-engine env for daily work | 09:59 |
Yanyan | prepare to leave | 10:00 |
Yanyan | see U guys tomorrow | 10:02 |
*** Yanyan has quit IRC | 10:07 | |
*** elynn_ has quit IRC | 10:18 | |
openstackgerrit | Merged stackforge/senlin: Revise cluster-scale-in/out default value https://review.openstack.org/213944 | 10:44 |
openstackgerrit | Merged stackforge/senlin: Use Senlin generic driver to manage ceilometer_v2 driver https://review.openstack.org/213593 | 10:45 |
*** LiuWei has quit IRC | 11:21 | |
*** Qiming has quit IRC | 11:35 | |
*** branw has joined #senlin | 11:53 | |
*** lkarm has joined #senlin | 12:34 | |
*** jdandrea has joined #senlin | 13:55 | |
*** Qiming has joined #senlin | 14:42 | |
*** Qiming has quit IRC | 16:07 | |
*** lkarm has quit IRC | 16:58 | |
*** lkarm has joined #senlin | 16:59 | |
*** lkarm has quit IRC | 16:59 | |
*** lkarm has joined #senlin | 17:00 | |
*** lkarm has quit IRC | 17:09 | |
*** lkarm has joined #senlin | 17:10 | |
*** jdandrea has left #senlin | 19:24 | |
*** jdandrea has joined #senlin | 19:24 | |
*** lkarm has quit IRC | 19:46 | |
*** lkarm has joined #senlin | 19:46 | |
*** lkarm has quit IRC | 19:47 | |
*** lkarm has joined #senlin | 19:47 | |
*** lkarm has quit IRC | 21:26 | |
*** lkarm has joined #senlin | 21:27 | |
*** lkarm has quit IRC | 21:31 | |
*** lkarm has joined #senlin | 21:52 | |
*** lkarm has quit IRC | 21:56 | |
*** lkarm has joined #senlin | 22:17 | |
*** lkarm has quit IRC | 22:23 | |
*** lkarm has joined #senlin | 22:23 | |
*** lkarm has quit IRC | 22:23 | |
*** lkarm has joined #senlin | 22:23 | |
*** lkarm has quit IRC | 22:28 | |
*** xuhaiwei_ has quit IRC | 23:32 |
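
A note on the sort_dir failures discussed between 01:24 and 02:02: the tests apparently broke because they depended on the wording of an error message that changed after the oslo.db upgrade. The Python sketch below shows one way a test could avoid that kind of breakage, by checking the exception type instead of the message text. `InvalidSortDirection`, `validate_sort_dir`, and `rejects` are hypothetical stand-ins written for this illustration. They are not oslo.db or Senlin APIs, and the "old" message text is a placeholder.

```python
class InvalidSortDirection(ValueError):
    """Hypothetical error type standing in for whatever the library raises."""


# Directions accepted by this stand-in. Plain 'asc' and 'desc' stay valid,
# matching the observation in the log that both 'desc' and
# 'desc-nullsfirst' work.
ACCEPTED = ('asc', 'desc',
            'asc-nullsfirst', 'asc-nullslast',
            'desc-nullsfirst', 'desc-nullslast')


def validate_sort_dir(sort_dir, new_wording=True):
    """Return sort_dir if accepted, otherwise raise InvalidSortDirection.

    new_wording switches between two message texts to imitate a library
    upgrade that changes only the wording of the error.
    """
    if sort_dir in ACCEPTED:
        return sort_dir
    if new_wording:
        # Message text quoted in the log at 01:32.
        msg = ('Unknown sort direction, must be one of: asc-nullsfirst, '
               'asc-nullslast, desc-nullsfirst, desc-nullslast')
    else:
        # Placeholder for the earlier wording, which the log does not quote.
        msg = 'old wording (placeholder)'
    raise InvalidSortDirection(msg)


def rejects(sort_dir, new_wording):
    """Test helper: True if the value is rejected, whatever the message says."""
    try:
        validate_sort_dir(sort_dir, new_wording)
    except InvalidSortDirection:
        return True
    return False


# The same check passes before and after the wording change.
print(rejects('bogus', new_wording=False), rejects('bogus', new_wording=True))
```

A check written this way only fails when the behavior changes (a bad value starts being accepted), not when the library rewords its message.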
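
A note on the node lock discussion between 07:41 and 08:15: the starting assumption agreed there is that a lock is never stolen by default, and the only case considered safe so far is when the engine that owns the locking action is gone. The Python sketch below illustrates that rule together with a very rough version of the startup scan ("scavenger") idea. The function names, the plain dict of locks, and the set of live engine ids are all assumptions made for this illustration. They are not Senlin's data model, and the node-status checks that were left open in the discussion are deliberately not modeled.

```python
def may_steal_node_lock(owner_engine_id, live_engine_ids):
    """Refuse by default; allow a steal only if the owning engine is gone.

    owner_engine_id: id of the engine whose action holds the lock
                     (hypothetical representation).
    live_engine_ids: set of engine ids currently known to be alive
                     (how liveness is detected is out of scope here).
    """
    return owner_engine_id not in live_engine_ids


def find_orphaned_locks(locks, live_engine_ids):
    """Scan all locks and list the nodes whose owning engine is gone.

    locks: dict mapping node id -> owning engine id (hypothetical layout).
    Returns node ids in sorted order so the result is deterministic.
    """
    return sorted(node_id for node_id, engine_id in locks.items()
                  if may_steal_node_lock(engine_id, live_engine_ids))


# Example: engine 'e1' has died, engine 'e2' is still running.
locks = {'node-a': 'e1', 'node-b': 'e2', 'node-c': 'e1'}
print(find_orphaned_locks(locks, live_engine_ids={'e2'}))
```

A lock held by a live engine is never touched in this sketch, which keeps bugs such as an uncaught exception leaving a node locked visible instead of masking them.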