*** zhurong has joined #senlin | 00:33 | |
*** yanyanhu has joined #senlin | 01:36 | |
*** dixiaoli has joined #senlin | 01:38 | |
*** yanyanhu has quit IRC | 01:40 | |
*** zhurong has quit IRC | 01:58 | |
*** zhurong has joined #senlin | 02:10 | |
*** yanyanhu has joined #senlin | 02:20 | |
openstackgerrit | Qiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3 https://review.openstack.org/455113 | 03:31 |
---|---|---|
*** catintheroof has joined #senlin | 03:43 | |
*** catintheroof has quit IRC | 03:51 | |
openstackgerrit | Merged openstack/senlin master: Add unit test for action sqlarchemy api. https://review.openstack.org/454954 | 04:17 |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Add cidr, ip_version to sub_net create https://review.openstack.org/455135 | 06:50 |
openstackgerrit | RUIJIE YUAN proposed openstack/senlin master: revise action data dumping https://review.openstack.org/452425 | 07:16 |
openstackgerrit | Qiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3 https://review.openstack.org/455113 | 07:18 |
openstackgerrit | Qiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3 https://review.openstack.org/455113 | 07:44 |
XueFeng | QiMing, onlin? local tempest prepare has been finished. Next will add create/delete to each test cause. | 07:53 |
XueFeng | online | 07:53 |
Qiming | great! thx! | 07:54 |
XueFeng | my pleasure | 07:54 |
XueFeng | And after these test causes can passed in local. | 07:55 |
XueFeng | We can remove KEEP_LOCALRC=1 in gate? | 07:55 |
*** ruijie has joined #senlin | 08:08 | |
openstackgerrit | RUIJIE YUAN proposed openstack/senlin master: dump data to action.outputs for deletion process https://review.openstack.org/455151 | 08:18 |
Qiming | XueFeng, I'm not quite sure about that | 08:18 |
Qiming | maybe yanyanhu knows more | 08:19 |
yanyanhu | hi | 08:19 |
*** elynn has joined #senlin | 08:20 | |
yanyanhu | XueFeng, this flag is for ensuring our local setting on devstack take effect during the test | 08:20 |
yanyanhu | last time we checked the job template, this flag is required. Not sure whether it is still needed(I guess so?) | 08:21 |
yanyanhu | hi, XueFeng, any special requirement for removing it? | 08:23 |
XueFeng | hi yanyanhu, KEEP_LOCAL=1 use devstack auth. And I am supporting temptest use dynamical auth. | 08:28 |
yanyanhu | devstack auth, you mean? | 08:29 |
XueFeng | I think KEEP_LOCALRC=1 will user,project... which we installed in devstack | 08:30 |
yanyanhu | XueFeng, you mean tempest will use those users/projects pre-created during the devstack installation? | 08:31 |
XueFeng | Yep | 08:31 |
XueFeng | In /opt/stack/tempest# vi etc/tempest.conf | 08:32 |
XueFeng | [auth] | 08:32 |
XueFeng | use_dynamic_credentials = True | 08:32 |
XueFeng | use_dynamic_credentials = True is default | 08:33 |
XueFeng | And if we use this conf, we can't passs senlin tempest | 08:34 |
openstackgerrit | Merged openstack/senlin master: Add cidr, ip_version to sub_net create https://review.openstack.org/455135 | 08:34 |
XueFeng | We will get No keypair | 08:34 |
yanyanhu | XueFeng, yes, I think tempest will use dynamic user credential by default | 08:34 |
XueFeng | for example | 08:34 |
XueFeng | And last time | 08:34 |
yanyanhu | XueFeng, I think on keypair is because that keypair is created by other user(e.g. admin) and thus will be invisible for those users tempest uses | 08:35 |
XueFeng | When Ocata release | 08:35 |
yanyanhu | on/no | 08:35 |
XueFeng | We got fail in senlin gate | 08:35 |
yanyanhu | so a better solution could be creating keypair before using it in tempest test :) | 08:36 |
XueFeng | And use KEEP_LOCALRC=1 to avoid | 08:36 |
XueFeng | So now, I am supporting use_dynamic_credentials = True | 08:36 |
XueFeng | fro senlin tempest | 08:36 |
XueFeng | for | 08:36 |
XueFeng | yes, yanyanhu | 08:37 |
yanyanhu | hi, XueFeng, KEEP_LOCALRC was added last time because some change in devstack side make our local configuration been overridden and thus failed our test jobs | 08:37 |
yanyanhu | so it was added to ensure those configuration we added in pre_test_hook takes effect | 08:38 |
yanyanhu | use_dynamic_credentials is default True I guess? | 08:38 |
yanyanhu | after tempest is installed | 08:39 |
yanyanhu | I did notice there were some users/projects left in my system when some of my tempest test failed locally :) | 08:39 |
yanyanhu | for creating keypair, we did similar thing before | 08:40 |
XueFeng | Yes | 08:40 |
yanyanhu | let me find some example for you | 08:40 |
XueFeng | I know this. | 08:40 |
Qiming | the reason we added 'KEEP_LOCALRC=1' last time was that we cannot customize senlin.conf when setting up tempest | 08:40 |
Qiming | it is not related to authentication as I see it | 08:40 |
XueFeng | OK | 08:41 |
Qiming | we wanted to switch cloud_backend to 'openstack_test' for api and functional tests, but the customization was overridden due to devstack-lib logics | 08:41 |
XueFeng | Ok, so these are two problems. | 08:42 |
yanyanhu | http://git.openstack.org/cgit/openstack/senlin/tree/senlin/tests/tempest/integration/test_nova_server_cluster.py#n27 | 08:42 |
yanyanhu | here is the logic to create keypair before creating nova server in integration test | 08:43 |
XueFeng | Yes, this is create by admin | 08:43 |
XueFeng | But for api and functional test we can't pass now in loacl. | 08:44 |
yanyanhu | I guess it should be created by the user that tempest uses to perform the integration test? | 08:44 |
XueFeng | yes | 08:44 |
yanyanhu | not admin | 08:44 |
XueFeng | https://review.openstack.org/#/c/453670/ | 08:45 |
yanyanhu | XueFeng, you mean the functional test can't pass if it is performed locally? | 08:45 |
XueFeng | So some patches was add | 08:45 |
XueFeng | Yes, this may begain from Ocata Release | 08:46 |
XueFeng | https://bugs.launchpad.net/senlin/+bug/1678625 | 08:46 |
openstack | Launchpad bug 1678625 in senlin "senlin local temptest fail" [Undecided,In progress] - Assigned to XueFeng Liu (jonnary-liu) | 08:46 |
ruijie | the network 'private' created by tempest that 'shared' property is set to 'False' by default, is that still fine? | 08:46 |
yanyanhu | if so, consider to change is_admin_manager=True to False? | 08:47 |
XueFeng | The is_admin_manager is add by me:) | 08:47 |
yanyanhu | I guess we don't have to create keypair as admin user? | 08:47 |
yanyanhu | yes, I know that :) | 08:47 |
XueFeng | also network, subnet | 08:48 |
yanyanhu | I didn't noticed that it changes the default behavior of integration | 08:48 |
XueFeng | So next will use these methods. | 08:48 |
ruijie | its not shared by talents is not admin ? | 08:48 |
XueFeng | yanyanhu, I have test passed in may local | 08:49 |
yanyanhu | ruijie, ideally, we should create our own network for integration test :) | 08:49 |
yanyanhu | as what we are doing for keypair | 08:49 |
XueFeng | Not submit the last patch | 08:49 |
ruijie | agreed yanyanhu :) | 08:50 |
yanyanhu | XueFeng, ok. So I feel we'd better avoid using admin user during tempest test | 08:50 |
XueFeng | Yes | 08:50 |
yanyanhu | tempest itself will handle the credential creation/usage during the test : ) | 08:52 |
XueFeng | Yes | 08:52 |
yanyanhu | in opposite, we create every resource we need for testing | 08:52 |
yanyanhu | and delete them after test is done | 08:52 |
XueFeng | This is way we failed in current senlin version | 08:52 |
XueFeng | I am sloving this problem | 08:52 |
yanyanhu | great, thanks :) | 08:53 |
XueFeng | keypair ,network, subnet api maybe changed when Ocata relase. | 08:53 |
yanyanhu | yes | 08:54 |
yanyanhu | they could be | 08:54 |
XueFeng | And then it cause senlin gate fail in February | 08:54 |
XueFeng | Then we use https://review.openstack.org/#/c/437851/ to slove | 08:55 |
yanyanhu | hi, XueFeng, that fix is not for API change in keypair/network or compute | 08:56 |
XueFeng | OK | 08:56 |
yanyanhu | it is for a change in devstack which makes our local configuration failed to takes effect during gate installation | 08:56 |
yanyanhu | just as Qiming mentioned, we will revise the "cloud_backend=openstack_test" option for functional test | 08:57 |
yanyanhu | this change is added in pre_test_hook.sh | 08:57 |
XueFeng | Got it. Thanks yanyanhu, Qiming | 08:58 |
yanyanhu | so they are two different problems :) | 08:58 |
yanyanhu | one is about fixing gate by add 'KEEP_LOCALRC=1' | 08:58 |
xuhaiwei | Hi guys, sorry to bother you, I got this error when trying to deploy senlin resource from heat, I tried to create two policies(scalein/scaleout) and attach them to one cluster, but the first ACTION didn't release the lock though it has completed, this is the error log from senlin-engine http://paste.openstack.org/show/606016/ and the error log from heat-engine http://paste.openstack.org/show/606018/ | 08:58 |
yanyanhu | another is about better designing our tempest test case :) | 08:59 |
*** shu-mutou is now known as shu-mutou-AWAY | 08:59 | |
xuhaiwei | this is the heat template I am using http://paste.openstack.org/show/606019/ | 09:00 |
xuhaiwei | sorry to interrupt | 09:00 |
xuhaiwei | Can anyone take a look of it? yanyanhu, elynn, Qiming, Xuefeng | 09:03 |
yanyanhu | xuhaiwei, sounds like a bug. So the first action didn't release the cluster lock? | 09:03 |
yanyanhu | after it succeeded | 09:03 |
xuhaiwei | yanyanhu, yes it seems, the sencond POLICY_ATTACH action's status is ready, but didn't excuted | 09:04 |
Qiming | looks like a db concurrency error | 09:05 |
Qiming | this should never occur because basically you are inserting two records to the cluster_policy table | 09:06 |
Qiming | if the lock wasn't released by the first action, you may want to check senlin-engine log | 09:07 |
elynn | Hi xuhaiwei | 09:07 |
xuhaiwei | Qiming, I have pasted senlin-engine log above | 09:07 |
xuhaiwei | elynn: is there any mistake in template? | 09:08 |
elynn | seems some lock issue in senlin.... | 09:09 |
Qiming | the template looks fine to me | 09:09 |
elynn | The design in senlin resource is execute action one by one, so might need to dig deeper in senlin side.. | 09:10 |
elynn | I guess. | 09:10 |
xuhaiwei | can anyone recreate the error in your environment? | 09:10 |
elynn | xuhaiwei: Do you encounter this everytime? | 09:10 |
xuhaiwei | elynn: yes, every time | 09:10 |
elynn | hmm... | 09:10 |
elynn | Let me check heat codes first... | 09:11 |
Qiming | senlin side retry logic was removed .... | 09:13 |
xuhaiwei | I tried it again, error is still there, but the error log is different | 09:13 |
xuhaiwei | 2017-04-10 09:12:37.198 ERROR senlin.engine.senlin_lock [req-12813ee9-2974-4afd-b8e9-883aeb208814 None None] Cluster is already locked by action [u'8d4dfd7e-5c83-4426-9b6b-3284a70f3377'], action deeb1567-5e1e-4426-b0c9-dbd710603a6e failed grabbing the lock | 09:13 |
xuhaiwei | this is all the error log in senlin-engine | 09:14 |
Qiming | yep, that is normal | 09:14 |
Qiming | two requests arrived at senlin attempting to lock the same cluster | 09:15 |
Qiming | one of them will fail | 09:15 |
Qiming | because the other takes some time to release it | 09:15 |
Qiming | http://git.openstack.org/cgit/openstack/senlin/commit/senlin/engine/senlin_lock.py?id=b6e8d758a0e33547a3fd78c434337e80b0a826fa | 09:16 |
xuhaiwei | Qiming, the first action is finished, it will release the lock to the action which is in 'READY' status, isn't it? | 09:17 |
Qiming | yes | 09:17 |
Qiming | but the second action is supposed to do a RETRY | 09:17 |
XueFeng | else: # result == self.RES_RETRY: | 09:17 |
XueFeng | status = self.READY | 09:17 |
XueFeng | # Action failed at the moment, but can be retried | 09:17 |
XueFeng | # We abandon it and then notify other dispatchers to execute it | 09:17 |
XueFeng | ao.Action.abandon(self.context, self.id) | 09:17 |
XueFeng | Maybe here we can optimize | 09:18 |
XueFeng | the notify hasn't been send immidately. And we will try this action when next action come. | 09:19 |
Qiming | problem is here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/base.py#n479 | 09:20 |
Qiming | we somehow assigned a action.RES_ERROR to the result variable during exception handling on line 482 | 09:21 |
Qiming | that line should be removed | 09:21 |
Qiming | hopefully, the finally statement on line 491 will get executed and set the action to RES_RETRY | 09:21 |
Qiming | looks like action.execute() raised an exception here ? | 09:22 |
elynn | seems like so. | 09:24 |
*** dixiaoli has quit IRC | 09:24 | |
Qiming | so we are not properly getting a RES_RETRY return value | 09:24 |
Qiming | okay, here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/cluster_action.py#n1002 | 09:26 |
Qiming | we didn't expect cluster_lock_acquire to raise exceptions | 09:26 |
*** dixiaoli has joined #senlin | 09:26 | |
yanyanhu | hi, Qiming, in case there are some errors happen during the action execution and the status of some resources have been changed, if we simplly let the action to retry, more serious error could happen | 09:26 |
Qiming | so we only checked the returned value | 09:26 |
xuhaiwei | Qiming, it seems no exceptions happened in action.execute() | 09:26 |
yanyanhu | I mean simply removing line482 could cause some problems | 09:27 |
Qiming | xuhaiwei, yanyanhu, check execute() method in cluster_action | 09:27 |
Qiming | line 1002 | 09:27 |
yanyanhu | yes, we should catch and handle exception here | 09:27 |
Qiming | the cluster_lock_acquire may actuall throw an exception, in addition to return some meaningful value | 09:28 |
Qiming | we failed to catch it | 09:28 |
yanyanhu | yep | 09:28 |
Qiming | if we catch it here, it is a RES_RETRY | 09:28 |
Qiming | and it is safe | 09:28 |
yanyanhu | yes | 09:28 |
yanyanhu | it should be a try in this case | 09:28 |
xuhaiwei | Qiming, yanyanhu, I tested it, after catch the exception you said, it worked | 09:32 |
Qiming | cool | 09:33 |
Qiming | a different fix would be keep it at the db level | 09:33 |
yanyanhu | Qiming, +1 | 09:33 |
yanyanhu | that is better | 09:33 |
xuhaiwei | anyone is writing patch for this? | 09:36 |
Qiming | xuhaiwei as far as I know, :) | 09:36 |
xuhaiwei | Qiming, any hint on fixing it at db layer? | 09:37 |
Qiming | studying | 09:37 |
Qiming | to minimize the potential impact to other code | 09:38 |
yanyanhu | xuhaiwei, take this as an example :) | 09:38 |
yanyanhu | http://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n863 | 09:38 |
Qiming | I'd suggest we wrap the call here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n433 | 09:39 |
xuhaiwei | Qiming , yanyanhu, thanks , I will try to fix it | 09:40 |
Qiming | with a try: ... catch db_exc.DBDuplicateEntry: | 09:40 |
xuhaiwei | it seems DBDuplicateEntry is not happened every time | 09:41 |
Qiming | em, right | 09:42 |
Qiming | DBDuplicateEntry is a guard against DB level concurrency | 09:42 |
elynn | The weird thing here is that https://github.com/openstack/senlin/blob/master/senlin/db/sqlalchemy/api.py#L425 , two query seems happened at the same time. | 09:42 |
elynn | Then they both insert the lock. | 09:43 |
elynn | Qiming's way it a way to fix. | 09:43 |
Qiming | I don't have a good idea what the return value should be even if you caught the exception there | 09:43 |
elynn | Should be None | 09:43 |
elynn | I guess. | 09:43 |
Qiming | None is okay | 09:44 |
Qiming | but have to double check the senlin_lock module and see if None is properly handled, right? | 09:44 |
Qiming | no, should return [] | 09:45 |
Qiming | or else, you will get 'None is not iterable' or similar errors in senlin_lock | 09:46 |
elynn | Oh, yes, you are right. | 09:46 |
*** yanyanhu has quit IRC | 09:46 | |
Qiming | then it means cluster_lock_acquire from senlin_lock module will return False if lock has been grabbed | 09:47 |
Qiming | we don't need to change ClusterAction.execute() to add exception handling | 09:47 |
Qiming | however, removing line 482 here is still recommended, xuhaiwei : http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/base.py#n482 | 09:48 |
xuhaiwei | Qiming, ok, will test it | 09:50 |
xuhaiwei | Qiming, by remove line 482, the error is still remained | 09:54 |
Qiming | line 482 should be removed although it is not the key issue of the bug | 09:56 |
elynn | xuhaiwei: To fix your problem: catch DBDuplicateEntry and return [] at http://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n433 | 10:15 |
xuhaiwei | elynn: it worked? I will test it, thank you | 10:16 |
xuhaiwei | elynn: I am afraid the exception is not happend there | 10:16 |
elynn | it didn't work? | 10:19 |
xuhaiwei | elynn: yes, exception didn't happen there | 10:21 |
elynn | Maybe I misunderstand how session work. | 10:21 |
xuhaiwei | I will go home now, see you guys | 10:22 |
elynn | From your log, it is happened in cluster_lock_acquire xuhaiwei | 10:22 |
elynn | xuhaiwei: ttl | 10:23 |
xuhaiwei | yes | 10:23 |
*** elynn has quit IRC | 10:43 | |
*** dixiaoli has quit IRC | 10:48 | |
*** zhurong has quit IRC | 12:26 | |
*** catintheroof has joined #senlin | 12:36 | |
*** openstackgerrit has quit IRC | 13:33 | |
*** openstackgerrit has joined #senlin | 13:34 | |
*** ChanServ sets mode: +v openstackgerrit | 13:34 | |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Add prepare and cleanup for nova server local tempest https://review.openstack.org/455325 | 13:34 |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases https://review.openstack.org/455331 | 13:59 |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases https://review.openstack.org/455331 | 14:01 |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases https://review.openstack.org/455331 | 14:20 |
*** sweeneyb has quit IRC | 21:22 | |
*** catintheroof has quit IRC | 21:25 | |
*** catintheroof has joined #senlin | 22:27 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!