14:01:31 <rvasilets_> #startmeeting Rally
14:01:32 <openstack> Meeting started Mon Feb 15 14:01:31 2016 UTC and is due to finish in 60 minutes. The chair is rvasilets_. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:35 <openstack> The meeting name has been set to 'rally'
14:01:48 <rvasilets_> Hi to all
14:03:07 <andreykurilin> hi!
14:03:09 <andreykurilin> o/
14:03:11 <rvasilets_> looks like we have not a lot of topics for today
14:03:20 <andreykurilin> are you sure?)
14:03:26 <rvasilets_> Lets wait a bit
14:03:29 <ikhudoshyn> o/
14:03:38 <rvasilets_> Maybe someone else join to us
14:04:18 <ikhudoshyn> andreykurilin: around? you seem to had a topic
14:05:00 <andreykurilin> yes, I'm here:) let's wait for another attenders a bit
14:05:23 <ikhudoshyn> i'd love to report 'bout my install_rally.sh refactoring (we want to install rally in venv on gates) but i'm at the middle of testing
14:05:30 <andreykurilin> :)
14:05:47 <ikhudoshyn> ..so I won't take your time today))
14:06:02 <andreykurilin> let's start?
14:06:07 <rvasilets_> Okey
14:06:14 <andreykurilin> I have a one topic
14:06:32 <andreykurilin> raised by saurabh__ in our main chat today
14:06:43 <rvasilets_> andreykurilin, lets start from you
14:06:48 <andreykurilin> ok ok
14:06:53 <rvasilets_> what is the topic?
14:07:15 <andreykurilin> keystone can kill rally
14:07:16 <andreykurilin> lol
14:07:26 <andreykurilin> sounds like a good topic :D
14:07:37 <andreykurilin> #topic keystone can kill rally
14:07:51 <andreykurilin> rvasilets, can you set a topic?
14:08:03 <rvasilets_> #topic keystone can kill rally
14:08:26 <andreykurilin> nice:)
14:08:39 <rvasilets_> this is my privilege)
14:09:36 <andreykurilin> In case of "dead" keystone and big number of parallel iterations, keystoneclient will open a lot of sockets
14:10:32 <andreykurilin> saurabh__ faced with the issue, when rally was unable to open a db file to write the results of task
14:10:45 <andreykurilin> sqlite was used
14:11:03 <rvasilets_> Is that a problem of rally or sqlite for example?
14:11:43 <rvasilets_> This is limitation of sqlite not Rally
14:11:48 <rvasilets_> possibly
14:11:50 <rvasilets_> ?
14:11:57 <andreykurilin> it is limitation of the system in general
14:12:31 <andreykurilin> the problem on the rally side - we don't handle such cases
14:13:35 <andreykurilin> Maybe, we can check the limit before saving results and increase it if possible
14:13:38 <rvasilets_> Could be fix this somehow?
14:14:00 <andreykurilin> At least, we can catch the error and write user-friendly error
14:14:06 <rvasilets_> limit of what?
14:14:31 <andreykurilin> limit of "open files"
14:14:36 <ikhudoshyn> andreykurilin: sorry i kinda dont follow. how lots of open sockets prevent us from writing to sqlite?
14:14:47 <ikhudoshyn> i see
14:15:26 <andreykurilin> ikhudoshyn: it depends on system settings
14:15:35 <ikhudoshyn> maybe just post a warning?
14:15:46 <andreykurilin> when?)
14:15:54 <ikhudoshyn> during parsing of scenario?
14:15:57 <rvasilets_> the biggest thin that we could to do is raise user friendly msg here
14:16:24 <andreykurilin> ikhudoshyn: each time? it will bother
14:16:41 <andreykurilin> we have already 2 warnings(from boto and from requests)
14:16:48 <andreykurilin> and I want to remove them:)
14:16:49 <ikhudoshyn> like 'dear user you are to run lots of iterations, you might need many open sockets, pls make sure you can'
14:17:27 <ikhudoshyn> can you increase limits in runtime, not being a root?
14:17:45 <andreykurilin> ikhudoshyn: I suppose we can check the system limit before launching task and print a warning
14:17:56 <andreykurilin> ikhudoshyn: I don't have such experience:)
14:18:26 <ikhudoshyn> andreykurilin: that's what i suggested, that did not seem to satisfy u
14:18:35 <rvasilets_> Did we filled the bug?
14:18:51 <andreykurilin> ikhudoshyn: https://docs.python.org/2/library/resource.html#resource.setrlimit
14:18:56 <ikhudoshyn> I mean we parse scenario, check limits, if they are too low -- we warn
14:19:03 <rvasilets_> this is really bad thin
14:19:11 <andreykurilin> maybe, it is possible to change a limit
14:19:20 <andreykurilin> but we need to check
14:19:41 <andreykurilin> rvasilets_: no, we don't have filed bug yet
14:19:57 <ikhudoshyn> well, I'm not sure this could be a good idea -- changing system settings quietly
14:20:11 <rvasilets_> agree
14:20:35 <rvasilets_> we should just show error or warning
14:20:44 <rvasilets_> an steps how to fix it
14:21:15 <ikhudoshyn> andreykurilin: what d'you think?
14:22:20 <andreykurilin> ikhudoshyn: It would be nice to have the check proposed by you, in case of sqlite backend and user-friendly error in db-layer
14:23:08 <ikhudoshyn> why db layer? I believe it's a somewhat wider issue
14:23:51 <ikhudoshyn> like we e.g. could run in an issue when we're unable to open sockets as well as files?
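[Editor's note: the pre-launch limit check discussed above could look roughly like the following. This is only an illustrative sketch using the stdlib `resource` module linked in the log; the function name, the `reserve` margin, and the warning text are invented for the example and are not rally's actual API.]

```python
import resource
import warnings

def check_open_files_limit(concurrency, reserve=64):
    """Warn if the soft "open files" limit looks too low for the planned
    number of parallel iterations. Each iteration may hold a socket, and
    we keep a margin (``reserve``) for the db file, logs, etc."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = concurrency + reserve
    if soft != resource.RLIM_INFINITY and soft < needed:
        warnings.warn(
            "Planned concurrency %d may require ~%d open files, but the "
            "current soft limit is %d. Consider raising it, e.g. with "
            "`ulimit -n %d`." % (concurrency, needed, soft, needed))
        return False
    return True
```

Raising the soft limit up to the hard limit via `resource.setrlimit` does not require root, which is why the warning-only approach (rather than silently changing system settings) was preferred in the discussion.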
14:24:20 <andreykurilin> currently, we faced we such issue at db-layer:) http://paste.openstack.org/show/486988/
14:24:42 <ikhudoshyn> so it is not just 'we'll be possibly unable ti store results in db' but 'we'll be possibly unable to do any writes/reads'
14:25:17 <andreykurilin> yes
14:25:53 <ikhudoshyn> so db layer does not look like the very best place)
14:27:02 <andreykurilin> task layer already tries to catch all errors and wrap them with user-friendly exception
14:27:56 <andreykurilin> and only db-layer is not wrapped with any try...except
14:29:11 <ikhudoshyn> it's not good) but it is not necessarily connected to system limits
14:29:30 <rvasilets_> we need bigger count of reraising =)
14:30:49 <andreykurilin> ikhudoshyn: Are you talking about the paste posted above?
14:31:17 <ikhudoshyn> nope, i'm talking about the 'limits' issue in general
14:32:31 <andreykurilin> I know about only one limit - open files:) which relates to open new files and new sockets and new threads:)
14:32:47 <ikhudoshyn> if we're sure that the 'pasebin' issue related to 'limits' -- even then i dont think it is a good idea to catch exception and print warning like 'shit happens during operationg with sqlite -- check limits'
14:33:18 <ikhudoshyn> andreykurilin: yes -- we are talking about THAT limit )
14:33:51 <rvasilets_> =)
14:35:43 <ikhudoshyn> so...
14:35:52 <andreykurilin> why you don't think that it is a good idea? imo, it would be nice to catch such errors, maybe, execute "time.sleep()" and try again
14:36:19 <rvasilets_> where we would sleep() ?)
14:36:45 <ikhudoshyn> rvasilets_: at home)
14:36:54 <rvasilets_> reraising error we could lost the trace
14:37:12 <rvasilets_> and not found exact occurance of error
14:37:23 <rvasilets_> I'm for warning
14:37:36 <ikhudoshyn> andreykurilin: from what I could see in the paste -- nothing gives any hint that the issue is related with the limit of open files
14:37:56 <andreykurilin> ikhudoshyn: It was not a full log
14:37:57 <andreykurilin> lol
14:38:03 <andreykurilin> http://paste.openstack.org/show/486959/
14:38:05 <andreykurilin> look here
14:38:07 <ikhudoshyn> hm.. nice,)
14:38:10 <andreykurilin> L3
14:38:53 <andreykurilin> rvasilets_: we have log.exception to store an original trace
14:39:05 <ikhudoshyn> Failed to consume a task from the queue: Unable to establish connection to https://192.169.123.50:5000/v2.0
14:39:22 <ikhudoshyn> see? we got lot's of issues here, not just db related
14:40:04 <andreykurilin> ikhudoshyn: i start this topic from the phrase "keystone can kill rally"
14:40:06 <andreykurilin> :)
14:40:13 <ikhudoshyn> so I strongly suggest to print a warning during scenario parsing/validation
14:40:23 <ikhudoshyn> ))
14:40:28 <rvasilets_> yea) +1 for warning)
14:40:41 <andreykurilin> so, keystone is dead -> keystoneclient continue to open new sockets -> rally failed to write the results
14:41:05 <ikhudoshyn> andreykurilin: it can indeed. but catching exception at db layer won't save us anyway))
14:41:56 <andreykurilin> we can print the results(json.dumps) in stdout in case of unability to save in db
14:41:57 <andreykurilin> lol
14:42:05 <rvasilets_> lol
14:42:14 <ikhudoshyn> we don't want to 'sleep()' until all ks client connections close and release file handlers, do we?
14:42:49 <andreykurilin> we can add one more tries in several seconds
14:42:54 <andreykurilin> in can help
14:43:02 <andreykurilin> and it will not produce a big delay
14:43:14 <andreykurilin> *it can help
14:43:20 <ikhudoshyn> and what if we just run out of disk space))?
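[Editor's note: the "one more try in several seconds" idea andreykurilin floats above could be sketched as a small retry wrapper around the db write. The helper below is hypothetical (it is not rally's db layer); it only shows the catch-sleep-retry shape under discussion, where a transient `sqlite3.OperationalError` gets a second chance before the original exception is re-raised with its trace intact.]

```python
import sqlite3
import time

def save_with_retry(save_fn, attempts=3, delay=1.0):
    """Call ``save_fn`` (a zero-argument db-write callable), retrying a
    few times on sqlite3.OperationalError with a short sleep between
    tries. The last failure is re-raised so the trace is not lost."""
    for attempt in range(attempts):
        try:
            return save_fn()
        except sqlite3.OperationalError:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the original error
            time.sleep(delay)
```

As ikhudoshyn notes, this only helps if the file descriptors are actually released during the delay; if keystoneclient keeps opening sockets (or the disk is full), every retry fails the same way.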
14:43:56 <ikhudoshyn> we're still won't be able to store data, after 1 sec of after 100
14:44:18 <andreykurilin> yes, but it is another issue, which should be fixed separated
14:44:22 <andreykurilin> or don't fixed:)
14:45:37 <ikhudoshyn> disagree. the issues is 'we can't write db'
14:46:08 <rvasilets_> the error would be sqlite3.OperationalError
14:46:28 <rvasilets_> the same as for many things)
14:46:39 <ikhudoshyn> what we could do is to list all possible reasons to user (which i don't believe to be a good idea) or to describe the reasons in a runbook
14:46:43 <andreykurilin> ok, but this issue can appeared in different cases. so of them can be processed and handled, another - not
14:46:55 <rvasilets_> only difference is the trace) and possibly msg
14:47:05 <andreykurilin> +1 for a runbook
14:47:23 <ikhudoshyn> ))
14:47:43 <ikhudoshyn> and getting back to the separate warning -- do we need it?
14:48:34 <andreykurilin> I prefer a warning after the event :)
14:49:21 <ikhudoshyn> which event? sqlite3.OperationalError
14:49:23 <ikhudoshyn> ?
14:50:07 <rvasilets_> catching the error by checking the number of used resources and available and writing warning - possibly we can, I don't see here the evil
14:50:14 <ikhudoshyn> so to be consistent you should say 'db shit happens -- pls check limits, free space, access rights.. what else?'
14:51:21 <rvasilets_> we could catch to many files opens
14:51:24 <andreykurilin> we can check limits and free space in "catch" code
14:51:35 <andreykurilin> I write a proper message
14:51:35 <rvasilets_> and check keystone
14:51:37 <rvasilets_> here
14:52:11 <rvasilets_> bECAUSE FAILED KEYSTONE UNDER LOAD THIS IS COMMON PROBLEM
14:52:15 <rvasilets_> sorry
14:52:29 <ikhudoshyn> ok i give up... you are to creathe the whole recovery and diagnostic system for just one specific case
14:52:46 <rvasilets_> and all this stuff was caused by failed keystone
14:53:05 <rvasilets_> )
14:53:12 <ikhudoshyn> rvasilets_: yesssss, but we're talking about file limits and not ks at all
14:53:19 <rvasilets_> we could use th simple rule
14:53:22 <rvasilets_> 80/20
14:54:02 <ikhudoshyn> so what 80/20 gonna tell you in a case of sqlite3.OperationalError
14:54:04 <ikhudoshyn> ?
14:54:14 <ikhudoshyn> ks sux?
14:54:35 <andreykurilin> :)
14:54:49 <andreykurilin> ikhudoshyn: btw, we already tries to catch something - https://github.com/openstack/rally/blob/master/rally/cli/cliutils.py#L567-L572
14:54:57 <ikhudoshyn> if we see gazillion of nable to establish connection to https://192.169.123.50:5000/v2.0
14:55:26 <ikhudoshyn> we could say it is a ks issue, but if we cant write to db -- why ks is here at all?
14:56:09 <andreykurilin> ikhudoshyn: Can we reserve one "open file" for db-stuff?
14:56:24 <ikhudoshyn> andreykurilin: that is a sample of good warning. 'db issue -- pls check yr db'
14:56:57 <ikhudoshyn> but 'db issue -- pls check yr limits.. or check yr ks' -- it is a bad warning))
14:57:20 <andreykurilin> ikhudoshyn: ok, but user will check the rights and disk space and will not find the reason of issue
14:57:33 <ikhudoshyn> andreykurilin: I dont know. If we could -- it would be great
14:58:07 <andreykurilin> ikhudoshyn: free resources and disk space can be checked by us ;)
14:58:09 <ikhudoshyn> i was thinking about keeping it always open -- but it could be fragile
14:58:12 <andreykurilin> and even rights
14:58:57 <rvasilets_> We have not much time here do we have any agreed?
14:58:59 <rvasilets_> )
14:59:09 <ikhudoshyn> andreykurilin: ^^ hm.. do you want to check everything in a case of issue?
14:59:14 <andreykurilin> maybe
14:59:18 <ikhudoshyn> btw we're out of time
14:59:19 <andreykurilin> why not?
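[Editor's note: andreykurilin's point that "free resources and disk space can be checked by us" in the `catch` code could look roughly like this. Everything here is an illustrative sketch, not rally code: the function name, the headroom and free-space thresholds, and the `/proc/self/fd` trick (Linux-only) are all assumptions made for the example.]

```python
import os
import resource
import shutil

def diagnose_db_failure(db_path="."):
    """Collect likely system-level causes of a failed sqlite write so a
    caught sqlite3.OperationalError can be reported with hints instead
    of a bare traceback. Returns a list of human-readable hints."""
    hints = []
    db_dir = os.path.dirname(os.path.abspath(db_path)) or "."

    # Near the "open files" limit? (fd count via /proc is Linux-only)
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if os.path.isdir("/proc/self/fd"):
        used = len(os.listdir("/proc/self/fd"))
        if soft != resource.RLIM_INFINITY and soft - used < 8:
            hints.append("process is near its open-files limit "
                         "(%d of %d in use)" % (used, soft))

    # Out of disk space on the db filesystem?
    if shutil.disk_usage(db_dir).free < 10 * 1024 * 1024:
        hints.append("less than 10 MiB free on the db filesystem")

    # Missing write permission on the db directory?
    if not os.access(db_dir, os.W_OK):
        hints.append("db directory %s is not writable" % db_dir)

    return hints or ["no obvious system-level cause found"]
```

This is the middle ground the discussion circles: instead of one vague warning listing every possible reason ("check limits.. or check yr ks"), the handler checks each condition itself and only reports the ones that actually hold.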
14:59:34 <rvasilets_> Okey
14:59:37 <andreykurilin> let's move to our general chat
14:59:42 <andreykurilin> *main
14:59:45 <ikhudoshyn> lets continue in slack
14:59:47 <rvasilets_> #agree almost agreed)
15:00:01 <rvasilets_> See you next meeting
15:00:05 <andreykurilin> see you
15:00:09 <rvasilets_> #endmeeting