14:03:02 <andreykurilin> #startmeeting Rally
14:03:03 <openstack> Meeting started Mon Oct 31 14:03:02 2016 UTC and is due to finish in 60 minutes. The chair is andreykurilin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:07 <openstack> The meeting name has been set to 'rally'
14:03:11 <andreykurilin> o/
14:03:51 <amaretskiy> hi
14:03:57 <astarove> hi
14:05:19 <andreykurilin> I did not have enough time to prepare my notes related to summit, so I'll present them at the next meeting
14:05:39 <andreykurilin> for today we have one item on the agenda
14:05:43 <andreykurilin> let's start with it
14:05:56 <andreykurilin> #topic [astudenov] Let's discuss how db api should save task results data in chunks
14:07:23 <andreykurilin> astudenov proposed a change - https://review.openstack.org/#/c/391421/
14:07:35 <andreykurilin> it fixes an issue related to saving results
14:07:52 <andreykurilin> nice, astudenov here)
14:08:37 <astudenov> hi all
14:09:12 <andreykurilin> in case of multi-node rally deployment, such a fix will not work, since it is done on the db layer while saving full results. But it fixes the "current" issue with saving big results
14:09:33 <andreykurilin> astudenov: hi. I started intro of your work)
14:11:21 <astudenov> andreykurilin: ok, thanks
14:11:48 <andreykurilin> so here are several questions: at what stage should we apply "chunking"? What size should chunks be? chunk per iteration or a specific amount of data? should we merge the current fix and return to this question when work on the distributed runner starts?
14:12:10 <andreykurilin> amaretskiy: any ideas?
14:13:06 <amaretskiy> i think it is reasonable to have chunk size configurable and it should keep 1 or more iterations results
14:13:35 <amaretskiy> i need to review the fix, no ideas about merging right now
14:13:43 <amaretskiy> also I see -w on this patch
14:14:29 <andreykurilin> amaretskiy: yeah, it is just a wip. It was created to discuss the idea)
14:14:37 <andreykurilin> PoC
14:14:47 <amaretskiy> i will take a look at it
14:14:48 <astarove> I think that 1 chunk == 1 iteration is reasonable
14:15:33 <amaretskiy> in case 1 c = 1i - should we have chunk term at all ?
14:15:46 <astarove> It's small enough to store data and big enough to track results
14:16:17 <astarove> why not?
14:16:34 <andreykurilin> yeah, 1c = 1i sounds reasonable, but it can be redundant in case of very simple scenarios (with one atomic action)
14:16:39 <amaretskiy> if 1c=1i then we have duplicated entity
14:16:48 <astarove> "chunk" is a part of data
14:17:06 <amaretskiy> as well as iteration results :)
14:18:16 <astarove> <andreykurilin> is it possible to set some limit on data&
14:18:18 <astarove> ?
14:18:23 <astudenov> also we need to take into account that we planned to compress chunks
14:19:48 <amaretskiy> we are currently compressing results
14:20:00 <andreykurilin> btw, in case of 1c = 1i and rps runner (let it be 400rps), we can kill database
14:20:01 <andreykurilin> lol
14:20:08 <amaretskiy> this is the only choice because json has redundant size
14:20:47 <amaretskiy> chunk size should be as large as possible
14:20:49 <amaretskiy> imho
14:21:01 <amaretskiy> 100 iterations :)
14:21:06 <amaretskiy> or more
14:21:08 <astarove> so actually the question is to set limit on chunk size
14:21:28 <astudenov> 1c=1i seems not right to me
14:21:42 <amaretskiy> the number of iterations in a chunk can be easily changed, i guess
14:21:59 <amaretskiy> why not having it configurable?
14:22:15 <amaretskiy> even via python constant for first time
14:22:18 <andreykurilin> we will make it configurable, but we need to set default value:)
14:22:24 <astudenov> amaretskiy: yes, I think it should be a config file field
14:22:26 <amaretskiy> 100
14:22:31 <amaretskiy> 42 :)
14:22:40 <andreykurilin> 42
14:22:41 <astarove> 42 (+1)
14:22:44 <andreykurilin> +1 for 42
14:22:45 <andreykurilin> lol
14:22:54 <amaretskiy> so we have 42 + 1 +1 = 44
14:23:02 <amaretskiy> :-D
14:23:03 <andreykurilin> xD
14:23:11 <astudenov> i am currently using 1k
14:23:25 <amaretskiy> 420
14:24:04 <amaretskiy> initial value is not critical, i think
14:24:09 <andreykurilin> it looks like we have first agreement
14:24:22 <astarove> 1k is not much better than 42 )
14:24:30 <amaretskiy> 1024
14:24:49 <andreykurilin> #agreement the size of chunks should be configurable
14:25:12 <amaretskiy> #vote for configurable and initial value >=42
14:25:23 <andreykurilin> let it be 1k
14:25:31 <amaretskiy> #agreed
14:25:37 <astarove> #agreed
14:25:37 <andreykurilin> nice
14:25:47 <andreykurilin> so we closed one of several questions
14:26:01 <astudenov> #agreed 1k iterations per chunk
14:26:02 <andreykurilin> another one - on which stage should we save results to db?
14:27:00 <astarove> together with preparation of report?
14:27:43 <astudenov> current solution can be merged only as a temporary fix for saving big results.
14:27:47 <amaretskiy> report is not related to saving data
14:28:38 <andreykurilin> astudenov: if we merge current fix, there is a possibility that your boss will be satisfied and will not allow to continue work on improments
14:28:39 <andreykurilin> lol
14:28:49 <andreykurilin> *improvements
14:28:56 <amaretskiy> i think the most efficient way is saving data in the very end of scenario run, however this requires having enough rom
14:29:42 <amaretskiy> if some issues are expected then let's save chunks as soon as we have them
14:30:05 <astudenov> so result consumer should save results by chunks too
14:30:28 <andreykurilin> saving chunks asap can lead to rally performance degradation due to jsonschema validation of data
14:30:58 <andreykurilin> but yeah, it would be nice to save them asap
14:32:02 <andreykurilin> chunk size should be as big as possible to reduce number of validation calls
14:32:23 <andreykurilin> or validation on smaller parts can affect performance in less way
14:32:25 <andreykurilin> hm..
14:32:33 <andreykurilin> we should check it:)
14:32:43 <andreykurilin> *validation of smaller
14:33:23 <amaretskiy> how about pure-python validation (fast)
14:33:57 <astudenov> as far as i know we don't use jsonschema for validating results? https://github.com/openstack/rally/blob/master/rally/task/runner.py#L241-L301
14:34:34 <andreykurilin> oh... I forgot that we already abandoned jsonschema for task results
14:35:18 <andreykurilin> ok, so I'm ok to save chunks as soon as we have them
14:35:28 <andreykurilin> and it should not be hard to implement
14:38:29 <andreykurilin> do we have an agreement?
14:39:41 <andreykurilin> amaretskiy astarove astudenov ^
14:41:14 <astudenov> I agree that we should save chunk as soon as we have 1k iterations in results consumer
14:41:45 <andreykurilin> ok
14:41:52 <astarove> looks like we have one more agreement
14:41:59 <andreykurilin> nice)
14:42:12 <andreykurilin> ok, anything else to discuss on current topic?
14:42:59 <amaretskiy> nothing from me
14:43:06 <astudenov> me too
14:43:10 <andreykurilin> ok
14:43:14 <astarove> no
14:43:22 <andreykurilin> #topic Free discussion
14:43:57 <andreykurilin> I found a perfect Irish Pub... Is it a good topic for discussion?))
14:45:21 <astudenov> :)
14:46:00 <andreykurilin> ok, if no one has topics to discuss, let's finish our meeting and I'll go there
14:46:04 <astarove> it's good topic for selfie
14:46:14 <andreykurilin> )))
14:46:33 <andreykurilin> thanks guys for participation
14:46:35 <andreykurilin> see you
14:46:38 <andreykurilin> #endmeeting
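
Editor's sketch of the two agreements reached above (configurable chunk size with a 1k default, and saving each chunk as soon as the results consumer has collected that many iterations). This is only an illustration, not Rally's actual code: the names ChunkedResultsConsumer, save_chunk and DEFAULT_CHUNK_SIZE are hypothetical, and the JSON-plus-zlib compression is an assumption standing in for whatever compression Rally really uses.

```python
import json
import zlib

# Hypothetical default; the meeting settled on "1k iterations per chunk".
DEFAULT_CHUNK_SIZE = 1000


class ChunkedResultsConsumer:
    """Buffers iteration results and hands them off in compressed chunks.

    `save_chunk` is a hypothetical callback (for example, a db-layer
    function) that receives one compressed chunk as bytes.
    """

    def __init__(self, save_chunk, chunk_size=DEFAULT_CHUNK_SIZE):
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        self._save_chunk = save_chunk
        self._chunk_size = chunk_size
        self._buffer = []

    def add_iteration(self, result):
        """Store one iteration result; save a chunk once the buffer is full."""
        self._buffer.append(result)
        if len(self._buffer) >= self._chunk_size:
            self.flush()

    def flush(self):
        """Save whatever is buffered (call once more when the run ends)."""
        if not self._buffer:
            return
        # Assumption: chunks are serialized as JSON and compressed with zlib.
        payload = zlib.compress(json.dumps(self._buffer).encode("utf-8"))
        self._save_chunk(payload)
        self._buffer = []


def read_chunk(payload):
    """Inverse of the serialization above: bytes back to a list of results."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

With a 400 rps runner and the 1k default, this would write to the database roughly once every 2.5 seconds instead of 400 times per second, which is the concern raised against 1 chunk == 1 iteration.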