#openstack-meeting log

14:03:02 <andreykurilin> #startmeeting Rally
14:03:03 <openstack> Meeting started Mon Oct 31 14:03:02 2016 UTC and is due to finish in 60 minutes.  The chair is andreykurilin. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:07 <openstack> The meeting name has been set to 'rally'
14:03:11 <andreykurilin> o/
14:03:51 <amaretskiy> hi
14:03:57 <astarove> hi
14:05:19 <andreykurilin> I did not have enough time to prepare my notes related to summit, so I'll present them at next meeting
14:05:39 <andreykurilin> for today we have one item from agenda
14:05:43 <andreykurilin> let's start from it
14:05:56 <andreykurilin> #topic  [astudenov] Let's discuss how db api should save task results data in chunks
14:07:23 <andreykurilin> astudenov proposed a change - https://review.openstack.org/#/c/391421/
14:07:35 <andreykurilin> it fixes an issue related to saving results
14:07:52 <andreykurilin> nice, astudenov here)
14:08:37 <astudenov> hi all
14:09:12 <andreykurilin> in case of multi-node rally deployment, such fix will not work, since it is done on db-layer while saving full results. But it fixes "current" issue with saving big results
14:09:33 <andreykurilin> astudenov: hi. I started intro of your work)
14:11:21 <astudenov> andreykurilin: ok, thanks
14:11:48 <andreykurilin> so here are several questions: at what stage should we apply "chunking"? What size should chunks be? chunk per iteration or specific amount of data? should we merge current fix and return to this question when work on distributed runner starts?
14:12:10 <andreykurilin> amaretskiy: any ideas?
14:13:06 <amaretskiy> i think it is reasonable to have chunk size configurable and it should keep the results of 1 or more iterations
14:13:35 <amaretskiy> i need to review the fix, no ideas about merging right now
14:13:43 <amaretskiy> also I see -w on this patch
14:14:29 <andreykurilin> amaretskiy: yeah, it is just a wip. It was created to discuss the idea)
14:14:37 <andreykurilin> PoC
14:14:47 <amaretskiy> i will take a look on it
14:14:48 <astarove> I think that 1 chunk == 1 iteration is reasonable
14:15:33 <amaretskiy> in case 1 c = 1i - should we have chunk term at all ?
14:15:46 <astarove> It's small enough to store data and big enough to track results
14:16:17 <astarove> why not?
14:16:34 <andreykurilin> yeah, 1c = 1i sounds reasonable, but it can be redundant in case of very simple scenarios(with one atomic action)
14:16:39 <amaretskiy> if 1c=1i then we have duplicated entity
14:16:48 <astarove> "chunk" is a part of data
14:17:06 <amaretskiy> as well as iteration results :)
14:18:16 <astarove> andreykurilin: is it possible to set some limit on data
14:18:18 <astarove> ?
14:18:23 <astudenov> also we need to take into account that we planned to compress chunks
14:19:48 <amaretskiy> we are currently compressing results
14:20:00 <andreykurilin> btw, in case of 1c = 1i and rps runner(let it be 400rps), we can kill database
14:20:01 <andreykurilin> lol
14:20:08 <amaretskiy> this is the only choice because json has redundant size
14:20:47 <amaretskiy> chunk size should be as max as possible
14:20:49 <amaretskiy> imho
14:21:01 <amaretskiy> 100 iterations :)
14:21:06 <amaretskiy> or more
14:21:08 <astarove> so actually the question is to set limit on chunk size
14:21:28 <astudenov> 1c=1i seems not right to me
14:21:42 <amaretskiy> the number of iterations in chunk can be easily changed, i guess
14:21:59 <amaretskiy> why not having it configurable?
14:22:15 <amaretskiy> even via python constant for first time
14:22:18 <andreykurilin> we will make it configurable, but we need to set default value:)
14:22:24 <astudenov> amaretskiy: yes, I think it should be a config file field
14:22:26 <amaretskiy> 100
14:22:31 <amaretskiy> 42 :)
14:22:40 <andreykurilin> 42
14:22:41 <astarove> 42 (+1)
14:22:44 <andreykurilin> +1 for 42
14:22:45 <andreykurilin> lol
14:22:54 <amaretskiy> so we have 42 + 1 +1 = 44
14:23:02 <amaretskiy> :-D
14:23:03 <andreykurilin> xD
14:23:11 <astudenov> i am currently using 1k
14:23:25 <amaretskiy> 420
14:24:04 <amaretskiy> initial value is not critical, i think
14:24:09 <andreykurilin> it looks like we have first agreement
14:24:22 <astarove> 1k is not much better than 42 )
14:24:30 <amaretskiy> 1024
14:24:49 <andreykurilin> #agreement the size of chunks should be configurable
14:25:12 <amaretskiy> #vote for configurable and initial value >=42
14:25:23 <andreykurilin> let it be 1k
14:25:31 <amaretskiy> #agreed
14:25:37 <astarove> #agreed
14:25:37 <andreykurilin> nice
14:25:47 <andreykurilin> so we closed one of several questions
14:26:01 <astudenov> #agreed 1k iterations per chunk
14:26:02 <andreykurilin> another one - on which stage should we save results to db?
14:27:00 <astarove> together with preparation of report?
14:27:43 <astudenov> current solution can be merged only as a temporary fix for saving big results.
14:27:47 <amaretskiy> report is not related to saving data
14:28:38 <andreykurilin> astudenov: if we merge current fix, there is a possibility that your boss will be satisfied and will not allow to continue work on improments
14:28:39 <andreykurilin> lol
14:28:49 <andreykurilin> *improvements
14:28:56 <amaretskiy> i think the most efficient way is saving data at the very end of scenario run, however this requires having enough RAM
14:29:42 <amaretskiy> if some issues are expected then let's save chunks as soon as we have them
14:30:05 <astudenov> so result consumer should save results by chunks too
14:30:28 <andreykurilin> saving chunks asap can lead to rally performance degradation due to jsonschema validation of data
14:30:58 <andreykurilin> but yeah, it would be nice to save them asap
14:32:02 <andreykurilin> chunk size should be as big as possible to reduce number of validation calls
14:32:23 <andreykurilin> or validation on smaller parts can affect performance in less way
14:32:25 <andreykurilin> hm..
14:32:33 <andreykurilin> we should check it:)
14:32:43 <andreykurilin> *validation of smaller
14:33:23 <amaretskiy> how about pure-python validation (fast)
14:33:57 <astudenov> as far as i know we don't use jsonschema for validating results? https://github.com/openstack/rally/blob/master/rally/task/runner.py#L241-L301
14:34:34 <andreykurilin> oh... I forgot that we already abandoned jsonschema for task results
14:35:18 <andreykurilin> ok, so I'm ok to save chunks as soon as we have them
14:35:28 <andreykurilin> and it should not be hard to implement
14:38:29 <andreykurilin> do we have an agreement?
14:39:41 <andreykurilin> amaretskiy astarove astudenov ^
14:41:14 <astudenov> I agree that we should save chunk as soon as we have 1k iterations in results consumer
14:41:45 <andreykurilin> ok
14:41:52 <astarove> looks like we have one more agreement
14:41:59 <andreykurilin> nice)
14:42:12 <andreykurilin> ok, anything else to discuss on current topic?
14:42:59 <amaretskiy> nothing from me
14:43:06 <astudenov> me too
14:43:10 <andreykurilin> ok
14:43:14 <astarove> no
14:43:22 <andreykurilin> #topic Free discussion
14:43:57 <andreykurilin> I found a perfect Irish Pub... Is it a good topic for discussion?))
14:45:21 <astudenov> :)
14:46:00 <andreykurilin> ok, if no one has topics to discuss, let's finish our meeting and I'll go there
14:46:04 <astarove> it's a good topic for a selfie
14:46:14 <andreykurilin> )))
14:46:33 <andreykurilin> thanks guys for participation
14:46:35 <andreykurilin> see you
14:46:38 <andreykurilin> #endmeeting
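
A minimal sketch of the two agreements reached in the meeting: the chunk size is configurable with a default of 1k iterations, and the results consumer saves each chunk as soon as it has that many iterations. All names here (`ChunkedResultBuffer`, `save_chunk`, `DEFAULT_CHUNK_SIZE`) are hypothetical and are not taken from Rally or from the patch under review; the `save_chunk` callback only stands in for whatever db api call would store one chunk.

```python
DEFAULT_CHUNK_SIZE = 1000  # "1k iterations per chunk"; assumed to come from config


class ChunkedResultBuffer:
    """Hypothetical helper: collects iteration results, saves them in chunks."""

    def __init__(self, save_chunk, chunk_size=DEFAULT_CHUNK_SIZE):
        if chunk_size < 1:
            raise ValueError("chunk_size must be >= 1")
        self._save_chunk = save_chunk   # stand-in for a db api call
        self._chunk_size = chunk_size
        self._pending = []

    def add(self, iteration_result):
        """Buffer one iteration result; save a chunk once it is full."""
        self._pending.append(iteration_result)
        if len(self._pending) >= self._chunk_size:
            self.flush()

    def flush(self):
        """Save whatever is buffered (call once more when the run ends)."""
        if self._pending:
            self._save_chunk(self._pending)
            self._pending = []  # new list, so the saved chunk is not mutated


# Usage with a tiny chunk size and an in-memory list instead of a database.
saved_chunks = []
buf = ChunkedResultBuffer(saved_chunks.append, chunk_size=3)
for i in range(7):
    buf.add({"iteration": i, "duration": 0.1})
buf.flush()  # the last, partially filled chunk
```

With a larger chunk size there are fewer writes but more results held in memory between writes, which is the trade-off raised in the discussion of 1 chunk = 1 iteration versus saving everything at the end.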