16:02:45 <mlavalle> #startmeeting neutron_performance
16:02:46 <openstack> Meeting started Mon Oct 22 16:02:45 2018 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:49 <openstack> The meeting name has been set to 'neutron_performance'
16:02:50 <slaweq> hi
16:03:31 <njohnston> hello
16:03:46 <mlavalle> njohnston, slaweq, rubasov: thanks for attending
16:03:51 <rubasov> hello again
16:04:33 <clarkb> mlavalle: ah well they stopped ~12 hours ago so I don't know anyone is still around to do it for you :)
16:05:06 <mlavalle> #topic Discussion
16:05:33 <mlavalle> Earlier today I sent each one of you (and others) a message with html files
16:05:42 <mlavalle> did everyone see it?
16:05:56 <slaweq> I'm just looking at them now
16:06:06 <rubasov> mlavalle: yes, I'm looking too
16:06:29 <rubasov> mlavalle: could you please give access to the add-router-interface.html too?
16:06:44 <rubasov> mlavalle: for some reason I can't open it
16:06:58 <slaweq> yes, I don't have access to this one either
16:07:31 <mlavalle> sure I'll resend it soon
16:08:09 <mlavalle> in the meantime, let me make a few general statements about them
16:09:46 <mlavalle> if you want to deploy a test environment with devstack enabling osprofiler, you just need to add two lines to your local.conf file:
16:09:59 <mlavalle> enable_plugin osprofiler https://git.openstack.org/openstack/osprofiler
16:10:13 <mlavalle> OSPROFILER_COLLECTOR=redis
16:10:25 <mlavalle> it doesn't have to be redis
16:10:46 <njohnston> if you use redis, do you have to already have that up and running, or does devstack handle setting it up for you?
16:10:54 <mlavalle> there are other options. But I don't have any preference and redis seems to be the most used by the osprofiler team
16:11:09 <mlavalle> njohnston: devstack sets it up for you
16:11:14 <njohnston> nice
16:11:50 <mlavalle> It must be part of what the osprofiler plugin for devstack does for you
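For reference, the two settings mlavalle quoted would sit in a devstack local.conf roughly like this (a sketch of a minimal fragment; `[[local|localrc]]` is devstack's standard section name):

```ini
[[local|localrc]]
# Enable the osprofiler devstack plugin; devstack installs and
# configures the redis collector itself, per mlavalle's note above.
enable_plugin osprofiler https://git.openstack.org/openstack/osprofiler
OSPROFILER_COLLECTOR=redis
```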
16:13:11 <mlavalle> when stack.sh is done, you will see the following section added to neutron.conf: http://paste.openstack.org/show/732600/
16:13:47 <mlavalle> Please note that with this setup, you enable tracing in general and sqlalchemy tracing
16:14:14 <mlavalle> and as you can see, the 'SECRET_KEY' is the default
16:14:36 <rubasov> side note: and similar config should also be added to all other openstack component config files when devstack is done
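The `[profiler]` section that devstack appends to neutron.conf (and, per rubasov's side note, to the other components' config files) looks roughly like the fragment below — a sketch based on osprofiler's documented oslo.config options; the actual generated values are in the paste above:

```ini
[profiler]
# turn tracing on, including sqlalchemy query tracing
enabled = True
trace_sqlalchemy = True
# the default hmac key mentioned above; callers must pass the same key
hmac_keys = SECRET_KEY
# where traces are stored, matching OSPROFILER_COLLECTOR=redis
connection_string = redis://localhost:6379
```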
16:14:51 <slaweq> does it have a big performance impact on running neutron server? if not maybe we could enable it in gate jobs?
16:14:53 <mlavalle> rubasov: you got it
16:15:06 <rubasov> slaweq: the docs say it shouldn't
16:15:12 <mlavalle> slaweq: I cannot see any difference in performance
16:15:25 <mlavalle> whatsoever
16:15:39 <njohnston> +1 for adding it to the CI
16:16:14 <slaweq> ok, and how do we get such a report from the API call? is it generated for each call then?
16:16:29 <rubasov> slaweq: it's integrated with osc
16:17:06 <rubasov> it's like this: openstack ... --os-profile SECRET_KEY
16:17:18 <mlavalle> this is an example of a command execution along with the generation of the corresponding html file: http://paste.openstack.org/show/732601/
16:17:18 <rubasov> then a trace is generated for that call
16:17:46 <rubasov> and osc gives back an id of the trace
16:17:50 <mlavalle> Note that "Trace ID: b723c5c7-f87e-49e3-a51e-f1071a00c433" is added to the output of the command
16:18:13 <mlavalle> and you use that in the osprofiler command to generate the html file
16:18:22 <rubasov> eg: osprofiler trace show --html ... --connection-string redis://localhost:6379
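Putting rubasov's and mlavalle's commands together, a traced call looks roughly like this (a sketch: the router name is illustrative, the trace id is the one from mlavalle's example, and the `--out` flag is assumed to be available in the installed osprofiler version):

```console
$ openstack --os-profile SECRET_KEY router create test-router
...
Trace ID: b723c5c7-f87e-49e3-a51e-f1071a00c433

$ osprofiler trace show --html b723c5c7-f87e-49e3-a51e-f1071a00c433 \
      --connection-string redis://localhost:6379 --out trace.html
```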
16:19:20 <mlavalle> makes sense?
16:19:57 <slaweq> yep
16:20:16 <slaweq> but this means that we can't easily integrate it with tempest jobs
16:20:18 <slaweq> right?
16:20:36 <mlavalle> well, let me explain
16:20:49 <mlavalle> apparently it is integrated with Rally....
16:21:19 <slaweq> ok, sorry for disturbing then. Go on mlavalle :)
16:21:49 <rubasov> yes as mlavalle said, the test runner must know about osprofiler, and rally seems to know that
16:21:51 <mlavalle> these early sessions of this meeting are more brainstorming than anything else
16:22:14 <tovin07> slaweq, for tempest job, you can see this patch https://review.openstack.org/#/c/523935/
16:22:18 <mlavalle> #link https://docs.openstack.org/rally/latest/quick_start/tutorial/step_10_profiling_openstack_internals.html
16:22:55 <tovin07> #link Add profiler support into Tempest https://review.openstack.org/#/c/523935/
16:23:06 <slaweq> tovin07: thx, I will check that
16:23:29 <mlavalle> tovin07: ahhh, that's so cool. Is this patch something that will be merged soon? or should we take the patch and deploy it?
16:23:54 <rubasov> that's nice
16:24:46 <rubasov> I'm working on the other half - enabling osprofiler in a tempest gate check job: https://review.openstack.org/611835
16:26:06 <mlavalle> that's a good step forward rubasov
16:26:39 <mlavalle> I spent some time over the weekend looking at the results of our rally jobs
16:26:57 <mlavalle> and it doesn't seem that we are generating the traces
16:26:57 <rubasov> last friday I thought ps3 was working already, then today I realized I misread the logs
16:27:34 <rubasov> but I'm progressing slowly as the gate gives me a chance once every few hours
16:27:49 <mlavalle> but if we were, according to the rally docs we would generate an osprofiler id for each request
16:27:53 <rubasov> mlavalle: yes, it's only enabling osprofiler in devstack, not tracing any call yet
16:28:33 <tovin07> mlavalle, it depends on tempest team
16:28:33 <mlavalle> and then we could retrieve the generated file and generate html analysis
16:28:48 <mlavalle> tovin07: thanks :-)
16:28:57 <tovin07> I think the patch looks good to me for now
16:29:16 <rubasov> mlavalle: exactly
16:29:36 <mlavalle> so I think one action item would be to create a rally job with osprofiler enabled
16:29:58 <tovin07> if you guys feel plain html output from osprofiler is boring
16:30:15 <tovin07> you can use jaeger as a backend instead of redis
16:30:37 <slaweq> mlavalle: I can take care of this rally job
16:30:44 <mlavalle> I think it should be a periodic job
16:30:48 <tovin07> like this one for a simple glance command (sorry, it's not related to neutron :D )
16:30:49 <tovin07>
16:31:25 <mlavalle> nice tovin07 !
16:31:47 <mlavalle> what do others think of a periodic job?
16:32:04 <slaweq> mlavalle: periodic rally job with enabled osprofiler?
16:32:11 <mlavalle> yeah
16:32:26 <mlavalle> I just don't want to have too many jobs in the check queue
16:32:27 <tovin07> for rally job, we can look at this patch, hope that it can help https://review.openstack.org/#/c/467252/
16:32:58 <slaweq> would it be a problem to just enable it for the neutron-rally-task job which we have now in the check queue?
16:32:59 <njohnston> let's put it in experimental to start, so we can trigger it when we want it
16:33:20 <njohnston> slaweq++
16:33:26 <mlavalle> ok, experimental sounds good
16:33:31 <rubasov> mlavalle: at first it likely can't vote anyway, so it makes sense to make it periodic instead of check
16:34:04 <mlavalle> and no, I don't see a problem using neutron-rally-task as a base
16:34:11 <mlavalle> so....
16:34:17 <mlavalle> agree?
16:34:24 <mlavalle> with the experimental job
16:34:26 <slaweq> mlavalle: but I was asking if we need new job for that
16:34:45 <mlavalle> slaweq: let's not make that decision today
16:34:57 <mlavalle> I want to see how much effect it has first
16:34:58 <rubasov> maybe stupid question: how do I trigger an experimental job?
16:35:00 <slaweq> ok, it can be experimental job at the beginning
16:35:14 <slaweq> rubasov: You write a comment "check experimental" in gerrit
16:35:22 <rubasov> slaweq: thanks
16:35:54 <slaweq> ok, I have one more question about this rally experimental job then
16:36:02 <mlavalle> #action slaweq to create an experimental rally job with osprofiler enabled
16:36:21 <slaweq> from the commit message in https://review.openstack.org/#/c/467252/ and from the rally docs it looks like this integration "just" puts a profiler id into each api call
16:36:33 <mlavalle> yes
16:36:42 <slaweq> how do You think we should then generate reports from each call in the job?
16:36:44 <mlavalle> that's my understanding
16:36:56 <slaweq> should we add some script which will do it in post-run?
16:37:30 <mlavalle> my understanding is that we should be able to download the generated database and do the analysis locally
16:37:41 <mlavalle> but it is an assumption
16:37:56 <slaweq> ok, I will investigate that
16:38:03 <slaweq> and ask infra if it will be necessary
16:38:04 <mlavalle> I would look to the person enabling the job to help us answer this
16:38:14 <mlavalle> i.e. slaweq
16:38:20 <mlavalle> :-)
16:38:22 <slaweq> :)
16:38:29 <rubasov> mlavalle: you mean the whole tracepoint db, right?
16:38:38 <mlavalle> rubasov: yeah
16:39:08 <mlavalle> this is the process I foresee:
16:39:18 <mlavalle> 1) We run the rally job
16:39:19 <rubasov> we may want to log the trace ids somewhere
16:39:43 <slaweq> rubasov: trace ids will be stored in rally result page for each iteration of each test
16:39:48 <mlavalle> 2) We identify problematic requests from the rally report
16:39:58 <rubasov> slaweq: cool
16:40:28 <mlavalle> 3) with the trace ids associated with the problematic requests, we analyze from the traces db
16:40:32 <mlavalle> makes sense?
16:40:47 <mlavalle> again, that is what I am assuming based on what I know so far
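Under mlavalle's assumption, step 3 might look like the sketch below (hypothetical: the host and trace id are placeholders, and this presumes the job's redis instance is reachable or its dump is archived by the job):

```console
$ # snapshot the job node's redis keyspace into a local RDB file
$ redis-cli -h <job-node> --rdb /tmp/traces.rdb
$ # load the snapshot into a local redis instance, then render any
$ # problematic trace id taken from the rally report
$ osprofiler trace show --html <trace-id> \
      --connection-string redis://localhost:6379 --out report.html
```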
16:40:58 <slaweq> mlavalle: I think so, let's test that in practice :)
16:41:01 <rubasov> it does to me
16:41:46 <mlavalle> The other thing is that someone should look at the requests that we are currently testing in our Rally job....
16:42:16 <mlavalle> the problematic ones that slaweq identified in his original analysis are not all there
16:42:32 <mlavalle> #link http://paste.openstack.org/show/731683/
16:42:46 <mlavalle> #link http://paste.openstack.org/show/731684/
16:43:05 <mlavalle> any volunteer to do that?
16:43:38 <rubasov> mlavalle: I can take that
16:44:10 <mlavalle> rubasov: thanks. I think the action item is to see what is missing and start adding some of them
16:44:26 <rubasov> mlavalle: ack
16:44:28 <mlavalle> Let's focus on the top 10 offenders in slaweq's analysis
16:45:01 <mlavalle> #action rubasov to add top request offenders to rally job
16:45:03 <slaweq> maybe we can (for now) skip the first one and focus on the 2nd-10th :)
16:45:15 <mlavalle> slaweq: absolutely. good point
16:45:20 <rubasov> slaweq: of course
16:45:26 <mlavalle> LOL
16:45:36 <slaweq> :)
16:45:36 <njohnston> :-)
16:46:24 <mlavalle> The other thing that drew my attention looking at the html file is that we are duplicating or triplicating the information for DB related traces
16:46:59 <mlavalle> that is due to the fact that we enable the tracing both in the neutron-lib and the neutron DB api modules
16:47:30 <mlavalle> so if you look at each DB query, it shows up several times
16:47:44 <mlavalle> I can take an action item to experiment and simplify that
16:47:52 <mlavalle> agree?
16:48:02 <njohnston> sounds good
16:48:05 <rubasov> mlavalle: yes
16:48:20 <njohnston> let me know if you'd like any help with that
16:48:33 <mlavalle> #action mlavalle to simplify the DB traces
16:48:37 <mlavalle> njohnston: will do
16:49:24 <mlavalle> I have one final point related to the traces
16:50:03 <mlavalle> if you look for example at the subnet creation ones, the largest amount of time is spent with the dhcp server rpc
16:50:14 <mlavalle> about two thirds
16:50:59 <mlavalle> and even in other requests where the relative weight is not that much (for example router create), you will see the dhcp rpc time is significant
16:51:07 <mlavalle> even the largest individually
16:51:17 <slaweq> yes, I also noticed that now
16:51:26 <mlavalle> I am not drawing any conclusions just yet about that
16:51:46 <mlavalle> because I tested without concurrent load
16:52:01 <mlavalle> I just wanted to draw your attention to the fact
16:52:11 <slaweq> I also found in the router-create call that there are a lot of "SELECT 1" calls there
16:52:19 <slaweq> do You know what it is maybe?
16:52:26 <mlavalle> nope
16:52:34 <rubasov> my guess was some sqlalchemy connection check
16:52:42 <rubasov> but that's just a guess
16:52:46 <njohnston> yes, I believe rubasov is correct
16:52:47 <slaweq> there is e.g. one such call which took 50 ms, which is quite significant compared to other calls
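For context on rubasov's guess: SQLAlchemy's connection pool (and oslo.db's ping listener) commonly issues a throwaway `SELECT 1` before reusing a pooled connection, to verify the connection is still alive — though whether that is the source of these particular calls is unconfirmed here. The pattern, sketched with the stdlib sqlite3 module standing in for the real MySQL connection:

```python
import sqlite3

def connection_is_alive(conn):
    # A pool's "pre-ping" is just a cheap no-op query: if it succeeds,
    # the connection can be reused; if it raises, the pool discards the
    # connection and opens a fresh one instead.
    try:
        return conn.execute("SELECT 1").fetchone() == (1,)
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
print(connection_is_alive(conn))  # True while the connection is open
```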
16:53:03 <mlavalle> interesting
16:53:20 <mlavalle> ok
16:53:34 <mlavalle> I think we've had a very productive meeting today :-)
16:53:43 <mlavalle> anything else we should discuss?
16:54:05 <tovin07> about Berlin summit
16:54:08 <rubasov> one more note about traces I noticed
16:54:15 <tovin07> will anyone join?
16:54:23 <mlavalle> tovin07: I'll be there
16:54:34 <mlavalle> and rubasov as well
16:54:40 <rubasov> if you trace nova too, the SECRET_KEY can leak into periodic jobs
16:54:58 <rubasov> just ignore that if the trace is suspiciously long
16:55:11 <mlavalle> rubasov: nice point! thanks
16:55:12 <tovin07> mlavalle, see you there :D
16:55:13 <rubasov> tovin07: yep, as mlavalle said
16:55:38 <mlavalle> tovin07: maybe we can pick your brains in Berlin, if we have questions :-)
16:56:09 <tovin07> yep, hope that i can help
16:56:14 <mlavalle> :-)
16:56:42 <mlavalle> tovin07: this meeting is bi-weekly. you are more than welcome to contribute
16:57:02 <mlavalle> ok team, thanks for your attendance
16:57:12 <rubasov> thank you guys
16:57:12 <mlavalle> #endmeeting