08:00:10 #startmeeting Watcher
08:00:11 Meeting started Wed Jul 17 08:00:10 2019 UTC and is due to finish in 60 minutes. The chair is licanwei. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:14 The meeting name has been set to 'watcher'
08:00:23 hi~
08:00:28 hello
08:01:37 #topic Announcements
08:02:08 Proof of concept threadpool achieves 90% performance improvement with 16 threads
08:02:53 In the referenced patch you can see how this poc was implemented
08:03:21 In measurements the average build time for a single cell went from 315 to 41 seconds
08:03:53 For five cells build time was reduced from 1500 to 151 seconds
08:03:58 It's amazing!
08:04:44 Cool
08:06:17 I want to know why you chose 16
08:06:49 because with 32 performance was similar or worse
08:07:30 Likely the ideal number of workers for an infrastructure depends on many things, such as the host that Watcher runs on and the number of Nova APIs deployed.
08:07:59 how is the CPU usage percentage?
08:08:07 enmuu...
08:08:18 Is it much higher?
08:08:22 Beyond 16 workers I also noticed problems with urllib3 not being able to open the connections
08:08:57 CPU usage on Watcher was around 8%, not much higher than before. These parallel operations are all very IO bound so lots of IO wait on cpu time
08:11:14 How many workers may be a problem
08:11:30 I am not sure if creating a global thread pool is a good idea.
08:11:32 maybe we need a config option
08:11:55 let the user configure it
08:11:56 maybe we need a method to compute it.
08:12:22 Of course the number of workers needs to be user configurable
08:12:37 I also think the number of audit workers and parallel operation workers needs to be separate
08:13:26 what do you mean by 'operation workers'?
08:13:49 To perform requests like getting compute node info or instance info from Nova
08:14:02 But also perform requests like getting metrics from datasources
08:14:26 These are the two primary things that slow down Watcher
08:14:28 i see
08:15:18 The reason for a global threadpool in the decision engine is that audits already run in a threadpool; using an additional threadpool inside of one will cause problems
08:15:49 it can be two different workers.
08:16:30 a global threadpool will not cause problems?
08:16:57 I am not familiar with this.
08:17:03 I think we should limit to one threadpool per executable
08:17:32 executable being, decision-engine, applier, api
08:18:31 action execution in applier is parallel
08:18:46 actions
08:20:25 Not sure, I think actions are executed sequentially but multiple actionplans can run in parallel
08:21:22 No, only one actionplan can be run at a time
08:21:41 but actions can run in parallel
08:21:51 Could you show it in the code
08:22:16 wait a minute
08:23:09 https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L85
08:23:40 Now let's move on
08:24:00 #topic reviews
08:24:53 https://review.opendev.org/#/c/668598/ Improve Compute Data Model needs a new patchset
08:25:19 Yes
08:25:30 I'll submit a new patch, thanks
08:26:29 Overall it looks.
08:26:34 https://review.opendev.org/#/c/670366/ Baseclass for ModelBuilder with audit scope needs review
08:27:47 +1 review
08:27:51 +1
08:28:18 https://review.opendev.org/#/c/670386/ Add call_retry for ModelBuilder for error recovery
08:29:04 It's ok to me
08:29:07 +1
08:30:02 https://review.opendev.org/#/c/670453/ Move datasources folder into decision_engine
08:30:31 review +1
08:30:35 Yep.
08:30:41 totally it's ok
08:31:38 https://review.opendev.org/#/c/671014/ Remove useless _opts.py needs review
08:32:27 review +1
08:32:37 +1
08:32:58 https://review.opendev.org/#/c/669611/ Replace human_id with name in grafana doc needs review
08:33:35 the name patch has been merged.
08:33:41 Looks good to me
08:34:00 +1
08:34:25 https://review.opendev.org/#/c/669087/ remove id field from CDM needs review
08:35:02 This patch needs time to review.
08:35:09 yes
08:35:34 I will try to do it this week or next but I do not have a lot of time due to writing my thesis
08:35:36 I hope you can find time to review
08:36:22 The paper should be placed at a high priority.
08:36:53 I will review it again after the patch is updated.
08:36:53 Ok, thank you Corne
08:37:32 https://review.opendev.org/#/c/669786/ Add resource_name for save_energy in action input parameter field needs review
08:37:47 review +1
08:38:08 https://review.opendev.org/#/c/669528/ Add Python 3 Train unit tests needs review
08:38:23 I think we should not merge these two patches
08:38:47 I think we need to abandon it
08:38:48 They remove python 3.6 unit tests while the linked governance article advises py3.6 should ideally be included
08:38:55 Yes.
08:39:29 ok
08:40:17 https://review.opendev.org/#/c/670936/ rollback node status
08:40:36 For this patch watcher-tempest-strategies failed.
08:41:47 watcher_tempest_plugin.tests.scenario.test_execute_strategies.TestExecuteStrategies.test_execute_host_maintenance_strategy fails
08:42:05 sometimes the tempest compute node can't migrate
08:42:24 I still don't know the reason,
08:42:38 very strange~
08:43:11 the compute node may be OK after a few seconds
08:44:08 I'll continue to debug the reason
08:44:15 Okay, yes let's investigate
08:44:54 #topic Discussion
08:45:14 Global threadpool proposal for decision-engine
08:45:30 I have seen processutils for concurrency now in workflow engine
08:45:47 I would like to use that for tests as well before making a decision
08:45:55 I think we need more tests of the global threadpool
08:46:04 but it's a good proposal
08:46:20 yes
08:46:21 When I attempted to use an additional threadpool inside the already existing audit threadpool, deadlocks occurred
08:46:36 Clearly threadpools cannot be safely run inside threadpools
08:46:40 is this a Python problem?
08:46:57 Can you try greenthreadpool?
08:47:06 Maybe a Python related problem or just due to how the futurist library implements it
08:47:31 When I try GreenThreadPool everything is executed sequentially instead of in parallel no matter how many workers
08:48:48 With GreenThreadPool everything is executed sequentially?
08:48:55 Is something wrong?
08:50:05 I do not yet fully understand the difference between threadpool and greenthreadpool but noticed this behavior. However, when trying to execute multiple audits in parallel using the greenthreadpool it still worked
08:50:34 But the requests to Nova were not executed in parallel
08:51:01 that's strange.
08:51:43 I will learn the difference between thread and greenthread after our meeting
08:52:02 Ok, we need more tests and can discuss it at the next meeting
08:52:16 I will investigate using oslo_concurrency processutils and see if it runs Nova calls in parallel properly
08:52:19 Yes
08:52:55 I'll end the meeting if there is nothing more
08:53:11 thank you all!
08:53:26 thanks all.
08:53:39 bye~
08:53:42 life is good. coding makes me happy.
08:53:45 bye~
08:53:55 #endmeeting
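
Note on the threadpool discussion: the idea of one shared pool per executable with a user-configurable worker count can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the referenced patch: it uses the standard library `concurrent.futures` rather than futurist, and `PARALLEL_WORKERS`, `fetch_node_info` and `collect_nodes` are hypothetical names, with a short sleep standing in for an IO-bound Nova request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a user-configurable option; not a real Watcher setting.
PARALLEL_WORKERS = 16

# One pool shared by the whole process, so that callers do not
# create further pools from inside pool workers.
_POOL = ThreadPoolExecutor(max_workers=PARALLEL_WORKERS)


def fetch_node_info(node_name):
    """Hypothetical placeholder for an IO-bound call such as a Nova API request."""
    time.sleep(0.05)  # simulate network latency
    return {"name": node_name, "status": "enabled"}


def collect_nodes(node_names):
    """Fetch info for all nodes through the shared pool; results keep input order."""
    return list(_POOL.map(fetch_node_info, node_names))


names = ["node-%d" % i for i in range(32)]
start = time.monotonic()
infos = collect_nodes(names)
elapsed = time.monotonic() - start
print("fetched %d nodes in %.2f seconds" % (len(infos), elapsed))
```

Because the simulated calls spend their time waiting rather than computing, the pool can overlap them, which matches the IO-bound behaviour described in the meeting. The right worker count would still need to be measured per deployment.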