08:00:10 <licanwei> #startmeeting Watcher
08:00:11 <openstack> Meeting started Wed Jul 17 08:00:10 2019 UTC and is due to finish in 60 minutes. The chair is licanwei. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:14 <openstack> The meeting name has been set to 'watcher'
08:00:23 <licanwei> hi~
08:00:28 <Dantalion> hello
08:01:37 <licanwei> #topic Announcements
08:02:08 <licanwei> Proof of concept threadpool achieves 90% performance improvement with 16 threads
08:02:53 <Dantalion> In the referenced patch you can see how this PoC was implemented
08:03:21 <Dantalion> In measurements the average build time for a single cell went from 315 to 41 seconds
08:03:53 <Dantalion> For five cells the build time was reduced from 1500 to 151 seconds
08:03:58 <licanwei> It's amazing!
08:04:44 <chenke> Cool
08:06:17 <chenke> I want to know why you chose 16
08:06:49 <Dantalion> because with 32 performance was similar or worse
08:07:30 <Dantalion> Likely the ideal number of workers for an infrastructure depends on many things, such as the host that Watcher runs on and the number of Nova APIs deployed.
08:07:59 <licanwei> how is the CPU usage percentage?
08:08:07 <chenke> enmuu...
08:08:18 <licanwei> Is it much higher?
08:08:22 <Dantalion> Beyond 16 workers I also noticed problems with urllib3 not being able to open the connections
08:08:57 <Dantalion> CPU usage on Watcher was around 8%, not much higher than before. These parallel operations are all very IO bound, so there is lots of IO wait in the CPU time
08:11:14 <licanwei> How many workers may be a problem
08:11:30 <chenke> I am not sure if creating a global thread pool is a good idea.
08:11:32 <licanwei> maybe we need a config option
08:11:55 <licanwei> let the user configure it
08:11:56 <chenke> maybe we need a method to compute it.
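Note on the measurements above: the low CPU usage alongside a large speedup is what one would expect when worker threads spend most of their time waiting on network responses. The following is a minimal sketch of that effect, not the PoC patch itself. It uses the standard library `concurrent.futures` rather than futurist, and `fake_nova_call`, `collect`, and the 50 ms delay are hypothetical stand-ins chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_nova_call(node_id, delay=0.05):
    """Hypothetical stand-in for an I/O-bound request; not a real Nova API."""
    time.sleep(delay)  # the thread is blocked here and uses almost no CPU
    return {"node": node_id}


def collect(node_ids, workers):
    """Fetch every node with `workers` threads; return (results, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though calls overlap in time
        results = list(pool.map(fake_nova_call, node_ids))
    return results, time.monotonic() - start


if __name__ == "__main__":
    nodes = list(range(16))
    _, serial = collect(nodes, workers=1)
    _, parallel = collect(nodes, workers=16)
    print(f"1 worker: {serial:.2f}s, 16 workers: {parallel:.2f}s")
```

Passing `workers` as a parameter mirrors the suggestion in the discussion that the pool size should be an operator-tunable setting; how Watcher would actually expose it is not specified in this log.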
08:12:22 <Dantalion> Of course the number of workers needs to be user configurable
08:12:37 <Dantalion> I also think the number of audit workers and parallel operation workers needs to be separate
08:13:26 <licanwei> what do you mean by 'operation workers'?
08:13:49 <Dantalion> To perform requests like getting compute node info or instance info from Nova
08:14:02 <Dantalion> But also to perform requests like getting metrics from datasources
08:14:26 <Dantalion> These are the two primary things that slow down Watcher
08:14:28 <licanwei> i see
08:15:18 <Dantalion> The reason for a global threadpool in the decision engine is that audits already run in a threadpool; using an additional threadpool inside of one will cause problems
08:15:49 <chenke> it can be two different workers.
08:16:30 <chenke> will a global threadpool not cause problems?
08:16:57 <chenke> I am not familiar with this.
08:17:03 <Dantalion> I think we should limit it to one threadpool per executable
08:17:32 <Dantalion> executable being decision-engine, applier, api
08:18:31 <licanwei> action execution in applier is parallel
08:18:46 <licanwei> actions
08:20:25 <Dantalion> Not sure, I think actions are executed sequentially but multiple actionplans can run in parallel
08:21:22 <licanwei> No, only one actionplan can be run at a time
08:21:41 <licanwei> but actions can run in parallel
08:21:51 <Dantalion> Could you show it in the code
08:22:16 <licanwei> wait a minute
08:23:09 <licanwei> https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L85
08:23:40 <licanwei> Now let's move on
08:24:00 <licanwei> #topic reviews
08:24:53 <licanwei> https://review.opendev.org/#/c/668598/ Improve Compute Data Model needs a new patchset
08:25:19 <chenke> Yes
08:25:30 <licanwei> I'll submit a new patch, thanks
08:26:29 <chenke> Overall it looks.
08:26:34 <licanwei> https://review.opendev.org/#/c/670366/ Baseclass for ModelBuilder with audit scope needs review
08:27:47 <licanwei> +1 review
08:27:51 <chenke> +1
08:28:18 <licanwei> https://review.opendev.org/#/c/670386/ Add call_retry for ModelBuilder for error recovery
08:29:04 <licanwei> It's ok to me
08:29:07 <chenke> +1
08:30:02 <licanwei> https://review.opendev.org/#/c/670453/ Move datasources folder into decision_engine
08:30:31 <licanwei> review +1
08:30:35 <chenke> Yep.
08:30:41 <licanwei> overall it's ok
08:31:38 <licanwei> https://review.opendev.org/#/c/671014/ Remove useless _opts.py needs review
08:32:27 <licanwei> review +1
08:32:37 <chenke> +1
08:32:58 <licanwei> https://review.opendev.org/#/c/669611/ Replace human_id with name in grafana doc needs review
08:33:35 <chenke> the name patch has been merged.
08:33:41 <Dantalion> Looks good to me
08:34:00 <licanwei> +1
08:34:25 <licanwei> https://review.opendev.org/#/c/669087/ remove id field from CDM needs review
08:35:02 <chenke> This patch needs time to review.
08:35:09 <licanwei> yes
08:35:34 <Dantalion> I will try to do it this week or next, but I do not have a lot of time due to writing my thesis
08:35:36 <licanwei> I hope you can find time to review
08:36:22 <chenke> The paper should be given high priority.
08:36:53 <chenke> I will review it again after the patch is updated.
08:36:53 <licanwei> Ok, thank you Corne
08:37:32 <licanwei> https://review.opendev.org/#/c/669786/ Add reource_name for save_energy in action input parameter field needs review
08:37:47 <licanwei> review +1
08:38:08 <licanwei> https://review.opendev.org/#/c/669528/ Add Python 3 Train unit tests needs review
08:38:23 <Dantalion> I think we should not merge these two patches
08:38:47 <licanwei> I think we need to abandon it
08:38:48 <Dantalion> They remove the Python 3.6 unit tests, while the linked governance article advises that py3.6 should ideally be included
08:38:55 <chenke> Yes.
08:39:29 <licanwei> ok
08:40:17 <licanwei> https://review.opendev.org/#/c/670936/ rollback node status
08:40:36 <chenke> For this patch watcher-tempest-strategies failed.
08:41:47 <Dantalion> watcher_tempest_plugin.tests.scenario.test_execute_strategies.TestExecuteStrategies.test_execute_host_maintenance_strategy fails
08:42:05 <licanwei> sometimes the tempest compute node can't migrate
08:42:24 <licanwei> I still don't know the reason,
08:42:38 <licanwei> very strange~
08:43:11 <licanwei> the compute node may be OK after a few seconds
08:44:08 <licanwei> I'll continue to debug the reason
08:44:15 <Dantalion> Okay, yes let's investigate
08:44:54 <licanwei> #topic Discussion
08:45:14 <licanwei> Global threadpool proposal for decision-engine
08:45:30 <Dantalion> I have now seen processutils used for concurrency in the workflow engine
08:45:47 <Dantalion> I would like to use that for tests as well before making a decision
08:45:55 <licanwei> I think we need more tests of the global threadpool
08:46:04 <licanwei> but it's a good proposal
08:46:20 <licanwei> yes
08:46:21 <Dantalion> When I attempted to use an additional threadpool inside the already existing audit threadpool, deadlocks occurred
08:46:36 <Dantalion> Clearly threadpools cannot be safely run inside threadpools
08:46:40 <chenke> is this Python's problem?
08:46:57 <licanwei> Can you try greenthreadpool?
08:47:06 <Dantalion> Maybe a Python related problem or just due to how the futurist library implements it
08:47:31 <Dantalion> When I try GreenThreadPool everything is executed sequentially instead of in parallel, no matter how many workers
08:48:48 <licanwei> With GreenThreadPool everything is executed sequentially?
08:48:55 <licanwei> Is something wrong?
08:50:05 <Dantalion> I do not yet fully understand the difference between threadpool and greenthreadpool but noticed this behavior. However, when trying to execute multiple audits in parallel using the greenthreadpool it still worked
08:50:34 <Dantalion> But the requests to Nova were not executed in parallel
08:51:01 <chenke> that's strange.
08:51:43 <chenke> I will learn the difference between thread and greenthread after our meeting
08:52:02 <licanwei> Ok, we need more tests and can discuss it at the next meeting
08:52:16 <Dantalion> I will investigate using oslo_concurrency processutils and see if it runs Nova calls in parallel properly
08:52:19 <Dantalion> Yes
08:52:55 <licanwei> I'll end the meeting if there is nothing more
08:53:11 <licanwei> thank you all!
08:53:26 <chenke> thanks all.
08:53:39 <licanwei> bye~
08:53:42 <chenke> life is good. coding makes me happy.
08:53:45 <chenke> bye~
08:53:55 <licanwei> #endmeeting
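
Note on the threadpool deadlocks discussed above: the log does not establish what caused the deadlocks seen in Watcher. One well-known way for pool-in-pool designs to hang is worker starvation: a task holds a worker while waiting on a sub-task that is queued behind it in the same bounded pool. The sketch below reproduces only that generic hazard with the standard library `concurrent.futures` (not futurist), and uses a timeout so it reports the starvation instead of hanging. The names `inner`, `outer`, and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def inner():
    """Sub-task, standing in for a single request made during an audit."""
    return "inner done"


def outer(pool, timeout=0.5):
    """Task that submits a sub-task to the SAME pool and waits for it."""
    future = pool.submit(inner)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # No free worker could pick up inner(); without the timeout this
        # wait would never end. Cancel the queued sub-task and report it.
        future.cancel()
        return "starved"


def run(workers):
    """Run outer() in a pool of the given size and return its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return pool.submit(outer, pool).result()


if __name__ == "__main__":
    print(run(2))  # a second worker is free to run inner()
    print(run(1))  # the only worker is occupied by outer()
```

With a shared pool, this hazard is one argument for sizing audit workers and operation workers separately, as suggested earlier in the meeting; whether it explains the behavior observed with futurist is left open here.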