08:00:10 <licanwei> #startmeeting Watcher
08:00:11 <openstack> Meeting started Wed Jul 17 08:00:10 2019 UTC and is due to finish in 60 minutes.  The chair is licanwei. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:14 <openstack> The meeting name has been set to 'watcher'
08:00:23 <licanwei> hi~
08:00:28 <Dantalion> hello
08:01:37 <licanwei> #topic Announcements
08:02:08 <licanwei> Proof of concept threadpool achieves 90% performance improvement with 16 threads
08:02:53 <Dantalion> In the referenced patch you can see how this poc was implemented
08:03:21 <Dantalion> In measurements, the average build time for a single cell went from 315 to 41 seconds
08:03:53 <Dantalion> For five cells build time was reduced from 1500 to 151 seconds
08:03:58 <licanwei> It's amazing!
08:04:44 <chenke> Cool
08:06:17 <chenke> I want to know why you chose 16
08:06:49 <Dantalion> because with 32 performance was similar or worse
08:07:30 <Dantalion> The ideal number of workers for an infrastructure likely depends on many things, such as the host that Watcher runs on and the number of Nova APIs deployed.
08:07:59 <licanwei> How is the CPU usage percentage?
08:08:07 <chenke> enmuu...
08:08:18 <licanwei> Is it much higher?
08:08:22 <Dantalion> Beyond 16 workers I also noticed problems with urllib3 not being able to open the connections
08:08:57 <Dantalion> CPU usage on Watcher was around 8%, not much higher than before. These parallel operations are all very IO bound, so there is a lot of IO wait in the CPU time
08:11:14 <licanwei> The number of workers may be a problem
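The effect described above, where IO-bound requests speed up under a thread pool while CPU usage stays low, can be illustrated with a minimal sketch. This is not the proof-of-concept patch discussed in the meeting: `fetch_node_info` is a hypothetical stand-in that sleeps instead of calling Nova, and the pool comes from Python's standard `concurrent.futures` rather than any library Watcher may use.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_node_info(node_name, delay=0.05):
    """Hypothetical stand-in for one Nova API request; sleeps to mimic IO wait."""
    time.sleep(delay)
    return {"name": node_name, "status": "enabled"}


def collect_nodes(node_names, workers=16):
    """Fetch all nodes through a pool of `workers` threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_node_info, node_names))


def timed(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.monotonic()
    result = func(*args, **kwargs)
    return result, time.monotonic() - start
```

Calling `timed(collect_nodes, names, workers=1)` and then `timed(collect_nodes, names, workers=16)` on the same list of names should return identical results, with the second call finishing sooner because the threads spend their time waiting rather than computing. The best worker count for a real deployment would still have to be measured, as noted in the discussion.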
08:11:30 <chenke> I am not sure if creating a global thread pool is a good idea.
08:11:32 <licanwei> maybe we need a config option
08:11:55 <licanwei> let the user configure it
08:11:56 <chenke> maybe we need a method to compute it.
08:12:22 <Dantalion> Of course the number of workers needs to be user configurable
08:12:37 <Dantalion> I also think number of audit workers and parallel operation workers needs to be separate
08:13:26 <licanwei> What do you mean by 'operation workers'?
08:13:49 <Dantalion> To perform requests like getting compute node info or instance info from Nova
08:14:02 <Dantalion> But also perform requests like getting metrics from datasources
08:14:26 <Dantalion> These are the two primary things that slow down Watcher
08:14:28 <licanwei> i see
08:15:18 <Dantalion> The reason for a global threadpool in the decision engine is that audits already run in a threadpool; using an additional threadpool inside of one will cause problems
08:15:49 <chenke> it can be two different workers.
08:16:30 <chenke> Will a global threadpool not cause problems?
08:16:57 <chenke> I am not familiar with this.
08:17:03 <Dantalion> I think we should limit to one threadpool per executable
08:17:32 <Dantalion> executable being, decision-engine, applier, api
08:18:31 <licanwei> action execution in applier is parallel
08:18:46 <licanwei> actions
08:20:25 <Dantalion> Not sure, I think actions are executed sequentially but multiple actionplans can run in parallel
08:21:22 <licanwei> No, only one actionplan can run at a time
08:21:41 <licanwei> but actions can run in parallel
08:21:51 <Dantalion> Could you show it in the code?
08:22:16 <licanwei> wait a minute
08:23:09 <licanwei> https://github.com/openstack/watcher/blob/master/watcher/applier/workflow_engine/default.py#L85
08:23:40 <licanwei> Now let's move on
08:24:00 <licanwei> #topic reviews
08:24:53 <licanwei> https://review.opendev.org/#/c/668598/ Improve Compute Data Model needs a new patchset
08:25:19 <chenke> Yes
08:25:30 <licanwei> I'll submit a new patch, thanks
08:26:29 <chenke> Overall it looks good.
08:26:34 <licanwei> https://review.opendev.org/#/c/670366/ Baseclass for ModelBuilder with audit scope needs review
08:27:47 <licanwei> +1 review
08:27:51 <chenke> +1
08:28:18 <licanwei> https://review.opendev.org/#/c/670386/ Add call_retry for ModelBuilder for error recovery
08:29:04 <licanwei> It's OK with me
08:29:07 <chenke> +1
08:30:02 <licanwei> https://review.opendev.org/#/c/670453/ Move datasources folder into decision_engine
08:30:31 <licanwei> review +1
08:30:35 <chenke> Yep.
08:30:41 <licanwei> overall it's OK
08:31:38 <licanwei> https://review.opendev.org/#/c/671014/ Remove useless _opts.py needs review
08:32:27 <licanwei> review +1
08:32:37 <chenke> +1
08:32:58 <licanwei> https://review.opendev.org/#/c/669611/ Replace human_id with name in grafana doc needs review
08:33:35 <chenke> the name patch has been merged.
08:33:41 <Dantalion> Looks good to me
08:34:00 <licanwei> +1
08:34:25 <licanwei> https://review.opendev.org/#/c/669087/ remove id field from CDM needs review
08:35:02 <chenke> This patch needs time to review.
08:35:09 <licanwei> yes
08:35:34 <Dantalion> I will try to do it this week or next, but I do not have a lot of time due to writing my thesis
08:35:36 <licanwei> I hope you can find time to review
08:36:22 <chenke> The paper should be given high priority.
08:36:53 <chenke> I will review it again after the patch is updated.
08:36:53 <licanwei> Ok, thank you Corne
08:37:32 <licanwei> https://review.opendev.org/#/c/669786/ Add reource_name for save_energy in action input parameter field needs review
08:37:47 <licanwei> review +1
08:38:08 <licanwei> https://review.opendev.org/#/c/669528/ Add Python 3 Train unit tests needs review
08:38:23 <Dantalion> I think we should not merge these two patches
08:38:47 <licanwei> I think we need to abandon it
08:38:48 <Dantalion> They remove python 3.6 unit tests while the linked governance article advises py3.6 should ideally be included
08:38:55 <chenke> Yes.
08:39:29 <licanwei> ok
08:40:17 <licanwei> https://review.opendev.org/#/c/670936/ rollback node status
08:40:36 <chenke> For this patch, watcher-tempest-strategies failed.
08:41:47 <Dantalion> watcher_tempest_plugin.tests.scenario.test_execute_strategies.TestExecuteStrategies.test_execute_host_maintenance_strategy fails
08:42:05 <licanwei> sometimes the tempest compute node can't migrate
08:42:24 <licanwei> I still don't know the reason,
08:42:38 <licanwei> very strange~
08:43:11 <licanwei> the compute node may be OK after a few seconds
08:44:08 <licanwei> I'll continue to debug the reason
08:44:15 <Dantalion> Okay, yes let's investigate
08:44:54 <licanwei> #topic Discussion
08:45:14 <licanwei> Global threadpool proposal for decision-engine
08:45:30 <Dantalion> I have seen processutils for concurrency now in workflow engine
08:45:47 <Dantalion> I would like to use that for tests as well before making a decision
08:45:55 <licanwei> I think we need more tests of the global threadpool
08:46:04 <licanwei> but it's a good proposal
08:46:20 <licanwei> yes
08:46:21 <Dantalion> When I attempted to use an additional threadpool inside the already existing audit threadpool, deadlocks occurred
08:46:36 <Dantalion> Clearly threadpools can not be safely run inside threadpools
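One well-known way that nested pool usage stalls in plain Python is a task that waits on work queued to the same, fully occupied pool. The sketch below shows that pattern with the standard `concurrent.futures` module and a timeout so it cannot hang. It is an illustration only, not a diagnosis of the deadlocks reported above, which involved the futurist library and a separate inner pool; all function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def inner_task():
    return "inner done"


def outer_task(pool, timeout=0.5):
    """Submit inner work to `pool` from inside a task and wait for it."""
    future = pool.submit(inner_task)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Without the timeout this wait would never finish.
        return "stalled"


def run_nested_same_pool():
    # The only worker is busy running outer_task, so inner_task stays queued.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(outer_task, pool).result()


def run_nested_separate_pools():
    # The inner work has its own worker, so the outer task can complete.
    with ThreadPoolExecutor(max_workers=1) as outer, \
            ThreadPoolExecutor(max_workers=1) as inner:
        return outer.submit(outer_task, inner).result()
```

Here `run_nested_same_pool()` reports a stall while `run_nested_separate_pools()` completes. Sizing and sharing pools carefully, for example one pool per executable as proposed in the meeting, is one way to keep such waits from piling up.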
08:46:40 <chenke> Is this a Python problem?
08:46:57 <licanwei> Can you try greenthreadpool?
08:47:06 <Dantalion> Maybe a Python related problem or just due to how futurist library implements it
08:47:31 <Dantalion> When I try GreenThreadPool everything is executed sequentially instead of in parallel, no matter how many workers
08:48:48 <licanwei> With GreenThreadPool everything is executed sequentially?
08:48:55 <licanwei> Is something wrong?
08:50:05 <Dantalion> I do not yet fully understand the difference between the threadpool and the greenthreadpool but noticed this behavior. However, when trying to execute multiple audits in parallel using the greenthreadpool it still worked
08:50:34 <Dantalion> But the requests to Nova were not executed in parallel
08:51:01 <chenke> that's strange.
08:51:43 <chenke> I will learn the difference between thread and greenthread after our meeting
08:52:02 <licanwei> Ok, we need more tests and can discuss it next meeting time
08:52:16 <Dantalion> I will investigate using oslo_concurrency processutils and see if it runs Nova calls in parallel properly
08:52:19 <Dantalion> Yes
08:52:55 <licanwei> I'll end the meeting if there is nothing more
08:53:11 <licanwei> thank you all!
08:53:26 <chenke> thanks all.
08:53:39 <licanwei> bye~
08:53:42 <chenke> life is good. coding makes me happy.
08:53:45 <chenke> bye~
08:53:55 <licanwei> #endmeeting