Monday, 2019-04-01

*** irclogbot_2 has quit IRC02:19
*** apetrich has quit IRC02:58
*** pgaxatte has joined #openstack-mistral06:28
*** openstackgerrit has joined #openstack-mistral06:59
openstackgerritMerged openstack/python-mistralclient stable/stein: Update .gitreview for stable/stein  https://review.openstack.org/64416906:59
*** apetrich has joined #openstack-mistral07:32
rakhmerovd0ugal, apetrich: hi, any idea why this is happening? http://logs.openstack.org/16/648316/5/check/openstack-tox-docs/e9b504c/job-output.txt.gz#_2019-03-29_21_46_34_02552307:33
apetrichrakhmerov, that looks like flask from pecan inside sphinx. that's something I didn't know happened. But that seems to be a sphinx error07:34
rakhmerovyeah07:35
apetrichthat is very odd indeed07:35
*** akovi has joined #openstack-mistral07:51
rakhmerovakovi: hi )07:58
akovirakhmerov: hi Renat!07:59
rakhmerovakovi: just FYI: my last few patches seriously improve Mistral performance07:59
rakhmerovin case of big data contextx07:59
rakhmerovcontexts07:59
rakhmerovone of them was indeed a regression fix07:59
akovigood to know07:59
rakhmerovyes07:59
rakhmerovyou may want to try it07:59
akoviI'll try to get an update in the product07:59
rakhmerovyep08:00
akoviCurrently I'm working on solving the expiring tokens issue08:00
rakhmerovsome of our NSs now get deployed 4-5x faster08:00
rakhmerovooh, cool08:00
akoviNasty hack, this cannot be upstreamed :(08:00
rakhmerovaah )08:00
*** gkadam has joined #openstack-mistral08:02
*** vgvoleg has joined #openstack-mistral08:09
vgvolegHi everyone! Does mistral have any recommendations about how to work with huge context? Engines lose their connections with rabbit during yaql evaluation, sometimes engines could be killed by OOM and we lose some delayed calls08:25
akoviyou've got to increase the RPC and DB heartbeat timeouts08:26
akoviand increase the memory limits08:26
vgvolegWe have a workflow with about 6000 tasks, each of which puts about 200 key:value objects into the context08:26
akoviI had an instance lately where I had to move the limit above 8G08:27
akoviDoes the kill happen when the Execution completes?08:27
vgvolegIt should be less than 1GB of mem08:28
vgvolegYes08:28
akoviyou should decrease the batch size08:28
vgvolegWe tried to give 10-15 GB to the engines, but they eat everything08:28
akoviRenat introduced it last summer if I remember correctly08:28
vgvolegwe tried to set batch size 5 :D08:29
vgvolegok we'll try to increase RPC heartbeat timeout08:29
vgvolegty08:29
akovilet the limits go away08:30
akovirun only a single engine08:30
akovito see what you have to deal with08:30
akovilooks like the json marshalling-unmarshalling is really memory intensive08:30
akovi200 tasks with 4MB contexts are 800MB context08:31
akovithis required 8-9 GB memory to go through08:31
vgvoleghow did you calculate this? :D08:32
vgvoleg4MB context08:32
*** gkadam has quit IRC08:32
vgvolegI've just woken up...08:32
akovithis is an example for what I had to tackle lately08:33
akovieasiest to check the context size from the DB08:33
akoviselect sum(len(in_context)) from action_executions_v2;08:34
akovior something similar08:34
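
A minimal sketch of the size check akovi describes, run against an in-memory SQLite stand-in rather than a real Mistral database. The table and column names are copied from the message above and are not verified against the actual schema; `total_context_length` is a hypothetical helper. Note that `LENGTH` is the function name in MySQL, PostgreSQL and SQLite (`LEN` is the SQL Server spelling), and that it counts bytes in MySQL but characters for text in SQLite and PostgreSQL.

```python
import sqlite3


def total_context_length(conn, table="action_executions_v2", column="in_context"):
    """Sum the stored length of one text column across all rows.

    Identifiers cannot be bound as query parameters, so only pass
    trusted table/column names here.
    """
    query = f"SELECT COALESCE(SUM(LENGTH({column})), 0) FROM {table}"
    return conn.execute(query).fetchone()[0]


# Stand-in database; names follow the chat message, not a verified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE action_executions_v2 (id TEXT, in_context TEXT)")
conn.executemany(
    "INSERT INTO action_executions_v2 (id, in_context) VALUES (?, ?)",
    [("a", '{"k": "v"}'), ("b", "{}")],
)

total = total_context_length(conn)
print(f"total stored context length: {total}")
```

Against a real deployment the same query would be pointed at whichever table actually holds the contexts, which may differ from the name quoted in the chat.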
vgvolegoh ty08:34
vgvoleg`increase the RPC and DB heartbeat timeouts` == heartbeat_timeout_threshold  ?08:36
akoviyes, and the number of missed HBs08:36
akoviduring YAQL evaluation the thread is not yielded and the greenthread is stuck08:37
akoviwe tried to put it on a separate real thread but other issues arose immediately08:38
akovi(or rather: consequently)08:38
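
The starvation akovi describes can be illustrated with any cooperative scheduler. The sketch below uses `asyncio` from the standard library as a stand-in for greenthreads (this is an assumption for illustration, not how Mistral or its RPC layer is implemented), and `time.sleep` as a stand-in for a long expression evaluation that never yields. All function names are hypothetical.

```python
import asyncio
import time


async def heartbeat(interval, beats):
    # Record a timestamp every time the event loop lets this task run.
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


def blocking_evaluation(seconds):
    # Stand-in for a long evaluation: it never yields to the event loop.
    time.sleep(seconds)


async def main(interval=0.05, block_for=0.3):
    beats = []
    task = asyncio.create_task(heartbeat(interval, beats))
    await asyncio.sleep(interval * 3)   # heartbeats fire normally here
    blocking_evaluation(block_for)      # loop is stuck, no heartbeats fire
    await asyncio.sleep(interval * 3)   # heartbeats resume
    task.cancel()
    # Largest pause between two consecutive heartbeats.
    return max(b - a for a, b in zip(beats, beats[1:]))


max_gap = asyncio.run(main())
print(f"largest gap between heartbeats: {max_gap:.2f}s")
```

The largest gap is at least as long as the blocking call. If such a gap exceeds the heartbeat timeout, the peer may treat the connection as dead, which matches the lost RabbitMQ connections reported above.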
vgvolegyes, I thought about it08:38
vgvoleg`the number of missed HBs` - can't find it08:38
akovi#heartbeat_rate = 208:39
akovi#heartbeat_timeout_threshold = 6008:39
akoviI think these are the two important ones08:39
akovi#heartbeat_interval = 308:40
akovimaybe this one too08:40
akoviit's freakin' loaded with legacy :)08:40
vgvolegty so much08:40
vgvolegIs there any mechanism to handle stuck delayed calls?08:42
vgvolegI've found `pickup_job_after` option08:42
akoviyes08:42
vgvolegBut I can't tell whether it's what I need08:43
akoviah, it's only in our version08:44
akoviyou can implement it with a cron job08:44
akoviselect the delayed calls that have the processing=1 flag for too long08:44
akovisimply update these lines to 008:44
akovithe engine will start processing them08:45
akovithis is the simplest way we could tackle OOM kills too08:45
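
A rough sketch of the periodic reset akovi outlines, meant to be run from cron. It uses an in-memory SQLite table as a stand-in for the real database. The table name `delayed_calls_v2` and the `updated_at` column are assumptions; only the `processing` flag is mentioned in the chat. `reset_stuck_delayed_calls` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def reset_stuck_delayed_calls(conn, max_age_seconds, now=None):
    """Clear the processing flag on calls that have held it for too long.

    Returns the number of rows that were reset.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(seconds=max_age_seconds)).isoformat()
    cur = conn.execute(
        "UPDATE delayed_calls_v2 SET processing = 0 "
        "WHERE processing = 1 AND updated_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table; the schema is assumed, not taken from Mistral.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delayed_calls_v2 (id TEXT, processing INTEGER, updated_at TEXT)"
)
now = datetime(2019, 4, 1, 9, 0, 0, tzinfo=timezone.utc)
rows = [
    ("stuck", 1, now - timedelta(hours=1)),     # flagged for an hour: reset
    ("active", 1, now - timedelta(minutes=1)),  # flagged recently: keep
    ("idle", 0, now - timedelta(hours=1)),      # not flagged: untouched
]
conn.executemany(
    "INSERT INTO delayed_calls_v2 VALUES (?, ?, ?)",
    [(i, p, t.isoformat()) for i, p, t in rows],
)

reset_count = reset_stuck_delayed_calls(conn, max_age_seconds=600, now=now)
print(f"reset {reset_count} stuck delayed call(s)")
```

The threshold has to be longer than the slowest legitimate call. Otherwise a call that is still running gets picked up a second time, which is the concern vgvoleg raises next.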
vgvolegYes, I thought about how to implement it locally, but how do you handle the case when this timeout is less than the execution time?08:46
akoviit should not be :)08:47
akovidelayed calls are usually short08:47
vgvolegNot all functions, that could be delayed, are OK with executing two times simultaneously08:47
vgvolegoh OK08:47
akovithis is practically a fix for the discrepancy that ongoing calls may not have been administered consistently at the time of the OOM kill08:48
akoviso yes, this is far slower than having an optimal solution but at least it keeps your service running08:49
vgvolegI think by the time we stop falling because of the memory, it will be possible to set some timeout08:49
vgvolegthank you08:50
akoviyou're welcome :)08:50
vgvolegThis mechanism could be implemented right in the scheduler, was there any problem with it? Or why do you use an external cron job for it?08:52
*** bobh has joined #openstack-mistral09:06
*** bobh has quit IRC09:11
*** jrist has quit IRC09:15
*** jrist has joined #openstack-mistral09:16
*** gkadam has joined #openstack-mistral09:28
vgvolegbtw parallel execution took more time than sequential09:35
vgvoleglol09:35
*** akovi has quit IRC09:44
*** akovi has joined #openstack-mistral09:45
*** d0ugal has quit IRC09:55
*** d0ugal has joined #openstack-mistral10:05
rakhmerovvgvoleg: can you remind what version of Mistral you're using?10:36
rakhmerovif you have a version from last summer (I remember something like this, Mistral Queens) then recommendation #1 from me is to switch to the latest available version from master10:37
rakhmerov+ my latest patch https://review.openstack.org/#/c/648316/10:38
rakhmerovthis patch removes a HUGE performance regression related to YAQL evaluation10:38
rakhmerovalso lots of performance improvements were made in Oct-Nov10:39
rakhmerovapetrich, d0ugal: sphinx was updated to 2.0.0 on Mar 2810:51
rakhmerovI guess that's the cause10:51
d0ugalSounds likely10:52
openstackgerritRenat Akhmerov proposed openstack/mistral master: WIP: try to pin sphinx version  https://review.openstack.org/64894410:58
vgvolegrakhmerov: I'm using latest with your patch11:01
rakhmerovok11:01
rakhmerov6000 tasks is a lot :)11:01
rakhmerovI guess parsing YAML alone takes very much time11:02
vgvolegwe are trying to place all cycles in publish section to one yaql expression, I think we could win some time with it11:08
rakhmerovvgvoleg: cycles?11:22
rakhmerovwhat do you mean by that?11:22
rakhmerovd0ugal, apetrich: yes, it's sphinx version. https://review.openstack.org/#/c/648944/ passes doc but doesn't pass requirements-check11:23
apetrichrakhmerov, great11:23
rakhmerovI probably need your advice here. Do you think we need to send a patch to the global requirements to pin sphinx version?11:24
rakhmerovvgvoleg: and make sure to set the config property "convert_input_data" to false11:25
apetrichrakhmerov, I'm not sure if that isn't an interaction with pecan11:25
apetrichI'd keep it like that for now. It is going to hit other projects if that is a sphinx bug. if it is not and it is just an interaction with pecan we might need to leave it like this anyway11:26
rakhmerovit seems that the new version of sphinx 2.0.0 conflicts with sphinxcontrib-pecanwsme11:26
rakhmerovI guess that's what it is11:27
rakhmerovthere probably must be a new version of sphinxcontrib-pecanwsme to fix that but it doesn't exist yet11:27
rakhmerovbecause yes, the problem comes from interaction of sphinx and pecanwsme11:28
apetrichyeah11:28
apetrichmakes sense11:28
apetrich"sense"11:28
*** akovi has quit IRC11:30
rakhmerovok, I'll leave it as is for now11:30
rakhmerovlet's see if they fix it11:30
*** akovi has joined #openstack-mistral11:30
openstackgerritVlad Gusev proposed openstack/mistral master: Add release note for I04ba85488b27cb05c3b81ad8c973c3cc3fe56d36  https://review.openstack.org/64895612:11
*** apetrich has quit IRC12:16
*** apetrich has joined #openstack-mistral12:17
*** apetrich has quit IRC12:36
openstackgerritVlad Gusev proposed openstack/mistral stable/stein: Add http_proxy_to_wsgi middleware  https://review.openstack.org/64769412:36
*** apetrich has joined #openstack-mistral12:53
*** irclogbot_2 has joined #openstack-mistral13:26
*** bobh has joined #openstack-mistral15:15
*** bobh has quit IRC15:19
*** pgaxatte has quit IRC15:49
*** akovi has quit IRC16:24
*** gkadam has quit IRC17:05
*** bobh has joined #openstack-mistral17:15
*** zigo has quit IRC17:37
*** bobh has quit IRC17:54
*** openstackgerrit has quit IRC23:56
