Thursday, 2017-11-30

*** Bakey has quit IRC		00:51
*** bcoca has quit IRC		02:14
*** _dev has quit IRC		05:18
*** resmo has joined #ara		08:45
*** resmo has quit IRC		08:56
*** BlessJah has quit IRC		09:39
*** andymccr has quit IRC		09:39
*** BlessJah has joined #ara		09:39
*** andymccr has joined #ara		09:40
*** resmo has joined #ara		10:02
*** jparrill has quit IRC		10:09
*** berendt has quit IRC		10:09
*** jparrill has joined #ara		10:15
*** berendt has joined #ara		10:15
*** sshnaidm|off is now known as sshnaidm|rover		10:48
*** vcn[m] has quit IRC		11:14
*** vcn[m] has joined #ara		12:29
*** tbielawa has joined #ara		12:42
*** tbielawa has quit IRC		12:42
*** tbielawa has joined #ara		12:43
*** tbielawa is now known as tbielawa|drappt		13:33
*** bcoca has joined #ara		14:56
*** Bakey has joined #ara		15:01
*** jrist has quit IRC		15:08
*** jrist has joined #ara		15:09
*** tbielawa|drappt is now known as tbielawa		15:16
-openstackstatus- NOTICE: if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. please "recheck"		15:36
*** tbielawa is now known as tbielawa|lunch		17:00
*** resmo has quit IRC		17:21
*** tbielawa|lunch is now known as tbielawa		18:00
*** sshnaidm|rover is now known as sshnaidm|off		18:02
*** wwriverrat has joined #ara		21:40
wwriverrat	Hi. I’ve been running ansible tied to ara here at GoDaddy. Been noticing it’s not capturing all of the tasks and seems to stop recording tasks after a while (sometimes early, sometimes later). Anyone else see something similar?	21:45
wwriverrat	For those that drop tasks, it appears on main page they never completed	21:48
harlowja	Might be retries needed from what I see :/	21:55
harlowja	https://gist.github.com/harlowja/0c98e92c40c9366a2f56fbe5aa2f083b	21:56
*** tbielawa has quit IRC		22:09
dmsimard	wwriverrat: hey there	23:00
wwriverrat	hey :)	23:00
dmsimard	harlowja is way better than I am for this kind of stuff but let's try to figure something out	23:01
harlowja	u can do its	23:01
harlowja	lol	23:01
dmsimard	ARA isn't very tolerant to failure right now but it's tricky... Are you getting that on a regular basis ?	23:03
wwriverrat	I think harlowja got it figured out. Two thoughts: 1) Like he said, apply a retry and/or 2) Make sure when a connection is pulled for use, it has a connection that the server-side hasn't dropped	23:03
dmsimard	I say it's tricky because ara has to eventually let go and give up otherwise it could hang the entire ansible process: https://github.com/ansible/ansible/issues/27705	23:04
wwriverrat	We used to do a “test on borrow” technique: http://docs.sqlalchemy.org/en/latest/core/pooling.html#dealing-with-disconnects	23:07
dmsimard	So I think we can do a minimum here and retry another time, though	23:07
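
The "retry, but eventually give up" idea discussed here can be sketched roughly as follows. This is only an illustration, not ARA's code: `call_with_retries` and `flaky_write` are hypothetical names, and `ConnectionError` stands in for whatever exception the database driver actually raises.

```python
import time


def call_with_retries(operation, attempts=2, delay=0.0, retry_on=(ConnectionError,)):
    """Run operation(); retry on the listed exceptions, then give up and re-raise."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # bounded: the caller is never blocked forever
            time.sleep(delay)


calls = []


def flaky_write():
    # Hypothetical stand-in for a database write that fails once, then succeeds.
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("server has gone away")
    return "recorded"


result = call_with_retries(flaky_write)
print(result, len(calls))
```

Capping the number of attempts is what keeps a retry from turning into the kind of hang described in the linked Ansible issue.
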
* dmsimard reads		23:07
wwriverrat	That way if the server-side drops the connection while it is in the pool, it gets refreshed before there is an attempt to use it	23:07
dmsimard	That makes sense. The sqlalchemy stuff in ARA is very basic right now, and a bit all over the place tbh	23:08
wwriverrat	refreshed by typically invoking a “select 1” (and fixing the connection if broken) before your code attempts to use it	23:09
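
A minimal sketch of the "test on borrow" technique described above, using only the standard library. The `PingingPool` class is hypothetical and uses sqlite3 so it runs anywhere; a closed connection simulates one dropped by the server. With SQLAlchemy itself, the linked "dealing with disconnects" page is the place to look rather than hand-rolling this.

```python
import sqlite3
from collections import deque


class PingingPool:
    """Toy connection pool that pings each connection before handing it out."""

    def __init__(self, connect, size=2):
        self._connect = connect  # zero-argument factory returning a DB-API connection
        self._idle = deque(connect() for _ in range(size))
        self.replaced = 0  # how many stale connections were swapped out

    def borrow(self):
        conn = self._idle.popleft() if self._idle else self._connect()
        try:
            conn.cursor().execute("SELECT 1")  # cheap liveness probe
        except sqlite3.Error:
            # The connection went stale while idle: discard it and open a fresh one.
            self.replaced += 1
            conn = self._connect()
        return conn

    def give_back(self, conn):
        self._idle.append(conn)


pool = PingingPool(lambda: sqlite3.connect(":memory:"))
stale = pool.borrow()
stale.close()          # simulate the server dropping the connection
pool.give_back(stale)
pool.borrow()          # the other, still healthy, pooled connection
fresh = pool.borrow()  # the closed one: detected and replaced
print(fresh.cursor().execute("SELECT 1").fetchone(), pool.replaced)
```

The cost is one extra round trip per borrow, which is why the same page also describes optimistic handling and pool recycling as alternatives.
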
dmsimard	Part of that is being addressed with ara 1.0 but it won't be out for a while so that doesn't help you	23:09
wwriverrat	no worries. Just a thought. Likely several ways to skin that cat.	23:09
dmsimard	In 1.0, actually, ara no longer speaks to the database directly.	23:09
dmsimard	In your use case, it'd be over HTTP with the new API.	23:10
*** jparrill has quit IRC		23:10
dmsimard	Hang on, let me get off my phone and on a keyboard... :D	23:10
dmsimard	yay, keyboard \o/	23:11
dmsimard	Ok, so, in 1.0 if you're going to be sending data from one place to the other like you are doing now, it would be through the API. Everything goes through a single place, the API client, and I am betting a lot on that to be this central point where this kind of failure tolerance will be easy to implement	23:13
dmsimard	If this is a regular occurrence for you, I'd love to accommodate you by releasing something helpful in the stable branch but I am honestly not very knowledgeable when it comes to stuff like that	23:14
dmsimard	I know that harlowja is familiar with oslo.db which likely has stuff like that built-in I suppose	23:15
* harlowja is just a dumb person		23:15
harlowja	lol	23:15
harlowja	nope nope, don't know anything	23:15
harlowja	lol	23:15
harlowja	dumb josh	23:15
harlowja	being dumb	23:15
harlowja	lol	23:15
dmsimard	However, last time I've looked at oslo.db, it also required oslo.config, which totally derails the thing	23:16
wwriverrat	I’ve been running ansible jobs all day. For ~50% of them, I’ve seen some point where the server-side of MySQL drops the connection under our feet	23:16
dmsimard	wwriverrat: 50% ouch	23:17
harlowja	shitty networking?	23:17
wwriverrat	that or I believe mysql default wait_timeout is 30 seconds	23:17
wwriverrat	so if you have a pool of say 10 connections and you finally get around to using #8, it could have been long ago disconnected	23:18
wwriverrat	In that link I sent above, there are several techniques for handling stale mysql connections (pessimistic, optimistic, pool recycle)	23:20
harlowja	from what i understand ara is using http://flask-sqlalchemy.pocoo.org/2.3/config/	23:20
harlowja	which has a SQLALCHEMY_POOL_TIMEOUT Specifies the connection timeout in seconds for the pool.	23:21
harlowja	might just need to set that somehow	23:21
harlowja	and set it lower than mysql/mariadb value	23:21
dmsimard	yeah I'm reading right now	23:21
dmsimard	wwriverrat: what's your database server timeout ?	23:21
harlowja	302 harlowja	23:21
harlowja	lol	23:21
wwriverrat	lol.. stole my thunder	23:22
harlowja	we just used a standard centos mariadb	23:22
harlowja	so i'm guessing the default	23:22
wwriverrat	I did the google thing a while ago and found 30s	23:22
harlowja	ya, its whatever the default is :-P	23:23
harlowja	of `mariadb-server-5.5.52-1.el7.x86_64`	23:23
harlowja	prob just need to know how to set SQLALCHEMY_POOL_TIMEOUT so that it gets picked up by ara	23:24
dmsimard	I love that sqlalchemy has something called connectionFairy	23:24
dmsimard	harlowja, wwriverrat: If I write a patch that would plumb that up, would you like to try that out and let me know if it helps ?	23:25
harlowja	all sorts of weird shit in there, lol	23:25
harlowja	yes	23:25
harlowja	wwriverrat very much would like that	23:25
harlowja	:-P	23:25
dmsimard	ok, give me a minute	23:25
wwriverrat	Yep, and it would be on us to ensure SQLALCHEMY_POOL_TIMEOUT is <= mariadb wait_timeout	23:25
*** openstackgerrit has joined #ara		23:35
openstackgerrit	David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle https://review.openstack.org/524427	23:35
dmsimard	harlowja, wwriverrat ^	23:35
dmsimard	do you use an ansible.cfg file to configure ara ? or env variables ?	23:36
harlowja	ansible.cfg	23:36
dmsimard	ok, so, under [ara], you'll want to play with sqlalchemy_pool_size, sqlalchemy_pool_timeout and sqlalchemy_pool_recycle	23:37
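
As a rough illustration of those settings, the sketch below parses a hypothetical `ansible.cfg` fragment and checks it against the server timeout. The three option names come from the message above; every value is made up, and the 30-second `SERVER_WAIT_TIMEOUT` is just the figure quoted in this conversation (the stock MySQL/MariaDB default for `wait_timeout` is 28800 seconds, so check the real value on the server). In SQLAlchemy, pool recycle is the setting that bounds a connection's age, while pool timeout is how long to wait for a free connection from the pool, so the comparison here uses the recycle value.

```python
import configparser

# Hypothetical ansible.cfg fragment: option names from the chat, values invented.
ANSIBLE_CFG = """
[ara]
sqlalchemy_pool_size = 5
sqlalchemy_pool_timeout = 10
sqlalchemy_pool_recycle = 25
"""

# Assumed server-side value; verify with: SHOW VARIABLES LIKE 'wait_timeout';
SERVER_WAIT_TIMEOUT = 30

parser = configparser.ConfigParser()
parser.read_string(ANSIBLE_CFG)
ara = parser["ara"]

recycle = ara.getint("sqlalchemy_pool_recycle")
# Connections should be recycled before the server would drop them as idle.
recycle_is_safe = 0 < recycle < SERVER_WAIT_TIMEOUT
print(recycle, recycle_is_safe)
```
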
harlowja	any changes needed to do that?	23:37
harlowja	(ara code changes)	23:37
dmsimard	well yeah, you're going to need the patch that I just sent	23:38
harlowja	oh	23:38
* harlowja slaps self in face		23:38
harlowja	lol	23:38
dmsimard	don't do that :)	23:38
* harlowja slaps someone else in face?		23:38
harlowja	lol	23:38
dmsimard	¯\_(ツ)_/¯	23:38
dmsimard	I have to catch dinner before I digest myself. Can you try that out and let me know if it helps ? I'll try and see where we go from there.	23:39
harlowja	yup	23:39
dmsimard	thanks	23:41
dmsimard	and like I mentioned, I'm definitely planning to bake failure tolerance at the API client layer which will be fairly easy. Right now there's queries to the database a bit all over the place so it's not super trivial beyond tweaking those pool settings	23:42
harlowja	yuppers	23:42
harlowja	might have to get off flask_sqlalchemy thingy	23:42
harlowja	and onto <one of the other 10000000 libs>	23:42
dmsimard	lol	23:42
harlowja	^ not oslo.db, lol	23:42
dmsimard	it's funny that you mention that	23:42
dmsimard	because one of the only things that kept me on flask is removed in 1.0	23:43
dmsimard	and awx is built in django	23:43
dmsimard	but I don't really wanna rewrite everything, and I don't know django at all anyway	23:43
harlowja	:-p	23:43
dmsimard	but django rest framework looks pretty dope	23:43
dmsimard	ok afk food, please keep me updated	23:44
harlowja	willdo	23:44
wwriverrat	Ironically: I upped my forks from 5 to 8. Getting all the results :P . Assume it’s cause it takes less time and taken from pool sooner :)	23:47
*** Bakey has quit IRC		23:54
