| *** Bakey has quit IRC | 00:51 | |
| *** bcoca has quit IRC | 02:14 | |
| *** _dev has quit IRC | 05:18 | |
| *** resmo has joined #ara | 08:45 | |
| *** resmo has quit IRC | 08:56 | |
| *** BlessJah has quit IRC | 09:39 | |
| *** andymccr has quit IRC | 09:39 | |
| *** BlessJah has joined #ara | 09:39 | |
| *** andymccr has joined #ara | 09:40 | |
| *** resmo has joined #ara | 10:02 | |
| *** jparrill has quit IRC | 10:09 | |
| *** berendt has quit IRC | 10:09 | |
| *** jparrill has joined #ara | 10:15 | |
| *** berendt has joined #ara | 10:15 | |
| *** sshnaidm|off is now known as sshnaidm|rover | 10:48 | |
| *** vcn[m] has quit IRC | 11:14 | |
| *** vcn[m] has joined #ara | 12:29 | |
| *** tbielawa has joined #ara | 12:42 | |
| *** tbielawa has quit IRC | 12:42 | |
| *** tbielawa has joined #ara | 12:43 | |
| *** tbielawa is now known as tbielawa|drappt | 13:33 | |
| *** bcoca has joined #ara | 14:56 | |
| *** Bakey has joined #ara | 15:01 | |
| *** jrist has quit IRC | 15:08 | |
| *** jrist has joined #ara | 15:09 | |
| *** tbielawa|drappt is now known as tbielawa | 15:16 | |
| -openstackstatus- NOTICE: if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. please "recheck" | 15:36 | |
| *** tbielawa is now known as tbielawa|lunch | 17:00 | |
| *** resmo has quit IRC | 17:21 | |
| *** tbielawa|lunch is now known as tbielawa | 18:00 | |
| *** sshnaidm|rover is now known as sshnaidm|off | 18:02 | |
| *** wwriverrat has joined #ara | 21:40 | |
| wwriverrat | Hi. I’ve been running ansible tied to ara here at GoDaddy. Been noticing it’s not capturing all of the tasks and seems to stop recording tasks after a while (sometimes early, sometimes later). Anyone else see something similar? | 21:45 |
| wwriverrat | For those that drop tasks, it appears on main page they never completed | 21:48 |
| harlowja | Might be retries needed from what I see :/ | 21:55 |
| harlowja | https://gist.github.com/harlowja/0c98e92c40c9366a2f56fbe5aa2f083b | 21:56 |
| *** tbielawa has quit IRC | 22:09 | |
| dmsimard | wwriverrat: hey there | 23:00 |
| wwriverrat | hey :) | 23:00 |
| dmsimard | harlowja is way better than I am for this kind of stuff but let's try to figure something out | 23:01 |
| harlowja | u can do its | 23:01 |
| harlowja | lol | 23:01 |
| dmsimard | ARA isn't very tolerant to failure right now but it's tricky... Are you getting that on a regular basis ? | 23:03 |
| wwriverrat | I think harlowja got it figured out. Two thoughts: 1) Like he said, apply a retry and/or 2) Make sure when a connection is pulled for use, it is a connection that the server-side hasn't dropped | 23:03 |
| dmsimard | I say it's tricky because ara has to eventually let go and give up otherwise it could hang the entire ansible process: https://github.com/ansible/ansible/issues/27705 | 23:04 |
| wwriverrat | We used to do a “test on borrow” technique: http://docs.sqlalchemy.org/en/latest/core/pooling.html#dealing-with-disconnects | 23:07 |
| dmsimard | So I think we can do a minimum here and retry another time, though | 23:07 |
| * dmsimard reads | 23:07 | |
| wwriverrat | That way if the server-side drops the connection while it is in the pool, it gets refreshed before there is an attempt to use it | 23:07 |
| dmsimard | That makes sense. The sqlalchemy stuff in ARA is very basic right now, and a bit all over the place tbh | 23:08 |
| wwriverrat | refreshed by typically invoking a “select 1” (and fixing the connection if broken) before your code attempts to use it | 23:09 |
| dmsimard | Part of that is being addressed with ara 1.0 but it won't be out for a while so that doesn't help you | 23:09 |
| wwriverrat | no worries. Just a thought. Likely several ways to skin that cat. | 23:09 |
| dmsimard | In 1.0, actually, ara no longer speaks to the database directly. | 23:09 |
| dmsimard | In your use case, it'd be over HTTP with the new API. | 23:10 |
| *** jparrill has quit IRC | 23:10 | |
| dmsimard | Hang on, let me get off my phone and on a keyboard... :D | 23:10 |
| dmsimard | yay, keyboard \o/ | 23:11 |
| dmsimard | Ok, so, in 1.0 if you're going to be sending data from one place to the other like you are doing now, it would be through the API. Everything goes through a single place, the API client, and I am betting a lot on that to be this central point where this kind of failure tolerance will be easy to implement | 23:13 |
| dmsimard | If this is a regular occurrence for you, I'd love to accommodate you by releasing something helpful in the stable branch but I am honestly not very knowledgeable when it comes to stuff like that | 23:14 |
| dmsimard | I know that harlowja is familiar with oslo.db which likely has stuff like that built-in I suppose | 23:15 |
| * harlowja is just a dumb person | 23:15 | |
| harlowja | lol | 23:15 |
| harlowja | nope nope, don't know anything | 23:15 |
| harlowja | lol | 23:15 |
| harlowja | dumb josh | 23:15 |
| harlowja | being dumb | 23:15 |
| harlowja | lol | 23:15 |
| dmsimard | However, last time I've looked at oslo.db, it also required oslo.config, which totally derails the thing | 23:16 |
| wwriverrat | I’ve been running ansible jobs all day. For ~50% of them, I’ve seen some point where the server-side of MySQL drops the connection under our feet | 23:16 |
| dmsimard | wwriverrat: 50% ouch | 23:17 |
| harlowja | shitty networking? | 23:17 |
| wwriverrat | that or I believe mysql default wait_timeout is 30 seconds | 23:17 |
| wwriverrat | so if you have a pool of say 10 connections and you finally get around to using #8, it could have been long ago disconnected | 23:18 |
| wwriverrat | In that link I sent above, there are several techniques for handling stale mysql connections (pessimistic, optimistic, pool recycle) | 23:20 |
| harlowja | from what i understand ara is using http://flask-sqlalchemy.pocoo.org/2.3/config/ | 23:20 |
| harlowja | which has a SQLALCHEMY_POOL_TIMEOUT: "Specifies the connection timeout in seconds for the pool." | 23:21 |
| harlowja | might just need to set that somehow | 23:21 |
| harlowja | and set it lower than mysql/mariadb value | 23:21 |
| dmsimard | yeah I'm reading right now | 23:21 |
| dmsimard | wwriverrat: what's your database server timeout ? | 23:21 |
| harlowja | 302 harlowja | 23:21 |
| harlowja | lol | 23:21 |
| wwriverrat | lol.. stole my thunder | 23:22 |
| harlowja | we just used a standard centos mariadb | 23:22 |
| harlowja | so i'm guessing the default | 23:22 |
| wwriverrat | I did the google thing a while ago and found 30s | 23:22 |
| harlowja | ya, its whatever the default is :-P | 23:23 |
| harlowja | of `mariadb-server-5.5.52-1.el7.x86_64` | 23:23 |
| harlowja | prob just need to know how to set SQLALCHEMY_POOL_TIMEOUT so that it gets picked up by ara | 23:24 |
| dmsimard | I love that sqlalchemy has something called connectionFairy | 23:24 |
| dmsimard | harlowja, wwriverrat: If I write a patch that would plumb that up, would you like to try that out and let me know if it helps ? | 23:25 |
| harlowja | all sorts of weird shit in there, lol | 23:25 |
| harlowja | yes | 23:25 |
| harlowja | wwriverrat very much would like that | 23:25 |
| harlowja | :-P | 23:25 |
| dmsimard | ok, give me a minute | 23:25 |
| wwriverrat | Yep, and it would be on us to ensure SQLALCHEMY_POOL_TIMEOUT is <= mariadb wait_timeout | 23:25 |
| *** openstackgerrit has joined #ara | 23:35 | |
| openstackgerrit | David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle https://review.openstack.org/524427 | 23:35 |
| dmsimard | harlowja, wwriverrat ^ | 23:35 |
| dmsimard | do you use an ansible.cfg file to configure ara ? or env variables ? | 23:36 |
| harlowja | ansible.cfg | 23:36 |
| dmsimard | ok, so, under [ara], you'll want to play with sqlalchemy_pool_size, sqlalchemy_pool_timeout and sqlalchemy_pool_recycle | 23:37 |
| harlowja | any changes needed to do that? | 23:37 |
| harlowja | (ara code changes) | 23:37 |
| dmsimard | well yeah, you're going to need the patch that I just sent | 23:38 |
| harlowja | oh | 23:38 |
| * harlowja slaps self in face | 23:38 | |
| harlowja | lol | 23:38 |
| dmsimard | don't do that :) | 23:38 |
| * harlowja slaps someone else in face? | 23:38 | |
| harlowja | lol | 23:38 |
| dmsimard | ¯\_(ツ)_/¯ | 23:38 |
| dmsimard | I have to catch dinner before I digest myself. Can you try that out and let me know if it helps ? I'll try and see where we go from there. | 23:39 |
| harlowja | yup | 23:39 |
| dmsimard | thanks | 23:41 |
| dmsimard | and like I mentioned, I'm definitely planning to bake failure tolerance at the API client layer which will be fairly easy. Right now there's queries to the database a bit all over the place so it's not super trivial beyond tweaking those pool settings | 23:42 |
| harlowja | yuppers | 23:42 |
| harlowja | might have to get off flask_sqlalchemy thingy | 23:42 |
| harlowja | and onto <one of the other 10000000 libs> | 23:42 |
| dmsimard | lol | 23:42 |
| harlowja | ^ not oslo.db, lol | 23:42 |
| dmsimard | it's funny that you mention that | 23:42 |
| dmsimard | because one of the only things that kept me on flask is removed in 1.0 | 23:43 |
| dmsimard | and awx is built in django | 23:43 |
| dmsimard | but I don't really wanna rewrite everything, and I don't know django at all anyway | 23:43 |
| harlowja | :-p | 23:43 |
| dmsimard | but django rest framework looks pretty dope | 23:43 |
| dmsimard | ok afk food, please keep me updated | 23:44 |
| harlowja | willdo | 23:44 |
| wwriverrat | Ironically: I upped my forks from 5 to 8. Getting all the results :P . Assume it’s cause it takes less time and taken from pool sooner :) | 23:47 |
| *** Bakey has quit IRC | 23:54 | |
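
The "test on borrow" idea wwriverrat describes above (run a `SELECT 1` before handing a pooled connection to the caller, and replace the connection if the ping fails) can be sketched roughly as follows. This is only an illustration under stated assumptions, not ARA's code or SQLAlchemy's implementation: `PingingPool`, `connect`, and the `replaced` counter are hypothetical names, and the standard-library `sqlite3` module stands in for MySQL/MariaDB, with a closed connection simulating one the server dropped while it sat idle. In SQLAlchemy itself, `pool_recycle` is the setting that retires connections older than a given number of seconds, while `pool_timeout` bounds how long a checkout waits for a free connection; newer SQLAlchemy releases also offer `pool_pre_ping=True` on `create_engine()` for the pessimistic check.

```python
import sqlite3
from collections import deque


class PingingPool:
    """Hypothetical minimal pool that validates a connection on checkout."""

    def __init__(self, connect, size=2):
        self._connect = connect                      # zero-argument connection factory
        self._idle = deque(connect() for _ in range(size))
        self.replaced = 0                            # stale connections swapped out so far

    @staticmethod
    def _alive(conn):
        # The "test on borrow" ping: a cheap query that fails on a dead connection.
        try:
            conn.cursor().execute("SELECT 1").fetchone()
            return True
        except sqlite3.Error:
            return False

    def acquire(self):
        # Take an idle connection if one exists, otherwise open a fresh one.
        conn = self._idle.popleft() if self._idle else self._connect()
        if not self._alive(conn):
            # The connection died while idle: replace it before the caller sees it.
            self.replaced += 1
            conn = self._connect()
        return conn

    def release(self, conn):
        self._idle.append(conn)


def connect():
    # Stand-in for a real MySQL/MariaDB connection (assumption for this sketch).
    return sqlite3.connect(":memory:")


pool = PingingPool(connect, size=2)
pool._idle[0].close()          # simulate the server dropping an idle connection
conn = pool.acquire()          # the ping fails, so the pool hands back a fresh connection
answer = conn.cursor().execute("SELECT 40 + 2").fetchone()[0]
pool.release(conn)
```

The ping costs one extra round trip per checkout, which is the trade-off of the pessimistic approach; age-based recycling avoids that cost but only helps when the recycle interval is shorter than the server's `wait_timeout`.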