*** Bakey has quit IRC | 00:51 | |
*** bcoca has quit IRC | 02:14 | |
*** _dev has quit IRC | 05:18 | |
*** resmo has joined #ara | 08:45 | |
*** resmo has quit IRC | 08:56 | |
*** BlessJah has quit IRC | 09:39 | |
*** andymccr has quit IRC | 09:39 | |
*** BlessJah has joined #ara | 09:39 | |
*** andymccr has joined #ara | 09:40 | |
*** resmo has joined #ara | 10:02 | |
*** jparrill has quit IRC | 10:09 | |
*** berendt has quit IRC | 10:09 | |
*** jparrill has joined #ara | 10:15 | |
*** berendt has joined #ara | 10:15 | |
*** sshnaidm|off is now known as sshnaidm|rover | 10:48 | |
*** vcn[m] has quit IRC | 11:14 | |
*** vcn[m] has joined #ara | 12:29 | |
*** tbielawa has joined #ara | 12:42 | |
*** tbielawa has quit IRC | 12:42 | |
*** tbielawa has joined #ara | 12:43 | |
*** tbielawa is now known as tbielawa|drappt | 13:33 | |
*** bcoca has joined #ara | 14:56 | |
*** Bakey has joined #ara | 15:01 | |
*** jrist has quit IRC | 15:08 | |
*** jrist has joined #ara | 15:09 | |
*** tbielawa|drappt is now known as tbielawa | 15:16 | |
-openstackstatus- NOTICE: if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. please "recheck" | 15:36 | |
*** tbielawa is now known as tbielawa|lunch | 17:00 | |
*** resmo has quit IRC | 17:21 | |
*** tbielawa|lunch is now known as tbielawa | 18:00 | |
*** sshnaidm|rover is now known as sshnaidm|off | 18:02 | |
*** wwriverrat has joined #ara | 21:40 | |
wwriverrat | Hi. I’ve been running ansible tied to ara here at GoDaddy. Been noticing it’s not capturing all of the tasks and seems to stop recording tasks after a while (sometimes early, sometimes later). Anyone else see something similar? | 21:45 |
wwriverrat | For those that drop tasks, it appears on main page they never completed | 21:48 |
harlowja | Might be retries needed from what I see :/ | 21:55 |
harlowja | https://gist.github.com/harlowja/0c98e92c40c9366a2f56fbe5aa2f083b | 21:56 |
*** tbielawa has quit IRC | 22:09 | |
dmsimard | wwriverrat: hey there | 23:00 |
wwriverrat | hey :) | 23:00 |
dmsimard | harlowja is way better than I am for this kind of stuff but let's try to figure something out | 23:01 |
harlowja | u can do its | 23:01 |
harlowja | lol | 23:01 |
dmsimard | ARA isn't very tolerant to failure right now but it's tricky... Are you getting that on a regular basis ? | 23:03 |
wwriverrat | I think harlowja got it figured out. Two thoughts: 1) Like he said, apply a retry and/or 2) Make sure when a connection is pulled for use, it has a connection that the server-side hasn't dropped | 23:03 |
dmsimard | I say it's tricky because ara has to eventually let go and give up otherwise it could hang the entire ansible process: https://github.com/ansible/ansible/issues/27705 | 23:04 |
wwriverrat | We used to do a “test on borrow” technique: http://docs.sqlalchemy.org/en/latest/core/pooling.html#dealing-with-disconnects | 23:07 |
dmsimard | So I think we can do a minimum here and retry another time, though | 23:07 |
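A minimal sketch of the bounded retry discussed above, not ARA's actual implementation: the operation is retried a fixed number of times and then the error is re-raised, so the caller never hangs. The `retry` helper and the `flaky_write` stand-in are hypothetical names used only for illustration.

```python
import time


def retry(func, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, then re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up so the calling process is never blocked forever
            time.sleep(delay * attempt)  # simple linear backoff between tries


# Hypothetical stand-in for a database write that fails twice, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server has gone away")
    return "saved"


result = retry(flaky_write, attempts=3, exceptions=(ConnectionError,))
print(result, calls["count"])
```

Capping the number of attempts is the point of the design: a callback that retries forever could stall the whole Ansible run.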
* dmsimard reads | 23:07 | |
wwriverrat | That way if the server-side drops the connection while it is in the pool, it gets refreshed before there is an attempt to use it | 23:07 |
dmsimard | That makes sense. The sqlalchemy stuff in ARA is very basic right now, and a bit all over the place tbh | 23:08 |
wwriverrat | refreshed by typically invoking a “select 1” (and fixing the connection if broken) before your code attempts to use it | 23:09 |
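The "test on borrow" idea can be sketched roughly as follows. This is an illustration only: it uses the standard library's `sqlite3` instead of MySQL, and `PingingPool` is a hypothetical class, not part of SQLAlchemy or ARA. Newer SQLAlchemy releases (1.2 and later) offer the same behaviour through the `pool_pre_ping=True` argument to `create_engine`.

```python
import sqlite3
from collections import deque


class PingingPool:
    """Tiny pool that checks each connection with SELECT 1 before lending it."""

    def __init__(self, connect, size=1):
        self._connect = connect
        self._idle = deque(connect() for _ in range(size))

    def checkout(self):
        conn = self._idle.popleft() if self._idle else self._connect()
        try:
            conn.execute("SELECT 1")  # the "test on borrow" ping
        except sqlite3.Error:
            conn = self._connect()  # stale connection: replace it before use
        return conn

    def checkin(self, conn):
        self._idle.append(conn)


pool = PingingPool(lambda: sqlite3.connect(":memory:"))

# Simulate the server dropping a pooled connection behind our back.
stale = pool.checkout()
stale.close()
pool.checkin(stale)

fresh = pool.checkout()  # the ping fails, so a new connection is handed out
print(fresh is not stale)
```

The cost is one extra round trip per checkout, which is why this is usually called the pessimistic approach.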
dmsimard | Part of that is being addressed with ara 1.0 but it won't be out for a while so that doesn't help you | 23:09 |
wwriverrat | no worries. Just a thought. Likely several ways to skin that cat. | 23:09 |
dmsimard | In 1.0, actually, ara no longer speaks to the database directly. | 23:09 |
dmsimard | In your use case, it'd be over HTTP with the new API. | 23:10 |
*** jparrill has quit IRC | 23:10 | |
dmsimard | Hang on, let me get off my phone and on a keyboard... :D | 23:10 |
dmsimard | yay, keyboard \o/ | 23:11 |
dmsimard | Ok, so, in 1.0 if you're going to be sending data from one place to the other like you are doing now, it would be through the API. Everything goes through a single place, the API client, and I am betting a lot on that to be this central point where this kind of failure tolerance will be easy to implement | 23:13 |
dmsimard | If this is a regular occurrence for you, I'd love to accommodate you by releasing something helpful in the stable branch but I am honestly not very knowledgeable when it comes to stuff like that | 23:14 |
dmsimard | I know that harlowja is familiar with oslo.db which likely has stuff like that built-in I suppose | 23:15 |
* harlowja is just a dumb person | 23:15 | |
harlowja | lol | 23:15 |
harlowja | nope nope, don't know anything | 23:15 |
harlowja | lol | 23:15 |
harlowja | dumb josh | 23:15 |
harlowja | being dumb | 23:15 |
harlowja | lol | 23:15 |
dmsimard | However, last time I've looked at oslo.db, it also required oslo.config, which totally derails the thing | 23:16 |
wwriverrat | I’ve been running ansible jobs all day. For ~50% of them, I’ve seen some point where the server-side of MySQL drops the connection under our feet | 23:16 |
dmsimard | wwriverrat: 50% ouch | 23:17 |
harlowja | shitty networking? | 23:17 |
wwriverrat | that or I believe mysql default wait_timeout is 30 seconds | 23:17 |
wwriverrat | so if you have a pool of say 10 connections and you finally get around to using #8, it could have been long ago disconnected | 23:18 |
wwriverrat | In that link I sent above, there are several techniques for handling stale mysql connections (pessimistic, optimistic, pool recycle) | 23:20 |
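Of the techniques listed, pool recycle is probably the simplest to picture: a connection older than a threshold is replaced at checkout, whether or not it is actually broken. Below is a rough sketch under stated assumptions: `RecyclingPool` is a hypothetical single-connection pool, `sqlite3` stands in for MySQL, and a fake clock is injected so the example runs instantly.

```python
import sqlite3
import time


class RecyclingPool:
    """Single-connection pool that replaces connections older than `recycle` seconds."""

    def __init__(self, connect, recycle, clock=time.monotonic):
        self._connect = connect
        self._recycle = recycle
        self._clock = clock
        self._conn = None
        self._created = 0.0

    def checkout(self):
        too_old = (
            self._conn is not None
            and self._clock() - self._created > self._recycle
        )
        if self._conn is None or too_old:
            if self._conn is not None:
                self._conn.close()  # discard it before the server can drop it
            self._conn = self._connect()
            self._created = self._clock()
        return self._conn


# Fake clock so the ageing can be simulated without sleeping.
now = [0.0]
pool = RecyclingPool(lambda: sqlite3.connect(":memory:"), recycle=20,
                     clock=lambda: now[0])

first = pool.checkout()
now[0] = 10.0
same = pool.checkout()    # 10s old: still within the recycle window
now[0] = 31.0
second = pool.checkout()  # 31s old: replaced with a new connection
print(same is first, second is not first)
```

For this to help, the recycle value has to be lower than the server-side idle timeout.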
harlowja | from what i understand ara is using http://flask-sqlalchemy.pocoo.org/2.3/config/ | 23:20 |
harlowja | which has a SQLALCHEMY_POOL_TIMEOUT Specifies the connection timeout in seconds for the pool. | 23:21 |
harlowja | might just need to set that somehow | 23:21 |
harlowja | and set it lower than mysql/mariadb value | 23:21 |
dmsimard | yeah I'm reading right now | 23:21 |
dmsimard | wwriverrat: what's your database server timeout ? | 23:21 |
harlowja | 302 harlowja | 23:21 |
harlowja | lol | 23:21 |
wwriverrat | lol.. stole my thunder | 23:22 |
harlowja | we just used a standard centos mariadb | 23:22 |
harlowja | so i'm guessing the default | 23:22 |
wwriverrat | I did the google thing a while ago and found 30s | 23:22 |
harlowja | ya, its whatever the default is :-P | 23:23 |
harlowja | of `mariadb-server-5.5.52-1.el7.x86_64` | 23:23 |
harlowja | prob just need to know how to set SQLALCHEMY_POOL_TIMEOUT so that it gets picked up by ara | 23:24 |
dmsimard | I love that sqlalchemy has something called connectionFairy | 23:24 |
dmsimard | harlowja, wwriverrat: If I write a patch that would plumb that up, would you like to try that out and let me know if it helps ? | 23:25 |
harlowja | all sorts of weird shit in there, lol | 23:25 |
harlowja | yes | 23:25 |
harlowja | wwriverrat very much would like that | 23:25 |
harlowja | :-P | 23:25 |
dmsimard | ok, give me a minute | 23:25 |
wwriverrat | Yep, and it would be on us to ensure SQLALCHEMY_POOL_TIMEOUT is <= mariadb wait_timeout | 23:25 |
*** openstackgerrit has joined #ara | 23:35 | |
openstackgerrit | David Moreau Simard proposed openstack/ara master: Add support for configuring sqlalchemy pool size, timeout and recycle https://review.openstack.org/524427 | 23:35 |
dmsimard | harlowja, wwriverrat ^ | 23:35 |
dmsimard | do you use an ansible.cfg file to configure ara ? or env variables ? | 23:36 |
harlowja | ansible.cfg | 23:36 |
dmsimard | ok, so, under [ara], you'll want to play with sqlalchemy_pool_size, sqlalchemy_pool_timeout and sqlalchemy_pool_recycle | 23:37 |
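As a rough illustration of what that `[ara]` section might look like, the sketch below parses an example `ansible.cfg` fragment and compares it with the server timeout. The option names are the ones dmsimard lists. The numeric values and the 30 second `wait_timeout` are assumptions taken from this discussion, not verified defaults, and the real server value can be checked with `SHOW VARIABLES LIKE 'wait_timeout'`. In SQLAlchemy, `pool_recycle` is the setting that bounds a connection's age, while `pool_timeout` is how long a checkout waits for a free connection, so the comparison is made on the recycle value.

```python
import configparser

# Example ansible.cfg fragment; the values are placeholders, not recommendations.
ANSIBLE_CFG = """
[ara]
sqlalchemy_pool_size = 5
sqlalchemy_pool_timeout = 10
sqlalchemy_pool_recycle = 25
"""

# Assumed server-side idle timeout in seconds (the figure quoted in the chat).
ASSUMED_WAIT_TIMEOUT = 30

parser = configparser.ConfigParser()
parser.read_string(ANSIBLE_CFG)
ara = parser["ara"]

recycle = ara.getint("sqlalchemy_pool_recycle")
# Connections must be recycled before the server would drop them as idle.
recycle_is_safe = recycle < ASSUMED_WAIT_TIMEOUT
print(recycle, recycle_is_safe)
```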
harlowja | any changes needed to do that? | 23:37 |
harlowja | (ara code changes) | 23:37 |
dmsimard | well yeah, you're going to need the patch that I just sent | 23:38 |
harlowja | oh | 23:38 |
* harlowja slaps self in face | 23:38 | |
harlowja | lol | 23:38 |
dmsimard | don't do that :) | 23:38 |
* harlowja slaps someone else in face? | 23:38 | |
harlowja | lol | 23:38 |
dmsimard | ¯\_(ツ)_/¯ | 23:38 |
dmsimard | I have to catch dinner before I digest myself. Can you try that out and let me know if it helps ? I'll try and see where we go from there. | 23:39 |
harlowja | yup | 23:39 |
dmsimard | thanks | 23:41 |
dmsimard | and like I mentioned, I'm definitely planning to bake failure tolerance at the API client layer which will be fairly easy. Right now there's queries to the database a bit all over the place so it's not super trivial beyond tweaking those pool settings | 23:42 |
harlowja | yuppers | 23:42 |
harlowja | might have to get off flask_sqlalchemy thingy | 23:42 |
harlowja | and onto <one of the other 10000000 libs> | 23:42 |
dmsimard | lol | 23:42 |
harlowja | ^ not oslo.db, lol | 23:42 |
dmsimard | it's funny that you mention that | 23:42 |
dmsimard | because one of the only things that kept me on flask is removed in 1.0 | 23:43 |
dmsimard | and awx is built in django | 23:43 |
dmsimard | but I don't really wanna rewrite everything, and I don't know django at all anyway | 23:43 |
harlowja | :-p | 23:43 |
dmsimard | but django rest framework looks pretty dope | 23:43 |
dmsimard | ok afk food, please keep me updated | 23:44 |
harlowja | willdo | 23:44 |
wwriverrat | Ironically: I upped my forks from 5 to 8. Getting all the results :P . Assume it’s cause it takes less time and taken from pool sooner :) | 23:47 |
*** Bakey has quit IRC | 23:54 |