*** bobh_ has joined #openstack-mistral | 00:17 | |
*** chlong has joined #openstack-mistral | 00:23 | |
*** bobh_ has quit IRC | 00:44 | |
*** bobh_ has joined #openstack-mistral | 02:10 | |
*** bobh_ has quit IRC | 02:12 | |
*** bobh_ has joined #openstack-mistral | 02:17 | |
*** igormarnat has quit IRC | 02:24 | |
*** igormarnat has joined #openstack-mistral | 02:24 | |
*** bobh_ has quit IRC | 02:43 | |
*** achatterjee has joined #openstack-mistral | 04:07 | |
*** achatterjee has left #openstack-mistral | 04:08 | |
*** hparekh has joined #openstack-mistral | 04:09 | |
*** chlong has quit IRC | 04:13 | |
*** sharatss has joined #openstack-mistral | 04:41 | |
*** sharatss has quit IRC | 04:55 | |
*** sharatss has joined #openstack-mistral | 05:00 | |
openstackgerrit | Sharat Sharma proposed openstack/mistral: Added senlin action pack https://review.openstack.org/395587 | 05:14 |
---|---|---|
*** janki has joined #openstack-mistral | 05:50 | |
*** jaosorior has joined #openstack-mistral | 05:51 | |
*** harlowja_ has quit IRC | 06:33 | |
*** jaosorior has quit IRC | 06:34 | |
*** jaosorior has joined #openstack-mistral | 06:35 | |
*** kamtamtun has joined #openstack-mistral | 06:42 | |
*** chlong has joined #openstack-mistral | 06:58 | |
*** chlong has quit IRC | 07:09 | |
*** jtomasek has joined #openstack-mistral | 07:32 | |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format for task run and continue https://review.openstack.org/395988 | 07:33 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format for task run and continue https://review.openstack.org/395988 | 07:35 |
sharatss | rakhmerov: hi | 07:37 |
rakhmerov | hi :) | 07:38 |
sharatss | 8778 is the bind-port of senlin | 07:39 |
rakhmerov | ok | 07:39 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format for task run and continue https://review.openstack.org/395988 | 07:40 |
rakhmerov | I just noticed that we use some hardcoded string | 07:40 |
rakhmerov | which always looks slightly suspicious to me | 07:40 |
rakhmerov | do we use hardcoded strings in similar cases too? Did you check? | 07:40 |
sharatss | so i had mentioned it in the return of SenlinAction | 07:40 |
sharatss | we have used it for Murano as well | 07:41 |
rakhmerov | ok | 07:41 |
rakhmerov | it's a fake client anyway | 07:41 |
rakhmerov | can you pls fix other small issues? | 07:41 |
rakhmerov | they are minor but they reflect what we usually follow | 07:42 |
sharatss | yes sure | 07:42 |
sharatss | fake client is hardcoded even in Ironic and Barbican as well | 07:43 |
sharatss | i will fix the other issues as of now | 07:43 |
rakhmerov | ok | 07:46 |
*** openstackgerrit has quit IRC | 07:48 | |
*** openstackgerrit has joined #openstack-mistral | 07:48 | |
openstackgerrit | Sharat Sharma proposed openstack/mistral: Added senlin action pack https://review.openstack.org/395587 | 07:53 |
d0ugal | rakhmerov: Hey | 07:53 |
rakhmerov | d0ugal: yep, hi | 08:01 |
d0ugal | rakhmerov: I spent some time looking at that transaction bug yesterday | 08:02 |
d0ugal | rakhmerov: adding with db_api.transaction() around some of the API endpoints did resolve it for me. | 08:03 |
d0ugal | using a decorator or just manually - doesn't matter | 08:03 |
d0ugal | but in your comment you did make a good point, I don't know why the transaction is being started - I'd guess there is something wrong in the db_api code | 08:03 |
rakhmerov | yes, but what's the root cause of it? | 08:03 |
d0ugal | rakhmerov: good question. | 08:04 |
rakhmerov | yes | 08:04 |
d0ugal | I am trying to figure that out now | 08:04 |
rakhmerov | this is the most important part | 08:04 |
d0ugal | rakhmerov: do you deploy on mysql or postgres? | 08:04 |
rakhmerov | otherwise we may just mask the real problem | 08:04 |
rakhmerov | mostly on mysql | 08:04 |
d0ugal | rakhmerov: same here. | 08:04 |
d0ugal | rakhmerov: so I found without patching the number of processes on the mistral table got to 30 | 08:05 |
d0ugal | with the patch it stopped at 5 | 08:05 |
d0ugal | it sounds like the problem is worse with postgres | 08:05 |
rakhmerov | I assume that possible there may be a bug in oslo.db or even sqlalchemy | 08:05 |
rakhmerov | because we don't do much on top of it regarding transaction management | 08:05 |
rakhmerov | we can ask also Winson from StackStorm | 08:06 |
rakhmerov | they use only Postgres | 08:06 |
rakhmerov | successfully | 08:06 |
d0ugal | cool, that might be useful | 08:06 |
d0ugal | Maybe there is some postgres configuration that would stop xavierhardy's db being overloaded. | 08:06 |
xavierhardy | We were also using PostgreSQL successfully up until now. | 08:08 |
xavierhardy | for the error what do we do ? remove the decorator when with db_api.transaction is already there or the opposite N | 08:10 |
xavierhardy | *? | 08:10 |
d0ugal | xavierhardy: I think we need to understand why the transaction is being created. | 08:12 |
rakhmerov | xavierhardy: yes, this kind of problems require thorough investigation | 08:14 |
rakhmerov | I really wanna know why transactions get started and don't finish | 08:14 |
xavierhardy | It comes from SQLAlchemy | 08:14 |
xavierhardy | http://docs.sqlalchemy.org/en/latest/orm/session_transaction.html | 08:15 |
rakhmerov | you mean if we don't explicitly open a TX and then do some DB API operation? | 08:15 |
rakhmerov | if so, why? isn't it supposed to be configured as "autocommit=true" if it's a single operation? | 08:16 |
xavierhardy | commit has nothing to do here | 08:16 |
xavierhardy | since we are getting rows most of the time and not editing them | 08:16 |
rakhmerov | I guess it has a lot to do with transaction scope, no? | 08:16 |
xavierhardy | yes | 08:17 |
rakhmerov | so? | 08:17 |
rakhmerov | ok, let's get to the bottom line of it | 08:17 |
xavierhardy | but where is the transaction scope, it's missing in most of the REST API function | 08:17 |
xavierhardy | *functions | 08:17 |
rakhmerov | what's your opinion on the root cause of this behavior? | 08:17 |
d0ugal | xavierhardy: but what starts the transaction? | 08:17 |
rakhmerov | we may be missing some details | 08:17 |
rakhmerov | yes | 08:18 |
rakhmerov | ok, here's my statement | 08:18 |
rakhmerov | Any DB API method, if not encompassed into an explicitly open TX, should not leave any open transactions | 08:18 |
rakhmerov | this is how it should be | 08:19 |
rakhmerov | if it's not true, we need to fix that | 08:19 |
rakhmerov | not REST controllers | 08:19 |
rakhmerov | makes sense? | 08:19 |
xavierhardy | Any DB API method, if not encompassed into an explicitly open TX, should not leave any open transactions -> YES | 08:19 |
rakhmerov | ok | 08:20 |
xavierhardy | The problem is in the REST controllers | 08:20 |
xavierhardy | the other components already seem to handle it properly | 08:20 |
rakhmerov | brr... | 08:20 |
rakhmerov | but why? | 08:20 |
rakhmerov | I still don't understand, sorry | 08:20 |
rakhmerov | d0ugal: do you? | 08:20 |
d0ugal | rakhmerov: no, not really | 08:20 |
d0ugal | My question still is: What starts the transactions? | 08:21 |
rakhmerov | if it's a problem of DB API why is it a problem of REST layer? | 08:21 |
xavierhardy | Yes, d0ugal, that's the question | 08:22 |
xavierhardy | I think it's session.query | 08:22 |
rakhmerov | ok | 08:22 |
rakhmerov | xavierhardy: can you try to investigate this? | 08:22 |
xavierhardy | Already am | 08:22 |
rakhmerov | to name the exact reason | 08:22 |
rakhmerov | ok | 08:22 |
rakhmerov | thanks, once I have a chance I'll look into this too | 08:23 |
d0ugal | I am also looking FWIW | 08:23 |
xavierhardy | http://docs.sqlalchemy.org/en/latest/orm/query.html | 08:23 |
xavierhardy | Query is produced in terms of a given Session, using the query() method | 08:23 |
xavierhardy | the Session object’s usual behavior of maintaining a transaction and expiring all attributes after rollback or commit handles object state automatically. | 08:24 |
rakhmerov | yes, that's fine | 08:24 |
rakhmerov | but may be we just don't pass some params when we open a session | 08:25 |
xavierhardy | http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#what-does-the-session-do | 08:25 |
rakhmerov | which would tell SQLAlchemy to not keep the session open after this query is made | 08:25 |
xavierhardy | Once queries are issued or other objects are persisted with it, it requests a connection ... This connection represents an ongoing transaction, which remains in effect until the Session is instructed to commit or roll back its pending state. | 08:25 |
rakhmerov | and I'm also wondering if we have the same behavior with mysql | 08:25 |
rakhmerov | yes | 08:26 |
xavierhardy | yes, you will because that's SQLAlchemy's role | 08:26 |
rakhmerov | there's "autocommit" property of the connection | 08:26 |
xavierhardy | to abstract the DB as much as possible | 08:26 |
xavierhardy | but we should not commit anything | 08:26 |
xavierhardy | and to use that you'll need to wrap around as I did with the db_api.transaction | 08:26 |
xavierhardy | Querying something opens a transaction, that makes sense, your API call will be isolated from the other transactions and we also need that | 08:27 |
* d0ugal will brb, some people here to do work on my house | 08:28 | |
openstackgerrit | Michal Gershenzon proposed openstack/mistral: Migrate mistral task_type https://review.openstack.org/395742 | 08:28 |
xavierhardy | If you are querying twice the DB in the same REST API call with 2 different transactions, you're either opening Pandora's box of concurrency or you have a reason to do it | 08:28 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format for task run and continue https://review.openstack.org/395988 | 08:29 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format in other task handler methods https://review.openstack.org/396009 | 08:29 |
xavierhardy | (a reason can be, you have just committed something and you want to reuse it) | 08:29 |
xavierhardy | coming back to the issue, I would need to know more about Pecan's session handling | 08:30 |
rakhmerov | xavierhardy: agree on using 2 transactions | 08:30 |
rakhmerov | we don't do that | 08:30 |
rakhmerov | xavierhardy: again, I don't think it's related to Pecan | 08:31 |
rakhmerov | it's a problem of DB API | 08:31 |
xavierhardy | SQLAlchemy is meant to be thread-safe, so if we open too many threads, that could explain why we have many transactions | 08:32 |
rakhmerov | we don't open many threads | 08:32 |
xavierhardy | how does Pecan work ? | 08:32 |
xavierhardy | I mean their client session handling | 08:32 |
rakhmerov | no matter how SQLAlchemy works with sessions and transactions we need to make sure that if we call a single DB API methods w/o explicit transactions that no open transactions left after that | 08:33 |
xavierhardy | I disagree | 08:33 |
rakhmerov | I don't know exactly how it works | 08:33 |
rakhmerov | why do you disagree? | 08:33 |
xavierhardy | That's creating 2 transactions for querying 2 objects for instance | 08:34 |
rakhmerov | in such cases we open TX explicitly | 08:34 |
xavierhardy | If there's no modification in between, that leads to concurrency problems | 08:34 |
rakhmerov | show me the place where we use many transactions in one method | 08:35 |
xavierhardy | Fair enough, but that will lead to mis-use | 08:35 |
xavierhardy | there's no such place, but there might be if you do that | 08:35 |
xavierhardy | for the moment, what we see is basically one transaction per API call | 08:36 |
xavierhardy | *REST API call | 08:36 |
rakhmerov | that's why we have code review process | 08:36 |
rakhmerov | to prevent having such anti-patterns | 08:36 |
xavierhardy | I would be more inclined to raise an error if no transaction has been opened before | 08:36 |
xavierhardy | or let the developer explicitly set a flag like "transaction=True/False" to explicitly use that behavior | 08:37 |
rakhmerov | ok, could please explain your point on multiple threads? Not sure I understood it | 08:37 |
xavierhardy | OK | 08:37 |
rakhmerov | really, all I wanna know now: why after a call made to DB API we have some transactions | 08:38 |
xavierhardy | Me tto | 08:38 |
xavierhardy | too | 08:38 |
xavierhardy | We know now that calling session.query for instance, opens a transaction | 08:38 |
rakhmerov | yes, let's not talk about controllers | 08:38 |
rakhmerov | for now | 08:38 |
rakhmerov | yes, that's fine | 08:38 |
rakhmerov | agree | 08:38 |
xavierhardy | We know that that happens in each REST API call | 08:38 |
rakhmerov | ooh, gosh... | 08:39 |
xavierhardy | If those API calls were made in the same thread, we would have no problem | 08:39 |
rakhmerov | why are you talking about REST again? It's irrelevant to the conversation :) | 08:39 |
xavierhardy | No, that's not | 08:39 |
xavierhardy | To understand the whole problem | 08:39 |
rakhmerov | ok, keep going.. | 08:39 |
xavierhardy | we must understand how Pecan works | 08:39 |
rakhmerov | seems like I'm missing something | 08:39 |
rakhmerov | sorry | 08:39 |
rakhmerov | ok | 08:39 |
xavierhardy | session.query only opens a transaction if it's not done yet | 08:40 |
rakhmerov | not done what? | 08:40 |
xavierhardy | if no transaction is open yet | 08:40 |
rakhmerov | ok | 08:40 |
rakhmerov | is supposed to close the TX after it's made? query | 08:40 |
xavierhardy | So the question is? Why are multiple transactions handling | 08:40 |
xavierhardy | No, it needs to be explicitly | 08:41 |
xavierhardy | *to be done | 08:41 |
rakhmerov | why explicitly? | 08:41 |
xavierhardy | Because it can't possibly know when to close it | 08:41 |
xavierhardy | SQLAlchemy does not handle that for you | 08:41 |
rakhmerov | but we should control it | 08:41 |
*** jpich has joined #openstack-mistral | 08:41 | |
xavierhardy | except when you commit or rollback | 08:41 |
xavierhardy | Yes, exactly | 08:42 |
rakhmerov | we know exactly if we opened a TX explicitly or not | 08:42 |
rakhmerov | we know it | 08:42 |
xavierhardy | no | 08:42 |
rakhmerov | in the current thread | 08:42 |
xavierhardy | we don't | 08:42 |
rakhmerov | why don't we know it? | 08:42 |
xavierhardy | but we could find out | 08:42 |
rakhmerov | let me show you the code where we know it... | 08:42 |
xavierhardy | ok | 08:42 |
xavierhardy | Coming back to Pecan, it has means of handling database sessions | 08:44 |
xavierhardy | https://pecan.readthedocs.io/en/latest/databases.html | 08:44 |
rakhmerov | https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L106 | 08:44 |
rakhmerov | specifically https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L90 | 08:44 |
rakhmerov | that's where we know if TX was open or not | 08:44 |
rakhmerov | there may be a bug here, in this code | 08:45 |
rakhmerov | https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L121 | 08:45 |
xavierhardy | or it's not called at the right place | 08:45 |
rakhmerov | this is where we determine if we need to auto commit the TX | 08:45 |
rakhmerov | as far as Pecan | 08:45 |
rakhmerov | as far as I know (not confirmed!) it's single threaded | 08:46 |
rakhmerov | like most of other things in Python | 08:46 |
rakhmerov | what we use to handle requests in parallel is WSGIService | 08:46 |
rakhmerov | from oslo.service | 08:46 |
xavierhardy | _secure_query is not session aware ? | 08:46 |
xavierhardy | https://github.com/openstack/mistral/blob/a0bd4a9a4ee3fbccbab72291c7d4378b4d830408/mistral/db/v2/sqlalchemy/api.py#L118 | 08:46 |
rakhmerov | it creates physically multiple processes to handle requests | 08:46 |
xavierhardy | OK | 08:47 |
rakhmerov | yes, _secure_query is not because the implicit rule was: only all public methods should be marked with @session_aware | 08:47 |
rakhmerov | to be sure that all subsequent calls use the same session | 08:48 |
xavierhardy | agreed | 08:48 |
rakhmerov | maybe we are missing some @session_aware on some of the methods | 08:48 |
rakhmerov | we need to check | 08:48 |
xavierhardy | agreed | 08:49 |
rakhmerov | ideally we need to see what REST method reproduces this problem and then we'll see what DB API method is not marked with it | 08:49 |
rakhmerov | yep | 08:49 |
xavierhardy | GET workflows | 08:49 |
xavierhardy | is a good exmaple | 08:49 |
xavierhardy | however, there should not be a need to commit | 08:50 |
xavierhardy | just closing the transaction should be enough | 08:50 |
d0ugal | Back - sorry, that took longer than expected | 08:50 |
* d0ugal reads up | 08:50 | |
xavierhardy | (when we are just querying stuff) | 08:50 |
xavierhardy | wait, something's weird | 08:51 |
rakhmerov | xavierhardy: doesn't matter for us, SQLAlchemy should take care of that | 08:51 |
rakhmerov | if session.dirty is False it should always do a roll back instead of commit | 08:51 |
xavierhardy | true | 08:51 |
xavierhardy | it's just using SQLAlchemy's magic | 08:51 |
rakhmerov | yeah, sometimes it seems like magic | 08:52 |
rakhmerov | because it has a notion of session | 08:52 |
rakhmerov | on top of regular SQL stuff | 08:52 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format in other task handler methods https://review.openstack.org/396009 | 08:52 |
xavierhardy | how is get_thread_local supposed to return you whether SQLAlchemy has already an open session or not? | 08:55 |
rakhmerov | if the session is stored in thread local storage it means it was already open | 08:56 |
rakhmerov | it means that db_api.transaction() was called | 08:57 |
xavierhardy | OK | 08:57 |
xavierhardy | Another proposal: what about disabling connection pooling ? | 08:57 |
xavierhardy | If you have separate processes and not threads, that should work | 08:58 |
xavierhardy | I see it's already done :( | 08:58 |
* d0ugal is looking at how other projects handle transactions | 08:59 | |
rakhmerov | d0ugal: good luck! | 08:59 |
xavierhardy | or wait no | 09:00 |
d0ugal | rakhmerov: it seems we have much more transaction code than others | 09:00 |
rakhmerov | a lot of projects simply didn't have any support for them up until recently :))) | 09:00 |
rakhmerov | yes | 09:00 |
d0ugal | right | 09:00 |
rakhmerov | that's why I've written new code for this myself | 09:00 |
d0ugal | I see | 09:00 |
xavierhardy | In Flask for example, there's an extension that makes sure that the transaction is closed | 09:00 |
xavierhardy | at each API call | 09:01 |
xavierhardy | *REST API | 09:01 |
d0ugal | rakhmerov: Where do we create the session? | 09:01 |
rakhmerov | xavierhardy: yeah, we could do that too. But my point is: I'm afraid that we will just mask a real problem behind those decorators etc. | 09:01 |
rakhmerov | d0ugal: https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L106 | 09:02 |
xavierhardy | Agreed | 09:02 |
xavierhardy | what about session pool ? | 09:02 |
xavierhardy | *pools | 09:02 |
xavierhardy | (basically what you do for migrations) | 09:03 |
d0ugal | rakhmerov: we don't have a session.begin? | 09:03 |
d0ugal | rakhmerov: sorry, I was already looking at this code but it confused me :-D | 09:03 |
rakhmerov | xavierhardy: hm.. I'm not sure what you mean by that. I know connection pool | 09:03 |
xavierhardy | Well, part of the problem is that many connections are open towards the DB, too many | 09:04 |
rakhmerov | d0ugal: yes, we don't have it, a new TX starts effectively when we create a new session (or at least make a first query in it) | 09:04 |
xavierhardy | more than one per process | 09:04 |
rakhmerov | xavierhardy: how many? | 09:04 |
xavierhardy | Yesterday, I reached 50 connections on local desktop | 09:04 |
rakhmerov | there are config properties for that | 09:04 |
rakhmerov | it's not many | 09:04 |
xavierhardy | it is | 09:04 |
rakhmerov | we usually bumpt it up to 150 at least | 09:05 |
xavierhardy | considering I only had 4 processes running | 09:05 |
xavierhardy | and that I was not using any kind of redundancy | 09:05 |
xavierhardy | and not even committing anything | 09:05 |
rakhmerov | it doesn't really matter | 09:05 |
xavierhardy | that's actually the problem I'm describing | 09:05 |
rakhmerov | no, I mean | 09:06 |
rakhmerov | it's just a config property that you can change | 09:06 |
xavierhardy | and in those open connections, dangling transactions | 09:06 |
rakhmerov | wait a sec | 09:06 |
xavierhardy | so let's try it | 09:06 |
rakhmerov | [database] | 09:08 |
rakhmerov | pool_size = 20 | 09:08 |
rakhmerov | max_overflow = 0 | 09:08 |
rakhmerov | max_pool_size = 1000 | 09:08 |
d0ugal | rakhmerov: we use oslo_db.sqlalchemy.session.EngineFacade | 09:08 |
rakhmerov | you can change these params | 09:08 |
d0ugal | rakhmerov: Did you know it is deprecated? :) | 09:08 |
xavierhardy | foud nthem, thanks | 09:08 |
rakhmerov | by default I guess it's about 50 exactly | 09:08 |
rakhmerov | d0ugal: no, I didn't | 09:08 |
xavierhardy | but is it a pool for all the processes or per process ? | 09:08 |
d0ugal | rakhmerov: in December 2014 too. it seems we should be using oslo_db.sqlalchemy.enginefacade | 09:09 |
d0ugal | rakhmerov: I'll open a bug | 09:09 |
rakhmerov | per process | 09:09 |
xavierhardy | OK, the problem still happens | 09:09 |
xavierhardy | let's see if I have put the configuration in the right file | 09:10 |
rakhmerov | yeah, I don't think it's related to any pools | 09:10 |
rakhmerov | what we need to do is: 1) Reproduce the situation when we have an open transaction after DB API call 2) see what DB API method it is 3) see if it's marked with @session_aware | 09:11 |
xavierhardy | OK, but thanks for showing me the max_overflow | 09:11 |
xavierhardy | it explains one of the behaviors I have seen | 09:11 |
rakhmerov | if it is marked with that then we need to debug and see why session doesn't get committed | 09:11 |
rakhmerov | ok | 09:11 |
xavierhardy | After reaching 50 open connections, it cleaned up partially | 09:11 |
xavierhardy | 1) easy: GET on workflow list | 09:13 |
xavierhardy | 2) get_workflow_definitions | 09:13 |
xavierhardy | more exactly | 09:14 |
xavierhardy | session.query | 09:14 |
xavierhardy | from model_query | 09:14 |
rakhmerov | ok | 09:14 |
xavierhardy | 3) model_query is | 09:14 |
rakhmerov | do you see if that methods is called? | 09:14 |
rakhmerov | I mean here: https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L121 | 09:15 |
rakhmerov | or possibly here: https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L126 | 09:15 |
xavierhardy | Yes | 09:15 |
xavierhardy | wait | 09:15 |
rakhmerov | or here: https://github.com/openstack/mistral/blob/master/mistral/db/sqlalchemy/base.py#L130 | 09:16 |
xavierhardy | yes | 09:17 |
xavierhardy | OK, by limiting the pool, I have another behavior | 09:18 |
xavierhardy | The first call works, the second fails | 09:18 |
xavierhardy | (because pool exceeded, but it shouldn't as it the same process) | 09:18 |
xavierhardy | (and so reuse the same connection) | 09:18 |
rakhmerov | xavierhardy: so seems like from Mistral all that's needed happens? | 09:18 |
rakhmerov | can you debug that session.commit call and see what happens inside? | 09:19 |
xavierhardy | what do you mean ? | 09:19 |
rakhmerov | why TX still remains open | 09:19 |
xavierhardy | OK, just understood your previous sentence | 09:19 |
rakhmerov | ok | 09:20 |
rakhmerov | we need to localize the problem | 09:20 |
rakhmerov | and understand wether it's in Mistral (we don't close or clean up something or do it incorrectly) or in SQLAlchemy | 09:20 |
rakhmerov | or oslo.db | 09:21 |
xavierhardy | wait there's another call | 09:21 |
xavierhardy | get_workflow_definition_by_id | 09:21 |
xavierhardy | wait, something weird in secure_query | 09:22 |
xavierhardy | mistral/db/v2/sqlalchemy/api.py -> secure_query | 09:23 |
xavierhardy | it returns a query object | 09:23 |
xavierhardy | but it does not use it | 09:23 |
rakhmerov | ok | 09:23 |
rakhmerov | :) | 09:23 |
rakhmerov | I guess we're close to find a problem ) | 09:24 |
xavierhardy | then for example in _paginate_query | 09:24 |
xavierhardy | same thing | 09:24 |
xavierhardy | (uses _secure_query | 09:24 |
xavierhardy | however, _secure_query uses our friend model_query | 09:24 |
xavierhardy | which is wrapped | 09:24 |
xavierhardy | (model_query not paginate query) | 09:24 |
rakhmerov | ok | 09:24 |
xavierhardy | and finally in for instance, get_collection | 09:25 |
xavierhardy | we have a query.all | 09:25 |
xavierhardy | but no wrapper | 09:25 |
xavierhardy | actually model_query should not be wrapped | 09:26 |
rakhmerov | :) | 09:26 |
xavierhardy | what should be are the functions use it | 09:26 |
xavierhardy | *using it | 09:26 |
rakhmerov | model_query? | 09:26 |
xavierhardy | mistral/db/sqlalchemy/base.py | 09:26 |
xavierhardy | just calls session.query | 09:27 |
rakhmerov | line? | 09:27 |
xavierhardy | 220 | 09:27 |
rakhmerov | yes, it's just a base utility method | 09:27 |
xavierhardy | OK | 09:27 |
rakhmerov | on top of it we can call secure_model_query | 09:28 |
rakhmerov | if it's a secure model (belongs to a tenant etc.) | 09:28 |
xavierhardy | OK | 09:28 |
rakhmerov | xavierhardy: sorry, I have to disappear for 1.5 hr | 09:28 |
rakhmerov | meeting | 09:28 |
rakhmerov | will ping you when I'm back | 09:28 |
xavierhardy | But I'm pretty sure that's there | 09:28 |
xavierhardy | I'll pdb that and monitor the DB | 09:29 |
rakhmerov | yes, please | 09:29 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format in other task handler methods https://review.openstack.org/396009 | 09:30 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format in action handler https://review.openstack.org/396038 | 09:30 |
xavierhardy | right after leaving model_query the idle in transaction session appears | 09:32 |
xavierhardy | and after query.all() | 09:33 |
xavierhardy | another idle in transaction session appears | 09:33 |
xavierhardy | so 2 sessions (and 2 transactions) in the same function call | 09:33 |
openstackgerrit | Renat Akhmerov proposed openstack/mistral: Fix error message format in action handler https://review.openstack.org/396038 | 09:37 |
rakhmerov | xavierhardy: which means we open two sessions (by mistake) but close only one, right? | 09:38 |
xavierhardy | exactly | 09:41 |
xavierhardy | model_query opens and close its transaction | 09:41 |
xavierhardy | and does not even do anything with it, does not even query anything | 09:41 |
xavierhardy | and then comes get_all | 09:41 |
xavierhardy | that opens a new transaction | 09:41 |
xavierhardy | but it's not closed | 09:42 |
xavierhardy | making get_collection session aware solves the problem | 09:42 |
rakhmerov | yeah, so seems like something is just not wrapped with @session_aware | 09:42 |
xavierhardy | line 179 | 09:42 |
rakhmerov | yeah | 09:42 |
rakhmerov | ok | 09:42 |
xavierhardy | mistral/db/v2/sqlalchemy/api.py | 09:42 |
xavierhardy | we have to look for all similar cases | 09:43 |
rakhmerov | xavierhardy: I think it was broken during some refactoring that we did this summer | 09:43 |
rakhmerov | yes | 09:43 |
xavierhardy | _get_db_object_by_name | 09:43 |
xavierhardy | _get_db_object_by_id | 09:43 |
xavierhardy | _get_db_object_by_name_or_id | 09:43 |
xavierhardy | _delete_all | 09:44 |
rakhmerov | wait a sec | 09:46 |
rakhmerov | xavierhardy: as far as _get_collection | 09:46 |
rakhmerov | where exactly do we open a new session instead of using an existing one? | 09:47 |
xavierhardy | get_completed_task_executions | 09:47 |
rakhmerov | I'm looking at the method | 09:47 |
xavierhardy | in the wrapper | 09:47 |
rakhmerov | line | 09:47 |
rakhmerov | ? | 09:47 |
xavierhardy | mistral/db/sqlalchemy/base.py | 09:47 |
xavierhardy | 121 | 09:48 |
xavierhardy | get_incomplete_task_executions | 09:48 |
xavierhardy | get_incomplete_task_executions_count | 09:48 |
rakhmerov | ooh | 09:48 |
rakhmerov | wait a sec | 09:48 |
rakhmerov | so | 09:48 |
rakhmerov | tell me if I'm wrong | 09:48 |
rakhmerov | in many methods we use this wrapper multiple times | 09:49 |
rakhmerov | I mean, for example, we have a call | 09:49 |
rakhmerov | which is not wrapped with @session_aware | 09:49 |
xavierhardy | get_resource_member | 09:49 |
xavierhardy | yes | 09:49 |
rakhmerov | inside, we have a call which is wrapped with it | 09:49 |
rakhmerov | and then we call something again inside the first one which is also wrapped with it | 09:50 |
rakhmerov | yeah | 09:50 |
rakhmerov | makes sense | 09:50 |
xavierhardy | you can call the wrapper again | 09:50 |
xavierhardy | that shouldn't be a problem | 09:50 |
rakhmerov | it will open a new TX | 09:51 |
xavierhardy | but the whole session should be wrapped | 09:51 |
rakhmerov | if those wrappers are not overlapping | 09:51 |
rakhmerov | it's a problem because we open many sessions whereas we don't need to | 09:51 |
xavierhardy | yes | 09:51 |
rakhmerov | but I don't quite see where we don't close a session | 09:51 |
rakhmerov | where we need to | 09:51 |
xavierhardy | after the actual query | 09:52 |
xavierhardy | for example when calling | 09:52 |
xavierhardy | .all() | 09:52 |
xavierhardy | because it's not wrapped | 09:52 |
rakhmerov | oooh! | 09:53 |
rakhmerov | yes | 09:53 |
xavierhardy | add to the previous list | 09:53 |
rakhmerov | got it | 09:53 |
xavierhardy | get_resource_members | 09:53 |
xavierhardy | _get_accepted_resources | 09:53 |
xavierhardy | _get_event_trigger | 09:53 |
rakhmerov | because query.all has to be in a session | 09:53 |
xavierhardy | yes | 09:53 |
rakhmerov | ooh gosh... | 09:54 |
rakhmerov | such a mess... | 09:54 |
xavierhardy | it's everywhere in that file | 09:54 |
xavierhardy | not only what comes from this summer's refactoring | 09:54 |
rakhmerov | I haven't look at this quite a while | 09:54 |
xavierhardy | but yes, I did copy/paste this stuff | 09:54 |
xavierhardy | and potentially increased this problem | 09:54 |
rakhmerov | that's ok | 09:55 |
rakhmerov | the real problem is that the right usage of @session_aware is not too obvious | 09:55 |
rakhmerov | we need to fix this all | 09:55 |
rakhmerov | and ideally test it | 09:55 |
rakhmerov | we should test it for sure | 09:55 |
rakhmerov | interested in fixing it? | 09:56 |
rakhmerov | d0ugal: following our conversation? | 09:56 |
rakhmerov | d0ugal: we found the problem | 09:56 |
d0ugal | rakhmerov: oops, sorry, I thought you went to a meeting so stopped :) | 09:56 |
* d0ugal reads | 09:56 | |
xavierhardy | interested in fixing it, but I won't have time to write test for that | 09:56 |
rakhmerov | the meeting was postponed for 30 mins | 09:57 |
d0ugal | xavierhardy: I can try and help with that, I have some time. | 09:57 |
rakhmerov | xavierhardy, d0ugal: let me take care of this | 09:57 |
xavierhardy | but how do we test it ? that's also a meaningful question | 09:57 |
rakhmerov | d0ugal: can you? | 09:57 |
xavierhardy | we check that we have only one transaction that was created and that is closed ? | 09:57 |
rakhmerov | xavierhardy: we already have test for pretty much all DB API calls | 09:57 |
xavierhardy | yes, I get that | 09:58 |
d0ugal | rakhmerov: I don't mind who does it :) | 09:58 |
xavierhardy | but testing this exact problem | 09:58 |
xavierhardy | ? | 09:58 |
rakhmerov | so we just need need to check that they 1) don't open more than 1 session 2) don't leave sessions open | 09:58 |
d0ugal | I guess we don't test this exact problem, or it would fail ;) | 09:58 |
rakhmerov | we don't | 09:58 |
rakhmerov | yeah, I mean we could just add some additional checks | 09:59 |
rakhmerov | to existing tests | 09:59 |
d0ugal | sure | 09:59 |
d0ugal | Can we mock it out and verify how many times it is called? | 09:59 |
rakhmerov | yeah, something like that | 09:59 |
xavierhardy | not sure it's sufficient | 10:00 |
xavierhardy | because it will be called several times | 10:00 |
xavierhardy | what is important is lower | 10:00 |
rakhmerov | d0ugal: this is really a huge problem, we need to fix it rather sooner than later | 10:00 |
d0ugal | right | 10:00 |
xavierhardy | the actual number of sessions and transactions | 10:00 |
d0ugal | rakhmerov: Yeah, I agree :) | 10:00 |
xavierhardy | I can/must fix that today | 10:00 |
rakhmerov | :) | 10:00 |
xavierhardy | for our production to run again | 10:00 |
rakhmerov | I feel ashamed that we have this mess | 10:01 |
xavierhardy | but as I said, I have limited time to test it | 10:01 |
xavierhardy | and tomorrow is a national holiday (France) | 10:01 |
rakhmerov | but the reason is why we didn't see it clearly before is because we use explicit transactions everywhere | 10:01 |
rakhmerov | except REST layer | 10:01 |
d0ugal | rakhmerov: hah | 10:01 |
d0ugal | indeed | 10:01 |
d0ugal | xavierhardy: do you want to try and prepare a fix? | 10:02 |
rakhmerov | xavierhardy, d0ugal: we could split the work, xavierhardy could prepare a fix (if have time), and d0ugal could proceed with testing | 10:02 |
xavierhardy | Will do | 10:02 |
xavierhardy | right now | 10:02 |
d0ugal | Sure, that works for me | 10:02 |
xavierhardy | please make sure in the review to check that I haven't forgotten any function | 10:03 |
xavierhardy | here is the list I have come up with | 10:03 |
xavierhardy | _get_collection, _get_db_object_by_name, _get_db_object_by_id, _get_db_object_by_name_or_id, _delete_all, get_completed_task_executions, get_incomplete_task_executions, get_incomplete_task_executions_count, get_resource_member, _get_accepted_resources, _get_event_trigger | 10:03 |
openstackgerrit | Sharat Sharma proposed openstack/mistral: Added senlin action pack https://review.openstack.org/395587 | 10:03 |
xavierhardy | these are the lowest level functions that have this problem | 10:03 |
d0ugal | okay, I'll take a look | 10:03 |
xavierhardy | I'll grab a tea, and fix those ASAP and send for review after running the tests once | 10:04 |
d0ugal | xavierhardy: Great | 10:04 |
d0ugal | brb | 10:04 |
*** jaosorior is now known as jaosorior_lunch | 10:10 | |
* ddeja is rading all | 10:10 | |
ddeja | so, basically, we don't use the session_aware wrapper in all needed places? | 10:10 |
d0ugal | ddeja: Yeah, I think that sounds about right | 10:12 |
d0ugal | but I am still catching up too :) | 10:12 |
ddeja | maybe when xavierhardy send his fix for reviews, I can catch it | 10:13 |
ddeja | btw. congrats d0ugal, it's great to have you in a core team! | 10:13 |
d0ugal | ddeja: Thanks :) | 10:13 |
d0ugal | This issue is a fun introduction :-D | 10:13 |
ddeja | That's what happens when someone use your software on production :D | 10:14 |
xavierhardy | One more thing | 10:18 |
xavierhardy | some more high level functions are affected | 10:18 |
xavierhardy | get_all in mistral/utils/rest_utils.py | 10:19 |
xavierhardy | if "resource_function" is defined, it does several queries | 10:20 |
xavierhardy | they all should be in the same transaction too | 10:20 |
xavierhardy | so we also need to check the REST API controller | 10:20 |
xavierhardy | (sorry Renat ;) ) | 10:20 |
d0ugal | lol | 10:20 |
xavierhardy | in that case, do we use the with db_api.transaction or the session aware wrapper ? | 10:21 |
d0ugal | Looking | 10:22 |
d0ugal | xavierhardy: I would use with db_api.transaction() for those | 10:24 |
d0ugal | since it is at a higher level | 10:24 |
d0ugal | but not totally sure | 10:24 |
xavierhardy | That's what I think too | 10:25 |
xavierhardy | add get_resource_members to my list (lower level) | 10:26 |
d0ugal | xavierhardy: silly question, how do you tell which functions require it? | 10:27 |
xavierhardy | when it's calling querying more than one thing on the DB | 10:27 |
xavierhardy | and no db_api.transaction is present | 10:28 |
xavierhardy | I try to find the lowest level function that satifies these conditions | 10:28 |
xavierhardy | if only one query is done, the session aware decorator will handle it correctly | 10:28 |
d0ugal | xavierhardy: okay, makes sense, thanks. | 10:29 |
ddeja | xavierhardy: wait, so maybe we should fix something in session aware decoreator? | 10:29 |
ddeja | if it cannot properly handle more the one query | 10:29 |
xavierhardy | by query, I mean not only creating a query object but also using it (all(), first(), one()) | 10:29 |
xavierhardy | It can | 10:29 |
xavierhardy | If the session is already created, it does not close or commit | 10:30 |
xavierhardy | that's correct | 10:30 |
xavierhardy | so calling it twice, will result in a commit in the higher level function | 10:30 |
xavierhardy | (twice and nested) | 10:30 |
ddeja | ok | 10:30 |
ddeja | oh, and lower would never get committed? | 10:31 |
* ddeja is trying to understand | 10:31 | |
ddeja | but I hope to got it once I see the patches | 10:31 |
xavierhardy | look at the decorator | 10:32 |
xavierhardy | mistral/db/sqlalchemy/base.py | 10:32 |
ddeja | ooooooh | 10:32 |
xavierhardy | line 106 | 10:32 |
xavierhardy | session_aware | 10:32 |
ddeja | if created: session.commit | 10:32 |
xavierhardy | yes | 10:32 |
ddeja | got it | 10:32 |
xavierhardy | and same thing for close | 10:33 |
xavierhardy | and the finally | 10:33 |
ddeja | ok, I got it, thanks | 10:33 |
ddeja | to be honest, I don't see an easy fix, but I can wait for the patch to see it | 10:37 |
xavierhardy | we just have to add the decorator everywhere it is required (given the criteria above) | 10:38 |
xavierhardy | that will fix dangling transactions | 10:39 |
xavierhardy | then, on the higher layers (the REST controller basically), we need to make sure that 2 subsequent db_api calls are nested into a single transaction | 10:39 |
xavierhardy | because else, we risk concurrency problems | 10:39 |
xavierhardy | race conditions | 10:39 |
xavierhardy | typically, look at mistral/api/controllers/v2/event_trigger.py EventTriggersController:delete | 10:40 |
xavierhardy | it gets the event trigger and then deletes it | 10:40 |
xavierhardy | if another process deletes the same event trigger between those two, it will raise a 500 internal server error for the other | 10:41 |
xavierhardy | process | 10:41 |
xavierhardy | btw, I think I found all (but please double check) functions where this second problem occurs | 10:41 |
xavierhardy | here is the list | 10:41 |
xavierhardy | I guess it should be in another commit | 10:42 |
xavierhardy | mistral/api/controllers/v2/task.py TasksController:put, mistral/api/controllers/v2/member.py MembersController:post, mistral/api/controllers/v2/execution.py ExecutionsController:put, mistral/api/controllers/v2/event_trigger.py EventTriggersController:put, mistral/api/controllers/v2/event_trigger.py EventTriggersController:delete, mistral/api/controllers/v2/action_execution.py ActionExecutionsController:delete, mistral/utils/rest | 10:42 |
rakhmerov | guys | 10:48 |
rakhmerov | ddeja: hi, yeah, that's an interesting reading | 10:48 |
rakhmerov | I think the right approach is to add @session_aware to ALL public functions | 10:49 |
rakhmerov | methods | 10:49 |
rakhmerov | not low level methods | 10:49 |
xavierhardy | OK | 10:50 |
rakhmerov | this will guarantee that we have the same scope (and the same sqlalchemy session) for the entire method call stack | 10:50 |
rakhmerov | makes sense? | 10:50 |
rakhmerov | because, think about it.. If we have a TX open explicitly then everything is fine | 10:50 |
xavierhardy | a wait, which API are we talking about ? | 10:50 |
xavierhardy | db_api | 10:50 |
xavierhardy | ? | 10:50 |
xavierhardy | or higher ? | 10:51 |
rakhmerov | so essentially @session_aware on a public method (exposed on DB API) is a replacement for explicit TX | 10:51 |
rakhmerov | DB API | 10:51 |
xavierhardy | yes | 10:51 |
xavierhardy | OK | 10:51 |
xavierhardy | agreed | 10:51 |
rakhmerov | ok | 10:51 |
rakhmerov | just wanted to make sure we're on the same page | 10:51 |
ddeja | rakhmerov: agree | 10:52 |
rakhmerov | ok | 10:52 |
xavierhardy | what about the transactions in the controller ? | 10:52 |
xavierhardy | this list | 10:53 |
xavierhardy | mistral/api/controllers/v2/task.py TasksController:put, mistral/api/controllers/v2/member.py MembersController:post, mistral/api/controllers/v2/execution.py ExecutionsController:put, mistral/api/controllers/v2/event_trigger.py EventTriggersController:put, mistral/api/controllers/v2/event_trigger.py EventTriggersController:delete, mistral/api/controllers/v2/action_execution.py ActionExecutionsController:delete, mistral/utils/rest | 10:53 |
rakhmerov | what about them? | 10:53 |
rakhmerov | sorry, I missed last 30 mins of the conversation | 10:53 |
ddeja | oh, we would close the transactions too early and we would have concurency problems. | 10:53 |
xavierhardy | (assuming the dangling transaction problem is fixed) | 10:54 |
rakhmerov | xavierhardy, ddeja, d0ugal: ok, on controllers | 10:54 |
xavierhardy | (on the lower level) | 10:54 |
rakhmerov | here's my opinion | 10:54 |
rakhmerov | I was agains decorating all the controller methods blindly because (I left a comment about that) we need to keep transactions as short as possible | 10:54 |
rakhmerov | besides DB stuff we also do some conversions, logging etc. inside those methods | 10:55 |
xavierhardy | sure | 10:55 |
rakhmerov | which increases length of transactions | 10:55 |
rakhmerov | so my preference would be just using "with db_api.transaction():" explicitly where we need them | 10:55 |
d0ugal | so the context manager makes sense there | 10:55 |
d0ugal | Yeah | 10:56 |
rakhmerov | yes | 10:56 |
xavierhardy | totally agree | 10:56 |
rakhmerov | if we see that we need a TX (e.g. when we need lazy loading or simply multiple DB operations) we can open it explicitly | 10:57 |
rakhmerov | on a minimal scope | 10:57 |
rakhmerov | ok | 10:57 |
xavierhardy | one more question | 10:59 |
ddeja | but hm, one concern. We call with db_api.transaction. Then inside the transaction we call some method that is 'session_aware'. Does it mean that sesion_aware would close the session and therefore we are not in the transaction anymore? | 10:59 |
xavierhardy | do we agree those are 2 separate PRs ? | 10:59 |
xavierhardy | no, for the same reason as before | 10:59 |
xavierhardy | it checks that the session is already open | 10:59 |
xavierhardy | and does not close it | 10:59 |
xavierhardy | correct me if I'm wrong rakhmerov | 11:00 |
ddeja | oh, created is true, if it *created* the session on the beggining | 11:00 |
ddeja | of the decorator code | 11:00 |
rakhmerov | xavierhardy, ddeja: exactly correct | 11:01 |
rakhmerov | that's the purpose of this method | 11:01 |
rakhmerov | to open a new session if it's not opened yet | 11:01 |
rakhmerov | by db_api.transaction() | 11:01 |
rakhmerov | it needs to make sure that all subcalls will use the same instance of session | 11:02 |
ddeja | oh, so now I'm lost again why there are hanging transaction. But maybe guys you can proceed with the fix instead of elaborating on this and then I can maybe understand it :) | 11:02 |
rakhmerov | ddeja: sec, I'll explain it | 11:03 |
xavierhardy | I'm doing it as we speak... i mean write ;) | 11:03 |
rakhmerov | it's easy | 11:03 |
rakhmerov | ooh, ok | 11:03 |
xavierhardy | xavierhardy: do we agree those are 2 separate PRs ? | 11:04 |
xavierhardy | rakhmerov | 11:04 |
xavierhardy | (the transaction handling in the REST API controller and the DB API problem) | 11:04 |
rakhmerov | d0ugal, hparekh: can you please also look at https://review.openstack.org/#/c/395988/? | 11:05 |
rakhmerov | same thing as in the patch you just looked at | 11:05 |
d0ugal | rakhmerov: sure | 11:05 |
rakhmerov | thanks | 11:05 |
rakhmerov | xavierhardy: PRs? | 11:06 |
rakhmerov | what do you mean? ) | 11:06 |
d0ugal | reviews/patches | 11:06 |
xavierhardy | PR is Pull Request | 11:06 |
d0ugal | (PRs = pull request, github naming) | 11:06 |
xavierhardy | so yes, review/patches | 11:06 |
rakhmerov | oh | 11:06 |
rakhmerov | yes | 11:06 |
xavierhardy | OK | 11:06 |
openstackgerrit | Xavier Hardy proposed openstack/mistral: Fix REST API dangling transactions https://review.openstack.org/395696 | 11:15 |
xavierhardy | https://review.openstack.org/#/c/395696 | 11:15 |
d0ugal | cool, looking | 11:18 |
d0ugal | ddeja: ^ | 11:18 |
d0ugal | rakhmerov: ^ | 11:18 |
rakhmerov | ok | 11:18 |
*** hparekh has quit IRC | 11:19 | |
ddeja | the code looks OK, but since I dont 100% understand the problem I'll wait with giving the rate | 11:20 |
rakhmerov | woooow! | 11:23 |
rakhmerov | http://logs.openstack.org/96/395696/2/check/gate-mistral-python34/7e9550c/testr_results.html.gz | 11:23 |
rakhmerov | it's for this patch | 11:23 |
d0ugal | lol | 11:23 |
rakhmerov | I looked at the changes and I believe they are correct | 11:23 |
rakhmerov | BUT! | 11:23 |
rakhmerov | these changes revealed a bunch of problems in tests | 11:23 |
rakhmerov | where we don't use db_api.transaction() but we need to | 11:24 |
rakhmerov | xavierhardy: I'm not sure if you have enough patience to fix all of that ) | 11:24 |
rakhmerov | during today | 11:24 |
d0ugal | oh, wow | 11:24 |
d0ugal | this might take a bit of fixing | 11:24 |
rakhmerov | yes | 11:24 |
* d0ugal runs away | 11:25 | |
rakhmerov | I could probably fix it pretty quickly | 11:25 |
rakhmerov | because I already fixed a bunch of places like this | 11:25 |
d0ugal | right | 11:25 |
d0ugal | I would be happy to help of course, but I guess I will be much slower | 11:25 |
rakhmerov | btw, I often didn't understand why lazy loading sometimes works even after calling DB API methods | 11:25 |
rakhmerov | but now it's all clear :) | 11:26 |
rakhmerov | because sessions didn't get closed properly | 11:26 |
rakhmerov | maan... this is huge | 11:26 |
d0ugal | lol | 11:26 |
rakhmerov | xavierhardy: what do you say? Willing to fix these tests? | 11:26 |
rakhmerov | if you don't have time we can take over | 11:27 |
rakhmerov | the size of the patch will be crazy | 11:27 |
ddeja | rakhmerov: maybe it would be easier to fire a bunch of patches | 11:27 |
ddeja | in each we add session_aware to a new method in DB aPI | 11:28 |
ddeja | and fix mistral in needed places | 11:28 |
xavierhardy | It's not really a question of patiences | 11:28 |
xavierhardy | *patience | 11:28 |
xavierhardy | it's more a question of assignment at work | 11:28 |
xavierhardy | Mistral is not the only project I'm on (other are proprietary) | 11:28 |
xavierhardy | but today, our production is broken, so I must fix this | 11:29 |
rakhmerov | xavierhardy: ok, go ahead | 11:29 |
rakhmerov | let us know if you need help | 11:30 |
rakhmerov | ddeja: yes, we can do multiple patches if needed | 11:30 |
rakhmerov | but the changes are pretty simple | 11:30 |
rakhmerov | xavierhardy: thanks a ton for bringing this up | 11:31 |
rakhmerov | and sorry for having this problem still | 11:31 |
ddeja | rakhmerov: all of this tests must use with db_api.transaction, right? | 11:33 |
rakhmerov | yes | 11:33 |
ddeja | ok, I see | 11:33 |
rakhmerov | typically we do something like this | 11:33 |
rakhmerov | wf_ex = db_api.get_workflow_execution(id) | 11:33 |
rakhmerov | task_execs = wf_ex.task_executions | 11:34 |
rakhmerov | the last thing will fail if both methods are not under transaction | 11:34 |
rakhmerov | because of lazy loading | 11:34 |
ddeja | yup | 11:34 |
openstackgerrit | Xavier Hardy proposed openstack/mistral: Fix possible DB race conditions in REST controller https://review.openstack.org/396128 | 11:38 |
xavierhardy | 1) We need to fix the commit message | 11:40 |
xavierhardy | 2) We need to fix a bunch of failing tests | 11:40 |
xavierhardy | ? | 11:40 |
*** chlong has joined #openstack-mistral | 11:43 | |
*** chlong has quit IRC | 11:44 | |
d0ugal | rakhmerov: ^ | 11:52 |
d0ugal | xavierhardy: I think the commit message on the initial review is wrong - the new review is better, but could do with more detail | 11:53 |
d0ugal | xavierhardy: and for 2, I think that a number of the unit tests have been broken by this fix (as they were wrong initiall) | 11:53 |
*** dprince has joined #openstack-mistral | 12:19 | |
openstackgerrit | Merged openstack/mistral: Fix error message format for task run and continue https://review.openstack.org/395988 | 12:28 |
openstackgerrit | Merged openstack/mistral: Fix error message format in other task handler methods https://review.openstack.org/396009 | 12:28 |
openstackgerrit | Merged openstack/mistral: Fix error message format in action handler https://review.openstack.org/396038 | 12:28 |
*** bobh_ has joined #openstack-mistral | 12:29 | |
*** jaosorior_lunch is now known as jaosorior | 12:34 | |
*** bobh_ has quit IRC | 12:40 | |
*** ^Gal^ has quit IRC | 12:53 | |
xavierhardy | is that normal that Jenkins has not built yet? | 12:55 |
xavierhardy | I have a problem with tox in my build environment, it takes more than 15 minutes to install the dependencies | 12:57 |
d0ugal | xavierhardy: It's the TripleO CI, it takes a while | 12:57 |
d0ugal | xavierhardy: it just finished | 12:57 |
d0ugal | xavierhardy: it is a good sign that it passed :) | 12:58 |
xavierhardy | OK, what about tox ? | 12:58 |
d0ugal | xavierhardy: I'm not sure, it shouldn't take that long. | 12:58 |
d0ugal | xavierhardy: do you have any errors? | 12:58 |
d0ugal | xavierhardy: The Mistral tests do take some time, but not that long IIRC | 12:59 |
d0ugal | xavierhardy: Are you running all the Python versions? It might be quicker to work with one first. 'tox -e py27' | 13:00 |
xavierhardy | I have that with 2.7 and 3.5 | 13:00 |
xavierhardy | and pep8 | 13:01 |
d0ugal | ok | 13:01 |
xavierhardy | no, it's in our build environment | 13:01 |
d0ugal | oh, I see | 13:01 |
xavierhardy | it does not even start the test (it gets killed after 15 minutes | 13:01 |
xavierhardy | ) | 13:01 |
d0ugal | Strange, unless you have an error/traceback I'm not sure what it could be | 13:02 |
*** catintheroof has joined #openstack-mistral | 13:08 | |
xavierhardy | It does not matter for now, I'm running the tests locally | 13:38 |
*** kamtamtun has quit IRC | 13:45 | |
xavierhardy | I have configured the tests this way: | 14:06 |
xavierhardy | ./run_tests.sh --db-type postgres -N -n | 14:06 |
xavierhardy | and I end up with sqlite errors .... | 14:06 |
xavierhardy | ./run_tests.sh --db-type postgresql -N -n | 14:08 |
xavierhardy | I can't launch the tests on my build env nor on my desk for reasons that I can't explain | 14:14 |
xavierhardy | oslo_config.cfg.NoSuchOptError: no such option mistral in group [service_available] | 14:19 |
xavierhardy | how do I load the configuration file for the tests ? | 14:19 |
xavierhardy | d0ugal, rakhmerov | 14:21 |
*** jaosorior has quit IRC | 14:34 | |
*** jaosorior has joined #openstack-mistral | 14:35 | |
d0ugal | xavierhardy: I'm not sure, I generally run the tests against sqlite | 14:37 |
xavierhardy | rakhmerov, are you sure the session aware decorator is able to fetch the transaction open by with db_api.transaction | 15:01 |
xavierhardy | I have some doubts seen the errors | 15:01 |
xavierhardy | oh, my mistake, I did not "call" with db_api.transction | 15:07 |
xavierhardy | (no "()") | 15:07 |
xavierhardy | still, same error | 15:09 |
xavierhardy | ah, ok update_event_triggers calls db_api.transaction too | 15:13 |
*** jaosorior has quit IRC | 15:20 | |
*** bobh_ has joined #openstack-mistral | 15:24 | |
xavierhardy | I have tons of TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType | 15:26 |
xavierhardy | I'll at least push the first fixes | 15:26 |
openstackgerrit | Xavier Hardy proposed openstack/mistral: Fix possible DB race conditions in REST controller https://review.openstack.org/396128 | 15:29 |
xavierhardy | any idea about those lazy loading errors ? | 15:38 |
xavierhardy | it basically happens because the transaction is already closed when we try to load some attributes from an object | 15:43 |
xavierhardy | so I guess (need to test though), we need to merge both change reviews | 15:43 |
xavierhardy | oh, OK, the test are actually broken | 15:47 |
xavierhardy | it's not inside the API code anymore | 15:47 |
xavierhardy | such as test_create_and_get_and_load_task_execution | 15:48 |
xavierhardy | which needs to be in transaction too | 15:48 |
ddeja | xavierhardy: yes, we've talked about this with rakhmerov before. Some tests need to be run in transactions | 15:48 |
xavierhardy | sorry, I was not following what you meant | 15:49 |
xavierhardy | now, I see it | 15:49 |
xavierhardy | OK, fixing | 15:49 |
*** rrecio has joined #openstack-mistral | 15:53 | |
xavierhardy | grep "lazy" /tmp/console.html | wc | 15:54 |
xavierhardy | 230 | 15:54 |
xavierhardy | that's what he meant with patience (2 hours later...) | 15:54 |
ddeja | I think so | 15:54 |
*** rrecio has quit IRC | 15:54 | |
*** rrecio has joined #openstack-mistral | 15:55 | |
xavierhardy | why not make this systematic | 16:05 |
xavierhardy | ? | 16:05 |
xavierhardy | at setUp and tearDown ? | 16:05 |
*** janki has quit IRC | 16:11 | |
ddeja | becouse we would like to have transactions as short as possible | 16:12 |
xavierhardy | in the tests too ? | 16:12 |
ddeja | I'm not sure | 16:15 |
ddeja | but if we do so, we make tests pass faster, which is good | 16:16 |
xavierhardy | does it ? | 16:16 |
xavierhardy | anyway, looking at the tests, it's not possible to make the transaction start atthe beginning and end at the end all the time | 16:16 |
ddeja | I see | 16:17 |
xavierhardy | I push a first batch | 16:17 |
xavierhardy | I'm done for today I guess | 16:17 |
openstackgerrit | Xavier Hardy proposed openstack/mistral: Fix REST API dangling transactions https://review.openstack.org/395696 | 16:19 |
ddeja | ok, thanks xavierhardy :) | 16:20 |
*** harlowja has joined #openstack-mistral | 16:31 | |
*** harlowja has quit IRC | 16:32 | |
*** harlowja has joined #openstack-mistral | 16:34 | |
*** bobh_ has quit IRC | 16:44 | |
*** catinthe_ has joined #openstack-mistral | 16:45 | |
*** catintheroof has quit IRC | 16:46 | |
*** jpich has quit IRC | 17:06 | |
*** igormarnat has quit IRC | 18:03 | |
*** akuznetsova has quit IRC | 18:04 | |
*** rakhmerov has quit IRC | 18:04 | |
*** rakhmerov has joined #openstack-mistral | 18:07 | |
*** akuznetsova has joined #openstack-mistral | 18:07 | |
*** igormarnat has joined #openstack-mistral | 18:09 | |
*** hrybacki has joined #openstack-mistral | 18:37 | |
*** jamielennox|away is now known as jamielennox | 18:37 | |
hrybacki | hey all. I'm seeing receiving an 'Enviornment not found <stack> ' from Mistral (newton) exception calling `openstack overcloud node delete --stack <stack>' I verified that the stack id is correct -- any pointers? https://paste.fedoraproject.org/477444/79992414/ | 18:38 |
hrybacki | journalctl logs for ^^ https://paste.fedoraproject.org/477464/14788013/ -- see line ~166 | 18:38 |
*** shardy has quit IRC | 18:41 | |
*** dprince has quit IRC | 18:48 | |
*** dprince has joined #openstack-mistral | 19:03 | |
*** bobh_ has joined #openstack-mistral | 19:05 | |
*** openstackgerrit has quit IRC | 19:48 | |
*** openstackgerrit has joined #openstack-mistral | 19:48 | |
*** gyee has joined #openstack-mistral | 20:27 | |
*** bobh_ has quit IRC | 20:49 | |
*** dprince has quit IRC | 20:50 | |
*** gyee has quit IRC | 20:54 | |
*** gyee has joined #openstack-mistral | 21:32 | |
*** sharatss has quit IRC | 21:35 | |
*** sharatss has joined #openstack-mistral | 21:37 | |
*** bobh has joined #openstack-mistral | 21:38 | |
*** bobh has quit IRC | 21:38 | |
*** bobh has joined #openstack-mistral | 21:38 | |
*** bobh has quit IRC | 21:42 | |
*** catinthe_ has quit IRC | 21:59 | |
*** Kiall has quit IRC | 22:38 | |
*** rrecio_ has joined #openstack-mistral | 22:57 | |
*** rrecio_ has quit IRC | 22:57 | |
*** rrecio has quit IRC | 23:00 | |
*** catintheroof has joined #openstack-mistral | 23:05 | |
*** catintheroof has quit IRC | 23:09 | |
*** catintheroof has joined #openstack-mistral | 23:17 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!