22:02:49 <devananda> #startmeeting db
22:02:50 <openstack> Meeting started Thu Feb 7 22:02:49 2013 UTC. The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:02:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:02:53 <openstack> The meeting name has been set to 'db'
22:03:30 <devananda> (kinda lost my train of thought with the break, heh)
22:03:51 <comstud> i have 2 separate issues
22:03:56 <comstud> though somewhat related
22:04:12 <comstud> but I dunno what the agenda here is..
22:04:34 <devananda> so i didn't have an agenda for today
22:04:40 <comstud> ok
22:04:49 <devananda> last meeting, we went over the status of all the db BP's, which was good, but i dont think we need to rehash that
22:05:10 <devananda> the problems you're seeing with pooling is pretty big IMO
22:05:17 <devananda> especially how that will affect conductor
22:05:33 <comstud> they've uncovered at least one problem for sure, one with which you were already aware, I think
22:05:36 <comstud> yeah
22:05:53 <comstud> i'll recap
22:06:01 <comstud> starting with this first issue
22:06:57 <comstud> well, first...
22:07:20 <comstud> With the move to nova-conductor... we're now pushing DB queries to a single node (or set of nodes if you run multiple nova-conductors)...
22:07:31 <comstud> but in a large deployment, you have much less conductors than computes, likely.
22:07:35 <comstud> You don't want to have 1 per compute :)
22:08:04 <comstud> This means that DB pooling becomes important... as without it, DB calls block the whole python process.
22:08:40 <comstud> The RPC queue will back up because the service cannot keep up with all of the DB requests if it has to block on every single one without any sort of parallelism
22:09:00 <comstud> Good so far?
22:09:05 <devananda> yep
22:09:23 <comstud> Cools... so I've been doing testing with db pooling
22:09:45 <comstud> With the trunk code, under load, you can get some connect timeouts, etc. This patch is needed:
22:09:47 <comstud> https://review.openstack.org/#/c/21461/
22:09:57 <comstud> which changes how the thread pooling is done
22:10:15 <comstud> After applying that...
22:10:34 <comstud> issue #1) innodb reporting Deadlock
22:11:07 <boris-42> Seems that DB pooling start working=)
22:11:08 <comstud> when you get 2 bw_usage_update() requests in parallel for the same instance uuid (can be different mac addresses/networks), this happens
22:11:19 <comstud> This is due to doing UPDATE+INSERT in a single transaction
22:11:53 <devananda> iirc, it should also deadlock the first time any instance is provisioned for a new tenant, if that happens in parallel.
22:12:05 <devananda> and in a few places around fixed / floating IP allocation, if they happen fast enough in parallel
22:12:21 <comstud> Yeah, you had a list that reported a number of other spots where this could happen
22:12:31 <comstud> besides bw_usage_update()
22:12:40 <devananda> right. there are also some open bugs on LP about these now
22:13:13 <comstud> In any case... We've found that bw_usage_cache table should have a unique index on instance_uuid+mac+start_period, which it doesn't right now.
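For reference, the deadlock-prone shape described above looks roughly like the sketch below. This is illustrative only: the model, column types, constraint name, and helper signature are simplified assumptions, not the actual nova schema or DB API code. Two parallel bw_usage_update() calls for a period row that does not exist yet can each take InnoDB gap locks on the zero-row UPDATE and then block on each other's INSERT inside the same transaction; the unique index comstud mentions is what would allow a single-statement upsert instead.

    # Illustrative sketch only -- simplified model, not the real nova code.
    # Shows the UPDATE-then-INSERT-in-one-transaction shape that can deadlock
    # under InnoDB when two calls race for a row that does not exist yet, and
    # the unique constraint discussed above that would allow a real upsert.
    from sqlalchemy import (BigInteger, Column, DateTime, Integer, String,
                            UniqueConstraint)
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class BwUsageCache(Base):
        __tablename__ = 'bw_usage_cache'
        id = Column(Integer, primary_key=True)
        instance_uuid = Column(String(36), nullable=False)
        mac = Column(String(255), nullable=False)
        start_period = Column(DateTime, nullable=False)
        bw_in = Column(BigInteger)
        bw_out = Column(BigInteger)
        # The missing unique index: without it, "INSERT ... ON DUPLICATE KEY
        # UPDATE" cannot work and callers fall back to the racy pattern below.
        __table_args__ = (
            UniqueConstraint('instance_uuid', 'mac', 'start_period',
                             name='uniq_bw_usage_cache0uuid0mac0start_period'),
        )


    def bw_usage_update(session, uuid, mac, start_period, bw_in, bw_out):
        # Deadlock-prone shape: an UPDATE that matches zero rows for a new
        # period (taking gap locks), then an INSERT, all in one transaction.
        # Two parallel calls can each end up waiting on the other's gap lock.
        with session.begin():
            updated = session.query(BwUsageCache).filter_by(
                instance_uuid=uuid, mac=mac,
                start_period=start_period).update(
                {'bw_in': bw_in, 'bw_out': bw_out})
            if not updated:
                session.add(BwUsageCache(instance_uuid=uuid, mac=mac,
                                         start_period=start_period,
                                         bw_in=bw_in, bw_out=bw_out))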
22:13:29 <comstud> And that would allow us to do things like INSERT ON DUP KEY UPDATE
22:13:40 <comstud> if we can find a portable way to do it with sqlalchemy :)
22:14:37 <dripton> sounds like we should add the index regardless, while we hunt for the portable upsert
22:14:42 <comstud> I tested a variation on this that removed the Deadlock
22:14:49 <comstud> yes
22:15:11 <comstud> I think we can solve that issue without *too* much trouble, however
22:15:13 <comstud> issue #2:
22:15:53 <comstud> Even after getting innodb Deadlocks to go away, I'm seeing queries getting stuck and timing out due to waiting on innodb locks.
22:16:23 <comstud> The only conclusion is that sqlalchemy is not committing a transaction... it's holding it somewhere
22:16:26 <devananda> i dug through my notes and remembered that i found this issue a while back, too
22:16:34 <devananda> https://bugs.launchpad.net/nova/+bug/1007038/comments/4
22:16:36 <uvirtbot> Launchpad bug 1007038 in nova "Nova is issuing unnecessary ROLLBACK statements to MySQL" [Low,Confirmed]
22:16:40 <comstud> but I can only see this when thread pooling is on
22:16:58 <comstud> devananda: I was able to cause an extra rollback very easily..
22:17:05 <comstud> kind of a side toopic
22:17:08 <comstud> topic
22:17:08 <devananda> comstud: it was happening in august last year. was thread pooling in at that time?
22:17:12 <comstud> no
22:17:28 <comstud> it seems to be our wrapping of DB exceptions
22:17:30 <devananda> rather. it was happening when I disabled pool_reset_on_return
22:17:34 <comstud> from what I could tell
22:17:48 <devananda> hmmm
22:17:50 <comstud> i had code such as this:
22:17:55 <comstud> with session.begin():
22:18:07 <comstud> try:
22:18:09 <comstud> do a query
22:18:18 <comstud> except DBDuplicateEntry:
22:18:21 <comstud> pass
22:18:34 <comstud> and I noticed that 2 rollbacks would happen if there was an exception
22:18:43 <comstud> and there'd be a error logged about it
22:18:48 <devananda> kinda makes sense
22:18:59 <devananda> sqla is probably doing a rollback before the exception bubbles up to your code
22:19:04 <comstud> it seems like a rollback would happen when the exception occurred
22:19:13 <comstud> and then also when we exited the context manager.
22:19:22 <devananda> and then after you leave the with session context, another implicit rollback happens (because of pool_reset_on_return)
22:19:25 <devananda> right
22:19:25 <comstud> yep!
22:19:27 <comstud> exactly
22:19:40 <comstud> that's a side issue, I think
22:19:57 <comstud> i'm concerned about where sqlalchemy seems to be holding a transaction
22:20:04 <devananda> yea. me too
22:20:07 <comstud> causing these lock wait timeouts
22:20:18 <comstud> I can cause the whole process to stall by slamming DB requests
22:20:33 <comstud> after meetings are done today, I was going to debug with eventlet backdoor
22:21:17 <comstud> anyway, that's where I am with 'bugs' that show up with db pooling.
22:21:45 <comstud> separate issue, somewhat related:
22:22:08 <comstud> sqlalchemy is extremely slow compared to raw queries with MySQLdb.
22:22:14 <comstud> And there are no lock wait timeouts, either
22:22:15 <comstud> :)
22:22:35 <devananda> :)
22:22:37 <dripton> comstud: I think that's only with the ORM. Low-level sqla is pretty fast IMX
22:22:48 <comstud> it very well might be, I just haven't tested it yet
22:22:57 <comstud> belliott and I have been working on this together
22:23:02 <devananda> by raw queries, you mean model.insert? or session.execute("INSERT ... INTO table")
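Tying back to the portable-upsert question earlier in this exchange: with the unique index in place, MySQL's "INSERT ... ON DUPLICATE KEY UPDATE" maps onto SQLAlchemy roughly as in the sketch below. This is illustrative only; on_duplicate_key_update() is a MySQL-dialect API that was added to SQLAlchemy well after this meeting, and it is not portable across backends, which is exactly the gap comstud points out. It reuses the hypothetical BwUsageCache model from the previous sketch.

    # Sketch of a single-statement upsert using SQLAlchemy's MySQL dialect.
    # MySQL-only, so it does not solve the portability question raised above.
    from sqlalchemy.dialects.mysql import insert as mysql_insert


    def bw_usage_upsert(session, uuid, mac, start_period, bw_in, bw_out):
        stmt = mysql_insert(BwUsageCache.__table__).values(
            instance_uuid=uuid, mac=mac, start_period=start_period,
            bw_in=bw_in, bw_out=bw_out)
        # On a duplicate (instance_uuid, mac, start_period), update the
        # counters instead of raising.  One statement, so there is no
        # UPDATE-then-INSERT window for a gap-lock deadlock.
        stmt = stmt.on_duplicate_key_update(bw_in=stmt.inserted.bw_in,
                                            bw_out=stmt.inserted.bw_out)
        # Commit is left to the enclosing session transaction.
        session.execute(stmt)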
INTO table") 22:23:27 <devananda> just want to be clear 22:23:31 <comstud> mysqldb .cursor(....) 22:23:35 <dripton> the fast ones are session.execute, but it could be the Python insert() not necessaily the string insert 22:24:01 <dripton> comstud: please confirm for yourself, but I want to make sure we don't throw out the baby with the bathwater 22:24:23 <comstud> cursor.execute('UPDATE bw...') 22:24:28 <dripton> my experience is that low-level SQLAlchemy is an excellent portability shim, and high-level sqla ORM doesn't scale to big projects 22:24:38 <comstud> dripton: I'm not proposing we do. I'm just stating what I've seen so far 22:24:42 <dripton> ok 22:24:53 <devananda> "doesn't scale" sums up my exp with all ORMs 22:24:56 <dripton> +1 22:25:01 <comstud> it was just a quick hack to test something 22:25:04 <dripton> but I haven't tried all of them so I'm being nice 22:25:24 <comstud> wanted to eliminate innodb just being stupid somehow 22:25:30 <dripton> yep 22:25:33 <dripton> one layer at a time 22:26:00 <devananda> would it be viable to replace just the areas that are likely to deadlock with raw sql? 22:26:21 <dripton> I think we'd end up replacing entire api calls that have problems 22:26:26 <comstud> i was thinking about that yes 22:26:33 <devananda> ie, anywhere that currently uses with_lockmode or select-then-update-or-insert 22:26:55 <devananda> and yea, that does mean several api calls get entirely rewritten to avoid locking issues 22:26:56 <comstud> but I think all of the calls potentially have this 'lock wait timeout' issue 22:27:13 <comstud> because it's a sqlalchemy orm layer problem 22:27:14 <boris-42> comstud +1 22:27:23 <comstud> with db pooling anyway 22:27:25 <devananda> well 22:27:29 <dripton> We can start with a couple that are known bad then start attacking en masse if it works well 22:27:31 <devananda> while i agree with that 22:27:40 <comstud> but in particular, I've seen it with 3 or 4 calls 22:27:45 <devananda> if we replaced all the ORM code that touches table X with raw sql, that didn't have locking problems 22:27:48 <comstud> they just happen to be the most common calls that cells uses. 22:27:59 <comstud> so that doesn't eliminate any of the others 22:28:15 <devananda> true. but lock waits and deadlocks are table specific 22:28:25 <devananda> it's not like holding on lock on table_a causes a problem for table_b 22:28:33 <comstud> sure sure 22:28:34 <comstud> i know 22:28:42 <comstud> but i'd expect that it could happen on any table 22:28:42 <devananda> granted, there _is_ a larger problem with sqla here, which i'd love to see fixed :) 22:28:53 <comstud> yea 22:28:54 <devananda> ah. so I wouldn't 22:29:07 <devananda> afaict it will only happen when the ORM is trying to pre-emptively lock rows 22:29:20 <devananda> ie, with_lockmode or SELECT-then-INSERT 22:29:26 <devananda> there aren't that many places which do that 22:29:35 <comstud> but if there's somehow an implicit transaction for all of our DB api calls, does that not mean there's locking of some sort on every table? 22:30:00 <devananda> well, yes, but no :) 22:30:08 <comstud> i was able to see this problem even after removing session.begin() 22:30:12 <comstud> for instance 22:30:21 <devananda> select 1; begin; update table ... where ..; commit; rollback; 22:30:23 <comstud> (the lock wait timeouts, not the Deadlock) 22:30:28 <devananda> that's the usual sqla query pattern for a single update 22:30:34 <dripton> But if we change an API call to not use the ORM then that should go away, right? 
22:31:36 <dripton> We can put sqla in trace mode to see all the commands it sends to the DB, and verify that simple stuff doesn't do any extra transaction stuff.
22:31:36 <devananda> if sqla somehow forgets to do the final "commit; rollback;" then yea, the lock wait problem could happen anywhere
22:31:53 <comstud> nod
22:32:03 <comstud> atm, that's what I suspect somewhere.. but I don't know for sure
22:32:09 <devananda> k
22:32:14 <comstud> possible i'm wrong and it's only happening in certain calls
22:32:35 <comstud> dunno!
22:32:53 <devananda> i wonder whether it's related to eventlet?
22:33:02 <comstud> but I know we're approaching the limits here at RAX in global-cells
22:33:07 <comstud> with all of the DB calls it has to do
22:33:13 <comstud> so I need Db pooling RSN
22:33:20 <dripton> devananda: me too
22:33:36 <comstud> I may have to resort to raw queries for things at least internally if I can't figure this out quickly
22:33:53 <comstud> and I know this same problem will show up in conductor under load
22:34:05 <comstud> (we're still using local conductor only)
22:34:37 <comstud> i'm going to attempt to find the sqlalchemy issue first
22:34:44 <dripton> sounds good
22:34:44 <comstud> and if that takes too long, look at low level sqlalchemy calls
22:34:49 <devananda> ++
22:35:33 <devananda> i wouldn't mind seeing low level sqla calls upstream, esp considering it's better for performance
22:35:39 <dripton> I will have a work-in-progress patch containing low-level sqla up soon, if you need a reference. It doesn't quite work yet.
22:35:55 <comstud> i could def use a reference
22:36:06 <dripton> I'll ping you when I upload it.
22:36:10 <comstud> or.. it'd save some googling at least :)
22:36:15 <comstud> hehe ty
22:36:59 <dripton> Here's my fundamental problem with db-archiving: my current api call is to move *all* deleted rows to shadow tables in one call. That could take a long time. I don't know a clever way to subdivide the job.
22:37:26 <devananda> comstud: would it be possible to disable eventlet in conductor, while still enabling tpool?
22:37:44 <devananda> probably a crazy idea ...
22:38:02 <devananda> dripton: ORDER BY id LIMIT X;
22:38:05 <comstud> tpool is implemented in eventlet
22:38:10 <comstud> eventlet.tpool
22:38:13 <devananda> hah. yep, crazy idea
22:38:43 <comstud> we could do our own threading, i suppose.. but you lose pseudo-parallelising pulling off the rabbit queue
22:39:02 <comstud> maybe it's a win in the end, though, i dunno
22:39:14 <comstud> i suspect kinda not
22:39:17 <comstud> but i dunno!
22:39:25 <dripton> devananda: yes, that works within one table. I'm worried about cross-table issues due to foreign keys, but we don't have many so maybe it's okay.
22:39:42 <devananda> dripton: there shouldn't be any FKs in production IMNSHO ....
22:40:00 <devananda> dripton: but even so, i'm not sure how that would matter
22:40:11 <devananda> dripton: loop on one table, in small chunks, until it's finished. then move to next table
22:40:23 <devananda> make sure each chunk is a separate transaction, too
22:40:27 <boris-42> devananda +1
22:40:44 <devananda> otherwise you can blow away the innodb undo space, and it will stall replication, etc...
22:40:47 <dripton> devananda: sure, FKs would just impose an ordering on how to move things. But, yeah, I'll do a bunch of little transactions and have a row limit per call to the api
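A rough sketch of the chunked-archiving loop being discussed, assuming a recent SQLAlchemy; the shadow_ table prefix, the max_rows parameter, the integer deleted column, and the archive_deleted_rows name are assumptions for illustration, not dripton's actual work-in-progress patch. Each batch runs in its own short transaction so a one-shot move of every deleted row does not blow out the InnoDB undo space or stall replication.

    # Move soft-deleted rows for one table into its shadow table in small
    # batches, each batch in a separate transaction (per devananda's
    # suggestion above).  Illustrative sketch only.
    from sqlalchemy import MetaData, Table, select


    def archive_deleted_rows(engine, table_name, max_rows=1000):
        meta = MetaData()
        table = Table(table_name, meta, autoload_with=engine)
        shadow = Table('shadow_' + table_name, meta, autoload_with=engine)

        while True:
            with engine.begin() as conn:   # one small transaction per chunk
                rows = conn.execute(
                    select(table).where(table.c.deleted != 0)
                    .order_by(table.c.id).limit(max_rows)).fetchall()
                if not rows:
                    break
                ids = [row.id for row in rows]
                # Assumes the shadow table has the same columns as the source.
                conn.execute(shadow.insert(),
                             [dict(row._mapping) for row in rows])
                conn.execute(table.delete().where(table.c.id.in_(ids)))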
22:41:45 <comstud> i'm concerned about down time for large deployments
22:41:53 <comstud> with the changes to soft delete
22:41:55 <comstud> the deleted column
22:42:18 <comstud> maybe we just prune the tables first if we don't care about archiving
22:42:38 <dripton> comstud: you mean downtime when running the archiving operation?
22:42:45 <comstud> DB migration
22:42:47 <comstud> in general
22:42:48 <dripton> comstud: or when migrating the deleted column?
22:42:50 <dripton> ah
22:42:52 <comstud> yeah that one
22:42:55 <comstud> we ran a test...
22:43:13 <comstud> we took the proposed migration and ran it against a copy of the DB
22:43:18 <comstud> it took about 45 minutes IIRC
22:43:18 <comstud> :)
22:43:23 <boris-42> =)))
22:43:26 <devananda> i would assume that deployers are not going to just run all the migrations blindly, but we should include a note about the larger migrations
22:43:36 <dripton> +1
22:43:36 <devananda> comstud: 45min is nothing for a big ALTER TABLE ;)
22:43:52 <comstud> yeah, but unfort due to use of sqlalchemy orm...
22:43:53 <boris-42> alter table for all tables=)
22:43:55 <dripton> comstud: I was planning to include another API call to nuke deleted rows
22:43:57 <comstud> it means shit is broken during the whole migration
22:44:16 <devananda> sure. which is why you have two db's & HA
22:44:17 <devananda> :)
22:44:17 <comstud> i think we just prune our tables first
22:44:32 <devananda> or that, hehe
22:45:55 <comstud> i gotta drop off.. anything else for me?
22:46:00 <devananda> anyone have other topics to bring up? it's seeming like we're about done
22:46:06 * comstud waits
22:46:18 <dripton> I'm done
22:46:22 <devananda> cool
22:46:26 <devananda> thanks guys :)
22:46:30 <dripton> thanks for the tip devananda
22:46:34 <devananda> #endmeeting