20:59:58 #startmeeting nova
20:59:59 Meeting started Thu Apr 4 20:59:58 2013 UTC. The chair is vishy. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:00 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:02 The meeting name has been set to 'nova'
21:00:05 oh hai!
21:00:08 Doh!
21:00:10 I wanted to do that!
21:00:17 \o
21:00:17 #chair mikal
21:00:19 HI
21:00:20 Heh
21:00:22 hi!
21:00:24 Current chairs: mikal vishy
21:00:24 o/
21:00:32 #link https://wiki.openstack.org/wiki/Meetings/Nova
21:00:34 hi
21:00:40 #chair russellb
21:00:41 Warning: Nick not in channel: russellb
21:00:42 Current chairs: mikal russellb vishy
21:00:53 he's on vacation :)
21:01:01 I know... russellb asked me to run this meeting because he's currently taking a vacation in an Olive Garden or something. I think his goal is to finish the salad.
21:01:09 But I am also happy with vishy running it if he wants.
21:01:09 heh
21:01:23 el oh el
21:01:33 So, I think this will be relatively short... First agenda item...
21:01:42 #topic backport bug review
21:01:57 So, the only bug I am aware of that we should backport is https://bugs.launchpad.net/nova/+bug/1163660
21:01:58 Launchpad bug 1163660 in nova "nova-manage uses nova.db.fixed_ip_get_all_by_instance_host, which no longer exists" [Medium,In progress]
21:02:00 Any others?
21:02:13 Hilariously, that's been broken for over a year.
21:02:23 there are more
21:02:48 note the critical: https://bugs.launchpad.net/nova/+bugs?field.tag=grizzly-backport-potential
21:02:50 and the high
21:03:06 the nova-manage fixed list i think isn't too bad
21:03:15 Oh, it's not as bad as I first thought it was
21:03:17 but not being able to use memcached is pretty bad
21:03:19 It's embarrassing though
21:03:32 * mikal is waiting for LP to load that page
21:03:49 that's a lot of things to backport
21:03:57 how does that compare to folsom?
21:04:32 Half are "fix committed" at least
21:05:02 dansmith: I'm not sure, I don't remember that far back
21:05:05 we always had embarrassing bugs showing their heads on release +1h
21:05:10 yeah, there isn't a folsom-backport-potential
21:05:34 So, it seems we should at least try and get them all assigned to people who are actually working on them...
21:05:53 so, I guess we don't have one for the system_metadata concern,
21:06:04 but is anyone else in a panic over that?
21:06:17 Want to provide more detail?
21:06:28 there is also this: https://review.openstack.org/#/c/26109/
21:06:49 folsom has had many hundreds of backports
21:07:05 mikal: apparently comstud is severely concerned about performance related to the system_metadata "explosion"
21:07:12 vishy: ah, good to know, I've never looked
21:07:19 dansmith: ahhh yeah, that blew up yesterday
21:07:32 mikal: yeah :/
21:07:32 dansmith: this review looks like it fixes that?
21:07:56 mikal: this review?
21:08:08 mikal: you mean 26109? no.. :)
21:08:29 mikal: it relieves one pressure point that was easy :)
21:08:40 comstud: you around?
21:08:50 he's on a plane
21:08:51 dansmith: hmmm, ok
21:09:12 Ok, so it sounds like that needs more work then
21:09:16 we've got a few plans for mitigation, but yeah
21:09:33 yeah joining all of the metadata at once is bad
21:09:34 There are also three bugs in that grizzly-backport-potential list with no assignee
21:09:50 honestly we should have stuck all of the flavor-related metadata into one field
21:10:07 then we could have joined it all at once
21:10:11 sigh
21:10:33 vishy: could still do that, but aren't those columns limited to 255 chars?
21:10:47 dansmith: yeah we'd have to migrate them to text
21:10:52 vishy: yeah
21:11:00 vishy: I wonder how big those tables are in deploys...
21:11:07 vishy: well, that's doable since everything is calling the instance_type_extract() to get at it
21:11:44 we could make it join on system_metadata where key = instance_type
21:11:45 so, that helps because we have fewer things in system_metadata for each instance,
21:11:53 but we're still going to be joining the whole thing right?
21:12:04 oh, we can do that?
21:12:13 but then we have to figure out where all of the other system metadata is being used
21:12:21 and explicitly join it when we need it
21:12:23 yeah
21:12:26 dansmith: so, this sure sounds like it should have a bug filed if one doesn't already exist
21:12:45 mikal: yeah, I don't know that comstud doesn't have one, he's usually quick on that
21:13:29 looks like not though
21:13:31 * mikal asks his coworkers
21:14:14 so, one thought we had,
21:14:24 was to avoid the join entirely,
21:14:42 and just do another query for the system_metadata items we care about
21:14:57 which we can do in bulk after we get all the instances that we're going after in a given query
21:15:16 the thought being that two trips to the db is better than the massive join
21:15:37 so if that makes a significant improvement, that might be better than trying to do something more invasive
21:15:59 dansmith: i'm not convinced that will be faster but it is worth a shot
21:16:15 dansmith: if that means only pulling the metadata a) when it's actually needed, and b) which is desired, not all of it, then I think it'll be big
21:16:15 vishy: yeah
21:16:30 Isn't that table lacking indexes to make that not suck?
21:16:37 devananda: unfortunately, it doesn't mean either of those things necessarily
21:16:55 mikal: no, apparently the indexes are the only reason it doesn't kill kittens already
21:17:24 I can see an index on instance_uuid, but not key?
21:17:36 mikal: we don't need a key, right?
21:17:46 probably not, since afaik nothing currently fetches only a single key
21:17:46 mikal: since we're joining only on uuid and taking everything
21:17:51 right
21:17:54 we will want an index though :)
21:17:57 dansmith: well, if you're going to cherry pick out specific keys to avoid a full join...
21:18:15 mikal: that's not what we're talking about in this step
21:18:17 devananda: is there some way to optimize the join so that we don't get quite so many rows back?
21:18:31 so the eventual query would be WHERE (uuid=X and key in(...)) OR (uuid=Y and key in (...)) ...
21:18:43 for example with 24 metadata items we get 24 copies of each instance right?
21:18:53 right, but we need all of those,
21:18:56 hrmm
21:19:00 since we don't know if the caller is going to use system_metadata or not
21:19:13 well we need all of the rows, but we don't need all of the instance data in every row
21:19:17 vishy: i hadn't considered the code was doing that
21:19:22 not sure if it would make any difference
21:19:32 but you may be right. that'll be a huge waste of network traffic but not necessarily make the join any slower
21:19:43 so, comstud said that it wasn't pegging cpu or anything, just generating a crapton of network traffic,
21:19:59 so I think that the problem _is_ the extra data, I just can't see how a join would not generate the full result on each row
21:20:00 if the caller needs 4 pieces of metadata but fetches 20, that slows down the query cause the db has to do 5x more work
21:20:25 The resource tracker does know what metadata items it needs, right?
21:20:33 So it could provide that as a hint to the lookup
21:20:40 if, otoh, the db returned only those 4 pieces of metadata -- but each ROW also had an identical copy of the instance data -- then it slows down the network and, possibly, python -- but not the DB
21:20:55 mikal: in a few cases we could micro-optimize, yeah,
21:21:11 dansmith: well, the resource tracker is what is punching comstud in the face
21:21:12 mikal: but it's hard for things like compute manager where it's going to pass the instance to a bunch of random things
21:21:15 dansmith: it's the painful case
21:21:42 mikal: no, I think that the other patch made that better.. he said the periodic task was the killer now, I thought
21:22:02 dansmith: ahhh, ok
21:22:08 i think the right solution here is to convert metadata and system_metadata into a json blob
21:22:23 unless there are places we are filtering on it
21:22:27 vishy: do you see that being backported to grizzly?
21:22:30 what's the impact of the text column?
21:22:49 in which case we should just convert parts of metadata (like flavor info) to blobs
21:22:53 vishy: that's a helluva migration too, right?
21:22:55 just to take a step back, where exactly is the performance problem right now?
21:22:58 dansmith: negligible afaik
21:23:18 devananda: network traffic out of the db server, I believe
21:23:22 devananda: when you have large numbers of instances the join makes the data transfer crazy
21:23:22 devananda: we really should have comstud here
21:23:35 moving from a k/v table to using a json blob (one row per uuid) is a dramatic change in db behavior
21:23:38 like 10000 rows turns into 120000
21:23:57 vishy: ok. so in that case, is the issue network traffic or DB time for the JOIN ?
21:24:11 devananda: I believe comstud's problem was network traffic
21:24:14 network traffic + unpacking time
21:24:16 comstud made it sound like traffic
21:24:17 yeah
21:24:19 has someone separated those by timing the first packet and the last packet of mysql's response?
21:24:25 ok
21:24:31 sqlalchemy processing 120000 rows is pretty damn slow as well
21:24:36 so then change the JOIN to a subquery + CONCAT
21:24:46 or use an in-memory pivot table
21:24:52 to return only 1 row per instance
21:25:04 devananda: do you have any idea how to do either of those in sqlalchemy?
21:25:12 from python's POV, it has the same effect as going to json but doesn't require any db migration
21:25:22 vishy: no clue in sqla ...
21:25:30 devananda: that would be sweet
21:25:31 boris-42: any thoughts on ^ ?
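To make the subquery + CONCAT idea concrete, here is a rough SQLAlchemy sketch (written in the list-style select() of the releases of that era) of a correlated GROUP_CONCAT subquery that folds each instance's system_metadata into a single text column, so the result set has one row per instance instead of one row per metadata key. The table and column definitions are simplified stand-ins for Nova's schema, and this is only an illustration of the idea raised above, not the patch that eventually came out of this discussion.

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, func, select)

    metadata = MetaData()

    # Pared-down stand-ins for the instances and instance_system_metadata tables.
    instances = Table(
        'instances', metadata,
        Column('id', Integer, primary_key=True),
        Column('uuid', String(36)),
        Column('host', String(255)))

    instance_system_metadata = Table(
        'instance_system_metadata', metadata,
        Column('id', Integer, primary_key=True),
        Column('instance_uuid', String(36)),
        Column('key', String(255)),
        Column('value', String(255)))

    # Correlated scalar subquery: GROUP_CONCAT every key=value pair belonging
    # to one instance into a single text field.  The separator-collision
    # caveat raised later in the discussion still applies.
    sysmeta_blob = (
        select([func.group_concat(instance_system_metadata.c['key'] + '=' +
                                  instance_system_metadata.c['value'])])
        .where(instance_system_metadata.c['instance_uuid'] == instances.c['uuid'])
        .as_scalar()
        .label('system_metadata_blob'))

    # One row per instance, with the metadata folded into the extra column,
    # instead of one row per (instance, metadata key) pair.
    query = select([instances, sysmeta_blob])

    if __name__ == '__main__':
        # sqlite is used here only to render the SQL; on MySQL func.group_concat
        # maps to the native GROUP_CONCAT aggregate.
        print(query.compile(create_engine('sqlite://')))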
21:25:33 that sounds nice (no migration)
21:25:52 * dansmith pretends he understands what devananda said
21:26:30 basically we want to add an extra field to instances that does a subquery on metadata items and sticks them all into a single field.
21:26:40 dansmith: http://stackoverflow.com/questions/4555988/sql-server-pivot-on-key-value-table (first hit on google, haven't read it...)
21:26:59 but the question there shows an example of our problem
21:27:00 devananda: actually we could just add a prefix to the fields right?
21:27:03 and what a pivot does
21:27:11 vishy: i'm not sure why we'd do that
21:27:19 metadata_key1, metadata_key2 etc.
21:27:26 devananda: that looks excellent :)
21:27:38 vishy: ah. yes
21:28:03 vishy: all columns in table 1 + (prefix+key) for all columns in table 2
21:28:04 we already have a prefix on the type keys, do we need more?
21:28:58 vishy: another approach is to use GROUP_CONCAT to return all the key/value pairs in a single text field
21:29:08 eg, "key1:val1,key2:val2"
21:29:23 devananda: so you concat them with a , and hope none of the vals have a , in them?
21:29:25 as long as we have separators which are guaranteed not to be present in either key or value ;)
21:29:34 yeah that's the part i'm worried about :)
21:29:46 devananda: so is this going to be significantly better than two queries?
21:29:58 I mean, a little more latency, but is it actually faster for some reason?
21:30:10 https://github.com/tomasd/pandas-sqlalchemy-pivot
21:30:15 because I haven't seen anyone speak in SQLA terms yet
21:30:45 dansmith: probably not faster, no.
21:31:05 okay
21:31:30 vishy: do I take the README on that project to mean that SQLA can't do this itself?
21:31:51 dansmith: so you are saying that we convert all of the instance gets to do a separate query for metadata and shove it in?
21:32:11 db still has to do the same work to get the entries from system_metadata. returning (uuid,key,val) 10M times or returning (uuid, other fields, concat of keys and values) 100k times... the total byte count will be similar.
21:32:18 vishy: that's what we want to try, yeah
21:32:29 dansmith: i guess that avoids copying the extra data from each row
21:32:35 vishy: the only problem I have right now is that the Instance SQLA object doesn't want me fscking with its fields,
21:32:49 so I have to do some dirty work to convert it to a dict
21:33:01 which is ugly, and maybe a showstopper, but it works for testing at least
21:33:04 dansmith: we already do some similar mucking
21:33:10 for aggregates
21:33:12 okay then :)
21:33:15 iirc
21:33:21 https://review.openstack.org/#/c/26136/
21:33:47 that's just a quick hack to test with, but it appears to pass tests
21:34:05 dansmith: dict(inst.iteritems())
21:34:07 btw
21:34:11 ah, okay
21:34:53 let's get comstud to test that when he gets off the plane
21:34:58 i think we need a bug report as well
21:35:05 vishy: yeah
21:36:12 well, sorry for dominating the meeting,
21:36:16 Heh
21:36:16 but excellent discussion, IMHO :)
21:36:19 no it is important
21:36:23 It was worthwhile
21:36:25 i will mark the bug critical
21:36:33 and we can get it into stable/grizzly next week
21:36:33 Did we actually find a bug?
21:36:46 mikal: I didn't see one
21:36:48 (in LP I mean)
21:36:55 we have enough bugs to warrant a quick grizzly.1 i think
21:37:04 mikal: can someone report it?
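For comparison, a minimal sketch of the "two trips to the db" approach described above: fetch the instance rows without joining system_metadata, then pull the metadata for all of those instances with one bulk IN() query and stitch the two result sets together in Python. The model definitions and the function name are hypothetical stand-ins, not the contents of review 26136.

    import collections

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()


    class Instance(Base):
        """Pared-down stand-in for Nova's instances model."""
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        uuid = Column(String(36))
        host = Column(String(255))


    class InstanceSystemMetadata(Base):
        """Pared-down stand-in for the instance_system_metadata table."""
        __tablename__ = 'instance_system_metadata'
        id = Column(Integer, primary_key=True)
        instance_uuid = Column(String(36))
        key = Column(String(255))
        value = Column(String(255))


    def instance_get_all_with_sysmeta(session, host):
        """Return (instance, sysmeta_dict) pairs using two queries instead of a join."""
        # Trip 1: just the instance rows, with no joinedload of system_metadata,
        # so the result set is one row per instance rather than one per key.
        instances = session.query(Instance).filter_by(host=host).all()
        uuids = [inst.uuid for inst in instances]

        # Trip 2: every metadata row for those instances in a single IN() query.
        sysmeta = collections.defaultdict(dict)
        if uuids:
            rows = (session.query(InstanceSystemMetadata)
                    .filter(InstanceSystemMetadata.instance_uuid.in_(uuids)))
            for row in rows:
                sysmeta[row.instance_uuid][row.key] = row.value

        # A real version would fold the metadata back into the instance dict
        # (the dict(inst.iteritems()) trick mentioned above); pairing them up
        # is enough to show the shape of the approach.
        return [(inst, sysmeta[inst.uuid]) for inst in instances]


    if __name__ == '__main__':
        engine = create_engine('sqlite://')   # in-memory db just to exercise the code
        Base.metadata.create_all(engine)
        session = sessionmaker(bind=engine)()
        print(instance_get_all_with_sysmeta(session, 'compute1'))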
21:37:09 I can
21:37:13 Ok cool
21:37:23 (I have that ability)
21:37:41 Ok, so what about https://bugs.launchpad.net/nova/+bug/1164072 -- it's on the backport list but doesn't have an assignee
21:37:42 Launchpad bug 1164072 in nova "libvirt livemigration warning logger refernces a not defined local varaible" [Medium,Triaged]
21:38:07 Stupid locals()
21:39:11 https://bugs.launchpad.net/nova/+bug/1164737
21:39:13 Launchpad bug 1164737 in nova "Instance joins with system_metadata are critically slow" [Critical,Confirmed]
21:39:35 Actually, I'll take 1164072
21:39:38 Btw I know it's probably off-topic but could we speak a little bit about this bp https://blueprints.launchpad.net/nova/+spec/use-db-time
21:39:41 There are a few things wrong with that log message.
21:39:47 I have some questions..
21:39:55 boris-42: can we do that in "open discussion"?
21:40:06 yeah that one is super easy :)
21:40:21 The other unassigned one is https://bugs.launchpad.net/nova/+bug/1160442
21:40:22 Launchpad bug 1160442 in quantum "when boot many vms with quantum, nova sometimes allocates two quantum ports rather than one" [Undecided,Incomplete]
21:40:36 mikal ok
21:40:54 mikal: is that a race condition in db? (just guessing, still reading)
21:41:11 devananda: no idea. I'm trying to trick someone into digging into it...
21:41:20 * devananda hides
21:42:09 No takers?
21:43:03 Fair enough
21:43:03 mikal: arosen might be good for that one
21:43:20 vishy: ok, I will ping him
21:43:23 Unless he's here...
21:43:26 i just did in nova
21:43:33 Cool
21:43:47 Is there anything else we should discuss backport-wise?
21:44:14 mikal: I'm going to discuss doing a quick .1 with the stable-backports folks
21:44:51 vishy: ok. Do you want to issue a deadline for people to try and get their fixes in to make that release?
21:45:18 if we do it it will probably just be a quick version of all the really important ones
21:45:23 high + critical
21:45:34 Ok, cool
21:45:38 but i need to discuss with markmc + other stable team to be sure
21:45:45 Fair enough
21:45:55 We're ok to move on to the next topic?
21:45:57 it would be great if everyone could help stay on top of the bug queue
21:46:04 i knocked out about 15 today
21:46:09 but there are still a few left
21:46:21 addressing bugs quickly right after release is really important
21:46:29 Yeah, I see 26 bugs needing triage as well
21:46:31 because a lot of people don't try the rcs
21:46:47 mikal: refresh, it's only 12
21:46:48 :)
21:47:09 Ahhh, I think we're counting differently. I'm including incomplete bugs as untriaged.
21:47:17 oh gotcha
21:47:30 i set bugs back to incomplete if i need more info from the reporter
21:47:46 Yeah, that's fine. I think that's you doing the right thing.
21:47:52 I do the same
21:48:08 But in my mind it's still a bug we need to categorize, even if we're waiting for someone
21:48:15 Otherwise if they never reply it sits around forever
21:48:49 We really should move on...
21:48:53 #topic design summit
21:49:10 IIRC russellb has now let people know about session acceptance
21:49:44 Hmmm, there are still a fair few marked as "unreviewed" though
21:50:27 Does anyone have anything they want to say about the summit?
21:51:13 Moving on then
21:51:15 #topic open discussion
21:51:21 boris-42: now is your time to shine...
21:51:37 mikal thanks
21:52:10 So, reading that blueprint...
21:52:17 timeutils.utcnow() is super useful for unit testing
21:52:17 So we discussed that using local time on nodes is a bad idea..
21:52:28 I'd expect a bunch of unit tests to need work if we removed it and used DB time
21:52:46 indeed
21:52:52 boris-42: that seems like a huge amount of work for very little benefit
21:53:09 vishy: it only seems that way =)
21:53:47 it's a pretty common problem in distributed systems
21:53:52 boris-42: it seems like in a lot of places where we use utcnow we don't actually need to compare against db objects
21:54:01 if the clock gets out of sync somewhere, all kinds of issues crop up
21:54:28 devananda: +1
21:54:30 eg, if one compute host's ntpd dies and it drifts, pretty soon it may appear "dead" to the scheduler
21:54:30 Yes but there are also a lot of places where we are comparing
21:54:41 I have problems with only 70 nodes =)
21:54:50 just about any middleware "environment" ends up with some concept of system time (OMG Time Service, etc)
21:55:02 because the compute host is writing its local time to the db updated_at value, which the scheduler compares to its local time
21:55:05 So the idea is pretty simple: add methods in db and conductor
21:55:22 and use them instead of timeutils.utcnow()
21:55:25 i'm just worried about adding a whole bunch of unneeded db calls
21:55:42 They are pretty simple for DB
21:55:52 I wonder how much extra load it will be on conductor as well?
21:55:53 if done right, i don't think it'll result in more db calls at all
21:55:55 i guess we could specifically split out system_time and local_time
21:55:58 Do we know how often we ask for the time?
21:56:17 it's only a matter of when something writes a timestamp TO the db that the problem occurs
21:56:27 er, correcting myself. that's half
21:56:37 we use it for caching and things as well
21:56:42 the other half is when something reads a time from the DB and compares to local time. but again, it can ask the DB for its time
21:57:08 ah, hrm
21:57:31 there are a lot of places where this will not scale
21:57:47 making a db call every time we need to check if a cache item is valid is ridiculous
21:57:57 so we will need two versions of time
21:58:06 one where we just need a locally consistent time
21:58:19 and one where we need a globally consistent time
21:58:22 vishy +1
21:58:25 ++
21:58:40 So, what about a periodic task which checks db time against system time?
21:58:43 ++, but while we're changing everything, we should always use UTC rather than local.
21:58:48 Every ten minutes or something
21:58:54 devananda: any idea about db systems that have master/master
21:59:04 dripton: sorry i don't mean local as in timezone local
21:59:05 mikal: of what benefit is that?
21:59:12 i mean local as in local to this machine
21:59:16 still in utc time
21:59:36 vishy: when there are >1 DB at play (whether master/master or galera or ..) it's extra important for their clocks to be in sync
21:59:50 devananda: well, we could at least log that we've detected an error condition. It would help with diagnosis from an operator.
21:59:51 vishy: in my former db consulting hat, i saw lots of issues creep up from that
21:59:59 for example this: https://github.com/openstack/nova/blob/master/nova/openstack/common/memorycache.py#L62
22:00:05 should never talk to the db
22:00:32 devananda: ok so we are assuming that keeping the db clocks in sync is easier than keeping all of the nodes in sync?
22:00:49 mikal: if there is a clear separation between things that need to track distributed-time and things that need to track local/relative-time, then i don't think such a check would be meaningful
22:01:12 I'm skeptical of anything where we have to consult the db for the current time,
22:01:22 vishy: yes. there are probably a lot fewer dbs than nova nodes, and a lot more attention should be spent on them :)
22:01:32 devananda: well, I'm thinking of ways to avoid asking the db for the time all over the place
22:01:37 devananda: I buy that
22:01:54 i don't see this as "ask the db for the time"
22:01:55 devananda: i just want to make sure we aren't limiting db choices by forcing the db to handle global time for us
22:02:57 we could instead do something like use the nova-conductor time for everything, but a lot of the time comparisons are against db objects, so we do get some performance benefit by doing it concurrently with db requests
22:03:01 Given we're now over time, do we want to discuss this at the summit?
22:03:06 sure
22:03:32 is there value in having an openstack system time that has a source from "somewhere"... start there, maybe have it initialized and periodically verified from whatever source?
22:03:33 I'm still not convinced that running a robust ntp solution isn't easier
22:03:46 beagles: that is ntp
22:03:57 but yeah we are out of time
22:04:03 summit discussion!
22:04:04 vishy: :) of course it could grab it from there ;)
22:04:17 #endmeeting
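As a closing illustration of the "two versions of time" split discussed in the last topic: a cheap local clock for relative checks such as cache expiry, and a database-backed clock for timestamps that get compared across nodes (service heartbeats, updated_at). The helper names below are hypothetical, and this is only a sketch of the idea, not the use-db-time blueprint's actual design.

    import datetime

    from sqlalchemy import create_engine, func, select


    def local_utcnow():
        """Locally consistent time: fine for caches, timeouts and other relative checks."""
        return datetime.datetime.utcnow()


    def db_utcnow(engine):
        """Globally consistent time: ask the database for its clock, so every node
        that writes or compares timestamps agrees on a single source even if a
        compute host's ntpd has drifted."""
        with engine.connect() as conn:
            # func.now() renders the dialect's CURRENT_TIMESTAMP; with the db
            # configured for UTC this gives one authoritative clock per deployment.
            return conn.execute(select([func.now()])).scalar()


    if __name__ == '__main__':
        engine = create_engine('sqlite://')
        print(local_utcnow())     # e.g. memorycache-style expiry checks
        print(db_utcnow(engine))  # e.g. comparing against updated_at written by another node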