20:59:58 <vishy> #startmeeting nova
20:59:59 <openstack> Meeting started Thu Apr 4 20:59:58 2013 UTC. The chair is vishy. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:02 <openstack> The meeting name has been set to 'nova'
21:00:05 <vishy> oh hai!
21:00:08 <mikal> Doh!
21:00:10 <mikal> I wanted to do that!
21:00:17 <devananda> \o
21:00:17 <vishy> #chair mikal
21:00:19 <boris-42> HI
21:00:20 <mikal> Heh
21:00:22 <cyeoh> hi!
21:00:24 <openstack> Current chairs: mikal vishy
21:00:24 <dansmith> o/
21:00:32 <mikal> #link https://wiki.openstack.org/wiki/Meetings/Nova
21:00:34 <alaski> hi
21:00:40 <mikal> #chair russellb
21:00:41 <openstack> Warning: Nick not in channel: russellb
21:00:42 <openstack> Current chairs: mikal russellb vishy
21:00:53 <vishy> he's on vacation :)
21:01:01 <mikal> I know... russellb asked me to run this meeting because he's currently taking a vacation in an Olive Garden or something. I think his goal is to finish the salad.
21:01:09 <mikal> But I am also happy with vishy running it if he wants.
21:01:09 <sdague> heh
21:01:23 <vishy> el oh el
21:01:33 <mikal> So, I think this will be relatively short... First agenda item...
21:01:42 <mikal> #topic backport bug review
21:01:57 <mikal> So, the only bug I am aware of that we should backport is https://bugs.launchpad.net/nova/+bug/1163660
21:01:58 <uvirtbot> Launchpad bug 1163660 in nova "nova-manage uses nova.db.fixed_ip_get_all_by_instance_host, which no longer exists" [Medium,In progress]
21:02:00 <mikal> Any others?
21:02:13 <mikal> Hilariously, that's been broken for over a year.
21:02:23 <vishy> there are more
21:02:48 <vishy> note the critical: https://bugs.launchpad.net/nova/+bugs?field.tag=grizzly-backport-potential
21:02:50 <vishy> and the high
21:03:06 <vishy> the nova-manage fixed list i think isn't too bad
21:03:15 <mikal> Oh, it's not as bad as I first thought it was
21:03:17 <vishy> but not being able to use memcached is pretty bad
21:03:19 <mikal> It's embarrassing though
21:03:32 * mikal is waiting for LP to load that page
21:03:49 <dansmith> that's a lot of things to backport
21:03:57 <dansmith> how does that compare to folsom?
21:04:32 <mikal> Half are "fix committed" at least
21:05:02 <mikal> dansmith: I'm not sure, I don't remember that far back
21:05:05 <ttx> we always had embarrassing bugs showing their heads on release +1h
21:05:10 <krtaylor> yeah, there isn't a folsom-backport-potential
21:05:34 <mikal> So, it seems we should at least try and get them all assigned to people who are actually working on them...
21:05:53 <dansmith> so, I guess we don't have one for the system_metadata concern,
21:06:04 <dansmith> but is anyone else in a panic over that?
21:06:17 <mikal> Want to provide more detail?
21:06:28 <vishy> there is also this: https://review.openstack.org/#/c/26109/
21:06:49 <vishy> folsom has had many hundreds of backports
21:07:05 <dansmith> mikal: apparently comstud is severely concerned about performance related to the system_metadata "explosion"
21:07:12 <dansmith> vishy: ah, good to know, I've never looked
21:07:19 <mikal> dansmith: ahhh yeah, that blew up yesterday
21:07:32 <dansmith> mikal: yeah :/
21:07:32 <mikal> dansmith: this review looks like it fixes that?
21:07:56 <dansmith> mikal: this review?
21:08:08 <dansmith> mikal: you mean 26109? no.. :)
21:08:29 <dansmith> mikal: it relieves one pressure point that was easy :)
21:08:40 <mikal> comstud: you around?
21:08:50 <dansmith> he's on a plane
21:08:51 <mikal> dansmith: hmmm, ok
21:09:12 <mikal> Ok, so it sounds like that needs more work then
21:09:16 <dansmith> we've got a few plans for mitigation, but yeah
21:09:33 <vishy> yeah joining all of the metadata at once is bad
21:09:34 <mikal> There are also three bugs in that grizzly-backport-potential list with no assignee
21:09:50 <vishy> honestly we should have stuck all of the flavor related metadata into one field
21:10:07 <vishy> then we could have joined it all at once
21:10:11 <vishy> sigh
21:10:33 <dansmith> vishy: could still do that, but aren't those columns limited to 255 chars?
21:10:47 <vishy> dansmith: yeah we'd have to migrate them to text
21:10:52 <dansmith> vishy: yeah
21:11:00 <mikal> vishy: I wonder how big those tables are in deploys...
21:11:07 <dansmith> vishy: well, that's doable since everything is calling the instance_type_extract() to get at it
21:11:44 <vishy> we could make it join on system_metadata where key = instance_type
21:11:45 <dansmith> so, that helps because we have fewer things in system_metadata for each instance,
21:11:53 <dansmith> but we're still going to be joining the whole thing right?
21:12:04 <dansmith> oh, we can do that?
21:12:13 <vishy> but then we have to figure out where all of the other system metadata is being used
21:12:21 <vishy> and explicitly join it when we need it
21:12:23 <dansmith> yeah
21:12:26 <mikal> dansmith: so, this sure sounds like it should have a bug filed if one doesn't already exist
21:12:45 <dansmith> mikal: yeah, I don't know that comstud doesn't have one, he's usually quick on that
21:13:29 <dansmith> looks like not though
21:13:31 * mikal asks his coworkers
21:14:14 <dansmith> so, one thought we had,
21:14:24 <dansmith> was to avoid the join entirely,
21:14:42 <dansmith> and just do another query for the system_metadata items we care about
21:14:57 <dansmith> which we can do in bulk after we get all the instances that we're going after in a given query
21:15:16 <dansmith> the thought being that two trips to the db is better than the massive join
21:15:37 <dansmith> so if that makes a significant improvement, that might be better than trying to do something more invasive
21:15:59 <vishy> dansmith: i'm not convinced that will be faster but it is worth a shot
21:16:15 <devananda> dansmith: if that means only pulling the metadata a) when it's actually needed, and b) which is desired, not all of it, then I think it'll be big
21:16:15 <dansmith> vishy: yeah
21:16:30 <mikal> Isn't that table lacking indexes to make that not suck?
21:16:37 <dansmith> devananda: unfortunately, it doesn't mean either of those things necessarily
21:16:55 <dansmith> mikal: no, apparently the indexes are the only reason it doesn't kill kittens already
21:17:24 <mikal> I can see an index on instance_uuid, but not key?
21:17:36 <dansmith> mikal: we don't need a key, right?
21:17:46 <devananda> probably not, since afaik nothing currently fetches only a single key
21:17:46 <dansmith> mikal: since we're joining only on uuid and taking everything
21:17:51 <dansmith> right
21:17:54 <devananda> we will want an index though :)
21:17:57 <mikal> dansmith: well, if you're going to cherry pick out specific keys to avoid a full join...
21:18:15 <dansmith> mikal: that's not what we're talking about in this step
21:18:17 <vishy> devananda: is there some way to optimize the join so that we don't get quite so many rows back?
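
(For context on the join under discussion: the instance lookups eager-load system_metadata, so the result set repeats every instance column once per metadata row. Below is a minimal SQLAlchemy sketch of that shape, using toy models and a hypothetical query rather than the actual Nova schema or code.)

    # Illustrative only: trimmed-down stand-ins for Nova's instances and
    # instance_system_metadata tables, not the real schema or query code.
    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'
        uuid = Column(String(36), primary_key=True)
        host = Column(String(255))
        vm_state = Column(String(255))
        system_metadata = relationship('InstanceSystemMetadata')

    class InstanceSystemMetadata(Base):
        __tablename__ = 'instance_system_metadata'
        id = Column(Integer, primary_key=True)
        instance_uuid = Column(String(36), ForeignKey('instances.uuid'), index=True)
        key = Column(String(255))
        value = Column(String(255))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # The eager join: every instance row comes back once per system_metadata
    # row, so 10,000 instances with a dozen keys each means roughly 120,000
    # result rows, each carrying a full copy of the instance columns.
    instances = (session.query(Instance)
                 .options(joinedload(Instance.system_metadata))
                 .filter(Instance.host == 'compute1')
                 .all())
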
21:18:31 <devananda> so the eventual query would be WHERE (uuid=X and key in(...)) OR (uuid=Y and key in (...)) ...
21:18:43 <vishy> for example with 24 metadata items we get 24 copies of each instance right?
21:18:53 <dansmith> right, but we need all of those,
21:18:56 <devananda> hrmm
21:19:00 <dansmith> since we don't know if the caller is going to use system_metadata or not
21:19:13 <vishy> well we need all of the rows, but we don't need all of the instance data in every row
21:19:17 <devananda> vishy: i hadn't considered the code was doing that
21:19:22 <vishy> not sure if it would make any difference
21:19:32 <devananda> but you may be right. that'll be a huge waste of network traffic but not necessarily make the join any slower
21:19:43 <dansmith> so, comstud said that it wasn't pegging cpu or anything, just generating a crapton of network traffic,
21:19:59 <dansmith> so I think that the problem _is_ the extra data, I just can't see how a join would not generate the full result on each row
21:20:00 <devananda> if the caller needs 4 pieces of metadata but fetches 20, that slows down the query cause the db has to do 5x more work
21:20:25 <mikal> The resource tracker does know what metadata items it needs, right?
21:20:33 <mikal> So it could provide that as a hint to the lookup
21:20:40 <devananda> if, otoh, the db returned only those 4 pieces of metadata -- but each ROW also had an identical copy of the instance data -- then it slows down the network and, possibly, python -- but not the DB
21:20:55 <dansmith> mikal: in a few cases we could micro-optimize, yeah,
21:21:11 <mikal> dansmith: well, the resource tracker is what is punching comstud in the face
21:21:12 <dansmith> mikal: but it's hard for things like compute manager where it's going to pass the instance to a bunch of random things
21:21:15 <mikal> dansmith: it's the painful case
21:21:42 <dansmith> mikal: no, I think that the other patch made that better.. he said the periodic task was the killer now, I thought
21:22:02 <mikal> dansmith: ahhh, ok
21:22:08 <vishy> i think the right solution here is to convert metadata and system_metadata into a json blob
21:22:23 <vishy> unless there are places we are filtering on it
21:22:27 <mikal> vishy: do you see that being backported to grizzly?
21:22:30 <dansmith> what's the impact of the text column?
21:22:49 <vishy> in which case we should just convert parts of metadata (like flavor info) to blobs
21:22:53 <dansmith> vishy: that's a helluva migration too, right?
21:22:55 <devananda> just to take a step back, where exactly is the performance problem right now?
21:22:58 <vishy> dansmith: negligible afaik
21:23:18 <dansmith> devananda: network traffic out of the db server, I believe
21:23:22 <vishy> devananda: when you have large numbers of instances the joins make the data transfer crazy
21:23:22 <dansmith> devananda: we really should have comstud here
21:23:35 <devananda> moving from a k/v table to using a json blob (one row per uuid) is a dramatic change on db behavior
21:23:38 <vishy> like 10000 rows turns into 120000
21:23:57 <devananda> vishy: ok. so in that case, is the issue network traffic or DB time for the JOIN ?
21:24:11 <mikal> devananda: I believe comstud's problem was network traffic
21:24:14 <vishy> network traffic + unpacking time
21:24:16 <dansmith> comstud made it sound like traffic
21:24:17 <dansmith> yeah
21:24:19 <devananda> has someone separated those by timing the first packet and the last packet of mysql's response?
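
(A rough sketch of the "two trips to the db" mitigation dansmith describes above: fetch the instances without the join, then pull all of their system_metadata in one bulk IN query and stitch it back on in Python. It reuses the toy models from the previous sketch and only assumes the shape of the approach; it is not the contents of review 26136.)

    # Sketch of the two-query idea, using the toy models above.
    from collections import defaultdict

    def instance_get_all_by_host(session, host):
        # First trip: plain instance rows, no join, one row per instance.
        instances = session.query(Instance).filter(Instance.host == host).all()

        # Second trip: all system_metadata for those instances in one bulk query.
        uuids = [inst.uuid for inst in instances]
        sysmeta = defaultdict(dict)
        if uuids:
            rows = (session.query(InstanceSystemMetadata)
                    .filter(InstanceSystemMetadata.instance_uuid.in_(uuids))
                    .all())
            for row in rows:
                sysmeta[row.instance_uuid][row.key] = row.value

        # Stitch the metadata back on as plain dicts so callers see the same
        # shape as the joined version.  Converting the SQLA object to a dict
        # is the "doesn't want me fscking with its field" wrinkle that comes
        # up later in the meeting.
        results = []
        for inst in instances:
            inst_dict = {k: v for k, v in inst.__dict__.items()
                         if not k.startswith('_')}
            inst_dict['system_metadata'] = dict(sysmeta[inst.uuid])
            results.append(inst_dict)
        return results

The trade-off, as debated above, is a second round trip in exchange for never shipping a full copy of the instance columns per metadata row.
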
21:24:25 <devananda> ok
21:24:31 <vishy> sqlalchemy processing 120000 rows is pretty damn slow as well
21:24:36 <devananda> so then change the JOIN to a subquery + CONCAT
21:24:46 <devananda> or use an in-memory pivot table
21:24:52 <devananda> to return only 1 row per instance
21:25:04 <vishy> devananda: do you have any idea how to do either of those in sqlalchemy?
21:25:12 <devananda> from python's POV, it has the same effect as going to json but doesn't require any db migration
21:25:22 <devananda> vishy: no clue in sqla ...
21:25:30 <vishy> devananda: that would be sweet
21:25:31 <devananda> boris-42: any thoughts on ^ ?
21:25:33 <dansmith> that sounds nice (no migration)
21:25:52 * dansmith pretends he understands what devananda said
21:26:30 <vishy> basically we want to add an extra field to instances that does a subquery on metadata items and sticks them all into a single field.
21:26:40 <devananda> dansmith: http://stackoverflow.com/questions/4555988/sql-server-pivot-on-key-value-table (first hit on google, haven't read it...)
21:26:59 <devananda> but the question there shows an example of our problem
21:27:00 <vishy> devananda: actually we could just add a prefix to the fields right?
21:27:03 <devananda> and what a pivot does
21:27:11 <devananda> vishy: i'm not sure why we'd do that
21:27:19 <vishy> metadata_key1, metadata_key2 etc.
21:27:26 <dansmith> devananda: that looks excellent :)
21:27:38 <devananda> vishy: ah. yes
21:28:03 <devananda> vishy: all columns in table 1 + (prefix+key) for all columns in table 2
21:28:04 <dansmith> we already have a prefix on the type keys, do we need more?
21:28:58 <devananda> vishy: another approach is to use GROUP_CONCAT to return all the key/value pairs in a single text field
21:29:08 <devananda> eg, "key1:val1,key2:val2"
21:29:23 <vishy> devananda: so you concat them with a , and hope none of the vals have a , in them?
21:29:25 <devananda> as long as we have separators which are guaranteed not to be present in either key or value ;)
21:29:34 <vishy> yeah that's the part i'm worried about :)
21:29:46 <dansmith> devananda: so is this going to be significantly better than two queries?
21:29:58 <dansmith> I mean, a little more latency, but is it actually faster for some reason?
21:30:10 <vishy> https://github.com/tomasd/pandas-sqlalchemy-pivot
21:30:15 <dansmith> because I haven't seen anyone speak in SQLA terms yet
21:30:45 <devananda> dansmith: probably not faster, no.
21:31:05 <dansmith> okay
21:31:30 <dansmith> vishy: do I take the README on that project to mean that SQLA can't do this itself?
21:31:51 <vishy> dansmith: so you are saying that we convert all of the instance gets to do a separate query for metadata and shove it in?
21:32:11 <devananda> db still has to do the same work to get the entries from system_metadata. returning (uuid,key,val) 10M times or returning (uuid, other fields, concat of keys and values) 100k times... the total byte count will be similar.
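
(A sketch of the GROUP_CONCAT variant devananda suggests: collapse each instance's metadata into one text column in a subquery so the outer join returns a single row per instance. Purely illustrative, again against the toy models above, and it inherits the separator caveat vishy raises, since the delimiter must never appear in a key or value.)

    # One row per instance: aggregate the key/value pairs in a subquery,
    # then join that single text column onto the instance query.
    from sqlalchemy import func

    sysmeta_concat = (
        session.query(
            InstanceSystemMetadata.instance_uuid.label('instance_uuid'),
            func.group_concat(
                InstanceSystemMetadata.key + ':' + InstanceSystemMetadata.value
            ).label('sysmeta_blob'))
        .group_by(InstanceSystemMetadata.instance_uuid)
        .subquery())

    rows = (session.query(Instance, sysmeta_concat.c.sysmeta_blob)
            .outerjoin(sysmeta_concat,
                       Instance.uuid == sysmeta_concat.c.instance_uuid)
            .filter(Instance.host == 'compute1')
            .all())

    for inst, blob in rows:
        # blob looks like "key1:val1,key2:val2" (or None when there is no
        # metadata); unpacking it back into a dict happens in Python.
        sysmeta = dict(item.split(':', 1) for item in blob.split(',')) if blob else {}

GROUP_CONCAT is MySQL/SQLite syntax; PostgreSQL spells it string_agg, so a portable version would need per-dialect handling.
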
21:32:18 <dansmith> vishy: that's what we want to try, yeah
21:32:29 <vishy> dansmith: i guess that avoids copying the extra data from each row
21:32:35 <dansmith> vishy: the only problem I have right now is that the Instance SQLA object doesn't want me fscking with its field,
21:32:49 <dansmith> so I have to do some dirty work to convert it to a dict
21:33:01 <dansmith> which is ugly, and maybe a showstopper, but it works for testing at least
21:33:04 <vishy> dansmith: we already do some similar mucking
21:33:10 <vishy> for aggregates
21:33:12 <dansmith> okay then :)
21:33:15 <vishy> iirc
21:33:21 <dansmith> https://review.openstack.org/#/c/26136/
21:33:47 <dansmith> that's just a quick hack to test with, but it appears to pass tests
21:34:05 <vishy> dansmith: dict(inst.iteritems())
21:34:07 <vishy> btw
21:34:11 <dansmith> ah, okay
21:34:53 <vishy> let's get comstud to test that when he gets off the plane
21:34:58 <vishy> i think we need a bug report as well
21:35:05 <dansmith> vishy: yeah
21:36:12 <dansmith> well, sorry for dominating the meeting,
21:36:16 <mikal> Heh
21:36:16 <dansmith> but excellent discussion, IMHO :)
21:36:19 <vishy> no it is important
21:36:23 <mikal> It was worthwhile
21:36:25 <vishy> i will mark the bug critical
21:36:33 <vishy> and we can get it into stable/grizzly next week
21:36:33 <mikal> Did we actually find a bug?
21:36:46 <dansmith> mikal: I didn't see one
21:36:48 <mikal> (in LP I mean)
21:36:55 <vishy> we have enough bugs to warrant a quick grizzly.1 i think
21:37:04 <vishy> mikal: can someone report it?
21:37:09 <dansmith> I can
21:37:13 <mikal> Ok cool
21:37:23 <dansmith> (I have that ability)
21:37:41 <mikal> Ok, so what about https://bugs.launchpad.net/nova/+bug/1164072 -- it's on the backport list but doesn't have an assignee
21:37:42 <uvirtbot> Launchpad bug 1164072 in nova "libvirt livemigration warning logger refernces a not defined local varaible" [Medium,Triaged]
21:38:07 <mikal> Stupid locals()
21:39:11 <dansmith> https://bugs.launchpad.net/nova/+bug/1164737
21:39:13 <uvirtbot> Launchpad bug 1164737 in nova "Instance joins with system_metadata are critically slow" [Critical,Confirmed]
21:39:35 <mikal> Actually, I'll take 1164072
21:39:38 <boris-42> Btw I know it is probably off topic, but could we speak a little bit about this bp https://blueprints.launchpad.net/nova/+spec/use-db-time
21:39:41 <mikal> There are a few things wrong with that log message.
21:39:47 <boris-42> I have some questions..
21:39:55 <mikal> boris-42: can we do that in "open discussion"?
21:40:06 <vishy> yeah that one is super easy :)
21:40:21 <mikal> The other unassigned one is https://bugs.launchpad.net/nova/+bug/1160442
21:40:22 <uvirtbot> Launchpad bug 1160442 in quantum "when boot many vms with quantum, nova sometimes allocates two quantum ports rather than one" [Undecided,Incomplete]
21:40:36 <boris-42> mikal ok
21:40:54 <devananda> mikal: is that a race condition in db? (just guessing, still reading)
21:41:11 <mikal> devananda: no idea. I'm trying to trick someone into digging into it...
21:41:20 * devananda hides
21:42:09 <mikal> No takers?
21:43:03 <mikal> Fair enough
21:43:03 <vishy> mikal: arosen might be good for that one
21:43:20 <mikal> vishy: ok, I will ping him
21:43:23 <mikal> Unless he's here...
21:43:26 <vishy> i just did in nova
21:43:33 <mikal> Cool
21:43:47 <mikal> Is there anything else we should discuss backport wise?
21:44:14 <vishy> mikal: I'm going to discuss doing a quick .1 with the stable-backports folks
21:44:51 <mikal> vishy: ok. Do you want to issue a deadline for people to try and get their fixes in to make that release?
21:45:18 <vishy> if we do it it will probably just be a quick version of all the really important ones
21:45:23 <vishy> high + critical
21:45:34 <mikal> Ok, cool
21:45:38 <vishy> but i need to discuss with markmc + other stable team to be sure
21:45:45 <mikal> Fair enough
21:45:55 <mikal> We're ok to move on to the next topic?
21:45:57 <vishy> it would be great if everyone could help stay on top of the bug queue
21:46:04 <vishy> i knocked out about 15 today
21:46:09 <vishy> but there are still a few left
21:46:21 <vishy> addressing bugs quickly right after release is really important
21:46:29 <mikal> Yeah, I see 26 bugs needing triage as well
21:46:31 <vishy> because a lot of people don't try the rcs
21:46:47 <vishy> mikal: refresh, it's only 12
21:46:48 <vishy> :)
21:47:09 <mikal> Ahhh, I think we're counting differently. I'm including incomplete bugs as untriaged.
21:47:17 <vishy> oh gotcha
21:47:30 <vishy> i set bugs back to incomplete if i need more info from the reporter
21:47:46 <mikal> Yeah, that's fine. I think that's you doing the right thing.
21:47:52 <mikal> I do the same
21:48:08 <mikal> But in my mind it's still a bug we need to categorize, even if we're waiting for someone
21:48:15 <mikal> Otherwise if they never reply it sits around forever
21:48:49 <mikal> We really should move on...
21:48:53 <mikal> #topic design summit
21:49:10 <mikal> IIRC russellb has now let people know about session acceptance
21:49:44 <mikal> Hmmm, there are still a fair few marked as "unreviewed" though
21:50:27 <mikal> Does anyone have anything they want to say about the summit?
21:51:13 <mikal> Moving on then
21:51:15 <mikal> #topic open discussion
21:51:21 <mikal> boris-42: now is your time to shine...
21:51:37 <boris-42> mikal thanks
21:52:10 <mikal> So, reading that blueprint...
21:52:17 <mikal> timeutils.utcnow() is super useful for unit testing
21:52:17 <boris-42> So we discussed that using local time on nodes is a bad idea..
21:52:28 <mikal> I'd expect a bunch of unit tests to need work if we removed it and used DB time
21:52:46 <devananda> indeed
21:52:52 <vishy> boris-42: that seems like a huge amount of work for very little benefit
21:53:09 <boris-42> vishy it only seems that way =)
21:53:47 <devananda> it's a pretty common problem in distributed systems
21:53:52 <vishy> boris-42: it seems like a lot of places we use utcnow we are not actually needing to compare against db objects
21:54:01 <devananda> if the clock gets out of sync somewhere, all kinds of issues crop up
21:54:28 <beagles> devananda: +1
21:54:30 <devananda> eg, if one compute host's ntpd dies and it drifts, pretty soon it may appear "dead" to the scheduler
21:54:30 <boris-42> Yes but there are also a lot of places where we are comparing
21:54:41 <boris-42> I have problems with only 70 nodes =)
21:54:50 <beagles> just about any middleware "environment" ends up with some concept of system time (OMG Time Service, etc)
21:55:02 <devananda> because the compute host is writing its localtime to the db updated_at value, which the scheduler compares to its localtime
21:55:05 <boris-42> So the idea is pretty simple to add methods in db and conductor
21:55:22 <boris-42> and use it instead of timeutils.utcnow()
21:55:25 <vishy> i'm just worried about adding a whole bunch of unneeded db calls
21:55:42 <boris-42> They are pretty simple for DB
21:55:52 <mikal> I wonder how much extra load it will be on conductor as well?
21:55:53 <devananda> if done right, i don't think it'll result in more db calls at all
21:55:55 <vishy> i guess we could specifically split out system_time and local_time
21:55:58 <mikal> Do we know how often we ask for the time?
21:56:17 <devananda> it's only a matter of when something writes a timestamp TO the db that the problem occurs
21:56:27 <devananda> er, correcting myself. that's half
21:56:37 <vishy> we use it for caching and things as well
21:56:42 <devananda> the other half is when something reads a time from the DB and compares to local time. but again, it can ask the DB for its time
21:57:08 <devananda> ah, hrm
21:57:31 <vishy> there are a lot of places that this will not scale
21:57:47 <vishy> making a db call every time we need to check if a cache item is valid is ridiculous
21:57:57 <vishy> so we will need two versions of time
21:58:06 <vishy> one where we just need a locally consistent time
21:58:19 <vishy> and one where we need a globally consistent time
21:58:22 <boris-42> vishy +1
21:58:25 <devananda> ++
21:58:40 <mikal> So, what about a periodic task which checks db time against system time?
21:58:43 <dripton> ++, but while we're changing everything, we should always use UTC rather than local.
21:58:48 <mikal> Every ten minutes or something
21:58:54 <vishy> devananda: any idea about db systems that have master/master
21:59:04 <vishy> dripton: sorry i don't mean local as in timezone local
21:59:05 <devananda> mikal: of what benefit is that?
21:59:12 <vishy> i mean local as in local to this machine
21:59:16 <vishy> still in utc time
21:59:36 <devananda> vishy: when there are >1 DB at play (whether master/master or galera or ..) it's extra important for their clocks to be in sync
21:59:50 <mikal> devananda: well, we could at least log that we've detected an error condition. It would help with diagnosis from an operator.
21:59:51 <devananda> vishy: in my former db consulting hat, i saw lots of issues creep up from that
21:59:59 <vishy> for example this: https://github.com/openstack/nova/blob/master/nova/openstack/common/memorycache.py#L62
22:00:05 <vishy> should never talk to the db
22:00:32 <vishy> devananda: ok so we are assuming that keeping the db clocks in sync is easier than keeping all of the nodes in sync?
22:00:49 <devananda> mikal: if there is a clear separation between things that need to track distributed-time and things that need to track local/relative-time, then i don't think such a check would be meaningful
22:01:12 <dansmith> I'm skeptical of anything where we have to consult the db for the current time,
22:01:22 <devananda> vishy: yes. there are probably a lot fewer dbs than nova nodes, and a lot more attention should be spent on them :)
22:01:32 <mikal> devananda: well, I'm thinking of ways to avoid asking the db for the time all over the place
22:01:37 <vishy> devananda: I buy that
22:01:54 <devananda> i don't see this as "ask the db for the time"
22:01:55 <vishy> devananda: i just want to make sure we aren't limiting db choices by forcing the db to handle global time for us
22:02:57 <vishy> we could instead do something like use the nova-conductor time for everything but a lot of the time comparisons are against db objects so we do get some performance benefits by doing it concurrently with db requests
22:03:01 <mikal> Given we're now over time, do we want to discuss this at the summit?
22:03:06 <vishy> sure
22:03:32 <beagles> is there value in having an openstack system time that has a source from "somewhere"... start there, maybe have it initialized and periodically verified from whatever source?
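
(A minimal sketch of the "two versions of time" split vishy describes above: purely local checks such as cache expiry keep using the process clock, while anything compared against DB-written timestamps asks the database for its clock instead. The helper names are hypothetical and the DB call is MySQL-flavoured; this is not the use-db-time blueprint's actual API.)

    # Hypothetical helpers, not the blueprint's real interface.
    import datetime

    from sqlalchemy import text

    def local_now():
        """Locally consistent time: fine for anything that never leaves this
        process, e.g. the in-memory cache timeout check linked above, where
        a DB round trip per lookup would be absurd."""
        return datetime.datetime.utcnow()

    def global_now(session):
        """Globally consistent time: ask the database for its clock so values
        compared against DB-written timestamps (service heartbeats,
        updated_at) don't depend on each node's local clock or ntpd."""
        # UTC_TIMESTAMP() is MySQL; other backends need their own expression.
        return session.execute(text("SELECT UTC_TIMESTAMP()")).scalar()

The conductor variant boris-42 mentions would presumably just proxy something like global_now() for services without direct DB access.
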
22:03:33 <vishy> I'm still not convinced that running a robust ntp solution isn't easier
22:03:46 <vishy> beagles: that is ntp
22:03:57 <vishy> but yeah we are out of time
22:04:03 <vishy> summit discussion!
22:04:04 <beagles> vishy: :) of course it could grab it from there ;)
22:04:17 <vishy> #endmeeting