20:59:58 <vishy> #startmeeting nova
20:59:59 <openstack> Meeting started Thu Apr  4 20:59:58 2013 UTC.  The chair is vishy. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:02 <openstack> The meeting name has been set to 'nova'
21:00:05 <vishy> oh hai!
21:00:08 <mikal> Doh!
21:00:10 <mikal> I wanted to do that!
21:00:17 <devananda> \o
21:00:17 <vishy> #chair mikal
21:00:19 <boris-42> HI
21:00:20 <mikal> Heh
21:00:22 <cyeoh> hi!
21:00:24 <openstack> Current chairs: mikal vishy
21:00:24 <dansmith> o/
21:00:32 <mikal> #link https://wiki.openstack.org/wiki/Meetings/Nova
21:00:34 <alaski> hi
21:00:40 <mikal> #chair russellb
21:00:41 <openstack> Warning: Nick not in channel: russellb
21:00:42 <openstack> Current chairs: mikal russellb vishy
21:00:53 <vishy> he's on vacation :)
21:01:01 <mikal> I know... russellb asked me to run this meeting because he's currently taking a vacation in an Olive Garden or something. I think his goal is to finish the salad.
21:01:09 <mikal> But I am also happy with vishy running it if he wants.
21:01:09 <sdague> heh
21:01:23 <vishy> el oh el
21:01:33 <mikal> So, I think this will be relatively short... First agenda item...
21:01:42 <mikal> #topic backport bug review
21:01:57 <mikal> So, the only bug I am aware of that we should backport is https://bugs.launchpad.net/nova/+bug/1163660
21:01:58 <uvirtbot> Launchpad bug 1163660 in nova "nova-manage uses nova.db.fixed_ip_get_all_by_instance_host, which no longer exists" [Medium,In progress]
21:02:00 <mikal> Any others?
21:02:13 <mikal> Hilariously, that's been broken for over a year.
21:02:23 <vishy> there are more
21:02:48 <vishy> note the critical: https://bugs.launchpad.net/nova/+bugs?field.tag=grizzly-backport-potential
21:02:50 <vishy> and the high
21:03:06 <vishy> the nova-manage fixed list i think isn't too bad
21:03:15 <mikal> Oh, it's not as bad as I first thought it was
21:03:17 <vishy> but not being able to use memcached is pretty bad
21:03:19 <mikal> It's embarrassing though
21:03:32 * mikal is waiting for LP to load that page
21:03:49 <dansmith> that's a lot of things to backport
21:03:57 <dansmith> how does that compare to folsom?
21:04:32 <mikal> Half are "fix committed" at least
21:05:02 <mikal> dansmith: I'm not sure, I don't remember that far back
21:05:05 <ttx> we always had embarrassing bugs showing their heads on release +1h
21:05:10 <krtaylor> yeah, there isn't a folsom-backport-potential
21:05:34 <mikal> So, it seems we should at least try and get them all assigned to people who are actually working on them...
21:05:53 <dansmith> so, I guess we don't have one for the system_metadata concern,
21:06:04 <dansmith> but is anyone else in a panic over that?
21:06:17 <mikal> Want to provide more detail?
21:06:28 <vishy> there is also this: https://review.openstack.org/#/c/26109/
21:06:49 <vishy> folsom has had many hundreds of backports
21:07:05 <dansmith> mikal: apparently comstud is severely concerned about performance related to the system_metadata "explosion"
21:07:12 <dansmith> vishy: ah, good to know, I've never looked
21:07:19 <mikal> dansmith: ahhh yeah, that blew up yesterday
21:07:32 <dansmith> mikal: yeah :/
21:07:32 <mikal> dansmith: this review looks like it fixes that?
21:07:56 <dansmith> mikal: this review?
21:08:08 <dansmith> mikal: you mean 26109? no.. :)
21:08:29 <dansmith> mikal: it relieves one pressure point that was easy :)
21:08:40 <mikal> comstud: you around?
21:08:50 <dansmith> he's on a plane
21:08:51 <mikal> dansmith: hmmm, ok
21:09:12 <mikal> Ok, so it sounds like that needs more work then
21:09:16 <dansmith> we've got a few plans for mitigation, but yeah
21:09:33 <vishy> yeah joining all of the metadata at once is bad
21:09:34 <mikal> There are also three bugs in that grizzly-backport-potential list with no assignee
21:09:50 <vishy> honestly we should have stuck all of the flavor related metadata into one field
21:10:07 <vishy> then we could have joined it all at once
21:10:11 <vishy> sigh
21:10:33 <dansmith> vishy: could still do that, but aren't those columns limited to 255 chars?
21:10:47 <vishy> dansmith: yeah we'd have to migrate them to text
21:10:52 <dansmith> vishy: yeah
21:11:00 <mikal> vishy: I wonder how big those tables are in deploys...
21:11:07 <dansmith> vishy: well, that's doable since everything is calling the instance_type_extract() to get at it
21:11:44 <vishy> we could make it join on system_metadata where key = instance_type
21:11:45 <dansmith> so, that helps because we have fewer things in system_metadata for each instance,
21:11:53 <dansmith> but we're still going to be joining the whole thing right?
21:12:04 <dansmith> oh, we can do that?
21:12:13 <vishy> but then we have to figure out where all of the other system metadata is being used
21:12:21 <vishy> and explicitly join it when we need it
21:12:23 <dansmith> yeah
21:12:26 <mikal> dansmith: so, this sure sounds like it should have a bug filed if one doesn't already exist
21:12:45 <dansmith> mikal: yeah, I don't know that comstud doesn't have one, he's usually quick on that
21:13:29 <dansmith> looks like not though
21:13:31 * mikal asks his coworkers
21:14:14 <dansmith> so, one thought we had,
21:14:24 <dansmith> was to avoid the join entirely,
21:14:42 <dansmith> and just do another query for the system_metadata items we care about
21:14:57 <dansmith> which we can do in bulk after we get all the instances that we're going after in a given query
21:15:16 <dansmith> the thought being that two trips to the db is better than the massive join
21:15:37 <dansmith> so if that makes a significant improvement, that might be better than trying to do something more invasive
21:15:59 <vishy> dansmith: i'm not convinced that will be faster but it is worth a shot
21:16:15 <devananda> dansmith: if that means only pulling the metadata a) when it's actually needed, and b) which is desired, not all of it, then I think it'll be big
21:16:15 <dansmith> vishy: yeah
21:16:30 <mikal> Isn't that table lacking indexes to make that not suck?
21:16:37 <dansmith> devananda: unfortunately, it doesn't mean either of those things necessarily
21:16:55 <dansmith> mikal: no, apparently the indexes are the only reason it doesn't kill kittens already
21:17:24 <mikal> I can see an index on instance_uuid, but not key?
21:17:36 <dansmith> mikal: we don't need a key, right?
21:17:46 <devananda> probably not, since afaik nothing currently fetches only a single key
21:17:46 <dansmith> mikal: since we're joining only on uuid and taking everything
21:17:51 <dansmith> right
21:17:54 <devananda> we will want an index though :)
21:17:57 <mikal> dansmith: well, if you're going to cherry pick out specific keys to avoid a full join...
21:18:15 <dansmith> mikal: that's not what we're talking about in this step
21:18:17 <vishy> devananda: is there some way to optimize the join so that we don't get quite so many rows back?
21:18:31 <devananda> so the eventual query would be WHERE (uuid=X and key in(...)) OR (uuid=Y and key in (...)) ...
21:18:43 <vishy> for example with 24 metadata items we get 24 copies of each instance right?
21:18:53 <dansmith> right, but we need all of those,
21:18:56 <devananda> hrmm
21:19:00 <dansmith> since we don't know if the caller is going to use system_metadata or not
21:19:13 <vishy> well we need all of the rows, but we don't need all of the instance data in every row
21:19:17 <devananda> vishy: i hadn't considered the code was doing that
21:19:22 <vishy> not sure if it would make any difference
21:19:32 <devananda> but you may be right. that'll be a huge waste of network traffic but not necessarily make the join any slower
21:19:43 <dansmith> so, comstud said that it wasn't pegging cpu or anything, just generating a crapton of network traffic,
21:19:59 <dansmith> so I think that the problem _is_ the extra data, I just can't see how a join would not generate the full result on each row
21:20:00 <devananda> if the caller needs 4 pieces of metadata but fetches 20, that slows down the query cause the db has to do 5x more work
21:20:25 <mikal> The resource tracker does know what metadata items it needs, right?
21:20:33 <mikal> So it could provide that as a hint to the lookup
21:20:40 <devananda> if, otoh, the db returned only those 4 pieces of metadata -- but each ROW also had an identical copy of the instance data -- then it slows down the network and, possibly, python -- but not the DB
21:20:55 <dansmith> mikal: in a few cases we could micro-optimize, yeah,
21:21:11 <mikal> dansmith: well, the resource tracker is what is punching comstud in the face
21:21:12 <dansmith> mikal: but it's hard for things like compute manager where it's going to pass the instance to a bunch of random things
21:21:15 <mikal> dansmith: it's the painful case
21:21:42 <dansmith> mikal: no, I think that the other patch made that better.. he said the periodic task was the killer now, I thought
21:22:02 <mikal> dansmith: ahhh, ok
21:22:08 <vishy> i think the right solution here is to convert metadata and system_metadata into a json blob
21:22:23 <vishy> unless there are places we are filtering on it
21:22:27 <mikal> vishy: do you see that being backported to grizzly?
21:22:30 <dansmith> what's the impact of the text column?
21:22:49 <vishy> in which case we should just convert parts of metadata (like flavor info) to blobs
21:22:53 <dansmith> vishy: that's a helluva migration too, right?
21:22:55 <devananda> just to take a step back, where exactly is the performance problem right now?
21:22:58 <vishy> dansmith: negligible afaik
21:23:18 <dansmith> devananda: network traffic out of the db server, I believe
21:23:22 <vishy> devananda: when you have large numbers of instances the joins makes the data transfer crazy
21:23:22 <dansmith> devananda: we really should have comstud here
21:23:35 <devananda> moving from a k/v table to using a json blob (one row per uuid) is a dramatic change on db behavior
21:23:38 <vishy> like 10000 rows turns into 120000
21:23:57 <devananda> vishy: ok. so in that case, is the issue network traffic or DB time for the JOIN ?
21:24:11 <mikal> devananda: I believe comstud's problem was network traffic
21:24:14 <vishy> network traffic + unpacking time
21:24:16 <dansmith> comstud made it sound like traffic
21:24:17 <dansmith> yeah
21:24:19 <devananda> has someone separated those by timing the first packet and the last packet of mysql's response?
21:24:25 <devananda> ok
21:24:31 <vishy> sqlalchemy processing 120000 rows is pretty damn slow as well
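For context, a minimal sketch of the kind of eager load being discussed (not Nova's actual query; the model name and joinedload target are assumptions based on nova/db/sqlalchemy/models.py):

    from sqlalchemy.orm import joinedload

    # Eager-loading system_metadata returns one joined row per metadata item,
    # so an instance with 24 system_metadata rows comes back 24 times, each
    # copy repeating every instance column -- 10000 instances can easily
    # become 120000 rows on the wire.
    instances = (session.query(models.Instance).
                 options(joinedload('system_metadata')).
                 all())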
21:24:36 <devananda> so then change the JOIN to a subquery + CONCAT
21:24:46 <devananda> or use an in-memory pivot table
21:24:52 <devananda> to return only 1 row per instance
21:25:04 <vishy> devananda: do you have any idea how to do either of those in sqlalchemy?
21:25:12 <devananda> from python's POV, it has the same effect as going to json but doesn't require any db migration
21:25:22 <devananda> vishy: no clue in sqla ...
21:25:30 <vishy> devananda: that would be sweet
21:25:31 <devananda> boris-42: any thoughts on ^ ?
21:25:33 <dansmith> that sounds nice (no migration)
21:25:52 * dansmith pretends he understands what devananda said
21:26:30 <vishy> basically we want to add an extra field to instances that does a subquery on metadata items and sticks them all into a single field.
21:26:40 <devananda> dansmith: http://stackoverflow.com/questions/4555988/sql-server-pivot-on-key-value-table (first hit on google, haven't read it...)
21:26:59 <devananda> but the question there shows an example of our problem
21:27:00 <vishy> devananda: actually we could just add a prefix to the fields right?
21:27:03 <devananda> and what a pivot does
21:27:11 <devananda> vishy: i'm not sure why we'd do that
21:27:19 <vishy> metadata_key1, metadata_key2 etc.
21:27:26 <dansmith> devananda: that looks excellent :)
21:27:38 <devananda> vishy: ah. yes
21:28:03 <devananda> vishy: all columns in table 1 + (prefix+key) for all columns in table 2
21:28:04 <dansmith> we already have a prefix on the type keys, do we need more?
21:28:58 <devananda> vishy: another approach is to use GROUP_CONCAT to return all the key/value pairs in a single text field
21:29:08 <devananda> eg, "key1:val1,key2:val2"
21:29:23 <vishy> devananda: so you concat them with a , and hope none of the vals have a , in them?
21:29:25 <devananda> as long as we have separators which are guaranteed not to be present in either key or value ;)
21:29:34 <vishy> yeah that's the part i'm worried about :)
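A hedged sketch of the pivot/GROUP_CONCAT idea in SQLAlchemy terms (model and column names are assumptions; the '=' and ',' separators carry exactly the collision risk raised above):

    from sqlalchemy import func

    # Aggregate each instance's metadata into one text column instead of one
    # row per item.  MySQL's GROUP_CONCAT default separator is ','; a key or
    # value containing '=' or ',' would corrupt the packed string.
    sys_meta = (session.query(
                    models.InstanceSystemMetadata.instance_uuid,
                    func.group_concat(
                        func.concat(models.InstanceSystemMetadata.key, '=',
                                    models.InstanceSystemMetadata.value)
                    ).label('sys_meta')).
                group_by(models.InstanceSystemMetadata.instance_uuid).
                subquery())

    rows = (session.query(models.Instance, sys_meta.c.sys_meta).
            outerjoin(sys_meta,
                      sys_meta.c.instance_uuid == models.Instance.uuid).
            all())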
21:29:46 <dansmith> devananda: so is this going to be significantly better than two queries?
21:29:58 <dansmith> I mean, a little more latency, but is it actually faster for some reason?
21:30:10 <vishy> https://github.com/tomasd/pandas-sqlalchemy-pivot
21:30:15 <dansmith> because I haven't seen anyone speak in SQLA terms yet
21:30:45 <devananda> dansmith: probably not faster, no.
21:31:05 <dansmith> okay
21:31:30 <dansmith> vishy: do I take the README on that project to mean that SQLA can't do this itself?
21:31:51 <vishy> dansmith: so you are saying that we convert all of the instance gets to do a separate query for metadata and shove it in?
21:32:11 <devananda> db still has to do the same work to get the entries from system_metadata. returning (uuid,key,val) 10M times or returning (uuid, other fields, concat of keys and values) 100k times... the total byte count will be similar.
21:32:18 <dansmith> vishy: that's what we want to try, yeah
21:32:29 <vishy> dansmith: i guess that avoids copying the extra data from each row
21:32:35 <dansmith> vishy: the only problem I have right now is that the Instance SQLA object doesn't want me fscking with its field,
21:32:49 <dansmith> so I have to do some dirty work to convert it to a dict
21:33:01 <dansmith> which is ugly, and maybe a showstopper, but it works for testing at least
21:33:04 <vishy> dansmith: we already do some similar mucking
21:33:10 <vishy> for aggregates
21:33:12 <dansmith> okay then :)
21:33:15 <vishy> iirc
21:33:21 <dansmith> https://review.openstack.org/#/c/26136/
21:33:47 <dansmith> that's just a quick hack to test with, but it appears to pass tests
21:34:05 <vishy> dansmith: dict(inst.iteritems())
21:34:07 <vishy> btw
21:34:11 <dansmith> ah, okay
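A rough sketch of the two-trip approach under discussion (not the actual change at https://review.openstack.org/#/c/26136/; the helper name and dict conversion are illustrative):

    def _instances_with_sys_meta(session):
        # Trip 1: instances without the join.
        instances = session.query(models.Instance).all()
        uuids = [inst.uuid for inst in instances]

        # Trip 2: every system_metadata row for those instances, in bulk.
        meta_rows = (session.query(models.InstanceSystemMetadata).
                     filter(models.InstanceSystemMetadata.
                            instance_uuid.in_(uuids)).
                     all())
        by_uuid = {}
        for row in meta_rows:
            by_uuid.setdefault(row.instance_uuid, []).append(row)

        # The SQLAlchemy object resists having its relationship field
        # replaced, hence the dict(inst.iteritems()) conversion above.
        results = []
        for inst in instances:
            inst_dict = dict(inst.iteritems())
            inst_dict['system_metadata'] = by_uuid.get(inst.uuid, [])
            results.append(inst_dict)
        return results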
21:34:53 <vishy> let's get comstud to test that when he gets off the plane
21:34:58 <vishy> i think we need a bug report as well
21:35:05 <dansmith> vishy: yeah
21:36:12 <dansmith> well, sorry for dominating the meeting,
21:36:16 <mikal> Heh
21:36:16 <dansmith> but excellent discussion, IMHO :)
21:36:19 <vishy> no it is important
21:36:23 <mikal> It was worthwhile
21:36:25 <vishy> i will mark the bug critical
21:36:33 <vishy> and we can get it into stable/grizzly next week
21:36:33 <mikal> Did we actually find a bug?
21:36:46 <dansmith> mikal: I didn't see one
21:36:48 <mikal> (in LP I mean)
21:36:55 <vishy> we have enough bugs to warrant a quick grizzly.1 i think
21:37:04 <vishy> mikal: can someone report it?
21:37:09 <dansmith> I can
21:37:13 <mikal> Ok cool
21:37:23 <dansmith> (I have that ability)
21:37:41 <mikal> Ok, so what about https://bugs.launchpad.net/nova/+bug/1164072 -- is on the backport list but doesn't have an assignee
21:37:42 <uvirtbot> Launchpad bug 1164072 in nova "libvirt livemigration warning logger refernces a not defined local varaible" [Medium,Triaged]
21:38:07 <mikal> Stupid locals()
21:39:11 <dansmith> https://bugs.launchpad.net/nova/+bug/1164737
21:39:13 <uvirtbot> Launchpad bug 1164737 in nova "Instance joins with system_metadata are critically slow" [Critical,Confirmed]
21:39:35 <mikal> Actually, I'll take 1164072
21:39:38 <boris-42> Btw I know it's probably off-topic but could we speak a little bit about this bp https://blueprints.launchpad.net/nova/+spec/use-db-time
21:39:41 <mikal> There are a few things wrong with that log message.
21:39:47 <boris-42> I have some questions..
21:39:55 <mikal> boris-42: can we do that in "open discussion"?
21:40:06 <vishy> yeah that one is super easy :)
21:40:21 <mikal> The other unassigned one is https://bugs.launchpad.net/nova/+bug/1160442
21:40:22 <uvirtbot> Launchpad bug 1160442 in quantum "when boot many vms with quantum, nova sometimes allocates two quantum ports rather than one" [Undecided,Incomplete]
21:40:36 <boris-42> mikal ok
21:40:54 <devananda> mikal: is that a race condition in db? (just guessing, still reading)
21:41:11 <mikal> devananda: no idea. I'm trying to trick someone into digging into it...
21:41:20 * devananda hides
21:42:09 <mikal> No takers?
21:43:03 <mikal> Fair enough
21:43:03 <vishy> mikal: arosen might be good for that one
21:43:20 <mikal> vishy: ok, I will ping him
21:43:23 <mikal> Unless he's here...
21:43:26 <vishy> i just did in nova
21:43:33 <mikal> Cool
21:43:47 <mikal> Is there anything else we should discuss backport wise?
21:44:14 <vishy> mikal: I'm going to discuss doing a quick .1 with the stable-backports folks
21:44:51 <mikal> vishy: ok. Do you want to issue a deadline for people to try and get their fixes in to make that release?
21:45:18 <vishy> if we do it it will probably just be a quick version of all the really important ones
21:45:23 <vishy> high + critical
21:45:34 <mikal> Ok, cool
21:45:38 <vishy> but i need to discuss with markmc + other stable team to be sure
21:45:45 <mikal> Fair enough
21:45:55 <mikal> We're ok to move on to the next topic?
21:45:57 <vishy> it would be great if everyone could help stay on top of the bug queue
21:46:04 <vishy> i knocked out about 15 today
21:46:09 <vishy> but there are still a few left
21:46:21 <vishy> addressing bugs quickly right after release is really important
21:46:29 <mikal> Yeah, I see 26 bugs needing triage as well
21:46:31 <vishy> because a lot of people don't try the rcs
21:46:47 <vishy> mikal: refresh, it's only 12
21:46:48 <vishy> :)
21:47:09 <mikal> Ahhh, I think we're counting differently. I'm including incomplete bugs as untriaged.
21:47:17 <vishy> oh gotcha
21:47:30 <vishy> i set bugs back to incomplete if i need more info from the reporter
21:47:46 <mikal> Yeah, that's fine. I think that's you doing the right thing.
21:47:52 <mikal> I do the same
21:48:08 <mikal> But in my mind it's still a bug we need to categorize, even if we're waiting for someone
21:48:15 <mikal> Otherwise if they never reply it sits around forever
21:48:49 <mikal> We really should move on...
21:48:53 <mikal> #topic design summit
21:49:10 <mikal> IIRC russellb has now let people know about session acceptance
21:49:44 <mikal> Hmmm, there are still a fair few marked as "unreviewed" though
21:50:27 <mikal> Does anyone have anything they want to say about the summit?
21:51:13 <mikal> Moving on then
21:51:15 <mikal> #topic open discussion
21:51:21 <mikal> boris-42: now is your time to shine...
21:51:37 <boris-42> mikal thanks
21:52:10 <mikal> So, reading that blueprint...
21:52:17 <mikal> timeutils.utcnow() is super useful for unit testing
21:52:17 <boris-42> So we discussed that using local time on nodes is a bad idea..
21:52:28 <mikal> I'd expect a bunch of unit tests to need work if we removed it and used DB time
21:52:46 <devananda> indeed
21:52:52 <vishy> boris-42: that seems like a huge amount of work for very little benefit
21:53:09 <boris-42> vishy: it only seems that way =)
21:53:47 <devananda> it's a pretty common problem in distributed systems
21:53:52 <vishy> boris-42: it seems like in a lot of places where we use utcnow we don't actually need to compare against db objects
21:54:01 <devananda> if the clock gets out of sync somewhere, all kinds of issues crop up
21:54:28 <beagles> devananda: +1
21:54:30 <devananda> eg, if one compute host's ntpd dies and it drifts, pretty soon it may appear "dead" to the scheduler
21:54:30 <boris-42> Yes but there is also a lot of places where we are comparing
21:54:41 <boris-42> I have problems with only 70 nodes =)
21:54:50 <beagles> just about any  middleware "environment" ends up with some concept of system time (OMG Time Service, etc)
21:55:02 <devananda> because the compute host is writing its localtime to the db updated_at value, which the scheduler compares to its own localtime
21:55:05 <boris-42> So the idea is pretty simple: add methods in db and conductor
21:55:22 <boris-42> and use it instead of timeutils.utcnow()
21:55:25 <vishy> i'm just worried about adding a whole bunch of unneeded db calls
21:55:42 <boris-42> They are pretty simple for DB
21:55:52 <mikal> I wonder how much extra load it will be on conductor as well?
21:55:53 <devananda> if done right, i don't think it'll result in more db calls at all
21:55:55 <vishy> i guess we could specifically split out system_time and local_time
21:55:58 <mikal> Do we know how often we ask for the time?
21:56:17 <devananda> it's only a matter of when something writes a timestamp TO the db that the problem occurs
21:56:27 <devananda> er, correcting myself. that's half
21:56:37 <vishy> we use it for caching and things as well
21:56:42 <devananda> the other half is when something reads a time from the DB and compares to local time. but again, it can ask the DB for its time
21:57:08 <devananda> ah, hrm
21:57:31 <vishy> there are a lot of places that this will not scale
21:57:47 <vishy> making a db call every time we need to check if a cache item is valid is ridiculous
21:57:57 <vishy> so we will need two versions of time
21:58:06 <vishy> one where we just need a locally consistent time
21:58:19 <vishy> and one where we need a globally consistent time
21:58:22 <boris-42> vishy +1
21:58:25 <devananda> ++
21:58:40 <mikal> So, what about a periodic task which checks db time against system time?
21:58:43 <dripton> ++, but while we're changing everything, we should always use UTC rather than local.
21:58:48 <mikal> Every ten minutes or something
21:58:54 <vishy> devananda: any idea about db systems that have master/master
21:59:04 <vishy> dripton: sorry i don't mean local as in timezone local
21:59:05 <devananda> mikal: of what benefit is that?
21:59:12 <vishy> i mean local as in local to this machine
21:59:16 <vishy> still in utc time
21:59:36 <devananda> vishy: when there are >1 DB at play (whether master/master or galera or ..) it's extra important for their clocks to be in sync
21:59:50 <mikal> devananda: well, we could at least log that we've detected an error condition. It would help with diagnosis from an operator.
21:59:51 <devananda> vishy: in my former db consulting hat, i saw lots of issues creep up from that
21:59:59 <vishy> for example this: https://github.com/openstack/nova/blob/master/nova/openstack/common/memorycache.py#L62
22:00:05 <vishy> should never talk to the db
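A minimal sketch of the local-versus-global time split being proposed (function names are illustrative, not from the blueprint; the DB call assumes MySQL's UTC_TIMESTAMP()):

    from nova.openstack.common import timeutils
    from sqlalchemy import func

    def local_utcnow():
        # Cheap and locally consistent; fine for things like memorycache
        # expiry that never compare against another host's clock.
        return timeutils.utcnow()

    def db_utcnow(session):
        # One extra round trip, so only for timestamps that are compared
        # across nodes, e.g. service heartbeats read by the scheduler.
        return session.query(func.utc_timestamp()).scalar()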
22:00:32 <vishy> devananda: ok so we are assuming that keeping the db clocks in sync is easier than keeping all of the nodes in sync?
22:00:49 <devananda> mikal: if there is a clear separation between things that need to track distributed-time and things that need to track local/relative-time, then i don't think such a check would be meaningful
22:01:12 <dansmith> I'm skeptical of anything where we have to consult the db for the current time,
22:01:22 <devananda> vishy: yes. there are probably a lot less db's than nova nodes, and a lot more attention should be spent on them :)
22:01:32 <mikal> devananda: well, I'm thinking of ways to avoid asking the db for the time all over the place
22:01:37 <vishy> devananda: I buy that
22:01:54 <devananda> i dont see this as "ask the db for the time"
22:01:55 <vishy> devananda: i just want to make sure we aren't limiting db choices by forcing the db to handle global time for us
22:02:57 <vishy> we could instead do something like use the nova-conductor time for everything but a lot of the time comparisons are against db objects so we do get some performance benefits by doing it concurrently with db requests
22:03:01 <mikal> Given we're now over time, do we want to discuss this at the summit?
22:03:06 <vishy> sure
22:03:32 <beagles> is there value in having an openstack system time that has a source from "somewhere"... start there, maybe have it initialized and periodically verified from whatever source?
22:03:33 <vishy> I'm still not convinced that running a robust ntp solution isn't easier
22:03:46 <vishy> beagles: that is ntp
22:03:57 <vishy> but yeah we are out of time
22:04:03 <vishy> summit discussion!
22:04:04 <beagles> vishy: :) of course it could grab it from there ;)
22:04:17 <vishy> #endmeeting