15:01:11 <n0ano> #startmeeting gantt
15:01:16 <openstack> Meeting started Tue Jun 10 15:01:11 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:21 <openstack> The meeting name has been set to 'gantt'
15:01:30 * n0ano is hopefully using the right window this week :-)
15:01:37 <n0ano> anyone here to talk about the scheduler?
15:01:43 <mspreitz> yes
15:02:17 <toan-tran> yes
15:03:16 <n0ano> bauzas won't be here today (traveling) but I got a status from him
15:03:25 <n0ano> #topic code forklift
15:04:06 <n0ano> he had to address issues related to object model support and misuse of some DB fields, but the patches for the scheduler-lib are here:
15:04:13 <n0ano> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/scheduler-lib,n,z
15:04:31 <n0ano> #action all to review the scheduler-lib patches
15:05:16 <n0ano> I did find some people to work on the isolate scheduler DB access BP so we should get some action on that
15:05:37 <n0ano> other than that, I think the forklift is basically work in progress
15:06:21 <n0ano> if no questions on that...
15:06:41 <n0ano> #topic no-db scheduler
15:07:04 <n0ano> I noticed that yorik (doesn't appear to be online today) abandoned the patch for the no-db work
15:07:26 <n0ano> I wanted to talk to him about that as I don't think we wanted to just give up on it
15:07:30 <toan-tran> n0ano: I hope that's not because of last week
15:07:38 <n0ano> toan-tran, agreed
15:08:09 <n0ano> I'll ping him on email and see if I can get an explanation, hopefully he is just looking at different ways to implement things
15:08:38 <n0ano> anyway...
15:08:51 <n0ano> #topic policy based scheduler
15:08:59 <n0ano> toan-tran, I believe this is your issue
15:09:09 <toan-tran> n0ano: thanks
15:09:32 <toan-tran> Here is the bp: https://blueprints.launchpad.net/nova/+spec/policy-based-scheduler
15:09:41 <toan-tran> and its specs: https://review.openstack.org/#/c/97503/2
15:10:06 <toan-tran> the idea is to be able to control scheduling decision process by policy
15:10:31 <toan-tran> currently what we're doing is putting a list of Filters, Weighers and parameters into nova.conf
15:10:57 <toan-tran> all of these Filters & Weighers will be applied to ALL requests from ALL clients on ALL hosts
15:11:41 <toan-tran> imagine if we need 2 policies:
15:11:54 <toan-tran> LoadBalancing for overall infrastructure
15:12:09 <toan-tran> and Consolidation (packing VMs onto fewer hosts) in an aggregate
15:12:19 <toan-tran> that's simply impossible
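A minimal sketch, with toy stand-in functions rather than Nova's actual FilterScheduler code, of the mechanism toan-tran is describing: one global list of filters and weighers, configured in nova.conf, is applied to every request from every user on every host, so there is no way to run two different weighing policies for two different groups of hosts.

    # Toy filter-then-weigh pass; the filter/weigher callables are stand-ins,
    # not Nova classes.
    def schedule(request, hosts, filters, weighers):
        """Return candidate hosts best-first, using one global filter/weigher
        list for every request, every user and every host."""
        candidates = [h for h in hosts if all(f(h, request) for f in filters)]
        return sorted(candidates,
                      key=lambda h: sum(w(h, request) for w in weighers),
                      reverse=True)

    # Toy usage: one filter (enough RAM) and one weigher (prefer free RAM).
    hosts = [{"name": "h1", "free_ram_mb": 2048},
             {"name": "h2", "free_ram_mb": 8192}]
    request = {"ram_mb": 1024}
    enough_ram = lambda h, r: h["free_ram_mb"] >= r["ram_mb"]
    prefer_free_ram = lambda h, r: h["free_ram_mb"]
    print(schedule(request, hosts, [enough_ram], [prefer_free_ram]))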
15:12:30 <mspreitz> I'm not sure I understand your use case
15:12:40 <toan-tran> FYI, we have this usecase at Cloudwatt
15:12:50 <toan-tran> :)
15:12:50 <mspreitz> what do you mean by "overall infrastructure", how is that different from an aggregate?
15:12:58 <toan-tran> it's Windows licence :)
15:13:15 <toan-tran> Microsoft charges for Windows licences by physical host
15:13:30 <mspreitz> so placement matters
15:13:39 <toan-tran> thus it's important that we can regroup Windows VMs onto a minimal number of hosts
15:13:47 <toan-tran> ==> Consolidation
15:14:15 <mspreitz> I do not understand why you are framing this as an issue for all VMs.  Shouldn't it be a policy issue for the VMs with Windows licenses whose cost you want to minimize?
15:14:18 <PaulMurray> toan-tran do you stack windows?
15:14:18 <n0ano> toan-tran, devil's advocate here, wouldn't that just be a slightly more complex weighting function
15:14:26 <toan-tran> however, we still need global Load Balancing across the whole infrastructure, so that other VMs will be distributed evenly
15:14:44 <toan-tran> PaulMurray: yes
15:15:08 <toan-tran> mspreitz: it's one of the usecases
15:15:17 <PaulMurray> toan-tran, very familiar... :)
15:15:25 <PaulMurray> toan-tran, we do it with filter scheduler
15:15:33 <PaulMurray> toan-tran, whats the problem?
15:15:38 <toan-tran> n0ano: yes, this use case is feasible with another weigher
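Sketched with the same toy weigher signature as above (not Nova's BaseHostWeigher API), consolidation and load balancing are essentially opposite weighers, which is exactly why a single global weigher list cannot apply both at once to different groups of hosts:

    # Consolidation: prefer already-loaded hosts so VMs pack onto fewer hosts
    # (e.g. to minimize the number of per-host Windows licences).
    def consolidation_weigher(host, request):
        return -host["free_ram_mb"]

    # Load balancing: prefer the emptiest hosts so VMs spread out evenly.
    def spreading_weigher(host, request):
        return host["free_ram_mb"]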
15:15:45 <mspreitz> The big picture thing that confuses me is that the blueprint talks about making distinctions per client, but the solution is not so structured
15:15:46 <PaulMurray> toan-tran, not saying it can't be better...
15:15:54 <toan-tran> the thing is more general
15:16:24 <toan-tran> we have only ONE global policy for all requests, all users, all clusters of hosts
15:16:47 <toan-tran> what we're proposing is a separation of scheduling logic from its application domain
15:17:01 <toan-tran> scheduling logic = how do you want to schedule the requested resources
15:17:15 <toan-tran> application domain = where you want to execute this logic
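A purely illustrative sketch, not the blueprint's actual design, of that separation: scheduling logic lives in named policies, and a separate mapping decides which application domain (the default infrastructure, a given aggregate, ...) each policy covers.

    # Scheduling logic: which weighers (and, in a fuller version, filters) to apply.
    POLICIES = {
        "load_balancing": {"weighers": [lambda h, r: h["free_ram_mb"]]},
        "consolidation": {"weighers": [lambda h, r: -h["free_ram_mb"]]},
    }

    # Application domain: where each piece of logic applies.
    DOMAIN_POLICY = {
        "default": "load_balancing",           # whole infrastructure
        "aggregate:windows": "consolidation",  # the licensed aggregate
    }

    def policy_for(host):
        """Pick the policy for a host by its aggregates, else the default."""
        for agg in host.get("aggregates", []):
            name = DOMAIN_POLICY.get("aggregate:" + agg)
            if name:
                return POLICIES[name]
        return POLICIES[DOMAIN_POLICY["default"]]

    print(policy_for({"name": "h1", "aggregates": ["windows"]})
          is POLICIES["consolidation"])   # True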
15:17:52 <toan-tran> another example then : :)
15:18:23 <toan-tran> 2 users sign 2 different contracts
15:18:58 <toan-tran> one with gold quality (high-quality equipment), another with trial (low-quality equipment)
15:19:49 <toan-tran> with Filter Scheduler, you probably create 2 flavors, put metadata on them and give users the rights to use them
15:19:50 <YorikSar> n0ano: Sorry for being late. We can get back to no-db topic once we're finished with other topics.
15:19:59 <n0ano> YorikSar, tnx, will do
15:20:04 <toan-tran> the problem is that :
15:20:21 <toan-tran> it's not transparent to the user: he has to explicitly choose the right flavor
15:20:58 <toan-tran> imagine that the trial user is now satisfied with the trial and decides to go for the gold contract
15:21:25 <toan-tran> then he has to change his entire application to call for the gold flavor
15:21:37 <mspreitz> huh?
15:21:44 <mspreitz> How much of a change is that really?
15:21:58 <n0ano> toan-tran, he just has to change the flavor requested, that doesn't seem like such a big deal
15:22:29 <toan-tran> mspreitz: well, it's not transparent
15:22:45 <mspreitz> The contract quality is inherently visible to the cloud user
15:22:49 <mspreitz> one way or another
15:22:59 <n0ano> toan-tran, not sure it should be transparent, he's changing what he will be billed, he should be aware of that
15:23:13 <toan-tran> mspreitz: but technically it is not managed by the client, but by the cloud provider
15:23:24 <mspreitz> What n0ano said
15:23:53 <toan-tran> n0ano: he will be billed by his contract,
15:24:06 <toan-tran> yes
15:24:12 <n0ano> I would imagine the cloud provider has two host aggregates, one gold & one bronze; users can use a flavor to select the price/performance they want
15:24:17 <toan-tran> but he does not need to verify his flavors
15:25:06 <toan-tran> n0ano: we do, actually, have several aggregates with associated flavors for customers to choose from
15:25:11 <mspreitz> I am very confused.  This example is about the very kind of stuff that flavors are about
15:25:37 <mspreitz> well, that's a bit of an overstatement, but you get the idea
15:25:45 <toan-tran> mspreitz: the problem is that right now customers have to select the right flavors
15:26:00 <toan-tran> and we want to take that burden away from customers
15:26:03 <mspreitz> Yes, flavors are based on considerations that are user visible
15:26:18 <toan-tran> to manage the whole deployment process from the cloud provider's end
15:26:27 <n0ano> toan-tran, I think your policy based scheduling might have merit but we really don't see a good use case for it yet
15:26:39 <mspreitz> From the user's point of view, flavors are nothing but an unwelcome pain.  They are there to make the providing easier.
15:27:04 <toan-tran> mspreitz: exactly
15:27:07 <mspreitz> OK, let me try to buck him up
15:27:15 <mspreitz> I can imagine use cases
15:27:35 <mspreitz> toan-tran may not remember my history here, but I came in with similar issues
15:27:52 <toan-tran> mspreitz: go ahead
15:27:58 <mspreitz> In fact, my group worked an example in 2012 where we deployed VMs running IBM software whose license cost also depends on placement
15:28:20 <mspreitz> we put a policy statement on those VMs that created a preference for co-location.
15:28:40 <mspreitz> A somewhat precise preference, in terms of licensing.
15:28:59 <mspreitz> In the same example we also had some anti-colocation constraints for reliability
15:29:24 <mspreitz> And used logic that threw in some preference for minimizing network usage.
15:29:40 <mspreitz> A reasonable thing if you are deploying, say, a three tier web application.
15:29:56 <mspreitz> But we took a little different tack...
15:30:19 <mspreitz> We let there be policy statements in the input, attached precisely to groups of VMs and between them.
15:30:38 <mspreitz> our solution transformed the input to a constrained optimization problem and solved that to get the placement.
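A minimal sketch, not the SolverScheduler's actual formulation, of placement as constrained optimization: hard constraints (capacity, anti-colocation) must hold, soft preferences (licence-driven co-location) go into the objective, and the cheapest feasible assignment wins; brute force is used here only to keep the example tiny.

    from itertools import product

    hosts = {"h1": 4, "h2": 4}            # host -> free vCPUs (toy data)
    vms = {"a": 2, "b": 2, "c": 1}        # vm -> vCPUs requested
    anti_colocate = [("a", "b")]          # reliability: must be apart (hard)
    colocate_pref = [("a", "c")]          # licensing: prefer together (soft)

    def feasible(assign):
        if any(assign[u] == assign[v] for u, v in anti_colocate):
            return False
        return all(sum(need for vm, need in vms.items() if assign[vm] == h) <= cap
                   for h, cap in hosts.items())

    def cost(assign):
        hosts_used = len(set(assign.values()))           # proxy for licence cost
        broken_prefs = sum(assign[u] != assign[v] for u, v in colocate_pref)
        return hosts_used + broken_prefs

    best = min((dict(zip(vms, combo)) for combo in product(hosts, repeat=len(vms))),
               key=lambda a: cost(a) if feasible(a) else float("inf"))
    print(best)   # {'a': 'h1', 'b': 'h2', 'c': 'h1'}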
15:30:58 <mspreitz> You may know that some other guys from Cisco and VMware are advocating this approach
15:31:05 <mspreitz> as well
15:31:09 <toan-tran> SolverScheduler you mean
15:31:14 <mspreitz> yes
15:32:04 <toan-tran> I have done some analysis on that: https://docs.google.com/document/d/1RfP7jRsw1mXMjd7in72ARjK0fTrsQv1bqolOriIQB2Y/edit?usp=drive_web
15:32:19 <toan-tran> SolverScheduler needs a bunch of constraints as input
15:32:36 <toan-tran> and PolicyBasedScheduler can get these constraints
15:33:12 <toan-tran> actually the constraint will be what the policy rules dictate
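A small illustrative sketch of that hand-off, with invented names rather than either blueprint's code: provider-owned policy rules are expanded into the pairwise constraint lists that a solver-style scheduler, such as the one sketched above, would take as input.

    policy_rules = [
        {"type": "anti_colocation", "group": "ha_group", "hard": True},
        {"type": "colocation", "group": "licensed", "hard": False},
    ]

    def rules_to_constraints(rules, groups):
        """Expand group-level policy rules into pairwise (vm, vm) constraints."""
        hard, soft = [], []
        for rule in rules:
            members = groups.get(rule["group"], [])
            pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
            (hard if rule["hard"] else soft).extend(
                (rule["type"], pair) for pair in pairs)
        return hard, soft

    groups = {"ha_group": ["a", "b"], "licensed": ["a", "c"]}
    print(rules_to_constraints(policy_rules, groups))
    # -> ([('anti_colocation', ('a', 'b'))], [('colocation', ('a', 'c'))])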
15:33:16 <mspreitz> Yes, I saw that analysis, and was not as alarmed by it as by the remarks in the blueprint
15:33:28 <n0ano> time check, I want to talk about no-db and still have time for opens; do we have an end goal for this discussion?
15:33:36 <mspreitz> If your intent is what you outline in that analysis, I may be able to live with it
15:34:05 <mspreitz> n0ano: i'm happy to stop here and do some more reading and thinking
15:34:07 <toan-tran> mspreitz: oh, I intend to develop it into something more ambitious than that :)
15:34:29 <toan-tran> the first step is to have a policy-based scheduling engine
15:34:51 <toan-tran> then (much much) later it can integrate with Tetris & Congress
15:35:18 <n0ano> toan-tran, but first you have to get by the objections to the current BP :-)
15:35:30 <toan-tran> so that we can have a scheduling engine inside Gantt able to control scheduling from initial placement through the life-cycle
15:35:43 <toan-tran> n0ano: that's right :D
15:36:04 <toan-tran> I'm trying my best to present something simple in Nova first
15:36:11 <n0ano> toan-tran, note, you have to refresh your BP, it's about to get dropped right now
15:36:39 <toan-tran> n0ano: what do you mean?
15:37:07 <n0ano> last message - the code review expired after 1 week of no activity following a negative review; it can be restored using the `Restore Change` button under the Patch Set on the web interface
15:37:39 <toan-tran> n0ano: ok :)
15:37:50 <n0ano> anyway, moving on...
15:37:55 <n0ano> #topic no-db scheduler
15:38:01 <n0ano> YorikSar, you still here?
15:38:09 <YorikSar> n0ano: Yep
15:38:35 <n0ano> I see you abandoned the current patch, I hope that just means you're thinking about how to do it and not giving up
15:38:36 <YorikSar> I've written my reasons in the comment to change request in the spec repo
15:39:07 <mspreitz> link?
15:39:08 <YorikSar> Nope, I think it's just a premature optimization that shouldn't be implemented right away.
15:39:23 <YorikSar> #link https://review.openstack.org/92128
15:39:52 <YorikSar> There are other options that haven't been considered.
15:40:02 <n0ano> good, I agree, just so that we don't give up completely
15:40:08 <YorikSar> And there's nothing to write in "Problem statement" section.
15:40:46 <YorikSar> So I guess this effort might be revived once Gantt faces performance issues related to DB, but for now it should be left alone.
15:41:16 <n0ano> I thought this all started with the performance issues with a Bluehost 10,000 node system, I would think those performance issues remain
15:41:40 <YorikSar> There has been a lot of work done under the aegis of the no-db-scheduler blueprint (even before it was created).
15:42:23 <YorikSar> They should be verified. One huge performance bottleneck has been eliminated - the separate key-value table for host states.
15:42:51 <n0ano> that required a join operation, right?
15:42:58 <YorikSar> Yes
15:43:26 <YorikSar> And all the cool documents/reports were written before that was done.
15:43:36 <mspreitz> And it has been experimentally verified that those joins cost less than the stuff that was eliminated?
15:43:56 <YorikSar> mspreitz: Yes...
15:44:17 <YorikSar> I'm talking about changes that happened last summer iirc
15:44:26 <mspreitz> thanks
15:45:28 <n0ano> so, to summarize, this is really a performance optimization issue and we want to make sure we are addressing the right problem
15:45:55 <YorikSar> So now I suggest focusing on more pressing issues, like separating out Gantt and polishing its API, and coming back to performance once it becomes a problem.
15:46:07 <n0ano> YorikSar, +1
15:46:09 <YorikSar> n0ano: Yes
15:46:32 <n0ano> OK, sounds like a plan
15:46:43 <toan-tran> YorikSar: I think the problem is always there, but I agree that we can address no-db once Gantt is materialized
15:47:17 <YorikSar> toan-tran: It's speculated that it should become a real problem at 10k+ nodes.
15:47:35 <YorikSar> toan-tran: So I wouldn't say it's there. It might be.
15:48:05 <n0ano> moving on...
15:48:05 <YorikSar> toan-tran: I mean it definitely was there, but in the current state of Nova it might already be gone.
15:48:31 <n0ano> #topic opens
15:49:07 <n0ano> based upon last week's discussion I've created a BP to optimize the status reporting from compute nodes
15:49:17 <n0ano> #link https://review.openstack.org/#/c/97903/
15:49:45 <n0ano> basic idea is, rather than updating the DB every minute, only update the DB when the status changes
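A minimal sketch of that idea, with assumed names rather than the blueprint's code: the compute node still gathers its status every period, but compares it with the last snapshot it sent and only touches the database when something actually changed.

    import copy

    _last_reported = None

    def report_status(current_status, db_update):
        """Call db_update(current_status) only if it differs from the last report."""
        global _last_reported
        if current_status != _last_reported:
            db_update(current_status)
            _last_reported = copy.deepcopy(current_status)

    # Called from the periodic task; db_update is whatever persists the state.
    report_status({"state": "up", "free_ram_mb": 4096}, db_update=print)  # writes
    report_status({"state": "up", "free_ram_mb": 4096}, db_update=print)  # skipped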
15:49:56 <mspreitz> some folks were talking about status that changes every time, like CPU utilization.  So I am confused.
15:50:18 <n0ano> cpu utilization is not currently reported so that will be a future issue
15:50:27 <mspreitz> same for memory?
15:50:32 <toan-tran> mspreitz: cpu utilization is utilisation-aware
15:50:40 <mspreitz> ?
15:50:52 <toan-tran> not on this period update
15:51:06 <mspreitz> toan-tran: what do you mean by "cpu utilization is utilisation-aware" ?
15:51:10 <n0ano> mspreitz, interesting point, turns out I think there's a bug in the current memory reporting, it doesn't report used memory...
15:51:21 <n0ano> it only reports memory used by instances
15:51:22 <toan-tran> mspreitz: the periodic update does not report cpu utilisation, if I'm not wrong
15:51:30 <mspreitz> Well, actually, I think we should focus first on allocated memory rather than used memory
15:51:30 <n0ano> toan-tran, +1
15:51:37 <PaulMurray> n0ano, the discussion I remember talked about calculating regularly but only updating if it changed
15:52:00 <n0ano> PaulMurray, that's what I do with the current reporting mechanism
15:52:09 <mspreitz> by which I think I may mean what n0ano said: focus on how much memory is dedicated to instances
15:52:50 <PaulMurray> n0ano, so does this change that reporting?
15:53:12 <n0ano> mspreitz, but that's not a true indication of `node` resources, if many other processes are using memory you might not want to schedule onto that node
15:54:03 <mspreitz> yeah, if you really want a compute node doing other things too, then you have to account for them
15:54:33 <n0ano> which it is currently not doing, I'll raise that issue on the mailing list and see what people think
15:54:46 <mspreitz> and you have to account for broken compute-node business too (e.g., half-deleted VMs)
15:55:10 <n0ano> you could argue both ways (we only care about instance usage vs. node usage), I don't know what people really want
15:55:37 <PaulMurray> n0ano, actually we want to know what is left available
15:55:44 <PaulMurray> n0ano, imo
15:55:56 <n0ano> which is not what is currently being reported
15:55:58 <mspreitz> What I would really like is to count memory allocated to instances + all other memory usage, compare with memory capacity
15:56:14 <n0ano> mspreitz, +1
15:56:51 <mspreitz> But let me revise that after a second's reconsideration...
15:57:11 <toan-tran> mspreitz: I'm not sure the current periodic update counts memory allocated to instances
15:57:12 <mspreitz> if the non-instance usage is arbitrarily dynamic, you could get into over-use
15:57:18 <n0ano> that is my preference, but I see it as rather simplistic: report free memory (no matter what the occupied memory is used for)
15:57:28 <n0ano> mspreitz, ?
15:57:50 <mspreitz> Suppose you allocate all available memory now, and then some non-instance process wants even more?
15:58:10 <n0ano> linux handles that
15:58:18 <PaulMurray> mspreitz, some hypervisors use memory in addition to the instance allocation
15:58:41 <n0ano> PaulMurray, all the more reason to report true memory usage
15:58:43 <mspreitz> You really want allocations such that the users can be relied on to not exceed them.  We rarely want to actually use the host's virtual memory, we want to stay in physical memory
15:59:04 <PaulMurray> n0ano, yes, agreed
15:59:28 <mspreitz> I am presuming that the overheads due to being a compute node can be reasonably characterized
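A minimal sketch of the accounting being discussed, with toy numbers and invented field names: memory still available for scheduling is total capacity minus memory allocated to instances minus the non-instance overhead (host daemons, hypervisor) that the current report ignores.

    def free_memory_mb(total_mb, instance_allocated_mb, host_overhead_mb):
        """Memory still available for new instances on this compute node."""
        return total_mb - instance_allocated_mb - host_overhead_mb

    # Today the report effectively assumes host_overhead_mb == 0; the point in
    # the discussion is that non-instance usage should be measured, or at least
    # characterized, and subtracted as well.
    print(free_memory_mb(total_mb=65536, instance_allocated_mb=40960,
                         host_overhead_mb=4096))   # 20480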
16:00:01 <n0ano> we're running out of time, I suggest everyone comment on my BP and respond to the email thread that I'll start
16:00:39 <n0ano> so, top of the hour, tnx everyone and we'll talk again next week
16:00:47 <n0ano> #endmeeting