15:01:11 <n0ano> #startmeeting gantt
15:01:16 <openstack> Meeting started Tue Jun 10 15:01:11 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:21 <openstack> The meeting name has been set to 'gantt'
15:01:30 * n0ano is hopefully using the right window this week :-)
15:01:37 <n0ano> anyone here to talk about the scheduler?
15:01:43 <mspreitz> yes
15:02:17 <toan-tran> yes
15:03:16 <n0ano> bauzas won't be here today (traveling) but I got a status from him
15:03:25 <n0ano> #topic code forklift
15:04:06 <n0ano> he had to address issues related to object model support and misuse of some DB fields, but the patches for the scheduler-lib are here:
15:04:13 <n0ano> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/scheduler-lib,n,z
15:04:31 <n0ano> #action all to review the scheduler-lib patches
15:05:16 <n0ano> I did find some people to work on the isolate-scheduler-DB-access BP, so we should get some action on that
15:05:37 <n0ano> other than that, I think the forklift is basically work in progress
15:06:21 <n0ano> if no questions on that...
15:06:41 <n0ano> #topic no-db scheduler
15:07:04 <n0ano> I noticed that yorik (doesn't appear to be online today) abandoned the patch for the no-db work
15:07:26 <n0ano> I wanted to talk to him about that as I don't think we wanted to just give up on it
15:07:30 <toan-tran> n0ano: I hope that's not because of last week
15:07:38 <n0ano> toan-tran, agreed
15:08:09 <n0ano> I'll ping him by email and see if I can get an explanation, hopefully he is just looking at different ways to implement things
15:08:38 <n0ano> anyway...
15:08:51 <n0ano> #topic policy based scheduler
15:08:59 <n0ano> toan-tran, I believe this is your issue
15:09:09 <toan-tran> n0ano: thanks
15:09:32 <toan-tran> Here is the bp: https://blueprints.launchpad.net/nova/+spec/policy-based-scheduler
15:09:41 <toan-tran> and its specs: https://review.openstack.org/#/c/97503/2
15:10:06 <toan-tran> the idea is to be able to control the scheduling decision process by policy
15:10:31 <toan-tran> currently what we're doing is putting a list of Filters, Weighers and parameters into nova.conf
15:10:57 <toan-tran> all of these Filters & Weighers will be applied to ALL requests from ALL clients on ALL hosts
15:11:41 <toan-tran> imagine if we need 2 policies:
15:11:54 <toan-tran> LoadBalancing for the overall infrastructure
15:12:09 <toan-tran> and Consolidation (regrouping hosts) in an aggregate
15:12:19 <toan-tran> that's simply impossible
15:12:30 <mspreitz> I'm not sure I understand your use case
15:12:40 <toan-tran> FYI, we have this use case at Cloudwatt
15:12:50 <toan-tran> :)
15:12:50 <mspreitz> what do you mean by "overall infrastructure", how is that different from an aggregate?
15:12:58 <toan-tran> it's the Windows licence :)
15:13:15 <toan-tran> Microsoft charges the Windows licence by physical host
15:13:30 <mspreitz> so placement matters
15:13:39 <toan-tran> thus it's important that we can regroup Windows VMs on a minimal number of hosts
15:13:47 <toan-tran> ==> Consolidation
15:14:15 <mspreitz> I do not understand why you are framing this as an issue for all VMs. Shouldn't it be a policy issue for the VMs with Windows licenses whose cost you want to minimize?
15:14:18 <PaulMurray> toan-tran do you stack windows?
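For reference, here is a minimal sketch of the kind of Consolidation weigher this use case would need, assuming the BaseHostWeigher hook that Nova's filter scheduler exposed around this release; the class name and module placement are illustrative, not an existing Nova file, and the stock RAMWeigher with a negative ram_weight_multiplier would give similar stacking behaviour:

    # A sketch only: a "consolidation" weigher that stacks new VMs onto
    # already-busy hosts, assuming the BaseHostWeigher interface of Nova's
    # filter scheduler (nova.scheduler.weights). Class/module names are
    # illustrative, not an existing Nova file.
    from nova.scheduler import weights


    class ConsolidationWeigher(weights.BaseHostWeigher):
        """Prefer hosts with less free RAM so instances are packed onto
        as few hosts as possible (e.g. to limit per-host licence costs)."""

        def _weigh_object(self, host_state, weight_properties):
            # Higher weight wins; less free RAM -> higher weight.
            return -host_state.free_ram_mb

The point of the policy-based-scheduler proposal is to be able to apply a weigher like this in one aggregate while keeping load balancing everywhere else, instead of one global filter/weigher list in nova.conf.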
15:14:18 <n0ano> toan-tran, devil's advocate here, wouldn't that just be a slightly more complex weighting function
15:14:26 <toan-tran> however, we still need the global Load Balancing across the whole infrastructure, so that other VMs will be distributed evenly
15:14:44 <toan-tran> PaulMurray: yes
15:15:08 <toan-tran> mspreitz: it's one of the use cases
15:15:17 <PaulMurray> toan-tran, very familiar... :)
15:15:25 <PaulMurray> toan-tran, we do it with the filter scheduler
15:15:33 <PaulMurray> toan-tran, what's the problem?
15:15:38 <toan-tran> n0ano: yes, this use case is feasible with another weigher
15:15:45 <mspreitz> The big-picture thing that confuses me is that the blueprint talks about making distinctions per client, but the solution is not so structured
15:15:46 <PaulMurray> toan-tran, not saying it can't be better...
15:15:54 <toan-tran> the thing is more general
15:16:24 <toan-tran> we have only ONE global policy for all requests, all users, all clusters of hosts
15:16:47 <toan-tran> what we're proposing is a separation of the scheduling logic from its application domain
15:17:01 <toan-tran> scheduling logic = how you want to schedule the requested resources
15:17:15 <toan-tran> application domain = where you want to apply this logic
15:17:52 <toan-tran> another example then: :)
15:18:23 <toan-tran> 2 users sign 2 different contracts
15:18:58 <toan-tran> one with gold quality (high-quality equipment), another with trial (low-quality equipment)
15:19:49 <toan-tran> with the Filter Scheduler, you probably create 2 flavors, put metadata on them and give users the rights to use them
15:19:50 <YorikSar> n0ano: Sorry for being late. We can get back to the no-db topic once we're finished with the other topics.
15:19:59 <n0ano> YorikSar, tnx, will do
15:20:04 <toan-tran> the problem is that:
15:20:21 <toan-tran> it's not transparent to users: the user has to explicitly choose the right flavor
15:20:58 <toan-tran> imagine that the trial user is now satisfied with the trial and decides to go for the gold contract
15:21:25 <toan-tran> then he has to change his entire application to call for the gold flavor
15:21:37 <mspreitz> huh?
15:21:44 <mspreitz> How much of a change is that really?
15:21:58 <n0ano> toan-tran, he just has to change the flavor requested, that doesn't seem like such a big deal
15:22:29 <toan-tran> mspreitz: well, it's not transparent
15:22:45 <mspreitz> The contract quality is inherently visible to the cloud user
15:22:49 <mspreitz> one way or another
15:22:59 <n0ano> toan-tran, not sure it should be transparent, he's changing what he will be billed, he should be aware of that
15:23:13 <toan-tran> mspreitz: but technically it is not managed by the client, but by the cloud provider
15:23:24 <mspreitz> What n0ano said
15:23:53 <toan-tran> n0ano: he will be billed by his contract,
15:24:06 <toan-tran> yes
15:24:12 <n0ano> I would imagine the cloud provider has two host aggregates, one gold & one bronze, and users can use the flavor to select the price/performance they want
15:24:17 <toan-tran> but he does not need to verify his flavors
15:25:06 <toan-tran> n0ano: we do, actually, have several aggregates with associated flavors for customers to choose from
15:25:11 <mspreitz> I am very confused.
This example is about the very kind of stuff that flavors are for
15:25:37 <mspreitz> well, that's a bit of an overstatement, but you get the idea
15:25:45 <toan-tran> mspreitz: the problem is that right now customers have to select the right flavors
15:26:00 <toan-tran> and we want to take that over from customers
15:26:03 <mspreitz> Yes, flavors are based on considerations that are user visible
15:26:18 <toan-tran> to manage the whole deployment process from the cloud provider's end
15:26:27 <n0ano> toan-tran, I think your policy-based scheduling might have merit but we really don't see a good use case for it yet
15:26:39 <mspreitz> From the user's point of view, flavors are nothing but an unwelcome pain. They are there to make the providing easier.
15:27:04 <toan-tran> mspreitz: exactly
15:27:07 <mspreitz> OK, let me try to buck him up
15:27:15 <mspreitz> I can imagine use cases
15:27:35 <mspreitz> toan-tran may not remember my history here, but I came in with similar issues
15:27:52 <toan-tran> mspreitz: go ahead
15:27:58 <mspreitz> In fact, my group worked through an example in 2012 where we deployed VMs running IBM software whose license cost also depends on placement
15:28:20 <mspreitz> we put a policy statement on those VMs that created a preference for co-location.
15:28:40 <mspreitz> A somewhat precise preference, in terms of licensing.
15:28:59 <mspreitz> In the same example we also had some anti-colocation constraints for reliability
15:29:24 <mspreitz> And used logic that threw in some preference for minimizing network usage.
15:29:40 <mspreitz> A reasonable thing if you are deploying, say, a three-tier web application.
15:29:56 <mspreitz> But we took a little different tack...
15:30:19 <mspreitz> We let there be policy statements in the input, attached precisely to groups of VMs and between them.
15:30:38 <mspreitz> our solution transformed the input into a constrained optimization problem and solved that to get the placement.
15:30:58 <mspreitz> You may know that some other guys from Cisco and VMware are advocating this approach
15:31:05 <mspreitz> as well
15:31:09 <toan-tran> SolverScheduler you mean
15:31:14 <mspreitz> yes
15:32:04 <toan-tran> I have done some analysis on that: https://docs.google.com/document/d/1RfP7jRsw1mXMjd7in72ARjK0fTrsQv1bqolOriIQB2Y/edit?usp=drive_web
15:32:19 <toan-tran> SolverScheduler needs a bunch of constraints as input
15:32:36 <toan-tran> and PolicyBasedScheduler can get these constraints
15:33:12 <toan-tran> actually the constraints will be what the policy rules dictate
15:33:16 <mspreitz> Yes, I saw that analysis, and was not as alarmed by it as by the remarks in the blueprint
15:33:28 <n0ano> time check, I want to talk about no-db and have an opens section, do we have an end goal for this discussion?
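As an illustration of the constrained-optimization tack mspreitz describes (and that SolverScheduler advocates), here is a toy sketch using the PuLP library; it is not SolverScheduler's actual model, and the VM/host data and variable names are made up:

    # Toy integer program: place VMs on hosts, each VM exactly once,
    # respect RAM capacity, and minimize the number of hosts used
    # (a consolidation objective). Not SolverScheduler's actual model.
    import pulp

    vms = {'vm1': 2048, 'vm2': 4096, 'vm3': 1024}   # required RAM (MB)
    hosts = {'host1': 8192, 'host2': 8192}          # host capacity (MB)

    prob = pulp.LpProblem('placement', pulp.LpMinimize)
    place = pulp.LpVariable.dicts(
        'place', [(v, h) for v in vms for h in hosts], cat='Binary')
    used = pulp.LpVariable.dicts('used', list(hosts), cat='Binary')

    # Objective: use as few hosts as possible.
    prob += pulp.lpSum(used[h] for h in hosts)
    # Each VM is placed exactly once.
    for v in vms:
        prob += pulp.lpSum(place[(v, h)] for h in hosts) == 1
    # A host's placed VMs must fit in its RAM, and only count if it is used.
    for h in hosts:
        prob += pulp.lpSum(vms[v] * place[(v, h)] for v in vms) <= hosts[h] * used[h]

    prob.solve()
    for (v, h), var in place.items():
        if var.value() and var.value() > 0.5:
            print('%s -> %s' % (v, h))

Policy statements (affinity, anti-affinity, licensing preferences) become extra constraints or objective terms in the same kind of model, which is where a policy-based front end and a solver-based back end could meet.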
15:33:36 <mspreitz> If your intent is what you outline in that analysis, I may be able to live with it
15:34:05 <mspreitz> n0ano: I'm happy to stop here and do some more reading and thinking
15:34:07 <toan-tran> mspreitz: oh, I intend to develop it into something more ambitious than that :)
15:34:29 <toan-tran> the first step is to have a policy-based scheduling engine
15:34:51 <toan-tran> then (much, much) later it can integrate with Tetris & Congress
15:35:18 <n0ano> toan-tran, but first you have to get past the objections to the current BP :-)
15:35:30 <toan-tran> so that we can have a scheduling engine inside Gantt that is able to control scheduling from initial placement through the life-cycle
15:35:43 <toan-tran> n0ano: that's right :D
15:36:04 <toan-tran> I'm trying my best to present something simple in nova first
15:36:11 <n0ano> toan-tran, note, you have to refresh your BP, it's about to get dropped right now
15:36:39 <toan-tran> n0ano: what do you mean?
15:37:07 <n0ano> last message - the code review expired after 1 week of no activity after a negative review, it can be restored using the `Restore Change` button under the Patch Set on the web interface
15:37:39 <toan-tran> n0ano: ok :)
15:37:50 <n0ano> anyway, moving on...
15:37:55 <n0ano> #topic no-db scheduler
15:38:01 <n0ano> YorikSar, you still here?
15:38:09 <YorikSar> n0ano: Yep
15:38:35 <n0ano> I see you abandoned the current patch, I hope that just means you're thinking about how to do it and not giving up
15:38:36 <YorikSar> I've written my reasons in the comment on the change request in the spec repo
15:39:07 <mspreitz> link?
15:39:08 <YorikSar> Nope, I think it's just a premature optimization that shouldn't be implemented right away.
15:39:23 <YorikSar> #link https://review.openstack.org/92128
15:39:52 <YorikSar> There are other options that haven't been considered.
15:40:02 <n0ano> good, I agree, just so that we don't give up completely
15:40:08 <YorikSar> And there's nothing to write in the "Problem statement" section.
15:40:46 <YorikSar> So I guess this effort might be revived once Gantt faces performance issues related to the DB, but for now it should be left alone.
15:41:16 <n0ano> I thought this all started with the performance issues with a Bluehost 10,000 node system, I would think those performance issues remain
15:41:40 <YorikSar> There has been a lot of work done under the aegis of the no-db-scheduler blueprint (even before it was created).
15:42:23 <YorikSar> Those results should be verified. One huge performance bottleneck has been eliminated - the separate key-value table for host states.
15:42:51 <n0ano> that required a join operation, right?
15:42:58 <YorikSar> Yes
15:43:26 <YorikSar> And all the cool documents/reports were written before that was done.
15:43:36 <mspreitz> And has it been experimentally verified that those joins cost more than the stuff that replaced them?
15:43:56 <YorikSar> mspreitz: Yes...
15:44:17 <YorikSar> I'm talking about changes that happened last summer iirc
15:44:26 <mspreitz> thanks
15:45:28 <n0ano> so, to summarize, this is really a performance optimization issue and we want to make sure we are addressing the right problem
15:45:55 <YorikSar> So now I suggest we focus on more pressing issues, like separating Gantt and polishing its API, and come back to performance once it becomes a problem.
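For context on the join YorikSar and n0ano mention: the old layout kept host stats in a separate key/value table that had to be joined against the compute nodes table to build host states, while the later layout folds the stats into the compute node row itself. A schematic sketch with SQLAlchemy, illustrative only and not the real Nova schema:

    # Schematic only -- not the real Nova models. Shows why the old
    # one-row-per-stat layout needed a join on every scheduling pass,
    # while keeping stats on the compute node row needs a single SELECT.
    from sqlalchemy import Column, ForeignKey, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class ComputeNode(Base):
        __tablename__ = 'compute_nodes'
        id = Column(Integer, primary_key=True)
        hypervisor_hostname = Column(String(255))
        # Newer layout: free-form stats serialized into one column.
        stats = Column(Text)  # e.g. a JSON blob of key/value stats

    class ComputeNodeStat(Base):
        # Older layout: one row per (node, key) pair; building the host
        # state map meant joining this table against compute_nodes.
        __tablename__ = 'compute_node_stats'
        id = Column(Integer, primary_key=True)
        compute_node_id = Column(Integer, ForeignKey('compute_nodes.id'))
        key = Column(String(255))
        value = Column(String(255))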
15:46:07 <n0ano> YorikSar, +1
15:46:09 <YorikSar> n0ano: Yes
15:46:32 <n0ano> OK, sounds like a plan
15:46:43 <toan-tran> YorikSar: I think the problem is always there, but I agree that we can address no-db once Gantt is materialized
15:47:17 <YorikSar> toan-tran: It's speculated that it should become a real problem at 10k+ nodes.
15:47:35 <YorikSar> toan-tran: So I wouldn't say it's there. It might be.
15:48:05 <n0ano> moving on...
15:48:05 <YorikSar> toan-tran: I mean it definitely was there, but in the current state of Nova it might already be gone.
15:48:31 <n0ano> #topic opens
15:49:07 <n0ano> based upon last week's discussion I've created a BP to optimize the status reporting from compute nodes
15:49:17 <n0ano> #link https://review.openstack.org/#/c/97903/
15:49:45 <n0ano> basic idea is, rather than updating the DB every minute, only update the DB when the status changes
15:49:56 <mspreitz> some folks were talking about status that changes every time, like CPU utilization. So I am confused.
15:50:18 <n0ano> cpu utilization is not currently reported so that will be a future issue
15:50:27 <mspreitz> same for memory?
15:50:32 <toan-tran> mspreitz: cpu utilization is utilisation-aware
15:50:40 <mspreitz> ?
15:50:52 <toan-tran> not in this periodic update
15:51:06 <mspreitz> toan-tran: what do you mean by "cpu utilization is utilisation-aware" ?
15:51:10 <n0ano> mspreitz, interesting point, turns out I think there's a bug in the current memory reporting, it doesn't report used memory...
15:51:21 <n0ano> it only reports memory used by instances
15:51:22 <toan-tran> mspreitz: the periodic update does not update cpu utilisation, if I'm not wrong
15:51:30 <mspreitz> Well, actually, I think we should focus first on allocated memory rather than used memory
15:51:30 <n0ano> toan-tran, +1
15:51:37 <PaulMurray> n0ano, the discussion I remember talked about calculating regularly but only updating if it changed
15:52:00 <n0ano> PaulMurray, that's what I do with the current reporting mechanism
15:52:09 <mspreitz> by which I think I may mean what n0ano said: focus on how much memory is dedicated to instances
15:52:50 <PaulMurray> n0ano, so does this change that reporting?
15:53:12 <n0ano> mspreitz, but that's not a true indication of `node` resources, if many other processes are using memory you might not want to schedule onto that node
15:54:03 <mspreitz> yeah, if you really want a compute node doing other things too, then you have to account for them
15:54:33 <n0ano> which it is currently not doing, I'll raise that issue on the mailing list and see what people think
15:54:46 <mspreitz> and you have to account for broken compute-node business too (e.g., half-deleted VMs)
15:55:10 <n0ano> you could argue both ways (we only care about instance usage vs. node usage), I don't know what people really want
15:55:37 <PaulMurray> n0ano, actually we want to know what is left available
15:55:44 <PaulMurray> n0ano, imo
15:55:56 <n0ano> which is not what is currently being reported
15:55:58 <mspreitz> What I would really like is to count memory allocated to instances + all other memory usage, and compare that with memory capacity
15:56:14 <n0ano> mspreitz, +1
15:56:51 <mspreitz> But let me revise that after a second's reconsideration...
15:57:11 <toan-tran> mspreitz: I'm not sure if the current periodic update counts the memory allocated to instances
15:57:12 <mspreitz> if the non-instance usage is arbitrarily dynamic, you could get into over-use
15:57:18 <n0ano> that is my preference but I see it rather simplistically: report free memory (no matter what the occupied memory is used for)
15:57:28 <n0ano> mspreitz, ?
15:57:50 <mspreitz> Suppose you allocate all available memory now, and then some non-instance process wants even more?
15:58:10 <n0ano> Linux handles that
15:58:18 <PaulMurray> mspreitz, some hypervisors use memory in addition to the instance allocation
15:58:41 <n0ano> PaulMurray, all the more reason to report true memory usage
15:58:43 <mspreitz> You really want allocations such that the users can be relied on to not exceed them. We rarely want to actually use the host's virtual memory, we want to stay in physical memory
15:59:04 <PaulMurray> n0ano, yes, agreed
15:59:28 <mspreitz> I am presuming that the overheads due to being a compute node can be reasonably characterized
16:00:01 <n0ano> we're running out of time, I suggest everyone comment on my BP and respond to the email thread that I'll start
16:00:39 <n0ano> so, top of the hour, tnx everyone and we'll talk again next week
16:00:47 <n0ano> #endmeeting
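The blueprint n0ano links above proposes keeping the periodic resource audit but skipping the DB write when nothing has changed. A minimal sketch of that idea; the class, method, and DB-call names are illustrative, not taken from the blueprint or from Nova:

    # A minimal sketch of "update the DB only when the status changes",
    # assuming a periodic task that already gathers a dict of resources.
    # Names here are illustrative, not Nova's actual APIs.

    class ResourceReporter(object):
        def __init__(self, db_api):
            self.db_api = db_api
            self._last_reported = None

        def periodic_update(self, context, node_id, gather_resources):
            """Called on every periodic interval; still gathers the state
            each time, but only touches the DB when something changed."""
            current = gather_resources()
            if current == self._last_reported:
                return  # no change -> skip the DB write entirely
            self.db_api.compute_node_update(context, node_id, current)
            self._last_reported = current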