15:00:35 <n0ano> #startmeeting scheduler
15:00:36 <openstack> Meeting started Tue Jun 18 15:00:35 2013 UTC. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:40 <openstack> The meeting name has been set to 'scheduler'
15:00:51 <n0ano> show of hands, anyone here for the scheduler meeting?
15:00:58 <belmoreira> here
15:01:13 <jgallard> hi all
15:03:45 <n0ano> #topic scalability
15:04:17 <n0ano> I started a thread on the dev mailing list and have been getting some replies, have you seen the thread?
15:05:18 <belmoreira> not sure. what is the subject?
15:05:28 <jgallard> from my side, sorry, I was not available the last few days...
15:05:35 <shanewang> hi don
15:05:47 <n0ano> Subject: Compute node stats sent to the scheduler
15:06:12 <jgallard> ok, got it
15:06:17 <n0ano> to me the big question is do we communicate usage data to the scheduler via fan-out message or through the DB
15:07:09 <n0ano> I prefer fan-out messages (I hate DBs, it's a personal quirk) but I'm hearing from people that think the DB is the way to go.
15:07:27 <n0ano> I've voiced my concerns in the email thread, now waiting to hear back
15:09:06 <n0ano> we can talk about the issues here or just follow the email thread which is still active (I started the thread late)
15:09:16 <belmoreira> sorry I need to read the whole thread
15:09:17 <shanewang> I prefer fan-out messages too, as n0ano concerned, I am not sure who else is using db.
15:09:34 <jgallard> +1 fan-out messages
15:09:57 <n0ano> well, the one issue that hasn't been brought up is ceilometer, does it want to get the data from the DB or does it want to query the scheduler
15:10:39 <n0ano> shanewang, I'm in the process of reviewing the code to see who actually uses the DB data, not done with that yet.
15:12:49 <n0ano> if no one has any strong opinions today I think it makes sense to just see how the email thread works out.
15:13:12 <shanewang> agree
15:13:41 <jgallard> ok
15:13:48 <n0ano> #topic follow ups on scheduler BPs
15:13:58 <n0ano> anyone have anything to report here?
15:14:37 <belmoreira> two weeks ago we decided to discuss the blueprint: https://blueprints.launchpad.net/nova/+spec/schedule-set-availability-zones
15:14:52 <belmoreira> are there any opinions now?
15:16:28 <n0ano> I don't see how you belong to multiple availability zones, can you explain that?
15:17:24 <belmoreira> my understanding is that az is now defined in aggregates
15:18:09 <belmoreira> a host can belong to different aggregates that have different azs
15:18:46 <n0ano> that seems - odd - to say the least, is that a feature that is actually being used?
15:20:19 <belmoreira> I don't see much sense in it either. But you can have a setup where a host has multiple azs
15:20:36 <belmoreira> that's why I raised the question in the BP
15:21:06 <belmoreira> but that is what is available now in nova...
15:21:42 <belmoreira> about the BP what do you guys think about it?
15:22:00 <n0ano> I'm not qualified to make a decision but I'd consider changing that to be a 1-1 map, host to AZ, but that might start a bit of a discussion
15:22:50 <jgallard> n0ano, +1
15:23:07 <belmoreira> I agree.
15:24:07 <belmoreira> but the BP is to have multiple default azs for an instance
15:24:13 <belmoreira> instead of only one.
15:25:11 <shanewang> what's the benefit of setting multiple availability zones? for dividing more zones logically and physically.
15:25:33 <jgallard> belmoreira, in the implementation part, "After find the node that will run the instance set the availability zone of the node to the instance", does that mean AZs are provided dynamically according to clients?
15:25:46 <n0ano> back to your BP, I'm unclear on the use case, if the AZ is set `after` scheduling then it can't be used for physical separation
15:27:09 <belmoreira> jgallard. according to the client select using also the az_zone filter.
15:27:52 <belmoreira> when an instance is booted and the az filter is enabled only the clients of the default az are selected.
15:28:19 <belmoreira> if instead of only one default az we have several
15:28:40 <belmoreira> the az filter passes for all the nodes in them
15:28:59 <belmoreira> the best node is selected and then the az is set
15:29:13 <n0ano> so you're only addressing the case where the user `doesn't` specify an AZ
15:29:28 <belmoreira> exactly
15:29:40 <belmoreira> means that he doesn't care
15:30:04 <belmoreira> but we can provide some reliability to the instances
15:30:17 <belmoreira> starting maybe in different azs
15:30:26 <n0ano> I can accept that and it makes sense but I still am having issues with setting the AZ after selecting the node, that means AZs don't apply to physical separation
15:31:16 <belmoreira> n0ano. it is physical separation as well...
15:32:00 <belmoreira> if you set two default azs it means that you expect that the instances start in one of them
15:32:20 <n0ano> how, if you assign AZ after selecting the node then any node can be part of any AZ so there's no way to physically separate two nodes.
15:32:45 <jgallard> really? if a node can be moved from one AZ to another, or even if a node can belong to multiple AZs at the same time?
15:33:07 <n0ano> jgallard, that's my point
15:33:28 <jgallard> n0ano, yes, and I agree with you
15:33:46 <belmoreira> if a node belongs to multiple azs it is an admin problem
15:34:06 <belmoreira> in that case you don't have physical separation
15:34:29 <belmoreira> I completely agree with that
15:34:51 <n0ano> but `assigned after node selection` => not under admin control, this is under user control
15:35:49 <n0ano> hang on, I think I see the confusion ...
15:35:50 <belmoreira> but the az considered during the scheduler
15:35:55 <belmoreira> scheduling
15:36:21 <jgallard> user control?
15:36:42 <n0ano> a node is assigned to an AZ, the schedule request will select a node (potentially using AZ criteria) and, after the node is selected, the AZ for the `instance` is set
15:36:57 <belmoreira> yes
15:37:47 <belmoreira> but instead of only one default az, if you have let's say two azs
15:38:11 <belmoreira> the scheduler uses the two azs for filtering
15:38:18 <belmoreira> and picks the node
15:38:31 <belmoreira> and the az of the node is set in the instance
15:38:42 <n0ano> in that case I don't see an issue with having multiple default AZs, if the user didn't specify then the user doesn't care which AZ it winds up in.
15:39:06 <belmoreira> for my use case it is important to have this
15:39:18 <belmoreira> because users usually never define an AZ
15:39:24 <jgallard> shouldn't this use case be handled with cells?
15:39:25 <belmoreira> and we need to have multiple
15:39:47 <belmoreira> so we end up with the default az very busy
15:40:17 <belmoreira> jgallard: we also have cells… but our cells are big
15:40:31 <belmoreira> we like to split them into azs
15:40:36 <n0ano> belmoreira, indeed. To me the only issue is which of the default AZs to pick, round robin or random or least used or ...
15:40:43 <jgallard> ok, in your configuration AZs are partitions of cells, right?
15:40:49 <jgallard> belmoreira, ah ok
15:42:08 <belmoreira> n0ano: my proposal is to change the availability_zone filter to pass for all azs defined as default.
15:42:27 <belmoreira> having the nodes of all azs
15:42:42 <jgallard> belmoreira, is my understanding correct?
--> the idea is to have a kind of "scheduler for AZ", this scheduler will pick one availability zone among several default ones in the case the user doesn't choose a specific AZ
15:42:49 <belmoreira> the scheduler will select the best one considering the other filters
15:42:59 <n0ano> and then let the normal scheduling choose the best node - seems like a reasonable, fairly simple change
15:42:59 <belmoreira> so it is not random
15:44:06 <belmoreira> the point that I raised in the BP and we started the discussion with it
15:44:34 <belmoreira> is what to do if a host belongs to different aggregates
15:44:45 <belmoreira> and has multiple azs
15:45:20 <belmoreira> I agree that is bad… but someone can have a setup like this.
15:45:44 <n0ano> seems simple, if the host belongs to at least one of the default aggregates it passes, otherwise it only passes if it's a member of the specified AZ
15:47:13 <belmoreira> if it belongs to only one az of the default list it passes… and that az is set if the host is selected
15:47:36 <belmoreira> but if it belongs to more than one az in the default az list?
15:47:43 <belmoreira> it passes as well
15:47:53 <belmoreira> and what az do we set on the instance?
15:48:05 <belmoreira> probably random?
15:48:09 <n0ano> belmoreira, random, the user didn't specify so the user doesn't care
15:48:12 <jgallard> probably :)
15:48:31 <belmoreira> ok. good
15:48:45 <jgallard> or maybe the one which is the least loaded?
15:49:07 <belmoreira> but for that we need more queries
15:49:15 <n0ano> jgallard, the normal scheduling should have found the least loaded so I don't think we need to worry about that in the AZ filter
15:49:17 <belmoreira> random is to avoid that
15:50:05 <jgallard> but, I mean, if you are on a node with multiple AZs, the admin should want to give priority to the AZ which is least loaded
15:50:12 <jgallard> I'm not sur if i'm clea
15:50:15 <jgallard> clear
15:51:02 <belmoreira> jgallard: completely agree… but how to know what is the least loaded?
15:51:19 <n0ano> I don't think the issue is `least loaded AZ` so much as it's `most optimal host` and the rest of the scheduling determines that
15:51:44 <jgallard> belmoreira, héhé yes, as you said this probably needs more queries
15:53:53 <jgallard> n0ano, in fact, what I want to explain is that, if the user doesn't care about a specific AZ, and a node with multiple AZs is selected, perhaps the admin will want to give a policy to select a preferred AZ among the ones available on that node
15:54:17 <jgallard> but this is not targeted by this BP
15:55:12 <n0ano> potentially but remember, `prefence` is determined by the weighting functions, filters only do yes/no so, as you say, finding the preferred AZ would be a different BP
15:55:22 <n0ano> s/prefence/preference
15:55:52 <belmoreira> ok. I will start to implement this
15:56:19 <n0ano> belmoreira, you might want to update the BP to remove the question and put in the decision
15:56:40 <belmoreira> ok
15:56:50 <n0ano> #opens
15:57:00 <n0ano> just a few minutes left, does anyone have any opens for today?
15:57:10 <jgallard> belmoreira, maybe you can ask a question about the fact that, in the current implementation it's possible to have several AZs on a node?
15:57:17 <jgallard> (on the mailing list)
15:57:52 <n0ano> jgallard, good idea (I like using the mailing lists)
15:58:22 <belmoreira> ok
15:58:29 <jgallard> n0ano, same for me :-)
15:58:56 <jgallard> belmoreira, thanks!
15:59:03 <n0ano> hearing silence I think it's time to wrap up, tnx everyone, good discussion.
15:59:10 <jgallard> thanks to all!
15:59:27 <n0ano> #endmeeting
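
Note on the availability-zone decision reached in the log: the rule agreed between n0ano and belmoreira (a host passes if it is in the requested AZ, or, when no AZ was requested, if it is in at least one default AZ; the instance's AZ is then picked at random among the matching ones) could be sketched roughly as below. This is only an illustration, not nova code: the function names, the `default_azs` parameter, and the use of plain sets of AZ names are all assumptions made for the example.

```python
import random


def az_filter_passes(host_azs, requested_az, default_azs):
    """Yes/no filter decision for a single host (hypothetical helper).

    host_azs: set of AZ names the host belongs to through its aggregates.
    requested_az: AZ named by the user, or None if the user did not choose.
    default_azs: set of AZ names treated as defaults (assumed config value).
    """
    if requested_az is not None:
        # The user asked for a specific AZ: the host must be a member of it.
        return requested_az in host_azs
    # No AZ requested: any host in at least one default AZ passes.
    return bool(host_azs & default_azs)


def az_for_instance(host_azs, requested_az, default_azs, rng=random):
    """AZ to record on the instance after the scheduler has picked the host."""
    if requested_az is not None:
        return requested_az
    candidates = sorted(host_azs & default_azs)
    if not candidates:
        raise ValueError("host is not in any default AZ")
    # Several matching AZs: pick one at random, since the user did not care.
    return rng.choice(candidates)
```

The filter stays a pure yes/no check, matching n0ano's remark that preference belongs to the weighting functions; choosing a preferred AZ by load would be a separate change.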