14:59:58 <n0ano> #startmeeting scheduler
14:59:59 <openstack> Meeting started Tue Jun  4 14:59:58 2013 UTC.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:03 <openstack> The meeting name has been set to 'scheduler'
15:00:25 <n0ano> show of hands, anyone here for the scheduler meeting?
15:01:02 <belmoreira> here
15:01:19 <jgallard> hi!
15:02:44 <n0ano> hmmm, thin turnout so far, we should make all our major decisions today and force them through :-)
15:03:21 <senhuang> hello~
15:03:38 <belmoreira> :)
15:03:49 <n0ano> BTW, my new tip for the day: when moving into a new house, don't plug your computers into a switched outlet, the total silence when you switch it off is very disconcerting :-)
15:04:38 <n0ano> Well, let's get started...
15:04:47 * glikson here
15:04:49 <n0ano> #topic host directory service
15:05:10 <n0ano> Does anyone here know enough about this to talk about it? I don't
15:05:34 <PhiLDay> sorry - missed the subject ?
15:05:44 <n0ano> host directory service
15:06:02 <n0ano> the issue came up at the Havana summit but there's no BP and I don't really know what it is
15:06:36 <PhiLDay> Don't even remember it from the summit to be honest
15:06:49 <belmoreira> I'm not aware of that… can you explain?
15:07:21 <n0ano> it's there on the etherpad but that's about it, if no one is knowledgeable about it I'm just going to drop it from our list
15:08:02 <senhuang> i believe it is something like polling the hosts for their capabilities?
15:08:06 <senhuang> but i am not sure.
15:08:37 <n0ano> if someone is really interested they can create a BP, until then I'd prefer to drop it.
15:08:52 <PhiLDay> "Host directory service" which stores extended information about hosts and configuration for scheduler. It is located inside the environment and has authentication. (cooperative work of the Dell and Mirantis teams).
15:09:09 <PhiLDay> Taken from the Summit abstract
15:09:14 <n0ano> PhiLDay, is there a BP for that?
15:09:30 <PhiLDay> Not that I know of
15:10:16 <n0ano> Well, no BP and the people involved haven't attended this meeting yet so I'm not seeing that much interest in the subject.
15:10:54 <n0ano> Why don't we move on for now...
15:11:00 <n0ano> #topic opens
15:11:47 <n0ano> I think I'd like to open things up, we've discussed all the issues from the Havana summit, is there anything people want to go over in more detail for now?
15:13:56 <PhiLDay> Nothing burning for me - just need to free up some time to start work
15:14:24 <n0ano> I'm hearing a lot of silence (which is not necessarily bad)
15:14:38 <belmoreira> I have a BP that probably is good to have a discussion on it
15:14:50 <belmoreira> https://blueprints.launchpad.net/nova/+spec/schedule-set-availability-zones
15:15:33 <belmoreira> but it's probably better to do that in another meeting, after people have read it
15:15:51 <belmoreira> I can give you an overview if you are interested
15:15:55 <PhiLDay> Just taking a quick look now.
15:16:36 <belmoreira> The idea is to define a list of default availability zones, not just the single one you have now
15:16:41 <n0ano> #topic set availability zones
15:16:54 <PhiLDay> BTW we could also talk about the isolation filter change ?
15:16:59 <belmoreira> if we do that we need to select the best one
15:17:30 <senhuang> what do you mean by "the best" availability zone?
15:18:04 <n0ano> belmoreira, so, if I read you right, you are only addressing the case where availability zones are defined but none is specified in the schedule request
15:18:15 <belmoreira> schedule to the availability zone that has, for example, the most free RAM
15:18:28 <glikson> belmoreira: how would this compare to cells?
15:18:30 <senhuang> it seems to me "ram" is more like a property of a node
15:18:52 <PhiLDay> The biggest issue I see is that in some configurations (like ours ;-) instances and volumes have to be in the same AZ - so if you change the "default" AZ dynamically in Nova then it will cause issues for people expecting to work across Nova and Cinder
15:18:54 <glikson> there is ongoing work to add scheduling capabilities across cells..
15:19:05 <n0ano> senhuang, indeed, .5 G for each of 5 nodes is not the same as 2G on one node
15:19:48 <belmoreira> what I would like to have is a set of default availability zones if the user doesn't define any on VM create
15:19:58 <belmoreira> more or less like aws ec2
15:20:12 <senhuang> n0ano: yes. it also depends on the algorithm to calculate/aggregate the capabilities for nodes in a zone
15:20:14 <PhiLDay> I think if you're looking for cross-AZ or cross-cell criteria then it should be things like #instances, #running_creates
15:20:21 <belmoreira> the selection can be random (if you have a list of AZs)
15:20:37 <belmoreira> or it can use the scheduler to select the AZ
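A minimal sketch of the selection idea belmoreira describes, assuming a hypothetical default_availability_zones option and per-AZ free-RAM stats (neither exists in Nova; all names here are illustrative only):

    import random

    # Hypothetical config option: the pool of default AZs discussed above.
    default_availability_zones = ['az1', 'az2', 'az3']

    def pick_default_az(requested_az, az_stats=None):
        """Pick an AZ when the request named none: at random from the
        configured list, or the "best" one (here: most free RAM)."""
        if requested_az:
            return requested_az
        if az_stats:
            return max(default_availability_zones,
                       key=lambda az: az_stats.get(az, {}).get('free_ram_mb', 0))
        return random.choice(default_availability_zones)

    # Example: az2 wins because it reports the most free RAM.
    print(pick_default_az(None, {'az1': {'free_ram_mb': 2048},
                                 'az2': {'free_ram_mb': 8192}}))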
15:21:05 <jgallard> nice idea, but from my understanding, cells will do that, no?
15:21:22 <PhiLDay> but not everyone will deploy cells
15:21:30 <belmoreira> in our setup we have AZs inside cells
15:21:45 <belmoreira> the scheduling is done inside a cell
15:22:47 <PhiLDay> I like the idea - just trying to think how to make it work consistently across Nova and Cinder
15:22:57 <n0ano> I plead ignorance, what does the scheduler currently do in your situation?
15:23:15 <senhuang> when all the nodes within the configured set of AZs run out of resources, what will the scheduler do?
15:23:48 <senhuang> PhiLDay: i think you can define the same default set of AZs for both Cinder and Nova
15:24:16 * jgallard is thinking about the potential complexity of the scheduling : scheduling between host aggregates / scheduling between AZ / scheduling between cells
15:25:43 <senhuang> jgallard: once the query-scheduler function is implemented, the scheduler will only be responsible for host selection. it might not be a big complexity issue.
15:25:43 <belmoreira> I will be more descriptive in the BP. And if we can discuss it in the next meeting that would be great.
15:25:58 <PhiLDay> Today you can set the same default AZ (single) for both Nova and Cinder - but with this BP you would get a list - and so an instance create and a volume create both without an AZ specified could end up in different AZs
15:26:25 <n0ano> belmoreira, NP, I'll add this to the agenda, looks like there are still some details that need to be sorted out
15:27:03 <senhuang> PhiLDay: the right solution would be having a cross-project resource selection/scheduler
15:27:03 <jgallard> senhuang, ok, maybe, I need to think a little bit more about it
15:27:09 <belmoreira> PhiLDay: yes, good point. In my setup we don't have Cinder in different AZs.
15:27:18 <jgallard> senhuang, +1
15:28:05 <n0ano> OK, let's all think about this and discuss further next week
15:28:13 <n0ano> #topic isolation filter
15:28:25 <n0ano> PhiLDay, this is yours, right?
15:28:59 <PhiLDay> You almost want it to be sticky for a user - i.e. the first default AZ is random, but after that they always go to that one. If you could define the default AZ per project in Keystone maybe
15:30:02 <PhiLDay> It was a question really about a review that belmoreira has going through, which others have suggested may be related to whole-host allocation
15:30:18 <PhiLDay> https://review.openstack.org/#/c/28635/
15:31:09 <PhiLDay> The filter is similar to an existing filter - but there wasn't any consensus amongst the reviewers on whether to have 3 simple (but similar) filters or one configurable filter
15:32:04 <PhiLDay> Also whether the aggregate metadata used by the filters should be the same or different
15:32:26 <PhiLDay> As it's scheduler related, and it seems to be a slow news day here, I thought it might be worth getting opinions
15:32:27 <belmoreira> Would be good to have other opinions
15:32:53 <n0ano> hmmm, I don't like the idea of 3 mostly the same filters (code duplication issues with things getting out of sync) but configuration can be a pain also, I think this is all in the implementation details
15:33:45 <PhiLDay> I was leaning towards having 3 filters but aligning on metadata (As configuring which filter to use is the same as configuring a filter)
15:34:01 <belmoreira> Having different filters is cleaner and doesn't change any behavior from previous releases.
15:34:18 <n0ano> PhiLDay, back to the code duplication concerns
15:34:54 <n0ano> belmoreira, but making sure the default configuration matches the previous releases should solve that issue
15:35:17 <belmoreira> PhiLDay: the only problem I see with having the same metadata is if someone wants to use the filters simultaneously but with different behaviors. It would not be possible to configure that case.
15:35:26 <PhiLDay> Each filter is very short - and it is easier to see what it does as a separate filter. But I think we should make it easy for someone to switch between them without having to set up new metadata values in their aggregates
15:35:49 <n0ano> I think I'm talking myself into one configurable filter, but it's not a strong preference
15:36:33 <PhiLDay> belmoreira>  I was looking at the filters as being in effect exclusive - do you think there is a valid use case for using them in combination ?
15:37:08 <PhiLDay> AggregateMultiTenancyIsolation:  Reserves hosts in specific aggregates only for use by selected projects (in effect limits other projects to the subset of the nodes not on those aggregates)    - Controlled Projects can use specific aggregate and any     other non-specific aggregate  - Other Projects can use any non-specific aggregate       ProjectsToAggregateFilter:  Constrains projects to specific aggregates.        - Co
15:37:25 <PhiLDay> Ok - that didn't paste well ;-(
15:37:35 <n0ano> PhiLDay, you noticed :-)
15:37:37 <PhiLDay> AggregateMultiTenancyIsolation:  Reserves hosts in specific aggregates only for use by selected projects (in effect limits other projects to the subset of the nodes not on those aggregates)
15:37:51 <belmoreira> yes… I agree with you. I can't find a valid use case :)
15:38:23 <PhiLDay> ProjectsToAggregateFilter:  Constrains projects to specific aggregates. (i.e mandatory isolation)
15:39:29 <PhiLDay> and then for whole-host I want something more like AggregateMultiTenancyIsolation but with the user able to choose if they want to go into a restricted aggregate or not
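For concreteness, a rough sketch of the three behaviours described above, assuming the allowed project IDs for a host have already been collected from its aggregate metadata; this is an illustrative reading, not the code under review, and the tenant_is_constrained / opt_in inputs are assumptions:

    def isolation_passes(tenant_id, allowed_tenants):
        # AggregateMultiTenancyIsolation-style: aggregates carrying metadata are
        # reserved for the listed projects; hosts with no metadata accept anyone.
        return not allowed_tenants or tenant_id in allowed_tenants

    def mandatory_isolation_passes(tenant_id, allowed_tenants, tenant_is_constrained):
        # ProjectsToAggregateFilter-style: a constrained project may only land on
        # aggregates that list it; unconstrained projects stay off reserved hosts.
        if tenant_is_constrained:
            return tenant_id in allowed_tenants
        return not allowed_tenants

    def whole_host_passes(tenant_id, allowed_tenants, opt_in):
        # Whole-host-style: like isolation, but the user chooses whether the
        # request should go into a restricted aggregate at all.
        if not allowed_tenants:
            return not opt_in
        return opt_in and tenant_id in allowed_tenants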
15:40:42 <belmoreira> but it makes sense to change the metadata in AggregateMultiTenancyIsolation to be the same as in ProjectsToAggregateFilter
15:41:04 <belmoreira> and also give support to multiple projects in AggregateMultiTenancyIsolation
15:41:40 <belmoreira> instead of using the metadata key in AggregateMultiTenancyIsolation
15:41:47 <belmoreira> ?
15:42:57 <PhiLDay> I hadn't picked up that AggregateMultiTenancyIsolation was single tenant only ?       I agree that would be cleaner - I guess it causes a compatibility issue though.
15:43:41 <belmoreira> yes… that's why at the end I moved to a different filter
15:44:58 <PhiLDay> The doc string says "If a host is in an aggregate that has the metadata key "filter_tenant_id" it can only create instances from that tenant(s)."
15:45:27 <PhiLDay> isn't filter_tenant_id in fact a list of tenant IDs ?
15:45:52 <belmoreira> no. It only supports one project.
15:46:50 <PhiLDay> if tenant_id not in metadata["filter_tenant_id"]:   is treating the metadata value as a list, no ?
15:49:11 <jgallard> URL : https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L43
15:52:34 <belmoreira> yes… you get all filter_tenant_ids from that host
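A standalone sketch (not the Nova source linked above) of why the check reads like a list test even though each aggregate stores a single filter_tenant_id: the per-host metadata lookup unions values across all aggregates containing the host, so a host in several aggregates accumulates several tenant IDs:

    def collect_host_metadata(aggregates, host):
        # Conceptually what the per-host aggregate metadata lookup returns:
        # metadata values unioned by key across every aggregate holding the host.
        metadata = {}
        for agg in aggregates:
            if host in agg['hosts']:
                for key, value in agg['metadata'].items():
                    metadata.setdefault(key, set()).add(value)
        return metadata

    def host_passes(tenant_id, aggregates, host):
        metadata = collect_host_metadata(aggregates, host)
        if metadata:
            # looks like a list test, but the set holds one value per aggregate
            return tenant_id in metadata.get('filter_tenant_id', set())
        return True

    aggregates = [
        {'hosts': ['host1'], 'metadata': {'filter_tenant_id': 'tenant-a'}},
        {'hosts': ['host1', 'host2'], 'metadata': {'filter_tenant_id': 'tenant-b'}},
    ]
    print(host_passes('tenant-b', aggregates, 'host1'))  # True: host1 is in both aggregates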
15:52:45 <jog0> I joined the meeting late, but was wondering if the floor will be open for a few minutes at the end?  I have a general question
15:53:39 <n0ano> jog0, as long as it's quick, we're running out of time.
15:53:57 <belmoreira> but you can only define one filter_tenant_id per aggregate
15:54:18 <jog0> So at the last summit BlueHost announced they have a 16k node openstack cluster
15:54:37 <n0ano> belmoreira, PhiLDay are we winding down on this, maybe people should follow the review link and chime in there or we can continue the discussion next week.
15:54:48 <jog0> and one of the first things they had to change was the scheduler, as it didn't work at that scale. What are you guys discussing to make scheduling work at scale?
15:55:16 <belmoreira> PhiLDay: this means we can't have filter_tenant_id=id1, id2, id3
15:55:24 <belmoreira> per aggregate
15:55:53 <n0ano> jog0, would be very interested to see what BlueHost did, I don't think we're doing anything specific to address scale right now.
15:56:28 <jog0> n0ano: ohh :(, scale seems like one of the most important use cases for all this stuff
15:56:37 <jog0> bluehost gutted the scheduler all together
15:56:43 <jog0> and just swapped in a simple one
15:57:26 <jog0> compute nodes broadcasting stats to all schedulers every minute turns out to not scale well among other things
15:57:33 <n0ano> jog0, without identifying what the problems were with the current scheduler?
15:58:03 <PhiLDay> @belmoreira - have to fly now, maybe we can follow up on e-mail (phil.day@hp.com)
15:58:09 <jog0> n0ano: ^, and the schedulers just spun processing all the incoming compute node reports
15:58:45 <jog0> comstud and I have a bug open on this, https://bugs.launchpad.net/nova/+bug/1178008
15:58:46 <uvirtbot> Launchpad bug 1178008 in nova "publish_service_capabilities does a fanout to all nova-compute" [Undecided,Triaged]
15:58:47 <n0ano> jog0, maybe that was a solvable problem rather than just replacing things
15:58:47 <jgallard> PhiLDay, belmoreira , but as a compute can belong to several aggregates, it can have several tenants?
15:58:54 <senhuang> bluehost's scalability is amazing
15:59:13 <jog0> n0ano: agreed, so lets solve it
15:59:17 <n0ano> jog0,  I would love to discuss this further next week if possible, will you be here then?
15:59:29 <jog0> sure
15:59:45 <n0ano> OK, we'll have to close for now but I'm hoping next week will be lively
15:59:49 <n0ano> Tnx everyone
15:59:54 <n0ano> #endmeeting