15:00:19 <n0ano> #startmeeting scheduler sub-group
15:00:19 <openstack> Meeting started Tue May 21 15:00:19 2013 UTC.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 <openstack> The meeting name has been set to 'scheduler_sub_group'
15:00:33 <n0ano> show of hands, anyone here for the scheduler meeting?
15:00:37 * glikson here
15:00:43 <jgallard_> hi all :)
15:00:50 <senhuang> hi all
15:01:28 <n0ano> Well, while waiting for people to get here, a little administrivia...
15:02:06 <n0ano> I have to leave after 20 min (my wife is having oral surgery) senhuang can I tag you to chari when I leave (mainly make sure you do an #endmeeting)?
15:02:24 <senhuang> sure
15:02:26 <n0ano> s/chari/chair
15:02:31 <n0ano> senhuang, tnx
15:02:43 <senhuang> n0ano: no problem!
15:02:48 <jgallard_> n0ano, I'm not sure someone else can do an endmeeting for you
15:02:54 <n0ano> #chair senhuang
15:02:55 <openstack> Current chairs: n0ano senhuang
15:03:20 <n0ano> uh oh, I have to be careful, senhuang can override my commands now :-)
15:03:27 <n0ano> anyway, let
15:03:41 <jgallard_> n0ano, ok :)
15:03:43 <n0ano> let's get started, maybe some lurkers will appear
15:04:00 <n0ano> #topic network bandwidth aware scheduling
15:04:14 <PhilDay> I filed a BP to cover this.
15:04:22 <n0ano> anyone here want to drive this discussion (I have some ideas but this is not my BP)
15:04:31 <n0ano> PhilDay, you have the floor
15:04:44 <alaski> hi all
15:04:48 <garyk> hi
15:05:21 <PhilDay> Basic idea for now is pretty simple - count the bandwidth allocated to flavors in the the same sort of way that we do memory
15:05:41 <PhilDay> so that its possibel to use overall bandwidth of a server as a scheduling constraint
15:06:21 <n0ano> fine idea, my concern is that this is rather specific, I'd prefer a more general mechanism
15:06:25 <garyk> PhilDay: do you mean taking free bandwidth into account?
15:06:27 <PhilDay> Difference is that the hypervisor probably can't derive the total available bandwidth, so it will need to be configured on a per host basis
15:06:27 <senhuang> this requires the host-manager to report the total bandwidth capability to the scheduler?
15:06:35 <PhilDay> Yep
15:06:39 <n0ano> could this be part of something like extending data in host state that has been proposed?
15:06:53 <PhilDay> If that is in place then we'd build on that for sure
15:07:14 <PhilDay> the flavor already has an rxtx_factor which is used in Xen
15:07:33 <senhuang> PhilDay: I see. that is good.
15:07:34 <PhilDay> there used to be an rxtx_cap, which is more along the lines that we were looking for
15:07:42 <glikson> yep, sounds like a reasonable extension to yet another kind of resource that would make sense to manage host capacity for.
15:08:23 <n0ano> would this be an extension to the rxtx_factor or a replacement for it?
15:08:32 <PhilDay> I'm not quiet sure what triggered the shift,  rxtx_factor is just a relative measure - so what you get with that depoend on which host you get scheduled to
15:09:16 <senhuang> what the constraint will look like? will we have a constraint for rx and one for tx?
15:09:21 <PhilDay> I was thinkign that maybe we could also set the rxtx valeu as a port property in case Quantum can do somethign more fancy with it in terms of QoS -
15:09:56 <PhilDay> I was thiking more of just a sinlge cap, but I guess it could be two values if you think there is a use case.
15:10:13 <senhuang> PhilDay: that might be something we can discuss with Quauntum team. they are also working on a QoS blueprint.
15:10:18 <n0ano> if the rxtx is truly a ratio then it wouldn't really work for a QoS value, I can still steal rx bandwidth just by doing some tx
15:10:20 <PhilDay> Given that it goes into the flavor I'm wary of havign to have lost of different flavors for people to choose from
15:11:33 <n0ano> PhilDay, in that case maybe it's more appropriate to be part of the image attributes
15:11:53 <garyk> in quantum there was a little discusion about qos, but nothing concreate at the moment
15:11:57 <PhilDay> I'm always wary of buildign on too many other BPs - so maybe we juist go for a simple bandwitdh based scheduleing for now (i.e the network peer of memory capacity) so that we can stay independent
15:12:08 <n0ano> unfortunately, the cloud provider (the who worries about bandwidth) has no control over the image attributes so that might not work.
15:12:45 <PhilDay> don'f follow your comment on image attricbutes ?
15:13:03 <glikson> PhilDay: true, adding an additional kind of resource would potentially invite many more flavors.. but I personally think it still makes sense.
15:14:01 <n0ano> I thought there was an image contraints filter that utilized attributes stored as part of the meta data for the image
15:14:24 * n0ano moved to a new house and my network isn't complete, can't access the right machine to research this
15:15:08 <n0ano> senhuang, got to go, you are now in control
15:15:17 <PhilDay> Well there are all sorts of additional filters you can have - but that's just adding more constartints to the scheduler to work with.  I don't see why image attributes wouls clash with network capacity ?
15:15:20 <senhuang> n0ano: no problem.
15:15:24 <glikson> there is a filter that matches image properties. but is it really an image property?
15:15:55 <senhuang> glikson: i agree that it is not an image property
15:16:04 <PhilDay> I think it matches an image property (liek I need an AMD server) to a host capability (liek I an an AMD or x86 server)
15:16:25 <PhilDay> I.e its just a binary capability filter.
15:17:01 <PhilDay> The network bandwidth is just concerned with "where does an instance of this flavor fit"
15:17:17 <senhuang> PhilDay: There is also a json filter that can do more operators based on requests and capabilities.
15:17:19 <glikson> right -- not a capacity requirement. which sometimes could make sense, btw (to make sure that certain image can run with certain flavors).
15:17:52 <PhilDay> You can match images to flavors via min_ram and min_disk metadata
15:18:17 <PhilDay> I'm not sure I'd see a used for min_rx or min_tx though
15:18:42 <PhilDay> Fliter scheduler does both capacity and capability matching
15:19:42 <PhilDay> I think adding to the flavor definiton is a better model.
15:20:06 <PhilDay> I'd rathe rhave it as a core property that encoded in extra_specs
15:20:25 <senhuang> then the # of flavors *= # of min_rx, min_tx pairs
15:21:08 <senhuang> i guess a little bit more details on the proposal will help
15:21:10 <PhilDay> Well that depends on how much flexibility you want to provide users with
15:21:47 <PhilDay> You could make the same case for providing every single memory size, but no-one does that
15:22:01 <senhuang> PhilDay: that is true..
15:22:18 <PhilDay> There is a BP page - hang on I'll dig out the link
15:22:47 <PhilDay> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
15:22:59 <PhilDay> https://wiki.openstack.org/wiki/NetworkBandwidthEntitlement
15:23:33 <PhilDay> Maybe people could take a look and provide feedback next week ?
15:23:44 <senhuang> that is a good idea.
15:23:49 <senhuang> shall we move on?
15:24:23 <senhuang> #topic group/ensemble scheduling
15:24:40 <garyk> senhuang: you should know what this one is about :)
15:25:14 <senhuang> garyk: yep. i know. :-) let's update the whole subteam.
15:25:40 <garyk> ok, i'll let you do that
15:25:56 <senhuang> on the instance group api extension, gary has submitted 30+ patches on the db support
15:26:12 <senhuang> https://review.openstack.org/#/c/28880/
15:26:44 <senhuang> the development of api support for this is on-going work
15:26:59 <senhuang> i hope i can submit the initial patch today or tomorrow
15:27:09 <senhuang> garyk: do you need more help on the review?
15:27:37 <garyk> senhuang: i think that we have ironed out all of the issues. we just need some core guys to take a look
15:27:44 <garyk> the doc describing it all is https://docs.google.com/document/d/1QUThPfZh6EeOOz1Yhyvx-jFUYHQgvCw9yBAGDGc0y78/edit?usp=sharing
15:27:58 <senhuang> blueprint: https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
15:28:13 <senhuang> wiki: https://wiki.openstack.org/wiki/GroupApiExtension
15:28:24 <garyk> i am going to start the conductor api side soon so that the scheduler can make use of the info
15:28:40 <PhilDay> I found it a bit hard from just the DB change to work out exactly what relationships will be supported
15:29:21 <PhilDay> I thought there was a general move away from introducing features in small pieces - or has this been OK'd ?
15:29:22 <garyk> the idea is that we will have an instance group.
15:29:27 <senhuang> basically, a group-instance will contains a group of VM instances, with a list of policies that can be enforced by the scheduler
15:30:16 <PhilDay> Yeah I got the idea from the BP - what I'm saying is that it can be quite hard to meaningfully review just one part of the changes in isolation.
15:30:27 <garyk> PhilDay: agreed.
15:30:41 <PhilDay> Having a DB model with no code to drive it makes it pretty hard to see if the DB code is OK or not
15:30:47 <garyk> hopefully when we have the API and the conductor side it will all fit in together
15:30:55 <senhuang> yes.
15:31:00 <senhuang> i am working on the API part
15:31:21 <glikson> yep, we might need some refactoring around db APIs once other parts are developed
15:31:28 <garyk> i do not think that this should prevent the pacth going through. it is very generic and covers all bases at the moment
15:31:28 <PhilDay> Why not make that all same commit then ?
15:31:50 <PhilDay> Just saying that there has been push back on this kind of appraoch before
15:32:10 <PhilDay> (But I like where teh BP is headed)
15:32:38 <garyk> to be honest i am not sure what the approach in nova is. in quantum this is acceptability. we can certainly build the patches on on top of another if that will provide a better picture
15:32:50 <garyk> i just think that adding one huge feature will take a tone of time.
15:33:23 <PhilDay> Its hard to get the right balance.
15:33:56 <glikson> garyk: in some cases we used dependencies between changes to have the bigger context but still review smaller pieces
15:34:39 <garyk> glikson: i am not sure how the db piece can be made smaller. it is very isolated
15:34:44 <senhuang> the DB part is quite stand-alone implementation
15:35:00 <PhilDay> I haven't seen anyone from core comment on thsi yet - which is odd for something with 30+ iterations
15:35:03 <glikson> garyk: I mean, to link with other pieces
15:35:19 <senhuang> it also has enough meterials for review.
15:35:20 <PhilDay> Mayeb worth reaching out to Russell ?
15:36:12 <garyk> ok, i'll try and be in touch with him
15:36:16 <glikson> senhuang: correct, but in several other reviews it was stated that it is not a good practice to merge code that noone is using..
15:36:22 <PhilDay> Right now its a great patch for adding a DB table - but that seems an odd chunk to do in isolation to me
15:36:31 <senhuang> #action Reaching out to Russell about the code reviews for parts of a big feature
15:37:08 <russellb> probably need the rest before you'll get much review on the db part
15:37:25 <russellb> i wouldn't want to see it go in until the feature to use it is ready to go right behind it
15:38:14 <senhuang> russellb: okay
15:38:29 <garyk> in the mean time i will add in the conductor side of things. this just seems like it is really prolonging the integration and adoption
15:39:15 <PhilDay> integration maybe - but you can't really adopt anything without the API and scheduler layer ?
15:39:27 <russellb> right
15:39:54 <russellb> but my comment is just a general approach to review and acceptance of any feature
15:40:07 <russellb> we don't merge the code until the whole thing is ready (with few exceptions)
15:40:14 <senhuang> it will be a long process since there are so many pieces for the whole thing
15:40:16 <glikson> PhilDay: well, I can think of usefulness of grouping even without having it supported in the scheduler -- but sure, the bulk of it will come with the scheduler support.
15:40:20 <PhilDay> The risk of partial features is that they tend to de-stablise trunk, esp if you have to rework parts as you add the upper leyaers
15:40:31 <garyk> at the moment the scheduler has a very patchy implementation of the grouping - using the isnatnce metadata. this will be formalized and easily updated here
15:41:41 <glikson> garyk: right, ideally without changing the hint syntax
15:42:29 <PhilDay> glikson:  No objection to submtiing a complete (but smaller) subset of the grouping feature and then adding scheduler support - the important part is that it completly implements some feature
15:42:57 <garyk> i understand. i'll add the patches above this one and hopefully it will make a better picture.
15:43:20 <garyk> we will also need nova client support
15:44:08 <glikson> might make sense to add nova client support together with the api support
15:44:13 <senhuang> garyk: i should be able to have the api part ready today. should i submit a new patch or on top of the patch you have?
15:45:13 <garyk> senhuang: that would be great.
15:45:25 <garyk> PhilDay: russellb: is that acceptable?
15:46:00 <russellb> sounds like it
15:46:04 <russellb> need novaclient support too though
15:46:35 <garyk> we'll take care of that
15:47:07 <senhuang> okay. let's move on to another topic?
15:47:49 <senhuang> #topic open-discussion
15:50:07 <glikson> I have a question regarding multi-scheduler blueprint (definition of different scheduler drivers and/or policies for different host aggregates). One of the assumption that we need is that host aggregates that use this feature are disjoint. Does it sound like a reasonable assumption that admin will manage it? or do we need to introduce the notion of disjoint host aggregates in the code? it might be relevant for other features as well..
15:51:21 <russellb> why do they need to be disjoint?
15:51:37 <PhilDay> So would that be a property of an aggregate - i.e you can define an aggregate that it must be disjoint, and it will then reject hosts that are part of another aggregate ?
15:51:51 <PhilDay> I think that woudl also be useful for AZs
15:52:05 <russellb> request comes in with an AZ (host aggregate), and that would be your key to figure out which scheduling path to take, right?
15:52:28 <glikson> russellb: otherwise it might create some inconsistencies, if the same host belongs to two aggregates each associated with different scheduler..
15:53:01 <russellb> a host may be in AZ 1 and AZ 2, but the *request* specified AZ 2, so you'd schedule based on that
15:53:47 <russellb> the scheduling path you take has to be entirely based on what is in the request
15:53:51 <glikson> I didn't think to use AZ hint for that -- just properties of the flavor, like we do with standard aggregate filter..
15:53:55 <PhilDay> I was thinking that it woudl just be a useful safeguard in setting up AZs.  At the moment a host can be in more than one AZ (just beacause aggregates support it) - but I can't see a reason you would want to to that
15:54:25 <russellb> there are certainly use cases for overlapping aggregates
15:54:41 <russellb> aggregate with special hardware thing 1, and aggregate with special hardware thing 2, and some hosts may have both
15:54:50 <glikson> PhilDay: agree, disjoint aggregates based on certain property might make sense regardless of this particular feature
15:54:53 <PhilDay> So having an aggregate property that says "this aggregate must be disjoint from these aggregates" would be useful at that level to avoid mistakes
15:55:22 <primeministerp> hi all
15:55:37 <primeministerp> whoops my clock seems to be off by 5
15:56:01 <PhilDay> Agreed that overlapping is a useful capability in some cases - I think disjoint would have to be relative to specific other aggregates
15:56:13 <glikson> it might make sense to be able to express something like: "this aggregate should be disjoint with all the aggregates of type AZ"
15:56:51 <PhilDay> So why does the multi scheduler need disjoint aggregates
15:56:55 <glikson> kind of grouping of aggregates..
15:58:10 <glikson> PhilDay: for consistency.. e.g., if we want to know which scheduler to use to migrate a given instance off a host, we would go to the scheduler associated with the "scheduling aggregate" that this host belongs to
15:58:14 <PhilDay> For use in the scheduler you'd need disjoint to not be linked just to aggregates of type AZ - maybe we also need an aggregate type then (I think AZ is a special case at the moment no) ?
15:59:16 <glikson> yep, exactly -- aggregate type sounds like a good approach. and for certain type we may require them to be disjoint (within that type).
15:59:38 <senhuang> okay. guys. the time is almost there.
15:59:46 <senhuang> let's continue the discussions next week
16:00:07 <PhilDay> works for me.  See ya
16:00:16 <glikson> ok. thanks, bye.
16:00:17 <senhuang> #end-meeting
16:00:18 <jgallard_> ok, thanks a lot, it was very interesting :)
16:00:44 <primeministerp> #endmeeting
16:00:45 <senhuang> #end
16:00:50 <senhuang> #endmeeting