15:00:19 <n0ano> #startmeeting scheduler sub-group
15:00:19 <openstack> Meeting started Tue May 21 15:00:19 2013 UTC. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 <openstack> The meeting name has been set to 'scheduler_sub_group'
15:00:33 <n0ano> show of hands, anyone here for the scheduler meeting?
15:00:37 * glikson here
15:00:43 <jgallard_> hi all :)
15:00:50 <senhuang> hi all
15:01:28 <n0ano> Well, while waiting for people to get here, a little administrivia...
15:02:06 <n0ano> I have to leave after 20 min (my wife is having oral surgery). senhuang, can I tag you to chair when I leave (mainly to make sure you do an #endmeeting)?
15:02:24 <senhuang> sure
15:02:31 <n0ano> senhuang, tnx
15:02:43 <senhuang> n0ano: no problem!
15:02:48 <jgallard_> n0ano, I'm not sure someone else can do an endmeeting for you
15:02:54 <n0ano> #chair senhuang
15:02:55 <openstack> Current chairs: n0ano senhuang
15:03:20 <n0ano> uh oh, I have to be careful, senhuang can override my commands now :-)
15:03:41 <jgallard_> n0ano, ok :)
15:03:43 <n0ano> anyway, let's get started, maybe some lurkers will appear
15:04:00 <n0ano> #topic network bandwidth aware scheduling
15:04:14 <PhilDay> I filed a BP to cover this.
15:04:22 <n0ano> anyone here want to drive this discussion? (I have some ideas but this is not my BP)
15:04:31 <n0ano> PhilDay, you have the floor
15:04:44 <alaski> hi all
15:04:48 <garyk> hi
15:05:21 <PhilDay> Basic idea for now is pretty simple - count the bandwidth allocated to flavors in the same sort of way that we do memory
15:05:41 <PhilDay> so that it's possible to use the overall bandwidth of a server as a scheduling constraint
15:06:21 <n0ano> fine idea, my concern is that this is rather specific, I'd prefer a more general mechanism
15:06:25 <garyk> PhilDay: do you mean taking free bandwidth into account?
15:06:27 <PhilDay> Difference is that the hypervisor probably can't derive the total available bandwidth, so it will need to be configured on a per-host basis
15:06:27 <senhuang> this requires the host-manager to report the total bandwidth capability to the scheduler?
15:06:35 <PhilDay> Yep
15:06:39 <n0ano> could this be part of something like the proposed extension of the data in host state?
15:06:53 <PhilDay> If that is in place then we'd build on that for sure
15:07:14 <PhilDay> the flavor already has an rxtx_factor which is used in Xen
15:07:33 <senhuang> PhilDay: I see. that is good.
15:07:34 <PhilDay> there used to be an rxtx_cap, which is more along the lines that we were looking for
15:07:42 <glikson> yep, sounds like a reasonable extension - yet another kind of resource for which it would make sense to manage host capacity.
15:08:23 <n0ano> would this be an extension to the rxtx_factor or a replacement for it?
15:08:32 <PhilDay> I'm not quite sure what triggered the shift; rxtx_factor is just a relative measure - so what you get with it depends on which host you get scheduled to
15:09:16 <senhuang> what will the constraint look like? will we have a constraint for rx and one for tx?
15:09:21 <PhilDay> I was thinking that maybe we could also set the rxtx value as a port property, in case Quantum can do something more fancy with it in terms of QoS
15:09:56 <PhilDay> I was thinking more of just a single cap, but I guess it could be two values if you think there is a use case.
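
[Editor's note: a minimal sketch of the counting scheme PhilDay describes above, modeled on the way the filter scheduler accounts for memory. Every name in it - BandwidthFilter, total_bandwidth_mbps, bandwidth_mbps_used, and the bandwidth_mbps flavor key - is hypothetical and not part of the blueprint.]

    class BandwidthFilter(object):
        """Pass only hosts with enough unallocated bandwidth for the flavor."""

        def host_passes(self, host_state, filter_properties):
            instance_type = filter_properties.get('instance_type') or {}
            requested = instance_type.get('bandwidth_mbps', 0)
            if not requested:
                return True  # flavor declares no bandwidth entitlement

            # The hypervisor cannot derive the total bandwidth, so it is
            # assumed to be configured per host by the operator and
            # surfaced in host_state alongside the usual memory figures.
            total = getattr(host_state, 'total_bandwidth_mbps', 0)
            used = getattr(host_state, 'bandwidth_mbps_used', 0)
            return (total - used) >= requested
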
15:10:13 <senhuang> PhilDay: that might be something we can discuss with the Quantum team. they are also working on a QoS blueprint.
15:10:18 <n0ano> if the rxtx is truly a ratio then it wouldn't really work for a QoS value, I can still steal rx bandwidth just by doing some tx
15:10:20 <PhilDay> Given that it goes into the flavor, I'm wary of having to have lots of different flavors for people to choose from
15:11:33 <n0ano> PhilDay, in that case maybe it's more appropriate to be part of the image attributes
15:11:53 <garyk> in quantum there was a little discussion about qos, but nothing concrete at the moment
15:11:57 <PhilDay> I'm always wary of building on too many other BPs - so maybe we just go for simple bandwidth-based scheduling for now (i.e. the network peer of memory capacity) so that we can stay independent
15:12:08 <n0ano> unfortunately, the cloud provider (the one who worries about bandwidth) has no control over the image attributes, so that might not work.
15:12:45 <PhilDay> don't follow your comment on image attributes?
15:13:03 <glikson> PhilDay: true, adding an additional kind of resource would potentially invite many more flavors.. but I personally think it still makes sense.
15:14:01 <n0ano> I thought there was an image constraints filter that utilized attributes stored as part of the metadata for the image
15:14:24 * n0ano moved to a new house and my network isn't complete, can't access the right machine to research this
15:15:08 <n0ano> senhuang, got to go, you are now in control
15:15:17 <PhilDay> Well there are all sorts of additional filters you can have - but that's just adding more constraints for the scheduler to work with. I don't see why image attributes would clash with network capacity?
15:15:20 <senhuang> n0ano: no problem.
15:15:24 <glikson> there is a filter that matches image properties. but is it really an image property?
15:15:55 <senhuang> glikson: i agree that it is not an image property
15:16:04 <PhilDay> I think it matches an image property (like "I need an AMD server") to a host capability (like "I am an AMD or x86 server")
15:16:25 <PhilDay> I.e. it's just a binary capability filter.
15:17:01 <PhilDay> The network bandwidth is just concerned with "where does an instance of this flavor fit"
15:17:17 <senhuang> PhilDay: There is also a json filter that can do more operators based on requests and capabilities.
15:17:19 <glikson> right -- not a capacity requirement. which sometimes could make sense, btw (to make sure that a certain image can run with certain flavors).
15:17:52 <PhilDay> You can match images to flavors via min_ram and min_disk metadata
15:18:17 <PhilDay> I'm not sure I'd see a use for min_rx or min_tx though
15:18:42 <PhilDay> Filter scheduler does both capacity and capability matching
15:19:42 <PhilDay> I think adding to the flavor definition is a better model.
15:20:06 <PhilDay> I'd rather have it as a core property than encoded in extra_specs
15:20:25 <senhuang> then the # of flavors gets multiplied by the # of (min_rx, min_tx) pairs
15:21:08 <senhuang> i guess a little bit more detail on the proposal will help
15:21:10 <PhilDay> Well that depends on how much flexibility you want to provide users with
15:21:47 <PhilDay> You could make the same case for providing every single memory size, but no-one does that
15:22:01 <senhuang> PhilDay: that is true..
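
[Editor's note: a sketch of the two places the cap could live, per the exchange above. Both the bandwidth_mbps core field and the quota:bandwidth_mbps extra_specs key are illustrative only; neither is an actual nova flavor attribute.]

    def requested_bandwidth_mbps(instance_type):
        """Return the flavor's bandwidth entitlement in Mbps (0 = none set)."""
        # PhilDay's preference: a first-class flavor property, sitting
        # alongside memory_mb, vcpus and root_gb.
        if instance_type.get('bandwidth_mbps') is not None:
            return int(instance_type['bandwidth_mbps'])
        # The alternative: tucked into extra_specs like other
        # scheduler-visible knobs.
        extra_specs = instance_type.get('extra_specs') or {}
        return int(extra_specs.get('quota:bandwidth_mbps', 0))
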
15:22:18 <PhilDay> There is a BP page - hang on, I'll dig out the link
15:22:47 <PhilDay> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
15:22:59 <PhilDay> https://wiki.openstack.org/wiki/NetworkBandwidthEntitlement
15:23:33 <PhilDay> Maybe people could take a look and provide feedback next week?
15:23:44 <senhuang> that is a good idea.
15:23:49 <senhuang> shall we move on?
15:24:23 <senhuang> #topic group/ensemble scheduling
15:24:40 <garyk> senhuang: you should know what this one is about :)
15:25:14 <senhuang> garyk: yep. i know. :-) let's update the whole subteam.
15:25:40 <garyk> ok, i'll let you do that
15:25:56 <senhuang> on the instance group api extension, gary has submitted 30+ patches on the db support
15:26:12 <senhuang> https://review.openstack.org/#/c/28880/
15:26:44 <senhuang> the development of api support for this is on-going work
15:26:59 <senhuang> i hope i can submit the initial patch today or tomorrow
15:27:09 <senhuang> garyk: do you need more help on the review?
15:27:37 <garyk> senhuang: i think that we have ironed out all of the issues. we just need some core guys to take a look
15:27:44 <garyk> the doc describing it all is https://docs.google.com/document/d/1QUThPfZh6EeOOz1Yhyvx-jFUYHQgvCw9yBAGDGc0y78/edit?usp=sharing
15:27:58 <senhuang> blueprint: https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
15:28:13 <senhuang> wiki: https://wiki.openstack.org/wiki/GroupApiExtension
15:28:24 <garyk> i am going to start the conductor api side soon so that the scheduler can make use of the info
15:28:40 <PhilDay> I found it a bit hard from just the DB change to work out exactly what relationships will be supported
15:29:21 <PhilDay> I thought there was a general move away from introducing features in small pieces - or has this been OK'd?
15:29:22 <garyk> the idea is that we will have an instance group.
15:29:27 <senhuang> basically, an instance group will contain a group of VM instances, with a list of policies that can be enforced by the scheduler
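
[Editor's note: a schematic of the instance-group concept senhuang just summarized, distilled from the blueprint discussion. Field and policy names are illustrative only, not the actual schema under review in change 28880.]

    class InstanceGroup(object):
        """A named set of instances plus scheduler-enforced policies."""

        def __init__(self, name, policies=None, member_uuids=None):
            self.name = name
            # Placement policies the scheduler enforces across members,
            # e.g. 'anti-affinity' (spread across hosts) or 'affinity'
            # (pack onto the same host).
            self.policies = list(policies or [])
            # UUIDs of the instances belonging to the group.
            self.member_uuids = list(member_uuids or [])
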
15:30:16 <PhilDay> Yeah I got the idea from the BP - what I'm saying is that it can be quite hard to meaningfully review just one part of the changes in isolation.
15:30:27 <garyk> PhilDay: agreed.
15:30:41 <PhilDay> Having a DB model with no code to drive it makes it pretty hard to see if the DB code is OK or not
15:30:47 <garyk> hopefully when we have the API and the conductor side it will all fit together
15:30:55 <senhuang> yes.
15:31:00 <senhuang> i am working on the API part
15:31:21 <glikson> yep, we might need some refactoring around db APIs once other parts are developed
15:31:28 <garyk> i do not think that this should prevent the patch going through. it is very generic and covers all bases at the moment
15:31:28 <PhilDay> Why not make it all the same commit then?
15:31:50 <PhilDay> Just saying that there has been push back on this kind of approach before
15:32:10 <PhilDay> (But I like where the BP is headed)
15:32:38 <garyk> to be honest i am not sure what the approach in nova is. in quantum this is acceptable. we can certainly build the patches one on top of another if that will provide a better picture
15:32:50 <garyk> i just think that adding one huge feature will take a ton of time.
15:33:23 <PhilDay> It's hard to get the right balance.
15:33:56 <glikson> garyk: in some cases we used dependencies between changes to get the bigger context but still review smaller pieces
15:34:39 <garyk> glikson: i am not sure how the db piece can be made smaller. it is very isolated
15:34:44 <senhuang> the DB part is quite a stand-alone implementation
15:35:00 <PhilDay> I haven't seen anyone from core comment on this yet - which is odd for something with 30+ iterations
15:35:03 <glikson> garyk: I mean, to link it with other pieces
15:35:19 <senhuang> it also has enough material for review.
15:35:20 <PhilDay> Maybe worth reaching out to Russell?
15:36:12 <garyk> ok, i'll try and be in touch with him
15:36:16 <glikson> senhuang: correct, but in several other reviews it was stated that it is not a good practice to merge code that no one is using..
15:36:22 <PhilDay> Right now it's a great patch for adding a DB table - but that seems an odd chunk to do in isolation to me
15:36:31 <senhuang> #action Reach out to Russell about the code reviews for parts of a big feature
15:37:08 <russellb> probably need the rest before you'll get much review on the db part
15:37:25 <russellb> i wouldn't want to see it go in until the feature to use it is ready to go right behind it
15:38:14 <senhuang> russellb: okay
15:38:29 <garyk> in the meantime i will add in the conductor side of things. this just seems like it is really prolonging the integration and adoption
15:39:15 <PhilDay> integration maybe - but you can't really adopt anything without the API and scheduler layer?
15:39:27 <russellb> right
15:39:54 <russellb> but my comment is just a general approach to review and acceptance of any feature
15:40:07 <russellb> we don't merge the code until the whole thing is ready (with few exceptions)
15:40:14 <senhuang> it will be a long process since there are so many pieces to the whole thing
15:40:16 <glikson> PhilDay: well, I can think of uses for grouping even without having it supported in the scheduler -- but sure, the bulk of it will come with the scheduler support.
15:40:20 <PhilDay> The risk of partial features is that they tend to de-stabilise trunk, esp. if you have to rework parts as you add the upper layers
15:40:31 <garyk> at the moment the scheduler has a very patchy implementation of the grouping - using the instance metadata. this will be formalized and easily updated here
15:41:41 <glikson> garyk: right, ideally without changing the hint syntax
15:42:29 <PhilDay> glikson: No objection to submitting a complete (but smaller) subset of the grouping feature and then adding scheduler support - the important part is that it completely implements some feature
15:42:57 <garyk> i understand. i'll add the patches above this one and hopefully that will make a better picture.
15:43:20 <garyk> we will also need nova client support
15:44:08 <glikson> might make sense to add nova client support together with the api support
15:44:13 <senhuang> garyk: i should be able to have the api part ready today. should i submit a new patch or build on top of the patch you have?
15:45:13 <garyk> senhuang: that would be great.
15:45:25 <garyk> PhilDay: russellb: is that acceptable?
15:46:00 <russellb> sounds like it
15:46:04 <russellb> need novaclient support too though
15:46:35 <garyk> we'll take care of that
15:47:07 <senhuang> okay. let's move on to another topic?
15:47:49 <senhuang> #topic open-discussion
15:50:07 <glikson> I have a question regarding the multi-scheduler blueprint (definition of different scheduler drivers and/or policies for different host aggregates). One of the assumptions we need is that host aggregates that use this feature are disjoint. Does it sound like a reasonable assumption that the admin will manage this, or do we need to introduce the notion of disjoint host aggregates in the code? it might be relevant for other features as well..
15:51:21 <russellb> why do they need to be disjoint?
15:51:37 <PhilDay> So would that be a property of an aggregate - i.e. you can define that an aggregate must be disjoint, and it will then reject hosts that are part of another aggregate?
15:51:51 <PhilDay> I think that would also be useful for AZs
15:52:05 <russellb> request comes in with an AZ (host aggregate), and that would be your key to figure out which scheduling path to take, right?
15:52:28 <glikson> russellb: otherwise it might create some inconsistencies, if the same host belongs to two aggregates each associated with a different scheduler..
15:53:01 <russellb> a host may be in AZ 1 and AZ 2, but the *request* specified AZ 2, so you'd schedule based on that
15:53:47 <russellb> the scheduling path you take has to be entirely based on what is in the request
15:53:51 <glikson> I didn't think to use the AZ hint for that -- just properties of the flavor, like we do with the standard aggregate filter..
15:53:55 <PhilDay> I was thinking that it would just be a useful safeguard in setting up AZs. At the moment a host can be in more than one AZ (just because aggregates support it) - but I can't see a reason you would want to do that
15:54:25 <russellb> there are certainly use cases for overlapping aggregates
15:54:41 <russellb> aggregate with special hardware thing 1, and aggregate with special hardware thing 2, and some hosts may have both
15:54:50 <glikson> PhilDay: agree, disjoint aggregates based on a certain property might make sense regardless of this particular feature
15:54:53 <PhilDay> So having an aggregate property that says "this aggregate must be disjoint from these aggregates" would be useful at that level to avoid mistakes
15:55:22 <primeministerp> hi all
15:55:37 <primeministerp> whoops, my clock seems to be off by 5
15:56:01 <PhilDay> Agreed that overlapping is a useful capability in some cases - I think disjoint would have to be relative to specific other aggregates
15:56:13 <glikson> it might make sense to be able to express something like: "this aggregate should be disjoint with all the aggregates of type AZ"
15:56:51 <PhilDay> So why does the multi-scheduler need disjoint aggregates?
15:56:55 <glikson> kind of a grouping of aggregates..
15:58:10 <glikson> PhilDay: for consistency.. e.g., if we want to know which scheduler to use to migrate a given instance off a host, we would go to the scheduler associated with the "scheduling aggregate" that this host belongs to
15:58:14 <PhilDay> For use in the scheduler you'd need disjoint to not be linked just to aggregates of type AZ - maybe we also need an aggregate type then (I think AZ is a special case at the moment, no?)
15:59:16 <glikson> yep, exactly -- aggregate type sounds like a good approach. and for a certain type we may require them to be disjoint (within that type).
15:59:38 <senhuang> okay, guys, time is almost up.
15:59:46 <senhuang> let's continue the discussions next week
16:00:07 <PhilDay> works for me. See ya
16:00:16 <glikson> ok. thanks, bye.
16:00:18 <jgallard_> ok, thanks a lot, it was very interesting :)
16:00:50 <senhuang> #endmeeting
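
[Editor's note on the closing discussion: a sketch of the "aggregate type" idea glikson and PhilDay converged on, where aggregates of a given type (e.g. AZs, or the multi-scheduler's "scheduling aggregates") must be pairwise disjoint while untyped aggregates may still overlap. The aggregate structure and the 'type' metadata key are hypothetical, not existing nova attributes.]

    def violates_disjointness(host, aggregate, all_aggregates):
        """True if adding host to aggregate would overlap another
        aggregate of the same type; types must stay disjoint."""
        agg_type = aggregate.get('metadata', {}).get('type')
        if agg_type is None:
            return False  # untyped aggregates may overlap freely
        for other in all_aggregates:
            if other is aggregate:
                continue
            if (other.get('metadata', {}).get('type') == agg_type
                    and host in other.get('hosts', [])):
                return True
        return False
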