15:03:24 <garyk> #startmeeting scheduling
15:03:25 <openstack> Meeting started Tue Sep 24 15:03:24 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:30 <openstack> The meeting name has been set to 'scheduling'
15:03:58 <garyk> Last week we did not have much chance to discuss Mike's and Tahi's ideas.
15:04:04 <garyk> Sorry, Yathi's
15:04:16 <garyk> MikeSpreitzer: do you want to start?
15:04:23 <MikeSpreitzer> OK
15:04:24 <Subbu> #info
15:04:49 <garyk> Subbu: please see https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps/edit
15:04:57 <MikeSpreitzer> Should I start with responding to the latest thing on the ML (from Zane), or start from scratch?
15:05:19 <Subbu> thanks garyk
15:05:24 <garyk> MikeSpreitzer: I am fine with that. Not sure if others are up to speed or have been following the list.
15:05:31 <garyk> Maybe it is best to start from the beginning
15:05:37 <MikeSpreitzer> OK, I'll start from the beginning.
15:06:12 <garyk> Great
15:06:20 <MikeSpreitzer> I am interested in holistic scheduling. By that I mean the idea of a scheduler that can look at a whole template/pattern/topology and make a joint decision about all the resources in it.
15:06:40 <MikeSpreitzer> I do not mean that this thing *has* to make all the decisions, but it should have the opportunity.
15:06:55 <MikeSpreitzer> I mean a richer notion of pattern than CFN has today.
15:07:16 <MikeSpreitzer> A pattern should have internal grouping, with various sorts of policy and relationship statements attached.
15:07:42 <garyk> I agree that the scheduler should have a complete picture of all of the resources
15:07:50 <MikeSpreitzer> (that's the response to Zane's main complaint. This richer information gives the holistic scheduler information to use, rather than requiring mind reading)
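(For illustration: the kind of pattern MikeSpreitzer describes might look like the sketch below. The grouping/policy schema is hypothetical, not CFN, not Heat, and not his group's actual language; it only shows what "internal grouping with policy and relationship statements" could carry.)

    # Hypothetical pattern with internal grouping and attached policies,
    # expressed as plain Python data. Every key name here is invented.
    pattern = {
        "groups": {
            "web_tier": {
                "members": ["web1", "web2", "web3"],
                # spread members across racks for availability
                "policies": [{"type": "anti-collocation", "level": "rack"}],
            },
            "db_tier": {
                "members": ["db1", "db2"],
                "policies": [{"type": "anti-collocation", "level": "host"}],
            },
        },
        "relationships": [
            # keep the web tier close to the database tier to bound latency
            {"type": "network-proximity", "from": "web_tier", "to": "db_tier"},
        ],
    }

A holistic scheduler handed the whole structure can decide all placements jointly; nothing here forces it to, which matches the "has the opportunity" caveat above.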
15:08:29 <MikeSpreitzer> Not sure how much you want to hear about what my group has working, so I'll go on for now.
15:08:29 <garyk> MikeSpreitzer: Gilad and I tried to broach this with the VM ensembles.
15:08:48 <garyk> MikeSpreitzer: all ears
15:09:41 <MikeSpreitzer> My group is doing stuff like this, but not integrated with heat; our current holistic controller is a client of Nova, Cinder, etc., but of slightly extended versions of them, to give us the visibility and control we need.
15:10:12 <MikeSpreitzer> We have worked out an example of a template for IBM Connections, which is a complicated set of apps based on our appserver products, and are working on examples based on Hadoop.
15:10:48 <MikeSpreitzer> Anyway, once the holistic scheduler has made the decisions it is going to make, the next step is infrastructure orchestration.
15:11:18 <MikeSpreitzer> That is the business of invoking the specific resource services to put the decisions already made into effect and pass the remaining bits of the problem along.
15:11:54 <MikeSpreitzer> This is the main job of today's heat engine, and I see no reason to use something else for this part.
15:12:50 <garyk> Can you please explain why it does the orchestration part?
15:12:52 <MikeSpreitzer> I am also interested in other ways of doing software orchestration. I have colleagues who want to promote a technique that has a non-trivial preparatory stage, and then at runtime in the VMs etc. all the dependencies are handled primarily by software running there.
15:13:09 <MikeSpreitzer> garyk: which "it"?
15:13:17 <garyk> I can understand that the scheduler should see the entire 'application' that is going to be deployed, but that can be done without integration with heat
15:13:48 <MikeSpreitzer> I think it is awkward to have today's heat engine upstream of holistic infrastructure scheduling
15:14:05 <MikeSpreitzer> today's heat engine breaks a whole template up into individual resource calls and makes those…
15:14:17 <MikeSpreitzer> Not very natural to pass a whole template downstream from that.
15:14:41 <garyk> if I understand correctly, heat has a template. If the template can create logical 'links' between the entities, and these are passed to the scheduler, then it can be separate
15:14:43 <MikeSpreitzer> Also, kind of pointless to break up the template before the holistic scheduling.
15:15:20 <garyk> I think that at the moment heat does things sequentially (I may be wrong here)
15:15:37 <MikeSpreitzer> Sure, if the scheduler sees all the resources and links, that's what is needed.
15:16:26 <MikeSpreitzer> There is no infrastructure orchestration to be done before holistic scheduling, and there *is* infrastructure orchestration to be done after holistic scheduling.
15:16:41 <garyk> That was our initial goal with the VM ensembles. We failed to convince people that it was the right way. The piecemeal approach was to use the instance groups
15:17:04 <MikeSpreitzer> What was the sticking point?
15:17:30 <garyk> I think that we did not manage to define the API well enough.
15:17:54 <garyk> In addition to this, there were schedulers being developed for all of the different projects
15:18:01 <MikeSpreitzer> My group has been using a pretty simple API, with a rich template/pattern/topology language
15:18:42 <MikeSpreitzer> Sure, there should be schedulers for smaller scopes.
15:18:49 <Yathi> Currently the resources are limited to the individual services (projects); there is definitely a need for some kind of global state repository (which I can explain later with my high-level vision)
15:19:26 <Yathi> this global state repository can feed any of the services
15:19:29 <MikeSpreitzer> Yathi: yes, a repo as well as decision making. The repo raises Boris' issues, which are relevant to all schedulers.
15:19:42 <garyk> Yathi: that is very interesting. How would this get the information from the various sources?
15:19:53 <garyk> MikeSpreitzer: how did you guys address that?
15:20:11 <MikeSpreitzer> I can tell you how we do it now, but like you guys, we are not satisfied...
15:20:23 <MikeSpreitzer> I think there is room for improvement here, but it does not change the overall picture.
15:20:39 <Yathi> An attempt to get there has been started by the blueprint proposed by Boris
15:20:43 <garyk> hopefully with the community we can improve things :)
15:20:49 <Yathi> In-memory state
15:21:32 <Yathi> https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:21:33 <MikeSpreitzer> Our current approach is anchored in a database, and not fully built out the way we have already decided we want. We are also interested in moving to something that is based in memory. This stuff is all a cache; the hard state is in lower layers.
15:21:38 <garyk> #info https://review.openstack.org/#/c/45867 (this is the review mentioned ^)
15:22:12 <Yathi> yeah, that is something that can eventually address getting a global state repository
15:22:29 <garyk> So we all seem to be aligned on the fact that we need to cache the information locally (in memory)
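(For illustration: the actual design lives in the no-db-scheduler blueprint and the review linked above; the sketch below is not that code. It only shows the general shape of the idea, an in-memory cache of host state that a scheduler consults, with the hard state staying in the services underneath.)

    # Rough sketch of an in-memory host-state cache. Names and fields are
    # invented for illustration; the authoritative state stays in the
    # services below, so stale entries can simply be dropped and rebuilt.
    import threading
    import time

    class HostStateCache:
        def __init__(self, ttl_seconds=60):
            self._ttl = ttl_seconds
            self._lock = threading.Lock()
            self._hosts = {}  # host name -> (last update time, state dict)

        def update(self, host, free_ram_mb, free_disk_gb, vcpus_free):
            # Called whenever a compute node reports fresh capacity data.
            with self._lock:
                self._hosts[host] = (time.time(), {
                    "free_ram_mb": free_ram_mb,
                    "free_disk_gb": free_disk_gb,
                    "vcpus_free": vcpus_free,
                })

        def snapshot(self):
            # Return only entries still fresh enough to schedule against.
            now = time.time()
            with self._lock:
                return {h: s for h, (ts, s) in self._hosts.items()
                        if now - ts < self._ttl}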
15:22:54 <MikeSpreitzer> OK, so let's get back to the reasons for rejection earlier.
15:23:03 <garyk> I think that one of the major challenges is how this information is shared between the hosts, services and scheduler
15:23:11 <MikeSpreitzer> I do not see a conflict with the fact that individual services have their own schedulers. What was the problem?
15:23:18 <Yathi> I can explain later.. but in my document #info https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1 I try to put together the bits required for smart resource placement
15:23:51 <MikeSpreitzer> OTOH, I do see a conflict...
15:24:13 <MikeSpreitzer> If you think the only place to put smarts is in an individual service, then that's a conflict.
15:25:03 <garyk> MikeSpreitzer: true
15:25:03 <MikeSpreitzer> But I think we are liking the idea of enabling joint decision-making.
15:25:31 <garyk> MikeSpreitzer: I think that there are a number of people who like and support that idea
15:25:35 <MikeSpreitzer> garyk: enough of my conjecture, can you elaborate on the objection wrt service schedulers?
15:26:45 <alaski> I'm late to the meeting and just catching up, but I'm very interested in what extra data needs to be exposed from nova/cinder/etc for holistic scheduling and what resource placement control is needed
15:26:52 <garyk> MikeSpreitzer: it was a tough one. There were people who felt like it was part of heat
15:27:27 <garyk> our point was that the scheduler needed to see all of the information, and heat was not able to do that
15:27:29 <MikeSpreitzer> alaski: For compute, we added visibility into the physical hierarchy, so we can make rack-aware decisions.
15:27:36 <alaski> I think the sticking points for earlier efforts were partly due to the focus on holistic scheduling before discussing what each service needs to provide and accept
15:28:33 <MikeSpreitzer> I think that, at least for private clouds, there is a simple general rule: you may think you are at the top of the heap, but you are not. Enable a smarter client with a bigger view to make decisions.
15:29:18 <garyk> it is a chance to provide preferential services for applications
15:29:19 <alaski> right. I've heard very little objection to that, the devil is in the details
15:29:24 <MikeSpreitzer> Nova today allows its client to direct placement. We added visibility of the physical hierarchy, so a smarter client can decide where VMs go.
15:29:49 <MikeSpreitzer> We also worked out a way to abuse Cinder volume types to direct placement.
15:30:05 <MikeSpreitzer> We are currently cheating on the visibility for Cinder; we would prefer that Cinder have a real solution.
15:30:28 <MikeSpreitzer> For network, we are moving from something more proprietary to something OpenDaylight based.
15:30:40 <alaski> so as far as Nova direct placement goes, I'm very much in favor of removing scheduler hints. I want a placement api, but I think it needs to be redone with an idea of what we want from it
15:30:59 <MikeSpreitzer> We currently use a tree-shaped abstraction for network. That is admittedly a serious abstraction; it is an open question how well it will work.
15:32:00 <garyk> alaski: the placement api is a good start. Could the instance groups be an option?
15:32:16 <MikeSpreitzer> So the kind of visibility that I think is needed is an abstract statement of the topology and capacity of the physical resources (we tend to use the word "containers", but not to mean LXC, rather as a general term for things that can host virtual resources).
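(For illustration: the "containers" abstraction belongs to MikeSpreitzer's group's internal system, so the sketch below is only a guess at its general shape, a tree of things that can host virtual resources, each with a kind and a capacity, which is enough for a client to make rack-aware decisions. All names and fields are invented.)

    # Speculative sketch of a tree-shaped "containers" abstraction.
    class Container:
        def __init__(self, name, kind, capacity=None):
            self.name = name          # e.g. "rack-1" or "rack-1-host-2"
            self.kind = kind          # "datacenter" | "rack" | "host"
            self.capacity = capacity  # e.g. {"ram_mb": 131072}; None for inner nodes
            self.children = []

        def add(self, child):
            self.children.append(child)
            return child

        def hosts(self):
            # The leaves are the things VMs can actually land on.
            if not self.children:
                return [self]
            return [h for c in self.children for h in c.hosts()]

    # Two racks with two hosts each; a client that sees this tree can pick
    # hosts from different racks to satisfy a rack-level anti-collocation policy.
    dc = Container("dc-east", "datacenter")
    for r in ("rack-1", "rack-2"):
        rack = dc.add(Container(r, "rack"))
        for i in (1, 2):
            rack.add(Container("%s-host-%d" % (r, i), "host",
                               capacity={"ram_mb": 131072}))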
15:33:01 <garyk> MikeSpreitzer: in a public cloud, how much information do you want to provide to the end users? At the end of the day they just want to be guaranteed the service that they are paying for
15:33:24 <MikeSpreitzer> Yes, it's pretty different in a public cloud.
15:33:33 <alaski> garyk: I think instance groups is a good start, but I don't know if it's rich enough to stop there
15:33:38 <Yathi> I think we are on a common theme of smarter resource placement, and I would like to present the high-level vision document that I shared.. and connect it to some efforts being undertaken
15:33:50 <Yathi> including Instance groups
15:33:53 <garyk> alaski: agreed. It is very primitive at the moment
15:34:13 <garyk> Yathi: agreed. Can you elaborate?
15:34:18 <MikeSpreitzer> Yes, I think instance groups falls short of what we need.
15:34:51 <MikeSpreitzer> There are things to say about public cloud, but I will listen to Yathi first.
15:34:52 <Yathi> Ok.. the idea is to start with business rules/policies as stated by tenants - leading to smart resource placement
15:35:11 <Yathi> with a view of all the datacenter resources, via a global state repository
15:35:34 <Yathi> but with the main decision making of resource placement handled by a smart constraint-based resource placement engine
15:35:55 <Yathi> this ties together several proposed efforts and blueprints
15:36:12 <Yathi> the instance groups effort - should evolve to support policies / business rules
15:36:32 <Yathi> and these should transform into some form of constraints to be used by the decision engine
15:37:04 <Yathi> Boris's in-memory efforts should evolve to provide a global state repository giving a view of all the resources
15:37:35 <Yathi> and the new work (for which I added POC code) - an LP-based solver scheduler - should handle the decision making
15:37:54 <garyk> Yathi: can you please post the link to the code you posted?
15:38:00 <Yathi> the actual orchestration or the placement of the VMs can be done using existing mechanisms
15:38:24 <Yathi> #link https://review.openstack.org/#/c/46588/
15:38:45 <Yathi> so the general idea is presented in this doc - #link https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1
15:39:06 <Yathi> the idea is that this should be backward compatible and hence non-disruptive
15:39:10 <Yathi> works with the current Nova
15:39:49 <garyk> my concern is that we all seem to have great ideas about how to do the backend implementations, but the user- and admin-facing APIs are our Achilles heel
15:39:53 <Yathi> using a PULP solver module that I added code for in Nova, and that I ran instead of the FilterScheduler
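(For illustration: Yathi's actual POC is in the review linked above; the sketch below is not that code. It only demonstrates the technique he names, an LP/MIP placement model written with the PULP library, here assigning VMs to hosts under RAM constraints while minimizing the number of hosts used.)

    # Tiny PULP placement model, illustrative only. Requires the "pulp"
    # package (pip install pulp), which bundles the CBC solver.
    import pulp

    vms = {"vm1": 2048, "vm2": 4096, "vm3": 2048}   # required RAM (MB)
    hosts = {"hostA": 8192, "hostB": 4096}          # free RAM (MB)

    prob = pulp.LpProblem("vm_placement", pulp.LpMinimize)

    # place[v][h] == 1 iff VM v lands on host h; used[h] == 1 iff h is used.
    place = pulp.LpVariable.dicts("place", (vms, hosts), cat="Binary")
    used = pulp.LpVariable.dicts("used", hosts, cat="Binary")

    # Objective: consolidate onto as few hosts as possible.
    prob += pulp.lpSum(used[h] for h in hosts)

    # Every VM is placed exactly once.
    for v in vms:
        prob += pulp.lpSum(place[v][h] for h in hosts) == 1

    # Respect each host's RAM capacity, and mark used hosts.
    for h in hosts:
        prob += pulp.lpSum(vms[v] * place[v][h] for v in vms) <= hosts[h] * used[h]

    prob.solve()
    for v in vms:
        for h in hosts:
            if place[v][h].varValue == 1:
                print("%s -> %s" % (v, h))

A filter-scheduler-style driver scores hosts one request at a time; the point of the solver approach is that constraints spanning many VMs are decided jointly.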
15:40:24 <Yathi> user-facing APIs are the ones we should reach a common agreement on
15:40:46 <alaski> garyk: agreed. In order to get something in, there will need to be consensus among the projects, who don't care what makes the placement decisions. They care what they need to expose
15:41:02 <Yathi> The instance group blueprint brought in the concept of policies
15:41:02 <alaski> so that needs to be figured out and added to the various projects
15:41:06 <MikeSpreitzer1> I think the user-facing API can be pretty simple; I think the pattern/template/topology language is where the action is
15:41:18 <garyk> alaski: agreed
15:41:27 <Yathi> that is something we want to evolve to transform into constraints to be used by a solver engine
15:41:27 <MikeSpreitzer1> Well, there are APIs at various levels
15:42:00 <MikeSpreitzer1> I think the infrastructure-level APIs need to expose sufficient visibility and control; the whole-pattern layers can have simple APIs but need a rich pattern language.
15:42:22 <garyk> what about the idea that we divide it up into 3 parts:
15:42:29 <garyk> 1. the user-facing APIs
15:42:39 <garyk> 2. the information required from all of the services
15:42:48 <garyk> 3. backend scheduling
15:43:03 <Yathi> garyk: if you read my document - these are exactly the three points :)
15:43:11 <garyk> if we can define the relationships between these (I guess with APIs)
15:43:12 <Yathi> you read my mind!
15:43:13 <MikeSpreitzer1> garyk: which are the "user facing" APIs? Which scheduling is the "backend"?
15:43:18 <garyk> Yathi: I have yet to read it
15:43:30 <Yathi> ok
15:43:42 <MikeSpreitzer1> Yathi: I have skimmed it, will review more carefully
15:44:39 <Yathi> okay, it presents a high-level vision of the necessary efforts, the relationship to some of the existing proposed blueprints, and then some additional details on the actual DECISION engine - which makes resource placement decisions
15:45:06 <Yathi> this is work-in-progress, design-in-progress, and something we want to discuss in detail at the summit, also face-to-face
15:45:32 <Yathi> but it is dependent on other blueprints and hence a big collaborative effort
15:45:32 <MikeSpreitzer1> sounds good. BTW, will anybody be around before/after the official summit for informal discussions?
15:45:50 <garyk> MikeSpreitzer1: hopefully
15:45:58 <PaulMurray> how much before or after?
15:46:06 <PaulMurray> but yes, a bit
15:46:08 <MikeSpreitzer1> Not much.
15:46:18 <Yathi> Please do review and post your feedback, and we can continue the discussion again
15:46:23 <garyk> Maybe we could all meet for lunch or breakfast one day to discuss
15:46:34 <Yathi> that would be a good idea
15:46:47 <PaulMurray> I would like to be there for that
15:46:54 <MikeSpreitzer1> I'm not sure how far this goes before it becomes a process error.
15:47:17 <garyk> MikeSpreitzer1: not sure I understand
15:47:36 <MikeSpreitzer1> Is there any problem with organizing an extra summit?
15:47:56 <MikeSpreitzer1> I'm still new here, learning the rules
15:48:03 <MikeSpreitzer1> already made some mistakes, sorry about that
15:48:18 <PaulMurray> Mike
15:48:22 <PaulMurray> oops
15:48:30 <garyk> At the summit hopefully we'll have a few slots to discuss the scheduling. The PTL will need to allocate time
15:48:56 <Yathi> unconference sessions?
15:49:05 <garyk> At the last summit russellb gave us a lot of sessions. We met before and aligned the presentations
15:49:26 <garyk> I think that this time we should also meet before. Syncing it all will be a challenge
15:49:34 <PaulMurray> If that is the objective, it is a good idea
15:50:21 <garyk> I just think that it is very important for us to try and make the most of the time that we get.
15:50:38 <garyk> Conveying the ideas and getting the community's support is a challenge
15:50:50 <MikeSpreitzer1> Yes. But at some point we have to dig into details; that is sometimes necessary to get real agreement.
15:51:19 <garyk> MikeSpreitzer1: true.
15:51:40 <MikeSpreitzer1> I am a big fan of reading and writing. But time for discussion is needed too.
15:51:53 <garyk> In the last two summits with Neutron there was great collaboration on LBaaS and FWaaS. Maybe we need to follow the same model
15:52:11 <MikeSpreitzer1> Can you elaborate?
15:52:12 <garyk> That is, set up a few meetings and get all of our information into a google doc (or etherpad)
15:52:31 <MikeSpreitzer1> yes, that sounds good. Good writeup and reading beforehand, detailed discussion.
15:52:34 <garyk> Then when we come to the summit we can present the details and get input from the community
15:52:43 <garyk> MikeSpreitzer1: exactly
15:53:06 <garyk> boris-42: and Yathi: have two implementations
15:53:19 <Yathi> Okay, I think we have already started some of this process in our etherpad
15:53:29 <garyk> I still think that we need the documentation to have the idea from A to Z. Then we can slice up the cake/pie
15:53:30 <Yathi> and we have added some POC code to demo and discuss
15:53:35 <boris-42> garyk I will try to find some time
15:53:41 <boris-42> garyk to update our docs and etherpad
15:53:43 <garyk> http://9gag.com/gag/adN9Mp9 (sorry, I could not resist)
15:54:26 <MikeSpreitzer1> the cake is a lie
15:54:26 <Yathi> funny!
15:54:38 <garyk> Does someone want to take the initiative and start to prepare a document for the APIs?
15:55:11 <MikeSpreitzer1> garyk: which APIs? (which level?)
15:55:47 <garyk> I think the 3 parts - user/admin; information required from the services; and the scheduling engine
15:56:06 <MikeSpreitzer1> I am interested in working on that.
15:56:32 <MikeSpreitzer1> Not sure what I can promise, but I realize it has to be done long enough before the summit to allow careful reading.
15:56:53 <garyk> MikeSpreitzer1: great. I'd be happy to work with you on that too. I am a bit pressed for time in the coming two weeks, but after that I will have some free cycles
15:56:59 <Yathi> for the scheduling engine API part, my code relied upon something existing
15:57:07 <Yathi> my POC code, I mean
15:57:21 <Yathi> something based on what the FilterScheduler uses
15:57:28 <garyk> Yathi: cool.
15:57:43 <garyk> which is good for backward compatibility (and very important)
15:57:53 <Yathi> but this should interface with the new ideas of a "global state repo", and the tenant-facing APIs
15:57:57 <garyk> How about we decide next week on how we want to proceed?
15:58:09 <MikeSpreitzer1> OK
15:58:14 <Yathi> sure
15:58:33 <Yathi> please review the POC code and the doc I shared links to on the etherpad
15:58:44 <MikeSpreitzer1> yep
15:58:45 <garyk> I'll try.
15:58:53 <Yathi> the POC code doesn't pass unit tests because of the dependency on PULP
15:59:05 <Yathi> I will need to figure out how to make them pass
15:59:12 <Yathi> not for discussion in this forum, sorry
15:59:34 <garyk> so I guess that we'll meet next week.
15:59:38 <garyk> thanks guys
15:59:50 <garyk> #endmeeting