15:02:47 <garyk> #startmeeting scheduler
15:02:47 <doron> hi
15:02:48 <openstack> Meeting started Tue Nov 19 15:02:47 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:51 <openstack> The meeting name has been set to 'scheduler'
15:02:58 <jgallard> hi all
15:03:12 <n0ano> o/
15:03:18 <garyk> hi, sorry about missing the meeting last week. combination of the jetlag and clock changes.
15:03:18 <toan-tran> sorry last week I got the wrong time
15:03:31 <toan-tran> it's changed into winter time
15:03:37 <garyk> yeah, i guess that we all got it mixed up a lillte
15:04:08 <garyk> i was thinking that we can go over action items from the summit.
15:04:30 <garyk> in addition to this are there other topics that people would like to bring up?
15:05:00 <n0ano> I guess the question is what are the ARs to go over, do you have a list?
15:05:36 <garyk> at the moment we do not have a lite, but a list would be a good idea
15:05:47 <garyk> i can update regarding the instance groups and the resource tracking
15:06:08 <n0ano> I'm curious about Boris' memcache changes
15:06:26 <doron> +1 on memcached
15:06:36 <garyk> boris-42: you around?
15:06:56 <garyk> regarding the instance groups
15:07:10 <garyk> 1. it was decided that the new api's that we proposed were too complicated
15:07:30 <garyk> 2. it was decided that we should complete the proposed api's for havana (for v2 and v3)
15:07:44 <garyk> this gives us a good basis to start working on
15:08:03 <n0ano> do we have time to complete the API for Havana? I though new development was closed
15:08:28 <garyk> for havana we missed the cut due to the fact that we did not have v3 support
15:08:49 <garyk> we plane to get this done in the coming weeks. there were also some loose ends regarding the v2 api
15:09:07 <n0ano> so is the plan to do that work for Icehouse instead?\
15:09:19 <garyk> in short, yes, we have the time to complete the api. i certainly hope that we get it in by I1
15:09:31 <garyk> yes. plan is to do it in I
15:09:54 <n0ano> makes sense, is there any other design needed or are you just at the implementation phase
15:09:55 <garyk> At the moment we have anti-affinity scheduling in review and are in the process of thinking about host capabilities
15:10:13 <garyk> the ball is in our court to get down and do the implementations
15:10:19 <MikeSpreitzer1> hi
15:10:25 <garyk> MikeSpreitzer1: hi
15:10:38 <MikeSpreitzer1> Do we have an agenda?
15:10:55 <garyk> so i hope that in the near future we will have something up for review regarding the api (the cli and scheduler part are ready for review)
15:11:12 <garyk> MikeSpreitzer1: basically to go over summit action items etc. no formal agenda
15:11:55 <n0ano> sounds like it would be good to do the review for all at the same time - api, cli and scheduler - rather than doing it piecemeal
15:12:06 <garyk> In parallel Yathi will be working on his scheduling changes
15:12:31 <MikeSpreitzer2> I just had a connectivity glitch, may have missed a remark before the one about Yathi
15:12:38 <garyk> n0ano: we have all of the building blocks and hope just to get them in review soon
15:12:49 <n0ano> OK, sounds good
15:13:23 <garyk> MikeSpreitzer1: basically Yathi will be working on his 'smart resource placement' pluggable driver
15:13:23 <MikeSpreitzer2> n0ano: I would like to follow up on your question at the summit about scalability
15:14:04 <garyk> MikeSpreitzer2: i guess that this is a good time to talk about that.
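A note on the anti-affinity scheduling that garyk mentions is in review: the idea is that members of one instance group should not share a host. The snippet below is a minimal sketch of that rule only, assuming a hypothetical `Host` record and a hypothetical `anti_affinity_filter` helper; it is not the code under review and does not use Nova's actual filter classes.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record: a name plus the instance IDs it runs."""
    name: str
    instances: set = field(default_factory=set)


def anti_affinity_filter(hosts, group_members):
    """Keep only hosts that run no instance belonging to the group."""
    members = set(group_members)
    return [h for h in hosts if not (h.instances & members)]


hosts = [
    Host("host-a", {"vm-1"}),   # already runs a group member
    Host("host-b", {"vm-9"}),   # runs an unrelated instance
    Host("host-c"),             # empty host
]
candidates = anti_affinity_filter(hosts, group_members=["vm-1", "vm-2"])
candidate_names = [h.name for h in candidates]
print(candidate_names)
```

The remaining candidates would still go through the usual weighing step; the filter only removes hosts that would break the group policy.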
15:14:11 <n0ano> MikeSpreitzer1, in what way, I'm hoping that Bors' memcache changes will address a large part of the current scalability, I'd like to see where that goes
15:14:19 <n0ano> s/Bors/Boris
15:14:27 <garyk> it would be nice if boris-42 could chime in regarding their performance developments.
15:14:30 <MikeSpreitzer2> Yes, I very much agree, Boris' suggestion is getting way too little love
15:14:56 <MikeSpreitzer2> I also wanted to talk about the goal posts (hoping to nail them down so they do not move)
15:15:02 <n0ano> love should be coming, as I understand it the patch is working, they just have to create some peformance numbers
15:15:18 <n0ano> MikeSpreitzer1, which goal posts?
15:15:25 <MikeSpreitzer2> for scalability
15:15:39 <toan-tran> well, i'm curious so see the number as well
15:15:51 <n0ano> aah, interesting question, typically that has always been - a good as possible
15:15:53 <toan-tran> as I understand, LP solving is rather a costly
15:15:59 <n0ano> s/a good/as good
15:16:18 <garyk> i think that we are all curious about the numbers. i am not sure how many schedulers thay are using in thier test case. but i understand that they simulate 100's of nodes
15:16:33 <MikeSpreitzer2> "as good as possible" begs the question of "possible" - which begs the question of other requirements
15:16:38 <garyk> toan-tran: what is LP?
15:16:43 <n0ano> MikeSpreitzer2, exactly
15:16:58 <toan-tran> sorry, i'm talking about solver scheduler
15:17:03 <toan-tran> miss up a little
15:17:07 <toan-tran> mess up
15:17:13 <garyk> toan-tran: no problem.
15:17:33 <garyk> i think that is work in progress and hopefully when Yathi is back from his vacation there will be more information on that.
15:17:47 <MikeSpreitzer2> OK, so this is what I was afraid of, no extrinsic "good enough" mark
15:18:28 <n0ano> MikeSpreitzer2, sort of, currently we're good for about ~200 nodes, the goal at least is to do ~1000 nodes, would like to do ~10000
15:18:45 <n0ano> I don't know if those are acceptable goals or not but they make sense to me
15:19:19 <toan-tran> 200 nodes on simulation or real test? and if simulation which simulator?
15:19:30 <MikeSpreitzer> Just had another connectivity glitch, missed everything after the first numbers from n0ano
15:19:42 <MikeSpreitzer> Glad to have a number
15:19:47 <n0ano> MikeSpreitzer2, sort of, currently we're good for about ~200 nodes, the goal at least is to do ~1000 nodes, would like to do ~10000
15:19:52 <garyk> i think that rally (and may be wrong here) is a testing environment when one can simulate load
15:19:55 <n0ano> I don't know if those are acceptable goals or not but they make sense to me
15:20:01 * n0ano loves cut/paste
15:20:31 <n0ano> toan-tran, the ~200 nodes comes from real world usage, not simulation
15:20:41 <MikeSpreitzer> So my group currently thinks our solver can do about 1K hosts, is unlikely to do 10K hosts well enough.
15:20:44 <toan-tran> nice
15:21:13 <garyk> MikeSpreitzer: i think that there are a number of different issues at hand
15:21:21 <garyk> 1. is the interactions with the database
15:21:30 <MikeSpreitzer> I have started reading the Omega paper, which seems to be recommending multiple solvers with optimistic concurrency control as a way to scale beyond the ability of a single solver
15:21:42 <MikeSpreitzer> Yes, DB interaction is crucial
15:21:45 <n0ano> I belive that Boris is claiming his memcache should be able to handle ~10000 which is why I `really` want to see his real perf numbers
15:21:53 <garyk> 2. each scheduler being able to have a real time picture of the current situation
15:21:55 <MikeSpreitzer> In our current code the DB is more of a bottleneck than the solver
15:22:05 <garyk> 3. placement complexity
15:22:19 <garyk> yes, i agree. the db is the major bottleneck
15:22:36 <garyk> i think that is where boris-42's solution comes into place.
15:22:44 <MikeSpreitzer> That is why I raised the question of whether we could use NOSQL instead of SQL, it could be a major advance in DB efficiency.
15:22:45 <toan-tran> Mike: sorry, which paper are you taling about?
15:22:57 <garyk> i am not sure if it just touches on the problem or if it actually provides a real solution
15:23:06 <MikeSpreitzer> This year's EuroSys, a paper from Google folks on their Borg replacement called Omega
15:23:25 <garyk> MikeSpreitzer: can you please post alink if you have one
15:23:27 <MikeSpreitzer> Someone recommended it at the summit
15:23:29 <n0ano> also, what about the session where we talked about decisions that were `good enough`, not perfect, another way to scale up
15:23:42 <MikeSpreitzer> That's inherent in all our solver based approaches
15:23:45 <doron> MikeSpreitzer: boris-42's suggestion was to replace db with memcache which can be sync'ed bwith several instances
15:23:59 <doron> this should resolve the db issue.
15:24:00 <MikeSpreitzer> But only part of DB usage is that
15:24:26 <doron> well IIRC resource tracker should report to the scheduler
15:24:41 <MikeSpreitzer> For the Omega paper, just Google "Google Omega", it's the first hit
15:24:45 <doron> and basically this should remove the need for a db
15:25:12 <toan-tran> I have a remark on the implemented solver scheduler
15:25:27 <MikeSpreitzer> We do have a system with multiple moving parts regardless, so there has to be a DB for some level of coordination.
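The "multiple solvers with optimistic concurrency control" idea that MikeSpreitzer raises above can be sketched roughly as follows. This is an illustrative toy, not the Omega design and not the no-db-scheduler code: `SharedState`, `try_commit` and `schedule` are hypothetical names, the shared state is an in-process dict standing in for whatever store is used, and a per-host version counter is an assumed (deliberately coarse) conflict check.

```python
import threading


class SharedState:
    """Hypothetical shared view: free RAM per host plus a version counter."""

    def __init__(self, free_ram_mb):
        self._lock = threading.Lock()
        self._free = dict(free_ram_mb)
        self._version = {host: 0 for host in free_ram_mb}

    def snapshot(self):
        """Return {host: (free_ram_mb, version)} without holding any claim."""
        with self._lock:
            return {h: (self._free[h], self._version[h]) for h in self._free}

    def try_commit(self, host, ram_mb, seen_version):
        """Claim RAM only if the host is unchanged since the snapshot."""
        with self._lock:
            if self._version[host] != seen_version or self._free[host] < ram_mb:
                return False  # another scheduler got there first
            self._free[host] -= ram_mb
            self._version[host] += 1
            return True


def schedule(state, ram_mb, max_retries=5):
    """Decide on a possibly stale snapshot, then commit; retry on conflict."""
    for _ in range(max_retries):
        view = state.snapshot()
        fitting = [h for h, (free, _) in view.items() if free >= ram_mb]
        if not fitting:
            return None
        host = max(fitting, key=lambda h: view[h][0])  # most free RAM
        if state.try_commit(host, ram_mb, view[host][1]):
            return host
    return None


state = SharedState({"host-a": 2048, "host-b": 1024})
stale = state.snapshot()                  # a second scheduler's older view
first_choice = schedule(state, 1024)      # commits against host-a
conflict = state.try_commit("host-a", 1024, stale["host-a"][1])
print(first_choice, conflict)
```

The point of the sketch is that schedulers never lock the state while deciding; a stale decision is simply rejected at commit time and retried, which is what lets several schedulers run in parallel.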
15:25:32 <n0ano> MikeSpreitzer, this link
15:25:38 <n0ano> #link http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEQQFjAA&url=http%3A%2F%2Feurosys2013.tudos.org%2Fwp-content%2Fuploads%2F2013%2Fpaper%2FSchwarzkopf.pdf&ei=zIKLUqXUHIi2yAG4noDIAQ&usg=AFQjCNGy7xy2leYjXnSPgAJckQKX0qxung&sig2=i3S8xPB05-_bbzI8FxHDsQ&bvm=bv.56643336,d.aWc
15:25:41 <toan-tran> it calls DB (RAMwiegher) once for getting the cost
15:25:53 <doron> MikeSpreitzer: I agree, but stats are voletile
15:26:37 <toan-tran> however, it does not reflex the Load Balacing policy as RAMWeigher doe
15:26:38 <toan-tran> does
15:27:38 <toan-tran> basically the objectif functions cannot be linear
15:28:04 <toan-tran> basically the objective function cannot be linear
15:28:13 <doron> guys, this is boris-42's BP: https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:29:38 <garyk> i hope that next week boris-42 could join us to elaborate on thier developments
15:29:51 <doron> +1
15:29:55 <boris-42> garyk I am in vacation=)
15:30:00 <boris-42> garyk yes I will joing
15:30:09 <boris-42> garyk we are going to finish implementation
15:30:23 <garyk> ok, enjoy the vacation. chat to you next week
15:31:02 <garyk> in addition to this yathi will hopefully be here next week to also discuss the constraints scheduling (i think that is what it is being called).
15:31:19 <garyk> PaulMurray: was there any intersting you want to add from your sessions?
15:31:21 <toan-tran> garyk: +1
15:31:46 <PaulMurray> garyk sorry got distracted
15:32:01 <PaulMurray> do you want me to update?
15:32:21 <garyk> PaulMurray: np. just wanted to know if you wanted to add anything from the session that you did on the scheduling at the summit
15:32:48 <PaulMurray> the main point was sorting out the order of work between me, Lianhau and boris
15:33:05 <PaulMurray> Lianhau was doing scheduler metrics
15:33:22 <PaulMurray> and was ready to go, so I think several of his patches are merged now
15:33:24 <MikeSpreitzer> (I'd like to queue up a little discussion of the ML thread "Introducing the new OpenStack service for Containers")
15:33:47 <PaulMurray> I will add extensibility to the resource tracking
15:34:23 <PaulMurray> FYI - there was a discussion about boris' work there too.
15:34:42 <PaulMurray> In case the wrong idea came across - I am all for it - just didn't
15:34:57 <garyk> sounds good.
15:35:00 <n0ano> funny how all roads come back to Boris :-)
15:35:13 <PaulMurray> all roads come back to the db!
15:35:19 <garyk> true
15:35:22 <PaulMurray> that's the problem
15:35:26 <n0ano> PaulMurray, +1
15:36:01 <PaulMurray> unless there are questions there is little more to add right now
15:36:07 <PaulMurray> just getting code done.
15:36:24 <toan-tran> I have a question on the scheduler & API if you don't mind
15:36:31 <alaski> btw, I'm around and catching up on the meeting. Got mixed up with the time change
15:36:31 <PaulMurray> ure
15:36:34 <PaulMurray> sure
15:36:42 <garyk> ok, thanks for the update
15:37:02 <garyk> alaski: hi. also wanted to ask if you have updates regarding your scheduling sessions
15:37:07 <PaulMurray> toan-tran - go ahead
15:37:43 <toan-tran> I'm just curious where we're targetting with solver scheduler & instance group API
15:37:52 <toan-tran> Nova, Heat , or something else?
15:38:02 <garyk> toan-tran: good question.
15:38:12 <toan-tran> Boris also proposed separating scheduler from nova & cinder
15:38:20 <toan-tran> can we corporate?
15:38:25 <garyk> ideally we would have liked the scheduler API to be able to define the who application. This was deemed to complicated at this time
15:38:29 <MikeSpreitzer> Containers may also lead to separating scheduler
15:38:56 <alaski> I didn't have any scheduling sessions, but was at each of them and have opinions which I think are captured in etherpads overall
15:39:02 <doron> btw, neutron also need a scheduler...
15:39:09 <toan-tran> ok neutron ...
15:39:22 <garyk> The solver scheduler will make use of the metadata key value pairs.
15:39:29 <MikeSpreitzer> Will containers and VMs compete for the same hosts? If so, they need a common scheduler
15:39:31 <n0ano> neutron needs a scheduler?? that seems odd
15:39:44 <doron> n0ano: yep
15:39:53 <doron> they use 'service' vms for routing
15:40:11 <doron> and need to provide hints to control the placment
15:40:16 <MikeSpreitzer> Some networks also have some degrees of freedom in choosing routes
15:40:16 <n0ano> yet another push for Scheduler as a Service
15:40:19 <garyk> n0ano: the scheduler for neutron is pretty simple - just needs to select which dhcp or l3 node to use for the specific support
15:40:26 <garyk> at the moment it is round robin
15:40:35 <doron> I was in a session they discussed it.
15:40:41 <alaski> MikeSpreitzer: re: containers, it can be done many different ways, but most likely it wil be done based on capabilities since they're different 'hypervisors'
15:40:44 <garyk> what we would like to do is add the ability to provide network proximity.
15:40:52 <n0ano> I missed that, sounds like they're trying to get overly complex
15:40:54 <doron> regardless of complexity the need for SaaS is gaining momentum
15:41:10 <doron> SaaS- Scheduling as a service..
15:41:36 <doron> so some of the summit ideas where very true
15:41:41 <doron> in this context.
15:41:52 <toan-tran> ok so now we need a scheduler as a completely independent component
15:42:06 <doron> true, but noi immediatly
15:42:12 <toan-tran> do we need an API like instance group API for it
15:42:18 <toan-tran> or RPC?
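For reference, the round-robin selection garyk describes for the neutron scheduler (picking which dhcp or l3 node serves a request) amounts to something like the sketch below. It is only an illustration of the policy: `RoundRobinAgentScheduler` and the agent names are made up here and are not Neutron's actual classes.

```python
import itertools


class RoundRobinAgentScheduler:
    """Hypothetical picker that hands out agents in a fixed rotation."""

    def __init__(self, agents):
        agents = list(agents)
        if not agents:
            raise ValueError("at least one agent is required")
        self._cycle = itertools.cycle(agents)

    def pick(self):
        """Return the next agent in the rotation."""
        return next(self._cycle)


scheduler = RoundRobinAgentScheduler(
    ["dhcp-agent-1", "dhcp-agent-2", "dhcp-agent-3"]
)
picks = [scheduler.pick() for _ in range(4)]
print(picks)
```

A rotation like this ignores where the network's instances live, which is why the network proximity hints mentioned above would need a different selection step.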
15:42:21 <n0ano> toan-tran, that's the implication, I haven't bought into the need yet
15:42:32 <doron> fising scale should be a closer goal
15:42:40 <MikeSpreitzer> Let me ask again very specifically: will containers and VMs compete for the same hosts? (Regardless of whether they use different hypervisors on those hosts)
15:42:45 <doron> then we can start discussing saas.
15:43:03 <doron> MikeSpreitzer: isn't it a matter of a use case?
15:43:20 <MikeSpreitzer> doron: ?
15:43:43 <alaski> MikeSpreitzer: with the current work a host will have either containers or VMs, not both
15:43:44 <ekarlso> when's the plan to deliver a alpha or beta ?
15:43:49 <doron> MikeSpreitzer: in some places you'd separate them an in other setups you allow such competition
15:43:59 <ekarlso> oops
15:44:03 <ekarlso> wrong chan ;p
15:44:17 <doron> MikeSpreitzer: and I'm not representing LXC ;)
15:44:31 <MikeSpreitzer> So I'll take alaski's answer for now
15:44:57 <MikeSpreitzer> alaski: and you mean that it is the cloud provider's job to manually allocate hosts to those two roles, right?
15:45:26 <toan-tran> i'm ok if we set up in Nova to work with group of VMs
15:45:29 <alaski> toan-tran: I agree with n0ano, it's a bit early to look at a separate scheduler. But there was talk of using RPC from projects to that scheduler, but I'm not convinced of that approach. I'm sure it will be heavily discussed at a later time
15:45:40 <alaski> MikeSpreitzer: yes
15:45:45 <toan-tran> however, if we introducing new API then we have to plan far ahead
15:46:06 * doron recalls the Docker session, which suggested such a separation
15:46:20 <garyk> alaski: i agree. it is too early to discuss these issues. regardin tge rpc, not sure if we would want nova to speak to cinder with rpc when there is a well deifned REST api
15:46:51 <toan-tran> the API targets the nova only or plans to Neutron after?
15:47:07 <MikeSpreitzer> garyk: why do you say "nova speak to cinder with rpc"? I did not get that from alaski's remark
15:47:19 <garyk> toan-tran: at the moment we are only targeting nova. once we get the basic instance groups in then we can start to build on
15:47:46 <garyk> MikeSpreitzer: at the moment nova uses the cinder client to interface with cinder. this is rest based.
15:48:08 <MikeSpreitzer> garyk: connection to alaski's remark?
15:48:08 <garyk> rpc would be like having a backdoor. it may be the long term solution but at the moment seems like a hack.
15:48:28 <MikeSpreitzer> (still confused)
15:48:47 <garyk> MikeSpreitzer: sorry i do not understand. was alaski refering to an external scheduler or the nova one?
15:48:56 <toan-tran> garyk: i agree that we aim to close target
15:49:19 <alaski> garyk: I was referring to an external one
15:49:29 <garyk> alaski: ok, thanks for the clarification
15:49:39 <toan-tran> but we're creating an API before designing an architecture ! and that's not good
15:50:09 <toan-tran> if we get the group into nova
15:50:15 <toan-tran> and the API into v3
15:50:30 <toan-tran> then once the question of separte scheduler does out
15:50:38 <toan-tran> we have to redraw the design
15:50:52 <garyk> toan-tran: i think that we should focus on what we have at the moment.
15:51:01 <toan-tran> recreating the API once again to include neutron & cinder
15:51:07 <MikeSpreitzer> I see two architectures with an external scheduler. 1: optional thing that is upstream from Nova and maybe some other services too. 2: required element that every service calls under the covers.
15:51:41 <MikeSpreitzer> 2: s/every/some/
15:51:55 <n0ano> option 2 will be harder to push through, I'd prefer option 1 if those are my only choices
15:52:16 <MikeSpreitzer> Do you see other choices?
15:52:34 <toan-tran> Mike: so Nova (API) => Scheduler => Nova again?
15:52:40 <n0ano> option 3 - no external scheduler service
15:52:46 <MikeSpreitzer> t-t: that's #2
15:53:11 <toan-tran> Mike: when what's #1
15:53:14 <MikeSpreitzer> n0ano: does option three do anything cross service?
15:53:15 <toan-tran> ?
15:53:47 <n0ano> MikeSpreitzer, nope, each service does it's own scheduling
15:53:56 <MikeSpreitzer> #1: (multi-service scheduler) -> [Heat ->] (Nova || Cinder || ..)
15:54:26 <toan-tran> Mike: oh, i'm thinking like Heat -> scheduler -> nova ...
15:54:29 <toan-tran> :)
15:54:36 <garyk> n0ano: the cross service scheduling is where things become of value. for example we want a cinder volume to be close to to the nova instance
15:54:37 <n0ano> I put option 3 out for completeness, make sure we know what were are deciding upon
15:54:42 <garyk> that is ciritical for performance
15:54:47 <MikeSpreitzer> Well, orchestration is downstream from joint decision making
15:54:56 <garyk> if one is running a best effort cloud then cool.
15:55:18 <garyk> but in order to provide a competitive service one needs to be able to provide some added value
15:55:41 <n0ano> I've always been confused, exactly what does orchestration do and how does it interact with scheduling
15:56:01 <toan-tran> n0no: orchestration of workflow
15:56:12 <toan-tran> it does not do scheduling (lcoation of VMs)
15:56:31 <toan-tran> it only decides which VM should be initiated first
15:56:33 <MikeSpreitzer> I was surprised at the claim that "orchestration means everything to everyone". It thought it was agreed to mean calling the various service APIs in the right order to get things done.
15:56:53 <garyk> i think that the orchestration and scheduling are mutually exclusive. but that is my opinion
15:57:15 <MikeSpreitzer> .. specifically to handle dependencies between resources
15:57:20 <alaski> orchestration is a very overloaded term. it's used to talk about workflows within a service or creation of resources across services
15:57:30 <MikeSpreitzer> create/update/delete time dependencies
15:57:34 <garyk> alaski: agreed
15:57:37 <toan-tran> alsaki: +1
15:57:46 <n0ano> garyk, sounds like they are more orthogonal than mutually exclusive but yes, not related
15:57:59 <MikeSpreitzer> yes, orthogonal
15:58:19 * n0ano loves the term orthogonal
15:58:25 <MikeSpreitzer> except that it makes no sense to do joint decision making downstream from orchestration
15:58:29 <garyk> yes, that is a better way of describing it
15:58:32 <toan-tran> in my understanding, the orchestrator deicdes which goes first: scheduling, configuration, rollback, etc
15:58:38 <MikeSpreitzer> So, technically, "independent"
15:59:06 <garyk> it looks like we are at the end of the hour
15:59:18 <garyk> hopefully next week boris-42 will be here for the performance part
15:59:27 <garyk> and yathi for the scheuling part
15:59:37 <MikeSpreitzer1> lost some stuff in another connectivity glitch
15:59:42 <MikeSpreitzer1> but we must be done now
15:59:47 <MikeSpreitzer1> I'll read the log
15:59:50 <garyk> i am going to hand the baton back to n0ano and next week he'll run the meeting
15:59:51 <n0ano> good discussion, let's continue this next week.
16:00:07 <n0ano> garyk, great, now you guys will have me to beat up upon :-)
16:00:09 <doron> thanks guys
16:00:13 <n0ano> tnx all
16:00:20 <garyk> so chat to you guys next week
16:00:30 <toan-tran> thanks, bye
16:00:35 <alaski> thanks
16:00:38 <garyk> #endmeeting