#openstack-meeting log

15:02:47 <garyk> #startmeeting scheduler
15:02:47 <doron> hi
15:02:48 <openstack> Meeting started Tue Nov 19 15:02:47 2013 UTC and is due to finish in 60 minutes.  The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:51 <openstack> The meeting name has been set to 'scheduler'
15:02:58 <jgallard> hi all
15:03:12 <n0ano> o/
15:03:18 <garyk> hi, sorry about missing the meeting last week: a combination of jet lag and the clock change.
15:03:18 <toan-tran> sorry last week I got the wrong time
15:03:31 <toan-tran> it's changed into winter time
15:03:37 <garyk> yeah, i guess that we all got it mixed up a little
15:04:08 <garyk> i was thinking that we can go over action items from the summit.
15:04:30 <garyk> in addition to this are there other topics that people would like to bring up?
15:05:00 <n0ano> I guess the question is what are the ARs to go over, do you have a list?
15:05:36 <garyk> at the moment we do not have a list, but a list would be a good idea
15:05:47 <garyk> i can update regarding the instance groups and the resource tracking
15:06:08 <n0ano> I'm curious about Boris' memcache changes
15:06:26 <doron> +1 on memcached
15:06:36 <garyk> boris-42: you around?
15:06:56 <garyk> regarding the instance groups
15:07:10 <garyk> 1. it was decided that the new APIs that we proposed were too complicated
15:07:30 <garyk> 2. it was decided that we should complete the APIs that were proposed for Havana (for v2 and v3)
15:07:44 <garyk> this gives us a good basis to start working on
15:08:03 <n0ano> do we have time to complete the API for Havana?  I thought new development was closed
15:08:28 <garyk> for havana we missed the cut due to the fact that we did not have v3 support
15:08:49 <garyk> we plan to get this done in the coming weeks. there were also some loose ends regarding the v2 api
15:09:07 <n0ano> so is the plan to do that work for Icehouse instead?
15:09:19 <garyk> in short, yes, we have the time to complete the api. i certainly hope that we get it in by I1
15:09:31 <garyk> yes. plan is to do it in I
15:09:54 <n0ano> makes sense. is there any other design needed, or are you just at the implementation phase?
15:09:55 <garyk> At the moment we have anti-affinity scheduling in review and are in the process of thinking about host capabilities
15:10:13 <garyk> the ball is in our court to get down and do the implementations
15:10:19 <MikeSpreitzer1> hi
15:10:25 <garyk> MikeSpreitzer1: hi
15:10:38 <MikeSpreitzer1> Do we have an agenda?
15:10:55 <garyk> so i hope that in the near future we will have something up for review regarding the api (the cli and scheduler part are ready for review)
15:11:12 <garyk> MikeSpreitzer1: basically to go over summit action items etc. no formal agenda
15:11:55 <n0ano> sounds like it would be good to do the review for all at the same time - api, cli and scheduler - rather than doing it piecemeal
15:12:06 <garyk> In parallel Yathi will be working on his scheduling changes
15:12:31 <MikeSpreitzer2> I just had a connectivity glitch, may have missed a remark before the one about Yathi
15:12:38 <garyk> n0ano: we have all of the building blocks and hope just to get them in review soon
15:12:49 <n0ano> OK, sounds good
15:13:23 <garyk> MikeSpreitzer1: basically Yathi will be working on his 'smart resource placement' pluggable driver
15:13:23 <MikeSpreitzer2> n0ano: I would like to follow up on your question at the summit about scalability
15:14:04 <garyk> MikeSpreitzer2: i guess that this is a good time to talk about that.
15:14:11 <n0ano> MikeSpreitzer1, in what way? I'm hoping that Boris' memcache changes will address a large part of the current scalability problem; I'd like to see where that goes
15:14:27 <garyk> it would be nice if boris-42 could chime in regarding their performance developments.
15:14:30 <MikeSpreitzer2> Yes, I very much agree, Boris' suggestion is getting way too little love
15:14:56 <MikeSpreitzer2> I also wanted to talk about the goal posts (hoping to nail them down so they do not move)
15:15:02 <n0ano> love should be coming; as I understand it the patch is working, they just have to produce some performance numbers
15:15:18 <n0ano> MikeSpreitzer1, which goal posts?
15:15:25 <MikeSpreitzer2> for scalability
15:15:39 <toan-tran> well, i'm curious to see the numbers as well
15:15:51 <n0ano> aah, interesting question, typically that has always been: as good as possible
15:15:53 <toan-tran> as I understand it, LP solving is rather costly
15:16:18 <garyk> i think that we are all curious about the numbers. i am not sure how many schedulers they are using in their test case, but i understand that they simulate hundreds of nodes
15:16:33 <MikeSpreitzer2> "as good as possible" begs the question of "possible" — which begs the question of other requirements
15:16:38 <garyk> toan-tran: what is LP?
15:16:43 <n0ano> MikeSpreitzer2, exactly
15:16:58 <toan-tran> sorry, i'm talking about solver scheduler
15:17:03 <toan-tran> messed up a little
15:17:13 <garyk> toan-tran: no problem.
15:17:33 <garyk> i think that is work in progress and hopefully when Yathi is back from his vacation there will be more information on that.
15:17:47 <MikeSpreitzer2> OK, so this is what I was afraid of, no extrinsic "good enough" mark
15:18:28 <n0ano> MikeSpreitzer2, sort of, currently we're good for about ~200 nodes, the goal at least is to do ~1000 nodes, would like to do ~10000
15:18:45 <n0ano> I don't know if those are acceptable goals or not but they make sense to me
15:19:19 <toan-tran> 200 nodes in simulation or in a real test? and if simulation, which simulator?
15:19:30 <MikeSpreitzer> Just had another connectivity glitch, missed everything after the first numbers from n0ano
15:19:42 <MikeSpreitzer> Glad to have a number
15:19:47 <n0ano> MikeSpreitzer2, sort of, currently we're good for about ~200 nodes, the goal at least is to do ~1000 nodes, would like to do ~10000
15:19:52 <garyk> i think that rally (and i may be wrong here) is a testing environment where one can simulate load
15:19:55 <n0ano> I don't know if those are acceptable goals or not but they make sense to me
15:20:01 * n0ano loves cut/paste
15:20:31 <n0ano> toan-tran, the ~200 nodes comes from real world usage, not simulation
15:20:41 <MikeSpreitzer> So my group currently thinks our solver can do about 1K hosts, is unlikely to do 10K hosts well enough.
15:20:44 <toan-tran> nice
15:21:13 <garyk> MikeSpreitzer: i think that there are a number of different issues at hand
15:21:21 <garyk> 1. the interactions with the database
15:21:30 <MikeSpreitzer> I have started reading the Omega paper, which seems to be recommending multiple solvers with optimistic concurrency control as a way to scale beyond the ability of a single solver
15:21:42 <MikeSpreitzer> Yes, DB interaction is crucial
15:21:45 <n0ano> I believe that Boris is claiming his memcache should be able to handle ~10000, which is why I `really` want to see his real perf numbers
15:21:53 <garyk> 2. each scheduler being able to have a real time picture of the current situation
15:21:55 <MikeSpreitzer> In our current code the DB is more of a bottleneck than the solver
15:22:05 <garyk> 3. placement complexity
15:22:19 <garyk> yes, i agree. the db is the major bottleneck
15:22:36 <garyk> i think that is where boris-42's solution comes into play.
15:22:44 <MikeSpreitzer> That is why I raised the question of whether we could use NOSQL instead of SQL, it could be a major advance in DB efficiency.
15:22:45 <toan-tran> Mike: sorry, which paper are you talking about?
15:22:57 <garyk> i am not sure if it just touches on the problem or if it actually provides a real solution
15:23:06 <MikeSpreitzer> This year's EuroSys, a paper from Google folks on their Borg replacement called Omega
15:23:25 <garyk> MikeSpreitzer: can you please post a link if you have one
15:23:27 <MikeSpreitzer> Someone recommended it at the summit
15:23:29 <n0ano> also, what about the session where we talked about decisions that were `good enough`, not perfect, another way to scale up
15:23:42 <MikeSpreitzer> That's inherent in all our solver based approaches
15:23:45 <doron> MikeSpreitzer: boris-42's suggestion was to replace the db with memcache, which can be synced with several instances
15:23:59 <doron> this should resolve the db issue.
15:24:00 <MikeSpreitzer> But only part of DB usage is that
15:24:26 <doron> well IIRC the resource tracker should report to the scheduler
15:24:41 <MikeSpreitzer> For the Omega paper, just Google "Google Omega", it's the first hit
15:24:45 <doron> and basically this should remove the need for a db
15:25:12 <toan-tran> I have a remark on the implemented solver scheduler
15:25:27 <MikeSpreitzer> We do have a system with multiple moving parts regardless, so there has to be a DB for some level of coordination.
15:25:32 <n0ano> MikeSpreitzer, this link
15:25:38 <n0ano> #link http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
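Since the Omega paper comes up here as a way to scale past a single solver, here is a minimal Python sketch of optimistic concurrency control between schedulers, under stated assumptions: several schedulers read a shared view of host state, each decides on a private snapshot, and a commit succeeds only if the chosen host has not changed since that snapshot. `SharedCellState`, `HostState` and `schedule` are hypothetical names; this is not Nova, Omega or Boris' code, and a real deployment would need the version check to be atomic in whatever shared store holds host state.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class HostState:
    free_ram_mb: int
    version: int = 0  # bumped on every successful commit to this host


@dataclass
class SharedCellState:
    """Hypothetical shared host state: read freely, committed atomically."""
    hosts: dict
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def snapshot(self):
        # Each scheduler decides on a private copy, so no lock is held while solving.
        with self._lock:
            return {name: HostState(h.free_ram_mb, h.version) for name, h in self.hosts.items()}

    def try_commit(self, host, ram_mb, seen_version):
        # Optimistic check: reject if another scheduler changed this host since the snapshot.
        with self._lock:
            current = self.hosts[host]
            if current.version != seen_version or current.free_ram_mb < ram_mb:
                return False
            current.free_ram_mb -= ram_mb
            current.version += 1
            return True


def schedule(cell, ram_mb, max_retries=3):
    """Pick the host with the most free RAM from a snapshot; retry on conflict."""
    for _ in range(max_retries):
        view = cell.snapshot()
        candidates = [(h.free_ram_mb, name) for name, h in view.items() if h.free_ram_mb >= ram_mb]
        if not candidates:
            return None
        _, best = max(candidates)
        if cell.try_commit(best, ram_mb, view[best].version):
            return best
    return None
```

If two schedulers pick the same host at the same time, the slower commit sees a newer version, fails, and retries from a fresh snapshot instead of holding a lock for the whole decision.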
15:25:41 <toan-tran> it calls the DB (RAMWeigher) once to get the cost
15:25:53 <doron> MikeSpreitzer: I agree, but stats are volatile
15:26:37 <toan-tran> however, it does not reflect the load-balancing policy as RAMWeigher does
15:28:04 <toan-tran> basically the objective function cannot be linear
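toan-tran's point is that a load-balancing policy (RAMWeigher favours hosts with more free RAM) is hard to express as a linear objective. A small Python illustration, assuming two identical hypothetical hosts and two 2048 MB VMs: total free RAM, a linear objective with equal weight per host, comes out the same whether the VMs are packed or spread, while a nonlinear cost such as the sum of squared utilizations prefers the spread placement. The function names are illustrative only, not part of the solver scheduler.

```python
def total_free_ram(used_mb, capacity_mb):
    """Linear objective: sum of free RAM over hosts (equal weight per host)."""
    return sum(cap - used for used, cap in zip(used_mb, capacity_mb))


def squared_utilization(used_mb, capacity_mb):
    """Nonlinear objective: sum of squared utilization, lower when load is spread."""
    return sum((used / cap) ** 2 for used, cap in zip(used_mb, capacity_mb))


capacity = [8192, 8192]  # two identical hypothetical hosts, in MB
packed = [4096, 0]       # both 2048 MB VMs on the first host
spread = [2048, 2048]    # one 2048 MB VM on each host

# The linear objective cannot tell the two placements apart...
print(total_free_ram(packed, capacity), total_free_ram(spread, capacity))
# ...while the nonlinear one prefers the balanced placement.
print(squared_utilization(packed, capacity), squared_utilization(spread, capacity))
```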
15:28:13 <doron> guys, this is boris-42's BP: https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:29:38 <garyk> i hope that next week boris-42 could join us to elaborate on their developments
15:29:51 <doron> +1
15:29:55 <boris-42> garyk I am on vacation =)
15:30:00 <boris-42> garyk yes I will join
15:30:09 <boris-42> garyk we are going to finish implementation
15:30:23 <garyk> ok, enjoy the vacation. chat to you next week
15:31:02 <garyk> in addition to this yathi will hopefully be here next week to also discuss the constraints scheduling (i think that is what it is being called).
15:31:19 <garyk> PaulMurray: was there anything interesting you want to add from your sessions?
15:31:21 <toan-tran> garyk: +1
15:31:46 <PaulMurray> garyk sorry got distracted
15:32:01 <PaulMurray> do you want me to update?
15:32:21 <garyk> PaulMurray: np. just wanted to know if you wanted to add anything from the session that you did on the scheduling at the summit
15:32:48 <PaulMurray> the main point was sorting out the order of work between me, Lianhau and boris
15:33:05 <PaulMurray> Lianhau was doing scheduler metrics
15:33:22 <PaulMurray> and was ready to go, so I think several of his patches are merged now
15:33:24 <MikeSpreitzer> (I'd like to queue up a little discussion of the ML thread "Introducing the new OpenStack service for Containers")
15:33:47 <PaulMurray> I will add extensibility to the resource tracking
15:34:23 <PaulMurray> FYI - there was a discussion about boris' work there too.
15:34:42 <PaulMurray> In case the wrong idea came across - I am all for it - just didn't
15:34:57 <garyk> sounds good.
15:35:00 <n0ano> funny how all roads come back to Boris :-)
15:35:13 <PaulMurray> all roads come back to the db!
15:35:19 <garyk> true
15:35:22 <PaulMurray> that's the problem
15:35:26 <n0ano> PaulMurray, +1
15:36:01 <PaulMurray> unless there are questions there is little more to add right now
15:36:07 <PaulMurray> just getting code done.
15:36:24 <toan-tran> I have a question on the scheduler & API if you don't mind
15:36:31 <alaski> btw, I'm around and catching up on the meeting.  Got mixed up with the time change
15:36:34 <PaulMurray> sure
15:36:42 <garyk> ok, thanks for the update
15:37:02 <garyk> alaski: hi. also wanted to ask if you have updates regarding your scheduling sessions
15:37:07 <PaulMurray> toan-tran - go ahead
15:37:43 <toan-tran> I'm just curious where we're targetting with solver scheduler & instance group API
15:37:52 <toan-tran> Nova, Heat , or something else?
15:38:02 <garyk> toan-tran: good question.
15:38:12 <toan-tran> Boris also proposed separating scheduler from nova & cinder
15:38:20 <toan-tran> can we cooperate?
15:38:25 <garyk> ideally we would have liked the scheduler API to be able to define the whole application. This was deemed too complicated at this time
15:38:29 <MikeSpreitzer> Containers may also lead to separating scheduler
15:38:56 <alaski> I didn't have any scheduling sessions, but was at each of them and have opinions which I think are captured in etherpads overall
15:39:02 <doron> btw, neutron also needs a scheduler...
15:39:09 <toan-tran> ok neutron ...
15:39:22 <garyk> The solver scheduler will make use of the metadata key value pairs.
15:39:29 <MikeSpreitzer> Will containers and VMs compete for the same hosts?  If so, they need a common scheduler
15:39:31 <n0ano> neutron needs a scheduler??  that seems odd
15:39:44 <doron> n0ano: yep
15:39:53 <doron> they use 'service' vms for routing
15:40:11 <doron> and need to provide hints to control the placement
15:40:16 <MikeSpreitzer> Some networks also have some degrees of freedom in choosing routes
15:40:16 <n0ano> yet another push for Scheduler as a Service
15:40:19 <garyk> n0ano: the scheduler for neutron is pretty simple - just needs to select which dhcp or l3 node to use for the specific support
15:40:26 <garyk> at the moment it is round robin
15:40:35 <doron> I was in a session they discussed it.
15:40:41 <alaski> MikeSpreitzer: re: containers, it can be done many different ways, but most likely it will be done based on capabilities since they're different 'hypervisors'
15:40:44 <garyk> what we would like to do is add the ability to provide network proximity.
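garyk describes the current Neutron agent selection (which DHCP or L3 node serves a network) as round robin. A minimal Python sketch of what round-robin selection means, assuming a hypothetical static list of agent hosts; `RoundRobinAgentScheduler` and `pick` are illustrative names, not Neutron's scheduler code.

```python
import itertools


class RoundRobinAgentScheduler:
    """Hypothetical: hand out network agents (e.g. DHCP or L3 hosts) in turn."""

    def __init__(self, agents):
        if not agents:
            raise ValueError("need at least one agent")
        self._cycle = itertools.cycle(agents)

    def pick(self, network_id):
        # Round robin ignores network_id entirely: the choice does not depend on
        # where the network's instances live.
        return next(self._cycle)


scheduler = RoundRobinAgentScheduler(["dhcp-1", "dhcp-2", "dhcp-3"])
print([scheduler.pick(net) for net in ["net-a", "net-b", "net-c", "net-d"]])
```

Because `pick` never looks at the network, adding network proximity would mean replacing the cycle with a function that scores agents by their distance to the instances they serve.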
15:40:52 <n0ano> I missed that, sounds like they're trying to get overly complex
15:40:54 <doron> regardless of complexity the need for SaaS is gaining momentum
15:41:10 <doron> SaaS- Scheduling as a service..
15:41:36 <doron> so some of the summit ideas were very true
15:41:41 <doron> in this context.
15:41:52 <toan-tran> ok so now we need a scheduler as a completely independent component
15:42:06 <doron> true, but not immediately
15:42:12 <toan-tran> do we need an API like instance group API for it
15:42:18 <toan-tran> or RPC?
15:42:21 <n0ano> toan-tran, that's the implication, I haven't bought into the need yet
15:42:32 <doron> fixing scale should be a closer goal
15:42:40 <MikeSpreitzer> Let me ask again very specifically: will containers and VMs compete for the same hosts?  (Regardless of whether they use different hypervisors on those hosts)
15:42:45 <doron> then we can start discussing saas.
15:43:03 <doron> MikeSpreitzer: isn't it a matter of a use case?
15:43:20 <MikeSpreitzer> doron: ?
15:43:43 <alaski> MikeSpreitzer: with the current work a host will have either containers or VMs, not both
15:43:44 <ekarlso> when's the plan to deliver a alpha or beta ?
15:43:49 <doron> MikeSpreitzer: in some places you'd separate them and in other setups you allow such competition
15:43:59 <ekarlso> oops
15:44:03 <ekarlso> wrong chan ;p
15:44:17 <doron> MikeSpreitzer: and I'm not representing LXC ;)
15:44:31 <MikeSpreitzer> So I'll take alaski's answer for now
15:44:57 <MikeSpreitzer> alaski: and you mean that it is the cloud provider's job to manually allocate hosts to those two roles, right?
15:45:26 <toan-tran> i'm ok if we set up in Nova to work with group of VMs
15:45:29 <alaski> toan-tran: I agree with n0ano, it's a bit early to look at a separate scheduler.  But there was talk of using RPC from projects to that scheduler, but I'm not convinced of that approach.  I'm sure it will be heavily discussed at a later time
15:45:40 <alaski> MikeSpreitzer: yes
15:45:45 <toan-tran> however, if we introduce a new API then we have to plan far ahead
15:46:06 * doron recalls the Docker session, which suggested such a separation
15:46:20 <garyk> alaski: i agree. it is too early to discuss these issues. regarding the rpc, not sure if we would want nova to speak to cinder with rpc when there is a well defined REST api
15:46:51 <toan-tran> does the API target nova only, or is it planned for Neutron afterwards?
15:47:07 <MikeSpreitzer> garyk: why do you say "nova speak to cinder with rpc"?  I did not get that from alaski's remark
15:47:19 <garyk> toan-tran: at the moment we are only targeting nova. once we get the basic instance groups in then we can start to build on
15:47:46 <garyk> MikeSpreitzer: at the moment nova uses the cinder client to interface with cinder. this is rest based.
15:48:08 <MikeSpreitzer> garyk: connection to alaski's remark?
15:48:08 <garyk> rpc would be like having a backdoor. it may be the long term solution but at the moment seems like a hack.
15:48:28 <MikeSpreitzer> (still confused)
15:48:47 <garyk> MikeSpreitzer: sorry i do not understand. was alaski refering to an external scheduler or the nova one?
15:48:56 <toan-tran> garyk: i agree that we should aim for a close target
15:49:19 <alaski> garyk: I was referring to an external one
15:49:29 <garyk> alaski: ok, thanks for the clarification
15:49:39 <toan-tran> but we're creating an API before designing an architecture ! and that's not good
15:50:09 <toan-tran> if we get the group into nova
15:50:15 <toan-tran> and the API into v3
15:50:30 <toan-tran> then once the question of a separate scheduler comes up
15:50:38 <toan-tran> we have to redraw the design
15:50:52 <garyk> toan-tran: i think that we should focus on what we have at the moment.
15:51:01 <toan-tran> recreating the API once again to include neutron & cinder
15:51:07 <MikeSpreitzer> I see two architectures with an external scheduler.  1: optional thing that is upstream from Nova and maybe some other services too.  2: required element that every service calls under the covers.
15:51:41 <MikeSpreitzer> 2: s/every/some/
15:51:55 <n0ano> option 2 will be harder to push through, I'd prefer option 1 if those are my only choices
15:52:16 <MikeSpreitzer> Do you see other choices?
15:52:34 <toan-tran> Mike: so Nova (API) => Scheduler => Nova again?
15:52:40 <n0ano> option 3 - no external scheduler service
15:52:46 <MikeSpreitzer> t-t: that's #2
15:53:11 <toan-tran> Mike: then what's #1
15:53:14 <MikeSpreitzer> n0ano: does option three do anything cross service?
15:53:15 <toan-tran> ?
15:53:47 <n0ano> MikeSpreitzer, nope, each service does its own scheduling
15:53:56 <MikeSpreitzer> #1: (multi-service scheduler) -> [Heat ->] (Nova || Cinder || ..)
15:54:26 <toan-tran> Mike: oh, i'm thinking like Heat -> scheduler -> nova ...
15:54:29 <toan-tran> :)
15:54:36 <garyk> n0ano: the cross service scheduling is where things become of value. for example we want a cinder volume to be close to the nova instance
15:54:37 <n0ano> I put option 3 out for completeness, to make sure we know what we are deciding upon
15:54:42 <garyk> that is critical for performance
15:54:47 <MikeSpreitzer> Well, orchestration is downstream from joint decision making
15:54:56 <garyk> if one is running a best effort cloud then cool.
15:55:18 <garyk> but in order to provide a competitive service one needs to be able to provide some added value
15:55:41 <n0ano> I've always been confused: exactly what does orchestration do and how does it interact with scheduling?
15:56:01 <toan-tran> n0ano: orchestration of workflow
15:56:12 <toan-tran> it does not do scheduling (location of VMs)
15:56:31 <toan-tran> it only decides which VM should be initiated first
15:56:33 <MikeSpreitzer> I was surprised at the claim that "orchestration means everything to everyone".  I thought it was agreed to mean calling the various service APIs in the right order to get things done.
15:56:53 <garyk> i think that the orchestration and scheduling are mutually exclusive. but that is my opinion
15:57:15 <MikeSpreitzer> .. specifically to handle dependencies between resources
15:57:20 <alaski> orchestration is a very overloaded term.  it's used to talk about workflows within a service or creation of resources across services
15:57:30 <MikeSpreitzer> create/update/delete time dependencies
15:57:34 <garyk> alaski: agreed
15:57:37 <toan-tran> alaski: +1
15:57:46 <n0ano> garyk, sounds like they are more orthogonal than mutually exclusive but yes, not related
15:57:59 <MikeSpreitzer> yes, orthogonal
15:58:19 * n0ano loves the term orthogonal
15:58:25 <MikeSpreitzer> except that it makes no sense to do joint decision making downstream from orchestration
15:58:29 <garyk> yes, that is a better way of describing it
15:58:32 <toan-tran> in my understanding, the orchestrator decides which goes first: scheduling, configuration, rollback, etc
15:58:38 <MikeSpreitzer> So, technically, "independent"
15:59:06 <garyk> it looks like we are at the end of the hour
15:59:18 <garyk> hopefully next week boris-42 will be here for the performance part
15:59:27 <garyk> and yathi for the scheduling part
15:59:37 <MikeSpreitzer1> lost some stuff in another connectivity glitch
15:59:42 <MikeSpreitzer1> but we must be done now
15:59:47 <MikeSpreitzer1> I'll read the log
15:59:50 <garyk> i am going to hand the baton back to n0ano and next week he'll run the meeting
15:59:51 <n0ano> good discussion, let's continue this next week.
16:00:07 <n0ano> garyk, great, now you guys will have me to beat up upon :-)
16:00:09 <doron> thanks guys
16:00:13 <n0ano> tnx all
16:00:20 <garyk> so chat to you guys next week
16:00:30 <toan-tran> thanks, bye
16:00:35 <alaski> thanks
16:00:38 <garyk> #endmeeting