15:03:34 <garyk> #startmeeting scheduling
15:03:35 <openstack> Meeting started Tue Sep 17 15:03:34 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:39 <openstack> The meeting name has been set to 'scheduling'
15:03:55 <garyk> hope that people are around to discuss
15:04:24 <garyk> #topic summit sessions
15:04:50 <garyk> Does anyone have any additional comments or updates to https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions
15:05:00 <MikeSpreitzer> yes
15:05:21 <garyk> MikeSpreitzer: ok, is that what you want to discuss later in the meeting or something else?
15:06:03 <Yathi> Debo and I added a topic called Smart Resource Placement, and we have added a blueprint
15:06:06 <MikeSpreitzer> Can I start with a clarification on the whole host allocation part...
15:06:20 <garyk> Yathi: thanks!
15:06:56 <alaski> MikeSpreitzer: what would you like clarification on?
15:06:56 <garyk> MikeSpreitzer: Sure. Unless people want to discuss something else regarding the proposed summit sessions
15:06:59 <MikeSpreitzer> Is whole host allocation about bare metal allocation, really exclusive allocation, or is it about some bigger unit of allocation (pool)?
15:07:28 <alaski> MikeSpreitzer: It's not about bare metal. It's about allocation to host aggregates, essentially
15:08:10 <alaski> host aggregates will be set aside for exclusive use by a tenant, or delegated tenants
15:08:24 <MikeSpreitzer> It is about giving one tenant control over a whole host aggregate, right?
15:08:30 <alaski> yes
15:08:33 <MikeSpreitzer> So it is about this larger unit of allocation.
15:08:38 <alaski> yep
15:09:08 <MikeSpreitzer> Why do we want that?
15:10:01 <garyk> performance and isolation may be motivations
15:10:11 <alaski> There are customer requests for this type of allocation. I've heard it's for concerns about resource isolation and somewhat for security concerns, though that's questionable
15:10:12 <garyk> security too
15:11:21 <MikeSpreitzer> Performance and isolation can be delivered by requesting performance and isolation from one undivided cloud, letting that cloud decide where to place for performance and isolation.
15:12:13 <MikeSpreitzer> Same thing for security, really.
15:13:24 <alaski> that's kind of what this is doing
15:13:35 <alaski> host aggregates just help the cloud decide where to place instances
15:13:52 <garyk> it is allowing the tenant to run their instances on specific resources that may be reserved for that specific tenant
15:14:22 <MikeSpreitzer> That sounds like AZ functionality.
15:15:31 <garyk> in my opinion it is just another option that is available that enables the cloud provider to meet certain standards.
15:15:33 <MikeSpreitzer> My point here is that a holistic scheduler that is aware of isolation issues could place for isolation, without having a separate feature for dividing up the cloud a priori.
15:16:41 <alaski> MikeSpreitzer: that's likely the case, though how does it ensure that there remain enough spots so that isolation is possible?
15:16:46 <garyk> i agree with you on that. but why not have the option of allocating a whole host?
15:16:59 <MikeSpreitzer> alaski: yes, ...
15:17:20 <MikeSpreitzer> (thinking on my feet here...)
15:17:39 <alaski> But whole host allocation is very early right now. I know it's going to be the topic of a lot of discussion, so alternative ideas are appreciated
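For context, a minimal standalone sketch of the placement behaviour being discussed: hosts in an aggregate reserved for one tenant are skipped for everyone else, while unreserved hosts accept anyone. This is purely illustrative, with made-up names and data shapes; it is not the whole-host-allocation blueprint's design nor Nova's actual filter code.

# Illustrative only: a simplified model of tenant-reserved host aggregates.
# Hypothetical data shapes; not Nova code.

def host_passes(host, request_tenant_id, aggregates):
    """Return True if request_tenant_id may land on this host.

    A host in an aggregate carrying a 'reserved_for_tenant' key is only
    usable by that tenant; hosts without a reservation accept anyone.
    """
    for agg in aggregates:
        if host not in agg["hosts"]:
            continue
        reserved = agg["metadata"].get("reserved_for_tenant")
        if reserved is not None and reserved != request_tenant_id:
            return False
    return True


if __name__ == "__main__":
    aggregates = [
        {"hosts": {"node1", "node2"}, "metadata": {"reserved_for_tenant": "tenant-a"}},
        {"hosts": {"node3"}, "metadata": {}},
    ]
    # tenant-a may use its reserved hosts; tenant-b only sees node3.
    print([h for h in ("node1", "node2", "node3")
           if host_passes(h, "tenant-b", aggregates)])  # -> ['node3']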
15:17:54 <MikeSpreitzer> OK, I'll stop here. I understand.
15:18:28 <garyk> MikeSpreitzer: please feel free to take your questions or reservations to the lists or bring them up here.
15:18:37 <MikeSpreitzer> Next session. For multiple scheduler policies, what sort of differences are involved?
15:19:03 <garyk> One point that came up at the Neutron meeting last night, and I am not sure if it is relevant here, is that people wanted to work only with etherpads at the summit and 'ban' presentations.
15:19:43 <garyk> glikson: you around?
15:19:51 <alaski> I like the idea. I think it's good for us to think about, but probably a topic for the Nova meeting
15:20:08 <garyk> ok.
15:20:24 <garyk> MikeSpreitzer: alex is not here to elaborate.
15:20:39 <MikeSpreitzer> OK, I'll pursue that separately
15:20:58 <MikeSpreitzer> Is Boris Pavlovic here?
15:20:59 <garyk> I think that it enables different scheduling policies to be invoked for different requests. That is, not have one global configuration
15:21:21 <garyk> MikeSpreitzer: not sure.
15:21:40 <garyk> Are there any additional things we want to discuss regarding the summit sessions?
15:21:43 <alaski> boris is boris_42. Doesn't look like he's here
15:22:02 <garyk> he is currently driving a rally
15:22:10 <MikeSpreitzer> I see significant overlap between the "Scheduling across Services" session proposal and the "Smart Resource Placement" session proposal.
15:22:35 <garyk> Yathi: do you think there is overlap here?
15:23:19 <garyk> I think that there may be room for some collaboration here.
15:23:25 <Yathi> Smart Resource Placement provides a generic framework to allow for complex constraints
15:23:41 <MikeSpreitzer> Yathi: between resources of different types?
15:24:02 <Yathi> yes, that is part of our idea
15:24:24 <MikeSpreitzer> Isn't that the essence of Scheduling Across Services?
15:25:36 <Yathi> i guess this framework is something that can be leveraged
15:25:47 <Yathi> to build complex constraints that run across services
15:25:51 <garyk> It is in a sense, and it is something that we touched on at the last summit, but we did not make any progress with it
15:25:58 <MikeSpreitzer> Anyway, I think I am just suggesting they go in the same session.
15:26:36 <MikeSpreitzer> I suppose I am also suggesting the proponents talk to each other and see about a merge beforehand.
15:26:37 <Yathi> scheduling across services calls for an orchestration framework
15:26:55 <Yathi> smart scheduling provides a pluggable solver framework
15:27:07 <MikeSpreitzer> um, anything calls for orchestration. What exactly do you mean?
15:27:08 <garyk> MikeSpreitzer: agreed. that is why we are discussing this now, to try and be more efficient when it comes to the summit
15:27:58 <Yathi> trying to separate orchestration between services from the decision-making framework
15:28:10 <Yathi> that is what I meant
15:28:30 <MikeSpreitzer> OK, no surprise there. The u-rpm proposal also has this, as does my group's running code.
15:28:32 <garyk> #action consider combining "smart resource placement" and "multiple scheduler policies" into one session
15:28:40 <MikeSpreitzer> if I understand you correctly
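As a rough illustration of what a "pluggable solver framework" for placement could look like: the caller hands a set of requested instances, candidate hosts, and constraints to whichever solver is configured, and the solver returns a placement. The interface and names below are hypothetical, not the Smart Resource Placement blueprint's actual API.

# Hypothetical sketch of a pluggable placement-solver interface.
# All names are made up; the real design may differ.
from abc import ABC, abstractmethod


class PlacementSolver(ABC):
    @abstractmethod
    def solve(self, requests, hosts, constraints):
        """Return a mapping of request id -> host name; raise if unsolvable."""


class GreedyRamSolver(PlacementSolver):
    """Trivial example solver: first-fit by free RAM, honouring constraints."""

    def solve(self, requests, hosts, constraints):
        free = {h["name"]: h["free_ram_mb"] for h in hosts}
        placement = {}
        for req in requests:
            for host, ram in sorted(free.items(), key=lambda kv: -kv[1]):
                if ram >= req["ram_mb"] and all(c(req, host) for c in constraints):
                    placement[req["id"]] = host
                    free[host] -= req["ram_mb"]
                    break
            else:
                raise RuntimeError("no feasible host for %s" % req["id"])
        return placement


if __name__ == "__main__":
    hosts = [{"name": "node1", "free_ram_mb": 4096}, {"name": "node2", "free_ram_mb": 2048}]
    reqs = [{"id": "vm1", "ram_mb": 3072}, {"id": "vm2", "ram_mb": 2048}]
    print(GreedyRamSolver().solve(reqs, hosts, constraints=[]))
    # -> {'vm1': 'node1', 'vm2': 'node2'}

A smarter solver (for example one wrapping an LP or constraint solver, possibly spanning resources from several services) could be dropped in behind the same interface without changing the calling code.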
15:29:56 <garyk> Anything else regarding the summit, or can we move on to resource tracking?
15:30:16 <MikeSpreitzer> I'm done
15:30:32 <garyk> #topic resource tracking
15:30:54 <alaski> I brought this up last time
15:30:54 <garyk> alaski: do you want to explain your ideas? last week we touched on it but the meeting was ending
15:31:31 <alaski> So my main idea is that I think it would be helpful to persist the resource tracker off of the compute node
15:31:45 <alaski> And have it be remotely accessible by other components, like conductor
15:32:06 <MikeSpreitzer> What does the tracker do?
15:32:31 <alaski> My thinking being that I want to speed up scheduling, so I want to get a host from the scheduler and then consult the resource tracker quickly without having to round-trip to the compute
15:32:50 <alaski> MikeSpreitzer: the resource tracker is the definitive source of what resources are available/used on a compute
15:33:08 <alaski> definitive in Nova, I mean
15:33:09 <MikeSpreitzer> Really definitive, or a convenient cache?
15:33:35 <alaski> As definitive as we get in Nova, it could still mismatch reality a bit
15:33:43 <MikeSpreitzer> I would expect the hypervisor is the definitive source regarding what is actually being used now.
15:33:51 <garyk> alaski: in some cases there is querying from the db, would that be replaced by interfacing with the conductor instead?
15:34:11 <alaski> MikeSpreitzer: you're correct, so in that sense it is a cache
15:34:17 <MikeSpreitzer> This is where the distinction between what I call observed and target state matters...
15:34:46 <MikeSpreitzer> The observed state is a convenient cache of the real state, and the target state is about allocations that may or may not be in effect right now.
15:35:06 <alaski> garyk: I'm not sure where the db queries are, so I don't know. But possibly
15:36:07 <garyk> alaski: could this be related to the changes that boris and co are doing with the messages (i have yet to look at that code)
15:36:43 <alaski> MikeSpreitzer: I have your emails flagged and need to read those thoroughly. I think we all want to move in a similar direction and need to figure out how to come together
15:36:55 <MikeSpreitzer> The read of nova's DB, in preference to (a cache of) reads from hypervisors, would be to get target state.
15:37:49 <MikeSpreitzer> Yes, I am also trying to catch up on the other work here and help figure out how to bring it all together.
15:37:50 <alaski> garyk: possibly, in the sense of using the same pattern for setting it up. But resource tracker and scheduler are separate entities, so it's not likely to be touched by his work
15:37:57 <garyk> the complexity is being able to sync all schedulers
15:38:14 <garyk> alaski: ok, understood
15:38:45 <MikeSpreitzer> garyk: I wonder which multiplicity you are referring to. Different services, or different cells/regions/...?
15:39:33 <garyk> MikeSpreitzer: i am trying to understand how the conductor(s) will manage the data and enable the scheduler(s) to access and use it
15:40:19 <MikeSpreitzer> (I need to learn what a conductor is)
15:40:49 <MikeSpreitzer> garyk: I am still wondering which multiplicity of scheduler you are referring to.
15:41:02 <alaski> garyk: the way I'm looking at it, the conductor queries the scheduler for a host or list of hosts, then it consults the resource tracker to make sure the instance will fit on that host
15:41:25 <MikeSpreitzer> garyk: I do not know what you meant by "the changes that boris and co are doing with the messages"... can you identify it another way?
15:41:32 <alaski> Right now we have to send the build to the compute host before it can fail the resource tracker check. I want it to fail faster
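A sketch of the conductor-side flow being proposed here: ask the scheduler for candidate hosts, then claim against a resource-tracker view persisted off the compute node, so a "doesn't fit" failure surfaces immediately instead of after a round trip to nova-compute. Names and data shapes are illustrative assumptions, not Nova's actual API.

# Hypothetical, simplified model of the proposed flow; not Nova code.

class InMemoryTrackerStore:
    """Stand-in for a persisted resource-tracker view (really a DB or other store)."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = dict(free_ram_mb)

    def try_claim(self, host, ram_mb):
        # In a real shared store this check-and-decrement would need to be atomic.
        if self.free_ram_mb.get(host, 0) >= ram_mb:
            self.free_ram_mb[host] -= ram_mb
            return True
        return False


def build_instance(candidate_hosts, tracker, ram_mb, cast_build):
    """Claim against the tracker first; only contact the compute once a claim succeeds."""
    for host in candidate_hosts:
        if tracker.try_claim(host, ram_mb):
            cast_build(host)  # only now send the build to the compute host
            return host
    raise RuntimeError("NoValidHost: every candidate failed the claim")


if __name__ == "__main__":
    tracker = InMemoryTrackerStore({"node1": 512, "node2": 4096})
    print(build_instance(["node1", "node2"], tracker, ram_mb=1024,
                         cast_build=lambda h: None))  # -> node2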
15:41:53 <glikson> alaski: wouldn't the scheduler already check the available capacity? or are you suggesting to separate the two?
15:42:43 <alaski> glikson: I think they're already separate. TBH I don't know everything that the scheduler looks at, I should dig into that a bit
15:43:12 <alaski> But I know that sometimes an instance is scheduled to a host and then there's not actually enough free memory to build the instance
15:43:26 <garyk> alaski: in that case it would go to rescheduling.
15:43:43 <glikson> alaski: that might happen because of race conditions between schedulers, for example..
15:44:11 <MikeSpreitzer> alaski: I have had colleagues running clouds tell me that happens for a variety of reasons, mistakes/discrepancies are possible at every level
15:44:44 <alaski> garyk: right. My main concern is optimizing it so the schedule/reschedule loop can be faster
15:44:59 <garyk> i think that there is an overcommit ratio that takes things like this into account (but I may be wrong)
15:45:15 <alaski> glikson: yes. My understanding is that scheduling is a best-attempt, fail-fast setup. I want failure to be as fast as possible
15:45:41 <MikeSpreitzer> alaski: I'm with you on that...
15:45:55 <MikeSpreitzer> but every cache has lag, and there can be a nasty surprise in rare cases.
15:46:08 <garyk> alaski: is this something that will work with multiple conductors (sorry, I am slow today)
15:46:41 <glikson> alaski: so, are you thinking to keep that somewhere other than the DB, to keep better track of in-flight requests?
15:47:06 <alaski> MikeSpreitzer: true. It's worth me looking into what can go wrong. I guess I'm thinking of a write-through type cache where lag shouldn't present itself, but I suppose it could
15:47:18 <garyk> at the moment the flow is api -> scheduler -> compute node
15:47:45 <MikeSpreitzer> only one scheduler can allocate on a compute node, I take it.
15:47:58 <alaski> garyk: it would need to. Right now the resource tracker has synchronization based on being on a single compute, but moving it off the compute means we need to address synchronization another way
15:48:16 <alaski> glikson: right now the resource tracker is in memory on the compute, I want it in a db or other store
15:48:20 <garyk> alaski: ok
15:48:52 <MikeSpreitzer> I have heard that when VM creation or deletion has a strange failure, a zombie can be left using memory that the scheduler does not realize exists.
15:49:35 <glikson> alaski: I thought it is already using the DB.. but maybe I'm confusing it with something else.
15:49:40 <alaski> MikeSpreitzer: multiple schedulers can allocate to a compute. It's racy, but known to be racy, and the resource tracker is the control point
15:50:02 <glikson> alaski: didn't we just move those updates from using rpc fanout to using the DB?
15:50:15 <garyk> alaski: it would be interesting to discuss the data structure for the resource tracking in more detail
15:51:30 <glikson> alaski: or are you talking about the part that generates those updates, at nova-compute?
15:51:42 <alaski> glikson: I think we're talking about different things. But now you have me wondering if it's sending data up to the scheduler
15:52:33 <alaski> glikson: it's possible. I'm talking about the part that runs instance_claim() to claim resources
15:52:46 <alaski> but it may also be populating something for the scheduler to use
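On the synchronization point raised above: today instance_claim() can rely on being the only claimer on its own compute node, whereas a tracker persisted in a shared store would need some other form of atomicity, for example a conditional update. A rough, hypothetical illustration of that idea using SQLite (made-up schema, not Nova's):

# Rough sketch of a race-free claim against a shared store via a conditional
# UPDATE; purely illustrative, not Nova's resource tracker.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_nodes (host TEXT PRIMARY KEY, free_ram_mb INTEGER)")
conn.execute("INSERT INTO compute_nodes VALUES ('node1', 2048)")
conn.commit()


def claim(host, ram_mb):
    # Atomic compare-and-decrement: two racing claimers cannot both take
    # the last of the capacity, even without an application-level lock.
    cur = conn.execute(
        "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ? "
        "WHERE host = ? AND free_ram_mb >= ?",
        (ram_mb, host, ram_mb))
    conn.commit()
    return cur.rowcount == 1


print(claim("node1", 1536))  # True: 2048 MB available, claim succeeds
print(claim("node1", 1536))  # False: only 512 MB left, claim is rejected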
15:53:07 <garyk> we may be running out of time. do we want to continue with this or switch to MikeSpreitzer's mails and document? We could discuss that next week, as I am not sure many of us got to read https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps/edit
15:53:37 <alaski> I say we switch. I think I need to research a bit more and come up with a more solid proposal
15:53:54 <garyk> alaski: ok
15:54:05 <garyk> #topic MikeSpreitzer's mail
15:54:22 <garyk> MikeSpreitzer: with the few minutes left
15:54:32 <garyk> we can always continue next week
15:54:52 <glikson> I also had a quick question regarding the proposal to consider merging the multi-sched and smart-sched proposals, raised while I was away for a few minutes..
15:55:01 <MikeSpreitzer> I am finding rough alignment between the u-rpm proposal and my group's work..
15:55:16 <MikeSpreitzer> so I thought I would outline what we have worked out.
15:55:38 <garyk> glikson: MikeSpreitzer suggested that we have them together, as there may be some overlap
15:55:43 <MikeSpreitzer> I have not yet roadmapped to a set of small changes, just wanted some review of the overall vision.
15:56:13 <MikeSpreitzer> and hope to help out
15:56:14 <garyk> glikson: i guess we can take it offline and discuss
15:56:24 <Yathi> garyk, glikson, I think garyk meant smart-sched and 'scheduling across services'
15:56:55 <garyk> Yathi: yes, that is what I meant. sorry, my bad
15:56:56 <glikson> garyk: I think the two are complementary -- one to introduce a new scheduler driver, and the second to have different driver configs co-exist within the same scheduler instance (regardless of which driver it is)
15:57:12 <MikeSpreitzer> I think there is big overlap between those two session proposals and what I wrote about.
15:57:51 <garyk> I think that we should try and read what you have written and then discuss it next week.
15:58:06 <MikeSpreitzer> OK
15:58:11 <garyk> I guess we could also have some time to see what we can combine (if possible)
15:58:23 <garyk> #action discuss MikeSpreitzer proposal next week
15:58:41 <garyk> #action check if we can merge/combine sessions
15:58:41 <glikson> Yathi: ah, ok. I personally think those two are also complementary -- the optimization approach is rather orthogonal to the scope of the optimization problem to solve..
15:59:11 <MikeSpreitzer> I thought it was said that Smart Resource Placement is also about going across services
16:00:03 <MikeSpreitzer> Yathi: right?
16:00:21 <garyk> I am sorry, but I guess we will have to continue next week.
16:00:34 <garyk> thanks guys
16:00:36 <garyk> #endmeeting