15:02:59 #startmeeting scheduler
15:03:01 Meeting started Tue Oct 22 15:02:59 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:04 The meeting name has been set to 'scheduler'
15:03:26 sorry about not making it last week
15:03:46 do we have any open issues regarding the summit sessions?
15:03:54 What is the final list?
15:04:06 The list isn't final yet
15:04:10 Last week we decided to merge API + smart resource placement sessions into one
15:04:15 but we're looking at 4 slots right now
15:04:25 after some discussion other pieces were taken out
15:04:30 so API and smart resource placement should get their own sessions
15:04:34 instance group api
15:04:38 oh that is sweet
15:04:55 So we got 4 slots?
15:05:14 is there an updated list of the latest session topics?
15:05:51 mspreitz: yes
15:06:03 great
15:06:31 are we documenting this final list somewhere?
15:06:35 garyk: not yet, some have been refused and preapproved but a lot aren't touched yet
15:06:52 alaski: ok, thanks
15:07:10 but the ones that were discussed last week are looking good for approval
15:07:26 great. thanks!
15:07:31 rethinking scheduler design, extensible metrics, instance group api, smarter resource placement
15:07:58 ok cool.
15:08:11 i have concerns with the metrics
15:08:24 rethinking design is taking over for performance since boris didn't propose a session
15:08:29 and they seem to have some overlap
15:08:30 not really regarding the session but issues that i stumbled on a few days ago
15:09:09 i think that is logical. i guess that the design considerations should take the performance and scale into account
15:09:24 garyk: are you going to be at the summit?
15:09:31 alaski: are you familiar with the
15:09:41 alaski: yes, i will be at the summit
15:10:01 BTW, what is the format of the design summit sessions going to be? I heard a suggestion of text chat only.
15:10:30 garyk: okay, it will be good if you can voice the issue in person
15:10:45 alaski: understood.
15:10:48 mspreitz: it's a discussion, with notetaking in etherpads
15:11:01 thanks
15:11:09 just wanted to ask about the resource tracking - it just seems to ignore all usage statistics from the hypervisor
15:11:28 garyk: examples?
15:11:44 that is, the hypervisor returns used disk and used memory
15:12:00 garyk: no good to ignore, IMHO
15:12:04 the scheduler is not aware of these as the resource tracker calculates the used memory and disk by itself
15:12:25 mspreitz: that is my concern too
15:12:46 Do we have evidence of whether or not that dead reckoning falls short in practice?
15:12:58 don't we update the host metrics after a scheduling is done?
15:13:04 I am surprised
15:13:04 (I have evidence from other systems that it will)
15:13:13 mspreitz: i am not sure.
15:14:03 my concern is that the scheduler may think that there is enough disk space but a cinder volume may take up space and the resource tracker may not be aware of this
15:14:20 that is just one case.
15:14:30 Yow, that's a really simple example.
15:14:44 If it can fail that way, won't we already have reports of problems?
15:14:48 i guess that i need to go back and do my homework
15:15:12 well the scheduler has this notion of retries..
15:15:19 I have a problem with updating too
15:15:24 I am guessing that is how it works now.. if something is not really available
15:15:29 when I create multiple VMs
15:15:45 I got that they are not registered immediately
15:15:46 please look at the comment - https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L576
15:16:17 toan-tran: not sure i understand your comment.
what do you mean by registered
15:16:26 meaning the scheduler scheduling the second VM does not see the first in the DB
15:16:39 sorry, register = update DB
15:17:15 I made a simple weigher that looks at the number of VMs in a host instead of available RAM
15:17:17 garyk: now it makes sense, probably it is just taking into consideration the current instance
15:17:27 toan-tran: yeah, there's definitely a race condition with multiple instances being scheduled in quick succession
15:17:29 and not the actual hypervisor's state
15:17:43 then I found out that the DB is not updated among multiple VMs
15:18:09 Yathi: but the hypervisor has the true picture of the actual state of the host - that is, the actual amount of free memory and disk space.
15:18:23 This is one reason I keep talking about scheduling against the union of observed and target state.
15:19:47 So there was a session proposed about cleaning up the resource tracker. We're passing it over based on there being no contention about cleaning it up.
15:19:54 something still needs to be done for race conditions I guess, when multiple scheduling calls happen in parallel or in quick succession
15:20:11 There are likely to be issues with it but I don't think there's resistance to fixing it up
15:20:19 Yeah. Take the union of plans and effects as the current usage.
15:20:19 Yathi: +1
15:21:06 alaski: i am in favor of fixing it up.
15:21:15 which of our planned sessions covers the resource tracking topic - enhanced metrics?
15:21:33 I think the other one... that's Boris' topic, right?
15:21:36 alaski: i just think that it would be nice if there were consideration of the actual usage on the hypervisor
15:21:54 garyk: agreed
15:21:58 garyk: +1
15:22:13 alaski, garyk: agree. Union the actual usage and the planned usage.
15:22:36 how often do we update the db to get the latest hypervisor states..
that matters I guess here
15:23:10 Yathi: enhanced metrics has some overlap with resource tracking concerns
15:23:14 If you use the union, latency only affects the speed with which you can reclaim freed space
15:23:26 garyk: +1
15:23:30 but overall the resource tracking issues are non-contentious. the work just needs to be done
15:24:47 resource tracker - http://docs.openstack.org/developer/nova/api/nova.compute.resource_tracker.html
15:25:02 Yathi: if I'm not wrong, once per several seconds at best
15:25:21 Maybe somebody could spell out those two current session proposals in a bit of detail, so we know what goes in which?
15:25:59 mspreitz: as far as i recall the one was about ceilometer/accessing the resources directly
15:26:03 what is part of the "rethinking design" session
15:26:32 i think line 63 - https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions
15:27:03 there is a blueprint on real resource usage: https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling
15:28:16 alaski: Yathi: who will be leading the "rethinking" session?
15:28:53 where is the "rethinking" session in our etherpad?
15:29:19 Yathi: I do not think that it appears.
15:29:45 garyk: I believe it's Mike Wilson(?)
15:29:59 goes by geekinutah in irc, but doesn't appear to be on
15:30:34 Yathi: it's not in the etherpad, but it took the place of scheduler performance
15:30:42 since there's no proposed session there
15:30:49 alaski: thanks.
15:31:08 I thought Boris was going to propose a session?
15:31:13 and it's looking to address similar concerns
15:31:23 mspreitz: I thought so too, but he didn't
15:31:48 i guess that we should try and sync on this topic so that we can be most effective when we meet up
15:32:52 garyk: sounds good, but what does that mean?
15:32:58 Can someone please point me to any written description of this "rethinking" session?
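[Editor's note] The failure mode garyk raises and the "union of plans and effects" mspreitz proposes can be sketched in a few lines. This is a toy model with made-up names and numbers, not nova's resource tracker: the tracker dead-reckons usage from the instances it placed, the hypervisor reports what is actually consumed (here including an out-of-band cinder volume), and taking the max of the two views is the safe union.

```python
# Toy model (hypothetical names/numbers, not nova code) of the
# resource-tracker concern discussed above.
TOTAL_DISK_GB = 100

# Plans: the tracker's dead reckoning -- sum of the instances it placed.
planned_used_gb = 40          # one 40 GB instance booted via nova

# Effects: what the hypervisor actually reports -- the instance plus
# a 30 GB cinder volume the tracker never accounted for.
observed_used_gb = 40 + 30

# Dead reckoning alone is optimistic:
tracker_free_gb = TOTAL_DISK_GB - planned_used_gb   # 60 -- scheduler's belief
actual_free_gb = TOTAL_DISK_GB - observed_used_gb   # 30 -- reality
# A 50 GB boot request passes the tracker's check but fails on disk.

# "Union of plans and effects": take the max of the two views, so a
# stale observation only delays reclaiming freed space and never
# causes over-commit.
union_free_gb = TOTAL_DISK_GB - max(planned_used_gb, observed_used_gb)
print(tracker_free_gb, actual_free_gb, union_free_gb)  # 60 30 30
```

This matches the latency point above: if the volume is deleted, the union view keeps the 30 GB "used" until the next hypervisor report, which is merely conservative.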
15:33:12 Yathi: http://summit.openstack.org/cfp/details/34
15:33:22 alaski: Thanks
15:33:40 mspreitz: i am not sure. i think that we need to have boris and theguyfromutah talk
15:33:53 Session 34 is different from Boris' topic
15:34:03 garyk: if we can do that, it would be great
15:34:59 There are some overlaps between the "rethinking" session and our "smart resource placement" ideas
15:35:13 #action try and get some talk about ideas of the rethinking prior to the summit
15:35:14 Yathi: yes. Smart has to be "good enough"
15:35:39 mspreitz: yeah the smart need to always wait for the most optimal solution
15:35:47 but it need not
15:35:53 You mean NOT always wait
15:36:05 yeah NOT always wait.. sorry
15:36:25 Optimization problems are usually NP-hard, you never expect to find the true optimum
15:37:32 there has to be a cut-off as to when to stop the minimization or maximization; as long as the constraints are satisfied, you are good..
15:37:35 So, yeah, I think going smart implies doing what session 34 asks for.
15:37:46 yathi: exactly
15:37:59 at the moment i feel that people are dealing with a lot of issues: placement, processing, interactions with databases etc.
15:38:14 i am not sure that we have one topic or idea that covers it all.
15:38:37 I think our idea for smart placement involves this one piece of a smart resource placement - a constraint solver, along with the other aspects
15:38:42 session 34 is also dealing with performance. geekinutah is dealing with a >1000 node cluster iirc and they've had performance issues they want to address
15:39:06 I did not think smart was only for small systems
15:39:54 by other aspects I mean - a common db that covers cross-service scheduling, suitable for high scale, improved performance over filter scheduling
15:40:13 alaski: is there any mention of the number of schedulers they are using?
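[Editor's note] The "cut-off" Yathi and mspreitz discuss above, stopping once the constraints are satisfied rather than hunting the NP-hard optimum, might look like this in miniature. The data, names, and brute-force search are purely illustrative, not any proposed scheduler code.

```python
import itertools

# Free RAM (GB) per host and the demands of a two-instance request;
# purely illustrative numbers.
hosts_free_ram = {"h1": 4, "h2": 8, "h3": 6}
demands = [3, 5]

def first_feasible(free_ram, needs):
    # Enumerate assignments of instances to distinct hosts and stop
    # at the first one where every instance fits: "good enough"
    # instead of exhaustively searching for the true optimum.
    for assignment in itertools.permutations(free_ram, len(needs)):
        if all(free_ram[h] >= n for h, n in zip(assignment, needs)):
            return assignment
    return None  # constraints cannot be satisfied at all

print(first_feasible(hosts_free_ram, demands))  # ('h1', 'h2')
```

A real solver would bound the search by time rather than take the first hit, but the principle is the same: feasibility is the hard requirement, optimality is best-effort.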
15:40:17 well we have a bunch of sessions with overlapping concerns
15:40:46 yathi: yes
15:40:55 garyk: it may have come up before but I don't recall, might be 1 though
15:41:03 ok, thanks
15:41:37 His etherpad explicitly suggests parallel schedulers
15:41:56 is SolverScheduler in smarter placement ?
15:43:09 toan-tran - SolverScheduler is one aspect of the smarter placement, but it involves other aspects too
15:43:27 Yathi: thanks
15:44:00 toan-tran: See line 53 in https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions
15:44:12 I'm just curious about the choice of LinearProgram
15:44:25 is it a little time-consuming?
15:45:25 the idea proposed is interesting
15:45:39 The idea is a pluggable constraints-based solver framework.. so any pluggable solvers can be included
15:45:45 i think that the pain points will arise when it comes to the messaging
15:46:05 that is, we need some kind of p2p messaging.
15:46:15 garyk: for what?
15:46:16 Yathi: ok, so not necessarily LP, thanks
15:46:32 for the "rethinking"
15:46:42 I'm lost.
15:46:48 p2p = peer to peer
15:46:49 ?
15:46:56 are we talking some kind of mapreduce kind of scheduling ?
15:47:00 mspreitz: i am going over what is written in https://etherpad.openstack.org/p/RethinkingSchedulerDesign
15:47:01 distributed
15:47:01 You mean offline one-on-one discussions?
15:47:34 that etherpad ends with a long list of alternatives
15:47:45 one of which is optimization orientation
15:49:00 i have to go over it in more detail. i am just concerned that the current infrastructure that we have may not be suited for something like this. i guess that when we discuss it we can see what is required, what is missing, and then address that.
15:49:39 garyk: mspreitz: if that etherpad has a bunch of alternatives, what is it mainly trying to achieve? - performance?
15:49:41 garyk: there are many "it"s there. My group has done some investigation of some of them.
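[Editor's note] Yathi's description above, a pluggable constraints-based framework where linear programming is just one possible backend, can be sketched as an engine that accepts any object exposing a solve() method. The names and the greedy backend below are assumptions for illustration, not the actual SolverScheduler API.

```python
class GreedyFirstFit:
    """One pluggable backend: place each instance on the first host
    with enough free capacity.  An LP- or constraint-programming
    solver could be swapped in, as long as it exposes solve()."""

    def solve(self, host_capacity, demands):
        free = dict(host_capacity)      # don't mutate the caller's view
        placement = {}
        for instance, need in demands.items():
            for host in free:
                if free[host] >= need:
                    placement[instance] = host
                    free[host] -= need
                    break
            else:
                return None  # this heuristic found no feasible placement
        return placement

def schedule(solver, host_capacity, demands):
    # The engine depends only on the solve() interface, which is the
    # "pluggable" part of the idea.
    return solver.solve(host_capacity, demands)

hosts = {"h1": 8, "h2": 4}
request = {"vm1": 5, "vm2": 4}
print(schedule(GreedyFirstFit(), hosts, request))
```

This also answers toan-tran's LP question in spirit: the framework fixes the interface, and how expensive the solve is depends entirely on which backend is plugged in.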
15:50:13 First two bullets say "scalability" to me
15:50:23 but maybe i am being a little conservative - that is, if we are unable to get very simple things in then how can we do something that is non-trivial
15:50:27 scalability in cloud size, request rate
15:50:30 not very clear - there could be several alternatives possible that way
15:50:47 I would also say we should be explicit about request size
15:51:03 when a request is for a whole pattern, not a single resource
15:51:59 yes, a request should be a whole pattern; only the scheduler can know how to place a collection of resources most optimally
15:52:14 We had a summer student with an economics background investigate a bidding approach that can solve joint problems with things like affinity. Takes several rounds of bidding to sort of converge.
15:53:00 The result was not strong enough to make us take that approach.
15:53:09 garyk: is it now related to instance group apis + the smarter placement taking the whole picture into consideration
15:53:19 i guess that we can all agree - it will be challenging and interesting :)
15:53:56 cross-services scheduling is key
15:54:13 agreed
15:54:17 we made some progress - combining cinder into nova to schedule based on volume affinity
15:54:28 Agreed too. but it also has the problems in "rethinking"
15:54:52 yup, i do not think that it was even addressed in the etherpad (but i may be wrong here)
15:55:45 are there any additional issues that we would like to address?
15:55:55 right now or at the summit?
15:56:12 now - we have ~4 min left
15:56:34 I'd like to plead for progress on the API issues before the summit.
15:56:40 No time to do anything now,
15:56:44 mspreitz: +1
15:56:48 but maybe we can agree to do something on the ML?
15:57:07 mspreitz: that would be great.
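[Editor's note] The cinder/nova volume-affinity progress mentioned above can be illustrated with a trivial weighting function: prefer the host that already holds the instance's volume. All names and data here are hypothetical, not the actual implementation.

```python
# Hypothetical data: which host stores each cinder volume.
volume_host = {"vol-1": "h2"}

def volume_affinity_weight(host, volume_id):
    # 1 if the candidate host already holds the volume, else 0.
    # A real weigher would normalize this and combine it with other
    # weights rather than use it alone.
    return 1 if volume_host.get(volume_id) == host else 0

candidates = ["h1", "h2", "h3"]
best = max(candidates, key=lambda h: volume_affinity_weight(h, "vol-1"))
print(best)  # h2
```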
15:57:33 maybe if the sessions are closed then next week we can start with discussing the APIs
15:57:40 garyk: mspreitz: I guess we have already made significant progress on the API work
15:57:52 we agreed on the model
15:58:05 leaving certain minor implementation specifics aside
15:58:32 now it is about the list of APIs to support..
15:58:43 and what the payload will be like
15:58:54 right. My group is implementing right now, I am hoping for convergence
15:59:32 mspreitz: Good, Debo and I are planning to push updates for the already committed instance group API code
15:59:56 but this is planned for Icehouse, and not planned to complete before the summit
16:00:25 ok. i really hope we can get this in during Icehouse and not miss this opportunity
16:00:31 i guess time is up.
16:00:31 It's all provisional, which is why I am concerned about convergence
16:00:40 chat to you guys next week
16:00:44 thanks
16:00:46 ok thanks
16:00:50 #endmeeting