15:01:00 <n0ano> #startmeeting scheduler
15:01:01 <openstack> Meeting started Tue Nov 26 15:01:00 2013 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:05 <openstack> The meeting name has been set to 'scheduler'
15:01:38 <alaski> hi
15:01:43 <n0ano> bauzas, welcome (we don't bite - much :-)
15:01:56 <bauzas> n0ano: thanks :)
15:02:51 <n0ano> I sent out an agenda but most of the people that are concerned with those items aren't here yet
15:03:51 <n0ano> Given the US holiday this week this meeting might be a bust
15:04:04 <toan-tran> I see Boris with memcached based
15:04:09 <toan-tran> Yathi with instance group
15:04:09 <jgallard> hi all
15:04:19 <toan-tran> collins cannot join
15:04:29 <n0ano> boris doesn't appear to be online and yathi hasn't said anything
15:04:38 <toan-tran> what is "black box scheduler" ?
15:04:44 <MikeSpreitzer> hi
15:04:50 <Yathi> Hi
15:05:29 <n0ano> a session from the summit, basically allow the system to use a `black box' scheduler, put in the data and the black box gives the scheduling answer
15:05:56 <jgallard> this is the same thing as "scheduling as a service" ?
15:06:04 <MikeSpreitzer> How is BB sched different from plugging in a custom scheduler?
15:06:07 <bauzas> n0ano: is it related to the scheduling-as-a-service thing ?
15:06:07 <Yathi> is it the session we proposed ? - smart resource placement ?
15:06:14 <bauzas> jgallard: :)
15:06:17 <jgallard> :)
15:06:38 <n0ano> jgallard, I don't think so, saas is move the scheduler into a separately addressable service, black box is changing the internals of the scheduler
15:06:40 <Yathi> garyk are you on?
15:07:23 <alaski> I think a "black box" scheduler would have to be a new scheduler that's plugged in rather than filter_scheduler. There would have to be a compelling reason for a deployer to use it
15:07:25 <garyk> hi, sorry, was on a call
15:07:28 <n0ano> Yathi, I believe the BB was from the rethinking scheduler design session
15:07:33 <jgallard> n0ano: ok, thanks for the clarification
15:07:53 <toan-tran> do we have an etherpad on BB?
15:08:16 <n0ano> one was started at the summit, it should still be there
15:08:36 <n0ano> #topic black box scheduler
15:08:49 <garyk> toan-tran: let me try and look up lifeless's etherpad on the scheduling
15:09:06 <toan-tran> garyk: thanks
15:09:24 <n0ano> alaski, yes, I was worried about throwing out the baby with the bath water with this proposal
15:09:29 <Yathi> It will be good to have the link... for all the session etherpads.. I seem to have lost it
15:09:55 <MikeSpreitzer> https://wiki.openstack.org/wiki/Summit/Icehouse/Etherpads#Nova
15:10:01 <n0ano> the current filter scheduler has some scaling concerns, I don't know that we have to throw it away completely to address them.
15:10:10 <MikeSpreitzer> I do not see the words "black box" on that index
15:10:19 <jgallard> MikeSpreitzer: thanks
15:10:25 <garyk> here is the proposal - https://etherpad.openstack.org/p/icehouse-external-scheduler
15:10:57 <n0ano> MikeSpreitzer, that's my interpretation, that's probably not the exact words from the session but I think it describes it better
15:11:22 <MikeSpreitzer> What garyk posted is Robert Collins' proposal
15:11:38 <MikeSpreitzer> that's not "black box", that's code refactoring
15:11:44 <bauzas> n0ano: was it about extending the resource tracker ?
15:11:46 <garyk> MikeSpreitzer: yes, that is correct. it seems to be gaining momentum
15:12:01 <garyk> My understanding is that the first step will be code moving
15:12:12 <garyk> Then there will be discussion how to make it into a service
15:12:33 <bauzas> garyk: agreed, that's the saas goal
15:12:34 <MikeSpreitzer> garyk: neither of those is "black box", at least as the words are usually construed
15:12:44 <garyk> :)
15:12:44 <n0ano> bauzas, it was to create a set of constraints that could be fed to an industry standard scheduler code
15:12:51 <Yathi> black box, if I remember about the rethinking scheduler design proposal - it is about the multiple scheduler threads
15:13:19 <Yathi> but can't recollect this being called as black box
15:13:23 <bauzas> n0ano: ah, so you talk about this one ? https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetrics
15:13:35 <MikeSpreitzer> OK, maybe the problem is just bad wording on today's agenda
15:13:38 <alaski> I think there was concern that a solver scheduler would be a black box
15:14:10 <Yathi> as part of the smart resource placement design session - we talked about Solver Scheduler - a constraint based solver
15:14:25 <n0ano> bauzas, no, that's not it either, let me look
15:14:28 <Yathi> a pluggable "black box" so to say!
15:14:43 <MikeSpreitzer> Any replaceable module in a system is a black box in that sense, alaski, right? We define its interface, internals are private.
15:15:09 <toan-tran> pluggable? plugged to what?
15:15:13 <toan-tran> nova?
15:15:14 <n0ano> found it - https://etherpad.openstack.org/p/RethinkingSchedulerDesign
15:15:20 <toan-tran> or openstack in general?
15:15:52 <alaski> MikeSpreitzer: in a sense yes. But with the filter_scheduler it's easy to trace how it made its decision, a solver scheduler was considered a potential black box because there's not that same traceability
15:16:23 <alaski> it's more about debugging issues when the scheduler doesn't return an answer you expect
15:16:30 <MikeSpreitzer> ah yes, I remember that remark
15:16:37 <Yathi> exactly.. this issue was raised at the session
15:17:39 <Yathi> traceability may have to be introduced, probably with some logging if it is possible
15:17:50 <MikeSpreitzer> But I'm not sure what to do with it. Are we to shy away from every computation that is not easy to reproduce in someone's head?
15:18:43 <alaski> In my opinion, no. But it can't be the only, or default, option for Nova
15:19:00 <MikeSpreitzer> because...?
15:19:20 <alaski> because the default is used for gating code changes and traceability is a necessity
15:19:42 <MikeSpreitzer> what exactly do you mean by "traceability"?
15:20:06 <Yathi> I believe even with filter scheduler - it is a list of filters
15:20:16 <Yathi> so some filters will fail and log ?
15:20:30 <alaski> understanding why a scheduling decision was made. If Jenkins fails a gate check because it couldn't schedule an instance, I want to know why
15:20:39 <MikeSpreitzer> thanks
15:20:52 <toan-tran> alaski: we have logs on every filter
15:20:59 <toan-tran> can't that help?
15:21:41 <MikeSpreitzer> In my group's previous work, we developed a replay framework. Problem instances can be logged completely, and replayed into a test harness for debugging purposes.
15:21:57 <MikeSpreitzer> Essentially, a formalized kind of log that can be replayed.
15:22:14 <n0ano> MikeSpreitzer, but how do you know the exact state that the system was in in order to replay things
15:22:34 <MikeSpreitzer> The log contains all the relevant information.
15:22:35 <alaski> toan-tran: I'm not sure if there are logs on every filter, but they can be added. And there is a blueprint for additional logging in the scheduler being worked on
15:23:07 <toan-tran> alaski: at least the filter_scheduler says which filter returns which hosts
15:23:34 <toan-tran> of course, it's inside the filter that we have to add log if we need more details
15:23:44 <n0ano> always remembering that logging adds overhead, we're already concerned about scheduler efficiency
15:23:50 <toan-tran> we also have Error code, although not very detailed
15:24:03 <alaski> toan-tran: right. The filter_scheduler is fine, the concern is regarding a potential new scheduler which is based on more complicated solving methods, or possibly even heuristics
15:24:34 <alaski> and by fine I mean not too bad, it could certainly be better
15:24:41 <toan-tran> alaski: agreed
15:24:54 <MikeSpreitzer> Regardless of decision method, same inputs apply, right?
15:25:25 <MikeSpreitzer> Would it be OK to have a variable level of logging? Full in the gate, production might be less?
15:25:34 <Yathi> just to be clear.. the idea is not yet to replace Filter scheduler.. provide an additional option for a scheduler driver
15:25:50 <n0ano> MikeSpreitzer, I think that's an absolute requirement
15:25:52 <cfriesen> is logging really expensive? I thought the issue was mostly the time to pull the data out of the database?
15:26:06 <toan-tran> so basically we need a framework to write the new scheduler, some steps that it must voice the state?
15:26:17 <alaski> Yathi: yes.
15:26:38 <n0ano> cfriesen, the logs have to be stored somewhere, we're already concerned about DB access, this would just make it worse
15:26:47 <alaski> MikeSpreitzer: variable logging would be great
15:27:10 <cfriesen> why not just stream the logs via syslog?
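The traceability point discussed above (knowing which filter eliminated which hosts, at a verbosity chosen per deployment) could be sketched roughly as follows. This uses only Python's standard logging module; the filter functions, the host dictionaries, the logger name and the `run_filters` helper are hypothetical illustrations, not Nova's filter_scheduler code.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
LOG = logging.getLogger("scheduler_sketch.filters")


def ram_filter(host, request):
    """Pass hosts with enough free RAM for the request."""
    return host["free_ram_mb"] >= request["ram_mb"]


def disk_filter(host, request):
    """Pass hosts with enough free disk for the request."""
    return host["free_disk_gb"] >= request["disk_gb"]


def run_filters(hosts, request, filters):
    """Apply each filter in order, recording which hosts each one rejected."""
    trace = []
    remaining = list(hosts)
    for flt in filters:
        passed = [h for h in remaining if flt(h, request)]
        rejected = [h["name"] for h in remaining if h not in passed]
        trace.append((flt.__name__, rejected))
        # Per-filter detail is only emitted when DEBUG is enabled,
        # e.g. full in the gate, less in production.
        LOG.debug("%s rejected %s", flt.__name__, rejected)
        remaining = passed
    if not remaining:
        LOG.warning("no valid host for request %s; trace=%s", request, trace)
    return remaining, trace
```

Returning the trace alongside the result means a caller can inspect why a host was dropped even when debug logging is switched off.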
15:27:20 <Yathi> I think it is about enhancing a decision making engine to be able to clearly log which of the constraints did not satisfy
15:27:50 <MikeSpreitzer> Yathi: getting a log of complete input is non-trivial
15:27:59 <MikeSpreitzer> but necessary to replay and explain.
15:28:18 <MikeSpreitzer> However, note that some serious guys do very extensive logging all the time
15:28:29 <n0ano> cfriesen, possible but one of the ideas is creating multiple schedulers, with multiples a single log point would be helpful although maybe I'm overthinking things
15:28:36 <MikeSpreitzer> Do I recall correctly that Google logs a lot all the time?
15:29:16 <Yathi> I guess I don't have anything else to add here at this point on the logging aspect
15:30:00 <toan-tran> log is not good, we should think about creating info objects
15:30:11 <MikeSpreitzer> I have experience with IBM products that offer variable level of logging. Our product guys love it. I hate it when called in to debug a customer problem, they always logged too little, so it always starts with "turn up the logging to XXX and then reproduce the problem"
15:30:12 <toan-tran> I think we have a blueprint for that
15:30:25 <n0ano> toan-tran, not sure I understand what you mean about objects
15:30:48 <alaski> toan-tran: https://blueprints.launchpad.net/nova/+spec/record-scheduler-information though it's still under discussion
15:31:08 <n0ano> MikeSpreitzer, but at least that's an option vs. no or minimal logging
15:31:44 <MikeSpreitzer> yes
15:32:12 <MikeSpreitzer> What we did at first is to make some of our optional logging have a very precise and parseable format, put all information on scheduler problems in there.
15:32:30 <n0ano> well, one take away from this seems to be a consensus that we need to consider logging, especially variable level
15:32:39 <MikeSpreitzer> Later the product guys got interested in non-optional binary logging of structured data, but I'm not sure how far they have taken it thus far.
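The idea of a precise, parseable log that captures the whole scheduling problem and can be replayed might look something like the sketch below. It is only an illustration under assumptions: one JSON record per problem, a toy solver, and hypothetical names (`pick_least_loaded`, `solve_and_record`, `replay`); it is not the format used by MikeSpreitzer's group or by Nova.

```python
import json


def pick_least_loaded(hosts, request):
    """Toy solver: choose the fitting host with the most free RAM."""
    fitting = [h for h in hosts if h["free_ram_mb"] >= request["ram_mb"]]
    if not fitting:
        return None
    return max(fitting, key=lambda h: h["free_ram_mb"])["name"]


def solve_and_record(solver, hosts, request):
    """Run the solver and return (answer, one-line JSON record of the problem)."""
    answer = solver(hosts, request)
    record = json.dumps(
        {"hosts": hosts, "request": request, "answer": answer},
        sort_keys=True,
    )
    return answer, record


def replay(solver, record):
    """Re-run a logged problem; return (logged answer, answer computed now)."""
    data = json.loads(record)
    return data["answer"], solver(data["hosts"], data["request"])
```

Because the record holds the complete input as well as the answer, the same line can be fed to a different solver in a test harness and the two answers compared.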
15:32:50 <n0ano> I don't know if there is any kind of logging standard in OpenStack, anybody know?
15:33:28 <russellb> openstack/common/log.py is what everything uses
15:33:29 <toan-tran> alaski: this is what I'm talking about: https://blueprints.launchpad.net/nova/+spec/add-missing-notifications
15:34:02 <toan-tran> I remember it has had more information than current version
15:34:22 <n0ano> russellb, which I believe puts everything in files on the local machine with no level capability
15:34:54 <alaski> n0ano: there are level capabilities
15:35:20 <toan-tran> and this one: https://blueprints.launchpad.net/nova/+spec/notification-compute-scheduler
15:35:20 <n0ano> alaski, which are settable from configuration files/run time?
15:35:27 <russellb> and can use syslog
15:35:41 <russellb> yes, you configure what levels you want logged
15:35:44 <russellb> and where you want the logs to go
15:35:48 <bauzas> n0ano: you just set it explicitly
15:36:23 <n0ano> sounds like the infrastructure is there then, we just need to make sure all the filters use the logging services properly
15:37:05 <MikeSpreitzer> And if we want to be able to debug scheduler decision making, "properly" means log all the relevant information at the chosen log level.
15:37:20 <n0ano> MikeSpreitzer, +2
15:37:28 <n0ano> s/+2/+1
15:38:00 <toan-tran> Mike: +1
15:38:07 <toan-tran> the question is, how we find "relevant"?
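As a rough illustration of configuring which levels are logged and where the logs go, the sketch below uses Python's standard logging module directly rather than openstack/common/log.py, whose options are not shown in this meeting. The logger names are hypothetical; the point is only that one named logger can be verbose while everything else stays quiet.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        # logging.handlers.SysLogHandler could be used here to send to syslog.
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        # Hypothetical logger name; verbose only for scheduler decisions.
        "scheduler_sketch": {"level": "DEBUG"},
    },
    "root": {"level": "WARNING", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING)

sched_log = logging.getLogger("scheduler_sketch.filters")
other_log = logging.getLogger("some_other_component")

# Emitted: the child logger inherits DEBUG from "scheduler_sketch".
sched_log.debug("host %s rejected by %s", "node-1", "ram_filter")
# Dropped: this logger falls back to the WARNING root level.
other_log.debug("not shown")
```

Turning up the detail for scheduling then becomes a configuration change on one logger name rather than a code change.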
15:38:27 <bauzas> I only played with a global logger for the whole project, don't know if we can have a special logger for scheduling things
15:38:48 <bauzas> afaik, the logger is global to nova
15:38:59 <n0ano> bauzas, I would hope we don't need anything special, standard logging services should be fine
15:39:06 <toan-tran> the problem of text logging is that the developer of a scheduler can write anything in the log
15:39:19 <toan-tran> which is not necessarily meaningful to others
15:39:22 <bauzas> n0ano: then you're fine
15:39:35 <MikeSpreitzer> toan-tran: that's why I talk about a precisely defined format for the scheduler problem info
15:39:47 <toan-tran> Mike: agreed!
15:40:03 <toan-tran> should we create a log class for that?
15:40:11 <toan-tran> put some structure into what is logged
15:40:56 <alaski> That's probably a good idea, but I think there's more immediate work before that becomes a concern
15:41:02 <MikeSpreitzer> I agree
15:41:15 <n0ano> some structure is good as long as there is the freedom to add other things that aren't part of the structure
15:41:43 <bauzas> toan-tran: there is no need for a log class
15:42:08 <bauzas> you just have to explicitly define which logger name you want
15:42:15 <n0ano> I'm feeling that someone needs to create a BP to propose some standardized logging for the current scheduler filters
15:42:16 <MikeSpreitzer> In my group's work, we have an internal API to the solver, and it has simple style: input is a whole problem, output is a whole answer. It is pretty easy to do complete logging in that case.
15:43:06 <MikeSpreitzer> We have not had to worry about alternate solvers or alternate schedulers.
15:43:09 <toan-tran> Mike: is it possible to record the state of the system in the log?
15:43:39 <n0ano> MikeSpreitzer, the filter scheduler is kind of like that, input is the set of possible nodes and output is the set of acceptable nodes
15:43:39 <MikeSpreitzer> Currently we log snapshots of the relevant state info. Alternatively the log could stream updates.
15:44:22 <MikeSpreitzer> As alaski said, I think we have beat this horse enough for now.
15:44:55 <Yathi> Mike +1
15:45:06 <n0ano> agreed, since Yathi is here let's switch to
15:45:15 <n0ano> #topic instance groups
15:45:38 <n0ano> Yathi, do you have an update on this
15:45:58 <Yathi> garyk you want to say something
15:47:07 <toan-tran> well, since no one says a word, I have a question :)
15:47:12 <Yathi> no major update as of now. But the plan after the summit was to continue the implementation on a simpler instance group model
15:47:15 <n0ano> looks like garyk got called away
15:47:25 <toan-tran> if we intend to make it into nova
15:47:35 <toan-tran> do we keep edge & policy?
15:47:41 <Yathi> a flat group model
15:47:55 <Yathi> we do not keep the edge
15:48:04 <toan-tran> Yathi: +1
15:48:12 <n0ano> Yathi, I thought there was work needed on the V3 API, is that ongoing
15:48:19 <toan-tran> what about policy, we don't have policy manager either
15:48:24 <toan-tran> ?
15:48:38 <Yathi> yeah I believe it is part of the plan.. to complete what was pending from Havana time..
15:49:16 <Yathi> work is needed for V3 API
15:49:44 <n0ano> do you think that will be controversial or should it be straightforward
15:49:52 * n0ano always worries about API changes
15:50:52 <cfriesen> I've been playing with the current instance groups CLI and have some comments on usability--where do I send feedback?
15:51:01 <Yathi> we will sync up again with others - garyk, debo and discuss on the remaining tasks
15:51:42 <MikeSpreitzer> cfriesen: I'm just a newbie here, my guess is the mailing list
15:51:57 <Yathi> please send it to - the dev mailer is the best
15:52:02 <MikeSpreitzer> but you can talk to us now too!
15:52:36 <cfriesen> there's a bunch of stuff I ran into...like it would be nice to accept human-readable group names in the commands rather than only the full group UUID
15:52:48 <MikeSpreitzer> +1 in general on that
15:52:53 <cfriesen> and to me it doesn't make sense to have an "instance-group-add-members" command where the member argument is optional
15:53:05 <cfriesen> what does that even mean? :)
15:53:40 <MikeSpreitzer> me steps back, waiting for someone who designed that API to answer
15:54:07 * MikeSpreitzer will eventually remember to type a slash before a command
15:54:20 <n0ano> sounds like no one wants to admit ownership, might need to ask that one on the dev mailing list
15:54:34 <Yathi> I think it is best to compile an email
15:54:56 <cfriesen> okay, will do.
15:55:05 <n0ano> OK, time running down
15:55:09 <n0ano> #topic opens
15:55:21 <garyk> sorry, i had internet problems.
15:55:31 <n0ano> anybody have any opens they want to raise in the few minutes we have available
15:55:39 <garyk> instance group updates: have posted scheduler changes. pending api changes - debo will work on these next week
15:55:45 <garyk> sorry for late update
15:55:50 <n0ano> garyk, NP, we didn't say too many bad things about you :-)
15:55:55 <garyk> :)
15:56:09 <n0ano> garyk, yeah, that's what we got, pretty much WIP
15:56:40 <n0ano> any other opens
15:56:43 <toan-tran> I'd like to discuss on SaaS
15:56:58 <toan-tran> well, discuss on SaaS's discussion :)
15:57:03 <Yathi> you mean the external scheduler ?
15:57:09 <toan-tran> yeah
15:57:17 <bauzas> we're running out of time
15:57:19 <n0ano> toan-tran, I would like to discuss it also but we'll need a full session for that
15:57:20 <Yathi> that might need a lot of time..
15:57:32 <n0ano> Yathi, no might, it will take a lot of time.
15:57:42 <Yathi> :)
15:57:44 <toan-tran> that's what I'm saying :)
15:57:56 <toan-tran> how we organise discussion on SaaS
15:58:02 <jgallard> can we add this item in 1st for next week?
15:58:04 <jgallard> :)
15:58:11 <alaski> +1
15:58:21 <bauzas> I was thinking there was a separate meeting on that point, non ?
15:58:23 <bauzas> no ?
15:58:36 <toan-tran> I don't know where Collins live
15:58:37 <n0ano> next week, if possible, I'd like to get Boris on board to talk about memcached, that's the most important immediate topic, we can put SaaS as the 2nd priority
15:58:46 <bauzas> toan-tran: he lives in NZ
15:58:49 <toan-tran> if he lives in UTC+13
15:59:02 <toan-tran> ...
15:59:12 <alaski> n0ano: sounds good
15:59:14 <toan-tran> ok so we need another slot
15:59:19 <jgallard> n0ano: ok, great :)
15:59:25 <bauzas> that's what lifeless proposed
15:59:30 <toan-tran> not scheduler meeting
15:59:36 <n0ano> we'll discuss further next week
15:59:40 <n0ano> tnx everyone
15:59:45 <n0ano> #endmeeting