15:00:49 <n0ano> #startmeeting gantt
15:00:50 <openstack> Meeting started Tue Aug 26 15:00:49 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'gantt'
15:01:03 <n0ano> anyone here to talk about the scheduler?
15:01:04 <bauzas> \o
15:01:24 <mspreitz> yes
15:02:07 <n0ano> looks like I missed a lively meeting last week but we'll get to that later
15:02:49 <bauzas> n0ano: yeah, last week's one was a good discussion
15:03:10 <jaypipes> o/
15:03:19 <n0ano> let's get started, maybe others will join in
15:03:31 <n0ano> #topic resource model for scheduler
15:03:36 <bauzas> sure
15:03:40 <bauzas> sooooo
15:03:48 <n0ano> bauzas, jaypipes you said you were going to look at this, any progress?
15:04:02 <bauzas> n0ano: I thought about the sched split, yeah
15:04:18 <bauzas> n0ano: so, maybe let me explain why this action item is there
15:04:32 <bauzas> n0ano: so we can discuss the progress after that
15:04:56 <bauzas> n0ano: so, basically, last week's discussion was about how bad the scheduler is at updating stats
15:05:47 <bauzas> atm, resources are passed to the sched every 60 secs by writing some JSON fields called "resources" into the compute_nodes DB table
15:06:19 <bauzas> long story short, we thought it was necessary to provide a proper API for scheduler updates
15:06:42 <bauzas> so, the proposal is re: the scheduler-lib patch and what is passed now
15:07:02 <bauzas> the idea is to make use of the new method that will be provided thanks to the patch
15:07:05 <bauzas> hold on
15:07:16 <bauzas> https://review.openstack.org/82778
15:07:29 <bauzas> https://review.openstack.org/82778 is the API proposal for scheduler updates
15:07:53 <bauzas> so, here we provide a JSON field called 'values'
15:08:45 <bauzas> based on last week's discussion, we identified the need to keep that method but provide high-level Objects instead of these JSON blobs
15:08:51 <bauzas> so the plan is
15:08:56 <bauzas> 1/ merge that patch
15:09:23 <bauzas> 2/ provide a change for passing a ComputeNode object instead of the 'values' JSON field into that method
15:09:50 <bauzas> that requires some work on the ComputeNode object, ie. making sure that it's correct
15:10:04 <bauzas> the main pain point is the Service FK we have on that object
15:10:39 <bauzas> hence I took ownership of a bug created by jaypipes for cleaning up CN: https://bugs.launchpad.net/nova/+bug/1357491
15:10:40 <uvirtbot> Launchpad bug 1357491 in nova "Detach service from compute_node" [Wishlist,Triaged]
15:10:50 <bauzas> sooooo
15:11:20 <n0ano> a couple of issues come to mind - 1) does this require a change to the DB (which currently holds that JSON string) and 2) how extensible is the new method (I know of changes bubbling underneath related to enhanced compute node stats)
15:11:34 <bauzas> once we have ComputeNode passed instead of an arbitrary JSON field, we should think about how to provide other objects if needed for filters
15:12:03 <bauzas> n0ano: about 1/
15:12:24 <bauzas> n0ano: there should be a change about the FK on service_id, which will be deleted
15:12:31 <bauzas> n0ano: apart from that, no changes to the DB
15:13:08 <bauzas> n0ano: because instead of calling db.compute_update, we would issue compute_obj.save()
15:13:13 <bauzas> which is by far better
15:13:18 <n0ano> so, rather than passing a JSON string in `values' we pass a ComputeNode object (that contains that same JSON string)
15:13:48 <bauzas> n0ano: the main difference being that those are not arbitrary fields but versioned and typed ones
15:14:05 <bauzas> jaypipes: your thoughts on that ?
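A rough sketch (not nova code) of the change being discussed: today the update method takes an arbitrary JSON-ish 'values' blob, and the plan is to pass a versioned, typed ComputeNode object instead. The class, field names, and function names below are simplified stand-ins for the real objects in review 82778, just to show why typed fields beat a raw blob.

```python
class ComputeNode:
    """Minimal stand-in for a versioned, typed object: unknown fields are
    rejected up front instead of being written to the DB unchecked."""

    VERSION = "1.0"
    fields = {"memory_mb": int, "vcpus": int, "local_gb": int}

    def __init__(self, **kwargs):
        object.__setattr__(self, "_data", {})
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        if name not in self.fields:
            raise AttributeError("unknown field: %s" % name)
        self._data[name] = self.fields[name](value)  # typed coercion

    def __getattr__(self, name):
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def save(self, db, name):
        # Stands in for compute_obj.save(): the object knows how to
        # persist itself, replacing the direct db.compute_update call.
        db[name] = dict(self._data, _version=self.VERSION)


# Current style: whatever dict the caller built goes into the DB as-is.
def update_resource_stats_json(db, name, values):
    db[name] = values


# Proposed style: the update method receives the typed object.
def update_resource_stats_obj(db, name, compute_node):
    compute_node.save(db, name)
```

With the object path, a typo'd or bogus field raises immediately at the producer instead of surfacing later as a hack like the JSON-to-object conversion mentioned below for the NUMA patches.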
15:14:48 <n0ano> I'm not objecting actually, it seems we're just making the API heavier weight for minimal gain, but it still works and if everyone thinks it's better that's fine
15:15:59 <bauzas> n0ano: yeah, the problem is that we discovered some problems with the current situation
15:16:35 <n0ano> I think both my points don't really apply, the DB will be the same and you can extend things by changing the `resources' string in the ComputeNode object, different way of doing the same thing
15:16:37 <bauzas> n0ano: for example, with the NUMA patches, ndipanov had to convert back from JSON to an object, it was a pure hack
15:17:26 <bauzas> n0ano: the extent of ComputeNode is yet to be discussed
15:17:38 <jaypipes> bauzas: sorry, on phone...
15:18:09 <n0ano> jaypipes, can you scroll back, do we match your thinking?
15:18:11 <bauzas> n0ano: from my perspective, we should say that ComputeNode could have some dependent classes
15:18:41 <bauzas> n0ano: one other point while jaypipes is on the phone, we discussed the claim thing
15:18:59 <n0ano> just remembering that we have to consider how any changes to the ComputeNode object will be reflected in the compute_node table in the DB
15:19:03 <bauzas> n0ano: wrt a very good paper I recommend, I'm pro having an in-scheduler claim system
15:19:18 <bauzas> n0ano: that's already tied
15:19:39 <bauzas> n0ano: have you seen the link I provided about this paper, the Omega one?
15:19:50 <n0ano> that was my concern, changing the ComputeNode table implies changing the DB
15:20:05 <n0ano> haven't read that yet, I saw the link, I'll read it
15:20:33 <n0ano> in general I agree, I think the scheduler is the right central place to track resources
15:21:06 <bauzas> n0ano: http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
15:21:42 <bauzas> n0ano: that one is basically saying that an optimistic scheduler with retry features is better than a divide-and-conquer scheduler
15:22:00 <bauzas> n0ano: in terms of scalability
15:22:21 <n0ano> well, I thought we currently `had` an optimistic with retries :-)
15:22:44 <bauzas> n0ano: right, but with a slight difference about the claiming thing
15:22:57 <bauzas> n0ano: in Omega, claiming is done in a transactional way
15:23:20 <bauzas> n0ano: here that's a 2-phase commit
15:23:48 <bauzas> n0ano: that said, I think the most important problem for the split is the API
15:24:00 <bauzas> n0ano: hence the work to provide clean interfaces
15:24:05 <n0ano> indeed +1
15:24:14 <bauzas> n0ano: kind of a follow-up of scheduler-lib
15:24:34 <n0ano> seems to me that just changing to the ComputeNode object shouldn't be that big a change
15:24:40 <bauzas> n0ano: think so too
15:24:59 <bauzas> n0ano: tbh, that's even part of the move-to-objects effort
15:25:21 <n0ano> so, back to mechanics, the plan is to push the current sched-lib patch and then change it to use an object - right?
15:25:26 <bauzas> +1
15:25:53 <bauzas> that still doesn't mean we agreed on how to update filters for isolate-sched-db :)
15:26:19 <bauzas> ie. I think we need to make use of that API instead of the ERT
15:26:28 <n0ano> right about now just getting the sched-lib resolved seems like a major accomplishment :-)
15:26:51 <n0ano> let's segue into the next topic
15:26:56 <bauzas> jaypipes was having some concerns about the naming of such a method :)
15:27:02 <bauzas> +1
15:27:06 <n0ano> #topic forklift status
15:27:19 <bauzas> (at least once jaypipes is freed from his phone :) )
15:27:23 <bauzas> soooo
15:27:32 <bauzas> I think we covered the first bit of the split
15:27:33 <n0ano> seems to me the isolate-sched-db is hung up on the ERT discussion, is there any way to resolve that?
15:28:07 <bauzas> n0ano: speaking of that, I think we should not rely on ERT for providing that
15:28:18 <bauzas> n0ano: but on scheduler-lib and objects instead
15:28:41 <n0ano> if there's a way to be independent from ERT I'm +2 for that
15:29:00 <bauzas> n0ano: the way has yet to be designed, still :)
15:29:49 <n0ano> well, we kind of have to, what do we do if they decide to revert ERT out?
15:30:45 <n0ano> the other option is we code to the current interfaces (e.g. we use ERT) and only change if ERT is changed
15:31:09 <bauzas> n0ano: well, I think we need to think about what a resource is
15:31:23 <bauzas> n0ano: here I'm saying that a resource is a ComputeNode object
15:31:57 <bauzas> n0ano: if we want to claim things, that should be on the ComputeNode object too
15:32:26 <n0ano> bauzas, that's kind of a high level view, I would call resources many of the things encapsulated by the ComputeNode object
15:32:51 <bauzas> n0ano: that's where I disagree
15:32:54 <n0ano> I guess the question is how coarse the resources can be
15:33:15 <bauzas> n0ano: IMHO, we should provide a claim class per object
15:33:39 <bauzas> n0ano: ie. "I want to claim usage for a ComputeNode"
15:33:56 <bauzas> n0ano: but I can also "claim usage for an Aggregate"
15:34:37 <n0ano> but you don't claim the entire ComputeNode, I want to claim 2G of mem from the node and 2 vCPUs and so on, hence you need a finer granularity
15:34:55 <bauzas> n0ano: the idea behind that is that the computation of the usage is done on the object itself, so it can be shared with the RT until we give that to the scheduler
15:35:32 <n0ano> not following, how do I claim that 2G of mem
15:36:31 <bauzas> n0ano: well, you're probably right, it would be a 1:N dependency
15:37:04 <bauzas> n0ano: ie. a ComputeNode object could have a ClaimCPU object, a ClaimMemory, etc.
15:37:23 <n0ano> claiming the ComputeNode object is simpler so I'd accept it as a start but, ultimately, I think we'll want finer control
15:38:11 <bauzas> n0ano> well, the outcome of this is to have a compute_obj.cpu.claim(usage) method, but you get the idea
15:38:11 <mspreitz> n0ano: finer in what way?
15:38:54 * n0ano screams at X window, let my keyboard talk :-)
15:39:37 <n0ano> mspreitz, rather than claiming an entire compute node object, claim 2G from that node and 2 vCPUs and so on
15:39:57 <mspreitz> I thought that's what bauzas is saying
15:40:18 <bauzas> n0ano: to be precise, I don't like the word "claim"
15:40:43 <bauzas> n0ano: I prefer compute_obj.make_use_of()
15:41:02 <mspreitz> bauzas: who quantifies how much usage?
15:41:09 <bauzas> so what you "claim" is a subset of the resource itself
15:41:34 <bauzas> mspreitz: atm, that's compute, based on request_spec
15:41:57 <bauzas> mspreitz: it will probably be the scheduler, wrt request_spec, in the near future
15:42:17 <mspreitz> how would the scheduler modulate what is in the request spec?
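The Omega-style transactional claiming bauzas contrasts with nova's 2-phase path can be sketched roughly as follows. This is a toy illustration of the control flow from the Schwarzkopf et al. paper, not nova code: a scheduler works from a snapshot of shared state and commits its claim atomically, retrying on conflict; all names here are hypothetical.

```python
import copy


class CellState:
    """Shared view of cluster state, with a version counter so commits
    can be validated optimistically (a compare-and-swap)."""

    def __init__(self, free_ram_mb):
        self.version = 0
        self.free_ram_mb = free_ram_mb

    def snapshot(self):
        return copy.copy(self)

    def try_commit(self, snapshot, ram_mb):
        # Transactional claim: succeeds only if nobody changed the state
        # since the snapshot was taken and capacity is still available.
        if snapshot.version != self.version or self.free_ram_mb < ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        self.version += 1
        return True


def schedule(state, ram_mb, max_retries=3):
    """Optimistic scheduling with retries: decide against a snapshot,
    then attempt the atomic commit; on conflict, re-read and try again."""
    for _ in range(max_retries):
        snap = state.snapshot()
        if snap.free_ram_mb < ram_mb:
            return False          # no capacity, give up
        if state.try_commit(snap, ram_mb):
            return True           # claim landed
        # conflict: another scheduler committed first; loop and retry
    return False
```

The contrast with today's nova flow is that the claim either lands atomically or fails cleanly at commit time, rather than being provisionally made on the compute node and possibly bounced back for a reschedule.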
15:42:22 <n0ano> bauzas, so the compute node would call compute_obj.make_use_of() to reserve resources, is that the idea
15:42:35 <bauzas> n0ano: that's my thinking, correct
15:42:37 <n0ano> bauzas, and then that would have to be sent to the scheduler via an API
15:43:09 <bauzas> n0ano: correct too, until the scheduler calls that method directly
15:43:14 <n0ano> bauzas, and would you reserve multiple resources with one call or have to make a separate call for each resource
15:43:51 <bauzas> n0ano: well, you're going into details that are still WIP in my mind :)
15:44:10 <n0ano> sorry, just doing a mind dump here :-)
15:44:16 <bauzas> I'm just thinking of aggregates and instances here
15:44:43 <bauzas> or NUMATopology
15:44:44 <n0ano> there are details to be worked out but, as long as the ability to reserve specific resources is there, I'm OK with it
15:44:51 <mspreitz> those of us who want to make joint decisions also want to make joint claims
15:45:25 <n0ano> mspreitz, hence my question about whether multiple calls are needed
15:45:39 <mspreitz> right, I did not see a clear answer
15:45:54 <bauzas> frankly, I don't think it's here yet
15:45:56 <mspreitz> I was hoping for an affirmation that this is a goal
15:46:08 <n0ano> mspreitz, I don't think we have one yet, this is just bauzas thinking for the future
15:46:29 <bauzas> jaypipes: still otp ?
15:46:53 <n0ano> mspreitz, I agree with the goal of supporting joint decisions, I don't want to do anything that would preclude that
15:48:27 <n0ano> well, back to immediate concerns, how do we proceed with the isolate-sched-db?
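The per-resource claim design floated above (ClaimCPU, ClaimMemory hanging off ComputeNode, used as compute_obj.cpu.claim(usage)) could look roughly like the sketch below. Every name is hypothetical, not nova code; it also shows one way a multi-resource claim could stay atomic, addressing mspreitz's joint-claims point, by rolling back on partial failure.

```python
class ResourceClaim:
    """Tracks usage of one resource type on a compute node."""

    def __init__(self, total):
        self.total = total
        self.used = 0

    def claim(self, amount):
        if self.used + amount > self.total:
            return False
        self.used += amount
        return True

    def release(self, amount):
        self.used -= amount


class ComputeNode:
    """1:N dependency: one claim object per resource type."""

    def __init__(self, vcpus, memory_mb):
        self.cpu = ResourceClaim(vcpus)
        self.memory = ResourceClaim(memory_mb)

    def claim(self, **amounts):
        """Joint claim: all requested resources succeed, or none do."""
        done = []
        for name, amount in amounts.items():
            res = getattr(self, name)
            if res.claim(amount):
                done.append((res, amount))
            else:
                for prev, prev_amount in done:  # roll back partial claims
                    prev.release(prev_amount)
                return False
        return True
```

So "claim 2G of mem and 2 vCPUs" becomes either two fine-grained calls (node.memory.claim(2048); node.cpu.claim(2)) or one joint call (node.claim(cpu=2, memory=2048)) that cannot leave the node half-claimed.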
15:49:46 <bauzas> n0ano: I think we still need to see how the community is thinking about ERT
15:49:51 <bauzas> n0ano: and leave the patches there
15:50:15 <bauzas> n0ano: but in the meantime, I need to carry on the move to ComputeNode for updating stats and think about the alternative
15:50:51 <n0ano> OK (maybe), I don't like treading water but I guess we can, hopefully the ERT will be decided soon (it better be)
15:50:57 <bauzas> n0ano: anyway, the spec is not merged so we can't say "eh, that was validated before so that needs to be there"
15:51:22 <bauzas> n0ano: PaulMurray is still on PTO until the end of this week AFAIK
15:51:32 <n0ano> bauzas, yeah, I rattled some cages but didn't get a response, at least it hasn't been rejected
15:52:01 <bauzas> n0ano: you can voice your opinion there
15:52:04 <n0ano> really?, I was hoping ERT would be decided this week, now we have to wait until next week, sigh
15:52:12 <bauzas> https://review.openstack.org/115218 Revert patch for ERT
15:52:50 <bauzas> yay, that's the price to pay for having a dependency on such a new feature :)
15:52:55 <n0ano> yeah but nothing is going to happen until Paul gets back, that's the important thing
15:53:00 <bauzas> +1
15:53:04 <n0ano> sigh
15:53:06 <n0ano> moving on
15:53:09 <bauzas> n0ano: hence my work on ComputeNode
15:53:11 <n0ano> #topic opens
15:53:19 <n0ano> anything new anyone?
15:53:32 <bauzas> I'll probably be on PTO at the end of this week
15:53:43 <mspreitz> I think there's a flaky CI
15:53:49 <bauzas> and maybe the beginning of next week
15:54:01 <bauzas> mspreitz: ie. ?
15:54:05 <n0ano> isn't there a bauzas 2.0 scheduled soon :-)
15:54:14 <bauzas> n0ano: bauzas 3.0 tbp
15:54:20 <mspreitz> check-tempest-dsvm-full
15:54:34 <bauzas> n0ano: coming to theaters at the end of this week
15:54:45 <n0ano> congratulations and good luck
15:55:00 <n0ano> mspreitz, I would imagine that will be a topic for the nova meeting this week
15:55:09 <mspreitz> ok, thanks
15:55:30 <bauzas> mspreitz: well, you can at least check if a bug has been filed
15:55:38 <bauzas> and create it if not
15:55:42 <mspreitz> yeah, haven't had a chance to do that yet
15:55:54 <mspreitz> hope to get to it
15:55:56 <mspreitz> soon
15:56:05 <bauzas> mspreitz: that's the most important thing, because it needs to be categorized
15:56:25 <bauzas> so rechecks can be done against that bug number and that can show a trend
15:56:41 <bauzas> mspreitz: you can also try logstash to see the frequency of your failure
15:57:09 <mspreitz> bauzas: oh, what's that?
15:57:22 <n0ano> I pretty much just blindly recheck once, only if I get a second failure on code I think is good do I look for a CI issue
15:58:17 <bauzas> dammit, I would recommend a training at the next Summit for you :)
15:58:19 <bauzas> https://wiki.openstack.org/wiki/ElasticRecheck
15:58:43 <bauzas> and in particular
15:58:43 <bauzas> https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
15:58:51 <mspreitz> thanks, I'll read that again
15:59:18 <n0ano> OK, top of the hour, I'll thank everyone, talk again next week
15:59:22 <n0ano> #endmeeting