#openstack-meeting log

15:00:39 <n0ano> #startmeeting gantt
15:00:40 <openstack> Meeting started Tue Sep  9 15:00:39 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:44 <openstack> The meeting name has been set to 'gantt'
15:00:50 <n0ano> anyone here to talk about the scheduler?
15:01:34 <bauzas_> \o
15:01:54 <n0ano> bauzas_, I was beginning to think you were on baby duty today :-)
15:01:58 <ndipanov> o/
15:02:29 <Chengyong_Lin> #help
15:02:32 <bauzas_> n0ano: naaaah
15:03:20 <toan_tran|2> o/
15:03:30 <n0ano> well, it's approaching 5 after, let's begin
15:03:36 <n0ano> #topic next steps
15:03:49 <bauzas_> is jaypipes around ?
15:04:03 <PaulMurray> hi all - I'm here
15:04:10 <n0ano> odd, my topic didn't work, wonder why, whatever
15:04:35 <jaypipes> bauzas_: ya!
15:04:47 <n0ano> hopefully you all saw the email about `what we agreed to', I'd like to expand upon that
15:05:06 <n0ano> to me there are 3 APIs that need to be cleaned up...
15:05:12 <n0ano> 1) client interface
15:05:18 <n0ano> 2) DB access
15:05:29 <n0ano> 3) resource tracker
15:05:30 <bauzas_> n0ano: not exactly that
15:05:41 <n0ano> bauzas_, feel free to clarify
15:05:46 <bauzas_> n0ano: when speaking about APIs, we're talking about
15:06:00 <bauzas_> 1/ select_destinations (incl. request_spec data)
15:06:16 <bauzas_> 2/ update_resource_stats (ie. how the Scheduler is getting updates)
15:06:27 <jaypipes> bauzas_: correct. it's the request_spec that needs to be versioned and "objectified".
15:06:42 <jaypipes> bauzas_: I prefer "update_resources" vs. update_resource_stats, but yes.
15:06:45 <bauzas_> we also need to objectify the stats dict
15:07:01 <jaypipes> bauzas_: we need to get rid of the stats dict entirely...
15:07:16 <bauzas_> jaypipes: +1
15:07:17 <PaulMurray> jaypipes, bauzas_ I agree incidentally
15:07:19 <n0ano> I was folding the client interface in as the implementation of the select destination API
15:07:30 <bauzas_> n0ano: client interface is providing both
15:07:36 <jaypipes> bauzas_: and just send a versioned object that has a set of resource usage objects in it.
15:07:46 <bauzas_> jaypipes: violent agreement here
15:07:49 <PaulMurray> jaypipes, +1
15:08:12 <jaypipes> bauzas_: similar to what the NUMATopology classes added by ndipanov and danpb in nova.virt.hardware look like
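A minimal sketch of the idea discussed above: replacing the free-form stats dict with a versioned object that carries a set of resource usage objects. The class names `ResourceUsage` and `ComputeNodeUpdate`, their fields, and the `to_primitive` helper are all hypothetical and chosen for illustration; they are not the Nova object model and not the NUMATopology classes mentioned in the log.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ResourceUsage:
    """Hypothetical typed replacement for one entry of the stats dict."""

    VERSION = "1.0"  # class-level version tag, not a dataclass field

    resource: str
    total: int
    used: int

    @property
    def free(self) -> int:
        return self.total - self.used


@dataclass
class ComputeNodeUpdate:
    """Hypothetical payload a compute node would send to the scheduler."""

    VERSION = "1.0"  # class-level version tag, not a dataclass field

    host: str
    usages: Dict[str, ResourceUsage] = field(default_factory=dict)

    def to_primitive(self) -> dict:
        # Serialize with explicit versions so the receiver can detect
        # payloads it does not understand.
        return {
            "version": self.VERSION,
            "host": self.host,
            "usages": [
                {
                    "version": u.VERSION,
                    "resource": u.resource,
                    "total": u.total,
                    "used": u.used,
                }
                for u in self.usages.values()
            ],
        }
```

The point of the sketch is only that every field is named and typed and every payload carries a version, which a plain dict does not guarantee.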
15:08:13 <bauzas_> there is one last thing to consider : claims
15:08:23 <jaypipes> bauzas_: yes, that's the biggie.
15:08:34 <jaypipes> or the baddie, depending on how you look at it ;)
15:09:25 <bauzas_> the problem that ndipanov faced is that he had to hack the claim methods for the NUMATopology objects
15:09:28 <ndipanov> so my thinking was that claims need to live in the scheduler, and that whether they are done in the scheduler service or on nodes is an implementation detail
15:09:42 <ndipanov> bauzas_, that is mostly because we don't have the data model
15:09:49 <jaypipes> bauzas_: exactly.
15:10:05 <jaypipes> bauzas_: please see my comments about the claim interface in those patch reviews :)
15:10:06 <bauzas_> ndipanov: I'm just thinking we should attach a claim() method to the objects we provide to the Scheduler with API endpoint #2 (update_resource_stats or whatever)
15:10:43 <jaypipes> ndipanov: claims should be *returned* from the scheduler, yes.
15:10:47 <ndipanov> bauzas_, that does not sound insane to me, but is really up to how we do it
15:10:48 <bauzas_> jaypipes: I'm just thinking that scheduler could just call this method when select_destinations is called
15:10:50 <jaypipes> ndipanov: or rather, a set of claims..
15:11:34 <ndipanov> jaypipes, you may be right - if you are certain that we want to claim in the scheduler (with a cheap almost lock free db hit) then we may want to do that from the start
15:11:35 <bauzas_> to me, I'm thinking of :
15:11:46 <jaypipes> bauzas_: so, before 1/ and 2/ above, we need to complete the work to un-kludge the ComputeNode -> Service relation.
15:11:54 <bauzas_> jaypipes: \o/
15:11:57 <ndipanov> but if not and we stick to the "optimistic" claim-and-retry-on-failure system we have now
15:12:00 <bauzas_> can't agree more
15:12:04 <jaypipes> ndipanov: yes. I am certain :)
15:12:07 <bauzas_> that FK is insane
15:12:10 <ndipanov> we still want the code to live in the sched
15:12:20 <n0ano> I'm a little confused, the claiming should be part of the select destination or part of the update resources or should be a separate interface
15:12:27 <ndipanov> jaypipes, in that case yes - claims HAVE to be in the scheduler :D
15:12:42 <bauzas_> n0ano: objects are passed to the Scheduler using update_resources
15:12:49 <bauzas_> n0ano: and stored
15:13:03 <jaypipes> ndipanov: well, putting the construction of the claim set in the scheduler means the "retry on resource starvation" loop becomes much tighter, meaning we don't need to go all the way to the compute node to figure out we're starved and initiate a retry..
15:13:09 <bauzas_> n0ano: it's when select_destinations is called that these objects are claimed
15:13:54 <bauzas_> jaypipes: I stated the proposal in the n0ano's reply that we could live with current RT claims for one cycle
15:14:01 <bauzas_> jaypipes: in parallel with objects claims
15:14:02 <n0ano> my issue is that updating resources doesn't necessarily imply claiming anything, you're just informing the scheduler what's available
15:14:12 <ndipanov> jaypipes, right - of course - (your retry is "got 0 rows back")
15:14:22 <jaypipes> ndipanov: correct :)
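The "cheap almost lock free db hit" and the "got 0 rows back" retry signal can be illustrated with a conditional UPDATE. This is only a sketch under stated assumptions: it uses an in-memory SQLite database, and the table layout, the `try_claim` and `claim_first_fit` helpers, and the first-fit host order are hypothetical, not the scheduler's actual schema or code.

```python
import sqlite3


def try_claim(conn, host, ram_mb):
    """Atomically claim RAM on one host; True if the claim succeeded."""
    cur = conn.execute(
        "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ? "
        "WHERE host = ? AND free_ram_mb >= ?",
        (ram_mb, host, ram_mb),
    )
    conn.commit()
    # Zero rows updated means the host no longer has enough free RAM,
    # so the caller should retry on another candidate.
    return cur.rowcount == 1


def claim_first_fit(conn, candidates, ram_mb):
    """Try each candidate host in order; return the first one claimed."""
    for host in candidates:
        if try_claim(conn, host, ram_mb):
            return host
    return None


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compute_nodes (host TEXT PRIMARY KEY, free_ram_mb INTEGER)"
)
conn.executemany(
    "INSERT INTO compute_nodes VALUES (?, ?)",
    [("node1", 512), ("node2", 4096)],
)
conn.commit()
```

Because the availability check and the decrement happen in a single statement, no explicit lock is taken by the caller, and the retry loop stays inside the scheduler instead of round-tripping to a compute node.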
15:14:25 <bauzas_> n0ano: you inform scheduler with objects
15:14:33 <ndipanov> this does sound like we will need to change a lot about the RT as well
15:14:43 <ndipanov> though I may be wrong
15:14:47 <ndipanov> but overall
15:14:48 <bauzas_> n0ano: those objects are claimed when select_destinations is called
15:15:16 <ndipanov> if this can be done and is not ridiculously more difficult than keeping the current RT + claims only changing the data model
15:15:23 <ndipanov> then by all means do it!
15:15:26 <n0ano> so, in my thinking, the claim happens in the select destination, that makes sense
15:15:35 <bauzas_> n0ano: yup
15:15:38 <jaypipes> ++
15:15:39 <jaypipes> correct.
15:15:57 <bauzas_> lemme provide the next steps I'm seeing
15:16:14 <bauzas_> so folks, you can debate
15:16:28 <ndipanov> jaypipes, who was the ++ to? me or n0ano ?
15:16:44 <bauzas_> 0/ Remove FK on ComputeNode
15:17:06 <bauzas_> 1/ update_resource_stats is providing ComputeNode objects instead of stats dict
15:17:24 <bauzas_> 2/ build_request_spec builds a Request object
15:17:30 <jaypipes> ndipanov: to n0ano and bauzas_, who were saying claims are done in select_destinations() (which I think should be renamed get_placement_claims())
15:17:52 <bauzas_> 3/ add claim() to ComputeNode
15:18:05 <bauzas_> 4/ call CN.claim() in select_destinations
15:18:17 <ndipanov> jaypipes, ack
15:18:21 <jaypipes> bauzas_: well, not necessarily...
15:18:29 <bauzas_> ok, time for debating :)
15:18:36 <bauzas_> jaypipes: your thoughts ?
15:18:41 <n0ano> bauzas_, was that your complete list?
15:18:48 <ndipanov> jaypipes, and then the RT on compute nodes still goes and does its updates of available resources as they are now?
15:18:50 * n0ano hates IRC at times
15:19:03 <jaypipes> bauzas_: you wouldn't want to call claim() on the nova.objects.ComputeNode itself, but rather a wrapper (I called it ComputeNodeState object) that has a ComputeNode DB object inside it.
15:19:04 <ndipanov> jaypipes, or how will that work?
15:19:15 <bauzas_> yeah, steps 0 to 4
15:19:37 <bauzas_> jaypipes: interesting
15:20:03 <bauzas_> jaypipes: I just need to care about nested objects
15:20:06 <jaypipes> ndipanov: the RT on the compute node would still do a check, yes, but it would be the exception to the rule that a resource starvation would occur, vs. the current implementation which has the retry logic actually *be* the Claim object abort in the resource tracker.
15:20:25 <bauzas_> jaypipes: +1
15:20:41 <ndipanov> jaypipes, ok - I have a picture now...
15:20:43 <bauzas_> jaypipes: I'm seeing the RT Claims as an emergency check
15:20:57 <jaypipes> bauzas_: see the ComputeNodeState object here: https://review.openstack.org/#/c/103598/4/nova/placement/resource_tracker.py
15:20:58 <ndipanov> is there any way you guys could do a write up
15:21:04 <ndipanov> so that I am sure I am not misunderstanding
15:21:12 <bauzas_> ndipanov: I began step 0
15:21:15 <jaypipes> bauzas_: for an explanation as to why the ComputeNode DB object should be an attribute of the maintained state.
15:21:28 <bauzas_> jaypipes: will read further
15:21:54 <n0ano> we need a short writeup of steps 3 & 4, the others are pretty obvious
15:22:12 <bauzas_> n0ano: we need a *spec* for step 3 and 4 :)
15:22:21 <jaypipes> bauzas_: yes, the RT claim on the compute node would be the emergency check for any externally-initiated change to the virt state (for instance somebody doing a virsh destroy out of band of Nova)
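A rough sketch of the compute-side "emergency check" described above, where the resource tracker only re-verifies a claim the scheduler already granted. `ClaimRejected` and `verify_claim` are hypothetical names, and RAM is the only resource considered; the real resource tracker tracks more than this.

```python
class ClaimRejected(Exception):
    """Raised when the node's real state no longer satisfies the claim."""


def verify_claim(claimed_ram_mb, actual_free_ram_mb):
    """Last-line check on the compute node before building the instance.

    The scheduler has already claimed the resources, so this should only
    fail when the node changed out of band (for example, a guest was
    created or destroyed outside Nova).
    """
    if claimed_ram_mb > actual_free_ram_mb:
        raise ClaimRejected(
            "claimed %d MB but only %d MB free"
            % (claimed_ram_mb, actual_free_ram_mb)
        )
    return actual_free_ram_mb - claimed_ram_mb
```

In this arrangement a rejection is the exception rather than the normal retry mechanism.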
15:22:40 <bauzas_> jaypipes: +1 again
15:22:58 <bauzas_> jaypipes: the last controller, but not the main one
15:23:04 <jaypipes> ndipanov: yes, I am working on a writeup, will share with bauzas_, you, and n0ano today. (etherpad)
15:23:12 <bauzas_> jaypipes: awesome
15:23:18 <ndipanov> jaypipes, awesome!
15:23:18 <n0ano> jaypipes, that would be great
15:23:20 <jaypipes> and sorry n0ano for not responding sooner to your ML post :(
15:23:23 <PaulMurray> jaypipes, me too please
15:23:31 <jaypipes> PaulMurray: of course, will do.
15:23:32 <johnthetubaguy> jaypipes: interested too, hard to follow the irc conversation
15:23:36 <n0ano> jaypipes, NP
15:23:42 <jaypipes> will just send the link to the ML thread and ping you all
15:23:48 <johnthetubaguy> cools
15:23:57 <jaypipes> johnthetubaguy: yeah, agreed :) tough to follow sometimes :)
15:24:08 <bauzas_> jaypipes: yeah, just pile up an empty etherpad, so we can attach it to the logs
15:24:33 <johnthetubaguy> (mostly my fault for not ever being able to make the first half of this meeting)
15:24:34 <n0ano> well, start with an etherpad, we'll need to turn it into a true BP soon
15:24:40 <jaypipes> n0ano: in the meantime, I want to make it clear that I fully support the split of the scheduler code. I just want to make that process have a good chance of succeeding by doing the above few steps before the split happens.
15:25:26 <n0ano> jaypipes, I've always said we all agree on the goal, it's the path that is the issue
15:25:31 <jaypipes> ++
15:26:05 <n0ano> one thought, we're redoing `how` we call into the scheduler, do we think this provides the clean, separable interface we are looking for?
15:26:14 <bauzas_> I just want to restate that we agreed during last midcycle meetup that the split will consist of a python lib
15:26:23 <bauzas_> not atm a new project
15:26:46 <jaypipes> bauzas_: yes, the first step is a python lib.
15:27:09 <bauzas_> yeah, just emphasizing this, that's it :)
15:27:12 <jaypipes> n0ano: I personally do think that it would provide such a clean, separable interface.
15:27:40 <johnthetubaguy> jaypipes: +1 its a good first step, lets get an interface, before anyone mentions REST
15:27:51 <jaypipes> oh yeah. totes.
15:27:59 <n0ano> johnthetubaguy, go away, too early for that :-)
15:28:06 <jaypipes> johnthetubaguy: especially since the RPC API is already versioned.
15:28:14 <PaulMurray> n0ano, I think we need to get this step right to determine exactly what is in the interface
15:28:19 <johnthetubaguy> yeah, its about getting the data versioned too
15:28:21 <bauzas_> update_resource_stats is not RPC versioned
15:28:24 <PaulMurray> n0ano, the retry loop is killing us
15:28:54 <n0ano> PaulMurray, what I'm hearing is that getting the claims right will ameliorate the retry loop
15:29:24 <bauzas_> n0ano: it will
15:29:30 <PaulMurray> n0ano, I think so, I think (not used to long words)
15:29:53 <bauzas_> n0ano: because we're moving from a 2-step scheduling to what the community calls an "optimistic scheduler"
15:30:05 * n0ano likes to expand everyone's thinking (and show off - sorry about that :-)
15:30:53 <jaypipes> bauzas_: no, I meant the existing scheduler RPC API is versioned.
15:31:05 <n0ano> anyway, the critical thing is the write up so...
15:31:06 <bauzas_> yeah I know what you meant
15:31:11 <bauzas_> jaypipes: ^
15:31:18 <jaypipes> n0ano: I will have it done by EOD today.
15:31:24 <bauzas_> jaypipes: I mean, select_destinations is actually an RPC versioned call
15:31:35 <n0ano> #action jaypipes to writeup the claims process
15:31:37 <bauzas_> jaypipes: while update_resource_stats is just a DB call
15:31:38 <jaypipes> might even have pretty pictures and diagrams :)
15:31:49 <jaypipes> bauzas_: yes, understood.
15:32:21 <bauzas_> jaypipes: well, to be precise, a conductor call, so versioned anyway
15:32:28 <bauzas_> anyway
15:32:42 <bauzas_> n0ano: jaypipes: want me to restate the next steps ?
15:32:53 <bauzas_> is that clear to everybody ?
15:32:57 <n0ano> bauzas_, can't hurt, to ahead
15:32:57 <johnthetubaguy> I still think we need all that DB split work we spoke about before though, unless I am missing something?
15:33:12 <jaypipes> bauzas_: sure, go for it.
15:33:22 <bauzas_> johnthetubaguy: indeed, that's the last step
15:33:25 <n0ano> s/to ahead/go ahead
15:33:26 <johnthetubaguy> I guess maybe the solution looks slightly different
15:33:37 <bauzas_> johnthetubaguy: it will
15:33:42 <bauzas_> ok, let me restate
15:33:54 <bauzas_> step #0 : Remove FK on ComputeNodes/Service
15:34:11 <bauzas_> step #1 : update_resource_stats provides ComputeNode object
15:34:31 <bauzas_> step #2: build_request_spec provides Request object
15:34:46 <bauzas_> step #3: add claims to objects
15:34:58 <bauzas_> step #4 : call objects claims in select_destinations
15:35:03 <bauzas_> end.
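Steps #3 and #4 as restated above could look roughly like the following. This is a sketch under assumptions, not the agreed design: `ComputeNode` here is a stand-in with a single illustrative field, `RequestSpec` stands in for the Request object of step #2, `ComputeNodeState` follows the wrapper idea mentioned earlier in the meeting, and only RAM is claimed.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    """Stand-in for the ComputeNode DB object (illustrative fields only)."""

    host: str
    free_ram_mb: int


@dataclass
class RequestSpec:
    """Stand-in for the Request object built in step #2."""

    ram_mb: int
    num_instances: int = 1


class ComputeNodeState:
    """Wrapper holding a ComputeNode and exposing claim() (step #3)."""

    def __init__(self, node):
        self.node = node

    def claim(self, spec):
        # Refuse the claim if the node cannot fit one more instance.
        if spec.ram_mb > self.node.free_ram_mb:
            return False
        self.node.free_ram_mb -= spec.ram_mb
        return True


def select_destinations(states, spec):
    """Pick one host per instance, claiming as we go (step #4)."""
    chosen = []
    for _ in range(spec.num_instances):
        for state in states:
            if state.claim(spec):
                chosen.append(state.node.host)
                break
        else:
            # Note: earlier claims are not rolled back in this sketch.
            raise RuntimeError("No valid host found")
    return chosen
```

Claiming inside `select_destinations` means a host that has just been filled is skipped immediately for the next instance, rather than being discovered as full only when the compute node rejects the build.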
15:35:19 <bauzas_> so, johnthetubaguy, isolate-sched-db is based on these steps
15:35:28 <bauzas_> we should provide objects to the Scheduler
15:35:31 <johnthetubaguy> hmm, feels odd said like that, but anyways
15:35:37 <jaypipes> bauzas_: that is my understanding
15:35:42 <n0ano> and jaypipes will provide a writeup on how steps 3&4 work
15:36:00 <johnthetubaguy> I think object vs db object might need clarifying, vs dict vs object
15:36:11 <johnthetubaguy> but anyway, I will read the write up
15:36:13 <bauzas_> so, by saying that, update_resource_stats(Aggregate)
15:36:49 <bauzas_> so the Aggregate object is versioned
15:37:12 <bauzas_> that's step 6
15:37:39 <n0ano> I would have said step 1.5 but either way
15:38:22 <bauzas_> n0ano: step 3 and step 4 are mandatory
15:38:33 <bauzas_> n0ano: because you have to claim to aggregates then
15:38:42 <n0ano> bauzas_, no argument
15:38:58 * PaulMurray has to leave - will catch up later
15:39:08 <bauzas_> PaulMurray: o/
15:39:09 <n0ano> PaulMurray, tnx for coming
15:39:32 <n0ano> I'd prefer to continue this after we have the writeup so...
15:39:37 <n0ano> #topic opens
15:39:45 <n0ano> any new items for today?
15:40:17 * n0ano listening to the sound of crickets
15:40:54 <n0ano> OK, let's close, good talking to everyone, we'll meet again next week
15:40:59 <n0ano> #endmeeting