17:00:32 <alaski> #startmeeting nova_cells
17:00:33 <openstack> Meeting started Wed Apr 8 17:00:32 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:36 <openstack> The meeting name has been set to 'nova_cells'
17:00:51 <alaski> anyone here for the cells meeting?
17:00:54 <vineetmenon> o/
17:00:58 <melwitt> o/
17:01:03 <dheeraj-gupta-4> o/
17:01:22 <alaski> bauzas: ?
17:01:28 <bauzas> oh
17:01:29 <bauzas> \o
17:01:36 <alaski> #topic Cellsv1 Tempest Testing
17:01:37 <bauzas> alaski: thanks for pinging me
17:01:57 <bauzas> soooo
17:01:59 <alaski> so cells is green right now
17:02:01 <dansmith> o/
17:02:05 <alaski> but.. https://review.openstack.org/#/c/171414/
17:02:47 <alaski> once that gets in we'll be failing hypervisor tests again
17:02:49 <bauzas> who can we hassle for +W'ing it?
17:02:55 <bauzas> :)
17:02:59 <alaski> and then there's https://review.openstack.org/#/c/171306 and https://review.openstack.org/#/c/160506/ for those
17:03:39 <alaski> bauzas: I'm not sure, could probably ask in -infra about it
17:03:56 <bauzas> dheeraj-gupta-4: you asked me a question this morning about why the cells job only runs a few tests, see https://review.openstack.org/#/c/171414/ for the explanation
17:04:12 <bauzas> alaski: yeah, it's good that we have at least one +2
17:04:12 <dheeraj-gupta-4> bauzas: Yes, got it.
17:04:25 <alaski> I'd like to see that change in before the service objects changes are merged
17:04:32 <alaski> so we can see a good test run on them
17:04:37 <bauzas> alaski: agreed
17:04:51 <bauzas> alaski: but that would require a recheck for both
17:05:14 <alaski> bauzas: yep
17:05:28 <alaski> fortunately jenkins seems in good shape atm
17:05:29 <bauzas> anyway, let's see what we can do soon
17:05:45 <bauzas> RC1 is tomorrow :/
17:06:35 <alaski> yep. we're close though
17:06:43 <alaski> anything more on testing?
17:06:54 <melwitt> I'll ask in infra and see if we can get +W today
17:07:04 <alaski> melwitt: thanks
17:07:04 <bauzas> melwitt: cool
17:07:13 <alaski> #topic Specs
17:07:27 <alaski> this is mainly a reminder that there are some specs now
17:07:49 <alaski> and scheduling in particular is going to get a lot of feedback I think
17:07:58 <alaski> so the earlier that happens the better
17:08:23 <alaski> https://review.openstack.org/#/c/141486/ https://review.openstack.org/#/c/136490/ https://review.openstack.org/#/c/169901/
17:08:25 <bauzas> alaski: nit: you should amend https://wiki.openstack.org/wiki/Nova-Cells-v2 by modifying the Gerrit search URL
17:08:54 <alaski> bauzas: ok
17:09:15 <alaski> #action alaski update https://wiki.openstack.org/wiki/Nova-Cells-v2 with new specs/reviews/info
17:10:00 <alaski> anything to discuss on specs today?
17:10:08 <vineetmenon> alaski: since we are in scheduling..
17:10:13 <bauzas> alaski: I saw jaypipes's point and I will put some notes
17:10:30 <alaski> bauzas: thanks
17:10:38 <vineetmenon> is it finalized that the top level will schedule a new instance down to the node level?
17:10:49 <bauzas> alaski: because I think what cells v2 is trying to do with scheduling can directly benefit the scheduler
17:11:05 <bauzas> vineetmenon: that's what's discussed in https://review.openstack.org/#/c/141486/
17:11:06 <vineetmenon> i mean, will the scheduling decision (cell, host) be made entirely by the top level?
17:11:07 <alaski> vineetmenon: nothing is finalized at this point
17:11:25 <alaski> vineetmenon: but the idea is that asking to be scheduled will return that
17:11:46 <alaski> vineetmenon: and the scheduler might be broken into two levels behind the scenes. that's up in the air right now though
17:11:47 <bauzas> alaski: I was thinking of an approach like Google Omega, i.e. multiple optimistic schedulers sharing a global view
17:12:05 <vineetmenon> alaski: kk
17:12:21 <bauzas> alaski: but that's a scheduler approach - the biggest problem is what jaypipes said, i.e. the current scheduler doesn't scale well
17:12:26 <bauzas> s/well/at all even
17:13:05 <alaski> bauzas: I saw you post http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf a while ago and have that in a tab to read
17:13:22 <bauzas> alaski: that's a long long journey
17:13:32 <bauzas> alaski: but I'm trying to move in that direction
17:13:33 <alaski> bauzas: but if you could distill that into comments on the spec that would be very helpful
17:13:49 <bauzas> alaski: honestly, it's all about deliverables
17:14:12 <vineetmenon> :)
17:14:17 <bauzas> alaski: we can't just assume that the nova scheduler will magically cope with thousands of nodes
17:14:24 <bauzas> at least not for L
17:14:30 <alaski> agreed
17:14:46 <bauzas> alaski: so I think we should keep a divide-and-conquer approach as a short-term solution
17:15:05 <alaski> what I'm looking at is: can we get cells into the scheduler and have it cope with the load of a current non-cells deployment
17:15:19 <alaski> I don't expect it to handle a Rackspace or CERN load yet
17:15:20 <vineetmenon> bauzas: I was thinking the same +1
17:15:21 <bauzas> with the magic Scheduler (with a big S) as a mid-term goal
17:15:46 <bauzas> alaski: well, I think the idea behind cells is duplication
17:16:47 <bauzas> i.e. call it mitosis
17:17:05 <bauzas> i.e. doesn't scale? duplicate it
17:17:22 <alaski> bauzas: I'm on board with keeping a two-level approach, assuming that's what you mean by divide and conquer, if we abstract it properly
17:17:43 <alaski> i.e. hide it behind a client API
17:17:56 <vineetmenon> +1; the nice thing about two levels is that it can scale up to n levels, if properly thought out
17:18:01 <bauzas> alaski: shard it, if you prefer
17:18:23 <alaski> bauzas: gotcha. I'm fine with that too, but that's a big change
17:18:42 <alaski> and it has to answer many of the same questions as cells, like how do we handle affinity/anti-affinity
17:18:49 <bauzas> alaski: I think that's a smaller change than a fully scalable scheduler
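
The two-level, hidden-behind-a-client-API approach discussed in this exchange might look roughly like the sketch below. This is a speculative illustration, not Nova code: CellScheduler, select_destination, and the request/host structures are all invented names, and the in-cell pass is reduced to a trivial free-RAM check.

    # A minimal sketch of the two-level "divide and conquer" scheduling
    # discussed above: a top level picks a cell, a per-cell pass picks a
    # host, and callers see only a single call that returns the
    # (cell, host) pair. All names here are hypothetical, not Nova APIs.
    import random


    class CellScheduler:
        """Top level: choose a cell, then delegate host selection to it."""

        def __init__(self, cells):
            # cells maps a cell name to the hosts (and their free RAM in
            # MB) that the cell's own scheduler knows about.
            self.cells = cells

        def select_destination(self, request):
            # Level 1: filter cells on coarse criteria (capacity here;
            # capabilities in a fuller version).
            candidates = [name for name, hosts in self.cells.items()
                          if any(free >= request['ram_mb']
                                 for free in hosts.values())]
            if not candidates:
                raise RuntimeError('no valid cell found')
            cell = random.choice(candidates)
            # Level 2: ask the chosen cell's scheduler for a host.
            return cell, self._select_host_in_cell(cell, request)

        def _select_host_in_cell(self, cell, request):
            # Stand-in for the in-cell filter/weigh pass: pick the host
            # with the most free RAM that still fits the request.
            fitting = {h: free for h, free in self.cells[cell].items()
                       if free >= request['ram_mb']}
            return max(fitting, key=fitting.get)


    # The client API hides the two levels entirely: callers just ask to
    # be scheduled and get back the placement, as alaski describes above.
    scheduler = CellScheduler({
        'cell1': {'hostA': 4096, 'hostB': 1024},
        'cell2': {'hostC': 8192},
    })
    print(scheduler.select_destination({'ram_mb': 2048}))  # e.g. ('cell2', 'hostC')
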
17:19:11 <bauzas> alaski: IIUC, RAX is using one scheduler per cell?
17:19:19 <alaski> bauzas: yes
17:19:29 <bauzas> okay, plus the cells scheduler for parenting this
17:19:41 <bauzas> plus the CacheScheduler, got it
17:20:27 <bauzas> okay, sounds like a big big thing anyway - either we scale up or we scale out
17:20:47 <alaski> I think we need to have an understanding of what requirements need to be met for scheduling, and then we can look at how to do it
17:21:23 <bauzas> alaski: well, the scheduler itself is fairly easy to understand
17:21:34 <bauzas> alaski: the main problem is that it's racy
17:21:54 <alaski> because while I agree that sharding is easier than scaling the scheduler as-is, it then needs to handle affinity, which is trickier
17:22:01 <bauzas> alaski: so the bigger your deployment is, the more retries you get
17:22:11 <bauzas> alaski: agreed
17:22:54 <bauzas> alaski: in particular if we consider that aggregates and server groups are colocated with the cloud itself, not a child cell
17:23:28 <bauzas> but that question hasn't been answered AFAIK
17:23:56 <bauzas> alaski: do you plan inter-cell migrations with cells?
17:23:56 <alaski> it hasn't, but I agree
17:24:37 <alaski> bauzas: that's tricky too. I think Nova should support it, but you may want to turn it off.
17:24:48 <bauzas> agreed
17:25:12 <vineetmenon> bauzas: what does that mean, 'inter-cell migrations'?
17:25:59 <bauzas> vineetmenon: do you want to live/cold migrate or evacuate VMs from cell1@hostA to cell2@hostB
17:26:01 <bauzas> ?
17:26:34 <alaski> this actually gets into the next topic so let me switch real quick
17:26:41 <alaski> #topic Cells Scheduling
17:26:43 <alaski> :)
17:26:51 <bauzas> alaski: that reminds me that I probably missed the biggest question: do we hide the complexity of the deployment from users and just show instances - not cells?
17:27:07 <bauzas> alaski: or is it a segregation thing?
17:27:33 <alaski> I don't want to hide it, not fully
17:27:44 <bauzas> alaski: speaking of end users, not admins of course
17:27:51 <alaski> it doesn't have to be exposed as cells though
17:28:02 <bauzas> alaski: that's my thought
17:28:24 <bauzas> alaski: but if it's hidden, then it's just like aggregates, an op thing
17:29:06 <alaski> I think it's useful to know if two instances are in the same or different cells
17:29:11 <melwitt> if I think about the use case of scheduling to a specific cell using a scheduler hint, then I think it would be nice to expose something about the cells
17:30:09 <alaski> melwitt: agreed
17:30:27 <bauzas> melwitt: that's the difference between AZs and aggregates
17:31:10 <bauzas> anyway, I'm not asking these questions to point out all the difficulties, just to say that we'll need to make decisions and approximations
17:31:16 <alaski> this is a good question to get some operator feedback on
17:31:23 <bauzas> +1
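
melwitt's scheduler-hint use case could look something like the following. Everything here is hypothetical: the hint name 'target_cell', the HostState stand-in, and its cell_name attribute are invented, and host_passes only approximates the shape of a Nova host filter of this era.

    from collections import namedtuple

    # Hypothetical stand-in for the scheduler's per-host view; Nova's
    # real HostState carries far more than this.
    HostState = namedtuple('HostState', ['host', 'cell_name'])


    class TargetCellFilter:
        """Pass only hosts living in the cell named by a 'target_cell'
        scheduler hint (an invented hint, shown for illustration)."""

        def host_passes(self, host_state, filter_properties):
            hints = filter_properties.get('scheduler_hints') or {}
            wanted = hints.get('target_cell')
            if not wanted:
                # No hint given: stay out of the way.
                return True
            return host_state.cell_name == wanted


    props = {'scheduler_hints': {'target_cell': 'cell2'}}
    f = TargetCellFilter()
    print(f.host_passes(HostState('hostA', 'cell1'), props))  # False
    print(f.host_passes(HostState('hostC', 'cell2'), props))  # True

Note that exposing such a hint implies exposing at least cell names to users, which is exactly the segregation question raised above.
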
17:32:16 <alaski> back to scheduling
17:32:47 <alaski> I think we should discuss some specifics to better shape what will be needed in the scheduler
17:33:28 <alaski> I started putting together a list of scheduler filters, and alphabetically affinity is first
17:34:32 <alaski> I have two questions on it: how do we revise the semantics for cells, and how do we implement it
17:34:54 <alaski> for the semantics I'm thinking about anti-affinity and cells
17:35:20 <alaski> should it mean anything as far as cells are concerned, like anti-affinity in the same cell vs a different cell?
17:35:54 <bauzas> mmm, affinity is a big word
17:36:05 <bauzas> we have affinity for VMs, server groups and aggregates
17:36:45 <bauzas> I'm fine with having a new level for cells, but that ties into the question I mentioned about cells scope
17:38:08 <melwitt> I can think of it meaning anti-affinity within a cell, if a cell represents a group of instances belonging to a certain security zone, for example. this kind of brings up the two-level scheduling possibility I think, allowing in-cell schedulers to filter if they want to. at the top, anti-affinity would mean different cells; in-cell it would be within the same cell
17:38:18 <alaski> most of the cells filters I deal with are primarily about the capabilities of a cell - this type of hardware for this type of flavor, that kind of thing
17:38:42 <alaski> so my initial thinking is that cells scope only deals with large things like that; everything else is global host-level stuff
17:39:12 <bauzas> melwitt: that's just two filters, if we consider a top-level scheduler
17:39:34 <melwitt> bauzas: okay
17:40:08 <bauzas> alaski: are you grouping by capabilities?
17:40:42 <alaski> bauzas: not sure what you mean
17:41:11 <bauzas> alaski: sorry, I was referring to what you mentioned as "capabilities of a cell"
17:41:28 <bauzas> alaski: that implies homogeneous hardware
17:41:39 <alaski> bauzas: ahh, gotcha
17:42:00 <alaski> bauzas: it doesn't have to be fully homogeneous, but close enough
17:42:12 <alaski> like all SSDs, but perhaps different cpus
17:42:22 <bauzas> alaski: because I'm considering cells as just something for scaling Nova, not necessarily for grouping
17:42:34 <bauzas> alaski: because aggregates are used for that purpose
17:43:05 <alaski> sure, but it is a grouping as well and I think it will be used that way
17:43:39 <bauzas> alaski: sorry, but why not create a big aggregate per cell and leave it as it is now?
17:43:40 <alaski> but it's a good point that it doesn't need to be
17:44:12 <bauzas> alaski: I'm asking because that sends a bad signal that cells are just aggregates
17:44:46 <bauzas> alaski: but I agree with you on the colocation aspect
17:45:03 <bauzas> alaski: I guess you tend to group cells by hardware proximity
17:45:25 <alaski> yeah
17:45:50 <alaski> I'm currently rethinking how much of a concept cells need to be in the scheduler
17:46:45 <alaski> we do need to figure out how to migrate the concepts we currently have into something the scheduler can handle
17:46:51 <bauzas> alaski: well, I'm considering cells as just a new colocation object
17:47:04 <alaski> but if we can move cell capabilities into aggregates that could be a path
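
bauzas' "that's just two filters" observation, under melwitt's two-level semantics, might be sketched as below, again with invented names: one filter excludes cells that already hold members of the server group, the other excludes hosts within the chosen cell. Neither filter exists in Nova; this only mirrors the general shape of a host filter.

    from collections import namedtuple

    # Hypothetical per-host view, as in the previous sketch.
    HostState = namedtuple('HostState', ['host', 'cell_name'])


    class CellAntiAffinityFilter:
        """Top level: reject hosts in any cell that already holds a
        member of the request's server group."""

        def host_passes(self, host_state, filter_properties):
            group_cells = filter_properties.get('group_cells', set())
            return host_state.cell_name not in group_cells


    class HostAntiAffinityFilter:
        """In-cell: reject hosts that already hold a member of the group."""

        def host_passes(self, host_state, filter_properties):
            group_hosts = filter_properties.get('group_hosts', set())
            return host_state.host not in group_hosts


    # With both filters in a single global pass, "anti-affinity" means
    # "different cell" when group_cells is populated, and falls back to
    # the familiar "different host" semantics within a cell otherwise.
    props = {'group_cells': {'cell1'}, 'group_hosts': {'hostA'}}
    candidates = [HostState('hostA', 'cell1'), HostState('hostC', 'cell2')]
    survivors = [h for h in candidates
                 if CellAntiAffinityFilter().host_passes(h, props)
                 and HostAntiAffinityFilter().host_passes(h, props)]
    print([h.host for h in survivors])  # ['hostC']
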
17:47:30 <alaski> bauzas: yeah, but should that be different than an implicit aggregate?
17:47:51 <bauzas> alaski: because of the affinity stuff you mentioned
17:48:08 <bauzas> alaski: like saying that you have a DC with 3 floors
17:48:37 <bauzas> alaski: you could use aggregates for grouping your SSD hosts which are not on the same floor
17:49:16 <bauzas> alaski: but you could need to make a boot request saying "I want to boot on a host on the 3rd floor, because I know that my power consumption is lower there"
17:49:29 <bauzas> alaski: of course, it can be implicit
17:50:06 <bauzas> so the affinity can be done orthogonally
17:50:17 <alaski> right
17:51:00 <bauzas> alaski: and that would solve one big confusion about Nova AZs != AWS AZs
17:51:24 <bauzas> because we could say that Nova cells are somehow related to AWS AZs
17:52:13 <alaski> possibly
17:52:20 <alaski> that's still up to the deployer I think
17:52:25 <bauzas> agreed
17:52:42 <bauzas> that's why I'm saying it's just a new segregation object for the scheduler
17:52:54 <bauzas> so new filters
17:53:10 <bauzas> - leaving off the scaling problem, of course -
17:53:36 <alaski> the new filter would live in the same place the other scheduler filters do
17:53:50 <bauzas> alaski: if the scheduler is a global scheduler, yes
17:53:55 <alaski> going this route I'm not sure we're introducing a real scalability difference
17:54:23 <bauzas> that's just an abstraction that leaves the scalability problem unresolved
17:54:35 <alaski> bauzas: right, I'm assuming a global scheduler for now since that's what we have
17:54:52 <alaski> bauzas: unresolved, but no different than today, right?
17:55:01 <bauzas> alaski: exactly, and it's necessary to keep that abstraction if we want "orthogonal" filtering
17:55:18 <bauzas> alaski: right, I'm just trying to say those are distinct efforts
17:55:34 <bauzas> #1 scalability problem of a global scheduler
17:55:42 <bauzas> #2 cells-related filters
17:56:05 <alaski> okay. agreed, though for this meeting I would reverse the ordering :)
17:56:13 <bauzas> lol
17:56:36 <bauzas> just thinking about hash rings for the scheduler, lol
17:56:48 <alaski> well, that felt productive to me, because I'm totally rethinking scheduling now
17:57:24 <alaski> bauzas: heh, if we could do that it would be pretty cool
17:57:42 <bauzas> yeah.... but claims are compute-based :/
17:57:43 <alaski> so my takeaway is that I need to rewrite my scheduling spec
17:58:11 <bauzas> eh
17:58:37 <bauzas> you have to jump in on the scheduler, as I have to do with cells :)
17:58:51 <alaski> I'm seeing that
17:59:05 <alaski> I'll look up the meeting time and start showing up
17:59:18 <bauzas> cool
17:59:25 <alaski> any last-minute thoughts from anyone, since we didn't get to open discussion?
17:59:41 <melwitt> the regex fix merged, I put rechecks on the two patches
17:59:48 <bauzas> melwitt: \o/
17:59:48 <alaski> melwitt: awesome!
17:59:57 <bauzas> the men are chattering...
18:00:10 <bauzas> congrats!
18:00:13 <melwitt> heh
18:00:14 <bauzas> doing rechecks now
18:00:29 <alaski> she already did them
18:00:34 <alaski> thanks everyone!
18:00:34 <bauzas> oh cool
18:00:38 <bauzas> thanks
18:00:40 <alaski> #endmeeting