22:00:28 <alaski> #startmeeting nova_cells
22:00:33 <openstack> Meeting started Wed Dec 10 22:00:28 2014 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:00:35 <vineetmenon_> hi
22:00:37 <openstack> The meeting name has been set to 'nova_cells'
22:00:40 <tonyb> Howdy guys.
22:00:42 <melwitt> o/
22:00:43 <gilliard> Hello
22:00:43 <mriedem> o/
22:00:45 <alaski> hi
22:00:45 <edleafe> hey
22:00:47 <dansmith> yo
22:00:55 <mateuszb> Hi
22:01:03 <belmoreira> hi
22:01:18 <alaski> awesome, let's get started
22:01:27 <alaski> #topic Cells manifesto
22:01:34 <alaski> https://review.openstack.org/#/c/139191/
22:01:44 <alaski> It's been proposed to devref
22:02:04 <alaski> I have some changes I would still like to make, but please look it over and comment
22:02:21 <dansmith> gilliard: you know that normal nova *is* a no-cells deployment, right?
22:02:45 <dansmith> alaski: I didn't realize this was up yet, sorry, else I'd have reviewed it
22:02:53 <gilliard> dansmith: right.
22:03:04 <alaski> dansmith: no worries. I didn't really announce it at all
22:03:07 <vineetmenon_> alaski: nice stuff..
22:03:09 <bauzas> \o
22:03:15 <dansmith> gilliard: okay, then I don't understand your comment on there
22:03:34 <melwitt> after cells v2 there won't be no-cells is what I thought it meant
22:03:39 <alaski> I took it to mean that going forward there won't be a no-cells deployment
22:03:46 <dansmith> ah, okay,
22:03:53 <gilliard> that's what I meant, yes.
22:04:09 <mriedem> there won't not be a cells deployment, more double negatives please
22:04:11 <vineetmenon_> it's actually, no no-cell deployment
22:04:12 <dansmith> I guess it seems weird that the comment is in the cellsv1 section
22:04:24 <melwitt> oh, I didn't notice that
22:04:40 <dansmith> that's why I was confused, but sounds like alaski will just slap something in there to clarify
22:04:54 <gilliard> I'll try to word it better
22:04:59 <alaski> yeah, I'll likely add it to the proposal section
22:05:01 <comstud> alaski: How does top level cell have things like correct current power state for instance for 'nova show <instance_uuid>' ?
22:05:23 <alaski> uh oh
22:05:26 <dansmith> yeah :/
22:05:37 <dansmith> comstud: because it connects directly to the cell db to look at that
22:05:42 <alaski> comstud: it doesn't. all queries go to a cell
22:05:54 <comstud> Ok, so you look up the mapping first
22:05:57 <dansmith> comstud: and direct to its db, not to its conductor or anything
22:06:00 <alaski> comstud: right
22:06:01 <bauzas> using the cell instance mapping table :)
22:06:06 <comstud> so 2 db calls
22:06:17 <comstud> which is fine
22:06:17 <dansmith> initially yeah
22:06:26 <alaski> comstud: yes. very likely to end up cached in memory
22:06:30 <alaski> eventually
22:06:40 <vineetmenon_> actually 3 db if we have cells and server tables different
22:06:44 <bauzas> comstud: I think you should be interested in looking https://review.openstack.org/#/c/135644/
22:06:53 <dansmith> vineetmenon_: how three?
22:07:12 <vineetmenon_> first get the cell, then the server then get status from db connection?
22:07:28 <comstud> this is unfortunate for cells that may be 'far away'
22:07:30 <dansmith> vineetmenon_: um, not sure about that
22:07:57 <comstud> cache probably solves it
22:08:11 <vineetmenon_> oh.. okay.. i may need to revisit that..
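[Editor's note: the two-call lookup comstud and dansmith walk through above can be sketched as below. This is a minimal illustration, not Nova's actual cells v2 schema, which was still being designed at the time; the table and variable names are invented.]

```python
# A toy sketch of the two-call "nova show" flow discussed above; the
# dicts stand in for databases, and all names are illustrative.

# Call 1 target: an API-level "instance -> cell" mapping table.
instance_mappings = {"inst-uuid-1": "cell-a"}

# Call 2 target: one database per cell holding the authoritative record
# (power state, vm_state, etc.); the API talks straight to this DB, not
# to the cell's conductor or RPC layer.
cell_dbs = {
    "cell-a": {"inst-uuid-1": {"vm_state": "active", "power_state": 1}},
}

def show_instance(instance_uuid):
    cell_id = instance_mappings[instance_uuid]   # DB call 1: find the cell
    return cell_dbs[cell_id][instance_uuid]      # DB call 2: read from the cell DB

print(show_instance("inst-uuid-1"))
```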
22:08:16 <dansmith> comstud: so the thought was also to mix in some of what alaski was talking about doing with current stuff even, which is caching like the json from an instance show
22:08:47 <comstud> yeah, now that xml is dead.. caching json is cool
22:08:58 <bauzas> cache could be a good option provided we know when to invalidate it :)
22:09:24 <alaski> right, we could keep a read-friendly copy of relevant instance data up top
22:09:27 <dansmith> bauzas: if vm_state == active, cache for a while, invalidate when you do a thing
22:09:28 * bauzas likes nitpicking
22:09:40 <dansmith> bauzas: if vm_state != active, cache for only short periods
22:09:49 <dansmith> I think there are some easy wins there
22:09:57 <bauzas> dansmith: agreed, and invalidate on a subset of queries ?
22:10:02 <dansmith> right
22:10:09 <vineetmenon_> There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton.
22:10:15 <alaski> comstud: I was originally thinking of how to move from current cells to a place where the global db differed from the cells. this is approaching that same place from the other side
22:10:30 <bauzas> vineetmenon_: that's why I just wanted to make sure we're not creating a beast :)
22:10:44 <alaski> comstud: and removing the cells rpc proxy and just hitting mqs directly
22:10:48 <alaski> and dbs
22:10:55 <bauzas> hallelujah
22:11:01 <comstud> yeah
22:11:16 <comstud> I think this is a reasonable solution
22:11:22 * dansmith gasps
22:11:26 <comstud> it feels somewhat less distributed somehow, though.
22:11:33 <comstud> 'feels'
22:12:03 <melwitt> how about the list instances all tenants query? I assume that doesn't end up significantly more costly than current cells
22:12:07 <comstud> i think the thing that negates that feeling is the cache.. depending on how it is implemented
22:12:30 <bauzas> melwitt: are we really *sure* that this query scales ? :D
22:12:36 <bauzas> I mean atm ?
22:12:37 <dansmith> comstud: you could potentially have the dbs all close to the api node, since the computes are using rpc to get to the DB anyway
22:12:51 <vineetmenon_> comstud: but thinking from a systems POV, this design looks clean, if not efficient
22:12:54 <dansmith> comstud: i.e. more latency for computes, less for the api, but still one DB per cell
22:13:15 <dansmith> melwitt: you issue multiple parallel requests to each cell
22:13:20 <alaski> melwitt: it's costlier in terms of db connections, but the size of the data should be the same
22:13:25 <comstud> dansmith: that makes it worse for me :)
22:13:27 <comstud> less distributed
22:13:40 <bauzas> and then someone invented MapReduce
22:14:01 <dansmith> comstud: right, but faster for the API -- I wasn't really talking about the distributedness there
22:14:07 <comstud> sure, it's faster
22:14:14 <melwitt> dansmith, alaski: yeah, I was referring to number of db calls
22:14:16 <bauzas> I mean, I like the idea of divide and conquer
22:14:36 <dansmith> melwitt: but if they're in parallel to N DBs and smaller each
22:14:46 <bauzas> so we can scale if we do parallel calls
22:15:02 <melwitt> yeah, cool
22:15:09 <alaski> comstud: this may end up evolving to look a lot like current cells over time, if we need more distributedness
22:15:25 <bauzas> of course, that's tied to greenthread IOs, but let's not pick the nits just yet :)
22:15:31 <dansmith> bauzas: right
22:16:02 <dansmith> next topic?
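[Editor's note: dansmith's invalidation policy above can be sketched as a tiny TTL cache. The TTL values and structure here are invented for illustration and are not Nova code.]

```python
import time

# A toy sketch of the policy described above: stable (active) instances
# are cached longer, instances in flux only briefly, and any mutating
# API action drops the entry. All numbers and names are assumptions.
CACHE = {}
LONG_TTL, SHORT_TTL = 60.0, 5.0  # illustrative values, not from the meeting

def cache_instance_json(uuid, instance_json):
    # "if vm_state == active, cache for a while; if != active, only short periods"
    ttl = LONG_TTL if instance_json["vm_state"] == "active" else SHORT_TTL
    CACHE[uuid] = (instance_json, time.time() + ttl)

def get_cached(uuid):
    entry = CACHE.get(uuid)
    if entry and time.time() < entry[1]:
        return entry[0]
    return None  # expired or missing: fall through to the cell DB

def invalidate(uuid):
    # "invalidate when you do a thing": reboot, resize, delete, ...
    CACHE.pop(uuid, None)
```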
22:16:16 <alaski> yep
22:16:23 <alaski> this got off topic a bit
22:16:32 <alaski> but everyone comment on the manifesto :)
22:16:37 <vineetmenon_> alaski: just a sec
22:16:37 <alaski> #topic Testing
22:16:49 <vineetmenon_> there are two tables, right..
22:17:03 <alaski> vineetmenon_: there are. but can you hold for open discussion?
22:17:09 <comstud> I certainly like the queue part.
22:17:19 <vineetmenon_> so, I'm not getting why we need parallel db calls..
22:17:21 <alaski> I updated the bottom of https://etherpad.openstack.org/p/nova-cells-testing with the latest test failures
22:17:43 <vineetmenon_> you can precisely get which server resides where, right
22:17:46 <comstud> I think the DB part is reasonable for now to simplify things
22:17:50 <alaski> we're down to 40 test failures on the latest runs
22:17:51 <vineetmenon_> or am I totally wrong?
22:17:59 <dansmith> vineetmenon_: can you wait for open discussion?
22:18:04 <vineetmenon_> okay.. sure
22:18:27 <mriedem> so looks like we need eyes on https://review.openstack.org/#/c/135700/
22:18:37 <alaski> comstud: awesome. I would love your thoughts on the specs that are open, or a more open discussion at some point
22:18:52 <comstud> I just looked at the one so far
22:18:55 <dansmith> mriedem: well, eyes are good for sure, but I'm still not where I want to be on that thing :(
22:19:08 <mriedem> dansmith: so it's WIP?
22:19:08 <comstud> I'll see if I can find time to look and comment more
22:19:15 <comstud> but it probably won't be til after christmas
22:19:17 <alaski> mriedem: the one before that is, I think
22:19:17 <comstud> hehe
22:19:22 <bauzas> mriedem: sorry for looking lazy, but how does this patch help increase the coverage ?
22:19:25 <mriedem> dansmith: oh boy, just saw LOC
22:19:32 <alaski> comstud: heh, no worries
22:19:39 * bauzas misses some backlog history
22:19:41 <mriedem> bauzas: i was just looking at the list of patches for review in the testing etherpad
22:19:41 <dansmith> mriedem: yeah
22:19:51 <comstud> you obviously don't need me ;)
22:20:02 <dansmith> mriedem: this is the third time I've started it from scratch, but flavors are just soooo pervasive :(
22:20:10 <dansmith> mriedem: third time I've tried splitting bits off I mean
22:20:18 <alaski> comstud: if we say we do will you stick around :)
22:20:24 <bauzas> ok, so can someone explain why modifying storage of flavors will help functional test coverage ?
22:20:34 <dansmith> alaski: he means he doesn't need *us* too :)
22:20:35 * bauzas is lost a bit
22:20:43 <dansmith> bauzas: unrelated
22:20:55 <bauzas> dansmith: oh, perfect reason then
22:20:59 <bauzas> :)
22:21:11 <alaski> bauzas: what we need is for flavor extra_specs to be available in a cell
22:21:26 <alaski> because flavors are not replicated into the cell dbs
22:21:35 <alaski> we want to pass that down with the instance
22:21:36 <bauzas> alaski: aaaah ack.
22:21:37 <comstud> This does go somewhat in the opposite direction than I was thinking in terms of the API
22:21:39 <tonyb> alaski: is that basically the same request the ironic guys had?
22:21:46 <dansmith> we need it for a lot of things
22:21:55 <alaski> tonyb: yes, it will help them as well
22:22:06 <comstud> I wanted to segregate it more from the computes
22:22:14 <tonyb> alaski: cool.
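[Editor's note: alaski's "pass that down with the instance" idea can be sketched as below. Since flavors and their extra_specs live only in the API-level database and are not replicated into cell databases, the boot request carries a full snapshot of the flavor. All field and function names here are invented for illustration.]

```python
import copy

# A toy sketch of embedding the flavor in the boot request so code
# running inside a cell never has to query the API-level database.
# This is not Nova's real request schema.

api_level_flavors = {
    "m1.small": {
        "vcpus": 1,
        "memory_mb": 2048,
        "extra_specs": {"hw:cpu_policy": "dedicated"},
    },
}

def build_boot_request(instance_uuid, flavor_name):
    # Snapshot the whole flavor, extra_specs included, at boot time.
    flavor = copy.deepcopy(api_level_flavors[flavor_name])
    return {"uuid": instance_uuid, "flavor": flavor}

request = build_boot_request("inst-uuid-1", "m1.small")
assert request["flavor"]["extra_specs"]["hw:cpu_policy"] == "dedicated"
```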
22:22:29 <comstud> but depending on how the cache works, it might end up the same thing
22:23:22 <alaski> comstud: cool, would love to talk more on this when there's more time
22:23:26 <vineetmenon_> comstud: that's why we are looking forward to splitting data between cell and api.. https://etherpad.openstack.org/p/nova-cells-table-analysis
22:23:39 <comstud> me too
22:23:53 <comstud> unfort i have like 3 big things to finish before xmas
22:24:09 <alaski> so the reason that dansmith's patch series is linked in the testing etherpad is because it's a long term solution to something we've worked around in another way in the short term
22:24:30 <alaski> which is likely to break as the scheduler work progresses (the short term solution)
22:25:17 <alaski> dansmith: is there anything we can do to help with it atm
22:25:24 <dansmith> alaski: shoot me in the head
22:25:34 <alaski> that helps you, not us :)
22:25:37 <melwitt> :(
22:25:42 <dansmith> heh
22:25:45 <vineetmenon_> :)
22:26:33 <tonyb> alaski: so on Testing and the ~40 failures is that something I can look at and not duplicate work you're doing?
22:26:39 <alaski> well, there are still 40 failures which are likely unrelated to flavors
22:26:54 <alaski> tonyb: yes
22:27:08 <alaski> I'm intermittently looking into them, but I can mark that on the pad when I do it
22:27:08 <bauzas> well, the host detail Tempest test worries me
22:27:29 <bauzas> because I can't see how we can fix it
22:27:36 <tonyb> alaski: okay I'll see what I can do in that area
22:27:46 <alaski> tonyb: awesome
22:27:56 <alaski> bauzas: do you know where the failure is?
22:28:13 <bauzas> alaski: definitely need more time to look at the issue
22:28:19 <alaski> ok
22:28:20 <tonyb> it'd be nice to actually write code in openstack rather than qemu/libvirt ;P
22:28:46 <alaski> bauzas: if it's not a quick fix, we can skip the test(s)
22:29:23 <alaski> moving on...
22:29:32 <alaski> #topic cells scheduling requirements
22:29:50 <alaski> woops, forgot to link on the agenda
22:29:52 <alaski> https://etherpad.openstack.org/p/nova-cells-scheduling-requirements
22:29:53 <vineetmenon_> bauzas: are you talking about this? https://bugs.launchpad.net/nova/+bug/1312002
22:29:57 <mateuszb> There is a use case of filtering cells based on their capabilities, which are already gathered and passed up to the parent cell: https://review.openstack.org/#/c/140031/
22:30:48 <alaski> mateuszb: yes, I agree. But I do think it's a bit unrelated to this
22:31:09 <alaski> because we're not really looking at using the cells scheduler
22:31:11 <bauzas> vineetmenon_: nope
22:32:00 <alaski> mateuszb: but I do like that spec and think it's worthwhile regardless
22:33:05 <bauzas> alaski: I think that belmoreira raised the main issues for having an intra-cell scheduler
22:33:06 <mateuszb_> alaski: Ok, but it would be great if you leave your feedback on this. I know there is an interest in it apart from Intel
22:33:21 <bauzas> alaski: to be clear, s/issues/concerns
22:33:35 <alaski> mateuszb_: okay, I will do that
22:33:43 <mateuszb_> thank you
22:34:24 <belmoreira> mateuszb_: yes, we are interested in this... in fact we are already using something similar to what you are proposing
22:34:57 <bauzas> belmoreira: there is one spec for changing how the scheduler would pick hosts based on aggregates that I think you could be interested in: https://review.openstack.org/#/c/89893/
22:35:07 <mateuszb_> belmoreira: you mean you created your own filter?
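[Editor's note: the capability-based cell filtering mateuszb raises above (and belmoreira says CERN already approximates with custom filters) can be sketched as below. The capability names and request format are invented for illustration; see the linked spec for the actual proposal.]

```python
# A toy sketch of filtering cells on capabilities that have been
# gathered and passed up to the parent: keep only the cells whose
# advertised capabilities satisfy every scheduling requirement.

cell_capabilities = {
    "cell-a": {"hypervisor": "kvm", "datacentre": "geneva", "ssd": True},
    "cell-b": {"hypervisor": "xen", "datacentre": "budapest", "ssd": False},
}

def filter_cells(required):
    return [name for name, caps in cell_capabilities.items()
            if all(caps.get(key) == value for key, value in required.items())]

print(filter_cells({"hypervisor": "kvm", "ssd": True}))  # ['cell-a']
```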
22:35:53 <alaski> belmoreira and I both listed a desire to have intra-cell scheduling
22:35:53 <belmoreira> mateuszb_: yes, we have created filters to deal with capabilities... (datacentre, avz, ...)
22:36:25 <belmoreira> bauzas: thanks, I will have a look
22:36:45 <bauzas> just to be clear, does everyone know we'll change how filters will look at aggregates and instances ?
22:36:48 <alaski> belmoreira: a question I had for you is if it is a requirement that they be different scheduler endpoints?
22:37:26 <bauzas> I'm very concerned by any spec creating intra-calls within the filter to the Nova DB or so
22:37:26 <alaski> belmoreira: I'm thinking ahead to when the scheduler is split out, potentially
22:38:39 <bauzas> alaski: I think we need to think about having a scheduler able to provide either a cell or a host
22:38:42 <belmoreira> alaski: not a requirement... my concern is that it would then be a bottleneck, and be more difficult to scale
22:38:48 <bauzas> alaski: but not having 2 different schedulers
22:39:05 <bauzas> alaski: or we would reproduce what we have with the current Cells v1 scheduler
22:39:36 <bauzas> ie. something totally out of scope of what's happening within the scheduler's world
22:39:39 <alaski> belmoreira: I completely agree about scale. But I'm wondering if we can have the scheduler be something we query, and it can deal with the scale and separation question separately
22:40:04 <belmoreira> For me one of the advantages of having cells is that each cell can be configured in a different way (depending on the use case), including the schedulers
22:40:06 <alaski> bauzas: agreed
22:40:19 <bauzas> belmoreira: we know that the scheduler can't scale because it does in-query calls to the DB
22:40:24 <alaski> bauzas: I've been thinking about it a lot these past few days, and want to write something down about it
22:40:29 <bauzas> that's really expensive
22:40:38 <bauzas> alaski: keep me in the loop then
22:40:57 <bauzas> alaski: we have time to loop back with the scheduler swat team
22:41:06 <alaski> bauzas: will do. I would love to get some ideas and thoughts from others on this
22:41:45 <vineetmenon_> a memcache would be more beneficial here, IMHO.
22:41:45 <belmoreira> alaski, bauzas: keep me in the loop as well
22:41:58 <alaski> belmoreira: will do
22:42:18 <bauzas> anyway, the idea is to make sure we can do something generic and scalable
22:42:32 <alaski> We have not received any feedback from HP/Nectar yet, because I haven't reached out to them yet
22:42:33 <bauzas> eh, isn't that what we want to provide for the scheduler ? :D
22:42:51 <alaski> So I will do that so they can add to the conversation
22:43:20 <alaski> #action alaski Reach out to HP/Nectar for scheduling requirement feedback
22:43:45 <alaski> next up...
22:43:53 <alaski> #topic open discussion
22:44:09 <vineetmenon_> alaski: did you miss database?
22:44:27 <vineetmenon_> i guess that was an agenda item as well..
22:44:57 <alaski> vineetmenon_: I actually removed that item since I wasn't sure where to go with it yet
22:45:14 <alaski> but we can talk about it now
22:45:20 <bauzas> just to be clear, I think my comments on https://etherpad.openstack.org/p/nova-cells-table-analysis depend on the outcome of the discussions about the sched requirements
22:45:28 <vineetmenon_> aah..
22:46:17 <alaski> bauzas: which comments?
22:46:57 <bauzas> alaski: eh... damn etherpad, it left different colors for me
22:47:10 <alaski> and should we come to more of a resolution around scheduling requirements before getting too far into table analysis?
22:47:13 <bauzas> alaski: I was mentioning aggregates and instancegroups
22:47:17 <vineetmenon_> under the controversial tab?
22:47:24 <belmoreira> for the DB discussion it would be easier if we first reserve a meeting to discuss aggregates, volumes, server groups...
22:47:26 <bauzas> alaski: they're tied to the scheduler
22:47:38 <vineetmenon_> belmoreira: +1
22:47:40 <bauzas> vineetmenon_: right
22:48:07 <bauzas> belmoreira: as I said, I think the scheduling requirement decision seems to be the first thing to do
22:48:41 <bauzas> belmoreira: because we can't talk about DB segregation without having a clear idea of what the whole stories for boot and evacuate would be, for example
22:49:12 <vineetmenon_> so this part is going to be in limbo for a looong time.
22:49:20 <alaski> bauzas: I think we can talk about it if we limit the scope to basic scheduling
22:49:46 <alaski> vineetmenon_: some parts of it, maybe
22:49:51 <bauzas> alaski: well, it depends on whether you want to reach feature parity with Nova
22:49:54 <alaski> but let's start with this:
22:50:01 <bauzas> alaski: like, live migration between cells ?
22:50:17 <alaski> do people want more time to consider the "easy" tables?
22:50:29 <alaski> or is there a general consensus there?
22:50:46 <bauzas> alaski: well, I pointed out services
22:51:02 <bauzas> alaski: as it's very related to the SG API
22:51:10 <alaski> okay, that can move to controversial
22:52:09 <belmoreira> bauzas: I agree with you, but we can also see it the other way around... for example deciding where aggregates live will influence the scheduler
22:52:31 <alaski> Can we maybe pick a date, like next wednesday, and say that we're generally okay with what's not in controversial/unsure and start from there?
22:52:38 <bauzas> belmoreira: aggregates are a beast that only the scheduler cares about
22:52:52 <alaski> then we can start picking apart what's left
22:53:09 <bauzas> belmoreira: whatever the decision for aggregates, it would require the same level of granularity for the scheduler
22:53:33 <tonyb> alaski: that plan sounds fair
22:53:36 <bauzas> alaski: I'm pretty ok with the list except one last thing
22:53:41 <bauzas> alaski: networks ?
22:54:14 <bauzas> but time is running fast, dammit.
22:54:14 <dansmith> this is why we need to decide what to do about nova-network,
22:54:23 <dansmith> because if n-net goes away soon, so does networks
22:54:28 <bauzas> +1
22:54:42 <bauzas> and what would be the network topology for cells ?
22:55:10 <bauzas> belmoreira: do you have 1 to N subnets per cell ?
22:55:20 <bauzas> belmoreira: or is it something global ?
22:55:27 <dansmith> it'll depend
22:55:30 <dansmith> on the deployer
22:55:44 <dansmith> there are people out there running everything on one L2, one cell per subnet, etc
22:55:54 <belmoreira> bauzas: each cell has different subnets
22:55:58 <bauzas> dansmith: agreed, that's why Neutron exists
22:56:12 <vineetmenon_> what about a subnet spread across multiple cells?
22:56:29 <bauzas> anyway, we have 4 mins left :(
22:56:33 <vineetmenon_> and each cell consisting of multiple subnets as well?
22:57:07 <bauzas> dansmith: I don't remember anything about networks in the manifesto, will review the patch with that in mind
22:57:25 <bauzas> dansmith: we need to be explicit on that I guess
22:57:26 <alaski> so for now it seems like networks fall under controversial, and we can devote some time to it later
22:58:11 <alaski> let's try to get to the list of non controversial things for now
22:58:35 <alaski> so if there's a concern on a table, move it to the unsure/controversial list
22:59:07 <bauzas> sounds good, with a deadline set to Wed then ?
22:59:18 <alaski> there's a lot to try to tackle in this effort, we're not going to get it all at once
22:59:28 <alaski> bauzas: yes, since I didn't hear any complaints
22:59:34 <bauzas> alaski: cool
22:59:56 <alaski> #action review table split so we can claim consensus by next wednesday
23:00:26 <alaski> my final item was that I will not be around to run the meeting on the 24th or 31st
23:00:40 <alaski> and it's likely others won't be around either
23:00:49 <alaski> so I am going to suggest we skip those weeks
23:00:59 <bauzas> the 31st seems to be hard to follow :)
23:01:00 <alaski> but we can make that decision later, just throwing it out there
23:01:01 <tonyb> alaski: I'd say cancel those meetings.
23:01:06 <belmoreira> alaski: +1
23:01:16 <bauzas> in particular as it's midnight now
23:01:25 <alaski> cool
23:01:29 <belmoreira> bauzas: :)
23:01:29 <alaski> thanks all!
23:01:39 <alaski> #endmeeting