22:00:28 <alaski> #startmeeting nova_cells
22:00:33 <openstack> Meeting started Wed Dec 10 22:00:28 2014 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:00:35 <vineetmenon_> hi
22:00:37 <openstack> The meeting name has been set to 'nova_cells'
22:00:40 <tonyb> Howdy guys.
22:00:42 <melwitt> o/
22:00:43 <gilliard> Hello
22:00:43 <mriedem> o/
22:00:45 <alaski> hi
22:00:45 <edleafe> hey
22:00:47 <dansmith> yo
22:00:55 <mateuszb> Hi
22:01:03 <belmoreira> hi
22:01:18 <alaski> awesome, let's get started
22:01:27 <alaski> #topic Cells manifesto
22:01:34 <alaski> https://review.openstack.org/#/c/139191/
22:01:44 <alaski> It's been proposed to devref
22:02:04 <alaski> I have some changes I would still like to make, but please look it over and comment
22:02:21 <dansmith> gilliard: you know that normal nova *is* a no-cells deployment, right?
22:02:45 <dansmith> alaski: I didn't realize this was up yet, sorry, else I'd have reviewed it
22:02:53 <gilliard> dansmith: right.
22:03:04 <alaski> dansmith: no worries. I didn't really announce it at all
22:03:07 <vineetmenon_> alaski: nice stuff..
22:03:09 <bauzas> \o
22:03:15 <dansmith> gilliard: okay, then I don't understand your comment on there
22:03:34 <melwitt> after cells v2 there won't be no-cells is what I thought it meant
22:03:39 <alaski> I took it to mean that going forward there won't be a no-cells deployment
22:03:46 <dansmith> ah, okay,
22:03:53 <gilliard> that's what I meant, yes.
22:04:09 <mriedem> there won't not be a cells deployment, more double negatives please
22:04:11 <vineetmenon_> it's actually, no no-cell deployment
22:04:12 <dansmith> I guess it seems weird that the comment is in the cellsv1 section
22:04:24 <melwitt> oh, I didn't notice that
22:04:40 <dansmith> that's why I was confused, but sounds like alaski will just slap something in there to clarify
22:04:54 <gilliard> I'll try to word it better
22:04:59 <alaski> yeah, I'll likely add it to the proposal section
22:05:01 <comstud> alaski: How does top level cell have things like correct current power state for instance for 'nova show <instance_uuid>' ?
22:05:23 <alaski> uh oh
22:05:26 <dansmith> yeah :/
22:05:37 <dansmith> comstud: because it connects directly to the cell db to look at that
22:05:42 <alaski> comstud: it doesn't. all queries go to a cell
22:05:54 <comstud> Ok, so you look up the mapping first
22:05:57 <dansmith> comstud: and direct to its db, not to its conductor or anything
22:06:00 <alaski> comstud: right
22:06:01 <bauzas> using the cell instance mapping table :)
22:06:06 <comstud> so 2 db calls
22:06:17 <comstud> which is fine
22:06:17 <dansmith> initially yeah
22:06:26 <alaski> comstud: yes. very likely to end up cached in memory
22:06:30 <alaski> eventually
22:06:40 <vineetmenon_> actually 3 db if we have cells and server tables different
22:06:44 <bauzas> comstud: I think you should be interested in looking https://review.openstack.org/#/c/135644/
22:06:53 <dansmith> vineetmenon_: how three?
22:07:12 <vineetmenon_> first get the cell, then the server then get status from db connection?
22:07:28 <comstud> this is unfortunate for cells that may be 'far away'
22:07:30 <dansmith> vineetmenon_: um, not sure about that
22:07:57 <comstud> cache probably solves it
22:08:11 <vineetmenon_> oh.. okay.. i may need to revisit that..
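[Editor's note: the two-call lookup comstud and dansmith walk through above can be sketched as below. This is a minimal illustration, not Nova's actual cells v2 schema, which was still being designed at the time; the table and variable names are invented.]

```python
# A toy sketch of the two-call "nova show" flow discussed above; the
# dicts stand in for databases, and all names are illustrative.

# Call 1 target: an API-level "instance -> cell" mapping table.
instance_mappings = {"inst-uuid-1": "cell-a"}

# Call 2 target: one database per cell holding the authoritative record
# (power state, vm_state, etc.); the API talks straight to this DB, not
# to the cell's conductor or RPC layer.
cell_dbs = {
    "cell-a": {"inst-uuid-1": {"vm_state": "active", "power_state": 1}},
}

def show_instance(instance_uuid):
    cell_id = instance_mappings[instance_uuid]   # DB call 1: find the cell
    return cell_dbs[cell_id][instance_uuid]      # DB call 2: read from the cell DB

print(show_instance("inst-uuid-1"))
```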
22:08:16 <dansmith> comstud: so the thought was also to mix in some of what alaski was talking about doing with current stuff even, which is caching like the json from an instance show
22:08:47 <comstud> yeah, now that xml is dead.. caching json is cool
22:08:58 <bauzas> cache could be a good option provided we know when to invalidate it :)
22:09:24 <alaski> right, we could keep a read-friendly copy of relevant instance data up top
22:09:27 <dansmith> bauzas: if vm_state == active, cache for a while, invalidate when you do a thing
22:09:28 * bauzas likes nitpicking
22:09:40 <dansmith> bauzas: if vm_state != active, cache for only short periods
22:09:49 <dansmith> I think there are some easy wins there
22:09:57 <bauzas> dansmith: agreed, and invalidate on a subset of queries ?
22:10:02 <dansmith> right
22:10:09 <vineetmenon_> There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton.
22:10:15 <alaski> comstud: I was originally thinking of how to move from current cells to a place where the global db differed from the cells. this is approaching that same place from the other side
22:10:30 <bauzas> vineetmenon_: that's why I just wanted to make sure we're not creating a beast :)
22:10:44 <alaski> comstud: and removing the cells rpc proxy and just hitting mqs directly
22:10:48 <alaski> and dbs
22:10:55 <bauzas> hallelujah
22:11:01 <comstud> yeah
22:11:16 <comstud> I think this is a reasonable solution
22:11:22 * dansmith gasps
22:11:26 <comstud> it feels somewhat less distributed somehow, though.
22:11:33 <comstud> 'feels'
22:12:03 <melwitt> how about the list instances all tenants query? I assume that doesn't end up significantly more costly than current cells
22:12:07 <comstud> i think the thing that negates that feeling is the cache.. depending on how it is implemented
22:12:30 <bauzas> melwitt: are we really *sure* that this query scales ? :D
22:12:36 <bauzas> I mean atm ?
22:12:37 <dansmith> comstud: you could potentially have the dbs all close to the api node, since the computes are using rpc to get to the DB anyway
22:12:51 <vineetmenon_> comstud: but thinking from a systems POV, this design looks clean, if not efficient
22:12:54 <dansmith> comstud: i.e. more latency for computes, less for the api, but still one DB per cell
22:13:15 <dansmith> melwitt: you issue multiple parallel requests to each cell
22:13:20 <alaski> melwitt: it's costlier in terms of db connections, but the size of the data should be the same
22:13:25 <comstud> dansmith: that makes it worse for me :)
22:13:27 <comstud> less distributed
22:13:40 <bauzas> and then someone invented MapReduce
22:14:01 <dansmith> comstud: right, but faster for the API -- I wasn't really talking about the distributedness there
22:14:07 <comstud> sure, it's faster
22:14:14 <melwitt> dansmith, alaski: yeah, I was referring to number of db calls
22:14:16 <bauzas> I mean, I like the idea of divide and conquer
22:14:36 <dansmith> melwitt: but if they're in parallel to N DBs and smaller each
22:14:46 <bauzas> so we can scale if we do parallel calls
22:15:02 <melwitt> yeah, cool
22:15:09 <alaski> comstud: this may end up evolving to look a lot like current cells over time, if we need more distributedness
22:15:25 <bauzas> of course, that's tied to greenthread IOs, but let's not pick the nits just yet :)
22:15:31 <dansmith> bauzas: right
22:16:02 <dansmith> next topic?
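[Editor's note: dansmith's invalidation policy above can be sketched as a tiny TTL cache. The TTL values and structure here are invented for illustration and are not Nova code.]

```python
import time

# A toy sketch of the policy described above: stable (active) instances
# are cached longer, instances in flux only briefly, and any mutating
# API action drops the entry. All numbers and names are assumptions.
CACHE = {}
LONG_TTL, SHORT_TTL = 60.0, 5.0  # illustrative values, not from the meeting

def cache_instance_json(uuid, instance_json):
    # "if vm_state == active, cache for a while; if != active, only short periods"
    ttl = LONG_TTL if instance_json["vm_state"] == "active" else SHORT_TTL
    CACHE[uuid] = (instance_json, time.time() + ttl)

def get_cached(uuid):
    entry = CACHE.get(uuid)
    if entry and time.time() < entry[1]:
        return entry[0]
    return None  # expired or missing: fall through to the cell DB

def invalidate(uuid):
    # "invalidate when you do a thing": reboot, resize, delete, ...
    CACHE.pop(uuid, None)
```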
22:16:16 <alaski> yep
22:16:23 <alaski> this got off topic a bit
22:16:32 <alaski> but everyone comment on the manifesto :)
22:16:37 <vineetmenon_> alaski: just a sec
22:16:37 <alaski> #topic Testing
22:16:49 <vineetmenon_> there are two tables, right..
22:17:03 <alaski> vineetmenon_: there are. but can you hold for open discussion?
22:17:09 <comstud> I certainly like the queue part.
22:17:19 <vineetmenon_> so, I'm not getting why we need parallel db calls..
22:17:21 <alaski> I updated the bottom of https://etherpad.openstack.org/p/nova-cells-testing with the latest test failures
22:17:43 <vineetmenon_> you can precisely get which server resides where, right
22:17:46 <comstud> I think the DB part is reasonable for now to simplify things
22:17:50 <alaski> we're down to 40 test failures on the latest runs
22:17:51 <vineetmenon_> or am I totally wrong?
22:17:59 <dansmith> vineetmenon_: can you wait for open discussion?
22:18:04 <vineetmenon_> okay.. sure
22:18:27 <mriedem> so looks like we need eyes on https://review.openstack.org/#/c/135700/
22:18:37 <alaski> comstud: awesome. I would love your thoughts on the specs that are open, or a more open discussion at some point
22:18:52 <comstud> I just looked at the one so far
22:18:55 <dansmith> mriedem: well, eyes are good for sure, but I'm still not where I want to be on that thing :(
22:19:08 <mriedem> dansmith: so it's WIP?
22:19:08 <comstud> I'll see if I can find time to look and comment more
22:19:15 <comstud> but it probably won't be til after christmas
22:19:17 <alaski> mriedem: the one before that is, I think
22:19:17 <comstud> hehe
22:19:22 <bauzas> mriedem: sorry for looking lazy, but how does this patch help increase the coverage ?
22:19:25 <mriedem> dansmith: oh boy, just saw LOC
22:19:32 <alaski> comstud: heh, no worries
22:19:39 * bauzas misses some backlog history
22:19:41 <mriedem> bauzas: i was just looking at the list of patches for review in the testing etherpad
22:19:41 <dansmith> mriedem: yeah
22:19:51 <comstud> you obviously don't need me ;)
22:20:02 <dansmith> mriedem: this is the third time I've started it from scratch, but flavors are just soooo pervasive :(
22:20:10 <dansmith> mriedem: third time I've tried splitting bits off I mean
22:20:18 <alaski> comstud: if we say we do will you stick around :)
22:20:24 <bauzas> ok, so can someone explain why modifying storage of flavors will help functional test coverage ?
22:20:34 <dansmith> alaski: he means he doesn't need *us* too :)
22:20:35 * bauzas is lost a bit
22:20:43 <dansmith> bauzas: unrelated
22:20:55 <bauzas> dansmith: oh, perfect reason then
22:20:59 <bauzas> :)
22:21:11 <alaski> bauzas: what we need is for flavor extra_specs to be available in a cell
22:21:26 <alaski> because flavors are not replicated into the cell dbs
22:21:35 <alaski> we want to pass that down with the instance
22:21:36 <bauzas> alaski: aaaah ack.
22:21:37 <comstud> This does go somewhat in the opposite direction than I was thinking in terms of the API
22:21:39 <tonyb> alaski: is that basically the same request the ironic guys had?
22:21:46 <dansmith> we need it for a lot of things
22:21:55 <alaski> tonyb: yes, it will help them as well
22:22:06 <comstud> I wanted to segregate it more from the computes
22:22:14 <tonyb> alaski: cool.
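[Editor's note: alaski's "pass that down with the instance" idea can be sketched as below. Since flavors and their extra_specs live only in the API-level database and are not replicated into cell databases, the boot request carries a full snapshot of the flavor. All field and function names here are invented for illustration.]

```python
import copy

# A toy sketch of embedding the flavor in the boot request so code
# running inside a cell never has to query the API-level database.
# This is not Nova's real request schema.

api_level_flavors = {
    "m1.small": {
        "vcpus": 1,
        "memory_mb": 2048,
        "extra_specs": {"hw:cpu_policy": "dedicated"},
    },
}

def build_boot_request(instance_uuid, flavor_name):
    # Snapshot the whole flavor, extra_specs included, at boot time.
    flavor = copy.deepcopy(api_level_flavors[flavor_name])
    return {"uuid": instance_uuid, "flavor": flavor}

request = build_boot_request("inst-uuid-1", "m1.small")
assert request["flavor"]["extra_specs"]["hw:cpu_policy"] == "dedicated"
```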
22:22:29 <comstud> but depending on how the cache works, it might end up the same thing
22:23:22 <alaski> comstud: cool, would love to talk more on this when there's more time
22:23:26 <vineetmenon_> comstud: that's why we are looking forward to splitting data between cell and api.. https://etherpad.openstack.org/p/nova-cells-table-analysis
22:23:39 <comstud> me too
22:23:53 <comstud> unfort i have like 3 big things to finish before xmas
22:24:09 <alaski> so the reason that dansmith's patch series is linked in the testing etherpad is because it's a long term solution to something we've worked around in another way in the short term
22:24:30 <alaski> which is likely to break as the scheduler work progresses (the short term solution)
22:25:17 <alaski> dansmith: is there anything we can do to help with it atm
22:25:24 <dansmith> alaski: shoot me in the head
22:25:34 <alaski> that helps you, not us :)
22:25:37 <melwitt> :(
22:25:42 <dansmith> heh
22:25:45 <vineetmenon_> :)
22:26:33 <tonyb> alaski: so on Testing and the ~40 failures is that something I can look at and not duplicate work you're doing?
22:26:39 <alaski> well, there are still 40 failures which are likely unrelated to flavors
22:26:54 <alaski> tonyb: yes
22:27:08 <alaski> I'm intermittently looking into them, but I can mark that on the pad when I do it
22:27:08 <bauzas> well, the host detail Tempest test worries me
22:27:29 <bauzas> because I can't see how we can fix it
22:27:36 <tonyb> alaski: okay I'll see what I can do in that area
22:27:46 <alaski> tonyb: awesome
22:27:56 <alaski> bauzas: do you know where the failure is?
22:28:13 <bauzas> alaski: definitely need more time to look at the issue
22:28:19 <alaski> ok
22:28:20 <tonyb> it'd be nice to actually write code in openstack rather than qemu/libvirt ;P
22:28:46 <alaski> bauzas: if it's not a quick fix, we can skip the test(s)
22:29:23 <alaski> moving on...
22:29:32 <alaski> #topic cells scheduling requirements
22:29:50 <alaski> woops, forgot to link on the agenda
22:29:52 <alaski> https://etherpad.openstack.org/p/nova-cells-scheduling-requirements
22:29:53 <vineetmenon_> bauzas: are you talking about this? https://bugs.launchpad.net/nova/+bug/1312002
22:29:57 <mateuszb> There is a use case of filtering cells based on their capabilities, which are already gathered and passed up to the parent cell: https://review.openstack.org/#/c/140031/
22:30:48 <alaski> mateuszb: yes, I agree. But I do think it's a bit unrelated to this
22:31:09 <alaski> because we're not really looking at using the cells scheduler
22:31:11 <bauzas> vineetmenon_: nope
22:32:00 <alaski> mateuszb: but I do like that spec and think it's worthwhile regardless
22:33:05 <bauzas> alaski: I think that belmoreira raised the main issues for having an intra-cell scheduler
22:33:06 <mateuszb_> alaski: Ok, but it would be great if you leave your feedback on this. I know there is an interest in it apart from Intel
22:33:21 <bauzas> alaski: to be clear, s/issues/concerns
22:33:35 <alaski> mateuszb_: okay, I will do that
22:33:43 <mateuszb_> thank you
22:34:24 <belmoreira> mateuszb_: yes, we are interested in this... in fact we are already using something similar to what you are proposing
22:34:57 <bauzas> belmoreira: there is one spec for changing how the scheduler would pick hosts based on aggregates that I think you could be interested in: https://review.openstack.org/#/c/89893/
22:35:07 <mateuszb_> belmoreira: you mean you created your own filter?
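[Editor's note: the capability-based cell filtering mateuszb raises above (and belmoreira says CERN already approximates with custom filters) can be sketched as below. The capability names and request format are invented for illustration; see the linked spec for the actual proposal.]

```python
# A toy sketch of filtering cells on capabilities that have been
# gathered and passed up to the parent: keep only the cells whose
# advertised capabilities satisfy every scheduling requirement.

cell_capabilities = {
    "cell-a": {"hypervisor": "kvm", "datacentre": "geneva", "ssd": True},
    "cell-b": {"hypervisor": "xen", "datacentre": "budapest", "ssd": False},
}

def filter_cells(required):
    return [name for name, caps in cell_capabilities.items()
            if all(caps.get(key) == value for key, value in required.items())]

print(filter_cells({"hypervisor": "kvm", "ssd": True}))  # ['cell-a']
```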
22:35:53 <alaski> belmoreira and I both listed a desire to have intra-cell scheduling
22:35:53 <belmoreira> mateuszb_: yes, we have created filters to deal with capabilities... (datacentre, avz, ...)
22:36:25 <belmoreira> bauzas: thanks, I will have a look
22:36:45 <bauzas> just to be clear, does everyone know we'll change how filters will look at aggregates and instances ?
22:36:48 <alaski> belmoreira: a question I had for you is if it is a requirement that they be different scheduler endpoints?
22:37:26 <bauzas> I'm very concerned by any spec creating intra-calls within the filter to the Nova DB or so
22:37:26 <alaski> belmoreira: I'm thinking ahead to when the scheduler is split out, potentially
22:38:39 <bauzas> alaski: I think we need to think about having a scheduler able to provide either a cell or a host
22:38:42 <belmoreira> alaski: not a requirement... my concern is that it would then be a bottleneck, and be more difficult to scale
22:38:48 <bauzas> alaski: but not having 2 different schedulers
22:39:05 <bauzas> alaski: or we would reproduce what we have with the current Cells v1 scheduler
22:39:36 <bauzas> ie. something totally out of scope of what's happening within the scheduler's world
22:39:39 <alaski> belmoreira: I completely agree about scale. But I'm wondering if we can have the scheduler be something we query, and it can deal with the scale and separation question separately
22:40:04 <belmoreira> For me one of the advantages of having cells is that each cell can be configured in a different way (depending on the use case), including the schedulers
22:40:06 <alaski> bauzas: agreed
22:40:19 <bauzas> belmoreira: we know that the scheduler can't scale because it does in-query calls to the DB
22:40:24 <alaski> bauzas: I've been thinking about it a lot these past few days, and want to write something down about it
22:40:29 <bauzas> that's really expensive
22:40:38 <bauzas> alaski: keep me in the loop then
22:40:57 <bauzas> alaski: we have time to loop back with the scheduler swat team
22:41:06 <alaski> bauzas: will do. I would love to get some ideas and thoughts from others on this
22:41:45 <vineetmenon_> a memcache would be more beneficial here, IMHO.
22:41:45 <belmoreira> alaski, bauzas: keep me in the loop as well
22:41:58 <alaski> belmoreira: will do
22:42:18 <bauzas> anyway, the idea is to make sure we can do something generic and scalable
22:42:32 <alaski> We have not received any feedback from HP/Nectar yet, because I haven't reached out to them yet
22:42:33 <bauzas> eh, isn't that what we want to provide for the scheduler ? :D
22:42:51 <alaski> So I will do that so they can add to the conversation
22:43:20 <alaski> #action alaski Reach out to HP/Nectar for scheduling requirement feedback
22:43:45 <alaski> next up...
22:43:53 <alaski> #topic open discussion
22:44:09 <vineetmenon_> alaski: did you miss database?
22:44:27 <vineetmenon_> i guess that was an agenda item as well..
22:44:57 <alaski> vineetmenon_: I actually removed that item since I wasn't sure where to go with it yet
22:45:14 <alaski> but we can talk about it now
22:45:20 <bauzas> just to be clear, I think my comments on https://etherpad.openstack.org/p/nova-cells-table-analysis depend on the outcome of the discussions about the sched requirements
22:45:28 <vineetmenon_> aah..
22:46:17 <alaski> bauzas: which comments?
22:46:57 <bauzas> alaski: eh... damn etherpad, it left different colors for me
22:47:10 <alaski> and should we come to more of a resolution around scheduling requirements before getting too far into table analysis?
22:47:13 <bauzas> alaski: I was mentioning aggregates and instancegroups
22:47:17 <vineetmenon_> under the controversial tab?
22:47:24 <belmoreira> for the DB discussion it would be easier if we first reserve a meeting to discuss aggregates, volumes, server groups...
22:47:26 <bauzas> alaski: they're tied to the scheduler
22:47:38 <vineetmenon_> belmoreira: +1
22:47:40 <bauzas> vineetmenon_: right
22:48:07 <bauzas> belmoreira: as I said, I think the scheduling requirement decision seems to be the first thing to do
22:48:41 <bauzas> belmoreira: because we can't talk about DB segregation without having a clear idea of what the whole stories for boot and evacuate would be, for example
22:49:12 <vineetmenon_> so this part is going to be in limbo for a looong time.
22:49:20 <alaski> bauzas: I think we can talk about it if we limit the scope to basic scheduling
22:49:46 <alaski> vineetmenon_: some parts of it, maybe
22:49:51 <bauzas> alaski: well, it depends on whether you want to reach feature parity with Nova
22:49:54 <alaski> but let's start with this:
22:50:01 <bauzas> alaski: like, live migration between cells ?
22:50:17 <alaski> do people want more time to consider the "easy" tables?
22:50:29 <alaski> or is there a general consensus there?
22:50:46 <bauzas> alaski: well, I pointed out services
22:51:02 <bauzas> alaski: as it's very related to the SG API
22:51:10 <alaski> okay, that can move to controversial
22:52:09 <belmoreira> bauzas: I agree with you, but we can also see it the other way around... for example deciding where aggregates live will influence the scheduler
22:52:31 <alaski> Can we maybe pick a date, like next wednesday, and say that we're generally okay with what's not in controversial/unsure and start from there?
22:52:38 <bauzas> belmoreira: aggregates are a beast that only the scheduler cares about
22:52:52 <alaski> then we can start picking apart what's left
22:53:09 <bauzas> belmoreira: whatever the decision for aggregates, it would require the same level of granularity for the scheduler
22:53:33 <tonyb> alaski: that plan sounds fair
22:53:36 <bauzas> alaski: I'm pretty ok with the list except one last thing
22:53:41 <bauzas> alaski: networks ?
22:54:14 <bauzas> but time is running fast, dammit.
22:54:14 <dansmith> this is why we need to decide what to do about nova-network,
22:54:23 <dansmith> because if n-net goes away soon, so does networks
22:54:28 <bauzas> +1
22:54:42 <bauzas> and what would be the network topology for cells ?
22:55:10 <bauzas> belmoreira: do you have 1 to N subnets per cell ?
22:55:20 <bauzas> belmoreira: or is it something global ?
22:55:27 <dansmith> it'll depend
22:55:30 <dansmith> on the deployer
22:55:44 <dansmith> there are people out there running everything on one L2, one cell per subnet, etc
22:55:54 <belmoreira> bauzas: each cell has different subnets
22:55:58 <bauzas> dansmith: agreed, that's why Neutron exists
22:56:12 <vineetmenon_> what about a subnet spread across multiple cells?
22:56:29 <bauzas> anyway, we have 4 mins left :(
22:56:33 <vineetmenon_> and each cell consisting of multiple subnets as well?
22:57:07 <bauzas> dansmith: I don't remember anything about networks in the manifesto, will review the patch with that in mind
22:57:25 <bauzas> dansmith: we need to be explicit on that I guess
22:57:26 <alaski> so for now it seems like networks fall under controversial, and we can devote some time to it later
22:58:11 <alaski> let's try to get to the list of non controversial things for now
22:58:35 <alaski> so if there's a concern on a table, move it to the unsure/controversial list
22:59:07 <bauzas> sounds good, with a deadline set to Wed then ?
22:59:18 <alaski> there's a lot to try to tackle in this effort, we're not going to get it all at once
22:59:28 <alaski> bauzas: yes, since I didn't hear any complaints
22:59:34 <bauzas> alaski: cool
22:59:56 <alaski> #action review table split so we can claim consensus by next wednesday
23:00:26 <alaski> my final item was that I will not be around to run the meeting on the 24th or 31st
23:00:40 <alaski> and it's likely others won't be around either
23:00:49 <alaski> so I am going to suggest we skip those weeks
23:00:59 <bauzas> the 31st seems to be hard to follow :)
23:01:00 <alaski> but we can make that decision later, just throwing it out there
23:01:01 <tonyb> alaski: I'd say cancel those meetings.
23:01:06 <belmoreira> alaski: +1
23:01:16 <bauzas> in particular as it's midnight now
23:01:25 <alaski> cool
23:01:29 <belmoreira> bauzas: :)
23:01:29 <alaski> thanks all!
23:01:39 <alaski> #endmeeting