21:01:21 <alaski> #startmeeting nova_cells
21:01:21 <openstack> Meeting started Wed Feb 10 21:01:21 2016 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:25 <openstack> The meeting name has been set to 'nova_cells'
21:01:31 * bauzas waves to alaski
21:01:37 <doffm> Hi.
21:01:38 <alaski> o/
21:01:43 <mriedem> o/
21:01:48 <ccarmack> o/
21:02:00 <bauzas> and... \o (because I'm a gentleman)
21:02:04 <alaski> great, let's get going
21:02:05 <ctrath> o/
21:02:11 <alaski> #topic Cells testing/bugs
21:02:34 <ccarmack> I updated https://review.openstack.org/#/c/225199/ to change server_basic_ops to test ssh
21:02:46 <ccarmack> I could use some reviews on it
21:03:03 <alaski> great
21:03:08 <alaski> #link https://review.openstack.org/#/c/225199/
21:03:28 <ccarmack> One thing I need to do is change project-config to set run_validation = true for cells
21:04:21 <ccarmack> but I already have a chain of 3 dependent patches... so I would like to change project-config after this is approved
21:04:44 <alaski> sounds reasonable
21:04:54 <ccarmack> cool
21:05:24 <alaski> there was a tempest test added recently which failed the cells job
21:05:35 <alaski> so it was added to the exclusion list in https://review.openstack.org/#/c/277536/
21:05:50 <mriedem> alaski: that test broke all shared storage jobs too
21:05:51 <mriedem> so it was reverted
21:05:56 <alaski> oh
21:06:34 <alaski> when/if it's added back it would be good to have the security group addition behind a flag
21:06:53 <alaski> but otherwise I'm not aware of any cells failures
21:07:11 <alaski> #topic Open reviews
21:07:20 <alaski> as always, just calling out https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
21:07:31 <alaski> I have some reviews to add there, but there's a lot of stuff up now
21:07:38 <alaski> please take a look
21:07:50 <alaski> #topic Open Discussion
21:08:12 <doffm> I created a script to look at possible foreign key issues when doing the database split.
21:08:28 <doffm> https://etherpad.openstack.org/p/CellsV2-database-split
21:08:54 <doffm> I also proposed a doc change to add all missing databases to the split info.
21:09:07 <doffm> https://review.openstack.org/#/c/277543/
21:09:22 <doffm> If people could review that to discuss any database split issues I'd be grateful.
21:09:37 <bauzas> lovely
21:09:52 <alaski> yeah, really nice
21:10:06 <bauzas> that said, I just wonder how we could draw the line in the sand
21:10:06 <alaski> do we want to discuss the big one now, aggregates?
21:10:20 <doffm> We could. I don't have any really good ideas though.
21:10:39 <bauzas> the 'how' being: how can we be sure that the boundaries are good?
21:10:44 <alaski> bauzas: we may need to draw a temporary line that we know we might revise later
21:11:26 <doffm> We won't know for sure until we try. So make a plan for the best split we can.
21:11:28 <bauzas> alaski: yeah, I just think that while migrating a table to the API DB can be okay, reverting it to the cell DB could be difficult
21:11:41 <bauzas> so, I would be a bit conservative first
21:12:01 <bauzas> and just make sure we migrate the tables we really need to move
21:12:07 <alaski> that's fair
21:12:16 <doffm> Yep.
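A minimal sketch of the kind of foreign-key check doffm describes above, assuming SQLAlchemy against a reachable Nova database; the candidate API table list and the connection URL are illustrative, not the agreed split:

```python
# Sketch: flag foreign keys that would cross the API-DB / cell-DB boundary.
# The API_TABLES set is a hypothetical candidate list, not the final plan.
from sqlalchemy import create_engine, inspect

API_TABLES = {'flavors', 'aggregates', 'key_pairs', 'instance_mappings'}

def find_cross_boundary_fks(db_url):
    inspector = inspect(create_engine(db_url))
    problems = []
    for table in inspector.get_table_names():
        for fk in inspector.get_foreign_keys(table):
            src_is_api = table in API_TABLES
            dst_is_api = fk['referred_table'] in API_TABLES
            if src_is_api != dst_is_api:  # FK would span the two databases
                problems.append((table, fk['constrained_columns'],
                                 fk['referred_table']))
    return problems

if __name__ == '__main__':
    for src, cols, dst in find_cross_boundary_fks(
            'mysql+pymysql://nova:secret@localhost/nova'):
        print('%s(%s) -> %s crosses the split' % (src, ','.join(cols), dst))
```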
21:12:45 <alaski> the list of api tables is really just a list of tables we want to look at migrating
21:12:53 <bauzas> then cool :)
21:13:01 <alaski> in the process we may find that something shouldn't actually be moved
21:13:35 <alaski> but it's good to have the list of things to look at
21:13:45 <bauzas> yeah, for sure
21:13:53 <alaski> getting back to aggregates, I started wondering if we actually do want them to span cells
21:14:09 <bauzas> I think yeah
21:14:32 <alaski> I think so too, but why?
21:14:33 <bauzas> I was seeing the aggregates differently from cells
21:14:42 <bauzas> orthogonally, even
21:14:58 <doffm> I always presumed that we did. Some things might be rack / cell specific. Others global.
21:15:01 <bauzas> so, aggregates are there for 2 reasons, right?
21:15:13 <bauzas> #1 for a placement decision
21:15:23 <bauzas> #2 for a global item
21:15:32 <bauzas> #1 and/or #2
21:15:46 <alaski> what do you mean by a global item?
21:15:51 <bauzas> like a ratio
21:16:12 <bauzas> I mean a metadata var for all the computes in there
21:16:20 <doffm> uses_ssh, has_gpu, funky_network_gear, slow_disks.
21:16:27 <bauzas> hah
21:16:42 <bauzas> so, that's why I see cells being different
21:17:17 <alaski> I agree with all of that, but will also point out that at Rackspace those tags applied at the cell level
21:17:40 <bauzas> so, MHO is that cells are failure domains
21:17:46 <alaski> I'm also just playing devil's advocate here
21:17:48 <bauzas> while aggregates are the above
21:18:06 <bauzas> an aggregate (and an AZ) is not a failure domain
21:18:11 <alaski> if a cell is a failure domain, should there be resources that span it?
21:18:30 <bauzas> I can give an example
21:18:34 <alaski> because you're not isolated against failure like that
21:18:43 <doffm> Sure, we could add aggregates to each cell: funky_network_gear_1, funky_network_gear_2. That's what people do in cells v1, right?
21:18:48 <doffm> Adds a load on the operator though.
21:19:05 <doffm> If there are globalish concepts that span cells.
21:19:21 <bauzas> yeah, you could want to place like this:
21:19:49 <bauzas> (cell_A OR cell_B) AND NOT cell_C
21:19:59 <doffm> cell affinity.
21:20:05 <bauzas> aggregates could be one way
21:20:06 <alaski> doffm: aggregates weren't used, but yes, that's essentially what happened in v1
21:20:07 <bauzas> also
21:20:49 <alaski> doffm: it was basically "cells 3,4,5 can handle flavors a,b and cells 6,7 take flavors c,d"
21:21:01 <bauzas> consider for example that you have 2 cells, each containing the same 2 types of hardware
21:21:28 <bauzas> what if, as a user, I care about a specific type of hardware but I don't care about where it will be placed?
21:22:32 <alaski> let me ask something real quick, but I want to be clear
21:23:00 <alaski> does it matter that an aggregate spans cells, or that two cells each have an aggregate with the same properties?
21:23:36 <alaski> for affinity it does seem to matter
21:24:00 <doffm> Yes, if we are using aggregates for affinity. Also for operator load (having to create in N cells).
21:24:19 <doffm> Also possibly for performance in a global scheduler. (Multiple db lookups for aggregates?)
21:24:41 <alaski> but since it's really the scheduler that cares about aggregates, and it will have a global view, I wonder if the aggregates could be merged by it instead of having them global in nova
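A minimal sketch of the merging alaski floats here, assuming a scheduler that can already list aggregates from every cell database; grouping on an identical-metadata signature stands in for "the same properties", and all names are illustrative rather than Nova's real objects:

```python
# Sketch: a global-scheduler view that merges per-cell aggregates carrying
# identical metadata, so cell1's funky_network_gear_1 and cell2's
# funky_network_gear_2 appear as one logical aggregate.
from collections import defaultdict, namedtuple

Aggregate = namedtuple('Aggregate', ['cell', 'name', 'hosts', 'metadata'])

def merge_aggregates(per_cell_aggregates):
    """Group aggregates from all cells by their metadata signature."""
    merged = defaultdict(lambda: {'hosts': set(), 'cells': set()})
    for agg in per_cell_aggregates:
        # Frozen metadata items act as the identity of the logical aggregate.
        signature = frozenset(agg.metadata.items())
        merged[signature]['hosts'].update(agg.hosts)
        merged[signature]['cells'].add(agg.cell)
    return merged

aggs = [
    Aggregate('cell1', 'funky_network_gear_1', {'c1-h1'}, {'network': 'funky'}),
    Aggregate('cell2', 'funky_network_gear_2', {'c2-h1'}, {'network': 'funky'}),
]
for sig, view in merge_aggregates(aggs).items():
    print(dict(sig), '->', sorted(view['hosts']), 'spanning',
          sorted(view['cells']))
```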
21:24:52 <bauzas> well, I just don't want to lock in a specific implementation detail that could lead to a huge design difference :)
21:25:42 <bauzas> the thing is, you can create as many aggregates per host as you wish
21:25:46 <alaski> here's my struggle
21:26:01 <alaski> I think Jay's work depends on aggregates being in a cell
21:26:04 <bauzas> so, I know that lots of operators have dedicated aggregates, one per use they want
21:26:13 <alaski> but I agree that they should be global
21:26:45 <doffm> alaski: We could think of other ways around aggregates in regards to the resource pools framework.
21:26:49 <alaski> and long term I think we want to move a lot of this into a unified scheduler and not store it in nova dbs
21:27:07 <doffm> We will have to have a mapping between resource pools and cells anyway.
21:27:19 <alaski> doffm: yeah, that might be the way we end up needing to go
21:27:26 <doffm> So we will have a table with resource pool ids in it.
21:27:40 <doffm> We could move the resource-pool <-> aggregate mapping to the api db.
21:27:56 <doffm> And link it to the resource pool cell-id mapping instead of directly to the resource pool table.
21:28:37 <bauzas> that could work
21:28:47 <bauzas> just another level of indirection
21:29:32 <alaski> I think I need to draw this all out at some point
21:29:45 <alaski> but that does seem to work
21:29:50 <doffm> I could write something up for us all to discuss.
21:30:00 <alaski> that would be great
21:30:04 <doffm> OK.
21:30:31 <mriedem> don't draw it out in ascii art
21:30:40 <doffm> I actually was going to.
21:30:56 <mriedem> doffm: that ML thread was before you were working on openstack...
21:30:59 <mriedem> but i digress
21:31:00 <alaski> oh man, not this discussion
21:31:13 <bauzas> heh
21:31:20 <alaski> it was my ascii art that spawned that discussion
21:31:32 <alaski> doffm: ascii art is wonderful
21:31:36 <alaski> lascii art, even better
21:31:42 <doffm> mriedem will inform me of what I'm supposed to do offline. :)
21:32:02 <mriedem> whatever jogo used for the arch diagram in the nova devref
21:32:04 <mriedem> use that
21:32:33 <mriedem> mtreinish will tell you to use, oh, what's it called again
21:32:41 <alaski> heh, latex :)
21:32:47 <mriedem> yeah
21:32:54 <doffm> There are two other foreign key issues I found: fixed_ips -> instance ids, and security_groups -> instance ids. I guess we can discuss those down the line.
21:33:03 <doffm> I only had a chance to look at security groups a little bit.
21:33:31 <alaski> for those I was thinking we could change the foreign key to point at the instance_mapping
21:33:42 <doffm> Makes sense.
21:34:14 <alaski> there could still be some gotchas in there, but it's something to try
21:34:54 <doffm> Although the security-group <-> instance mapping is many-to-many. It's most often used in the cells for accessing security groups, I think.
21:35:05 <doffm> So there is some argument for keeping it in the cell db.
21:35:23 <alaski> the only other thing on the agenda is a reminder about summit proposals and newton specs; we may want to start an etherpad to track those
21:35:37 <alaski> doffm: ahh, okay
21:36:19 <alaski> although I think security_groups should really be in the api db
21:36:34 <doffm> alaski: For sure, the mapping could stay though.
21:36:47 <alaski> ahh, I see
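A rough sketch of the two schema ideas in this exchange, in SQLAlchemy-style table definitions; every table and column name here is an illustrative guess at doffm's forthcoming writeup, not Nova's actual schema:

```python
# Sketch: illustrative API-database tables for the two ideas above.
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table

metadata = MetaData()

# Cell registry and resource-pool -> cell mapping live in the API DB.
cell_mappings = Table('cell_mappings', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(255)))

resource_pool_cells = Table('resource_pool_cells', metadata,
    Column('id', Integer, primary_key=True),
    Column('resource_pool_uuid', String(36), nullable=False),
    Column('cell_id', Integer, ForeignKey('cell_mappings.id')))

# Idea 1: the resource-pool <-> aggregate link points at the
# resource_pool_cells row, not directly at a resource pool table.
aggregate_resource_pools = Table('aggregate_resource_pools', metadata,
    Column('id', Integer, primary_key=True),
    Column('aggregate_id', Integer, nullable=False),
    Column('resource_pool_cell_id', Integer,
           ForeignKey('resource_pool_cells.id')))

# Idea 2: rows that used to reference instances.id reference the
# instance_mappings row instead, so they survive the DB split.
instance_mappings = Table('instance_mappings', metadata,
    Column('id', Integer, primary_key=True),
    Column('instance_uuid', String(36), unique=True),
    Column('cell_id', Integer, ForeignKey('cell_mappings.id')))

security_group_instance_association = Table(
    'security_group_instance_association', metadata,
    Column('id', Integer, primary_key=True),
    Column('security_group_id', Integer, nullable=False),
    Column('instance_mapping_id', Integer,
           ForeignKey('instance_mappings.id')))
```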
21:37:33 <doffm> For newton specs... I guess we should start writing some. Should we table a discussion of what we want to get in for newton and go from there?
21:38:02 <alaski> yep, that's a good plan
21:38:07 <ccarmack> alaski: are you still looking for volunteers?
21:38:21 <alaski> definitely
21:38:30 <ccarmack> anything I can work on?
21:38:44 <ccarmack> I saw something about grenade updates
21:39:14 <alaski> that's one thing
21:39:28 <alaski> some slightly more invasive testing to check things that functional tests may not catch
21:39:37 <alaski> if that's possible
21:39:40 <doffm> alaski: When would you like to have the newton plans discussion, next week's meeting? Hangouts?
21:40:08 <doffm> alaski: ccarmack: Will we want to do a multi-cell grenade eventually? I mean adding a new cell and checking everything works.
21:40:46 <doffm> As well as a multi-cell test in general. :/
21:40:55 <alaski> doffm: let's start with doing it in the meetings, and maybe have a hangout after FF when we see what progress was made in M
21:41:05 <ccarmack> maybe I should write a grenade spec
21:41:06 <doffm> Ok.
21:41:56 <alaski> yes, multi-cell testing will be necessary
21:42:15 <alaski> I'm trying to find some code that has no testing but should
21:42:44 <bauzas> so, the newton high-level objective would be to allow a 2nd cell? :)
21:43:00 <alaski> this nova-manage command is untested http://git.openstack.org/cgit/openstack/nova/tree/nova/cmd/manage.py#n1269
21:43:24 <alaski> a great grenade test would be to boot some instances and then run that command after the upgrade
21:43:35 <alaski> and ensure the migration succeeded
21:43:54 <alaski> it's going to depend on https://review.openstack.org/#/c/270565/ as well
21:44:40 <alaski> bauzas: heh
21:44:59 <doffm> Did anyone put in summit proposals?
21:45:26 <alaski> I proposed one talk on cells v2
21:45:57 <alaski> doffm: I'm not sure if you're familiar with the format, but there will be proposals for the design summit much closer to the event
21:46:18 <alaski> those will be the technical discussions that are more helpful
21:46:33 <doffm> I'm not familiar at all. That's good to know.
21:47:12 <alaski> the proposals that just happened are basically presentation format; the design summit is much like the midcycle
21:47:25 <alaski> except with many more people
21:47:29 <mriedem> and time boxed
21:47:31 <bauzas> alaski: I guess you're planning the cells v2 talk to give a high-level view?
21:47:36 <mriedem> it's basically worse in every way :)
21:47:39 <bauzas> to ops
21:47:57 <alaski> mriedem: +1
21:48:08 <alaski> bauzas: high-level view, and a progress report
21:48:13 <bauzas> ack
21:48:56 <alaski> anything else for today?
21:49:29 <alaski> that was a good discussion, and I look forward to your writeup doffm
21:49:51 <doffm> Thanks.
21:50:16 <alaski> ccarmack: if any other cells work comes up this cycle I'll ping you, but next cycle should be chock full of it
21:50:21 <alaski> thanks all!
21:50:31 <alaski> #endmeeting
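A minimal sketch of the grenade-style check alaski describes at 21:43, assuming a post-upgrade hook that can shell out to nova-manage and the openstack CLI; the 'cell_v2 map_instances' subcommand is a placeholder, since the log does not name the untested command at manage.py#n1269:

```python
# Sketch: post-upgrade grenade-style check. An instance is booted before the
# upgrade; afterwards we run the migration command and verify the instance is
# still reachable through the API. The subcommand name is a placeholder.
import subprocess

INSTANCE = 'grenade-cells-smoke'  # booted before the upgrade

def run(cmd):
    print('+', ' '.join(cmd))
    return subprocess.run(cmd, capture_output=True, text=True)

# Run the data migration under test and insist it exits cleanly.
result = run(['nova-manage', 'cell_v2', 'map_instances'])
assert result.returncode == 0, result.stderr

# The pre-upgrade instance should still resolve through the API,
# i.e. the migration did not orphan it.
result = run(['openstack', 'server', 'show', INSTANCE,
              '-f', 'value', '-c', 'status'])
assert result.returncode == 0, result.stderr
assert result.stdout.strip() == 'ACTIVE', result.stdout
```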