21:01:21 #startmeeting nova_cells
21:01:21 Meeting started Wed Feb 10 21:01:21 2016 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:25 The meeting name has been set to 'nova_cells'
21:01:31 * bauzas waves to alaski
21:01:37 Hi.
21:01:38 o/
21:01:43 o/
21:01:48 o/
21:02:00 and... \o (because I'm a gentleman)
21:02:04 great, let's get going
21:02:05 o/
21:02:11 #topic Cells testing/bugs
21:02:34 I updated https://review.openstack.org/#/c/225199/ to change server_basic_ops to test ssh
21:02:46 I could use some reviews on it
21:03:03 great
21:03:08 #link https://review.openstack.org/#/c/225199/
21:03:28 One thing I need to do is change project-config to set run_validation = true for cells
21:04:21 but I already have a 3-patch dependency chain … so I would like to change project-config after this is approved
21:04:44 sounds reasonable
21:04:54 cool
21:05:24 there was a tempest test added recently which failed the cells job
21:05:35 so it was added to the exclusion list in https://review.openstack.org/#/c/277536/
21:05:50 alaski: that test broke all shared storage jobs too
21:05:51 so it was reverted
21:05:56 oh
21:06:34 when/if it's added back it would be good to have the security group addition behind a flag
21:06:53 but otherwise I'm not aware of any cells failures
21:07:11 #topic Open reviews
21:07:20 as always, just calling out https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
21:07:31 I have some reviews to add there, but there's a lot of stuff up now
21:07:38 please take a look
21:07:50 #topic Open Discussion
21:08:12 I created a script to look at possible foreign key issues when doing the database split.
21:08:28 https://etherpad.openstack.org/p/CellsV2-database-split
21:08:54 I also proposed a doc change to add all missing databases to the split info.
21:09:07 https://review.openstack.org/#/c/277543/
21:09:22 If people could review that to discuss any database split issues I'd be grateful.
21:09:37 lovely
21:09:52 yeah, really nice
21:10:06 that said, I just wonder how we could draw the line in the sand
21:10:06 do we want to discuss the big one now, aggregates?
21:10:20 We could. I don't have any really good ideas though.
21:10:39 the 'how' being: how can we be sure that the boundaries are good?
21:10:44 bauzas: we may need to draw a temporary line that we know we might revise later
21:11:26 We won't know for sure until we try. So make a plan for the best split we can.
21:11:28 alaski: yeah, I just think that migrating a table to the API DB can be okay, reverting it to the cell DB could be difficult
21:11:41 so, I would be a bit conservative first
21:12:01 and just make sure we migrate the tables we really need to call
21:12:07 that's fair
21:12:16 Yep.
21:12:45 the list of api tables is really just a list of tables we want to look at migrating
21:12:53 then cool :)
21:13:01 in the process we may find that something shouldn't actually be moved
21:13:35 but it's good to have the list of things to look at
21:13:45 yeah for sure
21:13:53 getting back to aggregates, I started wondering if we actually do want them to span cells
21:14:09 I think yeah
21:14:32 I think so too, but why?
21:14:33 I was seeing the aggregates differently from cells
21:14:42 orthogonally even
21:14:58 I always presumed that we did. Some things might be rack / cell specific. Others global.
21:15:01 so, aggregates are there for 2 reasons, right?
21:15:13 #1 for a placement decision
21:15:23 #2 for a global item
21:15:32 #1 and/or #2
21:15:46 what do you mean by a global item?
21:15:51 like a ratio
21:16:12 I mean a metadata var for all the computes in there
21:16:20 uses_ssh, has_gpu, funky_network_gear, slow_disks.
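The two uses of aggregates described here (a placement decision, and a global metadata item applied to all computes in the aggregate) can be illustrated with a toy filter. This is a simplified sketch only, not nova's actual scheduler filter; every name and data structure below is made up for illustration.

```python
# Toy illustration of aggregate metadata driving a placement decision.
# NOT nova code: the aggregate structure and function are hypothetical.

def hosts_matching(aggregates, required):
    """Return hosts in any aggregate whose metadata satisfies `required`."""
    matched = set()
    for agg in aggregates:
        if all(agg["metadata"].get(k) == v for k, v in required.items()):
            matched.update(agg["hosts"])
    return sorted(matched)

aggregates = [
    {"name": "gpu_nodes", "hosts": ["c1", "c2"], "metadata": {"has_gpu": "true"}},
    {"name": "slow_disks", "hosts": ["c3"], "metadata": {"slow_disks": "true"}},
]

print(hosts_matching(aggregates, {"has_gpu": "true"}))  # ['c1', 'c2']
```

The "global item" use is just the metadata dict itself: one flag set once on the aggregate applies to every compute in it.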
21:16:27 hah
21:16:42 so, that's why I see cells being different
21:17:17 I agree with all of that, but will also point out that at rackspace those tags applied at the cell level
21:17:40 so, MHO is that cells are failure domains
21:17:46 I'm also just playing devil's advocate here
21:17:48 while aggregates are the above
21:18:06 an aggregate (and an AZ) are not a failure domain
21:18:11 if a cell is a failure domain should there be resources that span it
21:18:30 I can give an example
21:18:34 because you're not isolated against failure like that
21:18:43 Sure we could add aggregates to each cell. funky_network_gear_1, funky_network_gear_2. That's what people do in cellsv1 right?
21:18:48 Adds a load on the operator though.
21:19:05 If there are globalish concepts that span cells.
21:19:21 yeah, you could want to place like this
21:19:49 (cell_A OR cell_B) AND not cellC
21:19:59 cell affinity.
21:20:05 aggregates could be one way
21:20:06 doffm: aggregates weren't used, but yes that's essentially what happened in v1
21:20:07 also
21:20:49 doffm: it was basically cells 3,4,5 can handle flavors a,b and cells 6,7 take flavors c,d
21:21:01 consider for example that you have 2 cells each sharing 2 types of hardware
21:21:28 what if as a user I care about a specific type of hardware but I don't care about where it will be placed
21:22:32 let me ask something real quick, but I want to be clear
21:23:00 does it matter that an aggregate spans cells, or that two cells each have an aggregate with the same properties?
21:23:36 for affinity it does seem to matter
21:24:00 Yes, if we are using aggregates for affinity. Also for operator load. (Having to create in N cells).
21:24:19 Also possibly for performance in a global scheduler. (Multiple db look ups for aggregates?)
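The "(cell_A OR cell_B) AND not cellC" cell-affinity expression mentioned here is just set logic over candidate cells. A minimal sketch, with illustrative names only (this is not nova code):

```python
# Toy cell-affinity filter: keep cells that appear in an allow list
# (if one is given) and do not appear in a deny list.
# Function and parameter names are hypothetical.

def candidate_cells(all_cells, any_of, none_of):
    """Cells in `any_of` (or all cells if `any_of` is None), minus `none_of`."""
    allowed = set(any_of) if any_of else set(all_cells)
    denied = set(none_of)
    return [c for c in all_cells if c in allowed and c not in denied]

cells = ["cell_A", "cell_B", "cell_C", "cell_D"]
print(candidate_cells(cells, any_of=["cell_A", "cell_B"], none_of=["cell_C"]))
# ['cell_A', 'cell_B']
```

Whether this kind of constraint is expressed through aggregates or some other mechanism is exactly the open question in the discussion.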
21:24:41 but since it's really the scheduler that cares about aggregates, and it will have a global view, I wonder if the aggregates could be merged by it instead of having them global in nova
21:24:52 well, I just don't want to lock in a specific implementation detail that could lead to a huge design difference :)
21:25:42 the thing is, you can create as many aggregates per host as you wish
21:25:46 here's my struggle
21:26:01 I think Jay's work depends on aggregates being in a cell
21:26:04 so, I know that lots of operators have dedicated aggregates, one per use case they want
21:26:13 but I agree that they should be global
21:26:45 alaski: We could think of other ways around aggregates in regard to the resource pools framework.
21:26:49 and long term I think we want to move a lot of this into a unified scheduler and not store it in nova dbs
21:27:07 We will have to have a mapping between resource pools and cells anyway.
21:27:19 doffm: yeah, might be the way we end up needing to go
21:27:26 So we will have a table with resource pool ids in them.
21:27:40 We could move the resource-pool <-> aggregate mapping to the api db.
21:27:56 And link it to the resource pool cell-id mapping instead of directly to the resource pool table.
21:28:37 that could work
21:28:47 just another level of indirection
21:29:32 I think I need to draw this all out at some point
21:29:45 but that does seem to work
21:29:50 I could write something up for us all to discuss.
21:30:00 that would be great
21:30:04 OK.
21:30:31 don't draw it out in ascii art
21:30:40 I actually was going to.
21:30:56 doffm: that ML thread was before you were working on openstack...
21:30:59 but I digress
21:31:00 oh man, not this discussion
21:31:13 heh
21:31:20 it was my ascii art that spawned that discussion
21:31:32 doffm: ascii art is wonderful
21:31:36 lascii art, even better
21:31:42 mriedem: Will inform me of what I'm supposed to do offline.
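The indirection proposed here (an API-DB resource-pool/cell mapping, with the aggregate association pointing at the mapping row rather than at a cell-local resource pool table) can be sketched with a throwaway sqlite schema. All table and column names below are hypothetical, invented for illustration; this is not nova's real schema.

```python
# Sketch of the proposed indirection, using an in-memory sqlite DB.
# Hypothetical schema: aggregate_resource_pools references the
# resource-pool/cell mapping, not a resource pool table in a cell DB.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE resource_pool_cell_mapping (
    id INTEGER PRIMARY KEY,
    resource_pool_uuid TEXT NOT NULL,
    cell_id INTEGER NOT NULL
);
CREATE TABLE aggregate_resource_pools (
    aggregate_id INTEGER NOT NULL,
    pool_mapping_id INTEGER NOT NULL
        REFERENCES resource_pool_cell_mapping(id)
);
""")
db.execute("INSERT INTO resource_pool_cell_mapping VALUES (1, 'pool-1', 10)")
db.execute("INSERT INTO resource_pool_cell_mapping VALUES (2, 'pool-2', 20)")
# one aggregate associated with pools that live in two different cells
db.execute("INSERT INTO aggregate_resource_pools VALUES (100, 1)")
db.execute("INSERT INTO aggregate_resource_pools VALUES (100, 2)")

rows = db.execute("""
    SELECT m.cell_id FROM aggregate_resource_pools a
    JOIN resource_pool_cell_mapping m ON m.id = a.pool_mapping_id
    WHERE a.aggregate_id = 100 ORDER BY m.cell_id
""").fetchall()
print(rows)  # [(10,), (20,)]
```

The point of the extra hop: an aggregate can span cells without the API DB ever holding a foreign key into a cell database.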
:)
21:32:02 whatever jogo used for the arch diagram in the nova devref
21:32:04 use that
21:32:33 mtreinish: will tell you to use, oh what's it called again
21:32:41 heh, latex :)
21:32:47 yeah
21:32:54 There are two other foreign key issues I found. Fixed-ips -> instance ids. SecurityGroups -> instance ids. I guess we can discuss those down the line.
21:33:03 I only had a chance to look at security groups a little bit.
21:33:31 for those I was thinking we could change the foreign key to be to the instance_mapping
21:33:42 Makes sense.
21:34:14 there could still be some gotchas in there, but it's something to try
21:34:54 Although the security-group <-> instance mapping is many-to-many. It's most often used in the cells for accessing security groups I think.
21:35:05 So there is some argument for keeping it in the cell db.
21:35:23 the only other thing on the agenda is a reminder about summit proposals and newton specs, we may want to start an etherpad to track those
21:35:37 doffm: ahh, okay
21:36:19 although I think security_groups should really be in the api db
21:36:34 alaski: For sure, the mapping could stay though.
21:36:47 ahh, I see
21:37:33 For newton specs... I guess we should start writing some. Should we table a discussion of what we want to get in for newton and go from there?
21:38:02 yep, that's a good plan
21:38:07 alaski: are you still looking for volunteers?
21:38:21 definitely
21:38:30 anything I can work on?
21:38:44 I saw something about grenade updates
21:39:14 that's one thing
21:39:28 some slightly more invasive testing to check things that functional tests may not catch
21:39:37 if that's possible
21:39:40 alaski: When would you like to have the newton plans discussion, next week's meeting? Hangouts?
21:40:08 alaski: ccarmack: Will we want to do a multi-cell grenade eventually? I mean adding a new cell and checking everything works.
21:40:46 As well as a multi-cell test in general.
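The foreign key change floated above (pointing the security-group/instance association at instance_mapping instead of at the cell's instances table) can be sketched the same way. This is a toy schema, not nova's real one; column and table names are simplified for illustration.

```python
# Toy schema showing the security-group association keyed to the API DB's
# instance mapping rather than a cell-local instances table.
# NOT nova's actual schema; names are simplified.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when asked
db.executescript("""
CREATE TABLE instance_mappings (
    instance_uuid TEXT PRIMARY KEY,
    cell_id INTEGER
);
CREATE TABLE security_group_instance_association (
    security_group_id INTEGER NOT NULL,
    instance_uuid TEXT NOT NULL
        REFERENCES instance_mappings(instance_uuid)
);
""")
db.execute("INSERT INTO instance_mappings VALUES ('uuid-1', 5)")
db.execute("INSERT INTO security_group_instance_association VALUES (1, 'uuid-1')")
ok = db.execute(
    "SELECT COUNT(*) FROM security_group_instance_association").fetchone()[0]
print(ok)  # 1
```

The many-to-many shape of the association (one group, many instances; one instance, many groups) is unchanged; only the parent of the foreign key moves.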
:/
21:40:55 doffm: let's start with doing it in the meetings, and maybe have a hangout after FF when we see what progress was made in M
21:41:05 maybe I should write a grenade spec
21:41:06 Ok.
21:41:56 yes, multi-cell testing will be necessary
21:42:15 I'm trying to find some code that has no testing but should
21:42:44 so, newton high-level objective would be to allow a 2nd cell? :)
21:43:00 this nova-manage command is untested http://git.openstack.org/cgit/openstack/nova/tree/nova/cmd/manage.py#n1269
21:43:24 a great grenade test would be to boot some instances and then run that command after the upgrade
21:43:35 and ensure the migration succeeded
21:43:54 it's going to depend on https://review.openstack.org/#/c/270565/ as well
21:44:40 bauzas: heh
21:44:59 Did anyone put in summit proposals?
21:45:26 I proposed one talk on cellsv2
21:45:57 doffm: I'm not sure if you're familiar with the format, but there will be proposals for the design summit much closer to the event
21:46:18 those will be the technical discussions that are more helpful
21:46:33 I'm not familiar at all. That's good to know.
21:47:12 the proposals that just happened are basically presentation format, the design summit is much like the midcycle
21:47:25 except with many more people
21:47:29 and time boxed
21:47:31 alaski: I guess you're planning the cellsv2 talk to give a high-level view?
21:47:36 it's basically worse in every way :)
21:47:39 to ops
21:47:57 mriedem: +1
21:48:08 bauzas: high level view, and progress report
21:48:13 ack
21:48:56 anything else for today?
21:49:29 that was a good discussion, and I look forward to your writeup doffm
21:49:51 Thanks.
21:50:16 ccarmack: if any other cells work comes up this cycle I'll ping you, but next cycle should be chock full of it
21:50:21 thanks all!
21:50:31 #endmeeting