17:00:26 #startmeeting nova_cells
17:00:27 Meeting started Wed Dec 17 17:00:26 2014 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:28 my bell rang :)
17:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:30 The meeting name has been set to 'nova_cells'
17:00:34 bauzas: :)
17:00:36 Hi
17:00:36 Hello :)
17:00:37 o/
17:00:42 \o
17:00:57 o/
17:00:57 o/
17:01:07 #topic Cells manifesto
17:01:12 https://review.openstack.org/#/c/139191/
17:01:28 I just updated that based on some good feedback
17:01:33 alaski: eh :)
17:01:36 o/
17:01:58 more feedback is appreciated as always, but I think it's looking pretty good
17:02:21 I'll try to hit that here in a bit
17:02:33 cool
17:02:37 me too, interested in reviewing that
17:02:49 alaski: the change looks pretty small
17:03:09 alaski: so if you agree to add more mentions of the networking stuff later, I'm +1 with it
17:03:34 bauzas: great
17:03:37 alaski: because I was thinking it was necessary to also discuss what would be the standard way for networks in cells
17:04:03 I agree. But I don't have anything together for that right now
17:04:11 right
17:04:18 but it's open for proposals
17:04:25 so, let's move on, and discuss it later on
17:04:34 +1
17:04:40 #topic Testing
17:04:41 alaski: I think we should go all in on neutron, assuming nova-network is dead by then, but let's move on
17:04:55 same stuff at https://etherpad.openstack.org/p/nova-cells-testing
17:05:10 newest failures at the bottom
17:05:29 I tackled the test_host_negative tests and have a review up, which needs some unit tests
17:05:43 but there are plenty more to look at
17:06:05 alaski: should we queue up a patch to remove the flavor query from the libvirt driver on top of my flavor patch to see if cells is happy with it?
17:06:15 alaski: could you please provide the devstack logs also?
17:06:34 dansmith: yeah, that would be a great test
17:06:38 alaski: because if not, it needs to run a cells devstack
17:07:01 bauzas: I can include a link to the job I pulled the failures from
17:07:22 never mind, I'm bad at reading
17:07:23 and a 'check experimental' can run the tests for logs as well
17:07:39 agreed
17:07:58 dansmith: I thought garyk had a patch up to do something like that -- pass the flavor to the driver instead of a driver lookup?
17:08:00 https://review.openstack.org/#/c/141905/ seems to be the right change to look at
17:08:12 melwitt: i think those are short-term
17:08:22 Okay
17:08:23 melwitt: different anyway I think
17:08:26 bauzas: yes, that's a good one
17:08:35 melwitt: but I'll look
17:08:37 melwitt: that change merged
17:08:49 oh, sorry
17:08:50 that's what brought the test failures as low as they are
17:08:56 alaski: link?
17:09:18 dansmith: https://review.openstack.org/#/c/135285/
17:09:31 I see
17:09:41 dansmith: so that's actually what should be removed
17:09:55 so,
17:10:01 doesn't this undo things?
17:10:12 meaning, I'd think it causes a bunch of the driver to look at potentially different flavor bits
17:10:23 instead of just the extra_specs bit
17:11:09 I'd have to look at the preceding patch again, but I think the passed-in flavor should match what would have been queried in the driver
17:11:26 okay
17:11:35 maybe the driver was already being too aggressive there actually
17:11:55 anyway, that's fine, sorry for the distraction
17:12:08 it probably was
17:12:11 no worries
17:12:20 anything else on testing?
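
To make the flavor discussion above concrete: the pattern being debated is whether the virt driver looks the flavor up itself or is handed it by its caller, which matters in a cells deployment because the driver's own lookup may hit a different database than the caller's. A minimal, purely illustrative sketch of the two shapes (none of these names or signatures are the real nova driver API):

```python
# Hypothetical illustration only -- names and signatures are NOT nova's.
from dataclasses import dataclass, field


@dataclass
class Flavor:
    name: str
    extra_specs: dict = field(default_factory=dict)


# Stand-in for a database the driver would otherwise query on its own.
FLAVOR_DB = {"m1.tiny": Flavor("m1.tiny", {"hw:cpu_policy": "shared"})}


class LookupDriver:
    """Driver that performs its own flavor lookup (the pattern being removed)."""

    def spawn(self, flavor_name: str) -> dict:
        flavor = FLAVOR_DB[flavor_name]  # DB hit inside the driver
        return flavor.extra_specs


class PassedInDriver:
    """Driver that is handed the flavor by its caller (the proposed pattern)."""

    def spawn(self, flavor: Flavor) -> dict:
        return flavor.extra_specs  # no database access in the driver


# The caller resolves the flavor once and passes it down, so the driver
# behaves the same regardless of which cell database holds the record.
flavor = FLAVOR_DB["m1.tiny"]
print(PassedInDriver().spawn(flavor))
```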
17:12:44 #topic Table analysis
17:12:52 https://etherpad.openstack.org/p/nova-cells-table-analysis
17:13:15 Unless there are objections I think we should move the uncontroversial tables into the devref for now
17:13:36 as a follow-on to the cells page started with the manifesto
17:14:15 sounds like a good incremental step
17:14:29 cool
17:14:43 yeah +1 on that
17:14:48 alaski: I vote for a conservative way of explaining this using lots of conditionals: "it should", "we may" etc. :D
17:14:50 that will give some concrete examples to reference while discussing things or writing specs
17:15:34 bauzas: sure. There can be a disclaimer that this is the current breakdown, but nothing is final until it's coded, and not even then
17:15:44 alaski: I like this :)
17:16:23 anyone want to make the devref review?
17:17:11 okay, I'll add it as a followup to the manifesto review
17:17:36 #topic Cells scheduling
17:17:45 The big topic for today
17:17:49 https://review.openstack.org/#/c/141486/
17:18:21 there's a lot of stuff in there
17:18:52 I haven't looked at this yet, but I will after the meeting
17:18:56 * dansmith sucks
17:19:07 \o/
17:19:11 I made a proposal to start the discussion, but am not necessarily advocating what's currently up
17:19:25 alaski: yeah, I think the discussion is very large
17:19:49 the other thing we can do,
17:19:52 alaski: see all our current discussions in the table analysis that are also related to the scheduler placement decision
17:20:06 is expect that the current scheduling approach will apply to cells as if it was a single deployment,
17:20:17 dansmith: +1000
17:20:25 with the note that we can't do anything other than prove that the rest of the stuff works until we address the scale problem
17:20:51 dansmith: I'm seeing the nova-scheduler code as something unique, but which can be deployed in many ways
17:20:56 so, we can't dump cellsv1 until the scheduler scales, but at least we're able to move forward on the rest of the bits which are more structural
17:21:09 * dansmith wonders what johnthetubaguy thinks of that
17:21:18 dansmith: i.e. if it's a scalability problem, then we can consider having one nova-scheduler only for cell picking and one nova-scheduler process per cell
17:21:27 dansmith: I think that's what I was thinking too, if I understand you correctly
17:21:54 basically, we need *something* to give us (cell, host, node), let's make that a single RPC call to the "scheduler"
17:22:06 at the moment, the scheduler is unable to pick a cell, so I would propose to add a new RPC method for this
17:22:07 what happens next needs fixing before we drop v1
17:22:10 right, and then we can fix what is behind it
17:22:14 johnthetubaguy: yeah, agreed
17:22:20 alaski: thoughts?
17:23:21 I'm okay with that approach
17:23:47 do we all agree to extend the current nova-scheduler?
17:23:51 alaski: I wonder if tasks could help your worries in the spec?
17:24:02 bauzas: I don't agree to that
17:24:14 alaski: the global cell records the user request as a task
17:24:14 bauzas: I agree to not hinge cellsv2 on the non-scaling scheduler problem :P
17:24:27 alaski: once the scheduler is done, we create the instance record?
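
The "single RPC call" idea above boils down to one interface that hands back (cell, host, node), with whatever sits behind it (one global scheduler, or a cell-picking pass plus per-cell schedulers) free to change later. A minimal sketch of that shape, with illustrative names only (this is not the actual nova scheduler RPC API):

```python
# Hypothetical sketch of the "single RPC call" discussed above;
# class, method and field names are illustrative, not nova's.
from typing import NamedTuple


class Destination(NamedTuple):
    cell: str  # which cell the instance should land in
    host: str  # compute host within that cell
    node: str  # hypervisor node on that host


class SchedulerClient:
    """Facade the API/conductor layer would call. Callers only see
    (cell, host, node); the implementation behind it can be swapped
    without touching them."""

    def select_destination(self, request_spec: dict) -> Destination:
        # Placeholder decision; a real scheduler would apply filters
        # and weighers against cell and host state here.
        return Destination(cell="cell1", host="compute-01", node="compute-01")


dest = SchedulerClient().select_destination({"flavor": "m1.tiny", "image": "cirros"})
print(dest.cell, dest.host, dest.node)
```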
17:24:29 what I like is the interface between the API and cells being flexible enough to handle a few scheduler deployment scenarios
17:24:45 and look at a single scheduler right now
17:25:15 johnthetubaguy: tasks would help, yes
17:25:20 alaski: right, the above interface we describe could actually be a service that first queries a cell, then uses that to pick the next scheduler to ask, then returns, etc
17:25:32 johnthetubaguy: I proposed something that starts to look like a task in the spec
17:25:43 johnthetubaguy: do we consider electing a cell first and then a host?
17:25:46 alaski: I should read further down, sorry!
17:26:21 dansmith: that's a good point though, the question is do we want a DB record by the time the API returns, or does the API return wait for the scheduler to finish
17:26:46 dansmith: I skipped over that before, and having a "task" recorded in the global DB fixes that a little
17:26:53 johnthetubaguy: gotta have something before the API returns I'd think, but we can just shove it into the mapping at that point, right?
17:27:19 We need to store a little more than the mapping can hold
17:27:38 like what?
17:27:41 more than just a uuid?
17:27:54 alaski: you mean the spec?
17:27:59 I guess we have to look like we've recorded all the information they gave us...
17:28:02 bauzas: not the spec
17:28:05 dansmith: yes
17:28:18 we need to fulfill the current api contract if they show the instance right away
17:28:26 yeah
17:28:50 alaski: gilliard and I left a comment asking why you're considering the whole story as async
17:28:50 alaski: we could go ahead and add an instance json cache table :)
17:29:04 alaski: that's still unclear for me
17:29:06 * johnthetubaguy shudders
17:29:23 dansmith: heh, that would probably work
17:29:30 johnthetubaguy: are you shuddering at me?
17:29:32 I was thinking of storing it almost as a task
17:29:46 dansmith: yeah, it feels nasty, but it's annoying because it works...
17:30:01 johnthetubaguy: I thought there was a goal of doing that anyway?
17:30:07 bauzas: because the API should be snappy, and waiting for the scheduler won't allow that
17:30:10 johnthetubaguy: even for running instances
17:30:31 dansmith: yeah, we do need a cache eventually, that's true, I mean we might need to index on other things, but yeah
17:31:02 alaski: but how are you sure that the contract you give back to the user is valid?
17:31:34 sorry guys, I know about the tasks proposal but I just want to make sure we all agree with the idea of returning a uuid with a building state?
17:31:48 and then just provide the user a fail status if not?
17:32:05 bauzas: think about listing all your instances, etc, it's more that bit I worry about
17:32:17 johnthetubaguy: hell yeah
17:32:32 dansmith: johnthetubaguy we need somewhere to store some info. task or instance cache should work for now, we should probably get that on the review and debate/think on it there
17:32:55 yeah, I have a hard time thinking these things through without actually doing them,
17:32:59 alaski: yeah, sounds like it's hit all the major issues, I should just review the written detail
17:33:02 'cause I'm weak-minded
17:33:22 dansmith: hah, I find that hard to believe
17:33:31 likewise.
17:33:52 bauzas: I don't think returning before scheduling changes the contract we have now, because we don't currently wait on scheduling
17:34:09 alaski: right
17:34:47 so I'm not sure why we would start to wait on scheduling
17:34:54 alaski: gotcha
17:35:49 anything else on scheduling for now?
17:35:59 please no.
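
The storage question above ("we need to store a little more than the mapping can hold") is that the API returns an instance uuid in a building state before any cell has been chosen, so the user's request has to live somewhere in the global/API database until scheduling picks a cell and the real instance record can be created there. A rough sketch of that idea, assuming hypothetical record names (not the actual nova schema or object model):

```python
# Hypothetical sketch of a "task"/build-request plus mapping record;
# names and fields are illustrative only.
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildRequest:
    """Kept in the global/API database so 'show' and 'list' can be
    answered before scheduling has finished."""
    instance_uuid: str
    request: str          # serialized user request (flavor, image, ...)
    status: str = "BUILD"


@dataclass
class InstanceMapping:
    """Maps an instance uuid to the cell it eventually lands in."""
    instance_uuid: str
    cell: Optional[str] = None  # filled in once the scheduler decides


def create_instance(user_request: dict) -> str:
    instance_uuid = str(uuid.uuid4())
    build_req = BuildRequest(instance_uuid, json.dumps(user_request))
    mapping = InstanceMapping(instance_uuid)
    # ...persist both records and return immediately; scheduling and the
    # instance record in the chosen cell happen asynchronously.
    return instance_uuid


print(create_instance({"flavor": "m1.tiny", "image": "cirros"}))
```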
17:36:00 :)
17:36:03 :)
17:36:04 +1
17:36:18 alright, please add thoughts to the review
17:36:27 #topic Open Discussion
17:36:47 General announcement, the next meeting will be Jan 7th
17:37:16 anything else people would like to discuss?
17:37:53 going once...
17:38:10 sold!
17:38:18 alright, early marks all around!
17:38:25 :)
17:38:26 awesome
17:38:26 heh
17:38:31 :)
17:38:31 #endmeeting