17:00:07 #startmeeting nova_cells
17:00:08 Meeting started Wed Feb 11 17:00:07 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:11 The meeting name has been set to 'nova_cells'
17:00:22 Anyone around?
17:00:26 /o
17:00:27 o/
17:00:31 \o
17:00:31 o/
17:00:33 o/
17:00:36 o/
17:00:37 o/
17:00:44 excellent!
17:00:49 #topic Testing
17:01:11 mriedem pointed out https://bugs.launchpad.net/nova/+bug/1420322
17:01:11 Launchpad bug 1420322 in OpenStack Compute (nova) "gate-devstack-dsvm-cells fails in volumes exercise with "Server ex-vol-inst not deleted"" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)
17:01:26 which seems to be related to what melwitt has been chasing
17:02:19 melwitt: any updates since we discussed this yesterday?
17:03:08 ah, yeah. I'm chasing around all of the DetachedInstance errors. not really unfortunately, I tried your suggestion with the obj_to_primitive and will be combing through results this morning
17:03:40 cool, thanks for chasing that
17:04:01 latest count with the regex (by melwitt) is 126
17:04:18 so we had regressions ?
17:04:31 I was only counting 74 exceptions
17:04:54 etherpad has a lot more than that
17:05:07 I thought it's still around 74 (when you look at the CI job)
17:05:12 I'm seeing 2 failures on a recent run
17:05:19 9th feb
17:05:23 #link https://etherpad.openstack.org/p/nova-cells-testing
17:06:26 I have appended "possibly rectified" to failures which no longer occur
17:06:28 looking at http://logs.openstack.org/67/154567/2/check/check-tempest-dsvm-cells/b9eecd8/console.html from the 10th
17:06:31 shows 2 failures
17:06:55 tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_run_idempotent_instances is pretty consistent in failing
17:07:13 I've tried reproducing locally and have failed thus far
17:07:46 right
17:08:14 I'd hate to just exclude it without understanding the reasoning there, since it was passing before
17:08:58 other failures seem to be intermittent and I haven't had a chance to dig deeper
17:09:18 alaski: I can't even find the boto one on http://logs.openstack.org/03/127203/5/check/check-tempest-dsvm-cells/7b641b4/console.html
17:09:46 those are probably transient errors :(
17:10:12 yep
17:10:43 I ran tempest locally with --until-failure for about half a day and didn't get any failures, but they keep coming up in the gate
17:11:19 it could use some digging from anyone interested
17:11:21 The boto one is here too http://logs.openstack.org/81/150381/2/check/check-tempest-dsvm-cells/9f37833/console.html but the end error is different
17:11:29 and I'll keep digging as well
17:11:40 alaski, my error count is from a local run too
17:11:46 in general, I've been having trouble reproducing the gate locally
17:11:59 ahh no scratch that...it's the same
17:12:31 dheeraj-gupta-4: yep, just a different id
17:13:02 melwitt: same here, which makes this all the more interesting
17:13:14 but we're really close to having a passing job
17:13:48 anything else around testing?
17:14:12 #topic Database migrations
17:14:23 https://review.openstack.org/#/c/153666/
17:14:48 I've added basic support for alembic as a PoC
17:15:14 the autogenerate feature for migrations is nice
17:15:19 agreed
17:15:37 but I'm currently working on testing which requires reworking a lot of things that are geared towards the current setup
17:15:59 I was thinking that jerfleldt was working on shipping alembic for his expand/contract BP ?
17:16:13 so maybe you could just rebase on top of his patch ?
17:16:26 bauzas: he is, but it's not a full alembic setup from what I understand
17:16:36 alaski: oh ok
17:16:41 I need to touch base with him again on this
17:17:15 alaski: yeah, my only concern was that you could maybe leverage his work, that's it
17:17:38 but right now the options seem to be: use alembic like I have here, depend on the expand contract work, use sqlalchemy-migrate
17:17:53 I would like to leverage his work, but I'm not sure it's in a state to use it quite yet
17:18:32 so then the question is, alembic or sqlalchemy-migrate for the short term
17:19:01 sqlalchemy-migrate seems to be the easiest path
17:19:20 I mean, we're focusing on having 2 connection strings for the DB API
17:19:32 that would directly benefit sqlalchemy-migrate
17:20:17 bauzas: Why only 2?
17:20:42 sqlalchemy-migrate is easier, but I feel there are benefits to alembic that make it worth pursuing
17:21:17 dheeraj-gupta-4: forget the number, I was just wanting to say that we're working on providing a facade for any connection string
17:21:19 IMHO, we are limiting to two tiers (one top-level, one child)
17:21:36 let me chat with johannes to see how we can converge in a solid way
17:21:44 alaski: makes sense
17:22:15 alaski: that's only a matter of priorities... :)
17:22:38 yep
17:22:46 anything else on the database for now?
17:23:17 #topic Neutron integration
17:23:39 I spoke with armax a bit yesterday about the Neutron side of things
17:24:23 we agreed that there are a lot of open questions, and that this would make a good summit discussion on the neutron side
17:24:56 I'm trying to get the right people together for this discussion
17:25:06 and not limit it to the summit, but get it going before then
17:26:01 there seem to be two main issues we face, scheduling and api contracts
17:26:01 alaski: but what is the real goal on this? having a neutron for cell?
17:26:35 belmoreira: having nova/neutron work together when Nova is using cells
17:26:54 Neutron would like to look at cells for themselves, but I'm not sure it's clear what that means
17:27:09 alaski: different nets and subnets for cells then ?
17:27:18 ok, but what is the problem today?
17:27:34 bauzas: got it.
17:28:04 bauzas: perhaps. Right now I think it's unclear whether that's a requirement
17:28:11 alaski: I mean, if we only consider a global network for the cloud, then Neutron doesn't necessarily need to talk cells ?
17:28:18 bauzas: right
17:28:37 but there are at least two deployments that don't work that way
17:28:51 alaski: ack, so that's not a design problem, but rather a feature
17:29:18 mmm
17:29:34 ideally Neutron would work for both cases, but how that's accomplished is an open problem atm
17:29:47 you can have multiple subnets in Neutron, you only need a global L2
17:30:02 alaski: we have the same problem. subnets per cell. However we are looking into solving it in a different way
17:30:38 so maybe the question is : is Neutron able to scale its L2 network with cells ?
17:31:02 belmoreira: good to know.
I'd like to ensure you or someone from CERN is involved, as well as people from Rackspace, and then whoever is interested
17:31:36 bauzas: that may be a good way to frame it
17:32:59 alaski: so then, Neutron folks should be interested in that
17:33:17 alaski: because that's a scale problem
17:33:25 I don't have the knowledge of Neutron/networking that I'd like to have, so I'm going to be relying on others to know the details while I'm learning
17:34:00 alaski: maybe inviting Neutron folks to our meeting is worth it ?
17:34:22 bauzas: I was told that they are interested. And this is a good opportunity for them to learn about what cells is
17:34:33 alaski: totally agree with you
17:35:11 bauzas: +1. It would help if there were some people more knowledgeable on networking than I to help represent the Nova side
17:35:43 I'm trying to pull in some rackspace networking folks to help
17:36:03 and belmoreira when he's available
17:36:31 alaski: yes, I'm interested in this
17:37:00 great
17:37:13 so my plan at the moment is primarily to get the right people talking
17:37:24 makes sense
17:37:51 #topic Open Discussion
17:38:22 anything on anyone's mind?
17:38:30 I'd like to get some feedback on https://review.openstack.org/#/c/150381/ . Like, is it in line with what the plans for cells are
17:38:37 or is it totally orthogonal
17:39:27 It looks like a really good start, and in line with the plans
17:39:48 The two/one configuration thing is still fuzzy
17:39:56 dheeraj-gupta-4: I didn't have time to review your latest PS
17:40:40 dheeraj-gupta-4: but very quickly glancing at it, I'm +1 on it
17:40:42 any clarification about how to pass connection_url?
17:40:48 bauzas: no worries...I tried fixing it
17:41:01 dheeraj-gupta-4: it could use an answer to dansmith's comment about how this will be used in practice
17:41:27 and it will probably also cross paths with what alaski is doing with DB
17:41:31 dheeraj-gupta-4: i.e.
how to make a db call to one of these engines
17:42:10 alaski: I just think we need a new opt for the cells V2 DB
17:42:15 Well.... to my mind the function calling the DBAPI method will supply the connection_url
17:42:41 alaski: so when calling the engine, we're just importing the opt
17:43:13 Like say nova-api on cell show will first call a dbapi.get_cells() and then for each connection_url it will do a dbapi.whatever(connection_url)
17:43:17 something like that
17:43:24 bauzas: I'm thinking about when calling into different cells, the same query could go to multiple dbs
17:43:55 alaski: oh you mean a global engine for multiple DBs ?
17:43:58 bauzas: Within oslo.db it is not possible
17:43:59 dheeraj-gupta-4: it would be good to see an example of that in a patch
17:44:23 dheeraj-gupta-4: and you might consider putting that info into the context so that each api method doesn't need to be updated
17:44:28 alaski: I think we need to have one engine per child cell
17:44:31 but with the changes you asked me to do (moving the register_opts out of oslo.db and into nova) we may be able to get two sections in the conf
17:44:40 bauzass : ^
17:44:57 dheeraj-gupta-4: I was not thinking of an oslo.db opt
17:45:02 bauzas: right
17:45:18 bauzas: and then a way to pick an engine per db call
17:45:34 alaski: context is a security thing as per my limited understanding no?
17:45:41 alaski: well, that's partially done by dheeraj-gupta-4 in his patch
17:45:52 alaski: because there is a dict keyed on the connection string
17:46:09 dheeraj-gupta-4: not entirely. it's used for policy but it does more than that, and seems like a good fit for this
17:46:38 it holds information about the environment a request is being made in
17:46:40 alaski: ok, I'll look into that
17:46:43 alaski: by speaking about context, you mean a nova context, or a python context manager ?
17:46:52 bauzas: nova context
17:47:21 alaski: well, I don't think we should have one context per cell
17:47:51 bauzas: agreed. but the context could hold information about which db a request should go to
17:47:53 bauzas, +1
17:48:36 alaski: oh I see, you mean caching the connection information into the context object ?
17:48:36 so, you could possibly have as many contexts as you have cells
17:49:04 I'm going to steal from dansmith again. He gave an example of "target_cell(context, cell) as targetted_context: do_thing(targetted_context, ...)" in a discussion I had with him a while back
17:49:10 alaski: IIUC, it's only nova-api which needs to handle multiple connections right ?
17:49:12 alaski: but when API makes calls to different cells, does it change the connection info in the context or create a new context?
17:49:24 dheeraj-gupta-4: changes it
17:49:38 bauzas: right
17:49:59 alaski: so then the local context is kinda transient ?
17:50:02 alaski: makes sense (though I'm still a bit hazy about it all)
17:50:13 a context manager could set something in the context on a per request basis
17:50:29 alaski: got your idea
17:50:44 bauzas: the connection data would be, similar to is_admin on the context
17:51:03 alaski: yeah understood, just need to think about the benefits
17:51:16 alaski: as I said, that's very local to n-apiu
17:51:19 n-api
17:51:51 bauzas: sure. The thing I really like is this means not modifying every db.api method, and using a context manager means the state is set only when needed
17:51:53 alaski: the dansmith example is really helpful.... so using that we basically don't need to change API signatures, only the methods themselves and the callers
17:52:13 dheeraj-gupta-4: exactly
17:52:14 alaski: yeah, as I said, a context manager used for caching it
17:52:57 alaski: thanks for clearing that up
17:53:11 dheeraj-gupta-4: np
17:53:21 alaski: Another thing.....
the two databases still bug me
17:53:36 shouldn't there be only one DB the service *really* cares about
17:53:36 in what way?
17:53:44 for n-api that is nova_api
17:53:54 for n-cpu the standard nova
17:54:20 for everything except nova-api that will be true
17:54:36 so while putting it into code, we don't explicitly look to handle two DBs
17:54:40 but nova-api will need to be able to return data that lives in the cells
17:54:57 yes but that can be done through the mechanism we discussed
17:55:50 The point I am trying to make is: do we really need two explicit engines in the DB API, say nova_engine and api_engine?
17:56:09 the default engine will always point to the correct DB as per the config file in use
17:56:15 dheeraj-gupta-4: from an API PoV, there are N+1 engines to manage
17:56:17 sorry if it is all confusing
17:56:27 dheeraj-gupta-4: N being the number of cells
17:57:09 I see what you've worked on as being nova_engine
17:57:14 and api_engine doesn't exist yet
17:57:27 for n-api the nova_engine _is_ the api_engine
17:57:30 that is my point
17:58:00 gotcha
17:58:08 because the n-api configuration will point it to the correct database
17:58:27 I think we do want to be explicit, and I've had reservations about overloading the config option that wya
17:58:28 way
17:58:54 ok...well that was my line of thought during the WIP patch
17:59:08 1 min to go :)
17:59:21 alaski: Would you mind commenting on the WIP patch with your PoV on this
17:59:26 I don't want to hold the meeting up
17:59:28 dheeraj-gupta-4: sure
17:59:43 it's a good point, and we should all be on the same page there
17:59:48 yep
18:00:02 I tried making the same point to dan there but....
18:00:09 I think that's it for today
18:00:15 thanks everyone!
18:00:20 #endmeeting