17:00:07 <alaski> #startmeeting nova_cells
17:00:08 <openstack> Meeting started Wed Feb 11 17:00:07 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:11 <openstack> The meeting name has been set to 'nova_cells'
17:00:22 <alaski> Anyone around?
17:00:26 <vineetmenon> /o
17:00:27 <melwitt> o/
17:00:31 <bauzas> \o
17:00:31 <dansmith> o/
17:00:33 <edleafe> o/
17:00:36 <belmoreira> o/
17:00:37 <dheeraj-gupta-4> o/
17:00:44 <alaski> excellent!
17:00:49 <alaski> #topic Testing
17:01:11 <alaski> mriedem pointed out https://bugs.launchpad.net/nova/+bug/1420322
17:01:11 <openstack> Launchpad bug 1420322 in OpenStack Compute (nova) "gate-devstack-dsvm-cells fails in volumes exercise with "Server ex-vol-inst not deleted"" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)
17:01:26 <alaski> which seems to be related to what melwitt has been chasing
17:02:19 <alaski> melwitt: any updates since we discussed this yesterday?
17:03:08 <melwitt> ah, yeah. I'm chasing around all of the DetachedInstance errors. not really unfortunately, I tried your suggestion with the obj_to_primitive and will be combing through results this morning
17:03:40 <alaski> cool, thanks for chasing that
17:04:01 <vineetmenon> latest count with the regex (by melwitt) is 126
17:04:18 <bauzas> so we had regressions ?
17:04:31 <bauzas> I was only counting 74 exceptions
17:04:54 <vineetmenon> etherpad has lot more than that
17:05:07 <melwitt> I thought it's still around 74 (when you look at the CI job)
17:05:12 <alaski> I'm seeing 2 failures on a recent run
17:05:19 <vineetmenon> 9th feb
17:05:23 <bauzas> #link https://etherpad.openstack.org/p/nova-cells-testing
17:06:26 <vineetmenon> i have appened possibly rectified for failures which are no more
17:06:28 <alaski> looking at http://logs.openstack.org/67/154567/2/check/check-tempest-dsvm-cells/b9eecd8/console.html from the 10th
17:06:31 <alaski> shows 2 failures
17:06:55 <alaski> tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_run_idempotent_instances is pretty consistent in failing
17:07:13 <alaski> I've tried reproducing locally and have failed thus far
17:07:46 <bauzas> right
17:08:14 <alaski> I'd hate to just exclude it without understanding the reasoning there, since it was passing before
17:08:58 <alaski> other failures seem to be intermittent and I haven't had a chance to dig deeper
17:09:18 <bauzas> alaski: I can't even find the boto one on http://logs.openstack.org/03/127203/5/check/check-tempest-dsvm-cells/7b641b4/console.html
17:09:46 <bauzas> that's probably transient errors :(
17:10:12 <alaski> yep
17:10:43 <alaski> I ran tempest locally with --until-failure for about half a day and didn't get any failures, but they keep coming up in the gate
17:11:19 <alaski> it could use some digging from anyone interested
17:11:21 <dheeraj-gupta-4> The boto one is here too http://logs.openstack.org/81/150381/2/check/check-tempest-dsvm-cells/9f37833/console.html but the end error is different
17:11:29 <alaski> and I'll keep digging as well
17:11:40 <vineetmenon> alaski, my error count is from a local run too
17:11:46 <melwitt> in general, I've been having trouble reproducing the gate locally
17:11:59 <dheeraj-gupta-4> ahh no scratch that...its the same
17:12:31 <alaski> dheeraj-gupta-4: yep, just a different id
17:13:02 <alaski> melwitt: same here, which makes this all the more interesting
17:13:14 <alaski> but we're really close to having a passing job
17:13:48 <alaski> anything else around testing?
17:14:12 <alaski> #topic Database migrations
17:14:23 <alaski> https://review.openstack.org/#/c/153666/
17:14:48 <alaski> I've added basic support for alembic as a PoC
17:15:14 <alaski> the autogenerate feature for migrations is nice
17:15:19 <bauzas> agreed
17:15:37 <alaski> but I'm currently working on testing which requires reworking a lot of things that are geared towards the current setup
17:15:59 <bauzas> I was thinking that jerfleldt was working on shipping alembic for his expand/contract BP ?
17:16:13 <bauzas> so maybe you could just rebase on top of his patch ?
17:16:26 <alaski> bauzas: he is, but it's not a full alembic setup from what I understand
17:16:36 <bauzas> alaski: oh ok
17:16:41 <alaski> I need to touch base with him again on this
17:17:15 <bauzas> alaski: yeah, my only concern was that you could maybe leverage his work, that's it
17:17:38 <alaski> but right now the options seem to be: use alembic like I have here, depend on the expand contract work, use sqlalchemy-migrate
17:17:53 <alaski> I would like to leverage his work, but I'm not sure it's in a state to use it quite yet
17:18:32 <alaski> so then the question is, alembic or sqlalchemy-migrate for the short term
17:19:01 <bauzas> sqlalchemy-migrate seems to be the easiest path
17:19:20 <bauzas> I mean, we're focusing on having 2 connection strings for the DB API
17:19:32 <bauzas> that would directly benefit for sqlalchemy-migrate
17:20:17 <dheeraj-gupta-4> bauzas: Why only 2?
17:20:42 <alaski> sqlalchemy-migrate is easier, but I feel there are benefits to alembic that make it worth pursuing
17:21:17 <bauzas> dheeraj-gupta-4: forget the number, I was just wanting to say that we're working on providing a facade for any connection string
17:21:19 <vineetmenon> IMHO, we are limiting to two tier (one top-level, one child)
17:21:36 <alaski> let me chat with johannes to see how we can converge in a solid way
17:21:44 <bauzas> alaski: makes sense
17:22:15 <bauzas> alaski: that's only matter of priorities... :)
17:22:38 <alaski> yep
17:22:46 <alaski> anything else on the database for now?
17:23:17 <alaski> #topic Neutron integration
17:23:39 <alaski> I spoke with armax a bit yesterday about the Neutron side of things
17:24:23 <alaski> we agreed that there are a lot of open questions, and that this would make a good summit discussion on the neutron side
17:24:56 <alaski> I'm trying to get the right people together for this discussion
17:25:06 <alaski> and not limit it to the summit, but get it going before then
17:26:01 <alaski> there seem to be two main issues we face, scheduling and api contracts
17:26:01 <belmoreira> alaski: but what is the real goal on this? having a neutron for cell?
17:26:35 <alaski> belmoreira: having nova/neutron work together when Nova is using cells
17:26:54 <alaski> Neutron would like to look at cells for themselves, but I'm not sure it's clear what that means
17:27:09 <bauzas> alaski: different nets and subnets for cells then ?
17:27:18 <belmoreira> ok, but what is the problem today?
17:27:34 <belmoreira> bauzas: got it.
17:28:04 <alaski> bauzas: perhaps. Right now I think it's unclear whether that's a requirement
17:28:11 <bauzas> alaski: I mean, if we only consider a global network for the cloud, then Neutron doesn't necessarly need to talk cells ?
17:28:18 <alaski> bauzas: right
17:28:37 <alaski> but there are at least two deployments that don't work that way
17:28:51 <bauzas> alaski: ack, so that's not a design problem, but rather a feature
17:29:18 <bauzas> mmm
17:29:34 <alaski> ideally Neutron would work for both cases, but how that's accomplished is an open problem atm
17:29:47 <bauzas> you can have multiple subnets in Neutron, you only need a global L2
17:30:02 <belmoreira> alaski: we have the same problem. subnets per cells. However we are looking into solving it in a different way
17:30:38 <bauzas> so maybe the question is : is Neutron able to scale its L2 network with cells ?
17:31:02 <alaski> belmoreira: good to know. I'd like to ensure you or someone from CERN is involved, as well as people from Rackspace, and then whoever is interested
17:31:36 <alaski> bauzas: that may be a good way to frame it
17:32:59 <bauzas> alaski: so then, Neutron folks should be interested in that
17:33:17 <bauzas> alaski: because that's a scale problem
17:33:25 <alaski> I don't have the knowdedge of Neutron/networking that I'd like to have, so I'm going to be relying on others to know the details while I'm learning
17:34:00 <bauzas> alaski: maybe inviting Neutron folks to our meeting is worth it ?
17:34:22 <alaski> bauzas: I was told that they are interested. And this is a good opportunity for them to learn about what cells is
17:34:33 <bauzas> alaski: totally agree with you
17:35:11 <alaski> bauzas: +1. It would help if there were some people more knowledgeable on networking than I to help represent the Nova side
17:35:43 <alaski> I'm trying to pull in some rackspace networking folks to help
17:36:03 <alaski> and belmoreira when he's available
17:36:31 <belmoreira> alaski: yes, I'm interested on this
17:37:00 <alaski> great
17:37:13 <alaski> so my plan at the moment is primarily to get the right people talking
17:37:24 <bauzas> makes sense
17:37:51 <alaski> #topic Open Discussion
17:38:22 <alaski> anything on anyones mind?
17:38:30 <dheeraj-gupta-4> I'd like to get some feedback on https://review.openstack.org/#/c/150381/ . Like is it in line with what plans for cells are
17:38:37 <dheeraj-gupta-4> or is it totally orthogonal
17:39:27 <alaski> It looks like a really good start, and in line with the plans
17:39:48 <dheeraj-gupta-4> The two/one configuration thing is still fuzzy
17:39:56 <bauzas> dheeraj-gupta-4: I didn't had time to review your latest PS
17:40:40 <bauzas> dheeraj-gupta-4: but very quickly glancing at it, I'm +1 on it
17:40:42 <vineetmenon> any clarification about how to pass connection_url?
17:40:48 <dheeraj-gupta-4> bauzas: no worries...I tried fixing it
17:41:01 <alaski> dheeraj-gupta-4: it could use an answer to dansmiths comment about how this will be used in practice
17:41:27 <dheeraj-gupta-4> and it will probably also cross paths with what alaski is doing with DB
17:41:31 <alaski> dheeraj-gupta-4: i.e. how to make a db call to one of these engines
17:42:10 <bauzas> alaski: I just think we just need a new opt for the cells V2 DB
17:42:15 <dheeraj-gupta-4> Well.... to my mind the function calling the DBAPI method will supply the connection_url
17:42:41 <bauzas> alaski: so when calling the engine, we're just importing the opt
17:43:13 <dheeraj-gupta-4> Like say nova-api on cell show will first call a dbapi.get_cells() and then for each connection_url it will do a dbapi.whatever(connection_url)
17:43:17 <dheeraj-gupta-4> something like that
17:43:24 <alaski> bauzas: I'm thinking about when calling into different cells, the same query could go to multiple dbs
17:43:55 <bauzas> alaski: oh you mean a global engine for multiple DBs ?
17:43:58 <dheeraj-gupta-4> bauzas: Within oslo.db it is not possible
17:43:59 <alaski> dheeraj-gupta-4: it would be good to see an example of that in a patch
17:44:23 <alaski> dheeraj-gupta-4: and you might consider putting that info into the context so that each api method doesn't need to be updated
17:44:28 <bauzas> alaski: I think we need to have one engine per child cell
17:44:31 <dheeraj-gupta-4> but with the changes you asked me to do (moving the register_opts out of oslo.db and into nova) we may be able to get two sections in the conf
17:44:40 <dheeraj-gupta-4> bauzass : ^
17:44:57 <bauzas> dheeraj-gupta-4: I was not thinking of an oslo.db opt
17:45:02 <alaski> bauzas: right
17:45:18 <alaski> bauzas: and then a way to pick an engine per db call
17:45:34 <dheeraj-gupta-4> alaski: context is a security thing as per my limited understanding no?
17:45:41 <bauzas> alaski: well, that's partially done by dheeraj-gupta-4 in his patcfh
17:45:52 <bauzas> alaski: because there is a dict keyed on the connection string
17:46:09 <alaski> dheeraj-gupta-4: not entirely. it's used for policy but it does more than that, and seems like a good fit for this
17:46:38 <alaski> it holds information about the environment a request is being made in
17:46:40 <dheeraj-gupta-4> alaski: ok, I'll look into that
17:46:43 <bauzas> alaski: by speaking about context, you mean a nova context, or a python context manager ?
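
The "one engine per child cell" idea and the "dict keyed on the connection string" mentioned above can be sketched roughly as follows. This is only an illustration, not the code in the patch under review: EngineRegistry, make_sqlite_engine, list_instances and the sqlite:// URLs are hypothetical names, and a stdlib sqlite3 connection stands in for a real SQLAlchemy engine.

```python
import sqlite3
import threading


class EngineRegistry:
    """Hypothetical cache that keeps one 'engine' per connection string."""

    def __init__(self, factory):
        self._factory = factory      # callable: connection_url -> engine
        self._engines = {}           # dict keyed on the connection string
        self._lock = threading.Lock()

    def get(self, connection_url):
        # Create the engine lazily, then reuse it for later calls.
        with self._lock:
            if connection_url not in self._engines:
                self._engines[connection_url] = self._factory(connection_url)
            return self._engines[connection_url]


def make_sqlite_engine(connection_url):
    # Stand-in for a real engine factory (assumption): every URL gets
    # its own private in-memory database with a tiny schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (name TEXT)")
    return conn


registry = EngineRegistry(make_sqlite_engine)

# Hypothetical result of something like dbapi.get_cells().
cell_urls = ["sqlite://cell1", "sqlite://cell2"]
for url in cell_urls:
    registry.get(url).execute(
        "INSERT INTO instances VALUES (?)", (url + "-vm",))


def list_instances(connection_url):
    # The caller supplies the connection_url and the registry picks
    # the matching engine for this one call.
    rows = registry.get(connection_url).execute(
        "SELECT name FROM instances").fetchall()
    return [row[0] for row in rows]


for url in cell_urls:
    print(url, list_instances(url))
```

With this shape the API service ends up holding N cell engines plus its own, and the open question from the meeting remains how a DB API call chooses among them.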
17:46:52 <alaski> bauzas: nova context
17:47:21 <bauzas> alaski: well, I don't think we should have one context per cell
17:47:51 <alaski> bauzas: agreed. but the context could hold information about which db a request should go to
17:47:53 <vineetmenon> bauzas, +1
17:48:36 <bauzas> alaski: oh I see, you mean caching the connection information into the context object ?
17:48:36 <vineetmenon> so, you couls possibly have as many contexts as many cells you have
17:49:04 <alaski> I'm going to steal from dansmith again. He gave an example of "target_cell(context, cell) as targetted_context: do_thing(targetted_context, ...)" in a discussion I had with him a while back
17:49:10 <bauzas> alaski: IIUC, that's only nova-api which needs to handle multiple connections right ?
17:49:12 <dheeraj-gupta-4> alaski: but when API makes calls to different cells, it changes the connection info in the context or creates a new context?
17:49:24 <alaski> dheeraj-gupta-4: changes it
17:49:38 <alaski> bauzas: right
17:49:59 <bauzas> alaski: so then the local context is kinda transient ?
17:50:02 <dheeraj-gupta-4> alaski: makes sense (though I'm still a bit hazy about it all)
17:50:13 <alaski> a context manager could set something in the context on a per request basis
17:50:29 <bauzas> alaski: got your idea
17:50:44 <alaski> bauzas: the connection data would be, similar to is_admin on the context
17:51:03 <bauzas> alaski: yeah understood, just need to think about the benefits
17:51:16 <bauzas> alaski: as I said, that's very local to n-apiu
17:51:19 <bauzas> n-api
17:51:51 <alaski> bauzas: sure. The thing I really like is this means not modying every db.api method, and using a context manager means the state is set only when needed
17:51:53 <dheeraj-gupta-4> alaski: the dansmith example is really helpful.... so using that we basically don't need to change API signatures, only the methods themselves and the callers
17:52:13 <alaski> dheeraj-gupta-4: exactly
17:52:14 <bauzas> alaski: yeah, as I said, a context manager used for caching it
17:52:57 <dheeraj-gupta-4> alaski: thanks for clearing that out
17:53:11 <alaski> dheeraj-gupta-4: np
17:53:21 <dheeraj-gupta-4> alaski: Another thing..... the two databases still bug me
17:53:36 <dheeraj-gupta-4> shouldn't there be only one DB the service *really* cares about
17:53:36 <alaski> in what way?
17:53:44 <dheeraj-gupta-4> for n-api that is nova_api
17:53:54 <dheeraj-gupta-4> for n-cpu the standard nova
17:54:20 <alaski> for everything except nova-api that will be true
17:54:36 <dheeraj-gupta-4> so while putting it into code, we don;t explicilty look to handle two DBs
17:54:40 <alaski> but nova-api will need to be able to return data that lives in the cells
17:54:57 <dheeraj-gupta-4> yes but that can be done through mechanism we discussed
17:55:50 <dheeraj-gupta-4> The point I am trying to make is that do we really need two explicit engines in the DB API - Say nova_engine and api_engine?
17:56:09 <dheeraj-gupta-4> the default engine will always point to the correct DB as per the config file in use
17:56:15 <bauzas> dheeraj-gupta-4: from an API PoV, there N+1 engines to manage
17:56:17 <dheeraj-gupta-4> sorry if it is all confusing
17:56:27 <bauzas> dheeraj-gupta-4: N being the number of cells
17:57:09 <alaski> I see what you've worked on a being nova_engine
17:57:14 <alaski> and api_engine doesn't exist yet
17:57:27 <dheeraj-gupta-4> for n-api the nova_engine _is_ the api_engine
17:57:30 <dheeraj-gupta-4> that is my point
17:58:00 <alaski> gotcha
17:58:08 <dheeraj-gupta-4> because the n-api configuration will point it to the correct database
17:58:27 <alaski> I think we do want to be explicit, and I've had reservations about overloading the config option that wya
17:58:28 <alaski> way
17:58:54 <dheeraj-gupta-4> ok...well that was my line of thought during the WIP patch
17:59:08 <bauzas> 1 min to go :)
17:59:21 <dheeraj-gupta-4> alaski: Would you mind commenting on the WIP patch with your PoV on this
17:59:26 <dheeraj-gupta-4> I don't want to hold the meeting up
17:59:28 <alaski> dheeraj-gupta-4: sure
17:59:43 <alaski> it's a good point, and we should all be on the same page there
17:59:48 <dheeraj-gupta-4> yep
18:00:02 <dheeraj-gupta-4> I tried making the same point to dan there but....
18:00:09 <alaski> I think that's it for today
18:00:15 <alaski> thanks everyone!
18:00:20 <alaski> #endmeeting
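
The target_cell(context, cell) pattern quoted during Open Discussion could look roughly like the sketch below. It assumes, as discussed, that the existing context is changed in place for the duration of the block rather than copied. RequestContext here is a minimal stand-in rather than nova's real class, and the db_connection attribute, the shape of the cell dicts, the mysql:// URLs and instance_get_all are all hypothetical.

```python
import contextlib


class RequestContext:
    """Minimal stand-in for a nova request context (hypothetical fields)."""

    def __init__(self, user_id, is_admin=False):
        self.user_id = user_id
        self.is_admin = is_admin
        # None means "use the default database from the config file".
        self.db_connection = None


@contextlib.contextmanager
def target_cell(context, cell):
    # Point the existing context at the cell's database, and restore the
    # previous value afterwards so the state is set only while needed.
    previous = context.db_connection
    context.db_connection = cell["connection_url"]
    try:
        yield context
    finally:
        context.db_connection = previous


def instance_get_all(context):
    # A DB API method whose signature is unchanged: it reads the target
    # database from the context instead of taking a new argument.
    target = context.db_connection or "default"
    return "queried %s" % target


ctxt = RequestContext("demo")
cells = [
    {"name": "cell1", "connection_url": "mysql://cell1/nova"},
    {"name": "cell2", "connection_url": "mysql://cell2/nova"},
]

results = []
for cell in cells:
    with target_cell(ctxt, cell) as targeted_context:
        results.append(instance_get_all(targeted_context))

print(results)
print(ctxt.db_connection)
```

The appeal raised in the meeting is that callers wrap their calls in the context manager while db.api signatures stay as they are; whether the default engine should be explicit or implied by the service's config is left open here, as it was in the discussion.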