22:00:07 <alaski> #startmeeting nova_cells
22:00:07 <openstack> Meeting started Wed Feb 18 22:00:07 2015 UTC and is due to finish in 60 minutes.  The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:00:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:00:11 <openstack> The meeting name has been set to 'nova_cells'
22:00:17 <alaski> Anyone around today?
22:00:19 <bauzas> mornooning
22:00:26 <melwitt> hi
22:01:05 <alaski> cool
22:01:13 <alaski> #topic Testing
22:01:16 <bauzas> lots of people eh
22:01:25 <alaski> bauzas: yep :)
22:01:32 <alaski> https://bugs.launchpad.net/nova/+bug/1420322
22:01:33 <openstack> Launchpad bug 1420322 in OpenStack Compute (nova) "gate-devstack-dsvm-cells fails in volumes exercise with "Server ex-vol-inst not deleted"" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)
22:01:45 <alaski> melwitt: I believe you had a patch for this?
22:02:16 <dansmith> o/
22:02:37 <alaski> melwitt: apparently it did not have that bug number on it, or the bug didn't update
22:03:03 <melwitt> alaski: just looked at it, I don't think so. my patch was for the DetachedInstanceError
22:03:47 <alaski> melwitt: yep, that's this one
22:04:00 <alaski> you have to expand the comment from mriedem to see it though
22:04:00 <melwitt> oh, sorry I didn't make the connection
22:04:22 <alaski> just looked at logstash real quick and it seems to have disappeared since the 12th
22:04:47 <alaski> so I think we can mark that fixed for now
22:04:53 <melwitt> ah, okay. I can close it out with a link to the merged review
22:05:04 <alaski> melwitt: that would be great, thanks
22:05:18 <alaski> next is https://bugs.launchpad.net/nova/+bug/1423237
22:05:19 <openstack> Launchpad bug 1423237 in OpenStack Compute (nova) "check-tempest-dsvm-cells fails with: "AttributeError: 'dict' object has no attribute 'host' in hypervisor.py"" [High,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
22:05:23 <alaski> which bauzas is working on
22:05:26 <bauzas> my turn
22:05:46 <bauzas> so, the problem is that I had to provide a primitive when calling the compute node object
22:06:23 <bauzas> so, when going to the Host API, it was either an object if it wasn't for cells, but a dict if it was using the cells api
22:06:35 <bauzas> hence the dot notation not working
22:07:18 <bauzas> so, by discussing with alaski, I'm working on providing a ComputeNodeProxy for the Cells API methods around compute_node_get_all() and cn_get()
22:07:53 <alaski> it should just be hydrating a ComputeNode object in the api and then wrapping it with a Proxy, right?
22:07:58 <bauzas> it should take the dict, then uploading it to an object and then running the computenodeproxy
22:08:12 <bauzas> alaski: exactly, like you did
22:08:26 <alaski> bauzas: cool
22:08:42 <alaski> please add me to that when it's ready, and ping me
22:08:46 <bauzas> alaski: I thought first it wasn't good to provide a primitive on the messaging system and then rehydrating it, but that seems to be the only one solution
22:09:09 <alaski> bauzas: I won't say that it's good, but it's how a lot of things work in cells currently
22:09:14 <bauzas> alaski: sure thing, my fingers are working on it
22:09:52 <alaski> cool
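
Note on the fix discussed above: the failure comes from code that expects attribute access (node.host) receiving a plain dict from the cells path. The sketch below illustrates the "hydrate, then wrap in a proxy" idea in isolation. It is a minimal illustration under stated assumptions: the ComputeNode and ComputeNodeProxy classes, the hydrate() helper, and the "api!child" cell path are hypothetical stand-ins written for this example, not nova's actual classes or signatures.

```python
# Hypothetical stand-ins for illustration; these are NOT nova's real classes.
class ComputeNode:
    """Minimal object that exposes its fields as attributes."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)


class ComputeNodeProxy:
    """Wraps a hydrated object and remembers which cell it came from."""

    def __init__(self, obj, cell_path):
        self._obj = obj
        self._cell_path = cell_path

    @property
    def cell_path(self):
        return self._cell_path

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself,
        # so lookups fall through to the wrapped object.
        return getattr(self._obj, name)


def hydrate(primitive, cell_path):
    """Turn a dict received over messaging back into a wrapped object."""
    return ComputeNodeProxy(ComputeNode(**primitive), cell_path)


primitive = {"host": "compute-1", "vcpus": 8}  # what the cells path returns
node = hydrate(primitive, "api!child")         # assumed cell path value
print(node.host, node.cell_path)               # dot notation now works
```

With this shape, callers can use the same dot notation whether or not cells is in the path, which is the property the Host API code relies on.
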
22:10:07 <alaski> #topic Database migrations
22:10:20 <alaski> So I've proofed out two methods now
22:10:23 <alaski> alembic https://review.openstack.org/#/c/153666/
22:10:30 <alaski> sqla-migrate https://review.openstack.org/#/c/157156/
22:11:08 <alaski> I personally like alembic much better
22:11:09 <bauzas> cool, starring them
22:11:28 <alaski> there's a more clear separation between the two dbs, and it's much nicer to use
22:11:51 <alaski> but johannes brought up a good point about requiring devs to know two systems
22:11:58 <bauzas> alaski: yeah, but my main problem is that it means that the patch is very huge
22:12:02 <alaski> so I'd like to get some additional feedback
22:12:23 <alaski> bauzas: I can probably split the patch
22:12:30 <bauzas> alaski: agreeing with jerfeldt, that's something I'm thinking
22:12:43 <alaski> it's 590 lines vs 273 right now
22:12:54 <bauzas> alaski: remember a previous comment I made, that means that we will have 2 migration tools for 2 distinct DBs
22:12:56 <alaski> but I probably have some tests to fix with sqla-migrate
22:13:17 <alaski> bauzas: right.  so I don't love that it's two tools, but I like that they're separate
22:13:38 <alaski> some of the sqla-migrate code is a bit unclear right now as to which db it's working on
22:13:54 <bauzas> alaski: mmm, as I said previously, I know that johannes is working on alembic for Nova, right
22:13:59 <bauzas> ?
22:14:05 <alaski> yes
22:14:17 <alaski> but it's not at a point where I can get away with not writing migrations
22:14:24 <bauzas> mmm
22:14:40 <bauzas> that's a priority problem then
22:15:05 <alaski> well, we asked him to make it optional
22:15:14 <alaski> and there are still some bits to merge
22:15:19 <bauzas> I mean, we can support an alembic provision for the Cells DB, but that's something huge
22:15:44 <alaski> I don't think it is really
22:15:50 <bauzas> because then, the port to Alembic makes it mandatory to the Cells DB
22:16:00 <alaski> a user has no exposure because it's behind nova-manage
22:16:07 <bauzas> alaski: at least, you have to work on nova-manage
22:16:08 <bauzas> eg
22:16:11 <bauzas> eh
22:16:15 <bauzas> that's the point
22:16:20 <bauzas> new CLI
22:16:28 <alaski> it's new either way
22:16:33 <bauzas> agreed
22:16:42 <edleafe> /me walks in late
22:16:52 <edleafe> <sigh>
22:17:12 <bauzas> so, maybe my problem is that you're providing a new CLI for alembic in the same patch for the Cells DB
22:17:22 <bauzas> maybe that's just a split problem
22:17:30 <alaski> bauzas: right, I can split that out
22:17:55 <alaski> I should add this to the agenda for the Nova meeting to get some additional feedback
22:18:24 <bauzas> alaski: agreed, that's maybe having more impact than just us
22:18:39 <bauzas> and also operators could be interested in it
22:18:40 <alaski> I have a preference for alembic, but I can see the argument for using sqla-migrate
22:18:58 <alaski> bauzas: it shouldn't matter for an operator though
22:19:17 <bauzas> alaski: I mean, if your Cells patch is agnostic to the migration tool, that's not a problem
22:19:39 <bauzas> alaski: because if you're splitting, then you could just say that's optional
22:20:09 <bauzas> I need to review again your patch for seeing how we could have an agnostic migration tool
22:20:29 <alaski> an operator will see 'nova-manage db api_sync'
22:20:43 <alaski> the arguments are slightly different, but they shouldn't care what's behind that
22:21:21 <alaski> it's more a change for devs
22:21:40 <bauzas> alaski: well, you're right
22:21:56 <alaski> but I'll add it to the Nova agenda and we can discuss there as well
22:22:19 <alaski> feedback welcome on the reviews in the meantime
22:22:25 <bauzas> alaski: sure thing
22:22:57 <alaski> #topic Multiple database support
22:23:02 <alaski> https://review.openstack.org/#/c/150381/
22:23:12 <alaski> just bringing attention to this mainly
22:23:33 <alaski> the patch has evolved a bit so getting more reviews would be helpful
22:23:54 <dansmith> I need to look at that again
22:23:58 <dansmith> sorry for being lazy
22:24:13 <alaski> we'll call it busy
22:24:25 <alaski> he added in the context manager
22:24:36 <dansmith> ah, cool
22:24:37 <bauzas> alaski: yeah, I saw
22:24:44 <alaski> it could still use some example of using it, but I think the direction is good
22:24:52 <bauzas> alaski: sure, I can review it again, but it needs some rebase
22:25:00 <bauzas> alaski: agreed
22:25:29 <bauzas> alaski: some high-level unittests could cover this
22:25:46 <alaski> bauzas: yeah, that would be good to see
22:26:10 <alaski> #topic Neutron discussion
22:26:43 <alaski> I've been in touch with some networking folks at Rackspace who are helping me to understand more about neutron and nova and cells
22:26:58 <alaski> and I have some volunteers to help with some discussions
22:27:32 <alaski> now I'm trying to get everyone together to get started
22:28:16 <dansmith> nice
22:28:26 <bauzas> cool
22:28:46 <alaski> yeah.  they've been thinking about this for a long time and have some ideas they haven't been able to bring to fruition
22:29:04 <alaski> so I'm going to at least get those out in the open
22:29:27 <alaski> but that's all I have for now
22:29:34 <bauzas> sounds like a new etherpad manifesto eh ? :)
22:30:01 <alaski> bauzas: that might be good
22:30:13 <bauzas> at least it would be async :)
22:30:28 <alaski> once I have a better handle on the scope of it I'll see how that can be documented for discussion
22:30:44 <bauzas> so if the Rackspace guys are willing to put some draft, I would be glad to sneak peek on it
22:31:34 <alaski> at this point it seems to me that there are solutions for specific things, not a holistic solution yet
22:32:02 <bauzas> alaski: so, I remember our last call, and it was about wondering if Neutron can scale
22:32:40 <bauzas> because if we assess that we'll support Neutron, it should scale on the same pace than Nova
22:33:12 <alaski> bauzas: apparently it scales, but has some challenges
22:33:40 <alaski> so having them thinking about cells might be good
22:33:44 <bauzas> alaski: good to know, I'm looking forward knowing the challenges :)
22:35:02 <alaski> db related from what I know, as everything seems to be
22:35:26 <alaski> #topic Open Discussion
22:36:15 <alaski> I had one topic I wanted to bring up, related to cells v1
22:36:36 <alaski> melwitt and I were looking at how to pass instance objects up during cell updates
22:37:02 <alaski> where I stopped was when that caused a loop of updates
22:37:25 <alaski> instance.save causes an update to go up/down which triggers an instance.save on the other end
22:37:54 <alaski> so in order to get this to work we need to make updates one way only
22:38:31 <alaski> I'm not sure of a good way to do that without modifying the save api
22:39:05 <melwitt> I thought the same, something akin to the update_cells=True/False thing in the db api
22:40:21 <bauzas> alaski: you mean that updating an instance means that you will call twice the DB save ?
22:40:40 <melwitt> it just seemed like we need a way to indicate we don't want it to sync back
22:41:11 <alaski> bauzas: it will loop forever currently
22:41:24 <alaski> melwitt: right
22:41:40 <alaski> I was thinking we would need to tell the object when we call save, but now I don't think we do
22:42:16 <melwitt> I was thinking the same thing. all it does is detect whether it's at the top cell or not and sync to the other side depending
22:42:38 <bauzas> oic
22:43:32 <alaski> hmm, just looked a bit closer and might have a thought
22:43:43 <bauzas> any pointer I could look at it ?
22:44:11 <dansmith> alaski: I really hate that :(
22:44:13 <alaski> bauzas: instance_update_from_api, instance_update_at_top in messaging.py
22:44:19 <dansmith> alaski: that being "update_cells=True"
22:45:04 <dansmith> alaski: can we break the chain by looking to see if the updates being made are already in the db?
22:45:07 <alaski> dansmith: what would you say to a context manager for save()?  @dont_update_cell save
22:45:13 <melwitt> dansmith: the notes near that say that once everything calls Instance.save, it could go away. but I think we'd still have this ping pong syncing unless I'm missing something
22:45:14 <dansmith> or even something simple like a TTL to prevent it from running after X hops?
22:45:42 <dansmith> melwitt: I didn't write that stuff (AFAIK), so I'm not sure
22:45:57 <dansmith> alaski: how would that work? save happens at the conductor side, not the caller side
22:46:01 <melwitt> dansmith: heh yeah, I know. comstud wrote the notes
22:46:24 <alaski> dansmith: in instance_update_at_top it would call save in a way that neuters the cells sync
22:46:31 <dansmith> alaski: how about we catch up tomorrow morning and look at the details?
22:46:34 <alaski> and same for instance_update_from_api
22:46:45 <alaski> dansmith: sure
22:46:45 <dansmith> alaski: well, I know, but I mean, how would the context manager communicate it to the remoted call?
22:46:52 <alaski> melwitt: it would still have the ping pong
22:47:10 <alaski> dansmith: ahh, I see
22:48:11 <alaski> stopping the sync if there are no writes could work, but it would require an extra trip and would be prone to races
22:48:26 <dansmith> yeah
22:48:41 <dansmith> can we calculate a TTL from the cell path?
22:48:48 <bauzas> can we add something in the context that we're passing ?
22:48:56 <dansmith> like if we're in the first child cell, TTL would be 1, so it only ever gets updated once, parent cell or child cell?
22:48:59 <bauzas> like a proximity direction
22:49:02 <dansmith> bauzas: maybe
22:49:31 <bauzas> dansmith: a TTL is a good idea because that's not cell related
22:49:42 <dansmith> well, it'd only ever be needed in cells
22:50:00 <bauzas> dansmith: atm, yes
22:50:04 <alaski> the issue is still how to let the object know the ttl
22:50:13 <alaski> TTL
22:50:23 <bauzas> alaski: introspecting the context it gets ?
22:50:48 <alaski> context would work, I just hate to add something just for this
22:51:10 <bauzas> alaski: isn't the direction we're following for the cells DB connection ? :D
22:51:12 <dansmith> we're getting the loop because we're calling cells_rpc.instance_update right?
22:51:43 <alaski> bauzas: right, but that's a more general thing for a feature.  not a hack :)
22:52:02 <bauzas> alaski: agreed, it was a pun
22:52:10 <alaski> dansmith: it's a loop between instance_update_at_top and instance_update_from_api
22:52:19 <alaski> in cells/messaging.py
22:52:29 <dansmith> surely seems like we can do something in there, since we have all those cells bits in between
22:52:55 <melwitt> dansmith: and it occurs after we convert everything to objects and call instance.save() in both places
22:53:13 <dansmith> right
22:53:19 <melwitt> right now it's not happening because instance_update_at_top calls db.instance_update
22:53:26 <melwitt> yeah
22:53:29 <dansmith> can we slap something into the cell name? doesn't cell_name have foo!bar syntax or whatever?
22:54:12 <alaski> hmm, we'd have to be careful that it's not persisted but that might work
22:54:52 <dansmith> right
22:55:03 <dansmith> like slap a # on the end or something
22:55:12 <dansmith> but,
22:55:44 <dansmith> it also seems like we should be able to do something in the cells bits to prevent the loop
22:56:40 <alaski> I'm just coming up blank on that right now.  the cells methods are being called from instance.save() and right now don't know if it's the first time they're being called or not
22:56:41 <dansmith> wait
22:57:05 <alaski> something in instance.save had to provide some data to the cells bits
22:57:12 <alaski> s/had/has/
22:57:43 <dansmith> if we unset cell_name entirely before we make those calls,
22:57:51 <dansmith> then they won't match the condition on the receiving end,
22:58:11 <dansmith> but we won't have cell_name be modified either, so we won't try to update it in the DB
22:58:26 <dansmith> meaning,
22:58:31 <dansmith> unset it on the clone we pass over rpc
22:58:31 <melwitt> hm, interesting
22:58:46 <dansmith> I don't think anything else there will care that it's missing, will it?
22:59:11 <dansmith> with a big comment on top that says "remove the cell name so that we don't re-run this on the other side" and it'll half make sense even
22:59:18 <melwitt> hehe
22:59:33 <alaski> the cells routing uses the cell_name at some point, I'll have to refresh myself on that
22:59:54 <dansmith> hmm
22:59:55 <alaski> but we could pull it off after that point
23:00:16 <dansmith> maybe modify *those* apis to do the smart thing if we pass it a flag?
23:00:45 <alaski> yeah, I'm all for that.  It's figuring out when to pass the flag that's tricky
23:01:11 <alaski> oops, time's up
23:01:17 <alaski> Thanks all!
23:01:23 <alaski> #endmeeting
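
Note on the update loop from Open Discussion: the problem is that a save on one side triggers a sync to the other side, whose save triggers a sync back, and so on. The toy model below shows the "one way only" idea with an explicit flag, similar in spirit to the update_cells=True/False argument mentioned in the meeting. Everything here is an assumption made for illustration: the Cell class, its save() signature, and the sync parameter are hypothetical and are not nova's API. It also sidesteps the hard part raised in the discussion, namely that the real save runs remotely, so the flag would have to travel with the call.

```python
class Cell:
    """Toy model of one side of a parent/child cell pair (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.peer = None   # the cell on the other side
        self.db = {}       # instance uuid -> dict of fields
        self.saves = 0     # how many times save() ran here

    def save(self, uuid, updates, sync=True):
        # Persist locally first.
        self.db.setdefault(uuid, {}).update(updates)
        self.saves += 1
        if sync and self.peer is not None:
            # Propagate once. The receiving side persists the update
            # but is told not to sync back, which breaks the loop.
            self.peer.save(uuid, updates, sync=False)


top = Cell("api")
child = Cell("child")
top.peer = child
child.peer = top

# An update originating in the child cell reaches the top exactly once.
child.save("inst-1", {"vm_state": "active"})
print(top.db, top.saves, child.saves)
```

If sync defaulted to True on the receiving side as well, the two save() calls would recurse until Python's recursion limit is hit, which mirrors the endless ping pong described in the meeting. A hop counter (the TTL idea) would be a variation on the same flag: decrement on each hop and stop syncing at zero.
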