22:00:07 #startmeeting nova_cells
22:00:07 Meeting started Wed Feb 18 22:00:07 2015 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:00:11 The meeting name has been set to 'nova_cells'
22:00:17 Anyone around today?
22:00:19 mornooning
22:00:26 hi
22:01:05 cool
22:01:13 #topic Testing
22:01:16 lots of people eh
22:01:25 bauzas: yep :)
22:01:32 https://bugs.launchpad.net/nova/+bug/1420322
22:01:33 Launchpad bug 1420322 in OpenStack Compute (nova) "gate-devstack-dsvm-cells fails in volumes exercise with "Server ex-vol-inst not deleted"" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)
22:01:45 melwitt: I believe you had a patch for this?
22:02:16 o/
22:02:37 melwitt: apparently it did not have that bug number on it, or the bug didn't update
22:03:03 alaski: just looked at it, I don't think so. my patch was for the DetachedInstanceError
22:03:47 melwitt: yep, that's this one
22:04:00 you have to expand the comment from mriedem to see it though
22:04:00 oh, sorry I didn't make the connection
22:04:22 just looked at logstash real quick and it seems to have disappeared since the 12th
22:04:47 so I think we can mark that fixed for now
22:04:53 ah, okay.
I can close it out with a link to the merged review
22:05:04 melwitt: that would be great, thanks
22:05:18 next is https://bugs.launchpad.net/nova/+bug/1423237
22:05:19 Launchpad bug 1423237 in OpenStack Compute (nova) "check-tempest-dsvm-cells fails with: "AttributeError: 'dict' object has no attribute 'host' in hypervisor.py"" [High,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
22:05:23 which bauzas is working on
22:05:26 my turn
22:05:46 so, the problem is that I had to provide a primitive when calling the compute node object
22:06:23 so, when going through the Host API, it was an object if it wasn't using cells, but a dict if it was using the cells API
22:06:35 hence the dot notation not working
22:07:18 so, after discussing with alaski, I'm working on providing a ComputeNodeProxy for the Cells API methods around compute_node_get_all() and cn_get()
22:07:53 it should just be hydrating a ComputeNode object in the api and then wrapping it with a Proxy, right?
22:07:58 it should take the dict, hydrate it into an object, and then wrap it with the ComputeNodeProxy
22:08:12 alaski: exactly, like you did
22:08:26 bauzas: cool
22:08:42 please add me to that when it's ready, and ping me
22:08:46 alaski: at first I thought it wasn't good to provide a primitive on the messaging system and then rehydrate it, but that seems to be the only solution
22:09:09 bauzas: I won't say that it's good, but it's how a lot of things work in cells currently
22:09:14 alaski: sure thing, my fingers are working on it
22:09:52 cool
22:10:07 #topic Database migrations
22:10:20 So I've proofed out two methods now
22:10:23 alembic https://review.openstack.org/#/c/153666/
22:10:30 sqla-migrate https://review.openstack.org/#/c/157156/
22:11:08 I personally like alembic much better
22:11:09 cool, starring them
22:11:28 there's a clearer separation between the two dbs, and it's much nicer to use
22:11:51 but johannes brought up a good point about requiring devs to know two systems
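To make the hydrate-then-wrap pattern from the Testing discussion concrete, here is a minimal sketch in plain Python: the cells API hands back a dict primitive, we rebuild an object from it, then wrap it in a proxy so callers can keep using dot notation. All names here (ComputeNode, ComputeNodeProxy, wrap_primitive) are hypothetical stand-ins for illustration, not the actual nova classes.

```python
class ComputeNode:
    """Hypothetical stand-in for the real ComputeNode object."""

    def __init__(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)

    @classmethod
    def from_primitive(cls, primitive):
        # Rehydrate the dict that crossed the cells messaging layer.
        return cls(**primitive)


class ComputeNodeProxy:
    """Read-only attribute proxy around a hydrated ComputeNode."""

    def __init__(self, compute_node):
        self._node = compute_node

    def __getattr__(self, name):
        # Forward attribute access so node.host works instead of node['host'].
        return getattr(self._node, name)


def wrap_primitive(primitive):
    # Hydrate, then wrap: what the Cells API methods would return.
    return ComputeNodeProxy(ComputeNode.from_primitive(primitive))
```

With this, `wrap_primitive({'host': 'cell1-compute1'}).host` resolves via dot notation, which is exactly what the failing hypervisor.py code expected.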
22:11:58 alaski: yeah, but my main problem is that it means the patch is huge
22:12:02 so I'd like to get some additional feedback
22:12:23 bauzas: I can probably split the patch
22:12:30 alaski: agreeing with jerdfelt, that's what I'm thinking
22:12:43 it's 590 lines vs 273 right now
22:12:54 alaski: remember a previous comment I made: that means we will have 2 migration tools for 2 distinct DBs
22:12:56 but I probably have some tests to fix with sqla-migrate
22:13:17 bauzas: right. so I don't love that it's two tools, but I like that they're separate
22:13:38 some of the sqla-migrate code is a bit unclear right now as to which db it's working on
22:13:54 alaski: mmm, as I said previously, I know that johannes is working on alembic for Nova, right
22:13:59 ?
22:14:05 yes
22:14:17 but it's not at a point where I can get away with not writing migrations
22:14:24 mmm
22:14:40 that's a priority problem then
22:15:05 well, we asked him to make it optional
22:15:14 and there are still some bits to merge
22:15:19 I mean, we can support an alembic provision for the Cells DB, but that's something huge
22:15:44 I don't think it is really
22:15:50 because then the port to alembic makes it mandatory for the Cells DB
22:16:00 a user has no exposure because it's behind nova-manage
22:16:07 alaski: at least, you have to work on nova-manage
22:16:08 eg
22:16:11 eh
22:16:15 that's the point
22:16:20 new CLI
22:16:28 it's new either way
22:16:33 agreed
22:16:42 /me walks in late
22:17:12 so, maybe my problem is that you're providing a new CLI for alembic in the same patch as the Cells DB
22:17:22 maybe that's just a split problem
22:17:30 bauzas: right, I can split that out
22:17:55 I should add this to the agenda for the Nova meeting to get some additional feedback
22:18:24 alaski: agreed, that maybe has more impact than just us
22:18:39 and also operators could be interested in it
22:18:40 I have a preference for alembic, but I can see the
argument for using sqla-migrate
22:18:58 bauzas: it shouldn't matter for an operator though
22:19:17 alaski: I mean, if your Cells patch is agnostic to the migration tool, that's not a problem
22:19:39 alaski: because if you're splitting, then you could just say that it's optional
22:20:09 I need to review your patch again to see how we could have an agnostic migration tool
22:20:29 an operator will see 'nova-manage db api_sync'
22:20:43 the arguments are slightly different, but they shouldn't care what's behind that
22:21:21 it's more a change for devs
22:21:40 alaski: well, you're right
22:21:56 but I'll add it to the Nova agenda and we can discuss there as well
22:22:19 feedback welcome on the reviews in the meantime
22:22:25 alaski: sure thing
22:22:57 #topic Multiple database support
22:23:02 https://review.openstack.org/#/c/150381/
22:23:12 just bringing attention to this mainly
22:23:33 the patch has evolved a bit so getting more reviews would be helpful
22:23:54 I need to look at that again
22:23:58 sorry for being lazy
22:24:13 we'll call it busy
22:24:25 he added in the context manager
22:24:36 ah, cool
22:24:37 alaski: yeah, I saw
22:24:44 it could still use some examples of using it, but I think the direction is good
22:24:52 alaski: sure, I can review it again, but it needs a rebase
22:25:00 alaski: agreed
22:25:29 alaski: some high-level unit tests could cover this
22:25:46 bauzas: yeah, that would be good to see
22:26:10 #topic Neutron discussion
22:26:43 I've been in touch with some networking folks at Rackspace who are helping me understand more about neutron and nova and cells
22:26:58 and I have some volunteers to help with some discussions
22:27:32 now I'm trying to get everyone together to get started
22:28:16 nice
22:28:26 cool
22:28:46 yeah.
they've been thinking about this for a long time and have some ideas they haven't been able to bring to fruition
22:29:04 so I'm going to at least get those out in the open
22:29:27 but that's all I have for now
22:29:34 sounds like a new etherpad manifesto eh? :)
22:30:01 bauzas: that might be good
22:30:13 at least it would be async :)
22:30:28 once I have a better handle on the scope of it I'll see how that can be documented for discussion
22:30:44 so if the Rackspace guys are willing to put up a draft, I would be glad to sneak a peek at it
22:31:34 at this point it seems to me that there are solutions for specific things, not a holistic solution yet
22:32:02 alaski: so, I remember our last call, and it was about wondering whether Neutron can scale
22:32:40 because if we decide that we'll support Neutron, it should scale at the same pace as Nova
22:33:12 bauzas: apparently it scales, but has some challenges
22:33:40 so having them thinking about cells might be good
22:33:44 alaski: good to know, I'm looking forward to hearing about the challenges :)
22:35:02 db related from what I know, as everything seems to be
22:35:26 #topic Open Discussion
22:36:15 I had one topic I wanted to bring up, related to cells v1
22:36:36 melwitt and I were looking at how to pass instance objects up during cell updates
22:37:02 where I stopped was when that caused a loop of updates
22:37:25 instance.save causes an update to go up/down which triggers an instance.save on the other end
22:37:54 so in order to get this to work we need to make updates one way only
22:38:31 I'm not sure of a good way to do that without modifying the save api
22:39:05 I thought the same, something akin to the update_cells=True/False thing in the db api
22:40:21 alaski: you mean that updating an instance means that you will call the DB save twice?
22:40:40 it just seemed like we need a way to indicate we don't want it to sync back
22:41:11 bauzas: it will loop forever currently
22:41:24 melwitt: right
22:41:40 I was thinking we would need to tell the object when we call save, but now I don't think we do
22:42:16 I was thinking the same thing. all it does is detect whether it's at the top cell or not and sync to the other side accordingly
22:42:38 oic
22:43:32 hmm, just looked a bit closer and might have a thought
22:43:43 any pointer I could look at?
22:44:11 alaski: I really hate that :(
22:44:13 bauzas: instance_update_from_api, instance_update_at_top in messaging.py
22:44:19 alaski: that being "update_cells=True"
22:45:04 alaski: can we break the chain by looking to see if the updates being made are already in the db?
22:45:07 dansmith: what would you say to a context manager for save()? @dont_update_cell save
22:45:13 dansmith: the notes near that say that once everything calls Instance.save, it could go away. but I think we'd still have this ping pong syncing unless I'm missing something
22:45:14 or even something simple like a TTL to prevent it from running after X hops?
22:45:42 melwitt: I didn't write that stuff (AFAIK), so I'm not sure
22:45:57 alaski: how would that work? save happens on the conductor side, not the caller side
22:46:01 dansmith: heh yeah, I know. comstud wrote the notes
22:46:24 dansmith: in instance_update_at_top it would call save in a way that neuters the cells sync
22:46:31 alaski: how about we catch up tomorrow morning and look at the details?
22:46:34 and same for instance_update_from_api
22:46:45 dansmith: sure
22:46:45 alaski: well, I know, but I mean, how would the context manager communicate it to the remoted call?
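As an aside, the ping-pong loop and the "don't sync back" flag melwitt describes can be modeled with a toy sketch. All names here are hypothetical, and the real flow goes through the cells RPC layer rather than direct method calls; this only illustrates why a one-way hint (akin to the update_cells=True/False hint in the db api) breaks the chain.

```python
class Instance:
    """Toy instance record living in one cell (hypothetical)."""

    def __init__(self, name, peer=None):
        self.name = name
        self.peer = peer   # the copy of this instance in the other cell
        self.saves = 0

    def save(self, update_cells=True):
        self.saves += 1
        if self.saves > 10:
            # Without the flag, each side's save() would re-trigger the
            # other's, and the updates would bounce forever.
            raise RuntimeError('update loop!')
        if update_cells and self.peer is not None:
            # Sync to the other side, telling it NOT to sync back.
            self.peer.save(update_cells=False)


top = Instance('api-cell')
child = Instance('child-cell', peer=top)
top.peer = child

child.save()   # one hop up to the api cell, and no echo back down
```

Dropping the `update_cells=False` argument in the recursive call reproduces the infinite loop the meeting is describing; the open question in the discussion is how to thread such a hint through a remoted save().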
22:46:52 melwitt: it would still have the ping pong
22:47:10 dansmith: ahh, I see
22:48:11 stopping the sync if there are no writes could work, but it would require an extra trip and would be prone to races
22:48:26 yeah
22:48:41 can we calculate a TTL from the cell path?
22:48:48 can we add something to the context that we're passing?
22:48:56 like if we're in the first child cell, TTL would be 1, so it only ever gets updated once, parent cell or child cell?
22:48:59 like a proximity direction
22:49:02 bauzas: maybe
22:49:31 dansmith: a TTL is a good idea because it's not cell related
22:49:42 well, it'd only ever be needed in cells
22:50:00 dansmith: atm, yes
22:50:04 the issue is still how to let the object know the ttl
22:50:13 TTL
22:50:23 alaski: introspecting the context it gets?
22:50:48 context would work, I just hate to add something just for this
22:51:10 alaski: isn't that the direction we're following for the cells DB connection? :D
22:51:12 we're getting the loop because we're calling cells_rpc.instance_update right?
22:51:43 bauzas: right, but that's a more general thing for a feature. not a hack :)
22:52:02 alaski: agreed, it was a pun
22:52:10 dansmith: it's a loop between instance_update_at_top and instance_update_from_api
22:52:19 in cells/messaging.py
22:52:29 surely seems like we can do something in there, since we have all those cells bits in between
22:52:55 dansmith: and it occurs after we convert everything to objects and call instance.save() in both places
22:53:13 right
22:53:19 right now it's not happening because instance_update_at_top calls db.instance_update
22:53:26 yeah
22:53:29 can we slap something into the cell name? doesn't cell_name have foo!bar syntax or whatever?
22:54:12 hmm, we'd have to be careful that it's not persisted but that might work
22:54:52 right
22:55:03 like slap a # on the end or something
22:55:12 but,
22:55:44 it also seems like we should be able to do something in the cells bits to prevent the loop
22:56:40 I'm just coming up blank on that right now. the cells methods are being called from instance.save() and right now don't know if it's the first time they're being called or not
22:56:41 wait
22:57:05 something in instance.save has to provide some data to the cells bits
22:57:43 if we unset cell_name entirely before we make those calls,
22:57:51 then they won't match the condition on the receiving end,
22:58:11 but we won't have cell_name be modified either, so we won't try to update it in the DB
22:58:26 meaning,
22:58:31 unset it on the clone we pass over rpc
22:58:31 hm, interesting
22:58:46 I don't think anything else there will care that it's missing, will it?
22:59:11 with a big comment on top that says "remove the cell name so that we don't re-run this on the other side" and it'll half make sense even
22:59:18 hehe
22:59:33 the cells routing uses the cell_name at some point, I'll have to refresh myself on that
22:59:54 hmm
22:59:55 but we could pull it off after that point
23:00:16 maybe modify *those* apis to do the smart thing if we pass it a flag?
23:00:45 yeah, I'm all for that. It's figuring out when to pass the flag that's tricky
23:01:11 oops, time's up
23:01:17 Thanks all!
23:01:23 #endmeeting
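For reference, the unset-cell_name idea dansmith floats at the end of the meeting can be sketched in simplified form: clear cell_name on the clone that crosses RPC, so the receiving end's "sync onward" condition never matches and the loop cannot start, while the local record keeps its cell_name. This is purely illustrative; prepare_for_rpc and receive_update are hypothetical helpers, and the real code deals in Instance objects rather than dicts.

```python
import copy


def prepare_for_rpc(instance):
    # Clone so the local object (and DB row) keeps its cell_name.
    clone = copy.deepcopy(instance)
    # Remove the cell name so that we don't re-run this on the other side.
    clone.pop('cell_name', None)
    return clone


def receive_update(instance, send_to_other_side):
    # Receiving end: only sync onward when the update names a cell.
    # An update whose cell_name was stripped stops here, breaking the loop.
    if instance.get('cell_name'):
        send_to_other_side(prepare_for_rpc(instance))
```

The caveat raised in the meeting still applies: anything in cells routing that relies on cell_name (the foo!bar path syntax) would have to run before the field is stripped from the clone.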