20:03:15 #startmeeting
20:03:16 Meeting started Thu May 10 20:03:15 2012 UTC. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:03:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:03:35 no, we're good for orchestration
20:04:12 cool
20:04:45 now if sriram appears we'll have full quorum
20:05:13 my WIP feature branch is at github
20:05:27 that seems like the way to do it for openstack for now.
20:06:23 I think that follows the BKM (Best Known Method); in fact I think you're the first one to do that.
20:06:46 glad to be the lab rat..
20:07:31 we like to call it `bleeding edge` :-)
20:07:49 have you had anyone look at your feature branch yet?
20:08:26 Yes. Mark McLoughlin and Jay Pipes
20:08:44 i'm also in contact with some folks from IBM and NTT
20:09:00 excellent, any feedback so far?
20:09:15 yes, some inline comments at github.
20:09:32 will have a much better update next week
20:09:50 sounds good
20:09:53 i'm not entirely sure if I should rebase or merge the new update though..
20:10:16 I would think rebasing would be the way to go; is there a problem?
20:10:20 perhaps i should just use a different branch every time..
20:10:53 and rebase
20:11:37 branches are very cheap in git, I use them extensively
20:11:56 pretty much; when in doubt I create a new branch
20:13:54 about the blueprint, i'm inclined to update the blueprint in place rather than creating a new one
20:14:27 works for me, that should actually create a history, which is good
20:14:35 I have some comments about orchestration stuff
20:14:47 esp. regarding maoy's proposed code
20:15:07 great
20:15:40 i was hoping to hear from you vishy..
20:16:02 maoy: should i mention now?
20:16:06 #topic proposed code
20:16:35 so first the major concern: we are trying to get rid of all db access in nova-compute
20:17:22 yes please.
20:17:39 that should work when the zookeeper backend is in.
20:18:31 without database access, i'm assuming there is a place to write persistent state, such as the health monitor, or report capability
20:20:27 maoy: so there are two other things
20:20:48 a) if compute isn't hitting the db, I don't think we need distributed state management in compute
20:21:22 b) it is possible that distributed state isn't needed at all. Some people have suggested that there are lock-free approaches which might save us a lot of extra work
20:21:40 the scheduler could be a different story
20:22:08 but for individual vm state management i think an in-memory state machine is probably fine on the compute node
20:22:35 here is the general principle that I'm going to suggest
20:23:06 user requests come into api and they are performed by simply making a call to compute and succeeding or failing
20:23:27 state is propagated back up from compute to api periodically
20:23:52 the api node doesn't need to make decisions about state because it lets the owning node do it
20:24:13 there are a few special cases which need to be considered, but this can be solved in a lock-free way as well.
20:24:28 this should work if the state is local, e.g. the compute node owns the VM
20:24:30 thoughts?
20:24:43 but my concerns are mostly non-local state:
20:24:48 maoy: such as?
20:25:06 a) volume + vm needs to work together, also network
20:25:08 b) vm migration
20:25:55 i think a) makes sense, and so there may be a need for that kind of state management at a higher layer
20:26:21 although I'm not totally sure we are doing anything complicated enough there to warrant distributed locking
20:26:51 b) what kind of state is important in this case? and does it need to be managed on multiple nodes?
20:27:25 for b) which node owns the VM? the source or the target?
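[Editor's note: the in-memory state machine proposed above might look something like the following sketch. The state names and transitions are illustrative assumptions, not actual Nova code.]

```python
# Hypothetical sketch of a per-VM in-memory state machine living on the
# compute node, with no db or lock service involved. The states and
# allowed transitions here are assumptions for illustration only.

class VMStateMachine:
    # allowed transitions: current state -> set of permitted next states
    TRANSITIONS = {
        "building": {"active", "error"},
        "active": {"rebooting", "stopping", "migrating", "error"},
        "rebooting": {"active", "error"},
        "stopping": {"stopped", "error"},
        "stopped": {"active", "error"},
        "migrating": {"active", "error"},
    }

    def __init__(self, instance_id, state="building"):
        self.instance_id = instance_id
        self.state = state

    def transition(self, new_state):
        """Apply a transition locally; reject anything not in the table."""
        if new_state not in self.TRANSITIONS.get(self.state, set()):
            raise ValueError("invalid transition %s -> %s for %s"
                             % (self.state, new_state, self.instance_id))
        self.state = new_state
        return self.state


sm = VMStateMachine("vm-1")
sm.transition("active")
sm.transition("rebooting")
sm.transition("active")
```

Because the compute node owns the VM, nothing else needs to agree on this state; it is simply reported upward periodically.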
20:29:34 maoy: the source until the migration is complete
20:30:10 maoy: the two nodes already need to communicate directly to perform the migration, so having a higher-level lock arbiter seems like a bit of overkill in this case
20:30:25 maoy: but perhaps there is a complicated case where it would be necessary
20:33:21 vishy: there might be tricky crash cases where it's not clear who owns what..
20:33:42 maoy: I think in general i would prefer that if we are doing distributed locking, it does not happen in the compute worker
20:33:58 maoy: i want the compute worker to be as dumb as possible and have access to as little as possible
20:34:15 vishy: regardless of how it's implemented, the task abstraction still holds.
20:34:24 maoy: however it probably needs an internal state machine
20:35:08 maoy: to handle some of the transitions required.
20:35:58 vishy: ok. points taken. but i don't think the locking mechanism i have in mind is more complicated than local locks.
20:36:20 maoy: otherwise i like the idea of tracking actions and steps via something like what you proposed. In fact I tried to make a generalized task system for python here https://github.com/vishvananda/task
20:36:37 maoy: before i discovered that celery does essentially the same thing, only better :)
20:37:48 vishy: i need to look into celery. does celery allow you to kill tasks and recycle locks/resources?
20:38:05 maoy: not sure, I never got into it that deeply
20:38:55 vishy: so even within the compute node, the action-tracking and task-killing functions are still necessary..
20:38:56 maoy: doesn't look like it has it out of the box: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html
20:39:22 maoy: I agree, I just don't want it to have to talk to a centralized db/zookeeper if possible
20:39:50 maoy: and I wonder how much of it is already implemented in the drivers
20:39:53 vishy: i see your point. that's one backend change, right? from a centralized db to an in-memory one..
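[Editor's note: the "local locks" alternative to distributed locking mentioned above could be as small as a per-instance lock held only within the compute worker's own process. This is a sketch under that assumption; the helper name is invented.]

```python
# Hypothetical sketch of local (in-process) locking on the compute
# worker: one lock per instance, no centralized db/zookeeper involved.
import threading
from collections import defaultdict

# lazily creates a lock the first time an instance id is seen
_instance_locks = defaultdict(threading.Lock)

def with_instance_lock(instance_id, operation):
    """Serialize operations that touch the same VM on this node."""
    with _instance_locks[instance_id]:
        return operation()

result = with_instance_lock("vm-1", lambda: "rebooted")
```

This serializes concurrent operations against one VM on one node, which is all that is needed once a single node owns each VM's state.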
20:40:02 maoy: as in xen and libvirt already have to handle state management
20:40:21 maoy: so we may get a lot of it for free
20:40:47 vishy: i saw those; i was actually planning to use that state management code as well.
20:40:48 maoy: by just going try: reboot() except libvirtError: rollback()
20:41:42 maoy: true, but I wonder if using the db layer is necessary at all.
20:42:15 maoy: you could use in-memory sqlite but that is going to do table locking and nastiness
20:42:42 maoy: so maybe something specifically designed to handle that kind of stuff would be better.
20:42:44 vishy: an in-memory hash table is enough. actually that's how i started.
20:43:12 maoy: That seems like a great place to start, do a simple in-memory one
20:43:19 maoy: we may find that is all we need.
20:43:24 vishy: but I felt that the information is useful for ops to gain insight into the system in general, so the db is not a bad place to keep the log.
20:44:03 maoy: hmm i guess that is a good point. There is a review in to store running greenlets, have you seen it?
20:44:03 vishy: the thing is, once the task traverses the node boundary, e.g. from compute to network, you lose the context
20:44:50 vishy: not yet.. link plz..
20:44:55 maoy: https://review.openstack.org/#/c/6694/
20:45:22 maoy: so this seems like it is solving a very similar problem
20:46:01 maoy: especially if we add subtasks/logging to the idea
20:46:32 maoy: persistence is also a possibility but I feel like we could add that later if needed.
20:46:33 vishy: ok. will take a look. is it local task tracking or cross-node? i can't tell from the title..
20:47:42 vishy: i can't connect the blueprint with the patch title. perhaps i should ping JE and read the code for more details.
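[Editor's note: the "try: reboot() except libvirtError: rollback()" pattern above, i.e. letting the hypervisor driver's own error reporting drive recovery instead of a central lock, could be sketched like this. LibvirtError, FakeDriver, and rollback() are stand-ins, not real driver APIs.]

```python
# Hypothetical sketch of driver-level error handling: attempt the
# operation and roll back on the driver's own error, rather than
# coordinating the VM's state through a shared database.

class LibvirtError(Exception):
    """Stand-in for the driver's error type."""

def reboot_instance(driver, instance):
    try:
        driver.reboot(instance)
    except LibvirtError:
        # the driver already knows the real VM state, so recovery is
        # re-syncing from it rather than consulting a central store
        driver.rollback(instance)

class FakeDriver:
    """Stand-in driver whose reboot always fails, to show the rollback path."""
    def __init__(self):
        self.rolled_back = False
    def reboot(self, instance):
        raise LibvirtError("reboot failed")
    def rollback(self, instance):
        self.rolled_back = True

driver = FakeDriver()
reboot_instance(driver, "vm-1")
```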
20:47:59 maoy: yeah do that
20:48:03 maoy: it is just local
20:48:22 maoy: and it is specific to greenthreads (no further granularity)
20:48:39 vishy: i'd also have a function where the ops can just say: find all running tasks against that VM, kill them if necessary
20:48:56 maoy: yes i think that is where the patch tries to get
20:49:04 maoy: you should probably sync up with him
20:49:30 vishy: then i thought migration might make this tricky, so a centralized version is dead simple to get started.
20:49:33 vishy: yeah sure.
20:50:14 vishy: i have some VM+EBS race conditions in my amazon cloud so I'd like to get that right in openstack. :)
20:50:35 maoy: i think we can see how far we get without centralizing. I agree that we will need it for higher-level orchestration
20:50:40 vishy: but local task tracking is definitely composable with a global/distributed one
20:51:00 maoy: but that could be something that lives above nova / quantum / cinder
20:51:37 vishy: that's indeed what's on my mind, but i have to start from somewhere.. so nova..
20:52:12 maoy: also check out this one https://review.openstack.org/#/c/7323/
20:52:34 maoy: it looks like johannes is trying to solve the same problems as you, so you should probably communicate :)
20:53:07 vishy: ok. that means i'm solving the right problems at least. :)
20:54:23 vishy: are there more docs on how to get rid of the db?
20:54:32 vishy: at compute.
20:56:25 vishy: I'm afraid we might have to abuse rabbitmq more to extract state from compute nodes.
20:57:47 compute nodes are already sending state info to the scheduler, can you ride on top of that?
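[Editor's note: the ops function described above, "find all running tasks against that VM, kill them if necessary", could be sketched as a local task registry. The names and the cooperative-cancel mechanism are assumptions for illustration; the actual patches under review may differ.]

```python
# Hypothetical sketch of local task tracking with an ops escape hatch:
# list every running task for a VM and cancel them all if needed.

class Task:
    def __init__(self, name, instance_id):
        self.name = name
        self.instance_id = instance_id
        self.cancelled = False

    def cancel(self):
        # cooperative cancellation: the running code is expected to
        # check this flag at safe points and stop
        self.cancelled = True

class TaskRegistry:
    def __init__(self):
        self._tasks = []

    def start(self, name, instance_id):
        task = Task(name, instance_id)
        self._tasks.append(task)
        return task

    def running_for(self, instance_id):
        return [t for t in self._tasks
                if t.instance_id == instance_id and not t.cancelled]

    def kill_all(self, instance_id):
        """Ops escape hatch: cancel every task touching this VM."""
        for t in self.running_for(instance_id):
            t.cancel()

registry = TaskRegistry()
registry.start("reboot", "vm-1")
registry.start("attach_volume", "vm-1")
registry.start("reboot", "vm-2")
registry.kill_all("vm-1")
```

Keeping the registry purely in memory matches the "see how far we get without centralizing" position; a distributed backend could be layered on later.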
20:58:23 maoy: i don't know if there are docs yet
20:58:39 maoy: but the idea is to just allow computes to report state about their vms
20:58:52 maoy: and all relevant info will be passed in through the queue
20:59:46 maoy: my initial version was going to make the api nodes listen and just throw data back in a periodic task
21:00:22 maoy: and update the state of the instance on the other end
21:00:58 if we keep the user-requested state as a separate field, then we don't run into weird timing collisions
21:02:05 vishy: i'm not sure i follow this. but it seems like the api nodes, other than translating api calls to compute/network apis, also monitor the task execution status?
21:02:34 maoy: no, not task execution status, just vm state
21:03:33 maoy: nova-api is just an easy place to put the receiving end of the call; it could also be a separate worker: nova-instance-state-writer or some such
21:04:07 vishy: got you
21:05:09 vishy: so the vm state change in the db now happens in n-cpu, but will be rpc-ed to nova-state-writer, who does the db ops
21:05:28 maoy: correct
21:05:57 maoy: and the calls from api -> compute will pass in all the relevant info so it doesn't need to read from the db either
21:06:08 i.e. the entire instance object instead of just an id
21:06:35 vishy: great. that makes sense.
21:09:02 vishy: i will take a closer look at the code in review and see how that fits the task management i have.
21:09:49 vishy: will make the backend pluggable to fit both the local in-memory and distributed cases.
21:11:12 vishy: I wish I had seen Johannes's patch earlier..
21:11:26 maoy: hard to keep track of this stuff, I know :)
21:11:26 vishy: has there been any attempt at utilizing celery by anyone else?
21:11:36 maoy: not that i know of
21:13:14 vishy: ok. so i'll ignore it for now. :)
21:13:47 vishy: where would the compute node health status update go without the db?
21:14:20 i know the IBM folks are working on a zookeeper backend for that.
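[Editor's note: the flow agreed above, where n-cpu reports vm state over the queue and a separate "nova-instance-state-writer" worker is the only thing touching the database, could be sketched as follows. The message format and worker shape are illustrative assumptions, not the actual rpc protocol.]

```python
# Hypothetical sketch: compute builds a state report from the full
# instance object it was handed (so it never reads the db itself), and
# a dedicated state-writer worker applies the report to the db.

def build_state_report(instance):
    return {
        "method": "update_instance_state",
        "args": {
            "instance_uuid": instance["uuid"],
            "vm_state": instance["vm_state"],
        },
    }

class StateWriter:
    """Receiving end of the periodic state reports from compute nodes."""
    def __init__(self, db):
        self.db = db  # stand-in for the real db api

    def handle(self, message):
        args = message["args"]
        self.db[args["instance_uuid"]] = args["vm_state"]

db = {}
writer = StateWriter(db)
instance = {"uuid": "uuid-1", "vm_state": "active"}
writer.handle(build_state_report(instance))
```

Note the one-way direction: compute only ever publishes; keeping the user-requested state in a separate field (as suggested above) avoids write collisions between the user's intent and the reported reality.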
21:14:40 maoy: passed through the queue most likely
21:15:15 is this going to happen in folsom or a later release?
21:16:19 maoy: we are going to try and get all db access out in folsom
21:16:29 maoy: but we will see how it goes
21:17:20 vishy: what about which VMs should be running on the node -- used periodically to compare against libvirt/xenapi
21:17:38 vishy: does that mean the compute node needs to maintain a local copy?
21:17:59 maoy: I don't think so, I think the periodic task could be initiated by an api/external worker
21:18:31 maoy: it could glob the instances directory periodically or something
21:18:54 maoy: but I don't think a separate data store would be needed
21:19:30 maoy: alternatively it could keep a list in memory, and make a request out to api/scheduler/nova-db-reader or something to get a list when it starts up
21:20:07 vishy: ok. sounds like a lot of changes. will this happen gradually in trunk or on a feature branch?
21:20:18 maoy: feature branch i think
21:20:37 we are trying to pull staged changes out of trunk
21:22:19 vishy: ok. will keep an eye on it. thanks!
21:25:19 vishy: i would imagine there are some tricky cases to get the periodic tasks right on n-cpu. but in general i think making n-cpu dumb is the right direction.
21:28:57 n0ano: i think we are done with the discussion.
21:29:05 vishy: thanks so much for jumping in. :)
21:29:50 sounds good
21:30:25 is there a resolution that needs tp be documented?
21:31:57 maoy: yw
21:33:09 n0ano: tp?
21:33:36 tp - sorry, don't know the abbreviation
21:39:05 n0ano: oh i think you mean "needs to be documented". right?
21:39:24 oops
21:39:28 s/tp/to
21:39:34 we have the meeting log for everything, right?
21:39:51 not sure about a resolution..
21:39:59 yep, if you don't have a succinct summary that is sufficient.
21:40:58 let's go with the full log and we'll talk again next week
21:41:04 #endmeeting