20:03:15 <n0ano> #startmeeting
20:03:16 <openstack> Meeting started Thu May 10 20:03:15 2012 UTC. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:03:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:03:35 <n0ano> no, we're good for orchestration
20:04:12 <maoy> cool
20:04:45 <n0ano> now if sriram appears we'll have full quorum
20:05:13 <maoy> my WIP feature branch is at github
20:05:27 <maoy> that seems like the way to do it for openstack for now.
20:06:23 <n0ano> I think that follows the BKM (Best Known Method), in fact I think you're the first one to do that.
20:06:46 <maoy> glad to be the lab rat..
20:07:31 <n0ano> we like to call it `bleeding edge` :-)
20:07:49 <n0ano> have you had anyone look at your feature branch yet?
20:08:26 <maoy> Yes. Mark McLoughlin and Jay Pipes
20:08:44 <maoy> i'm also in contact with some folks from IBM and NTT
20:09:00 <n0ano> excellent, any feedback so far?
20:09:15 <maoy> yes, some inline comments at github.
20:09:32 <maoy> will have a much better update next week
20:09:50 <n0ano> sounds good
20:09:53 <maoy> i'm not entirely sure if I should rebase or merge the new update though..
20:10:16 <n0ano> I would think rebasing would be the way to go, is there a problem?
20:10:20 <maoy> perhaps i should just use a different branch every time..
20:10:53 <maoy> and rebase
20:11:37 <n0ano> branches are very cheap in git, I use them extensively
20:11:56 <n0ano> pretty much, when in doubt I create a new branch
20:13:54 <maoy> about the blueprint, i'm inclined to update the blueprint in place rather than creating a new one
20:14:27 <n0ano> works for me, that should actually create a history which is good
20:14:35 <vishy> I have some comments about orchestration stuff
20:14:47 <vishy> esp. regarding maoy's proposed code
20:15:07 <maoy> great
20:15:40 <maoy> i was hoping to hear from you vishy..
20:16:02 <vishy> maoy: should i mention now?
20:16:06 <n0ano> #topic proposed code
20:16:35 <vishy> so first the major concern: we are trying to get rid of all db access in nova-compute
20:17:22 <maoy> yes please.
20:17:39 <maoy> that should work when the zookeeper backend is in.
20:18:31 <maoy> without database access, i'm assuming there is a place to write persistent state, such as the health monitor, or report capability
20:20:27 <vishy> maoy: so there are two other things
20:20:48 <vishy> a) if compute isn't hitting the db, I don't think we need distributed state management in compute
20:21:22 <vishy> b) it is possible that distributed state isn't needed at all. Some people have suggested that there are lock-free approaches which might save us a lot of extra work
20:21:40 <vishy> the scheduler could be a different story
20:22:08 <vishy> but for individual vm state management i think an in-memory state machine is probably fine on the compute node
20:22:35 <vishy> here is the general principle that I'm going to suggest
20:23:06 <vishy> user requests come into api and they are performed by simply making a call to compute and succeeding or failing
20:23:27 <vishy> state is propagated back up from compute to api periodically
20:23:52 <vishy> the api node doesn't need to make decisions about state because it lets the owning node do it
20:24:13 <vishy> there are a few special cases which need to be considered but this can be solved in a lock-free way as well.
20:24:28 <maoy> this should work if the state is local. e.g. the compute node owns the VM
20:24:30 <vishy> thoughts?
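
A rough sketch of the principle vishy lays out above, for illustration only (none of these class or method names are real Nova code): the compute worker keeps an in-memory state machine for the VMs it owns and periodically reports a snapshot, while the API side only records what the owning node reported and makes no state decisions of its own.

    # Illustrative only -- not Nova code.
    ALLOWED = {
        "building":  {"active", "error"},
        "active":    {"rebooting", "stopped", "error"},
        "rebooting": {"active", "error"},
        "stopped":   {"active", "deleted", "error"},
    }

    class ComputeWorker(object):
        def __init__(self):
            self._vm_states = {}      # instance uuid -> state, in memory only

        def spawn(self, uuid):
            self._vm_states[uuid] = "building"
            # ... ask the hypervisor driver to boot the instance ...
            self._transition(uuid, "active")

        def reboot(self, uuid):
            self._transition(uuid, "rebooting")
            # ... driver call; on failure we would transition to "error" ...
            self._transition(uuid, "active")

        def _transition(self, uuid, new_state):
            current = self._vm_states[uuid]
            if new_state not in ALLOWED.get(current, set()):
                raise ValueError("%s -> %s not allowed" % (current, new_state))
            self._vm_states[uuid] = new_state

        def report_state(self):
            # called from a periodic task; the snapshot would go out on the queue
            return dict(self._vm_states)

    class ApiNode(object):
        def __init__(self):
            self.instances = {}       # stands in for the instances table

        def apply_report(self, report):
            # no decisions here: the owning compute node already decided
            for uuid, state in report.items():
                self.instances.setdefault(uuid, {})["vm_state"] = state
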
20:24:43 <maoy> but my concerns are mostly non-local state:
20:24:48 <vishy> maoy: such as?
20:25:06 <maoy> a) volume + vm needs to work together, also network
20:25:08 <maoy> b) vm migration
20:25:55 <vishy> i think a) makes sense and so there may be a need for that kind of state management at a higher layer
20:26:21 <vishy> although I'm not totally sure we are doing anything complicated enough there to warrant distributed locking
20:26:51 <vishy> b) what kind of state is important in this case? and does it need to be managed on multiple nodes?
20:27:25 <maoy> for b) which node owns the VM? the source or the target?
20:29:34 <vishy> maoy: the source until the migration is complete
20:30:10 <vishy> maoy: the two nodes already need to communicate directly to perform the migration so having a higher-level lock arbiter seems like a bit of overkill in this case
20:30:25 <vishy> maoy: but perhaps there is a complicated case where it would be necessary
20:33:21 <maoy> vishy: there might be tricky crash cases where it's not clear who owns what..
20:33:42 <vishy> maoy: I think in general i would prefer if we are doing distributed locking that it does not happen in the compute worker
20:33:58 <vishy> maoy: i want the compute worker to be as dumb as possible and have access to as little as possible
20:34:15 <maoy> vishy: regardless of how it's implemented, the task abstraction still holds.
20:34:24 <vishy> maoy: however it probably needs an internal state machine
20:35:08 <vishy> maoy: to handle some of the transitions required.
20:35:58 <maoy> vishy: ok. points taken. but i don't think the locking mechanism i have in mind is more complicated than local locks.
20:36:20 <vishy> maoy: otherwise i like the idea of tracking actions and steps via something like what you proposed. In fact I tried to make a generalized task system for python here https://github.com/vishvananda/task
20:36:37 <vishy> maoy: before i discovered that celery does essentially the same thing only better :)
20:37:48 <maoy> vishy: i need to look into celery. does celery allow you to kill tasks and recycle locks/resources?
20:38:05 <vishy> maoy: not sure, I never got into it that deeply
20:38:55 <maoy> vishy: so even within the compute node, the tracking actions and kill tasks functions are still necessary..
20:38:56 <vishy> maoy: doesn't look like it has it out of the box: http://loose-bits.com/2010/10/distributed-task-locking-in-celery.html
20:39:22 <vishy> maoy: I agree, I just don't want it to have to talk to a centralized db/zookeeper if possible
20:39:50 <vishy> maoy: and I wonder how much of it is already implemented in the drivers
20:39:53 <maoy> vishy: i see your point. that's one backend change, right? from a centralized db to an in-memory one..
20:40:02 <vishy> maoy: as in xen and libvirt already have to handle state management
20:40:21 <vishy> maoy: so we may get a lot of it for free
20:40:47 <maoy> vishy: i saw those, i was actually planning to use the state management code as well.
20:40:48 <vishy> maoy: by just going try: reboot() except libvirtError: rollback()
20:41:42 <vishy> maoy: true, but I wonder if using the db layer is necessary at all.
20:42:15 <vishy> maoy: you could use in-memory sqlite but that is going to do table locking and nastiness
20:42:42 <vishy> maoy: so maybe something specifically designed to handle that kind of stuff would be better.
20:42:44 <maoy> vishy: an in-memory hash table is enough. actually that's how i started.
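
A minimal sketch combining the two ideas above: lean on the driver's own exceptions for rollback ("try: reboot() except libvirtError: rollback()") and keep task bookkeeping in a plain in-memory hash table. Everything here is invented for illustration; LibvirtError and driver.reboot() are stand-ins, not the real libvirt bindings or any Nova driver API.

    # Illustrative only -- not Nova code.
    import time
    import uuid

    class LibvirtError(Exception):
        """Stand-in for whatever error the real driver raises."""

    class TaskTracker(object):
        """An in-memory hash table of tasks, roughly as maoy describes."""

        def __init__(self):
            self.tasks = {}           # task id -> metadata

        def start(self, instance_uuid, action):
            task_id = str(uuid.uuid4())
            self.tasks[task_id] = {"instance": instance_uuid,
                                   "action": action,
                                   "state": "running",
                                   "started_at": time.time()}
            return task_id

        def finish(self, task_id, state):
            self.tasks[task_id]["state"] = state

    def reboot_instance(tracker, driver, instance):
        """Run a driver operation, rolling the VM state back if it fails."""
        task_id = tracker.start(instance["uuid"], "reboot")
        previous_state = instance["vm_state"]
        instance["vm_state"] = "rebooting"
        try:
            driver.reboot(instance)   # hypothetical driver call
            instance["vm_state"] = "active"
            tracker.finish(task_id, "done")
        except LibvirtError:
            instance["vm_state"] = previous_state   # roll back, as suggested
            tracker.finish(task_id, "failed")
            raise
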
20:43:12 <vishy> maoy: That seems like a great place to start, do a simple in-memory one
20:43:19 <vishy> maoy: we may find that is all we need.
20:43:24 <maoy> vishy: but I felt that the information is useful for ops to gain insight into the system in general, so the db is not a bad place to keep the log.
20:44:03 <vishy> maoy: hmm i guess that is a good point. There is a review in progress to store running greenlets, have you seen it?
20:44:03 <maoy> vishy: the thing is, once the task traverses the node boundary, e.g. from compute to network, you lose the context
20:44:50 <maoy> vishy: not yet.. link plz..
20:44:55 <vishy> maoy: https://review.openstack.org/#/c/6694/
20:45:22 <vishy> maoy: so this seems like it is solving a very similar problem
20:46:01 <vishy> maoy: especially if we add subtasks/logging to the idea
20:46:32 <vishy> maoy: persistence is also a possibility but I feel like we could add that later if needed.
20:46:33 <maoy> vishy: ok. will take a look. is it local task tracking or cross-node? i can't tell from the title..
20:47:42 <maoy> vishy: i can't connect the blueprint with the patch title. perhaps i should ping JE and read the code for more details.
20:47:59 <vishy> maoy: yeah do that
20:48:03 <vishy> maoy: it is just local
20:48:22 <vishy> maoy: and it is specific to greenthreads (no further granularity)
20:48:39 <maoy> vishy: i'd also want a function where the ops can just say: find all running tasks against that VM, kill them if necessary
20:48:56 <vishy> maoy: yes i think that is where the patch tries to get
20:49:04 <vishy> maoy: you should probably sync up with him
20:49:30 <maoy> vishy: then i thought migration might make this tricky, so a centralized version is dead simple to get started.
20:49:33 <maoy> vishy: yeah sure.
20:50:14 <maoy> vishy: i have some VM+EBS race conditions in my amazon cloud so I'd like to get that right in openstack. :)
20:50:35 <vishy> maoy: i think we can see how far we get without centralizing. I agree that we will need it for higher-level orchestration
20:50:40 <maoy> vishy: but local task tracking is definitely composable with a global/distributed one
20:51:00 <vishy> maoy: but that could be something that lives above nova / quantum / cinder
20:51:37 <maoy> vishy: that's indeed what's in my mind but i have to start from somewhere.. so nova..
20:52:12 <vishy> maoy: also check out this one https://review.openstack.org/#/c/7323/
20:52:34 <vishy> maoy: it looks like johannes is trying to solve the same problems as you, so you should probably communicate :)
20:53:07 <maoy> vishy: ok. that means i'm solving the right problems at least. :)
20:54:23 <maoy> vishy: are there more docs on how to get rid of the db?
20:54:32 <maoy> vishy: at compute.
20:56:25 <maoy> vishy: I'm afraid we might have to abuse rabbitmq more to extract state from compute nodes.
20:57:47 <n0ano> compute nodes are already sending state info to the scheduler, can you ride on top of that?
20:58:23 <vishy> maoy: i don't know if there are docs yet
20:58:39 <vishy> maoy: but the idea is to just allow computes to report state about their vms
20:58:52 <vishy> maoy: and all relevant info will be passed in through the queue
20:59:46 <vishy> maoy: my initial version was going to make the api nodes listen and just throw data back in a periodic task
21:00:22 <vishy> maoy: and update the state of the instance on the other end
21:00:58 <vishy> if we keep the user-requested state as a separate field, then we don't run into weird timing collisions
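
A toy illustration of the "user-requested state as a separate field" point above: the API and the periodic report each write their own field, so a stale report arriving mid-request cannot clobber what the user asked for. The field and function names are invented for the example, not actual Nova schema.

    # Illustrative only -- not Nova code.
    instances = {}   # stands in for the instances table on the api side

    def user_requests(instance_uuid, requested):
        # set by the api when the user asks for something (e.g. "stopped")
        record = instances.setdefault(instance_uuid,
                                      {"vm_state": None, "requested_state": None})
        record["requested_state"] = requested
        # ... then cast the request down to the owning compute node ...

    def periodic_report(instance_uuid, observed):
        # set from the compute node's periodic report; never touches
        # requested_state, so the two writers cannot collide
        record = instances.setdefault(instance_uuid,
                                      {"vm_state": None, "requested_state": None})
        record["vm_state"] = observed

    # example: user asks to stop while a stale "active" report is in flight
    user_requests("vm-1", "stopped")
    periodic_report("vm-1", "active")
    assert instances["vm-1"]["requested_state"] == "stopped"   # not lost
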
21:02:05 <maoy> vishy: i'm not sure i follow this. but it seems like the api nodes, other than translating api calls to compute/network apis, also monitor the task execution status?
21:02:34 <vishy> maoy: no not task execution status, just vm state
21:03:33 <vishy> maoy: nova-api is just an easy place to put the receiving end of the call, it could also be a separate worker: nova-instance-state-writer or some such
21:04:07 <maoy> vishy: got you
21:05:09 <maoy> vishy: so the vm state change in the db now happens in n-cpu, but will be rpc-ed to nova-state-writer, which does the db ops
21:05:28 <vishy> maoy: correct
21:05:57 <vishy> maoy: and the calls from api -> compute will pass in all the relevant info so it doesn't need to read from the db either
21:06:08 <vishy> i.e. the entire instance object instead of just an id
21:06:35 <maoy> vishy: great. that makes sense.
21:09:02 <maoy> vishy: i will take a closer look at the code in review and see how that fits the task management i have.
21:09:49 <maoy> vishy: will make the backend pluggable to fit both the local in-memory and distributed cases.
21:11:12 <maoy> vishy: I wish I had seen Johannes's patch earlier..
21:11:26 <vishy> maoy: hard to keep track of this stuff, I know :)
21:11:26 <maoy> vishy: is there any attempt at utilizing celery by anyone else?
21:11:36 <vishy> maoy: not that i know of
21:13:14 <maoy> vishy: ok. so i'll ignore it for now. :)
21:13:47 <maoy> vishy: where would the compute node health status update go without the db?
21:14:20 <maoy> i know the IBM folks are working on a zookeeper backend for that.
21:14:40 <vishy> maoy: passed through the queue most likely
21:15:15 <maoy> is this going to happen in folsom or a later release?
21:16:19 <vishy> maoy: we are going to try and get all db access out in folsom
21:16:29 <vishy> maoy: but we will see how it goes
21:17:20 <maoy> vishy: what about which VMs should be running on the node -- used periodically to compare against libvirt/xenapi
21:17:38 <maoy> vishy: does that mean the compute node needs to maintain a local copy?
21:17:59 <vishy> maoy: I don't think so, I think the periodic task could be initiated by api/external worker
21:18:31 <vishy> maoy: it could glob the instances directory periodically or something
21:18:54 <vishy> maoy: but having a separate data store I don't think would be needed
21:19:30 <vishy> maoy: alternatively it could keep a list in memory, and make a request out to api/scheduler/nova-db-reader or something and get a list when it starts up
21:20:07 <maoy> vishy: ok. sounds like a lot of changes. will this happen gradually in trunk or on a feature branch?
21:20:18 <vishy> maoy: feature branch i think
21:20:37 <vishy> we are trying to pull staged changes out of trunk
21:22:19 <maoy> vishy: ok. will keep an eye on it. thanks!
21:25:19 <maoy> vishy: i would imagine there are some tricky cases to get the periodic tasks right on n-cpu. but in general i think making n-cpu dumb is the right direction.
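
A toy version of the nova-instance-state-writer idea discussed above, with Python's queue.Queue standing in for rabbitmq and a dict standing in for the nova database: compute never touches the db, it casts the change onto the queue and carries the whole instance object along rather than just an id. All names and structures here are illustrative assumptions, not the actual implementation.

    # Illustrative only -- not Nova code.
    import queue
    import threading

    state_queue = queue.Queue()
    database = {}                 # only the writer worker touches this

    def compute_update_state(instance, new_state):
        # compute never writes to the db; it casts the change onto the queue.
        # note the full instance dict is carried along, not just an id.
        instance["vm_state"] = new_state
        state_queue.put(dict(instance))

    def state_writer():
        # the only component that does db writes
        while True:
            instance = state_queue.get()
            if instance is None:      # shutdown sentinel
                break
            database[instance["uuid"]] = instance

    writer = threading.Thread(target=state_writer)
    writer.start()
    compute_update_state({"uuid": "vm-1", "host": "compute-1"}, "active")
    state_queue.put(None)
    writer.join()
    print(database["vm-1"]["vm_state"])   # "active"
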
21:28:57 <maoy> n0ano: i think we are done with the discussion.
21:29:05 <maoy> vishy: thanks so much for jumping in. :)
21:29:50 <n0ano> sounds good
21:30:25 <n0ano> is there a resolution that needs tp be documented?
21:31:57 <vishy> maoy: yw
21:33:09 <maoy> n0ano: tp?
21:33:36 <n0ano> tp - sorry, don't know the abbreviation
21:39:05 <maoy> n0ano: oh i think you mean "needs to be documented". right?
21:39:24 <n0ano> oops
21:39:28 <n0ano> s/tp/to
21:39:34 <maoy> we have the meeting log for everything, right?
21:39:51 <maoy> not sure about resolution..
21:39:59 <n0ano> yep, if you don't have a succinct summary, that is sufficient.
21:40:58 <n0ano> let's go with the full log and we'll talk again next week
21:41:04 <n0ano> #endmeeting