17:03:29 <renuka> #startmeeting
17:03:30 <openstack> Meeting started Thu Oct 20 17:03:29 2011 UTC.  The chair is renuka. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:03:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:04:17 <renuka> vladimir3p: would you like to begin with the scheduler discussion
17:04:27 <renuka> #topic volume scheduler
17:04:42 <vladimir3p> no prob. So, there is a blueprint spec you've probably seen
17:04:56 <renuka> yes
17:04:57 <vladimir3p> we have a couple of options for implementing it
17:05:19 <vladimir3p> 1. create a single "generic" scheduler that will perform some exemplary scheduling
17:05:39 <vladimir3p> 2. or do almost nothing there and just find an appropriate sub-scheduler
17:06:13 <vladimir3p> another change that we may need - providing some extra data back to the default scheduler (for sending it to the manager host)
17:06:30 <vladimir3p> let's first of all discuss how we would like the scheduler to look:
17:06:39 <renuka> ok
17:06:40 <vladimir3p> either we put some basic logic there
17:06:52 <vladimir3p> or just rely on every vendor to supply their own
17:07:01 <vladimir3p> I prefer the 1st option
17:07:06 <vladimir3p> any ideas?
17:07:21 <rnirmal> the only problem I see with the sub scheduler is, it's going to become too much too soon
17:07:23 <clayg> I think a generic type scheduler could get you pretty far
17:07:45 <rnirmal> so starting with a generic scheduler would be nice
17:07:48 <DuncanT> Assuming it is generic enough, I can't see a problem with a good generic scheduler
17:07:59 <clayg> I'm not sure how zones and host aggregates play in at the top level scheduler
17:08:04 <timr1> vladimir3p: am I right in understanding that the generic scheduler will be able to schedule on opaque keys
17:08:06 <renuka> I think the vendors can plug in their logic in report capabilities, correct? So why not let the scheduler decide which backend is appropriate and, based on a mapping of which backends each volume worker can reach, have the scheduler select a node
17:09:01 <vladimir3p> timr1: yes, at the beginning the generic scheduler could just match volume_type keys with reported keys
17:09:16 <vladimir3p> I was thinking that it must have some logic for quantities as well
17:09:24 <DuncanT> Since the HP backend can support many volume types in a single instance, it is also important that required capabilities get passed through to the create. Is that covered in the design already?
17:09:32 <vladimir3p> (instead of going to the DB for all available vols on that host)
17:09:59 <vladimir3p> DuncanT: can you pls clarify?
17:10:20 <vladimir3p> DuncanT: I was thinking that this is the essential requirement for it
17:10:49 <vladimir3p> renuka: on volume-driver level, yes, they will report whatever keys they want
17:10:49 <renuka> DuncanT: isn't the volume type already passed in the create call (with the extensions added)
17:11:21 <DuncanT> vladimir3p: A single HP backend could support say (spindle speed == 4800, spindle speed == 7200, spindle speed == 15000) so if the user asked for spindle speed > 7200 we'd need that detail passed through to create
17:11:23 <vladimir3p> DuncanT: yes, volume type is used during volume creation and might be retrieved by the scheduler
17:11:29 <renuka> vladimir3p: correct, so those keys should be sufficient at this point to plug in vendor logic, correct?
17:12:03 <timr1> that sounds ok so far
17:12:10 <renuka> DuncanT: I think the user need only specify the volume type. The admin can decide if a certain type means spindle speed = x
17:12:16 <vladimir3p> renuka: yes, but without a special scheduler driver they will be useless (the generic one will only match volume type vs these reported capabilities)
17:13:12 <vladimir3p> DuncanT, renuka: yes, I was thinking that admin will create separate volume types. Every node will report what exactly it sees and scheduler will perform a matching between them
17:13:14 <renuka> vladimir3p: what is the problem with that?
17:13:46 <renuka> vladimir3p: every node should report for every backend (array) that it can reach
17:14:03 <renuka> the duplicates can be filtered later or appropriately at some point
17:14:05 <DuncanT> vladimir3p: That makes sense, thanks
17:14:54 <vladimir3p> renuka: I mean drivers will report whatever data they think is appropriate (some opaque key/value pairs), but the generic scheduler will only look at the ones it recognizes (and those are the ones supplied in volume types)
17:15:22 <vladimir3p> if vendor would like to perform some special logic based on this opaque data - special schedulers will be required
17:15:36 <timr1> I agree that we want to deal with quantities as well
17:15:42 <rnirmal> can we do something like match volume_type: if type supports sub_criteria match those too (gleaned from the opaque key/value pairs)
17:16:01 <DuncanT> vladimir3p: It should be possible to do ==, !=, <, > generically on any key shouldn't it?
17:16:19 <renuka> rnirmal: I agree
17:16:32 <timr1> don't think we need special schedulers to support additional vendor logic in the backend
17:16:44 <vladimir3p> DuncanT: == and != yes, not sure about >, <
17:17:01 <renuka> yea, we should be able to add at least some basic logic via report capabilities
17:17:42 <vladimir3p> rnirmal: it would be great to have sub_criteria, but how would the scheduler recognize that a particular key/value pair is not a single criterion but a complex one?
17:17:45 <renuka> #idea what if we add a "specify rules" at driver level and keep "report capabilities" at node/backend level
17:18:28 <rnirmal> vladimir3p: it wouldn't know that, without some rules
17:18:36 <vladimir3p> folks, how about we start with something basic and improve it over time (we could discuss these improvements right now, but let's agree on the basics)
17:18:52 <vladimir3p> so, so far we have agreed that:
17:19:21 <vladimir3p> #agreed drivers will report capabilities in key/value pairs
17:19:22 <renuka> vladimir3p: opaque rules should be simple enough to add, while keeping it generic
17:19:55 <renuka> correction, nodes will report capabilities in key/value pairs per backend
17:19:57 <vladimir3p> #agreed basic scheduler will have logic to match volume_type's key/value pairs vs reported ones
17:20:30 <vladimir3p> renuka: driver will decide how many types it would like to report
17:20:41 <clayg> I think the abstract_scheduler for compute would be a good base for the generic type/quantity scheduler
17:20:41 <vladimir3p> each type has key/value pairs
17:20:59 <vladimir3p> clayg: yes, I was thinking exactly the same
17:21:12 <vladimir3p> clayg: there is already some logic for matching this stuff
17:21:19 <clayg> least_cost allows you to register cost functions
17:21:42 <clayg> oh right...
17:21:52 <clayg> in the vsa scheduler
17:21:53 <clayg> ?
17:22:13 <vladimir3p> we will need to rework vsa scheduler, but it has some basic logic for that
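A minimal sketch of the basic matching agreed above -- the volume_type's extra key/value pairs checked against the capabilities each volume node reports. The function name and the shape of the host/capabilities data are illustrative assumptions, not the actual abstract_scheduler or VSA scheduler API:

    def filter_hosts_by_type(hosts, volume_type):
        # hosts: iterable of (host_name, capabilities) pairs, where
        # capabilities is the opaque key/value dict the driver reported,
        # e.g. {'drive_type': 'SAS', 'raid_level': '10'}
        required = volume_type.get('extra_specs', {})
        matching = []
        for host, capabilities in hosts:
            # basic scheduler: exact match only, on every required key
            if all(capabilities.get(k) == v for k, v in required.items()):
                matching.append((host, capabilities))
        return matching
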
17:22:50 <vladimir3p> how about quantities: I suppose this is important. do we all agree to add reserved keywords?
17:23:08 <renuka> clayg: so do you think it will be straightforward enough to go with registering cost functions at this point?
17:24:07 <DuncanT> vladimir3p: I think the set needs to be small, since novel backends may not have many of the same concepts as other designs
17:24:13 <timr1> vladimir3p: yes - IOPS, bandwidths, capacity etc?
17:24:49 <clayg> renuka: I'm not sure... something like that may work, but the cost idea is more about selecting the best match from a group of basically identically capable hosts - but with different loads
17:25:34 <vladimir3p> the cost in our case might be total capacity or number of drives, etc...
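A rough sketch of the cost idea just mentioned, in the spirit of the compute scheduler's least_cost module -- a cost function that prefers the candidate with the most free capacity. The function names and the host_info shape are assumptions for illustration, not the real least_cost registration API:

    def free_capacity_cost_fn(host_info):
        # lower cost == better candidate; here more free GB is better,
        # so negate the reported free capacity
        capabilities = host_info.get('capabilities', {})
        return -capabilities.get('free_capacity_gb', 0)

    def cheapest_host(filtered_hosts):
        # filtered_hosts: host_info dicts that already passed type matching;
        # pick the lowest-cost one
        return min(filtered_hosts, key=free_capacity_cost_fn)
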
17:25:57 <renuka> I think there needs to be a way to call some method for a driver specific function. It will keep things generic and simplify a lot of stuff
17:26:05 <vladimir3p> that's why I suppose reporting something that scheduler will look at is important
17:26:06 <clayg> perhaps better would be to just subclass and over-ride "filter_capable_nodes" to look at type... and also some vendor specific opaque k/v pairs.
17:26:36 <clayg> the quantities stuff would more closely match the costs concepts
17:26:42 <vladimir3p> clayg: yes, subclass will work for single-vendor scenario
17:27:30 <clayg> vladimir3p: I mean you could call the super method and ignore the "other-vendor" stuff that gets passed on
17:28:10 <renuka> vladimir3p: how about letting the scheduler do the basic scheduling as you said, and then call a driver method passing the backends it has filtered?
17:28:22 <clayg> meh, I think we start with types and quantities - knowing that we'll have to add something more later
17:28:40 <rnirmal> regarding filtering, is anyone considering locating volume near vm, for scenarios with top-of-rack storage
17:28:42 <clayg> s/we/get out of the way and let Vlad
17:28:50 <renuka> rnirmal: yes
17:28:54 <vladimir3p> :-)
17:28:57 <clayg> ;)
17:29:18 <rnirmal> renuka: ok
17:29:33 <vladimir3p> renuka: not sure if the generic scheduler should call the driver... probably, as clayg mentioned, every provider could declare a subclass that will do whatever is required
17:29:34 <timr1> timr1: we are including it
17:29:58 <vladimir3p> in this case we will not need to implement volume_type-2-driver translation tables
17:30:18 <vladimir3p> however, there is a disadvantage with this - only a single vendor per scheduler ...
17:30:24 <timr1> we (HP) are agnostic at this point as our driver is generic and we re-schedule in the back end
17:30:50 <vladimir3p> timr1: do you really want to reschedule at the volume driver level?
17:31:07 <vladimir3p> timr1: will it be easier if the scheduler has all the logic
17:31:17 <renuka> vladimir3p: you mean a single driver... SM for example supports multiple vendors' backend types
17:31:20 <timr1> vladimir3p: we already do it this way - legacy
17:31:23 <vladimir3p> timr1: and just use volume node as a gateway
17:31:37 <vladimir3p> ok
17:31:42 <timr1> that is correct - we just use it as a gateway
17:31:55 <vladimir3p> renuka: yes, single driver
17:32:35 <vladimir3p> so, can we claim that we agreed there will be no volume_type-2-driver translation for now?
17:33:01 <renuka> vladimir3p: i still think matching simply on type is not quite enough. If a driver does report capabilities per backend, there should be a way to distinguish between two backends
17:33:30 <renuka> for example, 2 netapp servers do not make 2 different volume types. They map to the same one, but the scheduler should be able to choose one
17:33:31 <clayg> two backends of the same type?
17:34:15 <vladimir3p> how about we go over renuka's particular scenario and see if we could do it with a basic scheduler
17:34:34 <renuka> similarly, as rnirmal pointed out, top of rack type of restrictions cannot be applied
17:34:56 <clayg> renuka: how do you suggest the scheduler choose between the two if they both have equal "quantities"
17:35:41 <renuka> that is where i think there should be a volume driver specific call
17:35:45 <renuka> or a similar way
17:35:48 <clayg> i don't really know how to support top of rack restrictions... unless you know where the volume is going to be attached when you create it
17:36:26 <renuka> clayg: that will have to be filtered based on some reachability which is reported
17:36:44 <rnirmal> clayg: yeah, it's a little more complicated case, where it would require combining both vm and volume placement, so it might be out of scope for the basic scheduler
17:37:06 <renuka> clayg: we can look at user/project info and make a decision. I expect that is how it is done anyway
17:37:29 <vladimir3p> renuka: I suppose the generic scheduler could not really determine which one should be used, but you could do it in the sub-class. So you would over-ride some methods and pick based on opaque data
17:38:21 <renuka> so by deciding to subclass, we have essentially made it so that at least for a while, we can have only 1 volume driver in the zone
17:38:22 <vladimir3p> in this case the scheduler will call the driver "indirectly", because the driver will over-ride it
17:38:39 <renuka> can everyone here live with that for now?
17:38:58 <clayg> isn't there like an "agree" or "vote" option?
17:38:59 <DuncanT> One issue I've just thought of with volume-types:
17:39:05 <clayg> I'm totally +1'ing the simple thing first
17:39:51 <vladimir3p> #agreed simple things first :-) there will be no volume_type-2-driver translation for now (sub-class will override whatever is necessary)
17:40:03 <timr1> :)
17:40:06 <rnirmal> +1
17:40:40 <DuncanT> agreed
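A sketch of the "sub-class overrides whatever is necessary" approach just agreed on: the vendor scheduler reuses the generic type matching and then narrows the result using its own opaque key/value pairs. The class names, method name and vendor key are hypothetical:

    class GenericVolumeScheduler(object):
        def filter_capable_nodes(self, nodes, volume_type):
            # basic behaviour: exact match of the type's extra specs
            required = volume_type.get('extra_specs', {})
            return [(host, caps) for host, caps in nodes
                    if all(caps.get(k) == v for k, v in required.items())]

    class VendorVolumeScheduler(GenericVolumeScheduler):
        def filter_capable_nodes(self, nodes, volume_type):
            # let the generic matching run first (per the agreement above),
            # then apply vendor-specific logic on opaque reported data
            candidates = super(VendorVolumeScheduler,
                               self).filter_capable_nodes(nodes, volume_type)
            return [(host, caps) for host, caps in candidates
                    if caps.get('vendor:tier') == 'fast']
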
17:41:09 <vladimir3p> so, renuka, back to your scenario. I suppose it will be possible to register volume types like: SATA volumes, SAS volumes, etc... probably with some RPM, QoS, etc extra data
17:41:23 <renuka> ok
17:41:33 <DuncanT> Is it possible to specify volume affinity or anti-affinity with volume types?
17:41:56 <vladimir3p> on the volume driver side, each driver will go to all its underlying arrays and will collect things like what type of storage each one of them supports.
17:42:36 <DuncanT> I don't think there is a way to express that with volume types?
17:42:50 <vladimir3p> I think after that it could rearrange it in a form like... [{type: "SATA", arrays: [{array1, access_path1}, {array2, access_path2}, ...]}]
17:42:59 <renuka> vladimir3p: as long as we can subclass appropriately, the SM case should be fine, since we use one driver (multiple instances) in a zone
17:43:31 <vladimir3p> yeah, the sub-class on scheduler level will be able to perform all the filtering and will be able to recognize such data
17:43:37 <clayg> DuncanT: no I don't think so, not yet
17:43:40 <timr1> DuncanT: I don't think volume types can do this - but it is something customers want
17:43:43 <vladimir3p> the only thing - it will need to report back what exactly should be done
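An illustrative shape for the per-backend report vladimir3p describes above -- one volume node listing, per storage type, the arrays it can reach and how. The exact keys are assumptions; the agreed point is only that drivers report opaque key/value data grouped per backend:

    capability_report = {
        'host': 'volume-node-1',
        'backends': [
            {'type': 'SATA',
             'arrays': [{'array': 'array1', 'access_path': 'path-to-array1'},
                        {'array': 'array2', 'access_path': 'path-to-array2'}]},
            {'type': 'SAS',
             'arrays': [{'array': 'array3', 'access_path': 'path-to-array3'}]},
        ],
    }
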
17:43:50 <renuka> DuncanT: yes, that is why I am stressing reachability
17:45:04 <DuncanT> renuka: I see, yes, it is a reachability question too - we were looking at it for performance reasons but the API is the same
17:45:45 <vladimir3p> actually, on volume driver level, we could at least understand the preferred path to the array
17:45:48 <renuka> DuncanT: yes, we could give it a better name ;)
17:46:58 <vladimir3p> renuka: do you think that what I described above will work for you?
17:47:06 <DuncanT> renuka: Generically it is affinity - we don't care about reachability from a scheduler point of view but obviously other technologies might, but we need a way for it to be passed to the driver...
17:47:37 <renuka> vladimir3p: yes
17:47:44 <vladimir3p> renuka: great
17:47:51 <vladimir3p> DuncanT: yes, today scheduler returns back only host name
17:48:26 <vladimir3p> it will probably be required to return a tuple of the host and the capabilities that were used for this decision
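A sketch of the change described here -- the scheduler returning not just a host name but the host plus the capabilities/backend it matched on, so the volume manager knows which array to use. The class, the helper methods and the message shape are illustrative assumptions, not the current scheduler or RPC signatures:

    class VolumeScheduler(object):
        def schedule_create_volume(self, context, volume_id, volume_type):
            # _pick_host is an assumed helper returning the
            # (host, capabilities) pair produced by the matching/weighing
            # steps discussed above
            host, capabilities = self._pick_host(context, volume_type)
            # pass the chosen backend along with the create request so the
            # driver on that host does not have to re-derive the decision
            self._cast_create_volume(context, host,
                                     volume_id=volume_id,
                                     backend=capabilities.get('array'))
            return host
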
17:48:36 <renuka> vladimir3p: could we ensure that when a request is passed to the host, it has enough information about which backend array to use
17:48:47 <DuncanT> vladimir3p: I'm more worried about the user-facing API - we need a user to be able to specify both a volume_type and a volume (or list of volumes) to have affinity to or anti-affinity to
17:49:24 <renuka> #topic admin APIs
17:49:41 <vladimir3p> DuncanT: affinity of user to type?
17:49:56 <renuka> DuncanT: that was in fact the next thing on the agenda, what are the admin APIs that everyone requires
17:50:04 <DuncanT> vladimir3p: No, affinity of the new volume to an existing volume
17:50:13 <DuncanT> vladimir3p: On a per-create basis
17:50:44 <vladimir3p> DuncanT: yes, we have the same requirement
17:50:44 <timr1> i don't think DuncanT is talking about an admin API, it is the ability of a user to say create my volume near (or far from) volume-abs
17:50:50 <renuka> DuncanT: could that be restated as ensuring a user/project's data is kept together
17:51:09 <timr1> renuka: also want anti-affinity for availability
17:51:22 <renuka> timr1: isn't that too much power in the hands of the user? Will they even know such details?
17:52:22 <timr1> renuka: we have users who want to do it, I don't see it as power. They say make volume X anywhere. Make volume Y in a different place from volume X
17:52:29 <renuka> timr1: yes, in general some rules about placing user/project data. Am I correct in assuming that anti-affinity is used while creating redundant volumes?
17:52:30 <rnirmal> renuka: I think it's something similar to the hostId, which is an opaque id to the user, on a per user/project basis and maps on the backend to a host location
17:52:31 <vladimir3p> DuncanT: I guess affinity could be solved at the sub-scheduler level. Actually a specialized scheduler could first retrieve what was already created for the user/project and perform filtering based on that
17:53:32 <renuka> vladimir3p: yes but we still have to have a way of saying that
17:53:50 <rnirmal> volumes may need something like the hostId for vms, to be able to tackle affinity first
17:53:51 <timr1> yes it is for users who want to create their own tier of availability. agree it could be solved on the sub-scheduler - but the user API needs to be expanded
17:53:57 <clayg> vladimir3p: that would probably work, when they create the volume they can send in meta info, a custom scheduler could look for affinity and anti-affinity keys
17:54:07 <DuncanT> vladimir3p: What we'd like to make sure of if possible is that the user can specify which specific volume that already exists they want to be affine to (or anti-) - e.g. if they are creating several mirrored pairs, they might want to say that the two volumes in each pair are 'far' from each other to enhance the survivability of that data
17:54:49 <DuncanT> clayg: That should work fine, yes
17:55:39 <rnirmal> DuncanT: is this for inclusion in the basic scheduler ?
17:55:48 <vladimir3p> DuncanT: I see ... in our case we create volumes and build RAID on top of them. So, it is quite important for us to be able to schedule volumes on different nodes and to know about that
17:56:29 <DuncanT> rnirmal: The API field, yes. I'm happy for the scheduler to ignore it, just pass it to the driver
17:56:30 <renuka_> vladimir3p: I was disconnected, has that disrupted anything?
17:56:53 <vladimir3p> renuka: not really, still discussing options for affinity
17:57:13 <DuncanT> Tim and I can write up a more detailed explanation off-line and email it round if that helps with progress?
17:57:20 <clayg> the scheduler will send the volume_ref, any metadata hanging off of that will be available to the driver
17:57:43 <renuka_> #action DuncanT will write up a detailed explanation for affinity
17:58:01 <DuncanT> :-)
17:58:10 <vladimir3p> DuncanT: thanks, it will be very helpful. seems like we have some common areas there
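A sketch of the affinity idea clayg and DuncanT converge on above -- the user attaches affinity hints as volume metadata at create time; the generic scheduler can ignore them, while a sub-scheduler or the driver can act on them. The metadata key names and volume IDs are made up for illustration, and the metadata is assumed to be exposed as a plain dict here:

    create_metadata = {
        'affinity:near': 'vol-0000001a',  # place near this existing volume
        'affinity:far': 'vol-0000001b',   # or far from this one
    }

    def affinity_hints(volume_ref):
        # pull any affinity hints off the volume's metadata; interpretation
        # is left to the sub-scheduler or the driver
        metadata = volume_ref.get('volume_metadata', {})
        return dict((k, v) for k, v in metadata.items()
                    if k.startswith('affinity:'))
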
17:58:20 <renuka_> vladimir3p: what is the next step? We have 2 minutes
17:58:21 <clayg> OH: someone on ozone is working on adding os-volumes support to novaclient
17:58:39 <vladimir3p> clayg: seems like we will need to change how the message is sent from the scheduler to the volume manager
17:59:10 <rnirmal> I suppose admin api topic is for the next meeting
17:59:19 <vladimir3p> renuka: I guess we will need to find a volunteer
17:59:20 <renuka_> rnirmal: yes
17:59:22 <vladimir3p> :-)
17:59:49 <clayg> vladimir3p: I don't see that, manager can get whatever it needs from volume_id
17:59:57 <vladimir3p> nope
18:00:13 <DuncanT> OT: I've put in a blueprint for snap/backup API for some discussion
18:00:16 <clayg> you're so mysterious...
18:00:25 <vladimir3p> clayg: the scheduler will need to pass the volume not only to the host, but "for a particular backend array"
18:00:25 <renuka_> any volunteers for the simple scheduler work in the room? we can have people taking this up in the mailing list, since we are out of time
18:01:25 <clayg> hrmmm.... if the manager/driver is responsible for multiple backends that support the same type... I don't see why it would let the scheduler make that decision
18:01:25 <renuka_> alright i am going to end the meeting at this point
18:01:37 <renuka_> is that ok?
18:01:46 <timr1> okso
18:01:54 <vladimir3p> clayg: do you have some free time to continue ?
18:01:59 <clayg> it's going to aggregate that info in capabilities and send it to the scheduler, but by the time create volume comes down - it'll know better than the scheduler node which array is the best place for the volume
18:01:59 <vladimir3p> renuka: fine with me
18:02:13 <renuka_> #endmeeting
18:02:57 <renuka_> oh wow, that didn't work because I got disconnected before
18:03:03 <vladimir3p> clayg: it is not really relevant for my company, but I suppose folks managing multiple arrays would prefer to have the scheduling in one place
18:03:05 <clayg> #endmetting
18:03:54 <renuka> #endmeeting