20:02:56 #startmeeting
20:02:57 Meeting started Thu Oct 13 20:02:56 2011 UTC. The chair is renuka. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:58 Useful Commands: #action #agreed #help #info #idea #link #topic.
20:03:08 yay, I thought that might work :)
20:03:19 cool :-)
20:03:22 Right, I tried to change the topic before, but that didn't work
20:03:35 you should be able to now with #topic
20:03:50 anyone can do #info #idea and #link
20:04:02 not sure about action
20:04:05 #help
20:04:16 ok
20:04:51 So going by the agenda, we could let Vladimir start, since he has a concrete list of what he would like to bring up
20:05:14 o/
20:05:17 renuka, is the agenda in the wiki?
20:05:17 ok, not really concrete
20:05:24 if not I will add it
20:05:41 no, I haven't added it... just in the email
20:05:43 so, starting with our points. Some history:
20:06:12 we've implemented our own driver & scheduler that use report_capabilities and report our special info
20:06:38 in general I suppose it would be great if we could either standardize this info or make it completely up to the vendor
20:06:47 vladimir3p_: sidebar regarding the VSA impl in trunk - can I test it w/o ZadaraDriver?
20:06:54 but at the scheduler level it should recognize what type of info it is getting for a volume
20:07:03 nope :-)
20:07:08 :(
20:07:10 we are working on "generalizing" it
20:07:26 other people could help if they could run it :-)
20:07:39 maybe you can give me a trial license ;)
20:07:41 yep, agree
20:07:57 vladimir3p_: are the capabilities for volumes or storage backends?
20:08:18 for us - they are storage backend capabilities
20:08:31 and are they dynamic? could you give an example
20:08:44 there is a special package that we install on every node (where the volume service is running) that recognizes what types of drives are there
20:08:45 vladimir3p_: and also to confirm on caps - they are only aggregated in the scheduler, not dropped into the db?
20:08:50 yes, it is dynamically updated
20:08:56 yes
20:09:04 (only dynamic in sched)
20:09:23 # link http://wiki.openstack.org/NovaVolumeMeetings
20:09:28 moreover, there is a small piece of code to verify whether they changed or not (to avoid sending the same data over and over)
20:09:30 #link http://wiki.openstack.org/NovaVolumeMeetings
20:09:37 vladimir3p_: I've read the blueprint a couple of times, and the impl deviated slightly because of some other depends - but in important ways... did docs ever get released?
20:10:10 docs - still in progress. we have some outdated ones
20:10:36 anyway, I suppose VSA is not the main topic of this meeting :-)
20:10:51 ok, sorry to grill you :D - zadara has made the most dramatic changes to volumes lately (no disrespect to vishy's latest fixes!)
20:10:53 we would be glad to generalize this stuff or at least to use some of our concepts
20:11:02 So (forgive my ignorance) what part of the capabilities is dynamic?
20:11:16 hmm... everything ...
20:11:21 :)
20:11:27 vladimir3p_: honestly I think generalization of what we already have is the most important charter of this group starting out
20:11:49 report_capabilities goes to the driver (periodic task) and checks what is there, what is used/free/etc
20:12:01 renuka: just the reporting of capabilities... the scheduler just reacts to the constant stream of data being fed up from the drivers
20:12:11 change the storage backend and let it propagate up
20:12:21 *dynamic*
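[Editor's note: the exchange above describes drivers running report_capabilities as a periodic task and only sending data up when it has changed. Below is a minimal sketch of that pattern, assuming illustrative class, method, and field names; it is not the actual Zadara or Nova trunk code.]

    # Hypothetical sketch of a volume driver's periodic capability report,
    # with the "only send when something changed" check mentioned above.
    class ExampleVolumeDriver(object):

        def __init__(self):
            self._last_capabilities = None

        def _inspect_backend(self):
            # A real driver would probe the local drives/pools here.
            return {
                'volume_type': 'gold',     # assumed type name
                'quantity_total': 100,     # abstract units, per the discussion
                'quantity_used': 42,
                'other_params': {},
            }

        def report_capabilities(self, publish):
            """Periodic task: push capabilities only when they change."""
            caps = self._inspect_backend()
            if caps == self._last_capabilities:
                return  # avoid sending the same data over and over
            self._last_capabilities = caps
            publish(caps)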
20:12:27 I would push towards the generalization/abstraction as Vladimir mentions rather than worry much about the details of Zadara (although I'm very interested)
20:12:38 makes sense
20:13:04 The question I have is: what are you proposing in terms of the generalization?
20:13:06 jdg: zadara is already all up in trunk, understanding what's there and how it falls short of greater needs seems relevant
20:13:17 Fair enough
20:13:32 jdg: what does SolidFire need?
20:13:45 from the scheduler part - it is also very specific to Zadara today. The scheduler knows how to correspond volume types to whatever is reported by the driver
20:14:26 * clayg goes to look at scheduler
20:14:28 vladimir3p_: could you please link the blueprint here
20:14:28 clayg: Very long list... :) My main interest is getting san.py subclasses written (my issue of course), and future possibilities of boot from vol
20:15:05 jdg: the transport is ultimately iscsi, yes?
20:15:08 i would suggest a simple general scheduler
20:15:14 Correct
20:15:33 doesn't kvm already do boot from volume?
20:15:35 i would guess many storage vendors need to get specific and change it.
20:15:36 * clayg has never tried it
20:15:42 clayg: yes it does
20:16:15 is HP here?
20:16:29 a general volume type scheduler that does something dumb like match volume type to a single reported capability would be great
20:17:00 and a more advanced one that does json matching and filtering as well, like the ones on the compute side
20:17:08 vishy: yep, something like that. the question is whether we would like to have schedulers per volume type
20:17:15 vishy: agreed on the basic, and advanced sounds nice too
20:17:38 how would this scheme react if we have restrictions about which hosts can access which storage?
20:17:57 #idea A very basic volume type scheduler that can redirect to different backends based on a single reported capability called "type"
20:18:11 vladimir3p_: as in a meta-scheduler which chooses a scheduler based on the requested type and passes on to that?
20:18:26 #idea A more advanced scheduler that does json filtering similar to the advanced schedulers in compute
20:18:44 df1: yep
20:18:51 renuka: i would think that we should have a similar concept to host-aggregates for volumes
20:19:06 renuka: That would fall under the advanced scheduler - pass in arguments to say something about which sorts of volumes are ok
20:19:09 renuka: i think it could use the same set of apis as proposed by armando
20:19:22 vishy: right... that's what I was getting to
20:19:39 so the scheduler would need the knowledge of host aggregates as well, then?
20:19:53 renuka: you could implement it through capabilities, but it wouldn't be dynamically modifiable in that case
20:20:04 vishy, renuka: missed armando's proposal - what was it?
20:20:10 link pls
20:20:18 renuka: it also means that during volume creation you need to know the associated instance
20:20:24 renuka: yes the scheduler will need access. I would think that the easy version would be to add the host-aggregate metadata in the capability reporting
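[Editor's note: a rough sketch of the "very basic volume type scheduler" #idea above - match the requested volume type against a single reported capability called "type" and pick a host. The function name and data layout are assumptions for illustration, not the Nova scheduler API.]

    # Hypothetical basic volume-type scheduler: any volume host whose
    # reported "type" capability matches the requested type is a candidate.
    import random


    def schedule_create_volume(requested_type, capabilities_by_host):
        """capabilities_by_host: {host: last reported capability dict}."""
        candidates = [host for host, caps in capabilities_by_host.items()
                      if caps.get('type') == requested_type]
        if not candidates:
            raise ValueError('no volume host reports type %r' % requested_type)
        return random.choice(candidates)

For example, schedule_create_volume('gold', {'vol-host-1': {'type': 'gold'}}) would return 'vol-host-1'; the more advanced json-filtering scheduler mentioned above would replace the single equality check with arbitrary capability filters.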
20:20:38 #link https://blueprints.launchpad.net/nova/+spec/host-aggregates
20:20:39 #link https://blueprints.launchpad.net/nova/+spec/host-aggregates
20:20:44 you beat me
20:20:45 :(
20:20:47 :)
20:20:53 vladimir3p_: it may be sufficient to know project_id depending on how you're scheduling instances
20:21:14 vishy, renuka: thnx for link
20:22:23 vladimir3p_: good point... I am expecting this to be used most for things like boot from volume
20:22:45 so I assumed there would be more control over where the volume got created when it did
20:23:27 I suppose that aggregates is a very important part, but more like an "advanced" add-on
20:23:31 #idea report host aggregate metadata through capabilities to the scheduler so that we don't have to do separate db access in the scheduler.
20:24:32 vishy: that means information like which backend can be accessed by which host-aggregate?
20:24:50 I'm trying to understand the entire flow ... when a volume is created, should aggregate/host relations be stored in volume type extra specs?
20:24:52 vishy: how does this work out with a very large number of host aggregates?
20:24:55 #topic capability reporting
20:25:07 renuka can you change the topic? Only the chair can do it
20:25:31 now at this point, if we do not have a hierarchy of some sort, and there happens to be storage reachable from a large number of hosts but not all within that zone, it gets a little tricky via capabilities
20:25:33 df1: each host only reports the host aggregate metadata that it is a part of
20:25:44 #topic capability reporting
20:25:49 thx
20:25:51 when did I become chair :P
20:25:59 when you typed #startmeeting
20:26:01 :)
20:26:30 #info the previous conversation was also about capability reporting, and the need for host aggregates to play a part
20:26:42 so is capability reporting _mainly_ reporting the volume_types that are available?
20:27:01 I suppose it should also be total quantity / occupied / etc
20:27:09 (per each volume type)
20:27:24 and which hosts it can access, per vish's suggestion
20:27:27 the reporting itself we could leave as is
20:27:42 renuka: which host - it will be up to the driver
20:27:46 so you know all the storage node endpoints, what volume types they can create and how much space / how many volumes they have?
20:27:56 I mean every driver could add whatever additional info
20:28:10 renuka: will the storage node _know_ which hosts it can "reach"?
20:28:12 clayg: yes
20:28:22 clayg: the capability reporting is meant to be general, so it depends on the scheduler. Generally the idea is that for complicated scheduling you might report all sorts of capabilities
20:28:36 ok so there is some way for the driver to specify implementation-specific/connection details
20:28:41 clayg: but i think most use cases can be solved by just reporting the types that the driver supports
20:28:44 vladimir3p_: but unless every driver is going to run their own scheduler (totally an option) they'll want to agree on what goes where and what it means.
20:29:40 let's get the simple version working first
20:29:43 renuka: the driver can have its own flags, or get config from wherever I suppose, but it wouldn't need/want to report that to the scheduler?
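[Editor's note: a sketch of the #idea above - report host-aggregate metadata through capabilities so the scheduler can filter on reachability without a separate db lookup. The 'host_aggregate' key, payload layout, and helper are assumptions for illustration only.]

    # Hypothetical per-host capability payload carrying host-aggregate
    # metadata, plus a helper a scheduler could use to restrict placement
    # to storage reachable from a given aggregate.
    example_report = {
        'volume_types': ['gold', 'silver'],
        'quantity_total': 500,        # abstract units; meaning depends on type
        'quantity_used': 120,
        'host_aggregate': 'rack-42',  # aggregate this volume host belongs to
    }


    def reachable_hosts(capabilities_by_host, wanted_aggregate):
        """Keep only volume hosts whose reported aggregate matches."""
        return [host for host, caps in capabilities_by_host.items()
                if caps.get('host_aggregate') == wanted_aggregate]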
20:29:50 vishy, vladimir3p_: is there a one-to-many relation between types and storage backends?
20:30:07 there isn't an explicit relationship
20:30:08 clayg: that's where we will do "generalization": we will report {volume_type, quantity_total, quantity_used, other_params={}}
20:30:34 is quantity megs, gigs, # of volumes?
20:30:35 volume_type doesn't have any default definitions currently
20:30:46 some abstract numbers
20:31:02 depends on volume type
20:31:14 vladimir3p_: yeah gotcha, agreed between type - makes sense
20:31:19 renuka: volume_driver_type (which is how compute knows how to connect) currently has: 'iscsi', 'local', 'rbd', 'sheepdog'
20:31:31 the scheduler will only be able to perform some comparisons
20:31:37 and currently multiple backends export 'iscsi'
20:32:02 vishy: but in the context of scheduling, type is more granular - the host may connect via iscsi to "gold" and "silver" volume_types
20:32:29 yes, my point was, more than 1 backend can be associated with a type as abstract as gold
20:33:05 because if gold means netapp backends... we could still have more than one of those, correct?
20:33:05 renuka: well... the backend may just be the driver/message_bus - and then that driver can speak to multiple "nodes"? (one idea)
20:33:42 clayg, renuka: +1
20:34:25 renuka: in the sm impl, where does nova-volume run?
20:34:41 it is meant to be a control plane, so on a node of its own
20:34:50 #idea let's define 3 simple types: bronze, silver, gold and make the existing drivers just export all three types. This will allow us to write a scheduler that can differentiate the three
20:34:53 yeah... I guess I really mean like how many?
20:34:56 by node i mean xenserver host
20:35:01 clayg: depends on the backend
20:35:29 clayg: in the simple lvm/iscsi case you have X volume hosts and one runs on every host
20:35:31 vishy: Can you expand on your bronze, silver, gold types?
20:35:49 clayg: in HPSan you run one that just talks to the san
20:35:50 renuka: how about reporting a list of volume types, where details like gold/silver connection, etc. might be hidden in additional params. The simple scheduler should be able to "consolidate" all together, but more granular schedulers could really understand if gold or silver should be used
20:35:54 vishy: but in renuka's sm branch the xenapi kinda abstracts the different backends
20:35:58 so in SM, the volume driver instances have equal capabilities at this point... we expect the requests to be distributed across them... and they can all see all of the storage
20:36:23 jdg: they are just names so that we can prove that we support different tiers of storage
20:36:48 renuka: but they still have to figure out placement as far as reaching the node? Or the scheduler already did that for them?
20:36:53 jdg: and that the scheduling and driver backends use them.
20:37:15 vishy: thanks
20:37:29 clayg: at this point, the scheduler is a first fit...
20:37:42 vladimir3p_, renuka: I really don't understand why report_capabilities would ever send up "connection"
20:37:45 so it treats all the storage equally for now
20:38:16 clayg: connection here means more like access control info
20:38:33 maybe that was a bad word again... more like which hosts can reach this storage
20:38:34 that did not disambiguate the concept for me :P
20:38:41 what is "access control info"?
20:38:41 clayg: I suppose I understand Renuka's use case - you may have different storage controllers connected to different nodes (Active Optimized/Non-optimized, etc.)
20:39:10 the access from the "preferred" controller might be preferable
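[Editor's note: a sketch combining the {volume_type, quantity_total, quantity_used, other_params={}} report shape proposed above with the bronze/silver/gold #idea. The numbers and the "most free capacity" placement rule are made up to show how a simple scheduler might consolidate per-type reports from multiple backends.]

    # Hypothetical report: one entry per volume type exported by a backend,
    # plus a naive placement helper over reports from many volume hosts.
    example_capabilities = [
        {'volume_type': 'bronze', 'quantity_total': 1000, 'quantity_used': 300,
         'other_params': {}},
        {'volume_type': 'silver', 'quantity_total': 400, 'quantity_used': 50,
         'other_params': {}},
        {'volume_type': 'gold', 'quantity_total': 100, 'quantity_used': 10,
         'other_params': {}},
    ]


    def pick_backend(reports_by_host, requested_type):
        """Pick the host with the most free capacity for the requested type."""
        best_host, best_free = None, -1
        for host, entries in reports_by_host.items():
            for entry in entries:
                if entry['volume_type'] != requested_type:
                    continue
                free = entry['quantity_total'] - entry['quantity_used']
                if free > best_free:
                    best_host, best_free = host, free
        return best_host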
20:40:33 Looking at the time, do we want to continue this discussion or touch more of the topics? (i vote to continue)
20:40:37 renuka: do you have a solution in mind for when there is no knowledge about which vm the storage will be attached to?
20:41:15 vishy: for a scheduler, you mean?
20:41:45 or for which storage backend will be used when there is no knowledge?
20:42:06 the second
20:42:41 for example, creating a volume through the ec2 api gives you no knowledge of where it will be attached
20:42:53 in that case, the assumption is that reachability is not an issue, correct? So I agree with Vladimir's capability-based decision
20:43:02 vishy: for EC2 - you could have a default type
20:43:29 having said that, I would expect some knowledge of the project/user to be used
20:43:52 type is not the concern
20:43:55 so, do we have an agreement re reporting capabilities? It will report the list of ... {volume type id, quantities (all, used), other info}
20:44:06 the concern is for top-of-rack storage
20:44:10 you have to put it somewhere
20:44:17 renuka: when you say "storage backend" is that a type of backend or a specific node/instance/pool that can create volumes?
20:44:46 in other_info we could put some sort of UUID for the storage array ...
20:44:49 clayg: it could be anything ranging from local storage on the volume node to netapp backends
20:45:52 we have only 15 min left ... prior to Vish's meeting
20:45:59 do we have specific blueprints for these features?
20:46:22 I would suggest blueprints for the first scheduler with simple capability reporting
20:46:27 at the very least
20:46:43 and someone taking lead on getting it implemented
20:46:48 yes, it would be a really good idea to have as many details of the implementation written down as possible, so we have something more concrete
20:46:48 just to compare if storage is there?
20:48:02 vladimir3p_: would it be possible for you to put the design details into a scheduler blueprint?
20:48:40 yeah, I could create something small ... sorry, at this stage I can't commit to a full implementation
20:49:18 sure... we should just have a point where we have a concrete design
20:49:42 yep, I will write down what I think about a simple scheduler & simple reporting
20:49:52 the blueprint could link to an etherpad, we could flesh it out more async
20:50:09 makes sense
20:50:13 update the blueprint when we're happy (same time next week?)
20:50:14 we could review it and decide if it is good for everyone as a first step
20:50:15 sounds good
20:50:43 we should discuss the time ... turns out netapp and hp couldn't make it to this one
20:50:52 I vote for same time next week
20:51:07 fine with me
20:51:37 can anyone make it earlier on any day? It is 8 pm UTC, which might be late for some
20:51:38 I will create an etherpad for this design .. or should we put it on the wiki?
20:51:55 how about 10am PST?
20:51:56 vladimir3p_: yes
20:53:03 so, next Thu @ 10am. Renuka, will you send an update to everyone on the ML?
20:53:05 vladimir3p_: 10am PST is good
20:53:15 #action vladimir3p_ to document plans for simple reporting and scheduler
20:53:16 10am works
20:53:21 10am is good
20:53:25 that works too
20:53:47 re document location: wiki or etherpad?
20:53:53 do we have a blueprint for it?
20:53:55 #info next meeting moved to 10 am
20:54:12 vladimir3p_: pls make a blueprint
20:54:15 etherpad is good for hashing out design
20:54:19 you can link to either imo
20:54:19 ok, will do
20:56:17 #info the rest of the agenda will be discussed next week
20:56:24 vladimir3p_: what did you mean by - Volume-type aware drivers
20:56:39 renuka: if we're done could you issue an #endmeeting
20:56:56 renuka: to be able to have multiple drivers on the same node responsible for different volume types
20:57:00 #endmeeting
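[Editor's note: for the "Volume-type aware drivers" item deferred to next week, a minimal sketch of the idea as described above - one volume node dispatching to multiple drivers keyed by volume type. The dispatcher class and driver names are hypothetical, not existing Nova code.]

    # Hypothetical per-type driver dispatch on a single volume host.
    class MultiTypeVolumeManager(object):

        def __init__(self, drivers_by_type):
            # e.g. {'gold': SomeArrayDriver(), 'bronze': SomeLVMDriver()}
            self._drivers = drivers_by_type

        def create_volume(self, volume):
            driver = self._drivers.get(volume['volume_type'])
            if driver is None:
                raise KeyError('no driver for volume type %r'
                               % volume['volume_type'])
            return driver.create_volume(volume)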