16:10:27 #startmeeting cinder
16:10:28 Meeting started Wed Jun 4 16:10:27 2014 UTC and is due to finish in 60 minutes. The chair is DuncanT. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:10:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:10:31 The meeting name has been set to 'cinder'
16:10:35 # topuc Volume backup
16:10:43 #topic Volume backup
16:10:52 ok, starting again
16:10:57 here is the spec: https://review.openstack.org/#/c/97663/1
16:11:10 and the blueprint: https://blueprints.launchpad.net/cinder/+spec/vol-backup-service-per-backend
16:11:26 DuncanT: do you have any concerns
16:11:34 Yes
16:11:36 or questions
16:11:38 ok
16:12:01 The original aim of volume-backup as a separate service is that it can be scaled independently of the volume service
16:12:05 (among other aims)
16:12:16 DuncanT: theoretically yes
16:12:21 but actually no
16:12:38 DuncanT: you still have a volume manager for each backend
16:12:42 There are bugs that stop this happening, but I'd rather fix those than increase the coupling
16:13:00 DuncanT: this aims at decoupling
16:13:11 You have a volume-manager per backend, but you don't need a backup service per volume service
16:13:22 DuncanT: which bugs? Are they listed somewhere?
16:13:35 navneet: I've not looked at the issues recently
16:13:37 DuncanT: that's the fault in the design
16:13:59 DuncanT: this is handling message routing in the manager
16:14:07 navneet: why do you need a backup service per volume service?
16:14:32 winston-d: to route the requests properly
16:14:35 navneet: if the backup service is fixed so it always does a proper remote attach to the volume it is backing up, then it doesn't matter what the backend is
16:14:40 the manager is not the place to do it
16:15:08 navneet: Then all the routing can be removed and we can just let the scheduler pick the least busy backup manager
16:15:14 DuncanT: the manager is routing the requests to backends internally
16:15:27 navneet: can you elaborate?
16:15:34 DuncanT: I don't think there is any scheduler for backup
16:15:40 and it is not even needed
16:16:01 sure
16:16:07 navneet: There was, it got removed because remote attach didn't work for iSCSI based drivers at the time
16:16:36 DuncanT: it's not needed now either
16:16:42 I'm not proposing to bring in the scheduler
16:16:57 navneet: who gets to decide where to back up a volume then?
16:16:58 I'm just saying that the message handling is better off in a service
16:17:19 navneet: Scheduling is nice to have... some backends are way more efficient if the backup manager is co-located with the volume manager, and otherwise you want to pick a free backup service for new requests where possible
16:17:21 winston-d: it's the service
16:17:44 DuncanT: it's nice to have, but I'm not aiming for that in this change
16:18:00 and this change will enable scheduling later
16:18:02 navneet: It is far from clear what you /are/ proposing in this change
16:18:05 if somebody wants it
16:18:31 What does the API look like? Does it allow the user to choose where to back up and to which backend to restore?
16:18:38 DuncanT: do you understand how the request flows?
16:18:45 using a message server
16:19:01 navneet: In general, yes
16:19:07 I thought you were proposing to add a scheduler
16:19:23 DuncanT: OK, so this will enable request routing at the service level
16:19:33 rather than the manager level, which is the case now
16:19:53 and we do routing everywhere at the service level, which has a defined host
16:19:57 So the first routing is currently done by the API
16:20:06 DuncanT: yes
16:20:17 DuncanT: it selects a topic and a host
16:20:28 DuncanT: and then it goes to the message server
16:20:37 the service picks it up from there
16:20:40 The topic and host it selects are already wrong
16:20:54 DuncanT: the host is a common host right now
16:21:01 which needs to be per backend
16:21:32 DuncanT: you mean you are looking into correcting the host?
16:21:36 Common between the volume-manager and the backup-manager, yes. That restriction needs removing
16:21:49 DuncanT: that's what I'm planning
16:22:02 DuncanT: this will clean it up
16:22:06 navneet: Then your spec does a terrible job of explaining it ;-)
16:22:13 DuncanT: agree
16:22:19 DuncanT: sorry if it does not :(
16:22:36 DuncanT: I can modify it if you have suggestions
16:22:52 It sounds like you want to change the backup-manager code - the breakage happens before the message has even left the API node
16:23:18 DuncanT: the API will now pick up the right host for the volume
16:23:22 and not the common host
16:23:31 What is the 'right' host though?
16:23:38 what is a common host?
16:23:39 DuncanT: and a service for that host will address it
16:23:57 DuncanT: for multi-backend it will be host@backend
16:24:20 DuncanT: currently it's just host
16:24:27 with topic volume-backup
16:24:28 No, that means you now need a cinder-backup service for every backend, which is *exactly* what I *don't* want
16:24:44 DuncanT: why don't you want that?
16:25:10 I want to be able to have one cinder-backup service serving N backends, which might be running on a totally different host to the cinder-volume service
16:25:47 DuncanT: that can be achieved with the changes as well
16:25:59 navneet: Not the changes you are proposing
16:26:11 time check, 34 mins left
16:26:13 DuncanT: we read cinder.conf to have volume managers for each backend
16:26:33 and we would do the same for the service as well
16:26:41 if it works with the current design
16:26:46 it will work with the changes as well
16:26:52 navneet: I want to completely and utterly decouple any relationship between volume-managers and the backup service
16:27:07 No relationship whatsoever
16:27:18 DuncanT: that seems to be a new feature
16:27:31 DuncanT: you don't have it right now
16:27:46 DuncanT: unknowingly you tie it up in the manager
16:27:47 navneet: It was an original feature that got removed because iSCSI remote attach didn't work
16:27:47 navneet, any chance you can use complete words?
16:27:54 hemna++
16:28:03 hemna: sure :)
16:28:05 hemna: +++
16:28:17 great, thanks
16:28:20 my eyes are bleeding. thx
16:28:43 DuncanT: can you please explain the problem to me offline
16:28:45 navneet: What I don't understand is why you want to have *more* cinder-backup services running
16:28:51 *sigh*
16:29:00 hemna: :)
16:29:08 DuncanT: it's not the right way to handle messages
16:29:16 hemna: does that work now?
16:29:41 DuncanT: and it's very difficult to extend the backup manager with the current design
16:29:46 navneet: What isn't? I don't *care* about messages, I care where things have to be run, and how many need to be run
16:29:55 navneet: Extend it how?
16:30:35 DuncanT: any new feature which involves backup support will need changes in the manager
16:30:49 navneet: Why?
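The routing being debated above is the standard oslo.messaging topic/server pattern: the API picks a topic and a host, and only the service listening on that pair receives the cast. Below is a minimal sketch of that mechanism, not the actual Cinder rpcapi code; the server name 'myhost@lvm1' and the backup_id value are purely illustrative.

    # Minimal sketch of topic/server based RPC routing with oslo.messaging.
    # The server name 'myhost@lvm1' and the backup id are purely illustrative.
    from oslo.config import cfg
    from oslo import messaging

    transport = messaging.get_transport(cfg.CONF)

    # The API side picks a topic and a server (host); the message bus then
    # delivers the cast only to the service subscribed with that pair.
    target = messaging.Target(topic='cinder-backup', server='myhost@lvm1',
                              version='1.0')
    client = messaging.RPCClient(transport, target)


    def request_backup(ctxt, backup_id):
        # Fire-and-forget: whichever backup manager is listening on this
        # topic/server pair picks the message up and does the work.
        client.cast(ctxt, 'create_backup', backup_id=backup_id)

Whether the server part must name a specific backend (navneet's proposal) or may name any host running cinder-backup (DuncanT's position) is exactly the point of contention here.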
16:31:20 navneet: I'm writing a feature today, and it is no problem at all. Can you give a specific example?
16:31:22 DuncanT: because you handle the request to a particular backend in the manager
16:31:37 navneet: That is by design
16:31:38 DuncanT: like in the pools stuff
16:31:43 navneet: Why is that bad?
16:31:52 this seems to be ratholing
16:32:02 DuncanT: what I'm saying is that it's not the right place
16:32:24 28 minutes left, maybe we should move on to the next topic
16:32:25 DuncanT: and services are there for that exact same reason
16:32:25 e.g. Ceph can't easily attach a volume as a block device, so it handles the I/O itself
16:32:47 navneet: I don't want any backup stuff happening in the volume-manager, ever
16:32:57 navneet: That is too much coupling
16:32:57 DuncanT: I will upload a WIP
16:32:57 DuncanT, +1
16:32:59 we have 3 more topics
16:33:03 can you guys comment on that?
16:33:11 navneet: Ok, a WIP might help make things clearer
16:33:17 DuncanT: sure
16:33:19 thanks
16:33:24 navneet: +1 for WIP
16:33:36 Ok, next topic?
16:33:38 winston-d_: thanks
16:33:41 next topic
16:33:47 #topic Dynamic multipool
16:34:05 Did any of you get a chance to look into it
16:34:09 the WIP?
16:34:38 https://review.openstack.org/#/c/85760/
16:34:39 not I. I fundamentally disagree with the approach.
16:34:42 I had a quick look, and my worry that the driver instance is no longer a singleton stands
16:34:52 navneet: did you put a spec together in gerrit?
16:35:03 winston-d: for multipool, no
16:35:13 winston-d: there is a blueprint, however
16:35:36 DuncanT: it's the driver instances you worry about?
16:35:59 navneet: you should have a matching cinder-spec for https://blueprints.launchpad.net/cinder/+spec/multiple-service-pools
16:36:06 DuncanT: ok, let me put it this way
16:36:07 I still think we need to do the simpler approach using volume types.
16:36:34 hemna: I was curious if you have this documented anywhere
16:36:38 I can have a look
16:36:49 there is no reason to be dynamically creating driver instances simply because a backend has a pool or pools.
16:36:58 hemna: I want to do a comparative study
16:37:13 navneet, there really isn't anything to document. It's volume types.
16:37:31 hemna: I think you are not understanding the real problem
16:37:33 just so you know, I'm working on a scheduler based change to address multi-pool needs
16:37:33 put your pool in a volume type. report stats for the pools in get_volume_stats() and make the scheduler aware.
16:37:35 done
16:37:45 The POC patch is 50% ready
16:37:50 hemna: there are backends like ours where you have multiple flexvols
16:37:52 winston-d, +1
16:37:59 navneet: a number of drivers can change the storage pool with volume types today
16:38:07 hemna: and each has different capabilities and capacity
16:38:13 navneet, same with ours
16:38:16 we use volume types.
16:38:43 hemna: if it does not find a particular flexvol then it fails at cinder-volume
16:38:45 right now
16:38:58 hemna: I saw the 3PAR thing
16:39:06 hemna: you also have qualified specs
16:39:13 each of our pools has different capabilities and sizes. We put the pool name/id in the volume type, and we enter a default pool to use if no volume type is specified. It just works.
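A minimal sketch of the volume-type approach hemna describes, using python-cinderclient. The extra-spec key 'mydriver:pool', the pool names, and the credentials are hypothetical; each driver defines its own vendor-qualified keys.

    # Sketch of "put your pool in a volume type" (the approach hemna describes).
    # The extra-spec key 'mydriver:pool', pool names and credentials are
    # hypothetical; real drivers define their own vendor-qualified keys.
    from cinderclient.v2 import client

    c = client.Client('admin', 'password', 'admin',
                      'http://keystone.example.com:5000/v2.0')

    # One volume type per pool the admin wants to expose.
    gold = c.volume_types.create('gold-pool')
    gold.set_keys({'volume_backend_name': 'mybackend',
                   'mydriver:pool': 'flexvol_ssd'})

    bronze = c.volume_types.create('bronze-pool')
    bronze.set_keys({'volume_backend_name': 'mybackend',
                     'mydriver:pool': 'flexvol_sata'})

    # Users then pick a pool indirectly by picking a type.
    c.volumes.create(size=10, volume_type='gold-pool')

This also illustrates the cost navneet objects to in the next exchange: every pool the admin wants selectable needs its own volume type.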
16:39:22 hemna: which does not get evaluated at the scheduler
16:39:55 correct, that's why we need to make the scheduler aware of pools via get_volume_stats()
16:39:57 hemna: putting the pool name in the type is actually hiding the problem rather than solving it
16:39:57 update get_volume_stats to include capabilities for each volume type
16:39:59 if we do that, then everything just works.
16:40:08 on the contrary
16:40:15 I'm interested in seeing winston-d's PoC since I'm pretty sure he is planning on doing it exactly as I would
16:40:18 putting the pool in the volume type makes the admin directly aware
16:40:33 DuncanT: I will be interested too
16:40:42 dynamically creating drivers per pool is exactly as you described: hiding the problem.
16:40:51 DuncanT: are you coming up with any new approach?
16:40:56 I'll put it up tomorrow
16:40:59 hemna: no, it's not
16:41:19 hemna: it does not mandate that the admin create a volume type for each pool
16:41:19 ok, agree to disagree.
16:41:33 winston-d: I'm interested in seeing it as well; do you have a cinder-spec for it yet?
16:41:50 hemna: your volume types will keep increasing if you have pool names in the volume specs
16:41:52 winston-d: what is your proposal about?
16:41:56 winston-d: code is fine as well :) for a POC
16:42:00 Putting the pool name in the type is an unnecessary limitation IMO; making the scheduler aware of pools and just adding @pool to the end of the create host and making drivers understand that (or adding a pool param to the driver create and letting the manager unwrap it, if that seems cleaner) should work
16:42:13 what does "volume types increasing" mean? that doesn't make sense.
16:42:36 hemna: you need to put the pool name in the extra spec, right?
16:42:40 kmartin: not really, but I might get the code up into gerrit first. ;) but sure, I will submit a spec as well
16:42:54 * jungleboyj has to drop. Will bring my topic up in openstack-cinder or next week unless you guys want to discuss and I will catch up from the meeting notes.
16:42:55 winston-d: perfect, thanks
16:42:56 hemna: for each pool you need to do the same
16:42:56 navneet, I can't make sense of what you are writing, as you aren't using words.
16:43:22 hemna: a volume type ties specs with key-value pairs
16:43:26 jungleboyj: Sorry, didn't see your note about time on the agenda
16:43:49 hemna: the pool name should be in the spec
16:44:07 hemna: and so each pool will have a new volume type
16:44:20 hemna: as per the approach you mentioned
16:44:35 navneet: That certainly isn't true of the scheduler approach
16:44:58 DuncanT: the scheduler lessens the work for the admin
16:45:03 yeah, I don't see a problem with that actually. The admin knows about the pools on the backend. If they want them available or not available to use, it should be up to the admin to decide.
16:45:27 DuncanT: and it's dynamic, which does not require any changes to the driver to add new pools
16:45:42 hemna: I feel that's too much work for the admin
16:45:55 when the scheduler is made pool-aware, it treats every pool like a standalone backend; the only difference is that a pool itself doesn't really have to be a cinder-volume service
16:45:59 and they will hate pools if a new volume type needs to be created
16:46:10 winston-d, +1
16:46:14 winston-d++
16:46:17 winston-d_: I see your point
16:46:37 but do you know how many changes it requires?
16:46:59 there is no reason to create a new manager/driver instance just to support another pool on the same backend.
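A rough sketch of the pool-aware reporting winston-d and DuncanT describe: a single driver instance reports all of its pools in get_volume_stats(), and the scheduler treats each pool as a schedulable unit. The field names are illustrative; the exact report format and host-naming convention were still being settled at the time of this meeting.

    # Sketch of per-pool capability reporting for a pool-aware scheduler.
    # Field names are illustrative; the exact format was still under
    # discussion at the time of this meeting.
    class ExampleDriver(object):

        def get_volume_stats(self, refresh=False):
            # One driver instance, one backend, many pools: the scheduler
            # can then weigh each pool as if it were a standalone backend.
            return {
                'volume_backend_name': 'mybackend',
                'vendor_name': 'Example',
                'driver_version': '1.0',
                'storage_protocol': 'iSCSI',
                'pools': [
                    {'pool_name': 'flexvol_ssd',
                     'total_capacity_gb': 1024,
                     'free_capacity_gb': 512},
                    {'pool_name': 'flexvol_sata',
                     'total_capacity_gb': 4096,
                     'free_capacity_gb': 3000},
                ],
            }

With this, the driver only needs to learn which pool was chosen on create (for example via a pool suffix on the host string, or a pool parameter, as DuncanT suggests above), rather than spawning a driver instance per pool.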
16:47:02 navneet: the changes don't look that huge to me, though I might be missing some
16:47:25 DuncanT: is there a WIP for winston's changes?
16:47:43 what is the mechanism for the scheduler to know that multiple pools have "the same characteristics" to choose which pool to create volumes on?
16:47:47 I think winston-d's approach is far better, as long as the admin still has a way of creating a whitelist and/or a blacklist of which pools to use.
16:47:49 winston-d_: do you have a spec for it?
16:48:01 navneet: It does require a driver interface change so it has some way of telling the driver which pool to pick on create, but that seems a better hit than changing the driver singleton, which affects every driver more fundamentally
16:48:17 It's bad to force the use of all available pools on the array, as the array may be shared by non-OpenStack stuff.
16:48:27 navneet: I don't have a spec yet, but my code is almost ready for review
16:48:38 DuncanT: singleton drivers will be in separate threads
16:48:44 hemna: The driver get_stats decides what pools to expose...
16:48:57 DuncanT: no security issue
16:49:03 navneet: Threads don't help much... not security, races
16:49:17 DuncanT, we should have some standard mechanism for informing the driver which pools to whitelist/blacklist though, so it's consistent for all drivers.
16:49:23 DuncanT: I don't see that happening
16:49:30 we have green threads in place
16:49:53 navneet: I do. Two creates can now be started concurrently, inside the same process. That could not happen before
16:50:06 DuncanT: true
16:50:09 navneet: That is a *big* change
16:50:16 DuncanT: but only one will run
16:50:25 DuncanT: that's the beauty
16:50:29 Why will only one run?
16:50:32 10 minutes left. We have 3rd party CI in the queue
16:50:39 DuncanT: because we have green threads
16:50:48 DuncanT: within the same process
16:50:49 and for N pools, you'll have N drivers, which creates M connections to the backend. This seems like an explosion of resource usage to me as well.
16:51:12 yeah, let's progress on the agenda
16:51:26 DuncanT, winston-d_: are you guys working on alternative approaches?
16:51:28 navneet: You've still got problems if you e.g. cache backend info in your driver
16:51:36 navneet: yes, I am.
16:51:38 and with everyone using locks in their driver APIs, you don't really get any benefit of concurrency with the N drivers. It's just a hack.
16:51:47 navneet: winston-d has said repeatedly he has something nearly finished
16:51:52 winston-d_: is there a WIP out there? I'd like to coordinate
16:52:06 navneet: a WIP is coming soon from winston
16:52:11 navneet: I didn't put it up yet, will do tomorrow
16:52:21 #action winston-d to publish scheduler based WIP
16:52:34 Right, last topic, 3rd party CI
16:52:35 DuncanT: sure, I'd like to see that
16:52:41 #topic 3rd party CI
16:52:55 winston-d_: thanks
16:53:56 Has anybody looked at asselin's branch? Or got any other comments?
16:54:16 hemna: are your Jenkins slaves all running on VMs?
16:54:26 Can I get a URL for the branch?
16:54:43 https://github.com/rasselin/os-ext-testing/tree/nodepool
16:54:53 xyang1_, we are working on getting nodepool support in with jaypipes' code
16:55:21 we haven't tested FC PCI passthrough yet either.
16:55:28 hemna: and this branch that was linked, is that the WIP?
16:55:49 akerr, yeah
16:55:56 is it usable?
16:56:07 I think it's close
16:56:11 might be worth trying
16:57:21 hemna: do you have the service account all set up? Can tests be triggered manually without that?
16:57:36 xyang1_, we are just triggering manually for now
16:57:50 akerr, asselin_ will be online in the cinder channel later and should be able to answer any questions, he is just in another meeting currently
16:57:51 until everything is dialed in, then we'll get the service account set up
16:58:04 tests can be manually triggered without a service account, or you can attach to the event stream with a personal account (just don't vote with a personal account)
16:58:11 hemna: Are the tests supposed to run only with drivers already merged?
16:58:23 hemna: I had that problem last time with the cert test
16:58:35 kmartin: thanks, I have wall-to-wall meetings today, but I'll look for him later on
16:58:43 hemna: it was designed to test merged code, and it erased my new driver changes when I started the test
16:58:45 xyang1_, ideally, they need to pull the patch that triggered the event.
16:59:06 hemna: so I had to change the script to work around that
16:59:23 hemna: we may have to do that for the CI test, for new driver testing
16:59:27 guys - are you close to wrapping up? the next meeting starts in 1 minute
16:59:46 tjones: Ok, thanks
16:59:50 yeah, it'll have to get the patchset that triggered the CI test, once it's integrated with the upstream
16:59:56 30 seconds
17:00:03 Right all, time to move things to the cinder channel.....
17:00:15 #end meeting
17:00:19 DuncanT: thanks
17:00:26 #endmeeting
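Relating to the 3rd party CI discussion above, a minimal sketch of attaching to the Gerrit event stream with a personal (non-voting) account and reacting to new cinder patchsets. The account name, key path, and trigger_job() helper are placeholders; a real setup such as the os-ext-testing/nodepool branch being discussed does far more.

    # Sketch: listen to the Gerrit event stream and react to cinder patchsets.
    # Account name, key path and trigger_job() are placeholders.
    import json

    import paramiko


    def trigger_job(ref):
        # Placeholder: a real CI would kick off a Jenkins/zuul job that checks
        # out this ref (the patchset that triggered the event) before testing.
        print('would run tests against %s' % ref)


    def listen_for_cinder_patches():
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect('review.openstack.org', port=29418,
                    username='my-ci-account', key_filename='/path/to/id_rsa')
        _stdin, stdout, _stderr = ssh.exec_command('gerrit stream-events')
        for line in stdout:
            event = json.loads(line)
            if (event.get('type') == 'patchset-created'
                    and event['change']['project'] == 'openstack/cinder'):
                trigger_job(event['patchSet']['ref'])


    if __name__ == '__main__':
        listen_for_cinder_patches()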