16:13:17 <DuncanT-> #startmeeting Cinder
16:13:18 <openstack> Meeting started Wed Nov 13 16:13:17 2013 UTC and is due to finish in 60 minutes. The chair is DuncanT-. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:13:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:13:22 <openstack> The meeting name has been set to 'cinder'
16:13:37 <winston-d> DuncanT-: good call
16:13:44 <DuncanT-> avishay isn't here yet AFAICT
16:14:09 <DuncanT-> Is Ehud Trainin here?
16:14:17 <ehudtr> yes
16:14:21 <jgriffith> go figure
16:14:28 <DuncanT-> #topic Fencing
16:14:44 <DuncanT-> Oh, hi Jogn, all yours ;-)
16:15:02 <DuncanT-> Jogn? John even
16:15:07 <jgriffith> :)
16:15:25 <jgriffith> Looks like Ehud is hanging with Dave W
16:15:41 <DuncanT-> ehudtr: The stage is all yours....
16:15:59 <ehudtr> Following the last discussion I agree with two of your comments regarding the fencing implementation
16:16:13 <ehudtr> I accept your comment that fencing should also take care of detaching the volumes at the Cinder level.
16:16:28 <ehudtr> I accept your comment that it is not necessary to add a blacklist of hosts or an unfence method into Cinder.
16:17:03 <ehudtr> I do think it should be possible to fence/force-detach a host through a new force-detach-host method, rather than trying to change the current detach-volume method to force a detachment at the storage level and calling that method for each one of the volumes attached to a host.
16:17:40 <ehudtr> for several reasons
16:17:43 <ehudtr> In the case of NFS it is not possible to force-detach a volume.
16:18:00 <ehudtr> In cases where it is possible, there would still be a problem once shared volumes are supported by Cinder
16:18:06 <jgriffith> ehudtr: TBH in NFS we don't really ever detach to begin with :)
16:18:23 <ehudtr> It is an optimization, which may be valuable for fast recovery: send 1 request rather than N (e.g. 100) requests
16:19:37 <caitlin56> Isn't "NFS support" something the specific volume driver would be responsible for?
16:19:53 <jgriffith> I still have the same concerns I raised previously WRT the complication and potential for errant fencing to occur
16:20:11 <ehudtr> I think we would like to prevent access at the storage level
16:20:15 <caitlin56> jgriffith: +1
16:20:16 <jgriffith> My only other question is "is this a real problem"
16:20:48 <jgriffith> anybody else have any thoughts on this?
16:21:01 <guitarzan> is there a proposal written up somewhere?
16:21:02 <winston-d> live migration, maybe
16:21:04 <DuncanT-> I think it is a real problem, yes. We do something very similar to a fence in compute node startup
16:21:11 <jgriffith> https://blueprints.launchpad.net/cinder/+spec/fencing-and-unfencing
16:21:16 <guitarzan> jgriffith: thanks!
16:21:16 <ehudtr> Fencing is something standard done in HA clusters like Pacemaker
16:21:32 <winston-d> someone reported such an issue when using ceph
16:21:46 <jgriffith> too bad this wouldn't apply to Ceph :)
16:22:04 <dosaboy> winston-d: did they raise a bug for that in the end, i did not see one
16:22:20 <jungleboyj> I think as OpenStack is moving to HA we need to be considering how Cinder fits into that.
16:22:21 <ehudtr> The current option in OpenStack to do a rebuild without first fencing is a bug in my opinion
16:22:52 <winston-d> dosaboy: no, i don't think so.
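[The optimization ehudtr describes above is collapsing N per-volume detach requests into a single host-scoped request that the backend can act on once. A minimal sketch of the two shapes of the call, using entirely hypothetical names -- list_attachments, force_detach_volume, and force_detach_host do not exist in Cinder here; they only stand in for the proposal:]

```python
# Hypothetical sketch only: these client calls do not exist in Cinder; they
# illustrate the per-volume vs. per-host fencing shapes discussed above.

def fence_failed_host(cinder, failed_host):
    # Shape 1: force-detach each volume -- N requests for N attachments,
    # and an NFS backend has no per-volume connection to tear down.
    for attachment in cinder.list_attachments(host=failed_host):
        cinder.force_detach_volume(attachment['volume_id'])

    # Shape 2 (the proposal): one request; the driver decides how to cut the
    # whole host off at the storage level (NFS export list, iSCSI ACLs, ...).
    cinder.force_detach_host(failed_host)
```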
16:22:58 <jgriffith> jungleboyj: don't confuse this with HA impl
16:23:05 <jgriffith> Ok
16:23:17 <jgriffith> ehudtr: sounds like folks are in favor of moving forward on this
16:23:35 <jungleboyj> jgriffith: Ah, sorry.
16:23:42 <jgriffith> ehudtr: My concerns, as I stated, are just how to do this cleanly and mitigating having admins shoot themselves
16:23:53 <guitarzan> as a counter example, we were looking at a way to do the exact opposite and whitelist at the IP level
16:24:13 <jgriffith> guitarzan: I think that's kinda how the LIO target works
16:24:43 <jgriffith> guitarzan: or how it "could" work I guess, right now we just read the connector info but it has hooks to have a white list
16:25:15 <guitarzan> jgriffith: nice, I'll have to look at that
16:25:24 <ehudtr> I think one possible way to prevent an admin from pressing the fence button too easily is enabling it only if a host is in a failed state
16:25:41 <DuncanT-> I can see the point of the idea... waiting to see code before I have a strong opinion...
16:25:48 <DuncanT-> How does cinder know a host is failed?
16:25:58 <jgriffith> guitarzan: https://github.com/openstack/cinder/blob/master/cinder/brick/iscsi/iscsi.py#L441
16:26:24 <ehudtr> Cinder need not know the host is failed; Nova and possibly Heat will know
16:26:33 <winston-d> DuncanT-: notified by Nova?
16:27:00 <caitlin56> I agree, cinder cannot determine this itself. It needs to come from other OpenStack components.
16:27:03 <jgriffith> so add an API call to "notify-invalid-iqn's" or something of the sort?
16:27:23 <jgriffith> and how is it cleared :)
16:27:39 <jgriffith> nova has to then have another command to add something back in
16:27:55 <jgriffith> honestly seems like this all needs to happen in nova first
16:28:01 <caitlin56> Wouldn't it be a one-time transition: "Clear any attached volumes held by this compute instance"?
16:28:08 <jgriffith> I think there's more work there than on this side (ie failure detection etc)
16:28:32 <jgriffith> caitlin56: sure, but if you blacklist a node, what happens when it comes back up and you want to add it back in to your cluster
16:28:44 <jgriffith> You have to clear it somehow
16:29:06 <jgriffith> It's not just clear the current attach, if I understand ehudtr correctly
16:29:07 <caitlin56> A node is attempting to re-attach without having been in contact with nova?
16:29:22 <jgriffith> ehudtr: it's clear an attach and prevent that attach from being reconnected, no?
16:29:29 <jungleboyj> jgriffith: I was assuming that this would be managed by Nova and Cinder would just provide the tools.
16:29:34 <jgriffith> thus the term "fencing"
16:29:44 <jgriffith> jungleboyj: yes, that's what I'm getting at
16:29:58 <jgriffith> jungleboyj: a good deal of nova work before getting to Cinder
16:30:09 <jgriffith> and the Cinder side might not be so tough to implement
16:30:12 <jungleboyj> jgriffith: +2
16:30:20 <jgriffith> emphasis on *might*
16:30:22 <jgriffith> :)
16:30:25 <caitlin56> I can definitely see the need for nova being able to tell Cinder "this guy is gone" but "don't talk to this guy" is something more dangerous.
16:30:28 <jungleboyj> jgriffith: Just wanted to emphasize that.
16:30:45 <caitlin56> Couldn't nova use neutron to enforce that without bothering Cinder?
16:30:50 <jgriffith> caitlin56: but I think that's what we're talking about... ehudtr ?? ^^
16:30:54 <winston-d> and that's nova's problem
16:31:04 <jgriffith> caitlin56: nope
16:31:07 <ehudtr> Yes, I agree this would be managed by Nova. The fence host is needed to disconnect the host at the storage level
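[On jgriffith's LIO pointer above: as he notes, the helper reads the connector info and grants access per attaching initiator, which is already a whitelist of sorts, so a fence/clear hook has a natural place to sit. A rough sketch of that idea follows; the class and method names (FencedTargetHelper, _grant_acl, fence, unfence) are illustrative, not the actual brick API:]

```python
# Hypothetical sketch: a fence list sitting in front of the per-attach
# initiator ACL a target helper builds from the connector info.
# Names here are illustrative, not the real cinder/brick interface.

class FencedTargetHelper(object):
    def __init__(self):
        self._fenced = set()   # initiator IQNs blocked by a fence request

    def initialize_connection(self, volume, connector):
        iqn = connector['initiator']
        if iqn in self._fenced:
            raise RuntimeError('initiator %s is fenced' % iqn)
        self._grant_acl(volume, iqn)   # normal whitelist-style ACL grant

    def fence(self, iqn):
        self._fenced.add(iqn)          # driven by a "host failed" notification

    def unfence(self, iqn):
        self._fenced.discard(iqn)      # the explicit "clear" step asked about

    def _grant_acl(self, volume, iqn):
        # Backend-specific: add the IQN to the target's ACL (omitted here).
        pass
```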
16:31:18 <jgriffith> I don't want Neutron mucking about with my data path
16:31:26 <DuncanT-> caitlin56: Nope, neutron doesn't get in the way of the storage network usually
16:31:42 <jgriffith> ehudtr: but the question is you also want to prevent the failed node from connecting again, right?
16:31:59 <jgriffith> I think this is where the debate started last week :)
16:32:17 <jungleboyj> ehudtr: The important part is that you don't have the 'failed' host's node attempting to access the storage while the new node is being brought up.
16:32:50 <jungleboyj> jgriffith: I would assume that is only if a new node has taken over.
16:32:59 <caitlin56> Such a "failed node" shouldn't be using *any* OpenStack services, right? It's not just Cinder.
16:33:04 <jgriffith> Ok, two more minutes for this topic and then I think we should move on
16:33:22 <jgriffith> jungleboyj: sure
16:33:29 <jgriffith> caitlin56: haha... probably
16:33:37 <DuncanT-> caitlin56: For some services multiple accesses don't matter... for block storage it does
16:33:44 <jungleboyj> caitlin56: Depends on how it fails. In situations like this you need to cover all the bases.
16:33:46 <jgriffith> this is where I went all wonky in the last discussion :)
16:34:04 <ehudtr> Yes, this was part of the original suggestion, but last meeting you suggested that preventing the failed node from attaching while a new node is created may be done in Nova. I checked it and it seems this could be done in Nova only after attach-volume is moved from nova-compute to nova-conductor.
16:34:23 <jgriffith> Ok... so my proposal:
16:34:26 <jgriffith> ehudtr:
16:34:31 <jgriffith> 1. Take a look at the nova work
16:34:38 <jgriffith> Focus on things like failure detection
16:34:48 <jgriffith> How you would generate a notification
16:35:03 <jgriffith> what would be needed from cinder (if anything, versus disabling the initiator files etc)
16:35:14 <jgriffith> 2. After getting things sorted in Nova
16:35:14 <ehudtr> I know how to do failure detection with Nova
16:35:29 <jgriffith> Great, step 1 is almost done then :)
16:35:38 <guitarzan> hah
16:35:46 <jgriffith> Then put together the proposal, make sure the Nova team is good with that
16:36:00 <jgriffith> from there we can work on providing an API in Cinder to fence off initiators
16:36:09 <jgriffith> I'd like to see what the code for that looks like though
16:36:16 <jgriffith> and how to clear it
16:36:22 <jgriffith> ehudtr: sound reasonable?
16:36:27 <ehudtr> yes
16:36:38 <jgriffith> DuncanT-: guitarzan jungleboyj caitlin56 ok ^^
16:36:43 <jgriffith> winston-d:
16:36:46 <DuncanT-> Sounds sensible to me
16:36:54 <jungleboyj> jgriffith: Sounds good to me. Good summary to keep moving forward.
16:37:17 <jgriffith> winston-d: seem like that works for the error case you were thinking of?
16:37:36 <jgriffith> dosaboy: I have no idea how to make this work with Ceph but that's why you're an invaluable asset here :)
16:37:54 <jgriffith> hemna_: you'll have to figure out FC :)
16:38:10 <jgriffith> Ok...
16:38:10 <guitarzan> just go yank the cable out
16:38:18 <jgriffith> guitarzan: I'm down with that
16:38:28 <jgriffith> guitarzan: DC Monkey.. fetch me that cable!
16:38:35 <hodos> hi guys, we at Nexenta are implementing storage-assisted volume migration; we have run into a problem: there can be multiple storage hosts connected to a single NFS driver. So there's a one-to-many mapping...
16:38:52 <winston-d> dsfasd
16:39:01 <jgriffith> #topic patches and release notes
16:39:07 <jgriffith> winston-d: what's dsfasd?
16:39:10 <dosaboy> jgriffith: it may actually be easier for ceph since it has the notion of 'watchers'
16:39:13 <guitarzan> winston-d: lagging?
16:39:17 <jgriffith> dosaboy: yeah :)
16:39:17 <winston-d> jgriffith: sorry, lagging
16:39:21 <jgriffith> haha
16:39:22 <DuncanT-> hodos: Please wait until the "any other business" section of the meeting
16:39:23 <jgriffith> no prob
16:39:33 <jgriffith> So quick note on this topic
16:39:40 <hodos> ok, sorry )
16:39:55 <DuncanT-> #topic patches and release notes
16:40:00 <jgriffith> reviewers, when adding a patch that's associated with a BP or a bug, I'd like for us to update the doc/src/index file
16:40:14 <jgriffith> that way I don't have to go back and try to do it every milestone :)
16:40:26 <jgriffith> Same format as what's there
16:40:30 <jgriffith> simple summary, link
16:40:34 <jgriffith> sound reasonable?
16:40:50 <jgriffith> rolling release notes :)
16:41:16 * jgriffith takes silence as agreement :)
16:41:20 <winston-d> sounds good
16:41:25 <jungleboyj> jgriffith: Sounds reasonable.
16:41:27 <jgriffith> or just plain lack of interest and apathy
16:41:31 <jgriffith> kk
16:41:37 <jgriffith> now to the hard stuff :)
16:41:45 <DuncanT-> Something for reviewers to catch I guess...
16:41:51 <jgriffith> DuncanT-: to catch, yes
16:42:02 <winston-d> we did agree to write a cinder dev doc, right? make sure this is documented as well
16:42:10 <jgriffith> not a horribly big deal but it would be helpful IMO
16:42:11 <jungleboyj> jgriffith: So just to be clear as a newbie ...
16:42:12 <DuncanT-> Might be able to get a bot to catch simple cases after a while
16:42:14 <jgriffith> winston-d: excellent point
16:42:30 <jgriffith> DuncanT-: hmmm... perhaps a git hook, yes
16:42:34 <jungleboyj> jgriffith: If I approve something associated with a BP I would need to go update that file with appropriate information?
16:42:41 <jgriffith> jungleboyj: oh.. no
16:42:50 <jgriffith> jungleboyj: so the idea is that the submitter would add it
16:43:00 <jgriffith> when core reviews it we should look for that entry
16:43:18 <jgriffith> if people hate the idea or think it's a waste that's ok
16:43:20 <jgriffith> just say so
16:43:26 <jgriffith> I don't mind doing it the way I have been
16:43:30 <caitlin56> and if you don't approve patches without that link people should learn very quickly.
16:43:37 <jgriffith> just don't complain if your change isn't listed :)
16:43:55 <jungleboyj> jgriffith: Ahhh, ok ... That makes more sense. Thanks for the clarification.
16:44:03 <DuncanT-> I'd say we try it for a couple of weeks and see how it works out
16:44:08 <jgriffith> works for me
16:44:14 <jgriffith> trial basis
16:44:18 <jgriffith> kk
16:44:29 <jgriffith> #topic summit summary
16:44:36 <jgriffith> hmmm
16:44:44 <jgriffith> #topic summit-summary
16:44:49 <jgriffith> come on meetbot
16:45:01 <DuncanT-> #topic summit-summary
16:45:08 <DuncanT-> I started the meeting
16:45:12 <jungleboyj> It knows I am also in another summit summary meeting and my head may explode.
16:45:23 <jgriffith> lol
16:45:30 <jgriffith> DuncanT-: thanks :)
16:45:33 <jgriffith> okie
16:45:37 <jungleboyj> o-)
16:45:53 <jgriffith> I threw a quick overview together: https://etherpad.openstack.org/p/cinder-icehouse-summary
16:46:05 <jgriffith> of course it's handy to review the etherpads from the sessions
16:46:15 <jgriffith> but I wanted to capture the main points in one doc
16:46:36 <jgriffith> I *think* these are the items that we had moderate consensus on
16:46:54 <jgriffith> the capabilities reporting maybe not so much... but I'm still pushing to go this route
16:47:15 <jgriffith> We can always make things *harder* but I don't know that we should
16:47:24 <DuncanT-> I'd add that we seemed to agree a state machine with atomic transitions was a good route to try, re taskflow
16:47:35 <jgriffith> DuncanT-: for sure
16:47:53 <jgriffith> added
16:48:10 <jgriffith> anything else glaring that I missed (that somebody will actually get to)?
16:48:26 <winston-d> jgriffith: i think most of us in this room agreed on those capabilities
16:48:32 <caitlin56> Making snapshots a first layer object.
16:48:33 <jgriffith> I left the import out intentionally for now by the way
16:48:35 <DuncanT-> My take-away from the capabilities reporting was that we couldn't agree on anything at all
16:48:45 <jgriffith> DuncanT-: I don't think that's really true
16:48:55 <jgriffith> I think one person didn't agree
16:49:05 <DuncanT-> jgriffith: I see no harm in adding it anyway
16:49:06 <jgriffith> I think most of us agreed with what I've put on the list
16:49:28 <jgriffith> winston-d: I think you may have some other ideas/adds that would be fine as well
16:49:41 <jgriffith> anyway...
16:49:51 <jgriffith> anything pressing to add or remove here?
16:49:53 <DuncanT-> jgriffith: O'll try to BP the stuff that redhat guy eventually explained, since it seemed valuable once I finally understood him
16:50:02 <DuncanT-> s/O'll/I'll/
16:50:03 <jgriffith> We need more thought/info around the ACLs I think
16:50:24 <jgriffith> DuncanT-: cool
16:50:31 <winston-d> DuncanT-: can't wait to see the BP
16:50:33 <jgriffith> DuncanT-: or work off the etherpad for now
16:50:46 <jgriffith> whichever is faster and more effective
16:50:58 <jgriffith> if we reach consensus prior to the BP it might help :)
16:51:01 <DuncanT-> jgriffith: Etherpad might be easiest... I'll post a link when I'm done
16:51:10 <jgriffith> sounds good
16:51:24 <jgriffith> Everybody should feel free to add notes and questions to the etherpad
16:51:32 <jgriffith> but the intent is not to open debate
16:51:42 <jgriffith> just to focus on what's there and build the ideas up
16:51:54 <jgriffith> and use that info to build blueprints
16:51:57 <jgriffith> and assign :)
16:52:14 <jgriffith> anybody want to talk more on that?
16:52:38 <jgriffith> if not I believe hodos had some things to talk about
16:52:41 <jgriffith> #topic open
16:52:45 * jgriffith never learns
16:52:51 <DuncanT-> #topic open
16:52:58 <hodos> ok so it touches not only Nexenta
16:53:02 <jungleboyj> :-)
16:53:12 <jgriffith> hodos: back up... what's "it"
16:53:27 <jgriffith> afraid I ignored you earlier :)
16:53:54 <hodos> so if we want to do storage-to-storage migration without routing data through Cinder
16:53:57 <hodos> on NFS
16:54:11 <hodos> we have 2 NFS drivers
16:54:38 <caitlin56> hodos: our bigger priority is enabling more operations on snapshots. I think the fix required for NFS is too much to tackle by icehouse.
16:55:02 <thingee> lawl, time change
16:55:06 <hodos> so how does the source driver know what storage host to use on the dest
16:55:10 <jgriffith> thingee: :)
16:55:17 <winston-d> thingee: you made it
16:55:29 <jgriffith> hodos: scheduler could help with that
16:55:33 <caitlin56> Speaking of snapshots, I didn't hear any opposition to enabling snapshot replication. Shouldn't that be added to your list jgriffith?
16:55:34 <jgriffith> that is its job after all
16:55:59 <hodos> yes, but when I issue a command on the source storage driver
16:56:06 <hodos> i need that info
16:56:15 <jgriffith> hodos: :)
16:56:24 <vito-ordaz> u
16:56:35 <jgriffith> hodos: frankly this is why I hate the whole "I'm going to talk to this backend directly" problem
16:56:55 <vito-ordaz> update_volume_stats for the NFS driver does not provide information about the host
16:57:05 <jgriffith> I'm not a fan of trying to implement Cinder-based replication
16:57:11 <hodos> hmm
16:57:27 * jgriffith thinks it's a bad idea
16:57:31 <thingee> why does the driver need to know which host (sorry, catching up)
16:57:44 <caitlin56> jgriffith the alternative is inefficient replication
16:57:51 <jgriffith> thingee: he wants to talk directly from his backend to his *other* backend
16:58:04 <hodos> to *my* other backend
16:58:05 <jgriffith> caitlin56: actually the alternative is cinder doesn't do replication
16:58:12 <winston-d> vito-ordaz: feel free to add any capability that you want to report, just note that the scheduler can only consume some of them (basic ones)
16:58:21 <med_> maybe hodos is not a "he"
16:58:27 <hodos> i am
16:58:29 <hodos> )
16:58:36 <jgriffith> med_: fair
16:58:43 <med_> or not.
16:58:48 <jgriffith> everyone... I apologize for being gender specific
16:58:50 <thingee> jgriffith, hodos: what's the use case?
16:59:06 <guitarzan> thingee: migration from one backend to another
16:59:32 <guitarzan> homogeneous
16:59:33 <thingee> why does a driver have to know? shouldn't cinder just be the bridge with that knowledge?
16:59:33 <hodos> yes, say the same vendor, so these backends know how to talk
16:59:35 <vito-ordaz> the problem is that NFS drivers can control many storage backends at the same time.
16:59:43 <thingee> of another backend that meets that requirement... the scheduler will figure
16:59:47 <guitarzan> thingee: so they can do it more cheaply
16:59:59 <winston-d> hodos: don't we have a shortcut in migration?
16:59:59 <guitarzan> it sounds hard to me :)
17:00:05 <thingee> so cinder says, hey you two backends, talk to each other
17:00:13 <hodos> )
17:00:20 <thingee> the other backend never initiates it, is my point
17:00:26 <thingee> jgriffith: that's time
17:00:28 <jgriffith> For the record, I don't think we should even implement replication in Cinder
17:00:28 <guitarzan> got it
17:00:44 <guitarzan> DuncanT- has to throw the switch today
17:00:46 <jgriffith> Let the sysadmin set up replication between devices if available and create a volume-type for it
17:00:57 <DuncanT-> Right, I'm afraid we need to move channels
17:01:02 <med_> yep
17:01:02 <DuncanT-> #endmeeting
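[On the migration question that ran out of time above: one way jgriffith's "scheduler could help with that" can play out is through the driver migration hook, where the scheduler-chosen destination is handed to the source driver along with its reported capabilities. A hedged sketch, assuming the migrate_volume(context, volume, host) driver hook of that era and a made-up 'vendor_location' capability; whether the NFS drivers report enough in host['capabilities'] to identify the destination share is exactly the gap vito-ordaz points out:]

```python
# Illustrative sketch only. migrate_volume(context, volume, host) is the
# driver-assisted migration hook; the 'vendor_location' capability below is
# invented to show how a destination backend might be recognized for a
# storage-assisted (backend-to-backend) copy.

class ExampleNfsDriver(object):
    def migrate_volume(self, context, volume, host):
        caps = host.get('capabilities', {})
        dest = caps.get('vendor_location')   # hypothetical capability
        if not dest:
            # Destination not recognized: decline, let generic migration copy
            # the data through the host instead.
            return (False, None)
        self._backend_copy(volume, dest)     # vendor-specific data path
        return (True, None)

    def _backend_copy(self, volume, dest):
        pass  # backend-to-backend replication/copy call, omitted here
```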