16:00:42 <jgriffith> #startmeeting cinder
16:00:43 <openstack> Meeting started Wed Oct 30 16:00:42 2013 UTC and is due to finish in 60 minutes. The chair is jgriffith. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <rushiagr> yeah!!
16:00:46 <openstack> The meeting name has been set to 'cinder'
16:00:49 <rushiagr> o/
16:00:52 <jgriffith> dosaboy:
16:00:53 <duncanT> Hey
16:01:07 <rushiagr> jusst on time :)
16:01:08 <jgriffith> dosaboy: with us?
16:01:11 <jgriffith> rushiagr: :)
16:01:14 <dosaboy> oh hey yes
16:01:16 <duncanT> Glad you shouted, totally forgotten the clocks change altered the meeting time
16:01:16 <avishay> hi
16:01:20 <dosaboy> goddam DST ;)
16:01:28 <jgriffith> #topic backup support for metadata
16:01:30 <jgriffith> dosaboy: LOL
16:01:34 <jungleboyj> Howdy all!
16:01:38 <jgriffith> jungleboyj: yo
16:01:40 <dosaboy> ok so
16:01:55 <avishay> https://wiki.openstack.org/wiki/CinderMeetings
16:02:09 <guitarzan> duncanT: it's ok, the rest of us will show up at the wrong time next week
16:02:25 <guitarzan> hmm, or the week after I suppose
16:02:31 <jungleboyj> guitarzan: +2
16:02:32 <dosaboy> so i had a few discussions now about backlup metadata
16:02:42 <dosaboy> few opionions flying around
16:02:49 <dosaboy> but
16:02:59 <dosaboy> i think the best way forward is as follows
16:03:16 <dosaboy> each backup driver will backup a set of volume metadata
16:03:27 <dosaboy> that 'set' of metadata will come from a common api
16:03:34 <dosaboy> presented to all drivers
16:03:38 <dosaboy> which will be versioned
16:03:56 <dosaboy> this will allow for the volume to recreated from scratch
16:04:02 <dosaboy> should the db/cinder cluster get lost
16:04:04 <jgriffith> dosaboy: why back up the metadata at all?
16:04:11 <dosaboy> (some caveats tbd)
16:04:20 <jgriffith> dosaboy: ahh... db recovery
16:04:21 <avishay> dosaboy: i noted on the BP that you need an import_backup too
16:04:23 <dosaboy> well (queue DuncanT)
16:04:33 <dosaboy> avishay
16:04:35 <dosaboy> yes
16:04:41 <dosaboy> I have created a sperate BP for that
16:04:50 <dosaboy> basiucally all this back stuff is mushromming a bit ;)
16:04:52 <avishay> Oh OK, cool - please link the BPs
16:04:55 <dosaboy> back/backup
16:05:09 <dosaboy> yeah sorry i am all over the shop this week
16:05:11 <duncanT> The vision I've always tryed to keep for backup is that it *is* for disaster recovery
16:05:15 <dosaboy> trying to keep up
16:05:19 <jgriffith> duncanT: DR of what though?
16:05:28 <caitlin56> dosaboy: how would this work if we allowed the Volume Drivers to do backups?
16:05:29 <jgriffith> duncanT: it's not an HA implementation of Cinder
16:05:35 <jgriffith> duncanT: at least I don't think it should be
16:05:48 <duncanT> jgriffith: Cidner volumes. Even if your cinder cluster caught frie orr got stolen, you can still get you volume(s) back
16:06:03 <duncanT> Gah, typing fail
16:06:06 <caitlin56> dosaboy: having the Volume Driver do the backup would be more efficient when the drives are external, but we need a definitive format.
16:06:07 <jgriffith> duncanT: if it's cinder volumes I ask again, why even back up metadata
16:06:08 <avishay> jgriffith: if you backup to a remote site and you lose your entire cinder, your backups should remain usable
16:06:19 <dosaboy> caitlin56: not sure what you mean
16:06:20 <jgriffith> caitlin56: besides the point right now
16:06:27 <jungleboyj> duncanT: So, the data about what volumes there were?
16:06:33 <jgriffith> avishay: that's Cinder DR not volume backup
16:06:44 <duncanT> jgriffith: because certain volumes are useless without at least some metadata (e.g. the bootable flags and glace metadata for licensing)
16:06:46 <jgriffith> what I'm saying is that they're two very different things
16:06:53 <avishay> jgriffith: everything's connected :)
16:06:57 <jgriffith> duncanT: ok.. I'm going ot try this one more time
16:07:06 <jgriffith> avishay: duncanT
16:07:07 <jgriffith> first:
16:07:19 <jungleboyj> avishay: Oh, ok, backup of the volumes is separate and then this backs up the data for accessing them. Right?
16:07:22 <jgriffith> My thought regarding the purpose of volume backup service is to backup volumes
16:07:41 <jgriffith> what you're proposing now is bleeding over into the db contents
16:07:45 <jgriffith> however...
16:07:57 <jgriffith> if you're going to do that, then I would argue that you have to go all the way
16:08:02 <dosaboy> jgriffith: since we need to backup the metadata, we could just shove it in the backend and effectively get DR for free
16:08:11 <jgriffith> in other words just backing up the meta is only part of the story
16:08:21 <dosaboy> it is not much effort to get that done
16:08:26 <jgriffith> dosaboy: ummm... I don't think it's that simple honestly
16:08:29 <thingee> o/
16:08:34 <jgriffith> dosaboy: quotas, limits etc etc
16:08:41 <dosaboy> well,
16:08:41 <jgriffith> all of those things exist in the DB
16:08:48 <avishay> dosaboy: it's DR with high RPO and RTO
16:08:48 <jgriffith> snapshots
16:08:50 <duncanT> quotas, limits etc I don't think are part of the volume
16:08:54 <dosaboy> that is where my versioned api comes in
16:08:57 <dosaboy> so
16:09:01 <jgriffith> duncanT: neither is metadata
16:09:10 <dosaboy> the idea is that we define a sufficient set of metadata
16:09:13 <bswartz> if you want to recover from your whole cinder going up in smoke you need to mirror the whole cinder DB
16:09:15 <duncanT> But the stuff needed to use the volume *is* part of the volume
16:09:18 <caitlin56> Backing up the metadata with the data is relatively easy, it's standardizing it and being compatible with existing backups that takes work.
16:09:22 <jgriffith> I honestly think this makes things WAY more complicated than we should
16:09:37 <dosaboy> caitlin56: hence the versioning
16:09:38 <avishay> caitlin56: yes
16:09:40 <jgriffith> if you want cinder DR then implement an HA cinder setup
16:09:44 <duncanT> bswartz: The backup API allows you to choose (and pay for in certain cases) a safe, cold storage copy of you volume
16:09:54 <jgriffith> if you want to back up databases, back them up using a backup service
16:10:09 <duncanT> I *don't* want to back up the database
16:10:10 <winston-d> caitlin56: but can we 'backup' the metadata in db?
16:10:10 <caitlin56> Backing up the CinderDB means that you would be restoring *all* volumes.
16:10:19 <jgriffith> duncanT: but that's your argument here
16:10:28 <jgriffith> the only reason to backup metadata is if the db is lost
16:10:30 <duncanT> jgriffith: No it isn't
16:10:37 <duncanT> (my arguement)
16:10:44 <jgriffith> ok... then why backup the metadata at all?
16:10:48 <caitlin56> If you want to restore selective volumes then you need selective metadata (or no metadata, which is what jgriffith is arguing).
16:11:07 <duncanT> Right, I want a backup to be a disaster resistant copy of a volume.
16:11:13 <jgriffith> duncanT: what's the logic to backing up the metadata?
16:11:13 <bswartz> if you're not worried about the database going away, then there's no point to making more copies of the metadata
16:11:20 <duncanT> Including everything you need to use *that volume*
16:11:29 <duncanT> Not *all volumes*
16:11:34 <duncanT> Not *cinder config*
16:11:43 <duncanT> Just the volume I've said is important
16:11:53 <duncanT> Otehrwise use a snapshot
16:12:00 <jgriffith> You're still not realy answering my question
16:12:05 <winston-d> bswartz: yes, there is. we 'snapshot' the metadata, in DB
16:12:18 <caitlin56> duncanT: can you site some specific metadata fields that you would not know how to set when restoring just the volume payload?
16:12:19 <jgriffith> bswartz: +1
16:12:30 <jgriffith> bswartz: which is my whole point
16:12:42 <jgriffith> bswartz: and what I'm trying to get duncanT to explain
16:12:43 <duncanT> caitlin56: The bootable flag. The licensing info held in the glance metadata
16:13:00 <jgriffith> introducing things like "disaster resistant" isn't very helpful to me :)
16:13:02 <avishay> winston-d: needs to be consistent
16:13:04 <bswartz> winston-d: you're imagining that the metadata might change and you want to restore it from an old copy?
16:13:07 <jgriffith> that's a bit subjective
16:13:16 <duncanT> I'd like to be able to import a backup into a clean cinder install
16:13:18 <winston-d> bswartz: correct
16:13:26 <jgriffith> duncanT: ahh... that's VERY different!
16:13:34 <jgriffith> duncanT: that's volume import or migration
16:13:42 <jgriffith> that's NOT volume backup
16:13:43 <bswartz> winston-d: seems reasonble
16:13:47 <duncanT> jgriffith: No, it's backup and restore
16:13:52 <avishay> no, it's not migration
16:13:58 <duncanT> jgriffith: Even if cinder dies, catchs fire etc
16:14:01 * dosaboy is sitting on the fence whistling
16:14:05 <caitlin56> Does anyone have examples of metadagta that SHOULD NOT be restored when you migrate a volume from one site to another?
16:14:09 <duncanT> jgriffith: The backup should be enough to get my working volume back
16:14:12 <avishay> migration is within one openstack install
16:14:41 <duncanT> I put that on my first ever backup slide, and I propose to keep it there
16:14:43 <jgriffith> avishay: yeah...
16:14:59 <jgriffith> avishay: duncanT alright we're obviously not going to agree here
16:15:03 * jungleboyj is enjoying the show.
16:15:05 <jgriffith> avishay: well maybe you and I will
16:15:12 <avishay> haha
16:15:25 <winston-d> avishay: backup is within one cinder install as well, no?
16:15:25 <jgriffith> duncanT: fine, so you want a "cinder-volume service" backup
16:15:30 <avishay> i'm with duncanT on this one
16:15:46 <avishay> winston-d: i can backup to wherever i want - think geo-distributed swift even
16:15:46 <jgriffith> haha
16:15:53 <dosaboy> avishay: +1
16:15:56 <jgriffith> WTF?
16:16:01 <duncanT> jgriffith: I just want the volume I backup to come back, even if cinder caught fire in the mean time
16:16:09 <jgriffith> right
16:16:10 * bswartz is nervou about conflating backup/restore use cases with DR use cases
16:16:21 <bswartz> nervous*
16:16:22 <jgriffith> the key is "cinder caugh fire"
16:16:26 <avishay> if i backup to geo-distributed swift, and a meteor hits my datacenter, i can rebuild and point my new metadata to existing swift objects
16:16:27 <caitlin56> What is a backup? If it is not enough to restore a volume to a new context then why not just replicate snapshots?
16:16:34 <jgriffith> sorry.. buy you're saying "backup as a service" in openstack as a whole IMO
16:16:34 <duncanT> backup has been DR since day one on the design spec
16:16:50 <jgriffith> You're not talking about restoring a volume anymore
16:17:09 <winston-d> avishay, duncanT don't forget to backup volume types, and qos alone with volume and metadata
16:17:09 <guitarzan> where is that leap coming from?
16:17:12 <jgriffith> you're talking about all the little nooks and crannies that your specific install/implementation may require
16:17:14 <bswartz> duncanT: I don't agree that taking backups is the best way to implement DR -- it's *a* way, but a relatively poor one
16:17:21 <avishay> backup and replication are on the same scale, with different RPO/RTO and fail-over methods
16:17:25 <jgriffith> and what's worse is you're saying "I only care about metadata"
16:17:26 <winston-d> avishay: duncanT 'cos those two are considered to be 'metadata' to the voluem as well
16:17:32 <duncanT> The problem is, when I first wrote backup, we didn't have bootable flags, or volume encryption... you got the bits back onto an iscsi target and you were back in business
16:17:33 <jgriffith> but somebody else says "but I care about quotas"
16:17:38 <guitarzan> winston-d has an interesting point
16:17:40 <jgriffith> and somebody else cares about something else
16:17:44 <duncanT> Glance metadata was the first bug
16:17:47 <bswartz> the real value of backups is when you don't have a disaster, but you've corrupted your data somehow
16:17:57 <jgriffith> It doesn't end until you backup all of cinder and the db
16:17:59 <avishay> winston-d: but i can put the data into a volume with different qos and it still works
16:18:01 <duncanT> Types or rate limits don't stop me using a volume
16:18:12 <guitarzan> bswartz: isn't that a disaster? :)
16:18:15 <jgriffith> duncanT: they on't stop YOU
16:18:17 <jgriffith> that's the key
16:18:26 <jgriffith> they may stop others though depending on IMPL
16:18:40 <bswartz> guitarzan: no -- that's a snafu
16:18:42 <duncanT> jgriffith: They don't stop a customer...
16:18:49 <jgriffith> duncanT: they don't stop your customers
16:18:52 <jgriffith> duncanT: they stop mine
16:19:09 <jgriffith> duncanT: I have specific heat jobs that require volumes of certain types
16:19:10 <bswartz> there's a difference between users screwing up their own data, and a service provider having an outage
16:19:14 <duncanT> jgriffith: The point is, right now, even if you've got your backup of a bootable volume, it is useless if cinder looses stuff out the DB
16:19:14 <winston-d> avishay but that's not the original volume (not data) any more.
16:19:28 <jgriffith> duncanT: perhaps
16:19:32 <winston-d> avishay: i mean data is, volume is not
16:19:39 <caitlin56> Would it always make sense when restoring a single volume to a new datacenter to preserve the prior QoS/quotas/etc.?
16:19:39 <avishay> winston-d: that's philosophical :P
16:19:48 <duncanT> jgriffith: You can't restore it in such a way that you can boot from it. At all.
16:19:53 <dosaboy> all this could be accounted for by simply defining a required matadata set
16:19:58 <jgriffith> duncanT: but the purpose of the backup IMO is if your backend takes a dump a user can get his data back, or as bswartz pointed out a user does "rm -rf /*"
16:20:06 <dosaboy> i don't see why that would be so complex
16:20:07 <winston-d> dosaboy: +1
16:20:16 <avishay> it's all of these use cases
16:20:19 <jgriffith> dosaboy: I don't disagree with that
16:20:36 <dosaboy> deliberating whther this or that metadata is required is ot reall for this conversation
16:20:39 <duncanT> jgriffith: Right now, I CAN'T GET MY BOOTABLE VOLUME TO BOOT
16:20:43 <avishay> it's rm -rf, it's a fire, it's a meteor
16:20:43 <duncanT> It jsut can't be done
16:20:45 <caitlin56> dosaboy: agreed, if we are going to backup metadata, we need to define filters on the metadata so only things that should be kept are.
16:20:55 <jgriffith> duncanT: keep yelling
16:21:04 <jgriffith> duncanT: I'll keep ignoring :)
16:21:12 <guitarzan> that's the bug, you can't boot a restored backup
16:21:14 <avishay> jgriffith: accidental caps lock ;)
16:21:26 <duncanT> jgriffith: Sorry, I was out of line a touch there
16:21:28 <jgriffith> avishay: haha... I don't think that's the case
16:21:36 <avishay> jgriffith: be optimistic :)
16:21:45 <jgriffith> dosaboy: so like I said in IRC the other day....
16:22:01 <dosaboy> yarp
16:22:03 <jgriffith> dosaboy: I'm fine with it being implemented, I could care less
16:22:03 <winston-d> i'd like to consider the volume as a virtual hard drive.
16:22:03 <duncanT> jgriffith: But a snapshot covers the rm -rf case
16:22:18 <jgriffith> dosaboy: I have the info in my DB so I really don't care
16:22:37 <jgriffith> dosaboy: If you're an idiot and you don't back up you db then hey.. this at least will help you
16:22:49 <jgriffith> dosaboy: but something else is going to bight you in the ass later
16:22:54 <winston-d> whatever it takes to backup a virtual hard drive, that's what we should do in cinder backup.
16:22:56 <caitlin56> Allowing snapshot replication would deal with disaster recovery issues, but not with porting a volume to a new vendor.
16:23:03 <dosaboy> damn it, i'm back on the fence again
16:23:28 <dosaboy> i kinda think the only way to resolve this is to have a vote
16:23:33 <jgriffith> dosaboy: I also think that things like metadata would be good in an "export/import" api
16:23:48 <avishay> dosaboy: democracy doesn't work in these situations i think :)
16:23:50 <thingee> I missed the beginning of this convo. Why are people opposed to it restoring metadata?
16:23:52 <jgriffith> dosaboy: duncanT but like I said, it doesn't *hurt* me if you put metadata there
16:23:54 <dosaboy> lol
16:23:59 <duncanT> I totally agree that theya re part of export too
16:24:08 <duncanT> And transfer for that matter
16:24:23 <dosaboy> ok so, i have implement a chunk of this,
16:24:23 <avishay> what is "volume export"?
16:24:30 <duncanT> Certain volumes are literally useless if you loose certain bits of their metadata
16:24:33 <dosaboy> why don't i see if i can knock uo the rest
16:24:38 <dosaboy> and then if you like...
16:24:42 <jgriffith> dosaboy: I do think someone should better define the purpose of cinder-backup though
16:24:50 <jgriffith> dosaboy: that's fine by me
16:24:52 <winston-d> thingee: i think we are discussing about where to save the copy of metadata for a volume backup, in DB or in Swift/Ceph/Sth else
16:24:53 <dosaboy> jgriffith: totally agree
16:24:57 <zhiyan> winston-d: if we thinking volume as a virtual hard driver, so can we export it as a package, like ovf? it contains metadata
16:25:00 <jgriffith> dosaboy: like I said, I won't object to backing it up at all
16:25:00 <jungleboyj> jgriffith: +1
16:25:09 <jgriffith> dosaboy: but I don't want to have misleading expectations
16:25:10 <dosaboy> i would have asked for a session if i had not confrmed HK so late
16:25:21 <jgriffith> dosaboy: this is nowhere near a Cinder DR recovery backup
16:25:25 <jgriffith> and I don't want to make it one
16:25:35 <jgriffith> errr...
16:25:38 <jgriffith> s/recover//
16:25:43 <dosaboy> there are many a stong opinion on this one ;)
16:25:43 <thingee> winston-d: I think we've discussed before to leave it to the object store.
16:25:46 <caitlin56> jrgriffith: we're debating what a backup is good for.
16:25:51 <thingee> as object store metadata
16:26:07 <avishay> jgriffith: what is "volume export"?
16:26:27 <jgriffith> avishay: non-existent :)
16:26:44 <avishay> jgriffith: it looks like i'm leading the "volume import" session, so thought I should know :)
16:26:53 <jgriffith> avishay: the idea/proposal was to be able to kick out volumes from Cinder without deleting them off the backend
16:27:00 <dosaboy> put it this way, as long as the necessary metadat is backed up (either way)
16:27:03 <avishay> jgriffith: ah OK
16:27:03 <jgriffith> and then obviously an import to pull in existing volumes
16:27:03 <dosaboy> noone gets hurt
16:27:08 <winston-d> thingee: yeah, but as i said to dosaboy the other day, i missed that discussion
16:27:11 <jgriffith> dosaboy: agreed
16:27:32 <dosaboy> soi'll keep going in the track i'm on
16:27:47 <dosaboy> and we can take a look at what i get done
16:27:51 <avishay> jgriffith: quotas are tricky on export
16:27:53 <dosaboy> if we don't like it then fine
16:28:02 <thingee> dosaboy: are you still storing it in object store metadata?
16:28:09 <dosaboy> thingee: yes
16:28:12 <jgriffith> avishay: indeed
16:28:18 <dosaboy> but each driver will have it's own way
16:28:20 <duncanT> avishay: I'd suggest that once something is kicked out of cinder, then it can't take up a cinder quota?
16:28:22 <winston-d> avishay: and types, extra specs, qos?
16:28:23 <caitlin56> dosaboy: be sure to escape the metadata then.
16:28:32 <jgriffith> duncanT: +1
16:28:35 <avishay> duncanT: but it still takes up space on disk
16:28:54 <jgriffith> avishay: yeah so that's the counter, however you kicked it out
16:28:58 <duncanT> avishay: Not cinder's problem once you've explicitly decided it isn't cinder's problem
16:29:06 <jgriffith> avishay: that tenant can't 'access' via cinder anymore
16:29:17 <jgriffith> avishay: it's troubling....
16:29:27 <avishay> jgriffith: yes. is export needed?
16:29:29 <jgriffith> avishay: I'm having the same dilema with adding a purge call to the API
16:29:39 <caitlin56> Quotas are tricky, because they are implicitly part of the cinder context where the backup was made.
16:29:47 <jgriffith> avishay: well that's another good question :)
16:29:48 <dosaboy> ooh i had not thought of that
16:30:03 <dosaboy> so how do backups count towards quota atm?
16:30:06 <avishay> wait...i'm sorry i forked the conversation
16:30:14 <avishay> people are getting confused
16:30:14 <duncanT> dosaboy: Currently they don't
16:30:24 * rushiagr definitely is
16:30:27 <avishay> we moved to talk about "volume export" without finishing backup metadata
16:30:29 <dosaboy> and i presume that is bad?
16:30:34 * jgriffith went down the fork nobody else is on
16:30:44 <avishay> haha
16:30:50 <caitlin56> dosaboy: backups should count in your *object* storage quoata, not cinder.
16:31:12 <avishay> i think we can hash out export at the session. i think we should also find time to talk about backup metadata.
16:31:17 <dosaboy> caitlin56: but they don't necessarily go to an object store
16:31:33 <dosaboy> like if you offload to TSM
16:31:51 <caitlin56> dosaboy: when the backup is to an object store, we should let that object store track/report the data consumption.
16:32:07 <thingee> caitlin56: +1
16:32:09 <jungleboyj> caitlin56: That makes sense.
16:32:22 <jungleboyj> Cinder can't run the whole world.
16:32:29 <thingee> I don't want this problem where multiple projects are tracking the same resource quota again.
16:32:30 <jungleboyj> ...yet.
16:32:42 <dosaboy> anyway i don't wanna hijack this meeting anymore
16:33:29 <dosaboy> i think quotas in backup are discussion to be had though
16:33:30 <jgriffith> dosaboy: too bad :)
16:33:37 <avishay> dosaboy: duncanT: can you come up with a set of metadata to back up and we'll discuss in person next week? we can do an ad-hoc session? jgriffith sound good?
16:33:46 <dosaboy> sure
16:33:50 <duncanT> Sure
16:34:04 * jungleboyj is sorry he is going to miss that discussion. The IRC version was fun!
16:34:08 <jgriffith> avishay: sure... but like I said, if dosaboy just wants to implement backup of metadata with a volume I have no objection
16:34:19 <dosaboy> we can all sit around a nice campfire
16:34:19 <jgriffith> so it would be a short conversation on my part :)
16:34:25 <avishay> ok :)
16:34:25 <dosaboy> DuncanT can make the cocoa
16:34:27 <jgriffith> no way!!!!
16:34:32 <jungleboyj> dosaboy: +1
16:34:33 <jgriffith> I know duncanT would push me into the fire
16:34:39 <dosaboy> haha
16:34:39 <avishay> that way we know if duncanT is yelling or just hit the caps lock ;)
16:34:42 <avishay> hahah
16:34:49 <jungleboyj> :-)
16:35:07 <dosaboy> don't worry, i'll bring the whiskey
16:35:16 <jungleboyj> dosaboy: +2
16:35:19 <jgriffith> alright...
16:35:21 <jgriffith> well that was fun
16:35:33 <jgriffith> Is Ehud around?
16:35:37 <EhudTrainin> yes
16:35:44 <jgriffith> EhudTrainin: welcome
16:35:49 <EhudTrainin> hi
16:35:51 <jgriffith> #topic fencing
16:36:21 <jgriffith> https://blueprints.launchpad.net/cinder/+spec/fencing-and-unfencing
16:36:27 <jgriffith> for those that haven't seen it ^^
16:36:33 <jgriffith> EhudTrainin: I'll let you kick it off
16:37:07 <EhudTrainin> It is about adding fencing functionality to Cinder
16:37:24 <EhudTrainin> in order to support HA for instances
16:38:08 <avishay> blueprint explains it pretty well
16:38:15 <EhudTrainin> by ensuring the instances on a failed host would not try to access the storage after they rebuilt on another host
16:38:35 <jungleboyj> avishay: +1
16:38:46 <duncanT> My concerns here are the failure cases... partitioned storage etc...
16:38:47 <winston-d> avishay: +1, nice write up in the bp. haven't seen any one like that for quite a while.
16:38:49 <rushiagr> +1 for beutifully explaining in BP
16:38:50 <jgriffith> yeah, the bp is well written (nice job on that by the way)
16:39:21 <avishay> duncanT: please elaborate
16:39:41 <duncanT> It's easy to say it's hard for nova so cinder should do it, but exactly the same problems exist in cinder failure cases... like cinder looses communitication with a storage backend
16:40:13 <zhiyan> EhudTrainin: i'm thinking how to cinder identify those attachment session for a volume. seems need prevent such race condition issue.. IMO
16:40:20 <avishay> it's not hard for nova - it's impossible for nova. the server is in a bad state and we can't trust it to do the right thing.
16:40:21 <jgriffith> I guess my only real question was:
16:40:22 <hemna> fencing ?
16:40:37 <jgriffith> 1. why would the compute host try and access the storage?
16:40:44 <jgriffith> 2. why do we necessarily care?
16:41:03 <jgriffith> I should clarify before somebody flips out...
16:41:18 <EhudTrainin> I think that if the storage does not response then fence will fail, but it is lower probability to both host and storage fail at the same time
16:41:25 <jgriffith> If a compute host *fails* and instances are migrated
16:41:43 <jgriffith> it should IMO be up to nova to disable the atachments on the *failed* compute host
16:41:50 <avishay> jgriffith: it might not fail completely - it might just lose connectivity or go into some other bad state
16:41:58 <jgriffith> avishay: sure
16:42:12 <jgriffith> avishay: but it migrated instances right?
16:42:13 <hemna> shouldn't nova deal with detaching and reattaching elsewhere ?
16:42:22 <jgriffith> hemna: that's kinda what I was saying
16:42:30 <thingee> I agree with jgriffith. it should be up to whatever is doing the migration.
16:42:42 <avishay> wait
16:42:45 <hemna> since Nova should know that it had to migrate the instance to another host, it has the knowledge of the state
16:42:46 <jgriffith> we could really screw some things up if we make incorrect decisions
16:42:53 <duncanT> But currently there is no way of telling cinder 'make sure this is teally detached', I don't think
16:42:58 <avishay> EhudTrainin: if nova brings up the instance on another VM, does it have the same instance ID?
16:43:02 <duncanT> *totally detached
16:43:18 <EhudTrainin> This is not exactly migation, but a rebuild, since the host from Nova point of view has failed
16:43:25 <caitlin56> jgriffith: I agree, nova knows the client state accurately. It should deal with the results of that changing.
16:43:25 <winston-d> jgriffith: migrate the instance may or maynot stop the old one from connecting cinder
16:43:34 <hemna> avishay, I'd asume since it's a rebuild,it would be a new instance id
16:43:48 <jgriffith> winston-d: yeah... but I'm saying it *should*
16:43:50 <dosaboy> sounds like we expect the same piece of code that may fail to migrat and instance to be sane enough to ensure a fence
16:43:51 <hemna> could be a poor assumption though
16:44:10 <jgriffith> dosaboy: that could be double un-good
16:44:31 <dosaboy> yeah, i may be missing how this would be done though
16:44:33 <avishay> the use case is that the nova server is not responsive but the VM continues to run and access the storage
16:44:43 <EhudTrainin> It may rebuild with the same IP and attach it to the same volume
16:44:46 <jgriffith> avishay: ahhh
16:44:46 <caitlin56> dosaboy: could you propose something where we are protecting the volume rather than doing nova's work for it?
16:45:02 <duncanT> So if a compute node goes wonkey, when is it safe to reattach a volume that was attached to that host to another instance?
16:45:03 <jgriffith> avishay: so rogue vm's that nova can't get to anymore
16:45:12 <avishay> jgriffith: yes
16:45:14 <jgriffith> duncanT: never probably
16:45:26 <jgriffith> avishay: so who does the fencing?
16:45:30 <avishay> but now i'm think that maybe this is only a problem if we have multi-attach?
16:45:30 <duncanT> jgriffith: Indeed. I think the idea of fence is 'make it safe to do that'
16:45:34 <avishay> jgriffith: i assume the admin
16:45:39 <jgriffith> avishay:ie who makes the call
16:45:48 <jgriffith> avishay: and why not just send a detach/disconnect
16:45:52 <caitlin56> rogueVMs aren't something we shoudl solve - at most we should protect the volume from rogue VMs.
16:46:00 <dosaboy> caitlin56: not sure what you mean there
16:46:03 <hemna> jgriffith, +1
16:46:22 <duncanT> caitlin56: The idea here *is* to protect the volume for a rogue VM
16:46:26 <dosaboy> caitlin56: how does cinder know what a rogue vm is though?>
16:46:27 <winston-d> jgriffith: detach failed
16:46:34 <duncanT> I guess it is a disconnect on steroids
16:46:48 <winston-d> jgriffith: 'cos nova compute is not reachable
16:46:58 <caitlin56> dosaboy: exactly, we don't want cinder falsely deciding a VM is rogue.
16:47:00 <hemna> dosaboy, cinder doesn't know. only nova does
16:47:06 <jgriffith> duncanT: yeah, I'm assumign that's what the implementation would basicly be here
16:47:09 <dosaboy> catlin56: ah gotcha
16:47:34 <hemna> I think nova should drive this and I'm not sure what cinder needs to do during the fencing process. nova should detach from the rogue vm
16:47:36 <jgriffith> Ok... so interesting scenarios
16:47:39 <avishay> EhudTrainin: want to comment?
16:47:40 <jgriffith> here's my take...
16:47:57 <thingee> hemna: +1 nova should be driving this
16:48:01 <thingee> cinder does not have enough information
16:48:05 <duncanT> hemna: If the compute node stops talking, nova can't to the detach from the VM
16:48:11 <jgriffith> if you want to implement a service/admin API call to force disconnect from a node and ban a node I *think* that's ok
16:48:17 <winston-d> hemna: what if nova failed to detach volume, the only hope is to beg cinder to help
16:48:21 <EhudTrainin> An instance may be rougue when there is no connection to nova-compute of its host, but in future further indication may used to decide a host is failed
16:48:29 <jgriffith> quite honestly I'm worried about the bugs that will be logged due to operator/admin error though
16:48:30 <winston-d> hemna: cinder is on the end of the connection
16:48:38 <hemna> winston-d, cinder failed in that case as well no ?
16:48:53 <jgriffith> winston-d: EhudTrainin hemna avishay duncanT caitlin56 thoughts on my comment ^^
16:49:00 <avishay> jgriffith: +1
16:49:07 <hemna> jgriffith, +1
16:49:18 <duncanT> jgriffith: I totally agree this is basically a force call
16:49:21 <thingee> I think we all understand the use case now. nova node is not responsive, rouge vms. Still we keep coming back to cinder not having enough information. I think really though nova should be driving this still in handling this situation happening.
16:49:31 <winston-d> jgriffith: +1
16:49:38 <avishay> thingee: what's missing?
16:49:39 <dosaboy> +1
16:49:40 <hemna> if nova can't talk to the n-cpu process on the host, nova can't really detach the volume.
16:49:43 <jgriffith> thingee: yeah, but if we step back....
16:49:43 <duncanT> jgriffith: My single concern is how to signal to the caller that the call failed
16:50:02 <jgriffith> thingee: allowing an admin to force disconnect and ban a node from connecting I'm ok
16:50:03 <EhudTrainin> The problem Nova can't take care of this, since a failure indication does not ensure the the instance is not talking to the storage or won't do it after some time
16:50:23 <jgriffith> duncanT: can you explain?
16:50:25 <jgriffith> sorry
16:50:33 <winston-d> duncanT: signal for what? force detach failed?
16:50:52 <duncanT> jgriffith: If cinder can't talk to the storage backend, it can't force the detach...
16:50:52 <thingee> EhudTrainin: I apologize, but this is the first time I'm hearing about the bp. I'll check it out to understand more, but as jgriffith mentioned I fear the bugs in automating this.
16:51:00 <thingee> I like the idea of admin call to force detach though
16:51:02 <duncanT> winston-d: Yes, forced detach failed
16:51:26 <duncanT> winston-d: It is far from a show-stopper, just want to ensure it is thought of
16:51:31 <hemna> thingee, but if n-cpu isn't reachable on the host....nova can't detach the volume.
16:51:36 <avishay> duncanT: if nobody can talk to the VM and nobody can talk to the storage, i guess you need to pull the plug :) 16:51:40 <winston-d> duncanT: a-synch call, no call back. please come back query the result. 16:51:52 <jgriffith> ok 16:52:07 <jgriffith> so EhudTrainin I think the take-away (and I can add this to bp if you like) 16:52:10 <jgriffith> is: 16:52:13 <duncanT> winston-d: That's fine, yup, just need to remember to add the queryable status :-) 16:52:31 <jgriffith> 1. Add an admin API call to attempt force disconnect of an attachment/s to a specified node/IP 16:52:36 <hemna> the best you can do in that case is ask cinder to disconnect from the backend, that'll eventually leave a broken LUN on the host, which will give i/o errors for the host and the vm. 16:52:38 <thingee> hemna: cinder still doesn't have the information needed though to act. Maybe this bp explains that..I haven't read it yet. 16:52:53 <jgriffith> Ummm... hmm 16:53:05 <xyang__> avishay: if this is done by cinder, what happened to the entries in nova db 16:53:10 <jgriffith> so then what :) 16:53:24 <hemna> thingee, cinder has the volume and attachment info in it's db. it can call the right backend to disconnect 16:53:26 <avishay> xyang__: good question - EhudTrainin ? 16:53:39 <thingee> this is a nova node HA problem. I'm not sure why we're trying to solve it with cinder. 16:53:40 <winston-d> xyang__: nova is the caller, it should know what to do about the block-device-mapping 16:53:43 <hemna> this is just icky 16:54:02 <thingee> hemna: that's not the problem 16:54:05 <xyang__> winston-d: nova is not working here, right 16:54:09 <thingee> hemna: the problem is cinder doesn't know to act 16:54:22 <xyang__> winston: this will be a cinder operation completely 16:54:24 <jgriffith> xyang__: a particular compute node is trashed 16:54:28 <avishay> so this should start at n-api and then to cinder? 16:54:31 <hemna> thingee, well not yet. 
:) we were talking about forcing a disconnect from cinder. 16:54:32 <duncanT> thingee: Nova knows it has lost track of a vm and can make the call, yes? 16:54:50 <hemna> duncanT, yah 16:54:54 <EhudTrainin> when the instance is rebuilt, the volume is detached and then attached. the rebuild would be done only after fencing to avoid possible conflict 16:55:05 <thingee> duncanT, hemna: as jgriffith mentioned, I think making a call to force detach is good. But nova should make that call 16:55:13 <hemna> thingee, +1 16:55:14 <thingee> or an admin 16:55:15 <winston-d> xyang__: nova compute is not working, not the entire nova, e.g. nova-api is still working 16:55:18 <jgriffith> EhudTrainin: ok, now you kinda lost me 16:55:20 <duncanT> thingee +1 16:55:36 <xyang__> winston-d: ok. 16:55:45 <hemna> winston-d, but the host needs n-cpu to be working in order to detach the volume from the VM and the host 16:55:59 <jgriffith> hemna: no 16:56:05 <jgriffith> hemna: it just needs n-api 16:56:10 <jgriffith> hemna: n-api can call cinder 16:56:11 <avishay> EhudTrainin: i think the question is - why can't this be implemented in nova, where nova-api calls detach/terminate_connection for all volumes attached to the host? 16:56:24 <thingee> jgriffith: but what tells n-api? 16:56:26 <hemna> n-cpu does the work to detach the volume from the hypervisor and the host kernel 16:56:30 <jgriffith> thingee: LOL 16:56:36 <jgriffith> thingee: excellent question :) 16:56:55 <jgriffith> thingee: and now we're back to admin, in which case who cares if it's direct to cinder api from admin etc 16:57:02 <winston-d> hemna: in the case when n-cpu is on fire, n-api has to call for help from cinder 16:57:04 <thingee> again, I really think this is a nova node HA case. I don't see anything right now that cinder can know to act on. 16:57:15 <hemna> winston-d, yah I think that's the only option.
16:57:18 <jgriffith> thingee: I agree 16:57:29 <EhudTrainin> The detach command does not ensure that the instance on the malfunctioning host would not try to access the storage 16:57:42 <jgriffith> so you all keep saying things like "nova node on fire" "nova node is unreachable" etc etc 16:58:01 <jgriffith> if the nova node is so hosed it's probably not going to be making iscsi connections anyway 16:58:09 <hemna> EhudTrainin, correct, but if the cinder backend driver disconnects from the storage, the host will get i/o errors when the vm/host tries to access the volume. 16:58:14 <avishay> what about file system mounts, where terminate_connection doesn't do anything? 16:58:17 <jgriffith> I have an easy solution... 16:58:25 <thingee> so we all agree...force detach exposed. Leave it to the people handling the instances. If a nova node catches fire, there better be another nova node available to catch rogue vms and communicate with cinder 16:58:26 <jgriffith> ssh root@nova-node; shutdown -h now 16:58:26 <hemna> EhudTrainin, effectively detaching the volume....but with a dangling LUN 16:58:27 <winston-d> thingee: cinder doesn't and doesn't have to know. cinder just provides help, in the case when n-cpu is broken and no one can reach n-cpu. 16:58:39 <jgriffith> if that doesn't work log in to the pdu and shut off power 16:58:49 <thingee> winston-d: so when does cinder do the detach to help? 16:58:58 <hemna> thingee, +1 16:59:00 <avishay> jgriffith: what if the server's management network is down, but it's still accessing a storage network? 16:59:04 <duncanT> jgriffith: It says in the blueprint you don't always have a PDU 16:59:14 <hemna> avishay, re: Fibre Channel ? :P 16:59:17 <jgriffith> avishay: that's where the unplug came from LOL 16:59:25 <avishay> hemna: or a separate ethernet network 16:59:25 <jgriffith> duncanT: sighh 16:59:29 <caitlin56> hemna: if nova is hosed then it shouldn't be surprising that getting everything working again will not be trivial.
16:59:32 <jgriffith> call the DC monkey 16:59:43 <avishay> hah 16:59:56 <jgriffith> alright, we're spiraling 16:59:57 <thingee> time's up. 17:00:06 <winston-d> thingee: n-api finds out n-cpu is on fire, it'd like to re-create another vm on another n-cpu. but n-api failed to disconnect vol, it has to call for cinder's help 17:00:07 <hemna> throw a grenade and run. next! 17:00:14 <jgriffith> EhudTrainin: it's an interesting idea but there are some very valid concerns here IMO 17:00:15 <avishay> how about this case: we have an NFS mount on the host. disconnect today does nothing. how do we stop the VM from accessing it? 17:00:18 <duncanT> The way I'm reading the blueprint here, all it is asking for is a force_disconnect_and_rogue_reconnections 17:00:26 <duncanT> avishay: Kill the export? 17:00:42 <hemna> avishay, heh, that's why jgriffith and I complained about the NFS unmount code :) 17:00:46 <jgriffith> I think we're all fine with an admin extension to force disconnect 17:00:52 <jgriffith> let's start with that and go from there 17:00:57 <jgriffith> everybody ok with that? 17:01:01 <hemna> +1 17:01:05 <jgriffith> of course with NFS you're just screwed 17:01:19 <winston-d> +1 17:01:30 <jgriffith> Ok... we can theorize more in #openstack-cinder if you like 17:01:30 <thingee> +1 17:01:32 * hartsocks waves 17:01:32 <EhudTrainin> I think beyond force disconnect we would also want to prevent the nova-compute on that host from creating new connections 17:01:35 <jgriffith> thanks everybody 17:01:37 <hemna> yah good luck deploying a cloud w/ NFS :P 17:01:41 <jgriffith> #endmeeting