16:00:42 <jgriffith> #startmeeting cinder
16:00:43 <openstack> Meeting started Wed Oct 30 16:00:42 2013 UTC and is due to finish in 60 minutes.  The chair is jgriffith. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <rushiagr> yeah!!
16:00:46 <openstack> The meeting name has been set to 'cinder'
16:00:49 <rushiagr> o/
16:00:52 <jgriffith> dosaboy:
16:00:53 <duncanT> Hey
16:01:07 <rushiagr> just on time :)
16:01:08 <jgriffith> dosaboy: with us?
16:01:11 <jgriffith> rushiagr: :)
16:01:14 <dosaboy> oh hey yes
16:01:16 <duncanT> Glad you shouted, I'd totally forgotten the clock change altered the meeting time
16:01:16 <avishay> hi
16:01:20 <dosaboy> goddam DST ;)
16:01:28 <jgriffith> #topic backup support for metadata
16:01:30 <jgriffith> dosaboy: LOL
16:01:34 <jungleboyj> Howdy all!
16:01:38 <jgriffith> jungleboyj: yo
16:01:40 <dosaboy> ok so
16:01:55 <avishay> https://wiki.openstack.org/wiki/CinderMeetings
16:02:09 <guitarzan> duncanT: it's ok, the rest of us will show up at the wrong time next week
16:02:25 <guitarzan> hmm, or the week after I suppose
16:02:31 <jungleboyj> guitarzan: +2
16:02:32 <dosaboy> so i had a few discussions now about backup metadata
16:02:42 <dosaboy> few opinions flying around
16:02:49 <dosaboy> but
16:02:59 <dosaboy> i think the best way forward is as follows
16:03:16 <dosaboy> each backup driver will backup a set of volume metadata
16:03:27 <dosaboy> that 'set' of metadata will come from a common api
16:03:34 <dosaboy> presented to all drivers
16:03:38 <dosaboy> which will be versioned
16:03:56 <dosaboy> this will allow for the volume to be recreated from scratch
16:04:02 <dosaboy> should the db/cinder cluster get lost
16:04:04 <jgriffith> dosaboy: why back up the metadata at all?
16:04:11 <dosaboy> (some caveats tbd)
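For reference, a minimal sketch of the kind of common, versioned metadata API dosaboy describes above; the class, method and field names here are illustrative assumptions, not the actual patch under review.

    import json

    METADATA_VERSION = 1  # bumped whenever the serialized layout changes


    class BackupMetadataAPI(object):
        """Versioned volume-metadata blob handed to every backup driver."""

        def get(self, volume):
            """Serialize what is needed to recreate `volume` from scratch.

            `volume` is assumed to be a dict-like row from the volumes table.
            """
            return json.dumps({
                'version': METADATA_VERSION,
                'volume-base-metadata': {
                    'id': volume['id'],
                    'size': volume['size'],
                    'display_name': volume.get('display_name'),
                    'bootable': volume.get('bootable', False),
                },
                # e.g. glance image properties needed to boot/license the volume
                'volume-glance-metadata': volume.get('glance_metadata', {}),
                # user-defined key/value metadata
                'volume-metadata': volume.get('metadata', {}),
            })

        def put(self, blob):
            """Deserialize a blob written by this or an older format version."""
            meta = json.loads(blob)
            if meta.get('version', 0) > METADATA_VERSION:
                raise ValueError('backup metadata version %s is newer than '
                                 'this cinder understands' % meta['version'])
            return meta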
16:04:20 <jgriffith> dosaboy: ahh...  db recovery
16:04:21 <avishay> dosaboy: i noted on the BP that you need an import_backup too
16:04:23 <dosaboy> well (queue DuncanT)
16:04:33 <dosaboy> avishay
16:04:35 <dosaboy> yes
16:04:41 <dosaboy> I have created a separate BP for that
16:04:50 <dosaboy> basically all this backup stuff is mushrooming a bit ;)
16:04:52 <avishay> Oh OK, cool - please link the BPs
16:05:09 <dosaboy> yeah sorry i am all over the shop this week
16:05:11 <duncanT> The vision I've always tried to keep for backup is that it *is* for disaster recovery
16:05:15 <dosaboy> trying to keep up
16:05:19 <jgriffith> duncanT: DR of what though?
16:05:28 <caitlin56> dosaboy: how would this work if we allowed the Volume Drivers to do backups?
16:05:29 <jgriffith> duncanT: it's not an HA implementation of Cinder
16:05:35 <jgriffith> duncanT: at least I don't think it should be
16:05:48 <duncanT> jgriffith: Cinder volumes. Even if your cinder cluster caught fire or got stolen, you can still get your volume(s) back
16:06:06 <caitlin56> dosaboy: having the Volume Driver do the backup would be more efficient when the drives are external, but we need a definitive format.
16:06:07 <jgriffith> duncanT: if it's cinder volumes I ask again, why even back up metadata
16:06:08 <avishay> jgriffith: if you backup to a remote site and you lose your entire cinder, your backups should remain usable
16:06:19 <dosaboy> caitlin56: not sure what you mean
16:06:20 <jgriffith> caitlin56: besides the point right now
16:06:27 <jungleboyj> duncanT: So, the data about what volumes there were?
16:06:33 <jgriffith> avishay: that's Cinder DR not volume backup
16:06:44 <duncanT> jgriffith: because certain volumes are useless without at least some metadata (e.g. the bootable flags and glance metadata for licensing)
16:06:46 <jgriffith> what I'm saying is that they're two very different things
16:06:53 <avishay> jgriffith: everything's connected :)
16:06:57 <jgriffith> duncanT: ok.. I'm going to try this one more time
16:07:06 <jgriffith> avishay: duncanT
16:07:07 <jgriffith> first:
16:07:19 <jungleboyj> avishay: Oh, ok, backup of the volumes is separate and then this backs up the data for accessing them.  Right?
16:07:22 <jgriffith> My thought regarding the purpose of volume backup service is to backup volumes
16:07:41 <jgriffith> what you're proposing now is bleeding over into the db contents
16:07:45 <jgriffith> however...
16:07:57 <jgriffith> if you're going to do that, then I would argue that you have to go all the way
16:08:02 <dosaboy> jgriffith: since we need to backup the metadata, we could just shove it in the backend and effectively get DR for free
16:08:11 <jgriffith> in other words just backing up the meta is only part of the story
16:08:21 <dosaboy> it is not much effort to get that done
16:08:26 <jgriffith> dosaboy: ummm... I don't think it's that simple honestly
16:08:29 <thingee> o/
16:08:34 <jgriffith> dosaboy: quotas, limits etc etc
16:08:41 <dosaboy> well,
16:08:41 <jgriffith> all of those things exist in the DB
16:08:48 <avishay> dosaboy: it's DR with high RPO and RTO
16:08:48 <jgriffith> snapshots
16:08:50 <duncanT> quotas, limits etc I don't think are part of the volume
16:08:54 <dosaboy> that is where my versioned api comes in
16:08:57 <dosaboy> so
16:09:01 <jgriffith> duncanT: neither is metadata
16:09:10 <dosaboy> the idea is that we define a sufficient set of metadata
16:09:13 <bswartz> if you want to recover from your whole cinder going up in smoke you need to mirror the whole cinder DB
16:09:15 <duncanT> But the stuff needed to use the volume *is* part of the volume
16:09:18 <caitlin56> Backing up the metadata with the data is relatively easy, it's standardizing it and being compatible with existing backups that takes work.
16:09:22 <jgriffith> I honestly think this makes things WAY more complicated than we should
16:09:37 <dosaboy> caitlin56: hence the versioning
16:09:38 <avishay> caitlin56: yes
16:09:40 <jgriffith> if you want cinder DR then implement an HA cinder setup
16:09:44 <duncanT> bswartz: The backup API allows you to choose (and pay for in certain cases) a safe, cold storage copy of you volume
16:09:54 <jgriffith> if you want to back up databases, back them up using a backup service
16:10:09 <duncanT> I *don't* want to back up the database
16:10:10 <winston-d> caitlin56: but can we 'backup' the metadata in db?
16:10:10 <caitlin56> Backing up the CinderDB means that you would be restoring *all* volumes.
16:10:19 <jgriffith> duncanT: but that's your argument here
16:10:28 <jgriffith> the only reason to backup metadata is if the db is lost
16:10:30 <duncanT> jgriffith: No it isn't
16:10:37 <duncanT> (my arguement)
16:10:44 <jgriffith> ok... then why backup the metadata at all?
16:10:48 <caitlin56> If you want to restore selective volumes then you need selective metadata (or no metadata, which is what jgriffith is arguing).
16:11:07 <duncanT> Right, I want a backup to be a disaster resistant copy of a volume.
16:11:13 <jgriffith> duncanT: what's the logic to backing up the metadata?
16:11:13 <bswartz> if you're not worried about the database going away, then there's no point to making more copies of the metadata
16:11:20 <duncanT> Including everything you need to use *that volume*
16:11:29 <duncanT> Not *all volumes*
16:11:34 <duncanT> Not *cinder config*
16:11:43 <duncanT> Just the volume I've said is important
16:11:53 <duncanT> Otherwise use a snapshot
16:12:00 <jgriffith> You're still not really answering my question
16:12:05 <winston-d> bswartz: yes, there is. we 'snapshot' the metadata, in DB
16:12:18 <caitlin56> duncanT: can you cite some specific metadata fields that you would not know how to set when restoring just the volume payload?
16:12:19 <jgriffith> bswartz: +1
16:12:30 <jgriffith> bswartz: which is my whole point
16:12:42 <jgriffith> bswartz: and what I'm trying to get duncanT to explain
16:12:43 <duncanT> caitlin56: The bootable flag. The licensing info held in the glance metadata
16:13:00 <jgriffith> introducing things like "disaster resistant" isn't very helpful to me :)
16:13:02 <avishay> winston-d: needs to be consistent
16:13:04 <bswartz> winston-d: you're imagining that the metadata might change and you want to restore it from an old copy?
16:13:07 <jgriffith> that's a bit subjective
16:13:16 <duncanT> I'd like to be able to import a backup into a clean cinder install
16:13:18 <winston-d> bswartz: correct
16:13:26 <jgriffith> duncanT: ahh... that's VERY different!
16:13:34 <jgriffith> duncanT: that's volume import or migration
16:13:42 <jgriffith> that's NOT volume backup
16:13:43 <bswartz> winston-d: seems reasonble
16:13:47 <duncanT> jgriffith: No, it's backup and restore
16:13:52 <avishay> no, it's not migration
16:13:58 <duncanT> jgriffith: Even if cinder dies, catches fire etc
16:14:01 * dosaboy is sitting on the fence whistling
16:14:05 <caitlin56> Does anyone have examples of metadata that SHOULD NOT be restored when you migrate a volume from one site to another?
16:14:09 <duncanT> jgriffith: The backup should be enough to get my working volume back
16:14:12 <avishay> migration is within one openstack install
16:14:41 <duncanT> I put that on my first ever backup slide, and I propose to keep it there
16:14:43 <jgriffith> avishay: yeah...
16:14:59 <jgriffith> avishay: duncanT alright we're obviously not going to agree here
16:15:03 * jungleboyj is enjoying the show.
16:15:05 <jgriffith> avishay: well maybe you and I will
16:15:12 <avishay> haha
16:15:25 <winston-d> avishay: backup is within one cinder install as well, no?
16:15:25 <jgriffith> duncanT: fine, so you want a "cinder-volume service" backup
16:15:30 <avishay> i'm with duncanT on this one
16:15:46 <avishay> winston-d: i can backup to wherever i want - think geo-distributed swift even
16:15:46 <jgriffith> haha
16:15:53 <dosaboy> avishay: +1
16:15:56 <jgriffith> WTF?
16:16:01 <duncanT> jgriffith: I just want the volume I backup to come back, even if cinder caught fire in the mean time
16:16:09 <jgriffith> right
16:16:10 * bswartz is nervous about conflating backup/restore use cases with DR use cases
16:16:22 <jgriffith> the key is "cinder caught fire"
16:16:26 <avishay> if i backup to geo-distributed swift, and a meteor hits my datacenter, i can rebuild and point my new metadata to existing swift objects
16:16:27 <caitlin56> What is a backup? If it is not enough to restore a volume to a new context then why not just replicate snapshots?
16:16:34 <jgriffith> sorry.. but you're saying "backup as a service" in openstack as a whole IMO
16:16:34 <duncanT> backup has been DR since day one on the design spec
16:16:50 <jgriffith> You're not talking about restoring a volume anymore
16:17:09 <winston-d> avishay, duncanT don't forget to backup volume types, and qos along with volume and metadata
16:17:09 <guitarzan> where is that leap coming from?
16:17:12 <jgriffith> you're talking about all the little nooks and crannies that your specific install/implementation may require
16:17:14 <bswartz> duncanT: I don't agree that taking backups is the best way to implement DR -- it's *a* way, but a relatively poor one
16:17:21 <avishay> backup and replication are on the same scale, with different RPO/RTO and fail-over methods
16:17:25 <jgriffith> and what's worse is you're saying "I only care about metadata"
16:17:26 <winston-d> avishay: duncanT 'cos those two are considered to be 'metadata' to the volume as well
16:17:32 <duncanT> The problem is, when I first wrote backup, we didn't have bootable flags, or volume encryption... you got the bits back onto an iscsi target and you were back in business
16:17:33 <jgriffith> but somebody else says "but I care about quotas"
16:17:38 <guitarzan> winston-d has an interesting point
16:17:40 <jgriffith> and somebody else cares about something else
16:17:44 <duncanT> Glance metadata was the first bug
16:17:47 <bswartz> the real value of backups is when you don't have a disaster, but you've corrupted your data somehow
16:17:57 <jgriffith> It doesn't end until you backup all of cinder and the db
16:17:59 <avishay> winston-d: but i can put the data into a volume with different qos and it still works
16:18:01 <duncanT> Types or rate limits don't stop me using a volume
16:18:12 <guitarzan> bswartz: isn't that a disaster? :)
16:18:15 <jgriffith> duncanT: they don't stop YOU
16:18:17 <jgriffith> that's the key
16:18:26 <jgriffith> they may stop others though depending on IMPL
16:18:40 <bswartz> guitarzan: no -- that's a snafu
16:18:42 <duncanT> jgriffith: They don't stop a customer...
16:18:49 <jgriffith> duncanT: they don't stop your customers
16:18:52 <jgriffith> duncanT: they stop mine
16:19:09 <jgriffith> duncanT: I have specific heat jobs that require volumes of certain types
16:19:10 <bswartz> there's a difference between users screwing up their own data, and a service provider having an outage
16:19:14 <duncanT> jgriffith: The point is, right now, even if you've got your backup of a bootable volume, it is useless if cinder loses stuff out of the DB
16:19:14 <winston-d> avishay but that's not the original volume (not data) any more.
16:19:28 <jgriffith> duncanT: perhaps
16:19:32 <winston-d> avishay: i mean data is, volume is not
16:19:39 <caitlin56> Would it always make sense when restoring a single volume to a new datacenter to preserve the prior QoS/quotas/etc.?
16:19:39 <avishay> winston-d: that's philosophical :P
16:19:48 <duncanT> jgriffith: You can't restore it in such a way that you can boot from it. At all.
16:19:53 <dosaboy> all this could be accounted for by simply defining a required metadata set
16:19:58 <jgriffith> duncanT: but the purpose of the backup IMO is if your backend takes a dump a user can get his data back, or as bswartz pointed out a user does "rm -rf /*"
16:20:06 <dosaboy> i don't see why that would be so complex
16:20:07 <winston-d> dosaboy: +1
16:20:16 <avishay> it's all of these use cases
16:20:19 <jgriffith> dosaboy: I don't disagree with that
16:20:36 <dosaboy> deliberating whether this or that metadata is required is not really for this conversation
16:20:39 <duncanT> jgriffith: Right now, I CAN'T GET MY BOOTABLE VOLUME TO BOOT
16:20:43 <avishay> it's rm -rf, it's a fire, it's a meteor
16:20:43 <duncanT> It just can't be done
16:20:45 <caitlin56> dosaboy: agreed, if we are going to backup metadata, we need to define filters on the metadata so only things that should be kept are.
16:20:55 <jgriffith> duncanT: keep yelling
16:21:04 <jgriffith> duncanT: I'll keep ignoring :)
16:21:12 <guitarzan> that's the bug, you can't boot a restored backup
16:21:14 <avishay> jgriffith: accidental caps lock ;)
16:21:26 <duncanT> jgriffith: Sorry, I was out of line a touch there
16:21:28 <jgriffith> avishay: haha... I don't think that's the case
16:21:36 <avishay> jgriffith: be optimistic :)
16:21:45 <jgriffith> dosaboy: so like I said in IRC the other day....
16:22:01 <dosaboy> yarp
16:22:03 <jgriffith> dosaboy: I'm fine with it being implemented, I could care less
16:22:03 <winston-d> i'd like to consider the volume as a virtual hard drive.
16:22:03 <duncanT> jgriffith: But a snapshot covers the rm -rf case
16:22:18 <jgriffith> dosaboy: I have the info in my DB so I really don't care
16:22:37 <jgriffith> dosaboy: If you're an idiot and you don't back up your db then hey.. this at least will help you
16:22:49 <jgriffith> dosaboy: but something else is going to bite you in the ass later
16:22:54 <winston-d> whatever it takes to backup a virtual hard drive, that's what we should do in cinder backup.
16:22:56 <caitlin56> Allowing snapshot replication would deal with disaster recovery issues, but not with porting a volume to a new vendor.
16:23:03 <dosaboy> damn it, i'm back on the fence again
16:23:28 <dosaboy> i kinda think the only way to resolve this is to have a vote
16:23:33 <jgriffith> dosaboy: I also think that things like metadata would be good in an "export/import" api
16:23:48 <avishay> dosaboy: democracy doesn't work in these situations i think :)
16:23:50 <thingee> I missed the beginning of this convo. Why are people opposed to it restoring metadata?
16:23:52 <jgriffith> dosaboy: duncanT but like I said, it doesn't *hurt* me if you put metadata there
16:23:54 <dosaboy> lol
16:23:59 <duncanT> I totally agree that they are part of export too
16:24:08 <duncanT> And transfer for that matter
16:24:23 <dosaboy> ok so, i have implemented a chunk of this,
16:24:23 <avishay> what is "volume export"?
16:24:30 <duncanT> Certain volumes are literally useless if you lose certain bits of their metadata
16:24:33 <dosaboy> why don't i see if i can knock up the rest
16:24:38 <dosaboy> and then if you like...
16:24:42 <jgriffith> dosaboy: I do think someone should better define the purpose of cinder-backup though
16:24:50 <jgriffith> dosaboy: that's fine by me
16:24:52 <winston-d> thingee: i think we are discussing about where to save the copy of metadata for a volume backup, in DB or in Swift/Ceph/Sth else
16:24:53 <dosaboy> jgriffith: totally agree
16:24:57 <zhiyan> winston-d: if we think of a volume as a virtual hard drive, can we export it as a package, like OVF? it contains metadata
16:25:00 <jgriffith> dosaboy: like I said, I won't object to backing it up at all
16:25:00 <jungleboyj> jgriffith: +1
16:25:09 <jgriffith> dosaboy: but I don't want to have misleading expectations
16:25:10 <dosaboy> i would have asked for a session if i had not confirmed HK so late
16:25:21 <jgriffith> dosaboy: this is nowhere near a Cinder DR backup
16:25:25 <jgriffith> and I don't want to make it one
16:25:43 <dosaboy> there are many a strong opinion on this one ;)
16:25:43 <thingee> winston-d: I think we've discussed this before: leave it to the object store.
16:25:46 <caitlin56> jgriffith: we're debating what a backup is good for.
16:25:51 <thingee> as object store metadata
16:26:07 <avishay> jgriffith: what is "volume export"?
16:26:27 <jgriffith> avishay: non-existent :)
16:26:44 <avishay> jgriffith: it looks like i'm leading the "volume import" session, so thought I should know :)
16:26:53 <jgriffith> avishay: the idea/proposal was to be able to kick out volumes from Cinder without deleting them off the backend
16:27:00 <dosaboy> put it this way, as long as the necessary metadata is backed up (either way)
16:27:03 <avishay> jgriffith: ah OK
16:27:03 <jgriffith> and then obviously an import to pull in existing volumes
16:27:03 <dosaboy> no one gets hurt
16:27:08 <winston-d> thingee: yeah, but as i said to dosaboy the other day, i missed that discussion
16:27:11 <jgriffith> dosaboy: agreed
16:27:32 <dosaboy> so i'll keep going on the track i'm on
16:27:47 <dosaboy> and we can take a look at what i get done
16:27:51 <avishay> jgriffith: quotas are tricky on export
16:27:53 <dosaboy> if we don't like it then fine
16:28:02 <thingee> dosaboy: are you still storing it in object store metadata?
16:28:09 <dosaboy> thingee: yes
16:28:12 <jgriffith> avishay: indeed
16:28:18 <dosaboy> but each driver will have its own way
16:28:20 <duncanT> avishay: I'd suggest that once something is kicked out of cinder, then it can't take up a cinder quota?
16:28:22 <winston-d> avishay: and types, extra specs, qos?
16:28:23 <caitlin56> dosaboy: be sure to escape the metadata then.
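A minimal sketch of the Swift flavour of what dosaboy and thingee describe: storing the serialized metadata blob as object metadata on the backup object. The helper names and header key are assumptions; other backup drivers (e.g. Ceph or TSM) would store the same blob their own way, and Swift's per-value metadata size limits may force a large blob into its own object instead.

    import base64


    def save_backup_metadata(conn, container, backup_obj, metadata_blob):
        """Attach the volume-metadata blob to an existing Swift backup object.

        `conn` is assumed to be a swiftclient.client.Connection.
        """
        # Escape/encode so the JSON survives as an HTTP header value
        # (caitlin56's point about escaping the metadata).
        encoded = base64.b64encode(metadata_blob.encode('utf-8')).decode('ascii')
        conn.post_object(container, backup_obj,
                         headers={'X-Object-Meta-Volume-Meta': encoded})


    def load_backup_metadata(conn, container, backup_obj):
        """Fetch the blob back on restore; returns None for older backups."""
        headers = conn.head_object(container, backup_obj)
        encoded = headers.get('x-object-meta-volume-meta')
        if encoded is None:
            return None
        return base64.b64decode(encoded).decode('utf-8')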
16:28:32 <jgriffith> duncanT: +1
16:28:35 <avishay> duncanT: but it still takes up space on disk
16:28:54 <jgriffith> avishay: yeah so that's the counter, however you kicked it out
16:28:58 <duncanT> avishay: Not cinder's problem once you've explicitly decided it isn't cinder's problem
16:29:06 <jgriffith> avishay: that tenant can't 'access' via cinder anymore
16:29:17 <jgriffith> avishay: it's troubling....
16:29:27 <avishay> jgriffith: yes.  is export needed?
16:29:29 <jgriffith> avishay: I'm having the same dilemma with adding a purge call to the API
16:29:39 <caitlin56> Quotas are tricky, because they are implicitly part of the cinder context where the backup was made.
16:29:47 <jgriffith> avishay: well that's another good question :)
16:29:48 <dosaboy> ooh i had not thought of that
16:30:03 <dosaboy> so how do backups count towards quota atm?
16:30:06 <avishay> wait...i'm sorry i forked the conversation
16:30:14 <avishay> people are getting confused
16:30:14 <duncanT> dosaboy: Currently they don't
16:30:24 * rushiagr definitely is
16:30:27 <avishay> we moved to talk about "volume export" without finishing backup metadata
16:30:29 <dosaboy> and i presume that is bad?
16:30:34 * jgriffith went down the fork nobody else is on
16:30:44 <avishay> haha
16:30:50 <caitlin56> dosaboy: backups should count in your *object* storage quota, not cinder.
16:31:12 <avishay> i think we can hash out export at the session.  i think we should also find time to talk about backup metadata.
16:31:17 <dosaboy> caitlin56: but they don't necessarily go to an object store
16:31:33 <dosaboy> like if you offload to TSM
16:31:51 <caitlin56> dosaboy: when the backup is to an object store, we should let that object store track/report the data consumption.
16:32:07 <thingee> caitlin56: +1
16:32:09 <jungleboyj> caitlin56: That makes sense.
16:32:22 <jungleboyj> Cinder can't run the whole world.
16:32:29 <thingee> I don't want this problem where multiple projects are tracking the same resource quota again.
16:32:30 <jungleboyj> ...yet.
16:32:42 <dosaboy> anyway i don't wanna hijack this meeting anymore
16:33:29 <dosaboy> i think quotas in backup are a discussion to be had though
16:33:30 <jgriffith> dosaboy: too bad :)
16:33:37 <avishay> dosaboy: duncanT: can you come up with a set of metadata to back up and we'll discuss in person next week?  we can do an ad-hoc session?  jgriffith sound good?
16:33:46 <dosaboy> sure
16:33:50 <duncanT> Sure
16:34:04 * jungleboyj is sorry he is going to miss that discussion.  The IRC version was fun!
16:34:08 <jgriffith> avishay: sure... but like I said, if dosaboy just wants to implement backup of metadata with a volume I have no objection
16:34:19 <dosaboy> we can all sit around a nice campfire
16:34:19 <jgriffith> so it would be a short conversation on my part :)
16:34:25 <avishay> ok :)
16:34:25 <dosaboy> DuncanT can make the cocoa
16:34:27 <jgriffith> no way!!!!
16:34:32 <jungleboyj> dosaboy: +1
16:34:33 <jgriffith> I know duncanT would push me into the fire
16:34:39 <dosaboy> haha
16:34:39 <avishay> that way we know if duncanT is yelling or just hit the caps lock ;)
16:34:42 <avishay> hahah
16:34:49 <jungleboyj> :-)
16:35:07 <dosaboy> don't worry, i'll bring the whiskey
16:35:16 <jungleboyj> dosaboy: +2
16:35:19 <jgriffith> alright...
16:35:21 <jgriffith> well that was fun
16:35:33 <jgriffith> Is Ehud around?
16:35:37 <EhudTrainin> yes
16:35:44 <jgriffith> EhudTrainin: welcome
16:35:49 <EhudTrainin> hi
16:35:51 <jgriffith> #topic fencing
16:36:21 <jgriffith> https://blueprints.launchpad.net/cinder/+spec/fencing-and-unfencing
16:36:27 <jgriffith> for those that haven't seen it ^^
16:36:33 <jgriffith> EhudTrainin: I'll let you kick it off
16:37:07 <EhudTrainin> It is about adding fencing functionality to Cinder
16:37:24 <EhudTrainin> in order to support HA for instances
16:38:08 <avishay> blueprint explains it pretty well
16:38:15 <EhudTrainin> by ensuring the instances on a failed host would not try to access the storage after they are rebuilt on another host
16:38:35 <jungleboyj> avishay: +1
16:38:46 <duncanT> My concerns here are the failure cases... partitioned storage etc...
16:38:47 <winston-d> avishay: +1, nice write up in the bp. haven't seen any one like that for quite a while.
16:38:49 <rushiagr> +1 for beautifully explaining it in the BP
16:38:50 <jgriffith> yeah, the bp is well written (nice job on that by the way)
16:39:21 <avishay> duncanT: please elaborate
16:39:41 <duncanT> It's easy to say it's hard for nova so cinder should do it, but exactly the same problems exist in cinder failure cases... like cinder loses communication with a storage backend
16:40:13 <zhiyan> EhudTrainin: i'm thinking about how cinder would identify those attachment sessions for a volume. seems we need to prevent such race condition issues.. IMO
16:40:20 <avishay> it's not hard for nova - it's impossible for nova.  the server is in a bad state and we can't trust it to do the right thing.
16:40:21 <jgriffith> I guess my only real question was:
16:40:22 <hemna> fencing ?
16:40:37 <jgriffith> 1. why would the compute host try and access the storage?
16:40:44 <jgriffith> 2. why do we necessarily care?
16:41:03 <jgriffith> I should clarify before somebody flips out...
16:41:18 <EhudTrainin> I think that if the storage does not respond then the fence will fail, but it is much less likely that both the host and the storage fail at the same time
16:41:25 <jgriffith> If a compute host *fails* and instances are migrated
16:41:43 <jgriffith> it should IMO be up to nova to disable the attachments on the *failed* compute host
16:41:50 <avishay> jgriffith: it might not fail completely - it might just lose connectivity or go into some other bad state
16:41:58 <jgriffith> avishay: sure
16:42:12 <jgriffith> avishay: but it migrated instances right?
16:42:13 <hemna> shouldn't nova deal with detaching and reattaching elsewhere ?
16:42:22 <jgriffith> hemna: that's kinda what I was saying
16:42:30 <thingee> I agree with jgriffith. it should be up to whatever is doing the migration.
16:42:42 <avishay> wait
16:42:45 <hemna> since Nova should know that it had to migrate the instance to another host, it has the knowledge of the state
16:42:46 <jgriffith> we could really screw some things up if we make incorrect decisions
16:42:53 <duncanT> But currently there is no way of telling cinder 'make sure this is totally detached', I don't think
16:42:58 <avishay> EhudTrainin: if nova brings up the instance on another VM, does it have the same instance ID?
16:43:18 <EhudTrainin> This is not exactly migration, but a rebuild, since the host from Nova's point of view has failed
16:43:25 <caitlin56> jgriffith: I agree, nova knows the client state accurately. It should deal with the results of that changing.
16:43:25 <winston-d> jgriffith: migrating the instance may or may not stop the old one from connecting to cinder
16:43:34 <hemna> avishay, I'd assume since it's a rebuild, it would be a new instance id
16:43:48 <jgriffith> winston-d: yeah... but I'm saying it *should*
16:43:50 <dosaboy> sounds like we expect the same piece of code that may fail to migrate an instance to be sane enough to ensure a fence
16:43:51 <hemna> could be a poor assumption though
16:44:10 <jgriffith> dosaboy: that could be double un-good
16:44:31 <dosaboy> yeah, i may be missing how this would be done though
16:44:33 <avishay> the use case is that the nova server is not responsive but the VM continues to run and access the storage
16:44:43 <EhudTrainin> It may rebuild with the same IP and attach it to the same volume
16:44:46 <jgriffith> avishay: ahhh
16:44:46 <caitlin56> dosaboy: could you propose something where we are protecting the volume rather than doing nova's work for it?
16:45:02 <duncanT> So if a compute node goes wonky, when is it safe to reattach a volume that was attached to that host to another instance?
16:45:03 <jgriffith> avishay: so rogue vm's that nova can't get to anymore
16:45:12 <avishay> jgriffith: yes
16:45:14 <jgriffith> duncanT: never probably
16:45:26 <jgriffith> avishay: so who does the fencing?
16:45:30 <avishay> but now i'm thinking that maybe this is only a problem if we have multi-attach?
16:45:30 <duncanT> jgriffith: Indeed. I think the idea of fence is 'make it safe to do that'
16:45:34 <avishay> jgriffith: i assume the admin
16:45:39 <jgriffith> avishay: i.e. who makes the call
16:45:48 <jgriffith> avishay: and why not just send a detach/disconnect
16:45:52 <caitlin56> Rogue VMs aren't something we should solve - at most we should protect the volume from rogue VMs.
16:46:00 <dosaboy> caitlin56: not sure what you mean there
16:46:03 <hemna> jgriffith, +1
16:46:22 <duncanT> caitlin56: The idea here *is* to protect the volume for a rogue VM
16:46:26 <dosaboy> caitlin56: how does cinder know what a rogue vm is though?
16:46:27 <winston-d> jgriffith: detach failed
16:46:34 <duncanT> I guess it is a disconnect on steroids
16:46:48 <winston-d> jgriffith: 'cos nova compute is not reachable
16:46:58 <caitlin56> dosaboy: exactly, we don't want cinder falsely deciding a VM is rogue.
16:47:00 <hemna> dosaboy, cinder doesn't know.  only nova does
16:47:06 <jgriffith> duncanT: yeah, I'm assuming that's what the implementation would basically be here
16:47:09 <dosaboy> caitlin56: ah gotcha
16:47:34 <hemna> I think nova should drive this and I'm not sure what cinder needs to do during the fencing process.  nova should detach from the rogue vm
16:47:36 <jgriffith> Ok... so interesting scenarios
16:47:39 <avishay> EhudTrainin: want to comment?
16:47:40 <jgriffith> here's my take...
16:47:57 <thingee> hemna: +1 nova should be driving this
16:48:01 <thingee> cinder does not have enough information
16:48:05 <duncanT> hemna: If the compute node stops talking, nova can't do the detach from the VM
16:48:11 <jgriffith> if you want to implement a service/admin API call to force disconnect from a node and ban a node I *think* that's ok
16:48:17 <winston-d> hemna: what if nova failed to detach volume, the only hope is to beg cinder to help
16:48:21 <EhudTrainin> An instance may be rogue when there is no connection to the nova-compute on its host, but in the future further indications may be used to decide a host has failed
16:48:29 <jgriffith> quite honestly I'm worried about the bugs that will be logged due to operator/admin error though
16:48:30 <winston-d> hemna: cinder is on the end of the connection
16:48:38 <hemna> winston-d, cinder failed in that case as well no ?
16:48:53 <jgriffith> winston-d: EhudTrainin hemna avishay duncanT caitlin56 thoughts on my comment ^^
16:49:00 <avishay> jgriffith: +1
16:49:07 <hemna> jgriffith, +1
16:49:18 <duncanT> jgriffith: I totally agree this is basically a force call
16:49:21 <thingee> I think we all understand the use case now. nova node is not responsive, rogue vms. Still we keep coming back to cinder not having enough information. I really think nova should still be driving the handling of this situation.
16:49:31 <winston-d> jgriffith: +1
16:49:38 <avishay> thingee: what's missing?
16:49:39 <dosaboy> +1
16:49:40 <hemna> if nova can't talk to the n-cpu process on the host, nova can't really detach the volume.
16:49:43 <jgriffith> thingee: yeah, but if we step back....
16:49:43 <duncanT> jgriffith: My single concern is how to signal to the caller that the call failed
16:50:02 <jgriffith> thingee: allowing an admin to force disconnect and ban a node from connecting I'm ok
16:50:03 <EhudTrainin> The problem is Nova can't take care of this, since a failure indication does not ensure that the instance is not talking to the storage or won't do it after some time
16:50:23 <jgriffith> duncanT: can you explain?
16:50:25 <jgriffith> sorry
16:50:33 <winston-d> duncanT: signal for what? force detach failed?
16:50:52 <duncanT> jgriffith: If cinder can't talk to the storage backend, it can't force the detach...
16:50:52 <thingee> EhudTrainin: I apologize, but this is the first time I'm hearing about the bp. I'll check it out to understand more, but as jgriffith mentioned I fear the bugs in automating this.
16:51:00 <thingee> I like the idea of admin call to force detach though
16:51:02 <duncanT> winston-d: Yes, forced detach failed
16:51:26 <duncanT> winston-d: It is far from a show-stopper, just want to ensure it is thought of
16:51:31 <hemna> thingee, but if n-cpu isn't reachable on the host....nova can't detach the volume.
16:51:36 <avishay> duncanT: if nobody can talk to the VM and nobody can talk to the storage, i guess you need to pull the plug :)
16:51:40 <winston-d> duncanT: async call, no callback. please come back and query the result.
16:51:52 <jgriffith> ok
16:52:07 <jgriffith> so EhudTrainin I think the take-away (and I can add this to bp if you like)
16:52:10 <jgriffith> is:
16:52:13 <duncanT> winston-d: That's fine, yup, just need to remember to add the queryable status :-)
16:52:31 <jgriffith> 1. Add an admin API call to attempt force disconnect of an attachment/s to a specified node/IP
16:52:36 <hemna> the best you can do in that case is ask cinder to disconnect from the backend, that'll eventually leave a broken LUN on the host, which will give i/o errors for the host and the vm.
16:52:38 <thingee> hemna: cinder still doesn't have the information needed though to act. Maybe this bp explains that..I haven't read it yet.
16:52:53 <jgriffith> Ummm... hmm
16:53:05 <xyang__> avishay: if this is done by cinder, what happens to the entries in the nova db
16:53:10 <jgriffith> so then what :)
16:53:24 <hemna> thingee, cinder has the volume and attachment info in its db. it can call the right backend to disconnect
16:53:26 <avishay> xyang__: good question - EhudTrainin ?
16:53:39 <thingee> this is a nova node HA problem. I'm not sure why we're trying to solve it with cinder.
16:53:40 <winston-d> xyang__: nova is the caller, it should know what to do about the block-device-mapping
16:53:43 <hemna> this is just icky
16:54:02 <thingee> hemna: that's not the problem
16:54:05 <xyang__> winston-d: nova is not working here, right
16:54:09 <thingee> hemna: the problem is cinder doesn't know to act
16:54:22 <xyang__> winston: this will be a cinder operation completely
16:54:24 <jgriffith> xyang__: a particular compute node is trashed
16:54:28 <avishay> so this should start at n-api and then to cinder?
16:54:31 <hemna> thingee, well not yet.  :)  we were talking about forcing a disconnect from cinder.
16:54:32 <duncanT> thingee: Nova knows it has lost track of a vm and can make the call, yes?
16:54:50 <hemna> duncanT, yah
16:54:54 <EhudTrainin> when the instance is rebuilt, the volume is detached and then attached. the rebuild would be done only after fencing to avoid possible conflict
16:55:05 <thingee> duncanT, hemna: as jgriffith mentioned, I think making a call to force deatach is good. But nova should make that call
16:55:13 <hemna> thingee, +1
16:55:14 <thingee> or an admin
16:55:15 <winston-d> xyang__: nova compute is not working, not the entire nova, e.g. nova-api is still working
16:55:18 <jgriffith> EhudTrainin: ok, now you kinda lost me
16:55:20 <duncanT> thingee +1
16:55:36 <xyang__> winston-d: ok.
16:55:45 <hemna> winston-d, but the host needs n-cpu to be working in order to detach the volume from the VM and the host
16:55:59 <jgriffith> hemna: no
16:56:05 <jgriffith> hemna: it just needs n-api
16:56:10 <jgriffith> hemna: n-api can call cinder
16:56:11 <avishay> EhudTrainin: i think the question is - why can't this be implemented in nova, where nova-api calls detach/terminate_connection for all volumes attached to the host?
16:56:24 <thingee> jgriffith: but what tells n-api?
16:56:26 <hemna> n-cpu does the work to detach the volume from the hypervisor and the host kernel
16:56:30 <jgriffith> thingee: LOL
16:56:36 <jgriffith> thingee: excellent question :)
16:56:55 <jgriffith> thingee: and now we're back to admin, in which case who cares if it's direct to cinder api from admin etc
16:57:02 <winston-d> hemna: in the case when n-cpu is on fire, n-api has to call for help from cinder
16:57:04 <thingee> again, I really think this is a nova node HA case. I don't see anything right now that cinder can know to act on.
16:57:15 <hemna> winston-d, yah I think that's the only option.
16:57:18 <jgriffith> thingee: I agree
16:57:29 <EhudTrainin> The detach command does not ensure that the instance on the failed host would not try to access the storage
16:57:42 <jgriffith> so you all keep saying things like "nova node on fire" "nova node is unreachable" etc etc
16:58:01 <jgriffith> if the nova node is so hosed it's probably not going to be making iscsi connections anyway
16:58:09 <hemna> EhudTrainin, correct, but if the cinder backend driver disconnects from the storage, the host will get i/o errors when the vm/host tries to access the volume.
16:58:14 <avishay> what about file system mounts, where terminate_connection doesn't do anything?
16:58:17 <jgriffith> I have an easy solution...
16:58:25 <thingee> so we all agree...force detach exposed. Leave it to the people handling the instances. If a nova node catches fire, there better be another nova node available to catch rogue vms and communicate with cinder
16:58:26 <jgriffith> ssh root@nova-node; shutdown -h now
16:58:26 <hemna> EhudTrainin, effectively detaching the volume....but with a dangling LUN
16:58:27 <winston-d> thingee: cinder doesn't and doesn't have to know. cinder just provides help, in the case when n-cpu is broke and no one can reach n-cpu.
16:58:39 <jgriffith> if that doesn't work login to pdu and shut off power
16:58:49 <thingee> winston-d: so when does cinder do the detach to help?
16:58:58 <hemna> thingee, +1
16:59:00 <avishay> jgriffith: what if the server's management network is down, but it's still accessing a storage network?
16:59:04 <duncanT> jgriffith: It says in the blueprint you don't always have a PDU
16:59:14 <hemna> avishay, re: Fibre Channel ? :P
16:59:17 <jgriffith> avishay: that's where the unplug came from LOL
16:59:25 <avishay> hemna: or a separate ethernet network
16:59:25 <jgriffith> duncanT: sighh
16:59:29 <caitlin56> hemna: if nova is hosed then it shouldn't be surprising that getting everything working again will not be trivial.
16:59:32 <jgriffith> call the DC monkey
16:59:43 <avishay> hah
16:59:56 <jgriffith> alright, we're spiraling
16:59:57 <thingee> times up.
17:00:06 <winston-d> thingee: n-api finds out n-cpu is on fire, it'd like to re-create another vm on another n-cpu. but n-api failed to disconnect vol, it has to call for cinder's help
17:00:07 <hemna> throw a grenade and run.  next!
17:00:14 <jgriffith> EhudTrainin: it's an interesting idea but there are some very valid concerns here IMO
17:00:15 <avishay> how about this case: we have an NFS mount on the host.  disconnect today does nothing.  how do we stop the VM from accessing it?
17:00:18 <duncanT> The way I'm reading the blueprint here, all it is asking for is a force_disconnect_and_refuse_reconnections
17:00:26 <duncanT> avishay: Kill the export?
17:00:42 <hemna> avishay, heh, that's why jgriffith and I complained about the NFS unmount code :)
17:00:46 <jgriffith> I think we're all fine with an admin extension to force disconnect
17:00:52 <jgriffith> let's start with that and go from there
17:00:57 <jgriffith> everybody ok with that?
17:01:01 <hemna> +1
17:01:05 <jgriffith> of course with NFS you're just screwed
17:01:19 <winston-d> +1
17:01:30 <jgriffith> Ok... we can theorize more in #openstack-cinder if you like
17:01:30 <thingee> +1
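A rough sketch of how the force-disconnect call agreed on here might be driven from the admin side, for example by an operator or by nova-api once it decides a compute node is gone. The os-force_detach action name and the connector payload are assumptions about the eventual API, not a merged design.

    import json

    import requests


    def force_detach(cinder_endpoint, token, volume_id, failed_host):
        """Ask cinder to tear down a volume's export to a failed/rogue host,
        without involving nova-compute on that host."""
        body = {
            'os-force_detach': {
                # enough connector info for the backend to revoke access
                'connector': {'host': failed_host},
            }
        }
        resp = requests.post(
            '%s/volumes/%s/action' % (cinder_endpoint, volume_id),
            headers={'X-Auth-Token': token,
                     'Content-Type': 'application/json'},
            data=json.dumps(body))
        resp.raise_for_status()

    # The fencing flow EhudTrainin describes would then be: fence first
    # (force detach plus refusing new connections from that host), rebuild
    # the instance elsewhere, and only then re-attach the volume.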
17:01:32 * hartsocks waves
17:01:32 <EhudTrainin> I think beyond force disconnect we would also want to prevent the nova-compute on that host from creating new connections
17:01:35 <jgriffith> thanks everybody
17:01:37 <hemna> yah good luck deploying a cloud w/ NFS :P
17:01:41 <jgriffith> #endmeeting