13:59:24 <m3m0> #startmeeting freezer
13:59:26 <openstack> Meeting started Thu Mar 24 13:59:24 2016 UTC and is due to finish in 60 minutes. The chair is m3m0. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:30 <openstack> The meeting name has been set to 'freezer'
13:59:49 <m3m0> hey guys, who's here for the freezer meeting?
13:59:54 <m3m0> o/
14:00:01 <ddieterly> o/
14:01:00 <zhurong> 0/
14:01:03 <daemontool> o/
14:01:17 <zhangjn> 0/
14:01:21 <reldan> 0/
14:01:24 <yangyapeng> :)
14:01:33 <m3m0> there is a situation
14:01:40 <m3m0> we don't have topics for today :P https://etherpad.openstack.org/p/freezer_meetings
14:01:54 <m3m0> but I would like to start with one
14:02:04 <m3m0> #topic pending reviews
14:02:27 <m3m0> does anyone have important commits that need to be reviewed?
14:02:44 <daemontool> I think this is important https://review.openstack.org/#/c/297119/
14:02:45 <daemontool> from infra
14:03:03 <slashme> From szaher https://review.openstack.org/#/c/295220/
14:03:23 <daemontool> it's not highly urgent and still work in progress
14:03:30 <daemontool> https://review.openstack.org/#/c/290461/ but if anyone wants to review the code...
14:03:38 <daemontool> thanks ddieterly btw ^^
14:03:54 <m3m0> from my side
14:03:54 <m3m0> https://review.openstack.org/278407
14:04:00 <m3m0> https://review.openstack.org/280811
14:04:07 <m3m0> https://review.openstack.org/291757
14:04:08 <ddieterly> daemontool np, just gave it a cursory reading, nothing too deep
14:04:13 <m3m0> https://review.openstack.org/296436
14:04:45 <daemontool> we need to solve an important challenge
14:05:02 <daemontool> which is, how not to re-read all the data every time to compute incrementals
14:05:02 <m3m0> which is?
14:05:25 <daemontool> so we had a quick exchange of ideas with frescof
14:05:30 <m3m0> can we use an external tool like diff?
14:05:43 <daemontool> the problem is
14:05:46 <reldan> daemontool: Like git does - check the modification date
14:06:01 <daemontool> ok
14:06:07 <daemontool> so let's say there's a use case
14:06:12 <daemontool> where the current data
14:06:29 <daemontool> is 300TB (sounds big but is starting to be common)
14:06:54 <zhangjn> backup 300TB of data?
14:06:57 <daemontool> if you have volumes of 100GB
14:07:11 <daemontool> you will realise 300TB is not that many volumes
14:07:20 <daemontool> zhangjn, yes
14:07:23 <daemontool> even 100TB
14:07:37 <zhangjn> Some devstack patches need to be accepted.
14:07:44 <zhangjn> right now devstack does not work.
14:07:47 <daemontool> zhangjn, yes ++
14:07:58 <daemontool> so
14:08:07 <daemontool> I hear about a lot of uses
14:08:10 <daemontool> use cases
14:08:37 <daemontool> where a solution is needed
14:08:38 <zhangjn> https://review.openstack.org/#/c/265111/
14:08:44 <zhangjn> this one
14:08:45 <daemontool> that can do backups of all volumes
14:08:48 <daemontool> like all_tenants
14:08:50 <daemontool> incremental
14:08:55 <daemontool> and that is the challenge we have
14:08:58 <m3m0> like reldan suggested, do it like git or tar: first check the modification time, and if it is different from the previous one, read only that file
14:09:09 <daemontool> we cannot re-read the data every time
14:09:18 <daemontool> m3m0, we might have something like
14:09:22 <m3m0> we are only reading the inodes
14:09:23 <daemontool> /dev/sdb2
14:09:38 <daemontool> but let's say
14:09:40 <m3m0> ooooo not from the file system perspective but from the image?
14:09:42 <daemontool> we have 10k volumes
14:09:45 <m3m0> sorry, the volume?
14:09:48 <daemontool> so we have 10k files
14:09:53 <daemontool> and every file is modified
14:10:01 <daemontool> and every file is 100GB
14:10:24 <daemontool> so we need to re-read the data to generate the incrementals each time...
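[Editor's note: the git/tar-style mtime check discussed above can be sketched in a few lines of Python. `scan_mtimes` and `changed_since` are hypothetical helper names for illustration, not Freezer code; note that mtime alone can miss changes made by tools that preserve timestamps, which is part of why the thread below looks at filesystem-level change tracking.]

```python
import os

def scan_mtimes(root):
    """Walk `root` and record each file's modification time (one full scan)."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[path] = os.stat(path).st_mtime
    return manifest

def changed_since(previous, current):
    """Files that are new, or whose mtime differs from the previous scan.

    Only these need to be re-read to build the incremental."""
    return [path for path, mtime in current.items()
            if previous.get(path) != mtime]
```

The point of the discussion is that even this cheap check does not help when every one of the 10k volume files is touched between backups: you still end up re-reading all the data.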
14:10:38 <reldan> daemontool: If we don't know about the structure of these files - yes
14:10:44 <reldan> there is no better solution
14:10:57 <daemontool> we could think of something like
14:11:01 <daemontool> storage drivers
14:11:11 <reldan> if we know, for example, that these files only support appends at the end - it's easier
14:11:11 <daemontool> there are filesystems that can do that for us
14:11:17 <daemontool> like zfs
14:11:19 <daemontool> or btrfs
14:11:26 <daemontool> or other proprietary solutions
14:11:44 <daemontool> they can provide to us
14:11:46 <daemontool> beforehand
14:12:12 <daemontool> the blocks that changed
14:12:16 <daemontool> and for example
14:12:18 <daemontool> a differential
14:12:25 <daemontool> of 2 snapshots
14:12:45 <EinstCrazy> you mean use the features the fs provides?
14:12:52 <daemontool> yes
14:12:59 <daemontool> we need a way to know what data was modified
14:13:11 <daemontool> without re-reading all the data each time
14:13:24 <daemontool> how do we do that? this is the challenge
14:13:33 <daemontool> we can do it with tar and rsync
14:13:44 <daemontool> for significant use cases
14:13:49 <reldan> I actually don't understand the idea of having 10K files of 100 GB each and modifying them
14:14:03 <m3m0> without support from the file system it will be a problem
14:14:11 <reldan> Do we really have such customers?
14:14:16 <daemontool> reldan, yes
14:14:23 <daemontool> I have at least 3 now
14:14:31 <reldan> Perhaps you can share what the files are?
14:14:35 <EinstCrazy> But I studied btrfs' cow 2 years ago; it did not support file cow, only directory
14:14:44 <reldan> What are they storing in these files?
14:14:55 <daemontool> EinstCrazy, btrfs would be, for example, the file system in the storage
14:15:00 <daemontool> reldan, I don't know
14:15:11 <daemontool> so on the compute nodes
14:15:17 <daemontool> you have /var/lib/nova/instances
14:15:28 <daemontool> and that /var/lib/nova/instances is mounted on a remote storage
14:15:39 <daemontool> but btrfs is one of the use cases
14:15:40 <daemontool> cause
14:15:48 <daemontool> most of the users for this case
14:15:50 <daemontool> would have
14:15:52 <daemontool> emc
14:15:54 <daemontool> or 3par
14:15:57 <daemontool> storeonce
14:15:58 <daemontool> and so on
14:16:39 <zhangjn> Data Deduplication?
14:16:45 <daemontool> data deduplication
14:16:50 <daemontool> we can do it
14:17:01 <daemontool> with rsync, as the hashes of blocks
14:17:12 <daemontool> are computed there
14:17:15 <daemontool> but
14:17:21 <daemontool> for instance, data deduplication
14:17:30 <daemontool> is provided by zfs
14:17:31 <daemontool> natively
14:17:33 <zhangjn> I think this is most important for instance backup.
14:17:34 <daemontool> and also compression
14:17:43 <reldan> Yes, but in the common case I suppose we would have to store more hashes than we have data )
14:18:03 <zhangjn> zfs is good at this scenario.
14:18:04 <daemontool> reldan, rephrase that please?
14:18:08 <daemontool> I didn't get it
14:18:19 <reldan> Let's say we have file A and file B
14:18:28 <reldan> file A and file B share 50% of their content
14:18:32 <daemontool> ok
14:18:43 <reldan> so we have a rolling hash in the first file
14:18:49 <reldan> and a rolling hash in the second
14:19:02 <zhangjn> but I think we'd better do it in freezer.
14:19:03 <reldan> to be able to find that file A and file B share some chunk
14:19:10 <daemontool> reldan, yes
14:19:21 <reldan> we should keep our hashes (in this case rolling hashes)
14:19:53 <reldan> and for a file of size m and a hash chunk of size n
14:20:02 <reldan> we should have m - n + 1 hashes
14:20:19 <reldan> O(m) hashes
14:20:37 <daemontool> reldan, exactly
14:20:42 <daemontool> that's how zfs does it
14:20:51 <daemontool> hash table
14:21:03 <daemontool> some more info here v
14:21:05 <daemontool> https://pthree.org/2012/12/18/zfs-administration-part-xi-compression-and-deduplication/
14:21:13 <daemontool> what I'd like to understand is
14:21:23 <daemontool> do you see this as something we need to solve?
14:21:27 <daemontool> all?
14:21:49 <ddieterly> seems like a scale issue that has to be solved
14:21:57 <daemontool> yes
14:22:12 <daemontool> we cannot use rsync for large scale cases
14:22:15 <daemontool> or for cases
14:22:26 <daemontool> where customers want the backups executed on all_tenants
14:22:33 <daemontool> all volumes
14:22:37 <daemontool> or all instances
14:22:53 <daemontool> if the customer can use rsync on each vm/volume
14:22:55 <daemontool> then better
14:24:05 <daemontool> so
14:24:14 <daemontool> we can have something like storage drivers
14:24:38 <zhangjn> like a zfs driver?
14:24:38 <daemontool> and we manage the storage drivers as btrfs, zfs, storeonce, 3par
14:24:40 <daemontool> yes
14:25:07 <daemontool> so if we have a backend that does dedup and can provide hashes of modified blocks
14:25:11 <daemontool> we use that and we scale
14:25:19 <daemontool> otherwise we provide tar and rsync
14:25:21 <zhangjn> enabling a driver to do this is easy.
14:25:22 <daemontool> depending on the cases
14:25:49 <daemontool> if we provide this...
we'll provide the best open source backup/restore/disaster recovery tool in the world
14:26:07 <daemontool> that's it :)
14:26:39 <zhangjn> in the virtualization scenario :)
14:26:39 <daemontool> now, who wants to write a bp for this?
14:26:49 <daemontool> in the cloud business, yes
14:27:10 <slashme> daemontool: You mean we would have a zfs/btrfs/3par/... engine ?
14:27:20 <daemontool> slashme yes
14:27:30 <slashme> Seems very good to me :)
14:27:34 <daemontool> better definition in the freezer glossary
14:27:36 <daemontool> ty :)
14:27:46 <daemontool> it is highly critical
14:27:55 <daemontool> cause it's the competitive advantage
14:28:16 <daemontool> that commercial solutions have
14:28:21 <daemontool> vs us
14:28:25 <zhangjn> first support 3par
14:28:29 <slashme> And it allows us not to introduce too much complexity in freezer if we manage deduplication and incrementals using the storage
14:28:50 <daemontool> slashme, yes. I think we can provide both
14:29:09 <daemontool> but the cases where the features will be used
14:29:12 <daemontool> are very different
14:29:15 <daemontool> for different customers
14:29:15 <slashme> Yes
14:29:17 <daemontool> etc
14:29:34 <daemontool> if people use ext4 and use freezer to backup from within the vms
14:29:40 <daemontool> with dedup
14:29:43 <daemontool> then we offer that
14:29:52 <zhangjn> last week I met with an hpe sales rep in my office.
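[Editor's note: reldan's counting point earlier - a file of m bytes with an n-byte hash window has m - n + 1 window positions, so keeping every window hash grows linearly with the data - can be illustrated with a small Rabin-Karp-style rolling hash. This is an illustrative sketch only; the function names are hypothetical and this is not how Freezer's rsync engine is implemented.]

```python
def rolling_hashes(data, n, base=257, mod=(1 << 61) - 1):
    """Yield a Rabin-Karp hash for every n-byte window of `data`.

    A file of m bytes yields m - n + 1 hashes: each step reuses the
    previous hash instead of rehashing the whole window."""
    drop = pow(base, n, mod)  # weight of the byte that slides out of the window
    h = 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod
        if i >= n:                       # slide: remove the byte leaving the window
            h = (h - data[i - n] * drop) % mod
        if i >= n - 1:
            yield h

def shared_chunks(a, b, n):
    """Window hashes appearing in both a and b - candidate deduplicated chunks."""
    return set(rolling_hashes(a, n)) & set(rolling_hashes(b, n))
```

This makes the trade-off in the thread concrete: finding shared chunks between two files is easy once the hashes exist, but you must store (and look up) a hash per window position, which is the "more hashes than data" concern - and it is exactly the bookkeeping a dedup filesystem like zfs already does in its own hash table.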
14:30:03 <daemontool> but if we have all the giants
14:30:06 <daemontool> looking at us
14:30:15 <daemontool> cause they want to use freezer
14:30:24 <daemontool> then we have to provide that
14:30:42 <daemontool> and I think it's a reasonable approach to get more people onboard the project
14:30:57 <zhangjn> good idea
14:31:13 <daemontool> we can talk about this
14:31:16 <daemontool> at the Summit
14:31:30 <ddieterly> who will be going to the summit?
14:31:38 <ddieterly> i will be there
14:31:41 <daemontool> most of us
14:31:54 <daemontool> me too
14:32:01 <slashme> I will
14:32:06 <slashme> m3m0 as well
14:32:09 <daemontool> frescof too
14:32:13 <daemontool> I hope szaher too
14:32:14 <zhangjn> more resources are good for us.
14:32:19 <slashme> frescof also
14:32:25 <szaher> daemontool: Sorry I won't be able to do it :(
14:32:33 <ddieterly> reldan you going?
14:32:35 <daemontool> ok I'm sorry Saad
14:32:59 <reldan> ddieterly: Nope, my boss said that I'm not going
14:32:59 <daemontool> I'm going to return the free ticket then, is that ok?
14:33:14 <szaher> Ok
14:33:29 <daemontool> ddieterly, if you could say half a word internally that'd be good, if you can/want
14:33:38 <ddieterly> ok, sounds like a good number of people will be there
14:33:39 <daemontool> I can share the room
14:33:46 <daemontool> so no room costs
14:33:56 <daemontool> and provide free access to the summit
14:34:11 <ddieterly> daemontool i did not understand; could you please rephrase?
14:34:29 <daemontool> ddieterly, can you have a word with Omead, and see if there's anything he can do to send reldan ?
14:34:33 <daemontool> something like that
14:34:49 <ddieterly> daemontool i'm sorry, but i just barely made the cut to go
14:34:51 <daemontool> I can share the room
14:34:56 <daemontool> ok
14:34:57 <daemontool> np
14:34:58 <ddieterly> hpe is really cutting back on attendees
14:35:01 <daemontool> ok
14:35:03 <zhangjn> share man :(
14:35:17 <daemontool> ok np
14:35:43 <ddieterly> daemontool omead can only send 2 people
14:35:44 <zhangjn> I am so shy
14:35:59 <ddieterly> roland and i got to go
14:36:04 <daemontool> ok np
14:36:07 <daemontool> ty :)
14:36:11 <daemontool> anyway
14:36:18 <daemontool> let's get back on track
14:36:20 <daemontool> with the meeting here
14:36:23 <daemontool> so
14:36:28 <daemontool> anyone wants to write the bp
14:36:43 <daemontool> for engines that leverage storage-specific
14:36:46 <daemontool> features?
14:36:52 <daemontool> like dedup and incrementals?
14:37:24 <daemontool> zhangjn, EinstCrazy are you interested guys?
14:38:03 <ddieterly> i'd be happy to, but i don't know enough about the project to do it
14:38:09 <ddieterly> :-(
14:38:30 <daemontool> ddieterly, do you mean you don't know about freezer?
14:38:38 <daemontool> I think this can be a good opportunity
14:38:44 <daemontool> and if you look at the rsync code
14:38:49 <ddieterly> yea, i don't know enough about backups and backup technology
14:38:49 <daemontool> it will be somewhat similar
14:38:54 <daemontool> ah ok
14:39:09 <daemontool> well, we can start with 3par, storeonce, emc
14:39:12 <EinstCrazy> I'm interested. But I don't know the scope in detail
14:39:15 <daemontool> or at least the technology
14:39:21 <daemontool> we have more access to
14:39:25 <zhangjn> good opportunity to get to know freezer and backup.
14:39:47 <daemontool> ok
14:39:49 <daemontool> so let me know
14:39:54 <daemontool> we can even start with
14:39:56 <daemontool> zfs
14:40:07 <daemontool> I don't know
14:40:27 <zhangjn> EinstCrazy, go ahead, daemontool will help you.
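[Editor's note: the engine idea being floated for the blueprint - per-backend drivers that ask the storage for its changed blocks, with a tar/rsync-style re-read fallback - might look roughly like the sketch below. All class and method names here are hypothetical illustrations, not actual Freezer APIs.]

```python
import hashlib
from abc import ABC, abstractmethod

class IncrementalEngine(ABC):
    """Hypothetical driver interface: each backend reports changed extents its own way."""

    @abstractmethod
    def changed_blocks(self, volume_id, since):
        """Return (offset, length) extents modified since the given reference point."""

class NativeSnapshotEngine(IncrementalEngine):
    """For backends (zfs/btrfs/3par-style) that can diff two snapshots natively.

    No data is re-read: the storage itself knows which blocks changed."""
    def __init__(self, diff_fn):
        self._diff_fn = diff_fn  # backend-specific snapshot-diff call

    def changed_blocks(self, volume_id, since):
        return self._diff_fn(volume_id, since)

class RereadEngine(IncrementalEngine):
    """Fallback: hash every block and compare - reads all the data every run."""
    def __init__(self, read_fn, block_size=4096):
        self._read_fn = read_fn
        self._block_size = block_size

    def changed_blocks(self, volume_id, since):
        """`since` is a dict of offset -> previous block hash."""
        changed = []
        data = self._read_fn(volume_id)
        for off in range(0, len(data), self._block_size):
            block = data[off:off + self._block_size]
            if hashlib.sha256(block).hexdigest() != since.get(off):
                changed.append((off, len(block)))
        return changed
```

The contrast between the two subclasses is the whole argument of the meeting: the native engine scales to thousands of 100GB volumes because the I/O cost is proportional to the changes, while the fallback's cost is proportional to the total data.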
14:40:38 <daemontool> if you are interested I'm here
14:40:40 <daemontool> frescof too
14:40:48 <daemontool> and reldan too :)
14:40:57 <daemontool> m3m0, we can move forward
14:41:02 <daemontool> to the next topic
14:41:09 <EinstCrazy> en
14:41:15 <m3m0> ok, but we still don't have a list of topics
14:41:28 <m3m0> does anyone have something to share?
14:41:30 <daemontool> m3m0, improvise
14:41:33 <daemontool> lol
14:41:37 <daemontool> so
14:41:44 <daemontool> reldan, metadata
14:41:46 <m3m0> documentation or testing then :)
14:41:54 <daemontool> any news on that?
14:42:22 <ddieterly> ok, hpe is talking about multi-region again
14:42:34 <reldan> Yes, metadata. I am still writing the blueprint. But I actually have one question
14:42:37 <ddieterly> we still aren't sure exactly what it is, but it keeps coming up
14:43:17 <kelepirci_> I am here to observe
14:43:22 <ddieterly> one instance of freezer-api/elasticsearch serving multiple regions of nova/neutron/etc
14:44:00 <reldan> We have discussed that we would like to change the way we keep our backups. Like for swift engine/id/bla-bla. So my question - should we support our previous backups and storages in this case
14:44:49 <ddieterly> reldan you mean remain backward compatible?
14:45:37 <reldan> yes, you are right. Because 1) remaining backward compatible is a really bad solution 2) not keeping the previous backups - also not very nice
14:46:11 <ddieterly> it is much easier for all if backward compatibility is maintained
14:46:21 <ddieterly> otherwise, you must deal with upgrading
14:46:25 <zhangjn> How many people are using the freezer project for backup in production?
14:46:50 <ddieterly> freezer is shipped with hpe helion
14:47:11 <daemontool> it is also being used in a few projects at Ericsson
14:47:17 <ddieterly> so, we need backward compatibility or else an upgrade path
14:47:34 <daemontool> ddieterly, yes, but it's just about doing a new level 0 backup after all
14:47:48 <daemontool> that's the difference I think
14:48:04 <ddieterly> ok, as long as existing installations do not break...
14:48:11 <reldan> In the case of backward compatibility, there is not much reason to support the new container path and metadata. Because I should be able to restore my data without metadata anyway :)
14:48:54 <zhangjn> who can write a use case for me? I can push the freezer project in china.
14:50:09 <daemontool> zhangjn, let's have an offline conversation about that
14:50:10 <daemontool> I can help
14:50:29 <daemontool> reldan, I agree
14:51:07 <reldan> So I need a decision about backward compatibility
14:51:13 <reldan> I can even write a "migration"
14:51:26 <daemontool> any thoughts?
14:51:27 <daemontool> anyone
14:51:38 <reldan> but I don't think that for a big amount of data it is a good idea
14:51:45 <daemontool> I agree
14:51:49 <ddieterly> migration sucks
14:52:11 <reldan> :)
14:52:16 <ddieterly> ;-)
14:52:29 <daemontool> another feature we need to provide
14:52:33 <daemontool> is rolling upgrades
14:52:40 <daemontool> all other services are providing that
14:52:46 <daemontool> we need to do that too
14:52:53 <zhangjn> will how to upgrade freezer be considered later?
14:53:08 <reldan> Second idea (also very bad): rename the swift storage to old-swift and write a new-swift with metadata and different paths
14:53:19 <reldan> local -> old-local
14:53:29 <reldan> ssh -> old-ssh
14:53:40 <daemontool> reldan, I think, if needed, we can have backward incompatibility
14:53:41 <reldan> and support both versions :)
14:53:50 <daemontool> in Newton
14:53:55 <daemontool> and if you need to restore data
14:54:00 <daemontool> exactly at that moment
14:54:15 <daemontool> because there's no level 0 yet
14:54:25 <daemontool> then a previous freezer-agent version needs to be used
14:54:30 <daemontool> we can write that in the documentation
14:54:44 <daemontool> if we need to move forward and provide better features and simplify our life
14:55:04 <reldan> Sounds good to me. But then everybody should agree that at some point we will be unable to restore old backups with new code
14:56:07 <slashme> I don't see the problem
14:56:12 <m3m0> guys we have 4 min left
14:56:19 <daemontool> if this happens rarely
14:56:24 <daemontool> I don't see the problem either
14:56:30 <slashme> You just have to use an older version of freezer to restore an older backup
14:56:49 <ddieterly> slashme will give you the customer ticket when the problem comes up ;-)
14:57:13 <yangyapeng> I disagree
14:57:15 <daemontool> ddieterly, it's slashme that gets the ticket first :)
14:57:26 <daemontool> yangyapeng, ok, please expand
14:57:28 <daemontool> :)
14:57:49 <ddieterly> yea, i meant that slashme gets the ticket
14:57:49 <yangyapeng> it is trouble to restore using an old backup
14:58:22 <m3m0> 2 minutes left
14:58:24 <daemontool> ddieterly, ok :)
14:58:38 <daemontool> sorry I have to run now, I have a meeting
14:58:44 <ddieterly> ciao!
14:58:52 <m3m0> remember that we can take this discussion to the #openstack-freezer channel
14:59:07 <m3m0> thanks all for your time :)
14:59:09 <m3m0> #endmeeting