13:59:24 <m3m0> #startmeeting freezer
13:59:26 <openstack> Meeting started Thu Mar 24 13:59:24 2016 UTC and is due to finish in 60 minutes. The chair is m3m0. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:30 <openstack> The meeting name has been set to 'freezer'
13:59:49 <m3m0> hey guys, who's here for the freezer meeting?
13:59:54 <m3m0> o/
14:00:01 <ddieterly> o/
14:01:00 <zhurong> 0/
14:01:03 <daemontool> o/
14:01:17 <zhangjn> 0/
14:01:21 <reldan> 0/
14:01:24 <yangyapeng> :)
14:01:33 <m3m0> there is a situation
14:01:40 <m3m0> we don't have topics for today :P https://etherpad.openstack.org/p/freezer_meetings
14:01:54 <m3m0> but I would like to start with one
14:02:04 <m3m0> #topic pending reviews
14:02:27 <m3m0> does anyone have important commits that need to be reviewed?
14:02:44 <daemontool> I think this is important https://review.openstack.org/#/c/297119/
14:02:45 <daemontool> from infra
14:03:03 <slashme> From szaher https://review.openstack.org/#/c/295220/
14:03:23 <daemontool> it's not highly urgent and still work in progress
14:03:30 <daemontool> https://review.openstack.org/#/c/290461/ but if anyone wants to review the code...
14:03:38 <daemontool> thanks ddieterly btw ^^
14:03:54 <m3m0> from my side
14:03:54 <m3m0> https://review.openstack.org/278407
14:04:00 <m3m0> https://review.openstack.org/280811
14:04:07 <m3m0> https://review.openstack.org/291757
14:04:08 <ddieterly> daemontool np, just gave it a cursory reading, nothing too deep
14:04:13 <m3m0> https://review.openstack.org/296436
14:04:45 <daemontool> we need to solve an important challenge
14:05:02 <daemontool> which is, how not to re-read all the data every time to compute incrementals
14:05:02 <m3m0> which is?
14:05:25 <daemontool> so we had a quick exchange of ideas with frescof
14:05:30 <m3m0> can we use an external tool like diff?
14:05:43 <daemontool> the problem is
14:05:46 <reldan> daemontool: Like git does - check the modification date
14:06:01 <daemontool> ok
14:06:07 <daemontool> so let's say there's a use case
14:06:12 <daemontool> where the current data
14:06:29 <daemontool> is 300TB (sounds big but is starting to be common)
14:06:54 <zhangjn> backup 300TB of data?
14:06:57 <daemontool> if you have volumes of 100GB
14:07:11 <daemontool> you will realise 300TB is not that many volumes
14:07:20 <daemontool> zhangjn, yes
14:07:23 <daemontool> even 100TB
14:07:37 <zhangjn> Some devstack patches need to be accepted.
14:07:44 <zhangjn> right now devstack does not work.
14:07:47 <daemontool> zhangjn, yes ++
14:07:58 <daemontool> so
14:08:07 <daemontool> I hear about a lot of uses
14:08:10 <daemontool> use cases
14:08:37 <daemontool> where a solution is needed
14:08:38 <zhangjn> https://review.openstack.org/#/c/265111/
14:08:44 <zhangjn> this one
14:08:45 <daemontool> that can do backups of all volumes
14:08:48 <daemontool> like all_tenants
14:08:50 <daemontool> incremental
14:08:55 <daemontool> and that is the challenge we have
14:08:58 <m3m0> like reldan suggested, do it like git or tar: first check the modification time, and if it is different from the previous one, read only that file
14:09:09 <daemontool> we cannot re-read the data every time
14:09:18 <daemontool> m3m0, we might have something like
14:09:22 <m3m0> we are only reading the inodes
14:09:23 <daemontool> /dev/sdb2
14:09:38 <daemontool> but let's say
14:09:40 <m3m0> ooooo not from the file system perspective but from the image?
14:09:42 <daemontool> we have 10k volumes
14:09:45 <m3m0> sorry, the volume?
14:09:48 <daemontool> so we have 10k files
14:09:53 <daemontool> and every file is modified
14:10:01 <daemontool> and every file is 100GB
14:10:24 <daemontool> so we need to re-read the data to generate the incrementals each time...
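[Editor's note: the git/tar-style mtime check discussed above can be sketched in a few lines of Python. `scan_mtimes` and `changed_since` are hypothetical helper names for illustration, not Freezer code; note that mtime alone can miss changes made by tools that preserve timestamps, which is part of why the thread below looks at filesystem-level change tracking.]

```python
import os

def scan_mtimes(root):
    """Walk `root` and record each file's modification time (one full scan)."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[path] = os.stat(path).st_mtime
    return manifest

def changed_since(previous, current):
    """Files that are new, or whose mtime differs from the previous scan.

    Only these need to be re-read to build the incremental."""
    return [path for path, mtime in current.items()
            if previous.get(path) != mtime]
```

The point of the discussion is that even this cheap check does not help when every one of the 10k volume files is touched between backups: you still end up re-reading all the data.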
14:10:38 <reldan> daemontool: If we don't know about the structure of these files - yes
14:10:44 <reldan> there is no better solution
14:10:57 <daemontool> we could think of something like
14:11:01 <daemontool> storage drivers
14:11:11 <reldan> if we know, for example, that these files only support appends at the end - it's easier
14:11:11 <daemontool> there are filesystems that can do that for us
14:11:17 <daemontool> like zfs
14:11:19 <daemontool> or btrfs
14:11:26 <daemontool> or other proprietary solutions
14:11:44 <daemontool> they can provide to us
14:11:46 <daemontool> beforehand
14:12:12 <daemontool> the blocks that changed
14:12:16 <daemontool> and for example
14:12:18 <daemontool> a differential
14:12:25 <daemontool> of 2 snapshots
14:12:45 <EinstCrazy> you mean use the features the fs provides?
14:12:52 <daemontool> yes
14:12:59 <daemontool> we need a way to know what data was modified
14:13:11 <daemontool> without re-reading all the data each time
14:13:24 <daemontool> how do we do that? this is the challenge
14:13:33 <daemontool> we can do it with tar and rsync
14:13:44 <daemontool> for significant use cases
14:13:49 <reldan> I actually don't understand the idea of having 10K files of 100 GB each and modifying them
14:14:03 <m3m0> without support from the file system it will be a problem
14:14:11 <reldan> Do we really have such customers?
14:14:16 <daemontool> reldan, yes
14:14:23 <daemontool> I have at least 3 now
14:14:31 <reldan> Perhaps you can share what the files are?
14:14:35 <EinstCrazy> But I studied btrfs' cow 2 years ago; it did not support file cow, only directory
14:14:44 <reldan> What are they storing in these files?
14:14:55 <daemontool> EinstCrazy, btrfs would be, for example, the file system in the storage
14:15:00 <daemontool> reldan, I don't know
14:15:11 <daemontool> so on the compute nodes
14:15:17 <daemontool> you have /var/lib/nova/instances
14:15:28 <daemontool> and that /var/lib/nova/instances is mounted on a remote storage
14:15:39 <daemontool> but btrfs is one of the use cases
14:15:40 <daemontool> cause
14:15:48 <daemontool> most of the users for this case
14:15:50 <daemontool> would have
14:15:52 <daemontool> emc
14:15:54 <daemontool> or 3par
14:15:57 <daemontool> storeonce
14:15:58 <daemontool> and so on
14:16:39 <zhangjn> Data Deduplication?
14:16:45 <daemontool> data deduplication
14:16:50 <daemontool> we can do it
14:17:01 <daemontool> with rsync, as the hashes of blocks
14:17:12 <daemontool> are computed there
14:17:15 <daemontool> but
14:17:21 <daemontool> for instance, data deduplication
14:17:30 <daemontool> is provided by zfs
14:17:31 <daemontool> natively
14:17:33 <zhangjn> I think this is most important for instance backup.
14:17:34 <daemontool> and also compression
14:17:43 <reldan> Yes, but in the common case I suppose we would have to store more hashes than we have data )
14:18:03 <zhangjn> zfs is good at this scenario.
14:18:04 <daemontool> reldan, rephrase that please?
14:18:08 <daemontool> I didn't get it
14:18:19 <reldan> Let's say we have file A and file B
14:18:28 <reldan> file A and file B share 50% of their content
14:18:32 <daemontool> ok
14:18:43 <reldan> so we have a rolling hash in the first file
14:18:49 <reldan> and a rolling hash in the second
14:19:02 <zhangjn> but I think we'd better do it in freezer.
14:19:03 <reldan> to be able to find that file A and file B share some chunk
14:19:10 <daemontool> reldan, yes
14:19:21 <reldan> we should keep our hashes (in this case rolling hashes)
14:19:53 <reldan> and for a file of size m and a hash chunk of size n
14:20:02 <reldan> we should have m - n + 1 hashes
14:20:19 <reldan> O(m) hashes
14:20:37 <daemontool> reldan, exactly
14:20:42 <daemontool> that's how zfs does it
14:20:51 <daemontool> hash table
14:21:03 <daemontool> some more info here v
14:21:05 <daemontool> https://pthree.org/2012/12/18/zfs-administration-part-xi-compression-and-deduplication/
14:21:13 <daemontool> what I'd like to understand is
14:21:23 <daemontool> do you see this as something we need to solve?
14:21:27 <daemontool> all?
14:21:49 <ddieterly> seems like a scale issue that has to be solved
14:21:57 <daemontool> yes
14:22:12 <daemontool> we cannot use rsync for large scale cases
14:22:15 <daemontool> or for cases
14:22:26 <daemontool> where customers want the backups executed on all_tenants
14:22:33 <daemontool> all volumes
14:22:37 <daemontool> or all instances
14:22:53 <daemontool> if the customer can use rsync on each vm/volume
14:22:55 <daemontool> then better
14:24:05 <daemontool> so
14:24:14 <daemontool> we can have something like storage drivers
14:24:38 <zhangjn> like a zfs driver?
14:24:38 <daemontool> and we manage the storage drivers as btrfs, zfs, storeonce, 3par
14:24:40 <daemontool> yes
14:25:07 <daemontool> so if we have a backend that does dedup and can provide hashes of modified blocks
14:25:11 <daemontool> we use that and we scale
14:25:19 <daemontool> otherwise we provide tar and rsync
14:25:21 <zhangjn> enabling a driver to do this is easy.
14:25:22 <daemontool> depending on the cases
14:25:49 <daemontool> if we provide this...
we'll provide the best open source backup/restore/disaster recovery tool in the world
14:26:07 <daemontool> that's it :)
14:26:39 <zhangjn> in the virtualization scenario :)
14:26:39 <daemontool> now, who wants to write a bp for this?
14:26:49 <daemontool> in the cloud business, yes
14:27:10 <slashme> daemontool: You mean we would have a zfs/btrfs/3par/... engine ?
14:27:20 <daemontool> slashme yes
14:27:30 <slashme> Seems very good to me :)
14:27:34 <daemontool> better definition in the freezer glossary
14:27:36 <daemontool> ty :)
14:27:46 <daemontool> it is highly critical
14:27:55 <daemontool> cause it's the competitive advantage
14:28:16 <daemontool> that commercial solutions have
14:28:21 <daemontool> vs us
14:28:25 <zhangjn> first support 3par
14:28:29 <slashme> And it allows us not to introduce too much complexity in freezer if we manage deduplication and incrementals using the storage
14:28:50 <daemontool> slashme, yes. I think we can provide both
14:29:09 <daemontool> but the cases where the features will be used
14:29:12 <daemontool> are very different
14:29:15 <daemontool> for different customers
14:29:15 <slashme> Yes
14:29:17 <daemontool> etc
14:29:34 <daemontool> if people use ext4 and use freezer to backup from within the vms
14:29:40 <daemontool> with dedup
14:29:43 <daemontool> then we offer that
14:29:52 <zhangjn> last week I met with an hpe sales rep in my office.
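[Editor's note: reldan's counting point earlier - a file of m bytes with an n-byte hash window has m - n + 1 window positions, so keeping every window hash grows linearly with the data - can be illustrated with a small Rabin-Karp-style rolling hash. This is an illustrative sketch only; the function names are hypothetical and this is not how Freezer's rsync engine is implemented.]

```python
def rolling_hashes(data, n, base=257, mod=(1 << 61) - 1):
    """Yield a Rabin-Karp hash for every n-byte window of `data`.

    A file of m bytes yields m - n + 1 hashes: each step reuses the
    previous hash instead of rehashing the whole window."""
    drop = pow(base, n, mod)  # weight of the byte that slides out of the window
    h = 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod
        if i >= n:                       # slide: remove the byte leaving the window
            h = (h - data[i - n] * drop) % mod
        if i >= n - 1:
            yield h

def shared_chunks(a, b, n):
    """Window hashes appearing in both a and b - candidate deduplicated chunks."""
    return set(rolling_hashes(a, n)) & set(rolling_hashes(b, n))
```

This makes the trade-off in the thread concrete: finding shared chunks between two files is easy once the hashes exist, but you must store (and look up) a hash per window position, which is the "more hashes than data" concern - and it is exactly the bookkeeping a dedup filesystem like zfs already does in its own hash table.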
14:30:03 <daemontool> but if we have all the giants
14:30:06 <daemontool> looking at us
14:30:15 <daemontool> cause they want to use freezer
14:30:24 <daemontool> then we have to provide that
14:30:42 <daemontool> and I think it's a reasonable approach to get more people onboard the project
14:30:57 <zhangjn> good idea
14:31:13 <daemontool> we can talk about this
14:31:16 <daemontool> at the Summit
14:31:30 <ddieterly> who will be going to the summit?
14:31:38 <ddieterly> i will be there
14:31:41 <daemontool> most of us
14:31:54 <daemontool> me too
14:32:01 <slashme> I will
14:32:06 <slashme> m3m0 as well
14:32:09 <daemontool> frescof too
14:32:13 <daemontool> I hope szaher too
14:32:14 <zhangjn> more resources are good for us.
14:32:19 <slashme> frescof also
14:32:25 <szaher> daemontool: Sorry I won't be able to do it :(
14:32:33 <ddieterly> reldan you going?
14:32:35 <daemontool> ok I'm sorry Saad
14:32:59 <reldan> ddieterly: Nope, my boss said that I'm not going
14:32:59 <daemontool> I'm going to return the free ticket then, is that ok?
14:33:14 <szaher> Ok
14:33:29 <daemontool> ddieterly, if you could say half a word internally that'd be good, if you can/want
14:33:38 <ddieterly> ok, sounds like a good number of people will be there
14:33:39 <daemontool> I can share the room
14:33:46 <daemontool> so no room costs
14:33:56 <daemontool> and provide free access to the summit
14:34:11 <ddieterly> daemontool i did not understand; could you please rephrase?
14:34:29 <daemontool> ddieterly, can you have a word with Omead, and see if there's anything he can do to send reldan ?
14:34:33 <daemontool> something like that
14:34:49 <ddieterly> daemontool i'm sorry, but i just barely made the cut to go
14:34:51 <daemontool> I can share the room
14:34:56 <daemontool> ok
14:34:57 <daemontool> np
14:34:58 <ddieterly> hpe is really cutting back on attendees
14:35:01 <daemontool> ok
14:35:03 <zhangjn> share man :(
14:35:17 <daemontool> ok np
14:35:43 <ddieterly> daemontool omead can only send 2 people
14:35:44 <zhangjn> I am so shy
14:35:59 <ddieterly> roland and i got to go
14:36:04 <daemontool> ok np
14:36:07 <daemontool> ty :)
14:36:11 <daemontool> anyway
14:36:18 <daemontool> let's get back on track
14:36:20 <daemontool> with the meeting here
14:36:23 <daemontool> so
14:36:28 <daemontool> anyone wants to write the bp
14:36:43 <daemontool> for engines that leverage storage-specific
14:36:46 <daemontool> features?
14:36:52 <daemontool> like dedup and incrementals?
14:37:24 <daemontool> zhangjn, EinstCrazy are you interested guys?
14:38:03 <ddieterly> i'd be happy to, but i don't know enough about the project to do it
14:38:09 <ddieterly> :-(
14:38:30 <daemontool> ddieterly, do you mean you don't know about freezer?
14:38:38 <daemontool> I think this can be a good opportunity
14:38:44 <daemontool> and if you look at the rsync code
14:38:49 <ddieterly> yea, i don't know enough about backups and backup technology
14:38:49 <daemontool> it will be somewhat similar
14:38:54 <daemontool> ah ok
14:39:09 <daemontool> well, we can start with 3par, storeonce, emc
14:39:12 <EinstCrazy> I'm interested. But I don't know the scope in detail
14:39:15 <daemontool> or at least the technology
14:39:21 <daemontool> we have more access to
14:39:25 <zhangjn> good opportunity to get to know freezer and backup.
14:39:47 <daemontool> ok
14:39:49 <daemontool> so let me know
14:39:54 <daemontool> we can even start with
14:39:56 <daemontool> zfs
14:40:07 <daemontool> I don't know
14:40:27 <zhangjn> EinstCrazy, go ahead, daemontool will help you.
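[Editor's note: the engine idea being floated for the blueprint - per-backend drivers that ask the storage for its changed blocks, with a tar/rsync-style re-read fallback - might look roughly like the sketch below. All class and method names here are hypothetical illustrations, not actual Freezer APIs.]

```python
import hashlib
from abc import ABC, abstractmethod

class IncrementalEngine(ABC):
    """Hypothetical driver interface: each backend reports changed extents its own way."""

    @abstractmethod
    def changed_blocks(self, volume_id, since):
        """Return (offset, length) extents modified since the given reference point."""

class NativeSnapshotEngine(IncrementalEngine):
    """For backends (zfs/btrfs/3par-style) that can diff two snapshots natively.

    No data is re-read: the storage itself knows which blocks changed."""
    def __init__(self, diff_fn):
        self._diff_fn = diff_fn  # backend-specific snapshot-diff call

    def changed_blocks(self, volume_id, since):
        return self._diff_fn(volume_id, since)

class RereadEngine(IncrementalEngine):
    """Fallback: hash every block and compare - reads all the data every run."""
    def __init__(self, read_fn, block_size=4096):
        self._read_fn = read_fn
        self._block_size = block_size

    def changed_blocks(self, volume_id, since):
        """`since` is a dict of offset -> previous block hash."""
        changed = []
        data = self._read_fn(volume_id)
        for off in range(0, len(data), self._block_size):
            block = data[off:off + self._block_size]
            if hashlib.sha256(block).hexdigest() != since.get(off):
                changed.append((off, len(block)))
        return changed
```

The contrast between the two subclasses is the whole argument of the meeting: the native engine scales to thousands of 100GB volumes because the I/O cost is proportional to the changes, while the fallback's cost is proportional to the total data.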
14:40:38 <daemontool> if you are interested I'm here
14:40:40 <daemontool> frescof too
14:40:48 <daemontool> and reldan too :)
14:40:57 <daemontool> m3m0, we can move forward
14:41:02 <daemontool> to the next topic
14:41:09 <EinstCrazy> en
14:41:15 <m3m0> ok, but we still don't have a list of topics
14:41:28 <m3m0> does anyone have something to share?
14:41:30 <daemontool> m3m0, improvise
14:41:33 <daemontool> lol
14:41:37 <daemontool> so
14:41:44 <daemontool> reldan, metadata
14:41:46 <m3m0> documentation or testing then :)
14:41:54 <daemontool> any news on that?
14:42:22 <ddieterly> ok, hpe is talking about multi-region again
14:42:34 <reldan> Yes, metadata. I am still writing the blueprint. But I actually have one question
14:42:37 <ddieterly> we still aren't sure exactly what it is, but it keeps coming up
14:43:17 <kelepirci_> I am here to observe
14:43:22 <ddieterly> one instance of freezer-api/elasticsearch serving multiple regions of nova/neutron/etc
14:44:00 <reldan> We have discussed that we would like to change the way we keep our backups. Like for swift engine/id/bla-bla. So my question - should we support our previous backups and storages in this case
14:44:49 <ddieterly> reldan you mean remain backward compatible?
14:45:37 <reldan> yes, you are right. Because 1) remaining backward compatible is a really bad solution 2) not keeping the previous backups - also not very nice
14:46:11 <ddieterly> it is much easier for all if backward compatibility is maintained
14:46:21 <ddieterly> otherwise, you must deal with upgrading
14:46:25 <zhangjn> How many people are using the freezer project for backup in production?
14:46:50 <ddieterly> freezer is shipped with hpe helion
14:47:11 <daemontool> it is also being used in a few projects at Ericsson
14:47:17 <ddieterly> so, we need backward compatibility or else an upgrade path
14:47:34 <daemontool> ddieterly, yes, but it's just about doing a new level 0 backup after all
14:47:48 <daemontool> that's the difference I think
14:48:04 <ddieterly> ok, as long as existing installations do not break...
14:48:11 <reldan> In the case of backward compatibility, there is not much reason to support the new container path and metadata. Because I should be able to restore my data without metadata anyway :)
14:48:54 <zhangjn> who can write a use case for me? I can push the freezer project in china.
14:50:09 <daemontool> zhangjn, let's have an offline conversation about that
14:50:10 <daemontool> I can help
14:50:29 <daemontool> reldan, I agree
14:51:07 <reldan> So I need a decision about backward compatibility
14:51:13 <reldan> I can even write a "migration"
14:51:26 <daemontool> any thoughts?
14:51:27 <daemontool> anyone
14:51:38 <reldan> but I don't think that for a big amount of data it is a good idea
14:51:45 <daemontool> I agree
14:51:49 <ddieterly> migration sucks
14:52:11 <reldan> :)
14:52:16 <ddieterly> ;-)
14:52:29 <daemontool> another feature we need to provide
14:52:33 <daemontool> is rolling upgrades
14:52:40 <daemontool> all other services are providing that
14:52:46 <daemontool> we need to do that too
14:52:53 <zhangjn> will how to upgrade freezer be considered later?
14:53:08 <reldan> Second idea (also very bad): rename the swift storage to old-swift and write a new-swift with metadata and different paths
14:53:19 <reldan> local -> old-local
14:53:29 <reldan> ssh -> old-ssh
14:53:40 <daemontool> reldan, I think, if needed, we can have backward incompatibility
14:53:41 <reldan> and support both versions :)
14:53:50 <daemontool> in Newton
14:53:55 <daemontool> and if you need to restore data
14:54:00 <daemontool> exactly at that moment
14:54:15 <daemontool> because there's no level 0 yet
14:54:25 <daemontool> then a previous freezer-agent version needs to be used
14:54:30 <daemontool> we can write that in the documentation
14:54:44 <daemontool> if we need to move forward and provide better features and simplify our life
14:55:04 <reldan> Sounds good to me. But then everybody should agree that at some point we will be unable to restore old backups with new code
14:56:07 <slashme> I don't see the problem
14:56:12 <m3m0> guys we have 4 min left
14:56:19 <daemontool> if this happens rarely
14:56:24 <daemontool> I don't see the problem either
14:56:30 <slashme> You just have to use an older version of freezer to restore an older backup
14:56:49 <ddieterly> slashme will give you the customer ticket when the problem comes up ;-)
14:57:13 <yangyapeng> I disagree
14:57:15 <daemontool> ddieterly, it's slashme that gets the ticket first :)
14:57:26 <daemontool> yangyapeng, ok, please expand
14:57:28 <daemontool> :)
14:57:49 <ddieterly> yea, i meant that slashme gets the ticket
14:57:49 <yangyapeng> it is trouble to restore using an old backup
14:58:22 <m3m0> 2 minutes left
14:58:24 <daemontool> ddieterly, ok :)
14:58:38 <daemontool> sorry I have to run now, I have a meeting
14:58:44 <ddieterly> ciao!
14:58:52 <m3m0> remember that we can take this discussion to the #openstack-freezer channel
14:59:07 <m3m0> thanks all for your time :)
14:59:09 <m3m0> #endmeeting