14:00:40 #startmeeting freezer
14:00:40 Meeting started Thu Jun 30 14:00:40 2016 UTC and is due to finish in 60 minutes. The chair is domhnallw. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:40 groovy
14:00:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:44 The meeting name has been set to 'freezer'
14:00:50 o/
14:01:01 Okay, I don't see any topics in the etherpad, has anyone any suggestions?
14:01:53 o/
14:01:55 o/
14:01:55 o/
14:02:03 #topic Using bup for backup engine - Tim Buckley to lead discussion
14:02:45 timothyb89?
14:02:48 so, I'm interested in adding a new engine that would use https://github.com/bup/bup
14:03:31 it would give us solid deduplication for free, in addition to incremental backups and compression
14:03:56 I do note this in the readme: "Reasons you might want to avoid bup: This is a very early version. Therefore it will most probably not work for you, but we don't know why. It is also missing some probably-critical features."
14:04:04 it would fix the limitations tar has with incrementals and name changes/moves
14:04:38 while the github repo does say that (and it is still true, in some areas) it is still a fairly mature project
14:04:41 it's been around ~6 years now
14:05:07 and its underlying format (git packfile) is pretty battle-tested by now
14:05:12 how is it doing in the performance area?
14:05:47 I haven't measured that myself, but I'd imagine pretty decently... it seems that they have implemented all the performance-critical parts in C
14:05:58 current release 0.28.1 is a red flag
14:06:21 and I wonder how it will behave when working with streams
14:06:22 ddieterly, red flag?
14:06:35 The 0.x version number you mean?
14:06:38 it is not at a 1.0 release yet
14:06:43 timothyb89, can it be used to store data in the currently supported media storage?
14:06:44 domhnallw yes
14:07:32 how would it work, would we wrap the executable?
14:07:36 daemontool: to my knowledge it's only designed for filesystem-to-filesystem backups, so there would need to be a small layer on top for, e.g., swift
14:08:15 There's a related project called 'bupper' that facilitates config file-based profiles: https://github.com/tobru/bupper
14:08:16 wrapping needs further investigation, but it might not need to be wrapped at all, given that it's mainly written in python
14:08:46 we may be able to use it directly, or worst case use it like we currently use tar
14:09:20 timothyb89, if we can use the code without wrapping binaries, +1
14:11:18 there are other backup tools similar to bup, e.g. borg, attic, ...
14:11:38 Might also be worth looking at this near the bottom of the readme: https://github.com/bup/bup/tree/0.28.1#things-that-are-stupid-for-now-but-which-well-fix-later
14:11:39 has anybody tried to analyse the different options?
14:12:08 i did with borg
14:12:17 python 3 only
14:12:19 it would be great if someone could make a list of possible options in a bp and pros/cons of each option
14:12:25 works with local storage
14:12:32 but not so sure about swift
14:12:33 and also define some requirements from our side
14:12:38 for the tool
14:12:52 what it needs to support/provide in order to be included
14:13:08 Absolutely, a concrete set of requirements must come first.
14:13:24 (a *lot* easier said than done though)
14:13:43 if everybody contributes what they know, then it would be easy
14:13:55 step 1, create a bp
14:14:35 step 3, profit
14:14:44 I looked at different backup systems for my private home backup a while ago. I found this overview very useful: https://wiki.archlinux.org/index.php/Synchronization_and_backup_programs
14:14:55 I can write the requirements
14:15:02 in the bp
14:15:57 #action daemontool to create bp for backup engine options/requirements
14:16:38 we need to be consistent with previous engines conversations
14:16:53 I'll check that with slashme
14:17:02 are those recorded somewhere?
14:17:07 I was only aware of tar?
14:17:23 domhnallw there's dar as well
14:17:24 ddieterly: I see what you did there :P
14:17:54 m3m0 i didn't just fall off a turnip truck
14:18:28 ok next?
14:18:50 We're done with this topic then?
14:19:10 sure
14:19:18 Okay.
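For context on the "use it like we currently use tar" option discussed above, here is a minimal sketch of shelling out to the bup CLI from Python. The helper function is hypothetical, not existing freezer code; it assumes the bup binary is installed and on PATH.

```python
# Illustrative sketch only: drives the bup CLI the way freezer drives tar.
# bup_backup() is a hypothetical helper, not part of the freezer codebase.
import os
import subprocess

def bup_backup(source_dir, repo_dir, branch="freezer"):
    env = dict(os.environ, BUP_DIR=repo_dir)  # bup reads its repo location from BUP_DIR
    subprocess.check_call(["bup", "init"], env=env)               # create the repo if missing
    subprocess.check_call(["bup", "index", source_dir], env=env)  # scan the tree for changes
    subprocess.check_call(["bup", "save", "-n", branch, source_dir], env=env)  # dedup + store
```

Because bup deduplicates at the packfile level, repeated calls to bup_backup() would only store blocks that changed since the previous save.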
14:19:19 #topic Policy on breaking backward compatibility
14:19:32 This should be fun :)
14:19:40 i know we discussed this before, but i'm not sure if there is a stated policy on this
14:19:50 domhnallw, nice to meet you btw
14:19:54 Likewise :)
14:20:00 when daemontool is around, it is always fun
14:20:11 If anyone has any additional topics, now would be an excellent time to add them to the etherpad :)
14:21:10 do we have a policy on breaking backward compatibility?
14:21:24 So, backward compatibility. Are we talking about changing freezer's behaviour, or changing its requirement versions?
14:21:36 behavior I believe?
14:22:03 as i understand it, version n does not work with version n+1
14:23:09 So I guess we need to also think about whether we're talking about the API, the scheduler, or the agent, and if we're talking command-line arguments, configuration settings, etc. etc.
14:23:30 if you are adding things to the etherpad, could you please pick a color so that we know who is adding what?
14:24:01 ddieterly, it looks so far that it's you and yangyapeng, is that wrong?
14:24:36 it looks like someone is adding with purple
14:24:52 not me :)
14:24:53 maybe that's lavender, i'm not a color expert
14:25:05 'How can we scale'...
14:25:11 who dat?
14:25:30 got to be daemontool
14:25:49 Whoever it is, please state your name and, erm, colour?
14:25:49 i guess daemontool ?
14:25:50 :p
14:25:57 it's me yes
14:26:04 domhnallw: usually, for cli arguments
14:26:06 first of all, you are welcome
14:26:09 :)
14:26:12 ok
14:26:14 haha :) I see the patch in gerrit
14:26:16 Adding is no problem
14:26:30 removing needs to be deprecated for a release cycle
14:26:34 daemontool just put your name on a color so that we know who it is
14:27:21 daemontool thank you
14:27:36 Okay, they're both you :)
14:27:43 Thanks folks.
14:27:50 Now, back on topic.
14:27:52 so, getting back to serious things
14:27:59 CLI arguments for the various elements of freezer I believe?
14:28:03 one example I was wondering about was fixing the '--no-incremental' argument type (string -> boolean)
14:28:03 how do we back up a large data set,
14:28:13 without re-reading all the data every time
14:28:23 ?
14:29:48 daemontool what is the issue with the current implementation now?
14:29:59 that you need to re-read all the data every time
14:30:14 I mean not now
14:30:20 because we do not support block-based incrementals
14:30:32 so we check only file inode changes
14:30:59 would a different engine besides tar solve some of that?
14:31:09 it depends on the engine
14:31:12 dar nope
14:31:27 ok, so a different engine may help with that
14:31:28 so the problem is that if you have 1TB
14:31:31 I think it would have to be a different engine, right?
14:31:35 yes
14:31:43 you read it today
14:31:48 compute the block hashes
14:32:03 or anything that keeps track of block state
14:32:04 ok, dumb idea, why don't users just back up entire volumes of the cloud hosts?
14:32:27 ddieterly, ok, no incremental?
14:32:36 if we have a 1TB
14:32:38 use a huge net instead of trying to pick up each little fishy
14:32:43 Incremental backups are designed to save space, right? Just save the changes rather than having multiple copies of near-identical data?
14:33:04 I can't see when that wouldn't be at least desirable.
14:33:04 so let's say we have a volume of 1TB today
14:33:26 we execute a volume backup today
14:33:29 what do we do tomorrow?
14:33:37 back up the whole volume again?
14:33:51 if we do incrementals
14:33:58 daemontool incremental?
14:34:03 then we need to check the block differences with yesterday's execution
14:34:09 but every time we do a backup
14:34:14 we need to re-read the 1TB
14:34:28 now think if we have 500 volumes, 1TB each
14:34:33 every time we need to re-read 500TB
14:34:38 that does not scale...
14:34:46 so, how do we scale? :(
14:34:53 so, you propose to check inodes instead of the data itself?
14:35:02 ddieterly, that's how we do things now with tar
14:35:11 and with that approach every day we back up 1TB
14:35:12 each time
14:35:17 fast but not efficient
14:35:24 this is important
14:35:29 because in enterprise environments
14:35:33 so, is there an efficient way to do this?
14:35:35 1PB of storage starts to be common
14:35:49 Throwing hardware at the problem is only ever a stop-gap solution.
14:35:49 what do other backup solutions do to handle this?
14:35:56 we need to adopt something that tracks the blocks changed in the fs
14:36:01 when the data is written
14:36:03 daemontool are you proposing a solution?
14:36:03 by the application
14:36:13 omg
14:36:19 so for instance
14:36:38 well this is a discussion I had with a customer
14:36:49 not sure about the solution
14:36:58 does anybody do this? is there a precedent for this?
14:37:03 so zfs keeps the hashtable of changed blocks in the fs
14:37:09 It sounds a lot like Dropbox if I'm honest :)
14:37:11 sounds like you want to write a logging fs
14:37:34 domhnallw don't be honest, lie to us, we like that better ;-)
14:37:34 ddieterly, nope, I think we need to support that with at least 1 FS that provides that
14:37:55 but I don't know about the solution
14:38:01 It would seem to me that doing filesystem-specific stuff might not be the cleverest solution in an open environment?
14:38:09 if we solve this, we provide enterprise-grade backups
14:38:11 That's something we should investigate
14:38:14 Yep.
14:38:25 ok
14:38:28 :(
14:38:29 :)
14:38:36 And I think this is a topic for the midcycle
14:38:36 next
14:38:41 Okay.
14:38:42 yes definitely
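To make the scaling concern above concrete, here is a toy sketch of block-hash change detection. Only changed blocks would be uploaded, but every block still has to be read and hashed on every run, which is exactly the 500TB re-read problem; all names here are illustrative, not freezer code.

```python
# Toy sketch of block-based incremental detection. The upload shrinks to the
# changed blocks, but the full source must still be read and hashed each run.
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks, an arbitrary example size

def changed_blocks(path, previous_hashes):
    """Return (new_hashes, changed_offsets) for one file."""
    new_hashes, changed = {}, []
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK_SIZE)  # the full re-read happens here, every run
            if not block:
                break
            new_hashes[offset] = hashlib.sha256(block).hexdigest()
            if previous_hashes.get(offset) != new_hashes[offset]:
                changed.append(offset)  # block differs from yesterday's execution
            offset += len(block)
    return new_hashes, changed  # caller persists new_hashes for the next run
```

Avoiding that read is what the zfs-style approach above buys: the filesystem already knows which blocks changed when the application wrote them, so the backup tool never has to rescan the data.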
14:38:58 #topic Tenant resources backup (relates to backup as a service)
14:39:04 https://blueprints.launchpad.net/freezer/+spec/tenant-backup/
14:39:07 We said no to this
14:39:10 ah ok
14:39:14 sorry
14:39:17 For two reasons
14:39:52 1. Way too much code to implement and maintain to support this (we need to map every api call for every OpenStack service)
14:40:08 this was also about what people asked for at the summit, during the design session
14:40:15 but if we do not want to provide that
14:40:18 fine for me
14:40:21 :)
14:40:33 2. This is basically what Smaug does, and they are a step ahead of us on that topic
14:40:56 Okay, so moving on?
14:41:03 We've already covered this topic I think: 'How do we scale? (i.e. use case backup a data set of 500TB)'
14:41:19 #topic 'How do we scale? (i.e. use case backup a data set of 500TB)'
14:41:36 I'm putting it in for completeness but I think we've just had this discussion.
14:41:54 Anyone?
14:42:39 Right. Next topic?
14:42:57 #topic Add more back end storages (AWS S3?)
14:43:08 This is from daemontool again.
14:43:26 Again, we've been here a bit already.
14:43:32 Think we can skip, that okay?
14:43:43 what was the outcome in brief, sorry?
14:44:00 We have an action to generate a requirements document to look at what we'd need here.
14:44:15 That will be easy once the agent refactoring is completed
14:44:15 (for any engines we'd consider)
14:44:34 ok
14:44:46 Good to move on? Or am I rushing things?
14:45:09 domhnallw, good good
14:45:15 Okay.
14:45:16 #topic Add multiple snapshotting technologies, not only LVM or cinder volumes only. We need to integrate with 3PP Storage snapshotting APIs.
14:46:46 that'd be about interacting directly with the back end storage to generate snapshots
14:46:59 like VNX, 3PAR, etc
14:47:30 zfs, btrfs, etc
14:47:30 So I guess we'd need to see if anyone has a wishlist of vendors and/or technologies it'd be preferred to support?
14:48:15 let's start with the technologies directly related to the companies we work for, that'd be easier for all of us to justify time
14:48:17 to invest on it
14:48:26 ?
14:49:45 Seems sensible. Would we need to devise a minimum set of supported actions each technology would need to be able to implement (or work around), or is this already in place?
14:49:54 daemontool: Same answer
14:50:06 Easy once the agent refactoring is completed
14:50:18 * domhnallw thought it was "no for two reasons" :)
14:50:19 It will just be adding a new snapshot plugin
14:50:45 ok
14:51:08 Next?
14:51:10 Moreover, snapshot plugins will be the simplest
14:51:51 domhnallw: yes
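A rough sketch of what "just add a new snapshot plugin" could look like follows. The base class and method names are hypothetical, since the real interface will only exist once the agent refactoring mentioned above lands.

```python
# Hypothetical shape of a snapshot plugin after the agent refactoring.
# The class and method names are illustrative, not the actual interface.
import abc

class SnapshotPlugin(abc.ABC):
    """One plugin per snapshotting technology (LVM, cinder, zfs, 3PAR, ...)."""

    @abc.abstractmethod
    def create(self, source):
        """Create a point-in-time snapshot of `source` and return a handle."""

    @abc.abstractmethod
    def mount(self, snapshot):
        """Expose the snapshot as a readable path for the backup engine."""

    @abc.abstractmethod
    def release(self, snapshot):
        """Unmount and delete the snapshot once the backup completes."""
```

Under a design like this, supporting a new vendor (VNX, 3PAR) or filesystem (zfs, btrfs) would mean implementing these three operations against its snapshotting API, leaving the engines untouched.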
14:51:54 #topic Fix freezer-scheduler use trigger cron
14:51:59 From yangyapeng
14:52:08 hello
14:52:28 hi :)
14:52:29 has someone tested schedule_date and start_date in job_schedule?
14:53:10 I have done some testing of the cron trigger in freezer-scheduler; cron is unavailable
14:54:02 So are we saying cron is unavailable/unmocked in the testing environment, or...?
14:54:45 I think we are saying it is just not working
14:54:58 yeah
14:54:58 Oh :)
14:55:19 it is not working, i will fix it
14:55:33 As an aside, I was a bit confused looking at the API where some dates/times were expressed as Unix timestamps and others as ISO 8601 for no immediately obvious reason...
14:55:54 That might be for another time though.
14:56:32 So yangyapeng is that an action for you?
14:56:52 yeah
14:57:28 #action yangyapeng Fix cron/scheduling issues in freezer-scheduler
14:57:37 :)
14:57:43 :)
14:57:52 Next?
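For anyone reproducing the cron issue, here is a minimal standalone example of the kind of cron trigger freezer-scheduler relies on (it builds on APScheduler). This is a generic APScheduler example for isolating the behaviour, not the failing freezer code path itself.

```python
# Minimal standalone check that an APScheduler cron trigger fires.
# Generic reproduction aid; not freezer-scheduler code.
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

def run_job():
    print("job fired")

scheduler = BlockingScheduler()
# Equivalent of a job_schedule entry with a crontab-style spec: every minute.
scheduler.add_job(run_job, CronTrigger.from_crontab("* * * * *"))
scheduler.start()  # blocks; Ctrl-C to stop
```

If this fires on its own but the same spec fails inside freezer-scheduler, the bug is in how the job_schedule fields are translated into the trigger rather than in the scheduler library.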
14:57:56 #topic Improve the volumes backup through the Cinder APIs (i.e. retention, delete, list backups, metadata)
14:58:24 First of all, what's bad/wrong with the current state of this?
14:58:38 daemontool?
14:59:03 We're running low on time here, I've noticed.
14:59:08 ping iceyao
14:59:21 here
14:59:21 yes
14:59:24 sorry
14:59:32 so we do not remove old backups currently
14:59:36 Right folks, we may need to just jump to the reviews that need looking at soon...
14:59:38 the topic is yours?
14:59:43 when executing cinder backups using the cinder api
15:00:04 let's go to the chan
15:00:10 Okay.
15:00:12 Pending reviews?
15:00:19 And then we'll wrap up.
15:00:29 First, a "hot pickle": https://review.openstack.org/#/c/331880
15:00:31 Currently, volume backups cannot be deleted or listed by freezer
15:00:31 Native cinder backup
15:00:31 So, we need to work on it
15:00:37 ¬_¬
15:00:43 I think we need to move guys
15:00:50 now there's another meeting here
15:00:53 Yep, will we close this out now?
15:00:57 yes
15:01:00 #endmeeting freezer
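As a starting point for the retention/delete work raised in the last topic, here is a rough sketch of a retention pass over native cinder backups via python-cinderclient. The client construction and the keep-newest-N policy are illustrative assumptions, not an agreed design.

```python
# Rough sketch of a retention pass over native cinder backups using
# python-cinderclient. Auth setup and the retention policy are assumptions.
from cinderclient import client as cinder_client

def prune_backups(keep=3, **auth_kwargs):
    cinder = cinder_client.Client("2", **auth_kwargs)  # auth details elided
    per_volume = {}
    for backup in cinder.backups.list():               # the listing freezer lacks today
        per_volume.setdefault(backup.volume_id, []).append(backup)
    for backups in per_volume.values():
        backups.sort(key=lambda b: b.created_at, reverse=True)
        for old in backups[keep:]:                     # everything past the newest `keep`
            cinder.backups.delete(old)                 # the removal freezer lacks today
```

Wiring list/delete like this into the freezer agent would cover the retention gap described above; metadata handling would still need its own design.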