Tuesday, February 25, 2014

Manage your backup retention policies with retdo

You have folders full of backups, but storage capacity is running low? Need to clean up all those old files but still want to keep some of them, just in case? Then retdo is the perfect tool!

Retdo is a little script I wrote that lets administrators clean up files according to a custom retention policy.
Retdo can be used to implement retention plans for production backups.

Retdo can handle requests such as:

- I want to keep only one file per week if files are older than 3 months up to 6 months.
- I want to keep only one file per month if files are older than 6 months up to 1 year.
- I want files older than 1 year to be moved to another machine.
- I want a cup of tea (feature in progress)
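Before deleting anything, it can help to see which files actually fall into an age window like the ones above. Here is a minimal read-only sketch using standard find; list_age_window is a hypothetical helper, not part of retdo, and find counts age in whole 24-hour periods, so its window boundaries may not match retdo's exactly. The touch -d relative-date syntax mentioned in the comments is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of retdo: list files in an age window (read-only).
# Usage: list_age_window DIR PATTERN MIN_DAYS MAX_DAYS
# To experiment safely, create test files with e.g.: touch -d '100 days ago' file (GNU touch).
list_age_window() {
    local dir=$1 pattern=$2 min_days=$3 max_days=$4
    # -maxdepth 1 : do not descend into subdirectories
    # -mtime +N   : age, in whole 24-hour periods, is greater than N
    # -mtime -M   : age, in whole 24-hour periods, is less than M
    find "$dir" -maxdepth 1 -type f -name "$pattern" \
        -mtime "+$min_days" -mtime "-$(( max_days + 1 ))" | sort
}
```

For the first request above (older than 3 months up to 6 months), the call would look like: list_age_window /data/backup/db/dbname '*.tgz' 90 180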

Code and instructions are available for free at https://github.com/gcharot/retdo

Example:

Let's say I have my January daily backups in /data/backup/db/dbname:

 # ll /data/backup/db/dbname/
 -rw-r--r-- 1 root root 0 Jan 1 12:00 jan01.tgz  
 -rw-r--r-- 1 root root 0 Jan 2 12:00 jan02.tgz  
 -rw-r--r-- 1 root root 0 Jan 3 12:00 jan03.tgz  
 -rw-r--r-- 1 root root 0 Jan 4 12:00 jan04.tgz  
 -rw-r--r-- 1 root root 0 Jan 5 12:00 jan05.tgz  
 -rw-r--r-- 1 root root 0 Jan 6 12:00 jan06.tgz  
 -rw-r--r-- 1 root root 0 Jan 7 12:00 jan07.tgz  
 -rw-r--r-- 1 root root 0 Jan 8 12:00 jan08.tgz  
 -rw-r--r-- 1 root root 0 Jan 9 12:00 jan09.tgz  
 -rw-r--r-- 1 root root 0 Jan 10 12:00 jan10.tgz  
 -rw-r--r-- 1 root root 0 Jan 11 12:00 jan11.tgz  
 -rw-r--r-- 1 root root 0 Jan 12 12:00 jan12.tgz  
 -rw-r--r-- 1 root root 0 Jan 13 12:00 jan13.tgz  
 -rw-r--r-- 1 root root 0 Jan 14 12:00 jan14.tgz  
 -rw-r--r-- 1 root root 0 Jan 15 12:00 jan15.tgz  
 -rw-r--r-- 1 root root 0 Jan 16 12:00 jan16.tgz  
 -rw-r--r-- 1 root root 0 Jan 17 12:00 jan17.tgz  
 -rw-r--r-- 1 root root 0 Jan 18 12:00 jan18.tgz  
 -rw-r--r-- 1 root root 0 Jan 19 12:00 jan19.tgz  
 -rw-r--r-- 1 root root 0 Jan 20 12:00 jan20.tgz  
 -rw-r--r-- 1 root root 0 Jan 21 12:00 jan21.tgz  
 -rw-r--r-- 1 root root 0 Jan 22 12:00 jan22.tgz  
 -rw-r--r-- 1 root root 0 Jan 23 12:00 jan23.tgz  
 -rw-r--r-- 1 root root 0 Jan 24 12:00 jan24.tgz  
 -rw-r--r-- 1 root root 0 Jan 25 12:00 jan25.tgz  
 -rw-r--r-- 1 root root 0 Jan 26 12:00 jan26.tgz  
 -rw-r--r-- 1 root root 0 Jan 27 12:00 jan27.tgz  
 -rw-r--r-- 1 root root 0 Jan 28 12:00 jan28.tgz  
 -rw-r--r-- 1 root root 0 Jan 29 12:00 jan29.tgz  
 -rw-r--r-- 1 root root 0 Jan 30 12:00 jan30.tgz  
 -rw-r--r-- 1 root root 0 Jan 31 12:00 jan31.tgz  

Now I need to free up some space, so I'd like to keep only one file per week:

 # retdo -p /data/backup/db/dbname -r "*.tgz" -b 1 -e 92 -d 7  
 26 file(s) processed - 0 file(s) in error  
 # ll /data/backup/db/dbname  
 total 0  
 -rw-r--r-- 1 root root 0 Jan 5 12:00 jan05.tgz  
 -rw-r--r-- 1 root root 0 Jan 12 12:00 jan12.tgz  
 -rw-r--r-- 1 root root 0 Jan 19 12:00 jan19.tgz  
 -rw-r--r-- 1 root root 0 Jan 26 12:00 jan26.tgz  
 -rw-r--r-- 1 root root 0 Jan 31 12:00 jan31.tgz  

As you can see, only one file per week (7 days) has been kept; the other 26 files were deleted.

This command means: "find all files matching the pattern *.tgz in /data/backup/db/dbname which are older than 1 day up to 92 days (3 months) and keep only one file every week (7 days)"
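To make the "one file per interval" idea concrete, here is a minimal sketch of such a selection. It is not retdo's actual code: plan_retention is a hypothetical function, the "AGE NAME" input format is an assumption made for illustration, and the choice to keep the newest file of each interval is mine (retdo may pick a different survivor, as the listing above suggests). It only prints a plan and never touches any file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT retdo's implementation.
# Reads "AGE_IN_DAYS NAME" pairs on stdin and prints "keep NAME" or "delete NAME".
# Files outside the [begin, end] age window are ignored (not printed).
plan_retention() {
    local begin=$1 end=$2 interval=$3
    local age name bucket last_bucket=-1
    # Sort by age (ascending) so the newest file of each bucket comes first.
    sort -n | while read -r age name; do
        if (( age < begin || age > end )); then
            continue
        fi
        # Bucket number: which interval-sized slice of the window the file is in.
        bucket=$(( (age - begin) / interval ))
        if (( bucket != last_bucket )); then
            echo "keep $name"        # first (newest) file seen in this bucket
            last_bucket=$bucket
        else
            echo "delete $name"      # bucket already has a survivor
        fi
    done
}

# Simulate the January listing as if seen on February 1st:
# jan31.tgz is 1 day old, jan01.tgz is 31 days old.
for day in $(seq 1 31); do
    printf '%d jan%02d.tgz\n' "$(( 32 - day ))" "$day"
done | plan_retention 1 92 7
```

With these simulated ages the sketch prints 5 "keep" lines and 26 "delete" lines, the same counts as the run above, although the surviving files differ because of the assumed "newest wins" rule.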

Hope that helps!

2 comments:

  1. Hey Greg, I found your blog after you commented on my Salt Banner post.

    Your script, retdo, has solved a current need of mine, so I wanted to reach out and say thanks!

    I really like that the simulate option puts the retention in human-readable form.

    This is what I plan to put into prod this morning:

    # older than 14 days up to 91 days and keep only one file every 7 days
    /bin/bash /usr/local/sbin/retdo -p /backup -r "*tar.gz" -b 14 -e 91 -d 7

    # older than 92 days up to 362 days and keep only one file every 30 days
    /bin/bash /usr/local/sbin/retdo -p /backup -r "*tar.gz" -b 92 -e 362 -d 30

    # delete every file older than 363 days up to 600 days
    /bin/bash /usr/local/sbin/retdo -p /backup -r "*tar.gz" -b 363 -e 600 -d 1

    Working on the cronjobs now. Thanks again!

    Replies
    1. Thanks! We found mutual help thanks to the power of the internet and open source!

      I would recommend using the -s option first; then, if you're happy with the result, run it manually, and finally put it in a cron job.
      Just be careful if you have subdirectories; I should add a "recursive" option when I have the time.

      I have the same kind of setup on my backup NAS:

      # 1 file / month for files older than 3 months up to 1 year

      0 0 1 * * root /usr/local/scripts/retdo -p /data/backups/prod/DB/HBASE/dumps/ -r "*.gpg" -b 90 -e 360 -d 30
      2 0 1 * * root /usr/local/scripts/retdo -p /data/backups/prod/DB/HBASE/dumps/ -r "*.md5" -b 90 -e 360 -d 30


      # Delete files older than 1 year

      3 0 1 * * root /usr/local/scripts/retdo -p /data/backups/prod/DB/HBASE/dumps/ -b 361 -e 720 -d 1


      Greg
