If you asked anyone the basic question “How long should I keep my backups (or backup tapes) for?”, the most common answer in Australia would be “7 years”. But they could never tell you why.
It is one of those urban myths: on the face of it, 7 years sounds like a reasonable retention period, and it aligns with our personal and corporate tax record retention requirements.
However, it is probably inappropriate from a backup retention perspective.
Data retention requirements vary depending on the data set in question. In some ways this opens a Pandora’s box: the need for a proper data classification exercise, as not all data is of equal value.
If you are building a bridge or an office tower, then you will likely want to retain the plans and architectural diagrams associated with that construction FOR THE LIFE OF THE CONSTRUCTION.
Been to Europe lately? It could be over 500 years. The trouble is that in Australia we don’t often think that long term.
Certain regulations ask us to retain records for:
However, backup data retention is not the same as data or records retention. Records retention regulations are usually concerned only with proof of a transaction being available for the stipulated period, should an audit be initiated within the statutory period.
Assume you are in the financial services industry and have a database of transactional client records. Your industry regulation says that you need to retain customer data for seven years.
Every day you add new records to the database.
You may update some records (depending on how the application handles edits).
You don’t purge old records to save space because the disk space is relatively cheap, and the application updates existing records to show an account or transaction as being closed or complete.
There is nothing which suggests you need to take a weekly or monthly backup of this database and retain it for 84 months. Yet you would be surprised how common this practice actually is.
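To put rough numbers on that practice, the sketch below counts how many backup copies sit on media at steady state under a few schedules. The function name `retained_copies` and the 2 TB database size are illustrative assumptions, and the storage figure deliberately ignores incrementals, deduplication and compression, so treat it as an upper bound rather than a sizing tool.

```python
def retained_copies(copies_per_month: float, retention_months: int) -> int:
    """Number of backup copies held at steady state for a given schedule."""
    return round(copies_per_month * retention_months)


DB_SIZE_TB = 2.0  # assumed size of one full copy; replace with your own figure

monthly_84 = retained_copies(1, 84)        # one copy a month, kept 84 months
weekly_84 = retained_copies(52 / 12, 84)   # one copy a week, kept 84 months
daily_1 = retained_copies(365 / 12, 1)     # one copy a day, kept 1 month

for label, copies in [
    ("Monthly for 84 months", monthly_84),
    ("Weekly for 84 months", weekly_84),
    ("Daily for 1 month", daily_1),
]:
    # Naive full-copy storage, before any deduplication or compression.
    print(f"{label}: {copies} copies, up to {copies * DB_SIZE_TB:.0f} TB")
```

If the database already holds seven years of records online, each of those long-retention copies is largely a duplicate of the records in the copy before it.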
The Simplest Retention Strategy is the Best – but you need to know how your applications purge data. Purging of (aged) data can sometimes get in the way of a simple retention regime. If applications (or file servers) never purged or over-wrote any content, then the only backup we would need is the one for the recovery in case of a system corruption; as all data would effectively be retained within a single online copy.
The recurring mistake is over-catering for multiple “levels of recoverability” from copies made years ago. If a file such as a database is updated every day for 7 years, there would be very few circumstances where you would need to go back to a copy specifically from 2374 days ago (about six and a half years, in case you were wondering). Most organisations only keep a monthly “archive” copy of the backup for the purposes of a just-in-case recovery. Very few organisations would ever recover an application back this far. On very rare occasions they are more likely to run up a parallel copy and examine aged records. However, if the online database retains records all the way back to 7 years ago, then this would typically meet the record retention requirements without any backup copies.
What does this mean? Unless there is some hefty purging of content (by users or inside the application), there is no need to retain backups for such a long period.
What about files in a file server, where there are ongoing updates and even deletions of files? Management of whole files is actually more challenging than for records in a database. Enterprise backup software applications have an archive module for exactly this purpose. If you intend to purge entire files from the active primary data store, for example to relieve pressure on backup times and on primary storage usage, then the best option is to archive this data, usually to a much lower cost tier such as nearline disk or tape, or possibly to the cloud.
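As a rough illustration of how archive candidates might be identified on a file server, the sketch below walks a directory tree and lists files not modified for a given number of days. It is not how any particular backup product's archive module works; the function name `archive_candidates` and the 730-day (roughly 24 months) default are assumptions, and modification time is used only because access times are often unreliable on servers.

```python
import os
import time

SECONDS_PER_DAY = 86400


def archive_candidates(root, min_age_days=730, now=None):
    """Return sorted paths under root whose last modification is older than min_age_days.

    min_age_days=730 is an assumed placeholder; tune it to your own
    data access patterns.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

A call such as `archive_candidates("/srv/share")` (a hypothetical path) only produces a list; moving the files to a lower cost tier and leaving a stub behind would be the job of the archive tooling itself.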
For compliance purposes, to retain a copy of a file, some type of protection scheme must be in place. This could be a backup, archive, records/document management system, or repository.
Data Classification and Retention
Assuming the deep purging of in-database records does not occur, then your life can be quite simple. Here’s an outline of sound practice:
|“Current” data set|
|How long to keep:||Determined by restore profile. If not sure, 1 month is a good benchmark|
|Database frequency of copies:||Could be backed up hourly (or in extreme cases every 10-15 mins). With high frequency database backups, a single backup is nominated as the retained representative daily backup, and the other intermediate copies are expired within a small number of hours.|
|File frequency of copies:||Usually daily (perhaps augmented with a snapshot regime on the file server/NAS).|
|Method:||Archive and replace with a tiny stub file (or, less commonly, remove completely from primary storage)|
|Frequency of copies:||Monthly, or perhaps fortnightly|
|Archive candidature:||Typically older than 24 months, but can be lower depending on data access patterns|
|Where to keep the backup and archive copies:||Commonly nearline disk, as this allows for immediate and easy recovery at a low cost; but also tape is highly economical especially for vast data volumes. Cloud is also an option and many modern data management platforms have strong cloud interfaces.|
|Archive protection:||Archived files should be retained in at least 2 places: disk and tape, or 2 disk copies in different sites to cater for a disaster event|
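The rule for high frequency database backups in the outline above can be sketched as a small selection function: keep every copy that is only a few hours old, and keep one representative copy per day for the "current" retention period. This is a minimal sketch, not a vendor's policy engine; the name `backups_to_keep`, the choice of the last backup of each day as the representative, and the 30-day and 6-hour defaults are all assumptions.

```python
from datetime import datetime, timedelta


def backups_to_keep(backup_times, now,
                    daily_retention=timedelta(days=30),
                    intermediate_retention=timedelta(hours=6)):
    """Return the set of backup timestamps to retain.

    - Any backup no older than intermediate_retention is kept.
    - For each calendar day within daily_retention, the last backup
      of that day is kept as the representative daily copy.
    Everything else is eligible for expiry.
    """
    keep = set()
    last_per_day = {}
    for taken in backup_times:
        age = now - taken
        if age < timedelta(0) or age > daily_retention:
            continue  # in the future, or older than the retention period
        if age <= intermediate_retention:
            keep.add(taken)
        day = taken.date()
        if day not in last_per_day or taken > last_per_day[day]:
            last_per_day[day] = taken
    keep.update(last_per_day.values())
    return keep


# Usage: hourly backups over the last three days.
now = datetime(2024, 1, 10, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(72)]
kept = backups_to_keep(hourly, now)
print(f"{len(kept)} of {len(hourly)} backups retained")
```

Choosing the last backup of the day as the representative is only one convention; a copy taken at a fixed quiet time, such as just after end-of-day processing, would work equally well.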
Consider corruption: if your online system has had a corruption or some event leading to a data loss, you must consider the recovery of such data. Backups are designed for recovery of recent data loss, whether the result of human error, data corruption, or site disaster. You must design your backups to allow recovery from these events. It will rarely mean retaining backups for 7 years.
Some data, especially voluminous data sets including geo-seismic data, clinical records such as a pathology tissue scan or x-ray, or files from a complex engineering project, may need to be retained for many years. Clinical records often need to be retained for the life of the patient. Engineering drawings can provide great insight into a building for refurbishment or later demolition. The life of a building or a bridge could be in excess of 100 years. How do we retain all of this data, and where is it best kept, if it grows to 50 or 100TB (or more) over the life of an organisation?
Keeping such aged data on the primary storage disk array is uneconomical, especially on new and expensive all-flash arrays. This matters even more when the access rate is low.
Relative likelihood of access:
If you need to keep such data forever, backup isn’t the right platform. One option when faced with this challenge is to move it to a nearline storage platform which preserves data indefinitely, not needing backup. Such platforms are called content repositories and are usually vast cloud-like storage platforms that operate with in-built data management functionality, including:
Then you are certainly not alone. The vendors will love you; however, you must consider how you will handle technology obsolescence. While the shelf life of an LTO tape cartridge is 30 years if stored under ideal conditions, the chances are that in 15 years you won’t actually have the hardware or software technology necessary to recover the data.
Consider the floppy disk, which was commonplace 25 years ago and was available in 8-inch, 5¼-inch and 3½-inch formats. Few, if any, mainstream vendors sell floppy disk drives today. Keeping backups for these sorts of periods should only be done after careful evaluation and a financial commitment from the business, because doing it properly requires significant ongoing effort to maintain the data and move it to the latest storage technologies.
Instead, take the time to consider your process for recovering from some type of event. Thinking it through logically, based on the information presented here, will save you enormous amounts of cost and effort:
Hopefully some of the information in this article is useful food for thought on this often misunderstood topic.