Data Retention ≠ Backup Retention

Ask almost anyone the basic question “How long should I keep my backups (or backup tapes) for?” and the most common answer in Australia will be “7 years”. Yet few can tell you why.

It is one of those urban myths: on the face of it, 7 years sounds like a reasonable retention period, and it aligns with our personal and corporate tax record retention requirements.
However, it is probably inappropriate from a backup retention perspective.

Data retention requirements vary depending on the data set in question, and in some ways this opens a Pandora’s box: it exposes the need for a proper data classification exercise, as not all data is of equal value.

If you are building a bridge or an office tower, then you will likely want to retain the plans and architectural diagrams associated with that construction FOR THE LIFE OF THE CONSTRUCTION.

How long does a bridge or building last?

Been to Europe lately? It could be over 500 years. The trouble is that in Australia, we don’t often think that far ahead.

Certain regulations ask us to retain records for:

  • The life of the patient (or, for a child patient, until they turn 18 plus 7 years, i.e. age 25)
  • The length of employment, plus 7 years
  • The term of the contract
  • At least 7 years for financial records
  • 5 years after an OH&S incident
  • 70 years after the end of the year of a copyright creator’s death
  • And so on
  • And some regulations are specified as “at least…”, with no maximum.
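Most of the rules above follow one pattern: a trigger event plus a fixed period. As a rough illustration only (not legal advice, and not a reading of any specific regulation), the Python sketch below turns a few of them into a “retain until” date. The rule names and the `retain_until` function are hypothetical; “the life of the patient” and contract terms are left out because they have no fixed end date, and “at least” rules only ever give a minimum.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 Feb landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


def retain_until(rule: str, trigger: date) -> date:
    """Return the earliest date a record could be disposed of under an example rule.

    `trigger` is the event the (hypothetical) rule counts from:
      "child_patient": date of birth -> age 18 plus 7 years (age 25)
      "employment":    end of employment -> plus 7 years
      "financial":     date of the record -> at least 7 years
      "ohs_incident":  date of the incident -> plus 5 years
      "copyright":     creator's death -> 70 years after the end of that year
    """
    if rule == "child_patient":
        return add_years(trigger, 18 + 7)
    if rule in ("employment", "financial"):
        return add_years(trigger, 7)
    if rule == "ohs_incident":
        return add_years(trigger, 5)
    if rule == "copyright":
        return date(trigger.year + 70, 12, 31)
    raise ValueError(f"unknown rule: {rule}")


print(retain_until("child_patient", date(2010, 3, 15)))  # 2035-03-15
print(retain_until("copyright", date(1990, 6, 1)))       # 2060-12-31
```

Note that none of these dates says anything about how long a *backup* must be kept; they only describe how long the record itself must remain available somewhere.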

However, backup data retention is not the same as data or records retention. Such regulations are often concerned only with ensuring that proof of any transaction remains available for the stipulated statutory period, should an audit be initiated.

Assume you are in the financial services industry. You have a database with transactional client records, and your industry regulation says that you need to retain customer data for seven years.

What could this look like?

Every day you add new records to the database

You may update some records (depending on how the application handles edits)

You don’t purge old records to save space, because disk space is relatively cheap and the application updates existing records to show an account or transaction as closed or complete.

For protection purposes, do you:

  • Back up the database daily and keep each backup for 7 years? Or
  • Keep the database with historical records online, and apply data protection techniques to meet the regulatory requirements? This could take the form of:
    • an occasional backup for recovery purposes
    • retaining that backup only until the next one is taken, with a small number of levels of recoverability
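To put the two options above in perspective, the short sketch below counts how many backup copies each regime holds once it reaches steady state. The schedules and the rolling set of 3 copies are illustrative assumptions; leap days are ignored, and full versus incremental backups and deduplication are not modelled.

```python
# Steady-state count of backup copies held under each regime (leap days ignored).
YEARS = 7

regimes = {
    "daily, kept 7 years": 365 * YEARS,
    "weekly, kept 7 years": 52 * YEARS,
    "monthly, kept 84 months": 12 * YEARS,
    # Hypothetical alternative: keep only the last few backups for recovery and
    # rely on the online database itself to hold 7 years of records.
    "rolling set of 3": 3,
}

for name, copies in regimes.items():
    print(f"{name:<24}{copies:>6} copies")
```

Every one of those copies has to be stored, catalogued and kept restorable, whereas the online database already holds the 7 years of records in a single place.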

Nothing suggests that you need to take a weekly or monthly backup of this database and retain it for 84 months. Yet you would be surprised how common this practice is.

The simplest retention strategy is the best, but you need to know how your applications purge data. Purging of (aged) data can sometimes get in the way of a simple retention regime. If applications (or file servers) never purged or overwrote any content, then the only backup we would need is one for recovery from system corruption, as all data would effectively be retained within a single online copy.

Levels of Recoverability

A common and ongoing mistake is over-catering for multiple “levels of recoverability” from copies made years ago. If a file such as a database is updated every day for 7 years, there would be very few circumstances where you would need to go back to a copy from specifically 2374 days ago (about 6½ years, in case you were wondering). Most organisations keep a monthly “archive” copy of the backup purely for a just-in-case recovery, and very few would ever recover an application that far back. On the rare occasions that aged records are needed, they are more likely to spin up a parallel copy and examine the records there. However, if the online database retains records all the way back to 7 years ago, this would typically meet the record retention requirements without any backup copies.

What does this mean? Unless there is some hefty purging of content (by users or inside the application), there is no need to retain backups for such a long period.
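The reasoning above can be reduced to a small calculation. The sketch below assumes, for simplicity, that an application purges each record at a fixed age and that the backups you would keep long term are taken at a fixed interval. Under those assumptions, a record purged before the regulatory period ends must survive in some backup for the remainder of that period, and the last backup containing it may be up to one interval older than the purge. The function name and the month-based units are hypothetical.

```python
from typing import Optional


def compliance_retention_months(regulatory_months: int,
                                online_months: Optional[int],
                                backup_interval_months: int,
                                operational_months: int = 1) -> int:
    """Estimate how long backups must be kept, in months (simplified model).

    regulatory_months:      how long records must remain available (84 for 7 years)
    online_months:          age at which the application purges records, or None if never
    backup_interval_months: interval between the backups retained long term
    operational_months:     retention needed for ordinary recovery (corruption, human error)
    """
    if online_months is None or online_months >= regulatory_months:
        # The live system holds every record for the full period, so backups
        # only need to cover operational recovery.
        return operational_months
    # A purged record must live on in a backup for the rest of the regulatory period;
    # add one interval because its last backup may predate the purge by that much.
    gap = regulatory_months - online_months + backup_interval_months
    return max(operational_months, gap)


print(compliance_retention_months(84, None, 1))  # no purging
print(compliance_retention_months(84, 60, 1))    # purged after 5 years
```

With no purging, the answer is just the operational window (1 month here). If the application purged records after 5 years, monthly backups would need to be kept for about 25 months rather than 84.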

What about files on a file server, where there are ongoing updates and even deletions of files? Managing whole files is actually more challenging than managing records in a database. Enterprise backup software applications have an archive module for exactly this purpose. If you intend to purge entire files from the active primary data store, for example to relieve pressure on backup times as well as on primary storage usage, then the best option is to archive this data, usually to a much lower-cost tier such as nearline disk or tape, or possibly to the cloud.
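Archive modules typically select candidates using policies such as file age. As a rough, product-neutral illustration, the sketch below walks a directory tree and lists files whose last modification is older than a threshold. The function name, the use of modification time rather than last access time, and the default of 730 days (roughly 24 months) are assumptions; it only reports candidates and does not move, stub or delete anything.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def archive_candidates(root: str, min_age_days: int = 730) -> list[Path]:
    """Return files under `root` not modified for at least `min_age_days` days.

    Reporting only: every file is left exactly where it is.
    """
    cutoff = time.time() - min_age_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

On file servers where access times are reliably recorded, last access can be a better signal than last modification, since a drawing that is still opened regularly is not a good archive candidate.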

To retain a copy of a file for compliance purposes, some type of protection scheme must be in place. This could be a backup, an archive, a records/document management system, or a repository.

Data Classification and Retention

A 2012 study across a broad range of industries broke data down into the share that:

  • is required for legal matters (on legal hold)
  • is being appropriately retained in a records system
  • has current business value

The remainder is data having no business, legal or regulatory value, yet most people are storing multiple copies of this data for years and years.

So what would be a simple and effective backup retention regime?

Assuming deep purging of in-database records does not occur, your life can be quite simple. Here’s an outline of sound practice:

“Current” data set
  • How long to keep: Determined by your restore profile. If you are not sure, 1 month is a good benchmark.
  • Database copy frequency: Could be hourly (or in extreme cases every 10-15 minutes). With high-frequency database backups, a single backup each day is nominated as the retained representative daily backup, and the other intermediate copies expire within a few hours.
  • File copy frequency: Usually daily (perhaps augmented with a snapshot regime on the file server/NAS).
Aged files
  • Method: Archive and replace with a tiny stub file (or remove completely from primary storage, which is less common).
  • Frequency of copies: Monthly, or perhaps fortnightly.
  • Archive candidates: Typically files older than 24 months, but the threshold can be lower depending on data access patterns.
Storage locations
  • Where to keep the backup and archive copies: Commonly nearline disk, as this allows immediate and easy recovery at low cost; tape is also highly economical, especially for vast data volumes. Cloud is another option, and many modern data management platforms have strong cloud interfaces.
  • Archive protection: Archived files should be retained in at least 2 places, either disk and tape or 2 disk copies at different sites, to cater for a disaster event.
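The “current” data set policy above (frequent database backups, one retained representative per day, intermediate copies expired within hours, dailies kept for about a month) is essentially a pruning rule. A minimal sketch of that rule follows. The 6-hour and 30-day windows, the choice of the last backup of each day as its representative, and the function name are assumptions for illustration, not the behaviour of any particular backup product.

```python
from datetime import datetime, timedelta


def backups_to_keep(backups: list[datetime], now: datetime,
                    keep_all_for: timedelta = timedelta(hours=6),
                    keep_daily_for: timedelta = timedelta(days=30)) -> set[datetime]:
    """Pick which backup timestamps survive a prune.

    - Every backup no older than `keep_all_for` is kept (intermediate copies).
    - Within `keep_daily_for`, the last backup of each calendar day is kept as
      that day's representative daily backup.
    - Everything else is eligible to expire.
    """
    keep = {b for b in backups if now - b <= keep_all_for}
    last_of_day: dict = {}
    for b in backups:
        if now - b <= keep_daily_for:
            day = b.date()
            if day not in last_of_day or b > last_of_day[day]:
                last_of_day[day] = b
    keep.update(last_of_day.values())
    return keep


# Usage: 40 days of hourly database backups, pruned at the time of the latest one.
now = datetime(2024, 5, 31, 23, 0)
hourly = [now - timedelta(hours=h) for h in range(40 * 24)]
kept = backups_to_keep(hourly, now)
print(len(kept), "of", len(hourly), "backups retained")
```

Of the 960 hourly copies, only the last few hours plus one copy per day for the past month survive, which is what gives the handful of “levels of recoverability” the restore profile actually needs.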

Consider corruption: if your online system suffers corruption or some other event leading to data loss, you must be able to recover that data. Backups are designed for recovery from recent data loss, whether the result of human error, data corruption, or site disaster. You must design your backups to allow recovery from these events. Doing so will rarely mean retaining backups for 7 years.

When aged data grows to an unmanageable scale, how should this be managed?

Some data, especially voluminous data sets including geo-seismic data, clinical records such as a pathology tissue scan or x-ray, or files from a complex engineering project, may need to be retained for many years. Clinical records often need to be retained for the life of the patient. Engineering drawings can provide great insight into a building for refurbishment or later demolition. The life of a building or a bridge could be in excess of 100 years. How do we retain all of this data, and where is it best kept, if it grows to 50 or 100TB (or more) over the life of an organisation?

Keeping such aged data on the primary storage disk array is uneconomical, especially on new and expensive all-flash arrays. This matters even more when the access rate is low.

[Figure: Relative likelihood of access]

If you need to keep such data forever, backup isn’t the right platform. One option when faced with this challenge is to move it to a nearline storage platform that preserves data indefinitely without needing to be backed up. Such platforms, called content repositories, are usually vast, cloud-like storage platforms with built-in data management functionality, including:

  • WORM (write-once, read many)
  • Version control
  • A minimum of 2 copies of every file to cater for corruption
  • Self-healing from corruption
  • Geographically dispersed copies and replication, to cater for DR

Because of these features, data held in a content repository doesn’t need backing up.
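To make the feature list above more concrete, here is a toy, in-memory model of how a content repository can combine WORM storage, versioning, multiple copies and self-healing. It is a hypothetical sketch, not the design of any real product: the class and method names are invented, the two “replicas” are plain dictionaries standing in for geographically separate sites, and corruption is detected with a SHA-256 checksum recorded at write time.

```python
import hashlib
from typing import Optional


class ToyContentRepository:
    """Hypothetical in-memory model: WORM + versioning + two copies + self-healing."""

    def __init__(self) -> None:
        # Two replicas stand in for copies held at separate sites.
        self.replicas: list[dict] = [{}, {}]
        # (name, version) -> SHA-256 of the content, recorded at write time.
        self.checksums: dict = {}
        self.latest: dict = {}

    def put(self, name: str, data: bytes) -> int:
        """Store a new version; existing versions are never overwritten (WORM)."""
        version = self.latest.get(name, 0) + 1
        key = (name, version)
        self.checksums[key] = hashlib.sha256(data).hexdigest()
        for replica in self.replicas:
            replica[key] = data
        self.latest[name] = version
        return version

    def get(self, name: str, version: Optional[int] = None) -> bytes:
        """Read a version, repairing any replica whose copy fails its checksum."""
        key = (name, version if version is not None else self.latest[name])
        expected = self.checksums[key]
        good = None
        for replica in self.replicas:
            if hashlib.sha256(replica[key]).hexdigest() == expected:
                good = replica[key]
                break
        if good is None:
            raise OSError(f"all copies of {key} are corrupt")
        # Self-heal: rewrite any corrupt copy from the verified one.
        for replica in self.replicas:
            if hashlib.sha256(replica[key]).hexdigest() != expected:
                replica[key] = good
        return good
```

A real repository would also scrub copies in the background rather than only on read, but the principle is the same: the platform itself keeps verified, versioned copies, so a separate backup adds little.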

Are you one of those super-paranoid organisations who have decided to keep all backups forever?

Then you are certainly not alone. The vendors will love you; however, you must consider how you will handle technology obsolescence. While an LTO tape cartridge has a shelf life of up to 30 years if stored under ideal conditions, chances are that in 15 years you won’t have the hardware or software needed to recover the data.

Consider the floppy disk, which was commonplace 25 years ago. It came in 8-inch, 5¼-inch and 3½-inch formats, yet floppy drives have long since disappeared from mainstream computers. Keeping backups for these sorts of periods should only be done after careful evaluation and a business financial commitment, because doing it properly requires significant ongoing effort: maintaining the media and migrating it to the latest storage technologies.

Instead, take the time to consider how you would recover from the types of event you need to protect against. Thinking it through logically, based on the information presented here, will save you enormous amounts of cost and effort:

  • Focus on data retention, not backup retention
  • Understand your restore profile and work out how many levels of recoverability are needed – it will be less than you think
  • Investigate whether your applications actually retain most historical data online (likely!)
  • Aggressively retain fewer backups in line with the points above.

Hopefully some of the information in this article is useful food for thought on this often misunderstood topic.

Craig Tamlin
Perfekt’s WA Branch Manager, Craig Tamlin, is an IT industry veteran with over 30 years of international experience helping clients of all scales address their Information Management and Technology challenges. He has a passion for storage, backup/recovery and DR. A former Australia/NZ Country Manager of backup storage company Quantum, he has vast experience with a broad range of data protection solutions. He is also a Commvault platform champion and has been offering Commvault solutions to Perfekt’s WA clients for 9 years.

