Many years ago, before most backup products had backup-to-disk options, vendors launched disk-based appliances, known as Virtual Tape Libraries, as a backup target. These improved the backup and restore experience compared with tape. Over time these appliances were enhanced to provide deduplication in the box, which made up for the lack of such features in some backup solutions.
Time has of course moved on. Products such as Commvault have had backup-to-disk as an inherent part of their architecture since inception, and have supported deduplication for many years now.
So the question becomes: What is the best way to perform deduplication? Within backup software, or with an appliance?
Appliances seem easy, as you don’t need to consider deduplication within the backup software: just buy a box with 3x the capacity of the data being backed up and you might get a month or so of backups on disk. The backup software sees it as a generic disk target.
However, it really isn’t that simple, and there are a number of factors often overlooked when contemplating this approach. This blog article is designed to flesh these out for you so you can make a considered decision. These are broken down into a few distinct categories:
A dedupe appliance is a self-contained unit and generally can’t be affected by outside factors. For some users this “black box” approach is simple, but it has a number of notable downsides.
Appliances perform the deduplication task far too late in the data path to be useful: deduplication is done at the very end of the data movement process. Software dedupe (e.g. by Commvault) happens at the start of the data path. Nothing leaves the client computer unless the Media Agent confirms that it is needed. With a dedupe appliance, all of the data has to be sent by the client across the network and processed by the Media Agent. It then has to be sent from the Media Agent to the appliance, at which point it is deduped. There is no benefit to backup performance when you still need to move all of the data.
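The idea of deduplicating at the start of the data path can be sketched in a few lines. This is an illustrative model only, not Commvault's actual protocol: the `SignatureIndex` class, the `backup_from_client` function, and the tiny 4-byte block size are all hypothetical names and values chosen for the demonstration. The client hashes each block, asks the index whether that signature is already known, and only counts the block as "sent" when it is new.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the demo readable


class SignatureIndex:
    """Hypothetical stand-in for the signature lookup done on the Media Agent side."""

    def __init__(self):
        self._known = set()

    def is_new(self, signature):
        # Report whether the signature is unseen, and remember it either way.
        if signature in self._known:
            return False
        self._known.add(signature)
        return True


def backup_from_client(data, index):
    """Return how many bytes of block data would actually leave the client."""
    bytes_sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        signature = hashlib.sha256(block).hexdigest()
        if index.is_new(signature):
            bytes_sent += len(block)  # only unknown blocks cross the network
    return bytes_sent


index = SignatureIndex()
first_backup = backup_from_client(b"AAAABBBBCCCCDDDD", index)   # every block is new
second_backup = backup_from_client(b"AAAABBBBCCCCEEEE", index)  # only the last block changed
print(first_backup, second_backup)
```

In this model the signatures themselves still travel to the index, but they are far smaller than the blocks they describe. With an appliance as the target, the equivalent of `bytes_sent` is always the full size of the data, because the duplicate check only happens once the data has arrived.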
Compare the amount of data typically sent out of the client using Commvault dedupe (somewhere around 2% to 5% daily is quite normal) with what gets sent across the network when using a hardware appliance (100%), and there is really no contest. Commvault dedupe saves significant amounts of two things: time and space. Time is saved because very little is sent across the network from client to Media Agent, and space is saved because only a single copy of everything is stored. A dedupe appliance saves space but no time at all, because 100% of the data still has to be transmitted.
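To put rough numbers on that comparison, here is a small worked calculation. The 50 TB of protected data is an assumed figure chosen purely for illustration; the 2%, 5% and 100% fractions are the ones quoted above.

```python
def daily_transfer_tb(protected_tb, fraction_sent):
    """Data leaving the clients per day, in TB, for a given fraction sent."""
    return protected_tb * fraction_sent


protected_tb = 50.0  # assumption for illustration, not a figure from any real site

source_side_low = daily_transfer_tb(protected_tb, 0.02)   # 2% sent with source-side dedupe
source_side_high = daily_transfer_tb(protected_tb, 0.05)  # 5% sent with source-side dedupe
appliance = daily_transfer_tb(protected_tb, 1.00)         # 100% sent when the appliance dedupes

print(f"Source-side dedupe: {source_side_low:.1f} to {source_side_high:.1f} TB per day")
print(f"Dedupe appliance:   {appliance:.1f} TB per day")
```

Under these assumptions the clients send between 1 TB and 2.5 TB a day with source-side dedupe, against the full 50 TB when deduplication is left to the appliance.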
Dedupe performed at the client (as Commvault does) is “content aware”. This means that for every item it backs up, it restarts the block alignment at the beginning of that item.
For example, if a system has file 1, file 2 and file 3, and a user edits file 1 and makes it bigger, this won’t disrupt the deduplication for file 2 and file 3. When the agent has finished with file 1, it opens file 2 and starts again at the beginning of the file, so everything lines up again. A dedupe appliance has no idea what is going on in the client and is not content aware. Commvault (and other backup products) will write big chunk files to the appliance, and these don’t dedupe very well once some content has been edited as described. Today’s chunk files don’t look much like yesterday’s, so an appliance simply cannot get the same sort of space savings that deduping at the source, as Commvault does, can deliver. No dedupe appliance can achieve dedupe savings of 98% or 99%, whereas this is often achievable with Commvault dedupe.
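The file 1, file 2, file 3 example can be reproduced with a short simulation. This is a simplified sketch that assumes fixed-size 8-byte blocks and made-up file contents; it is not the algorithm of Commvault or of any particular appliance, and all function names are hypothetical. It compares restarting the block alignment at each file with chunking one concatenated stream, after file 1 grows by a single byte.

```python
import hashlib

BLOCK = 8  # tiny fixed block size, for illustration only


def signatures(data):
    """Hash each fixed-size block of a byte string."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


def per_file_signatures(files):
    """Content-aware: restart block alignment at the start of every file."""
    sigs = []
    for content in files:
        sigs.extend(signatures(content))
    return sigs


def stream_signatures(files):
    """Not content-aware: chunk the concatenated stream as one opaque blob."""
    return signatures(b"".join(files))


def new_blocks(yesterday, today):
    """Count today's blocks whose signature was not seen yesterday."""
    known = set(yesterday)
    return sum(1 for sig in today if sig not in known)


# Made-up file contents with no repeating patterns.
file1 = bytes(range(128, 192))
file2 = bytes(range(0, 64))
file3 = bytes(range(64, 128))

day1 = [file1, file2, file3]
day2 = [file1 + b"\xff", file2, file3]  # the user makes file 1 one byte bigger

content_aware_new = new_blocks(per_file_signatures(day1), per_file_signatures(day2))
stream_new = new_blocks(stream_signatures(day1), stream_signatures(day2))
print(content_aware_new, stream_new)
```

With per-file alignment only the one extra block at the end of file 1 is new. In the concatenated stream, every block after the edit is shifted by one byte and no longer matches anything from the previous day.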
Note the Size of Application (original data size, at source), the Data Written (data sent across the network to the Media Agent), and the Savings Percentage – a whopping 98%! Only 2% of the data was sent across the network and only 2% was stored in the disk library. No dedupe appliance can achieve ratios like that.
The picture gets murkier when considering multiple sites and DR protection of backup copies. Such a site-to-site copy can only be performed efficiently when you have purchased two such dedupe appliances. Commvault cannot assist with this process, as it is handled in the back end by the appliance and Commvault is (mostly) unaware that the copy has been made. Now you have hardware vendor lock-in for two sites that must both be upgraded together.
Often, in such circumstances you have limited control over the mirror site policies. Commvault solves all of this with DASH copy, where the hardware can be dissimilar and your retention policies can be as varied as you like: keep some jobs for 3 months at Prod and 1 month at DR, keep other jobs for 3 months on each site, etc. As granular as you might need.
Since dedupe appliances only see the data that is sent to them, large and complex sites with a mix of backup and archive data are not able to take advantage of global site dedupe. Every site, no matter how big or small, would need the same vendor’s dedupe appliance, and this gets inordinately complex (not to mention expensive) when trying to coordinate large-scale fan-in of remote site content. It is simply not feasible. Contrast that with Commvault software-based dedupe, where you can protect anything from a single desktop right through to a massive NAS device of >1PB, and dedupe will work for all enterprise data, backup and archive, with cross-site data management handled efficiently and seamlessly.
Despite purchasing the dedupe appliance, you will also need to consider the licensing obligations of the backup software itself. Within Commvault, depending on your licensing model, in the early days you needed to license the total addressable (effective) capacity of the dedupe appliance. So if it had 40TB of usable disk and with deduplication offered 150TB effective space you would need to purchase a Standard Disk Option license for 150TB.
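The arithmetic behind that example is worth spelling out. The sketch below uses only the 40TB and 150TB figures quoted above; the function name is hypothetical, and the rule it encodes (license the effective capacity rather than the usable capacity) is the older licensing behaviour as described here, not a statement about current terms.

```python
def appliance_license_tb(usable_tb, dedupe_ratio):
    """Capacity to license when the effective (post-dedupe) space is what counts."""
    return usable_tb * dedupe_ratio


usable_tb = 40       # physical usable disk in the appliance
effective_tb = 150   # effective space the appliance offers after deduplication

implied_ratio = effective_tb / usable_tb                     # the dedupe ratio implied by those figures
licensed_tb = appliance_license_tb(usable_tb, implied_ratio)

print(f"Implied dedupe ratio: {implied_ratio}:1")
print(f"Capacity to license:  {licensed_tb} TB")
```

In other words, the license is sized at 3.75 times the disk you physically bought.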
With the newer Commvault licensing schemes such as a Capacity License Agreement and the VM protection Solution Bundles, your license already includes Commvault deduplication capability! Why would you want to go out and buy another/different solution when you have paid for it? 95% of backup systems are configured with deduplication as it is the market expectation. It makes no sense to bypass that and buy an appliance to do the same thing.
Once you have Commvault deduplication licensing, there is no more to pay for back-end dedupe capacity expansion, just the cost of the disk. Dedupe appliances cost more than generic JBOD disk, so you are paying more than you should.
Related to this, a dedupe appliance locks you in to a specific hardware vendor. Your upgrades need to come from them. Your second site appliance must be the same, so that must also come from them.
Further, capacity upgrades to the dedupe appliance are limited to what the vendor offers, and some boxes are restrictive in their capacity options. With Commvault software dedupe you can add JBOD after JBOD separately (even from different vendors) and therefore not be concerned with the Library device itself. You are free to choose a Media Agent server from any vendor, so long as the specification meets the needs for performing deduplication according to Commvault guidelines. If your shop has a preference for HP, Dell, Cisco or IBM/Lenovo, then you can stay with that choice.
All of the points above add up to a total cost of ownership for dedupe appliances which can only be more expensive than using the Commvault deduplication you probably have already in your environment.
- Earlier in the data path = faster backups, less user impact, less network traffic (thinner pipes)
- Software implementation = regular JBOD, no vendor lock-in with proprietary algorithms and hardware
- Client aware = more efficient deduplication
- Multiple site protection = easy implementation, allow for cost-effective tapeless DR
- Remote copy policy flexibility = limit disk capacity to rules that follow a business retention process, not one mandated by vendor design
- Global dedupe = corporate benefits
- And don’t forget that dedupe appliances may also have a Commvault license obligation