Deduplication Internals – Source Side & Target Side Deduplication: Part-4
In Continuation of my deduplication series Part 1, Part 2, Part 3, in this part we will discuss where the deduplication happens. Based on where the dedupe process is happening, there is 2 types – That is Source Side or Client side and Target side. These are widely used in the Backup Technology, that is backup software and hardware appliances. Like IBM Tivoli TSM, Symantec backup Exec, EMC Avamar, Netbackup etc.
Source or Client Side Deduplication.
From the Name itself we can say, the entire deduplication is happening on the Client (Servers), there will be dedicated deduplication agents installed in the servers and these agents use the CPU/RAM of the server to perform the deduplication. This thereby distributing the deduplication processing overhead across multiple systems.
Source-side deduplication typically uses a client-located deduplication engine that will check for duplicates against a centrally-located deduplication index, typically located on the backup server or media server. Only unique blocks will be transmitted to the disk. So less data to transfer so the backup window is also get reduced.
Source-based deduplication processes the data before it goes over the network. This reduces the amount of data that must be transmitted, which can be important in environments with constrained bandwidth. That is less network bandwidth needed for the backup software.
This method is often used in situations where remote offices must back up data to the primary data center. In this case, deduplication becomes the responsibility of the client server.
Target Side Deduplication.
Target-based deduplication requires that the target backup server or a dedicated Hardware target dedupe appliance handles all of the deduplication. This means no overhead on the client or server being backed up. This solution is transparent to existing workflows, so it creates minimal disruption. However, it requires more network resources because the original data, with all its redundancy, must go over the network.
So a target deduplication solution is a purpose built appliance, so the hardware and software stack are tuned to deliver optimal performance, so there is no additional CPU/RAM resource usage in the client servers.
Dell DR4000, Symantec NetBackup PureDisk, Symantec NetBackup 5220, EMC Avamar, HP StoreOnce, EMC’s Data Domain, ExaGrid Systems’ DeltaZone and NEC’s Hydrastor, IBM’s ProtecTier and Sepaton’s DeltaStor, Quantum DXi are the main examples which uses this type of deduplication.
My next post will be about the Inline and Post processing deduplication………!! Stay Tuned !!!