Deduplication Internals: Part 1

Deduplication is one of the hottest technologies in the current market because of its ability to reduce costs. It comes in many flavours, though, and organizations need to understand each of them in order to choose the one that suits them best. Deduplication can be applied to data in primary storage, backup storage and cloud storage, as well as to data in flight during replication over LAN and WAN links. It offers the following benefits:

– Substantial savings in disk space, which reduces storage requirements and the amount of hardware needed,

– Improved bandwidth efficiency,

– Faster replication,

– Shorter backup windows and better RTO and RPO,

– and, finally, lower COST.

What is data deduplication?

The concept is a familiar one that we see every day. A URL is a type of pointer: when someone shares a video on YouTube, they send the URL for the video instead of the video itself. There’s only one copy of the video, but it’s available to everyone. Deduplication uses this concept in a more sophisticated, automated way.

 


Data deduplication is a technique that reduces storage needs by eliminating redundant data in your storage environment. Only one unique copy of the data is retained on the storage media, and each duplicate is replaced with a pointer to that unique copy.
That is, it looks at the data at a sub-file (i.e. block) level and tries to determine whether it has seen the data before. If it hasn’t, it stores the data. If it has, it ensures that the data is stored only once, and all other references to that duplicate data are merely pointers.

How does data deduplication work?

Dedupe technology typically divides data into smaller chunks (blocks) and assigns each chunk a hash identifier, called a fingerprint, that is unique for all practical purposes. To create the fingerprint, it runs an algorithm that computes a cryptographic hash value from the contents of the chunk, regardless of the data type. These fingerprints are stored in an index.
The deduplication algorithm compares the fingerprint of each incoming chunk to those already in the index. If the fingerprint exists in the index, the chunk is replaced with a pointer to the copy that is already stored. If the fingerprint does not exist, the chunk is written to disk as a new unique chunk and its fingerprint is added to the index.
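The lookup described above can be sketched in a few lines of Python. This is only an illustration and not how any particular product implements it: the DedupeStore class, the 4-byte chunk size, the choice of SHA-256 and the in-memory dictionary are all assumptions made to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # assumed tiny fixed chunk size, chosen only to keep the demo readable


class DedupeStore:
    """Hypothetical toy store: an in-memory index from fingerprint to unique chunk."""

    def __init__(self):
        self.index = {}  # fingerprint -> chunk bytes, each unique chunk stored once

    def write(self, data):
        """Store data and return its "recipe": an ordered list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # Fingerprint = cryptographic hash of the chunk contents.
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.index:
                # Unseen fingerprint: keep the chunk as a new unique chunk.
                self.index[fingerprint] = chunk
            # Seen or not, the data itself is represented by a pointer.
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild the original data by following the pointers in order."""
        return b"".join(self.index[fingerprint] for fingerprint in recipe)


store = DedupeStore()
first = store.write(b"AAAABBBBAAAACCCC")   # 4 chunks, 3 of them unique
second = store.write(b"AAAABBBBDDDD")      # 3 chunks, only DDDD is new

print(len(first) + len(second), "chunks written,", len(store.index), "chunks stored")
print(store.read(first) == b"AAAABBBBAAAACCCC")
```

Seven chunks are written here but only four are kept, because the repeated AAAA and BBBB chunks resolve to fingerprints that are already in the index. A real system would persist the index and the chunks on disk rather than in a dictionary.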

Different types of deduplication – Dedupe methods can be broadly classified in three ways:

1- Based on the technology, or how it is done.

Fixed-Length or Fixed Block Deduplication

Variable-Length or Variable Block  Deduplication

Content Aware or application-aware deduplication

2- Based on the Process, or when it is done.

In-line (or as I like to call it, synchronous) deduplication

Post-process (or as I like to call it, asynchronous) deduplication

3- Based on the Type, or where it happens.

Source or Client-side Deduplication

Target Deduplication

My next post will discuss these dedupe technologies and processes in detail.


About GK_RAJ

An enthusiastic IT person with an intense passion for datacenter technologies. I hold the VMware vExpert title and work as a Technical Consultant in Qatar. I have worked with VMware vSphere, storage, BladeCenters, datacenter operations, Symantec Backup and deduplication technologies, and carry rich and diversified experience in these domains. I specialize in designing and consulting on VMware vSphere and on the integration of storage and network stacks with vSphere. With my experience, I help organizations and enterprises achieve CAPEX and OPEX savings, develop DR and BCP strategies, consolidate services through virtualization with vSphere, and prepare to move to the cloud. In the meantime, I would like to share my knowledge and make a good contribution to the community. I am an Indian citizen and have an Engineering degree in Electronics and Communication. I am certified in VCAP5-DCD, VCP-Cloud, VCP 4 & 5, MCITP and MCSE.

Posted on February 2, 2013, in Dell Storage, EMC, Netapp, Storage Technology.
