Monday, March 15, 2010

ProtecTIER deduplication for mainframe

A couple of weeks ago, when ProtecTIER Deduplication for Mainframe was announced, I promised to come back to it. I’ve written an article about it for our RealDolmen System z Newsletter, so I thought I might as well put a slightly reworked version here too. ProtecTIER and XIV are two products IBM bought some time ago to gain market share in the distributed world, so don’t worry if you haven’t heard a lot about them yet. But now that ProtecTIER has been brought to the mainframe environment, it might be time to check out what it can mean for you too.
I’m going to tell you a little about ProtecTIER, how it works and what advantages this might have for a mainframe client. I’m also going to tell you how it’s connected to the mainframe.

Deduplication Explained

At first sight you can compare the ProtecTIER technology with a tapeless backup solution. It emulates a tape library. And just like the TS7720 Tapeless Virtualization Engine, it writes the data to disk, in this case an attached disk system of your own choice. The real value of the ProtecTIER solution lies in its deduplication technology. The illustration below shows you how this is done.


A word of explanation: we have three components. The mainframe is connected to the ProtecTIER appliance, which in turn is connected to the disk system. ‘Appliance’ is the term often used for a complete, integrated solution of hardware and software in a rack.

The software used by the ProtecTIER Gateway (server) is called HyperFactor. As you see on the right hand side, some blocks of data have already been written to disk. The Memory Resident Index is used to keep track of the blocks which have been written to disk. You might say that this index contains some kind of key for every block. And as you can gather from the name, this index resides in the internal memory of the ProtecTIER Gateway. The index has a maximum size of 4GB which is large enough to index 1PB of data.

At the upper left you see some new data blocks that have to be written to the ‘tape library’. They are sent to the ProtecTIER Gateway, where the HyperFactor software uses the index to filter out the blocks that have already been written to disk. Because there’s always a tiny possibility that two different data blocks generate the same key, blocks are only marked as identical after a bit-by-bit comparison of the data itself. In the example six blocks are filtered out; for these, only a pointer to the existing data is written to disk. The new blocks are added. That way, a data block is written to disk only once. With this method, ProtecTIER targets a 25:1 ratio, that is, an effective 1TB on disk for every 25TB of data written to ‘tape’.
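
To make the principle concrete, here is a minimal Python sketch of inline deduplication with an in-memory index and a byte-for-byte check before a block is treated as a duplicate. It is not HyperFactor itself, whose internals aren’t described here: the fixed block size, the SHA-256 key, and the names `DedupStore`, `key_of`, `disk` and `tape` are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy inline deduplication: an index in memory, blocks on a simulated disk."""

    def __init__(self, block_size=4, key_of=None):
        self.block_size = block_size  # assumption: fixed-size blocks
        # Assumption: a SHA-256 digest serves as the 'key' kept in the index.
        self.key_of = key_of or (lambda block: hashlib.sha256(block).digest())
        self.disk = []   # unique blocks actually written to disk
        self.index = {}  # key -> positions on disk of blocks with that key
        self.tape = []   # the virtual tape: one pointer per incoming block

    def write(self, data):
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = self.key_of(block)
            for pos in self.index.get(key, []):
                # Same key is not enough: compare the data itself.
                if self.disk[pos] == block:
                    self.tape.append(pos)  # duplicate: store a pointer only
                    break
            else:
                # New block: write it once and remember where it lives.
                self.disk.append(block)
                pos = len(self.disk) - 1
                self.index.setdefault(key, []).append(pos)
                self.tape.append(pos)

    def read(self):
        # Rebuild the original stream by following the pointers.
        return b"".join(self.disk[pos] for pos in self.tape)


store = DedupStore()
store.write(b"AAAABBBBAAAACCCC")
print(len(store.tape), "blocks received,", len(store.disk), "blocks on disk")
```

The comparison inside the loop is what guards against two different blocks that happen to share a key: they each get their own place on disk, and only a block that is truly identical is replaced by a pointer.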

An example: every day we take a full backup of 10TB, and we have a daily change rate of about 15%. On the first day we actually have to write the full 10TB. Nothing is filtered out, and with a compression ratio of 3:1 about 3.3TB is written to disk. The next day we take our full backup again. With a change rate of 15% we filter out 85%, so only 15% or 1.5TB remains to be written. Again with 3:1 compression, this means 0.5TB on disk. After a week we have 70TB of uncompressed backup data, while ProtecTIER has written 6.3TB (3.3TB + 6 x 0.5TB) to disk. This is a ratio of roughly 11:1.
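
If you want to play with these numbers yourself, the calculation can be sketched in a few lines of Python. The function name `disk_written_tb` and its parameters are made up for this illustration; the model is simply the one from the example above (a full write on day one, only the changed fraction on each following day, everything compressed at the same ratio).

```python
def disk_written_tb(full_backup_tb, change_rate, compression, days):
    """TB physically written to disk after `days` daily full backups."""
    first_day = full_backup_tb / compression                 # nothing filtered out yet
    later_day = full_backup_tb * change_rate / compression   # only changed data
    return first_day + (days - 1) * later_day


full_backup_tb = 10.0
days = 7

written = disk_written_tb(full_backup_tb, change_rate=0.15, compression=3.0, days=days)
nominal = full_backup_tb * days       # what the host thinks it wrote to tape
ratio = nominal / written

print(f"nominal: {nominal:.1f}TB, on disk: {written:.1f}TB, ratio: {ratio:.1f}:1")
```

Raising `days` in this model shows why the ratio keeps improving the longer you retain backups: the one expensive first write gets spread over more and more cheap daily increments.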

As you can see, this deduplication always happens inline (at the moment of writing to disk), unlike the post-processing approach some competitors use.

Connectivity to the mainframe


In a mainframe environment the TS7680 ProtecTIER Gateway behaves as an Automated Tape Library (ATL) and is treated as such by the mainframe operating system. It emulates up to 256 3592 Model J1A drives and you can write up to 1,000,000 tape images. Each virtual tape has a capacity of 100GB. The ‘library’ has standard DFSMS support. On the host side you don’t have to change applications, tape management systems or JCL to support the new ‘library’. The TS7680 requires at least z/OS 1.9 or z/VM 5.3.

You can see an illustration of the technical connectivity below.


The ProtecTIER Gateway originates from the distributed environment and doesn’t know the FICON protocol. You might say we’ve seen the same before with the TS3500 tape library. Therefore two control units are placed between the mainframe and the ProtecTIER Gateways. Each control unit has a maximum of four FICON connections. I would call them 3592-C06 controllers, but they are not identified as such anywhere. As you can also see, we have complete redundancy and failover capabilities for each component. The control units are connected to the ProtecTIER Gateways through Fibre Channel.
What stays the same as in the distributed environment is the choice of the disk system the TS7680 writes to. This can be a DS8000, for example, but also a DS5000, an XIV or an SVC. The choice is up to you.

