By Louis Imershein
February 17, 2014 11:00 AM EST
The initial wave of data efficiency features for primary storage focuses on silos of information organized in terms of individual file systems. Deduplication and compression features provided by some vendors are limited by the scalability of those underlying file systems; essentially, the file systems have become silos of optimized data. For example, NetApp deduplication can't scale beyond a 100 TB limit, because that's the size limit of its WAFL file system. But ask anyone who's ever used NetApp deduplication if they've done it on a 100 TB file system, and you're likely to hear "are you crazy?" It's one thing to claim that data efficiency features can scale, and quite another to actually use them with acceptable performance at scale.
Challenges around scalability generally center on two areas: scalability of random IO and memory overhead. Older solutions, like the one from NetApp, face the first challenge while newer flash-based storage systems are struggling with the second. I'll review both here:
The IO Challenge
Primary storage devices handle both streaming and random IO and are therefore sensitive to latency. Data efficiency features for primary storage need fast hashing techniques to limit the latency they add. Fast hashes are non-cryptographic, so deduplication based on them requires a data comparison. It works like this:
- When a new chunk of data is read in, it is first given a name using the hash algorithm.
- The system then checks a deduplication index to see if a chunk with that name has been seen before (note that this can consume disk IO and tremendous amounts of memory if done wrong).
- If the name has been seen, we need to take extra steps. Because fast hashes are non-cryptographic, it is possible to have a name match while the data content differs. This is known in computer science as a hash collision. To account for this, the existing copy of the chunk must be read in and compared bit-by-bit to the new one. If they match, only a reference to the chunk is created. If not, then the new chunk must be written.
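The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the `DedupStore` class, its in-memory `chunks` list (a stand-in for disk), and the choice of `zlib.crc32` as the fast hash are all hypothetical. The `read_compares` counter makes the hidden cost visible: every name match forces a read of the existing chunk.

```python
import zlib


class DedupStore:
    """Toy chunk store: fast non-cryptographic hash plus a full data compare."""

    def __init__(self, hash_fn=zlib.crc32):
        self.hash_fn = hash_fn   # fast hash used to name chunks (assumed: CRC-32)
        self.chunks = []         # stand-in for chunks stored on disk
        self.index = {}          # chunk name -> locations of chunks with that name
        self.read_compares = 0   # how many existing chunks had to be read back

    def write(self, data: bytes) -> int:
        """Store a chunk and return the location that now holds its content."""
        name = self.hash_fn(data)
        for loc in self.index.get(name, []):
            # Name seen before: read the existing copy and compare it in full,
            # because a fast hash can collide on different content.
            self.read_compares += 1
            if self.chunks[loc] == data:
                return loc       # true duplicate: keep only a reference
        # New name, or a collision with different content: write the chunk.
        self.chunks.append(data)
        loc = len(self.chunks) - 1
        self.index.setdefault(name, []).append(loc)
        return loc
```

Passing a deliberately weak hash, for example `DedupStore(hash_fn=lambda d: 0)`, forces every write to collide and shows the read-compare path doing all the work. Keeping a list of locations per name is a design choice of this sketch so that colliding chunks remain findable later.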
Essentially, this form of deduplication trades a write of a duplicate chunk for a read. Depending on the design of the underlying block virtualization layer, duplicate chunks may be widely dispersed throughout the system. In that case, the bigger the system gets, the more expensive reads become, so processing of duplicate data gets slower and slower as the storage system fills. This is why you won't find many 100 TB NetApp file systems with deduplication turned on, and certainly not for primary storage applications: the system would be flooded with random read requests, and NetApp's deduplication process could take months or years, or never complete at all.
A number of techniques have been used to reduce the impact of IO in other products. For example, the Hitachi NAS (HNAS) and Hitachi Unified Storage (HUS) solutions from HDS use hardware acceleration to generate cryptographically secure hashes that do not require a data compare at all. This allows for linear scaling of deduplication performance on volumes up to 256 TB in size. Data is also written out before it is deduplicated, to avoid introducing any latency through the hash computation process itself.
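To make the contrast with the read-compare approach concrete, here is a minimal sketch of post-process deduplication keyed on a cryptographic hash. It is not the HDS implementation: the function name `post_process_dedup` is hypothetical, the blocks are plain in-memory byte strings, and software SHA-256 from Python's `hashlib` stands in for the hardware-accelerated hashing described above.

```python
import hashlib


def post_process_dedup(blocks):
    """Deduplicate already-written blocks by SHA-256 digest, with no data compare.

    Returns (unique_blocks, refs), where refs[i] is the index in unique_blocks
    that logical block i points at after deduplication.
    """
    seen = {}            # digest -> index into unique_blocks
    unique_blocks = []
    refs = []
    for data in blocks:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            # First time this digest appears: keep the block.
            seen[digest] = len(unique_blocks)
            unique_blocks.append(data)
        # A digest match is treated as a content match, so the existing
        # block is never read back for comparison.
        refs.append(seen[digest])
    return unique_blocks, refs
```

Because the pass runs over blocks that have already been written, the hash computation stays out of the write path, which mirrors the write-first ordering described above.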
Permabit's own Albireo Virtual Data Optimizer (VDO) product, a plug-in module for Linux-based storage solutions, takes a different approach but with a similar result. VDO works inline to provide immediate data reduction. When data is written out, the VDO process intelligently lays it out in a sequential pattern, so that subsequent read compares of duplicates are more likely to be sequential as well. Both solutions do a fine job of solving the problem in real-world scenarios; they just take different approaches.
The Memory Challenge
Many of today's flash array vendors provide deduplication using fast hashing techniques similar to those I outlined above. With flash, the cost of doing random reads for read compares is a non-issue (random seeks on flash are much less expensive than on hard drives), so the use of the fast hash alone is enough to minimize latency. These systems (such as EMC's recently launched XtremIO product) are focused on delivering performance, and the big challenge to performance at scale is available memory (DRAM). As above, after chunks are read in, they are named using a fast hashing algorithm. After that, the flash system must determine whether or not a chunk has been seen before. To get at this information as quickly as possible, flash-based storage systems have tended to use huge amounts of DRAM to cache chunk names in memory. It's not uncommon to see flash storage systems that allocate 16 GB of working cache per TB of storage. To support a 256 TB storage volume, such a system would require 4 TB of DRAM (16 GB × 256). The increased hard costs of more expensive (denser) DIMMs, together with the increased cost of the server board required to support that many DIMMs, make this an extremely costly and unpopular proposition. Combine this with the fact that DRAM prices are not falling at the same rate as flash prices, and you can see why no vendor today makes a 256 TB flash storage array with global deduplication capabilities.
The solution to the memory challenge is coming, in the form of a next generation of flash storage products that utilize Albireo indexing and Albireo VDO. Unlike the flash arrays described above, flash-optimized arrays with VDO take advantage of advanced caching techniques to operate with 128 MB of working cache per TB of storage and deliver excellent performance. With VDO, a 256 TB system can be delivered with as little as 32 GB of RAM while delivering 1M IOPS performance. The net result is a cost-effective and easily deployed data efficiency solution for flash arrays.
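The arithmetic behind the two memory figures is easy to check. The short sketch below uses only the ratios quoted in this section (16 GB of cache per TB for typical flash arrays, 128 MB per TB with VDO); the helper name `index_cache_gb` is hypothetical, and binary units (1 GB = 1024 MB) are assumed.

```python
def index_cache_gb(capacity_tb: float, cache_mb_per_tb: float) -> float:
    """Working cache needed for a volume, in GB (assuming 1 GB = 1024 MB)."""
    return capacity_tb * cache_mb_per_tb / 1024


CAPACITY_TB = 256

# 16 GB per TB, expressed in MB per TB
typical_flash = index_cache_gb(CAPACITY_TB, 16 * 1024)
# 128 MB per TB
with_vdo = index_cache_gb(CAPACITY_TB, 128)

print(f"Typical flash array: {typical_flash:.0f} GB")   # 4096 GB, i.e. 4 TB
print(f"With VDO:            {with_vdo:.0f} GB")        # 32 GB
print(f"Reduction factor:    {typical_flash / with_vdo:.0f}x")  # 128x
```

At 256 TB the per-TB difference compounds into a 128-fold gap in DRAM, which is the difference between a multi-terabyte memory footprint and one that fits in a commodity server.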
Deduplication Scalability by Vendor
As you can see in the table above, forward-thinking vendors like HDS have done a good job of overcoming limitations in their data efficiency features and have products on the market today that can scale to meet the requirements of the large enterprise. Many other vendors are lagging behind because of their inability to address IO and/or memory requirements. That is a serious shortcoming, since data efficiency is at the core of distinguishing storage solutions, a critical end-user requirement, and a "must have" component for 2014. Permabit's VDO product overcomes both of these limitations through the use of advanced memory-efficient caching techniques.