Welcome!

SDN Journal Authors: Elizabeth White, David Paquette, Liz McMillan, Sal Fernando, Nate Lindstrom

Blog Feed Post

Why Contextual Data Locality Matters

Big Data is quickly overtaking SDN as a key phrase in today’s networking lingo. And overused already as it may be, it actually has a lot more meaning and definition compared to SDN. Big Data solutions are designed to work on lots of data as the name suggests. Of course they have been around forever, talk to any large bank, credit card company, airline or logistics company and all of them have had applications running on extremely large databases and data sets forever. But this is the new Big Data, the one inspired by Hadoop, MapReduce and friends. High performance compute clusters specifically created to analyze large amounts of data and reduce it to a form and quantity that human brains can use in decision making.

What makes today’s Big Data solutions different than its more traditional large database based applications, beyond the sheer datasets being analyzed, is the distributed nature of the analysis. Big Data solutions are designed to run across 100s or even 1000s of servers, each with multiple CPU cores to chew on the data. Traditional large database applications tend to be more localized with fewer applications and servers accessing the data, allowing for more tightly custom integrated solutions, the likes of which Oracle and friends are experts at.

Big Data Flashback

In the late 80s I started my career working as a network engineer for a high energy physics research institute. Working closely with the folks at CERN in Geneva, these physicists were (at the time, and probably still) masters of creating very large datasets. Every time an experiment was run, Tbytes of data (probably Pbytes by now) were generated by thousands of sensors along the tunnel or ring particles were passed through to collide.

The Big Data solution at the time was primitive, but not all that much different than today. The large datasets were manually broken into manageable pieces, something that would fit on a tape or disk. These datasets were then hand copied onto a compute server or super computer and the analysis application would churn through it to find specific data, correlate events and simply reduce the data to something smaller and meaningful. This would then create a new dataset, which would be combined, chopped up again, and the process repeated itself until they arrived at data that was consumable for humans to create new theories from, or provide a piece of proof of an existing theory.

During that first job, the IT group spend an enormous amount of time moving data around. A lot of it manual: tapes and disks were constantly being copied onto the appropriate compute server. The data had to be local to have any chance of analyzing the data. Between tapes, local disks and the network, the local disks were the only storage with appropriate speed to have a hope of finalizing the data reductions. And even then it would not be unusual to have a rather powerful (for the time) Apollo workstation run for several weeks on a single data set.

Back to the here and now

Forward the clock to now. The above description is really not that different from how Hadoop MapReduce works. Start with a big data set, chop it into pieces, replicate the data, compute on the data close to physical locality of the data. Then send results to Reducers, combine the results, then perhaps repeat again to get to human interpretable results.

As fast as we believe the network is within 10GbE access ports, it is still commonly the most restrictive component in the compute, distributed storage and network trio. Compute power increments have far outpaced network speed increments and even memory speed increments. We have many more cycles available to compute, but have not been able to get the data into these CPUs with the same increments. As a result, storage solutions are becoming increasingly distributed, closer to the compute power that needs it.

It’s a natural thought to have the data close to where it needs to be processed, close enough that the effort of retrieving it does not impact the overall completion of the task that uses that data. If I am writing a research paper that takes several hours to complete, I do not mind having to wait a second here or there for the right web sites to load. I would mind if I had to get into my car and drive to the library to look something up, drive back home to work on my paper, and keep doing that. The relationship between time and effort to get data has to become negligible compared to the time and effort required to complete the task.

Locality and growth

This type of contextual locality is extremely hard to manage in a dynamic and growing environment. How do you make sure that the right data remains contextually close to where it is needed when servers and VMs may not be physically close? They may not be in the same rack for the same application or customer, they may not even be in the same pod or datacenter. Storage is relatively cheap, but replication for closeness can very quickly lead to a data distribution complexity that is unmanageable in environments where its not a single orchestrated big data solution.

To solve this problem you need help from your network. You need to be able to create locality on the fly. Things that are not physically close need to be made virtually close, but with the characteristics of physical locality. And in network terms these are of course measured in the usual staples of latency and bandwidth. This is when you want to articulate relationships between the data and the applications that need that data and create virtual closeness that resembles the physical. This may mean dedicated paths through multiple switches to avoid congestion that will dramatically impact latency. These same paths can provide direct physical connectivity through dynamically engineered optical paths between application and storage, or simply appropriate prioritization of traffic along these paths. Without having to worry explicitly where the application is or where the storage is.

Physics will always stand in the way of what we really want or need, but that does not mean we use that same physics with a bit of math to create solutions that manage the complexity of creating dynamic locality. Locality is important. More pronounced in Big Data solutions, but even at a smaller scale it is important within the context of the compute effort on that data.

[Today's fun fact: Lake Superior is the world's largest lake. With that kind of naming accuracy we would like to hire the person that named the lake as our VP of Naming and Terminology]

The post Why Contextual Data Locality Matters appeared first on Plexxi.

Read the original blog entry...

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsis and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lend new meaning to the marketing phrase "This isn't rocket science."

@CloudExpo Stories
Developing software for the Internet of Things (IoT) comes with its own set of challenges. Security, privacy, and unified standards are a few key issues. In addition, each IoT product is comprised of (at least) three separate application components: the software embedded in the device, the back-end service, and the mobile application for the end user’s controls. Each component is developed by a different team, using different technologies and practices, and deployed to a different stack/target –...
SYS-CON Events announced today Telecom Reseller has been named “Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Traditional on-premises data centers have long been the domain of modern data platforms like Apache Hadoop, meaning companies who build their business on public cloud were challenged to run Big Data processing and analytics at scale. But recent advancements in Hadoop performance, security, and most importantly cloud-native integrations, are giving organizations the ability to truly gain value from all their data. In his session at 19th Cloud Expo, David Tishgart, Director of Product Marketing ...
Almost two-thirds of companies either have or soon will have IoT as the backbone of their business in 2016. However, IoT is far more complex than most firms expected. How can you not get trapped in the pitfalls? In his session at @ThingsExpo, Tony Shan, a renowned visionary and thought leader, will introduce a holistic method of IoTification, which is the process of IoTifying the existing technology and business models to adopt and leverage IoT. He will drill down to the components in this fra...
Fact: storage performance problems have only gotten more complicated, as applications not only have become largely virtualized, but also have moved to cloud-based infrastructures. Storage performance in virtualized environments isn’t just about IOPS anymore. Instead, you need to guarantee performance for individual VMs, helping applications maintain performance as the number of VMs continues to go up in real time. In his session at Cloud Expo, Dhiraj Sehgal, Product and Marketing at Tintri, wil...
Pulzze Systems was happy to participate in such a premier event and thankful to be receiving the winning investment and global network support from G-Startup Worldwide. It is an exciting time for Pulzze to showcase the effectiveness of innovative technologies and enable them to make the world smarter and better. The reputable contest is held to identify promising startups around the globe that are assured to change the world through their innovative products and disruptive technologies. There w...
StarNet Communications Corp has announced the addition of three Secure Remote Desktop modules to its flagship X-Win32 PC X server. The new modules enable X-Win32 to safely tunnel the remote desktops from Linux and Unix servers to the user’s PC over encrypted SSH. Traditionally, users of PC X servers deploy the XDMCP protocol to display remote desktop environments such as the Gnome and KDE desktops on Linux servers and the CDE environment on Solaris Unix machines. XDMCP is used primarily on comp...
There is growing need for data-driven applications and the need for digital platforms to build these apps. In his session at 19th Cloud Expo, Muddu Sudhakar, VP and GM of Security & IoT at Splunk, will cover different PaaS solutions and Big Data platforms that are available to build applications. In addition, AI and machine learning are creating new requirements that developers need in the building of next-gen apps. The next-generation digital platforms have some of the past platform needs a...
SYS-CON Events announced today that StarNet Communications will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. StarNet Communications’ FastX is the industry first cloud-based remote X Windows emulator. Using standard Web browsers (FireFox, Chrome, Safari, etc.) users from around the world gain highly secure access to applications and data hosted on Linux-based servers in a central data center. ...
Extreme Computing is the ability to leverage highly performant infrastructure and software to accelerate Big Data, machine learning, HPC, and Enterprise applications. High IOPS Storage, low-latency networks, in-memory databases, GPUs and other parallel accelerators are being used to achieve faster results and help businesses make better decisions. In his session at 18th Cloud Expo, Michael O'Neill, Strategic Business Development at NVIDIA, focused on some of the unique ways extreme computing is...
With so much going on in this space you could be forgiven for thinking you were always working with yesterday’s technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.
The emerging Internet of Everything creates tremendous new opportunities for customer engagement and business model innovation. However, enterprises must overcome a number of critical challenges to bring these new solutions to market. In his session at @ThingsExpo, Michael Martin, CTO/CIO at nfrastructure, outlined these key challenges and recommended approaches for overcoming them to achieve speed and agility in the design, development and implementation of Internet of Everything solutions wi...
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is expected in the amount of information being processed, managed, analyzed, and acted upon by enterprise IT. This amazing is not part of some distant future - it is happening today. One report shows a 650% increase in enterprise data by 2020. Other estimates are even higher....
Identity is in everything and customers are looking to their providers to ensure the security of their identities, transactions and data. With the increased reliance on cloud-based services, service providers must build security and trust into their offerings, adding value to customers and improving the user experience. Making identity, security and privacy easy for customers provides a unique advantage over the competition.
Qosmos has announced new milestones in the detection of encrypted traffic and in protocol signature coverage. Qosmos latest software can accurately classify traffic encrypted with SSL/TLS (e.g., Google, Facebook, WhatsApp), P2P traffic (e.g., BitTorrent, MuTorrent, Vuze), and Skype, while preserving the privacy of communication content. These new classification techniques mean that traffic optimization, policy enforcement, and user experience are largely unaffected by encryption. In respect wit...
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit http://www.pulzzesystems.com.
Data is an unusual currency; it is not restricted by the same transactional limitations as money or people. In fact, the more that you leverage your data across multiple business use cases, the more valuable it becomes to the organization. And the same can be said about the organization’s analytics. In his session at 19th Cloud Expo, Bill Schmarzo, CTO for the Big Data Practice at EMC, will introduce a methodology for capturing, enriching and sharing data (and analytics) across the organizati...
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...