
Why Contextual Data Locality Matters

Big Data is quickly overtaking SDN as a key phrase in today’s networking lingo. Overused as it may already be, it actually has a lot more meaning and definition than SDN. Big Data solutions are designed to work on lots of data, as the name suggests. Of course, such solutions have been around forever: talk to any large bank, credit card company, airline or logistics company, and all of them have run applications on extremely large databases and data sets for decades. But this is the new Big Data, the one inspired by Hadoop, MapReduce and friends: high performance compute clusters specifically created to analyze large amounts of data and reduce it to a form and quantity that human brains can use in decision making.

What makes today’s Big Data solutions different from more traditional large-database applications, beyond the sheer size of the datasets being analyzed, is the distributed nature of the analysis. Big Data solutions are designed to run across hundreds or even thousands of servers, each with multiple CPU cores to chew on the data. Traditional large-database applications tend to be more localized, with fewer applications and servers accessing the data, allowing for more tightly integrated custom solutions, the likes of which Oracle and friends are experts at.

Big Data Flashback

In the late 80s I started my career working as a network engineer for a high energy physics research institute. Working closely with the folks at CERN in Geneva, these physicists were (at the time, and probably still are) masters of creating very large datasets. Every time an experiment was run, Tbytes of data (probably Pbytes by now) were generated by thousands of sensors along the tunnel or ring through which particles were passed to collide.

The Big Data solution at the time was primitive, but not all that different from today’s. The large datasets were manually broken into manageable pieces, something that would fit on a tape or disk. These pieces were then copied by hand onto a compute server or supercomputer, and the analysis application would churn through them to find specific data, correlate events and simply reduce the data to something smaller and meaningful. This would create a new dataset, which would be combined and chopped up again, and the process repeated until the physicists arrived at data that humans could use to create new theories, or that provided a piece of proof for an existing theory.

During that first job, the IT group spent an enormous amount of time moving data around. A lot of it was manual: tapes and disks were constantly being copied onto the appropriate compute server. The data had to be local for there to be any chance of analyzing it. Between tapes, local disks and the network, the local disks were the only storage fast enough to give any hope of completing the data reductions. And even then it would not be unusual to have a rather powerful (for the time) Apollo workstation run for several weeks on a single data set.

Back to the here and now

Fast forward to now. The above description is really not that different from how Hadoop MapReduce works. Start with a big data set, chop it into pieces, replicate the data, and compute on each piece close to where it physically resides. Then send the results to Reducers, combine them, and perhaps repeat the process to get to human interpretable results.
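The chop, compute, reduce cycle described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Hadoop's API: the function names are made up for this example, word counting stands in for the real analysis, and the replication and placement of chunks on specific servers is left out.

```python
from collections import defaultdict


def split_into_chunks(records, chunk_size):
    """Chop the dataset into pieces small enough for a single worker."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in this chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the intermediate values by key on their way to the reducers."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # On a real cluster each chunk would be mapped on a server that already
    # holds a copy of it; here the chunks are simply processed in a loop.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["a b", "b c", "a a"]))
```

The only step that moves data between workers is the shuffle, which is why the network matters so much once the map work has been kept local.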

As fast as we believe the network is with 10GbE access ports, it is still commonly the most restrictive component in the trio of compute, distributed storage and network. Increases in compute power have far outpaced increases in network speed, and even in memory speed. We have many more cycles available to compute, but have not been able to get data into these CPUs at a matching rate. As a result, storage solutions are becoming increasingly distributed, moving closer to the compute power that needs them.

It’s a natural thought to have the data close to where it needs to be processed, close enough that the effort of retrieving it does not impact the overall completion of the task that uses that data. If I am writing a research paper that takes several hours to complete, I do not mind having to wait a second here or there for the right web sites to load. I would mind if I had to get into my car and drive to the library to look something up, drive back home to work on my paper, and keep doing that. The time and effort needed to get the data has to be negligible compared to the time and effort required to complete the task.
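To put rough numbers on that relationship, here is a small back-of-the-envelope sketch in Python. The dataset size, link speed and compute time are assumptions picked purely for illustration, and the model ignores protocol overhead and congestion, so real transfers would be slower.

```python
def transfer_seconds(data_bytes, bandwidth_bits_per_s, latency_s=0.0):
    """Idealized time to move data over a link: latency plus serialization time."""
    return latency_s + (data_bytes * 8) / bandwidth_bits_per_s


def transfer_overhead(data_bytes, bandwidth_bits_per_s, compute_s, latency_s=0.0):
    """Fraction of the total task time that is spent fetching the data."""
    fetch = transfer_seconds(data_bytes, bandwidth_bits_per_s, latency_s)
    return fetch / (fetch + compute_s)


if __name__ == "__main__":
    one_terabyte = 10**12      # assumed dataset size, in bytes
    ten_gbe = 10 * 10**9       # a 10GbE access port, in bits per second
    one_hour = 3600.0          # assumed compute time, in seconds

    fetch = transfer_seconds(one_terabyte, ten_gbe)
    share = transfer_overhead(one_terabyte, ten_gbe, one_hour)
    print(f"fetch time: {fetch:.0f} s, share of total task time: {share:.0%}")
```

Under these assumptions, moving the data takes 800 seconds, or about 18% of the total task time, even on an otherwise idle 10GbE link. That is far from negligible, and it is the "driving to the library" case in network terms.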

Locality and growth

This type of contextual locality is extremely hard to manage in a dynamic and growing environment. How do you make sure that the right data remains contextually close to where it is needed when servers and VMs may not be physically close? They may not be in the same rack for the same application or customer; they may not even be in the same pod or datacenter. Storage is relatively cheap, but replication for closeness can very quickly lead to a data distribution complexity that is unmanageable in environments where it’s not a single orchestrated big data solution.

To solve this problem you need help from your network. You need to be able to create locality on the fly. Things that are not physically close need to be made virtually close, but with the characteristics of physical locality. In network terms, these characteristics are of course measured in the usual staples of latency and bandwidth. This is when you want to articulate relationships between the data and the applications that need that data, and create virtual closeness that resembles the physical. This may mean dedicated paths through multiple switches to avoid the congestion that would dramatically impact latency. These same paths can provide direct physical connectivity through dynamically engineered optical paths between application and storage, or simply appropriate prioritization of traffic along these paths, all without having to worry explicitly about where the application is or where the storage is.
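One way to make "virtually close" concrete is to rank candidate paths by how long a given fetch would take over each of them. The Python sketch below does that with a simple latency plus size over bandwidth estimate. The `Path` class, the path names and all of the numbers are hypothetical, chosen only to illustrate the idea; a real network controller would work from measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A hypothetical network path between an application and its storage."""
    name: str
    latency_s: float
    bandwidth_bits_per_s: float


def fetch_seconds(path, data_bytes):
    """Estimated time to fetch data_bytes over this path."""
    return path.latency_s + (data_bytes * 8) / path.bandwidth_bits_per_s


def closest(paths, data_bytes):
    """Return the path that is 'closest' for a transfer of this size."""
    return min(paths, key=lambda p: fetch_seconds(p, data_bytes))


if __name__ == "__main__":
    paths = [
        # Physically near, but sharing a congested link (assumed 1 Gb/s available).
        Path("same-rack-shared", latency_s=0.0001, bandwidth_bits_per_s=1e9),
        # Physically farther, but on a dedicated path (assumed 10 Gb/s available).
        Path("cross-pod-dedicated", latency_s=0.0005, bandwidth_bits_per_s=10e9),
    ]
    print(closest(paths, 10**9).name)   # a 1 GB bulk read
    print(closest(paths, 1000).name)    # a 1 KB lookup
```

With these assumed numbers, the dedicated cross-pod path is the closer one for the bulk read, while the same-rack path wins for the small lookup. Closeness depends on what the application is doing with the data, which is what makes the locality contextual.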

Physics will always stand in the way of what we really want or need, but that does not mean we cannot use that same physics, with a bit of math, to create solutions that manage the complexity of creating dynamic locality. Locality is important. It is more pronounced in Big Data solutions, but even at a smaller scale it matters within the context of the compute effort on that data.

[Today's fun fact: Lake Superior is the world's largest freshwater lake by surface area. With that kind of naming accuracy we would like to hire the person that named the lake as our VP of Naming and Terminology.]

The post Why Contextual Data Locality Matters appeared first on Plexxi.


More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
