Why Contextual Data Locality Matters

Big Data is quickly overtaking SDN as a key phrase in today’s networking lingo. Overused as it may already be, it actually has a lot more meaning and definition than SDN. Big Data solutions are designed to work on lots of data, as the name suggests. Of course, such solutions have been around forever: talk to any large bank, credit card company, airline or logistics company, and you will find that all of them have long run applications on extremely large databases and data sets. But this is the new Big Data, the one inspired by Hadoop, MapReduce and friends: high performance compute clusters specifically created to analyze large amounts of data and reduce it to a form and quantity that human brains can use in decision making.

What makes today’s Big Data solutions different from more traditional large database applications, beyond the sheer size of the datasets being analyzed, is the distributed nature of the analysis. Big Data solutions are designed to run across hundreds or even thousands of servers, each with multiple CPU cores to chew on the data. Traditional large database applications tend to be more localized, with fewer applications and servers accessing the data, which allows for more tightly integrated custom solutions, the likes of which Oracle and friends are experts at.

Big Data Flashback

In the late 80s I started my career working as a network engineer for a high energy physics research institute. Working closely with the folks at CERN in Geneva, these physicists were (at the time, and probably still are) masters of creating very large datasets. Every time an experiment was run, Tbytes of data (probably Pbytes by now) were generated by thousands of sensors along the tunnel or ring through which particles were sent to collide.

The Big Data solution at the time was primitive, but not all that different from today’s. The large datasets were manually broken into manageable pieces, something that would fit on a tape or disk. These pieces were then copied by hand onto a compute server or supercomputer, and the analysis application would churn through them to find specific data, correlate events and simply reduce the data to something smaller and meaningful. This would create a new dataset, which would be combined and chopped up again, and the process would repeat until the physicists arrived at data that humans could consume to create new theories, or to provide a piece of proof for an existing one.

During that first job, the IT group spent an enormous amount of time moving data around. A lot of that work was manual: tapes and disks were constantly being copied onto the appropriate compute server. The data had to be local for the analysis to have any chance of completing. Of tapes, local disks and the network, the local disks were the only storage fast enough to give any hope of finishing the data reductions. Even then, it was not unusual for a rather powerful (for the time) Apollo workstation to run for several weeks on a single data set.

Back to the here and now

Fast forward to now. The above description is really not that different from how Hadoop MapReduce works: start with a big data set, chop it into pieces, replicate the data, and compute on each piece physically close to where it is stored. Then send the results to Reducers, combine them, and perhaps repeat the cycle to get to results a human can interpret.
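The chop, compute and combine cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not how Hadoop is implemented: the function names, the `sensor` field and the sample events are all made up for the example, everything runs in one process, and the replication step is left out entirely.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(records, chunk_size):
    """Chop the full data set into pieces small enough for one worker."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def map_chunk(chunk):
    """Map step: meant to run where the chunk is stored; emits a small partial count."""
    return Counter(event["sensor"] for event in chunk)

def reduce_partials(partials):
    """Reduce step: combine the partial counts into one summary."""
    return reduce(lambda a, b: a + b, partials, Counter())

def run_job(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # In a real cluster each chunk would be processed on the node that holds it;
    # here the map tasks simply run one after another in the same process.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce_partials(partials)

# Hypothetical sample data: five events from three sensors.
events = [
    {"sensor": "A"}, {"sensor": "B"}, {"sensor": "A"},
    {"sensor": "C"}, {"sensor": "A"},
]
summary = run_job(events)
print(dict(summary))
```

Only the small partial counts cross from the map step to the reduce step, while the raw events stay where they are, which is the whole point of computing close to the data.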

As fast as we believe the network is with 10GbE access ports, it is still commonly the most restrictive component in the compute, distributed storage and network trio. Increases in compute power have far outpaced increases in network speed and even memory speed. We have many more cycles available to compute, but we have not been able to get data into these CPUs at a matching rate. As a result, storage solutions are becoming increasingly distributed, closer to the compute power that needs the data.

It’s a natural thought to have the data close to where it needs to be processed, close enough that the effort of retrieving it does not impact the overall completion of the task that uses that data. If I am writing a research paper that takes several hours to complete, I do not mind having to wait a second here or there for the right web sites to load. I would mind if I had to get into my car and drive to the library to look something up, drive back home to work on my paper, and keep doing that. The time and effort required to get the data has to be negligible compared to the time and effort required to complete the task.
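To put rough numbers on that relationship, here is a back-of-the-envelope sketch. The helper names and the example figures (a 1 GB chunk, 80 seconds of compute) are assumptions picked for illustration, and the model ignores protocol overhead and congestion.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_s, latency_s=0.0):
    """Idealized time to move a chunk: one-way latency plus serialization time."""
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s

def fetch_overhead(size_bytes, bandwidth_bits_per_s, compute_s, latency_s=0.0):
    """Ratio of time spent fetching the data to time spent computing on it."""
    return transfer_seconds(size_bytes, bandwidth_bits_per_s, latency_s) / compute_s

GB = 1e9  # bytes

# Hypothetical task: 80 seconds of compute on a 1 GB chunk.
over_10gbe = fetch_overhead(1 * GB, 10e9, compute_s=80)  # uncontended 10GbE
over_1gbe = fetch_overhead(1 * GB, 1e9, compute_s=80)    # 1GbE, or a 10GbE link shared ten ways

print(f"10GbE: fetch time is {over_10gbe:.0%} of compute time")
print(f"1GbE:  fetch time is {over_1gbe:.0%} of compute time")
```

In this idealized model the fetch is close to negligible on an uncontended 10GbE link, but the same chunk costs ten times as much once the effective bandwidth drops to 1 Gb/s.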

Locality and growth

This type of contextual locality is extremely hard to manage in a dynamic and growing environment. How do you make sure that the right data remains contextually close to where it is needed when servers and VMs may not be physically close? The servers for the same application or customer may not be in the same rack; they may not even be in the same pod or datacenter. Storage is relatively cheap, but replicating data for closeness can very quickly lead to a level of data distribution complexity that is unmanageable in environments where it’s not a single orchestrated Big Data solution.

To solve this problem you need help from your network. You need to be able to create locality on the fly. Things that are not physically close need to be made virtually close, but with the characteristics of physical locality. In network terms these are of course measured in the usual staples of latency and bandwidth. This is where you want to articulate the relationships between the data and the applications that need that data, and create a virtual closeness that resembles the physical. This may mean dedicated paths through multiple switches to avoid the congestion that would dramatically impact latency. These same paths can provide direct physical connectivity through dynamically engineered optical paths between application and storage, or simply appropriate prioritization of traffic along these paths, all without having to worry explicitly about where the application is or where the storage is.
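As a rough sketch of what creating virtual closeness could look like in code, the snippet below picks, from a set of candidate paths between an application and its storage, the lowest-latency path that still has the bandwidth the relationship needs. Everything here is hypothetical: the `Link` type, the numbers and the selection rule are illustrative assumptions, not a description of any particular controller or product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    latency_us: float  # one-way latency of this hop, in microseconds
    free_gbps: float   # bandwidth currently available on this hop

def path_latency(path):
    """End-to-end latency is the sum of the per-hop latencies."""
    return sum(link.latency_us for link in path)

def path_bandwidth(path):
    """A path is only as wide as its most congested hop."""
    return min(link.free_gbps for link in path)

def pick_path(candidates, needed_gbps):
    """Return the lowest-latency path with enough free bandwidth, or None."""
    feasible = [p for p in candidates if path_bandwidth(p) >= needed_gbps]
    if not feasible:
        return None
    return min(feasible, key=path_latency)

# Two hypothetical paths between an application and its storage.
short_congested = [Link(5, 1), Link(5, 8)]           # 2 hops, 1 Gb/s free
longer_clear = [Link(5, 8), Link(5, 9), Link(5, 8)]  # 3 hops, 8 Gb/s free

chosen = pick_path([short_congested, longer_clear], needed_gbps=4)
print("chosen path has", len(chosen), "hops")
```

With a 4 Gb/s requirement the physically shorter path loses out to the longer but uncongested one. A real implementation would have to recompute such choices continuously as applications and data move.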

Physics will always stand in the way of what we really want or need, but that does not mean we cannot use that same physics, with a bit of math, to create solutions that manage the complexity of creating dynamic locality. Locality is important. Its importance is more pronounced in Big Data solutions, but even at a smaller scale it matters within the context of the compute effort spent on that data.

[Today's fun fact: Lake Superior is the world's largest freshwater lake by surface area. With that kind of naming accuracy we would like to hire the person that named the lake as our VP of Naming and Terminology]

The post Why Contextual Data Locality Matters appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts combine deep technology understanding with a highly approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong acquired these skills during 12 years at Juniper Networks, where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and at ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."