Getting Automation Right with Big Data | @BigDataExpo #BigData

Things To Remember While Automating With Big Data

Big data automation can mean writing dozens of scripts to process different input sources, then aligning their outputs to consolidate all this data and produce the required result.

Why exactly do you need big data for your enterprise projects? Industry observers have noted that although many enterprises claim their big data projects are aimed at "deriving insights" that replace human intuition with data-driven alternatives, the real objective often appears to be automation. They point out that the role of data scientists at many organizations has little to do with replacing human intuition with big data. Instead, it is about augmenting human experience by making work easier, faster and more efficient.

But automating big data processing is easier said than done, and the biggest problem is that big data is, well, big. There is a lot of chaos and inconsistency in the available data, so a single MapReduce script that ingests all your data and instantly produces results is wishful thinking. In reality, big data automation can mean writing dozens of scripts to process different input sources, then aligning their outputs to consolidate all this data and produce the required result.
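As a rough sketch of what that consolidation looks like in practice, the snippet below (Python, with hypothetical file names and field mappings) normalizes two differently shaped sources onto one shared schema before merging them into a single stream:

```python
import csv
import json

# Hypothetical normalizers: each input source gets its own small script that
# maps raw records onto one shared schema (id, ts, value). The field names
# here are assumptions for illustration.
def from_csv(path):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"id": row["id"], "ts": row["timestamp"], "value": float(row["amount"])}

def from_json_lines(path):
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            yield {"id": rec["userId"], "ts": rec["eventTime"], "value": float(rec["total"])}

def consolidate(sources):
    """Merge the already-normalized streams into one consolidated stream."""
    for source in sources:
        yield from source

# Usage (hypothetical files):
# records = consolidate([from_csv("sales.csv"), from_json_lines("events.jsonl")])
```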

The first thing to get right with respect to automating big data is the architecture. One of the most popular foundations for big data automation is the data lake. To put it simply, a data lake is a large storage repository that holds all raw data until it is needed for processing. Unlike traditional hierarchical data warehouses, a data lake stores raw data in a flat architecture. A key advantage here is that a data lake can hold all sorts of data - structured, semi-structured and unstructured - and is thus well suited to big data automation.
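A minimal sketch of that flat, raw-first layout, assuming a schema-on-read approach; the root path, partitioning scheme and `.raw` suffix below are illustrative assumptions, not a standard:

```python
import pathlib
from datetime import datetime, timezone

LAKE_ROOT = pathlib.Path("/data/lake")  # assumed location, not a standard

def land_raw(source_name: str, payload: bytes) -> pathlib.Path:
    """Store the payload untouched in a flat, source/date-partitioned layout.
    No schema is imposed at write time; interpretation is deferred to read
    time (schema-on-read), which is what lets a lake hold structured,
    semi-structured and unstructured data side by side."""
    now = datetime.now(timezone.utc)
    target = LAKE_ROOT / source_name / now.strftime("%Y-%m-%d")
    target.mkdir(parents=True, exist_ok=True)
    out = target / f"{now.strftime('%H%M%S%f')}.raw"
    out.write_bytes(payload)
    return out

# land_raw("clickstream", b'{"page": "/home", "ms": 132}')
```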

The next thing to get right is agility. Traditional data sources are structured, and a data warehouse ensures their seamless, efficient processing. With big data, though, that rigidity becomes a disadvantage. Data scientists need to build agile systems that can be easily configured and reworked, so they can quickly and efficiently navigate the multitude of data sources and build an automation system that works.
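One common way to get that agility is a configuration-driven pipeline: processing steps are registered by name and composed from a declarative list, so reworking the flow is a configuration change rather than a rewrite. The step names and logic below are illustrative:

```python
from typing import Callable, Dict, Iterable, List

STEPS: Dict[str, Callable] = {}

def step(name: str):
    """Register a processing step under a name so pipelines can be assembled
    from configuration instead of hard-coded call chains."""
    def wrap(fn):
        STEPS[name] = fn
        return fn
    return wrap

@step("drop_nulls")
def drop_nulls(records: Iterable[dict]) -> Iterable[dict]:
    return (r for r in records if all(v is not None for v in r.values()))

@step("lowercase_ids")
def lowercase_ids(records: Iterable[dict]) -> Iterable[dict]:
    return ({**r, "id": str(r["id"]).lower()} for r in records)

def build_pipeline(config: List[str]):
    """Compose the named steps in the order the config lists them."""
    def run(records):
        for name in config:
            records = STEPS[name](records)
        return records
    return run

# Reworking the flow is a config change, not a code change:
# pipeline = build_pipeline(["drop_nulls", "lowercase_ids"])
# results = list(pipeline(raw_records))
```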

While challenges such as those mentioned above can be tackled by choosing the right technologies, other problems with big data need to be dealt with at a more granular level. One example is manipulated algorithms: rogue or incompetent developers can introduce changes that produce vastly different outputs, causing automation issues that are extremely difficult to track down and fix. Another issue is misinterpretation of data. An automated big data system can magnify minor discrepancies in the data and feed them into a loop, leading to grossly misleading outputs.
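The feedback-loop risk is easy to underestimate. The toy calculation below uses made-up numbers to show how a "minor" 1% discrepancy, re-ingested on every pass of an automated loop, compounds into a large error:

```python
# Toy illustration with made-up numbers: a small systematic bias that an
# automated loop re-ingests on every pass compounds instead of cancelling out.
true_value = 100.0
bias = 0.01           # a "minor" 1% systematic error per pass
estimate = true_value

for cycle in range(50):
    estimate *= (1 + bias)   # each cycle consumes its own slightly-off output

print(f"after 50 cycles: {estimate:.1f}")   # ~164.5, a 64% error from a 1% bias
```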

These issues cannot be wished away, and the only way to get automation right in such cases is to diligently monitor and evaluate the code and its outputs. This makes it possible to identify discrepancies in the algorithm and its outputs before they blow up. From a business perspective, that means additional resources to test and validate code and output at each stage of the development and operational cycle, which eats into the cost advantage big data automation offers. But it is a necessary expense if businesses want to establish a sustainable big data automation product that actually works.
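One way to make that monitoring concrete is an assertion-style gate between pipeline stages, so a discrepancy fails loudly instead of flowing downstream. The checks and the 5% tolerance below are illustrative assumptions:

```python
from typing import List

def validate_stage(records: List[dict], stage: str) -> List[dict]:
    """Gate between pipeline stages: fail fast on discrepancies instead of
    letting them propagate. The checks and threshold are illustrative."""
    if not records:
        raise ValueError(f"{stage}: produced no output")
    incomplete = sum(1 for r in records if None in r.values())
    if incomplete / len(records) > 0.05:  # assumed tolerance: 5% incomplete rows
        raise ValueError(f"{stage}: {incomplete}/{len(records)} records incomplete")
    return records

# Hypothetical usage between two stages:
# cleaned = validate_stage(clean(raw), "clean")
# scored  = validate_stage(score(cleaned), "score")
```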

More Stories By Harry Trott

Harry Trott is an IT consultant from Perth, WA. He is currently working on a long-term project in Bangalore, India. Harry has over seven years of experience on cloud and networking projects. He is also working on a SaaS-based startup that is currently in stealth mode.
