Welcome!

SDN Journal Authors: Destiny Bertucci, Liz McMillan, Pat Romanski, Elizabeth White, Amitabh Sinha

Related Topics: @DXWorldExpo, Java IoT, Microservices Expo, Recurring Revenue, @CloudExpo, SDN Journal

@DXWorldExpo: Article

Five Big Data Features in Oracle

Traditional RDBMS and new data processing

Over the past two decades relational databases have been most successful in serving large scale OLTP and OLAP applications across enterprises. However, in the past couple of years with the advent of Big Data, processing especially processing unstructured data coupled with the need for processing massive quantities of data, made the industry to look into non RDBMS solutions. This has led to the popularity of NOSQL databases as well as massively parallel processing frameworks.

However, the traditional RDBMS have been quick to react and added several Big Data features as part of their offering so that enterprises with a heavy investment in traditional RDBMS can have the best of both worlds by properly leveraging these new features.

The following sections provide an idea about Big Data features in the popular Oracle databases. Please refer to my earlier articles on Five Big Data Features in SQL Server, Five Big Data Features in DB2.

1. External Tables: As the name suggests, an external table accesses data in external sources as if this data were in a table in the database. In the earlier releases external tables were mainly used to access CSV files and Oracle Loader files. However, to support Big Data, Oracle has released a direct connector to the Hadoop HDFS file system on which an external table can be built. With a SQL-like CREATE TABLE syntax an external table feature allows easy access to the HDFS file system. Oracle SQL Connector for HDFS creates the external table definition from a Hive table by contacting the Hive meta store client to retrieve information about the table columns and the location of the table data. In addition, the Hive table data paths are published to the location files of the Oracle external table.

Considering the distinct advantages of the Columnar databases for certain types of workloads, external tables also support Columnar storage. With Hybrid Columnar Compression, the database stores the same column for a group of rows together. The data block does not store data in row-major format, but uses a combination of both row and columnar methods.

Storing column data together, with the same data type and similar characteristics, dramatically increases the storage savings achieved from compression. The database compresses data manipulated by any SQL operation, although compression levels are higher for direct path loads. Database operations work transparently against compressed objects, so no application changes are required.

As a complimentary option, Oracle also provides a Loader for Hadoop. Oracle Loader for Hadoop is a MapReduce application that is invoked as a command-line utility. It provides an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database.

2. Oracle Text: Oracle Text enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text. Oracle Text indexes text by converting all words into tokens. The general structure of an Oracle Text CONTEXT index is an inverted index where each token contains the list of documents (rows) that contain that token. The lexer breaks the text into tokens according to your language. These tokens are usually words. Oracle Text can index most document formats including HTML, PDF, Microsoft Word, and plain text, you can load any supported type into the text column. Oracle Text can index most languages. BASIC_LEXER preference type to index whitespace-delimited languages such as English, French, German, and Spanish. MULTI_LEXER preference type for indexing tables containing documents of different languages such as English, German, and Japanese.

The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle Text returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. Defining a custom thesaurus enables you to process queries more intelligently. Because users of your application might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query into your thesaurus terms.

With extensive support for processing unstructured text document, Oracle Text can play a major role in Big Data processing.

3. VLDB Partitioning: If one of the appealing features of Big Data frameworks is the ability to split large quantities of data across multiple nodes, then Oracle's partitioning features performs the similar functionality and it exists for a while. Partitioning addresses key issues in supporting very large tables and indexes by decomposing them into smaller and more manageable pieces called partitions, which are entirely transparent to an application. SQL queries and Data Manipulation Language (DML) statements do not need to be modified to access partitioned tables.

Partitioning is a critical feature for managing very large databases. Growth is the basic challenge that partitioning addresses for very large databases, and partitioning enables a divide and conquer technique for managing the tables and indexes in the database, especially as those tables and indexes grow.

Oracle also supports many different types of partitioning types depending on the nature of the applications.

  • Range Partitioning: Range partitioning maps data to partitions based on ranges of values of the partitioning key that you establish for each partition.
  • Hash Partitioning: Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify.
  • List Partitioning: List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition.
  • Composite Partitioning: Composite partitioning is a combination of the basic data distribution methods; a table is partitioned by one data distribution method and then each partition is further subdivided into subpartitions using a second data distribution method.

4. Native Parallelism & Grid Computing: While the MPP (Massively Parallel Processing) computing forms the basis of Big Data Processing that does not mean the complimentary computing of SMP (symmetric processing) cannot be utilized for processing large quantities of data and many of the Big Data features in Oracle like VLDB partitioning fully utilize the SMP power of servers.

Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases. You can use parallel queries and parallel subqueries in SELECT statements and execute in parallel the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE). You can also query external tables in parallel.

Oracle also supports MPP kind of parallelism using Oracle Real Application Clusters (Oracle RAC). Oracle RAC enables you to cluster an Oracle database. Oracle RAC uses Oracle Clusterware for the infrastructure to bind multiple servers so they operate as a single system. While RAC may not be suitable for analytical workloads but in conjunction with other features it may help real time analytics.

5. XML DB: Oracle XML DB is a set of Oracle Database technologies related to high-performance handling of XML data: storing, generating, accessing, searching, validating, transforming, evolving, and indexing. It provides native XML support by encompassing both the SQL and XML data models in an interoperable way. Oracle XML DB is included as part of Oracle Database .

XMLType is an abstract data type for native handling of XML data in the database. This data type is integrated with the regular RDBMS tables so that this can be just another column in a table. The table with XMLType can be partitioned using the above mentioned VLDB partitioning techniques making it a good candidate for Big Data processing. There is another component of XMLDB namely Oracle XML DB Repository. Using XML DB Repository we can store any kind of documents in the repository, including XML documents that are associated with an XML schema.

Summary
Traditional high performance RDBMS like Oracle have their strengths. They are very strong in maintaining the data integrity and quality in the form of constraints, foreign keys and other validation mechanisms. They are also strong in transactional integrity by providing a superior locking model, automatic dead lock resolution, etc. Howeve, initially they are not perceived to adjust to Big Data processing needs of enterprises.

With the enhancements in the products made by respective vendors, now databases like Oracle have been enhanced with Big Data processing features that makes them the best candidate for enterprises looking for best-of- breed features between traditional RDBMS and Big Data processing systems, and to leverage the best of existing investments.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).

@CloudExpo Stories
The dynamic nature of the cloud means that change is a constant when it comes to modern cloud-based infrastructure. Delivering modern applications to end users, therefore, is a constantly shifting challenge. Delivery automation helps IT Ops teams ensure that apps are providing an optimal end user experience over hybrid-cloud and multi-cloud environments, no matter what the current state of the infrastructure is. To employ a delivery automation strategy that reflects your business rules, making r...
Modern software design has fundamentally changed how we manage applications, causing many to turn to containers as the new virtual machine for resource management. As container adoption grows beyond stateless applications to stateful workloads, the need for persistent storage is foundational - something customers routinely cite as a top pain point. In his session at @DevOpsSummit at 21st Cloud Expo, Bill Borsari, Head of Systems Engineering at Datera, explored how organizations can reap the bene...
Kubernetes is an open source system for automating deployment, scaling, and management of containerized applications. Kubernetes was originally built by Google, leveraging years of experience with managing container workloads, and is now a Cloud Native Compute Foundation (CNCF) project. Kubernetes has been widely adopted by the community, supported on all major public and private cloud providers, and is gaining rapid adoption in enterprises. However, Kubernetes may seem intimidating and complex ...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
In his session at 21st Cloud Expo, Michael Burley, a Senior Business Development Executive in IT Services at NetApp, described how NetApp designed a three-year program of work to migrate 25PB of a major telco's enterprise data to a new STaaS platform, and then secured a long-term contract to manage and operate the platform. This significant program blended the best of NetApp’s solutions and services capabilities to enable this telco’s successful adoption of private cloud storage and launching ...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, discussed the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, application p...
The past few years have brought a sea change in the way applications are architected, developed, and consumed—increasing both the complexity of testing and the business impact of software failures. How can software testing professionals keep pace with modern application delivery, given the trends that impact both architectures (cloud, microservices, and APIs) and processes (DevOps, agile, and continuous delivery)? This is where continuous testing comes in. D
The 22nd International Cloud Expo | 1st DXWorld Expo has announced that its Call for Papers is open. Cloud Expo | DXWorld Expo, to be held June 5-7, 2018, at the Javits Center in New York, NY, brings together Cloud Computing, Digital Transformation, Big Data, Internet of Things, DevOps, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
SYS-CON Events announced today that Synametrics Technologies will exhibit at SYS-CON's 22nd International Cloud Expo®, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Synametrics Technologies is a privately held company based in Plainsboro, New Jersey that has been providing solutions for the developer community since 1997. Based on the success of its initial product offerings such as WinSQL, Xeams, SynaMan and Syncrify, Synametrics continues to create and hone in...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
Nordstrom is transforming the way that they do business and the cloud is the key to enabling speed and hyper personalized customer experiences. In his session at 21st Cloud Expo, Ken Schow, VP of Engineering at Nordstrom, discussed some of the key learnings and common pitfalls of large enterprises moving to the cloud. This includes strategies around choosing a cloud provider(s), architecture, and lessons learned. In addition, he covered some of the best practices for structured team migration an...
With tough new regulations coming to Europe on data privacy in May 2018, Calligo will explain why in reality the effect is global and transforms how you consider critical data. EU GDPR fundamentally rewrites the rules for cloud, Big Data and IoT. In his session at 21st Cloud Expo, Adam Ryan, Vice President and General Manager EMEA at Calligo, examined the regulations and provided insight on how it affects technology, challenges the established rules and will usher in new levels of diligence arou...
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software. They hope to capture value from emerging technologies such as IoT, SDN, and AI. Ultimately, irrespective of the vertical, it is about deriving value from independent software applications participating in an ecosystem as one comprehensive solution. In his session at @ThingsExpo, Kausik Sridhar, founder and CTO of Pulzze Systems, discussed how given the magnitude of today's application ...
The “Digital Era” is forcing us to engage with new methods to build, operate and maintain applications. This transformation also implies an evolution to more and more intelligent applications to better engage with the customers, while creating significant market differentiators. In both cases, the cloud has become a key enabler to embrace this digital revolution. So, moving to the cloud is no longer the question; the new questions are HOW and WHEN. To make this equation even more complex, most ...
As you move to the cloud, your network should be efficient, secure, and easy to manage. An enterprise adopting a hybrid or public cloud needs systems and tools that provide: Agility: ability to deliver applications and services faster, even in complex hybrid environments Easier manageability: enable reliable connectivity with complete oversight as the data center network evolves Greater efficiency: eliminate wasted effort while reducing errors and optimize asset utilization Security: imple...
Mobile device usage has increased exponentially during the past several years, as consumers rely on handhelds for everything from news and weather to banking and purchases. What can we expect in the next few years? The way in which we interact with our devices will fundamentally change, as businesses leverage Artificial Intelligence. We already see this taking shape as businesses leverage AI for cost savings and customer responsiveness. This trend will continue, as AI is used for more sophistica...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve f...
In his session at 21st Cloud Expo, Raju Shreewastava, founder of Big Data Trunk, provided a fun and simple way to introduce Machine Leaning to anyone and everyone. He solved a machine learning problem and demonstrated an easy way to be able to do machine learning without even coding. Raju Shreewastava is the founder of Big Data Trunk (www.BigDataTrunk.com), a Big Data Training and consulting firm with offices in the United States. He previously led the data warehouse/business intelligence and B...