Big Data Architecture – Frequently Asked Questions (FAQs)

by Antony Heljula

The aim of this article is to help provide a common understanding of what "Big Data" is all about, especially in relation to Oracle technologies.   A lot of organisations are now talking about big data and we are often asked to clarify what it actually means and, ultimately, why companies would want to consider going down that route.

Please use the form at the bottom to leave your own views on this or if you have some additional FAQs.

1) What is a Big Data Architecture?

2) Why would you want to store and process data in a file system?

3) What is "un-structured" data?

4) How do you report on un-structured data?

5) What is Hadoop and NoSQL?

6) Can Oracle BI (OBIEE) report directly against Hadoop and NoSQL?

7) This all sounds great!  Shall we get rid of our Data Warehouse then and just use Hadoop?

8) How do I get data out of Hadoop and into an Oracle Database?

9) What is Oracle Big Data Discovery?


1) What is a Big Data Architecture?

Generally speaking, a Big Data Architecture is one which involves the storing and processing of data in a file system rather than a relational database.   In most cases, this provides a mechanism to summarise and/or clean data prior to loading it into a database for further analysis.
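
As a rough sketch of that pattern, the Python below cleans and summarises some raw, line-oriented records (standing in for files held on a file system) and loads only the summary into a relational database. The record layout, the store names, the table name `daily_sales` and the use of an in-memory SQLite database are all assumptions made for illustration; none of them come from Hadoop or any Oracle product.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records as they might sit in flat files: "date,store,amount".
RAW_LINES = [
    "2015-06-01,Sheffield,10.50",
    "2015-06-01,Sheffield,4.50",
    "2015-06-01,Leeds,7.00",
    "bad line that should be discarded",
    "2015-06-02,Leeds,3.25",
]


def summarise(lines):
    """Clean the raw lines and total the amount per (date, store)."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # cleaning step: skip malformed records
        date, store, amount = parts
        try:
            totals[(date, store)] += float(amount)
        except ValueError:
            continue  # cleaning step: skip non-numeric amounts
    return dict(totals)


def load(summary, conn):
    """Load the (much smaller) summary into a relational table."""
    conn.execute(
        "CREATE TABLE daily_sales (sale_date TEXT, store TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?)",
        [(d, s, t) for (d, s), t in sorted(summary.items())],
    )
    conn.commit()


summary = summarise(RAW_LINES)
conn = sqlite3.connect(":memory:")
load(summary, conn)
rows = conn.execute(
    "SELECT sale_date, store, total FROM daily_sales ORDER BY sale_date, store"
).fetchall()
```

Only three summary rows reach the database here, whereas the five raw lines (including the malformed one) stay on the file system side.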


2) Why would you want to store and process data in a file system?

Historically, companies have adopted relational databases to store and process their data.  However there are two issues which are creeping up on a number of organisations:

a) Data volumes are growing exponentially, and it is becoming ever more costly to store all your data in a relational database.   Storing and processing data in a file system is relatively cheap in comparison and is highly scalable.

b) To gain a competitive edge, organisations need to bring a greater variety of data sources together to perform meaningful analysis.   Relational databases have always been good at analysing "structured" data but they often have limitations when dealing with "un-structured" data sources.


3) What is "un-structured" data?

Relational databases store data in a very structured format - they contain tables with records that have a fixed number of columns with specific data types i.e. the database consists of a data-model.
"Un-structured" data sources are ones where the data-model is not so well defined.   For example, consider the following:

- An employee's CV
- Social media data e.g. a Twitter feed
- A customer's review
- A server log file

It is probably more accurate to say "semi-structured" data, since all data surely has some structure even if the structure is quite vague or complex!   But either way, one purpose of Big Data is to provide a mechanism for making use of your un-structured data sources.  This often means summarising it, making it more structured and then combining it with other structured data sources for further analysis.

4) How do you report on un-structured data?

In a relational database world you can't (or it is quite difficult).  You have to convert your un-structured sources to a more structured format before you can report on them.    For example:

- Key word extraction:  Picking out the common terms or words mentioned in Twitter feeds or CVs
- Sentiment Analysis:   Determining whether the sentiment in a phrase or paragraph is "positive" or "negative"
- Log Parsing:          Parsing log files to extract error messages and other useful messages
- Entity Extraction:    Identifying nouns, phone numbers and addresses in textual data

These processes would be useful for the following types of Business Intelligence query:

- How many employees do we have who can speak German?
- How many customers in each country have given us negative feedback in the last week?
....and so on
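
To make the conversion step a little more concrete, here is a minimal Python sketch of two of the techniques listed above: log parsing and a very naive form of sentiment analysis. The log layout, the word lists and the sample reviews are hypothetical, and real sentiment analysis uses far richer models than a word count. The point is only to show free text being turned into rows that a BI query can count.

```python
import re
from collections import Counter

# Hypothetical log layout: "<date> <time> <LEVEL> <message>".
LOG_PATTERN = re.compile(r"^(\S+) (\S+) (INFO|WARN|ERROR) (.*)$")


def parse_log(lines):
    """Turn free-text log lines into structured records; skip lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            date, time, level, message = match.groups()
            records.append(
                {"date": date, "time": time, "level": level, "message": message}
            )
    return records


# Tiny illustrative word lists (an assumption, not a real sentiment lexicon).
NEGATIVE_WORDS = {"poor", "slow", "broken", "terrible"}
POSITIVE_WORDS = {"great", "fast", "excellent", "helpful"}


def sentiment(text):
    """Classify text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


logs = [
    "2015-06-01 09:00:01 INFO Server started",
    "2015-06-01 09:05:12 ERROR Connection refused",
    "not a log line",
]
reviews = [
    ("UK", "Delivery was slow and the box was broken"),
    ("DE", "Great service, very helpful staff"),
    ("UK", "Terrible support"),
]

# Structured results that could now be loaded into a table and reported on.
errors = [r["message"] for r in parse_log(logs) if r["level"] == "ERROR"]
negative_by_country = Counter(
    country for country, text in reviews if sentiment(text) == "negative"
)
```

Once the reviews carry a sentiment label, a question such as "how many customers in each country gave negative feedback" becomes a simple count per country.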

5) What is Hadoop and NoSQL?

Apache Hadoop is widely regarded as the main foundation of a Big Data Architecture.   Hadoop is open-source: it provides a file system that allows you to store huge volumes of data, and it supports distributed processing across clusters of computers.  It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.    It is "highly available", so losing any single Hadoop node will result in zero data loss and processing can continue unaffected.
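
The article does not describe how that distributed processing works. Hadoop's classic processing model is MapReduce, and the sketch below imitates its three stages (map, shuffle, reduce) on a single machine, with a thread pool standing in for the cluster nodes. It does not use any Hadoop API, and the function names and sample blocks are invented for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands in for a block of a file held on a different node.
BLOCKS = [
    ["big data big value"],
    ["data in hadoop"],
    ["hadoop stores big files"],
]

# Each block is mapped independently, which is what lets the work spread out.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(map_block, BLOCKS))

word_counts = reduce_counts(shuffle(mapped))
```

Because each block is mapped without reference to any other, adding more machines lets more blocks be processed at the same time.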

NoSQL (Not Only SQL) is a type of database that can store data in different formats to the standard "structured" schemas used with relational databases, such as "key-value" (KV) pair format.   Just as a basic example, here is how relational and KV formats can differ when storing a person's Id, Name, Date of Birth and Company:

Relational:

(1234,John Smith,1976-05-14,Oracle Corporation)

Key-Value Pair:

(1234,Name,John Smith)
(1234,DoB,1976-05-14)
(1234,Company,Oracle Corporation)

Key-value pair format is very useful when the number of columns of information is extremely large or not known.   As an example, with Twitter feeds the number of pieces of information that is supplied with each Tweet could vary (some users will allow their Lat/Long geo-locations to be made public whilst others do not).
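
The Twitter case can be sketched in a few lines of Python. The tweet contents, attribute names and helper functions below are made up for illustration and are not taken from the Twitter API or from Oracle NoSQL. The sketch shows that a record with fewer attributes simply produces fewer key-value triples, whereas pivoting back to a fixed set of columns has to fill the gaps with empty values.

```python
def to_key_value(record_id, attributes):
    """Flatten one record into (id, key, value) triples, one per attribute present."""
    return [(record_id, key, value) for key, value in attributes.items()]


def to_rows(triples, columns):
    """Pivot triples back into fixed-width rows; attributes never supplied become None."""
    by_id = {}
    for record_id, key, value in triples:
        by_id.setdefault(record_id, {})[key] = value
    return [
        (record_id,) + tuple(attrs.get(column) for column in columns)
        for record_id, attrs in by_id.items()
    ]


# Hypothetical tweets: only the first user has made a geo-location public.
tweets = {
    101: {"User": "alice", "Text": "Loving the new store", "Lat": 53.38, "Long": -1.47},
    102: {"User": "bob", "Text": "Queue was too long"},
}

triples = [t for tweet_id, attrs in tweets.items() for t in to_key_value(tweet_id, attrs)]
rows = to_rows(triples, ("User", "Text", "Lat", "Long"))
```

The first tweet yields four triples and the second only two, while the relational view needs two empty columns for the second tweet.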


6) Can Oracle BI (OBIEE) report directly against Hadoop and NoSQL?

Yes.  It is possible for Oracle BI to query Hadoop structures using a Hadoop utility such as Hive.   Hive makes it possible to present structures as relational objects, so you can report against them using standard SQL commands via a JDBC connection.

No.  It is not possible for Oracle BI to report against Oracle NoSQL.   You will need to write scripts to extract data from NoSQL where it can then be consumed by a relational database.


7) This all sounds great!  Shall we get rid of our Data Warehouse then and just use Hadoop?

No, you don't want to go down this route.   When you run queries against Hadoop, they are essentially batch processes that can run massively in parallel.   Whilst this is extremely useful, you won't get the response times and levels of concurrency that are delivered by a relational database.  Perhaps you can think of it this way:

- Hadoop is designed for huge batch queries, but only a small number of them taking place at any one time
- A relational database is designed for mixed workloads with many small/medium/large processes all happening at the same time

8) How do I get data out of Hadoop and into an Oracle Database?

Oracle provides "Big Data Connectors" that enable Oracle Data Integrator (ODI) to extract/load data between Hadoop/NoSQL and an Oracle Database.    These connectors require additional licenses but are relatively low cost (and anyone can use them).

Oracle also provides "Big Data SQL" which enables you to create "external tables" in the Oracle Database that present Hadoop structures as standard Oracle Database tables.   So you can run any type of database SQL query against a table and the processing will all be done on the Hadoop file system.  This facility however is only available for customers who have purchased an Oracle Big Data Appliance (BDA).


9) What is Oracle Big Data Discovery?

Historically, one of the issues with a Big Data Architecture is that you don't know what your data will look like until you've extracted it, loaded it into a relational database and then built some reports.

Oracle Big Data Discovery overcomes this issue by building graphs and other visualisations directly against the structures in Hadoop.   The benefit is that it complements your existing Business Intelligence tools by enabling you to explore your data (summarise, join, transform etc) at source to see whether it contains any value and to assist with defining further reporting and processing requirements.

Please see the Oracle web-site for more details:  https://www.oracle.com/big-data/big-data-discovery/index.html

 

OBIEE and JavaScript Integration

By Antony Heljula

This is a useful presentation for anyone wishing to embed JavaScript into their Oracle BI reports and dashboards.   It gives an overview of JavaScript and its syntax followed by a number of examples on where and how you can embed JavaScript and HTML code.

It also explains how, for example, you can get JavaScript code to automatically fire whenever you open a dashboard or after you make a selection in a dashboard prompt.

To view the presentation please click the image below, and please feel free to leave any comments or questions using the form underneath.


JavaScript Integration Cover



Oracle BI and Geo-Spatial Big Data

By Antony Heljula   (Oracle Spatial Summit 2015, San Francisco)

The role of "Geo-Spatial" data is often overlooked when it comes to Big Data analytics.   This presentation aims to demonstrate how Oracle Spatial can significantly extend the capabilities of Oracle BI through the adoption of Geo-Spatial analytics.

Example dashboards are provided using digital map data from both HERE and GfK Geo-Marketing.   The following features are discussed:

- Plotting existing geometries
- Spatial Aggregation
- Geo-coding
- Advanced Spatial Analysis
- POI Analysis
- Overlapping 3rd party marketing data to build BI Heat Map layers
- Using geo-spatial data for Predictive BI

To view the presentation please click the image below, and please feel free to leave any comments or questions using the form underneath.



Advanced Modelling with OBIEE: Data-mart Automation

By Antony Heljula   (UKOUG Tech 2014 presentation)

Based on real-life customer projects, this presentation outlines the advanced modelling features of the Oracle BI Server, which can significantly reduce your development timeframes and simplify the process of building and populating data-marts to optimise your analytical queries.

Topics covered include Modelling Principles, Fragmentation, Cross Database Joins, Federated Queries, Aggregate Persistence and BI Server "Populate" commands.

Examples show how Oracle BI was able to consolidate data from several operational systems into a central data-mart and provide near real-time reporting with fast response times - all data was loaded via the BI Server, no ETL tool required.

To view the presentation please click the image below, and please feel free to leave any comments or questions using the form underneath.


Real Business Value with Predictive BI

By Antony Heljula   (UKOUG Apps 2014 presentation)

Predictive BI is all about using your past to predict the future! It is an effective and exciting way to extract more value out of the significant volumes of historical data that is gathered by most Organisations. Predictive BI is applicable to just about any Organisation or Department, especially those with Oracle BI and/or Oracle BI Applications.
This presentation will explain what Predictive BI can achieve for your business and discusses the Oracle products which can be used to deliver it (Oracle Advanced Analytics and Oracle Real-Time Decisions). A live demonstration will also show the impact of having Predictive capabilities on your Business Intelligence dashboards.

To view the presentation please click the image below, and please feel free to leave any comments or questions using the form underneath.

