Tag Archives: Oracle BI Suite EE
Oracle Exalytics, Oracle R Enterprise and Endeca Part 3 : Flight Delays Analysis using OBIEE, Endeca and Oracle R Enterprise on Exalytics
So far in this series we’ve looked at Oracle’s BI, Big Data and Advanced Analytics strategy and seen how Exalytics can benefit both Endeca Information Discovery and Oracle R Enterprise through its 40 CPU cores and 1TB of RAM. For a recap on the two earlier postings in this series the links are provided below, and in this final posting we’ll look at an example where all three tools are brought together on the Exalytics platform to provide different types of “big data” analysis, with big data defined as data of high volume, high variety and high velocity:
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 1 : Oracle’s Analytics, Engineered Systems, and Big Data Strategy
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 2 : Oracle Endeca, the Advanced Analytics Option and Oracle Exalytics
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 3 : Flight Delays Analysis using OBIEE, Endeca and Oracle R Enterprise on Exalytics
In this example, the big data bit is slightly tenuous as the data we’re using is transactional data (numbers of flights, numbers of cancellations etc) accompanied by some unstructured/semi-structured data (reasons for flight delays). There’s no sensor data, or data arriving at high velocity, but the dataset is large (around 123m rows), and diverse (sort of), and Oracle R Enterprise could easily connect to a Hadoop cluster via the Oracle R Connector for Hadoop if required. All of this could also run on a regular server, but if it’s hosted on Exalytics then we additionally benefit from the 1TB of RAM and 40 cores that this hardware can bring to bear on the analysis, with the R client running on Exalytics along with Endeca Information Discovery, and the Exalytics server then connecting via its two InfiniBand ports to an Exadata server or an Oracle Big Data Appliance server, or to other data sources via its 10Gb and 1Gb Ethernet interfaces.
The flight delays dataset is a fairly well-known set of public data made available by the Bureau of Transportation Statistics, Research and Innovative Technology Administration within the United States Department of Transportation, and in its full incarnation contains 123M rows of non-stop US domestic flight legs. For each flight leg it contains the source and destination airports, operator, and aircraft type, whilst for delays it holds the type and duration of delay, the delay reason and other supporting numeric and textual information. If you’ve seen the standard Exalytics Airlines demo, this is the dataset it uses, but it can also be used by Endeca and Oracle R Enterprise, as we’ll see in this post.
So given the three Exalytics tools we’ll be using for this example (OBIEE, Endeca Information Discovery and Oracle R Enterprise), at a high-level what is each tool good at? A good first start would be to say:
- OBIEE is good for dashboard analysis of structured data (numeric + attribute with a clearly-defined data model), together with ad-hoc analysis, scorecards and other traditional BI-type analysis
- Oracle Endeca Information Discovery is good for the initial exploration and discovery of the data set, allowing us to quickly bring in disparate structured and unstructured data and then aggregate and analyse it, usually as the precursor to more structured dashboard analysis using OBIEE
- R, and Oracle R Enterprise, is good at providing deep insight into specific questions, such as “are some airports more prone to delays than others”, and “for American Airlines, how has the distribution of delays for departures and arrivals evolved over time?”
If we take this model of Endeca first to initially discover the full dataset, then OBIEE for answers to the questions we’ve now defined, and R/ORE to dig deeper into specific topics, our BI approach on Exalytics would look something like this:
So let’s start now with the Endeca element. If you read my series of postings on Endeca just after the Oracle acquisition, you’ll have read how one of the main use-cases for Endeca Latitude and the MDEX engine (now known as Oracle Endeca Information Discovery, and the Endeca Server, respectively) was in situations where you had a whole range of potentially interesting data that you wanted to load up and quickly analyse, but you didn’t want to spend an inordinate amount of time creating a conformed dimensional data model; instead, the key-value pair database loads data up as records, each one of which contains a number of attributes that effectively contain their own schema. What you often end up with then is what Endeca termed a “jagged database”, where each record has at least one attribute in common with the others (typically, more than one attribute as shown in the diagram below), but records that originate from different source systems or database tables might have different attribute sets to each other, or even to other records in the same dataset. The net effect of this is that upfront data modelling is minimised and you don’t need to reject incoming data just because it doesn’t fit into your conformed data model. The diagram below shows a conceptual view of such an Endeca Server datastore, with the first incoming set of rows containing sales transaction data made up of dimension IDs and some attributes unique to sales data, with the next set of rows containing market research information that shares some key values with the previous dataset, but then contains its own unique attributes that may or may not be present in all of its records.
Endeca Server datastores (as their databases are called) are created and loaded via web service calls, typically constructed using Endeca Information Discovery Integrator, an ETL tool built on the Eclipse/CloverETL open-source platform and enhanced with specific components for Endeca Server administration. Once the datastore is loaded, the front-end dashboard application is created using Endeca Information Discovery Studio, with the two GUI tools looking as in the screenshots below. For more details of the Endeca Information Discovery development process, see this series of postings that I put together earlier in the year where I go through an end-to-end development process using the Quickstart/Bikestore Endeca demo dataset, and the set of videos on YouTube that take you through the process, with narrative explaining what’s going on.
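To make the “jagged database” idea a little more concrete, here’s a minimal Python sketch. It isn’t Endeca code and doesn’t use any Endeca API; the attribute names and values are made up purely for illustration. Each record is just its own set of key-value pairs, so sales-type and survey-type records can sit side-by-side as long as they share at least one attribute.

```python
# Hypothetical records: attribute names and values are illustrative only,
# not taken from any real Endeca Server datastore.
records = [
    {"product_id": 101, "store_id": 7, "sale_amount": 25.0},
    {"product_id": 101, "store_id": 7, "sale_amount": 40.0, "promo_code": "SPRING"},
    {"product_id": 101, "survey_score": 4, "comment": "Good value"},
    {"product_id": 102, "survey_score": 2},
]


def attribute_coverage(recs):
    """Count how many records carry each attribute."""
    coverage = {}
    for rec in recs:
        for name in rec:
            coverage[name] = coverage.get(name, 0) + 1
    return coverage


def filter_records(recs, **criteria):
    """Keep records that have every requested attribute with a matching value."""
    return [r for r in recs if all(r.get(k) == v for k, v in criteria.items())]


coverage = attribute_coverage(records)
# Attributes present on every record are the ones that tie the sources together.
shared = [name for name, count in coverage.items() if count == len(records)]
# Filtering works across both record types without a fixed schema.
matching = filter_records(records, product_id=101)

print(shared)
print(len(matching))
```

Nothing is rejected for lacking a column; a record that has no `store_id` simply doesn’t match a filter on `store_id`.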
Where the Endeca Server differentiates itself from OBIEE’s BI Server and its traditional RDBMS sources, and Essbase and other multi-dimensional OLAP servers, is that it’s got a bunch of features and capabilities for analysing textual data and extracting meaning, sentiment and other semantics from it. Using Integrator or direct web service calls to the Endeca Server, incoming unstructured data can be analysed using features such as:
- Keyword search, boolean search, parametric search, wildcard search, dimension search and dimension filters
- Dimension precedence rules
- Numeric range, geospatial, date/time and security filters
- Spell correction/suggestion, and “do you mean”-type alternative presentation
- Find similar, and 1 and 2-way synonyms
- Stemming and lemmatisation
- Keyword-in-context snippeting
- Results clustering, relevance ranking, sorting and paging
- Support for multiple languages
So what do Endeca Information Discovery dashboards look like once they’re created, and connected to a suitable Endeca Server datastore? In the example of the flight delays data we’re using across the various tools, there are a number of unique features that EID brings to the dataset, starting with the first dashboard we’ll look at, below.
The flight delays dataset contains lots of free-form text, so that there are, for example, many different mis-spellings of McDonnell Douglas, an aircraft manufacturer. After being loaded into the Endeca Server datastore and then processed using the Endeca Server’s document analysis capabilities, cleaned-up and standardised versions of these mis-spellings are used to populate a manufacturer attribute that groups all of them together, for easy analysis.
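As a rough illustration of the kind of clean-up described here, the sketch below groups mis-spelled manufacturer names under a canonical value using fuzzy string matching from Python’s standard `difflib` module. This is a swapped-in technique for illustration only and not how the Endeca Server does it internally; the canonical list, the `standardise` helper and the 0.8 cutoff are all assumptions.

```python
import difflib

# Assumed list of canonical manufacturer names (illustrative only).
CANONICAL = ["MCDONNELL DOUGLAS", "BOEING", "AIRBUS"]


def standardise(raw, canonical=CANONICAL, cutoff=0.8):
    """Map a free-text manufacturer name to its closest canonical form.

    Returns the cleaned-up input unchanged if nothing is similar enough.
    """
    cleaned = " ".join(raw.upper().split())  # normalise case and whitespace
    match = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return match[0] if match else cleaned


for value in ["McDonnel Douglas", "Mc Donnell-Douglas", "Embraer"]:
    print(value, "->", standardise(value))
```

The standardised value would then populate a single manufacturer attribute, so that all of the variants roll up together in the dashboard.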
I mentioned earlier that one of the main uses of Endeca Information Discovery is searching across the entire dataset to find attributes and records of interest, which will then form the focus of the more structured data model that we’ll then use as a data source for OBIEE. In the screenshot below, the Value Search feature is initially used to display all occurrences of the typed-in value in all attributes in the recordset, with the search highlighting attributes as the search term is typed in. In addition, what’s termed a record search can then be performed that takes an attribute value and uses it to filter the displayed set of records based on groups of attributes called “search interfaces”. As the set of records is narrowed down by the record search, graphs and other visuals on the dashboard page immediately show metric numbers aggregated for this record set, showing the dual search/analytic capabilities of the Endeca Server. When run on the Exalytics platform, all of this potentially takes place much quicker as the Endeca Server can parallelise search operations as well as any indexing that needs to take place in the datastore. The 1TB of RAM on the server can also be useful as the Endeca Server will try and keep as much of the analysis dataset in memory as is possible, with the disk-based column store database more there as a persistence store.
Finally, the text search and analysis features in the Endeca Server are useful for pulling out themes and sentiments from the incoming data; in the screenshot below, we can see that MD-88 aircraft typically are involved in delays that are down to the age of the aircraft, whilst delays involving the newer Boeing 777 are more often down to issues such as lights not working, crew areas not being serviceable and so on.
Armed with all of this information and a subsequent better understanding of the data available to us, we can now start thinking about a more structured data model for use with OBIEE.
The flight delays dataset, once you look into it in more detail, really contains two main star schemas we’re interested in; one based around a flight leg fact dimensioned by carrier, flight month, origin and destination airport, route and so forth. The other fact would be around the actual flight delays, sharing some of these dimensions but also with its actual reason for the delay, like the diagram below:
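For readers who prefer to see the two-star-schema idea in code, here’s a small sketch using Python’s built-in `sqlite3` module. The table names, column names and rows are all hypothetical and far simpler than the real dataset; the point is only to show two fact tables sharing a conformed carrier dimension, with each fact aggregated separately before being joined so that rows aren’t double-counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified schema: two facts sharing one dimension.
conn.executescript("""
CREATE TABLE dim_carrier (carrier_id INTEGER PRIMARY KEY, carrier_name TEXT);
CREATE TABLE dim_delay_reason (reason_id INTEGER PRIMARY KEY, reason TEXT);
CREATE TABLE fact_flight_leg (carrier_id INTEGER, flight_month TEXT, flights INTEGER);
CREATE TABLE fact_flight_delay (carrier_id INTEGER, flight_month TEXT,
                                reason_id INTEGER, delay_minutes INTEGER);
""")
# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO dim_carrier VALUES (?, ?)",
                 [(1, "Carrier A"), (2, "Carrier B")])
conn.executemany("INSERT INTO dim_delay_reason VALUES (?, ?)",
                 [(1, "Weather"), (2, "Maintenance")])
conn.executemany("INSERT INTO fact_flight_leg VALUES (?, ?, ?)",
                 [(1, "2012-01", 100), (1, "2012-02", 120), (2, "2012-01", 80)])
conn.executemany("INSERT INTO fact_flight_delay VALUES (?, ?, ?, ?)",
                 [(1, "2012-01", 1, 300), (1, "2012-02", 2, 140), (2, "2012-01", 1, 40)])

# Aggregate each fact to the carrier level first, then join via the dimension.
result = conn.execute("""
SELECT c.carrier_name, f.flights, d.delay_minutes
FROM dim_carrier c
JOIN (SELECT carrier_id, SUM(flights) AS flights
      FROM fact_flight_leg GROUP BY carrier_id) f ON f.carrier_id = c.carrier_id
JOIN (SELECT carrier_id, SUM(delay_minutes) AS delay_minutes
      FROM fact_flight_delay GROUP BY carrier_id) d ON d.carrier_id = c.carrier_id
ORDER BY c.carrier_name
""").fetchall()

for row in result:
    print(row)
```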
This dimensional model would then map fairly easily into an Oracle BI Repository semantic model, with a single data source and business model and with two subject areas, one for each star schema. As we’re running on Exalytics though, we can then generate some aggregate recommendations, based either on database optimiser statistics (if the system is otherwise unused by end-users), or on actual query patterns taken from the usage tracking and summary statistics tables maintained by the BI Server. To generate these recommendations and then create a script for their implementation, you’d use the Oracle BI Summary Advisor, which is only available on Exalytics systems.
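To give a feel for what “recommendations based on actual query patterns” might mean, the sketch below ranks candidate aggregates by the total query time spent at each grain. This is emphatically not the Summary Advisor’s actual algorithm, which isn’t described in this post; the log layout, the `rank_aggregate_candidates` helper and the figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical usage-tracking rows: (fact table, grain queried, elapsed seconds).
query_log = [
    ("flight_delay", ("carrier", "month"), 12.0),
    ("flight_delay", ("carrier", "month"), 9.5),
    ("flight_leg", ("origin_airport", "year"), 4.0),
    ("flight_delay", ("origin_airport", "month"), 20.0),
    ("flight_leg", ("origin_airport", "year"), 3.5),
]


def rank_aggregate_candidates(log, top_n=2):
    """Rank (fact, grain) combinations by total elapsed query time."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [query count, total seconds]
    for fact, grain, elapsed in log:
        entry = totals[(fact, grain)]
        entry[0] += 1
        entry[1] += elapsed
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(fact, grain, count, seconds)
            for (fact, grain), (count, seconds) in ranked[:top_n]]


candidates = rank_aggregate_candidates(query_log)
for candidate in candidates:
    print(candidate)
```

The grains that soak up the most query time are the obvious first candidates for pre-built, in-memory aggregates.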
Full details on what happens when you use the Summary Advisor are in this previous blog post and my article on the topic for Oracle Magazine, but once you’ve generated your aggregates and created your dashboards and analyses, your dashboards would look something like the screenshots below. Note that whilst these examples are focusing on Exalytics, a cut-down version of the Flight Delays data along with dashboards and analyses are available as part of SampleApp v207, along with the R dashboards that we’ll see later on.
What OBIEE does well here is display, in a very rich graphical form, lots of aggregated data with supporting attributes to enable slice-and-dice, analysis, KPIs, scorecards and maps. When run on Exalytics, all of the prompts have their “Apply” buttons removed so that changes in parameter values are reflected in the dashboard immediately, whilst the TimesTen in-memory database ensures that response-times are within the sub-second range, even when the underlying dataset has millions of detail-level rows within it.
So now on to R, and Oracle R Enterprise. R is typically used to answer more in-depth, focused questions using more advanced statistical functions than you’d get in regular SQL, such as:
- Are some airports more prone to delays than others? Are some days of the week likely to see fewer delays than others? And are these differences significant?
- How do arrival delay distributions differ for the best and worst 3 airlines compared to the industry? Moreover, are there significant differences among airlines?
- For American Airlines, how has the distribution of delays for departures and arrivals evolved over time?
- How do average annual arrival delays compare across select airlines, and what is the underlying trend for each airline?
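As a taste of what “are these differences significant?” involves, here’s a minimal permutation test written in plain Python rather than R, so that it runs without any extra packages. The delay figures for the two airports are invented, and the `permutation_test` helper is just an illustration of the general technique, not something taken from the Oracle R Enterprise demos.

```python
import random
from statistics import mean


def permutation_test(a, b, n_shuffles=2000, seed=42):
    """Estimate how often a random relabelling of the pooled values gives a
    difference in means at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float comparison
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Invented arrival delays (minutes) for two hypothetical airports.
airport_x = [30, 35, 40, 45, 50, 55]
airport_y = [5, 8, 10, 12, 15, 9]

observed_diff, p_value = permutation_test(airport_x, airport_y)
print(observed_diff, p_value)
```

A small p-value suggests the gap between the two airports is unlikely to be down to chance alone; with R you’d normally reach for a built-in test instead of hand-rolling one like this.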
To analyse the airlines dataset using R, luckily enough a cut-down version of the dataset (ONTIME_S) comes pre-installed with Oracle R Enterprise (ONTIME_S is also described in this Oracle R Enterprise Blog post, where you can see examples of R functions being used on the dataset). To work with the flight delays dataset then, you’d go through a process of creating “data frames” within ORE using data from the Oracle Database, and then create R scripts to manipulate the dataset and provide answers to your questions. Again, teaching R is outside the scope of this posting, but the screenshots below show the ONTIME_S dataset being loaded up in the R client that’s included in SampleApp v207, along with an R script that provides one of the analyses used in the dashboard I’ll show in a moment.
Scripts created using R can be utilised within Oracle BI in a couple of main ways; R scripts stored within the Oracle database using ORE can be referenced directly using BI Publisher, with R’s XML output then being used to create an image that can be displayed using RTF templates, or you can reference R scripts held within ORE directly within OBIEE’s BI Repository, as PL/SQL functions similar to regular ones such as AVG, LAG/LEAD and REGEXP (with details explained in a training PDF on Operationalizing R Scripts on the Oracle website). The OBIEE SampleApp v207 comes with a set of dashboards that show how both types of output might look, with the dashboard page on the left displaying an embedded, parameterised BI Publisher report showing flight delays per airport calculated live by R engines on the Exalytics server. The dashboard page on the right, by contrast, shows a regression analysis calculated using functions referenced in the BI Repository RPD, displaying the output as both a table and an interactive map.
So, it was a bit of a whistle-stop tour but hopefully it sets out the different types of analysis made available by Oracle Endeca Information Discovery, OBIEE and Oracle R Enterprise, and how you might use one, two or all of them on a typical BI project. I’ve left out Essbase of course which also has a role to play, and the “big data” element is a bit superficial as I’m not doing anything with Hadoop, MapReduce and so on. But hopefully it gives you a flavour of the different tools and how they might benefit from being run on the Exalytics platform. For more information on Rittman Mead and Endeca, check out the Rittman Mead Endeca homepage, whilst for more information on Exalytics, check out our Exalytics resource centre, where you can also read about our Exalytics Test Centre in London, UK, where we can prototype this sort of analysis using our own, dedicated Exalytics server, working in conjunction with our OBIEE, Endeca and R consulting and implementation team.
Oracle Exalytics, Oracle R Enterprise and Endeca Part 2 : Oracle Endeca, the Advanced Analytics Option and Oracle Exalytics
In this week of postings we’re going to look at Oracle Exalytics and how it enables “big data” and unstructured data analytics, using Oracle Endeca, Oracle Exadata, Oracle Big Data Appliance and the Oracle Database Advanced Analytics option. In case you’ve arrived via a Google search and you’re interested in the rest of the postings in this series, here are the links to the articles (to be completed as postings are published).
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 1 : Oracle’s Analytics, Engineered Systems, and Big Data Strategy
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 2 : Oracle Endeca, the Advanced Analytics Option and Oracle Exalytics
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 3 : Flight Delays Analysis using OBIEE, Endeca and Oracle R Enterprise on Exalytics
So in the first post in this series we looked at Exalytics as part of the Oracle database tech stack, and how Oracle’s analytics strategy is to handle all types of data, using a number of optimised analytic tools and analysis engines, with packaged applications where appropriate and delivered via the web, via mobile devices, in the cloud and embedded in business applications and processes. We closed the post with a mention of a new database option called the Advanced Analytics Option, and in this second posting we’ll look at just what this new option contains and how it relates to Oracle’s engineered systems strategy.
The Advanced Analytics Option is an option to Oracle Database Enterprise Edition, available from version 11.2 of the database onwards. It includes two major components:
- Oracle Data Mining, which prior to the Advanced Analytics Option was an option in itself (typically bought along with the OLAP Option, which is still a separate option)
- Oracle R Enterprise, Oracle’s take on R, the statistical language used widely in academia and rapidly replacing Base SAS and SPSS within commercial organisations
For both data mining and R, the key premise with the Advanced Analytics Option is to bring the algorithms to the data; instead of having to extract data from a database, along with files and other sources, and then load this into a statistics engine such as SAS, you can instead embed R scripts and data mining algorithms directly within the database, making it easy to score and classify data in real-time, such as in a call-centre application or as part of an ETL routine.
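The “bring the algorithms to the data” premise can be sketched with any database. The example below uses Python’s built-in `sqlite3` module as a stand-in for the Oracle Database (it has nothing to do with ORE or Oracle Data Mining themselves), with an invented `flights` table, and contrasts pulling every row out to the client with asking the database to do the calculation where the data lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (airport TEXT, delay_minutes REAL)")
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AAA", 10.0), ("AAA", 30.0), ("BBB", 5.0), ("BBB", 15.0), ("BBB", 40.0)],
)

# Approach 1: extract every row, then compute in the client.
rows = conn.execute("SELECT airport, delay_minutes FROM flights").fetchall()
totals = {}
for airport, delay in rows:
    running_sum, count = totals.get(airport, (0.0, 0))
    totals[airport] = (running_sum + delay, count + 1)
client_side = {airport: s / n for airport, (s, n) in totals.items()}

# Approach 2: push the calculation to the data; only the answer comes back.
in_database = dict(
    conn.execute("SELECT airport, AVG(delay_minutes) FROM flights GROUP BY airport")
)

print(len(rows), "rows moved for approach 1;", len(in_database), "rows for approach 2")
print(client_side == in_database)
```

With five rows the difference is academic, but with 123m rows the second approach avoids moving the dataset across the network at all.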
Oracle Data Mining has been around for a number of years now, but R is a new addition to Oracle’s analytic toolset, and is probably new to most Oracle BI & DW developers. So just what is R, and Oracle’s version of it, Oracle R Enterprise?
In fact, there are actually two R packages that Oracle have put together; one is free, the other is a database option. The free one is Oracle’s own distribution of open-source R, the same as you’d get from downloading it from the R Project’s website, but with additional libraries to make it run faster on x86 hardware. Open-source R can also be downloaded from the Oracle website along with licensable Oracle R Enterprise, installing it direct onto Oracle Linux or other Unix OS’s using Oracle’s Yum repository. Oracle R Enterprise, however, is basic R extended to work closer with the Oracle database by adding the following (licensed) elements:
- R packages to add to and extend the standard packages provided with open-source R
- A database library for connecting to Oracle and running R scripts within the database
- SQL extensions to allow R functionality to be called from SQL and PL/SQL
These elements then provide four main Oracle R Enterprise features:
- A “Transparency” layer that intercepts standard R functions and extends them to allow certain R functions and datatypes to reside in the Oracle database
- A Statistics Engine providing a set of statistical functions and procedures for commonly-used statistical libraries, which then execute in the Oracle database
- SQL extensions, which allow database server execution of R code, and support parallelism, SQL access to R and XML output
- A Hadoop connector, for running R scripts and functions against an Oracle Hadoop cluster with its files held in either HDFS, an Oracle database, or local files.
When you work with R, you typically have the R client installed on your laptop or workstation, communicating with an R engine that is usually delivered alongside it as a single executable for Windows, Linux or Unix. Whilst this has the virtue of simplicity it also means that you are limited by the amount of RAM and CPU on your local machine, which can quickly become an issue when you try to spin up multiple R engines to process a model in parallel, as each engine loads up the full data set into memory before starting work. Even on a 2-4 core laptop with 16GB RAM you can quickly run out of memory, which is where Oracle R Enterprise comes in: the basic data structure that you work with in R, called a “data frame” and analogous to a relational table, can with Oracle R Enterprise actually be stored in a database, giving you the ability to process much larger sets of data, with many more R engines running, than if you were running standalone. Typically this would be a large, multi-core Oracle database, though you can also connect R and ORE to the TimesTen in-memory database using the new ROracle R interface, detailed in this blog post by Jason Feldhaus on the Oracle R Enterprise blog.
Oracle R Enterprise also has the ability to spin up (or “spawn”) its own R engines within the database server, providing a lights-out environment that allows R computations to be carried out even when you’re not at your workstation, and with these database-resident R engines having full access to the database, SQL and PL/SQL. Coupled with the Oracle R Connector for Hadoop, a typical ORE (as we’ll shorten Oracle R Enterprise to now) topology looks like the diagram below.
So where does Exalytics come in to this? If you’ve followed along so far you may well have spotted that, as ORE is in fact a database option and therefore runs as part of the Oracle Database, it shouldn’t really be installed (along with an Oracle database) on the Exalytics server. Apart from having to license 20 processors of Oracle Database Enterprise Edition plus the Advanced Analytics Option, Exalytics is really meant for just OBIEE, WebLogic, TimesTen and Essbase, with ORE really supposed to reside on Exadata, or at least a separate database server. What Exalytics does do well though is play the role of a supercharged client for ORE, with open-source R running on Exalytics then connecting to ORE on Exadata; the R client can then spin up multiple R engines to process models in parallel making use of Exalytics’ 40 cores, whilst the 1TB of RAM allows multiple copies of the models’ data to be held in memory without the machine breaking a sweat. Coupled with ORE’s ability to spin up its own R engines on the Exalytics server, and the InfiniBand connection between the two servers, and Oracle Big Data Appliance if you’ve also got this, and your R topology now looks like the diagram below.
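The memory constraint is easy to put numbers on. The back-of-envelope sketch below assumes a 6GB working dataset (my figure, purely for illustration) and that each engine holds its own full copy, then works out how many engines a machine could support given its RAM and core count. The laptop figures come from the paragraph above, and the 1TB/40-core figures are the Exalytics specs quoted in this series.

```python
def engines_supported(ram_gb, cores, dataset_gb):
    """Number of parallel engines a machine can run if each engine needs
    its own full in-memory copy of the dataset and its own core."""
    if dataset_gb <= 0:
        raise ValueError("dataset_gb must be positive")
    by_memory = int(ram_gb // dataset_gb)  # limit imposed by RAM
    return min(cores, by_memory)           # limit imposed by cores


DATASET_GB = 6  # assumed dataset size, for illustration only

laptop = engines_supported(ram_gb=16, cores=4, dataset_gb=DATASET_GB)
exalytics = engines_supported(ram_gb=1024, cores=40, dataset_gb=DATASET_GB)

print("laptop:", laptop)
print("exalytics:", exalytics)
```

On the laptop it’s memory that runs out first, whereas on the bigger machine the core count becomes the limit. This ignores the operating system and everything else competing for RAM, so treat it as a rough guide only.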
The question you’re probably asking at this point now, seeing as we’ve established where R fits into the Oracle BI and big data architecture, is just what is R? And what can it do for Oracle BI, if it’s just a statistical programming language? Well if you’ve got the latest OBIEE 11g SampleApp (v207) downloadable from OTN, it’s actually got R, and Oracle R Enterprise, already installed and set up, ready to go. So assuming you’ve got SampleApp v207 installed and all of the OBIEE and other servers running, you can start your first R session by selecting Applications > Accessories > Terminal from the Linux desktop menu bar, then type in “R” to start the R console, part of the standard R client, like this:
[oracle@obieesampleapp ~]$ R

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i686-redhat-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Loading Oracle R Enterprise Packages
Connecting to Oracle RDBMS
User: rquser
SID : orcl
Host: localhost
Port: 1521
Done.
>
Note how starting the R console first displays the open-source license messages, then displays details about ORE, finishing up by displaying the connection details of the database account it’s now going to connect us to, which is the standard ORE account on a database that’s been configured to use the Advanced Analytics Option.
Obviously explaining the full syntax and capabilities of R is outside the scope of this blog post (try the online ORE docs, and the R Project’s online manuals), but ORE comes with a number of sample R scripts that are part of a base ORE package, that you can look at to get a flavour of the language and its syntax. Whilst connected to the R console you can list out the demo ORE scripts like this:
> demo (package = "ORE")

basic           Basic connectivity to database
binning         Binning logic
columnfns       Column functions
cor             Correlation matrix
crosstab        Frequency cross tabulations
derived         Handling of derived columns
distributions   Distribution, density, and quantile functions
do_eval         Embedded R processing
freqanalysis    Frequency cross tabulations
graphics        Demonstrates visual analysis
group_apply     Embedded R processing by group
hypothesis      Hyphothesis testing functions
matrix          Matrix related operations
nulls           Handling of NULL in SQL vs. NA in R
push_pull       RDBMS <-> R data transfer
rank            Attributed-based ranking of observations
reg             Ordinary least squares linear regression
row_apply       Embedded R processing by row chunks
sql_like        Mapping of R to SQL commands
stepwise        Stepwise OLS linear regression
summary         Summary functionality
table_apply     Embedded R processing of entire table
To run one of these, for example the correlation matrix one, type in the command:
> demo ("cor", package = "ORE")
R also ships with a number of graphics demos, that show off some of the graphs and other visualisations that R can produce. To run these, from the R console type in:
> demo (graphics)
The R console will then step you through a number of graph demos, displaying each graph when you press the enter key.
Compared to the basic statistical functions provided by Oracle SQL, R provides a wider variety of statistical and graphical techniques including:
- Linear and non-linear modelling
- Classical statistical tests and time-series analysis
- Classification, clustering and other capabilities
- Matrix arithmetic, with scalar, vector, list and data frame (analogous to relational tables)
In addition, R is extensible through community-contributed packages at the Comprehensive R Archive Network (CRAN), which is probably the main attraction for users of R, and it can connect to “big data” sources such as Hadoop through Oracle’s R Connector for Hadoop. So now that we’ve seen the basics of R and how it might benefit from Exalytics and the Oracle R Enterprise features, how might R, OBIEE and Endeca work together if used on Exalytics? In the final posting in this series we’ll look at a case study that takes the publicly-available Flight Delays dataset and analyses it using OBIEE, Endeca and Oracle R Enterprise, to see what each tool can contribute and how they might look to a typical end-user.
Oracle Exalytics, Oracle R Enterprise and Endeca Part 1 : Oracle’s Analytics, Engineered Systems, and Big Data Strategy
One of the presentations Rittman Mead gave at last week’s Oracle OpenWorld was entitled “High Speed, Big Data Analysis using Oracle Exalytics” [PDF]. Although it was the last of my presentations it was probably the one that I most looked forward to delivering, as it talked about how Oracle’s in-memory analytics server could also be used for advanced analytics and unstructured data analysis, along with OBIEE’s traditional dashboards and ad-hoc reports. So how do Endeca Information Discovery and R, the most recent addition to Oracle’s analytics toolkit, relate to Exalytics and what benefits does this high-end server provide for these tools? Over this next week I’ll be looking at this topic in detail, with the following postings in the series (links will be added as each post is published).
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 1 : Oracle’s Analytics, Engineered Systems, and Big Data Strategy
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 2 : Oracle Endeca, the Advanced Analytics Option and Oracle Exalytics
- Oracle Exalytics, Oracle R Enterprise and Endeca Part 3 : Flight Delays Analysis using OBIEE, Endeca and Oracle R Enterprise on Exalytics
For anyone new to the product, Oracle Exalytics In-Memory Machine is Oracle’s “engineered system” for business intelligence and analytics. Typically used alongside Oracle Exadata Database Machine, an analogy is that Exalytics is the “set top box” to Exadata’s 50″ flat screen TV, in that it provides query acceleration and highly-interactive visuals to accompany the terabytes of data typically managed by an Exadata Database Machine server. So far on the blog we’ve mostly talked about Exalytics in the context of OBIEE, but it also hosts Oracle Essbase (a multi-dimensional OLAP server) and is certified for use with Oracle Endeca Information Discovery, Oracle’s discovery/analytics tool for unstructured, semi-structured and structured data. Another way of thinking of Exalytics (idea courtesy of Oracle’s Jack Berkowitz, who looks after the BI Presentation Server part of OBIEE) is that it’s like the Akamai web caching service; for a single end-user it generally provides faster page delivery than Oracle could provide from its own web servers, but where it comes into its own is when there are 10,000 or 1m people trying to access Oracle’s website at a time. Akamai’s cache, like Exalytics’ cache, guarantees fast service when user numbers scale beyond just a few test users, due in Exalytics’ case to the TimesTen in-memory database that provides a mid-tier cache between OBIEE’s BI Server and Presentation Server and the various data sources accessed in the dashboard.
As I mentioned before though, Exalytics also supports Essbase and the rest of the EPM product stack (where the product runs on Linux), with Essbase included in the Oracle BI Foundation Suite product bundle that comes with the base Exalytics server. Exalytics, from version 1.1, is also certified to run Endeca Information Discovery, details of which are in a series of blog posts that you can read on Rittman Mead’s Endeca homepage here. In fact, this wide range of query tools and analytic engines is one of the four pillars of Oracle’s current business analytics strategy, which as the diagram below shows covers data from any source, analytics using multiple query tools and engines, packaged applications and delivery via the web, mobile, desktop, or embedded in business processes and applications.
“Big Data” is, as I’m sure most readers will be aware, the current buzz-word and hot topic (along with Cloud) within Oracle and the wider IT world, and refers to much larger data sets than we’re used to with relational databases, holding much more granular data such as meter readings, bus movements, sensor data and the like. The interest in big data comes from its ability to provide us with much more context about people, activities and events of interest than we get with traditional data such as sales figures and product inventories, and is now made possible by server specs going up coupled with a bunch of new database and analysis techniques that eschew regular SQL and relational stores in favour of file-based databases, “NoSQL”-type languages and distributed processing tools that first crunch numbers and then extract useful information (Hadoop and MapReduce, for example). Oracle of course have put together a bunch of products to address Big Data requirements, including another engineered system called Oracle Big Data Appliance that couples a third-party distribution of Hadoop and MapReduce with new Oracle products such as Oracle NoSQL Database and Oracle R Enterprise; Big Data Appliance therefore sits alongside Exadata as the second part of Oracle’s engineered systems data management hardware/software product set.
The idea here then is that Big Data Appliance acts as a data gatherer/processor/cruncher for a big-data enabled analytics environment, with Big Data Appliance linked to Exadata via InfiniBand and ODI, via Oracle’s Big Data Adaptors, taking nuggets of pre-processed data from Big Data Appliance and then loading them into Exadata for later analysis by Exalytics, Endeca or Oracle RTD.
Big Data Appliance is mostly concerned with acquiring and organising data from unstructured sources, then processing it into a structured form (via Hadoop and MapReduce) for loading into Exadata, or querying via tools such as Endeca and Oracle Real-Time Decisions. But Big Data Appliance also comes with something called R, and Oracle have also recently released a new database option called the Advanced Analytics Option that comes with Oracle R Enterprise, Oracle’s added-value version of R that leverages the scale and capacities of the Oracle Database. So what is the Advanced Analytics Option and what is Oracle R Enterprise, and what can these new analytic capabilities provide for the BI Developer? We’ll look at this topic in more detail in the second posting in this series, tomorrow.
Oracle BI Developers Guide, and Oracle Exalytics Revealed, Now Available to Buy!
After what has been around five years of development, including two years of waiting for OBIEE11g and then three of desperately keeping up with 11g point releases, the Oracle Business Intelligence Developers Guide has now been published by Oracle Press / McGraw Hill and is available for purchase at Amazon.com, Amazon.co.uk, Barnes & Noble.com, Apple’s iTunes bookstore, and most other online and high-street retailers. It’s a pretty hefty volume, covering in over a thousand pages the whole spectrum of OBIEE development from installation, configuration of the BI Repository, accessing relational, OLAP, file and XML data, and then analysing it using dashboards, analyses, KPIs, scorecards, maps, published reports and mobile devices.
Based on the 11.1.1.6 release of OBIEE, the book features in-depth coverage and step-by-step examples of security, the action framework, change management and deployment through environments, clustering and high availability, and Oracle Exalytics In-Memory Machine, with the following chapter list.
- Chapter 1 : Oracle Business Intelligence Overview and Architecture
- Chapter 2 : Installation and Upgrading of Oracle Business Intelligence
- Chapter 3 : Modeling Repositories using Relational, File and XML Data Sources
- Chapter 4 : Creating Repositories from Oracle Essbase and Other OLAP Sources
- Chapter 5 : Configuring and Maintaining the Oracle BI Server
- Chapter 6 : Creating Analyses, Dashboards, KPIs and Scorecards
- Chapter 7 : Actionable Intelligence
- Chapter 8 : Security
- Chapter 9 : Creating Published Reports
- Chapter 10 : Systems Management
- Chapter 11 : Managing Change
- Chapter 12 : Oracle Exalytics In-Memory Machine
My aim at the outset with the book was to make it the definitive guide to developing on the OBIEE 11g platform, with every page in the book written by myself, with Oracle’s Mike Durran (Principal Product Manager, Oracle Business Intelligence) and our own Venkatakrishnan J (Managing Director, Rittman Mead India) as technical editors, and Balaji Yelamanchili (Senior Vice President, Analytics and Performance Management Products) kindly providing the foreword. The book is full of worked examples and comes with a downloadable dataset (available shortly on the McGraw Hill website) that allows you to follow along with the examples and try them out yourself.
As well as the Oracle Business Intelligence Developers Guide which covers the whole product stack and is based on the 11.1.1.6 release of OBIEE, Oracle Press have also published Oracle Exalytics Revealed, a specially-priced ebook-only release containing an extended and updated version of the main book’s Oracle Exalytics In-Memory Machine chapter.
The book is only just out in the United States and was rush-released for Oracle Openworld, but it should be available in other countries in the next week or so. Over the coming weeks and months I’ll be posting errata and other content on the book’s homepage on our website, but for now, take a look at the two books and if you find them useful, post a review or drop me a line at mark.rittman@rittmanmead.com
BI, Data Warehousing and Data Integration News from Oracle Openworld 2012
This week is Oracle Openworld week in San Francisco, USA, with around 50,000 people attending Openworld itself, JavaOne and fringe events such as Oaktable World. Rittman Mead have had ten sessions during the week, covering topics such as OBIEE, Endeca, data warehousing, Essbase and EPM, Exalytics and Oracle Advanced Analytics, with links to our presentation downloads below (links will be added as presentations are given):
- Oracle Exalytics and TimesTen for Exalytics Best Practices (Mark Rittman)
- Endeca Information Discovery for Oracle BI/DW Developers (Mark Rittman)
- Integrating Oracle Fusion Middleware 11g and OBIEE 11g (Mark Rittman)
- How to Integrate OBIEE 11g and Essbase / EPM (Mark Rittman)
- High-Speed, Big Data Analytics using Oracle Exalytics (Mark Rittman)
- Event-Driven Real-Time Analytics (Jon Mead)
- In-Memory Analytics: Oracle TimesTen In-Memory Database Compared with Oracle Essbase 11.1.2.2 (Venkatakrishnan J)
- No-Surprises Development and Environment Management (Stewart Bryson and Kellyn Pot’Vin)
- Report Against Transactional Schemas with Oracle Business Intelligence Enterprise Edition 11g (Stewart Bryson)
- Developing Search/Analytic BI Applications with Oracle Endeca Information Discovery (Stewart Bryson)
It’s also a good opportunity for us to attend the various product roadmap sessions, talk to the product managers and catch up with our friends and colleagues in the industry. Jon Mead posted an update earlier in the week on the first few days, but what I’d like to go through in this posting is some of the product news from the week, focusing on OBIEE, Endeca, Data Integrator and the BI Applications.
Before we get onto those products though, the major non-BI news this week was around Oracle Database 12c, Oracle Exadata X3 Database In-Memory Machine, and Oracle Public Cloud. Many people (including myself) were expecting Oracle to formally announce and launch Database 12c this week, but it’s been a sort-of strange “non-launch”, with the product described in a fair bit of detail but not formally launched with the accompanying white papers on oracle.com, detailed articles on their website and so forth. In fact, formally launching a product in that way creates a number of obligations to release the product within a certain timeframe, so by announcing but not launching the product Oracle can get the (expected) word out whilst giving themselves a bit more latitude around when the product actually becomes available. In terms of the database, and in particular data warehousing, some of the new features that were called out were:
- Adaptive query optimization – sounded like explain plans being able to evolve mid-query (which could be interesting when doing some tuning)
- New partitioning abilities such as operations on multiple partitions, ability to mass-partition an unpartitioned table etc – Jonathan Lewis has a good article on this
- New online DDL operations such as partition move
- Asynchronous partitioned global index maintenance
- Out-of-place refresh and synchronous refresh for MVs
- In-database MapReduce and Hadoop (interesting…)
- New In-Database predictive analytics
- Further embedding of R in the database (see my posting later in the week on R and Oracle R Enterprise)
- Automatic data compression (may be Exadata only) – database reviews data usage and selects from compression for archive, read or read/write
- Pluggable databases – a form of database virtualisation where the overall database “root” is called a container database, whilst the virtualised/hosted instances are called “pluggable databases”, aimed primarily at the cloud/multi-tenant/consolidation space
The big news though was around Oracle Exadata X3 Database “In-Memory Machine”. The idea here is that X3 is an update to Oracle’s database “engineered system” line with, this time around, 26TB of memory on a full-rack machine and the overall product positioned as an “in-memory” database server, with disk being used to supplement the main memory store. Whilst this is certainly impressive and not to be sniffed at, it’s slightly disingenuous to call it an “in-memory” database as only 4TB of the overall 26TB of memory is DRAM, with the rest being flash memory – a bit like RAM on your laptop compared to memory in a USB memory stick. Each server within the overall Exadata rack has one eighth of this total memory, meaning that a typical Exadata server has 1/2 TB of RAM (plus all the flash memory) compared to 1TB of RAM for an Exalytics server (and no flash memory). So we won’t be looking for a refund on our Exalytics server yet (and there was no real news about an Exalytics v2), but certainly it’ll be one more product to compete against SAP Hana with, and if someone gave us one for free we’d certainly be all over it. But it’s not quite 26TB of RAM as you’d normally think of it, and you still need Exalytics for hosting TimesTen and Essbase.
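To make the memory arithmetic in the paragraph above concrete, here is a minimal sketch in Python. It uses only the figures quoted above (26TB total, 4TB of it DRAM, one eighth of the memory per server, 1TB of RAM on Exalytics); the variable names are just for illustration, and the server count of eight is an assumption inferred from the “one eighth” figure rather than a published specification.

```python
# Figures as quoted in the paragraph above for a full-rack Exadata X3.
TOTAL_MEMORY_TB = 26     # DRAM plus flash, as positioned by Oracle
DRAM_TB = 4              # the part that is actual DRAM
SERVERS_PER_RACK = 8     # assumption: inferred from "one eighth" per server
EXALYTICS_DRAM_TB = 1    # RAM on a single Exalytics server

# Everything that is not DRAM is flash memory.
flash_tb = TOTAL_MEMORY_TB - DRAM_TB

# DRAM available on each individual Exadata server.
dram_per_server_tb = DRAM_TB / SERVERS_PER_RACK

# Share of the headline "in-memory" figure that is really DRAM.
dram_share_pct = round(100 * DRAM_TB / TOTAL_MEMORY_TB, 1)

print(f"Flash memory in the rack: {flash_tb} TB")
print(f"DRAM per Exadata server:  {dram_per_server_tb} TB")
print(f"DRAM share of the total:  {dram_share_pct}%")
print(f"Exalytics RAM vs one Exadata server: "
      f"{EXALYTICS_DRAM_TB / dram_per_server_tb:.0f}x")
```

On these figures only around 15% of the headline memory is DRAM, and a single Exalytics server has twice the RAM of a single Exadata server.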
So on to OBIEE, Endeca and Advanced Analytics. Versions 11.1.1.7 and 11.1.1.8 were talked about at the BI Roadmap & Strategy talk, with 11.1.1.7 likely to feature installation within IBM WebSphere as an option, and with new visualisations coming along such as:
- Interactive Trellis – ability to draw selection boxes around sets of trellis cells to include or exclude data from the view
- In-line Planning – the example shown was around revenue simulation, where sliders could be used to vary # of sales people, revenue, hours etc with Essbase in the background then varying a projected profit figure displayed on the screen as the “tip” of a tree-like structure
- Motion chart – augmenting existing time controls and designed to show trends over time
- Heatmap – a grid of coloured cells showing different shades of base colours (red, green etc) to reveal distribution and hidden patterns
- Timeline Analysis – represents key events over a particular period and reveals supporting details as required
- Histogram / Chip Display – a variation on the trellis chart / sparkline chart plotting density, and supporting estimation by showing a visual impression of the distribution of data
- Treemap – shows patterns in data by displaying hierarchical (tree-structured) data as sets of nested rectangles
- Updates to the thematic maps and hierarchy wheel visualisations
- Performance tiles, freeze headers for tables, waterfall and stacked bar charts, precision layers, improved printing, full-featured Excel interaction
BI Mobile was talked about, with the 11.1.1.6.2 BP1 “BI Mobile HD” version being showcased and talk about more specialised, job-specific mobile applications called “BI Mobile Solutions” that take elements of Oracle BI Mobile and embed them in function-specific, pre-built apps (including for Windows Mobile and Android). There was nothing on dates or how these products would be distributed, but coupled with an announcement in a separate session about making the BI Mobile application available as a static library for embedding in security wrappers provided by the likes of Bitzer Mobile, it seems likely that Oracle are focusing on mobile as a first-class delivery platform for BI and looking to address some of the remaining shortcomings of their more general purpose BI Mobile app.
The other major announcement for OBIEE was around SmartView replacing BI Office as OBIEE’s MS Office client. BI Office has been lacking for a while now whilst SmartView, though technically compatible with OBIEE, wasn’t really suitable as a replacement for BI Office. OBIEE 11.1.1.7 looks likely to introduce an updated version of SmartView that will support migration from BI Office, and will work as an Office front-end for both OBIEE and the Hyperion Tools, covering MS Word, MS Excel and MS PowerPoint. It should be interesting to see when it comes out.
Endeca and the BI Apps also got their own mention and, in the case of BI Apps, a roadmap presentation. There was no real news on either product (we covered the BI Apps product roadmap in a three-part series earlier in the year, available here, here and here) but it was good to hear Florian Schouten talk about where the BI Apps user interface is going post 11g, and to hear the search/discovery and unstructured analytics message getting out to the audience for Endeca. R and Oracle R Enterprise also got a mention at many sessions, and I’m looking forward to delivering my combined OBIEE / Endeca and Oracle R Enterprise on Exalytics talk at Openworld later today. So positive words, and lots around visualisations, for Oracle’s BI tools at Openworld.
My other main interest for this year’s Openworld was around data integration, big data and advanced analytics, and in several sessions Oracle’s big data strategy was set out similar to the diagram below.
Key elements here that may not be apparent to readers immersed in the “big data” story include:
- Oracle Big Data Appliance (a large, full-rack server running various Oracle “big data” tools and the Cloudera distribution of Hadoop and MapReduce) acts as the collection point for the mass of machine data, social media conversations and other “big data” data flows and uses techniques such as MapReduce to condense this data down into something suitable for loading into Exadata, via Oracle’s Big Data Connectors (and over InfiniBand for high-bandwidth loading)
- Exadata then acts as the down-stream storage system of record for this condensed information, with Endeca then supporting unstructured/discovery-type analytics, Exalytics (again via InfiniBand) performing traditional dashboard and OLAP-style analytics, and RTD supporting decisioning and predictive modelling.
- All of these tools fit into the canonical acquire-organise-analyze-decide big data analytics approach
Big Data Connectors are also a key part of Oracle’s Data Integration strategy, with ODI providing the link between Big Data Appliance and Exadata, and in general moving and orchestrating data across the whole Oracle product stack, in a process Oracle have termed “fast data”.
The ODI product roadmap and futures session covered the use of ODI with big data and went through some of the release themes for future versions of the product, which look likely to pan out like this:
- ODI 11.1.1.7 – due around the autumn/fall of 2012 – will focus on XML handling and parsing (a welcome bit of news)
- ODI 12c – probably due in 2013 at some point – will be the OWB/ODI conversion release, featuring a “developer jumpstart” that will allow the tool to be switched between pure old-style ODI and the new “mappings”-based ODI/OWB convergence approach
ODI 11.1.1.7 is also likely to be the release that the BI Apps team will use to extend ETL support to this product, a topic that I again covered in my BI Apps roadmap postings earlier in the year. ODI 12c is going to be the most significant release, with the concept of interfaces going away and instead an OWB-style approach being used where mappings can contain multiple steps, objects like variables will appear in the mapping and be configurable using the mapping UI, and many of the concepts used in OWB brought across to ODI. Expect more details once 2013 comes along, and also details on how optional migration from OWB will become available as a feature within the tool.
So that’s it in terms of updates from me, with the time now 10.45am on the Wednesday and my last presentation, on Exalytics and Big Data Analytics, due at 5pm later today. Have a safe journey home if you’re also over in San Francisco, and expect a posting from me in the next few days on using OBIEE, Endeca and Oracle R Enterprise on the Exalytics platform based on today’s talk.