Tag Archives: Oracle BI Suite EE

Introduction to Oracle BI Cloud Service : Provisioning Data

In the first post in this series I looked at the new Oracle BI Cloud Service, which went GA over the weekend and which Rittman Mead have been using these past few weeks as part of a beta release. In the first post I looked at what BICS is and who its aimed at in this initial release, and went through the features at a high-level; over the rest of the week I’ll be looking at the features in-detail, starting today with the data upload and provisioning process. Here’s the links to the rest of the series, with the items getting updated over the week as I post each entry in the series:

As I mentioned in that first post, “Introduction to Oracle BI Cloud Service : Product Overview”, BICS in this initial release to my mind is aimed at departmental use-cases where someone wants to quickly upload and analyse an offline dataset and share the results with other members of their team. BICS comes bundled with Oracle Database Schema Service and 50GB of storage, and OBIEE in this setup reports just against this data source with no ability to reach-out dynamically to other data sources or blend those sources with the main one in Oracle’s cloud database. It’s aimed really at users with a single source of data to work with, who’ve probably obtained it as an export from some other system and just want to be able to report against it, though as we’ll see later in this post it is possible to link to other SaaS sources with a bit of PL/SQL wizardry.

So the first task you’re likely to perform when working with BICS is to upload some data to report on. There are three main options for uploading data to BICS, two of which are browser-based and aimed at end-users, and one that uses SQL*Developer and more aimed at devs. BICS itself comes with a menu items on the home page for uploading data, and this is what we’ll think users will use most as it’s built-into the tool and fairly prominent.

NewImage

Clicking on this menu item launches an ApEx application hosted in the Database Schema Service that comes with BICS, and which allows you to upload and parse XLS and delimited file-types to the database cloud instance and then store the contents in database tables.

NewImage

Oracle Database Schema Service also comes with Application Express (ApEx) as a front-end, and ApEx has similar tools for upload datasets into the service, with additional features for creating views and PL/SQL packages to process and manipulate the data, something we used in our beta program example to connect to Salesforce.com and download data using their REST API. In-theory you shouldn’t need to use these features much, but SIs and partners such as ourselves will no doubt use ApEx a lot to build out the loading infrastructure, data cleansing and other features that you might want for a packaged cloud app – so get your PL/SQL books out and brush-up on ApEx development.

NewImage

The other way to get data into BICS is to use Oracle SQLDeveloper, which has a special Oracle Cloud connector type that allows you to view and work with database objects as if they were regular database ones, and upload data to the cloud in the form of “carts”. I’d imagine these options will get extended over time, either by tools or utilities Oracle release for this v1.0 BICS release, or by BICS eventually supporting the full Oracle Database Instance Service that’ll support regular SQLNet connections from ETL tools.

NewImage

So once you’ve got some data uploaded into Database Schema Services, you’ll end up with a set of source tables from which you can create your BI Repository. Check back tomorrow for more details on how BICS’s new thin-client data modeller works and how you create your business model against this cloud data source, including how the repository editing and checkout process works in this new potentially multi-user development environment.

 

Introduction to Oracle BI Cloud Service : Product Overview

Long-term readers of this blog will probably know that I’m enthusiastic about the possibilities around running OBIEE in the cloud, and over the past few weeks Rittman Mead have been participating in the beta program for release one of Oracle’s Business Intelligence Cloud Service (BICS). BICS went GA over the weekend and is now live on Oracle’s public cloud site, so all of this week we’ll be running a special five-part series on what BI Cloud Service is, how it works and how you go about building a simple application. I’m also presenting on BICS and our beta program experiences at Oracle Openworld this week (Oracle BI in the Cloud: Getting Started, Deployment Scenarios, and Best Practices [CON2659], Monday Sep 29 10:15 AM – 11.00 AM Moscone West 3014), so if you’re at the event and want to hear our thoughts, come along.

Over the next five days I’ll be covering the following topics, and I’ll update the list with hyperlinks once the articles are published:

So what is Oracle BI Cloud Service, and how does it relate to regular, on-premise OBIEE11g?

On the Oracle BI Cloud Service homepage, Oracle position the product as “Agile Business Intelligence in the Cloud for Everyone”, and there’s a couple of key points in this positioning that describe the product well.

NewImage

The “agile” part is referring to the point that being cloud-based, there’s no on-premise infrastructure to stand-up, so you can get started a lot quicker than if you needed to procure servers, get the infrastructure installed, configure the software and get it accepted by the IT department. Agile also refers to the fact that you don’t need to purchase perpetual or one/two-year term licenses for the software, so you can use OBIEE for more tactical projects without having to worry about expensive long-term license deals. The final way that BICS is “agile” is in the simplified, user-focused tools that you use to build your cloud-based dashboards, with BICS adopting a more consumer-like user interface that in-theory should mean you don’t have to attend a course to use it.

BICS is built around standard OBIEE 11g, with an updated user interface that’ll roll-out across on-premise OBIEE in the next release and the standard Analysis Editor, Dashboard Editor and repository (RPD) under the covers. Your initial OBIEE homepage is a modified version of the standard OBIEE homepage that lists standard developer functions down the left-hand side as a series of menu items, and the BI Administration tool is replaced with an online, thin-client repository editor that provides a subset of the full BI Administration tool functionality.

NewImage

Customers who license BICS in this initial release get two environments (or instances) to work with; a pre-prod or development environment to create their applications in initially, and a production environment into which they deploy each release of their work. BICS is also bundled with Oracle Database Schema Service, a single-schema Oracle Database service with an ApEx front-end into which you store the data that BICS reports on, and with ApEx and BICS itself having tools to upload data into it; this is, however, the only data source that BICS in version 1 supports, so any data that your cloud-based dashboards report on has to be loaded into Database Schema Service before you can use it, and you have to use Oracle’s provided tools to do this as regular ETL tools won’t connect. We’ll get onto the data provisioning process in the next article in this five-part series.

BICS dashboards and reports currently support a subset of what’s available in the on-premise version. The Analysis Editor (“Answers”) is the same as on-premise OBIEE with the catalog view on the left-hand side, tabs for Results and so on, and the same set of view types (and in fact a new one, for heat maps). There’s currently no access to Agents, Scorecards, BI Publisher or any other Presentation Services features that require a database back-end though, or any Essbase database in the background as you get with on-premise OBIEE 11.1.1.7+.

NewImage

What does become easier to deploy though is Oracle BI Mobile HD as every BICS instance is, by definition, accessible over the internet. Last time I checked the current version of BI Mobile HD on Apple’s App Store couldn’t yet connect, but I’m presuming an update will be out shortly to deal with BICS’s login process, which gets you to enter a BICS username and password along with an “identity domain” that specifics the particular company tenant ID that you use.

NewImage

I’ll cover the thin-client data modeller later in this series in more detail, but at a high-level what this does is remove the need for you to download and install Oracle BI Administration to set up your BI Repository, something that would have been untenable for Oracle if they were serious about selling a cloud-based BI tool. The thin-client data modeller takes the most important (to casual users) features of BI Administration and makes them available in a browser-based environment, so that you can create simple repository models against a single data source and add features like dimension hierarchies, calculations, row-based and subject-area security using a point-and-click environment.

NewImage

Features that are excluded in this initial release include the ability to define multiple logical table sources for a logical table, creating multiple business areas, creating calculations using physical (vs. logical) tables and so on, and there’s no way to upload on-premise RPDs to BICS, or download BICS ones to use on-premise, at this stage. What you do get with BICS is a new import and export format called a “BI Archive” which bundles up the RPD, the catalog and the security settings into a single archive file, and which you use to move applications between your two instances and to store backups of what you’ve created.

So what market is BICS aimed at in this initial release, and what can it be used for? I think it’s fair to say that in this initial release, it’s not a drop-in replacement for on-premise OBIEE 11g, with only a subset of the on-premise features initially supported and some fairly major limitations such as only being able to report against a single database source, no access to Agents, BI Publisher, Essbase and so on. But like the first iteration of the iPhone or any consumer version of a previously enterprise-only tool, its trying to do a few things well and aiming at a particular market – in this case, departmental users who want to stand-up an OBIEE environment quickly, maybe only for a limited amount of time, and who are familiar with OBIEE and would like to carry on using it. In some ways its target market is those OBIEE customers who might otherwise have use Qlikview, Tableau or one of the new SaaS BI services such as Good Data, who most probably have some data exports in the form of Excel spreadsheets or CSV documents, want to upload them to a BI service without getting all of IT involved and then share the results in the form of dashboards and reports with their team. Pricing-wise this appears to be who Oracle are aiming the service at (minimum 10 users, $3500/month including 50GB of database storage) and with the product being so close to standard OBIEE functionality in terms of how you use it, it’s most likely to appeal to customers who already use OBIEE 11g in their organisation.

That said, I can see partners and ISVs adopting BICS to deliver cloud-based SaaS BI applications to their customers, either as stand-alone analysis apps or as add-ons to other SaaS apps that need reporting functionality. Oracle BI Cloud Service is part of the wider Oracle Platform-as-a-Service (PaaS) that includes Java (WebLogic), Database, Documents, Compute and Storage, so I can see companies such as ourselves developing reporting applications for the likes of Salesforce, Oracle Sales Cloud and other SaaS apps and then selling them, hosting included, through Oracle’s cloud platform; I’ll cover our initial work in this area, developing a reporting application for Salesforce.com data, later in this series.

NewImage

Of course it’s been possible to deploy OBIEE in the cloud for some while, with this presentation of mine from BIWA 2014 covering the main options; indeed, Rittman Mead host OBIEE instances for customers in Amazon AWS and do most of our development and training in the cloud including our exclusive “ExtremeBI in the Cloud” agile BI service; but BICS has two major advantages for customers looking to cloud-deploy OBIEE:

  • It’s entirely thin-client, with no need for local installs of BI Administration and so forth. There’s also no need to get involved with Enterprise Manager Fusion Middleware Control for adding users to application roles, defining application role mappings and so on
  • You can license it monthly, including data storage. No other on-premise license option lets you do this, with the shortest term license being one year

such that we’ll be offering it as an alternative to AWS hosting for our ExtremeBI product, for customers who in-particular want the monthly license option.

So, an interesting start. As I said, I’ll be covering the detail of how BICS works over the next five days, starting with the data upload and provisioning process in tomorrow’s post – check back tomorrow for the next instalment.

OBIEE SampleApp in The Cloud: Importing VirtualBox Machines to AWS EC2

Virtualisation has revolutionised how we work as developers. A decade ago, using new software would mean trying to find room on a real tin server to install it, hoping it worked, and if it didn’t, picking apart the pieces probably leaving the server in a worse state than it was to begin with. Nowadays, we can just launch a virtual machine to give a clean environment and if it doesn’t work – trash it and start again.
The sting in the tail of virtualisation is that full-blown VMs are heavy – for disk we need several GB just for a blank OS, and dozens of GB if you’re talking about a software stack such as Fusion MiddleWare (FMW), and the host machine needs to have the RAM and CPU to support it all too. Technologies such as Linux Containers go some way to making things lighter by abstracting out a chunk of the OS, but this isn’t something that’s reached the common desktop yet.

So whilst VMs are awesome, it’s not always practical to maintain a library of all of them on your local laptop (even 1TB drives fill up pretty quickly), nor will your laptop have the grunt to run more than one or two VMs at most. VMs like this are also local to your laptop or server – but wouldn’t it be neat if you could duplicate that VM and make a server based on it instantly available to anyone in the world with an internet connection? And that’s where The Cloud comes in, because it enables us to store as much data as we can eat (and pay for), and provision “hardware” at the click of a button for just as long as we need it, accessible from anywhere.

Here at Rittman Mead we make extensive use of Amazon Web Services (AWS) and their Elastic Computing Cloud (EC2) offering. Our website runs on it, our training servers run on it, and it scales just as we need it to. A class of 3 students is as easy to provision for as a class of 24 – no hunting around for spare servers or laptops, no hardware sat idle in a cupboard as spare capacity “just in case”.

One of the challenges that we’ve faced up until now is that all servers have had to be built from scratch in the cloud. Obviously we work with development VMs on local machines too, so wouldn’t it be nice if we could build VMs locally and then push them to the cloud? Well, now we can. Amazon offer a route to import virtual machines, and in this article I’m going to show how that works. I’ll use the superb SampleApp v406 VM that Oracle provide, because this is a great real-life example of a VM that is so useful, but many developers can find too memory-intensive to be able to run on their local machines all the time.

This tutorial is based on exporting a Linux guest VM from a Linux host server. A Windows guest probably behaves differently, but a Mac or Windows host should work fine since VirtualBox is supported on both. The specifics are based on SampleApp, but the process should be broadly the same for all VMs. 

Obtain the VM

We’re going to use SampleApp, which can be downloaded from Oracle.

  1. Download the six-part archive from http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples–167534.html
  2. Verify the md5 checksums against those published on the download page:
    [oracle@asgard sampleapp406]$ ll
    total 30490752
    -rw-r--r-- 1 oracle oinstall 5242880000 Sep  9 01:33 SampleAppv406.zip.001
    -rw-r--r-- 1 oracle oinstall 5242880000 Sep  9 01:30 SampleAppv406.zip.002
    -rw-r--r-- 1 oracle oinstall 5242880000 Sep  9 02:03 SampleAppv406.zip.003
    -rw-r--r-- 1 oracle oinstall 5242880000 Sep  9 02:34 SampleAppv406.zip.004
    -rw-r--r-- 1 oracle oinstall 5242880000 Sep  9 02:19 SampleAppv406.zip.005
    -rw-r--r-- 1 oracle oinstall 4977591522 Sep  9 02:53 SampleAppv406.zip.006
    [oracle@asgard sampleapp406]$ md5sum *
    2b9e11f69ada5f889088dd74b5229322  SampleAppv406.zip.001
    f8a1a5ae6162b20b3e9c6c888698c071  SampleAppv406.zip.002
    68438cfea87e8d3a2e2f15ff00dadf12  SampleAppv406.zip.003
    b71d9ace4f75951198fc8197da1cfe62  SampleAppv406.zip.004
    4f1a5389c9e0addc19dce6bbc759ec20  SampleAppv406.zip.005
    2c430f87e22ff9718d5528247eff2da4  SampleAppv406.zip.006
  3. Unpack the archive using 7zip — the instructions for SampleApp are very clear that you must use 7zip, and not another archive tool such as winzip.
    [oracle@asgard sampleapp406]$ time 7za x SampleAppv406.zip.001</code>7-Zip (A) [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
    p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,80 CPUs)
    
    Processing archive: SampleAppv406.zip.001
    
    Extracting SampleAppv406Appliance
    Extracting SampleAppv406Appliance/SampleAppv406ga-disk1.vmdk
    Extracting SampleAppv406Appliance/SampleAppv406ga.ovf
    
    Everything is Ok
    
    Folders: 1
    Files: 2
    Size: 31191990916
    Compressed: 5242880000
    
    real 1m53.685s
    user 0m16.562s
    sys 1m15.578s
  4. Because we need to change a couple of things on the VM first (see below), we’ll have to import the VM to VirtualBox so that we can boot it up and make these changes.You can import using the VirtualBox GUI, or as I prefer, the VBoxManage command line interface. I like to time all these things (just because, numbers), so stick a time command on the front:
    time VBoxManage import --vsys 0 --eula accept SampleAppv406Appliance/SampleAppv406ga.ovf

    This took 12 minutes or so, but that was on a high-spec system, so YMMV.
    [...]
    0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
    Successfully imported the appliance.
    
    real    12m15.434s
    user    0m1.674s
    sys     0m2.807s

Preparing the VM

Importing Linux VMs to Amazon EC2 will only work if the kernel is supported, which according to an AWS blog post includes Red Hat Enterprise Linux 5.1 – 6.5. Whilst SampleApp v406 is built on Oracle Linux 6.5 (which isn’t listed by AWS as supported), we have the option of telling the VM to use a kernel that is Red Hat Enterprise Linux compatible (instead of the default Unbreakable Enterprise Kernel – UEK). There are some other pre-requisites that you need to check if you’re trying this with your own VM, including a network adaptor configured to use DHCP. The aforementioned blog post has details.

  1. Boot the VirtualBox VM, which should land you straight in the desktop environment, logged in as the oracle user.
  2. We need to modify a file as root (superuser). Here’s how to do it graphically, or use vi if you’re a real programmer:
    1. Open a Terminal window from the toolbar at the top of the screen
    2. Enter
      sudo gedit /etc/grub.conf

      The sudo bit is important, because it tells Linux to run the command as root. (I’m on an xkcd-roll here: 1, 2)

    3. In the text editor that opens, you will see a header to the file and then a set of repeating sections beginning with title. These are the available kernels that the machine can run under. The default is 3, which is zero-based, so it’s the fourth title section. Note that the kernel version details include uek which stands for Unbreakable Enterprise Kernel – and is not going to work on EC2.
    4. Change the default to 0, so that we’ll instead boot to a Red Hat Compatible Kernel, which will work on EC2
    5. Save the file
  3. Optional steps:
    1. Whilst you’ve got the server running, add your SSH key to the image so that you can connect to it easily once it is up on EC2. For more information about SSH keys, see my previous blog post here, and a step-by-step for doing it on SampleApp here.
    2. Disable non-SSH key logins (in /etc/ssh/sshd_config, set PasswordAuthentication no and PubkeyAuthentication yes), so that your server once on EC2 is less vulnerable to attack. Particularly important if you’re using the stock image with Admin123 as the root password.
    3. Set up screen, and OBIEE and the database as a Linux service, both covered in my article here.
  4. Shutdown the instance by entering this at a Terminal window:

    sudo shutdown -h now

Export the VirtualBox VM to Amazon EC2

Now we’re ready to really get going. The first step is to export the VirtualBox VM to a format that Amazon EC2 can work with. Whilst they don’t explicitly support VMs from VirtualBox, they do support the VMDK format – which VirtualBox can create. You can do the export from the graphical interface, or as before, from the command line:

time VBoxManage export "OBIEE SampleApp v406" --output OBIEE-SampleApp-v406.ovf

0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Successfully exported 1 machine(s).

real    56m51.426s
user    0m6.971s
sys     0m12.162s

If you compare the result of this to what we downloaded from Oracle it looks pretty similar – an OVF file and a VMDK file. The only difference is that the VMDK file is updated with the changes we made above, including the modified kernel settings which are crucial for the success of the next step.

[oracle@asgard sampleapp406]$ ls -lh
total 59G
-rw------- 1 oracle oinstall  30G Sep  9 10:55 OBIEE-SampleApp-v406-disk1.vmdk
-rw------- 1 oracle oinstall  15K Sep  9 09:58 OBIEE-SampleApp-v406.ovf

We’re ready now to get all cloudy. For this, you’ll need:

  1. An AWS account
    1. You’ll also need your AWS account’s Access Key and Secret Key
  2. AWS EC2 commandline tools installed, along with a Java Runtime Environment (JRE) 1.7 or greater:

    wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip
    sudo mkdir /usr/local/ec2
    sudo unzip ec2-api-tools.zip -d /usr/local/ec2
    # You might need to fiddle with the following paths and version numbers: 
    sudo yum install -y java-1.7.0-openjdk.x86_64
    cat >> ~/.bash_profile <<EOF
    export JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre"
    export EC2_HOME=/usr/local/ec2/ec2-api-tools-1.7.1.1/
    export PATH=$PATH:$EC2_HOME/bin
    EOF<

  3. Set your credentials as environment variables:
    export AWS_ACCESS_KEY=xxxxxxxxxxxxxx
    export AWS_SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxx
  4. Ideally a nice fat pipe to upload the VM file over, because at 30GB it is not trivial (not in 2014, anyway)

What’s going to happen now is we use an EC2 command line tool to upload our VMDK (virtual disk) file to Amazon S3 (a storage platform), from where it gets converted into an EBS volume (Elastic Block Store, i.e. a EC2 virtual disk), and from there attached to a new EC2 instance (a “server”/”VM”).

Before we can do the upload we need an S3 “bucket” to put the disk image in that we’re uploading. You can create one from https://console.aws.amazon.com/s3/. In this example, I’ve got one called rmc-vms – but you’ll need your own.

Once the bucket has been created, we build the command line upload statement using ec2-import-instance:

time ec2-import-instance OBIEE-SampleApp-v406-disk1.vmdk --instance-type m3.large --format VMDK --architecture x86_64 --platform Linux --bucket rmc-vms --region eu-west-1 --owner-akid $AWS_ACCESS_KEY --owner-sak $AWS_SECRET_KEY

Points to note:

  • m3.large is the spec for the VM. You can see the available list here. In the AWS blog post it suggests only a subset will work with the import method, but I’ve not hit this limitation yet.
  • region is the AWS Region in which the EBS volume and EC2 instance will be built. I’m using ew-west-1 (Ireland), and it makes sense to use the one geographically closest to where you or your users are located. Still waiting for uk-yorks-1
  • architecture and platform relate to the type of VM you’re importing.

The upload process took just over 45 minutes for me, and that’s from a data centre with a decent upload:

[oracle@asgard sampleapp406]$ time ec2-import-instance OBIEE-SampleApp-v406-disk1.vmdk --instance-type m3.large --format VMDK --architecture x86_64 --platform Linux --bucket rmc-vms --region eu-west-1 --owner-akid $AWS_ACCESS_KEY --owner-sak $AWS_SECRET_KEY
Requesting volume size: 200 GB
TaskType        IMPORTINSTANCE  TaskId  import-i-fh08xcya       ExpirationTime  2014-09-16T10:07:44Z    Status  active  StatusMessage   Pending InstanceID      i-b07d3bf0
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   31191914496     VolumeSize      200     AvailabilityZone        eu-west-1a      ApproximateBytesConverted       0       Status       active  StatusMessage   Pending : Downloaded 0
Creating new manifest at rmc-vms/d77672aa-0e0b-4555-b368-79d386842112/OBIEE-SampleApp-v406-disk1.vmdkmanifest.xml
Uploading the manifest file
Uploading 31191914496 bytes across 2975 parts
0% |--------------------------------------------------| 100%
   |==================================================|
Done
Average speed was 11.088 MBps
The disk image for import-i-fh08xcya has been uploaded to Amazon S3
where it is being converted into an EC2 instance.  You may monitor the
progress of this task by running ec2-describe-conversion-tasks.  When
the task is completed, you may use ec2-delete-disk-image to remove the
image from S3.

real    46m59.871s
user    10m31.996s
sys     3m2.560s

Once the upload has finished Amazon automatically converts the VMDK (now residing on S3) into a EBS volume, and then attaches it to a new EC2 instance (i.e. a VM). You can monitor the status of this task using ec2-describe-conversion-tasks, optionally filtered on the TaskId returned by the import command above:

ec2-describe-conversion-tasks --region eu-west-1 import-i-fh08xcya

TaskType        IMPORTINSTANCE  TaskId  import-i-fh08xcya       ExpirationTime  2014-09-16T10:07:44Z    Status  active  StatusMessage   Pending InstanceID      i-b07d3bf0
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   31191914496     VolumeSize      200     AvailabilityZone        eu-west-1a      ApproximateBytesConverted       3898992128
Status  active  StatusMessage   Pending : Downloaded 31149971456

This is now an ideal time to mention as a side note the Linux utility watch, which simply re-issues a command for you every x seconds (2 by default). This way you can leave a window open and keep an eye on the progress of what is going to be a long-running job

watch ec2-describe-conversion-tasks --region eu-west-1 import-i-fh08xcya

Every 2.0s: ec2-describe-conversion-tasks --region eu-west-1 import-i-fh08xcya                                                             Tue Sep  9 12:03:24 2014

TaskType        IMPORTINSTANCE  TaskId  import-i-fh08xcya       ExpirationTime  2014-09-16T10:07:44Z    Status  active  StatusMessage   Pending InstanceID      i-b07d3bf0
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   31191914496     VolumeSize      200     AvailabilityZone        eu-west-1a      ApproximateBytesConverted       5848511808
Status  active  StatusMessage   Pending : Downloaded 31149971456

And whilst we’re at it, if you’re using a remote server to do this (as I am, to take advantage of the large bandwidth), you will find screen invaluable for keeping tasks running and being able to reconnect at will. You can read more about screen and watch here.

So back to our EC2 import job. To start with, the task will be Pending: (NB unlike lots of CLI tools, you read the output of this one left-to-right, rather than as columns with headings)

$ ec2-describe-conversion-tasks --region eu-west-1
TaskType        IMPORTINSTANCE  TaskId  import-i-ffvx6z86       ExpirationTime  2014-09-12T15:32:01Z    Status  active  StatusMessage   Pending InstanceID      i-b2245ef2
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   5021144064      VolumeSize      60      AvailabilityZone        eu-west-1a      ApproximateBytesConverted       4707330352      Status  active  StatusMessage   Pending : Downloaded 5010658304

After a few moments it gets underway, and you can see a Progress percentage indicator: (scroll right in the code snippet below to see)

TaskType        IMPORTINSTANCE  TaskId  import-i-fgr0djcc       ExpirationTime  2014-09-15T15:39:28Z    Status  active  StatusMessage   Progress: 53%   InstanceID      i-c7692e87
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   5582545920      VolumeId        vol-f71368f0    VolumeSize      20      AvailabilityZone        eu-west-1a      ApproximateBytesConverted       5582536640      Status  completed

Note that at this point you’ll see also see an Instance in the EC2 list, but it won’t launch (no attached disk – because it’s still being imported!)

If something goes wrong you’ll see the Status as cancelled, such as in this example here where the kernel in the VM was not a supported one (observe it is the UEK kernel, which isn’t supported by Amazon):

TaskType        IMPORTINSTANCE  TaskId  import-i-ffvx6z86       ExpirationTime  2014-09-12T15:32:01Z    Status  cancelled       StatusMessage   ClientError: Unsupported kernel version 2.6.32-300.32.1.el5uek       InstanceID      i-b2245ef2
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   5021144064      VolumeId        vol-91b1c896    VolumeSize      60      AvailabilityZone        eu-west-1a      ApproximateBytesConverted    5021128688      Status  completed

After an hour or so, the task should complete:

TaskType        IMPORTINSTANCE  TaskId  import-i-fh08xcya       ExpirationTime  2014-09-16T10:07:44Z    Status  completed       InstanceID      i-b07d3bf0
DISKIMAGE       DiskImageFormat VMDK    DiskImageSize   31191914496     VolumeId        vol-a383f8a4    VolumeSize      200     AvailabilityZone        eu-west-1a      ApproximateBy
tesConverted    31191855472     Status  completed

At this point you can remove the VMDK from S3 (and should do, else you’ll continue to be charged for it), following the instructions for ec2-delete-disk-image

Booting the new server on EC2

Go to your EC2 control panel, where you should see an instance (EC2 term for “server”) in Stopped state and with no name.

Select the instance, and click Start on the Actions menu. After a few moments a Public IP will be shown in the details pane. But, we’re not home free quite yet…read on.

Firewalls

So this is where it gets a bit tricky. By default, the instance will have launched with Amazon’s Firewall (known as a Security Group) in place which – unless you have an existing AWS account and have modified the default security group’s configuration – is only open on port 22, which is for ssh traffic.

You need to head over to the Security Group configuration page, accessed in several ways but easiest is clicking on the security group name from the instance details pane:

Click on the Inbound tab and then Edit, and add “Custom TCP Rule” for the following ports:

  • 7780 (OBIEE front end)
  • 7001 (WLS Console / EM)
  • 5902 (oracle VNC)

You can make things more secure by allowing access to the WLS admin (7001) and VNC port (5902) to a specific IP address or range only.

Whilst we’re talking about security, your server is now open to the internet and all the nefarious persons out there, so you’ll be wanting to harden your server not least by resetting all the passwords to ones which aren’t publicly documented in the SampleApp user documentation!

Once you’ve updated your Security Group, you can connect to your server! If you installed the OBIEE and database auto start scripts (and if not, why not??) you should find OBIEE running just nicely on http://[your ip]:7780/analytics – note that the port is 7780, not 9704.

2014-09-09_20-21-23

If you didn’t install the script, you will need to start the services manually per the SampleApp documentation. To connect to the server you can ssh (using Terminal, PuTTY, etc) to the server or connect on VNC (Admin123 is the password). For VNC clients try Screen Share on Macs (installed by default), or RealVNC on Windows.

Caveats & Disclaimers

  • Running a server on AWS EC2 costs real money, so watch out. Once you’ve put your credit card details in, Amazon will continue to charge your card whilst there are chargeable items on your account (EBS volumes, instances – running or not- , and so on). You can get an idea of the scale of charges here.
  • As mentioned above, a server on the open internet is a lot more vulnerable than one virtualised on your local machine. You will get poked and probed, usually by automated scripts looking for open ports, weak passwords, and so on. SampleApp is designed to open the toybox of a pimped-out OBIEE deployment to you, it is not “hardened”, and you risk learning the tough way about the need for it if you’re not careful.

Cloning

Amazon EC2 supports taking a snapshot of a server, either for backup/rollback purposes or spinning up as a clone, using an Amazon Machine Image (AMI). From the Instances page, simply select “Create an Image” to build your AMI. You can then build another instance (or ten) from this AMI as needed, exact replicas of the server as it was at the point that you created the image.

Lather, Rinse, and Repeat

There’s a whole host of VirtualBox “appliances” out there, and some of them such as the developer-tools-focused ones only really make sense as local VMs. But there are plenty that would benefit from a bit of “Cloud-isation”, where they’re too big or heavy to keep on your laptop all the time, but are handy to be able to spin up at will. A prime example of this for me is the EBS Vision demo database that we use for our BI Apps training. Oracle used to provide an pre-built Amazon image (know as an AMI) of this, but since withdrew it. However, Oracle do publish Oracle VM VirtualBox templates for EBS 12.1.3 and 12.2.3 (related blog), so from this with a bit of leg-work and a big upload pipe, it’s a simple matter to brew your own AWS version of it — ready to run whenever you need it.

Taking a Look at the Oracle Database 12c In-Memory Option

The In-Memory Option for Oracle Database 12c became available a few weeks ago with the 12.1.0.2 database patchset, adding column-store and in-memory capabilities to the Oracle Database. Unlike pure in-memory databases such as Oracle TimesTen, the in-memory option adds an in-memory column-store feature to the regular row-based storage in the Oracle database, creating in-memory copies of selected row-store tables in a compressed column-based storage format, with the whole process being automatic and and enabled by issuing just a couple of commands – to the point where my feedback at the end of beta testing was that it was “almost boring” – said slightly tongue-in-cheek…

NewImage

But of course adding in-memory capabilities to the Oracle database is anything but boring – whilst TimesTen on Exalytics has given us the ability to store aggregate tables, and “hot data” tables  in a dedicated in-memory database co-located with the BI mid-tier, TimesTen is another set of skills to learn and another product to manage. Moreover, you’ve got to somehow get the data you want to aggregate or otherwise cache out of your data warehouse and into TimesTen, and then you’ve got the even more difficult task of keeping that TimesTen dataset in-sync with your main data warehouse data. Most importantly though, even with the 2TB and 4TB versions of Exalytics many data warehouses won’t fit entirely into memory, so you’re going to end-up with some of your data in TimesTen and some in your data warehouse, and with only the TimesTen dataset benefiting from in-memory speeds.

So what if you could enable the in-memory option on your data warehouse, and then just run it all in-memory there? To see how this worked, I thought it’d be interesting to take the flight delays dataset that comes with the latest OBIEE 11g v406 SampleApp, and copy that across to an Oracle 12.1.0.2.0 database to test it out with the in-memory option. This dataset comes with a set of dashboard pages and analyses and contains around 6m rows of data, and in SampleApp is accelerated by an aggregate schema, also stored in Oracle, that generally reduces query times down to just a few seconds.

NewImage

So what I thought would be interesting to check would be first, whether enabling the in-memory option sped-up the dashboards, and second, whether having the in-memory option removed the need for the aggregate tables altogether. I was also interested to see whether putting the aggregate tables into memory improved their access time significantly, as anecdotally I’d heard that there wasn’t much clock-time difference between accessing aggregates in a TimesTen database vs. just creating them in the same data warehouse database as the source data. To set this all up then I created the new 12.1.0.2.0 database,. exported the BI_AIRLINES and BI_AIRLINES_AGG schemas out of the 12c database that comes with SampleApp v406, and then imported them into the same schema names in the new 12.1.0.2.0 instance, on the new VM (rather than on the SampleApp VM), and then amended the connection pool details in the SampleApp RPD to point to this new, external Oracle 12.1.0.2.0 database instance. The 12.1.0.2.0 database at this point had the following memory allocation:

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
 
Total System Global Area 6325010432 bytes
Fixed Size               2938448 bytes
Variable Size            1207962032 bytes
Database Buffers         5100273664 bytes
Redo Buffers             13836288 bytes
Database mounted.
Database opened.

So I don’t end-up comparing the larger database instance on my 12.1.0.2.0 VM with the smaller one the airlines data came from on the VM, I created my baseline response time measurements on this new VM and with none of the in-memory features enabled, and ran some of the dashboard pages and clicked-on some of the prompt values – and the response time wasn’t bad, with just the odd analysis that took an excessive time to return. Mostly though, the results for each dashboard came back straight-away, what you’d expect when summary tables have been deployed to speed-up reporting.

chart1

Looking at the SQL being generated by the BI Server, you can quickly see why this is the case; the BI Server is using aggregate navigation and actually querying the BI_AIRLINES_AGG schema to return the aggregated results the dashboard, as you can see from one of the SQL statements retrieved from the NQQuery.log file below:

select sum(T255906.Dep_Delay_00039B26) as c1,
     sum(T255906.Arr_Delay_00039B22) as c2,
     sum(T255906.Z_of_Fligh00039B28) as c3,
     substr(T255216.Carrier00039BA9 , 1, 25) as c4,
     T255216.Carrier00039BA9 as c5,
     T255216.Carrier_Co00039BAA as c6
from 
     BI_AIRLINES_AGG.SA_16_Dest00039D06 T255357,
     BI_AIRLINES_AGG.SA_Time_Mo00039CFB T255737,
     BI_AIRLINES_AGG.SA_31_Carr00039CEB T255216,
     BI_AIRLINES_AGG.FACT_AGG_OR_06M T255906
where  ( T255216.Carrier_Co00039BAA = T255906.Carrier_Co00039BAA and T255357.Dest_Airpo00039C2A = T255906.Dest_Airpo00039C2A and T255737.Dep_Month00039C07 = T255906.Dep_Month00039C07 and substr(T255216.Carrier00039BA9 , 1, 25) = 'SunFlower Airlines' and (T255357.Dest_Regio00039C31 in ('Midwest Region', 'Northeast Region', 'South Region', 'West Region')) and T255737.Month_of_Y00039C0F between 3 and 12 ) 
group by T255216.Carrier00039BA9, T255216.Carrier_Co00039BAA
order by c5, c6, c4

OBIEE will always use these aggregate tables if they’re available in the repository, so to enable me to test the reports with these aggregates disabled I borrowed the technique Robin introduced in his blog on testing aggregate navigation, and added a request variable prompt to the dashboard page that allows me to pass an INACTIVE_SCHEMAS value to the logical SQL queries issued for the dashboard analyses, and therefore tell the BI Server to ignore the aggregate schema and just use the detail-level BI_AIRLINES schema instead.

NewImage

I therefore test the dashboard again, this time with the BI_AIRLINES_AGG schema disabled; this time, every dashboard took a while to display properly, with every analysis showing the “spinning clock” for twenty, thirty seconds a time. Comparing the response times to the ones a moment ago when aggregates were enabled, and you can see the difference.

chart2

So let’s set-up the in-memory option and see how it affects these two scenarios. The in-memory option for Oracle Database 12c 12.1.0.2.0 is installed by default as part of the core database, but doesn’t start working until you set a value for the INMEMORY_SIZE initialisation parameter – as you can see from the SQL*Plus output below, it’s currently set to zero, effectively disabling the feature:

SQL> conn / as sysdba
Connected.

SQL> show parameter INMEMORY

NAME                     TYPE    VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default          string
inmemory_force               string  DEFAULT
inmemory_max_populate_servers        integer     0
inmemory_query               string  ENABLE
inmemory_size                big integer 0
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware         boolean     TRUE

The memory Oracle allocates to the in-memory option, via this initialisation parameter, comes out of the SGA and therefore before I set the INMEMORY_SIZE parameter, it’d be a good idea to increase the SGA_TARGET value to accommodate the 1G I’m then going to assign to INMEMORY_SIZE. I do both from the SQL*Plus command-prompt and then bounce the database to bring the new values into use:

SQL> alter system set inmemory_size = 1G scope=spfile;

System altered.

SQL> show parameter sga_target

NAME                     TYPE    VALUE
------------------------------------ ----------- ------------------------------
sga_target               big integer 6032M
SQL> alter system set sga_target = 7032M scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 7381975040 bytes
Fixed Size          2941480 bytes
Variable Size        1207963096 bytes
Database Buffers     5083496448 bytes
Redo Buffers           13832192 bytes
In-Memory Area       1073741824 bytes
Database mounted.
Database opened.

Now we’re at the point where we can enable the tables for in-memory access. Tables to be copied into memory can either be created like that at the start, or you can ALTER TABLE them afterwards and specify that they go into memory (or you can configure the in-memory settings at the tablespace level, or even specify particular columns or partitions to go into memory). The underlying data still gets stored row-wise on disk, but enabling a table for in-memory access tells the Oracle database to create column-store in-memory representations of the table and direct suitable queries to those copies, all the time in the background keeping the copy in-sync with the row-store base data.

I’ll start by enabling all of the BI_AIRLINES schema for in-memory access, as the biggest pay-off would be if then meant we didn’t need to maintain the aggregate tables. After running a SELECT table_name FROM user_tables to list out the table names, I then run a script to enable all the tables for in-memory access, with the in-memory copies being populated immediately:

alter table AIRCRAFT_GROUP inmemory priority high;
alter table AIRCRAFT_TYPE inmemory priority high;
alter table AIRLINES_USER_DATA inmemory priority high;
alter table AIRLINE_ID inmemory priority high;
alter table CANCELLATION inmemory priority high;
alter table CARRIER_GROUP_NEW inmemory priority high;
alter table CARRIER_REGION inmemory priority high;
alter table DEPARBLK inmemory priority high;
alter table DISTANCE_GROUP_250 inmemory priority high;
alter table DOMESTIC_SEGMENT inmemory priority high;
alter table OBIEE_COUNTY_HIER inmemory priority high;
alter table OBIEE_GEO_AIRPORT_BRIDGE inmemory priority high;
alter table OBIEE_GEO_ORIG inmemory priority high;
alter table OBIEE_ROUTE inmemory priority high;
alter table OBIEE_TIME_DAY_D inmemory priority high;
alter table OBIEE_TIME_MTH_D inmemory priority high;
alter table ONTIME_DELAY_GROUPS inmemory priority high;
alter table PERFORMANCE inmemory priority high;
alter table PERFORMANCE_ENDECA_MV inmemory priority high;
alter table ROUTES_FOR_LINKS inmemory priority high;
alter table SCHEDULES inmemory priority high;
alter table SERVICE_CLASS inmemory priority high;
alter table UNIQUE_CARRIERS inmemory priority high;

Looking at USER_TABLES from this schema, I can see all of the tables I selected now marked for in-memory access, for immediate loading:

SQL> @display_table_inmem_details.sql
SQL> select table_name
  2  ,    inmemory
  3  ,    inmemory_priority
  4  from   user_tables
  5  /
 
TABLE_NAME           INMEMORY INMEMORY
-------------------- -------- --------
AIRCRAFT_GROUP       ENABLED  HIGH
UNIQUE_CARRIERS      ENABLED  HIGH
SERVICE_CLASS        ENABLED  HIGH
SCHEDULES            ENABLED  HIGH
ROUTES_FOR_LINKS     ENABLED  HIGH
PERFORMANCE          ENABLED  HIGH
ONTIME_DELAY_GROUPS  ENABLED  HIGH
OBIEE_TIME_MTH_D     ENABLED  HIGH
OBIEE_TIME_DAY_D     ENABLED  HIGH
OBIEE_ROUTE          ENABLED  HIGH
OBIEE_GEO_ORIG       ENABLED  HIGH
 
TABLE_NAME    INMEMORY INMEMORY
-------------------- -------- --------
OBIEE_GEO_AIRPORT_BR ENABLED  HIGH
IDGE
 
OBIEE_COUNTY_HIER    ENABLED  HIGH
DOMESTIC_SEGMENT     ENABLED  HIGH
DISTANCE_GROUP_250   ENABLED  HIGH
DEPARBLK             ENABLED  HIGH
CARRIER_REGION       ENABLED  HIGH
CARRIER_GROUP_NEW    ENABLED  HIGH
CANCELLATION         ENABLED  HIGH
AIRLINE_ID           ENABLED  HIGH
 
TABLE_NAME           INMEMORY INMEMORY
-------------------- -------- --------
AIRLINES_USER_DATA   ENABLED  HIGH
AIRLINES_PBLOB$      DISABLED
AIRLINES_PART$       DISABLED
AIRLINES_NODE_TZ$    DISABLED
AIRLINES_NODE$       DISABLED
AIRLINES_LINK_TZ$    DISABLED
AIRLINES_LINK_SCH$   DISABLED
AIRLINES_LINK$       DISABLED
AIRLINES_AIRPORT_TZ$ DISABLED
AIRCRAFT_TYPE        ENABLED  HIGH
 
30 rows selected.

And I can track the progress of the tables being copied into memory using the V$IM_SEGMENTS v$ view, like this:

SQL> @display_im_segments.sql
SQL> set echo on
SQL> set pages 200
SQL> col owner for a20
SQL> col name for a20
SQL> col status for a10
SQL> select v.owner
  2  ,      v.segment_name name
  3  ,      v.populate_status status
  4  from   v$im_segments v;

OWNER            NAME         STATUS
-------------------- -------------------- ----------
BI_AIRLINES      OBIEE_COUNTY_HIER    COMPLETED
BI_AIRLINES      PERFORMANCE      STARTED
BI_AIRLINES      UNIQUE_CARRIERS      COMPLETED
BI_AIRLINES      AIRLINES_LINK_TZ$    COMPLETED
BI_AIRLINES      OBIEE_TIME_MTH_D     COMPLETED
BI_AIRLINES      AIRLINES_LINK_SCH$   COMPLETED
BI_AIRLINES      OBIEE_ROUTE      COMPLETED
BI_AIRLINES      DOMESTIC_SEGMENT     COMPLETED
BI_AIRLINES      AIRLINES_LINK$   COMPLETED
BI_AIRLINES      AIRLINE_ID       COMPLETED
BI_AIRLINES      OBIEE_GEO_ORIG   COMPLETED
BI_AIRLINES      AIRLINES_NODE$   COMPLETED
BI_AIRLINES      OBIEE_GEO_AIRPORT_BR COMPLETED
             IDGE

BI_AIRLINES      AIRLINES_NODE_TZ$    COMPLETED
BI_AIRLINES      OBIEE_TIME_DAY_D     COMPLETED

Note that most of the tables went into memory immediately, but one (PERFORMANCE) is taking a while because it’s so big. Also note that not all tables are listed in the v$ view yet, as the database hasn’t got around to adding them into memory yet, or it might choose not to populate them if it feels the memory could be used better elsewhere. You can alter the priority of these in-memory copy processes if you want, and decide whether the copying happens immediately, or when the table is first scanned (accessed).

Running the dashboards again, with the request variable prompt set to disallow the aggregate schema, gave me the response times below – the in-memory queries were much faster than the row-based non in-memory ones, but they weren’t down to the response time of the dashboards right at the start, where all data is stored row-wise but we’ve got aggregate tables to speed up the queries (note I’ve reordered the bars so the non in-memory queries with no aggregate tables are on the left of each series, as the slowest of all approaches)

chart3

Taking a look at one of the physical SQL queries for a cross-tab (hierarchical columns) analysis, you can see that in-memory table access is happening:

WITH 
SAWITH0 AS (select sum(T233937.ACTUALELAPSEDTIME) as c1,
     sum(T233937.ARRDELAYMINUTES) as c2,
     concat(concat(T233484.AIRPORT, ' - '), substr(T233484.TR_AIRPORT_NAME , instr(T233484.TR_AIRPORT_NAME , ': ') + 2 , 50)) as c3,
     T233820.D_NAME as c4,
     T233820.R_NAME as c5
from 
     BI_AIRLINES.OBIEE_GEO_AIRPORT_BRIDGE T233484 /* 10 GEO_AIRPORT_ORIGIN */ ,
     BI_AIRLINES.OBIEE_COUNTY_HIER T233820 /* 11 COUNTY_HIER_ORIGIN */ ,
     BI_AIRLINES.OBIEE_TIME_MTH_D T233732 /* 41 TIME MONTH */ ,
     BI_AIRLINES.PERFORMANCE T233937 /* 00 PERFORMANCE FULL */ 
where  ( T233484.AIRPORT = T233937.ORIGIN and T233484.STCTY_FIPS = T233820.SC_CODE and T233732.Cal_Month = T233937.MONTH and T233732.Cal_Qtr = T233937.QUARTER and T233732.Cal_Year = T233937.YEAR and T233732.Cal_Month between 6 and 12 and T233937.MONTH between 6 and 12 ) 
group by T233820.D_NAME, T233820.R_NAME, concat(concat(T233484.AIRPORT, ' - '), substr(T233484.TR_AIRPORT_NAME , instr(T233484.TR_AIRPORT_NAME , ': ') + 2 , 50))),
SAWITH1 AS (select sum(T233609.PASSENGERS) as c1,
     T233820.R_NAME as c2,
     T233820.D_NAME as c3,
     concat(concat(T233484.AIRPORT, ' - '), substr(T233484.TR_AIRPORT_NAME , instr(T233484.TR_AIRPORT_NAME , ': ') + 2 , 50)) as c4,
     sum(T233609.PASSENGERS_MILES) as c5
from 
     BI_AIRLINES.OBIEE_GEO_AIRPORT_BRIDGE T233484 /* 10 GEO_AIRPORT_ORIGIN */ ,
     BI_AIRLINES.OBIEE_COUNTY_HIER T233820 /* 11 COUNTY_HIER_ORIGIN */ ,
     BI_AIRLINES.OBIEE_TIME_MTH_D T233732 /* 41 TIME MONTH */ ,
     BI_AIRLINES.DOMESTIC_SEGMENT T233609 /* 01 DOMESTIC Segment */ 
where  ( T233484.AIRPORT = T233609.ORIGIN and T233484.STCTY_FIPS = T233820.SC_CODE and T233609.MONTH = T233732.Cal_Month and T233609.QUARTER = T233732.Cal_Qtr and T233609.YEAR = T233732.Cal_Year and T233609.MONTH between 6 and 12 and T233732.Cal_Month between 6 and 12 ) 
group by T233820.D_NAME, T233820.R_NAME, concat(concat(T233484.AIRPORT, ' - '), substr(T233484.TR_AIRPORT_NAME , instr(T233484.TR_AIRPORT_NAME , ': ') + 2 , 50)))
select 2 as c1,
     case  when D1.c3 is not null then D1.c3 when D2.c4 is not null then D2.c4 end  as c2,
     case  when D1.c4 is not null then D1.c4 when D2.c3 is not null then D2.c3 end  as c3,
     case  when D1.c5 is not null then D1.c5 when D2.c2 is not null then D2.c2 end  as c4,
     'All Orig Airports' as c5,
     1 as c6,
     case  when D1.c4 is not null then D1.c4 when D2.c3 is not null then D2.c3 end  as c7,
     case  when D1.c5 is not null then D1.c5 when D2.c2 is not null then D2.c2 end  as c8,
     cast(D1.c2 as  DOUBLE PRECISION  ) / nullif( D1.c1, 0) * 100 as c9,
     D2.c5 as c10,
     D2.c1 as c14
from 
     SAWITH0 D1 full outer join SAWITH1 D2 On D1.c3 = D2.c4 and  SYS_OP_MAP_NONNULL(D1.c4) = SYS_OP_MAP_NONNULL(D2.c3) 
order by c4, c3, c2
SQL> @complex_query_explain.sql
 
Explained.

SQL> set lines 200
SQL> set pages 0
SQL> select * from table(dbms_xplan.display);
Plan hash value: 3097908901

----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |              |  2673K|  1392M|   |   112K  (1)| 00:00:05 |    |      |        |
|   1 |  PX COORDINATOR                    |              |   |   |   |        |      |    |      |        |
|   2 |   PX SEND QC (ORDER)                   | :TQ10006         |  2673K|  1392M|   |   112K  (1)| 00:00:05 |  Q1,06 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                   |              |  2673K|  1392M|  1492M|   112K  (1)| 00:00:05 |  Q1,06 | PCWP |        |
|   4 |     PX RECEIVE                     |              |  2673K|  1392M|   | 26819   (1)| 00:00:02 |  Q1,06 | PCWP |        |
|   5 |      PX SEND RANGE                 | :TQ10005         |  2673K|  1392M|   | 26819   (1)| 00:00:02 |  Q1,05 | P->P | RANGE      |
|   6 |       VIEW                     | VW_FOJ_0         |  2673K|  1392M|   | 26819   (1)| 00:00:02 |  Q1,05 | PCWP |        |
|*  7 |        HASH JOIN FULL OUTER BUFFERED           |              |  2673K|  1392M|   | 26819   (1)| 00:00:02 |  Q1,05 | PCWP |        |
|   8 |     PX RECEIVE                 |              |   136 | 37128 |   |   144   (7)| 00:00:01 |  Q1,05 | PCWP |        |
|   9 |      PX SEND HASH                  | :TQ10003         |   136 | 37128 |   |   144   (7)| 00:00:01 |  Q1,03 | S->P | HASH       |
|  10 |       PX SELECTOR                  |              |   |   |   |        |      |  Q1,03 | SCWC |        |
|  11 |        VIEW                    |              |   136 | 37128 |   |   144   (7)| 00:00:01 |  Q1,03 | SCWC |        |
|  12 |         HASH GROUP BY              |              |   136 | 17408 |   |   144   (7)| 00:00:01 |  Q1,03 | SCWC |        |
|* 13 |          HASH JOIN                 |              |   136 | 17408 |   |   143   (6)| 00:00:01 |  Q1,03 | SCWC |        |
|  14 |           JOIN FILTER CREATE           | :BF0000          |   136 | 11288 |   |   142   (6)| 00:00:01 |  Q1,03 | PCWP |        |
|* 15 |            HASH JOIN               |              |   136 | 11288 |   |   142   (6)| 00:00:01 |  Q1,03 | SCWC |        |
|  16 |         JOIN FILTER CREATE         | :BF0001          |   136 |  4896 |   |   142   (6)| 00:00:01 |  Q1,03 | PCWP |        |
|* 17 |          HASH JOIN             |              |   136 |  4896 |   |   142   (6)| 00:00:01 |  Q1,03 | SCWC |        |
|* 18 |           TABLE ACCESS INMEMORY FULL   | OBIEE_TIME_MTH_D     |   161 |  1610 |   | 1   (0)| 00:00:01 |  Q1,03 | SCWP |        |
|* 19 |           TABLE ACCESS INMEMORY FULL   | DOMESTIC_SEGMENT     |   771 | 20046 |   |   141   (6)| 00:00:01 |  Q1,03 | SCWP |        |
|  20 |         JOIN FILTER USE            | :BF0001          |  1787 | 83989 |   | 1   (0)| 00:00:01 |  Q1,03 | PCWP |        |
|* 21 |          TABLE ACCESS INMEMORY FULL    | OBIEE_GEO_AIRPORT_BRIDGE |  1787 | 83989 |   | 1   (0)| 00:00:01 |  Q1,03 | SCWP |        |
|  22 |           JOIN FILTER USE              | :BF0000          |  3221 |   141K|   | 1   (0)| 00:00:01 |  Q1,03 | PCWP |        |
|* 23 |            TABLE ACCESS INMEMORY FULL      | OBIEE_COUNTY_HIER    |  3221 |   141K|   | 1   (0)| 00:00:01 |  Q1,03 | SCWP |        |
|  24 |     PX RECEIVE                 |              |  2255K|   587M|   | 26673   (1)| 00:00:02 |  Q1,05 | PCWP |        |
|  25 |      PX SEND HASH                  | :TQ10004         |  2255K|   587M|   | 26673   (1)| 00:00:02 |  Q1,04 | P->P | HASH       |
|  26 |       VIEW                     |              |  2255K|   587M|   | 26673   (1)| 00:00:02 |  Q1,04 | PCWP |        |
|  27 |        HASH GROUP BY               |              |  2255K|   264M|   489M| 26673   (1)| 00:00:02 |  Q1,04 | PCWP |        |
|  28 |         PX RECEIVE                 |              |  2255K|   264M|   | 26673   (1)| 00:00:02 |  Q1,04 | PCWP |        |
|  29 |          PX SEND HASH              | :TQ10002         |  2255K|   264M|   | 26673   (1)| 00:00:02 |  Q1,02 | P->P | HASH       |
|  30 |           HASH GROUP BY            |              |  2255K|   264M|   489M| 26673   (1)| 00:00:02 |  Q1,02 | PCWP |        |
|* 31 |            HASH JOIN               |              |  3761K|   441M|   |   841   (9)| 00:00:01 |  Q1,02 | PCWP |        |
|  32 |         PX RECEIVE             |              |   161 |  1610 |   | 1   (0)| 00:00:01 |  Q1,02 | PCWP |        |
|  33 |          PX SEND BROADCAST         | :TQ10000         |   161 |  1610 |   | 1   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |
|  34 |           PX SELECTOR              |              |   |   |   |        |      |  Q1,00 | SCWC |        |
|* 35 |            TABLE ACCESS INMEMORY FULL  | OBIEE_TIME_MTH_D     |   161 |  1610 |   | 1   (0)| 00:00:01 |  Q1,00 | SCWP |        |
|* 36 |         HASH JOIN              |              |  3773K|   406M|   |   838   (9)| 00:00:01 |  Q1,02 | PCWP |        |
|  37 |          PX RECEIVE            |              |  1787 |   160K|   | 2   (0)| 00:00:01 |  Q1,02 | PCWP |        |
|  38 |           PX SEND BROADCAST        | :TQ10001         |  1787 |   160K|   | 2   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |
|  39 |            PX SELECTOR             |              |   |   |   |        |      |  Q1,01 | SCWC |        |
|* 40 |             HASH JOIN              |              |  1787 |   160K|   | 2   (0)| 00:00:01 |  Q1,01 | SCWC |        |
|  41 |              TABLE ACCESS INMEMORY FULL| OBIEE_GEO_AIRPORT_BRIDGE |  1787 | 83989 |   | 1   (0)| 00:00:01 |  Q1,01 | SCWP |        |
|  42 |              TABLE ACCESS INMEMORY FULL| OBIEE_COUNTY_HIER    |  3221 |   141K|   | 1   (0)| 00:00:01 |  Q1,01 | SCWP |        |
|  43 |          PX BLOCK ITERATOR         |              |  3773K|    75M|   |   834   (9)| 00:00:01 |  Q1,02 | PCWC |        |
|* 44 |           TABLE ACCESS INMEMORY FULL   | PERFORMANCE          |  3773K|    75M|   |   834   (9)| 00:00:01 |  Q1,02 | PCWP |        |
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("D1"."C3"="D2"."C4" AND SYS_OP_MAP_NONNULL("D1"."C4")=SYS_OP_MAP_NONNULL("D2"."C3"))
  13 - access("T233484"."STCTY_FIPS"="T233820"."SC_CODE")
  15 - access("T233484"."AIRPORT"="T233609"."ORIGIN")
  17 - access("T233732"."CAL_MONTH"=TO_NUMBER("T233609"."MONTH") AND "T233732"."CAL_QTR"=TO_NUMBER("T233609"."QUARTER") AND
          "T233732"."CAL_YEAR"=TO_NUMBER("T233609"."YEAR"))
  18 - inmemory("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
       filter("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
  19 - inmemory(TO_NUMBER("T233609"."MONTH")>=6 AND TO_NUMBER("T233609"."MONTH")<=12)
       filter(TO_NUMBER("T233609"."MONTH")>=6 AND TO_NUMBER("T233609"."MONTH")<=12)
  21 - inmemory(SYS_OP_BLOOM_FILTER(:BF0001,"T233484"."AIRPORT"))
       filter(SYS_OP_BLOOM_FILTER(:BF0001,"T233484"."AIRPORT"))
  23 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T233820"."SC_CODE"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T233820"."SC_CODE"))
  31 - access("T233732"."CAL_MONTH"="T233937"."MONTH" AND "T233732"."CAL_QTR"="T233937"."QUARTER" AND "T233732"."CAL_YEAR"="T233937"."YEAR")
  35 - inmemory("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
       filter("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
  36 - access("T233484"."AIRPORT"="T233937"."ORIGIN")
  40 - access("T233484"."STCTY_FIPS"="T233820"."SC_CODE")
  44 - inmemory("T233937"."MONTH">=6 AND "T233937"."MONTH"<=12)
       filter("T233937"."MONTH">=6 AND "T233937"."MONTH"<=12)

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of table property

80 rows selected.

Indeed, looking at the queries the BI Server is sending to the database it’s not too surprising the in-memory difference in this case wasn’t too dramatic. According to the docs (emphasis mine):

Storing a database object in the IM column store can improve performance significantly for the following types of operations performed on the database object 

  • A query that scans a large number of rows and applies filters that use operators such as the following: =, <, >, and IN
  • A query that selects a small number of columns from a table or materialized view with a large number of columns, such as a query that selects five columns from a table with 100 columns
  • A query that joins a small table to a large table
  • A query that aggregates data

The IM column store does not improve performance for the following types of operations:

  • Queries with complex predicates
  • Queries that select a large number of columns
  • Queries that return a large number of rows
  • Queries with multiple large table joins

and our query certainly has complex predicates, returns a fair few rows, has large and lots of joins etc. Taking a more simple query that you’d likely write yourself if querying a data warehouse, you can see the in-memory table access being used again but a much simpler, an cheaper explain plan:

SQL> @inmem_explain.sql
SQL> set echo on
SQL> explain plan for
  2  select /*+ INMEMORY */ sum(T233937.ACTUALELAPSEDTIME) as c1,
  3   sum(T233937.WEATHERDELAY) as c2,
  4   sum(T233937.SECURITYDELAY) as c3,
  5   sum(T233937.NASDELAY) as c4,
  6   sum(T233937.LATEAIRCRAFTDELAY) as c5,
  7   sum(T233937.ARRDELAYMINUTES) as c6,
  8   sum(T233937.CARRIERDELAY) as c7,
  9   sum(nvl(casewhen T233937.CANCELLED < 1 then T233937.FLIGHTS end  , 0)) as c8,
 10   T233820.D_NAME as c9
 11  from
 12   BI_AIRLINES.OBIEE_GEO_AIRPORT_BRIDGE T233484 /* 10 GEO_AIRPORT_ORIGIN */ ,
 13   BI_AIRLINES.OBIEE_COUNTY_HIER T233820 /* 11 COUNTY_HIER_ORIGIN */ ,
 14   BI_AIRLINES.OBIEE_GEO_AIRPORT_BRIDGE T233497 /* 12 GEO_AIPORT_DEST */ ,
 15   BI_AIRLINES.OBIEE_COUNTY_HIER T233831 /* 13 COUNTY_HIER_DEST */ ,
 16   BI_AIRLINES.OBIEE_TIME_MTH_D T233732 /* 41 TIME MONTH */ ,
 17   BI_AIRLINES.DISTANCE_GROUP_250 T233594 /* 19 DISTANCE_GROUP_250 */ ,
 18   BI_AIRLINES.PERFORMANCE T233937 /* 00 PERFORMANCE FULL */
 19  where  ( T233484.AIRPORT = T233937.ORIGIN and T233484.STCTY_FIPS = T233820.SC_CODE and T233497.AIRPORT = T233937.DEST and T233497.STCTY_FIPS = T233831.SC_CODE and T233594.DESCRIPTION = '1000-1249 Miles' and T233594.CODE = T233937.DISTANCEGROUP and T233732.Cal_Month = T233937.MONTH and T233732.Cal_Qtr = T233937.QUARTER and T233732.Cal_Year = T233937.YEAR and T233831.R_NAME = 'Northeast Region' and T233732.Cal_Month between 6 and 12 and T233937.MONTH between 6 and 12 )
 20  group by T233820.D_NAME
 21  order by c9
 22  /
 
Explained.
 
SQL> set lines 300
SQL> set pages 0
SQL> select * from table(dbms_xplan.display);
Plan hash value: 3055743864

-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                  |     9 |  1314 |   883  (13)| 00:00:01 |    |      |        |
|   1 |  PX COORDINATOR                 |                  |       |       |        |          |    |      |        |
|   2 |   PX SEND QC (ORDER)                | :TQ10006             |     9 |  1314 |   883  (13)| 00:00:01 |  Q1,06 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY                |                  |     9 |  1314 |   883  (13)| 00:00:01 |  Q1,06 | PCWP |        |
|   4 |     PX RECEIVE                  |                  |     9 |  1314 |   883  (13)| 00:00:01 |  Q1,06 | PCWP |        |
|   5 |      PX SEND RANGE              | :TQ10005             |     9 |  1314 |   883  (13)| 00:00:01 |  Q1,05 | P->P | RANGE       |
|   6 |       HASH GROUP BY             |                  |     9 |  1314 |   883  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|*  7 |        HASH JOIN                |                  | 60775 |  8665K|   882  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|   8 |     PX RECEIVE              |                  |  3221 | 99851 |     1   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|   9 |      PX SEND BROADCAST          | :TQ10000             |  3221 | 99851 |     1   (0)| 00:00:01 |  Q1,00 | S->P | BROADCAST  |
|  10 |       PX SELECTOR               |                  |       |       |        |          |  Q1,00 | SCWC |        |
|  11 |        TABLE ACCESS INMEMORY FULL       | OBIEE_COUNTY_HIER        |  3221 | 99851 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |        |
|* 12 |     HASH JOIN               |                  | 60775 |  6825K|   881  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  13 |      PX RECEIVE             |                  |  1787 | 17870 |     1   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  14 |       PX SEND BROADCAST         | :TQ10001             |  1787 | 17870 |     1   (0)| 00:00:01 |  Q1,01 | S->P | BROADCAST  |
|  15 |        PX SELECTOR              |                  |       |       |        |          |  Q1,01 | SCWC |        |
|  16 |         TABLE ACCESS INMEMORY FULL      | OBIEE_GEO_AIRPORT_BRIDGE |  1787 | 17870 |     1   (0)| 00:00:01 |  Q1,01 | SCWP |        |
|* 17 |      HASH JOIN              |                  | 60775 |  6231K|   880  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  18 |       PX RECEIVE                |                  |   161 |  1610 |     1   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  19 |        PX SEND BROADCAST            | :TQ10002             |   161 |  1610 |     1   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |
|  20 |         PX SELECTOR             |                  |       |       |        |          |  Q1,02 | SCWC |        |
|* 21 |          TABLE ACCESS INMEMORY FULL     | OBIEE_TIME_MTH_D         |   161 |  1610 |     1   (0)| 00:00:01 |  Q1,02 | SCWP |        |
|* 22 |       HASH JOIN             |                  | 60964 |  5655K|   879  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  23 |        JOIN FILTER CREATE           | :BF0000              |     1 |    19 |     1   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  24 |         PX RECEIVE              |                  |     1 |    19 |     1   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  25 |          PX SEND BROADCAST          | :TQ10003             |     1 |    19 |     1   (0)| 00:00:01 |  Q1,03 | S->P | BROADCAST  |
|  26 |           PX SELECTOR           |                  |       |       |        |          |  Q1,03 | SCWC |        |
|* 27 |            TABLE ACCESS INMEMORY FULL   | DISTANCE_GROUP_250       |     1 |    19 |     1   (0)| 00:00:01 |  Q1,03 | SCWP |        |
|* 28 |        HASH JOIN                |                  |   670K|    48M|   878  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  29 |         JOIN FILTER CREATE          | :BF0001              |   318 |  9540 |     2   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  30 |          PX RECEIVE             |                  |   318 |  9540 |     2   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  31 |           PX SEND BROADCAST         | :TQ10004             |   318 |  9540 |     2   (0)| 00:00:01 |  Q1,04 | S->P | BROADCAST  |
|  32 |            PX SELECTOR          |                  |       |       |        |          |  Q1,04 | SCWC |        |
|* 33 |         HASH JOIN           |                  |   318 |  9540 |     2   (0)| 00:00:01 |  Q1,04 | SCWC |        |
|  34 |          JOIN FILTER CREATE     | :BF0002              |   217 |  4340 |     1   (0)| 00:00:01 |  Q1,04 | PCWP |        |
|* 35 |           TABLE ACCESS INMEMORY FULL| OBIEE_COUNTY_HIER        |   217 |  4340 |     1   (0)| 00:00:01 |  Q1,04 | SCWP |        |
|  36 |          JOIN FILTER USE        | :BF0002              |  1787 | 17870 |     1   (0)| 00:00:01 |  Q1,04 | PCWP |        |
|* 37 |           TABLE ACCESS INMEMORY FULL| OBIEE_GEO_AIRPORT_BRIDGE |  1787 | 17870 |     1   (0)| 00:00:01 |  Q1,04 | SCWP |        |
|  38 |         JOIN FILTER USE         | :BF0000              |  3773K|   165M|   874  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  39 |          JOIN FILTER USE            | :BF0001              |  3773K|   165M|   874  (13)| 00:00:01 |  Q1,05 | PCWP |        |
|  40 |           PX BLOCK ITERATOR         |                  |  3773K|   165M|   874  (13)| 00:00:01 |  Q1,05 | PCWC |        |
|* 41 |            TABLE ACCESS INMEMORY FULL   | PERFORMANCE          |  3773K|   165M|   874  (13)| 00:00:01 |  Q1,05 | PCWP |        |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("T233484"."STCTY_FIPS"="T233820"."SC_CODE")
  12 - access("T233484"."AIRPORT"="T233937"."ORIGIN")
  17 - access("T233732"."CAL_MONTH"="T233937"."MONTH" AND "T233732"."CAL_QTR"="T233937"."QUARTER" AND "T233732"."CAL_YEAR"="T233937"."YEAR")
  21 - inmemory("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
       filter("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
  22 - access("T233594"."CODE"="T233937"."DISTANCEGROUP")
  27 - inmemory("T233594"."DESCRIPTION"='1000-1249 Miles')
       filter("T233594"."DESCRIPTION"='1000-1249 Miles')
  28 - access("T233497"."AIRPORT"="T233937"."DEST")
  33 - access("T233497"."STCTY_FIPS"="T233831"."SC_CODE")
  35 - inmemory("T233831"."R_NAME"='Northeast Region')
       filter("T233831"."R_NAME"='Northeast Region')
  37 - inmemory(SYS_OP_BLOOM_FILTER(:BF0002,"T233497"."STCTY_FIPS"))
       filter(SYS_OP_BLOOM_FILTER(:BF0002,"T233497"."STCTY_FIPS"))
  41 - inmemory("T233937"."MONTH">=6 AND "T233937"."MONTH"<=12 AND SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0001,"T233937"."DEST"),SYS
          _OP_BLOOM_FILTER(:BF0000,"T233937"."DISTANCEGROUP")))
       filter("T233937"."MONTH">=6 AND "T233937"."MONTH"<=12 AND SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0001,"T233937"."DEST"),SYS_O
          P_BLOOM_FILTER(:BF0000,"T233937"."DISTANCEGROUP")))

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of table property

75 rows selected.

If I then turn-off the in-memory feature and regenerate the execution plan, you can see without in-memory the plan is around 5x as expensive:

SQL> alter system set INMEMORY_SIZE = 0 scope = spfile;

System altered.

SQL> shutdown immediate;
ORA-01097: cannot shutdown while in a transaction - commit or rollback first
SQL> rollback;

Rollback complete.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 7398752256 bytes
Fixed Size          2941528 bytes
Variable Size        1056968104 bytes
Database Buffers     6325010432 bytes
Redo Buffers           13832192 bytes
Database mounted.
Database opened.
SQL> @noinmem_explain.sql

Explained.

SQL> set lines 300
SQL> set pages 0
SQL> select * from table(dbms_xplan.display);
Plan hash value: 2990499928

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |     9 |  1341 |  4086   (1)| 00:00:01 |        |      |        |
|   1 |  TEMP TABLE TRANSFORMATION                |               |       |       |        |          |        |      |        |
|   2 |   LOAD AS SELECT                      | SYS_TEMP_0FD9D6605_275335 |       |       |        |          |        |      |        |
|*  3 |    HASH JOIN                          |               |   318 |  9540 |    22   (0)| 00:00:01 |        |      |        |
|*  4 |     TABLE ACCESS FULL                     | OBIEE_COUNTY_HIER         |   217 |  4340 |    13   (0)| 00:00:01 |        |      |        |
|   5 |     TABLE ACCESS FULL                     | OBIEE_GEO_AIRPORT_BRIDGE  |  1787 | 17870 |     9   (0)| 00:00:01 |        |      |        |
|   6 |   PX COORDINATOR                      |               |       |       |        |          |        |      |        |
|   7 |    PX SEND QC (ORDER)                     | :TQ10008              |     9 |  1341 |  4086   (1)| 00:00:01 |  Q1,08 | P->S | QC (ORDER) |
|   8 |     SORT GROUP BY                     |               |     9 |  1341 |  4086   (1)| 00:00:01 |  Q1,08 | PCWP |        |
|   9 |      PX RECEIVE                       |               |     9 |  1341 |  4086   (1)| 00:00:01 |  Q1,08 | PCWP |        |
|  10 |       PX SEND RANGE                   | :TQ10007              |     9 |  1341 |  4086   (1)| 00:00:01 |  Q1,07 | P->P | RANGE       |
|  11 |        HASH GROUP BY                      |               |     9 |  1341 |  4086   (1)| 00:00:01 |  Q1,07 | PCWP |        |
|* 12 |     HASH JOIN                     |               |   281 | 41869 |  4085   (1)| 00:00:01 |  Q1,07 | PCWP |        |
|  13 |      PX RECEIVE                   |               |   281 | 33158 |  4072   (1)| 00:00:01 |  Q1,07 | PCWP |        |
|  14 |       PX SEND HYBRID HASH                 | :TQ10005              |   281 | 33158 |  4072   (1)| 00:00:01 |  Q1,05 | P->P | HYBRID HASH|
|  15 |        STATISTICS COLLECTOR               |               |       |       |        |          |  Q1,05 | PCWC |        |
|* 16 |         HASH JOIN BUFFERED                |               |   281 | 33158 |  4072   (1)| 00:00:01 |  Q1,05 | PCWP |        |
|  17 |          VIEW                     | VW_GBC_29             |   281 | 30348 |  4063   (1)| 00:00:01 |  Q1,05 | PCWP |        |
|  18 |           HASH GROUP BY               |               |   281 | 79523 |  4041   (1)| 00:00:01 |  Q1,05 | PCWP |        |
|  19 |            PX RECEIVE                 |               |   281 | 79523 |  4041   (1)| 00:00:01 |  Q1,05 | PCWP |        |
|  20 |         PX SEND HASH                  | :TQ10003              |   281 | 79523 |  4041   (1)| 00:00:01 |  Q1,03 | P->P | HASH    |
|  21 |          HASH GROUP BY                |               |   281 | 79523 |  4041   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|* 22 |           HASH JOIN               |               | 60853 |    16M|  4039   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|  23 |            BUFFER SORT                |               |       |       |        |          |  Q1,03 | PCWC |        |
|  24 |             PX RECEIVE                |               |   318 |  1272 |     2   (0)| 00:00:01 |  Q1,03 | PCWP |        |
|  25 |              PX SEND BROADCAST            | :TQ10000              |   318 |  1272 |     2   (0)| 00:00:01 |        | S->P | BROADCAST  |
|  26 |               TABLE ACCESS FULL           | SYS_TEMP_0FD9D6605_275335 |   318 |  1272 |     2   (0)| 00:00:01 |        |      |        |
|* 27 |            HASH JOIN                  |               | 60853 |    16M|  4037   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|  28 |             PX RECEIVE                |               |   160 |  4640 |     4   (0)| 00:00:01 |  Q1,03 | PCWP |        |
|  29 |              PX SEND BROADCAST            | :TQ10002              |   160 |  4640 |     4   (0)| 00:00:01 |  Q1,02 | S->P | BROADCAST  |
|  30 |               PX SELECTOR             |               |       |       |        |          |  Q1,02 | SCWC |        |
|  31 |                MERGE JOIN CARTESIAN       |               |   160 |  4640 |     4   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|* 32 |             VIEW                  | index$_join$_006          |     1 |    19 |     2   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|* 33 |              HASH JOIN            |               |       |       |        |          |  Q1,02 | SCWC |        |
|  34 |               BITMAP CONVERSION TO ROWIDS     |               |     1 |    19 |     1   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|* 35 |                BITMAP INDEX SINGLE VALUE      | M_INDEX32             |       |       |        |          |  Q1,02 | SCWP |        |
|  36 |               BITMAP CONVERSION TO ROWIDS     |               |     1 |    19 |     1   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|  37 |                BITMAP INDEX FULL SCAN     | INDEX4            |       |       |        |          |  Q1,02 | SCWP |        |
|  38 |             BUFFER SORT           |               |   161 |  1610 |     2   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|  39 |              BITMAP CONVERSION TO ROWIDS      |               |   161 |  1610 |     2   (0)| 00:00:01 |  Q1,02 | SCWC |        |
|* 40 |               BITMAP INDEX FAST FULL SCAN     | M_INDEX28             |       |       |        |          |  Q1,02 | SCWP |        |
|* 41 |             VIEW                  | VW_ST_167D3604        | 61043 |    14M|  4033   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|  42 |              NESTED LOOPS             |               | 61043 |  4768K|  4029   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|  43 |               BUFFER SORT             |               |       |       |        |          |  Q1,03 | PCWC |        |
|  44 |                PX RECEIVE             |               |       |       |        |          |  Q1,03 | PCWP |        |
|  45 |             PX SEND HASH (BLOCK ADDRESS)      | :TQ10001              |       |       |        |          |        | S->P | HASH (BLOCK|
|  46 |              BITMAP CONVERSION TO ROWIDS      |               | 61042 |  1311K|   365   (1)| 00:00:01 |        |      |        |
|  47 |               BITMAP AND              |               |       |       |        |          |        |      |        |
|  48 |                BITMAP MERGE           |               |       |       |        |          |        |      |        |
|  49 |                 BITMAP KEY ITERATION      |               |       |       |        |          |        |      |        |
|* 50 |                  VIEW             | index$_join$_255          |     1 |    19 |     2   (0)| 00:00:01 |        |      |        |
|* 51 |                   HASH JOIN           |               |       |       |        |          |        |      |        |
|  52 |                    BITMAP CONVERSION TO ROWIDS|               |     1 |    19 |     1   (0)| 00:00:01 |        |      |        |
|* 53 |                 BITMAP INDEX SINGLE VALUE | M_INDEX32             |       |       |        |          |        |      |        |
|  54 |                    BITMAP CONVERSION TO ROWIDS|               |     1 |    19 |     1   (0)| 00:00:01 |        |      |        |
|  55 |                 BITMAP INDEX FULL SCAN    | INDEX4            |       |       |        |          |        |      |        |
|* 56 |                  BITMAP INDEX RANGE SCAN      | PERF_DISTANCEGRP          |       |       |        |          |        |      |        |
|  57 |                BITMAP MERGE           |               |       |       |        |          |        |      |        |
|  58 |                 BITMAP KEY ITERATION      |               |       |       |        |          |        |      |        |
|  59 |                  TABLE ACCESS FULL        | SYS_TEMP_0FD9D6605_275335 |   318 |  1272 |     2   (0)| 00:00:01 |        |      |        |
|* 60 |                  BITMAP INDEX RANGE SCAN      | PERF_DEST             |       |       |        |          |        |      |        |
|  61 |                BITMAP MERGE           |               |       |       |        |          |        |      |        |
|* 62 |                 BITMAP INDEX RANGE SCAN   | PERF_MONTH            |       |       |        |          |        |      |        |
|  63 |               TABLE ACCESS BY USER ROWID      | PERFORMANCE           |     1 |    58 |  3669   (1)| 00:00:01 |  Q1,03 | PCWP |        |
|  64 |          PX RECEIVE                   |               |  1787 | 17870 |     9   (0)| 00:00:01 |  Q1,05 | PCWP |        |
|  65 |           PX SEND BROADCAST               | :TQ10004              |  1787 | 17870 |     9   (0)| 00:00:01 |  Q1,04 | S->P | BROADCAST  |
|  66 |            PX SELECTOR                |               |       |       |        |          |  Q1,04 | SCWC |        |
|  67 |         TABLE ACCESS FULL             | OBIEE_GEO_AIRPORT_BRIDGE  |  1787 | 17870 |     9   (0)| 00:00:01 |  Q1,04 | SCWP |        |
|  68 |      PX RECEIVE                   |               |  3221 | 99851 |    13   (0)| 00:00:01 |  Q1,07 | PCWP |        |
|  69 |       PX SEND HYBRID HASH                 | :TQ10006              |  3221 | 99851 |    13   (0)| 00:00:01 |  Q1,06 | S->P | HYBRID HASH|
|  70 |        PX SELECTOR                    |               |       |       |        |          |  Q1,06 | SCWC |        |
|  71 |         TABLE ACCESS FULL                 | OBIEE_COUNTY_HIER         |  3221 | 99851 |    13   (0)| 00:00:01 |  Q1,06 | SCWP |        |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T233497"."STCTY_FIPS"="T233831"."SC_CODE")
   4 - filter("T233831"."R_NAME"='Northeast Region')
  12 - access("T233484"."STCTY_FIPS"="T233820"."SC_CODE")
  16 - access("T233484"."AIRPORT"="ITEM_1")
  22 - access("C0"="ITEM_1")
  27 - access("T233732"."CAL_YEAR"="ITEM_5" AND "T233732"."CAL_QTR"="ITEM_4" AND "T233732"."CAL_MONTH"="ITEM_3" AND "T233594"."CODE"="ITEM_2")
  32 - filter("T233594"."DESCRIPTION"='1000-1249 Miles')
  33 - access(ROWID=ROWID)
  35 - access("T233594"."DESCRIPTION"='1000-1249 Miles')
  40 - filter("T233732"."CAL_MONTH">=6 AND "T233732"."CAL_MONTH"<=12)
  41 - filter("ITEM_3"<=12 AND "ITEM_3">=6)
  50 - filter("T233594"."DESCRIPTION"='1000-1249 Miles')
  51 - access(ROWID=ROWID)
  53 - access("T233594"."DESCRIPTION"='1000-1249 Miles')
  56 - access("T233937"."DISTANCEGROUP"="T233594"."CODE")
  60 - access("T233937"."DEST"="C0")
  62 - access("T233937"."MONTH">=6 AND "T233937"."MONTH"<=12)

Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of table property
   - star transformation used for this statement

105 rows selected.

Running the actual queries in this case gives me a wall-time of around 4 seconds for the in-memory version, and 14 seconds when in-memory query is disabled … but the response time isn’t anywhere near the initial run where we had data stored row-wise but with aggregate tables, so let’s finish-off the testing by putting the aggregate table in-memory too, and see if that makes a difference.

And of course, it absolutely flew:

chart4

So to conclude from my look at the Oracle Database 12c In-Memory option with OBIEE11g v406 SampleApp, I’d say the following based on my initial tests:

  • For BI-type reporting where you’re typically summarising lots of data, the in-memory option doesn’t remove the need for aggregate tables – you’ll still benefit significantly from having them, in my observation
  • Where the in-memory option does benefit you is when you’re querying the detail-level data – it helps with aggregation but it’s main strength is fast filtering against subsets of columns
  • Some of the more complex SQL queries issued by OBIEE’s BI Server, for example when creating lots of subtotals and totals against a dataset, reduce the effectiveness of the in-memory option – you’ll get the biggest speed improvement, at least at the moment, with queries with simpler predicates and not so complex joins

There might be more to the aggregation story in the end, though. Looking at the Oracle Database 12c In-Memory Option Technical White Paper, the in-memory option should in-fact help with aggregation through a new optimiser transformation called “vector group by”, a transformation that’s likened to a star transformation that uses CPU-efficient algorithms and a multi-dimensional array created on-the-fly in the PGA called an “in-memory accumulator”.

NewImage

In fact, what we’ve heard is that many of the old Oracle OLAP team have moved over to the in-memory option team and were responsible for this feature, so I’ll be taking a closer look at in-memory aggregation in this new feature over the next few months. In my examples though, I didn’t see any examples of vector group by in the query execution plans, so I’m assuming either conditions weren’t right for it, or like star transformations there’s some combination of setting and query factors that need to be in place before it’ll appear in the execution plan (and queries presumably run that much faster).

For now though – that’s my first run-through of the 12c In-Memory Option in the context of a typical BI workload. Check back later in the year for more in-memory option postings, including hopefully something more on the in-memory aggregation feature.

Rittman Mead and Oracle Big Data Appliance

Over the past couple of years Rittman Mead have been broadening our skills and competencies out from core OBIEE, ODI and Oracle data warehousing into the new “emerging” analytic platforms: R and database advanced analytics, Hadoop, cloud and clustered/distributed systems. As we talked about in the recent series of updated Oracle Information Management Reference Architecture blog posts and my initial look at the Oracle Big Data SQL product, our customers are increasingly looking to complement their core Oracle analytics platform with ones to handle unstructured and big data, and as technologists we’re always interesting in what else we can use to help our customers get more insight out of their (total) dataset.

An area we’ve particularly focused on over the past year has been Hadoop and R analysis, with the recent announcement of our partnering with Cloudera and the recruitment of a big data and advanced analytics team operating our of our Brighton, UK office. We’ve also started to work on a number of projects and proof of concepts with customers in the UK and Europe, working mainly with core Oracle BI, DW and ETL customers looking to make their first move into Hadoop and big data. The usual pattern of engagement is for us to engage with some business users looking to analyse a dataset hitherto too large or too unstructured to load into their Oracle data warehouse, or where they recognise the need for more advanced analytics tools such as R, MapReduce and Spark but need some help getting started. Most often we put together a PoC Hadoop cluster for them using virtualization technology on existing hardware they own, allowing them to get started quickly and with no initial licensing outlay, with our preferred Hadoop distribution being Cloudera CDH, the same Hadoop distribution that comes on the Oracle Big Data Appliance. Projects then typically move on to Hadoop running directly on physical hardware, in a couple of cases Oracle’s Big Data Appliance, usually in conjunction with Oracle Database, Oracle Exadata and Oracle Exalytics for reporting.

One such project started off by the customer wanting to analyse a dataset that was too large for the space available in their Oracle database and that they couldn’t easily process or analyse using the SQL-based tools they usually used; in addition, like most large organisations, database and hardware provisioning took a long time and they needed to get the project moving quickly. We came in and quickly put together a virtualised Hadoop cluster together for them, on re-purposed hardware and using the free (Standard) edition of Cloudera CDH4, and then used the trial version of Oracle Big Data Connectors along with SFTP transfers to get data into the cluster and then analysed.

NewImage

The PoC itself then ran for just over a month with the bulk of the analysis being done using Oracle R Advanced Analytics for Hadoop, an extension to R that allows you to use Hive tables as a data source and create MapReduce jobs from within R itself; the output from the exercise was a series of specific-answer-to-specific-question R graphs that solved an immediate problem for the client, and showed the value of further investment in the technology and our services – the screenshot below shows a typical ORAAH session, in this case analyzing the flight delays dataset that you can also find on the Exalytics server and in smaller form in OBIEE 11g’s SampleApp dataset.

NewImage

That project has now moved onto a larger phase of work with Oracle Big Data Appliance used as the Hadoop platform rather than VMs, and Cloudera Hadoop upgraded from the free, unsupported Standard version to Cloudera Enterprise. The VMs in fact worked pretty well and had the advantage that they could be quickly spun-up and housed temporarily on an existing server, but were restricted by the RAM that we could assign to each VM – 2GB initially, quickly upgraded to 8GB per VM, and the fact that they were sharing CPU and IO resources. Big Data Appliance, by contrast, has 64GB or RAM per node – something that’s increasingly important now in-memory tools like Impala are begin used – and has InfiniBand networking between the nodes as well as fast network connections out to the wider network, something thats often overlooked when speccing up a Hadoop system.

The support setup for the BDA is pretty good as well; from a sysadmin perspective there’s a lights-out ILOM console for low-level administration, as well as plugins for Oracle Enterprise Manager 12c (screenshot below), and Oracle support the whole package, typically handling the hardware support themselves and delegating to Cloudera for more Hadoop-specific queries. I’ve raised several SRs on client support contracts since starting work on BDAs, and I’ve not had any problem with questions not being answered or buck-passing between Oracle and Cloudera.

NewImageOne thing that’s been interesting is the amount of actual work that you need to do with the Big Data Appliance beyond the actual installation and initial configuration by Oracle to “on-board” it into the typical enterprise environment. BDAs are left with customers in a fully-working state, but like Exalytics and Exadata though, initial install and configuration is just the start, and you’ve then got to integrate the platform in with your corporate systems and get developers on-boarded onto the platform. Tasks we’ve typically provided assistance with on projects like these include:

  • Configuring Cloudera Manager and Hue to connect to the corporate LDAP directory, and working with their security team to create LDAP groups for developer and administrative access that we then used to restrict and control access to these tools
  • Configuring other tools such as RStudio Server so that developers can be more productive on the platform
  • Putting in place an HDFS directory structure to support incoming data loads and data archiving, as well as directories to hold the output datasets from the analysis work we’re doing – all within the POSIX security setup that HDFS currently uses which limits us to just granting owner, group and world permissions on directories
  • Working with the client’s infrastructure team on things like alerting, troubleshooting and setting up backup and recovery – something that’s surprisingly tricky in the Hadoop world as Cloudera’s backup tools only backup from Hadoop-to-Hadoop, and by definition your Hadoop system is going to hold a lot of data, the volume of which your current backup tools aren’t going to easily handle

Once things are set up though you’ve got a pretty comprehensive platform that can be expanded up from the initial six nodes our customers’ systems typically start with to the full eighteen node cluster, and can use tools such as ODI to do data loading and movement, Spark and MapReduce to process and analyse data, and Hive, Impala and Pig to provide end-user access. The diagram below shows a typical future-state architecture we propose for clients on this initial BDA “starter config” where we’ve moved up to CDH5.x, with Spark and YARN generally used as the processing framework and with additional products such as MongoDB used for document-type storage and analysis:

NewImage

 

Something that’s turned out to be more of an issue on projects than I’d originally anticipated is complying with corporate security policies. By definition, most customers who buy an Oracle Big Data Appliance and going to be large customers with an existing Oracle database estate, and if they deal with the public they’re going to have pretty strict security and privacy rules you’ll need to adhere to. Something that’s surprising therefore to most customers new to Hadoop is how insecure or at least easily compromised the average Hadoop cluster is, with Hadoop FS shell security relying on trusted networks and incoming user connections and interfaces such as ODBC not checking passwords at all.

Hadoop and the BDA only becomes what’s termed “secure” when you link it to a Kerebos server, but not every customer has Kerebos set up and unless you enable this feature right at the start when you set up the BDA, it’s a fairly involved task to add retrospectively. Moreover, customers are used to fine-grained access control to their data, a single security model over their data and a good understanding in their heads as to how security works on their database, whereas Hadoop is still a collection of fairly-loosely coupled components with pretty primitive access controls, and no easy way to delete or redact data, for example, when a particular country’s privacy laws in-theory mandate this.

Like everything there’s a solution if you’re creative enough, with tools such as Apache Sentry providing role-based access control over Hive and Impala tables, alternative storage tools like HBase that permit read, write, update and delete operations on data rather than just HDFS’s insert and (table or partition-level) delete, and tools like Cloudera Navigator and BDA features like Oracle Audit Vault that provide administrators with some sort of oversight as to who’s accessing what data and when. As I mentioned in my blog post a couple of weeks ago, Oracle’s Big Data SQL product addresses this requirement pretty well, potentially allowing us to apply Oracle security over both relational, and Hadoop, datasets, but for now we’re working within current CDH4 capabilities and planning on introducing Apache Sentry for role-based access control to Hive and Impala in the coming weeks. We’re also looking at implementing Cloudera’s “secure gateway” cluster topology with all access restricted to just a single gateway Hadoop node, and the cluster itself firewalled-off with external access to just that gateway node and HTTP / REST API access to the various cluster services, for example as shown in the diagram below:

NewImage

My main focus on Hadoop projects has been on the overall Hadoop system architecture, and interacting with the client’s infrastructure and security teams to help them adopt the BDA and take over its maintenance. From the analysis side, it’s been equally as interesting, with a number of projects using tools such as R, Oracle R Advanced Analytics for Hadoop and core Hive/MapReduce for data analysis, Flume, Java and Python for data ingestion and processing, and most recently OBIEE11g for publishing the results out to a wider audience. Following the development model that we outlined in the second post in our updated Information Management Reference Architecture blog series, we typically split delivery of each project’s output into two distinct phases; a discovery phase, typically done using RStudio and Oracle R Advanced Analytics for Hadoop, where we explore and start understanding the dataset, presenting initial findings to the business and using their feedback and direction to inform the second phase; and a second, commercial exploitation phase where we use the discovery phases’ outputs and models to drive a more structured dimensional model with output begin in the form of OBIEE analyses and dashboards.

NewImage

We looked at several options for providing the datasets for OBIEE to query, with our initial idea being to connect OBIEE directly to Hive and Impala and let the users query the data in-place, directly on the Hadoop cluster, with an architecture like the one in the diagram below:

NewImage

In fact this turned out to not be possible, as whilst OBIEE 11.1.1.7 can access Apache Hive datasources, it currently only ships with HiveServer1 ODBC support, and no support for Cloudera Impala, which means we need to wait for a subsequent release of OBIEE11g to be able to report against the ODBC interfaces provided by CDH4 and CDH5 on the BDA (although ironically, you can get HiveServer2 and Impala working on OBIEE 11.1.1.7 on Windows, though this platform isn’t officially supported by Oracle for Hadoop access, only Linux). Whichever way though, it soon became apparent that even if we could get Hive and Impala access working, in reality it made more sense to use Hadoop as the data ingestion and processing platform – providing access to data analysts at this point if they wanted access to the raw datasets – but with the output of this then being loaded into an Oracle Exadata database, either via Sqoop or via Oracle Loader for Hadoop and ideally orchestrated by Oracle Data Integrator 12c, and users then querying these Oracle tables rather than the Hive and Impala ones on the BDA, as shown in the diagram below.

NewImage

In-practice, Oracle SQL is far more complete and expressive than HiveQL and Impala SQL and it makes more sense to use Oracle as the query platform for the vast majority of users, with data analysts and data scientists still able to access the raw data on Hadoop using tools like Hive, R and (when we move to CDH5) Spark.

The final thing that’s been interesting about working on Hadoop and Big Data Appliance projects is that 80% of it, in my opinion, is just the same as working on large enterprise data warehouse projects, with 20% being “the magic”. A large portion of your time is spent on analysing and setting up feeds into the system, just in this case you use tools like Flume instead of GoldenGate (though GoldenGate can also load into HDFS and Hive, something that’s useful for transactional database data sources vs. Flume’s focus on file and server log data sources). Another big part of the work is data processing, ingestion, reformatting and combining, again skills an ETL developer would have (though there’s much more reliance, at this point, on command-line tools and Unix utilities, albeit with a place for tools like ODI once you get to the set-based filtering, joining and aggregating phase). In most cases, the output of your analysis and processing will be Hive and Impala tables so that results can be analysed using tools such as OBIEE, and you therefore need skills in areas such as dimensional modelling, business analysis and dashboard prototyping as well as tool-specific skills such as OBIEE RPD development.

Where the “magic” happens, of course, is the data preparation and analysis that you do once the data is loaded, quite intensively and interactively in the discovery phase and then in the form of MapReduce and Spark jobs, Sqoop loads and Oozie workflows once you know what you’re after and need to process the data into something more tabular for tools like OBIEE to access. We’re building up a team competent in techniques such as large-scale data analysis, data visualisation, statistical analysis, text classification and sentiment analysis, and use of NoSQL and JSON-type data sources, which combined with our core BI, DW and ETL teams allows us to cover the project from end-to-end. It’s still relatively early days but we’re encouraged by the response from our project customers so far, and – to be honest – the quality of the Oracle big data products and the Cloudera platform they’re based around – and we’re looking forward to helping other Oracle customers get the most out of their adoption of these new technologies. 

If you’re an Oracle customer looking to make their first move into the worlds of Hadoop, big data and advanced analytics techniques, feel free to drop me an email at mark.rittman@rittmanmead.com  for some initial advice and guidance – the fact we come from an Oracle-centric background as well typically makes it easier for us to relate these new concepts to the ones you’re typically more familiar with. Similarly, if you’re about to bring on-board an Oracle Big Data Appliance system and want to know how best to integrate it in with your existing Oracle BI, DW, data integration and systems management estate, get in contact and I’d be happy to share experiences and our delivery approach.