Tag Archives: Cloud

Rittman Mead BI Forum 2015 Call for Papers Now Open!

I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions accepted until January 18th 2015. As in previous years the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:

  • Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
  • Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA : May 13th-15th 2015

Now in its seventh year, the Rittman Mead BI Forum is the only conference dedicated entirely to Oracle Business Intelligence, Oracle Business Analytics and the technologies and processes that support them – data warehousing, data analysis, data visualisation, big data and OLAP analysis. We’re looking for sessions around tips & techniques, project case-studies and success stories, and sessions where you’ve taken Oracle’s BI products and used them in new and innovative ways. Each year we select around eight-to-ten speakers for each event along with keynote speakers and a masterclass session, with speaker choices driven by attendee votes at the end of January, and editorial input from myself, Jon Mead, Charles Elliott and Jordan Meyer.


Last year we had a big focus on cloud, and a masterclass and several sessions on bringing Hadoop and big data to the world of OBIEE. This year we’re interested in project stories and experiences around cloud and Hadoop, and we’re keen to hear about any Oracle BI Apps 11g implementations or migrations from the earlier 7.9.x releases. Getting back to basics, we’re always interested in sessions around OBIEE, Essbase and data warehouse data modelling, and we’d particularly like to encourage session abstracts on data visualization, BI project methodologies and the incorporation of unstructured, semi-structured and external (public) data sources into your BI dashboards. For an idea of the types of presentations that have been selected in the past, check out the BI Forum 2014, 2013 and 2012 homepages, or feel free to get in touch via email at mark.rittman@rittmanmead.com.

The Call for Papers entry form is here, and we’re looking for speakers for Brighton, Atlanta, or both venues if you can speak at both. All sessions this year will be 45 minutes long, and we’ll be publishing submissions and inviting potential attendees to vote on their favourite sessions towards the end of January. Other than that – have a think about abstract ideas now, and make sure you get them in by January 18th 2015.

OBIEE SampleApp v406 Amazon EC2 AMI – available for public use

I wrote a while ago about converting Oracle’s superb OBIEE SampleApp from a VirtualBox image into an EC2-hosted instance. I’m pleased to announce that Oracle have agreed for us to make the image (AMI) on Amazon available publicly. This means that anyone who wants to run their own SampleApp v406 server on Amazon’s EC2 cloud service can do so.


Important caveats

Before getting to the juicy stuff there are some important points to note about access to the AMI, which you are implicitly bound by if you use it:

  1. In accessing it you’re bound by the same terms and conditions that govern the original SampleApp
  2. SampleApp is only ever for use in your own development/testing/prototyping/demonstrating with OBIEE. It must not be used as the basis for any kind of Productionisation.
  3. Neither Oracle nor Rittman Mead provide any support for SampleApp or the AMI, nor any warranty for issues caused through their use.
  4. Once launched, the server will be accessible to the public and it’s your responsibility to secure it as such.

How does it work?

  1. Create yourself an AWS account, if you haven’t already. You’ll need your credit card for this. Read more about getting started with AWS here.
  2. Request access to the AMI (below)
  3. Launch the AMI on your AWS account
  4. Everything starts up automagically. After 15-20 minutes, enjoy your fully functioning SampleApp v406 instance, running in the cloud!

How much does it cost?

You can get an estimate of the cost involved using the Amazon Calculator.

As a rough guide, as of November 2014 an “m3.large” instance costs around $4 a day  – but it’s your responsibility to check pricing and commitments.

Be aware that once a server is created you’ll incur costs on it right through until you “terminate” it. You can “stop” it (in effect, power it off) which reduces the running costs but you’ll still pay for the ‘disk’ (EBS volume) that holds it. The benefit of this though is that you can then power it back up and it’ll be as you left it (just with a different IP).
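If you prefer to manage the instance from the command line, here is a minimal sketch using the AWS CLI (this assumes you have the AWS CLI installed and configured; the instance ID below is just a placeholder for your own):

aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --region eu-west-1    # power off; EBS storage charges still apply
aws ec2 start-instances --instance-ids i-0123456789abcdef0 --region eu-west-1   # power back on (it will come up with a new public IP)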

You can track your AWS usage through the AWS page here.

Security

  • Access to the instance’s command line is through SSH as the oracle user using SSH keys only (provided by you when you launch the server) – no password access
    • You cannot ssh to the server as root; instead connect as oracle and use sudo as required.
    • The ssh key does not get set up until the very end of the first boot sequence, which can be 20 minutes. Be patient!
  • All the OBIEE/WebLogic usernames and passwords are per the stock SampleApp v406 image, so you are well advised to change them. Otherwise, if someone finds your instance running, they’ll be able to access it.
  • There is no firewall (iptables) running on the server. Since this is a public server you’d be wise to make use of Amazon’s Security Group functionality (in effect, a firewall at the virtual hardware level) to block access on all ports except those necessary.
    For example, you could block all traffic except 7780, and then enable access on port 22 (SSH) and 7001 (Admin Server) just when you need to access it for admin.
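If you’d rather script the Security Group setup than click through the console, here’s a rough sketch using the AWS CLI (assuming it is installed and configured; the group name is an example and 203.0.113.10 is a placeholder for your own IP address):

# create a dedicated security group for the SampleApp instance
aws ec2 create-security-group --group-name sampleapp-sg \
    --description "SampleApp v406 access" --region eu-west-1
# open the analytics port to everyone
aws ec2 authorize-security-group-ingress --group-name sampleapp-sg \
    --protocol tcp --port 7780 --cidr 0.0.0.0/0 --region eu-west-1
# open SSH and the Admin Server only from your own IP, just while you need them
aws ec2 authorize-security-group-ingress --group-name sampleapp-sg \
    --protocol tcp --port 22 --cidr 203.0.113.10/32 --region eu-west-1
aws ec2 authorize-security-group-ingress --group-name sampleapp-sg \
    --protocol tcp --port 7001 --cidr 203.0.113.10/32 --region eu-west-1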

Using the AMI

  1. You first need to get access to the AMI, through the form below. You also need an active AWS account.
  2. Launch the server:
    1. From the AWS AMI page locate the SampleApp AMI using the details provided when you request access through the form below. Make sure you are on the Ireland/eu-west-1 region. Click Launch.
    2. Select an Instance Type. An “m3.large” size is a good starting point (this site is useful to see the spec of all instances).
    3. Click through the Configure Instance Details, Add Storage, and Tag Instance screens without making changes unless you need to.
    4. On the Security Group page select either a dedicated security group if you have already configured one, or create a new one.
      A security group is a firewall that controls traffic to the server, regardless of whether a software firewall is configured on the instance. By default only port 22 (SSH) is open, so you’ll need to open at least 7780 for analytics, and 7001 too if you want to access WLS/EM as well
      Note that you can amend a security group’s rules once the instance is created, but you cannot change which security group it is bound to. For ad-hoc purposes I’d always use a dedicated security group per instance so that you can change rules just for your server without impacting others on your account.
    5. Click on Review and Launch, check what you’ve specified, and then click Launch. You’ll now need to either specify an existing SSH key pair, or generate a new one. It’s vital that you get this bit right, otherwise you’ll not be able to access the server. If you generate a new key pair, make sure you download it (it’ll be a .pem file).
    6. Click Launch Instances
      You’ll get a hyperlinked Instance ID; click on that and it’ll take you to the Instances page filtered for your new server.
      Shortly you’ll see the server’s public IP address shown.
  3. OBIEE is configured to start automagically at boot time along with the database. This means that in theory you don’t need to actually access the server directly. It does take 15-20 minutes on first boot to all fire up though, so be patient.
  4. The managed server is listening on port 7780, and admin server on 7001. If your server IP is 42.42.42.42 the URLs would be:
    • Analytics: http://42.42.42.42:7780/analytics
    • WLS: http://42.42.42.42:7001/console
    • EM: http://42.42.42.42:7001/em
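If you do want to get onto the server itself (for example, to watch the startup logs), here is a quick sketch using the key pair downloaded at launch time and the example IP above (the key file name is just an example):

chmod 400 my-keypair.pem
ssh -i my-keypair.pem oracle@42.42.42.42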

On the server

The server is a stock SampleApp v406 image, with a few extras:

  • obiee and dbora services configured and set to run at bootup. Control obiee using:
    sudo service obiee status
    sudo service obiee stop
    sudo service obiee start
    sudo service obiee restart
  • screen installed with a .screenrc setup

Accessing the AMI

To get access to the AMI, please complete this short form and we will send you the AMI details by email.

By completing the form and requesting access to the AMI, you are acknowledging that you have read and understood the terms and conditions set out by Oracle here.

Oracle BI Cloud Service for SaaS Application Reporting Part 1: Integrating BICS to Salesforce.com using REST APIs

Last month Mark Rittman wrote a series of posts detailing the Oracle BI Cloud Service (BICS), aimed at departmental users who want the power of OBIEE 11g without the need to stand up their own infrastructure. If you’re coming in late, here’s the link to the series.

Before the GA, Rittman Mead participated in the beta program for release one of Oracle’s Business Intelligence Cloud Service (BICS). The overall aim of the beta was both to understand the capabilities of the new BICS platform and to identify potential use cases for it. As Mark wrote, an excellent use case for BICS is to report on top of any SaaS application that exposes its data (stored in the cloud) using REST APIs, by taking advantage of the ApEx capabilities hosted in the Database Schema Service that comes with BICS. SaaS applications like Oracle’s Fusion CRM, Taleo or Salesforce.com – the latter being the one we used during the beta program – can easily be integrated and queried with BICS.

The technical goal of our beta program was to check the features, options and limitations of Oracle BICS by connecting it to a Salesforce.com instance, accessing the data exposed through REST APIs using the ApEx native functions, storing the data in the Database Schema Service, creating the Repository using the new Model Editor, and showing the data in dashboards while keeping the same data security settings configured in the source platform.

Salesforce.com to Oracle BI Cloud Service

This series of posts explains all the details of our successful PoC. Over the next few days we’ll be covering the following topics, and we’ll update the list with hyperlinks once the articles are published:

  • Oracle BI Cloud Service for SaaS Application Reporting Part 1: Integrating BICS to Salesforce.com using REST APIs
  • Oracle BI Cloud Service for SaaS Application Reporting Part 2: BICS ApEx components
  • Oracle BI Cloud Service for SaaS Application Reporting Part 3: BICS Repository and Front-end configurations

In the first post of the series we’re going to define the steps required to set up a Salesforce.com demo environment that can be accessed with REST APIs. Later in the post we analyse the most interesting Salesforce.com REST APIs that are used to extract the data from the platform. Before digging into the BICS-Salesforce.com integration, some definitions may be needed:

  • RESTful APIs: A Web service API can be defined as RESTful if it conforms to the REST architectural constraints. Some more details about RESTful APIs can be found here.
  • Salesforce.com: Salesforce.com is a company specialising in software as a service (SaaS) and one of the major providers of CRM in the cloud. In addition to CRM, Salesforce.com offers Force.com, a cloud platform as a service (PaaS) that developers can use to build multitenant applications hosted on Salesforce.com servers. All Force.com applications can be accessed using RESTful APIs.

Environments Setup

In a new BI project it is never safe to extract data directly from a live (production) environment, so the first step needed in order to test the BICS-Salesforce.com connection is to obtain a Salesforce.com environment that we can use without the risk of slowing it down or, even worse, crashing it and impacting users. A free Salesforce.com demo platform can be obtained by subscribing to the developer program and can be customised as needed. After requesting the Salesforce.com demo environment, a security token needs to be created; the token is associated with the user invoking the RESTful APIs. To generate the token: log in to Salesforce.com with the specific user -> click on the username -> My Settings -> Reset My Security Token. The new token is then sent to the user’s email address.

Salesforce Reset token Procedure

The third step in order to access Salesforce.com data with REST APIs is to define a Connected App. The creation of a Connected App generates a Consumer Key and a Consumer Secret that are used later during the REST login calls. Once the Connected App is defined, the last bit missing is to populate the Salesforce.com instance, since it’s empty by default; it can be populated manually or by writing population scripts based on the create REST API statements explained here. An Oracle BICS instance is also needed in order to analyse the Salesforce.com data; all the information regarding Oracle BICS and how to activate an instance can be found at this link. Having covered all the basics, it’s time to start analysing how the Salesforce.com data can be extracted.

Salesforce.com REST APIs

In the following sections we are going to analyse some of the REST APIs used in our process. The Salesforce.com REST APIs will be called with cURL commands; cURL is a command-line tool and library for transferring data with URL syntax.

Salesforce.com Authentication

The first step in downloading the data from any Salesforce.com instance is to perform the authentication and retrieve the access token; the access token is then used in all the following REST calls. There are various authentication mechanisms that can be used against Salesforce.com; for the aim of the beta program we used the one called “Username-Password OAuth Authentication Flow”, which is described in detail here and in the image below.

Salesforce.com User Password OAuth

The REST API authentication command is

curl -v https://login.salesforce.com/services/oauth2/token -d "grant_type=password" \
    -d "client_id=CLIENT_ID" -d "client_secret=CLIENT_SECRET" \
    -d "username=USERNAME" -d "password=PASSWORD_TOKEN"

where the following parameters must be assigned:

  • CLIENT_ID:  The client ID generated during the creation of the Connected App (step described in Basic Setup)
  • CLIENT_SECRET: The client Secret generated during the creation of the Connected App (step described in Basic Setup)
  • USERNAME: The username you want to use in order to access Salesforce.com data
  • PASSWORD_TOKEN: The concatenation of password and user security token (e.g. if the password is ABC and the security token is 123 then PASSWORD_TOKEN is ABC123)

The response (if all the parameters are correct) should be similar to the following:

{  
   "id":"https://login.salesforce.com/id/00Dx0000000BV7z/005x00000012Q9P",
   "issued_at":"1278448832702",
   "instance_url":"https://na1.salesforce.com",
   "signature":"0CmxinZir53Yex7nE0TD+zMpvIWYGb/bdJh6XfOH6EQ=",
   "access_token":"00Dx0000000BV7z!AR8AQAxo9UfVkh8AlV0Gomt9Czx9LjHnSSpwBMmbRcgKFmxOtvxjTrKW19ye6PE3Ds1eQz3z8jr3W7_VbWmEu4Q8TVGSTHxs"
}

where the most interesting parameters are:

  • instance_url: defines the salesforce instance to use.
  • access_token: defines the time-limited security token to use for all the following calls.
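For the calls that follow it’s handy to capture both values in shell variables. Here is a minimal sketch (assuming the jq JSON processor is installed, and using the same placeholder parameters as above):

AUTH=$(curl -s https://login.salesforce.com/services/oauth2/token -d "grant_type=password" \
    -d "client_id=CLIENT_ID" -d "client_secret=CLIENT_SECRET" \
    -d "username=USERNAME" -d "password=PASSWORD_TOKEN")
INSTANCE=$(echo "$AUTH" | jq -r '.instance_url')     # e.g. https://na1.salesforce.com
TOKEN=$(echo "$AUTH" | jq -r '.access_token')        # time-limited security token for later calls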

Salesforce.com List of Objects and Retrieve Metadata for an Object

Two of the Salesforce.com REST APIs were very useful when trying to build a general ETL that could be applied to any customised Salesforce.com instance:

  • Get list of Objects: retrieves the list of all objects available in Salesforce.com (custom or not) with some additional information for each object.
  • Retrieve Metadata for an Object: retrieves the list of columns for the selected object with related data types and additional information.

The Get list of Objects call is the following

curl INSTANCE/services/data/VERSION/sobjects/ -H "Authorization: Bearer TOKEN"

where

  • INSTANCE is the instance_url parameter retrieved from the authentication call.
  • TOKEN is the access_token parameter retrieved from the authentication call.
  • VERSION is the REST API version used; in our beta test we used v29.0.

The response should be similar to the following, in which you can see, for each object (identified by the “name” field), all the metadata available.

{  
   "encoding":"UTF-8",
   "maxBatchSize":200,
   "sobjects":[  
      {  
         "name":"Account",
         "label":"Account",
         "keyPrefix":"001",
         "labelPlural":"Accounts",
         "custom":false,
         "layoutable":true,
         "activateable":false,
         "urls":{  
            "sobject":"/services/data/v26.0/sobjects/Account",
            "describe":"/services/data/v26.0/sobjects/Account/describe",
            "rowTemplate":"/services/data/v26.0/sobjects/Account/{ID}"
         },
         "searchable":true,
         "updateable":true,
         "createable":true,
         "deprecatedAndHidden":false,
         "customSetting":false,
         "deletable":true,
         "feedEnabled":true,
         "mergeable":true,
         "queryable":true,
         "replicateable":true,
         "retrieveable":true,
         "undeletable":true,
         "triggerable":true
      },
      ...
   ]
}

The Retrieve Metadata for an Object call can be made for each object listed in the Get list of Objects response; the code is the following:

curl INSTANCE/services/data/VERSION/sobjects/OBJECT/ \
    -H "Authorization: Bearer TOKEN"

where:

  • INSTANCE, TOKEN and VERSION are the same parameters defined for the Get list of Objects call
  • OBJECT is the Salesforce.com object to retrieve

The response for the Account object is similar to the following

{  
   "objectDescribe":{  
      "name":"Account",
      "updateable":true,
      "label":"Account",
      "keyPrefix":"001",
      ...
      "replicateable":true,
      "retrieveable":true,
      "undeletable":true,
      "triggerable":true
   },
   "recentItems":[  
      {  
         "attributes":{  
            "type":"Account",
            "url":"/services/data/v20.0/sobjects/Account/001D000000INjVeIAL"
         },
         "Id":"001D000000INjVeIAL",
         "Name":"asdasdasd"
      },
      ...
   ]
}

Interesting parameters for each column are:

  • name: the column name.
  • retrievable: if true, the column can be included in a query.
  • soapType and length or byteLength: these fields provide the field type information and the length of the field.

With the three REST APIs analysed we could recreate all the objects (custom or not) in any Salesforce.com (or Force.com) instance as tables in the Oracle Database Schema Service; a minimal sketch of the metadata extraction loop is shown below.
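This sketch assumes the jq JSON processor is installed and reuses the INSTANCE and TOKEN variables captured during authentication; the output file names are just examples:

# list every object name, then fetch its metadata and save it for later processing
for OBJECT in $(curl -s $INSTANCE/services/data/v29.0/sobjects/ \
        -H "Authorization: Bearer $TOKEN" | jq -r '.sobjects[].name'); do
    curl -s $INSTANCE/services/data/v29.0/sobjects/$OBJECT/ \
        -H "Authorization: Bearer $TOKEN" > ${OBJECT}_metadata.json
done

In the next section we will see how to extract the data from the list of objects by using the Salesforce.com query capabilities.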

Salesforce.com query

Salesforce.com provides two methods of querying the objects: including or excluding deleted objects. The difference in the code is minimal: the first uses the /queryAll suffix while the second uses /query. A Salesforce.com REST API query call is:

curl INSTANCE/services/data/VERSION/query/?q=SELECT+LIST_OF_COLUMNS+from+OBJECT \
    -H "Authorization: Bearer TOKEN"

where:

  • INSTANCE and TOKEN parameters are the ones retrieved from the Authentication call
  • VERSION is the Salesforce.com REST API version used
  • OBJECT is the Salesforce.com object to query
  • LIST_OF_COLUMNS is the comma delimited list of columns to retrieve

The response of a query “select+Name+from+Account” is:

{  
   "done":true,
   "totalSize":14,
   "records":[  
      {  
         "attributes":{  
            "type":"Account",
            "url":"/services/data/v20.0/sobjects/Account/001D000000IRFmaIAH"
         },
         "Name":"Test 1"
      },
      {  
         "attributes":{  
            "type":"Account",
            "url":"/services/data/v20.0/sobjects/Account/001D000000IomazIAB"
         },
         "Name":"Test 2"
      },
      ...
   ]
}

In the JSON result you can find:

  • totalSize: the number of records retrieved
  • Name: for each record, the name of the Account

If the resultSet is too big, Salesforce.com will start paging results. If the initial query returns only part of the results, the end of the response will contain a field called nextRecordsUrl. For example:

"nextRecordsUrl" : "/services/data/v20.0/query/01gD0000002HU6KIAW-2000"

In order to get the next page of data, a call like the following is needed, passing the access token in the Authorization header as before.

curl INSTANCE/services/data/v20.0/query/01gD0000002HU6KIAW-2000 \
    -H "Authorization: Bearer TOKEN"

Salesforce.com limits

It’s important to be aware of the limits set by Salesforce.com on the number of REST API calls per day. To check the number of calls still available at any time, execute the following call.

curl INSTANCE/services/data/VERSION/limits/ -H "Authorization: Bearer TOKEN" -H "X-PrettyPrint: 1"

The DailyApiRequests -> Remaining field contained in the response shows the number of calls still available.

{
    "DailyApiRequests":
    {
        "Remaining":"4980",
        "Max":"5000"
    },
    "DailyAsyncApexExecutions":
    {
        "Remaining":"250000",
        "Max":"250000"
    },
    ...
}
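If you just want the remaining daily call count, a one-liner sketch (jq assumed installed, INSTANCE and TOKEN as before):

curl -s $INSTANCE/services/data/v29.0/limits/ -H "Authorization: Bearer $TOKEN" \
    | jq -r '.DailyApiRequests.Remaining'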

Resultset output format

Salesforce.com’s default output format is JSON, which is a common format for most cloud applications. However, there is also the option to retrieve the results in XML format by setting the HTTP Accept header to “application/xml”. The first release of BICS is based on Oracle Database 11g, which natively supports XML but not JSON; for this reason, during our beta program with Oracle BI Cloud Service we decided to use the XML output. Once BICS is bundled with Oracle Database 12c, which supports both JSON and XML natively, the default JSON output format could be kept instead.
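As a quick sketch, here is the same example query from earlier with XML output requested (INSTANCE and TOKEN as captured in the authentication step):

curl "$INSTANCE/services/data/v29.0/query/?q=select+Name+from+Account" \
    -H "Authorization: Bearer $TOKEN" -H "Accept: application/xml"

In this post we defined the steps required to set up a Salesforce.com demo environment that can be accessed with REST APIs, and looked at the most interesting Salesforce.com REST APIs that we used to extract the data in order to analyse it with the Oracle BI Cloud Service platform. In the next post we will look in more detail at the BICS ApEx part, analysing how the Salesforce.com REST APIs can be called from ApEx.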

Oracle OpenWorld 2014 is over – What’s next?

Last week Oracle OpenWorld 2014 took place in San Francisco. I did not have the pleasure of attending this event, but thanks to social media and the World Wide Web it was possible to follow the highlights. If we check out the keynote by Thomas Kurian, we can learn that there are three major trends: Big…Read more Oracle OpenWorld 2014 is over – What’s next?

News and Updates from Oracle Openworld 2014

It’s the Saturday after Oracle Openworld 2014, and I’m now home from San Francisco and back in the UK. It’s been a great week as usual, with lots of product announcements and updates to the BI, DW and Big Data products we use on current projects. Here’s my take on what was announced this last week.

New Products Announced

From a BI and DW perspective, the most significant product announcements were around Hadoop and Big Data. Up to this point most parts of an analytics-focused big data project required you to code the solution yourself, with the diagram below showing the typical three steps in a big data project – data ingestion, analysis and sharing the results.


At the moment, all of these steps are typically performed from the command-line using languages such as Python, R, Pig, Hive and so on, with tools like Apache Flume and Apache Sqoop used to bring data into and out of the Hadoop cluster. Under the covers, these tools use technologies such as MapReduce or Spark to do their work, automatically running jobs in parallel across the cluster and making use of the easy scalability of Hadoop and NoSQL databases.

You can also neatly divide the work on a big data project into two phases: the “discovery” phase, typically performed by a data scientist, where data is loaded, analysed, correlated and otherwise “understood” to provide the initial insights, and then an “exploitation” phase where we apply governance, provide the output data in a format usable by BI tools and otherwise share the results with the wider corporate audience. The updated Information Management Reference Architecture we collaborated on with Oracle and launched in June this year had distinct discovery and exploitation phases, and the architecture itself made a clear distinction between the Innovation part that enabled the discovery phase of a project and the Execution part that delivered the insights and data in a more governed, production setting.


This was the theme of the product announcements around analytics, BI, data warehousing and big data during Openworld 2014, with Oracle’s Omri Traub taking us through Oracle’s big data product strategy. What Oracle are doing here is productising and “democratising” big data, putting it clearly in the context of their existing database, engineered systems and BI products and linking them all together into an overall information management architecture and delivery process.


So working through from ingestion to data analysis, these steps have typically been performed by data scientists using scripting tools and rudimentary data visualisation engines, making them labour-intensive and reliant on a small set of people conversant with these tools and processes. Oracle Big Data Discovery is aimed squarely at these steps, and combines Apache Spark-based data preparation and transformation capabilities with an analysis and visualisation engine based on Endeca Server.


Key features of Big Data Discovery include:

  • Ability to analyse, parse, explore and “wrangle” data using graphical tools and a Spark-based transformation engine
  • Create a catalog of the data on your Hadoop cluster, and then search that catalog using Endeca Server search technologies
  • Create recommendations of other datasets that might interest you, based on what you’re looking at now
  • Visualize your datasets to help understand what they contain, and discover new insights

Under the covers it comprises two parts: the data loading, transformation and profiling part that uses Apache Spark to do its work in parallel across all the nodes in the cluster, and the analysis part, which takes data prepared by Apache Spark and loads it into the Endeca Server in-memory engine to perform the analysis, aggregation and data visualisation. Unlike the Spark part, the Endeca Server element runs on just one node and limits the size of the analysis dataset to what can run in-memory in the Endeca Server engine, but in practice you’re going to work with a sample of the data rather than the entire dataset at that stage (in time the assumption is that the Endeca Server engine will be unbundled and run natively on YARN, giving it the same scalability as the Spark-based data ingestion and transformation part). Initially Big Data Discovery will run on-premise with a cloud version later on, and it’s not dependent on Big Data Appliance – expect to see something later this year / early next year.

Another new product that addresses the discovery phase and discovery lab part of a big data project is Oracle Data Enrichment Cloud Service, from the Oracle Data Integration team and designed to complement ODI and Oracle EDQ. Whilst Oracle positioned ODECS as something you’d use as well as Big Data Discovery and typically upstream from BDD, to me there seemed to be a fair bit of overlap between the products, with both tools doing data profiling and transformation but BDD being more focused on the exploration and discovery part, and ODECS being more focused on early-stage data profiling and transformation.


ODECS is clearly more of an ETL tool complement and runs natively in the cloud, right from the start. It’s most probably aimed at customers with their Hadoop dataset already in the cloud, maybe using Amazon Elastic MapReduce or Oracle’s new Hadoop-as-a-Service, and has more in common with the old Data Quality Option for Oracle Warehouse Builder than Endeca’s search-first analytic interface. It’s got a very nice interface including a mobile-enabled website and the ability to include and merge in external datasets, including Oracle’s own Data as a Service platform offering. Along with the new Metadata Management tool Oracle also launched at Openworld, it’s a great addition to the Oracle Data Integration product suite, but I can’t help thinking that its initial availability only on Oracle’s public cloud platform is going to limit its use with Oracle’s typical customers – we’ll have to just wait and see.

The other major product that addresses big data projects was Oracle Big Data SQL. Partly addressing the discovery phase of big data projects but mostly (to my mind) addressing the exploitation phase, and the execution part of the information management architecture, Big Data SQL gives Oracle Exadata the ability to return data from Hive and NoSQL on the Big Data Appliance as well as data from its normal relational store. I covered Big Data SQL on the blog a few weeks ago and I’ll be posting some more in-depth articles on it next week, but the other main technical innovation with the product is its bringing of Exadata’s SmartScan feature to Hadoop, projecting and filtering data at the Hadoop storage node level and also giving Hadoop the ability to understand regular Oracle SQL, rather than the cut-down version you get with HiveQL.


Where this then leaves us is with the ability to do most of a big data project using (Oracle) tools, bringing big data analysis within reach of organisations with Oracle-style budgets but without access to rare data scientist-type resources. Going back to my diagram earlier, a post-OOW big data project using the new products launched in this last week could look something like this:


Big Data SQL is out now and depends on BDA and Exadata for its use; Big Data Discovery should be out in a few months’ time and runs on-premise but doesn’t require BDA, whilst ODECS is cloud-only and runs on a BDA in the background. Expect more news and more integration/alignment from the products as 2014 ends and 2015 starts, and we’re looking forward to using them on Oracle-centric Hadoop projects in the near future.

Product Updates for BI, Data Integration, Exalytics, BI Applications and OBIEE

Other news announced over the week for products we more commonly use on projects include:

Finally, something that we were particularly pleased to see was the updated Oracle Information Management Architecture I mentioned earlier referenced in most of the analytics sessions, with Oracle’s Balaji Yelamanchili for example introducing it in his big data and business analytics general session mid-way through the week. 


We love the way this brings together the big data components and puts them in the context of the wider data warehouse and analytic processes, and compared to a few years ago when Hadoop and big data were considered completely separate to data warehousing and BI, and done by staff completely different to the core business analytics team, this new reference architecture puts it squarely within the world of BI and analytics we work in. It also emphasises the new abilities Hadoop, NoSQL databases and big data can bring us – support for wider sets of data sources with dynamic schemas, the ability to economically work with and analyse much larger datasets, and support for discovery-type upfront analysis work. Finally, it recognises that to get true value out of analysis that starts on Hadoop, you eventually need to add proper data governance, make the results more widely available using full SQL tools, and use the right tools – relational databases, OLAP servers and the like – to analyse the data once it’s in a more structured form.

If you missed our write-up on the updated Information Management Reference Architecture you can read our two-part blog post here and here, read the Oracle white paper, or listen to the podcast with OTN Archbeat’s Bob Rhubart. For now though I’m looking forward to seeing the family after a week and a half away in San Francisco – thanks to OTN and the Oracle ACE Director Program for sponsoring my visit over to SF for Openworld, and we’ll post our conference presentation slides later next week when we’re back in the UK and US offices.