Tag Archives: Oracle Database

Preview of Maria Colgan and Andrew Bond/Stewart Bryson Sessions at RM BI Forum 2014

We’ve got a great selection of presentations at the two upcoming Rittman Mead BI Forum 2014 events in Brighton and Atlanta, including sessions on Endeca, TimesTen, OBIEE (of course), ODI, GoldenGate, Essbase and Big Data (full timetable for both events here). Two of the sessions I’m particularly looking forward to, though, are one by Maria Colgan, product manager for the new In-Memory Option for Oracle Database, and another by Andrew Bond and Stewart Bryson on an update to Oracle’s reference architecture for Data Warehousing and Information Management.

The In-Memory Option for Oracle Database was of course the big news item from last year’s Oracle Openworld, promising to bring in-memory analytics and column storage to the Oracle Database. Maria is well known to the Oracle BI and Data Warehousing community through her work with the Oracle Database Cost-Based Optimizer, so we’re particularly glad to have her at the Atlanta BI Forum 2014 to talk about what’s coming with this new feature. I asked Maria to jot down a few words for the blog on what she’ll be covering, so over to Maria:


“At Oracle Open World last year, Oracle announced the upcoming availability of the Oracle Database In-Memory option, a solution for accelerating database-driven business decision-making to real-time. Unlike specialized In-Memory Database approaches that are restricted to particular workloads or applications, Oracle Database 12c leverages a new in-memory column store format to speed up analytic workloads. Given this announcement and the performance improvements promised by this new functionality, is it still necessary to create a separate access and performance layer in your data warehouse environment, or to run your Oracle data warehouse on an Exadata environment?

This session explains in detail how Oracle Database In-Memory works and will demonstrate just how much of a performance improvement you can expect. We will also discuss how it integrates into the existing Oracle Data Warehousing Architecture and with an Exadata environment.”

The other session I’m particularly looking forward to is one being delivered jointly by Oracle’s Andrew Bond and our own Stewart Bryson. Andrew heads up Enterprise Architecture at Oracle and, along with Doug Cackett, was responsible for the various data warehousing, information management and big data reference architectures we’ve covered on the blog over the past few years, including the first update to include “big data” a year or so ago.

Back towards the start of this year, Stewart, Jon Mead and I met up with Andrew and his team to work together on an update to this reference architecture, and Stewart carried on with the collaboration afterwards, bringing some of our ideas around agile development, big data and data warehouse design into the final architecture. Stewart and Andrew will be previewing the updated reference architecture at the Brighton BI Forum event, and in the meantime, here’s a preview from Andrew:

“I’m very excited to be attending the event and unveiling Oracle’s latest iteration of the Information Management reference architecture. In this version we have focused on a pragmatic approach to “Analytics 3.0” and in particular looked at bringing an agile methodology to break down the IT / business barrier. We’ve also examined the exploitation of in-memory technologies and the Hadoop ecosystem, and how to guide customers through the plethora of new technology choices.

We’ve worked very closely with a number of key customers and partners on this version – most notably Rittman Mead – and I’m delighted that Stewart and I will be able to co-present the architecture and receive immediate feedback from delegates.”

Full details of the event, running in Brighton on May 7-9th 2014 and Atlanta, May 15th-17th 2014, can be found on the Rittman Mead BI Forum 2014 homepage, and the agendas for the two days are on this blog post from earlier in the week.

Rittman Mead BI Forum 2014 Registration Now Open – Don’t Miss Out!

Just a quick reminder to say that registration for the Rittman Mead BI Forum 2014 is now open, with the speaker and presentation list now up on the event website. As with previous years, the BI Forum runs in Brighton on the first week, and then moves over to Atlanta on the second, with the dates and venues as follows:

  • Brighton: Hotel Seattle, May 7th-9th 2014
  • Atlanta: Renaissance Midtown Hotel, May 15th-17th 2014

We’ve got a fantastic line-up of sessions and speakers, including:

  • Oracle ACE and past BI Forum best speaker winner Kevin McGinley, on adding third-party visualisations to OBIEE
  • Sessions from TimesTen PMs Chris Jenkins and Susan Cheung on what’s coming with TimesTen
  • Wayne Van Sluys from InterRel, on Essbase optimisation
  • Oracle’s Andrew Bond, and our own Stewart Bryson (Oracle ACE) with an update to Oracle’s reference BI, DW and Big Data Architecture
  • Dan Vlamis on using Oracle Database analytics with the Oracle BI Applications
  • Sessions from Oracle’s Jack Berkowitz, Adam Bloom and Matt Bedin on what’s coming with OBIEE and Oracle BI Applications
  • Peak Indicators’ Alastair Burgess on tuning TimesTen with Aggregate Persistence
  • Endeca sessions from Chris Lynskey (PM) and Omri Traub (Development Manager), along with ones from Branchbird’s Patrick Rafferty and Truls Bergersen
  • And sessions from Rittman Mead’s Robin Moffatt (OBIEE performance), Gianni Ceresa (Essbase) and Michael Rainey (ODI, with Nick Hurt from IFPI)

We’ve also got some excellent keynote sessions including one in the US from Maria Colgan on the new in-memory database option, and another in Brighton from Matt Bedin and Adam Bloom on BI in the Cloud – along with the opening-night Oracle product development keynote in both Brighton and Atlanta.

We’re also very excited to welcome Lars George from Cloudera to deliver this year’s optional one-day masterclass, on Hadoop, big data, and how Oracle BI&DW developers can get started with this technology. Lars is Cloudera’s Chief Architect in EMEA and an HBase committer, and he’ll be covering topics such as:

  • What is Hadoop, what’s in the Hadoop ecosystem and how do you design a Hadoop cluster
  • Using tools such as Flume and Sqoop to import data into Hadoop, and then analyse it using Hive, Pig, Impala and Cloudera Search
  • Introduction to NoSQL and HBase
  • Connecting Hadoop to tools such as OBIEE and ODI using JDBC, ODBC, Impala and Hive

If you’ve been meaning to take a look at Hadoop, or if you’ve made a start but would like a chance to discuss techniques with someone who’s out in the field every week designing and building Hadoop systems, this session is aimed at you – it’s on the Wednesday before each event and you can book at the same time as registering for the main BI Forum days.

Attendance is limited to around seventy at each event, and we’re running the Brighton BI Forum back at the Hotel Seattle, whilst the US one is running at the Renaissance Midtown Hotel, Atlanta. We encourage attendees to stay at the hotel as well so as to maximise networking opportunities, and this year you can book US accommodation directly with the hotel so you can collect any Marriott points, corporate discounts etc. As usual, we’ll take good care of you over the two or three days, with meals each night, drinks receptions and lots of opportunities to meet colleagues and friends in the industry.

Full details are on the BI Forum 2014 web page including links to the registration sites. Book now so you don’t miss out – each year we sell out in advance, so don’t leave it to the last minute if you’re thinking of coming. Hopefully see you all in Brighton and Atlanta in May 2014!

Using Oracle R Enterprise to Analyze Large In-Database Datasets

The other week I posted an article on the blog about Oracle R Advanced Analytics for Hadoop (ORAAH), part of Oracle’s Big Data Connectors and used for running certain types of R analysis over a Hadoop cluster. ORAAH lets you move data in and out of HDFS and Hive and into in-memory R data frames, and gives you the ability to create Hadoop MapReduce jobs using R commands and syntax. If you’re looking to use R to analyse, prepare and explore your data, and you’ve got access to a large Hadoop cluster, ORAAH is a useful way to go beyond the normal memory constraints of R running on your laptop.
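
For a flavour of what that looks like before we move on, here’s a minimal sketch in the same console style as the examples later in this post. It’s a hedged illustration rather than a tested recipe: it assumes ORAAH’s ORCH package with a Hadoop connection already configured, that hdfs.put(), hadoop.run(), orch.keyval() and hdfs.get() behave as described in the ORAAH documentation, and the local flights.df data frame and the exact argument names are hypothetical.

> library(ORCH)
> # copy a local R data frame (flights.df, hypothetical) up into HDFS
> dfs.flights <- hdfs.put(flights.df, dfs.name = "flights")
> # express a MapReduce job as R functions: average arrival delay per destination airport
> res <- hadoop.run(dfs.flights,
+   mapper  = function(key, val) {
+     # emit each flight's arrival delay keyed by destination (assumes row-at-a-time mapping)
+     orch.keyval(val$DEST, val$ARRDELAY)
+   },
+   reducer = function(key, vals) {
+     orch.keyval(key, mean(unlist(vals), na.rm = TRUE))
+   })
> # the reduced result is small, so pull it back into a local data frame
> head(hdfs.get(res))

ORE, as we’ll see in a moment, takes the opposite approach and pushes the work down into the Oracle Database instead.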

But what if the data you want to analyse is currently in an Oracle database? You can export the relevant tables to flat files and then import them into HDFS, or you can use a tool such as Sqoop to copy the data directly into HDFS and Hive tables. Another option you could consider, though, is to run your R analysis directly on the database tables, avoiding the need to move data around and taking advantage of the scalability of your Oracle database – which is where Oracle R Enterprise comes in.

Oracle R Enterprise is part of the Oracle Database Enterprise Edition “Advanced Analytics Option”, so it’s licensed separately to ORAAH and the Big Data Connectors. What it gives you is three things:

  • Some client packages to install locally on your desktop, into regular R (or ideally, Oracle’s R distribution)
  • Some database server-side R packages to provide a “transparency layer”, converting R commands into SQL ones, along with extra SQL stats functions to support R
  • The ability to spawn off R engines within the Oracle Database using the extproc mechanism, for performing R analysis directly on the data rather than through the client on your laptop

Where this gets interesting for us is that the ORE transparency layer makes it simple to move data in and out of the Oracle Database, but more importantly it allows us to use database tables and views as R “ore.frames” – proxies for “data frames”, the R equivalent to database tables and the basic data set that R commands work on. Going down this route avoids the need to export the data we’re interested in out of the Oracle Database, with the ORE transparency layer converting most R function calls to Oracle Database SQL ones – meaning that we can use the data analyst-friendly R language whilst using Oracle under the covers for the heavy lifting.

There’s more to ORE than just the transparency layer, but let’s take a look at how you might use ORE and this feature, using the same “flight delays” dataset I used in my post a couple of months ago on Hadoop, Hive and Impala. We’ll use the OBIEE 11.1.1.7.1 SampleApp v309R2 that you can download from OTN as it’s got Oracle R Enterprise already installed, although you’ll need to follow step 10 in the accompanying deployment guide to install the R packages that Oracle couldn’t distribute along with SampleApp.

In the following examples, we’ll:

  • Connect to the main PERFORMANCE fact table in the BI_AIRLINES schema, read in its metadata (columns), and then set it up as a “virtual” R data frame that actually points through to the database table
  • Then we’ll perform some basic analysis, binning and totalling for that table, to give us a sense of what’s in it
  • And then we’ll run some more R analysis on the table, outputting the results in the form of graphs and answering questions such as “which days of the week are best to fly out on?” and “how has airlines’ relative on-time performance changed over time?”

Let’s start off, then, by starting the R console and connecting to the database schema containing the flight delays data.

[oracle@obieesample ~]$ R
 
Oracle Distribution of R version 2.15.1  (--) -- "Roasted Marshmallows"
Copyright (C)  The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
 
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
 
  Natural language support but running in an English locale
 
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
 
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
 
You are using Oracle's distribution of R. Please contact
Oracle Support for any problems you encounter with this
distribution.
 
[Previously saved workspace restored]
 
> library(ORE)
Loading required package: OREbase
 
Attaching package: ‘OREbase’
 
The following object(s) are masked from ‘package:base’:
 
    cbind, data.frame, eval, interaction, order, paste, pmax, pmin,
    rbind, table
 
Loading required package: OREstats
Loading required package: MASS
Loading required package: OREgraphics
Loading required package: OREeda
Loading required package: OREdm
Loading required package: lattice
Loading required package: OREpredict
Loading required package: ORExml
> ore.connect("bi_airlines","orcl","localhost","BI_AIRLINES",all=TRUE)
Loading required package: ROracle
Loading required package: DBI
> 

Note that “library(ORE)” loads up the Oracle R Enterprise R libraries, and “ore.connect” connects the R session to the relevant Oracle database.

I then synchronise R’s view of the objects in this database schema with its own metadata views, list out what tables are available to us in that schema, and attach that schema to my R session so I can manipulate them from there.

> ore.sync()
> ore.ls()
 [1] "AIRCRAFT_GROUP"           "AIRCRAFT_TYPE"           
 [3] "AIRLINE_ID"               "AIRLINES_USER_DATA"      
 [5] "CANCELLATION"             "CARRIER_GROUP_NEW"       
 [7] "CARRIER_REGION"           "DEPARBLK"                
 [9] "DISTANCE_GROUP_250"       "DOMESTIC_SEGMENT"        
[11] "OBIEE_COUNTY_HIER"        "OBIEE_GEO_AIRPORT_BRIDGE"
[13] "OBIEE_GEO_ORIG"           "OBIEE_ROUTE"             
[15] "OBIEE_TIME_DAY_D"         "OBIEE_TIME_MTH_D"        
[17] "ONTIME_DELAY_GROUPS"      "PERFORMANCE"             
[19] "PERFORMANCE_ENDECA_MV"    "ROUTES_FOR_LINKS"        
[21] "SCHEDULES"                "SERVICE_CLASS"           
[23] "UNIQUE_CARRIERS"         
> ore.attach("bi_airlines")
> 

Now although we know these objects as database tables, what ORE does is present them to R as data frames, using the ore.frame object as a proxy – the data frame being the fundamental data structure in R, and the closest thing R has to a relational table. Behind the scenes, ORE maps these ore.frames to the underlying Oracle structures and turns R commands into SQL function calls, including a bunch of new ones added specifically for ORE. Note that this is conceptually different to Oracle R Advanced Analytics for Hadoop, which doesn’t map (or overload) standard R functions to their Hadoop (MapReduce or Hive) equivalents – it instead gives you a set of new R functions that you can use to create MapReduce jobs, which you then submit to a Hadoop cluster for processing. ORE, in contrast, tries to map as much R functionality as possible to Oracle Database functions, allowing you to run normal R sessions whilst the Oracle Database does the work of processing your bigger queries closer to the data.

Let’s use another two R commands to see how it views the PERFORMANCE table in the flight delays data set, and get some basic sizing metrics.

> class(PERFORMANCE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> dim(PERFORMANCE)
[1] 6362422     112

Now at this point I could pull the data from one of those tables directly into an in-memory R data frame, like this:

> carriers <- ore.pull(UNIQUE_CARRIERS)
Warning message:
ORE object has no unique key - using random order 
> class(UNIQUE_CARRIERS)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> class(carriers)
[1] "data.frame"
> 

As you see, R sees the UNIQUE_CARRIERS object as an ore.frame, whilst carriers (into which data from UNIQUE_CARRIERS was loaded) is a regular data.frame object. In some cases you might want to load data from Oracle tables into a regular data.frame, but what’s interesting here is that we can work directly with ore.frame objects and let the Oracle database do the hard work. So let’s get to work on the PERFORMANCE ore.frame object and do some initial analysis and investigation.

> df <- PERFORMANCE[,c("YEAR","DEST","ARRDELAY")]
> class(df)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
> head(df)
  YEAR DEST ARRDELAY
1 2010  BOI      -13
2 2010  BUF       44
3 2010  BUF      -14
4 2010  BUR       -6
5 2010  BUR       -2
6 2010  BUR       -9
Warning messages:
1: ORE object has no unique key - using random order 
2: ORE object has no unique key - using random order 
> options(ore.warn.order = FALSE)
> head(PERFORMANCE[,c(1,4,23)])
  YEAR DAYOFMONTH DESTWAC
1 2010         16      83
2 2010         16      22
3 2010         16      22
4 2010         16      91
5 2010         16      91
6 2010         16      91
>

In the above script, the first command creates a temporary ore.frame object made up of just three of the columns from the PERFORMANCE table / ore.frame. Then I switch off the warning about these tables not having unique keys (“options(ore.warn.order = FALSE)”), and then I select three more columns directly from the PERFORMANCE table / ore.frame.

> aggdata <- aggregate(PERFORMANCE$DEST,
+                      by = list(PERFORMANCE$DEST),
+                      FUN = length)
> class(aggdata)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
 
> head(aggdata)
    Group.1     x
ABE     ABE  4104
ABI     ABI  2497
ABQ     ABQ 33298
ABR     ABR     5
ABY     ABY  1028
ACK     ACK   346
 
> (t <- table(PERFORMANCE$DAYOFWEEK))
 
     1      2      3      4      5      6      7 
943305 924442 932113 942066 956123 777203 887170
 
> dat = PERFORMANCE[PERFORMANCE$ARRDELAY<100 & PERFORMANCE$ARRDELAY>-100,]
> ad = with(dat, split(ARRDELAY,UNIQUECARRIER))
> boxplot(ad,col = "blue", notch = TRUE, cex = 0.5, varwidth = TRUE)

In the above set of scripts, I first aggregate flights by destination airports, then count flights by day of week. In the final set of commands I get a bit more advanced and create a box plot graph showing the range of flight delays by airline, which produces the following graph from the R console:

[Plot: box plot of arrival delay by airline]

whereas in the next one I create a histogram of flight delays (minutes), showing the vast majority of delays are just a few minutes.

> ad = PERFORMANCE$ARRDELAY
> ad = subset(ad, ad>-200&ad<200)
> hist(ad, breaks = 100, main = "Histogram of Arrival Delay")

[Plot: histogram of arrival delay]

All of this so far, to be fair, you could do just as easily in SQL or in a tool like Excel, but they’re the sort of commands an R analyst would want to run before getting onto the interesting stuff, and it’s great that they could now do this on the full dataset in an Oracle database, not just on what they can pull into memory on their laptop. Let’s do something more interesting now, and answer the question “which day of the week is best for flying out, in terms of not hitting delays?”

> ontime <- PERFORMANCE
> delay <- ontime$ARRDELAY
> dayofweek <- ontime$DAYOFWEEK
> bd <- split(delay, dayofweek)
> boxplot(bd, notch = TRUE, col = "red", cex = 0.5,
+         outline = FALSE, axes = FALSE,
+         main = "Airline Flight Delay by Day of Week",
+         ylab = "Delay (minutes)", xlab = "Day of Week")

[Plot: airline flight delay by day of week]

Looks like Tuesday’s the best. So how has a selection of airlines performed over the past few years?

> ontimeSubset <- subset(PERFORMANCE, UNIQUECARRIER %in% c("AA", "AS", "CO", "DL","WN","NW")) 
> res22 <- with(ontimeSubset, tapply(ARRDELAY, list(UNIQUECARRIER, YEAR), mean, na.rm = TRUE))
> g_range <- range(0, res22, na.rm = TRUE)
> rindex <- seq_len(nrow(res22))
> cindex <- seq_len(ncol(res22))
> par(mfrow = c(2,3))
> for(i in rindex) {
+   temp <- data.frame(index = cindex, avg_delay = res22[i,])
+   plot(avg_delay ~ index, data = temp, col = "black",
+        axes = FALSE, ylim = g_range, xlab = "", ylab = "",
+        main = attr(res22, "dimnames")[[1]][i])
+        axis(1, at = cindex, labels = attr(res22, "dimnames")[[2]]) 
+        axis(2, at = 0:ceiling(g_range[2]))
+        abline(lm(avg_delay ~ index, data = temp), col = "green") 
+        lines(lowess(temp$index, temp$avg_delay), col="red")
+ } 
>

[Plot: average arrival delay by year for a selection of airlines]

See this presentation from the BIWA SIG for more examples of ORE queries against the flight delays dataset, which you can adapt from the ONTIME_S dataset that ships with ORE as part of the install.

Now where R and ORE get really interesting, in the context of BI and OBIEE, is when you embed R scripts directly in the Oracle Database and use them to provide forecasting, modelling and other “advanced analytics” features, using the database’s extproc mechanism to spin up R engines on demand. Once you’ve done this, you can expose the calculations through an OBIEE RPD, as Oracle have done in the OBIEE 11.1.1.7.1 SampleApp, shown below:

[Image: OBIEE 11.1.1.7.1 SampleApp analysis using in-database R calculations]
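
To give a rough idea of the mechanics, here’s a minimal sketch of ORE’s embedded R execution, in the same console style as before – a hedged illustration only, assuming the RQADMIN privilege that ore.scriptCreate() requires and re-using the PERFORMANCE ore.frame from earlier; the script name and the exact arguments are illustrative.

> # store an R function in the database's script repository (assumes the RQADMIN role)
> ore.scriptCreate("delay_trend", function(dat) {
+   # inside the database-spawned R engine, dat arrives as a regular data frame
+   agg <- aggregate(dat$ARRDELAY, by = list(YEAR = dat$YEAR), FUN = mean, na.rm = TRUE)
+   fit <- lm(x ~ YEAR, data = agg)
+   data.frame(YEAR = agg$YEAR, AVG_DELAY = agg$x, FITTED = fitted(fit))
+ })
> # run it in the database against the ore.frame, then pull the small result set back
> res <- ore.tableApply(PERFORMANCE[,c("YEAR","ARRDELAY")], FUN.NAME = "delay_trend")
> ore.pull(res)

It’s the SQL side of the same mechanism – table functions such as rqEval and rqTableEval – that an OBIEE RPD can then be pointed at.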

But that’s really an article in itself – so I’ll cover this process and how you surface it all through OBIEE in a follow-up post soon.

New OTN Article: Making the Move from Oracle Warehouse Builder to Oracle Data Integrator 12c

One of the reasons I’ve been silent on the blog lately regarding new Oracle Data Integrator 12c content (even though I promised it some months ago) is that I’ve been writing an article focused on migration options from Oracle Warehouse Builder (OWB). Published recently on OTN, this article hopefully helps customers understand some of the options Oracle has included to make good on their recent Statement of Direction for Data Integration.

Making the Move to ODI 12c

While the article is focused primarily on OWB customers looking to make the move, there’s plenty of value here for organizations already familiar with ODI 11g. I walk through an Enterprise Installation of ODI 12c on Linux, including Fusion Middleware domain creation and JEE agent configuration. Following that, I explore two new technical capabilities for OWB shops included in the new tool. The first is Runtime Integration, which allows the execution of OWB mappings and process flows directly from ODI, with complete auditing and load plan integration. The second is the new Migration Utility, which will migrate the majority of customer mappings directly from OWB to ODI 12c. I then embark on a functional assessment of what it means to add value to a data integration project, and how organizations can use these new features to achieve it. I present three different approaches for OWB shops to implement ODI 12c: the Side-by-Side Approach, the Big Bang Approach, and the Phased Approach. Understanding that no one size fits all, I’m sure organizations all over the spectrum will recognize a fit in at least one of these solutions.

As always, early access and beta participation means that Rittman Mead is prepared to implement an OWB migration project immediately using any of these three approaches. If you have any questions, please comment here on the blog, or reach out to us directly.

Oracle Openworld 2013 : Reflections on Product Announcements and Strategy

I’m sitting writing this at my desk back home, with a steaming mug of tea next to me and the kids pleased to see me after having been away for eight days (or at least, my wife pleased to hand them over to me after looking after them for eight days). It was an excellent Oracle Openworld – probably the best in the ten years I’ve been going, in terms of product announcements – and if you missed any of my daily updates, here are the links to them:

We also delivered sixteen sessions over the week, and whilst a few of them can’t be circulated because they contain details of beta or forthcoming products, here are links to the ones that we can post:

So then, on reflection, what did I think about the various product announcements during the week? Here’s my thoughts now I’m back in the UK.

First off – Exalytics. Clearly there’s a lot of investment going into the Exalytics offering, both from the hardware and the software sides. For hardware, it’s just really a case of Oracle keeping up with additions to Sun’s product line, and with the announcement of the T5-8 model we’re now up to 4TB of RAM and 128 SPARC CPU cores – aimed at the BI consolidation market, where 1 or 2TB of RAM quickly goes if you’re hosting a number of separate BI systems. Cost-wise – it’s correspondingly expensive, about twice the price of the X3-4 machine, but it’s got twice the RAM, three times the CPU cores and runs Solaris, so you’ve got access to the more fine-grained workload separation and virtualisation that you get on that platform. Not a machine that I can see us buying for a while, but there’s definitely a market for this.

With Exalytics though, you could argue that it’s been the software that’s underwhelmed so far, as opposed to the hardware. The Summary Advisor is good, but it doesn’t really handle the subsequent incremental refresh of the aggregate tables, and TimesTen itself, whilst fast and powerful, hasn’t had a great “out of the box” experience – in the wrong hands it can give misleadingly-slow response times, something I found myself a few months ago, back on the blog. So it was interesting to hear some of the new features that we’re likely to see in “Exalytics v2.0”, probably late in calendar year 2014: an updated aggregate refresh mechanism based on DAC Server technology and with support for GoldenGate; new visualisations including data mash-up capabilities that I’m guessing we’ll see as exclusives on Exalytics and Oracle’s cloud products; enhancements coming for Essbase that’ll make it easier to spin off ASO cubes from an OBIEE repository; and of course, the improvements to TimesTen to match those coming in the core Oracle database – in-memory analytics.

And what an announcement that was – in-memory column-store technology within the Oracle database, not predicated on using Exadata, and all running transparently in the background with minimal DBA setup required. Now in reality this is not the first in-memory Oracle database offering – the Exadata boxes in previous Openworld presentations were also positioned as in-memory, but that was flash memory, not DRAM – and Oracle isn’t the first vendor to offer an in-memory column-store as a feature; but given that it’ll be available to all Oracle 12.1.2 databases that license the In-Memory option, and that it’ll be so easy to administer – in theory – it’s a potential industry game-changer.

Of course the immediate questions on my lips after the in-memory Oracle Database announcement were “what about TimesTen”, and “what about TimesTen’s role in Exalytics”, but Oracle played this in the end very well – TimesTen will gain similar capabilities, implemented in a slightly different way as TimesTen already stores its data in memory, albeit in row-store format – and in fact TimesTen can then take on the role of a business-controlled, mid-tier analytic “sandbox”, probably receiving new in-memory features faster than the core Oracle database as it’s got fewer dependencies and a shorter release cycle, but complementing the Oracle database and its own, more large-scale in-memory features. And that’s not forgetting those customers with data from multiple, heterogeneous sources, or those that can’t afford to stump up for the In-Memory option for all of the processors in their data warehouse database server. So – fairly internally consistent, at least at the product roadmap level, and we’ll be looking to get on any betas or early adopter programs to put both products through their paces.

The other major announcement that affects OBIEE customers is, of course, OBIEE in the Cloud – or “Reporting-as-a-Service” as Oracle referred to it during the keynotes. This is one of the components of Oracle’s new “platform-as-a-service” or PaaS offering, alongside a new, full version of Oracle 12c based on its new multitenant architecture, identity-management-as-a-service, documents-as-a-service and so on. What reporting-as-a-service will give us isn’t quite “OBIEE in the cloud”, or at least not as we know it now; Oracle’s view on platform-as-a-service is that it should be consumer-level in terms of simplicity to set up and the quality of the user interface, it should be self-service and self-provisioning, and it should be simple to sign up for with no separate need to license the product. So in OBIEE terms, what this means is a simplified RPD/data model builder, simple options to upload and store data (also in Oracle’s cloud), and automatic provisioning using just a credit card (although there’ll also be options to pay by PO number etc. for the larger customers).

And there’s quite a few things that we can draw out of this announcement; first, it’s squarely aimed – at least at the start – at individual users, departmental users and the like looking to create sandbox-type applications, most probably also linking to Oracle Cloud Database, Oracle Java-as-a-Service and so on. It won’t, for example, be possible to upload data to this service’s datastore using conventional ETL tools, as the only data source it will connect to, at least initially, will be Oracle’s Cloud Database schema-as-a-service, which only allows access via ApEx and HTTP because it’s a shared service, and giving you SQL*Net access could compromise other users. In the future it may well connect to Oracle’s full DBaaS, which gives you a full Oracle instance, but for now (as far as I’ve heard) there’s no option to connect to an on-premise data source, or Amazon RDS, or whatever. And for this type of use-case that may be fine – you might only want a single data source, and you can still upload spreadsheets which, if we’re honest, is where most sandbox-type applications get their data from.

This Reporting-as-a-Service offering might well be where we see new user interface innovations coming through first, though. I get the impression that Oracle plan to use their cloud OBIEE service to preview and test new visualisation types first, as they can iterate and test faster, and the systems running on it are smaller in scope and probably more receptive to new features. Similar to Salesforce.com and other SaaS providers, it may well be the case that there’s a “current version” and a “preview version” available at most times, with the preview becoming the current after a while, and the current being something you’ve got 6-12 months to switch from after that point. And given that Oracle will know there’s an Oracle database schema behind the service, it’s going to make services such as the proposed “personal data mashup” feature possible, where users can upload spreadsheets of data through OBIEE’s user interface, with the data then being stored in the cloud and the metrics merged in with the corporate dataset, with the source of each bit of data clearly marked. All this is previews and speculation though – I wouldn’t expect to see this available for general use until the middle of 2014, given the timetable for previous Oracle cloud releases.

The final product area that I was particularly interested in hearing future product direction about was Oracle’s data integration and data quality tools. We’ve been on the ODI 12c beta for a while and we’re long-term users of OWB, EDQ, GoldenGate and the other data integration tools; moreover, on recent projects, and in our look at the cloud as a potential home for our BI, DW and data analytics projects, it’s become increasingly clear that database-to-database ETL is no longer what data integration is solely about. For example, if you’re loading a data warehouse in the cloud, and the source database is also in the cloud, does it make sense to host the ETL engine, and the ETL agents, on-premise, or should they live in the cloud too?

And what if the ETL source is not a database, but a service, or an application such as Salesforce.com that provides a web service / RESTful API for data access? What if you want to integrate data on-the-fly, like OBIEE does with data federation but in the cloud, from a wide range of source types including services, Hadoop, message buses and the like? And where do replication, quality-of-service management, security and so forth come in? In my view, ODI 12c and its peers will probably be the last of the “on-premise”, “assumed-relational-source-and-target” ETL tools, with ETL instead following apps and data into the cloud, assuming that sources can be APIs, messages, big data sources and so forth as well as relational data, and it’ll be interesting to see what Oracle’s Fusion Middleware and DI teams come up with next year as their vision for this technology space. Thomas Kurian’s keynote touched on this subject, but I think we’re still a long way from working out what the approach will be, what the tooling will look like, and whether this will be “along with”, or “instead of”, tools like ODI and Informatica.

Anyway – that’s it for Openworld for me, back to the real world now and time to see the family. Check back on the blog next week for normal service, but for now – laptop off, kids’ time.