An Oracle BI “Blast from the Past”: Ten Products from Oracle’s BI History You May Not Have Heard Of…
With Oracle BI Enterprise Edition, the BI Apps and Hyperion EPM Suite, Oracle have a set of reporting and analysis tools that can be considered “best of breed” and compete with any other tool on the market. Coupled with the Oracle database, the engineered systems strategy and the Fusion Apps/Apps Unlimited ERP suites, it’s pretty clear that, as BI developers, we “bet on the right horse” in terms of which company we backed. But it hasn’t always been such plain sailing, and like every other software company Oracle have released a few products over the years that didn’t take off so well, never really came together or just got eclipsed by other acquisitions Oracle made over time. So let’s take a light-hearted countdown through the Oracle BI, DW and ETL products released over the years that you may not have heard of, but at least some of us bear the scars from trying to implement…
10. Oracle Warehouse Builder 10gR2’s Data Quality and Enterprise ETL Options
Back in the days before Oracle Data Integrator, Oracle Warehouse Builder was Oracle’s strategic data integration tool, initially sold on a per-developer basis but, in time, bundled in with the Oracle database to match similar product packaging from Microsoft.
Like the old saying about technology reaching the point of perfection before then becoming obsolete, in retrospect OWB10gR2 pretty much got it right in terms of Oracle-centric data integration, and the Data Quality option in particular has yet (to my mind) to be surpassed in terms of data quality integration within an Oracle ETL tool. Of course, with the acquisition of Sunopsis Oracle went with ODI as their strategic ETL tool, driven mostly by the fact that it was cross-platform and had a bit more of a SOA/middleware angle than OWB did, but it’s still a shame to see OWB being de-emphasised over the years, and we still miss its dimensional data modelling capabilities, its integration with Oracle OLAP and, of course, the data quality features that were introduced with OWB10gR2’s Data Quality Option.
9. Oracle Browser, Oracle Data Query, and Oracle Discoverer
Again, these are products made obsolete by newer and better ones coming through, rather than failing in themselves: Oracle Browser was Oracle’s first proper ad-hoc query tool, which in turn begat Oracle Data Query, which then begat Oracle Discoverer, still in use across many EBS sites and still with some features yet to be incorporated into Oracle BI Enterprise Edition.
But it’s easy to view Discoverer through rose-tinted spectacles, forgetting the pain: having to dig out and maintain ancient Oracle JInitiator Java browser plug-ins to get Discoverer Plus to work; Discoverer Viewer only having a fraction of the functionality of Discoverer Plus; the web-based version of Discoverer first appearing with Oracle 9iAS, possibly the worst piece of Oracle software ever released; and so on. But for getting the job done with minimal fuss, Discoverer is still hard to beat as an ad-hoc query tool for the Oracle database; it’s still available and runs now as part of the Fusion Middleware 11g setup, and it’ll still be maintained and developed for as long as there are EBS customers out there wanting to do some simple exploration of their ERP data.
8. Oracle Data Mart Suite
Here’s one for an Oracle BI Trivial Pursuit quiz – what was the name of Oracle’s first combined ETL, data warehousing and reporting product suite, based around Oracle 8i and using ETL code licensed from Sagent? The answer is Oracle Data Mart Suite, a largely-forgotten precursor to Oracle Warehouse Builder that combined Oracle Discoverer, Oracle Designer and a tool called Oracle Data Mart Builder along with Oracle 8i to create Oracle’s first end-to-end BI & data warehousing tool.
Some of the concepts are very familiar to us now – a central repository, agents to collect metadata and run code, graphical tools to create data models and define ETL processes – but Data Mart Builder was Windows-only when Oracle were just about to move wholesale into Linux with Oracle 9i, and of course the ETL functionality pre-dates the inbuilt SQL ETL that came with Oracle 9i. Oracle Warehouse Builder 2.1 came along towards the start of the 2000s and replaced Data Mart Builder and Data Mart Designer, but Discoverer lived on, and Oracle still haven’t really got a single-install, single-metadata-store solution to replace it. Data Mart Suite even pre-dates my involvement with Oracle, but I’d be interested to hear if anyone reading this (my guess – Kent Graziano ;-)) has any first-hand experience of working with it.
7. Oracle Darwin
Just around the same time as Oracle 9i was released, Oracle acquired Thinking Machines, a data mining and supercomputer company based in the States that sold a product called Darwin, a Clementine-like GUI analytical workbench that Oracle later rebranded as “Oracle Data Mining”.
Darwin never really saw the light of day with Oracle but the internal algorithms and technologies went on to form the core of the Data Mining Option for the Oracle Database (now part of the wider database Advanced Analytics Option), which now has GUI elements of its own but does all of the processing in the Oracle database. Technology derived from Darwin can also be found today in products like Oracle Spend Classification, working under the covers to classify corporate spend using data mining classification algorithms.
6. Oracle BI Applications 7.9.5.2
Now we’re getting into the interesting ones. Four weeks of my life I’ll never get back were spent back in 2009 getting to grips with the first version of the Oracle BI Apps that used ODI, rather than Informatica, as the ETL tool. Sources and targets in this initial release were limited to just EBS 11.5.10 on Oracle Database 10gR2, but other than that it was fully-functional, with a method for doing customisations, new configuration tools that did away with the need for the DAC, and all of the relevant SDE and SIL mappings re-implemented as ODI interfaces and packages.
But this was back in the days of ODI10g, and there were no load plans or any other features since introduced to ease the move to ODI with the BI Apps, and the customisation approach was slightly scary and complex, to say the least. In the end, only one release of BI Apps on ODI10g ever came out, but of course we’re now just nearing the point where BI Apps 11g gets full support for ODI as an alternative to Informatica as the ETL tool, and seeing the various workarounds Oracle had to do with ODI to get it to work as an Informatica alternative back in 2009 made it obvious to me why features such as load plans were introduced over the past few years.
5. The Oracle BI Spreadsheet Add-in for Excel
A spreadsheet add-in with less useful functionality than Oracle BI Office, the GUI performance of Warehouse Builder and an OLAP server back-end that nobody used. One of a number of entries in this chart based around Oracle 9i OLAP.
This, and No.1 on our list, were probably the single biggest reasons Oracle ended up buying Hyperion – the chances of an accountant actually using this Excel add-in, as opposed to, say, Hyperion’s Essbase add-in, were about as close to zero as you could get, assuming you could find anyone still using Oracle OLAP after the bodged migration from Express Server. But Oracle 9i OLAP going so badly paved the way, several years later, for the Hyperion acquisition, and now that Oracle OLAP sensibly focuses on the Simba MDX Provider for Oracle OLAP along with Microsoft Excel pivot tables, the descendants of this product release are actually pretty darn good.
4. Oracle Business Intelligence Essbase Integrator
Another one for Oracle BI Trivial Pursuit – which product from Oracle integrates OBIEE, Essbase and the BI Apps, so that you can click on links on your BI Apps dashboard and launch SmartView, preserving the POV context from the related analysis on the dashboard?
The answer is the Oracle Business Intelligence Essbase Integrator, a product launched by Oracle back in 2010 which appeared to me, at the time, as wildly ambitious but solved a real problem – how do you combine the capabilities of Essbase and OBIEE whilst, in the background, keeping their metadata in sync?
Rather predictably, we didn’t really hear much about this product again, which was a shame, as the concept was superb (albeit a bit “Heath Robinson”, or “Rube Goldberg” as the Americans would say). I suspect we’ll see something around this same concept going into the 12c timeline as it’s such an obvious “win” for Oracle, but for now, it’s a product best known as the challenge we set Stewart Bryson back at the Brighton BI Forum 2011 when Venkat couldn’t make it over: present Venkat’s OBI Essbase Integrator session as if it were his own, without seeing the content of each slide until it came up on the projector.
3. Oracle Daily Business Intelligence
Prior to the BI Apps, and around the time of Oracle E-Business Suite 11i, reporting against EBS data was done either through Discoverer, or through Oracle Reports and FSGs. Oracle Discoverer could integrate with Oracle Portal, but it was fairly primitive integration, and Portal wasn’t really cut out to be a BI portal, being used more for intranet-style applications like data entry and staff directories.
Oracle Daily Business Intelligence (DBI) improved on this situation in two ways: first, it was based off of Oracle Database materialised views, speeding up report response times and, in theory, taking the load off of your underlying ERP system; second, it had its own custom portal and dashboard framework that was faster, more responsive and “tighter” than Oracle Portal and its Discoverer portlets.
DBI may well have got more traction over time, but it was probably eclipsed by the major acquisition spree that Oracle went on in the early 2000s, buying PeopleSoft (and thereby JD Edwards) and Siebel, and then using Siebel Analytics along with BEA’s WebLogic tooling to create the successor BI-on-ERP platform, OBIEE and the BI Apps. Which was probably a good thing in the end, as I never met anybody actually able to customise Daily Business Intelligence, a task that makes customising the BI Apps seem like plugging your iPod into iTunes running on a Mac.
2. Oracle 9i OLAP, and Oracle Business Intelligence Beans
On to the final two, and they’re both based around Oracle 9i OLAP, Oracle’s (in retrospect) disastrous replacement for Oracle Express Server, which lost them the OLAP market they owned back in the late 90s. I’ve got mixed feelings on 9i OLAP as I know many of the people involved in its development and marketing, and nowadays, in its database OLAP option form, it’s an excellent product – I think technically and architecturally better than Essbase. But the introduction of Oracle 9i OLAP was a masterclass in suicidal product marketing. First, there was no backward compatibility with Express Server tools, so all of the users of Oracle Financial Analyzer and Oracle Sales Analyzer had to wait years for the product at #1 in our list, with most of them decamping to Microsoft OLAP Services or Arbor Essbase instead. Second, the first version of Oracle 9i OLAP was ROLAP-only, with terrible, buggy performance, and a wait of a year or so before the MOLAP version came out, again without backwards compatibility with Express Server tools (due to the removal of the key SNAPI and XCA communications APIs that Express Server applications used).
All of this made great technical sense, and if the MOLAP version of 9i OLAP had come out at the same time as the ROLAP version, and if Oracle had somehow managed to keep SNAPI support so that OFA and OSA customers could migrate their OLAP cubes to Oracle 9i without losing their tool access, it might all have worked out. But the “icing on the cake” was the lack of any proper ad-hoc or OLAP query tool support right at the start (Discoverer “Drake” came a few years later), with customers expected to – get this – write their own BI tool using a Java component technology called Oracle Business Intelligence Beans.
A few partners, myself included, gamely tried to generate interest around 9i OLAP and BI Beans in this bit of history from around ten years ago, but combined with Microsoft’s entry into the OLAP market and Arbor’s (and then Hyperion’s) focus on the finance department, rather than on the DBAs and IT staff who never actually buy OLAP servers, Oracle OLAP never regained the market share that Express Server had, even though, as I said earlier, it’s arguably a better, more scalable and easier-to-manage OLAP server than Essbase.
The BI Beans product development team did get the last laugh, though: the BI Beans query builder became the inspiration for OBIEE 11g’s “Selection Steps” feature, whilst its data visualisation components found their spiritual successor in ADF’s Data Visualization Tools (DVT) feature, which provides the BI visuals behind OBIEE, the latest version of Endeca Information Discovery and, of course, the Oracle Fusion Apps.
1. Oracle Enterprise Planning & Budgeting
Number one in our list of Oracle’s slightly crazy BI tools from the past was Enterprise Planning & Budgeting, the long-awaited replacement for Oracle Financial Analyzer and Oracle Sales Analyzer based around the Oracle 9i OLAP platform. More akin to the Stone Roses’ “Second Coming”, and about as well received critically, EPB was the “aircraft carrier” to OFA’s “motor torpedo boat”: it had a list as long as your arm of critical patches that had to be applied before you could use it, and required installation alongside EBS (plus knowledge of a set of arcane setup steps).
Coupled with a painfully slow user interface for users accustomed to split-second OFA response times, EPB was long in the coming but quickly despatched when Oracle first adopted Siebel Analytics and the BI Apps as their new BI platform, and then bought Hyperion and made Essbase and Hyperion Planning the centrepiece of their performance management strategy, something that carries on to this day.
So there we have it – a light-hearted look through some of the Oracle BI products that didn’t make it to the big time, and a bit of history to explain why OBIEE and EPM Suite are the tools we use today. Most of the Oracle PMs who looked after these tools are still with us, working on OBIEE and its related technologies, so apologies if I’ve inadvertently offended anyone by picking on one of the products they looked after – it was all fun at the time, and most of the products would have stayed with us, and gone on to be successes, were it not for the massive strategic shift Oracle made back at the turn of the century towards cross-platform, and away from the Oracle database being the centre of everything. Let me know if you’ve had any experiences with these tools, or if you’ve got any corrections or additions to their stories.
OBIEE, ODI and Hadoop Part 4: Hive Data Transformation & Integration via ODI 11g
In the previous three articles in this series (and well done to everyone that’s still with us), we looked at how OBIEE 11.1.1.7 and ODI 11.1.1.6+ can access Hadoop data sources via a related technology called Hive. The second article looked in more detail at the OBIEE aspect, including how the OBIEE metadata import was set up and how the new HiveODBC drivers supplied by Oracle make this all possible, whilst in the third we saw how ODI can be used to populate the Hive “warehouse” tables, stored on Hadoop’s HDFS (Hadoop Distributed File System) storage grid, that are then analysed via HiveQL and MapReduce. In this final post, we’ll take a look at how ODI can go beyond simply loading files into Hive tables and start to do some of the data transformation and integration tasks that we regularly perform with more traditional data sources and targets.
In the previous ODI and Hive/Hadoop example, we used an ODI interface to load data from a local file into a Hive table, using the IKM File to Hive knowledge module. In this first example, columns in the source file were mapped 1:1 into “columns” in the target Hive table, like this:
whilst in the second example, we used a Hadoop “SerDe” serializer-deserializer transformation to parse incoming weblog rows into the target Hive column format, again using this same IKM File to Hive knowledge module.
Which of course is great if you’re loading data directly from files into corresponding Hive warehouse tables, but what if you’ve already got data in Hive and you want to “reshape” or transform it, creating, say, the equivalent of a star schema out of a set of “normalised” tables?
Hive, compared to Oracle at least, is pretty simplistic in terms of the data manipulation you can do with it, and is more akin to working with Oracle external tables than a full insert-update-delete-ACID-compliant database (but then again, it was never intended to be that). You can only insert new data into Hive tables, not (ordinarily) update rows or delete them, with the only way to delete data in a Hive table being to drop it and then re-create it. HiveQL has syntax for joins, group by and some functions, and you can drop in your own custom MapReduce scripts for the more complex stuff – which is not bad going, and probably more than enough for your average Hadoop/Big Data analyst.
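To make that concrete, here’s a minimal HiveQL sketch of what insert-only manipulation looks like in practice – the table names below are hypothetical, and the two statements show the only real options for changing existing data:

-- there's no UPDATE or DELETE; to "refresh" a table you overwrite it wholesale...
insert overwrite table weblog_clean
select * from weblog_raw where status = 200;

-- ...or you get rid of the data entirely by dropping and re-creating the table
drop table weblog_clean;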
But the scenarios we’re considering probably won’t have a Hadoop expert around, and are typically something like:
- We’ve got some marketing or analyst-type users who want to access some behavioural, weblog or activity data sitting in Hadoop/HDFS/Hive, they’re happy users of OBIEE, and we want to be able to connect OBIEE to this data so that they can report on it – which probably means “reshaping” it somewhat to fit OBIEE’s preference for star-schema (or at least simplified, denormalized) source data models, or
- We’ve got some data sitting in Hadoop/HDFS that we’d like to add in as a new source into our data warehouse, and ODI is the preferred tool for bringing in new sources
In neither case do we have much in the way of Hadoop or MapReduce skills, so we can either (a) use OBIEE’s BI Repository to do some source data reshaping, or even better (b) do the job properly, maybe as part of a much wider ETL process, using ODI. The key thing is fitting Hadoop into ODI’s way of doing things, and giving ODI the ability to do Hadoop and Hive-specific tasks through its extensible “knowledge module” framework.
In the example above that I used to illustrate ODI’s ability to work with Hive, I used the IKM File to Hive knowledge module that comes as part of the ODI Application Adaptor for Hadoop (ODIAAH); ODIAAH is one of a number of Fusion Middleware “application adapters”, and is licensed separately from (but used in conjunction with) ODI EE, costing around the same as GoldenGate (around $17,000/processor, according to the current tech price list). For some more background into ODIAAH this self-study training session goes into the basics, but at a high level this adapter actually ships six mainly Hive-based ODI KMs that enable data loading, transformation and integration between Hadoop/Hive/HDFS and the Oracle database (amongst other sources/targets):
- IKM File to Hive (Load Data) : what we’ve been using so far, used to load Hive tables from local and HDFS-stored files
- IKM Hive Control Append : used for loading data into Hive tables, with data sourced from one or more other Hive tables – the equivalent of loading one relational table by joining, filtering and transforming other tables
- IKM Hive Transform : a variation on the previous KM that allows incoming data to be transformed via Python or Perl scripts
- IKM File-Hive to Oracle : load data from either a Hive table, or from files stored on HDFS, into an Oracle database using the separately-licensed Oracle Loader for Hadoop
- CKM Hive : a check knowledge module for Hive (which because of its “schema on read” rather than “schema on write” approach, doesn’t natively support keys or constraints)
- RKM Hive : a “reverse-engineering” KM that allows ODI to read table definitions from the Hive metastore, and use them to create ODI datastore definitions
A typical Oracle BI&DW project is going to use these KMs for a couple of main reasons. One, as stated before, is because there might be information sitting in Hadoop that the BI/DW system wants access to, and the team would otherwise have to start writing MapReduce, Pig, Sqoop etc. code to get it into their data warehouse. The other might be to leverage Hadoop’s ability to crunch and count large sets of data in a massively parallel way, at relatively low cost, with ODI then initiating and monitoring the process and loading the results into a more traditional Oracle data store. Other technologies, some implemented or licensed by Oracle, also surround this process – Oracle NoSQL Database, for example – but for now let’s concentrate on Hadoop and Hive, and see what else these KMs can do for us.
Starting off with IKM Hive Control Append, you might have a situation where you’ve already got some data in Hive tables (perhaps after loading them up using IKM File to Hive (Load Data)), but the data needs joining, transforming, filtering or otherwise reshaping before you can connect a tool such as OBIEE to it. In this case it doesn’t make sense to use IKM File to Hive (Load Data), as that KM is for when the data sits outside Hive in source files, so instead we can use IKM Hive Control Append to truncate and then insert-append new rows into an existing target Hive table.
This process works very similarly to regular table-loading IKMs (except of course you can’t insert/update into the target, only insert-append); for example, in the screenshot below, two Hive tables, one for customer information and one for salesperson information, are joined and the results transformed using HiveQL, with the results loaded into another Hive table.
Looking at the loading code generated by ODI for this interface, you can see that the HiveQL used to extract and load the data looks very similar to Oracle SQL; in fact this KM is the closest of the ODIAAH knowledge modules to a “regular” relational database one.
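As a rough sketch of the sort of statement this KM produces – the table and column names below are made up for illustration, not taken from the actual interface – the load step boils down to a single INSERT…SELECT over the joined source tables:

-- join the two source Hive tables and (over)write the results into the target
insert overwrite table dwh_cust_salesperson
select c.cust_id
     , c.cust_name
     , s.salesrep_name
from dwh_customer c
join dwh_salesperson s on (c.salesrep_id = s.salesrep_id);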
Things get more interesting with IKM Hive Transform, a knowledge module that takes data from any source and loads it, via custom scripts, into a target Hive table. These transformation scripts are typically written in Python or Perl, and give you the ability to write your own custom pre-processing or transformation code that (via ODI’s temporary interfaces feature) can then be used as “inline views” or multi-step processes when performing more complex data loading around Hive. In the example below, a three-step data loading process first creates and then loads an Apache weblog file using IKM File to Hive (Load Data), then sessionizes it (tags the log file rows with IDs to identify all activity within a particular browser session) using IKM Hive Transform:
Taking a look at the final “Sessionize Weblog” step in more detail, you can see in the interface Flow tab that this step uses the IKM Hive Transform module, and a script called “sessionize.pl” to do the transform work.
The Perl script itself then parses through the log file information and works out the start and stop points for each individual user session, outputting the results, which are then mapped into the correct target columns by the PRE_TRANSFORM_DISTRIBUTION settings in the KM options.
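Under the covers, this style of processing relies on Hive’s TRANSFORM clause, which streams each row through the script’s stdin and reads the transformed rows back from its stdout. A simplified sketch of the kind of HiveQL involved – the table and column names here are illustrative, not the actual generated code – looks like this:

-- make the script available to the MapReduce nodes, then stream each staged
-- weblog row through it, loading the session-tagged output into the target
add file sessionize.pl;

from weblog_staging
insert overwrite table weblog_sessionized
select transform (host, request_date, request, status)
using 'sessionize.pl'
as (session_id, host, request_date, request, status);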
Finally, the IKM File-Hive to Oracle knowledge module takes things in the other direction, extracting from Hive tables or HDFS files into an Oracle database via the Oracle Loader for Hadoop big data connector; but that’s really a topic in itself, and for another day, when I’d like to look in more detail at the big data connectors in general, and how you can leverage Hadoop and HDFS from within SQL and PL/SQL commands. For now though, this concludes my look at Hadoop connectivity from within OBIEE 11.1.1.7 and ODI 11.1.1.6, but if you’re like me this brings up as many questions as it answers; for example:
- How would I go about setting up my own Hadoop/Hive/HDFS development environment, and can I use Windows or does only Linux make sense?
- Just what could I do with Hadoop as a data source that I can’t do with regular Oracle and file sources; in terms of scale, and also complexity/speed?
- Can ODI also make use of Hadoop data loading/transformation tools like Pig, or Sqoop?
- Do any of the related/commercialized Hive/Hadoop technologies add anything to these scenarios – for example, Cloudera’s Impala (for BI metadata/reporting) or Cloudera Manager (for Hadoop administration)?
- And – is there a time/volume threshold where Hadoop makes more sense as a data processing platform than an Oracle database?
If these sound interesting to you, they’re exactly what I’ll be covering during my part of the Data Integration Masterclass at the Rittman Mead BI Forum 2013 events in Brighton and Atlanta, running in just a few weeks’ time. We’ve still got a few places left, so if you’re interested and want to see all this technology in action, sign up now and I’ll hopefully see you soon.
OBIEE, ODI and Hadoop Part 3: A Closer Look at Hive, HDFS and Cloudera CDH3
In the first two parts in this series, I looked at the recently-added support for Apache Hadoop as a data source for OBIEE 11.1.1.7 and ODI 11.1.1.6, and explained how the Hadoop support was really enabled through a related technology called Hive. In the second part in the series I showed how OBIEE 11.1.1.7 could report against “big data” sources using Hadoop and this Hive technology, but this all of course pre-supposes that we have data in Hive in the first place. So what actually is Hive, how do you load data into it, and can ODI help with this process?
To take a few steps back, Apache Hive is a Hadoop-family project that provides a “data warehouse” layer over Hadoop, through a metadata layer not unlike OBIEE’s RPD, together with a SQL-like language called HiveQL. Coupled with ODBC and JDBC database drivers, BI tools like OBIEE use Hive to get access to big data sources, as the HiveQL language that Hive uses is very similar to the SQL used to access databases such as Oracle, SQL Server or MySQL. Delving a bit deeper into the Hive product architecture, as shown in the diagram below Hive has a number of components, including a “database engine”, a metadata store, APIs for client access, and a link through to Hadoop to actually load, process and retrieve data in HDFS (Hadoop Distributed File System).
So what’s HDFS then? HDFS is a fault-tolerant, distributed filesystem that’s a core part of Apache Hadoop, and it stores the data that MapReduce jobs then process via job trackers, task trackers and all the other Hadoop paraphernalia. HDFS is accessed through a URI (URL) rather than through your Linux filesystem browser, but distributions such as Cloudera’s CDH3 and CDH4 ship with tools such as Hue, shown below, that provide a web-based interface into HDFS so that you can browse it like a regular OS-level filesystem.
Notice how there’s a “user” folder like we’d get with Linux, and within that folder there’s a home folder for Hive? Generally, the data you manage using Hive is actually loaded into a directory structure under the “hive” user, either using data taken from another directory area in HDFS or from external files. Hive’s data is still in file form and accessed via MapReduce and Hadoop, but it’s in a directory area away from everything else. You can, however, tell Hive to create tables using data held elsewhere in HDFS, analogous to Oracle’s external tables feature, which then skips the data loading process and just maps table structures onto files held elsewhere in the Hadoop filesystem.
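As a quick sketch of that second approach (the HDFS path and column layout here are assumptions, loosely based on the movie ratings data used later in this post), an external table definition just points Hive at the existing files:

-- map a Hive table onto tab-delimited files already sitting in HDFS,
-- rather than loading them into /user/hive/warehouse
create external table ratings (
  user_id  string,
  movie_id string,
  rating   float,
  tmstmp   string
)
row format delimited fields terminated by '\t'
location '/user/oracle/movielens_src';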
In most cases when we’re considering OBIEE accessing Hadoop data via Hive, the data would have been loaded into Hive-managed tables beforehand, though it’s possible that Hive table metadata could have been mapped onto other data in HDFS. Assuming you’ve got Hue installed in your own particular Hive implementation, along with Beeswax, a table browser for Hive that usually comes with Hue, you can see where each individual table within your Hive metastore is actually held; in the examples below, the dwh_customer Hive table is a managed table and has its data stored within the /user/hive/warehouse/ HDFS directory, whilst the ratings table has its data stored outside of Hive’s directory structure, but still within the HDFS managed filesystem.
So how does one create a Hive table, load data into it and get it ready for OBIEE access, and can ODI help with this, as we asked earlier? Before we get into ODI then, let’s take a look at how a Hive table is created and loaded, and then we’ll see how ODI does the same job.
With thanks to the ODI product development team’s David Allan, who put together some great Hive and ODI examples in this blog post, let’s start by creating a Hive table against the same movie ratings data in the right-hand screenshot below, but this time with the data actually loaded into Hive’s directory structure (i.e. a “managed” table). From the Hive command-shell, I type in the following commands to create the managed table, after SSH’ing into the VM running Hive:
officeimac:~ markrittman$ ssh oracle@bigdatalite
Warning: Permanently added the RSA host key for IP address '192.168.2.35' to the list of known hosts.
oracle@bigdatalite's password:
Last login: Mon Apr 22 10:59:07 2013 from 192.168.2.47
=====================================================
=====================================================
Welcome to BigDataLite
run startx at the command line for X-Windows console
=====================================================
=====================================================
Host: bigdatalite.us.oracle.com [192.168.2.35]
[oracle@bigdatalite ~]$ hive
Hive history file=/tmp/oracle/hive_job_log_oracle_201304250732_1523047910.txt
hive> create table movie_ratings (user_id string
> , movie_id string
> , rating float
> , tmstmp string)
> row format delimited fields terminated by '\t';
OK
Time taken: 3.809 seconds
hive>
At this point the table is created but there’s no data in it; that part comes in a moment. I can see the table structure and its empty state from the Hive command-line:
hive> describe movie_ratings;
OK
user_id string
movie_id string
rating float
tmstmp string
Time taken: 0.168 seconds
hive> select count(*) from movie_ratings;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201303171815_0021, Tracking URL = http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201303171815_0021
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost.localdomain:8021 -kill job_201303171815_0021
2013-04-25 07:40:51,581 Stage-1 map = 0%, reduce = 0%
2013-04-25 07:40:56,617 Stage-1 map = 0%, reduce = 100%
2013-04-25 07:40:58,640 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201303171815_0021
OK
0
Time taken: 12.931 seconds
hive>
and also from the Beeswax web UI:
So how do we get the data into this table, without any tools such as ODI? I can either load data straight from files on my local workstation, or I can upload them, for example using Hue, into the HDFS filesystem first.
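For the first of those two routes, LOAD DATA has a LOCAL variant that reads the file from the server’s own filesystem and copies it into the table’s warehouse directory; a quick sketch, with a hypothetical local path, would be:

hive> load data local inpath '/home/oracle/movielens_src/u.data'
    > overwrite into table movie_ratings;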
For files uploaded into HDFS first, I can use the same HiveQL LOAD DATA command, this time without the LOCAL keyword, to load from one of these HDFS files into Hive, and then count how many rows have been loaded, like this:
hive> load data inpath '/user/oracle/movielens_src/u.data'
> overwrite into table movie_ratings;
Loading data to table default.movie_ratings
Deleted hdfs://localhost.localdomain/user/hive/warehouse/movie_ratings
OK
Time taken: 0.341 seconds
hive> select count(*) from movie_ratings;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201303171815_0022, Tracking URL = http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201303171815_0022
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost.localdomain:8021 -kill job_201303171815_0022
2013-04-25 08:14:24,159 Stage-1 map = 0%, reduce = 0%
2013-04-25 08:14:32,340 Stage-1 map = 100%, reduce = 0%
2013-04-25 08:14:42,420 Stage-1 map = 100%, reduce = 33%
2013-04-25 08:14:43,428 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201303171815_0022
OK
100000
Time taken: 26.32 seconds
hive>
So how does this process look when using ODI to do the Hive data loading? Let’s start with importing the Hive table metadata for the movie_ratings table I just created from the Hive command-line shell, by going over to the Topology navigator in ODI 11.1.1.6 – note that you’ll need to configure ODI to connect to your Hive, HDFS and Hadoop environment beforehand, using the Oracle Data Integrator for Hadoop documentation as a guide; this adapter is an extra-cost license option on top of base ODI Enterprise Edition.
Hive has its own technology type within the Topology navigator, and you create the connection through to Hive using the HiveJDBC driver, first adding the connection to the Hive server and then specifying the particular Hive database / namespace, in this case selecting the “default” database for my Hive system.
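For reference, on a CDH3-era environment like the BigDataLite VM used here, the data server settings would look something like the following (the host name and port are assumptions for my own setup – substitute your Hive server’s details):

JDBC Driver: org.apache.hadoop.hive.jdbc.HiveDriver
JDBC URL:    jdbc:hive://bigdatalite:10000/default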
Now I can reverse-engineer the Hive table structures into a Designer navigator model, just like any other relational table structure.
Within the ODI Topology navigator you can then create File technology connections, either to files held in HDFS or, more likely with ODI, to files on your workstation or server filesystem, like this:
and then add the file datastores to the Designer navigator model list, entering the correct delimiter information and reversing the column definitions into the datastore definition.
Now it’s a case of creating an interface to load the Hive table. In this instance, I map each of the source file “columns” into the Hive table’s columns, as the source file is delimited with an easily-usable structure.
Then, over in the Flow tab for the interface, I make sure the IKM File to Hive knowledge module is selected, keep the default values for the KM options (more on these in a moment), and then save the interface.
Now it’s a case of running the interface, and checking the results. Notice in the Operator navigator code panel, the LOAD DATA command that ODI is generating dynamically, similar to the one I wrote manually earlier on in the article.
Going back to my Hive command-line session, I can see that there’s now 100,000 rows in the movie_ratings Hive table.
hive> select count(*) from movie_ratings;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201303171815_0024, Tracking URL = http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201303171815_0024
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost.localdomain:8021 -kill job_201303171815_0024
2013-04-25 16:59:12,275 Stage-1 map = 0%, reduce = 0%
2013-04-25 16:59:18,346 Stage-1 map = 100%, reduce = 0%
2013-04-25 16:59:29,467 Stage-1 map = 100%, reduce = 33%
2013-04-25 16:59:30,475 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201303171815_0024
OK
100000
Time taken: 27.251 seconds
Now in many cases the data going into a Hive table isn’t neatly arranged into columns within delimited files; it could be, for example, web log data that you’ll need to parse using regular expressions or other APIs or standard parsers. When that’s the case, you can use an option with the IKM File to Hive knowledge module to override the normal column-to-column mappings and instead use an expression, something Oracle have done in their demo environment for parsing these types of log files.
“ROW FORMAT SERDE” is a reference to Hive’s “Serializer-Deserializer”, or row-formatting feature, which gives you the ability to use regular expressions and other data manipulation techniques to, in this case, allocate incoming file data to the proper columns in the target Hive table.
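As an illustration of what such a table definition looks like – this isn’t Oracle’s actual demo code; the table and column names are made up, and the regex is the standard Apache access-log pattern – a SerDe-based DDL statement reads like this:

-- map raw Apache access log rows onto columns using the contrib RegexSerDe
create table apache_log (
  host     string,
  identity string,
  user     string,
  time     string,
  request  string,
  status   string,
  size     string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)"
)
stored as textfile;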
So now we’re at the point where we can use ODI to populate the Hive tables that OBIEE, in turn, uses to access Hadoop data sources. But what if the data we want to load into Hive isn’t in the format or shape we need, and we have to join, filter or otherwise work with Hive data and tables before we can report on it? And what if we want to get data out of Hive and into regular relational tables, when a relational data store makes more sense than Hadoop for a particular reporting requirement? Check back tomorrow for the final part in this series, where we’ll answer these remaining questions.
SmartView as the Replacement for BI Office with OBIEE 11.1.1.7
Apart from system-wide improvements to Essbase integration across OBIEE 11.1.1.7, the other Essbase-related improvement that came with this latest release was the (re-)introduction of SmartView as the replacement for Oracle BI Add-in for Microsoft Office (“BI Office”), OBIEE’s previous MS Office solution. As a reminder, BI Office appeared with OBIEE 10.1.3.3 back in 2007/8 and supported integration with Microsoft Excel and PowerPoint, allowing you to download analysis views from the BI Catalog and then view them within Excel and PowerPoint.
What you couldn’t do with BI Office, though, was use it to create new analyses, or upload what you’d created back to the BI Catalog. There was also no integration with Microsoft Word or Outlook, which made it more a feature for viewing and copying analyses into Excel and PowerPoint than a “first class” report authoring environment.
Then when OBIEE 10.1.3.4 was released, a number of EPM Suite products were integrated with OBIEE, including Workspace (now resurrected with OBIEE 11.1.1.7), SmartSpace (where did that go?) and SmartView, the long-term replacement for Essbase’s somewhat minimalist Excel Add-in. This was all good stuff except that, in terms of OBIEE support, this version of SmartView was essentially unusable, rendering OBIEE data in an Essbase-like way that made little sense for an OBIEE user.
“The UI takes a bit of getting used to” was my comment at the time, which in retrospect was a bit of an understatement and this version of SmartView had little to no take-up within the OBIEE world, with BI Office carrying on until now as the only viable MS Office integration approach. Now though, the new improved version of SmartView is with us, so how well does it work with OBIEE data?
SmartView can be downloaded from the BI Presentation Services homepage, but note that this is the 32-bit version and you’ll need to go to My Oracle Support for the 64-bit version, available using patch ID 16238382 (at the time of writing, for SmartView version 11.1.2.2.310). Once it’s installed, select SmartView > Options > Advanced and enter your general EPM Suite SmartView Provider Services URL into the Shared Connections URL setting (in the format http://[machine_name:port]/workspace/SmartViewProviders), like this:
This setting only covers SmartView connecting to Essbase and Financial Reporting, so to connect to OBIEE's Presentation Services Catalog you'll need to create what's called a Private Connection (or define a shared connection for OBIEE within an XML file, as detailed in the SmartView 11.1.2.2.310 docs), by pressing the Panel button in the menu ribbon, selecting Private Connections from the Smart View menu, then clicking on the Create new connection button.
Then, when prompted for the SmartView connection type, select Oracle BI EE, then type in the OBIEE SmartView URL in the format http://[machine_name:port]/analytics/jbips, and press Finish to complete this part of the process.
Then, when prompted, enter the username and password for your OBIEE system, and save the connection as a private connection to your workstation.
Now you should be able to browse the BI Catalog and select a SmartView report, for example, to view within Excel.
Or you can select any view from a regular analysis, and add that to Excel just as you did with BI Office.
More importantly though, the View Designer feature allows you to create a new report from scratch, selecting from any subject area in the BI Catalog and creating a report from right within Excel.
This report can then be manipulated either as an Excel pivot table (pictured below) or an OBIEE pivot table, giving you an OBIEE-within-Excel experience far more intuitive and usable than the earlier incarnation of SmartView.
Additional calculated fields can be added, in what is arguably a more obvious way than you’d do so in the Analysis Editor…
… and charts can be developed as well, using a similar set of chart types to the ones provided by the Analysis Editor.
Then, once you’re done, you can either save the Excel (or Word, or PowerPoint, or whatever) document to your workstation’s filesystem, or you can upload it to the BI Presentation Catalog using the Publish View button…
… and then – get this – open the report in the Analysis Editor, just like any other analysis in the catalog. Impressive stuff (although the calculation defined in Excel didn’t make it through to OBIEE, and the upload feature only seems to bring across a single view at a time – but this is version 1.0).
There’s tons more to SmartView and, in reality, presumably some of the new OBIEE stuff won’t work properly in this first release, but it’s a huge improvement over the old OBIEE MS Office plug-in, and it’s also useful being able to use the same MS Office plug-in for all of Oracle’s BI & EPM tools, with full 32- and 64-bit support for all the modern MS Office versions.
Essbase and EPM Integration Improvements in OBIEE 11.1.1.7
One of the major new feature areas in OBIEE 11.1.1.7, but which has so far got very little attention, is the significant improvement in integration between Essbase, the Hyperion EPM Suite, and OBIEE 11g. The integration between EPM Workspace and OBIEE’s Presentation Services which disappeared when 11g came along is now back, along with installation and security integration, a new version of SmartView that (properly) supports OBIEE as a data source, and the ability to spin-off aggregates from the RPD into Essbase ASO cubes.
Now some of these features of course made an appearance in the earlier, 11.1.1.6.2 BP1 release, and integration between OBIEE 11g and EPM Suite has been happening on-and-off right back from the OBIEE 10g days, but where we’re at now with OBIEE 11.1.1.7 is the delivery of a number of things that customers have long been asking for, including:
- The ability to run OBIEE from within EPM Workspace, with single sign-on between the two
- Shared security provisioning and organisation between Essbase and OBIEE, through application roles and policies
- The ability to install Essbase and the other EPM tools into the same WebLogic domain as OBIEE, using a single installer
- A proper Excel (and Word, PowerPoint, Outlook) add-in for OBIEE, with the ability to author reports as well as run existing Answers-authored ones
This is actually one of a number of new feature areas that came with 11.1.1.7 that have had little publicity; as well as better Essbase integration, there’s actually now support for multi-tenancy in the RPD and catalog, Hadoop integration (which we covered in a blog post last week), the View Suggestion Engine, the inevitable changes to MUD, and quite a few others, some of which I’ll try and cover in the next few days and weeks, but for now let’s look at these new Essbase/EPM integration improvements, starting with installation of Essbase and its related tools into the OBIEE WebLogic domain.
As I mentioned back in my OBIEE 11.1.1.7 New Features posting a few weeks ago, the OBIEE product installer now offers Essbase as an installation option alongside OBIEE, Real-Time Decisions (RTD) and BI Publisher. As with RTD, Essbase isn’t included in the base OBIEE+ license, but it is included in Oracle BI Foundation Suite, the product package that Oracle encourage new customers to take out and which includes OBIEE, Scorecard & Strategy Management, Essbase and BI Mobile. Selecting Essbase during the install process installs it, and the other EPM Suite tools, in the same WebLogic domain as OBIEE, and you can see Essbase within Fusion Middleware Control as a product – separate from OBIEE – that you can manage and monitor.
Essbase Server, and Essbase Studio (the client/server tool used to design and build Essbase cubes) are also now controlled and monitored through OPMN, something that’s been a feature of EPM Suite for several releases now but which is, of course, new for OBIEE.
[oracle@obiee11117 ~]$ cd /home/oracle/obiee/instances/instance1/bin
[oracle@obiee11117 bin]$ ./opmnctl status
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component | process-type | pid | status
---------------------------------+--------------------+---------+---------
essbasestudio1 | EssbaseStudio | 12682 | Alive
essbaseserver1 | Essbase | 12685 | Alive
coreapplication_obiccs1 | OracleBIClusterCo~ | 12686 | Alive
coreapplication_obisch1 | OracleBIScheduler~ | 12687 | Alive
coreapplication_obijh1 | OracleBIJavaHostC~ | 12683 | Alive
coreapplication_obips1 | OracleBIPresentat~ | 12684 | Alive
coreapplication_obis1 | OracleBIServerCom~ | 12689 | Alive
[oracle@obiee11117 bin]$
So something that’s been an issue for EPM customers upgrading from OBIEE 10g to 11g was the removal, at the time, of the ability to integrate OBIEE’s Presentation Services within EPM Workspace, and of the SSO link between the two products. Back with OBIEE 10.1.3.4 there was an admittedly complicated, but supported and working, process to integrate the two products together, allowing EPM Workspace customers to “skin” OBIEE to look like Workspace and run the two products together, albeit with separate report catalogs, security models and so forth.
This, coupled with the removal of OBIEE’s Hyperion custom authenticator for the RPD, left many EPM Suite customers upgrading to OBIEE 11g in the lurch, leading to workarounds such as this one that we put together recently for one of our customers. Well, this integration is (mostly…) back with OBIEE 11.1.1.7, so let’s see what it does, and what functionality is still missing compared to OBIEE 10g.
First off, Essbase and EPM Suite as installed as part of an OBIEE installation aren’t quite the same as EPM Suite installed standalone. Most importantly, Essbase in this OBIEE incarnation has a different security model from “standalone” EPM Suite, in that it uses the same system of application roles and policies as the Fusion Middleware 11g-centric OBIEE 11g, rather than the Shared Services users and groups that standalone EPM Suite uses. Also, the OBIEE 11.1.1.7 installation includes just the following EPM Suite products:
- Essbase Server, including Essbase Agent, Essbase Studio, Essbase Administration Services, Provider Services
- Financial Reporting
- Calculation Manager
Therefore you don’t get Planning, Web Analysis and so forth, and you can’t subsequently install them into the domain and Fusion Middleware Control afterwards – so think of Essbase and the EPM Suite tools in this context as an add-on and complement to OBIEE, not a full installation of EPM Suite in its own right. Moreover, the majority of Essbase administration tasks, which for standalone EPM Suite installs are performed through MaxL, Shared Services and EAS, are performed through Fusion Middleware Control, and Essbase high availability and clustering work differently within this context, for example. The standard product architecture diagram for OBIEE and Essbase combined within the 11.1.1.7 release therefore gets updated, with a number of products added to the Java components and system components parts of the diagram, like this:
Now, when installed as part of OBIEE 11.1.1.7’s WebLogic domain, EPM Workspace is available at http://[machine_name:port]/workspace, and when you launch it you’re presented with a view into the BI Catalog, and menu options to administer the various EPM and BI tools from one place.
Within this catalog are both OBIEE objects such as analyses, dashboards and agents, and EPM objects such as Financial Reporting and SmartView reports.
There are limits to this EPM/BI Catalog integration though – FR reports, for example, can only be opened using the File > Open dialog in EPM Workspace, with an error message showing if you just click on the report itself in the BI Catalog view within EPM Workspace. But SSO between Workspace and OBIEE seems to work (as in, you don’t need to re-enter your BI password when clicking on an analysis in the Workspace Catalog view), as both OBIEE and EPM are working off of the same Fusion Middleware security model – the lack of which explains why the feature disappeared for so long after OBIEE 11g was introduced.
Now that OBIEE and Essbase share the same security, the need for the old HSS Custom Authenticator has gone away, though of course this will only be of use if a customer has moved their Essbase installation into the OBIEE domain, with standalone EPM Suite installations still needing the security workaround mentioned earlier in this article. There’s no upgrade path from standalone EPM Suite installations to this integrated arrangement, so most probably any users of Essbase within this new 11.1.1.7 context will be installing it “net-new”, with the main objective being to enhance their existing BI setup rather than merging their separate BI and EPM platforms into one.
As you’ve probably picked up by now, much of this new integration ability is down to security being harmonised across both Essbase and OBIEE or, more accurately, Essbase now having an option to use Fusion Middleware 11g security rather than Hyperion Shared Services. So what does Essbase and FMW 11g security look like in practice? Let’s head over to Fusion Middleware Control, in particular the Application Policies administration screen, to take a look.
The big difference when Essbase runs as part of an Oracle BI domain is that authentication and authorization for Essbase use Fusion Middleware security rather than Shared Services or native Essbase security. Although Essbase Administration Services ships with OBIEE 11.1.1.7, you should use Fusion Middleware Control to enable access to particular Essbase databases, and to give permission to access tools such as Financial Reporting or Administration Services; the only security role left for EAS and MaxL in this setup is to create the Essbase data and metadata filters. These filters are then assigned to users through FMW security resource permissions and application policies, which are then granted to application roles and thereby to users.
Whilst this probably seems like an over-complicated nightmare to traditional Essbase users, it does have the major advantage that one set of application roles granted to users within a Fusion Middleware system can cover both OBIEE and Essbase permissions, and there’s no need to link to Shared Services or support Native Essbase security. We’ll cover the implications of this more in some future blog posts, but this is the enabling technology that makes the rest of this integration make sense.
With Essbase integrated into the OBIEE BI domain, you can also now use Essbase as an aggregate persistence target, though this feature comes with the same (slightly strange) approach and limitations that we first encountered when it was introduced with OBIEE 11.1.1.6.2 BP1: although there’s no longer the requirement for the Essbase server to be used only for aggregate persistence, you still have to name the Essbase database in a particular way, it’s ASO-only, and the Aggregate Persistence Wizard still creates a separate ASO database for each aggregation (similar to Oracle Database materialised views) rather than one single cube covering all aggregations. In practical terms, I’m not sure how much you’d use this versus creating your own Essbase cube in Studio against the whole RPD business area, but it might be useful for OBIEE developers who don’t otherwise know Essbase.
So finally, the other major Essbase-related new feature in OBIEE 11.1.1.7 is SmartView, the successor to Oracle BI Office. But that’s a topic in itself, so I’ll cover it in the next posting.