
Taking a Look at Oracle Big Data Preparation Cloud Service – Spark-Based Data Transformation in the Cloud

One of the sessions I’m delivering at the upcoming Oracle Openworld 2015 in San Francisco is entitled “Oracle Business Intelligence Cloud Service—Moving Your Complete BI Platform to the Cloud [UGF4906]”, and looks at how you can now migrate your entire OBIEE11g platform into Oracle Public Cloud, including the data warehouse and data integration routines. Using Oracle BI Cloud Service’s new on-premise RPD upload feature you can upload an existing RPD into BICS and run it from there, with the accompanying data warehouse database moving into Oracle’s Database Cloud Service (and with the restriction that you can’t then edit the repository within BICS – you need to do that on-premise and upload again). For ETL and data integration you can carry on using ODI12c, which now has the ability to load data into Oracle Storage Cloud (for file sources) and BICS (via a REST API) as well as the full Oracle DBaaS, but another Oracle option for purely cloud-based data processing and enrichment has recently become available – Oracle Big Data Preparation Cloud Service. So what is it, how does it work and how is it different to ODI12c?

Oracle Big Data Preparation Cloud Service (“BDP”) is a thin-client application within Oracle Cloud for ingesting, preparing and enriching datasets that don’t have a predefined schema and may well need certain fields obfuscated or cleansed. Being integrated with Oracle Storage Cloud and other infrastructure and platform services within Oracle Cloud it’s obviously aimed mainly at data transformation tasks within the Oracle Cloud environment, but you can upload and download datasets from your browser for use with on-premise applications. Unlike the more general-purpose Oracle Data Integrator it’s aimed instead at a particular use-case – non-technical information analysts who need to get data transformed, wrangled and enriched before they can make use of it in an environment like Hadoop. In fact the product name is a bit misleading – it runs on a big data platform within Oracle Cloud and, like Oracle Big Data Discovery, uses Apache Spark for its data processing – but it could potentially be useful for a business analyst to prepare data for loading into Oracle BI Cloud Service, and I’ll cover this angle when I talk about data loading options in my Oracle Openworld session.

Within a logical architecture for a typical big data DW and BI system, BDP sits alongside ODI within the Data Factory and provides self-service, agile transformation capabilities to more business-orientated users. 


Oracle Big Data Preparation Cloud Service shares quite a bit of functionality and underlying technology with Oracle Big Data Discovery – both run on Hadoop, both use Apache Spark for data processing and transformation, and both offer data transformation and “wrangling” features aimed at non-technical users. Oracle are positioning Big Data Preparation Service as something you’d use in the execution layer of the Oracle Information Management Reference Architecture whereas Big Data Discovery is associated more with the discovery layer – I’d mostly agree, but I can see a role for BDD even within the execution layer, as a front-end to the data reservoir that typically now runs alongside relationally-stored data warehouses.


Looking back at the slides from one of the recent Strata conferences, for example, you can see Oracle positioning BDP as the “operational data preparation” tool for structured and unstructured data – with no defined schema – coming into your information platform, with the enriched output then being used by BI tools, enterprise reporting and data discovery tools.


 

Apart from the scalability benefits of running BDP on Apache Spark, the other interesting feature in BDP is how it uses Spark’s machine learning capabilities to try to automate as much of the data preparation process as possible, for example detecting credit card numbers in data fields and recommending you obfuscate that column. Similar to BICS and how Oracle have tried to simplify the process of creating reports and dashboards for a small team, BDP runs in the cloud and tries to automate and simplify as much of the data preparation and enrichment process as possible, with ODI12c still available for ETL developers to develop more complex transformations.

The development lifecycle for BDP (from the Oracle Big Data Preparation Cloud Service e-book on Oracle’s website) is a cycle of ingesting, cleaning, enriching and then publishing data, using scripts authored in the tool and run on the Apache Spark platform. The diagram below, from Oracle’s Big Data Preparation Cloud Service Handbook, shows how ingestion, enrichment, publishing and governance run in a cycle on the common foundation of the transformation scripts that you build using BDP’s web interface.


So let’s walk through an example data preparation exercise using a file of data stored initially in Oracle Storage Cloud Service. After logging into BDP via Oracle Cloud you’re first presented with the Catalog view, listing out all your previous transformations and showing you when they were last used to process some data.


To create a transformation you first give it a name, then select the data source and then the file you’re interested in. In my environment I’ve got Oracle Storage Cloud and HDFS available as my main data sources, or I could upload a file from my desktop and start from there.


BDP then ingests the file and uses its machine learning features to process and classify the data in each column, recommending column names such as “gender”, “city” and “cc_number” based on (presumably) some sort of classification model. In the screenshot below you can see a set of these recommendations on the left-hand side of the screen, with the columns themselves listed in the centre and a summary of the file profiling on the right.


Taking a closer look at the profile results panel you can see two of the columns have alerts raised, in red. Clicking on the alert shows that the two columns have credit card data stored in clear text, with the recommendation being to obfuscate or otherwise secure these fields. Clicking on a field then shows the various transformation options, with the obvious choice here being to automatically obfuscate the data in those fields.


Once you’ve worked through all the recommendations and added any transformations you choose to add yourself, the final step is to publish your transformation to one of the available targets. In the example below we’ve got Oracle Storage Cloud and HDFS again as potential targets; I’d imagine Oracle will add a connector to BICS soon, for example, so that you can use BDP as a data prep tool for file data that will then be added to your dataset in BICS.


So … it’ll be interesting to see where this one goes. It’s interesting that Oracle have split out data preparation and data discovery into two tools whilst others are saying theirs can do both, and you’ll still need ODI for the more complex integration jobs. But I like the innovative use of machine learning to do away with much of the manual work required for classification of incoming data fields, and running the whole thing on Spark certainly gives it the potential to scale. A couple of years ago I was worried Oracle didn’t really have a strategy for data integration and ETL in the cloud, but we’re starting to see something happen now.

There’s a big push from the Oracle field at the moment to move customers into the cloud, and I can see BDP getting bundled in with Big Data Cloud Service and BICS as the accompanying cloud data preparation tool. The danger then of course is that Big Data Discovery starts to look less useful, especially with Visual Analyzer already available within BICS and coming soon on-premise with OBIEE12c. My guess is that what we’re seeing now with these initial releases of BDP and BDD is just the start, with BDP adding more automatic enrichment “smarts” and starting to cover integration use-cases too, whilst BDD will put more focus on data visualization and analytics on the data reservoir.

Managing the OBIEE BI Server Cache from ODI 12c

I wrote recently about the OBIEE BI Server Cache and how useful it can be, but how important it is to manage it properly, both in the purging of stale data and seeding of new. In this article I want to show how to walk-the-walk and not just talk-the-talk (WAT? But you’re a consultant?!). ODI is the premier data integration tool on the market and one that we are great fans of here at Rittman Mead. We see a great many analytics implementations built with ODI for the data load (ELT, strictly speaking, rather than ETL) and then OBIEE for the analytics on top. Managing the BI Server cache from within your ODI batch makes a huge amount of sense. By purging and reseeding the cache directly after the data has been loaded into the database we can achieve optimal cache usage with no risk of stale data.

There are two options for cleanly hooking into OBIEE from ODI 12c with minimal fuss: JDBC, and Web Services. JDBC requires the OBIEE JDBC driver to be present on the ODI Agent machine, whilst Web Services have zero requirement on the ODI side, but a bit of config on the OBIEE side.

Setting up the BI Server JDBC Driver and Topology

Here I’m going to demonstrate using JDBC to connect to OBIEE from ODI. It’s a principle that was originally written up by Julien Testut here. We take the OBIEE JDBC driver bijdbc.jar from $FMW_HOME/Oracle_BI1/bifoundation/jdbc and copy it to our ODI machine. I’m just using a local agent for my testing, so put it in ~/.odi/oracledi/userlib/. For a standalone agent it should go in $AGENT_HOME/odi/agent/lib.

[oracle@ip-10-103-196-207 ~]$ cd /home/oracle/.odi/oracledi/userlib/  
[oracle@ip-10-103-196-207 userlib]$ ls -l  
total 200  
-rw-r----- 1 oracle oinstall    332 Feb 17  2014 additional_path.txt  
-rwxr-xr-x 1 oracle oinstall 199941 Sep 22 14:50 bijdbc.jar

Now fire up ODI Studio, sign in to your repository, and head to the Topology pane. Under Physical Architecture -> Technologies you’ll see Oracle BI.

Right click and select New Data Server. Give it a sensible name and put your standard OBIEE credentials (e.g. weblogic) under the Connection section. Click the JDBC tab and click the search icon to the right of the JDBC Driver text box. Select the default, oracle.bi.jdbc.AnaJdbcDriver, and then in the JDBC Url box put your server and port (9703, unless you’ve changed the listen port of the OBIEE BI Server).
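The URL format for the BI Server JDBC driver is, as far as I know, just the jdbc:oraclebi prefix followed by host and port – so something along these lines (the hostname is obviously illustrative):

jdbc:oraclebi://my-obiee-server.foo.com:9703/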

Now click Test Connection (save the data server when prompted, and click OK at the message about creating a physical schema), and select the Local Agent with which to run it. If you get an error then click Details to find out the problem.

One common problem can be the connection through to the OBIEE server port, so to cut ODI out of the equation try this from the command prompt on your ODI machine (assuming it’s *nix):

nc -vz my-obiee-server.foo.com 9703

If the host resolves correctly and the port is open then you should get:

Connection to my-obiee-server.foo.com 9703 port [tcp/*] succeeded!

If not you’ll get something like:

nc: my-obiee-server.foo.com port 9703 (tcp) failed: Connection refused

Check the usual suspects – firewall (eg iptables) on the OBIEE server, firewalls on the network between the ODI and OBIEE servers, etc.

Assuming you’ve got a working connection you now need to create a Physical Schema. Right click on the new data server and select New Physical Schema.

OBIEE’s BI Server acts as a “database” to clients, within which there are “schemas” (Subject Areas) and “tables” (Presentation Tables). On the New Physical Schema dialog you just need to set Catalog (Catalog), and when you click the drop-down you should see a list of the Subject Areas within your RPD. Pick one – it doesn’t matter which.

Save the physical schema (ignore the context message). At this point your Physical Architecture for Oracle BI should look like this:

Now under Logical Architecture locate the Oracle BI technology, right click on it and select New Logical Schema. From the Physical Schemas dropdown select the one that you’ve just created. Give a name to the Logical Schema.

Your Logical Architecture for Oracle BI should look like this:

Building the Cache Management Routine

Full Cache Purge

Over in the Designer tab go to your ODI project into which you want to integrate the OBIEE cache management functions. Right click on Procedures and select Create New Procedure. Give it a name such as OBIEE Cache – Purge All and set the Target Technology to Oracle BI.

Switch to the Tasks tab and add a new Task. Give it a name, and set the Schema to the logical schema that you defined above. Under Target Command enter the call you want to make to the BI Server, which in this case is

call SAPurgeAllCache();

Save the procedure and then from the toolbar menu click on Run. Over in the Operator tab you should see the session appear and soon after complete – all being well – successfully.

You can go and check your BI Server Cache from the OBIEE Administration Tool to confirm that it is now empty:

And confirm it through Usage Tracking:

From what I can see at the default log levels, nothing gets written to either nqquery.log or nqserver.log for this action unless there is an error in your syntax in which case it is logged in nqserver.log:

(For more information on that particular error see here)

Partial Cache Purge

This is the same pattern as above – create an ODI Procedure to call the relevant OBIEE command, which for purging by table is SAPurgeCacheByTable. We’re going to get a step more fancy now, and add a variable that we can pass in so that the Procedure is reusable multiple times over throughout the ODI execution for different tables.

First off create a new ODI Variable that will hold the name of the table to purge. If you’re working with multiple RPD Physical Database/Catalog/Schema objects you’ll want variables for those too:

Now create a Procedure as before, with the same settings as above but a different Target Command, based on SAPurgeCacheByTable and passing in the four parameters as single quoted, comma separated values. Note that these are the Database/Catalog/Schema/Table as defined in the RPD. So “Database” is not your TNS or anything like that, it’s whatever it’s called in the RPD Physical layer. Same for the other three identifiers. If there’s no Catalog (and often there isn’t) just leave it blank.

When including ODI Variable(s) make sure you still single-quote them. The command should look something like this:
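For illustration – assuming a project named OBIEE_CACHE holding a variable TABLE_NAME, and made-up RPD Database and Schema names (substitute your own) – the Target Command would be something along these lines:

call SAPurgeCacheByTable('My_DB','','My_Schema','#OBIEE_CACHE.TABLE_NAME');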

Now let’s seed the OBIEE cache with a couple of queries, one of which uses the physical table and one of which doesn’t. When we run our ODI Procedure we should see one cache entry go and the other remain. Here’s the seeded cache:

And now after executing the procedure:

And confirmation through Usage Tracking of the command run:

Cache Seeding

As before, we use an ODI Procedure to call the relevant OBIEE command. To seed the cache we can use SASeedQuery, which strictly speaking isn’t documented, but a quick perusal of nqquery.log when you run a cache-seeding OBIEE Agent shows that it is what is called in the background, so we’re going to use it here (and it’s mentioned in support documents on My Oracle Support, so it’s not a state secret). The documentation here gives some useful advice on what you should be seeding the cache with — not necessarily only exact copies of the dashboard queries that you want to get a cache hit for.

Since this is a cookie-cutter of what we just did previously you can use the Duplicate Selection option in ODI Designer to clone one of the other OBIEE Cache procedures that you’ve already created. Amend the Target Command to:
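That is, a call to SASeedQuery with the Logical SQL you want seeded passed in as a single-quoted string – something along these lines (the query here is just illustrative; use your own Logical SQL, escaping any single quotes within it):

call SASeedQuery('SELECT 0 s_0, "A - Sample Sales"."Time"."T02 Per Name Month" s_1, "A - Sample Sales"."Base Facts"."1- Revenue" s_2 FROM "A - Sample Sales"');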

When you run this you should see a positive confirmation in the nqserver.log of the cache seed:

[2015-09-23T23:23:10.000+01:00] [OracleBIServerComponent] [TRACE:3]  
[USER-42] [] [ecid: 005874imI9nFw000jzwkno0007q700008K,0] [tid: 9057d700]  
[requestid: 477a0002] [sessionid: 477a0000] [username: weblogic]  
--------------------  
Query Result Cache: [59124]  
The query for user 'weblogic' was inserted into the query result cache.  
The filename is '/app/oracle/biee/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/cache/NQS__735866_84190_2.TBL'. [[

A very valid alternative to calling SASeedQuery would be to call the OBIEE SOA Web Service to trigger an OBIEE Agent that populated the cache (by setting ‘Destination’ to ‘Oracle BI Server Cache (For seeding cache)’). OBIEE Agents can also be ‘daisy chained’ so that one Agent calls another on completion, meaning that ODI could kick off a single ‘master’ OBIEE Agent which then triggered multiple ‘secondary’ OBIEE Agents. The advantage of this approach over SASeedQuery is that cache seeding is more likely to change as OBIEE usage patterns do, and it is easier for OBIEE developers to maintain all the cache seeding code within ‘their’ area (OBIEE Presentation Catalog) than put in a change request to the ODI developers each time to change a procedure.

Integrating it in the ODI batch

You’ve two options here: Packages or Load Plans. Load Plans were introduced in ODI 11.1.1.5 and are a clearer and more flexible way of orchestrating the batch.

To use it in a load plan create a serial step that will call a mapping followed by the procedure to purge the affected table. In the procedure step in the load plan set the value for the variable. At the end of the load plan, call the OBIEE cache seed step:

Alternatively, to integrate the above procedures into a Package instead of a load plan you need to add two steps per mapping. First, the variable is updated to hold the name of the table just loaded, and then the OBIEE cache is purged for the affected table. At the end of the flow a call is made to reseed the cache:

These are some very simple examples, but hopefully illustrate the concept and the powerful nature of integrating OBIEE calls directly from ODI. For more information about OBIEE Cache Management, see my post here.

OBIEE BI Server Cache Management Strategies

The OBIEE BI Server cache can be one of the most effective ways of improving response times of OBIEE dashboards. By using data already in the cache it reduces load on the database, the network, and the BI Server.

Should you be using it? I always describe it as the “icing on the cake” – it’s not a fix for a badly-designed OBIEE system, but it does make a lot of sense to use once you’re happy that the foundations for the system are in place. If the foundations are not in place? Then you’re just papering over the cracks and at some point it’s probably going to come back to bite you. As Mark Rittman put it nearly seven years ago, it’s “[…]usually the last desperate throw of the dice”. The phrase “technical debt”? Yeh, that. But BI Server caching used after performance review and optimisation, rather than instead of it – then it’s a Good Thing.

So you’ve decided to use the BI Server cache, and merrily trotted over to Enterprise Manager to enable it, restarted the BI Server, and now your work is done, right? Not quite. Because the BI Server cache will start to store data from all the queries that you run, and use it to satisfy subsequent queries. Not only will it match on a direct hit for the same query, it will use a subset of an existing cache entry where appropriate, and can even aggregate up from what’s in the cache to satisfy a query at a higher level. Clever stuff. But, what happens when you load new data into your data warehouse? Well, the BI Server continues to serve requests out of the cache, because why shouldn’t it? And herein lies the problem with “just turn caching on”. You have to have a cache management strategy.

A cache management strategy sounds grand doesn’t it? But it boils down to two things:

  1. Accuracy – Flush any data from the cache that is now stale
  2. Speed – Prime the cache so that as many queries as possible get a hit on it, first time

Maintaining an Accurate Cache

Every query that is run through the BI Server, whether from a Dashboard, Answers, or more funky routes such as custom ODBC clients or JDBC, will end up in cache. It’s possible to “seed” (“prime”/“warmup”) the cache explicitly, and this is discussed later. The only time you won’t see data in the cache is if (a) you have BI Server caching disabled, or (b) you’ve disabled the Cacheable option for a physical table that is involved in providing the data for the query being run.

You can see metadata for the current contents of the cache in the Administration Tool when connected online to the BI Server, through the Manage -> Cache menu option. This gives you lots of useful information (particularly when you come to optimising cache usage) including the size of each entry, when it was created, when it was last used, and so on.

Purging Options

So we’ve a spread of queries run that hit various dimension and fact tables and created lots of cache entries. Now we’ve loaded data into our underlying database, so we need to make sure that the next time a user runs an OBIEE query that uses the new data they can see it. Otherwise we commit the cardinal sin of any analytical system and show the user incorrect data which is a Bad Thing. It may be fast, but it’s WRONG….

We can purge the whole cache, but that’s a pretty brutal approach. The cache is persisted to disk and can hold lots of data stretching back months – to blitz all of that just because one table has some new data is overkill. A more targeted approach is to purge by physical database, physical table, or even logical query. When would you use these?

  • Purge entire cache – the nuclear option, but also the simplest. If your data model is small and a large proportion of the underlying physical tables may have changed data, then go for this
  • Purge by Physical Database – less brutal than clearing the whole cache. If you have various data sources that are loaded at different points in the batch schedule then targeting a particular physical database makes sense.
  • Purge by Physical Table – if many tables within your database have remained unchanged, whilst a large proportion of particular tables have changed (or it’s a small table) then this is a sensible option to run for each affected table
  • Purge by Query – If you add a few thousand rows to a billion row fact table, purging all references to that table from the cache would be a waste. Imagine you have a table with sales by day. You load new sales figures daily, so purging the cache by query for recent data is obviously necessary, but data from previous weeks and months may well remain untouched so it makes sense to leave queries against those in the cache. The specifics of this choice are down to you and your ETL process and business rules inherent in the data (maybe there shouldn’t be old data loaded, but what happens if there is? See above re. serving wrong data to users). This option is the most complex to maintain because you risk leaving behind in the cache data that may be stale but doesn’t match the precise set of queries that you purge against.

Which one is correct depends on

  1. your data load and how many tables you’ve changed
  2. your level of reliance on the cache (can you afford low cache hit ratio until it warms up again?)
  3. time to reseed new content

If you are heavily dependent on the cache and have large amounts of data in it, you are probably going to need to invest time in a precise and potentially complex cache purge strategy. Conversely, if you use caching as the ‘icing on the cake’ and/or it’s quick to seed new content then the simplest option is to purge the entire cache. Simple is good; OBIEE has enough moving parts without adding to its complexity unnecessarily.

Note that OBIEE itself will perform cache purges in some situations including if a dynamic repository variable used by a Business Model (e.g. in a Logical Column) gets a new value through a scheduled initialisation block.

Performing the Purge

There are several ways in which we can purge the cache. First I’ll discuss the ones that I would not recommend except for manual testing:

  1. Administration Tool -> Manage -> Cache -> Purge. Doing this every time your ETL runs is not a sensible idea unless you enjoy watching paint dry (or need to manually purge it as part of a deployment of a new RPD etc).
  2. In the Physical table, setting Cache persistence time. Why not? Because this time period starts from when the data was loaded into the cache, not when the data was loaded into your database.
    An easy mistake to make would be to think that with a daily ETL run, setting the Cache persistence time to 1 day might be a good idea. It’s not, because if your ETL runs at 06:00 and someone runs a report at 05:00, there is going to be a stale cache entry present for another 23 hours. Even if you use cache seeding, you’re still relinquishing control of the data accuracy in your cache. What happens if the ETL batch overruns or underruns?

    The only scenario in which I would use this option is if I was querying directly against a transactional system and wanted to minimise the number of hits OBIEE made against it – the trade-off being users would deliberately be seeing stale data (but sometimes this is an acceptable compromise, so long as it’s made clear in the presentation of the data).

So the two viable options for cache purging are:

  1. BI Server Cache Purge Procedures
  2. Event Polling Table

BI Server Cache Purge Procedures

These are often called “ODBC” Procedures, but technically ODBC is just one – of several – ways that the commands can be sent to the BI Server.

As well as supporting queries for data from clients (such as Presentation Services) sent as Logical SQL, the BI Server also has its own set of procedures. Many of these are internal and mostly undocumented (Christian Berg does a great job of explaining them here, and they do creep into the documentation here and here), but there are some cache management ones that are fully supported and documented. They are:

  • SAPurgeCacheByQuery
  • SAPurgeCacheByTable
  • SAPurgeCacheByDatabase
  • SAPurgeAllCache
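For reference, the calls look roughly like this – the identifiers are illustrative, the Database/Catalog/Schema/Table names are those defined in the RPD Physical layer, and the argument to SAPurgeCacheByQuery is Logical SQL:

call SAPurgeCacheByQuery('SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"');
call SAPurgeCacheByTable('My_DB','','My_Schema','My_Fact_Table');
call SAPurgeCacheByDatabase('My_DB');
call SAPurgeAllCache();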

The names of these match up to the purge processes that I describe above. The syntax is in the documentation, but what I am interested in here is how you can invoke them. They are my preferred method for managing the BI Server cache because they enable you to tightly couple your data load (ETL) to your cache purge. Setting the cache to purge based on a drop-dead timer (whether crontab, tivoli, Agent/iBot, whatever) gives you a huge margin of error if your ETL runtime does not remain consistent. Whether it organically increases in runtime as data volumes increase, or it fails and has to be fixed and restarted, ETL does not always finish bang-on when it is ‘supposed’ to.

You can call these procedures in several ways, including:

  1. nqcmd – one of the most common ways, repeated on many a blog, but requires nqcmd/OBIEE to be installed on the machine running it. nqcmd is a command-line ODBC client for connecting to the BI Server
  2. ODBC – requires BI to be installed on the machine running it in order to make the OBIEE ODBC driver available
  3. JDBC – just requires the OBIEE JDBC driver, which is a single .jar file and thus portable
  4. Web Service – the OBIEE BI Server Web Service can be used to invoke these procedures from any machine with no dependencies other than some WSM configuration on the OBIEE server side.

My preference is for JDBC or Web Service, because they can be called from anywhere. In larger organisations the team building the ETL may have very little to do with OBIEE, and so asking them to install OBIEE components on their server in order to trigger cache purging can be quite an ask. Using JDBC only a single .jar needs copying onto the server, and using the web service not even that:

curl --silent --header "Content-Type: text/xml;charset=UTF-8" \
--user weblogic:Admin123 \
--data @purge_cache_soap.xml \
http://192.168.56.102:7780/AdminService/AdminService

[...]
[59118] Operation SAPurgeAllCache succeeded!
[...]
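And for the JDBC route, a minimal sketch of the same purge call – assuming bijdbc.jar is on the classpath, and noting that the URL format, hostname and credentials here are illustrative for my environment:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PurgeBIServerCache {
    public static void main(String[] args) throws Exception {
        // Register the OBIEE BI Server JDBC driver shipped in bijdbc.jar
        Class.forName("oracle.bi.jdbc.AnaJdbcDriver");
        // Connect to the BI Server (default listen port 9703) and invoke the purge procedure,
        // just as nqcmd or an ODI Procedure would
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oraclebi://my-obiee-server.foo.com:9703/", "weblogic", "Admin123");
             Statement stmt = conn.createStatement()) {
            stmt.execute("call SAPurgeAllCache()");
        }
    }
}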

For details of configuring ODI to use the BI Server JDBC driver in order to tightly couple the cache management into an existing ODI load job, stay tuned for a future blog!

Event Polling Tables (EPT)

NB Not Event “Pooling” Tables as I’ve often seen this called

The second viable approach to automated cache purging is EPT, which is a decoupled approach to managing the cache purge, with two components:

  1. An application (your ETL) inserts a row into the table S_NQ_EPT (which is created at installation time by the RCU in the BIPLATFORM schema) with the name of the physical table in which data has been changed
  2. The BI Server polls (hence the name) the S_NQ_EPT table periodically, and if it finds entries in it, purges the cache of data that is from those tables.

So EPT is in a sense the equivalent of using SAPurgeCacheByTable, but in a manner that is not tightly coupled. It relies on configuring the BI Server for EPT, and there is no easy way to know from your ETL if the cache purge has actually happened. It also means that the cache remains stale for potentially as long as the polling interval that you’ve configured. Depending on when you’re running your ETL and the usage patterns of your users this may not be an issue, but if you are running ETL whilst users are on the system (for example intra-day micro ETL batches) you could end up with users seeing stale data. Oracle themselves recommend not setting the polling interval any lower than 10 minutes.

EPT has the benefit of being very easy to implement on the ETL side, because it is simply a database table into which the ETL developers need to insert a row for each table that they update during the ETL.
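As an illustration of just how simple it is – and do check the exact column definitions of S_NQ_EPT as created by the RCU in your environment, since I’m quoting them from memory – the ETL would do something along these lines after loading a table:

INSERT INTO S_NQ_EPT
  (UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, CATALOG_NAME, SCHEMA_NAME, TABLE_NAME, OTHER_RESERVED)
VALUES
  (1, SYSDATE, 'My_DB', NULL, 'My_Schema', 'My_Fact_Table', NULL);
COMMIT;

As with SAPurgeCacheByTable, the Database/Catalog/Schema/Table values are the names as defined in the RPD Physical layer, not the underlying connection or database schema names.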

Seeding the Cache

Bob runs an OBIEE dashboard, and the results are added to the cache so that when Bill runs the same dashboard he gets a great response time, because his dashboard runs straight from cache. Kinda sucks for Bob though, because his query ran slow as it wasn’t in the cache yet. What would be nice is if, for the first user on a dashboard, the results were already in cache. This is known as seeding the cache, or ‘priming’ it. Because the BI Server cache is not dumb and will hit the cache for queries that aren’t necessarily direct replicas of what previously ran, working out the optimal way to seed the cache can take some trial and error and careful research. The documentation does a good job of explaining what will and won’t qualify for a cache hit, and it’s worth reading this first.

There are several options for seeding the cache. These all assume you’ve figured out the queries that you want to run in order to load the results into cache.

  1. Run the analysis manually, which will return the analysis data to you and insert it into the BI Server Cache too.
  2. Create an Agent to run the analysis with destination set to Oracle BI Server Cache (For seeding cache), and then either:
    1. Schedule the analysis to run from an Agent on a schedule
    2. Trigger it from a Web Service in order to couple it to your ETL data load / cache purge batch steps.
  3. Use the BI Server Procedure SASeedQuery (which is what the Agent does in the background) to load the given query into cache without returning the data to the client. This is useful for doing over JDBC/ODBC/Web Service (as discussed for purging above). You could just run the Logical SQL itself, but you probably don’t want to pull the actual data back to the client, hence using the procedure call instead.

Sidenote – Checking the RPD for Cacheable Tables

The RPD Query Tool is great for finding objects matching certain criteria. However, it seems to invert results when looking for Cacheable Physical tables – if you add a filter of Cacheable = false you get physical tables where Cacheable is enabled! And the same in reverse (Cacheable = true -> shows Physical tables where Cacheable is disabled)

Day in the Life of an OBIEE Cache Entry (Who Said BI Was Boring?)

In this example I’m running a very simple report from SampleApp v406:

The Logical SQL for this is:

SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Why’s that useful to know? Because when working with the cache you need to resubmit queries frequently, and doing so directly from an interface like nqcmd is much faster (for me) than a web GUI. Horses for courses…
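For example, with the Logical SQL above saved to a file, it’s simply something like this – the DSN and filename are just what I’m using here (AnalyticsWeb is the DSN on my SampleApp environment; yours may differ):

nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s cache_test_01.lsql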

So I’ve run the query and now we have a cache entry for it. How do we know? Because we see it in the nqquery.log (and if you don’t have it enabled, go and enable it now):

[2015-09-23T15:58:18.000+01:00] [OracleBIServerComponent] [TRACE:3] [USER-42]  
[] [ecid: 00586hFR07mFw000jzwkno0005Qx00007U,0] [tid: 84a35700]  
[requestid: a9730015] [sessionid: a9730000] [username: weblogic]  
--------------------  
Query Result Cache: [59124] The query for user 'weblogic' was inserted into 
the query result cache.  
The filename is '/app/oracle/biee/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/cache/NQS__735866_57498_0.TBL'.

We see it in Usage Tracking (again, if you don’t have this enabled, go and enable it now):

SELECT TO_CHAR(L.START_TS, 'YYYY-MM-DD HH24:Mi:SS') LOGICAL_START_TS,  
  SAW_SRC_PATH,  
  NUM_CACHE_INSERTED,  
  NUM_CACHE_HITS,  
  NUM_DB_QUERY  
FROM BIEE_BIPLATFORM.S_NQ_ACCT L  
ORDER BY START_TS DESC;

We can also see it in the Administration Tool (when connected online to the BI Server):

We can even see it and touch it (figuratively) on disk:

So we have the data in the cache. The same query run again will now use the cache entry, as seen in nqquery.log:

[2015-09-23T16:09:24.000+01:00] [OracleBIServerComponent] [TRACE:3] [USER-21]
[] [ecid: 11d1def534ea1be0:6066a19d:14f636f1dea:-8000-000000000000b948,0:1:1:5]  
[tid: 87455700] 
[requestid: a9730017] [sessionid: a9730000] [username: weblogic]  
--------------------  
Cache Hit on query: [[  
Matching Query: SET VARIABLE QUERY_SRC_CD='Report',SAW_SRC_PATH='/users/weblogic/Cache Test 01',PREFERRED_CURRENCY='USD';SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Created by:     weblogic

and in Usage Tracking:

“Interestingly” Usage Tracking shows a count of 1 for number of DB queries run, which we would not expect for a cache hit. The nqquery.log shows the same, but no query logged as being sent to the database, so I’m minded to dismiss this as an instrumentation bug.

Now what about if we want to run a query but not use the BI Server Cache? This is an easy one, and plenty has been blogged about it elsewhere – use the Request Variable DISABLE_CACHE_HIT=1. This overrides the built-in system session variable of the same name. Here I’m running it directly against the BI Server, prefixed onto my Logical SQL – if you want to run it from within OBIEE you need the Advanced tab in the Answers editor.

SET VARIABLE SAW_SRC_PATH='/users/weblogic/Cache Test 01',
DISABLE_CACHE_HIT=1:SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Now we get a cache ‘miss’, because we’ve specifically told the BI Server to not use the cache. As you’d expect, Usage Tracking shows no cache hit, but it does show a cache insert – because why shouldn’t it?

If you want to run a query without seeding the cache either, you can use DISABLE_CACHE_SEED=1:

SET VARIABLE SAW_SRC_PATH='/users/weblogic/Cache Test 01',
DISABLE_CACHE_HIT=1,DISABLE_CACHE_SEED=1:SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

These request variables can be set per analysis, or per user by creating a session initialisation block to assign the required values to the respective variables.

Cache Location

The BI Server cache is held on disk, so it goes without saying that storing it on fast (eg SSD) disk is a Good Idea. There’s no harm in giving it its own filesystem on *nix to isolate it from other work (in terms of filesystems filling up) and to make monitoring it super easy.

Use the DATA_STORAGE_PATHS configuration element in NQSConfig.ini to change the location of the BI Server cache.
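For example, in the [CACHE] section of NQSConfig.ini – the path and size below are purely illustrative, and note that in 11g some cache parameters are centrally managed through Enterprise Manager, so check before hand-editing the file:

[CACHE]
ENABLE = YES;
DATA_STORAGE_PATHS = "/u01/obiee_cache" 500 MB;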

Summary

  1. Use BI Server Caching as the ‘icing on the cake’ for performance in your OBIEE system. Make sure you have your house in order first – don’t use it to try to get around bad design.
  2. Use the SAPurgeCache procedures to directly invoke a purge, or the Event Polling Tables for a more loosely-coupled approach. Decide carefully which purge approach is best for your particular caching strategy.
  3. If using the SAPurgeCache procedures, use JDBC or Web Services to call them so that there is minimal/no installation required to call them from your ETL server.
  4. Invest time in working out an optimal cache seeding strategy, making use of Usage Tracking to track cache hit ratios.
  5. Integrate both purge and seeding into your ETL. Don’t use a schedule-based approach because it will come back to haunt you with its inflexibility and scope for error.