
Embedding a D3 Visualisation in OBIEE

In this blog entry, my first since joining the company as a consultant a little over a month ago, I will be taking you through the process of embedding a D3 visualisation into OBIEE. In my first few weeks in this new role I've had the pleasure of working with the forward-thinking people at Nominet, and one of the technologies they've exploited in their BI solution is D3. They are using it to display daily domain name registrations by postcode on a map of the UK within OBIEE. Very cool stuff indeed, and after seeing it I was sufficiently inspired to have a go myself. We'll start with an overview of D3 itself and then move on to a worked example, where we will look at why you might want to use D3 in OBIEE and the best way to go about it. This blog entry is not intended to be a D3 tutorial; there are plenty of very good tutorials out there already. It is aimed more at OBIEE developers who may have heard of D3 but haven't yet had a "play" with it (yes, it's great fun!) in the context of OBIEE. So without further delay, let's begin…

What is D3?

To answer this question I will first tell you what D3 isn't: D3 is not a charting library. You will find no predefined bar charts, no line graphs, no scatter plots, not even a single pie chart. "What, no pie charts?!" I hear you say. Don't despair though, you can create all of these types of visual in D3 and a whole lot more. D3, which is short for "Data-Driven Documents", is a visualisation framework written entirely in JavaScript by Mike Bostock. The term framework is the key word here; D3 is generic in nature, which allows it to tackle almost limitless visualisation challenges. The downside is that you have to tell D3 exactly what you want it to do with the data you throw at it, and that means rolling up your sleeves and writing some code. Fear not though, D3 has a great API that will enable you to get a fairly basic visual up and running in a short space of time. You'll need a basic understanding of HTML, CSS and SVG, and of course some JavaScript, but with the help of the online tutorials and the D3 API reference you'll be up and running in no time.

How does it work?

D3's API allows you to bind data to placeholders in the browser's DOM. You can then create SVG or HTML elements in these placeholders and manipulate their attributes using the dataset you pass in. Below is the code for a very simple D3 visualisation, to give you a basic idea of how it works.


var dataset = [ 10, 15, 20 ];                    

var svg = d3.select("body")               
               .append("svg")                                    
               .attr("width", 200) 
               .attr("height", 200); 

svg.selectAll("circle") 
               .data(dataset) 
               .enter() 
               .append("circle") 
               .attr("fill","none") 
               .attr("stroke", "green") 
               .attr("r",function(d){  
                    return d ; }) 
               .attr("cy",100)
               .attr("cx",function(d){ 
                    return d * 8; 
               });

This code produces the following visual. Admittedly it's not very impressive to look at, but for the purpose of this example it will do just fine.

Let’s break this down and see what’s going on.

var dataset = [ 10, 15, 20 ]; 

Above is our data in a JavaScript array; no D3 here yet. We'll look at how to generate this from OBIEE a bit later on.

var svg = d3.select("body")
               .append("svg")
               .attr("width", 200)
               .attr("height", 200);

Here we are seeing D3 for the first time. This is a single statement split over several lines to make it easier to read. Anyone familiar with jQuery will recognise the chaining of methods using the "." notation. The first line selects the HTML body tag and then appends an SVG element to it. The two ".attr()" methods then set the width and the height of the SVG.

 svg.selectAll("circle") 
               .data(dataset) 
               .enter() 
               .append("circle") 
               .attr("fill","none") 
               .attr("stroke", "green") 
               .attr("r",function(d){  
                    return d ; }) 
               .attr("cy",100)
               .attr("cx",function(d){ 
                    return d * 8; 
               }); 

Here's where it gets interesting. The first line selects all circles within the SVG. But wait, we haven't created any circles yet! This is where the placeholders I mentioned earlier come into play. These placeholders exist in memory only and are waiting to have data bound to them, and then finally an actual circle element appended. The next line binds the data to the placeholders: 10 to the first one, 15 to the next and 20 to the last. The magic .enter() method will then execute all the remaining statements once for each of our data elements, in this case 3 times. The .append("circle") will therefore be called 3 times and create 3 circle elements within the SVG.

The remaining statements change attributes associated with the circles, and this is where you can use the data to "drive" the document. Notice the .attr("r"…) and .attr("cx"…) method calls. The "r" attribute defines the circle radius and the "cx" attribute sets the centre "x" coordinate of the circle. Both of these methods have functions passed as arguments; the "d" variable in each function represents the current data element (or datum) in the array, 10 the first time, 15 the next and 20 on the last. These functions then return a value to the attr() methods. In this case the radius of each circle is being set to 10, 15 and 20 respectively, and the x coordinate of each is being set to 80, 120 and 160 respectively.

Right, that's a basic overview of D3 and how it works. Let's now have a look at an example of where you might want to use D3 in OBIEE.

A Worked Example

Let's say we have some customers and each month, with a bit of luck (and with the help of a Rittman Mead implemented BI system!), we sell to those customers. Based on the sales total for each customer each month we then place these customers into a scoring group: Group 1 for customers for which revenue exceeded £10,000, Group 2 for revenue greater than £5,000, Group 3 for revenue greater than £2,000 and Group 4 for revenue greater than £0. At the end of each month we want to compare each customer's current scoring group with the scoring group they occupied the previous month. It's the movement of customers between groups each month that we are interested in analysing.

Here’s the criteria we have in OBIEE Answers.

And here are the results.


As you can see, the Movement column is calculated as (Group Last Month - Group This Month).

Now at this point you might say "Job done, nothing more to do": we can clearly see this month's and last month's scoring group and the movement between the two, we have successfully conveyed the information to the user with minimal fuss, and you'd be right. We could even develop the results further by conditionally formatting the colour of each row to reflect the current group, and maybe adding an additional column containing an up arrow, down arrow or equals sign image to indicate the direction of movement. This is where it gets subjective: some users love visuals, some prefer the raw data and some like a combination of the two. For this example we are going to try and visualise the data, and you can make up your own mind which you prefer at the end.

Let's see what we can produce using some standard OBIEE visuals.

First up, the Vertical Bar Chart.

That's not bad; we can see by how many groups a customer has shifted since last month, with some customers clearly staying put. What we can't see is which groups they moved from and to, and if we were to add the Group Last Month and Group This Month measures to the chart it would look a complete mess.

Let's try something else: a line graph, only this time we'll include all three measures to make sure we've got all the information the user requires.

Mmmm, again not bad, but to my mind it's not particularly intuitive. You have to study the graph to get the information you're after, by which time most users will have switched off completely and may be better off with a simple table view.

Let's try again with a combined line and bar chart.

A slight improvement on the previous attempt, but still not great. An issue I have with the last two examples is that to my mind customers in Group 1 (the best group) should be at the top of the chart, while customers in Group 4 should be at the bottom; the cream rises to the top, right?! We have no way of inverting the Y axis values, so we are stuck with this rather counterintuitive view of the data. We could ask the users if they would kindly change their grouping system, but it's unlikely they'll agree and anyway that would be cheating!

Here's one last attempt, just for fun!

Nope, we're not making much progress here. This is just making me feel nauseous, and believe me when I tell you I came up with several more ridiculous solutions than this. Pie chart, anyone? So far the table view has been the clear winner over the visualisations in terms of ease of interpretation, and this isn't a dig at OBIEE's visuals: OBIEE's visuals are great at doing what they are designed to do, and they do it quickly. A D3 solution will take time and planning (and debugging), but you will have total control over the output and you'll be able to express your data in a way that no standard OBIEE visualisation will ever be able to do. So with that in mind, let's have a look at one possible solution written in D3.

Developing a D3 Solution

In order to embed a D3 visualisation in OBIEE you’ll need to use a Narrative view. The Narrative view will enable you to gain access to the data that we need to drive our visualisation using the @n substitution variables where n equals the column position in the criteria. We’ll use this technique to generate some JavaScript from our analysis results that we can then use in our D3 script. Let’s look at doing that now.

In the Prefix section at the top we declare a JavaScript array variable called "data" that will contain the data from the analysis. The Narrative section contains the following code:

data.push({customer:"@1",prev_group:@2,cur_group:@3,delta:@4});

Because this line of code is in the Narrative section it will be output once for each row in the result set, each time substituting @1, @2, @3 and @4 for Customer, Group Last Month, Group This Month and Movement respectively, dynamically generating the JavaScript to populate our array. Notice that within the braces we have this format:

{ key:value, key:value, key:value } 

This will in fact create a JavaScript object for us in each array element, so we can then reference the values in our D3 script via the customer, prev_group and delta properties. As you can see, below the Postfix section we now have a load of JavaScript code ready to be used in our D3 script for testing and development purposes.
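
For illustration only (these customer names and values are made up), the generated output might look something like this, assuming the Prefix declares "data" as an empty JavaScript array:

var data = [];
data.push({customer:"Customer A",prev_group:3,cur_group:2,delta:1});
data.push({customer:"Customer B",prev_group:1,cur_group:1,delta:0});
data.push({customer:"Customer C",prev_group:2,cur_group:4,delta:-2});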

At this point I would strongly recommend leaving OBIEE and opening up your favourite development IDE, you will soon get frustrated writing JavaScript code directly into OBIEE. Personally I like NetBeans but there are several other free alternatives out there.

To get started in your favourite IDE you can either download a copy of the D3 library from http://d3js.org/ or reference the online version from within your script. I’d recommend uploading a copy to the OBIEE server once you’ve finished developing and are ready to go into production. If you want to reference the online version you will need to include the following snippet between the <head></head> tags in your html file.

<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>

So, to start developing your D3 solution, create a new HTML project in your IDE. Add a reference to the D3 library and then copy and paste the generated JavaScript code from the OBIEE narrative between some <script> tags. You should end up with something like this:
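
As a rough sketch (the div id "vis" and the sample data row here are just placeholders for whatever your narrative generated), the test harness might look like this:

<!DOCTYPE html>
<html>
  <head>
    <script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
  </head>
  <body>
    <div id="vis"></div>
    <script>
      // Generated data pasted in from the OBIEE narrative
      var data = [];
      data.push({customer:"Customer A",prev_group:3,cur_group:2,delta:1});
      // ...remaining rows...

      // Custom D3 code goes here, e.g. appending an SVG to the div above
      var svg = d3.select("#vis").append("svg")
                  .attr("width", 400)
                  .attr("height", 300);
    </script>
  </body>
</html>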

You are now ready to make a start: simply save the project and open up the resulting HTML file in your favourite browser to test any changes you make along the way.

When you are ready to embed the finished code into OBIEE you'll first need to decide from where you wish to reference the D3 library and the D3 code you have just written. With regards to the D3 library, as I mentioned earlier you can either reference the online version or copy the library to the OBIEE server and reference it from there. With the custom D3 code you've written you can again either upload the code to a file on the OBIEE server or just copy and paste your code directly into the narrative. I'd recommend uploading it to the OBIEE server so you can reference it in other analyses at a later date, but for now let's just paste it in.

Let's have a look at the completed Narrative view.

The first thing to note is that the "Contains HTML Markup" checkbox is ticked. This is required because we have now entered some <script> tags and a <div> tag, and without it ticked OBIEE will not interpret them correctly. The first <script> tag in the Prefix section references the online D3 library. The second <script> tag in the Prefix section is closed at the start of the Postfix section and wraps around the "data" variable and the JavaScript used to populate it. Below the closing </script> tag in the Postfix section we create an HTML DIV element that will contain the SVG created by the D3 script. Finally, we either enter a reference to our custom D3 script on the OBIEE server or just paste it in between <script> tags. One important thing to note is that when we transfer the code from our IDE to OBIEE we only want to bring across the D3 code; we want to leave behind the "data" variable and all the HTML tags, as OBIEE will be generating these for us.
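
Putting that together, the three sections of the Narrative view end up looking roughly like the sketch below (the div id is just an example):

Prefix:
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
<script>var data = [];

Narrative:
data.push({customer:"@1",prev_group:@2,cur_group:@3,delta:@4});

Postfix:
</script>
<div id="vis"></div>
<script>
/* your custom D3 script goes here, or a reference to it on the OBIEE server */
</script>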

So, let's take a look at the solution written in D3.

As you can see, this contains all the information we require. We can clearly see the current group that each customer occupies and the movement, if any, from the group they occupied the previous month. This could easily be enhanced so that when you hover your mouse over a customer the raw data is presented to the user in some way; you'd simply need to change the Narrative section to include a new column from the criteria and then write the required code in D3 to display it. You really are only limited by your imagination. It may take more time and effort to produce a D3 solution than to use a standard OBIEE visual and yes, as with all customisations, you are exposed from a risk point of view when it comes to upgrades and when the person who developed the solution decides to go travelling the world. But for those edge cases where you just can't get the right result using a standard OBIEE visual, D3 may just save your bacon.

Below is a live demo of the visualisation outside of OBIEE, it is seeded from random data each time you press the reload button.

It’s been tested in Firefox, Chrome, Safari and IE9. IE versions 8 and below do not natively support SVG although there are various workarounds, none of which have been implemented in this example.

And here’s a link to the code on jsfiddle.net so you can have a play with it yourself. D3 Demo

So in closing here are some tips around developing D3 in OBIEE.

  • Try to sketch out on paper what you want your D3 solution to look like and then try to replicate it with code (the fun bit!).
  • Head over to d3js.org for some inspiration, there are some truly mind blowing examples on there.
  • Use the Narrative view to generate some hard-coded data; you can then use it for development and testing purposes.
  • Don't develop directly in OBIEE; use a purpose-built development IDE.
  • Once you’re ready to go into production with your solution upload the D3 library and your custom code to the OBIEE server and reference it from there.

Have fun….

Testing aggregate navigation on OBIEE and Exalytics

One of OBIEE’s many great strengths is aggregate navigation; the ability to choose from a list of possible tables the one which will probably give the optimal performance for a given user query. Users are blissfully unaware of which particular table their query is being satisfied from, since aggregate navigation happens on the BI Server once the user’s request comes through from an Analysis or Dashboard.

The seamless nature of aggregate navigation means that testing that specific aggregates are working can be fiddly. We want to ensure that the aggregates we've built are (i) being used when appropriate and (ii) showing the correct data. This is particularly the case on Exalytics, where aggregates are put into the in-memory (TimesTen) database by the Summary Advisor and we need to validate them.

Whilst the log file nqquery.log (or Usage Tracking table S_NQ_DB_ACCT) tells us pretty easily which table a query used, it is nice to be able to switch a query easily between possible aggregate sources to be able to compare the data. This blog demonstrates how we can use the INACTIVE_SCHEMAS variable (as described in my previous blog on loading Exalytics incrementally) to do this.

INACTIVE_SCHEMAS is a Logical SQL variable that tells the BI Server to exclude the specified physical schema(s) from consideration for resolving an inbound query. Normally, the BI Server will parse each incoming query through the RPD, and where a Logical Table has multiple Logical Table Sources it will evaluate each one to determine if it (a) can satisfy the query and (b) whether it will be the most efficient one to use. By using INACTIVE_SCHEMAS we can force the BI Server to ignore certain Logical Table Sources (those associated with the physical schema specified), ensuring that it just queries the source(s) we want it to.
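
For example, a logical SQL prefix like the one below (using the TimesTen database and schema names from the example that follows) tells the BI Server to exclude that schema for the query; the analysis' own SELECT statement then follows it as normal:

SET VARIABLE INACTIVE_SCHEMAS='"TimesTen aggregates".."EXALYTICS"';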

In the following example, the data exists on both an Oracle database and in TimesTen (in-memory). Whilst the example here is based on an Exalytics architecture, the principle should be exactly the same regardless of where the aggregates reside. This is how the RPD is set up for the Fact table in my example:

The GCBC_SALES schema on Oracle holds the unaggregated sales data, whilst the EXALYTICS schema on TimesTen has an aggregate of this data in it. The very simple report pictured here shows sales by month, and additionally uses a Logical SQL view to show the contents of the query being sent to the BI Server:

Looking at nqquery.log we can see the query by default hits the TimesTen source:

[...]
------------- Sending query to database named TimesTen aggregates
WITH
SAWITH0 AS (select distinct T1528.Sale_Amoun000000AD as c1,
     T1514.Month_YYYY000000D0 as c2
from
     SA_Month0000011E T1514,
     ag_sales_month T1528
[...]

Now, for thoroughness, let’s compare this to what’s in the TimesTen database, using a Direct Database Request:

OK, all looks good. But is what we've aggregated into TimesTen matching what we've got in the source data on Oracle? Here we can use INACTIVE_SCHEMAS to force the BI Server to ignore TimesTen entirely. We can see from nqquery.log that OBI has now gone back to the Oracle source of the data:

[...]
------------- Sending query to database named orcl
WITH
SAWITH0 AS (select sum(T117.FCAST_SAL_AMT) as c1,
     T127.MONTH_YYYYMM as c2
from
     GCBC_SALES.TIMES T127 /* Dim_TIMES */ ,
     GCBC_SALES.SALES T117 /* Fact_SALES */
[...]

and the report shows that actually we have a problem in our data, since what’s on the source doesn’t match the aggregate:

A Direct Database Request against Oracle confirms the data we’re seeing – we have a mismatch between our source and our aggregate:

This is the kind of testing that it is crucial to perform. Without proper testing, problems may only come to light in specific reports or scenarios, because aggregate navigation by its very nature works silently, hidden from the user.

So this is the feature we can use to perform the testing, but below I demonstrate a much more flexible way than having to build multiple reports.

Implementing INACTIVE_SCHEMAS

Using INACTIVE_SCHEMAS is very simple and doesn't require modification to your reports. Simply use a Variable Prompt to populate INACTIVE_SCHEMAS as a Request Variable. Disable the Apply button for instantaneous switching when the value is changed.

A Request Variable will be prepended to any logical SQL sent to the BI Server. Save this prompt in your web catalog, and add it to any dashboard on which you want to test the aggregate:

Even better, if you set the security on the dashboard prompt such that only your admins have access to it, then you could put it on all of your dashboards as a diagnostic tool and only those users with the correct privilege will even see it:

Displaying the aggregate source name in the report

So far this is all negative, in that we are specifying the data source not to use. We can examine nqquery.log etc. to confirm which source was used, but it's hardly convenient to wade through log files each time we execute the report. Inspired by (or rather, ripped off from) SampleApp is this trick:

  1. Add a logical column to the fact table
  2. Hard code the expression for the column in each Logical Table Source (see the sketch after this list)
  3. Bring the column through to the relevant subject area
  4. Incorporate it in reports as required, for example using a Narrative View.
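
As a sketch, assuming the new logical column is called "Data Source", the expression in each Logical Table Source is simply hard-coded to a literal that identifies that source, for example:

In the TimesTen LTS (ag_sales_month): 'TimesTen aggregate'
In the Oracle LTS (Fact_SALES):       'Oracle detail'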

Bringing it all together gives us this type of diagnostic view of our reports:

Summary

There’s a variety of ways to write bespoke test reports in OBI, but what I’ve demonstrated here is a very minimal way of overlaying a test capability on top of all existing dashboards. Simply create the Request Variable dashboard prompt, set the security so only admins etc can see it, and then add it in to each dashboard page as required.

In addition, the use of a ‘data source’ logical column in a fact table tied to each LTS can help indicate further where the data seen is coming from.

Incremental refresh of Exalytics aggregates using native BI Server capabilities

One of the key design features of the Exalytics In-Memory Machine is the use of aggregates (pre-calculated summary data), held in the TimesTen In-Memory database. Out of the box (“OotB”) these aggregates are built through the OBIEE tool, and when the underlying data changes they must be rebuilt from scratch.

For OBIEE (Exalytics or not) to make use of aggregate tables in a manner invisible to the user, they must be mapped into the RPD as additional Logical Table Sources for the respective Logical Table in the Business Model and Mapping (BMM) layer. OBIEE will then choose the Logical Table Source that it thinks will give the fastest response time for a query, based on the dimension level at which the query is written.

OBIEE’s capability to load aggregates is provided by the Aggregate Persistence function, scripts for which are generated by the Exalytics Summary Advisor, or the standard tool’s Aggregate Persistence Wizard. The scripts can also be written by hand.

Aggregate Persistence has two great benefits:

  1. It uses the existing metadata model of the RPD to understand where to get the source data for the aggregate from, and how to aggregate it. Because it uses standard RPD metadata, it also means that any data source that is valid for reporting against in OBIEE can be used as a source for the aggregates, and OBIEE will generate the extract SQL automagically. The aggregate creation process becomes source-agnostic. OBIEE will also handle any federation required in creating the aggregates. For example, if there are two source systems (such as Sales, and Stock) but one target aggregate, OBIEE will manage the federation of the aggregated data, just as it would in any query through the front-end.
  2. All of the required RPD work for mapping the aggregate as a new Logical Table Source is done automagically. There is no work on the RPD required by the developer.

However, there are two particular limitations to ‘vanilla’ Aggregate Persistence:

  1. It cannot do incremental refresh of aggregates. Whenever the underlying data changes, the aggregate must be dropped and rebuilt in entirety. This can be extremely inefficient if only a small proportion of the source data has changed, and can ultimately lead to scalability and batch SLA issues.
  2. Each time that the aggregate is updated, the RPD is modified online. This can mean that batch times take longer than they need to, and is also undesirable in a Production environment.

I have written about alternatives and variations to the OotB approach for refreshing Exalytics aggregates previously here and here, namely:

  1. Loading TimesTen aggregates through bespoke ETL, in tools such as GoldenGate and ODI. TimesTen supports a variety of interfaces – including ODBC and JDBC – and therefore can be loaded by any standard ETL tool. A tool such as GoldenGate can be a good way of implementing a light-touch CDC solution against a source database.
  2. Loading TimesTen aggregates directly using TimesTen's Load from Oracle functionality, taking advantage of Aggregate Persistence to do the aggregate mapping work in the RPD.

In both of these cases, there are downsides to the method. Using bespoke ETL is ultimately very powerful and flexible, but has the overhead of writing the ETL along with requiring manual mapping of the aggregates into the RPD. This mapping work is done for you in the TimesTen Load from Oracle method, but that method can only be used against an Oracle source database and where a single physical SQL statement is required to load the aggregate.

Refreshing aggregates using native OBIEE functionality alone

Here I present another alternative method for refreshing Exalytics aggregates, using OBIEE functionality alone and remaining close to the OotB method. It is based on Aggregate Persistence but varies in two significant ways:

  1. Incremental refresh of the aggregate is possible
  2. No changes are made to the RPD when the aggregate is refreshed

The method still uses the fundamentals of Aggregate Persistence since, as I mentioned above, it has some very significant benefits:

  • BI Server uses (dare I say, leverages) your existing metadata modelling work, which is necessary – regardless of your aggregates – for users to report from the unaggregated data.
  • BI Server generates your aggregate refresh ETL code
  • If your source systems change, your aggregate refresh code doesn’t need to – just as reports are decoupled from the source system through the RPD metadata layers, so are your target aggregates

For us to understand the new method, a bit of background and explanation of the technology is required.

Background, part 1 : Aggregate Persistence – under the covers

When Aggregate Persistence runs, it does several things:

  1. Remove aggregates from physical database and RPD mappings
  2. Create the physical aggregate tables and indexes on the target database, for the fact aggregate and supporting dimensions
  3. Update the RPD Physical and Logical (BMM) layers to include the newly built aggregates
  4. Populate the aggregate tables, from source via the BI Server to the aggregate target (TimesTen)

What we are going to do here is pick apart Aggregate Persistence and invoke just part of it. We don’t need to rebuild the physical tables each time we refresh the data, and we don’t need to touch the RPD. We can actually just tell the BI Server to load the aggregate table, using the results of a Logical SQL query. That is, pretty much the same SQL that would be executed if we ran the aggregate query from an analysis in the OBIEE front end.

The command to tell the BI Server to do this is the populate command, which can be found from close inspection of the nqquery.log during execution of normal Aggregate Persistence:

populate "ag_sales_month" mode ( append table connection pool "TimesTen aggregates"."TT_CP") as
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" 
from "Sales";

This populate <table> command can be sent by us directly to the BI Server (exactly in the way that a standard create aggregate Aggregate Persistence script would be, with nqcmd etc.) and causes it to load the specified table (using the specified connection pool) with the results of the logical SQL given. The re-creation of the aggregate tables, and the RPD mapping, doesn't get run.
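
For example (the DSN, credentials and script file name here are illustrative only), the populate script can be sent to the BI Server with nqcmd in the usual way:

nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s populate_ag_sales_month.lsql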

The syntax of the populate command is undocumented, but from observing the nqquery.log file it follows this pattern:
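
(Reconstructed from the example above; angle brackets denote placeholders.)

populate "<aggregate table>" mode ( append table connection pool "<database>"."<connection pool>") as
select_business_model <logical columns, aliased to the physical column names>
from "<business model>";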

Looking at the very simple example above, we can see how an aggregate with a measure summarised by month is populated.

SELECT_BUSINESS_MODEL was written about by Venkat here, and is BI Server syntax allowing a query directly against the BMM, rather than the Presentation Layer which Logical SQL usually specifies. You can build and test the SELECT_BUSINESS_MODEL clause in OBIEE directly (from Administration -> Issue SQL), in nqcmd, or just by extracting it from the nqquery.log.

Background, part 2: Secret Sauce – INACTIVE_SCHEMAS

So, we have seen how we can take advantage of Aggregate Persistence to tell the BI Server to load an aggregate, from any source we’ve modelled in the RPD, without requiring it to delete the aggregate to start with or modify the RPD in any way.

Now, we need a bit of secret sauce to complete the picture and make this method a viable one.

In side-stepping the full Aggregate Persistence sequence, we have one problem. The Logical SQL that we use in the populate statement is going to be parsed by the BI Server to generate the select statement(s) against the source database. However, the BI Server applies its standard query parsing to it, using the metadata defined. Because the aggregates we are loading are already mapped into the RPD, by default the BI Server will probably try to use the aggregate to satisfy the aggregate populate request (because it will judge it the most efficient LTS), thus loading data straight from the table that we are trying to populate!

The answer is the magical INACTIVE_SCHEMAS variable. What this does is tell OBIEE to ignore one or more Physical schemas in the RPD and, importantly, any associated Logical Table Sources. INACTIVE_SCHEMAS is documented as part of Double Buffering. It can be used in any logical SQL statement, so is easily demonstrated in an analysis (using Advanced SQL Clauses -> Prefix):


Forcing an OBIEE query to avoid an LTS, using INACTIVE_SCHEMAS.


So when we specify the populate command to update the aggregate, we just include the necessary INACTIVE_SCHEMAS prefix:

SET VARIABLE INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"'; 
populate "ag_sales_month" mode ( append table connection pool 
"TimesTen aggregates"."TT_CP") as  
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" 
from "Sales";

Why, you could reasonably ask, is this not necessary in a normal OotB aggregate refresh? For the simple reason that in "vanilla" Aggregate Persistence usage the whole aggregate gets deleted from the RPD before it is rebuilt, and therefore when the aggregate query is executed only the base LTS is enabled in the RPD at that point in time.

The final part of the puzzle – Incremental refresh

So, we have a way of telling the BI Server to populate a target aggregate without rebuilding it, and we have the workaround necessary to stop it trying to populate the aggregate from itself. The last bit is making sure that we only load the data we want to. If we execute the populate statement as it stands, straight from the nqquery.log of the initial Aggregate Persistence run, then we will end up with duplicate data in the target aggregate. So we need to do one of the following:

  1. Truncate the table contents before the populate
  2. Use a predicate in the populate Logical SQL so that only selected data gets loaded

To issue a truncate command, you can use the logical SQL command execute physical to get the BI Server to run a command against the target database, for example:

execute physical connection pool "TimesTen Aggregates"."TT_CP" truncate table ag_sales_month

This truncate/load method is appropriate for refreshing dimension aggregate tables, since there won’t usually be an update key as such. However, when refreshing a fact aggregate it is better for performance to use an incremental update and only load data that has changed. This assumes that you can identify the data and have an update key for it. In this example, I have an aggregate table at Month level, and each time I refresh the aggregate I want to load just data for the current month. In my repository I have a dynamic repository variable called THIS_MONTH. To implement the incremental refresh, I just add the appropriate predicate to the SELECT_BUSINESS_MODEL clause of the populate statement:

select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" 
from "Sales" 
where "Dim Times"."Month YYYYMM" =  VALUEOF("THIS_MONTH")

Making the completed aggregate refresh command to send to the BI Server:

SET VARIABLE DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, DISABLE_SUMMARY_STATS_LOGGING=1, 
INACTIVE_SCHEMAS='"TimesTen Aggregates".."EXALYTICS"'; 
populate "ag_sales_month" mode ( append table connection pool 
"TimesTen aggregates"."TT_CP") as  
select_business_model "Sales"."Fact Sales"."Sale Amount" as "Sale_Amoun000000AD","Sales"."Dim Times"."Month YYYYMM" as "Month_YYYY000000D0" 
from "Sales" 
where "Dim Times"."Month YYYYMM" =  VALUEOF("THIS_MONTH");

Since there will be data in the table for the current month, I delete this out first, using execute physical:

execute physical connection pool "TimesTen Aggregates"."TT_CP" delete from ag_sales_month where Month_YYYY000000D0 = VALUEOF(THIS_MONTH);

Step-by-step

The method I have described above is implemented in two parts:

  1. Initial build – only needs doing once
    1. Create Aggregate Persistence scripts as normal (for example, with Summary Advisor)
    2. Execute the Aggregate Persistence script to:
      1. Build the aggregate tables in TimesTen
      2. Map the aggregates in the RPD
    3. Create custom populate scripts:
      1. From nqquery.log, extract the full populate statement for each aggregate (fact and associated dimensions)
      2. Add the INACTIVE_SCHEMAS setting to the populate script, specifying the target TimesTen database and schema.
      3. For incremental refresh, add a WHERE clause to the populate logical SQL so that it only fetches the data that will have changed. Repository variables are useful here for holding date values such as current date, week, etc.
      4. If necessary, build an execute physical script to clear down all or part of the aggregate table. This is run prior to the populate script to ensure you do not load duplicate data.
  2. Aggregate refresh – run whenever the base data changes
    1. Optionally, execute the execute physical script to prepare the aggregate table (by deleting whatever data is about to be loaded)
    2. Execute the custom populate script from above.
      Because the aggregates are being built directly from the base data (as enforced by INACTIVE_SCHEMAS), the refresh scripts for multiple aggregates could potentially be run in parallel (e.g. using xargs; see the sketch after this list). A corollary of this is that this method could put additional load on the source database, because it will be hitting it for every aggregate, whereas vanilla Aggregate Persistence will build aggregates from existing lower-level aggregates if it can.
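
A minimal sketch of that parallel refresh, assuming one populate script per aggregate and the same illustrative DSN and credentials as above:

ls refresh_ag_*.lsql | xargs -P 4 -I {} nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s {}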

Summary

This method is completely valid for use outside of Exalytics too, since only the Summary Advisor is licensed separately; Aggregate Persistence itself is standard OBIEE functionality. For an Exalytics deployment in an environment where aggregate definitions and requirements change rapidly, this method would be less appropriate, because of the additional work required to modify the scripts. However, for an Exalytics deployment where aggregates change less frequently, it could be very useful.

The approach is not without drawbacks. Maintaining a set of custom populate commands has an overhead (although arguably no more so than a set of Aggregate Persistence scripts), and the flexibility comes at the cost of putting the onus of data validity on the developer. If an aggregate table is omitted from the refresh (for example, a supporting aggregate dimension table) then reports will show erroneous data.

The benefit of this approach is that aggregates can be rapidly built and maintained in a sensible manner. The RPD is modified only in the first step, the initial build. It is then left entirely untouched. This makes refreshes faster, and safer; if it fails there is just the data to tidy up, not the RPD too.

Oracle BI Tech Demos YouTube channel

A quick post to advertise Oracle's BI Tech Demos YouTube channel!

Very good content and good information, even if a lot of it isn't yet available to the public and/or is only shown under NDA. It's an extremely valuable peek into the future though, so you can see what's coming up.