
Taking a Look at Oracle Big Data Preparation Cloud Service – Spark-Based Data Transformation in the Cloud

One of the sessions I’m delivering at the upcoming Oracle Openworld 2015 in San Francisco is entitled “Oracle Business Intelligence Cloud Service—Moving Your Complete BI Platform to the Cloud [UGF4906]”, and looks at how you can now migrate your entire OBIEE11g platform into Oracle Public Cloud, including the data warehouse and data integration routines. Using Oracle BI Cloud Service’s new on-premise RPD upload feature you can upload an existing RPD into BICS and run it from there, with the accompanying data warehouse database moving into Oracle’s Database Cloud Service (and with the restriction that you can’t then edit the repository within BICS – you need to do that on-premise and upload again). For ETL and data integration you can carry on using ODI12c, which now has the ability to load data into Oracle Storage Cloud (for file sources) and BICS (via a REST API) as well as the full Oracle DBaaS, but another Oracle option for purely cloud-based data processing and enrichment has recently become available – Oracle Big Data Preparation Cloud Service. So what is it, how does it work and how is it different to ODI12c?

Oracle Big Data Preparation Cloud Service (“BDP”) is a thin-client application within Oracle Cloud for ingesting, preparing and enriching datasets that don’t have a predefined schema and may well need certain fields obfuscated or cleansed. Being integrated with Oracle Storage Cloud and other infrastructure and platform services within Oracle Cloud it’s obviously aimed mainly at data transformation tasks within the Oracle Cloud environment, but you can upload and download datasets from your browser for use with on-premise applications. Unlike the more general-purpose Oracle Data Integrator it’s aimed instead at a particular use-case – non-technical information analysts who need to get data transformed, wrangled and enriched before they can make use of it in an environment like Hadoop. In fact the product name is a bit misleading – it runs on a big data platform within Oracle Cloud and, like Oracle Big Data Discovery, uses Apache Spark for its data processing – but it could potentially be useful for a business analyst to prepare data for loading into Oracle BI Cloud Service, and I’ll cover this angle when I talk about data loading options in my Oracle Openworld session.

Within a logical architecture for a typical big data DW and BI system, BDP sits alongside ODI within the Data Factory and provides self-service, agile transformation capabilities to more business-orientated users. 


Oracle Big Data Preparation Cloud Service shares quite a bit of functionality and underlying technology with Oracle Big Data Discovery – both run on Hadoop, both use Apache Spark for data processing and transformation, and both offer data transformation and “wrangling” features aimed at non-technical users. Oracle are positioning Big Data Preparation Service as something you’d use in the execution layer of the Oracle Information Management Reference Architecture whereas Big Data Discovery is associated more with the discovery layer – I’d mostly agree, but I can see a role for BDD even within the execution layer, as a front-end to the data reservoir that typically now runs alongside relationally-stored data warehouses.


Looking back at the slides from one of the recent Strata conferences, for example, you can see Oracle positioning BDP as the “operational data preparation” tool for structured and unstructured data – with no defined schema – coming into your information platform, with the enriched output then being used by BI tools, enterprise reporting and data discovery tools.


 

Apart from the scalability benefits of running BDP on Apache Spark, the other interesting feature in BDP is how it uses Spark’s machine learning capabilities to try to automate as much of the data preparation process as possible, for example detecting credit card numbers in data fields and recommending you obfuscate that column. Similar to BICS and how Oracle have tried to simplify the process of creating reports and dashboards for a small team, BDP runs in the cloud and tries to automate and simplify as much of the data preparation and enrichment process as possible, with ODI12c still available for ETL developers to develop more complex transformations.

The development lifecycle for BDP (from the Oracle Big Data Preparation Cloud Service e-book on Oracle’s website) is a cycle of ingesting, cleaning, enriching and then publishing data, using scripts authored in the tool and run on the Apache Spark platform. The diagram below, from Oracle’s Big Data Preparation Cloud Service Handbook, shows how ingestion, enrichment, publishing and governance run in a cycle on the common foundation of the transformation scripts that you build using BDP’s web interface.


So let’s walk through an example data preparation exercise using a file of data stored initially in Oracle Storage Cloud Service. After logging into BDP via Oracle Cloud you’re first presented with the Catalog view, listing out all your previous transformations and showing you when they were last used to process some data.


To create a transformation you first give it a name, then select the data source and then the file you’re interested in. In my environment I’ve got Oracle Storage Cloud and HDFS available as my main data sources, or I could upload a file from my desktop and start from there.


BDP then ingests the file and uses its machine learning features to process and classify the data in each column, recommending column names such as “gender”, “city” and “cc_number” based on (presumably) some sort of classification model. In the screenshot below you can see a set of these recommendations on the left-hand side of the screen, with the columns themselves listed in the centre and a summary of the file profiling on the right.


Taking a closer look at the profile results panel you can see two of the columns have alerts raised, in red. Clicking on the alert shows that the two columns have credit card data stored in clear text, with the recommendation being to obfuscate or otherwise secure these fields. Clicking on a field then shows the various transformation options, with the obvious choice here being to automatically obfuscate the data in those fields.


Once you’ve worked through all the recommendations and added any transformations you choose to add yourself, the final step is to publish your transformation to one of the available targets. In the example below we’ve got Oracle Storage Cloud and HDFS again as potential targets; I’d imagine Oracle will add a connector to BICS soon, for example, so that you can use BDP as a data prep tool for file data that will then be added to your dataset in BICS.


So … it’ll be interesting to see where this one goes. It’s interesting that Oracle have split out data preparation and data discovery into two tools whilst others are saying theirs can do both, and you’ll still need ODI for the more complex integration jobs. But I like the innovative use of machine learning to do away with much of the manual work required for classification of incoming data fields, and running the whole thing on Spark certainly gives it the potential to scale. A couple of years ago I was worried Oracle didn’t really have a strategy for data integration and ETL in the cloud, but we’re starting to see something happen now.

There’s a big push from the Oracle field at the moment to move customers into the cloud, and I can see BDP getting bundled in with Big Data Cloud Service and BICS as the accompanying cloud data preparation tool. The danger then of course is that Big Data Discovery starts to look less useful, especially with Visual Analyzer already available within BICS and coming soon on-premise with OBIEE12c. My guess is that what we’re seeing now with these initial releases of BDP and BDD is just the start, with BDP adding more automatic enrichment “smarts” and starting to cover integration use-cases too, whilst BDD will put more focus on data visualization and analytics on the data reservoir.

Managing the OBIEE BI Server Cache from ODI 12c

I wrote recently about the OBIEE BI Server Cache and how useful it can be, but how important it is to manage it properly, both in the purging of stale data and seeding of new. In this article I want to show how to walk-the-walk and not just talk-the-talk (WAT? But you’re a consultant?!). ODI is the premier data integration tool on the market and one that we are great fans of here at Rittman Mead. We see a great many analytics implementations built with ODI for the data load (ELT, strictly speaking, rather than ETL) and then OBIEE for the analytics on top. Managing the BI Server cache from within your ODI batch makes a huge amount of sense. By purging and reseeding the cache directly after the data has been loaded into the database we can achieve optimal cache usage with no risk of stale data.

There are two options for cleanly hooking into OBIEE from ODI 12c with minimal fuss: JDBC, and Web Services. JDBC requires the OBIEE JDBC driver to be present on the ODI Agent machine, whilst Web Services have zero requirement on the ODI side, but a bit of config on the OBIEE side.

Setting up the BI Server JDBC Driver and Topology

Here I’m going to demonstrate using JDBC to connect to OBIEE from ODI. It’s a principle that was originally written up by Julien Testut here. We take the OBIEE JDBC driver bijdbc.jar from $FMW_HOME/Oracle_BI1/bifoundation/jdbc and copy it to our ODI machine. I’m just using a local agent for my testing, so put it in ~/.odi/oracledi/userlib/. For a standalone agent it should go in $AGENT_HOME/odi/agent/lib.

[oracle@ip-10-103-196-207 ~]$ cd /home/oracle/.odi/oracledi/userlib/  
[oracle@ip-10-103-196-207 userlib]$ ls -l  
total 200  
-rw-r----- 1 oracle oinstall    332 Feb 17  2014 additional_path.txt  
-rwxr-xr-x 1 oracle oinstall 199941 Sep 22 14:50 bijdbc.jar

Now fire up ODI Studio, sign in to your repository, and head to the Topology pane. Under Physical Architecture -> Technologies you’ll see Oracle BI.

Right click and select New Data Server. Give it a sensible name and put your standard OBIEE credentials (e.g. weblogic) under the Connection section. Click the JDBC tab and click the search icon to the right of the JDBC Driver text box. Select the default, oracle.bi.jdbc.AnaJdbcDriver, and then in the JDBC Url box put your server and port (9703, unless you’ve changed the listen port of the OBIEE BI Server).
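The URL format for the BI Server JDBC driver is, as far as I know, just the jdbc:oraclebi prefix followed by host and port – so something along these lines (the hostname is obviously illustrative):

jdbc:oraclebi://my-obiee-server.foo.com:9703/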

Now click Test Connection (save the data server when prompted, and click OK at the message about creating a physical schema), and select the Local Agent with which to run it. If you get an error then click Details to find out the problem.

One common problem can be the connection through to the OBIEE server port, so to cut ODI out of the equation try this from the command prompt on your ODI machine (assuming it’s *nix):

nc -vz my-obiee-server.foo.com 9703

If the host resolves correctly and the port is open then you should get:

Connection to my-obiee-server.foo.com 9703 port [tcp/*] succeeded!

If not you’ll get something like:

nc: my-obiee-server.foo.com port 9703 (tcp) failed: Connection refused

Check the usual suspects – firewall (eg iptables) on the OBIEE server, firewalls on the network between the ODI and OBIEE servers, etc.

Assuming you’ve got a working connection you now need to create a Physical Schema. Right click on the new data server and select New Physical Schema.

OBIEE’s BI Server acts as a “database” to clients, within which there are “schemas” (Subject Areas) and “tables” (Presentation Tables). On the New Physical Schema dialog you just need to set Catalog (Catalog), and when you click the drop-down you should see a list of the Subject Areas within your RPD. Pick one – it doesn’t matter which.

Save the physical schema (ignore the context message). At this point your Physical Architecture for Oracle BI should look like this:

Now under Logical Architecture locate the Oracle BI technology, right click on it and select New Logical Schema. From the Physical Schemas dropdown select the one that you’ve just created. Give a name to the Logical Schema.

Your Logical Architecture for Oracle BI should look like this:

Building the Cache Management Routine

Full Cache Purge

Over in the Designer tab go to your ODI project into which you want to integrate the OBIEE cache management functions. Right click on Procedures and select Create New Procedure. Give it a name such as OBIEE Cache – Purge All and set the Target Technology to Oracle BI.

Switch to the Tasks tab and add a new Task. Give it a name, and set the Schema to the logical schema that you defined above. Under Target Command enter the call you want to make to the BI Server, which in this case is

call SAPurgeAllCache();

Save the procedure and then from the toolbar menu click on Run. Over in the Operator tab you should see the session appear and soon after complete – all being well – successfully.

You can go and check your BI Server Cache from the OBIEE Administration Tool to confirm that it is now empty:

And confirm it through Usage Tracking:

From what I can see at the default log levels, nothing gets written to either nqquery.log or nqserver.log for this action unless there is an error in your syntax in which case it is logged in nqserver.log:

(For more information on that particular error see here)

Partial Cache Purge

This is the same pattern as above – create an ODI Procedure to call the relevant OBIEE command, which for purging by table is SAPurgeCacheByTable. We’re going to get a step more fancy now, and add a variable that we can pass in so that the Procedure is reusable multiple times over throughout the ODI execution for different tables.

First off create a new ODI Variable that will hold the name of the table to purge. If you’re working with multiple RPD Physical Database/Catalog/Schema objects you’ll want variables for those too:

Now create a Procedure as before, with the same settings as above but a different Target Command, based on SAPurgeCacheByTable and passing in the four parameters as single quoted, comma separated values. Note that these are the Database/Catalog/Schema/Table as defined in the RPD. So “Database” is not your TNS or anything like that, it’s whatever it’s called in the RPD Physical layer. Same for the other three identifiers. If there’s no Catalog (and often there isn’t) just leave it blank.

When including ODI Variable(s) make sure you still single-quote them. The command should look something like this:
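For illustration – assuming a project named OBIEE_CACHE holding a variable TABLE_NAME, and made-up RPD Database and Schema names (substitute your own) – the Target Command would be something along these lines:

call SAPurgeCacheByTable('My_DB','','My_Schema','#OBIEE_CACHE.TABLE_NAME');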

Now let’s seed the OBIEE cache with a couple of queries, one of which uses the physical table and one of which doesn’t. When we run our ODI Procedure we should see one cache entry go and the other remain. Here’s the seeded cache:

And now after executing the procedure:

And confirmation through Usage Tracking of the command run:

Cache Seeding

As before, we use an ODI Procedure to call the relevant OBIEE command. To seed the cache we can use SASeedQuery, which strictly speaking isn’t documented, but a quick perusal of nqquery.log when you run a cache-seeding OBIEE Agent shows that it is what is called in the background, so we’re going to use it here (and it’s mentioned in support documents on My Oracle Support, so it’s not a state secret). The documentation here gives some useful advice on what you should be seeding the cache with — not necessarily only exact copies of the dashboard queries that you want to get a cache hit for.

Since this is a cookie-cutter of what we just did previously you can use the Duplicate Selection option in ODI Designer to clone one of the other OBIEE Cache procedures that you’ve already created. Amend the Target Command to:
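That is, a call to SASeedQuery with the Logical SQL you want seeded passed in as a single-quoted string – something along these lines (the query here is just illustrative; use your own Logical SQL, escaping any single quotes within it):

call SASeedQuery('SELECT 0 s_0, "A - Sample Sales"."Time"."T02 Per Name Month" s_1, "A - Sample Sales"."Base Facts"."1- Revenue" s_2 FROM "A - Sample Sales"');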

When you run this you should see a positive confirmation in the nqserver.log of the cache seed:

[2015-09-23T23:23:10.000+01:00] [OracleBIServerComponent] [TRACE:3]  
[USER-42] [] [ecid: 005874imI9nFw000jzwkno0007q700008K,0] [tid: 9057d700]  
[requestid: 477a0002] [sessionid: 477a0000] [username: weblogic]  
--------------------  
Query Result Cache: [59124]  
The query for user 'weblogic' was inserted into the query result cache.  
The filename is '/app/oracle/biee/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/cache/NQS__735866_84190_2.TBL'. [[

A very valid alternative to calling SASeedQuery would be to call the OBIEE SOA Web Service to trigger an OBIEE Agent that populated the cache (by setting ‘Destination’ to ‘Oracle BI Server Cache (For seeding cache)’). OBIEE Agents can also be ‘daisy chained’ so that one Agent calls another on completion, meaning that ODI could kick off a single ‘master’ OBIEE Agent which then triggered multiple ‘secondary’ OBIEE Agents. The advantage of this approach over SASeedQuery is that cache seeding is more likely to change as OBIEE usage patterns do, and it is easier for OBIEE developers to maintain all the cache seeding code within ‘their’ area (OBIEE Presentation Catalog) than put in a change request to the ODI developers each time to change a procedure.

Integrating it in the ODI batch

You’ve two options here: Packages or Load Plans. Load Plans were introduced in ODI 11.1.1.5 and are a clearer and more flexible way of orchestrating the batch.

To use it in a load plan create a serial step that will call a mapping followed by the procedure to purge the affected table. In the procedure step in the load plan set the value for the variable. At the end of the load plan, call the OBIEE cache seed step:

Alternatively, to integrate the above procedures into a Package instead of a load plan you need to add two steps per mapping. First, the variable is updated to hold the name of the table just loaded, and then the OBIEE cache is purged for the affected table. At the end of the flow a call is made to reseed the cache:

These are some very simple examples, but hopefully illustrate the concept and the powerful nature of integrating OBIEE calls directly from ODI. For more information about OBIEE Cache Management, see my post here.

OBIEE BI Server Cache Management Strategies

The OBIEE BI Server cache can be one of the most effective ways of improving response times of OBIEE dashboards. By using data already in the cache it reduces load on the database, the network, and the BI Server.

Should you be using it? I always describe it as the “icing on the cake” – it’s not a fix for a badly-designed OBIEE system, but it does make a lot of sense to use once you’re happy that the foundations for the system are in place. If the foundations are not in place? Then you’re just papering over the cracks and at some point it’s probably going to come back to bite you. As Mark Rittman put it nearly seven years ago, it’s “[…]usually the last desperate throw of the dice”. The phrase “technical debt”? Yeh, that. But BI Server caching used after performance review and optimisation, rather than instead of it – then it’s a Good Thing.

So you’ve decided to use the BI Server cache, and merrily trotted over to Enterprise Manager to enable it, restarted the BI Server, and now your work is done, right? Not quite. Because the BI Server cache will start to store data from all the queries that you run, and use it to satisfy subsequent queries. Not only will it match on a direct hit for the same query, it will use a subset of an existing cache entry where appropriate, and can even aggregate up from what’s in the cache to satisfy a query at a higher level. Clever stuff. But, what happens when you load new data into your data warehouse? Well, the BI Server continues to serve requests out of the cache, because why shouldn’t it? And herein lies the problem with “just turn caching on”. You have to have a cache management strategy.

A cache management strategy sounds grand doesn’t it? But it boils down to two things:

  1. Accuracy – Flush any data from the cache that is now stale
  2. Speed – Prime the cache so that as many queries as possible get a hit on it, first time

Maintaining an Accurate Cache

Every query that is run through the BI Server, whether from a Dashboard, Answers, or more funky routes such as custom ODBC clients or JDBC, will end up in cache. It’s possible to “seed” (“prime”/“warmup”) the cache explicitly, and this is discussed later. The only time you won’t see data in the cache is if (a) you have BI Server caching disabled, or (b) you’ve disabled the Cacheable option for a physical table that is involved in providing the data for the query being run.

You can see metadata for the current contents of the cache in the Administration Tool when connected online to the BI Server, through the Manage -> Cache menu option. This gives you lots of useful information (particularly when you come to optimising cache usage) including the size of each entry, when it was created, when it was last used, and so on.

Purging Options

So we’ve a spread of queries run that hit various dimension and fact tables and created lots of cache entries. Now we’ve loaded data into our underlying database, so we need to make sure that the next time a user runs an OBIEE query that uses the new data they can see it. Otherwise we commit the cardinal sin of any analytical system and show the user incorrect data which is a Bad Thing. It may be fast, but it’s WRONG….

We can purge the whole cache, but that’s a pretty brutal approach. The cache is persisted to disk and can hold lots of data stretching back months – to blitz all of that just because one table has some new data is overkill. A more targeted approach is to purge by physical database, physical table, or even logical query. When would you use these?

  • Purge entire cache – the nuclear option, but also the simplest. If your data model is small and a large proportion of the underlying physical tables may have changed data, then go for this
  • Purge by Physical Database – less brutal than clearing the whole cache. If you have various data sources that are loaded at different points in the batch schedule then targeting a particular physical database makes sense.
  • Purge by Physical Table – if many tables within your database have remained unchanged, whilst a large proportion of particular tables have changed (or it’s a small table) then this is a sensible option to run for each affected table
  • Purge by Query – If you add a few thousand rows to a billion row fact table, purging all references to that table from the cache would be a waste. Imagine you have a table with sales by day. You load new sales figures daily, so purging the cache by query for recent data is obviously necessary, but data from previous weeks and months may well remain untouched so it makes sense to leave queries against those in the cache. The specifics of this choice are down to you and your ETL process and business rules inherent in the data (maybe there shouldn’t be old data loaded, but what happens if there is? See above re. serving wrong data to users). This option is the most complex to maintain because you risk leaving behind in the cache data that may be stale but doesn’t match the precise set of queries that you purge against.

Which one is correct depends on

  1. your data load and how many tables you’ve changed
  2. your level of reliance on the cache (can you afford low cache hit ratio until it warms up again?)
  3. time to reseed new content

If you are heavily dependent on the cache and have large amounts of data in it, you are probably going to need to invest time in a precise and potentially complex cache purge strategy. Conversely, if you use caching as the ‘icing on the cake’ and/or it’s quick to seed new content then the simplest option is to purge the entire cache. Simple is good; OBIEE has enough moving parts without adding to its complexity unnecessarily.

Note that OBIEE itself will perform cache purges in some situations including if a dynamic repository variable used by a Business Model (e.g. in a Logical Column) gets a new value through a scheduled initialisation block.

Performing the Purge

There are several ways in which we can purge the cache. First I’ll discuss the ones that I would not recommend except for manual testing:

  1. Administration Tool -> Manage -> Cache -> Purge. Doing this every time your ETL runs is not a sensible idea unless you enjoy watching paint dry (or need to manually purge it as part of a deployment of a new RPD etc).
  2. In the Physical table, setting Cache persistence time. Why not? Because this time period starts from when the data was loaded into the cache, not when the data was loaded into your database.
    An easy mistake to make would be to think that with a daily ETL run, setting the Cache persistence time to 1 day might be a good idea. It’s not, because if your ETL runs at 06:00 and someone runs a report at 05:00, there is going to be a stale cache entry present for another 23 hours. Even if you use cache seeding, you’re still relinquishing control of the data accuracy in your cache. What happens if the ETL batch overruns or underruns?

    The only scenario in which I would use this option is if I was querying directly against a transactional system and wanted to minimise the number of hits OBIEE made against it – the trade-off being users would deliberately be seeing stale data (but sometimes this is an acceptable compromise, so long as it’s made clear in the presentation of the data).

So the two viable options for cache purging are:

  1. BI Server Cache Purge Procedures
  2. Event Polling Table

BI Server Cache Purge Procedures

These are often called “ODBC” Procedures, but technically ODBC is just one – of several – ways that the commands can be sent to the BI Server.

As well as supporting queries for data from clients (such as Presentation Services) sent as Logical SQL, the BI Server also has its own set of procedures. Many of these are internal and mostly undocumented (Christian Berg does a great job of explaining them here, and they do creep into the documentation here and here), but there are some cache management ones that are fully supported and documented. They are:

  • SAPurgeCacheByQuery
  • SAPurgeCacheByTable
  • SAPurgeCacheByDatabase
  • SAPurgeAllCache
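For reference, the calls look roughly like this – the identifiers are illustrative, the Database/Catalog/Schema/Table names are those defined in the RPD Physical layer, and the argument to SAPurgeCacheByQuery is Logical SQL:

call SAPurgeCacheByQuery('SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"');
call SAPurgeCacheByTable('My_DB','','My_Schema','My_Fact_Table');
call SAPurgeCacheByDatabase('My_DB');
call SAPurgeAllCache();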

The names of these match up to the purge processes that I describe above. The syntax is in the documentation, but what I am interested in here is how you can invoke them. They are my preferred method for managing the BI Server cache because they enable you to tightly couple your data load (ETL) to your cache purge. Setting the cache to purge based on a drop-dead timer (whether crontab, tivoli, Agent/iBot, whatever) gives you a huge margin of error if your ETL runtime does not remain consistent. Whether it organically increases in runtime as data volumes increase, or it fails and has to be fixed and restarted, ETL does not always finish bang-on when it is ‘supposed’ to.

You can call these procedures in several ways, including:

  1. nqcmd – one of the most common ways, repeated on many a blog, but requires nqcmd/OBIEE to be installed on the machine running it. nqcmd is a command-line ODBC client for connecting to the BI Server
  2. ODBC – requires BI to be installed on the machine running it in order to make the OBIEE ODBC driver available
  3. JDBC – just requires the OBIEE JDBC driver, which is a single .jar file and thus portable
  4. Web Service – the OBIEE BI Server Web Service can be used to invoke these procedures from any machine with no dependencies other than some WSM configuration on the OBIEE server side.

My preference is for JDBC or Web Service, because they can be called from anywhere. In larger organisations the team building the ETL may have very little to do with OBIEE, and so asking them to install OBIEE components on their server in order to trigger cache purging can be quite an ask. Using JDBC only a single .jar needs copying onto the server, and using the web service not even that:

curl --silent --header "Content-Type: text/xml;charset=UTF-8" \
--user weblogic:Admin123 \
--data @purge_cache_soap.xml \
http://192.168.56.102:7780/AdminService/AdminService

[...]
[59118] Operation SAPurgeAllCache succeeded!
[...]
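And for the JDBC route, a minimal sketch of the same purge call – assuming bijdbc.jar is on the classpath, and noting that the URL format, hostname and credentials here are illustrative for my environment:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PurgeBIServerCache {
    public static void main(String[] args) throws Exception {
        // Register the OBIEE BI Server JDBC driver shipped in bijdbc.jar
        Class.forName("oracle.bi.jdbc.AnaJdbcDriver");
        // Connect to the BI Server (default listen port 9703) and invoke the purge procedure,
        // just as nqcmd or an ODI Procedure would
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oraclebi://my-obiee-server.foo.com:9703/", "weblogic", "Admin123");
             Statement stmt = conn.createStatement()) {
            stmt.execute("call SAPurgeAllCache()");
        }
    }
}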

For details of configuring ODI to use the BI Server JDBC driver in order to tightly couple the cache management into an existing ODI load job, stay tuned for a future blog!

Event Polling Tables (EPT)

NB Not Event “Pooling” Tables as I’ve often seen this called

The second viable approach to automated cache purging is EPT, which is a decoupled approach to managing the cache purge, with two components:

  1. An application (your ETL) inserts a row into the table S_NQ_EPT (which is created at installation time by the RCU in the BIPLATFORM schema) with the name of the physical table in which data has been changed
  2. The BI Server polls (hence the name) the S_NQ_EPT table periodically, and if it finds entries in it, purges the cache of data that is from those tables.

So EPT is in a sense the equivalent of using SAPurgeCacheByTable, but in a manner that is not tightly coupled. It relies on configuring the BI Server for EPT, and there is no easy way to know from your ETL if the cache purge has actually happened. It also means that the cache remains stale for potentially as long as the polling interval that you’ve configured. Depending on when you’re running your ETL and the usage patterns of your users this may not be an issue, but if you are running ETL whilst users are on the system (for example intra-day micro ETL batches) you could end up with users seeing stale data. Oracle themselves recommend not setting the polling interval any lower than 10 minutes.

EPT has the benefit of being very easy to implement on the ETL side, because it is simply a database table into which the ETL developers need to insert a row for each table that they update during the ETL.
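As an illustration of just how simple it is – and do check the exact column definitions of S_NQ_EPT as created by the RCU in your environment, since I’m quoting them from memory – the ETL would do something along these lines after loading a table:

INSERT INTO S_NQ_EPT
  (UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, CATALOG_NAME, SCHEMA_NAME, TABLE_NAME, OTHER_RESERVED)
VALUES
  (1, SYSDATE, 'My_DB', NULL, 'My_Schema', 'My_Fact_Table', NULL);
COMMIT;

As with SAPurgeCacheByTable, the Database/Catalog/Schema/Table values are the names as defined in the RPD Physical layer, not the underlying connection or database schema names.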

Seeding the Cache

Bob runs an OBIEE dashboard, and the results are added to the cache so that when Bill runs the same dashboard he gets a great response time, because his dashboard runs straight from cache. Kinda sucks for Bob though, because his query ran slow as it wasn’t in the cache yet. What would be nice is if, for the first user on a dashboard, the results were already in cache. This is known as seeding the cache, or ‘priming’ it. Because the BI Server cache is not dumb and will hit the cache for queries that aren’t necessarily direct replicas of what previously ran, working out the optimal way to seed the cache can take some trial and error and careful research. The documentation does a good job of explaining what will and won’t qualify for a cache hit, and it’s worth reading this first.

There are several options for seeding the cache. These all assume you’ve figured out the queries that you want to run in order to load the results into cache.

  1. Run the analysis manually, which will return the analysis data to you and insert it into the BI Server Cache too.
  2. Create an Agent to run the analysis with destination set to Oracle BI Server Cache (For seeding cache), and then either:
    1. Schedule the analysis to run from an Agent on a schedule
    2. Trigger it from a Web Service in order to couple it to your ETL data load / cache purge batch steps.
  3. Use the BI Server Procedure SASeedQuery (which is what the Agent does in the background) to load the given query into cache without returning the data to the client. This is useful for doing over JDBC/ODBC/Web Service (as discussed for purging above). You could just run the Logical SQL itself, but you probably don’t want to pull the actual data back to the client, hence using the procedure call instead.

Sidenote – Checking the RPD for Cacheable Tables

The RPD Query Tool is great for finding objects matching certain criteria. However, it seems to invert results when looking for Cacheable Physical tables – if you add a filter of Cacheable = false you get physical tables where Cacheable is enabled! And the same in reverse (Cacheable = true -> shows Physical tables where Cacheable is disabled)

Day in the Life of an OBIEE Cache Entry (Who Said BI Was Boring?)

In this example I’m running a very simple report from SampleApp v406:

The Logical SQL for this is:

SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Why’s that useful to know? Because when working with the cache you need to resubmit queries frequently, and doing so directly from an interface like nqcmd is much faster (for me) than a web GUI. Horses for courses…
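For example, with the Logical SQL above saved to a file, it’s simply something like this – the DSN and filename are just what I’m using here (AnalyticsWeb is the DSN on my SampleApp environment; yours may differ):

nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s cache_test_01.lsql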

So I’ve run the query and now we have a cache entry for it. How do we know? Because we see it in the nqquery.log (and if you don’t have it enabled, go and enable it now):

[2015-09-23T15:58:18.000+01:00] [OracleBIServerComponent] [TRACE:3] [USER-42]  
[] [ecid: 00586hFR07mFw000jzwkno0005Qx00007U,0] [tid: 84a35700]  
[requestid: a9730015] [sessionid: a9730000] [username: weblogic]  
--------------------  
Query Result Cache: [59124] The query for user 'weblogic' was inserted into 
the query result cache.  
The filename is '/app/oracle/biee/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/cache/NQS__735866_57498_0.TBL'.

We see it in Usage Tracking (again, if you don’t have this enabled, go and enable it now):

SELECT TO_CHAR(L.START_TS, 'YYYY-MM-DD HH24:Mi:SS') LOGICAL_START_TS,  
  SAW_SRC_PATH,  
  NUM_CACHE_INSERTED,  
  NUM_CACHE_HITS,  
  NUM_DB_QUERY  
FROM BIEE_BIPLATFORM.S_NQ_ACCT L  
ORDER BY START_TS DESC;

We can also see it in the Administration Tool (when connected online to the BI Server):

We can even see it and touch it (figuratively) on disk:

So we have the data in the cache. The same query run again will now use the cache entry, as seen in nqquery.log:

[2015-09-23T16:09:24.000+01:00] [OracleBIServerComponent] [TRACE:3] [USER-21]
[] [ecid: 11d1def534ea1be0:6066a19d:14f636f1dea:-8000-000000000000b948,0:1:1:5]  
[tid: 87455700] 
[requestid: a9730017] [sessionid: a9730000] [username: weblogic]  
--------------------  
Cache Hit on query: [[  
Matching Query: SET VARIABLE QUERY_SRC_CD='Report',SAW_SRC_PATH='/users/weblogic/Cache Test 01',PREFERRED_CURRENCY='USD';SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Created by:     weblogic

and in Usage Tracking:

“Interestingly” Usage Tracking shows a count of 1 for number of DB queries run, which we would not expect for a cache hit. The nqquery.log shows the same, but no query logged as being sent to the database, so I’m minded to dismiss this as an instrumentation bug.

Now what about if we want to run a query but not use the BI Server Cache? This is an easy one, and plenty has been blogged about it elsewhere – use the Request Variable DISABLE_CACHE_HIT=1. This overrides the built-in system session variable of the same name. Here I’m running it directly against the BI Server, prefixed onto my Logical SQL – if you want to run it from within OBIEE you need the Advanced tab in the Answers editor.

SET VARIABLE SAW_SRC_PATH='/users/weblogic/Cache Test 01',
DISABLE_CACHE_HIT=1:SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

Now we get a cache ‘miss’, because we’ve specifically told the BI Server to not use the cache. As you’d expect, Usage Tracking shows no cache hit, but it does show a cache insert – because why shouldn’t it?

If you want to run a query without seeding the cache either, you can use DISABLE_CACHE_SEED=1:

SET VARIABLE SAW_SRC_PATH='/users/weblogic/Cache Test 01',
DISABLE_CACHE_HIT=1,DISABLE_CACHE_SEED=1:SELECT  
   0 s_0,  
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,  
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2  
FROM "A - Sample Sales"  
ORDER BY 1, 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY

These request variables can be set per analysis, or per user by creating a session initialisation block to assign the required values to the respective variables.

Cache Location

The BI Server cache is held on disk, so it goes without saying that storing it on fast (eg SSD) disk is a Good Idea. There’s no harm in giving it its own filesystem on *nix to isolate it from other work (in terms of filesystems filling up) and to make monitoring it super easy.

Use the DATA_STORAGE_PATHS configuration element in NQSConfig.ini to change the location of the BI Server cache.
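For example, in the [CACHE] section of NQSConfig.ini – the path and size below are purely illustrative, and note that in 11g some cache parameters are centrally managed through Enterprise Manager, so check before hand-editing the file:

[CACHE]
ENABLE = YES;
DATA_STORAGE_PATHS = "/u01/obiee_cache" 500 MB;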

Summary

  1. Use BI Server Caching as the ‘icing on the cake’ for performance in your OBIEE system. Make sure you have your house in order first – don’t use it to try to get around bad design.
  2. Use the SAPurgeCache procedures to directly invoke a purge, or the Event Polling Tables for a more loosely-coupled approach. Decide carefully which purge approach is best for your particular caching strategy.
  3. If using the SAPurgeCache procedures, use JDBC or Web Services to call them so that there is minimal/no installation required to call them from your ETL server.
  4. Invest time in working out an optimal cache seeding strategy, making use of Usage Tracking to track cache hit ratios.
  5. Integrate both purge and seeding into your ETL. Don’t use a schedule-based approach because it will come back to haunt you with its inflexibility and scope for error.