Category Archives: Rittman Mead
Running Big Data Discovery Shell and Jupyter Notebook on Big Data Lite VM 4.5
New in Big Data Discovery 1.2 is the addition of BDD Shell, an integration point with Python. This exposes the datasets and BDD functionality in a Python and PySpark environment, opening up huge possibilities for advanced data science work on BDD datasets, particularly when used in conjunction with Jupyter Notebooks. With the ability to push back to Hive and thus BDD data modified in this environment, this is important functionality that will make BDD even more useful for navigating and exploring big data.
The Big Data Lite virtual machine is produced by Oracle for demo and development purposes, and hosts all the components that you'd find on the Big Data Appliance, all configured and integrated for use. Version 4.5 was released recently, which included BDD 1.2. In this article we'll see how to configure BDD Shell on Big Data Lite 4.5 (along with Jupyter Notebooks), and in a subsequent post dive into how to actually use them.
Setting up BDD Shell on Big Data Lite
You can find the BDD Shell installation document here.
Login to BigDataLite 4.5 (oracle/welcome1) and open a Terminal window. The first step is to download Anaconda, which is a distribution of Python that also includes "[...] over 100 of the most popular Python, R and Scala packages for data science" as well as Jupyter notebook, which we'll see in a moment.
cd ~/Downloads/ wget http://repo.continuum.io/archive/Anaconda2-4.0.0-Linux-x86_64.shThen install it: (n.b.
bash
is part of the command to enter)
bash Anaconda2-4.0.0-Linux-x86_64.sh
Accept the licence when prompted, and then select a install location - I used /u01/anaconda2
where the rest of the BigDataLite installs are
Anaconda2 will now be installed into this location: /home/oracle/anaconda2 - Press ENTER to confirm the location - Press CTRL-C to abort the installation - Or specify a different location below [/home/oracle/anaconda2] >>> /u01/anaconda2
After a few minutes of installation, you'll be prompted to whether you want to prepend Anaconda's location to the PATH
environment variable. I opted not to (which is the default) since Python is used elsewhere on the system and by prepending it it'll take priority and possibly break things.
Do you wish the installer to prepend the Anaconda2 install location to PATH in your /home/oracle/.bashrc ? [yes|no] [no] >>> no
Now edit the BDD Shell configuration file (/u01/bdd/v1.2.0/BDD-1.2.0.31.813/bdd-shell/bdd-shell.conf
) in your favourite text editor to add/amend the following lines:
SPARK_EXECUTOR_PYTHON=/u01/anaconda2/bin/python LOCAL_PYTHON_HOME=/u01/anaconda2
Amend the path if you didn't install Anaconda into /u01
In the same configuration file, add/amend:
SPARK_HOME=/usr/lib/spark/ SPARK_EXTRA_CLASSPATH=/usr/lib/oozie/oozie-sharelib-yarn/lib/spark/spark-avro_2.10-1.1.0-cdh5.7.0.jar
Now run the BDD Shell setup:
/u01/bdd/v1.2.0/BDD-1.2.0.31.813/bdd-shell/setup.sh
This should succeed:
[bigdatalite.localdomain] Validating pre-requisites... [bigdatalite.localdomain] Validation Success [bigdatalite.localdomain] Setting up BDD Shell... [bigdatalite.localdomain] Setup Success [oracle@bigdatalite Downloads]$
Assuming it does, you can now launch the shell bdd-shell.sh
:
[oracle@bigdatalite Downloads]$ /u01/bdd/v1.2.0/BDD-1.2.0.31.813/bdd-shell/bdd-shell.sh WARNING: User-defined SPARK_HOME (/usr/lib/spark) overrides detected (/usr/lib/spark/). WARNING: Running spark-class from user-defined location. spark.driver.cores is set but does not apply in client mode. Welcome to ___ ___ ___ __ _ ____ _ _ | |_) | | | | ( (` | |_| | |_ | | | | |_|_) |_|_/ |_|_/ _)_) |_| | |_|__ |_|__ |_|__ SparkContext available as sc, HiveContext available as sqlContext. BDD Context available as bc. >>>
From the BDD Shell you can interact with BDD, for example to list out the datasets currently defined in the Catalog:
>>> bc.datasets().count 17 >>> for ds in bc.datasets(): ... print ds ... media_demo_customer edp_cli_edp_2c7f41ee-65bf-43ac-8bb4-5b6b59a55d75 edp_cli_edp_2c7f41ee-65bf-43ac-8bb4-5b6b59a55d75 Hive default.media_demo_customer movie_genre default_edp_7d4c18a5-6f02-4067-9f63-91f950078b1e default_edp_7d4c18a5-6f02-4067-9f63-91f950078b1e Hive default.movie_genre media_demo_customer default_edp_89c616b6-aa10-4827-aa82-1e9c3fcc419e default_edp_89c616b6-aa10-4827-aa82-1e9c3fcc419e Hive default.media_demo_customer
Whilst BDD Shell is command-line based, there's also the option to run Jupyter Notebooks (previous iPython Notebooks) which is a web-based interactive "Notebook". This lets you build up scripts exploring and manipulating the data within BDD, using both Python and Spark. The big advantage of this over the command-line interface is that a 'Notebook' enables you to modify and re-run commands, and then once correct, retain them as a fully functioning script for future use.
To launch it, run:
cd /u01/bdd/v1.2.0/BDD-1.2.0.31.813/bdd-shell /u01/anaconda2/bin/jupyter-notebook --port 18888
Important points to note:
- It's important that you run this from the
bdd-shell
folder, otherwise the BDD shell won't initialise properly - By default Jupyter uses 8888, which is already in use on BigDataLite by Hue, so use a different one by specifying
--port
- Jupyter by default only listens locally, so you need to either be using BigDataLite desktop to run Firefox, or use port-forwarding if you want to access Jupyter from your local web browser.
Go to http://localhost:18888
in your web browser, and you should see the default Jupyter screen with a list of files:
In the next article, we'll see how to use Jupyter Notebooks with Big Data Discovery, and get an idea of just how powerful the combination can be.
OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service
One of the big changes in OBIEE 12c for end users is the ability to upload their own data sets and start analysing them directly, without needing to go through the traditional data provisioning and modelling process and associated leadtimes. The implementation of this is one of the big architectural changes of OBIEE 12c, introducing the concept of the Extended Subject Areas (XSA), and the Data Set Service (DSS).
In this article we’ll see some of how XSA and DSS work behind the scenes, providing an important insight for troubleshooting and performance analysis of this functionality.
What is an XSA?
An Extended Subject Area (XSA) is made up of a dataset, and associated XML data model. It can be used standalone, or “mashed up” in conjunction with a “traditional” subject area on a common field
How is an XSA Created?
At the moment the following methods are available:
- “Add XSA” in Visual Analzyer, to upload an Excel (XLSX) document
-
CREATE DATASET
logical SQL statement, that can be run through any interface to the BI Server, including ‘Issue Raw SQL’, nqcmd, JDBC calls, and so on -
Add Data Source in Answers. Whilst this option shouldn’t actually be present according to a this doc, it will be for any users of 12.2.1 who have uploaded the SampleAppLite BAR file so I’m including it here for completeness.
Under the covers, these all use the same REST API calls directly into datasetsvc
. Note that these are entirely undocumented, and only for internal OBIEE component use. They are not intended nor supported for direct use.
How does an XSA work?
External Subject Areas (XSA) are managed by the Data Set Service (DSS). This is a java deployment (datasetsvc
) running in the Managed Server (bi_server1), providing a RESTful API for the other OBIEE components that use it.
The end-user of the data, whether it’s Visual Analyzer or the BI Server, send REST web service calls to DSS, storing and querying datasets within it.
Where is the XSA Stored?
By default, the data for XSA is stored on disk in SINGLETON_DATA_DIRECTORY/components/DSS/storage/ssi
, e.g. /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi
[oracle@demo ssi]$ ls -lrt /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi|tail -n5 -rw-r----- 1 oracle oinstall 8495 2015-12-02 18:01 7e43a80f-dcf6-4b31-b898-68616a68e7c4.dss -rw-r----- 1 oracle oinstall 593662 2016-05-27 11:00 1beb5e40-a794-4aa9-8c1d-5a1c59888cb4.dss -rw-r----- 1 oracle oinstall 131262 2016-05-27 11:12 53f59d34-2037-40f0-af21-45ac611f01d3.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:04 a4fc922d-ce0e-479f-97e4-1ddba074f5ac.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:06 c93aa2bd-857c-4651-bba2-a4f239115189.dss
They’re stored using the format in which they were created, which is XLSX (via VA) or CSV (via CREATE DATASET
)
[oracle@demo ssi]$ head 53f59d34-2037-40f0-af21-45ac611f01d3.dss "7 Megapixel Digital Camera","2010 Week 27",44761.88 "MicroPod 60Gb","2010 Week 27",36460.0 "MP3 Speakers System","2010 Week 27",36988.86 "MPEG4 Camcorder","2010 Week 28",32409.78 "CompCell RX3","2010 Week 28",33005.91
There’s a set of DSS-related tables installed in the RCU schema BIPLATFORM
, which hold information including the XML data model for the XSA, along with metadata such as the user that uploaded the file, when they uploaded, and then name of the file on disk:
How Can the Data Set Service be Configured?
The configuration file, with plenty of inline comments, is at ORACLE_HOME/bi/endpointmanager/jeemap/dss/DSS_REST_SERVICE.properties
. From here you con update settings for the data set service including upload limits as detailed here.
XSA Performance
Since XSA are based on flat files stored in disk, we need to be very careful in their use. Whilst a database may hold billions of rows in a table with with appropriate indexing and partitioning be able to provide sub-second responses, a flat file can quickly become a serious performance bottleneck. Bear in mind that a flat file is just a bunch of data plopped on disk – there is no concept of indices, blocks, partitions — all the good stuff that makes databases able to do responsive ad-hoc querying on selections of data.
If you’ve got a 100MB Excel file with thousands of cells, and want to report on just a few of them, you might find it laggy – because whether you want to report on them on or not, at some point OBIEE is going to have to read all of them regardless. We can see how OBIEE is handling XSA under the covers by examining the query log. This used to be called nqquery.log
in OBIEE 11g (and before), and in OBIEE 12c has been renamed obis1-query.log
.
In this example here I’m using an Excel worksheet with 140,000 rows and 78 columns. Total filesize of the source XLSX on disk is ~55Mb.
First up, I’ll build a query in Answers with a couple of the columns:
The logical query uses the new XSA
syntax:
SELECT 0 s_0, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."first_name" s_1, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."foo" s_2 FROM XSA('prodney'.'MOCK_DATA_bigger_55Mb') ORDER BY 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
The query log shows
Rows 144000, bytes 13824000 retrieved from database query Rows returned to Client 200
So of the 55MB of data, we’re pulling all the rows (144,000) back to the BI Server for it to then perform the aggregation on it, resulting in the 200 rows returned to the client (Presentation Services). Note though that the byte count is lower (13Mb) than the total size of the file (55Mb).
As well as aggregation, filtering on XSA data also gets done by the BI Server. Consider this example here, where we add a predicate:
In the query log we can see that all the data has to come back from DSS to the BI Server, in order for it to filter it:
Rows 144000, bytes 23040000 retrieved from database Physical query response time 24.195 (seconds), Rows returned to Client 0
Note the time taken by DSS — nearly 25 seconds. Compare this later on to when we see the XSA data served from a database, via the XSA Cache.
In terms of BI Server (not XSA) caching, the query log shows that a cache entry was written for the above request:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736113_56359_0.TBL'
If I refresh the query in Answers, the data is fetched anew (per this changed behaviour in OBIEE 12c), and the cache repopulated. If I clear the Presentation Services cache and re-open the analysis, I get the results from the BI Server cache, and it doesn’t have to refetch the data from the Data Set Service.
Since the cache has two columns in, an attribute and a measure, I wondered if running a query with just the fact rolled up might hit the cache (since it has all the data there that it needs)
Unfortunately it didn’t, and to return a single row of data required BI Server to fetch all the rows again – although looking at the byte count it appears it does prune the columns required since it’s now just over 2Mb of data returned this time:
Rows 144000, bytes 2304000 retrieved from database Rows returned to Client 1
Interestingly if I build an analysis with several more of the columns from the file (in this example, ten of a total of 78), the data returned from the DSS to BI Server (167Mb) is greater than that of the original file (55Mb).
Rows 144000, bytes 175104000 Rows returned to Client 1000
And this data coming back from the DSS to the BI Server has to go somewhere – and if it’s big enough it’ll overflow to disk, as we can see when I run the above:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...] -rwxrwx--- 1 oracle oinstall 2910404 2016-06-01 14:08 nQS_AG_22345_7503_7c9c000a_50906091.TMP -rwxrwx--- 1 oracle oinstall 43476 2016-06-01 14:08 nQS_AG_22345_7504_7c9c000a_50906091.TMP -rw------- 1 oracle oinstall 6912000 2016-06-01 14:08 nQS_AG_22345_7508_7c9c000a_50921949.TMP -rw------- 1 oracle oinstall 631375 2016-06-01 14:08 nQS_EX_22345_7506_7c9c000a_50921652.TMP -rw------- 1 oracle oinstall 3670016 2016-06-01 14:08 nQS_EX_22345_7507_7c9c000a_50921673.TMP [...]
You can read more about BI Server’s use of temporary files and the impact that it can have on system performance and particularly I/O bandwidth in this OTN article here.
So – as the expression goes – “buyer beware”. XSA is an excellent feature, but used in its default configuration with files stored on disk it has the potential to wreak havoc if abused.
XSA Caching
If you’re planning to use XSA seriously, you should set up the database-based XSA Cache. This is described in detail in the PDF document attached to My Oracle Support note OBIEE 12c: How To Configure The External Subject Area (XSA) Cache For Data Blending| Mashup And Performance (Doc ID 2087801.1).
In a proper implementation you would follow in full the document, including provisioning a dedicated schema and tablespace for holding the data (to make it easier to manage and segregate from other data), but here I’m just going to use the existing RCU schema (BIPLATFORM), along with the Physical mapping already in the RPD (10 - System DB (ORCL)
):
In NQSConfig.INI
, under the XSA_CACHE
section, I set:
ENABLE = YES; # The schema and connection pool where the XSA data will be cached. PHYSICAL_SCHEMA = "10 - System DB (ORCL)"."Catalog"."dbo"; CONNECTION_POOL = "10 - System DB (ORCL)"."UT Connection Pool";
And restart the BI Server:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/stop.sh -i obis1 && /app/oracle/biee/user_projects/domains/bi/bitools/bin/start.sh -i obis1
Per the document, note that in the BI Server log there’s an entry indicating that the cache has been successfully started:
[101001] External Subject Area cache is started successfully using configuration from the repository with the logical name ssi. [101017] External Subject Area cache has been initialized. Total number of entries: 0 Used space: 0 bytes Maximum space: 107374182400 bytes Remaining space: 107374182400 bytes. Cache table name prefix is XC2875559987.
Now when I re-run the test XSA analysis from above, returning three columns, the BI Server goes off and populates the XSA cache table:
-- Sending query to database named 10 - System DB (ORCL) (id: <<79879>> XSACache Create table Gateway), connection pool named UT Connection Pool, logical request hash b4de812e, physical request hash 5847f2ef: CREATE TABLE dbo.XC2875559987_ZPRODNE1926129021 ( id3209243024 DOUBLE PRECISION, first_n[..]
Or rather, it doesn’t, because PHYSICAL_SCHEMA
seems to want the literal physical schema, rather than the logical physical one (?!) that the USAGE_TRACKING
configuration stanza is happy with in referencing the table.
Properties: description=<<79879>> XSACache Create table Exchange; producerID=0x1561aff8; requestID=0xfffe0034; sessionID=0xfffe0000; userName=prodney; [nQSError: 17001] Oracle Error code: 1918, message: ORA-01918: user 'DBO' does not exist
I’m trying to piggyback on SA511’s existing configruation, which uses catalog.schema notation:
Instead of the more conventional approach to have the actual physical schema (often used in conjunction with ‘Require fully qualified table names’ in the connection pool):
So now I’ll do it properly, and create a database and schema for the XSA cache – I’m still going to use the BIPLATFORM schema though…
Updated NQSConfig.INI:
[ XSA_CACHE ] ENABLE = YES; # The schema and connection pool where the XSA data will be cached. PHYSICAL_SCHEMA = "XSA Cache"."BIEE_BIPLATFORM"; CONNECTION_POOL = "XSA Cache"."XSA CP";
After refreshing the analysis again, there’s a successful creation of the XSA cache table:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Create table Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash ccc0a410: [[ CREATE TABLE BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 ( id3209243024 DOUBLE PRECISION, first_name2360035083 VARCHAR2(17 CHAR), [...]
as well as a stats gather:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Collect statistics Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d73151bb: BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => 'BIEE_BIPLATFORM', tabname => 'XC2875559987_ZPRODNE1645894381' , estimate_percent => 5 , method_opt => 'FOR ALL COLUMNS SIZE AUTO' ); END;
Although I do note that it is used a fixed estimate_percent
instead of the recommended AUTO_SAMPLE_SIZE
. The table itself is created with a fixed prefix (as specified in the obis1-diagnostic.log at initialisation), and holds a full copy of the XSA (not just the columns in the query that triggered the cache creation):
With the dataset cached, the query is then run and the query log shows a XSA cache hit
External Subject Area cache hit for 'prodney'.'MOCK_DATA_bigger_55Mb'/Columns : Cache entry shared_cache_key = 'prodney'.'MOCK_DATA_bigger_55Mb', table name = BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357, row count = 144000, entry size = 201326592 bytes, creation time = 2016-06-01 20:14:26.829, creation elapsed time = 49779 ms, descriptor ID = /app/oracle/biee/user_projects/domains/bi/servers/obis1/xsacache/NQSXSA_BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357_2.CACHE
with the resulting physical query fired at the XSA cache table (replacing what would have gone against the DSS web service):
-- Sending query to database named XSA Cache (id: <<65357>>), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d3ed281d: [[ WITH SAWITH0 AS (select T1000001.first_name2360035083 as c1, T1000001.last_name3826278858 as c2, sum(T1000001.foo2363149668) as c3 from BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 T1000001 group by T1000001.first_name2360035083, T1000001.last_name3826278858) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1, D102.c1 as c2, D102.c2 as c3, D102.c3 as c4 from SAWITH0 D102 order by c2, c3 ) D1 where rownum <= 5000001
It’s important to point out the difference of what’s happening here: the aggregation has been pushed down to the database, meaning that the BI Server doesn’t have to. In performance terms, this is a Very Good Thing usually.
Rows 988, bytes 165984 retrieved from database query Rows returned to Client 988
Whilst it doesn’t seem to be recorded in the query log from what I can see, the data returned from the XSA Cache also gets inserted into the BI Server cache, and if you open an XSA-based analysis that’s not in the presentation services cache (a third cache to factor in!) you will get a cache hit on the BI Server cache. As discussed earlier in this article though, if an analysis is built against an XSA for which a BI Server cache entry exists that with manipulation could service it (eg pruning columns or rolling up), it doesn’t appear to take advantage of it – but since it’s hitting the XSA cache this time, it’s less of a concern.
If you change the underlying data in the XSA
The BI Server does pick this up and repopulates the XSA Cache.
The XSA cache entry itself is 192Mb in size – generated from a 55Mb upload file. The difference will be down to data types and storage methods etc. However, that it is larger in the XSA Cache (database) than held natively (flat file) doesn’t really matter, particularly if the data is being aggregated and/or filtered, since the performance benefit of pushing this work to the database will outweigh the overhead of storage space. Consider this example here, where I run an analysis pulling back 44 columns (of the 78 in the spreadsheet) and hit the XSA cache, it runs in just over a second, and transfers from the database a total of 5.3Mb (the data is repeated, so rolls up):
Rows 1000, bytes 5576000 retrieved from database Rows returned to Client 1000
If I disable the XSA cache and run the same query, we see this:
Rows 144000, bytes 801792000 Retrieved from database Physical query response time 22.086 (seconds) Rows returned to Client 1000
That’s 764Mb being sent back for the BI Server to process, which it does by dumping a whole load to disk in temporary work files:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...]] -rwxrwx--- 1 oracle oinstall 10726190 2016-06-01 21:04 nQS_AG_29733_261_ebd70002_75835908.TMP -rwxrwx--- 1 oracle oinstall 153388 2016-06-01 21:04 nQS_AG_29733_262_ebd70002_75835908.TMP -rw------- 1 oracle oinstall 24192000 2016-06-01 21:04 nQS_AG_29733_266_ebd70002_75862509.TMP -rw------- 1 oracle oinstall 4195609 2016-06-01 21:04 nQS_EX_29733_264_ebd70002_75861716.TMP -rw------- 1 oracle oinstall 21430272 2016-06-01 21:04 nQS_EX_29733_265_ebd70002_75861739.TMP
As a reminder – this isn’t “Bad”, it’s just not optimal (response time of 50 seconds vs 1 second), and if you scale that kind of behaviour by many users with many datasets, things could definitely get hairy for all users of the system. Hence – use the XSA Cache.
As a final point, with the XSA Cache being in the database the standard range of performance optimisations are open to us – indexing being the obvious one. No indexes are built against the XSA Cache table by default, which is fair enough since OBIEE has no idea what the key columns on the data are, and the point of mashups is less to model and optimise the data but to just get it up there in front of the user. So you could index the table if you knew the key columns that were going to be filtered against, or you could even put it into memory (assuming you’ve licensed the option).
The MoS document referenced above also includes further performance recommendations for XSA, including the use of RAM Disk for XSA cache metadata files, as well as the managed server temp folder
Summary
External Subject Areas are great functionality, but be aware of the performance implications of not being able to push down common operations such as filtering and aggregation. Set up XSA Caching if you are going to be using XSA properly.
If you’re interested in the direction of XSA and the associated Data Set Service, this slide deck from Oracle’s Socs Cappas provides some interesting reading. Uploading Excel files into OBIEE looks like just the beginning of what the Data Set Service is going to enable!
The post OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service appeared first on Rittman Mead Consulting.
OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service
One of the big changes in OBIEE 12c for end users is the ability to upload their own data sets and start analysing them directly, without needing to go through the traditional data provisioning and modelling process and associated leadtimes. The implementation of this is one of the big architectural changes of OBIEE 12c, introducing the concept of the Extended Subject Areas (XSA), and the Data Set Service (DSS).
In this article we'll see some of how XSA and DSS work behind the scenes, providing an important insight for troubleshooting and performance analysis of this functionality.
What is an XSA?
An Extended Subject Area (XSA) is made up of a dataset, and associated XML data model. It can be used standalone, or "mashed up" in conjunction with a "traditional" subject area on a common field
How is an XSA Created?
At the moment the following methods are available:
- "Add XSA" in Visual Analzyer, to upload an Excel (XLSX) document
CREATE DATASET
logical SQL statement, that can be run through any interface to the BI Server, including 'Issue Raw SQL', nqcmd, JDBC calls, and so onAdd Data Source in Answers. Whilst this option shouldn't actually be present according to a this doc, it will be for any users of 12.2.1 who have uploaded the SampleAppLite BAR file so I'm including it here for completeness.
Under the covers, these all use the same REST API calls directly into datasetsvc
. Note that these are entirely undocumented, and only for internal OBIEE component use. They are not intended nor supported for direct use.
How does an XSA work?
External Subject Areas (XSA) are managed by the Data Set Service (DSS). This is a java deployment (datasetsvc
) running in the Managed Server (bi_server1), providing a RESTful API for the other OBIEE components that use it.
The end-user of the data, whether it's Visual Analyzer or the BI Server, send REST web service calls to DSS, storing and querying datasets within it.
Where is the XSA Stored?
By default, the data for XSA is stored on disk in SINGLETONDATADIRECTORY/components/DSS/storage/ssi
, e.g. /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi
[oracle@demo ssi]$ ls -lrt /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi|tail -n5 -rw-r----- 1 oracle oinstall 8495 2015-12-02 18:01 7e43a80f-dcf6-4b31-b898-68616a68e7c4.dss -rw-r----- 1 oracle oinstall 593662 2016-05-27 11:00 1beb5e40-a794-4aa9-8c1d-5a1c59888cb4.dss -rw-r----- 1 oracle oinstall 131262 2016-05-27 11:12 53f59d34-2037-40f0-af21-45ac611f01d3.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:04 a4fc922d-ce0e-479f-97e4-1ddba074f5ac.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:06 c93aa2bd-857c-4651-bba2-a4f239115189.dss
They're stored using the format in which they were created, which is XLSX (via VA) or CSV (via CREATE DATASET
)
[oracle@demo ssi]$ head 53f59d34-2037-40f0-af21-45ac611f01d3.dss "7 Megapixel Digital Camera","2010 Week 27",44761.88 "MicroPod 60Gb","2010 Week 27",36460.0 "MP3 Speakers System","2010 Week 27",36988.86 "MPEG4 Camcorder","2010 Week 28",32409.78 "CompCell RX3","2010 Week 28",33005.91
There's a set of DSS-related tables installed in the RCU schema BIPLATFORM
, which hold information including the XML data model for the XSA, along with metadata such as the user that uploaded the file, when they uploaded, and then name of the file on disk:
How Can the Data Set Service be Configured?
The configuration file, with plenty of inline comments, is at ORACLEHOME/bi/endpointmanager/jeemap/dss/DSSREST_SERVICE.properties
. From here you con update settings for the data set service including upload limits as detailed here.
XSA Performance
Since XSA are based on flat files stored in disk, we need to be very careful in their use. Whilst a database may hold billions of rows in a table with with appropriate indexing and partitioning be able to provide sub-second responses, a flat file can quickly become a serious performance bottleneck. Bear in mind that a flat file is just a bunch of data plopped on disk - there is no concept of indices, blocks, partitions -- all the good stuff that makes databases able to do responsive ad-hoc querying on selections of data.
If you've got a 100MB Excel file with thousands of cells, and want to report on just a few of them, you might find it laggy - because whether you want to report on them on or not, at some point OBIEE is going to have to read all of them regardless. We can see how OBIEE is handling XSA under the covers by examining the query log. This used to be called nqquery.log
in OBIEE 11g (and before), and in OBIEE 12c has been renamed obis1-query.log
.
In this example here I'm using an Excel worksheet with 140,000 rows and 78 columns. Total filesize of the source XLSX on disk is ~55Mb.
First up, I'll build a query in Answers with a couple of the columns:
The logical query uses the new XSA
syntax:
SELECT 0 s_0, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."first_name" s_1, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."foo" s_2 FROM XSA('prodney'.'MOCK_DATA_bigger_55Mb') ORDER BY 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLYThe query log shows
Rows 144000, bytes 13824000 retrieved from database query
Rows returned to Client 200
So of the 55MB of data, we're pulling all the rows (144,000) back to the BI Server for it to then perform the aggregation on it, resulting in the 200 rows returned to the client (Presentation Services). Note though that the byte count is lower (13Mb) than the total size of the file (55Mb).
As well as aggregation, filtering on XSA data also gets done by the BI Server. Consider this example here, where we add a predicate:
In the query log we can see that all the data has to come back from DSS to the BI Server, in order for it to filter it:
Rows 144000, bytes 23040000 retrieved from database Physical query response time 24.195 (seconds), Rows returned to Client 0Note the time taken by DSS -- nearly 25 seconds. Compare this later on to when we see the XSA data served from a database, via the XSA Cache. In terms of BI Server (not XSA) caching, the query log shows that a cache entry was written for the above request:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736113_56359_0.TBL'
If I refresh the query in Answers, the data is fetched anew (per this changed behaviour in OBIEE 12c), and the cache repopulated. If I clear the Presentation Services cache and re-open the analysis, I get the results from the BI Server cache, and it doesn't have to refetch the data from the Data Set Service.
Since the cache has two columns in, an attribute and a measure, I wondered if running a query with just the fact rolled up might hit the cache (since it has all the data there that it needs)
Unfortunately it didn't, and to return a single row of data required BI Server to fetch all the rows again - although looking at the byte count it appears it does prune the columns required since it's now just over 2Mb of data returned this time:
Rows 144000, bytes 2304000 retrieved from database
Rows returned to Client 1
Interestingly if I build an analysis with several more of the columns from the file (in this example, ten of a total of 78), the data returned from the DSS to BI Server (167Mb) is greater than that of the original file (55Mb).
Rows 144000, bytes 175104000
Rows returned to Client 1000
And this data coming back from the DSS to the BI Server has to go somewhere - and if it's big enough it'll overflow to disk, as we can see when I run the above:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...] -rwxrwx--- 1 oracle oinstall 2910404 2016-06-01 14:08 nQS_AG_22345_7503_7c9c000a_50906091.TMP -rwxrwx--- 1 oracle oinstall 43476 2016-06-01 14:08 nQS_AG_22345_7504_7c9c000a_50906091.TMP -rw------- 1 oracle oinstall 6912000 2016-06-01 14:08 nQS_AG_22345_7508_7c9c000a_50921949.TMP -rw------- 1 oracle oinstall 631375 2016-06-01 14:08 nQS_EX_22345_7506_7c9c000a_50921652.TMP -rw------- 1 oracle oinstall 3670016 2016-06-01 14:08 nQS_EX_22345_7507_7c9c000a_50921673.TMP [...]
You can read more about BI Server's use of temporary files and the impact that it can have on system performance and particularly I/O bandwidth in this OTN article here.
So - as the expression goes - "buyer beware". XSA is an excellent feature, but used in its default configuration with files stored on disk it has the potential to wreak havoc if abused.
XSA Caching
If you're planning to use XSA seriously, you should set up the database-based XSA Cache. This is described in detail in the PDF document attached to My Oracle Support note OBIEE 12c: How To Configure The External Subject Area (XSA) Cache For Data Blending| Mashup And Performance (Doc ID 2087801.1).
In a proper implementation you would follow in full the document, including provisioning a dedicated schema and tablespace for holding the data (to make it easier to manage and segregate from other data), but here I'm just going to use the existing RCU schema (BIPLATFORM), along with the Physical mapping already in the RPD (10 - System DB (ORCL)
):
In NQSConfig.INI
, under the XSA_CACHE
section, I set:
ENABLE = YES;
# The schema and connection pool where the XSA data will be cached.
PHYSICAL_SCHEMA = "10 - System DB (ORCL)"."Catalog"."dbo";
CONNECTION_POOL = "10 - System DB (ORCL)"."UT Connection Pool";
And restart the BI Server:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/stop.sh -i obis1 && /app/oracle/biee/user_projects/domains/bi/bitools/bin/start.sh -i obis1
Per the document, note that in the BI Server log there's an entry indicating that the cache has been successfully started:
[101001] External Subject Area cache is started successfully using configuration from the repository with the logical name ssi.
[101017] External Subject Area cache has been initialized. Total number of entries: 0 Used space: 0 bytes Maximum space: 107374182400 bytes Remaining space: 107374182400 bytes. Cache table name prefix is XC2875559987.
Now when I re-run the test XSA analysis from above, returning three columns, the BI Server goes off and populates the XSA cache table:
-- Sending query to database named 10 - System DB (ORCL) (id: <<79879>> XSACache Create table Gateway), connection pool named UT Connection Pool, logical request hash b4de812e, physical request hash 5847f2ef: CREATE TABLE dbo.XC2875559987_ZPRODNE1926129021 ( id3209243024 DOUBLE PRECISION, first_n[..]
Or rather, it doesn't, because PHYSICALSCHEMA
seems to want the literal physical schema, rather than the logical physical one (?!) that the USAGETRACKING
configuration stanza is happy with in referencing the table.
Properties: description=<<79879>> XSACache Create table Exchange; producerID=0x1561aff8; requestID=0xfffe0034; sessionID=0xfffe0000; userName=prodney; [nQSError: 17001] Oracle Error code: 1918, message: ORA-01918: user 'DBO' does not exist
I'm trying to piggyback on SA511's existing configruation, which uses catalog.schema notation:
Instead of the more conventional approach to have the actual physical schema (often used in conjunction with 'Require fully qualified table names' in the connection pool):
So now I'll do it properly, and create a database and schema for the XSA cache - I'm still going to use the BIPLATFORM schema though...
Updated NQSConfig.INI:
[ XSA_CACHE ]
ENABLE = YES;
# The schema and connection pool where the XSA data will be cached.
PHYSICAL_SCHEMA = "XSA Cache"."BIEE_BIPLATFORM";
CONNECTION_POOL = "XSA Cache"."XSA CP";
After refreshing the analysis again, there's a successful creation of the XSA cache table:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Create table Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash ccc0a410: [[ CREATE TABLE BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 ( id3209243024 DOUBLE PRECISION, first_name2360035083 VARCHAR2(17 CHAR), [...]
as well as a stats gather:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Collect statistics Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d73151bb: BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => 'BIEE_BIPLATFORM', tabname => 'XC2875559987_ZPRODNE1645894381' , estimate_percent => 5 , method_opt => 'FOR ALL COLUMNS SIZE AUTO' ); END;
Although I do note that it is used a fixed estimatepercent
instead of the recommended AUTOSAMPLE_SIZE
. The table itself is created with a fixed prefix (as specified in the obis1-diagnostic.log at initialisation), and holds a full copy of the XSA (not just the columns in the query that triggered the cache creation):
With the dataset cached, the query is then run and the query log shows a XSA cache hit
External Subject Area cache hit for 'prodney'.'MOCK_DATA_bigger_55Mb'/Columns : Cache entry shared_cache_key = 'prodney'.'MOCK_DATA_bigger_55Mb', table name = BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357, row count = 144000, entry size = 201326592 bytes, creation time = 2016-06-01 20:14:26.829, creation elapsed time = 49779 ms, descriptor ID = /app/oracle/biee/user_projects/domains/bi/servers/obis1/xsacache/NQSXSA_BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357_2.CACHEwith the resulting physical query fired at the XSA cache table (replacing what would have gone against the DSS web service):
-- Sending query to database named XSA Cache (id: <<65357>>), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d3ed281d: [[ WITH SAWITH0 AS (select T1000001.first_name2360035083 as c1, T1000001.last_name3826278858 as c2, sum(T1000001.foo2363149668) as c3 from BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 T1000001 group by T1000001.first_name2360035083, T1000001.last_name3826278858) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1, D102.c1 as c2, D102.c2 as c3, D102.c3 as c4 from SAWITH0 D102 order by c2, c3 ) D1 where rownum <= 5000001
It's important to point out the difference of what's happening here: the aggregation has been pushed down to the database, meaning that the BI Server doesn't have to. In performance terms, this is a Very Good Thing usually.
Rows 988, bytes 165984 retrieved from database query
Rows returned to Client 988
Whilst it doesn't seem to be recorded in the query log from what I can see, the data returned from the XSA Cache also gets inserted into the BI Server cache, and if you open an XSA-based analysis that's not in the presentation services cache (a third cache to factor in!) you will get a cache hit on the BI Server cache. As discussed earlier in this article though, if an analysis is built against an XSA for which a BI Server cache entry exists that with manipulation could service it (eg pruning columns or rolling up), it doesn't appear to take advantage of it - but since it's hitting the XSA cache this time, it's less of a concern.
If you change the underlying data in the XSA
The BI Server does pick this up and repopulates the XSA Cache.
The XSA cache entry itself is 192Mb in size - generated from a 55Mb upload file. The difference will be down to data types and storage methods etc. However, that it is larger in the XSA Cache (database) than held natively (flat file) doesn't really matter, particularly if the data is being aggregated and/or filtered, since the performance benefit of pushing this work to the database will outweigh the overhead of storage space. Consider this example here, where I run an analysis pulling back 44 columns (of the 78 in the spreadsheet) and hit the XSA cache, it runs in just over a second, and transfers from the database a total of 5.3Mb (the data is repeated, so rolls up):
Rows 1000, bytes 5576000 retrieved from database
Rows returned to Client 1000
If I disable the XSA cache and run the same query, we see this:
Rows 144000, bytes 801792000 Retrieved from database
Physical query response time 22.086 (seconds)
Rows returned to Client 1000
That's 764Mb being sent back for the BI Server to process, which it does by dumping a whole load to disk in temporary work files:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...]] -rwxrwx--- 1 oracle oinstall 10726190 2016-06-01 21:04 nQS_AG_29733_261_ebd70002_75835908.TMP -rwxrwx--- 1 oracle oinstall 153388 2016-06-01 21:04 nQS_AG_29733_262_ebd70002_75835908.TMP -rw------- 1 oracle oinstall 24192000 2016-06-01 21:04 nQS_AG_29733_266_ebd70002_75862509.TMP -rw------- 1 oracle oinstall 4195609 2016-06-01 21:04 nQS_EX_29733_264_ebd70002_75861716.TMP -rw------- 1 oracle oinstall 21430272 2016-06-01 21:04 nQS_EX_29733_265_ebd70002_75861739.TMP
As a reminder - this isn't "Bad", it's just not optimal (response time of 50 seconds vs 1 second), and if you scale that kind of behaviour by many users with many datasets, things could definitely get hairy for all users of the system. Hence - use the XSA Cache.
As a final point, with the XSA Cache being in the database the standard range of performance optimisations are open to us - indexing being the obvious one. No indexes are built against the XSA Cache table by default, which is fair enough since OBIEE has no idea what the key columns on the data are, and the point of mashups is less to model and optimise the data but to just get it up there in front of the user. So you could index the table if you knew the key columns that were going to be filtered against, or you could even put it into memory (assuming you've licensed the option).
The MoS document referenced above also includes further performance recommendations for XSA, including the use of RAM Disk for XSA cache metadata files, as well as the managed server temp folder
Summary
External Subject Areas are great functionality, but be aware of the performance implications of not being able to push down common operations such as filtering and aggregation. Set up XSA Caching if you are going to be using XSA properly.
If you're interested in the direction of XSA and the associated Data Set Service, this slide deck from Oracle's Socs Cappas provides some interesting reading. Uploading Excel files into OBIEE looks like just the beginning of what the Data Set Service is going to enable!
Changes in BI Server Cache Behaviour in OBIEE 12c : OBIS_REFRESH_CACHE
The OBIEE BI Server cache can be a great way of providing a performance boost to response times for end users - so long as it's implemented carefully. Done wrong, and you're papering over the cracks and heading for doom; done right, and it's the 'icing on the cake'. You can read more about how to use it properly here, and watch a video I did about it here. In this article we'll see how the BI Server cache has changed in OBIEE 12c in a way that could prove somewhat perplexing to developers used to OBIEE 11g.
The BI Server cache works by inspecting queries as they are sent to the BI Server, and deciding if an existing cache entry can be used to provide the data. This can include direct hits (i.e. the same query being run again), or more advanced cases, where a subset or aggregation of an existing cache entry could be used. If a cache entry is used then a trip to the database is avoided and response times will typically be better - particularly if more than one database query would have been involved, or lots of additional post-processing on the BI Server.
When an analysis or dashboard is run, Presentation Services generates the necessary Logical SQL to return the data needed, and sends this to the BI Server. It's at this point that the cache will, or won't, kick in. The BI Server will accept Logical SQL from other sources than Presentation Services - in fact, any JDBC or ODBC client. This is useful as it enables us to validate behaviour that we're observing and see how it can apply elsewhere.
When you build an Analysis in OBIEE 11g (and before), the cache will be used if applicable. Each time you add a column, or hit refresh, you'll get an entry back from the cache if one exists. This has benefits - speed - but disadvantages too. When the data in the database changes, you will still get a cache hit, regardless. The only way to force OBIEE to show you the latest version of the data is to purge the cache first. You can target cache purges based on databases, tables, or even specific queries - but you do need to purge it.
What's changed in OBIEE 12c is that when you click "Refresh" on an Analysis or Dashboard, the query is re-run against the source and the cache re-populated. Even if you have an existing cache entry, and even if the underlying data has not changed, if you hit Refresh, the cache will not be used. Which kind of makes sense, since "refresh" probably should indeed mean that.
Digging into OBIEE Cache Behaviour
Let's prove this out. I've got SampleApp v506 (OBIEE 11.1.1.9), and SampleApp v511 (OBIEE 12.2.1). First off, I'll clear the cache on each, using call saPurgeAllCache();
, run via Issue SQL:
Then I can use another BI Server procedure call to view the current cache contents (new in 11.1.1.9), call NQS_GetAllCacheEntries()
. For this one particularly make sure you've un-ticked "Use Oracle BI Presentation Services Cache". This is different from the BI Server cache which is the subject of this article, and as the name implies is a cache that Presentation Services keeps.
I've confirmed that the BI Server cache is enabled on both servers, in NQSConfig.INI
###############################################################################
#
# Query Result Cache Section
#
###############################################################################
[CACHE]
ENABLE = YES; # This Configuration setting is managed by Oracle Enterprise Manager Fusion Middleware Control
Now I create a very simple analysis in both 11g and 12c, showing a list of Airline Carriers and their Codes:
After clicking Results, a cache entry is inserted on each respective system:
Of particular interest is the create time, last used time, and number of times used:
If I now click Refresh in the Analysis window:
We see this happen to the caches:
In OBIEE 11g the cache entry is used - but in OBIEE 12c it's not. The CreatedTime
is evidently not populated correctly, so instead let's dive over to the query log (nqquery/obis1-query in 11g/12c respectively). In OBIEE 11g we've got:
-- SQL Request, logical request hash: 7c365697 SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 1, 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY -- Cache Hit on query: [[ Matching Query: SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 1, 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLYWhereas 12c is:
-- SQL Request, logical request hash: d53f813c SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY -- Sending query to database named X0 - Airlines Demo Dbs (ORCL) (id: <<320369>>), connection pool named Aggr Connection, logical request hash d53f813c, physical request hash a46c069c: [[ WITH SAWITH0 AS (select T243.CODE as c1, T243.DESCRIPTION as c2 from BI_AIRLINES.UNIQUE_CARRIERS T243 /* 30 UNIQUE_CARRIERS */ ) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3 from ( select 0 as c1, D1.c1 as c2, D1.c2 as c3 from SAWITH0 D1 order by c3, c2 ) D1 where rownum <= 5000001 -- Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736117_52416_27.TBL'.Looking closely at the 12c output shows three things:
- OBIEE has run a database query for this request, and not hit the cache
- A cache entry has clearly been created again as a result of this query
- The Logical SQL has a request variable set:
OBIS_REFRESH_CACHE=1
This is evidently added it by Presentation Services at runtime, since the Advanced tab of the analysis shows no such variable being set:
OBIS_REFRESH_CACHE
. Since it appears that Presentation Services is adding it in at runtime, let's step over to a more basic way of interfacing with the BI Server - nqcmd
. Whilst we could probably use Issue SQL (as we did above for querying the cache) I want to avoid any more behind-the-scenes funny business from Presentation Services.
In OBIEE 12c, I run nqcmd
:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/nqcmd.sh -d AnalyticsWeb -u prodney -p Admin123
Enter Q to enter a query, as follows:
SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
In `obis1-query.log' there's the cache bypass and populate:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736117_53779_29.TBL'.
If I run it again without the OBIS_REFRESH_CACHE
variable:
SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
We get the cache hit as expected:
-------------------- Cache Hit on query: [[
Matching Query: SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
Created by: prodney
Out of interest I ran the same two tests on 11g -- both resulted in a cache hit, since it presumably ignores the unrecognised variable.
Summary
In OBIEE 12c, if you click "Refresh" on an analysis or dashboard, OBIEE Presentation Services forces a cache-bypass and cache-reseed, ensuring that you really do see the latest version of the data from source. It does this using the request variable, new in OBIEE 12c,OBIS_REFRESH_CACHE
.
Footnote
Courtesy of Steve Fitzgerald:As per the presentation server xsd, you can also revert the behavior (not sure why one would, but you can) in the instanceconfig.xml<Cache> <Query> <RefreshIncludeBIServerCache>false</RefreshIncludeBIServerCache> </Query> </Cache>
Changes in BI Server Cache Behaviour in OBIEE 12c : OBIS_REFRESH_CACHE
The OBIEE BI Server cache can be a great way of providing a performance boost to response times for end users – so long as it’s implemented carefully. Done wrong, and you’re papering over the cracks and heading for doom; done right, and it’s the ‘icing on the cake’. You can read more about how to use it properly here, and watch a video I did about it here. In this article we’ll see how the BI Server cache has changed in OBIEE 12c in a way that could prove somewhat perplexing to developers used to OBIEE 11g.
The BI Server cache works by inspecting queries as they are sent to the BI Server, and deciding if an existing cache entry can be used to provide the data. This can include direct hits (i.e. the same query being run again), or more advanced cases, where a subset or aggregation of an existing cache entry could be used. If a cache entry is used then a trip to the database is avoided and response times will typically be better – particularly if more than one database query would have been involved, or lots of additional post-processing on the BI Server.
When an analysis or dashboard is run, Presentation Services generates the necessary Logical SQL to return the data needed, and sends this to the BI Server. It’s at this point that the cache will, or won’t, kick in. The BI Server will accept Logical SQL from other sources than Presentation Services – in fact, any JDBC or ODBC client. This is useful as it enables us to validate behaviour that we’re observing and see how it can apply elsewhere.
When you build an Analysis in OBIEE 11g (and before), the cache will be used if applicable. Each time you add a column, or hit refresh, you’ll get an entry back from the cache if one exists. This has benefits – speed – but disadvantages too. When the data in the database changes, you will still get a cache hit, regardless. The only way to force OBIEE to show you the latest version of the data is to purge the cache first. You can target cache purges based on databases, tables, or even specific queries – but you do need to purge it.
What’s changed in OBIEE 12c is that when you click “Refresh” on an Analysis or Dashboard, the query is re-run against the source and the cache re-populated. Even if you have an existing cache entry, and even if the underlying data has not changed, if you hit Refresh, the cache will not be used. Which kind of makes sense, since “refresh” probably should indeed mean that.
Digging into OBIEE Cache Behaviour
Let’s prove this out. I’ve got SampleApp v506 (OBIEE 11.1.1.9), and SampleApp v511 (OBIEE 12.2.1). First off, I’ll clear the cache on each, using call saPurgeAllCache();
, run via Issue SQL:
Then I can use another BI Server procedure call to view the current cache contents (new in 11.1.1.9), call NQS_GetAllCacheEntries()
. For this one particularly make sure you’ve un-ticked “Use Oracle BI Presentation Services Cache”. This is different from the BI Server cache which is the subject of this article, and as the name implies is a cache that Presentation Services keeps.
I’ve confirmed that the BI Server cache is enabled on both servers, in NQSConfig.INI
############################################################################### # # Query Result Cache Section # ############################################################################### [CACHE] ENABLE = YES; # This Configuration setting is managed by Oracle Enterprise Manager Fusion Middleware Control
Now I create a very simple analysis in both 11g and 12c, showing a list of Airline Carriers and their Codes:
After clicking Results, a cache entry is inserted on each respective system:
Of particular interest is the create time, last used time, and number of times used:
If I now click Refresh in the Analysis window:
We see this happen to the caches:
In OBIEE 11g the cache entry is used – but in OBIEE 12c it’s not. The CreatedTime
is evidently not populated correctly, so instead let’s dive over to the query log (nqquery/obis1-query in 11g/12c respectively). In OBIEE 11g we’ve got:
-- SQL Request, logical request hash: 7c365697 SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 1, 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY -- Cache Hit on query: [[ Matching Query: SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 1, 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
Whereas 12c is:
-- SQL Request, logical request hash: d53f813c SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY -- Sending query to database named X0 - Airlines Demo Dbs (ORCL) (id: <<320369>>), connection pool named Aggr Connection, logical request hash d53f813c, physical request hash a46c069c: [[ WITH SAWITH0 AS (select T243.CODE as c1, T243.DESCRIPTION as c2 from BI_AIRLINES.UNIQUE_CARRIERS T243 /* 30 UNIQUE_CARRIERS */ ) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3 from ( select 0 as c1, D1.c1 as c2, D1.c2 as c3 from SAWITH0 D1 order by c3, c2 ) D1 where rownum <= 5000001 -- Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736117_52416_27.TBL'.
Looking closely at the 12c output shows three things:
- OBIEE has run a database query for this request, and not hit the cache
- A cache entry has clearly been created again as a result of this query
- The Logical SQL has a request variable set:
OBIS_REFRESH_CACHE=1
This is evidently added it by Presentation Services at runtime, since the Advanced tab of the analysis shows no such variable being set:
Let’s save the analysis, and experiment further. Evidently, the cache is being deliberately bypassed when the Refresh button is clicked when building an analysis – but what about when it is opened from the Catalog? We should see a cache hit here too:
Nope, no hit.
But, in the BI Server query log, no entry either – and the same on 11g. The reason being …. Presentation Service’s cache. D’oh!
From Administration > Manage Sessions I select Close All Cursors which forces a purge of the Presentation Services cache. When I reopen the analysis from the Catalog view, now I get a cache hit, in both 11g and 12c:
The same happens (successful cache hit) for the analysis used in a Dashboard being opened, having purged the Presentation Services cache first.
So at this point, we can say that OBIEE 11g and 12c both behave the same with the cache when opening analyses/dashboards, but differ when refreshing the analysis. In OBIEE 12c when an analysis is refreshed the cache is deliberately bypassed. Let’s check on refreshing a dashboard:
Same behaviour as with analyses – in 11g the cache is hit, in 12c the cache is bypassed and repopulated
To round this off, let’s doublecheck the behaviour of the new request variable that we’ve found, OBIS_REFRESH_CACHE
. Since it appears that Presentation Services is adding it in at runtime, let’s step over to a more basic way of interfacing with the BI Server – nqcmd
. Whilst we could probably use Issue SQL (as we did above for querying the cache) I want to avoid any more behind-the-scenes funny business from Presentation Services.
In OBIEE 12c, I run nqcmd
:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/nqcmd.sh -d AnalyticsWeb -u prodney -p Admin123
Enter Q to enter a query, as follows:
SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
In `obis1-query.log’ there’s the cache bypass and populate:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736117_53779_29.TBL'.
If I run it again without the OBIS_REFRESH_CACHE
variable:
SET VARIABLE QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
We get the cache hit as expected:
-------------------- Cache Hit on query: [[ Matching Query: SET VARIABLE OBIS_REFRESH_CACHE=1,QUERY_SRC_CD='Report',PREFERRED_CURRENCY='USD';SELECT 0 s_0, "X - Airlines Delay"."Carrier"."Carrier Code" s_1, "X - Airlines Delay"."Carrier"."Carrier" s_2 FROM "X - Airlines Delay" ORDER BY 3 ASC NULLS LAST, 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY Created by: prodney
Out of interest I ran the same two tests on 11g — both resulted in a cache hit, since it presumably ignores the unrecognised variable.
Summary
In OBIEE 12c, if you click “Refresh” on an analysis or dashboard, OBIEE Presentation Services forces a cache-bypass and cache-reseed, ensuring that you really do see the latest version of the data from source. It does this using the request variable, new in OBIEE 12c, OBIS_REFRESH_CACHE
.
The post Changes in BI Server Cache Behaviour in OBIEE 12c : OBIS_REFRESH_CACHE appeared first on Rittman Mead Consulting.