Category Archives: Rittman Mead

Oracle GoldenGate, MySQL and Flume

Back in September Mark blogged about Oracle GoldenGate (OGG) and HDFS . In this short followup post I’m going to look at configuring the OGG Big Data Adapter for Flume, to trickle feed blog posts and comments from our site to HDFS. If you haven’t done so already, I strongly recommend you read through Mark’s previous post, as it explains in detail how the OGG BD Adapter works.  Just like Hive and HDFS, Flume isn’t a fully-supported target so we will use Oracle GoldenGate for Java Adapter user exits to achieve what we want.

What we need to do now is

  1. Configure our MySQL database to be fit for duty for GoldenGate.
  2. Install and configure Oracle GoldenGate for MySQL on our DB server
  3. Create a new OGG Extract and Trail files for the database tables we want to feed to Flume
  4. Configure a Flume Agent on our Cloudera cluster to ‘sink’ to HDFS
  5. Create and configure the OGG Java adapter for Flume
  6. Create External Tables in Hive to expose the HDFS files to SQL access

OGG and Flume

Setting up the MySQL Database Source Capture

The MySQL database I will use for this example contains blog posts, comments etc from our website. We now want to use Oracle GoldenGate to capture new blog post and our readers’ comments and feed this information in to the Hadoop cluster we have running in the Rittman Mead Labs, along with other feeds, such as Twitter and activity logs.

The database has to be configured to user binary logging and also we need to ensure that the socket file can be found in /tmp/mysql.socket. You can find the details for this in the documentation. Also we need to make sure that the tables we want to extract from are using the InnoDB engine and not the default MyISAM one. The engine can easily be changed by issuing

alter table wp_mysql.wp_posts engine=InnoDB;

Assuming we already have installed OGG for MySQL on /opt/oracle/OGG/ we can now go ahead and configure the Manager process and the Extract for our tables. The tables we are interested in are

wp_mysql.wp_posts
wp_mysql.wp_comments
wp_mysql.wp_users
wp_mysql.wp_terms
wp_mysql.wp_term_taxonomy

First configure the manager

-bash-4.1$ cat dirprm/mgr.prm 
PORT 7809
PURGEOLDEXTRACTS /opt/oracle/OGG/dirdat/*, USECHECKPOINTS

Now configure the Extract to capture changes made to the tables we are interested in

-bash-4.1$ cat dirprm/mysql.prm 
EXTRACT mysql
SOURCEDB wp_mysql, USERID root, PASSWORD password
discardfile /opt/oracle/OGG/dirrpt/FLUME.dsc, purge
EXTTRAIL /opt/oracle/OGG/dirdat/et
GETUPDATEBEFORES
TRANLOGOPTIONS ALTLOGDEST /var/lib/mysql/localhost-bin.index
TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;

We should now be able to create the extract and start the process, as with a normal extract.

ggsci>add extract mysql, tranlog, begin now
ggsci>add exttrail ./dirdat/et, extract mysql
ggsci>start extract mysql
ggsci>info mysql
ggsci>view report mysql

We will also have to generate metadata to describe the table structures in the MySQL database. This file will be used by the Flume adapter to map columns and data types to the Avro format.

-bash-4.1$ cat dirprm/defgen.prm 
-- To generate trail source-definitions for GG v11.2 Adapters, use GG 11.2 defgen,
-- or use GG 12.1.x defgen with "format 11.2" definition format.
-- If using GG 12.1.x as a source for GG 11.2 adapters, also generate format 11.2 trails.

-- UserId logger, Password password
SOURCEDB wp_mysql, USERID root, PASSWORD password

DefsFile dirdef/wp.def

TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;
-bash-4.1$ ./defgen PARAMFILE dirprm/defgen.prm 

***********************************************************************
        Oracle GoldenGate Table Definition Generator for MySQL
      Version 12.1.2.1.0 OGGCORE_12.1.2.1.0_PLATFORMS_140920.0203
...

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
SOURCEDB wp_mysql, USERID root, PASSWORD ******
DefsFile dirdef/wp.def
TABLE wp_mysql.wp_comments;
Retrieving definition for wp_mysql.wp_comments.
TABLE wp_mysql.wp_posts;
Retrieving definition for wp_mysql.wp_posts.
TABLE wp_mysql.wp_users;
Retrieving definition for wp_mysql.wp_users.
TABLE wp_mysql.wp_terms;
Retrieving definition for wp_mysql.wp_terms.
TABLE wp_mysql.wp_term_taxonomy;
Retrieving definition for wp_mysql.wp_term_taxonomy.


Definitions generated for 5 tables in dirdef/wp.def.

Setting up the OGG Java Adapter for Flume

The OGG Java Adapter for Flume will use the EXTTRAIL created earlier as a source, pack the data up and feed to the cluster Flume Agent, using Avro and RPC. The Flume Adapter thus needs to know

  • Where is the OGG EXTTRAIL to read from
  • How to treat the incoming data and operations (e.g. Insert, Update, Delete)
  • Where to send the Avro messages to

First we create a parameter file for the Flume Adapter

-bash-4.1$ cat dirprm/flume.prm
EXTRACT flume
SETENV ( GGS_USEREXIT_CONF = "dirprm/flume.props")
CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU INCLUDEUPDATEBEFORES
GETUPDATEBEFORES
NOCOMPRESSUPDATES
SOURCEDEFS ./dirdef/wp.def
DISCARDFILE ./dirrpt/flume.dsc, purge

TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;

There are two things to note here

  • The OGG Java Adapter User Exit is configured in a file called flume.props
  • The source tables’ structures are defined in wp.def

The flume.props file is a ‘standard’ User Exit config file

-bash-4.1$ cat dirprm/flume.props 
gg.handlerlist=ggflume

gg.handler.ggflume.type=com.goldengate.delivery.handler.flume.FlumeHandler
gg.handler.ggflume.host=bd5node1.rittmandev.com
gg.handler.ggflume.port=4545

gg.handler.ggflume.rpcType=avro
gg.handler.ggflume.delimiter=;
gg.handler.ggflume.mode=tx
gg.handler.ggflume.includeOpType=true
# Indicates if the operation timestamp should be included as part of output in the delimited separated values
# true - Operation timestamp will be included in the output
# false - Operation timestamp will not be included in the output
# Default :- true
gg.handler.ggflume.includeOpTimestamp=true

# Optional properties to use the transaction grouping functionality
#gg.handler.ggflume.maxGroupSize=1000
#gg.handler.ggflume.minGroupSize=1000

### native library config ###
goldengate.userexit.nochkpt=TRUE
goldengate.userexit.timestamp=utc
goldengate.log.logname=cuserexit
goldengate.log.level=INFO
goldengate.log.tofile=true
goldengate.userexit.writers=javawriter

gg.report.time=30sec
gg.classpath=AdapterExamples/big-data/flume/target/flume-lib/*

javawriter.stats.full=TRUE
javawriter.stats.display=TRUE
javawriter.bootoptions=-Xmx32m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Dlog4j.configuration=log4j.properties

Some points of interest here are

  • The Flume agent we will send our data to is running on port 4545 on host bd5node1.rittmandev.com
  • We want each record to be prefixed with I(nsert), U(pdated) or D(delete)
  • We want each record to be postfixed with a timestamp of the transaction date
  • The Java class com.goldengate.delivery.handler.flume.FlumeHandler will do the actual work. (The curios reader can view the code in /opt/oracle/OGG/AdapterExamples/big-data/flume/src/main/java/com/goldengate/delivery/handler/flume/FlumeHandler.java)

Before starting up the OGG Flume, let’s first make sure that the Flume agent on bd5node1 is configure to receive our Avro message (Source) and also what to do with the data (Sink)

a1.channels = c1
a1.sources = r1
a1.sinks = k2
a1.channels.c1.type = memory
a1.sources.r1.channels = c1 
a1.sources.r1.type = avro 
a1.sources.r1.bind = bda5node1
a1.sources.r1.port = 4545
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME} 
a1.sinks.k2.hdfs.filePrefix = %{TABLE_NAME}_ 
a1.sinks.k2.hdfs.writeFormat=Writable 
a1.sinks.k2.hdfs.rollInterval=0
a1.sinks.k2.hdfs.hdfs.rollSize=1048576
a1.sinks.k2.hdfs.rollCount=0
a1.sinks.k2.hdfs.batchSize=100 
a1.sinks.k2.hdfs.fileType=DataStream

Here we note that

  • The agent’s source (inbound data stream) is to run on port 4545 and to use avro
  • The agent’s sink will write to HDFS and store the files  in /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME}
  • The HDFS files will be rolled over every 1Mb (1048576 bytes)

We are now ready to head back to the webserver that runs the MySQL database and start the Flume extract, that will feed all committed MySQL transactions against our selected tables to the Flume Agent on the cluster, which in turn will write the data to HDFS

-bash-4.1$ export LD_LIBRARY_PATH=/usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/server
-bash-4.1$ export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55/
-bash-4.1$ ./ggsci
ggsci>add extract flume, exttrailsource ./dirdat/et 
ggsci>start flume
ggsci>info flume
EXTRACT    FLUME     Last Started 2015-03-29 17:51   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
Process ID           24331
Log Read Checkpoint  File /opt/oracle/OGG/dirdat/et000008
                     2015-03-29 17:51:45.000000  RBA 7742

If I now submit this blogpost I should see the results showing up our Hadoop cluster in the Rittman Mead Labs.

[oracle@bda5node1 ~]$ hadoop fs -ls /user/flume/gg/wp_mysql/wp_posts
-rw-r--r--   3 flume  flume   3030 2015-03-30 16:40 /user/flume/gg/wp_mysql/wp_posts/wp_posts_.1427729981456

We can quickly create an externally organized table in Hive to view the results with SQL

hive> CREATE EXTERNAL TABLE wp_posts(
     op string, 
 ID                     int,
 post_author            int,
 post_date              String,
 post_date_gmt          String,
 post_content           String,
 post_title             String,
 post_excerpt           String,
 post_status            String,
 comment_status         String,
 ping_status            String,
 post_password          String,
 post_name              String,
 to_ping                String,
 pinged                 String,
 post_modified          String,
 post_modified_gmt      String,
 post_content_filtered  String,
 post_parent            int,
 guid                   String,
 menu_order             int,
 post_type              String,
 post_mime_type         String,
 comment_count          int,
     op_timestamp timestamp
  )
 COMMENT 'External table ontop of GG Flume sink, landed in hdfs'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
 STORED AS TEXTFILE
 LOCATION '/user/flume/gg/wp_mysql/wp_posts/';

hive> select post_title from gg_flume.wp_posts where op='I' and id=22112;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1427647277272_0017, Tracking URL = http://bda5node1.rittmandev.com:8088/proxy/application_1427647277272_0017/
Kill Command = /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/bin/hadoop job  -kill job_1427647277272_0017
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-03-30 16:51:17,715 Stage-1 map = 0%,  reduce = 0%
2015-03-30 16:51:32,363 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.88 sec
2015-03-30 16:51:33,422 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.38 sec
MapReduce Total cumulative CPU time: 3 seconds 380 msec
Ended Job = job_1427647277272_0017
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 3.38 sec   HDFS Read: 3207 HDFS Write: 35 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 380 msec
OK
Oracle GoldenGate, MySQL and Flume
Time taken: 55.613 seconds, Fetched: 1 row(s)

Please leave a comment and you’ll be contributing to an OGG Flume!

Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools

There comes the point in any sufficiently complex or difficult problem diagnosis that the log files in OBIEE alone are not sufficient for building up a complete picture of what’s going on. Even with the debug/trace data that Presentation Services and other components can be configured precisely to write you’re sometimes just left having to guess what is going on inside the black box of each of the OBIEE system components.

Here we’re going to look at a couple of examples of lifting the lid just a little bit further on what OBIEE is up to, using standard Linux diagnostic tools. These are not something to be reaching for in the first instance, but more getting on to a last resort. Almost always the problem is simpler than you’ll think, and leaping for an network trace or stack trace is going to be missing the wood for the trees.

Diagnostics in action

At a client recently they had a problem with a custom skin deployment on a clustered (scaled-out) OBIEE deployment. Amongst other things the skin was setting the default palette for charts (viewui/chart/dvt-graph-skin.xml), and they were seeing only 50% of chart executions pick up the custom palette – the other 50% used the default. If either entire node was shut down, things were fine, but otherwise it was a 50:50 chance what the colours would be. Most odd….

When you configure a custom skin in OBIEE you should be setting CustomerResourcePhysicalPath in instanceconfig.xml, along with CustomerResourceVirtualPath. Both these are necessary so that Presentation Services knows:

  1. Logical – How to generate URLs for content requested by the user’s browser (eg logos, CSS files, etc).
  2. Physical – How to physically reference files on the file system that are read by OBIEE itself (eg XML files, language files)

The way the client had configured their custom skin was that it was on storage local to each node, and in a node-specific path, something like this:

  • /data/instance1/s_custom/
  • /data/instance2/s_custom/

Writing out the details in hindsight always makes a problem’s root cause a lot more obvious, but at the time this was a tricky problem. Let’s start with the basics. Java Host is responsible for rendering charts, and for some reason, it was not reading the custom colour scheme file from the custom skin correctly. Presentation Services uses all the available Java Hosts in a cluster to request charts, presumably on some kind of round-robin basis. An analysis request on NODE01 has a 50:50 chance of getting its chart rendered on Java Host on NODE01 or Java Host on NODE02:


Turned all the log files up to 11 didn’t yield anything useful. For some reason half the time Java Host would just “ignore” the custom skin. Shutting down each node proved that in isolation the custom skin configuration on each node was definitely correct, because then the colours started working just fine. It was only when multiple Java Hosts across the nodes were active that there was a problem.

How Java Host picks up the custom skin is entirely undocumented, and I ended up figuring out that it must get the path to the skin as part of the chart request from Presentation Services. Since Presentation Services on NODE01 has been configured with a CustomerResourcePhysicalPath of /data/instance1/s_custom/, Java Host on NODE02 would fail to find this path (since on NODE02 the skin is located at /data/instance2/s_custom/) and so fall back on the default. This was my hypothesis that I then proved by making the path available for each skin available on each node (symlink, or using a standard path would also have worked, eg /data/shared/s_custom, or even better, a shared mount point), and from there everything worked just fine.

But a hypothesis and successful resolution alone wasn’t entirely enough. Sure the client was happy, but there was that little itch, that unknown “black box” system that appeared to behave how I had deduced, but could we know for sure?

tcpdump – network analysis

All of the OBIEE components communicate with each other and the outside world over TCP. When Presentation Services wants a chart rendered it does so by sending a request to Java Host – over TCP. Using the tcpdump tool we can see that in action, and inspect what gets sent:

$ sudo tcpdump -i venet0 -i lo -nnA 'port 9810'

The -A flag capture the ASCII representation of the packet; use -X if you want ASCII and hex. Port 9810 is the Java Host listen port.

The output looks like this:


You’ll note that in this case it’s intra-node communication, i.e. src and dest IP addresses are the same. The port for Java Host (9810) is clear, and we can verify that the src port (38566) is Presentation Services with the -p (process) flag of netstat:

$ sudo netstat -pn |grep 38566
tcp        0      0 192.168.10.152:38566        192.168.10.152:9810         ESTABLISHED 5893/sawserver

So now if you look in a bit more detail at the footer of the request from Presentation Services that tcpdump captured you’ll see loud and clear (relatively) the custom skin path with the graph customisation file:


Proof that the Presentation Services is indeed telling Java Host where to go and look for the custom attributes (including colours)! NB this is on a test environment, so that paths vary from the /data/instance... example above)

strace – system call analysis

So tcpdump gives us the smoking gun, but can we find the corpse as well? Sure we can! strace is a tool for tracing system calls, and a fantastically powerful one, but here’s a very simple example:

$strace -o /tmp/obijh1_strace.log -f -p $(pgrep -f obijh1)

-o means to write it to file, -f follows child processes as well, and -p passes the process id that strace should attach to. Have set the trace running I run my chart, and then go and pick through my trace file.

We know it’s the dvt-graph-skin.xml file that Java Host should be reading to pick up the custom colours, so let’s search for that:


Well there we go – Java Host went to go and look for the skin in the path that it was given by Presentation Services, and couldn’t find it. From there it’ll fall back on the product defaults.

Right Tool, Right Job

As as I said at the top of this article, these diagnostic tools are not the kind of things you’d be using day to day. Understanding their output is not always easy and it’s probably easy to do more harm than good with false assumption about what a trace is telling you. But, in the right situations, they are great for really finding out what is going on under the covers of OBIEE.

New Oracle Big Data Quick-Start Packages from Rittman Mead

Many organisations using Oracle’s business intelligence and data warehousing tools are now looking to extend their capabilities using “big data” technologies. Customers running their data warehouses on Oracle Databases are now looking to use Hadoop to extend their storage capacity whilst offloading initial data loading and ETL to this complementary platform; other customers are using Hadoop and Oracle’s Big Data Appliance to add new capabilities around unstructured and sensor data analysis, all at considerably lower-cost than traditional database storage.

NewImage

In addition, as data and analytics technologies and capabilities have evolved, there has never been a better opportunity to reach further into your data to exploit more value. Big Data platforms, Data Science methods and data discovery technologies make it possible to unlock the power of your data and put it in the hands of your  executives and team members – but what is it worth to you? What’s the value to your organisation of exploring deeper int the data you have, and how do you show return?

Many organisations have begin to explore Big Data technologies to understand where they can exploit value and extend their existing analytics platforms, but what’s the business case? The good news is, using current platforms, and following architectures like the Oracle Information Management and Big Reference Architecture written in conjunction with Rittman Mead, the foundation is in place to unlock a range of growth opportunities. Finding new value in existing data, predictive analytics, data discovery, reducing the cost of data storage, ETL offloading are all starter business cases proven to return value quickly.

NewImage

To help you start on the Oracle big data journey, Rittman Mead have put together two quick-start packages focuses on the most popular Oracle customer use-cases;

If this sounds like something you or your organization might be interested in, take a look at our new Quick Start Oracle Big Data and Big Data Discovery packages from Rittman Mead home page, or drop me an email at mark.rittman@rittmanmead.com and I’ll let you know how we can help.

RM BI Forum 2015 : Justification Letters for Employers

(Thanks to Christian Berg @Nephentur for the suggestion, and acknowledgements to ODTUG KScope for the original idea – our favourite conference after the BI Forum)

The Rittman Mead BI Forum 2015 promises to be our best BI Forum yet, with fantastic speakers at each event, keynotes and guest speakers from Oracle and John Foreman, author of the bestselling book “Data Smart”, a data visualisation challenge and an optional one-day masterclass on delivering Oracle’s new Information Management and Big Data reference architecture by Rittman Mead’s Mark Rittman and Jordan Meyer. Uniquely amongst Oracle BI events we keep the numbers attending very limited and run just a single stream at each event, so everyone takes part in the same sessions and gets to meet all the attendees and speakers over the three days.

Sometimes though, management within organizations require special justification for team members to attend events like these, and to help you put your case together and get across the unique education and networking benefits of the Rittman Mead BI Forum, we’ve prepared justification letters for you to complete with your details, one each for the Brighton and Atlanta events. Click on the links below to download sample justification letters for the Brighton BI Forum running on May 6th-8th 2015, and the Atlanta one running the week after on May 13th-15th 2015:

Full details on the BI Forum 2015 agenda and how to register can be found on the Rittman Mead BI Forum 2015 home page, with registration open until the weekend before each event – hurry though as attendee numbers are strictly limited.

OBIEE nqcmd Tidbits

nqcmd is the ODBC command line tool that always has, and hopefully always will, shipped with OBIEE. It enables you to manually fire queries directly at the BI Server, rather than through the usual way of Presentation Services generating Logical SQL and sending it to BI Server. This can be useful in several cases:

  1. Automated cache purging, by sending one of the SAPurge[…] ODBC commands to the BI Server, usually done as part of a script
  2. Automated execution of Logical SQL, often done to support testing scenarios
  3. Load Testing the BI Server (via a magic undocumented switch, SA_NQCMD_ADVANCED)
  4. Manual interogation of the BI Server – if you want to poke and prod nqsserver without launching a web browser, nqcmd is your friend :)

In using nqcmd there’re a couple of things I want to demonstrate here that I find useful but haven’t seen discussed [in detail] elsewhere.

Query Log via nqcmd

All BI Server queries run with a LOGLEVEL>=1 will write some log details to nqquery.log. The usual route to view this is either on the server directly itself, transferring it off with a tool such as WinSCP, or through the Administration page of OBIEE. Another option that is available is from nqcmd itself. You need to do two things:

  1. Set the environment variable SA_NQCMD_ADVANCED to Yes
  2. Include the command line arguments -ShowQueryLog -H when you invoke nqcmd. I don’t know what -H does – it’s just specified as being required for this to work.

Here’s a simple example in action:

[oracle@demo ~]$ export SA_NQCMD_ADVANCED=Yes
[oracle@demo ~]$ nqcmd -d AnalyticsWeb -u prodney -p Admin123 -ShowQueryLog -H

-------------------------------------------------------------------------------
          Oracle BI ODBC Client
          Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------



Connection open with info:
[0][State: 01000] [DataDirect][ODBC lib] Application's WCHAR type must be UTF16, because odbc driver's unicode type is UTF16

        [T]able info
        [C]olumn info
        [D]ata type info
        [F]oreign keys info
        [P]rimary key info
        [K]ey statistics info
        [S]pecial columns info
        [Q]uery statement
Select Option: Q

Give SQL Statement: SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"
SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"
-----------------------
s_1
-----------------------
70000000.00
-----------------------
Row count: 1
-----------------------
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-0] [] [ecid: 0054Sw944KmFw000jzwkno0003ac0000rl,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] ###
########################################### [[
-------------------- SQL Request, logical request hash:
d2294415
SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-34] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Query Status: Successful Completion [[

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-28] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Physical query response time 0 (seconds), id <<333971>> [[

]]

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-29] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 0, DB-connect time 0 (seconds) [[

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-33] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Logical Query Summary Stats: Elapsed time 0, Response time 0, Compilation time 0 (seconds) [[

]]

Neat! But so what? Well, I see two uses straight away:

  1. In some situations you may not have access to the filesystem of the server on which the BI Server is running. For example, as a consultant I’ve been to clients where I’m given the Administration Tool client installation only. If I want to debug an RPD that I’m developing I’ll usually want to poke around in nqquery.log to see quite what physical SQL is being generated – and now I can.
  2. There was a discussion on the EMG mailing list recently about generating Physical SQL without executing it on the database. I’m going to discuss this in the next section of this article, and to do the analysis for this rapidly I’m using the inline query log.

Generating Physical SQL for OBIEE without Executing it – SKIP_PHYSICAL_QUERY_EXEC

OBIEE generates the Physical SQL that it runs against the database dynamically, at runtime. It takes the Logical request (“Logical SQL”), runs it through the RPD and generates one or more “Physical SQL” statements to be executed on the database as required to pull back the necessary data. A question arose recently on the EMG mailing list as to whether it is possible to get the Physical SQL – without executing it. You can imagine the benefits of this (namely, regression testing) since executing the database query each time is typically going to be expensive in machine resource and time consuming.

In SampleApp v406 there is a /home/oracle/scripts/PhysicalSQLGenerator, which does two things. First off it generates the Logical SQL for a given analysis, presumably using the generateReportSQL web service. It then takes that and runs it through nqcmd, scraping the nqquery.log for the resulting Physical SQL. In all of this no database queries get run. Very cool. But what’s the “secret sauce” at play here – can we distill it down in order to use it ourselves?

First, let’s look at how the SampleApp script does it. It sets some additional request variables in the Logical SQL:

[oracle@demo PhysicalSQLGenerator]$ cat lsql-out-dir/q1.lsql
SET VARIABLE SKIP_PHYSICAL_QUERY_EXEC=1, LOGLEVEL=2, DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, QUERY_SRC_CD='SampleApp-PSQLGEN', SAW_SRC_PATH='/users/prodney/folder/request variable example':SELECT
   0 s_0,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_1
FROM "A - Sample Sales"
ORDER BY 1
FETCH FIRST 5000001 ROWS ONLY
;

And if we extract the relevant part out of the bash script we can see that it also uses a couple of extra command line arguments (-q -NoFetch) when invoking nqcmd:

nqcmd -q -NoFetch -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

When it’s run we check nqquery.log and lo-and-behold we get this: (edited for brevity)

------------------- Sending query to database named 01 - Sample App Data (ORCL) (id: <<69923>>), connection pool named Sample Relational Connection, logical request hash dd4fb54f, physical request hash 8d6f36
3d: [[
WITH
SAWITH0 AS (select sum(T42442.Revenue) as c1
from
     BISAMPLE.SAMP_REVENUE_FA2 T42442 /* F21 Rev. (Aggregate 2) */ )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from
     SAWITH0 D1 ) D1 where rownum <= 5000001

]]

Query Status: Successful Completion [[

Rows 0, bytes 24 retrieved from database query id: <<69923>> Simulation Gateway 

Physical query response time 0 (seconds), id <<69923>> Simulation Gateway

Whilst the log says it is “Sending query to database” it does no such thing, and the “Simulation Gateway” is the giveaway clue. Proof that it doesn’t connect to the database? I shut the database down, and it still worked just fine. Crude, yes, but effective.

I’ll intersperse here the little trick that I mentioned in the first part of this article : -ShowQueryLog. It’s tedious switching back and forth between nqcmd and the nqquery.log when doing this kind of testing, so let’s do it all as one:

export SA_NQCMD_ADVANCED=Yes
nqcmd -H -ShowQueryLog -q -NoFetch -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

Unfortunately it looks like -ShowQueryLog is mutually exclusive to -q and -NoFetch since it doesn’t return anything, even though the nqquery.log did get additional entries. But that’s fine, since by removing these two flags in order to get -ShowQueryLog to work we’re whittling down what is actually needed to generate the physical SQL on its own without database execution. Here’s the nqcmd, showing the query log inline and showing still the “Simulation Gateway” indicative of no physical query execution:

[oracle@demo PhysicalSQLGenerator]$ export SA_NQCMD_ADVANCED=Yes
[oracle@demo PhysicalSQLGenerator]$ nqcmd -H -ShowQueryLog -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

-------------------------------------------------------------------------------
          Oracle BI ODBC Client
          Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------

[...]

------------------------------------
s_0          s_1
------------------------------------
------------------------------------
Row count: 0
------------------------------------
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-0] [] [ecid: 0054Ut7AJ33Fw000jzwkno0005UZ00005Q,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] ############################################## [[
-------------------- SQL Request, logical request hash:
dd4fb54f
SET VARIABLE SKIP_PHYSICAL_QUERY_EXEC=1, LOGLEVEL=2, DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, QUERY_SRC_CD='SampleApp-PSQLGEN', SAW_SRC_PATH='/users/prodney/folder/request variable example':SELECT
   0 s_0,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_1
FROM "A - Sample Sales"
ORDER BY 1
FETCH FIRST 5000001 ROWS ONLY



[...]

[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-18] [] [ecid: 0054Ut7AK5DFw000jzwkno0005UZ00005S,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Sending query to database named 01 - Sample App Data (ORCL) (id: <<70983>>), connection pool named Sample Relational Connection, logical request hash dd4fb54f, physical request hash 8d6f363d: [[
WITH
SAWITH0 AS (select sum(T42442.Revenue) as c1
from
     BISAMPLE.SAMP_REVENUE_FA2 T42442 /* F21 Rev. (Aggregate 2) */ )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from
     SAWITH0 D1 ) D1 where rownum <= 5000001

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-34] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Query Status: Successful Completion [[

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-26] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Rows 0, bytes 24 retrieved from database query id: <<70983>> Simulation Gateway [[

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-28] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Physical query response time 0 (seconds), id <<70983>> Simulation Gateway [[

[...]

It’s clear that the “-q -Nofetch” parameters used in nqcmd don’t have an effect on whether the physical query is executed (they’re to do with whether nqcmd as an ODBC client pulls back and displays the data you ask for). It’s actually just a single request variable that does the job, and it goes under the rather obvious name of SKIP_PHYSICAL_QUERY_EXEC. When set to 1 it generates all the necessary physical SQL but doesn’t execute it, and the presence of “Simulation Gateway” in the log signals this.