Tag Archives: Oracle BI Suite EE

BI Forum 2014 preview – No Silver Bullets : OBIEE Performance in the Real World

I’m honoured to have been accepted to speak at this year’s Rittman Mead BI Forum, the sixth year of this expert-level conference that draws some of the best Oracle BI/DW minds together from around the world. It’s running May 8th-9th in Brighton, and May 15-16th in Atlanta, with an optional masterclass from Cloudera’s Lars George the day before the conference itself at each venue.

My first visit to the BI Forum was in 2009 where I presented Performance Testing OBIEE, and now five years later (five years!) I’m back, like a stuck record, talking about the same thing – performance. That I’m still talking about it means that there’s still an audience for it, and this time I’m looking beyond just testing performance, but how it’s approached by people working with OBIEE. For an industry built around 1s and 0s, computers doing just what you tell them to and nothing else, there is a surprising amount of suspect folklore and “best practices” used when it comes to “fixing” performance problems.


Getting good performance with OBIEE is just a matter of being methodical. Understanding where to look for information is half the battle. By understanding where the time goes, improvements can be targeted where they will be most effective. Heavily influenced by Cary Millsap and his Method-R approach to performance, I will look at how to apply this practically to OBIEE. Most of the information needed to build up a full picture is readily available from OBIEE’s log files.

I’ll also dig a bit deeper into OBIEE, exploring how to determine how the system’s behaving “under the covers”. The primary technique for this is OBIEE’s DMS metrics, which I have written about recently in relation to the new Rittman Mead open-source tool obi-metrics-agent, and which I am using day-to-day to rapidly examine and resolve performance problems that clients see.

I’m excited to be presenting again on this topic, and I hope to see you in Brighton next month. The conference always sells out, so don’t delay – register today!

Visualising OBIEE DMS metrics with Graphite

Assuming you have set up obi-metrics-agent and collectl as described in my previous post, you have a wealth of data at your disposal for graphing and exploring in Graphite, including:

  • OS (CPU, disk, network, memory)
  • OBIEE’s metrics
  • Metrics about DMS itself
  • Carbon (Graphite’s data collector agent) metrics

In this post I’ll show you some of the techniques we can use to put together a simple dashboard.

Building graphs

First off, let’s see how Graphite actually builds graphs. When you select a data series from the Metrics pane it is added to the Graphite composer where you can have multiple metrics. They’re listed in a legend, and if you click on Graph Data you can see the list of them.

Data held in Graphite (or technically, held in whisper) can be manipulated and pre-processed in many ways before Graphite renders it. This can be mathematical transforms of the data (e.g. Moving Average), but also how the data and its label are shown. Here I’ll take the example of several of the CPU metrics (via collectl) to see how we can manipulate them.

To start with, I’ve just added idle, wait and user from the cputotals folder, giving me a nice graph thus:

We can do some obvious things like add in a title, from the Graph Options menu

Graphite functions

Looking at the legend there’s a lot of repeated text (the full qualification of the metric name) which makes the graph more cluttered and less easy to read. We can use a Graphite function to fix this. Click on Graph Data, and use ctrl-click to select all three metrics:

Now click on Apply Function -> Set Legend Name By Metric. The aliasByMetric function is wrapped around the metrics, and the legend on the graph now shows just the metric names which is much smarter:

You can read more about Graphite functions here.
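Everything the Composer does with functions can also be driven through Graphite’s render API, which simply wraps targets in the function calls. Here’s a rough sketch of building such a URL; the hostname and metric paths are assumptions based on the examples above, so adjust for your own server:

```python
from urllib.parse import urlencode

GRAPHITE = "http://localhost"  # assumption: your Graphite server

def render_url(targets, frm="-5min", fmt="png"):
    """Build a Graphite /render URL for one or more targets."""
    params = [("target", t) for t in targets]        # repeated target= params
    params += [("from", frm), ("format", fmt)]
    return GRAPHITE + "/render?" + urlencode(params)

# Wrapping each metric in aliasByMetric() is exactly what the Composer
# does when you pick "Set Legend Name By Metric".
url = render_url([
    "aliasByMetric(obieesample.cputotals.user)",
    "aliasByMetric(obieesample.cputotals.wait)",
    "aliasByMetric(obieesample.cputotals.idle)",
])
print(url)
```

This is handy for embedding graphs elsewhere, since the whole graph definition lives in the URL.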

Another useful technique is being able to graph out metrics using a wildcard. Consider the ProcessInfo group of metrics that DMS provides about some of the OBIEE processes:

Let’s say we want a graph that shows cpuTime for each of the processes (not all are available). We could add each metric individually:

But that’s time consuming, and assumes there are only two processes. What if DMS gives us data for other processes? Instead we can use a wildcard in place of the process name:

obieesample.DMS.dms_cProcessInfo.ProcessInfo.*.cpuTime

You can do this by selecting a metric and then amending it in the Graph Data view, or from the Graph Data view itself click on Add and use the auto-complete to manually enter it.

But now the legend is pretty unintelligible, and this time using the aliasByMetric function won’t help because the metric name is constant (cpuTime). Instead, use the Set Legend Name By Node function. In this example we want the third node (the name of the process). Combined with a graph title this gives us:

This aliasByNode method works well for Connection Pool data too. However, it can be sensitive to certain characters (including brackets) in the metric name, throwing an IndexError: list index out of range error. The latest version of obi-metrics-agent should work around this by modifying the metric names before sending them to carbon.

The above graph shows a further opportunity for using Graphite functions. The metric is a cumulative one – the total amount of CPU time that the process has used. What would be more useful would be to show the delta between each occurrence. For this, the derivative function is appropriate:

Sometimes you’ll get graphs with gaps in; maybe the server was busy and the collector couldn’t keep up.


To “gloss over” these, use the Keep Last Value function:


Saving graphs

You don’t have to log in to Graphite by default, but to save graphs and dashboards and return to them between sessions, you’ll want to. If you used the obi-metrics-agent installation script then Graphite will have a user oracle with password Password01. Click the Login button in the top right of the Graphite screen and enter the credentials.

Once logged in, you should see a Save icon (for you young kids out there, that’s a 3.5″ floppy disk…).

You can return to saved graphs from the Tree pane on the left:

flot

As well as the standard Graphite graphing described above, you also have the option of using flot, which is available from the link in the top-right options, or the icon on an existing graph:


Graphlot/Flot is good for things like examining data values at specific times:


Creating a dashboard

So far we’ve seen individual graphs in isolation, which is fine for ad-hoc experimentation but doesn’t give us an overall view of a system. Click on Dashboard in the top-right of the Graphite page to go to the dashboards area, ignoring the error about the default theme.

You can either build Graphite dashboards from scratch, or you can bring in graphs that you have prepared already in the Graphite Composer and saved.

At the top of the Graphite Dashboard screen are the metrics available to you. Clicking on them drills down the metric tree, as does typing in the box underneath.

Selecting a metric adds it in a graph to the dashboard, and selecting a second adds it into a second graph:

You can merge graphs by dragging and dropping one onto the other:

Metrics within a graph can be modified with functions in exactly the same way as in the Graphite Composer discussed above:

To add in a graph that you saved from Graphite Composer, use the Graphs menu

You can resize the graphs shown on the dashboard, again using the Graphs menu:

To save your dashboard, use the Dashboard -> Save option.

Example Graphite dashboards

Here are some examples of obi-metrics-agent/Graphite being used in anger. Click on an image to see the full version.

  • OS stats (via collectl)
  • Presentation Services sessions, cache and charting
  • BI Server (nqserver) Connection and Thread Pools
  • Response times vs active users (via JMeter)

Built-In OBIEE Load Testing with nqcmd

nqcmd ships with all installations of OBIEE and includes some very useful hidden functionality – the ability to generate load tests against OBIEE. There are lots of ways of generating load against OBIEE, but most require third-party tools of varying degrees of complexity.

It’s easy to try this out. First set the OBIEE environment:  [I'm using SampleApp v309R2 as an example; your FMW_HOME path will vary]

. ~/obiee/instances/instance1/bifoundation/OracleBIApplication/coreapplication/setup/bi-init.sh

and then the “open sesame” setting which enables the hidden nqcmd functionality:

export SA_NQCMD_ADVANCED=Yes

On Windows, run set SA_NQCMD_ADVANCED=YES instead. If you don’t set this environment variable then nqcmd just throws an error if you try to use one of the hidden options.

Now if you list the available options for nqcmd you’ll see lots of new options in addition to the usual ones:

Command: nqcmd - a command line client which can issue SQL statements
                 against either Oracle BI server or a variety
                 of ODBC compliant backend databases.
SYNOPSIS
         nqcmd [OPTION]...
DESCRIPTION
         -d<data source name>
         -u<user name>
         -p<password>
         -s<sql input file name>
         -o<output result file name>
         -D<Delimiter>
         -b<super batch file name>
         -w<# wait seconds>
         -c<# cancel interval seconds>
         -C<# number of fetched rows by column-wise binding>
         -n<# number of loops>
         -r<# number of requests per shared session>
         -R<# number of fetched rows by row-wise binding>
         -t<# number of threads>
         -T (a flag to turn on time statistics)
         -a (a flag to enable async processing)
         -f (a flag to enable to flush output file for each write)
         -H (a flag to enable to open/close a request handle for each query)
         -z (a flag to enable UTF8 in the output result file
         -utf16 (a flag to enable UTF16 for communicating to Oracle BI ODBC driver)
         -q (a flag to turn off row output)
         -NoFetch (a flag to disable data fetch with query execution)
         -SmartDiff (a flag to enable SmartDiff tags in output)
         -NotForwardCursor (a flag to disable forwardonly cursor)
         -v (a flag to display the version)
         -P<the percent of statements to disable cache hit>
         -impersonate <the impersonate username>
         -runas <the runas username>
         -td <the time duration to run >
         -qsel <the query selection>
         -ds <the dump statistics duration in secs>
         -qstats <print Query statistics at end of run>
         -login <login scenario for PSR. login/execute sqls/logout for sql file>
         -ShowQueryLog <to display query log from server, -H is required for this setting>
         -i <ramup interval for each user for load testing, -i is required for this setting>
         -ONFormat<FormatString, i.e. TM9, 0D99>

You’re on your own figuring the new options out as they’re not documented (and therefore presumably not supported and liable to change or be dropped at any time). What I’ve done below is my best guess at how to use them – don’t take this as gospel. The one source that I did find is a post on Oracle’s CEAL blog: OBIEE 11.1.1 – Advanced Usage of nqcmd command, from which I’ve taken some of the detail below.

Let’s have a look at how we can generate a load test. First off, I’ll create a very simple query:

and from the Advanced tab extract the Logical SQL from it:

SELECT
   0 s_0,
   "A - Sample Sales"."Products"."P2  Product Type" s_1,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2
FROM "A - Sample Sales"
ORDER BY 1, 2 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY

This Logical SQL I’ve saved to a file, report01.lsql.

To run this Logical SQL from nqcmd I use the standard (documented) syntax, passing the Logical SQL filename with the -s flag:

[oracle@obieesample loadtest]$ nqcmd -d AnalyticsWeb -u Prodney -p Admin123 -s report01.lsql

-------------------------------------------------------------------------------
          Oracle BI ODBC Client
          Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------

Connection open with info:
[0][State: 01000] [DataDirect][ODBC lib] Application's WCHAR type must be UTF16, because odbc driver's unicode type is UTF16
SELECT
   0 s_0,
   "A - Sample Sales"."Products"."P2  Product Type" s_1,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2
FROM "A - Sample Sales"
ORDER BY 1, 2 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY
[...]

0            Smart Phones   6773120.36
--------------------
Row count: 11
--------------------

Processed: 1 queries

Adding the -q flag will do the same, but suppress the data output:

oracle@obieesample loadtest]$ nqcmd -d AnalyticsWeb -u Prodney -p Admin123 -s report01.lsql -q

[...]
----------------------------------------------------------------------
Row count: 11
-------------------------------------------------------------------------------------------------------------   
Processed: 1 queries

The basic parameters for load testing are:

  • -t – how many threads [aka Virtual Users]
  • -td – test duration
  • -ds – how frequently to write out load test statistics
  • -T – enable time statistics [without this they will not be reported correctly]

You also need to supply -o with an output filename. Even if you’re not writing the data returned from the query to disk (which you shouldn’t, and -q disables), nqcmd needs this in order to be able to write its load test statistics properly (I got a lot of zeros and nan otherwise). In addition, the -T (Timer) flag should be enabled for accurate timings.

So to run a test for a minute with 5 threads, writing load test stats to disk every 5 seconds, you’d run:

nqcmd -d AnalyticsWeb -u Prodney -p Admin123 -s report01.lsql -q -T -td 60 -t 5 -ds 5 -o output

The load test stats are written to a file based on the name given in the -o parameter, with a _Counters.txt suffix:

$ cat output_Counters.txt
                        nQcmd Load Testing
TimeStamp       Sqls/Sec        Avg RT  CumulativePrepareTime   CumulativeExecuteTime   CumulativeFetchTime
00:00:05        56.200000       0.065925        2.536000                13.977000               2.012000
00:00:10        66.800000       0.065009        5.641000                33.479000               4.306000
00:00:15        69.066667       0.066055        8.833000                52.234000               7.366000
00:00:20        73.100000       0.063984        11.978000               71.944000               9.622000
[...]
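Because the Cumulative* columns only ever grow, it can be easier to eyeball per-interval figures instead. This is a hypothetical helper, not part of nqcmd or obi-metrics-agent, sketching how you might derive the execute-time delta per sample:

```python
def interval_deltas(lines):
    """Yield (timestamp, execute_time_delta) for each sample after the first,
    from the CumulativeExecuteTime column of a _Counters.txt file."""
    prev = None
    for line in lines:
        parts = line.split()
        # Data rows have exactly 6 fields; the banner and header do not.
        if len(parts) != 6:
            continue
        ts, cum_exec = parts[0], float(parts[4])
        if prev is not None:
            yield ts, round(cum_exec - prev, 3)
        prev = cum_exec

sample = """\
                        nQcmd Load Testing
TimeStamp       Sqls/Sec        Avg RT  CumulativePrepareTime   CumulativeExecuteTime   CumulativeFetchTime
00:00:05        56.200000       0.065925        2.536000                13.977000               2.012000
00:00:10        66.800000       0.065009        5.641000                33.479000               4.306000
""".splitlines()

print(list(interval_deltas(sample)))  # [('00:00:10', 19.502)]
```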

Using obi-metrics-agent to pull out the OBIEE metrics and Graphite to render them we can easily visualise what happened when we ran the test. The Oracle_BI_General.Total_sessions metric shows:


Ramping Up the Load

nqcmd also has a -i parameter, to specify the ramp-up per thread. Most load tests should incorporate a “ramp up”, whereby the load is introduced gradually. This is important so that you don’t overwhelm a server all at once. It might be that the server will not support the total number of users planned, so by using a ramp-up period you can examine the server’s behaviour as the load increases gradually, spotting the point at which the wheels begin to come off.

The -i parameter for nqcmd is the delay between each thread launching, and this has an interesting effect on the duration of the test. If you specify a test duration (-td) of 5 seconds, five threads (-t), and a rampup (-i) of 10 seconds the total elapsed will be c.55 seconds (5×10 + 5).

I’ve used the standard time command on Linux to validate this by specifying it before the nqcmd call.

$ time nqcmd -d AnalyticsWeb -u Prodney -p Admin123 -s report01.lsql -q -td 5 -t 5 -ds 1 -o $(date +%Y-%m-%d-%H%M%S) -T -i 10 

[...]

real    0m56.896s
user    0m2.350s
sys     0m1.434s

So basically the -td is the “Steady State” once all threads are ramped up, and the literal test duration is equal to (rampup × number of threads) + (desired steady state).
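That arithmetic is easy to encode if you want to predict how long a planned test will take before kicking it off (a sketch only; nqcmd itself may add a little overhead on top, as the time output above shows):

```python
def estimated_elapsed(threads, rampup_s, steady_state_s):
    """Rough elapsed time of an nqcmd run: each thread starts rampup_s (-i)
    after the previous one, then the steady state (-td) applies on top."""
    return threads * rampup_s + steady_state_s

print(estimated_elapsed(threads=5, rampup_s=10, steady_state_s=5))    # 55
print(estimated_elapsed(threads=20, rampup_s=10, steady_state_s=30))  # 230
```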

The above ramp-up can be clearly seen:


BTW a handy trick I’ve used here is to use a timestamp for the output name so that the _Counters.txt from one test doesn’t overwrite another, by specifying the date using an inline bash command:

nqcmd [...]   -o $(date +%Y-%m-%d-%H%M%S)   [...]

Whilst we’re at it for tips & tricks – if you want to stop nqcmd running but Ctrl-C isn’t instant enough for you, the following will stop it in its tracks:

pkill -9 nqcmd

Wait a Moment…

…or two. Wait time, or “think time”, is also important in producing a realistic load test. Unless you want to hammer your server just for the lulz to see how fast you can overload it, you’ll want to make sure the workload you’re simulating represents how it is actually used — and in reality users will be pausing (thinking) between report requests. The -w flag provides this option to nqcmd.
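As a back-of-envelope model of why think time matters (my own simplification here, not anything nqcmd reports): each virtual user completes one cycle every response-time-plus-think-time seconds, so the offered load drops sharply once a wait is introduced:

```python
def approx_throughput(threads, avg_rt_s, think_time_s):
    """Rough queries/sec offered by a set of virtual users, each of whom
    runs a query (avg_rt_s) and then thinks (think_time_s) before the next."""
    return threads / (avg_rt_s + think_time_s)

# 5 users running 0.5s queries: no think time vs a 5 second think time (-w 5)
print(round(approx_throughput(5, 0.5, 0), 1))  # 10.0 qps - hammering the server
print(round(approx_throughput(5, 0.5, 5), 1))  # 0.9 qps - far more realistic
```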

In this test below, whilst the Total Sessions is as before (no ramp up), the Connection Pool shows far fewer busy connections. On previous tests the busy connections were equal to the number of active threads, because the server was continuously running queries.


And the CPU, which in the previous test was exhausted at five users with no wait time, now is a bit more relaxed


For comparison, this was the CPU in the first test we ran (5 threads, no wait time, no ramp up). Note that ‘idle’ drops to zero, i.e. the CPU is flat-out.


Load Test in Action

Let’s combine ramp up and wait times to run a load test and see what we can see in the underlying OBIEE metrics. I’m specifying:

  • Write the output to a file named with the current timestamp (format YYYY-MM-DD-HHMMSS)
    -o $(date +%Y-%m-%d-%H%M%S)
  • 20 threads
    -t 20
  • 10 second gap between starting each new thread
    -i  10
  • 5 second wait between each thread submitting a new query
    -w 5
  • Run for a total of 230 seconds (20 threads × 10 seconds ramp up = 200 seconds, plus 30 seconds steady state)
    -td 230

$ date;time nqcmd -d AnalyticsWeb -u weblogic -p Password01 -s queries.lsql -q -T -o $(date +%Y-%m-%d-%H%M%S) -t 20 -ds 5 -td 230 -w 5 -i 10;date

Here’s what happened.

  • At first, as the users ramp up, the Connection Pool gets progressively busier.
  • However, when we hit c.14 threads, things start to go awry. The busy count stays at 10, even though the user count is increasing.
    (This was displayed in flot, which you can get to on the /graphlot URL of your Graphite server)
  • So the user count is increasing, but we’re not seeing increasing activity on the Connection Pool… so what does that do for the response times?
    OK, so the Average Query Elapsed Time is a metric I’d normally be wary of, but this is a dedicated server running just my load test workload (and a single query within it), so in this case it’s a valid indicator — and it’s showing that the response time is going up. Why’s it going up?
  • Looking more closely at the Connection Pool we can see a problem — we’re hitting the capacity of ten connections, and requests are starting to queue up.
    Note how once the Current Busy Connection Count hits the Capacity of ten, the Current Queued Requests value starts to increase — because the number of users is increasing, trying to run more queries, but having to wait.
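A toy model of this behaviour (assuming every user always has a query in flight, which this workload approximates): once demand passes the pool’s capacity, busy connections plateau and the excess queues. The capacity of ten matches the default Connection Pool size seen above:

```python
def pool_state(active_users, capacity=10):
    """Return (busy, queued) for a connection pool under the simplifying
    assumption that every active user wants a connection right now."""
    busy = min(active_users, capacity)       # can't exceed pool capacity
    queued = max(active_users - capacity, 0) # the rest wait in the queue
    return busy, queued

for users in (5, 10, 14, 20):
    print(users, pool_state(users))
# 14 users -> (10, 4): busy flat at capacity, four requests queued
```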

So this is a good example of where users would see slow performance, but some of the usual “Silver Bullets” around hardware and the database would completely miss the target, because the bottleneck here is actually in the configuration of the Connection Pool.


If you’re interested in hearing more about this subject, make sure you register for the BI Forum in Brighton, 7-9 May, where I’m delighted to be speaking for the second time, presenting “No Silver Bullets : OBIEE Performance in the Real World”.

Installing obi-metrics-agent, Graphite, and collectl

In the previous post I introduced obi-metrics-agent, an open-source tool from Rittman Mead that can collect DMS metrics from OBIEE and write them to several formats including graphite. Graphite is a tool for storing and graphing time-based metrics.

Now in this post we will look at how to install the tools and get them up and running on your server. The install process has been tested on both OL5 and OL6 (and so, by extension, CentOS and RHEL). There’s no reason why it oughtn’t to work on other *nixes, but it hasn’t been tested.

I’m using Oracle’s SampleApp v309R2 server which is built on Oracle Linux (OL) 5 so that anyone can try this out at home running it on VirtualBox (or VMWare Fusion).

There are three parts to the obi-metrics-agent installation.

  1. obi-metrics-agent – python script to extract DMS metrics from OBIEE
  2. [optional] Graphite – store and graph data, including that extracted from obi-metrics-agent and collectl
  3. [optional] collectl – OS metrics viewer. Can send data for graphing in Graphite

Only one of these is mandatory – the installation of obi-metrics-agent itself. That in itself is just a case of cloning the repository in order to get the obi-metrics-agent python script on the server. However, the remaining articles in this blog series will assume that graphite and collectl have also been installed, so these are covered too.

Assumptions & pre-requisites

  1. Server has internet access (try ping google.com to check)
  2. Installation is running as the oracle user
  3. Installation path is /home/oracle/graphite
  4. User has sudo rights

Before you run this on SampleApp v309R2, you must first run the following to fix a bug in yum’s configuration. If you’re not running SampleApp, you don’t need to do this:

sudo sed -i.bak -e 's/proxy=/#proxy=/g' /etc/yum.conf

You need the EPEL yum repository setup (for packages such as git, etc):

  • OL 5/Cent OS 5/RHEL 5 (including SampleApp v309R2):
    sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
  • OL 6/Cent OS 6/RHEL 6:
    sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/`uname -p`/epel-release-6-8.noarch.rpm

Installation

The bulk of the installation is done by a couple of shell scripts, which you get by cloning the git repository. Run the following to install git and clone it:

sudo yum install -y git
export FMW_HOME=/home/oracle/obiee # Change this value for your installation
cd $FMW_HOME
git clone https://github.com/RittmanMead/obi-metrics-agent.git

Then launch the installation:

  • OL5:
    cd obi-metrics-agent
    ./install-ol5.sh
  • OL6:
    cd obi-metrics-agent
    ./install-ol6.sh

This will install git (if not present), and create a folder called obi-metrics-agent in the installation folder of OBIEE (FMW Home). You can put obi-metrics-agent wherever you want; this location is just a suggestion.

The installation for Graphite builds a standalone python environment using virtualenv in which graphite will run under Python 2.6. The reason for this is that OL5 ships with Python 2.4 (Noah was seen coding in this shortly before boarding his ark) and the OS’ python version cannot be easily upgraded without causing all sorts of complications. Virtualenv is generally a good way to deploy python applications in isolation from others, so is also used for the OL6 version of the install.

The script will take a few minutes to run. Once complete, you can test the installation following the steps below. Do note that Graphite is somewhat notorious for installation problems, so whilst these instructions have been tested, you may hit quirks on your own server that need a bit of Google-Fu to resolve. Step-by-step instructions detailing each stage are also provided.

If you’re on a different *nix platform then feel free to adapt the install scripts and submit a pull request. Graphite works flawlessly on Debian-based distributions, and there is no reason why it has to run local to the OBIEE server on which obi-metrics-agent is running.

Testing the installation

Testing obi-metrics-agent

First, start up the OBIEE stack and make sure that you can log in to OBIEE. Once it is running, you can run obi-metrics-agent by entering:

export FMW_HOME=/home/oracle/obiee # Change this value for your installation
export OPMN_BIN=$FMW_HOME/instances/instance1/bin/opmnctl
python $FMW_HOME/obi-metrics-agent/obi-metrics-agent.py

Expected output:

[...]
--Gather metrics--
    Time of sample: Wed, 19 Mar 2014 22:27:42 +0000 (1395268062)

    Get metrics for coreapplication_obips1
         Processed :    469 data values @ Wed, 19 Mar 2014 22:27:42 +0000       Oracle BI Presentation Server
            Appended CSV data to ./data/metrics.csv
    Get metrics for coreapplication_obis1
         Processed :    229 data values @ Wed, 19 Mar 2014 22:27:42 +0000       Oracle BI Server
            Appended CSV data to ./data/metrics.csv
    Get metrics for opmn
         Processed :    91 data values @ Wed, 19 Mar 2014 22:27:42 +0000        opmn
            Appended CSV data to ./data/metrics.csv

    Processed: 3    Valid: 3 (100.00%)      Invalid: 0 (0.00%)
    -- Sleeping for 5 seconds (until 1395268062)--
[...]

For more advanced options, including sending data to graphite, see below.

Testing Graphite

Go to http://<server>/ and you should see the Graphite launch page:


By default Graphite will collect data about its own performance, specifically that of the carbon agent. This means you can easily test that Graphite’s graphs are working by selecting one of the carbon metrics to graph. Expand the Metrics tree and click cpuUsage, which will add it into the graph in the main pane:


Graphite is installed with a default user oracle with password Password01.

Testing collectl

From the command line enter collectl:

[oracle@obieesample obi-metrics-agent]$ collectl
waiting for 1 second sample...
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   1   0   340    430      0      0      0      0      0      0      0       0
   0   0   444    499     48      2      0      0      0      1      0       1
   1   0   316    430      0      0      0      0      0      1      0       1

(press Ctrl-C to exit)

To check that collectl is successfully sending data to Graphite go to Graphite and under the Metrics folder you should see your machine’s hostname (for example obieesample for SampleApp). Under that should be the metrics that collectl is sending to Graphite:


Graphing OBIEE DMS metrics in Graphite

Now that we’ve gone through the installation, let us see how to use the three components together, graphing out some OBIEE DMS data in Graphite.

Run obi-metrics-agent.py as you did in the test above, but this time specify the --output carbon and --carbon-server parameters:

export OPMN_BIN=$FMW_HOME/instances/instance1/bin/opmnctl
python $FMW_HOME/obi-metrics-agent/obi-metrics-agent.py --output carbon --carbon-server localhost

Now go to Graphite at http://<server>/ and under Metrics you should see all of the OBIEE DMS metrics in folders. Note that it may take a couple of minutes for the metrics to first appear in Graphite. If you don’t see them, or the complete list, wait a minute and then hit refresh.


(the DMS folder is metrics relating to DMS itself – the OBIEE DMS metrics are under OBI)

Find the metric for Presentation Services logged in sessions, Oracle_BI_PS_Sessions.Sessions_Logged_In and click on it to add it to the graph. By default the graph shows a large time period so click on the Select Recent Data icon


Set the time period to 5 minutes, and click the Auto-Refresh button at the bottom of the graph.

Now log in to OBIEE, and go back to the Graphite page – you should see that the metric has increased accordingly.


I’ll look at using Graphite in detail in a later blog post; this is just to demonstrate that the collection and rendering is working.

Different ways of using obi-metrics-agent

obi-metrics-agent has three modes of operation:

  1. Collect and parse metrics to output
    • Near-real-time rendering of OBI metrics in Graphite
    • Near-real-time load of data into Oracle (via CSV & external table)
  2. Collect metrics to disk only
    • For analysis at a later date
    • If graphite/carbon is not available to send data to
    • If outbound network bandwidth is constrained (or could be, by OBIEE)
    • For lowest host system overhead
  3. Parse existing metrics from disk
    • Parse previously collected data and output to Graphite or Oracle (via CSV & external table)

Syntax

Options:
  -h, --help            show this help message and exit
  -o OUTPUTFORMAT, --output=OUTPUTFORMAT
                        The output format(s) of data, comma separated. More
                        than one can be specified. Unparsed options: raw, xml
                        Parsed options: csv , carbon, sql
  -d DATA, --data-directory=DATA
                        The directory to which data files are written. Not
                        needed if sole output is carbon.
  -p, --parse-only      If specified, then all raw and xml files specified in
                        the data-directory will be processed, and output to
                        the specified format(s) Selecting this option will
                        disable collection of metrics.
  --fmw-instance=FMW_INSTANCE
                        Optional. The name of a particular FMW instance. This
                        will be prefixed to metric names.
  --carbon-server=CARBON_SERVER
                        The host or IP address of the Carbon server. Required
                        if output format 'carbon' specified.
  --carbon-port=CARBON_PORT
                        Alternative carbon port, if not 2003.
  -i INTERVAL, --interval=INTERVAL
                        The interval in seconds between metric samples.
  --opmnbin=OPMN_BIN    The complete path to opmnctl. Watch out for spaces.

For further details and syntax examples, see the USAGE.MD file in the github repository.
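When the carbon output is selected, the metrics ultimately travel over Carbon’s plaintext protocol: one metric path, value, and Unix timestamp per line, sent to the carbon port (2003 by default, hence the --carbon-port option). A minimal sketch of that protocol follows; the metric path shown is illustrative, as the exact names are whatever obi-metrics-agent emits:

```python
import socket
import time

def carbon_line(path, value, ts=None):
    """Format one metric in Carbon's plaintext protocol:
    '<metric path> <value> <unix timestamp>\n'."""
    if ts is None:
        ts = int(time.time())
    return "%s %s %d\n" % (path, value, ts)

def send_to_carbon(lines, server="localhost", port=2003):
    """Ship pre-formatted plaintext lines to carbon-cache's listener."""
    with socket.create_connection((server, port)) as s:
        s.sendall("".join(lines).encode("utf-8"))

# Hypothetical metric path, for illustration only:
print(carbon_line("obieesample.OBI.Oracle_BI_General.Total_sessions", 42, 1395268062))
```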

Introducing obi-metrics-agent – an Open-Source OBIEE Metrics Collector

Understanding what is going on inside OBIEE is important for being able to diagnose issues that arise, monitor its health, and dig deep into its behaviour under stress in a load test. OBIEE exposes a set of metrics through the Dynamic Monitoring Service (DMS), viewable through Enterprise Manager (EM) Fusion Middleware Control. EM is a great tool but doesn’t meet all requirements for accessing these metrics, primarily because it doesn’t retain any history.

obi-metrics-agent is a tool that extracts OBIEE’s performance metrics from the Dynamic Monitoring Service (DMS) functionality. Venkat wrote the original version, which I have rewritten in Python, adding additional functionality. It polls DMS on a specified interval and outputs the data to a variety of formats. It was written to aid OBIEE performance monitoring, either as part of testing or for longer-term use. Its features include:

  • Multiple output options, including CSV, XML, and Carbon (for rendering in Graphite etc)
  • Parse data as it is collected, or write to disk
  • Parse data collected previously


How does it work?

obi-metrics-agent is written in Python, and uses the documented OPMN functionality (opmnctl metric op=query) to expose DMS metrics. We experimented with the WLST route but found the overhead was too great. OPMN supplies the DMS data as a large XML message, which obi-metrics-agent can either store raw or parse out into the constituent metrics. It can write these to CSV or XML files, generate INSERT statements for loading them into a database, or send them to Graphite (see below).
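The parse-and-flatten step can be pictured like this. The XML below is purely illustrative (the real opmnctl payload is larger and structured differently), but the idea, walking the tree and emitting metric-path/value pairs, is the same:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DMS payload for illustration only.
SAMPLE = """
<dms>
  <noun name="Oracle_BI_General">
    <metric name="Total_sessions" value="42"/>
    <metric name="Active_sessions" value="7"/>
  </noun>
</dms>
"""

def flatten(xml_text):
    """Yield (metric_path, value) pairs from a DMS-style XML document."""
    root = ET.fromstring(xml_text)
    for noun in root.iter("noun"):
        for metric in noun.iter("metric"):
            path = "%s.%s" % (noun.get("name"), metric.get("name"))
            yield path, float(metric.get("value"))

print(dict(flatten(SAMPLE)))
```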

obi-metrics-agent can also parse previously-extracted raw data, so if you want to store data in Graphite but don’t have the server to hand at execution time, it can be loaded retrospectively.


On which platforms does it work?

  • Tested thoroughly on Oracle Linux 5 and 6
  • Works on Windows 2003, should work on later versions

Which OBI metrics are collected?

All of the ones that OPMN supports. Currently this means the BI Server and BI Presentation Services metrics, plus the opmn process metrics (such as the CPU time of each OBI component).

To explore the DMS metrics available, you can use Enterprise Manager, or the DMS Spy servlet that is installed by default with OBIEE and available at http://<obi-server>:7001/dms/ (assuming your AdminServer is on port 7001).


I have used DMS metrics primarily when investigating OBIEE’s behaviour under stress in performance testing, but some of the higher-level metrics are useful for day-to-day monitoring too. The DMS metrics let you peer into OBIEE’s workings and deduce or hypothesise the cause of behaviour you are seeing.

  • How many users does OBIEE see as logged in?
  • How many active requests are there from Presentation Services to BI Server (nqserver)?
  • How many active connections are there from each Connection Pool to the database? How many queued connections?
  • What’s the average (careful…) response time by database?
  • What’s the error rate for Presentation Services queries?
  • How does the memory profile of each OBIEE component behave during testing?
  • How are the BI Server’s internal thread pools coping with the load? Do they need resizing?
  • How many queries per second are being run on each database?
  • How is the graphing engine behaving? Is it queuing requests?
  • What’s the Presentation Services and BI Server cache hit rate?

Sounds great, where do I get it?

Rittman Mead have released obi-metrics-agent as open source. You can find it on GitHub: https://github.com/RittmanMead/obi-metrics-agent.

Simply clone the repository and run the python script. You need to install the lxml library first – full details are supplied in the repository.

$ export OPMN_BIN=$FMW_HOME/instances/instance1/bin/opmnctl
$ python ./obi-metrics-agent.py

# ===================================================================
# Developed by @rmoff / Rittman Mead (http://www.rittmanmead.com)
# Absolutely no warranty, use at your own risk
# Please include this notice in any copy or reuse of the script you make
# ===================================================================

---------------------------------------
Output format             : csv
raw/csv/xml/carbon/sql    : False/True/False/False/False
Data dir                  : ./data
FMW instance              : None
OPMN BIN                  : /u01/app/oracle/product/fmw/instances/instance1/bin/opmnctl
Sample interval (seconds) : 5
---------------------------------------

--Gather metrics--
        Time of sample: Wed, 26 Mar 2014 10:38:38 +0000 (1395830318)

        Get metrics for coreapplication_obips1
                 Processed :    469 data values @ Wed, 26 Mar 2014 10:38:38 +0000       Oracle BI Presentation Server
                        Appended CSV data to ./data/metrics.csv
        Get metrics for coreapplication_obis1
                 Processed :    230 data values @ Wed, 26 Mar 2014 10:38:38 +0000       Oracle BI Server
                        Appended CSV data to ./data/metrics.csv
        Get metrics for opmn
                 Processed :    91 data values @ Wed, 26 Mar 2014 10:38:38 +0000        opmn
                        Appended CSV data to ./data/metrics.csv

        Processed: 3    Valid: 3 (100.00%)      Invalid: 0 (0.00%)
[...]

See the next blog post for a step-by-step guide to getting it set up and running, using SampleApp v309R2 as an example server, including Graphite and collectl for visualising the data and collecting OS stats too.

Visualising the collected data – Graphite

Graphite is an open-source graphing tool that comes with a daemon called carbon, which receives incoming data and stores it in its own time-series database (called whisper). Graphite is a very popular tool, meaning there’s lots of support out there for it and additional tools written to complement it, some of which I’ll be exploring in later blog posts. It’s also very easy to get data into Graphite, and because it stores it as a time series you can then display OBIEE DMS data alongside anything else you may have, for example OS metrics from collectl, or JMeter performance test counters.
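Getting data into carbon really is easy: it accepts one "path value timestamp" line per datapoint over TCP (the plaintext protocol, listening by default on port 2003). A minimal sketch, with a hypothetical metric path:

```python
import socket

def send_to_carbon(datapoints, host="localhost", port=2003):
    """Send (path, value, unix_timestamp) datapoints to carbon
    using its plaintext line protocol: one 'path value timestamp' per line."""
    payload = "".join("%s %s %d\n" % (path, value, ts)
                      for path, value, ts in datapoints)
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(payload.encode("ascii"))
    finally:
        sock.close()
```

For example, `send_to_carbon([("obi.obis1.Active_Sessions", 12, 1395830318)])` would record one sample of a (hypothetical) session-count metric.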

obi-metrics-agent architecture

Whilst obi-metrics-agent can be used on its own, with data stored to CSV for subsequent parsing or accessed in Oracle as an external table, the focus of this and subsequent blog posts will primarily be on obi-metrics-agent writing data to Graphite and the benefits this brings when it comes to visualising it.

“Graphite?! Haven’t we got a visualisation tool already in OBIEE?”

You can graph this data through OBIEE, if you want. The OBI metric data can be loaded in by external table from the CSV files, or using the generated INSERT statements.
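As an illustration of the INSERT-statement route, generating statements from parsed metric rows might look something like this (the table and column names are hypothetical, not the tool’s actual output):

```python
def to_insert_statements(rows, table="OBI_METRICS"):
    """Render (component, metric, value, timestamp) rows as
    Oracle-style INSERT statements for bulk loading."""
    stmts = []
    for component, metric, value, ts in rows:
        stmts.append(
            "INSERT INTO %s (SAMPLE_TIME, COMPONENT, METRIC, VALUE) "
            "VALUES (TO_DATE('%s','YYYY-MM-DD HH24:MI:SS'), '%s', '%s', %s);"
            % (table, ts, component, metric, value))
    return stmts
```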

The benefit of using Graphite is twofold:

  1. Its primary purpose is graphing time-based metrics. Metrics in, time-based graphs out. You don’t need to build an RPD or model a time dimension. It also supports one-click rendering of wildcarded metric groups (for example, current connection count on all connection pools), as well as one-click transformations such as displaying deltas of a cumulative measure.
  2. It avoids a dependency on OBIEE. If we use OBIEE, we’re rendering the data on the very system that we’re monitoring, introducing the significant risk of simply measuring the impact of our own monitoring! Setting up a second OBIEE server just for rendering graphs opens up the question of what the best graphing tool is for this particular job, and Graphite is a strong option here.
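As a taste of that one-click power, a single Graphite render URL can combine a wildcard with a transformation function (the host and metric paths below are hypothetical; adjust to your own naming scheme):

```python
# Hypothetical Graphite host and metric path.
GRAPHITE = "http://graphite-server"
# The wildcard graphs every matching connection pool at once;
# nonNegativeDerivative() turns a cumulative counter into a per-sample rate.
target = "nonNegativeDerivative(obi.*.Connection_Pool.*.Current_Connections)"
render_url = "%s/render?target=%s&from=-1h&format=png" % (GRAPHITE, target)
```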

Graphite just works very well in this particular scenario, which is why I use it… YMMV.

Contributing

There are several obvious features that could be added so do please feel free to fork the repository and submit your own pull requests! Ideas include:

  • support for scaled-out clusters
  • init.d / run as a daemon
  • selective metric collection

Known Issues

Prior to OBIEE 11.1.1.7, there was a bug in the opmn process that sometimes caused it to return corrupt XML, in some cases for as much as 15% of samples. On corrupt samples, the datapoint is simply dropped.

The FMW patch from Oracle for this issue is 13055259.

License

===================================================================
Developed by @rmoff / Rittman Mead (http://www.rittmanmead.com)
Absolutely no warranty, use at your own risk
Please include this notice in any copy or reuse of the script you make
===================================================================

What next?

There are several more posts to come on this subject, including:

  1. Installing obi-metrics-agent, graphite, and collectl.
  2. Exploring the graphite interface, building simple dashboards
  3. Alternative front-ends to graphite