Tag Archives: OBIEE

Concurrent RPD Development in OBIEE

OBIEE is a well established product, having been around in various incarnations for well over a decade. The latest version, OBIEE 11g, was released 3.5 years ago, and there are mutterings of OBIEE 12c already. In all of this time however, one thing it has never quite nailed is the ability for multiple developers to work with the core metadata model – the repository, known as the RPD – concurrently and in isolation. Without this, development is doomed to be serialised – with the associated bottlenecks and inability to scale in line with the number of developers available.

My former colleague Stewart Bryson wrote a series of posts back in 2013 in which he outlines the criteria for a successful OBIEE SDLC (Software Development LifeCycle) method. The key points were:

  • There should be a source control tool (a.k.a. version control system, VCS) that enables us to store all artefacts of the BI environment, including RPD, Presentation Catalog, etc etc. From here we can tag snapshots of the environment at a given point as being ready for release, and as markers for rollback if we take a wrong turn during development.
  • Developers should be able to do concurrent development in isolation.
    • To do this, source control is mandatory in order to enable branch-based development, also known as feature-driven development, which is a central tenet of an Agile method.

Oracle’s only answer to the SDLC question for OBIEE has always been MUDE, the Multi-User Development Environment. But MUDE falls short in several respects:

  • It only manages the RPD – there is no handling of the Presentation Catalog etc
  • It does not natively integrate with any source control
  • It puts the onus of conflict resolution on the developer rather than the “source master” who is better placed to decide the outcome.

Whilst it wasn’t great, it wasn’t bad, and MUDE was all we had. Either that, or manual integration into source control (1, 2) tools, which was clunky to say the least. The RPD remained a single object that could not be merged or managed except through the Administration Tool itself, so any kind of automatic merge strategies that the rest of the software world were adopting with source control tools were inapplicable to OBIEE. The merge would always require the manual launching of the Administration Tool, figuring out the merge candidates, before slowly dying in despair at having to repeat such a tortuous and error-prone process on a regular basis…

Then back in early 2012 Oracle introduced a new storage format for the RPD. Instead of storing it as a single binary file, closed to prying eyes, it was instead burst into a set of individual files in MDS XML format.

For example, one Logical Table was now one XML file on disk, made up of entities such as LogicalColumn, ExprText, LogicalKey and so on:
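Paraphrased and heavily trimmed, one of these files looks something like the sketch below. This is only an illustration of the shape of the format, using the entity names just mentioned – it is not the literal MDS XML schema, and the names and mdsid values are made up:

<LogicalTable mdsid="m1a2b3c4d…" name="Fact Sales">
    <LogicalColumn mdsid="m5e6f7a8b…" name="Revenue">
        <ExprText>…</ExprText>
        <ExprTextDesc>…</ExprTextDesc>
    </LogicalColumn>
    <LogicalKey mdsid="m9c0d1e2f…" name="Fact Sales Key"/>
</LogicalTable>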

It even came with a set of configuration screens for integration with source control. It looked like the answer to all our SDLC prayers – now we OBIEE developers could truly join in with the big boys at their game. The reasoning went something like:

  1. An RPD stored in MDS XML is no longer binary
  2. git can merge code that is plain text from multiple branches
  3. Let’s merge MDS XML with git!

But how viable is MDS XML as a storage format for the RPD used in conjunction with a source control tool such as git? As we will see, it comes down to the Good, the Bad, and the Ugly…

The Good

As described here, concurrent and unrelated developments on an RPD in MDS XML format can be merged successfully by a source control tool such as git. Each logical object is a file, so git just munges (that’s the technical term) the files modified in each branch together to come up with a resulting MDS XML structure with the changes from each development in it.

The Bad

This is where the wheels start to come off. See, our automagic merging fairy dust is based on the idea that individually changed files can be spliced together, and that since MDS XML is not binary, we can trust a source control tool such as git to work well with changes within the files themselves too.

Unfortunately this is a fallacy, and by using MDS XML we expose ourselves to greater complications than we would if we just stuck to a simple binary RPD merged through the OBIEE toolset. The problem is that whilst MDS XML is not binary, it is not unstructured either. It is structured, and it has application logic within it (the mdsid, of which more below).

Within the MDS XML structure, individual first-class objects such as Logical Tables are individual files, with child objects such as Logical Columns nested within them in the XML.

Source control tools such as git cannot parse it, and therefore do not understand what is a real conflict versus an unrelated change within the same object. If you stop and think for a moment (or longer) quite what would be involved in accurately parsing XML (let alone MDS XML), you’ll realise that you basically need to reverse-engineer the Administration Tool to come up with an accurate engine.

We kind of get away with merging when the file differences are within an element in the XML itself. For example, the expression for a logical column is changed in two branches, causing clashing values within ExprText and ExprTextDesc. When this happens git will throw a conflict and we can easily resolve it, because the difference is within the element(s) themselves:
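Schematically, the conflict git raises looks something like this – the expression values and branch name are invented for the illustration, but the conflict markers are standard git:

<<<<<<< HEAD
        <ExprText>"Fact Sales"."Revenue" * 1.2</ExprText>
        <ExprTextDesc>"Fact Sales"."Revenue" * 1.2</ExprTextDesc>
=======
        <ExprText>"Fact Sales"."Revenue" * 1.25</ExprText>
        <ExprTextDesc>"Fact Sales"."Revenue" * 1.25</ExprTextDesc>
>>>>>>> feature/RM-2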

Easy enough, right?

But take a similarly “simple” merge scenario, where two independent developers add or modify different columns within the same Logical Table, and we see what a problem there is when we try to merge it back together relying on source control alone.

Obvious to a human, and obvious to the Administration Tool is that these two new columns are unrelated and can be merged into a single Logical Table without problem. In a paraphrased version of MDS XML the two versions of the file look something like this, and the merge resolution is obvious:
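(The table and column names below are invented for the illustration, and the real files carry mdsid attributes and plenty more metadata.)

<!-- Branch RM-1’s version of the Logical Table file, paraphrased -->
<LogicalTable name="Fact Sales">
    <LogicalColumn name="Revenue"/>
    <LogicalColumn name="Revenue Last Year"/>   <!-- new in RM-1 -->
</LogicalTable>

<!-- Branch RM-2’s version of the same file -->
<LogicalTable name="Fact Sales">
    <LogicalColumn name="Revenue"/>
    <LogicalColumn name="Margin"/>              <!-- new in RM-2 -->
</LogicalTable>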

But a source control tool such as git looks at the MDS XML as a plaintext file, not understanding the concept of an XML tree and sibling nodes, and throws its toys out of the pram with a big scary merge conflict:
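Schematically, and using the same invented names as above, the conflict looks like this:

<LogicalTable name="Fact Sales">
    <LogicalColumn name="Revenue"/>
<<<<<<< HEAD
    <LogicalColumn name="Revenue Last Year"/>
=======
    <LogicalColumn name="Margin"/>
>>>>>>> feature/RM-2
</LogicalTable>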

Now the developer has to roll up his or her sleeves and try to reconcile two XML files – with no GUI to support or validate the change made except loading it back into the Administration Tool each time.

So if we want to use MDS XML as the basis for merging, we need to restrict our concurrent developments to completely independent objects. But, that kind of hampers the ideal of more rapid delivery through an Agile method if we’re imposing rules and restrictions like this.

The Ugly

This is where it gets a bit grim. Above we saw that MDS XML can cause unnecessary (and painful) merge conflicts. But what about if two developers inadvertently create the same object concurrently? The behaviour we’d expect to see is a single resulting object. But what we actually get is both versions of the object, and a dodgy RPD. Uh Oh.

Here are the two concurrently developed RPDs, produced in separate branches isolated from each other:

And here’s what happens when you leave it to git to merge the MDS XML:

The duplicated objects cannot be edited in the Administration Tool in the resulting merged RPD – any attempt to save them throws an error.

Why does it do this? Because the MDS XML files are named after a globally unique identifier known as the mdsid, and not their corresponding RPD qualified name. And because the mdsid is unique across developments, two concurrent creations of the same object end up with different mdsid values, and thus different filenames.

Two files from separate branches with different names are going to be seen by source control as being unrelated, and so both are brought through in the resulting merge.

As with the unnecessary merge conflict above, we could define a process around same-object creation, or add in a manual equalise step. The real issue here is that the duplicates can arise without us being aware, because there is no conflict seen by the source control tool. It’s not like merging an un-equalised repository in the Administration Tool, where we’d get #1 suffixes on the duplicate objects so that at least (a) we spot the duplication and (b) the repository remains valid and the duplicate objects remain available to edit.

MDS XML Repository opening times

Whether a development strategy based on MDS XML is for you or not, another issue to be aware of is that for anything beyond a medium-sized RPD, the opening times of an MDS XML repository are considerable. As in, around a minute to open the binary RPD, versus more than 20 minutes for the same repository in MDS XML. And to be fair, after 20 minutes I gave up, on the basis that no sane developer would write off that amount of their day simply waiting for the repository to open before they can even do any work on it. This rules out working in MDS XML format with any big repositories such as that from BI Apps.

So is MDS XML viable as a Repository storage format?

MDS XML does have two redeeming features:

  1. It reduces the size of your source control repository, because on each commit you will be storing just a delta of the overall repository change, rather than the whole binary RPD each time.
  2. For tracking granular development progress and changes you can identify what was modified through the source control tool alone – because the new & modified objects will be shown as changes.

But here also lies a hint of the trouble in store. The mdsid unique identifier is used not only in filenames – causing the object duplication and strange RPD behaviour described above – but also within the MDS XML itself, referencing other files and objects. This means that as an RPD developer, or RPD source control overseer, you need to be confident that each time you perform a merge of branches you are correctly putting Humpty Dumpty back together in a valid manner.

If you want to use MDS XML with source control you need to view it as part of a larger solution, involving clear process and almost certainly a hybrid approach with the binary RPD still playing a part — and whatever you do, the Administration Tool within short reach. You need to be aware of the issues detailed above, decide on a process that will avoid them, and make sure you have dedicated resource that understands how it all fits together.

If not MDS XML, then what?…

Source control (e.g. git) is mandatory for any kind of SDLC, concurrent development included. But instead of storing the RPD in MDS XML, we store it as a binary RPD.

Wait wait wait, don’t go yet! … it gets better

By following the git-flow method, which dictates how feature-driven development is done in source control (git), we can write a simple script that determines, when merging branches, what the candidates are for an OBIEE three-way RPD merge.

In this simple example we have two concurrent developments – coded “RM-1” and “RM-2”. First off, we create two branches which take the code from our “mainline”. Development is done on the two separate features in each branch independently, and committed frequently per good source control practice. The circles represent commit points:

The first feature to be completed is “RM-1”, so it is merged back into “develop”, the mainline. Because nothing has changed in develop since RM-1 was created from it, the binary RPD file and all other artefacts can simply ‘overwrite’ what is there in develop:

Now at this point we could take “develop” and start its deployment into System Test etc, but the second feature we were working on, RM-2, is also tested and ready to go. Here comes the fancy bit! Git recognises that both RM-1 and RM-2 have made changes to the binary RPD, and as a binary file the RPD is something git cannot merge itself. But instead of just collapsing in a heap and leaving it for the user to figure out, our script makes use of git and the git-flow method we have followed to work out the merge candidates for the OBIEE Administration Tool:

Even better, it invokes the Administration Tool (which can be run from the command line; alternatively the command-line tools comparerpd/patchrpd can be used) to automatically perform the merge. If the merge is successful, it goes ahead and commits the merge into the “develop” branch in git. The developer has not had to do any kind of interaction to complete the merge and commit.
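As a rough illustration, here is a sketch of how such a script can establish the three merge candidates from git. The repository path, branch names and output filenames are assumptions for the example, and the merge itself is left to the OBIEE toolset (Administration Tool, or comparerpd/patchrpd – check their -help output for the exact arguments in your version):

#!/bin/bash
# Sketch only: identify the three candidates for an OBIEE three-way RPD merge
# when merging a git-flow feature branch back into the mainline.
RPD_PATH="repository/sample.rpd"    # hypothetical location of the binary RPD in the repo
FEATURE="feature/RM-2"              # the branch being merged in
MAINLINE="develop"                  # the git-flow mainline

# The "original" RPD for the three-way merge is the common ancestor of the two branches
BASE=$(git merge-base "$MAINLINE" "$FEATURE")

# Pull each version of the binary RPD out of git
git show "$BASE:$RPD_PATH"     > original.rpd   # parent of both branches
git show "$MAINLINE:$RPD_PATH" > current.rpd    # mainline, e.g. with RM-1 already merged
git show "$FEATURE:$RPD_PATH"  > modified.rpd   # the feature branch being merged

echo "Three-way merge candidates:"
echo "  original : original.rpd (commit $BASE)"
echo "  current  : current.rpd  ($MAINLINE)"
echo "  modified : modified.rpd ($FEATURE)"

# From here the merge itself is performed with the Administration Tool or with
# comparerpd/patchrpd; if it succeeds, copy the merged RPD back over $RPD_PATH and
# run 'git add' and 'git commit' to complete the merge into the develop branch.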

If the merge is not a slam-dunk, then we can launch the Administration Tool and graphically figure out the correct resolution – but using the already-identified merge candidates in order to shorten the process.

This is not perfect, but there is no perfect solution. It is the closest thing that there is to perfection though, because it will handle merges of:

  • Unique objects
  • Same objects, different modifications (cf. the two new columns on the same table example above)
  • Duplicate objects – by equalisation

Conclusion

There is no single right answer here, nor are any of the options overly appealing.

If you want to work with OBIEE in an Agile method, using feature-driven development, you will have to adopt and learn specific processes for working with OBIEE. The decision you have to make is on how you store the RPD (binary or multiple MDS XML files, or maybe both) and how you handle merging it (git vs Administration Tool).

My personal view is that taking advantage of git-flow logic, combined with the OBIEE toolset to perform three-way merges, is sufficiently practical to warrant leaving the RPD in binary format. The MDS XML format is a lovely idea but there are too few safeguards against dodgy/corrupt RPD (and too many unnecessary merge conflicts) for me to see it as a viable option.

Whatever option you go for, make sure you are using regression testing to test the RPD after you merge changes together, and ideally automate the testing too. Here at Rittman Mead we’ve written our own suite of tools that do just this – get in touch to find out more.

Connecting OBIEE11g on Windows to a Kerberos-Secured CDH5 Hadoop Cluster using Cloudera HiveServer2 ODBC Drivers

In a few previous posts and magazine articles I’ve covered connecting OBIEE11g to a Hadoop cluster, using OBIEE 11.1.1.7 and Cloudera CDH4 and CDH5 as the examples. Things get a bit complicated in that the DataDirect Apache Hive ODBC drivers that Oracle ship are only for HiveServer1 and not the HiveServer2 version that CDH4 and CDH5 use, and the Linux version of OBIEE 11.1.1.7 won’t work with the Cloudera Hive ODBC drivers that you have to use to connect to Hive on CDH4/5. You can however connect OBIEE 11.1.1.7 on Windows to HiveServer2 on CDH4 and CDH5 if you use the Cloudera Hive ODBC drivers for Windows, and although this isn’t supported by Oracle, in my experience it does work, albeit with the general OBIEE11g Hive restrictions and caveats detailed in the Metadata Repository Builder’s Guide, and the fact that in practice Hive is too slow to use for ad-hoc reporting.

However … most enterprise-type customers who run Hadoop on their internal networks have their clusters configured as “secured”, rather than the unsecured cluster examples that you see in most OBIEE connection examples. By default, Hadoop clusters are very trusting of incoming network and client connections and assume that whoever’s connecting is who they say they are, and HDFS and the other cluster components don’t perform any authentication themselves of incoming client connections. In addition, by default all network connections between Hadoop cluster components run in clear text and without any mutual authentication, which is great for a research cluster or PoC but not really appropriate for enterprise customers looking to use Hadoop to store and analyse customer data.

Instead, these customers configure their clusters to run in secured mode, using Kerberos authentication to secure incoming connections, encrypt network traffic and secure connections between the various services in the cluster. How this affects OBIEE though is that your Hive connections through to the cluster also need to use Kerberos authentication, and you (and the OBIEE BI Server) need to have a valid Kerberos ticket when connecting through the Hive ODBC driver. So how do we set this up, and how do we get hold of a secure Hadoop cluster using Kerberos authentication to test against? A few of our customers have asked this question recently, so I thought it’d be worth jotting down a few notes on how to set this up.

At a high-level, if you want to connect OBIEE 11.1.1.7 to a secure, Kerberos-authenticated CDH cluster, there are three main steps you need to carry out:

  1. Get hold of a Kerberos-secured CDH cluster, and establish the connection details you’ll need to use to connect to it
  2. Make sure the Kerberos server has the correct entries/principals/user details for the user you’re going to securely-connect as
  3. Configure the host environment for OBIEE to work with Kerberos authentication, and then create the connection from OBIEE to the CDH cluster using the correct Kerberos credentials for your user

In my case, I’ve got a Cloudera CDH5.3.0 cluster running in the office that’s been configured to use MIT Kerberos 5 for authentication, set up using an OEL6 VM as the KDC (Key Distribution Centre) and the cluster configured using the new Kerberos setup wizard that was introduced with CDH5.1. Using this wizard automates the creation of the various Kerberos service account and host principals in the Kerberos database, and configures each of the cluster components – YARN, Hive, HDFS and so on – to authenticate with each other using Kerberos authentication and use encrypted network connections for inter-service and inter-node communication.


Along with the secured Hadoop cluster, key bits of information and configuration data you’ll need for the OBIEE side are:

  • The krb5.conf file from the Kerberos KDC, which contains details of the Kerberos realm, URL for the KDC server, and other key connection details
  • The name of the Kerberos principal used for the Hive service name on the Hadoop cluster – typically this is “hive”; if you want to connect to Hive first using a JDBC tool such as beeline, you’ll also need the full principal name for this service, in my case “hive/bda3node2.rittmandev.com@RITTMANDEV.COM”
  • The hostname (FQDN) of the node in the CDH cluster that contains the HiveServer2 RPC interface that OBIEE connects to, to run HiveQL queries
  • The port that HiveServer2 is running on – typically this is “10000” – and the Hive database name (for example, “default”)
  • The name of the Kerberos realm you’ll be connecting to – for example, MYCOMPANY.COM or in my case, RITTMANDEV.COM (usually in capitals)

In my case, the krb5.conf file that is used to configure Kerberos connections to my KDC looks like this – in your company it might be a bit more complex, but this example defines a simple MIT Kerberos 5 domain:

[logging]
    default = FILE:/var/log/krb5libs.log
    kdc = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmind.log
[libdefaults]
    default_realm = RITTMANDEV.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true
[realms]
    RITTMANDEV.COM = {
        kdc = auth.rittmandev.com
        admin_server = auth.rittmandev.com
    }
[domain_realm]
    .rittmandev.com = RITTMANDEV.COM
    rittmandev.com = RITTMANDEV.COM

In my setup, the CDH Hadoop cluster has been configured to use Kerberos authentication for all communications between cluster components and any connections from the outside that use those components; the cluster itself though can still be accessed via unsecured (non-Kerberos authenticated) SSH, though of course this aspect could be secured too. To test out the Hive connectivity before we get into the OBIEE details you can use the beeline CLI that ships with CDH5, and to do this you’ll need to be able to SSH into one of the cluster nodes (if you’ve not got beeline installed on your own workstation) and you’ll need an account (principal) created for you in the Kerberos database to correspond to the Linux user and HDFS/Hive user that has access to the Hive tables you’re interested in. To create such a Kerberos principal for my setup, I used the kadmin.local command on the KDC VM to create a user that matched my Linux/HDFS username and gave it a password:

kadmin.local:  addprinc mrittman
WARNING: no policy specified for mrittman@RITTMANDEV.COM; defaulting to no policy
Enter password for principal "mrittman@RITTMANDEV.COM":
Re-enter password for principal "mrittman@RITTMANDEV.COM":
Principal "mrittman@RITTMANDEV.COM" created.

SSH’ing into one of the secure CDH cluster nodes, I first have to authenticate using the kinit command which, when successful, creates a Kerberos ticket that gets cached for a set amount of time and which beeline can thereafter use as part of its own authentication process:

officeimac:.ssh markrittman$ ssh mrittman@bda3node4
mrittman@bda3node4's password: 
[mrittman@bda3node4 ~]$ kinit -p mrittman
Password for mrittman@RITTMANDEV.COM: 
[mrittman@bda3node4 ~]$

Now I can use beeline, and pass the Hive service principal name in the connection details along with the usual host, port and database name. When beeline prompts for my username and password, I use the Kerberos principal name that matches the Linux/HDFS one, and enter that principal’s password:

[mrittman@bda3node4 ~]$ beeline
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
beeline> !connect jdbc:hive2://bda3node2:10000/default;principal=hive/bda3node2.rittmandev.com@RITTMANDEV.COM
scan complete in 2ms
Connecting to jdbc:hive2://bda3node2:10000/default;principal=hive/bda3node2.rittmandev.com@RITTMANDEV.COM
Enter username for jdbc:hive2://bda3node2:10000/default;principal=hive/bda3node2.rittmandev.com@RITTMANDEV.COM: mrittman
Enter password for jdbc:hive2://bda3node2:10000/default;principal=hive/bda3node2.rittmandev.com@RITTMANDEV.COM: ********
Connected to: Apache Hive (version 0.13.1-cdh5.3.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://bda3node2:10000/default> show tables;
+------------------+--+
|     tab_name     |
+------------------+--+
| posts            |
| things_mrittman  |
+------------------+--+
2 rows selected (0.162 seconds)
0: jdbc:hive2://bda3node2:10000/default> select * from things_mrittman;
+---------------------------+-----------------------------+--+
| things_mrittman.thing_id  | things_mrittman.thing_name  |
+---------------------------+-----------------------------+--+
| 1                         | Car                         |
| 2                         | Dog                         |
| 3                         | Hat                         |
+---------------------------+-----------------------------+--+
3 rows selected (0.251 seconds)

So at this point I’ve covered off the first two steps: established the connection details for the secure CDH cluster, and got hold of and confirmed the Kerberos principal details that I’ll need to connect to Hive – now it’s time to set up the OBIEE element.

In this particular example we’re using Windows to host OBIEE 11.1.1.7, as this is the only platform on which we can get the HiveServer2 ODBC drivers to work – in this case the Cloudera Hive ODBC drivers available on their website (free download, but registration may be needed). Before we can get this ODBC driver to work though, we need to install the Kerberos client software on the Windows machine so that we can generate the Kerberos ticket that the ODBC driver will need to pass over as part of the authentication process.

To configure the Windows environment for Kerberos authentication, in my case I used the Kerberos for Windows 4.x client software downloadable for free from the MIT website and copied across the krb5.conf file from the KDC server, renaming it to krb5.ini and storing it in the default location of c:\ProgramData\MIT\Kerberos5.


You also need to define a system environment variable, KRB5CCNAME, to point to a directory where the Kerberos tickets can be cached; in my case I used c:\temp\krb5cache. Once this is done, reboot the Windows environment and you should then be prompted after login to authenticate yourself to the Kerberos KDC.
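If you prefer the command line to the System Properties dialog, something like the following does the same thing (adjust the path to your own cache location, and open a new command prompt afterwards so the variable is picked up):

setx KRB5CCNAME "c:\temp\krb5cache"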


The ticket then stays valid for a set number of days/hours, or you can configure OBIEE itself to authenticate and cache its own ticket – for now though, we’ll create the ticket manually and connect to the secured cluster using these cached ticket details.

After installing the Cloudera Hive ODBC drivers, I create the connection using Kerberos as the Authentication Mechanism, and enter the realm name, the HiveServer2 host and the Hive Kerberos principal name.

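Expressed as odbc.ini-style keys, those settings look roughly like the entry below. The key names are the ones I recall from the Cloudera (Simba-based) Hive ODBC driver, so treat them as an assumption and check the driver documentation; on Windows you simply fill in the equivalent fields in the driver’s DSN setup dialog, with the Authentication Mechanism set to Kerberos:

[HiveCDH5]
Driver=Cloudera ODBC Driver for Apache Hive
Host=bda3node2.rittmandev.com
Port=10000
Schema=default
HiveServerType=2
AuthMech=1
KrbRealm=RITTMANDEV.COM
KrbHostFQDN=bda3node2.rittmandev.com
KrbServiceName=hive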

In my case both the BI Administration tool and the OBIEE BI Server were on the same Windows VM, and therefore shared the same ODBC driver install, so I then moved over to the BI Administration tool to import the Hive table metadata details into the RPD and create the physical, logical and presentation layer RPD elements. Depending on how your CDH cluster is set up you might be able to test the connection now by using the View Data… menu item in BI Administration, but in my case I had to do two more things on the CDH cluster itself before I could get Hive queries under this Kerberos principal to run properly.


First, as secured CDH Hadoop clusters usually configure HiveServer2 to use “user impersonation” (connecting to Hive as the user you authenticate as, rather than the user that HiveServer2 itself authenticates to the Hive service as), YARN and MapReduce jobs run under your account and not the usual “hive” account that unsecured Hive connections use. Where this causes a problem on CDH installations on RHEL-derived platforms (RHEL, OEL, CentOS etc) is that YARN normally blocks jobs running on behalf of users with a UID below 1000, as on other Linux distributions this typically signifies a system account; RHEL-derived platforms however start user UIDs at 500, so YARN blocks those users from running jobs. To fix this, you need to go into Cloudera Manager and edit the YARN configuration settings to lower this UID threshold to something under 500, for example 250.

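For reference, the setting being changed corresponds, as far as I know, to the min.user.id property used by YARN’s Linux container executor – the exact property name and Cloudera Manager label are assumptions here, so check your CM version:

# container-executor.cfg, normally managed by Cloudera Manager
# ("Minimum User ID for Job Submission" under the YARN NodeManager settings)
min.user.id=250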

I also needed to alter the group ownership of the temporary directory each node used for the YARN NodeManager’s user files so that YARN could write its temporary files correctly; on each node in the cluster I ran the following Linux commands as root to clear down any files YARN had created before, and recreate the directories with the correct permissions (Hive jobs would fail until I did this, with OBIEE just reporting an ODBC error):

rm -rf /yarn
mkdir -p /yarn/nm
chown -R yarn /yarn
chgrp -R yarn /yarn

Once this is done, queries from the BI Administration tool and from the OBIEE BI Server should connect to the Kerberos-secured CDH cluster successfully, using the Kerberos ticket you obtained via the MIT Kerberos Ticket Manager on login, and passing across the user details under which the YARN and Hive jobs should run.


If you’re interested, you can go back to the MIT Kerberos Ticket Manager and see the other Kerberos tickets that were requested and then cached by the Cloudera Hive ODBC driver when it mutually authenticated with the HiveServer2 RPC interface – Kerberos authenticates both ways to ensure that who you’re connecting to is actually who they say they are, in this case checking that the HiveServer2 connection you’re connecting to isn’t being spoofed by someone else.


So that’s the process for connecting OBIEE to a Kerberos-secured CDH Hadoop cluster in a nutshell; in the New Year I’ll put something together on using Apache Sentry to provide role-based access control for Hive and Impala tables and as of CDH 5.3, HDFS directories, and I’ll also take a look at the new extended ACLs feature in CDH5.2 that goes beyond HDFS’s standard POSIX security model.

TIMESTAMPS and Presentation Variables

TIMESTAMPS and Presentation Variables can be some of the most useful tools a report creator can use to build robust, repeatable reports while maximizing user flexibility. I intend to transform you into an expert with these functions, and by the end of this page you will certainly be able to impress your peers and managers – you may even impress Angus MacGyver. In this example we will create a report that displays a year over year analysis for any rolling number of periods, by week or month, from any date in time, all determined by the user. This entire document will only use values from a date and revenue field.

[Screenshot: Final Month]

The TIMESTAMP is an invaluable function that allows a user to define report limits based on a moving target. If the goal of your report is to display Month-to-Date, Year-to-Date, a rolling month or truly any non-static period in time, the TIMESTAMP function will allow you to get there. Often users want to know what a report looked like at some previous point in time; to provide that level of flexibility, TIMESTAMPS can be used in conjunction with Presentation Variables.

To create robust TIMESTAMP functions you will first need to understand how the TIMESTAMP works. Take the following example:

[Screenshot: Filter Day -7]

Here we are saying we want to include all dates greater than or equal to 7 days ago, counting back from the current date.
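In logical SQL, the filter shown above is roughly the following (“Time”.”Date” is the date column used in the examples later in this post; substitute your own):

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, -7, CURRENT_DATE)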

  • The first argument, SQL_TSI_DAY, defines the TimeStamp Interval (TSI). This means that we will be working with days.
  • The second argument determines how many of that interval we will be moving, in this case -7 days.
  • The third argument defines the starting point in time, in this example, the current date.

So in the end we have created a functional filter making Date >= 1 week ago, using a TIMESTAMP that subtracts 7 days from today.

[Screenshot: Results -7 Days]

Note: it is always a good practice to include a second filter giving an upper limit like “Time”.”Date” < CURRENT_DATE. Depending on the data that you are working with you might bring in items you don’t want or put unnecessary strain on the system.

We will now start to build this basic filter into something much more robust and flexible.

To start, recall that we subtracted 7 days in the filter above; now let’s imagine that the goal of the filter is to always include dates >= the first of the month. In this scenario, we can use the DAYOFMONTH() function. This function returns the calendar day of any date. This is useful because it gives us a way to get to the first of the month from any date: simply subtract the DAYOFMONTH() of that date from the date itself and add 1.

Our new filter would look like this:

[Screenshot: DayofMonth]
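In logical SQL this is roughly:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(CURRENT_DATE)) + 1, CURRENT_DATE)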

For example if today is December 18th, DAYOFMONTH(CURRENT_DATE) would equal 18. Thus, we would subtract 18 days from CURRENT_DATE, which is December 18th, and add 1, giving us December 1st.

[Screenshot: MTD Dates]

(For a list of other similar functions, like DAYOFYEAR, WEEKOFYEAR etc., see the Calendar Date/Time Functions section below.)

To make this even better, instead of using CURRENT_DATE you could use a prompted value with the use of a Presentation Variable (for more on Presentation Variables, see the Presentation Variables section below). If we call this presentation variable pDate, for prompted date, our filter now looks like this:

[Screenshot: pDate]

A best practice is to use default values with your presentation variables so you can run the queries you are working on from within your analysis. To add a default value all you do is add the value within braces at the end of your variable. We will use CURRENT_DATE as our default: @{pDate}{CURRENT_DATE}. We will refer to this filter later as Filter 1.

{Filter 1}:

[Screenshot: pDateCurrentDate]
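Reconstructed, Filter 1 is roughly:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE})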

As you can see, the filter is starting to take shape. Now let’s say we are always going to be looking at a date range of the most recent completed 6 months. All we need to do is create a nested TIMESTAMP function. To do this, we will “wrap” our current TIMESTAMP with another that subtracts 6 months. It will look like this:

[Screenshot: Month -6]
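Reconstructed, the nested version is roughly:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -6,
                  TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}))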

Now we have a filter that is greater than or equal to the first day of the month of any given date (default of today) 6 months ago.

[Screenshot: Month -6 Result]

To take this one step further, you can even allow the users to determine the number of months to include in this analysis by making the value of 6 a presentation variable; we will call it “n”, with a default of 6: @{n}{6}. We will refer to the following filter as Filter 2:

{Filter 2}:

[Screenshot: n]
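Reconstructed, Filter 2 is the same expression with the hard-coded 6 swapped for the n variable and its default:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -(@{n}{6}),
                  TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}))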

For more on how to create a prompt with a range of values by altering a current column, like we want to do to allow users to select a value for n, see the Dummy Column Prompt section below.

Our TIMESTAMP function is now fairly robust and will give us any date greater than or equal to the first day of the month from n months ago from any given date. Now we will see what we just created in action by creating date ranges to allow for a Year over Year analysis for any number of months.

Consider the following filter set:

[Screenshot: Robust1]

This appears to be pretty intimidating but if we break it into parts we can start to understand its purpose.

Notice we are using the exact same filters from before (Filter 1 and Filter 2).  What we have done here is filtered on two time periods, separated by the OR statement.

The first date range defines the period as being the most recent complete n months from any given prompted date value, using a presentation variable with a default of today, which we created above.

The second time period, after the OR statement, is the exact same as the first only it has been wrapped in another TIMESTAMP function subtracting 1 year, giving you the exact same time frame for the year prior.
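Paraphrased, the filter set is along these lines – the “<” upper bounds reuse the same first-of-month expression as Filter 1 to close each period off, so that only complete months are included; the exact layout in the screenshot may differ:

(     "Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -(@{n}{6}), TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}))
  AND "Time"."Date" <  TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}) )
OR
(     "Time"."Date" >= TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_MONTH, -(@{n}{6}), TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE})))
  AND "Time"."Date" <  TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE})) )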

[Screenshot: YoY Result]

This allows us to create a report that can run a year over year analysis for a rolling n month time frame determined by the user.

A note on nested TIMESTAMPS:

You will always want to create nested TIMESTAMPS with the smallest interval first. Due to syntax, this will always be the furthest to the right. Then you will wrap intervals as necessary. In this case our smallest increment is day, wrapped by month, wrapped by year.

Now we will start with some more advanced tricks:

  • Instead of using CURRENT_DATE as your default value, use yesterday, since most data are only as current as yesterday. If you use real-time or near-real-time reporting, using CURRENT_DATE may be how you want to proceed. Using yesterday is especially valuable when pulling reports on the first day of the month or year: you generally want the entire previous time period rather than the empty beginning of a new one. So, to implement this, wherever you have @{pDate}{CURRENT_DATE} replace it with @{pDate}{TIMESTAMPADD(SQL_TSI_DAY,-1,CURRENT_DATE)}
  • Presentation Variables can also be used to determine whether to display year over year values by month or by week, by inserting a variable into your SQL_TSI_MONTH and DAYOFMONTH statements. Change MONTH to a presentation variable – SQL_TSI_@{INT}{MONTH} and DAYOF@{INT}{MONTH} – where INT is the name of our variable. This will require you to create a dummy column in your prompt to allow users to select either MONTH or WEEK. You can try something like this: CASE MOD(DAY(“Time”.”Date”),2) WHEN 0 THEN ‘WEEK’ WHEN 1 THEN ‘MONTH’ END

[Screenshot: INT]

[Screenshot: MOD]

[Screenshot: DropDown]

In order for our interaction between Month and Week to run smoothly we have to make one more consideration. If we take the date December 1st, 2014 and subtract one year we get December 1st, 2013; however, if we take the first day of this week, Sunday December 14, 2014, and subtract one year we get Saturday December 14, 2013. In our analysis this will cause an extra partial week to show up for prior years. To get around this we will add a case statement: if ‘@{INT}{MONTH}’ = ‘WEEK’ THEN subtract 52 weeks from the first of the week, ELSE subtract 1 year from the first of the month.
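A sketch of that prior-period lower bound, with the week logic driven off DAYOFWEEK in the same way the month logic is driven off DAYOFMONTH (this illustrates the shape of the expression, not necessarily the exact filter in the screenshot below):

CASE WHEN '@{INT}{MONTH}' = 'WEEK'
     THEN TIMESTAMPADD(SQL_TSI_WEEK, -52,
            TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFWEEK(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}))
     ELSE TIMESTAMPADD(SQL_TSI_YEAR, -1,
            TIMESTAMPADD(SQL_TSI_DAY, -(DAYOFMONTH(@{pDate}{CURRENT_DATE})) + 1, @{pDate}{CURRENT_DATE}))
END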

Our final filter set will look like this:

[Screenshot: Final Filter]

With the use of these filters and some creative dashboarding you can end up with a report that easily allows you to view a year over year analysis from any date in time for any number of periods either by month or by week.

[Screenshot: Final Month Chart]

[Screenshot: Final Week Chart]

That really got out of hand in a hurry! Surely this will impress someone at your work, or even Angus MacGyver, if for nothing else than the fact that he or she won’t understand it – but hopefully, now you do!

Also, a colleague of mine, Spencer McGhin, just wrote a similar article on year over year analyses using a different approach. Feel free to review it and consider your options.


 

Calendar Date/Time Functions

These are functions you can use within OBIEE and within TIMESTAMPS to extract the information you need.

  • Current_Date
  • Current_Time
  • Current_TimeStamp
  • Day_Of_Quarter
  • DayName
  • DayOfMonth
  • DayOfWeek
  • DayOfYear
  • Hour
  • Minute
  • Month
  • Month_Of_Quarter
  • MonthName
  • Now
  • Quarter_Of_Year
  • Second
  • TimestampAdd
  • TimestampDiff
  • Week_Of_Quarter
  • Week_Of_Year
  • Year

Back to section


 

Presentation Variables

The only way you can create variables within the presentation side of OBIEE is with the use of presentation variables. They can only be defined by a report prompt. Any value selected by the prompt will then be sent to any references to that variable throughout the dashboard page.

In the prompt:

[Screenshot: Pres Var]

From the “Set a variable” dropdown, select “Presentation Variable”. In the textbox below the dropdown, name your variable (named “n” above).

When calling this variable in your report, use the syntax @{n}{default}

If your variable is a string make sure to surround the variable in single quotes: ‘@{CustomerName}{default}’

Also, when using your variable in your report, it is good practice to assign a default value so that you can work with your report before publishing it to a dashboard. For variable n, if we want a default of 6 it would look like this @{n}{6}

Presentation variables can be called in filters, formulas and even text boxes.

Back to section


 

Dummy Column Prompt

For situations where you would like users to select a numerical value for a presentation variable, like we do with @{n}{6} above, you can convert something like a date field into values up to 365 by using the function DAYOFYEAR(“Time”.”Date”).

As you can see in the screenshot below, the choice list values are returned by a SQL statement selecting DAYOFYEAR(“Time”.”Date”) values <= 52. Make sure to include an ORDER BY statement to ensure your values are well sorted.

[Screenshot: Dummy Script]
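Reconstructed, the choice list SQL is roughly the following, with “A - Sample Sales” standing in for whatever subject area your date column lives in:

SELECT DAYOFYEAR("Time"."Date") FROM "A - Sample Sales" WHERE DAYOFYEAR("Time"."Date") <= 52 ORDER BY DAYOFYEAR("Time"."Date")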

Back to Section

Supporting BITeamwork for Oracle BI (OBIEE) Updates and Releases

As more and more customers of Oracle BI (OBIEE) find out about BITeamwork and are introduced to the Collaborative BI functionality it adds to OBIEE, we are frequently asked several questions as customers inquire about incorporating BITeamwork into their existing BI architecture. Two questions seem to always stand out: How does BITeamwork align with updates […]

The post Supporting BITeamwork for Oracle BI (OBIEE) Updates and Releases appeared first on Art of Business Intelligence Blog.

BITeamwork 3.0 – The “Larry Ellison” Release Now GA

It’s here. It’s here! Our development team is completely excited to present to the world the latest version of the groundbreaking market leading Collaborative BI software solution, BITeamwork. Please welcome to the stage, BITeamwork 3.0 – The “Larry Ellison” Release. We’ll get to the key new features that BITeamwork 3.0 brings to your Oracle BI […]

The post BITeamwork 3.0 – The “Larry Ellison” Release Now GA appeared first on Art of Business Intelligence Blog.