Category Archives: Rittman Mead
Analysing Elections Data with Oracle Data Visualisation Desktop
Disclaimer #1 This post is not about politics. Its dataset is about politics, but that's a coincidence. It could be about immunisation or transport or anything else. If you are strictly against any politics, here is a link to watch cute kittens.
Introduction
Let's pretend that I'm an analyst who has just received a supposedly interesting data set. Now I want to understand whether the data is actually interesting or total rubbish. Having been in IT for some time, I could use tools and technologies which a typical end user can't access. But this time I'm pretending to be an ordinary analyst who has only the data and desktop tools. My task is to do some research and tell whether there are anomalies in the data or everything looks the way it is supposed to.
The main tool for this work is obviously Oracle Data Visualisation Desktop (DVD). And, as a supplementary tool, I use Excel. This post is not a guide to any particular DVD feature, and it won't give step-by-step instructions or anything like that. The main idea is to show how we can use Oracle Data Visualisation for the analysis of a real dataset: not simply to show that we can build bar charts, pie charts and other fancy charts, but to show how we can get insights from the data.
The Data
I should say a few words about the data. It is the official result of the Russian State Duma (parliament) elections in 2016. Half of the Duma was elected by party lists, and for this post I took that data. I should confess that I cheated a little and decided not to spend my time downloading and parsing the data piece by piece from the official site, and took a prepared set instead.
From a bird's-eye view I have the following data:
- Voting results by election commissions: number of votes for every political party and a lot of technical measures like number of registered voters, number of good and damaged voting ballots and so on.
- Turnout figures at given times throughout the day.
From a more technical point of view, the data was stored in two big files with multiple JSON in each. As the data preparation part is big enough, it was extracted to another post. This one concentrates purely on visualisation and the second one is about data flows and comparison to Excel.
Analysing the Data
I did some cleaning, refining and enriching of the data, and now it's time to use it. I started with a standard Stacked bar chart. It shows the percentages won by each party by region and, in addition, the width of the bars shows Total votes. The chart is sorted by ascending Total votes.
What can we say here?
Before I start talking I need a lawyer and a disclaimer #2:
Disclaimer #2 Some of the results may be interpreted in different ways, and some of them may not be pleasant for some people. But I'm not a court and this post is only a data visualisation exercise, so I'm not accusing anyone of committing any crimes. I will draw some conclusions because the rules of the genre demand it, but they should be treated as hypotheses only.
That's not a proven charge (see disclaimer #2), but to me these regions look a bit suspicious. Their results are very uncommon: the ruling United Russia party (orange bars) got an extremely high result in these few regions. This may be a sign of some kind of interference with the election process there. But of course, other explanations (including a measure incorrectly selected for sorting) exist.
Just for reference so we don’t forget the names: Tatarstan, Kemerovo, Dagestan, Chechnya and Mordovia. There are a few more regions with similar results but their number of voters is lower so I don’t show them here.
At this point I'm starting to suspect something. But I need more visuals to support my position, and my next hypothesis is that in these regions ballots were somehow added to the voting boxes (or protocols were changed, which is basically the same thing). From a data visualisation point of view, that would mean that these regions should have a higher turnout (because of the added ballots) along with a higher United Russia result.
To check this I need one more measure: Turnout, %. It shows how many of the registered voters actually voted. I can create this field without leaving visualisation mode. Cool.
Note. This formula may be not absolutely correct but it works well for demonstration purposes.
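A calculation along these lines would do the job (the column names below are my assumption, since they depend on how the source fields were named during data preparation; the original formula is shown in the screenshot):

Turnout, % = 100 * ("Ballots in stationary boxes" + "Ballots in portable boxes") / "Number of registered voters"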
In order to visualise this hypothesis, I built a Scatter chart. Its horizontal axis is Turnout, % and its vertical axis is United Russia, %. And I added a trend line to make things more obvious. Colour density shows Total votes.
I think my hypothesis just got strong support. As usual, it is not an absolutely impossible situation, but it's hard to explain why the more people come to voting stations, the higher one party's result is. I'd expect either a high result not depending on the turnout (more or less like a uniform distribution) or at least a few exceptions with high turnout and a low result.
I consider this result strange because in real life I'd expect that a higher turnout should mean more opposition voters (a passive group indeed) coming to the voting stations. But that's only my opinion, and a highly arguable one, I should agree. What I really want to show here is that these charts highlight an oddity that should be investigated and may or may not have a rational explanation.
And who are our heroes? Let’s zoom in on the chart …and we see the same regions.
But maybe other parties can show the same result? We can build the same Scatter charts for every party or we can visualise all at once with a usual Line chart. Here I’ve plotted the percent of vote won by each party (Y-axis) against the overall turnout % (X-axis).
United Russia is the only party whose result increases with turnout.
So far all my charts have been about relative measures; it's time to check some absolute values. Here is a Bar chart which shows the number of precinct commissions by turnout. I'd expect to see something close to a normal distribution: a bell-shaped chart with the maximum around 54% (the average turnout). Now, look at the real chart (bin size = 1.0%). Its maximum is at 36-37%.
In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.
Strictly speaking, all the numbers I'm showing here are discrete and I should say Binomial distribution rather than Normal, but right now, for my purposes, the difference is not that big.
I'm definitely not Carl Gauss (and even not one of his best students) and you may ignore my opinion, but I expected something more like this:
And I don't have the slightest idea how it is possible that the most "popular" turnout is 100%.
What if we look at the same chart in more detail? The previous one was grouped into 1% bins, and this one has 0.1% bins. I also had to add a "turnout not equal to 100%" filter: even with the smaller bin size, the last bar stays almost the same, ~3K commissions. That bar is much bigger than the others, and the chart doesn't show anything useful in that case.
What can we see here? Well, people aren't really good at generating random numbers. It's perfectly OK to have some outliers on the chart, but it's not normal to have them mostly at round values. It looks like someone was trying to fit the result to some plan.
Here is my favourite part of the chart. I marked 1% intervals, and you can see that a round turnout value is always more probable than its closest non-round neighbours. No exceptions. A possible explanation is that the number of commissions with results that high is relatively low, so even the slightest manipulation is clearly visible.
But wait. What about that 64.3 percent? It's not round, but it is a big outlier. Let's take a closer look at this value and check whether there is anything interesting or whether it is a normal situation. Here are a few interesting visualisations of it.
The first one is a Tree Diagram. It shows all existing combinations of district and precinct commissions by region for the filtered data (turnout = 64.3%). And in order to demonstrate how it works for this case I made an animation. Most of the regions have a few commissions with 64.3% turnout, but Saratov region beats them all.
This visualisation has a serious flaw: the end user has to scroll it (for this set of data, at least) and can miss the point. Another visualisation can improve the situation: a Network diagram doesn't need scrolling.
It looks good and shows exactly the same thing. But for this chart we must ensure that every data point is unique, which is not true in my case: different precinct commissions have the same numbers, so I had to create a unique field first (DEC #||PEC #). It's easy to forget this and get unpredictable or even misleading results.
Or if you prefer more traditional charts, here is a Sunburst for you. Its sector size shows Total votes and the colour is PEC count. It gives a good representation of how uncommon Saratov's result is.
And the last picture on the same topic: the boring, never-old, classic Bar chart.
Considering all these charts, I'd say that the almost exclusive concentration of commissions with 64.3% turnout in Saratov doesn't look normal to me. It's pretty weird that sibling commissions show exactly the same figures.
A few more diagrams which could work well are Sankey and Parallel Coordinates; unfortunately, they are less informative because of the high number of precinct commissions. But if the number were lower I'd consider them too.
All the previous charts are based on voting data, but I have one more dataset: the official turnout. Let's check if we can find anything interesting there. Unfortunately, a significant part of the commissions doesn't have official data, and sometimes I may use formulas that are not exactly the same as the official ones, so the numbers may differ slightly from what I got from the protocols data.
The first chart shows the number of commissions (vertical axis) by the official turnout (horizontal axis). Colour shows the moment of time. Strictly speaking, I shouldn't have used continuous line charts for discrete values, but coloured overlapping bars don't give as clear a picture.
Except for the 100% tail, everything is more or less natural. The graph shape looks more like a Gamma distribution than a Normal one, but I didn't test it.
What picture do I have for various regions?
Moscow city is known for a relatively high number of poll watchers, so we may expect cleaner data there. Ignoring the long tail, these look normal (or binomial if you want to be precise).
Saratov region. The one with 64.3% turnout. Look at these peaks. Do they look natural to you?
Do you remember Tatarstan (the hero at the beginning of this story)? Here it is. I don't know how anyone can explain how this is possible (without manual adjustment of the results, I mean).
Summary
This post shows how we can use Oracle DVD for the visualisation of a real data set, and I hope I was able to convince you that this tool can be useful and can give you really interesting insights. Of course, visualisation alone doesn't answer all questions; this time it was actually less about answers and more about questions. It helps to ask the right questions.
More reading on the topic: 1, 2 (Russian language). If you can read Russian, you will find more visualisations, discussions and interesting information there. And this article is about the elections in 2011; its indisputable advantage is that it is in English.
OBIEE 12c Time Dimension: Logical Sequence Number
The key component of any successful OBIEE implementation is the metadata model known as the repository (RPD). Doing it right is sometimes considered "black magic", and small mistakes in the RPD can impact all the exposed Subject Areas, resulting in poor performance or, even worse, wrong results.
Working on an RPD requires dedicated knowledge of the tool, and we share that knowledge in our RPD modelling training for both OBIEE 11g and OBIEE 12c.
If you have ever worked on RPD modelling, one of the settings you will surely have encountered is the Time dimension. This blog post written back in 2007 explains the process of setting up a time dimension for OBIEE 10g. The process didn't change in any major way until recently when, in 12.2.1, Logical Sequence Numbers were introduced. As per Oracle's documentation, this new feature "optimizes time series functions and in some cases improves query time", and in this post we'll see how to configure it and its impact on time-series calculations. The examples shown below are based on Oracle Sampleapp v607, a really good source of modelling and front-end examples.
Usual Time-series Query Behaviour
Time-series functions like AGO, TODATE and, more recently, PERIODROLLING allow end users to compare results from different points in time just by specifying the level in the time dimension hierarchy and the number of periods to look backwards or forwards. For example, if you needed to compare the current month's sales revenue with the previous month's figure, you would end up writing a formula like
AGO("F0 Sales Base Measures"."1- Revenue","H0 Time"."Month", 1)
Where:
- AGO: is the Time-series function being called
- "F0 Sales Base Measures"."1- Revenue": is the metric
- "H0 Time"."Month": is the time hierarchy level
- 1: is the number of periods (months in our case) to look back in history
Once the time-series metric has been created, it can be used in an analysis like the following to compare the Revenue of 2015-04 with that of the previous month.
The analysis generates the following Logical SQL which basically lists the columns retrieved and the filters applied.
SELECT
0 s_0,
"A - Sample Sales"."Time"."T02 Per Name Month" s_1,
"A - Sample Sales"."Base Facts"."1- Revenue" s_2,
"A - Sample Sales"."Time Series"."127 Mth Ago Rev (Fix Time Lvl)" s_3
FROM "A - Sample Sales"
WHERE
("Time"."T02 Per Name Month" = '2015 / 04')
ORDER BY 2 ASC NULLS LAST
FETCH FIRST 5000001 ROWS ONLY
The translation to source SQL hugely depends on the data model created in the RPD and on the data source type. In our example, Oracle SQL is generated containing the following steps:
- Sequence Generation: a DENSE_RANK function is used to create a dense sequence based on Per_Name_Month, the chronological key defined in the time hierarchy for the month level. Mth_Key is also part of the query since it's used in the join between the dimension and the fact table.
WITH
OBICOMMON0 AS (select DENSE_RANK() OVER ( ORDER BY T653.Per_Name_Month) as c1,
T653.Mth_Key as c2,
T653.Per_Name_Month as c3
from
BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ),
- Period Shifting: the sequence generated above is now shifted by the number of periods defined in the column formula (D1.c1 + 1), in our example 1.
SAWITH0 AS (select D1.c1 + 1 as c1,
D1.c2 as c2,
D1.c3 as c3
from
OBICOMMON0 D1),
- Ago Period Query: Using the period shifted query the historical record (or set of records) is retrieved.
SAWITH1 AS (select distinct D1.c1 as c1,
D1.c3 as c2
from
OBICOMMON0 D1),
SAWITH2 AS (select sum(T418.Revenue) as c1,
D3.c2 as c2
from
BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ ,
SAWITH0 D4,
SAWITH1 D3
where ( T418.Bill_Mth_Key = D4.c2 and D3.c1 = D4.c1 and D3.c2 = '2015 / 04' )
group by D3.c2, D4.c3),
The period shifted query usage is explained visually by the image below
- Selected Period Query: the query for the selected period, in our case 2015-04, is executed using standard time dimension
SAWITH3 AS (select sum(T418.Revenue) as c1,
T653.Per_Name_Month as c2
from
BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */
where ( T418.Bill_Mth_Key = T653.Mth_Key and T653.Per_Name_Month = '2015 / 04' )
group by T653.Per_Name_Month)
- Resultsets joining: Results coming from Ago Period and Selected Period queries are then joined with an outer join.
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select D1.c1 as c1,
D1.c2 as c2,
D1.c3 as c3,
D1.c4 as c4
from
(select 0 as c1,
coalesce( D1.c2, D2.c2) as c2,
D2.c1 as c3,
D1.c1 as c4,
ROW_NUMBER() OVER (PARTITION BY coalesce( D1.c2, D2.c2) ORDER BY coalesce( D1.c2, D2.c2) ASC) as c5
from
SAWITH2 D1 full outer join SAWITH3 D2 On D1.c2 = D2.c2
) D1
where ( D1.c5 = 1 )
order by c2 ) D1 where rownum <= 5000001
As you can see, it's a rather complex set of instructions that involves several steps, including analytical functions like the DENSE_RANK() in the sequence generation. In our case we have been lucky that the source system provides the DENSE_RANK() function; with other sources the sequence generation needs to be calculated directly by the BI Server (OBIS), with a possible degradation of performance.
What is Logical Sequence Number?
The Sequence Generation step mentioned above can sometimes be the bottleneck of the whole query, especially when the time dimension is complex or huge in volume, since DENSE_RANK() is a costly analytical function.
All that OBIEE's Time-series functions need in order to work is a pure sequence, or in Oracle's words:
Sequence numbers are enumerations of time dimensional members at a certain level. The enumeration must be dense (no gaps) and must correspond to a real time order. For example, months in a year can be enumerated from 1 to 12.
So what if we could find a way of pre-calculating them and storing them in the table, or calculating them on the fly using functions less expensive than a DENSE_RANK()?
This is the idea behind the Logical Sequence Number (LSN): a way of avoiding the need for the BI Server (OBIS) to execute the DENSE_RANK(), by passing either a pre-calculated sequence column in the source table or a cheaper function to calculate it on the fly based on existing columns.
The formula behind a Logical Sequence Number must resolve to a number (no varchars or dates are allowed) and can be either:
- Absolute: when the sequence is consistent and doesn't need any external reference, e.g. calendar years are a self-defined sequence
- Relative: when the sequence is relative to a parent level, e.g. a sequence for months between 1 and 12 needs the calendar year as the parent level to provide correct cross-year references
As the name already says, "Sequence Numbers" must be sequential; no gaps can be included. Year and month numbers are good examples of this.
But what if we wanted to provide an absolute sequence number at month level?
One could think about using a month key in the YYYYMM format, but this is not a sequence: 201701 <> 201612 + 1. The best way of generating the sequence number is to add it as a column in the database table.
I'll take as an example the SAMP_TIME_MTH_D table that can be found in Sampleapp v607, containing a MTH_KEY column in the YYYYMM format.
I'll add to SAMP_TIME_MTH_D a column MONTH_ABS_SEQ containing the absolute sequence number. The formula behind the column is exactly what Oracle was doing under the covers with a DENSE_RANK:
SELECT MTH_KEY, DENSE_RANK() OVER (ORDER BY MTH_KEY) MONTH_ABS_SEQ FROM SAMP_TIME_MTH_D
And the end result, as expected, is:
How are Logical Sequence Numbers Used?
Previously I described the two types of Logical Sequence Numbers, Absolute and Relative; each one has its use cases:
- PERIODROLLING: this function uses the absolute LSN to calculate the starting Sequence Number based on the selected one, e.g. looking at the previous image, a 6-month rolling window starting from 2009-02 will include data from sequences in the range [9-14].
- TODATE: uses the relative LSN, e.g. a YearToDate measure shown by month will use the month's relative LSN to calculate the previous months in the same year.
- AGO: the Ago function uses both absolute and relative LSNs. Absolute numbers are used if the grain of the query and that of the AGO are at the same level, e.g. a yearly analysis of Actual vs Previous Year. However, when the grain of the shift in the AGO is higher than the grain of the query, relative LSNs are used, e.g. a monthly analysis of Actual vs Previous Year.
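As an illustration of the functions above, here are two example logical column formulas in the same style as the AGO example earlier. These are sketches based on the SampleApp measure and hierarchy already shown, not formulas taken from the original post:

PERIODROLLING("F0 Sales Base Measures"."1- Revenue", -5, 0)
AGO("F0 Sales Base Measures"."1- Revenue", "H0 Time"."Year", 1)

The first gives a 6-period rolling sum at the query grain (the current month plus the five preceding ones when the analysis is at month level, matching the [9-14] range example); the second is a year-level AGO, which would rely on the absolute LSN when the analysis is at year grain.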
How to Configure Logical Sequence Numbers
Logical Sequence Numbers are a new feature and as such require additional settings in the time dimension hierarchy in order to work. In our case we'll add two sequences: an absolute one at calendar year level and a relative one at calendar month level.
We'll add the sequences directly as formulas in the repository; however, those formulas should be pushed down as columns in the database table if optimal performance is sought.
In order to set the Logical Sequence Numbers we need to open the RPD (the SampleApp one in this test), and select the Time dimension we want to change.
After selecting any level apart from the Grand Total (the top of the hierarchy), a new tab called "Sequence Numbers" should be visible. However if, as in our case, none of the columns at that level are integers or doubles, the sequence number selector is disabled.
In order to enable the selector, we first need to create a sequence column in our dimension and bring it to the correct level in the hierarchy. For the Year level there is already an integer column named "T35 Cal Year" which can be used as the sequence: we simply need to drag the column to the Year level in the Time hierarchy and set it as the Absolute Sequence.
I can do the same with the Month level in the hierarchy and the "T32 Cal Month" column. Note that this column contains only the month enumeration from 1 to 12, so we need to set the sequence number as Relative to the Year level.
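If a suitable month-number column did not already exist in the table, a relative sequence could be pre-calculated in the database in the same spirit as the absolute one above. A sketch, assuming the SAMP_TIME_MTH_D columns used earlier in this post:

SELECT MTH_KEY, DENSE_RANK() OVER (PARTITION BY CAL_YEAR ORDER BY MTH_KEY) MONTH_REL_SEQ FROM SAMP_TIME_MTH_D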
Please note that both absolute and relative LSNs can (and should) be entered since, as discussed above, each has different use cases. In addition, relative LSNs should be set for all logical parent levels in the hierarchy, since they will be used only if the grain of the time shift matches the one of the parent level. For example, a monthly LSN based on the Year logical level will only be used in AGO functions having a year shift, not in the case of a quarterly shift.
For optimal usage, every level of the time hierarchy should have one absolute LSN and a relative LSN for each of the parent levels in the hierarchy.
Impact on SQL
It's time now to review the SQL generated by our analysis and check the differences with the old-school time-series query.
When creating an analysis at year level like the following
As expected, the BI Server (OBIS) uses the CAL_YEAR column as the sequence instead of the DENSE_RANK() function over the PER_NAME_YEAR column.
WITH
OBICOMMON0 AS (select T795.CAL_YEAR as c1,
T795.QTR_KEY as c2,
T795.PER_NAME_YEAR as c3
from
BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ )
When using TODATE, both the absolute and the relative sequences come into play, as in the following example where the measure "166 Revenue Year To Date" is defined by the formula:
TODATE("01 - Sample App"."F0 Sales Base Measures"."1- Revenue", "01 - Sample App"."H0 Time"."Year" )
The following query is generated; note the usage of Cal_Year and Cal_Month in the sequence generation query instead of the DENSE_RANK() function, as per the RPD settings mentioned above.
WITH
OBICOMMON0 AS (select T653.Cal_Year as c1,
T653.Cal_Month as c2,
T653.Per_Name_Month as c3,
T653.Per_Name_Year as c4
from
BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ )
...
Are Logical Sequence Numbers Useful?
Most of the time the bottleneck when using Time Series is not the sequence generation, since the time dimension cardinality is rarely big enough to produce a noticeable delay in query time, but rather the poorly managed calculations made at query time on top of massive fact tables.
Don't expect LSNs to solve all your performance problems with Time Series. However, Logical Sequence Numbers give OBIEE a way of pre-cooking part of the calculation, so in theory they should help performance. The small effort required to set them up centrally in the time hierarchy is outweighed by the benefits at query time, without having to touch any pre-defined time-series calculation.
If you do have performance problems with your OBIEE system, or would like to ensure that a system you’re building will be performant from the outset, please get in touch to find out more about our Performance Analytics service!
We also provide expert OBIEE training, implementations, QA and health checks - to find out more about how we can help you, please contact us!
Introducing Advanced Analytics Training from Rittman Mead!
Rittman Mead is proud to release our new training course: Advanced Analytics with Oracle's R Technologies.
Oracle has made significant investments in the R language with Oracle R, ROracle and Oracle R Enterprise. Using these tools, data scientists and business intelligence practitioners can work together more efficiently and can transition between their roles more easily.
Rittman Mead has developed a three-day course that tackles R's notoriously steep learning curve. It builds on Oracle professionals' existing skills to accelerate growth into R programming and data science.
What does the course include?
Day one is all about the R programming language, starting with a history and explanation of Oracle's R technologies. Hands-on coding begins right away, with practical labs comparing R's data types and data structures with those found in the Oracle Database. The day wraps up with R programming concepts like conditions and functions, providing a strong grasp of the fundamentals on the very first day.
Day two focuses on the analysis pipeline, from data acquisition to data visualization. You will use SQL and R to tidy and transform raw data into a structured format and then use visualization and basic statistics to gain insights.
Day three looks at statistical modeling—discussing linear models and the predictive modeling pipeline. We present pros and cons of different types of models and get hands-on with preprocessing, model tuning, cross-validation and interpreting model results.
Our course is a mixture of theory and practical exercises—ensuring that you'll understand the tools and know when to apply them.
Who should attend?
The course is suitable for Oracle practitioners having some experience with SQL and business intelligence. No previous knowledge of R is assumed or necessary.
Sounds great, where do I sign up?
Please view our UK & Europe or US training schedule for public courses. For any questions or queries, including on-site training requests, please contact Daniel Delgado (US) or Sam Jeremiah (UK & Europe) for more details.
Financial Reports – which tool to use? Part 2
Financials in BI Publisher
I find it interesting that BI Publisher is mostly known for the creation of pixel-perfect repeating forms (invoices, labels, checks, etc.) and its ability to burst them. To me, BI Publisher is the best-kept secret for handling the most challenging reports known to mankind.
In my last blog - https://www.rittmanmead.com/blog/2017/02/financial-reports-which-tool-to-use-part-1/, I discussed some of the challenges of getting precisely formatted financial reports in OBIEE, as well as some pros and cons of using Essbase/HFR. Although we can work through difficult solutions and sometimes get the job done, BI Publisher is the tool that easily allows you to handle the strangest requirements out there!
If you have OBIEE, then you already have BI Publisher, so there is no need to purchase another tool. BI Publisher comes integrated with OBIEE, and they can both be used from the same interface. The transition between BI Publisher and OBIEE is often seamless to the user, so you don’t need to have concerns over training report consumers in another tool, or even transitioning to another url.
The BIP version embedded with OBIEE 12c comes loaded with many more useful features, like encryption and delivering documents to the Oracle Document Cloud Service. Check out the detailed new features here: http://www.oracle.com/technetwork/middleware/bi-publisher/new-features-guide-for-12-2-1-1-3074557.pdf
In BI Publisher, you can leverage data from flat files, from different databases, from an Essbase cube, from the OBIEE RPD, from one (or multiple) OBIEE analyses, from web services and more:
So, if you already have very complex OBIEE analyses that you could not format properly, you can use these analyses, and all the logic in them, as sources for your perfectly formatted BI Publisher reports.
Every BI Publisher report consists of three main components:
- Data Model - the data source that you will use across one or more reports
- Layout(s) - which define how your data is presented
- Properties - which relate to how the report generates, displays and more
You start a BI Publisher project by creating a data model that contains the different data sets that you would like to use on your report (or across multiple reports). These data sets, which reside inside of your data model, can be of the same source or can come from multiple sources and formats. If you regularly use OBIEE, you can think of a data model as the metadata for one or more reports. It is like a very small, but extremely flexible and powerful RPD.
Inside the data model you can connect your data sets using bind variables (which creates a hierarchical relationship between data sets), or you can leave them completely disconnected. You can also connect some of your data sets while leaving others disconnected.
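As a minimal sketch of the first option, here are two SQL data sets linked through a bind variable. The tables and columns are hypothetical and only illustrate the parent-child mechanism:

-- Parent data set (e.g. G_DEPT)
SELECT department_id, department_name FROM departments

-- Child data set (e.g. G_EMP): the :DEPARTMENT_ID bind variable is populated
-- from the parent row, which creates the hierarchical relationship
SELECT employee_id, employee_name, salary FROM employees WHERE department_id = :DEPARTMENT_ID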
The most impressive aspect of this tool is that it allows you to do math on the results of disconnected data sets, without requiring ETL behind the scenes. This may be one of the requirements of a very complex financial report, and one that is very difficult to accomplish with most tools. The data model can extract and transform data within a data set, or extract only, so that the data can later be transformed during your report template design!
For example, within a data set, you can create new columns to suit most requirements - they can be filtered, concatenated, or have mathematical functions applied to them, if they come from the same data source.
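For a SQL-based data set, such derived columns can simply be written in the query itself. A sketch with hypothetical table and column names (:P_REGION stands for a report parameter used as a filter):

SELECT account_code,
       SUBSTR(account_code, 1, 4)           AS account_group,  -- derived with a string function
       first_name || ' ' || last_name       AS full_name,      -- concatenation
       ROUND(amount_usd * exchange_rate, 2) AS amount_local    -- simple math
FROM   gl_balances
WHERE  region = :P_REGION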
If they do not come from the same source, you can transform your data using middle tier systems, such as Microsoft Word during your template creation. You can perform math and other functions to any result that comes from any of your data sets using an RTF template, for example.
The example above was mentioned in Part 1 of this blog. It was created using BI Publisher and represents what I would call a "challenging report" to get done in OBIEE. The data model in this example consisted of several OBIEE analyses and their results were added/subtracted/multiplied as needed in each cell.
This second example was another easy transition into BI Publisher: the entire report contained 10 pages that were each formatted entirely differently from one another. Totals from all pages needed to be added into some specific cells. Better yet, the user entered some measures at the prompt, and these measures needed to be accounted for in every sub-total and grand total. You may be asking: why prompt for a measure? A very good question indeed. In this case, there were very few measures coming from a disconnected system; they changed daily, and the preferred way for my client to deal with them was to enter them at the prompt.
So, do you always have to add apples to apples? Not necessarily! Adding apples and bananas may be meaningful to you.
And you can add what is meaningful with BI Publisher!
For example, here is a sample data model using sources from Excel, OBIEE and a database. As you see, two of these data sets have been joined, while the other two are disconnected:
A data model such as this one allows you to issue simultaneous queries across these heterogeneous sources and combine their results in the report template. Meaning, you can add anything that you would like in a single cell - even that measure coming from the prompt! It goes without saying that you should have a clear purpose and logic behind this machination.
Once your data model is complete - your data sets are in place, the relationships between them (where applicable) are created, and your custom columns, parameters and filters are defined - you generate some sample data (XML) and choose how you will create your actual report.
As I mentioned, there are additional functionalities that may be added when creating the report, depending on the format that you choose for your template:
One very simple option is to choose the online editor, which has a bit more limited formatting capability, but will allow you to interact with your results online.
In my experience, if I had to cross the bridge away from OBIEE and into BI Publisher, it is because I needed to do a lot of customization within my templates. For those customizations, I found that working with RTF templates gave me all the additional power that I could possibly be missing everywhere else. Even when my financial report had to be read by a machine, BI Publisher/RTF was able to handle it.
The power of the BI Publisher data model combined with the unlimited flexibility of RTF templates was finally the answer to eliminating the worst Excel monsters. With these two, you can recreate the most complex reports, and do it just ONCE - not every month. You can use your existing format - one that you either love, or are forced to use for some reason - and reuse it within the RTF. Inside each RTF cell, you define (once!) what that cell is supposed to be. That specific cell, and all others, will be tested and validated to produce accurate results every month.
Once this work is done, you are done forever. Or well, at least until the requirements change… So, if you are battling with any of these monsters on a monthly basis, I highly encourage you to take a step forward and give BI Publisher a try. Once you are done with the development of your new report, you may find that you have hours per month back in your hands - over time, many more hours than you spent creating the report. Time worth spending.