
Bad xml instance! Attribute ‘skipLabels’ is not declared for element ‘labels’

When reports are created in a 12c environment, it is not possible to simply archive them in 12c and unarchive them in 11g. To run the 12c report on an 11g environment, just copy and paste the report XML and apply it. Sometimes this error message appears:

When we scroll to the end of the message, we can see what the problem is:

Attribute 'skipLabels' is not declared for element 'labels'
  • Edit the XML
  • Search for the attribute 'skipLabels'
  • Remove the attribute skipLabels="true"
  • Apply the XML
  • Save the report

The post Bad xml instance! Attribute 'skipLabels' is not declared for element 'labels' appeared first on OBIEE 24/7 - Berry van den Heuvel (NL).

Bad xml instance! Attribute ‘biFormat’ is not declared for element ‘dataFormat’

When reports are created in a 12c environment, it is not possible to simply archive them in 12c and unarchive them in 11g. To run the 12c report on an 11g environment, just copy and paste the report XML and apply it. Sometimes this error message appears:

When we scroll to the end of the message, we can see what the problem is:

Attribute 'biFormat' is not declared for element 'dataFormat'
  • Edit the XML
  • Search for the attribute 'biFormat'
  • Remove the attribute biFormat="dateShort"
  • Apply the XML
  • Save the report

The post Bad xml instance! Attribute 'biFormat' is not declared for element 'dataFormat' appeared first on OBIEE 24/7 - Berry van den Heuvel (NL).

Analysing Elections Data with Oracle Data Visualisation Desktop

Disclaimer #1 This post is not about politics. Its dataset is about politics, but that's a coincidence. It could be about immunisation or transport or anything else. If you are strictly against any politics, here is a link to watch cute kittens.

Introduction

Let's pretend that I'm an analyst who has just got a supposedly interesting data set. Now I want to understand whether the data is actually interesting or total rubbish. Having been in IT for some time, I could use tools and technologies that a typical end user can't access. But this time I pretend I'm an ordinary analyst who has only the data and desktop tools. My task is to do some research and tell whether there are anomalies in the data or whether everything looks the way it is supposed to.
The main tool for this work is obviously Oracle Data Visualisation Desktop (DVD). As a supplementary tool, I use Excel. This post is not a guide to any particular DVD feature, and it won't give step-by-step instructions or anything like that. The main idea is to show how we can use Oracle Data Visualisation for the analysis of a real dataset: not simply to show that we can build bar charts, pie charts and other fancy charts, but to show how we can get insights from the data.

The Data

I should say a few words about the data. It is the official result of the Russian State Duma (parliament) elections in 2016. Half of the Duma was elected by party lists, and for this post I took that data. I should confess that I cheated a little: I decided not to spend my time downloading and parsing the data piece by piece from the official site, and took a prepared set instead.

From a bird's-eye view I have the following data:

  1. Voting results by election commissions: number of votes for every political party and a lot of technical measures like number of registered voters, number of good and damaged voting ballots and so on.
  2. Turnout figures at given times throughout the day.

From a more technical point of view, the data was stored in two big files with multiple JSON documents in each. As the data preparation part is big enough on its own, it was extracted into another post. This one concentrates purely on visualisation; the second one is about data flows and a comparison to Excel.

Analysing the Data

I did some cleaning, refining and enriching of the data, and now it's time to use it. I started with a standard Stacked bar chart. It shows the percentages of parties by region and, in addition, the width of the bars shows Total votes. The chart is sorted by ascending Total votes.

What can we say here?

Before I start talking I need a lawyer and a disclaimer #2:

Disclaimer #2 Some of the results may be interpreted in different ways. Some of them may not be so pleasant for some people. But I'm not a court and this post is only a data visualisation exercise. Therefore I'm not accusing anyone of committing any crimes. I will make some conclusions because of the rules of the genre, but they should be treated as hypotheses only.

That's not a proven charge (see disclaimer #2), but to me these regions look a bit suspicious. Their results are very uncommon. The ruling United Russia party (orange bars) got an extremely high result in these few regions. This may be a sign of some kind of interference with the election process there. But of course, other explanations (including a measure incorrectly selected for sorting) exist.

Just for reference so we don’t forget the names: Tatarstan, Kemerovo, Dagestan, Chechnya and Mordovia. There are a few more regions with similar results but their number of voters is lower so I don’t show them here.

At this point I'm starting to suspect something. But I need more visuals to support my position, and my next hypothesis is that in these regions ballots were somehow added to voting boxes (or protocols were changed, which is basically the same). From a data visualisation point of view, that would mean that these regions have a higher turnout (because of the added ballots) along with a higher United Russia result.

To check this I need one more measure - Turnout, %. It shows how many of the registered voters actually voted. I can create this field without leaving visualisation mode. Cool.
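
For reference, a minimal sketch of such a calculated field, assuming hypothetical column names "Voters Voted" and "Registered Voters" (the actual names in the prepared dataset may differ):

-- Turnout as a percentage of registered voters; the CASE guards against division by zero
CASE WHEN "Commissions"."Registered Voters" = 0 THEN NULL
     ELSE 100.0 * "Commissions"."Voters Voted" / "Commissions"."Registered Voters"
END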

Note: this formula may not be absolutely correct, but it works well for demonstration purposes.

In order to visualise this hypothesis, I built a Scatter chart. Its horizontal axis is Turnout, % and its vertical axis is United Russia, %. I added a trend line to make things more obvious. Colour density shows Total votes.

I think my hypothesis just got strong support. As usual, it is not an absolutely impossible situation. But it's hard to explain why the more people come to voting stations, the higher one party's result is. I'd expect either a high result independent of the turnout (more or less like a Uniform distribution) or at least a few exceptions with high turnout and a low result.

I consider this result strange because in real life I'd expect that a higher turnout should mean more opposition voters (a passive group indeed) coming to voting stations. But that's only my opinion, and a highly arguable one, I agree. What I really want to show here is that these charts highlight an oddity that should be investigated and may or may not have a rational explanation.

And who are our heroes? Let’s zoom in on the chart …and we see the same regions.

But maybe other parties can show the same result? We could build the same Scatter chart for every party, or we can visualise them all at once with a usual Line chart. Here I've plotted the percentage of the vote won by each party (Y-axis) against the overall turnout % (X-axis).

United Russia is the only party whose result increases with turnout.

So far all my charts have been about relative measures; it's time to check some absolute values. Here is a Bar chart which shows the number of precinct commissions by turnout. I'd expect to see something close to a normal distribution - a bell-shaped chart with its maximum around 54% (the average turnout). Now, look at the real chart (bin size = 1.0%). Its maximum is at 36-37%.
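
For readers who prefer the query view, the same aggregation can be expressed as a SQL sketch; the PRECINCT_RESULTS table and TURNOUT_PCT column are assumptions, not the actual structure of the prepared dataset:

-- Count precinct commissions per 1% turnout bin (hypothetical table and column names)
SELECT FLOOR(TURNOUT_PCT) AS TURNOUT_BIN,
       COUNT(*)           AS COMMISSIONS
FROM   PRECINCT_RESULTS
GROUP BY FLOOR(TURNOUT_PCT)
ORDER BY TURNOUT_BIN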

Normal distribution

In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.

Strictly speaking, all the numbers I'm showing here are discrete and I should say Binomial distribution rather than Normal, but right now, for my purposes, the difference is not that big.

I'm definitely not Carl Gauss (and not even one of his best students) and you may ignore my opinion, but I expected something more like this:

And I don't have the slightest idea how it is possible that the most "popular" turnout is 100%.

What if we look at the same chart with more details? The previous one was grouped into 1% bins, and this one has 0.1% bins. I also had to add a 'turnout not equal to 100%' filter: even with the smaller bin size, the last bar stays almost the same at ~3K commissions, it is much bigger than the others, and the chart doesn't show anything useful with it included.

What can we see here? Well, people aren't really good at generating random numbers. It's perfectly OK to have some outliers on the chart. But hey, it's not normal to have them mostly at round values. That looks like someone was trying to fit the result to some plan.

Here is my favourite part of the chart. I marked 1% intervals, and you can see that round turnout value is always more probable than its closest non-round neighbours. No exceptions. A possible explanation is that the number of commissions with results that high is relatively low and even the slightest manipulation is clearly visible.

But wait. What about that 64.3 percent? It's not round, but it is a big outlier. Let's take a closer look at this value and check whether there is anything interesting there or whether it is a normal situation. Here are a few interesting visualisations for it.

The first one is a Tree diagram. It shows all existing combinations of district and precinct commissions by region for the filtered data (turnout = 64.3%). In order to demonstrate how it works for this case, I made an animation. Most of the regions have a few commissions with 64.3% turnout. But Saratov region beats them all.

This visualisation has a serious flaw: the end user has to scroll it (for this dataset, at least) and can miss the point. Another visualisation can improve the situation: a Network diagram doesn't need scrolling.

It looks good and shows exactly the same thing. But for this chart we must ensure that every data point is unique, which is not true in my case. Different precinct commissions have the same numbers, so I had to create a unique field first (DEC #||PEC #). It's easy to forget this and get unpredictable or even misleading results.
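
A minimal sketch of such a key, assuming DEC # and PEC # are numeric columns (the casts and the separator are my own additions; a separator avoids accidental collisions such as 1||23 versus 12||3):

-- Hypothetical unique key for a precinct commission
CAST("Commissions"."DEC #" AS VARCHAR(10)) || '-' || CAST("Commissions"."PEC #" AS VARCHAR(10))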

Or if you prefer more traditional charts, here is a Sunburst for you. Its sector size shows Total votes and the colour shows the PEC count. It gives a good representation of how uncommon Saratov's result is.

And the last picture on the same topic: the boring, never-old, classic Bar chart.

Considering all these charts, I'd say that the almost exclusive concentration of commissions with 64.3% turnout in Saratov doesn't look normal to me. It's pretty weird that sibling commissions show exactly the same figures.

A few more diagrams which could work well are Sankey and Parallel coordinates; unfortunately, they are less informative because of the high number of precinct commissions. But if the number were lower, I'd consider them too.

All the previous charts are based on voting data. But I have one more dataset - the official turnout. Let's check if we can find anything interesting there. Unfortunately, a significant part of the commissions doesn't have official data, and sometimes I may use formulas that are not exactly the same as the official ones, so numbers may differ slightly from what I got from the protocols data.

The first chart shows the number of commissions (vertical axis) by the official turnout (horizontal axis). Colour shows the moment in time. Strictly speaking, I shouldn't have used continuous line charts for discrete values, but coloured overlapping bars don't give such a clear picture.

Except for the 100% tail, everything is more or less natural. The graph shape looks more like a Gamma distribution than a Normal one, but I didn't test it.

What picture do I have for the various regions?
Moscow city is known for a relatively high number of poll watchers, and we may expect cleaner data there. Ignoring the long tail, these look normal (or binomial, if you want to be precise).

Saratov region. The one with 64.3% turnout. Look at these peaks. Do they look natural to you?
Do you remember Tatarstan (the hero at the beginning of this story)? Here it is. I don't know how anyone can explain how this is possible (without manual adjustment of the results, I mean).

Summary

This post shows how we can use Oracle DVD for the visualisation of a real data set. I hope I was able to convince you that this tool can be useful and can give you really interesting insights. Of course, visualisation alone doesn't answer all questions. And this time it was actually less about answers and more about questions: it helps to ask the right ones.

More reading on the topic: 1, 2 (Russian language). If you can read Russian, there you will find more visualisations, discussions and interesting information. And this article is about the elections in 2011. Its indisputable advantage is that it is in English.

Advisor Webcast: Performance Tuning of Reporting and Analysis Framework (RAF) and FR

Advisor Webcast

Performance Tuning of Reporting and Analysis Framework (RAF) and FR

March 14, 2017

Schedule:
    Tuesday, March 14, 2017 08:00 AM (US Pacific Time)
    Tuesday, March 14, 2017 11:00 AM (US Eastern Time)
    Tuesday, March 14, 2017 04:00 PM (Central European Time)
    Tuesday, March 14, 2017 08:30 PM (India Standard Time)

This one-hour Advisor Webcast is recommended for technical users, functional users, system administrators and database administrators, etc., who mainly administer the system/environment. Tuning recommendations from an RAF/FR perspective will be covered in this webcast.

Topics Include:

  • Reporting and Analysis Framework (RAF) serves as a base for all Oracle EPM reporting components. The Java-based RAF web application deployed on WebLogic enables users to browse through reports of all kinds from an Oracle EPM reporting perspective.
  • Oracle Hyperion Financial Reporting (FR) is a tool which enables the design and presentation of analytical data and financial reporting. FR uses an object-oriented graphical report layout. Whether you have little or advanced experience using an object-oriented graphical program, FR makes report designing easy.
  • EPM Workspace is the Web user interface that is used to access all Oracle Hyperion and non-Oracle Hyperion content. Oracle Hyperion content includes Reporting and Analysis Framework and Oracle's Hyperion financial applications.
For Additional Information and Registration Details - visit:

My Oracle Support Community
Advisor Webcast - Performance Tuning of RAF and FR

To view scheduled & archived recordings of previous Business Analytics Advisor Webcasts visit:

Oracle Business Analytics Advisor Webcast
Doc ID 1456233.1



CUSTOMER NOTIFICATION: Attendees using recent Chrome or Firefox browser versions may be required to activate the WebEx plug-in before joining a WebEx meeting. For up-to-date information, review the WebEx Support page:

https://support.webex.com/webex/meetings/en_US/chrome-firefox-join-faq.htm


OBIEE 12c Time Dimension: Logical Sequence Number

The key component of any successful OBIEE implementation is the metadata model known as the repository (RPD). Getting it right is sometimes considered "black magic", and small mistakes in the RPD can impact all the exposed Subject Areas, resulting in poor performance or, even worse, wrong results.

Working on an RPD requires dedicated knowledge of the tool, and we share it in our RPD modelling training for both OBIEE 11g and OBIEE 12c.

If you have ever worked on RPD modelling, one of the settings you have surely encountered is the Time dimension. This blog post written back in 2007 explains the process of setting up a time dimension for OBIEE 10g. The process didn't undergo any major modifications until recently when, in 12.2.1, Logical Sequence Numbers were introduced. As per Oracle's documentation, this new feature "optimizes time series functions and in some cases improves query time", and in this post we'll see how to configure it and its impact on time-series calculations. The examples shown below are based on Oracle SampleApp v607, a really good source of modelling and front-end examples.

Usual Time-series Query Behaviour

Time-series functions like Ago, ToDate and, more recently, PeriodRolling allow end users to compare results coming from different moments just by specifying the level in the time dimension hierarchy and the number of periods to look backwards or forwards. As an example, if you needed to compare the current month's sales revenue with the previous month's figure, you'd end up writing a formula like:

 AGO("F0 Sales Base Measures"."1- Revenue","H0 Time"."Month", 1)

Where:

  • AGO: the time-series function being called
  • "F0 Sales Base Measures"."1- Revenue": the metric
  • "H0 Time"."Month": the time hierarchy level
  • 1: the number of periods (months in our case) to look back in history

Once the time-series metric has been created, it can be used in an analysis like the following to compare the Revenue of 2015-04 with that of the previous month.

[Screenshot: analysis comparing Revenue with the month-ago Revenue measure]

The analysis generates the following Logical SQL which basically lists the columns retrieved and the filters applied.

SELECT  
   0 s_0,
   "A - Sample Sales"."Time"."T02 Per Name Month" s_1,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_2,
   "A - Sample Sales"."Time Series"."127  Mth Ago Rev  (Fix Time Lvl)" s_3
FROM "A - Sample Sales"  
WHERE  
("Time"."T02 Per Name Month" = '2015 / 04')
ORDER BY 2 ASC NULLS LAST  
FETCH FIRST 5000001 ROWS ONLY  

The translation to source SQL hugely depends on the data model created in the RPD and on the data source type. In our example, an Oracle SQL query gets generated containing the following steps:

  • Sequence Generation: a DENSE_RANK function is used to create a dense sequence based on Per_Name_Month, the chronological key defined in the time hierarchy for the month level. Mth_Key is also part of the query since it's used in the join between the dimension and the fact table.
WITH  
OBICOMMON0 AS (select DENSE_RANK() OVER ( ORDER BY T653.Per_Name_Month) as c1,  
     T653.Mth_Key as c2,
     T653.Per_Name_Month as c3
from  
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ),
  • Period Shifting: The sequence generated above is now shifted by the number of periods defined in the column formula (D1.c1 + 1), in our example 1.
SAWITH0 AS (select D1.c1 + 1 as c1,  
     D1.c2 as c2,
     D1.c3 as c3
from  
     OBICOMMON0 D1),
  • Ago Period Query: using the period-shifted query, the historical record (or set of records) is retrieved.
SAWITH1 AS (select distinct D1.c1 as c1,  
     D1.c3 as c2
from  
     OBICOMMON0 D1),
SAWITH2 AS (select sum(T418.Revenue) as c1,  
     D3.c2 as c2
from  
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ ,
     SAWITH0 D4,
     SAWITH1 D3
where  ( T418.Bill_Mth_Key = D4.c2 and D3.c1 = D4.c1 and D3.c2 = '2015 / 04' )  
group by D3.c2, D4.c3),  

The period-shifted query usage is explained visually by the image below.

[Diagram: how the period-shifted sequence maps the selected month to the prior month]

  • Selected Period Query: the query for the selected period, in our case 2015-04, is executed using the standard time dimension.
SAWITH3 AS (select sum(T418.Revenue) as c1,  
     T653.Per_Name_Month as c2
from  
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Bill_Mth_Key = T653.Mth_Key and T653.Per_Name_Month = '2015 / 04' )  
group by T653.Per_Name_Month)
  • Result set joining: results coming from the Ago Period and Selected Period queries are then joined with a full outer join.
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select D1.c1 as c1,  
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4
from  
     (select 0 as c1,
               coalesce( D1.c2, D2.c2) as c2,
               D2.c1 as c3,
               D1.c1 as c4,
               ROW_NUMBER() OVER (PARTITION BY coalesce( D1.c2, D2.c2) ORDER BY coalesce( D1.c2, D2.c2) ASC) as c5
          from 
               SAWITH2 D1 full outer join SAWITH3 D2 On D1.c2 = D2.c2
     ) D1
where  ( D1.c5 = 1 )  
order by c2 ) D1 where rownum <= 5000001  

As you can see, it's a rather complex set of instructions that involves several steps, including analytical functions like the DENSE_RANK() in the sequence generation. In our case we have been lucky that the source system provided the DENSE_RANK() function; with other sources the sequence generation needs to be calculated directly by the BI Server (OBIS), with a possible degradation of performance.

What is a Logical Sequence Number?

The Sequence Generation step mentioned above can sometimes be the bottleneck of the whole query, especially when the time dimension is complex or huge in volume, since DENSE_RANK() is a costly analytical function.

All that OBIEE's time-series functions need in order to work is a pure sequence, or, in Oracle's words:

Sequence numbers are enumerations of time dimensional members at a certain level. The enumeration must be dense (no gaps) and must correspond to a real time order. For example, months in a year can be enumerated from 1 to 12.

So what if we could find a way of pre-calculating the sequence numbers and storing them in the table, or calculating them on the fly using functions less expensive than a DENSE_RANK()?

This is the idea behind the Logical Sequence Number (LSN): a way of avoiding the BI Server (OBIS) having to execute the DENSE_RANK(), by providing either a pre-calculated sequence column in the source table or a cheaper function to calculate it on the fly based on existing columns.

The formula behind a Logical Sequence Number must resolve to a number (no varchars or dates are allowed) and can be either:

  • Absolute: when the sequence is consistent and doesn't need any external reference, e.g. calendar years are a self-defined sequence
  • Relative: when the sequence is relative to a parent level, e.g. a sequence for months between 1 and 12 needs the calendar year as the parent level to provide correct cross-year references

As the name already says, Sequence Numbers must be sequential; no gaps can be included. Year and month numbers are good examples of this.

But what if we wanted to provide an absolute sequence number at month level?
One could think about using a month key in the YYYYMM format, but this is not a sequence: 201701 <> 201612 + 1. The best way of generating the sequence number is to add it as a column in the database table.

I'll take as an example the SAMP_TIME_MTH_D table that can be found in SampleApp v607, containing the MTH_KEY column in the YYYYMM format.
[Screenshot: the SAMP_TIME_MTH_D table]

I'll add to SAMP_TIME_MTH_D a column MONTH_ABS_SEQ containing the absolute sequence number. The formula behind the column is exactly what Oracle was doing under the covers with DENSE_RANK:

SELECT MTH_KEY, DENSE_RANK() OVER (ORDER BY MTH_KEY) MONTH_ABS_SEQ FROM SAMP_TIME_MTH_D  

And the end result, as expected, is:

[Screenshot: SAMP_TIME_MTH_D with the MONTH_ABS_SEQ column populated]
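
To persist the sequence instead of computing it at query time, one option is to add the column to the table and populate it with the same expression. A minimal sketch, assuming sufficient privileges on the BISAMPLE schema (the MERGE approach is an assumption, not taken from the original post):

-- Sketch: persist the absolute month sequence in the time dimension table
ALTER TABLE BISAMPLE.SAMP_TIME_MTH_D ADD (MONTH_ABS_SEQ NUMBER);

MERGE INTO BISAMPLE.SAMP_TIME_MTH_D t
USING (SELECT MTH_KEY,
              DENSE_RANK() OVER (ORDER BY MTH_KEY) AS MONTH_ABS_SEQ
       FROM BISAMPLE.SAMP_TIME_MTH_D) s
ON (t.MTH_KEY = s.MTH_KEY)
WHEN MATCHED THEN UPDATE SET t.MONTH_ABS_SEQ = s.MONTH_ABS_SEQ;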

How are Logical Sequence Numbers Used?

Previously I described the two types of Logical Sequence Numbers, Absolute and Relative; each one has its use case:

  • PERIODROLLING: this function uses the absolute LSN to calculate the starting sequence number based on the selected one: e.g. looking at the previous image, a 6-month rolling window starting from 2009-02 will include data from sequences in the range [9-14] (an example formula is sketched after this list).
  • TO_DATE: uses the relative LSN, e.g. a Year-To-Date measure shown by month will use the month's relative LSN to include the previous months of the same year.
  • AGO: the Ago function uses both absolute and relative LSNs. Absolute numbers are used if the grain of the query and that of the AGO are at the same level, e.g. a yearly analysis of Actual vs Previous Year. However, when the grain of the shift in the Ago is higher than the grain of the query, relative LSNs are used, e.g. a monthly analysis of Actual vs Previous Year.
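
For reference, a hedged sketch of a 6-month rolling Revenue measure using the SampleApp measure name from the AGO example above (the offsets are the only new part; -5 to 0 covers the five previous periods plus the current one):

-- Rolling sum over six periods at the query grain
PERIODROLLING("F0 Sales Base Measures"."1- Revenue", -5, 0)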

How to Configure Logical Sequence Numbers

The Logical Sequence Number is a new feature and, as such, it requires additional settings in the time dimension hierarchy in order to work. In our case we'll add two sequences: an absolute one at calendar year level and a relative one at calendar month level.
We'll add the sequences directly as formulas in the repository; however, those formulas should be pushed down as columns in the database table if optimal performance is sought.

In order to set the Logical Sequence Numbers, we need to open the RPD (the SampleApp one in this test) and select the Time dimension we want to change.
[Screenshot: the Time dimension selected in the RPD]

After selecting any level apart from the Grand Total (the top of the hierarchy), a new tab called "Sequence Numbers" should be visible. However, if, as in our case, none of the columns at that level are integers or doubles, the sequence number selector is disabled.

[Screenshot: the Sequence Numbers tab with the selector disabled]

In order to enable the selector, we first need to create a sequence column in our dimension and bring it to the correct level in the hierarchy. For the Year level there is already an integer column named "T35 Cal Year" which can be used as a sequence. We simply need to drag the column to the Year level in the Time hierarchy and set it as the Absolute Sequence.

[Screenshot: "T35 Cal Year" set as the absolute sequence at the Year level]

I can do the same with the Month level in the hierarchy and the "T32 Cal Month" column. Note that the column contains only the month enumeration from 1 to 12, so we need to set the sequence number as Relative to the Year level.

[Screenshot: "T32 Cal Month" set as the sequence relative to the Year level]

Please note that both absolute and relative LSNs can be (and should be) entered since, as discussed above, each has a different use case. In addition, relative LSNs should be set for all logical parent levels in the hierarchy, since they will be used only if the grain of the time shift matches that of the parent level. For example, a monthly LSN based on the Year logical level will only be used in AGO functions having a year shift, and not in the case of a quarterly shift.

For optimal usage, every level of the time hierarchy should have one absolute LSN and a relative LSN for each of the parent levels in the hierarchy.

Impact on SQL

It's time now to review the SQL generated by our analysis and check the differences with the old-school time-series query.

When creating an analysis at year level like the following
[Screenshot: the year-level analysis]

As expected, the BI Server (OBIS) uses the CAL_YEAR column as the sequence instead of the DENSE_RANK() function over the PER_NAME_YEAR column.

WITH  
OBICOMMON0 AS (select T795.CAL_YEAR as c1,  
     T795.QTR_KEY as c2,
     T795.PER_NAME_YEAR as c3
from  
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ )

When using TODATE, both the absolute and the relative sequences are used, as in the following example where the measure "166 Revenue Year To Date" is defined by the formula:

TODATE("01 - Sample App"."F0 Sales Base Measures"."1- Revenue",  "01 - Sample App"."H0 Time"."Year" )  

[Screenshot: analysis using the "166 Revenue Year To Date" measure]

The following query gets generated; note the usage of Cal_Year and Cal_Month in the sequence generation query instead of the DENSE_RANK() function, as per the RPD settings mentioned above.

WITH  
OBICOMMON0 AS (select T653.Cal_Year as c1,  
     T653.Cal_Month as c2,
     T653.Per_Name_Month as c3,
     T653.Per_Name_Year as c4
from  
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ )
...

Are Logical Sequence Numbers Useful?

Most of the time, the bottleneck when using time series is not in the sequence generation, since the time dimension cardinality is rarely big enough to produce a noticeable delay in the query time, but rather in poorly managed calculations made at query time on top of massive fact tables.

Don't expect LSNs to solve all your performance problems with time series. However, the usage of Logical Sequence Numbers provides OBIEE with a way of pre-cooking part of the calculation and so, in theory, should help performance. The small effort required to set them up centrally in the time hierarchy is outweighed by the benefits at query time, without having to touch any pre-defined time-series calculation.

If you do have performance problems with your OBIEE system, or would like to ensure that a system you’re building will be performant from the outset, please get in touch to find out more about our Performance Analytics service!
We also provide expert OBIEE training, implementations, QA and health checks - to find out more about how we can help you, please contact us!