Tag Archives: Hyperion

Getting The Users’ Trust – Part 2

Last time I wrote about the performance aspects of a BI system and how they could affect a user’s confidence. I concluded by mentioning that incorrect data might be generated by poorly coded ETL routines causing data loss or duplication. This time I am looking more at the quality of the data we load (or don’t load).

Back in the 1990s I worked with a 4.5 TB DWH that had a single source for fact and reference data; that is, the data loaded was self-consistent. These days a single-source DWH is less and less often the case; we are adding multiple data sources, both internal and external. Customers can now appear in CRM, ERP, social media, credit referencing, loyalty, and a whole host of other systems. This proliferation of data sources gives rise to a variety of issues that we need to be at least aware of and, in reality, should be actively managing.

Some of these issues require us to work out processing rules within our data warehouse, such as what to do with fact data that arrives before its supporting reference data. I once had a system where our customer source could only be extracted once a week, but purchases made by new customers would appear in our fact feed immediately after customer registration. Obviously, it is a business call whether we publish facts that involve yet-to-be-loaded customers straight away or defer those loads until the customer has been processed in the DWH. In the case of my example we needed to auto-create new customers in the data warehouse with just the minimum of data, the surrogate key and the business key, and then do an SCD Type 1 update when the full customer data profile was loaded the following week. Technical issues such as these are relatively trivial: we formulate and agree a business rule to define our actions and implement it in our ETL or, possibly, the reporting code. In my opinion the bigger issues to resolve are in Data Governance and Data Quality.
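
To make that pattern concrete, here is a minimal sketch in Python of the “skeleton customer” approach described above: the fact load auto-creates a dimension row holding just the surrogate key and business key, and the weekly customer feed later overwrites it as an SCD Type 1 update. All of the names (dim_customer, get_or_create_customer_sk, the feed layouts) are illustrative, not taken from any real system.

```python
# Minimal sketch of late-arriving dimension handling: auto-create a
# "skeleton" customer when a fact arrives before its reference data,
# then overwrite it in place (SCD Type 1) when the full profile arrives.
# All names are illustrative.

from itertools import count

_surrogate_keys = count(1)        # stand-in for a database sequence
dim_customer = {}                 # business_key -> dimension row


def get_or_create_customer_sk(business_key):
    """Return the surrogate key, auto-creating a skeleton row if needed."""
    row = dim_customer.get(business_key)
    if row is None:
        row = {"customer_sk": next(_surrogate_keys),
               "business_key": business_key,
               "name": "UNKNOWN",          # minimum attributes only
               "segment": "UNKNOWN"}
        dim_customer[business_key] = row
    return row["customer_sk"]


def apply_weekly_customer_feed(feed_rows):
    """SCD Type 1: overwrite skeleton (or stale) attributes in place."""
    for src in feed_rows:
        sk = get_or_create_customer_sk(src["business_key"])
        dim_customer[src["business_key"]].update(
            {"customer_sk": sk,
             "name": src["name"],
             "segment": src["segment"]})


# Fact load: purchases can reference customers we have not yet seen.
fact_sales = []
for purchase in [{"business_key": "C042", "amount": 19.99}]:
    fact_sales.append({"customer_sk": get_or_create_customer_sk(purchase["business_key"]),
                       "amount": purchase["amount"]})

# A week later the full profile arrives and the skeleton is overwritten.
apply_weekly_customer_feed([{"business_key": "C042",
                             "name": "Jane Smith", "segment": "Retail"}])
```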

Some people combine Data Quality and Data Governance into a single topic and believe that a single solution will put everything right. However, to my mind, they are completely separate issues: data quality is about the content of the data, while governance is about ownership, provenance and business management of the data. Today, Data Governance is increasingly becoming a regulatory requirement, especially in finance.

Governance is much more than the data lineage functionality we might access in ETL tools such as ODI and even OWB. ETL lineage is about source-to-target mappings: our ability to say that ‘bank branch name’ comes from this source attribute, travels through these multiple ODI mappings and finally updates that column in our BANK_BRANCH dimension table. In true Data Governance we probably do some or all of the following:

  • Create a dictionary of approved business terms. This will define every attribute in business terms and also provide translations between geographic and business-unit-centric ways of viewing data. In finance, one division may talk about “customer”, another division will say “investor”, and a third says “borrower”; in all three cases we are really talking about the same kind of object, a person. This dictionary should go down to the level of individual attributes and measures and include the type of data being held, such as text, currency or date-time; these data types are logical types and not the physical types seen on the actual sources. It is important that this dictionary is shared throughout the organisation and is “the true definition” of what is reported.
  • Define ownership (or stewardship) for each approved business data item.
  • Map business data sources and targets to our approved list of terms (at attribute level). It is very possible that some attributes will have multiple potential sources; in such cases we must specify which source is the master.
  • Define processes to keep our business data aligned.  
  • Define ownership of the sources, for accountability for design changes (and, in the case of static data such as ISO country codes, content changes). Possibly integrate this with the change notification mechanism of the change process.
  • Define data release processes for approved external reference data.
  • Define data access and redaction rules for compliance purposes.
  • Build in audit and control.
As you can see we are not, in the main, talking about data content; instead we are improving our description of the business data beyond what is already held in database data dictionaries and XSD files. This is still metadata, and it is almost certainly best managed in some kind of Data Governance application. One tool we might consider for this is Oracle Data Relationship Management from the Hyperion family of products. If we want to go more DIY, it may be possible to leverage some of the data responsibility features of Oracle SQL Developer Data Modeler.
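
As a purely illustrative example of the kind of metadata such a dictionary might hold, the sketch below uses a simple Python dataclass to capture an approved term, its logical data type, divisional synonyms, steward and master source. The field names are assumptions made for illustration and are not tied to DRM, SQL Developer Data Modeler or any other tool.

```python
# Illustrative sketch of a business-term dictionary entry, capturing the
# kind of metadata discussed above (logical type, synonyms, steward,
# master source).  Field names are assumptions, not any product's schema.

from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    name: str                            # approved business term
    definition: str                      # business-language definition
    logical_type: str                    # e.g. "text", "currency", "date-time"
    synonyms: list[str] = field(default_factory=list)   # "investor", "borrower", ...
    steward: str = ""                    # accountable owner / steward
    master_source: str = ""              # which system is the master source
    other_sources: list[str] = field(default_factory=list)


customer = BusinessTerm(
    name="Customer",
    definition="A person or organisation that holds at least one product with us",
    logical_type="text",
    synonyms=["Investor", "Borrower"],
    steward="Head of Customer Data",
    master_source="CRM",
    other_sources=["ERP", "Loyalty"],
)
```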

Whereas governance is about using the right data and having processes and people to guarantee it is correctly sourced, Data Quality is much finer grained and looks at the actual content. Here a tool such as Oracle Enterprise Data Quality (OEDQ) is invaluable. By the way, I have noticed that OEDQ version 12 has recently been released; I have a blog post on this in the pipeline.

I tend to divide Data Quality into three disciplines:

  • Data Profiling is always going to be our first step. Before we fix things we need to know what to fix! Generally, we profile a sample of the data and assess it column by column, row by row, to build a picture of the actual content. Typically we look at data range, nulls, the number of distinct values and, in the case of text data, the character types used (alpha, letter case, numeric, accents, punctuation etc.) and regular expressions. From this we develop a plan to tackle quality: for example, on a data entry web page we may want to tighten processing rules to prevent certain “anticipated” errors; more usually we come up with business rules to apply in our next stage (see the sketch after this list).
  • Data Assessment. Here we test the full dataset against the developed rules to identify data that conforms or needs remedy. This remedy could be referring the data back to the source system owner for correction, providing a set of data fixes to apply to the source which can be validated and applied as a batch, creating processes to “fix” data on the source at initial data entry, or (and I would strongly advise against this for governance reasons) dynamically fixing it in an ETL process. The reason I am against fixing data downstream in ETL is that the data we report on in our Data Warehouse will no longer match the source, and this becomes problematic when we try to validate whether our data warehouse fits reality.
  • Data de-duplication. This final discipline of our DQ process is the most difficult: identifying data that is potentially duplicated in our data feed. In data quality terms a duplicate is where two or more rows refer to what is probably (statistically) the same item; this is a lot fuzzier than an exact match in database terms. People miskey data, call centre staff mishear names, companies merge and combine data sets; I have even seen customers registering a new email address because they cannot be bothered to reset their password on an e-selling website. De-duplication is important to improve the accuracy of BI in general; it is nigh-on mandatory for organisations that need to manage risk and prevent fraud (see the matching sketch below).
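
Below is a toy sketch, in plain Python, of the kind of per-column statistics gathered during profiling and a simple rule check of the sort used in the assessment stage. A real exercise would of course use a tool such as OEDQ; the metrics, column names and rule here are just examples.

```python
# Toy column profile: nulls, distinct values, min/max and the character
# classes seen in text values, plus a simple business-rule check.
# Purely illustrative; column names and rules are made up.

import re
from collections import Counter


def profile_column(values):
    """Gather basic per-column statistics from a sample of values."""
    non_null = [v for v in values if v not in (None, "")]
    profile = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
    }
    classes = Counter()                  # character classes in text values
    for v in non_null:
        s = str(v)
        if re.search(r"[A-Za-z]", s):
            classes["alpha"] += 1
        if re.search(r"\d", s):
            classes["numeric"] += 1
        if re.search(r"[^\w\s]", s):
            classes["punctuation"] += 1
    profile["char_classes"] = dict(classes)
    return profile


def failing_rows(rows, column, rule):
    """Data assessment: return rows where a column breaks a business rule."""
    return [r for r in rows if not rule(r[column])]


customers = [{"id": 1, "postcode": "BN1 4DX"}, {"id": 2, "postcode": "unknown"}]
print(profile_column([r["postcode"] for r in customers]))
# e.g. flag postcodes that do not match a simple UK-style pattern
print(failing_rows(customers, "postcode",
                   lambda p: re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}", p or "")))
```
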
Data Quality is so important to trusted BI; without it we run the risk that our dimensions do not roll up correctly and that we under-report by leaving our duplicates separate. However, being correct in the data warehouse is only part of the story; these corrections also need to be made on the sources, and to do that we have to implement processes and disciplines throughout the organisation.
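
To illustrate the fuzzy nature of the de-duplication step described above, here is a toy sketch that normalises names and scores pairs with Python’s built-in SequenceMatcher. Real matching engines use far richer rules (addresses, dates of birth, phonetics, probabilistic scoring); the threshold and sample data here are purely illustrative.

```python
# Toy duplicate detection: normalise names, then score pairs with a
# simple string-similarity ratio.  Real de-duplication uses far richer
# rules (address, date of birth, phonetic matching, probabilistic scores).

import re
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, strip punctuation and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", name)).strip().lower()


def likely_duplicates(records, threshold=0.8):
    """Return pairs of records whose normalised names look like the same person."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            score = SequenceMatcher(None, normalise(a["name"]),
                                    normalise(b["name"])).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


customers = [
    {"id": 1, "name": "Jon Smyth"},
    {"id": 2, "name": "John Smith"},
    {"id": 3, "name": "Ann Jones"},
]
print(likely_duplicates(customers))   # (1, 2) scores ~0.84; (3, *) does not
```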
 
For BI that users can trust we need to combine both data management disciplines. From governance we need to be sure that we are using the correct business terms for all attributes and that the data displayed in those attributes has made the correct journey from the original source. From quality we gain confidence that we are correctly aggregating data in our reporting.
 
At the end of the day we need to be right to be trusted.

 

 

OBIEE Workspace Integration Requires a Browser Plugin

Oracle BI 11.1.1.7 now has the ability to integrate with the well-known entry point for Oracle EPM, Workspace. From a browser perspective, however, this integration requires additional plug-in configuration in each user's browser in order to render the interface correctly. Some users attempting this OBIEE-to-Workspace integration may already have seen an alert or notice to install a plug-in for their browser.

Rittman Mead / ODTUG India BI Masterclass Tour Roundup

Over the past week Venkat, myself and the Rittman Mead India team have been running a series of BI Masterclasses at locations in India, in conjunction with ODTUG, the Oracle Development Tools User Group. Starting off in Bangalore, then traveling to Hyderabad and Mumbai, we presented on topics ranging from OBIEE through Exalytics through to EPM Suite and BI Applications, and with networking events at the end of each day.

Around 50 people attended in Bangalore, 30 in Hyderabad and 40 in Mumbai, and at the last event we were joined by Harsh Bhogle from the local Oracle office, who presented on Oracle’s high-level strategy around business analytics. Thanks to everyone who attended, thanks to ODTUG for sponsoring the networking events, and thanks especially to Vijay and Pavan from Rittman Mead India who organised everything behind the scenes. If you’re interested, here’s a Flickr set of photos from all three events (plus a few from the start of the week, when I visited our offices in Bangalore).

For anyone who couldn’t attend the events, or if you were there and you’d like copies of the slides, the links below are for the PDF versions of the sessions we presented at various points over the week.

So I’m writing this in my hotel room in Mumbai on Sunday morning, waiting for the airport transfer and then flying back to the UK around lunchtime. It’s been a great week but my only regret was missing the UKOUG Apps’13 conference last week, where I was also supposed to be speaking but managed to double-book myself with the event in India.

In the end, Mike Vickers from Rittman Mead in the UK gamely took my place and presented my session, which was put together as a joint effort with Minesh Patel, another member of the team in the UK and one of our BI Apps specialists. Entitled “Oracle BI Apps – Giving the Users the Reports they *Really* Want”, it’s a presentation around the common front-end customisations we typically carry out for customers who want to move beyond the standard, generic dashboards and reports provided by the BI Apps; again, if you missed the session or you’d like to see the slides, they’re linked to below:

That’s it for now – I’ll definitely be at Tech’13 in a few weeks’ time, if only because I’ve just realised I’m delivering the BI Masterclass sessions on the Sunday, including a session on OBIEE/ODI and Hadoop integration. I’ve been saying to myself that I’d like to get these two tools working with Impala as an alternative to Hive, so that gives me something to start looking at on the flight back later today.

Installing Essbase Analytics Link for HFM 11.1.2.3

If your requirement is to install EAL for HFM, then this post is for you. Oracle has not yet released a new version of EAL that is certified with HFM 11.1.2.3, so you will not be able to find one on eDelivery. The question is: which version of EAL can be used with EPM 11.1.2.3? A few days ago MOS published Doc ID 1570187.1, which says we can use the EAL 11.1.2.2.301 PSU for this. This PSU is a full installation and is available for download from the MOS site. Also, any previous installation of EAL must be uninstalled before installing this release.

The pre-requisites from the documentation (applicable only if you have an existing EAL instance, not to fresh installations) include:

  • Clear the existing Analytics Link repository (this is bad since you need to redefine the connections, regions, bridges etc.)
  • Unregister previous instance of Analytics Link Server from Foundation Services
  • Reset Data Sync Server

This post will demonstrate a fresh installation on a Windows 2008 64-bit server. Also, this post will not give you a step-by-step approach but will highlight the key steps in the installation.

Unlike other EPM products, EAL cannot be installed with the EPM System Installer. To install EAL 11.1.2.2.301, you should use the 32-bit version of Oracle Universal Installer 11.2, even if the installation is going to be on a 64-bit machine; the OUI will install the Analytics Link version that matches the bitness of the operating system. The OUI comes with the EAL download, so no separate download is required. Alternatively, you can use the OUI that comes with an Oracle database if you have one installed on the same server.

Installation:

Run the OUI installer; select the installation type and destination folder.

To let OUI know what product we’re going to install, we need to browse to the path where products.xml exists under the unzipped EAL part.

Specify the path where you want to install EAL and kick off the installation. Make sure the installer displays version 11.1.2.2.301 during the install.

Once the installation is finished, the configuration tool starts up. You must enter the WebLogic server details and the Analytics Link repository details when prompted.

The Doc ID mentioned earlier also suggests not using the default EPM Instance Home location at the Foundation Configuration step.

Give a suitable username and password for the Data Sync Server, and use an account that is an Administrator to configure the Analytics Link services. Unfortunately, the configuration tool doesn’t show the progress of the configuration, so you’ll have to wait until you see the ‘success’ message on the window. As mentioned earlier, this is a full installation and cannot be rolled back.

Verifying the installation:

After a successful installation, to verify that EAL is able to connect to HFM we have to log on to the Essbase Administration Services Console (use the client installers to install the EAS Console) and import the EAL plug-in: go to Tools > Configure Components and click Add to import the plug-in.

Navigate to the directory where the Analytics Link server is installed (HFS_HOME, which is C:\EAL_Home in our case) and import the jar file eas-plugin_wl.jar.

After the import is finished, you may need to exit and restart EAS Console to see the ‘Analytics Link Servers’ node.

Add a new Analytics Link Server by specifying the username and password that were given at the time of configuration.

Now that we have successfully imported the EAL plug-in and added our first Analytics Link Server, to verify HFM connectivity we need to define which HFM application EAL should connect to, and to which Essbase database it should write the outline and data based on that HFM application. Basically, we should configure all the objects under the newly added Analytics Link Server.

HFM Server and Application:

Essbase Application and Database:

After you define the Data Sync Server and Data Store, create a bridge that acts as a link between the HFM application and the Essbase database, refreshing the outline and data.

Open the bridge, create a bridge application and check if the outline is created.

Now we can conclude that there are no configuration-related issues, since we are able to refresh the metadata to Essbase without any problems. I hope this gives a good walkthrough of installing and configuring Essbase Analytics Link for HFM.