Tag Archives: Hyperion
Getting The Users’ Trust – Part 2
Last time I wrote about the performance aspects of a BI system and how they could affect a user’s confidence. I concluded by mentioning that incorrect data might be generated by poorly coded ETL routines causing data loss or duplication. This time I am looking more at the quality of the data we load (or don’t load).
Back in the 1990s I worked with a 4.5 TB DWH that had a single source for fact and reference data; that is, the data loaded was self-consistent. These days a single-source DWH is less and less the case; we are adding multiple data sources, both internal and external. Customers can now appear in CRM, ERP, social media, credit referencing, loyalty, and a whole host of other systems. This proliferation of data sources gives rise to a variety of issues we need to be at least aware of and, in reality, should be actively managing. Some of these issues require us to work out processing rules within our data warehouse, such as what to do with fact data that arrives before its supporting reference data. I once had a system where our customer source could only be extracted once a week, but purchases made by new customers would appear in our fact feed immediately after customer registration. Obviously, it is a business call whether we publish facts that involve yet-to-be-loaded customers straight away or defer those loads until the customer has been processed in the DWH. In the case of my example we needed to auto-create new customers in the data warehouse with just the minimum of data, the surrogate key and the business key, and then do an SCD type 1 update when the full customer data profile was loaded the following week. Technical issues such as these are trivial: we formulate and agree a business rule to define our actions and implement it in our ETL or, possibly, the reporting code. In my opinion the bigger issues to resolve are in Data Governance and Data Quality.
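As a minimal sketch of that late-arriving customer handling (the table, column and sequence names below are hypothetical, and in practice the logic would normally live in an ODI mapping rather than hand-written SQL), the first statement auto-creates stub customer rows during the fact load and the second does the SCD type 1 overwrite when the weekly customer extract arrives:

```sql
-- Sketch only: DIM_CUSTOMER, STG_SALES, STG_CUSTOMER and DIM_CUSTOMER_SEQ are assumed names.

-- 1. During the fact load: create a stub row for any customer business key
--    that is not yet in the dimension.
INSERT INTO dim_customer (customer_sk, customer_bk, customer_name, record_status)
SELECT dim_customer_seq.NEXTVAL, s.customer_bk, 'UNKNOWN', 'STUB'
FROM   (SELECT DISTINCT f.customer_bk
        FROM   stg_sales f
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dim_customer d
                           WHERE  d.customer_bk = f.customer_bk)) s;

-- 2. During the weekly customer load: SCD type 1 overwrite, so stubs pick up
--    their full profile and existing rows pick up any changed attributes.
MERGE INTO dim_customer d
USING stg_customer s
ON    (d.customer_bk = s.customer_bk)
WHEN MATCHED THEN UPDATE SET
       d.customer_name = s.customer_name,
       d.record_status = 'FULL'
WHEN NOT MATCHED THEN INSERT
       (customer_sk, customer_bk, customer_name, record_status)
VALUES (dim_customer_seq.NEXTVAL, s.customer_bk, s.customer_name, 'FULL');
```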
Some people combine Data Quality and Governance into a single topic and believe that a single solution will put everything right. However, to my mind, they are completely separate issues. Data quality is about the content of the data; governance is about ownership, provenance and business management of the data. Today, Data Governance is increasingly becoming a regulatory requirement, especially in finance.
Governance is much more than the data lineage tools we might access in ETL tools such as ODI and even OWB. ETL lineage is about source to target mappings; our ability to say that ‘bank branch name’ comes from this source attribute, travels through these multiple ODI mappings and finally updates that column in our BANK_BRANCH dimension table. In true Data Governance we probably do some or all of these:
- Create a dictionary of approved business terms. This will define every attribute in business terms and also provide translations between geographic and business-unit-centric ways of viewing data. In finance, one division may talk about a “customer”, another division will say “investor”, and a third says “borrower”; in all three cases we are really talking about the same kind of object, a person. This dictionary should go down to the level of individual attributes and measures and include the type of data being held, such as text, currency or date-time; these are logical types, not the physical types seen on the actual sources. It is important that this dictionary is shared throughout the organisation and is “the true definition” of what is reported (a minimal relational sketch of such a dictionary follows this list).
- Define ownership (or stewardship) for the approved business data item.
- Map business data sources and targets to our approved list of terms (at attribute level). It is very possible that some attributes will have multiple potential sources; in such cases we must specify which source is the master.
- Define processes to keep our business data aligned.
- Define ownership of the sources, for design-change accountability (and, for static data such as ISO country codes, for content-change accountability). Possibly integrate this with the change notification mechanism of the change process.
- Define data release processes for approved external reference data.
- Define data access and redaction rules for compliance purposes.
- Build in audit and control.
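To make the dictionary and source-mapping items above a little more concrete, here is a minimal relational sketch (all table and column names are hypothetical; in practice this information usually lives in a governance or metadata tool rather than hand-rolled tables) of a business glossary with stewards, plus a term-to-source mapping that flags the master source for each term:

```sql
-- Sketch only: names and structure are illustrative, not a product schema.
CREATE TABLE business_term (
  term_id       NUMBER        PRIMARY KEY,
  term_name     VARCHAR2(100) NOT NULL,    -- e.g. 'Customer' (a.k.a. Investor, Borrower)
  definition    VARCHAR2(4000),            -- the agreed business definition
  logical_type  VARCHAR2(30),              -- text, currency, date-time, ...
  data_steward  VARCHAR2(100)              -- accountable owner of this term
);

CREATE TABLE term_source_map (
  term_id       NUMBER REFERENCES business_term (term_id),
  source_system VARCHAR2(100),             -- e.g. 'CRM', 'ERP', 'Loyalty'
  source_column VARCHAR2(200),             -- physical attribute on that source
  is_master     CHAR(1) DEFAULT 'N',       -- exactly one 'Y' row expected per term
  CONSTRAINT term_source_map_pk PRIMARY KEY (term_id, source_system, source_column)
);
```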
Whereas governance is about using the right data and having the processes and people to guarantee it is correctly sourced, Data Quality is much finer grained and looks at the actual content. Here a tool such as Oracle Enterprise Data Quality (OEDQ) is invaluable. By the way, I have noticed that OEDQ version 12 has recently been released; I have a blog post on this in the pipeline.
I tend to divide Data Quality into three disciplines:
- Data Profiling is always going to be our first step. Before we fix things we need to know what to fix! Generally we profile a sample of the data and assess it column by column and row by row to build a picture of the actual content. Typically we look at data ranges, nulls, the number of distinct values and, for text data, the character types used (alpha, letter case, numeric, accents, punctuation, etc.) and matching regular expressions. From this we develop a plan to tackle quality; for example, on a data-entry web page we may want to tighten processing rules to prevent certain “anticipated” errors, but more usually we come up with business rules to apply in our next stage (see the SQL sketches after this list).
- Data Assessment. Here we test the full dataset against the developed rules to identify data that conforms or needs remedy. That remedy could be referring the data back to the source system owner for correction, providing a set of data fixes to apply to the source which can be validated and applied as a batch, creating processes to “fix” data on the source at initial data entry, or (and I would strongly advise against this for governance reasons) dynamically fixing the data in an ETL process. The reason I am against fixing data downstream in ETL is that the data we report on in our data warehouse will not match the source, and this becomes a problem when we try to validate whether our data warehouse reflects reality.
- Data de-duplication. This final discipline of our DQ process is the most difficult: identifying data that is potentially duplicated in our data feed. In data quality terms a duplicate is where two or more rows refer to what is probably (statistically) the same item; this is a lot fuzzier than an exact match in database terms. People miskey data, call-centre staff mishear names, companies merge and combine data sets; I have even seen customers register a new email address because they cannot be bothered to reset their password on an e-commerce website. De-duplication is important for improving the accuracy of BI in general, and it is nigh-on mandatory for organisations that need to manage risk and prevent fraud.
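As a minimal sketch of the profiling and de-duplication steps above (run against a hypothetical STG_CUSTOMER staging table; a dedicated tool such as OEDQ would do this far more thoroughly), the first query profiles a single column and the second scores candidate duplicates using Oracle's UTL_MATCH package:

```sql
-- Profiling sketch: null rate, cardinality and length range for one column.
SELECT COUNT(*)                    AS total_rows,
       COUNT(*) - COUNT(email)     AS null_emails,
       COUNT(DISTINCT email)       AS distinct_emails,
       MIN(LENGTH(email))          AS min_length,
       MAX(LENGTH(email))          AS max_length
FROM   stg_customer;

-- De-duplication sketch: fuzzy-match customer names within the same postcode.
-- JARO_WINKLER_SIMILARITY returns 0-100; the threshold of 90 and the postcode
-- blocking key are arbitrary starting points for review, not rules.
SELECT a.customer_id,
       b.customer_id AS possible_duplicate_id,
       UTL_MATCH.JARO_WINKLER_SIMILARITY(a.customer_name, b.customer_name) AS similarity
FROM   stg_customer a
JOIN   stg_customer b
  ON   a.postcode = b.postcode
 AND   a.customer_id < b.customer_id
WHERE  UTL_MATCH.JARO_WINKLER_SIMILARITY(a.customer_name, b.customer_name) >= 90;
```

Matches flagged by a query like this still need human review before rows are merged, which is exactly the kind of workflow a tool like OEDQ manages.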
OBIEE Workspace Integration Requires a Browser Plugin
Oracle BI 11.1.1.7 can now integrate with Workspace, the well-known entry point for Oracle EPM. From a browser perspective, however, this integration requires additional plug-in configuration on each user's browser in order to render the interface correctly. Users who have already attempted the OBIEE to Workspace integration may have seen an alert or notice prompting them to install a browser plug-in.
Rittman Mead / ODTUG India BI Masterclass Tour Roundup
Over the past week Venkat and I, along with the Rittman Mead India team, have been running a series of BI Masterclasses at locations across India, in conjunction with ODTUG, the Oracle Development Tools User Group. Starting in Bangalore, then travelling to Hyderabad and Mumbai, we presented on topics ranging from OBIEE and Exalytics through to EPM Suite and the BI Applications, with networking events at the end of each day.
Around 50 people attended in Bangalore, 30 in Hyderabad and 40 in Mumbai, and at the last event we were joined by Harsh Bhogle from the local Oracle office, who presented on Oracle’s high-level strategy around business analytics. Thanks to everyone who attended, thanks to ODTUG for sponsoring the networking events, and thanks especially to Vijay and Pavan from Rittman Mead India, who organised everything behind the scenes. If you’re interested, here’s a Flickr set of photos from all three events (plus a few from the start of the week, when I visited our offices in Bangalore).
For anyone who couldn’t attend the events, or if you were there and you’d like copies of the slides, the links below are for the PDF versions of the sessions we presented at various points over the week.
- Oracle BI, Analytics and EPM Product Update
- Extreme BI: Agile BI Development using OBIEE, ODI and Golden Gate
- OBIEE 11g Integration with the Oracle EPM Stack
- OBIEE and Essbase on Exalytics Development & Deployment Best Practices
- OBIEE 11g Security Auditing
- Intro and tech deep dive into BI Apps 11g + ODI
- Metadata & Data loads to EPM using Oracle Data Integrator
So I’m writing this in my hotel room in Mumbai on Sunday morning, waiting for the airport transfer and then flying back to the UK around lunchtime. It’s been a great week but my only regret was missing the UKOUG Apps’13 conference last week, where I was also supposed to be speaking but managed to double-book myself with the event in India.
In the end, Mike Vickers from Rittman Mead in the UK gamely took my place and presented my session, which was put together as a joint effort with Minesh Patel, another of the team in the UK and one of our BI Apps specialists. Entitled “Oracle BI Apps – Giving the Users the Reports they *Really* Want”, it’s a presentation around the common front-end customisations that we typically carry out for customers who want to move beyond the standard, generic dashboards and reports provided by the BI Apps, and again if you missed the session or you’d like to see the slides, they’re linked-to below:
That’s it for now – and I’ll definitely be at Tech’13 in a few weeks’ time, if only because I’ve just realised I’m delivering the BI Masterclass sessions on the Sunday, including a session on OBIEE/ODI and Hadoop integration - I’ve been saying to myself I’d like to get these two tools working with Impala as an alternative to Hive, so that gives me something to start looking at on the flight back later today.
Installing Essbase Analytics Link for HFM 11.1.2.3
If you need to install EAL for HFM, this post is for you. Oracle has not yet released a new version of EAL that is certified with HFM 11.1.2.3, so you will not find one on Oracle eDelivery. The question, then, is which version of EAL can be used with EPM 11.1.2.3. A few days ago MOS published Doc ID 1570187.1, which says the EAL 11.1.2.2.301 PSU can be used for this. This PSU is a full installation and is available for download from the MOS site. Note that any previous installation of EAL must be uninstalled before installing this release.
The prerequisites from the documentation include the following (these apply only if you have an existing EAL instance, not to fresh installations):
- Clear the existing Analytics Link repository (this is unfortunate, since you will need to redefine the connections, regions, bridges, etc.)
- Unregister previous instance of Analytics Link Server from Foundation Services
- Reset Data Sync Server
This post demonstrates a fresh installation on a 64-bit Windows Server 2008 machine. Rather than a full step-by-step guide, it highlights the key steps in the installation.
Unlike other EPM products, EAL is not installed with the EPM System Installer. To install EAL 11.1.2.2.301 you should use the 32-bit version of Oracle Universal Installer (OUI) 11.2, even if the installation is going to be on a 64-bit machine; the OUI will install the Analytics Link version that matches the bitness of the operating system. The OUI comes with the EAL download, so no separate download is required. Alternatively, you can use the OUI that comes with an Oracle database installation if you have one on the same server.
Installation:
Run the OUI installer; select the installation type and destination folder.
To let OUI know which product we are going to install, we need to browse to the path where products.xml exists under the unzipped EAL download.
Specify the path where you want to install EAL and kick off the installation. Make sure the installer displays version 11.1.2.2.301 during the install.
Once the installation is finished, the configuration tool starts up. You must enter the WebLogic server details and the Analytics Link repository details when prompted.
The Doc ID mentioned earlier also suggests not using the default EPM Instance Home location at the Foundation Configuration step.
Give a suitable username and password for the Data Sync Server, and use an account with Administrator rights to configure the Analytics Link services. Unfortunately, the configuration tool doesn’t show the progress of the configuration, so you’ll have to wait until you see the ‘success’ message in the window. As noted earlier, this is a full installation and cannot be rolled back.
Verifying the installation:
After a successful installation, to verify that EAL can connect to HFM we log on to the Essbase Administration Services (EAS) Console (use the client installers to install the EAS Console if you don’t already have it) and import the EAL plug-in, as shown in the screenshots below. Go to Tools > Configure components and click Add to import the EAL plug-in.
Navigate to the directory where the Analytics Link server is installed (HFS_HOME, which is C:\EAL_Home in our case) and import the eas-plugin_wl.jar file.
After the import is finished, you may need to exit and restart EAS Console to see the ‘Analytics Link Servers’ node.
Add a new Analytics Link Server by specifying the username and password that were given at configuration time.
Now that we have successfully imported the EAL plug-in and added our first Analytics Link Server, to verify the HFM connectivity we need to define which HFM application EAL should connect to and to which Essbase database it should write the outline and data derived from that HFM application. Basically, we configure all the objects under the Analytics Link Server we just added.
HFM Server and Application:
Essbase Application and Database:
After you define the Data Sync Server and Data Store, create a bridge that acts as the link between the HFM application and the Essbase database, refreshing the outline and data.
Open the bridge, create a bridge application and check if the outline is created.
Now we can conclude that there are no configuration-related issues, since we are able to refresh the metadata to Essbase without any problems. Hopefully this gives a good walk-through of installing and configuring Essbase Analytics Link for HFM.