Rittman Mead BI Forum 2015 Now Open for Registration!
I’m very pleased to announce that the Rittman Mead BI Forum 2015, running in Brighton and Atlanta in May 2015, is now open for registration.
Back for its seventh successful year, the Rittman Mead BI Forum once again will be showcasing the best speakers and presentations on topics around Oracle Business Intelligence and data warehousing, with two events running in Brighton, UK and Atlanta, USA in May 2015. The Rittman Mead BI Forum is different to other Oracle tech events in that we keep the numbers attending limited, topics are all at the intermediate-to-expert level, and we concentrate on just one topic – Oracle Business Intelligence Enterprise Edition, and the technologies and products that support it.
As in previous years, the BI Forum will run on two consecutive weeks, starting in Brighton and then moving over to Atlanta for the following week. Here are the dates and venue locations:
- Rittman Mead BI Forum 2015 – Hotel Seattle, Brighton, May 6th – 8th 2015
- Rittman Mead BI Forum 2015 – Renaissance Atlanta Midtown Hotel, May 13th – 15th 2015
This year our optional one-day masterclass will be delivered by Jordan Meyer, our Head of R&D, and myself, on the topic of “Delivering the Oracle Big Data and Information Management Reference Architecture” that we launched last year at our Brighton event. Details of the masterclass, and the speaker and session line-up at the two events, are on the Rittman Mead BI Forum 2015 homepage.
Each event has its own agenda, but both will focus on the technology and implementation aspects of Oracle BI, DW, Big Data and Analytics. Most of the sessions run for 45 minutes, but on the first day we’ll be holding a debate and on the second we’ll be running a data visualization “bake-off”; details of this, the masterclass, the keynotes and our special guest speakers will be revealed on this blog over the next few weeks – watch this space!
Data Lineage and Impact Analysis in Oracle BI Apps 11.1.1.8.1
In yesterday’s post I took a look at one of the new features in the 11.1.1.8.1 release of the BI Applications; integration between BI Apps 11g and Oracle Endeca Information Discovery. Whilst we’re on the topic then, I thought it’d be worth taking a look at another new feature introduced with BI Apps 11.1.1.8.1 – data lineage and impact analysis.
So what exactly are data lineage and impact analysis? Data lineage is the path that data takes through your system from source to the final target reports and dashboards, describing the lifecycle from raw data through to the processed, validated and transformed information presented to your users. Impact analysis is what you do to determine which downstream data items will be affected by a change to a source table, column or data mapping; it has been a feature of many Oracle data integration tools in the past, including Oracle Warehouse Builder and ODI11g, as shown in the screenshots below.
You could also trace data lineage through the earlier 7.9.x releases of Oracle BI Applications by starting at the DAC Console, which recorded the source tables and columns for a particular target warehouse table, along with the DAC tasks (usually corresponding to Informatica workflows of the same name) used to load those tables. The DAC stopped at the warehouse layer, though, and didn’t contain any details of the dashboards and reports that used the warehouse data, so if you wanted to trace data lineage back from a particular report or presentation-layer column you had to step through the process manually, in a way I illustrated in this blog post from a few years ago. What these new data lineage and impact analysis features in BI Apps 11.1.1.8.1 do for us is bring the two sets of metadata together, along with configuration data from the BI Applications Configuration Manager application, to create an end-to-end data lineage view of the BI Apps dataset.
Data within the BI Apps 11.1.1.8.1 system can be thought of as going through seven different layers, as shown in the diagram below. Starting at the top-level dashboards and reports, these map onto OBIEE presentation tables and columns, which in turn are selected from business model columns that then map back to the physical tables and columns in the Oracle Business Intelligence Applications data warehouse. These data warehouse tables are loaded in two stages: first using source-specific data mappings from, for example, Oracle E-Business Suite 12.1.3, and then using a set of source-independent mappings that take standardised staging datasets from these sources and map them into the target data warehouse tables. Our data lineage and impact analysis routines have to be aware of these seven stages and show us how data moves and is transformed between each stage.
The way that BI Apps 11.1.1.8.1 data lineage works is to use ODI11g to extract metadata from the BI Apps Configuration Manager underlying tables and the ODI11g repository, and combine that with RPD and catalog metadata that you have to manually extract and copy into files for ODI to upload. An ODI load plan supplied by Oracle then combines these datasets into a final set of data lineage tables, also stored in the target data warehouse schema, and you can create your own data lineage and impact analysis reports or start with the ones Oracle provides with this new feature.
To load these data lineage tables, you run a predefined load plan from ODI Studio or ODI Console, after checking that the connections to the various sources are set up correctly. The load plan in turn runs a number of interfaces that load lineage information from the RPD and catalog extracts, the BI Apps Configuration Manager tables and the ODI repository tables. This load plan has to run outside of the main BI Apps managed data loads – which makes sense, as you have to manually re-extract the RPD and catalog metadata anyway, and you’ll probably want to reload the data lineage after every development release of the BI Apps system rather than every day, for example.
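Once populated, the lineage tables are just regular tables in the warehouse schema, so you can query them with ordinary SQL as well as through the supplied reports. As a purely hypothetical illustration – the connection string, table and column names below are invented placeholders, not the actual BI Apps lineage schema – an impact-analysis query from Python might look something like this:

```python
# Hypothetical impact-analysis query against the data lineage tables.
# The connection string, table and column names are invented placeholders;
# check the BI Apps documentation for the actual lineage schema.
import cx_Oracle

conn = cx_Oracle.connect("dw_user/password@dwhost:1521/BIAPPSDW")
cursor = conn.cursor()

# Find the presentation columns and dashboards ultimately sourced from
# a given source table column, i.e. "what breaks if I change this?"
cursor.execute("""
    SELECT presentation_table, presentation_column, dashboard_name
    FROM   w_lineage_summary
    WHERE  source_table  = :tab
    AND    source_column = :col
""", tab="PER_ALL_PEOPLE_F", col="FULL_NAME")

for pres_table, pres_col, dashboard in cursor:
    print("%s.%s is used on dashboard %s" % (pres_table, pres_col, dashboard))

conn.close()
```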
Once you’ve loaded the data lineage tables, the subject area you can then select from to create lineage and impact reports covers all the stages in the data load, and also extends to OTBI (Oracle Transactional Business Intelligence, more on that in a future post) if you use that in combination with the BI Apps (or OTBI EE, as it’s called for cloud-based Fusion installations).
You also get a set of starter dashboards and analyses for displaying the lineage for dashboard objects, presentation tables and columns, down to tables and columns in the BI Apps data warehouse, and impact for source models, columns, variables and so on.
It’s definitely a good start and a useful resource. Going back to the days of OWB, it’d be nice if this were built directly into ODI Studio, and the steps to identify and then export the RPD and catalog metadata are fairly manual, but it’s better than having to step through the metadata layers yourself, as you had to do with the previous 7.9.x versions of the BI Apps. More details on data lineage and impact analysis in BI Apps 11.1.1.8.1 can be found in the online docs, including the configuration steps you’ll need to carry out before running the first data lineage load.
Endeca Information Discovery Integration with Oracle BI Apps 11.1.1.8.1
One of the new features that made its way into Oracle BI Apps 11.1.1.8.1, and that completely passed me by at the time, was integration with Oracle Endeca Information Discovery 3.1 via a new ODI11g plug-in. As this new integration brings Endeca’s “faceted search” and semi-structured data analysis capabilities to the BI Apps, it’s actually a pretty big deal, so let’s take a quick look now at what it brings and how it works under the covers.
Oracle BI Apps 11.1.1.8.1 integration with Oracle Endeca Information Discovery 3.1 requires OEID to be installed either on the same host as BI Apps or on its own server, and uses a custom ODI11g KM called “IKM SQL to Endeca Server 3.1.0” that needs to be downloaded separately from Oracle’s eDelivery site. The KM download also comes with an ODI Technology XML import file that adds Endeca Server as a technology type in the ODI Topology Navigator, and it’s here that the connection to the OEID 3.1 server is set up for subsequent data loads from BI Apps 11g.
As the IKM specifies “SQL” as its loading technology, any source that can be extracted from using standard JDBC drivers can be used to load the Endeca Server, and the KM itself has options for creating the schema to go with the data upload, creating the Endeca Server record spec, specifying the collection name and so on. BI Apps 11.1.1.8.1 ships with a number of interfaces (mappings) and ODI load plans to populate Endeca Server domains for specific BI Apps fact groups, with each interface taking all of the relevant repository fact measures and dimension attributes and loading them into one big denormalized Endeca Server schema.
When you take a look at the physical (flow) tab for the interface, you can see that the data is actually extracted via the BI Apps RPD and the BI Server, then staged in the BI Apps DW database before being loaded into the Endeca Server domain. Column metadata in the BI Server repository is then used to define the datatypes, names, sizes and other details of the Endeca Server domain schema, an improvement over just pointing Endeca Information Discovery Integrator at the BI Apps database schema.
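To give a feel for that extraction route: the BI Server exposes the repository over ODBC/JDBC and accepts “logical SQL” written against presentation-layer tables and columns, so the ODI interfaces are in effect issuing queries like the one in the sketch below. This is just an illustrative sketch – the DSN, subject area and column names are made-up examples rather than the actual BI Apps ones:

```python
# Minimal sketch of querying the BI Server over ODBC using logical SQL,
# the same route the Endeca load interfaces use to extract their data.
# The DSN, subject area and column names are hypothetical examples.
import pyodbc

# DSN pointing at the BI Server ODBC driver
conn = pyodbc.connect("DSN=BIAPPS_BISERVER;UID=weblogic;PWD=password",
                      autocommit=True)
cursor = conn.cursor()

# Logical SQL selects from presentation-layer tables and columns; the
# BI Server maps it back through the RPD to the warehouse tables
cursor.execute('''
    SELECT "Time"."Fiscal Period", "Employee"."Department",
           "Facts"."Expense Amount"
    FROM   "Employee Expenses"
''')

for period, department, amount in cursor.fetchall():
    print(period, department, amount)

conn.close()
```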
The 11.1.1.8.1 ODI repository also provides a pre-built Endeca Server load plan that allows you to turn loads on and off for particular fact groups, and that sets up the data domain name and other variables needed for the Endeca Server data load.
Once you’ve loaded the BI Apps data into one or more Endeca Server data domains, there are a number of pre-built sample Endeca Studio applications you can use to get a feel for the capabilities of the Endeca Information Discovery Studio interface. For example, the Student Information Analytics Admissions and Recruiting sample application lets you filter on any attribute from the dataset on the left-hand side, and then instantly returns the filtered dataset, displayed in the form of a number of graphs, word clouds and other visualizations.
Another interesting sample application is around employee expense analysis. Using features such as word clouds, it’s relatively easy to see which expense categories are being used the most, and you can use Endeca Server’s search and text-parsing capabilities to extract keywords and other useful bits of information out of free-text areas, for example the supporting text for expense claims.
Typically, you’d use Endeca Information Discovery as a “dig around the data, let’s look for something interesting” tool, with all of your data attributes listed in one place and automatic filtering and focusing on the filtered dataset using the Endeca Server in-memory search and aggregation engine. The sample applications that ship with 11.1.1.8.1 are just meant as a starting point (and unlike the regular BI Apps dashboards and reports, they aren’t supported as an official part of the product), but as you can see from the Manufacturing Endeca Studio application below, there’s plenty of scope to create returns, production and warranty-type applications that let you get into the detail of the attribute and textual information held in the BI Apps data warehouse.
More details on the setup process are in the online docs, and if you’re new to Endeca Information Discovery take a look at our past articles on the blog on this product.
Rittman Mead BI Forum 2015 Call for Papers Now Open!
I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions accepted until January 18th 2015. As in previous years the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:
- Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
- Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA : May 13th – 15th 2015
Now in its seventh year, the Rittman Mead BI Forum is the only conference dedicated entirely to Oracle Business Intelligence, Oracle Business Analytics and the technologies and processes that support them – data warehousing, data analysis, data visualisation, big data and OLAP analysis. We’re looking for sessions around tips & techniques, project case-studies and success stories, and sessions where you’ve taken Oracle’s BI products and used them in new and innovative ways. Each year we select around eight-to-ten speakers for each event, along with keynote speakers and a masterclass session, with speaker choices driven by attendee votes at the end of January and editorial input from myself, Jon Mead, Charles Elliott and Jordan Meyer.
Last year we had a big focus on cloud, with a masterclass and several sessions on bringing Hadoop and big data to the world of OBIEE. This year we’re interested in project stories and experiences around cloud and Hadoop, and we’re keen to hear about any Oracle BI Apps 11g implementations or migrations from the earlier 7.9.x releases. Getting back to basics, we’re always interested in sessions around OBIEE, Essbase and data warehouse data modelling, and we’d particularly like to encourage session abstracts on data visualization, BI project methodologies and the incorporation of unstructured, semi-structured and external (public) data sources into your BI dashboards. For an idea of the types of presentations that have been selected in the past, check out the BI Forum 2014, 2013 and 2012 homepages, or feel free to get in touch via email at mark.rittman@rittmanmead.com.
The Call for Papers entry form is here, and we’re looking for speakers for Brighton, Atlanta, or both venues if you can speak at both events. All sessions this year will be 45 minutes long, and we’ll be publishing submissions and inviting potential attendees to vote on their favourite sessions towards the end of January. Other than that – have a think about abstract ideas now, and make sure you get them in by January 18th 2015.
News and Updates from Oracle Openworld 2014
It’s the Saturday after Oracle Openworld 2014, and I’m now home from San Francisco and back in the UK. It’s been a great week as usual, with lots of product announcements and updates to the BI, DW and Big Data products we use on current projects. Here’s my take on what was announced this last week.
New Products Announced
From a BI and DW perspective, the most significant product announcements were around Hadoop and Big Data. Up to this point most parts of an analytics-focused big data project required you to code the solution yourself, with the diagram below showing the typical three steps in a big data project – data ingestion, analysis and sharing the results.
At the moment, all of these steps are typically performed from the command-line using languages such as Python, R, Pig, Hive and so on, with tools like Apache Flume and Apache Sqoop used to bring data into and out of the Hadoop cluster. Under the covers, these tools use technologies such as MapReduce or Spark to do their work, automatically running jobs in parallel across the cluster and making use of the easy scalability of Hadoop and NoSQL databases.
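To make that concrete, here’s a minimal PySpark sketch of the kind of hand-written ingestion-and-analysis step involved today – loading raw weblog files from HDFS, parsing them and aggregating page hits in parallel across the cluster. The HDFS paths and record layout are invented for illustration:

```python
# Minimal PySpark sketch of a hand-coded "ingest and analyse" step:
# load raw weblog files from HDFS, parse them, and count page hits.
# Paths and record layout are invented for illustration.
from pyspark import SparkContext

sc = SparkContext(appName="weblog-analysis")

# Spark distributes the file blocks across the cluster and
# processes each partition in parallel
logs = sc.textFile("hdfs:///data/raw/weblogs/*.log")

# Pull out the requested URL, assuming space-delimited records with
# the URL in the seventh field (Apache combined log format)
pages = logs.map(lambda line: line.split(" ")) \
            .filter(lambda fields: len(fields) > 6) \
            .map(lambda fields: (fields[6], 1))

# Aggregate hit counts per page across the whole cluster, then pull
# back the ten most-requested pages to the driver
top_pages = pages.reduceByKey(lambda a, b: a + b) \
                 .takeOrdered(10, key=lambda kv: -kv[1])

for url, hits in top_pages:
    print("%s : %d hits" % (url, hits))

sc.stop()
```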
You can also neatly divide the work on a big data project into two phases: the “discovery” phase, typically performed by a data scientist, where data is loaded, analysed, correlated and otherwise “understood” to provide the initial insights; and then an “exploitation” phase, where we apply governance, provide the output data in a format usable by BI tools and otherwise share the results with the wider corporate audience. The updated Information Management Reference Architecture that we collaborated on with Oracle and launched in June this year had distinct discovery and exploitation phases, and the architecture itself made a clear distinction between the Innovation part, which enabled the discovery phase of a project, and the Execution part, which delivered the insights and data in a more governed, production setting.
This was the theme of the product announcements around analytics, BI, data warehousing and big data at Openworld 2014, with Oracle’s Omri Traub in the photo below taking us through Oracle’s big data product strategy. What Oracle are doing here is productising and “democratising” big data, putting it clearly in the context of their existing database, engineered systems and BI products, and linking them all together into an overall information management architecture and delivery process.
So working through from ingestion to data analysis: these steps have typically been performed by data scientists using scripting tools and rudimentary data visualisation engines, making them labour-intensive and reliant on a small set of people conversant with these tools and processes. Oracle Big Data Discovery is aimed squarely at these steps, and combines Apache Spark-based data preparation and transformation capabilities with an analysis and visualisation engine based on Endeca Server.
Key features of Big Data Discovery include:
- Ability to analyse, parse, explore and “wrangle” data using graphical tools and a Spark-based transformation engine
- Create a catalog of the data on your Hadoop cluster, and then search that catalog using Endeca Server search technologies
- Create recommendations of other datasets that might interest you, based on what you’re looking at now
- Visualize your datasets to help understand what they contain, and discover new insights
Under the covers it comprises two parts: the data loading, transformation and profiling part, which uses Apache Spark to do its work in parallel across all the nodes in the cluster; and the analysis part, which takes data prepared by Apache Spark and loads it into the Endeca Server in-memory engine to perform the analysis, aggregation and data visualisation. Unlike the Spark part, the Endeca Server element runs on just one node and limits the size of the analysis dataset to what can fit in-memory in the Endeca Server engine, but in practice you’re going to work with a sample of the data rather than the entire dataset at that stage (in time the assumption is that the Endeca Server engine will be unbundled and run natively on YARN, giving it the same scalability as the Spark-based data ingestion and transformation part). Initially Big Data Discovery will run on-premise, with a cloud version later on, and it’s not dependent on Big Data Appliance – expect to see something later this year or early next year.
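As a rough illustration of that division of labour, the pattern is Spark profiling and transforming the full dataset in parallel, then handing a sample small enough for a single-node in-memory engine over to the analysis side. The hypothetical PySpark sketch below shows the idea; all dataset, column and path names are invented:

```python
# Rough sketch of the two-phase pattern described above: Spark profiles
# and transforms the full dataset in parallel, then draws a sample small
# enough for a single-node in-memory engine. All names are illustrative.
from pyspark import SparkContext

sc = SparkContext(appName="discovery-style-profiling")

records = sc.textFile("hdfs:///data/raw/customers.csv") \
            .map(lambda line: line.split(","))

# Simple profiling over the full dataset, computed in parallel:
# distinct values and empty-value counts for one attribute
countries = records.map(lambda r: r[3] if len(r) > 3 else "")
print("distinct countries: %d" % countries.distinct().count())
print("empty values      : %d" % countries.filter(lambda v: v == "").count())

# Draw a 1% sample (without replacement) to hand to the in-memory
# analysis engine, rather than shipping the whole dataset to one node
sample = records.sample(False, 0.01, seed=42)
sample.map(lambda r: ",".join(r)) \
      .saveAsTextFile("hdfs:///data/samples/customers_sample")

sc.stop()
```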
Another new product that addresses the discovery phase, and the discovery lab part of a big data project, is Oracle Data Enrichment Cloud Service (ODECS), which comes from the Oracle Data Integration team and is designed to complement ODI and Oracle EDQ. Whilst Oracle positioned ODECS as something you’d use alongside Big Data Discovery, typically upstream from BDD, to me there seemed to be a fair bit of overlap between the products: both tools do data profiling and transformation, but BDD is more focused on exploration and discovery, and ODECS on early-stage data profiling and transformation.
ODECS is clearly more of an ETL-tool complement, and runs natively in the cloud right from the start. It’s most probably aimed at customers whose Hadoop dataset is already in the cloud, maybe using Amazon Elastic MapReduce or Oracle’s new Hadoop-as-a-Service, and it has more in common with the old Data Quality Option for Oracle Warehouse Builder than with Endeca’s search-first analytic interface. It’s got a very nice interface, including a mobile-enabled website and the ability to include and merge in external datasets, including Oracle’s own Data as a Service platform offering. Along with the new Metadata Management tool Oracle also launched at Openworld, it’s a great addition to the Oracle Data Integration product suite, but I can’t help thinking that its initial availability only on Oracle’s public cloud platform is going to limit its use with Oracle’s typical customers – we’ll just have to wait and see.
The other major product announcement around big data projects was Oracle Big Data SQL. Partly addressing the discovery phase but mostly (to my mind) addressing the exploitation phase, and the execution part of the information management architecture, Big Data SQL gives Oracle Exadata the ability to return data from Hive and NoSQL stores on the Big Data Appliance as well as data from its normal relational store. I covered Big Data SQL on the blog a few weeks ago and I’ll be posting some more in-depth articles on it next week, but the other main technical innovation in the product is that it brings Exadata’s SmartScan feature to Hadoop, projecting and filtering data at the Hadoop storage-node level and also giving Hadoop the ability to understand regular Oracle SQL rather than the cut-down version you get with HiveQL.
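As a flavour of what that enables, the sketch below assumes an external table has already been defined over a Hive table using Big Data SQL’s ORACLE_HIVE access driver; the table and column names are invented, but the point is that Hadoop-resident data then becomes queryable, and joinable to relational data, with ordinary Oracle SQL:

```python
# Hypothetical sketch: weblog_hits is an Oracle external table defined
# over a Hive table (via Big Data SQL's ORACLE_HIVE access driver), and
# customers is an ordinary relational table. All names are invented.
import cx_Oracle

conn = cx_Oracle.connect("dw_user/password@exadata-scan:1521/DWPROD")
cursor = conn.cursor()

# Projection and filtering on weblog_hits can be pushed down to the
# Hadoop storage nodes by SmartScan; the join runs on Exadata as normal
cursor.execute("""
    SELECT c.customer_name, COUNT(*) AS page_views
    FROM   weblog_hits w
    JOIN   customers   c ON c.customer_id = w.customer_id
    WHERE  w.hit_date >= DATE '2014-01-01'
    GROUP  BY c.customer_name
    ORDER  BY page_views DESC
""")

for name, views in cursor.fetchmany(10):
    print("%s : %d page views" % (name, views))

conn.close()
```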
Where this then leaves us is with the ability to do most of a big data project using (Oracle) tools, bringing big data analysis within reach of organisations with Oracle-style budgets but without access to rare data scientist-type resources. Going back to my diagram earlier, a post-OOW big data project using the new products launched in this last week could look something like this:
Big Data SQL is out now and depends on the BDA and Exadata; Big Data Discovery should be out in a few months’ time and runs on-premise but doesn’t require the BDA; whilst ODECS is cloud-only and runs on a BDA in the background. Expect more news, and more integration and alignment between the products, as 2014 ends and 2015 starts – we’re looking forward to using them on Oracle-centric Hadoop projects in the near future.
Product Updates for BI, Data Integration, Exalytics, BI Applications and OBIEE
Other news announced over the week for products we more commonly use on projects include:
- Oracle BI Cloud Service, now GA and covered on the blog in a five-part series just before Openworld
- Oracle have ended development of the Informatica version of the BI Apps at release 7.9.6.4, and there won’t be an 11g release that uses Informatica as the embedded ETL tool; instead, customers will need to reimplement using ODI to get to BI Apps 11g, and I did hear mention of a migration tool to be released soon
- Oracle Transactional BI Enterprise Edition, a cloud-based BI Apps version for Fusion Apps running in Oracle Public Cloud
- Certification for Oracle Database 12c In-Memory for Exalytics, with TimesTen for Exalytics expected to be de-emphasised over time
- A new option to install Exalytics in the Big Data Appliance Starter Rack, bringing in-memory BI analysis closer to big data
- More details on OBIEE 12c, including devops improvements and the new Tableau-killer Visual Analyzer data analysis tool
- Further extensions of ODI and GoldenGate into the big data world, including the ability for GoldenGate to stream into Apache Flume
- Examples of ODI integration with cloud and SaaS data sources, including a great demo of ODI integrating Salesforce.com and Amazon Redshift
Finally, something we were particularly pleased to see was the updated Oracle Information Management Architecture I mentioned earlier being referenced in most of the analytics sessions, with Oracle’s Balaji Yelamanchili, for example, introducing it in his big data and business analytics general session mid-way through the week.
We love the way this brings together the big data components and puts them in the context of the wider data warehouse and analytic processes. Compared to a few years ago, when Hadoop and big data were considered completely separate to data warehousing and BI, and were done by staff completely different to the core business analytics team, this new reference architecture puts them squarely within the world of BI and analytics we work in. It also emphasises the new abilities Hadoop, NoSQL databases and big data can bring us – support for wider sets of data sources with dynamic schemas, the ability to economically work with and analyse much larger datasets, and support for discovery-type upfront analysis work. Finally, it recognises that to get true value out of the analysis you start on Hadoop, you eventually need to add proper data governance, make the results more widely available using full SQL tools, and use the right tools – relational databases, OLAP servers and the like – to analyse the data once it’s in a more structured form.
If you missed our write-up on the updated Information Management Reference Architecture, you can read our two-part blog post here and here, read the Oracle white paper, or listen to the podcast with OTN ArchBeat’s Bob Rhubart. For now though I’m looking forward to seeing the family after a week and a half away in San Francisco – thanks to OTN and the Oracle ACE Director Program for sponsoring my visit over to SF for Openworld, and we’ll post our conference presentation slides later next week when we’re back in the UK and US offices.