ETL Offload with Spark and Amazon EMR – Part 1
We recently undertook a two-week Proof of Concept exercise for a client, evaluating whether their existing ETL processing could be done faster and more cheaply using Spark. They were also interested in whether something like Redshift would provide a suitable data warehouse platform for them. In this series of blog articles I will look at how we did this, and what we found.
Background
The client has an existing analytics architecture based primarily around Oracle database, Oracle Data Integrator (ODI), Oracle GoldenGate, and Oracle Business Intelligence Enterprise Edition (OBIEE), all running on Amazon EC2. The larger architecture in the organisation is all AWS based too.
Existing ETL processing for the system in question is done using ODI, loading data daily into a partitioned Oracle table, with OBIEE providing the reporting interface.
There were two aspects to the investigation that we did:
Primarily, what would an alternative platform for the ETL look like? With lots of coverage recently of the concept of "ETL offloading" and "Apache-based ETL", the client was keen to understand how they might take advantage of this.
Within this, key considerations were:
- Cost
- Scalability
- Maintenance
- Fit with existing and future architectures
The second aspect was to investigate whether the performance of the existing reporting could be improved. Despite having multiple years of data in Oracle, queries were too slow to report on anything more than a few days' worth of data at a time.
Oracle licences were a sensitive point for the client, who were keen to reduce - or at least not increase - costs. ODI for Big Data requires an additional licence, and so was not in scope for the initial investigation.
Data and Processing
The client uses their data to report on the level of requests for different products, including questions such as:
- How many requests were there per day?
- How many requests per product type in a given period?
- For a given product, how many requests were there, from which country?
Data volumes were approximately 50MB, arriving in batch files every hour. Reporting requirements were previous day and before only. Being able to see data intra-day would be a bonus but was not a requirement.
High Level Approach
Since the client already uses Amazon Web Services (AWS) for all its infrastructure, it made sense to remain in the AWS realm for the first round of investigation. We broke the overall requirement down into pieces, so as to understand (a) the most appropriate tool at each point and (b) the toolset with best overall fit. A very useful reference for an Amazon-based big data design is the presentation Big Data Architectural Patterns and Best Practices on AWS. Even if you're not running on AWS, the presentation has some useful pointers for things like where to be storing your data based on volumes, frequency of access, etc.
Data Ingest
The starting point for the data was Amazon's storage service, S3, into which the data files in CSV format are landed by an external process every hour.
Processing (Compute)
Currently the processing is done by loading the external data into a partitioned Oracle table, and resolving dimension joins and de-duplication at query time.
Taking away any assumptions, other than a focus on 'new' technologies (and a bias towards AWS where appropriate), we considered:
- Switch out Oracle for Redshift, and resolve the joins and de-duplication there
  - Loading the data into Redshift would be easy, but this would simply swap one RDBMS-based solution for another. Part of the aim of the exercise was to review a broader solution landscape than this.
- Use Hadoop-based processing, running on Elastic MapReduce (EMR):
  - Hive QL to process the data on S3 (or HDFS on EMR)
    - Not investigated, because it provides none of the error handling that Spark does, and Spark has Spark SQL for any work that needs doing in SQL
  - Pig
    - Still used, but 'old' technology, with a somewhat esoteric language, and superseded by Spark
  - Spark
    - Support for several languages, including commonly used ones such as Python
    - Gaining increasing adoption in the industry
    - Opens up a rich ecosystem of processing possibilities, with related projects for machine learning and graph processing
We opted to use Spark to process the files, joining them to the reference data, and carrying out de-duplication. For a great background and discussion on Spark and its current place in data architectures, have a listen to this podcast.
Storage
The output from Spark was written back to S3.
Analytics
With the processed data in S3, we evaluated two options here:
- Load it to Redshift for query
- Query it in place with a SQL-on-Hadoop engine such as Presto or Impala
  - With the data at rest on S3, Amazon's Athena is also of interest here, but it was released after we carried out this particular investigation.
The presumption was that OBIEE would continue to provide the front-end to the analytics. Oracle's Data Visualization Desktop tool was also of interest.
In the next post we'll see the development environment that we used for prototyping. Stay tuned!
The Visual Plugin Pack for OBIEE
Last week we announced the Rittman Mead Open Source project, and released into open source:
- the excellent Insights project, a JavaScript API/framework for building a new front-end for OBIEE, written by Minesh Patel
- Enhanced Usage Tracking for OBIEE, to track click-by-click how your users interact with the application
Today it is the turn of the Visual Plugin Pack...
What is the Visual Plugin Pack for OBIEE?
Visual Plugin Pack (VPP) is a means by which users of OBIEE Answers can use custom JavaScript visualisations without having to write any JavaScript!
It is a framework that enables developers to build JavaScript visualisation plugins that report builders can then use and configure through the native OBIEE user interface.
I want to point out from the very start that, despite its name, the Visual Plugin Pack is not a pack of all-singing, all-dancing, super-duper visualisations for OBIEE.
Instead, VPP should be thought of as a framework that allows you to quickly develop and integrate all-singing, all-dancing, super-duper visualisations that will work natively within OBIEE.
Subtle difference, but an important one.
So what does it do?
Essentially, VPP allows you to accelerate the development and deployment of custom, configurable and reusable OBIEE JavaScript visualisations.
Back in 2013 I wrote this post describing how to embed a D3 visualisation within OBIEE. The same method will still work today, but it's a cumbersome process and requires heavy use of the narrative view, which, let's be honest, is a painful experience when it comes to JavaScript development.
Some drawbacks with this method:
- Code editing in the Narrative view is not productive.
- Reusing visualisations in other analyses requires copying and pasting code.
- Basic visualisation configuration changes - for example width, height, colours, fonts - require code changes.
- Remapping column bindings requires code changes.
- JavaScript library dependencies and load order can be tricky to manage.
The Visual Plugin Pack attempts to address these issues by abstracting away the complexities of the Narrative form and allowing developers to focus on visualisation code, not OBIEE integration code.
If you choose to use VPP for your visualisations then you will never have to touch the narrative form; all visualisation development can take place outside of OBIEE in your favourite code editor, and be deployed to WebLogic when you are done.
VPP also allows you to define design-time controls that affect column bindings and visualisation behaviour. The example visualisation below has been written to accept five column bindings and one configuration component, which controls the visualisation size. You can create as many column bindings and configuration components as you need.
How do I get started?
You can download or fork the repository from here.
Installation and developer guides can be found on the wiki.
There are several visualisations that come bundled with VPP, some more polished than others, but they should serve as good examples that can be enhanced further.
Summary
If you've got some in-house JavaScript skills and are looking to develop and integrate custom visualisations into OBIEE, then VPP can help alleviate a lot of the frustrations associated with the traditional method. Once you're up and running you'll be able to develop faster, integrate quickly and share your visualisations with all OBIEE report writers.
If you'd like to discuss how Rittman Mead can help with deployment or assist with custom visualisation development feel free to contact us.
Enhanced Usage Tracking for OBIEE – Now Available as Open Source!
Introduction
OBIEE provides Usage Tracking as part of the core product functionality. It writes directly to a database table every Logical Query that hits the BI Server, including details of who ran it, when, and information about how it executed including for how long, how many rows, and so on. This in itself is a veritable goldmine of information about your OBIEE system. All OBIEE deployments should have Usage Tracking enabled, for supporting performance analysis, capacity planning, catalog rationalisation, and more.
What Usage Tracking doesn't track is interactions between the end user and the Presentation Services component. Presentation Services sits between the end user and the BI Server, where the actual queries get executed. This means that until a user executes an analysis, there's no record of their actions in Usage Tracking. Presentation Services audit data is available, but you have to enable and collect it manually, which can be tricky. This is where Enhanced Usage Tracking comes in. It enables the collection and parsing of every click a user makes in OBIEE. For an overview of the potential of this data, see the articles here and here.
Today we're pleased to announce the release into open-source of Enhanced Usage Tracking! You can find the github repository here: https://github.com/RittmanMead/obi-enhanced-usage-tracking.
Highlights of the data that Enhanced Usage Tracking provides include:
- Which web browsers do people use? Who is accessing OBIEE with a mobile device?
- Who deleted a catalog object? Who moved it?
- What dashboards get exported to Excel most frequently, and by whom?
The above visualisations are from both Kibana and OBIEE. The data from Enhanced Usage Tracking can be loaded into Elasticsearch, and is also available in Oracle tables, so you can put OBIEE itself - or DV - on top of it.
How to use Enhanced Usage Tracking
See the github repository for full detail on how to install and run the code.
TODO
What's left TODO? Here are a few ideas if you'd like to help build on this tool. I've linked each title to the relevant github issue.
TODO 01
The sawlog is a rich source of data, but the Logstash script has to know how to parse it. It's all down to the grok statement, which identifies the fields to extract and defines their delimiters. Use grokdebug.herokuapp.com to help master your syntax. From there, the data can be emitted to CSV and loaded into Oracle.
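Grok is essentially named regular expressions, so the same idea can be sketched in plain Python. The log line and field names below are invented for illustration and won't match the real sawlog format exactly:

```python
import re

# Hypothetical sawlog-style line; the real format differs, and describing
# it is exactly the job of the grok statement in the Logstash config.
LINE = "[2016-12-01T10:15:02] User weblogic Request deleteItem"

# Named groups play the same role as grok's %{PATTERN:field} captures
PATTERN = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] User (?P<user>\S+) Request (?P<request>\S+)"
)

def parse_line(line):
    """Extract named fields from a log line, returning a dict or None."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None
```

Once a line is reduced to a dict of fields like this, emitting it as CSV for loading into Oracle is straightforward.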
Here's an example of something yet to build - when items are moved and deleted in the Catalog, it is all logged. What, who, and when. The Logstash grok currently scrapes this, but the data isn't included in the CSV output, nor loaded into Oracle.
Don't forget to submit a pull request for any changes to the code that would benefit others in the community!
You'll also find loading the data directly into Elasticsearch easier than redefining the Oracle table DDL and load script each time, since in Elasticsearch the 'schema' can evolve based simply on the data that Logstash sends to it.
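Elasticsearch's _bulk API takes a newline-delimited JSON body, alternating an action line with a document line, so a load payload can be built with no schema definition at all. A minimal sketch, with an invented index name and sample events:

```python
import json

def bulk_payload(events, index="enhanced-usage-tracking"):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.
    Each document line is preceded by an 'index' action line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The _bulk API requires a trailing newline
    return "\n".join(lines) + "\n"

# Hypothetical parsed click events
events = [
    {"user": "weblogic", "request": "deleteItem", "browser": "Chrome"},
    {"user": "prodney", "request": "saveReport", "browser": "Firefox"},
]
payload = bulk_payload(events)
```

POSTing a payload like this to the _bulk endpoint lets Elasticsearch's dynamic mapping grow the 'schema' as new fields appear in the events.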
TODO 02
Version 5 of the Elastic stack was released in late 2016, and it would be good to test this code with it and update the README section above to indicate if it works - or submit the required changes needed for it to do so.
TODO 03
There's lots of possibilities for this data. Auditing who did what, when, is useful (e.g. who deleted a report?). Taking it a step further, are there patterns in user behaviour? Certain patterns of clicks that could be identified to highlight users who are struggling to find the data that they want? For example, opening lots of presentation folders in the Answers editor before adding columns to the analysis? Can we extend that to identify users who are struggling to use the tool and are going to "churn" (stop using it) and thus contact them before they do so to help resolve any issues they have?
TODO 04
At the moment the scripts have to be invoked and run manually. It would be neat to package this up into a service (or set of services) that could run automagically at server boot.
Until then, using GNU `screen` is a handy hack for setting scripts running and being able to disconnect from the server without terminating them. It's like using `nohup ... &`, except you can reconnect to the session itself as and when you want to.
TODO 05
Click events have defined 'Request' types, and I have roughly grouped these into 'Request Groups' to help describe what the user was doing (e.g. Logon / Edit Report / Run Report). Not all requests have been assigned to request groups. It would be useful to identify all request types, and refine the groupings further.
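A simple lookup table is probably all this needs. The request types and group names below are invented to show the shape, not taken from the real sawlog:

```python
# Hypothetical mapping of raw request types to descriptive request groups
REQUEST_GROUPS = {
    "logon": "Logon",
    "editAnalysis": "Edit Report",
    "executeAnalysis": "Run Report",
    "deleteItem": "Manage Catalog",
}

def request_group(request_type):
    """Return the group for a request type, flagging unmapped ones
    so they can be identified and assigned later."""
    return REQUEST_GROUPS.get(request_type, "Unassigned")
```

Anything falling into the "Unassigned" bucket is then an easy way to spot request types that still need classifying.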
TODO 06
At the moment only clicks in Presentation Services are captured and analysed. I bet the same can be done for Data Visualization/Visual Analyzer too ...
Problems?
Please raise any issues on the github issue tracker. This is open source, so bear in mind that it's no-one's "job" to maintain the code - it's open to the community to use, benefit from, and maintain.
If you'd like specific help with an implementation, Rittman Mead would be delighted to assist - please do get in touch to discuss our rates.
Insights – An Open-Source Visualisation Platform for OBIEE
On and off over the last year, I have spent some time developing a customisable framework for building visualisations and dashboards, using OBIEE as the back-end. The result is Insights, a JavaScript web application that offers a modern alternative to OBIEE Answers. As of today, we have officially open sourced the project, so you are free to download, install, hack and contribute as you please.
The primary motive for building this application was to meet some very bespoke reporting requirements for a client, which I mention in my previous blog post describing the prototype. During this piece of work I wrote an object-oriented interface for the OBIEE web services. The icing on the cake was tying it into Tom Underhill's Visual Plugin Pack.
You can see more information about Insights in a presentation that I did at the recent UKOUG conference here: Bridging the Gap: Enhancing OBIEE with a Custom Visualisation Platform
Since then a lot of the work has been put in to make it developer friendly, visually appealing and hopefully easier to use. I'll be the first to admit that it's far from perfect, but it should be a decent starting point.
Getting Started
In order to use Insights you will need OBIEE 11.1.1.9 or above. Additionally, the application has only been tested with the IE11 and Chrome browsers, so compatibility with other browsers cannot be guaranteed.
First, download the application or fork the Git repository.
There is an installation guide in the project at `docs/installation.html`. Follow this guide to deploy the application on your OBIEE server.
Demo
This is a quick step-by-step demonstration creating a basic dashboard, showing off some of the features in the application (apologies if the GIFs take a while to load).
First you log in, using your usual OBIEE credentials. The homepage shows some pre-configured dashboards here, but we're going to click the pencil to create a new one.
Next I've dragged in some columns from my subject area, Sample App, and run the query, displaying the default table plugin.
In this step, I've gone to the configuration tab, and changed the colour of my table.
Now I change the plugin type using the drop-down menu at the top. Notice that my previous table visualisation gets stored on the right. Clicking the Store button adds my new pie chart there too, and then we can flick between them easily.
Filters can be added by clicking the icon next to the column on the subject area panel.
Adding in a sunburst chart, and playing with some of the colours here.
Now we have our visualisations, we can begin constructing our dashboard. You can freely move around and resize the visualisations as you choose. I recommend hiding the panels for this, as the full screen is much closer to what users will see when viewing the dashboard.
The next GIF shows the interaction framework, which can be used to implement UI features where the user interacts with one visualisation and another visualisation on the page reacts to it. In its most basic form, each plugin type can be filtered, in which case OBIEE runs the query again, although more complex reactions specific to a certain chart type can also be configured, as seen below with the sunburst chart.
Dashboard prompts can be added by clicking the filter icon next to one of the RPD columns. Any visualisations using this subject area will respond to the prompt. The prompt box can be freely placed on the canvas like any other object.
Finally, we can save the object to the web catalogue. This saves as a hidden analysis object in the OBIEE web catalogue and contains all of the information to recreate the dashboard when loading. All OBIEE security features are preserved, so users will only be able to access folders and reports they have permissions for.
Finished dashboards can be viewed in the application once they have been saved. The dashboard viewer will show all dashboard objects in that folder as different pages, available from the left pane. Images can be exported to PNG and PDF, and data from the visualisations can be exported to Excel and CSV.
So How Do I Learn More?
The slides that I did at UKOUG describing Insights give a comprehensive overview of the design behind the tool. You can find them here.
Summary
In a nutshell, those are the main features of the application. Feel free to try it out and have a read through the documentation (available through the application itself, or offline as HTML files in the `docs` directory).
As an open source application there is no official support, however if you experience any bugs or have any requests for enhancements, please post them on the issue tracker.
We hope you enjoy using the app and if you would like to enlist our expertise to help you deploy and develop using this platform, feel free to contact us to discuss it further.