Smoothing the Transition – The New Smart View 11.1.2.1.102 for Microsoft Office and OBIEE

Introduction

There’s a good chance that, if you’re reading this, you perform some kind of reporting, analytics or data stewardship role, or some combination of all three. And whether it’s for a large corporation or a small company, there are standards and practices around how those jobs get done on a day to day basis; practices that aren’t easily changed, and that are perpetually validated by big budgets and long careers. It’s equally likely that deeply ingrained within those reporting practices lies some moderate to heavy use of Excel. It wasn’t long ago that I found myself in the spreadsheet program on a daily basis, for hours at a time.

What this essentially amounted to:

  • Pulling down large amounts of data from our department’s data model using big SQL queries that could take most of the day to write, let alone run; waiting on the results could easily warrant a bathroom break, a phone call or, if you were feeling adventurous, catching up on email.
  • Validating your results
  • Exporting to Excel (key step here!)
  • Massaging and formatting your data by implementing innumerable and often unwieldy functions that deserved their own time slot on your schedule for the day to figure out
  • Proofing your analysis so that it got to management in ship shape
  • Hoping that the numbers from an analyst in another department, who used the same metric on their report and would be at the same meeting, actually coincided with yours

                   

Fast forward a bit and I’m sitting here, writing this blog as a sort of proverbial white flag in the great battle between Excel and the behemoth that is OBIEE. And just what is this white flag? Why, it’s Oracle’s most recent iteration of Smart View, which provides expanded functionality and support for the Microsoft Office suite of programs; namely, its golden boy, Excel. That’s right, Excel, the darling of office staff everywhere, the program upon which empires rise and fall. To paraphrase www.cfo.com, some 64% of public and private companies still use Excel and other “manual” solutions to perform their finance functions. So, in the world of the spreadsheet, when does it make sense to cross that blurry line from cell to subject area? Smart View now makes answering that question much easier. It seems that Oracle has really gotten a grasp on the formatting shortcomings of the last version and made up for them in spades. Or so it claims, at least.

The Test Run – OBIEE to Excel 

The example below illustrates a simple import via Smart View. I generated a dashboard in Answers which mimics an Excel design I found online. Thanks to the good folks over at www.chandoo.org for their excellent Excel dashboarding skills and for providing plenty of great examples. The dashboard contains a table with a selection of KPIs that the user may then choose to sort on via a View Selector (each view has been sorted on a different KPI and sits on a different Compound Layout). Upon selecting a KPI, the analysis will then display the Top 10 products by the KPI selected. In addition, the table contains conditional formatting which simply alerts users to the variance between the different KPIs and their targets. Lastly, there is a scatter plot view which displays our Product dimension as seen through the lens of Revenue and Quantity. Per the most recent Oracle documentation, we shouldn’t have any trouble including the current selections of a dashboard prompt either. Let’s see how it performs when we move it over to Excel.

 “OBIEE report and page prompts are fully supported as part of the import process. Dashboards can be imported through Oracle Smart View on a per page basis or the entire dashboard. Prompts are applied at the current state of the logged in user. Future releases of the product will support dashboard prompts directly through Microsoft Office.”

SVB 2

The Results

And there you have it! Excel displays our table and graph views as per the most recent selection from the dashboard prompt. But wait! Our conditional formatting seems to be missing, and this is the case even when the analysis view is exported directly as an Excel workbook.

SVB 3

Conditional Formatting

For our second scenario, let’s see how Excel handles a simpler, heat map style conditional formatting. I’ve made a simple table on our dashboard that measures Revenue, Quantity Sold, and the Average Order in $. I set up conditional formatting around the Average Order measure to see how Excel handles importing the color scheme for the currently selected Time parameters on the dashboard.

In contrast, we see that Smart View has preserved this simpler, heat map style of conditional formatting when the table is imported from OBIEE. So perhaps it was Excel’s lack of a corresponding graphic in the previous example that caused the migration snafu? Smart View doesn’t even seem to render our arrow graphics, despite what the documentation says:

“Oracle BI Customizations and View Standards – The Import of Oracle BI content can leverage the customizations and view standards used within an OBIEE environment. All view designed modifications such as conditional formatting, background colors or data configuration is automatically translated to the Microsoft Office environment.”

SVB 4

SVB 5

 

Excel to OBIEE

Let’s see what the latest edition of Smart View offers when moving an analysis from Excel to OBIEE.
Because we weren’t able to import our full table view, why don’t we construct it using the View Designer? The interface looks clean and provides an intuitive approach to producing basic Answers views. Accessing our subject area, I simply selected the columns that matched those on our Answers analysis. After clicking ‘OK’, sorting our Revenue column from largest to smallest and doing a little deleting, we have a pseudo ‘Top 10’ analysis by Revenue. Given the aesthetic attributes of our Answers analysis, let’s see how we’re going to replicate this in Excel.

 

SVB 6

SVB 8

SVB 7

After selecting the table, we can navigate to the Design tab under ‘Table Tools’ and select an alternating grey scheme, which gives us the ‘Enable Alternate Styling’ look. Now let’s add some formulas and conditional formatting that will give us our calculated column equivalents. We can insert two columns, one between Revenue and Target and one between Qty and Target, to make room for conditional formatting and Excel’s Icon Sets feature. We then create a simple formula in each inserted column that subtracts Revenue and Quantity from their respective targets, assign conditional formatting, and voila! Excel even has a check box that lets you show the arrow only.

 

SVB 9

SVB 10

From Excel, we can select Publish View to deposit our analysis into our Shared Folder. The results indicate a sort of ‘two-way street’ between OBIEE and Excel: neither fully supports the formatting capabilities of the other, although Smart View seems to be gaining ground with every new release. In this blog, we’ve taken a look at how Smart View handles some mildly complex conditional formatting, and what it takes to replicate that feature in native Excel. In a user environment where reports are flying back and forth between the two platforms, Smart View definitely makes sense; however, it might be advisable to simply deliver the minimum of what is needed and let the end user make any formatting-based modifications. After all, who would want to do all that work only to have it lost in translation?

End-to-End ODI12c ETL on Oracle Big Data Appliance Pt.1 : Flume to Initial Hive Table

A few months ago I posted an article on the ODI12c examples in the new Oracle Big Data Appliance, and over the past few weeks I’ve been writing about the various components within the Cloudera CDH Hadoop stack, including Hive, Pig, Spark and Flume. Since then I’ve built out a virtualized example of an Oracle Big Data Appliance using the Oracle Big Data 3.0 software set, and I thought it’d be interesting to create an example ETL flow through the system showing how ODI could be used to initiate and control the process. As with any situation where you actually build a demo to do something specific, as opposed to just playing around with the technology, you end up uncovering a few quirks and techniques that you wouldn’t otherwise have been aware of, so I’ll spend this week going through the various steps and calling out things others in the same situation might find useful – the steps I’ll go through are below, and I’ll add the links as the articles get published during the week.

As an overview, what I’ll be working with is a six-node Hadoop cluster running Cloudera CDH5, Oracle Big Data Connectors 3.0 and Oracle Data Integrator 12c, as you’d get with the current version of Oracle Big Data Appliance. Obviously BDA goes up to eighteen nodes as a maximum, and under the covers there’s lots more memory and much better networking than I was able to set up on VMware ESXi, but what I’ve got is enough to prove the process. The diagram below shows the six nodes, and where the software is installed.

NewImage

I took a couple of short-cuts with the setup; obviously each node has a lot less RAM than BDA’s 64GB per node, but the main node in the cluster (bdanode1), running the HDFS NameNode and all the ODI software, got bumped up to 28GB, with the next two allocated 12GB and the others 8GB – enough to work through a demo at least. I also ran ODI Studio on bdanode1 itself, instead of setting it up on a separate client VM, mainly to avoid having to set up all the Hadoop and Hive libraries on another machine. Other than that though, it’s the same CDH distribution you get on BDA, the same version of ODI and the Big Data Connectors, and so on, with the following software versions and downloads being used:

Setup of the Hadoop cluster is really out of scope for these articles, except to say that with CDH5 I find it easier to select the (non-default, deprecated) Packages install type rather than the new Parcels type, as the Parcels method installs all of the Hadoop software on each node in a new place – /opt/cloudera – rather than the usual /usr/lib/hadoop, /usr/lib/hive and so forth, meaning that most configuration examples you’ll read point to the wrong place for your install. Parcels are Cloudera’s way forward for delivering software components (there are advantages in terms of patching across the cluster), but if you’re getting started with the Hadoop platform, installing in the old location usually makes things easier to follow. Other than that, there are two bits of configuration you need to do: the first is to tell ODI Studio where to find the various Hadoop libraries and configuration files; because I’ve installed Studio directly on the main Hadoop node, I could just add that node’s file locations to the oracle user’s $HOME/.odi/oracledi/userlib/additional_path.txt file, so that it looked like this:
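(The exact entries will vary by install and ODI version; whether you list individual JARs, wildcards or directories is something to check against your own setup, so treat the listing below as an illustrative sketch based on the Packages-style locations mentioned above, rather than the actual contents of this environment’s file.)

[oracle@bdanode1 ~]$ cat $HOME/.odi/oracledi/userlib/additional_path.txt
/usr/lib/hadoop/*.jar
/usr/lib/hadoop/lib/*.jar
/usr/lib/hadoop/client/*.jar
/usr/lib/hive/lib/*.jar
/etc/hadoop/conf
/etc/hive/conf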

Also, to make use of Oracle Loader for Hadoop (one of the Oracle Big Data Connectors, and something we’ll use at the end to bulk-unload data from Hadoop into an Oracle database), the second thing I need to do is set a couple of environment variables in the “oracle” user’s .bashrc profile file, pointing to where OLH is installed and where the Hadoop and Hive libraries and configuration files are:

[oracle@bdanode1 userlib]$ cat $HOME/.bashrc
# .bashrc
 
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
 
# User specific aliases and functions
export HADOOP_CLASSPATH=/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/*:/etc/hive/conf
export OLH_HOME=/home/oracle/oracle/product/oraloader-3.0.0-h2

The scenario I’ll be working with is similar to the ones I covered on the blog recently, where I landed data from the Rittman Mead blog webserver into HDFS files using Flume, and then processed the files using Hive, Pig, Spark and so on. In this example though, I’ll use ODI to do the data manipulation where possible, but still use Hive and so forth under the covers to do the work. The diagram below shows the data flow that I’ll be looking to set up in the example:

NewImage

So in this scenario, incoming data is being landed in the Hadoop cluster by Flume, using the process outlined in this previous blog post. Flume is just a transport mechanism; it doesn’t in itself have any processing ability (making it analogous to GoldenGate), and all it does is transport log file entries from one place to another, via Flume agents on either end. All we as developers need to be aware of is (a) where the log files will be landed, and (b) that Flume will keep continuously writing to these files until the source log file gets rotated – therefore I can’t assume a log file is completely written to when I read from it.

What this means in practice is that if I want to do incremental loads, I need to consider the fact that a file I’m reading from might have more data in it later on. There are various solutions to this – principally having Flume write to HBase rather than raw HDFS files, and then reading from the HBase database while noting the last extraction point each time – but to keep things simple I’ll just do a full load each time the ETL run takes place, meaning that I don’t need to think about incremental loads throughout the system.

So the first thing I need to do is have ODI take the incoming files and load them into a Hive table. To do this I set up Topology entries for the incoming HDFS files, and here’s the first “gotcha”: to create a connection to HDFS files you use the regular File technology, but you leave the JDBC driver type blank and put the address of the HDFS NameNode into the JDBC URL. This is of course technically invalid, and won’t allow you to either test the connection or reverse-engineer the file names in, but it’s a “hack” used by the IKM File to Hive KM to get the address of your HDFS NameNode (if you choose to source the incoming file from HDFS rather than the local filesystem).

NewImage

Then, when you come to register the physical schemas to go with the new File technology data server, you’ll need to be aware that ODI just appends the final file name to the directory name when retrieving the file data – so if you want the connection to point to a directory, rather than just a file, you’ll need to set up the physical schema to be the directory “above” the one you’re interested in, and then set the file name later on to be that directory. In this example, I want the final file reference to point to hdfs://bdanode1.rittmandev.com/user/oracle/weblog_incoming_files/apache_access_combined, a whole directory (HDFS aggregates all the files in a directory if you reference just the directory in an operation) rather than just a single log file. You can see the directory and the log files it contains in the Hue screenshot below:

NewImage
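The same directory listing can also be pulled up from the command line; a minimal sketch, assuming the standard Hadoop CLI is on the path and using the same HDFS directory as above:

[oracle@bdanode1 ~]$ hadoop fs -ls /user/oracle/weblog_incoming_files/apache_access_combined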

I therefore set the physical schema to be hdfs://bdanode1.rittmandev.com/user/oracle/weblog_incoming_files, and the file reference in the datastore model is set to the final directory name, like this:

NewImage

If it seems a bit confusing, it’ll become clearer in a moment.

Now I need to go back to the Topology navigator and create a connection through to the Hive server on the Big Data Appliance VMs – in fact, recent versions of CDH (CDH4, CDH5) swap out the old Hive Server for HiveServer2, so you’ll need to use the correct JDBC drivers (as supplied with CDH4/5) to connect to it, and also create the JDBC URL in the format jdbc:hive2://[machine name:port], as shown in the screenshot below:

NewImage
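Before wiring this up in ODI, it can be worth sanity-checking the URL format with the beeline client that ships with CDH; a sketch only, assuming HiveServer2 is listening on its default port of 10000 (the port in your environment may differ):

[oracle@bdanode1 ~]$ beeline -u jdbc:hive2://bdanode1.rittmandev.com:10000 -n oracle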

A quick note about security at this point: by default, most CDH4/5 clusters are set up as what’s called “unsecured”, which means that whilst you do use a username and password to log in to Cloudera Manager, for example, Hive and Impala by default only request user credentials when you connect and don’t actually check the password against anything, for example your LDAP server. You can connect these tools to LDAP or Active Directory, typically combining this with Kerberos authentication between the various components and for connections made via Impala and Hive, and typically you’d also use Apache Sentry to provide role-based access control to the data within individual HDFS files. But by default, Hive will accept more or less anything as a password when you connect, and you may then hit issues later on when your HDFS files turn out to be owned by a different user to the one you connected as.

Where this manifests itself is when a Hive table has underlying HDFS files owned by, say, the “admin” user in CDH5 (because that’s how you connected to Hue to upload them), but you then connect through Hive as the “oracle” user to manipulate the Hive table contents. In practice, what I try to do is create any Hive tables (using Hue) as the same user I’m then going to connect to Hive as, which means you’ll most probably need to go into Hue and create a new user called “oracle” (if that’s what you’re running ODI as, and connecting through Hive as) before creating the Hive tables you’ll then import into the ODI topology.
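If you do end up with an ownership mismatch, the quickest way to see who owns the files behind a Hive table, and to hand them over to the “oracle” user, is from the command line. This is just a sketch: the warehouse path below is a placeholder rather than a table from this environment, and it assumes you can run commands as the hdfs superuser via sudo:

[oracle@bdanode1 ~]$ hadoop fs -ls /user/hive/warehouse
[oracle@bdanode1 ~]$ sudo -u hdfs hadoop fs -chown -R oracle:oracle /user/hive/warehouse/my_hive_table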

So once you’ve done all that, you can go into the Designer navigator and reverse-engineer the definition of your Hive tables into datastore models. In my case, I’ve got a bunch of tables that I’ll be using throughout the whole ETL process.

NewImage

Now it’s time for the first ODI mapping, to take the raw log file directory and load it into a Hive table. As it stands though, these raw log files are rows of just a single “column” of data – the log file entry in Apache combined log format. To make them useful to the rest of the ETL process I’ll need to somehow parse them into the individual log file elements, so I create a target Hive table that contains a column for the raw log entry, and then columns for the various log file elements:

NewImage

The way that I parse the log file is to use a feature within the IKM File to Hive (LOAD DATA) KM that allows me to specify a regular-expression SerDe (Serializer-Deserializer) to parse the log file entry into its individual columns, like this (note that you’ll need to ensure the hive-contrib-* JAR file is available to all of your Hadoop nodes before using this SerDe):

NewImage
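If you’re curious what this technique looks like outside of ODI, the sketch below is a hand-coded Hive external table over the same style of directory, using the hive-contrib RegexSerDe. Treat it purely as an illustration: the table name, column names and regular expression are generic Apache combined-log-format examples, not the definitions the KM actually generates in this environment.

[oracle@bdanode1 ~]$ cat > /tmp/weblog_raw_external.hql <<'EOF'
-- Illustrative external table that parses Apache combined-format log entries in place;
-- depending on your setup you may first need: ADD JAR /usr/lib/hive/lib/hive-contrib.jar;
CREATE EXTERNAL TABLE weblog_raw_parsed (
  host       STRING,
  identity   STRING,
  username   STRING,
  log_time   STRING,
  request    STRING,
  status     STRING,
  bytes_sent STRING,
  referer    STRING,
  agent      STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
)
LOCATION '/user/oracle/weblog_incoming_files/apache_access_combined';
EOF
[oracle@bdanode1 ~]$ hive -f /tmp/weblog_raw_external.hql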

In this instance, I want the KM to leave the source files in place when doing the data load (Hive by default moves incoming source files into the /user/hive/warehouse directory area), as these files most probably haven’t finished being written to by Flume yet, so I leave the EXTERNAL_TABLE value set to true (Hive external table, not Oracle external table) and make sure FILE_IS_LOCAL is set to FALSE, so that this KM knows to use the HDFS file location hack I set up in the topology earlier. Then I just run the mapping and check that it’s worked OK:

NewImage

and I can check from the Model pane in the Designer navigator that I’ve now got a Hive table of individually split-up log entry columns to work with, for the rest of the downstream ETL process:

NewImage
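As an extra check outside of ODI, you could also query the newly-loaded table from the Hive command line; a sketch, where the table and column names are placeholders standing in for whatever your target datastore is actually called:

[oracle@bdanode1 ~]$ hive -e "SELECT host, request, status FROM access_log_parsed LIMIT 5"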

So that’s the first stage done – next, I’ll be combining this Hive table with data from another one, using the IKM Hive Control Append KM.

Rittman Mead BI Forum 2014 Call for Papers Now Open!

It’s January 2014, and it’s that time of year when we start planning out this year’s BI Forum, which like last year’s event will be running in May 2014 in Brighton and Atlanta. This will be our sixth annual event, and as with previous years the most important part is the content – and as such, the Call for Papers for BI Forum 2014 is now open, running through to January 31st 2014.

If you’ve not been to one of our BI Forum events in past years, the Rittman Mead BI Forum is all about Oracle Business Intelligence, and the technologies and techniques that surround it – data warehousing, data analysis, big data, unstructured data analysis, OLAP analysis and this year – in-memory analytics. Each year we select around ten speakers for Brighton, and ten for Atlanta, along with keynote speakers and a masterclass session, with speaker choices driven by attendee votes at the end of January, and editorial input from myself, Jon Mead and Stewart Bryson.

NewImage

Last year we had sessions on OBIEE internals and new features, OBIEE visualisations and data analysis, OBIEE and “big data”, along with sessions on Endeca, Exalytics, Exadata, Essbase and anything else that starts with an “E”. This year we’re continuing the theme, but are particularly looking for sessions on what’s hot this year and next – integration with unstructured and big data sources, use of engineered systems and in-memory analysis, advanced and innovative data visualisations, cloud deployment and analytics, and anything that “pushes the envelope” around Oracle BI, data warehousing and analytics.

NewImage

The Call for Papers entry form is here, and we’re looking for speakers for Brighton, Atlanta, or both venues. We’re also looking for presenters for ten-minute “TED”-style sessions, and if you have any ideas for keynote speakers, send them directly to me at mark.rittman@rittmanmead.com. Other than that – have a think about abstract ideas now, and make sure you get them in by January 31st 2014.

For 24 Hours Only: Test our New Support Service Desk and Get Free OBIEE Support!

As well as offering consulting and training around Oracle’s BI, DW and EPM products, Rittman Mead also provide managed services and support around these products for customers around the world. We’re just about to move over to a new service desk system based on Atlassian Jira, and as part of our final “soak test” we’re going to open up the service publicly for the next 24 hours, taking on public OBIEE support requests on a “best endeavour” basis.

From mid-day today (Wednesday, 18th December 2013, UK time) through to mid-day tomorrow, you can register here and then raise a support ticket, and we’ll endeavour to answer your query over the next few days. As you’d expect, there are a few conditions: you must register with your work email address, and not one from Gmail, Yahoo, Outlook.com etc., and whilst we’ll try to answer your query, there are no promises and we may have to limit replies and accepted tickets. Full details of the offer, and of our global managed services, are on our support page here, including terms and conditions, how to register and how to raise a ticket.

 

UPDATE: Thanks for your interest in our support system. We have now closed registration…

Keep checking back here for future support offerings! Thanks and have a great Christmas and New Year.

Oracle Endeca Information Discovery 3.1 and Self-Service Data Mashup Capability

Oracle Endeca Information Discovery (OEID) 3.1 was released yesterday, 7th of November. The new version is a good step forward in satisfying business users by providing much better self-service discovery capabilities. OEID 3.1 now enables non-technical users to securely run agile business intelligence analysis on a variety of data sources, in a much easier way and without needing IT. At the same time, the integration with Oracle Business Intelligence is now even tighter, to the extent that Oracle has announced OEID 3.1 as “the only complete data discovery platform for the enterprise”. The product data sheet is accessible here, and in the same way as older versions of OEID, the software is downloadable via the Oracle e-Delivery website, the Oracle Software Delivery Cloud, where you can find downloads for all licensed Oracle products.

In the first release of the Provisioning Service, in OEID 3.0 in March 2013, users could upload only one file and it had to be in Excel format. The issue was proving that having the data on Endeca Server was a better solution than Excel itself: not being able to join data sources together, and having no text-enrichment analysis on the data, was enough to keep the Provisioning Service very limited. The good news is that the new version has answers to all the requirements you’d expect of a business intelligence provisioning service tool.

Having had a quick start with the product, here is a list of new features I came across at first glance:

  • Users can now join information from multiple sources such as files, databases or other pre-built Endeca Servers. Files can be in Excel format for structured data, or JSON for semi-structured data mostly coming from social media interactions, which makes it much easier to combine social media with other data available in the enterprise.

0102

  • If you have geocode data in your source, it’s a matter of a click to add a Map component to the dashboard, where it will automatically find the information it requires from the available data sources. The new Map component is not only more powerful in what it does, such as being able to display a Heat Layer, but also looks much better!

03

  • Business users can perform enrichment on their own unstructured text data to identify hidden entities, sentiment and so on, without support from IT. The new application settings include the capability to add text enrichment and text taggers to data sources.

04

05

  • Managing data sets, such as reloading resources, adding records, managing attributes or deleting data sets, is quite straightforward within the application settings.

06

Look out for my next post, where I’ll go into more detail on the exciting new features of OEID v3.1.