Tag Archives: Cloud

Replicating Hive Data Into Oracle BI Cloud Service for Visual Analyzer using BICS Data Sync

In yesterday’s post on using Oracle Big Data Discovery with Oracle Visual Analyzer in Oracle BI Cloud Service, I said mid-way through the article that I had to copy the Hadoop data into BI Cloud Service so that Visual Analyzer could use it; at present Oracle Visual Analyzer is only available as part of Oracle BI Cloud Service (BICS) so at some point the data prepared by Big Data Discovery had to be moved into BICS so that Visual Analyzer (VA) could access it. In the future once Visual Analyzer is available on-premise as part of Oracle Business Intelligence 12c we’ll be able to connect the on-premise RPD directly to Hadoop via the Cloudera Impala ODBC driver, but for now to get this early access to VA features we’re going to have to copy the data up to BICS and report on it from there. So how does this work?

With this second release of BICS there are actually a number of ways to get on-premise data up into BICS’s accompanying database service:

  • As before, you can export data as CSV or an Excel Spreadsheet, and upload it manually into BICS using the Data Load feature (the approach I took in this recent Oracle Magazine article)
  • You can use SQL*Developer to SFTP “carts” of Oracle database data up into BICS, where it’ll then be unpacked and used to create Oracle tables in the accompanying database
  • You can now also connect BICS to the full Oracle Database-as-a-Service, a full database rather than a single schema that also provides a SQL*Net connection that ETL tools can connect to, for example ODI or Informatica
  • And there’s now a new utility called “Data Sync” that we’ll use in this example, to replicate tables or files up into BICS’s database store with options for incremental refresh, drop-and-reload and so forth

In our case the situation is a bit more complicated in that our data sits in a Hadoop cluster, as Hive tables that we’re accessing through the Cloudera Impala MPP engine. OBIEE 11.1.1.9 can actually connect directly to Impala and if we were just using Answers and Dashboards we wouldn’t have any more work to do, but as we’re using VA through BICS and BICS can’t access on-premise data sources, we need some way of copying the data up into BICS so VA can access it. Again, there’s many ways you can get data out of Hive on Hadoop and into databases and files, but the approach I took is this:

  1. First export each of the Hive tables I accessed through the on-premise RPD into CSV files, in my case using the Hue web-based user interface in CDH5
  2. Then use the Data Sync to upload the contents of those CSV files to BICS’s database store, selecting the correct Oracle datatypes for each of the columns
  3. Do any modeling on those tables to add any sequences or keys that I’m going to need when working with BICS’s more simplistic RPD modeller
  4. Then create a replica (or as close to replica) RPD model in BICS to support the work I’m going to want to do with VA

Again, there are also other ways to do this – another option is to just lift-and-shift the current 11.1.1.9 RPD up into BICS, and replicate the Hive/CSV data into Oracle Database-as-a-Service and then repoint the uploaded RPD to this service, but I haven’t got a DBaaS instance to-hand and I think it’d be more useful to replicate using BICS and recreate the RPD manually – as that’s what most customers using BICS will end-up doing. So the first step then is to export the Hive data out into CSV files using Hue, by first running a SELECT * FROM … for each table, then using the menu option to export the query results to a CSV file on my workstation.

NewImage

Then it’s a case of setting up BICS Data Sync to first connect to my BICS cloud instance, and then selecting one-by-one the CSV files that I’ll be uploading into BICS via this tool.

NewImage

Of course anyone who’s been around Oracle BI for a while will recognise Data Sync as being built on the DAC, the ETL orchestration tool that came with the 7.9.x releases of BI Apps and worked in-conjunction with Informatica PowerCenter to load data into the BI Apps data warehouse. The DAC is actually a pretty flexible tool (disclaimer – I know the development PMs at Redwood Shores and think they’re a pretty cool bunch of people) and more recently it gained the ability to replicate BI Apps DW data into TimesTen for use with Exalytics, so it’s pluggable architecture and active development team meant it provided a useful platform to deliver something in-between BICS’s ApEX data uploader and fully-fledged ODI loading into Oracle DBaaS. The downside of using something built on the DAC is that the DAC had some UI “quirks”, but equally the upside is that if you know the DAC, you can pretty much pick up Data Sync and guess how it works.

As part of uploading each CSV file, I also get to sample the file contents and confirm the datatype choices that Data Sync has suggested; these can of course be amended, and if I’m bringing in data from Oracle, for example, I wouldn’t need to go through such an involved process. 

NewImage

Then it’s a case of uploading the data. In my case one of the tables uploaded OK first time, but an issue I hit was where Hive tables had numeric columns containing NULLs that got exported as the text “NULL” and then caused the load to fail when trying to insert them into numeric columns. Again, a bit of knowledge of how the DAC worked came in useful as I went through the log files and then restarted parts of the load – in the end I replaced the word NULL with an empty string and the loads then succeeded. 

NewImage

Now the data should be uploaded to BICS, you can check out the new tables and their contents either from within BICSs Data Modeller function, or from within the ApEx console that comes with BICS’s database part.

NewImage

One thing I did know I’d have to give some thought to was how to do the types of RPD modelling I’d done in the on-premise RPD, within the more constrained environment of the BICS data modeller. Looking back at the on-premise RPD I’ve made a lot of use of aliases to create fact and dimension versions of key log tables (posts, tweets) and multiple versions of the date dimensions, whereas in BICS you don’t get aliases but you can create database views. What was more worrying was that I’d used columns from the main webserver log table to populate both the main logical fact table and another dimension whilst still keeping a single source table as their physical source, but in BICS I’d have to create these two sources as views and then join them on a common key, which would be tricky as the log table in Hive didn’t have an obvious primary key. In the end I “cheated” a bit and created a derived copy of the incoming log file table with a sequence number added to it, so that I could then join both the derived fact table and dimension table on this synthetic unique key column.

NewImage

Now it’s a case of modelling out the various incoming tables uploaded via Data Sync into the facts and dimensions that the BICS data model will use; again something to be aware of is that each of these tables will need to join to its relevant dimensions or facts, so you need to leave the joining keys in the fact table rather than remove them as you’d do when creating logical fact tables in on-premise OBIEE.

NewImage

Tables that only perform one role, for example the IMP_RM_POSTS_VA table that contains details of all blog posts and web pages on our site, can be brought into the model as they are without creating views. For the second time when I add in the time dimension table, this time to create a time dimension role table for the Tweets fact table, I have to create a view over the table that performs a similar role to alias tables in on-premise OBIEE, and I’m then directed to create a fact or dimension object in the model from that view.

NewImage

Once this is all done, I end up with a basic BICS data model that’s starting to look like the one I had with the on-premise OBIEE install.

NewImage

Then finally, once I’d amended all the column names, brought in all of the additional columns and lookup tables to provide for example lists of Twitter user handles, I could then view the model in BICS’s Visual Analyzer and start produce data visualisation projects off of it.

NewImage

So – it’s admittedly a bit convoluted in the first iteration but once you’ve set up the BICS data model and the Data Sync upload process, you can use DataSync to refresh the replicated Hive table data in the BICS database instance and keep the two systems in-sync. As I said, OBIEE12c will come with Visual Analyzer as part of the on-premise install, but until then this is the way we link VA to Big Data Discovery on Hadoop to enable Visual Analyzer access to BDD’s enriched datasets.

New In Oracle BI Cloud Service – Oracle Visual Analyzer, and Data Mashups in VA

Oracle released an update to Oracle BI Cloud Service a few weeks ago that included Oracle Visual Analyzer, along with some other improvements including support for full Oracle Database-as-a-Service as the database backend, the ability to upload RPDs and run them in the cloud, and support for a new utility called DataSync. In this post though I want to take a quick look at Visual Analyzer, and in-particular look at the data-mashup feature it provides.

Visual Analyzer is of course one of the tentpole features in OBIEE12c that we’ve all been looking forward to, as is 12c’s ability to allow users to upload spreadsheets of data and join them to existing subject areas in Answers. I’m covering Visual Analyzer in an upcoming edition of Oracle Magazine so I won’t go into too much detail on the product at a high-level here, but in summary Visual Analyzer provides a single-pane-of-glass, Tableau-type environment for analysing and visualising datasets stored in Oracle Cloud Database and modelled in BICS’s cut-down web-based data-modeller. In the Oracle Magazine article I take the Donors Choose dataset that we featured at the recent Rittman Mead BI Forum 2015, and create a range of visualizations as I explore the dataset and pick the type of project I’d most like to donate to.

NewImage

Visual Analyzer differs from Answers in that all of the available data items are listed down one side of the page, there’s no flicking backwards-and-forwards between the Criteria tab and the Results tab, filters are set by just right-clicking on the column you wish to filter by, and the visualisation builds up in front-of your eyes as you add more columns, move things around and arrange the data to get the most appropriate view of it. From an IT manager’s perspective, where Visual Analyzer improves on desktop analysis tools such as Tableau and Spotfire is that the data you work with is the same governed dataset that Answers and Dashboards users work with, the same security rules and auditing apply to you as to other Presentation Services and Catalog users, but those types of “self-service” users who just want to play-around with and explore the data – rather than create reports and dashboards for mass consumption – now can work with the type of tool they’ve up-to-now had to look elsewhere for.

NewImage

One of the other headline features for OBIEE12c announced at last year’s Oracle Openworld is “Model Extensibility and Data Mashup”. Announced as part of Paul Rodwick’s “Business Analytics and Strategy Roadmap” session and described in the slide below, this feature extends the capabilities of the BI Server to now handle data the user uploads from the Answers (and now Visual Analyzer) report creation page, joining that data as either “fact extensions” or “measure extensions” to an existing Presentation Services subject area. 

NewImage

I won’t go into the technical details of how this works at this point but in terms of how it works for the end-user, let’s consider a situation where I’ve got a spreadsheet of additional state-level data that I’d like to use in this Visual Analyzer (VA) project, to in this case colour the states in the map based on the income level of the people living there. The spreadsheet of data that I’ve got looks like this:

NewImage

Note the cunningly-named columns in the first row – they don’t have to match the column names in your VA data model, but if they do as you’ll see in a moment it speeds the matching process up. To add this spreadsheet of data to my VA project I therefore switch the menu panel on the left to the Data Sources option, right-click and then choose Add Data Source…

NewImage

Then using the Add Data Source dialog, upload the XLSX file from your workstation. In my instance, because I named the columns in the top row of the spreadsheet to match the column names already in the BICS data model, it’s matched the SCHOOL_STATE column in the spreadsheet to the corresponding column in the SCHOOLS table and worked out that I’m adding measures, joined on that SCHOOL_STATE column.

NewImage

If my spreadsheet contained other text fields matched to the existing model via a dimension attribute, the upload wizard would assume I’m adding dimension attributes, or if it detects them wrong I can match the columns myself, and specify whether the new file contains measures or attributes. BICS then confirms the join between the two datasets and I can then start selecting from the new measures to add to my project.

NewImage

My final step then is to add the HOUSEHOLD_INCOME measure to my visualisation, so that each state is now shaded by the household income level, allowing me to see which states might benefit most from my school project donation.

NewImage

One thing to bear-in-mind when using mashups though, is that what you’re effectively doing is adding a new fact table that joins to the existing one on one or more dimension levels. In my case, my HOUSEHOLD_INCOME and POPULATION measures only join to the DONATIONS dataset on the SCHOOL dimension, and then only at the STATE level, so if I try and reference another column from another dimension – to add, for example a filter on the FUNDING STATUS column within the PROJECTS dimension – the project will error as that dimension isn’t conformed across both facts.

NewImage

My understanding is that Oracle will fix this in a future release by setting all the non-conformed dimensions to “Total” as you can do with the on-site version of OBIEE yourself, but for now this restricts mashups to datasets that use fully-conformed dimensions, and with filters that only use those conformed dimensions from the join-level up.

So that’s VA on BICS in a nutshell, with this article drilling-down further into the very interesting new data mashup feature. Look out for more on this new release of BICS soon as I cover the new DataSync feature, RPD uploads and connecting BICS to the full Oracle Database-as-a-Service.

 

New Oracle Magazine article on Oracle BI Cloud Service

NewImageThe May/June 2015 edition of Oracle Magazine is now out, and my Business Intelligence article in this edition is on Oracle BI Cloud Service (BICS). In “Upload, Model, Analyze and Report” I focus on the “departmental power-user” use-case where someone with a small team wants to share data and dashboards with others in the department, is familiar with OBIEE but wants to get something up-and-running quickly without having to include IT or the formal development process. By just uploading a spreadsheet of data and quickly modeling it into a star-schema using BICS’s simple-to-use web-based tools, you can create reports that can be shared with others in your team using OBIEE’s familiar dashboard interface.

Rittman Mead offer a number of services around Oracle BI Cloud Service and now have our own BICS “pod” for demonstration and evaluation purposes. Now that Oracle have released Visual Analyser early for BICS we’re expecting a lot of interest and demand for services, support and training around Oracle’s cloud version of OBIEE, so if you’re interested in moving part of your OBIEE estate in the cloud, or you’re a departmental BI lead looking to run OBIEE within your department without the need to get IT involved, drop us a line at enquiries@rittmanmead.com and we’ll be pleased to help.

Presentation Slides and Photos from the Rittman Mead BI Forum 2015, Brighton and Atlanta

It’s now the Saturday after the two Rittman Mead BI Forum 2015 events, last week in Atlanta, GA and the week before in Brighton, UK. Both events were a great success and I’d like to say thanks to the speakers, attendees, our friends at Oracle and my colleagues within Rittman Mead for making the two events so much fun. If you’re interested in taking a look at some photos from the two events, I’ve put together two Flickr photosets that you can access using the links below:

NewImage

We’ve also uploaded the presentation slides from the two events (where we’ve been given permission to share them) to our website, and you can download them including the Delivering the Oracle Information Management and Big Data Reference Architecture masterclass using the links below:

Delivering the Oracle Information Management & Big Data Reference Architecture (Mark Rittman & Jordan Meyer, Rittman Mead)

Brighton, May 7th and 8th 2015

Atlanta, May 14th and 15th 2015

Congratulations also to Emiel van Bockel and Robin Moffatt who jointly-won Best Speaker award at the Brighton event, and to Andy Rocha and Pete Tamsin who won Best Speaker in Atlanta for their joint session. It’s time for a well-earned rest now and then back to work, and hopefully we’ll see some of you at KScope’15, Oracle Openworld 2015 or the UKOUG Tech and Apps 2015 conferences later in 2015.

Last Chance to Register for the Brighton Rittman Mead BI Forum 2015!

It’s just a week to go until the start of the Brighton Rittman Mead BI Forum 2015, with the optional one-day masterclass starting on Wednesday, May 6th at 10am and the event opening with a reception and Oracle keynote later in the evening. Spaces are still available if you want to book now, but we can’t guarantee places past this Friday so register now if you’re planning to attend.

NewImage

As a reminder, here’s some earlier blog posts and articles about events going on at the Brighton event, and at the Atlanta event the week after:

We’re also running our first “Data Visualisation Challenge” at both events, where we’re asking attendees to create their most impressive and innovative data visualisation within OBIEE using the Donors Choose dataset, with the rule being that you can use any OBIEE or related technology as long as the visualisation runs with OBIEE and can respond to dashboard prompt controls. We’re also opening it up to OBIEE running as part of Oracle BI Cloud Service (BICS), so if you want to give Visual Analyser a spin within BICS we’d be interested in seeing the results.

Registration is still open for the Atlanta BI Forum event too, running the week after Brighton on the 13th-15th May 2015 at the Renaissance Atlanta Midtown hotel. Full details of both events are on the event homepage, with the registration links for Brighton and Atlanta given below.

  • Rittman Mead BI Forum 2015, Brighton –  May 6th – 8th 2015 
We look forward to seeing you all in Brighton next week, or Atlanta the week after – but remember to book soon, before we close registration!