Tag Archives: Cloud

ADVISOR WEBCAST: Enterprise Performance Management (EPM) Cloud Integration

Advisor Webcast

Enterprise Performance Management (EPM) Cloud Integration

October 13, 2016

Schedule:
Thursday , October 13, 2016 10:00 AM (US Pacific Time)
    Thursday , October 13, 2016 01:00 PM (US Eastern Time)
    Thursday , October 13, 2016 07:00 PM (Central European Time)
    Thursday , October 13, 2016 10:30 PM (India Standard Time)

This one-hour Advisor Webcast is recommended for functional and technical users who are responsible for defining the data integration needs for their EPM cloud applications. Solutions for cloud only, on-premise to cloud hybrid solutions, and cloud to cloud integration will be explained and demonstrated.

TOPICS WILL INCLUDE:

  • Native data load options for the EPM cloud applications
  • Solutions for hybrid integration with on-premise and cloud applications
  • Cloud to Cloud integration options for EPM Applications
  • Automating the data integration process

For Additional Information and Registration Details - visit:

My Oracle Support Community
Advisor Webcast - Enterprise Performance Management (EPM) Cloud Integration

To view scheduled & archived recordings of previous Business Analytics Advisor Webcasts visit:

Oracle Business Analytics Advisor Webcast
Doc ID 1456233.1



CUSTOMER NOTIFICATION : Attendees using recent Chrome or Firefox browser versions maybe required to activate the WebEx plug-in before joining a WebEx meeting. For up to-date information review the WebEx Support page:

https://support.webex.com/webex/meetings/en_US/chrome-firefox-join-faq.htm


Using Python to ‘Wrangle’ Your Messy Data

 or How to Organize Your Comic Book Collection Based on Issue Popularity

In addition to being a product manager at Rittman Mead, I consider myself to be a nerd of the highest order.  My love of comic books, fantasy, sci-fi and general geekery began long before the word ‘Stark’ had anything to do with Robert Downey Jr or memes about the impending winter.  As such, any chance to incorporate my hobbies into my work is a welcomed opportunity.  For my next few blog entries, I’ve decided to construct a predictive classification model using comic book sales data whose eventual goal will be to build a model that can accurately predict whether a comic will rocket off the shelves or if it will be a sales dud.  The first blog of the series shows some of the pitfalls that can come up when preparing your data for analysis.  Data preparation, or data wrangling as it has come to be known, is an imperfect process that usually takes multiple iterations of transformation, evaluation and refactoring before the data is “clean” enough for analysis.

While the steps involved in data wrangling vary based on the state and availability of the raw data, for this blog I have chosen to focus on the gathering of data from disparate sources, the enrichment of that data by merging their attributes and the restructuring of it to facilitate analysis. Comic book sales data is readily available on the interwebs, however, finding that data in a usable format proved to be a more difficult task.  In the end, I had to resort to dreaded process of screen scraping the data from a comic research site.  For those of you who are lucky enough be unfamiliar with it, screen scraping is the process of programmatically downloading HTML data and stripping away that formatting to make it suitable for use.  This is generally used as a last resort because web sites are volatile creatures that are prone to change their appearance as often as my teenage kids do preparing to leave the house.  However, for the purposes of this series, as my friend Melvin the handyman would often say, “We works with what we gots.”

blog-ironman-pythonexclamation-point-icon-30522This leads us to the first issue you may run into while wrangling your data.  You have access to lots of data but it’s not pretty.  So make it pretty.  Working with raw data is like being a sculptor working with wood.  Your job is not to change the core composition of the data to suit your purposes but to chip away at the excess to reveal what was there all along, a beautiful horse… er I mean insight.  Sorry, I got lost in my analogy.  Actually to expand on this analogy a bit, the first tool I pulled out of my toolbox for this project was Python, the Leatherman of  programming languages.  Python is fast, plays well with other technologies and most importantly in this case, Python is ubiquitous.   Used for tasks ranging from process automation and ETL to gaming and academic pursuits, Python is truly a multipurpose tool.  As such, if you have a functional need, chances are there is a native module or someone has already written a public library to perform that function.  In my case, I needed some scripts to “scrape” HTML tables containing comic sales data and combine that data with other comic data that I retrieved elsewhere.  The “other” data is metadata about each of the issues.  Metadata is just data about data.  In this case, information about who authored it, how it was sold, when it was published, etc..  More on that later.  

blog-sales-tableLuckily for me, the format of the data I was scraping was tabular, so extracting the data and transforming it into Python objects was a relatively simple matter of iterating through the table rows and binding each table column to the designated Python object field.   There was still a lot of unnecessary content on the page that needs to be ignored, like the titles and all of the other structural tags, but once I found the specific table holding the data, I was able to isolate it.  At that point, I wrote the objects to to a CSV file, to make the data easy to transport and to facilitate usability by other languages and/or processes.

The heavy lifting in this process was performed by three different Python modules: urllib2, bs4 and csv. Urllib2, as the name implies, provides functions to open URLs.  In this case, I found a site that hosted a page containing the estimated issue sales for every month going back to the early 1990’s.  To extract each month without manually updating the hardcoded URL over and over, I created a script that accepted MONTH and YEAR as arguments, month_sales_scraper.py
blog-get-sales

The response from the urlopen(url) function call was the full HTML code that is typically rendered by a web browser.  In that format, it does me very little good, so I needed to employ a parser to extract the data from the HTML.  In this context, a parser is a program that is used to read in a specific document format, break it down into its constituent parts while preserving the established relationships between those parts, and then finally provide a means to selectively access said parts.  So an HTML parser would allow me to easily access all the <TD> column tags for a specific table within an HTML document.  For my purposes, I chose BeautifulSoup, or bs4.

BeautifulSoup provided search functions that I used to find the specific HTML table containing the sales data and loop through each row, while using the column values to populate a Python object.

blog-bs4

This Python object, named data, contains fields populated with data from different sources.  The year and month are populated using the arguments passed to the module.  The format field is dynamically set based on logic related to the rankings and the remaining fields are set based on their source’s position in the HTML table.  As you can see, there is a lot of hard coded logic that would need to be updated, should the scraped site change their format.  However, for now this logic gets it done.

The final step of this task was to write those Python objects to a CSV file.   The python module, CSV, provides the function writerow(), which accepts an array as a parameter and writes each of the array elements as columns in the CSV.
blog-csv-write

My first pass raised the an exception because the title field contained unicode characters that the CSV writer could not handle.
blog-unicode-error

To rectify this, I had to add a check for unicode and encoded the content as UTF-8.  Unicode and UTF-8 are character encodings; meaning they provide a map computers use to identify characters.  This includes alphabets and logographic symbols from different languages as well as common symbols like ®.

blog-unicode-to-utf8

Additionally, there was the matter of reformatting the values of some of the numeric fields to allow math to be performed on them later(ie stripping ‘$’ and commas).  Other than that, the data load went pretty smoothly.  A file named (MONTH)_(YEAR).CSV was generated for each month.  Each file turned out like so:

blog-sales-csv

While this generated tens of thousands of rows of comic sales data, it was not enough.  Rather, it had the volume but not the breadth of information I needed.  In order to make an accurate prediction, I needed to feed more variables to the model than just the comic’s title, issue number, and price.  The publisher was not relevant as I decided to limit this exercise to only Marvel comics and passing in the the estimated sales would be cheating, as rank is based on sales.  So to enhance my dataset, I pulled metadata about each of the issues down from “the Cloud” using Marvel’s Developer API.  Thankfully, since the API is a web service, there was no need for screen scraping.

exclamation-point-icon-30522Retrieving and joining this data was not as easy as one might think.  My biggest problem was that the issue titles from the scraped source were not an exact match to the titles stored in the Marvel database.  For example, the scraped dataset lists one title as ‘All New All Different Avengers”.  Using their API to search the Marvel database with that title retrieved no results.  Eventually, I was able to manually find it in their database listed as “All-New All-Different Avengers”.  In other cases, there were extra words like “The Superior Foes of Spider-Man” vs “Superior Foes of Spider-Man”.  So in order to perform a lookup by name, I needed to know the titles as they expected them.  To do this I decided to pull a list of all the series titles whose metadata was modified during the timeframes for which I had sales data.  Again, I ran into a roadblock.  The Marvel API only allows you to retrieve up to 100 results per request and Marvel has published thousands of titles. To get around this I had to perform incremental pulls, segmented alphabetically.  

blog-incremental-pulls

Even then there were a few minor issues, as some letters like ‘S’ had more than 100 titles.  To get around that I had to pull the list for ‘S’ titles sorted ascending and descending then combine the results, making sure to remove duplicates.  So my advice on this one is be sure to read up on the limitations of any API you are using.  It may enforce limits but you may be able to work around the limits using creative querying.

blog-sticky-tab

At this point I have my list of Marvel series titles, stored in some CSV files that I eventually combined into a single file, MarvelSeriesList.csv, for ease of use.  Actually, I have more than that.  While retrieving the series titles, I also pulled down the ID for the series and an appropriateness rating.  Searching the API by ID will be much more accurate than name and the appropriateness  rating may be useful when building out the prediction model.  The next step was to iterate through each row of the CSVs we created from the sales data, find the matching ID from MarvelSeriesList.csv and use that ID to retrieve its metadata using the API.

exclamation-point-icon-30522If you remember, the point of doing that last step was that the titles stored in the sales data files don’t match the titles in the API, so I needed to find a way to join the two sources.  Rather than writing cases to handle each of the scenarios (e.g. mismatched punctuation, extra filler words), I looked for a python library to perform some fuzzy matching.  What I found was a extremely useful library called, Fuzzy Wuzzy.  Fuzzy Wuzzy provides a function called extractOne() that allows you to pass in a term and compare it with an array of values.  The extractOne() function will then return the term in the array that has the highest match percentage.  Additionally, you can specify a lower bound for acceptable matches (ie. only return result where the match is >= 90%).

Again, it took a few passes to get the configuration to work effectively.  The first time through about only about 65% of the titles in the sales file found a match.  That was throwing away too much data for my liking so I had to look at the exceptions and figure out why the matches were falling through.  One issue that I found was titles that tacked on the publication years in the Marvel database, like “All-New X-Men (2012)”, had a match score in the 80’s when matched against a sales title like, “All New X-Men”.  This was a pretty consistent issue, so rather than lowering the match percentage, which could introduce some legitimate mismatches, I decided to strip the year, if present, on mismatches and run it through that matching process again.  This got me almost there.  The only other issue I ran into was Fuzzy Wuzzy had trouble matching acronyms/acrostics.  So ‘S.H.E.I.L.D.’  had a match score in the 50’s when matching ‘SHIELD’.  That’s because half the characters (periods) were missing.  Since  there were only two titles affected, I built a lookup dictionary of special cases that needed to be translated.  For the purposes of this exercise, I would still have had enough matches to skip that step, but by doing it brought us up to 100% matching between the two sources.  Once the matching function was working, I pulled out urllib2 and retrieved all the metadata I could for each of the issues.

The resulting files contained not only sales data (title, issue number, month rank, estimated sales), but information about the creative team, issue descriptions, characters, release dates and  associated story arcs.  This would be enough to get us started with building our predictive classification model.
blog-csv-all That being said, there was still a lot of structural rearranging required to make it ready for the type of analysis I wanted to do, but we will deal with that in the next blog.  Hopefully,  you picked up some useful tips on how to combine data from different sources or at the very least found solace in knowing that while you may not be the coolest person in the world, somewhere out there is a grown man who still likes to read comic books enough to write a blog about it.  Be sure to tune in next week,
True Believers, as we delve into The Mysteries of R!

The post Using Python to ‘Wrangle’ Your Messy Data appeared first on Rittman Mead Consulting.

OBIEE12c – Three Months In, What’s the Verdict?

I’m over in Redwood Shores, California this week for the BIWA Summit 2016 conference where I’ll be speaking about BI and analytics development on Oracle’s database and Hadoop platforms. As it’s around three months now since OBIEE12c came out and we were here for Openworld, I thought it’d be a good opportunity to reflect on how OBIEE12c has been received by ourselves, the partner community and of course by customers. Given OBIEE11g was with us for around five years it’s still early days in the overall lifecycle of this release, but it’s also interesting to compare back to where we were around the same time with 11g and see if we can spot any similarities and differences to take-up then.

Starting with the positives; Visual Analyzer (note – not the Data Visualization option, I’ll get to that later) has gone down well with customers at least over in the UK. The major selling point seems to be “Tableau with a shared governed data model and integrated with the rest of our BI platform” (see Oracle’s slide from Oracle Openworld 2015 below), and given that the DV option’s price point per named-used seems to be comparable with Tableau server the cost-savings in terms of not having to learn and support a new platform means that customers seem pleased this new feature is now available.

NewImage

Given that VA is an extra-cost option what I’m seeing is customers planning to upgrade their base OBIEE platform from 11g to 12c as part of their regular platform refresh schedule, and then postponing the VA part until after the upgrade and as part of a separate cost/benefit exercise. But VA seems to be the trigger for customers to start considering an upgrade now, with the business typically now holding the budget for BI and Visual Analyzer (like Mobile with 11g) being the new capability that unlocks the upgrade spend.

On the negative side, Oracle charging for VA hasn’t gone down well, either from the customer side who ask what it is they actually get for their 22% upgrade and maintenance fee if they they have to pay for anything new that comes with the upgrade; or from partners who now see little in the 12c release to incentivise customers to upgrade that’s not an additional cost option. My response is usually to point to previous releases – 11g with Scorecards and Mobile, the database with In-Memory, RAC and so on – and say that it’s always the case that anything net-new comes at extra cost, whereas the upgrade should be something you do anyway to keep the platform up-to-date and be prepared to uptake new features. My observation over the past month or so is that this objection seems to be going away as people get used to the fact that VA costs extra; the other push-back I get a lot is from IT who don’t want to introduce data mashups into their environment, partly I guess out of fear of the unknown but also partly because of concerns around governance, how well it’ll work in the real world, so on and so on. I’d say overall VA has gone down well at least once we got past the “shock” of it costing extra, I’d expect there’ll be some bundle along the lines of BI Foundation Suite (BI Foundation Suite Plus?) in the next year or so that’ll bundle BIF with the DV option, maybe include some upcoming features in 12c that aren’t exposed yet but might round out the release. We’ll see.

The other area where OBIEE12c has gone down well, surprisingly well, is with the IT department for the new back-end features. I’ve been telling people that whilst everyone thinks 12c is about the new front-end features (VA, new look-and-feel etc) it’s actually the back-end that has the most changes, and will lead to the most financial and cost-saving benefits to customers – again note the slide below from last year’s Openworld summarising these improvements.

NewImage

Simplifying install, cloning, dev-to-test and so on will make BI provisioning considerably faster and cheaper to do, whilst the introduction of new concepts such as BI Modules, Service Instances, layered RPD customizations and BAR files paves the way for private cloud-style hosting of multiple BI applications on a single OBIEE12c domain, hybrid cloud deployments and mass customisation of hosted BI environments similar to what we’ve seen with Oracle Database over the past few years.

What’s interesting with 12c at this point though is that these back-end features are only half-deployed within the platform; the lack of a proper RPD upload tool, BI Modules and Services Instances only being in the singular and so on point to a future release where all this functionality gets rounded-off and fully realised in the platform, so where we are now is that 12c seems oddly half-finished and over-complicated for what it is, but it’s what’s coming over the rest of the lifecycle that will make this part of the product most interesting – see the slide below from Openworld 2014 where this full vision was set-out, but in Openworld this year was presumably left-out of the launch slides as the initial release only included the foundation and not the full capability.

NewImage

Compared back to where we were with OBIEE11g (11.1.1.3, at the start of the product cycle) which was largely feature-complete but had significant product quality issues, with 12c we’ve got less of the platform built-out but (with a couple of notable exceptions) generally good product quality, but this half-completed nature of the back-end must confuse some customers and partners who aren’t really aware of the full vision for the platform.

And finally, cloud; BICS had an update some while ago where it gained Visual Analyzer and data mashups earlier than the on-premise release, and as I covered in my recent UKOUG Tech’15 conference presentation it’s now possible to upload an on-premise RPD (but not the accompanying catalog, yet) and run it in BICS, giving you the benefit of immediate availability of VA and data mashups without having to do a full platform upgrade to 12c.

NewImage

In-practice there are still some significant blockers for customers looking to move their BI platform wholesale into Oracle Cloud; there’s no ability yet to link your on-premise Active Directory or other security setup to BICS meaning that you need to recreate all your users as Oracle Cloud users, and there’s very limited support for multiple subject areas, access to on-premise data sources and other more “enterprise” characterises of an Oracle BI platform. And Data Visualisation Cloud Service (DVCS) has so far been a non-event; for partners the question is why would we get involved and sell this given the very low cost and the lack of any area we can really add value, while for customers it’s perceived as interesting but too limited to be of any real practical use. Of course, over the long term this is the future – I expect on-premise installs of OBIEE will be the exception rather than the rule in 5 or 10 years time – but for now Cloud is more “one to monitor for the future” rather than something to plan for now, as we’re doing with 12c upgrades and new implementations.

So in summary, I’d say with OBIEE12c we were pleased and surprised to see it out so early, and VA in-particular has driven a lot of interest and awareness from customers that has manifested itself in enquires around upgrades and new features presentations. The back-end for me is the most interesting new part of the release, promising significant time-saving and quality-improving benefits for the IT department, but at present these benefits are more theoretical than realisable until such time as the full BI Modules/multiple Services Instances feature is rolled-out later this year or next. Cloud is still “one for the future” but there’s significant interest from customers in moving either part or all of their BI platform to the cloud, but given the enterprise nature of OBIEE it’s likely BI will follow after a wider adoption of Oracle Cloud for the customer rather than being the trailblazer given the need to integrate with cloud security, data sources and the need to wait for some other product enhancements to match on-premise functionality.

The post OBIEE12c – Three Months In, What’s the Verdict? appeared first on Rittman Mead Consulting.

Taking a Look at Oracle Big Data Preparation Cloud Service – Spark-Based Data Transformation in the Cloud

One of the sessions I’m delivering at the upcoming Oracle Openworld 2015 in San Francisco is entitled “Oracle Business Intelligence Cloud Service—Moving Your Complete BI Platform to the Cloud [UGF4906]”, and looks at how you can now migrate your entire OBIEE11g platform into Oracle Public Cloud including data warehouse and data integration routines. Using Oracle BI Cloud Services’ new on-premise RPD upload feature you can upload an existing RPD into BICS and run it from there, with the accompanying data warehouse database moving into Oracle’s Database Cloud Service (and with the restriction that you can’t then edit the repository within BICS, you need to do that on-premise and upload again). For ETL and data integration you can carry on using ODI12c which now has the ability to load data into Oracle Storage Cloud (for file sources) and BICS (via a REST API) as well as the full Oracle DBaaS, but another Oracle option for doing purely cloud-based data processing enrichment has recent become available – Oracle Big Data Preparation Cloud Service. So what is it, how does it work and how is it different to ODI12c?

Oracle Big Data Preparation Cloud Service (“BDP”) is a thin-client application within Oracle Cloud for ingesting, preparing and enriching datasets that don’t have a predefined schema and may well need certain fields obfuscated or cleansed. Being integrated with Oracle Storage Cloud and other infrastructure and platform services within Oracle cloud it’s obviously aimed mainly at data transformation tasks within the Oracle Cloud enviroment, but you can upload and download datasets from your browser for use with on-premise applications. Unlike the more general-purpose Oracle Data Integrator it’s aimed instead at a particular use-case – non-technical information analysts who need to get data transformed, wrangled and enriched before they can make use of it in an environment like Hadoop. In fact the product name is a bit misleading – it runs on a big data platform within Oracle Cloud and like Oracle Big Data Discovery uses Apache Spark for its data processing – but it could potentially be useful for a business analyst to prepare data for loading into Oracle BI Cloud Service, and I’ll cover this angle when I talk about data loading options in by Oracle Openworld session.

Within a logical architecture for a typical big data DW and BI system, BDP sits alongside ODI within the Data Factory and provides self-service, agile transformation capabilities to more business-orientated users. 

NewImage

Oracle Big Data Cloud Preparation Service shares quite a bit of functionality and underlying technology, with Oracle Big Data Discovery – both run on Hadoop, they both use Apache Spark for data processing and transformation, and both offer data transformation and “wrangling” features aimed at non-technical users. Oracle are positioning Big Data Preparation Service as something you’d use in the execution layer of the Oracle Information Management Reference Architecture whereas Big Data Discovery is associated more with the discovery layer – I’d mostly agree but I can see a role for BDD even within the execution layer, as a front-end to the data reservoir that typically now runs alongside relationally-stored data warehouses.

NewImage

Looking back at the slides from one of the recent Strata conferences, for example, sees Oracle positioning BDP as the “operational data preparation” tool for structured and unstructured data – with no defined schema – coming into your information platform, with the enriched output then being used BI tools, enterprise reporting and data discovery tools.

NewImage

 

Apart from the scalability benefits of running BDP on Apache Spark, the other interesting feature in BDP is how it uses Spark’s machine learning capabilities to try to automate as much of the data preparation process as possible, for example detecting credit card numbers in data fields and recommending you obfuscate that column. Similar to BICS and how Oracle have tried to simplify the process of creating reports and dashboards for a small team, BDP runs in the cloud tries to automate and simplify as much of the data preparation and enrichment process as possible, with ODI12c still available for ETL developers to develop more complex transformations.

The development lifecycle for BDP (from the Oracle Big Data Preparation Cloud Service e-book on Oracle’s website) uses a cycle of ingesting, cleaning, enriching and then publishing data using scripts authored using the tool and run on the Apache Spark platform. The diagram below shows the BDP development lifecycle from Oracle’s Big Data Preparation Cloud Service Handbook, and shows how ingestion, enrichment, publishing and governance go in a cycle with the common foundation of the transformation scripts that you build using BDP’s web interface.

NewImage

So let’s walk through an example data preparation exercise using a file of data stored initially in Oracle Storage Cloud Service. After logging into BDP via Oracle Cloud you’re first presented with the Catalog view, listing out all your previous transformations and showing you when they were last used to process some data.

NewImage

To create a transformation you first give it a name, then select the data source and then the file you’re interested in. In my environment I’ve got Oracle Storage Cloud and HDFS available as my main data sources, or I could upload a file from my desktop and start from there.

NewImage

BDP then ingests the file and then uses its machine learning features to process and classify data in each column, recommending column names such as “gender”, “city” and cc_number based on (presumably) some sort of classification model. In the screenshot below you can see a set of these recommendations on the left-hand side of the screen, with the columns themselves listed centre and a summary of the file profiling on the right.

NewImage

Taking a closer look at the profile results panel you can see two of the columns have alerts raised, in red. Clicking on the alert shows that the two columns have credit card data stored in clear text, with the recommendation being to obfuscate or otherwise secure these fields. Clicking on a field then shows the various transformation options, with the obvious choice here being to automatically obfuscate the data in those fields.

NewImage

Once you’ve worked through all the recommendations and added any transformations you choose to add yourself, the final step is to publish your transformation to one of the available targets. In the example below we’ve got Oracle Storage Cloud and HDFS again as potential targets; I’d imagine Oracle will add a connector to BICS soon, for example, so that you can use BDP as a data prep tool for file data that will then be added to your dataset in BICS.

NewImage

So … it’ll be interesting to see where this one goes. Its interesting that Oracle have split out data preparation and data discovery into two tools whilst others are saying theirs can do both, and you’ll still need ODI for the more complex integration jobs. But I like the innovative use of machine learning to do away with much of the manual work required for classification of incoming data fields, and running the whole thing on Spark certainly gives it the potential of scale. A couple of years ago I was worried Oracle didn’t really have a strategy for data integration and ETL in the cloud, but we’re starting to see something happen now.

There’s a big push from the Oracle field at the moment to move customers into the cloud, and I can see BDP getting bundled in with Big Data Cloud Service and BICS as the accompanying cloud data preparation tool. The danger then of course is that Big Data Discovery starts to look less useful, especially with Visual Analyzer already available within BICS and coming soon on-premise with OBIEE12c. My guess is that what we’re seeing now with these initial releases of BDP and BDD is just the start, with BDP adding more automatic enrichment “smarts” and starting to cover integration use-cases too, whilst BDD will put more focus on data visualization and analytics on the data reservoir.

Oracle Data Integrator to load Oracle BICS and Oracle Storage Cloud Service

It is known that Oracle focuses its sales on cloud this year and in the BI world, we have seen the introduction of Oracle Business Intelligence Cloud Service – BICS – that we already covered on the blog. Another great product to store files and unstructured data is Oracle Storage Cloud Service, that can also be used as a staging area before loading a BICS schema. To provision our BI system in the cloud or to backup our files, it’s important to find a reliable and maintainable way to move on-premise data to the cloud. And when we speak about reliable and maintainable way to move data, I can’t help but think about Oracle Data Integrator. Luckily, Ayush Ganeriwal wrote two excellent blog posts explaining how to use ODI to load these two cloud services! And he even gave away the Knowledge Modules, ODI Open Tools, Jar files and other things he developed to make it work.

Load data to Oracle Storage Cloud Service with a package.

Oracle Storage Cloud Service is used to store files and unstructured data in the cloud, very much like Amazon S3. In the case of a company that has moved to the cloud, they need some space to store their files that will be used by other cloud services. Oracle Storage Cloud Service is the place to store all these files. If you want to use it as a cloud backup of your local filesystem, you might even choose Oracle Storage Cloud Archive Service which has cheaper storage but cost more when retrieving or editing data, similarly to Amazon Glacier .

Oracle Storage Cloud Service offers a RESTful web service API to load data as well as a Java Library directly wrapping this API and supporting client-side encryption. When it comes to load data, what could be better than using the tool we already use for all the rest? In this blog post , Ayush details how he created an ODI OpenTool that can load data to the Oracle Storage Cloud Service using the Java Library.

We first need to download the Java Library and place it in our ODI Studio or Agent classpaths. Below are the default locations for these classpaths, but we can also specify additional classpath if we want. Don’t forget to close ODI Studio and shutdown the agent(s) before adding the files.

ODI Studio on Unix : ~/.odi/oracledi/userlib
ODI Studio on Windows : %APPDATA%odioraclediuserlib
12c Standalone Agent : /odi/agent/lib
11g Standalone Agent : /oracledi/agent/drivers/

For a JEE agent, we need to create a server template as described in the doc : https://docs.oracle.com/middleware/1212/odi/ODIDG/setup_topology.htm#CHDHGIGI.

In the same folder, we can also drop the Jar file for the Open Tool that Ayush created and posted on java.net (last link in the folder at the moment of writing). Remember that all the drivers, libraries and Open Tools have to be present in every ODI Studio or agent install.

We can now reopen ODI Studio as we need to register the new Open Tool in the master repository so everyone can use it. In the ODI Menu, click on Add Remove Open  Tools and then on the googles icon to search for the Open Tools. In our case, the class is under the package oracle.odi.rest.

Adding an ODI Open Tool

Select the OracleStorageCloudGet tool and click the Plus icon to add it. There we can see the description of the tool, the command, the parameters and the icon used. Let’s add the OracleStorageCloudPut as well and click OK.

Open Tool description

That’s it! We can now see two new ODI Open Tools under the Plugins category in the Package Toolbox. We can add it to the package and fill all the parameters or copy/paste the command to use it in an ODI Procedure.

Open Tools in a package

Load data to Oracle Business Intelligence Cloud Service with a mapping.

BICS offers a Data Loader which is actually an Apex application taking Excel, CSV, text or zip files as input. But this is more a manual process used when working with sample data or really small dataset. If we plan to use BICS in a production environment, we need something more resilient that can be scheduled and that handles errors. We could use Data Sync but maybe you don’t want to introduce a new tool. Luckily, BICS also provides a RESTful web service API we can use to load data. Again, ODI is the perfect tool to create our own reusable components to integrate with new sources or targets. And once again, Ayush did all the job for us as described in this article.

The first step is similar to what we did previously, we need make all the drivers and libraries available for ODI Studio and the agent. The steps in the two Knowledge Modules contains Jython code, calling some java methods from the odi-bics.jar file. These methods – just like the Oracle Storage Cloud Service java library – will use the Jersey Libraries to do the RESTful web service calls so we will need them as well. Basically, just take all the jar files from the archive you can find on java.net under the name “RKM and IKM for Oracle BI Cloud Service”. As mentioned above, we need to add it to the classpath of ODI Studio and the agent installs and restart them.

The article mentions that we need to create a new Technology for BI Cloud Service but an XML export is already created in the archive. Go in the Topology, right click on Technology and import the xml file.

Adding a Technology

We can also import the two Knowledge Modules, either globally or under a single project. Ayush wrote one RKM to reverse-engineer the tables from our BICS schema and one Multi-Connections IKM to integrate the data from the on-premise database to the BICS schema in the cloud. I had to tweak the IKM a little bit but it might be linked to my particular setup. The steps were not displayed in the order it was set in the XML file and I had to reorder it properly to place the drop before the create and truncate. Probably a bug, I will investigate that later.

Reordering the steps in the KM

We are done with the install of new components and we can now set up the topology. First create a Dataserver with the newly imported technology and provide a login/password and the URL to the BICS instance as the Service URL. From my testing and unlike Ayush’s screenshot, it should start with https and should not include the port number. Of course, we need to create a physical schema where we need to provide our Identity Domain in both Tenant fields. We can leave the Database fields empty. The last step to do in the Topology is to create a Logical schema and associate it to the Physical schema through a context.

topo_DS

We can then switch to the Designer tab and create a new Model with BI Cloud Service technology and that Logical Schema. On the Reverse Engineer tab we select Customized and the RKM BI Cloud Service. We can also specify a mask to restrict the metadata import to specific table names before hitting the Reverse Engineer button at the top. I set the mask to DIM_LOC% and only my DIM_LOCATION as been reverse engineered from the BICS schema.

Model and reverse engineering

Finally, we can create a mapping. In this example I load two target tables. DIM_LOCATIONS (plural) is sitting on an on-premise Oracle Database for users accessing OBIEE from our HQ while DIM_LOCATION (singular) is in the schema linked to the our BICS instance used by remote users. There is nothing special here, except that I unselected the insert and update checkboxes for my surrogate key (LOCATION_SK) because I asked BICS to automatically populate it when I created the table – under the scene, a sequence is created and a trigger added to the table to populate the field with the next value of the sequence on each insert.

ODI-BICS-mapping

On the Physical tab, the LKM should be set on LKM SQL Multi-connect because we will delegate the data transfer to the Multi-Connections IKM. This IKM is the one we imported : IKM SQL to BI Cloud Service. I choose to enable the truncate option here.

mapping_2DS

Let’s hit the execute button and watch the result in the operator and in our BICS schema.

bics_result_DS

The two articles demonstrate one more time how easy it is to plug a new technology within ODI. This tool is a wonderful framework that can be used to move data from and to any technologies, even if it’s Big Data or Cloud as we have seen here. By having every integration jobs happening in the same place, we can have an easier maintenance, a better monitoring and we can schedule all the jobs together. It’s also way easier to see the big picture in our projects.

If you have any question about this post feel free to reach me, we love that kind of interesting challenges at Rittman Mead.

Thanks a lot to Ayush for such good components and libraries he provided. I’m glad I will share the stage with him and a fellow ODI expert, Holger Friedrich, at the Oracle Open World. We will speak about Best Practices for Development Lifecycle Management . Come to see us and say hi!