Tag Archives: Oracle Data Integrator

Upgrade to ODI 12c: Repository and Standalone Agent

This is my first post, and I hope to have many more to come. My name is Brian Sauer, and I am fairly new here at Rittman Mead and proud to be here. I have close to 5 years of experience with Oracle Data Integrator, both 10g and 11g, and I have been involved in ETL activities for close to 8 years. Prior to using Oracle Data Integrator, most of my ETL work was done in Perl. Ever since being exposed to Oracle and ODI, I have found a home that has been both comforting and challenging. It is a passion of mine to work with any type of database and to get data from point A to point B, making the necessary changes to put it into users’ hands in a meaningful and useful way. Thus, I’ve put together this blog post to assist with the upgrade of ODI 11g to ODI 12c. I hope it’s useful to you.

After going through the process of upgrading from ODI 11g to ODI 12c, I found the documentation to be scattered around a bit and began putting together a roadmap of the information that was vital to the upgrade itself. This post is essentially a brief explanation of some added enhancements to ODI in 12.1.3.0 and a straightforward guide to the upgrade process. I’m not addressing 12.2.1.0 in this post, as many of our clients have been moving to 12.1.3.0 due to stability and testing issues with 12.2.1.0. I will take a look at 12.2.1.0 in my next post in this series.

ODI 12.1.3.0 was released in November 2014, and along with it came a number of enhancements and significant changes from ODI 11.1.1.7.0. In this post I am going to focus on the upgrade process and some of the more significant changes.

According to the Oracle documentation, the only supported starting points for upgrading to ODI 12c (12.1.3) are the following:

  • Oracle Data Integrator 11g Release 1 (11.1.1.6.0)
  • Oracle Data Integrator 11g Release 1 (11.1.1.7.0)

There are some significant changes to ODI from 11g. First and foremost is the way agents are handled in 12c: Standalone Agents are now managed by the WebLogic Management Framework, and they are installed in their own directories. Also, interfaces have given way to mappings and reusable mappings. Lastly, the repository upgrade to 12.1.3 validates name uniqueness for objects such as interfaces, folders, procedures, knowledge modules, packages, and profiles. The upgrade process checks for duplicates and lists any it finds in the upgrade log; you’ll then need to reopen ODI 11g, manually fix the duplicates, and restart the upgrade.

Below is a comparison of the ODI JEE Agent Upgrade topology. Notice that in 12c only an LDAP-based store can be used for OPSS as file-based stores are no longer allowed.

[Image: ODI JEE Agent upgrade topology comparison]

You’ll also notice that the topology for Standalone Agents not registered with a WebLogic Domain has changed, as mentioned earlier; they are now part of the WebLogic Management Framework. The agent is configured in a standalone domain, unlike its predecessor in 11g.

[Image: ODI 12c Standalone Agent topology]

For Standalone Agents in 11g that were registered with a WebLogic Domain, nothing has really changed from a topology standpoint, as shown below:

[Image: Standalone Agent registered with a WebLogic Domain]

Some considerations to take into account before diving into the upgrade are the following:

  • Verify that the database which will host the schemas required by Fusion Middleware is supported. You can verify at: http://docs.oracle.com/middleware/1213/core/ODIUG/tasklist.htm#BABGAIFA
  • ODI 12.1.3 requires a 64-bit operating system. Verify your machine is 64-bit, and, if not, then upgrade.
  • Verify you are running a supported version (11.1.1.6, 11.1.1.7).
  • Develop a backup and recovery strategy.
  • Plan for system downtime during the upgrade process.

Upgrade Process

BACKUPS

The first and probably most important step before you even begin the upgrade is to make sure you have a solid backup of your 11g environment. There are many ways to do this, but you will want to make sure that you have a backup of the following so you can restore your prior environment in the event that a failure occurs:

  1. ODI Home Directory
  2. ODI Work Repository(s)
  3. ODI Master Repository

To preserve the ODI 11g Home Directory, I simply make a copy of the ODI_HOME directory, zip it up, and save it in another location. To preserve the ODI Master and Work Repositories, simply use the export utility within ODI.
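As a rough sketch of the home-directory backup (the paths here are hypothetical; point them at your real ODI_HOME and a backup location, ideally on another volume or server), the archive step can be scripted like this:

```shell
# Minimal sketch of backing up the ODI 11g home directory before an upgrade.
# The two arguments are assumptions -- pass your real ODI_HOME and a backup
# location of your choosing.
backup_odi_home() {
  odi_home="$1"
  backup_dir="$2"
  stamp=$(date +%Y%m%d_%H%M%S)
  mkdir -p "$backup_dir"
  # Archive the whole 11g home so it can be restored if the upgrade fails
  tar -czf "$backup_dir/odi11g_home_$stamp.tar.gz" \
      -C "$(dirname "$odi_home")" "$(basename "$odi_home")"
  echo "$backup_dir/odi11g_home_$stamp.tar.gz"
}
```

The Master and Work Repository exports themselves are still taken through ODI Studio’s export utility as described above; this only covers the file-system side of the backup.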


This will provide a saved copy of each. In addition to the backup, record the Repository IDs in the event you need to restore. Lastly, I create a database backup of the ODI schemas that will be upgraded. These are the schemas where the Master and Work Repositories reside.

Zero or Limited Downtime Approaches

An important question is: how can I upgrade my system with limited or even zero downtime?

Oracle recommends cloning the work and master repositories on your database in another database instance and then upgrading the cloned repositories. This will keep your current 11g repositories active and untouched during the upgrade process, and once the upgrade is complete and you’ve been able to test it out, you can switch to the new 12c repositories as your active ODI implementation. Oracle has provided sample scripts for different database environments at the following location:

https://docs.oracle.com/middleware/1212/odi/ODIUG/tasklist.htm#ODIUG117

As a part of your testing, after you have set up your cloned repositories, you will want to verify that the cloned master repository points to the correct work repository(s). Also, as part of cloning the repositories, you’ll need to copy the row for the master repository in SYSTEM.SCHEMA_VERSION_REGISTRY$ from the old instance to the new one. This change is not supported by Oracle; however, it is required for the upgrade to be successful.
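As a rough sketch of that registry copy (the database link name OLD11G and the COMP_ID filter are assumptions; inspect the actual row in your source instance first), the statement run as SYSTEM on the new instance would look something like:

```sql
-- Hypothetical sketch: copy the ODI master repository registry row from the
-- old instance (reached here via an assumed database link named OLD11G) into
-- the new one. Verify the row's COMP_ID/OWNER values in your own
-- SCHEMA_VERSION_REGISTRY$ before running anything like this; as noted above,
-- Oracle does not support editing this table.
INSERT INTO system.schema_version_registry$
SELECT *
  FROM system.schema_version_registry$@OLD11G
 WHERE comp_id = 'ODI';
COMMIT;
```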

Another upgrade option, other than cloning, is to create a brand new ODI 12c repository on your database, export your objects from 11g, and import them individually into your new 12c repository using ODI Studio. To perform this task you’ll need to use the upgrade key, which is new to 12c and which I’ve explained later in this post, for the imports. This is not my recommended approach, as it takes longer and is more tedious, which can lead to missed objects during the export/import process.

However, if you choose to go this route, the export/import order I have used in the past is the following:

  1. Topology
  2. Models
  3. Global Objects
  4. Projects

Installing Oracle Data Integrator 12c Products

Now that backups have been made, we are confident we can recover in the event of an unsuccessful upgrade, and we have decided on a transition strategy, it is time to begin installing our 12c products.

We have a couple questions to answer first. Are we upgrading with a Standalone Agent or a Java EE Agent? If we are installing an environment with a Java EE Agent then we will need to download two separate files from Oracle:

  1. Oracle Data Integrator 12cR1 (12.1.3.0.0) for all platforms install file.
  2. Oracle Fusion Middleware Infrastructure (for All Platforms).

If we are installing for a standalone environment, as we are in this post, then the second file is not needed. You will need approximately 1.5GB of space for the home directory where you install ODI 12c, and I would recommend an additional 1GB of space to account for the agent domains. The following link gives the ODI 12c Studio installation instructions in detail:

https://docs.oracle.com/middleware/1212/odi/ODING/install.htm#ODING286

Creating/Upgrading the Master and Work Repositories

Now that ODI 12c Studio is installed, we’ll need to create the Master and Work repositories in the database so that they can effectively communicate with ODI 12c Studio. To do this you’ll need to navigate to the <ODI_HOME>/oracle_common/bin directory and run either ./rcu.sh or rcu.bat, depending on your environment.

Before beginning the creation of the 12c repositories, make sure you have a solid backup of the 11g repositories.                

You’ll want to make sure that you have DBA privileges for this step as you’ll be creating new schemas and tables in the database that will require these permissions. The wizard does give you the option of creating the scripts for the DBA to use later if necessary, although it may be best to work with the DBA directly as you create the repositories.

First, you will want to choose an existing prefix, since this is how you identify the repository to be upgraded in the database.

Second, you’ll need to check off the following components to include them in your 12c upgrade:

  1. Oracle AS Repository Components/AS Common Schemas/Common Infrastructure Services which creates the new *_STB schema for 12c.
  2. Oracle Data Integrator/Master and Work Repository

[Image: RCU component selection]

As you enter the schema passwords and mapping information, the utility will eventually reach the summary screen, where it shows the new schemas it will create. Be aware that any schemas which have already been created will not be listed here, since they do not need to be created again. At this screen, click Create, and the creation of the new schemas/components will commence.

[Image: RCU summary screen]

Once you have created the new schemas necessary for 12c, start the Upgrade Assistant so the previous version of ODI can be upgraded to 12.1.3.0.0. You will find the Upgrade Assistant at:

<ODI_HOME>/oracle_common/upgrade/bin

You’ll execute the ua.bat or ua file depending on your environment.

When asked what type of upgrade you wish to perform, you will want to select “Schemas” as this will allow you to identify the schemas where the upgrade will take place.

[Image: Upgrade Assistant – upgrade type selection]

The upgrade assistant will then list the schemas available to upgrade. When asked what components to upgrade, you will select “Oracle Data Integrator”.

[Image: Upgrade Assistant – component selection]

The only other item I’d draw your attention to is the screen providing the “Upgrade Key”. This key is used to convert 11g object IDs into unique GUIDs. You will want to save this key, as you will need it for any additional 11g objects that may have to be imported at a later time. The key allows ODI to consistently calculate the same GUID for an 11g object when importing into 12c, and it uniquely identifies the set of repositories that were working together before the upgrade. Since the 12c repository has switched to using GUIDs to better guarantee unique IDs, 11g IDs will not work without the upgrade key.

[Image: Upgrade Assistant – upgrade key]

As you upgrade to 12c, the question will likely arise as to why you would need this upgrade key. The situation may arise where you need to restore an object, import a scenario, or revert to a prior topology; this key will allow you to do so with your 11g exports.

Once the Upgrade Assistant has completed, you will want to verify that the version of ODI has been updated in the schema. You can run the following command to verify:

SELECT OWNER, VERSION, STATUS FROM SCHEMA_VERSION_REGISTRY WHERE OWNER = '<prefix>_ODI_REPO';

You should see the version for your schema as 12.1.3.0.0, with a status of ‘VALID’.

Creating the Standalone Agent

Our final step is creating the Standalone Agent. As I wrote earlier, this has changed in 12c, since the WebLogic Management Framework is now used for all agents. Because of this, you cannot upgrade 11g standalone agents to 12c; you will need to create new ones.

We will place our new Standalone Agent in a domain folder which will sit outside of the ODI_HOME.

To create the Standalone Agent, navigate to ODI_HOME/oracle_common/common/bin and execute either config.sh or config.cmd, depending on your operating system.

The first screen you come across will ask you to either create a new domain or update an existing domain. In our case we did not have an existing domain, so we will create a new Standalone Agent domain. The wizard defaults to a location within your ODI_HOME. You will want to change this location so that it resides OUTSIDE the ODI_HOME, as I wrote earlier, because a domain created inside the ODI_HOME could cause problems for future upgrades of your ODI software.

[Image: Standalone Agent domain creation]

The rest of the steps are straightforward. You’ll choose to create the “Oracle Data Integrator – Standalone Agent – 12.1.3.0[odi]”, enter database configuration information, and choose a user name and password for the Node Configuration. Then, you will be prompted to create your agent.

Once you have completed these steps, start the agent: go to DOMAIN_HOME/bin and run either startNodeManager.cmd or startNodeManager.sh, depending on your operating system. You’ll be prompted to enter the credentials you chose when configuring the node above.

Once you’ve completed these steps, you’ll go through the process of creating your agent within ODI Studio both within the Physical and Logical Architecture.

[Image: Agent defined in the Physical and Logical Architecture in ODI Studio]

Then run the following command depending on your operating system:

./agent.sh -NAME=<AGENT_NAME>

agent.cmd -NAME=<AGENT_NAME>

Once these commands have executed successfully, confirm that the agent is running without issue: open the agent you created in the Physical Architecture in ODI Studio and click the “Test” button.

[Image: Agent test in ODI Studio]

I would shut down and remove the agent that existed in your 11g environment when you remove the 11g ODI home directory. Remember that in our backup we zipped a copy of the 11g home directory and stored it in another location, so once you have confirmed that ODI 12c is operating as expected, you may remove the 11g home directory if it’s on the same server.

Testing, Upgrade Issues, and Clean Up

The approach that I have used when testing is to test each individual interface and load plan against the results of its 11g counterpart. I verify the counts initially and then have the users/owners verify the data. I then allow the load plans to run for a few days to a week, and once the users/owners are comfortable with the results, I migrate to the new implementation.

That being said, the following issues and clean-up items, which I and some of my colleagues at Rittman Mead have encountered while upgrading to 12c, may need to be addressed:

  • Variables in mappings/KMs are not recognized by the parser when creating scenarios, so they are not recognized as input parameters. Consequently, you’ll want to wrap each mapping in a package or load plan with the variables declared in it.
  • The variables are case-sensitive in 12c, so if you called a variable with a different case in a mapping, you will need to change it to the matching case.
  • Make sure to prefix all variables with the project code or global.
  • Do not start any data store alias with a number as this does not work well in 12c.
  • Temporary interfaces from 11g are triplicated in 12c during the upgrade and require cleanup in the projects as well as the models. The upgrade creates each one as a regular mapping and creates a target table in the models, which can be removed. It also creates two copies of each temporary interface in Reusable Mappings, with suffixes of RMS (Reusable Mapping Source) and RMT (Reusable Mapping Target). For the most part the RMT reusable mappings can be removed; however, verify that the mapping(s) using the temporary interface are suffixed with RMS before deleting.
  • Upgrade the mappings to flow-based using “Convert to Flow” on the dataset. (Optional)
  • You can check out Michael Rainey’s post on this at: http://www.rittmanmead.com/2015/04/di-tips-odi-convert-to-flow/

In my next post, I will go over the process that involves upgrading the Java EE Agent to 12c from 11g and highlight where the upgrade to 12.2.1.0 differs from the upgrade to 12.1.3.0. Stay tuned!

 

The post Upgrade to ODI 12c: Repository and Standalone Agent appeared first on Rittman Mead Consulting.

Kickstart Your 2016 with Rittman Mead’s Data Integration Training

Happy Holidays and a Happy New Year to all! As you begin your 2016 this January, it’s time to start planning your team’s data integration training. Look no further than Rittman Mead’s Oracle Data Integrator training course! We offer a 4 day Oracle Data Integrator 12c Bootcamp for those looking to take advantage of the latest and greatest features in ODI 12c. We also still teach our 5 day Oracle Data Integrator 11g Bootcamp, as we know sometimes it can be difficult to upgrade to the latest release and new data warehouse team members need to be brought up to speed on the product. ODI 11g is also still very much alive in Oracle Business Intelligence Applications, being the ETL technology for the 11g release of the product suite.


Customized Data Integration Training

BI Apps 11g training has been a hot topic from the data integration perspective over the last couple of years. Rittman Mead have delivered custom BI Apps training for ODI developers several times just within the last year, prompting us to add a new public training course specific to this topic to our public schedule. This course walks attendees through the unique relationship between OBIEE and ODI 11g as the data integration technology, including configuration, load plan generation, and ETL customization. If you have an Oracle Business Intelligence Applications 11g team looking to enhance their ODI 11g skills, take a look at the new ODI for BI Applications course description.

The customization of training does not just apply to BI Applications, but to all aspects of Oracle Data Integration. Whether adding more details around Oracle GoldenGate installation and maintenance to the ODI 12c course, or learning about Oracle EDQ integration, the Rittman Mead data integration team of experts can work to deliver the course so your team gains the most value from its investment in Oracle Data Integration technologies. Just ask! Reach out and we can work together to create a custom course to fit your needs.

Public or Onsite Training?

Rittman Mead has several dates for each course, scheduled to be delivered out of our offices in either Atlanta, GA or Brighton, UK. Take a look here for our ODI 12c bootcamp, ODI 11g bootcamp, and ODI for BI Apps Developers offerings in the US. Look here for the same in the UK/Europe (Note: as of the writing of this blog post, the 2016 UK/Europe schedule had not been released). We also offer the same courses for delivery onsite at your company’s office, allowing our experts to come to you! Quite often our clients will combine consulting and training, ensuring they get the most out of their investment in our team of experts.

Why Rittman Mead?

Many folks in the Business Intelligence and Data Integration profession who are looking for a consulting company might think Rittman Mead only work on extremely challenging projects based on the depth of knowledge and type of problems (and solutions) we offer via our blog. The fact is, most of our projects are the “standard” data warehouse or business intelligence reporting implementations, with some of these additional challenges coming along the way. Why do I bring that up? Well, if you’re looking for the experts in Oracle Data Integration technology, with experience in both project implementation and solving challenging technical problems, then you’ve come to the right place to learn about ODI.

Unlike many other companies offering training, we don’t have a staff of educators on hand. Our trainers are the same folks that deliver projects, using the technology you’re interested in learning about, on a day-to-day basis. We offer you real world examples as we walk through our training slide deck and labs. Need to know why Oracle GoldenGate is an integral part of real-time data integration? Let me tell you about my latest client where I implemented GoldenGate and ODI. Want to know what to look out for when installing the JEE Agent in ODI 12c? We’ve done that many times – and know the tricks necessary to get it all working.

Our experts, such as Jérôme Françoisse, Becky Wagner, Mark Rittman, myself, and many others, all have multiple years of experience with Oracle Data Integration implementations. Not only that, but we here at Rittman Mead truly enjoy sharing our knowledge! Whether posting to this blog, speaking at Oracle conferences, or on the OTN forums, Rittman Mead experts are always looking to teach others in order to better the Oracle Data Integration community.

If you or your company are in need of Oracle Data Integration training, please drop us a line at training@rittmanmead.com. As always, feel free to reach out to me directly on Twitter (@mRainey), LinkedIn, or via email (michael.rainey@rittmanmead.com) if you have any direct questions. See you all next year!

The post Kickstart Your 2016 with Rittman Mead’s Data Integration Training appeared first on Rittman Mead Consulting.

Becky’s BI Apps Corner: Oracle BI Applications, where art thou?

Hello! I would like to take a moment to introduce myself. My name is Becky Wagner and I’ve been working with Rittman Mead America since the beginning of the year. I have a background in data warehousing and ETL with a state government agency, where I was part of a project upgrading from DataStage to ODI 11g with Oracle GoldenGate in Oracle’s early adopter program for Oracle BI Apps 11g. Since coming to Rittman Mead, I’ve taught several ODI 12c bootcamps, designed and delivered custom trainings for BI Apps and ODI, and been involved in ODI 11g, ODI 12c, and OBIA 11g projects.

 

Recently, I was putting together a BI Apps for Oracle Data Integrator (ODI) custom training for a client who is upgrading BI Apps and will be using ODI for the first time. On the Virtual Machine I was building for the training, I wanted the version of Oracle BI Applications to match the client’s installation. I found that Oracle’s recent website facelifts changed the way I was used to getting older versions of software. Oracle’s Downloads and Technetwork sites both have the most recent version of Oracle BI Apps available (currently 11.1.1.10.1) but no earlier versions any longer. Edelivery has the earlier versions as well as the current version, but the site has changed enough that it took me a bit to understand how to get to the Oracle BI Apps software files to download.

 

Downloading Oracle BI Apps from Edelivery

 

From your favorite browser, go to https://edelivery.oracle.com and sign in.

Accept the Export Restrictions (of course, only after you have read, understood, and agreed to them).

 

Fill in the Product* box with ‘Business Intelligence Applications’. You won’t see Business Intelligence Applications in the dropdown that appears as you start typing. What you do see are other Products that your company would have purchased to allow you to have a license of Oracle Business Intelligence Applications, such as Oracle Financial Analytics or Oracle Human Resources Analytics. Select the product (or one of the products) that was purchased by your company.

Click on the Select Platform button and check the box for your appropriate platform, such as Linux x86-64 or Microsoft Windows x64 (64-bit). Then click the Select button.

Once the Product you selected is displaying in the window, click on the Continue button.

 

Now Oracle Business Intelligence 11.1.1.10.1 is showing. Interestingly, it doesn’t say Oracle Business Intelligence Applications, but currently OBIEE doesn’t have a version of 11.1.1.10.1, so we can be confident this is actually Oracle BI Apps. However, this still isn’t the Oracle BI Apps that you are looking for. Below the Available Release, you will see the option to Select Alternate Release.

This will allow you to drop down the box to select an earlier version.

With any version of Oracle BI Apps, there are several different components to download, which you can see by clicking on the triangle to the left of the Available Release. Once you have selected your desired version of Oracle BI Apps, click on the continue button to begin the download.

Please don’t forget to read the license agreements carefully and if you agree, check the box and click Continue.

 

At this point, you can now see the files to download. You can click on each blue link individually, or click the Download All button to use Oracle’s downloader tool. Also, for the Linux, Red Hat, and Solaris folks, notice at the bottom the WGET Options.

Downloading ODI for Oracle BI Apps from Edelivery

 

Get the downloads started for these files. We aren’t quite finished yet, though. ODI needs to be downloaded as well. Return to the Products page. Please note that to get the correct version of ODI, you must type in ‘Oracle Data Integrator for Oracle Business Intelligence’ (again, it probably means Oracle BI Apps, but points here for consistency at least). Select a Platform. Then click Continue.

Notice we are showing Oracle Business Intelligence 11.1.1.10.1 again. You will want to click on Select Alternate Release, and pick the same version you selected above. This will save you from interesting OPatch errors later during the install process, but I will leave those fun errors for another blog post. Then click Continue.

Rinse and repeat.

And that is how you navigate the new Oracle EDelivery site to get Oracle BI Apps download files and previous versions. I would love to hear your thoughts if you found this helpful or confusing. Also, please leave comments below if you found other alternatives for downloading earlier versions of Oracle BI Apps, and/or your suggestions for ways to rename the V#####-##.zip files to something more descriptive that better identifies the zip file. Keep an eye out for more Becky’s BI Apps Corner coming soon and if you’re interested in OBIEE or ODI training (or even a custom Oracle BI Applications course), give us a shout at training@rittmanmead.com!

The post Becky’s BI Apps Corner: Oracle BI Applications, where art thou? appeared first on Rittman Mead Consulting.

Rittman Mead at UKOUG Tech’15 Conference, Birmingham

This week Rittman Mead are very pleased to be presenting at the UK Oracle User Group’s Tech’15 Conference in Birmingham, delivering a number of sessions around OBIEE, Data Integration, Cloud and Big Data.


If you’re at the event and you see any of us in sessions, around the conference or during our talks, we’d be pleased to speak with you about your projects and answer any questions you might have. Here’s the list of our speaking slots over the four days of the event, and I’ll update the list with links to presentation downloads as they become available during the event.

In addition, if you’re interested in the OBIEE user adoption and retention area that Robin talks about in his Wednesday session, Rittman Mead have a User Engagement service using some of the tools and techniques that Robin talked about (datasheet here) and we’d be pleased to talk to you about how you can increase user engagement, adoption and retention for your OBIEE system. Other than that, come and speak to us if you see us at the Birmingham ICC, and look forward to more content on these areas in the New Year!

The post Rittman Mead at UKOUG Tech’15 Conference, Birmingham appeared first on Rittman Mead Consulting.

Taking a Look at Oracle Big Data Preparation Cloud Service – Spark-Based Data Transformation in the Cloud

One of the sessions I’m delivering at the upcoming Oracle Openworld 2015 in San Francisco is entitled “Oracle Business Intelligence Cloud Service—Moving Your Complete BI Platform to the Cloud [UGF4906]”, and looks at how you can now migrate your entire OBIEE11g platform into Oracle Public Cloud including data warehouse and data integration routines. Using Oracle BI Cloud Service’s new on-premise RPD upload feature you can upload an existing RPD into BICS and run it from there, with the accompanying data warehouse database moving into Oracle’s Database Cloud Service (and with the restriction that you can’t then edit the repository within BICS; you need to do that on-premise and upload again). For ETL and data integration you can carry on using ODI12c, which now has the ability to load data into Oracle Storage Cloud (for file sources) and BICS (via a REST API) as well as the full Oracle DBaaS, but another Oracle option for doing purely cloud-based data processing and enrichment has recently become available – Oracle Big Data Preparation Cloud Service. So what is it, how does it work and how is it different to ODI12c?

Oracle Big Data Preparation Cloud Service (“BDP”) is a thin-client application within Oracle Cloud for ingesting, preparing and enriching datasets that don’t have a predefined schema and may well need certain fields obfuscated or cleansed. Being integrated with Oracle Storage Cloud and other infrastructure and platform services within Oracle Cloud, it’s obviously aimed mainly at data transformation tasks within the Oracle Cloud environment, but you can upload and download datasets from your browser for use with on-premise applications. Unlike the more general-purpose Oracle Data Integrator, it’s aimed at a particular use-case: non-technical information analysts who need to get data transformed, wrangled and enriched before they can make use of it in an environment like Hadoop. In fact the product name is a bit misleading – it runs on a big data platform within Oracle Cloud and, like Oracle Big Data Discovery, uses Apache Spark for its data processing – but it could potentially be useful for a business analyst to prepare data for loading into Oracle BI Cloud Service, and I’ll cover this angle when I talk about data loading options in my Oracle Openworld session.

Within a logical architecture for a typical big data DW and BI system, BDP sits alongside ODI within the Data Factory and provides self-service, agile transformation capabilities to more business-orientated users. 

[Image: BDP alongside ODI within the Data Factory]

Oracle Big Data Preparation Cloud Service shares quite a bit of functionality and underlying technology with Oracle Big Data Discovery – both run on Hadoop, both use Apache Spark for data processing and transformation, and both offer data transformation and “wrangling” features aimed at non-technical users. Oracle are positioning Big Data Preparation Service as something you’d use in the execution layer of the Oracle Information Management Reference Architecture, whereas Big Data Discovery is associated more with the discovery layer. I’d mostly agree, but I can see a role for BDD even within the execution layer, as a front-end to the data reservoir that typically now runs alongside relationally-stored data warehouses.

[Image: Oracle Information Management Reference Architecture positioning]

Looking back at the slides from one of the recent Strata conferences, for example, you can see Oracle positioning BDP as the “operational data preparation” tool for structured and unstructured data – with no defined schema – coming into your information platform, with the enriched output then being used by BI tools, enterprise reporting and data discovery tools.

[Image: Strata conference slide positioning BDP]

 

Apart from the scalability benefits of running BDP on Apache Spark, the other interesting feature in BDP is how it uses Spark’s machine learning capabilities to automate as much of the data preparation process as possible, for example detecting credit card numbers in data fields and recommending you obfuscate that column. Similar to BICS, where Oracle have tried to simplify the process of creating reports and dashboards for a small team, BDP runs in the cloud and tries to automate and simplify as much of the data preparation and enrichment process as possible, with ODI12c still available for ETL developers to build more complex transformations.

The development lifecycle for BDP uses a cycle of ingesting, cleaning, enriching and then publishing data using scripts authored with the tool and run on the Apache Spark platform. The diagram below, from Oracle’s Big Data Preparation Cloud Service Handbook on Oracle’s website, shows how ingestion, enrichment, publishing and governance go in a cycle with the common foundation of the transformation scripts that you build using BDP’s web interface.

[Image: BDP development lifecycle]

So let’s walk through an example data preparation exercise using a file of data stored initially in Oracle Storage Cloud Service. After logging into BDP via Oracle Cloud you’re first presented with the Catalog view, listing out all your previous transformations and showing you when they were last used to process some data.

[Image: BDP Catalog view]

To create a transformation you first give it a name, then select the data source and then the file you’re interested in. In my environment I’ve got Oracle Storage Cloud and HDFS available as my main data sources, or I could upload a file from my desktop and start from there.

[Image: Creating a transformation and selecting a data source]

BDP then ingests the file and uses its machine learning features to process and classify the data in each column, recommending column names such as “gender”, “city” and “cc_number” based (presumably) on some sort of classification model. In the screenshot below you can see a set of these recommendations on the left-hand side of the screen, with the columns themselves listed centre and a summary of the file profiling on the right.

[Image: Column recommendations and file profile summary]

Taking a closer look at the profile results panel you can see two of the columns have alerts raised, in red. Clicking on the alert shows that the two columns have credit card data stored in clear text, with the recommendation being to obfuscate or otherwise secure these fields. Clicking on a field then shows the various transformation options, with the obvious choice here being to automatically obfuscate the data in those fields.

[Image: Obfuscation recommendation for credit card columns]

Once you’ve worked through all the recommendations and added any transformations you choose to add yourself, the final step is to publish your transformation to one of the available targets. In the example below we’ve got Oracle Storage Cloud and HDFS again as potential targets; I’d imagine Oracle will add a connector to BICS soon, for example, so that you can use BDP as a data prep tool for file data that will then be added to your dataset in BICS.

[Image: Publishing the transformation to a target]

So … it’ll be interesting to see where this one goes. It’s interesting that Oracle have split data preparation and data discovery into two tools whilst others are saying theirs can do both, and you’ll still need ODI for the more complex integration jobs. But I like the innovative use of machine learning to do away with much of the manual work required for classification of incoming data fields, and running the whole thing on Spark certainly gives it the potential to scale. A couple of years ago I was worried Oracle didn’t really have a strategy for data integration and ETL in the cloud, but we’re starting to see something happen now.

There’s a big push from the Oracle field at the moment to move customers into the cloud, and I can see BDP getting bundled in with Big Data Cloud Service and BICS as the accompanying cloud data preparation tool. The danger then of course is that Big Data Discovery starts to look less useful, especially with Visual Analyzer already available within BICS and coming soon on-premise with OBIEE12c. My guess is that what we’re seeing now with these initial releases of BDP and BDD is just the start, with BDP adding more automatic enrichment “smarts” and starting to cover integration use-cases too, whilst BDD will put more focus on data visualization and analytics on the data reservoir.