Tag Archives: Oracle Data Integrator

The Case for ETL in the Cloud – CAPEX vs OPEX

Recently Oracle announced a new cloud service for Oracle Data Integrator. Because I was helping our sales team by doing some estimates and statements of work, I was already thinking of costs, ROI, use cases, and the questions behind making a decision to move to the cloud. I want to explore what is the business case for using or switching to ODICS?

Oracle Data Integration Cloud Services

First, let me briefly talk about what is Oracle Data Integration Cloud Services? ODICS is ODI version 12.2.1.2 available on Oracle’s Java Cloud Service known as JCS. Several posts cover the implementation, migration, and technical aspects of using ODI in the cloud. Instead of covering the ‘how’, I want to talk about the ‘when’ and ‘why’.

Use Cases

What use cases are there for ODICS?
1. You have or soon plan to have your data warehouse in Oracle’s Cloud. In this situation, you can now have your ODI J2EE agent in the same cloud network, removing network hops and improving performance.
2. If you currently have an ODI license on-premises, you are allowed to install that license on Oracle’s JCS at the JCS prices. See here for more information about installing on JCS. These use cases are described in a webinar posted in the PM Webcast Archive.

When and Why?

So when would it make sense to move towards using ODICS? These are the scenarios I imagine being the most likely:
1. A new customer or project. If a business doesn’t already have ODI, this allows them to decide between an all on-premises solution or a complete solution in Oracle’s cloud. With monthly and metered costs, the standard large start-up costs for hardware and licenses are avoided, making this solution available for more small to medium businesses.
2. An existing business with ODI already and considering moving their DW to the cloud. In this scenario, a possible solution would be to move the current license of ODI to JCS and begin using that to move data, all while tracking JCS costs. When the time comes to review licensing obligations for ODI, compare the calculation for a license to the calculation of expected usage for ODICS and see which one makes the most sense (cents?). For a more detailed explanation of this point, let’s talk CAPEX and OPEX!

CAPEX vs. OPEX

CAPEX and OPEX are short for Capital Expense and Operational Expense, respectively. In a finance and budgeting perspective, these two show up very differently on financial reports. This often has tax considerations for businesses. Traditionally in the past, a data warehouse project was a very large initial capital expenditure, with hardware, licenses, and project costs. This would land it very solidly as CAPEX. Over the last several years, sponsorship for these projects has shifted from CIOs and IT Directors to CFOs and Business Directors. With this shift, several businesses would rather budget and see these expenses monthly as an operating expense as opposed to every few years having large capital expenses, putting these projects into OPEX instead.

Conclusion

Having monthly and metered service costs in the cloud that are fixed or predictable are appealing. As a bonus, this style of service is highly flexible and can scale up (or down) as demand changes. If you are or will soon be in the process of planning for your future business analytics needs, we provide expert services, assessments, accelerators, and executive consultations for assisting with these kinds of decisions. When it is time to talk about actual numbers, your Oracle Sales Representative will have the best prices. Please get in touch for more information.

ETL Offload with Spark and Amazon EMR – Part 5 – Summary

This is the final article in a series documenting an exercise that we undertook for a client recently. Currently running an Oracle-based datawarehouse platform, the client asked for our help in understanding what a future ETL and reporting platform could look like, given the current landscape of tools available. You can read the background to the project, how we developed prototype code, deployed it to Amazon, and evaluated tools for analysing the data.

Whilst the client were generally aware of new technologies, they wanted a clear understanding of what these looked like in practice. Is it viable, as is being touted, to offload ETL entirely to open-source tools? Could they do this, without increasing their licensing costs?

The client are already well adopted to newer technologies, running their entire infrastructure on the Amazon Web Services (AWS) cloud. Given this usage of AWS, our investigation was based around deployment of the Elastic Map Reduce (EMR) Hadoop platform. Many of the findings made during the investigation are as applicable to other Hadoop platforms though, including CDH running on Oracle's Big Data Appliance.

We isolated a single process within the broader part of the client's processing estate for exploration. The point of our study was not less to implement this specific piece of functionality in the most optimal way, but to understand how in general processing would look on another platform in an end-to-end flow. Before any kind of deployment into Production of this design there would be further iterations, particularly around performance. These are discussed further below.

Overview of the Solution

The source data landed in Amazon S3 (similar in concept to HDFS), in CSV format, once per hour. We loaded each file, processed it to enrich it with reference data, and wrote it back to S3.

The enriched data was queried directly, with Presto, and also loaded into Redshift for querying there.

Oracle's Data Visualization Desktop was used as the front end for querying.

Benefits

Cost Benefits

  • By moving ETL processing to Hadoop-based platform, we free up capacity (and potentially licensing costs) on the existing commerical RDBMS (Oracle) where the processing currently takes place
  • Costs are further reduced by the 'elastic' provisioning and cost model of the cloud service. You only pay for the size of the cluster necessary for your workload, for the duration that it took to execute.

Technology Benefits

In this solution we have taken advantage of the decoupling of storage from compute. This is a significant advantage that cloud technology brings.

  • Amazon S3 provides the durable data store for our data (whether CSV, Parquet, or any other data format). With S3 you simply pay for the storage that you use. S3 can be accessed by dozens of client libraries as well as HDFS-compatible APIs. Data in S3 is completely compute-target agnostic. Contrast this to data sat in your proprietory RDBMS database, and if you want to process or analyse this in another system.
  • In this instance we wanted to enrich the data, and proved Spark as an appropriate tool to do so. Running on Elastic Map Reduce we could provision this automagically, run our processing, and have the EMR cluster terminate itself once complete. The compute part of the equation is entirely isolated, and can be switched in and our of the architecture as required.

Moving existing workloads to the cloud is not just a case of provisioning servers running in someone else's data centre to perform the same work as before. To truly benefit (dare I say, leverage) from the new possibilities, it makes sense to re-architect how you store your data and perform processing on it.

As well as the benefit of cloud technology, we can see that we don't even need an RDBMS for much of this enrichment and transformation work. Redshift has proved to be useful for interactive analysis of the data, but the processing of the data that would typically get done within an RDBMS (with associated license costs) can instead be done on technology such as Spark.

Broader Observations

The world of data and analytics is changing, and there are some interesting points that this project raised, which I discuss below.

Cloud

The client for whom we carried out this work are already cloud 'converts', running their entire operation on AWS already. They're not alone in recognising the benefits of Cloud, and it's going to be interesting to see the rate at which adoption continues to occur elsewhere, particularly in the Oracle market as they ramp up their offerings.

Cloud Overview

The Cloud is of course a big deal nowadays, whether in the breathless excitement of marketing talk, or the more gradual realisation amongst more technical folk that The Cloud brings some serious benefits. There are three broad flavours of Cloud - Infrastructure, Platform, and Software (IaaS, PaaS, SaaS respectively):

  • At the lowest level, you basically rent access to tin (hardware). Infrastructure-as-a-service (IaaS) can include simply running virtual machines on someone else's hardware, but it's more clever than that. You get the ability to provision storage separately from compute, and all with virtualised networking too. Thus you store your data, but don't pay for the processing until you want to. This is a very long way from working out how big a server to order for installation in your data centre (or indeed, a VM to provision in the cloud) - how many CPUs, how much RAM, how big the hard disks should be - and worrying about under- or over-provisioning it.

    With IaaS the components can be decoupled, and scaled elastically as required. You pay for what you use.

    The additional benefit of IaaS is that someone else manages the actual hardware; machine outages, disk failures, and so on, are all someone else's concern.

  • IaaS can sometimes still be a lot of work; after all, you still have the manage the servers, or architect and manage the decoupled components such as storage and compute. The next 'aaS' up in Platform as a Service (PaaS). Here, the "platform" is provided and managed for you.

    A clear example of PaaS is the Hadoop platform. You can run a Hadoop cluster yourself, whether on Oracle's Big Data Appliance (BDA), or maybe on your own hardware (or indeed, on IaaS in the cloud) but with a distribution such as Cloudera's CDH. Point being, you still have to manage it, to tune your Hadoop parameters, and so on. Hadoop as a platform in the cloud (i.e. PaaS) is offered by many companies, including the big vendors, such as Oracle (Big Data Cloud Service), Microsoft (HDInsight), Google (Dataproc) -- and then the daddy of them all, Amazon with it's Elastic Map Reduce (EMR) platform

    Another example of PaaS is Oracle's BI Cloud Service (BICS), in which you build and run your own RPD and reports, but Oracle look after the actual server processes.

  • Software as a Service (SaaS) is where everything is provisioned and managed for you. Whereas on PaaS you still write the code that's to be run (whether a Spark routine on Hadoop, or BI metadata model on BICS), on SaaS someone has already done that too. You just provide the inputs, which obviously depend on the purpose of the SaaS. Something like GMail is a good example of SaaS. You're not having to write the web-based email, you're not having to provision the servers on which to run that - you simply utilise the software.

Cloud's Benefit to Analytics

Cloud brings benefits - but also greater subtleties to our solutions. Instead of simply provisioning one or more servers on which to hold our data and process it, we start to unpick this into separate components. In the context of this study, we have:

  • Data at rest, on S3. This is storage paid for simply based on how much you use. Importantly, you don't have to have a server (or in more abstract terms, 'compute') running. It's roughly analogous to network mounted storage. You can access S3 externally to AWS, such as your laptop or a server in your data centre. You can also access it, obviously, from within the AWS ecosystem. You can even use S3 to serve up files just as a web server would.
  • Compute, on EMR. How often do you need to carry out transformations and processing on your data? Not continuously? Then why pay for a server to sit idle the rest of the time? What about the size of the server that it does run on - how many CPUs do you need? How many nodes in your cluster? EMR solves both these problems, by enabling you to provision a Hadoop cluster of any size and spec, on demand - and optionally, terminate itself once it's completed its work so that you only pay for the compute time necessary.
  • Having a bunch of data sat around isn't going to bring any value to the business, without Analytics and a way of presenting that to the user. This could be done either through loading the data into a traditional RDBMS such as Oracle, or Redshift, and analysing it there - or potentially through one of the new generation of "SQL on Hadoop" engines, such as Impala or Presto. There's also Athena which is a SQL interface directly to data in S3 - you don't even need to be running a Hadoop cluster to use this.

Innovation vs Execution (or, just because you can, doesn't mean you should)

The code written during this exercise could, with a bit of tidying up, be run in Production. As in, it does the job that it was built to do. We could even expand it to audit row counts in and out, report duff data, send notifications when complete. What about the next processing requirement that comes in? More bespoke code? And more? At some point we'd probably end up refactoring a whole bunch of it into some kind of framework. Into that framework we'd obviously want good things like handling SCDs, data lineage, and more. Welcome to re-inventing the in-house ETL wheel. Whether Spark jobs nowadays, PL/SQL ten years ago, or COBOL routines a decade before that - doing data processing at a wider scale soon becomes a challenge. Even with the best coders (or 'engineers' as they're now called) in the world, you're going to end up with a bespoke platform that's relient on inhouse skills to support and maintain. That presumes of course that you can find the relevant skills in the market to write all the processing and transformations that you need - and support them, of course. As you aquire new staff, they'll need to be trained on your code base - and suddenly the "free" technology platform isn't appearing so cheap.

Before you shoot me down for a hyperbolic strawman argument, there is an important dichotomy to draw here, between innovation and execution. Applicable to the world of big data in general, it is a useful concept spelt out in the Oracle Information Management & Big Data Reference Architecture. For data to provide value, it doesn't have to land straight away into the world of formalised development processes, Production environments, and so on. A lot of the time you will want to 'poke around' with it, to explore it -- to innovate. Of the technology base out there, you may not know which tool, or library, is going to yield the best results. This is where the "discovery lab" comes in, and where the type of hand-cranked Spark coding that I've demonstrated sits:

Sometimes, work done in innovation is complete once it's done. As in, it has answered the required business question, and provided its value. But a lot of the time though it will simply establish and prove the process that is to be applied to the data, that then needs taking through to the execution layer. This is often called, in an abuse of the English language, "productionisation" or "industrialisation". This is where the questions of code maintenance and scalability need to be seriously considered. And this is where you need a scalable and maintainable approach to the design, management, and orchestration of your data processing - which is exactly what a tool like Oracle Data Integrator (ODI) provides.

ODI is the premier DI tool on the market, with good support for "big data" technologies, including the ability to generate Spark code to perform transformation. It can be deployed to run on Amazon's EMR, as illustrated here, as well as Oracle's Cloud platform. As can be seen from this presentation from Oracle Open World in September 2016, there's additional capabilities coming including around Kafka, Spark Streaming, and Cassandra.

Another route to examine, alongside ODI, is the ecosystem within AWS itself around code execution and orchestration with tools such as Lambda, Data Pipeline, and Simple Workflow Service. There's also AWS Glue, which like Athena was announced at re:Invent 2016. This promises three key things of crucial importance here:

  • A Data Catalog, populated automatically, and not only just supporting multiple formats and sources, but including automatic classification (e.g. "Web Log") of the data itself.
  • Automatic generation of ETL code. From the release announcement notes this looks like it is pySpark-based code. So the code that I put together for this exercise here, manually (and at times, painfully), could be automagically generated based on source/target and operators required. The announcement notes also specifically mention the inclusion of standard ETL processes such as handling bad data
  • Orchestration and management of ETL jobs. One of my main objections above to taking 'proof of concept' pySpark code and trying to use it in a Production scenario is that you end up with a spaghetti of scripts, which are a nightmare to maintain and support. If Glue lives up to its promises, we'd pretty much get the best of all worlds - a flexible yet robust platform.

Hadoop Ecosystems

A single vendor for your IT platform gives you "one throat to choke" when it comes to support, which is usually a good thing. But if that vendor's platform is closed and proprietary it makes leaving it, or even just making use of alternative tools with it, difficult. One of the evangelical claims made about the new world of open source software is that the proliferation of open standards would spell an end to vendor lockin. I was interested to see during the course of this exercise a few examples where the big vendors subtly pushed you towards their own tool of choice, or away from an alternative.

For example, Amazon EMR makes available Presto as part of the default build, but to run the latest version of Impala you'd have to install it yourself. Whilst it is possible to install it yourself, of course, this added friction makes it less likely that people will. This friction increases when we consider that the software usually needs installing - and configuring - across multiple the nodes of the Hadoop cluster. Given an open field of tools all purporting to do the same or similar things, any impedance to using one over the other will count. The same argument could be made for the CDH distribution, in which Impala is front and center, and deploying Presto or Drill would be a manual exercise. Again, yes, installing it may be relatively trivial - but manual download and deployment across a cluster is never going to win out over a one-click deploying from a centralised management console.

This is a long way from any kind of vendor lockin, but it is worth bearing in mind that walls, albeit thin ones, are being built around these various gardens in the Hadoop ecosystem.

Summary

I hope you've found this series of article useful. You can find a list of them below. In the meantime, please do get in touch if you'd like to find out more about how Rittman Mead can help you on your data and analytics journey!

Oracle Data Integrator 12c: Getting Started – Developer’s Quickstart

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week in the “Oracle Data Integrator 12c: Getting Started” series: getting a quick start on mapping development. Several objects must be created before a single bit of ETL can even be created, and for those who are new to the product, as many readers of this series will be, that can be frustrating. The objects that must be in place are as follows:

  • Data Server
  • This object is the connection to your data source. Created under one of the many technologies available in ODI, this is where the JDBC url, username, password, and other properties are all created and stored.
  • Physical Schema
  • Underneath the Data Server you’ll find the Physical Schema. This object, when connecting to a relational database, represents the database schema where the tables reside that you wish to access in ODI.
  • Logical Schema
  • Here’s where it can sometimes get a bit tricky for folks new to Oracle Data Integrator. One of the great features in ODI is how it abstracts the physical connection and schema from the logical objects. The Logical Schema is mapped to the Physical Schema by an object called a Context. This allows development of mappings and other objects to occur against the Logical schema, shielding the physical side from the developers. Now when promoting code to the next environment, nothing must changed in the developed objects for the connection.
  • Model
  • Once you have the Topology setup (Data Server, Physical Schema, Logical Schema), you can then create your Model. This is going to be where the logical Datastores are grouped for a given schema. There are many other functions of the Model object, such as journalizing (CDC) setup, but we’ll save those features for another day.
  • Datastore
  • The Datastore is a logical representation of a table, file, XML element, or other physical object. Stored in the form of a table, the Datastore has columns and constraints. This is the object that will be used as a source or target in your ODI Mappings.

Now you can create your mapping. Whew!

Over the years, Oracle has worked to make the process of getting started a lot easier. Back in ODI 11g, the Oracle Data Integrator QuickStart was a 10 step checklist, where each step leads to another section in the documentation. A nice gesture by Oracle but by no means “quick”. There was also a great tool, the ODI Accelerator Launchpad, built in Groovy by David Allan of the Oracle DI team. Now we were getting closer to something “quick”. But this was simply a script that you had to run, not an integrated part of the ODI Studio platform. Finally, with the release of ODI 12.1.3, the Quickstart was introduced. The New Model and Topology Objects wizard allows you to create everything you need in order to reverse engineer tables into ODI Datastore objects and begin creating your first mappings.

ODI 12c New Model and Topology Objects wizard

Going through the wizard is much simpler than manually setting up the Topology objects and Model for folks just getting started with Oracle Data Integrator. The blog post from Oracle linked above can walk you through the process and I’ve added a demonstration video below that does the same. As a bonus in my demo, I’ve added a tip to help you get your initial load mappings created in an instant. Have a look:

There you have it, a quick and easy way to get started with Oracle Data Integrator 12c and create your first source to target Mapping. If you have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our data integration experts. Up next in the Getting Started series, we’ll look at adding enhancing the ODI metadata by adding constraints and other options.

Oracle Data Integrator 12c: Getting Started – Installation and Configuration

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week, I’m continuing the “Oracle Data Integrator 12c: Getting Started” series with the next topic: Installation and Configuration. I don’t plan to dig into all of the nitty gritty details on each specific scenario you might run into. Rittman Mead have an excellent training program for ODI 12c where you can ask the specifics. There is also this 100+ page document from Oracle that can provide further details: Oracle® Fusion Middleware - Installing and Configuring Oracle Data Integrator 12c (12.2.1) E55859-03. In this post, we’ll hit on the basics and hopefully guide you in the right direction.

First, let’s take a look at the installation process. Installing ODI 12c is quite simple compared to past versions of product. This is mainly due to the configuration of agents being moved outside of the initial installation and into a completely different process. The big choice we have now is which implementation to choose: standard or enterprise.

Standard:
Choose this installation if you plan to use a Standalone Agent to execute your mappings and data integration processes. This will install ODI Studio and the Standalone Agent, along with all of the necessary drivers and libraries to make everything work. Choose this option when installing ODI Studio on your client machine.

Enterprise:
This option will install the JEE Agent, Standalone Colocated Agent, as well as ODI Studio. The installation footprint is slightly larger than the Standard install, coming in at 3390 MB vs 3017 MB for standard. That’s a part of why I choose Standard when just installing Studio. Along with the JEE Agent and ODI Studio, you’ll also have the ODI Console and ODI plugin for Fusion Middleware Control (Enterprise Manager). One thing to note, you’ll need to first install the Oracle Fusion Middleware Infrastructure (download) package as it is a prerequisite to the ODI 12c Enterprise version.

To get started, download the bits and bytes from Oracle OTN and unzip the file. Next, using the appropriate Java version, run the ODI installer.

[oracle@bigdatalite bin]$ ./java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[oracle@bigdatalite bin]$ cd ../..
[oracle@bigdatalite odi122120]$ ./jdk1.8.0_111/bin/java -jar fmw_12.2.1.2.0_odi.jar 
Launcher log file is /tmp/OraInstall2016-10-21_12-20-12PM/launcher2016-10-21_12-20-12PM.log.
Extracting the installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .

From there, simply go through the install wizard. I won’t go into too much because there’s not much to it. First, decide whether you will install the Standard or Enterprise version. Next, determine where you want to install your ODI home. For a JEE or Colocated Agent installation, ODI must be installed into the Oracle Home containing Fusion Middleware Infrastructure. For Standalone agent, a brand new Oracle Home must be used. Now that we’ve installed the software, it’s time to setup the repository database schema.

Master and Work Repository Setup

The Repository Configuration Utility (RCU) is delivered as a part of the ODI install. It is used to create Fusion Middleware repositories, such as the ODI Master/Work Repositories, and any additional database schemas necessary.

Start the RCU by browsing to ODI_HOME/oracle_common/bin and running ./rcu. The steps are pretty straightforward, so I’ll leave this as an exercise for everyone to try at home. Once we have our repositories, we can begin configuring Agents and Studio.

ODI Studio Configuration

Setting up ODI Studio is fairly straightforward, so I’ll keep it brief. Once installed, browse to ODI_HOME/odi/studio and fire up ./odi.sh (or odi.exe in Windows). Then, create a new connection to your ODI Master and Work repository. Enter the ODI username/password to connect with, master repository schema username/password, JDBC connection information, and Work repository. Test the connection to ensure it’s all working and you’re set.

Standalone Agent Configuration

Browse to your /oracle_common/common/bin and run the WebLogic Server Configuration wizard.

./config.sh

Create ODI Agent

First, create a new domain for the agent. This is similar to a WebLogic domain if you’ve gone through the setup process for any application that relies on WLS. It’s recommended to create the domain outside of the ODI Oracle Home. That way you can avoid issues when upgrading/reinstalling and can follow the same domain directory structure if installing multiple agents, since each must be installed in its own domain.

Next, choose the Standalone Agent Template, installed as part of the ODI download. Then, select the JDK to be used by the agent.

Configure ODI Agent component

The repository is already setup, so now if we connect to the service table (STB) schema, we can look up the ODI repository connection information without having to manually enter anything.

Get RCU configuration

The system component will be the ODI Agent name. Then, set the host server listen address, port, and Supervisor username/password.

ODI Agent configuration

Finally, setup the Node Manager for the ODI Agent. The Node Manager doesn’t have to be used when starting the agent, but it is a requirement to set up during configuration. The Username/Password should be kept in a safe place, as it will be necessary for starting components within the Node Manager.

Finally, click Create and you’re off and running.

Starting the Standalone Agent

Before you can start the agent, be sure to configure it in the master repository via ODI studio. Check out the video below for a short tutorial.

Now, you can startup the agent with the node manager or without it. If we want to manage the agent via Fusion Middleware Control, use the Node Manager. But, if you plan to simply use the command line to start / stop the agent, you can do so using the classic startup method.

Start Agent with Node Manager

From within the DOMAIN_HOME/bin directory, run the following. You’ll want to use nohup to start the process in the background on Linux. For Windows, create a service to run nodemanager.cmd.

[oracle@bigdatalite bin]$ pwd
/u01/odi_agent/user_projects/domains/base_domain/bin
[oracle@bigdatalite bin]$ nohup ./startNodeManager.sh > nm.out&
[oracle@bigdatalite bin]$ nohup: ignoring input and redirecting stderr to stdout

Next, start the agent component from within the same directory. You’ll be prompted for the Node Manager password setup during configuration.

[oracle@bigdatalite bin]$ ./startComponent.sh OracleDIAgent1
Starting system Component OracleDIAgent1 ...

Initializing WebLogic Scripting Tool (WLST) ...
...
Reading domain from /u01/odi_agent/user_projects/domains/base_domain

Please enter Node Manager password:
Connecting to Node Manager ...

Start Agent without Node Manager

For times when you don’t need the Node Manager, you can still start the ODI Agent using the classic method. Just remember to always use all-caps when referring to the agent parameters, such as NAME.

./agent.sh -NAME=OracleDIAgent1

Either approach for starting the agent will get you up and running.

Colocated Agent Configuration

The Colocated Agent is essentially a Standalone agent that is managed and maintained via WebLogic Server. Rather than repeat the steps, I thought I would point you to a nice Oracle By Example on Creating a Colocated Agent. As mentioned earlier, be sure to create your domain outside of the ODI home.

Ping colocated agent

For reference, when testing an agent, simply place the hostname:port/agent-web-application-context in a browser. The Ping response means the agent is up and running.

JEE Agent Configuration

Finally, we’ll discuss the JEE Agent configuration. As mentioned in my previous post, the main reasons for using the JEE Agent is for high availability or connection pooling. You may also want to use the ODI Console or need to manage and monitor ODI agents from FMW Control. Whatever the reason, here’s how we configure the agent.

The configuration process is very similar to that of the Colocated Agent, but with a several extra steps involved. I’ll highlight those steps here. To get started, just as with the other agents, browse to the ODI_HOME/oracle_common/common/ directory and run ./config.sh.

Work through the usual first few installer screens. Then, we select the template to configure. Choose ODI Agent and any additional items you’d like to add to the domain, such as ODI Console or Enterprise Manager plugin.

JEE Agent template

Next, add the credential mappings for both the ODI user, allowing JEE agent access to the Master / Work Repository, and the WebLogic domain user.

JEE Agent credential mapping

Select the options you plan to configure for the domain. This will vary based on what currently exists in WebLogic Server, but in this example we’ll be setting up a new AdminServer, Node Manager, and the Managed Server to host the ODI JEE Agent.

JEE Agent domain configuration

Next, configure the AdminServer, Node Manager, and Managed Server. For each, remember to use the machine IP address and not the default “Local All Machines”. For the node manager, enter a username and password as we did in previous configurations.

JEE Agent server configuration

The final screens in the configuration wizard allow you to create a cluster and assign the cluster to the server. You can assign a dynamic cluster as needed for scalability, though this falls outside the context of this article. Assign the cluster to a server, create the machine (remember to assign the IP address), and finally assign the servers (AdminServer and Managed Server) to the machine.

JEE Agent topology

Click create and you're done! Now start weblogic, node manager, then login to console to start ODI managed server and agent.

There you are, ODI Agents installed and configured. As always, there are many external factors that may change how your configuration and setup works in your environment, so please think of this as a simple guide and not the exact approach for each unique situation. If you do have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our experts on the product. Up next in the Getting Started series, we’ll look at creating your first ODI mapping.

OBIEE, Big Data Discovery, and ODI security updates – October 2016

Oracle release their "Critical Patch Update" (CPU) notices every quarter, bundling together details of vulnerabilities and associated patches across their entire product line. October's was released yesterday, with a few entries of note in the analytics & DI space.

Each vulnerability is given a unique identifier (CVE-xxxx-xxxx) and a score out of ten. The scoring uses a common industry-standard scale on the basis of how easy it is to exploit, and what is compromised (availability, data, etc). Ten is the worst, and I would crudely paraphrase it as generally meaning that someone can wander in, steal your data, change your data, and take your system offline. Lower than that and it might be that it requires extensive skills to exploit, or the impact be much lower.

A final point to note is that the security patches that are released are not available for old versions of the software. For example, if you're on OBIEE 11.1.1.6 or earlier, and it is affected by the vulnerability listed below (which I would assume it is), there is no security patch. So even if you don't want to update your version for the latest functionality, staying within support is an important thing to do and plan for. You can see the dates for OBIEE versions and when they go out of "Error Correction Support" here.

If you want more information on how Rittman Mead can help you plan, test, and carry out patching or upgrades, please do get in touch!

The vulnerabilities listed below are not a comprehensive view of an Oracle-based analytics/DI estate - things like the database itself, along with Web Logic Server, should also be checked. See the CPU itself for full details.

Big Data Discovery (BDD)

  • CVE-2015-3253
    • Affected versions: 1.1.1, 1.1.3, 1.2.0
    • Base score: 9.8
    • Action: upgrade to the latest version, 1.3.2. Note that the upgrade packages are on Oracle Software Delivery Cloud (née eDelivery)

OBIEE

  • CVE-2016-2107
    • Affected versions: 11.1.1.7.0, 11.1.1.9.0, 12.1.1.0.0, 12.2.1.1.0
    • Base score: 5.9
    • Action: apply bundle patch 161018 for your particular version (see MoS doc 2171485.1 for details)

BI Publisher

ODI

  • CVE-2016-5602

    • Affected versions: 11.1.1.7.0, 11.1.1.9.0, 12.1.3.0.0, 12.2.1.0.0, 12.2.1.1.0
    • Base score: 5.7
    • The getInfo() ODI API could be used to expose passwords for data server connections.
    • More details in MoS doc 2188855.1
  • CVE-2016-5618

    • Affected versions: 11.1.1.7.0, 11.1.1.9.0, 12.1.2.0.0, 12.1.3.0.0, 12.2.1.0.0, 12.2.1.1.0
    • Base score: 3.1
    • This vulnerability documents the potential that a developer could take the master repository schema credentials and use them to grant themselves SUPERVISOR access. Even using the secure wallet, the credentials are deobfuscated on the local machine and therefore a malicious developer could still access the credentials in theory.
    • More details in MoS doc 2188871.1