Tag Archives: Oracle GoldenGate

How We Deliver Agile OBIEE Projects – Introducing ExtremeBI

Most OBIEE projects that we see are delivered through some sort of “waterfall” method: requirements are defined up-front, there are several stages of development, one or more major releases at the end, and any revision to requirements takes the form of a change request. This works well where requirements can be defined upfront, and it can be reassuring to customers who want to agree a fixed price up-front with every subsequent change clearly costed. But, as in the development world in general, some customers are starting to look at “agile” methods for delivering BI projects, where requirements emerge over the course of the project and there isn’t a fixed design or specification at the start; instead, the project adds features or capabilities in response to “user stories”, making it more likely that what ends up getting delivered is in line with what users actually want – and where changes and additions to requirements are welcomed, rather than treated as extra-cost change requests.

OBIEE naturally lends itself to working in an agile manner through the three-layer nature of its repository (RPD); by separating the physical representation of the source data from how it is presented to end-users, you can start from the off with the dimensional model that’s your end goal, and then over time evolve the back-end physical layer from pointing directly at the source system to pointing at a data warehouse or OLAP cube instead. In fact, I covered this approach back in 2008 in a blog post called “A Future Oracle OBIEE Architecture”, where I positioned OBIEE’s BI Server as a “business logic layer” and speculated that at some point in the future, OBIEE might be able to turn the logical-to-physical mappings in the RPD into actual ODI mappings and transformations.

In the end, although OBIEE’s aggregate persistence feature gave us the ability to spin-off aggregate tables and cubes from the RPD logical model, full ETL “push-down” never came, although you can see traces of it if you have a good poke around the DLLs and directories under the BI Server component. What did happen, though, was Exadata; with Exadata, features such as Smart Scan and its ability to join normalised tables much faster than regular databases meant that it became possible to report directly against an OLTP schema, or an ODS-like foundation layer, only adding ETL to build a performance star-schema layer if it was absolutely necessary. We covered this in a series of posts on Agile Data Warehousing with Exadata, and the focus of that method was performance – by combining Exadata with the metadata flexibility of OBIEE’s RPD, we could deliver agile projects where Exadata gave us the performance even when we reported directly against a third normal form data source.

And this approach worked well for our customers; if they’d invested in Exadata, and were open to the idea of agile, iterative development, we could typically deliver a working system in just a few months, and at all times what the users got was what they’d requested in their user story backlog. But there were still ways in which we could improve the method; not everyone has access to Exadata, for example, and reporting directly against a source system makes it tricky to add DW features like history and surrogate keys. So we recently introduced the successor to this approach, an OBIEE development method we call “ExtremeBI”. Building on our previous agile work, ExtremeBI adds an integration element, using GoldenGate and ODI to replicate in real time any source systems we’re interested in to the DW foundation layer, add the table metadata that DW systems expect, and then provide a means to turn the logical-to-physical RPD mappings into ODI ETL specifications.

But in a way, all the technical stuff is by-the-by; what this means in practice for customers is that we deliver working systems from the first iteration – initially by reporting directly against a replicated copy of their source system (with replication and metadata enhancement by GoldenGate and, optionally, ODI), and then over subsequent iterations adding more end-user functionality, or hardened ODI ETL code, all the while driven by end-user stories rather than some technical design signed-off months ago that no longer reflects what users actually want.

What we’ve found from several ExtremeBI customer engagements, though, is that success isn’t just down to the technology and how well ODI, OBIEE and GoldenGate work together; the major factors in successful projects are, firstly, having the project properly pre-qualified at the start – not every project, and not every client, suits agile working, and agile works best if you’re “all in”, as opposed to just agreeing to work in sprints but still having a set-in-stone set of requirements which have to be met by a certain date. The second important success factor is proper project organisation; we’ve grown from just a couple of guys with laptops back in 2007 to a fully-fledged, end-to-end development organisation, with full-time delivery managers, a managed services desk and tools such as JIRA, and you need this sort of thing in place – particularly a project management structure that’s agile-friendly, and a good relationship with a customer who’s fully signed-up to the agile approach. As such, we’ve found the most success where we’ve used ExtremeBI with fairly technically-savvy customers, for example an MIS department, who’ve been tasked with delivering something for a reasonable price over a short number of months, who understand that not every requirement can be delivered, but who really want their system to get adopted, delight their customers and focus its features on what’s important to end-users.

As well as processes and a method, we’ve also developed utilities and accelerators to help speed up the initial setup and ensure the initial foundation and staging layers are built consistently, with GoldenGate mappings already put in place, ready for our developers to start delivering reports against the foundation layer, or to use these foundation-layer tables as the basis of a data mart or warehouse build-out. The screenshot below shows this particular tool, built using Groovy and run from within the ODI Studio user interface; the developer selects a set of source tables from an ODI model, and the utility then builds out the staging and foundation layers automatically, typically saving days over the manual method.

ODI Studio Groovy utility building out the staging and foundation layers
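
As a rough sketch of how a utility like this can be driven from the Groovy console inside ODI Studio – the model code, naming conventions and exact SDK calls below are illustrative assumptions rather than the accelerator itself:

import oracle.odi.domain.model.OdiModel
import oracle.odi.domain.model.finder.IOdiModelFinder

// odiInstance is bound automatically when a Groovy script runs inside ODI Studio
def finder = (IOdiModelFinder) odiInstance.getTransactionalEntityManager().getFinder(OdiModel.class)
def sourceModel = finder.findByCode("SRC_APPS")   // hypothetical source model code

// derive the staging and foundation table names the utility would create for each source table
sourceModel.getDataStores().each { ds ->
  println "${ds.name} -> STG_${ds.name} -> FND_${ds.name}"
}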

We’ve also built custom KMs for ExtremeBI, including one that uses Oracle Database’s flashback query feature to pull historical transactions from the UNDO log, as an alternative to Oracle Streams or Oracle GoldenGate when these aren’t available on the project.
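
To give a feel for the technique (the EMP table and its columns below are purely illustrative), the KM builds on Oracle’s flashback version query syntax, which exposes each committed change to a row, along with the operation type and SCN, for as long as the undo data is retained:

SELECT versions_startscn,
       versions_operation,   -- I = insert, U = update, D = delete
       empno,
       ename,
       sal
FROM   emp
       VERSIONS BETWEEN SCN MINVALUE AND MAXVALUE
WHERE  versions_operation IS NOT NULL;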

Altogether, using Rittman Mead’s ExtremeBI method along with OBIEE, ODI and, optionally, GoldenGate has meant we’ve been able to deliver working OBIEE systems for customers in just a few months, typically for a budget of less than £50k. Coupled with cloud hosting, where we can get the customer up-and-running immediately rather than having to wait for their IT department to provision servers, we think this is the best way for most OBIEE 11g projects to be delivered in the future. If you’re interested, we’ve got more details on our “ExtremeBI in the Cloud” web page, or you can contact me via email – mark.rittman@rittmanmead.com – if you’d like to discuss it further.

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 1: Getting Started

Over the years, I’ve blogged quite a bit about integration between Oracle Data Integrator and GoldenGate, and how to make it all work with the Oracle Reference Architecture. With the release of the 12c versions of ODI and GoldenGate last October, and a soon-to-be-updated reference architecture, it’s time to write a few posts on the subject again.

Getting Started with 12c

First, let me describe the new Journalizing Knowledge Module (JKM) introduced in ODI 12c for integration with GoldenGate: JKM Oracle to Oracle Consistent (OGG Online). This JKM allows GoldenGate to be set up in an “online” mode, meaning the GoldenGate parameter files and process groups are configured and installed on the source and target GoldenGate servers for you, with ODI communicating with the GoldenGate JAgent to perform the installation. The “offline” mode still exists as it did in the 11g version of the JKM, in which the parameter files, etc. are created in a temporary location and then manually moved to the source and target. I’ll use the “online” JKM throughout this series of posts.

Another change to how Journalizing is implemented for GoldenGate in ODI is the Model to which the JKM is applied. In ODI 11g, the GoldenGate JKM was always applied to the Model containing the target tables, leaving the source table metadata completely out of the picture. This made sense, as GoldenGate handled everything on the source side. Now in ODI 12c, the source tables are reverse engineered and the JKM applied to the source Model. This allows the source table to be used in a single mapping for both the initial load and incremental load of the performance layer through the use of ODI 12c deployment specifications. The target, or fully replicated table, is no longer necessary in the metadata. We’ll talk through this concept in more detail later on.

Reference Architecture Update

Finally, before we get into the details, let’s go over the latest (yet to be released, I might add) version of the Oracle Information Management Reference Architecture. It was first presented by Stewart Bryson and Andrew Bond (Oracle) at the Rittman Mead BI Forum in Brighton (which is why I was given the OK to mention it pre-release!).

Information Management - Logical View

This latest reference architecture is not much different from previous versions. The main difference that pertains to this blog post is that the Staging Layer has been renamed the Raw Data Reservoir. If you look through the presentation by Stewart and Andrew, you’ll see that many of the principles remain the same. They also describe more of an Agile approach to implementing the reference architecture – using GoldenGate, OBIEE against transactional schemas, and eventually ETL development using ODI – a methodology we at Rittman Mead use with our clients and call ExtremeBI. Look for the official release of the latest Information Management Reference Architecture in the next couple of months.

In this blog series, we’ll look at how to load the Raw Data Reservoir and Foundation Layer using GoldenGate, and subsequently load the Access and Performance Layer with Oracle Data Integrator mappings. If you recall from my 11g posts on the subject, we don’t need to load these layers in sequence; GoldenGate allows us to extract the source data once and replicate it to multiple targets in parallel.

Loading the Raw Data Reservoir and Foundation Layer

Now that we’ve gone through some of the updated concepts for 12c and the Reference Architecture, let’s look at the high-level steps that must be taken in order to implement GoldenGate and ODI integration.

  • Install GoldenGate on the source and target servers – including JAgent configuration
  • Edit the “JKM Oracle to Oracle Consistent (OGG Online)” Knowledge Module (enable Foundation Layer load)
  • Setup ODI Topology (database schema and GoldenGate connections)
  • Setup and start Journalizing on the source Model
  • Develop Mappings for initial load and incremental load
  • Perform initial load and start replication

There is quite a lot of detail to add to these steps, so let’s get right into it.

GoldenGate 12c Installation and JAgent Configuration

The install of GoldenGate 12c is pretty straightforward, so I don’t plan on going into much detail here. A step-by-step guide can be found on the DBASolved blog; the installation sets up and starts the Manager process and creates the necessary subdirectories. We then need to configure the JAgent on both the source and target GoldenGate installations, enabling ODI to communicate with GoldenGate during the “online” JKM’s start journalizing process, which will automatically configure and start the GoldenGate process groups. Setting up the JAgent for ODI integration is essentially the same as setting it up for Oracle Enterprise Manager integration with GoldenGate.
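
Once the installer has finished, a quick sanity check from GGSCI confirms the Manager process is up before we move on to the JAgent ($OGG_HOME here is just a placeholder for your GoldenGate 12c install directory):

$ cd $OGG_HOME    # your GoldenGate 12c install directory
$ ./ggsci
GGSCI> INFO MGR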

First, you’ll notice that the file jagent.prm exists in the dirprm directory after installation completes. This parameter file will be used by the JAgent process once it’s started from GGSCI.

JAgent.prm

Next, we need to enable monitoring of GoldenGate by adding an entry, ENABLEMONITORING, to the GLOBALS file. Create the GLOBALS file (with no extension) in the GoldenGate home directory and open it in your favorite text editor. Simply add the line to enable monitoring, close and save the file.

Edit GLOBALS file
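
Once edited, the GLOBALS file needs just that single parameter – a minimal example is shown below; any entries already in the file can stay as they are:

ENABLEMONITORING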

To allow secure communication between ODI and GoldenGate, we must create an Oracle Wallet with a password for JAgent. From the GoldenGate install directory, run the password agent utility. This will create the cwallet.sso file in the dirwlt subdirectory.

Create Oracle wallet
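
In GoldenGate 12c the password agent is the pw_agent_util script shipped in the GoldenGate home; run with the -jagentonly flag, it prompts for the JAgent password and writes the wallet (Linux/Unix shown here – the Windows equivalent is the .bat version of the same script):

$ cd $OGG_HOME
$ ./pw_agent_util.sh -jagentonly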

We’re almost there! Now we need to make a slight change to the Config.properties file in the cfg directory under the GoldenGate home. Edit this file and make the following changes:

Set the agent type to Oracle Enterprise Manager. If left as the default OGGMON, the JAgent will attempt to register with the Monitor server, which most likely is not installed.

agent.type.enabled=OEM

Under the JMX Username, add the line to signify that SSL is not being used by JAgent (unless SSL is actually being used!).

monitor.jmx.username=cmroot
jagent.ssl=false

Finally, ensure the JAgent port is unique across all installations, and on the server in general. In my example, I’m using a single Virtual Machine to host both the source and target GoldenGate installations, so I need to be careful about which ports are in use.

jagent.rmi.port=5571

Before starting the JAgent, go ahead and stop, then start the Manager process to ensure all changes have been initialized. Then, start the JAgent and check to ensure both Manager and JAgent are running. That completes the GoldenGate installation and JAgent configuration.

Start JAgent
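
In GGSCI, that final sequence looks like this, run in both the source and target GoldenGate homes:

GGSCI> STOP MGR
GGSCI> START MGR
GGSCI> START JAGENT
GGSCI> INFO ALL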

If you want to just skip all of this installation work and get right to it, you can always download the Prebuilt Machine for Oracle Data Integrator 12c. It’s a VirtualBox VM with ODI 12c and GoldenGate 12c already installed and configured to work with the Getting Started Guide. The JAgent is already configured on the source and target GoldenGate installations, making it easy to get up and running. This is a great resource that the Oracle Data Integration product team has provided, and it sounds like they plan to continue adding to it in the future.

In Part 2 of the blog post series, we’ll edit the JKM to enable parallel load of the Raw Data Reservoir and Foundation Layer, as well as begin setup of the ODI Topology and metadata.

Rittman Mead BI Forum 2014 Abstract Scoring Now Live – For 1 Week Only!

The call for papers for the Rittman Mead BI Forum 2014 closed at the end of January, and we’ve had some excellent submissions on topics ranging from OBIEE, Visualizations and data discovery through to in-memory analytics, big data and data integration. As always, we’re now opening up the abstract submission list for scoring, so that anyone considering coming to either the Brighton or Atlanta events can have a say in what abstracts are selected.

The voting forms, and event details, are below:

In case you missed it, we also announced the speaker for the Wednesday masterclass the other day – Lars George from Cloudera, who’ll be talking about Hadoop, HBase, Cloudera and how it all applies to the worlds of analytics, BI and DW – something we’re all really excited about.

Voting is open for just one week, and will close at 5pm PST on Tuesday, 18th Feb. Shortly afterwards we’ll announce the speaker line-up, and open-up registrations for both events. Keep an eye on the blog for more details as they come.

Thoughts on ETL in the Cloud

In a few weeks’ time I’m presenting at the BIWA Summit 2014 on running OBIEE in the Cloud, with the idea being that I’ll go through the various options for deploying OBIEE in public clouds, including Amazon EC2 and Oracle’s own cloud offering. But the more I thought through the various architecture and product options, the more I began to think that ETL – getting data into the BI system from wherever it is now – is the key enabling technology. The vast majority of BI systems use data that’s refreshed on a daily, weekly or even real-time basis, and on-premise we tend to use tools such as Oracle Data Integrator, Oracle Warehouse Builder, Informatica PowerCenter and the like to move data from source to target, transforming and cleansing it en route.

When you deploy a BI platform such as OBIEE into the cloud, you’ve got a couple of options around where you hold its data, and how you do the ETL:

  • You can keep the data warehouse database on-premise, along with the ETL tool, and just run the BI part in the cloud
  • You can move the database into the cloud as well (using a service such as Amazon RDS), but keep the ETL tool on-premise
  • You can try and move the ETL into the cloud too – either hosting a tool like ODI in its own cloud VM, or making use of an ETL-as-a-service offering such as Informatica Cloud.

Things get more interesting when part of your IT infrastructure sits on-premise, and some sits in the cloud. It gets even more interesting when your data sources are also software-as-a-service (SaaS), for example Salesforce.com and Workday, where data access is via APIs rather than direct SQL*Net connections.

So where do we start with ETL-in-the-cloud? One place to start is Oracle’s own slide-deck on OBIEE in the cloud, where they set out what they see as the key differences between traditional on-premise software and software delivered via SaaS:

In this SaaS world, the expectation is that:

  • The infrastructure, software etc. is already in place, is patched regularly for you in the background, and new features emerge regularly
  • You’re on a shared platform, but you won’t be aware of other tenants on a day-to-day basis
  • The “innards” and workings are hidden from you – you just consume a service
  • It’s cloud-native – from connectivity to cloud apps through to common social sign-ons, commenting and collaboration
  • Everything is thin-client, web-based
  • Everything is self-service, self-provisioning, right from creation of your user account
  • Sign-up is on-demand, paid by the hour/day etc, and with no long setup process

What this means for data integration (and ODI) then, in my opinion, is:

  • It must be available as a service – log in, create data source connections, define mappings, run code
  • The internals typically would be hidden – so no need to know about agents, GoldenGate vs. ODI, choices of knowledge modules etc
  • Web-based data flow modellers as well as admin tools
  • Connectors through to Salesforce.com and other cloud apps
  • Connectors through to Hadoop, Hive, JSON compatibility etc
  • Instant sign-up and charging by time/use

With bonus points for:

  • Low-cost or free option to get you started
  • A consistent story that goes through from messaging, application integration through to large-scale data movement
  • Easy clustering, scale-out, deployment on services such as Amazon EC2
  • Options to deploy on-premise or cloud, and run integration tasks on-premise or cloud

Looking at it architecturally, ETL-in-the-cloud would sit between sources and targets, both on-premise and in the cloud, providing the vital data movement capability between source systems and the BI Platform itself – most probably to its own database-in-the-cloud, running the data warehouse.

So what are the options then, if you want to use cloud-based ETL to load a cloud-based data warehouse and BI system? To my mind, there are four main options:

  • If you’re using a cloud-based database service such as Amazon’s Redshift, or Oracle’s own public cloud “schema-as-a-service” database, you can use the ETL tools provided with the service
  • You can try and put ODI in the cloud, maybe using an Amazon EC2 instance running WebLogic and a JEE agent, with another instance providing the metadata repository database
    • or as an alternative, do the same for one of the open-source ETL tools
  • You can use one of the new “cloud-native” ETL-as-a-service products, such as Informatica Cloud or SnapLogic
  • Or – you can try and abstract away ETL altogether – more on this later on.

The first option really applies if (a) you’re using a service such as Amazon Web Services’ EC2, (b) your data also most probably sits in AWS cloud storage, (c) you want to move data from your source application or export it into the main data warehouse database, and (d) you don’t really need to integrate data from different sources. Amazon AWS provides a number of options for loading data into EC2 and the various database and analytic services they provide, including Amazon Data Pipeline (shown in the screenshot below), the most “ETL-like” of their loading services, along with a sneakernet service, and the new Amazon Kinesis service for real-time streaming data.

Amazon Data Pipeline

Oracle’s Cloud Database service at the moment restricts you to a single schema, and more importantly there’s no SQL*Net access, so uploading data is either through SQL Developer (which has a special, custom Oracle Cloud connector) or through utilities provided via APEX.

Clearly this sort of web-based data loading isn’t designed for data warehouse scenarios, and in this initial iteration Oracle’s cloud database is probably designed to support cloud applications running on Oracle’s cloud Java service, with OBIEE-in-the-cloud mainly designed for reporting against this data, and for personal “sandbox” scenarios. What’s not immediately obvious when you use these cloud ETL tools is that each one has its own admin console, its own API, its own scheduler and its own metadata – which is usually the point at which you decide you need an enterprise ETL tool.

So what about the idea of running Oracle Data Integrator in the cloud? There’s a few options here, that I can think of:

  • Creating a cloud server instance that runs ODI, along with all of its metadata repositories, WebLogic instances and agents in a single location
  • Run the repository database either on-premise or on another cloud server (or even Amazon’s RDS service), with agents running either in their own cloud servers, or on the cloud servers holding the target database
  • Build ODI into the target platform, as Oracle have done with BI Apps 11.1.1.7.1, along with management tools to hide the complexity and workings of ODI

The first option sounds the “neatest” in terms of a wholly-cloud deployment, as all the parts of ODI are held in one place and you can think of it as an appliance or service. But ODI’s Extract-Load-Transform (ELT) approach ends up complicating things in a cloud-only deployment; the target platform (for example, one of the Postgres-compatible cloud databases) might not support all of the target-layer transformations you want, and the integration part can’t really sit in the ODI cloud instance unless you run it hub-and-spoke style and either use a local database for transformations or use the in-memory feature within the agents. In fact, where ODI makes most sense is in a hybrid on-premise/cloud setup, where most of your data resides on-premise, as does your ETL process, with the cloud being gradually migrated to and, in the meantime, used alongside on-premise applications.

Oracle’s white paper on cloud data integration majors on this type of scenario, with Oracle GoldenGate also used to replicate data between the two environments. At this year’s Oracle OpenWorld the BI Apps team also announced cloud adapters for the BI Applications, initially used to extract data from Fusion Apps in the Cloud back to the on-premise BI Apps data warehouse, with other Oracle cloud application data sources following on.

Where things get interesting with Oracle’s approach is when it’s non-Oracle cloud applications that we’re looking to integrate with. “Cloud Integration – A Comprehensive Solution”, another Oracle white paper, describes techniques to integrate with Salesforce.com, Workday and the Fusion Apps, but all of this takes place through Oracle SOA Suite and Oracle JDeveloper, products that are way too technical for the average ETL developer (or customer implementor).

In fact there’s two obvious things that are missing here, that are present in the Cloud Database and Cloud OBIEE offerings that Oracle recently launched:

  • It’s not a “service” – you can’t just create an account, design some mappings, load a target – the assumption is that each system is single tenant, perpetual license, self-install
  • It’s not consumer-level in terms of simplicity – there are bits of ODI, bits of GoldenGate, bits of SOA Suite, bits of JDeveloper – all good bits, but not a single integration-platform-as-a-service

But it’s probably fine if you’re running a hybrid on-premise/cloud strategy, and as I’ll talk about later on, the ELT approach does still have some distinct advantages over cloud-native ETL tools.

What about the more recent, “cloud-native” ETL tools such as SnapLogic, and Informatica Cloud? Informatica Cloud is probably the easiest product to understand if, like me, you’re from an ODI background. What Informatica have done here in terms of architecture is move the metadata and application-layer parts of Informatica PowerCenter into the cloud, add a bunch of cloud application adapters (including a third-party marketplace for them), but still do the actual data integration on-premise, using a cut-down version of the main PowerCenter engine.

Some of the nicer features in this setup are its multi-tenant and “integration-as-a-service” nature, the way it deals with firewall issues (do the integration on-premise), and the interoperability with traditional Informatica PowerCenter, where you can publish custom maplets from on-premise Informatica and push them into the cloud version. If Oracle came out with an ODI-in-the-cloud service, I think it’d look a lot like this.

To my mind though, the most impressive of the cloud integration vendors is SnapLogic. Their SnapLogic Integration Cloud product looks like it was designed first and foremost to run “in the cloud”: it’s available as a service, and the data integration designer focuses on data paths and application integration rather than the low-level, database-centric approach traditional ETL tools take.

What’s particularly impressive to me is the way that they’ve taken concepts used in tools like ODI – agents that perform the data integration tasks, running as standalone servers – and built on them to create the concept of “Snaplexes”, collections of JVMs that can sit in the cloud or on-premise, elastically scale up to handle larger workloads, and use Amazon’s EC2 and S3 compute and storage clouds to perform any data transformations en route. Data being transformed in SnapLogic streams through the system with low latency using web protocols, and the whole thing is just neatly designed to run natively on the cloud.

Where tools like SnapLogic do fall short though is on the “last mile” into the data warehouse, where tools like ODI come with lots of templates, database integration features and so forth to handle slowly-changing dimensions, dimension lookups, analytic functions and the like, but cloud-native tools really focus on basic transformations and getting the data from source to target. Where this of course gets interesting is when SnapLogic is used to load databases such as Amazon Redshift, for example, in combination with Tableau, as some of the ETL tool features we’re used to using with tools such as ODI and Informatica just aren’t there yet. What this means in practice is that if you’re looking to move OBIEE into the cloud, along with an Oracle database, you’ll probably still want to use a tool like ODI to do your data load, as these tools are just so mature when it comes to loading relational data warehouses.

So finally, then, onto what I probably think will be the future of ETL – “invisible ETL”, or ETL so abstracted away that you don’t even think of it as a separate product in itself. If you think about it, ETL is a necessary evil when it comes to BI – customers just want their data loaded, they don’t want to worry about how it’s transported, which data loading tool you use and so on. Oracle have had a number of initiatives over the past few years to automate the creation of ETL code from logical table source mappings in the OBIEE RPD, and one of the major drivers in the BI Apps product roadmap is to reduce the amount of manual work propagating application data model changes through the various layers of ETL and BI tool metadata.

A really nice example though of taking this approach of hiding the complexity of ETL can be found with a company called Good Data, who sell a BI in the Cloud product that comes with data storage (via Vertica) and ETL all wrapped-up in a consumer-like service that’s focused on analytics and visualisations but supports data loading from a number of database and cloud application sources. The screenshot below from the GoodData CloudConnect LDM Modeler Guide shows GoodData’s logical data model development environment, with this part of the product handling the facts, dimensions, attributes and other user-facing data elements, and a more traditional ETL tool and server (based on Clover ETL, no less) doing the actual data heavy-lifting.

GoodData CloudConnect LDM Modeler

GoodData splits its data models into these logical and physical elements, which of course is exactly what ODI does – and what OBIEE does too. In fact, the more you look at GoodData, the more you think that all of the elements are already in place at Oracle to create a very similar product, with the added benefit of access to tools such as GoldenGate, EDQ and SOA Suite. Even SnapLogic’s Snaplex is conceptually very similar to ODI’s agents; but what they’ve done – and what GoodData, Informatica and others have done – is wrap the products up into a service, make it all consistent (at least on paper), architect it for cloud-only and hybrid on-premise/cloud environments, and build out all of the third-party adapters. It’ll be interesting to see what Oracle’s long-term response to this will be – honestly, I’ve no special insight into this part of the product roadmap, so I’m just as interested as anyone in how it will turn out.

So – is any approach better, is Oracle’s approach the worst, are vendors like SnapLogic the future, and what should we use if deploying OBIEE in the cloud? Well to my mind there’s no black-and-white answer, and the choice comes down to a number of factors including:

  • Where are your sources and targets; the more that still reside on-premise, the more a traditional on-premise tool such as ODI makes sense
  • To what extent are you interacting with non-Oracle targets and sources; the more non-Oracle ones you’re using, the more a non-Oracle tool will probably make sense
  • How complex will your end data transformations be – if you’re expecting to do lots of SCD2 transformations, analytic queries, clever DW-stuff, a tool like ODI will make sense
  • How much upfront investment do you want to make? Services like Informatica Cloud are far easier to get provisioned on, and involve far lower up-front license costs, than an on-premise install
  • Or are you just loading data from a single source into a managed cloud database – if so, one of the vendor-supplied utilities will probably do the job

Anyway – I’d be interested in others’ opinions, and whether I’ve missed anything out. But for now – those were some thoughts from me on running ETL “in the cloud”.