Category Archives: Rittman Mead

Using Streams with ODI12c for Oracle-to-Oracle Change Data Capture

Although Oracle GoldenGate replaced Oracle Streams a couple of years ago as the recommended data replication and change data capture technology for Oracle databases, many customers on Oracle Database 11gR2 or earlier still use Streams for Oracle-to-Oracle change data capture: it works, and unlike GoldenGate it doesn’t require any additional licensing. Oracle’s GoldenGate Statement of Direction paper from 2014 states that Streams in Oracle 11gR2 will continue to be supported, but that no future version of the Oracle Database will ship with Streams included; so if you’re on 11gR2 and you just want to trickle-feed capture between two Oracle databases, it’s an interesting option.

I’ve covered Oracle-to-Oracle data replication using Streams a few times in the past, including this OTN article on ODI and Change Data Capture from back before 2007 or so, this article on OWB11gR2 and Change Data Capture from 2010, and one from back in 2006 that went into the details of setting up asynchronous hotlog change data capture with the then-new “Paris” OWB10gR2 release. We’re now on the 12c release of Oracle Data Integrator and I’m teaching our ODI12c Bootcamp course to a client next week who’s particularly interested in using Streams with ODI12c, so I thought it’d be worth taking a look at this feature in more detail to see if much has changed since those earlier articles.

Let’s start, then, with an ODI12c 12.1.3 install and a regular mapping set up to copy and join the DEPT and EMP tables from one Oracle database into a denormalized table in another Oracle database. Both are Oracle Database 11gR2 (11.2.0.3), and the initial mapping looks like this:

NewImage

One thing that many ODI developers don’t know about the 12c release is that it comes with a set of “component-style” knowledge modules built into the tool, which you can use straightaway to get a mapping running without having to select and import IKMs, LKMs and other KMs from the ODI Studio filesystem. In my case the Physical mapping looks like the screenshot below, with two execution units (one for each Oracle Database server) and a number of built-in component-style KMs available for selection. I choose the LKM SQL to SQL (Built-in) load knowledge module, which uses a generic JDBC connection to load source records into the staging table on the target server, and then the IKM Oracle Insert integration knowledge module to take that staging data and integrate it into the target table.

NewImage

I then run this mapping and see that ODI extracts data from the source database using a Java routine, batch-transfers it into the Oracle staging table, and then integrates the contents of that staging table into the target Oracle table. I could of course improve this by using the LKM Oracle to Oracle (DBLink) knowledge module and thereby avoid loading in two steps, but what I’d instead like to do is use Oracle Streams to trickle-feed new and changed data from my source tables over to the target database server, as shown in the diagram below.

 

Asynch

In the OWB and Asynchronous Change Data Capture article I linked to earlier in the post, setting up change data capture involved quite a few steps; the database had to be put into archivelog mode, the GLOBAL_NAMES parameter had to be set and a whole bunch of PL/SQL procedures had to be called to set up the source-to-target connection. Once it’s running, Streams takes transactions off the redo log files on the source database and sends them across the network to the target database server, in a similar way to how GoldenGate sends transactions in the trail file across to target database servers – except it’s Oracle-to-Oracle only and in my experience is a lot more fragile than GoldenGate, which is why we and most other customers switched to GoldenGate when it came out.

ODI12c comes with a number of change data capture or “journalizing” knowledge modules that use either database triggers, UPDATE_DATE fields, Oracle Streams or GoldenGate to replicate data from the source system to the target data warehouse. The journalizing knowledge module we’ll use, JKM Oracle 11g Consistent (Streams), is a template-style KM that needs to be imported first from the filesystem where ODI Studio was installed, as shown in the screenshot below – note also when you do this yourself that there’s a big “deprecated” notice next to it, saying that it could be removed at any time (presumably in favour of the GoldenGate-based ones).

NewImage

ODI and the JKM Oracle 11g Consistent (Streams) KM take a more “fck it, let’s just do it” approach to database configuration than my OWB10gR2 version did, automatically configuring the source database for Streams by running all of the setup PL/SQL routines, leaving you just to put the database in archivelog mode if it’s not already and to grant the connecting user (in my case, SYSTEM) Streams administrator privileges. Moving over to SQL*Plus on the source database I therefore run the setup commands listed in the KM notes like this:

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 26 06:34:44 2015

 

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

SQL> grant dba to system;

 

Grant succeeded.

 

SQL> begin

  2  dbms_streams_auth.grant_admin_privilege(

  3       grantee   => 'system',

  4       grant_privileges => true);

  5  end;

  6  /

 

PL/SQL procedure successfully completed.

 

SQL> shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup mount;

ORACLE instance started.

 

Total System Global Area 2471931904 bytes

Fixed Size    2230872 bytes

Variable Size  570426792 bytes

Database Buffers    1895825408 bytes

Redo Buffers    3448832 bytes

Database mounted.

SQL> alter database archivelog;

 

Database altered.

 

SQL> alter database open;

 

Database altered.

Next I’ll go back to ODI Studio and enable the source Model for journalising by double-clicking on the model in ODI Studio and then selecting the JKM from the Journalizing tab, like this:

NewImage

I then right-click on the EMP and DEPT tables within the source model and select Changed Data Capture > Add to CDC, where I can also fine-tune the replication order so that the new departments that employees link to (via DEPTNO) will always have been created before the corresponding employee records arrive.

NewImage

It’s at this next stage, when I enable journalizing, that Streams is set up on the source and target database servers and all the supporting tables and views are created. To enable journalizing I click on the model, not the individual tables, and select Changed Data Capture > Start Journal, like this:

NewImage

If you’ve read any of our previous posts on ODI and changed data capture you’ll realise that this setup process is the same regardless of the underlying replication technology, which makes it easy to start with database-centric CDC technologies such as this and then move to GoldenGate later on without lots of rework or re-training. For now though let’s run this setup process using the local agent and then check the Operator navigator to see what it did (and whether it worked…)

NewImage

And it did work. Enabling journalising with the Oracle Streams JKM involves quite a few setup steps, including checking that all the database settings and parameters are enabled correctly, then running the various DBMS_STREAMS and other packages to set up the capture and transmission process. Then, as with all of the ODI JKMs, a set of J$ tables is created to hold the primary keys of new and changed records coming from the source system, along with JV$ views that join those primary keys to the full incoming replicated rows – this blog post by the Oracle ODI team explains the background to JKMs very well if you want to understand them in more detail. Looking at the source SCOTT schema in SQL*Developer I can see the CDC and J$/JV$ tables and views created in the schema; if I didn’t want these tables created in the actual data schema I could have specified a different schema as the WORK schema when I created the database connection in ODI Studio prior to this exercise.

NewImage
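To make the J$/JV$ relationship a little more concrete, the query below is a rough conceptual sketch of the sort of join a JV$-style view performs – the actual views the JKM generates are more involved (tracking subscribers, consistency windows and flags), so treat the table and column names here as illustrative only:

select jrn.jrn_subscriber, jrn.jrn_flag, jrn.jrn_date, emp.*
from   scott.j$emp jrn
join   scott.emp   emp on emp.empno = jrn.empno;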

Next I have to define one or more “subscribers” to the journals; for the more advanced “consistent set” JKMs, of which the Oracle Streams one is an example, you can define multiple consumers or “subscribers” to the changed data so that one can be further down the queue than another (Simple JKMs only allow a single subscriber). I call my subscriber “SUNOPSIS” as this is the default subscriber name ODI adds to the mappings downstream.

NewImage

Pressing OK after adding the subscriber again brings up the prompt to select an agent, and going over to the Operator navigator I can see that another set of steps has run, again doing some Streams setup but also adding details of the journal subscribers to the tables created on the source database.

NewImage

I can check that journalising is now working by using the View Data… feature in the Designer navigator to insert new rows into the EMP and DEPT tables, and then checking the J$ tables in the source database schema to see if the rows’ primary keys have been added – which they have been.

NewImage
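If you’d rather check this from SQL*Plus than through the View Data… window, a quick query against the journal table does the same job (assuming the default J$ naming and the SCOTT work schema used here):

select jrn_subscriber, jrn_flag, jrn_date, empno
from   scott.j$emp;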

To now just read journalised new and changed data into my mapping, all I do then is go to the Physical mapping diagram, select the first source table and check the Use Journalized Data Only checkbox, then do the same for the other table (note it is the table source you select, not the access point for that table into the target execution block).

NewImage

So now I’ll run the mapping and check the results in the Operator navigator … but instead, I get an error:

NewImage

This is because, given the way that ODI handles CDC and journalising, it can’t allow two journalised tables to be directly joined in a mapping – we think this is because ODI can’t guarantee both tables are in the same update “state” and it therefore makes you copy their data into staging or intermediary tables before you can do the join. I therefore amend the mapping to load the journalised tables into staging tables on the target database server, amend the joins and filter to reference those staging tables, and then join their contents before filtering and loading into the target reporting table, like this:

NewImage

Within the Physical mapping details, the two incoming Oracle staging tables are loaded by the same LKM SQL to SQL component-style KM we used before to extract data from the two journalised tables, and the Journalized Data Only flag is still set for the source tables, as you can see below.

NewImage

What this also highlights is a key difference between the way I trickle-fed transactions across from my source database in the OWB and Changed Data Capture article linked to at the start of this post, and the way ODI’s JKMs do it; in the OWB example I set up an actual trickle-feed process outside of OWB which transferred changed data across the network to my target data warehouse, whereupon I then read those change tables and used them to update the target tables in my data warehouse.

And this is actually how the GoldenGate KMs work with ODI – a GoldenGate replication process copies new and changed data from the source database to the target and ODI then reads from those change tables, whereas the Streams (and other non-GoldenGate) JKMs create the change capture tables back on the source database (the various J$ and JV$ tables I reviewed using SQL*Developer earlier on), with ODI then reading from those remote change tables and bringing the new and changed data across to the target database server. This I guess makes things easier to set up – you don’t have to worry about configuring the target database for Streams replication – but it does mean that you still incur the network traffic every time you micro-batch the changes across the network, rather than spreading that traffic transaction-by-transaction.

Anyway, back to ODI, and this time when I run the mapping it works; looking at the Operator navigator again, though, I can see that no new data came across, and the journal data is still waiting to be consumed afterwards. Why is this?

NewImage

If you’ve only used the Simple CDC JKMs from Oracle before, this is normally all you need to do; with Consistent Set ones such as this one, or the GoldenGate JKMs, you need to lock the subscriber view of the journalised data and extend the CDC window before you can access the journal records. For the Simple JKMs the IKM (Integration Knowledge Module) takes care of the unlock, extend, purge and lock operations for you automatically in the background, whereas with Consistent Set ones you typically do this as part of a wider ODI package, as shown in the screenshot below, with the first and last tasks created by dragging and dropping the journalized model onto the package canvas and selecting Journalizing Model as the type (the subscriber name, “SUNOPSIS” in this case, is typed in below those settings and is off-screen in the screenshot).

NewImage

Now when I run this package, as opposed to the mapping on its own, I get the row of new data I was expecting and the journal table is now empty, as I was the only subscriber.

NewImage

Finally, if I was looking for real-time continuous loading into this target table, I could wrap the package in an event-detection loop that waits, in this case, up to ten seconds for three journal rows to be written, then either processes them as soon as the third arrives or loops around again every ten seconds (obviously in reality you’d want to put in a mechanism to halt the loop if needed, but in my case I’ll just kill the job from the Operator navigator when I want it to stop).

NewImage

So that’s the basics of using ODI with Oracle Streams for Oracle-to-Oracle changed data capture; if you’re just copying data between two Oracle databases and you’re on 11gR2 or earlier this might be an option, but long-term you’ll need to think about GoldenGate as Oracle aren’t developing Streams beyond the 11gR2 release. Note also that all the Streams activity happens over on the source database server, so you still need the additional step of copying the journalised data across to the target data warehouse, but it’s still a fairly non-invasive way to capture changes on the source Oracle database and it does have the considerable advantage (compared to GoldenGate) of being free to use.

Security patches released for OBIEE 11.1.1.7/11.1.1.9, and ODI DQ 11.1.1.3

Oracle issued their quarterly Critical Patch Update yesterday, and with it notice of several security issues of note:

  • The most serious for OBIEE (CVE-2013-2186) rates 7.5 (out of 10) on the CVSS scale, affecting the OBIEE Security Platform on both 11.1.1.7 and 11.1.1.9. The access vector is by the network, there’s no authentication required, and it can partially affect confidentiality, integrity, and availability.
    • The patch for users of OBIEE 11.1.1.7 is to install the latest patchset, 11.1.1.7.150714 (3GB, released – by no coincidence I’m sure – just yesterday too).
    • For OBIEE 11.1.1.9 there is a small patch (64Kb), number 21235195.
  • There’s also an issue affecting BI Mobile on the iPad prior to 11.1.1.7, with a partial impact on integrity.
  • For users of ODI DQ 11.1.1.3 there’s a whole slew of issues, fixed in CPU patch 21418574.
  • Exalytics users who are on ILOM versions earlier than 3.2.6 are also affected by two issues (one of which rates 10/10 on the CVSS scale).

The CPU document also notes that it is the final patch date for 10.1.3.4.2. If you are still on 10g, now really is the time to upgrade!

Full details of the issues can be found in the Critical Patch Update document, and information about the patches on My Oracle Support, Doc ID 2005667.1.

Options for Enabling ODI11g+12c Standalone Agents for High-Availability (or … Why JEE Agents are the Best Option)

A few years ago we posted some articles on the blog around Oracle Data Integrator ETL restartability and resilience, and making ODI’s agents highly available. The context around those posts was a large data integration project we were doing for a client who was deploying Oracle Fusion Middleware across the entire enterprise, and they wanted us to use the new ability within ODI11g to deploy agents within WebLogic Server Managed Servers to provide true high-availability for their ETL platform; ODI11g in this topology uses Oracle Coherence to provide a middleware-based data grid that holds a shared record of the agent schedule, so that another agent can pick up and resume tasks in the schedule if the main agent falls over mid-way through execution. The diagram below shows this topology, which was then brought forward into ODI12c and can be used now through a feature called “JEE Agents”.

NewImage

And this is our standard recommendation to clients who want to make their ODI agents as highly-available as possible, so that there’s always an active agent ready to receive execution or schedule requests, and schedules keep running even if the agent running them falls over mid-way through execution. But the JEE Agent approach has some significant prerequisites:

  • You need to deploy them within Oracle WebLogic Server Enterprise Edition, with at least two machines in a cluster within the WebLogic domain, which can be expensive for clients without the required WebLogic Server licenses or who haven’t got suitable hardware available
  • You also need Oracle Coherence for the shared record of ODI schedules that each JEE agent can get access to, and the client may not have a license for Coherence in their current WebLogic/Fusion Middleware license deal
  • The customer might also, understandably, want to properly understand what they can do with just Standalone Agents before they shell-out for WebLogic Server Enterprise Edition, Coherence and a bunch of new servers

For anyone new to Oracle Data Integrator, agents are Java processes that either sit within WebLogic Server clusters or directly on source and target servers, and take the instructions within your ODI mappings, packages and load plans and execute them as SQL instructions sent to databases, web service calls or whatever other interactions your ETL processes require. Agents can either receive instructions ad-hoc from the ODI Studio application or via command-line or web service calls, or they can be assigned one or more schedules that they then execute at the appropriate time to move data into your data warehouse or between various applications. ODI11g has two types of agents: JEE agents, as mentioned earlier, and what are called Standalone Agents – agents that run in their own Java JVM and are started from the command-line, that don’t require WebLogic Server and are mainly used because of their small footprint. You can run them directly on the database server that’ll be performing the Load and Transform part of an ELT job, for example, but they don’t have the HA capabilities of agents running in WebLogic Server and they don’t have access to all the “enterprise” Java features provided to agents running within WebLogic Server.
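For reference, starting an ODI11g Standalone Agent from the command-line is just a case of running agent.sh from the agent’s bin directory with the agent’s name and listen port – a sketch, with an agent name and port chosen for this example rather than taken from any particular install:

[oracle@odi11g-node1 bin]$ ./agent.sh -NAME=ODI11_Standalone -PORT=20910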

Borkur and I worked with just such a customer recently and it was a useful opportunity to dig into exactly how Standalone Agents handle failover, and how close you can get to JEE agents if you’d like some degree of high-availability for your ODI setup. As this was a new implementation of ODI for the customer and they were currently using OBIEE11g, their first thought was to deploy ODI11g until such time as they could upgrade their entire Fusion Middleware stack to 12c; as we were only talking about Standalone Agents, though, my advice was that there was no harm in deploying ODI12c instead of the 11g version as it didn’t really involve any WebLogic integration, and they could then benefit from the improved developer features in ODI12c such as flow-based mappings and deployment specifications.

High-Availability for ODI11g Standalone Agents – OPMN and “Cross-Wiring” to Create a Single Physical/Logical Standalone Agent

So let’s start with Standalone Agents within an ODI11g deployment. In this instance, the customer had two datacenters and wanted to put an agent in each to provide some degree of HA and disaster recovery, with one agent at any one time being the “primary” agent that received job execution requests and was assigned the scheduled daily load. In this instance we were also able to deploy OPMN (Oracle Process Manager and Notification Server), installed as part of the Oracle Web Tier Utilities download and the recommended way to provide some degree of high-availability for ODI11g Standalone Agents.

What we thought would be interesting here, though, would be to try and follow John Goodwin’s instructions on cross-wiring two Standalone Agent + OPMN installs to provide active/passive failover across two server nodes. In John’s case he’s talking about agents running Hyperion data loading jobs, but it’s just as applicable to our situation, so we went with the topology below to try out active/passive failover for our ODI11g Standalone Agents across two nodes.

NewImage

To start with, then, each Standalone Agent is now registered with its local OPMN instance so that opmnctl can be used to stop, start and restart it. In our setup, the primary ODI node (odi11g-node1.rittmandev.com) currently has the active Standalone Agent instance running, as I can see by using the opmnctl command-line utility:

[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   21891 | Alive   
ohs1                             | OHS                |     N/A | Down

If I switch over to the second node (odi11g-node2.rittmandev.com), I’m currently leaving that agent disabled, as it’s the “passive” half of my active/passive arrangement.

[oracle@odi11g-node2 ~]$ cd /home/oracle/Middleware/Oracle_WT1/instances/instance1/bin
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

The clever part now, and as detailed in John Goodwin’s article, is registering each OPMN server with the other server in the active/passive setup, so that if one goes down, OPMN brings up the other one. This method doesn’t seem to be documented in the ODI11g documentation and later on I’ll explain why I think this is, but to “cross-wire” the OPMN servers in this way you have to make a few manual changes to each OPMN server’s opmn.xml configuration file, in my case located at /home/oracle/Middleware/Oracle_WT1/instances/instance1/config/OPMN/opmn/opmn.xml:

NewImage

1. First you create a new <topology></topology> section within the file that lists out the hosts that have OPMN installed that you want to include in the failover cluster (use the remote 6701 port rather than the local 6700 port for this)

2. Next you add the service-failover="1" and service-weight settings to the <process-type></process-type> section for the agent you want to fail over between the two nodes (a sketch of both changes is shown below).
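Below is a rough sketch of what those two opmn.xml changes might look like – the surrounding structure of your file will differ and the hostnames, ports and weights here are illustrative, so treat it as a guide rather than a copy-and-paste configuration:

<topology>
  <nodes list="odi11g-node1.rittmandev.com:6701,odi11g-node2.rittmandev.com:6701"/>
</topology>
...
<process-type id="odiagent" ... service-failover="1" service-weight="101">
  ...
</process-type>

With the opmn.xml on both nodes updated and OPMN reloaded, I can test the failover by killing the active agent’s process on the primary node; OPMN first tries to restart it locally, and after a couple of kills marks it as down: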

[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   22064 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node1 bin]$ kill -9 22064
[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   22147 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node1 bin]$ kill -9 22147
[oracle@odi11g-node1 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

and then, moving over to the secondary node’s OPMN instance, you can see that its agent has now changed from disabled to running:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10263 | Alive   
ohs1                             | OHS                |     N/A | Down

… which is pretty cool. But there are a couple of drawbacks with this approach. The first is that whilst OPMN can move the active agent between two or more server nodes, it doesn’t update the agent’s hostname in the ODI Master Repository, which means you either need to hack around with the ODI Master Repository tables to update its record of the hostname each time the agent fails over, or you do what I did and create a CNAME or VIP entry in the DNS server’s records so that the agent is reached via a virtual hostname, in our case odi11agent.rittmandev.com

NewImage
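In a BIND-style zone file that virtual hostname is just a one-line CNAME record pointing at whichever node currently runs the active agent – a sketch, assuming your DNS is managed that way:

odi11agent.rittmandev.com.    IN    CNAME    odi11g-node1.rittmandev.com.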

The problem, though, is that you need a means of updating this CNAME or VIP record when OPMN fails-over the agent to the secondary node, otherwise any further requests to this agent through ODI Studio will be directed to the failed node, not the one that’s now running the active Standalone Agent. In practice you could set up something that pings both agents and updates the CNAME record in your DNS zone when the primary agent moves, but this involves working with your organization’s network or sysadmin team, which can sometimes be a challenge.

The other issue is what happens when an ad-hoc or scheduled job is running when the active agent goes down. Let’s check this now by creating a Load Plan and then running it through what’s currently the active agent:

NewImage

I’ll now kill the process, and let OPMN restart it on that server node:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10263 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10263
[sudo] password for oracle: 
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10452 | Alive   
ohs1                             | OHS                |     N/A | Down

Notice how the pid (process ID) has now changed for the agent, which indicates that OPMN has restarted it on that node. Going back to ODI Studio, after a while I can see that the agent on restart has marked the load plan execution as having failed:

NewImage

If I restart the load plan execution though, a new load plan execution instance is created which automatically skips the steps that ran successfully before:

NewImage

If I kill the agent’s process three times in a row though, like this:

[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10452 | Alive   
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10452
[sudo] password for oracle: 
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10557 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10557
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |   10602 | Init    
ohs1                             | OHS                |     N/A | Down    
 
[oracle@odi11g-node2 bin]$ kill -9 10602
[oracle@odi11g-node2 bin]$ ./opmnctl status
 
Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status  
---------------------------------+--------------------+---------+---------
ODI11_Standalone                 | odiagent           |     N/A | Down    
ohs1                             | OHS                |     N/A | Down

the other agent will become the active agent, but it may take some time before it syncs up correctly with the Master Repository because the CNAME created to point to the active node might take a while to change or get updated. In my case, the load plan continues to display as “running” for a couple of minutes more, until it’s marked as failed.

NewImage

What is good to see, though, is that the agent that OPMN has failed-over to still has the schedule assigned to it; as far as the Master Repository is concerned it’s still the same logical/physical agent, even though OPMN has moved it to a new node – this is because we access it via the special odi11agent.rittmandev.com virtual host name.

NewImage

So using OPMN and this active/passive failover setup has the advantage of making sure there’s always an agent ready to receive ad-hoc job requests or run jobs on the schedule, but it requires you to maintain a CNAME or Virtual IP for the agent and to restart failed mappings to make it work. So how does this all look with ODI12c?

High-Availability for ODI12c Standalone Agents – Node Manager and an Active/Active Setup

ODI12c has the same Standalone Agents and JEE Agents that ODI11g does, but introduces a third type of agent, Co-Located Agents, which confuses things a bit more. Moreover, ODI12c Standalone Agents are managed by Node Manager rather than OPMN, which has been removed from the overall Fusion Middleware 12c architecture; Node Manager in this case runs within a lightweight, pseudo-domain created especially for the ODI12c Standalone Agent, with the new Co-Located Agents running within a full WebLogic domain but outside of a WebLogic Server Managed Server.

What this means in practice is that each node’s ODI12c Standalone Agent is managed by Node Manager rather than OPMN, but as a Node Manager outside of a full WebLogic domain can’t link up with another Node Manager for an active/passive high-availability setup, each Standalone Agent has to be active all the time (or as much as possible) in what is now an active/active HA setup.

NewImage

Within ODI Studio and the Master Repository topology, then, each agent is registered separately and it’s the choice of the operator which agent jobs are sent to, and which agent runs the daily ETL schedule. You could of course set these agents up in a load-balancing cluster with a parent or master agent accepting all the job execution requests, but then you’d need to make that master agent highly available (and where would it go?), so you’re back to the same issue again.

NewImage

Each agent in this instance can either be started via the agent.sh utility, or via Node Manager like this (edited for brevity):

[oracle@odi12c-node1 bin]$ ./startComponent.sh ODI12_Primary_Standalone
Starting system Component ODI12_Primary_Standalone ...
 
Initializing WebLogic Scripting Tool (WLST) ...
 
...
Welcome to WebLogic Server Administration Scripting Shell
 
Type help() for help on available commands
 
Reading domain from /home/oracle/odi12c/user_projects/domains/base_domain
 
 
Please enter Node Manager password:
Connecting to Node Manager ...
Successfully Connected to Node Manager.
Starting server ODI12_Primary_Standalone ...
Successfully started server ODI12_Primary_Standalone ...
Successfully disconnected from Node Manager.
 
 
Exiting WebLogic Scripting Tool.
 
Done

By default, as with ODI11g Standalone Agents and OPMN, Node Manager won’t automatically restart an agent under its control if it falls over. If, however, you set the CrashRecoveryEnabled setting to “true” in the /home/oracle/odi12c/user_projects/domains/base_domain/nodemanager/nodemanager.properties file as shown in the screenshot below, ODI12c Standalone Agents that were started by Node Manager will get auto-restarted if the process fails.

NewImage
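In text form the change amounts to a single property in that nodemanager.properties file (the rest of the generated file is left as-is):

CrashRecoveryEnabled=true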

As you can see in the Terminal session screenshot below, Node Manager picks up the fact that the ODI12c Standalone Agent process it started is no longer running (by checking running processes against lock files the system component created on startup), then automatically restarts it so that it’s running again at the end.

NewImage

As with ODI11g’s Standalone Agents and OPMN, load plans that are running when an ODI12c Standalone Agent fails then fail themselves, as shown in the screenshot below:

NewImage

You can in fact restart failed load plans with either active agent – in this case I’m restarting it with the secondary Standalone Agent, simulating a complete failure of the first node rather than a process failure that Node Manager could automatically deal with.

NewImage

Where things do get complicated with this active/active setup is when the agent you’ve assigned the schedule to fails; in this case, unlike ODI11g where there was just a single logical agent, the second agent is completely independent of the first and will need to have the schedule assigned to it if the primary agent fails. The primary agent, once its server node is running again, will then need to have the schedule taken away from it, otherwise you’ll have two agents acting as the scheduler, each running its own copy of the daily data load. One way around this would be to have the scheduling done by a third-party scheduler, which would either access the two active/active agents through a load-balancer and round-robin approach, or use one agent where possible, failing over to the other one.

Conclusions – And why JEE Agents, WebLogic and Coherence Aren’t Such Overkill, After All…

Once you’ve been through all of this and looked at the limitations of a Standalone Agent-only setup, you can sort of see why Oracle went down the WebLogic and Coherence route for their fully HA-capable agent topology; Node Manager can work across multiple servers when you run it within a WebLogic Server domain, and using Coherence to store a local (middleware-level) copy of the schedule means that all of the agents can keep track of what’s been executed and where they are in the schedule. Coupled with a load balancer and a Virtual IP that provides a consistent address for all of the JEE agents in the cluster, it’s a bit more engineering but you can see why it was needed. In the meantime, the customer went with 12c in the end and put in place manual processes to deal with Standalone Agent failure in this active/active setup, which should be enough to tide them over until the next budget round, when they can put the additional hardware and the JEE Agent setup into place.

Essbase Studio checkModelInSyncExternal Error

This week I am back at one of my favourite clients to assist with some issues since the latest upgrade to 11.1.1.9. These guys are seriously technical and know their stuff, so when issues pop up I’m always ready for an interesting and challenging day in the office.

As with all the recent 11g versions, they had to decide between in-place and out-of-place upgrades, and here they opted for an in-place upgrade because they have a fairly settled configuration using Answers, Publisher and Essbase. They were trying to avoid the need to reconfigure components like:

  • SQL Group Providers
  • Customisations
  • Application Roles used in OBIEE and Essbase Filters

Plus, when reconfiguring the above you also run the risk of missing something and it could take a long time to track down where it went wrong.

The Problem

In the end this was not a very complicated problem or solution but seeing as we couldn’t find anything on the web or Oracle Support regarding this error, I thought it might be useful to share in case others run into this same issue.

After performing the upgrade from OBIEE 11.1.1.7.140114 to 11.1.1.9 the client was unable to view or edit the Essbase Model properties in Essbase Studio. In fact, they couldn’t even see their Essbase Model at all. Only the Cube Schema was visible.

cube_schema_only1

When we tried to select or edit the Schema the following message appeared:

Error in the external procedure 'checkModelInSyncExternal'. Line = 136.

checkinsyncerror1

Oracle Support Came To The Party

After trying several different options to fix it, none of which made any difference, the client raised a P1 query with Oracle Support. After going through the initial standard questions and a few messages between the client and Oracle Support, they came back with a solution. All of this within 24 hours…

The Reason

After applying the patch the catalog version is not synchronised with the new version of Essbase Studio Server.

The Solution

Even though we couldn’t find any reference to this in the post-patching section of the documentation, the Oracle Essbase Studio 11.1.2.4.000 Readme does have a section describing the problem and solution.

To fix the problem we followed these simple steps

  1. Navigate to
    <ORACLE_BI_HOME>\products\Essbase\EssbaseStudio\Server\scripts.template
  2. Copy startCommandLineClient.bat.template to
    <ORACLE_BI_HOME>\products\Essbase\EssbaseStudio\Server
  3. Rename the new startCommandLineClient.bat.template to startCommandLineClient.bat
  4. Edit startCommandLineClient.bat so it looks like this
    NOTE – Update the paths according to your environment AND use full paths
    @echo off
    
    set JAVA_HOME=C:\oracle\Oracle_BI1\jdk\jre
    "%JAVA_HOME%\bin\java" -Xmx128m %JAVA_OPTIONS% -jar "C:\oracle\Oracle_BI1\products\Essbase\EssbaseStudio\Server\client.jar"
    
    if %errorlevel% == 0 goto finish
    
    echo .
    echo Press any key to finish the client session
    pause > null
    
    :finish
  5. Open a CMD window and start startCommandLineClient.bat
  6. Enter the requested information (Host, Username and Password)
  7. Enter the following command and wait for it to complete
    reinit
  8. Type Exit to leave the script and close the CMD window

You should now be able to open the Model Properties in Essbase Studio without any issues.

cube_schema_and_essb_props1

How Engaged Are Your OBIEE Users?

Following on from Jon’s blog post “User Engagement: Why does it matter?”, I would like to take this one step further by talking about measurement. At Rittman Mead we believe that if you can’t measure it, you can’t improve it. So how do you measure user engagement?

Metrics

User engagement for OBIEE is like most web based products or services:

  • both have users who access the product or service and then take actions.
  • users of both use it repeatedly if they get value from those actions.

A lot of thought has gone into measuring the customer experience and engagement for web based products and services. Borrowing some of these concepts will help us understand how to measure user engagement for BI solutions.

We look at three metrics:

  • Frequency of use
  • Recency of use
  • Reach of the system

Usage Tracking Data

OBIEE offers visibility of what its users are doing through its Usage Tracking feature, and we can use this data to drive our metrics.

Figure 1

UT UE LDM

As we can see from Figure 1, the usage tracking data can support our three metrics.

Frequency of use

  • Number of times a user or group of users visit in a specific period (Day / Month / Year)
  • Number of times a dashboard / report is accessed in a specific period.
  • How are these measures changing over time?

Recency of use

  • How recently was a report / dashboard used by relevant user groups?
  • What is the average number of days between uses of each report / dashboard by each relevant user group?
  • Number of dashboards / reports used or not used in a specific period (Day / Month / Year)
  • Number of users that have used or not used OBIEE in a specific period (Day / Month / Year)
  • How are these changing over time?

Reach of the system

  • Overall number of users that have used or not used OBIEE. This can be further broken down by user groups.
  • How is it changing over time?

User engagement KPI perspective

We have compared BI solutions to web-based products and services earlier in this post. Let’s look at some popular KPIs that many web-based products use to measure engagement and how they can be used to measure OBIEE engagement.

  • Stickiness: Generally defined as the amount of time spent at a site over a given period.
  • Daily Active Users (DAU): Number of unique users active in a day
  • Monthly Active Users (MAU): Number of unique users active in a month.

DAU and MAU are also used as a ratio (DAU / MAU) to give an approximation of utility.
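As a starting point, both DAU and MAU can be derived straight from the usage tracking data with some simple SQL – a sketch, assuming the standard S_NQ_ACCT usage tracking table and an illustrative schema name:

-- Daily Active Users: distinct users per day
select trunc(start_ts) as activity_date,
       count(distinct user_name) as dau
from   biee_biplatform.s_nq_acct
group by trunc(start_ts)
order by activity_date;

-- Monthly Active Users: distinct users per calendar month
select trunc(start_ts, 'MM') as activity_month,
       count(distinct user_name) as mau
from   biee_biplatform.s_nq_acct
group by trunc(start_ts, 'MM')
order by activity_month;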

The R&D division of Rittman Mead has developed the Rittman Mead User Engagement Toolkit, a set of tools and reports to capture and visualise user engagement metrics. The example charts given below have been developed using the R programming language.

Figure 2 – DAU over time with a trailing 30-day average (Red line)

DAU : MAU trailing 30 day average V0.3

Figure 3 – Forecast DAU/MAU for 30 days after the data was generated

Forecast DAU:MAU

What Can You Do With These Insights?

Recall that Jon’s blog post points out the following drivers of user engagement:

  • User interface and user experience
  • Quality, relevance, and confidence in data
  • Performance
  • Ability to use the system
  • Accessibility – is the system available in the right way, at the right time?

There are several actions you can take to influence the drivers as a result of monitoring the aforementioned metrics.

  • Identify users or groups that are not using the system as much as they used to. Understand their concerns and address the user engagement drivers that are causing this.
  • Verify usage of any significant enhancement to the BI solution over time.
  • Analyse one of the key drivers, performance, from usage data.
  • Determine peak usage to project future hardware needs.

Conclusion

User engagement is how users get value from their OBIEE systems, so measuring it on an ongoing basis is important – and it can be monitored using some standard metrics and KPIs.

Future blog posts in this series will address some of the key drivers behind user engagement in addition to providing an overview of the Rittman Mead User Engagement Toolkit.

If you are interested in hearing more about User Engagement please sign up to our mailing list below.