Tag Archives: Oracle GoldenGate

Using Oracle GoldenGate for Trickle-Feeding RDBMS Transactions into Hive and HDFS

A few months ago I wrote a post on the blog about using Apache Flume to trickle-feed log data into HDFS and Hive, using the Rittman Mead website as the source for the log entries. Flume is a good technology to use for this type of capture requirement as it easily captures log entries, HTTP calls, JMS queue entries and other “event” sources, has a resilient architecture and integrates well with HDFS and Hive. But what if the source you want to capture activity from is a relational database, for example Oracle Database 12c? With Flume you’d need to spool the database transactions to file, whereas what you really want is a way to connect directly to the database engine and capture the changes at source.

Which is exactly what Oracle GoldenGate does, and what most people don’t realise is that GoldenGate can also load data into HDFS and Hive, as well as the usual database targets. Hive and HDFS aren’t fully-supported targets yet, but you can use the Oracle GoldenGate for Java adapter to act as the handler process and then land the data in HDFS files or Hive tables on your target Hadoop platform. My Oracle Support has two tech notes, “Integrating OGG Adapter with Hive (Doc ID 1586188.1)” and “Integrating OGG Adapter with HDFS (Doc ID 1586210.1)”, that give example implementations of the Java adapters you’d need for these two target types, with the overall end-to-end process for landing Hive data looking like the diagram below (the HDFS version just swaps out Hive for HDFS at the handler adapter stage).

[Diagram: end-to-end GoldenGate-to-Hive data flow]

This is also a good example of the sort of technology we’d use to implement the “data factory” concept within the new Oracle Information Management Reference Architecture: the part of the architecture that moves data between the Hadoop and NoSQL-based Data Reservoir and the relationally-stored enterprise information store. In this case, we’re trickle-feeding transactional data from the Oracle database into Hadoop, perhaps to archive it at lower cost than we could in an Oracle database, or to add transaction activity data to a Hadoop-based application.

[Diagram: Oracle Information Management Reference Architecture]

So I asked my colleague Nelio Guimaraes to set up a GoldenGate capture process on our Cloudera CDH5.1 Hadoop cluster, using GoldenGate 12.1.2.0.0 for our source Oracle 11gR2 database and Oracle GoldenGate for Java, downloadable separately from edelivery.oracle.com under Oracle Fusion Middleware > Oracle GoldenGate Application Adapters 11.2.1.0.0 for JMS and Flat File Media Pack. In our example, we’re going to capture activity on the SCOTT.DEPT table in the Oracle database, and then perform the following steps to set up replication from it into a replica Hive table:

  1. Create a table in Hive that corresponds to the table in the Oracle database.
  2. Create a table in the Oracle database and prepare the table for replication.
  3. Configure the Oracle GoldenGate Capture to extract transactions from the Oracle database and create the trail file.
  4. Configure the Oracle GoldenGate Pump to read the trail and invoke the custom adapter.
  5. Configure the property file for the Hive handler.
  6. Code, compile and package the custom Hive handler.
  7. Execute a test.

Setting up the Oracle Database Source Capture

Let’s go into the Oracle database first, check the table definition, and then connect to Hadoop to create a Hive table of the same column definition.

[oracle@centraldb11gr2 ~]$ sqlplus scott/tiger
SQL*Plus: Release 11.2.0.3.0 Production on Thu Sep 11 01:08:49 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
SQL> describe DEPT
 Name Null? Type
 ----------------------------------------- -------- ----------------------------
 DEPTNO NOT NULL NUMBER(2)
 DNAME VARCHAR2(14)
 LOC VARCHAR2(13)
SQL> exit
...
[oracle@centraldb11gr2 ~]$ ssh oracle@cdh51-node1
Last login: Sun Sep 7 16:11:36 2014 from officeimac.rittmandev.com
[oracle@cdh51-node1 ~]$ hive
...
create external table dept
(
 DEPTNO string, 
 DNAME string, 
 LOC string
) row format delimited fields terminated by '\;' stored as textfile
location '/user/hive/warehouse/department'; 
exit
...

Then I install Oracle GoldenGate 12.1.2 on the source Oracle database, just as you’d do for any GoldenGate install, and make sure supplemental logging is enabled for the table I’m looking to capture. Then I go into ggsci, the GoldenGate command-line utility, to register the user it’ll be connecting as and the table it needs to capture activity for.

[oracle@centraldb11gr2 12.1.2]$ cd /u01/app/oracle/product/ggs/12.1.2/
[oracle@centraldb11gr2 12.1.2]$ ./ggsci
$ggsci> DBLOGIN USERID sys@ctrl11g, PASSWORD password sysdba
$ggsci> ADD TRANDATA SCOTT.DEPT COLS(DEPTNO), NOKEY
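
The ADD TRANDATA command above handles table-level supplemental logging, but the database itself also needs minimal supplemental logging (and ideally force logging) enabled beforehand. As an illustration rather than a transcript from our environment, the database-level commands look like this:

-- Run once on the source database as a suitably privileged user (illustrative)
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE FORCE LOGGING;
-- Verify the settings
SELECT supplemental_log_data_min, force_logging FROM v$database;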

GoldenGate uses a number of components to replicate data from source to targets, as shown in the diagram below.

[Diagram: GoldenGate replication components]

For our purposes, though, there are just three that we need to configure: the Extract component, which captures table activity on the source; the Pump process, which moves data (the “trail”) from the source database to the Hadoop cluster; and the Replicat component, which takes that activity and applies it to the target tables. In our example, the extract and pump processes will be as normal, but we need to create a custom “handler” for the target Hive table that uses the GoldenGate Java API and the Hadoop FS Java API.

The tool we use to set up the extract and pump processes is ggsci, the command-line GoldenGate Software Command Interface. I’ll use it first to set up the Manager process that runs on both source and target servers, giving it a port number and connection details into the source Oracle database.

$ggsci> edit params mgr
PORT 7809
USERID sys@ctrl11g, PASSWORD password sysdba
PURGEOLDEXTRACTS /u01/app/oracle/product/ggs/12.1.2/dirdat/*, USECHECKPOINTS

Then I create two configuration files, one for the extract process and one for the pump process, and then use those to start those two processes.

$ggsci> edit params ehive
EXTRACT ehive
USERID sys@ctrl11g, PASSWORD password sysdba
EXTTRAIL /u01/app/oracle/product/ggs/12.1.2/dirdat/et, FORMAT RELEASE 11.2
TABLE SCOTT.DEPT;
$ggsci> edit params phive
EXTRACT phive
RMTHOST cdh51-node1.rittmandev.com, MGRPORT 7809
RMTTRAIL /u01/app/oracle/product/ggs/11.2.1/dirdat/rt, FORMAT RELEASE 11.2
PASSTHRU
TABLE SCOTT.DEPT;
$ggsci> ADD EXTRACT ehive, TRANLOG, BEGIN NOW
$ggsci> ADD EXTTRAIL /u01/app/oracle/product/ggs/12.1.2/dirdat/et, EXTRACT ehive
$ggsci> ADD EXTRACT phive, EXTTRAILSOURCE /u01/app/oracle/product/ggs/12.1.2/dirdat/et
$ggsci> ADD RMTTRAIL /u01/app/oracle/product/ggs/11.2.1/dirdat/rt, EXTRACT phive

As the Java event handler on the target Hadoop platform won’t ordinarily be able to get table metadata for the source Oracle database, we’ll use the defgen utility on the source platform to create the source definitions file that the apply process on the target will need.

$ggsci> edit params dept
defsfile ./dirsql/DEPT.sql
USERID ggsrc@ctrl11g, PASSWORD ggsrc
TABLE SCOTT.DEPT;
./defgen paramfile ./dirprm/dept.prm NOEXTATTR

Note that NOEXTATTR means no extra attributes; because the GoldenGate build on the target is a generic, minimal version, it can’t interpret a definitions file that contains the extra attributes. Then, this DEPT.sql file will need to be copied across to the target Hadoop platform where you’ve installed Oracle GoldenGate for Java, into the /dirsql folder within the GoldenGate install.

[oracle@centraldb11gr2 12.1.2]$ ssh oracle@cdh51-node1
oracle@cdh51-node1's password: 
Last login: Wed Sep 10 17:05:49 2014 from centraldb11gr2.rittmandev.com
[oracle@cdh51-node1 ~]$ cd /u01/app/oracle/product/ggs/11.2.1/
[oracle@cdh51-node1 11.2.1]$ pwd
/u01/app/oracle/product/ggs/11.2.1
[oracle@cdh51-node1 11.2.1]$ ls dirsql/
DEPT.sql

Then, going back to the source Oracle database platform, we’ll start the GoldenGate Manager process, and then the extract and pump processes.

[oracle@cdh51-node1 11.2.1]$ ssh oracle@centraldb11gr2
oracle@centraldb11gr2's password: 
Last login: Thu Sep 11 01:08:18 2014 from bdanode1.rittmandev.com
GGSCI (centraldb11gr2.rittmandev.com) 7> start mgr
Manager started.
 
GGSCI (centraldb11gr2.rittmandev.com) 8> start ehive
 
Sending START request to MANAGER ...
EXTRACT EHIVE starting
 
GGSCI (centraldb11gr2.rittmandev.com) 9> start phive
 
Sending START request to MANAGER ...
EXTRACT PHIVE starting

Setting up the Hadoop / Hive Replicat Process

Setting up the Hadoop side involves a couple of steps similar to the source capture side; first we configure the parameters for the Manager process, then we configure the extract process that will pull table activity off the trail file sent over by the pump process on the source Oracle database.

[oracle@centraldb11gr2 12.1.2]$ ssh oracle@cdh51-node1
oracle@cdh51-node1's password: 
Last login: Wed Sep 10 21:09:38 2014 from centraldb11gr2.rittmandev.com
[oracle@cdh51-node1 ~]$ cd /u01/app/oracle/product/ggs/11.2.1/
[oracle@cdh51-node1 11.2.1]$ ./ggsci
$ggsci> edit params mgr
PORT 7809
PURGEOLDEXTRACTS /u01/app/oracle/product/ggs/11.2.1/dirdat/*, usecheckpoints, minkeepdays 3
$ggsci> add extract tphive, exttrailsource /u01/app/oracle/product/ggs/11.2.1/dirdat/rt
$ggsci> edit params tphive
EXTRACT tphive
SOURCEDEFS ./dirsql/DEPT.sql
CUserExit ./libggjava_ue.so CUSEREXIT PassThru IncludeUpdateBefores
GETUPDATEBEFORES
TABLE SCOTT.DEPT;

Now it’s time to create the Java handler that will write the trail data to the HDFS files and Hive table. The My Oracle Support Doc ID 1586188.1 I mentioned at the start of the article has a sample Java program called SampleHandlerHive.java that writes incoming transactions into an HDFS file within the Hive directory, and also writes them to a file on the local filesystem. To get this working on our Hadoop system, we created a new Java source code file from the content in SampleHandlerHive.java, updated the hadoopConf.addResource paths to point to the correct locations of core-site.xml, hdfs-site.xml and mapred-site.xml, and then compiled it as follows:

export CLASSPATH=/u01/app/oracle/product/ggs/11.2.1/ggjava/ggjava.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/client/*
javac -d . SampleHandlerHive.java

Successfully executing the above command created SampleHandlerHive.class under /u01/app/oracle/product/ggs/11.2.1/dirprm/com/mycompany/bigdata. To create the JAR file that the GoldenGate for Java adapter will need, I then change directory to the /dirprm directory under the GoldenGate install and run the following commands:

jar cvf myhivehandler.jar com
chmod 755 myhivehandler.jar
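
Before moving on to the properties file, it’s worth looking at the general shape of the handler class itself. What follows is a heavily trimmed, illustrative sketch rather than the actual MOS sample: the adapter class and method names (AbstractHandler, DsOperation, Status and so on) are those used by the 11.2 Application Adapters, the gg.handler.* properties are injected via ordinary JavaBean setters, and the record formatting is simplified, so treat it as orientation only and check the adapter Javadoc for your version.

package com.mycompany.bigdata;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.goldengate.atg.datasource.AbstractHandler;
import com.goldengate.atg.datasource.DsEvent;
import com.goldengate.atg.datasource.DsOperation;
import com.goldengate.atg.datasource.DsTransaction;
import com.goldengate.atg.datasource.GGDataSource.Status;

// Illustrative sketch only - not the MOS SampleHandlerHive.java
public class SampleHandlerHive extends AbstractHandler {

    private String hdfsFileName;            // injected from gg.handler.hivehandler.HDFSFileName
    private String recordDelimiter = ";";   // injected from gg.handler.hivehandler.RecordDelimiter
    private final StringBuilder buffer = new StringBuilder();

    public void setHDFSFileName(String hdfsFileName) { this.hdfsFileName = hdfsFileName; }
    public void setRecordDelimiter(String recordDelimiter) { this.recordDelimiter = recordDelimiter; }

    @Override
    public Status operationAdded(DsEvent e, DsTransaction tx, DsOperation op) {
        // The real sample walks the operation's columns via the adapter metadata classes;
        // here we just flatten the operation into a single delimited line
        buffer.append(op.toString().replace('\n', ' ')).append(recordDelimiter).append('\n');
        return Status.OK;
    }

    @Override
    public Status transactionCommit(DsEvent e, DsTransaction tx) {
        try {
            // Point the Hadoop client at the cluster configuration files
            Configuration hadoopConf = new Configuration();
            hadoopConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            hadoopConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            FileSystem fs = FileSystem.get(hadoopConf);
            Path target = new Path(hdfsFileName);
            // Append to the Hive warehouse file if it exists, otherwise create it
            try (FSDataOutputStream out = fs.exists(target) ? fs.append(target) : fs.create(target)) {
                out.write(buffer.toString().getBytes("UTF-8"));
            }
            buffer.setLength(0);
            return Status.OK;
        } catch (Exception ex) {
            return Status.ABEND;  // tell GoldenGate to abend the process on failure
        }
    }
}

Whatever the handler ends up looking like, its package and class name have to line up with the gg.handler.hivehandler.type entry in the properties file shown below.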

I also need to create a properties file for this JAR to use, in the same /dirprm directory. Amongst other things, this properties file tells the GoldenGate for Java adapter where in HDFS to write the data (the location where the Hive table keeps its data files), and it also references any other JAR files from the Hadoop distribution that the handler will need access to.

[oracle@cdh51-node1 dirprm]$ cat tphive.properties 
#Adapter Logging parameters. 
gg.log=log4j
gg.log.level=info
 
#Adapter Check pointing  parameters
goldengate.userexit.chkptprefix=HIVECHKP_
goldengate.userexit.nochkpt=true
 
# Java User Exit Property
goldengate.userexit.writers=jvm
jvm.bootoptions=-Xms64m -Xmx512M -Djava.class.path=/u01/app/oracle/product/ggs/11.2.1/ggjava/ggjava.jar:/u01/app/oracle/product/ggs/11.2.1/dirprm:/u01/app/oracle/product/ggs/11.2.1/dirprm/myhivehandler.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/client/hadoop-common-2.3.0-cdh5.1.0.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/lib/commons-configuration-1.6.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/lib/commons-logging-1.1.3.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/lib/commons-lang-2.6.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/etc/hadoop:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/etc/hadoop/conf.dist:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/lib/guava-11.0.2.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/hadoop-auth-2.3.0-cdh5.1.0.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/client/hadoop-hdfs-2.3.0-cdh5.1.0.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/client/commons-cli-1.2.jar:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/client/protobuf-java-2.5.0.jar
 
#Properties for reporting statistics
# Minimum number of {records, seconds} before generating a report
jvm.stats.time=3600
jvm.stats.numrecs=5000
jvm.stats.display=TRUE
jvm.stats.full=TRUE
 
#Hive Handler.  
gg.handlerlist=hivehandler
gg.handler.hivehandler.type=com.mycompany.bigdata.SampleHandlerHive
gg.handler.hivehandler.HDFSFileName=/user/hive/warehouse/department/dep_data
gg.handler.hivehandler.RegularFileName=cinfo_hive.txt
gg.handler.hivehandler.RecordDelimiter=;
gg.handler.hivehandler.mode=tx

Now, the final step on the Hadoop side is to start its GoldenGate Manager process, and then start the apply process (the tphive group we configured above).

GGSCI (cdh51-node1.rittmandev.com) 5> start mgr
 
Manager started. 
 
GGSCI (cdh51-node1.rittmandev.com) 6> start tphive
 
Sending START request to MANAGER ...
EXTRACT TPHIVE starting

Testing it All Out

So now that I’ve got the extract and pump processes running on the Oracle Database side, and the apply process running on the Hadoop side, let’s do a quick test and see if it’s working. I’ll start by looking at what data is in each table at the beginning.

SQL> select * from dept;     

    DEPTNO DNAME  LOC
 ---------- -------------- -------------

10 ACCOUNTING  NEW YORK
20 RESEARCH  DALLAS
30 SALES  CHICAGO
40 OPERATIONS  BOSTON
50 TESTE  PORTO
60 NELIO  STS
70 RAQUEL  AVES
 
7 rows selected.

Over on the Hadoop side, there’s just one row in the Hive table:

hive> select * from dept;

OK 80MARCIA   ST

Now I’ll go back to Oracle and insert a new row in the DEPT table:

SQL> insert into dept (deptno, dname, loc)
  2  values (75, 'EXEC','BRIGHTON'); 

1 row created. 
SQL> commit; 

Commit complete.

And, going back over to Hadoop, I can see GoldenGate has added that record to the Hive table, with the GoldenGate for Java adapter writing the transaction to the underlying HDFS file.

hive> select * from dept;

OK 80MARCIA   ST
75 EXEC       BRIGHTON

So there you have it: GoldenGate replicating Oracle RDBMS transactions into HDFS and Hive, to complement Apache Flume’s ability to replicate log and event data into Hadoop. Moreover, as Michael Rainey explained in his three-part blog series, GoldenGate is closely integrated into the new 12c release of Oracle Data Integrator, making it even easier to manage GoldenGate replication processes as part of your overall data loading project, and giving Hadoop developers and GoldenGate users access to the full set of load orchestration and data quality features in that product rather than having to rely on home-grown scripting, or Oozie.

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 4: Start Journalizing!

In this post, the finale of the four-part series “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c”, I’ll walk through the setup of the ODI Models and start journalizing in “online” mode. This will utilize our customized JKM to build the GoldenGate parameter files based on the ODI Metadata and deploy them to both the source and target GoldenGate installation locations. Before I get into all of the details, let’s recap the first 3 posts and see how we arrived at this point.

Part one of the series, Getting Started, led us through a quick review of the Oracle Reference Architecture for Information Management and the tasks we’re trying to accomplish: loading both the Raw Data Reservoir (RDR) and Foundation schemas simultaneously using Oracle GoldenGate 12c replication, and setting it all up via Oracle Data Integrator 12c. We also reviewed the setup of the GoldenGate JAgent process, necessary for communication between ODI and GoldenGate when using the “online” version of the Journalizing Knowledge Module.

In part two, we reviewed the Journalizing Knowledge Module, “JKM Oracle to Oracle Consistent (OGG Online)”, its new features, and how much it has been improved in ODI 12c. Full integration with Oracle GoldenGate, through the use of the new ODI Tool “OdiOggCommand”, allows for the setup and configuration of GoldenGate process groups, trail file directories, and table-level supplemental logging, all from the ODI JKM.

Most recently, part 3, titled Setup Journalizing, walked us through the customizations of the “JKM Oracle to Oracle Consistent (OGG Online)” that will allow us to create the source-to-foundation replication alongside the standard source-to-RDR setup. We added a set of options: the first to control whether or not we replicate to foundation, and the second to capture the ODI Logical Schema corresponding to the foundation schema. Then we added the task that will create the source-to-foundation table mapping inside the GoldenGate replicat parameter file, and set the options appropriately. I’ve been using the ODI 12c Getting Started VM, with the 12.1.2 version of ODI, for my demo setup. If you haven’t done so already, you can download the latest version of the VM, with ODI 12.1.3, from the Oracle Technology Network. That’s enough recap; now on to the final steps for GoldenGate and ODI 12c integration, and let’s start journalizing!

Setup ODI Models

Create Models

We first need to create the ODI Models and Datastores for the Source, Staging (Raw Data Reservoir) and Foundation tables. I will typically reverse engineer the source tables into a Model first, then copy them to the Staging and Foundation Models. This approach will ensure the column names and data types remain consistent with the source. I then execute a Groovy script to create the additional data warehouse audit columns in each of the Foundation Datastores.

models

Configure JKM

Unlike the 11g version of ODI, in 12c the “JKM Oracle to Oracle Consistent (OGG Online)” Knowledge Module will be set on the source Model. Open up the Model, in this example, PM_SRC, and switch to the Journalizing tab.

model-journalizing

We’ll set the Journalizing Mode to “Consistent Set” and then choose the customized JKM that we have been working with in this example, “JKM Oracle to Oracle Consistent (OGG Online) RM”, from the dropdown list. Now we are presented with the GoldenGate Process Selection parameters and a list of KM Options to configure.

Set the Process Selection parameters for the Capture Process and Delivery Process by selecting the Logical Schemas created in Part 3 - Setup GoldenGate Topology – Schemas. This setting drives the naming of the Extract, Pump, and Replicat parameter files and process groups in GoldenGate. If you plan to use GoldenGate for the initial load, select the processes here as well. I’m not setting mine, as I typically use a batch load tool, such as Oracle Datapump or insert across DBLink to perform the initial load of the target. As you can see in the image below, you can also create the Oracle GoldenGate Logical Schemas from the Model.

model-jkm-process

Next is the set of Options, including the two new Options added in the previous post. We can leave several of the values at their defaults, as they are specific to particular character sets or to implementation on a multi-node Oracle RAC setup.

model-jkm-options

The Options we do want to set:

ONLINE - Set to “true” to enable the automatic GoldenGate configuration when Start Journal is run.
LOCAL_TEMP_DIR – Enter a directory local to the machine on which the Start Journal process will be executed. Be sure the user executing the Start Journal process has privileges to create/modify/remove directories and files.
APPLY_FOUNDATION – Custom Option, set to “true” to enable the addition of the source-to-foundation mapping to the GoldenGate Replicat parameter file.
FND_LSCHEMA – The Logical Schema for the Foundation layer, necessary when APPLY_FOUNDATION is true.

After the Options are set, the Model can be saved and closed. Back in the Designer Navigator, add the Datastores to CDC by either selecting each individual Datastore and adding it or by right-clicking the Model and choosing to add the entire set all at once.

model-add-to-cdc

Before we get on with using the JKM, there is one thing I forgot to mention in the previous post. When setting up the Logical Schema for the GoldenGate “Delivery” process, you must also set the target Logical Schema. If you fail to do so, you’ll get an error stating “SnpLSchema does not exist” when attempting any Change Data Capture commands on the source Model.

logical-schema-set-target

With the JKM set and the tables added to Change Data Capture, we can now add a Subscriber. Subscribers allow multiple mappings to consume the change data from the J$ tables at different intervals. For example, a table may be consumed by one mapping every hour, and then by an additional mapping each night. Two different Subscribers would be used in this case. In our example, I’ll create a single Subscriber named “PERFECT_MATCH”. Make sure the process runs successfully, then it’s time to start Journalizing.

Start Capturing Changes

With the setup and configuration out of the way, the rest is up to the JKM. We’re now able to right-click the source Model and select Change Data Capture–>Start Journal. This executes all of the Start Journal related steps in the JKM, which will create the CDC Framework (J$ tables, JV$ views, etc.), generate and deploy the GoldenGate parameter files (Extract, Pump, and Replicat), and configure and start the GoldenGate process groups. Be sure that the source and target GoldenGate Manager and JAgent processes are running prior to executing the Start Journal process. Also, make sure that the database is in ArchiveLog mode and ready for GoldenGate to capture transactions.
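
As an illustrative sanity check (not from the original post), the ArchiveLog prerequisite can be confirmed from SQL*Plus and the Manager processes from GGSCI before running Start Journal:

SQL> SELECT log_mode FROM v$database;   -- should return ARCHIVELOG
GGSCI> INFO MGR                         -- Manager should be RUNNING on both source and target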

After the Start Journal process is successfully completed, you can browse to the source GoldenGate home directory, run GGSCI, and view the status of the OGG process groups.

ogg-src-info-all

It looks like the Extract and Pump are in place and running. Now let’s check out the Replicat on the target GoldenGate installation.

ogg-trg-info-all

Here we see that the Replicat process is in place, but not actually running. Remember from our JKM editing that we commented out the step that will start the Replicat process. This is to ensure we perform an initial load prior to applying any captured change data to the target.

The last bit of work that must be completed is the initial load. I won’t go into details here, but the way I like to do it is to load the source data into the target as of a captured SCN, using Oracle Data Pump or a DBLink, rather than using GoldenGate process groups to perform the load. Note: the ODI 12c Getting Started Guide doesn’t even use the OGG initial load! Then, we can start the replicat in GoldenGate after the captured SCN, using the same approach I wrote about in a previous blog on ODI and GoldenGate 11g.
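
To make that concrete, here’s a rough sketch of the DBLink variant. The SCN, database link and schema names are made up for the illustration (they aren’t from this environment), but the flashback query and the AFTERCSN option are the relevant pieces:

-- On the source: note the current SCN before the initial load
SELECT current_scn FROM v$database;

-- On the target: load each table across a database link, as of that SCN
INSERT INTO EDW_RDR.SRC_CITY
  SELECT * FROM SRC_CITY@SRC_DBLINK AS OF SCN 4734816;
COMMIT;

-- Then, in GGSCI on the target, start the replicat from just after that SCN
START REPLICAT RPMEDWP, AFTERCSN 4734816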

JKM “Online” Mode

Beyond adding the parameter files and process groups, what exactly did the “online” version of the JKM do for us? If you browse to the directory that we set in the JKM Options under LOCAL_TEMP_DIR, you’ll find all of the GoldenGate files generated by the JKM. These files were generated locally, then uploaded to their proper GoldenGate home directory. Without “online” mode, they would have had to be copied to GoldenGate manually.

jkm-steps

Once uploaded, the obey files (batch files for GoldenGate commands) were executed.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTEOBEY" "-OBEY_FILE=/home/oracle/Oracle/Middleware/oggsrc/EXTPMSRC.oby"

And finally, JKM steps were generated to perform the creation of the process groups in GoldenGate.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
add extract EXTPMSRC, tranlog, begin now 
add exttrail /home/oracle/Oracle/Middleware/oggsrc/dirdat/oc, extract EXTPMSRC, megabytes 100
stop extract EXTPMSRC
start extract

If you ever need to stop the change data capture process, either to add additional tables or make modifications to the metadata, you can run the Drop Journal process. Not only will the CDC Framework of tables and views be removed, but the “online” mode also reaches into GoldenGate and drops the process groups that were generated by the JKM.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
stop extract RPMEDWP
delete extract RPMEDWP

In conclusion, the integration between GoldenGate and Oracle Data Integrator  in 12c has been vastly improved over the 11g version. The ability to manage the entire setup process from within ODI is a big step forward, and I can only see these two products being further integrated in future releases. If you have any questions or comments about ODI or GoldenGate, or would like some help with your own implementation, feel free to add a comment below or reach out to me at michael.rainey@rittmanmead.com.

 

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 3: Setup Journalizing

After a short vacation, some exciting news, and a busy few weeks (including KScope14 in Seattle, WA), it’s time to get the “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c” blog series rolling again. Hopefully readers can find some time between World Cup matches to try integrating ODI and GoldenGate on their own!

To recap my previous two posts on this subject, I first started by showing the latest Information Management Reference Architecture at a high-level (described in further detail by Mark Rittman) and worked through the JAgent configuration, necessary for communication between ODI and GoldenGate. In the second post, I walked through the changes made to the GoldenGate JKM in ODI 12c and laid out the necessary edits for loading the Foundation layer at a high-level. Now, it’s time to make the edits to the JKM and set up the ODI metadata.

Before I jump into the JKM customization, let’s go through a brief review of the foundation layer and its purpose. The foundation schema contains tables that are essentially duplicates of the source table structure, but with the addition of the foundation audit columns, described below, that allow for the storage of all transactional history in the tables.

FND_SCN (System Change Number)
FND_COMMIT_DATE (when the change was committed)
FND_DML_TYPE (DML type for the transaction: insert, update, delete)
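
Because every change lands as an insert with these audit columns attached, the current image of a table (or its state as of any SCN) can be reconstructed from the foundation layer at any time. Purely as an illustration, with hypothetical schema and table names, a query to rebuild the current rows might look like this:

-- Illustrative only: rebuild the current image from an insert-only foundation table
SELECT *
FROM  (SELECT f.*,
              ROW_NUMBER() OVER (PARTITION BY city_id ORDER BY fnd_scn DESC) rn
       FROM   edw_fnd.src_city f)
WHERE rn = 1
AND   fnd_dml_type <> 'DELETE';   -- FND_DML_TYPE values come from the GoldenGate OPTYPE header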

The GoldenGate replicat parameter file must be set up to map the source transactions into the foundation tables using the INSERTALLRECORDS option. This is the same option that the replicat uses to load the J$ tables, allowing only inserts and no updates or deletes. A few changes to the JKM will allow us to choose whether or not we want to load the Foundation schema tables via GoldenGate.

Edit the Journalizing Knowledge Module

To start, make a copy of the “JKM Oracle to Oracle Consistent (OGG Online)” so we don’t modify the original. Now we’re ready to make our changes.

Add New Options

A couple of new Options will need to be added to enable the additional feature of loading the foundation schema, while still maintaining the original JKM code. Option values are set during the configuration of the JKM on the Model, but can also have a default in the JKM.

APPLY_FOUNDATION

new-option-apply-fnd

This option, when set to true, will enable the new task we add below during the Start Journal process, allowing it to generate the source-to-foundation mapping statement in the Replicat (apply) parameter file.

FND_LSCHEMA

new-option-fnd-schema

This option will be set to the Logical Schema name for the Foundation layer, and will be used to look up the physical database schema name that is written into the GoldenGate replicat parameter file.

Add a New Task

With the options created, we can now add the additional task to the JKM that will create the source-to-foundation table mappings in the GoldenGate replicat parameter file. The quickest way to add the task is to duplicate a current task. Open the JKM to the Tasks tab and scroll down to the “Create apply prm (3)” step. Right-click the task and select Duplicate. A copy of the task will be created in just the position we want, immediately after the step we duplicated.

Rename the step to “Create apply prm (4) RM”, adding the additional RM tag so it’s easily identifiable as a custom step. From the properties, open the Edit Expression dialog for the Target Command. The map statement, just below the OdiOutFile line, will need to be modified. First, remove the IF statement code, as the execution of this step will be driven by the APPLY_FOUNDATION option being set to true.

Here’s a look at the final code after editing.

map <%= odiRef.getObjectName("L", odiRef.getJrnInfo("TABLE_NAME"), odiRef.getOggModelInfo("SRC_LSCHEMA"), "D") %>, TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "","D") %>.<%= odiRef.getJrnInfo("TABLE_NAME") %>, KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN)<%if (!odiRef.getOption("NB_APPLY_PROCESS").equals("1")) {%>, FILTER (@RANGE(#ODI_APPLY_NUMBER,<%= nbApplyProcesses %>,<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>))<% } %> INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);

The output of this step is going to be a mapping for each source-to-foundation table in the GoldenGate replicat parameter file, similar to this:

map PM_SRC.SRC_CITY, TARGET EDW_FND.SRC_CITY, KEYCOLS (CITY_ID, FND_SCN) INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);

The column mappings (COLMAP clause) are hard-coded into the JKM, with the parameter USEDEFAULTS mapping each column one-to-one. We also hard-code each foundation audit column mapping to the appropriate environment variable from the GoldenGate trail file. Learn more about the GETENV GoldenGate function here.

The bulk of the editing on this step is done to the MAP statement. The out-of-the-box JKM is setup to apply transactional changes to both the J$, or change table, and fully replicated table. Now we need to add the mapping to the foundation table. In order to do so, we first need to identify the foundation schema and table name for the target table using the ODI Substitution API.

map ... TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "", "D") %> ...

The nested Substitution API call allows us to get the physical database schema name based on the ODI Logical Schema that we will set in the option FND_LSCHEMA, during setup of the JKM on the ODI Model. Then, we concatenate the target table name with a dot (.) in between to get the fully qualified table name (e.g. EDW_FND.SRC_CITY).

... KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN) ...

We also added the FND_SCN to the KEYCOLS clause, forcing the uniqueness of each row in the foundation tables. Because we only insert records into this table, the natural key will most likely be duplicated numerous times should a record be updated or deleted on the source.

Set Options

The previously created task,  “Create apply prm (4) RM”, should be set to execute only when the APPLY_FOUNDATION option is “true”. On this step, go to the Properties window and choose the Options tab. Deselect all options except APPLY_FOUNDATION, and when Start Journal is run, this step will be skipped unless APPLY_FOUNDATION is true.

jkm-set-task-option

Edit Task

Finally, we need to make a simple change to the “Execute apply commands online” task. First, add the custom step indicator (in my example, RM) to the end of the task name. In the target command expression, comment out the “start replicat …” command by using a double-dash.

--start replicat ...

This prevents GoldenGate from starting the replicat process automatically, as we’ll first need to complete an initial load of the source data to the target before we can begin replication of new transactions.

Additional Setup

The GoldenGate Manager and JAgent are ready to go, as is the customized “JKM Oracle to Oracle Consistent (OGG Online)” Journalizing Knowledge Module. Now we need to setup the Topology for both GoldenGate and the data sources.

Setup GoldenGate Topology - Data Servers

In order to properly use the “online” integration between GoldenGate and Oracle Data Integrator, a connection must be setup for the GoldenGate source and target. These will be created as ODI Data Servers, just as you would create an Oracle database connection. But, rather than provide a JDBC url, we will enter connection information for the JAgent that we configured in the initial post in the series.

First, open up the Physical Architecture under the Topology navigator and find the Oracle GoldenGate technology. Right-click and create a new Data Server.

create-ogg-dataserver

Fill out the information regarding the GoldenGate JAgent and Manager. To find the JAgent port, browse to the GG_HOME/cfg directory and open “Config.properties” in a text viewer. Down towards the bottom, the “jagent.rmi.port”, which is used when OEM is enabled, can be found.

####################################################################
## jagent.rmi.port ###
## RMI Port which EM Agent will use to connect to JAgent ###
## RMI Port will only be used if agent.type.enabled=OEM ###
####################################################################
jagent.rmi.port=5572

The rest of the connection information can be recalled from the JAgent setup.

setup-ogg-dataserver

Once completed, test the connection to ensure all of the parameters are correct. Be sure to setup a Data Server for both the source and target, as each will have its own JAgent connection information.

Setup GoldenGate Topology - Schemas

Now that the connection is set, the Physical Schema for both the GoldenGate source and target must be created. These schemas tie directly to the GoldenGate process groups and will be the name of the generated parameter files. Under the source Data Server, create a new Physical Schema. Choose the process type of “Capture”, provide a name (8 characters or less due to GoldenGate restrictions), and enter the trail file paths for the source and target trail files.

Create the Logical Schema just as you would with any other ODI Technology, and the extract process group schema is set.

For the target, or replicat, process group, perform the same actions on the GoldenGate target Data Server. This time, we just need to specify the target trail file directory, the discard directory (where GoldenGate reporting and discarded records will be stored), and the source definitions directory. The source definitions file is a GoldenGate representation of the source table structure, used when the source and target table structures do not match. The Online JKM will create and place this file in the source definitions directory.

Again, setup the Logical Schema as usual and the connections and process group schemas are ready to go!

The final piece of the puzzle is to setup the source and target data warehouse Data Servers, Physical Schemas, and Logical Schemas. Use the standard best practices for this setup, and then it’s time to create ODI Models and start journalizing. In the next post, Part 4 of the series, we’ll walk through applying the JKM to the source Model and start journalizing using the Online approach to GoldenGate and ODI integration.

Data Integration Tips: GoldenGate on RAC – Action Script for Windows

I want to briefly interrupt my blog series on GoldenGate and ODI – A Perfect Match in 12c… to provide another Data Integration Tip that was found during a recent client engagement. I was tasked with installing GoldenGate 11g into a 2-node RAC, which is a pretty straightforward process and well documented by the Oracle Data Integration product team. The challenge is that the client’s Oracle RAC environment was installed on Windows Server 2003.

The Scenario

Working through the GoldenGate setup and configuration process for RAC / Clusterware, you’ll find that most of the same approach applies to either Unix/Linux or Windows (maybe a future blog post?). There is also an example action script towards the bottom, but it’s written in shell script… and that just won’t work on Windows! For those unfamiliar with Clusterware, when an application is added as a resource, Clusterware will use the action script to perform specific functions, or actions, against the application. These actions (Check, Start, Stop, etc.) will often run commands through GoldenGate’s GGSCI application, ensuring the Manager process is running, starting the Manager, and so on.

We decided to convert the shell script example to a batch command script, as that’s about as native as it gets on Windows. Everything was going well with the conversion until we reached the function below, and in particular the here-document it uses.

# call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
  log "entering call_ggsci"
  ggsci_command=$1
  cd ${GGS_HOME}

  ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF
`

  log "got output of : $ggsci_output"
}

The script simply changes directory to the GoldenGate home, runs the GGSCI application, executes the command (passed in as an argument), and then exits GGSCI. This is all wrapped within the two EOF “tags”, indicating that the formatting in the script, such as hard returns, should be preserved when the external command is run. It’s known as a heredoc (here-document), and we found that there’s no real equivalent in a batch script.

When attempting similar code in the batch script, without the heredoc, we ran into an issue. GGSCI attempted to run all of the commands on the same line, which failed miserably. We needed another approach to ensure the script would execute properly.

Here’s the Tip…

The goal was to find a way to run a GGSCI command on a single line. The solution: use GGSCI to execute GoldenGate obey files. Rather than attempt to place hard returns in the batch command script, we simply placed all of the necessary commands in an obey file and passed the obey file’s location into the function.

call :call_ggsci %GGS_HOME%\diroby\StartMgr.oby
 ...

:call_ggsci

 REM set ggsci_command variable with the passed-in value
 SET ggsci_command=%~1
 call :log "ggsci_command: %ggsci_command%"

 REM log whether or not the GGS_HOME path exists
 if exist %GGS_HOME% (
  call :log "%GGS_HOME% exists"
 ) else (
  call :log "%GGS_HOME% does not exist"
 )
 ...
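
The elided part of the function is where GGSCI actually gets invoked. One way to keep that to a single line (illustrative only, not the client’s exact script) is to echo an obey command into GGSCI:

 REM Run GGSCI with a single obey command (illustrative)
 cd /d %GGS_HOME%
 echo obey %ggsci_command% | ggsci.exe > ggsci_out.txt 2>&1
 call :log "GGSCI output written to ggsci_out.txt"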

The obey file could be as complex as you like, but in this case it simply starts the manager process.

start-mgr-obey
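
If the screenshot doesn’t come through, an illustrative StartMgr.oby needs nothing more than:

-- StartMgr.oby (illustrative): start the Manager process and confirm it is running
START MANAGER
STATUS MANAGER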

This approach works very well, but we did end up with the additional obey files to maintain. Next time around, I’ll probably use a scripting language such as Groovy or Python, as either should work just fine on Linux or Windows.

Look for more Data Integration Tips on the blog, as Rittman Mead is always coming up with innovative solutions to interesting challenges!

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 2: Journalizing Knowledge Module

This is the second post in a blog series on the integration between Oracle Data Integrator (ODI) 12c and GoldenGate 12c. The first post focused on the latest, yet-to-be-released, Oracle Information Management Reference Architecture and some high-level features new to the 12c versions of each product. We also began working through the details of the GoldenGate installation and configuration, specifically the JAgent setup, which is necessary for communication between ODI and GoldenGate during the setup of “online” journalizing. In this post, we’ll look at the new features of the Journalizing Knowledge Module “JKM Oracle to Oracle Consistent (OGG Online)” and get started on the modifications to the JKM that must occur for us to load the Raw Data Reservoir (RDR – acronym coined by Nick Hurt in the comments of my first post…I like it!) and Foundation Layer simultaneously.

ODI Journalizing

Before I get into editing the JKM, let me briefly go through an introduction to ODI Journalizing, aka Change Data Capture (CDC). In case you missed it earlier, ODI CDC is implemented using a Journalizing Knowledge Module (JKM). The JKM generates the infrastructure for CDC, creating journal tables that store the change rows and views that provide access to those change rows for use in ODI Mappings. ODI CDC can be implemented using various capture processes, such as triggers on source tables, timestamps on rows, or mining of the database logs via Oracle Streams or, in our case, Oracle GoldenGate. A great explanation of ODI JKMs and how they work, written by Christophe Dupupet of the Oracle Data Integration A-Team, can be found here.
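
To make that a little more concrete: once journalizing is running, a downstream mapping effectively consumes changes through the journal views that the JKM creates. The snippet below is purely illustrative, using hypothetical work schema and table names and ODI’s usual J$/JV$ naming conventions, which can vary with your physical schema settings:

-- Illustrative only: reading captured changes for one subscriber from the journal view
SELECT jrn_flag,          -- 'I' for insert/update, 'D' for delete
       jrn_date,          -- when the change was recorded
       city_id            -- the journalized table's key column(s)
FROM   odi_work.jv$dsrc_city
WHERE  jrn_subscriber = 'PERFECT_MATCH';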

Import ODI JKMs

The integration between GoldenGate and ODI begins with a Journalizing Knowledge Module; in our example we’ll be using the “JKM Oracle to Oracle Consistent (OGG Online)”. The JKM, delivered with ODI, will create the GoldenGate parameter files, configure the GoldenGate process groups (and start them up), and generate the ODI CDC tables and views. The JKM, added to the source Model in 12c, uses the ODI metadata to generate the GoldenGate parameter file mappings. This alone saves quite a bit of manual work and reduces possible typos caused by human error.

JKM Oracle to Oracle Consistent (OGG Online)

In the previous post, I mentioned the new capability of the JKM that allows for an “online” integration between ODI and GoldenGate. But, there are many other new features that need to be described, so we’ll walk through those here.

ODI Tool: OdiOggCommand

The JKM uses an undocumented ODI Tool called OdiOggCommand in the target command of some tasks that are executed only when in “online” mode. The tool’s behaviour is controlled by a parameter called OPERATION, which takes one of the following values:

EXECUTECMD: executes various commands, such as Add Extract, within GGSCI
EXECUTEOBEY: runs the OBEY command against an obey file in GGSCI
DEFGEN: generates the source definitions file by executing the DEFGEN command
SAVEFILE: uploads the parameter and obey files to the GoldenGate installation directory

OdiOggCommand

I imagine the code behind this ODI Tool is simply executing command line calls to GGSCI, DEFGEN, etc. It would be great to see some Oracle documentation on this one!

GoldenGate Topology

The “online” aspect of the JKM requires that a Data Server, Physical Schema and Logical Schema all be set up under the GoldenGate technology in ODI. The Data Server contains the location and connection information for communicating with the JAgent on either the source or target GoldenGate server. Under that, a Physical Schema must be set up for each extract (including the pump) and replicat process group that will be implemented on that server.

GoldenGate Physical Schema

In this screenshot, we have a Physical Schema with the process type set to Capture, also called the extract. Here we set up the directory path on the source to the location where captured transactions from the source database logs will be stored in GoldenGate’s own log files, called trail files. The remote trail file directory, setup on the target server and accessed by the pump process in order to move transactions from the source trail to the target trail, must also be added. Additional options such as trail file size (how large a trail file can get before rolling over to the next file) are also available to be set. Just below the Capture Process Properties are the Additional Options, parameters that can be added to the extract or replicat parameter files based on specific needs of the solution.

GoldenGate Physical Schema Additional Options

Here we can add options to handle different aspects of the extract or replicat, such as TRANLOGOPTIONS on the extract. Once I add this option, I get a template of the additional parameters I can configure. For example, I may want to access my transaction logs stored in ASM via the database rather than directly from ASM. The DBLOGREADER option lets me utilize the ASM API in the database, simplifying my access to the logs.

TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 2597152

When I add the additional option to the capture physical schema, it will be generated as a part of the extract parameter file. This helps to drastically reduce the amount of manual editing that had to occur after the parameter files were generated in the 11g version, and is a great addition to the JKM.

JKM Setup and Options

The JKM is actually applied to the source ODI Model, which is a grouping of logical table structures called Datastores. On the Journalizing tab in the Model, we first set the type (Consistent Set) and choose the appropriate JKM.

GoldenGate Process Selection

In the GoldenGate Process Selection section, we choose the Capture and Delivery Logical Schemas that were setup in the ODI Topology. If none have been created, but the Data Server for the source and target GoldenGate installations exist, we can choose to create the Physical and Logical Schemas directly from the Model by clicking the Create button. This is a nifty way to separate the system administration role (setting up the Data Server) from the metadata management or developer role.

The JKM also has a set of Options with configurable values, some being required and others optional. The number of Options has been reduced by quite a bit in the 12c JKM. This makes sense, as we’ve seen that additional metadata such as the trail file location and trail file size are set elsewhere. We’ll go through these in more detail later on when setting the values in our example.

JKM Options

That covers the updated features in the JKM, now let’s talk Knowledge Module customization.

JKM Customization

In the first post, I described how we want to extract the source data once and replicate it into both the Foundation Layer and Raw Data Reservoir in parallel. The JKM is set up to load the ODI CDC Framework (the fully replicated table and J$ table) out of the box, but not the Foundation table. In case you’re unfamiliar with the purpose of the Foundation layer, the idea is to store all transactional history from the source tables by converting every transaction into an insert, and tracking the type of change (insert / update / delete), commit date, and commit SCN. With this information stored from the beginning of the data warehouse, it can be used for historical drill-throughs from the dimensional model, or to completely reload a star schema – including all history. With the stage set, let’s look at what we’re going to change in the JKM.

1. Add New Option “APPLY_FOUNDATION”
This option, when true, will allow the Start Journal process to generate the source-to-foundation mapping statement in the Replicat (apply) process.

2. Add New Option “FND_LSCHEMA”
The Logical Schema name for the Foundation layer schema.

3. Add New Task “Create apply prm (4) RM”
This task will create the source-to-foundation mapping code and add it to the replicat parameter file.

4. Set Option on “Create apply prm (4) RM” Task
Set the execution Options to have APPLY_FOUNDATION as the only checked option. This will determine whether or not that task will execute when Start Journal is run.

5. Edit Task “Execute apply commands online RM”
We’ll comment out the “start replicat …” command, as we’ll first need to complete an initial load of the source data to the target.

Now, I did say I would get into the editing of the JKM in this post, but after describing the new features of the Knowledge Module, I’ll save the actual details for part 3 of the series! Up next, editing the JKM and building the necessary ODI objects so we can start journalizing.