GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 4: Start Journalizing!
In this post, the finale of the four-part series “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c”, I’ll walk through the setup of the ODI Models and start journalizing in “online” mode. This will utilize our customized JKM to build the GoldenGate parameter files based on the ODI Metadata and deploy them to both the source and target GoldenGate installation locations. Before I get into all of the details, let’s recap the first 3 posts and see how we arrived at this point.
Part one of the series, Getting Started, led us through a quick review of the Oracle Reference Architecture for Information Management and the tasks we’re trying to accomplish: loading both the Raw Data Reservoir (RDR) and Foundation schemas simultaneously using Oracle GoldenGate 12c replication, and setting it all up via Oracle Data Integrator 12c. We also reviewed the setup of the GoldenGate JAgent process, necessary for communication between ODI and GoldenGate when using the “online” version of the Journalizing Knowledge Module.
In part two, we reviewed the Journalizing Knowledge Module, “JKM Oracle to Oracle Consistent (OGG Online)”, its new features, and how much it has been improved in ODI 12c. Full integration with Oracle GoldenGate, through the use of the new ODI Tool “OdiOggCommand”, allows for the setup and configuration of GoldenGate process groups, trail file directories, and table-level supplemental logging, all from the ODI JKM.
Most recently, part 3, titled Setup Journalizing, walked us through the customizations of the “JKM Oracle to Oracle Consistent (OGG Online)” that will allow us to create the source-to-foundation replication alongside the standard source-to-RDR setup. We added a set of options, the first to control whether or not we replicate to foundation and the second to capture the ODI Logical Schema corresponding to the foundation schema. Then we added the task that will create the source-to-foundation table mapping inside the GoldenGate replicat parameter file and set the options appropriately. I’ve been using the ODI 12c Getting Started VM, with the 12.1.2 version of ODI, for my demo setup. If you haven’t done so already, you can download the latest version of the VM, with ODI 12.1.3, from the Oracle Technology Network. I’d say that’s enough recap. Now on to the final steps for GoldenGate and ODI 12c integration: let’s start journalizing!
Setup ODI Models
Create Models
We first need to create the ODI Models and Datastores for the Source, Staging (Raw Data Reservoir) and Foundation tables. I will typically reverse engineer the source tables into a Model first, then copy them to the Staging and Foundation Models. This approach will ensure the column names and data types remain consistent with the source. I then execute a Groovy script to create the additional data warehouse audit columns in each of the Foundation Datastores.
Configure JKM
Unlike the 11g version of ODI, in 12c the “JKM Oracle to Oracle Consistent (OGG Online)” Knowledge Module will be set on the source Model. Open up the Model, in this example, PM_SRC, and switch to the Journalizing tab.
We’ll set the Journalizing Mode to “Consistent Set” and then choose the customized JKM that we have been working with in this example, “JKM Oracle to Oracle Consistent (OGG Online) RM”, from the dropdown list. Now we are presented with the GoldenGate Process Selection parameters and a list of KM Options to configure.
Set the Process Selection parameters for the Capture Process and Delivery Process by selecting the Logical Schemas created in Part 3 (Setup GoldenGate Topology - Schemas). This setting drives the naming of the Extract, Pump, and Replicat parameter files and process groups in GoldenGate. If you plan to use GoldenGate for the initial load, select the processes here as well. I’m not setting mine, as I typically use a batch load tool, such as Oracle Data Pump or an insert across a DBLink, to perform the initial load of the target. As you can see in the image below, you can also create the Oracle GoldenGate Logical Schemas from the Model.
Next is the set of Options, including the 2 new Options added in the previous post. We can leave several of the values as the default, as they are specific to particular character sets or to implementation on a multi-node Oracle RAC setup.
The Options we do want to set:
ONLINE – Set to “true” to enable the automatic GoldenGate configuration when Start Journal is run.
LOCAL_TEMP_DIR – Enter a directory local to the machine on which the Start Journal process will be executed. Be sure the user executing the Start Journal process has privileges to create/modify/remove directories and files.
APPLY_FOUNDATION – Custom Option, set to “true” to enable the addition of the source-to-foundation mapping to the GoldenGate Replicat parameter file.
FND_LSCHEMA – The Logical Schema for the Foundation layer, necessary when APPLY_FOUNDATION is true.
After the Options are set, the Model can be saved and closed. Back in the Designer Navigator, add the Datastores to CDC by either selecting each individual Datastore and adding it or by right-clicking the Model and choosing to add the entire set all at once.
Before we get on with using the JKM, there is one thing I forgot to mention in the previous post. When setting up the Logical Schema for the GoldenGate “Delivery” process, you must also set the target Logical Schema. If you fail to do so, you’ll get an error stating “SnpLSchema does not exist” when attempting any Change Data Capture commands on the source Model.
With the JKM set and the tables added to Change Data Capture, we can now add a Subscriber. Subscribers allow multiple mappings to consume the change data from the J$ tables at different intervals. For example, a table may be consumed by one mapping every hour, and then by an additional mapping each night. Two different Subscribers would be used in this case. In our example, I’ll create a single Subscriber named “PERFECT_MATCH”. Make sure the process runs successfully, then it’s time to start Journalizing.
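To make the Subscriber idea concrete, here is a toy Python model. It is not how ODI stores or tracks journal data; the class, method names, and the HOURLY and NIGHTLY subscribers are invented for illustration. The point is only that each subscriber keeps its own read position, so two mappings can consume the same changes on different schedules.

```python
class Journal:
    """Toy change journal with one read position per subscriber (illustrative only)."""

    def __init__(self):
        self.changes = []    # captured change rows, in arrival order
        self.positions = {}  # subscriber name -> index of the next unread change

    def add_subscriber(self, name):
        # A new subscriber only sees changes captured after it subscribes.
        self.positions[name] = len(self.changes)

    def capture(self, change):
        self.changes.append(change)

    def consume(self, name):
        # Return the unread changes for this subscriber and advance its position.
        start = self.positions[name]
        self.positions[name] = len(self.changes)
        return self.changes[start:]


journal = Journal()
journal.add_subscriber("HOURLY")
journal.add_subscriber("NIGHTLY")

journal.capture({"CITY_ID": 1, "op": "INSERT"})
hourly_first = journal.consume("HOURLY")   # the hourly mapping runs

journal.capture({"CITY_ID": 1, "op": "UPDATE"})
nightly_all = journal.consume("NIGHTLY")   # the nightly mapping sees both changes
```

Consuming as HOURLY does not affect what NIGHTLY will read later, which is the behaviour the two-Subscriber example above relies on.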
Start Capturing Changes
With the setup and configuration out of the way, the rest is up to the JKM. We’re now able to right-click the source Model and select Change Data Capture–>Start Journal. This executes all of the Start Journal related steps in the JKM, which will create the CDC Framework (J$ tables, JV$ views, etc.), generate and deploy the GoldenGate parameter files (Extract, Pump, and Replicat), and configure and start the GoldenGate process groups. Be sure that the source and target GoldenGate Manager and JAgent processes are running prior to executing the Start Journal process. Also, make sure that the database is in ArchiveLog mode and ready for GoldenGate to capture transactions.
After the Start Journal process is successfully completed, you can browse to the source GoldenGate home directory, run GGSCI, and view the status of the OGG process groups.
It looks like the Extract and Pump are in place and running. Now let’s check out the Replicat on the target GoldenGate installation.
Here we see that the Replicat process is in place, but not actually running. Remember from our JKM editing that we commented out the step that will start the Replicat process. This is to ensure we perform an initial load prior to applying any captured change data to the target.
The last bit of work that must be completed is the initial load. I won’t go into details here, but the way I like to do it is to load the source data to the target as of a captured SCN using Oracle Data Pump or a DBLink, rather than using GoldenGate process groups to perform the load. Note: The ODI 12c Getting Started Guide doesn’t even use the OGG initial load! Then, we can start the replicat in GoldenGate after the captured SCN using the same approach I wrote about in a previous blog on ODI and GoldenGate 11g.
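As a rough illustration of that last step, the Python sketch below only formats the GGSCI command text; it does not connect to GoldenGate. One common way to start a Replicat after a known SCN is the AFTERCSN option of START REPLICAT, and whether that matches the earlier 11g post exactly is an assumption. The group name is borrowed from the Drop Journal example further down, and the SCN value is a placeholder.

```python
def start_replicat_after_csn(group: str, csn: int) -> str:
    """Build the GGSCI command that starts a Replicat after a given CSN/SCN."""
    if not group:
        raise ValueError("a Replicat group name is required")
    if csn <= 0:
        raise ValueError("csn must be a positive integer")
    return f"START REPLICAT {group}, AFTERCSN {csn}"


# Placeholder values: use the SCN captured for your own initial load.
command = start_replicat_after_csn("RPMEDWP", 1234567)
print(command)
```

The resulting text would be entered at the GGSCI prompt on the target GoldenGate installation once the initial load has finished.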
JKM “Online” Mode
Beyond adding the parameter files and process groups, what exactly did the “online” version of the JKM do for us? If you browse to the directory that we set in the JKM Options under LOCAL_TEMP_DIR, you’ll find all of the GoldenGate files generated by the JKM. These files were generated locally, then uploaded to their proper GoldenGate home directory. Without “online” mode, they would have had to be copied to GoldenGate manually.
Once uploaded, the obey files (batch files for GoldenGate commands) were executed.
OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTEOBEY" "-OBEY_FILE=/home/oracle/Oracle/Middleware/oggsrc/EXTPMSRC.oby"
And finally, JKM steps were generated to perform the creation of the process groups in GoldenGate.
OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
add extract EXTPMSRC, tranlog, begin now
add exttrail /home/oracle/Oracle/Middleware/oggsrc/dirdat/oc, extract EXTPMSRC, megabytes 100
stop extract EXTPMSRC
start extract
If you ever need to stop the change data capture process, either to add additional tables or make modifications to the metadata, you can run the Drop Journal process. Not only will the CDC Framework of tables and views be removed, but the “online” mode also reaches into GoldenGate and drops the process groups that were generated by the JKM.
OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
stop extract RPMEDWP
delete extract RPMEDWP
In conclusion, the integration between GoldenGate and Oracle Data Integrator in 12c has been vastly improved over the 11g version. The ability to manage the entire setup process from within ODI is a big step forward, and I can only see these two products being further integrated in future releases. If you have any questions or comments about ODI or GoldenGate, or would like some help with your own implementation, feel free to add a comment below or reach out to me at michael.rainey@rittmanmead.com.
GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 3: Setup Journalizing
After a short vacation, some exciting news, and a busy few weeks (including KScope14 in Seattle, WA), it’s time to get the “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c” blog series rolling again. Hopefully readers can find some time between World Cup matches to try integrating ODI and GoldenGate on their own!
To recap my previous two posts on this subject, I first started by showing the latest Information Management Reference Architecture at a high-level (described in further detail by Mark Rittman) and worked through the JAgent configuration, necessary for communication between ODI and GoldenGate. In the second post, I walked through the changes made to the GoldenGate JKM in ODI 12c and laid out the necessary edits for loading the Foundation layer at a high-level. Now, it’s time to make the edits to the JKM and set up the ODI metadata.
Before I jump into the JKM customization, let’s go through a brief review of the foundation layer and its purpose. The foundation schema contains tables that are essentially duplicates of the source table structure, but with the addition of the foundation audit columns, described below, that allow for the storage of all transactional history in the tables.
FND_SCN (System Change Number)
FND_COMMIT_DATE (when the change was committed)
FND_DML_TYPE (DML type for the transaction: insert, update, delete)
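A small Python sketch of that structure: a foundation table is the source column list plus the three audit columns. The audit column names come from the list above; the data types, the CITY column, and the function name are assumptions made for the example, and this is not the Groovy script used against the ODI Models.

```python
# Audit columns named in the post; the data types are assumptions for illustration.
FOUNDATION_AUDIT_COLUMNS = [
    ("FND_SCN", "NUMBER"),
    ("FND_COMMIT_DATE", "TIMESTAMP"),
    ("FND_DML_TYPE", "VARCHAR2(20)"),
]


def foundation_columns(source_columns):
    """Return the source columns followed by any missing foundation audit columns."""
    existing = {name for name, _ in source_columns}
    extra = [col for col in FOUNDATION_AUDIT_COLUMNS if col[0] not in existing]
    return list(source_columns) + extra


# Hypothetical source table definition.
src_city = [("CITY_ID", "NUMBER"), ("CITY", "VARCHAR2(50)")]
fnd_city = foundation_columns(src_city)
```

Running the function on a list that already has the audit columns leaves it unchanged, so it is safe to apply more than once.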
The GoldenGate replicat parameter file must be set up to map the source transactions into the foundation tables using the INSERTALLRECORDS option. This is the same option that the replicat uses to load the J$ tables, allowing only inserts and no updates or deletes. A few changes to the JKM will allow us to choose whether or not we want to load the Foundation schema tables via GoldenGate.
Edit the Journalizing Knowledge Module
To start, make a copy of the “JKM Oracle to Oracle Consistent (OGG Online)” so we don’t modify the original. Now we’re ready to make our changes.
Add New Options
A couple of new Options will need to be added to enable the additional feature of loading the foundation schema, while still maintaining the original JKM code. Option values are set during the configuration of the JKM on the Model, but can also have a default in the JKM.
APPLY_FOUNDATION
This option, when true, will enable this step during the Start Journal process, allowing it to generate the source-to-foundation mapping statement in the Replicat (apply) parameter file.
FND_LSCHEMA
This option will be set to the Logical Schema name for the Foundation layer, and will be used to find the physical database schema name when it is output in the GoldenGate replicat parameter file.
Add a New Task
With the options created, we can now add the additional task to the JKM that will create the source-to-foundation table mappings in the GoldenGate replicat parameter file. The quickest way to add the task is to duplicate a current task. Open the JKM to the Tasks tab and scroll down to the “Create apply prm (3)” step. Right-click the task and select Duplicate. A copy of the task will be created in the position we want, just after the step we duplicated.
Rename the step to “Create apply prm (4) RM”, adding the additional RM tag so it’s easily identifiable as a custom step. From the properties, open the Edit Expression dialog for the Target Command. The map statement, just below the OdiOutFile line, will need to be modified. First, remove the IF statement code, as the execution of this step will be driven by the APPLY_FOUNDATION option being set to true.
Here’s a look at the final code after editing.
map <%= odiRef.getObjectName("L", odiRef.getJrnInfo("TABLE_NAME"), odiRef.getOggModelInfo("SRC_LSCHEMA"), "D") %>, TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "","D") %>.<%= odiRef.getJrnInfo("TABLE_NAME") %>, KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN)<%if (!odiRef.getOption("NB_APPLY_PROCESS").equals("1")) {%>, FILTER (@RANGE(#ODI_APPLY_NUMBER,<%= nbApplyProcesses %>,<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>))<% } %> INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);
The output of this step is going to be a mapping for each source-to-foundation table in the GoldenGate replicat parameter file, similar to this:
map PM_SRC.SRC_CITY, TARGET EDW_FND.SRC_CITY, KEYCOLS (CITY_ID, FND_SCN) INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);
The column mappings (COLMAP clause) are hard-coded into the JKM, with the parameter USEDEFAULTS mapping each column one-to-one. We also hard-code each foundation audit column mapping to the appropriate environment variable from the GoldenGate trail file. Learn more about the GETENV GoldenGate function here.
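To see how the generated statement depends on the metadata, here is a minimal Python sketch that renders the same MAP statement from a schema name, a table name, and the primary key columns. It is a stand-in for the ODI template, not a replacement: the function name is made up, the statement is written on a single line, and the FILTER/@RANGE branch used when there are multiple apply processes is left out.

```python
def render_foundation_map(src_schema, fnd_schema, table, pk_columns):
    """Render a single-line source-to-foundation MAP statement for one table."""
    # FND_SCN is appended to the primary key columns, as in the customized JKM.
    keycols = ", ".join(list(pk_columns) + ["FND_SCN"])
    return (
        f"map {src_schema}.{table}, TARGET {fnd_schema}.{table}, "
        f"KEYCOLS ({keycols}) INSERTALLRECORDS, "
        "COLMAP ( USEDEFAULTS, "
        "FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'), "
        "FND_SCN = @GETENV('TRANSACTION' , 'CSN'), "
        "FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE') );"
    )


statement = render_foundation_map("PM_SRC", "EDW_FND", "SRC_CITY", ["CITY_ID"])
print(statement)
```

With the PM_SRC, EDW_FND, SRC_CITY, and CITY_ID values from the example, this produces the same mapping as the JKM output shown for SRC_CITY.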
The bulk of the editing on this step is done to the MAP statement. The out-of-the-box JKM is set up to apply transactional changes to both the J$, or change table, and the fully replicated table. Now we need to add the mapping to the foundation table. In order to do so, we first need to identify the foundation schema and table name for the target table using the ODI Substitution API.
map ... TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "", "D") %> ...
The nested Substitution API call allows us to get the physical database schema name based on the ODI Logical Schema that we will set in the option FND_LSCHEMA, during setup of the JKM on the ODI Model. Then, we concatenate the target table name with a dot (.) in between to get the fully qualified table name (e.g. EDW_FND.SRC_CITY).
... KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN) ...
We also added the FND_SCN to the KEYCOLS clause, forcing the uniqueness of each row in the foundation tables. Because we only insert records into this table, the natural key will most likely be duplicated numerous times should a record be updated or deleted on the source.
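The effect is easy to check on a few rows. In this Python sketch the rows, SCN values, and DML type strings are invented; it simply shows that the natural key repeats across the history of one source row while the natural key plus FND_SCN does not.

```python
# Three changes to the same source row, as they might land in a foundation table.
# All values are made up for the example.
history = [
    {"CITY_ID": 10, "CITY": "Seatle", "FND_SCN": 1001, "FND_DML_TYPE": "INSERT"},
    {"CITY_ID": 10, "CITY": "Seattle", "FND_SCN": 1057, "FND_DML_TYPE": "UPDATE"},
    {"CITY_ID": 10, "CITY": "Seattle", "FND_SCN": 1103, "FND_DML_TYPE": "DELETE"},
]


def is_unique(rows, key_columns):
    """Return True when no two rows share the same values for key_columns."""
    keys = [tuple(row[col] for col in key_columns) for row in rows]
    return len(keys) == len(set(keys))


natural_key_unique = is_unique(history, ["CITY_ID"])        # key repeats per change
keycols_unique = is_unique(history, ["CITY_ID", "FND_SCN"])  # key plus SCN does not
```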
Set Options
The previously created task, “Create apply prm (4) RM”, should be set to execute only when the APPLY_FOUNDATION option is “true”. On this step, go to the Properties window and choose the Options tab. Deselect all options except APPLY_FOUNDATION, and when Start Journal is run, this step will be skipped unless APPLY_FOUNDATION is true.
Edit Task
Finally, we need to make a simple change to the “Execute apply commands online” task. First, add the custom step indicator (in my example, RM) to the end of the task name. In the target command expression, comment out the “start replicat …” command by using a double-dash.
--start replicat ...
This prevents GoldenGate from starting the replicat process automatically, as we’ll first need to complete an initial load of the source data to the target before we can begin replication of new transactions.
Additional Setup
The GoldenGate Manager and JAgent are ready to go, as is the customized “JKM Oracle to Oracle Consistent (OGG Online)” Journalizing Knowledge Module. Now we need to set up the Topology for both GoldenGate and the data sources.
Setup GoldenGate Topology - Data Servers
In order to properly use the “online” integration between GoldenGate and Oracle Data Integrator, a connection must be set up for the GoldenGate source and target. These will be created as ODI Data Servers, just as you would create an Oracle database connection. But, rather than provide a JDBC URL, we will enter connection information for the JAgent that we configured in the initial post in the series.
First, open up the Physical Architecture under the Topology navigator and find the Oracle GoldenGate technology. Right-click and create a new Data Server.
Fill out the information regarding the GoldenGate JAgent and Manager. To find the JAgent port, browse to the GG_HOME/cfg directory and open “Config.properties” in a text viewer. Down towards the bottom, the “jagent.rmi.port”, which is used when OEM is enabled, can be found.
####################################################################
## jagent.rmi.port ###
## RMI Port which EM Agent will use to connect to JAgent ###
## RMI Port will only be used if agent.type.enabled=OEM ###
####################################################################
jagent.rmi.port=5572
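If you would rather not hunt through the file by eye, a few lines of Python can pull the port out. This is a minimal sketch that handles only simple key=value lines and # comments, not the full Java properties format; the sample text reproduces the fragment above, and the function name is made up.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


sample = """\
####################################################################
## jagent.rmi.port ###
## RMI Port which EM Agent will use to connect to JAgent ###
## RMI Port will only be used if agent.type.enabled=OEM ###
####################################################################
jagent.rmi.port=5572
"""

jagent_port = int(read_properties(sample)["jagent.rmi.port"])
```

To use it against a real installation, read the contents of Config.properties under the GG_HOME/cfg directory and pass the text to read_properties.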
The rest of the connection information can be recalled from the JAgent setup.
Once completed, test the connection to ensure all of the parameters are correct. Be sure to set up a Data Server for both the source and target, as each will have its own JAgent connection information.
Setup GoldenGate Topology - Schemas
Now that the connection is set, the Physical Schema for both the GoldenGate source and target must be created. These schemas tie directly to the GoldenGate process groups and will be the name of the generated parameter files. Under the source Data Server, create a new Physical Schema. Choose the process type of “Capture”, provide a name (8 characters or fewer due to GoldenGate restrictions), and enter the trail file paths for the source and target trail files.
Create the Logical Schema just as you would with any other ODI Technology, and the extract process group schema is set.
For the target, or replicat, process group, perform the same actions on the GoldenGate target Data Server. This time, we just need to specify the target trail file directory, the discard directory (where GoldenGate reporting and discarded records will be stored), and the source definitions directory. The source definitions file is a GoldenGate representation of the source table structure, used when the source and target table structures do not match. The Online JKM will create and place this file in the source definitions directory.
Again, set up the Logical Schema as usual and the connections and process group schemas are ready to go!
The final piece of the puzzle is to set up the source and target data warehouse Data Servers, Physical Schemas, and Logical Schemas. Use the standard best practices for this setup, and then it’s time to create ODI Models and start journalizing. In the next post, Part 4 of the series, we’ll walk through applying the JKM to the source Model and start journalizing using the Online approach to GoldenGate and ODI integration.
Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture Pt1. – Information Architecture and the “Data Factory”
One of the things we’re really interested in at Rittman Mead is the architecture of “information management” systems and how these change over time as thinking, and product capabilities, evolve. In fact we often collaborate with the Enterprise Architecture team within Oracle, giving input into the architecture designs they come up with, and more recently working on a full-blown collaboration with them to come up with a next-generation Information Management architecture. In these two posts I wanted to share some of our recent thinking in this area, looking first at our new proposed architecture, and then in the second post talking about how we’d use agile development methods, in particular our “ExtremeBI” development approach, to deliver it.
But first, some history. Back in 2009 I blogged about a first-generation DW reference architecture which introduced a couple of new concepts, based on new capabilities from tools such as OBIEE plus some thinking we, and the Enterprise Architecture team at Oracle, had been doing over the years. This reference architecture introduced the concept of “Foundation” and “Access and Performance” layers, and recognised the reality that Kimball-type star schemas were great for querying but not so good for long-term, query-neutral storage of data, whilst Inmon-style EDW models were great as a long-term, process-neutral system of record, but not so good for running user queries on. This new architecture included both of these design approaches, with the foundation layer forming the “information management” layer and the access and performance layer being the “information access” layer. Most importantly, tools like OBIEE made it possible for enterprises to create metadata layers that potentially accessed all layers in this model, so users could query the foundation layer if needed as well as the access and performance layer, if the foundation layer was a better source of data for a particular report.
A second revision to this model, a couple of years later, expanded on the original one and introduced another two interesting concepts, brought upon by the introduction of tools like Endeca Information Discovery, and the rise of unstructured and semi-structured data sources. This new architecture added unstructured and semi-structured sources into the model, and also introduced the concept of “sandboxes”, areas of the information management model that allowed more free-form, exploratory BI applications to be built.
But in practice, this idea of “unstructured” and “semi-structured” sources wasn’t all that helpful. What really started to make an impact in the past couple of years is the increasing use of “schema-on-read” databases, where we trade off the performance and defined structure of traditional relational 3NF and star schemas for the flexibility and “time-to-value” provided by key-value store databases. The Endeca Server is a good example of this type of database: it allows rapid storage of loosely-associated datasets, and tools like Endeca Studio then apply a structure to the data at the time of analysis. Schema-on-read databases are great for fast, flexible access to datasets, but the cost of ETL is then borne by each system that accesses the data.
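A tiny Python sketch of the schema-on-read trade-off, using invented event records rather than any particular product: writing is just appending raw text, and the structure (which fields exist, what type they have) is applied by the reader each time the data is used.

```python
import json

# Raw events are stored exactly as they arrive; nothing is validated on write.
# The records and field names are made up for the example.
raw_store = [
    '{"user": "a1", "amount": "19.99", "ts": "2014-06-01T10:00:00"}',
    '{"user": "b2", "amount": "5", "channel": "web"}',
]


def read_amounts(lines):
    """Apply a structure at read time; every consumer repeats this parsing work."""
    for line in lines:
        record = json.loads(line)
        # The reader decides that "amount" is numeric and defaults to 0 if absent.
        yield record["user"], float(record.get("amount", 0))


total = sum(amount for _, amount in read_amounts(raw_store))
```

The two records do not share the same fields, yet both load without complaint; the parsing and type decisions in read_amounts are the ETL cost that each consuming system has to pay.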
Probably the most well-known examples of schema-on-read sources though are Hadoop and NoSQL databases. Coupled with their ability to store lots of detail-level data at relatively low cost, Hadoop and NoSQL databases have significantly affected the overall landscape for BI, data warehousing and business analytics, and we thought it was about time for a new reference architecture that fully incorporated the capabilities and latest thinking around this area. Back at the start of 2014 Jon Mead, Stewart Bryson and I met up with Oracle’s Andrew Bond and his team for a series of workshops, and what came out of it was an updated Information Management Architecture *and* a development methodology for delivering it. Let’s start off then by looking at this updated architecture from a conceptual view.
At a conceptual level, we build on this idea of a sandbox environment and formally separate things out into the Execution area (business-as-usual, production and development areas) and an Innovation area, where we build on the idea of a sandbox and rename it the “Discovery lab”. The Discovery lab is where, for want of a better word, the “data scientists” work, with fewer constraints on development. Its inputs are events and data, and its outputs are the discovery output that can be the prototype and inspiration for designs going into the execution area.
The main “engine” of the Execution area is our enterprise store of data, this time broken down into four areas:
- A “data reservoir” where we store all incoming events and data at detail-level, typically on HDFS. This blog article by Oracle’s Jean-Pierre Dijcks sets out the concept of a data reservoir well, and I like this blog by Scaleabilities’ Jeff Needham where he makes the case for calling it a “data reservoir” that can ingest, process and analyse data rather than a “data lake”, which implies a passive store.
- An Enterprise Data Store, analogous to the enterprise data warehouses we use today
- A reporting component, typically in our case OBIEE
- Most importantly, the new concept of a “data factory”, a conduit between the data reservoir and the enterprise information store
Together, the execution and innovation layers form our “information platform”, with the event engine feeding real-time events into the platform and outputting them into the data reservoir, and traditional ETL routines loading structured data from the enterprise into the enterprise information store.
This conceptual architecture then permits several types of information application. For example, the data reservoir and the data factory together could support what we call “data applications”, applications working on semi-structured, large and low-granularity data sets such as those used for genomic analysis.
Other applications might be more traditional BI and data warehousing applications, but with the addition of data from the data reservoir and the analysis capabilities of Hadoop.
The discovery lab can be a standalone area, or the insights and discovery it outputs can be used as inputs into the main information platform. More event-based data will typically come in via the event engine, with its output going into the data reservoir and supporting “next-best-decision” applications like Oracle Real-Time Decisions.
Another way of looking at this architecture is from a logical perspective, in particular focusing on the data layers and access/loading processes to load them. The diagram below is our latest version of the two diagrams at the start of this article, and as you can see we’ve kept the data sources and BI element much the same, and kept the concept of the sandbox, in this case refined as the “discovery lab sandbox”.
What is different this time though is the middle bit; we’ve lost the staging area and replaced it with the raw data reservoir, added a “Rapid Development Sandbox”, and drawn the main layers as a slanted set of stacked areas. So why?
What we’re trying to show with the slanted data layers is the relative cost of data ingestion (loading), and the relative cost of accessing it (information interpretation). For the raw data reservoir, for example, there’s little cost in ingesting the data (maybe copy some files to HDFS, or use Flume or GoldenGate to capture log or transaction data to HDFS or Hive), but the cost is then borne in accessing this typically “schema-on-read” data source. As you go up the stack, there’s a bit more work in landing data into the Foundation layer (ETL routines, GoldenGate routines, some data cleaning and constraint checking, for example), but it’s correspondingly easier to get data out. For the Access and Performance Layer there’s the most cost in getting data in, but then users have very little work to do when getting data out.
Data can move up the stack from Raw Data Reservoir to Foundation, or directly into Access and Performance, or it could be landed at levels above Raw Data Reservoir, for example in our ExtremeBI approach where we use GoldenGate to replicate source system tables directly into Foundation without going through a staging layer. The Rapid Development Sandboxes are there to support agile, iterative development, with the output from them either being the result in itself, or their designs and insights being used to create more formal projects and data structures.
From a more product-centric perspective, you can overlay these types of diagrams with specific schematics for example enterprises. For example, in the diagram below you can see Oracle NoSQL database being used with HDFS and the Oracle Big Data Connectors to capture and store events from Complex Event Processing, and then outputs from CEP also being fed into a more traditional, “high density data” store as well as directly into a decision engine.
So this all sounds great, but how do you build it? Do we have to use the (discredited) step-by-step, waterfall method to build this type of architecture, and in particular the key “data factory” element that provides connectivity between the Raw Data Reservoir and the Enterprise Information Store? And can we apply agile methods to big data sources, as well as regular databases and applications? Check back on Monday for our thoughts on how this should be done.
GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 2: Journalizing Knowledge Module
This is the second post in a blog series on the integration between Oracle Data Integrator (ODI) 12c and GoldenGate 12c. The first post focused on the latest, yet-to-be-released, Oracle Information Management Reference Architecture and some high-level features new to the 12c versions of each product. We also began working through the details of the GoldenGate installation and configuration, specifically the JAgent setup, which is necessary for communication between ODI and GoldenGate during the setup of “online” journalizing. In this post, we’ll look at the new features of the Journalizing Knowledge Module “JKM Oracle to Oracle Consistent (OGG Online)” and get started on the modifications to the JKM that must occur for us to load the Raw Data Reservoir (RDR – acronym coined by Nick Hurt in the comments of my first post…I like it!) and Foundation Layer simultaneously.
ODI Journalizing
Before I get into editing the JKM, let me briefly go through an introduction of ODI Journalizing, aka Change Data Capture (CDC). In case you missed it earlier, ODI CDC is implemented using a Journalizing Knowledge Module. The JKM generates the infrastructure for the CDC, creating journal tables that store the change rows and views that provide access to the change rows for use in ODI Mappings. ODI CDC can be implemented using various capture processes, such as triggers on source tables, timestamps on rows, or mining of the database logs via Oracle Streams or, in our case, Oracle GoldenGate. A great explanation of ODI JKMs and how they work, written by Christophe Dupupet of the Oracle Data Integration A-Team, can be found here.
The integration between GoldenGate and ODI begins with a Journalizing Knowledge Module; in our example we’ll be using the “JKM Oracle to Oracle Consistent (OGG Online)”. The JKM, delivered with ODI, will create the GoldenGate parameter files, configure GoldenGate process groups (and start them up), and generate the ODI CDC tables and views. The JKM, added to the source Model in 12c, uses the ODI metadata to generate the GoldenGate parameter file mappings. This alone saves quite a bit of manual work and reduces possible typos caused by human error.
JKM Oracle to Oracle Consistent (OGG Online)
In the previous post, I mentioned the new capability of the JKM that allows for an “online” integration between ODI and GoldenGate. But there are many other new features worth describing, so we’ll walk through those here.
ODI Tool: OdiOggCommand
The JKM uses an undocumented ODI Tool called OdiOggCommand in the target command of some tasks that are executed only when in “online” mode. This tool has different values for a parameter called OPERATION.
EXECUTECMD: executes various commands, such as Add Extract, within GGSCI
EXECUTEOBEY: runs the OBEY command against an obey file in GGSCI
DEFGEN: generates the source definitions file by executing the DEFGEN command
SAVEFILE: uploads the parameter and obey files to the GoldenGate installation directory
I imagine the code behind this ODI Tool is simply executing command line calls to GGSCI, DEFGEN, etc. It would be great to see some Oracle documentation on this one!
GoldenGate Topology
The “online” aspect of the JKM requires that a Data Server, Physical Schema and Logical Schema all be set up under the GoldenGate technology in ODI. The Data Server contains the location and connection information for communicating with the JAgent on either the source or target GoldenGate server. Under that, a Physical Schema must be set up for each extract (the pump is included) and replicat process group that will be implemented on that server.
In this screenshot, we have a Physical Schema with the process type set to Capture, also called the extract. Here we set the directory path on the source where transactions captured from the source database logs will be stored in GoldenGate’s own log files, called trail files. The remote trail file directory, set up on the target server and used by the pump process to move transactions from the source trail to the target trail, must also be added. Additional options such as trail file size (how large a trail file can get before rolling over to the next file) can also be set. Just below the Capture Process Properties are the Additional Options, parameters that can be added to the extract or replicat parameter files based on the specific needs of the solution.
Here we can add options to handle different aspects of the extract or replicat, such as TRANLOGOPTIONS on the extract. Once I add this option, I get a template of the additional parameters I can configure. For example, I may want to access my transaction logs stored in ASM via the database rather than directly from ASM. The DBLOGREADER option lets me utilize the ASM API in the database, simplifying my access to the logs.
TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 2597152
When I add the additional option to the capture physical schema, it will be generated as a part of the extract parameter file. This helps to drastically reduce the amount of manual editing that had to occur after the parameter files were generated in the 11g version, and is a great addition to the JKM.
JKM Setup and Options
The JKM is actually applied to the source ODI Model, which is a grouping of logical table structures called Datastores. On the Journalizing tab in the Model, we first set the type (Consistent Set) and choose the appropriate JKM.
In the GoldenGate Process Selection section, we choose the Capture and Delivery Logical Schemas that were set up in the ODI Topology. If none have been created, but the Data Servers for the source and target GoldenGate installations exist, we can choose to create the Physical and Logical Schemas directly from the Model by clicking the Create button. This is a nifty way to separate the system administration role (setting up the Data Server) from the metadata management or developer role.
The JKM also has a set of Options with configurable values, some being required and others optional. The number of Options has been reduced by quite a bit in the 12c JKM. This makes sense, as we’ve seen that additional metadata such as the trail file location and trail file size are set elsewhere. We’ll go through these in more detail later on when setting the values in our example.
That covers the updated features in the JKM. Now let’s talk Knowledge Module customization.
JKM Customization
In the first post, I described how we want to extract the source data once and replicate it into both the Foundation Layer and Raw Data Reservoir in parallel. The JKM is set up to load the ODI CDC Framework (the fully replicated table and J$ table) out of the box, but not the Foundation table. In case you’re unfamiliar with the purpose of the Foundation layer, the idea is to store all transactional history from the source tables by converting every transaction into an insert, and tracking the type of change (insert / update / delete), commit date, and commit SCN. With this information stored from the beginning of the data warehouse, it can be used for historical drill-throughs from the dimensional model, or to completely reload a star schema – including all history. With the stage set, let’s look at what we’re going to change in the JKM.
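To make the Foundation layer idea concrete, here is a minimal Python sketch of the insert-only pattern described above. It is not how GoldenGate applies changes internally, just an illustration of the outcome. The `ChangeRecord` type and the audit column names (`EDW_ROW_OPTYPE`, `EDW_COMMIT_TIMESTAMP`, `EDW_SCN`) are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass(frozen=True)
class ChangeRecord:
    """One captured source transaction (hypothetical shape for illustration)."""
    op_type: str              # "INSERT", "UPDATE" or "DELETE"
    commit_ts: datetime       # commit date of the source transaction
    commit_scn: int           # commit SCN of the source transaction
    row: Dict[str, Any]       # column values carried by the change


def to_foundation_row(change: ChangeRecord) -> Dict[str, Any]:
    """Turn any change into a new row that carries the audit columns."""
    return {
        **change.row,
        "EDW_ROW_OPTYPE": change.op_type,          # assumed column name
        "EDW_COMMIT_TIMESTAMP": change.commit_ts,  # assumed column name
        "EDW_SCN": change.commit_scn,              # assumed column name
    }


def apply_to_foundation(foundation: List[Dict[str, Any]],
                        changes: List[ChangeRecord]) -> List[Dict[str, Any]]:
    """Append every change as an insert; existing rows are never modified."""
    for change in changes:
        foundation.append(to_foundation_row(change))
    return foundation


if __name__ == "__main__":
    history = apply_to_foundation([], [
        ChangeRecord("INSERT", datetime(2014, 1, 1), 100, {"ID": 1, "NAME": "a"}),
        ChangeRecord("UPDATE", datetime(2014, 1, 2), 200, {"ID": 1, "NAME": "b"}),
        ChangeRecord("DELETE", datetime(2014, 1, 3), 300, {"ID": 1, "NAME": "b"}),
    ])
    for row in history:
        print(row)
```

Because updates and deletes become additional rows rather than changes to existing ones, the full history of each source key stays available for drill-through or a complete star schema reload.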
1. Add New Option “APPLY_FOUNDATION”
This option, when true, will allow the Start Journal process to generate the source-to-foundation mapping statement in the Replicat (apply) process.
2. Add New Option “FND_LSCHEMA”
The Logical Schema name for the Foundation layer schema.
3. Add New Task “Create apply prm (4) RM”
This task will create the source-to-foundation mapping code and add it to the replicat parameter file.
4. Set Option on “Create apply prm (4) RM” Task
Set the execution Options to have APPLY_FOUNDATION as the only checked option. This will determine whether or not that task will execute when Start Journal is run.
5. Edit Task “Execute apply commands online RM”
We’ll comment out the “start replicat …” command, as we’ll first need to complete an initial load of the source data to the target.
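As a rough picture of what the new task and its two options are meant to produce, the Python sketch below builds source-to-foundation `MAP` statements for a replicat parameter file. It is only an approximation of the idea, not the actual JKM task code, which is covered in part 3. The function names, the `physical_schemas` lookup, and the audit column names are all hypothetical. `INSERTALLRECORDS`, `COLMAP`, `USEDEFAULTS` and `@GETENV` are standard GoldenGate replicat syntax.

```python
from typing import Dict, List

# Hypothetical audit columns on the Foundation tables, mapped to GoldenGate
# @GETENV functions that return the operation type, commit time and commit SCN.
AUDIT_COLUMNS: Dict[str, str] = {
    "EDW_ROW_OPTYPE": "@GETENV('GGHEADER', 'OPTYPE')",
    "EDW_COMMIT_TIMESTAMP": "@GETENV('GGHEADER', 'COMMITTIMESTAMP')",
    "EDW_SCN": "@GETENV('TRANSACTION', 'CSN')",
}


def foundation_map(src_schema: str, fnd_schema: str, table: str) -> str:
    """Build one MAP statement that applies every change as an insert."""
    audit = ", ".join(f"{col} = {expr}" for col, expr in AUDIT_COLUMNS.items())
    return (
        f"MAP {src_schema}.{table}, TARGET {fnd_schema}.{table}, "
        f"INSERTALLRECORDS, COLMAP (USEDEFAULTS, {audit});"
    )


def render_foundation_mappings(options: Dict[str, object],
                               physical_schemas: Dict[str, str],
                               src_schema: str,
                               tables: List[str]) -> List[str]:
    """Emit mappings only when APPLY_FOUNDATION is true, mirroring the task option."""
    if not options.get("APPLY_FOUNDATION", False):
        return []
    # Resolve the Logical Schema named in FND_LSCHEMA to a physical schema name.
    fnd_schema = physical_schemas[str(options["FND_LSCHEMA"])]
    return [foundation_map(src_schema, fnd_schema, table) for table in tables]


if __name__ == "__main__":
    opts = {"APPLY_FOUNDATION": True, "FND_LSCHEMA": "EDW_FND"}
    for line in render_foundation_mappings(opts, {"EDW_FND": "FND"}, "SRC", ["CUSTOMER"]):
        print(line)
```

With `APPLY_FOUNDATION` set to false the function returns nothing, which corresponds to the task being skipped when Start Journal runs.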
Now, I did say I would get into the editing of the JKM in this post, but after describing the new features of the Knowledge Module, I’ll save the actual details for part 3 of the series! Up next, editing the JKM and building the necessary ODI objects so we can start journalizing.
How We Deliver Agile OBIEE Projects – Introducing ExtremeBI
Most OBIEE projects that we see are delivered through some sort of “waterfall” method, where requirements are defined up-front, there are several stages of development, one or more major releases at the end, and any revision to requirements takes the form of a change request. These work well where requirements can be defined up-front, and can be reassuring to customers when they want to agree a fixed price up-front with every subsequent change clearly costed. But, as with the development world in general, some customers are starting to look at “agile” methods for delivering BI projects, where requirements emerge over the course of a project and there isn’t so much of a fixed design or specification at the start. Instead the project adds features or capabilities in response to what are called “user stories”, making it more likely that what ends up getting delivered is in line with what users want – and where changes and additions to requirements are welcomed, rather than treated as extra-cost change requests.
OBIEE naturally lends itself to working in an agile manner, through the three-layer nature of the repository (RPD); by separating the physical representation of the source data from how it is then presented to the end-users, you can start from the off with the dimensional model that’s your end goal, and then over time evolve the back-end physical layer from pointing directly at the source system to instead point at a data warehouse or OLAP cube. In fact, I covered this approach back in 2008 in a blog post called “A Future Oracle OBIEE Architecture” where I positioned OBIEE’s BI Server as a “business logic layer”, and speculated that at some point in the future, OBIEE might be able to turn the logical > physical mappings in the RPD into actual ODI mappings and transformation.
In the end, although OBIEE’s aggregate persistence feature gave us the ability to spin off aggregate tables and cubes from the RPD logical model, full ETL “push-down” never came, although you can see traces of it if you have a good poke around the DLLs and directories under the BI Server component. What did happen though was Exadata; with Exadata, features such as SmartScan, and its ability to do joins between normalised tables much faster than regular databases, meant that it became possible to report directly against an OLTP schema, or an ODS-like foundation layer, only adding ETL to build a performance star schema layer if it was absolutely necessary. We covered this in a series of posts on Agile Data Warehousing with Exadata, and the focus of this method was performance – by adding Exadata, and the metadata flexibility in OBIEE’s RPD, we could deliver agile projects where Exadata gave us the performance even when we reported directly against a third-normal form data source.
And this approach worked well for our customers; if they’d invested in Exadata, and were open to the idea of agile, iterative development, we could typically deliver a working system in just a few months, and at all times what the users got was what they’d requested in their user story backlog. But there were still ways in which we could improve this method; not everyone has access to Exadata, for example, and reporting directly against a source system makes it tricky to add DW features like history and surrogate keys. So recently we introduced the successor to this approach, in the form of an OBIEE development method we called “ExtremeBI”. Building on our previous agile work, ExtremeBI introduced an integration element, using GoldenGate and ODI to replicate in real time any source systems we were interested in to the DW foundation layer, add the table metadata that DW systems expect, and then provide a means to transform the logical to physical RPD mappings into ODI ETL specifications.
But in a way, all the technical stuff is by-the-by; what this means in practice for customers is that we deliver working systems from the first iteration; initially, by reporting directly against a replicated copy of their source system (with replication and metadata enhancement by GoldenGate, and optionally ODI), and then over subsequent iterations adding more end-user functionality, or hardened ODI ETL code, all the while driven by end-user stories and not some technical design signed off months ago which no longer reflects what users actually want.
What we’ve found though, from several ExtremeBI customer engagements, is that it’s not just down to the technology and how well ODI, OBIEE and GoldenGate work. The first major factor in successful projects is having the project properly pre-qualified at the start; not every project, and not every client, suits agile working, and agile works best if you’re “all in” as opposed to just agreeing to work in sprints but still having a set-in-stone set of requirements which have to be met at a certain time. The second important success factor is proper project organisation; we’ve grown from just a couple of guys with laptops back in 2007 to a fully-fledged, end-to-end development organisation, with full-time delivery managers, a managed services desk and tools such as JIRA, and you need to have this sort of thing in place, particularly a project management structure that’s agile-friendly and a good relationship with the customer where they’re fully signed up to the agile approach. As such, we’ve found the most success where we’ve used ExtremeBI for fairly technically-savvy customers, for example an MIS department, who’ve been tasked with delivering something for a reasonable price and within a few months, who understand that not all requirements can be delivered, but really want their system to get adopted, delight their customer and focus its features on what’s important to end-users.
As well as processes and a method, we’ve also developed utilities and accelerators to help speed-up the initial setup, and ensure the initial foundation and staging layers are built consistently, with GoldenGate mappings already put in place, and ready for our developers to start delivering reports against the foundation layer, or use these foundation-layer tables as the basis of a data mart or warehouse build-out. The screenshot below shows this particular tool, built using Groovy and run from within the ODI Studio user interface, where the developer selects a set of source tables from an ODI model, and then the utility builds out the staging and foundation layers automatically, typically saving days over the manual method.
We’ve also built custom KMs for ExtremeBI, including one that uses Oracle Database’s flashback query feature to pull historical transactions from the UNDO log, as an alternative to Oracle Streams or Oracle GoldenGate when these aren’t available on the project.
Altogether, using Rittman Mead’s ExtremeBI method along with OBIEE, ODI and optionally GoldenGate has meant we’ve been able to deliver working OBIEE systems for customers over just a few months, typically for a budget of less than £50k. Coupled with cloud hosting, where we can get the customer up-and-running immediately rather than having to wait for their IT department to provision servers, we think this is the best way for most OBIEE11g projects to be delivered in the future. If you’re interested, we’ve got more details on our “ExtremeBI in the Cloud” web page, or you can contact me via email – mark.rittman@rittmanmead.com – if you’d like to discuss it more.