Category Archives: Rittman Mead

Taking a Look at the New Oracle Big Data SQL

Oracle launched their Oracle Big Data SQL product earlier this week, and it’ll be of interest to anyone who saw our series of posts a few weeks ago about the updated Oracle Information Management Reference Architecture, where Hadoop now sits alongside traditional Oracle data warehouses to provide what’s termed a “data reservoir”. In this type of architecture, Hadoop and its underlying technologies (HDFS, Hive and schema-on-read databases) extend the more structured relational Oracle data warehouses, making it possible to store and analyse much larger sets of data with much more diverse data types and structures. The issue customers face when trying to implement this architecture is that Hadoop is a bit of a “wild west” in terms of data access methods, security and metadata, making it difficult for enterprises to come up with a consistent, over-arching data strategy that works for both types of data store.

Oracle Big Data SQL attempts to address this issue by providing a SQL access layer over Hadoop, managed by the Oracle database and integrated with the regular SQL engine within the database. Where it differs from SQL-on-Hadoop technologies such as Apache Hive and Cloudera Impala is that there’s a single unified data dictionary, a single Oracle SQL dialect and the full management capabilities of the Oracle database over both sources, giving you the ability to define access controls across both sources and to use full Oracle SQL (including analytic functions, complex joins and the like) without having to drop down into HiveQL or other Hadoop SQL dialects. Those of you who follow the blog or work with Oracle’s big data connector products probably know of a couple of current technologies that sound like this; Oracle Loader for Hadoop (OLH) is a bulk loader that copies Hive or HDFS data into an Oracle database, typically faster than a tool like Sqoop, whilst Oracle Direct Connector for HDFS (ODCH) gives the database the ability to define external tables over Hive or HDFS data, and then query that data using regular Oracle SQL.

Where ODCH falls short is that it treats the HDFS and Hive data as a single stream, making it easy to read once but, like regular external tables, slow to access frequently as there’s no ability to define indexes over the Hadoop data; OLH is also good, but you can only use it to bulk-load data into Oracle – you can’t use it to query data in place. Oracle Big Data SQL uses an approach similar to ODCH but, crucially, it uses some Exadata concepts to move processing down to the Hadoop cluster, just as Exadata moves processing down to the Exadata storage cells (so much so that the project was called “Project Exadoop” internally within Oracle up to the launch) – which also means that it’s Exadata-only, and not available for Oracle Databases running on non-Exadata hardware.

As explained in the launch blog post by Oracle’s Dan McClary, Oracle Big Data SQL includes components that install on the Hadoop cluster nodes and provide the same “SmartScan” functionality that Exadata uses to reduce network traffic between storage servers and compute servers. In the case of Big Data SQL, this SmartScan functionality retrieves just the columns of data requested in the query (a process referred to as “column projection”), and sends back only those rows that satisfy the query predicates.


Combined with Hive’s ability to map unstructured data sources into regular columns and tables, and Big Data SQL’s support for Oracle NoSQL Database, the promise of this new technology is the ability to run queries across relational, Hadoop and NoSQL data sources using a common data dictionary and a common set of identity and data access controls.
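To make that concrete, here’s a minimal sketch of what this could look like, assuming the ORACLE_HIVE external table access driver that Big Data SQL provides; the table, columns and directory below are made up for illustration, and the exact DDL should be checked against the Big Data SQL documentation.

-- Illustrative only: expose a Hive table to Oracle as an external table
-- (with no access parameters, the name is assumed to default to a Hive
-- table called WEBLOG_HITS in the default Hive database)
CREATE TABLE weblog_hits (
  cust_id    NUMBER,
  page_url   VARCHAR2(500),
  hit_time   TIMESTAMP
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY default_dir
)
REJECT LIMIT UNLIMITED;

-- Hadoop-resident and relational data can then be joined in a single Oracle SQL statement
SELECT c.cust_city, COUNT(*) AS hits
FROM   weblog_hits w
JOIN   customers c ON c.cust_id = w.cust_id
GROUP  BY c.cust_city;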

There are a couple of potential downsides, though. First off, Big Data SQL will only be available as part of Oracle Big Data Appliance, which, though an impressive bit of hardware and software, addresses a much smaller market than the total set of Oracle customers looking to combine relational and Hadoop-based data; it’s also restricted to Oracle 12c on Exadata, meaning you’ll most probably need to do a database upgrade even if you’ve already got the required Exadata servers in place. Finally, it’s also restricted to the Oracle-specific distribution of Cloudera Hadoop, though if you’re using the BDA you’ll be using this anyway.

My other concern, though, is that Oracle now focus on SQL as their only access mechanism into Hadoop and big data, in a similar way to how they focused on SQL as their access route into OLAP when they incorporated Oracle Express into the Oracle Database back in the mid-2000s. Focusing on SQL over multidimensional languages such as Express 4GL and MDX meant you missed the real point of using a multidimensional OLAP database – which of course was being able to use a multidimensional query language – and my concern with Big Data SQL is that we’ll end up focusing on SQL rather than on technologies such as Spark, Pig and the NoSQL query languages which, combined with schema-on-read, are the real differentiators for Hadoop-based systems. As long as Big Data SQL is positioned as a “bonus” – a convenient way of getting data out of Hadoop once it’s been processed and analysed using more Hadoop-native technologies – it will be a great enabling and acceptance technology for enterprises, rather than one that ends up restricting them.

We’re not aware of any beta program, and I don’t think the launch webcast mentioned a specific date or BDA version when Big Data SQL will be out, but with OpenWorld coming up soon I’d expect to hear more about this over the next few months. We’re involved in a couple of significant Oracle Big Data Appliance implementations at the moment, and this product would address a real, pressing need for our customers, so I’m looking forward to getting more involved in it over the coming months.

This article was updated on 18th July to add the fact that Big Data SQL is only available on Exadata, and is not a generic Oracle Database 12c technology.

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 4: Start Journalizing!

In this post, the finale of the four-part series “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c”, I’ll walk through the setup of the ODI Models and start journalizing in “online” mode. This will utilize our customized JKM to build the GoldenGate parameter files based on the ODI Metadata and deploy them to both the source and target GoldenGate installation locations. Before I get into all of the details, let’s recap the first 3 posts and see how we arrived at this point.

Part one of the series, Getting Started, led us through a quick review of the Oracle Reference Architecture for Information Management and the tasks we’re trying to accomplish: loading both the Raw Data Reservoir (RDR) and Foundation schemas simultaneously using Oracle GoldenGate 12c replication, and setting it all up via Oracle Data Integrator 12c. We also reviewed the setup of the GoldenGate JAgent process, necessary for communication between ODI and GoldenGate when using the “online” version of the Journalizing Knowledge Module.

In part two, we reviewed the Journalizing Knowledge Module, “JKM Oracle to Oracle Consistent (OGG Online)”, its new features, and how much it has been improved in ODI 12c. Full integration with Oracle GoldenGate, through the use of the new ODI Tool “OdiOggCommand”, allows for the setup and configuration of GoldenGate process groups, trail file directories, and table-level supplemental logging, all from the ODI JKM.

Most recently, part 3, titled Setup Journalizing, walked us through the customizations of the “JKM Oracle to Oracle Consistent (OGG Online)” that will allow us to create the source-to-foundation replication alongside the standard source-to-RDR setup. We added a set of options, the first to control whether or not we replicate to the Foundation layer and the second to capture the ODI Logical Schema corresponding to the foundation schema. Then we added the task that will create the source-to-foundation table mapping inside the GoldenGate replicat parameter file, and set the options appropriately. I’ve been using the ODI 12c Getting Started VM, with the 12.1.2 version of ODI, for my demo setup. If you haven’t done so already, you can download the latest version of the VM, with ODI 12.1.3, from the Oracle Technology Network. I’d say that’s enough recap; now on to the final steps for GoldenGate and ODI 12c integration. Let’s start journalizing!

Setup ODI Models

Create Models

We first need to create the ODI Models and Datastores for the Source, Staging (Raw Data Reservoir) and Foundation tables. I will typically reverse engineer the source tables into a Model first, then copy them to the Staging and Foundation Models. This approach will ensure the column names and data types remain consistent with the source. I then execute a Groovy script to create the additional data warehouse audit columns in each of the Foundation Datastores.

models

Configure JKM

Unlike the 11g version of ODI, in 12c the “JKM Oracle to Oracle Consistent (OGG Online)” Knowledge Module will be set on the source Model. Open up the Model, in this example, PM_SRC, and switch to the Journalizing tab.

model-journalizing

We’ll set the Journalizing Mode to “Consistent Set” and then choose the customized JKM that we have been working with in this example, “JKM Oracle to Oracle Consistent (OGG Online) RM”, from the dropdown list. Now we are presented with the GoldenGate Process Selection parameters and a list of KM Options to configure.

Set the Process Selection parameters for the Capture Process and Delivery Process by selecting the Logical Schemas created in Part 3 - Setup GoldenGate Topology – Schemas. This setting drives the naming of the Extract, Pump, and Replicat parameter files and process groups in GoldenGate. If you plan to use GoldenGate for the initial load, select the processes here as well. I’m not setting mine, as I typically use a batch load tool, such as Oracle Datapump or insert across DBLink to perform the initial load of the target. As you can see in the image below, you can also create the Oracle GoldenGate Logical Schemas from the Model.

model-jkm-process

Next is the set of Options, including the two new Options added in the previous post. We can leave several of the values at their defaults, as they are specific to particular character sets or to implementations on a multi-node Oracle RAC setup.

model-jkm-options

The Options we do want to set:

ONLINE - Set to “true” to enable the automatic GoldenGate configuration when Start Journal is run.
LOCAL_TEMP_DIR – Enter a directory local to the machine on which the Start Journal process will be executed. Be sure the user executing the Start Journal process has privileges to create/modify/remove directories and files.
APPLY_FOUNDATION – Custom Option, set to “true” to enable the addition of the source-to-foundation mapping to the GoldenGate Replicat parameter file.
FND_LSCHEMA – The Logical Schema for the Foundation layer, necessary when APPLY_FOUNDATION is true.

After the Options are set, the Model can be saved and closed. Back in the Designer Navigator, add the Datastores to CDC by either selecting each individual Datastore and adding it or by right-clicking the Model and choosing to add the entire set all at once.

model-add-to-cdc

Before we get on with using the JKM, there is one thing I forgot to mention in the previous post. When setting up the Logical Schema for the GoldenGate “Delivery” process, you must also set the target Logical Schema. If you fail to do so, you’ll get an error stating “SnpLSchema does not exist” when attempting any Change Data Capture commands on the source Model.

logical-schema-set-target

With the JKM set and the tables added to Change Data Capture, we can now add a Subscriber. Subscribers allow multiple mappings to consume the change data from the J$ tables at different intervals. For example, a table may be consumed by one mapping every hour, and then by an additional mapping each night. Two different Subscribers would be used in this case. In our example, I’ll create a single Subscriber named “PERFECT_MATCH”. Make sure the process runs successfully, then it’s time to start Journalizing.

Start Capturing Changes

With the setup and configuration out of the way, the rest is up to the JKM. We’re now able to right-click the source Model and select Change Data Capture–>Start Journal. This executes all of the Start Journal related steps in the JKM, which will create the CDC Framework (J$ tables, JV$ views, etc.), generate and deploy the GoldenGate parameter files (Extract, Pump, and Replicat), and configure and start the GoldenGate process groups. Be sure that the source and target GoldenGate Manager and JAgent processes are running prior to executing the Start Journal process. Also, make sure that the database is in ArchiveLog mode and ready for GoldenGate to capture transactions.
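As a quick sanity check on the source database, something along these lines will confirm those logging prerequisites (table-level supplemental logging is handled for you by the JKM and GoldenGate via ADD TRANDATA):

-- Check archive log mode and database-level minimal supplemental logging
SELECT log_mode, supplemental_log_data_min FROM v$database;

-- Enable minimal supplemental logging if it is not already on
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;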

After the Start Journal process is successfully completed, you can browse to the source GoldenGate home directory, run GGSCI, and view the status of the OGG process groups.

ogg-src-info-all

It looks like the Extract and Pump are in place and running. Now let’s check out the Replicat on the target GoldenGate installation.

ogg-trg-info-all

Here we see that the Replicat process is in place, but not actually running. Remember from our JKM editing that we commented out the step that will start the Replicat process. This is to ensure we perform an initial load prior to applying any captured change data to the target.

The last bit of work that must be completed is the initial load. I won’t go into details here, but the way I like to do it is to load the source data to the target as of a captured SCN, using Oracle Datapump or a DBLink, rather than using GoldenGate process groups to perform the load. Note: the ODI 12c Getting Started Guide doesn’t even use the OGG initial load! Then, we can start the replicat in GoldenGate after the captured SCN, using the same approach I wrote about in a previous blog on ODI and GoldenGate 11g.
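In outline, that approach looks something like the sketch below; the schema names, directory, dump file, replicat group and SCN value are all placeholders, so treat it as an illustration rather than a ready-to-run script.

SQL> SELECT current_scn FROM v$database;

$ expdp system/password schemas=PM_SRC directory=DATA_PUMP_DIR dumpfile=pm_src.dmp flashback_scn=59271183
$ impdp system/password remap_schema=PM_SRC:EDW_RDR directory=DATA_PUMP_DIR dumpfile=pm_src.dmp

GGSCI> start replicat <replicat_group>, aftercsn 59271183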

JKM “Online” Mode

Beyond adding the parameter files and process groups, what exactly did the “online” version of the JKM do for us? If you browse to the directory that we set in the JKM Options under LOCAL_TEMP_DIR, you’ll find all of the GoldenGate files generated by the JKM. These files were generated locally, then uploaded to their proper GoldenGate home directory. Without “online” mode, they would have had to be copied to GoldenGate manually.

jkm-steps

Once uploaded, the obey files (batch files for GoldenGate commands) were executed.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTEOBEY" "-OBEY_FILE=/home/oracle/Oracle/Middleware/oggsrc/EXTPMSRC.oby"

And finally, JKM steps were generated to perform the creation of the process groups in GoldenGate.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
add extract EXTPMSRC, tranlog, begin now 
add exttrail /home/oracle/Oracle/Middleware/oggsrc/dirdat/oc, extract EXTPMSRC, megabytes 100
stop extract EXTPMSRC
start extract EXTPMSRC

If you ever need to stop the change data capture process, either to add additional tables or make modifications to the metadata, you can run the Drop Journal process. Not only will the CDC Framework of tables and views be removed, but the “online” mode also reaches into GoldenGate and drops the process groups that were generated by the JKM.

OdiOggCommand "-LSCHEMA=EXTPMSRC" "-OPERATION=EXECUTECMD"
stop extract RPMEDWP
delete extract RPMEDWP

In conclusion, the integration between GoldenGate and Oracle Data Integrator in 12c has been vastly improved over the 11g version. The ability to manage the entire setup process from within ODI is a big step forward, and I can only see these two products being further integrated in future releases. If you have any questions or comments about ODI or GoldenGate, or would like some help with your own implementation, feel free to add a comment below or reach out to me at michael.rainey@rittmanmead.com.

 

Simple Ways to Simplify: Quick Fixes to Enhance OBIEE Visuals

Over the past couple of months I’ve worked with a few clients to conduct assessments of their dashboard and analysis design. In most cases I noticed that the dashboards were overflowing with content, to the point that there was really no discernible “story” or “flow” to the information. Most of the dashboards were an overwhelming hub of large tables, excess dashboard prompts, and complicated charts and graphs. Often the client’s complaint was that the dashboards were not being fully adopted by the user base. The users knew that something was in fact wrong with what they were looking at, but just couldn’t put their finger on why the analysis process was so frustrating. While this may seem unavoidable when you are trying to provide your users with what they want and “need,” there are a few simple ways to address the issue. During the development process, simplicity is often overlooked and undervalued, and that is why a few key design principles can drastically improve the user experience.

In many cases that I’ve seen, all dashboards start with the good intention of quickly relaying information to the user. However, as the requirements process grows in length, more and more compromises are made to clean, consistent, functional design. Many of the dashboards in my assessments hand the user everything they ask for without ever questioning the value. Developers often settle on poor design just to have something accepted by their users and released into production.

While each client is unique and has their own set of issues to resolve, there are a few consistent principles of dashboard design that can be applied to everyone.

Top-Down Analysis – The practice of allowing users to drill from summary to detail gives your user community the ability to make their dashboards as dynamic as they need to be. This method will never give the user too much, or too little, information. With detail being a choice rather than the default, the user can log in and choose to drill into further detail, rather than being presented with everything all at once. While this is a widely accepted best practice when it comes to dashboard design, this principle is ignored more than you would think. Odds are that your VP does not need to see 50 rows of detail-level information in a table every time they view their dashboard. The primary purpose of presenting this level of detail is to answer an important question posed by analyzing your data at the summary level. The user can examine something simple like a trend analysis, and then decide whether further examination is needed. The benefit of this common principle is that your user is never overwhelmed and irritated with an overload of information. The dashboard should be treated with the same care and consideration you would give any product you put in front of a customer. Your organization’s dashboard (product) should be a joy to use, not an experience coupled with frustration.

top down image

Simplicity and Trimming the Fat – This is another very simple, yet often ignored, design principle. From my observation, many developers will create a chart or graph and leave all the default settings in place; no modifications needed, right? While there is nothing inherently wrong with this, the defaults leave a lot of extra pixels on the graph, such as unneeded axis titles, shadowing and an oversized canvas. With just a little effort here, you can remove all of these unnecessary pixels (pictured below) and provide a more professional, clean design. The point that I’m trying to make is, let’s not get lazy with our visualizations. Instead, we should give a lot of thought to what is useful in the visualization and what can be discarded without hindering the message. The less cluttered we make our visualizations, the more pleasant the user experience will be. And as we all know, the happier your user community is, the easier your life as a developer will be.

default to flat

Options for Options’ Sake – The types of visualizations used on a dashboard are one of the most important criteria for user adoption. Choosing visualizations that do not adequately display the type of analysis needed, or that do not tell the correct story about the data, can be a frustrating waste of time for the user. Just because there are a lot of available graphing options in your analytical tool does not mean they all need to be used. This mistake is often made in an attempt to visually enhance the dashboard, or to add “variety”. Try to consider things like the type of scale in the graph (nominal or interval, pictured below), and whether you want to provide summary or detail. Be sure to choose your graph based on these factors, rather than picking a graph that you think will add variety and then figuring out how you can make it useful.

nominal:interval graphs

Visually appealing dashboards are important; however, this only matters when the graphs enhance the user’s analytical experience. These mistakes are very costly because the overall goal of a dashboard is not to provide variety for variety’s sake, but to quickly and accurately relay a message. By focusing only on visualization variety, we run a terrible risk of rendering a dashboard useless.

There are a lot of great resources out there that provide more detail and can surely take your dashboards to the next level, so I certainly suggest reading up on information design methodology. I think the principles I’ve listed above are a great way to get started and provide some quick fixes on the road to enhancing the user experience within your organization.

 

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 3: Setup Journalizing

After a short vacation, some exciting news, and a busy few weeks (including KScope14 in Seattle, WA), it’s time to get the “GoldenGate and Oracle Data Integrator – A Perfect Match in 12c” blog series rolling again. Hopefully readers can find some time between World Cup matches to try integrating ODI and GoldenGate on their own!

To recap my previous two posts on this subject, I first started by showing the latest Information Management Reference Architecture at a high level (described in further detail by Mark Rittman) and worked through the JAgent configuration, necessary for communication between ODI and GoldenGate. In the second post, I walked through the changes made to the GoldenGate JKM in ODI 12c and laid out, at a high level, the edits necessary for loading the Foundation layer. Now, it’s time to make the edits to the JKM and set up the ODI metadata.

Before I jump into the JKM customization, let’s go through a brief review of the foundation layer and its purpose. The foundation schema contains tables that are essentially duplicates of the source table structure, but with the addition of the foundation audit columns, described below, that allow for the storage of all transactional history in the tables.

FND_SCN (System Change Number)
FND_COMMIT_DATE (when the change was committed)
FND_DML_TYPE (DML type for the transaction: insert, update, delete)
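As a minimal sketch, a foundation-layer version of one of the source tables in this example (SRC_CITY) might look like the DDL below; the business columns and datatypes are assumptions, and only the three audit columns are taken from the list above.

-- Illustrative foundation table: source columns plus the three audit columns
CREATE TABLE EDW_FND.SRC_CITY (
  CITY_ID          NUMBER,
  CITY_NAME        VARCHAR2(100),
  FND_SCN          NUMBER,        -- system change number of the source transaction
  FND_COMMIT_DATE  TIMESTAMP,     -- when the change was committed
  FND_DML_TYPE     VARCHAR2(10)   -- insert, update or delete
);
-- No primary key is enforced: every change is inserted, and row uniqueness
-- comes from the natural key plus FND_SCN (see the KEYCOLS discussion below).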

The GoldenGate replicat parameter file must be set up to map the source transactions into the foundation tables using the INSERTALLRECORDS option. This is the same option that the replicat uses to load the J$ tables, allowing only inserts and no updates or deletes. A few changes to the JKM will allow us to choose whether or not we want to load the Foundation schema tables via GoldenGate.

Edit the Journalizing Knowledge Module

To start, make a copy of the “JKM Oracle to Oracle Consistent (OGG Online)” so we don’t modify the original. Now we’re ready to make our changes.

Add New Options

A couple of new Options will need to be added to enable the additional feature of loading the foundation schema, while still maintaining the original JKM code. Option values are set during the configuration of the JKM on the Model, but can also have a default in the JKM.

APPLY_FOUNDATION

new-option-apply-fnd

This option, when set to true, will enable the new task (added below) during the Start Journal process, allowing it to generate the source-to-foundation mapping statement in the Replicat (apply) parameter file.

FND_LSCHEMA

new-option-fnd-schema

This option will be set to the Logical Schema name for the Foundation layer, and will be used to look up the physical database schema name that is output in the GoldenGate replicat parameter file.

Add a New Task

With the options created, we can now add the additional task to the JKM that will create the source-to-foundation table mappings in the GoldenGate replicat parameter file. The quickest way to add the task is to duplicate a current task. Open the JKM to the Tasks tab and scroll down to the “Create apply prm (3)” step. Right-click the task and select Duplicate. A copy of the task will be created in just the position we want, directly after the step we duplicated.

Rename the step to “Create apply prm (4) RM”, adding the additional RM tag so it’s easily identifiable as a custom step. From the properties, open the Edit Expression dialog for the Target Command. The map statement, just below the OdiOutFile line, will need to be modified. First, remove the IF statement code, as the execution of this step will be driven by the APPLY_FOUNDATION option being set to true.

Here’s a look at the final code after editing.

map <%= odiRef.getObjectName("L", odiRef.getJrnInfo("TABLE_NAME"), odiRef.getOggModelInfo("SRC_LSCHEMA"), "D") %>, TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "","D") %>.<%= odiRef.getJrnInfo("TABLE_NAME") %>, KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN)<%if (!odiRef.getOption("NB_APPLY_PROCESS").equals("1")) {%>, FILTER (@RANGE(#ODI_APPLY_NUMBER,<%= nbApplyProcesses %>,<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>))<% } %> INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);

The output of this step is going to be a mapping for each source-to-foundation table in the GoldenGate replicat parameter file, similar to this:

map PM_SRC.SRC_CITY, TARGET EDW_FND.SRC_CITY, KEYCOLS (CITY_ID, FND_SCN) INSERTALLRECORDS,
COLMAP (
USEDEFAULTS,
FND_COMMIT_DATE = @GETENV('GGHEADER' , 'COMMITTIMESTAMP'),
FND_SCN = @GETENV('TRANSACTION' , 'CSN'),
FND_DML_TYPE = @GETENV('GGHEADER' , 'OPTYPE')
);

The column mappings (COLMAP clause) are hard-coded into the JKM, with the parameter USEDEFAULTS mapping each column one-to-one. We also hard-code each foundation audit column mapping to the appropriate environment value from the GoldenGate trail file, via the GETENV GoldenGate function (see the GoldenGate reference documentation for the full list of values it can return).

The bulk of the editing on this step is done to the MAP statement. The out-of-the-box JKM is set up to apply transactional changes to both the J$ (change) table and the fully replicated target table. Now we need to add the mapping to the foundation table. In order to do so, we first need to identify the foundation schema and table name for the target table using the ODI Substitution API.

map ... TARGET <%= odiRef.getSchemaName("" + odiRef.getOption("FND_LSCHEMA") + "", "D") %> ...

The nested Substitution API call allows us to get the physical database schema name based on the ODI Logical Schema that we will set in the option FND_LSCHEMA, during setup of the JKM on the ODI Model. Then, we concatenate the target table name with a dot (.) in between to get the fully qualified table name (e.g. EDW_FND.SRC_CITY).

... KEYCOLS (<%= odiRef.getColList("", "[COL_NAME]", ", ", "", "PK") %>, FND_SCN) ...

We also added the FND_SCN to the KEYCOLS clause, forcing the uniqueness of each row in the foundation tables. Because we only insert records into this table, the natural key will most likely be duplicated numerous times should a record be updated or deleted on the source.

Set Options

The previously created task, “Create apply prm (4) RM”, should be set to execute only when the APPLY_FOUNDATION option is “true”. On this step, go to the Properties window and choose the Options tab. Deselect all options except APPLY_FOUNDATION, and when Start Journal is run, this step will be skipped unless APPLY_FOUNDATION is true.

jkm-set-task-option

Edit Task

Finally, we need to make a simple change to the “Execute apply commands online” task. First, add the custom step indicator (in my example, RM) to the end of the task name. In the target command expression, comment out the “start replicat …” command by using a double-dash.

--start replicat ...

This prevents GoldenGate from starting the replicat process automatically, as we’ll first need to complete an initial load of the source data to the target before we can begin replication of new transactions.

Additional Setup

The GoldenGate Manager and JAgent are ready to go, as is the customized “JKM Oracle to Oracle Consistent (OGG Online)” Journalizing Knowledge Module. Now we need to set up the Topology for both GoldenGate and the data sources.

Setup GoldenGate Topology - Data Servers

In order to properly use the “online” integration between GoldenGate and Oracle Data Integrator, a connection must be set up for the GoldenGate source and target. These will be created as ODI Data Servers, just as you would create an Oracle database connection. But rather than provide a JDBC URL, we will enter connection information for the JAgent that we configured in the initial post in the series.

First, open up the Physical Architecture under the Topology navigator and find the Oracle GoldenGate technology. Right-click and create a new Data Server.

create-ogg-dataserver

Fill out the information regarding the GoldenGate JAgent and Manager. To find the JAgent port, browse to the GG_HOME/cfg directory and open “Config.properties” in a text viewer. Down towards the bottom you’ll find “jagent.rmi.port”, which is used when OEM is enabled.

####################################################################
## jagent.rmi.port ###
## RMI Port which EM Agent will use to connect to JAgent ###
## RMI Port will only be used if agent.type.enabled=OEM ###
####################################################################
jagent.rmi.port=5572

The rest of the connection information can be recalled from the JAgent setup.

setup-ogg-dataserver

Once completed, test the connection to ensure all of the parameters are correct. Be sure to set up a Data Server for both the source and target, as each will have its own JAgent connection information.

Setup GoldenGate Topology - Schemas

Now that the connection is set, the Physical Schema for both the GoldenGate source and target must be created. These schemas tie directly to the GoldenGate process groups and will be the name of the generated parameter files. Under the source Data Server, create a new Physical Schema. Choose the process type of “Capture”, provide a name (8 characters or less due to GoldenGate restrictions), and enter the trail file paths for the source and target trail files.

Create the Logical Schema just as you would with any other ODI Technology, and the extract process group schema is set.

For the target, or replicat, process group, perform the same actions on the GoldenGate target Data Server. This time, we just need to specify the target trail file directory, the discard directory (where GoldenGate reporting and discarded records will be stored), and the source definitions directory. The source definitions file is a GoldenGate representation of the source table structure, used when the source and target table structures do not match. The Online JKM will create and place this file in the source definitions directory.
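The Online JKM generates this source definitions file for you, but should you ever need to recreate it by hand, GoldenGate’s defgen utility does the same job. A minimal parameter file might look like the sketch below, where the paths, user, password and table wildcard are placeholders.

-- defgen.prm (illustrative)
defsfile ./dirdef/PM_SRC.def
userid ogg_user, password ogg_password
table PM_SRC.*;

It would then be run from the GoldenGate home with something like “defgen paramfile ./dirprm/defgen.prm”, and the resulting .def file copied into the source definitions directory configured above.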

Again, set up the Logical Schema as usual and the connections and process group schemas are ready to go!

The final piece of the puzzle is to set up the source and target data warehouse Data Servers, Physical Schemas, and Logical Schemas. Use the standard best practices for this setup, and then it’s time to create ODI Models and start journalizing. In the next post, Part 4 of the series, we’ll walk through applying the JKM to the source Model and start journalizing using the Online approach to GoldenGate and ODI integration.

Going Beyond the Summary Advisor with TimesTen for Exalytics

I’m over in Seattle at the moment for ODTUG KScope’14, and one of the topics I’m presenting on this week is the use of TimesTen with OBIEE; what I’m going to talk about is taking TimesTen beyond its current use as an in-memory cache for aggregates created by the Summary Advisor, and instead using it to store additional custom aggregates, “hot” data from the source data warehouse, and other custom structures created using SQL*Developer and potentially loaded using Oracle Data Integrator. The point of the session is to show one of the main benefits of TimesTen even in the world of the new In-Memory Option for Oracle Database – it’s a flexible but Oracle-compatible database that’s typically under the control of the BI team and open to much more customisation than most customers realise.

To illustrate the point, I’m going to run through an example using the Oracle Database Sales History (SH) schema as the source in this article, with the example going through five stages of development:

1. First, I’ll load TimesTen for Exalytics in the usual way, by having the Summary Advisor run some recommendations and then generate aggregates to load into TimesTen.

2. Then, I’ll add some of my own aggregates into TimesTen, by using the Aggregate Persistence Wizard, so that we’ve now got a base set of aggregates from me (ones that I know we’ll need) plus whatever the Summary Advisor recommends based on historic query patterns.

3. Next, I’ll use TimesTen to hold what we refer to as “hot data” – the actual transactional data from the source database, in this case from the last six months. So now, TimesTen will contain both the full transaction-level data that users are most likely to query at detail level, plus aggregations of the full dataset over time, giving us even more chance that TimesTen will be able to fulfil a user’s query.

4. Then I’ll supplement this data in TimesTen with some additional reference data, that I’ll bring in from file using a tool like Oracle Data Integrator or the Oracle Database external table feature, modifying the existing TimesTen table structure using SQL*Developer to hold these additional dimension attributes.

5. Finally, I’ll set up incremental refresh of all of this, in order to avoid the full table-drop-and-reload approach that the Summary Advisor and Aggregate Persistence Wizard use, and to make data refreshing more efficient.

Let’s start with steps 1 and 2, creating the aggregate layer for the underlying Sales History Oracle data source.

1. Creating Initial Aggregate using the Summary Advisor

We can use OBIEE’s Summary Advisor, and the underlying Aggregate Persistence mechanism that the Summary Advisor uses, to build the in-memory aggregates for our underlying datasource. Starting with the simplest part first, we’ll use the Summary Advisor to create a set of aggregates based on recommendations from historic query data.

Running the Summary Advisor in my case brings up a recommendation to create three aggregates:

Sshot 4

Running the script that the Summary Advisor generates brings up one of the issues you’re likely to hit when using this tool, though – your RPD has to be absolutely “rock solid” in terms of correctness, otherwise your aggregate build will fail. In the case of this example, the Summary Advisor aggregate persistence script starts off running OK, but then errors at the end when one of the aggregate tables fails to build:

c:\Middleware\Oracle_BI1\bifoundation\server\bin>nqcmd -d coreapplication_OH799
36239 -u weblogic -p welcome1 -s c:\ttscripts\summ_adv_SH_full_blog.sql

-------------------------------------------------------------------------------
Oracle BI ODBC Client
Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------


create aggregates

"ag_561184940"
for "Sales History (Oracle + TT)"."SALES"("AMOUNT_SOLD","QUANTITY_SOLD")
at levels ("Sales History (Oracle + TT)"."CustomerDim"."City", "Sales History
Oracle + TT)"."TimesDim"."Month", "Sales History (Oracle + TT)"."ProductsDim"."Category")
using connection pool "tt_aggr_store"."Connection Pool"
in "tt_aggr_store".."EXALYTICS",

…

"ag_1162984574"
for "Sales History (Oracle + TT)"."SALES"("AMOUNT_SOLD","QUANTITY_SOLD")
at levels ("Sales History (Oracle + TT)"."TimesDim"."Year", "Sales History (Or
cle + TT)"."ProductsDim"."Category", "Sales History (Oracle + TT)"."PromoDim"."Promo Subcategory")
using connection pool "tt_aggr_store"."Connection Pool"
in "tt_aggr_store".."EXALYTICS"
[10058][State: S1000] [NQODBC] [SQL_STATE: S1000] [nQSError: 10058] A general error has occurred.
Statement preparation failed


Processed: 1 queries
Encountered 1 errors

Looking at the list of tables left in TimesTen at that point, I can see that one of the dimensions failed to build, which then cascaded down to the dependent fact table build failing:

C:\Users\Administrator>ttisql

Copyright (c) 1996, 2013, Oracle and/or its affiliates. All rights reserved.

Type ? or "help" for help, type "exit" to quit ttIsql.

Command> connect "DSN=tt_exalytics_3;uid=exalytics;pwd=welcome1";

Connection successful: DSN=tt_exalytics_3;UID=exalytics;DataStore=C:\TimesTen\tt_data_3;DatabaseCharacterSet=AL32UTF8;ConnectionCharacterSet=US7ASCII;DRIVER=C:\TimesTen\TT1122~2\bin\ttdv1122.dll;LogDir=C:\TimesTen\tt_data_3;PermSize=4000;TempSize=2000;LockWait=60010.0;SQLQueryTimeout=60000;TypeMode=0;QueryThreshold=60000;PLSCOPE_SETTINGS=IDENTIFIERS:NONE;RangeIndexType=1;(Default setting AutoCommit=1)

Command> tables;

  EXALYTICS.SA_CATEGOR0000AFE8
  EXALYTICS.SA_CHANNEL0000B8F9
  EXALYTICS.SA_CITY0000AC6A
  EXALYTICS.SA_MONTH0000AC81
  EXALYTICS.SA_PROMO_S0000B8E8
  EXALYTICS.TEST

6 tables found.

Summary Advisor aggregates failing to build is something that regular Exalytics developers will be used to, at least from when they first use the tool. The trick is to make sure you run the Model Checker, within the BI Administration Tool, before you try and generate some aggregates; whilst your RPD might pass the regular consistency check and be valid as a repository that can go online and be used for reporting, it may not be consistent or logically correct from an aggregation standpoint. Looking at the NQQuery.log file, I can see that the first aggregate failed to build because aggregate persistence rejected the shared product dimension that all of the aggregate fact tables used.

Running the model checker on my business model, I can see there are two errors listed for the product dimension, and a bunch of warnings where I’ve not added dimension attribute columns to their relevant dimension levels.

Sshot 5

Running the logical SQL given in the two error messages at the start, I can see that there’s an error in the data that I’m using for my Products dimension data source, in that one or more of the dimension levels contains keys that are duplicates – in this case because there are two “Accessories” product subcategories with different IDs. I go into SQL*Developer and correct this issue, and also add the logical columns listed as “warnings” into their respective logical dimension levels, so that running the model checker again gives me this message instead:

Sshot 6
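If you want to track down this kind of duplicate-key issue in the source data yourself, a simple check along these lines (written against the SH sample schema used in this example) will list any subcategory names that map to more than one ID:

-- Level keys that resolve to more than one ID in the Products source
SELECT prod_subcategory,
       COUNT(DISTINCT prod_subcategory_id) AS id_count
FROM   sh.products
GROUP  BY prod_subcategory
HAVING COUNT(DISTINCT prod_subcategory_id) > 1;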

I then pick two of the aggregate recommendations and run the script for them, so in the end I’m left with the aggregate tables below in my RPD.

Sshot 7

Thereafter, I can run the Summary Advisor again to generate some more recommendations, or I can add my own specific aggregates into the TimesTen database using the Aggregate Persistence Wizard, for example to add an aggregate I know users are going to want to use.

2. Adding Custom Aggregates using the Aggregate Persistence Wizard

Generating aggregates using the Aggregate Persistence Wizard uses the same underlying mechanism to create and populate the aggregate tables, but in this case I select the specific measures, dimensions and hierarchy levels I want to aggregate by. Here, I use the Aggregate Persistence Wizard to create an additional aggregate using the Promotions dimension, as I know some new reports coming along will make use of this pre-aggregation.

Sshot 9

So where I am at the end of all this is with a TimesTen aggregate data mart containing three aggregate fact tables – two recommended by the Summary Advisor, one I added myself – along with supporting dimension tables.

Sshot 11

And this is great for handling dashboard queries that request summarised data, through a graph or pivot table analysis. But what about when users want to drill down to the detail, or run reports against recent transactional activity? That’s where the second TimesTen use case, “hot data”, comes in.

3. Populating TimesTen with Transactional “Hot Data”

So far we’ve looked at populating our TimesTen database with aggregate data, but what about making the most recent transactions available in TimesTen as well, so that analyses running against recent activity run fast too, even at the detail level? To do this we can use the ttImportFromOracle utility to replicate the Sales History schema into TimesTen, and then edit the data loading script it generates to load only the last six months of data.

This utility only works for Oracle sources, and ships with recent versions of TimesTen in the /support directory. I can have the utility scan the data in my source database to recommend the most space-efficient TimesTen datatypes, and I can also use it to recommend compression settings for use with the TimesTen for Exalytics Columnar Compression feature.

As ttImportFromOracle requires you to have a TimesTen schema with the same name as the one you’re importing from in Oracle, I create a new schema in my TimesTen for Exalytics database, ready for importing from the Oracle source:

create user sh identified by welcome1;
grant create session to sh;
grant create table to sh;
grant select on SYS.OBJ$ to sh;
grant admin to sh;

Then I can use the utility to create the DDL and data loading scripts that I’ll then use to import the hot data from Oracle.

c:\TimesTen\tt1122_64_3\support>ttImportFromOracle.exe -oraconn sh/password@orcl  -tables CUSTOMERS PRODUCTS SALES CHANNELS PROMOTIONS TIMES -typeMap 2,1 -compression 1

Beginning processing
Resolving any tablename wildcards
Eliminating any duplicate tables
Getting metadata from source
Generating database user list
Assigning TimesTen datatypes
Analyzing source tables (this may take some time)
Analyzing table 'SH.CUSTOMERS' ...
Analyzing table 'SH.PRODUCTS' ...
Analyzing table 'SH.SALES' ...
Analyzing table 'SH.CHANNELS' ...
Analyzing table 'SH.PROMOTIONS' ...
Analyzing table 'SH.TIMES' ...
Optimizing TimesTen datatypes
Estimating compression ratios
Generating output files
Finished processing

This set of scripts creates the tables and indexes to hold the replicated data from Oracle, and I can edit the “LoadData.sql” script that comes with the DDL scripts to load just the last six months of data. To do this I split the “ttLoadFromOracle” TimesTen procedure calls in this script into ones for the dimensions and one for the fact table load, which, after amending it to load just six months of data, looks like this:

timing 1; 

call ttLoadFromOracle('SH', 'SALES', 'SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID, S.CHANNEL_ID, S.PROMO_ID, S.QUANTITY_SOLD, S.AMOUNT_SOLD 
FROM SH.SALES S, SH.TIMES T 
WHERE S.TIME_ID = T.TIME_ID 
AND  T.CALENDAR_MONTH_DESC > ''2001-06''');

Then I connect to TimesTen as the SH user, providing the password to the Oracle database as part of the connect string, and then run the scripts I’ve just generated:

Command> connect "DSN=tt_exalytics_3;uid=sh;password=welcome1;oraclepwd=password";
Command> run c:\ttimportscripts\CreateTables.sql
Command> run c:\ttimportscripts\LoadData_dims.sql
Command> run c:\ttimportscripts\LoadData_fact.sql
Command> run c:\ttimportscripts\CreateIndexes.sql
Command> run c:\ttimportscripts\UpdateStats.sql

I can then go over to SQL*Developer and see the new TimesTen tables there, with the optimised datatypes provided by ttImportFromOracle:

Sshot 13

and also import these new tables into the RPD, adding an extra schema alongside the aggregate tables:

Sshot 12

Then, the way to make use of this six months of transactional data in TimesTen is to set up “fragmentation” in the OBIEE RPD, so that the BI Server goes to the TimesTen data source when queries require data from the past six months, and to the standard Oracle data source if data further back than that is required. In addition, because of the aggregates we’ve also set up, it’ll go to the TimesTen aggregate tables if queries request aggregated data, so TimesTen performs the role of holding both “hot” data and aggregate data.

The first step in setting this up is to map the new TimesTen tables in as additional logical table sources in the existing business model. In the screenshot below, you can see the new TimesTen logical table sources added alongside the existing Oracle and TimesTen aggregate logical table sources; when you do this, make sure you remove any additional logical table keys that might come across when you map in the new TimesTen tables.

Sshot 14

Then, for just the fact table within the logical model, edit the Oracle detail-level and the TimesTen “hot data” detail-level logical table sources and add fragmentation conditions to define what time period each source covers.

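As a rough sketch, using the calendar month column and the six-month cut-off from the load script earlier (the exact expressions will depend on your own business model), the two Fragmentation Content expressions might look like this:

TimesTen "hot data" logical table source:
"Sales History (Oracle + TT)"."TIMES"."CALENDAR_MONTH_DESC" > '2001-06'

Oracle detail-level logical table source:
"Sales History (Oracle + TT)"."TIMES"."CALENDAR_MONTH_DESC" <= '2001-06'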

Do this just for the fact table logical table source; then, when a query comes through to the BI Server, if it needs to refer to the TimesTen fact logical table source it’ll join out to the relevant TimesTen hot data logical table sources to get the rest of the required data, and when it needs to go to the Oracle fact table logical table source, it’ll join out to the Oracle LTSs, as shown in the diagram below.

(Diagram: the BI Server joining out to either the TimesTen “hot data” or the Oracle logical table sources, depending on which fact source a query uses.)

Then, when you include the TimesTen aggregate tables in the architecture as well, you’ve got three potential sources of data for user reports, all handled seamlessly and automatically by the BI Server: the Oracle and TimesTen “hot” data sources for detail-level reporting, with one or the other (or both) used for queries at the transaction level, and the TimesTen aggregate tables when they would answer a query faster than rolling up the detail-level sources on the fly.

(Diagram: the three potential sources for a query – the Oracle detail-level source, the TimesTen “hot data” source and the TimesTen aggregate tables.)

4. Supplementing with Additional Reference Data

One of the advantages of using a mid-tier database like TimesTen that’s under the control of the BI department is that you can be a lot more “agile” in bringing in new data sources than is often the case when you’re reporting against a data warehouse that’s strictly governed and a shared corporate resource. TimesTen lends itself well to agile development too, in that you can work with it using tools like SQL*Developer and easily load data into it using ETL tools such as Oracle Data Integrator, or leverage the connectivity features in the Oracle Database and then use ttImportFromOracle.

Working with TimesTen database structures is more or less the same process as working with Oracle ones, but there are a couple of differences you need to be aware of when developing your TimesTen tables. Adding new columns to a table is possible, but these can end up stored “out of line” with the main column set, which can cause performance issues when queried, or compatibility issues when loading using tools like ttImportFromOracle. If you’re using TimesTen to store “hot” data, make sure you’ve got corresponding tables for this reference data in the source Oracle database, and make sure you select the most appropriate TimesTen datatypes for your new data (for example, TT_TINYINT and TT_INTEGER) rather than just the default Oracle-like datatypes.

Most importantly, for all of your TimesTen tables, make sure you run the Index Advisor after you start to run user reports, in order to ensure you’re using the right indexes for your particular query patterns, and make sure you gather stats on your TimesTen tables as well as the Oracle ones, so that the TimesTen query optimiser can generate the most efficient query plans.
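From ttIsql, that might look something like the sketch below; the Index Advisor capture arguments are illustrative, so check the built-in procedure reference for your TimesTen release.

Command> call ttOptUpdateStats('SH.SALES', 1);
Command> call ttIndexAdviceCaptureStart(0, 0);
(run a representative set of report queries, then capture the advice)
Command> call ttIndexAdviceCaptureEnd(0);
Command> call ttIndexAdviceCaptureOutput(0);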

5. Setting up Incremental Refresh

The last thing we want to set up for our TimesTen analytic data mart is some way to incrementally refresh the aggregates we built using the Summary Advisor and Aggregate Persistence Wizard. The way these tools work is that they drop and then recreate the aggregate tables each time you load them, adding and removing them from the RPD at the start and end of the data load, as shown in the diagram below:

(Diagram: the standard Summary Advisor / Aggregate Persistence refresh, which drops and recreates the aggregate tables and updates the RPD on each load.)

What you can do, though, as outlined in this earlier blog post by Robin Moffatt, is take the POPULATE BI Server command that you’ll find in the nqquery.log file after an aggregate build, and use it to just refresh the aggregates in-place, without dropping the tables beforehand or removing them from the RPD, as shown in the diagram below.

(Diagram: incremental refresh using the POPULATE command to reload the aggregates in place, leaving the tables and RPD mappings intact.)

Looking through the nqquery.log file I can see various POPULATE commands for the TimesTen dimension tables, like this:

populate "SA_Promo_C0000B8E4" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as  select_business_model "Sales History (Oracle + TT)"."PROMOTIONS"."PROMO_CATEGORY" as "PROMO_CATE0000B8C6","Sales History (Oracle + TT)"."PROMOTIONS"."PROMO_CATEGORY_ID" as "PROMO_CATE0000B8C7", RCOUNT (1) as "Promo_C_0000B8E4SK" from "Sales History (Oracle + TT)"; [[

If I then couple that with the INACTIVE_SCHEMAS variable being set to the TimesTen physical data source, add a command to truncate each dimension table before the load, and alter the fact table POPULATE command’s SELECT statement to load in just this past month’s data, deleting that month from the target fact table beforehand, I’ve got a full incremental refresh I can run from a cron job or other scheduler. The script excerpt below shows such an example, with just a few of the dimensions being reloaded.

For the “hot data” tables it’s a similar case of taking the existing LoadData.sql and either modifying it to load in just the new transactions, or simply reloading all six months if that turns out to be easier; a small sketch of the first approach follows the aggregate refresh script below.

execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_Categor0000AFE8;
 
SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS_AGG_WIZ"':
populate "SA_Categor0000AFE8" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as  select_business_model "SH (Oracle)"."PRODUCTS"."PROD_CATEGORY_DESC" as "PROD_CATEG0000AFB4", RCOUNT (1) as "Categor_0000AFE8SK" from "SH (Oracle)";
 
execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_Month0000AC81;
 
SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "SA_Month0000AC81" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as  select_business_model "SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" as "CALENDAR_M0000AC0A","SH (Oracle)"."TIMES"."CALENDAR_QUARTER_DESC" as "CALENDAR_Q0000AC0E","SH (Oracle)"."TIMES"."CALENDAR_YEAR" as "CALENDAR_Y0000AC12", RCOUNT (1) as "Month_0000AC81SK" from "SH (Oracle)";
 
execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_City0000AC6A;
 
SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "SA_City0000AC6A" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as  select_business_model "SH (Oracle)"."CUSTOMERS"."CUST_CITY" as "CUST_CITY0000ABC5","SH (Oracle)"."CUSTOMERS"."CUST_STATE_PROVINCE" as "CUST_STATE0000ABD4", RCOUNT (1) as "City_0000AC6ASK" from "SH (Oracle)";
 
execute physical connection pool "tt_aggr_store"."Connection Pool" delete from ag_SALES where CALENDAR_M0000AC0A = '2001-12';
 
SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "ag_SALES" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as  
select_business_model "SH (Oracle)"."CUSTOMERS"."CUST_CITY" as "CUST_CITY0000ABC5",
                      "SH (Oracle)"."SALES"."AMOUNT_SOLD" as "AMOUNT_SOL0000ABF6",
                      "SH (Oracle)"."SALES"."QUANTITY_SOLD" as "QUANTITY_S0000ABFB",
                      "SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" as "CALENDAR_M0000AC0A",
                      "SH (Oracle)"."PRODUCTS"."PROD_CATEGORY_DESC" as "PROD_CATEG0000AFB4" 
                      from "SH (Oracle)"
                      where "SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" = '2001-12';
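And as a minimal sketch of the hot-data reload mentioned above (the month value is just an example), the idea is to delete the most recent month from the TimesTen copy of the fact table and then re-load just that month with ttLoadFromOracle:

Command> delete from sh.sales where time_id in (select time_id from sh.times where calendar_month_desc = '2001-12');
Command> call ttLoadFromOracle('SH', 'SALES', 'SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID, S.CHANNEL_ID, S.PROMO_ID, S.QUANTITY_SOLD, S.AMOUNT_SOLD FROM SH.SALES S, SH.TIMES T WHERE S.TIME_ID = T.TIME_ID AND T.CALENDAR_MONTH_DESC = ''2001-12''');
Command> commit;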

Conclusion

So what we’ve got here then is a much-expanded use of TimesTen in the context of Exalytics and OBIEE; we’ve not only used it as an in-memory cache for Summary Advisor aggregates, but we’ve used the same underlying mechanism to create other aggregates that we know will be useful for future queries.

We’ve also made use of the 1-2TB of RAM on the Exalytics server to cache the last six months of detail-level transactional data, making it even more likely that TimesTen will be able to answer all of the users’ queries.

Once you’ve got your data in TimesTen, you’ve got access to the same full set of analysis functions that are available when reporting against regular Oracle databases, with TimesTen for Exalytics giving you more native aggregation and analysis functions compared to standard TimesTen, and the BI Server “functionally compensating” for anything not natively available in TimesTen by performing the calculation itself, using the raw data provided by TimesTen.

Finally, we’ve talked about how we can load additional datasets into TimesTen via ODI or ttImportFromOracle, potentially using the latter in conjunction with external tables to bring in file data, and then looked at what’s involved in trickle-feeding the TimesTen cache rather than dropping and reloading it each time.

If you’re at KScope’14 in Seattle this week, I’m presenting on this topic on the Wednesday; if not, and you’re potentially interested in us helping you get more out of your TimesTen for Exalytics install, just drop me a line at mark.rittman@rittmanmead.com.