It is known that Oracle focuses its sales on cloud this year and in the BI world, we have seen the introduction of Oracle Business Intelligence Cloud Service – BICS – that we already covered on the blog. Another great product to store files and unstructured data is Oracle Storage Cloud Service, that can also be used as a staging area before loading a BICS schema. To provision our BI system in the cloud or to backup our files, it’s important to find a reliable and maintainable way to move on-premise data to the cloud. And when we speak about reliable and maintainable way to move data, I can’t help but think about Oracle Data Integrator. Luckily, Ayush Ganeriwal wrote two excellent blog posts explaining how to use ODI to load these two cloud services! And he even gave away the Knowledge Modules, ODI Open Tools, Jar files and other things he developed to make it work.
Load data to Oracle Storage Cloud Service with a package.
Oracle Storage Cloud Service is used to store files and unstructured data in the cloud, very much like Amazon S3. In the case of a company that has moved to the cloud, they need some space to store their files that will be used by other cloud services. Oracle Storage Cloud Service is the place to store all these files. If you want to use it as a cloud backup of your local filesystem, you might even choose Oracle Storage Cloud Archive Service which has cheaper storage but cost more when retrieving or editing data, similarly to Amazon Glacier .
Oracle Storage Cloud Service offers a RESTful web service API to load data as well as a Java Library directly wrapping this API and supporting client-side encryption. When it comes to load data, what could be better than using the tool we already use for all the rest? In this blog post , Ayush details how he created an ODI OpenTool that can load data to the Oracle Storage Cloud Service using the Java Library.
We first need to download the Java Library and place it in our ODI Studio or Agent classpaths. Below are the default locations for these classpaths, but we can also specify additional classpath if we want. Don’t forget to close ODI Studio and shutdown the agent(s) before adding the files.
ODI Studio on Unix : ~/.odi/oracledi/userlib ODI Studio on Windows : %APPDATA%odioraclediuserlib 12c Standalone Agent : /odi/agent/lib 11g Standalone Agent : /oracledi/agent/drivers/
For a JEE agent, we need to create a server template as described in the doc : https://docs.oracle.com/middleware/1212/odi/ODIDG/setup_topology.htm#CHDHGIGI.
In the same folder, we can also drop the Jar file for the Open Tool that Ayush created and posted on java.net (last link in the folder at the moment of writing). Remember that all the drivers, libraries and Open Tools have to be present in every ODI Studio or agent install.
We can now reopen ODI Studio as we need to register the new Open Tool in the master repository so everyone can use it. In the ODI Menu, click on Add Remove Open Tools and then on the googles icon to search for the Open Tools. In our case, the class is under the package oracle.odi.rest.
Select the OracleStorageCloudGet tool and click the Plus icon to add it. There we can see the description of the tool, the command, the parameters and the icon used. Let’s add the OracleStorageCloudPut as well and click OK.
That’s it! We can now see two new ODI Open Tools under the Plugins category in the Package Toolbox. We can add it to the package and fill all the parameters or copy/paste the command to use it in an ODI Procedure.
Load data to Oracle Business Intelligence Cloud Service with a mapping.
BICS offers a Data Loader which is actually an Apex application taking Excel, CSV, text or zip files as input. But this is more a manual process used when working with sample data or really small dataset. If we plan to use BICS in a production environment, we need something more resilient that can be scheduled and that handles errors. We could use Data Sync but maybe you don’t want to introduce a new tool. Luckily, BICS also provides a RESTful web service API we can use to load data. Again, ODI is the perfect tool to create our own reusable components to integrate with new sources or targets. And once again, Ayush did all the job for us as described in this article.
The first step is similar to what we did previously, we need make all the drivers and libraries available for ODI Studio and the agent. The steps in the two Knowledge Modules contains Jython code, calling some java methods from the odi-bics.jar file. These methods – just like the Oracle Storage Cloud Service java library – will use the Jersey Libraries to do the RESTful web service calls so we will need them as well. Basically, just take all the jar files from the archive you can find on java.net under the name “RKM and IKM for Oracle BI Cloud Service”. As mentioned above, we need to add it to the classpath of ODI Studio and the agent installs and restart them.
The article mentions that we need to create a new Technology for BI Cloud Service but an XML export is already created in the archive. Go in the Topology, right click on Technology and import the xml file.
We can also import the two Knowledge Modules, either globally or under a single project. Ayush wrote one RKM to reverse-engineer the tables from our BICS schema and one Multi-Connections IKM to integrate the data from the on-premise database to the BICS schema in the cloud. I had to tweak the IKM a little bit but it might be linked to my particular setup. The steps were not displayed in the order it was set in the XML file and I had to reorder it properly to place the drop before the create and truncate. Probably a bug, I will investigate that later.
We are done with the install of new components and we can now set up the topology. First create a Dataserver with the newly imported technology and provide a login/password and the URL to the BICS instance as the Service URL. From my testing and unlike Ayush’s screenshot, it should start with https and should not include the port number. Of course, we need to create a physical schema where we need to provide our Identity Domain in both Tenant fields. We can leave the Database fields empty. The last step to do in the Topology is to create a Logical schema and associate it to the Physical schema through a context.
We can then switch to the Designer tab and create a new Model with BI Cloud Service technology and that Logical Schema. On the Reverse Engineer tab we select Customized and the RKM BI Cloud Service. We can also specify a mask to restrict the metadata import to specific table names before hitting the Reverse Engineer button at the top. I set the mask to DIM_LOC% and only my DIM_LOCATION as been reverse engineered from the BICS schema.
Finally, we can create a mapping. In this example I load two target tables. DIM_LOCATIONS (plural) is sitting on an on-premise Oracle Database for users accessing OBIEE from our HQ while DIM_LOCATION (singular) is in the schema linked to the our BICS instance used by remote users. There is nothing special here, except that I unselected the insert and update checkboxes for my surrogate key (LOCATION_SK) because I asked BICS to automatically populate it when I created the table – under the scene, a sequence is created and a trigger added to the table to populate the field with the next value of the sequence on each insert.
On the Physical tab, the LKM should be set on LKM SQL Multi-connect because we will delegate the data transfer to the Multi-Connections IKM. This IKM is the one we imported : IKM SQL to BI Cloud Service. I choose to enable the truncate option here.
Let’s hit the execute button and watch the result in the operator and in our BICS schema.
The two articles demonstrate one more time how easy it is to plug a new technology within ODI. This tool is a wonderful framework that can be used to move data from and to any technologies, even if it’s Big Data or Cloud as we have seen here. By having every integration jobs happening in the same place, we can have an easier maintenance, a better monitoring and we can schedule all the jobs together. It’s also way easier to see the big picture in our projects.
If you have any question about this post feel free to reach me, we love that kind of interesting challenges at Rittman Mead.
Thanks a lot to Ayush for such good components and libraries he provided. I’m glad I will share the stage with him and a fellow ODI expert, Holger Friedrich, at the Oracle Open World. We will speak about Best Practices for Development Lifecycle Management . Come to see us and say hi!
Another question that came up from the ODI12c Bootcamp Course I’m delivering for a client in London at the moment is how to choose between the different knowledge modules that come with ODI12c. What with the choice now between template-style KMs and the new component-style KMs, the new option of multi-connect KMs, and the general question around which KM you pick within a KM type when building a table mapping, I thought it’d be interesting to take a closer look at how knowledge modules work with ODI12c and how you go about making the right choice of KM when creating a mapping.
As a quick primer, Oracle Data Integrator up until the recent 12c release had six types of knowledge module you could use in a data mapping:
- Load Knowledge Modules, for loading source data out of the source database server and into a staging table typically on the target database platform
- Integrate Knowledge Modules, for taking that staging data and integrating (inserting, updating, merging etc) it into the target table
- Reverse Knowledge Modules, for reverse-engineering the table metadata from a source system
- Check Knowledge Modules, for performing data quality checks on source and target tables
- Journalise Knowledge Modules, for setting up change data capture on a source table or table set
- Service Knowledge Modules, for exposing tables or other datastore as CRUD-type web services
Using ODI11g as an example, when you created a new mapping you selected an LKM for extracting data out of your source database, an IKM for integrating the results into the target table, and optionally a CKM or JKM if you needed to run data quality checks or use table journalization (CDC). In all cases you had to first import the knowledge module definitions into the ODI11g Work Repository and your project before you could use them. To take an example, an ODI11g mapping where the source was a file and the target, an Oracle database, might look like this as a Mapping diagram:
Looking at the Flow diagram, at the start there are no knowledge modules to select from as none have yet been imported:
I therefore import a selection of IKM, LKM and other KMs for the technologies I’m using, and then I’m able to assign an LKM and IKM to my flow diagram.
For both Load Knowledge Modules (LKMs) and Integration Knowledge Modules (IKMs), you have a number of options ranging from generic JBDC/SQL-type modules that connect to a source and then transfer the data using JDBC batch routines, through to highly-specialized ones that leverage particular platform technologies. I typically start the prototyping phase of my ODI projects by selecting the simplest, most generic LKM and IKM I can find, and once I’ve got the mapping logic correct then shift to one that uses more of the underlying database’s features – the docs also have a nice guide for making your KM selection. For example, I might assign the LKM SQL to Oracle knowledge module to the source table and the IKM SQL Control Append one to the target, like this:
In this case, ODI will first create a Java routine that extract via a JDBC connection the rows of data from the file, and then load that data into a staging table on the target database server. Then, that staging data will be integrated into the target table using a regular SQL INSERT statement.
Of course I could do this more efficiently using an Oracle External Table. Let’s select the LKM File to Oracle (EXTERNAL TABLE) load knowledge module and also change the IKM to IKM Oracle Incremental Update; note that this would only work if the target database server could see the file we’re loading in via this mechanism – if the file had to stay on a remote server then I’d have to stick with the IKM SQL Control Append and ODI would effectively ignore the request to use an Oracle External Table.
With ODI12c, things are a bit different due to the introduction of another type of knowledge module: “Component-Style” Knowledge Modules. Component-style KMs were introduced for a few reasons with ODI12c; they made migration of mappings and projects from OWB easier as OWB mappings are made up of lots of arbitrarily-arranged mapping operator components that can be combined into all-types of data flow, and they also made it possible for Oracle to create lots more of these granular mapping components and use them across all technology types. For example as Oracle’s David Allen talks about in this comment on one of our previous blog posts and in this follow-up blog of his own, Oracle could create a generic Table Function component-style KM and have it apply a SQL table function for Oracle sources, or run a Pig relation through an arbitrary Pig Latin script as I did in a more recent blog post.
These new component-style KMs come built-in to ODI12c which means that you don’t actually have to import any template-style KMs in to get started with a mapping; in the example below from my blog post earlier today on ODI12c and Oracle Streams, I can run the mapping I’ve just created by just selecting from the built-in component-style KMs that ship with ODI12c out-of-the-box.
So should we now use component-style KMs when creating mappings and avoid using the old template-style ones? The docs don’t say this explicitly, but my impression is that component-style KMs are the way Oracle wants to take things forward and in most cases, ODI will automatically select suitable component-style LKMs and IKMs when you create the physical mapping and this is usually the best option; only time I switch to a template-style IKM or LKM is if I’m using a platform technology that component-style KMs don’t yet cover, or I’m working on some edge-case – by default though I go with component-style LKMs and IKMs. Of course you still need to import JKMs and other KM types and presumably Oracle will extend component KMs beyond IKMs and LKMs over time, but that’s my recommendation for now.
So now we’ve got what component-style KMs are, there’s another new KM concept that came along with ODI12c – “Multi-Connect” KMs. I used a multi-connect IKM File-Hive to Oracle knowledge module towards the end of another article where I used Oracle Loader for Hadoop to export data out of Hadoop and into an Oracle Database; normally when you use an IKM the staging and target areas are on the same database, but in this case the staging table for the mapping was on the Hadoop (Hive) side and I therefore had to select LKM SQL Multi-Connection as the load knowledge module, whereapon I could then select IKM File-Hive to Oracle (OLH/OSCH) as the integration knowledge module.
Similarly, with our ODI12c mapping if I moved the staging area to the source database, or more commonly a separate “ETL-hub”-style database where the customer wants a more ETL (compared to ELT)-style integration setup, I can request that the staging location for the mapping moves to this hub database using a new feature in the logical mapping editor:
Then in the mapping I can add some transformations that take place on the ETL hub database, something you’d most probably do to avoid licensing ODI on their full data warehouse database server.
Then when I switch to the Physical mapping view I can see this additional execution unit with the transformations in it, I first select the LKM SQL Multi-Connect to bring the file data in, then I can select an LKM and IKM combination that supports multi-connect but uses an Oracle-specific technology (in this case, dblinks) to move the data from the ETL hub to the target database.
Or, if I didn’t like using dblinks or the ETL hub was on a non-Oracle platform, I could use the LKM SQL Multi-Connect LKM to expose the ETL Hub tables and then use the more generic IKMs in the list that also support multi-connections but bring the data across to the target database using a Java routine and batching, with the data flowing through the agent rather than Oracle server-to-Oracle server.
So – a few thoughts on the new knowledge module setup in ODI12c, and what these new component KMs and multi-connect KMs are actually for. You can find more articles on ODI12c and data integration over on our blog, if you’re interested in reading more about ODI12c development and internals.
Although Oracle GoldenGate replaced Oracle Streams a couple of years ago as the recommended data replication and change data capture technology for Oracle databases, many customers still on Oracle Database 11gR2 or earlier still use Streams for Oracle-to-Oracle change data capture as it works and compared to GoldenGate doesn’t require any additional licensing. Oracle’s GoldenGate Statement of Direction paper from 2014 states that streams in Oracle 11gR2 will continue to be supported but no future versions of the Oracle Database will come with Streams included, but if you’re on 11gR2 and you just want to trickle-feed capture between two Oracle databases it’s an interesting option.
I covered Oracle-to-Oracle data replication using Streams a few times in the past including this OTN article on ODI and Change Data Capture back from before 2007 or so, this article on OWB11gR2 and Change Data Capture from 2010 and once from back in 2006 that went into the details of setting up asynchronous hotlog change data capture with the new “Paris” OWB10gR2 release. We’re now on the 12c release of Oracle Data Integrator and I’m teaching our ODI12c Bootcamp course to a client next week who’s particularly interested in using Streams with ODI12c, so I thought it’d be worth taking a look at this feature in more detail to see if much has changed since the earlier articles.
Let’s start then with an ODI12c 12.1.3 install with a regular mapping set-up to copy and join the DEPT and EMP tables from one Oracle database into a denormalized table in another Oracle Database. Both are Oracle Database 11gR2 (18.104.22.168) and the initial mapping looks like this:
One thing that many ODI developers don’t know about the 12c release is that it comes with a set of “component-style” knowledge modules built-into the tool, which you can use straightaway to get a mapping running without having to select and import IKMs, LKMs and other KMs from the ODI Studio filesystem. In my case the Physical mapping looks like the screenshot below with two execution units (one for each Oracle Database server) and a number of built-in component-style KMs available for selection. I choose the LKM SQL to SQL (Built-in) load knowledge module which uses a generic JDBC connection to load source records into the staging table on the target server, and then the IKM Oracle Insert integration knowledge module to take that staging data and integrate it into the target table.
I then run this mapping and see that ODI extracted data from the source database using a Java routine and batch transfers into the Oracle staging table, and then integrated the contents of that staging table into the target Oracle table. I could of course improve this by using the LKM Oracle to Oracle (DBLink) knowledge module and thereby avoid loading in two steps, but what I’d instead like to do is use Oracle Streams to trickle-feed new and changed data from my source tables over to the target database server, as shown in the diagram below.
In the OWB and Asynchronous Change Data Capture article i linked to earlier in the post, setting up change data capture involved quite a few steps; the database had to be put into archivelog mode, the GLOBAL_NAMES parameter had to be set and a whole bunch of PL/SQL procedures had to be called to set up the source-to-target connection. Once it’s running, Streams takes transactions off of the redo log files on the source database and send them across the network to the target database server in a similar way to how GoldenGate sends transactions in the trail file across to target database servers – except it’s Oracle-to-Oracle only and in my experience is a lot more fragile than GoldenGate, which is why we and most other customers switched to GoldenGate when it came out.
ODI12c comes with a number of change data capture or “journalizing” knowledge modules that use either database triggers, UPDATE_DATE fields, Oracle Streams or GoldenGate to replicate data from source system to the target data warehouse. The journalizing knowledge module we’ll use, JKM Oracle 11g Consistent (Streams) is a template-style KM that needs to be imported first from the filesystem where ODI Studio was installed, as shown in the screenshot below – note also when you do this yourself that there’s a big “deprecated” notice next to it, saying that it could be removed at any time (presumably in-favour of the GoldenGate-based ones)
ODI and the JKM Oracle 11g Consistent (Streams) KM takes a more “fck it, let’s just to it” approach to database configuration than my OWB10gR2 version did, automatically configuring the source database for Streams by running all of the setup PL/SQL routines leaving you just to put the database in archivelog mode if it’s not already, and granting the connecting user (in my case, SYSTEM) streams administrator privileges. Moving over to SQLPlus on the source database I therefore run the setup commands listed in the KM notes like this:
SQL*Plus: Release 22.214.171.124.0 Production on Sun Jul 26 06:34:44 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Oracle Database 11g Enterprise Edition Release 126.96.36.199.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> grant dba to system;
3 grantee => ‘system’,
4 grant_privileges => true);
PL/SQL procedure successfully completed.
SQL> shutdown immediate;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2471931904 bytes
Fixed Size 2230872 bytes
Variable Size 570426792 bytes
Database Buffers1895825408 bytes
Redo Buffers 3448832 bytes
SQL> alter database archivelog;
SQL> alter database open;
Next I’ll go back to ODI Studio and enable the source Model for journalising by double-clicking on the model in ODI Studio and then selecting the JKM from the Journalizing tab, like this:
I then right-click on the EMP and DEPT tables within the source model and select Changed Data Capture > Add to CDC, where I can fine-tune the replication order so that new departments (DEPTNO) employees link to will always have been created before the employee record hits.
It’s at this next stage, when I enable journalizing, that Streams is set-up on the source and target database servers and all the supporting tables and views are created. To enable journalizing I click on the model, not the individual tables, and select Changed Data Capture > Add to CDC, like this:
If you’ve read any of our previous posts on ODI and changed data capture you’ll realise that this setup process is that same regardless of the underlying replication technology, which makes it easy to start with database-centric CDC technologies such as this and then move to GoldenGate later on without lots of rework or re-training. For now though lets run this setup process using the local agent and then check the Operator navigator to see what it did (and whether it worked…)
And it did work. Enabling journalising with the Oracle Streams JKM involves quite a few setup steps including checking that all the database settings and parameters are enabled correctly, then running the various DBMS_STREAMS and other packages to setup the capture and transmission process. Then, as with all of the ODI JKMs a set of J$ tables are created to hold the primary keys of new and changed records coming from the source system, along with JV$ views that join those primary keys to the full incoming replicated rows – this blog post by the Oracle ODI team explains the background to JKMs very well if you want to understand them in more detail. Looking at the source SCOTT schema in SQL*Developer I can see the CDC and J$/JV$ tables and views created in the schema; if I didn’t want these tables created in the actual data schema I could have specified a different schema as the WORK schema when I created the database connection in ODI Studio prior to this exercise.
Next I have to define one or more “subscribers’ to the journals; for these more advanced “consistent set JKMs” of which the Oracle Streams one is one, you can define multiple consumers or “subscribers” to the changed data so that one can be further down the queue than the other (Simple JKMs only allow a single subscriber). I call my subscriber “SUNOPSIS” as this is the default subscriber name ODI adds to the mappings downstream.
Pressing OK after adding the subscriber again brings up the prompt to select an agent, and going over to the Operator navigator I can see that another set of steps have run, again doing some streams setup but also adding details of the journal subscribers to the tables created on the source database.
I can check that journalising is now working by using the View Data… feature in the Designer navigator to insert new rows into the EMP and DEPT tables, and then checking the J$ tables in the source database schema to see if the rows’ primary keys have been added – which they have been.
To now just read journalised new and changed data into my mapping, all I do then is go to the Physical mapping diagram, select the first source table and check the Use Journalized Data Only checkbox, then do the same for the other table (note it is the table source you select, not the access point for that table into the target execution block).
So now I’ll run the mapping and check the results in the Operator navigator … but instead, I get an error:
This is because, given the way that ODI handles CDC and journalising, it can’t allow two journalised tables to be directly joined in a mapping – we think this is because ODI can’t guarantee both tables are in the same update “state” and therefore makes you copy their data into a staging or intermediary table before you can do the join. I therefore amend the mapping to load the journalised tables into staging tables on the target database server, amend the joins and filter to reference the staging tables, and then join their contents to then filter and load into the target reporting table, like this:
With the Physical mapping details, the two incoming Oracle staging tables are loaded by the same LKM SQL to SQL component-style mapping as we used before to extract data from the two journalised tables, and the Journalized Data Only flag is still set for the source tables, as you can see below.
What this also highlights is a key difference between the way I trickle-fed transactions across from my source database back with the OWB and Changed Data Capture article back at the start of this post, and the way ODI’s JKMs do it; in the OWB example I set up an actual trickle-feed process outside of OWB which transferred changed data across the network to my target data warehouse, whereapon I then read those change tables and used them to update the target DW tables in my data warehouse.
And this is actually how the GoldenGate KMs work with ODI – a GoldenGate replication process copies new and changed data from the source database to the target and ODI then reads from these change tables, whereas the Streams (and other non-GoldenGate) JKMs create the change capture tables back on the source database (the various J$ and JV$ tables I reviewed using SQL*Developer earlier on), with ODI then reading from those remote change tables and bringing across the new and changed data to the target database server. This I guess makes things easier to set up – you don’t have to worry about configuring the target database for streams replication – but it does mean that you still incur the network traffic every-time you micro-batch the changes across the network rather than spreading that traffic transaction-by-transaction.
Anyway, back to ODI and this time, when I run the mapping it works, though looking at the Operator navigator again I can see that no new data came across, and the journal data is still waiting to be consumed afterwards. Why is this?
If you’ve only used the Simple CDC JKMs from Oracle before this is normally all you need to do, but with Consistent Set ones such as this one, or the GoldenGate JKMs, you need to lock the subscriber view of the journalised data and extend the CDC window before you can access the journal records; for the Simple JKMs the IKM (Integration Knowledge Module) takes care of the unlock, extend, purge and lock operations for you automatically in the background, whereas with Consistent Set ones you typically do this as part of a wider ODI package as shown in the screenshot below, with the first and last tasks created by dragging and dropping the journalized model onto the package canvas and selecting Journalizing Model as the type (the subscriber name, “SUNOPSIS” in this case is typed-in below those settings and is off-screen in the screenshot)
Now when I run this package, as opposed to the mapping on its own, I get the row of new data I was expecting and the journal table is now empty, as I was the only subscriber.
Finally, if I was looking for real-time continuous loading into this target table, I could wrap the package in an event-detection loop that waits for in this case ten seconds for three journal rows to be written, then either processes the three as soon as the third arrives or loops around again every ten seconds (obviously in-reality you’d want to put in a mechanism to halt the loop if needed, but in my case I’ll just kill the job from the Operator navigator when I want it to stop)
So that’s the basics of using ODI with Oracle Streams for Oracle-to-Oracle changed data capture; if you’re just copying data between two Oracle databases and you’re on 11gR2 or earlier this might be an option, but long-term you’ll need to think about GoldenGate as Oracle aren’t developing Streams beyond the 11gR2 release. Note also that all the streams activity happens over on the source database server so you still need this additional step to copy the journaled data across to the target data warehouse, but it’s still a fairly non-invasive way to capture changes on the source Oracle database and it does have the considerable advantage (compared to GoldenGate) of being free-to-use.
If you’ve been keeping up with us, you know that our BITeamwork system works really well for communicating and collaborating within your enterprise analytics system. When your users don’t feel like they can participate, they start using the product less. This can cost your organization valuable investment dollars. One of the ways our product, BITeamwork, […]
Problem: Message Invalid Subject Area when editing column formula
Invalid Subject Area The selected request cannot be performed because it references an unknown subject area named [NAME].
- close Edit Column Formula tab
- select Advanced tab
- enter the name of the right Subject Area
- click Apply SQL
- click Criteria tab
- edit column
The message is gone. Now you can select objects from the subject area.