Category Archives: Rittman Mead

Real World OBIEE: Demystification of Variables Pt. 2

In part one of this blog series, I went over using bins and presentation variables to dynamically create groups and switch between them in a report and on a dashboard. In part two, I am going to talk about making reports dynamic for periods of time using repository, system and presentation variables. Before I dive into an example, there are a couple of things I would like to cover first.

SYSDATE

The sysdate function returns the current datetime set by the system where the database resides. Sysdate is a really useful function for creating repository variables for use with date dimensions. If I go into SQL Developer, I can write a query to return the current sysdate:

select sysdate from dual;

CURRENT_DATE

The current_date functions returns the current datetime set by the system where the bi server resides. This datetime may differ from sysdate depending on the geographical location of the database vs. the system that OBIEE resides on. I can write a query using sql developer to return the datetime using the current_date function:

select current_date from dual;

Since my database and OBIEE instance are on the same system, sysdate and current_date are the same.

TRUNCATE

When using sysdate or current_date to create repository variables for dates (which I am going to show in an upcoming example), you have to keep something in mind. While the date may match, the time may not. To show an example of this, I am going to join one of my date columns with sysdate.

select sysdate, dim_date_key from dual, 
gcbc_pef.dim_date
where sysdate = dim_date_key;

If I run this query, I don't get an error but I get no results.

Why? To answer this, I need to write a query to inspect my date column.

select dim_date_key from gcbc_pef.dim_date;

As you can see by the results of my query, the DIM_DATE_KEY column does have the same format as sysdate but all the times are set to 00:00:00 (or midnight). To further demonstrate the difference between my date column and sysdate, I am going to write a new query and use the TRUNC (or TRUNCATE) function.

select sysdate, dim_date_key from dual, 
gcbc_pef.dim_date
where trunc(sysdate) = dim_date_key;

As you can see, the query runs successfully but notice how sysdate and DIM_DATE_KEY still have different times. How is the join possible? Because I used the truncate function in the where clause in my query for sysdate. Without going into too much detail, using truncate on a date function without any formatting (which I will cover later) will set (or truncate) the datetime to the start (or midnight) of the current day. For example, if I run another query that just selects the truncated sysdate from dual, I get this result.

select trunc(sysdate) from dual;

Now, lets dive into an example.

Note: For all of the examples in this blog series I am using OBIEE 12.2.1.2.0

The Scenario

In this example, I have been asked to create a report that is going to reside on a products dashboard. It needs to have the same product grouping as the report I used part one of this series, needs to contain Gross Rev $, Net Rev $ and # of Orders and have a prompt that can select between the first and current day of the month and every day in-between. The person who requested the report wants the prompt to change dynamically with each month and does not want users to be able to select future dates.

There are two foreseeable challenges with this report. The first, and probably the most obvious, is how to make the date prompt for the current month and have it change dynamically with each month. The second is how to pass the dates into the report.

There is one more challenge that I will have to tackle. There is a gap in the data loads for # of Orders. Data does not update until the 2nd or 3rd of each new month. This wouldn't be a big deal except the person who requested the report wants a summary of the previous months # of Orders to be shown until the data is updated for the current month.

Fortunately, by using Repository, System and Presentation Variables, I can accomplish all of the requirements of this report.

The Example

For this example, I am going to start by creating Repository Variables to use with my date column in order to make the dates dynamic. There are other ways to make dates dynamic using functions within Answers but they are a little bit trickier to use and are less common. I am going to go over some of those functions in part three of this blog series.

Repository Variables are created using the Admin Tool. By launching the Admin Tool and opening my RPD in online mode (can also be created offline), I can go to Manage > Variables to start creating my first Repository Variable.

From the Variable Manager window, I can create a Repository Variable by selecting Action > New > Repository > Variable.

I am going to start by creating the Repository Variable for the current date. Since this variable will be dynamic, I need to make sure I select the option 'Dynamic' and I am going to give it the name USCurDate.

Now I need to create a new init block. I can do this by clicking New...

Once in the Repository Variable Initialization Block screen, I need to give the init block a name, set the schedule for when variable or variables will be refreshed then click Edit Data Source to define the connection pool the init block will use as well as the initialization string (query) the init block will use to populate the Repository Variable.

In the data source window, I am going to set my connection pool to one I have created just for my init blocks and then type in the following into the initialization string window:

select TRUNC(sysdate) from dual;

If I click Test, the query will execute and will return a result.

Notice how the result is the same as the query I ran using SQL Developer earlier.

Now I need to create a Repository Variable for the first day of every month. I am going to use the same method as before and name it USMoBeginDate. The query I am going to use is slightly different from the previous query. I still need to use the TRUNC function but I also need to apply formatting so that it truncates to the start of the month. I am going to enter the following into the initialization string window:

select TRUNC(sysdate, 'MM') from dual;

Some other useful queries I can use are:

First Day of the Current Year

select TRUNC(sysdate, 'YY') from dual;

Last Day of the Previous Year

select TRUNC(sysdate, 'YY') -1 from dual;

Previous Year Date

select TRUNC(ADD_MONTHS(sysdate, -12)) from dual;

Now I need to create a Repository Variable for the previous month to use with my # of Orders measure column. Upon inspection, I discover that the column I need to use is called Calendar Year Month and is a VARCHAR or character type. If I go into Answers and pull in the Calendar Year Month column, I can see the format is 'YYYYMM'

To create the Repository Variable, I am going to use the same method as with the current date and first day of the current month Repository Variables and issue a new query. Because the Calendar Year Month column is a VARCHAR, I need to use the to_char function to change sysdate from a date type to a character type, use some formatting syntax and use some basic arithmetic. The query is as follows:

select to_char(to_number(to_char(sysdate, 'YYYY')) * 100 + to_number(to_char(sysdate, 'MM') -1)) from dual;

To break down each part of this query, lets start with the year. In order to use the 'YYYY' format I must first cast sysdate to a character (to_char(sysdate, 'YYYY')). Then I need to cast that result back to and int so that I can multiply by 100. This will give me the result 201500.00. The reason for this is when I add the month number to my yearx100, there will always be a leading 0 for month numbers 1-9. To get the previous month number, I have to first cast sysdate to a character and use the formatting 'MM'. I then have to cast it back to an int and subtract 1 to get the previous month number (to_number(to_char(sysdate, 'MM') -1) then cast the entire statment back to a character type so that it matches the type for the Calendar Year Month column. When I run the query, I get this result.

Now that I have my three repository variables (USCurDate, USMoBeginDate and Prev_Month) I can start to create the report.

Im going to fast forward a little bit to the part of the report creation process where I will use my Repository Variables I created using the Admin Tool. Since I am using virtually the same report as part one of this blog series, please refer back for how to create custom groups using bins and presentation variables and custom value prompts.

Because of the delay in the data load for the # of Orders at the beginning of the month, I can not use a global report filter. Instead, I am going to have to use something called a Filter Expression within each measure column formula.

About Filter Expressions

Unlike global report filters, column formula level filter expressions are used when you need to specify a particular constraint within the column formula itself. Because the filter is at the column formula level, it is independent of any subsequent column filters.

Note: When using a column formula filter for a measure, you can not add a global filter of the same data subject on top of it. For example, if using a column level filter for a particular Year and Month, I can not add a global filter for a particular year. The two filters contradict each other and the result will be null.

To add a filter in the column formula, go to Edit formula, make sure the column syntax is highlighted and click Filter.

From here the Insert Filter window will pop up and I can select the attribute column to filter the measure by. Here, I want to use the column Day Date to filter Gross Rev $ by the day.

I can add a column by double clicking it in the the Subject Areas pane. When a column is added, I will be prompted with a New Filter window and from here, everything is exactly the same process as adding a global report filter.

Here I need to define the operator as is between since we are dealing with date ranges. I could call my Repository Variables for current_date and first day of the month here but, because the request is for a prompt to select between date ranges, I am going to have to call Presentation Variables and use the prompt to populate the actual values.

Note: If you are unsure about the functionality of Presentation Variables, see part one of this blog series

To add Presentation Variables to the filter expression, click Add More Options and select Presentation Variable from the dropdown.

When a Presentation Variable is added to the filter, two new text boxes appear. The Variable Expr box is where you define the variable to be used and the (default) box is used to add a default value. The default value is optional but, when defining a Presentation Variable within a filter, you have to specify a default value in order to get any results. The reason for this is because, when the report is run, the query issued will use the Presentation Variable placeholder that is defined unless a default value is specified. In other words, the default value will always be used unless the Presentation Variable is populated with a value or a list of values.

Because I want the users to be able to specify a date range, I need to define two Presentation Variables: one for the start date and one for the end date. I can add another place for a Presentation Variable by simply clicking Add More Options again and selecting Presentation Variable.

Now I need to add both my start and end date Presentation Variables in the Variable Expr boxes. I’m going to call my start date presentation variable pv_start_dt and my end date presentation variable pv_end_dt. I am also going to specify a default date range from the beginning of the current month (10/01/2015) to yesterday's date (10/15/2015).

If I click OK, I will be taken back to the Insert Filter screen where I can see the filter expression previously defined.

Clicking OK again will return me to Edit Column Formula which shows the column formula with the filter expression defined in the previous steps.

Now I have to do the exact same thing for the Net Rev $ column. Since the filter expression is identical, I can simply copy and paste the column formula for Gross Rev $ and replace the column name in the expression.

Now I need to take care of the # of Orders column. This column is tricky because of the gap between the 1st and the 2nd or 3rd of every month. I could use a filter expression that defaults to the previous month by using the previous month repository variable I created in a previous step, but this alone wouldn’t switch over when the data became available.

So how can we fulfill the requirement of the report if we don’t know the exact date in which the data will be available? This can be accomplished by using a CASE statement as shown previously in part one of this series. We can break the Case statement down into two parts or two conditions:

1. When the day for the current month is less than or equal to 2 OR if # of Orders is null, then filter # of Orders by Calendar Year Month using the value of the Prev_Month Repository Variable.

2. When condition one is not true, then filter # of Orders by Day Date between the values of the pv_start_date and the pv_end_date Presentation Variables

Putting both conditions together and using the correct syntax for Column Formula results in the following formula:

Note that I am using CURRENT_DATE in my column formula. In this case, I am extracting the day number from the current date by using the extract day function (DAY(CURRENT_DATE)). I am going to talk about this in further detail when I talk about using built in functions in Answers to make reports dynamic in part 3 of this series.

Now I need to create my dashboard prompt. I am going to start by clicking on New > Dashboard Prompt.

I need to create two prompts: One for the start date and one for the end date. Because I am using presentation variables as placeholders for the date between values, I have to use a Variable Prompt instead of a Column Prompt. Variable Prompts allow us to define a presentation variable and then define a list of values for the users to select from.

To create a Variable Prompt for Start Date, I can click on the new prompt icon and select Variable Prompt.

There a few things I need to do in order to make this prompt function for the report. First, I have to define the same presentation variable name (pv_start_dt) that I used in the filter expressions for the Gross Rev $, Net Rev $ and # of Orders columns.

Because this is not a column prompt, I have to manually specify the values I want the user to be able to select from. Rather than typing in each value, I can use the SQL Results option from the Choice List Values dropdown and use a SQL statement to select the exact values that I want.

This may seem daunting at first but there is a very straightforward way to accomplish this. Rather than manually writing out a SQL query, we can make use of the Advanced Tab within a new report.

I’m going to start by clicking New > Analysis and selecting the column that I want values for: Day Date.

I need to add a filter to Day Date so that it returns only the values I want to user to select from.

Now I need to select the operator to be is between and add two Repository Variables that I have set up: one for the first date of the current month and one for the current date of the current month.

If I go to results, I can see the data returned with the filter I have specified.

As you can see, the Day Date column only contains the values from the first of the month to the current date (October, 16th 2015 in this example)

Now for the good stuff. I can navigate to the Advanced Tab and copy the SQL statement used to generate these values and paste them into the SQL Results text box in my prompt.

You will notice that within the SQL Statement generated by OBI,
there are numbers and s_# between the SELECT and Day Date column, after the Day Date column and there is also an order by clause that uses a number “2”. Without going into too much detail, this what OBI uses to make the query more efficient when retrieving results from the database. In order to allow the values to populate the prompt, these have to be removed in OBIEE 12c and the “ORDER BY” clause has to be rewritten in order to make it work.

This

SELECT
   0 s_0,
   "Sales - Fact Sales"."Periods"."Day Date" s_1
FROM "Sales - Fact Sales"
WHERE
("Periods"."Day Date" BETWEEN VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))
ORDER BY 2 ASC NULLS LAST
FETCH FIRST 65001 ROWS ONLY

Changed to this

SELECT
   "Sales - Fact Sales"."Periods"."Day Date"
FROM "Sales - Fact Sales"
WHERE
("Periods"."Day Date" BETWEEN  VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))
ORDER BY "Periods"."Day Date" ASC
FETCH FIRST 65001 ROWS ONLY

This can be a bit confusing if you are not very familiar with SQL but just remember:

When populating a prompt using an SQL statement in OBIEE 12c, take out any number and anything that begins with “s” between the SELECT and first column and anything that begins with “s” after any subsequent columns and make sure the “ORDER BY” clause contains the actual column name of the column you want to order by.

Note: If you do not require any values to be in order, you can omit the “ORDER BY” clause all together.

If I expand Options in the Edit Prompt window, I can add a default selection or a default value that the prompt will start with. I can use the USMoBeginDate here as well so that the prompt always starts with the first date of every month as the start date.

Note: You will notice that under Options in the Edit Prompt window there is a Variable Data Type option with a dropdown selector. This can be used if the data type needs to be specified to something other than the default which is ‘text’ or character type. If you are getting an error when running the report that says “Selected value does not match datatype. Expected [this value] but got [this value]” you need to change the Variable Data Type to the datatype of the column you are prompting on. In this example, we are prompting a date datatype so therefore it needs to be set to date.

If I click OK, I can check the values in the display window by clicking the dropdown for the Start Date prompt I just created.

The blue checkmark indicates the value that is selected which, because the first date of every month was set by using the USMoBeginDate Repository Variable as the default value, defaults to the first date of the current month (October, 1st 2015) in this example.

Now I need to create another Variable Prompt for the End Date. The SQL statement used for Start Date can be reused for the values as we want the exact same values to be available for selection. I am going to specify the presentation variable to be named pvenddt, and the default value to be the USCurDate Repository Variable so that the End Date prompt always defaults to the current date.

Now all that’s left to do is put the prompt and report on the Dashboard. Here is the result.

So that concludes part 2 of Demystification of Variables. Please feel free to ask questions or leave me a comment! In part 3, I am going to talk about using built in front end functions and presentation variables to make reports dynamic for any series of time. Until next time.

A Performance Analytics Application Case Study: Challenges and Successes

The Performance Analytics application is a collection of open source technologies that aids users in: identifying performance bottlenecks, identifying causes for slow report execution, discovering areas for performance optimization, and gathering meaningful insights into the health of an OBIEE environment.

This post focuses on lessons learned after a successful Performance Analytics application installation, where within one day of being operational it enabled us to identify and isolate a long-standing memory issue. Here's how.

OBIEE Performance Analytics Dashboards

Overview

Rittman Mead recently undertook an engagement with the remit to:

  • Carry out a health check on the current state of an OBIEE platform, architecture, and development process
  • Install the Rittman Mead Performance Analytics application, enabling rapid and accurate diagnostics of OBIEE issues

The client was on OBIEE 11g, having previously upgraded from 10g. OBIEE Production environment was a three-node cluster running the latest version of the 11g release. It served around 150 users daily, of a registered user base of around 1000.

The client had a long-standing issue with memory alerts on the master node of OBIEE cluster, but no evident architectural, hardware capacity, or software configuration issues were found after completing the health check.

Challenges and successes

Gather all relevant data

Performance Analytics gathers data from a variety of sources in order to provide a full stack view of the OBIEE environment.

  • Active Session History (ASH) - The Active Session History data is read from the v$ACTIVE_SESSION_HISTORY system database view. Access to this data allows Performance Analytics users to have an understanding of the performance and state of the database at a given point it time as it provides information such as the SQL operation being performed, which application is executing the query, whether the query is actively being performed or is waiting for service, what state of execution the query is in, and many other useful statistics.

  • Operating System Metrics - Unix-based Operating Systems contain several commands used to gather information about the performance or status of the server such as vmstat, iostat, netstat, top and df. Performance Analytics utilizes the output of these commands to display the current status of the OS at a given point in time.

  • Usage Tracking - The Oracle BI Server supports the collection of usage tracking data. When usage tracking is enabled, the Oracle BI Server collects usage tracking data for each query, and it writes statistics to a usage tracking log file or inserts them directly into a database table. Access to this data allows Performance Analytics users to have an understanding of the performance of the BI Server and specific reports in the OBIEE environment at any given point in time.

  • OBIEE metrics - OBIEE has the capability to expose internal performance data through the Dynamic Monitoring Service (DMS). The data exposed contains information such as Connection Pool statistics, JVM statistics, the number of active threads, and much more. Access to this data allows Performance Analytics to record the current status of many of the OBIEE components and intricacies found within the tool.

Performance Analytics was deployed using Docker in a couple of days, requiring absolutely no software installation on any machine other than the monitoring server. All configuration settings are held in one file, and it was sufficient to add connection details of each server to it in order to gather all aforementioned data.

Accurately diagnose issues

By combining operating system metrics (CPU, memory, etc.) with internal OBIEE metrics and those from the database, Performance Analytics gives a "single pane of glass" view on the behaviour of the whole stack. This enables correlations in behaviour to be easily identified, and issues drilled into using the analysis capabilities of the tool.

Within a day of being installed, Performance Analytics enabled the client to accurately diagnose a long-standing issue with memory alerts on OBIEE master node. The cause was traced to the export to Excel of a large dataset by a single user.

Workload Planning

Performance Analytics allows to capture system status and workload over time, so you can see how the system is responding to peak loads in real-time.

With Performance Analytics the client is now able to estimate maximum workload the current architecture can support before starting to see issues and whether it is going to cope with the next years workload.

Conclusion

Performance Analytics just paid for itself.

Performance Analytics collects all relevant data and makes it accessible from one central location, allowing users to investigate performance inquiries promptly and simply. Instead of forcing users to dig through database records or a linux server manually, they can access all of the same data through a set of dashboards engineered to facilitate discovery from the collected data.

If you’d like to find out more about the Performance Analytics service offered by Rittman Mead, please get in touch.

Property Graph in Oracle 12.2

The latest release of Oracle (12.2) includes support for Property Graph, previously available only as part of the Big Data Spatial and Graph tool. Unlike the latter, in which data is held in a NoSQL store (Oracle NoSQL, or Apache HBase), it is now possible to use the Oracle Database itself for holding graph definitions and analysing them.

Here we'll see this in action, using the same dataset as I've previously used - the "Panama Papers".

My starting point is the Oracle Developer Day VM, which at under 8GB is a tenth of the size of the beast that is the BigDataLite VM. BDL is great for exploring the vast Big Data ecosystem, both within and external to the Oracle world. However the Developer Day VM serves our needs perfectly here, having been recently updated for the 12.2 release of Oracle. You can also use DB 12.2 in Oracle Cloud, as well as the Docker image.

Prepare Database for Property Graph

The steps below are based on Zhe Wu's blog "Graph Database Says Hello from the Cloud (Part III)", modified slightly for the differing SIDs etc on Developer Day VM.

First, set the Oracle environment by running from a bash prompt

. oraenv

When prompted for SID enter orcl12c:

[oracle@vbgeneric ~]$ . oraenv
ORACLE_SID = [oracle] ? orcl12c  
ORACLE_BASE environment variable is not being set since this  
information is not available for the current user ID oracle.  
You can set ORACLE_BASE manually if it is required.  
Resetting ORACLE_BASE to its previous value or ORACLE_HOME  
The Oracle base has been set to /u01/app/oracle/product/12.2/db_1  
[oracle@vbgeneric ~]$

Now launch SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and from the SQL*Plus prompt create a tablespace in which the Property Graph data will be stored:

alter session set container=orcl;

create bigfile tablespace pgts  
datafile '?/dbs/pgts.dat' size 512M reuse autoextend on next 512M maxsize 10G  
EXTENT MANAGEMENT LOCAL  
segment space management auto;  

Now you need to do a bit of work to update the database to hold larger string sizes, following the following steps.

In SQL*Plus:

ALTER SESSION SET CONTAINER=CDB$ROOT;  
ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;  
shutdown immediate;  
startup upgrade;  
ALTER PLUGGABLE DATABASE ALL OPEN UPGRADE;  
EXIT;  

Then from the bash shell:

cd $ORACLE_HOME/rdbms/admin  
mkdir /u01/utl32k_cdb_pdbs_output  
mkdir /u01/utlrp_cdb_pdbs_output  
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utl32k_cdb_pdbs_output' -b utl32k_cdb_pdbs_output utl32k.sql

When prompted, enter SYS password (oracle)

After a short time you should get output:

catcon.pl: completed successfully  

Now back into SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and restart the database instances:

shutdown immediate;  
startup;  
ALTER PLUGGABLE DATABASE ALL OPEN READ WRITE;  
exit  

Run a second script from the bash shell:

$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utlrp_cdb_pdbs_output' -b utlrp_cdb_pdbs_output utlrp.sql

Again, enter SYS password (oracle) when prompted. This step then takes a while (c.15 minutes) to run, so be patient. Eventually it should finish and you'll see:

catcon.pl: completed successfully  

Now to validate that the change has worked. Fire up SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

And check the value for max_string, which should be EXTENDED:

alter session set container=orcl;  
SQL> show parameters max_string;

NAME                                 TYPE        VALUE  
------------------------------------ ----------- ------------------------------
max_string_size                      string      EXTENDED  

Load Property Graph data from Oracle Flat File format

Now we can get going with our Property Graph. We're going to use Gremlin, a groovy-based interpretter, for interacting with PG. As of Oracle 12.2, it ships with the product itself. Launch it from bash:

cd $ORACLE_HOME/md/property_graph/dal/groovy  
sh gremlin-opg-rdbms.sh

--------------------------------
Mar 08, 2017 8:52:22 AM java.util.prefs.FileSystemPreferences$1 run  
INFO: Created user preferences directory.  
opg-oracledb>  

First off, let's create the Property Graph object in Oracle itself. Under the covers, this will set up the necessary database objects that will store the data.

cfg = GraphConfigBuilder.\  
        forPropertyGraphRdbms().\
        setJdbcUrl("jdbc:oracle:thin:@127.0.0.1:1521/ORCL").\
        setUsername("scott").\
        setPassword("oracle").\
        setName("panama").\
        setMaxNumConnections(8).\
        build();
opg = OraclePropertyGraph.getInstance(cfg);  

You can also do this with the PL/SQL command exec opg_apis.create_pg('panama', 4, 8, 'PGTS');. Either way, the effect is the same; a set of tables created in the owner's schema:

SQL> select table_name from user_tables;  
TABLE_NAME  
------------------------------------------
PANAMAGE$  
PANAMAGT$  
PANAMAVT$  
PANAMAIT$  
PANAMASS$  

Now let's load the data. I'm using the Oracle Flat File format here, having converted it from the original CSV format using R. For more details of why and how, see my article here.

From the Gremlin prompt, run:

// opg.clearRepository();     // start from scratch
opgdl=OraclePropertyGraphDataLoader.getInstance();  
efile="/home/oracle/panama_edges.ope"  
vfile="/home/oracle/panama_nodes.opv"  
opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  

This will take a few minutes. Once it's completed you'll get null response, but can verify the data has successfully loaded using the opg.Count* functions:

opg-oracledb> opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  
==>null
opg-oracledb> opg.countEdges()  
==>1265690
opg-oracledb> opg.countVertices()  
==>838295

We can inspect the data in Oracle itself too. Here I'm using SQLcl, which is available by default on the Developer Day VM. Using the ...VT$ table we can query the number of distinct properties the nodes (verticies) in the graph:

SQL> select distinct k from panamaVT$;  
K  
----------------------------
Entity incorporation.date  
Entity company.type  
Entity note  
ID  
Officer icij.id  
Countries  
Type  
Entity status  
Country  
Source ID  
Country Codes  
Entity struck.off.date  
Entity address  
Name  
Entity jurisdiction  
Entity jurisdiction.description  
Entity dorm.date

17 rows selected.  

Inspect the edges:

[oracle@vbgeneric ~]$ sql scott/oracle@localhost:1521/orcl

SQL> select p.* from PANAMAGE$ p where rownum<5;

       EID       SVID       DVID EL               K       T V      VN VT     SL VTS  VTE  FE
---------- ---------- ---------- ---------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
         6          6     205862 officer_of
        11         11     228601 officer_of
        30         36     216748 officer_of
        34         39     216487 officer_of

SQL>  

You can also natively execute some of the Property Graph algorithms from PL/SQL itself. Here is how to run the PageRank algorithm, which can be used to identify the most significant nodes in a graph, assigning them each a score (the "page rank" value):

set serveroutput on  
DECLARE  
    wt_pr  varchar2(2000); -- name of the table to hold PR value of the current iteration
    wt_npr varchar2(2000); -- name of the table to hold PR value for the next iteration
    wt3    varchar2(2000);
    wt4    varchar2(2000);
    wt5    varchar2(2000);
    n_vertices number;
BEGIN  
    wt_pr := 'panamaPR';
    opg_apis.pr_prep('panamaGE$', wt_pr, wt_npr, wt3, wt4, null);
    dbms_output.put_line('Working table names  ' || wt_pr
       || ', wt_npr ' || wt_npr || ', wt3 ' || wt3 || ', wt4 '|| wt4);
    opg_apis.pr('panamaGE$', 0.85, 10, 0.01, 4, wt_pr, wt_npr, wt3, wt4, 'SYSAUX', null, n_vertices)
;
END;  
/

When run this creates a new table with the PageRank score for each vertex in the graph, which can then be queried as any other table:

SQL> select * from panamaPR  
  2  order by PR desc
  3* fetch first 5 rows only;
      NODE         PR          C
---------- ---------- ----------
    236724 8851.73652          0
    288469 904.227685          0
    264051 667.422717          0
    285729 562.561604          0
    237076 499.739316          0

On its own, this is not so much use; but joined to the vertices table, we can now find out, within our graph, the top ranked vertices:

SQL> select pr.pr, v.k,v.V from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Name' order by PR desc fetch first 5 rows only;  
        PR K          V
---------- ---------- ---------------
8851.73652 Name       Portcullis TrustNet Chambers P.O. Box 3444 Road Town- Tortola British Virgin Isl  
904.227685 Name       Unitrust Corporate Services Ltd. John Humphries House- Room 304 4-10 Stockwell Stre  
667.422717 Name       Company Kit Limited Unit A- 6/F Shun On Comm Bldg. 112-114 Des Voeux Road C.- Hong  
562.561604 Name       Sealight Incorporations Limited Room 1201- Connaught Commercial Building 185 Wanc  
499.739316 Name       David Chong & Co. Office B1- 7/F. Loyong Court 212-220 Lockhart Road Wanchai Hong K

SQL>  

Since our vertices in this graph have properties, including "Type", we can also analyse it by that - the following shows the top ranked vertices that are Officers:

SQL> select V.vid, pr.pr from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Type' and v.V = 'Officer' order by PR desc fetch first 5 rows only;  
       VID         PR
---------- ----------
  12171184 1.99938104
  12030645 1.56722346
  12169701 1.55754873
  12143648 1.46977361
  12220783 1.39846834

which we can then put in a subquery to show the details for these nodes:

with OfficerPR as  
        (select V.vid, pr.pr
          from panamaPR pr
               inner join PANAMAVT$ V
               on pr.NODE = v.vid
         where v.K = 'Type' and v.V = 'Officer'
      order by PR desc
      fetch first 5 rows only)
select pr2.pr,v2.k,v2.v  
from OfficerPR pr2  
     inner join panamaVT$ v2
     on pr2.vid = v2.vid
where v2.k in ('Name','Countries');  
        PR K          V
---------- ---------- -----------------------
1.99938104 Countries  Guernsey  
1.99938104 Name       Cannon Asset Management Limited re G006  
1.56722346 Countries  Gibraltar  
1.56722346 Name       NORTH ATLANTIC TRUST COMPANY LTD. AS TRUSTEE THE DAWN TRUST  
1.55754873 Countries  Guernsey  
1.55754873 Name       Cannon Asset Management Limited re J006  
1.46977361 Countries  Portugal  
1.46977361 Name       B-49-MARQUIS-CONSULTADORIA E SERVICOS (SOCIEDADE UNIPESSOAL) LDA  
1.39846834 Countries  Cyprus  
1.39846834 Name       SCIVIAS TRUST  MANAGEMENT LTD

10 rows selected.  

But here we get into the limitations of SQL - already this is starting to look like a bit of a complex query to maintain. This is where PGQL comes in, as it enables to express the above request much more eloquently. The key thing with PGQL is that it understands the concept of a 'node', which removes the need for the convoluted sub-select that I had to do above to first identify the top-ranked nodes that had a given property (Type = Officer), and then for those identified nodes show information about them (Name and Countries). The above SQL could be expressed in PGQL simply as:

SELECT n.pr, n.name, n.countries  
WHERE (n WITH Type =~ 'Officer')  
ORDER BY n.pr limit 5  

At the moment Property Graph in the Oracle DB doesn't support PGQL - but I'd expect to see it in the future.

Jupyter Notebooks

As well as working with the Property Graph in SQL and Gremlin, we can use the Python API. This is shipped with Oracle 12.2. I'd strongly recommend using it through a Notebook, and this provides an excellent environment in which to prototype code and explore the results. Here I'll use Jupyter, but Apache Zeppelin is also very good.

First let's install Anaconda Python, which includes Jupyter Notebooks:

wget https://repo.continuum.io/archive/Anaconda2-4.3.0-Linux-x86_64.sh  
bash Anaconda2-4.3.0-Linux-x86_64.sh  

In the install options I use the default path (/home/oracle) as the location, and keep the default (no)

Launch Jupyter, telling it to listen on any NIC (not just localhost). If you installed anaconda in a different path from the default you'll need to amend the /home/oracle/ bit of the path.

/home/oracle/anaconda2/bin/jupyter notebook --ip 0.0.0.0

If you ran the above command from the terminal window within the VM, you'll get Firefox pop up with the following:

If you're using the VM headless you'll now want to fire up your own web browser and go to http://<ip>:8888 use the token given in the startup log of Jupyter to login.

Either way, you should now have a functioning Jupyter notebook environment.

Now let's install the Property Graph support into the Python & Jupyter environment. First, make sure you've got the right Python set, by confirming with which it's the anaconda version you installed, and when you run python you see Anaconda in the version details:

[oracle@vbgeneric ~]$ export PATH=/home/oracle/anaconda2/bin:$PATH
[oracle@vbgeneric ~]$ which python
~/anaconda2/bin/python
[oracle@vbgeneric ~]$ python -V
Python 2.7.13 :: Anaconda 4.3.0 (64-bit)  
[oracle@vbgeneric ~]$

Then run the following

cd $ORACLE_HOME/md/property_graph/pyopg  
touch README  
python ./setup.py install  

without the README being created, the install fails with IOError: [Errno 2] No such file or directory: './README'

You need to be connected to the internet for this as it downloads dependencies as needed. After a few screenfuls of warnings that appear OK to ignore, the installation should be succesful:

[...]
creating /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Extracting JPype1-0.6.2-py2.7-linux-x86_64.egg to /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages  
Adding JPype1 0.6.2 to easy-install.pth file

Installed /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Finished processing dependencies for pyopg==1.0  

Now you can use the Python interface to property graph (pyopg) from within Jupyter, as seen below. I've put the notebook on gist.github.com meaning that you can download it from there and run it yourself in Jupyter.

Accelerating Your ODI Implementation, Rittman Mead Style

Introduction

Over the years, at Rittman Mead, we've built up quite a collection of tooling for ODI. We have accelerators, scripts, reports, templates and even entire frameworks at our disposal for when the right use case arises. Many of these tools exploit the power of the excellent ODI SDK to automate tasks that would otherwise be a chore to perform manually. Tasks like, topology creation, model automation, code migration and variable creation.

In this blog post, I'm going to give you a demo of our latest addition, a tool that allows you to programmatically create ODI mappings. ( and a few other tricks )

So you may be thinking isn't that already possible using the ODI SDK ? and you'd be right, it most definitely is. There are many examples out there that show you how it's done, but they all have one thing in common, they create a fairly simple mapping, with, relatively speaking, quite a lot of code and are only useful for creating the said mapping.

And herein lies the obvious question, Why would you create a mapping using the ODI SDK, when it's quicker to use ODI Studio ?

And the obvious answer is...you wouldn't, unless, you were trying to automate the creation of multiple mappings using metadata.

This is a reasonable approach using the raw ODI SDK, the typical use case being the automation of your source to stage mappings. These mappings tend to be simple 1 to 1 mappings, the low hanging fruit of automation if you like. The problem arises though, when you want to automate the creation of a variety of more complex mappings, you run the risk of spending more time writing the automation code, than you would actually save due to the automation itself. The point of diminishing return can creep up pretty quickly.

The principle, however, is sound. Automate as much as possible by leveraging metadata and free up your ODI Developers to tackle the more complex stuff.

All Aboard the Rittman Mead Metadata Train !

What would be really nice is something more succinct, more elegant, something that allows us to create any mapping, with minimal code and fuss.

Something that will allow us to further accelerate...

  • Migrating to ODI from other ETL products
  • Greenfield ODI Projects
  • Day to Day ODI Development work

..all powered by juicy metadata.

These were the design goals for our latest tool. To meet these goals, we created a mini-mapping-language on top of the ODI SDK. This mini-mapping-language abstracts away the SDK's complexities, while, at the same time, retaining its inherent power. We call this mini mapping language OdiDsl ( Oracle Data Integrator Domain Specific Language ) catchy heh?!

OdiDsl

OdiDsl is written in Groovy and looks something like this...

/*
 * OdiDsl to create a SCD2 dimension load mapping.
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .select("EMPLOYEES")
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('EMP_DEPT', ['DEPARTMENTS'], [join_condition: "EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],
                                   ])
        .commit()
        .validate()

The above code will create the following, fully functional, mapping in ODI 12c (sorry 11g).

It should be fairly easy to eyeball the code and reconcile it with the above mapping image. We can see that we are specifying our datastores, selecting the EMPLOYEES datastore, adding a filter, a couple of joins and then connecting to our target. OdiDsl has been designed in such a way that it mimics the flow based style of ODI 12c's mappings by chaining components onto one another.

Creating a Mapping Using OdiDsl

Let's walk through the above code, starting with just the datastores, adding the rest as we go along...

Datastores

We start by creating the mapping with mapping.create( <project>, <folder>, <mapping name>). We then chain the .datastores(), .commit() and .validate() methods onto it using the "dot" notation. The .datastores() method is the only method you can chain directly onto mapping.create() as it's a requirement to add some datastores before you start building up the mapping. The .commit() method persists the mapping in the repository and the .validate() method calls ODI's validation routine on the mapping to check if all is ok.

/*
 * OdiDsl to create a mapping with 4 datastores.
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .commit()
        .validate()

When we execute this code it returns the following to the console. You can see that the mapping has been dropped/created and that ODI has some validation warnings for us.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  ------------------
  WARNING: Mapping component EMPLOYEES has no input or output connections.
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And here is the mapping in ODI - well, it's a start at least...

Starting the Flow with a Filter

Before we can start building up the rest of the mapping we need to select a starting datastore to chain off, you've got to start somewhere right? For that, we call .select("EMPLOYEES"), which is a bit like clicking and selecting the component in ODI Studio. The .filter() method is then chained onto it, passing in the filter name and some configuration, in this case, the actual filter condition.

/*
 * OdiDsl to create a mapping with 4 datastores and a filter.
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .select("EMPLOYEES")
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ]) 
        .commit()
        .validate()

We now have an error in the validation results. This is expected as our filter doesn't connect to anything downstream yet.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  ------------------
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.
  ERROR: Mapping component NAME_FILTER must have a connection for output connector point OUTPUT1.

And here's the mapping, as you can see the filter is connected to the EMPLOYEES datastore output connector.

Adding a Join

Next we'll create the join between the filter and the DEPARTMENTS table. To do this we can just chain a .join() onto the .filter() method and pass in some arguments to specify the join name, what it joins to and the join condition itself.

/*
 * OdiDsl to create a mapping with 4 datastores a filter and a join.
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .select("EMPLOYEES")
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('EMP_DEPT', ['DEPARTMENTS'], [join_condition: "EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID" ])
        .commit()
        .validate()

Only 2 validation warnings and no errors this time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  ------------------
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

We now have a join named EMP_DEPT joining DEPARTMENTS and the filter, NAME_FILTER, together.

Adding another Join

We'll now do the same for the final join.

/*
 * OdiDsl to create a mapping with 4 datastores, a filter and 2 joins.
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .select("EMPLOYEES")
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('EMP_DEPT', ['DEPARTMENTS'], [join_condition: "EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])      
        .commit()
        .validate()

looking better all the time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  ------------------
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And we now have a join named DEPT_JOBS joining JOBS and the join, EMP_DEPT, to each other.

Connecting to the target

The final step is to connect the DEPT_JOBS join to our target datastore, D_EMPLOYEE. For this we can use the .connect() method. This method is used to map upstream attributes to a datastore. When you perform this action in ODI Studio, you'll be prompted with the attribute matching dialog, with options to auto-map the attributes.

OdiDsl will, by default, auto-map all attributes that are not explicitly mapped in the .connect() method. In our completed code example below we are explicitly mapping several attributes to support SCD2 functionality, auto-map takes care of the rest.

/*
 * OdiDsl to create a SCD2 dimension load mapping
 */

mapping.drop('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'EMPLOYEE_DIM_LOAD')  
        .datastores([
                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
        ])
        .select("EMPLOYEES")
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('EMP_DEPT', ['DEPARTMENTS'], [join_condition: "EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],
                                   ])
        .commit()
        .validate()

Nice, all validated this time.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created  
Validation Successful  


What about Updates ?

Yes. We can also update an existing mapping using mapping.update(<project>, <folder>, <mapping name>). This is useful when you need to make changes to multiple mappings or when you can't drop and recreate a mapping due to a dependency. The approach is the same, we start by selecting a component with .select() and then chain a method onto it, in this case, .config().

mapping.update('MYPROJECT', 'DEMO', "EMPLOYEE_DIM_LOAD")  
        .select('DEPT_JOBS')
            .config([join_type: "LEFT_OUTER"])


Which Properties Can I Change for each Component ?

Probably more than you'll ever need to, OdiDsl mirrors the properties that are available in ODI Studio via the SDK.

Can We Generate OdiDsl Code From an Existing Mapping ?

Yes, we can do that too, with .reverse(). This will allow you to mirror the process.

Let's take this, hand built, fictional and completely CRAZY_MAPPING as an example. (fictional and crazy in the sense that it does nothing useful, however, the flow and configuration are completely valid).

If we execute .reverse() on this mapping by calling...

mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  

...OdiDsl will return the following output to the console. What you are seeing here is the OdiDsl required to recreate the crazy mapping above.

Connecting to the repository...

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  
    .datastores([
         ['name':'STAGING.TABLE1', 'alias':'TABLE1'],
         ['name':'STAGING.TABLE9', 'alias':'TABLE9'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TABLE4'],
         ['name':'STAGING.TABLE6', 'alias':'TABLE6'],
         ['name':'STAGING.TABLE5', 'alias':'TABLE5'],
         ['name':'STAGING.TABLE7', 'alias':'TABLE7'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE8', 'alias':'TABLE8'],
         ['name':'STAGING.TABLE11', 'alias':'TABLE11'],
         ['name':'STAGING.TABLE12', 'alias':'TABLE12'],
         ['name':'STAGING.TABLE13', 'alias':'TABLE13'],
         ['name':'STAGING.TABLE15', 'alias':'TABLE15'],
         ['name':'STAGING.TABLE14', 'alias':'TABLE14'],
         ['name':'STAGING.TABLE16', 'alias':'TABLE16'],
         ['name':'STAGING.TABLE17', 'alias':'TABLE17'],
         ['name':'STAGING.TABLE42', 'alias':'TABLE42'],
    ])
    .select('TABLE5')
        .join('JOIN2', ['TABLE7'], [join_condition: "TABLE5.ID = TABLE7.ID" ])
        .join('JOIN3', ['TABLE6'], [join_condition: "TABLE6.ID = TABLE7.ID" ])
        .connect('TABLE14', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE7.COL1" ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "TABLE7.COL4" ],
        ])
    .select('JOIN3')
        .expr('EXPRESSION1', [attrs: [
                [ attr: "ID", expression: "TABLE6.ID * 42", datatype: "NUMERIC", size: "38", scale: "0"]]])
        .connect('TABLE15', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        ])
        .join('JOIN', ['TABLE14'], [join_condition: "TABLE14.ID = TABLE15.ID" ])
        .filter('FILTER2', [filter_condition: "TABLE15.COL3 != 'FOOBAR'" ])
        .connect('TABLE16', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        ])
    .select('JOIN')
        .connect('TABLE17', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        ])
    .select('TABLE5')
        .sort('SORT1', [sorter_condition: "TABLE5.ID, TABLE5.COL2, TABLE5.COL4" ])
        .connect('TABLE13', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE5.COL1" ],
                [ attr: "COL2", expression: "TABLE5.COL2" ],
                [ attr: "COL3", expression: "TABLE5.COL3" ],
                [ attr: "COL4", expression: "TABLE5.COL4" ],
        ])
    .select('TABLE3')
        .filter('FILTER1', [filter_condition: "TABLE3.ID != 42" ])
    .select('TABLE4')
        .filter('FILTER', [filter_condition: "TABLE4.COL1 = 42" ])
        .lookup('LOOKUP1', 'FILTER1', [join_condition: "TABLE4.ID = TABLE3.ID AND TABLE3.COL1 = TABLE4.COL1"])
        .join('JOIN5', ['TABLE13'], [join_condition: "TABLE13.ID = TABLE3.ID" ])
        .distinct('DISTINCT_', [attrs: [
                [ attr: "COL3_1", expression: "TABLE4.COL3", datatype: "VARCHAR", size: "30"],
                [ attr: "COL4_1", expression: "TABLE4.COL4", datatype: "VARCHAR", size: "30"]]])
    .select('DISTINCT_')
        .join('JOIN4', ['EXPRESSION1'], [join_condition: "TABLE5.ID = TABLE6.COL1" ])
        .sort('SORT', [sorter_condition: "EXPRESSION1.ID" ])
        .connect('TABLE8', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        ])
        .connect('TABLE12', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "TABLE8.COL1" ],
                [ attr: "COL2", expression: "TABLE8.COL2" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],
        ])
    .select('TABLE9')
        .expr('EXPRESSION', [attrs: [
                [ attr: "ID", expression: "TABLE9.ID *42", datatype: "NUMERIC", size: "38", scale: "0"],
                [ attr: "COL4", expression: "TABLE9.COL4 || 'FOOBAR'", datatype: "VARCHAR", size: "30"]]])
        .connect('TABLE1', [
                [ attr: "ID", expression: "EXPRESSION.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "", active_indicator: false ],
                [ attr: "COL4", expression: "TABLE9.COL4" ],
        ])
        .join('JOIN1', ['TABLE2'], [join_condition: "TABLE1.ID = TABLE2.ID" ])
        .aggregate('AGGREGATE', [attrs: [
                [ attr: "ID", expression: "TABLE1.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1", expression: "MAX(TABLE2.COL4)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .lookup('LOOKUP', 'DISTINCT_', [join_condition: "AGGREGATE.ID = DISTINCT_.COL3_1"])
        .aggregate('AGGREGATE1', [attrs: [
                [ attr: "ID", expression: "AGGREGATE.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1_1", expression: "SUM(AGGREGATE.COL4_1)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .filter('FILTER3', [filter_condition: "AGGREGATE1.COL4_1_1 > 42" ])
        .connect('TABLE42', [
                [ attr: "ID", expression: "AGGREGATE1.ID" ],
        ])
    .select('AGGREGATE1')
        .join('JOIN6', ['TABLE8'], [join_condition: "AGGREGATE1.ID = TABLE8.ID" ])
        .connect('TABLE11', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "" ],
                [ attr: "COL2", expression: "" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],
        ])
    .commit()
    .validate()

When we execute this OdiDsl code we get, you guessed it, exactly the same crazy mapping with the flow and component properties all intact.

Being able to flip between ODI studio and OdiDsl has some really nice benefits for those who like the workflow. You can start prototyping a mapping in ODI Studio, convert it to code, hack around for a bit and then reflect it all back into ODI. It's also very useful for generating a "code template" from an existing mapping. The generated code template can be modified to accept variables instead of hard coded properties, all you need then is some metadata.

Did Somebody Say Metadata ?

Metadata is the key to bulk automation. You can find metadata in all kinds of places. If you are migrating to ODI from another product then there will be a whole mass of metadata living in your current product's repository or via some kind of export routine which typically produces XML files. If you are starting a fresh ODI implementation, then there will be metadata living in your source and target systems, in data dictionaries, in excel sheets, in mapping specifications documents, all over the place really. This is the kind of metadata that can be used to feed OdiDsl.

A Quick Example of One possible Approach to Using OdiDsl With Metadata

First we build a mapping in Odi Studio, this will act as our template mapping.

We then generate the equivalent OdiDsl code using mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'FEED_ME_METADATA'). Which gives us this code.

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'FEED_ME_METADATA')  
    .datastores([
         ['name':'STAGING.TABLE1', 'alias':'LOOKUP_TABLE'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TARGET_TABLE'],
    ])
    .select('TABLE2')
        .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "TABLE2.ID = LOOKUP_TABLE.ID"])
        .join('JOIN', ['TABLE3'], [join_condition: "TABLE2.ID = TABLE3.ID" ])
        .filter('FILTER', [filter_condition: "TABLE3.COL1 = 'FILTER'" ])
        .expr('EXPRESSION', [attrs: [
                [ attr: "CONSTANT", expression: "42", datatype: "VARCHAR", size: "30"]]])
        .connect('TARGET_TABLE', [
                [ attr: "ID", expression: "LOOKUP_TABLE.ID" ],
                [ attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT" ],
                [ attr: "COL2", expression: "TABLE2.COL2" ],
                [ attr: "COL3", expression: "TABLE3.COL3" ],
                [ attr: "COL4", expression: "LOOKUP_TABLE.COL4" ],
        ])
    .commit()
    .validate()

We now need to decorate this code with some variables, these variables will act as place holders for our metadata. The metadata we are going to use is from a database table, I'm keeping it simple for the purpose of this demonstration but the approach is the same. Our metadata table has 10 rows and from these 10 rows we are going to create 10 mappings, replacing certain properties with the values from the columns.

Remember that OdiDsl is expressed in Groovy. That means, as well as OdiDsl code, we also have access to the entire Groovy language. In the following code we are using a mixture of Groovy and OdiDsl. We are connecting to a database, grabbing our metadata and then looping over mapping.create(), once for each row in our metadata table. The columns in the metadata table are represented as the variables row.datastore, row.constant_expr and row.filter_cond. The code comments indicate where we are substituting these variables in place of our previously hard coded property values.

import groovy.sql.Sql

// Connect to the database and retrieve rows from the METADATA table.
def sqlConn = Sql.newInstance("jdbc:oracle:thin:@hostname:1521/pdborcl", "username", "password", "oracle.jdbc.pool.OracleDataSource")  
def rows = sqlConn.rows("SELECT * FROM METADATA")  
sqlConn.close()

// For each row in our METADATA table
rows.eachWithIndex() {  row, index ->

    mapping.create('MY_PROJECT', 'DEMO_FOLDER', "FEED_ME_METADATA_${index+1}") // Interpolate row number to make the mapping name unique
            .datastores([
                    ['name': 'STAGING.TABLE1', 'alias': 'LOOKUP_TABLE'],
                    ['name': "STAGING.${row.datastore}" ], // substitute in a different datastore
                    ['name': 'STAGING.TABLE3', 'alias': 'TABLE3'],
                    ['name': 'STAGING.TABLE4', 'alias': 'TARGET_TABLE'],
            ])
            .select(row.datastore)
                .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "${row.datastore}.ID = LOOKUP_TABLE.ID"]) // substitute in a different datastore
                .join('JOIN', ['TABLE3'], [join_condition: "${row.datastore}.ID = TABLE3.ID"]) // substitute in a different datastore
                .filter('FILTER', [filter_condition: "TABLE3.COL1 = '${row.filter_cond}'"]) // substitute in a different filter condition
                .expr('EXPRESSION', [attrs: [
                    [attr: "CONSTANT", expression: row.constant_expr, datatype: "VARCHAR", size: "30"]]]) // substitute in a different constant for the expression
                .connect('TARGET_TABLE', [
                    [attr: "ID", expression: "LOOKUP_TABLE.ID"],
                    [attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT"],
                    [attr: "COL2", expression: "${row.datastore}.COL2"], // substitute in a different datastore
                    [attr: "COL3", expression: "TABLE3.COL3"],
                    [attr: "COL4", expression: "LOOKUP_TABLE.COL4"],
                ])
            .commit()
            .validate()

}

Here is the output...

Connecting to the repository...

mapping FEED_ME_METADATA_1 created  
Validation Successful  
mapping FEED_ME_METADATA_2 created  
Validation Successful  
mapping FEED_ME_METADATA_3 created  
Validation Successful  
mapping FEED_ME_METADATA_4 created  
Validation Successful  
mapping FEED_ME_METADATA_5 created  
Validation Successful  
mapping FEED_ME_METADATA_6 created  
Validation Successful  
mapping FEED_ME_METADATA_7 created  
Validation Successful  
mapping FEED_ME_METADATA_8 created  
Validation Successful  
mapping FEED_ME_METADATA_9 created  
Validation Successful  
mapping FEED_ME_METADATA_10 created  
Validation Successful  

And here are our 10 mappings, each with it's own configuration.

If we take a look at the FEED_ME_METADATA_5 mapping, we can see the metadata has been reflected into the mapping.

And that's about it. We've basically just built a mini accelerator using OdiDsl and we hardly had to write any code. The OdiDsl code was generated for us using .reverse(). All we really had to code, was the connection to the database, a loop and bit of variable substitution!

Summary

With the Rittman Mead ODI Tool kit, accelerating your ODI implementation has never be easier. If you are thinking about migrating to ODI from another product or embarking on a new ODI Project, Rittman Mead can help. For more information please get in touch.

Create Your Own DVD Plugin in 22 minutes

Introduction

Oracle DVD played well for the task of elections data analysis. But I faced some limitations and had to curtail my use of one of my favourite charts - Scatter chart. While DVD’s implementation of it looks good, has smooth animations and really useful, it is not suitable for visualisations of huge datasets. Election dataset has about 100K election precinct data rows and if I try to plot all of them at one chart I will get something close to this. It doesn't really important what is the data, just compare how charts look like.

Picture with Turnout%, United Russia% by PEC

This picture can give us about zero insights. Actually, I'm not even sure if shows my 100K points. What I want to get is something like this:
plugin with Turnout%, United Russia% by PEC (source: https://habrahabr.ru/post/313372/)

What can I see at this chart and can't see at the vanilla one? First of all, it gives a much better understanding of points distribution other the plot. I can see a relatively dense core around coordinates [30;40] and then not so dense tail growing by both dimensions and again a small and very dense core in the top right corner.
Secondly, it not only shows dense and sparse areas but shows that there are exist a few points which have an unexplainably high concentration. Around [64;62], for example.

Luckily DVD allows me (and anyone else too) create custom plugins. That's what I'm going to show here. The plugin I'm going to do is something I'd call Minimum Viable Product. It's very, very simple and needs a lot of work to make it good and really useful.

The code I'm going to show is the simplest I can invent. It doesn't handle errors, exceptions and so on. It should be more modular. It possibly could use some optimisation. But that was done for a purpose. And the purpose is called 'simplicity'. If you want a more comprehensive sample, there is a link to the Oracle's guide in the Summary. I want to show you that writing a DVD plugin is not a rocket science. It takes less than half an hour to build your own DVD plugin.

And the clock starts now!

[0:00-5:40] Setup Environment and DVD SDK

Setup

Download and install DVD. I presume you already have it, because from my experience people who doesn't have it very rarely feel a need for its plugins. Hence I didn't include download and installation times into 22 minutes.

Create a folder to store your plugins and some Gradle-related configs and scripts. It may be any directory you like, just select your favourite place.

Next step is to define some environment variables.

  • The first one is DVDDESKTOP_SDK_HOME and it should point to thу DVD folder. In most of the cases, it is C:\Program files\Oracle Data Visualisation Desktop.
  • The second variable is PLUGIN_DEV_DIR it points to the folder you created 40 seconds ago. These two variables seem not to be absolutely necessary, but they make your life a bit easier.
  • And one more change to the environment before going further. Add %DVDESKTOP_SDK_HOME%\tools\bin to the PATH variable (do you feel how your life became easier with DVDDESKTOP_SDK_HOME already defined so you can start using it right now?).

And the last step is to initialise our development environment. Go to the plugins directory and execute the initialization command. Open cmd window and execute:

cd %PLUGIN_DEV_DIR%  
bicreateenv  

This will not only initialise everything your need but create a sample plugin to explore and research.

Let's take a look at what we got. In the folder we have created before (%PLUGIN_DEV_DIR%) this command generated the following structure.

a screenshot of %PLUGIN_DEV_DIR%

All names here are more than self-explanatory. You won't need to do anything with .gradle and gradle, and src stores all the code we will develop. Please notice that src already has sample plugin sampleviz. You may use it as a referrence for your own work.

Create First Plugin

Every plugin consists of a few parts. We don't need to create them from scratch every time. Oracle gives us a simple way of creating a new plugin. If you didn't close cmd window from previous steps execute the following:

bicreateplugin viz -id com.rittmanmead.demo -subType dataviz  

Obviously, you should use your own company name for the id. The last word (demo) plus word Plugin is what user will see as a visualisation name. Obviously we can change it but not today.

This will create a fully operational plugin. Where can we find it? You shouldn't be too surprised if I say "In src folder".

a screenshot of %PLUGIN_DEV_DIR%

And we may test it right now. Yes, its functionality is very limited, but you didn't expect it to do something useful, right?

Start SDK

Now we should start Oracle DVD in SDK mode. The main difference from the normal mode is that DVD will start in the default browser. That's not a typo. DVD will start as a usual site in a browser. It's a good idea to select as the default a browser with good developer console and tools. I prefer Chrome for this task but that's not the only choice. Choose the browser you'd normally use for site development. When you finish with setting up of the default browser do:

cd %PLUGIN_DEV_DIR%  
.\gradlew run

If you don't have an ultrafast computer I'd recommend you to make some tea at this point. Because the very first start will take a long time. Actually, it will consume most of the first stage time.

[5:40-6:15] Test the Plugin

We didn't start changing the first plugin yet, so I expect everything to be OK at this point. But anyway it's a good idea to test everything. I think we can afford 30-40 seconds for this.

And again I assume you know how to use DVD and don't need a step-by-step guide on creating a simple visualisation. Just create a new project, select any data source, go to Visualisations, find your plugin and add it. You may even add some data to the plugin and it will show how many rows your data has. I think that's a good place to start from.

a picture of plugin

[6:15-22:00] Add Functions

Change parameters

If you didn't skip testing phase and played with this new toy for a while you may have noticed two things we need to change right now. First, for a chosen type of visualisation, I need that my plugin can accept two measures. You possibly noticed that I was unable to add Number of cancelled ballots as a second measure. By default a plugin accepts not more than one (zero or one). Luckily it can be changed very easily.

We can find all parameter about measures, rows, columns etc inside of extensions folder. In this case it is %PLUGIN_DEV_DIR%\customviz\com-rittmanemad-demo\extensions. Here we can find two folders. The first one is oracle.bi.tech.pluginVisualizationDataHandlerand it has only one file com.rittmanemad.demo.visualizationDataHandler.JSON. This file allows us to define which types of input data our plugin has (measures, columns, rows, colour etc.), what are their types (measures/dimensions), which is the default one and so on.

a picture of json and plugin side-by side Here we should change maxCount to 2. This will allow us to add more than one measure to our visualisation.

The second thing I want to change is the number of data points. Elections dataset has data about 100K precinct commissions. And DVD's default is 10K. We can change this value in the second configuration file from extensions folder. Its name is oracle.bi.tech.Visualization. There is again only one JSON file com.rittmanmead.demo.json. We need to change rows to 200000.

Why 200000 if I need only 100000 points? Well, it's about how data will be passed to our code. Every measure is a separate row, so I need 100K points with 2 measures each and that gives me 200000 rows. It looks like right now that's the only way to have more than one measure in a plugin (at least it's my best knowledge for DVD 12.2.2.2.0-20170208162627).

a picture of json and plugin side-by side

Note. I could change here some things like plugin name, input data types shown to user and so on but my aim is simplicity.

Now we should restart DVD SDK in order to use new parameters.

two measures and ~200K rows

Write code

For my custom scatter chart I'm going to use d3js JavaScript library. It will allow me to concentrate more on logic and less on HTML manipulation. To add it to my plugin I should add a couple of strings in the beginning of my code.

Before:

define(['jquery',  
        'obitech-framework/jsx',
        'obitech-report/datavisualization',
        'obitech-reportservices/datamodelshapes',
        'obitech-reportservices/events',
        'obitech-appservices/logger',
        'ojL10n!com-rittmanmead-demo/nls/messages',
        'obitech-framework/messageformat',
        'css!com-rittmanmead-demo/demostyles'],
        function($,
                 jsx,
                 dataviz,
                 datamodelshapes,
                 events,
                 logger,
                 messages) {

After (I added two lines starting with d3):

define(['jquery',  
        'obitech-framework/jsx',
        'obitech-report/datavisualization',
        'obitech-reportservices/datamodelshapes',
        'obitech-reportservices/events',
        'obitech-appservices/logger',
      'd3js',
        'ojL10n!com-rm-domoViz/nls/messages',
        'obitech-framework/messageformat',
        'css!com-rittmanmead-demo/demostyles'],
        function($,
                 jsx,
                 dataviz,
                 datamodelshapes,
                 events,
                 logger,
               d3,
                 messages) {

That's all. Now I can use d3js magic. And that's cool.

OK, it's time to make our plugin do something more useful than a simple counting of rows. All plugin code is in demo.js file and the main procedure is render. It is called every time DVD needs to update the visualisation. The initial version of this function is really small. Without comments there are only four rows. It retrieves data, counts rows, then finds a container to write and writes a message.

Demo.prototype.render = function(oTransientRenderingContext) {  
      // Note: all events will be received after initialize and start complete.  We may get other events
      // such as 'resize' before the render, i.e. this might not be the first event.

      // Retrieve the data object for this visualization
  var oDataLayout = oTransientRenderingContext.get(dataviz.DataContextProperty.DATA_LAYOUT);

      // Determine the number of records available for rendering on ROW
      // Because we specified that Category should be placed on ROW in the data model handler,
      // this returns the number of rows for the data in Category.
  var nRows = oDataLayout.getEdgeExtent(datamodelshapes.Physical.ROW);

      // Retrieve the root container for our visualization.  This is provided by the framework.  It may not be deleted
      // but may be used to render.
  var elContainer = this.getContainerElem();
  $(elContainer).html(messages.TEXT_MESSAGE.format("Demo Plugin", "" + nRows));
};

First of all, I need to know the actual size of the plotting area. I simply added that after var elContainer = this.getContainerElem();:

  [...]
  var elContainer = this.getContainerElem();

  //container dimensions
  var nWidth = $(elContainer).width();
  var nHeight = $(elContainer).height();
  [...]

The template plugin has code for pulling the data into the script (oDataLayout variable). But for my scatter plot, I need to put this data into an array and find the minimum and the maximum value for both measures. This part of the code is my least favourite one. It looks like currently we can't make two separate measures in a custom viz (or at least I can't find the solution), therefore instead of two nice separate measures, I have them as different rows. Like:

.........
PEC #1Turnout,%34,8%
PEC #1United Russia,%44,3%
PEC #2Turnout,%62,1%
PEC #2United Russia,%54,2%
PEC #3Turnout,%25,9%
PEC #3United Russia,%33,2%
.........

I really hope that any solution for this will be found. So far I just put even and odd rows into X ad Y coordinates. At the same time, I determine minimum and maximum values for both axes.

  //temporary measure
  var tmp_measure;

  //current measure
  var cur_measure;

  var vMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];
  var hMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];

  for(var i=0;i<nRows;i++){

    if(i%2==1){
      cur_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      vMinimax[0]=Math.min(vMinimax[0], cur_measure);
      vMinimax[1]=Math.max(vMinimax[1], cur_measure);
      points.push([cur_measure,tmp_measure]);
    }
    else{
      tmp_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      hMinimax[0]=Math.min(hMinimax[0], tmp_measure);
      hMinimax[1]=Math.max(hMinimax[1], tmp_measure);
    }
  }

The next part of the code is about handling multiple calls. I should handle things like changing measures or dimensions. I simply delete the old chart and create a new empty one.

  var oSVG;

  //Delete old chart if exists
  d3.select("#DemoViz").remove();

  //Create new SVG with id=DemoViz
  oSVG=d3.select(elContainer).append("svg");
  oSVG.attr("id","DemoViz");

  //Set plot area size
  oSVG=oSVG
       .attr("width", nWidth)
       .attr("height", nHeight)
       .append("g");

Now I have an array with data. I have a prepared SVG plot for drawing. And I have no reason not to combine all this into a chart.

  //Compute scaling factors
  var hStep=(nWidth-40)/(hMinimax[1]-hMinimax[0]);
  var vStep=(nHeight-40)/(vMinimax[1]-vMinimax[0]);

  //Draw
  oSVG.selectAll("rect")
      .data(points)
      .enter()
      .append("rect")
      .attr("x", function(d) {return 20+d[1]*hStep;})
      .attr("y", function(d) {return nHeight-(20+d[0]*vStep);})
      .attr("height",1).attr("width",1)
      .attr('fill-opacity', 0.3);

What I should mention here is opacity. If its value is too high (almost no opacity) the chart looks like this:

0.99

If I make dots more opaque, the chart looks (and works) better. This is how it looks like with fill-opacity=0.3.
0.99 Look at the black point in the middle. It's there because a lot of commissions has exactly the same values for both measures. That's how we could find Saratov region anomaly.

[22:00-] Summary and TODOs

This shows how we may enhance Oracle DVD with our own visualisation. Yes, it's really, really, really simple. And it needs a lot of work in order to make it more useful. First of all, we should make it more reliable. This realisation doesn't test input parameters for correctness. And we also need to add interactivity. The chart should be zoomable and selectable. It should work as a filter. It should react to resize. It needs visible axes. And so on. But we can create it in less than a half of hour and I think as a first step it works well. I hope you don't afraid of making your own plugins now.

If you need more complex sample, take a look at the Oracle's tutorial. It covers some of these TODOs.