Category Archives: Rittman Mead

Oracle Data Integrator 12c: Getting Started – Developer’s Quickstart

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week in the “Oracle Data Integrator 12c: Getting Started” series: getting a quick start on mapping development. Several objects must exist before a single bit of ETL can be built, and for those who are new to the product, as many readers of this series will be, that can be frustrating. The objects that must be in place are as follows:

  • Data Server
    This object is the connection to your data source. Created under one of the many technologies available in ODI, this is where the JDBC URL, username, password, and other properties are defined and stored.
  • Physical Schema
    Underneath the Data Server you’ll find the Physical Schema. This object, when connecting to a relational database, represents the database schema where the tables you wish to access in ODI reside.
  • Logical Schema
    Here’s where it can sometimes get a bit tricky for folks new to Oracle Data Integrator. One of the great features in ODI is how it abstracts the physical connection and schema from the logical objects. The Logical Schema is mapped to the Physical Schema by an object called a Context. This allows development of mappings and other objects to occur against the Logical Schema, shielding developers from the physical details. Now, when promoting code to the next environment, nothing in the developed objects needs to change for the connection.
  • Model
    Once you have the Topology set up (Data Server, Physical Schema, Logical Schema), you can then create your Model. This is where the logical Datastores are grouped for a given schema. There are many other functions of the Model object, such as journalizing (CDC) setup, but we’ll save those features for another day.
  • Datastore
    The Datastore is a logical representation of a table, file, XML element, or other physical object. Stored in the form of a table, the Datastore has columns and constraints. This is the object that will be used as a source or target in your ODI Mappings.
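
To make the Logical Schema / Context idea concrete, here is a minimal Python sketch of the lookup. It is only a conceptual model, not ODI's SDK or repository structure, and every name in it (LS_SALES, DS_ORCL_DEV, SALES_DEV and so on) is a hypothetical example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalSchema:
    data_server: str  # name of the Data Server holding the JDBC details
    schema: str       # database schema where the tables live


# (context, logical schema) -> physical schema. All names are hypothetical.
CONTEXT_MAP = {
    ("DEV", "LS_SALES"): PhysicalSchema("DS_ORCL_DEV", "SALES_DEV"),
    ("PROD", "LS_SALES"): PhysicalSchema("DS_ORCL_PROD", "SALES"),
}


def resolve(context: str, logical_schema: str) -> PhysicalSchema:
    """Return the physical schema a logical schema points to in a context."""
    try:
        return CONTEXT_MAP[(context, logical_schema)]
    except KeyError:
        raise LookupError(
            f"{logical_schema!r} is not mapped in context {context!r}"
        ) from None


def qualified_name(context: str, logical_schema: str, datastore: str) -> str:
    """Build the schema-qualified table name used at execution time."""
    return f"{resolve(context, logical_schema).schema}.{datastore}"


# The mapping only ever refers to LS_SALES; the context picks the real schema.
print(qualified_name("DEV", "LS_SALES", "ORDERS"))   # SALES_DEV.ORDERS
print(qualified_name("PROD", "LS_SALES", "ORDERS"))  # SALES.ORDERS
```

The point of the sketch is that the developed object keeps the same logical name in every environment, and only the context passed in at run time changes.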

Now you can create your mapping. Whew!

Over the years, Oracle has worked to make the process of getting started a lot easier. Back in ODI 11g, the Oracle Data Integrator QuickStart was a 10-step checklist, where each step led to another section in the documentation. A nice gesture by Oracle, but by no means “quick”. There was also a great tool, the ODI Accelerator Launchpad, built in Groovy by David Allan of the Oracle DI team. Now we were getting closer to something “quick”. But this was simply a script that you had to run, not an integrated part of the ODI Studio platform. Finally, with the release of ODI 12.1.3, the Quickstart was introduced. The New Model and Topology Objects wizard allows you to create everything you need in order to reverse engineer tables into ODI Datastore objects and begin creating your first mappings.

ODI 12c New Model and Topology Objects wizard

Going through the wizard is much simpler than manually setting up the Topology objects and Model for folks just getting started with Oracle Data Integrator. The blog post from Oracle linked above can walk you through the process and I’ve added a demonstration video below that does the same. As a bonus in my demo, I’ve added a tip to help you get your initial load mappings created in an instant. Have a look:

There you have it, a quick and easy way to get started with Oracle Data Integrator 12c and create your first source-to-target Mapping. If you have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our data integration experts. Up next in the Getting Started series, we’ll look at enhancing the ODI metadata by adding constraints and other options.

Oracle Data Integrator 12c: Getting Started – Installation and Configuration

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week, I’m continuing the “Oracle Data Integrator 12c: Getting Started” series with the next topic: Installation and Configuration. I don’t plan to dig into all of the nitty gritty details on each specific scenario you might run into. Rittman Mead have an excellent training program for ODI 12c where you can ask about the specifics. There is also this 100+ page document from Oracle that can provide further details: Oracle® Fusion Middleware - Installing and Configuring Oracle Data Integrator 12c (12.2.1) E55859-03. In this post, we’ll hit on the basics and hopefully guide you in the right direction.

First, let’s take a look at the installation process. Installing ODI 12c is quite simple compared to past versions of the product. This is mainly due to the configuration of agents being moved outside of the initial installation and into a completely different process. The big choice we have now is which implementation to choose: standard or enterprise.

Standard:
Choose this installation if you plan to use a Standalone Agent to execute your mappings and data integration processes. This will install ODI Studio and the Standalone Agent, along with all of the necessary drivers and libraries to make everything work. Choose this option when installing ODI Studio on your client machine.

Enterprise:
This option will install the JEE Agent, Standalone Colocated Agent, as well as ODI Studio. The installation footprint is slightly larger than the Standard install, coming in at 3390 MB vs 3017 MB for standard. That’s a part of why I choose Standard when just installing Studio. Along with the JEE Agent and ODI Studio, you’ll also have the ODI Console and ODI plugin for Fusion Middleware Control (Enterprise Manager). One thing to note, you’ll need to first install the Oracle Fusion Middleware Infrastructure (download) package as it is a prerequisite to the ODI 12c Enterprise version.

To get started, download the bits and bytes from Oracle OTN and unzip the file. Next, using the appropriate Java version, run the ODI installer.

[oracle@bigdatalite bin]$ ./java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[oracle@bigdatalite bin]$ cd ../..
[oracle@bigdatalite odi122120]$ ./jdk1.8.0_111/bin/java -jar fmw_12.2.1.2.0_odi.jar 
Launcher log file is /tmp/OraInstall2016-10-21_12-20-12PM/launcher2016-10-21_12-20-12PM.log.
Extracting the installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .

From there, simply go through the install wizard. I won’t go into too much detail because there’s not much to it. First, decide whether you will install the Standard or Enterprise version. Next, determine where you want to install your ODI home. For a JEE or Colocated Agent installation, ODI must be installed into the Oracle Home containing Fusion Middleware Infrastructure. For a Standalone Agent, a brand new Oracle Home must be used. Now that we’ve installed the software, it’s time to set up the repository database schema.

Master and Work Repository Setup

The Repository Creation Utility (RCU) is delivered as a part of the ODI install. It is used to create Fusion Middleware repositories, such as the ODI Master/Work Repositories, and any additional database schemas necessary.

Start the RCU by browsing to ODI_HOME/oracle_common/bin and running ./rcu. The steps are pretty straightforward, so I’ll leave this as an exercise for everyone to try at home. Once we have our repositories, we can begin configuring Agents and Studio.

ODI Studio Configuration

Setting up ODI Studio is fairly straightforward, so I’ll keep it brief. Once installed, browse to ODI_HOME/odi/studio and fire up ./odi.sh (or odi.exe in Windows). Then, create a new connection to your ODI Master and Work repository. Enter the ODI username/password to connect with, master repository schema username/password, JDBC connection information, and Work repository. Test the connection to ensure it’s all working and you’re set.

Standalone Agent Configuration

Browse to your /oracle_common/common/bin and run the WebLogic Server Configuration wizard.

./config.sh

Create ODI Agent

First, create a new domain for the agent. This is similar to a WebLogic domain if you’ve gone through the setup process for any application that relies on WLS. It’s recommended to create the domain outside of the ODI Oracle Home. That way you can avoid issues when upgrading/reinstalling and can follow the same domain directory structure if installing multiple agents, since each must be installed in its own domain.

Next, choose the Standalone Agent Template, installed as part of the ODI download. Then, select the JDK to be used by the agent.

Configure ODI Agent component

The repository is already set up, so now if we connect to the service table (STB) schema, we can look up the ODI repository connection information without having to manually enter anything.

Get RCU configuration

The system component will be the ODI Agent name. Then, set the host server listen address, port, and Supervisor username/password.

ODI Agent configuration

Next, set up the Node Manager for the ODI Agent. The Node Manager doesn’t have to be used when starting the agent, but setting it up is required during configuration. The Username/Password should be kept in a safe place, as it will be necessary for starting components within the Node Manager.

Finally, click Create and you’re off and running.

Starting the Standalone Agent

Before you can start the agent, be sure to configure it in the master repository via ODI studio. Check out the video below for a short tutorial.

Now, you can start up the agent with the Node Manager or without it. If you want to manage the agent via Fusion Middleware Control, use the Node Manager. But if you plan to simply use the command line to start and stop the agent, you can do so using the classic startup method.

Start Agent with Node Manager

From within the DOMAIN_HOME/bin directory, run the following. You’ll want to use nohup to start the process in the background on Linux. For Windows, create a service to run nodemanager.cmd.

[oracle@bigdatalite bin]$ pwd
/u01/odi_agent/user_projects/domains/base_domain/bin
[oracle@bigdatalite bin]$ nohup ./startNodeManager.sh > nm.out&
[oracle@bigdatalite bin]$ nohup: ignoring input and redirecting stderr to stdout

Next, start the agent component from within the same directory. You’ll be prompted for the Node Manager password set up during configuration.

[oracle@bigdatalite bin]$ ./startComponent.sh OracleDIAgent1
Starting system Component OracleDIAgent1 ...

Initializing WebLogic Scripting Tool (WLST) ...
...
Reading domain from /u01/odi_agent/user_projects/domains/base_domain

Please enter Node Manager password:
Connecting to Node Manager ...

Start Agent without Node Manager

For times when you don’t need the Node Manager, you can still start the ODI Agent using the classic method. Just remember to always use all-caps when referring to the agent parameters, such as NAME.

./agent.sh -NAME=OracleDIAgent1
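
If you launch the agent from a script, a small helper can enforce the all-caps rule for you. The Python sketch below is only an illustration: the helper name agent_command is made up, and NAME is the only parameter taken from the command above.

```python
def agent_command(script: str = "./agent.sh", **params: str) -> list:
    """Build the argument list, upper-casing every parameter name."""
    return [script] + [f"-{key.upper()}={value}" for key, value in params.items()]


# Lower-case keyword in, all-caps agent parameter out.
print(agent_command(name="OracleDIAgent1"))
```

The resulting list can be handed to subprocess.run from within the directory that contains agent.sh.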

Either approach for starting the agent will get you up and running.

Colocated Agent Configuration

The Colocated Agent is essentially a Standalone agent that is managed and maintained via WebLogic Server. Rather than repeat the steps, I thought I would point you to a nice Oracle By Example on Creating a Colocated Agent. As mentioned earlier, be sure to create your domain outside of the ODI home.

Ping colocated agent

For reference, when testing an agent, simply place the hostname:port/agent-web-application-context in a browser. The Ping response means the agent is up and running.
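
The same check can be scripted. The following Python sketch builds the hostname:port/agent-web-application-context URL and reports whether anything answers on it. The host, port (20910) and context (oraclediagent) are assumed example values, as is the http scheme, so substitute whatever you entered during configuration.

```python
import urllib.request


def agent_url(host: str, port: int, context: str) -> str:
    """Build the URL to test: hostname:port/agent-web-application-context."""
    return f"http://{host}:{port}/{context.strip('/')}"


def agent_responds(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


# Assumed example values; use your own host, port and context.
url = agent_url("localhost", 20910, "oraclediagent")
print(url)
```

Calling agent_responds(url) from a monitoring script gives a simple up/down signal; it does not inspect the body of the Ping response.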

JEE Agent Configuration

Finally, we’ll discuss the JEE Agent configuration. As mentioned in my previous post, the main reasons for using the JEE Agent are high availability and connection pooling. You may also want to use the ODI Console or need to manage and monitor ODI agents from FMW Control. Whatever the reason, here’s how we configure the agent.

The configuration process is very similar to that of the Colocated Agent, but with several extra steps involved. I’ll highlight those steps here. To get started, just as with the other agents, browse to the ODI_HOME/oracle_common/common/bin directory and run ./config.sh.

Work through the usual first few installer screens. Then, we select the template to configure. Choose ODI Agent and any additional items you’d like to add to the domain, such as ODI Console or Enterprise Manager plugin.

JEE Agent template

Next, add the credential mappings for both the ODI user, allowing JEE agent access to the Master / Work Repository, and the WebLogic domain user.

JEE Agent credential mapping

Select the options you plan to configure for the domain. This will vary based on what currently exists in WebLogic Server, but in this example we’ll be setting up a new AdminServer, Node Manager, and the Managed Server to host the ODI JEE Agent.

JEE Agent domain configuration

Next, configure the AdminServer, Node Manager, and Managed Server. For each, remember to use the machine IP address and not the default “All Local Addresses”. For the Node Manager, enter a username and password as we did in previous configurations.

JEE Agent server configuration

The final screens in the configuration wizard allow you to create a cluster and assign the cluster to the server. You can assign a dynamic cluster as needed for scalability, though this falls outside the context of this article. Assign the cluster to a server, create the machine (remember to assign the IP address), and finally assign the servers (AdminServer and Managed Server) to the machine.

JEE Agent topology

Click Create and you're done! Now start WebLogic and the Node Manager, then log in to the console to start the ODI Managed Server and agent.

There you are, ODI Agents installed and configured. As always, there are many external factors that may change how your configuration and setup works in your environment, so please think of this as a simple guide and not the exact approach for each unique situation. If you do have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our experts on the product. Up next in the Getting Started series, we’ll look at creating your first ODI mapping.

Oracle Data Visualization Desktop: Star Schemas and Multiple Fact Sources

Once added to a project, the columns I specified with my custom SQL query now show up as one source.

Now that I have a custom data source with only the data I want to explore, I can easily create an analysis without having to sift through multiple data sources.

* A note about Data Visualization Desktop and caching: When using the above method of writing SQL to create a virtual table, it is loaded into the cache. You should only use this method for very compact models. If the virtual table contains too much data, you can still add it as a data source but it may be too big to cache, causing your columns not to load when creating analyses.

Although having one fact source is common in relational models, using multiple fact sources is sometimes unavoidable in dimensional modeling.

In my sample data, I have another schema called GCBC_SURVEYS which contains two fact tables containing satisfaction scores for both customers and staff and one dimension table containing the organization that conducted the surveys.

For this example, I’m going to try to add each table as a data source manually first and attempt to join the two fact tables to my dimension table. When using this method, take care to change any key and ID columns from Measure to Attribute so they aren’t aggregated. Data Visualization Desktop sees a numeric datatype and assumes it’s a measure.

Once I've added in all of the GCBC_SURVEYS tables as data sources, I’m going to load them all into a project and create my joins using the source diagram. When I joined each fact table to the dimension table on SURVORG_ID, notice how DVD automatically created a join between my two fact tables.

This is not desirable because, due to the presence of a circular join, we run the risk of double counting. When I try to break the join between the two fact tables, DVD asks which data source I want to break conformance from.
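
To see why a direct join between the two fact tables inflates totals, consider this small Python sketch. The rows are made up for illustration (one survey organization with two scores in each fact table) and the join is a plain inner join on the shared key, not DVD's actual query plan.

```python
from collections import defaultdict

# Hypothetical rows: (SURVORG_ID, score). One organization, two scores each.
customer_scores = [(1, 8), (1, 6)]
staff_scores = [(1, 9), (1, 7)]


def join_on_key(left, right):
    """Inner-join two row lists on their first column (the shared key)."""
    by_key = defaultdict(list)
    for key, value in right:
        by_key[key].append(value)
    return [(key, lval, rval) for key, lval in left for rval in by_key[key]]


joined = join_on_key(customer_scores, staff_scores)

correct_total = sum(score for _, score in customer_scores)
inflated_total = sum(cust for _, cust, _ in joined)

print(len(joined))     # 4 rows instead of 2
print(correct_total)   # 14
print(inflated_total)  # 28
```

Every customer row is repeated once per matching staff row, so summing the customer score over the joined rows counts each value twice.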

When I select one of the fact sources, it will not only break the join between the two fact sources but also the join between the fact and the dimension table.

As of this writing, I have not found a way to only break joins between fact tables if they are using the same key to connect to the dimension table.

The only workaround I’ve found is to write a SQL statement that pulls in the columns and creates the joins in one virtual table. This way I could specify the joins without DVD creating one between the fact sources.

Once I created my virtual table, I could use it to create a report and use both fact sources.

Although it can take some time to set up all the data sources you want to use for your model, Data Visualization Desktop packs some powerful features when it comes to dimensional modeling. The ability to use more than one fact source when needed adds another area of flexibility to DVD. Whether you are pulling in each table and then creating your model or writing SQL to create one virtual table, DVD can accommodate a variety of different scenarios.

Creating Security Profiles in ODI 12c


As a newcomer to ODI I enjoy hearing from the more seasoned veterans about common situations they encounter on projects. One of these recurring situations (especially if the company has a very small dev team) is the lack of security. I will not discuss how Oracle improved security by using public/private key pairs for Cloud services, external hackers or any of the buzzwords the media likes to toss about. But I will share with you an easy way to create profiles in ODI to set up a more secure work environment.

Generally speaking, security is neglected because admins, operators or users are not aware of how to set it up, or they find it too limiting and tedious to deal with. Other times you might see the exact opposite, where someone has it so locked down you have to request project permissions on the hour just to get work done. (Pro-tip: never let control freaks set up or manage security! Just kidding. Maybe.)

Prior to starting any security profile setup, make sure to sit down and really put some thought into the types of profiles you want to create. Think about the different types of work being done in ODI (developer, operator, etc.) and what level of permission someone may require. Review the built-in generic profiles here. Keep in mind that you will need to set up security in each environment (Dev, Test, QA, Prod and any others you might use) that you want to connect to. No security setup 'automatically' transfers over to other environments, and not all users require access to each environment.

In this tutorial we will take into consideration the following users:

  • DI Projects Developer - Level I
  • Senior BI Project Manager
  • Consultant

We will set up the Security Profile (access) for each user and connect it to the appropriate User for the DEV environment.

NOTE: This tutorial is specific to ODI internal password storage and authentication, not external authentication.

The first step is to decide what type of security profile (access) each user will need.

  • DI Projects Developer - Level I: Entry-level DI developer. Should be able to develop in select projects only. Should also have 'view-all' access across the environment.
  • Senior BI Project Manager: Full access to all related tasks in Designer, Operator and Topology. Might also have ability to edit or create new users.
  • Consultant: Brought in to assist in developing mappings and to load new data from a recently acquired company.

OK, now we can begin the setup.

    1. In a work environment you will log in using an ADMIN account or a Security Profile that has the credentials to create security profiles. In our example you are assumed to be logged in as Admin, Training or Supervisor and to have the correct access to set the profiles up.
    2. Navigate to the Security tab in ODI. If you do not see it, go to the very top menu and click Window > ODI Security Navigator.

      Now we will create the User logins that will be linked to each profile.

      Most of you will already have User logins. In that case, just double-click the specific 'User Profile' when logged in under the ADMIN or full access account to edit permissions for the User.

    3. Expand the 'Users' accordion and click 'New User' to open up the properties window.
    4. Create a login for Jane Maine, our Level I ETL Developer. Assign her the profiles that will allow strictly regulated access in the Designer Navigator to projects and models but allow her view access for everything. (Review the profile descriptions.)

      We see that in order to limit her development access in DEV to specific projects and models in the Designer Navigator, we must use the non-generic profiles NG_DESIGNER and NG_VERSION_ADMIN. We also must include CONNECT so she has the option to connect to DEV.

      Fast Review: An object is a representation of a design-time or run-time artifact handled through Oracle Data Integrator. Examples of objects include agents, projects, models, data stores, scenarios, mappings, and even repositories. An instance is a particular occurrence of an object. For example, the Datawarehouse project is an instance of the Project object. A method is an action that can be performed on an object, such as edit or delete.

      Generic profiles allow access to all methods of all instances of an object.

      Non-generic profiles are not authorized for all methods on the instances; an admin must grant rights on the methods for each instance.

    5. Input Jane's information and assign her a password (abc123) by clicking 'Enter Password'. Make sure that Jane's password will expire in 6 months, forcing her to change it for security purposes. Click 'OK'. The description is optional, but I always add one.
    6. Best Practice: Always go in to your account and change any temporary password. See the video on how to do that, here.
    7. On the same properties window click the 'Authorization' tab on the top left side. We are granting very limited access because her projects, as an entry-level developer, are limited. Click 'Projects' and allow her access to all methods within 'Projects'. Select all methods (use the checkmark in the top left of each object) from the following, and select Save after the methods for each object group have been selected:
      • Column
      • Condition
      • Diagram
      • Folder
      • Interface
      • Load Plan
      • Mapping
      • Package
      • Procedure
      • Procedure Command
      • Scenario
      • Scenario Variable

    8. Now we create the User and Profile for a recently hired Senior BI Manager named Will Doe. Following the same steps, create the User by expanding (or locating) the Users accordion and clicking New User. Make sure to set the password to expire in 6 months.
    9. Unlike the entry-level employee, Will Doe needs full access as Senior Manager, but he does not need Supervisor access. Check each generic profile (do not check any that start with NG) and click Save.

      Pro Tip: If you aren't sure your security settings are correct, after your new user/profile is saved, expand the 'Objects' and/or 'Instances' nodes under the User's name and see what is available.

    10. Now we need to create the Consultant's general User and profile. The Consultant password does not need to expire, since we will let the account expire after a month.
    11. Create a new User under the Users accordion. Use the name: 'Consultant', Password: abc123, Notes: Temp consultant for ETL DEV work only.
    12. In this situation, the consultant will need nearly full access but not total access to everything. Check all of the generic profiles EXCEPT VERSION_ADMIN. Select NG_VERSION_ADMIN to allow selective version access.
    13. Click on the 'Authorizations' tab on the top left, scroll down in the objects list, select 'Version' and check only Compare, Restore and View. Click Save.
    14. Now we test our user settings. Disconnect ODI and log in using each User you created. Look at the limitations for each user.
    15. Jane Maine, for example, can only access Projects and Load Plans, but not any of the models. What differences do you see for your profiles?

There are so many options for creating secure Users and Profiles within ODI that allow the appropriate amount of access with maximum security, and in fact, it's fairly easy. You can block out high-level access such as the entire Operator Navigator or Designer Navigator, all the way down to granular security where you can block out very specific Methods and the Objects associated with them.

A word to the wise: It is strongly suggested that you only use a generic SUPERVISOR or ADMIN account that has full ODI access for creating users, profiles, changing passwords, etc. Create your own personal user/profile to perform daily work. The reason for this is to know who specifically is doing what. If the user is assigned ADMIN (or something generic) then there is no way to tell who used the login.

Other suggested settings to try out: You can create Users and Profiles for admin purposes including 'Designer Navigator only' access, 'Topology Navigator only' access, 'Operator Navigator only' access, and variations where you can only access the Designer Navigator and Topology Navigator, but not the Operator tab.
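
The difference between generic and non-generic profiles from the Fast Review can be modelled in a few lines of Python. This is a deliberately simplified sketch of the rule as described in this post, not ODI's security API; the classes and the can() helper are invented for illustration, and the single 'edit' method on Project stands in for the full method list.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    generic: bool
    methods: set  # (object, method) pairs the profile covers


@dataclass
class User:
    name: str
    profiles: list
    # Per-instance grants: (object, instance, method). Needed for NG profiles.
    grants: set = field(default_factory=set)

    def can(self, obj: str, instance: str, method: str) -> bool:
        for profile in self.profiles:
            if (obj, method) not in profile.methods:
                continue
            if profile.generic:
                return True  # generic: every instance of the object
            if (obj, instance, method) in self.grants:
                return True  # non-generic: only instances granted explicitly
        return False


designer = Profile("DESIGNER", True, {("Project", "edit")})
ng_designer = Profile("NG_DESIGNER", False, {("Project", "edit")})

will = User("Will Doe", [designer])
jane = User("Jane Maine", [ng_designer], {("Project", "Datawarehouse", "edit")})

print(will.can("Project", "AnyProject", "edit"))     # True
print(jane.can("Project", "Datawarehouse", "edit"))  # True
print(jane.can("Project", "AnyProject", "edit"))     # False
```

Will's generic profile covers every project, while Jane's non-generic profile only takes effect on the instance an admin has granted her.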


Combining Google Analytics and JSON data through Apache Drill in Oracle Data Visualization Desktop

I've been talking a lot about Oracle's Data Visualization Desktop (DVD) recently, explaining DVD 12.2.2.0 new features and the details of Data Flow component via a fantasy football use case.

Yesterday a new requirement was raised within Rittman Mead: we wanted to analyse our blog stats and specifically understand the number of page views per author of blog posts published in 2016. The two sources of our data were:

My colleague Robin Moffatt already explained in his post how to connect DVD to Google Analytics and how to query JSON files with Apache Drill. Both sources are compatible with DVD, and in this blog post I'll explain my journey in the analysis with the tool to combine both sources.

Ghost JSON data preparation

Following Robin's blog I created two Apache Drill views on top of Ghost Data:

  • vw_fact_post: containing the post information
create or replace view vw_fact_posts as  
 select
 po.post.id id,
 po.post.uuid uuid,
 po.post.title title,
 po.post.slug slug,
 po.post.markdown markdown,
 po.post.published_by published_by,
 cast(po.post.published_at as date) published_at,
 po.post.updated_at updated_at,
 po.post.created_by created_by, 
 cast(po.post.created_at as date) created_at,
 po.post.author_id author_id,
 po.post.meta_description meta_description,
 po.post.visibility visibility,
 po.post.`language` lan,
 po.post.status status from 
 (select flatten(ghost.db.data.posts) post from dfs.tmp.ghost) po;
  • vw_dim_author: containing author data.
create or replace view vw_dim_author as
select author.u.id id,
 author.u.name name,
 author.u.slug slug,
 author.u.password pwd,
 author.u.email email,
 author.u.image image,
 author.u.status status,
 author.u.`language` lan,
 author.u.visibility visibility,
 author.u.last_login last_login,
 author.u.created_at created_at,
 author.u.updated_at updated_at,
 author.u.updated_by updated_by 
from (select flatten(ghost.db.data.`users`) u from dfs.tmp.ghost) author;  

The views are not strictly required for the purpose of the analysis, since Drill SQL can be entered directly in DVD. However, creating them has two advantages:

  • the interface between DVD and Drill is cleaner, no complex SQL has to be entered and debugged
  • the views can be reused for other projects outside DVD if needed
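
If FLATTEN is new to you, the Python sketch below mimics what the views do: it walks down to the posts array and emits one row per element. The tiny JSON document is a made-up stand-in that mirrors the db.data.posts path used in the view; the real Ghost export carries many more fields and may be nested differently.

```python
import json

# Tiny stand-in for the Ghost export; the real file has many more fields.
ghost_export = json.loads("""
{"db": {"data": {"posts": [
    {"id": 1, "title": "First post", "author_id": 10},
    {"id": 2, "title": "Second post", "author_id": 11}
]}}}
""")


def flatten_posts(document: dict) -> list:
    """Emit one row per element of db.data.posts, as FLATTEN does in the view."""
    return [
        {"id": post["id"], "title": post["title"], "author_id": post["author_id"]}
        for post in document["db"]["data"]["posts"]
    ]


for row in flatten_posts(ghost_export):
    print(row)
```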

DVD Data Source Settings

Robin's post provided all the details needed to connect to Google Analytics, so there is no need to add anything there. The Apache Drill data source setup is pretty easy: we just need to specify the hostname and port where Drill is running, along with the connection username and password.

Drill setup

Once the Drill connection is working I can start importing the views. I do it by selecting the myDrill connection, choosing the dfs.tmp database, selecting the view I want to import and clicking on Add All or selecting the columns.

Drill import

When clicking on OK, an "Unsupported SQL statements" error may be raised. This is due to the generated SQL using double quotation marks (") instead of the backticks (`) that Drill needs. I fixed the error by clicking on Enter SQL and changing the Drill SQL as in the image below.

Drill Error
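
The fix itself is mechanical: every double-quoted identifier becomes a backtick-quoted one. As a rough illustration, the Python sketch below does that substitution on a hypothetical generated statement; the exact SQL that DVD produces is not shown in this post, so treat the sample string as an assumption.

```python
import re


def to_drill_identifiers(sql: str) -> str:
    """Replace "double-quoted" identifiers with `backtick` identifiers."""
    return re.sub(r'"([^"]+)"', r"`\1`", sql)


# Hypothetical example of a statement using double-quoted identifiers.
generated = 'select "id", "title" from "dfs.tmp"."vw_fact_posts"'
print(to_drill_identifiers(generated))
```

This naive version rewrites every double-quoted span, so it is only safe for statements that contain no double quotes inside string literals.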

Having imported the two sources, I can review the Measure/Attribute definitions as well as the aggregation methods. This is a crucial point since the associations are made automatically and could be wrong. In my case the author_id column was automatically declared as a Measure, which prevented me from using it in joins. I can fix the definitions by right-clicking on the source, selecting Inspect and amending the Measure/Attribute definition.

Drill Measure

Wrangling Google Analytics Data

DVD's Google Analytics connector exposes a set of pre-aggregated views of the tracking data. The Page Tracking view contains a summary of page views, entrances, exit rates and other KPIs at page level - exactly the information I was looking for.

Page Tracking

I then started analysing Page Tracking data within DVD, and found some discrepancies within the data.

  • we recently moved our blog from WordPress to Ghost: all pages accessed since the rittmanmead.com website moved to Ghost have the /blog/ prefix, which was not the case when we were still on WordPress.
  • the page name could contain a query string appendix like /?.....
  • the Ghost slug contains the pure page name, without a folder prefix like /blog/2014/ and without a starting or ending /

The following is an example of data retrieved from Google Analytics, all for one actual blog post.

GA Row Data

Wrangling the Google Analytics data was necessary, and DVD's Data Flow component suited my needs perfectly. I started by importing the Page Tracking data source, then added two columns:

  • RemovingBlogFromPageName, which uses a case-when to work around the Ghost/WordPress problem described above and a substring to remove the initial /
  • PageNameClean, which uses a substring function to remove an extra appendix like /?.....

Code

I then added the Aggregate step to define the Measures/Attributes and aggregation types. Finally I saved the result locally.

Global GA Flow

The two columns were added in two consecutive Add Columns steps since the RemovingBlogFromPageName column is a dependency in the PageNameClean formula. Several columns can be added in a single Add Columns step only if they reference columns that already exist in the dataset.
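
For readers who prefer code to screenshots, here is a rough Python equivalent of the two Add Columns steps. It is a sketch of the logic described above, not the actual Data Flow formulas; the sample paths are made up, and the final trailing-slash strip in to_slug is my own assumption to line the result up with the Ghost slug.

```python
def remove_blog_from_page_name(page: str) -> str:
    """Drop the '/blog/' prefix if present, otherwise just the leading '/'."""
    if page.startswith("/blog/"):
        return page[len("/blog/"):]
    return page[1:] if page.startswith("/") else page


def page_name_clean(page: str) -> str:
    """Cut a query-string appendix such as '/?utm_source=...'."""
    cut = page.find("/?")
    return page if cut == -1 else page[:cut]


def to_slug(page: str) -> str:
    """Apply both steps in order, then strip any remaining trailing '/'."""
    return page_name_clean(remove_blog_from_page_name(page)).rstrip("/")


# Hypothetical page names for the same post, before and after the migration.
for raw in ("/blog/my-post/?utm_source=x", "/blog/my-post/", "/my-post/"):
    print(raw, "->", to_slug(raw))
```

As in the Data Flow, the second function depends on the output of the first, which is why the order of the two steps matters. Dated folder prefixes such as /blog/2014/ would need an additional rule that is not covered here.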

Creating the Project

In my previous post I linked several data sources with Data Flow. This is always possible, but in this case I tried a different approach: the link between Google Analytics and Ghost data (through Drill) was defined directly in the DVD Project itself.

The first step was to include the Drill data sources: I added vw_fact_post initially and then vw_dim_author by right-clicking and selecting Add Data Source.

Drill data sources

Both sources are now visible in my project, and the join between them is auto-generated based on column name matching. In my case this was wrong, so I reviewed and changed it by right-clicking, selecting Source Diagram and correcting the joining conditions.

Drill Joining Conditions

Note that I set the Parameter "This connection will" to "Extend a dimension" since the author data is adding attributes to my post data.

The next step was adding the refined Google Analytics dataset that I stored locally with Data Flow. I can do it by right-clicking, selecting Add Data Source and then GAAnalyticsDataCleansed. The new source may not be visible in the project immediately; in my case this was because there were no matching column names for the automatic join to work with. I added the joining condition in the Source Diagram section.

GA join

Note that in this case the "This connection will" parameter is set to "Add Facts" since Google Analytics data contains the number of page views and other metrics.

After amending the joining condition I'm finally able to include any columns from my data sources in my project. Here we’ve met the requirement to see post metrics by author, all in DVD and through a very intuitive and accessible interface.
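
Conceptually, the project performs two joins followed by an aggregation. The Python sketch below reproduces that logic on made-up sample rows; the column names page_name_clean and page_views, and the choice of slug as the join key between Google Analytics and Ghost, are assumptions based on the wrangling steps above rather than the exact names used in the project.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the three data sources.
authors = [{"id": 10, "name": "Author A"}, {"id": 11, "name": "Author B"}]
posts = [
    {"slug": "post-one", "author_id": 10},
    {"slug": "post-two", "author_id": 10},
    {"slug": "post-three", "author_id": 11},
]
page_tracking = [
    {"page_name_clean": "post-one", "page_views": 120},
    {"page_name_clean": "post-one", "page_views": 30},
    {"page_name_clean": "post-two", "page_views": 50},
    {"page_name_clean": "post-three", "page_views": 75},
    {"page_name_clean": "not-a-post", "page_views": 999},
]


def page_views_per_author(authors, posts, page_tracking):
    """Join tracking rows to posts by slug, then posts to authors by id."""
    author_name = {a["id"]: a["name"] for a in authors}
    slug_author = {p["slug"]: author_name[p["author_id"]] for p in posts}
    totals = defaultdict(int)
    for row in page_tracking:
        author = slug_author.get(row["page_name_clean"])
        if author is not None:  # skip pages that are not blog posts
            totals[author] += row["page_views"]
    return dict(totals)


print(page_views_per_author(authors, posts, page_tracking))
```

The author lookup plays the role of "Extend a dimension", while the page view rows are the facts being added and summed.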