Category Archives: Rittman Mead
Rittman Mead at BIWA Summit 2015
I’m writing this in my hotel room in downtown San Francisco, with my colleague Francesco Tisiot flying in tonight and US colleagues Jordan Meyer, Daniel Adams and Andy Rocha travelling down tomorrow and Monday for next week’s BIWA Summit 2015. The Business Intelligence, Warehousing and Analytics SIG is a part of IOUG and this year also hosts the 11th Annual Oracle Spatial Summit, giving us three days of database-centric content touching most areas of the Oracle BI+DW stack.
Apart from our own sessions (more in a moment), BIWA Summit 2015 has a great line-up of speakers from the Oracle Database and Hadoop worlds, featuring Cloudera’s Doug Cutting and Oracle’s Paul Sonderegger, along with most of the key names from the Oracle BI+DW community including Christian Screen, Tim & Dan Vlamis, Tony Heljula, Kevin McGinley and Stewart Bryson, Brendan Tierney, Eric Helmer, Kyle Hailey and Rene Kuipers. From a Rittman Mead perspective, we’re delivering a number of sessions over the three days; details below:
- Jordan Meyer, “Intro to Data Science for Oracle Professionals” – Tuesday Jan 27th, 9.00am – 9.50am
- Mark Rittman, “Bringing Oracle Big Data SQL to OBIEE and ODI” – Wednesday Jan 28th 11.00am – 11.50am
- Daniel Adams & Andy Rocha, “OBIEE Data Visualization: The How and the Why” – Wednesday Jan 28th, 1.00pm – 1.50pm (pt.1) and 2.00 – 2.50pm (pt.2)
- Francesco Tisiot, “Oracle BI Cloud Service : What is It, and Where Will it Be Useful?” – Wednesday Jan 28th, 4.00pm – 4.50pm
- Mark Rittman, “End-to-End Hadoop Development using OBIEE, ODI and Oracle Big Data” – Thursday Jan 29th, 9.30am – 10.20am
Rittman Mead are also proud to be one of the media sponsors for the BIWA Summit 2015, so look out for blogs and other activity from us, and if you’re coming to the event we look forward to seeing you there.
Rittman Mead’s Development Cluster, EM12c and the Blue Mendora VMware EM Plugin
For development and testing purposes, Rittman Mead run a VMware vSphere cluster made up of a number of bare-metal servers hosting Linux, Windows and other VMs. Our setup has grown over the years from a bunch of VMs running on Mac Mini servers to where we are now, and has been added to considerably over the past twelve months as we started Hadoop development – a typical Cloudera CDH deployment we work with requires six or more nodes along with the associated LDAP server, Oracle OBIEE + ODI VMs and NAS storage for the data files. Last week we added our Exalytics server, repurposed as a 1TB ESXi VM server, giving us the topology shown in the diagram below.
One of the purposes of setting up a development cluster like this was to mirror the types of datacenter environments our customers run, and we use VMware vSphere and vCenter Server to manage the cluster as a whole, using technologies such as VMware vMotion to test out alternatives to WebLogic, OBIEE and Oracle Database HA. The screenshot below shows the cluster setup in VMware vCenter.
We’re also big advocates of Oracle Enterprise Manager as a way of managing and monitoring a customer’s entire Oracle BI & data warehousing estate, using the BI Management Pack to manage OBIEE installations as a whole, building alerts off OBIEE Usage Tracking data, and creating composite systems and services to monitor a DW, ETL and BI system from end to end. We register the VMs on the VMware cluster as hosts and services in a separate EM12cR4 install and use it to monitor our own development work, and to show the various EM Management Packs to customers and prospective clients.
Something we’ve wanted to do for a while, though, is bring the actual VM management into Enterprise Manager as well, and to do this we’ve now set up the Blue Mendora VMware Plugin for Enterprise Manager, which connects to your VMware vCenter, ESXi hosts, virtual machines and other infrastructure components and brings them into EM as monitorable and manageable components. The plugin connects to vCenter and the various ESXi hosts and gives you the ability to list out the VMs, hosts, clusters and so on, monitor them for resource usage and set up EM alerts as you would with other EM targets, and perform vCenter actions such as stopping, starting and cloning VMs.
What’s particularly useful with such a virtualised environment, though, is being able to include the VM hypervisors, VM hosts and other VMware infrastructure in the composite systems we define; for example, with a CDH Hadoop cluster that authenticates via LDAP and Kerberos, is used by OBIEE and ODI, and is hosted on two VMware ESXi hosts that are part of a vSphere cluster, we can get an overall picture of system health that doesn’t stop at the host level.
If your organization is using VMware to host your Oracle development, test or production environments and you’re interested in how Enterprise Manager can help you monitor and manage the whole estate, including the use of Blue Mendora’s VMware EM Plugin, drop me a line and I’d be happy to take you through what’s involved.
Enable Your Dashboard Designers to Concentrate on User Experience Rather Than Syntax (or How to Add a Treemap in Two Lines)
JavaScript is a powerful tool that can be used to add functionality to OBIEE dashboards. However, for many whose wheelhouses are more naturally aligned with Stephen Few than with John Resig, adding JavaScript to a dashboard can be intimidating. To facilitate this process, steps can be taken to centralize and simplify the invocation of this code. In this post, I will demonstrate how to create your very own library of custom HTML tags. These tags will empower anyone to add third-party visualizations from libraries like D3 without a lick of JavaScript experience.
What is a “Custom Tag”?
Most standard HTML tags provide very simple behaviors. Complex behaviors have typically been reserved for JavaScript. While, for the most part, this is still the case, custom tags can be used to provide a more intuitive interface to that JavaScript. The term “custom tag library” refers to a developer-defined library of HTML tags that are not natively supported by the HTML standard, but are instead given their behavior at run-time. For example, one might implement a <RM-MODAL> tag to produce a button that opens a modal dialog. Behind the scenes, JavaScript will be calling the shots, but the code in your narrative view or dashboard text section will look like plain old HTML tags.
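To make that concrete, here is a minimal sketch (not code from this post – the <rm-modal> tag and its attributes are hypothetical) of the kind of JavaScript that could sit behind such a tag: it scans the page for the tag, reads its attributes and attaches the behaviour.

// Hypothetical sketch: attach a button to every <rm-modal> tag on the page.
// In the narrative view or text section the designer would only write something like:
//   <rm-modal label="Help" message="Contact the BI team"/>
var modals = document.getElementsByTagName("rm-modal");
for (var i = 0; i < modals.length; i++) {
    (function (el) {
        var button = document.createElement("button");
        button.innerHTML = el.getAttribute("label") || "Open";
        button.onclick = function () {
            // A real implementation would open a styled modal dialog;
            // alert() keeps this sketch short
            alert(el.getAttribute("message") || "");
        };
        el.appendChild(button);
    })(modals[i]);
}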
Developing a JavaScript Library
The first step when incorporating an external library onto your dashboard is to load it. To do so, it’s often necessary to add JavaScript libraries and CSS files to the <head> of a document to ensure they have been loaded prior to being called. However, in OBIEE we don’t have direct access to the <head> from the Dashboard editor. By accessing the DOM, we can create script and stylesheet link elements on the fly and append them to the <head>. The code below appends external scripts and stylesheets to the document’s <head> section.
Figure 1. dashboard.js
function loadExtFiles(srcname, srctype){
    if (srctype == "js"){
        var src = document.createElement("script");
        src.setAttribute("type", "text/javascript");
        src.setAttribute("src", srcname);
    } else if (srctype == "css"){
        var src = document.createElement("link");
        src.setAttribute("rel", "stylesheet");
        src.setAttribute("type", "text/css");
        src.setAttribute("href", srcname);
    }

    // Only append to <head> if one of the branches above created an element
    if ((typeof src !== "undefined") && (src !== false)) {
        parent.document.getElementsByTagName("head")[0].appendChild(src);
    }
}

window.onload = function() {
    loadExtFiles("/rm/js/d3.v3.min.js", "js");
    loadExtFiles("/rm/css/visualizations.css", "css");
    loadExtFiles("/rm/js/visualizations.js", "js");
}
In addition to including the D3 library, we have included a CSS file and a JavaScript file, named visualizations.css and visualizations.js respectively. The visualizations.css file contains the default formatting for the visualizations and visualizations.js is our library of functions that collect parameters and render visualizations.
The D3 gallery provides a plethora of useful and not-so-useful examples to fulfill all your visualization needs. If you have a background in programming, these examples are simple enough to customize. If not, this is a tall order. Typically the process goes something like this:
- Determine how the data is currently being sourced.
- Rewrite that section of the code to accept data in a format that can be produced by OBIEE. Often this requires a bit more effort in the refactoring as many of the examples are sourced from CSV files or JSON. This step will typically involve writing code to create objects and add those objects to an array or some other container. You will then have to determine how you are passing this data container to the D3 code. Will the D3 code be rewritten as a function that takes in the array as a parameter? Will the array be scoped in a way that the D3 code can simply reference it?
- Identify how configurations like colors, sizing, etc. are set and determine how to customize them as per your requirements.
- Determine what elements need to be added to the narrative view to render the visualization.
If you are writing your own visualization from scratch, these same steps are applied in the design phase. Either way, the JavaScript code that results from performing these steps should not be the interface exposed to a dashboard designer. The interface should be as simple and understandable as possible to promote re-usability and avoid implementation syntax errors. That’s where custom HTML tags come in.
Wait… Why use tags rather than exposing the JavaScript function calls?
Using custom tags allows for a more intuitive implementation than raw JavaScript function calls. Simple JavaScript functions do not support named arguments, which means JavaScript depends on argument order to differentiate the parameters.
<script>renderTreemap("@1", "@2", @3, null, null, "Y");</script>
In the example above, anyone viewing this call without being familiar with the function definition would have a hard time deciphering the parameters. By using a tag library to invoke the function, the parameters are much clearer. Parameters that are not applicable for the current invocation are simply left out.
<rm-treemap name="@1" grouping="@2" measure=@3 showValues="Y"/>
That being said, you should still familiarize yourself with the correct usage prior to using them.
Now some of you may be saying that named arguments can be simulated using object literals, but the whole point of this exercise is to reduce complexity for front-end designers, so I wouldn’t recommend this approach within the context of OBIEE.
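For completeness, the object-literal style would look roughly like the sketch below – assuming a hypothetical version of renderTreemap written to accept a single options object – but, as noted, it still leaves JavaScript syntax in the hands of the dashboard designer.

// Hypothetical sketch: named arguments simulated with an object literal.
// Clearer than positional arguments, but still JavaScript for the designer to get right.
renderTreemap({
    name: "@1",
    grouping: "@2",
    measure: "@3",
    showValues: "Y"
});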
What do these tags look like and how do they pass the data to the JavaScript?
For this example, we will be providing a Treemap visualization. As you might expect, the example provided at that link is sourced from a JSON object. For our use, we will have to rewrite that code to source the data from the attributes in our custom HTML tags. The D3 code is expecting a hierarchical object made up of leaf node objects contained within grouping objects. The leaf node objects consist of a “name” field and a “size” field. The grouping object consists of a “name” field and a “children” field that contains an array of leaf node objects. By default, the size values, or measures, are not displayed and are only used to size the nodes. Additionally, the dimensions of the treemap are hard-coded values. Inevitably users will want to change these settings, so for each setting we want to expose for configuration we will provide an attribute on the custom tag we build. Ultimately, that is the purpose of this design pattern (see the sketch of the target data structure after the list below):
- Name your custom tag
- Identify all your inputs
- Create a tag attribute for each input
- Within a JavaScript library, extract and organize the values
- Pass those values to D3
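For reference, the hierarchical structure described above – which the steps in this list build up from the tag attributes – looks something like this (illustrative values only):

// Shape of the object the D3 treemap layout expects (values are made up)
var input = {
    name: "TreeMap",
    children: [
        {
            name: "Group A",                    // grouping object
            children: [
                { name: "Item 1", size: 120 },  // leaf nodes: name + size
                { name: "Item 2", size: 45 }
            ]
        },
        {
            name: "Group B",
            children: [
                { name: "Item 3", size: 80 }
            ]
        }
    ]
};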
For this example we will configure behaviours for a tag called <rm-treemap>. Note: it is good practice to include a dash in your custom tag names to ensure they will not clash with an existing HTML tag. This tag will support the following attributes:
- name – Name of the dimension being measured
- measure – Used to size the node boxes
- grouping – Used to determine the color of the node boxes
- width – Width in pixels
- height – Height in pixels
- showValues – Y/N
It will be implemented within a narrative view like so:
<rm-treemap name="@1" grouping="@2" measure=@3 width="700" height="500" showValues="Y"/>
In order to make this tag useful, we need to bind behaviors to it that are controlled by the tag attributes. To extract the attribute values from <rm-treemap>, the JavaScript code in visualizations.js will use two methods from the DOM API, getElementsByTagName and getAttribute.
In Fig 2, lines 8-11 use these methods to identify the first <rm-treemap> tag and extract the values for width, height and showValues. It was necessary to specify a single element, in this case the first one, as getElementsByTagName returns a collection of all matching elements within the HTML document. There will most likely be multiple matches, as the OBIEE narrative field will loop through the query results and produce a <rm-treemap> tag for each row.
In Fig 2, lines 14-41, the attributes for name, measure and grouping are extracted and bound to either leaf node objects or grouping objects. Additionally, lines 11 and 49-50 configure the displayed values and the size of the treemap. The original code was further modified on line 62 to use the first <rm-treemap> element to display the output.
Finally, lines 99-101 ensure that this code is only executed when an <rm-treemap> tag is detected on the page. The last step before deployment is documentation. If you are going to go to all the trouble of building a library of custom tags, you need to set aside the time to document their usage. Otherwise, regardless of how much you have simplified the usage, no one will be able to use them.
Figure 2. visualizations.js
01  var renderTreemap = function () {
02      // Outer Container (Tree)
03      var input = {};
04      input.name = "TreeMap";
05      input.children = [];
06
07      //Collect parameters from first element
08      var treeProps = document.getElementsByTagName("rm-treemap")[0];
09      canvasWidth = treeProps.getAttribute("width") ? treeProps.getAttribute("width") : 960;
10      canvasHeight = treeProps.getAttribute("height") ? treeProps.getAttribute("height") : 500;
11      showValues = treeProps.getAttribute("showValues").toUpperCase();
12
13      // Populate collection of data objects with parameters
14      var mapping = document.getElementsByTagName("rm-treemap");
15      for (var i = 0; i < mapping.length; i++) {
16          var el = mapping[i];
17          var box = {};
18          var found = false;
19
20          box.name = (showValues == "Y") ? el.getAttribute("name") +
21              "<br> " +
22              el.getAttribute("measure") : el.getAttribute("name");
23          box.size = el.getAttribute("measure");
24          curGroup = el.getAttribute("grouping");
25
26          // Add individual items to groups
27          for (var j = 0; j < input.children.length; j++) {
28              if (input.children[j].name === curGroup) {
29                  input.children[j].children.push(box);
30                  found = true;
31              }
32          }
33
34          if (!found) {
35              var grouping = {};
36              grouping.name = curGroup;
37              grouping.children = [];
38              grouping.children.push(box);
39              input.children.push(grouping);
40          }
41      }
42
43      var margin = {
44          top: 10,
45          right: 10,
46          bottom: 10,
47          left: 10
48      },
49      width = canvasWidth - margin.left - margin.right,
50      height = canvasHeight - margin.top - margin.bottom;
51
52      // Begin D3 visualization
53      var color = d3.scale.category20c();
54
55      var treemap = d3.layout.treemap()
56          .size([width, height])
57          .sticky(true)
58          .value(function (d) {
59              return d.size;
60          });
61
62      var div = d3.select("rm-treemap").append("div")
63          .style("position", "relative")
64          .style("width", (width + margin.left + margin.right) + "px")
65          .style("height", (height + margin.top + margin.bottom) + "px")
66          .style("left", margin.left + "px")
67          .style("top", margin.top + "px");
68
69      var node = div.datum(input).selectAll(".treeMapNode")
70          .data(treemap.nodes)
71          .enter().append("div")
72          .attr("class", "treeMapNode")
73          .call(position)
74          .style("background", function (d) {
75              return d.children ? color(d.name) : null;
76          })
77          .html(function (d) {
78              return d.children ? null : d.name;
79          });
80
81      function position() {
82          this.style("left", function (d) {
83                  return d.x + "px";
84              })
85              .style("top", function (d) {
86                  return d.y + "px";
87              })
88              .style("width", function (d) {
89                  return Math.max(0, d.dx - 1) + "px";
90              })
91              .style("height", function (d) {
92                  return Math.max(0, d.dy - 1) + "px";
93              });
94      }
95      //End D3 visualization
96  }
97
98  // Invoke visualization code only if an rm-treemap tag exists on the page
99  var doTreemap = document.getElementsByTagName("rm-treemap");
100 if (doTreemap.length > 0) {
101     renderTreemap();
102 }
Figure 3. visualizations.css
.treeMapNode {
    border: solid 1px white;
    border-radius: 5px;
    font: 10px sans-serif;
    line-height: 12px;
    overflow: hidden;
    position: absolute;
    text-indent: 2px;
}
Putting it all together
The first step to implementing this code is to make it accessible. To do this, you will need to deploy your code to the WebLogic server. Many years ago, Venkatakrishnan Janakiraman detailed how to deploy code to WebLogic in his blog post about skinning. For this application the process still applies; however, you don’t need to be concerned with the bits about modifying the instanceconfig.xml or skinning.
Once the code has been deployed to the server, there are literally only two lines of code required to implement this visualization. First the libraries need to be included. This is done by sourcing in the dashboard.js file. This can be done within the Narrative view’s prefix field, but I have chosen to add it to a text section on the dashboard. This allows multiple analyses to use the libraries without duplicating the load process in multiple places.
The text section should be configured as follows. (Note: the path to dashboard.js is relative to the root path specified in your deployment.)
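As an illustration, assuming the files were deployed under the /rm path used in Figure 1, the text section (with “Contains HTML Markup” ticked) would contain a single line along the lines of:

<script type="text/javascript" src="/rm/js/dashboard.js"></script>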
From the Narrative View, add the <rm-treemap> tag to the Narrative field and populate the attributes with the appropriate data bind variables and your desired settings.
This should result in the following analysis.
In summary:
- Deploy the dashboard.js, visualizations.js and visualizations.css files to WebLogic
- From a dashboard text section, source in dashboard.js, which will in turn include visualizations.js and visualizations.css
- Add the <rm-treemap> tag to the Narrative field of a Narrative view.
As you can see, implementing custom HTML tags to serve as the interface for a D3 visualization will save your dashboard designers from having to sift through dozens if not hundreds of lines of confusing code. This will reduce implementation errors, as the syntax is much simpler than JavaScript, and will promote conformity, as all visualizations will be sourced from a common library. Hopefully this post was informative and will inspire you to consider this pattern, or a similarly purposed one, to make your code easier to implement.
Why and How to use Oracle Metadata Management 12c. Part 1: Getting Started
At OOW 2014, Oracle announced the new Oracle Metadata Management solution, and in the middle of October released its first version – OMM 12.1.3.0.0.
At the end of November 2014 the second version was released – OMM 12.1.3.1.0 – with new features and some bug fixes.
But first things first: what is Oracle Metadata Management? And why would we want to use it?
One of the biggest problems that we face today is the proliferation of different systems, data sources and solutions for BI, ETL, etc. within the same company. So it is quite difficult, not only for end users but also for technical people (from SysAdmins, Data Scientists and Data Stewards to Developers), to track which data is used by which applications. In some cases it is almost impossible to perform an impact analysis if someone wants to change a table, or if the way a sales measure is calculated needs to change. The more systems involved, the bigger the problem.
Oracle Metadata Management (OMM) provides a solution to this problem. It is a complete metadata management platform that can reverse engineer (harvest) and catalog metadata from any source: relational, big data, ETL, BI, data modelling, etc.
OMM allows us to perform interactive searching, data lineage, impact analysis, semantic definition and semantic usage analysis within the catalog. The really important thing is that metadata from different providers (Oracle and/or third-party) can be related (stitched), so you have the complete path of the data from source to report, or vice versa. In addition, it manages versioning and comparison of metadata models.
The Oracle Metadata Management solution offers two products: OEMM (Oracle Enterprise Metadata Management) and OMM for OBI (Oracle Metadata Management for Oracle Business Intelligence). With the first one we can use metadata providers from Oracle and third-party technologies, while OMM for OBI allows us to use metadata from databases, OBIEE, ODI and DAC.
We will see in this series of posts how to use each of these options, the differences between them, and which will be the best option depending on your environment.
In this first post we will focus on the installation process and the requirements for it.
Minimum Requirements for a small test environment
It is important to note, as is also well explained in the Readme document, that the following are the minimum requirements for a tutorial or a small business case, not for a larger system.
Browser
Any of these browsers, or newer versions of them, with at least the Adobe Flash v8 plugin can be used: Microsoft Internet Explorer (IE) v10, Mozilla Firefox v30 or newer, Google Chrome v30, Apple Safari v6.
Hardware
2 GHz or higher quad-core processor
4 GB RAM (8 GB for a 64-bit OS using a 64-bit web application server)
10 GB of disk space (all storage is primarily in the database server)
Operating System
Microsoft Windows 2008 Server, Windows 2012 Server, Windows 7, Windows 8, or Windows 8.1. Be sure that you have full Administrator privileges when running the installer and that Microsoft .NET Framework 3.5 or higher is installed.
Other operating systems require manual install/setup, so they are not supported by this version.
Web Application Server
The installer comes with Apache Tomcat as the web application server and Oracle JRE 6 as the Java Runtime Environment. Other web application servers (including Oracle WebLogic) require manual install/setup, and are not supported by this version.
Database Server
For the database server you can only use an Oracle Database, from 10gR2 to 12c (64-bit), as the repository for OMM. You can create a new instance or reuse your existing Oracle database server, but you need admin privileges in the database.
A very important observation is that the character set MUST be AL32UTF8 (UTF8). This is because Oracle interMedia Search can only index columns of type VARCHAR or CLOB (not the national variants NVARCHAR and NCLOB respectively). Otherwise you will receive an error message when you run OMM for the first time.
To solve this, you can create a new instance of the database, or, if your database already has data, there are a couple of notes in My Oracle Support (260192.1 and 788156.1) on how to change any character set to AL32UTF8.
In addition, the CTXSYS user must exist in the database. In case it doesn’t exist, the script to create it and grant its privileges can be found in <ORACLE_HOME>/ctx/admin/catctx.sql.
Preparing to install
Step 1 – Download the software. You can download the software from the OTN site http://www.oracle.com/technetwork/middleware/oemm/downloads/index.html or from e-delivery.oracle.com.
Step 2 – Create a database schema as the repository. Before starting the installation, a database schema needs to be created as a repository for OMM to keep all its objects, such as models, configurations, etc. (we will see all of these objects in later posts).
For that reason create a user in the database:
create user MIR identified by <password> quota unlimited on users;
And give to it the following grants:
grant create session to MIR;
grant create procedure to MIR;
grant create sequence to MIR;
grant create table to MIR;
grant create trigger to MIR;
grant create type to MIR;
grant create view to MIR;
We also need to grant the new user execute privileges on a package from CTXSYS and another one from SYS:
grant execute on CTXSYS.CTX_DDL to MIR;
grant execute on SYS.DBMS_LOCK to MIR;
If you prefer (and it could also be a more robust solution), you can create specific tablespaces (a user tablespace and a temp tablespace) for that user. I asked David Allan, who is always very generous with his time and knowledge, whether this schema will be part of the RCU in future releases, but there is no plan to incorporate the MIR schema into it.
Installation and Post-Install tasks
Step 3 – Install the software. We can now run the installation. The downloaded zip file contains an exe file; double-click it to start the installation.
In the first screen, select the type of product that you want to install: OEMM or OMM for OBI. We choose the Oracle Enterprise Metadata Management and press Next.
In the next screen, you have access to the Readme document and release notes by pressing the View Readme button. After the installation you can find them in the OMM_Home/Documentation folder.
The next screen shows you the destination location, which you can change if you want. Keep the port numbers suggested on the screen after that.
The last screen of the installation asks you to restart the computer in order to use the product.
Step 4 – Start the OMM Server as a service. After you restart the computer, you need to configure the OMM Server as a service and start it. You can do this through the option shown in the Start menu, pressing the Start button, or by going directly to the Windows Services screen, right-clicking the OMM service and starting it.
Step 5 – Initialize OEMM. Run OEMM for the first time. We have everything ready to start using Oracle Metadata Management. Go to the URL http://localhost:11580/MM, execute the shortcut that was created on your desktop after the installation, or use the Windows Start menu.
We need to enter the connection details using the schema that we created in the database. Enter MIR as the Database User Id, its password and the database URL, and then press the Test Connection button. After you receive the Successful message, press the Save button to run the initialization process, in which OEMM creates the objects in the database schema to manage the repository.
This process takes a few minutes, until you get confirmation that the initialization process was successful.
Step 6 – Start OEMM. Close the browser tab and open the OEMM URL again (http://localhost:11580/MM). A login page appears. The user and password to log in are Administrator/Administrator.
This is the main page of OEMM, where we are going to harvest (reverse-engineer) the metadata from different providers in the next posts.
In case you want to change the password of the Administrator user, go to Tools > Administration at the top right of the page. Select the Administrator user and the user details will appear below.
If you prefer to create another user with Administration privileges, just press the Add User button (the plus icon) in the Administration page and enter the details for the new user.
We are using the native authentication approach for this demo, but OEMM can also use an external LDAP server for authentication.
You can access the product documentation through the Help option at the top right of the page. In the Contents tab you have all the topics (Harvesting, Administration, etc.) organised in folders, each containing all the details about that specific topic.
Installation of OMM for OBI
There are no differences in the installation process between OEMM and OMM for OBI. Just be sure to select the one that you want in the first screen of the installation. This is the page to log in to OMM for OBI.
In the next post, we will see how the harvesting (metadata import) process works, using different metadata providers such as OBIEE, ODI and others.
Concurrent RPD Development in OBIEE
OBIEE is a well established product, having been around in various incarnations for well over a decade. The latest version, OBIEE 11g, was released 3.5 years ago, and there are mutterings of OBIEE 12c already. In all of this time however, one thing it has never quite nailed is the ability for multiple developers to work with the core metadata model – the repository, known as the RPD – concurrently and in isolation. Without this, development is doomed to be serialised – with the associated bottlenecks and inability to scale in line with the number of developers available.
My former colleague Stewart Bryson wrote a series of posts back in 2013 in which he outlined the criteria for a successful OBIEE SDLC (Software Development LifeCycle) method. The key points were:
- There should be a source control tool (a.k.a. version control system, VCS) that enables us to store all artefacts of the BI environment, including the RPD, Presentation Catalog, etc. From here we can tag snapshots of the environment at a given point as being ready for release, and as markers for rollback if we take a wrong turn during development.
- Developers should be able to do concurrent development in isolation.
- To do this, source control is mandatory in order to enable branch-based development, also known as feature-driven development, which is a central tenet of an Agile method.
Oracle’s only answer to the SDLC question for OBIEE has always been MUDE. But MUDE falls short in several respects:
- It only manages the RPD – there is no handling of the Presentation Catalog etc
- It does not natively integrate with any source control
- It puts the onus of conflict resolution on the developer rather than the “source master” who is better placed to decide the outcome.
Whilst it wasn’t great, it wasn’t bad either, and MUDE was all we had. Either that, or manual integration with source control tools (1, 2), which was clunky to say the least. The RPD remained a single object that could not be merged or managed except through the Administration Tool itself, so the kind of automatic merge strategies that the rest of the software world was adopting with source control tools were inapplicable to OBIEE. The merge would always require the manual launching of the Administration Tool and figuring out the merge candidates, before slowly dying in despair at having to repeat such a tortuous and error-prone process on a regular basis…
Then back in early 2012 Oracle introduced a new storage format for the RPD. Instead of storing it as a single binary file, closed to prying eyes, it was instead burst into a set of individual files in MDS XML format.
For example, one Logical Table was now one XML file on disk, made up of entities such as LogicalColumn, ExprText, LogicalKey and so on:
It even came with a set of configuration screens for integration with source control. It looked like the answer to all our SDLC prayers – now us OBIEE developers could truly join in with the big boys at their game. The reasoning went something like:
- An RPD stored in MDS XML is no longer binary
- git can merge code that is plain text from multiple branches
- Let’s merge MDS XML with git!
But how viable is MDS XML as a storage format for the RPD used in conjunction with a source control tool such as git? As we will see, it comes down to the Good, the Bad, and the Ugly…
The Good
As described here, concurrent and unrelated developments on an RPD in MDS XML format can be merged successfully by a source control tool such as git. Each logical object is a file, so git just munges (that’s the technical term) the files modified in each branch together to come up with a resulting MDS XML structure with the changes from each development in it.
The Bad
This is where the wheels start to come off. See, our automagic merging fairy dust is based on the idea that individually changed files can be spliced together, and that since MDS XML is not binary, we can trust a source control tool such as git to also work well with changes within the files themselves too.
Unfortunately this is a fallacy, and by using MDS XML we expose ourselves to greater complications than we would if we just stuck to a simple binary RPD merged through the OBIEE toolset. The problem is that whilst MDS XML is not binary, it is not unstructured either. It is structured, and it has application logic within it (the mdsid, of which more below).
Within the MDS XML structure, individual first-class objects such as Logical Tables are individual files, and structured within them in the XML are child-objects such as Logical Columns:
Source control tools such as git cannot parse it, and therefore do not understand what is a real conflict versus an unrelated change within the same object. If you stop and think for a moment (or longer) quite what would be involved in accurately parsing XML (let alone MDS XML), you’ll realise that you basically need to reverse-engineer the Administration Tool to come up with an accurate engine.
We kind of get away with merging when the file differences are within an element in the XML itself. For example, the expression for a logical column is changed in two branches, causing clashing values within ExprText and ExprTextDesc. When this happens git will throw a conflict and we can easily resolve it, because the difference is within the element(s) themselves:
Easy enough, right?
But take a similarly “simple” merge conflict, where two independent developers add or modify different columns within the same Logical Table, and we see what a problem there is when we try to merge it back together relying on source control alone.
It is obvious to a human, and obvious to the Administration Tool, that these two new columns are unrelated and can be merged into a single Logical Table without a problem. In a paraphrased version of MDS XML the two versions of the file look something like this, and the merge resolution is obvious:
But a source control tool such as git looks at the MDS XML as a plain-text file, not understanding the concept of an XML tree and sibling nodes, and throws its toys out of the pram with a big scary merge conflict:
Now the developer has to roll up his or her sleeves and try to reconcile two XML files – with no GUI to support or validate the change made except loading it back into the Administration Tool each time.
So if we want to use MDS XML as the basis for merging, we need to restrict our concurrent developments to completely independent objects. But, that kind of hampers the ideal of more rapid delivery through an Agile method if we’re imposing rules and restrictions like this.
The Ugly
This is where it gets a bit grim. Above we saw that MDS XML can cause unnecessary (and painful) merge conflicts. But what about if two developers inadvertently create the same object concurrently? The behaviour we’d expect to see is a single resulting object. But what we actually get is both versions of the object, and a dodgy RPD. Uh oh.
Here are the two concurrently developed RPDs, produced in separate branches isolated from each other:
And here’s what happens when you leave it to git to merge the MDS XML:
The duplicated objects now cannot be edited in the Administration Tool in the resulting merged RPD – any attempt to save them throws the above error.
Why does it do this? Because the MDS XML files are named after a globally unique identifier known as the mdsid, and not their corresponding RPD qualified name. And because the mdsid is unique across developments, two concurrent creations of the same object end up with different mdsid values, and thus different filenames.
Two files from separate branches with different names are going to be seen by source control as being unrelated, and so both are brought through in the resulting merge.
As with the unnecessary merge conflict above, we could define a process around same-object creation, or add in a manual equalise step. The real issue here is that the duplicates can arise without us being aware, because no conflict is seen by the source control tool. It’s not like merging an un-equalised repository in the Administration Tool, where we’d get #1 suffixes on the duplicate objects so that at least (a) we spot the duplication and (b) the repository remains valid and the duplicate objects available to edit.
MDS XML Repository opening times
Whether a development strategy based on MDS XML is for you or not, another issue to be aware of is that for anything beyond a medium-sized RPD, opening times for an MDS XML repository are considerable: around a minute from a binary RPD, versus more than 20 minutes from MDS XML. To be fair, after 20 minutes I gave up, on the basis that no sane developer would write off that amount of their day simply waiting for the repository to open before they can even do any work on it. This rules out working in MDS XML format with any big repositories, such as the one from BI Apps.
So is MDS XML viable as a Repository storage format?
MDS XML does have two redeeming features:
- It reduces the size of your source control repository, because on each commit you will be storing just a delta of the overall repository change, rather than the whole binary RPD each time.
- For tracking granular development progress and changes you can identify what was modified through the source control tool alone – because the new & modified objects will be shown as changes:
But the above screenshots both give a hint of the trouble in store. The mdsid unique identifier is used not only in filenames – causing object duplication and strange RPD behaviour – but also within the MDS XML itself, referencing other files and objects. This means that as an RPD developer, or RPD source control overseer, you need to be confident that each time you perform a merge of branches you are correctly putting Humpty Dumpty back together in a valid manner.
If you want to use MDS XML with source control you need to view it as part of a larger solution, involving clear process and almost certainly a hybrid approach with the binary RPD still playing a part — and whatever you do, the Administration Tool within short reach. You need to be aware of the issues detailed above, decide on a process that will avoid them, and make sure you have dedicated resource that understands how it all fits together.
If not MDS XML, then what?…
Source control (e.g. git) is mandatory for any kind of SDLC, concurrent development included. But instead of storing the RPD in MDS XML, we store it as a binary RPD.
Wait, wait, wait, don’t go yet! … it gets better.
By following the git-flow method, which dictates how feature-driven development is done in source control (git), we can write a simple script that determines, when merging branches, what the candidates are for an OBIEE three-way RPD merge.
In this simple example we have two concurrent developments – coded “RM–1” and “RM–2”. First off, we create two branches which take the code from our “mainline”. Development is done on the two separate features in each branch independently, and committed frequently per good source control practice. The circles represent commit points:
The first feature to be completed is “RM–1”, so it is merged back into “develop”, the mainline. Because nothing has changed in develop since RM–1 was created from it, the binary RPD file and all other artefacts can simply ‘overwrite’ what is there in develop:
Now at this point we could take “develop” and start its deployment into System Test etc., but the second feature we were working on, RM–2, is also tested and ready to go. Here comes the fancy bit! Git recognises that both RM–1 and RM–2 have made changes to the binary RPD, and being binary, the RPD is not something git can merge itself. But now, instead of just collapsing in a heap and leaving it for the user to figure out, our script makes use of git and the git-flow method we have followed to work out the merge candidates for the OBIEE Administration Tool:
Even better, the script invokes the Administration Tool (which can be run from the command line; alternatively, the command-line tools comparerpd/patchrpd can be used) to automatically perform the merge. If the merge is successful, it goes ahead and commits the merge into the “develop” branch in git. The developer has not had to do any kind of interaction to complete the merge and commit.
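The script itself isn’t reproduced here, but as a rough sketch of the candidate-detection idea (written in Node.js purely for illustration, and assuming hypothetical branch names and a binary RPD committed at repository/rm.rpd), the logic boils down to asking git for the common ancestor of the two branches and extracting the three RPD versions for the Administration Tool, or comparerpd/patchrpd, to work with:

// Sketch only: work out the three-way merge candidates for a binary RPD
// when merging a feature branch back into develop under git-flow.
var execSync = require("child_process").execSync;
var fs = require("fs");

var rpdPath = "repository/rm.rpd";      // assumed location of the RPD in the repo
var base    = "develop";                // branch being merged into
var feature = "feature/RM-2";           // feature branch being merged

// Allow for large binary output from git show
var opts = { maxBuffer: 200 * 1024 * 1024 };

// The common ancestor of the two branches is the "original" merge candidate
var ancestor = execSync("git merge-base " + base + " " + feature, opts).toString().trim();

// Extract each version of the RPD for the three-way merge
fs.writeFileSync("original.rpd", execSync("git show " + ancestor + ":" + rpdPath, opts));
fs.writeFileSync("current.rpd",  execSync("git show " + base     + ":" + rpdPath, opts));
fs.writeFileSync("modified.rpd", execSync("git show " + feature  + ":" + rpdPath, opts));

console.log("Merge candidates written: original.rpd, current.rpd, modified.rpd");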
If the merge is not a slam-dunk, then we can launch the Administration Tool and graphically figure out the correct resolution – but using the already-identified merge candidates in order to shorten the process.
This is not perfect, but there is no perfect solution. It is the closest thing there is to perfection though, because it will handle merges of:
- Unique objects
- Same objects, different modifications (cf. the two new columns on the same table example above)
- Duplicate objects – by equalisation
Conclusion
There is no single right answer here, nor are any of the options overly appealing.
If you want to work with OBIEE in an Agile method, using feature-driven development, you will have to adopt and learn specific processes for working with OBIEE. The decision you have to make is on how you store the RPD (binary or multiple MDS XML files, or maybe both) and how you handle merging it (git vs Administration Tool).
My personal view is that taking advantage of git-flow logic, combined with the OBIEE toolset to perform three-way merges, is sufficiently practical to warrant leaving the RPD in binary format. The MDS XML format is a lovely idea but there are too few safeguards against dodgy/corrupt RPD (and too many unnecessary merge conflicts) for me to see it as a viable option.
Whatever option you go for, make sure you are using regression testing to test the RPD after you merge changes together, and ideally automate the testing too. Here at Rittman Mead we’ve written our own suite of tools that do just this – get in touch to find out more.