Dimensional Modelling

Tag Archives: Dimensional Modelling

July 3, 2013 Rittman Mead

MDS XML versus MUDE Part 3: Configuring MDS XML with Git

In the previous installment of this series, I gave an overview of MUDE (multi-user development environment), and described it’s strengths and weaknesses, hopefully presenting a fair assessment on both sides. For a recap, here are three properties that I think any metadata layer development (and really… any development period) should adhere to:

Multiple users should be able to develop concurrently… but we need to be clear exactly what we mean by this. Having all the developers login to a shared online repository is one way to do multiuser development… but this equates to multiuser development using serialized access to objects. Whether explicit or not in the requirements… what most organizations want is non-serialized access to objects… meaning that multiple users can be making changes to the same objects at the same time, and we will be able to merge our changes together, and handle conflicts.
Our VCS should give us the ability to rollback to previous versions of our code, either for comparative purposes… or because we want to rewind certain streams of development because of “wrong turns”.
Our VCS and supporting methodology should provide us the ability to tag, secure and package up “releases”… to be migrated to our various environments eventually finding their way to production.

Although I gave MUDE a passing grade, no one is making the honor roll. I wasn’t pleased that MUDE is siloed: it is a VCS applicable only to the metadata repository. I want an all-in approach… a standard SCM methodology that could be applied to the RPD, the web catalog, ETL artifacts, etc. Additionally, I want to decouple the development process from the conflict resolution process: having our own changes to publish to the master repository does not qualify us to determine whether our changes should override those of other developers. That’s a job for the role I defined as the source master. So now, I want to see if an approach using a VCS (specifically Git) combined with MDS XML could match or surpass the grade I gave MUDE.

So I’m going to introduce MDS XML first of all. I won’t spend too much time on this, as Mark presented the case for it very well in a previous post, although he used Subversion as his VCS. Also… Christian Screen in his OTN article describes how to install Git, and get it configured with the Admin Tool. As I described in the introduction to this series, MDS XML is an XML-based standard, not unlike XUDML (but certainly not the same either), using Oracle Metadata Services as a standard instead. Flavius Burca provides a very good non-OBIEE primer to MDS, where he describes the MDS schema’s role in Fusion Middleware proper. So Oracle already heavily utilized the MDS standard, and being that OBIEE is technically Fusion Middleware, the product management team didn’t have to look far for a replacement for the binary structure in the RPD file, or the UDML/XUDML API that accompanies it.

We can use MDS XML by creating a brand new metadata repository specifying XML from the beginning, or converting an existing binary RPD to XML using Copy As|Save As in the Admin Tool:

Notice that when we create an MDS XML metadata repository, we specify a directory location (what I defined as the “core directory” in my introduction) instead of a file location. That’s because we are defining the repository as a series of strongly related XML files organized tightly within a directory structure underneath our core directory. We can see this in the screenshot I first shared in that same introduction:

After we have created an XML repository, we can open it normally, just as we would with a binary RPD. So instead of navigating to an absolute file location, we navigate to the core directory location, as demonstrated below:

The first time we open an MDS XML repository, we are prompted with a dialogue that defines our source control strategy. So the Admin Tool is interested in how we want to develop with MDS XML… whether we want to register this metadata repository with a VCS or not. This amounts to providing the Admin Tool the capability of issuing particular commands for our selected VCS to make it a little easier to work with. We also provide the configuration file–or SCM template–that provides the Admin Tool the logic necessary to work with a specific VCS.

The scm-conf-git.template.xml file specified above is a custom configuration file, as OBIEE only ships with support for Subversion and something called ADE (which I’ve been told is the version control system that Oracle uses internally). So we don’t get Git support (pardon the pun) right out of the box?

There are two solutions to this problem. First and foremost… we don’t need the Admin Tool configured with a template to work with a VCS at all. When the product of our metadata development process is XML, we can be our own proxy to Git, using the command-line tools, or any number of supported Git clients. But secondly… the OBIEE development team was smart to provide an API for allowing us to connect with different VCS providers. Creation of the scm-conf-git.template.xml specified above is demonstrated in Christian’s Screen’s OTN article that I mentioned above, and you can see the specific mapping logic pictured below.

Whenever the Admin Tool creates, modifies, or deletes individual XML files as part of the repository development process, the Admin Tool keeps Git tuned-in to these changes with the commands configured in the SCM template. This is nice… but the Admin Tool will only do so much. It doesn’t handle the initial registration of the MDS XML repository with our VCS. So we can do that either with a client, or the command line, which is demonstrated below. The git status commands I issue are only for informational purposes and not required:

Also, the Admin Tool is not configured to do an actual “commit” either, which is noticeable when viewing the SCM template listed above. I assume this was a design decision by the development team at Oracle… though I have to admit: it’s not one I agree with. It’s the safe choice I guess… keeping their hands (and consciences) clean by not getting involved with VCS commits. But in the end… it’s a silly decision, as the whole point of a VCS is to be able to rollback to particular commits, or even undo them completely. Regardless… no matter what VCS we choose, we have to use an external process of some kind to issue our commits:

So there we have it. I’ve gotten through all the precursors. In the next post, I can finally start demonstrating how we can do multi-user development using MDS XML and Git.

March 25, 2013 Rittman Mead

Using OBIEE against Transactional Schemas Part 5: Complex Facts

I’ve finally gotten around to finishing this series… I believe it’s actually been a year in the making. I’m planning on being more proactive with getting content on the blog. Actually… the main reason I’m putting this one to bed is I have some exciting posts planned in the coming months, and I feel guilty writing those without closing this one off. For the recap, here are the previous entries:

Using OBIEE against Transactional Schemas Part 1: Introduction

Using OBIEE against Transactional Schemas Part 2: Aliases

Using OBIEE against Transactional Schemas Part 3: Simple Dimensions and Facts

Using OBIEE against Transactional Schemas Part 4: Complex Dimensions

In this post, I want to talk about complex facts. The OBIEE metadata layer is a very powerful force indeed, and I can’t demonstrate all the possibilities for shifting a normalized schema into a dimensional one. Instead, I thought I would show one really interesting example… something extremely powerful, in hopes that others reading this series might find inspiration to try something daring themselves.

OLTP developers design their systems with one task in mind: facilitating the functions of the transactional system. Even when standard reporting is delivered as part of an OLTP system, the flexibility of the schema for these purposes is usually an afterthought because the stated goal of these systems is (and should be): delivering transactions with as much efficiency as possible. It should come as no surprise to BI developers when these systems store unrelated activities (at least from a source-system perspective) in completely separate tables. We often want to combine these activities in a coherent structure — usually an activity-based fact table — to be able to GROUP BY the Activity Type in a single report. It’s usually these advanced reporting requirements that cause us to pull out SQL Developer Data Modeler and start designing a data warehouse, which is probably a good idea. But for the reasons mentioned in the introductory blog post in this series, a data warehouse isn’t always in the cards. Desperate for their data, business analysts pull out that old Swiss Army knife — Excel — and start combining data sets and pivoting them, generating the reports that the business needs. We have to understand the need to deliver content to the business quickly, and the business can’t always wait for mounds of ETL code before making very important decisions.

The Business Model and Mapping layer available to us in the OBIEE Semantic layer provides us the ability to present a logical fact table, such as Fact – Customer Activity, as a shell for many types of activities, and even combine those activities in intelligent and performant ways. The example we’ll construct from the Customer Tracking application (which has been the basis for all the previous entries) involves reporting against the activities stored in the EBA_CUST_CUST_ACTIVITY table. As mentioned earlier, this table tracks explicit CRM activities related to particular Customers and Contacts, including meetings, phone calls, etc. We have several activities that we would find useful to combine with these explicit events, such as Customer creation dates, Customer inactivity dates, etc. These implicit activities would look great combined in our Fact – Activity Fact table so we could include the type of activity in our GROUP BY list, and return the results when we drill down to the detail level for a particular customer. We could try to build this integration in the dashboard itself to show the account creation date from Dim – Contact on the same dashboard with dates of CRM activities. But we should all admit that, it’s a better solution to build this functionality in the Business Model if it’s possible. But is this feasible without ETL? Would we have to stitch this together using perhaps presentation variables, or worse: have business analysts dump the results of two separate analyses into an Excel spreadsheet and produce the report they need outside of OBIEE?

So we want to add an implicit event to our Fact – Customer Activity logical table: the creation of a Customer account, which is represented in Customer Tracker with the CREATED_ON column in the EBA_CUST_CUSTOMERS table. We’ll start by adding another logical table source using this source table and provide the same hard-coded value of 1 to the Activity Count measure:

Remember: this is still a factless fact table, and we have to provide a measure to the OBIEE semantic layer which allows aggregate queries. We have a little bit more to do with these logical table sources, but we need to make similar changes to several of our logical dimension tables as well before we complete this task. I’ll preview two logical table sources below (both called Customers), and then explain them further down. The one on the left is a new logical table source for the Dim – Customer Activity table, while the second is for the Dim – Activity Date table:

In the Dim – Customer Activity logical dimension table, we’ll create a new logical table source also based on the EBA_CUST_CUSTOMERS table. As this is a dimension table and requires that we map a value for the primary key, we simply use the physical primary key from the EBA_CUST_CUSTOMERS table. Notice that we have constructed a new logical column called Activity Source Type. This attribute will allow us to differentiate our explicit activities, such as those sourced directly from the EBA_CUST_CUST_ACTIVITY table, from this new implicit activity that we are constructing from the EBA_CUST_CUSTOMERS table. We also provide several hard-coded values to other attributes in this dimension table to compensate for the lack of values for those attributes for our implicit activities.

We also need to provide an additional logical table source for our date dimension Dim – Activity Date. This is where the magic starts to happen. The creation of the Customer account is actually the event that we are interested in reporting on, so it’s this date that ties all the activities together. We’ll map the CREATED_ON date from EBA_CUST_CUSTOMERS to the single Activity Date column that we have defined in the logical table source and let the other calculated measures provide the remaining attributes necessary in the logical dimension table. However, since the CREATED_ON column in the EBA_CUST_CUSTOMERS table is defined as a DATETIME attribute in the physical model (and we want it to remain that way when we view it as a dimensional attribute), we need to modify the expression slightly in the logical table source to remove the time element. As the calculation is not visible in the image above, I’ve listed it here:

Cast(“orcl”.”".”CUST_TRACK”.”EBA_CUST_CUSTOMERS”.”CREATED_ON” AS DATE )

The only remaining dimension table is Dim – Contact, but we don’t need to make any changes here, as the EBA_CUST_CUSTOMERS table is already adequately represented in the logical table source. Because we are bringing a logical table source associated with this logical dimension table into our logical fact table, the BI Server already understands how to construct the join (or more correctly, the lack of a join) with this logical dimension.

Now, we can return to the logical table sources for Fact – Customer Activity to exploit one final piece of sheer magic from the BI Server. For each logical table source, we select the Content tab and make the following changes:

There’s a few really important bits that we are unlocking here. First: we need to check the option next to This source should be combined with other sources at this level. Ordinarily, logical table sources are usually selected by the BI Server using an OR evaluation: one LTS per combination of fact and dimension joins. (There are exceptions to this, but I’m distilling the content down a bit to hopefully make this easier to follow). This setting instead dictates that the LTS’s should be evaluated with an AND instead. We are instructing the BI Server to combine these two logical table sources as if they existed in the same table. This is done using a logical union, which manifests itself as an actual UNION statement in the physical SQL when both sources exist in the same database. We can see this behavior by examining the physical SQL generated by an analysis using Fact – Activity Fact:

The more impressive functionality comes when we make use of the expression specified in Fragmentation content. This logic instructs the BI Server to do a kind of partition pruning that is similar to the behavior of the Oracle Database optimizer when dealing with partitioned tables. What we have constructed here is really a form of logical partitioning, with the source for the Fact – Customer Activity logical fact table existing in two logical physical sources, or partitions. So far, our query wasn’t precise enough to allow the BI Server to prune down to a single logical table source. However, when we choose to filter on the Activity Source Type logical column either directly or by drilling down, which is the same column we defined in our Fragmentation content section, the BI Server removes the UNION statement and generates a query against a single logical table source:

It’s still a best practice to build conformed data warehouses, and transactional reporting should be seen as a means to an end. Regardless, there will always be reasons to do transactional reporting, and the power of the Business Model and Mapping layer provides us with capabilities to deliver analyses and dashboards to replace the Excel spreadsheets that often form the cornerstone of transactional reporting. I hope you’ve enjoyed this series… and perhaps the long delays between each entry kept you on the edge of your seat for each new installment. Perhaps not. Regardless… drop me some comments and let me know what you think.

December 14, 2012 Rittman Mead

Using OBIEE against Transactional Schemas Part 4: Complex Dimensions

You guessed it… more on transactional schemas. For a quick recap, here are the posts to date:

Using OBIEE against Transactional Schemas Part 1: Introduction

Using OBIEE against Transactional Schemas Part 2: Aliases

Using OBIEE against Transactional Schemas Part 3: Simple Dimensions and Facts

Combining Entities

In the last post on Simple Dimensions and Facts, we discussed logical facts and dimensions, factless fact tables, and fake count measures. In constructing the simple dimension in that post, we saw the collapsing of normalized relationships into a single conformed dimension table when building Dim – Customer. Another important entity for reporting in the Customer Tracking application is the Contact entity. At first glance, it might seem that we should simply build another dimension called Dim – Contact, and use analyses to combine our Customer and Contact dimensions along with our Activity fact table to analyze Customer and Contact behavior. When we look again at the data model for the Customer Tracking application below, Customer and Contact are clearly two separate entities in this application, and so this probably means that we should continue to report on them this way, and structure them as such in the OBIEE Business Model and Mapping layer. Right?

Is this the way that business users think about their data? Although it makes perfect sense in a normalized data model to have separate entities for Customer and Contact, the business user likely won’t see it this way, especially when it comes to defining reporting requirements. Contacts are clearly specific instances of a Customer at a lower level in the Customer-Contact hierarchy, and our end users will expect to be able to drill cleanly from Customer on a report down to the specific Contact that is associated with a particular activity. Although OBIEE supports building hierarchies across multiple logical dimension tables so we can provide the drill-down behavior mentioned here, the end user is still confronted with two entities when building the report. This is perhaps the most challenging part of reporting against transactional schemas: deciding not to settle for the model that the OLTP designers have given us. Instead of resigning to have both Dim – Customer and Dim – Contact in our logical BMM, we should attempt to build the appropriate dimensional rendering of this model: a single Dim – Contact dimension of which Customer is simply another element in the hierarchy.

The first step in this process is to rename Dim – Customer to Dim – Contact, as it’s always best to name dimension tables according to the lowest level in the hierarchy: the declared grain of the dimension table. Then I add the physical contact tables to the Dim – Contact logical table source (renaming that as well for clarity purposes), and map all the new Contact-specific attributes to the appropriate physical table.

I also do some renaming of the Customer dimension, and include Contact Detail as the new, lowest level in the hierarchy.

I consider this merging of entities into a single logical dimension table to be more profound than our original constructing of Dim – Customer in the Simple Fact and Dimension post. Although we combined multiple normalized tables to get our original Customer dimension, those tables represented labels and descriptors, such as Customer Category and Customer Status. They didn’t represent true entities, which I argue is the distinction that exists when combining Customer and Contact. This type of high-level distinction comes into play when we try to think about our source model in dimensional terms instead of transactional ones.

An important feature of OBIEE when considering whether to build logical table sources with increasing amounts of physical tables added to them is performance. It might seem that combining the Customer and Contact entities in this way might have a negative impact on performance, because of all the tables that have to be joined to return records for the Dim – Contact dimension. Perhaps keeping the two dimension tables separate would mean less work that the database has to do. What if our query only requires Contact or Customer attributes in the result set, but not both? Doesn’t the BI Server have to include all the joins in the logical table source — in this case six tables — every time this dimension is used in an analysis? The answer is no. The BI Server is very, very smart, and in many ways acts like a database optimizer when structuring the physical SQL for an analysis. In each Intelligent Request Generation, it is aware of the smallest number of physical tables required to produce the result set in the logical query, and only includes those tables in the physical SQL. For instance, a sample analysis that returns only Customer attributes from our new Dim – Contact logical dimension only includes three tables in the SQL issued against the database:

Building a Date Dimension

One consistent element of every BI system is the necessity to handle dates effectively. For this purpose, when building a conformed data warehouse, we always build a date dimension of some kind or another. Ralph Kimball describes the special place that a date (or time) dimension plays in the data warehouse. We may have to split the difference here with Kimball, as you will see later. Regardless, we usually don’t have a date dimension table in our transactional schema. If you recall from earlier posts in this series, the application code for transactional systems usually exists in API’s designed specifically for single record “gets” and “puts”. For this reason, it’s very easy to develop a series of API’s designed to insert or return specific date attributes, and rarely do these systems deal with date conceptually as a hierarchy. It’s simply not part of what they are designed for. It’s not just dimensional models that require date dimensions: OBIEE as a product requires them in particular situations. If we want to capitalize on the built-in feature of automatic drill-down, then we have to build a logical dimension table that contains our levels: year, to quarter, to month, to day, for example. Additionally, the time-series capabilities that OBIEE has also require that our date attributes be configured in a logical hierarchy. We have two options for producing the non-existent date dimension:

We can build a date dimension in our transactional database, or in our foundation layer, and create aliases for this physical table in the OBIEE physical layer. We can then use these aliases in our BMM to form one or more role-playing dimension tables for our facts.
We can use the functionality in the OBIEE business model to logically “construct” a date dimension using a DATE attribute from the source table for the logical fact, and then use calculated attributes to flesh out the remainder of the required level-specific attributes.

The first bullet point is an attractive option, especially if the reporting environment is a replicated version of the transactional system, or better-yet, a foundation layer (described in this post), where we are able to inject additional physical structures specifically for reporting purposes. The join between the date dimension and the source table is constructed using a CAST of DATETIME to DATE for Oracle databases, which will generate the necessary TRUNC command in the join syntax. Although I recommend using DATE datatypes as the surrogate key in date dimensions, it’s not required, as date dimensions usually have a DATE dataype in them somewhere, and this column can be used as the join criteria in the physical model. Don’t be tempted into using OBIEE’s federation capabilities to use a date dimension from another database for this purpose. Though it may seem like the ideal solution on the surface, keep in mind that date-based filters would not be pushed down to the reporting database in this scenario, as these filters would have to be applied logically in memory by the BI Server. This produces a performance nightmare, and inevitably, the production DBA will blame OBIEE for not applying date filters on it’s queries.

Though it’s tempting to dive into the first bullet point further (feel free to leave comments about this), it seems like a fairly standard approach for developers familiar with using OBIEE against traditional data warehouses. We will instead focus on the second bullet point: constructing a date dimension out of sheer nothingness. We start by building an empty dimension table called Dim – Activity Date and mapping only one single attribute from our source table containing the DATE column, in this case, EBA_CUST_CUST_ACTIVITY.ACTIVITY_DATE. This will be the only attribute that we map back to a source table; all the remaining attributes will be calculated measures in the BMM. The OBIEE SQL language contains a series of date conversion functions specifically designed to extract date elements from a DATE datatype. Any functions that OBIEE can’t extract on it’s own can be extracted using the EVALUATE function to push database-specific syntax transparently down to the underlying database engine. I prefer to use the logical option Derived from existing columns using an expression as opposed to Derived from physical mappings whenever possible because we can reuse the Dim – Activity Date dimension with other facts, and we will only have to map a single column in the logical table source: the DATE attribute.

If we choose to use physical mappings instead, we will have to map all the attributes each time we add an additional logical table source to the logical dimension table. When our date dimension is complete, we’ll have a complete logical date dimension with accompanying dimension (hierarchy) ready to be used for drill-down, time-series, etc.

When we are all done with the BMM mapping, we can see how the BI Server generates the physical SQL against an Oracle Database. As we saw earlier, the BI Server doesn’t need to join to the EBA_CUST_CUST_ACTIVITY more than once, even though it’s now part of the logical table source for three logical tables: Dim – Customer Activity, Dim – Activity Date and Fact – Customer Activity.

In the next post, we’ll describe Complex Fact Tables… and see some really, really cool stuff. I promise!

October 26, 2012 Rittman Mead

Using OBIEE against Transactional Schemas Part 3: Simple Dimensions and Facts

This could be the longest series of blog posts in Rittman Mead history… not in number of posts (this is only the third), but in length of time from start to finish. I don’t think that’s a record anyone is actively pursuing, nor am I proud to (possibly) be the record holder, so my apologies to those of you waiting anxiously for each new installment. To reset… I’m discussing using OBIEE to report against transactional schemas. I started with an introduction, and followed up with a few words on aliases. Now I’d like to discuss a general approach to defining logical fact and dimensional tables in the OBIEE Business Model and Mapping layer.

Finding logical tables buried away in a highly normalized model is really as simple as taking a business view of the data. With Customer data for instance, business users don’t think about (or even understand) normalized models with separate entities for Customer, and Customer Category, and Customer Type, and Customer Status, etc. To the business users, these are all attributes of a single entity: Customer. If we think about the BMM layer in OBIEE as a way to present a uniformed model to a business user, or at least a basic user with no knowledge of the underlying 3NF model used for running the business, then we should ask ourselves a simple question: how would a business user want to see this data?

So our job is to undo the brilliant normalization work performed by the transactional data modelers, and combine all of these separate entities into a single, conformed dimension table or fact table. In it’s purest form, I am describing nothing more than real-time ETL, because the logic applied in the BMM is the same sort of logic we would use in an ETL tool to construct facts and dimensions in the physical world.

Thinking only about logical dimension tables for a moment, we start with the most basic and granular entity that composes the heart of the logical dimension, and start joining outward to include all the other ancillary entities needed to form a complete picture of that dimension. With the Customer Tracking application, the logical place to start is the Dim – Customer logical dimension, and the most basic table to use as a starting point is EBA_CUST_CUSTOMERS. This table contains the main attributes of Customer: the name, the address, etc. So I’ll construct a logical table source and start with the EBA_CUST_CUSTOMERS table, and then I’ll add additional physical entities using the (+) button, as depicted below.

Once we have all the required tables joined into the logical table source, we can start mapping the physical columns to our desired logical columns, using either the Column Mapping tab in the logical table source, or by simply clicking and dragging physical columns over onto the logical dimension table.

Once all the columns have been mapped to one of the entities in our logical table source, we can then start building the hierarchy, or dimension in OBIEE terms, that defines the relationship among levels in the dimension. When we are reporting against a pure star schema, this kind of information doesn’t exist in the table definition, though hopefully it exists in the project documentation. However, when reporting off a transactional schema, we have a leg-up here. Although our logical dimension table presents as a single, conformed entity, our physical model is still 3NF, and a 3NF model gives us some hints regarding hierarchies, as those relationships are usually normalized out into separate tables. All and all, the process of building the hierarchy is no different from pure star schemas, because the logical dimension table shields us from the complexity of navigating multiple physical tables to define our hierarchy.

Shifting gears for a moment and thinking about fact tables, the process for mapping simple logical fact tables is at it’s core very similar to the process for logical dimensional tables. When our source system has transactional data, with many measures, such as transaction amount, transaction quantity, etc., then defining the fact table is relatively simple, even though the logical table source for the logical fact table may contain a join across multiple source entities, similar to how we built the simple dimension table above. A common scenario for this is with a Sales Order, where these is a Sales Order Detail table and a Sales Order Header table.

The first approach I want to demonstrate is not as easy as the Sales Order example; it involves a more operational reporting requirement, and how to arrange our logical fact table when we really have no measures at all. In this scenario, what we have is a sort of relationship table, where we require a fact table, but the purpose of that fact table is only to form a bridge between various dimensional tables. Ralph Kimball calls this form of relationship table a factless fact table. As Kimball explains, the event is the fact, and he uses the example of the student attendance fact, where the only thing that is being recorded in the fact is the presence of a particular student, in a particular class, with a particular professor, on a particular day, etc. For the Customer Tracking application, we want to report on customer activities stored in the EBA_CUST_CUST_ACTIVITY table:

This table stores our basic CRM events: phone calls, meetings, delivered proposals, etc. But true to Kimball’s positioning of our requirement, this table doesn’t have any explicit facts such as amount sold, quantity sold, etc. We need a way to model these activities in the semantic layer in a way that makes sense to the BI Server, and what makes sense to the BI Server is the presence of at least one measure. This measure is required to provide the BI Server with the ability to generate aggregate queries involving SUM, AVG, etc. When we build a factless fact table in a data warehouse, we typically load each row with the appropriate surrogate keys to the dimension tables, along with a single, manufactured measure which is usually a value of 1.

We use this value as the activity count, and we can use the measure as a basis for aggregate functions across different dimensions of the data. In building a logical factless fact table, we need to do the same thing. So we start with a logical source for the fact table, even though it doesn’t have any actual measures we will use. In the case of the Customer Tracking application, our relationship table, or factless fact, will be based on EBA_CUST_CUST_ACTIVTY, which is a table with a single record for each activity that we want to record for a customer. Once we have defined the physical table for our logical table source, we map the source column for the Activity Count measure, which in our case will always be a value of 1:

Once we have the new “fake” measure, we can set the aggregation rule, in this case SUM:

It’s also common to have interesting textual attributes existing in the same physical table that we are using as the source for our logical fact table. In our case, this is the EBA_CUST_CUST_ACTIVITY table, and the attribute that we are interested in is called SUMMARY. This attribute holds a description of the activity that has been performed. Although it’s debatable from a pure analytic perspective whether this attribute has value, it certainly does from an operational perspective, especially as an attribute when drilling down to the detail level of a report. When deciding what to do with this attribute, and how to make it available to the end user, we have a series of options. The first decision is a logical one: where do we put the logical column? We could put it in the logical fact table as a non-aggregating attribute, meaning an attribute that doesn’t participate in aggregate functions, but instead forms part of the GROUP BY statement. Although I think this is usually the wrong placement, it is the solution I see more than any others. It is difficult to educate an end user about aggregating and non-aggregating attributes in the same table, and some users will never fully get the distinction. The appropriate solution then is to place these interesting textual attributes, such as SUMMARY, in their own logical dimension table, even though the attributes exist in the “fact table”, so to speak. We can see the SUMMARY column, which has been placed in the logical dimension table Dim – Customer Activity and renamed to Description. I also add an additional table, EBA_CUST_ACTIVITY_TYPE to provide an Activity Type attribute to also use in the logical dimension table.

The reason we put this attribute in it’s own dimension is the importance of separating attributes from measures. Even though OBIEE supports the idea of delivering GROUP BY attributes directly in a logical fact table, that doesn’t mean we should use that feature. Placing the attribute in it’s own dimension also provides us the ability of reusing that dimension table as a conformed dimension table with other logical fact tables that may themselves have their own Description attribute that needs to be exposed, as Venkat describes in his post on modeling degenerate dimensions.

Once we have decided to place the Description attribute in the logical dimension table called Dim – Customer Activity, instead of placing it in Fact – Customer Activity, we now have a decision to make about how to construct the logical table source. The common approach I see is to construct two aliases of the source table, in this example, EBA_CUST_ACTIVITY: one to use as the dimension component of the table, and the other to use as the fact component. Even though I often see this approach, I still haven’t figured out why it is common. The preferred approach is to use the same physical source, either a single alias or a single physical table, as the source for both the dimension logical table source, and the fact logical table source.

To compare these two approaches, I build the analysis using Dim – Customer, Dim – Customer Activity, and Fact – Customer Activity.

Using the 2 alias approach, we would get a self join to the EBA_CUST_ACTIVITY table: once as a dimension and once as a fact:

When using the same physical source for the fact table and the dimension, our report would display the same results, but we would get a single scan against the EBA_CUST_CUST_ACTIVITY table. The BI Server can make out the purpose of separating physical attributes in a logical manner and is capable of rendering the correct results with the fewest number of physical scans:

August 8, 2012 Rittman Mead

Using OBIEE against Transactional Schemas Part 2: Aliases

It seems like a different life when I posted the introduction to reporting against transactional schemas. Since that time, we’ve had some exciting new customers come aboard, which has required a lot of my time (and which I have been glad to provide). We’ve also had Kscope 12, where I was a member of the BI content committee, and where Rittman Mead was a sponsor, and our own Jordan Meyer won Best Speaker for the BI Stream. And, we’ve opened our brand new office in Atlanta, and have our first public training event at that location in progress. All together… we’ve been busy at Rittman Mead America. But a promise is a promise… so here goes Part 2.

The first thing we have to approach is the intelligent use of alias tables in OBIEE. There’s a best practice nugget out in the community that espouses “alias everything”. You’ve heard this, right? It’s not unlike the same advice I’ve been hearing in the database world now for going on two decades, that tells us to always use synonyms, or views, or provide some layer of abstraction. Abstraction I completely understand… but we already have two layers of abstraction in OBIEE without using a single alias. I think the argument goes something like this: “You have no idea when you’ll need to introduce a new way of accessing a particular table… or when it will need to play multiple roles… so do yourself a favor and alias everything right out of the gate.”

My response to this advice: maybe. The longer I’ve worked in IT in general and BI in particular, the more I treat any “absolute” commandment with suspension. When reporting against a transactional system… we have a very complex set of tables and joins. Where dimensional models are characteristic of sparse normalization, OLTP schemas are spilling over the brim with it. And if we choose to “alias everything”, as the good book says, then OBIEE’s ability to import foreign key relationships, and therefore, joins, is completely wasted. We will have to go through the process of re-applying all those joins that we just imported because we want to replace our entire physical layer with aliases. So whatever flexibility you gain by aliasing everything… remember that it comes with a price: the need to redefine every single join once again.

(Do you have a different opinion? Please let me know… I’d love to hear about it. Even you Venkat!)

Regardless of where you stand on the subject of mandatory aliasing… there will always be a need to use some aliasing when dealing with OLTP. To understand why, let’s first take a look at our entire model for the Customer Tracking application:

The first thing to notice is the relationship between EBA_CUST_CONTACTS and EBA_CUST_CUSTOMERS. Basically… this relationship is the heart of the application. Being able to add Customers as well as Contacts, and apply relationships between them (and perhaps sever them if necessary) is one of the key capabilities of any good CRM application. But when looking at this model, how would I write a query to display all the Contacts by Customer? I seem to have two options:

SQL> select count(*) from eba_cust_associations eca
2  join eba_cust_customers ecu on eca.customer_id = ecu.id
3  join eba_cust_contacts eco on eca.contact_id = eco.id;

COUNT(*)
----------
6

1 row selected.

Elapsed: 00:00:00.02
SQL> select count(*) from eba_cust_links ecl
2  join eba_cust_customers ecu on ecl.customer_id = ecu.id
3  join eba_cust_contacts eco on ecl.contact_id = eco.id;

COUNT(*)
----------
0

1 row selected.

Elapsed: 00:00:00.03
SQL>

To the human eye, it’s perfectly clear that the initial query is the correct one. It’s clear by looking at the table name EBA_CUST_ASSOCIATIONS that this table serves as the cross-reference table between Customers and Contacts. EBA_CUST_LINKS is actually a place for recording important URL’s, and they can be stored against either a Contact or a Customer. In this way, the links table is “reused”, and that is why it joins to both EBA_CUST_CUSTOMERS and EBA_CUST_CONTACTS.

But the BI Server has no way of knowing this. To the electronic eye, both EBA_CUST_ASSOCIATIONS and EBA_CUST_LINKS are attractive cross reference tables.

So we have to tell the BI Server how this model plugs together from an “analytics” perspective, and to do this, we’ll use aliases. Once I have created the aliases for the two different “roles” that the EBA_CUST_LINKS table plays, it is very clear to the BI Server which route to take between tables.

Up next in Part 3, I’ll talk about putting together logical dimension tables, and some of the design patterns to keep in mind. Also, when we discuss building logical fact tables (Part 4, perhaps?), we’ll discuss aliases again, and again… I’ll show you common situations where aliases are overused.

Social

- Recent Posts
- Tags
- Search
- Sql2Odi – best practices for auto-generating your ETL content
  
  5:23 pm By Janis Rumnieks
  I blogged a while ago about our Sql2Odi tool that converts SELECT (and WITH as well) statements into Oracle ODI Mappings. (Blog posts 1, Read More »
- Why Data Governance Is More About Enabling Good Than Stopping Bad
  
  10:06 am By Jon Mead
  It seems like every day businesses are bombarded with reminders of the value of data. Data, we are told, leads us to Read More »
- Oracle APEX – Debugging Tip #1
  
  11:00 am By Lucas Hirschegger
  If I would need this feature I would probably invest a lot of time trying to find out an answer. But funny enough I found out Read More »
- We’ve added Search!
  
  4:07 pm By Jon Mead
  Well, to be precise, our blog is hosted on Ghost, and we have just upgraded to version 5.0 which includes search, see here.
- Oracle APEX – Social Login
  
  2:21 pm By Lucas Hirschegger
  I looked into Social Sign-in as an option for Oracle APEX a few years ago. This was pre APEX 18.1 and, at this time, it was Read More »
/EPM / Bi /Patch Releases 11g Big Data blog Business Intelligence Cloud EPM Cloud EPM On Premise Hyperion Obiee Oracle Oracle BI Apps Oracle BI Suite EE Oracle Data Integrator patch patch_set_update psu Uncategorized User Groups & Conferences

Tag Archives: Dimensional Modelling

Combining Entities

Building a Date Dimension

Social