Merging OBIEE 12c .RPD binary files directly in Git
Let's talk about OBIEE concurrent development!
Enabling concurrent development for the OBIEE RPD is a recurring theme in OBIEE blogs. Full support for RPD development with Gitflow has long been part of the Rittman Mead BI Developer Toolkit and is described in great detail in Minesh's blog post. What you are currently reading is a follow-up to Minesh's post, taking it one step further: instead of calling Python scripts to perform Gitflow steps, we want to perform all those steps directly from our Git client (including the ones performing a merge, like feature finish), be it the command line or a visual application like Sourcetree.
RPD versioning directly in Git - do we need that?
How is versioning directly in a Git client better than calling Python scripts? First of all, there is the convenience of using the same approach and the same tool for all content you need to version control. A Python script has to come with instructions for its use, whereas every developer knows how to use Git. Last but not least, a 3-way merge, which is used for Gitflow's feature finish, release finish and hotfix finish commands, requires three repositories that need to be passed to the script in the right order. Doing merges in your Git client is quicker and less error prone.
What is a Git merge?
Before we proceed with discussing our options for merging OBIEE RPDs, let us quickly recap on how Git merges work.
There are two types of Git merges: Fast-forward Merges and 3-way Merges. Strictly speaking, Fast-forward Merges are no merges at all. If the base branch has not seen any changes whilst you worked on your feature branch, merging the feature back into the base simply means 'fast-forwarding' the base branch tip to your feature branch tip, i.e. your feature becomes the new base. That is allowed because the two branches have not diverged - the histories of the base and the feature branches form a single history line.
When the two branches have diverged, i.e. when the base has been modified by the time we want to merge our feature, a 3-way merge is the only option.
In the above diagram, feature 1 can be fast-forward merged whereas feature 2 must be 3-way merged into the develop branch.
Note that because a Fast-forward Merge is not an actual merge but rather a replacement, it is not relevant what content is being merged. The 3-way Merge, however, depending on the content being merged, can be quite challenging or even impossible, and can result in merge conflicts that require manual resolution.
So... can Git 3-way merge RPDs?
OBIEE RPD can be saved in two formats: a single binary .rpd file or one or many .xml files (depending on what rpd-to-xml conversion method you use). The choice here seems obvious - it is common knowledge that Git cannot reliably 3-way merge binary files. So XML format it is. Or is it?
Like any other text file, Git certainly can merge XML files. But will it produce an XML that is still recognised as a consistent OBIEE RPD? Well, some OBIEE developer teams have reported success with this approach. My own experience, even with the most trivial of RPD changes, shows that somewhere during the .xml to .rpd conversion, then introducing changes in the .rpd and finally converting it back to .xml, the XML tags get reshuffled and sometimes their identifiers change as well. (Equalising RPD objects is supposed to help with the latter.) I found no standard Git merge algorithm that would reliably and consistently merge RPDs in XML format produced this way, be it a single large XML file or a collection of small XML files.
Fortunately, there is a better (and less risky) way.
Creating a Git custom merge driver
It is possible to create custom Git merge drivers and then assign them to specific file extensions (like .rpd) in the .gitattributes file, as described in the Git documentation. According to the guide, defining a custom merge driver in Git is really straightforward: just add a new entry to the .git/config file:
[merge "filfre"]
name = feel-free merge driver
driver = filfre %O %A %B %L %P
recursive = binary
Here, filfre is the code name of the custom merge driver, feel-free merge driver is its descriptive name (hardly used anywhere) and the driver value is where we define the driver itself. It is a shell command for your operating system. Typically it would call a shell script or a binary executable; it can be a java -jar execution or a python my-python-script.py call. The latter is what we want - we already have a 3-way merge script for OBIEE RPD in the Rittman Mead BI Developer Toolkit, as blogged by Minesh.
For the script to know about the content to be merged, it receives the following command line arguments: %O %A %B %L %P. These are the values that Git passes to the custom merge driver:
- %O - the Base or Original for the 3-way merge. If we are using Gitflow, this is the develop branch's version from which our feature branch was created;
- %A - the Current version for the 3-way merge. If we are using Gitflow, this is the feature branch that we want to merge back into develop;
- %B - the Other or Modified version for the 3-way merge. If we are using Gitflow, this is the develop branch as it currently is (diverged from the original Base), when we want to merge our feature branch back into it.
There are two more values, which we do not need and will ignore: %L is the conflict marker size, e.g. 7 for '>>>>>>>'; this is irrelevant for us because we are handling binary files. %P is the full path name where the merge result will be stored - again irrelevant for us, because Python is capable of getting the full paths of the files it is handling, should it need them.
Creating a Git custom merge driver for OBIEE .rpd binary files
What we need here is a Python script that performs a 3-way RPD merge by calling the OBIEE commands comparerpd and patchrpd from the command line. Please note that OBIEE creates a fourth file as the output of the merge, whereas a Git merge driver is expected to overwrite the Current (%A) input with the merge result. In Python, that is quite doable.
Another important thing to note is that the script must return exit code 0 in case of a success and exit code 1 in case there were merge conflicts and automatic merge could not be performed. Git determines the success of the merge solely based on the exit code.
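Below is a minimal sketch of such a driver script. It assumes the OBIEE 12c bin directory (containing comparerpd and patchrpd) is on the PATH and that the RPD password is known to the script; the exact command line flags vary between OBIEE versions, so treat them as indicative and verify them against your installation's documentation.

# git-rpd-merge-driver.py - minimal sketch of a custom Git merge driver for .rpd files.
# Assumptions: comparerpd/patchrpd are on the PATH, all RPDs share one known password,
# and the flag names below are indicative - verify them for your OBIEE version.
import shutil
import subprocess
import sys
import tempfile

RPD_PASSWORD = "Admin123"  # hypothetical; read from a secure store in real use

def main():
    base, current, modified = sys.argv[1:4]  # %O %A %B, temp files created by Git

    with tempfile.TemporaryDirectory() as tmp:
        patch = tmp + "/diff.xml"
        merged = tmp + "/merged.rpd"

        # Step 1: create a patch describing how 'modified' diverged from 'base'
        diff = subprocess.run(
            ["comparerpd", "-P", RPD_PASSWORD, "-C", modified,
             "-W", RPD_PASSWORD, "-G", base, "-D", patch])
        if diff.returncode != 0:
            sys.exit(1)  # tell Git the merge failed

        # Step 2: apply that patch on top of 'current', producing the 4th (output) file
        apply_patch = subprocess.run(
            ["patchrpd", "-P", RPD_PASSWORD, "-C", current,
             "-Q", RPD_PASSWORD, "-G", base, "-S", patch, "-O", merged])
        if apply_patch.returncode != 0:
            sys.exit(1)  # merge conflict - Git will leave the branches unmerged

        # Git expects the merge result to overwrite the Current (%A) input file
        shutil.copyfile(merged, current)

    sys.exit(0)  # exit code 0 = successful merge

if __name__ == "__main__":
    main()

Git calls the script with the three temporary file names itself, so beyond the config entry described next there is nothing else to wire up.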
Once we have the Python script ready and have tested it standalone, we open our local Git repository folder where our OBIEE .rpd files will be versioned, open the file <repo root>/.git/config for editing and add the following lines to it:
[merge "rpdbin"]
name = binary RPD file merge driver
driver = python C:/Developer/bi_developer_toolkit/git-rpd-merge-driver.py %O %A %B
Our Python script expects three command line arguments - the names of the .rpd files: Base (%O), Current (%A) and Modified (%B). These will be temporary files, created by Git at run time.
Once the config file is modified, create a new file <repo root>/.gitattributes and add the following line to it:
*.rpd merge=rpdbin
This assumes that your binary RPD files will always have the extension .rpd. If they have a different extension, the custom merge driver will not be applied to them.
And that is it - we are done!
Note: if you see that the custom merge driver works from the Git command line tool but does not work in Sourcetree, you may need to run Sourcetree as Administrator.
Trying it out
We will use Sourcetree as our Git/Gitflow client - it is really good at visualising the versioning flow and shows the currently available Gitflow commands for the currently checked out branch.
We will use the RPD from Oracle Sample Application v602 for OBIEE 12c 12.2.1.1.0 for our testing.
After initialising Gitflow in our Git repository, we add the out-of-the-box Sample Apps RPD to our repository's develop branch - that will be our Base.
Then we create two copies of it and modify each copy to introduce changes we would like to see merged. In the screenshots below, you can see Business Models and Databases renamed. But I did also change the content of those Business Models.
Repo 1:
Repo 2:
Now we create a new feature branch and overwrite the Base rpd it contains with our Repo 1 rpd.
As the next step, we check out the develop branch again and replace the Base rpd there with Repo 2 rpd.
Note that we need to make sure the develop branch differs from the original Base when we finish our feature. If the develop branch is the same as the original Base at that point, a fast-forward merge will be performed instead and our custom merge driver will not be applied.
The result should look like this in Sourcetree. You can see a fork, indicating that the develop and the feature8 branches have diverged:
We are ready to test our custom 3-way merge driver. In Sourcetree, from the Gitflow menu, select Finish Feature.
Confirm your intention to merge the feature back into develop.
If all goes as planned, Git will call your custom merge driver. In Sourcetree, click the Show Full Output checkbox to see the output from your script. In my script, I tagged all output with a [Git RPD Merge Driver] prefix (except the output coming from external functions). This is what my output looks like:
Now let us check the result: make sure the develop branch is checked out, then open the merged RPD in the Admin tool.
We can see that it worked - we can now do full Gitflow lifecycle for OBIEE .rpd files directly in Git.
But what if the merge fails?
If the merge fails, the feature branch will not be deleted and you will have to merge the .rpd files manually in the OBIEE Admin tool. Note that you can get the Current, the Modified and the Base .rpd files from Git. Once you are happy with your manual merge result, check out the develop branch and add it there.
How Are My Users Connecting? Analyzing OAC and OBIEE entry points
Are you managing an OAC or OBIEE instance, and is your life nice and easy because you feel you have everything under control? Your users browse existing dashboards, create content via Analysis, Data Visualization or SmartView, and deliver data via Agents or download dashboard content to use in Excel. You feel safe, since you designed your platform to provide aggregated data and you track every query via Usage Tracking.
But one day you start noticing new BI tools appearing in your company that provide KPIs similar to the ones you are already exposing, and you start questioning where those data are coming from. Then you suddenly realize they are automagically sourcing data from your platform in ways you don't think you can control or manage.
Well, you're not alone: let me show you how to monitor OAC/OBIEE connections via network sniffing and Usage Tracking in this new world of self-service BI platforms.
A Bit of History
Anybody who has been in the Analytics market for some time will recognise the situation described in the image above from direct experience: multiple people having different views on a KPI calculation, and therefore on its results. Back in the day, that problem was strictly related to the usage of Excel as a BI tool and the fact that everybody was directly accessing raw data to build up their own KPIs.
Centralised BI Solutions
The landscape started to change when centralised Enterprise BI solutions (like OBIEE or, in more recent times, OAC) started appearing and being developed in the market. The key point of those solutions was to provide a unique source of truth for a certain set of KPIs across the organization.
However, the fact that those tools were centralised in the hands of the IT department meant, most of the time, a lack of agility for the business departments: every new KPI had to be well defined, understood, documented, implemented by IT, validated and delivered, in a process that could take months. Even when the development phase was optimised, via DevOps practices for example, time was still burned on the communication and coordination efforts necessary between business and IT teams.
Self Service BI Platforms
In order to solve the agility problem, in the last few years a new bottom-up approach has been suggested by the latest set of self-service Analytics tools: a certain set of KPIs is developed locally directly by the Business Department and then, once the KPI has been validated and accepted, its definition and the related data model is certified to allow a broader audience to use it.
Oracle has historically been a leader on the Centralised BI platform space with OBIEE being the perfect tool for this kind of reporting. In recent years, Data Visualization closed the gap of the Self-Service Analytics, providing tools for data preparation, visualization and machine learning directly in the hands of Business Users. Oracle Analytics Cloud (OAC) combines in a unique tool both the traditional centralised BI as well as the self-service analytics providing the best option for each use case.
What we have seen at various customers is a proliferation of BI tools being acquired by various departments: most of the time a centralised BI tool is used side by side with one or more self-service tools, with little or no control over data source usage or KPI calculation.
The transition from old-school centralised BI platforms to the new bottom-up certified systems is not immediate and there is no automated solution for it. Moreover, centralised BI platforms are still key in most corporations, with big investments associated with them in order to get fully automated KPI management. A complete rewrite of well-working legacy BI solutions following the latest BI trends and tools is not doable or affordable in the short term, and definitely not a priority for the business.
A Mix of The Two
So, how can we make the old and the new world coexist in a solution which is efficient and agile, and doesn't waste all the well-defined KPIs that are already produced? The solution that we are suggesting more and more is the reuse of the central BI solution as a curated data source for the self-service tools.
Just imagine the case where we have a very complex Churn Prediction formula, based on a series of fields in a star schema that has been already validated and approved by the Business. Instead of forcing a new user to rewrite the whole formula from the base tables we could just offer, based on the centralised BI system, something like:
Select "Dim Account"."Account Code", "Fact Churn"."Churn Prediction" from "Churn"
There are various benefits to this:
- No mistakes in formula recalculation
- No prior knowledge of joining conditions, filtering or aggregation needed
- Security system inheritance: if specific filters or security-sensitive fields were defined, those settings will still be valid.
- No duplication of code, with different people accessing various versions of the same KPIs.
Using the centralised BI system to query existing KPIs, and mashing them up with new data sources, is the optimal way of giving agility to the business while at the same time certifying the validity of the core KPIs.
OBIEE as a datasource
A lot of our customers have OBIEE as their own centralised BI reporting tool and are now looking into expanding the BI footprint with a self-service tool. If the chosen tool is Oracle Data Visualization then all the hard work is already done: it natively interfaces with OBIEE's RPD and all the Subject Areas are available together with the related security constraints since the security system is shared.
But what if the self-service tool is not Oracle Data Visualization? How can you expose OBIEE's Data to an external system? Well, there are three main ways:
The first one is by using web services: OAC (OBIEE) provides a set of SOAP web services that can be called via Python, for example, one of them being executeSQLQuery. After passing the SQL as a string, the results are returned in XML format. This is the method used, for example, by Rittman Mead Insights. SOAP web services, however, can't be queried directly by most BI tools, which is why we created Unify to allow OBIEE connections from Tableau (which is now available for FREE!).
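As a minimal sketch of that approach, the Python below logs on via nQSessionService and runs a Logical SQL statement through executeSQLQuery. The endpoint path follows the v6 WSDL pattern, but the host, credentials and envelope shapes are assumptions on my part - check the element names against the WSDL of your own instance before relying on them.

# Minimal sketch: calling OAC/OBIEE SOAP web services from Python.
# The envelope structure below is indicative - confirm element names against
# your instance's WSDL (/analytics-ws/saw.dll/wsdl/v6) before relying on it.
import re
import requests

BASE = "http://obiee-host:9502/analytics-ws/saw.dll?SoapImpl="  # hypothetical host

def soap_call(service, body):
    envelope = f"""<soapenv:Envelope
        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:v6="urn://oracle.bi.webservices/v6">
      <soapenv:Body>{body}</soapenv:Body>
    </soapenv:Envelope>"""
    resp = requests.post(BASE + service, data=envelope,
                         headers={"Content-Type": "text/xml"})
    resp.raise_for_status()
    return resp.text

# 1. Log on and grab the session id from the response
logon = soap_call("nQSessionService",
    "<v6:logon><v6:name>weblogic</v6:name><v6:password>pwd</v6:password></v6:logon>")
session_id = re.search(r"sessionID[^>]*>([^<]+)", logon).group(1)

# 2. Run Logical SQL via executeSQLQuery
sql = 'SELECT "Dim Account"."Account Code", "Fact Churn"."Churn Prediction" FROM "Churn"'
result = soap_call("xmlViewService", f"""
    <v6:executeSQLQuery>
      <v6:sql>{sql}</v6:sql>
      <v6:outputFormat>SAWRowsetData</v6:outputFormat>
      <v6:executionOptions/>
      <v6:sessionID>{session_id}</v6:sessionID>
    </v6:executeSQLQuery>""")
print(result)  # rowset XML, to be parsed with your XML library of choice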
If you aren't using Tableau, a more generic connection method, accessible by most BI tools, is ODBC: OBIEE's BI Server (the component managing the RPD) can be exposed via ODBC by installing the Admin Tool drivers and creating an ODBC connection.
Please note that the ODBC method is only available if the BI Server port is not blocked by firewalls. Once the port is open, the ODBC datasource can be queried by any tool with ODBC querying capabilities.
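For instance, here is a minimal sketch of querying the BI Server via ODBC from Python with pyodbc, assuming the Admin Tool ODBC drivers are installed and a DSN named OBIEE points at the BI Server; the DSN name and credentials below are placeholders.

# Minimal sketch: querying OBIEE's BI Server via ODBC using pyodbc.
# Assumes the Admin Tool ODBC drivers are installed and a DSN called "OBIEE"
# points at the BI Server; DSN name and credentials are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=OBIEE;UID=weblogic;PWD=password", autocommit=True)
cursor = conn.cursor()

# The BI Server speaks Logical SQL against Subject Areas, not physical SQL
cursor.execute('SELECT "Dim Account"."Account Code", '
               '"Fact Churn"."Churn Prediction" FROM "Churn"')

for row in cursor.fetchall():
    print(row)

conn.close()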
The last method is obviously Smartview, which allows sourcing data from pre-existing Analyses, or creating new ones, with the option of refreshing the data on demand. Smartview is the perfect choice if your target analytical tool is one of the two supported: Excel or PowerPoint.
Good for all use-cases?
Are the above connection methods good in every situation?
The solutions described above work really well if you let OBIEE do its job: KPI calculations, aggregations, group-bys and joins - in other terms, if your aim is to extract aggregated data. OBIEE is not a massive data-exporting tool; if your plan is to export 100k rows (just a random number) every time, then you may need to rethink the solution, since you:
- will experience poor performance, since you're adding a layer (OAC) between where the data resides (the DB) and yourself
- will put the OBIEE environment under pressure, since it has to run the query and transform the resultset into XML before pushing it to you
If that's the use case you're looking for then you should think about alternative solutions like sourcing the data directly from the database and possibly moving your security settings there.
How Can You Monitor Who is Connecting?
Let's face reality: these days everyone tries to make their work as easy as possible. Business Analysts are tech savvy, and configuration and connection options are just a Google search away. Stopping people from finding alternative solutions to accelerate their work is counterproductive: there will be tension since the analysts' work is slowed down, and the usage of the centralized BI platform will decline quickly as analysts move to other platforms giving them the required flexibility.
Blocking ports and access methods is not the correct way of providing a (BI) service that should be centrally controlled but used by the maximum number of people in an organization. Instead, monitoring solutions should be created in order to:
- Understand how users are interacting with the platform
- Provide specific workarounds in cases when there is a misuse of the platform
But how can you monitor users' access? Well, you really have two options: network sniffing or Usage Tracking.
Network Sniffing
Let's take the example of ODBC connections directly to BI Server (RPD). Those connections can be of three main types:
- From/To the Presentation Service in order to execute queries in the front-end (e.g. via analysis) and to retrieve the data
- From an OBIEE administrator's Admin Tool, to modify OAC/OBIEE's metadata (this shouldn't happen in production systems)
- From end users' ODBC connections, to query OAC/OBIEE data with other BI tools
In the first type of connection, both the sender and the receiver (Presentation Server and BI Server) share the same IP (or IPs, in the case of a cluster), while in the second and third types (the ones we are interested in) the IP address of the packet sender/receiver is different from the IP of the OBIEE server.
We can then simply use a Linux network analysis tool like tcpdump to check the traffic. With the following command, we are able to listen on port 9516 (the BI Server one) and exclude all the traffic generated from the Presentation Server (IP 192.168.1.30):
sudo tcpdump -i eth0 -ennA 'port 9516' | grep -v "IP 192.168.1.30"
The following is a representation of the traffic. We can clearly see the traffic passing between the user's machine (IP ending with 161) and the BI Server (IP ending with 30, port 56639).
This is the first tracking effort and it already provides us with some information (like users' IP addresses); however, it is limited to ODBC and doesn't tell us the username. Let's see now what we can get from Usage Tracking.
Usage Tracking
We wrote a lot about Usage Tracking, how to enhance it and how to use it, so I don't want to repeat that here. A very basic description: it is a database table containing statistics of every query generated by OBIEE.
The "every query" bit is really important: the query doesn't have to be generated by the standard front-end (analytics), but a record is created even if is coming from Smartview or with a direct ODBC access to the BIServer.
Looking into S_NQ_ACCT (the default table name), there is an interesting field named QUERY_SRC_CD that, according to the Oracle documentation, contains "The source of the request."
Checking the values for that table we can see:
Analysing the above data in detail:
- DashboardPrompt and ValuePrompt are related to displaying values in Prompts
- DisplayValueMap, Member Browser Display Values and Member Browser Path to Value seem related to items displayed when creating an analysis
- Report is an Analysis execution
- SOAP marks the web service calls
- rawSQL is the usage of raw SQL (which shouldn't be permitted)
So SOAP identifies the web services; what about the direct ODBC connections? They don't seem to be logged! Not really: looking in more detail at a known dataset, we discovered that ODBC connections are marked with a NULL value in QUERY_SRC_CD, together with some other traffic.
Looking into the details of the NULL QUERY_SRC_CD transactions, we can see two types of logs:
- The ones starting with SELECT are proper queries sent via an ODBC call
- The ones starting with CALL are requests from the Presentation Server to the BI Server
Summarizing all the findings, the following query should give you the list of users accessing OBIEE via either ODBC or SOAP, or using rawSQL.
SELECT DISTINCT
USER_NAME,
NVL(QUERY_SRC_CD, 'RPD ODBC') SOURCE,
TRUNC(START_TS) TS
FROM S_NQ_ACCT
WHERE
(
QUERY_SRC_CD IS NULL OR
UPPER(QUERY_SRC_CD) IN ('SOAP', 'RAWSQL')
)
AND QUERY_TEXT NOT LIKE '{CALL%'
ORDER BY 3 DESC;
You can, of course, do more than this, like analysing query volumes (the ROW_COUNT column) and the Subject Areas affected, in order to understand any potential misuse of the platform!
Real Example
Let's see an example: I'll try logging in via ODBC and executing a query. For this I'm using RazorSQL, a SQL query tool, and OBIEE; exactly the same logs can be found in Oracle Analytics Cloud (OAC) once Usage Tracking is enabled. So, administrators, don't be afraid: your job is not going extinct right now.
Small note: Usage Tracking may only be available on the non-Autonomous version of Oracle Analytics Cloud, since some parts of the setup need command line access and server configuration changes which may not be available on the Autonomous version.
Setup
First, a bit of setup: in order to connect to OAC, all you need to do is download OBIEE's Administration Tool, install it and create an ODBC connection. After this we can open RazorSQL and create a connection.
Then we need to specify our connection details, by selecting Add Connection Profile, specifying OTHER as the Connection Profile, then selecting ODBC as the Connection Type and filling in the remaining properties. Please note that:
- Datasource Name: Select the ODBC connection entry created with the Admin tool drivers
- Login/Password: Enter the OAC/OBIEE credentials
Querying and Checking the Logs
Then it's time to connect. As expected, we see in RazorSQL the list of Subject Areas, which depends on the security settings configured in WebLogic and the RPD.
The login action is not visible in the Usage Tracking S_NQ_ACCT table; it should be logged in S_NQ_INITBLOCK if you have Init Blocks associated with the login. Let's start checking the data and see what's going to happen. First of all, let's explore which tables and columns are part of the Usage Tracking Subject Area, by clicking on the + icon next to it.
The various Dims and Facts are exposed as tables by the ODBC driver. Now let's see if this action is logged in the database, with the query:
SELECT USER_NAME,
QUERY_TEXT,
QUERY_SRC_CD,
START_TS,
END_TS,
ROW_COUNT
FROM S_NQ_ACCT
We can clearly see that even checking the columns within the Measures table is logged as an ODBC call, with the column QUERY_SRC_CD as Null, as expected.
Now let's try to fire a proper SQL statement; we need to remember that the SQL we write needs to be in the Logical SQL syntax. An example can be:
select "Topic"."Repository Name" from "Usage Tracking"
Which in RazorSQL returns the row
And in the database is logged as
We can see the user who ran the query, the execution time (START_TS and END_TS) as well as the number of rows returned (ROW_COUNT).
We have demonstrated that we now have all the info necessary to start tracking any misuse of OAC/OBIEE as a datasource via ODBC connections.
Automating the Tracking
The easiest solution to properly track this type of OBIEE usage is to have an Agent that reports, on a daily basis, the users accessing OAC/OBIEE via ODBC. This solution is very easy to implement since all the Usage Tracking tables are already part of the Repository. Creating an Agent that reports on Usage Tracking rows having the QUERY_SRC_CD field as Null, SOAP or rawSQL covers all the "non-traditional" use cases we have been talking about.
As mentioned above, sourcing aggregated data from OAC/OBIEE should be considered good practice, since it provides the unique source of truth across the company. On the other side, exporting massive amounts of data should be avoided, since end-user performance will be slow and there will be an impact on the OAC/OBIEE server. Thus, setting an upper limit on the number of rows (e.g. ROW_COUNT > 100k) reported by the Agent could also identify all the specific data-export cases that should drive an impact assessment and a possible solution redesign.
Conclusion
Tools and options in the analytics market are exploding, and we'll increasingly see companies using a number of different solutions for specific purposes. Centralised BI solutions, built over the years, provide the significant advantage of containing the unique source of truth across the company, and should be preserved. Giving agility to analysts while at the same time keeping the centrality of well-defined and correctly calculated KPIs is a challenge we'll face more and more often in the future.
OAC (or OBIEE on-premises) offers the duality of both centralised and self-service analytics methods, together with a variety of connection methods (web services, ODBC, Smartview), which makes it the perfect cornerstone of a company's analytical system.
Tracking usage and discovering potential misuse of the platform is very easy, so inefficiencies can be addressed quickly, providing adequate agility and performance to all analytical business cases!
Fixing* Baseline Validation Tool** Using Network Sniffer
* Sort of
** Not exactly
In the past, Robin Moffatt wrote a number of blogs showing how to use various Linux tools for diagnosing OBIEE and getting insights into how it works (one, two, three, ...). Some time ago I faced a task which allowed me to continue Robin's series of posts and show you how to use Wireshark to understand how a certain Oracle tool works and how to search for the solution to a problem more effectively.
To be clear, this blog is not about the issue itself. I could simply write a tweet like "If you faced issue A then patch B solves it". The idea of this blog is to demonstrate how you can use somewhat unexpected tools and get things done.
Obviously, my way of doing things is not the only one. If you are good at searching My Oracle Support, you can possibly do it even faster, but what is good about my way (apart from the fact that it is mine, which is enough for me) is that it doesn't involve uneducated guessing. I make an observation and get a clarified answer.
Most of my blogs have disclaimers. This one is not an exception, although its disclaimer is rather small: there is still no silver bullet, and this won't work for every single problem in OBIEE - I never said it would.
Now, let's get started.
The Task
The problem was the following: a client was upgrading its OBIEE system from 11g to 12c and obviously wanted to test for regression, making sure that the upgraded system worked exactly the same as the old one. Manual comparison wasn't an option since they have hundreds or even thousands of analyses and dashboards, so Oracle Baseline Validation Tool (usually called just BVT) was the first candidate as a solution to automate the checks.
Using BVT is quite simple:
- Create a baseline for the old system.
- Upgrade
- Create a new baseline
- Compare them
- ???
- Profit! Congratulations. You are ready to go live.
Right? Well, almost. The problem we faced was that the BVT Dashboards plugin for 11g (a very old 11.1.1.7.something) gave exactly what was expected, but for 12c (12.2.1.something) we got all numbers with a decimal point, even though all analyses had a "no decimal point" format. So the first feeling we got at this point was that BVT doesn't work well for 12c, and that was somewhat disappointing.
SPOILER: that wasn't true.
I made a simple dashboard demonstrating the issue.
OBIEE 11g
Measure values in the XML produced by BVT are exactly as on the dashboard. Looks good.
OBIEE 12c
Dashboard looks good, but values in the XML have decimal digits.
As you can see, the analyses are the same, or at least they look very similar, but the XMLs produced by BVT aren't. From a regression point of view this dashboard should get a "DASHBOARDS PASSED" result, but it got "DASHBOARDS DIFFERENT".
Reading the documentation gave us no clear explanation for this behaviour. We had to go deeper and understand what actually caused it. Is it BVT screwing up the data it gets from 12c? Well, that is a highly improbable theory: the decimals were not simply present in the result, they were correct. Correct as in "the same as stored in the database", so we had to reject this theory.
Or maybe the problem is that BVT works differently with 11g and 12c? Well, this looks more plausible. A few years have passed since 11.1.1.7 was released, and it would not be too surprising if the old version and the modern one had different APIs used by BVT, causing this problem. Or maybe the problem is that 12c itself ignores formatting settings. Let's find out.
The Tool
Neither BVT nor the OBIEE logs gave us any insights. From every point of view, everything was working fine; except that we were getting a 100% mismatch between the source and the target. My hypothesis was that BVT worked differently with OBIEE 11g and 12c. How could I check this? Decompiling the tool and reading its code would possibly give me the answer, but it is not legal. And even if it were legal, the latest BVT is more than 160 megabytes in size, which would give an insane amount of code to read, especially considering that I don't actually know what I'm looking for. Not an option. But BVT talks to OBIEE via the network, right? Therefore we can intercept the network traffic and read it. Shall we?
There are a lot of ways to do it. I work with OBIEE quite a lot and Windows is the obvious choice for my platform, and hence the obvious tool for me was Wireshark.
Wireshark is the world’s foremost and widely-used network protocol analyzer. It lets you see what’s happening on your network at a microscopic level and is the de facto (and often de jure) standard across many commercial and non-profit enterprises, government agencies, and educational institutions. Wireshark development thrives thanks to the volunteer contributions of networking experts around the globe and is the continuation of a project started by Gerald Combs in 1998.
What this "About" doesn't say is that Wireshark is open-source and free. Which is quite nice I think.
Installation Details
I'm not going to go into too many details about the installation process. It is quite simple and straightforward. Keep all the defaults unless you know what you are doing, reboot if asked and you are fine.
If you've never used Wireshark or analogues, the main question would be "Where to install it?". The answer is pretty simple - install it on your workstation, the same workstation where BVT is installed. We're going to intercept our own traffic, not someone else's.
A Bit of Wireshark
Before going to the task we want to solve let's spend some time familiarizing with Wireshark. Its starting screen shows all the network adapters I have on my machine. The one I'm using to connect to the OBIEE servers is "WiFi 2".
I double-click it and immediately see a constant flow of network packets flying back and forth between my computer, local network machines and the Internet. It's a bit hard to see any particular server in this stream. And "a bit hard" is quite an understatement; to be honest, it is impossible.
I need a filter. For example, I know that my OBIEE 12c instance IP is 192.168.1.226, so I add an ip.addr==192.168.1.226 filter saying that I only want to see traffic to or from this machine. There is nothing to see right now, but if I open the login page in a browser, for example, I can see the traffic between my machine (192.168.1.25) and the server. It is much better now, but still not perfect.
If I add http to the filter, like this: http and ip.addr==192.168.1.226, I get a much clearer view.
For example, here I opened the http://192.168.1.226:9502/analytics page just like any other user would do. There are quite a lot of requests and responses. The browser asked for the /analytics URL; the server, after a few redirects, replied that the actual address for this URL is the login.jsp page; then the browser requested the /bi-security-login/login.jsp page using the GET method and got the response with HTTP code 200. Code 200 shows that there were no issues with the request.
Let's try to log in.
The top window is a normal browser and the bottom one is Wireshark. Note that my credentials were sent in clear text, which I think is a very good argument in defence of using HTTPS everywhere.
That is a very basic use of Wireshark: start monitoring, do something, see what was captured. I have barely scratched the surface of what Wireshark can do, but that is enough for my task.
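If you ever want to script this kind of capture rather than clicking through the GUI, the same display filters can be used programmatically. The Python sketch below uses the pyshark wrapper around tshark; the interface name and the server IP are placeholders from the examples above.

# Sketch: applying the same Wireshark display filter programmatically.
# Uses pyshark (a wrapper around tshark, which must be installed); the
# interface name and the OBIEE server IP are placeholders.
import pyshark

capture = pyshark.LiveCapture(
    interface="WiFi 2",
    display_filter="http and ip.addr==192.168.1.226")

# Print a one-line summary of each captured HTTP packet
for packet in capture.sniff_continuously(packet_count=20):
    print(packet.ip.src, "->", packet.ip.dst, packet.highest_layer)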
Wireshark and BVT 12c
The idea is quite simple: I start capturing my traffic, then use BVT as usual and see how it works with 12c, and then how it works with 11g. This should give me the answer I need.
Let's see how it works with 12c first. To make things simpler, I created a catalogue folder with just one analysis placed on a dashboard.
It's time to run BVT and see what happens.
Here is the dataset I got from OBIEE 12c. I slightly edited and formatted it to make it easier to read, but didn't change anything important.
What did BVT do to get this result? What API did it use? Let's look at Wireshark.
The first three lines are the same as with a browser. I don't know why BVT needs them, but I don't mind. Then BVT gets the WSDL from OBIEE (GET /analytics-ws/saw.dll/wsdl/v6/private). There are multiple pairs of similar requests and responses flying back and forth because the WSDL is big enough to be downloaded in chunks. A purely technical thing; nothing strange or important here.
But now we know what API BVT uses to get data from OBIEE. I don't think anyone is surprised that it is the Web Services API. Let's take a look at the Web Services calls.
First comes the logon method from nQSessionService. It logs into OBIEE and starts a session.
The next requests get catalogue item descriptions for the objects in my /shared/BVT folder. We can see a set of calls to webCatalogService methods. These calls read my web catalogue structure: all folders, subfolders, dashboards and analyses. Pretty simple; nothing really interesting or unexpected here.
Then we can see how BVT uses generateReportSQLResult from reportService to get the logical SQL for the analysis, and gets the analysis' logical SQL as the response.
And the final step: BVT executes this SQL and gets the data. Unfortunately, it is hard to show the data in a screenshot, but the line starting with [truncated] is the XML I showed before.
And that's all. That is how BVT gets data from OBIEE.
I did the same for 11g and saw absolutely the same procedure.
My initial theory that BVT may have been using different APIs for 11g and 12c was busted.
From my experiment, I found out that BVT uses xmlViewService to actually get the data, and I also now know that it uses logical SQL for getting it. Looking at the documentation, I can see that xmlViewService has no options related to any formatting. It is a purely data-retrieval service: it can't preserve any formatting and is supposed to give only the data. But hey, I started with the statement "11g preserves formatting"; how is that possible? Well, that was a simple coincidence. It doesn't.
In the beginning, I had very little understanding of what keywords to use on MoS to solve the issue. "BVT for 12c doesn't preserve formatting"? "BVT decimal part settings"? "BVT works differently for 11g and 12c"? Now I have something much better - "executeSQLQuery decimal". 30 seconds of searching and I know the answer.
This was fixed in 11.1.1.9, but there is a patch for 11.1.1.7.some_of_them. The patch fixes an 11g issue which prevents BVT from getting decimal parts of numbers.
As you may have noticed, I had no chance of finding this using my initial problem description: neither BVT, nor 12c, nor 11.1.1.7 was mentioned. The thread looks completely unrelated to the issue; I had zero chance of finding it that way.
Conclusion
OBIEE is complex software and solving issues with it is not always easy. Unfortunately, no single method is enough for solving all problems. Usually, log files will help you, but when something works, just not the way you expect, log files can be useless. In my case BVT was working fine, 11g was working fine, and 12c was working fine too; nothing special was being written to the logs. That is why sometimes you may need unexpected tools. Just like this. Thanks for reading!
Looker for OBIEE Experts: Introduction and Concepts
Recently I've been doing some personal study around various areas including streaming, machine learning and data visualization, and one of the tools that got my attention is Looker. I initially heard about Looker from a Drill to Detail podcast, and have increasingly been hearing about it at conferences and in use cases together with other cloud solutions like BigQuery, Snowflake and Fivetran.
I decided to give it a try myself and, since most of my career has been based on Oracle Business Intelligence (OBI), to write down a comparison between the tools that could help others with my background get introduced to Looker.
OBIEE's Golden Feature: The Semantic Model
As you probably know if you have been working with OBIEE for some time, the centrepiece of its architecture is the Semantic Model contained in the Repository (RPD).
In the three layers of the RPD, we model our source data (e.g. database tables) into attributes, metrics and hierarchies which can then be easily dragged and dropped by the end user in an analysis or data visualization.
I called the RPD "OBIEE's Golden Feature" because to me it's the main benefit of the platform: abstracting the data complexity from end users and, at the same time, optimizing the query definition to take advantage of all the features that could be set in the datasource. The importance of the RPD also lies in its centrality: within traditional OBIEE, all Analyses and Dashboards had to be based on Subject Areas exposed by the RPD, meaning that the definition of the metrics was done in a unique place, in a consistent manner, and then spread across all the reporting, providing the unique source of truth for the important KPIs in the company - typical of what Gartner calls Mode 1 Analytics.
RPD Development Speed Limitation and Mode 2 Analytics
The RPD is a centralized binary object within the OBIEE infrastructure: in order to develop and test it, a full OBIEE instance is required, and merges between different development streams are natively performed via the RPD's Admin Tool.
This complexity, combined with the deep knowledge required to correctly build a valid semantic model, limits the number of people able to create and publish new content, thus slowing down the process from data to insights, typical of the centralized Mode 1 analytics platform provided by IT teams. Moreover, RPD development is entirely point-and-click within the Admin Tool, which feels somewhat slow and old-fashioned in a world of scripting, code versioning and git merging. Several solutions are out in the market (including the Rittman Mead Developer Toolkit) to enhance the agility of the development, but still, the skills and the toolset required to develop new content make it a purely IT-managed solution.
In order to overcome this limitation, several tools like Tableau, QlikView or Oracle's Data Visualization (included in OAC or in the Desktop version) put all the power in the hands of the end user: from data sources to graphing, the tools allow an end-to-end data discovery to visualization journey. The problem with those tools (called Mode 2 Analytics by Gartner) is that there is no central definition of the KPIs, since that is left to every analyst. All those tools are addressing the problem by providing some sort of datasource certification, allowing a datasource to be visible and reusable publicly only when it's validated centrally. Again, for most of those tools the modelling is done in a visual format, which makes it difficult to debug, version control and automate. I've been speaking about this subject in my presentation "DevOps and OBIEE: do it before it's too late".
What if we could provide the same centralized source-of-truth data modelling with an easily scriptable syntax that can be developed by business users without any deep knowledge of SQL or source tables? Well, what we just described is LookML!
LookML
LookML takes the best part of OBIEE, the idea of a modelling layer, and democratizes it in order to make it available to all business users, with a simple language and set of concepts. Moreover, code versioning is embedded in the tool, so there's no need to teach git branch, commit, push or pull to non-IT people.
So, what are the concepts behind LookML and how can you get familiar with it, coming from metadata modelling in the RPD?
LookML Concepts
Let's start from the basics of RPD modelling: a database table. In LookML each table is represented by an object called a View (the naming is a bit confusing). Moreover, LookML Views can be used not only to map existing database tables but also to create new tables based on existing content and a SQL definition, like the opaque views in OBIEE. On top of this, LookML allows the physicalization of those objects (into a table) and the definition of a schedule for the refresh. This concept is very useful when aggregates are needed: the aggregate definition (SQL) is defined within the LookML View together with the related refresh schedule.
The View itself defines only the source, a bit like the RPD's Physical Layer; the next step is defining how multiple Views interact with each other or, in OBIEE terms, the Business Layer. In LookML there is an entity called an Explore, which is the place where we define which Views we want to group together and what the linkage between them is. Multiple Explores are defined in a Model, which should be unique per database. So, in OBIEE words, a Model can be compared to a Business Model, with Explores being a subset of facts and dimensions grouped in a Subject Area.
Ok, all "easy" so far, but where do we map the columns? and where do we set the aggregations? As you might expect both are mapped within a LookML View into Fields. Fields is a generic term which includes in both metrics and attributes, LookML naming is the below:
- Dimension: in OBIEE's terms, an attribute of a dimension. The terminology is confusing since in LookML a Dimension is the column itself, while in OBIEE terms it is the table. A Dimension can be a column value or a combination of multiple values (like OBIEE's Business Model Logical Source formulas). A Dimension in LookML can't have any aggregation (as in OBIEE).
- Measure: in OBIEE's terms, a metric. The definition includes the source formula in SQL syntax, the type of aggregation (min/max/count...) and the drill fields.
- Filters: this is not something usually defined in OBIEE's RPD. Filters are a way of passing a user choice based on a column value back to an RPD calculation formula - a bit like, for the OBIEE experts, overriding session variables with dashboard prompt values.
- Parameters: again, this is not something usually defined in OBIEE's RPD. You can think of a Parameter as a way of setting up a variable function. E.g. a Parameter with values SUM, AVG, MIN, MAX could be used to change how a certain Measure is aggregated.
All good so far? Stick with me and in the future we'll explore more about LookML syntax and Looker in general!
ChitChat for OBIEE – Now Available as Open Source!
ChitChat is the Rittman Mead commentary tool for OBIEE. ChitChat enhances the BI experience by bridging conversational capabilities into the BI dashboard, increasing ease-of-use and seamlessly joining current workflows. From tracking the history behind analytical results to commenting on specific reports, ChitChat provides a multi-tiered platform built into the BI dashboard that creates a more collaborative and dynamic environment for discussion.
Today we're pleased to announce the release into open-source of ChitChat! You can find the github repository here: https://github.com/RittmanMead/ChitChat
Highlights of the features that ChitChat provides include:
Annotate - ChitChat's multi-tiered annotation capabilities allow BI users to leave comments where they belong, at the source of the conversation inside the BI ecosystem.
Document - ChitChat introduces the ability to include documentation inside your BI environment for when you need more than a comment. Keeping key materials contained inside the dashboard gives the right people access to key information without searching.
Share - ChitChat allows you to bring attention to important information on the dashboard using the channel or workflow manager you prefer.
Verified Compatibility - ChitChat has been tested against popular browsers, operating systems, and database platforms for maximum compatibility.
Getting Started
In order to use ChitChat you will need OBIEE 11.1.1.7.x, 11.1.1.9.x or 12.2.1.x.
First, download the application and unzip it to a conveniently accessible location on the OBIEE server, such as a home directory or the desktop.
See the Installation Guide for full detail on how to install ChitChat.
Database Setup
Build the required database tables using the installer:
cd /home/federico/ChitChatInstaller
java -jar SocializeInstaller.jar -Method:BuildDatabase -DatabasePath:/app/oracle/oradata/ORCLDB/ORCLPDB1/ -JDBC:"jdbc:oracle:thin:@192.168.0.2:1521/ORCLPDB1" -DatabaseUser:"sys as sysdba" -DatabasePassword:password -NewDBUserPassword:password1
The installer will create a new user (RMREP) and the tables required for the application to operate correctly. The flags are as follows:
- -DatabasePath tells the installer where to place the datafiles for ChitChat on your database server.
- -JDBC indicates which JDBC driver to use, followed by a colon and the JDBC string to connect to your database.
- -DatabaseUser specifies the user to access the database with.
- -DatabasePassword specifies the password for the user previously given.
- -NewDBUserPassword indicates the password for the new user (RMREP) being created.
WebLogic Data Source Setup
Add a Data Source object to WebLogic using WLST:
cd /home/federico/ChitChatInstaller/jndiInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./create-ds.py
To use this script, modify the ds.properties file using the method of your choice. The following parameters must be updated to reflect your installation: domain.name, admin.url, admin.userName, admin.password, datasource.target, datasource.url and datasource.password.
Deploying the Application on WebLogic
Deploy the application to WebLogic using WLST:
cd /home/federico/ChitChatInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./deploySocialize.py
To use this script, modify the deploySocialize.py file using the method of your choice. The first line must be updated with the username, password and URL to connect to your WebLogic Server instance. The second parameter of the deploy command must be updated to reflect your ChitChat access location.
Configuring the Application
ChitChat requires several configuration parameters to allow the application to operate successfully. To change the configuration, you must log in to the database schema as the RMREP user and update the values manually in the APPLICATION_CONSTANT table.
See the Installation Guide for full detail on the available configuration and integration options.
Enabling the Application
To use ChitChat, you must add a small block of code on any given dashboard (in a new column on the right side of the dashboard) where you want the application enabled:
<rm id="socializePageParams"
user="@{biServer.variables['NQ_SESSION.USER']}"
tab="@{dashboard.currentPage.name}"
page="@{dashboard.name}">
</rm>
<script src="/Socialize/js/dashboard.js"></script>
Congratulations! You have successfully installed the Rittman Mead commentary tool. To use the application to its fullest capabilities, please refer to the User Guide.
Problems?
Please raise any issues on the github issue tracker. This is open source, so bear in mind that it's no-one's "job" to maintain the code - it's open to the community to use, benefit from, and maintain.
If you'd like specific help with an implementation, Rittman Mead would be delighted to assist - please do get in touch with Jon Mead or DM us on Twitter @rittmanmead to get access to our Slack channel for support about ChitChat.
Please contact us on the same channels to request a demo.