Tag Archives: Oracle BI Suite EE
Fixing* Baseline Validation Tool** Using Network Sniffer
* Sort of
** Not exactly
In the past, Robin Moffatt wrote a number of blogs showing how to use various Linux tools for diagnosing OBIEE and getting insights into how it works (one, two, three, ...). Some time ago I faced a task which allowed me to continue Robin's cycle of posts and show you how to use Wireshark to understand how a certain Oracle tool works and how to search for the solution of a problem more effectively.
To be clear, this blog is not about the issue itself. I could simply write a tweet like "If you faced issue A then patch B solves it". The idea of this blog is to demonstrate how you can use somewhat unexpected tools and get things done.
Obviously, my way of doing things is not the only one. If you are good in searching at My Oracle Support, you possibly can do it even faster, but what is good about my way (except for it is mine, which is enough for me) is that it doesn't involve uneducated guessing. I do an observation and get a clarified answer.
Most of my blogs have disclaimers. This one is not an exception, while its disclaimer is rather small. There is still no silver bullet. This won't work for every single problem in OBIEE. I didn't say this.
Now, let's get started.
The Task
The problem was the following: a client was upgrading its OBIEE system from 11g to 12c and obviously wanted to test for regression, making sure that the upgraded system worked exactly the same as the old one. Manual comparison wasn't an option since they have hundreds or even thousands of analyses and dashboards, so Oracle Baseline Validation Tool (usually called just BVT) was the first candidate as a solution to automate the checks.
Using BVT is quite simple:
- Create a baseline for the old system.
- Upgrade
- Create a new baseline
- Compare them
- ???
- Profit! Congratulations. You are ready to go live.
Right? Well, almost. The problem that we faced was that BVT Dashboards plugin for 11g (a very old 11.1.1.7.something) gave exactly what was expected. But for 12c (12.2.1.something) we got all numbers with a decimal point even while all analyses had "no decimal point" format. So the first feeling we got at this point was that BVT doesn't work well for 12c and that was somewhat disappointing.
SPOILER
That wasn't true.I made a simple dashboard demonstrating the issue.
OBIEE 11g
Measure values in the XML produced by BVT are exactly as on the dashboard. Looks good.
OBIEE 12c
Dashboard looks good, but values in the XML have decimal digits.
As you can see, the analyses are the same or at least they look very similar but the XMLs produced by BVT aren't. From regression point of view this dashboard must get "DASHBOARDS PASSED" result, but it got "DASHBOARDS DIFFERENT".
Reading the documentation gave us no clear explanation for this behaviour. We had to go deeper and understand what actually caused it. Is it BVT screwing up the data it gets from 12c? Well, that is a highly improbable theory. Decimals were not simply present in the result but they were correct. Correct as in "the same as stored in the database", we had to reject this theory.
Or maybe the problem is that BVT works differently with 11g and 12c? Well, this looks more plausible. A few years have passed since 11.1.1.7 was released and it would not be too surprising if the old version and the modern one had different APIs used by BVT and causing this problem. Or maybe the problem is that 12c itself ignores formatting settings. Let's find out.
The Tool
Neither BVT, nor OBIEE logs gave us any insights. From every point of view, everything was working fine. Except that we were getting 100% mismatch between the source and the target. My hypothesis was that BVT worked differently with OBIEE 11g and 12c. How can I check this? Decompiling the tool and reading its code would possibly give me the answer, but it is not legal. And even if it was legal, the latest BVT size is more than 160 megabytes which would give an insane amount of code to read, especially considering the fact I don't actually know what I'm looking for. Not an option. But BVT talks to OBIEE via the network, right? Therefore we can intercept the network traffic and read it. Shall we?
There are a lot of ways to do it. I work with OBIEE quite a lot and Windows is the obvious choice for my platform. And hence the obvious tool for me was Wireshark.
Wireshark is the world’s foremost and widely-used network protocol analyzer. It lets you see what’s happening on your network at a microscopic level and is the de facto (and often de jure) standard across many commercial and non-profit enterprises, government agencies, and educational institutions. Wireshark development thrives thanks to the volunteer contributions of networking experts around the globe and is the continuation of a project started by Gerald Combs in 1998.
What this "About" doesn't say is that Wireshark is open-source and free. Which is quite nice I think.
Installation Details
I'm not going to go into too many details about the installation process. It is quite simple and straightforward. Keep all the defaults unless you know what you are doing, reboot if asked and you are fine.
If you've never used Wireshark or analogues, the main question would be "Where to install it?". The answer is pretty simple - install it on your workstation, the same workstation where BVT is installed. We're going to intercept our own traffic, not someone else's.
A Bit of Wireshark
Before going to the task we want to solve let's spend some time familiarizing with Wireshark. Its starting screen shows all the network adapters I have on my machine. The one I'm using to connect to the OBIEE servers is "WiFi 2".
I double-click it and immediately see a constant flow of network packets flying back and forth between my computer and local network machines and the Internet. It's a bit hard to see any particular server in this stream. And "a bit hard" is quite an understatement, to be honest, it is impossible.
I need a filter. For example, I know that my OBIEE 12c instance IP is 192.168.1.226
. So I add ip.addr==192.168.1.226
filter saying that I only want to see traffic to or from this machine. Nothing to see right now, but if I open the login page in a browser, for example, I can see traffic between my machine (192.168.1.25
) and the server. It is much better now but still not perfect.
If I add http
to the filter like this http and ip.addr==192.168.1.226
, I definitely can get a much more clear view.
For example, here I opened http://192.168.1.226:9502/analytics
page just like any other user would do. There are quite a lot of requests and responses. The browser asked for /analytics
URL, the server after a few redirects replied what the actual address for this URL is login.jsp
page, then browser requested /bi-security-login/login.jsp
page using GET
method and got the with HTTP code 200
. Code 200
shows that there were no issues with the request.
Let's try to log in.
The top window is a normal browser and the bottom one is Wireshark. Note that my credentials been sent via clear text and I think that is a very good argument in defence of using HTTPS everywhere.
That is a very basic use of Wireshark: start monitoring, do something, see what was captured. I barely scratched the surface of what Wireshark can do, but that is enough for my task.
Wireshark and BVT 12c
The idea is quite simple. I should start capturing my traffic then use BVT as usual and see how it works with 12c and then how it works with 11g. This should give me the answer I need.
Let's see how it works with 12c first. To make things more simple I created a catalogue folder with just one analysis placed on a dashboard.
It's time to run BVT and see what happens.
Here is the dataset I got from OBIEE 12c. I slightly edited and formatted it to make easier to read, but didn't change anything important.
What did BVT do to get this result? What API did it use? Let's look at Wireshark.
First three lines are the same as with a browser. I don't know why it is needed for BVT, but I don't mind. Then BVT gets WSDL from OBIEE (GET /analytics-ws/saw.dll/wsdl/v6/private
). There are multiple pairs of similar query-response flying back and forth because WSDL is big enough and downloaded in chunks. A purely technical thing, nothing strange or important here.
But now we know what API BVT uses to get data from OBIEE. I don't think anyone is surprised that it is Web Services API. Let's take a look at Web Services calls.
First logon
method from nQSessionService
. It logs into OBIEE and starts a session.
Next requests get catalogue items descriptions for objects in my /shared/BVT
folder. We can see a set of calls to webCatalogServce
methods. These calls are reading my web catalogue structure: all folders, subfolders, dashboard and analysis. Pretty simple, nothing really interesting or unexpected here.
Then we can see how BVT uses generateReportSQLResult
from reportService
to get logical SQL for the analysis.
And gets analysis' logical SQL as the response.
And the final step - BVT executes this SQL and gets the data. Unfortunately, it is hard to show the data on a screenshot, but the line starting with [truncated]
is the XML I showed before.
And that's all. That's is how BVT gets data from OBIEE.
I did the same for 11g and saw absolutely the same procedure.
My initial theory that BVT may have been using different APIs for 11g and 12c was busted.
From my experiment, I found out that BVT used xmlViewService
to actually get the data. And also I know now that it uses logical SQL for getting the data. Looking at the documentation I can see that xmlViewService
has no options related to any formatting. It is a purely data-retrieval service. It can't preserve any formatting and supposed to give only the data. But hey, I've started with the statement "11g preserves formatting", how is that possible? Well, that was a simple coincidence. It doesn't.
In the beginning, I had very little understanding of what keywords to use on MoS to solve the issue. "BVT for 12c doesn't preserve formatting"? "BVT decimal part settings"? "BVT works differently for 11g and 12c"? Now I have something much better - "executeSQLQuery decimal". 30 seconds of searching and I know the answer.
This was fixed in 11.1.1.9, but there is a patch for 11.1.1.7.some_of_them. The patch fixes an 11g issue which prevents BVT from getting decimal parts of numbers.
As you may have noticed I had no chance of finding this using my initial problem description. Nether BVT, nor 12g or 11.1.1.7 were mentioned. This thread looks completely unrelated to the issue, I had zero chances to find it.
Conlusion
OBIEE is a complex software and solving issues is not always easy. Unfortunately, no single method is enough for solving all problems. Usually, log files will help you. But when something works but not the way you expect, log files can be useless. In my case BVT was working fine, 11g was working fine, 12c was working fine too. Nothing special to write to logs was happening. That is why sometimes you may need unexpected tools. Just like this. Thanks for reading!
Enabling Concurrent OBIEE RPD Development – for free
One of the most common and long standing problems with developing in OBIEE is the problem of multiple developers working on the RPD at the same time. This blog explains the problem and explores the solution that we’ve developed and have been successfully using at clients over the last couple of years. We’re pleased to announce the immediate availability of the supporting tools, as part of the Rittman Mead Open Source Project.
Before we get into the detail, I'll first explain a bit about the background to the requirement and the options that ship with OBIEE.
Why Concurrent Development
The benefits of concurrent development are obvious: scalability and flexibility of development. It enables you to scale your development team to meet the delivery demands of the business. The challenge is to manage all of the concurrent work and enable releases in a flexible manner - which is where source control comes in.
We couldn't possibly attempt to manage concurrent development on something as complex as the RPD without good version control in place. Source control (A.K.A. version control/revision control) systems like Git and Apache Subversion (SVN) are designed to track and retain all of the changes made to a code base so that you can easily backtrack to points in time where the code was behaving as expected. It tracks what people changed, who changed it, when they changed it and even why they made that change (via commit messages). They also possess merge algorithms that can automatically combine changes made to text files, as long as there are no direct conflicts on the same lines. Then there's added benefits with code branching and tagging for releases. All of this leads to quicker and more reliable development cycles, no matter what the project, so good in fact that I rely on it even when working as one developer. To (mis)quote StackOverflow, "A civilised tool for a civilised age".
All of these techniques are about reducing the risk during the development process, and saving time. Time spent developing, time spent fixing bugs, spent communicating, testing, migrating, deploying and just about every IT activity under the sun. Time that could be better spent elsewhere.
Out of the Box
Oracle provide two ways to tackle this problem in the software:
- Online Check in/out
- Multi-User Development (MUD)
However I believe that neither of these are sufficient for high standards of reliable development and releases - the reasons for which I explore below (and have been described previously). Additionally it is not possible to natively and fully integrate with version control for the RPD, which again presents a significant problem for reliable development cycles.
Firstly the online check-in and check-out system does, by design, force all development to be conducted online. This in itself is not an issue for a single developer, and is in fact a practice that we advocate for ‘sandbox’ development in isolation. However, as soon as there is more than one developer on the same server it reduces development flexibility. Two developers cannot develop their solutions in isolation and can be made to wait for assets they want to modify to be unlocked by other developers. This may be workable for a small amount of developers but does not scale well. Furthermore, the risk of losing work is much higher when working online; we've all seen the infamous "Transaction Update Failed" message when saving online. This is usually because of an inconsistency in the RPD but can be caused by less obvious reasons and usually leads to repeating some redundant work. Lastly, very large RPDs like those from BI Apps or very mature enterprise deployments pose a problem when working online. They cause the Admin Tool to work very slowly because of the calls it has to make to the server, which can be frustrating for developers. To be clear, I am certainly not advocating developing without testing your code, but given the speed of uploading an RPD to the server and the fact that it can be automated, in my experience it is far more efficient to develop offline and upload frequently for testing.
The MUD system is much better and is quite close in methodology to what we recommend in this guide. The premise works on having a production quality master RPD and then having many other individual developers with their own RPDs. The check-in and check-out system will automatically handle three-way merges to and from the master RPD when development changes are made. This is good in theory but has been maligned for years when used in practice. The version control system in MUDE is not as robust as Git or SVN for example and the conflict resolution relies on developers managing their own issues, without the ability for a source master to intervene. Ultimately there is little flexibility in this method, which makes it difficult to use in the real world.
Source Controlling the RPD
Source control is another problem as the RPD is a binary file which cannot be merged or analysed by text-based comparison tools like SVN or Git. A potential solution at one point seemed to be MDS XML, a structured, textual representation of the RPD. However, this also seemed to have some drawbacks when put into practice. Whilst some use MDS XML with great success and there are tools on the market that rely on this system, here at Rittman Mead we’ve found that there are significant risks and issues with it. We’ve come up with what we believe is a robust, scalable, and flexible approach, based around the binary RPD.
The Rittman Mead Solution to Concurrent OBIEE RPD Development
Successful development lifecycles comes down to implementation of the correct process and ensuring it is as quick and reliable as possible. Tools, like the ones described in this blog, can be used to help in both of those areas but are not a substitute for detailed knowledge of the processes and the product. A key feature of this approach is the Source Master who owns and is responsible for the overall development process. They will have a detailed understanding of the method and tools, as well as the specifics of the current and future work being done on the RPD. Things will go wrong, it is as inevitable as death and taxes - the key is to minimise the impact and frequency of these events.
The solution is based on the Gitflow method, which is one of the most established development models. The model is based on a few major concepts:
- Features - Specific items of development, these are begun from the development branch and then merged back into development when complete.
- Develop/Master Branches - Two branches of code, one representing the development stream, the other the production code.
- Releases - A branch taken from development that is then eventually merged into production. This is the mechanism for getting the development stream into production.
I highly recommend reading that blog and this cheatsheet as they explains the method excellently and what we've done here is support that model using binary RPDs and the 3-way merge facility in OBIEE. Also of relevance is this Rittman Mead blog which describes some of the techniques we're explaining here. We've open sourced some command line tools (written in Python) to ease and automate the process. You can download the code from the GitHub repository and need only an install of Python 2.7 and the OBIEE client to get started. The tooling works with both git and Subversion (SVN). We recommend the use of git, but realise that SVN is often embedded at organisations and so support that too.
Scenario
This section shows a simple example of how you might use this methodology for multiple developers to work on the RPD in a reliable way. Many of the screenshots below show SourceTree, a GUI for Git which I'm a fan for both its UI and GitFlow support.
We have two developers in our team, Basil and Manuel, who both want to work on the RPD and add in their own changes. They already have an RPD they've made and are using with OBIEE, named base.rpd
. First they initialise a Git repository, committing a copy of their RPD (base.rpd
).
The production branch is called master
and the development branch develop
, following the standard naming convention for GitFlow.
Before we get started, let's a take a look at the RPD we're working with:
Simple Concurrent Development
Now Basil and Manuel both independently start features F01
and F02
respectively:
python obi-merge-git.py startFeature F01
Each developer is going to add a measure column for Gross Domestic Product (GDP) to the logical fact table, but in different currencies. Basil adds "GDP (GBP)" as a logical column and commits it to his development branch, F01
. Manuel does the same on his, adding "GDP (USD)" as a logical column and committing it to F02
.
Now Basil finishes his feature, which merges his work back into the develop
branch.
This doesn't require any work, as it's the first change to development to occur.
python obi-merge-git.py finishFeature F01
Checking out develop...
Already on 'develop'
Merging feature/F01 into develop...
Successfully merged feature/F01 to the develop branch.
When Manuel does the same, there is some extra work to do. To explain what's happening we need to look at the 3 merge candidates in play, using the terminology of OBIEE’s 3-way merge functionality:
- Original: This is the state of the development repository from when the feature was created.
- Modified: This is your repository at the time of finishing the feature.
- Current: This is the state of the development repository at the time of finishing the feature.
When Basil completed F01, the original and current RPDs were the same, so it could just be overridden with the new RPD. However now, the Original and Current RPDs are different, so we need to resolve the changes. Our RPDs are binary files and so we need to use the 3-way merge from the Admin Tool. The python script wrapped around this process uses Git’s metadata to determine the appropriate merge candidates for invoking the OBIEE 3-way merge.
Since our changes do not conflict so this can happen automatically without user intervention. This is one of the critical differences from doing the same process in MDS XML, which would have thrown a git merge conflict (two changes to the same Logical Table, and thus same MDS XML file) requiring user intervention.
python obi-merge-git.py finishFeature F02
Checking out develop...
Already on 'develop'
Merging feature/F02 into develop...
warning: Cannot merge binary files: base.rpd (HEAD vs. feature/F02)
Creating patch...
Patch created successfully.
Patching RPD...
RPD patched successfully.
RPD Merge complete.
Successfully merged feature/F02 to the develop branch.
In the background the script uses the comparerpd
and patchrpd
OBIEE commands.
Release
Now our development branch has both features in, which we can see using the Admin Tool:
To get this into production we can start a release process:
python obi-merge-git.py startRelease v1.00
This creates a new branch from develop
that we can use to apply bug fixes if we need to. Any changes made to the release now will be applied back into development when the release is complete as well as being merged into the production branch. The developers realise they have forgotten to put the new columns in the presentation layer, so they do it now in the release branch as a bugfix. In GitFlow, bugfixes are last minute changes that need to be made for a release but do not interfere with the next development cycle, which may have already begun (in the develop
branch) by the time the bug was spotted. The changes are merged back to develop
as well as master
so the fix isn't lost in the next cycle.
This is committed to the repo and then the release is finished:
python obi-merge-git.py finishRelease v1.00
After the release we can see that the master
and develop
branches are at the same commit point, with a tag of the release name added in too. Additionally we can switch to the develop
and master
branches and see all of the changes including the columns in the presentation layer. The full commit history of course remains if we want to roll back to other RPDs.
Conflicted Development
Basil and Manuel start their new features, F03 and F04 respectively. This time they’re working on the same existing column - something that a “Source Master” should have helped avoid, but missed this time. Basil edits the column formula of the "Area" column and renames it to "Area (sqm)"" and Manuel does the same, naming his column "Area (sqFt)".
They both commit the changes to their own feature branches and Manuel merges his back to development with no problem.
python obi-merge-git.py finishFeature F04
However when Basil tries to finish his feature the obvious conflict occurs, as the automatic merge cannot resolve without some human intervention since it is the same object in the RPD affected by both changes. At this point, the script will open up the current RPD in the Admin Tool and tell Basil to merge his changes manually in the tool, going through the usual conflict resolution process. The script provides 3 RPDs to make the RPD choosing step unambiguous:
original.rpd
modified.rpd
current.rpd
(Opened)
python obi-merge-git.py finishFeature F03
Checking out develop...
Already on 'develop'
Merging feature/F03 into develop...
warning: Cannot merge binary files: base.rpd (HEAD vs. feature/F03)
Creating patch...
Patch created successfully.
Patching RPD...
Failed to patch RPD. See C:\Users\Administrator\Documents\obi-concurrent-develop\patch_rpd.log for details.
Conflicts detected. Can resolve manually using the Admin Tool.
Original RPD: C:\\Users\\Administrator\\Documents\\rpd-test\a.rpd (original.rpd)
Current RPD: C:\\Users\\Administrator\\Documents\\rpd-test\c.rpd (Opened)
Modified RPD: C:\\Users\\Administrator\\Documents\\rpd-test\b.rpd (modified.rpd)
Perform a full repository merge using the Admin Tool and keep the output name as the default or C:\\Users\\Administrator\\Documents\\rpd-test\base.rpd
Will open RPD using the Admin Tool.
Press Enter key to continue.
You must close the AdminTool after completing the merge manually in order for this script to continue.
When Basil hits a key, the Admin Tool opens up, and from here he needs to manually initiate the merge and specify the merge candidates. This is made easy by the script which automatically names them appropriately:
Note that the a
, b
and c
RPDs are part of the merge logic with Git and can be ignored here.
Basil assigns the original
and modified
RPDs to the correct parts of the wizard and then resolves the conflict (choosing his change) in the next step of the wizard.
Upon closing the Admin Tool, the Git merge to the develop
branch is automatically completed.
Now when they look in the development RPD they can see the column named as "Area (sqm)", having accepted Basil's change. Of course this is a trivial example, but because the method relies on using the Admin Tool, it will be just as reliable as a manual 3-way merge you would perform in OBIEE.
In my experience, most of the problems with 3-way merging is that developers get confused as to which candidates to choose or they lose track of a true original point from when both developers started working. Using this method eliminates both of the those problems, with the added benefit of tight integration into source control. Even with an easier interface to the 3-way merge process, developers and/or the Source Master should be aware of some of the ‘features’ of OBIEE’s 3-way merge. For example, if a change has occurred on the physical layer which does not have any representations at all in the business or presentation layers, it may be lost during a 3-way merge. Another is that the merge rules are not guaranteed to stay the same between OBIEE versions, which means that we cannot be certain our development lifecycle is stable after patching or upgrading OBIEE.
So given this, and as a general core tenet of good software development practice, you should be automatically testing your RPDs after the merge and before release.
Testing the RPD
There are still issues with OBIEE RPD merging that aren't rectified by the 3-way merge and so must be handled manually if and when they occur. One such example is that if a change has occurred on the physical layer which does not have any representations at all in the business or presentation layers, it may be lost during a 3-way merge. Another problem is that the merge rules are not guaranteed to stay the same between OBIEE versions, which means that we cannot be certain our development lifecycle is stable after patching or upgrading OBIEE. Another thing I don't really like is the inherent bias the merge process has toward the modified RPD, instead of treating the modified and current RPDs equally. The merge candidates in the tool have been selected in such a way as to mitigate this problem but I am wary it may have unforeseen consequences for some as yet untested scenarios. There are may be other inconsistencies, but it is difficult to pin down all of the scenarios precisely and that's one of the main stumbling blocks when managing a file as complex as the RPD. Even if we didn't receive any conflicts, it is vital that RPDs are checked and tested (preferably automatically) before release.
The first step to testing is to create a representative test suite, which will encompass as much of the functionality of your system in as few reports as possible. The reason for this is that it is often impractical and sometimes invalid to check the entire catalogue at once. Furthermore, the faster the testing phase occurs, the quicker the overall release process will be. The purpose of a test suite is so that we can take a baseline of the data of each report from which we can validate consistency after making changes. This means your test suite should contain reports that are expected not to change after making changes to RPD. Furthermore you need to be careful that the underlying data of the report does not change between the baseline capture and the regression validation phases, otherwise you will invalidate your test.
In terms of tooling, Oracle provide BVT which can be used outside of upgrades to perform automated regression tests. This is good as it provides both data checks as well as visual validation. Furthermore, it can be run on a specific Web/Presentation Catalog folder directly, as opposed to the whole system.
As well as Oracle’s BVT, we also have an in-house Regression Testing tool that was written prior to BVT’s availability, and is still used to satisfy specific test scenarios. Built in Python, it is part of a larger toolset that we use with clients for automating the full development lifecycle for OBIEE, including migrating RPDs and catalogue artefacts between environments.
This brings us onto the last piece in the DevOps puzzle is continuous integration (CI). This is the concept of automatically testing and deploying code to a higher environment as soon as the work is complete. This is something not explicitly covered by the tools in this blog, however would work nicely used with the testing and migration scripts described above. This could all be made seamless by invoking the processes via script calls or better using Git hooks.
Summary
The success of an OBIEE concurrent development approach comes down to two things: the tooling, and the rigorous implementation of the process - and it is the latter that is key. In this article I’ve demonstrated the tooling that we’ve developed, along with the process required for a successful development method. Here at Rittman Mead we have detailed understanding and experience in the process and framework necessary to implement it at any client, adapting and advising to ensure the integration into existing in-house development and release requirements. The real world is messy and developers don't all work in the same way. A single tool in isolation is not going to succeed in making OBIEE - designed from the outset as a single-developer tool - scale to multiple developers. Instead of insisting that you change to accommodate our tool, we instead bring our tool and process and adapt to suit you.
You can find the code used in this blog up on GitHub and if you would like to discuss how Rittman Mead can help implement concurrent OBIEE RPD development successfully at your organisation, please get in touch.
OBIEE12c – Upgrading to Version 12.2.1.1
INTRODUCTION
The new version of OBIEE 12c, 12.2.1.1 to be exact, is out, so let’s talk about it. It’s my intent that after reading this, you can expect some degree of comfort in regards to possibly doing this thing yourself (should you find yourself in just such a circumstance), but if not, feel free to drop us a line or give us a ring. It should be noted that Oracle documentation explicitly indicates that you’re going to need to upgrade to OBIEE version 12.2.1.0, which is to say you’re going to have to bring your 11g instance up to 12c before you can proceed with another upgrade. A colleague here at RM and I recently sat down to give the upgrade process (click there for the Oracle doc) a go on one of our hosted windows servers, and here’s the cut and dry of it. The examples throughout will be referencing both Linux and Windows, so choose how you’d like. Now, if you’ve gone through the 12c install process before, you’ll be plenty familiar with roughly 80% of the steps involved in the upgrade. Just to get this out of the way, no, it’s not a patch (in the sense that you’re actually going through the OBIEE patching process using OPatch). In fact, the process almost exactly mirrors a basic 12c install, with the addition of a few steps that I will make darn sure we cover in their entirety below. Speaking of which, I’m not going to do a play-by-play of the whole thing, but simply highlight those steps that are wholly unfamiliar. To provide some context, let’s go through the bullet points of what we’ll actually be doing during the upgrade.
First, we’ll make sure we have a server appropriate, supported version of java installed (8_77 is the lowest version) and that this guy corresponds to the JAVA_HOME you’ve got set up.
Next, we’ll be running the install for the WebLogic server into a NEW oracle home. That’s right, you heard me. A. new. oracle. home.
After that, we’ll be running a readiness check to make sure our OBIEE bits won’t run into any trouble during the actual upgrade process. This checks all OBIEE components, including those schemas you installed during the initial install process. Make sure to have your application database admin credentials on hand (we’ll talk about what you need below in more detail). The end of this step will actually have us upgrade all those pieces the readiness checker deems worthy of an upgrade.
Next, we’ll reconfigure and upgrade our existing domain by running the RECONFIGURATION WIZARD!!!!! and upgrade assistant, respectively.
Lastly, we’ll start up our services, cross our fingers, hold onto our four leaf clovers, etc.. (just kidding, at least about that last part).
Before we begin, however, let’s check off a few boxes on the ‘must have’ list.
Download all the files here, and make sure you get the right versions for whatever kind of server your version of OBIEE hangs out in. The java version will be 8_101 which will work out just fine even though the minimum needed is 8_77.
Get those database credentials! If you don’t know, drop everything and figure out how you’re going to access the application database within which the OBIEE 12c schemas were installed. You’ll need the user name/pass for the SYS user (or user with SYS privileges), and the database connection string as well, including the service name, host, and port.
Make sure you have enough disk space wherever you’re installing the upgrade. The downloads for the upgrade aren’t small. You should have at least 150GB, on a DEV box, say. You don’t want to have to manage allocating additional space at a time like this, especially if it involves putting in a ticket with IT (wink wink)! Speaking of which, you’ll also need the server credentials for whichever user 12c was installed under. Note that you probably don’t need root if it was a linux machine, however there have been some instances where I’ve needed to have these handy, as there were some file permission issues that required root credentials and were causing errors during an install. You’ll also need the weblogic/obiee admin user (if you changed the name for some reason).
Lastly, make sure you’re at least a tad bit familiar with both the path to the oracle and to the domain home.
SETTING UP JAVA
After downloading the version of Java you need, go ahead update it via the .rpm or .exe, etc… Make sure to update any environment variables you have set up, and to update both the JAVA_HOME
variable AND the PATH
to reference the new Java location. As stated above, at the time of this blog, the version we used, and that is currently available, is 8_101. During the upgrade process, we got a warning (see below) about our version not being 8_77. If this happens to you, just click Next. Everything will be alright, promise.
A NEW HOME FOR ORACLE
Did you click the link to the upgrade doc yet? If not, do so now, as things are about to get kind of crazy. Follow along as we walk through the next steps if you’d like. First, stop services and disable the SSL like it tells you to. Then, start OBIEE services back up and then run the infrastructure jar (java -jar fmw_12.2.1.1.0_infrastructure.jar
) for the WebLogic server install. Again, I’m not going to go pic by pic here as you can assume most everything resembles the initial 12c install process, and this part is no different. The one piece of this puzzle we need to focus on is establishing a new oracle home. After skipping those auto updates, move onto step 3 where we are, in fact, going to designate a new oracle home. You’ll see that, after completing the WebLogic install, we’ll have a bunch of updated feature sets, in addition to some new directories in our 12.2.1.1 oracle home. For example, if your original home is something like:
/u01/app/oracle/fmw
change it to:
when it asks you to enter a new one.
Breeze through the rest of the steps here, and remember to save that response file!
UPDATING OBIEE
Unzip both of the fmw_12.2.1.1.0_bi_linux64_Disk#_#of2.zip
files, making sure that your OBIEE install files are in the same directory. For windows, this will be the executable from the first zip file, and the zip file from the second part of disk 1. Execute the binary (on linux) or .exe, going through the usual motions and then in step 3, enter the NEW oracle home for 12.2.1.1. In the example above, it would be:
/u01/app/oracle/fmw2
for Linux, and likewise, for Windows:
Again, there isn’t too much to note or trap you here beyond just making sure that you take special care not to enter your original oracle home, but the one you created in the previous section. Proceed through the next steps as usual and remember, save your response file!
UPDATING THE 12C SCHEMAS - USING THE READINESS CHECKER AND UPDATE ASSISTANT
Run the readiness checker from:
NEW_ORACLE_HOME/oracle_common/upgrade/bin/ua -readiness
This next series of steps will take you through all the schemas currently deployed on your application database and confirm that they won’t explode once you take them through the upgrade process. In step 2 of 6, make sure that you’re entering the port for EM/Console (9500 by default). Remember when I said you’re going to need the DB credentials you used to install 12c in the first place? Well here’s where we’re going to use them. The readiness checker will guide you through a bunch of screens that essentially confirms the credentials for each schema installed, and then presents a report detailing which of these will actually get upgraded. That is to say, there are some that won’t be. I really like this new utility as an extra vote of confidence for a process that can admittedly be oftentimes troublesome.
Once you’ve validated that those schemas ready for update, go ahead and stop OBI12c services using the EXISTING oracle home.
Pro tip: they’ve made it super easy to do this now by just pointing your bash_profile to the binaries directory in OBIEE’s bitools folder (ORACLE_HOME/user_projects/domains/bi/bitools/bin). After logging this entry in your profile, you can simply type start.sh or stop.sh to bring everything up or down, not to mention take advantage of the myriad other scripts that are in there. Don't type those paths out every time.
I digress… After the services come down, run the upgrade assistant from within the NEW oracle home, as below:
Citing the previous example:
NEW_ORACLE_HOME/oracle_common/upgrade/bin/ua
After bringing up the install dialogue box, move on to step 2, and select the All Schemas Used by a Domain option (as in the example above), unless of course you’d like to hand select which ones you’d like to upgrade. I suppose if you were thinking about scrapping one you had previously installed, then this would be a good option for you. Make sure the domain directory you specify is from your existing/old 12c instance, as below:
Move through the next series of steps, which are more or less self explanatory (no tricks here, promise), once again validating connection credentials until you get to step 12. As always, save the response file, select Upgrade, and then watch the magic happen,….hopefully. Congratulations, you’ve just updated your schemas!
WHO INVITED A WIZARD TO THE PARTY? - RECONFIGURING THE BI DOMAIN AND UPDATING THE BI CONFIGURATION
Like I said before, I won’t be covering every single step of this process i.e, doing the map viewer portion, which means you’ll have to still consult the…oracle, on some of this stuff. That being said, don’t gloss over backing up the map viewer file..you have to do it. This is simply an attempt to help make the upgrade process a little easier to swallow and hopefully make some of the more confusing steps a bit clearer. Moving on. Guess what? It’s time to run another series of dialogue boxes. Beats the heck out of scripting this stuff though, I guess. Open up the RECONFIGURATION WIZARD!!!!! as instructed in the documentation, from the location within your NEW oracle home. The first step will prompt us for the location of the domain we want to upgrade. We want to upgrade our existing 12c domain (the old one). So type that in/browse for it. Right now.
Validate your java version and location in step 3 and then click your way through the next few screens, ensuring that you’ve at least given your stamp of approval on any pre-filled or manually filled entries in each dialogue box. Leave step 7 alone and click Next to get to the screen where we’re actually going to be starting the reconfiguration process. Click through and exit the RECONFIGURATION WIZARD!!!!!
Don’t forget to restore the map viewer config file at this point, and then launch the configuration assistant again, this time selecting the All Configurations Used By a Domain option in step 2. Make sure you’ve entered the location of the existing 12c domain in this step as well, and NOT the one created under the new oracle home.
Click through the next steps, again, paying close attention to all prompts and the location for the map viewer xml file. Verify in step 7 that the directory locations referenced for both domain and oracle map viewer are for the existing locations and NOT those created by the install of the update.
WRAPPING UP AND NOTES
You can now boot up ssl (as below) and then start OBIEE services.
DOMAIN_HOME/bitools/bin/ssl.sh internalssl true
Note: if you have tnsadmin or ldap.ora, place copies under NEW_ORACLE_HOME/network/admin
You can ignore the new oracle home created at this time, as, in my opinion, we’re going to have to do something similar for any following updates
for 12c. What did you think of the upgrade process and did you run into any issues? Thanks so much for reading, and as always, if you find any inconsistencies or errors please let us hear about them!
OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service
One of the big changes in OBIEE 12c for end users is the ability to upload their own data sets and start analysing them directly, without needing to go through the traditional data provisioning and modelling process and associated leadtimes. The implementation of this is one of the big architectural changes of OBIEE 12c, introducing the concept of the Extended Subject Areas (XSA), and the Data Set Service (DSS).
In this article we’ll see some of how XSA and DSS work behind the scenes, providing an important insight for troubleshooting and performance analysis of this functionality.
What is an XSA?
An Extended Subject Area (XSA) is made up of a dataset, and associated XML data model. It can be used standalone, or “mashed up” in conjunction with a “traditional” subject area on a common field
How is an XSA Created?
At the moment the following methods are available:
- “Add XSA” in Visual Analzyer, to upload an Excel (XLSX) document
-
CREATE DATASET
logical SQL statement, that can be run through any interface to the BI Server, including ‘Issue Raw SQL’, nqcmd, JDBC calls, and so on -
Add Data Source in Answers. Whilst this option shouldn’t actually be present according to a this doc, it will be for any users of 12.2.1 who have uploaded the SampleAppLite BAR file so I’m including it here for completeness.
Under the covers, these all use the same REST API calls directly into datasetsvc
. Note that these are entirely undocumented, and only for internal OBIEE component use. They are not intended nor supported for direct use.
How does an XSA work?
External Subject Areas (XSA) are managed by the Data Set Service (DSS). This is a java deployment (datasetsvc
) running in the Managed Server (bi_server1), providing a RESTful API for the other OBIEE components that use it.
The end-user of the data, whether it’s Visual Analyzer or the BI Server, send REST web service calls to DSS, storing and querying datasets within it.
Where is the XSA Stored?
By default, the data for XSA is stored on disk in SINGLETON_DATA_DIRECTORY/components/DSS/storage/ssi
, e.g. /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi
[oracle@demo ssi]$ ls -lrt /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi|tail -n5 -rw-r----- 1 oracle oinstall 8495 2015-12-02 18:01 7e43a80f-dcf6-4b31-b898-68616a68e7c4.dss -rw-r----- 1 oracle oinstall 593662 2016-05-27 11:00 1beb5e40-a794-4aa9-8c1d-5a1c59888cb4.dss -rw-r----- 1 oracle oinstall 131262 2016-05-27 11:12 53f59d34-2037-40f0-af21-45ac611f01d3.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:04 a4fc922d-ce0e-479f-97e4-1ddba074f5ac.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:06 c93aa2bd-857c-4651-bba2-a4f239115189.dss
They’re stored using the format in which they were created, which is XLSX (via VA) or CSV (via CREATE DATASET
)
[oracle@demo ssi]$ head 53f59d34-2037-40f0-af21-45ac611f01d3.dss "7 Megapixel Digital Camera","2010 Week 27",44761.88 "MicroPod 60Gb","2010 Week 27",36460.0 "MP3 Speakers System","2010 Week 27",36988.86 "MPEG4 Camcorder","2010 Week 28",32409.78 "CompCell RX3","2010 Week 28",33005.91
There’s a set of DSS-related tables installed in the RCU schema BIPLATFORM
, which hold information including the XML data model for the XSA, along with metadata such as the user that uploaded the file, when they uploaded, and then name of the file on disk:
How Can the Data Set Service be Configured?
The configuration file, with plenty of inline comments, is at ORACLE_HOME/bi/endpointmanager/jeemap/dss/DSS_REST_SERVICE.properties
. From here you con update settings for the data set service including upload limits as detailed here.
XSA Performance
Since XSA are based on flat files stored in disk, we need to be very careful in their use. Whilst a database may hold billions of rows in a table with with appropriate indexing and partitioning be able to provide sub-second responses, a flat file can quickly become a serious performance bottleneck. Bear in mind that a flat file is just a bunch of data plopped on disk – there is no concept of indices, blocks, partitions — all the good stuff that makes databases able to do responsive ad-hoc querying on selections of data.
If you’ve got a 100MB Excel file with thousands of cells, and want to report on just a few of them, you might find it laggy – because whether you want to report on them on or not, at some point OBIEE is going to have to read all of them regardless. We can see how OBIEE is handling XSA under the covers by examining the query log. This used to be called nqquery.log
in OBIEE 11g (and before), and in OBIEE 12c has been renamed obis1-query.log
.
In this example here I’m using an Excel worksheet with 140,000 rows and 78 columns. Total filesize of the source XLSX on disk is ~55Mb.
First up, I’ll build a query in Answers with a couple of the columns:
The logical query uses the new XSA
syntax:
SELECT 0 s_0, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."first_name" s_1, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."foo" s_2 FROM XSA('prodney'.'MOCK_DATA_bigger_55Mb') ORDER BY 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLY
The query log shows
Rows 144000, bytes 13824000 retrieved from database query Rows returned to Client 200
So of the 55MB of data, we’re pulling all the rows (144,000) back to the BI Server for it to then perform the aggregation on it, resulting in the 200 rows returned to the client (Presentation Services). Note though that the byte count is lower (13Mb) than the total size of the file (55Mb).
As well as aggregation, filtering on XSA data also gets done by the BI Server. Consider this example here, where we add a predicate:
In the query log we can see that all the data has to come back from DSS to the BI Server, in order for it to filter it:
Rows 144000, bytes 23040000 retrieved from database Physical query response time 24.195 (seconds), Rows returned to Client 0
Note the time taken by DSS — nearly 25 seconds. Compare this later on to when we see the XSA data served from a database, via the XSA Cache.
In terms of BI Server (not XSA) caching, the query log shows that a cache entry was written for the above request:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736113_56359_0.TBL'
If I refresh the query in Answers, the data is fetched anew (per this changed behaviour in OBIEE 12c), and the cache repopulated. If I clear the Presentation Services cache and re-open the analysis, I get the results from the BI Server cache, and it doesn’t have to refetch the data from the Data Set Service.
Since the cache has two columns in, an attribute and a measure, I wondered if running a query with just the fact rolled up might hit the cache (since it has all the data there that it needs)
Unfortunately it didn’t, and to return a single row of data required BI Server to fetch all the rows again – although looking at the byte count it appears it does prune the columns required since it’s now just over 2Mb of data returned this time:
Rows 144000, bytes 2304000 retrieved from database Rows returned to Client 1
Interestingly if I build an analysis with several more of the columns from the file (in this example, ten of a total of 78), the data returned from the DSS to BI Server (167Mb) is greater than that of the original file (55Mb).
Rows 144000, bytes 175104000 Rows returned to Client 1000
And this data coming back from the DSS to the BI Server has to go somewhere – and if it’s big enough it’ll overflow to disk, as we can see when I run the above:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...] -rwxrwx--- 1 oracle oinstall 2910404 2016-06-01 14:08 nQS_AG_22345_7503_7c9c000a_50906091.TMP -rwxrwx--- 1 oracle oinstall 43476 2016-06-01 14:08 nQS_AG_22345_7504_7c9c000a_50906091.TMP -rw------- 1 oracle oinstall 6912000 2016-06-01 14:08 nQS_AG_22345_7508_7c9c000a_50921949.TMP -rw------- 1 oracle oinstall 631375 2016-06-01 14:08 nQS_EX_22345_7506_7c9c000a_50921652.TMP -rw------- 1 oracle oinstall 3670016 2016-06-01 14:08 nQS_EX_22345_7507_7c9c000a_50921673.TMP [...]
You can read more about BI Server’s use of temporary files and the impact that it can have on system performance and particularly I/O bandwidth in this OTN article here.
So – as the expression goes – “buyer beware”. XSA is an excellent feature, but used in its default configuration with files stored on disk it has the potential to wreak havoc if abused.
XSA Caching
If you’re planning to use XSA seriously, you should set up the database-based XSA Cache. This is described in detail in the PDF document attached to My Oracle Support note OBIEE 12c: How To Configure The External Subject Area (XSA) Cache For Data Blending| Mashup And Performance (Doc ID 2087801.1).
In a proper implementation you would follow in full the document, including provisioning a dedicated schema and tablespace for holding the data (to make it easier to manage and segregate from other data), but here I’m just going to use the existing RCU schema (BIPLATFORM), along with the Physical mapping already in the RPD (10 - System DB (ORCL)
):
In NQSConfig.INI
, under the XSA_CACHE
section, I set:
ENABLE = YES; # The schema and connection pool where the XSA data will be cached. PHYSICAL_SCHEMA = "10 - System DB (ORCL)"."Catalog"."dbo"; CONNECTION_POOL = "10 - System DB (ORCL)"."UT Connection Pool";
And restart the BI Server:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/stop.sh -i obis1 && /app/oracle/biee/user_projects/domains/bi/bitools/bin/start.sh -i obis1
Per the document, note that in the BI Server log there’s an entry indicating that the cache has been successfully started:
[101001] External Subject Area cache is started successfully using configuration from the repository with the logical name ssi. [101017] External Subject Area cache has been initialized. Total number of entries: 0 Used space: 0 bytes Maximum space: 107374182400 bytes Remaining space: 107374182400 bytes. Cache table name prefix is XC2875559987.
Now when I re-run the test XSA analysis from above, returning three columns, the BI Server goes off and populates the XSA cache table:
-- Sending query to database named 10 - System DB (ORCL) (id: <<79879>> XSACache Create table Gateway), connection pool named UT Connection Pool, logical request hash b4de812e, physical request hash 5847f2ef: CREATE TABLE dbo.XC2875559987_ZPRODNE1926129021 ( id3209243024 DOUBLE PRECISION, first_n[..]
Or rather, it doesn’t, because PHYSICAL_SCHEMA
seems to want the literal physical schema, rather than the logical physical one (?!) that the USAGE_TRACKING
configuration stanza is happy with in referencing the table.
Properties: description=<<79879>> XSACache Create table Exchange; producerID=0x1561aff8; requestID=0xfffe0034; sessionID=0xfffe0000; userName=prodney; [nQSError: 17001] Oracle Error code: 1918, message: ORA-01918: user 'DBO' does not exist
I’m trying to piggyback on SA511’s existing configruation, which uses catalog.schema notation:
Instead of the more conventional approach to have the actual physical schema (often used in conjunction with ‘Require fully qualified table names’ in the connection pool):
So now I’ll do it properly, and create a database and schema for the XSA cache – I’m still going to use the BIPLATFORM schema though…
Updated NQSConfig.INI:
[ XSA_CACHE ] ENABLE = YES; # The schema and connection pool where the XSA data will be cached. PHYSICAL_SCHEMA = "XSA Cache"."BIEE_BIPLATFORM"; CONNECTION_POOL = "XSA Cache"."XSA CP";
After refreshing the analysis again, there’s a successful creation of the XSA cache table:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Create table Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash ccc0a410: [[ CREATE TABLE BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 ( id3209243024 DOUBLE PRECISION, first_name2360035083 VARCHAR2(17 CHAR), [...]
as well as a stats gather:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Collect statistics Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d73151bb: BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => 'BIEE_BIPLATFORM', tabname => 'XC2875559987_ZPRODNE1645894381' , estimate_percent => 5 , method_opt => 'FOR ALL COLUMNS SIZE AUTO' ); END;
Although I do note that it is used a fixed estimate_percent
instead of the recommended AUTO_SAMPLE_SIZE
. The table itself is created with a fixed prefix (as specified in the obis1-diagnostic.log at initialisation), and holds a full copy of the XSA (not just the columns in the query that triggered the cache creation):
With the dataset cached, the query is then run and the query log shows a XSA cache hit
External Subject Area cache hit for 'prodney'.'MOCK_DATA_bigger_55Mb'/Columns : Cache entry shared_cache_key = 'prodney'.'MOCK_DATA_bigger_55Mb', table name = BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357, row count = 144000, entry size = 201326592 bytes, creation time = 2016-06-01 20:14:26.829, creation elapsed time = 49779 ms, descriptor ID = /app/oracle/biee/user_projects/domains/bi/servers/obis1/xsacache/NQSXSA_BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357_2.CACHE
with the resulting physical query fired at the XSA cache table (replacing what would have gone against the DSS web service):
-- Sending query to database named XSA Cache (id: <<65357>>), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d3ed281d: [[ WITH SAWITH0 AS (select T1000001.first_name2360035083 as c1, T1000001.last_name3826278858 as c2, sum(T1000001.foo2363149668) as c3 from BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 T1000001 group by T1000001.first_name2360035083, T1000001.last_name3826278858) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1, D102.c1 as c2, D102.c2 as c3, D102.c3 as c4 from SAWITH0 D102 order by c2, c3 ) D1 where rownum <= 5000001
It’s important to point out the difference of what’s happening here: the aggregation has been pushed down to the database, meaning that the BI Server doesn’t have to. In performance terms, this is a Very Good Thing usually.
Rows 988, bytes 165984 retrieved from database query Rows returned to Client 988
Whilst it doesn’t seem to be recorded in the query log from what I can see, the data returned from the XSA Cache also gets inserted into the BI Server cache, and if you open an XSA-based analysis that’s not in the presentation services cache (a third cache to factor in!) you will get a cache hit on the BI Server cache. As discussed earlier in this article though, if an analysis is built against an XSA for which a BI Server cache entry exists that with manipulation could service it (eg pruning columns or rolling up), it doesn’t appear to take advantage of it – but since it’s hitting the XSA cache this time, it’s less of a concern.
If you change the underlying data in the XSA
The BI Server does pick this up and repopulates the XSA Cache.
The XSA cache entry itself is 192Mb in size – generated from a 55Mb upload file. The difference will be down to data types and storage methods etc. However, that it is larger in the XSA Cache (database) than held natively (flat file) doesn’t really matter, particularly if the data is being aggregated and/or filtered, since the performance benefit of pushing this work to the database will outweigh the overhead of storage space. Consider this example here, where I run an analysis pulling back 44 columns (of the 78 in the spreadsheet) and hit the XSA cache, it runs in just over a second, and transfers from the database a total of 5.3Mb (the data is repeated, so rolls up):
Rows 1000, bytes 5576000 retrieved from database Rows returned to Client 1000
If I disable the XSA cache and run the same query, we see this:
Rows 144000, bytes 801792000 Retrieved from database Physical query response time 22.086 (seconds) Rows returned to Client 1000
That’s 764Mb being sent back for the BI Server to process, which it does by dumping a whole load to disk in temporary work files:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...]] -rwxrwx--- 1 oracle oinstall 10726190 2016-06-01 21:04 nQS_AG_29733_261_ebd70002_75835908.TMP -rwxrwx--- 1 oracle oinstall 153388 2016-06-01 21:04 nQS_AG_29733_262_ebd70002_75835908.TMP -rw------- 1 oracle oinstall 24192000 2016-06-01 21:04 nQS_AG_29733_266_ebd70002_75862509.TMP -rw------- 1 oracle oinstall 4195609 2016-06-01 21:04 nQS_EX_29733_264_ebd70002_75861716.TMP -rw------- 1 oracle oinstall 21430272 2016-06-01 21:04 nQS_EX_29733_265_ebd70002_75861739.TMP
As a reminder – this isn’t “Bad”, it’s just not optimal (response time of 50 seconds vs 1 second), and if you scale that kind of behaviour by many users with many datasets, things could definitely get hairy for all users of the system. Hence – use the XSA Cache.
As a final point, with the XSA Cache being in the database the standard range of performance optimisations are open to us – indexing being the obvious one. No indexes are built against the XSA Cache table by default, which is fair enough since OBIEE has no idea what the key columns on the data are, and the point of mashups is less to model and optimise the data but to just get it up there in front of the user. So you could index the table if you knew the key columns that were going to be filtered against, or you could even put it into memory (assuming you’ve licensed the option).
The MoS document referenced above also includes further performance recommendations for XSA, including the use of RAM Disk for XSA cache metadata files, as well as the managed server temp folder
Summary
External Subject Areas are great functionality, but be aware of the performance implications of not being able to push down common operations such as filtering and aggregation. Set up XSA Caching if you are going to be using XSA properly.
If you’re interested in the direction of XSA and the associated Data Set Service, this slide deck from Oracle’s Socs Cappas provides some interesting reading. Uploading Excel files into OBIEE looks like just the beginning of what the Data Set Service is going to enable!
The post OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service appeared first on Rittman Mead Consulting.
OBIEE 12c – Extended Subject Areas (XSA) and the Data Set Service
One of the big changes in OBIEE 12c for end users is the ability to upload their own data sets and start analysing them directly, without needing to go through the traditional data provisioning and modelling process and associated leadtimes. The implementation of this is one of the big architectural changes of OBIEE 12c, introducing the concept of the Extended Subject Areas (XSA), and the Data Set Service (DSS).
In this article we'll see some of how XSA and DSS work behind the scenes, providing an important insight for troubleshooting and performance analysis of this functionality.
What is an XSA?
An Extended Subject Area (XSA) is made up of a dataset, and associated XML data model. It can be used standalone, or "mashed up" in conjunction with a "traditional" subject area on a common field
How is an XSA Created?
At the moment the following methods are available:
- "Add XSA" in Visual Analzyer, to upload an Excel (XLSX) document
CREATE DATASET
logical SQL statement, that can be run through any interface to the BI Server, including 'Issue Raw SQL', nqcmd, JDBC calls, and so onAdd Data Source in Answers. Whilst this option shouldn't actually be present according to a this doc, it will be for any users of 12.2.1 who have uploaded the SampleAppLite BAR file so I'm including it here for completeness.
Under the covers, these all use the same REST API calls directly into datasetsvc
. Note that these are entirely undocumented, and only for internal OBIEE component use. They are not intended nor supported for direct use.
How does an XSA work?
External Subject Areas (XSA) are managed by the Data Set Service (DSS). This is a java deployment (datasetsvc
) running in the Managed Server (bi_server1), providing a RESTful API for the other OBIEE components that use it.
The end-user of the data, whether it's Visual Analyzer or the BI Server, send REST web service calls to DSS, storing and querying datasets within it.
Where is the XSA Stored?
By default, the data for XSA is stored on disk in SINGLETONDATADIRECTORY/components/DSS/storage/ssi
, e.g. /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi
[oracle@demo ssi]$ ls -lrt /app/oracle/biee/user_projects/domains/bi/bidata/components/DSS/storage/ssi|tail -n5 -rw-r----- 1 oracle oinstall 8495 2015-12-02 18:01 7e43a80f-dcf6-4b31-b898-68616a68e7c4.dss -rw-r----- 1 oracle oinstall 593662 2016-05-27 11:00 1beb5e40-a794-4aa9-8c1d-5a1c59888cb4.dss -rw-r----- 1 oracle oinstall 131262 2016-05-27 11:12 53f59d34-2037-40f0-af21-45ac611f01d3.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:04 a4fc922d-ce0e-479f-97e4-1ddba074f5ac.dss -rw-r----- 1 oracle oinstall 1014459 2016-05-27 13:06 c93aa2bd-857c-4651-bba2-a4f239115189.dss
They're stored using the format in which they were created, which is XLSX (via VA) or CSV (via CREATE DATASET
)
[oracle@demo ssi]$ head 53f59d34-2037-40f0-af21-45ac611f01d3.dss "7 Megapixel Digital Camera","2010 Week 27",44761.88 "MicroPod 60Gb","2010 Week 27",36460.0 "MP3 Speakers System","2010 Week 27",36988.86 "MPEG4 Camcorder","2010 Week 28",32409.78 "CompCell RX3","2010 Week 28",33005.91
There's a set of DSS-related tables installed in the RCU schema BIPLATFORM
, which hold information including the XML data model for the XSA, along with metadata such as the user that uploaded the file, when they uploaded, and then name of the file on disk:
How Can the Data Set Service be Configured?
The configuration file, with plenty of inline comments, is at ORACLEHOME/bi/endpointmanager/jeemap/dss/DSSREST_SERVICE.properties
. From here you con update settings for the data set service including upload limits as detailed here.
XSA Performance
Since XSA are based on flat files stored in disk, we need to be very careful in their use. Whilst a database may hold billions of rows in a table with with appropriate indexing and partitioning be able to provide sub-second responses, a flat file can quickly become a serious performance bottleneck. Bear in mind that a flat file is just a bunch of data plopped on disk - there is no concept of indices, blocks, partitions -- all the good stuff that makes databases able to do responsive ad-hoc querying on selections of data.
If you've got a 100MB Excel file with thousands of cells, and want to report on just a few of them, you might find it laggy - because whether you want to report on them on or not, at some point OBIEE is going to have to read all of them regardless. We can see how OBIEE is handling XSA under the covers by examining the query log. This used to be called nqquery.log
in OBIEE 11g (and before), and in OBIEE 12c has been renamed obis1-query.log
.
In this example here I'm using an Excel worksheet with 140,000 rows and 78 columns. Total filesize of the source XLSX on disk is ~55Mb.
First up, I'll build a query in Answers with a couple of the columns:
The logical query uses the new XSA
syntax:
SELECT 0 s_0, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."first_name" s_1, XSA('prodney'.'MOCK_DATA_bigger_55Mb')."Columns"."foo" s_2 FROM XSA('prodney'.'MOCK_DATA_bigger_55Mb') ORDER BY 2 ASC NULLS LAST FETCH FIRST 5000001 ROWS ONLYThe query log shows
Rows 144000, bytes 13824000 retrieved from database query
Rows returned to Client 200
So of the 55MB of data, we're pulling all the rows (144,000) back to the BI Server for it to then perform the aggregation on it, resulting in the 200 rows returned to the client (Presentation Services). Note though that the byte count is lower (13Mb) than the total size of the file (55Mb).
As well as aggregation, filtering on XSA data also gets done by the BI Server. Consider this example here, where we add a predicate:
In the query log we can see that all the data has to come back from DSS to the BI Server, in order for it to filter it:
Rows 144000, bytes 23040000 retrieved from database Physical query response time 24.195 (seconds), Rows returned to Client 0Note the time taken by DSS -- nearly 25 seconds. Compare this later on to when we see the XSA data served from a database, via the XSA Cache. In terms of BI Server (not XSA) caching, the query log shows that a cache entry was written for the above request:
Query Result Cache: [59124] The query for user 'prodney' was inserted into the query result cache. The filename is '/app/oracle/biee/user_projects/domains/bi/servers/obis1/cache/NQS__736113_56359_0.TBL'
If I refresh the query in Answers, the data is fetched anew (per this changed behaviour in OBIEE 12c), and the cache repopulated. If I clear the Presentation Services cache and re-open the analysis, I get the results from the BI Server cache, and it doesn't have to refetch the data from the Data Set Service.
Since the cache has two columns in, an attribute and a measure, I wondered if running a query with just the fact rolled up might hit the cache (since it has all the data there that it needs)
Unfortunately it didn't, and to return a single row of data required BI Server to fetch all the rows again - although looking at the byte count it appears it does prune the columns required since it's now just over 2Mb of data returned this time:
Rows 144000, bytes 2304000 retrieved from database
Rows returned to Client 1
Interestingly if I build an analysis with several more of the columns from the file (in this example, ten of a total of 78), the data returned from the DSS to BI Server (167Mb) is greater than that of the original file (55Mb).
Rows 144000, bytes 175104000
Rows returned to Client 1000
And this data coming back from the DSS to the BI Server has to go somewhere - and if it's big enough it'll overflow to disk, as we can see when I run the above:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...] -rwxrwx--- 1 oracle oinstall 2910404 2016-06-01 14:08 nQS_AG_22345_7503_7c9c000a_50906091.TMP -rwxrwx--- 1 oracle oinstall 43476 2016-06-01 14:08 nQS_AG_22345_7504_7c9c000a_50906091.TMP -rw------- 1 oracle oinstall 6912000 2016-06-01 14:08 nQS_AG_22345_7508_7c9c000a_50921949.TMP -rw------- 1 oracle oinstall 631375 2016-06-01 14:08 nQS_EX_22345_7506_7c9c000a_50921652.TMP -rw------- 1 oracle oinstall 3670016 2016-06-01 14:08 nQS_EX_22345_7507_7c9c000a_50921673.TMP [...]
You can read more about BI Server's use of temporary files and the impact that it can have on system performance and particularly I/O bandwidth in this OTN article here.
So - as the expression goes - "buyer beware". XSA is an excellent feature, but used in its default configuration with files stored on disk it has the potential to wreak havoc if abused.
XSA Caching
If you're planning to use XSA seriously, you should set up the database-based XSA Cache. This is described in detail in the PDF document attached to My Oracle Support note OBIEE 12c: How To Configure The External Subject Area (XSA) Cache For Data Blending| Mashup And Performance (Doc ID 2087801.1).
In a proper implementation you would follow in full the document, including provisioning a dedicated schema and tablespace for holding the data (to make it easier to manage and segregate from other data), but here I'm just going to use the existing RCU schema (BIPLATFORM), along with the Physical mapping already in the RPD (10 - System DB (ORCL)
):
In NQSConfig.INI
, under the XSA_CACHE
section, I set:
ENABLE = YES;
# The schema and connection pool where the XSA data will be cached.
PHYSICAL_SCHEMA = "10 - System DB (ORCL)"."Catalog"."dbo";
CONNECTION_POOL = "10 - System DB (ORCL)"."UT Connection Pool";
And restart the BI Server:
/app/oracle/biee/user_projects/domains/bi/bitools/bin/stop.sh -i obis1 && /app/oracle/biee/user_projects/domains/bi/bitools/bin/start.sh -i obis1
Per the document, note that in the BI Server log there's an entry indicating that the cache has been successfully started:
[101001] External Subject Area cache is started successfully using configuration from the repository with the logical name ssi.
[101017] External Subject Area cache has been initialized. Total number of entries: 0 Used space: 0 bytes Maximum space: 107374182400 bytes Remaining space: 107374182400 bytes. Cache table name prefix is XC2875559987.
Now when I re-run the test XSA analysis from above, returning three columns, the BI Server goes off and populates the XSA cache table:
-- Sending query to database named 10 - System DB (ORCL) (id: <<79879>> XSACache Create table Gateway), connection pool named UT Connection Pool, logical request hash b4de812e, physical request hash 5847f2ef: CREATE TABLE dbo.XC2875559987_ZPRODNE1926129021 ( id3209243024 DOUBLE PRECISION, first_n[..]
Or rather, it doesn't, because PHYSICALSCHEMA
seems to want the literal physical schema, rather than the logical physical one (?!) that the USAGETRACKING
configuration stanza is happy with in referencing the table.
Properties: description=<<79879>> XSACache Create table Exchange; producerID=0x1561aff8; requestID=0xfffe0034; sessionID=0xfffe0000; userName=prodney; [nQSError: 17001] Oracle Error code: 1918, message: ORA-01918: user 'DBO' does not exist
I'm trying to piggyback on SA511's existing configruation, which uses catalog.schema notation:
Instead of the more conventional approach to have the actual physical schema (often used in conjunction with 'Require fully qualified table names' in the connection pool):
So now I'll do it properly, and create a database and schema for the XSA cache - I'm still going to use the BIPLATFORM schema though...
Updated NQSConfig.INI:
[ XSA_CACHE ]
ENABLE = YES;
# The schema and connection pool where the XSA data will be cached.
PHYSICAL_SCHEMA = "XSA Cache"."BIEE_BIPLATFORM";
CONNECTION_POOL = "XSA Cache"."XSA CP";
After refreshing the analysis again, there's a successful creation of the XSA cache table:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Create table Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash ccc0a410: [[ CREATE TABLE BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 ( id3209243024 DOUBLE PRECISION, first_name2360035083 VARCHAR2(17 CHAR), [...]
as well as a stats gather:
-- Sending query to database named XSA Cache (id: <<65685>> XSACache Collect statistics Gateway), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d73151bb: BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => 'BIEE_BIPLATFORM', tabname => 'XC2875559987_ZPRODNE1645894381' , estimate_percent => 5 , method_opt => 'FOR ALL COLUMNS SIZE AUTO' ); END;
Although I do note that it is used a fixed estimatepercent
instead of the recommended AUTOSAMPLE_SIZE
. The table itself is created with a fixed prefix (as specified in the obis1-diagnostic.log at initialisation), and holds a full copy of the XSA (not just the columns in the query that triggered the cache creation):
With the dataset cached, the query is then run and the query log shows a XSA cache hit
External Subject Area cache hit for 'prodney'.'MOCK_DATA_bigger_55Mb'/Columns : Cache entry shared_cache_key = 'prodney'.'MOCK_DATA_bigger_55Mb', table name = BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357, row count = 144000, entry size = 201326592 bytes, creation time = 2016-06-01 20:14:26.829, creation elapsed time = 49779 ms, descriptor ID = /app/oracle/biee/user_projects/domains/bi/servers/obis1/xsacache/NQSXSA_BIEE_BIPLATFORM.XC2875559987_ZPRODNE2128899357_2.CACHEwith the resulting physical query fired at the XSA cache table (replacing what would have gone against the DSS web service):
-- Sending query to database named XSA Cache (id: <<65357>>), connection pool named XSA CP, logical request hash 9a548c60, physical request hash d3ed281d: [[ WITH SAWITH0 AS (select T1000001.first_name2360035083 as c1, T1000001.last_name3826278858 as c2, sum(T1000001.foo2363149668) as c3 from BIEE_BIPLATFORM.XC2875559987_ZPRODNE1645894381 T1000001 group by T1000001.first_name2360035083, T1000001.last_name3826278858) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1, D102.c1 as c2, D102.c2 as c3, D102.c3 as c4 from SAWITH0 D102 order by c2, c3 ) D1 where rownum <= 5000001
It's important to point out the difference of what's happening here: the aggregation has been pushed down to the database, meaning that the BI Server doesn't have to. In performance terms, this is a Very Good Thing usually.
Rows 988, bytes 165984 retrieved from database query
Rows returned to Client 988
Whilst it doesn't seem to be recorded in the query log from what I can see, the data returned from the XSA Cache also gets inserted into the BI Server cache, and if you open an XSA-based analysis that's not in the presentation services cache (a third cache to factor in!) you will get a cache hit on the BI Server cache. As discussed earlier in this article though, if an analysis is built against an XSA for which a BI Server cache entry exists that with manipulation could service it (eg pruning columns or rolling up), it doesn't appear to take advantage of it - but since it's hitting the XSA cache this time, it's less of a concern.
If you change the underlying data in the XSA
The BI Server does pick this up and repopulates the XSA Cache.
The XSA cache entry itself is 192Mb in size - generated from a 55Mb upload file. The difference will be down to data types and storage methods etc. However, that it is larger in the XSA Cache (database) than held natively (flat file) doesn't really matter, particularly if the data is being aggregated and/or filtered, since the performance benefit of pushing this work to the database will outweigh the overhead of storage space. Consider this example here, where I run an analysis pulling back 44 columns (of the 78 in the spreadsheet) and hit the XSA cache, it runs in just over a second, and transfers from the database a total of 5.3Mb (the data is repeated, so rolls up):
Rows 1000, bytes 5576000 retrieved from database
Rows returned to Client 1000
If I disable the XSA cache and run the same query, we see this:
Rows 144000, bytes 801792000 Retrieved from database
Physical query response time 22.086 (seconds)
Rows returned to Client 1000
That's 764Mb being sent back for the BI Server to process, which it does by dumping a whole load to disk in temporary work files:
$ ls -l /app/oracle/biee/user_projects/domains/bi/servers/obis1/tmp/obis_temp [...]] -rwxrwx--- 1 oracle oinstall 10726190 2016-06-01 21:04 nQS_AG_29733_261_ebd70002_75835908.TMP -rwxrwx--- 1 oracle oinstall 153388 2016-06-01 21:04 nQS_AG_29733_262_ebd70002_75835908.TMP -rw------- 1 oracle oinstall 24192000 2016-06-01 21:04 nQS_AG_29733_266_ebd70002_75862509.TMP -rw------- 1 oracle oinstall 4195609 2016-06-01 21:04 nQS_EX_29733_264_ebd70002_75861716.TMP -rw------- 1 oracle oinstall 21430272 2016-06-01 21:04 nQS_EX_29733_265_ebd70002_75861739.TMP
As a reminder - this isn't "Bad", it's just not optimal (response time of 50 seconds vs 1 second), and if you scale that kind of behaviour by many users with many datasets, things could definitely get hairy for all users of the system. Hence - use the XSA Cache.
As a final point, with the XSA Cache being in the database the standard range of performance optimisations are open to us - indexing being the obvious one. No indexes are built against the XSA Cache table by default, which is fair enough since OBIEE has no idea what the key columns on the data are, and the point of mashups is less to model and optimise the data but to just get it up there in front of the user. So you could index the table if you knew the key columns that were going to be filtered against, or you could even put it into memory (assuming you've licensed the option).
The MoS document referenced above also includes further performance recommendations for XSA, including the use of RAM Disk for XSA cache metadata files, as well as the managed server temp folder
Summary
External Subject Areas are great functionality, but be aware of the performance implications of not being able to push down common operations such as filtering and aggregation. Set up XSA Caching if you are going to be using XSA properly.
If you're interested in the direction of XSA and the associated Data Set Service, this slide deck from Oracle's Socs Cappas provides some interesting reading. Uploading Excel files into OBIEE looks like just the beginning of what the Data Set Service is going to enable!