OBIEE 11.1.1.7 + Essbase : Essbase Filters within Fusion Middleware 11g Security
Data and metadata filters are used with Essbase to limit access to data in a cube. When you use data filters to restrict access to a certain set of dimension members in the cube, the end-user can see that the members exist, but can’t see any data associated with them. Metadata filters go one step further, stopping the end-user from even seeing the dimension member if they’ve not got permission to view it. In the past, Essbase filters were defined using MaxL or Essbase Administration Services, and then assigned to individual users or groups using Hyperion Shared Services; however, if you install Essbase as part of OBIEE 11.1.1.7, these filters are assigned to users and groups using Fusion Middleware Security’s application roles and policies. So how does this process work?
There are three main steps required to set up an Essbase filter under this new Fusion Middleware security model:
- Identify the user or group to which the filter is to be applied (or, as in the case of this note, create a new FMW user and add it to an existing default group)
- Create an Essbase Filter, typically using Essbase Administration Services
- Set up the required policies within Fusion Middleware security, using either WLST scripting or, more usually, Oracle Enterprise Manager Fusion Middleware Control.
Creating a new Embedded LDAP Server User
To start by creating a new user, log in to the WebLogic Administration Console (eg http://localhost:7001/console/login/LoginForm.jsp), and click on Domain Structure > Security Realms, like this:
On the subsequent screen, click on the Users and Groups tab. Then, click on New to add a user. On the resultant form, enter a user name (eg markcann) and a password. Confirm the password, and then click OK to save those details.
Back in the main users screen, click on the newly-added user, like this, so that we can then assign it to a group.
Then, in the resultant screen, click on the Groups tab, then select just the BIConsumers group, and press Save. This user is now part of the basic group that can access OBIEE, and will be able to run reports and other catalog content.
Create the Essbase Filter
Now we’re going to create an Essbase filter to apply to this user. In this example, the Essbase test database EXA102_2.EXA102_2 contains the following dimensions:
The filter to be created needs to hide the bottom levels of the Product hierarchy from view: for example, PT1 within PC12 contains PI3, PI2, etc. as its lowest-level values. All such leaf nodes need to be hidden from user view for this dimension.
Filters can be written and uploaded using MaxL but, even though we are using Fusion Middleware security with Essbase, they can still be created in Essbase Administration Services (EAS). Once logged into EAS, navigate through the Enterprise View tree to locate the required database, then right-click on the database, and choose Edit > Filters.
In the Filter Editor screen, click New to create a new filter. Assign the new filter a name (for example, “DropLeaves”), and add a MetaRead filter row, MetaRead being the more restrictive of the two filter types, with the following member specification:
@REMOVE(@IDESCENDANTS("Product"),@LEVMBRS("Product",0))
Click Verify to check the content of the filter code, then click Save to return to the main filter screen:
Then finally, click Close. The filter is now created and ready for assignment.
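If you would rather script this step than use EAS, the same filter can be created with MaxL via the essmsh command-line shell. The statement below is a minimal sketch using the database and filter names from above – I didn’t run it as part of this exercise, so check the syntax against the MaxL reference for your Essbase release:

create or replace filter EXA102_2.EXA102_2.DropLeaves
metaread on '@REMOVE(@IDESCENDANTS("Product"),@LEVMBRS("Product",0))';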
Assigning the Filter using Application Policy Permission Objects
According to the OBIEE 11.1.1.7 Systems Administrator Guide’s section on Configuring Data-Level Security using Essbase Filters:
“An application role requires at least two policy store permission grants to access to a specific filter. You must give an application role permission to use filters within a specific scope eg
• oracle.essbase.application, /EssbaseCluster-1/Demo, use_filter
and
• oracle.essbase.filter, /EssbaseCluster-1/Demo/Basic/read_filter, apply”
This is true – an application role does indeed require at least two policy store permission grants to access a specific filter. It requires a whole lot more to work if you are starting from scratch, though.
It is important to recognise that the default groups set up by the install already contain a lot of policies, and that they are nested and hence inherit policies from their ‘subordinate’ members. The three groups allocated and set up for the BI elements of FMW are:
- BIAdministrators
- BIAuthors
- BIConsumers
By adding our new user, markcann, to the BIConsumers group, that user automatically assumes all of the rights of the group. We can, however, overlay additional policies onto just the markcann user that will not affect the other members of the group, and it is here that the two additional policies needed to effect the filter for this user are added.
To do this, login now to Enterprise Manager Fusion Middleware Control, and in the left hand pane navigate to Business Intelligence > Core Application and left-click on it. Once the right-hand pane has refreshed, click on the dropdown next to Business Intelligence, then select Security / Application Policies, like this:
In the resultant screen, select obi as the Application Stripe (if not already defaulted to this value), select User as the Principal Type, and then press the Create button.
In the resultant screen, click the top Add:
Change the Type field value to User, then locate the new user (click on the blue arrow on the right hand side of the form after having entered the first letter in the search box if required). Highlight the user, then click OK.
Back in the main screen, the selected user will now be showing. Click the bottom Add to start adding the required policy permission.
In the resultant dialog, click the Resource Types radio button. Choose oracle.essbase.application as the Resource Type, then click Continue.
Type in the cluster / app name (/EssbaseCluster-1/EXA102_2) as the resource. Check only the use_filter permission action, then press the Select button.
Back in the main screen, the newly added application policy permission should now be showing. Click the bottom Add button again to add a second policy permission, following these steps:
- Check the ‘Resource Types’ radio button
- Choose oracle.essbase.filter as the Resource Type
- Click ‘Continue’ to access the second screen
Enter the Cluster / App / DB / Filter (eg /EssbaseCluster-1/EXA102_2/EXA102_2/DropLeaves) as the Resource Name. Check the apply check box, then press Select to return to the main application policy screen for the user.
Back in the main screen, with the two application policy permissions added now showing, click on OK.
The previously created filter is now effective for the markcann user.
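If you prefer to script this step rather than click through Enterprise Manager, the same two grants can also be made with WLST. The snippet below is an untested sketch, assuming the obi application stripe, the markcann user and the resource names used above – check the grantPermission parameters against the OPSS/WLST documentation for your release before relying on it.

# connect wlst.sh / wlst.cmd to the Admin Server first
connect('weblogic','password','t3://localhost:7001')
# grant use_filter on the Essbase application...
grantPermission(appStripe='obi',
  principalClass='weblogic.security.principal.WLSUserImpl',
  principalName='markcann',
  permClass='oracle.security.jps.ResourcePermission',
  permTarget='resourceType=oracle.essbase.application,resourceName=/EssbaseCluster-1/EXA102_2',
  permActions='use_filter')
# ...and apply on the specific filter
grantPermission(appStripe='obi',
  principalClass='weblogic.security.principal.WLSUserImpl',
  principalName='markcann',
  permClass='oracle.security.jps.ResourcePermission',
  permTarget='resourceType=oracle.essbase.filter,resourceName=/EssbaseCluster-1/EXA102_2/EXA102_2/DropLeaves',
  permActions='apply')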
Testing the New Filter
To test the filter, I now log into Smartview, initially as the main administrator user ‘weblogic’. To test access without the filter applied, I access the relevant database (in this case, EXA102_2.EXA102_2), place the Product dimension in the ‘row’ position, and perform a Zoom In / All Levels operation. Note that level 0 values of product ARE visible:
Now I re-connect to the database via Smartview using the new markcann user and repeat the orientation / Zoom actions. Note that the level 0 values are no longer visible:
When actually drilling (zooming) from Smartview, double-clicking on (e.g.) Bread Clubs results in no change to the report, exactly as if it was the bottom level of the hierarchy.
Obviously, in a real system you would not look to set up access like this on a user-by-user basis. One approach may be to add access-based groups (UserGroup1, UserGroup2, etc.) to the BIConsumers (or indeed, BIAuthors) group, then assign the relevant filter (or filters – Essbase under FMW recognises multiple filters for a user, which is new) to those groups. Provisioning (access to functions) would then be controlled at the standard group level, and scoping (preventing access to data) controlled by the custom groups.
So – a bit of a different process than we’re used to with Shared Services, but it has the advantage of using the same users, groups, application roles and application policies as the rest of OBIEE. Remember also to ensure that any connection from OBIEE’s repository to the Essbase database uses the actual users’ credentials (i.e. :USER and :PASSWORD, or the CSS Token feature), otherwise the filters won’t get applied to the users’ analyses and reports.
Conclusions on Storing Detail-Level Data in Oracle Exalytics
Last week I wrote a blog post on the topic of storing lots of detail-level data in Oracle Exalytics, not the primary use-case for the product but something we’re increasingly asked about. Most implementations of Exalytics use OBIEE’s Summary Advisor to recommend aggregations against large, data warehouse data sets, with those aggregates then getting stored in TimesTen, but some customers are looking at their data set, seeing that it’s well under 1 or 2TB in size, and wondering whether it’s possible to put the whole dataset in-memory, rather than just specific aggregations.
My initial conclusion on writing the article was that TimesTen, although fast for small datasets, wasn’t particularly suited to querying large, sparse datasets because it lacked common VLDB features such as parallel query and partitioning. Instead, I recommended that readers look at the other database engine that comes with Exalytics – Essbase, and in particular the Aggregate Storage Option, as an alternative way of storing this volume of data.
And Essbase ASO performed very well at the task, rapidly loading 130m source rows of data into an ASO database, and then responding to queries typically within a couple of seconds. The source Oracle database took up to a minute to return queries (having disabled any materialised views in the background), and the TimesTen database performed as expected, typically taking 10 to 20 seconds to return answers. Except – having discussed the results with the TimesTen development team over the weekend, I’d missed out a crucial step, by not running TimesTen’s Index Advisor after some queries had run, to create indexes that suited the dashboards I was analysing. In yesterday’s post, then, I ran the Index Advisor on the two TimesTen databases, and as Oracle advised, the results were much, much better – in fact, faster than the Essbase ASO database that I’d been singing the praises of in the previous post.
But of course, now things aren’t being fair on Essbase, as there’s a similar step that you can perform on Essbase ASO databases: running an advisor that recommends what are called “aggregate views”, similar to Oracle database materialised views, based on the structure of your cube and the workload on your system. And there’s also the Summary Advisor that I can run against the original Oracle source database, which should make certain queries against that source run faster. So which one is best?
Before we get into any more testing though, it’s worth thinking about why we’re looking at this scenario, and what Exalytics is designed for – at least, the OBIEE parts of Exalytics. When Exalytics was put together, Oracle’s vision for the product was as a complement to technologies such as Exadata and Big Data Appliance, where the reporting dataset could typically be many, many terabytes in size. As such, you’re never going to get all of the data into the 1TB (now 2TB) of RAM that’s in the Exalytics server, so you’ve got two choices: either you take a detail-level subset of the entire dataset and put it into RAM (the SAP HANA approach, perhaps) or you take a slice of it – say, just the aggregates – and put those into memory instead. Going down the latter route means that Exalytics can work with source datasets many times the size of Exalytics’ memory, and the approach fits with the general way that data warehouse systems are optimised, by pre-aggregating and pre-caching data.
Some customers don’t fit this model, though, but still want to take advantage of the power of Exalytics. Some might buy Exalytics to give them their own, departmental analytics server, in the same way that companies used to buy Hyperion Essbase or Oracle Express to be able to “own” their own analysis server. In this case, it’s reasonable that they might want to put all the data required for analysis “in the Exalytics server”, but this isn’t a scenario catered for well by the Summary Advisor. Others might just have a reporting dataset that’s within the 1 or 2TB of RAM within the Exalytics server, and just say – let’s put it all in there. Another scenario might involve copying just this month’s data into TimesTen (“hot data”), and then using a feature such as OBIEE’s LTS fragmentation to get the hot data from TimesTen, and the older data from Oracle. Whichever way, Exalytics has to be able to handle both summary-level queries and retrievals of individual sets of rows, right down to the grain of the dataset.
So the way that TimesTen, with the Index Advisor, speeds-up queries against large datasets actually serves this well – index access will suit retrieval of small sets of rows, and in fact the new indexes recommended by the Index Advisor in the example in yesterday’s post even seemed to speed up aggregations as well, which was a bonus. The equivalent advisor within Essbase concentrates on aggregations though, not single “row” retrievals, so how does that work?
Whilst you can create ASO aggregate views manually, probably the easiest place to create them is within Essbase Administration Services Console, where there’s a menu option against ASO databases for designing these aggregate views (remember, ASO databases automatically aggregate when queries run against non level-0 members, but this process is about pre-calculating more aggregates in order to speed up those queries).
Selecting this menu option brings up a wizard that lets you either have the wizard pick and materialise the aggregations, or select and materialise them individually, based either on the structure of the ASO database or a query workload captured earlier.
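The same process can also be driven from MaxL rather than the wizard. The statement below is an illustrative sketch only – the application and database name (SALESHIST.SALESHIST) is hypothetical and the 1.2 stopping factor is just an example, so check the MaxL reference for your release before using it:

execute aggregate process on database SALESHIST.SALESHIST stopping when total_size exceeds 1.2;

This performs the aggregate selection and materialisation in one go, stopping once the database has grown to 1.2 times its pre-aggregation size; the selection can also be based on captured query data if query tracking has been enabled on the database.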
Creating the aggregates, for both the Sales History and the Airline Delays datasets, was surprisingly fast, and going back to the dashboards and running the queries again, response times, as predicted, dropped.
So in fact, from a purely response-time perspective, there’s not really much in it between TimesTen (plus the Index Advisor) and Essbase ASO (and the Aggregation Design Wizard). For good measure I also created some in-memory aggregates for the Oracle source using the Summary Advisor, which recommended a couple for each, albeit with the TimesTen one coming in at a whopping 10GB.
So, to answer the question: if I want to put my entire dataset into Exalytics’ RAM, either because in total it’s less than 1TB in size or because I want all my reporting data on my own server, in-memory, which option comes out best? In truth, there’s no straight answer, with both TimesTen and Essbase ASO capable of doing the job, but each with their own limitations.
During the Essbase testing I’d been corresponding with Dan Pressman, author of the “How ASO Works and How to Design for Performance” chapter in the “Developing Essbase Applications” book, who was very bullish about Essbase Aggregate Storage Option in this type of scenario. In his opinion (and to paraphrase), Essbase ASO creates such compact databases (250MB vs. TimesTen’s 10GB) and returns both detail-level and summary data so quickly because:
- “It effectively “compresses” the metadata when it creates the bitmap (which could be described as containing an efficient surrogate key for each piece of metadata)
- The bitmap not only identifies each piece of data at the lowest level but includes the full ancestry within the hierarchy.
- The full alternate hierarchy ancestry (for all alternates) of each data item is additionally represented in alternate bitmaps, which while not physically in the data file is available to the query evaluator
- This compressed metadata is very much like a column store index. Since the full identity all of the way up the hierarchy (and on alternate hierarchies) is included all possible “additive” queries can be answered from this one compact representation.
- ANY aggregate can be generated dynamically by querying the compressed bitmap. And if it fits in RAM then ANY aggregate can be calculated at RAM speeds. There is no need to use a summary advisor and hope that you got the right Aggregates.
- Even though there is no need to use a summary advisor, one exists: the aggregation wizard. It creates summary aggregations which have all the advantages of the original metadata compression – simply limited to the upper portions of the hierarchies not summarised in the aggregation process.
- This means that an aggregation can serve many queries.”
In addition, Dan observed (correctly) that the queries I ran from the various dashboards were pretty simple, and in his words “if instead you ran a more complicated grid type query that required results from all query categories, I believe you would find Essbase would win on the UNAGGREGATED cube – as TT would need results that would have to come from a number of indexed queries that would then have to be correlated. Essbase would be able to do all the levels in a single pass of the full 900k bitmapped rows.”
But one thing that’s also worth bearing in mind is that it was a lot harder to create the Essbase ASO dataset, even using Essbase Studio, than it was to replicate the tables into TimesTen using ttImportFromOracle. It’s certainly possible to model a multiple-fact scenario in Essbase, but in practice reporting across multiple Essbase databases and modelling multiple “facts” is trickier in Essbase (within the OBIEE context), and I had to do lots of data-munging, such as prefixing member names to ensure uniqueness, that I didn’t have to do with TimesTen. That said, the out-of-the-box performance with the Essbase ASO source was a lot better than the TimesTen one, and the RPD modelling part was easier, as the import process creates all the RPD metadata from the Essbase database outline.
And all of this assumes that you can get your full dataset into Exalytics’ 1 or 2TB of RAM. With the Essbase ASO database, even with the additional aggregate views, the dataset was tiny, whereas with TimesTen the tables I imported in came to around 10GB (with compression), fine for an Exalytics server but a slight warning that we’d have to carefully manage space when loading data into the server. And this is where the regular Summary Advisor comes into its own, taking just the aggregated data and storing that, leaving the detail in the source database with Exalytics just holding the aggregates.
Unfortunately, therefore, there’s no black-and-white answer as to which technology is best when you want to store all of your reporting data, not just the aggregates, in Exalytics. The good news is that they both work (in the end), but each approach has its benefits – Essbase is easier to set up and model, gives the best initial results, but requires Essbase modelling skills and may not scale up to the point where you could reasonably model the entire BI Apps dataset in Essbase ASO, for example. TimesTen is a bit more fiddly and takes a bit of post-load tuning, but has the benefit of matching closely the structure and layout of the source Oracle database, making it perhaps a more realistic option when replicating a whole schema into RAM – as long as it all fits in.
For me though – in the short term, I’ll be presenting this along with Stewart Bryson at next week’s Enkitec Extreme Exadata Expo (E4) Conference in Dallas, Texas, as part of a wider “tips and tricks from the field” session on Monday afternoon. Beyond that, my next set of tests will be to revisit the work that Pete Scott and I did on accelerating the Oracle BI Apps dataset using Exalytics and TimesTen, this time with the Index Advisor and with the hope that we’ll finally hit that goal of a data warehouse in-memory, with split-second response times. Watch this space, and maybe see one or two of you in Texas next week.
Optimizing TimesTen for Exalytics Queries using TimesTen’s Index Advisor
Late last week I posted an article on our blog on loading detail-level data into Exalytics; whilst the article was intended really as an introduction to Essbase Aggregate Storage Option, and how it could offer an alternative to TimesTen when storing large amounts of detail-level data, to make the case for ASO I needed to show how using TimesTen for this type of scenario might be an issue. TimesTen within the context of Exalytics is really positioned as an aggregate cache, with the Summary Advisor automatically recommending and storing aggregates in TimesTen, and this is a great use-case for it; but as I showed in the article (and as others have also evidenced) TimesTen can start to slow-down when you’ve got large amounts of data being returned by the query, as it doesn’t currently use common VLDB features like partitioning and parallel query to help speed-up queries.
But there was a crucial step I missed-out when preparing the TimesTen datasets for querying. Whilst the ttImportFromOracle utility creates indexes for the TimesTen tables it’s creating, it creates them based on the ones on the corresponding Oracle tables, which isn’t a bad place to start from, but might not suit either TimesTen (which has its own, memory-optimised indexes) or the particular queries you’re then firing at it. To take an example: in the query that I ran against the Sales History data in TimesTen in the previous article, the execution plan showed two TmpHashScan operations, like this:
Now TmpHashScan operations actually involve TimesTen creating temporary indexes in the background to speed up the query, which of course adds to the query’s execution time. If instead we’d created those indexes beforehand, we wouldn’t incur that hit during the actual query, and the response time would go down. Now, to be fair to the Oracle database source and the Essbase ASO source, there are query optimisation steps that we could perform for those too, but let’s look at what happens if we run an index optimisation process on the TimesTen datasets – Sales History, with around 900k rows of fact table data, and Flight Delays, with around 130m rows of data – as analysed in the two dashboards below (and notice how the Flight Delays one even has an analysis that times-out, such is the performance issue with the data “as is”):
Looking at the current response times of the analyses on these two dashboards, you can see that the Sales History queries return data instantaneously, whereas the Airline Delays ones have much longer response times – anything from 15 seconds on average up to a minute – and that minute is the query timeout.
Now there’s actually a utility within TimesTen that helps with index optimisation called the “Index Advisor”. How this works is that you enable it via the ttisql command-line utility, then run a bunch of queries to generate a workload, then run the utility again to generate recommendations based on table scans, joins, sorts and grouping operations within the queries. When capturing the workload, you can either do so for just that ttisql session, or for the whole database, which would make more sense in an Exalytics-type environment. So let’s enable this utility, switch over to the dashboard and run some queries, and see what comes out.
C:\Users\Administrator>ttisql "DSN=tt_exalytics_3"
Copyright (c) 1996, 2013, Oracle and/or its affiliates. All rights reserved. Type ? or "help" for help, type "exit" to quit ttIsql.
connect "DSN=tt_exalytics_3"; Connection successful: DSN=tt_exalytics_3;UID=Administrator;DataStore=C:\TimesTen\tt_data_3;DatabaseCharacterSet=AL32UTF8;ConnectionCharacterSet=US7ASCII;DRIVER=C:\TimesTen\TT1122~2\bin\ttdv1122.dll;LogDir=C:\TimesTen\tt_data_3;PermSize=9000;TempSize=2000;LockWait=600010.0;SQLQueryTimeout=600000;TypeMode=0;QueryThreshold=600000;PLSCOPE_SETTINGS=IDENTIFIERS:NONE;RangeIndexType=1;(Default setting AutoCommit=1)
Command> call ttindexAdviceCaptureStart(1,0);
I then run some queries on the TimesTen Sales History dashboard, filtering on year and channel class, and then after a while go back to ttisql to stop the capture process, and then output the index advice.
Command> call ttindexAdviceCaptureEnd(1);
Command> call ttindexAdviceCaptureOutput(1);
< 33, create hash index SALES_i6 on SH.SALES(CHANNEL_ID); >
< 2, create hash index SALES_i10 on SH.SALES(PROD_ID); >
< 1, create unique hash index SALES_i12 on SH.SALES(TIME_ID); >
< 1, create hash index SALES_i12 on SH.SALES(TIME_ID); >
< 16, create hash index PRODUCTS_i7 on SH.PRODUCTS(PROD_ID,PROD_CATEGORY); >
< 1, create hash index PRODUCTS_i13 on SH.PRODUCTS(PROD_ID,PROD_SUBCATEGORY); >
< 35, create hash index TIMES_i8 on SH.TIMES(TIME_ID,CALENDAR_YEAR); >
< 1, create index TIMES_i11 on SH.TIMES(CALENDAR_YEAR); >
< 8, create unique hash index CUSTOMERS_i9 on SH.CUSTOMERS(CUST_ID); >
< 1, create hash index CUSTOMERS_i9 on SH.CUSTOMERS(CUST_ID); >
10 rows found.
Command>
I then do the same whilst running some queries against the Airline Delays (130m+ rows) dataset, and get the following output:
Command> call ttindexAdviceCaptureStart(1,0);
Command> call ttindexAdviceCaptureEnd(1);
Command> call ttindexAdviceCaptureOutput(1);
< 20, create index OBIEE_GEO_ORIG_i14 on BI_AIRLINES.OBIEE_GEO_ORIG(AIRPORT); >
< 12, create hash index OBIEE_GEO_ORIG_i19 on BI_AIRLINES.OBIEE_GEO_ORIG(AIRPORT); >
< 8, create unique index PERFORMANCE_VIEW_i15 on BI_AIRLINES.PERFORMANCE_VIEW(DEST); >
< 1, create index PERFORMANCE_VIEW_i15 on BI_AIRLINES.PERFORMANCE_VIEW(DEST); >
< 20, create unique hash index PERFORMANCE_VIEW_i16 on BI_AIRLINES.PERFORMANCE_VIEW(DEST); >
< 1, create hash index PERFORMANCE_VIEW_i16 on BI_AIRLINES.PERFORMANCE_VIEW(DEST); >
< 13, create unique hash index UNIQUE_CARRIERS_i17 on BI_AIRLINES.UNIQUE_CARRIERS(CODE); >
< 1, create hash index UNIQUE_CARRIERS_i17 on BI_AIRLINES.UNIQUE_CARRIERS(CODE); >
< 4, create unique hash index OBIEE_TIME_DAY_D_i18 on BI_AIRLINES.OBIEE_TIME_DAY_D(CALENDAR_DATE,CAL_YEAR); >
< 10, create hash index OBIEE_TIME_DAY_D_i18 on BI_AIRLINES.OBIEE_TIME_DAY_D(CALENDAR_DATE,CAL_YEAR); >
< 5, create unique hash index OBIEE_TIME_DAY_D_i20 on BI_AIRLINES.OBIEE_TIME_DAY_D(CALENDAR_DATE); >
< 1, create hash index OBIEE_TIME_DAY_D_i20 on BI_AIRLINES.OBIEE_TIME_DAY_D(CALENDAR_DATE); >
12 rows found.
So that’s 10 indexes recommended for the Sales History dataset, and 12 recommended for the Flight Delays one. I therefore go back to the ttisql command line and execute each of the index creation commands, and then run the two dashboards again. And the performance improvement is significant:
Oracle’s recommendation is to run this process over and over, as more recommendations could come out on subsequent passes, so I repeat the process to see if this is the case.
Command> call ttindexAdviceCaptureStart(1,0);
Command> call ttindexAdviceCaptureEnd(1);
Command> call ttindexAdviceCaptureOutput(0);
0 rows found.
So no more recommendations then, and a much-improved query response time from TimesTen. But how does this stack-up against the plain Oracle database source, and Essbase ASO? For the Sales History queries, all sources perform well now, with nothing really in it between the three sources, except that the Essbase source is now actually the slowest (if only by half a second or so).
With the Airline Delays dashboards though, TimesTen is now a winner, with sub-second response times even across 130m+ rows, compared to many seconds for the other sources.
I therefore stand corrected on TimesTen not being suitable for querying large, sparse datasets, AS LONG as you run the Index Advisor after you’ve run a representative workload (and I even forgot to re-gather stats after generating the new indexes, which would probably have improved things even more). So, on that basis, do I take back my recommendation to consider Essbase ASO for these large, sparse datasets? Well, to be honest – I’ve not done any optimisation on Essbase yet, or used the Summary Advisor on the plain Oracle source, so it looks like there is one more round of testing to do before coming to some final conclusions. Check back tomorrow for the final post in this series.
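As a footnote to the stats comment above: re-gathering optimizer statistics after creating the new indexes would just have been a call to TimesTen’s ttOptUpdateStats built-in procedure for each affected table from within ttisql, along the lines of the illustrative calls below (the second parameter invalidates existing prepared statements so that they get re-optimised against the new stats).

Command> call ttOptUpdateStats('SH.SALES', 1);
Command> call ttOptUpdateStats('BI_AIRLINES.PERFORMANCE_VIEW', 1);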
Storing Detail-Level Data in Oracle Exalytics
Although the primary use-case for TimesTen within Oracle Exalytics is to store relatively-small aggregate tables created using the Summary Advisor, many customers have asked us whether it’s possible to store their entire reporting dataset in TimesTen. With 1TB of RAM in the original version of the Exalytics hardware, and now 2TB of RAM in the new X3-4 version, even with the requirement to set aside space in RAM for TimesTen’s temporary objects, many data warehouses or data marts would fit in 500GB or 1TB of RAM, which makes the idea pretty interesting.
If you’ve read blogs from us in the past though, you may have picked-up on comments about TimesTen not being a great place to store large tables, as it lacks features such as table partitioning, parallel query and other VLDB essentials we’re used to with the Oracle database. Even though TimesTen runs in-memory, there’s still a cost to table scans, joins, row lookups and so on, and in some cases this can make querying a TimesTen database actually slower than querying an equivalent Oracle one. So where’s the tipping point for this, and is it possible to put an entire data warehouse in-memory, such as the BI Apps data warehouse that Peter Scott talked about in a blog post series a few months ago?
This is actually something we’ve been looking at fairly intensely over the past few weeks, as we’re getting this request from customers looking to maximise the return on their investment in Exalytics. Where it gets particularly interesting is that there’s also a second option for storing these large, detail-level datasets in Exalytics – the Essbase Aggregate Storage Option (ASO), an alternative to the more traditional Block Storage Option (BSO) that’s capable of storing and analyzing very large, highly dimensional sparse datasets, and which, with the 11.1.1.7 release of OBIEE 11g, comes pre-installed and pre-integrated with OBIEE. So, if a customer wants to store their entire data warehouse or data mart on an Exalytics server, ideally in-memory, but either way taking advantage of the high-performance Exalytics hardware and its close integration with Oracle’s BI and OLAP software, how do the options work out on a sample dataset?
Let’s start off with the SH (Sales History) sample dataset that comes with all recent Oracle databases, made up of a couple of fact tables and a set of related dimension tables. To take one of the fact tables, SALES, the table contains just over 900k rows, is partitioned by year, and out-of-the-box has a bunch of materialised views and indexes to support it and speed up user queries. Disabling the MVs for the time being and importing the tables into the BI Repository, my initial physical model looks like this:
Running a sample set of reports with caching turned-off, response time against this Oracle source, even with no MVs, was pretty good, with no noticeable delay – of course this is only a single user on an unused Exalytics server, but on this size of data response time is perfectly acceptable.
Looking at the physical SQL for a typical one of the analyses, you can see hash joins and partition elimination taking place, standard features within an Oracle data warehouse with partitioned tables.
SQL> explain plan for
2 select sum(T43903.QUANTITY_SOLD) as c1,
3 sum(T43903.AMOUNT_SOLD) as c2,
4 T43814.CHANNEL_DESC as c3,
5 T43841.CUST_STATE_PROVINCE as c4,
6 T43866.PROD_CATEGORY_DESC as c5
7 from
8 SH.CHANNELS T43814,
9 SH.PRODUCTS T43866,
10 SH.TIMES T43911,
11 SH.CUSTOMERS T43841,
12 SH.SALES T43903
13 where ( T43814.CHANNEL_ID = T43903.CHANNEL_ID and T43841.CUST_ID = T43903.CUST_ID and T43866.PROD_ID = T43903.PROD_ID
and T43903.TIME_ID = T43911.TIME_ID and T43903.TIME_ID = TO_DATE('1998-05-03 00:00:00' , 'YYYY-MM-DD HH24:MI:SS')
and T43911.TIME_ID = TO_DATE('1998-05-03 00:00:00' , 'YYYY-MM-DD HH24:MI:SS') )
14 group by T43814.CHANNEL_DESC, T43841.CUST_STATE_PROVINCE, T43866.PROD_CATEGORY_DESC;
Explained.
SQL> set lines 130
SQL> set pages 0
SQL> select * from table(dbms_xplan.display);
Plan hash value: 3773586640
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------
| 1 | HASH GROUP BY | | 2 | 172 | 21 (10)| 00:00:01 | | |
| 2 | NESTED LOOPS | | | | | | | |
| 3 | NESTED LOOPS | | 2 | 172 | 20 (5)| 00:00:01 | | |
| 4 | NESTED LOOPS | | 2 | 140 | 18 (6)| 00:00:01 | | |
| 5 | NESTED LOOPS | | 2 | 98 | 16 (7)| 00:00:01 | | |
|* 6 | HASH JOIN | | 2 | 72 | 14 (8)| 00:00:01 | | |
|* 7 | INDEX UNIQUE SCAN | TIMES_PK | 1 | 8 | 1 (0)| 00:00:01 | | |
| 8 | PARTITION RANGE SINGLE | | 98 | 2744 | 12 (0)| 00:00:01 | 6 | 6 |
| 9 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES | 98 | 2744 | 12 (0)| 00:00:01 | 6 | 6 |
| 10 | BITMAP CONVERSION TO ROWIDS | | | | | | | |
|* 11 | BITMAP INDEX SINGLE VALUE | SALES_TIME_BIX | | | | | 6 | 6 |
| 12 | TABLE ACCESS BY INDEX ROWID | CHANNELS | 1 | 13 | 1 (0)| 00:00:01 | | |
|* 13 | INDEX UNIQUE SCAN | CHANNELS_PK | 1 | | 0 (0)| 00:00:01 | | |
| 14 | TABLE ACCESS BY INDEX ROWID | PRODUCTS | 1 | 21 | 1 (0)| 00:00:01 | | |
|* 15 | INDEX UNIQUE SCAN | PRODUCTS_PK | 1 | | 0 (0)| 00:00:01 | | |
|* 16 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 1 | | 0 (0)| 00:00:01 | | |
| 17 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS | 1 | 16 | 1 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("T43903"."TIME_ID"="T43911"."TIME_ID")
7 - access("T43911"."TIME_ID"=TO_DATE(' 1998-05-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
11 - access("T43903"."TIME_ID"=TO_DATE(' 1998-05-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
13 - access("T43814"."CHANNEL_ID"="T43903"."CHANNEL_ID")
15 - access("T43866"."PROD_ID"="T43903"."PROD_ID")
16 - access("T43841"."CUST_ID"="T43903"."CUST_ID")
Let’s try now moving those tables, in their entirety, into TimesTen for Exalytics 11.1.2.2.5.0, the version of TimesTen that ships with Exalytics 1.0.0.3. I could use a tool like ODI or GoldenGate to transport the data into the TimesTen database, but instead I’ll use a new utility that comes with this release of TimesTen called ttImportFromOracle, which connects TimesTen to an Oracle database, creates TimesTen tables to reflect the Oracle table structures (but using more optimal TimesTen datatypes), and can take advantage of TimesTen for Exalytics’ column-based compression, potentially reducing the space used to store table data by up to a factor of 5.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\Administrator>cd c:\temp\tt
c:\TEMP\tt>mkdir tt_sh
c:\TEMP\tt>cd tt_sh
c:\TEMP\tt\tt_sh>C:\TimesTen\tt1122_64_3\support\ttImportFromOracle.exe
-oraConn sh/password@orcl -compression 1 -tables sales promotions products customers channels times
Beginning processing
Resolving any tablename wildcards
Eliminating any duplicate tables
Getting metadata from source
Generating database user list
Assigning TimesTen datatypes
Analyzing source tables (this may take some time)
Analyzing table 'SH.SALES' ...
Analyzing table 'SH.PROMOTIONS' ...
Analyzing table 'SH.PRODUCTS' ...
Analyzing table 'SH.CUSTOMERS' ...
Analyzing table 'SH.CHANNELS' ...
Analyzing table 'SH.TIMES' ...
Estimating compression ratios
Generating output files
Finished processing
Using the utility gives me a set of scripts for creating a corresponding TimesTen user for the Oracle database user; scripts to create the tables and indexes, load data from Oracle into the TimesTen tables, and then gather stats on the resulting TimesTen database (note that earlier releases of TimesTen don’t come with this utility as standard, and the 11.1.2.2.4.x version’s utility can’t load into compressed TimesTen tables). Using the ttisql command-line interface to the TimesTen server, I therefore run the scripts, and then check out the resulting tables using SQL*Developer.
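Running the generated scripts is just a case of pointing ttIsql at them with the -f option, something like the sketch below; note that the script file names here are illustrative (the names ttImportFromOracle generates may differ slightly), and the DSN is the one from my environment.

c:\TEMP\tt\tt_sh>ttIsql -connStr "DSN=tt_exalytics_3" -f CreateUsers.sql
c:\TEMP\tt\tt_sh>ttIsql -connStr "DSN=tt_exalytics_3" -f CreateTables.sql
c:\TEMP\tt\tt_sh>ttIsql -connStr "DSN=tt_exalytics_3" -f LoadData.sql
c:\TEMP\tt\tt_sh>ttIsql -connStr "DSN=tt_exalytics_3" -f CreateIndexes.sql
c:\TEMP\tt\tt_sh>ttIsql -connStr "DSN=tt_exalytics_3" -f UpdateStats.sql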
So that’s 900K+ rows loaded into TimesTen, using compression, and easily fitting into my 1TB of RAM. Next step is to model the new tables in the RPD, like this:
and then create the same set of reports. Running them feels roughly the same as the Oracle ones; maybe ever-so-slightly slower, but nothing really noticeable. And again – this is on a single-user system, whereas TimesTen really comes into its own with lots of concurrent users. Taking the TimesTen SQL query for the analysis equivalent to the Oracle one we explain-planned earlier on, you can see though that there’s no partition elimination going on, something that’s going to hit TimesTen if we get noticeably bigger tables.
But overall, it’s on a par. The Oracle source performs fine, and so does TimesTen. But this is only just under a million rows of data – what if we loaded a larger dataset, for example the Airline Delays dataset used for Exalytics demos, that has around 130 million rows in the main fact table? Loading the main fact table plus a handful of dimensions into TimesTen using compression takes up just under 8GB of RAM, so we’re OK there, but what about response times? Let’s start by trying out a few queries against the source Oracle database, on its own. The report below, summarising flights from SFO in 1994 by carrier and quarter, took around 4 seconds to run against my Oracle database source – again, not bad considering it had to sum-up 130m rows of data.
Running the equivalent report against the TimesTen source took about double the time though, about 6 to 8 seconds; not bad considering there’s no partitioning or PQ, but there’s still indexes, and the dataset is in-memory of course. But it’s still 8G of RAM to store it, and it’s not “speed of thought”, if you know what I mean. So how about storing it in an Essbase ASO database then – how will that work? And how do you go about creating an Essbase ASO database?
UPDATE 1st August 2013: I’ve since discovered that there was a crucial, post-load step that I didn’t perform with these TimesTen tables – running the Index Advisor. See this follow-up blog post on how the Index Advisor brought the TimesTen query response times down dramatically, and the final, concluding post where I discuss my overall findings.
There’s actually a number of ways that you could turn a relational star schema into an Essbase ASO database, including the new cube spin-off feature I blogged about the other week, but building the cube automatically assumes that your data is all lined-up correctly, and with Essbase there’s lots of ways your source data can trip you up; for example, by default every dimension member ID and alias (description) needs to be unique, not only within a level, or even within a particular dimension, but in fact across all dimensions in the cube (database). Because of this, you often need to pre-process your incoming data to make dimension members unique in this way, and my preferred way to do this is through Essbase Studio, bundled and pre-integrated with Exalytics 1.0.0.3 and OBIEE 11.1.1.7.
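As a simple illustration of the kind of pre-processing involved, the hypothetical view below prefixes product member names, and their parents, so they can’t collide with members from other dimensions – the view and column names are made up for the example rather than taken from the actual datasets used here:

create or replace view stage_product_dim as
select 'PROD_'    || prod_id             as member_name,
       prod_desc                         as member_alias,
       'PRODCAT_' || prod_category_id    as parent_name
from   source_products;

Essbase Studio can then be pointed at views like this rather than the raw source tables, keeping the munging in the database rather than in the cube build.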
The resulting ASO database was only about 24MB in size (just the level-0 members, no aggregate views), and I then brought it into the BI Repository in a similar way to the TimesTen database.
Running reports and analyses, though, was a revelation. Whatever query I threw at it, however deep I drilled into the dataset, response times were instant, and that was even without adding any aggregate views (equivalent to materialised views with Oracle). Not a bad start, but what about doing the same with the Airlines dataset, 130m+ rows of flight data across many years and US states?
Well surprisingly, the 130m rows loaded in about a minute or so, and the resulting Essbase ASO database was only around 250MB in size, as opposed to the 8GB required by TimesTen for the same dataset. Even more surprisingly, all the data went in, and response time was either instant, or a second or so – noticeably faster than the TimesTen dataset, or even the Oracle dataset. So why is this, and how can Essbase ASO (which resides on disk) be faster than TimesTen (which runs in-memory)?
There’s actually a couple of reasons, and some important caveats too. As to why ASO is so fast – it’s designed to be fast, with Hyperion reacting to competition back in the early 2000s from the likes of Microsoft, as well as Oracle with Oracle OLAP’s compressed composites, and it’s designed to store lots of detail-level data, across large numbers of dimensions, and aggregate it quickly. Contrast this with TimesTen, which started off as an OLTP database and is only now getting the sorts of analytic and VLDB features you’d expect from databases of this type.
But there’s some important limitations that you need to be aware of, if you’re thinking about moving an entire EBS-style database into Essbase ASO. First of all – you need to know Essbase, and you need to know Essbase’s data rule limitations, and you also need to consider what happens when users want to query across multiple facts using conformed dimensions. Now you can do all of that using Essbase and techniques such as cube partitioning, dummy dimensions and so on, but it’s not as simple as creating the equivalent TimesTen database tables, or even better, using the Summary Advisor to recommend aggregates based on usage patterns.
But if you’re set on caching your entire detail-level dataset into Exalytics, think about using Essbase ASO as your storage engine, as it’s designed from the ground-up to quickly analyse and aggregate large, sparse, datasets. For source data around the size of the SH Sales History star schema, there’s not much in it, but for datasets of the size of Exalytics’ Airline Delays dataset, you’ll most probably notice the difference immediately.
Update 29th July 2013: The statement about Exalytics’ memory-to-CPU bandwidth being less than it’s disk-to-CPU bandwidth was incorrect and was removed. Exalytics’ memory bandwidth is actually (max) around 90GB/sec compared to the disk controller’s 16-32GB/sec, and should therefore not be an impediment to TimesTen’s performance. Apologies for any confusion caused.
Update 1st August 2013: See this blog post where I followed-up the initial tests with some further optimisation of the TimesTen database, and this final post where I further optimised the Oracle, and Essbase ASO data sources, and came up with some final conclusions on how to store detail-level data in Exalytics’ memory.
Rittman Mead BI Masterclass Tour: India, October 2013 – Register Now!
I’m very pleased to announce a one-off Rittman Mead BI Masterclass Tour that we’re organising, in partnership with ODTUG, to take place in India in October 2013. Along with the US and UK, India has the largest number of readers of our blog, and so we thought it’d be nice to put together an event especially for our Indian readers and run it in conjunction with our favourite international user group, ODTUG.
The event is taking place over three cities – Bangalore, Hyderabad and Mumbai – on the week commencing 13th October 2013. We’ll go to Bangalore on Tuesday 15th October, Hyderabad on Thursday 17th October, and then fly up to Mumbai for Saturday, 19th October 2013. Joining me will be Venkatakrishnan J, who you’ll all know from the OBIEE and EPM posts on our blog, and Stewart Bryson, Managing Director for Rittman Mead in the US and also a prolific blogger, writer and presenter on OBIEE, DW, data integration and data modelling. We’ll be bringing the best of our recent presentations, including new content from Oracle Openworld, and we’ll leave plenty of time for networking, introductions, questions and deep-dive discussions, with topics covering the following Oracle products:
- Oracle BI Enterprise Edition 11.1.1.7 development, administration and new features
- Oracle BI Applications 11.1.1.7.1 configuration and development
- Oracle EPM Suite 11.1.2.3+
- Oracle Exalytics In-Memory Machine
- Oracle Data Integrator
Stewart, Venkat and I will deliver a number of sessions over each day, with the planned agenda as follows:
- 9.30am – 10.00am: Registration and Welcome
- 10.00am – 10.30am: Oracle BI, Analytics and EPM Product Update – Mark Rittman
- 10.30am – 11.30am: Reporting against Transactional Schemas using OBIEE11g – Stewart Bryson
- 11.30am – 12.30pm: OBIEE 11g Integration with the Oracle EPM Stack – Venkatakrishnan J
- 12.30pm – 1.30pm: Lunch & Networking
- 1.30pm – 2.30pm: OBIEE and Essbase on Exalytics Development & Deployment Best Practices – Mark Rittman
- 2.30pm – 3.30pm: Oracle BI Multi-user Development: MDS XML versus MUDE – Stewart Bryson
- 3.30pm – 4.00pm: Coffee Break & Networking
- 4.00pm – 5.00pm: Intro and tech deep dive into BI Apps 11g + ODI
- 5.00pm – 6.00pm: Metadata & Data loads to EPM using Oracle Data Integrator - Venkatakrishnan J
The dates, locations and registration links for the three events are as follows:
- Bangalore, Fortune Select Trinity Hotel, Whitefield: October 15th 2013, 10am – 6pm (IST)
- Hyderabad, Westin Mindspace, Hyderabad, Hitec City: October 17th 2013, 10am – 6pm (IST)
- Mumbai, Courtyard Marriott, Mumbai: October 19th 2013, 10am – 6pm (IST)
We’re also investigating the idea of bringing our Rittman Mead BI Forum to India in 2014, so this would be a good opportunity to introduce yourself to us and the other attendees if you’d like to present at that event, and generally let us know what you’re doing with Oracle’s BI, EPM, analytics and data warehousing tools. There’ll also be lots of ODTUG goodies and giveaways, and a social event in the evening after the main masterclass finishes. The last events we ran in India were back in 2010 for the launch of OBIEE 11g, each of which was a sell-out and great fun. Hopefully some of you who came in 2010 can join us again in 2013, when we can look at how Oracle’s BI products have progressed since then, and see what’s planned and announced at Openworld 2013.
Numbers are strictly limited, so register now using one of the three links above, or contact either Mark Rittman (mark.rittman@rittmanmead.com) if you have any other questions about the agenda, or Venkatakrishnan J (venkat@rittmanmead.com) for questions about registration and the venues. Note that this is a lecture-based format and there are no hands-on labs, although you are welcome to bring your laptops with the software installed if you would like to try out any of the techniques we describe, or discuss any aspect of Oracle BI development and administration. Hopefully we’ll see you in India in October 2013, on one of the three days!