Tag Archives: Oracle BI Suite EE
Simple Ways to Simplify: Quick Fixes to Enhance OBIEE Visuals
Over the past couple of months I’ve worked with a few clients to assess their dashboard and analysis design. In most cases I noticed that the dashboards were overflowing with content, to the point that there was no discernible “story” or “flow” to the information. Most of the dashboards were an overwhelming hub of large tables, excess dashboard prompts, and complicated charts and graphs. In many cases the client’s complaint was that the dashboards were not being fully adopted by the user base. The users knew that something was in fact wrong with what they were looking at, but just couldn’t put their finger on why the analysis process was so frustrating. While this may seem unavoidable when you are trying to provide your users with what they want and “need,” there are a few simple ways to address the issue. During the development process, simplicity is often overlooked and undervalued, and that is why a few key design principles can drastically improve the user experience.
In many cases that I’ve seen, dashboards start with the good intention of quickly relaying information to the user. However, as the requirements process grows in length, more and more compromises are made to clean, consistent, functional design. Many of the dashboards in my assessments hand the user everything they ask for without ever questioning the value, and developers often settle on poor design just to have something accepted by their users and released into production.
While each client is unique and has their own set of issues to resolve, there are a few consistent principles of dashboard design that can be applied to everyone.
Top-Down Analysis – The practice of allowing users to drill from summary to detail gives your user community the ability to make their dashboards as dynamic as they need to be. This method never gives the user too much, or too little, information. With detail being a choice, not the default, the user can log in and choose to drill into further detail, rather than being presented with everything all at once. While this is a widely accepted best practice when it comes to dashboard design, this principle is ignored more than you would think. Odds are that your VP does not need to see 50 rows of detail-level information in a table every time they view their dashboard. The primary purpose of presenting that detail level is to answer a very important question that should be posed by analyzing your data at the summary level. The user can examine something simple like a trend analysis, and then decide whether further examination is needed. The benefit of this common principle is that your user is never overwhelmed and irritated by an overload of information. The dashboard should be treated with the same care and consideration you would give any other product. Your organization’s dashboard (product) should be a joy to use, not an experience coupled with frustration.
Simplicity and Trimming the Fat – This is another very simple, yet often ignored, design principle. From my observation, many developers will create a chart or graph and leave all the default settings – no modifications needed, right? While there is nothing inherently wrong with this, the defaults leave a lot of extra pixels on the graph, such as unneeded axis titles, shadowing, canvas size, etc. With just a little effort here, you can remove all of these unnecessary pixels (pictured below) and provide a more professional, clean design. The point that I’m trying to make here is, let’s not get lazy with our visualizations. Instead, we should give a lot of thought to what is useful in the visualization, and what can be discarded without hindering the message. The less cluttered we make our visualizations, the more pleasant the user experience will be. And as we all know, the happier your user community is, the easier your life as a developer will be.
Options for Options’ Sake – The types of visualizations used for a dashboard are one of the most important criteria for user adoption. Choosing visualizations that do not adequately display the type of analysis needed, or that do not tell the correct story about the data, can be a frustrating waste of time for the user. Just because there are a lot of graphing options available in your analytical tool does not mean they all need to be used. This mistake is often made in an attempt to visually enhance the dashboard, or to add “variety”. Try to consider things like what type of scale is in the graph (nominal or interval, pictured below), or whether you want to provide summary or detail. Be sure to choose your graph based on these factors, rather than picking a graph that you think will add variety and then figuring out how you can make it useful.
Visually appealing dashboards are important, however this is only relevant when the graphs are enhancing the user’s analytical experience. These mistakes are very costly because the overall goal of a dashboard is not to provide variety for variety’s sake, but to quickly and accurately relay a message. By focusing only on visualization variety, we run a terrible risk of rendering a dashboard useless.
There are a lot of great resources out there that provide more detail and can surely take your dashboards to the next level, so I certainly suggest reading up on information design methodology. I think the principles I’ve listed above are a great way to get started and provide some quick fixes on the road to enhancing the user experience within your organization.
Going Beyond the Summary Advisor with TimesTen for Exalytics
I’m over in Seattle at the moment for ODTUG KScope’14, and one of the topics I’m presenting on this week is the use of TimesTen with OBIEE; what I’m going to talk about is taking TimesTen beyond its current use as an in-memory cache for aggregates created by the Summary Advisor, and instead using it to store additional custom aggregates, “hot” data from the source data warehouse, and other custom structures created using SQL*Developer and potentially loaded using Oracle Data Integrator. The point of the session is to show one of the main benefits of TimesTen even in the world of the new In-Memory Option for Oracle Database – it’s a flexible but Oracle-compatible database that’s typically under the control of the BI team and open to much more customisation than most customers realise.
To illustrate the point, I’m going to run through an example using the Oracle Database Sales History (SH) schema as the source in this article, with the example going through five stages of development:
1. First, I’ll load TimesTen for Exalytics in the usual way, by having the Summary Advisor run some recommendations and then generate aggregates to load into TimesTen.
2. Then, I’ll add some of my own aggregates into TimesTen, by using the Aggregate Persistence Wizard, so that we’ve now got a base set of aggregates from me (ones that I know we’ll need) plus whatever the Summary Advisor recommends based on historic query patterns.
3. Next, I’ll use TimesTen to hold what we refer to as “hot data” – the actual transactional data from the source database, in this case from the last six months. So now, TimesTen will contain both the full transaction-level data that users are most likely to query at detail-level, and aggregations of the full dataset over time, giving us even more chance that TimesTen will be able to fulfil a user’s query.
4. Then I’ll supplement this data in TimesTen with some additional reference data, that I’ll bring in from file using a tool like Oracle Data Integrator or the Oracle Database external table feature, modifying the existing TimesTen table structure using SQL*Developer to hold these additional dimension attributes.
5. Finally, I’ll set up incremental refresh of all of this, in order to avoid the full table-drop-and-reload approach that the Summary Advisor and Aggregate Persistence Wizard use, and to make data refreshing more efficient.
Let’s start with steps 1 and 2, creating the aggregate layer for the underlying Sales History Oracle data source.
1. Creating Initial Aggregates using the Summary Advisor
We can use OBIEE’s Summary Advisor, and the underlying Aggregate Persistence mechanism that the Summary Advisor uses, to build the in-memory aggregates for our underlying datasource. Starting with the simplest part first, we’ll use the Summary Advisor to create a set of aggregates based off of recommendations from historic query data.
Running the Summary Advisor in my case brings up a recommendation to create three aggregates:
Running the script that the Summary Advisor generates brings up one of the issues you’re likely to hit when using this tool, though – your RPD has to be absolutely “rock solid” in terms of correctness, otherwise your aggregate build will fail. In the case of this example, the Summary Advisor aggregate persistence script starts off running OK, but then errors at the end when one of the aggregate tables fails to build:
c:\Middleware\Oracle_BI1\bifoundation\server\bin>nqcmd -d coreapplication_OH79936239 -u weblogic -p welcome1 -s c:\ttscripts\summ_adv_SH_full_blog.sql

-------------------------------------------------------------------------------
Oracle BI ODBC Client
Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------

create aggregates
"ag_561184940"
for "Sales History (Oracle + TT)"."SALES"("AMOUNT_SOLD","QUANTITY_SOLD")
at levels ("Sales History (Oracle + TT)"."CustomerDim"."City", "Sales History (Oracle + TT)"."TimesDim"."Month", "Sales History (Oracle + TT)"."ProductsDim"."Category")
using connection pool "tt_aggr_store"."Connection Pool"
in "tt_aggr_store".."EXALYTICS",
…
"ag_1162984574"
for "Sales History (Oracle + TT)"."SALES"("AMOUNT_SOLD","QUANTITY_SOLD")
at levels ("Sales History (Oracle + TT)"."TimesDim"."Year", "Sales History (Oracle + TT)"."ProductsDim"."Category", "Sales History (Oracle + TT)"."PromoDim"."Promo Subcategory")
using connection pool "tt_aggr_store"."Connection Pool"
in "tt_aggr_store".."EXALYTICS"

[10058][State: S1000] [NQODBC] [SQL_STATE: S1000] [nQSError: 10058] A general error has occurred.
Statement preparation failed

Processed: 1 queries
Encountered 1 errors
Looking at the list of tables left in TimesTen at that point, I can see that one of the dimensions failed to build, which then cascaded down to the dependent fact table build failing:
C:\Users\Administrator>ttisql

Copyright (c) 1996, 2013, Oracle and/or its affiliates. All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.

Command> connect "DSN=tt_exalytics_3;uid=exalytics;pwd=welcome1";
Connection successful: DSN=tt_exalytics_3;UID=exalytics;DataStore=C:\TimesTen\tt_data_3;DatabaseCharacterSet=AL32UTF8;ConnectionCharacterSet=US7ASCII;DRIVER=C:\TimesTen\TT1122~2\bin\ttdv1122.dll;LogDir=C:\TimesTen\tt_data_3;PermSize=4000;TempSize=2000;LockWait=60010.0;SQLQueryTimeout=60000;TypeMode=0;QueryThreshold=60000;PLSCOPE_SETTINGS=IDENTIFIERS:NONE;RangeIndexType=1;
(Default setting AutoCommit=1)
Command> tables;
  EXALYTICS.SA_CATEGOR0000AFE8
  EXALYTICS.SA_CHANNEL0000B8F9
  EXALYTICS.SA_CITY0000AC6A
  EXALYTICS.SA_MONTH0000AC81
  EXALYTICS.SA_PROMO_S0000B8E8
  EXALYTICS.TEST
6 tables found.
Summary Advisor aggregates failing to build is something that regular Exalytics developers will be used to, at least from when they first use the tool. The trick is to make sure you run the Model Checker, within the BI Administration Tool, before you try and generate some aggregates; whilst your RPD might pass the regular consistency check and be valid as a repository that can go online and be used for reporting, it may not be consistent or logically correct from an aggregation standpoint. Looking at the NQQuery.log file, I can see that the first aggregate failed to build because aggregate persistence rejected the shared product dimension that all of the aggregate fact tables used.
Running the model checker on my business model, I can see there are two errors listed for the product dimension, and a bunch of warnings where I’ve not added dimension attribute columns to their relevant dimension level.
Running the logical SQL given in the two error messages at the start, I can see that there’s an error in the data that I’m using for my Products dimension data source, in that one or more of the dimension levels contains keys that are duplicates – in this case because there are two “Accessories” product subcategories with different IDs. I go into SQL*Developer and correct this issue, and also add the logical columns listed as “warnings” into their respective logical dimension levels, so that running the model checker again gives me this message instead:
I then pick two of the aggregate recommendations and run the script for them, so in the end I’m left with the aggregate tables below in my RPD.
Thereafter, I can run the Summary Advisor again to generate some more recommendations, or I can add my own specific aggregates into the TimesTen database using the Aggregate Persistence Wizard, for example to add an aggregate I know users are going to want to use.
2. Adding Custom Aggregates using the Aggregate Persistence Wizard
Generating aggregates using the Aggregate Persistence Wizard uses the same underlying mechanism to create and populate the aggregate tables, but in this instance I select the specific measures, dimensions and hierarchy levels I want to aggregate by. Here, I use the Aggregate Persistence Wizard to create an additional aggregate using the Promotions dimension, as I know some new reports coming along will make use of this pre-aggregation.
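The wizard’s output is the same style of “create aggregates” logical SQL script, run through nqcmd just like the Summary Advisor one; a sketch of the sort of statement it produces is below (the aggregate name and exact levels here are illustrative rather than the ones the wizard generated for me):

create aggregates
"ag_sales_promo_month"
for "Sales History (Oracle + TT)"."SALES"("AMOUNT_SOLD","QUANTITY_SOLD")
at levels ("Sales History (Oracle + TT)"."TimesDim"."Month",
"Sales History (Oracle + TT)"."PromoDim"."Promo Category")
using connection pool "tt_aggr_store"."Connection Pool"
in "tt_aggr_store".."EXALYTICS";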
So where I am at the end of all this is with a TimesTen aggregate data mart containing three aggregate fact tables – two recommended by the Summary Advisor, one I added myself – along with supporting dimension tables.
And this is great for handling dashboard queries that request summarised data, through a graph or pivot table analysis. But what about when users want to drill-down to the detail, or run reports against recent transactional activity? That’s where the second TimesTen use-case, “hot data”, comes in.
3. Populating TimesTen with Transactional “Hot Data”
So far we’ve looked at populating our TimesTen database with aggregate data, but what about making the most recent set of transactions available to users also in TimesTen, so that analyses running against recent activity even at the detail level run fast too? To do this we can use the ttImportFromOracle utility to replicate into TimesTen the Sales History schema, and then edit the data loading script it generates to only load the last six months of data.
This utility only works for Oracle sources, and ships with recent versions of TimesTen in the /support directory. Using it, I can have the utility scan the data in my source database in order to recommend the most space-efficient TimesTen datatypes, and I can also use it to recommend compression settings for use with the TimesTen for Exalytics Columnar Compression feature.
As ttImportFromOracle requires you to have a TimesTen schema with the same name as the one you’re importing from in Oracle, I create a new schema in my TimesTen for Exalytics database, ready for importing from the Oracle source:
create user sh identified by welcome1;
grant create session to sh;
grant create table to sh;
grant select on SYS.OBJ$ to sh;
grant admin to sh;
Then I can use the utility to create the DDL and data loading scripts that I’ll then use to import the hot data from Oracle.
c:\TimesTen\tt1122_64_3\support>ttImportFromOracle.exe -oraconn sh/password@orcl -tables CUSTOMERS PRODUCTS SALES CHANNELS PROMOTIONS TIMES -typeMap 2,1 -compression 1

Beginning processing
Resolving any tablename wildcards
Eliminating any duplicate tables
Getting metadata from source
Generating database user list
Assigning TimesTen datatypes
Analyzing source tables (this may take some time)
Analyzing table 'SH.CUSTOMERS' ...
Analyzing table 'SH.PRODUCTS' ...
Analyzing table 'SH.SALES' ...
Analyzing table 'SH.CHANNELS' ...
Analyzing table 'SH.PROMOTIONS' ...
Analyzing table 'SH.TIMES' ...
Optimizing TimesTen datatypes
Estimating compression ratios
Generating output files
Finished processing
This set of scripts creates the tables and indexes to hold the replicated data from Oracle, and I can edit the “LoadData.sql” script that comes with the DDL scripts to load just the last six months of data. To do this I split the “ttLoadFromOracle” TimesTen procedure calls in this script into ones for the dimensions and one for the fact table load, which after amending it to load just six months of data looks like this:
timing 1;

call ttLoadFromOracle('SH', 'SALES',
'SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID, S.CHANNEL_ID, S.PROMO_ID, S.QUANTITY_SOLD, S.AMOUNT_SOLD
FROM SH.SALES S, SH.TIMES T
WHERE S.TIME_ID = T.TIME_ID
AND T.CALENDAR_MONTH_DESC > ''2001-06''');
Then I connect to TimesTen as the SH user, providing the password to the Oracle database as part of the connect string, and then run the scripts I’ve just generated:
Command> connect "DSN=tt_exalytics_3;uid=sh;password=welcome1;oraclepwd=password";
Command> run c:\ttimportscripts\CreateTables.sql
Command> run c:\ttimportscripts\LoadData_dims.sql
Command> run c:\ttimportscripts\LoadData_fact.sql
Command> run c:\ttimportscripts\CreateIndexes.sql
Command> run c:\ttimportscripts\UpdateStats.sql
I can then go over to SQL*Developer and see the new TimesTen tables there, with the optimised datatypes provided by ttImportFromOracle:
and also import these new tables into the RPD, adding an extra schema alongside the aggregate tables:
Then, the way to make use of this six months of transactional data in TimesTen is to set up “fragmentation” in the OBIEE RPD, so that the BI Server goes to the TimesTen data source when queries require data from the past six months, and to the standard Oracle data source if data further back than that is required; in addition, because of the aggregates we’ve also set up, it’ll go to the TimesTen aggregate tables if queries request aggregated data, so TimesTen performs the role of holding both “hot” data and aggregate data.
The first step in setting this up is to map the new TimesTen tables into the existing business model as additional logical table sources. In the screenshot below, you can see the new TimesTen logical table sources added alongside the existing Oracle and TimesTen aggregate logical table sources; when you do this, make sure you remove any additional logical table keys that might come across when you map in the new TimesTen tables.
Then, for just the fact table within the logical model, edit the Oracle detail-level and the TimesTen “hot data” detail-level logical table sources and add fragmentation conditions to define what time period each source covers, like this:
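The exact expressions depend on your own logical table and column names – I’m assuming here a logical TIMES table with the CALENDAR_MONTH_DESC column used in the load script earlier – but as a rough sketch, the fragmentation content for the two fact table sources would be along these lines:

TimesTen “hot data” LTS:  "Sales History (Oracle + TT)"."TIMES"."CALENDAR_MONTH_DESC" > '2001-06'
Oracle detail-level LTS:  "Sales History (Oracle + TT)"."TIMES"."CALENDAR_MONTH_DESC" <= '2001-06'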
Do this just for the fact table logical table source; then, when a query comes through to the BI Server, if it needs to use the TimesTen fact table logical table source it’ll join out to the corresponding TimesTen logical table sources to get the rest of the required data, and when it needs to go to the Oracle fact table logical table source, it’ll join out to the Oracle LTSs, as shown in the diagram below.
Then, when you include the TimesTen aggregate tables in the architecture as well, you’ve got three potential sources of data for user reports, all handled seamlessly and automatically by the BI Server: the Oracle and TimesTen “hot” data sources for detail-level reporting, with one or the other (or both) used for queries at the transaction level, and the TimesTen aggregate tables when they would answer a query faster than rolling-up the detail-level sources on-the-fly.
4. Supplementing with Additional Reference Data
One of the advantages of using a mid-tier database like TimesTen, that’s under the control of the BI department, is that you can be a lot more “agile” in terms of bringing in new data sources, than is often the case when you’re reporting against a data warehouse that’s strictly governed and a shared corporate resource. TimesTen lends itself well to agile development too, in that you can work with it using tools like SQL*Developer and easily load data into it using ETL tools such as Oracle Data Integrator, or leverage the connectivity features in the Oracle Database and then use ttImportFromOracle.
Working with TimesTen database structures is more or less the same process as working with Oracle ones, but there are a couple of differences you need to be aware of when developing your TimesTen tables. Adding new columns to a table is possible, but these can end up stored “out of line” with the main column set, which can cause performance issues when queried, or compatibility issues when loading using tools like ttImportFromOracle. If you’re using TimesTen to store “hot” data, make sure you’ve got corresponding tables for this reference data in the source Oracle database, and make sure you select the most appropriate TimesTen datatypes for your new data (for example, TT_TINYINT and TT_INTEGER) rather than just the default Oracle-like datatypes.
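For example, if I wanted to add a hypothetical attribute to the replicated CUSTOMERS table, I’d do it with an ALTER TABLE from SQL*Developer or ttIsql, picking a TimesTen-native datatype – just bearing in mind the point above that the new column will be stored out-of-line (the column name here is made up for illustration):

ALTER TABLE sh.customers ADD (cust_loyalty_band TT_TINYINT);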
Most importantly, for all of your TimesTen tables, make sure you run the Index Advisor after you start to run user reports, to ensure you’re using the right indexes for your particular query patterns, and make sure you gather stats on your TimesTen tables as well as the Oracle ones, so that the TimesTen query optimiser can generate the most efficient query plans.
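As a minimal sketch of that last point, the stats refresh can be scripted with the ttOptUpdateStats built-in procedure and run from ttIsql after each data load; the Index Advisor itself is driven by the ttIndexAdviceCapture* built-in procedures in recent TimesTen releases, and its capture options are worth checking in the documentation for your version:

-- refresh optimiser statistics on a table, invalidating existing
-- prepared statements so they pick up new plans (second argument = 1)
call ttOptUpdateStats('SH.SALES', 1);

-- repeat for the other SH tables and the EXALYTICS aggregate tables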
5. Setting up Incremental Refresh
The last thing we want to set up for our TimesTen analytic data mart is some way to incrementally refresh the aggregates we built using the Summary Advisor and Aggregate Persistence Wizard. The way these tools work is that they drop and then recreate the aggregate tables each time you load them, adding and removing them from the RPD at the start and end of the data load, as shown in the diagram below:
What you can do, though, as outlined in this earlier blog post by Robin Moffatt, is take the POPULATE BI Server command that you’ll find in the nqquery.log file after an aggregate build, and use it to just refresh the aggregates in-place, without dropping the tables beforehand or removing them from the RPD, as shown in the diagram below.
Looking through the nqquery.log file I can see various POPULATE commands for the TimesTen dimension tables, like this:
populate "SA_Promo_C0000B8E4" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as select_business_model "Sales History (Oracle + TT)"."PROMOTIONS"."PROMO_CATEGORY" as "PROMO_CATE0000B8C6","Sales History (Oracle + TT)"."PROMOTIONS"."PROMO_CATEGORY_ID" as "PROMO_CATE0000B8C7", RCOUNT (1) as "Promo_C_0000B8E4SK" from "Sales History (Oracle + TT)"; [[
If I then couple that with the INACTIVE_SCHEMAS variable being set to the TimesTen physical data source, add a command to truncate each dimension table before the load, and then alter the fact table POPULATE command’s SELECT statement to load in just this past month’s data, deleting that month’s data from the target fact table beforehand, I’ve got a full incremental refresh I can run from a cron job or other scheduler. The script excerpt below shows such an example, with just a few of the dimensions being reloaded.
For the “hot data” tables it’s a similar case of taking the existing LoadData.sql and either modifying it to load in just the new transactions, or simply reloading all six months if that’s easier.
execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_Categor0000AFE8;

SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS_AGG_WIZ"':
populate "SA_Categor0000AFE8" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as
select_business_model "SH (Oracle)"."PRODUCTS"."PROD_CATEGORY_DESC" as "PROD_CATEG0000AFB4",
RCOUNT (1) as "Categor_0000AFE8SK" from "SH (Oracle)";

execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_Month0000AC81;

SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "SA_Month0000AC81" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as
select_business_model "SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" as "CALENDAR_M0000AC0A",
"SH (Oracle)"."TIMES"."CALENDAR_QUARTER_DESC" as "CALENDAR_Q0000AC0E",
"SH (Oracle)"."TIMES"."CALENDAR_YEAR" as "CALENDAR_Y0000AC12",
RCOUNT (1) as "Month_0000AC81SK" from "SH (Oracle)";

execute physical connection pool "tt_aggr_store"."Connection Pool" truncate table SA_City0000AC6A;

SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "SA_City0000AC6A" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as
select_business_model "SH (Oracle)"."CUSTOMERS"."CUST_CITY" as "CUST_CITY0000ABC5",
"SH (Oracle)"."CUSTOMERS"."CUST_STATE_PROVINCE" as "CUST_STATE0000ABD4",
RCOUNT (1) as "City_0000AC6ASK" from "SH (Oracle)";

execute physical connection pool "tt_aggr_store"."Connection Pool" delete from ag_SALES where CALENDAR_M0000AC0A = '2001-12';

SET VARIABLE INACTIVE_SCHEMAS='"tt_aggr_store".."EXALYTICS"':
populate "ag_SALES" mode ( append table connection pool "tt_aggr_store"."Connection Pool") as
select_business_model "SH (Oracle)"."CUSTOMERS"."CUST_CITY" as "CUST_CITY0000ABC5",
"SH (Oracle)"."SALES"."AMOUNT_SOLD" as "AMOUNT_SOL0000ABF6",
"SH (Oracle)"."SALES"."QUANTITY_SOLD" as "QUANTITY_S0000ABFB",
"SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" as "CALENDAR_M0000AC0A",
"SH (Oracle)"."PRODUCTS"."PROD_CATEGORY_DESC" as "PROD_CATEG0000AFB4"
from "SH (Oracle)"
where "SH (Oracle)"."TIMES"."CALENDAR_MONTH_DESC" = '2001-12';
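For completeness, a sketch of the hot-data refresh described above, run in ttIsql as the SH user with the oraclepwd set as before, could look like this – here just December 2001 is deleted and re-copied from Oracle, and the month value is obviously something you'd parameterise in a real script:

-- remove the month we're about to re-load from the TimesTen copy of SH.SALES
DELETE FROM sh.sales WHERE time_id >= TO_DATE('2001-12-01','YYYY-MM-DD');

-- re-copy just that month from the Oracle source
call ttLoadFromOracle('SH', 'SALES',
'SELECT S.PROD_ID, S.CUST_ID, S.TIME_ID, S.CHANNEL_ID, S.PROMO_ID, S.QUANTITY_SOLD, S.AMOUNT_SOLD
FROM SH.SALES S, SH.TIMES T
WHERE S.TIME_ID = T.TIME_ID
AND T.CALENDAR_MONTH_DESC = ''2001-12''');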
Conclusion
So what we’ve got here then is a much-expanded use of TimesTen in the context of Exalytics and OBIEE; we’ve not only used it as an in-memory cache for Summary Advisor aggregates, but we’ve used the same underlying mechanism to create other aggregates that we know will be useful for future queries.
We’ve also made use of the 1-2TB of RAM on the Exalytics server to cache the last six months of detail-level transactional data, making it even more likely that TimesTen will be able to answer all of the users’ queries.
Once you’ve got your data in TimesTen, you’ve got access to the same full set of analysis functions that are available when reporting against regular Oracle databases, with TimesTen for Exalytics giving you more native aggregation and analysis functions compared to standard TimesTen, and the BI Server “functionally compensating” for anything not natively available in TimesTen by performing the calculation itself, using the raw data provided by TimesTen.
Finally, we’ve talked about how we can load in additional datasets into TimesTen via ODI or ttImportFromOracle, potentially using the latter in conjunction with external tables to bring in file data, and then looked at what’s involved in trickle-feeding the TimesTen cache rather than dropping and reloading it each time.
If you’re at KScope’14 in Seattle this week, I’m presenting on this topic on the Wednesday; if not, and you’re potentially interested in us helping you get more out of your TimesTen for Exalytics install, just drop me a line at mark.rittman@rittmanmead.com.
SampleApp v406 – Automatic startup of OBIEE
Last week Oracle released v406 of SampleApp. SampleApp is a one-stop-shop for a demonstration of pretty much any conceivable thing that OBIEE is capable of, and then some more on top of it. It is a VirtualBox appliance with everything on the one machine (Database, OBIEE, Endeca, TimesTen, Essbase, etc), and demonstrates basic analysis building techniques through to dashboarding, Mobile App Designer, analysis visualisations with D3, ADF, JavaScript, backend hackery with undocumented NQS calls (hi Christian!), and much, much more.
So, SampleApp is awesome, but … there’s no such thing as absolute perfection ;-) One of the things that is still missing from it is the automatic start/stop of OBIEE when you boot up/shut down the machine respectively. Setting it up is easy to do, as I demonstrate below. I’ve also included my cheatsheet for whipping SampleApp into line for my day-to-day use, focussed on the command line and automation (because that’s where I spend most of my time on a server, and because I’m lazy, respectively).
The OBIEE init.d service script that I demonstrate here is available for use on any Linux installation of OBIEE. For more information, see the Rittman Mead public scripts github repository here: http://ritt.md/init.d
Before we get started I’m assuming here that you’ve:
- downloaded the 28GB worth of zip files
- Unpacked them using 7zip
- Found 70+GB of disc space free and imported the OVF into VirtualBox
- Started up the VM
There are full instructions in the SampleApp_QuickDeploymentGuide–406.pdf, available in the SampleApp v406 Documentation download from the same page as the SampleApp image itself.
So to make OBIEE start automatically, the first thing we need to do is make sure that the database (where the RCU schemas are) is doing the same, by setting it up as a service (init.d). This is based on this article if you want more details about quite how it works, but for now you just need to copy and paste this whole code block into the command prompt on SampleApp to create the necessary files. If you can’t copy & paste between your host and the VirtualBox guest, just go to this blog from Firefox within the SampleApp VM itself.
(Or if you know what you’re doing, SSH onto the server and paste the text into an SSH client on your host machine.)
Copy and paste this whole code block into the command prompt:
# Create startup/shutdown script files
mkdir -p /home/oracle/scripts
chown oracle.oinstall /home/oracle/scripts

cat>>/home/oracle/scripts/startup.sh<<EEOF
#!/bin/bash
# Start Listener
lsnrctl start
# Start Database
sqlplus / as sysdba << EOF
STARTUP;
ALTER PLUGGABLE DATABASE ALL OPEN;
EXIT;
EOF
EEOF

cat>>/home/oracle/scripts/shutdown.sh<<EEOF
#!/bin/bash
# Stop Database
sqlplus / as sysdba << EOF
ALTER PLUGGABLE DATABASE ALL CLOSE IMMEDIATE;
SHUTDOWN IMMEDIATE;
EXIT;
EOF
# Stop Listener
lsnrctl stop
EEOF

# Make them executable
chmod u+x /home/oracle/scripts/startup.sh /home/oracle/scripts/shutdown.sh
chown oracle.oinstall /home/oracle/scripts/startup.sh /home/oracle/scripts/shutdown.sh

# Create service script
cat>/tmp/dbora<<EOF
#!/bin/sh
# chkconfig: 345 90 25
# description: Oracle auto start-stop script.
#
# Set ORA_OWNER to the user id of the owner of the
# Oracle database software.
ORA_OWNER=oracle

case "\$1" in
'start')
    # Start the Oracle databases:
    # The following command assumes that the oracle login
    # will not prompt the user for any values
    su - \$ORA_OWNER -c "/home/oracle/scripts/startup.sh >> /home/oracle/scripts/startup_shutdown.log 2>&1"
    touch /var/lock/subsys/dbora
    ;;
'stop')
    # Stop the Oracle databases:
    # The following command assumes that the oracle login
    # will not prompt the user for any values
    su - \$ORA_OWNER -c "/home/oracle/scripts/shutdown.sh >> /home/oracle/scripts/startup_shutdown.log 2>&1"
    rm -f /var/lock/subsys/dbora
    ;;
esac
EOF

sudo mv /tmp/dbora /etc/init.d/dbora
sudo chown root. /etc/init.d/dbora

# Make the service script executable
sudo chmod 750 /etc/init.d/dbora

# Associate the dbora service with the appropriate run levels and set it to auto-start using the following command.
sudo chkconfig --add dbora
On SampleApp v406 there is an Oracle 12c container database (CDB), with two “pluggable” databases (PDB) within it. Assuming you’ve not started the database yet, trying to connect to one of the PDBs for the RCU schema will fail:
[oracle@demo ~]$ sqlplus BIEE_BIPLATFORM/Oracle123@localhost:1521/pdborcl

SQL*Plus: Release 12.1.0.1.0 Production on Tue Jun 17 03:03:51 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

ERROR:
ORA-12541: TNS:no listener
Now run the service start command (which will happen automatically at boot):
sudo service dbora start
And check the status again:
[oracle@demo ~]$ sqlplus BIEE_BIPLATFORM/Oracle123@localhost:1521/pdborcl

SQL*Plus: Release 12.1.0.1.0 Production on Tue Jun 17 03:06:12 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Last Successful login time: Tue Jun 17 2014 03:02:09 -04:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL>
Now we can set up OBIEE as a service that can start automatically at boot time. This uses a script that I wrote and shared on the Rittman Mead public scripts github repository here: http://ritt.md/init.d. Installing it on SampleApp v406 needs a couple of changes for the environment paths and dependencies etc, which I’ve incorporated in this code block. Again, copy and paste the whole thing into a command prompt on SampleApp.
# lsof is used in the script and isn't installed by default, so let's install it:
sudo yum -y install lsof

# Now fetch the init.d script itself
sudo wget --no-check-certificate https://raw.githubusercontent.com/RittmanMead/scripts/master/obi/init.d/obiee -O /etc/init.d/obiee

# Update the FMW_HOME path in the script
# If you're doing this manually, you just need to change the line
# "FMW_HOME=" and put in the FMW_HOME path for your installation.
# In the case of SampleApp v406 it is /app/oracle/biee
sudo sed -i -e 's/FMW_HOME=.*$/FMW_HOME=\/app\/oracle\/biee/g' /etc/init.d/obiee

# Make the script executable
sudo chmod 750 /etc/init.d/obiee
You should now be able to run the following status command to see if OBIEE is running or not:
sudo service obiee status
and if it’s not running, start it up:
sudo service obiee start
To shut it down:
sudo service obiee stop
You can read more about the script here.
There’s one more step to run to make OBIEE start automatically at bootup:
sudo chkconfig --add obiee
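If you want to double-check that both services are registered for the right runlevels before rebooting, chkconfig can list them:

sudo chkconfig --list dbora
sudo chkconfig --list obiee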
Now that the service scripts are in place, restart the machine and check they work:
sudo shutdown -r now
When the VM restarts, it may appear to “hang” on boot – the progress bar and “Oracle Linux Server 6.5” text will show, but no desktop. That’s because the services are starting up, and you can switch to the console view to see this. On my Mac I press Shift-Function-Backspace to do this; it may vary on other host operating systems (try ALT + d on Windows):
Once the desktop appears you should be able to launch Firefox and go straight to OBIEE, up and running.
There are some additional tweaks I usually make to any new server I work with:
- Install screen (don’t know what screen is? Read this!):
sudo yum -y install screen
Note I’m using sudo, which this version of SampleApp thankfully has configured for the oracle user – previous versions didn’t, IIRC.
- Configure screen with a nice statusbar:
cat > ~/.screenrc <
I’m using the bash “here document” functionality here to embed the contents of a document in a command statement. It means I can cut and paste it from my clipboard (if you’re on a Mac, you really should check out Alfred, which has a clipboard history manager, so I always have this screenrc and many other snippets available to paste within just a couple of key presses). Cutting and pasting a single command is easier (and thus less error-prone) than explaining what content to edit into which text file with which text editor.
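As an illustration – this is a minimal example of my own rather than necessarily the exact config being pasted above – a simple screen status bar can be set up with a ~/.screenrc along these lines:

# minimal .screenrc example: window list plus host, date and time on the bottom line
startup_message off
defscrollback 5000
hardstatus alwayslastline
hardstatus string '%{= kG}%H %{= kw}| %-w%{= BW}%n %t%{-}%+w %=| %d/%m %c'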
- Set up SSH keys. I wrote about SSH keys previously on this blog (in fact, all of the things on this checklist are covered there). Put simply, it makes logging in much faster and removes the problem of fat-fingering the password:
As with the screen configuration in the previous step, I use a snippet of code from my clipboard, pasted into the shell of any new server I’m working on:
# setup ssh key
mkdir ~/.ssh
cat >> ~/.ssh/authorized_keys <<EOF
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMvNjJ8y1a3+NR5OEZHjAd6HhPFdkOI4mDiIIegbRDG+ABhTQmE80DuDe0vp4tex7qefS1u5FG3Xf+hmKDBzKFyegpC+9oEmrtIPkuW7b1Uf/f7Dr6EjXbyA6x1LGEu3THCrGVfw5Mkakqzr43ZfGEFf8k0RiMroHQbp3vOcR+5y0Q132Tbrr0GPd0RPuj4cVOC7QQ6jXSs7TZWepzrcqEpB4M1+N3xS/jEQ5aiCY2FTDOUVj6Y4c0Ogg93bLos6JltPKznq08KXy0ZW/rWDindAmeL0ZAMuU12Qv1ehQZJsxjUwq3jz4yNMgXs5ba3nSvMAr7ZVRzbK5nLl7+eAN3 zaphod@beeblebrox
EOF
# chmod -R 700 ~/.ssh
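Alternatively, if you’d rather not paste the key by hand, ssh-copy-id run from your host machine does the same append for you (the hostname here is just a placeholder for wherever your SampleApp VM is reachable):

ssh-copy-id oracle@sampleapp-vm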
As a side note, that’s my public SSH key there. One’s public ssh key is just that – public. If you want to use it, go ahead; it just means I’ll be able to log in to your server. That’s because I have the other half of the key-pair, the private key, and that is the one that should not be shared, because with it you can access any server that has been configured with the public key. Private SSH keys should be treated just like your most important passwords (and indeed, you should use a passphrase when creating your private SSH key, to add a layer of security to it, so that it can only be used if the passphrase is known).
The next step is Installing obi-metrics-agent, Graphite, and collectl, but that’s a matter for another blog post :-)
Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture Pt2. – Delivering the Data Factory
In my previous post on our updated Oracle Information Management Reference Architecture, jointly-developed with Oracle’s Enterprise Architecture team, we went through a conceptual and logical view of the information architecture, introducing new concepts like the Raw Data Reservoir, the Data Factory and the Discovery Lab. I said that the Data Factory, Data Reservoir, Enterprise Information Store and Reporting, together with the Discovery Lab, comprised the Information Platform, which could be used to create data applications (raw data plus the data factory), or more wide-ranging Information Solutions, with the Discovery Lab often acting as the place where new ideas and models were prototyped, and algorithms developed that could be productionized in the Execution area of the overall architecture.
And this is typically where most information architectures leave you; with a good starting point and an overall schematic, but without any real guidance on how to deliver it. For example, how do you develop small pieces of functionality that may often be throwaway, and how do you take on larger projects? Is it possible to develop this type of architecture in an agile way, and if so, can the data reservoir part be delivered in an agile way too? In the collaboration with Oracle’s Enterprise Architecture team this is where we made our major contribution, taking the ideas behind our ExtremeBI agile development approach and using them to deliver this updated Information Management architecture. Let’s go back a few steps though and think about why an agile, or at least a different, way of developing BI content is needed, and what part of the overall architecture we are developing for.
Within this overall architecture, the part where developers such as ourselves typically add value is the “data factory”; the part of the architecture that acts as the conduit, or interface, between the schema-on-read raw data reservoir and the schema-on-write enterprise information store.
There’s other code to write too; interface and streaming code to get data into the event engine, for example, and more traditional ETL code to load data in from database and application stores. But what we’re most concerned about is the data models and reports that users request in their user stories and epics. If you take a look back at the logical view of our new information architecture, you can see that information can come in at any level in the architecture – raw data reservoir, foundation or access and performance, with different pros and cons for landing and accessing data at each layer, and the ability to take data from one layer and use it to load layers higher up in the layer hierarchy.
And this is where our agile development methodology, “ExtremeBI”, comes in. ExtremeBI, for structured relational sources, typically uses Oracle GoldenGate to trickle-feed data of interest directly into the foundation layer of this information management data architecture, with developers then using that layer plus the access and performance layer, if needed, to close-off user stories as quickly as possible, as shown in the diagram below.
In some cases it’s possible to model against just the foundation layer in OBIEE, where that foundation layer contains all the metadata and history columns that you need. One of the main parts of our ExtremeBI methodology is a technical process to make this modelling possible, and more importantly allow the underlying physical model to evolve in response to user stories about performance and analysis needs, without breaking existing reports or creating an unmanageable system.
We’ll be documenting this process as part of the series of white papers we’ll be co-authoring with Oracle, and we also bring utilities and accelerators to ExtremeBI customer projects, along with continuous integration and regression testing tools, to help get usable software in the hands of end-users as soon as possible.
But what about the parts of the data factory that source data from the raw data store? Can we use similar techniques, and tools like ODI and OBIEE, to deliver working software to users within a short time span? Well, in practice it’s not quite so easy; typically, getting information out of a schema-on-read database involves writing code, and it’s easy to fall back into bad habits where everything is scripted, code is brittle and fragile, and no-one feels confident in evolving the model over time.
We think it’s possible to use ExtremeBI techniques with schema-on-read sources, though, when you combine them with tools like Flume or GoldenGate for trickle-feed extraction and loading, and new capabilities in ODI and OBIEE where Hadoop data can be accessed via Hive. Flume is the industry-standard mechanism for transporting log entries from source to the Hadoop cluster, and GoldenGate has the ability through its Java API to land data in HDFS or Hive, when the source is a relational database (see MOS Doc.IDs 1586188.1 (Hive) and 1586210.1 (HDFS) for example code).
Hive, though slow for ad-hoc query access against Hadoop data, has a range of Serializer/Deserializer utilities that can translate semi-structured and NoSQL sources into more regular columns and tables, with Cloudera Impala removing the response-time issue and rapidly matching Hive in terms of plugins and SerDes. ODI, as I demonstrated in a five-part series on the blog last week, has a number of Hadoop-native knowledge modules and also leverages Oracle Loader for Hadoop to move data out of the raw data reservoir and into the foundation and access + performance layers.
The important thing to understand though, when working with data in the updated Information Management architecture and reporting against schema-on-read and schema-on-write sources, is that the most effective way to deliver real value for end-users is to have two phases to your development work:
- A discovery phase, typically done in the Discovery Lab sandbox or Rapid Development Sandboxes, that focuses on understanding the data and helping the users see where the value is, with less emphasis on governance and corporate standards, and
- A commercial exploitation phase, when our understanding of the data has significantly increased and we start to add structure to the data, and where it’s an appropriate time to add governance and address data quality issues. Typically, this would be done in the Execution part of our architecture, taking our prototype design and adding it to the data factory as one of its interface routines.
We’ll be developing these ideas out further over the summer, with some white papers and a joint presentation at Openworld 2014 to formally launch it. In the meantime, keep an eye on the blog as we work through some of the key concepts.
Introducing the Updated Oracle / Rittman Mead Information Management Reference Architecture Pt1. – Information Architecture and the “Data Factory”
One of the things at Rittman Mead that we’re really interested in is the architecture of “information management” systems and how these change over time as thinking, and product capabilities, evolve. In fact we often collaborate with the Enterprise Architecture team within Oracle, giving input into the architecture designs they come up with, and more recently working on a full-blown collaboration with them to come up with a next-generation Information Management architecture. In these two posts I wanted to share some of our recent thinking in this area, looking first at our new proposed architecture, and then in the second post talking about how we’d use agile development methods, in particular our “ExtremeBI” development approach, to deliver it.
But first, some history. Back in 2009 I blogged about a first-generation DW reference architecture which introduced a couple of new concepts, based on new capabilities from tools such as OBIEE plus some thinking we, and the Enterprise Architecture team at Oracle, had been doing over the years. This reference architecture introduced the concept of “Foundation” and “Access and Performance” layers, and recognised the reality that Kimball-type star schemas were great for querying but not so good for long-term, query-neutral storage of data, whilst Inmon-style EDW models were great as a long-term, process-neutral system of record, but not so good for running user queries on. This new architecture included both of these design approaches, with the foundation layer forming the “information management” layer and the access and performance layer being the “information access” layer. Most importantly, tools like OBIEE made it possible for enterprises to create metadata layers that potentially accessed all layers in this model, so users could query the foundation layer if needed as well as the access and performance layer, if the foundation layer was a better source of data for a particular report.
A second revision to this model, a couple of years later, expanded on the original one and introduced another two interesting concepts, brought about by the introduction of tools like Endeca Information Discovery, and the rise of unstructured and semi-structured data sources. This new architecture added unstructured and semi-structured sources into the model, and also introduced the concept of “sandboxes”, areas of the information management model that allowed more free-form, exploratory BI applications to be built.
But in practice, this idea of “unstructured” and “semi-structured” sources wasn’t all that helpful. What has really started to make an impact in the past couple of years is the increasing use of “schema-on-read” databases, where we trade off the performance and defined structure of traditional relational 3NF and star schemas for the flexibility and “time-to-value” provided by key-value store databases. The Endeca Server is a good example of this type of database: it allows rapid storage of loosely-associated datasets, and tools like Endeca Studio then apply a structure to the data at the time of analysis. Schema-on-read databases are great for fast, flexible access to datasets, but the cost of ETL is then borne by each system that accesses the data.
Probably the most well-known examples of schema-on-read sources, though, are Hadoop and NoSQL databases. Coupled with their ability to store lots of detail-level data at relatively low cost, Hadoop and NoSQL databases have significantly affected the overall landscape for BI, data warehousing and business analytics, and we thought it was about time for a new reference architecture that fully incorporated the capabilities and latest thinking around this area. Back at the start of 2014, Jon Mead, Stewart Bryson and I met up with Oracle’s Andrew Bond and his team for a series of workshops, and what came out of it was an updated Information Management Architecture *and* a development methodology for delivering it. Let’s start off then by looking at this updated architecture from a conceptual view.
At a conceptual level, we build on this idea of the sandbox environment and formally separate things out into an Execution area – business-as-usual, production and development areas – and an Innovation area, where we build on the idea of a sandbox and rename it the “Discovery Lab”. The Discovery Lab is where, for want of a better word, the “data scientists” work, with fewer constraints on development; its inputs are events and data, and its outputs are the discoveries that can be the prototype and inspiration for designs going into the Execution area.
The main “engine” of the Execution area is our enterprise store of data, this time broken down into four areas:
- A “data reservoir” where we store all incoming events and data at detail-level, typically on HDFS. This blog article by Oracle’s Jean-Pierre Dijcks sets out the concept of a data reservoir well, and I like this blog by Scaleabilities’ Jeff Needham where he makes the case for calling it a “data reservoir” that can ingest, process and analyse data rather than a “data lake”, which implies a passive store.
- An Enterprise Data Store, analogous to the enterprise data warehouses we use today, and a reporting component, typically in our case OBIEE
- Most importantly, the new concept of a “data factory”, a conduit between the data reservoir and the enterprise information store
Together, the execution and innovation layers form our “information platform”, with the event engine feeding real-time events into the platform and outputting them into the data reservoir, and traditional ETL routines loading structured data from the enterprise into the enterprise information store.
This conceptual architecture then permits several types of information application. For example, the data reservoir and the data factory together could support what we call “data applications”, applications working on semi-structured, large and low-granularity data sets such as those used for genomic analysis.
Other applications might be more traditional BI and data warehousing applications, but with the addition of data from the data reservoir and the analysis capabilities of Hadoop.
The discovery lab can be a standalone area, or the insights and discovery it outputs can be used as inputs into the main information platform. More event-based data will typically come in via the event engine, with its output going into the data reservoir and supporting “next-best-decision” applications like Oracle Real-Time Decisions.
Another way of looking at this architecture is from a logical perspective, in particular focusing on the data layers and access/loading processes to load them. The diagram below is our latest version of the two diagrams at the start of this article, and as you can see we’ve kept the data sources and BI element much the same, and kept the concept of the sandbox, in this case refined as the “discovery lab sandbox”.
What is different this time though is the middle bit; we’ve lost the staging area and replaced it with the raw data reservoir, added a “Rapid Development Sandbox”, and drawn the main layers as a slanted set of stacked areas. So why?
What we’re trying to show with the slanted data layers is the relative cost of data ingestion (loading), and the relative cost of accessing it (information interpretation). For the raw data reservoir, for example, there’s little cost in ingesting the data – maybe copy some files to HDFS, or use Flume or GoldenGate to capture log or transaction data to HDFS or Hive – but the cost is then borne in accessing this typically “schema-on-read” data source. As you go up the stack, there’s a bit more work in landing data into the Foundation layer – ETL routines, GoldenGate routines, some data cleaning and constraint checking, for example – but it’s correspondingly easier to get data out. For the Access and Performance Layer there’s the most cost in getting data in, but then users have very little work to do when getting data out.
Data can move up the stack from Raw Data Reservoir to Foundation, or directly into Access and Performance, or it could be landed at levels above Raw Data Reservoir, for example in our ExtremeBI approach where we use GoldenGate to replicate source system tables directly into Foundation without going through a staging layer. The Rapid Development Sandboxes are there to support agile, iterative development, with the output from them either being the result in itself, or their designs and insights being used to create more formal projects and data structures.
From a more product-centric perspective, you can overlay these types of diagrams with specific schematics for example enterprises. For example, in the diagram below you can see Oracle NoSQL Database being used with HDFS and the Oracle Big Data Connectors to capture and store events from Complex Event Processing, and then the outputs from CEP also being fed into a more traditional, “high density data” store as well as directly into a decision engine.
So this all sounds great, but how do you build it? Do we have to use the (discredited) step-by-step, waterfall method to build this type of architecture, and in particular the key “data factory” element that provides connectivity between the Raw Data Reservoir and the Enterprise Information Store? And can we apply agile methods to big data sources, as well as regular databases and applications? Check back on Monday for our thoughts on how this should be done.