Category Archives: Rittman Mead

Happy Christmas from Rittman Mead, and Where To Find Us in 2015

It’s the day before Christmas Eve over in the UK, and we’re finishing up for the Christmas break and looking forward to seeing friends and family over the next few days. To all of our readers, customers, friends and partners, have a great Holiday season and New Year, and we’re looking forward to seeing you at various events and on customer projects in 2015.

If you’re thinking of submitting an abstract for the Rittman Mead BI Forum 2015, the call for papers is now open, with abstracts accepted through to January 18th, 2015. As well as the BI Forum in May, you can also catch up with us at a number of other events in the first half of the New Year.

That’s it for now though – have a great Christmas and New Year, and see you in 2015!

TIMESTAMPS and Presentation Variables

TIMESTAMPS and Presentation Variables are some of the most useful tools a report creator can use to build robust, repeatable reports while maximizing user flexibility.  I intend to turn you into an expert with these functions, and by the end of this page you should be able to impress your peers and managers; you may even impress Angus MacGyver.  In this example we will create a report that displays a year over year analysis for any rolling number of periods, by week or month, from any date in time, all determined by the user.  The entire example uses only a date field and a revenue field.

[Image: final month-level report]

The TIMESTAMP is an invaluable function that allows a user to define report limits based on a moving target. If the goal of your report is to display month-to-date, year-to-date, a rolling month or any other non-static period in time, the TIMESTAMP function will get you there.  Users also often want to know what a report looked like at some previous point in time; to provide that level of flexibility, TIMESTAMPS can be used in conjunction with Presentation Variables.

To create robust TIMESTAMP functions you will first need to understand how the TIMESTAMP works. Take the following example:

[Image: filter subtracting 7 days with TIMESTAMPADD]

Here we are saying that we want to include all dates greater than or equal to 7 days before the current date.
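For reference, a hedged reconstruction of the filter shown in the screenshot above (using the same "Time"."Date" column as the later examples):

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, -7, CURRENT_DATE)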

  • The first argument, SQL_TSI_DAY, defines the TimeStamp Interval (TSI). This means that we will be working with days.
  • The second argument determines how many of that interval we will be moving, in this case -7 days.
  • The third argument defines the starting point in time, in this example, the current date.

So in the end we have created a functional filter making Date >= 1 week ago, using a TIMESTAMP that subtracts 7 days from today.

[Image: results for the last 7 days]

Note: it is always good practice to include a second filter giving an upper limit, such as “Time”.”Date” < CURRENT_DATE. Depending on the data you are working with, you might otherwise bring in rows you don’t want or put unnecessary strain on the system.

We will now start to build this basic filter into something much more robust and flexible.

To start, let’s imagine that instead of subtracting 7 days as in the filter above, the goal of the filter was to always include dates >= the first of the month. In this scenario we can use the DAYOFMONTH() function, which returns the calendar day of any date. This is useful because we can get to the first of the month from any date by simply subtracting that amount from the date and adding 1.

Our new filter would look like this:

[Image: filter using DAYOFMONTH]
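A sketch of how that filter could read:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(CURRENT_DATE) * -1) + 1, CURRENT_DATE)

Multiplying the day of the month by -1 and adding 1 moves us back to the first of the month.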

For example if today is December 18th, DAYOFMONTH(CURRENT_DATE) would equal 18. Thus, we would subtract 18 days from CURRENT_DATE, which is December 18th, and add 1, giving us December 1st.

[Image: month-to-date results]

(For a list of other similar functions like DAYOFYEAR, WEEKOFYEAR etc. click here.)

To make this even better, instead of using CURRENT_DATE you could use a prompted value with the use of a Presentation Variable (for more on Presentation Variables, click here). If we call this presentation variable pDate, for prompted date, our filter now looks like this:

[Image: filter using the pDate presentation variable]

A best practice is to use default values with your presentation variables so you can run the queries you are working on from within your analysis. To add a default value, all you do is add the value within braces at the end of your variable. We will use CURRENT_DATE as our default: @{pDate}{CURRENT_DATE}.  We will refer to this filter later as Filter 1.

{Filter 1}:

[Image: Filter 1]
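A hedged sketch of Filter 1, with the pDate variable (and its CURRENT_DATE default) substituted for the hard-coded date:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE})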

As you can see, the filter is starting to take shape. Now let’s say we are always going to be looking at a date range of the most recent completed 6 months. All we would need to do is create a nested TIMESTAMP function. To do this, we will “wrap” our current TIMESTAMP with another that subtracts 6 months. It will look like this:

[Image: nested TIMESTAMP subtracting 6 months]
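A sketch of the nested version, wrapping the previous expression in a second TIMESTAMPADD that steps back 6 months:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -6, TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}))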

Now we have a filter that is greater than or equal to the first day of the month of any given date (default of today) 6 months ago.

[Image: results for the last 6 complete months]

To take this one step further, you can even allow users to determine the number of months to include in this analysis by making the value of 6 a presentation variable, which we will call “n”, with a default of 6: @{n}{6}.  We will refer to the following filter as Filter 2:

{Filter 2}:

[Image: Filter 2]
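A sketch of Filter 2, with the n variable (default 6) replacing the hard-coded month count:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -@{n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}))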

For more on how to create a prompt with a range of values by altering a current column, like we want to do to allow users to select a value for n, click here.

Our TIMESTAMP function is now fairly robust and will give us any date greater than or equal to the first day of the month n months before any given date. Now we will see what we just created in action by building date ranges that allow a year over year analysis for any number of months.

Consider the following filter set:

[Image: full year-over-year filter set]
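As a rough, hedged sketch of what that filter set might look like (the upper bounds here are an assumption, capping each range at the first of the current month as recommended earlier):

( "Time"."Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -@{n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}))
AND "Time"."Date" < TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}) )
OR
( "Time"."Date" >= TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_MONTH, -@{n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE})))
AND "Time"."Date" < TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_DAY, (DAYOFMONTH(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE})) )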

This appears to be pretty intimidating but if we break it into parts we can start to understand its purpose.

Notice we are using the exact same filters from before (Filter 1 and Filter 2).  What we have done here is filtered on two time periods, separated by the OR statement.

The first date range defines the period as being the most recent complete n months from any given prompted date value, using a presentation variable with a default of today, which we created above.

The second time period, after the OR statement, is the exact same as the first only it has been wrapped in another TIMESTAMP function subtracting 1 year, giving you the exact same time frame for the year prior.

[Image: year-over-year results]

This allows us to create a report that can run a year over year analysis for a rolling n month time frame determined by the user.

A note on nested TIMESTAMPS:

You will always want to create nested TIMESTAMPS with the smallest interval first. Due to syntax, this will always be the furthest to the right. Then you will wrap intervals as necessary. In this case our smallest increment is day, wrapped by month, wrapped by year.

Now we will start with some more advanced tricks:

  • Instead of using CURRENT_DATE as your default value, use yesterday, since most data are only as current as yesterday. (If you use real-time or near-real-time reporting, CURRENT_DATE may be how you want to proceed.) Using yesterday is especially valuable when pulling reports on the first day of the month or year: you generally want the entire previous time period rather than the empty beginning of a new one.  To implement this, wherever you have @{pDate}{CURRENT_DATE}, replace it with @{pDate}{TIMESTAMPADD(SQL_TSI_DAY,-1,CURRENT_DATE)}
  • Presentation Variables can also be used to determine whether to display year over year values by month or by week, by inserting a variable into your SQL_TSI_MONTH and DAYOFMONTH statements.  Change MONTH to a presentation variable, giving SQL_TSI_@{INT}{MONTH} and DAYOF@{INT}{MONTH}, where INT is the name of our variable.  This will require you to create a dummy column in your prompt to allow users to select either MONTH or WEEK; you can try something like this: CASE MOD(DAY(“Time”.”Date”),2) WHEN 0 THEN ‘WEEK’ WHEN 1 THEN ‘MONTH’ END (see the filter sketch below for how the INT variable slots in)
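As an illustrative sketch, the start of the date range with the interval driven by the INT variable then becomes:

"Time"."Date" >= TIMESTAMPADD(SQL_TSI_@{INT}{MONTH}, -@{n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (DAYOF@{INT}{MONTH}(@{pDate}{TIMESTAMPADD(SQL_TSI_DAY,-1,CURRENT_DATE)}) * -1) + 1, @{pDate}{TIMESTAMPADD(SQL_TSI_DAY,-1,CURRENT_DATE)}))

When the prompt sets INT to WEEK, the expression resolves to SQL_TSI_WEEK and DAYOFWEEK; when it is MONTH, you get the monthly version built earlier.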

[Image: the INT presentation variable used in the filter]

[Image: dummy column formula using MOD]

[Image: the Month/Week drop-down prompt]

In order for our interaction between Month and Week to run smoothly we have to make one more adjustment.  If we take the date December 1st, 2014 and subtract one year we get December 1st, 2013; however, if we take the first day of this week, Sunday December 14th, 2014, and subtract one year we get Saturday December 14th, 2013.  In our analysis this will cause an extra partial week to show up for prior years.  To get around this we will add a case statement: if ‘@{INT}{MONTH}’ = ‘WEEK’ then subtract 52 weeks from the first of the week, else subtract 1 year from the first of the month.
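One hedged way to express that adjustment for the lower bound of the prior-year range:

"Time"."Date" >= CASE WHEN '@{INT}{MONTH}' = 'WEEK'
THEN TIMESTAMPADD(SQL_TSI_WEEK, -52, TIMESTAMPADD(SQL_TSI_DAY, (DAYOF@{INT}{MONTH}(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}))
ELSE TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_DAY, (DAYOF@{INT}{MONTH}(@{pDate}{CURRENT_DATE}) * -1) + 1, @{pDate}{CURRENT_DATE}))
END

The same CASE would wrap the upper bound of that range too.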

Our final filter set will look like this:

[Image: final filter set]

With the use of these filters and some creative dashboarding you can end up with a report that easily allows you to view a year over year analysis from any date in time for any number of periods either by month or by week.

[Image: final year-over-year chart, by month]

[Image: final year-over-year chart, by week]

That really got out of hand in a hurry! Surely this will impress someone at work, or even Angus MacGyver, if for no other reason than that they won’t understand it. Hopefully, though, now you do!

Also, a colleague of mine, Spencer McGhin, just wrote a similar article on year over year analyses using a different approach. Feel free to review it and consider your options.


 

Calendar Date/Time Functions

These are functions you can use within OBIEE and within TIMESTAMPS to extract the information you need.

  • Current_Date
  • Current_Time
  • Current_TimeStamp
  • Day_Of_Quarter
  • DayName
  • DayOfMonth
  • DayOfWeek
  • DayOfYear
  • Hour
  • Minute
  • Month
  • Month_Of_Quarter
  • MonthName
  • Now
  • Quarter_Of_Year
  • Second
  • TimestampAdd
  • TimestampDiff
  • Week_Of_Quarter
  • Week_Of_Year
  • Year

Back to section


 

Presentation Variables

The only way you can create variables within the presentation side of OBIEE is with presentation variables. They can only be defined by a prompt. Any value selected in the prompt will then be passed to any reference to that variable throughout the dashboard page.

In the prompt:

[Image: setting a presentation variable in the prompt]

From the “Set a variable” dropdown, select “Presentation Variable”. In the textbox below the dropdown, name your variable (named “n” above).

When calling this variable in your report, use the syntax @{n}{default}

If your variable is a string, make sure to surround it in single quotes: ‘@{CustomerName}{default}’

Also, when using your variable in your report, it is good practice to assign a default value so that you can work with your report before publishing it to a dashboard. For variable n, if we want a default of 6 it would look like this: @{n}{6}

Presentation variables can be called in filters, formulas and even text boxes.

Back to section


 

Dummy Column Prompt

For situations where you would like users to select a numerical value for a presentation variable, like we do with @{n}{6} above, you can convert something like a date field into values up to 365 by using the function DAYOFYEAR(“Time”.”Date”).

As you can see, we are returning the SQL choice-list values where DAYOFYEAR(“Time”.”Date”) <= 52.  Make sure to include an ORDER BY clause to ensure your values are well sorted.

[Image: choice-list SQL for the dummy column prompt]
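As a hedged sketch, the choice-list logical SQL behind the prompt might look like this (the “A - Sample Sales” subject area name is purely illustrative, not from the original post):

SELECT DAYOFYEAR("Time"."Date") FROM "A - Sample Sales" WHERE DAYOFYEAR("Time"."Date") <= 52 ORDER BY 1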

Back to Section

OBIEE Enterprise Security

The Rittman Mead Global Services team have recently been involved in a number of security architecture implementations and have produced a security model which meets a diverse set of requirements.  Using our experience and standards, we have been able to deliver a robust model that addresses the common questions we routinely receive around security, such as:

“What considerations do I need to make when exposing Oracle BI to the outside world?”

or

“How can I make a flexible security model which is robust enough to meet the demands of my organisation but easy to maintain?”

The first question is based on a standard enterprise security model where the Oracle BI server is exposed by a web host, enabling SSL and tightening up access security.  This request can be complex to achieve but is something that we have implemented many times now.

The second question is much harder to answer, but our experience has led us to develop, with numerous clients, a multi-dimensional inheritance security model that has yielded excellent results.

What is a Multi-dimensional Inheritance Security Model?

The wordy title is actually a simple concept that incorporates 5 key areas:

  • Easy to setup and maintain
  • Flexible
  • Durable
  • Expandable
  • Consistent throughout the product

While there are numerous ways of implementing a security model in Oracle BI, by sticking to the key concepts above we ensure we get it right.  The largest challenge we face in BI is the different types of security required, and all three need to work in harmony:

  • Application security
  • Content security
  • Data security

Understanding the organisation makeup

The first approach is to consider the makeup of a common organisation and build our security around it.

[Diagram: departmental makeup of a typical organisation]

This diagram shows different departments (Finance, Marketing, Sales) whose data is specific to them, so normally the departmental users should only see the data that is relevant to them.  In contrast, the IT department who are developing the system need visibility across all data, and so do the Directors.

 

What types of users do I have?

Next is to consider the types of users we have:

  1. BI Consumer: This will be the most basic and common user who needs to access the system for information.
  2. BI Analyst: As an Analyst the user will be expected to generate more bespoke queries and need ways to represent them. They will also need an area to save these reports.
  3. BI Author: The BI Author will be able to create content and publish that content for the BI Consumers and BI Analysts.
  4. BI Department Admin: The BI Department Admin will be responsible for permissions for their department as well as act as a focal point user.
  5. BI Developer: The BI Developer can be thought of as the person(s) who creates models in the RPD and will need additional access to the system for testing of their models. They might also be responsible for delivering Answers Requests or Dashboards in order to ‘Prove’ the model they created.
  6. BI Administrator:  The Administrator will be responsible for the running of the BI system and will have access to every role.  Most Administrator tasks will not require SQL or data warehousing skills, and the role is generally kept separate from the BI Developer role.

The types of users here are a combination of every requirement we have seen and might not be required by every client.  The order they are in shows the implied inheritance, so the BI Analyst inherits permissions and privileges from the BI Consumer and so on.

What Types do I need?

The size of the organization determines what types of user groups are required. By default Oracle ships with:

  1. BI Consumer
  2. BI Author
  3. BI Administrator

Typically we would recommend inserting the BI Analyst into the default groups:

  1. BI Consumer
  2. BI Analyst
  3. BI Author
  4. BI Administrator

This works well when there is a central BI team who develop content for the whole organization. The structure would look like this:

[Diagram: group structure with a central BI team]

 

For larger organizations where dashboard development and permissions are handled across multiple BI teams, the BI Department Admin group can be used to locally manage permissions for each department.  Typically we see the BI team as a central Data Warehouse team who deliver the BI model (RPD) to the multiple BI teams.  In a large organization the administration of Oracle BI should be handled by someone who isn’t the BI Developer, so the structure could look like:

[Diagram: group structure with departmental BI Department Admins]

 

 

Permissions on groups

Each of the groups will require different permissions; at a high level the permissions would be:

 

BI Consumer
  • View Dashboards
  • Save User Selections
  • Subscribe to iBots

BI Analyst
  • Access to Answers and a standard set of views
  • Some form of storage
  • Access to Subject areas

BI Author
  • Access to Create/Modify Dashboards
  • Save Predefined Sections
  • Access to Action Links
  • Access to Dashboard Prompts
  • Access to BI Publisher

BI Department Admin
  • Ability to apply permissions and manage the Web Catalog

BI Developer
  • Advanced access to Answers
  • Access to all departments

BI Administrator
  • Everything

 

Understanding the basic security mechanics in 10g and 11g

In Oracle BI 10g the majority of the security is handled in the Oracle BI Server.  This would normally be done through initialisation blocks, which would authenticate the user against an LDAP server and then run a query against database tables to populate the user into ‘Groups’ used in the RPD and ‘Web Groups’ used in the Presentation Server.  These groups would have to match at each level: database, Oracle BI Server and Oracle BI Presentation Server.
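As an illustration only (the table and column names below are assumptions, not taken from any specific implementation), a row-wise initialisation block query to populate the GROUP variable in 10g typically looked something like this:

-- SEC_USER_GROUPS is a hypothetical security table; ':USER' is replaced with the logged-in user at run time
SELECT 'GROUP', GROUP_NAME
FROM SEC_USER_GROUPS
WHERE UPPER(USER_NAME) = UPPER(':USER')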

With the addition of Enterprise Manager and Weblogic, the security elements in Oracle BI 11g changed radically.  Authenticating the user in the Oracle BI Server is no longer the recommended approach and is limited on Linux.  While the RPD Groups and Presentation Server Web Groups still exist, they don’t need to be used.  Users are now authenticated against Weblogic.  This can be done by using Weblogic’s own users and groups or by plugging it into a choice of LDAP servers.  The end result will be groups and users that exist in Weblogic.  The groups then need to be mapped to Application Roles in Enterprise Manager, which can be seen by the Oracle BI Presentation Services and Oracle BI Server.  It is recommended to create a one-to-one mapping for each group.

[Diagram: Weblogic groups mapped to Application Roles in 11g]

 

What does all this look like then?

Assuming this is for an SME-sized organization where the dashboard development (BI Author) is done by the central BI team, the groups would look like:

 

[Diagram: group structure for an SME-sized organization]

 

The key points are:

  • The generic BI Consumer/Analyst groups give their permissions to the department versions
  • No users should be in the generic BI Consumer/Analyst groups
  • Only users from the BI team should be in the generic BI Author/Administrator group
  • New departments can be easily added
  • The lines denote the inheritance of permissions and privileges

 

What’s next – The Web Catalog?

The setup of the Web Catalog is very important to ensure that it does not become unwieldy, so it needs to reflect the security model. We would recommend setting up some base folders which look like this:

[Diagram: Web Catalog base folder structure]

 

Each department has its own folder and four sub-folders. The permissions applied to each department’s root folder give BI Administrators full control, so full control is possible across the top.  This is also true for every folder below; however, they will have additional explicit permissions applied to ensure that the department cannot create any more than the four sub-folders.

  • The Dashboard folder is where the dashboards go; the department’s BI Developers group will have full control and the department’s BI Consumers will have read access. This allows the department’s BI Developers to create dashboards, the department’s BI Administrators to apply permissions, and the department’s Consumers and Analysts to view them.
  • The same permissions are applied to the Dashboard Answers folder, to the same effect.
  • The Development Answers folder has full control given to the department’s BI Developers and no access for the department’s BI Analysts or BI Consumers. This folder is mainly for the department’s BI Developers to store answers that are still in development.
  • The Analyst folder is where the department’s BI Analysts can save their own Answers, so they will need full control of this folder.

I hope this article gives some insight into Security with Oracle BI.  Remember that our Global Services products offer a flexible support model where you can harness our knowledge to deliver your projects in a cost effective manner.

OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance

The other week I posted a three-part series (part 1, part 2 and part 3) on going beyond MapReduce for Hadoop-based ETL, where I looked at a typical Apache Pig dataflow-style ETL process and showed how Apache Tez and Apache Spark can potentially make these processes run faster and make better use of in-memory processing. I picked Pig as the data processing environment because the multi-step data transformations it creates translate into lots of separate MapReduce jobs in traditional Hadoop ETL environments, but run as a single DAG (directed acyclic graph) under Tez and Spark, which can potentially use memory to pass intermediate results between steps rather than writing all those intermediate datasets to disk.

But tools such as OBIEE and ODI use Apache Hive to interface with the Hadoop world, not Pig, so it’s improvements to Hive that will have the biggest immediate impact on the tools we use today. And what’s interesting is the development work that’s going on around Hive in this area, with four different “next-generation Hive” initiatives that could end up making OBIEE and ODI on Hadoop run faster:

  • Hive-on-Tez (or “Stinger”), principally championed by Hortonworks, along with Stinger.next which will enable ACID transactions in HiveQL
  • Hive-on-Spark, a more limited port of Hive to run on Spark and backed by Cloudera amongst others
  • Spark SQL within Apache Spark, which enables SQL queries against Spark RDDs (and Hive tables), and exposes a HiveServer2-compatible Thrift Server for JDBC access
  • Vendor initiatives that build on Hive but are mainly around integration with their RDBMS engines, for example Oracle Big Data SQL

Vendor initiatives like Oracle’s Big Data SQL and Cloudera Impala have the benefit of working now (and being supported), but usually come with some sort of penalty for not working directly within the Hive framework. Oracle’s Big Data SQL, for example, can read data from Hive (very efficiently, using Exadata SmartScan-type technology) but can’t write back to Hive, and currently pulls all the Hive data into Oracle if you try to join Oracle and Hive data together. Cloudera’s Impala, on the other hand, is lightning-fast and works directly on the Hadoop platform, but doesn’t support the same ecosystem of SerDes and storage handlers that Hive supports, taking away one of the key flexibility benefits of working with Hive.

So what about the attempts to extend and improve Hive, or include Hive interfaces and compatibility in Spark? In most cases an ETL routine written as a series of Hive statements isn’t going to be as fast or resource-efficient as a custom Spark program, but if we can make Hive run faster or have a Spark application masquerade as a Hive database, we could effectively give OBIEE and ODI on Hadoop a “free” platform performance upgrade without having to change the way they access Hadoop data. So what are these initiatives about, and how usable are they now with OBIEE and ODI?

Probably the most ambitious next-generation Hive project is the Stinger initiative, backed by Hortonworks and based on the Apache Tez framework that runs on Hadoop 2.0 and YARN. Stinger aimed first to port Hive to run on Tez (which runs MapReduce jobs but enables them to potentially run as a single DAG), and then to add ACID transaction capabilities so that you can UPDATE and DELETE from a Hive table as well as INSERT and SELECT, using a transaction model that allows you to roll back uncommitted changes (diagram from the Hortonworks Stinger.next page).

[Image: Stinger.next roadmap diagram from Hortonworks]

Tez is more a set of developer APIs than the full data discovery / data analysis platform that Spark aims to provide, but it’s a technology that’s available now as part of the Hortonworks HDP2.2 platform and, as I showed in the blog post a few days ago, an existing Pig script run as-is in a Tez environment typically runs twice as fast as when it’s using MapReduce to move data around (with the usual testing caveats applying, YMMV etc). Hive should be the same, giving us the ability to take Hive transformation scripts and run them unchanged except for specifying Tez as the execution engine at the start.
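For example, a minimal sketch of switching an existing HiveQL transformation over, assuming an HDP2.2-style environment where Tez is available (the summary table name here is illustrative, not from the original post):

set hive.execution.engine=tez;
-- the rest of the script runs unchanged, e.g. a simple aggregation into a hypothetical posts_summary table:
INSERT OVERWRITE TABLE posts_summary
SELECT author, COUNT(*) FROM posts GROUP BY author;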

[Image: Hive running on the Tez execution engine]

Hive on Tez is probably the first of these initiatives we’ll see working with ODI and OBIEE, as ODI has just been certified for Hortonworks HDP2.1, and the new HDP2.2 release is the one that comes with Tez as an option for Pig and Hive query execution. I’m guessing ODI will need to have its Hive KMs updated to add a new option to select Tez or MapReduce as the underlying Hive execution engine, but otherwise I can see this working “out of the box” once ODI support for HDP2.2 is announced.

Going back to the last of the three blog posts I wrote on going beyond MapReduce, many in the Hadoop industry back Spark rather than Tez as the successor to MapReduce, as it’s a more mature implementation that goes beyond the developer-level APIs Tez provides to make Pig and Hive scripts run faster. As we’ll see in a moment, Spark comes with its own SQL capabilities and a Hive-compatible JDBC interface, but the other “swap out the execution engine” initiative to improve Hive is Hive on Spark, a port of Hive that allows Spark to be used as Hive’s execution engine instead of just MapReduce.

Hive on Spark is at an earlier stage of development than Hive on Tez, with the first demo being given at the recent Strata + Hadoop World New York, and specific builds of Spark and versions of Hive needed to get it running. Interestingly though, a post went up on the Cloudera blog a couple of days ago announcing an Amazon AWS AMI machine image that you could use to test Hive on Spark which, though it doesn’t come with a full CDH or HDP installation or features such as a HiveServer JDBC interface, does come with a small TPC-DS dataset and some sample queries that we can use to get a feel for how it works. I used the AMI image to create an Amazon AWS m3.large instance and gave it a go.

By default, Hive in this demo environment is configured to use Spark as the underlying execution engine. Running a couple of the TPC-DS queries first using this Spark engine, and then switching back to MapReduce by running the command “set hive.execution.engine=mr” within the Hive CLI, I generally found queries using Spark as the execution engine ran 2-3x faster than the MapReduce ones.

[Image: TPC-DS query timings, Hive on Spark vs MapReduce]

You can’t read too much into this timing as the demo AMI is really only to show off the functional features (Hive using Spark as the execution engine) and no work on performance optimisation has been done, but it’s encouraging even at this point that it’s significantly faster than the MapReduce version.

Long-term, the objective is to have both Tez and Spark available as execution engine options under Hive, along with MapReduce, as the diagram below from a presentation by Cloudera’s Szenon Ho shows; the advantage of building on Hive like this, rather than creating your own new SQL-on-Hadoop engine, is that you can make use of the library of SerDes, storage handlers and so on that you’d otherwise need to recreate for any new tool.

[Image: Hive execution engine options diagram, from a Cloudera presentation]

The third major SQL-on-Hadoop initiative I’ve been looking at is Spark SQL within Apache Spark. Unlike Hive on Spark, which aims to swap out the compiler and execution engine parts of Hive but otherwise leave the rest of the product unchanged, Apache Spark as a whole is a much more freeform, flexible data query and analysis environment that’s aimed more at analysts than business users looking to query their dataset using SQL. That said, Spark has some cool SQL and Hive integration features that make it an interesting platform for doing data analysis and ETL.

In my Spark ETL example the other day, I loaded log data and some reference data into RDDs and then filtered and transformed them using a mix of Scala functions and Spark SQL queries. Running on top of the core Spark APIs, Spark SQL allows you to register temporary tables within Spark that map onto RDDs, and gives you the option of querying your data using either familiar SQL relational operators or the more functional, programming-style Scala language.

[Image: Spark SQL temporary tables registered over RDDs]

You can also create connections to the Hive metastore though, and create Hive tables within your Spark application for when you want to persist results to a table rather than work with the temporary tables that Spark SQL usually creates against RDDs. In the code example below, I create a HiveContext as opposed to the sqlContext that I used in the example on the previous blog, and then use that to create a table in my Hive database, running on a Hortonworks HDP2.1 VM with Spark 1.0.0 pre-built for Hadoop 2.4.0:

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.hql("CREATE TABLE posts_hive (post_id int, title string, postdate string, post_type string, author string, post_name string, generated_url string) row format delimited fields terminated by '|' stored as textfile")
scala> hiveContext.hql("LOAD DATA INPATH '/user/root/posts.psv' INTO TABLE posts_hive")

If I then go into the Hive CLI, I can see this new table listed there alongside the other ones:

hive> show tables;
OK
dummy
posts
posts2
posts_hive
sample_07
sample_08
src
testtable2
Time taken: 0.536 seconds, Fetched: 8 row(s)

What’s even more interesting is that Spark also comes with a HiveServer2-compatible Thrift Server, making it possible for tools such as ODI that connect to Hive via JDBC to run Hive queries through Spark, with the Hive metastore providing the metadata but Spark running as the execution engine.

[Image: Spark SQL Thrift Server providing HiveServer2-compatible JDBC access]

This is subtly different to Hive-on-Spark: Hive’s metastore and its support for SerDes and storage handlers run under the covers, but Spark provides you with a full programmatic environment, making it possible to just expose Hive tables through the Spark layer, or to mix and match data from RDDs, Hive tables and other sources before storing and then exposing the results through the Hive SQL interface. For example, you could use Oracle SQL*Developer 4.1 with the Cloudera Hive JDBC drivers to connect to this Spark SQL Thrift Server and query the tables just like any other Hive source, but crucially the Hive execution is being done by Spark, rather than MapReduce as would normally happen.

[Image: SQL*Developer querying Hive tables through the Spark SQL Thrift Server]

Like Hive-on-Spark, Spark SQL and Hive support within Spark SQL are at early stages, with Spark SQL not yet being supported by Cloudera whereas the core Spark API is. From the work I’ve done with it, it’s not yet possible to expose Spark SQL temporary tables through the HiveServer2 Thrift Server interface, and I can’t see a way of creating Hive tables out of RDDs unless you stage the RDD data to a file in-between. But it’s clearly a promising technology and if it becomes possible to seamlessly combine RDD data and Hive data, and expose Spark RDDs registered as tables through the HiveServer2 JDBC interface it could make Spark a very compelling platform for BI and data analyst-type applications. Oracle’s David Allen, for example, blogged about using Spark and the Spark SQL Thrift Server interface to connect ODI to Hive through Spark, and I’d imagine it’d be possible to use the Cloudera HiveServer2 ODBC drivers along with the Windows version of OBIEE 11.1.1.7 to connect to Spark in this way too – if I get some spare time over the Christmas break I’ll try and get an example working.

Rittman Mead BI Forum 2015 Call for Papers Now Open!

I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions accepted through to January 18th 2015. As in previous years, the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:

  • Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
  • Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA : May 13th-15th 2015

Now in its seventh year, the Rittman Mead BI Forum is the only conference dedicated entirely to Oracle Business Intelligence, Oracle Business Analytics and the technologies and processes that support them – data warehousing, data analysis, data visualisation, big data and OLAP analysis. We’re looking for sessions around tips & techniques, project case studies and success stories, and sessions where you’ve taken Oracle’s BI products and used them in new and innovative ways. Each year we select around eight to ten speakers for each event, along with keynote speakers and a masterclass session, with speaker choices driven by attendee votes at the end of January and editorial input from myself, Jon Mead, Charles Elliott and Jordan Meyer.

[Image: Rittman Mead BI Forum]

Last year we had a big focus on cloud, and a masterclass and several sessions on bringing Hadoop and big data to the world of OBIEE. This year we’re interested in project stories and experiences around cloud and Hadoop, and we’re keen to hear about any Oracle BI Apps 11g implementations or migrations from the earlier 7.9.x releases. Getting back to basics we’re always interested in sessions around OBIEE, Essbase and data warehouse data modelling, and we’d particularly like to encourage session abstracts on data visualization, BI project methodologies and the incorporation of unstructured, semi-structured and external (public) data sources into your BI dashboards. For an idea of the types of presentations that have been selected in the past, check out the BI Forum 2014, 2013 and 2012 homepages, or feel free to get in touch via email at mark.rittman@rittmanmead.com

The Call for Papers entry form is here, and we’re looking for speakers for Brighton, Atlanta, or both venues if you can speak at both. All sessions this year will be 45 minutes long, and we’ll be publishing submissions and inviting potential attendees to vote on their favourite sessions towards the end of January. Other than that – have a think about abstract ideas now, and make sure you get them in by January 18th 2015.