Happy Christmas from Rittman Mead, and Where To Find Us in 2015
It’s the day before Christmas Eve over in the UK, and we’re finishing up for the Christmas break and looking forward to seeing friends and family over the next few days. To all of our readers, customers, friends and partners, have a great Holiday season and New Year, and we’re looking forward to seeing you at various events and on customer projects in 2015.
If you’re thinking of submitting an abstract for the Rittman Mead BI Forum 2015, the call for papers is now open, with abstracts accepted through to January 18th, 2015. As well as the BI Forum in May, you can also catch up with us at these events in the first half of the New Year:
- BIWA Summit 2015, San Francisco January 27th – 29th 2015
Francesco Tisiot, Jordan Meyer, Daniel Adams, Andy Rocha and I will be presenting on Big Data SQL, ODI & OBIEE; data science for Oracle professionals; and Oracle BICS and data visualization, amongst other topics
- Riga Dev Day, Riga, Latvia, 22nd January 2015
Robin Moffatt and I will be speaking on OBIEE, source control and release management, and Robin will deliver his award-winning session on OBIEE performance optimization
- Oracle Gebruikersclub Holland, Utrecht, 14th January 2015
This is an evening session for the Oracle Users’ Club Holland where I’ll be talking about our experiences delivering Big Data and BI projects on Oracle Big Data Appliance, Exadata and Exalytics
- Middle East Oracle University Expert Summit, Dubai, Feb 15th-19th 2015 [PDF]
I’m one of nine speakers at this event, and I’ll be speaking about OBIEE development “best practices” on the first day, and then OBIEE futures on the second
- OUGN Conference, March 12th-14th 2015
Robin Moffatt, myself and others will be speaking at the OUGN Conference in March, on topics around OBIEE and Big Data
- Collaborate’15, Las Vegas, April 12th-16th 2015
- Rittman Mead BI Forum 2015, 6th-8th May 2015 (Brighton), 13th-15th May 2015 (Atlanta)
- ODTUG KScope’15, Hollywood Florida, June 21st-25th 2015
Look out for details of these events in early 2015 as agendas and sessions get finalised – and don’t forget to submit your BI Forum 2015 abstract by January 18th, 2015, with final agenda and speaker details to follow in February 2015.
That’s it for now though – have a great Christmas and New Year, and see you in 2015!
OBIEE Enterprise Security
The Rittman Mead Global Services team have recently been involved in a number of security architecture implementations and produced a security model which meets a diverse set of requirements. Using our experience and standards, we have been able to deliver a robust model that addresses the common questions we routinely receive around security, such as:
“What considerations do I need to make when exposing Oracle BI to the outside world?”
or
“How can I make a flexible security model which is robust enough to meet the demands of my organisation but easy to maintain?”
The first question is based on a standard enterprise security model where the Oracle BI server is exposed through a web host, with SSL enabled and access security tightened up. This can be complex to achieve, but it is something we have implemented many times now.
The second question is much harder to answer, but our experience with numerous clients has led us to develop a multi-dimensional inheritance security model that has yielded excellent results.
What is a Multi-dimensional Inheritance Security Model?
The wordy title is actually a simple concept that incorporates 5 key areas:
- Easy to setup and maintain
- Flexible
- Durable
- Expandable
- Consistent throughout the product
While there are numerous ways of implementing a security model in Oracle BI, sticking to the key concepts above ensures we get it right. The biggest challenge we face in BI is that three different types of security are required, and all of them need to work in harmony:
- Application security
- Content security
- Data security
Understanding the organisation’s makeup
The first step is to consider the makeup of a typical organisation and build our security around it.
This diagram shows different departments (Finance, Marketing, Sales) whose data is specific to them, so departmental users should normally only see the data that is relevant to them. In contrast, the IT department, who are developing the system, need visibility across all of the data, as do the Directors.
What types of users do I have?
Next is to consider the types of users we have:
- BI Consumer: This will be the most basic and common user who needs to access the system for information.
- BI Analyst: As an Analyst the user will be expected to generate more bespoke queries and need ways to represent them. They will also need an area to save these reports.
- BI Author: The BI Author will be able to create content and publish that content for the BI Consumers and BI Analysts.
- BI Department Admin: The BI Department Admin will be responsible for permissions for their department as well as act as a focal point user.
- BI Developer: The BI Developer can be thought of as the person(s) who creates models in the RPD and will need additional access to the system for testing of their models. They might also be responsible for delivering Answers Requests or Dashboards in order to ‘Prove’ the model they created.
- BI Administrator: The Administrator will be responsible for the running of the BI system and will have access to every role. Most administrator tasks will not require SQL or data warehousing skills, and the role is generally kept separate from the BI Developer role.
The types of users here are a combination of every requirement we have seen and might not be required by every client. The order they are in shows the implied inheritance, so the BI Analyst inherits permissions and privileges from the BI Consumer and so on.
What Types do I need?
The size of the organization determines what types of user groups are required. By default Oracle ships with:
- BI Consumer
- BI Author
- BI Administrator
Typically we would recommend inserting the BI Analyst into the default groups:
- BI Consumer
- BI Analyst
- BI Author
- BI Administrator
This works well when there is a central BI team who develop content for the whole organization. The structure would look like this:
For larger organizations where dashboard development and permissions are handled across multiple BI teams, the BI Department Admin group can be used to locally manage permissions for each department. Typically we see the central BI team here as a Data Warehouse team who deliver the BI model (RPD) to the multiple departmental BI teams. In a large organization the administration of Oracle BI should be handled by someone who isn’t the BI Developer; the structure could look like this:
Permissions on groups
Each of the groups will require different permissions; at a high level they would be:

| Name | Permissions |
|---|---|
| BI Consumer | View dashboards and other published content |
| BI Analyst | BI Consumer permissions, plus create ad-hoc analyses and save them to their own area |
| BI Author | BI Analyst permissions, plus create and publish content for Consumers and Analysts |
| BI Department Admin | BI Author permissions, plus manage permissions and act as a focal point for their department |
| BI Developer | Access to develop and test RPD models, plus create Answers requests and dashboards to prove those models |
| BI Administrator | Full administrative access to the running of the BI system, inheriting every other role |
Understanding the basic security mechanics in 10g and 11g
In Oracle BI 10g the majority of the security is handled in the Oracle BI Server. This would normally be done through initialisation blocks, which would authenticate the user against an LDAP server and then run a query against database tables to populate the user into ‘Groups’ used in the RPD and ‘Web Groups’ used in the Presentation Server. These groups would have to match at each level: database, Oracle BI Server and Oracle BI Presentation Server.
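As a hedged illustration of the 10g approach, a row-wise initialisation block query along these lines would populate the GROUP variable for the logged-in user; the security table and column names here are entirely illustrative and would map onto whatever security tables your organisation maintains:

```sql
-- Hypothetical security table: a row-wise initialization block returns
-- variable-name / value pairs, so selecting the literal 'GROUP' populates
-- the GROUP session variable with one row per group for the current user.
SELECT 'GROUP', GROUP_NAME
FROM   SEC_USER_GROUPS
WHERE  UPPER(USER_NAME) = UPPER(':USER')
```

The group names returned then need to exist both as Groups in the RPD and as Web Groups in the Presentation Server for the model to hang together end-to-end.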
With the addition of Enterprise Manager and WebLogic, the security elements in Oracle BI 11g changed radically. Authenticating the user in the Oracle BI Server is no longer the recommended approach and is limited on Linux. While the RPD Groups and Presentation Server Web Groups still exist, they don’t need to be used. Users are now authenticated against WebLogic, either using WebLogic’s own users and groups or by plugging it into a choice of LDAP servers. The end result is a set of Groups and Users that exist in WebLogic. The groups then need to be mapped to Application Roles in Enterprise Manager, which can be seen by the Oracle BI Presentation Services and Oracle BI Server; it is recommended to create a one-to-one mapping for each group.
What does all this look like then?
Assuming this is for an SME-sized organization where the dashboard development (BI Author) is done by the central BI team, the groups would look like this:
The key points are:
- The generic BI Consumer/Analyst groups give their permissions to the department versions
- No users should be in the generic BI Consumer/Analyst groups
- Only users from the BI team should be in the generic BI Author/Administrator group
- New departments can be easily added
- The lines denote the inheritance of permissions and privileges
What’s next – the Web Catalog?
The setup of the web catalog is very important to ensure that it does not become unwieldy, so it needs to reflect the security model; we would recommend setting up a set of base folders along the lines sketched below.
Each department has its own folder containing four sub-folders. The permission applied to each department’s root folder is BI Administrators with Full Control, so full control is possible across the top. This is also true for every folder below; however, each will have additional explicit permissions applied to ensure that the department cannot create any more than the four sub-folders.
- The Dashboard folder is where the dashboards go; the department’s BI Developers group has Full Control and the department’s BI Consumers have Read access. This allows the department’s BI Developers to create dashboards, the department’s BI Administrators to apply permissions, and the department’s Consumers and Analysts to view them.
- The same permissions are applied to the Dashboard Answers folder, to the same effect.
- The Development Answers folder has Full Control given to the department’s BI Developers and no access for the department’s BI Analysts or BI Consumers. This folder is mainly for the department’s BI Developers to store answers that are still in development.
- The Analyst folder is where the department’s BI Analysts can save their own answers, so they need Full Control of this folder.
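Pulling that together, a sketch of how those base folders and permissions might look for the departments in the earlier example (folder and department names are illustrative):

```
/shared
  /Finance                -- BI Administrators: Full Control
    /Dashboard            -- Finance BI Developers: Full Control; Finance Consumers and Analysts: Read
    /Dashboard Answers    -- same permissions as the Dashboard folder
    /Development Answers  -- Finance BI Developers: Full Control; Finance Analysts and Consumers: No Access
    /Analyst              -- Finance BI Analysts: Full Control
  /Marketing              -- same four sub-folders and permissions
  /Sales                  -- same four sub-folders and permissions
```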
I hope this article gives some insight into security with Oracle BI. Remember that our Global Services products offer a flexible support model where you can harness our knowledge to deliver your projects in a cost-effective manner.
OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance
The other week I posted a three-part series (part 1, part 2 and part 3) on going beyond MapReduce for Hadoop-based ETL, where I looked at a typical Apache Pig dataflow-style ETL process and showed how Apache Tez and Apache Spark can potentially make these processes run faster and make better use of in-memory processing. I picked Pig as the data processing environment because the multi-step data transformations it creates translate into lots of separate MapReduce jobs in traditional Hadoop ETL environments, but run as a single DAG (directed acyclic graph) under Tez and Spark, which can potentially use memory to pass intermediate results between steps rather than writing all those intermediate datasets to disk.
But tools such as OBIEE and ODI use Apache Hive to interface with the Hadoop world, not Pig, so it’s improvements to Hive that will have the biggest immediate impact on the tools we use today. What’s interesting is the development work that’s going on around Hive in this area, with four different “next-generation Hive” initiatives underway that could end up making OBIEE and ODI on Hadoop run faster:
- Hive-on-Tez (or “Stinger”), principally championed by Hortonworks, along with Stinger.next which will enable ACID transactions in HiveQL
- Hive-on-Spark, a more limited port of Hive to run on Spark and backed by Cloudera amongst others
- Spark SQL within Apache Spark, which enables SQL queries against Spark RDDs (and Hive tables), and exposes a HiveServer2-compatible Thrift Server for JDBC access
- Vendor initiatives that build on Hive but are mainly around integration with their RDBMS engines, for example Oracle Big Data SQL
Vendor initiatives like Oracle’s Big Data SQL and Cloudera Impala have the benefit of working now (and are supported), but usually come with some sort of penalty for not working directly within the Hive framework. Oracle’s Big Data SQL, for example, can read data from Hive (very efficiently, using Exadata SmartScan-type technology) but can’t write back to Hive, and currently pulls all the Hive data into Oracle if you try and join Oracle and Hive data together. Cloudera’s Impala, on the other hand, is lightning-fast and works directly on the Hadoop platform, but doesn’t support the same ecosystem of SerDes and storage handlers that Hive supports, taking away one of the key flexibility benefits of working with Hive.
So what about the attempts to extend and improve Hive, or include Hive interfaces and compatibility in Spark? In most cases an ETL routine written as a series of Hive statements isn’t going to be as fast or resource-efficient as a custom Spark program, but if we can make Hive run faster or have a Spark application masquerade as a Hive database, we could effectively give OBIEE and ODI on Hadoop a “free” platform performance upgrade without having to change the way they access Hadoop data. So what are these initiatives about, and how usable are they now with OBIEE and ODI?
Probably the most ambitious next-generation Hive project is the Stinger initiative, backed by Hortonworks and based on the Apache Tez framework that runs on Hadoop 2.0 and YARN. Stinger aimed first to port Hive to run on Tez (which runs MapReduce jobs but enables them to potentially run as a single DAG), and then to add ACID transaction capabilities so that you can UPDATE and DELETE from a Hive table as well as INSERT and SELECT, using a transaction model that allows you to roll back uncommitted changes (diagram from the Hortonworks Stinger.next page).
Tez is more a set of developer APIs than the full data discovery / data analysis platform that Spark aims to provide, but it’s a technology that’s available now as part of Hortonworks’ HDP2.2 platform, and as I showed in the blog post a few days ago, an existing Pig script that you run as-is in a Tez environment typically runs twice as fast as when it’s using MapReduce to move data around (with the usual testing caveats applying, YMMV etc). Hive should be the same as well, giving us the ability to take Hive transformation scripts and run them unchanged except for specifying Tez at the start as the execution engine.
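As a hedged sketch of what that looks like in practice (the source and target tables here are illustrative), an existing HiveQL transformation only needs the execution engine setting at the top of the script:

```sql
-- Switch this Hive session from MapReduce to Tez (HDP2.2 onwards);
-- everything below runs unchanged, just on the new execution engine.
set hive.execution.engine=tez;

-- An illustrative transformation step from a larger ETL script
INSERT OVERWRITE TABLE posts_by_author
SELECT author, COUNT(*) AS post_count
FROM   posts
GROUP  BY author;
```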
Hive on Tez is probably the first of these initiatives we’ll see working with ODI and OBIEE, as ODI has just been certified for Hortonworks HDP2.1, and the new HDP2.2 release is the one that comes with Tez as an option for Pig and Hive query execution. I’m guessing ODI will need to have its Hive KMs updated to add a new option to select Tez or MapReduce as the underlying Hive execution engine, but otherwise I can see this working “out of the box” once ODI support for HDP2.2 is announced.
Going back to the last of the three blog posts I wrote on going beyond MapReduce, many in the Hadoop industry back Spark rather than Tez as the successor to MapReduce, as it’s a more mature implementation that goes beyond the developer-level APIs Tez provides to make Pig and Hive scripts run faster. As we’ll see in a moment, Spark comes with its own SQL capabilities and a Hive-compatible JDBC interface, but the other “swap out the execution engine” initiative to improve Hive is Hive on Spark, a port of Hive that allows Spark to be used as Hive’s execution engine instead of just MapReduce.
Hive on Spark is at an earlier stage of development than Hive on Tez, with the first demo being given at the recent Strata + Hadoop World New York, and specific builds of Spark and versions of Hive needed to get it running. Interestingly though, a post went up on the Cloudera blog a couple of days ago announcing an Amazon AWS AMI machine image that you could use to test Hive on Spark, which, though it doesn’t come with a full CDH or HDP installation or features such as a HiveServer JDBC interface, does come with a small TPC-DS dataset and some sample queries that we can use to get a feel for how it works. I used the AMI image to create an Amazon AWS m3.large instance and gave it a go.
By default, Hive in this demo environment is configured to use Spark as the underlying execution engine. Running a couple of the TPC-DS queries first using this Spark engine, and then switching back to MapReduce by running the command “set hive.execution.engine=mr” within the Hive CLI, I generally found queries using Spark as the execution engine ran 2-3x faster than the MapReduce ones.
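The toggle itself is just a session-level Hive setting, along these lines (the table name assumes the TPC-DS schema bundled with the image, and is illustrative):

```sql
-- Run a query against the Spark engine (the demo environment's default)...
set hive.execution.engine=spark;
SELECT COUNT(*) FROM store_sales;

-- ...then repeat the same query on classic MapReduce for comparison
set hive.execution.engine=mr;
SELECT COUNT(*) FROM store_sales;
```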
You can’t read too much into this timing as the demo AMI is really only to show off the functional features (Hive using Spark as the execution engine) and no work on performance optimisation has been done, but it’s encouraging even at this point that it’s significantly faster than the MapReduce version.
Long-term, the objective is to have both Tez and Spark available as options for execution engines under Hive, along with MapReduce, as the diagram below from a presentation by Cloudera’s Szehon Ho shows; the advantage of building on Hive like this rather than creating your own new SQL-on-Hadoop engine is that you can make use of the library of SerDes, storage handlers and so on that you’d otherwise need to recreate for any new tool.
The third major SQL-on-Hadoop initiative I’ve been looking at is Spark SQL within Apache Spark. Unlike Hive on Spark, which aims to swap out the compiler and execution engine parts of Hive but otherwise leave the rest of the product unchanged, Apache Spark as a whole is a much more freeform, flexible data query and analysis environment that’s aimed more at analysts than business users looking to query their dataset using SQL. That said, Spark has some cool SQL and Hive integration features that make it an interesting platform for doing data analysis and ETL.
In my Spark ETL example the other day, I loaded log data and some reference data into RDDs and then filtered and transformed them using a mix of Scala functions and Spark SQL queries. Running on top of the core Spark APIs, Spark SQL allows you to register temporary tables within Spark that map onto RDDs, and gives you the option of querying your data using either familiar SQL relational operators or the more functional, programming-style Scala language.
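To make that concrete, here’s a minimal spark-shell sketch along the lines of the previous post, using the same pipe-delimited posts file that I load into Hive below; the case class, column picks and table name are illustrative, and the syntax is the Spark 1.0.x-era API:

```scala
// Spark SQL over an RDD: describe a schema with a case class, register the
// RDD as a temporary table, then query it with SQL or with Scala functions.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD

case class Post(postId: Int, title: String, author: String)

val posts = sc.textFile("/user/root/posts.psv")
              .map(_.split('|'))
              .map(p => Post(p(0).toInt, p(1), p(4)))

// SQL-style access through a temporary table...
posts.registerAsTable("posts_tmp")
val postsPerAuthor = sqlContext.sql(
  "SELECT author, COUNT(*) FROM posts_tmp GROUP BY author")
postsPerAuthor.collect.foreach(println)

// ...or the equivalent using the functional Scala API on the same RDD
posts.map(p => (p.author, 1)).reduceByKey(_ + _).collect.foreach(println)
```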
You can also create connections to the Hive metastore though, and create Hive tables within your Spark application for when you want to persist results to a table rather than work with the temporary tables that Spark SQL usually creates against RDDs. In the code example below, I create a HiveContext as opposed to the sqlContext that I used in the example on the previous blog, and then use that to create a table in my Hive database, running on a Hortonworks HDP2.1 VM with Spark 1.0.0 pre-built for Hadoop 2.4.0:
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hiveContext.hql("CREATE TABLE posts_hive (post_id int, title string, postdate string, post_type string, author string, post_name string, generated_url string) row format delimited fields terminated by '|' stored as textfile")
scala> hiveContext.hql("LOAD DATA INPATH '/user/root/posts.psv' INTO TABLE posts_hive")
If I then go into the Hive CLI, I can see this new table listed there alongside the other ones:
hive> show tables;
OK
dummy
posts
posts2
posts_hive
sample_07
sample_08
src
testtable2
Time taken: 0.536 seconds, Fetched: 8 row(s)
What’s even more interesting is that Spark also comes with a HiveServer2-compatible Thrift Server, making it possible for tools such as ODI that connect to Hive via JDBC to run Hive queries through Spark, with the Hive metastore providing the metadata but Spark running as the execution engine.
This is subtly different to Hive on Spark: Hive’s metastore, SerDe support and storage handlers run under the covers, but Spark provides you with a full programmatic environment, making it possible to just expose Hive tables through the Spark layer, or to mix and match data from RDDs, Hive tables and other sources before storing and then exposing the results through the Hive SQL interface. For example, you could use Oracle SQL*Developer 4.1 with the Cloudera Hive JDBC drivers to connect to this Spark SQL Thrift Server and query the tables just like any other Hive source, but crucially the Hive execution is being done by Spark, rather than by MapReduce as would normally happen.
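If you wanted to do the same thing programmatically rather than through SQL*Developer, a minimal JDBC sketch might look like this; the host, port, credentials and query are illustrative, and the driver class is the standard HiveServer2 one:

```scala
// Querying the Spark SQL Thrift Server through the regular HiveServer2 JDBC
// driver - to the client it looks just like any other Hive source, but the
// query execution is handled by Spark rather than MapReduce.
import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection(
  "jdbc:hive2://sparkserver:10000/default", "hive", "")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery(
  "SELECT post_type, COUNT(*) FROM posts_hive GROUP BY post_type")
while (rs.next()) println(rs.getString(1) + " : " + rs.getLong(2))
conn.close()
```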
Like Hive on Spark, Spark SQL and its Hive support are at an early stage, with Spark SQL not yet supported by Cloudera whereas the core Spark API is. From the work I’ve done with it, it’s not yet possible to expose Spark SQL temporary tables through the HiveServer2 Thrift Server interface, and I can’t see a way of creating Hive tables out of RDDs unless you stage the RDD data to a file in between. But it’s clearly a promising technology, and if it becomes possible to seamlessly combine RDD data and Hive data, and to expose Spark RDDs registered as tables through the HiveServer2 JDBC interface, it could make Spark a very compelling platform for BI and data analyst-type applications. Oracle’s David Allan, for example, blogged about using Spark and the Spark SQL Thrift Server interface to connect ODI to Hive through Spark, and I’d imagine it’d be possible to use the Cloudera HiveServer2 ODBC drivers along with the Windows version of OBIEE 11.1.1.7 to connect to Spark in this way too – if I get some spare time over the Christmas break I’ll try and get an example working.
Rittman Mead BI Forum 2015 Call for Papers Now Open!
I’m very pleased to announce that the Call for Papers for the Rittman Mead BI Forum 2015 is now open, with abstract submissions accepted until January 18th 2015. As in previous years the BI Forum will run over consecutive weeks in Brighton, UK and Atlanta, GA, with the provisional dates and venues as below:
- Brighton, UK : Hotel Seattle, Brighton, UK : May 6th – 8th 2015
- Atlanta, GA : Renaissance Atlanta Midtown Hotel, Atlanta, USA : May 13th-15th 2015
Now in its seventh year, the Rittman Mead BI Forum is the only conference dedicated entirely to Oracle Business Intelligence, Oracle Business Analytics and the technologies and processes that support them – data warehousing, data analysis, data visualisation, big data and OLAP analysis. We’re looking for sessions around tips & techniques, project case-studies and success stories, and sessions where you’ve taken Oracle’s BI products and used them in new and innovative ways. Each year we select around eight to ten speakers for each event, along with keynote speakers and a masterclass session, with speaker choices driven by attendee votes at the end of January and editorial input from myself, Jon Mead, Charles Elliott and Jordan Meyer.
Last year we had a big focus on cloud, and a masterclass and several sessions on bringing Hadoop and big data to the world of OBIEE. This year we’re interested in project stories and experiences around cloud and Hadoop, and we’re keen to hear about any Oracle BI Apps 11g implementations or migrations from the earlier 7.9.x releases. Getting back to basics we’re always interested in sessions around OBIEE, Essbase and data warehouse data modelling, and we’d particularly like to encourage session abstracts on data visualization, BI project methodologies and the incorporation of unstructured, semi-structured and external (public) data sources into your BI dashboards. For an idea of the types of presentations that have been selected in the past, check out the BI Forum 2014, 2013 and 2012 homepages, or feel free to get in touch via email at mark.rittman@rittmanmead.com.
The Call for Papers entry form is here, and we’re looking for speakers for Brighton, Atlanta, or both venues if you can make it to both events. All sessions this year will be 45 minutes long, and we’ll be publishing the submissions and inviting potential attendees to vote on their favourite sessions towards the end of January. Other than that – have a think about abstract ideas now, and make sure you get them in by January 18th 2015.