Tag Archives: Oracle Big Data Appliance

Rittman Mead BI Forum 2014 Abstract Scoring Now Live – For 1 Week Only!

The call for papers for the Rittman Mead BI Forum 2014 closed at the end of January, and we’ve had some excellent submissions on topics ranging from OBIEE, visualizations and data discovery through to in-memory analytics, big data and data integration. As always, we’re now opening up the list of submitted abstracts for scoring, so that anyone considering coming to either the Brighton or Atlanta events can have a say in which abstracts are selected.

The voting forms and event details are below:

In case you missed it, we also announced the speaker for the Wednesday masterclass the other day – Lars George from Cloudera, who’ll be talking about Hadoop, HBase, Cloudera and how it all applies to the worlds of analytics, BI and DW – something we’re all really excited about.

Voting is open for just one week, and will close at 5pm PST on Tuesday, 18th Feb. Shortly afterwards we’ll announce the speaker line-up, and open up registrations for both events. Keep an eye on the blog for more details as they come.

Rittman Mead BI Forum 2014 Call for Papers Closing Soon – And News on This Year’s Masterclass

It’s just a couple of days until the call for papers for the Rittman Mead BI Forum 2014 closes, with suggested topics this year including OBIEE (of course), Essbase, Endeca, Big Data, Visualizations, In-Memory analysis and data integration. So far we’ve had some excellent submissions, but we’re still looking for more – so if you’re considering putting an abstract in, do it now before we close the process late this Friday night!

I’m also very excited to announce that this year’s optional one-day masterclass, held on the Wednesday before each event, will be presented by Lars George from Cloudera, who’ll be talking about Hadoop, Cloudera’s distribution of Hadoop and their management and real-time query tools, and how these relate to the world of Oracle BI&DW. Lars is a Cloudera Solutions Architect and their Head of Services for EMEA, and is also an HBase committer and author of the book “HBase: The Definitive Guide”.

You’ll probably have seen a lot on big data and Cloudera on this blog over the past few months, and I’m particularly grateful to Justin Kestelyn – who used to run OTN and the Oracle ACE Program, and now does a similar role over at Cloudera – for making it happen. Thanks Justin and Lars, and we look forward to seeing Lars in Brighton and Atlanta in May this year.

Once the call for papers closes, we’ll do the usual vote to allow potential attendees to influence the paper selection, and then we’ll announce the agendas and open the events up for registration later in February. Until then though – get your abstracts in now before it’s too late… 

Previewing the BI Forum 2013 Data Integration Masterclass

I guess it’s a British thing to not blow our own trumpet (does that translate the same over in the US?), but something I’m particularly proud of with the upcoming Rittman Mead BI Forum 2013 events is our Oracle Data Integration Masterclass, running on the Wednesday before each event properly starts and put together by myself, Stewart Bryson and Michael Rainey. Although the main theme for the BI Forum is OBIEE, virtually every BI system we work with has a data warehouse of some sort underneath it, and most OBIEE professionals have to understand, to one extent or another, data warehousing principles and how Oracle’s data integration tools work. So this year we thought we’d take a deep-dive into Oracle Data Integrator and the wider Oracle Data Integration Suite, and in this preview posting I’ll give you a taste of what’s coming in the session – and places are still available for the US BI Forum event, and for the masterclass itself if you’ve so far registered for just the main conference event.

The masterclass is made up of six sections, delivered by myself, Stewart and Michael, and assumes a basic understanding of data warehousing and ETL tools but otherwise gets down into the detail of what we’ve found works well “in the field”. Stewart Bryson, Oracle ACE and Managing Director for Rittman Mead America, will open the session with an overview of ODI and the Oracle Data Integration Suite, taking a look at the product history and walking the audience through the major elements of the ODI product architecture. If you’ve ever wondered what agents do within ODI, why there are two repositories and where WebLogic comes into it, Stewart’s session will make everything clear before we get into the rest of the details.

Then, after coffee, Stewart will carry on and talk about what’s called the Oracle Information Management Reference Architecture, Oracle’s next-generation blueprint for data warehousing and information management that combines the best of Kimball and Inmon with new thinking around “big data” and “data discovery”. ODI and the Oracle Data Integration Suite are the enabling technologies for this new framework. At Rittman Mead we use this framework for the majority of our DW customer engagements, and later on in the masterclass we’ll talk about how big data sources, for example, can be leveraged by ODI and brought into your BI environment in the same way as any other regular, relational data source.

The third section of the masterclass sees Michael Rainey take over the stage and talk to us about ODI’s integration with Oracle GoldenGate, Oracle’s data integration product for real-time data loading and analysis. Michael has taken part in several ODI and GoldenGate customer engagements over in the States, and has worked with Stewart to produce a number of custom ODI knowledge modules that make better use of this powerful data integration tool. If you’ve read any of Michael’s blog posts on ODI and GoldenGate and are interested in hearing a bit more detail on how it all works, as well as some real-world practical tips and tricks, this will be an invaluable session for you.

So far I’ve got away with just making the tea, but straight after Michael comes my session, where I’ll be talking about ODI and its new integration with Hadoop, NoSQL and the wider “big data” technology area. I’ve been covering ODI and Hadoop in some blog posts over the past week, but there’s only so much you can fit into a blog post, and this session will be the first airing of this new material, where I’ll demo all the main integration points and talk about what works well, and where the main value is, with this very interesting new feature.

Then it’s back to Stewart again, where he’ll be talking about creating highly-resilient ETL code that’s also resumable, using features such as ODI 11g’s load plans and the Oracle Database’s resumable space allocation feature. Stewart and I were particularly keen to put this session together as it brings together work Stewart did a few years ago on fault-tolerant ETL in the Oracle Database with some blog posts I put together over the 2012 Christmas break on highly-resilient ETL with ODI 11g. The session explains the background to the ETL resilience features in the Oracle Database and ODI’s use of WebLogic JEE agents, and demonstrates, through some custom knowledge modules, how they can be brought together for your project.

Finally, Michael concludes the masterclass with a look at a feature you’re probably vaguely aware of and intend to learn something about, but which sounds a bit complex: Groovy scripting and the ODI SDK. In fact, like WLST scripting for OBIEE, learning Groovy and the SDK is the key to automating tedious tasks such as mass-importing and reverse-engineering tables and files, as well as making it possible to add functionality to ODI or integrate it with other standards-based products. In a session almost entirely made up of live demos, Michael will take us through the basics of Groovy and the SDK, and show us a few examples of where this could add value to your data integration projects.

So there we have it – Brighton is now fully booked, but if you’ve already registered for the main event and want to come to the masterclass too, you can log back into the registration site and update your booking to include the additional masterclass fee. Atlanta is running a week later and so still has a few main event passes left, and again, if you’ve already registered for the main conference, use the link in your registration confirmation to go back in and add the masterclass to your booking. Hopefully we’ll see you all in Brighton or Atlanta for the Rittman Mead BI Forum 2013 in the next two weeks!

OBIEE, ODI and Hadoop Part 1: So What Is Hadoop, MapReduce and Hive?

Recent releases of OBIEE and ODI have included support for Apache Hadoop as a data source, probably the most well-recognised technology within the “big data” movement. Most OBIEE and ODI developers have probably heard of Hadoop and of MapReduce, the data-processing programming model that goes hand-in-hand with it, but haven’t tried them themselves or really found a pressing reason to use them. So over this series of three articles, we’ll take a look at what these two technologies actually are, and then see how OBIEE 11g and ODI 11g connect to them and make use of their features.

Hadoop is actually a family of open-source tools sponsored by the Apache foundation that provides a distributed, reliable shared storage and analysis system. Designed around clusters of commodity servers (which may actually be virtual and cloud-based) and with data stored on the servers themselves, not on separate storage units, Hadoop came from the world of Silicon Valley social and search companies and has spawned a raft of Apache foundation sub-projects such as Hive (for SQL-like querying of Hadoop clusters), HBase (a distributed, column-store database based on Google’s “BigTable” technology), Pig (a procedural language for writing Hadoop analysis jobs that’s PL/SQL to Hive’s SQL) and HDFS (a distributed, fault-tolerant filesystem). Hadoop, being open-source, can be downloaded for free and run easily on most Unix-based PCs and servers, and also on Windows with a bit of mucking-around to create a Unix-like environment; the core Hadoop code has been extended, and to an extent commercialised, by companies such as Cloudera (who provide the Hadoop infrastructure for Oracle’s Big Data Appliance) and Hortonworks, who can be thought of as the “Red Hat” and “SuSE” of the Hadoop world.

MapReduce, on the other hand, is a programming model or algorithm for processing data, typically in parallel. MapReduce jobs can be written, theoretically, in any language as long as they expose two particular methods, steps or functions to the calling program (typically, the Hadoop JobTracker) – see the word-count sketch just after this list:

  • A “Map” function, which takes input data in the form of key/value pairs and extracts the data you’re interested in, outputting it again in the form of key/value pairs
  • A “Reduce” function, which sorts and groups the “mapped” key/value pairs, and then typically passes the results down the line to another MapReduce job for further processing
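
To make that a bit more concrete, below is a minimal sketch of the classic “word count” job, the “hello world” of MapReduce, written against the org.apache.hadoop.mapreduce Java API; the job name and the input and output paths are just placeholders for whatever your own cluster and data would use. The map function turns each line of input into (word, 1) key/value pairs, and the reduce function sums the counts for each word once Hadoop has sorted and grouped them.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // "Map" step: each input line arrives as a (byte offset, line text) key/value pair,
  // and we emit one (word, 1) pair for every word in the line.
  public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // "Reduce" step: Hadoop has already sorted and grouped the mapped pairs by key,
  // so each call receives one word plus all of its 1s, which we simply add up.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS directory of log files
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // results are written back to HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Hadoop takes care of splitting the input across the cluster, running many copies of the mapper and reducer in parallel, and shuffling the intermediate pairs between them; the developer only ever writes the two functions above.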

Joel Spolsky (of Joel on Software fame, and one of Jon’s and my inspirations in setting up Rittman Mead) explains MapReduce well in this article from back in 2006, where he’s trying to explain the fundamental differences between object-oriented languages like Java and functional languages like Lisp and Haskell. Ironically, most MapReduce functions you see these days are actually written in Java, but it’s MapReduce’s intrinsic simplicity, and the way that Hadoop abstracts away the process of running individual map and reduce functions on lots of different servers (with the Hadoop job co-ordination tools taking care of making sense of all the chaos and returning a result at the end), that has made it take off so well and allowed data analysis tasks to scale beyond the limits of a single server.

I don’t intend to try and explain the full details of Hadoop in this blog post though, and in reality most OBIEE and ODI developers won’t need to know how Hadoop works under the covers; what they will often want to do, though, is connect to a Hadoop cluster and make use of the data it contains, and its data processing capabilities, either to report against directly or, more likely, to use as an input into a more traditional data warehouse. An organisation might store terabytes or petabytes of web log data, details of user interactions with a web-based service, or other e-commerce-type information in HDFS, Hadoop’s clustered, distributed, fault-tolerant file system, and while they might be more than happy to process and analyse that data entirely using Hadoop-style data analysis tools, they might also want to load some of the nuggets of information derived from it into a more traditional, Oracle-style data warehouse, or indeed make it available to less technical end-users more used to writing queries in SQL or using tools such as OBIEE.

Of course, the obvious disconnect here is that distributed computing, fault-tolerant clusters and MapReduce routines written in Java can get really “technical” – more technical than someone like myself generally gets involved in, and certainly more technical than your average web analytics person will want to get. Because of this need to provide big-data-style analytics to non-Java programmers, some developers at Facebook a few years ago came up with “Hive”, a set of technologies that provide a SQL-type interface over Hadoop and MapReduce, along with supporting technologies such as a metadata layer that’s not unlike the RPD that OBIEE uses, so that non-programmers can query data held in Hadoop, with Hive creating the underlying MapReduce routines for them. And for bonus points, because the HiveQL language that Hive provides is so like SQL, and because Hive also provides ODBC and JDBC drivers conforming to common standards, tools such as OBIEE and ODI can now access Hadoop/MapReduce data sources and analyse their data just like any other data source (more or less…)
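
To give a flavour of what this looks like from a client tool’s point of view, here’s a minimal sketch of querying Hive over JDBC from Java, which is essentially what ODI’s Hive connectivity does under the covers; the hostname, port, database and the access_logs table are placeholder assumptions for this example, and depending on your Hive version you may need the older org.apache.hadoop.hive.jdbc.HiveDriver class and a jdbc:hive:// URL rather than the HiveServer2 equivalents shown here.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Register the Hive JDBC driver with java.sql.DriverManager.
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Hostname, port and database are placeholders for your own Hive server.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hadoopnode:10000/default");
         Statement stmt = conn.createStatement();
         // A normal-looking HiveQL query; behind the scenes Hive compiles it into
         // one or more MapReduce jobs and runs them on the cluster.
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM access_logs GROUP BY page")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}

From the client’s point of view this is just another JDBC data source, which is exactly why tools like ODI, and (via the equivalent ODBC driver) OBIEE, can treat Hive tables much like any other relational source.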

So where this leaves us is that the 11.1.1.7 release of OBIEE can access Hadoop/MapReduce sources via a Hive ODBC driver, whilst ODI 11.1.1.6+ can access the same sources via a Hive JDBC driver. There is of course the additional question as to why you might want to do this, so over the next two articles in this series we’ll cover how OBIEE, and then ODI, can access Hadoop/MapReduce data sources, try to answer that question, and look at what benefits OBIEE and ODI might provide over more “native” or low-level big data query and analysis tools such as Cloudera’s Impala or Google’s Dremel (for data analysis), or Hadoop technologies such as Pig or Sqoop (for data loading and processing). Check back tomorrow for the next instalment in the series.

Rittman Mead BI Forum 2013 Call for Papers now Open!

I’m pleased to announce that the call for papers for the 5th annual Rittman Mead BI Forum is now open, with abstracts being accepted through to January 31st, 2013.

Last year’s BI Forum was the biggest and best ever, running in Brighton, UK and Atlanta, GA in May 2012. This year we’re back again at the Hotel Seattle in Brighton, and moving venues in Atlanta to the Georgia Tech Hotel and Conference Center. As in previous years, the BI Forum is centred around OBIEE and related products such as Oracle Data Integrator, Oracle Endeca, Oracle Essbase and the Oracle Database, as well as technologies and concepts such as big data, BI methodology, and BI “best practices”. What makes this event unique is the audience and its size: we keep the numbers to around sixty attendees at each event, the audience is at intermediate-to-expert level, there are no marketing or sales presentations, the atmosphere is informal, and we concentrate as much on networking and discussions as we do on the actual sessions.

We’ve had some fantastic speakers over the years, presenting on all aspects of OBIEE development, product internals, case studies and project approaches, and we’re proud to have a speaker mix that includes industry experts, Oracle product managers, and presenters who were previously unknown but have gone on to take the “best speaker” award. For each of the two events we’ll generally select eight one-hour presentations and three TED-style ten-minute sessions, and nearer the event we’ll invite suggestions on what this year’s debate should be.

For now, though, the call for papers is open, and you can propose presentations for either the Brighton event, the Atlanta event, or both. At the end of January we’ll invite anyone thinking of attending to vote for their favourite presentations, and we’ll take those votes, along with a bit of “curating” from myself, and publish the agenda.

The call for papers website is here: Rittman Mead BI Forum 2013 Call for Papers – and abstracts will be accepted through to the end of January 2013.

For a roundup of last year’s BI Forum events, including details of the sessions and presentation downloads, you can visit the BI Forum 2012 roundup page on our blog. Any questions, just drop me an email at mark.rittman@rittmanmead.com.