Tag Archives: User Groups & Conferences

Photos and Presentation Downloads from the Rittman Mead BI Forum 2014

Well we’re back in Brighton, UK now after the second successful week of the Rittman Mead BI Forum 2014. In Week 1, we went to the Hotel Seattle in Brighton (read the recap here), and then straight after we flew over to Atlanta, GA, to run the second week – and it was possibly even better than Brighton ;-)

Congratulations to Omri Traub, winner of the Atlanta Best Speaker award – the first person from Oracle to win, in fact – and to all of the other presenters who helped put together an excellent event. If you’re interested, I’ve uploaded a bunch of photos from both Brighton and Atlanta to Flickr, and you can view the BI Forum 2014 photo set here.

As usual, where we’ve got permission (or the PDF) from the presenter, we’re making all of the presentations available for download to both attendees and non-attendees – not everyone can make it to the event, but we don’t want you to miss out. We’re also very grateful to Lars George and Cloudera for making their Hadoop Masterclass slides available too – thanks everyone.

Other than that – thanks again to everyone who attended, and hopefully we’ll see you all again next year!

The State of the OBIEE11g World as of May 2014

I’m conscious I’ve posted a lot on this blog over the past few months about hot new topics like big data, Hadoop and Oracle Advanced Analytics, and not so much about OBIEE, which traditionally has been the core of Rittman Mead’s business and what we’ve written about most historically. Part of this is because there’s a lot of innovative stuff coming out of the big data world, but a part of it is because there’s not been a big new OBIEE11g release this year, as we had last year with 11.1.1.7, before that 11.1.1.6, and so on. But there’s actually a lot of interesting stuff going on in the OBIEE11g world at the moment without a big headline release, and what with the Brighton RM BI Forum 2014 taking place last week and the product keynotes it gave us, I thought it’d be worth taking a look back at where we are in May 2014, where the innovation is happening and what’s coming up in the next few months for OBIEE.

Product Versions and Capabilities

As of the time of writing (May 11th 2014) we’re currently on the 11.1.1.7.x version of OBIEE, updated with a few patch sets since the original April 2013 release to include features such as Mobile App Designer. OBIEE 11.1.1.7.x saw a UI update to the new FusionFX theme, replacing the theme used from the 11.1.1.3 release, and brought in new capabilities such as Hadoop/Hive integration as well as a bunch of “fit-and-finish” improvements, such that at the time I referred to it as “almost like 11g Release 2”, in terms of usability, features and general “ready-for-deployment” quality.

The other major new capability OBIEE 11.1.1.7 brought in was better integration with Essbase and the Hyperion-derived products that are now included in the wider Oracle BI Foundation 11g package. Earlier versions of OBIEE gave you the ability to install Essbase alongside OBIEE11g for the purposes of aggregate persistence into Essbase cubes, but the 11.1.1.7 release brought in a single combined security model for both Essbase and OBIEE, integration of EPM Workspace into the OBIEE environment and the re-introduction of Smartview as OBIEE (and Essbase’s) MS Office integration platform.

Outside of core OBIEE11g but complementing it, and the primary use-case for a lot of OBIEE customers, are the Oracle BI Applications and 2013 saw the release of Oracle BI Applications 11.1.1.7.1, followed just a few days ago by the latest update, OBIA 11.1.1.8.1. What these new releases brought in was the replacement of Informatica PowerCenter by Oracle Data Integrator, and a whole new platform for configuring and running BI Apps ETL jobs based around JEE applications running in WebLogic Server. Whilst at the time of OBIA 11.1.1.7.1’s release most people (including myself) advised caution in using this new release and said most new customers should still use the old 7.9.x release stream – because OBIA 11g skills would be scarce and relatively speaking, it’d have a lot of bugs compared to the more mature 7.9.x stream – in fact I’ve only heard about 11g implementations since then, and they mostly seem to have gone well. OBIA 11.1.1.8.1 came out in early May 2014 and seems to be mostly additional app content, bug fixes and Endeca integration, and there’s still no upgrade path or 11g release for Informatica users, but the 11g release of BI Apps seems to be a known-quantity now and Rittman Mead are getting a few implementations under our belt, too.

Oracle BI Cloud Service (BICS)

So that’s where we are now … but what about the future? As I said earlier, there hasn’t been a major release of OBIEE 11g this year and to my mind, where Oracle’s energy seems to have gone is the “cloud” release of OBIEE11g, previewed back at Oracle Openworld 2013 and due for release in the next few months. You can almost think of this as the “11.1.1.8 release” for this year with the twist being it’s cloud-only, but what’ll be interesting about this version of OBIEE11g is that it’ll probably be updated with new functionality on a much more regular basis than on-premise OBIEE, as Oracle (Cloud) will own the platform and be in a much better position to push-through upgrades and control the environment than for on-premise installs.

Headline new capabilities in this cloud release will include:

  • Rapid provisioning, with environments available “at the swipe of a credit card” and with no need to download and install the software yourself
  • Built-in storage, with Oracle’s schema-as-a-service/ApEx database environment backing the product and giving you a place to store data for reporting
  • A consumer-style experience, with wizards and other helper features aimed at getting users familiar with on-premise OBIEE11g up and started on this new cloud version
  • Access to core OBIEE11g features such as Answers, dashboards, mobile and a web-based repository builder

It’s safe to say that “cloud” is a big deal for Oracle at the moment, and it’s probably got as much focus within the OBIEE development team as Fusion Middleware / Fusion Apps integration had back at the start of OBIEE 11g. Part of this is technology trends going on outside of BI and OBIEE – customers are moving their IT platforms into the cloud anyway, so it makes sense for your BI to be there too, rather than being the only thing left back on-premise – but a big part of it is the benefits it gives Oracle and the OBIEE product team: they can own and control much more of the end-to-end experience, giving them control over quality and many more customers on the latest version, and of course the recurring revenues Oracle gets from selling software-as-a-service in the cloud are valued much higher by the market than the one-off license sales they’ve relied on in the past.

But for customers, too, running BI and OBIEE in the cloud brings quite a few potential benefits – both in terms of Oracle’s official “BI in the Cloud Service”, and the wider set of options when you consider running full OBIEE in a public cloud such as Amazon AWS – see my Collaborate’14 presentation on the topic on Slideshare. There’s none of the hassle and cost of actually setting up the software on your own premises, and then doing upgrades and applying patches over time – “empty calories” that have to be spent but don’t bring any direct benefit to the business.  OBIEE in the Cloud also promises to bring a bit of independence to the business from IT, as they’ll be able to spin-up cloud BI instances without having to go through the usual procurement/provisioning cycle, and it’ll be much easier to create temporary or personal-use OBIEE environments for tactical or short-lived work particularly as you’ll only have to license OBIEE for the users and months you actually need it for, rather than buying perpetual licenses which might then sit on the shelf after the immediate need has gone.

Data Visualization, and the Competition from Tableau

It’s probably safe to say that, when OBIEE 11.1.1.3 came out back in 2010, its main competitors were other full-platform, big vendor BI products such as SAP Business Objects and IBM Cognos. Now, in 2014, from what we’re hearing anecdotally and from our own sales activity around the product, the main competitor OBIEE 11g comes up against is Tableau. Tableau’s quite a different beast to OBIEE – like QlikTech’s QlikView it’s primarily a desktop BI tool that over the years has been given some server-based capabilities, but what it does well is get users started fast and give them the ability to create compelling and beautiful data visualisations, without spending days and weeks building an enterprise metadata layer and battling with their IT department.

Of course we all know that as soon as any BI tool gets successful, it’s inevitable that IT will have to get involved at some point, and you’re going to have to think about enterprise definitions of metrics, common dimensions and so forth, and it’s this area that OBIEE does so well, primarily (in my mind) selling well to the IT department, and with Oracle focusing most of their attention recently on the integration element of the BI world, making it easy to link your ERP and CRM applications to your BI stack, and the whole lot playing well with your corporate security and overall middleware stack. But none of that stuff is important to end users, who want a degree of autonomy from the IT department and something they can use to quickly and painlessly knock-together data visualisations in order to understand the data they’re working with.

So to my mind there’s two aspects to what Tableau does well, that OBIEE needs to have an answer for; ease of setting-up and getting started, and its ability to create data visualisations beyond the standard bar charts and line charts people most associate with OBIEE. And there’s a couple of initiatives already in place, and coming down the line, from Oracle that aim to address this first point; BI Publisher, for example, now gives users the option to create a report directly off-of data in the RPD without the intermediate requirement to create a separate data model, and presents a list of commonly-used report formats at report creation to make the process a bit more “one-stop”.

Another initiative that’ll probably come along first as part of the BI in the Cloud Service is personal data-mashups; what this is is a way for users to upload, from spreadsheets or CSV files, data that they want to add to their standard corporate metrics to allow them to produce reports that aren’t currently possible with the standard curated RPD from corporate IT. Metrics users add in this way will have their data stored (probably) in the catalog but marked in a way that it’s clear they’re not “gold-standard” ones, with the aim of the feature being to avoid the situation where users export their base data from OBIEE into Excel and then bring in the additional data there. It does beg a few questions in terms of where the data goes, how it all gets stored and how well it’d work on an on-premise install, but if you work on the basis that users are going to do this sort of thing anyway, it’s best they do it within the overall OBIEE environment than dump it all to Excel and do their worst there (so to speak).
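To make the mash-up idea a little more concrete, here’s a rough sketch in Python of what “adding your own spreadsheet data to the corporate metrics” might look like in principle. Oracle hasn’t published how the feature will actually work, so everything here is an assumption for illustration only: the `CURATED` rows, the `mash_up` function and the `user_` column prefix are all made-up names, with the prefix standing in for whatever marking Oracle ends up using for non-“gold-standard” data.

```python
import csv
import io
from typing import Dict, List

# Hypothetical curated metrics, standing in for data served from the RPD.
CURATED = [
    {"region": "EMEA", "revenue": 120.0},
    {"region": "APAC", "revenue": 80.0},
]


def mash_up(curated: List[Dict], uploaded_csv: str, key: str) -> List[Dict]:
    """Left-join user-uploaded CSV rows onto curated rows by a shared key.

    User-supplied columns get a 'user_' prefix so that they are never
    mistaken for curated, "gold-standard" metrics.
    """
    # Index the uploaded rows by the join key for quick lookup.
    extra = {row[key]: row for row in csv.DictReader(io.StringIO(uploaded_csv))}
    combined = []
    for row in curated:
        merged = dict(row)
        # Curated rows with no match in the upload are kept unchanged.
        for column, value in extra.get(row[key], {}).items():
            if column != key:
                merged[f"user_{column}"] = value
        combined.append(merged)
    return combined


if __name__ == "__main__":
    upload = "region,target\nEMEA,100\n"
    for row in mash_up(CURATED, upload, "region"):
        print(row)
```

The point of the prefix is simply that the user’s numbers travel alongside the curated ones without ever being confused with them, which is the behaviour described above.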

Another even-more intriguing new product capability that’s coming along, and is technically possible with the current 11.1.1.7 release, is the concept of “mini-apps”. Mini-apps are something Philippe Lion’s “SampleApp” team have been working on for a while now, and are extensions to core OBIEE that are enabled via Javascript and allow developers to create self-contained applications, including table creation scripts, to solve a particular user problem or requirement. This Youtube video from one of Philippe’s team goes through the basic concept, with custom Javascript used to unpack a mini-app setup archive and then create tables, and set up the analysis views, to support requirements such as linear regression and trend analysis.

It’s likely the BI Cloud Service will take this concept further and introduce a more formalised way of packaging-up BI mini-applications and deploying them quickly to the cloud, and also maybe introduce the concept of a BI App Store or Marketplace where pre-built analytic solutions can be selected and deployed faster even than if the user tried to build the same themselves using Excel (or Tableau, even).

Of course the other aspect to Tableau is its data visualisation capabilities, and while OBIEE 11.1.1.7 improved in this area a bit – with trellis charts being introduced and a new visualisation suggestion engine – it’s probably fair to say that OBIEE 11g has dropped behind the industry-state-of-the-art in this area. What’s been interesting to see though, over the past twelve months, is the widespread adoption of technologies such as D3 and other third-party visualisation tools as additional ways to add graphs and other visuals to OBIEE, with Accenture’s Kevin McGinley showcasing the art of the possible on his blog recently (parts 1, 2 and 3) and presenting on this topic at the Atlanta Rittman Mead BI Forum later this week. Techniques such as those described by Kevin involve deploying separate third-party visualisation libraries such as D3 and Flot to the WebLogic server running OBIEE, and then calling those libraries using custom code contained within narrative views; while these aren’t as developer-friendly as built-in visualisation features in the tool, they do give you the ability to go beyond the standard graphs and tables provided by OBIEE 11g, as Tom Underhill from our team explained in a blog post on OBIEE11g and D3 back in 2013.

The upcoming 2014 OBIEE11g SampleApp will most probably feature some more third-party and externally-produced visualisations along these lines, including new HTML5 and Javascript integration capabilities for 11.1.1.7’s mapping feature:

NewImage

and an example of integrating ADF charts – which have far more options and capabilities than the subset used in OBIEE 11g – into the OBIEE dashboard. All of this is possible with OBIEE 11.1.1.7 and standard JDeveloper/ADF, with the video previewing the SampleApp PoC Demo going through the integration process at the end.

Community Development of OBIEE Best Practices, Techniques, Product Add-Ons

One of the advantages of OBIEE now being a mature and known product is that best practices are starting to emerge around deployment, development, performance optimisation and so on. For example, our own Stewart Bryson has been putting a lot of thought into agile development and OBIEE, and topics such as automated deployment of OBIEE RPDs using Git and scripting, giving us a more industry-standard way of building and deploying RPDs now that we’ve got the ability to work with repository metadata in a more atomic format. Robin Moffatt, also from Rittman Mead, has published many articles over the past few years on monitoring, measuring and testing OBIEE performance, again giving us a more industry-standard way of regression testing OBIEE reports and monitoring the overall OBIEE experience using open-source tools.

There’s even a third-party add-on industry for OBIEE, with Christian Screen’s / Art of BI’s “BI Teamwork” being the showcase example; OBIEE still doesn’t have any collaboration or social features included in the base product, unless you count wider integration with WebCenter as the answer for this, and Christian’s BI Teamwork product fills this gap by integrating collaboration, social and SaaS integration features into the core product including localisation into key overseas OBIEE markets.

Hadoop and Big Data Integration

You’ll probably have guessed from the amount of coverage we’ve given the topic on the blog over the past few months, but we think Hadoop and big data, and particularly the technologies that will spin-off from this movement, are quite a big deal and will revolutionise what we think-of as analytics and BI over the next few years. Most of this activity has taken place outside the core world of OBIEE using tools such as Cloudera Impala, R and Tableau as the default visualisation tool, but OBIEE will play a role too, primarily through its ability to incorporate big data insights and visualisations into the core enterprise semantic model and corporate dashboards.

What this means in-practice is that OBIEE needs to be able to connect to Hadoop data sources such as Hive and Impala, and also provide a means to incorporate, visualise and explore data from non-traditional sources such as NoSQL and document databases. OBIEE 11.1.1.7 made a first step in this direction with its ability to use Apache Hive as a datasource, but this really is a minimal step-one in support for big data sources, as Hive is generally considered too-slow for ad-hoc query use and the HiveServer1 ODBC driver OBIEE 11.1.1.7 ships with no longer being compatible with recent Cloudera Hadoop (CDH 4.5+) releases. What’s really needed is support for Impala – an in-memory version of Hive – as a datasource, something we hacked-together with a workaround but most probably coming as a supported data source in a future version of OBIEE. What would be very interesting though is support for document-style databases such as MongoDB, giving OBIEE (or most probably, Endeca) the capability to create 360 degree-views of customer activity, including unstructured data held in these NoSQL-style databases.

Exalytics and Engineered Systems

I’d almost forgotten Exalytics from this round-up, which is ironic given its prominence in Oracle BI product marketing over the past couple of years, but not all that surprising given the lack of real innovation around the product recently. There’s certainly been a number of Exalytics updates in terms of product certification – the graphic below shows the software evolution of Exalytics since launch, going up to autumn last year when we presented on it at Enkitec E4:

NewImage

whilst the Exalytics hardware over the same period has seen RAM double, and SSD disk added to improve TimesTen and Essbase startup-times.

NewImage

What Exalytics has lacked, though, is something game-changing that’s only available as part of this platform. There’s a central dilemma for Oracle over Exalytics: do they develop something for OBIEE that only works on Exalytics, that’s substantial and that they hold-back from the on-premise version, or do they largely release the same software for both Exalytics and non-Exalytics OBIEE and rely on performance tweaks which are hard to quantify for customers, and are hard for Oracle salespeople to use as differentiation for the product? So far they’ve gone for the latter option, making Exalytics – if we’re honest – a bit underwhelming for the moment, but what would be really interesting is some capability that clearly can only be supported on Exalytics – some form of in-memory analysis or processing that needs 1TB+ of RAM for enterprise datasets, possibly based on an as-yet unreleased new analytic engine, maybe based on Essbase or Oracle R technology, maybe even incorporating something from Endeca (or even – left-field – something based on Apache Spark?)

My money however is on this differentiation growing over time, and Exalytics being used extensively by Oracle to power their BI in the Cloud Service, with less emphasis over time on on-premise sales of the products and more on “powered by Exalytics” cloud services. All of that said, my line with customers when talking about Exalytics has always been – you’re spending X million $/£ on OBIEE and the BI Apps, you might as well run it on the hardware it’s designed for, and which in the scheme of things is only a small proportion of the overall cost; the performance difference might not be noticeable now, but over time OBIEE will be more-and-more optimised for this platform, so you might as well be on it now and also take advantage of the manageability / TCO benefits.

So anyway, that’s my “state-of-the-nation” for OBIEE as I see it today – and if you’re coming along to the Atlanta RM BI Forum event later this week, there’ll be futures stuff from Oracle that we can’t really discuss on here, beyond the 3-6 month timeline, that’ll give you a greater insight into what’s coming in late 2014 and beyond.

RM BI Forum 2014 Brighton is a Wrap – Now on to Atlanta!

I’m writing this sitting in my hotel room in Atlanta, having flown over from the UK on Saturday following the end of the Rittman Mead BI Forum 2014 in Brighton. I think it’s probably true to say that this year was our best ever – an excellent masterclass on the Wednesday followed by even-more excellent sessions over the two main days, and now we’re doing it all again this week at the Renaissance Atlanta Midtown Hotel in Atlanta, GA.

Wednesday’s guest masterclass was by Cloudera’s Lars George, and covered the worlds of Hadoop, NoSQL and big data analytics over a frantic six-hour session. Lars was a trooper; despite a mistake over the agenda where I’d listed his sessions as being just an hour each when he’d planned (and been told by me) that they were an hour-and-a-half each, he managed to cover all of the main topics and take the audience through Hadoop basics, data loading and processing, NoSQL and analytics using Hive, Impala, Pig and Spark. Roughly half the audience had some experience with Hadoop with the others just being vaguely acquainted with it, but Lars was an engaging speaker and stuck around for the rest of the day to answer any follow-up questions.

For me, the most valuable parts to the session were Lars’ real-world experiences in setting up Hadoop clusters, and his views on what approaches were best to analyse data in a BI and ETL context – with Spark clearly being in-favour now compared to Pig and basic MapReduce. Thanks again Lars, and to Justin Kestelyn from Cloudera for organising it, and I’ll get a second chance to sit through it at the event in Atlanta this week.

The event proper kicked-off in the early evening with a drinks reception in the Seattle bar, followed by the Oracle keynote and then dinner. Whilst the BI Forum is primarily a community (developer and customer)-driven event, we’re very pleased to have Oracle also take part, and we traditionally give the opening keynote over to Oracle BI Product Management to take us through the latest product roadmap. This year, Matt Bedin from Oracle came over from the States to deliver the Brighton keynote, and whilst the contents aren’t under NDA there’s an understanding we don’t blog and tweet the contents in too much detail, which then gives Oracle a bit more leeway to talk about futures and be candid about where their direction is (much like other user group events such as BIWA and ODTUG).

I think it’s safe to say that the current focus for OBIEE over the next few months is the new BI in the Cloud Service (see my presentation from Collaborate’14 for more details on what this contains), but we were also given a preview of upcoming functionality for OBIEE around data visualisation, self-service and mobile – watch this space, as they say. Thanks again to Matt Bedin for coming over from the States to deliver the keynote, and for his other session later in the week where he demo’d BI in the Cloud and several usage scenarios.

We were also really pleased to be joined by some of the top OBIEE, Endeca and ODI developers around the US and Europe, including Michael Rainey (Rittman Mead) and Nick Hurt (IFPI), Truls Bergersen, Emiel van Bockel (CB), Robin Moffatt (Rittman Mead), Andrew Bond (Oracle) and Stewart Bryson (Rittman Mead), and none-other than Christian Berg, an independent OBIEE / Essbase developer who’s well-known to the community through his blog and through his Twitter handle, @Nephentur – we’ll have all the slides from the sessions up on the blog once the US event is over, and congratulations to Robin for winning the “Best Speaker” award for Brighton for his presentation “No Silver Bullets: OBIEE Performance in the Real World”.

We had a few special overseas guests in Brighton too; Christian Screen from Art of BI Software came across (he’ll be in Atlanta too this week, presenting this time), and we were also joined by Oracle’s Reiner Zimmerman, who some of you from the database/DW-side will know from the Oracle DW Global Leaders’ Program. For me though one of the highlights was the joint session with Oracle’s Andrew Bond and our own Stewart Bryson, where they presented an update to the Oracle Information Management Reference Architecture, something we’ve been developing jointly with Andrew’s team and which now incorporates some of our thoughts around the agile deployment of this type of architecture. More on this on the blog shortly, and look out for the white paper and videos Andrew’s team are producing which should be out on OTN soon.

So that’s it for Brighton this year – and now we’re doing it all again in Atlanta this week at the Renaissance Atlanta Midtown Hotel. We’ve got Lars George again delivering his masterclass, and an excellent – dare I say it, even better than Brighton’s – array of sessions including ones on Endeca, the In-Memory Option for the Oracle Database, TimesTen, OBIEE, BI Apps and Essbase. There’s still a few places left so if you’re interested in coming, you can book here and we’ll see you in Atlanta later this week!

 

Previewing TimesTen, Endeca and Oracle DW Sessions at the Brighton BI Forum 2014

It’s under a week now to the first of the two Rittman Mead BI Forum 2014 events, with Brighton running next week at the Hotel Seattle and then Atlanta the week after, at the Renaissance Atlanta Midtown Hotel. Earlier in the week I went through a more detailed agenda for the Lars George Cloudera Hadoop Masterclass, and the week before Stewart covered-off some of the Oracle sessions at the Atlanta event, but as a final preview of this series I just wanted to talk about three sessions running at next week’s Brighton event.

Someone I’ve got to know pretty well over the last year is Oracle’s Chris Jenkins, who’s the face of TimesTen development in the UK. I first ran into Chris, and his colleague Susan Cheung, late last year when I posted a series of articles on TimesTen vs. Essbase ASO, and then Chris presented alongside myself and Peak Indicators’ Tony Heljula at last year’s Oracle Openworld, on TimesTen on Exalytics Best Practices. Chris kindly agreed to come along to the Brighton BI Forum and share some of his tips and techniques on TimesTen development, and also answer some of the questions from members of the audience implementing TimesTen as part of their OBIEE setup. Over to Chris:

“Since the launch of Exalytics, TimesTen has been at its heart, delivering high performance access to relational data to support the ‘speed of thought’ experience. But it hasn’t all been plain sailing; each use case has its own specific challenges and correct configuration, adopting best operational practice and properly tuning the TimesTen database to support the workload are essential to getting the best results. When working with customers I often come across situations where things are not set up quite as well as they might be or where a less than optimal approach has been adopted, and this can negatively affect performance or manageability.

In my session I will highlight the most common pitfalls and show how to avoid them. I will also discuss best practices for operation and data loading and look at how to optimise the TimesTen database for your workload. And of course there is the opportunity to ask questions! By the end of the session I hope that you will have a good understanding of how to get the best out of TimesTen for your particular use case.”

Another speaker speaking for the first time at the BI Forum is Truls Bergersen, but Truls will of course be well-known to the European user group community through his work with the Norwegian Oracle User Group, who run the excellent OUGN conference cruise each year around April. Truls has been working with Oracle’s BI and data warehousing tools for many years, but more recently has been taking a close look at Endeca Information Discovery, the search and data discovery tool Oracle added to their BI portfolio a couple of years ago. According to Truls …

“It’s been almost two and a half years now, since Oracle acquired Endeca, and in that period the tool has been given a few enhancements. E.g. improvements have been made to the look-and-feel of the UI, support has been added for loading JSON and OBI presentation tables, and the tool can now be installed on WebLogic. My two favorite things, however, are the self service provisioning and eBS extensions.

The goal of my presentation is to give the audience a good overview of the tool from a data architect’s point of view, and how the tool fits in with and extends your existing BI platform. I will not go into details about installation and other too technical aspects, but rather look at the tool’s capabilities from a data point of view – how can Endeca utilize OBIEE and vice versa, what can be done in terms of self-service, etc.”

Finally, we’re really pleased to be joined by none other than Reiner Zimmerman, who heads-up Oracle’s Data Warehouse Global Leaders’ Program. Rittman Mead are one of the European partner sponsors of the DW Global Leaders Forum, which brings together the top customers and partners working with Oracle’s data warehousing, engineered systems and big data products several times a year in EMEA, the Americas and APAC.  Reiner’s also the person most likely to take the “last man standing” award from our own Borkur and Ragnar, so before that happens, over to Reiner:

“The DW & Big Data Global Leaders program is an Oracle development driven strategic customer program to establish a platform for Oracle DW and Big Data customers to discuss best practices and experience with Oracle Product Management and Product Development and our Associate Partners like Rittman Mead.

Our current focus is Big Data and Advanced Analytics and we seek to create best practices around Big Data architectures in terms of Manageability, High Availability and Monitoring Big Data Systems. Learn what the program is, what it can bring to you, how you can participate and what other customers are doing.”

The Rittman Mead Brighton BI Forum 2014 runs next week (May 7th-9th 2014) at the Hotel Seattle, Brighton, and there’s still a few places left if you register now. Straight after, we’re going over to Atlanta to do it all again at the Renaissance Atlanta Midtown Hotel, with full details of the event agendas here, and the event homepage including registration instructions, here. Hopefully see some of you in Brighton or Atlanta later in May!

More Details on the Lars George Cloudera Hadoop Masterclass – RM BI Forum 2014, Brighton & Atlanta

It’s just over a week until the first of the two Rittman Mead BI Forum 2014 events take place in Brighton and Atlanta, and one of the highlights of the events for me will be Cloudera’s Lars George’s Hadoop Masterclass. Hadoop and Big Data are two technologies that are becoming increasingly important to the worlds of Oracle BI and data warehousing, so this is an excellent opportunity to learn some basics, get some tips from an expert in the field, and then use the rest of the week to relate it all back to core OBIEE, ODI and Oracle Database.

Lars’ masterclass is running on the Wednesday before each event, on May 7th at the Brighton Hotel Seattle and then the week after, on Wednesday 14th May at the Renaissance Atlanta Midtown Hotel. Attendance for the masterclass is just £275 + VAT for the Brighton event, or $500 for the Atlanta event, but you can only book it as part of the overall BI Forum – so book-up now while there are still places left! In the meantime, here’s details of what’s in the masterclass:

Session 1 – Introduction to Hadoop 

The first session of the day sets the stage for the following ones. We will look into the history of Hadoop, where it comes from, and how it made its way into the open-source world. This overview also covers the basic building blocks of Hadoop: the file system, HDFS, and the batch processing system, MapReduce.
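
To give a feel for the MapReduce programming model before the session, here is a minimal single-process sketch in plain Python. It is not the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes, reading from and writing to HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the list of values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count` on a handful of text lines returns a dictionary of word totals; the point of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets the framework spread them across a cluster.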

Then we will look into the flourishing ecosystem that is now the larger Hadoop project, with its various components and how they help form a complete data processing stack. We also briefly look into how Hadoop-based distributions today help tie the various components together in a reliable manner, based on predictable release cycles.

Finally we will pick up the topic of cluster design and planning, talking about the major decision points when provisioning a Hadoop cluster. This includes the hardware considerations depending on specific use-cases as well as how to deploy the framework on the cluster once it is operational.

Session 2 – Ingress and Egress

The second session dives into the use of the platform as part of an Enterprise Data Hub, i.e. the central storage and processing location for all of the data in a company (large, medium, or small). We will discuss how data is acquired into the data hub and provisioned for further access and processing. There are various tools that allow data to be imported from sources ranging from single-event-based systems to relational database management systems.

As data is stored, the user has to decide how to store it for further processing, since that choice can affect performance considerably. State-of-the-art data processing pipelines usually take a hybrid approach that combines lightweight LT (no E for “extract” needed), i.e. transformations, with optimised data formats as the final location for fast subsequent processing. Continuous and reliable data collection is vital for productionising the initial proof-of-concept pipelines.
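
As a rough illustration of why the storage format matters, the sketch below contrasts a row-oriented layout (the approach taken by formats such as Avro) with a column-oriented one (the approach taken by Parquet), using plain Python structures rather than the real file formats. The sample records and the helper names `to_columns`, `total_from_rows` and `total_from_columns` are invented for this example.

```python
# Hypothetical sample records, stored row by row.
rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 174.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records, field):
    """Aggregate one field by walking every complete record."""
    return sum(record[field] for record in records)


def total_from_columns(columns, field):
    """Aggregate one field by reading only that column's values."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both aggregation functions return the same total, but the columnar version only touches the values of the one field it needs. For analytic queries that read a few columns out of many, that difference is a large part of why columnar formats are a popular final storage location.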

Towards the end we will also look at the lower level APIs available for data consumption, rounding off the set of available tools for a Hadoop practitioner.

Session 3 – NoSQL and Hadoop

For certain use-cases there is an inherent need for something more database-like than the features offered by the original Hadoop components, i.e. file system and batch processing. Especially for slowly changing dimensions, and entities in general, there is a need to update previously stored data as time progresses. This is where HBase, the Hadoop Database, comes in: it allows random reads and writes to existing rows of data, or entities, in a table.

We will dive into the architecture of HBase to derive the need for proper schema design, one of the key tasks when implementing an HBase-backed solution. As with the file formats from session 2, HBase lets you design table layouts freely, which can lead to suboptimal performance. This session will introduce the major access patterns observed in practice and explain how they can play to HBase’s strengths.
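
One schema design idea that comes up often with HBase is the composite row key. HBase keeps rows sorted by the bytes of their row key, so the key layout decides which rows sit next to each other and can be read in one scan. The sketch below simulates this in plain Python, with no HBase client involved; the key layout (entity id, a `|` separator, then a reversed timestamp so the newest row sorts first) and the names `row_key` and `prefix_scan` are assumptions chosen for illustration, not a recommendation from the masterclass.

```python
import struct

# Largest signed 64-bit value; subtracting a timestamp from it reverses the order.
MAX_TS = 2**63 - 1


def row_key(entity_id: str, timestamp_ms: int) -> bytes:
    """Build a key of entity id plus reversed, big-endian timestamp."""
    reversed_ts = MAX_TS - timestamp_ms
    return entity_id.encode("utf-8") + b"|" + struct.pack(">q", reversed_ts)


def prefix_scan(sorted_keys, entity_id: str):
    """Return the keys for one entity, in stored (sorted) order."""
    prefix = entity_id.encode("utf-8") + b"|"
    return [key for key in sorted_keys if key.startswith(prefix)]


# Simulate the sorted key space of a table with three hypothetical rows.
table_keys = sorted([
    row_key("cust42", 1000),
    row_key("cust42", 2000),
    row_key("cust7", 1500),
])
latest_first = prefix_scan(table_keys, "cust42")
```

With this layout, all rows for one entity are contiguous and the most recent one comes first, so a "latest N events for this customer" access pattern becomes a short scan instead of a full table read. A different access pattern would call for a different key design, which is the kind of trade-off this session explores.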

Finally, a set of real-world examples will show how fellow HBase users (e.g. Facebook) have gone through various modifications of their schema design before arriving at their current setup. Available open-source projects show further schema designs that will help you come to terms with this involved topic.

Session 4 – Analysing Big Data

The final session of the day tackles the processing of data, since so far we have learned mostly about the storage and preparation of data for subsequent handling. We will look into the existing frameworks on top of Hadoop and how they offer distinct (but sometimes overlapping) functionality. There are frameworks that run as separate instances, but also higher-level abstractions on top of those that help developers and data wranglers of all kinds find their weapon of choice.

Using everything learned so far, the user will then see how the various tools can be combined to build the promised reliable data processing pipelines, landing data in the Enterprise Data Hub and using automation to start the subsequent processing without any human intervention. The session closes with a look at the external interfaces, such as JDBC/ODBC, which enable the computed and analysed data to be visualised in appealing UI-based tools.

Detailed Agenda:

  • Session 1 – Introduction to Hadoop
    • Introduction to Hadoop
      • Explain pedigree, history
      • Explain and show HDFS, MapReduce, Cloudera Manager
    • The Hadoop Ecosystem
      • Show need for other projects within Hadoop
      • Ingress, egress, random access, security
    • Cluster Design and Planning
      • Introduce concepts on how to scope out a cluster
      • Typical hardware configuration
      • Deployment methods 
  • Session 2 – Ingress and Egress
    • Explain Flume and Sqoop for loading data record by record or in bulk
    • Data formats and serialisation
      • SequenceFile, Avro, Parquet
    • Continuous data collection methods
    • Interfaces for data retrieval (lower level)
       
  • Session 3 – NoSQL and Hadoop
    • HBase Introduction
    • Schema design
    • Access patterns
    • Use-case examples
       
  • Session 4 – Analysing Big Data
    • Processing frameworks
      • Explain and show MapReduce, YARN, Spark, Solr
    • High level abstractions
      • Hive, Pig, Crunch, Impala, Search
    • Data pipelines in Hadoop
      • Explain Oozie, Crunch
    • Access to data for existing systems
      • ODBC/JDBC

Full details of both BI Forum events can be found at the Rittman Mead BI Forum 2014 web page, and full agenda details for Brighton and Atlanta are on this blog post from last week.