Tag Archives: Data Warehousing

Kickstart Your 2016 with Rittman Mead’s Data Integration Training

Happy Holidays and a Happy New Year to all! As you begin your 2016 this January, it’s time to start planning your team’s data integration training. Look no further than Rittman Mead’s Oracle Data Integrator training course! We offer a 4-day Oracle Data Integrator 12c Bootcamp for those looking to take advantage of the latest and greatest features in ODI 12c. We also still teach our 5-day Oracle Data Integrator 11g Bootcamp, as we know sometimes it can be difficult to upgrade to the latest release and new data warehouse team members need to be brought up to speed on the product. ODI 11g is also still very much alive in Oracle Business Intelligence Applications, being the ETL technology for the 11g release of the product suite.

ODI12c training

Customized Data Integration Training

BI Apps 11g training has been a hot topic from the data integration perspective over the last couple of years. Rittman Mead have delivered custom BI Apps training for ODI developers several times just within the last year, prompting us to add a new course on this topic to our public schedule. This course walks attendees through the unique relationship between OBIEE and ODI 11g as the data integration technology, including configuration, load plan generation, and ETL customization. If you have an Oracle Business Intelligence Applications 11g team looking to enhance their ODI 11g skills, take a look at the new ODI for BI Applications course description.

The customization of training does not just apply to BI Applications, but to all aspects of Oracle Data Integration. Whether adding more details around Oracle GoldenGate installation and maintenance to the ODI 12c course, or learning about Oracle EDQ integration, the Rittman Mead data integration team of experts can work to deliver the course so your team gains the most value from its investment in Oracle Data Integration technologies. Just ask! Reach out and we can work together to create a custom course to fit your needs.

Public or Onsite Training?

Rittman Mead has several dates for each course, scheduled to be delivered out of our offices in either Atlanta, GA or Brighton, UK. Take a look here for our ODI 12c bootcamp, ODI 11g bootcamp, and ODI for BI Apps Developers offerings in the US. Look here for the same in the UK/Europe (Note: as of the writing of this blog post, the 2016 UK/Europe schedule had not been released). We also offer the same courses for delivery onsite at your company’s office, allowing our experts to come to you! Quite often our clients will combine consulting and training, ensuring they get the most out of their investment in our team of experts.

Why Rittman Mead?

Many folks in the Business Intelligence and Data Integration profession who are looking for a consulting company might think, based on the depth of knowledge and the type of problems (and solutions) we share on our blog, that Rittman Mead only work on extremely challenging projects. The fact is, most of our projects are the “standard” data warehouse or business intelligence reporting implementations, with some of these additional challenges coming along the way. Why do I bring that up? Well, if you’re looking for the experts in Oracle Data Integration technology, with experience in both project implementation and solving challenging technical problems, then you’ve come to the right place to learn about ODI.

Unlike many other companies offering training, we don’t have a staff of educators on hand. Our trainers are the same folks that deliver projects, using the technology you’re interested in learning about, on a day-to-day basis. We offer you real world examples as we walk through our training slide deck and labs. Need to know why Oracle GoldenGate is an integral part of real-time data integration? Let me tell you about my latest client where I implemented GoldenGate and ODI. Want to know what to look out for when installing the JEE Agent in ODI 12c? We’ve done that many times – and know the tricks necessary to get it all working.

Our experts, such as Jérôme Françoisse, Becky Wagner, Mark Rittman, myself, and many others, all have multiple years of experience with Oracle Data Integration implementations. Not only that, but we here at Rittman Mead truly enjoy sharing our knowledge! Whether posting to this blog, speaking at Oracle conferences, or on the OTN forums, Rittman Mead experts are always looking to teach others in order to better the Oracle Data Integration community.

If you or your company are in need of Oracle Data Integration training, please drop us a line at training@rittmanmead.com. As always, feel free to reach out to me directly on Twitter (@mRainey), LinkedIn, or via email (michael.rainey@rittmanmead.com) if you have any direct questions. See you all next year!


Driving OBIEE User Engagement with Enhanced Usage Tracking for OBIEE

Measuring and monitoring user interactions and behaviour with OBIEE is a key part of Rittman Mead’s User Engagement Service. By understanding and proving how users are engaging with the system we can improve the experience for the user, driving up usage and ensuring maximum value for your OBIEE investment. To date, we have had the excellent option of Usage Tracking for finding out about system usage, but this only captures actual dashboard and analysis executions. What I am going to discuss in this article is taking Usage Tracking a step further, and capturing and analysing every click that the user makes. Every login, every search, every report build action. This can be logged to a database such as Oracle, and gives us Enhanced Usage Tracking!

Why?

Because the more we understand about our user base, the more we can do for them in terms of improved content and accessibility, and the more we can do for ourselves, the OBIEE devs/sysadmins, in terms of easier maintenance and better knowledge of the platform for which we are developing.

Here is a handful of questions that this data can answer – I’m sure once you see the potential of the data you will be able to think of plenty more…

How many users are accessing OBIEE through a mobile device?

Maybe you’re about to implement a mobile strategy, perhaps deploying MAD or rolling out BI Mobile HD. Wouldn’t it be great if you could quantify its uptake, and not only that but the impact that the provision of mobile makes on the general user engagement levels of your OBIEE user base?

Perhaps you think your users might benefit from a dedicated Mobile OBIEE strategy, but to back up your business case for the investment in mobile licences or time to optimise content for mobile consumption you want to show how many users are currently accessing full OBIEE through a web browser on their mobile device. And not only ‘a mobile device’, but which one, which browser, and which OS. Enhanced Usage Tracking data can provide all this, and more.

Which dashboards get exported to Excel the most frequently?

The risks that Excel-marts present are commonly discussed, and broader solutions such as data-mashup capabilities within OBIEE itself exist – but how do you identify which dashboards are frequently exported from OBIEE to Excel, and by whom? We’ve all probably got a gut-instinct, or indirect evidence, of when this happens – but now we can know for sure. Whilst Usage Tracking alone will tell us when a dashboard is run, only Enhanced Usage Tracking can show what the user then did with the results:

What do we do with this information? It Depends, of course. In some cases exporting data to Excel is a – potentially undesirable but pragmatic – way of getting certain analysis done, and trying to prevent it would be unnecessarily petulant and counterproductive. In many other cases though, people use it simply as a way of doing something that could be done in OBIEE but that they lack the awareness or training to do. The point is that by quantifying and identifying when it occurs you can start an informed discussion with your user base, from which both sides of the discussion benefit.

Precise Tracking of Dashboard Usage

Usage Tracking is great, but it has limitations. One example of this is where a user visits a dashboard page more than once in the same session, meaning that it may be served from the Presentation Services cache, and if that happens, the additional visit won’t be recorded in Usage Tracking. By using click data we can actually track every single visit to a dashboard.

In this example here we can see a user visiting two dashboard pages, and then going back to the first one – which is captured by the Enhanced Usage Tracking, but not the standard one, which only captures the first two dashboard visits:

This kind of thing can matter both from an audit point of view and for more advanced uses, such as examining user behaviour in repeated visits to a dashboard. For example, does it highlight that a dashboard design is not optimal, and that the user has to switch between multiple tabs to build up a complete picture of the data that they are analysing?

Predictive Modelling to Identify Users at Risk of ‘Churn’

Churn is when users disengage from a system, when they stop coming back. Being able to identify those at risk of doing this before they do it can be hugely valuable, because it gives you the opportunity to prevent it. By analysing the patterns of system usage in OBIEE and looking at users who have stopped using OBIEE (i.e. churned), we can then build a predictive model to identify users who show similar patterns of usage but are still active.

Measures such as the length of time it takes to run the first dashboard after login, or how many dashboards are run, or how long it takes to find data when building an analysis, can all be useful factors to include in the model.

Are any of my users still accessing OBIEE through IE6?

A trend that I’ve seen in the years working with OBIEE is that organisations are [finally] moving to a more tolerant view of web browsers other than IE. I suppose this is as the web application world evolves and IE becomes more standards compliant and/or web application functionality forces organisations to adopt browsers that provide the latest capabilities. OBIEE, too, is a lot better nowadays at not throwing its toys out of the pram when run on a browser that happens to have been written within the past decade.

What’s my little tirade got to do with enhanced usage tracking? Because as those responsible for the development and support of OBIEE in an organisation we need to have a clear picture of the user base that we’re supporting. Sure, corporate ‘standard’ is IE9, but we all know that Jim in design runs one of those funny Mac things with Safari, Fred in accounts insists on Firefox, Bob in IT prides himself on running Konqueror, and it would be a cold day in hell before you prise the MD’s copy of IE5 off his machine. Whether these browsers are “supported” or not is only really a secondary point to whether they’re being used. A lot of the time organisations will take the risk on running unsupported configurations, consciously or in blissful ignorance, and being ‘right’ won’t cut it if your OBIEE patch suddenly breaks everything for them.

Enhanced Usage Tracking gives us the ability to analyse browser usage over time:

as well as the Enhanced Usage Tracking data rendered through OBIEE itself, showing browser usage in total (nb the Log scale):

It’s also easy to report on the Operating System that users have:

Where are my users connecting to OBIEE from?

Whilst a lot of OBIEE deployments are run within the confines of a corporate network, there are those that are public-facing, and for these ones it could be interesting to include location as another dimension by which we analyse the user base and their patterns of usage. Enhanced Usage Tracking includes the capture of a user’s IP, which for public networks we can easily look up, using the resulting data in our analysis.

Even on a corporate network the user’s IP can be useful, because the corporate network will be divided into subnets and IP ranges, which will usually have geographical association to them – you just might need to code your own lookup in order to translate 192.168.11.5 to “Bob’s dining room”.

Who deleted this report? Who logged in? Who clicked the Do Not Click Button?

The uses for Enhanced Usage Tracking are almost endless. Any user interaction with OBIEE can now be measured and monitored.

A frequent question that I see on the OTN forums is along the lines of “for audit purposes, we need to know who logged in”. Since Usage Tracking alone won’t capture this directly (although the new init block logging in > 11.1.1.9 probably helps indirectly with this) this information usually isn’t available….until now! In this table we see the user, their session ID, and the time at which they logged in:

What about who updated a report last, or deleted it? We can find that out too! This simple example shows some of the operations in the Presentation Catalog recorded as clear as day in Enhanced Usage Tracking:

Want to know more? We’d love to tell you more!

Measuring and monitoring user interactions and behaviour with OBIEE is a key part of Rittman Mead’s User Engagement Service. By understanding and proving how users are engaging with the system we can improve the experience for the user, driving up usage and ensuring maximum value for your OBIEE investment.

If you’d like to find out more, including about Enhanced Usage Tracking and getting a free User Engagement Report for your OBIEE system, get in touch now!


Oracle OpenWorld 2015 Roundup Part 3 : Oracle 12cR2 Database Sharding, Analytic Views and Essbase 12c

With the UKOUG conference starting tomorrow I thought it about time I finished off my three-part post-OOW 2015 blog series, with a final post on some interesting announcements around Oracle Database and Essbase. As a reminder the other two posts were on OBIEE12c and the new Data Visualisation Cloud Service, and Data Integration and Big Data. For now though let’s look first at two very significant announcements about future 12cR2 functionality – database sharding and Analytic Views.

Anyone who’s been involved in Oracle Data Warehousing over the years will probably be aware of the shared-everything vs. shared-nothing architecture debate. Databases like Oracle Database were originally designed for OLTP workloads, with the optimal way to increase capacity being to buy a bigger server. When RAC (Real Application Clusters) came along the big selling point was a single shared database instance spread over multiple nodes, making application development easy (no real changes) but with practical limits as to how big that cluster can get – due to the need to synchronise shared memory across all nodes in the cluster, and the network bottleneck caused by compute and storage being spread across the whole cluster, not co-located as we get with Hadoop and HDFS, for example.


Shared-nothing databases such as Netezza, for example, take a different approach and “shard” the database instance over multiple nodes in the cluster so that processing and storage are co-located on the same node for particular ranges of data. This gives the advantage of much greater scalability than a shared-everything approach (again, this is why Hadoop uses a similar approach for its massively-clustered distributed compute) but with the drawback of having to consider data locality when writing ETL and other code; at worst it means data loading and processing needs to be rewritten when you add more nodes and re-shard the database, and it also generally precludes OLTP work and consequently mixed workloads on the same platform.


But if it’s just data warehousing you want to do, you don’t really care about mixed workloads, and it’s generally considered that shared-nothing and sharding are what you need if you want to get to very-large-scale data warehousing. Indeed, Oracle went partly down the shared-nothing route with Exadata, pushing filtering, projection and other operations down to the storage nodes, thereby adding an element of data locality and reducing the network throughput between storage and compute.


But both types of database are losing out to Hadoop for very, very large datasets, with Hadoop’s distributed compute approach designed right from the start for large distributed workloads at the expense of not supporting OLTP at all and, at least initially, all intermediate resultsets being written to disk. For those types of workloads and database sizes Oracle just wasn’t an option, but a certain top tier of Oracle’s data warehousing customers wanted to be able to scale to hundreds or thousands of nodes, and most of them have ULAs, so cost isn’t really a limiting factor; for those customers, Oracle announced that the 12c Release 2 version of Oracle Database would support sharding … but with warnings that it’s for sophisticated and experienced customers only.


Oracle are positioning what they’re referring to as “Oracle Elastic Sharding” as being for both scaling and fault-tolerance, with up to 1,000 nodes supported and with data routed to particular shards through use of a “sharding key” passed to the connection pool.


Sharding in 12c Release 2 was described to me as a feature aimed at the “top 5%” of Oracle customers, where price isn’t the issue but they want Oracle to scale to the size of cluster supported by Hadoop and NoSQL. Time will tell how well it’ll work and what it’ll cost, but it certainly completes Oracle’s journey from strict shared-everything for data warehousing to more-or-less shared-nothing, if you want to go down that extreme-scalability route.

The other announcement from the Oracle Database side was the even-more-unexpected “Analytic Views”. A clue came from who was running the session – Bud Endress, of Oracle Express / Oracle OLAP fame and more recently, the Vector Group By feature in the In-Memory Option – but what we got was a lot more than Oracle OLAP re-imagined for in-memory; instead what Oracle are looking to do is bring the business metadata and calculation layers that BI tools use right into the database, provide an MDX query  interface over it, simplify SQL so that you just select measures, attributes and hierarchies – and then optimise the whole thing so it runs in-memory if you have that option licensed.


It’s certainly an “interesting” goal with considerable overlap with OBIEE’s BI Server and Essbase Server, but the aim of bringing all this functionality closer to the data and making it available to all tools is certainly ambitious; if it gets traction it should bring business metadata layers and simpler queries to a wider audience – but the fact that it seems to be being developed separately from Oracle’s BI and Essbase teams means it probably won’t be subsuming Essbase or the BI Server’s functionality.

The last area I wanted to look at was Essbase. Essbase Cloud Service was launched at this event, positioned as a return to Essbase’s roots as a tool you could use in the finance department without requiring IT’s help, except this time it’s because Essbase is running as a service in the cloud rather than on an old PC under your desk. What was particularly interesting though is that the version of Essbase being used in the cloud is the new 12c version, which replaces some of the server components (the Essbase Agent, but not the core Essbase Server part) with new Java components that presumably fit better with Oracle’s cloud infrastructure and also support greater levels of concurrency.


Apart from the announcement of a future ability to link to R libraries, the other really interesting part of Essbase 12c is that for now the only on-premise version of it is as part of OBIEE12c, and it’ll have a very fixed role there as a pure query accelerator for OBIEE’s BI Server – perhaps the answer to QlikView and Tableau’s in-memory column-store caches. Essbase as part of an OBIEE12c installation doesn’t work with Essbase Studio or any of the other standard Essbase tools, but instead has a new Essbase Business Intelligence Acceleration Wizard that deploys Hybrid ASO/BSO Essbase cubes directly from the OBIEE BI Server and RPD.


Coupled with the changes to Essbase announced a couple of years ago at Openworld 2013 designed to improve compatibility with OBIEE, this co-located version of Essbase seems to have completed its transformation into the BI Server mid-tier aggregate cache layer of choice that started back with the 11.1.1.6.2 BP1 version of OBIEE – but it does mean this version can’t be used for anything else, even custom Essbase cubes you load and design yourself. Interesting developments across both database server products though, and that wraps up my overview of OOW2015 announcements. Next stop – UKOUG Tech’15 in Birmingham, where I’ve just arrived ready for my masterclass session in tomorrow’s Super Sunday event – on data reservoirs and Customer 360 on Oracle Big Data Appliance.


Forays into Kafka 02 – Enabling Flexible Data Pipelines

One of the defining features of “Big Data” from a technologist’s point of view is the sheer number of tools and permutations at one’s disposal. Do you go Flume or Logstash? Avro or Thrift? Pig or Spark? Foo or Bar? (I made that last one up). This wealth of choice is wonderful because it means we can choose the right tool for the right job each time.

Of course, we need to establish that we have indeed chosen the right tool for the right job. But here’s the paradox. How do we easily work out if a tool is going to do what we want of it and is going to be a good fit, without disturbing what we already have in place? Particularly if it’s something that’s going to be part of an existing Productionised data pipeline, inserting a new tool partway through what’s there already is going to risk disrupting that. We potentially end up with a series of cloned environments, all diverging from each other, and not necessarily comparable (not to mention the overhead of the resource to host it all).

The same issue arises when we want to change the code or configuration of an existing pipeline. Bugs creep in, ideas to enhance the processing that you’ve currently got present themselves. Wouldn’t it be great if we could test these changes reliably and with no risk to the existing system?

This is where Kafka comes in. Kafka is very useful for two reasons:

  1. You can use it as a buffer for data that can be consumed and re-consumed on demand
  2. Multiple consumers can all pull the data, independently and at their own rate.

So you take your existing pipeline, plumb in Kafka, and then as and when you want to try out additional tools (or configurations of existing ones) you simply take another ‘tap’ off the existing store. This is an idea that Gwen Shapira put forward in May 2015 and really resonated with me.
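As a simple illustration of taking a ‘tap’, once data is in a Kafka topic you can attach a second console consumer that reads the whole topic from the beginning, completely independently of whatever is already consuming it. A minimal sketch, assuming the Kafka 0.8.2 install and apache_logs topic used later in this article:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh \
--zookeeper bigdatalite:2181 --topic apache_logs --from-beginning

Because each consumer tracks its own offset, this extra tap has no effect on the existing pipeline.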

I see Kafka sitting right on that Execution/Innovation demarcation line of the Information Management and Big Data Reference Architecture that Oracle and Rittman Mead produced last year:

Kafka enables us to build a pipeline for our analytics that breaks down into two phases:

  1. Data ingest from source into Kafka, simple and reliable. Fewest moving parts as possible.
  2. Post-processing. Batch or realtime. Uses Kafka as source. Re-runnable. Multiple parallel consumers:
    • Productionised processing into Event Engine, Data Reservoir and beyond
    • Adhoc/loosely controlled Data Discovery processing and re-processing

These two steps align with the idea of “Obtain” and “Scrub” that Rittman Mead’s Jordan Meyer talked about in his BI Forum 2015 Masterclass about Data Discovery:

So that’s the theory – let’s now look at an example of how Kafka can enable us to build a more flexible and productive data pipeline and environment.

Flume or Logstash? HDFS or Elasticsearch? … All of them!

Mark Rittman wrote back in April 2014 about using Apache Flume to stream logs from the Rittman Mead web server over to HDFS, from where they could be analysed in Hive and Impala. The basic setup looked like this:

Another route for analysing data is through the ELK stack. It does a similar thing – streams logs (with Logstash) into a data store (Elasticsearch) from where they can be analysed, just with a different set of tools with a different emphasis on purpose. The input is the same – the web server log files. Let’s say I want to evaluate which is the better mechanism for analysing my log files, and compare the two side-by-side. Ultimately I might only want to go forward with one, but for now, I want to try both.

I could run them literally in parallel:

The disadvantage with this is that I have twice the ‘footprint’ on my data source, a Production server. A principle throughout all of this is that we want to remain light-touch on the sources of data. Whether a Production web server, a Production database, or whatever – upsetting the system owners of the data we want is never going to win friends.

An alternative to running in parallel would be to use one of the streaming tools to load data in place of the other, i.e.

or

The issue with this is I want to validate the end-to-end pipeline. Using a single source is better in terms of load/risk to the source system, but less so for validating my design. If I’m going to go with Elasticsearch as my target, Logstash would be the better fit source. Ditto HDFS/Flume. Both support connectors to the other, but using native capabilities always feels to me a safer option (particularly in the open-source world). And what if the particular modification I’m testing doesn’t support this kind of connectivity pattern?

Can you see where this is going? How about this:

The key points here are:

  1. One hit on the source system. In this case it’s Flume, but it could be Logstash, or another tool. This streams each line of the log file into Kafka in the exact order that it’s read.
  2. Kafka holds a copy of the log data, for a configurable time period. This could be days, or months – up to you and depending on purpose (and disk space!) – see the retention sketch just after this list.
  3. Kafka is designed to be distributed and fault-tolerant. As with most of the boxes on this logical diagram it would be physically spread over multiple machines for capacity, performance, and resilience.
  4. The eventual targets, HDFS and Elasticsearch, are loaded by their respective tools pulling the web server entries exactly as they were on disk. In terms of validating end-to-end design we’re still doing that – we’re just pulling from a different source.
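As mentioned in point 2 above, the retention period is configurable. A hedged sketch of a per-topic override using the Kafka 0.8.2 tooling used later in this article (the exact flags vary between Kafka versions); here the apache_logs topic is kept for 30 days, expressed in milliseconds:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper bigdatalite:2181 \
--alter --topic apache_logs --config retention.ms=2592000000

Topics without an override fall back to the broker-wide default (log.retention.hours in server.properties).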

Another massively important benefit of Kafka is this:

Sooner or later (and if you’re new to the tool and code/configuration required, probably sooner) you’re going to get errors in your data pipeline. These may be fatal and cause it to fall in a heap, or they may be more subtle and you only realise after analysis that some of your data’s missing or not fully enriched. What to do? Obviously you need to re-run your ingest process. But how easy is that? Where is the source data? Maybe you’ll have a folder full of “.processed” source log files, or an HDFS folder of raw source data that you can reprocess. The issue here is the re-processing – you need to point your code at the alternative source, and work out the range of data to reprocess.

This is all eminently do-able of course – but wouldn’t it be easier just to rerun your existing ingest pipeline and just rewind the point at which it’s going to pull data from? Minimising the amount of ‘replumbing’ and reconfiguration to run a re-process job vs. new ingest makes it faster to do, and more reliable. Each additional configuration change is an opportunity to mis-configure. Each ‘shadow’ script clone for re-running vs normal processing is increasing the risk of code diverging and stale copies being run.

The final pipeline in this simple example looks like this:

  • The source server logs are streamed into Kafka, with a permanent copy up onto Amazon’s S3 for those real “uh oh” moments. Kafka, in a sandbox environment with a ham-fisted sysadmin, won’t be bullet-proof. Better to recover a copy from S3 than have to bother the Production server again. This is something I’ve put in for this specific use case, and wouldn’t be applicable in others.
  • From Kafka the web server logs are available to stream, as if natively from the web server disk itself, through Flume and Logstash.

There’s a variation on a theme of this, that looks like this:

Instead of Flume -> Kafka, and then a second Flume -> HDFS, we shortcut this and have the same Flume agent that is pulling from the source also write to HDFS. Why have I not put this as the final pipeline? Because of this:

Let’s say that I want to do some kind of light-touch enrichment on the files, such as extracting the log timestamp in order to partition my web server logs in HDFS by the date of the log entry (not the time of processing, because I’m working with historical files too). I’m using a regex_extractor interceptor in Flume to determine the timestamp from the event data (log entry) being processed. That’s great, and it works well – when it works. If I get my regex wrong, or the log file changes date format, the house of cards comes tumbling down. Now I have a mess, because my nice clean ingest pipeline from the source system now needs fixing and re-running. As before, of course it is possible to write this cleanly so that it doesn’t break, etc etc, but from the point of view of decoupling operations for manageability and flexibility it makes sense to keep them separate (remember the Obtain vs Scrub point above?).

The final note on this is to point out that technically we can implement the pipeline using a Kafka Flume channel, which is a slightly neater way of doing things. The data still ends up in the S3 sink, and available in Kafka for streaming to all the consumers.

Kafka in Action

Let’s take a look at the configuration to put the above theory into practice. I’m running all of this on Oracle’s BigDataLite 4.2.1 VM which includes, amongst many other goodies, CDH 5.4.0. Alongside this I’ve installed into /opt :

  • apache-flume-1.6.0
  • elasticsearch-1.7.3
  • kafka_2.10-0.8.2.1
  • kibana-4.1.2-linux-x64
  • logstash-1.5.4

The Starting Point – Flume -> HDFS

First, we’ve got the initial Logs -> Flume -> HDFS configuration, similar to what Mark wrote about originally:

# http://flume.apache.org/FlumeUserGuide.html#exec-source  
source_agent.sources = apache_server  
source_agent.sources.apache_server.type = exec  
source_agent.sources.apache_server.command = tail -f /home/oracle/website_logs/access_log  
source_agent.sources.apache_server.batchSize = 1  
source_agent.sources.apache_server.channels = memoryChannel

# http://flume.apache.org/FlumeUserGuide.html#memory-channel  
source_agent.channels = memoryChannel  
source_agent.channels.memoryChannel.type = memory  
source_agent.channels.memoryChannel.capacity = 100

## Write to HDFS  
source_agent.sinks = hdfs_sink  
source_agent.sinks.hdfs_sink.type = hdfs  
source_agent.sinks.hdfs_sink.channel = memoryChannel  
source_agent.sinks.hdfs_sink.hdfs.path = /user/oracle/incoming/rm_logs/apache_log  
source_agent.sinks.hdfs_sink.hdfs.fileType = DataStream  
source_agent.sinks.hdfs_sink.hdfs.writeFormat = Text  
source_agent.sinks.hdfs_sink.hdfs.rollSize = 0  
source_agent.sinks.hdfs_sink.hdfs.rollCount = 10000  
source_agent.sinks.hdfs_sink.hdfs.rollInterval = 600

After running this

$ /opt/apache-flume-1.6.0-bin/bin/flume-ng agent --name source_agent \
--conf-file flume_website_logs_02_tail_source_hdfs_sink.conf

we get the logs appearing in HDFS and can see them easily in Hue:

Adding Kafka to the Pipeline

Let’s now add Kafka to the mix. I’ve already set up and started Kafka (see here for how), and Zookeeper’s already running as part of the default BigDataLite build.
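If you want to double-check that the broker and Zookeeper are reachable before going any further, listing the topics registered in Zookeeper is a quick sanity test (at this point the list may well be empty):

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper bigdatalite:2181 --list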

First we need to define a Kafka topic that is going to hold the log files. In this case it’s called apache_logs:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper bigdatalite:2181 \
--create --topic apache_logs --replication-factor 1 --partitions 1

Just to prove it’s there and we can send/receive messages on it I’m going to use the Kafka console producer/consumer to test it. Run these in two separate windows:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-console-producer.sh \
--broker-list bigdatalite:9092 --topic apache_logs

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh \
--zookeeper bigdatalite:2181 --topic apache_logs

With the Consumer running enter some text, any text, in the Producer session and you should see it appear almost immediately in the Consumer window.
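As an optional extra check, you can ask Kafka to describe the topic, which confirms the partition count, replication factor, and which broker is leading each partition:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --zookeeper bigdatalite:2181 \
--describe --topic apache_logs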

Now that we’ve validated the Kafka topic, let’s plumb it in. We’ll switch the existing Flume config to use a Kafka sink, and then add a second Flume agent to do the Kafka -> HDFS bit, giving us this:

The original flume agent configuration now looks like this:

source_agent.sources = apache_log_tail  
source_agent.channels = memoryChannel  
source_agent.sinks = kafka_sink

# http://flume.apache.org/FlumeUserGuide.html#exec-source  
source_agent.sources.apache_log_tail.type = exec  
source_agent.sources.apache_log_tail.command = tail -f /home/oracle/website_logs/access_log  
source_agent.sources.apache_log_tail.batchSize = 1  
source_agent.sources.apache_log_tail.channels = memoryChannel

# http://flume.apache.org/FlumeUserGuide.html#memory-channel  
source_agent.channels.memoryChannel.type = memory  
source_agent.channels.memoryChannel.capacity = 100

## Write to Kafka  
source_agent.sinks.kafka_sink.channel = memoryChannel  
source_agent.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink  
source_agent.sinks.kafka_sink.batchSize = 5  
source_agent.sinks.kafka_sink.brokerList = bigdatalite:9092  
source_agent.sinks.kafka_sink.topic = apache_logs

Restart the kafka-console-consumer.sh from above so that you can see what’s going into Kafka, and then run the Flume agent. You should see the log entries appearing soon after. Remember that kafka-console-consumer.sh is just one consumer of the logs – when we plug in the Flume consumer to write the logs to HDFS we can opt to pick up all of the entries in Kafka, completely independently of what we have or haven’t consumed in kafka-console-consumer.sh.

$ /opt/apache-flume-1.6.0-bin/bin/flume-ng agent --name source_agent \
--conf-file flume_website_logs_03_tail_source_kafka_sink.conf

[oracle@bigdatalite ~]$ /opt/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh \
--zookeeper bigdatalite:2181 --topic apache_logs

37.252.227.70 - - [06/Sep/2015:08:08:30 +0000] "GET / HTTP/1.0" 301 235 "-" "Mozilla/5.0 (compatible; monitis.com - free monitoring service; http://monitis.com)"  
174.121.162.130 - - [06/Sep/2015:08:08:35 +0000] "HEAD /blog HTTP/1.1" 301 - "http://oraerp.com/blog" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"  
177.71.183.71 - - [06/Sep/2015:08:08:35 +0000] "GET /blog/ HTTP/1.0" 200 145999 "-" "Mozilla/5.0 (compatible; monitis - premium monitoring service; http://www.monitis.com)"  
174.121.162.130 - - [06/Sep/2015:08:08:36 +0000] "HEAD /blog/ HTTP/1.1" 200 - "http://oraerp.com/blog" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"  
173.192.34.91 - - [06/Sep/2015:08:08:44 +0000] "GET / HTTP/1.0" 301 235 "-" "Mozilla/5.0 (compatible; monitis.com - free monitoring service; http://monitis.com)"  
217.146.9.53 - - [06/Sep/2015:08:08:58 +0000] "GET / HTTP/1.0" 301 235 "-" "Mozilla/5.0 (compatible; monitis - premium monitoring service; http://www.monitis.com)"  
82.47.31.235 - - [06/Sep/2015:08:08:58 +0000] "GET / HTTP/1.1" 200 36946 "-" "Echoping/6.0.2"
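If you’re curious how far through the topic a given consumer group has read (for example the Flume agent’s default flume group versus the console consumer), one way of checking with the 0.8.x tooling is the ConsumerOffsetChecker class. This is a hedged sketch; the exact tool and flags vary between Kafka releases:

$ /opt/kafka_2.10-0.8.2.1/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
--zkconnect bigdatalite:2181 --group flume --topic apache_logs

It reports the current offset, log end offset, and lag per partition for the given group.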

Set up the second Flume agent to use Kafka as a source, and HDFS as the target just as it was before we added Kafka into the pipeline:

target_agent.sources = kafkaSource  
target_agent.channels = memoryChannel  
target_agent.sinks = hdfsSink 

target_agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource  
target_agent.sources.kafkaSource.zookeeperConnect = bigdatalite:2181  
target_agent.sources.kafkaSource.topic = apache_logs  
target_agent.sources.kafkaSource.batchSize = 5  
target_agent.sources.kafkaSource.batchDurationMillis = 200  
target_agent.sources.kafkaSource.channels = memoryChannel

# http://flume.apache.org/FlumeUserGuide.html#memory-channel  
target_agent.channels.memoryChannel.type = memory  
target_agent.channels.memoryChannel.capacity = 100

## Write to HDFS  
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink  
target_agent.sinks.hdfsSink.type = hdfs  
target_agent.sinks.hdfsSink.channel = memoryChannel  
target_agent.sinks.hdfsSink.hdfs.path = /user/oracle/incoming/rm_logs/apache_log  
target_agent.sinks.hdfsSink.hdfs.fileType = DataStream  
target_agent.sinks.hdfsSink.hdfs.writeFormat = Text  
target_agent.sinks.hdfsSink.hdfs.rollSize = 0  
target_agent.sinks.hdfsSink.hdfs.rollCount = 10000  
target_agent.sinks.hdfsSink.hdfs.rollInterval = 600

Fire up the agent:

$ /opt/apache-flume-1.6.0-bin/bin/flume-ng agent -n target_agent \
-f flume_website_logs_04_kafka_source_hdfs_sink.conf

and as the website log data streams in to Kafka (from the first Flume agent) you should see the second Flume agent sending it to HDFS and evidence of this in the console output from Flume:

15/10/27 13:53:53 INFO hdfs.BucketWriter: Creating /user/oracle/incoming/rm_logs/apache_log/FlumeData.1445954032932.tmp

and in HDFS itself:
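If you’d rather check from the command line than Hue, a quick listing against the path configured in the HDFS sink does the same job (the FlumeData file names will differ on your system):

$ hdfs dfs -ls /user/oracle/incoming/rm_logs/apache_log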

Play it again, Sam?

All we’ve done to this point is add Kafka into the pipeline, ready for subsequent use. We’ve not changed the nett output of the data pipeline. But, we can now benefit from having Kafka there, by re-running some of our HDFS load without having to go back to the source files. Let’s say we want to partition the logs as we store them. But, we don’t want to disrupt the existing processing. How? Easy! Just create another Flume agent with the additional configuration in it to do the partitioning.

target_agent.sources = kafkaSource  
target_agent.channels = memoryChannel  
target_agent.sinks = hdfsSink

target_agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource  
target_agent.sources.kafkaSource.zookeeperConnect = bigdatalite:2181  
target_agent.sources.kafkaSource.topic = apache_logs  
target_agent.sources.kafkaSource.batchSize = 5  
target_agent.sources.kafkaSource.batchDurationMillis = 200  
target_agent.sources.kafkaSource.channels = memoryChannel  
target_agent.sources.kafkaSource.groupId = new  
target_agent.sources.kafkaSource.kafka.auto.offset.reset = smallest  
target_agent.sources.kafkaSource.interceptors = i1

# http://flume.apache.org/FlumeUserGuide.html#memory-channel  
target_agent.channels.memoryChannel.type = memory  
target_agent.channels.memoryChannel.capacity = 1000

# Regex Interceptor to set timestamp so that HDFS can be written to partitioned  
target_agent.sources.kafkaSource.interceptors.i1.type = regex_extractor  
target_agent.sources.kafkaSource.interceptors.i1.serializers = s1  
target_agent.sources.kafkaSource.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer  
target_agent.sources.kafkaSource.interceptors.i1.serializers.s1.name = timestamp  
#
# Match this format logfile to get timestamp from it:  
# 76.164.194.74 - - [06/Apr/2014:03:38:07 +0000] "GET / HTTP/1.1" 200 38281 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"  
target_agent.sources.kafkaSource.interceptors.i1.regex = (\d{2}\/[a-zA-Z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\s\+\d{4})  
target_agent.sources.kafkaSource.interceptors.i1.serializers.s1.pattern = dd/MMM/yyyy:HH:mm:ss Z  
#

## Write to HDFS  
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink  
target_agent.sinks.hdfsSink.type = hdfs  
target_agent.sinks.hdfsSink.channel = memoryChannel  
target_agent.sinks.hdfsSink.hdfs.path = /user/oracle/incoming/rm_logs/apache/%Y/%m/%d/access_log  
target_agent.sinks.hdfsSink.hdfs.fileType = DataStream  
target_agent.sinks.hdfsSink.hdfs.writeFormat = Text  
target_agent.sinks.hdfsSink.hdfs.rollSize = 0  
target_agent.sinks.hdfsSink.hdfs.rollCount = 0  
target_agent.sinks.hdfsSink.hdfs.rollInterval = 600

The important lines to note here (as highlighted above) are:

  • the regex_extractor interceptor which determines the timestamp of the log event, then used in the hdfs.path partitioning structure
  • the groupId and kafka.auto.offset.reset configuration items for the kafkaSource.
    • The groupId ensures that this flume agent’s offset in the consumption of the data in the Kafka topic is maintained separately from that of the original agent that we had. By default it is flume, and here I’m overriding it to new. It’s a good idea to specify this explicitly in all Kafka flume consumer configurations to avoid complications.
    • kafka.auto.offset.reset tells the consumer that if no existing offset is found (which it won’t be, if the groupId is a new one) to start from the beginning of the data rather than the end (which is what it will do by default).
    • Thus if you want to get Flume to replay the contents of a Kafka topic, just set the groupId to an unused one (eg ‘foo01’, ‘foo02’, etc) and make sure the kafka.auto.offset.reset is smallest

Now run it (concurrently with the existing flume agents if you want):

$ /opt/apache-flume-1.6.0-bin/bin/flume-ng agent -n target_agent \
-f flume_website_logs_07_kafka_source_partitioned_hdfs_sink.conf

You should see a flurry of activity (or not, depending on how much data you’ve already got in Kafka), and some nicely partitioned apache logs in HDFS:
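As before, you can verify this from the command line as well as from Hue; a recursive listing of the partitioned path shows the year/month/day directory structure created by the %Y/%m/%d tokens in hdfs.path (your dates will reflect the log data you have loaded):

$ hdfs dfs -ls -R /user/oracle/incoming/rm_logs/apache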

Crucially, the existing flume agent and non-partitioned HDFS pipeline stays in place and functioning exactly as it was – we’ve not had to touch it. We could then run the two side-by-side until we’re happy the partitioning is working correctly and then decommission the first. Even at this point we have the benefit of Kafka, because we just turn off the original HDFS-writing agent – the new “live” one continues to run, it doesn’t need reconfiguring. We’ve validated the actual configuration we’re going to use for real, and we’ve not had to simulate it with mock data sources that then need re-plumbing prior to real use.

Clouds and Channels

We’re going to evolve the pipeline a bit now. We’ll go back to a single Flume agent writing to HDFS, but add in Amazon’s S3 as the target for the unprocessed log files. The point here is not so much that S3 is the best place to store log files (although it is a good option), but as a way to demonstrate a secondary method of keeping your raw data available without impacting the source system. It also fits nicely with using the Kafka flume channel to tighten the pipeline up a tad:

Amazon’s S3 isn’t HDFS itself, but Hadoop ships with an S3N filesystem connector, which means Flume’s HDFS sink can write directly to it. You need to have already set up your S3 ‘bucket’, and have the appropriate AWS Access Key ID and Secret Key. To get this to work I added these credentials to /etc/hadoop/conf.bigdatalite/core-site.xml (I tried specifying them inline with the flume configuration but with no success):

<property>  
    <name>fs.s3n.awsAccessKeyId</name>  
    <value>XXXXXXXXXXXXX</value>  
</property>  
<property>  
    <name>fs.s3n.awsSecretAccessKey</name>  
    <value>YYYYYYYYYYYYYYYYYYYY</value>  
</property>

Once you’ve set up the bucket and credentials, the original flume agent (the one pulling the actual web server logs) can be amended:

source_agent.sources = apache_log_tail  
source_agent.channels = kafkaChannel  
source_agent.sinks = s3Sink

# http://flume.apache.org/FlumeUserGuide.html#exec-source  
source_agent.sources.apache_log_tail.type = exec  
source_agent.sources.apache_log_tail.command = tail -f /home/oracle/website_logs/access_log  
source_agent.sources.apache_log_tail.batchSize = 1  
source_agent.sources.apache_log_tail.channels = kafkaChannel


## Write to Kafka Channel  
source_agent.channels.kafkaChannel.channel = kafkaChannel  
source_agent.channels.kafkaChannel.type = org.apache.flume.channel.kafka.KafkaChannel  
source_agent.channels.kafkaChannel.topic = apache_logs  
source_agent.channels.kafkaChannel.brokerList = bigdatalite:9092  
source_agent.channels.kafkaChannel.zookeeperConnect = bigdatalite:2181

## Write to S3  
source_agent.sinks.s3Sink.channel = kafkaChannel  
source_agent.sinks.s3Sink.type = hdfs  
source_agent.sinks.s3Sink.hdfs.path = s3n://rmoff-test/apache  
source_agent.sinks.s3Sink.hdfs.fileType = DataStream  
source_agent.sinks.s3Sink.hdfs.filePrefix = access_log  
source_agent.sinks.s3Sink.hdfs.writeFormat = Text  
source_agent.sinks.s3Sink.hdfs.rollCount = 10000  
source_agent.sinks.s3Sink.hdfs.rollSize = 0  
source_agent.sinks.s3Sink.hdfs.batchSize = 10000  
source_agent.sinks.s3Sink.hdfs.rollInterval = 600

Here the source is the same as before (server logs), but the channel is now Kafka itself, and the sink S3. Using Kafka as the channel has the nice benefit that the data is now already in Kafka; we don’t need it as an explicit target in its own right.

Restart the source agent using this new configuration:

$ /opt/apache-flume-1.6.0-bin/bin/flume-ng agent --name source_agent \
--conf-file flume_website_logs_09_tail_source_kafka_channel_s3_sink.conf

and you should get the data appearing on both HDFS as before, and now also in the S3 bucket:
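To confirm the S3 side without going to the AWS console, the Hadoop client can list the bucket directly over the same S3N connector used by the Flume sink (a sketch, using the bucket name from the configuration above):

$ hadoop fs -ls s3n://rmoff-test/apache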

Didn’t Someone Say Logstash?

The premise at the beginning of this exercise was that I could extend an existing data pipeline to pull data into a new set of tools, as if from the original source, but without touching that source or the existing configuration in place. So far we’ve got a pipeline that is pretty much as we started with, just with Kafka in there now and an additional feed to S3:

Now we’re going to extend (or maybe “broaden” is a better term) the data pipeline to add Elasticsearch into it:

Whilst Flume can write to Elasticsearch given the appropriate extender, I’d rather use a tool much closer to Elasticsearch in origin and direction – Logstash. Logstash supports Kafka as an input (and an output, if you want), making the configuration ridiculously simple. To smoke-test the integration just run Logstash with this configuration:

input {  
        kafka {  
                zk_connect => 'bigdatalite:2181'  
                topic_id => 'apache_logs'  
                codec => plain {  
                        charset => "ISO-8859-1"  
                }
                # Use both the following two if you want to reset processing  
                reset_beginning => 'true'  
                auto_offset_reset => 'smallest'

        }  
}

output {  
        stdout {codec => rubydebug }  
        }

A few things to point out in the input configuration:

  • You need to specify the plain codec (assuming your input from Kafka is plain text). The default codec for the Kafka plugin is json, and Logstash does NOT like trying to parse plain text as json, as I found out:

    37.252.227.70 - - [06/Sep/2015:08:08:30 +0000] "GET / HTTP/1.0" 301 235 "-" "Mozilla/5.0 (compatible; monitis.com - free monitoring service; http://monitis.com)" {:exception=>#<NoMethodError: undefined method `[]' for 37.252:Float>, :backtrace=>["/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/event.rb:73:in `initialize'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.1/lib/logstash/codecs/json.rb:46:in `decode'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:169:in `queue_event'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-input-kafka-1.0.0/lib/logstash/inputs/kafka.rb:139:in `run'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:177:in `inputworker'", "/opt/logstash-1.5.4/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/pipeline.rb:171:in `start_input'"], :level=>:error}

  • As well as specifying the codec, I needed to specify the charset. Without this I got \u0000\xBA\u0001 at the beginning of each message that Logstash pulled from Kafka

  • Specifying reset_beginning and auto_offset_reset tell Logstash to pull everything in from Kafka, rather than starting at the latest offset.

When you run the configuration file above you should see a stream of messages to your console of everything that is in the Kafka topic:

$ /opt/logstash-1.5.4/bin/logstash -f logstash-apache_10_kafka_source_console_output.conf

The output will look like this – note that Logstash has added its own special @version and @timestamp fields:

{  
       "message" => "203.199.118.224 - - [09/Oct/2015:04:13:23 +0000] "GET /wp-content/uploads/2014/10/JFB-View-Selector-LowRes-300x218.png HTTP/1.1" 200 53295 "http://www.rittmanmead.com/2014/10/obiee-how-to-a-view-selector-for-your-dashboard/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"",  
      "@version" => "1",  
    "@timestamp" => "2015-10-27T17:29:06.596Z"  
}

Having proven the Kafka-Logstash integration, let’s do something useful – get all those lovely log entries streaming from source, through Kafka, enriched in Logstash with things like geoip, and finally stored in Elasticsearch:

input {  
        kafka {  
                zk_connect => 'bigdatalite:2181'  
                topic_id => 'apache_logs'  
                codec => plain {  
                        charset => "ISO-8859-1"  
                }
                # Use both the following two if you want to reset processing  
                reset_beginning => 'true'  
                auto_offset_reset => 'smallest'  
        }
}


filter {  
        # Parse the message using the pre-defined "COMBINEDAPACHELOG" grok pattern  
        grok { match => ["message","%{COMBINEDAPACHELOG}"] }

        # Ignore anything that's not a blog post hit, characterised by /yyyy/mm/post-slug form  
        if [request] !~ /^\/[0-9]{4}\/[0-9]{2}\/.*$/ { drop{} }

        # From the blog post URL, strip out the year/month and slug  
        #  http://www.rittmanmead.com/2015/02/obiee-monitoring-and-diagnostics-with-influxdb-and-grafana/  
        #     year  => 2015  
        #     month =>   02  
        #     slug  => obiee-monitoring-and-diagnostics-with-influxdb-and-grafana  
        grok { match => [ "request","/%{NUMBER:post-year}/%{NUMBER:post-month}/(%{NUMBER:post-day}/)?%{DATA:post-slug}(/.*)?$"] }

        # Combine year and month into one field  
        mutate { replace => [ "post-year-month" , "%{post-year}-%{post-month}" ] }

        # Use GeoIP lookup to locate the visitor's town/country  
        geoip { source => "clientip" }

        # Store the date of the log entry (rather than now) as the event's timestamp  
        date { match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]}  
}

output {  
        elasticsearch { host => "bigdatalite"  index => "blog-apache-%{+YYYY.MM.dd}"}  
        }

Make sure that Elasticsearch is running and then kick off Logstash:

$ /opt/logstash-1.5.4/bin/logstash -f logstash-apache_01_kafka_source_parsed_to_es.conf

Nothing will appear to happen on the console:

log4j, [2015-10-27T17:36:53.228]  WARN: org.elasticsearch.bootstrap: JNA not found. native methods will be disabled.  
Logstash startup completed

But in the background Elasticsearch will be filling up with lots of enriched log data. You can confirm this through the useful kopf plugin to see that the Elasticsearch indices are being created:

and directly through Elasticsearch’s RESTful API too:

$ curl -s -XGET http://bigdatalite:9200/_cat/indices?v|sort  
health status index                  pri rep docs.count docs.deleted store.size pri.store.size  
yellow open   blog-apache-2015.09.30   5   1      11872            0       11mb           11mb  
yellow open   blog-apache-2015.10.01   5   1      13679            0     12.8mb         12.8mb  
yellow open   blog-apache-2015.10.02   5   1      10042            0      9.6mb          9.6mb  
yellow open   blog-apache-2015.10.03   5   1       8722            0      7.3mb          7.3mb

And of course, the whole point of streaming the data into Elasticsearch in the first place – easy analytics through Kibana:

Conclusion

Kafka is awesome :-D

We’ve seen in this article how Kafka enables the implementation of flexible data pipelines that can evolve organically without requiring system rebuilds to implement or test new methods. It allows the data discovery function to tap in to the same source of data as the more standard analytical reporting one, without risking impacting the source system at all.

Introducing the Rittman Mead OBIEE Performance Analytics Service

Fix Your OBIEE Performance Problems Today

OBIEE is a powerful analytics tool that enables your users to make the most of the data in your organisation. Ensuring that expected response times are met is key to driving user uptake and successful user engagement with OBIEE.

Rittman Mead can help diagnose and resolve performance problems on your OBIEE system. Taking a holistic, full-stack view, we can help you deliver the best service to your users. Fast response times enable your users to do more with OBIEE, driving better engagement, higher satisfaction, and greater return on investment. We enable you to:

  • Create a positive user experience
  • Ensure OBIEE returns answers quickly
  • Empower your BI team to identify and resolve performance bottlenecks in real time

Rittman Mead Are The OBIEE Performance Experts

Rittman Mead have many years of experience in the full life cycle of data warehousing and analytical solutions, especially in the Oracle space. We know what it takes to design a good system, and to troubleshoot a problematic one.

We are firm believers in a practical and logical approach to performance analytics and optimisation. Eschewing the drunk man anti-method of ‘tuning’ configuration settings at random, we advocate making a clear diagnosis and baseline of performance problems before changing anything. Once a clear understanding of the situation is established, steps are taken in a controlled manner to implement and validate one change at a time.

Rittman Mead have spoken at conferences, produced videos, and written many blogs specifically on the subject of OBIEE Performance.

Performance Analytics is not a dark art. It is not the blind application of ‘best practices’ or ‘tuning’ configuration settings. It is the logical analysis of performance behaviour to accurately determine the issue(s) present, and the possible remedies for them.

Diagnose and Resolve OBIEE Performance Problems with Confidence

When you sign up for the Rittman Mead OBIEE Performance Analytics Service you get:

  1. On-site consultancy from one of our team of Performance experts, including Mark Rittman (Oracle ACE Director), and Robin Moffatt (Oracle ACE).
  2. A Performance Analysis Report to give you an assessment of the current performance and prioritised list of optimisation suggestions, which we can help you implement.
  3. Use of the Performance Diagnostics Toolkit to measure and analyse the behaviour of your system and correlate any poor response times with the metrics from the server and OBIEE itself.
  4. Training, which is vital for enabling your staff to deliver optimal OBIEE performance. We work with your staff to help them understand the good practices to look for in design and diagnostics. Training is based on formal courseware, along with workshops using examples from your OBIEE system where appropriate.

Let Us Help You, Today!

Get in touch now to find out how we can help improve your OBIEE system’s performance. We offer a free, no-obligation sample of the Performance Analysis Report, built on YOUR data.

Don’t just call us when performance may already be problematic – we can help you assess your OBIEE system for optimal performance at all stages of the build process. Gaining a clear understanding of the performance profile of your system and any potential issues gives you the confidence and ability to understand any potential risks to the success of your project – before it gets too late.