Category Archives: Rittman Mead

A decade in data, a decade to come

First of all I’d like to take the chance to wish everyone in our network, colleagues, customers and friends a very happy new year.

The last decade was an incredible one for the data industry.  I realised before writing this that 2010 was actually when I started at my first data management consultancy, so it’s given me the perfect chance to reflect on what I’ve witnessed in the last 10 years.  

Data Insights out of IT and into the business

One thing that stands out is the relevance of insights through data within business functions.  We do what we do at Rittman Mead because we believe that data solutions will drive positive change for us all in the future.  Making data accessible, insightful and interesting to people who are doing great things for great companies is the goal for all of us.

During the last decade we saw the mass market adoption of line-of-business data discovery tools such as Tableau, Oracle Data Visualization, QlikSense and Microsoft PowerBI - all wonderfully innovative products that have flown the flag for bringing insights through data into the business.

Oracle DV Geo Layers

What is Big Data?

Also, what has happened to the Big Data hype, which was huge back in 2010?

Ten years on, do we think it lived up to the hype?  Do we even know what it means?  On reflection, was it as groundbreaking as we might have thought at the time?  That is an argument that could be had but might never end.

Public Cloud Cover

One thing is for sure - the adoption at scale of public cloud, SaaS, storage and seemingly infinite amounts of processing power has been huge since 2010.  We've seen AWS and Microsoft Azure leading the way, with Google and Oracle Cloud Infrastructure catching up.  We know there are pros, cons, knowns and unknowns when moving to cloud-hosted applications and platforms.  And we know that come 2030 the cloud will be bigger, whether we like it or not.

Streams of Events

Finally, the emergence of event-driven architectures is as innovative as it is complex.  Large growth companies like Uber have built their business model on the ability to process huge volumes of events at low latency in order to operate - think demand plus supply and Uber's dynamic pricing model.  Fortunately, open source projects like Kafka, and particularly organisations like Confluent, the company behind Kafka, have made it easier to integrate this functionality into businesses throughout the world.

Analysing stream events in realtime using Kafka KSQL

It wouldn't be fair to say that the innovation in the data industry has always been used ethically over the last 10 years, and this is something we all have a part to play in going forwards.  The Facebook / Cambridge Analytica scandal certainly made everyone reflect on the importance of using data for positive and ethical outcomes rather than underhand and opaque tactics.

And what's next?

As we embark on a new decade, we at Rittman Mead are excited about what is to come.  Artificial Intelligence, Machine Learning and Augmented Analytics are going to hit the mainstream and will play a part in all of our lives.  The good news is that there will always be a place in this market for people.  Data is necessary and sophisticated machines are necessary, but it all relies on people to work, whether that means building an algorithm or making a decision based on the information that an algorithm has generated.  We will see more automation, and that is a brilliant thing.  Robotic Process Automation (RPA) and autonomous databases such as Oracle's offering are designed to breed innovation, not replacement.

Come and meet us in February

For us it all starts in February at Oracle OpenWorld Europe.  We will be exhibiting and presenting, and there will be insightful, thought-provoking and relatable content throughout the two days.  Augmented analytics and autonomous technologies will be at the top of the agenda, and rightly so.  What's more, it's completely free to attend. Registration details can be found here:

We’d love to meet with you if you do decide to come.  To make this easier (as we know there is a lot to cover on days like these) we have set up a calendar where you can book a meeting slot with one of our sales team or technical consultants.  We can talk about:

  • Oracle
  • AI/ML
  • What are we doing?
  • What are you doing?
  • Anything you like in an informal setting

Please feel free to book a time slot using the link below:

https://calendly.com/rittman-mead/oracle-openworld

Finally, we wish everyone a very prosperous new decade - who knows where we'll be by 2030!

Machine Learning and Spatial for FREE in the Oracle Database

Last week at UKOUG Techfest19 I spoke a lot about Machine Learning, both with Oracle Analytics Cloud and, more in depth, in the database with Oracle Machine Learning, together with Charlie Berger, Oracle Senior Director of Product Management.

As mentioned several times in my previous blog posts, Oracle Analytics Cloud provides a set of tools to help Data Analysts start their path to Data Science. If, on the other hand, we're dealing with experienced Data Scientists and huge datasets, Oracle's proposal is to move Machine Learning to where the data resides, with Oracle Machine Learning. OML is an ecosystem of options for performing ML, with dedicated integrations for Oracle Databases and Big Data appliances.

One of the best-known branches is OML4SQL, which provides the ability to do proper data science directly in the database with PL/SQL calls! During the UKOUG Techfest19 talk, Charlie Berger demoed it using a collaborative notebook on top of an Autonomous Data Warehouse Cloud.
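
To give a flavour of what OML4SQL looks like in practice, here is a minimal sketch (not taken from the talk itself). It assumes a hypothetical training table CUSTOMERS_TRAIN with a CUST_ID key column and a CHURN_FLAG target column, and trains a decision tree classifier entirely inside the database:

    -- Settings table driving the model build (all object names here are illustrative)
    CREATE TABLE churn_model_settings (
      setting_name  VARCHAR2(30),
      setting_value VARCHAR2(4000)
    );

    -- Choose the algorithm: a decision tree for classification
    INSERT INTO churn_model_settings (setting_name, setting_value)
    VALUES ('ALGO_NAME', 'ALGO_DECISION_TREE');
    COMMIT;

    -- Train the model where the data lives, with a single PL/SQL call
    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'CHURN_DT_MODEL',
        mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
        data_table_name     => 'CUSTOMERS_TRAIN',
        case_id_column_name => 'CUST_ID',
        target_column_name  => 'CHURN_FLAG',
        settings_table_name => 'CHURN_MODEL_SETTINGS');
    END;
    /

Scoring then becomes plain SQL, for example against a hypothetical CUSTOMERS_NEW table:

    SELECT cust_id,
           PREDICTION(churn_dt_model USING *)             AS predicted_churn,
           PREDICTION_PROBABILITY(churn_dt_model USING *) AS churn_probability
    FROM   customers_new;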

Both Oracle ADW and ATP include OML by default at no extra cost. This wasn't the case for the other database offerings, in cloud or on-premises, which required an additional option to be purchased (the Advanced Analytics option for on-premises deals). The separate license requirement obviously limited the spread of this functionality, but I'm happy to say that it's going away!

Oracle's blog post yesterday announced that:

As of December 5, 2019, the Machine Learning (formerly known as Advanced Analytics), Spatial and Graph features of Oracle Database may be used for development and deployment purposes with all on-prem editions and Oracle Cloud Database Services. See the Oracle Database Licensing Information Manual (pdf) for more details.

What this means is that both features are included for FREE within the Oracle Database license! Great news for Machine Learning and Graph Database fans alike! The following tweet from Dominic Giles (Master Product Manager for the Oracle DB) provides a nice summary of the licenses, including the two options, for Oracle DB 19c.

And this license change also affects older versions, starting from 12.2, the oldest release still in general support! So, no more excuses: perform Machine Learning where your data is - in the database with Oracle Machine Learning!

Timestamp Functions and Presentation Variables in Oracle Cloud Analytics

One of the most popular Rittman Mead blog posts over the last 10 years is Timestamps and Presentation Variables. As we are seeing more and more migrations to OAC, we decided to review and revise this post for the latest version of Oracle Analytics Cloud (OAC), 105.4.0-140 as of October 2019. Read more about the latest updates here.

--

One could say that creating a chart is not the most complex task in the world of Business Intelligence, but we would argue that creating a meaningful report that perfectly illustrates the message hidden in the data, and therefore adds value for management, is nowhere close to easy!  A good way to make a report as informative as possible is to use trends and comparisons, and the perfect tools for that are the time analysis functions: for example, comparing sales in a period of time this year to the same period the year before, or measuring the similarity or dissimilarity of sales in different months of the year.

Demo Platform

I have used a free trial instance of OAC for this demo. If you haven't done so yet, sign up for a free 30-day trial Oracle Cloud account (different to an Oracle account). Use the account to access the Oracle Cloud Infrastructure (OCI) console, which is Oracle's latest move towards having one integrated cloud platform to manage all your Oracle cloud applications, platforms and infrastructure in one place.
From the OCI console it takes 5 to 10 minutes before your free trial instance of OAC is up and running. For a detailed step-by-step guide to creating a new instance, read here.

Demo Goals

In this blog post I intend to show you how to combine the power of timestamp functions and presentation variables to create robust, repeatable reports. We will create a report that displays a year over year analysis for any rolling number of periods, by week or month, from any date in time, all determined by the user. This entire demo will only use values from a date and a revenue field.


TIMESTAMP Functions

TIMESTAMPADD() manipulates data of the data types DATE and DATETIME based on a calendar year.

Syntax: TIMESTAMPADD(interval, expr, timestamp)
Example: TIMESTAMPADD(SQL_TSI_MONTH, 12, Time."Order Date")
Description: Adds a specified number of intervals to a timestamp, and returns a single timestamp.
Timestamp Interval (TSI) Options: SQL_TSI_SECOND, SQL_TSI_MINUTE, SQL_TSI_HOUR, SQL_TSI_DAY, SQL_TSI_WEEK, SQL_TSI_MONTH, SQL_TSI_QUARTER, SQL_TSI_YEAR

Read more about other calendar functions.

Building Filters

To start building our demo, the filter below returns all dates greater than or equal to 7 days ago, including the current date.
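
The screenshot of that filter isn't reproduced here, but assuming the "Periods"."Day Date" column referenced later in this post, the expression behind it would read roughly as follows:

    "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_DAY, -7, CURRENT_DATE)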

In other words, we now have a functional filter to select all the rows where Date >= a week ago.

As a good practice, always include a second filter giving an upper limit to the time filter. For example "Periods"."Day Date" < CURRENT_DATE would confirm that there won’t be any records that you don’t want in the mix and therefore no unnecessary strain on the system.

Let's go one step further: instead of going 7 days back, we could include all the previous days in the current month, in other words dates >= the first day of the month. In this scenario, we can use the DAYOFMONTH() function to get the calendar day of any date, and from there it is easy to calculate the number of days into the month so far. Our new filter would look like this:
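
Again as a sketch, using the same assumed column, the filter might read:

    "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(CURRENT_DATE)) + 1, CURRENT_DATE)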

For example, if today is October 16th, DAYOFMONTH(CURRENT_DATE) would equal 16. Thus, we would subtract 16 days from CURRENT_DATE to go back to September 30th, and adding one will give us October 1st.

Presentation Variables

A presentation variable is a variable that can be created from the front end of Analytics as part of one of the following types of dashboard prompts:

  • Column prompt: associated with a column, and the values it can take come from the column values. For information on working with column prompts, see Creating a Column Prompt.
  • Variable prompt: not associated with any column; you define the values that it can take. For information on working with variable prompts, see Creating a Variable Prompt.

Each time a user selects a value in the column or variable prompt, the value of the presentation variable is set to the value the user selects and is then sent to any reference to that variable throughout the dashboard page. This could be filters, formulas and even text boxes.

The first presentation variable we could introduce is to replace CURRENT_DATE with a prompted value. Let's call this presentation variable pv_Date:

  • Use the syntax @{pv_Date} to call this variable in the reports.
  • For variables of type string, surround the name in single quotes: '@{pv_String}'.
  • It is good practice to assign a default value to presentation variables so that you can work with your report before publishing it to a dashboard. For example, with a default value of CURRENT_DATE for pv_Date, the new syntax would be @{pv_Date}{CURRENT_DATE}.

Demo Time!

Our updated filter after replacing CURRENT_DATE looks like the below. We will refer to this filter later as Filter 1 (F1).
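
As a hedged reconstruction of that screenshot, the TIMESTAMPADD expression with the prompted value substituted in might read:

    TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE})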

The filter is starting to take shape. Now let's say we always want to look at a date range of six months before the selected date. All we need to do is create a nested TIMESTAMP function: we will "wrap" our current TIMESTAMP with another one that subtracts six months:
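
Wrapping the expression above, the nested version might look like this:

    TIMESTAMPADD(SQL_TSI_MONTH, -6, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE}))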

Now we have a filter that selects all dates greater than or equal to the first day of the month six months before any given date.


To take this one step further, we can create another presentation variable called pv_n to allow users to determine the number of months to include in this analysis from a dashboard prompt.


Here is the updated version of our filter using the number of periods presentation variable and a default value of 6, @{pv_n}{6}. We will refer to the following filter as Filter 2 (F2).
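
A sketch of Filter 2, with the hard-coded 6 replaced by the presentation variable:

    TIMESTAMPADD(SQL_TSI_MONTH, -1 * @{pv_n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE}))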

Our TIMESTAMPADD function is now fairly robust and will give us any date greater than or equal to the first day of the month n months before any given date. Now we will see what we just created in action by building date ranges that allow a year over year analysis for any number of months. Consider the following filter set:
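
The original screenshot of the filter set isn't reproduced here, but based on the description that follows (Filter 2 as the lower bound, Filter 1 as the upper bound excluding the current month, and a year-shifted copy of both after the OR), it might read roughly like this:

    (   "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_MONTH, -1 * @{pv_n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE}))
    AND "Periods"."Day Date" <  TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE}) )
    OR
    (   "Periods"."Day Date" >= TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_MONTH, -1 * @{pv_n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE})))
    AND "Periods"."Day Date" <  TIMESTAMPADD(SQL_TSI_YEAR, -1, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOFMONTH(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE})) )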

This may appear pretty intimidating at first, but if we break it into parts we can start to understand its purpose. Notice we are using the exact same filters from before: Filter 1 and Filter 2. What we have done here is filter on two time periods, separated by the OR statement.

  • The first date range defines the period as being the most recent completed n months from any given prompted date value, using a presentation variable with a default of today. Dates in the current month have been removed from the set by Filter 1.
  • The second time period, after the OR statement, is the exact same as the first only it has been wrapped in another TIMESTAMP function subtracting a year, giving you the exact same time frame for the year prior.

This allows us to create a report that can run a year over year analysis for a rolling n month time frame determined by the user.

A note on nested TIMESTAMPS: you will always want to create nested TIMESTAMPS with the smallest interval first. Then you will wrap intervals as necessary. In this case our smallest increment is day, wrapped by month, wrapped by year.


Let’s Go Crazy

A more advanced trick: if you do real time or near real time reporting, using CURRENT_DATE may be how you want to proceed. Otherwise, instead of using today as your default date value, use yesterday's date, since most data is only as current as yesterday.  Using yesterday is especially valuable when pulling reports on the first day of the month or year - you generally want the entire previous time period rather than the empty beginning of a new one.  To implement this, wherever you have @{pv_Date}{CURRENT_DATE}, replace it with @{pv_Date}{TIMESTAMPADD(SQL_TSI_DAY, -1, CURRENT_DATE)}.

One more change to make our filter extra flexible is to use a new presentation variable to determine whether to display year over year values by month or by week. This can be done by inserting a variable into the SQL_TSI_MONTH and DAYOFMONTH statements: change MONTH to SQL_TSI_@{pv_INT}{MONTH} and DAYOF@{pv_INT}{MONTH}, where pv_INT is the name of our variable.

Start by creating a dummy variable in your prompt to allow users to select either MONTH or WEEK.  You can try something like this: CASE MOD(DAY("Time"."Date"), 2) WHEN 0 THEN 'WEEK' WHEN 1 THEN 'MONTH' END

The updated filter now looks like this:
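
Following that substitution, the Filter 2 expression becomes something along these lines (a sketch; pv_INT defaults to MONTH):

    TIMESTAMPADD(SQL_TSI_@{pv_INT}{MONTH}, -1 * @{pv_n}{6}, TIMESTAMPADD(SQL_TSI_DAY, (-1 * DAYOF@{pv_INT}{MONTH}(@{pv_Date}{CURRENT_DATE})) + 1, @{pv_Date}{CURRENT_DATE}))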

In order for our interaction between month and week to run smoothly we have to factor in one last consideration: if we take the date December 1st, 2019 and subtract one year we get December 1st, 2018.  However, if we take the first day of this week, Sunday December 15th, 2019, and subtract one year we get Saturday December 15th, 2018.  In our analysis this will cause an extra partial week to show up for prior years.  To get around this we will add a case statement: if '@{pv_INT}{MONTH}' = 'WEEK' then subtract 52 weeks from the first day of the week, else subtract 1 year from the first day of the month. With this, our final filter set is complete.
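
As a sketch of just that year-shift logic, with F2_expr standing in as a placeholder for the Filter 2 expression above, the case statement might look like this:

    CASE WHEN '@{pv_INT}{MONTH}' = 'WEEK'
         THEN TIMESTAMPADD(SQL_TSI_WEEK, -52, F2_expr)
         ELSE TIMESTAMPADD(SQL_TSI_YEAR, -1, F2_expr)
    END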


With the use of these filters and some creative dashboarding, you can construct a report that easily allows you to view a year over year analysis from any date in time for any number of periods either by month or by week.

Filtered by Week intervals:

The formula below gives you the rolling period value to use in the analysis:

In this post, we created a cloud version of the excellent demo previously described by Brian Hall.  As demonstrated, timestamp functions and their power have been among the interesting topics of visualisation and reporting for as long as we at Rittman Mead can remember, and they can still be used in the realm of the Oracle Cloud services in a very similar way to before.

Feel free to get in touch and let us know your thoughts and comments.

OOW 2019 Review: Free Tier, New Datacenters and a New End-to-End Path for Analytics and Data Science

In the Oracle world, last week was "the week", with Oracle OpenWorld 2019 happening in San Francisco. It was a week full of exciting news, some of it associated with words like "Free", never heard before on any Oracle-related topic. This blog post goes into detail on some of that news, with a special focus on Analytics and Data Science.

Oracle Cloud Free Tier

Let's start with the big news: Oracle Cloud Free Tier! A set of services that can ALWAYS be used for free, which includes Oracle's best offerings in the database space, such as ATP (Autonomous Transaction Processing) and ADW (Autonomous Data Warehouse), as well as Compute, Storage and additional services for networking, monitoring and notifications.

This is huge news in the Oracle ecosystem since it enables everyone to start using the products without the need for a credit card! The always-free services can also be used in conjunction with the 30-day Free Trial (with an associated $300 in credits) to experience the full set of Oracle products without spending a single cent.

An additional interesting point (compared to Oracle's previous developer licensing models) is that there is nothing in the licensing terms blocking any customer from using the free tier for production! This means that, if the resources provided satisfy the business requirements, anyone could potentially run production applications directly on the free tier. And, in cases where an upscale is needed, Oracle provides a very easy option to switch from a free instance to a paid one.

However, as one can expect, the free tier has limitations: for example, the databases allow only 1 OCPU and 20GB of storage each. On top of the technical limitations, none of the products in the free tier come with support or SLAs. This means, for example, that in case of problems you won't be able to open a ticket with Oracle Support - something definitely to ponder when implementing a production system.

OCI Gen2 and New Datacenters

During his Keynote, Larry Ellison also announced the plan to launch 20 new OCI Gen2 datacenters in the next year! An average of a new datacenter every 23 days!

This is very impressive and, as mentioned during the keynote, will mean Oracle overtakes Amazon in the number of datacenters. A particular mention also needs to go to OCI Gen2, the new generation of Oracle Cloud Infrastructure. The mission of the first generation of OCI was pay per usage: offering a set of services available on demand and payable by the hour. OCI Gen2 adds autonomous features to Gen1: services are now self-managed, self-patched, self-monitored and self-secured, with no downtime required. OCI Gen2 removes a huge set of activities from the hands of administrators, taking human error out of the equation.

Analytics & Data Science

I gave a talk on how to become a Data Scientist with Oracle Analytics Cloud. The topic of Analytics & Data Science was really what interested me most, and my expectation of exciting news was met.

A whole new set of products will be shipped soon, making the whole Data Science experience more continuous and pervasive across Oracle products. Let's have a look at the news; I'll try to add links to the relevant sessions I attended.

  • Oracle Machine Learning: ML will be more pervasive in the Oracle Database, the Oracle Machine Learning Notebooks will be capable of handling R and Python in addition to SQL. All of this will be available on the Cloud Databases including ADW. A new Spark option is also coming enabling Machine Learning on the BDA.
  • Oracle Data Science: A brand new product for Data Science collaboration, letting teams work on projects together, with sharing and versioning options available out of the box.
  • Oracle Data Catalog: Again a new product, aimed at creating inventories of a company's data assets and making them searchable and easily usable by business users or data scientists.
  • Oracle Analytics Cloud: A lot of new announcements for this product which is mature and consolidated in the market, like Natural Language Generation or enhancements in the Data Enrichment, which I'll address in a separate blog post.

An interesting feature is AutoML, available both in Oracle Machine Learning and Oracle Data Science, which removes some barriers to Data Science by automating most of the steps in Machine Learning model creation, such as model and feature selection and hyper-parameter tuning.

You might notice several enhancements in different products. However, the key indicator of Oracle's Data Science maturity is the fact that all of the products above can be easily combined! Oracle Data Science will use Oracle Machine Learning optimizations when running on supported platforms on top of datasets easily discoverable by Oracle Data Catalog. Machine Learning models developed by OML or ODS can then be exposed and used by Oracle Analytics Cloud. This provides an end to end path from data scientist to data analyst and business user all within the same toolset and on top of the best hardware (with support for GPUs coming)!

All in all a great OOW full of exciting news: no more barriers to access Oracle Cloud with the free tier, 20 new datacenters coming in the next year and a set of tools to perform Data Science, from the Analyst to the Data Scientist, in a collaborative and extremely performant way! If you want to have more news regarding Oracle Analytics Cloud, don't miss my next blog post!

Tableau | Dashboard Design ::Revoke A50 Petition Data::

Dashboards are most powerful through visual simplicity. They’re designed to automatically keep track of a specific set of metrics and keep human beings updated. Visual overload is like a binary demon in analytics that many developers seem possessed by; but less is more.

For example, many qualified drivers know very little about their dashboard besides speed, revs, temperature and fuel gauge. When an additional dash warning light comes on, even if it is just the tyre pressure icon let alone engine diagnostics light, most people will just take their car to the garage. The most obvious metrics in a car are in regard to its operation; if you didn't know your speed while driving you'd feel pretty blind. The additional and not so obvious metrics (i.e. dash warning lights) are more likely to be picked up by the second type of person who will spend the most time with that car: its mechanic. It would be pointless to overload a regular driver with all the data the car can possibly output in one go; that would just intimidate them. That's not what you want a car to do to the driver and that's certainly not what any organisation would want their operatives to feel like while their “car” is moving.

In light of recent political events, the exact same can metaphorically be applied to the big red Brexit bus. Making sense of it all might be a stretch too far for this article. Still, with appropriate use of Tableau dashboard design it is possible to answer seemingly critical questions on the topic with publicly available data.



There's An Ongoing Question That Needs Answering
Where did 6 million+ signatures really come from?


Back in the UK, the Brexit fiasco is definitely still ongoing. Just before the recent A50 extensions took place, a petition to revoke Article 50 and remain in the EU attracted more than 6 million signatures, becoming the biggest and fastest-growing petition in history and sparking right-wing criticism over the origin of thousands of signatures, claiming that most came from overseas and discrediting its legitimacy. The Government responded by rejecting the petition.

Thankfully the data is publicly available (https://petition.parliament.uk/petitions/241584.json) for us to use as an example of how a dashboard can be designed to settle such a question (potentially in real time too as more signatures come in).

Tableau can handle JSON data quite well and, to nobody’s surprise, we quickly discover that over 95% of signatures are coming from the UK.

Now that we know what we're dealing with, let's focus the map on Britain and provide the other countries' data in a format that is easier to digest visually. As cool as it is to hover over the world map, there are simpler ways to take this in.

Because in this case we know more than 95% of signatures originate from the UK, the heatmap above is far more useful, showing us the signature count for each constituency at a glance. The hotter the shading, the higher the count.


Scales Might Need Calibration
Bar Chart All The Way


Humans of all levels read a bar chart well, and it's perfect for what we need to know: how many signatures are coming from abroad altogether, and from which countries, in descending order.

With a margin so tiny, it's trickier to get a visual that makes sense. A pie chart, for example, would hardly display the smaller slice containing all of the non-UK signatures. Even with a bar chart we struggle to see anything outside of the UK on a linear scale; but it is perfect with a logarithmic scale, which is definitely a must in this scenario.

And voila! The logarithmic scale allows the remaining counts to appear alongside the UK, even though France, the country with the most signatures after the UK, has a count below 50k. This means we can keep an eye on the outliers in more detail quite effortlessly. Not much looks out of place right now, considering the number of expats Britain sends to the countries on the list. Now we know that, as long as none of the other countries turn red, we have nothing to worry about!


Innovate When Needed

The logarithmic scale in Tableau isn't as useful for these percentages, so hacking the visualised values in order to amplify the data sections of interest is a perfectly valid way of thinking outside the box. In this example, half the graph is dedicated to 90-100% and the other half to 0-90%. The blue chunk is the percentage of signatures coming from the UK, while every other country's colour chunk is still tiny. Since the totals from the other countries are about the same as each mainland constituency, it's more useful to see them as one chunk. Lastly, adding the heat colour coding keeps the visual integrity.

Interactivity

Now that we have the count, percentage and location breakdown in three simple graphs, we feel much wiser. So it's time to make them interact with each other.

The constituency heatmap doesn't need to interact with the bar charts. The correlation between the hottest bars and the heatmap is obvious from the get-go, but if we were to filter the bars using the map, the percentages would be so tiny you wouldn't see much on the percentage graph. The same occurs for the country bar chart, meaning that only the percentage chart can usefully be used as a filter. Selecting the yellow chunk will show the count of signatures for every country within it only.

Another way in which interactivity can be introduced is by adding further visualisations to the tooltip. The petition data contains the MP responsible for each constituency, so we can effectively put a count of signatures against each name. It's also nice to see what their parliamentary voting record has been throughout this Brexit deadlock; this was obtained publicly from the House of Commons portal https://commonsvotes.digiminster.com and blended in, and as more votes come in the list will automatically grow.

Keep It Simple

As you can see, 3 is the magic number here. The trio of visuals working together makes a dashing delivery of intel to the brain. With very little effort, we can see how many signatures come from the UK compared to the rest of the world, how many thousands are coming from each country, how many from each constituency, who the MP you should be writing to is, and how they voted in the indicative votes. Furthermore, this dashboard can keep track of all of that in real time, flagging any incoming surge of signatures from abroad, continuously counting the additional signatures until August 2019 and providing a transparent record of parliamentary votes in a format that is very easy to digest visually.
