Liberate your data

Intelligence is all about knowledge. This website is dedicated to sharing expertise on Oracle BI.

 

OAC v105.4: Understanding Map Data Quality


Last week Oracle Analytics Cloud v105.4 was announced. One of its features particularly interested me because it reminded me of the story of an Italian couple who wanted to spend their honeymoon in Sydney, Australia, and, because of a travel agency error, ended up in Sydney, Nova Scotia. For the funny people out there: don't worry, it wasn't me!

The feature is "Maps ambiguous location matches" and I wanted to write a bit about it.

By the way, OAC 105.4 includes a good set of new features, such as a unified Home page, the ability to customize any DV font, and more options for security and on-premises connections, among others. For a full list of new features, check out the related Oracle blog or videos.

Maps: a bit of History

Let's start with a bit of history. Maps have been around for a long time, first in OBIEE and later in OAC. In the early stages of my career I spent quite a lot of time writing HTML and JavaScript to include map visualizations within OBIEE 10g. The basic tool was called MapViewer, and the knowledge and time required to create a custom clickable or drillable map were... huge!

With the rise of OBIEE 11g and 12c, mapping became easier: a new "Map" visualization type was included in Answers, and all we had to do was match the geographical reference coming from one of our Subject Areas (e.g. Country Name) with the related column containing the shape information (e.g. the Country Shape).


After doing so, we were able to plot our geographical information properly: adding multiple layers, drilling capabilities and tooltips was just a matter of a few clicks.


The Secret Source: Good Maps and Data Quality

Perfect, you might think: we can easily use maps everywhere as soon as we have any type of geo-location data in our dataset! Well, the reality in the old days wasn't like that. Oracle at the time provided some sample maps with a certain level of granularity, covering only some countries in detail. What if we wanted to display all the suburbs of Verona? Sadly, that wasn't included, so we were forced either to find a free source online or to purchase one from a vendor.

The source of map shapes was only half of the problem: we always needed to create a join with a column coming from our Subject Area! Should we use the Zip Code? What about the Address? Is the City name enough? The deeper we went into mapping detail, the more problems arose.

A common problem (as we saw with Sydney) was using the City name. How many cities share the same name? How many regions? Is the street name spelled correctly? Data quality was, and still is, crucial to providing accurate information rather than just a nice but useless map view.

OAC and the Automatic Mapping Capability

Within OAC, DV offers the Automatic Mapping Capability: we only need to include in a Project a column containing a geographical reference (lat/long, country name, etc.), select "Map" as the visualization type, and the tool will choose the mapping granularity that best matches our dataset.


Great! This solves all our issues! Well... not all of them! The Automatic Mapping capability doesn't have all the possible maps in it, but we can always include new custom maps using the OAC Console if we need them.


So What's New in 105.4?

All of the above was available well before the latest OAC release. Version 105.4 adds the "Maps ambiguous location matches" feature: every time we create a Map view, OAC now provides a Location Matches option.


If we click this option, OAC shows a simple window where we can see:

  • How many locations matched
  • How many locations have issues
  • What type of issue each one has

The type of issue can be one of the following:

  • No Match: OAC doesn't find any comparable geographical value
  • Multiple Matches: there are multiple possible associations
  • Partial Matches: only part of the content matches
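To make the three issue types more concrete, here is a minimal Python sketch of how location values could be classified against a reference list of place names. The gazetteer, the matching rules and the `classify`/`summarize` names are hypothetical illustrations only; this is not how OAC performs its matching internally.

```python
from collections import Counter

# Hypothetical reference gazetteer of (place name, country) pairs.
# Real map layers hold far more detail; this is only for illustration.
GAZETTEER = [
    ("Sydney", "Australia"),
    ("Sydney", "Canada"),          # Sydney, Nova Scotia
    ("Verona", "Italy"),
    ("Verona", "United States"),
    ("Melbourne", "Australia"),
]


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so ' Sydney ' matches 'sydney'."""
    return " ".join(text.lower().split())


def classify(location: str, gazetteer=GAZETTEER) -> str:
    """Return 'Match', 'Multiple Matches', 'Partial Match' or 'No Match'."""
    key = normalize(location)
    names = [normalize(name) for name, _country in gazetteer]
    exact = names.count(key)
    if exact == 1:
        return "Match"
    if exact > 1:
        return "Multiple Matches"
    # Partial: a known (single-word) place name appears as a word in the value.
    words = set(key.replace(",", " ").split())
    if any(name in words for name in names):
        return "Partial Match"
    return "No Match"


def summarize(locations):
    """Count how many dataset values fall into each match category."""
    return Counter(classify(loc) for loc in locations)


print(summarize(["Melbourne", "Sydney", "Verona", "Melbourne CBD", "Atlantis"]))
```

A bare "Sydney" lands in Multiple Matches, which is exactly the honeymoon problem: the value alone cannot tell the two cities apart, so adding a country or region column is the usual data-cleaning fix.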

We can then take this useful information and start a process of data cleaning to raise the quality of our data visualization.

Conclusion

Maps were, and still are, a really important visualization in OAC. The Maps ambiguous location matches feature provides a way to understand whether our visualization is representative of our dataset. So, if you want to avoid spending your honeymoon in the wrong Sydney, or if you just want to provide accurate maps on top of your dataset, use this feature in OAC!

OOW19 Review: Oracle Analytics Deep Dive


In my previous blog post I outlined the general Oracle news, such as the Always Free Tier, the new datacenter plan and the set of new tools for Data Science. Today's post is dedicated to all the news announced regarding Oracle Analytics in all its versions: Cloud, Server and Applications.


Oracle Analytics Server

OAS is the long-awaited on-premises replacement for OBIEE 12c and promises functional parity with OAC. The current official ETA is Fiscal Year 2020, and it will be available to customers as a free upgrade. With OAS, all customers still on-premises will get the following benefits:

  • Almost 1:1 feature parity with OAC
  • Complete compatibility with OAC
  • Simplified cloud migration and better support for hybrid deployments

A related announcement for on-premises customers concerns licensing: OAS is purchased with a single license that includes all its features, so no separate option is needed for Mobile or Self-Service Data Visualization!

Oracle Analytics for Applications

This represents the new incarnation of BIApps, completely redesigned specifically for Fusion Apps. Like its predecessor, OAX (this is the acronym) is a packaged, ready-to-use solution with pre-built ETLs and Analytics content such as the RPD, dashboards, analyses and KPIs. Under the covers it uses Oracle Autonomous Data Warehouse and Oracle Data Integrator Cloud. OAX is also extensible, by bringing additional datasets into ADW and extending the semantic model and catalog.

Oracle Analytics Cloud

Several enhancements were announced, especially during Gabby Rubin's (VP of Oracle Analytics Product Management) Strategy & Roadmap Session. New features will be available in most areas of the tool, including the core of centralized reporting: the RPD.


Data Preparation

New options will be available in the Data Preparation/Enrichment phase such as:

  • Custom Enrichments based on a pre-existing set of values, e.g. enriching PRODUCT_ID with fields coming from a standard Product dimension. This is an interesting idea to enable standardization of dimensions across reporting without forcing people to write SQL or to know where the standard information comes from.
  • Forced Enrichments/Masking: as Administrators, we could enforce transformations such as credit card number obfuscation on fields that may contain sensitive data.
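Both ideas boil down to familiar operations: a lookup join against a reference dimension, and a masking rule applied to sensitive columns. The Python sketch below illustrates them generically; the `PRODUCT_DIM` contents, the column names and the "keep the last 4 digits" rule are assumptions for illustration, not how OAC implements these roadmap features.

```python
import re

# Hypothetical "standard" Product dimension keyed by PRODUCT_ID.
PRODUCT_DIM = {
    101: {"PRODUCT_NAME": "Laptop", "CATEGORY": "Hardware"},
    102: {"PRODUCT_NAME": "Support Plan", "CATEGORY": "Services"},
}
UNKNOWN_PRODUCT = {"PRODUCT_NAME": None, "CATEGORY": None}


def enrich(row: dict, dimension: dict = PRODUCT_DIM) -> dict:
    """Return a copy of `row` with the dimension attributes for its PRODUCT_ID."""
    return {**row, **dimension.get(row["PRODUCT_ID"], UNKNOWN_PRODUCT)}


def mask_card(value: str) -> str:
    """Keep only the last 4 digits of a value containing 13-19 digits;
    any other value is returned unchanged."""
    digits = re.sub(r"\D", "", value)
    if not 13 <= len(digits) <= 19:  # typical payment card number lengths
        return value
    return "*" * (len(digits) - 4) + digits[-4:]


row = {"PRODUCT_ID": 101, "CARD_NO": "4111 1111 1111 1111"}
clean = enrich(row)
clean["CARD_NO"] = mask_card(clean["CARD_NO"])
print(clean)
```

The appeal of the forced variant is that an administrator defines the rule once, so analysts never see the unmasked column in the first place.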

Natural Language Generation

The Natural Language view is already present in the current version of OAC; the plan is to enhance this visualization by adding more options in the settings panel for grouping and trending analysis.


Spatial Analytics in OAC

A few weeks ago I wrote about Oracle Spatial Studio, a tool designed to provide advanced visual Spatial Analytics. This tool will remain and evolve over time, since OAC will not cover all the specific use cases of Spatial Studio. However, OAC will enhance its spatial capabilities, for example:

  • Providing accurate information about row geo-location: e.g. how many rows were correctly located, how many errors occurred, and menus to fix the value-to-location association.
  • Providing spatial functions in the front end: an end user will easily be able to calculate the distance between points on a map by writing a simple Logical SQL statement. This option will probably appear in the RPD first (check the Twitter thread below).

As you can see, calculating the distance will be just a matter of having the correct dataset and writing a GeometryDistance function.
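The tweet itself isn't reproduced here, so the exact behaviour of GeometryDistance isn't shown. As a hedged illustration of the kind of calculation involved, the sketch below computes a great-circle distance with the textbook haversine formula on a spherical Earth. This is a swapped-in technique and an assumption: OAC's GeometryDistance may use a different model (for example an ellipsoid) and different units.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # min() guards against floating-point rounding pushing sqrt(a) just above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# Approximate city-centre coordinates of the two Sydneys from the honeymoon story.
sydney_au = (-33.87, 151.21)
sydney_ns = (46.14, -60.19)
print(f"{haversine_km(*sydney_au, *sydney_ns):,.0f} km")
```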

Connectivity and Security

One of OAC's missions is to become the Analytics Platform on top of any type of data source. The plan is to expand the list of connectors and security/configuration options such as SSL or Kerberos. There is also a roadmap item to extend the Data Gateway capabilities to query non-Oracle databases.

Modeling capabilities

In OAC we were used to either the classic RPD approach or self-service Data Sets. The future holds news for both approaches:

  • A new cloud, web-based Modeler aiming at functional parity with the Admintool, and therefore capable of handling more complex designs than the current lightweight data modeler. I believe this will also be an effort to bring the RPD development process up to current standards for concurrent development, versioning and storage format.
  • A new Self-Service Data Model solution to build light self-service models, allowing end users to evolve datasets into proper models that are shareable and optimized for reporting.

I like the idea of allowing both top-down (centralized) and bottom-up (self-service) approaches to data modeling. This gives clients flexibility in their analytical approach while still allowing centralized rules (e.g. a single source of truth) to be enforced when needed.

Unified User Experience and Layout Customizations

Until now, the old "Answers and Dashboards" and the new "Data Visualization Projects" have been almost completely separate products, each with its own home page and layout. In the next releases the two worlds will start to converge, with a single home page and a similar look and feel.

In other news, a feature highly requested by end users is coming: the ability to customize almost any layout option, from font type and size to the colors of any object visible in a project.

Machine Learning Integration

As discussed in the previous OOW review post, in the future OAC will be able to use models built in other tools, such as Oracle Machine Learning in the Autonomous Data Warehouse or Oracle Data Science. This provides an end-to-end Data Science story, from Data Analyst to Data Scientist, all within a simple, secure, highly configurable and performant toolset.


As you can see, there is a lot of news across various aspects of the tool: on-premises functional parity, a new packaged solution for Fusion Apps, and many features enhancing OAC functionality and customization options.

What do you think? Is this the right direction? Do you feel there is something missing?

OOW 2019 Review: Free Tier, New Datacenters and a New End-to-End Path for Analytics and Data Science


In the Oracle world, last week was "the week", with Oracle OpenWorld 2019 happening in San Francisco. It was a week full of exciting news, some of it even associated with words like "Free", never heard before in any Oracle-related topic. This blog post goes into detail on some of the news, with a special focus on Analytics and Data Science.


Oracle Cloud Free Tier

Let's start with the big news: Oracle Cloud Free Tier! A set of services that can ALWAYS be used for free, which includes Oracle's best offerings in the database space, such as ATP (Autonomous Transaction Processing) and ADW (Autonomous Data Warehouse), as well as Compute, Storage and additional services for networking, monitoring and notifications.


This is huge news in the Oracle ecosystem, since it enables everyone to start using the products without the need for a credit card! The Always Free tier can also be used in conjunction with the 30-day Free Trial (with $300 in associated credits) to experience the full set of Oracle products without spending a single cent.


An additional interesting point (compared to Oracle's previous developer licensing models) is that there is nothing in the licensing terms preventing customers from using the free tier in production! This means that, if the resources provided satisfy the business requirements, anyone could potentially run production applications directly on the free tier. And when an upscale is needed, Oracle provides a very easy option to switch from a free instance to a paid one.

However, as one would expect, the free tier has limitations: for example, each database is limited to 1 OCPU and 20 GB of storage. On top of the technical limitations, none of the products in the free tier come with support or SLAs. This means, for example, that in case of problems you won't be able to open a ticket with Oracle Support. Definitely something to ponder when implementing a production system.

OCI Gen2 and New Datacenters

During his keynote, Larry Ellison also announced the plan to launch 20 new OCI Gen2 datacenters over the next year or so: on average, a new datacenter every 23 days!

[Image taken from Oracle documentation]

This is very impressive and, as mentioned during the keynote, will mean Oracle overtakes Amazon in number of datacenters. A special mention also goes to OCI Gen2, the new version of Oracle Cloud Infrastructure. The mission of the first generation of OCI was pay-per-use: offering a set of services available on demand and billed by the hour. OCI Gen2 adds autonomous features on top of Gen1: services are now self-managed, self-patched, self-monitored and self-secured, with no downtime required. OCI Gen2 takes a huge set of activities out of administrators' hands, removing human error from the equation.

Analytics & Data Science

I gave a talk on how to Become a Data Scientist with Oracle Analytics Cloud. Analytics & Data Science was the topic that interested me most, and my expectation of exciting news was met.

A whole new set of products will be shipped soon, making the whole Data Science experience more continuous and pervasive across the Oracle Products. Let's have a look at the news, I'll try to add links to the relevant sessions I attended.

  • Oracle Machine Learning: ML will become more pervasive in the Oracle Database, and Oracle Machine Learning Notebooks will be able to handle R and Python in addition to SQL. All of this will be available on the Cloud Databases, including ADW. A new Spark option is also coming, enabling Machine Learning on the BDA.
  • Oracle Data Science: a brand new product for Data Science collaboration, where teams work together on projects, with sharing and versioning options available out of the box.
  • Oracle Data Catalog: again a new product, aimed at creating inventories of a company's data assets and making them searchable and easily usable by business users or data scientists.
  • Oracle Analytics Cloud: A lot of new announcements for this product which is mature and consolidated in the market, like Natural Language Generation or enhancements in the Data Enrichment, which I'll address in a separate blog post.

An interesting feature is AutoML, available in both Oracle Machine Learning and Oracle Data Science, which removes some barriers to Data Science by automating most of the steps in Machine Learning model creation, such as model and feature selection and hyperparameter tuning.

[Image taken from Oracle ML presentation]

You might notice several enhancements in different products. However, the key indicator of Oracle's Data Science maturity is the fact that all of the products above can be easily combined! Oracle Data Science will use Oracle Machine Learning optimizations when running on supported platforms, on top of datasets easily discoverable with Oracle Data Catalog. Machine Learning models developed in OML or ODS can then be exposed to and used by Oracle Analytics Cloud. This provides an end-to-end path from data scientist to data analyst and business user, all within the same toolset and on top of the best hardware (with support for GPUs coming)!


All in all, a great OOW full of exciting news: no more barriers to accessing Oracle Cloud with the free tier, 20 new datacenters coming in the next year, and a set of tools to perform Data Science, from the Analyst to the Data Scientist, in a collaborative and extremely performant way! If you want more news regarding Oracle Analytics Cloud, don't miss my next blog post!

HVR | Real-Time CDC ::Oracle Autonomous DW::

Introduction


High quality Business Intelligence is key in decision making for any successful organisation and for an increasing number of businesses this means being able to access real-time data.  At Rittman Mead we are seeing a big upturn in interest in technologies that will help our customers to integrate real-time data into their BI systems.  In a world where streaming is revolutionising the way we consume data, log-based Change Data Capture (CDC) comes into the picture as a way of providing access to real-time transactional data for reporting and analysis. Most databases support CDC nowadays; in Oracle for example the redo logs are a potential source of CDC data. Integration tools that support CDC are coming to the fore and one of these is HVR.

HVR is a real-time data integration solution that replicates data to and from many different data management technologies, both on-premise and cloud-based, including Oracle, SAP and Amazon. The HVR GUI makes it easy to connect together multiple databases, replicate entire schemas, initialise the source and target systems and then keep them in sync.

HVR have recently added Oracle’s Autonomous Data Warehouse (ADW) to their list of supported technologies and so in this blog we are stepping through the process of configuring an HVR Channel to replicate data from an Oracle database to an instance of Oracle ADW.


Setup

Before setting up replication you have to install HVR itself. This is simple enough: a fairly manual CLI job with a couple of files to create and save in the correct directories. Firewalls also need to allow all HVR connections. HVR needs a database schema in which to store the repository configuration data, so we created a schema in the source Oracle database. It also needs some additional access rights on the Oracle source database.


Process

1.

The first step is to register the newly created Hub in the HVR GUI. The GUI can be run on any machine that is able to connect to the server on which the HVR hub is installed. We tested two GUI instances, one running on a Windows machine and one on a Mac. Both were easy to install and configure.


The database connection details entered here are for the HVR hub database, where metadata about the hub configuration is stored.

2.

Next we need to define our source and target. In both cases the connection between HVR and the data uses standard Oracle database connectivity. The source connection is to a database on the same server as the HVR hub, and the target connection uses a TNS connection pointing at the remote ADW instance.

Defining the source database involves right clicking on Location Configuration and selecting New Location:


Configuring the target involves the same steps:


You can see from the screenshot that we are using one of the Oracle-supplied tnsnames entries to connect to ADW, and also that we are using a separate Oracle client install to connect to ADW. Some actions within HVR use the Oracle Call Interface and require a more recent version of the Oracle client than the one provided by our 12c database server install.

Next up is creating the “channel”. A channel in HVR groups together the source and target locations and allows the relationship between the two to be defined and maintained. Configuring a new channel involves naming it, defining source and target locations and then identifying the tables in the source that contain the data to be replicated.

3.

The channel name is defined by right clicking on Channel Definitions and selecting New Channel.


We then open the new channel and right click on Location Groups and select New Group to configure the group to contain source locations:


The source location is the location we defined in step 2 above. We then right click on the newly created group and select New Action > Capture to define the role of the group in the channel:


The Capture action defines that data will be read from the locations in this group.

A second Location Group is needed for the target. This time we defined the target group with the Integrate action, so that data will be written to the locations in this group.
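Conceptually, Capture reads changes from the source's redo logs and Integrate replays them, in commit order, on the target. The sketch below illustrates only the Integrate half under stated assumptions: the change records use a made-up (operation, key, value) format rather than HVR's internal representation, the `integrate` function is hypothetical, and an in-memory SQLite database from the Python standard library stands in for ADW.

```python
import sqlite3

# Hypothetical change records, as a log-based capture job might emit them:
# (operation, primary key, new value). Not HVR's actual format.
CHANGES = [
    ("insert", 1, "Alice"),
    ("insert", 2, "Bob"),
    ("update", 2, "Robert"),
    ("delete", 1, None),
]


def integrate(conn: sqlite3.Connection, changes) -> None:
    """Apply captured changes to the target table in the order they were captured."""
    with conn:  # one transaction: commit on success, roll back on error
        for op, key, name in changes:
            if op == "insert":
                conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (key, name))
            elif op == "update":
                conn.execute("UPDATE customers SET name = ? WHERE id = ?", (name, key))
            elif op == "delete":
                conn.execute("DELETE FROM customers WHERE id = ?", (key,))
            else:
                raise ValueError(f"unknown operation: {op}")


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
integrate(target, CHANGES)
print(target.execute("SELECT id, name FROM customers").fetchall())  # [(2, 'Robert')]
```

Applying the batch in a single transaction means the target never exposes a half-applied set of changes, which is the property you want from any real-time replication target.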

4.

The final step in defining the channel is to identify the tables we want to replicate. This can be done using the Table Explore menu option when you right-click on Tables:


5.

With the channel defined we can start synchronising the data between the two systems. We are starting with an empty database schema in our ADW target so we use the HVR Refresh action to first create the target tables in ADW and to populate them with the current contents of the source tables.  As the Refresh action proceeds we can monitor progress:


6.

Now, with the two systems in sync, we can start the process of real-time data integration using the HVR Initialise action. This creates two new jobs in the HVR Scheduler, which then need to be started:


One more thing to do, of course: test that the channel is working and replication is happening in real time. We applied a series of inserts, updates and deletes to the source system and monitored the log files of the two scheduled jobs to see the activity captured from the redo logs on the source:


and then applied as new transactions on the target:


The HVR Compare action allows us to confirm that the source and target are still in sync.
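Conceptually, Refresh is a bulk copy of the current source contents and Compare is a diff of the two row sets. The sketch below expresses both ideas against SQLite stand-ins; `refresh` and `compare` are hypothetical helpers, not HVR commands, and a real tool would likely compare large tables more cleverly (for example with checksums) instead of loading every row into memory.

```python
import sqlite3


def refresh(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> None:
    """Recreate `table` on the target and bulk-load the current source rows.
    The table name is interpolated into SQL, so it must be a trusted value."""
    ddl = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()[0]
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    with target:
        target.execute(f"DROP TABLE IF EXISTS {table}")
        target.execute(ddl)
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            target.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


def compare(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> dict:
    """Return the rows present on only one side (two empty sets mean in sync)."""
    src = set(source.execute(f"SELECT * FROM {table}").fetchall())
    tgt = set(target.execute(f"SELECT * FROM {table}").fetchall())
    return {"missing_in_target": src - tgt, "extra_in_target": tgt - src}


src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
src.commit()

refresh(src, tgt, "orders")
print(compare(src, tgt, "orders"))  # both sets empty: in sync

# A new source row that has not been integrated yet shows up as a difference.
src.execute("INSERT INTO orders VALUES (3, 7.0)")
src.commit()
print(compare(src, tgt, "orders")["missing_in_target"])  # {(3, 7.0)}
```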



Conclusion

Clearly the scenario we are testing here is a simple one. HVR can do much more, supporting one-to-many, many-to-many and also bi-directional replication configurations. Nonetheless we were impressed with how easy it was to install and configure HVR, and also with the simplicity of executing actions and monitoring the channel through the GUI. We dipped into the command line interface when executing some of the longer-running jobs, and this was straightforward too.

Tableau | Dashboard Design ::Revoke A50 Petition Data::


Dashboards are most powerful through visual simplicity. They’re designed to automatically keep track of a specific set of metrics and keep human beings updated. Visual overload is like a binary demon in analytics that many developers seem possessed by; but less is more.

For example, many qualified drivers know very little about their dashboard besides speed, revs, temperature and fuel gauge. When an additional dash warning light comes on, even if it is just the tyre pressure icon let alone engine diagnostics light, most people will just take their car to the garage. The most obvious metrics in a car are in regard to its operation; if you didn't know your speed while driving you'd feel pretty blind. The additional and not so obvious metrics (i.e. dash warning lights) are more likely to be picked up by the second type of person who will spend the most time with that car: its mechanic. It would be pointless to overload a regular driver with all the data the car can possibly output in one go; that would just intimidate them. That's not what you want a car to do to the driver and that's certainly not what any organisation would want their operatives to feel like while their “car” is moving.

In light of recent political events, the same metaphor can be applied to the big red Brexit bus. Making sense of it all might be a stretch too far for this article. Still, with appropriate use of Tableau dashboard design, it is possible to answer seemingly critical questions on the topic with publicly available data.



There's An Ongoing Question That Needs Answering?
Where did 6 million+ signatures really come from?


Back in the UK, the Brexit fiasco is definitely still ongoing. Just before the recent A50 extensions took place, a petition to revoke Article 50 and remain in the EU attracted more than 6 million signatures, becoming the biggest and fastest-growing petition ever and sparking right-wing criticism over the origin of thousands of signatures, with critics claiming that most came from overseas in order to discredit its legitimacy. The Government responded by rejecting the petition.

Thankfully the data is publicly available (https://petition.parliament.uk/petitions/241584.json) for us to use as an example of how a dashboard can be designed to settle such a question (potentially in real time too as more signatures come in).


Tableau can handle JSON data quite well and, to nobody’s surprise, we quickly discover that over 95% of signatures are coming from the UK.
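For anyone who wants to reproduce the UK share outside Tableau, here is a small Python sketch. It assumes the petition JSON nests per-country counts under `data.attributes.signatures_by_country`, with each entry carrying `name`, `code` and `signature_count`; check that layout against the live file before relying on it. The `SAMPLE` numbers are made up purely to exercise the functions and are not the real petition figures.

```python
import json
from urllib.request import urlopen

PETITION_URL = "https://petition.parliament.uk/petitions/241584.json"


def load_petition(url: str = PETITION_URL) -> dict:
    """Download and parse the petition JSON (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def countries(petition: dict) -> list:
    # Assumed layout: data -> attributes -> signatures_by_country.
    return petition["data"]["attributes"]["signatures_by_country"]


def uk_share(petition: dict) -> float:
    """Fraction of all country-attributed signatures coming from the UK (code GB)."""
    rows = countries(petition)
    total = sum(c["signature_count"] for c in rows)
    uk = sum(c["signature_count"] for c in rows if c["code"] == "GB")
    return uk / total


def top_abroad(petition: dict, n: int = 5) -> list:
    """Non-UK countries sorted by signature count, largest first."""
    abroad = [c for c in countries(petition) if c["code"] != "GB"]
    abroad.sort(key=lambda c: c["signature_count"], reverse=True)
    return [(c["name"], c["signature_count"]) for c in abroad[:n]]


# Made-up numbers, only to exercise the functions offline.
SAMPLE = {"data": {"attributes": {"signatures_by_country": [
    {"name": "United Kingdom", "code": "GB", "signature_count": 960},
    {"name": "Spain", "code": "ES", "signature_count": 15},
    {"name": "France", "code": "FR", "signature_count": 25},
]}}}

print(f"UK share: {uk_share(SAMPLE):.1%}")
print(top_abroad(SAMPLE))
```

Swapping `SAMPLE` for `load_petition()` would run the same calculation on the live data, and re-running it periodically is the simplest form of the "real time" tracking mentioned above.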

Now that we know what we're dealing with, let's focus the map on Britain and provide the other countries' data in a format that is easier to digest visually. As cool as it is to hover over the world map, there are simpler ways to take this in.


Because in this case we know more than 95% of signatures originate from the UK, the heatmap above is far more useful, showing us the signature count for each constituency at a glance. The hotter the shading, the higher the count.


Scales Might Need Calibration
Bar Chart All The Way


People of all levels read a bar chart well, and it's perfect for what we need to know: how many signatures are coming from abroad altogether, and from which countries, in descending order.


With a margin so tiny, it's trickier to get a visual that makes sense. A pie chart, for example, would hardly display the small slice containing all of the non-UK signatures. Even with a bar chart we struggle to see anything outside the UK on a linear scale; a logarithmic scale, however, works perfectly and is definitely a must in this scenario.
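Some rough arithmetic shows why the linear scale fails. Using the round figures from this post as approximations (the largest bar at about 6 million, France just under 50k), the sketch below computes the fraction of the axis each bar would fill, assuming a linear axis that starts at 0 and a log axis that starts at 1. The `bar_fraction` helper is purely illustrative.

```python
from math import log10


def bar_fraction(value: float, max_value: float, log: bool = False) -> float:
    """Fraction of the axis a bar fills: linear axis from 0, log axis from 1."""
    if log:
        return log10(value) / log10(max_value)
    return value / max_value


france, largest = 50_000, 6_000_000  # approximate figures from the post
print(f"linear: {bar_fraction(france, largest):.1%}")
print(f"log:    {bar_fraction(france, largest, log=True):.1%}")
```

This prints 0.8% for the linear axis and 69.3% for the log axis: a bar that is barely a sliver on a linear scale fills more than two thirds of a logarithmic one.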


And voilà! The logarithmic scale allows the remaining counts to appear alongside the UK, even though France, the country with the most signatures after the UK, has a count below 50k. This means we can keep an eye on the outliers in more detail quite effortlessly. Not much looks out of place right now, considering the number of British expats living in the countries on the list. Now we know that, as long as none of the other countries turn red, we have nothing to worry about!


Innovate When Needed

The logarithmic scale in Tableau isn't as useful for these percentages, so hacking the visualised values to amplify the sections of interest is a perfectly valid way of thinking outside the box. In this example, half the graph is dedicated to 90-100% and the other half to 0-90%. The blue chunk is the percentage of signatures coming from the UK, while every other country's chunk is still tiny. Since each of the other countries' totals is about the same size as a single mainland constituency, it's more useful to see them as one chunk. Lastly, adding the heat colour coding keeps the visual integrity.
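The post doesn't show how the split axis was built. One way to express the idea is a piecewise-linear rescaling, sketched below in Python; in Tableau the same logic would live in a calculated field, which isn't shown here. The function name and the default `breakpoint`/`split` values are assumptions chosen to mirror the 90% and half-and-half layout described above.

```python
def split_axis_position(pct: float, breakpoint: float = 90.0, split: float = 0.5) -> float:
    """Map a percentage (0-100) to an axis position (0-1).

    Values from 0 to `breakpoint` fill the first `split` of the axis;
    values from `breakpoint` to 100 are stretched over the remainder.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if pct <= breakpoint:
        return split * pct / breakpoint
    return split + (1 - split) * (pct - breakpoint) / (100.0 - breakpoint)


for pct in (45, 90, 95, 100):
    print(pct, split_axis_position(pct))
```

The trade-off is that the axis is no longer uniform, so its labels have to show the original percentages rather than the rescaled positions, otherwise the chart would mislead.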


Interactivity

Now that we have the count, percentage and location broken down into 3 simple graphs, we feel much wiser. So it's time to make them interact with each other.

The constituency heatmap doesn't need to interact with the bar charts. The correlation between the hottest bars and the heatmap is obvious from the get-go, but if we were to filter the bars using the map, the percentages would be so tiny you wouldn't see much on the % graph. The same occurs for the Country bar chart, meaning that only the percentage chart can usefully act as a filter. Selecting the yellow chunk will show the signature count for every country within it only.


Another way to introduce interactivity is to add further visualisations to the tooltip. The petition data contains the MP responsible for each constituency, so we can effectively put a count of signatures to each name. It's also nice to be able to see each MP's parliamentary voting record throughout this Brexit deadlock, obtained from the public House of Commons portal https://commonsvotes.digiminster.com and blended in; as more votes come in, the list will automatically grow.


Keep It Simple

As you can see, 3 is a magic number here. The trio of visuals working together makes a dashing delivery of intel to the brain. With very little effort, we can see how many signatures come from the UK compared to the rest of the world, how many thousands are coming from each country, how many from each constituency, which MP you should be writing to and how they voted in the indicative votes. Furthermore, this dashboard can keep track of all of that in real time, flagging any incoming surge of signatures from abroad, continuously counting the additional signatures until August 2019 and providing a transparent record of parliamentary votes in a format that is very easy to digest visually.
