Liberate your data

Intelligence is all about knowledge. This website is dedicated to sharing expertise on Oracle BI.

 

Oracle Analytics: Everything you always wanted to know (But were afraid to ask)

The release of Oracle Analytics Server (OAS) on 31st January 2020 has left many OBIEE users wondering what it means for them. Common questions we were asked at Oracle OpenWorld Europe last week included:

  • what’s the difference between OAS, OAC and OBIEE?
  • where does DV fit into this?
  • should I be migrating and if so, when; what are the benefits?

This blog post aims to answer all these questions!
First of all, let’s define each product in order to compare them.

Oracle Analytics Cloud (OAC)

Oracle’s strategic analytics platform, hosted in the Oracle Cloud.

Oracle Analytics Server (OAS)

OAS is the new on-prem version of OAC, and is intended as a bridge between legacy OBIEE platforms and Cloud. It has almost complete feature parity with OAC, including AI-powered, modern and self-service analytics capabilities for data preparation, visualisation, enterprise reporting, augmented analysis and natural language processing and search. If you are an OBIEE customer, you are automatically licensed for OAS, and Oracle Data Visualisation (DV) is included at no extra cost.

OAS vs OAC

The main difference between OAS and OAC is hosting and administration. OAC is hosted and managed by Oracle, while OAS needs to be installed, configured and patched by you in your datacenter. This also defines the level of control and customisation: with OAS you have full control over config files, styles, custom functions etc., while in OAC you’ll be able to change only what Oracle exposes in the cloud console.
OAC will receive more frequent updates and new features, with OAS scheduled to have an annual release bringing the cloud features to on-premises customers.
So the choice between the two depends on the amount of customisation needed vs the time spent on supporting the platform.

OBIEE vs OAS

OAS was developed to replace OBIEE; however, the two products are not exactly the same. A few OBIEE features are deprecated in OAS, such as BISQLProvider or Act As, but they are still present in the tool and won’t go away until a proper replacement is in place. On the other hand, if you were using Scorecards, be aware that this tool is no longer shipped with OAS.
OAS also brings almost complete functional parity with OAC, providing a huge number of new features, especially in the self-service area. More info is available in the dedicated post.

Can I connect to a Database in my Datacenter with OAC?

Yes, you can. Data Visualization can connect to any data source that is reachable from the Cloud. If you don’t want to expose your database directly, Oracle Data Gateway enables connections from OAC (including RPD-based connections) to on-prem data sources without the need to open any firewall port.

Where does DV come into this?

Data Visualization (formerly known as Visual Analyzer) is Oracle’s self-service tool. If you’re an OBIEE 12c user, you may be paying extra license fees to use DV. If you’re already using OAC, you may have noticed DV is included with your license, and this will also be the case for OAS.  

Ultimately, Oracle Analytics’ aim is to provide a mix and match offering, where you can choose which components are Cloud and on-prem. For example, you can upgrade to OAC and point it to your on-prem database. Or if you’re Cloud averse for whatever reason, you can migrate to OAS and utilise many of OAC’s features.

What does Rittman Mead recommend you do next?

There is probably a different answer for everyone, depending on where you are in your Oracle Analytics journey, so we’d recommend contacting us for an initial chat to go through your options. Broadly speaking, if you’re using OBIEE 11.1.1.7 onwards and you’re considering upgrading to 12c, you should factor OAS or OAC into your decision making process.

To help you decide which product is best for you, we are offering a FREE two-day Oracle Analytics assessment which aims to help you create a business case for migrating to OAC or OAS based on your pain points and anticipated usage. Contact us for more information.

Rittman Mead also provides OAC, OAS and OBIEE training. Our next OAC bootcamp is taking place in London, March 23rd - 26th 2020. For more information go to our training page or contact Dan Garrod.

Oracle Data Science – Accelerated Data Science SDK Configuration

In my last post, I introduced Oracle Data Science, the new tool from Oracle aimed at Data Science collaboration, which includes an Auto-ML Python SDK named Accelerated Data Science. The SDK speeds up and automates various tasks in the ML pipeline, from feature engineering through model and feature selection to model explainability. It's a very handy tool for both newbies and experienced people facing data science problems.

First version, First Hurdle

In the first version of Oracle Data Science, there is a preliminary step to follow before using the Accelerated Data Science SDK. This step is needed to connect to Oracle Cloud Infrastructure Object Storage and save our models in the model catalog.

Please note that all the other steps within the SDK will still be available even without this setting. You will still be able to execute the calls to feature engineering,  model and feature selection, model explainability functions but you'll not be able to save the model in the catalog.

If we want to accomplish this last step, we need to create a private/public key pair and set up a configuration file. Luckily, we can run the whole process within the Oracle Data Science Notebook! Let's see all the steps.

ADS SDK Configuration

First of all, let's log in to the notebook and open a terminal session.

We can then create a folder named .oci under /home/datascience

mkdir ~/.oci

In the next step, we need to generate an API signing Key

openssl genrsa -out ~/.oci/oci_api_key.pem -aes128 2048 

the command will ask for a password which will secure the key

now it's time to generate the public key with

openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem

again the command will ask for a password and then generate the oci_api_key_public.pem file

Another item we need to generate is the key's fingerprint, which can be done with

openssl rsa -pubout -outform DER -in ~/.oci/oci_api_key.pem | openssl md5 -c

the command will ask for the key's password and then output the fingerprint
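
For anyone scripting this step, here is a minimal Python sketch of what the second half of that pipeline does: it takes the DER-encoded public key bytes, hashes them with MD5 and prints the digest as colon-separated hex pairs. The function name is hypothetical, and it assumes you have already exported the public key in DER format.

```python
import hashlib


def fingerprint_from_der(der_bytes: bytes) -> str:
    """Return the MD5 digest of DER-encoded key bytes as colon-separated hex pairs."""
    digest = hashlib.md5(der_bytes).hexdigest()
    # Split the 32 hex characters into 16 two-character groups, e.g. "c8:24:75:..."
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Calling fingerprint_from_der() on the bytes of a DER export of the public key should give the same colon-separated value that openssl md5 -c prints.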

The next pieces of information needed are the Tenancy OCID and the User OCID. The first one can be obtained by navigating to the Governance and Administration section and then selecting Administration and Tenancy Details.

The OCID is shown in the main Tenancy Information section

The User OCID can be found by selecting Identity -> Users

After selecting the User we want to connect with, the OCID is visible

The next step is to upload the public key generated a few steps before. Navigate again to the Identity -> Users -> Username screen in the console. Under the global User info, there is an API Keys section

we can click on Add Public Key and paste the content of the oci_api_key_public.pem file generated before

Now it's time to use all the information collected so far to create a config file. The file needs to reside under the ~/.oci/ folder and must be named config, with the following entries

[DEFAULT]
user=<OCID of the user>
fingerprint=<Fingerprint of the Key>
key_file=<Path to the private Key>
pass_phrase=<Passphrase used to Encrypt>
tenancy=<OCID of the Tenancy>
region=<Region where the Tenancy is hosted>

an example is

[DEFAULT]
user=ocid1.user.oc1........
fingerprint=c8:24:75:00:00....
key_file=~/.oci/oci_api_key.pem
pass_phrase=oracle123
tenancy=ocid1.tenancy.oc1.......
region=eu-frankfurt-1
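
Since a wrong or incomplete config only shows up later when saving the model, a quick sanity check can help. The following is a minimal sketch using only Python's standard configparser. The helper name is hypothetical, and treating user, fingerprint, key_file, tenancy and region as the required entries is my assumption based on the template above (pass_phrase is left out since it only matters for an encrypted key).

```python
import configparser

# Assumption: the entries from the template above that must always be present.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_config_keys(config_text: str, profile: str = "DEFAULT") -> list:
    """Return the required keys that are absent or empty in the given profile."""
    # interpolation=None so that a '%' in a pass phrase is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]
```

Passing the content of ~/.oci/config to missing_config_keys() returns an empty list when all five entries are filled in. It only checks that the entries exist, not that the OCIDs or the key are valid.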

If the settings are not correct, you'll get an error when trying to save the model.

If the settings are correct, the save model step (defined in my previous blog post) will execute correctly. Enjoy your trials with Oracle Data Science!

Machine Learning Collaboration with Oracle Data Science

This week I was sadly forced to skip my planned trip to OOW London in which Rittman Mead had a fantastic booth and Jon Mead presented the "How to Become a Data Scientist" session with Oracle Analytics Cloud and Oracle Machine Learning.

The last-minute change of plan, however, gave me time to test a new product that has just become available in the Oracle Cloud: Oracle Data Science! I talked about the tool when it was announced, in my OOW19 Review post, describing it as a Data Science collaboration tool coming from the acquisition of DataScience.com. Let's have a detailed look at the first release.

Instance Creation

Oracle Data Science can be found in the OCI console, under the Data and AI section with other products including the newly released Data Catalog and Data Flow

Before creating a Data Science Project, you'll need to review some security and policy settings, which are well described in this blog post. Those settings are not straightforward, but once they are in place we can create a Project, where a team can collaborate. All we need to define is the compartment name, the project name and the description.

After creating the project, it's time to create one (or more) Notebook Sessions within it. Here we need to specify the compartment (the same as the project), the name of the notebook session, the shape (chosen from a pre-defined list), the size of the block storage and the networking details.

If all the security and policy settings are working we'll have the notebook instance created in minutes

We can then click on Open to enter the notebook itself.

What's in the Notebook

What we see immediately is that Data Science is based on instantiated images of Jupyter notebooks including Python 3. Data Science Notebooks can also be version-controlled with a pre-built git integration and contain popular open source Python ML libraries like TensorFlow, Keras, scikit-learn and XGBoost, as well as common visualization ones like Plotly and Matplotlib. The terminal access also means that we have the freedom to install any other custom library we might be interested in.

We can now create a Python notebook and start solving our ML problems.

Pro-Tip: move all notebooks and data under the folder /block_storage  otherwise you'll lose the content when stopping and starting the notebook.

Oracle Accelerated Data Science (ADS) SDK

A new feature coming with Oracle Data Science is also the Accelerated Data Science (ADS) SDK: a python library simplifying and accelerating the data science process by offering a set of methods that covers all the phases of the process, from data acquisition to model creation, evaluation and interpretation.

The first step to use the SDK is to import the related libraries

from ads.dataset.factory import DatasetFactory
from ads.dataset.dataset_browser import DatasetBrowser

Then we can import the data and define pointsCat as our target column

df=DatasetFactory.open("Data/train_winedata.csv", target="pointsCat")

Immediately after importing, the library will suggest using two methods: show_in_notebook() and get_recommendations(). Let's execute them!

df.show_in_notebook()

This function creates a series of sections enabling us to understand better the dataset. The sections include a Summary expressing the overall features of the dataset, then for each Feature, a dedicated chart will represent the distribution. The Correlation tab shows the similarity between features and finally, the Data tab shows examples of the dataset.

The next step to try is get_recommendations(), which generates a few interesting suggestions to transform our dataset. The function shows the primary columns that should be deleted, suggests how to handle missing values and strongly correlated features, and lastly allows the identification of the positive label for the target.

The end-user can keep or change any of the values proposed and apply the transformations. Please note that the original object is not modified, to get the transformed object we need to call

wine_transformed=df.get_transformed_dataset()

If we don't want to go through all the transformations, we can rely on ADS to choose the best for us by calling the auto_transform() function, which will implement all the transformations suggested before with the default parameters. If we want to have a look at which transformations have been applied by auto_transform(), we can call the visualize_transforms() function, which will show the pipeline as an image.

The next step is to create a predictive model. Again, with the SDK it is just a few lines of code away

import logging
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
# logging is imported above because loglevel uses logging.ERROR
ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
oracle_automl = AutoML(wine_transformed, provider=ml_engine)
automl_model1, baseline = oracle_automl.train()

The SDK will now start testing a few ML algorithms to find the one providing the best results. At the end of the execution, it'll show a summary output which also includes the Selected Algorithm, the ML model with the best score.

For each of the ML algorithms tried, the SDK will also show some summary stats that can be used to evaluate all the models.

We can reprint the models anytime passing the number of rows to visualize and the sort order as parameters with the following call

oracle_automl.print_trials(max_rows=20, sort_column='Mean Validation Score')

And have a graphical representation of why a certain algorithm was selected by calling the visualize_algorithm_selection_trials() function.

ADS has similar calls to check the sampling size (visualize_adaptive_sampling_trials()) and the features selected (visualize_feature_selection_trials()). These functions can take a while to execute, probably due to the first version of the SDK.

The model train function accepts optional parameters such as:

  • if we are interested in just trying out a specific model (or models) we can pass it via model_list=['LogisticRegression']
  • to change the scoring function we use score_metric='f1_macro'
  • to give a hint about the amount of time to spend on training we pass time_budget=10
  • we can specify which features we always want to be included with min_features=['price', 'country']
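
As a rough illustration, the optional parameters listed above can be collected in a dictionary and passed to the train call. This is only a sketch: the values are the ones quoted in the list, the variable names automl_model2 and baseline2 are hypothetical, and whether all four options can be combined in a single call is an assumption I have not verified.

```python
# Optional arguments described in the list above, gathered in one place.
train_options = {
    "model_list": ["LogisticRegression"],   # only try this algorithm
    "score_metric": "f1_macro",             # scoring function
    "time_budget": 10,                      # hint on the time to spend on training
    "min_features": ["price", "country"],   # features that must always be included
}

# With the AutoML driver created earlier, the call would then look like:
# automl_model2, baseline2 = oracle_automl.train(**train_options)
```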

Once the models are created, we can evaluate them with

evaluator = ADSEvaluator(test_wine, models=[automl_model1, baseline],
                         training_data=wine_transformed, positive_class='Good')
evaluator.show_in_notebook(plots=['normalized_confusion_matrix'])
evaluator.metrics

which displays the normalized confusion matrix and the evaluation metrics.

ADS also contains functions for model explainability, but the ones available in v1 don't seem to work well: all the trials done so far have stopped on errors in the code.

Once we have identified the model, it's time to save it. We can do it via

from ads.catalog.model import ModelSummaryList, ModelCatalog
from ads.catalog.project import ProjectSummaryList, ProjectCatalog
from ads.catalog.summary import SummaryList
from ads.common.model_artifact import ModelArtifact
path_to_model_artifact = "/home/datascience/block_storage/my_model"
model_artifact = automl_model1.prepare(path_to_model_artifact, force_overwrite=True)

The prepare function will create several files describing the model within the chosen directory. Once done, we can store the model in the catalog with the following

import os
compartment_id = os.environ['NB_SESSION_COMPARTMENT_OCID']
project_id = os.environ["PROJECT_OCID"]

# Saving the model artifact to the model catalog:
mc_model = model_artifact.save(
    project_id=project_id,
    compartment_id=compartment_id,
    display_name="Wine LGBMClassifier",
    description="Wine LGBMClassifier predictor",
    training_script_path="Training.ipynb",
    ignore_pending_changes=True,
)

Please note that this step will fail if you didn't configure the ADS SDK to access the OCI APIs; a separate post covering this step is coming. If, on the other hand, all the setup is done correctly, the model is then visible within the Data Science Project page.

The model can now be downloaded and used by others. We can also think about exposing it as a function and calling it via REST APIs, all possible with the help of the ADS SDK as mentioned in the documentation.

A New Ecosystem for Data Science Collaboration

Oracle Data Science is an interesting product and covers a missing piece in Oracle's AI strategy. As of now the user experience is a bit rough around the edges, both during provisioning, with the policy configuration as a required pre-step, and during use, with some of the steps not working 100% of the time. This, however, is just the first release, and we hope new versions will follow quickly, as is already happening for all the other products in the Oracle cloud.

Oracle Analytics Server Step by Step Installation

Oracle Analytics Server is available! There is already a blog post explaining how to have a Docker version up and running in minutes. This post explains the GUI process, providing hints on how to automate the whole installation using response files. The steps are very similar to an OBIEE 12c installation; let's check the details.

Supported OS and Database

Before installing, it is always good practice to check which OS and Databases (for the RCU schemas) are supported. What's new for Oracle Analytics Server is that the certification matrix has migrated from an Excel sheet to a website, making it easier to find information.

For this initial OAS release, the supported OS is Linux, with Oracle Linux 6 being the minimum release. Windows support will come later in the year.

It's also worth mentioning that, as per the documentation, the OAS RCU is supported only on Oracle Databases, starting from 11.2; support for other databases will come in the future.

Once the pre-reqs are cleared, let's see how we can install OAS!

Files Download

The first step is to download files from e-delivery

You can skip the Oracle VM Virtual Appliance: it's not needed, and skipping it saves you from downloading an extra 2.8GB.

One of the prerequisites is the Java JDK 1.8 (8u211 or newer), which can be downloaded from the Oracle website.

Before starting the installation, let's have a look at the recommended directory structure: as for OBIEE 12c, the Oracle documentation suggests keeping the ORACLE_HOME and DOMAIN_HOME separate. This is not just good practice: by keeping binaries and configurations separate we avoid problems when new OAS versions are released, since they'll be installed in new ORACLE_HOMEs while the configurations and application data will always reside in the DOMAIN_HOME.

Be aware that the OAS default configuration step will suggest creating the DOMAIN_HOME under the ORACLE_HOME. You'll need to set the DOMAIN_HOME outside it to comply with what the documentation suggests!
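
If you script your installations, a small guard can catch this before the configuration starts. Below is a minimal Python sketch with a hypothetical function name. It compares the two paths purely as strings (it does not resolve symlinks), and the DOMAIN_HOME locations in the usage note are illustrative only.

```python
from pathlib import PurePosixPath


def domain_home_is_separate(oracle_home: str, domain_home: str) -> bool:
    """Return True when domain_home is neither oracle_home itself nor below it."""
    oracle = PurePosixPath(oracle_home)
    domain = PurePosixPath(domain_home)
    return domain != oracle and oracle not in domain.parents
```

For example, with ORACLE_HOME set to /opt/oracle/product/oas55, a DOMAIN_HOME such as /opt/oracle/config/domains/bi passes the check, while /opt/oracle/product/oas55/user_projects/domains/bi does not.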

Installation

The first step is to unzip the files V988574-01.zip and V983368-01.zip. Then we can start the Fusion Middleware installation with

java -jar fmw_12.2.1.4.0_infrastructure.jar

This will open up the installation GUI asking to setup the Oracle Inventory Directory and the OS Group, if not already configured in the same server

The next step is to choose the ORACLE_HOME. Following the directory structure described before, we selected /opt/oracle/product/oas55, which also gives a hint about the OAS version we're installing

The rest of the FMW installation consists of selecting the installation type, verifying the prerequisites and checking the installation summary. An important step, if you want to reproduce the same installation via script in other environments, is to save the response file. The file (named fmw.rsp in the following command) contains all the details of the setup and, by changing some parameters, can be reused for other installations.

java -jar fmw_12.2.1.4.0_infrastructure.jar -silent -responseFile fmw.rsp -invPtrLoc <INVENTORY_FILE_LOCATION>
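
When the same installation has to be repeated across environments, the silent command can be assembled by a script. The sketch below is one way to do it in Python: the function name and the /tmp paths are hypothetical, and only the flags shown in the command above are used. shlex.join requires Python 3.8 or newer.

```python
import shlex


def silent_install_command(installer_jar: str, response_file: str, inventory_file: str) -> list:
    """Build the argument list for a silent run of an installer jar."""
    return [
        "java", "-jar", installer_jar,
        "-silent",
        "-responseFile", response_file,
        "-invPtrLoc", inventory_file,
    ]


# Hypothetical file locations, for illustration only.
cmd = silent_install_command(
    "fmw_12.2.1.4.0_infrastructure.jar", "/tmp/fmw.rsp", "/tmp/oraInst.loc"
)
print(shlex.join(cmd))
# To actually launch the installer: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems if a path contains spaces.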

After installing, we need to apply the 30657796 patch which was downloaded from e-delivery. To do that, we simply need to unzip the V988922-01.zip file which will create the 30657796 folder and then

cd 30657796
export PATH=$ORACLE_HOME/OPatch:$PATH
opatch apply

If the patching is successful we'll get a message like

Patching component oracle.fmwconfig.common.wls.shared.internal, 12.2.1.4.0...
Patch 30657796 successfully applied.
Log file location: /....log

OPatch succeeded.
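
In an automated installation you may want to check this output rather than read it. Here is a minimal Python sketch that looks for the two success lines shown above in the captured text. The function name is hypothetical, and it assumes the messages appear exactly as in this example.

```python
def patch_applied(opatch_output: str, patch_id: str) -> bool:
    """Return True when the output reports both the patch and OPatch itself as successful."""
    lines = [line.strip() for line in opatch_output.splitlines()]
    return (
        f"Patch {patch_id} successfully applied." in lines
        and "OPatch succeeded." in lines
    )
```

The text to check could come from the captured standard output of opatch apply when it is launched from a script.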

The next step is the OAS installation, which we can start by running

java -jar Oracle_Analytics_Server_5.5.0.jar

After skipping the auto-upgrades we just need to select the ORACLE_HOME (same one defined on top) and verify that all the prerequisites are met.

Again, you may want to save the response file in case you later want to reproduce the installation silently on another server.

Repository

Next step is to create the database schemas required by OAS, by executing the rcu command.

cd $ORACLE_HOME/oracle_common/bin
./rcu

The GUI will ask for Database location and credentials used to create the schemas

Please note that you need to select the Oracle Business Intelligence checkbox in the Select Components tab. You can also specify the tablespaces and password for each of the schemas created. In this case too, an option is available to save the response file in order to do a silent installation.

The RCU schemas can also be created as part of the configuration process. I covered it as a separate topic since the standalone rcu is more reliable and provides more options for tablespace and schema management.

Configuration

The next step is to configure OAS. For this we need to run

cd $ORACLE_HOME/bi/bin
./config.sh

The set of screens will ask for the DOMAIN_HOME mentioned above; the GUI will suggest a DOMAIN_HOME located under the ORACLE_HOME. If you want to comply with the Oracle documentation and avoid problems in case of future in-place upgrades, place the DOMAIN_HOME in a separate folder!

Other details required are: the Admin username and password, the products to include (with BI Publisher and Oracle Analytics Enterprise Edition being the two options available). Then we'll have to point to the RCU schema created before and decide the port range.

If everything is set correctly, you should get a successful configuration with all the services starting properly.

And OAS should be available (by default at http://<SERVER_NAME>:9502/analytics)!

The whole example above was performed on an OCI compute instance, installing the schemas on a DBaaS. This could be a solution for customers willing to use the Oracle cloud but still needing a higher level of customization and control than what's achievable in OAC.

For more info, don't hesitate to contact us!

Come and see us at Oracle OpenWorld 😎 One week to go!

If you haven’t registered for Oracle OpenWorld Europe yet, why not? It’s free to attend, and is jam packed with business and technical sessions delivered by a mixture of Oracle’s product team and end users giving real world case studies of using Oracle’s Cloud offerings. You can register here.

Rittman Mead will be at Stand 40 on both the 12th and 13th, where you can come and talk to us about OAC, OAS, AI, ML and everything else analytics. You can still book a slot in our OOW diary, or drop by whenever it suits you!

With Oracle Analytics Server (OAS) having been released last Friday, you’ve probably got some questions. We’ve done some work with it internally, so come and speak to us if you want to see OAS working, talk about the upgrade/install process or quiz us about licensing etc.
Not to mention, our very own Oracle ACE Director, Francesco Tisiot👇🏼 will be teaching you How to become a Data Scientist with OAC at 8.20am on Wednesday 12th, Arena H - Zone 4. Don’t miss it!

The team looks forward to seeing you there!