Liberate your data

Intelligence is all about knowledge. This website is dedicated to sharing expertise on Oracle BI.

 

Machine Learning Collaboration with Oracle Data Science

This week I was sadly forced to skip my planned trip to OOW London, where Rittman Mead had a fantastic booth and Jon Mead presented the "How to Become a Data Scientist" session with Oracle Analytics Cloud and Oracle Machine Learning.

The last-minute change of plans, however, gave me time to test a new product that has just become available in the Oracle Cloud: Oracle Data Science! I mentioned the tool when it was announced, in my OOW19 Review post, describing it as a Data Science collaboration tool coming from the acquisition of DataScience.com. Let's have a detailed look at the first release.

Instance Creation

Oracle Data Science can be found in the OCI console, under the Data and AI section, together with other products including the newly released Data Catalog and Data Flow.

Before creating a Data Science Project, you'll need to review some security and policy settings, which are well described in this blog post. Those settings are not straightforward, but once they are in place we can create a Project where a team can collaborate. All we need to define is the compartment name, the project name and the description.

After creating the project, it's time to create one (or more) Notebook Sessions within the project. Here we need to specify the compartment (the same as the project), the name of the notebook session, the shape (chosen from a pre-defined list), the size of the block storage and the networking details.

If all the security and policy settings are working, we'll have the notebook instance created in minutes.

We can then click on Open to enter the notebook itself.

What's in the Notebook

What we see immediately is that Data Science is based on instantiated images of Jupyter notebooks including Python 3. Data Science Notebooks can also be version-controlled with a pre-built git integration and contain all the best Python open source ML libraries like TensorFlow, Keras, scikit-learn and XGBoost, as well as the common visualization ones like Plotly and Matplotlib. The terminal access also means that we have the freedom to install any other custom library we might be interested in.

We can now create a Python notebook and start solving our ML problems.

Pro-Tip: move all notebooks and data under the folder /block_storage, otherwise you'll lose the content when stopping and starting the notebook.
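
The tip above can be scripted from within the notebook session. The following is only a minimal sketch using the Python standard library: the helper name move_to_block_storage and the default file suffixes are hypothetical, and the paths in the usage comment are assumptions based on the /home/datascience/block_storage path used later in this post.

```python
import shutil
from pathlib import Path


def move_to_block_storage(source_dir, block_storage_dir, suffixes=(".ipynb", ".csv")):
    """Move files with the given suffixes from source_dir into block_storage_dir.

    Hypothetical helper: returns the destination paths of the moved files.
    """
    source = Path(source_dir)
    target = Path(block_storage_dir)
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    # Take a sorted snapshot of the directory before moving anything
    for item in sorted(source.iterdir()):
        if item.is_file() and item.suffix in suffixes:
            destination = target / item.name
            shutil.move(str(item), str(destination))
            moved.append(destination)
    return moved


# Example call (paths are assumptions, adjust them to your session):
# move_to_block_storage("/home/datascience", "/home/datascience/block_storage")
```

Only regular files directly inside the source folder are moved; sub-folders are left untouched.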

Oracle Accelerated Data Science (ADS) SDK

A new feature coming with Oracle Data Science is the Accelerated Data Science (ADS) SDK: a Python library simplifying and accelerating the data science process by offering a set of methods that cover all the phases of the process, from data acquisition to model creation, evaluation and interpretation.

The first step to use the SDK is to import the related libraries

from ads.dataset.factory import DatasetFactory
from ads.dataset.dataset_browser import DatasetBrowser

Then we can import the data and define pointsCat as our target column

df=DatasetFactory.open("Data/train_winedata.csv", target="pointsCat")

Immediately after importing, the library suggests two methods to use: show_in_notebook() and get_recommendations(). Let's execute them!

df.show_in_notebook()

This function creates a series of sections that help us understand the dataset better. The sections include a Summary describing the overall features of the dataset; then, for each Feature, a dedicated chart represents the distribution. The Correlation tab shows the similarity between features and, finally, the Data tab shows examples from the dataset.

The next step is to try get_recommendations(), which generates a few interesting suggestions for transforming our dataset. The function shows which columns should be deleted, suggests how to handle missing values and strongly correlated features, and lastly allows us to identify the positive label for the target.

The end user can keep or change any of the proposed values and apply the transformations. Please note that the original object is not modified; to get the transformed object we need to call

wine_transformed=df.get_transformed_dataset()

If we don't want to go through all the transformations, we can rely on ADS to choose the best ones for us by calling the auto_transform() function, which will apply all the transformations suggested before with the default parameters. If we want to have a look at which transformations have been applied by auto_transform(), we can call the visualize_transforms() function, which will show the pipeline as an image.

The next step is to create a predictive model which, again, is just a few lines of code away with the SDK

import logging

from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator

# Only log errors during the AutoML run
ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
oracle_automl = AutoML(wine_transformed, provider=ml_engine)
# train() returns the selected model and a baseline model
automl_model1, baseline = oracle_automl.train()

The SDK will now start testing a few ML algorithms to find the one providing the best results. At the end of the execution, it shows a summary output that also includes the Selected Algorithm, which is the ML model with the best score.

For each of the ML algorithms tried, the SDK will also show some summary stats that can be used to evaluate all the models.

We can reprint the trials at any time, passing the number of rows to show and the sort column as parameters, with the following call

oracle_automl.print_trials(max_rows=20, sort_column='Mean Validation Score')

We can also get a graphical representation of why a certain algorithm was selected by calling the visualize_algorithm_selection_trials() function.

ADS has similar calls to check the sampling size (visualize_adaptive_sampling_trials()) and the features selected (visualize_feature_selection_trials()). These functions can take a while to execute, probably because this is the first version of the SDK.

The model train function accepts optional parameters such as:

  • if we are interested in just trying out a specific model (or models) we can pass it via model_list=['LogisticRegression']
  • to change the scoring function we use score_metric='f1_macro'
  • to give a hint about the amount of time to spend on training we pass time_budget=10
  • we can specify which features should always be included with min_features=['price', 'country']
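
The options listed above can be combined in a single train() call. The sketch below only collects them in a dictionary: build_train_options is a hypothetical helper, while the parameter names and example values are the ones mentioned in the list. The actual call is left as a comment since it needs a notebook session with ADS and the oracle_automl object created earlier.

```python
def build_train_options(model_list=None, score_metric=None, time_budget=None, min_features=None):
    """Collect only the options that were explicitly set (hypothetical helper).

    Options left as None are omitted, so the SDK defaults apply to them.
    """
    candidates = {
        "model_list": model_list,
        "score_metric": score_metric,
        "time_budget": time_budget,
        "min_features": min_features,
    }
    return {name: value for name, value in candidates.items() if value is not None}


options = build_train_options(
    model_list=["LogisticRegression"],
    score_metric="f1_macro",
    time_budget=10,
    min_features=["price", "country"],
)

# In a notebook session, with oracle_automl defined as in the post:
# automl_model2, baseline2 = oracle_automl.train(**options)
```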

Once the models are created, we can evaluate them with

evaluator = ADSEvaluator(test_wine, models=[automl_model1, baseline],
                         training_data=wine_transformed, positive_class='Good')
evaluator.show_in_notebook(plots=['normalized_confusion_matrix'])
evaluator.metrics

which displays the normalized confusion matrix and the evaluation metrics for both models.

The ADS SDK also contains functions for model explainability, but the ones available in v1 don't seem to work well: all my attempts so far have stopped with errors in the code.

Once we have identified the model, it's time to save it. We can do it via

from ads.catalog.model import ModelSummaryList, ModelCatalog
from ads.catalog.project import ProjectSummaryList, ProjectCatalog
from ads.catalog.summary import SummaryList
from ads.common.model_artifact import ModelArtifact
path_to_model_artifact = "/home/datascience/block_storage/my_model"
model_artifact = automl_model1.prepare(path_to_model_artifact, force_overwrite=True)

The prepare function will create several files describing the model within the chosen directory. Once done, we can store the model in the catalog with the following

import os
compartment_id = os.environ['NB_SESSION_COMPARTMENT_OCID']
project_id = os.environ["PROJECT_OCID"]

# Saving the model artifact to the model catalog:
mc_model = model_artifact.save(project_id=project_id, compartment_id=compartment_id, display_name="Wine LGBMClassifier",
description="Wine LGBMClassifier predictor", training_script_path="Training.ipynb", ignore_pending_changes=True)

Please note that this step will fail if you haven't configured the ADS SDK to access the OCI APIs; a separate post covering this step is coming. If, on the other hand, the setup is done correctly, the model is then visible within the Data Science Project page.
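
Since the save call reads the compartment and project identifiers from environment variables, it can help to check them before calling save. This is just a sketch with a hypothetical helper name (read_catalog_ids); it only validates the two variables used in the post and does not verify the OCI API configuration itself.

```python
import os

# The two variables read in the snippet above
REQUIRED_VARIABLES = ("NB_SESSION_COMPARTMENT_OCID", "PROJECT_OCID")


def read_catalog_ids(environ=None):
    """Return (compartment_id, project_id) or raise a clear error if one is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["NB_SESSION_COMPARTMENT_OCID"], environ["PROJECT_OCID"]


# In a notebook session:
# compartment_id, project_id = read_catalog_ids()
```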

The model can now be downloaded and used by others. We can also think about exposing it as a function and calling it via REST APIs, all possible with the help of the ADS SDK as mentioned in the documentation.

A New Ecosystem for Data Science Collaboration

Oracle Data Science is an interesting product and covers a missing piece in Oracle's AI strategy. As of now the user experience is a bit rough around the edges, both during provisioning, with the policy configuration as a required pre-step, and during use, with some of the steps not working 100% of the time. This, however, is just the first release, and we hope new versions will follow quickly, as is already happening for the other products in the Oracle Cloud.

Oracle Analytics Server Step by Step Installation

Oracle Analytics Server is available! There is already a blog post explaining how to get a Docker version up and running in minutes. This post covers the GUI process, with hints on how to automate the whole installation using response files. The steps are very similar to an OBIEE12c installation. Let's check the details.

Supported OS and Database

Before installing, it is always good practice to check the supported OS and databases (for the RCU schemas). What's new for Oracle Analytics Server is that the certification matrix has moved from an Excel sheet to a website, making information easier to find.

For this initial OAS release, the supported OS is Linux, with Oracle Linux 6 being the minimum release. Windows support will come later in the year.

It is also worth mentioning that, as per the documentation, the OAS RCU is supported only on Oracle Databases, starting from 11.2. Support for other databases will come in the future.

With the prerequisites cleared, let's see how we can install OAS!

Files Download

The first step is to download files from e-delivery

You can skip the Oracle VM Virtual Appliance since it's not needed, which saves you from downloading an extra 2.8GB.

One of the prerequisites is the Java JDK 1.8 (8u211 or newer), which can be downloaded from the Oracle website.

Before starting the installation, let's have a look at the recommended directory structure: as for OBIEE12c, the Oracle documentation suggests keeping the ORACLE_HOME and DOMAIN_HOME separate. This is not just good practice: by keeping binaries and configuration separate we avoid problems when new OAS versions are released, since they'll be installed in new ORACLE_HOMEs while the configuration and application data will always reside in the DOMAIN_HOME.

Be aware that the OAS default configuration step will suggest creating the DOMAIN_HOME under the ORACLE_HOME. You'll need to set the DOMAIN_HOME to an external location to comply with what the documentation suggests!
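
The separation rule is easy to check in a script before running the configuration. A minimal sketch, assuming Linux-style paths: the function name is hypothetical, /opt/oracle/product/oas55 is the ORACLE_HOME used in this post, and the DOMAIN_HOME paths are made up for illustration.

```python
from pathlib import PurePosixPath


def is_domain_home_separate(oracle_home, domain_home):
    """Return True when domain_home is neither oracle_home nor located under it.

    Purely lexical check: symlinks and ".." segments are not resolved.
    """
    oracle = PurePosixPath(oracle_home)
    domain = PurePosixPath(domain_home)
    return oracle != domain and oracle not in domain.parents


oracle_home = "/opt/oracle/product/oas55"
# A DOMAIN_HOME nested under the ORACLE_HOME (illustrative path)
print(is_domain_home_separate(oracle_home, "/opt/oracle/product/oas55/user_projects/domains/bi"))
# A DOMAIN_HOME in a separate folder (illustrative path)
print(is_domain_home_separate(oracle_home, "/opt/oracle/config/domains/bi"))
```

The first call reports a nested DOMAIN_HOME, the second one a separate location.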

Installation

The first step is to unzip the files V988574-01.zip and V983368-01.zip. Then we can start the Fusion Middleware installation with

java -jar fmw_12.2.1.4.0_infrastructure.jar

This will open the installation GUI, which asks us to set up the Oracle Inventory Directory and the OS Group if they are not already configured on the server.

The next step requires choosing the ORACLE_HOME. Following the directory structure described above, we selected /opt/oracle/product/oas55, which also gives a hint about the OAS version we're installing.

The rest of the FMW installation consists of selecting the installation type, verifying the prerequisites and checking the installation summary. An important step, if you want to reproduce the same installation via script in other environments, is to save the response file. The file (named fmw.rsp in the following command) contains all the details of the setup and, by changing some parameters, can be reused for other installations.

java -jar fmw_12.2.1.4.0_infrastructure.jar -silent -responseFile fmw.rsp -invPtrLoc <INVENTORY_FILE_LOCATION>
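
When the same installation has to be repeated on several servers, the silent command can be assembled by a small script. The sketch below only builds and prints the command, using the flags shown above; the helper name and the two /tmp paths are hypothetical placeholders.

```python
import shlex


def silent_install_command(installer_jar, response_file, inventory_pointer):
    """Build the silent-install command as an argument list (hypothetical helper)."""
    return [
        "java", "-jar", installer_jar,
        "-silent",
        "-responseFile", response_file,
        "-invPtrLoc", inventory_pointer,
    ]


command = silent_install_command(
    "fmw_12.2.1.4.0_infrastructure.jar",
    "/tmp/fmw.rsp",        # placeholder response file location
    "/tmp/oraInst.loc",    # placeholder inventory pointer location
)
print(shlex.join(command))

# To actually run the installer:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting issues if any of the paths contains spaces.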

After installing, we need to apply patch 30657796, which was downloaded from e-delivery. To do that, we simply unzip the V988922-01.zip file, which creates the 30657796 folder, and then run

cd 30657796
export PATH=$ORACLE_HOME/OPatch:$PATH
opatch apply

If the patching is successful we'll get a message like

Patching component oracle.fmwconfig.common.wls.shared.internal, 12.2.1.4.0...
Patch 30657796 successfully applied.
Log file location: /....log

OPatch succeeded.
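
In a scripted installation, the captured opatch output can be checked for the two lines shown above. A minimal sketch with a hypothetical helper name; it assumes the output format matches the sample message in this post.

```python
def opatch_succeeded(output, patch_id):
    """Return True when the output reports both the applied patch and overall success."""
    lines = [line.strip() for line in output.splitlines()]
    applied = f"Patch {patch_id} successfully applied." in lines
    return applied and "OPatch succeeded." in lines


sample_output = """Patching component oracle.fmwconfig.common.wls.shared.internal, 12.2.1.4.0...
Patch 30657796 successfully applied.
Log file location: /....log

OPatch succeeded.
"""
print(opatch_succeeded(sample_output, "30657796"))
```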

The next step is the OAS installation, which we can start by running

java -jar Oracle_Analytics_Server_5.5.0.jar

After skipping the auto-upgrades we just need to select the ORACLE_HOME (same one defined on top) and verify that all the prerequisites are met.

Again, you may want to save the response file in case you later want to reproduce the installation silently on another server.

Repository

The next step is to create the database schemas required by OAS by executing the rcu command.

cd $ORACLE_HOME/oracle_common/bin
./rcu

The GUI will ask for Database location and credentials used to create the schemas

Please note that you need to select the Oracle Business Intelligence checkbox in the Select Components tab. You can also specify the tablespaces and password for each of the schemas created. Here too, an option is available to save the response file in order to do a silent installation.

The RCU schemas can also be created as part of the configuration process. I covered it as a separate topic since the standalone rcu is more reliable and provides more options for tablespace and schema management.

Configuration

The next step is to configure OAS. For this we need to run

cd $ORACLE_HOME/bi/bin
./config.sh

The set of screens will ask for the DOMAIN_HOME mentioned above, and the GUI will suggest a DOMAIN_HOME located under the ORACLE_HOME. If you want to comply with the Oracle documentation and avoid problems with future in-place upgrades, place the DOMAIN_HOME in a separate folder!

Other details required are the Admin username and password and the products to include (with BI Publisher and Oracle Analytics Enterprise Edition being the two options available). Then we'll have to point to the RCU schemas created before and decide the port range.

If everything is set correctly, you should get a successful configuration with all the services starting properly.

And OAS should be available (by default at http://<SERVER_NAME>:9502/analytics)!
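
A quick way to confirm the endpoint responds after the configuration is a small standard-library check. This is only a sketch: both function names are hypothetical, the default port and the /analytics path are the ones mentioned above, and oas.example.com is a placeholder server name.

```python
from urllib.error import URLError
from urllib.request import urlopen


def analytics_url(server_name, port=9502):
    """Build the default OAS URL described in the post."""
    return f"http://{server_name}:{port}/analytics"


def is_reachable(url, timeout=5):
    """Return True when the URL answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures and timeouts
        return False


url = analytics_url("oas.example.com")  # placeholder server name
print(url)
# print(is_reachable(url))
```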

The whole example above was performed on an OCI compute instance, installing the schemas on a DBaaS. This could be a solution for customers willing to use the Oracle cloud but still needing a level of customization and control higher than what's achievable in OAC.

For more info, don't hesitate to contact us!

Come and see us at Oracle OpenWorld 😎 One week to go!

If you haven’t registered for Oracle OpenWorld Europe yet, why not? It’s free to attend, and is jam packed with business and technical sessions delivered by a mixture of Oracle’s product team and end users giving real world case studies of using Oracle’s Cloud offerings. You can register here.

Rittman Mead will be at Stand 40 on both the 12th and 13th, where you can come and talk to us about OAC, OAS, AI, ML and everything else analytics. You can still book a slot in our OOW diary, or drop by whenever it suits you!

With Oracle Analytics Server (OAS) having been released last Friday, you’ve probably got some questions. We’ve done some work in it internally, so come and speak to us if you want to see OAS working, talk about the upgrade/install process or quiz us about licensing etc.
Not to mention, our very own Oracle ACE Director, Francesco Tisiot👇🏼 will be teaching you How to become a Data Scientist with OAC at 8.20am on Wednesday 12th, Arena H - Zone 4. Don’t miss it!

The team looks forward to seeing you there!

Oracle Analytics Server is here

Oracle has been talking about this product for months and we’re pleased to tell you the wait is over (for Linux users anyway): Oracle Analytics Server (OAS) 5.5.0 is now available for download on edelivery (search for Oracle Analytics Server).

Oracle Fusion Middleware needs to be downloaded separately

If your organisation uses Oracle technology for data analytics, then the likelihood is you’ve heard of Oracle Analytics Cloud (OAC). You may have also come across the latest addition to the family, OAS, the new on-prem version of OAC, set to eventually replace OBIEE.


The umbrella term, Oracle Analytics, now includes:

  • Oracle Analytics Server (OAS)
  • Oracle Analytics Cloud (OAC)
  • Oracle Analytics for Applications (OAX)

Whilst OAC is the jewel in the crown, and will receive regular quarterly updates, these updates will be reflected in OAS. You’ll be pleased to hear current OBIEE users will be automatically licensed for OAS - the logic behind this is that OAS becomes a stepping stone in your journey to using Cloud. If you’re buying OAS new, the licensing model is the same as the current OBIEE model.

OAS looks almost the same as OAC, minus some features, like the Natural Language Generator. This feature generates explanations of your visualisations in 28 different languages and will probably be included in a later version of the tool.  

How does OAS compare to OBIEE and OAC?

  • Licensing: OAS now includes options like Data Visualization (DV) and Mobile which were previously considered extra.
  • Data Visualization: Oracle’s self-service visualisation tool does what it says on the tin, allowing you to decipher your enterprise data with intelligent visuals. It now includes almost all the new features available in OAC, a big step forward compared to the DV version available in the latest OBIEE.
  • Data Flows: Clean and Transform your data via a GUI based tool without leaving your analytical platform.
  • Machine Learning: All the goodies related to “one-click forecast” or “Explain” and the full ML capabilities are now included in the on-prem Oracle Analytics Edition!
  • Configuration Options: OAS provides the "OBIEE"-type configuration options, where you can tweak each component individually.

Oracle’s aim is for users to achieve “100% data literacy”, and it plans to do this via its vision for analytics: augmented, collaborative and integrated. OAS really plays into this strategy, allowing users to employ data science and machine learning techniques to both analyse current trends and predict future ones (find out more in this blog post).

Talk to us about how to migrate from OBIEE to OAS or OAC. We can help you with every deployment scenario including on-prem, hybrid, full public cloud, or a mix and match of these suited to your needs. Email us: info@rittmanmead.com to arrange a chat with one of our team.

What’s new in OAC5.5?

Last Friday, alongside Oracle Analytics Server (for which a blog post is coming), a new OAC version came out. Let's have a quick look at all the new features it includes!

Maps

If you use Maps often, then there is a good list of options available to you! The very first is the possibility to associate a Map Layer to a Data Column directly in the data source definition.

Let's say you have a column City Zones in your dataset, which divides a city into customized areas based on your company's needs, and you have a map layer defining those areas geographically (e.g. with a GeoJSON file). I created an example with Verona, the city where I live. The custom GeoJSON file was created using geojson.io.
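
For readers who prefer generating such a file programmatically rather than drawing it, here is a minimal sketch that builds a GeoJSON FeatureCollection with one polygon per zone. The zone name, the "name" property key and the coordinates (roughly around Verona) are all made up for illustration, and which property OAC uses as the layer key is an assumption.

```python
import json


def zone_feature(zone_name, ring):
    """Build a GeoJSON polygon Feature from a list of [longitude, latitude] positions.

    GeoJSON requires the first and last position of a ring to be identical,
    so the ring is closed here if needed.
    """
    closed = [list(position) for position in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return {
        "type": "Feature",
        "properties": {"name": zone_name},  # assumed layer key
        "geometry": {"type": "Polygon", "coordinates": [closed]},
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Illustrative zone with made-up coordinates
center_zone = zone_feature(
    "Centro",
    [[10.98, 45.43], [11.01, 45.43], [11.01, 45.45], [10.98, 45.45]],
)
layer = feature_collection([center_zone])
print(json.dumps(layer, indent=2))
```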

The upload of custom shapes and their usage in OAC has been available for some time; however, you as project creator had to associate the City Zone column with the correct Map Layer for each map visualization included in your project. Now you can define the Map Layer to Data Column association once and for all at datasource level, so every Map using the City Zone column will automatically use the correct Layer.

Another cool new feature in Maps is AutoFocus on Data: the visualization automatically zooms and centers the map based on the data presented, and readjusts when the filtering changes.

Pivot Tables

Another new option is available in pivot tables, where you can now place Totals and Subtotals Above or Below, as you were used to doing in Answers. As in the "old" tool, you can now set a different format for the Totals and Subtotals, with coloring, background and font formatting options available. You now have full control of the layout and can make beautiful or horrible (like the below) color choices.

Visualizations

The perfect visualization is now available: the Spacer Viz! This is an empty visualization that you can add to your canvas allowing you to optimize the layout in cases where you need an extra white space.

Another new feature in this release is the Custom Background: it's now possible to define a color or an image as the background for the whole Project or for a single canvas. The image can be a URL reference or uploaded from the desktop. There are also options to position the image on the screen and to auto-fit the image to the window size. Adding a custom background to the whole project means that every time a new canvas is added, it will already have the selected image/color by default.

Another addition is the Butterfly Viz. This view was already available as a plugin from the Oracle Analytics Library and now becomes native in OAC5.5. The butterfly viz is useful when comparing two metrics across the same dimension.

By default the two metrics are on the same scale, but there is also a "Synchronized Scales" option that, when set to OFF, shows the metrics on different scales.

Datasources and Data Gateway

A new datasource definition to Oracle NetSuite is now available, allowing the connection by passing the parameters Datasource, Account ID and Role ID on top of the usual Host, Username and Password.

An enhancement has also been published for the Oracle Database connection: you can now choose between a Basic and an Advanced connection. The Basic option should be used when connecting to single-node databases. The Advanced option, on the other hand, is useful when connecting to RAC cluster databases, where multiple hostnames and ports need to be listed. When selecting the Advanced option we can simply add a custom connection string like the one below

(DESCRIPTION=
      (ADDRESS_LIST= (LOAD_BALANCE=on)(FAILOVER=on)
      (ADDRESS=(PROTOCOL=tcp)(HOST=hostname1.subnet.com)(PORT=1529))
      (ADDRESS=(PROTOCOL=tcp)(HOST=hostname2.subnet.com)(PORT=1529))
      (ADDRESS=(PROTOCOL=tcp)(HOST=hostname3.subnet.com)(PORT=1529))
      ...
      (ADDRESS=(PROTOCOL=tcp)(HOST=hostnamen.subnet.com)(PORT=1529))
      )
)
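
With many cluster nodes, typing the descriptor by hand is error-prone, so it can be generated from a list of hosts. The sketch below is an illustration under assumptions: the function name is hypothetical, and the optional CONNECT_DATA/SERVICE_NAME clause is standard Oracle Net syntax that is not part of the example above.

```python
def rac_connect_descriptor(addresses, service_name=None):
    """Build a connect descriptor with load balancing and failover enabled.

    addresses: iterable of (host, port) pairs.
    service_name: optional; when given, a CONNECT_DATA clause is appended.
    """
    address_entries = "".join(
        f"(ADDRESS=(PROTOCOL=tcp)(HOST={host})(PORT={port}))"
        for host, port in addresses
    )
    descriptor = (
        "(DESCRIPTION="
        f"(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on){address_entries})"
    )
    if service_name:
        descriptor += f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"
    return descriptor + ")"


# Host names taken from the example above
nodes = [(f"hostname{index}.subnet.com", 1529) for index in range(1, 4)]
print(rac_connect_descriptor(nodes))
```

The result is emitted on a single line; whitespace between the clauses, as in the example above, is optional.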

An option is also available to use Data Gateway with Essbase sources, making the OAC connection to on-premises Essbase only one click away: just enable the Use Data Gateway option in the screen. Data Gateway can now also be used with BI Publisher, allowing pixel-perfect reporting from on-premises datasources.

Another option in BI Publisher is Data Chunking, extremely useful for big reports since it allows the report to be executed as multiple parallel sub-jobs, with a final job consolidating the results into a single output.
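
The idea behind chunking can be illustrated with a small, generic sketch: split the input, process the chunks in parallel, then consolidate the partial outputs in order. This is a conceptual illustration only, not BI Publisher's implementation; all names are hypothetical and render_chunk is a stand-in for a real report sub-job.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, chunk_size):
    """Split rows into consecutive chunks of at most chunk_size items."""
    return [rows[start:start + chunk_size] for start in range(0, len(rows), chunk_size)]


def render_chunk(chunk):
    """Stand-in for a report sub-job: formats each row of the chunk."""
    return [f"row:{value}" for value in chunk]


def run_chunked_report(rows, chunk_size, max_workers=4):
    """Run the sub-jobs in parallel and consolidate their outputs in chunk order."""
    chunks = split_into_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input chunks
        partial_outputs = list(pool.map(render_chunk, chunks))

    consolidated = []
    for partial in partial_outputs:
        consolidated.extend(partial)
    return consolidated


print(run_chunked_report(list(range(5)), chunk_size=2))
```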

Those are the new features in this release. Do you want more detailed examples of a particular feature? Let me know in the comments and I'll write about it!