Liberate your data

Intelligence is all about knowledge. This website is dedicated to sharing expertise on Oracle BI.

 

Advisor Webcast: Authentication, Authorization, Setup and Configuration in Business Intelligence

Don't forget to register for the following Advisor Webcast, scheduled for May 21st, 2019: Authentication, Authorization, Setup and Configuration in Business Intelligence.

Schedule

Tuesday, May 21, 2019 09:00 AM (US Pacific Time)
    Tuesday, May 21, 2019 12:00 PM (US Eastern Time)
    Tuesday, May 21, 2019 06:00 PM (Central European Time)
    Tuesday, May 21, 2019 09:30 PM (India Standard Time)

 

Abstract:

  • This one-hour advisor webcast is recommended for technical users, functional users, and system administrators who are using Oracle Business Intelligence. The webcast covers authentication, authorization, setup and configuration for Oracle Business Intelligence.

Topics Include:

  • BI Security Overview
  • Authentication in BI
  • Authorization
  • Validating BI Security Setup
  • Common Security Issues

 

For Additional Information and Registration Details:

Advisor Webcast: Oracle Analytics Cloud Data Sync V2.5

Don't forget to register for the following Advisor Webcast, scheduled for April 24th, 2019: Oracle Analytics Cloud Data Sync V2.5.

Schedule

Wednesday, April 24, 2019 09:00 AM (US Pacific Time)
    Wednesday, April 24, 2019 12:00 PM (US Eastern Time)
    Wednesday, April 24, 2019 06:00 PM (Central European Time)
    Wednesday, April 24, 2019 09:30 PM (India Standard Time)

 

Abstract:

  • This one-hour advisor webcast is recommended for technical users, functional users, system administrators, and database administrators who want to load and transform on-premises or cloud data for analysis in an Oracle Database (on-premises, Oracle Database Cloud Service, ADW, Exadata Express), Oracle Essbase, or Oracle Analytics Cloud data sets.

Topics Include:

  • Data Sync
  • Architecture
  • Concepts
  • Install and Setup
  • Loading Data Steps

 

For Additional Information and Registration Details:

Advisor Webcast: Troubleshooting Performance Issues in Planning and Budgeting Cloud Service (PBCS)

Don't forget to register for the following Advisor Webcast, scheduled for May 7th, 2019: Troubleshooting Performance Issues in Planning and Budgeting Cloud Service (PBCS).

Schedule

Tuesday, May 07, 2019 08:00 AM (US Pacific Time)
Tuesday, May 07, 2019 11:00 AM (US Eastern Time)
Tuesday, May 07, 2019 05:00 PM (Central European Time)
Tuesday, May 07, 2019 08:30 PM (India Standard Time)

 

Abstract:

  • This one-hour advisor webcast is recommended for System Administrators and PBCS Implementation Consultants who administer and/or implement PBCS for Oracle customers. This session focuses on IT Operations performed by Oracle and how customers can help Oracle Support with performance issues in PBCS applications.

Topics Include:

  • Understanding the IT Operations performed by Oracle
  • Customers' responsibilities with regard to Oracle's IT Operations
  • Understanding and categorizing performance issues
  • Understanding the self-service options for performance issues
  • Understanding how customers can help Support expedite solutions

 

For Additional Information and Registration Details:

Game of Thrones Series 8: Real Time Sentiment Scoring with Apache Kafka, KSQL, Google’s Natural Language API and Python


Hi, Game of Thrones aficionados, welcome to GoT Series 8 and my tweet analysis! If you missed any of the prior season episodes, here are I, II and III. Finally, after almost two years, we have a new series and something interesting to write about! If you didn't watch Episode 1, do it before reading this post as it might contain spoilers!

Let's now start with a preview of the starting scene of Episode 2:


If you followed the previous season's blog posts, you may remember that I was using Kafka Connect to source data from Twitter, doing some transformations with KSQL and then landing the data in BigQuery using Connect again. On top of that, I was using Tableau to analyze the data.


The above infrastructure worked fine, and I was able to provide insights such as the sentiment per character and the "game of couples", analysing how a second character mentioned in the same tweet could change the overall sentiment.


The sentiment scoring was, however, done at visualization time: the data was extracted from BigQuery into Tableau at tweet level, scored with an external call to R, then aggregated and finally rendered.


As you might imagine, the solution was far from optimal, since:

  • The Sentiment scoring was executed for every query sent to the database, so possibly multiple times per dashboard
  • The data was extracted from the source at tweet level, rather than aggregated

The dashboard was indeed slow to render, and the related memory consumption was huge (think about the data volumes being moved around). Furthermore, the sentiment scores lived only inside Tableau: if any other person, application or visualization tool wanted to use them, they had to be recalculated from scratch.

My question was then: where should I calculate Sentiment Scores in order to:

  • Do it only once per tweet, not for every visualization
  • Provide them to all the downstream applications

The answer is simple: I need to do it as close to the source as possible, in Apache Kafka!

Sentiment Scoring in Apache Kafka

There are a gazillion different ways to implement Sentiment Scoring in Kafka, so I chose a simple method based on Python and Google's Natural Language API.

Google Natural Language API

Google's Natural Language API is a simple interface over a pre-trained machine learning model for language analysis, and as part of the service it provides sentiment scoring.

The Python implementation is pretty simple: you just need to import the correct packages

from google.cloud import language_v1
from google.cloud.language_v1 import enums

Instantiate the LanguageServiceClient

client = language_v1.LanguageServiceClient()

Package the tweet string you want to be evaluated in a Python dictionary

content = "I'm happy, #GoT is finally back!"
type_ = enums.Document.Type.PLAIN_TEXT
document = {'type': type_, 'content': content}

And parse the response

response = client.analyze_sentiment(document)
sentiment = response.document_sentiment
print('Score: {}'.format(sentiment.score))
print('Magnitude: {}'.format(sentiment.magnitude))

The result is composed of a Sentiment Score and a Magnitude:

  • Score indicates the emotion associated with the content, positive (value > 0) or negative (value < 0)
  • Magnitude indicates the strength of that emotion, and is often proportional to the content length.

Please note that Google's Natural Language API is priced per document so the more content you send for scoring, the bigger your bill will be!
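
To make the two numbers easier to consume downstream, the calls above can be wrapped in a small helper that also turns the score into a coarse label. This is only a sketch of mine: the get_sentiment function and the positive/negative/neutral labels are illustrative, not part of Google's API.

from google.cloud import language_v1
from google.cloud.language_v1 import enums

client = language_v1.LanguageServiceClient()

def get_sentiment(text):
    # Score a single piece of text and return score, magnitude and a coarse label
    document = {'type': enums.Document.Type.PLAIN_TEXT, 'content': text}
    sentiment = client.analyze_sentiment(document).document_sentiment
    if sentiment.score > 0:
        label = 'positive'
    elif sentiment.score < 0:
        label = 'negative'
    else:
        label = 'neutral'
    return sentiment.score, sentiment.magnitude, label

score, magnitude, label = get_sentiment("I'm happy, #GoT is finally back!")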

Creating a Kafka Consumer/Producer in Python

Now that we've settled how to do the sentiment scoring, it's time to look at how we can extract a tweet from Kafka in Python. Unfortunately, there is no Kafka Streams implementation in Python at the moment, so I created an Avro Consumer/Producer based on the Confluent Python Client for Apache Kafka. I used the jcustenborder/kafka-connect-twitter connector, so it's always handy to have the schema definition around when prototyping.

Avro Consumer

The implementation of an Avro Consumer is pretty simple: as always, first import the packages

from confluent_kafka import KafkaError
from confluent_kafka.avro import AvroConsumer
from confluent_kafka.avro.serializer import SerializerError

then instantiate the AvroConsumer, passing the list of brokers, the group.id (useful, as we'll see later, for adding multiple consumers to the same topic), and the location of the schema registry service in schema.registry.url.

c = AvroConsumer({
    'bootstrap.servers': 'mybroker,mybroker2',
    'group.id': 'groupid',
    'schema.registry.url': 'http://127.0.0.1:8081'})

The next step is to subscribe to a topic, in my case got_avro

c.subscribe(['got_avro'])

and start polling the messages in a loop

while True:
    try:
        msg = c.poll(10)

    except SerializerError as e:
        print("Message deserialization failed for {}: {}".format(msg, e))
        break

    # poll() returns None if no message arrived within the timeout
    if msg is None:
        continue

    if msg.error():
        print("AvroConsumer error: {}".format(msg.error()))
        continue

    print(msg.value())

c.close()

In my case, the message was returned as JSON and I could extract the tweet text and id using the json package

import json

text = json.dumps(msg.value().get('TEXT'))
id = int(json.dumps(msg.value().get('ID')))

Avro Producer

The Avro Producer follows a similar set of steps: first, include the needed packages

from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

Then we define the Avro key and value schemas; in my case I used the tweet id as key, and included the text in the value together with the sentiment score and magnitude.

key_schema_str = """
{
   "namespace": "my.test",
   "name": "value",
   "type": "record",
   "fields" : [
     {
       "name" : "id",
       "type" : "long"
     }
   ]
}
"""
value_schema_str = """
{
   "namespace": "my.test",
   "name": "key",
   "type": "record",
   "fields" : [
     {
       "name" : "id",
       "type" : "long"
     },
     {
       "name" : "text",
       "type" : "string"
     },
     {
       "name" : "sentimentscore",
       "type" : "float"
     },
     {
       "name" : "sentimentmagnitude",
       "type" : "float"
     }
   ]
}
"""

Then it's time to load the schemas and build the key and the value

value_schema = avro.loads(value_schema_str)
key_schema = avro.loads(key_schema_str)
key = {"id": id}
value = {"id": id, "text": text,"sentimentscore": score ,"sentimentmagnitude": magnitude}

Create the instance of the AvroProducer, passing the broker(s), the schema registry URL, and the key and value schemas as parameters

avroProducer = AvroProducer({
    'bootstrap.servers': 'mybroker,mybroker2',
    'schema.registry.url': 'http://schema_registry_host:port'
    }, default_key_schema=key_schema, default_value_schema=value_schema)

And finally produce the event, also defining the topic that will contain it, in my case got_avro_sentiment.

avroProducer.produce(topic='got_avro_sentiment', value=value, key=key)
avroProducer.flush()

The overall Producer/Consumer flow is, needless to say, very easy.
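
To make the flow concrete, here is a minimal end-to-end sketch of the loop. It assumes the same brokers and schema registry used above and the key_schema_str/value_schema_str defined earlier; the group.id value is just an example name of mine, and error handling is left out for brevity.

import json
from confluent_kafka import avro
from confluent_kafka.avro import AvroConsumer, AvroProducer
from google.cloud import language_v1
from google.cloud.language_v1 import enums

consumer = AvroConsumer({
    'bootstrap.servers': 'mybroker,mybroker2',
    'group.id': 'got-sentiment',
    'schema.registry.url': 'http://127.0.0.1:8081'})
consumer.subscribe(['got_avro'])

producer = AvroProducer({
    'bootstrap.servers': 'mybroker,mybroker2',
    'schema.registry.url': 'http://127.0.0.1:8081'},
    default_key_schema=avro.loads(key_schema_str),
    default_value_schema=avro.loads(value_schema_str))

nl_client = language_v1.LanguageServiceClient()

while True:
    msg = consumer.poll(10)
    if msg is None:
        continue
    # Extract the tweet id and text from the Avro message value
    tweet_id = int(json.dumps(msg.value().get('ID')))
    text = json.dumps(msg.value().get('TEXT'))
    # Score the tweet with Google's Natural Language API
    document = {'type': enums.Document.Type.PLAIN_TEXT, 'content': text}
    sentiment = nl_client.analyze_sentiment(document).document_sentiment
    # Publish the enriched record to the sentiment topic
    producer.produce(topic='got_avro_sentiment',
                     key={'id': tweet_id},
                     value={'id': tweet_id, 'text': text,
                            'sentimentscore': sentiment.score,
                            'sentimentmagnitude': sentiment.magnitude})
    producer.flush()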


And it works!


Parallel Sentiment Scoring

One thing I started noticing immediately, however, is that especially during tweeting peaks the scoring routine couldn't cope with the pace of the incoming tweets: a single Python Consumer/Producer was not enough. No problem! With Kafka, you can add multiple consumers to the same topic, right?

Of course Yes! But you need to be careful.

Consumer Groups and Topic Partitions

You could create multiple consumers in different consumer groups (defined by the group.id parameter mentioned above), but by doing this you're telling Kafka that those consumers are completely independent, so Kafka will send each one a copy of every message. In our case, we'd simply end up scoring the same message N times, once per consumer.

If, on the other hand, you create multiple consumers within the same consumer group, Kafka will treat them as a single consuming process and will try to share the load amongst them. However, it will do so only if the source topic is partitioned, and it will associate each consumer exclusively with one (or more) topic partitions! To read more about this, check the Confluent documentation.
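
A minimal sketch of the difference (the group ids below are example names of mine; broker and schema registry settings match the ones used earlier):

from confluent_kafka.avro import AvroConsumer

base = {'bootstrap.servers': 'mybroker,mybroker2',
        'schema.registry.url': 'http://127.0.0.1:8081'}

# Different consumer groups: each consumer gets a copy of every message,
# so every tweet would be scored twice.
c1 = AvroConsumer({**base, 'group.id': 'scoring-a'})
c2 = AvroConsumer({**base, 'group.id': 'scoring-b'})

# Same consumer group: Kafka shares the partitions between the consumers,
# but only if the source topic has more than one partition.
c3 = AvroConsumer({**base, 'group.id': 'scoring'})
c4 = AvroConsumer({**base, 'group.id': 'scoring'})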

The second option is what we're looking for: multiple threads reading from the same topic and splitting the tweet workload. But how do we split an existing topic into partitions? This is where KSQL comes in handy! If you don't know about KSQL, read this post!

With KSQL we can define a new STREAM sourcing from an existing TOPIC or STREAM, specifying the number of partitions and the partition key (the key's hash will be used to deterministically assign a message to a partition). The code is the following:

CREATE STREAM <NEW_STREAM_NAME> 
    WITH (PARTITIONS=<NUMBER_PARTITIONS>) 
    AS SELECT <COLUMNS> 
    FROM <EXISTING_STREAM_NAME>  
    PARTITION BY <PARTITION_KEY>;

A few things to keep in mind:

  • Choose the number of partitions carefully: more partitions for the same topic mean more throughput, but at the cost of extra complexity.
  • Choose the <PARTITION_KEY> carefully: if you have 10 partitions but only 3 distinct keys, then 7 partitions will not be used. If you have 10 distinct keys but 99% of the messages have just 1 key, you'll end up almost always using the same partition.

Yeah! We can now create one consumer per partition within the same Consumer Group!
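
The original post doesn't show how the extra consumers were started; one simple option, sketched below, is to spawn several copies of the consumer/scoring/producer loop in the same consumer group using multiprocessing. Here run_worker is a hypothetical function of mine wrapping the loop shown earlier, subscribed to the newly partitioned topic.

from multiprocessing import Process

NUM_WORKERS = 6  # one consumer per partition of the newly partitioned topic

def start_workers():
    # Each worker runs its own AvroConsumer with the same group.id, so Kafka
    # assigns each of them a distinct subset of the six partitions.
    workers = [Process(target=run_worker, args=('got-sentiment',))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

if __name__ == '__main__':
    start_workers()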

Joining the Streams

As the outcome of our process so far we have:

  • The native GOT_AVRO Stream coming from Kafka Connect, which we divided into 6 partitions using the tweet id as Key and named GOT_AVRO_PARTITIONED.
  • A GOT_AVRO_SENTIMENT Stream that we created using Python and Google's Natural Language API, with id as Key.

The next logical step would be to join them, which is possible with KSQL by including the WITHIN clause specifying the temporal validity of the join. The statement is, as expected, the following:

SELECT A.ID, B.ID, A.TEXT, B.SENTIMENTSCORE, B.SENTIMENTMAGNITUDE 
FROM GOT_AVRO_PARTITIONED A JOIN GOT_AVRO_SENTIMENT B 
    WITHIN 2 MINUTES 
    ON A.ID=B.ID; 

Please note that I left a two-minute window to take into account some delay in the scoring process. And, as you would expect, I get... 0 results!


Reading the documentation more carefully gave me the answer: input data must be co-partitioned in order to ensure that records having the same key on both sides of the join are delivered to the same stream task.

Since the GOT_AVRO_PARTITIONED stream had 6 partitions and GOT_AVRO_SENTIMENT only one, the join wasn't working. So let's create a 6-partition version of GOT_AVRO_SENTIMENT:

CREATE STREAM GOT_AVRO_SENTIMENT_PARTITIONED 
	WITH (PARTITIONS=6) AS 
	SELECT ID, 
		TEXT, 
		SENTIMENTSCORE, 
		SENTIMENTMAGNITUDE 
	FROM GOT_AVRO_SENTIMENT  
	PARTITION BY ID;

Now the join actually works!


The next topics are the pushdown to Google's BigQuery and visualization using Google's Data Studio! But that, sadly, will be for another post! See you soon, and enjoy Game of Thrones!

Oracle Business Intelligence (OBI) Bundle Patch 12.2.1.4.190416 is Available

The following Bundle Patch has been released for Oracle Business Intelligence Enterprise Edition (OBIEE) 12.2.1.4.x.

This bundle patch download is available from the My Oracle Support | Patches & Updates section.

Oracle Business Intelligence (OBI) Bundle Patch 12.2.1.4.190416 Patch 28952857

 

This is the most recent and recommended cumulative Critical Patch Update (CPU) patch for this release. The patch should be applied according to the readme.

Readme:

Prior to proceeding with this OBIEE Bundle Patch implementation and the related downloads, refer to the Readme file for important information. It is important to verify that the requirements, installation instructions, supported paths, and notes for this patch are met as outlined in the Readme file, which is available from the Patches & Updates download screen.

Patch Information:

  • Oracle BI Bundle Patches are cumulative.

  • To install the Oracle BI Bundle Patch 12.2.1.4.190416 patch, Oracle BI 12.2.1.4.0 must be installed in the Oracle BI Home directory. Subsequent Oracle BI patches can be installed on Oracle BI 12.2.1.4.0 or any Oracle BI bundle patch with a lower fifth numeral version than the one being installed. For example, you can install 12.2.1.4.190416 on 12.2.1.4.0.

 

For More Information.....

 

To share your experience about installing this patch ...

  • In the MOS | Patches & Updates screen for OBIEE Patch 28952857, click "Start a Discussion" and submit your review.

 

Have a question about OBIEE specifically?

The My Oracle Support Community "OBIEE (MOSC)" is the ideal first stop to seek and find product-specific answers: