Category Archives: Rittman Mead

Sql2Odi, Part Two: Translate a complex SELECT statement into an ODI Mapping

What we call a complex SQL SELECT statement really depends on the context. When talking about translating SQL queries into ODI Mappings, pretty much anything that goes beyond a trivial SELECT * FROM <a_single_source_table> can be called complex.

SQL statements are meant for humans to be written and for RDBMS servers like Oracle Database to be understood and executed. RDBMS servers benefit from a wealth of knowledge about the database we are querying and are willing to give us a lot of leeway about how we write those queries, to make it as easy as possible for us. Let me show you what I mean:

SELECT
    FIRST_NAME,
    AGE - '5' LIE_ABOUT_YOUR_AGE,
    REGION.*
FROM
    CUSTOMER
    INNER JOIN REGION ON "CUSTOMER_REGION_ID" = REGION.REGION_ID

We are selecting from two source tables, yet we have not bothered about specifying source tables for columns (apart from one instance in the filter). That is fine - the RDBMS server can fill that detail in for us by looking through all source tables, whilst also checking for column name duplicates. We can use numeric strings like '567' instead of proper numbers in our expressions, relying on the server to perform implicit conversion. And the * will always be substituted with a full list of columns from the source table(s).

All that makes it really convenient for us to write queries. But when it comes to parsing them, the convenience becomes a burden. However, despite lacking the knowledge the the RDBMS server possesses, we can still successfully parse and then generate an ODI Mapping for quite complex SELECT statements. Let us have a look at our Sql2Odi translator handling various challenges.

Rittman Mead's Sql2Odi Translator in Action

Let us start with the simplest of queries:

SELECT
  ORDER_ID,
  STATUS,
  ORDER_DATE
FROM
  ODI_DEMO.SRC_ORDERS

The result in ODI looks like this:

Sql2Odi has created an Expression, in which we have the list of selected columns. The columns are mapped to the target table by name (alternatively, they could be mapped by position). The target table is provided in the Sql2Odi metadata table along with the SELECT statement and other Mapping generation related configuration.

Can we replace the list of columns in the SELECT list with a *?

SELECT * FROM ODI_DEMO.SRC_ORDERS

The only difference from the previously generated Mapping is that the Expression now has a full list of source table columns. We could not get the list of those columns while parsing the statement but we can look them up from the source ODI Datastore when generating the mapping. Groovy!

Let us increase the complexity by adding a JOIN, a WHERE filter and an ORDER BY clause to the mix:

SELECT
  SRC_ORDERS.*
FROM
  ODI_DEMO.SRC_ORDERS
  LEFT JOIN ODI_DEMO.SRC_CUSTOMER CUST ON 
    SRC_ORDERS.CUST_ID = CUST.CUSTID
WHERE 
  CUST.AGE BETWEEN 20 AND 50
ORDER BY CUST.AGE

The Mapping looks more crowded now. Notice that we are selecting * from one source table only - again, that is not something that the parser alone can handle.

We are not using ODI Mapping Datasets - a design decision was made not to use them because of the way Sql2Odi handles subqueries.

Speaking of subqueries, let us give them a try - in the FROM clause you can source your data not only from tables but also from sub-SELECT statements or subqueries.

SELECT 
  LAST_NAME,
  FIRST_NAME,
  LAST_NAME || ' ' || FIRST_NAME AS FULL_NAME,
  AGE,
  COALESCE(LARGE_CITY.CITY, ALL_CITY.CITY) CITY,
  LARGE_CITY.POPULATION
FROM 
  ODI_DEMO.SRC_CUSTOMER CST
  INNER JOIN ODI_DEMO.SRC_CITY ALL_CITY ON ALL_CITY.CITY_ID = CST.CITY_ID  
  LEFT JOIN (
    SELECT 
      CITY_ID,
      UPPER(CITY) CITY,
      POPULATION
    FROM ODI_DEMO.SRC_CITY
    WHERE POPULATION > 750000
  ) LARGE_CITY ON LARGE_CITY.CITY_ID = CST.CITY_ID  
WHERE AGE BETWEEN 25 AND 45

As we can see, a sub-SELECT statement is handled the same way as a source table, the only difference being that we also get a WHERE Filter and an Expression that together give us the data set of the subquery. All Components representing the subquery are suffixed with a 3 or _3_1 in the Mapping.

Now let us try Aggregates.

SELECT 
  REGION,
  SUM(POPULATION) TOTAL_POPULATION,
  ROUND(MAX(SRC_CITY.POPULATION) / 1000000) BIGGEST_CITY_POPULATION_K,
  ROUND(MIN(SRC_CITY.POPULATION) / 1000000) SMALLEST_CITY_POPULATION_K
FROM 
  ODI_DEMO.SRC_CITY
  INNER JOIN ODI_DEMO.SRC_REGION ON SRC_CITY.REGION_ID = SRC_REGION.REGION_ID
WHERE
  CITY_ID > 20 AND 
  "SRC_CITY"."CITY_ID" < 1000 AND 
  ODI_DEMO.SRC_CITY.CITY_ID != 999 AND
  COUNTRY IN ('USA', 'France', 'Germany', 'Great Britain', 'Japan')
GROUP BY
  REGION
HAVING 
  SUM(POPULATION) > 10000 AND
  MIN(SRC_CITY.POPULATION) > 100 AND
  MAX("POPULATION") > 1000 AND
  AVG(ODI_DEMO.SRC_CITY.POPULATION) >= 5

This time, instead of an Expression we have an Aggregate. The parser has no problem handling the many different "styles" of column references provided in the HAVING clause - all of them are rewritten to be understood by ODI.

Now let us throw different Expressions at it, to see how well they are handled.

SELECT
  REG_COUNTRY.COUNTRY,
  REG_COUNTRY.LOC,
  REG_COUNTRY.NR_OF_EURO_REG,
 
  LAST_NAME,
  LAST_NAME AS SURNAME,
  FIRST_NAME,
  FIRST_NAME || ' ' || LAST_NAME FULL_NAME,
  'String concatenation' || ' ' || FIRST_NAME || ' demo.' CONCAT_STRING,
  UPPER(LAST_NAME) || ' in UpperCase' AS LAST_NAME_UPPER,
  SUBSTR(TRIM(UPPER('     Name: ' || LAST_NAME || ' ' || FIRST_NAME || '    ')), 2, 10) TEXT_FUNC,
  TRANSLATE(UPPER(LAST_NAME), 'AEIOU', 'XXXXX') X_LAST_NAME,
  LENGTH(FIRST_NAME || ' ' || LAST_NAME) FULL_NAME_LEN,
  10 +  LENGTH(FIRST_NAME || ' ' || LAST_NAME) FULL_NAME_LEN_10P,
  10 *  LENGTH(FIRST_NAME || ' ' || LAST_NAME) FULL_NAME_LEN_10T,
  INSTR(UPPER(LAST_NAME), 'MC') MC_IN_LAST,
  
  1 + 2 + 3 + 4 +5+6+7 SIMP_SUM,
  1+2-3*4/5+(6*7+8-9)/(1+2+3) SUM2,
  ROUND(1+2-3*4/5+(6*7+8-9)/(1+2+3)) SUM2_ROUND1,
  ROUND(1+2-3*4/5+(6*7+8-9)/(1+2+3), 2) SUM2_ROUND2,
  FLOOR(ROUND(1+2-3*4/5+(6*7+8-9), 2) / ROUND((1+2+3), 2)) SUM2_ROUND3, 
  
  SYSDATE DATE_NOW,
  SYSDATE AS CUR_DAT,
  1 + SYSDATE AS CURD_1,
  SYSDATE + 4 AS CURD_4,
  CURRENT_DATE AS CUR_ALT,
  ADD_MONTHS(SYSDATE, 1) CURD_1M,
  CURRENT_TIMESTAMP STAMP_NOW,
  LAST_DAY(SYSDATE) LAST_CURD,
  NEXT_DAY(LAST_DAY(SYSDATE), 2) LAST_NEXT_CURD,  
  TO_CHAR(LAST_DAY(ADD_MONTHS(SYSDATE, 1)), 'DD/MM/YYYY') CHAR_CURT,
  
  CASE
    WHEN REG_COUNTRY.COUNTRY = 'USA' THEN NULL
    WHEN REG_COUNTRY.COUNTRY = 'France' THEN ROUND(1+2-3*4/5+(6*7+8-9)/(1+2+3), 2)
    WHEN REG_COUNTRY.COUNTRY = 'Great Britain' THEN FLOOR(345.56)
    WHEN REG_COUNTRY.COUNTRY = 'Germany' THEN MONTHS_BETWEEN(SYSDATE, SYSDATE+1000)
    ELSE NULL
  END SIM_CASE_NUM,
  
  CASE
    WHEN REG_COUNTRY.COUNTRY = 'USA' THEN NULL
    WHEN REG_COUNTRY.COUNTRY = 'France' THEN NEXT_DAY(LAST_DAY(SYSDATE+5), 2)
    WHEN REG_COUNTRY.COUNTRY = 'Great Britain' THEN NEXT_DAY(LAST_DAY(SYSDATE+40), 2)
    ELSE NULL
  END SIM_CASE_DATE,  
  
  CASE
    WHEN REG_COUNTRY.COUNTRY = 'USA' THEN NULL
    WHEN REG_COUNTRY.COUNTRY = 'France' THEN UPPER(FIRST_NAME || ' ' || LAST_NAME)
    WHEN REG_COUNTRY.COUNTRY = 'Great Britain' THEN AGE || ' years of pain'
    ELSE NULL
  END SIM_CASE_CHAR,  
  
  CASE
    WHEN REG_COUNTRY.COUNTRY = 'USA' THEN NULL
    WHEN REG_COUNTRY.COUNTRY = 'France' THEN FIRST_NAME
    WHEN REG_COUNTRY.COUNTRY = 'Great Britain' THEN LAST_NAME
    ELSE NULL
  END SIM_CASE_CHARCOL,
  
  CASE
    WHEN REG_COUNTRY.COUNTRY = 'USA' THEN NULL
    WHEN REG_COUNTRY.COUNTRY = 'France' THEN AGE
    WHEN REG_COUNTRY.COUNTRY = 'Great Britain' THEN AGE
    ELSE NULL
  END SIM_CASE_NUMCOL,
  
  '123' * 10 IMPI_NUM1,
  123 * '10' IMPI_NUM2
  
FROM 
  ODI_DEMO.SRC_CUSTOMER
  INNER JOIN ODI_DEMO.SRC_CITY ON SRC_CITY.CITY_ID = SRC_CUSTOMER.CITY_ID
  INNER JOIN ODI_DEMO.SRC_REGION ON SRC_CITY.REGION_ID = SRC_REGION.REGION_ID
  INNER JOIN (  
      SELECT COUNTRY_ID, COUNTRY, 'Europe' LOC, COUNT(DISTINCT REGION_ID) NR_OF_EURO_REG FROM ODI_DEMO.SRC_REGION WHERE COUNTRY IN ('France','Great Britain','Germany') GROUP BY COUNTRY_ID, COUNTRY
      UNION
      SELECT DISTINCT COUNTRY_ID, COUNTRY, 'Non-Europe' LOC, 0 NR_OF_EURO_REG FROM ODI_DEMO.SRC_REGION WHERE COUNTRY IN ('USA','Australia','Japan')    
      ORDER BY NR_OF_EURO_REG
    ) REG_COUNTRY ON SRC_REGION.COUNTRY_ID = REG_COUNTRY.COUNTRY_ID
WHERE
  REG_COUNTRY.COUNTRY IN ('USA', 'France', 'Great Britain', 'Germany', 'Australia')
ORDER BY
  LOC, COUNTRY

Notice that, apart from parsing the different Expressions, Sql2Odi also resolves data types:

  • 1 + SYSDATE is correctly resolved as a DATE value whereas TO_CHAR(LAST_DAY(ADD_MONTHS(SYSDATE, 1)), 'DD/MM/YYYY') CHAR_CURT is recognised as a VARCHAR value - because of the TO_CHAR function;
  • LAST_NAME and FIRST_NAME are resolved as VARCHAR values because that is their type in the source table;
  • AGE || ' years of pain' is resolved as a VARCHAR despite AGE being a numeric value - because of the concatenation operator;
  • More challenging is data type resolution for CASE statements, but those are handled based on the datatypes we encounter in the THEN and ELSE parts of the statement.

Also notice that we have a UNION joiner for the two subqueries - that is translated into an ODI Set Component.

As we can see, Sql2Odi is capable of handling quite complex SELECT statements. Alas, that does not mean it can handle 100% of them - Oracle hierarchical queries, anything involving PIVOTs, the old Oracle (+) notation, the WITH statement - those are a few examples of constructs Sql2Odi, as of this writing, cannot yet handle.

Sql2Odi - what is under the hood?

Scala's Combinator Parsing library was used for lexical and syntactic analysis. We went with a context-free grammar definition for the SELECT statement, because our goal was never to establish if a SELECT statement is 100% valid - only the RDBMS server can do that. Hence we start with the assumption that the SELECT statement is valid. An invalid SELECT statement, depending on the nature of the error, may or may not result in a parsing error.

For example, the Expression ADD_MONTHS(CUSTOMER.FIRST_NAME, 3) is obviously wrong but our parser assumes that the FIRST_NAME column is a DATE value.

Part of the parsing-translation process was also data type recognition. In the example above, the parser recognises that the function being used returns a datetime value. Therefore it concludes that the whole expression, regardless of what the input to that function is - a column, a constant or another complex Expression - will always be a DATE value.

The output of the Translator is a structured data value containing definitions for ODI Mapping Components and their joins. I chose JSON format but XML would have done the trick as well.

The ODI Mapping definitions are then read by a Groovy script from within ODI Studio and Mappings are generated one by one.

Mapping generation takes much longer than parsing. Parsing for a mapping is done in a split second whereas generating an ODI Mapping, depending on its size, can take a couple of seconds.

Conclusion

It is possible to convert SQL SELECT statements to ODI Mappings, even quite complex ones. This can make migrations from SQL-based legacy ETL tools to ODI much quicker, allows to refactor an SQL-based ETL prototype to ODI without having to implement the same data extraction and transformation logic twice.

Extending OAC with Map Backgrounds

One of the really powerful things in OAC is the simplicity with which you can create geographical insights using the Map visualisation. I say this as someone who has implemented maps in OBIEE several times and so know the challenges of MapBuilder and MapViewer all too well. It really is a case of chalk and cheese when you compare this to how easy things are in OAC! Well, things just got even better in OAC 5.9. With the introduction of Web Map Service (WMS) and Tiled Web Maps, we can now integrate custom backgrounds to overlay our layers onto. In this post, I will walk through the process of configuring WMS backgrounds and show how this may prove useful to OAC users with a use case based on openly available data sources.

Maps Background

Before we look into these new Map Backgrounds...it's worth taking a moment to offer some background on Maps in OAC. For those of you know how things work, please skip ahead. For those who are new to mapping or who have used Map visualisations in OAC, but never stopped to consider how the data magically appears in your canvas, then here comes a brief guide:

For your data to be rendered on a map, you need three things:

  • A Map Layer which defines the geometry of your data. The map layer must be presented in geoJSON format, which defines a series of points or polygon boundaries. The map layer must also include a key, which needs to match the key range in your data set...this is how OAC is able to interpret your data as relating to a specific point or space on the map layer.
  • A Background Map, which is basically an image that sits behind the map layer and provides context to it. This would typically be an outline map of the world or it could have greater definition if that is helpful.
  • A Data Set that includes a key attribute which relates to its geography. This may be as simple as a country code (or name) or it could be some form of surrogate or business key. When you use this data set to create a Map visualisation, you can allow OAC to determine which map layer to use (based on a comparison of the key in the data set with the configured keys in the available map layers) or you can specify the default layer to be used by setting the Location Details when preparing the data set.

Both Background Maps and Map Layers are configured within the OAC Console. Out of the box, you get a set of useful backgrounds and layers provided, but you are also able to configure your own custom layers and backgrounds. Once set up, they become available for users to select when they create a Map in their DV Canvas.

There's quite a lot to unpack here, including how you can source your geoJSON map layers (either from open sources or via SQL if the geometry data resides in your database), how to add custom layers, how to assign default layers and how to manage the join quality to ensure your visualisation is presenting the right data. Too much to get into here, but I may return to the subject in future - let me know if this would be useful to any of you!

Back to the Map Backgrounds...

So, OAC provides you with three generic Background Maps out of the box:

  • Oracle BI - a clean, silhouetted but coarse grained background map which provides differentiation of country boundaries
  • Oracle Maps - a more detailed map which includes road networks, watercourses, national parks, place names etc.
  • OpenStreetMaps - an open source background with similar detail to the Oracle Maps background, but a slightly more neutral style

The key thing for me, here, is that the options range from coarse grained to extreme detail, with not much in between. Plus, plotting data against a map that only shows physical infrastructure elements may not be relevant for every use case as it can easily complicate the picture and detract from the user experience.

What if I have a data set as per the one shown above (which gives me population and population growth for major towns and cities in England and Wales) and I want to present it in on a map, but in a way that will help me understand more than simply the geographical location? Well, the good news is that there are a growing number of open data map service resources that offer us a rich source for background maps. The better news is that from 5.9 onwards, we are now able to integrate these into our OAC instance and make them available as background maps.  Specifically, we are able to integrate Web Map Services (WMS) and Tiled Web Maps.

WMS is a universal protocol introduced by the Open Geospatial Consortium and serves as a standard through which providers can make map details available and end user applications can request and consume the maps. The service consists of a number of different request types, among which the GetCapabilities request provides important details needed when integrating into OAC.  

Taking my scenario as an example, and using the openly available UK Air Information Resource, we will look at how we can use this new feature to help us better understand the effects of population growth on air quality.  

Understanding the GetCapabilities Request

As mentioned before, the Background Maps are configured in the OAC Console, under the Maps option. Navigate to the Backgrounds tab and select the Add Background option. As you can see, we have two new options available to us (Web Map Service and Tiled Web Map).

Selecting the Web Map Service option presents us with the Add Background dialog box, where we can configure the map integration. You can see below that we need to provide a name for the background (this is the name that end users will identify the background by in DV) and an optional description. We also have some options that we need to derive from the map service, namely:

  • URL - this is the URL for the server hosting the may service
  • Version - this specifies the WMS version in use
  • Coordinate Reference System - this is a standard value and cannot be modified
  • Layers - the WMS service may support multiple layers, and here we can specify the layer(s) that we want to use, based on the layer name.
  • Format - the type of image that will be rendered when using the background

All of this information can be derived using the map services GetCapabilities request, which is an XML document that describes the definition of the service. Hopefully, your WMS provider will make the GetCapabilities easily available, but if not (and you know the URL for the host server) then you can pass the request=GetCapabilities parameter for the same result.

Once you have provided the required values, you must finally accept the trust statement (which will automatically add the host site to the Safe Domains list). After saving and a quick refresh, you should be able to inspect the new background and get a Preview. I've found it useful to do this up front, just to validate that the integration is working correctly.

In my scenario, the WMS provides discrete layers for every year from 2001 to 2018. As my data set shows population growth between 2009 and 2019 and I want to see if there is any relationship between population growth and the changes in air quality over time, I have created three separate background maps showing snapshots at 2008, 2013 and 2018.

Now...once we've completed this set up (and assuming we have already created any custom Map Layers and have our Data Set to hand), we have everything we need to begin building some insights.

Using the Map Backgrounds in DV

My hypothesis is that population increase will result in greater use of the surrounding road networks and there should, therefore, be an increase in recorded roadside particulate levels. Obviously, there are other offsetting variables, such as the change in vehicles (cleaner technology, electric cars etc.) and societal changes towards a greener environment. I'm interested to learn something about the balance of these factors.

To start with, I will use the data set to identify the range of population growth of each town/city by way of a simple scatter graph. I have assigned colour based on a growth classification and, as well as plotting the growth %, I am using size to represent the base population. To save space, I am plotting data on the X Axis rather than the Y Axis. The end result looks like this:

Next, I create a map using the key element (Code), the Name of the town/city and the population growth measure. The appropriate map layer is selected (as I have already specified the default Location Details for this data set) and it is displayed on the Oracle BI background.

Now, instead of using the Oracle BI background, I am going to switch this to use the WMS based background I have just created...a simple case of navigating to the Map properties tab and selecting the appropriate background from the list.

This doesn't look great when viewing the entire data set, but I am more interested in looking at individual locations. I also want to see how the situation has changed over time. I therefore create two duplicates of the map and change the background to reference the 2013 and 2018 snapshot maps respectively (which, remember, simply pick up different layers within the hosted WMS). I increase the transparency of the Map Layer to 80% (so that I can better focus on the background map) and I switch the Use as Filter setting on my Scatter graph (so that I can use that to focus in on specific locations of interest). The resulting canvas looks something like this:

But...What Does It All Mean?

I can definitely see a change in the roadside emissions, most markedly between 2013 and 2018, but I am missing something important. I have no idea what this change means. Without a legend, I am unable to say whether the analysis supports or contradicts my hypothesis...pretty but also pretty useless!

Luckily, the solution, once again, lies in the WMS service. Returning to the GetCapabilities definition, I can see that each layer includes a <LegendURL> tag which gives me the destination for the legend to the data. A quick check to make sure all layers are using the same legend (thankfully, they are) and an extra visualisation later (this time, an image that references the URL specified in the <LegendURL> tag), I get closer to understanding my data:

Quite pleasingly, it appears that, regardless of the rate of population growth or the location, there is a consistent reduction in recorded roadside particulate levels.

In Summary

At this stage, it is probably worth saying that technically, it would have been possible to perform this same type of analysis in previous versions of OAC, as it has always been possible to create and overlay multiple layers within your Map visualisation. However, in practice, this particular use case would be impossible to achieve. It would involve carrying the data and the geometries for every single point of every single road appearing in the background map. That would be a lot of data, and even assuming we had it all available to us, it would be difficult to manage and definitely much, much slower to process than calling out to the WMS service and rendering the response images. Not to mention that the geoJSON file for the map layer would likely breach the size restrictions! Using the WMS integration saves us all this pain and opens up many more opportunities.

In fact, I can see some really powerful use cases for this new feature, as it takes the map background away from being a passive component of the Map visualisation and creates opportunities for it to become an active part of the insight process, enriching and adding relevant context to your existing data sets. Hopefully the use case I've presented here sparks your imagination and opens up some new ways of viewing data spatially within OAC.

Footnote: The examples shown above contain public sector information licensed under the Open Government Licence v2.0

Rittman Mead sponsoring RMOUG

Rittman Mead sponsoring RMOUG

After enjoying a successful 2020 in the North American data and analytics market, we’re excited to announce that we’re sponsoring Rocky Mountain Oracle Users Group (RMOUG) this year!

Running from 8th - 11th February, the agenda is jam packed with content including interactive workshops and technical sessions all presented by local and international Oracle product managers, partners and end users. There’ll be plenty of content covering everything from the newest developments in Oracle Analytics to database migrations. You can catch our CEO, Jon Mead at 11.30am (MST) on the 11th where he’ll be exploring how metadata can help you to understand and enhance user engagement. Don’t miss it!

Rittman Mead sponsoring RMOUG

We’ll be hosting a couple of break-out sessions (bring your own coffee) so if you want to chat Oracle Analytics, AI, ML, databases or anything in between then please feel free to virtually drop by and have a chat with one of the team.

Register here: https://events.bizzabo.com/TD2021/page/1611456/registration

Using the Go-URL in OAS

The OBIEE Go-URL functionality is widely used and it's deprecation in OAS is causing issues for companies wishing to upgrade to OAS.

This is a feature of OBIEE that allows a report to be accessed directly from a URL, enabling links from other applications - often with optional parameters passed through to filter the report - and usually bypassing the login screen by including user credentials within the URL parameters.  

Typically a single user-id is used for Go-URL access to OBIEE reports thus avoiding the need to either enter/detect user details or to provide login credentials for users who would not login to OBIEE directly.

With the first release of OAS, Oracle have announced that the Go URL functionality is being deprecated and may be removed in a future release. Also, since OBIEE 12.2.1.3, the advent of the  combined bi security login screen for both Analytics and DV has meant the NQUser and NQPassword parameters no longer work, so it is not possible to specify a user for the Go URL and hence it cannot login.

There are currently two scenarios under which the Go-URL does work in OAS:

  1. If full Single Sign-on to OAS has been implemented - via Oracle Access Manager or Apache, currently the only recognised methods for SSO in OAS - then the Go-URL will work under the user-id that SSO signs in under. In other words, the user credential parameters in the URL are ignored, the user logs in automatically under their credentials and the requested report displays as normal.

    However this is often not appropriate - if users are clicking on Go-URL links outside the scope of the SSO environment, for instance connecting via the internet or from other networks within the company LAN where SSO could not operate. Also if reports are designed to work under a specific user, perhaps with special security rules, a users own security levels may result in different data appearing in a report.

  1. Disable the lightweight SSO option in OAS. This separates the security for Analytics and DV. For Analytics this returns to the classic login screen and the NQUser and NQPassword parameters on the Go-URL work correctly, but this causes issues for DV making it virtually unusable. This is only really an option if you do not want to use DV in your implementation of OAS.

There is however a third option we have discovered which uses the Single Sign-On approach, but without the need to actually detect user credentials via an SSO service. Instead a specific user-id can be provided, just like the NQUser parameter in the Go URL.

Instead of the Go URL passing the NQUser and NQPassword parameters, Apache can be used to re-direct access to OAS, providing the user-id to login with during the re-direct.  The effect is that the Go URL will still display the same report output whilst logging in with the same user-id, albeit supplied by Apache rather than the NQUser parameter.

This works by using the Single-Sign-On configuration between Apache and OAS.

In a normal SSO configuration, Apache would authenticate users via a standard SSO service, for example, Kerberos, and once the user has passed authentication, redirect the URL to OAS, providing the user credentials as request-headers, so that OAS can automatically login the user in, bypassing the login screen.

The Oracle document ID 2707401.1 explains this process in detail.

We can however ignore the SSO authentication within Apache and instead hard-code the User-id we want to login to OAS with.  Apache will still send this user-id in the request-header to OAS, which in turn will recognise it as a pre-authenticated user and log the session in.  In the case of a Go URL, the rest of the URL which points to a report or dashboard will continue to process as normal, displaying it to the user.

The user is totally unaware of the difference - they click on a link and see a report in OAS just as they did in OBIEE.

Just as with the Go URL there are security implications of this. The user-id and password are no longer sent in plain text in the URL, which is an improvement, but calling the URL will automatically login you into OAS. if you just use the core URL of http://oas-server:port/analytics, you will be logged in to OAS and end up at the home screen, without any user/password challenge.

It is important therefore to restrict the permissions and privileges of this hard-coded user so it can only access the reports/data it needs to and cannot create new content, access other subject areas etc.  Using the Apache configuration, OAS can be tied down so that just the /analytics URL can be opened for this, so no access can be given to /dv, /xmlpserver or any other area of OAS unless explicitly required.

By using Apache to control access to OAS you create a separate port to call OAS on for the Go-URL. This means the port you installed OAS with (e.g. 9502/9503) will remain as the main port for logging into OAS via the login screen - users manually connecting to OAS for analytics, DV and all the other functionality, should continue to use this port.   These connections will not go via Apache.

The Go-URL will use the port provided by Apache - in the instructions below I’ve set this up as port 9603 - and Apache will re-direct the user seamlessly to OAS’s main port.   Apache can also be configured to use SSL certificates, and if installed on the same server as OAS, the same certificates as OAS.  In this example I’m not using SSL, but the instructions for doing so are included.

Assuming you have already OAS installed, below are the steps to install and configure Apache to act as the “SSO” front-end for the Go-URL.   The instructions below were carried out on an Oracle Enterprise Linux 7.8 server - other versions of Linux or using Windows will be slightly different.

Start by installing Apache HTTP Server

As the root user, use yum to download and install Apache and the required SSL and SSO plug-ins:

yum -y install httpd mod_ssl mod_auth_kerb mod_auth_token

Then enable and start Apache:

systemctl enable httpd.service systemctl start httpd.service

Next enable Apache to connect to the network and allow access on port 9603. In this case I’ve installed  policycoreutils to get the semanage command as I have SELinux enabled.

yum -y install policycoreutils-python 
/usr/sbin/setsebool -P httpd_can_network_connect 1 
semanage port -a -t http_port_t -p tcp 9603 
semanage port -l | grep http

The final command above confirms port 9603 is now available:

Next open the firewall port for 9603:

firewall-cmd --zone=public --add-port=9603/tcp --permanent service 
firewalld stop service 
firewalld start 
systemctl restart httpd.service

Now Apache is is installed, we can configure it. Edit the file /etc/httpd/conf/httpd.conf

Set the Listen port and, if SSL is required, add the SSL virtual host properties with the correct server, port and SSL certificate file values (I’ve included the SSL virtual host in httpd.conf, but it could reside in it’s usual place in ssl.conf instead)

Listen 9603 
<VirtualHost oasvm3.local.com:9603> 
ErrorLog logs/ssl_error_log 
TransferLog logs/ssl_access_log 
ServerName oasvm3.local.com 
SSLEngine on 
SSLProxyEngine on 
SSLCertificateFile /u01/oas55/ssl/certificate.crt 
SSLCertificateKeyFile /u01/oas55/ssl/certificate.key 
SSLCertificateChainFile /u01/oas55/ssl/certificate_chain.pem 
SSLCACertificateFile /u01/oas55/ssl/ca_certificate.crt
SSLProxyCACertificateFile /u01/oas55/ssl/ca_certificate.crt 
RequestHeader set WL-Proxy-SSL "true" 
RequestHeader set IS_SSL "ssl" 
RewriteEngine On 
RewriteOptions Inherit 
ProxyPreserveHost On 
</VirtualHost>

save and close, then restart apache:

systemctl restart httpd.service

Then try the Apache URL in a Browser: http://oasvm3.local.com:9603 (or https for ssl)

if using SSL check for the padlock icon in the address bar.

Configure Apache for OAS

Re-open the httpd.conf file and locate the line:

#ServerName www.example.com:80

below this add the following lines - these are typical recommended settings for OAS

###### Various default settings ###### 
# 
# Timeout: The number of seconds before receives and sends time out. 
# 
Timeout 6000 
# 
# KeepAlive: Whether or not to allow persistent connections (more than 
# one request per connection). Set to "Off" to deactivate. 
# 
KeepAlive On 
# 
# MaxKeepAliveRequests: The maximum number of requests to allow 
# during a persistent connection. Set to 0 to allow an unlimited amount. 
# We recommend you leave this number high, for maximum performance. 
# 
MaxKeepAliveRequests 0 
# 
# KeepAliveTimeout: Number of seconds to wait for the next request from
# the same client on the same connection. 
# 
KeepAliveTimeout 60 
#
#####################################

Now add the following lines to the end of the file, altering the OAS URLs appropriately. The ProxyPass URLs must be the original SSL or Non-SSL OAS URLs.

Define GoUser reportuser 

RewriteEngine On Proxy
PreserveHost On 

# Protected Resources 
<Location "/analytics"> 
ProxyPass "http://oasvm3.local.com:9502/analytics" 
ProxyPassReverse "/analytics" 
#SSLRequireSSL 
RequestHeader unset Authorization 
RequestHeader set OAM_REMOTE_USER ${GoUser} 
RequestHeader set iv-user ${GoUser} 
RequestHeader set SM_USER ${GoUser} 
</Location> 

ProxyErrorOverride Off 

# Unprotected Resources 
<Location "/bi-security-login"> 
ProxyPass "https//oasvm3.local.com:9502/bi-security-login"
ProxyPassReverse "/bi-security-login" 
</Location> 

The first line defines the user which will login to OAS, in this case I’ve created a user called ‘reportuser’. This is then substituted into the request headers below.

Ideally this user should exist in the provider used for authentication in Weblogic - so that any security groups can be picked up to control this user's access with OAS.

Note the SSLRequireSSL is hashed out, the hash must be removed if SSL is required.

save and close, then restart Apache:

systemctl restart httpd.service

Configure Weblogic to accept SSO calls from Apache

To enable Weblogic to recognise SSO tokens to Apache we use the OAM Identity Asserter.

Login to the Weblogic admin console, Navigate to Security Realms → myrealm → Providers, then Click Lock & Edit, then click New.

Click New.  Enter a name and choose OAMIdentityAsserter as the type.

Click OK to add the provider:

Click on the new OAM Asserter provider to edit it. Change the Control Flag to REQUIRED & click Save. The remaining default settings are fine.

Return to the providers screen, click Reorder and move the OAM Asserter so it is second in the list below the main user directory, in this case PaulsAD is the main active directory authenticator

This next step isn’t strictly necessary - the Weblogic Plug-in is required to allow access to Data Visualizer and if you have no intention of allowing access to DV via Apache, this step can be skipped.

The Weblogic Plug-in needs enabling in three locations.

From the Domain Structure window click on bi

Go to the Web Applications tab

Scroll down to the WebLogic Plugin Enabled option and check it. then click Save.

From the Domain Structure window expand Environment and click on Servers. Click on bi_server1 in the Configuration tab.

Scroll down the General, click on the Advanced link, then locate the WebLogic Plug-In Enabled option and set it to ‘yes’. Click save.

From the Domain Structure window click on Clusters. Click on bi_cluster

Again in the General tab click on Advanced then set the WebLogic Plug-In Enabled option to ‘yes’. Click save.

All changes are now complete. Click Activate Changes.

A full stop and restart of Weblogic & OAS is required under the oracle user:

/u01/oas55/config/domains/bi/bitools/bin/stop.sh    
/u01/oas55/config/domains/bi/bitools/bin/start.sh

Enable Lightweight SSO

For this method of accessing OAS to work, the internal lightweight SSO must be enabled. It will be enabled by default, but if it has been disabled, for example to make the Go-URL work, then re-enable it:

Stop OAS, open the WLST command line:

cd /u01/oas55/product/oracle_common/common/bin ./wlst.sh

Re-enable SSO (alter the domain path to suit your environment)

wls:/offline> enableBISingleSignOn('/u01/oas55/config/domains/bi','/bi-security-login/logout?redirect=/dv'); 
wls:/offline> exit();

Then restart OBIEE to reflect the changes.

The Apache configuration is now complete and you should be able to login to OAS on the Apache URL, e.g. http://oasvm3.local.com:9603/analytics

The SSO configuration should automatically log you in under the user defined in the apache configuration above:

The Original OAS URL remains on port 9503 and will take you to the normal login screen:

Note than when you login via the Apache you can’t sign-out!  You can click the sign-out link, but the SSO process will simply login you back in again rather than displaying the login screen. To login normally close the browser and reopen with the standard OAS URL.

Apache Log files

If you have any issues with the above, there are five log files for Apache you can use investigate whats going on.

/var/log/audit/audit.log Contains SELinux issues
/var/log/messages Contains general activity messages
/var/log/httpd/error_log Contains errors generated by Apache,including some SSL and Kerberos messages
/var/log/httpd/ssl_error_log Contains SSL errors generated by Apache
/var/log/httpd/ssl_access_log Contains messages related to users connecting over SSL

The logging level in the error_log file is governed by the LogLevel setting in /etc/httpd/conf/httpd.conf This is normally be set to ‘warn’, but can be set to debug to display detailed messages.

Testing the Go URL

With Apache/SSO now enabled, the Go URL can be used:

http://oasvm3.local.com:9603/analytics/saw.dll?Go&Path=%2Fshared%2FPipeline QuickAnalytics%2FPipeline%2FSales Pipeline-Charts&locale=en-en&lang=en

Note that if the Go URL includes the NQUser and NQPassword parameters, they will be ignored.

Alternatively the Dashboard parameter can be used to take a user fully into OAS to view a dashboard:

http://oasvm3.local.com:9603/analytics/saw.dll?dashboard&PortalPath=%2Fshared%2FPipeline QuickAnalytics%2F_portal%2FSales Effectiveness

An important point here is that the user is now logged into OAS and can access other content. As mentioned earlier, the user used for Go URL access should have their permissions and privileges tightly controlled to limit access to just what it required.

Accessing other areas of OAS

The instructions above only provide access to the core /analytics functionality via the Apache SSO URL. Other areas of OAS, such as /dv or /xmlpserver are not enabled:

If these are required the location entries must be added to the httpd.conf file and Apache restarted.

The full list locations can be found in the Oracle doc mentioned earlier, and includes protected and unprotected locations (as some must be visible prior to logging into OAS, e.g. the bi-security-login URL). As an example, here is the location required to enable DV:

<Location "/dv"> 
ProxyPass "http://oasvm3.local.com:9502/dv" 
ProxyPassReverse "/dv" 
#SSLRequireSSL 
RequestHeader unset Authorization 
RequestHeader set OAM_REMOTE_USER 
${GoUser} RequestHeader set iv-user 
${GoUser} RequestHeader set SM_USER 
${GoUser} 
</Location>

Thats it, your now ready to use the Go-URL with the Apache URL and not include the NQUser/NQPassword parameters.

This solution is not ideal and does have security implications you must take care of, although it does avoid the user/password appearing in the URL, which is a definite positive.

One downside of this is that you can’t use multiple user-ids - all the Go URL’s will connect with the same user, it is not possible (well we’ve not found a way yet) to allow different Go URLs to connect with different user-ids. Apache will only allow one redirect to /analytics on the main listen port.

Enjoy!

Analysing Social Media Activity with ADW and OAC

Analysing Social Media Activity with ADW and OAC

Yesterday I wrote a series of tweets talking about my Twitter activity analysis with Oracle's Autonomous Database and Oracle Analytics Cloud. Since the tweet became popular and I didn't share the steps, I thought a blog post should follow, so... here we are!

Getting the Data from Twitter

There are many ways that you can get data from Twitter. For the sake of my exercise I requested a dump from my data from the Twitter website. You can request the same following these instructions.

Once the data is ready, you'll receive a mail or a notification and you'll be able to download the related zip file named twitter-YYYY-MM-DD-hashed_string.zip.

Once unzipped you'll see two folders:

  • assets containing files you attached in your tweets (mostly images)
  • data where the actual interesting information is.
Analysing Social Media Activity with ADW and OAC

Within the data folder I concentrated on the tweet.js file which contains the tweet information. There are other interesting files such as followers.js or like.js but I didn't include those datasets in my analysis as of now.

Warning: the tweet.js dataset contains only the tweets written by you. It doesn't contain all people tagging you in a tweet or replies to your tweets.

The Your archive.html file allows you to browse the content on the folder from a web browser

Analysing Social Media Activity with ADW and OAC

Importing Data into ADW with SQL Developer Web

The next step in my analysis is to import the data into the Autonomous Data Warehouse. For this task, I used SQLDeveloper Web, available by default in ADW, which has a powerful utility to import JSON documents as rows in a relational table. Jeff Smith's post covers the process in detail.

Unfortunately when trying to import into ADW the file tweet.js I encountered an error due to the fact that the file itself is not a pure JSON file, pretty clear if you check the file itself

window.YTD.tweet.part0 = [ {
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
    .....
    "lang" : "en"
  }
},
....
{
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
    .....
    "lang" : "en"
  }
 }
 ]

The first item to remove from our file is the window.YTD.tweet.part0 = prefix. I believe this is due to the pagination of the results, but it clearly screws up the JSON formatting.

Once removed I could parse the tweet.js file with SQLDeveloper Web, but the table definition proposed had only one column with containing the whole TWEET  JSON document.

Analysing Social Media Activity with ADW and OAC

At this point, I could either accept this and do further processing using Oracle's SQL JSON parsing functionality or slightly change the shape of the file to be ingested correctly and I opted for the second.

When importing JSON documents into rows, SQLDeveloper Web analyses only the first level or attributes in the document itself. In the case of our tweet.js file was something like

{
  "tweet" : {
    ...
    }
},
{
  "tweet" : {
    ...
    }
},

The first level parsed by SQL Developer Web was correctly only extracting the tweet element and proposing a CLOB (JSON) column to store it. But I wanted the content of the tweet to be parsed. I ended up removing the first layer by substituting in the file any occurrence of },{ "tweet" : { with a simple comma and removing the initial and final parenthesis.

The file now looks like the following

[ {
    "id" : "279908827007180800",
    "created_at" : "Sat Dec 15 11:20:39 +0000 2012",
    "full_text" : "Finally at home after #christmasparty... Looooong travel!",
    "lang" : "en"
    ...
  }
,{
    "id" : "276794944394498048",
    "created_at" : "Thu Dec 06 21:07:12 +0000 2012",
    "full_text" : "@mRainey will you be there next week too? Enjoy uk and visit #italy if you can!",
    "lang" : "en",
    ...
  }
 ...
 }]

We can now parse the file with SQL Developer Web, and the output correctly identifies all the first level entities in the JSON document.

Analysing Social Media Activity with ADW and OAC

Few more clicks and we have our table FRANCESCO.TWEET_DATA_SQLDEV populated automagically!

Analysing Social Media Activity with ADW and OAC

Note: The whole process above could be implemented and automated in several different ways, the aim of the blog post is only to demonstrate the feasibility of the analysis.

Text Tokenization with Oracle Text

The Tweet's FULL_TEXT column tokenization is done in the Oracle Database, you can see the full process described by the following video produced by the Oracle Analytics PM team.

If you prefer a short version of it, here it is: I basically created the following index

CREATE INDEX "FRANCESCO"."TWITTER_TWEET_IDX" ON "FRANCESCO"."TWEET_DATA_SQLDEV" ("FULL_TEXT") INDEXTYPE IS "CTXSYS"."CONTEXT";

The INDEXTYPE IS "CTXSYS"."CONTEXT" creates an index using Oracle Text. A more accurate description of the procedure can be found in OracleBase's post.

Once the index is created, we can see some new tables appearing with the name DR$INDEX_NAME$LETTER with

  • $DR$ being a fixed prefix
  • INDEX_NAME the name of the index
  • LETTER a single letter between I, K, N, U, R which meaning can be found in this Document

For the purpose of our analysis, we'll focus on the DR$TWITTER_TWEET_IDX$I table which contains the tokens of our FULL_TEXT column.

But the token by itself is not very useful, we need to match the token with the Tweet's ID to be able to provide meaningful analysis. Again, this is covered nicely by another video created by the Oracle Analytics PM team.

In order to associate the Token with the original Tweet we can use again the power of Oracle Text and the Index created above with the following query

SELECT
    full_text,
    score(1) AS text_score,
    token_text,
    a.id
FROM
    tweet_data_sqldev a,
    (
        SELECT DISTINCT
            token_text
        FROM
            dr$twitter_tweet_idx$i
    )
WHERE
    contains(a.FULL_TEXT, '/' || token_text, 1) > 0

Again for more info about Oracle Text's CONTAINS function please refer to the relevant documentation.

I physicalized the output of the above query in a table ( TWEET_TOKENS), which contains the TOKEN_TEXT together with the Tweet's ID so we can now join this table with the original one containing the list of Tweets in Oracle Analytics Cloud.

Note: One of the next versions of Oracle Analytics Cloud will provide the Tokenization as step of a DataFlow within the Database Analytics options! You'll be able to tokenize your strings without leaving OAC.

Analysing Social Media Activity with ADW and OAC

Analysis of the Data in Oracle Analytics Cloud

If you're a frequent blog reader, this is probably the easiest part. I just had to:

  • Create a connection to my Autonomous DataWarehouse by using the wallet file.
  • Create the two datasources: one for TWEET_DATA_SQLDEV and another for TWEET_TOKENS
  • Create a project and include both Datasources

Once in the project, the first visualization is about the most "liked" tweet... no surprise is when I become Oracle Ace Director, back in 2019, during KScope Keynote!

Analysing Social Media Activity with ADW and OAC

This first visualization is ok, but not really using any of the Oracle Text capabilities exposed in the Tokens... so my next analysis was...

Which words do I use more often when tweeting?

Easy... with the two datasources created above!

Analysing Social Media Activity with ADW and OAC
How does it change when I reply to people?

Well, you can see that words about OUG sessions and Oracle are still there, but there is the emerging topic of Food!

Analysing Social Media Activity with ADW and OAC
Who do I interact most with?

I plotted the # of Replies to specific Users…
No wonder (at least to me) that I get the most interactions with people that tag me with Italian food abominations…

Analysing Social Media Activity with ADW and OAC
And... how do I reply to them?

Again, very easy with Oracle Analytics Cloud Trellis Options.
You can spot that Food is the major discussion topic with HeliFromFinland and dw_pete, while my chats with @connor_mc_d and @stewartbryson are covering more topics

Analysing Social Media Activity with ADW and OAC
What about Nephentur?

One Last mention to my interactions with Nephentur: It’s clear his love for OUGs, Airports, Flights, Hotels… all driven by  #TheBergEffect

Analysing Social Media Activity with ADW and OAC

Hope you liked the story, just an example of what you can do with a dataset you own and tools available in the Oracle Cloud!