OAC Semantic Modeler and Version Control with Git
This is my third blog post in the series of posts about OAC's Semantic Modeler. The first one was an overview of the new Semantic Modeler tool, the second was about the new SMML language that defines Semantic Modeller's objects. This post is about something that OBIEE developer teams have been waiting for years - version control. It looks like the wait is over - Semantic Modeler comes with native Git support.
When you open Semantic Modeler from OAC, you will see two toggle buttons in the bottom right corner:
The right toggle is for Git Panel, where version control magic takes place.
Enabling Git for a Semantic Model
Version control with Git can be enabled for a particular Semantic Model, not the whole Modeller repository. When first opening the Git Panel, it will inform you it requires configuration.
Click Start and you will be asked for a Git Repository URL and the name of the main branch. I created my test repository on Github but you may have your own company internal Git server. The easiest way to establish version control for a Model is to create an empty Git repository beforehand - that is what I did. In the "Initialize Git" prompt, I copied the full https URL of my newly created, empty (no README.md in it!) Github repository and clicked "Continue".
If the repository URL is recognised as valid, you will get the next prompt to choose a Git profile, which is your Git logic credentials. To create a new profile, add your git user name and password (or Personal Access Token if you are using Github) to it and name your profile.
Click "Initialize Git". After a short while, a small declaration of success should pop up...
... and the Git Panel will now have a typical set of Git controls and content.
Next, let us see it in action.
Git and Semantic Modeler - the Basics
The basics of Semantic Modeler's version control are quite intuitive and user friendly. Let us start by implementing a simple change to our Customers dimension, let us rename a column.
We type in the column name, press Enter. We see that the Unstaged Changes list in the Git Frame is still empty. We press Ctrl+S to save our changes and the Unstaged Changes list gets updated straight away.
We click on "Stage All". At the bottom of the Git panel, "Commit description" input and "Commit" button appear.
We enter a description, click "Commit" and get a message:
However, the changes have not yet been pushed to the Git server - we need to push them by clicking the "Push" button.
Now let us check the repository content in the Git server.
We can see the "Dim - Customers.json" SMML file has just been updated.
Git and Semantic Modeler - Working with Branches
At Rittman Mead we are evangelists of Gitflow - it works well with multiple developers working independently on their own features and allow us to be selective about what features go into the next release. The version control approach we have developed for OBIEE RPD versioning as part of our BI Developer Toolkit relies on Gitflow. However, here it is not available to us. No worries though - where there is a will, there is a way. Each of our devs can still have their own branch.
Before we start playing with branches, let us make sure our main branch is saved, checked in and pushed. To create a new branch, we click on the "Create Local Branch" button.
We base it on our current "main" branch. We name it "dev-janis" and click "Create".
If successful, the Current branch caption will change from "main" to "dev-janis". (An important part of version control discipline is to make sure we are working with the correct branch.)
In our dev branch, let us rename the "LAST_NAME" column to "Last Name".
Save.
Stage. Commit. Push.
Once pushed, we can check on the Git server, whether the new branch has been created and can explore its content.
We can also switch back to the "main" branch to check that the "LAST_NAME" column name remains unchanged there.
Git and Semantic Modeler - Merge Changes
The point of having multiple dev branches is to merge them at some point. How easy is that?
Let us start with changes that should merge without requiring conflict resolution.
In the previous chapter we have already implemented changes to the "dev-janis" branch. We could merge it to the "main" branch now but Git would recognise this to be a trivial fast-forward merge because the "main" branch has seen no changes since we created the "dev-janis" branch. In other words, Git does not need to look at the repository content to perform a 3-way merge - all it needs to do is repoint the "main" branch to the "dev-janis" branch. That is too easy.
Before merging, we will implement a change in the "main" branch.
We switch to the "main" branch.
We rename the "INITIALS" column to "Initials".
Save. Stage. Check in. Push.
Let us remind ourselves that in the "dev-janis" branch, the INITIALS column is not renamed and the LAST_NAME column is - there should be no merge conflicts.
To merge the "dev-janis" branch into the "main" branch, we switch to the "main" branch. We click the "Merge" button.
We select the Merge branch to be "dev-janis" and click "Merge".
The Merge Strategy value applies to conflict resolution. In our simple case, there will be no merge conflicts so we leave the Strategy as Default.
After the merge, I could see moth the "INITIALS" and the "LAST_NAME" columns renamed - the merge worked perfectly!
Save. Stage. Check In. Push.
Well, how easy was that!? Anybody who has managed an OBIEE RPD multidev environment will the new Semantic Modeler.
Git and Semantic Modeler - Merge Conflict Resolution
At last we have come to merges that require conflict resolution - the worst nightmare of OBIEE RPD multidev projects. How does it work with OAC's Semantic Modeler?
Let us create a trivial change that will require conflict resolution - we will rename the same column differently in two branches and then will merge them. The default Git merge algorithm will not be able to perform an automatic 3-way merge.
We use our trusted Customers dimension, we select the "main" branch and change the column "DOB" name to "Date of Birth".
"main" branch:
Save. Stage. Check in. Push.
In the "dev-janis" branch, we rename the same "DOB" column to "DoB".
"dev-janis" branch:
To merge, we switch back to the "main" branch. (Pull if necessary.) As the branch to be merged with, we choose "dev-janis".
As for Merge Strategy, we have 3 options here: Default, Ours and Theirs. (In our example, Ours would be the "main" branch whereas Theirs would be the "dev-janis".) We can use the Ours and Theirs strategies if we are 100% certain in case of a conflict we should always prefer the one or the other branch. In most cases however, we want to see what the conflicts are before deciding upon the resolution, therefore I expect the Default Strategy will almost always be used. You can read more about Merge Strategies in OAC's documentation here.
We call the Merge command.
As expected, we get merge errors - two of them. (The reason there are two is because our Business Layer column names are automatically propagated to the Presentation Layer - hence we get two errors - one from Business Layer and the other from Presentation.)
We click on the first conflict. Lo and behold - we get a proper A/B conflict merge window that does a decent job at showing us the merge conflict. However, I did find it to be a bit buggy - the "Resolve Item" and "Resolve All" buttons only work when you click on their edges. Also the A and B version toggle buttons did not work at all for me.
However, I got it working by using the Take All buttons, which worked fine for me, since there was only a single conflict to resolve. I wanted the B version so I clicked the Take All button in the B frame and then clicked the Resolve Item button and then pressed Ctrl+S. That did the trick and the error disappeared from the Merge Conflicts list. The same I did with the Presentation Layer conflict. After that, there were no more Merge conflicts in the Merge frame and I got a message there: "Merge successful".
And successful it was. This is the end result - the Customer dimension in the "main" branch after a conflict-resolved merge.
Version control merge conflict resolution can be challenging. I recommend you read the Understand and Resolve Merge Conflicts chapter from the OAC documentation.
Conclusions
As of the time of this writing, version control features like change staging, checking in, push and pull, switching between branches, appear to be rock-solid. When it came to merging and in particular merge conflict resolution, the version control functionality appears a bit more capricious but it still worked for me.
Overall, Git support in Semantic Modeler looks well designed and will be a huge improvement over OBIEE RPD versioning.
If you want to run advanced queries against your repository content from Python, if you want to auto-generate Semantic Model content with scripts, version control with Git will enable that. And it will be easier than with OBIEE RPD.
OAC Semantic Modeler and the SMML Language
There are a few reasons to like the Oracle Analytics Cloud (OAC) paired with the new Semantic Modeler:
- It works in the cloud. I mean, it actually works. (Check my blog post for an overview of Semantic Modeler.)
- It has a native, built in git integration. (A blog post about that is coming shortly.)
- The SMML language.
The SMML, which is short for Semantic Modeler Markup Language, is my personal favourite feature coming with the new Semantic Modeler.
The old UDML and the new SMML
In the old days the only way to manipulate the OBIEE RPD content automatically was to export the RPD in UDML format, tweak it and then import it back. UDML was proprietary, undocumented and an utter nightmare to parse - content analysis was usually done with sophisticated regex and it did not work very well. The use of UDML was mostly "local" when you copy an RPD object into Notepad, tweak it and then copy it back into the RPD. That all changed with the XML export feature - parsing of the whole repository became feasible, almost easy. At Rittman Mead we have parsed the entire RPD content and inserted it into a relational database to make it available for further analysis like lineage or data dictionary. The XML export capability was a game changer. However, for individual RPD object manipulation we were still stuck with UDML.
In OAC and with the new Semantic Modeler, UDML is replaced with SMML. It is documented. It is JSON-based, which I usually prefer to XML. Instead of modifying UDML at our own risk, Semantic Modeler objects are available in SMML format from the Modeler itself - the Modeler features a SMML editor.
Editing SMML in Semantic Modeler
To easiest access to Semantic Modeler objects in SMML format is from the Modeler itself. You right-click a Modeller object, choose Open in SMML Editor from the pop-up menu and a new tab will open with the script.
However, this way you are accessing only one object at a time. (You cannot multi-select objects to have them in a single SMML editor view.) Moreover, when opening a high-level object like a Business Model, you will not get full SMMLs of its Logical Tables - you will have to open the SMML Editor for each Logical Table separately. Also, some low-level objects like Columns do not have their own SMML - when opening SMML Editor for a particular Column, you will get the SMML code for the whole Table.
Let us give it a try. We go to the Logical Layer, double-click the "Dim - Customers" Logical Table to open it in designer.
Now, right-click the same "Dim - Customers" table and open the SMML Editor from the menu. The two tabs will sit next to each other.
Let us change a dataType value from NUMERIC to VARCHAR:
To save the changes, we either press Ctrl+S or click the save icon in the top right corner. Now let us go back to the designer tab. We see that the data type in the designer has changed from '99' to 'ab' - without refreshing the designer we can see the change there. (Note that the same change cannot be done from the designer - data types are normally derived from the Physical sources, instead of specified explicitly.)
Can repository content be broken in SMML Editor? Let us try that by renaming a JSON key:
When trying to save it, I get this error:
No explanation, no line number. In this case, the save did not take place. I can either revert the change in the SMML Editor itself or I can close the Editor tab and choose to discard changes.
However, the way errors come up is inconsistent. For some errors, I would get this message:
When I choose to proceed (common sense says No but I'm doing this for you!), the Dim - Customers dimension disappears from the Business Model...
But it is not all lost. It is now found in the Invalid Files section.
Here I can open it and this time the row with the error is actually highlighted. I remove the unneeded commas that make the JSON invalid and save it.
However, the file is still in the Invalid Files section and still missing from the Business Model - I don't think that is how it is supposed to work. Perhaps I am missing a simple trick here but I could not find an easy way of moving the Customer dimension back to the Business Model. The best solution for me was to use the git reset
command.
I will describe git integration in my next blog post.
The SMML editing works well but when you get it wrong, it does not do a good job at telling what and where the issue is. However, it does tell you there is an issue and when that happens, do not save it!
We have seen that SMML editing works well for individual objects. But how about a whole repository export and import?
Whole Repository SMML Export and Import
The documentation does not go into much detail on whole repository exports and imports. I found two ways of exporting the whole repository in SMML format and one for importing it.
The easiest way to export the whole Semantic Model is to open it in Modeler and then open the triple-dot menu from the top right corner.
There you can choose to Export and then specify the name of the Zip file.
Upon pressing Export, the zip file gets generated and your browser will download it. When unpacking the archive, this is what you see - (at least) three folders...
...with lots of JSON files in them - those are SMML scripts.
An alternative way of exporting the whole repository in SMML format is to upload it to a git repository. In git, the repository is stored in exactly the same format as the export zip file.
When the repository is in git, you can clone the repository locally and edit it with a text editor - it will be the same as with a zip export. However, when done, you can commit and push your changes back to the git repository and then pull the repository changes from Semantic Modeler - this is the only way I found to import the whole repository into Semantic Modeler.
Querying OAC Repository SMML Content with Python
Why do I love the SMML so much? Because it presents a great opportunity for repository content analysis and development acceleration.
Now that we have the entire OAC Repository exported in JSON format, we can use Python to query it or even modify its content. Python is very good at handling data in JSON format - it is much easier than XML.
Let us start with something simple - in Python we open the Customers dimension JSON to count columns in it:
import json
with open("D:\OAC-Semantic-Modeller\Repo\logical\HelloBusinessModel\Dim - Customers.json") as f:
customerJson = json.load(f)
print(f"Logical Table Name is {customerJson['logicalTable']['name']} and its type is {customerJson['logicalTable']['type']}.")
logicalColumnNames = [lc['name'] for lc in customerJson['logicalTable']['logicalColumns']]
print(f"The table has {len(logicalColumnNames)} columns:\n\t{', '.join(logicalColumnNames)}")
These are just a few lines of code and the output looks like this:
Note above that in the SMML JSON, the name
attribute is a single attribute whereas logicalColumns
is a list. In Python they become a single attribute and a list accordingly.
It would be easy to modify the above script to do the same for more than one table, to run the script regularly to keep track of table column changes...
However, it would be more useful to do queries across multiple repository objects. With a bit more effort, we can do that.
Let us set a task to get a list of all Logical Table Columns in the format <Logical Table name>.<Logical Column name> (<Logical Column data type>)
but without opening each JSON individually like we did in the first script. Instead we want to load the whole repository in memory and then run our queries.
import json
import os
from functools import reduce
def getFileContent(filePath):
with open(filePath) as f:
return json.load(f)
def getFolderContent(folderPath):
folderPaths = [{'name': f, 'path': os.path.join(folderPath, f)} for f in os.listdir(folderPath) if not os.path.isfile(os.path.join(folderPath, f))]
filePaths = [{'name': f, 'path': os.path.join(folderPath, f)} for f in os.listdir(folderPath) if os.path.isfile(os.path.join(folderPath, f))]
folderContent = [{f['name']: getFolderContent(f['path'])} for f in folderPaths]
fileContent = [{f['name']: getFileContent(f['path'])} for f in filePaths]
return reduce(lambda a, b: {**a, **b}, folderContent + fileContent)
smmlRootPath = "D:\OAC-Semantic-Modeller\Repo" # this is the root path to where I unzipped the SMML export
allRepositoryJson = getFolderContent(smmlRootPath)
# get all Logical Tables in the Repository
logicalTableNames = [lt['logicalTable']['name'] for lt in allRepositoryJson['logical']['HelloBusinessModel'].values()]
print(f"There are {len(logicalTableNames)} Logical Tables in the repository: {', '.join(logicalTableNames)}\n")
# get all Logical Table Columns
ltColumns = []
for ltName in logicalTableNames:
ltColumns.extend([
f"{ltName}.{lc['name']} ({lc['dataType']})"
for lc in allRepositoryJson['logical']['HelloBusinessModel'][ltName + '.json']['logicalTable']['logicalColumns']
])
div = "\n\t * "
print(f"There are {len(ltColumns)} Columns found across {len(logicalTableNames)} Logical Tables:{div}{div.join(ltColumns)}")
The script is still quite simple and now allows us to run pretty much any query we can think of against the repository. The (top of the) output looks like this:
The new SMML language allows for a much easier repository content analysis and manipulation with Python - I expect that no big OAC project in the future will go without a Python developer. Development standards checks, change monitoring, development speed measurement, lineage tracing, repetitive repository content generation - these are just a few scripting opportunities that come to mind.
Conclusions
- The SMML language is a major improvement over the OBIEE's UDML language;
- It is a good way of quickly editing a repository object in a text editor;
- Editing SMML is more dangerous than using the standard object designer. The SMML editor will give you warnings if you try to save something dodgy but you can break things;
- Use Python to run repository-wide queries and updates. It is easier than it was with OBIEE's XML extracts;
- If planning to use SMML editing extensively, enable git and follow best version control practices - if you do break something, make sure it is your own dev branch.
Giving OAC Semantic Modeler a Try
Kids just love it when old folk go on about good old days. The particular good old days I have in mind is when Oracle CRM On Demand was released. Competition in the Cloud CRM space was tough but the Oracle product had one trick up its sleeve: its data analytics capability that set it apart from the competition. It was almost scary how well CRM worked in the cloud, and Cloud BI surely was just round the corner. A few years later (time seems to pass quicker as you get older), Oracle Analytics Cloud (OAC) gets released and we can safely say that Cloud BI works. The Semantic Modeler that has been made available for evaluation on May 2022 as part of OAC, is a big step towards Cloud BI not just working but working well.
At the time of this writing, Semantic Modeler is available for evaluation only, hence needs to be enabled from the Navigator menu > Console > System Settings > Preview Semantic Modeler.
Once enabled, we are ready to try it out, to see how it compares to the old ways.
From the Create menu we select Semantic Model and choose to start with an empty Model.
Let us call our Model 'HelloSemanticModel'. Once it is created, its high-level building blocks are available as icons just under the Model name:
- Connections,
- Physical Layer,
- Logical Layer,
- Presentation Layer,
- Variables,
- Invalid Files.
The names of items 2., 3. and 4. look familiar - indeed they are borrowed from the three Layers of the OBIEE's RPD.
Semantic Model's Physical Layer
Here we manage Databases that consist of Physical Schemas, Tables, Table Aliases and Joins between them - much like in the RPD's Physical Layer. Let us create a new Database - from the New menu we select Database.
Interestingly, the Physical Layer is not where physical database connections are defined - they are defined outside the Semantic Model. But Connections are visible in the Semantic Modeler's Connections view - this is where we select a FACT table and drag and drop it into our newly created Database's Tables view. Helpfully, it offers to import the DIMs that are joined to it in the physical database and even creates joins.
However, the joins created are of little use to us - our FACT_ORDER_LINE fact joins to the DIM_ADDRESS dimension in 3 ways: as Customer, Outlet and Staff Address. Just like in the RPD, we fix that by using Table Aliases.
After creating Aliases for all Physical Tables, I was about to create Joins between them and then I saw this:
Columns in all my Physical Tables had been duplicated. Why? Because most of my Aliases were named the same as their base Physical Tables, the only difference being UPPER_CASE
replaced with Camel_Case
. It turns out, Semantic Modeler's name uniqueness checker is case sensitive whereas further down the line the case is ignored. The best way I found to fix this was to scrap and re-create the Semantic Model, while reminding myself this is just a preview.
After re-creating the Model and using proper naming for Aliases, the issue was fixed.
Before moving on to the next Layer, let us have a look at the Connection Pool tab. Instead of defining a connection from start to finish like in the RPD, here you select it from a drop-down and provide a bit of additional info like Timeout and DB Writeback.
Semantic Model's Logical Layer
Semantic Model's Logical Layer does the same job as the RPD's BMM Layer. Here, as expected, we turn Physical Database Tables into Facts and Dimensions and define Logical Joins between them. Lookup Tables don't get to hang out with the cool guys - they are separated from Facts and Dimensions in a different tab.
Do you notice something missing? That's right - no dimension hierarchies! That is because they are now part of the Dimension tables, which makes sense.
First we import the Fact table. Then the Dimensions. Helpfully, the dimensions that the Fact table can physically be joined with, are already pre-selected.
After importing the Dimensions, we want to define their Hierarchies. To do that, we have to right-click on each and choose Edit from the menu.
Then we go straight to the Hierarchy tab and define a Dimensional Hierarchy in a manner very similar to RPD.
Grand total:
Detail:
It looks very similar to RPD so we feel right at home here. (To build an entire Model in a single blog post, we rush forward without exploring the detail.)
Now that dimensional hierarchies are defined, we want to set dimensional levels for our Fact Source. We right-click our Fact table and choose Edit from menu.
We scroll down the Sources tab and, unsurprisingly, we find Data Granularity section. Setting dimensional levels here is nicer than in the RPD's Admin Tool - kids have it so easy these days!
Before we move on to the next Layer, let us have a look at Attribute expressions. We create a Full Name
attribute for the Dim - Customers
dimension as a concatenation of First and Last Names.
Notice the Aggregation Rule option here, relevant for facts.
Semantic Model's Presentation Layer
Not much going on here - we create a Subject Area...
... and add tables to it.
All done! Well, almost. Just like in the Admin Tool, we want to check consistency.
Is Cloud BI the Future?
It has been a long wait but now it looks like it! Whilst not perfectly stable and production-ready, using the Semantic Modeler is a good experience. In this blog post we went through the creation of a very simple dimensional model. I did not mention the very promising lineage visualisation capability that the Semantic Modeler now offers, the new and much nicer SMML language instead of the old UDML language of the OBIEE's RPD, Git support. But those are so exciting they deserve their own blog posts. The future looks bright for Cloud BI. Back in the day when we worked with Siebel Analytics, we could only dream of...
Oracle APEX – Basic Business Questions
It could be the case that you have heard of Oracle APEX lately. Maybe not just once, but a couple of times and maybe more often than ever before. We I will try to answer a couple of basic questions in this blog, that might help you understand why Oracle APEX is great and is becoming very popular.
What is Oracle APEX? What is Oracle APEX used for? What is Oracle ORDS? is Oracle APEX free? How popular is Oracle APEX? Is Oracle APEX low-code or no-code? How can I learn Oracle APEX?
If you are still doubting Application Express super powers, this blog might help you clarify.
What is Oracle APEX?
Oracle Application Express (APEX) is a low-cost, easy-to-use development environment that enables you to create sophisticated, database-driven applications. With APEX, you can quickly develop applications that run on the Oracle Database.
APEX enables you to develop and deploy web-based applications on the Oracle Database. It includes an IDE with a set of tools to accelerate the development process. APEX is fully supported and comes with a set of sample applications.
What is Oracle APEX used for?
APEX is the perfect tool for creating small, departmental applications, prototyping new ideas or even creating large enterprise applications. Given that it follows the latest web standards, it makes it simple to create any type of application that can be handled by your browser.
Some common use cases for Oracle Apex include:
- Opportunistic Apps
- Spreadsheet Replacement
- Oracle Forms Modernization
- External Data Sharing
- Datamart Reporting
- Enterprise-scale Apps
- SaaS and EBS Extensions
Is Oracle APEX free?
I would like to say NO to this question, as the price seems to be one of the key factors in APEX's unpopularity a couple of years ago. Free things are not very welcomed in enterprises. But... YES, Oracle Application Express (APEX) is free. It is a fully supported no-cost feature of the Oracle Database. It can run as well in the free version of the database (XE).
How popular is Oracle APEX?
There is no definitive answer to this question as popularity is subjective. However, a quick Google search shows that there are many articles and blog posts discussing the popularity of Oracle APEX. For some real examples, you can check this site or contact us for a demo :). Maybe checking this post from Oracle is worth it.
Is Oracle APEX low-code or no-code?
Oracle APEX can be considered low-code as it allows users to develop applications without writing extensive amounts of code. As Oracle said:
Build enterprise apps 20x faster with 100x less code.
but I believe the good thing is, that you can execute any PL/SQL logic or reuse any views that you have already in your database.
How can I learn Oracle APEX?
Generally speaking, if you know SQL and PL/SQL, you might be halfway there. Add some Web technologies knowledge (HTTP, HTML, CSS, JS, etc.) and you will be able to deploy your first app in no time.
There are a few ways that you can learn Oracle APEX:
- Take an Oracle APEX training course. Contact our training department if need help with that.
- Read the Oracle APEX documentation.
- Download the Oracle APEX development environment from the Oracle website and build some sample applications.
- Attend some of the many Oracle APEX community events.
What is Oracle ORDS?
Oracle ORDS (Oracle Rest Data Services) is a tool that allows developers to quickly create RESTful API endpoints for data stored in an Oracle database.
ORDS can be used with any Oracle database and makes it easy to build RESTful services that expose the data in the database.
In the APEX case, is the middle-tier platform that allows a client (browsers) to reach APEX scripts to receive data back from the database (an HTML document to be rendered for example).