Using Tableau to Show Variance and Uncertainty
Recently, I watched an amazing keynote presentation from Amanda Cox at OpenVis. Toward the beginning of the presentation, Amanda explained that people tend to feel and interpret things differently. She went on to say that, “There’s this gap between what you say or what you think you’re saying, and what people hear.”
While I found her entire presentation extremely interesting, that statement in particular really made me think. When I view a visualization or report, am I truly understanding what the results are telling me? Personally, when I’m presented a chart or graph I tend to take what I’m seeing as absolute fact, but often there’s a bit of nuance there. When we have a fair amount of variance or uncertainty in our data, what are some effective ways to communicate that to our intended audience?
In this blog I'll demonstrate some examples of how to show uncertainty and variance in Tableau. All of the following visualizations are made using Tableau Public so while I won’t go into all the nitty-gritty detail here, follow this link to download the workbook and reverse engineer the visualizations yourself if you'd like.
First things first, I need some data to explore. If you've ever taken our training you might recall the Gourmet Coffee & Bakery Company (GCBC) data that we use for our courses. Since I’m more interested in demonstrating what we can do with the visualizations and less interested in the actual data itself, this sample dataset will be more than suitable for my needs. I'll begin by pulling the relevant data into Tableau using Unify.
If you haven't already heard about Unify, it allows Tableau to seamlessly connect to OBIEE so that you can take advantage of the subject areas created there. Now that I have some data, let’s look at our average order history by month. To keep things simple, I’ve filtered so that we’re only viewing data for Times Square.
From this simple visualization we can already draw some insights. We can see that the data is cyclical, with a peak early in the year around February and another in August. We can also see that the minimum number of orders in a month appears to be about 360 while the maximum is just under 400.
When someone asks to see “average orders by month”, this is generally what people expect to see and depending upon the intended audience a chart like this might be completely acceptable. However, when we display aggregated data we no longer have any visibility into the variance of the underlying data.
If we display the orders at the day level instead of month we can still see the cyclical nature of the data but we also can see additional detail and you’ll notice there’s quite a bit more “noise” to the data. We had a particularly poor day in mid-May of 2014 with under 350 orders. We’ve also had a considerable number of good days during the summer months when we cleared 415 orders.
Depending upon your audience and the dataset, some of these charts might include too much information and be too busy. If the viewer can’t make sense of what you’re putting in front of them there’s no way they’ll be able to discern any meaningful insights from the underlying dataset. Visualizations must be easy to read. One way to provide information about the volatility of the data but with less detail would be to use confidence bands, similar to how one might view stock data. In this example I’ve calculated and displayed a moving average, as well as upper and lower confidence bands using the 3rd standard deviation. Confidence bands show how much uncertainty there is in your data. When the bands are close you can be more confident in your results and expectations.
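If you want to sanity-check those bands outside of Tableau, the underlying arithmetic is just a moving average with bands three standard deviations either side. Here’s a minimal Python/pandas sketch of that idea; the column names, window size and placeholder values are my own assumptions rather than anything taken from the GCBC workbook.
import pandas as pd
# Hypothetical daily order counts; in the workbook this comes from the GCBC data via Unify.
orders = pd.DataFrame({
    "order_date": pd.date_range("2014-01-01", periods=365, freq="D"),
    "orders": 380,  # placeholder values
})
window = 30  # assumed smoothing window in days
rolling = orders["orders"].rolling(window)
orders["moving_avg"] = rolling.mean()
orders["upper_band"] = rolling.mean() + 3 * rolling.std()  # wide bands = more uncertainty
orders["lower_band"] = rolling.mean() - 3 * rolling.std()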
An additional option is the use of a scatterplot. The awesome thing about a scatterplot is that not only does it allow you to see the variance of your data, but if you play with the size of your shapes and tweak the transparency just right, you also get a sense of the density of your dataset because you can visualize where those points lie in relation to each other.
The final example I have for you shows the distribution of your data using a boxplot. If you’re not familiar with boxplots, the line in the middle of the box is the median. The bottom and top of the box, known as the bottom and top hinges, give you the 25th and 75th percentiles respectively, and the whiskers extending beyond the box show the minimum and maximum values, excluding any outliers. Outliers are shown as dots.
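If you’d like to verify what a boxplot is showing you, the hinges are just percentiles. Below is a small Python sketch with made-up order counts, using the common 1.5 × IQR convention for flagging outliers.
import numpy as np
orders = np.array([352, 361, 368, 371, 375, 379, 382, 386, 390, 398, 415])  # made-up daily order counts
median = np.percentile(orders, 50)       # line in the middle of the box
lower_hinge = np.percentile(orders, 25)  # bottom of the box
upper_hinge = np.percentile(orders, 75)  # top of the box
iqr = upper_hinge - lower_hinge
# Points beyond 1.5 * IQR from the hinges are usually drawn as outlier dots
outliers = orders[(orders < lower_hinge - 1.5 * iqr) | (orders > upper_hinge + 1.5 * iqr)]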
I want to take a brief moment to touch on a fairly controversial subject of whether or not to include a zero value in your axes. When you have a non-zero baseline it distorts your data and differences are exaggerated. This can be misleading and might lead your audience into drawing inaccurate conclusions.
For example, a quick Google search revealed this image on AccuWeather showing the count of tornadoes in the U.S. for 2013-2016. At first glance it appears as though there were almost 3 times more tornadoes in 2015 than in 2013 and 2014, but that would be incorrect.
On the flipside, there are cases where slight fluctuations in the data are extremely important but are too small to be noticed when the axis extends to zero. Philip Bump did an excellent job demonstrating this in his "Why this National Review global temperature graph is so misleading" article in The Washington Post.
Philip begins his article with this chart tweeted by the National Review which appears to prove that global temperatures haven’t changed in the last 100 years. As he goes on to explain, this chart is misleading because of the scale used. The y-axis stretches from -10 to 110 degrees making it impossible to see a 2 degree increase over the last 50 years or so.
The general rule of thumb is that you should always start from zero. In fact, when you create a visualization in Tableau, it includes a zero by default. Usually I agree with this rule, and the vast majority of the time I do include a zero, but I don’t believe there can be a hard and fast rule as there will always be an exception. Bar charts are used to communicate absolute values, so the size of a bar needs to be proportional to the overall value. I agree that bar charts should extend to zero, because if they don’t we distort what the data is telling us. With line charts and scatterplots we tend to look at the positioning of the data points relative to each other. Since we’re not as interested in the absolute value of the data, I don’t feel the decision to include a zero or not is as cut and dried.
The issue boils down to what it is you’re trying to communicate with your chart. In this particular case, I’m trying to highlight the uncertainty so the chart needs to draw attention to the range of that uncertainty. For this reason, I have not extended the axes in the above examples to zero. You are free to disagree with me on this, but as long as you’re not intentionally misleading your audience I feel that in instances such as these this rule can be relaxed.
These are only a few examples of the many ways to show uncertainty and variance within your data. Displaying the volatility of the data and giving viewers a level of confidence in the results is immensely powerful. Remember that while we can come up with the most amazing visualizations, if the results are misleading or misinterpreted and users draw inaccurate conclusions, what’s the point?
OBIEE 12c Catalog Validation: Command Line
I wrote a blog post a while ago describing the catalog validation: an automated process performing a consistency check of the catalog and reporting or deleting the inconsistent artifacts.
In that post I stated that catalog validation should be run regularly as part of the cleanup routines, and that it provides valuable additional information during the pre- and post-upgrade phases.
However, some time later I noticed Oracle's support Doc ID 2199938.1 stating that the startup procedure I detailed in that blog post is not supported in any OBI release since 12.2.1.1.0. You can imagine my reaction...
The question then became: how do we run the catalog validation if the known procedure is unsupported? The answer is in Catalog Manager and the related command-line call runcat.sh which, in server installations (like SampleApp v607p), can be found under $DOMAIN_HOME/bitools/bin.
How Does it Work?
As with most command-line tools, when you don't have a clue how it works the best approach is to run it with the -help option, which provides the list of parameters to pass.
Catalog Manager understands commands in the following areas:
Development To Production
createFolder Creates folder in the catalog
delete Deletes the given path from the catalog
maintenanceMode Puts the catalog into or out of Maintenance Mode (aka ReadOnly)
...
Multi-Tenancy
provisionTenant Provisions tenants into a web catalog
...
Patch Management
tag Tags all XML documents in a catalog with a unique id and common version string
diff Compares two catalogs
inject Injects a single item to a diff file
...
Subject Area Management
clearQueryCache Clears the query cache
Unfortunately none of the options in the list seems relevant for catalog validation, but with a close look at the recently updated Doc ID 2199938.1 I could find the parameter to pass: validate.
The full command then looks like
./runcat.sh -cmd validate
In my previous blog I mentioned different types of validation. What type of validation is the default command going to implement? How can I change the behaviour? Again, the -help option provides the list of instructions.
# Command : -cmd validate -help
validate Validates the catalog
Description
Validates the catalog
For more information, please see the Oracle Business Intelligence Suite
Enterprise Edition's Presentation Services Administration Guide.
Syntax
runcat.cmd/runcat.sh -cmd validate
[ -items (None | Report | Clean) [ -links (None | Report | Clean) ] [-folder <path{:path}>] [-folderFromFile <path of inclusion list file>] ]
[ -accounts (None | Report | Clean) [ -homes (None | Report | Clean) ] ]
-offline <path of catalog>
Basic Arguments
None
Optional Arguments
-items (None | Report | Clean) Default is 'Report'
-links (None | Report | Clean) Default is 'Clean'. Also, '-items' cannot be 'None'.
-accounts (None | Report | Clean) Default is 'Clean'
-homes (None | Report | Clean) Default is 'Report'. Also, '-accounts' cannot be 'None'.
-folder <path{:path}> Which folders in the catalog to validate
-folderFromFile <path of inclusion list file> File containing folders in the catalog to validate
Common Arguments
-offline <path of catalog>
-folderFromFile <folder from file> ----- Sample Folder From File ------
/shared/groups/misc
/shared/groups/_filters
------------------------------------
Example
runcat.cmd/runcat.sh -cmd validate -offline c:\oraclebi\data\web\catalog\paint
A few bits to notice:
- -offline: the catalog validation needs to happen offline, either with services down or on a copy of the live catalog. Running catalog validation on an online catalog is dangerous, especially with the "Clean" options, since it could delete content in use.
- -folder: the catalog validation can be run only for a subset of the catalog
- None | Report | Clean: each validation can be skipped (None), logged (Report) or solved via removal of the inconsistent object (Clean)
- Also, '-accounts' cannot be 'None': some validations are a prerequisite for others to happen
- Default is 'Clean': some validations have 'Clean' as the default value, meaning they will resolve issues by removing the inconsistent objects; this may be inappropriate in some cases.
As written before, the initial catalog validation should be done with all options set to Report, since this gives a log file of all inconsistencies without deleting pieces of the catalog that could still be valuable. In order to do so, the command to execute is:
./runcat.sh -cmd validate -items Report -links Report -accounts Report -homes Report -offline <path_to_catalog> > cat_validation.log
The runcat.sh output is displayed directly in the console; I'm redirecting it to a file called cat_validation.log for further analysis.
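Because the log can get long, a few lines of Python make it easy to summarise which checks are firing most often. This is just a convenience sketch that assumes the message format shown later in this post (a code such as E10 or N1 followed by the check name); adjust the parsing if your runcat.sh version formats its output differently.
from collections import Counter

counts = Counter()
with open("cat_validation.log") as log:
    for line in log:
        parts = line.split()
        # Expect lines like: "E10 saw.security.validate.homes Abandoned home /users/weblogic"
        if len(parts) >= 2 and parts[1].startswith("saw."):
            counts[parts[1]] += 1

for check, n in counts.most_common():
    print(f"{n:6d}  {check}")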
If, after the initial run with all options set to Report, you want the catalog validation utility to "fix" the inconsistent objects, just change the desired options to Clean. Please make sure to take a backup of the catalog beforehand, since the automatic fix is done by removing the related objects. Moreover, ensure that catalog validation is working on an offline catalog. The command itself can work on top of an online catalog, but it is never a good idea to check a catalog that could potentially be changed while the tool is running.
The output
Let's see a few examples of how Catalog Validation spots inconsistent objects. For the purpose of this test I'll work with Oracle's SampleApp.
Abandoned and inaccessible homes
Running the validation against the SampleApp catalog provides some "interesting" results: some homes are declared "abandoned". This could be due to the related user no longer existing in the WebLogic console, but that's not the case here:
E10 saw.security.validate.homes Abandoned home /users/weblogic
Looking deeper in the logs we can see that the same user folders are flagged as
User facing object '/users/weblogic' has no user permissions and is inaccessible
Logging in with the weblogic user doesn't allow me to check "My Folders" in the catalog. When switching to "Admin View" and trying to open "My Folder" I get the following error.
As written in the logs, it looks like the user folder has permission problems. How can we solve this? One option is to use the runcat.sh command again with the forgetAccounts option to remove the inconsistent homes. However, this solution deletes all the content related to the user that was stored under "My Folders".
In order to keep the content we need to overwrite the folder's permission with an administrator account. Unfortunately, when right-clicking on the folder, the "Permission" option is not available.
As a workaround I found that clicking on Properties and then on Set Ownership of this item and all subitems allows you to grant full access to the administrator, who is then able to give the relevant user back the appropriate access privileges.
Once the workaround is implemented the user is able to check his "My Folders" content; however, the errors are still present in catalog validation. The solution is to store the relevant artifacts in another part of the catalog, run runcat.sh with the forgetAccounts option and then reimport the objects if needed.
Inconsistent Objects
The main two reasons generating inconsistent objects are:
- Invalid XML: The object (analysis or dashboard) XML code is not valid. This can be caused by errors during the write to disk or problems during migrations.
- Broken Links: analyses contained in a dashboard or linked from other analyses have been renamed or deleted.
Let's see how catalog validation shows the errors.
Invalid XML
To test this case I created a simple analysis with two columns, then went to the Advanced tab and deliberately removed a > to make the XML invalid.
When trying to apply the change I got the following error, which denied me the possibility to save.
Since I really wanted to ruin my analysis I went directly to the file system under $BI_HOME/bidata/service_instances/ssi/metadata/content/catalog/root/shared/$REQUEST_PATH and changed the XML directly there.
After that I ran the catalog validation with only the items flag set to Report and the rest set to None, since I'm looking only at invalid XMLs.
The result as expected is:
Message: Unterminated start tag, 'saw:column', Entity publicId: /app/oracle/biee/user_projects/domains/bi/bidata/service_instances/ssi/metadata/content/catalog/root/shared/rm+demo/notworkinanalysis, Entity systemId: , Line number: 9, Column number: 13
This tells me that my analysis notworkinganalysis is invalid, with an unterminated start tag: exactly the error I was expecting. Now I have two choices: either fix the analysis XML manually or rerun the catalog validation with the Clean option, which will delete the analysis since it's invalid. As said before, there is no automated fix.
I wanted to do a further example on this: instead of removing the >, I removed a quotation mark " to make the analysis invalid.
After clicking Apply, OBIEE already tells me that there is something wrong with the analysis. But since it allows me to save, and since I was feeling masochistic, I saved the analysis anyway.
But... when running the catalog validation as before, I end up seeing 0 errors related to my notworkinganalysis.
The answer to the Jackie Chan question is that I got 0 errors because in this second case the XML is still valid: removing a " doesn't make the XML syntax invalid! In order to find and solve that error we would need to use Oracle's Baseline Validation Tool.
Broken Links
To test the broken links case I created the following scenario:
- Analysis SourceAnalysis, which has a navigation action to TargetAnalysis
- Dashboard TestDashboard, which contains the TargetAnalysis object
In order to break things I then deleted TargetAnalysis.
I then ran catalog validation with the links option set to Report. As expected I get the line:
N1 saw.catalog.impl.scour.validateDeadLink Referenced path /shared/RM Demo/TargetAnalysis in file /shared/RM Demo/_portal/TestDashboard/page 1 is inaccessible.
But I don't get anything on the SourceAnalysis object, for which the navigation is now failing.
But if, instead of an action link, I use TargetAnalysis to filter the results of SourceAnalysis and then delete TargetAnalysis, I get the expected error:
N1 saw.catalog.impl.scour.validateDeadLink Referenced path /shared/RM Demo/TargetAnalysis in file /shared/RM Demo/SourceAnalysis is inaccessible
Summarizing: the broken link validation reports missing objects that are included in the main definition of other objects (as filters or as parts of dashboards), but it doesn't seem to report missing objects that are only linked via an action.
Conclusion
My experiments show that catalog validation finds errors like invalid homes, invalid XML files and broken links, which users would otherwise hit at run time, and that won't make them happy. But there are still some errors it doesn't log, like analyses with wrong column syntax. Luckily, in most cases other tools like the Baseline Validation Tool can spot them easily, so use all the tools you have and use them as frequently as possible. And if you want more details about how catalog validation works and how it can be included in the automatic checks for code promotions, don't hesitate to contact us!
Getting Smarter in Renting with Tableau 10
Preface
Not long ago a friend of mine spent a significant amount of time trying to find a flat to rent, and according to him it wasn't an easy task. It took him considerable time and effort to find something that was big enough (but not too big), not too far from his workplace, had the required features and was affordable at the same time. As a specialist in data analysis, I prefer to think about this task as a data discovery one (yes, when you have a hammer everything looks like a nail). So I decided to see if a data analysis tool could help me understand the rental market better. I'm sure you've already read the name of this post, so I can't pretend I'm keeping the intrigue: this tool is Tableau 10.3.
The Data
The friend I was talking about before was looking for a flat in Moscow, but I think that market is completely unknown to most readers, and I'd also have to spend half my time translating everything into English. So for this exercise I took Brighton and Hove data from http://rightmove.co.uk and got a nice JSON Lines file. JSON Lines files are basically the same JSON we all know, but every file contains multiple JSON documents delimited by newlines.
{json line #1}
{json line #2}
...
{json line #n}
That could be a real problem, but luckily Tableau introduced JSON support in Tableau 10.1, which means I don't have to transform my data into a set of flat tables. Thanks to Tableau developers we may simply open JSON Lines files without any transformations.
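Just to make the format concrete, this is roughly what reading a JSON Lines file looks like in Python (the file name here is made up); Tableau 10.1+ does the equivalent natively, so this step isn't needed for the analysis itself.
import json

properties = []
with open("brighton_and_hove.jsonl") as f:  # hypothetical file name
    for line in f:
        line = line.strip()
        if line:                                  # skip blank lines
            properties.append(json.loads(line))   # one JSON document per line

print(len(properties), "properties loaded")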
Typical property description looks like this:
It has a few major blocks:
- Property name - 2 bedroom apartment to rent
- Monthly price - £1,250
- Description tab:
- Letting information - this part is more or less standard and has only a small number of possible values. It has a 'Property name':'Property value' structure ('Date available':'Now').
- Key features - this part is an unformalized set of features. Every property may have its own unique features. It is not a key-value list like Letting information, but a simple list of features.
- Full description - simply a block of unstructured text.
- Nearest stations - shows three nearest train stations (there could be underground stations too if they had it in Brighton).
- School checker - this shows 10 closest primary and 10 secondary schools. For this, I found a kind of API which brought me a detailed description of every school.
And finally, the JSON for one property looks as follows. In reality it is one line, but to make it easier to read I formatted it into a human-readable format. I also deleted most of the schools' info, as it is not as important as it is huge.
Property JSON
{
"furnish":"Unfurnished",
"key_features":[
"LARGE BRIGHT SPACIOUS LOUNGE WITH PATIO DOORS",
"FULLY FITTED KITCHEN",
"TWO DOUBLE BEDROOMS WITH WARDROBES",
"A FURTHER SINGLE BEDROOM/OFFICE/STUDY",
"A GOOD SIZED SHOWER ROOM ",
"SINGLE GARAGE AND ON STREET PARKING",
"EASY ACCESS TO THE CITY CENTRE OF CHICHESTER AND COMMUTER ROUTES. ",
"TO ARRANGE A VIEWING PLEASE CONTACT US ON 01243 839149"
],
"property_price_week":"£254 pw",
"nearest_stations":[
{
"station_name":"Fishbourne",
"station_dist":"(0.4 mi)"
},
{
"station_name":"Chichester",
"station_dist":"(1.2 mi)"
},
{
"station_name":"Bosham",
"station_dist":"(1.7 mi)"
}
],
"letting_type":"Long term",
"secondary_schools":{
"schools":[
{
"distance":"0.6 miles",
"ukCountryCode":"ENG",
"name":"Bishop Luffa School, Chichester",
...
}]
},
"url":"http://www.rightmove.co.uk/property-to-rent/property-66941567.html",
"date_available":"Now",
"date_reduced":"",
"agent":"On The Move, South",
"full_description":"<p itemprop=\"description\">We are delighted to bring to market, this fabulous semi detached bungalow ... </p>",
"primary_schools":{
"schools":[
{
"distance":"0.3 miles",
"ukCountryCode":"ENG",
"name":"Fishbourne CofE Primary School",
}]
}
},
"property_address":[ "Mill Close, Chichester, West Sussex, PO19"],
"property_name":"3 bedroom bungalow to rent",
"date_added":"08 June 2017 (18 hours ago)",
"property_price_month":"£1,100 pcm",
"let_agreed":null,
"unknownown_values":"",
"deposit":"£1384"
}
The full version is here: 6391 lines, I warned you. My dataset is relatively small and has 1,114 such records, 117 MB in total.
Just a few things I'd like to highlight. Letting information has only a small number of fixed unique options; I managed to parse them into fields like furnish, letting_type, etc. The Key Features list became just an array: we have thousands of various features here and I can't put them into separate fields. The Nearest stations list became an array of name and value pairs. My first version of the scraper put them into a key-value list, like this:
"nearest_stations":[
"Fishbourne": "(0.4 mi)",
"Chichester": "(1.2 mi)",
"Bosham": "(1.7 mi)"
]
but this didn't work as intended. I got around one hundred measures with names like Fishbourne, Chichester, Bosham, etc. Not what I need. That could work well if I had only a small number of important POIs (airports, for example) and wanted to know the distances to those points. So I changed it to this, and it worked well:
"nearest_stations":[
{
"station_name":"Fishbourne",
"station_dist":"(0.4 mi)"
},
{
"station_name":"Chichester",
"station_dist":"(1.2 mi)"
},
{
"station_name":"Bosham",
"station_dist":"(1.7 mi)"
}
]
Connect to the Data
When I started this study my knowledge of the UK property rent market was close to this:
And it's possible or even likely that some of my conclusions may be obvious for anyone who is deep in the topic. In this blog, I show how a complete newbie (me) can use Tableau and become less ignorant.
So my very first task was to understand what kind of objects are available for rent, what are their prices and so on. That is the typical task for any new subject area.
As I said before, Tableau 10 can work with JSON files natively, but the question was whether it could work with a JSON as complex as mine. I started a new project and opened my JSON file.
I expected that I would have to simplify it somehow. But in reality, after a few seconds of waiting, Tableau displayed the full structure of my JSON and all I had to do was select the branches I needed.
After a few more seconds I got a normal Tableau data source.
And this is how it looked in analysis mode:
First Look at the Data
OK, let's get started. The first question is obvious: "What types of property are available for rent?". Well, it seems that name ('2 bedroom apartment to rent') is what I need. I created a table report for this field.
Well, it gives me a first impression of what objects are offered and what my next step should be. First of all, the names all end with "to rent". This just makes the strings longer without adding any value. The word "bedroom" also doesn't look important. Ideally, I'd like to parse these strings into two fields, one of which is # of bedrooms and the other Property type. The most obvious action is to try the Split function.
Well, it partially worked. This function is smart enough and removed the 'to rent' part. But other than that, it gave me nothing. On other datasets (other cities) it gave me much better results, but it still wasn't able to read my mind and do what I wanted:
But I spent 15 seconds on this and lost nothing, and if it had worked I'd have saved a lot of time. Anyway, I'm too old to believe in magic and this almost didn't hurt my feelings.
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
Yes, this string literally asks for some regular expression wizardry.
I can easily use REGEXP_EXTRACT_NTH and get what I want. Group 1 is the number of bedrooms and Group 3 is the property type. Groups 2 and 4 are just constant words.
Explanation for my regular expression
I can describe most of the names in the following way: "digit bedroom property type to rent" and the rest are "property type to rent". So digit and bedroom are optional, and "property type to rent" is mandatory. The expression is easy and obvious: ([0-9]*)( bedroom )*(.*)( to rent)
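The grouping logic is easy to test outside Tableau too. Here is a quick Python sketch of the same expression applied to a few sample names; group 1 and group 3 are the same group numbers you would pass to REGEXP_EXTRACT_NTH.
import re

pattern = re.compile(r"([0-9]*)( bedroom )*(.*)( to rent)")

for name in ["2 bedroom apartment to rent",
             "3 bedroom bungalow to rent",
             "Studio flat to rent",
             "Garage to rent"]:
    m = pattern.match(name)
    bedrooms = m.group(1)    # group 1: number of bedrooms (may be empty)
    prop_type = m.group(3)   # group 3: property type
    print(f"{name!r:35} -> bedrooms={bedrooms!r}, type={prop_type!r}")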
Regular expressions are one of my favourite hammers and helped me a lot in this analysis. And after all the manipulations, I got a much better view of the data (I skipped some obvious steps like creating a crosstab or a count distinct measure to save space for something more interesting).
And while this result looks pretty simple, it gives me the first insight I can't get simply by browsing the site. The most offered properties are 1 and 2 bedroom ones, especially flats and apartments. And if a family needs something bigger with 4 or 5 bedrooms, well, I wish them good luck; there are not many offers to choose from. Also, if we talk about living property only, we should filter out things like GARAGE, PARKING or LAND.
I think both charts work pretty well. The first one presents a nice view of how flats and apartments outnumber all other types, and the second one gives a much better understanding of how many 2 bedroom properties are offered compared to all others.
And while I'm not a big fan of fancy visualisations, if you need something less formal and more eye-catching try a Bubbles chart. It's not something I'd recommend for analysis, but it may work well for a presentation. Every bubble represents a particular property type, colour shows the number of bedrooms and size shows the number of properties.
Going Deeper
The next obvious question is the price. How much do different properties cost? Is any particular one more expensive than average or less? What influences the price?
As a baseline, I'd like to know the average property price. And I obviously don't want just one figure for the city-wide price; it's meaningless. Let's start with a bar chart and see what the range of prices is.
Well, we have a lot of options. A flat share costs less than £700, or we may choose a barn for more than £3600. Again a very simple result, but one I can't get directly from the site.
The next obvious question is how the number of bedrooms affects the price. Does the price skyrocket with every additional bedroom, or do more bedrooms mean smaller rooms and a price that increases more slowly?
Well, this chart gives me the answer but it looks bad, mostly because a lot of property types don't have enough variance in room number. Studio flats have only one bedroom by definition and the only converted barn has 7 bedrooms. I'd like to remove types which don't have at least 3 options and see how the price changes. For this, I created a new computed field using the fixed keyword. It counts the number of bedroom options by property type.
And then I use it in the filter 'Bedroom # variance' at least 3. Now I have a much cleaner view, and I can see that typically more bedrooms mean a significantly higher price, with a few exceptions. But in fact, these are not actual exceptions, just a problem of a small dataset. I can say that an increase in # bedrooms certainly means a significant increase in price. And one more insight: going above 7 bedrooms may actually double the price.
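For anyone who prefers reading code to LOD screenshots, the logic of that computed field and filter is roughly the following pandas sketch. The column names and sample rows are my own illustration, not the actual workbook formula.
import pandas as pd

# Assumed input: one row per property with its type, bedroom count and monthly price
df = pd.DataFrame({
    "property_type": ["flat", "flat", "flat", "house", "house", "studio"],
    "bedrooms":      [1, 2, 3, 3, 4, 1],
    "price_month":   [800, 1200, 1500, 1600, 2000, 700],
})

# Equivalent of the FIXED LOD: number of distinct bedroom options per property type
df["bedroom_variance"] = df.groupby("property_type")["bedrooms"].transform("nunique")

# Keep only types with at least 3 bedroom options, like the 'Bedroom # variance' at least 3 filter
filtered = df[df["bedroom_variance"] >= 3]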
Averages are good, but they hide important information about how prices are distributed. For example, six properties priced at £1K and one at £200 give an average of £885, and looking at the average only may make you think that with £900 you may choose one of 7 options. It's very easy to build a chart to check this: just create a new calculation called Bins and use it in a chart.
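The Bins field is nothing more than the price rounded down to the nearest £100. As a sketch (column name and values assumed):
import pandas as pd

# Assumed input: one row per property with its monthly price
prices = pd.DataFrame({"price_month": [695, 1047, 1150, 1250, 1399, 2100]})

# £100-wide bins: a property at £1,047 pcm falls into the £1,000 bin
prices["price_bin"] = (prices["price_month"] // 100) * 100

# Count of properties per bin, which is the basis for the histogram
histogram = prices.groupby("price_bin").size()
print(histogram)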
With £100 bins I got the following chart. It shows how many properties have a price falling into a particular range. For example, the £1000 bin shows the number of properties with prices of £1000-£1100.
The distribution looks more or less as expected, but the most interesting thing here is that the £1000-£1100 interval seems to be very unpopular. Why? Let's add # of bedrooms to this chart.
£1000 is too expensive for 1 bedroom properties and studios, but too cheap for two bedrooms. Simple. What else can we do here before moving further? Converting this chart to a running total gives a cool view.
What can this chart tell us? For example, if we look at the orange line (2 bedrooms) we will find that with £1200 we may choose among 277 of 624 properties. With a £1400 budget we have 486 of 624. A further £200 increase in budget won't significantly increase the number of possibilities: while the change from £1200 to £1400 almost doubled the number of options, the next £200 gives only 63 new ones. I don't have a ready-to-use insight here, but I got a way to estimate a budget for a particular type of property: for a budget of £X I can tell how many properties there are to choose from.
Why It Costs What It Costs
OK, now I know a lot of statistics about prices. My next question is about the factors affecting the price. I'd like to understand whether a particular property is worth what it costs or not. Of course, I won't be able to determine the exact price, but even hints may be useful.
The first hypothesis I want to check is whether a nearby train station raises the price or doesn't matter at all. I made a chart very similar to the previous one and it seems that the Pareto principle works perfectly here: 80% of properties are closer than 20% of the maximum distance to a station.
But this chart doesn't say anything about the price; it just gives me an understanding of how densely train stations are placed. I'd say that most of the properties have a station within a 10-15 minute walk, and therefore this should not significantly affect the price. My next chart is a scatter plot of price and distance. Every point is a property and its coordinates on the plot are determined by its price and distance to the nearest station. Colour shows # of bedrooms.
I'd say that this chart shows no clear correlation between price and distance. And a more classical line chart shows the same.
The maximum price slightly decreases with distance, while the minimum price, on the contrary, increases. The average price is more or less constant. I think the hypothesis is busted: there is no clear correlation between the distance a tenant has to walk to a station and the price he has to pay. If you want to rent something and the landlord says the price is high because of a nearby train station, tell him that there are stations all around and he should find a better argument.
What about furnishings? Is it cheaper to get an unfurnished property, or will a landlord be happy to meet someone who shares his taste?
Unfurnished property is definitely cheaper. And it's interesting that in some cases partly furnished is even cheaper than completely unfurnished. But at least for furnished/unfurnished we can see a clear correlation: when you see a furnished one for the price of an unfurnished one, it may be a good pennyworth.
Another thing I'd like to check: can we expect a lower price for a property not available immediately? Or, on the contrary, is the best price offered for already unoccupied properties?
As always, let's start with a general picture. What is the average time of availability by property type?
For the most popular types it is about one month, and if you have a house you typically publish it two or three months in advance. And what about the price? Here is one more chart that I like in Tableau. In a nutshell, it is a normal line chart showing the average price by days before property availability, but the thickness of the lines shows the number of properties at the same time. So I can see not only the price but its reliability too: a thick line means it was formed by many properties, while a thin line may be formed by a few properties and move up or down significantly when something changes. It would be very interesting to get historical data and see how long properties stay free or how long it takes before the price is reduced, but unfortunately I don't have this data.
And looking at this chart I'd say that there is no statistically significant dependency between price and availability date. Renting a property available in the distant future won't save you money* (*=statistically).
And the last thing I'd like to investigate is the Key features. What do landlords put as the key features of their properties? How do they affect the price?
The list of popular Key features surprised me.
'Unfurnished' looks good to me, it is a really significant part of the deal. But 'Brighton'? For properties in Brighton? '1 Bedroom'? How many bedrooms can a '1 bedroom flat to rent' have? Oh, there is a key feature saying '1 bedroom', now I know. But jokes aside, I had to do a lot of cleaning on this data before I could use it. There are six ways to write 'Modern kitchen'. Make everything upper case, then remove quotes, strip spaces and tabs, remove noisy features like 'stylish 1 bedroom apartment' and so on. After this, I got a slightly better list with approximately 3,500 features instead of 4,500. Note how all variants of writing 'GAS CENTRAL HEATING' are now combined into the one most popular feature. But there are still too many features. I'm sure there should be no more than a hundred of them. Even in this screenshot you may see 'Unfurnished' and 'Unfurnished property' features.
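The cleaning itself is nothing exotic; here is a rough sketch of the kind of normalisation described above. The exact noise-removal rules were tuned by hand against the real feature list, so treat this as illustrative only.
import re

def normalise_feature(raw: str) -> str:
    """Upper-case, strip quotes and collapse repeated whitespace."""
    feature = raw.upper()
    feature = feature.replace('"', "").replace("'", "")
    feature = re.sub(r"\s+", " ", feature).strip()
    return feature

# Examples of the kind of variants that collapse into one feature after cleaning
variants = ["Gas central heating", "GAS  CENTRAL HEATING ", "'gas central heating'"]
print({normalise_feature(v) for v in variants})   # -> {'GAS CENTRAL HEATING'}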
When I need a visualisation for this number of points, bar charts or tables won't play well. My weapon of choice is a scatter plot. Every point is a particular feature, the axes are its minimum and average prices, the size is determined by the number of properties declaring this feature, and the colour is the maximum price. So if a feature is located high on the plot, it means that on average it will be expensive to have it. If this feature is at the same time located close to the left side, even cheap properties may have it. For example, if you want a swimming pool be ready to pay at least £3,000 and £7,000 on average. And the minimum price for a tumble dryer is £3,250 but the average is £3,965. The cheapest property with a dryer is more expensive than the cheapest with a pool, but on average pools are more expensive. That is how this chart works.
The problems with this chart are obvious. It is littered with unique features. Only one property has 4 acres (the point in the top right corner). And actually not so many swimming pools are available for rent in Brighton. I filtered it by "# of properties > 25" and here is how prices for the most popular features are distributed.
A central location will cost you at least £100 and £1,195 on average, and for a great location be ready to pay at least £445 and £1,013 on average. A great location seems to be less valuable than a central one.
And now I can see how a particular feature impacts prices. For example 'GAS HEATING'. I made a set with all variants of heating I could find ('GAS CENTRAL HEATING', 'GAS HEAT' and so on). Now I can analyse how this feature impacts properties. And here is how it impacts the price of flats. Blue circles are properties with gas heating and orange are without.
Very interesting in my opinion. The minimum price of properties with gas heating (blue circles) is higher than without. That is expected. But the average price for properties without gas heating is higher.
And here are kitchen appliances. For 1 bedroom flats, they increase both minimum and average prices significantly. But for bigger flats the minimum price with appliances is higher and the average price is lower. Possibly this option is important for relatively cheap properties, but its weight is not that big for the more expensive ones.
Summary
Rittman Mead at Kscope 2017
Rittman Mead will be well represented in San Antonio, Texas next week for Kscope 17 with some of our best from both sides of the Atlantic! Our very own Francesco Tisiot and Jordan Meyer will present various topics as well as participate in the conference events. Also, the newly named ODTUG BI Community Lead, Rittman Mead's Becky Wagner, will be on hand and leading a lot of activities throughout. See details below and we hope to see you in Texas.
Jordan
Oracle Big Data Spatial and Graph enables the analysis of data sets beyond that of standard relational analytics commonly used. Through graph technology relationships can be identified that may not otherwise have been. This has practical uses including in product recommendations, social network analysis, and fraud detection.
In this presentation we will see a practical demonstration of Oracle Big Data Spatial and Graph to load and analyze the "Panama Papers" data set. Graph algorithms will be utilized to identify key actors and organizations within the data, and patterns of relationships shown. This practical example of using the tool will give attendees a clear idea of the functionality of the tool and how it could be used within their own organization.
When: Jun 27, 2017, Tuesday Session 7 , 11:15 am - 12:15 pm
Room: Magnolia
Francesco
OBIEE 12c is the latest generation of Oracle's Enterprise analytics and reporting tool, bringing with it many powerful new features. Many users are still on earlier releases of OBIEE 11g or even 10g, and are looking to understand how they can move to OBIEE 12c to benefit from its new capabilities.
Liberty Global is a global telecommunications company, with a long history with OBIEE going back to 10g. They wanted to move to OBIEE 12c in order to use the new Advanced Analytics options, and used Rittman Mead to support them with the full scope of the upgrade.
In this presentation, we will see what a highly successful OBIEE 12c migration looks like. We will cover clear details of all the steps required, and discuss some of the problems encountered. Regression testing is a crucial step in any upgrade and we will show how we did this efficiently and accurately with the provided Baseline Validation Tool. This presentation will assist all attendees who are considering, or in the process of, an OBIEE 12c upgrade.
When: Jun 26, 2017, Monday Session 5 , 4:45 pm - 5:45 pm
Room: Wisteria/Sunflower
And
As a DBA or sysadmin responsible for OBIEE how do you really dig into the guts of OBIEE, look at intra-component communication between the system components and examine the apparently un-examinable? What do you do when you need to trace activity beyond what is in the log files? How do you work with log files in order to give precise but low-level information? What information can be gleaned, by hook or by crook, from OBIEE?
OBIEE provides a set of systems management and diagnostic tools, but these only take you so far. Join me in this presentation to dive deeper with OBIEE. We will take a look at a bag of tricks including undocumented configuration options, flame graphs, system call tracing, discovering undocumented REST APIs, and more! This is not just a geek-out - this is real-life examples of where client OBIEE projects have required that next level of diagnostic techniques and tools. Don your beanie hat and beard as we go deep!
When: Jun 28, 2017, Wednesday Session 12 , 9:45 am - 10:45 am
Room: Wisteria/Sunflower
Becky
Becky Wagner is the new ODTUG BI Community Lead. You will find her at:
Monday Community Lunch | 12:45 – 2:00 PM | Grand Oaks K-S
Monday evening BI Community Night | 8:00 - 10:00 PM | Grand Oaks H http://kscope17.com/events/community-nigh-events
She will be doing the 5K Fun Run http://kscope17.com/events/kscope17-5k on Tuesday morning
Women in Technology Lunch | 12:15– 1:45 PM | Cibolo Canyon 6 on Wednesday https://form.jotformpro.com/71134693041955
Unify: Could it be any easier?
Rittman Mead’s Unify is the easiest and most efficient method to pull your OBIEE reporting data directly into your local Tableau environment. No longer will you have to worry about database connection credentials, Excel exports, or any other roundabout way to get your data where you need it to be.
Unify leverages OBIEE’s existing metadata layer to provide quick access to your curated data through a standard Tableau Web Data Connector. After a short installation and configuration process, you can be building Tableau workbooks from your OBIEE data in minutes.
This blog post will demonstrate how intuitive and easy it is to use the Unify application. We will only cover using Unify and its features, as once the data gets into Tableau it can be used the same as any other Tableau data source. The environment shown already has Unify installed and configured, so we can jump right in and start using the tool immediately.
To start pulling data from OBIEE using Unify, we need to create a new Web Data Connector Data Source in Tableau. This data source will prompt us for a URL to access Unify. In this instance, Unify is installed as a desktop application, so the URL is http://localhost:8080/unify.
Once we put in the URL, we’re shown an authentication screen. This screen will allow us to authenticate against OBIEE using the same credentials. In this case, I will authenticate as the weblogic user.
Once authenticated, we are welcomed by a window where we can construct an OBIEE query visually. On the left hand side of the application, I can select the Subject Area I wish to query, and users are shown a list of tables and columns in the selected Subject Area. There are additional options along the top of the window, and I can see all saved queries on the right hand side of the window.
The center of the window is where we can see the current query, as well as a preview of the query results. Since I have not started building a query yet, this area is blank.
Unify allows us to either build a new query from scratch, or select an existing OBIEE report. First, let’s build our own query. The lefthand side of the screen displays the Subject Areas and Columns which I have access to in OBIEE. With a Subject Area selected, I can drag columns, or double click them, to add them to the current query. In the screenshot above, I have added three columns to my current query, “P1 Product”, “P2 Product Type”, and “1 - Revenue”.
If we wanted to, we could also create new columns by defining a Column Name and Column Formula. We even have the ability to modify existing column formulas for our query. We can do this by clicking the gear icon for a specific column, or by double-clicking the grey bar at the top of the query window.
It’s also possible to add filters to our data set. By clicking the Filter icon at the top of the window, we can view the current filters for the query. We can then add filters the same way we would add columns, by double-clicking or dragging the specific column. In the example shown, I have a filter on the column “D2 Department” where the column value equals “Local Plants Dept.”.
Filters can be configured using any of the familiar methods, such as checking if a value exists in a list of values, numerical comparisons, or even using repository or session variables.
Now that we have our columns selected and our filters defined, we can execute this query and see a preview of the result set. By clicking the “Table” icon in the top header of the window, we can preview the result.
Once we are comfortable with the results of the query, we can export the results to Tableau. It is important to understand that the preview data is trimmed down to 500 rows by default, so don’t worry if you think something is missing! This value, and the export row limit, can be configured, but for now we can export the results using the green “Unify” button at the top right hand corner of the window.
When this button is clicked, the Unify window will close and the query will execute. You will then be taken to a new Tableau Workbook with the results of the query as a Data Source. We can now use this query as a data source in Tableau, just as we would with any other data source.
But what if we have existing reports we want to use? Do we have to rebuild the report from scratch in the web data connector? Of course not! With Unify, you can select existing reports and pull them directly into Tableau.
Instead of adding columns from the lefthand pane, we can instead select the “Open” icon, which will let us select an existing report. We can then export this report to Tableau, just as before.
Now let’s try to do something a little more complicated. OBIEE doesn’t have the capability to execute queries across Subject Areas without common tables in the business model, however Tableau can perform joins between two data sources (so long as we select the correct join conditions). We can use Unify to pull two queries from OBIEE from different Subject Areas, and perform a data mashup with the two Subject Areas in Tableau.
Here I’ve created a query with “Product Number” and “Revenue”, both from the Subject Area “A - Sample Sales”. I’ve saved this query as “Sales”. I can then click the “New” icon in the header to create a new query.
This second query is using the “C - Sample Costs” Subject Area, and is saved as “Costs”. This query contains the columns “Product Number”, “Variable Costs”, and “Fixed Costs”.
When I click the Unify button, both of these queries will be pulled into Tableau as two separate data sources. Since both of the queries contain the “Product Number” column, I can join these data sources on the “Product Number” column. In fact, Tableau is smart enough to do this for us:
We now have two data sets, each from a different OBIEE subject area, joined and available for visualization in Tableau. Wow, that was easy!
What about refreshing the data? Good question! The exported data sources are published as data extracts, so all you need to do to refresh the data is select the data source and hit the refresh button. If you are not authenticated with OBIEE, or your session has expired, you will simply be prompted to re-authenticate.
Using Tableau to consume OBIEE data has never been easier. Rittman Mead’s Unify allows users to connect to OBIEE as a data source within a Tableau environment in an intuitive and efficient method. If only everything was this easy!
Interested in getting OBIEE data into Tableau? Contact us to see how we can help, or head over to https://unify.ritt.md to get a free Unify trial version.