Tag Archives: Oracle Database
One of the reasons I’ve been silent on the blog lately regarding new Oracle Data Integrator 12c content (even though I promised it some months ago) is because I’ve been writing an article focused on migration options from Oracle Warehouse Builder (OWB). Published recently on OTN, this article hopefully helps customers understand some of the options Oracle has included to make good on their recent Statement of Direction for Data Integration.
While the article is focused primarily on OWB customers looking to make the move, there’s plenty of value here for organizations already familiar with ODI 11g. I walk through an Enterprise Installation of ODI 12c on Linux, including Fusion Middleware domain creation and JEE agent configuration. Following that, I explore two new technical capabilities for OWB shops included in the new tool. First is Runtime Integration, which allows the execution of OWB mappings and process flows directly from ODI with complete auditing and load plan integration. Secondly, I investigate the new Migration Utility which will migrate the majority of customer mappings directly from OWB to ODI 12c. I then embark on a functional assessment of what it means to add value to a data integration project, and how organizations can use these new features to achieve it. I present three different approaches for OWB shops to implement ODI 12c: the Side-by-Side Approach, the Big Bang Approach, and the Phased Approach. Understanding that no size fits all, I’m sure organizations all over the spectrum will recognize a fit in at least one of these solutions.
As always, early access and beta participation means that Rittman Mead is prepared to implement an OWB migration project immediately using any of these three approaches. If you have any questions, please comment here on the blog, or reach out to us directly.
I’m sitting writing this at my desk back home, with a steaming mug of tea next to me and the kids pleased to see me after having been away for eight days (or at least my wifepleased to hand them over to me after looking after them for eight days). It was an excellent Oracle Openworld – probably the best in the ten years I’ve been going in terms of product announcements, and if you missed any of my daily updates, here’s the links to them:
- Oracle Openworld 2013 Day 0: Previewing the Week Ahead
- Oracle Openworld 2013 Day 1 : User Group Forum, News on Oracle Database In-Memory Option
- Oracle Openworld 2013 Day 2: Exalytics, TimesTen and Essbase Futures
- Oracle Openworld 2013 Days 3 & 4 : Oracle Cloud, OBIEE and ODI Futures
We also delivered sixteen sessions over the week, and whilst a few of them can’t be circulated because they contain details on beta or forthcoming products, here’s links to the ones that we can post:
- Deep Dive into OBIA 22.214.171.124.1 – Overview (Mark Rittman)
- Deep Dive into OBIA 126.96.36.199.1 – Data Integration (Stewart Bryson)
- Agile BI Development using OBIEE, ODI and Golden Gate (Stewart Bryson)
- Hyperion Profitability & Cost Management – Integration of Standard & Detailed Profitability (Venkatakrishnan J)
- Make the Most of your Exalytics and BI Investments with Enterprise Manager 12c (Mark Rittman, Henrik Blixt, Dhananjay Papde)
- Birds of a Feather Session: Best Practices for TimesTen on Exalytics (Mark Rittman, Chris Jenkins, Tony Heljula)
- Oracle BI EE Integration with Hyperion Data Sources (Venkatakrishnan J)
- Oracle Data Quality Solutions, Oracle Data Integrator and Oracle GoldenGate on Exadata (Jérôme Françoisse and Gurcan Orhan)
- Innovation in BI: OBIEE against Essbase and Relational (Stewart Bryson and Edward Roske)
- How to Handle Dev/Test/Prod with Oracle Data Integrator (Jerome Francoisse and Gurcan Orcan)
- Oracle Endeca User Experience Case Study at Barclays (James Knight and Kelvin Lau)
- Configuring OBIA 188.8.131.52.1 on ODI – Deep Dive (Mark Rittman, Kevin McGinley and Hari Cherukupalli)
So then, on reflection, what did I think about the various product announcements during the week? Here’s my thoughts now I’m back in the UK.
First off – Exalytics. Clearly there’s a lot of investment going into the Exalytics offering, both from the hardware and the software sides. For hardware, it’s just really a case of Oracle keeping up with additions to Sun’s product line, and with the announcement of the T5-8 model we’re now up to 4TB of RAM and 128 SPARC CPU cores – aimed at the BI consolidation market, where 1 or 2TB of RAM quickly goes if you’re hosting a number of separate BI systems. Cost-wise – it’s correspondingly expensive, about twice the price of the X3-4 machine, but it’s got twice the RAM, three times the CPU cores and runs Solaris, so you’ve got access to the more fine-grained workload separation and virtualisation that you get on that platform. Not a machine that I can see us buying for a while, but there’s definitely a market for this.
With Exalytics though you could argue that it’s been the software that’s underwhelmed so far, as opposed to the hardware. The Summary Advisor is good, but it doesn’t really handle the subsequent incremental refresh of the aggregate tables, and TimesTen itself whilst fast and powerful hasn’t had a great “out of the box” experience – in the wrong hands, it can give misleadingly-slow response-times, something I found myself a few months ago back on the blog. So it was interesting to hear some of the new features that we’re likely to see in “Exalytics v2.0″, probably late in calendar year 2014; an updated aggregate refresh mechanism based on DAC Server technology and with support for GoldenGate; new visualisations including data mash-up capabilities that I’m guessing we’ll see as exclusives on Exalytics and Oracle’s cloud products; enhancements coming for Essbase that’ll make it easier to spin-off ASO cubes from an OBIEE repository; and of course, the improvements to TimesTen to match those coming in the core Oracle database – in-memory analytics.
And what an announcement that was – in-memory column-store technology within the Oracle database, not predicated on using Exadata, and all running transparently in the background withminimal DBA setup required. Now in-reality, not only is this not the first in-memory Oracle database offering – the Exadata boxes in previous open world presentations also were positioned as in-memory, but that was flash memory, not DRAM – and they’re not the first vendor to offer in-memory, column-store as a feature, but given that it’ll be available to all Oracle 12.1.2 databases that license the in-memory option, and it’ll be so easy to administer – in theory – it’s a potential industry game-changer.
Of course the immediate question on my lips after the in-memory Oracle Database announcement was “what about TimesTen“, and “what about TimesTen’s role in Exalytics”, but Oracle played this in the end very well – TimesTen will gain similar capabilities, implemented in a slightly different way as TimesTen already stores its data in memory, albeit in row-store format – and in fact TImesTen can then take on a role of a business-controlled, mid-tier analytic “sandbox”, probably receiving new in-memory features faster than the core Oracle database as it’s got less dependencies and a shorter release cycle, but complementing the Oracle database an it’s own, more large-scale in-memory features. And that’s not forgetting those customers with data from multiple, heterogenous sources, or those that can’t afford to stump-up for the In-Memory option for all of the processors in their data warehouse database server. So – fairly internally-consisent at least at the product roadmap level, and we’ll be looking to get on any betas or early adopter programs to put both products through their paces.
The other major announcement that affects OBIEE customers, is, of course, OBIEE in the Cloud – or “Reporting-as-a-Service” as Oracle referred to it during the keynotes. This is one of the components of Oracle’s new “platform-as-a-service” or PaaS offering, alongside a new, full version of Oracle 12c based on its new multitenant architecture, identity-management-as-a-service, documents-as-a-service and so on. What reporting-a-service will give us isn’t quite “OBIEE in the cloud”, or at least, not as we know it now; Oracle’s view on platform-as-a-service is that it should be consumer-level in terms of simplicity to setup, and the quality of the user interface, it should be self-service and self-provisioning, and simple to sign-up for with no separate need to license the product. So in OBIEE terms, what this means is a simplified RPD/data model builder, simple options to upload and store data (also in Oracle’s cloud), and automatic provisioning using just a credit card (although there’ll also be options to pay by PO number etc, for the larger customers.)
And there’s quite a few things that we can draw-out of this announcement; first, it’s squarely aimed – at least at the start – at individual users, departmental users and the like looking to create sandbox-type applications most probably also linking to Oracle Cloud Database, Oracle Java-as-a-Service and the like. It won’t, for example, be possible to upload data to this service’s datastore using conventional ETL tools, as the only datasource it will connect to at least initially will be Oracle’s Cloud Database schema-as-a-service, which only allows access via ApEx and HTTP, because it’s a shared service and giving you SQL*Net access could compromise other users. In the future, it may well connect to Oracle’s full DBaaS which gives you a full Oracle instance, but for now (as far as I’ve heard) there’s no option to connect to an on-premise data source, or Amazon RDS, or whatever. And for this type of use-case – that may be fine, you might only want a single data source, and you can still upload spreadsheets which, if we’re honest, is where most sandbox-type applications get their data from.
This Reporting-as-a-Service offering though might well be where we see new user interface innovations coming through first, though. I get the impression that Oracle plan to use their Cloud OBIEE service to preview and test new visualisation types first, as they can iterate and test faster, and the systems running on it are smaller in scope and probably more receptive to new features. Similar to Salesforce.com and other SaaS providers, it may well be the case that there’s a “current version”, and a”preview version” available at most times, with the preview becoming the current after a while and the current being something you’ve got 6-12 months to switch from after that point. And given that Oracle will know there’s an Oracle database schema behind the service, it’s going to make services such as the proposed “personal data mashup” feature possible, where users can upload spreadsheets of data through OBIEE’s user interface, with the data then being stored in the cloud and the metrics then being merged in with the corporate dataset, with the source of each bit of data clearly marked. All this is previews and speculation though – I wouldn’t expect to see this available for general use until the middle of 2014, given the timetable for previous Oracle cloud releases.
The final product area that I was particularly interested in hearing future product direction about, was Oracle’s Data integration and Data Quality tools. We’ve been on the ODI 12c beta for a while and we’re long-term users of OWB, EDQ, GoldenGate and the other data integration tools; moreover on recent projects, and in our look at the cloud as a potential home for our BI, DW and data analytcs projects, its become increasingly clear that database-to-database ETL is no longer what data integration is solely about. For example, if you’re loading a data warehouse in the cloud, and the source database is also in the cloud, does it make sense to host the ETL engine, and the ETL agents, on-premise, or should they live in the cloud too?
And what if the ETL source is not a database, but a service, or an application such as Salesforce.com that provides a web service / RESTful API for data access? What if you want to integrate data on-the-fly, like OBIEE does with data federation but in the cloud, from a wide range of source types including services, Hadoop, message buses and the like. And where does replication come in, and quality-of-service management, and security and so forth come in? In my view, ODI 12c and its peers will probably be the last of the “on-premise”, “assumed-relational-source-and-target” ETL tools, with ETL instead following apps and data into the cloud, assuming that sources can be APIs, messages, big data sources and so forth as well as relational data, and it’ll be interesting to see what Oracle’ Fusion Middleware and DI teams come up with next year as their vision for this technology space. Thomas Kurian’s keynote touched on this as a subject, but I think we’re still a long way from working out what the approach will be, what the tooling will look like, and whether this will be “along with”, or “instead of” tools like ODI and Informatica.
Anyway – that’s it for Openworld for me, back to the real world now and time to see the family. Check-back on the blog next week for normal service, but for now – laptop off, kids time.
The Sunday before Oracle Openworld proper is “User Group Forum Sunday”, with each of the major user groups and councils having dedicated tracks for topics such as BI&DW, Fusion Development, Database and so on. Stewart, myself and Venkat were honoured to be presenting on behalf of ODTUG, IOUG and APAC covering topics around the new 11g release of the BI Applications, Hyperion/EPM Suite, and agile development using OBIEE, ODI and Golden Gate. Links to the presentation PDFs from each of our sessions are listed below:
- Deep Dive into OBIA 184.108.40.206.1 – Overview (Mark Rittman)
- Deep Dive into OBIA 220.127.116.11.1 – Data Integration (Stewart Bryson)
- Agile BI Development using OBIEE, ODI and Golden Gate (Stewart Bryson)
- Hyperion Profitability & Cost Management – Integration of Standard & Detailed Profitability (Venkatakrishnan J)
All of the sessions drew a good crowd, and I was especially pleased to see the number of people that came along to the BI Apps 18.104.22.168.1 sessions, and that there were a few early-adopters in the audience who’d either completed their initial implementations, or had carried out pilot or PoC exercises. Feedback from those attendees was as I’d expected – some initial early adopter issues but generally positive feedback on the simplified architecture, and use of ODI. Stewart’s session on the data integration aspects of this new release included content on its new Golden Gate integration, and the new Source-Dependent Store ODI concept that it supports, again which went down well with an audience looking for more technical details on how this new release works.
After the user group sessions finished, it was time to go over to Moscone North for Larry Ellison’s opening keynote, where three new products were announced. First up was the new In-Memory option for the Oracle Database, which adds an in-memory, column-store capability to Oracle databases on all platforms, not just Exadata. Aimed at the upcoming 12.1.2 release, this new feature will provide a column-store capability alongside the existing on-disk row-store, with the column-store being used for DW-style queries whilst the row-store will continue to be used for OLTP.
The way this in-memory column store will work, is as follows:
- The DBA will enable the in-memory feature by setting the database parameter “in memory_size = XX GB”, with the memory then being allocated in the database’s SGA (System Global Area, one of the shared areas in the overall database memory allocation)
- Tables, partitions or sets of columns will be enabled for in-memory storage by an “alter table … in memory” DDL command
- Existing query indexes on the source tables can then be dropped
The database will then take care of copying these tables, columns or partitions into the in-memory column-store area, and then refreshing those tables on a regular basis, so that the database will have both row-store, and column-store versions of the tables available at the same time.
Oracle’s assertion is that the overhead in maintaining both the row- and column-store versions of the tables will be balanced out by the removal of the need to maintain query indexes on the source tables, and performance improvements of 100x to 1000x were quoted for DW-style queries, and 2x for OLTP-style queries. Unlike the Hybrid Columnar Compression feature announced a couple of years ago at Openworld, none of this is Exadata-specific, but it will be an option for the Enterprise Edition of the database, and it will require the 12.1.2 release, so you’ll need to budget for it and you’ll need to be on the most recent release to make use of it.
Other than the in-memory option, the other two product announcements in the keynote were:
- The M6-32 “Big Memory Machine”, with 32TB of DRAM and a SPARC M6 chip architecture – positioned as the ideal server for the in-memory option
- The “Oracle Database Backup, Logging and Recovery” appliance, a server designed to receive and then store incremental database backups for private and public clouds, and then restore those databases as necessarily – basically a backup server optimised for database backup and recovery.
So that was it for today – more news tomorrow once the main conference sessions and keynotes start.
The vast majority of our customers use Oracle Database as the underlying data source for OBIEE 11g, sometimes on Exadata but almost always the Enterprise Edition of the database. OBIEE typically sells best into customers that already extensively use Oracle products, and often it’s often bundled-in with a wider technology or apps deal including database and middleware products. But for some customers, the cost of the Oracle Database as a data source can be prohibitive and in many cases, they’re also looking at purchasing and deployment options that don’t involve upfront perpetual license deals and running the software on-premise. One of our longest-standing customers came to us with just such a scenario a few weeks ago, and so I thought it’d be interesting to take a look at what alternatives there are to licensing Oracle Database Enterprise Edition, and whether running these alternatives in the cloud was a practical option.
To set the context: imagine that you wanted to deploy a small-to-medium sized data warehouse on two servers:
- A “production” server with 64GB RAM, 4 x 4 core servers, Intel x86_64 architecture and running Oracle Linux 6 (for licensing purposes, 8 processors)
- A “dev/test” server with the same spec, onto which we’d install two separate databases
For now we’ll ignore the ETL element, but if we were licensing Oracle Data Integrator Enterprise Edition, we’d need to license it for these two servers too at around $23k/processor.
So what are the alternatives to Oracle Database Enterprise Edition as a data source? To my mind, there are three main options:
- Trading down to fewer processors but keeping Oracle Database EE, or going down to a lower-specced version such as Standard Edition or Standard Edition One
- Using an alternative RDBMS, probably either open-source or a “community edition”
- The left-field option – using Hadoop, NoSQL, Hive and the other new “big data” technologies.
Let’s take a look at the Oracle options first of all. Based on 8 processors (4 x 4 Intel x86_64 cores, x0.5 multiplication factor), the cost of licensing Oracle DB EE + commonly-used database options based on the current Tech Price List would be as follows (at list price):
- Oracle Database Enterprise Edition ($47.5k x 8 = $380k)
- Partitioning Option ($11.5k x 8 = $92k)
- EM Tuning Pack ($5k x 8 = $40k)
- EM Diagnostics Pack ($5k x 8 = $40k)
Giving us a total of $552k for 8 processors (covering both of our servers). Add onto that the standard 22% support and software updates, and you’re looking at an initial payment of $673k or the perpetual license, and an ongoing $121k/year for annual maintenance. Factor in a DBA and the hardware as well, and you’ll probably not get much change out of $800k in year one, and $250k in the years afterwards.
You could switch-down to one of the cheaper editions of Oracle Database, which also don’t come with the add-on packs, which could take you down to $5.8k/processor for SE1 or $17.5k for SE, but you’ll then have to do without most of the high-end data warehousing features in the Oracle database, and work within socket restrictions when it comes to SE1. Now of course you get what you pay for with Oracle Database, including potentially lower ongoing admin costs because much of the tuning and optimisation work is either unnecessary or automated, but if price is the issue and the alternative is a cut-down version of Oracle, what else is on offer – and that works with OBIEE?
If you start to look around at the alternatives to Oracle, and assuming you’re discounting products like IBM DB/2 or Microsoft SQL Server, you’ll find there are two main “conventional” types of RBDMS that you could use with OBIEE:
- General purpose RBDMS’s, such as mySQL and PostgreSQL
- “Analytical” databases, typically in-memory, column-store and shared-nothing, such as Greenplum and Vertica
If you look under the covers with the analytical databases, in most cases you’ll find that the core RBDMS technology they use is in fact PostgreSQL, with the (open-source) code then forked and adapted to work with the particular technology angle each vendor chooses to emphasise. Most developers tend to think that PostgreSQL is a more “enterprise-scale” free RDBMS than mySQL, and so given that it also provides the core DB technology in the other products, how might this work with OBIEE?
As a bit of background, PostgreSQL is an open-source database that’s of a similar vintage to Oracle Database, and you can download it for free or install it as part of most Linux distributions. Many developers swear by PostgreSQL itself as being all they need for data warehouse scenarios, but a couple of vendors have taken the PostgreSQL code and commercialised it, adding their own management tools and support options around the core open-source code. The most famous of these is EnterpriseDB, who package up PostgreSQL into a number of options one of which is called Postgres Plus Advanced Server. If you’re looking for an open-source based, low-cost but mature RBDMS engine for your database, this is more or less the flagship option, so I set about creating a new OBIEE 22.214.171.124 VM for testing purposes, with the aim of using Postgres Plus Advanced Server as my data source.
In fact, the first issue you hit when trying to go “non-Oracle” for the database, is that you need a Oracle, SQL Server or IBM DB/2 database to install the RCU schemas into. PostgreSQL isn’t an option, but in reality most customers using OBIEE will have a small Oracle database available somewhere, so I cheated here and installed the RCU schemas into a spare Oracle database, and then went on with the install.
Installing Postgres Plus Advanced Server itself was pretty straightforward, with the installer separately downloading the core PostgreSQL files and the Apache HTTP server used for its web-based admin tools. On Windows, once you’ve installed the software there’s a new Start Menu entry with the EnterpriseDB-specific (value-add) tools, including a console-based Enterprise Manager-type tool called pgAdmin.
Opening up pgAdmin gives you an environment that’s similar to that you’d find in tools such as TOAD, SQL*Developer and the old Java thick-client based Enterprise Manager, and using the tool I was able to connect to my PostgreSQL database, create a few tables, and run some test SQL queries.
It’s pretty basic stuff compared to Oracle’s Enterprise Manager product, but it may be all you need if you’re looking for a tactical, or departmental solution. EnterpriseDB price Postgres Plus Advanced Server at just under $5k/processor/year for the core database (one processor = one socket, however many cores), with their Enterprise Manager product costing $1500/year for two monitored hosts, and their “Solution Pack” another $5k/year, so based on the same server sizing as we used for the Oracle costings, the price of an EnterpriseDB solution looks a bit like this:
- Postgres Plus Advanced Server ($5k x 8 = $40k)
- Enterprise Manager EM Tuning Pack ($1.5k)
- Solutions Pack ($5k)
Coming out at around $47k/year for the two servers. Now the Oracle licenses were perpetual not annual, but even so, the total cost of the on-premise PostgreSQL solution is about 1/3rd of just the annual maintenance cost of the Oracle software, let alone the software license cost, so it’s considerably cheaper. So what don’t you get with PostgreSQL – either core, or commercialised via EnterpriseDB, that you get with Oracle?
Now as a disclaimer, I’m mainly an Oracle person and can’t really speak with any great depth on PostgreSQL, but the major things that are missing in PostgreSQL compared to Oracle Database 11g are:
- Proper, in-built support for parallel query – PostgreSQL has an open-source GridSQL project but this looks more like a federated, shared-nothing solution rather than PQ within a single server
- Built-in ETL in the form of Oracle Warehouse Builder
- Anything like Oracle’s built in MOLAP server, Advanced Analytics and the like
- All the DR options, security options and so forth
And no doubt the core DW features – bitmap indexing, cost-based optimiser, support for VLDBs and so on – are not nearly as developed, and easy to manage, as with the Oracle database. But – given that not all customers need these features, and that many customers even on 10g and 11g are still using the database as if it were Oracle 7 or 8i – lack of such features may not be a show-stopper if money is tight.
Now so far we’ve been talking about running PostgreSQL “on-premise”, more or less a a direct substitute for Oracle Database; however, it’s also possible to run Postgres Plus Advanced Server on Amazon’s AWS cloud platform, with no local install of the software, simplified administration but access to the same core Postgres Plus Advanced Server features billed at an hourly rate, giving you an architecture like the diagram below:
EnterpriseDB and AWS provide a free 24-hour trial, so I set up another database this time on the Amazon cloud, provisioned it and created the same set of tables.
Connecting to the cloud-based Postgres Plus database was the same as connecting to a regular, on-premise one, and in fact this is more-or-less just commercial PostgreSQL on Amazon AWS. Instead of charging for software and support up-front as with regular Oracle or on-premise Postgres Plus Advanced Server, you’re instead charged, AWS-style, an hourly rate based on the size of instance – one comparable to one of the Oracle servers mentioned beforehand comes in at $6.48/hour or $56k/year, or $112k for the two servers we’d need, still overall less than even the Oracle software maintenance charge but covering hosting as well (though you’ll need to pay Amazon storage and bandwidth charges on top, too).
Connecting OBIEE 11g to these two databases is also fairly straightforward – the Data Direct Postgres ODBC drivers that come with OBIEE work fine, with the connection process involving creating an ODBC DSN to the Postgres database either on-premise on on AWS, then importing the tables into the BI Repository and modelling them as normal. Connecting to the AWS cloud as a data source worked fine as wasn’t laggy, but you’d obviously need to test it at greater volume and with a realistic network connection. But – it works and seems to perform OK, at least with three rows…
So something like Postgres Plus Advanced Server in the cloud would probably appeal to a customer with a temporary requirement, a departmental budget, or an infrastructure that’s already in the cloud, most probably on Amazon AWS. But you can also run Oracle Database on Amazon’s AWS cloud as well, through a partnership between Oracle and Amazon’s RDS (“Relational Database Service”). In this case, you go to the Amazon AWS website, sign-up for the service, put in your credit card details and within a few minutes, you’ve got a provisioned, Standard Edition One database ready to connect to via all the usual tools. And what this is, again, is the standard Oracle Database (or with EnterpriseDB’s offering, Postgres Plus Advanced Server) but hosted on AWS, charged hourly, with two main options for pricing:
- A license-included option (Oracle Database Standard Edition One) at $3.14/hour for a reasonably specced-instance, coming out at about $27k/year, or $54k for the two servers we’d need
- A “bring-your-own licence” (BYOL) option, where you provide the Oracle licenses and Amazon AWS hosts it, at around $19k/year for the same size instance, $38k/year for the two servers we’d need
So if you work on the assumption that Oracle Database SE1 is roughly equivalent to Postgres Plus Advanced Server in terms of DW features, it’s not a bad option, and probably a better one if you’re already heavily invested in Oracle and have lots of skills there. The BYOL option is really just about hosting, but at $38k/year to host the two servers, back them up and so on, it’s not a bad option compared to buying some servers and putting them somewhere.
So, with Amazon RDS or Postgres Plus Advanced Server in the cloud, if you’re happy to take the hit on features and scaleability it’s possible to bring your database costs down considerably, which may be an option when your project first starts off, and with the database platform reviewed say in a couple of years’ time to see whether it’s worth trading up to Oracle Database Enterprise Edition, or bring the whole thing back in-house. But Postgres and Oracle aren’t the only players in the analytical database market – there are other on-premise vendors such as Greenplum and Vertica who’ll sell you either a full-priced, Oracle-comparable database and who have cut-down, community editions available for single-node installs, or there are new players such as Amazon again who are bringing out cloud-native analytic databases again based on instance sizes and hourly charging – so lets look at one of them, Amazon’s “Redshift”.
Amazon Redshift is actually based on an analytic database product called ParAccel, a column-store database in the same vein as Vertica but with major bits cut-out in it’s Amazon Redshift form. That said – it’s a lot more of an interesting starting point than core PostgreSQL or Oracle Database SE1, an given that it’s a core Amazon AWS product, it’s no doubt going to receive a lot of investment, developer evangelism and usage over the next few years. It’s also Postgres-compatible in terms of ODBC drivers, so viable for OBIEE, though like Postgres unsupported, so you’ll not get much in the way of BI Server optimisations or help from Oracle Support if it goes wrong.
To connect OBIEE to Amazon Redshift there’s a set of ODBC and JDBC drivers that you can download, and once you’ve opened up the port to Redshift in the AWS firewall, you can create an ODBC connection on the OBIEE side just like most other PostgreSQL connections, and then import the Redshift metadata into OBIEE’s BI Repository.
Redshift is priced around the same mark as Postgres Plus Advanced Server, but without the Amazon AWS storage and bandwidth charges. It’s a bit higher-risk than Postgres Plus, and it’s not got the Oracle Database compatibility features EnterpriseDB added to core Postgres, but it’s column-store, cloud-native and backed fully by Amazon.
So – with all of this in mind, what are the conclusions? Well it’s fair to say that, based on what turned-up today and OBIEE’s pre-existing heterogenous database support, you can connect non-Oracle databases to OBIEE as well as Oracle ones, but you’ll need to bear in mind that the newer cloud-based ones, and the open-source ones, won’t necessarily come with support from Oracle. If you’re looking to keep as close to an Oracle solution as possible but cut costs dramatically, using Amazon’s RDS service particularly with the license-included SE1 option could bring down the cost considerably, from $800k or so to around $50k/year, but you’ll need to work within the product constraints of SE/SE1, so no materialised views, partitioning option and the like – so it’s probably a matter of preference whether you’d go for Oracle SE1 or a Postgres-based solution such as EnterpriseDB’s Postgres Plus Advanced Server (I’d probably go with Amazon RDS and Oracle Database SE1, if only because I can upgrade in-time to Oracle Database Enterprise Edition and my Oracle skills can be re-used).
Going forward – analytic databases-as-a-service certainly sound interesting, with Amazon Redshift looking particularly interesting. How much customers will take up cloud-based data warehousing databases in-general though will probably be limited until OBIEE itself goes into the cloud – either through cloud-friendly licensing policies that charge by the hour and instance capacity, or with a cloud-native version of OBIEE designed to interoperate with Oracle’s Cloud Database offering, which up until now doesn’t provide external SQL*Net access and is really designed for powering cloud-based Java apps. Watch this space as they say – and it’ll be interesting to hear what Oracle announce in this area at next months’ Oracle Openworld.
Oracle Database 12c Release 1 came out last week, initially available for 64-bit Linux and Solaris (SPARC and x86_64), with the Windows and other versions presumably due over the next few months. So what’s in this new release for Oracle BI & DW customers, and how does compatibility stand with OWB, ODI and Oracle BI Enterprise Edition? It’s early days yet, but let’s take an initial look.
Probably the headline new feature for Database 12cR1 is “pluggable databases” – a system where a single “container” database can be created that can then contain one or more “pluggable” databases, similar to how a single physical server can host one or more virtualised, VMWare or Virtualbox servers. The advantage of container and pluggable databases is that whilst each pluggable databases appears to applications to be its own, self-contained database in-fact it shares the underlying database system infrastructure with the other pluggable databases in that container, reducing the total overhead compared to setting up several full Oracle databases.
Clearly this is something Oracle have put together to support their move into the cloud, but it’ll benefit ordinary customers by permitting a greater degree of data isolation than you’d get by a multi-schema approach, without all the overhead of creating virtual machines to run your databases in.
Whilst not a feature specifically aimed at BI & DW developers, those of us who are constantly spinning-up R&D and sandbox-type environments now have another option alongside multiple schemas, multiple databases and VMs/containers, though as we’ll see in a moment not all database features are available when running in container/pluggable mode. Note thought that use of more than one pluggable database within a container database requires a new Enterprise Edition option called the Multitenant Option, so its most probably aimed at customers serious about cloud, multi-tenancy and TCO reduction.
Another new feature that’ll affect all developers is the replacement of Database Control (the cut-down Enterprise Manager equivalent to Fusion Middleware Control) with DB Express, an Adobe Flash-based management console that comes with one or two new features and a couple of features taken away. As you can see from the screenshots below, the look is similar to Enterprise Manager Cloud Control, and for the average database developer or admin, it still looks like a pretty-good web-based admin tool.
One thing to note is that starting up DB Express is different to the old DB Console; before you can even use it, you need to set the HTTP port it listens on (for example, 8080) using the command below whilst logged in as SYS:
DB Express actually runs using XDB rather than a standalone EM instance, and this article on the AMIS blog by Marco Gralike runs through a bit more of the new product background along with a few other settings you might want to check-out and enable. You’ll also need to have Flash installed and enabled on any browser that uses it, which I guess rules out using Safari on my iPad for checking out 12c databases (at least, without using full Enterprise Manager Cloud Control 12c).
So, onto the BI & DW-specific features now. There’s not, at least with this initial 12cR1 release, much new in the OLAP Option area and of course there’s no new 12c release of Oracle Warehouse Builder, with instead an updated version of OWB 11gR2 now available that’s compatible with Database 12cR1. ODI11g is compatible with Oracle Database 12c in that it can connect to it via the usual JDBC drivers (as can OBIEE through the usual OCI ones), and David Allen from the ODI team posted this article the other day on installing the ODI repositories into a 12c pluggable (as opposed to regular, non-pluggable) database. Steve Karam has been updating and maintaining an excellent list of community Oracle Database 12c-related articles over at his blog, but let’s take a quick look at a couple of new 12c features that have interested us over at Rittman Mead.
The first one that grabbed Stewart’s eye was temporal validity. Whilst this sounds like something out of Doctor Who, it’s actually a feature that leverages flashback (or flash-forward, as Tom Kyte suggests) to return version of a table row entry based on validity at a certain time. To take an example from this Oracle Learning Library article on the topic, a query that uses temporal validity might look like:
select first_name, to_char(valid_time_start,'dd-mon-yyyy') "Start", to_char(valid_time_end,'dd-mon-yyyy') "End"
from hr.emp_temp versions
period for valid_time between to_date('01-SEP-1995') and to_date('01-SEP-1996')
order by 2;
which of course looks very-much like the types of queries we use in BI applications that need to make use of type-2 slowly changing dimensions. In effect, the valid-from and valid-to columns we usually create and maintain become hidden columns used by the Oracle database, and which we can make use of through queries like the one above, or the one below that sets a particular valid time period and then just queries the table, without any reference to this time columns:
select first_name, to_char(valid_time_start,'dd-mon-yyyy') "Start", to_char(valid_time_end,'dd-mon-yyyy') "End"
from hr.emp_temp order by 2;
Taking this to its logical conclusion, what this potentially gives us is a way of implementing SCD2-type dimension tables without having to use surrogate keys, as the temporal validity part will take care of row versioning, though there’s probably other benefits of surrogate keys that we’d then lose that I’ve not thought through, Where this does also get more interesting is when you consider another new 12cR1 feature, in-database row archiving, which allow you to “soft-delete” individual rows so they are excluded from normal DML statements, but they’re still there for regulatory, compliance or archiving reasons. For example, using the command:
update emp_arch set ora_archive_state=dbms_ilm.archivestatename(1)
where employee_id in (102, 103);
you can set a particular set of rows to be archived (i.e., not normally visible), run queries against that table and have those rows excluded, but then issue commands such as:
alter session set row archival visibility = all;
select employee_id, first_name, ora_archive_state
from emp_arch where ora_archive_state = dbms_ilm.archivestatename(1);
to see the archived, soft-deleted rows. This is quite an interesting new option we could consider when having to store in a data warehouse those table rows that have been deleted in the source system but we need to keep in the warehouse for regulatory or compliance purposes, and together with temporal validity gives us a few new ideas for putting together data warehouse physical models. To be honest – I’m expecting there to be a few gotchas – one we’ve spotted already is that temporal validity isn’t available when working with pluggable databases – but they’re a couple of interesting new features, nonetheless.
Other new features of interest in 12cR1 include adaptive SQL query plans that can change as the query executes, something that sounds very useful when the initial cardinality estimates turn out to be way-off, the death of advanced replication and streams (in favour of GoldenGate), the ability to disable archive logging when using data pump, enhanced WITH clauses and subquery factoring including the ability to include inline PL/SQL calls in the subqueries, invisible columns to go with 11g’s invisible indexes (with presumably invisible databases and servers to come later on), automatically incrementing identity columns (yay!), and bunch of other optimiser enhancements.
The final new feature that’s got us pretty excited at Rittman Mead Towers is probably the most significant new BI&DW-related SQL feature in 12c – SQL pattern matching. We’ll no-doubt go into SQL pattern matching in a lot more detail in a future blog post, but to take an example in the Using SQL for Pattern Matching Oracle-by-Example tutorial, the pattern-matching query below:
SELECT * FROM Ticker
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp, LAST(DOWN.tstamp) AS bottom_tstamp, LAST(UP.tstamp) AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP PATTERN (STRT DOWN+ UP+)
DEFINE DOWN AS DOWN.price < PREV(DOWN.price), UP AS UP.price > PREV(UP.price) ) MR
ORDER BY MR.symbol, MR.start_tstamp;
would look for patterns of rows that featured a “v-shaped” up-and-then-down movement, such as that shown in the graph below, that might be useful in identifying stock disposals around a key date.
Taking this further, you might use pattern-matching to find CEOs/Executives whose pattern of selling publicly traded stocks two months before financial close is inversely similar to Financial Performance(or Profitability) of all companies over the last two years, or follow the beer and diapers example to find group of products where there is sudden 100% spike in sale (inverted V graph) when compared with the previous day or next day (or previous/next n days/months, for example). Look out for blog posts from our own Stewart on features such as temporal validity and in-database row archiving, and Venkat who came up with the pattern matching examples, over the new few weeks – for now though, back to investigating this exciting new database release.