OSS Public:General Articles/BI Suites

From WikiOpenSourceSolutions

Introduction (295 words)

Open source software has been a part of IT organizations perhaps since the first computers, and certainly since the dawn of the Internet. One area in which there have been few open source projects is Business Intelligence (BI). This changed dramatically in 2005, with the introduction and growth of several open source projects providing fully capable BI suites. We've had the opportunity to survey six open source BI suites, and to compare their maturity and functional readiness to join the enterprise's IT software portfolio.

Historically, you would find open source software in enterprise IT shops only in the infrastructure, such as the original sendmail Simple Mail Transfer Protocol (SMTP) daemon, several BSD-style Unix operating systems or the Apache web server. This use even predates the coining of the term "Open Source", which is generally dated to 1998, around the time Netscape released the Mozilla source code. Since then, the open source movement has grown and moved into every aspect of Information Technology.

When we first started looking in 1999, our research into open source BI components showed few projects supporting BI, and no BI suites of the kind one finds from proprietary software vendors. There might be one open source project developing an Extract, Transform and Load (ETL) tool, one for reporting, one for analysis, and tools to build BI components, but nothing comprehensive. Of course, there were open source Relational Database Management Systems (RDBMS), but none that were optimized for very large databases or for querying. As of 2005, there were over 25 open source projects supporting every aspect of BI, from ETL to the user portal, including reporting, on-line analytical processing (OLAP), advanced analytics and data mining, workflow, and dashboards. The linkblog in the side column of our Open Source Solutions blog (http://press.teleinteractive.net/oss) tracks over 60 OSBI projects and supporting companies and communities, as well as blogs and other online resources related to enterprise open source.

Open Source Overview (460 words)

From academics and hackers exchanging code, to the Free Software Movement [n] of 1984, to the current Open Source Initiative [n], communities of amateur, hobbyist and professional developers have provided software in an atmosphere of respect for each other's freedom, creativity and ideas. There are now many open source communities, foundries and forges. The most popular of these is likely SourceForge [n], which exceeded 100,000 registered open source projects early in 2005.

The Free Software Movement explained that "free software" meant "free as in speech, not free as in beer". This has been the rallying cry of open source developers ever since. The general adoption of the term "open source" after 1998 brought a necessary clarification of this idea, because not all software that is free of charge is open source: "freeware" may carry as restrictive a license as any proprietary software. Open source software licenses all have one thing in common: they expose the source code for modification, reuse and redistribution. They vary in how one may modify, reuse or redistribute that code, including the conditions they attach to commercial use. The source code may be covered by a traditional copyright, copyleft or a Creative Commons license [n]. Attribution, in one form or another, is often required. There are even custom open source licenses, unique to a single open source project or to the commercial organization releasing the software. The most common open source licenses are the Berkeley (BSD) licenses and related permissive licenses such as the Apache license, and the GNU General Public License (GPL) and its derivatives. A full discussion of open source licensing is beyond the scope of this article, but we will name the license for each of the open source BI suites discussed here. A good source of open source licensing discussion is SourceLicense [n].

The impact of the open source movement has also been enhanced by the moves towards open standards and open application programming interfaces (APIs). The rise of electronic commerce and business communication using applications built upon the Internet Protocol (IP) has led organizations of all sizes, in both the public and private domains, to demand interoperability. The decline in IT budgets has led to a demand for inexpensive IT solutions. Both of these forces have contributed to the rise of open source software.

Open source software has moved beyond just providing basic IT infrastructure, and is no longer just the province of the "little man behind the curtain". Open source software can be found to support every business and communication function in the organization. There are complete open source packages offering rich feature sets for Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Document Management Systems, Portals, eCommerce and yes, even Business Intelligence.

The Rise of Open Source BI Suites

(393 words - need research on the first one to appear, how many in 2004, then 2005 then past six months and predictions on what is coming - see outline)

In 1999, we began to investigate open source software for data warehousing projects. We found the familiar open source RDBMSs, but while these could handle large data sets, they were oriented more towards transactional efficiency than analytical effectiveness. There were some open source Geographic Information Systems (GIS), which could conceivably be reworked to provide a multi-dimensional database (MDDB) structure for data warehousing. Over the following years, we found one reporting tool (JasperReports, 2001-09-25), one ETL tool (Jetstream, 2002-07-29), one OLAP engine (Mondrian, 2001-09-07), and one analytical user interface (jPivot, 2002-07-25) that used that OLAP engine.

The statistics in this article are accurate as of the time of writing. Things move very fast in the open source world, and they may have changed by the time you read this. Today, we have a much richer field of choices for building BI solutions from open source components. Distributions based on familiar open source RDBMSs have been released that offer compatibility with Oracle and PL/SQL, or extended features that make them suitable for querying very large databases. Open source projects have appeared that add OLAP capabilities to popular spreadsheet programs, or that take advantage of the Microsoft Analysis extensions to SQL Server. We can find open source projects that support every feature required to build a BI solution. In 2004, there were 14 open source projects that delivered a BI or data warehousing related function. In 2005, that number had grown to 26, and it is still growing today.

Also of importance in the open source world, and something you don't find with proprietary software, are open frameworks designed to help you build your own BI tools or enhance and extend your BI solution. There are three such frameworks: the Eclipse BIRT project [n], the EFEU project [n] and the JpGraph project [n].

There are seven open source RDBMSs, three of which provide enhancements specifically aimed at supporting queries against very large databases. There are eight open source ETL tools, two of which are bundled with BI suites. There are four open source reporting tools, and six OLAP engines and tools. In addition, either as frameworks, open standards, stand-alone projects or components within a suite, there are open source tools for advanced analytics, data mining, dashboards and portals. Most importantly, either by combining some of these components or by using unique components of their own, six full-featured BI suites have emerged. Five of these were first registered in 2005.

Predictions...

Project - Year Registered - # Downloads (need a way to automatically track and show downloads)


BI Suites

Bee - 2002
Bizgres - 2005
JasperBI Suite (a.k.a. JasperIntelligence) - 2005
Openi - 2005
Pentaho - 2005
SpagoBI - 2005


ETL/EAI/ESB

Aptara ETL - 2007
Clover ETL - 2005
CpluSQL - 2003
Enhydra Octopus - 2002
JasperETL (Talend) - 2006
Jetstream - 2002
KETL - 2005
Kettle - 2001-3?
miniETL - yyyy
Mule - 2005
openDigger - 2005
Pequel ETL - 2002
ServiceMix - yyyy
Talend Open Studio - 2006
Tuscany - 2007

OLAP

Cubulus - 2007
gOLAP - 2001
jPivot - 2002
Mondrian - 2001
openOLAP - 2003 (PostgreSQL)
openOLAP for MySQL - 2007
Palo - 2005
pocOLAP - 2004
PostgreSQL MDDB - 2001

Reporting

Agata Report - 200x
DataVision - 200x
JasperReports - 2001
OpenReports - 2005
OpenRPT - 2005


Dashboards

MarvelIT - 2005


Data Mining

CPAS - yyyy
Rattle - yyyy
Weka (joined with Pentaho) - yyyy

Range of Open Source BI Suite capabilities (1000 words)

We are defining BI suites fairly loosely, though not unlike the way proprietary BI suites were defined five years ago, especially before the recent spate of consolidations in this market space. We consider an open source project to be a component if it provides only one feature, capability or function of use in building a BI solution. If an open source project provides several features under one architecture, or especially under one installation, we consider it a suite. No BI suite, either proprietary or open source, truly provides a complete end-to-end BI solution, though open source suites are coming close. Like their proprietary counterparts, open source BI suites are used in conjunction with other components, packages or suites to form the final solution.

Some BI suites start with ETL, some include a database, some provide metadata management, most include a reporting tool, others provide some form of OLAP, and many provide a web portal.


   * Some suites, like Bizgres, combine an RDBMS enhanced for DW/BI queries with ETL and reporting; the Bizgres consortium seems to have fallen apart
   * Others offer OLAP and a portal
   * JasperBI, Pentaho and SpagoBI seem to be the most active in 2007, providing the most features and innovation

Specifics on the Open Source BI Suites (1500 words)

BEE Project

BEE is one of the first open source BI suites, having been around since 2002. It is written in Perl, primarily supports MySQL, and provides ETL, ROLAP, reporting and integration with the R Project.

Bizgres

Bizgres is a distribution of PostgreSQL with specific modifications to improve its performance as a data warehouse. In addition, the Bizgres project comes with the KETL ETL tool and JasperReports. The project is supported by a consortium of three companies: Greenplum, Kinetic Networks and JasperSoft. As of late 2006, Bizgres seems to be concentrating on the RDBMS, with the consortium members focusing on their individual efforts.

JasperBI Suite

The JasperBI Suite evolved from the JasperIntelligence JasperServer framework and JasperAnalysis (based on Mondrian). It provides a Web and Web services based environment for reporting, data analysis/OLAP (Mondrian), and data integration/ETL (Talend).

Openi

Openi provides a web-driven interface to OLAP, relational, statistical and data mining sources, giving BI integrators user interface, report definition and connector tools.

Pentaho

Pentaho has been getting a lot of attention since its launch and funding in 2005. The project's team leaders have an impressive pedigree, and it provides quite an array of capabilities: reporting (JFreeReport and others), analysis (Mondrian), dashboards, data mining (Weka), ETL/EAI (KETTLE) and workflow.

SpagoBI

SpagoBI is a BI platform drawing its components from the ObjectWeb consortium. Tools include metadata management, ETL, Reporting, Analysis, and Dashboards.


   * Show each of the suites that we know about, capabilities and status

Some Comparison to Proprietary (500 words)

   * Compare the OSS BI Suites to Cognos, Business Objects, SAS, Microstrategy, SPSS, Oracle, etc.
   * Show how proprietary suites grew over time
   * Perhaps talk about consolidation in the proprietary BI market (Brio, Crystal, Hyperion, etc.)
   * Talk about open source combined with proprietary - like MySQL's arrangement with Business Objects
         o http://www.businessobjects.com/news/press/press2005/20050418_mysql_part.asp
         o http://www.mysql.com/news-and-events/press-release/release_2005_10.html

Some indication of Adoption (500 words)

   * Number of Downloads
   * Any referenceable case studies
   * Any referenceable analysts (Gartner, other) reports or predictions on adoption


Conclusion (250 words)

   * Advantages of open source, such as community support, availability of source code and ease of customization
   * Protection against a proprietary vendor being acquired or going under, an undesirable upgrade path, or loss of support
   * OSS BI growing and getting better - more innovative
   * Use in conjunction with legacy, commercial or proprietary: inexpensive path to expand BI to new user groups