Category: "Business Intelligence"

Nicholas Goodman

I recently came across the blog of Nicholas Goodman, a BI professional who writes about open source tools, Oracle Warehouse Builder, and other BI topics.

I commented on his post about KETL vs KETTLE.

Check him out.

Open Source: Closing thoughts of Vladimir Stojanovski

Over the past five years, our research into open source BI components has shown few projects supporting BI, and no BI suites, until recently. Bee is the oldest of the open source BI suites, starting in 2002. Five years ago, there was one open source project developing an Extract, Transform and Load (ETL) tool (Jetstream), one for reporting (JasperReports), and one for analysis (Mondrian). Of the open source Relational Database Management Systems (RDBMS), none were optimized for very large databases, or for querying, until this year. There are now over 25 open source projects supporting every aspect of BI, from ETL to the user portal, including reporting, on-line analytical processing (OLAP), advanced analytics and data mining, workflow, and dashboards. Six of these can be considered BI suites, with all but Bee having launched this year.

Vladimir Stojanovski has written a five-part article in his blog at ITtoolbox. Part of his conclusion is quoted below.

Call me shortsighted, but then this nomer could also apply to the CRM/BI industry indiscriminately (except for the brave souls at places like SugarCRM [see post Open Source: CRM and Business Intelligence (Part 2 - SugarCRM)] and Pentaho [see post Open Source: CRM and Business Intelligence (Part 3 - Pentaho, et al)]). The industry is finally being forced to take Open Source seriously not necessarily because we think it is a great movement, but because our clients are forcing us to do so. An increasing number of companies are adopting Open Source in fundamental areas such as operating systems (Linux), database platforms (MySQL, PostgreSQL), application servers (JBoss), and web servers (Apache). This foundational platform is then forcing itself onto enterprise-class applications, such as CRM.
-- Open Source: Closing thoughts, I think... (Part 5) by Vladimir Stojanovski

As shown in my opening paragraph, the open source movement is responding to the interest in open source solutions for enterprise applications, particularly BI. You can check out the links in the side column of this blog for a list of open source BI suites and tools being developed. We'll be continuing with our research and use of open source BI solutions over the coming year, and I think it will be some time beyond that before we, or Vladimir, or anyone else, actually writes the final Closing Thoughts on open source BI.

VC Money for Pentaho

Open Source Business Intelligence is gaining ground and seems to be on the radar of VC firms. Pentaho, a provider of Open Source Business Intelligence (BI) software, has received $5 million in Series A funding.

Other Pentaho news includes:

KETL First Meeting

Clarise and I met with Marshall Stevenson, Juan Carlos Rojas and Nicholas E. C. Wakefield of Kinetic Networks, the company behind KETL. The open source KETL Framework is the ETL tool providing data gathering and data quality for the Bizgres Clickstream initiative.

Marshall, Clarise and I are all ex-Oracle, and worked together at a boutique consultancy eight years ago. In addition, Marshall has been specializing in BI solutions since then, and has even provided us with ETL specialists as supplemental staff for our own projects. Marshall recently joined Kinetic Networks as their VP of Sales, and set up this meeting.

With KPMG and IBM in his background, Juan founded Kinetic Networks in 1999, and is currently serving as the CEO. Kinetic Networks provides a data warehousing platform in a Software as a Service environment, and also does Business Intelligence System Integration, delivering BI solutions with a heavy emphasis on operational excellence.

After graduating from Oxford University, Nick worked at EDS and MicroStrategy, before joining Kinetic Networks in 2000. Based upon Nick's work and their success with KETL, the company split its focus between System Integration and Technology development earlier this year.

A custom DW prototype for a client led to the first iterations of KETL. Under Nick's guidance it soon expanded by adding XML capabilities, and a plugin architecture using their own Java classes. From their SI group's best practices for Data Quality, they added the capability for data QA on the fly. It wasn't long before they found a customer who wanted the tool, more than System Integration, and a product strategy came about.

KETL has been used in production environments for over three years. Some of their experiences include:

  • customers who chose KETL because the cost savings on the "backend" ETL tool allowed for a better front-end Reporting, OLAP and Portal solution
  • running KETL alongside commercial ETL, with a custom plugin allowing KETL to perform data quality for an existing install
  • handling hundreds of millions of records with KETL in its open source form
  • development of KETL driven by its use in production environments

"Performance is tuned in areas that are important to our clients to date. The ability to read multiple files across multiple devices in parallel, using multiple CPUs is something KETL can do relatively easily, our clients would not have used KETL extensively otherwise.

"Historically KETL has been used for commercial implementations along with services; packaging has not been the focus. Over time this will improve."
-- Nick Wakefield, CTO Kinetic Networks

KETL was made from the beginning to handle very large data sets. This, coupled with its ability to perform Statistical Analysis for on-the-fly data quality, makes KETL a perfect tool for Clickstream analysis. This capability, together with working alongside Greenplum at one customer, led to the open sourcing of the KETL Framework, and its inclusion in the Bizgres stack.
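For readers wondering what statistical data quality "on the fly" can look like, here is a minimal Python sketch. It is my own illustration and not KETL's implementation: the names `RunningStats` and `check_stream`, the three-sigma threshold, the warm-up count, and the sample page-load figures are all assumptions made for the example. It keeps a running mean and standard deviation (Welford's online algorithm) and flags records that fall far from what has been seen so far, in a single pass and without holding the data set in memory.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Sample standard deviation; reported as 0.0 for fewer than two values.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def check_stream(values, max_sigma=3.0, warmup=5):
    """Yield (value, is_suspect) pairs while records flow through.

    A value is suspect if it lies more than max_sigma standard deviations
    from the mean of the records accepted before it. The first `warmup`
    records are always accepted so the statistics have a baseline.
    """
    stats = RunningStats()
    for value in values:
        suspect = (
            stats.count >= warmup
            and stats.stddev > 0
            and abs(value - stats.mean) > max_sigma * stats.stddev
        )
        yield value, suspect
        if not suspect:
            stats.update(value)  # keep outliers from skewing the baseline


# Hypothetical clickstream measurements (page load times in milliseconds).
page_load_ms = [120, 130, 125, 118, 122, 127, 9000, 121]
flagged = [v for v, suspect in check_stream(page_load_ms) if suspect]
print(flagged)
```

Because each record is checked as it passes and only three numbers of state are kept, the same idea scales to very large inputs; a real ETL tool would of course apply far richer rules than a single threshold.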

Any company wishing to open source their product must decide how best to join the open source community. Choosing a license was very difficult. The KETL Framework is licensed under the LGPL. This allowed them to build commercial components upon it. The MPP and Data Quality modules are licensed commercially through an upgrade path.

Another choice to be made involves building a community around KETL. Currently, KETL relies on the Bizgres community. However, KETL works against any target database, and through its plugin architecture, can work with any source system. Thus, Kinetic Networks is looking towards building its own community.

Kinetic Networks has grown organically since its inception, without requiring any external funding. As such it relies on its System Integration group, including remote management of customer business intelligence solutions, for cash flow, as it grows its open source and product strategy.

As we proceed with our Open Source Business Intelligence book project, we're planning to stay in touch with Marshall, Nick and Juan, doing more interviews with them, and even a podcast or two. Bernard will be applying his Open Source Maturity Model, and we're looking forward to playing with KETL in the lab. We'll keep you posted.

While Kinetic Networks is finalizing their open source strategy, there are no links directly from the Kinetic Networks website to information about KETL. However, Nick has graciously provided the following links for the convenience of our readers. [As per the comment below, the following documentation is no longer available. Please check the documentation section of]

  • KETL for Data Integration [PDF]
  • KETL XML Steps Overview [PDF]
  • KETL Training [PDF]
  • Installation Guide [PDF]
  • KETL System Operations Guide [PDF]
  • Documentation related to Bizgres and the stack
  • Software, currently it's part of Bizgres Clickstream only

Open Source Vendors

We (Clarise, Bernard and I) recently submitted two article ideas, and one was rejected because:

The second one is product-specific, thus unsuitable for publication. If you wanted to reshape it into a product review, which are written by end users to describe their experience with a product, please contact [a sister publication]... must emphasize that you avoid mentioning product names as well as methodologies specific to certain vendors.

Here's the proposal for that second article, which will now be written on this blog.

Mondrian is an open source OLAP engine that is very mature, having been in use since 2001. It is of interest, not only for its own capabilities, but for the fact that it is included in or required for nearly every other open source BI project that includes OLAP capability, from simple tools such as jPivot to full BI suites such as Pentaho. This article provides details about Mondrian and discusses its use in and importance to open source BI. The article will also discuss how to incorporate Mondrian into an organization's BI project.

As far as we know, Mondrian doesn't have a commercial arm, though its recent relationship with Pentaho may change that. It is difficult for me to think of various open source solutions as "vendors". That's why I tend to refer to them as "projects" rather than "products". Granted, some open source projects are dual-licensed, or have a commercial arm, like Greenplum for Bizgres, Kinetic Networks for KETL, and MySQL AB for MySQL. So, is this rejection an indication of a telling lack of awareness about open source, or am I wrong in my thinking? Are open source licensed software packages projects or products? Is discussing an open source project a discussion about a "specific vendor"? Food for thought.


Recently, Navica and InterASC teamed up on a project where the customer required that we use PostgreSQL as a central data warehouse. Clarise pointed out that PostgreSQL lacked essential attributes to be an efficient platform for data warehousing. In investigating alternatives, we discovered Bizgres. Bizgres is a separate distribution based on PostgreSQL with the primary purpose of filling exactly the gaps Clarise had highlighted, such as table partitioning and bitmap indexing, and the secondary purpose of building a BI suite. As we develop Open Source Business Intelligence, we'll be writing posts describing the enhancements that Bizgres is making to PostgreSQL, and comparing Bizgres to Oracle as a DW platform.


At the beginning, The Open Source Solutions Blog was a companion to the Open Source Solutions for Business Intelligence Research Project, and book. But back in 2005, we couldn't find a publisher. As Apache Hadoop and its family of open source projects proliferated, and in many ways, took over the OSS data management and analytics world, our interests became more focused on streaming data management and analytics for IoT, the architecture for people, processes and technology required to bring value from the IoT through Sensor Analytics Ecosystems, and the maturity model organizations will need to follow to achieve SAEIoT success. OSS is very important in this world too, for DMA, API and community development.
