Category: "Computers and Internet"

Open Source BI - Dumbest Predictions

My search for Open Source BI predictions on Google led to the following mash-up:

We're looking for the "Top 11 dumbest IT predictions for 2006. ... This survey explores two themes: use and adoption of open-source BI software and use and ...
Google Search

Ah, totally unrelated articles. At least until they find the 11 Dumbest IT Predictions for 2006.

One of the prediction hints is somewhat related, though:

Larry Ellison will invent a new kitchen appliance, quit his job and start doing late-night infomercials.

KETL First Meeting

Clarise and I met with Marshall Stevenson, Juan Carlos Rojas and Nicholas E. C. Wakefield of Kinetic Networks, the company behind KETL. The open source KETL Framework is the ETL tool providing data gathering and data quality for the Bizgres Clickstream initiative.

Marshall, Clarise and I are all ex-Oracle, and worked together at a boutique consultancy eight years ago. In addition, Marshall has been specializing in BI solutions since then, and has even provided us with ETL specialists as supplemental staff for our own projects. Marshall recently joined Kinetic Networks as their VP of Sales, and set up this meeting.

With KPMG and IBM in his background, Juan founded Kinetic Networks in 1999, and is currently serving as the CEO. Kinetic Networks provided a data warehousing platform in a Software as a Service environment, as well as Business Intelligence System Integration, delivering BI solutions with a heavy emphasis on operational excellence.

After graduating from Oxford University, Nick worked at EDS and MicroStrategy before joining Kinetic Networks in 2000. Based upon Nick's work and their success with KETL, the company split its focus between System Integration and Technology development earlier this year.

A custom DW prototype for a client led to the first iterations of KETL. Under Nick's guidance it soon expanded by adding XML capabilities, and a plugin architecture using their own Java classes. From their SI group's best practices for Data Quality, they added the capability for data QA on the fly. It wasn't long before they found a customer who wanted the tool, more than System Integration, and a product strategy came about.

KETL has been used in production environments for over three years. Some of their experiences include:

  • customers choosing KETL because the cost savings on the "backend" ETL tool allowed for a better front-end reporting, OLAP and portal solution
  • running KETL alongside a commercial ETL tool, with a custom plugin allowing KETL to perform data quality checks for an existing install
  • handling hundreds of millions of records with KETL in its open source form
  • development driven by KETL's use in production environments

"Performance is tuned in areas that are important to our clients to date. The ability to read multiple files across multiple devices in parallel, using multiple CPUs, is something KETL can do relatively easily; our clients would not have used KETL extensively otherwise.

"Historically KETL has been used for commercial implementations along with services; packaging has not been the focus. Over time this will improve."
-- Nick Wakefield, CTO Kinetic Networks

KETL was made from the beginning to handle very large data sets. This, coupled with its ability to perform Statistical Analysis for on-the-fly data quality, makes KETL a perfect tool for Clickstream analysis. This capability, together with working alongside Greenplum at one customer, led to the open sourcing of the KETL Framework and its inclusion in the Bizgres stack.
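
To make "statistical analysis for on-the-fly data quality" a little more concrete, here is a minimal sketch in Python of one common way to do that kind of check: keep running statistics as records stream by, and flag values that stray too far from what has been seen so far. This is our own illustration, not KETL's code or API. The names RunningStats and flag_outliers, the three-standard-deviation threshold and the ten-record warm-up are all assumptions made for the example.

```python
import math


class RunningStats:
    """Running mean and variance in one pass (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Sample standard deviation; reported as 0.0 until two values are seen.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def flag_outliers(values, threshold=3.0, warmup=10):
    """Yield (value, is_suspect) pairs while streaming through values.

    A value is suspect when it lies more than `threshold` standard
    deviations from the mean of the values seen before it. The first
    `warmup` values are never flagged. Hypothetical helper, for
    illustration only.
    """
    stats = RunningStats()
    for value in values:
        suspect = False
        if stats.count >= warmup and stats.stddev > 0:
            suspect = abs(value - stats.mean) > threshold * stats.stddev
        yield value, suspect
        # Every value, suspect or not, feeds the running statistics.
        stats.update(value)


if __name__ == "__main__":
    # Made-up page load times in milliseconds, with one obviously bad record.
    page_load_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 5000, 100]
    for value, suspect in flag_outliers(page_load_ms):
        if suspect:
            print("suspect record:", value)
```

Because the statistics are updated one record at a time, nothing needs to be held in memory beyond three numbers, which is what makes this style of check practical on very large clickstream files. A real pipeline would probably also decide whether flagged records should be excluded from the running statistics; this sketch keeps them in for simplicity.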

Any company wishing to open source their product must decide how best to join the open source community. Choosing a license was very difficult. The KETL Framework is licensed under the LGPL. This allowed them to build commercial components upon it. The MPP and Data Quality modules are licensed commercially through an upgrade path.

Another choice to be made involves building a community around KETL. Currently, KETL relies on the Bizgres community. However, KETL works against any target database, and through its plugin architecture, can work with any source system. Thus, Kinetic Networks is looking towards building its own community.

Kinetic Networks has grown organically since its inception, without requiring any external funding. As such it relies on its System Integration group, including remote management of customer business intelligence solutions, for cash flow, as it grows its open source and product strategy.

As we proceed with our Open Source Business Intelligence book project, we're planning to stay in touch with Marshall, Nick and Juan, doing more interviews with them, and even a podcast or two. Bernard will be applying his Open Source Maturity Model, and we're looking forward to playing with KETL in the lab. We'll keep you posted.

While Kinetic Networks is finalizing their open source strategy, there are no links directly from the Kinetic Networks website to information about KETL. However, Nick has graciously provided the following links for the convenience of our readers. [As per the comment below, the following documentation is no longer available. Please check the documentation section of]

  • KETL for Data Integration [PDF]
  • KETL XML Steps Overview [PDF]
  • KETL Training [PDF]
  • KETL Installation Guide [PDF]
  • KETL System Operations Guide [PDF]
  • Documentation related to Bizgres and the stack
  • Software, currently it's part of Bizgres Clickstream only

Open Source Vendors

We (Clarise, Bernard and I) recently submitted two article ideas, and one was rejected because:

The second one is product-specific, thus unsuitable for publication. If you wanted to reshape it into a product review, which are written by end users to describe their experience with a product, please contact [a sister publication]... must emphasize that you avoid mentioning product names as well as methodologies specific to certain vendors.

Here's the proposal for that second article, which will now be written on this blog.

Mondrian is an open source OLAP engine that is very mature, having been in use since 2001. It is of interest, not only for its own capabilities, but for the fact that it is included in or required for nearly every other open source BI project that includes OLAP capability, from simple tools such as jPivot to full BI suites such as Pentaho. This article provides details about Mondrian and discusses its use in and importance to open source BI. The article will also discuss how to incorporate Mondrian into an organization's BI project.

As far as we know, Mondrian doesn't have a commercial arm, though its recent relationship with Pentaho may change that. It is difficult for me to think of various open source solutions as "vendors". That's why I tend to refer to them as "projects" rather than "products". Granted, some open source projects are dual-licensed, or have a commercial arm, like Greenplum for Bizgres, Kinetic Networks for KETL, and MySQL AB for MySQL. So, is this rejection an indication of a telling lack of awareness about open source, or am I wrong in my thinking? Are open source licensed software packages projects or products? Is discussing an open source project a discussion about a "specific vendor"? Food for thought.

Open Source Business Intelligence

Open Source Business Intelligence is a technology with a ready market. Open source RDBMSs have matured to the point where they can be used reliably as a data warehouse to support business intelligence solutions. Many open source projects are being introduced to expand the capabilities of open source solutions beyond reporting, and even simple OLAP, into complete business intelligence suites. Just take a look at the LinkBlog in the sidebar.

There are still some rough spots, however. So, in an effort to help data warehousing experts and open source aficionados along the path to making effective use of open source software for BI, we're introducing this blog and a wiki as companions to our forthcoming book, which has the working title of Open Source Business Intelligence.

At the beginning, The Open Source Solutions Blog was a companion to the Open Source Solutions for Business Intelligence Research Project, and book. But back in 2005, we couldn't find a publisher. As Apache Hadoop and its family of open source projects proliferated, and in many ways, took over the OSS data management and analytics world, our interests became more focused on streaming data management and analytics for IoT, the architecture for people, processes and technology required to bring value from the IoT through Sensor Analytics Ecosystems, and the maturity model organizations will need to follow to achieve SAEIoT success. OSS is very important in this world too, for DMA, API and community development.
