Category: "books"

Reading Pentaho Kettle Solutions

On a rainy day, there's nothing better than to be sitting by the stove, stirring a big kettle with a finely turned spoon. I might be cooking up a nice meal of Abruzzo Maccheroni alla Chitarra con Polpettine, but actually, I'm reading the ebook edition of Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration on my iPhone.

Some of my notes made while reading Pentaho Kettle Solutinos:

…45% of all ETL is still done by hand-coded programs/scripts… made sense when… tools have 6-figure price tags… Actually, some extractions and many transformations can't be done natively in high-priced tools like Informatica and Ab Initio.

Jobs, transformations, steps and hops are the basic building blocks of KETTLE processes

It's great to see the Agile Manisto quoted at the beginning of the discussion of AgileBI. 

Questions and Commonality

In the introduction to our open source solutions (OSS) for decision support systems (DSS) study guide (SG), I gave a variety of examples of activities that might be considered using a DSS. I asked some questions as to what common elements exist among these activities that might help us to define a modern platform for DSS, and whether or not we could build such a system using open source solutions.

In this post, let's examine the first of those questions, and see if we can start answering those questions. In the next post, we will lay out a syllabus of sorts for this OSS DSS SG.

The first common element is that in all cases, we have an individual doing the activity, not a machine nor a committee.

Secondly, the individual has some resources at their disposal. Those resources include current and historical information, structured and unstructured data, communiqués and opinions, and some amount of personal experience, augmented by the experience of others.

Thirdly, though not explicit, there's the idea of digesting these resources and performing formal or informal analyses.

Fourthly, though again, not explicit, the concept of trying to predict what might happen next, or as a result of the decision is inherent to all of the examples.

Finally, there's collaboration involved. Few of us can make good decisions in a vacuum.

Of course, since the examples are fictional, and created by us, they represent our biases. If you had fingered our domain server back in 1993, or read our .project and .plan files from that time, you would have seen that we were interested in sharing information and analyses, while providing a framework for making decisions using such tools as email, gopher and electronic bulletin boards. So, if you identify any other commonalities, or think anything is missing, please join the discussion in the comments.

From these commonalities, can we begin to answer the first question we had asked: "What does this term [DSS] really mean?". Let's try.

A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.

That's nice and vague; generic enough to almost meaningless, but provides some key points that will help us to bound the specifics as we go along. For example, if a process or technology doesn't help us to make a better decision, than it doesn't fit. If something allows us to make a better decision, but we can't define the process or identify the technology involved, it doesn't belong (e.g. "my gut tells me so").

Let's create a list from all of the above.

  1. Individual Decision Maker
  2. Process
  3. Technology
  4. Structured Data
  5. Unstructured Data
  6. Historical Information
  7. Current Information
  8. Communication
  9. Opinion
  10. Collaboration
  11. Analysis
  12. Prediction
  13. Personal Experience
  14. Other's Experience

What do you think? Does a modern system to support decisions need to cover all of these elements and no others? Is this list complete and sufficient? The comments are open.

Pentaho Reporting Review

As promised in my post, "Pentaho Reporting 3.5 for Java Developers First Look", I've taken the time to thoroughly grok Pentaho Reporting 3.5 for Java Developers by Will Gorman [direct link to Packt Publishing][Buy the book from Amazon]. I've read the book, cover-to-cover, and gone through the [non-Java] exercises. As I said in my first look at this book, it contains nuggets of wisdom and practicalities drawn from deep insider knowledge. This book does best serve its target audience, Java developers with a need to incorporate reporting into their applications. But it is also useful for report developers who wish to know more about Pentaho, and Pentaho users who wish to make their use of Pentaho easier and the resulting reporting experience richer.

The first three chapters provide a very good introduction to Pentaho Reporting and its relationship to the Pentaho BI Suite and the company Pentaho, historical, technical and practical. These three chapters are also the ones that have clearly marked sections for Java specific information and exercises. By the end of Chapter Three, you'll have installed Pentaho Report Designer, and built several rich reports. If you're a Java developer, you'll have had the opportunity to incorporate these reports into both Tomcat J2EE or Swing web applications. You'll have been introduced to the rich reporting capabilities of Pentaho, accessing data sources, the underlying Java libraries, and the various output options that include PDF, Excel, CSV, RTF, XML and plain text.

Chapters 4 through 8 is all about the WYSIWYG Pentaho Report Designer, the pixel-level control that it gives you over the layout of your reports, and the many wonderful capabilities provided by Pentaho Reporting from a wide range of chart types to embedding numeric and text functions, to cross-tabs and sub-reports. Other than Chapter 5, these chapters are as useful for a business user creating their own reports, as it is for a report developer. Chapter 5 is a very deep dive, very technical look at incorporating various data sources. The two areas that really stand out are the charts (Chapter 6) and functions (Chapter 7).

There are a baker's dozen types of charts covered, with an example for each type. Some of the more exotic are Waterfall, Bar-Line, Radar and Extended XY Series charts.

There are hundreds of parameters, functions and expressions that can be used in Pentaho Reports, and Will covers them all. The formula capability of Pentaho Reporting follows the OpenFormula standard, similar to the support for formulæ in Microsoft Excel, and the same as that followed by One can provide computed text or numeric values within Pentaho reports to a fairly complex extent. Chapter 7 provides a great introduction to using this feature.

Chapters 9 through 11 are very much for the software developer, covering the development of Interactive Reports in Swing and HTML, the use of Pentaho's APIs and extension of Pentaho Reporting capabilities. It's all interesting stuff, that really explains the technology of Pentaho Reporting, but there's little here that is of use to the business user or non-Java report developer.

The first part of Chapter 12, on the other hand, is of little use to the Java developer, as it shows how to take reports created in Pentaho Report Designer and publish them through the Pentaho BI-Server, including formats suitable to mobile devices, such as the iPhone. The latter part of Chapter 12 goes into the use of metadata, and is useful both for the report developer and the Java developer.

So, as I said in my first look, the majority of the book is useful even if you're not a Java developer who needs to incorporate sophisticated reports into your application. That being said, Will Gorman does an excellent job in explaining Pentaho Reporting, and making it very useful for business users, report designers, report developers and, his target audience, Java developers. I heartily recommend that you buy this book. [Amazon link]

Information Architecture and DynamoBI

Anyone who follows either Nicholas Goodman or myself on Twitter (links are to our Twitter handles) or follow either this blog or Nick's Goodman on BI blog, know that I've been helping Nick out here and there with his new business, Dynamo Business Intelligence Corporation, offering support and commercial (and still open source) packages of the "best column-store database you never heard of", LucidDB.

One of the things that I'll be doing over the next few weeks is some website and community development. For all that I've been an executive type for decades, I love to keep hands-on with various technologies, and one of those technologies is "THE WEB". While I've never made a living as a web developer, I started with the web very early on, developing internal sites for the Lynx browser, as one of the internal web chiefs, learning from Comet, the Oracle web master. The first commercial site that I did, in 1994, for the local Eagle Express Flowers, is still up, with a few modernizations. :)

So, while waiting for the style guide from CORHOUSE, who designed the new Dynamo Business Intelligence Corporation logo [what do you think of it?]…

I've decided to go through an old friend. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites

This exercise has reminded me that Information Architecture isn't just important for websites, but also for all the ways that individuals and businesses organize their data, concepts, information and knowledge. I'm happy to be helping out DynamoBI, and glad that doing so led me to this reminder of something I've been taking for granted. Time to revisit those [Ever]notes, [Zotero] researches, files and what not.

Pentaho Reporting 3.5 for Java Developers First Look

I was approached by Richard Dias of Packt Publishing to review "Pentaho Reporting 3.5 for Java Developers" written by Will Gorman. (Link is to

Richard Dias has indicated you are a Friend:

Hi Joseph,

My name is Richard Dias and I work for Packt Publishing which specializes in publishing focused IT related books.

I was wondering if you would be interesteed in reviewing the book "Pentaho Reporting for Java Developers" written by Will Gorman.

- Richard Dias

After some back and forth, I decided to accept the book in exchange for my review.

Hi Joseph,

Thanks for the reply and interest in reviewing the book. I have just placed an order for a copy of the book and it should arrive at your place within 10 days. Please do let me know when you receive it.

I have also created a unique link for you. It is Please feel free to use this link in your book review.

In the meanwhile, if you could mention about the book on your blog and tweet about the book, it would be highly appreciated. Please do let me know if it is fine with you.

I’m also sending you the link of an extracted chapter from the book (Chapter 6 Including Charts and Graphics in Reports). It would be great if you could put up the link on your blog. This would act as first hand information for your readers and they will also be able to download the file.

Any queries or suggestions are always welcome.

I look forward to your reply.

Best Regards,


Richard Dias
Marketing Research Executive | Packt Publishing |

Shortly thereafter, I received notification that the book had shipped. It arrived within two weeks.

Of course, I've been too busy to do more than skim through the book. Anyone who follows me as JAdP on Twitter knows that in the past few weeks, I've been:

  • helping customers with algorithm development and implementing Pentaho on LucidDB,
  • working with Nicholas Goodman with his planning for commercial support of LucidDB through Dynamo Business Intelligence, and roadmaps for DynamoDB packages built on LucidDB's plugin architecture, and
  • migrating our RHEL host at ServerBeach from our old machine to a new one, while dealing with issues brought about by ServerBeach migrating to Peer1's tools.

None of which has left any time for a thorough review of "Pentaho Reporting for Java Developers".

I hope to have a full review up shortly after the holidays, which for me runs from Solstice to Epiphany, and maybe into the following weekend.

First, a little background. Will Gorman, the author, works for Pentaho, in software engineering, as a team lead, and works primarily on Pentaho Reporting products, a combination of server-side (Pentaho BI-Server), Desktop (MacOSX, Linux and Windows platforms) and Web-based software (Reporting Engine, Report Designer, Report Design Wizard and Pentaho Ad Hoc Reporting), which stems from the open source JFreeReport and JFreeChart. While I don't know Will personally, I do know quite a few individuals at Pentaho, and in the Pentaho community. I very much endorse their philosophy towards open source, and the way they've treated the open source projects and communities that they've integrated into their Pentaho Business Intelligence Suite. I do follow Will on Twitter, and on the IRC Freednode Channel, ##pentaho.

I myself am not a Java Developer, so at first I was not attracted to a book with a title that seemed geared to Pentaho Developers. Having skimmed through the book, I think that the title was poorly chosen. (Sorry Richard). I find that I can read through the book without stumbling, and that there is plenty of good intelligence that will help me better server and instruct my customers through the use of Pentaho Report Designer.

My initial impressions are good. The content seems full of golden nuggets of "how-tos" and background information not commonly known among the Pentaho community. Will's knowledge of Pentaho Reporting and how it fits into the rest of the Pentaho tools, such as KETTLE (Pentaho Data Integration) and Mondrian (Pentaho Analysis), along with a clear writing style makes all aspects of Pentaho more accessible to the BI practitioner, as well as those that wish to embed Pentaho Reporting into their own application.

This book is not just for Java developers, but for anyone who wishes to extend their abilities in BI, Reporting and Analysis, with Pentaho as an excellent example.

I'll be following up with the really exciting finds as I wend my way through Will's gold mine of knowledge, and, will do my best to fulfill my promise of a full review by mid-January.

You can also click through the Chapter 6 (a PDF) as mentioned in Richard's email.

Thank you, Richard. And most especially, thank you, Will.

OSBI Book Status

The Open Source Business Intelligence Book, as originally conceived, is dead. Despite the transparency of the blogosphere, we don't feel the need to go into the details as to why. The Open Source Business Intelligence research project, blog, wiki and lens, will continue, and improve, without the distractions and roadblocks posed by the book project.

We do want to share some of the things that we have learned during our year long attempt to have our book published.

The traditional process of publishing a book doesn't serve the needs of the technology industry. Joe Wikert, in his "A Book Publisher's Blog", discusses many of the issues. A simple summary would be that it simply takes too long; by the time a book is accepted and published, it is, in many cases, out of date.

One lesson that keeps coming up over the years, is that every partnership, every business arrangement must be in writing, with as many details and contingencies thought out as is humanly possible. Until everyone is willing to put things in writing, you aren't doing business.

Some large publishing houses are hopeful that eBooks will be the answer for their technical book sales woes. But as long as the model is the traditional publishing model, the problems aren't addressed. Whether the end product is a static eBook or a static physical book, books sales will continue to slide.

Our frustrations with attempting to use traditional methods, and our learning experience with online publishing platforms, as well as our use of various eBook systems is causing us to reconsider our formulation of the TeleInterActive Press, with its blogs, podcasts, wikis, surveys, CMS and document management systems. What would other authors want? What do consumers need? How must the publishing industry change? How will all of this come together into a viable new publishing model to make technical publications relevant, immediate, easily updated and convenient?

We'll see as we continue to develop the OSBI wiki, this blog and related lens, surveys, and total content/document management, all in the context of researching other book topics.

Reviewer Hibernation Period Almost Done

The following are taken from an email exchnage with the publisher currently looking at our Open Source Business Intelligence book proposal.

'm sorry, but things are going more slowly due to the holidays. Now, I've really only gotten one review back (it was positive). My other reviewers appear not to be coming through.

"Can you suggest a couple of reviewers? Ideally somebody from the target audience?"end quotation
-- First email

Nevermind on the reviewers seem to have emerged from their
holiday slumber...end quotation
-- Second email

Dang! I hope it was the holiday cheer that put them out, and not the proposal itself. :>> At least the one reviewer who was awake liked the idea. :idea:

Open Source Vendors

We (Clarise, Bernard and I) recently submitted two article ideas, and one was rejected becuase

The second one is product-specific, thus unsuitable for publication. If you wanted to reshape it into a product review, which are written by end users to describe their experience with a product, please contact [a sister publication]... must emphasize that you avoid mentioning product names as well as methodologies specific to certain vendors.end quotation

Here's the proposal for that second article, which will now be writen on this blog.

Mondrian is an open source OLAP engine that is very mature, having been in use since 2001. It is of interest, not only for its own capabilities, but for the fact that it is included in or required for nearly every other open source BI project that includes OLAP capability, from simple tools such as jPivot to full BI suites such as Pentaho. This article provides details about Mondrian and discusses its use in and importance to open source BI. The article will also discuss how to incorporate Mondrian into an organization's BI project.end quotation

As far as we know, Mondrian doesn't have a commercial arm, though its recent relationship with Pentaho may change that. It is difficult for me to think of various open source solutions as "vendors". That's why I tend to refer to them as "projects" rather than "products". Granted, some open source projects are dual-licensed, or have a commercial arm, like Green Plum for Bizgres, and Kinetic Network for KETL, MySQL AB for MySQL. So, is this rejection an indication of a telling lack of awareness about open source, or am I wrong in my thinking? Are open source licensed software packages projects or products? Is discussing an open source project, a discussion about a "specific vendor"? Food for thought.

Open Source Business Intelligence

Open Source Business Intelligence is a technology with a ready market. Oopen source RDBMS have matured to the point where they can be used reliably as a data warehouse to support business intelligence solutions. Many open source projects are being introduced to expand the capabilities of open source solutions beyond reporting, and even simple OLAP, into complete business intelligence suites. Just take a look at the LinkBlog in the sidebar.

There are still some rough spots, however. So, in an effort to help data warehousing experts and open source afficianados along the path to making effective use of open source software for BI, we're introducing this blog, and a wiki, as companions to our effort to guide our forthcoming book with the working title of Open Source Business Intelligence.

April 2018
Mon Tue Wed Thu Fri Sat Sun
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
 << <   > >>

At the beginning, The Open Source Solutions Blog was a companion to the Open Source Solutions for Business Intelligence Research Project, and book. But back in 2005, we couldn't find a publisher. As Apache Hadoop and its family of open source projects proliferated, and in many ways, took over the OSS data management and analytics world, our interests became more focused on streaming data management and analytics for IoT, the architecture for people, processes and technology required to bring value from the IoT through Sensor Analytics Ecosystems, and the maturity model organizations will need to follow to achieve SAEIoT success. OSS is very important in this world too, for DMA, API and community development.

37.652951177164 -122.490877706959


  XML Feeds


Our current thinking on sensor analytics ecosystems (SAE) bringing together critical solution spaces best addressed by Internet of Things (IoT) and advances in Data Management and Analytics (DMA) is here.

Recent Posts

powered by b2evolution free blog software