Category: "Agile"

Reading Pentaho Kettle Solutions

On a rainy day, there's nothing better than to be sitting by the stove, stirring a big kettle with a finely turned spoon. I might be cooking up a nice meal of Abruzzo Maccheroni alla Chitarra con Polpettine, but actually, I'm reading the ebook edition of Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration on my iPhone.

Some of my notes made while reading Pentaho Kettle Solutions:

…45% of all ETL is still done by hand-coded programs/scripts… made sense when… tools have 6-figure price tags… Actually, some extractions and many transformations can't be done natively in high-priced tools like Informatica and Ab Initio.

Jobs, transformations, steps and hops are the basic building blocks of KETTLE processes

It's great to see the Agile Manifesto quoted at the beginning of the discussion of AgileBI.

BayAreaUseR October Special Event

Zhou Yu organized a great special event for the San Francisco Bay Area Use R group, and has asked me to post the slide decks for download. Here they are:

No longer missing is the very interesting presentation by Yasemin Atalay showing the difference in plotting analysis using the Windermere Humic Aqueous Model for river water environmental factors, first without using R, and then the increase in variety and accuracy of analysis and plotting gained by using R.


Today I had the good fortune of speaking with Doug Moran, Founder and VP of Community, and James Dixon, Founder and Chief Geek/CTO of Pentaho, about their OpenScrum methodology. Doug had responded to my question on LinkedIn, "What has been your experience with Master Data Management and SOA, SaaS and Agile, in any combination?"

In creating the OpenScrum Agile software development method, Pentaho faced many of the same challenges that we've been facing with some of our customers:

  1. Extending Agile methods to be used by distributed workgroups; in Pentaho's case, this includes every inhabited time zone on the planet for their extended community, and several time zones in the USA and EC for the core group
  2. Adjusting Agile methods to work with several products or modules or projects in parallel
  3. Documenting the communication among team members even when "The most efficient and effective method of conveying information to and within a development team is face-to-face conversation" isn't possible - quote taken from the Agile Manifesto Principles
  4. Dealing with the concepts of timeboxes and rhythm
  5. Deciding on who best fills the roles of product owner, scrum master, and the various levels of commitment among the core group, the extended team and the community at large
  6. How best to involve QA/Test - a subject on which Scrum gives little guidance

The OpenScrum methodology deals with much of this, as does our ever evolving N Dimensions of a Project methodology, which has gone from 5D™ in the mid- to late-90's to the 8D™ iteration that we're currently documenting.

The most interesting point to come from our discussion is that Doug and James have come to many of the same conclusions and principles as have Clarise and I, even though our "experimental sample" ;) is the Pentaho open source communities on the one hand, and the various internal IT and SaaS software development distributed workgroups on the other.

One of the first things we discussed concerned the idea of fixed timeboxes and rhythm. The idea of rhythm is very attractive, but very difficult to achieve in most situations. In many ways, the concepts that Agile methods address, responsiveness to changing user needs and a changing market, coupled with the realities of changing personnel and other organizational and business changes, make achieving a short-term rhythm very difficult. As an organization matures, perhaps a longer-term rhythm, such as a quarterly or "seasonal" rhythm, might be a realistic goal. As an organization first moves into an Agile process, and develops an Agile software development methodology that fits with its evolving culture and ecosystem, sprint timeboxes must be flexible, and achieving a fixed release rhythm isn't practical. For our part, we make setting the timebox part of each sprint planning meeting. Once set, the timebox is inviolable, and any further negotiations must be around the feature set to be achieved in the sprint.

In the same area of time, James and Doug shared their observation that the sprint burn-down chart can result in a false sense of security or panic. No work effort follows the nice smooth regression line of a sprint burn-down chart, but more sophisticated modeling, to plan for the stepped nature of real-life work, is beyond simple spreadsheets. It's not beyond the current math modeling state of the art nor the capabilities inherent to the various analytic and data mining modules of Pentaho. While Pentaho currently has no plans for a "Pentaho for OpenScrum" similar to "Pentaho for Jira", it's certainly food for thought for the Pentaho community, and led to some fun brainstorming on our call.
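To make the point about the smooth line versus stepped reality a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names, the 40-point sprint and the completion days are all made up for the example, and none of it comes from Pentaho or the OpenScrum methodology. It compares the ideal straight-line burn-down with a burn-down where points only drop on the days when a story is actually finished.

```python
def ideal_burndown(total_points, days):
    """Remaining work on the straight line, for day 0 through the last day."""
    return [total_points * (days - d) / days for d in range(days + 1)]


def stepped_burndown(total_points, days, completions):
    """Remaining work when points only drop on the day a story is finished.

    completions maps a day number to the points completed on that day
    (hypothetical data; days with no finished stories are simply absent).
    """
    remaining = [total_points]
    for day in range(1, days + 1):
        remaining.append(remaining[-1] - completions.get(day, 0))
    return remaining


def gaps(ideal, actual):
    """Positive values look 'behind' the line, negative values look 'ahead'."""
    return [a - i for i, a in zip(ideal, actual)]


# Hypothetical 10-day sprint of 40 points, with stories finishing in lumps.
ideal = ideal_burndown(40, 10)
actual = stepped_burndown(40, 10, {3: 8, 5: 13, 8: 8, 10: 11})
gap = gaps(ideal, actual)

for day, (i, a, g) in enumerate(zip(ideal, actual, gap)):
    print(f"day {day:2d}  ideal {i:5.1f}  actual {a:5.1f}  gap {g:+5.1f}")
```

In this made-up sprint the team appears 8 points behind on days 2 and 4 and slightly ahead on day 5, yet still finishes exactly on time, which is the false panic, or false security, that James and Doug described. A real model would need much more than this, such as story sizes, dependencies and historical data.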

An issue related to time, and space in this instance, is that almost everyone today works in some form of distributed workgroup. This may include the occasional telecommuter, or, as in the case of Pentaho and other open source projects, communities that span the globe. This would seem to preclude most Agile methods, especially Scrum, which puts a great deal of importance on stand-up face-to-face meetings, specifically the daily scrum. One way to get around this is to use instant messaging or teleconferences, especially if remote whiteboarding can be used, though one group or another is always inconvenienced by the selected time. Another way is to use blogs, wikis or forums to supplement the daily scrum, especially for those who might not be able to attend due to the time selected. One other thing that we've done is to record the daily scrum, either audio alone, or with video. The take-away here is that some form of asynchronous communication, and keeping an historical reference, is a necessary addition in the face of the flat world of software development today.

The idea of parallel scrums has been batted around within the Agile communities for some time now. The reality is that sprints must often be done in parallel and some individuals are going to be assigned roles in more than one sprint team simultaneously. The reasons for this are many, but the bottom line was discovered by project managers long ago: small teams with sharply focused goals are more often successful than not. Large, complex or broadly defined goals must be managed as programs or portfolios, not as projects. We've followed separate strategic, tactical and implementation tracks for over a decade, with a modified iterative waterfall approach cycling through these three tracks throughout time. This allows us to easily incorporate parallel implementation sprints for each tactical project. The OpenScrum methodology has some good depictions of their approach to parallel sprints. For another perspective, I can draw upon a buffet-line conversation I had with a senior project manager [who felt that Agile was "crap"] at a recent SFBAC (San Francisco Bay Area Chapter) meeting of the PMI (Project Management Institute): Agile Software Development methods are not generally applicable Project Management methods. [My opinion and not necessarily representative of anyone else mentioned in this post.] Seen that way, it's much easier to see how parallel sprints can be planned. With Agile methods, we're developing software, not dams, bridges or space shuttles. Software development is much more of an art than an engineering discipline, and the Agile methods allow for this artistry.

The roles of Product Owner, Scrum Master and Scrum team are very important to Scrum and OpenScrum and 8D™, and still require refinement. Many folk that we've encountered in our consulting naturally assume that the role of Product Owner is best filled by an existing Product Manager and that the Scrum Master is a lead engineer. But just as your best technical person is often a poor choice for any management role [soapbox time: forcing great techies into management roles as the only path for advancement and "the big bucks" is just plain stupid today], the lead engineer may not be the best Scrum Master. We feel that the Product Owner should be a true representative from the user community, or at the very least a customer advocate. A Scrum Master must be skilled in corporate communication, escalation and, yes, politics, and empowered to clear obstacles that could prevent the sprint from achieving its goal. In addition to these two roles, the Scrum Team should have no more than seven (7) other [full time equivalent] individuals with technical and business skills, and subject matter expertise, that assure the sprint's success. This is a guideline, with lots of ways to be implemented.

Another area of discussion that deserves weeks of attention, not the few minutes that we could devote to it in our call, is how best to incorporate QA and all the various testing efforts that any engineering effort requires. Well-thought-out test scenarios and automation certainly play an important part, but that's not the end of it. And we didn't get anywhere near the end in our discussion. Most Scrum articles, books, etc. kind of ignore QA. Other Agile methods, such as Test Driven Development, are centered around it.

We believe in incorporating QA and Test as part of the core group, and that code review & unit testing, at the least, must be a part of the automated daily build process. But what about regression and User Acceptance Testing (UAT)? If the idea is that each sprint results in usable code, isn't regression testing a major part of getting to usable code? Isn't UAT the proof of the pudding? But are such activities a part of the Sprint, a separate, overlapping/parallel Sprint, or a separate activity altogether? And what about QA of specifications, documentation, data integrity and metadata?

There are also some issues specific to open source companies/projects related to User Acceptance Testing, and user satisfaction metrics in general. It's one thing to count downloads, but how does one measure the comments, or really, the lack of comments in the forums? Open source companies often don't know who has downloaded their product, and don't know whether the lack of comments on a new release is indicative of satisfaction or disinterest or even disgust. One way in which an open source company can get some amount of automated feedback is to have the product report a heartbeat, or use some other "phone home" technique [see Positive Feedback Enablers in the OpenScrum methodology]. Pentaho has implemented an "opt-in" or "opt-out" system for their heartbeat and has had no negative feedback for implementing this system.

As you can see… many questions. The answers are often dependent upon the supporting infrastructure, maturity and culture of the organization or community.

The culture results from the personalities of the pigs [the truly committed], the chickens [the involved], the sheep [who prefer to be herded] and the goats [who prefer to be led]; and don't forget the penguins, cats and lone wolves either. :>> Whether in an open source community or a corporate team, a manager may know the core group, but might only guess at the extended team or overall community. And, as with any generalization, the devil is in the details. This is why our methodology has gone from 5D™ to 8D™ over the years, and why James has developed both the Beekeeper and OpenScrum methodologies. From the initial framework for success, one must apply the principles of Agile even to the development of one's Agile methods. Again and again, we see the need to adapt to specific situations, cultures and objectives.

We talked for well over an hour. One area that we didn't have time to explore was the Sprint Retrospective vs. Sprint Review. The OpenScrum methodology only discusses the Sprint Retrospective [among the truly committed] whereas we also allow for the Sprint Review [among all interested parties] in our "lessons learned" processes. Perhaps we can go into this further on another call or in the comments.

As always, talking with the Pentaho folk is lots of fun. We're always encouraged by their inventiveness in their execution as an open source company.


At the beginning, The Open Source Solutions Blog was a companion to the Open Source Solutions for Business Intelligence Research Project, and book. But back in 2005, we couldn't find a publisher. As Apache Hadoop and its family of open source projects proliferated, and in many ways, took over the OSS data management and analytics world, our interests became more focused on streaming data management and analytics for IoT, the architecture for people, processes and technology required to bring value from the IoT through Sensor Analytics Ecosystems, and the maturity model organizations will need to follow to achieve SAEIoT success. OSS is very important in this world too, for DMA, API and community development.


Our current thinking on sensor analytics ecosystems (SAE) bringing together critical solution spaces best addressed by Internet of Things (IoT) and advances in Data Management and Analytics (DMA) is here.
