Here's a personal perspective and a bit of a personal history regarding mathematical modeling and predictives.
The 1980s were an exciting time for mathematical modeling of complex systems. At the time, there were two basic types of modeling: deterministic and stochastic (probability or statistics models). Within stochastic modeling, traditional statistics vs. Bayesian statistics was a burgeoning battleground. Physical simulations (often based upon deterministic models) were giving way to computer simulations (often based upon stochastic models, especially Monte Carlo Simulations). Two theories were popularized during this time: catastrophe theory and chaos theory; ultimately though, both of these theories proved incapable of prediction - the hallmark of a good mathematical model. A different type of modeling technique, based upon relational algebra, was also moving from the theoretical work of Ted Codd, to the practical implementations at (the company now known as) Oracle: data modeling.
Mathematical models are attempts to understand the complex by making simplifying assumptions. They are always a balance between complexity and accuracy. One nice example of the evolution of a deterministic mathematical model can be found in the Ideal Gas Laws, starting with Boyle's Law to Charles' Law to Gay-Lussac's Law to Avogadro's Law, culminating in the Ideal Gas Law, which all of us saw in high school chemistry: PV=nRT.
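As a rough sketch of that evolution, here is how the four empirical laws, each holding the other quantities fixed, combine into the single relation. The proportionality constant R is the universal gas constant; the layout is mine, not taken from any particular textbook.

```latex
\begin{align*}
\text{Boyle (fixed } n, T\text{):} \quad & V \propto \frac{1}{P} \\
\text{Charles (fixed } n, P\text{):} \quad & V \propto T \\
\text{Gay-Lussac (fixed } n, V\text{):} \quad & P \propto T \\
\text{Avogadro (fixed } P, T\text{):} \quad & V \propto n \\
\text{Combined:} \quad & V \propto \frac{nT}{P} \;\Longrightarrow\; PV = nRT
\end{align*}
```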
Mathematical models are used in pretty much all fields of endeavor: physical sciences, all types of engineering, behavioral studies, and business. In the 1970's, I used deterministic electrochemical models to understand and predict the behaviour of various chemical stoichiometries for fuel cells and photovoltaic cells. In the 1980's, I used Bayesian statistics, sometimes combined with Monte Carlo Simulations, to predict the reliability and risk associated with complex aerospace, utility and other systems.
The most popular use of Bayesian statistics was to expand the a priori knowledge of a complex system with subjective opinions. Likely the most famous application of Bayesian Statistics, at the time I became involved with the branch, was the Rand Corporation's Delphi Method. There was actually a joke in the Aerospace Industry about the Delphi Method:
A team of Rand consultants went to Wernher von Braun to seek the expert opinion of the engineers working on a new rocket motor. The consultants explained their Delphi Method thusly. Prior to the first static test of the new rocket motor, they would ask, separately, each of the five engineers working on the new design their opinion of the rocket's reliability. Their opinions would form the Bayesian a priori distribution. After the test, they would reveal the results of the first survey and the test results, and ask the five engineers, collectively, their new opinion of the rocket's reliability. This would form the Bayesian a posteriori, from which the rocket's reliability would be predicted. Doctor von Braun said that he could save them some time. He gathered his team of rocket engineers, and asked them if they thought that the new rocket motor would fail. Each answered, as did Doctor von Braun, "no" in German. "There, you see, five nines reliability, as specified," declared the good Doctor to the Rand consultants. "No need for any further study on your part."
Yep, it's a side splitter. 
I didn't like this method, and did things a bit differently. My method involved gathering all the data for similar test and production models, weighting each relevant engineering variable, creating the a priori, fitting with Weibull Analysis, designing the Bayesian mathematical conjugate, using a detailed post-mortem of the first and subsequent tests of the system being analyzed, updating and learning as we went, to finally predict the reliability and risk for the system. I first used this on the Star48 perigee kick motor, and went on to refine and use this method for:
I started to call this method "objective Bayes", but that name was already taken by a branch of Bayesian statistics that uses a non-informative a priori. Several of my projects resulted in software programs, all in FORTRAN. The first was used as a justification for a 1 MB [no, not a mistake] "box" [memory] for the corporate mainframe. NASA had sent us detailed data on over 4,000 solid propellant rocket motors. Talk about "big data".
I had a lot of fun doing this into the 1990's.
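For readers who have not seen a Bayesian conjugate update, here is a minimal sketch in Python. It is not the method described above, which used Weibull fits and weighted engineering variables; it swaps in the much simpler Beta-Binomial conjugate pair to show the "a priori, then test, then update" loop. The class name and all the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about the per-test success probability."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugacy: a Beta prior combined with binomial test data
        # yields a Beta posterior with the counts simply added in.
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Expected reliability under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical a priori, as if built from 48 successes and 2 failures
# recorded for similar test and production motors.
prior = BetaPrior(alpha=48.0, beta=2.0)

# Hypothetical test campaign for the new system: 9 successes, 1 failure.
posterior = prior.update(successes=9, failures=1)

print(f"prior mean reliability:     {prior.mean:.3f}")
print(f"posterior mean reliability: {posterior.mean:.3f}")
```

Each new test result can be fed through `update` again, which is the "updating and learning as we went" part in miniature.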
The next paradigm shift, for me personally, was learning data modeling, and focusing on business processes rather than engineering systems. Spending time at Oracle, including time with Richard Barker and his computer-aided system engineering methods, I felt right at home. Rather than Bayesian Statistics, I would be using relational algebra and calculus for deterministic mathematical models of the data for the business processes being stored in a relational database management system. I very quickly got involved in very large databases, decision support systems, data warehousing and business intelligence.
I was surprised, and, after 17 years, continue to be surprised, how few data modelers agree with the statement in the preceding paragraph. I'm surprised how few data modelers go beyond entity-relationship diagrams; how few know or care about relational algebra and relational calculus. I'm amazed how few people realize that the arithmetic average computed in most "analytic" systems is a fairly useless measure of the underlying data, for most systems. I'm amazed that BI and analytic systems are still deterministic, and always go with simplicity over accuracy.
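A small illustration of the point about averages, as a sketch using only Python's standard library. The order values are made up; the only thing that matters is that the data is skewed, as business data often is.

```python
import statistics

# Hypothetical order values: mostly small, with one very large outlier.
order_values = [12, 15, 14, 13, 16, 15, 14, 900]

mean_value = statistics.mean(order_values)
median_value = statistics.median(order_values)

# How many orders fall below the "average" order?
below_mean = sum(1 for v in order_values if v < mean_value)

print(f"mean:   {mean_value}")
print(f"median: {median_value}")
print(f"{below_mean} of {len(order_values)} orders are below the mean")
```

Here the mean is pulled far above what almost every order looks like, while the median stays with the bulk of the data. A single deterministic "average" on a dashboard hides exactly that.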
But computer power continues to expand. Moore's Law still rules. We can do better now. Things that used to take powerful mainframes or even supercomputers can be done on laptops now. We no longer need to settle for simplicity over accuracy.
More importantly, the R Statistical Language has matured. Literally thousands and thousands of mathematical, graphical and statistical packages have been added to the CRAN, Omegahat and BioConductor repositories. Even the New York Times has published pieces about R.
It's once again time to move from deterministic to stochastic models.
Over the next few weeks, I hope to post a series of "study guides" that will focus on setting up a web-based environment consolidating SQL and MDX based analytics, as expressed in the Pentaho and LucidDB open source projects, with R, and possibly SQLstream.
There have been many articles as well on "Big Data". As I commented on Merv Adrian's blog post request for "Ideas for SF Big Data Summit":
One area of discussion, which may appear to be for the “newbies” but is actually a matter of some debate, would be the definition of “big data”.
It really isn’t about the amount of data (TB & PB & more) so much as it is about the volumetric flow and timeliness of the data streams.
It’s about how data management systems handle the various sources of data as well as the interweaving of those sources.
It means treating data management systems in the same way that we treat the Space Transportation System, as a very large, complex system.
-- Comment by Joseph A. di Paolantonio, February 1, 2010 at 4:09 pm
I believe this because there is a huge amount of data about to come down the pipe. I'm not talking about the Semantic Web or the piddly little petabytes of web log and click-through data. I'm talking about the instrumented world. Something that's been in the making for ten years, and more: RFID, SmartDust, ZigBee, and more wired and wireless sensors, monitors and devices that will become a part of everything, everywhere.
Let me just cite two examples from something that is coming, is hyped, but not yet standardized, even if solid attempts at definition are being made: the SmartGrid. First, consider the fact that utility companies are distributing and using smart meters to replace manually read mechanical meters at homes and businesses; this will result in thousands of data points per day as opposed to one per month PER METER. The second is EPRI's copper-riding robot, as explained in a recent Popular Science. Think of the petabytes of data that these two examples will generate monthly. [Order the Smart Grid Dictionary: First Edition on Amazon]
The desire, the need, to analyze and make inferences from this data will be great. The need to actually predict from this data will be even greater, and will be a necessary element of the coming SmartGrid, and in making the instrumented world a better world for all of humanity.
On Twitter today, Lance Walter asked me to go into the Ark Business with him, and Gareth Greenaway asked for entertainment. It must be a rainy Friday afternoon.
I'm not sure about Lance's offer, but I did tell Gareth the following story, from tweet-start to tweet-end. This isn't word for word as I tweeted. 'Tis a bit expanded, but the tale is the same.
Once upon a time there was a young penguin named Tux. Tux decided to set off on a journey through IT Land. Now IT Land is a dangerous place, full of hackers fighting crackers, and ruled by those in the Ivory Tower and the acolytes of the Megaliths.
Along the way, the adventurous Tux met the Dolphin, the Elephant and the Beekeeper. They made a pact on the Lucid glyph to become a Dynamo of IT, bringing power to the datasmiths of the Land.
They met many Titans from the Megaliths on their Quest. The Beekeeper used the open source bees to open the scrum along the way, blocking the hookers with their sharp claws.
Some of the Titans were helpful, some, not so much.
The Dolphin was empowered by the Sun. But the Sun was consumed by a powerful Oracle. The Elephant, too, gained a powerful ally, and they do Enterprise against the Oracle. The band of the Quest was broken, and Tux was sad.
The Era of Lucid thought ended, but the Dynamo yet powers the Lucid Glyph, and Tux can rely on the Dynamo and the Beekeeper to predict a future clear of the Oracle.
And thus this quest ends, but another soon begins, where Tux will meet new friends and new foes. Will Beastie and the dæmons be allies? Will the Paladin in the Red Hat be stalwart?
Perhaps we'll find out at OSCON, for Gareth suggested that an assemblage of geeks would enjoy this story, and we'll see if OSCON thinks our tales worthy of a keynote slot in 2010.
Do you recognize all the characters in this tale? Maybe the links will help.
What say you, OSCON? Would these tales make a worthy Keynote?
As promised in my post, "Pentaho Reporting 3.5 for Java Developers First Look", I've taken the time to thoroughly grok Pentaho Reporting 3.5 for Java Developers by Will Gorman [direct link to Packt Publishing][Buy the book from Amazon]. I've read the book, cover-to-cover, and gone through the [non-Java] exercises. As I said in my first look at this book, it contains nuggets of wisdom and practicalities drawn from deep insider knowledge. This book does best serve its target audience, Java developers with a need to incorporate reporting into their applications. But it is also useful for report developers who wish to know more about Pentaho, and Pentaho users who wish to make their use of Pentaho easier and the resulting reporting experience richer.
The first three chapters provide a very good introduction to Pentaho Reporting and its relationship to the Pentaho BI Suite and the company Pentaho, historical, technical and practical. These three chapters are also the ones that have clearly marked sections for Java specific information and exercises. By the end of Chapter Three, you'll have installed Pentaho Report Designer, and built several rich reports. If you're a Java developer, you'll have had the opportunity to incorporate these reports into either Tomcat J2EE or Swing applications. You'll have been introduced to the rich reporting capabilities of Pentaho, accessing data sources, the underlying Java libraries, and the various output options that include PDF, Excel, CSV, RTF, XML and plain text.
Chapters 4 through 8 are all about the WYSIWYG Pentaho Report Designer, the pixel-level control that it gives you over the layout of your reports, and the many wonderful capabilities provided by Pentaho Reporting, from a wide range of chart types to embedding numeric and text functions, to cross-tabs and sub-reports. Other than Chapter 5, these chapters are as useful for a business user creating their own reports as they are for a report developer. Chapter 5 is a very deep dive, a very technical look at incorporating various data sources. The two areas that really stand out are the charts (Chapter 6) and functions (Chapter 7).
There are a baker's dozen types of charts covered, with an example for each type. Some of the more exotic are Waterfall, Bar-Line, Radar and Extended XY Series charts.
There are hundreds of parameters, functions and expressions that can be used in Pentaho Reports, and Will covers them all. The formula capability of Pentaho Reporting follows the OpenFormula standard, similar to the support for formulæ in Microsoft Excel, and the same as that followed by OpenOffice.org. One can provide computed text or numeric values within Pentaho reports to a fairly complex extent. Chapter 7 provides a great introduction to using this feature.
Chapters 9 through 11 are very much for the software developer, covering the development of Interactive Reports in Swing and HTML, the use of Pentaho's APIs and extension of Pentaho Reporting capabilities. It's all interesting stuff, that really explains the technology of Pentaho Reporting, but there's little here that is of use to the business user or non-Java report developer.
The first part of Chapter 12, on the other hand, is of little use to the Java developer, as it shows how to take reports created in Pentaho Report Designer and publish them through the Pentaho BI-Server, including formats suitable to mobile devices, such as the iPhone. The latter part of Chapter 12 goes into the use of metadata, and is useful both for the report developer and the Java developer.
So, as I said in my first look, the majority of the book is useful even if you're not a Java developer who needs to incorporate sophisticated reports into your application. That being said, Will Gorman does an excellent job in explaining Pentaho Reporting, and making it very useful for business users, report designers, report developers and, his target audience, Java developers. I heartily recommend that you buy this book. [Amazon link]
Anyone who follows either Nicholas Goodman or myself on Twitter (links are to our Twitter handles), or follows either this blog or Nick's Goodman on BI blog, knows that I've been helping Nick out here and there with his new business, Dynamo Business Intelligence Corporation, offering support and commercial (and still open source) packages of the "best column-store database you never heard of", LucidDB.
One of the things that I'll be doing over the next few weeks is some website and community development. For all that I've been an executive type for decades, I love to keep hands-on with various technologies, and one of those technologies is "THE WEB". While I've never made a living as a web developer, I started with the web very early on, developing internal sites for the Lynx browser, as one of the internal web chiefs, learning from Comet, the Oracle web master. The first commercial site that I did, in 1994, for the local Eagle Express Flowers, is still up, with a few modernizations.
So, while waiting for the style guide from CORHOUSE, who designed the new Dynamo Business Intelligence Corporation logo [what do you think of it?]…

I've decided to go through an old friend. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites
This exercise has reminded me that Information Architecture isn't just important for websites, but also for all the ways that individuals and businesses organize their data, concepts, information and knowledge. I'm happy to be helping out DynamoBI, and glad that doing so led me to this reminder of something I've been taking for granted. Time to revisit those [Ever]notes, [Zotero] researches, files and what not.
I was approached by Richard Dias of Packt Publishing to review "Pentaho Reporting 3.5 for Java Developers" written by Will Gorman. (Link is to Amazon.com)
Richard Dias has indicated you are a Friend:
Hi Joseph,
My name is Richard Dias and I work for Packt Publishing which specializes in publishing focused IT related books.
I was wondering if you would be interested in reviewing the book "Pentaho Reporting for Java Developers" written by Will Gorman.
- Richard Dias
After some back and forth, I decided to accept the book in exchange for my review.
Hi Joseph,
Thanks for the reply and interest in reviewing the book. I have just placed an order for a copy of the book and it should arrive at your place within 10 days. Please do let me know when you receive it.
I have also created a unique link for you. It is http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers?utm_source=press.teleinteractive.net&utm_medium=bookrev&utm_content=blog&utm_campaign=mdb_001537. Please feel free to use this link in your book review.
In the meanwhile, if you could mention about the book on your blog and tweet about the book, it would be highly appreciated. Please do let me know if it is fine with you.
I’m also sending you the link of an extracted chapter from the book (Chapter 6 Including Charts and Graphics in Reports). It would be great if you could put up the link on your blog. This would act as first hand information for your readers and they will also be able to download the file.
Any queries or suggestions are always welcome.
I look forward to your reply.
Best Regards,
Richard
Richard Dias
Marketing Research Executive | Packt Publishing | www.PacktPub.com
Shortly thereafter, I received notification that the book had shipped. It arrived within two weeks.
Of course, I've been too busy to do more than skim through the book. Anyone who follows me as JAdP on Twitter knows that in the past few weeks, I've been:
None of which has left any time for a thorough review of "Pentaho Reporting for Java Developers".
I hope to have a full review up shortly after the holidays, which for me runs from Solstice to Epiphany, and maybe into the following weekend.
First, a little background. Will Gorman, the author, works for Pentaho, in software engineering, as a team lead, and works primarily on Pentaho Reporting products, a combination of server-side (Pentaho BI-Server), Desktop (MacOSX, Linux and Windows platforms) and Web-based software (Reporting Engine, Report Designer, Report Design Wizard and Pentaho Ad Hoc Reporting), which stems from the open source JFreeReport and JFreeChart. While I don't know Will personally, I do know quite a few individuals at Pentaho, and in the Pentaho community. I very much endorse their philosophy towards open source, and the way they've treated the open source projects and communities that they've integrated into their Pentaho Business Intelligence Suite. I do follow Will on Twitter, and on the IRC Freenode channel, ##pentaho.
I myself am not a Java Developer, so at first I was not attracted to a book with a title that seemed geared to Pentaho Developers. Having skimmed through the book, I think that the title was poorly chosen. (Sorry Richard). I find that I can read through the book without stumbling, and that there is plenty of good intelligence that will help me better serve and instruct my customers through the use of Pentaho Report Designer.
My initial impressions are good. The content seems full of golden nuggets of "how-tos" and background information not commonly known among the Pentaho community. Will's knowledge of Pentaho Reporting and how it fits into the rest of the Pentaho tools, such as KETTLE (Pentaho Data Integration) and Mondrian (Pentaho Analysis), along with a clear writing style makes all aspects of Pentaho more accessible to the BI practitioner, as well as those that wish to embed Pentaho Reporting into their own application.
This book is not just for Java developers, but for anyone who wishes to extend their abilities in BI, Reporting and Analysis, with Pentaho as an excellent example.
I'll be following up with the really exciting finds as I wend my way through Will's gold mine of knowledge, and, will do my best to fulfill my promise of a full review by mid-January.
You can also click through the Chapter 6 (a PDF) as mentioned in Richard's email.
Thank you, Richard. And most especially, thank you, Will.
On November 14th and 15th, I attended openSQLcamp 2009 in Portland, OR. It was a great event, and I was honored to be accepted for a five minute lightning talk: "I Play with Data". I would like to thank Sheeri (aka tcation) for providing YouTube videos of the lightning talks. Here's mine:
And here's a transcript, with links to things that were mentioned.
Hi, my name is Joseph, and I play with data.
It's good that I followed David [David J. Lutz, Director of Technical Sales Support, Infobright] because part of what I'm looking for, in the solution of how to do statistics with SQL, is column-store databases.
Way back in the 70's & 80's, I was doing pair programming with FORTRAN programmers [laughter in background]
turning algorithms into software. I was, pair programming, we sat down together, I would write math, they would write software, we did things [mostly in Bayes], through the 80's [most with Wendy, who still works with me occasionally].
Then I started playing with data through other people's algorithms using SQL, and relational database management systems, and then later, Business Intelligence systems, and most recently playing a lot with Pentaho, using that.
And I'm going to make a lot of statements, but I really have a question. I know of three ways that I can start doing real statistics with SQL databases. And I want to do real statistics because the most you can get just with AVERAGE, is, assuming that I have a uniform distribution or a normal distribution, and even in many cases, an average isn't necessarily the mean, and the mean is certainly not the best descriptor of the underlying distribution of the data. Right?
So, I can start doing fancier algorithms in SQL, but they're painful. And you know the big-O number, and they're nasty big-O numbers, to do, even if I have a frequency function, to try to arrive at the mean or the mode, simple things.
And if I want to do Bayesian statistics, and a Markov Chain Monte Carlo simulation to get at inferences on mathematical conjugates [snickering in the background]
… I'm not going to do this in SQL.
So, I have two other choices that I've been exploring.
Anyone here familiar with the R Project? [Several affirmative responses] Ya! Yeah! All right! I love the R Project, and I'm having a lot of fun with the R Project. The R Project is written in R and C and FORTRAN and there are thousands of packages written in FORTRAN and C and R and I'm doing a lot of nice math with it now, and that's a lot of fun. But everything in R is actually in data sets, and data sets are column-store databases, in memory. And even though you can get 8GB of memory on a laptop now, I run out of memory, frequently, with the type of stuff I do. So, what do I do? I use SQL, because relational database management systems manage data really, really well, and R analyzes the data really, really well, and R speaks SQL through either RODBC, or DBI… Off you go.
So, I would like to use column-store databases, and one of my questions is that I'm looking for a way of speeding this up, so that I can match a column-store data set in R in memory with a column-store database such as Infobright or MonetDB or LucidDB. And do this one-to-one mapping much more efficiently than I can going through ODBC.
Does anyone have any thoughts on this?
[Discussion with someone in the audience - if you read this, please identify yourself in the comments, and thank you for talking to me] Have you heard of TL/R [my error in listening]?
I have not. I've never heard of TL/R.
It's R embedded in PostgreSQL.
OK, yes, I have. Did you say TL or PL?
PL. [PL/R by Joe Conway is back in development and becoming interesting again].
Yeah, PL/R I know. And there's a lot of things like that, but they're basically interfaces.
Yeah, which isn't all that mature. It tries to map the name of the dataframe in R, where you're doing your stuff in R, to a table in MySQL [in the weeds]. Which is really what you want, is to prodSQL, is that relationship of the sets, where basically you overloaded the dataframe… so you can access… overloaded the access operator… to go out to a SQL area, however it does it.
OK, so SQLDF.
A third solution that I've been looking at is LucidDB, which is a column-store database with a plug-in architecture, written in Java. And there is the math commons on apache.com [oops] packages which have real statistic packages, probability distribution packages, all sorts of really neat packages, which are essentially Java libraries and I would like to see real statistics written into LucidDB as plug-ins for LucidDB [horn sounds] If anyone is interested. Thank you so much.
The notes taken during the lightning rounds were written by Ben Hengst, and can be found at openSQLcamp Lightning Talks.
That last part is really the most important to me. I'm working with Nick Goodman, who recently started Dynamo Business Intelligence, and with advice from Julian Hyde and others in the Eigenbase community, to develop plugins for LucidDB which might be bundled into ADBMS versions of DynamoDB to do real statistics, making real inferences and real predictions, using the math packages from the Apache Commons, and having a transparent interface to R, so that R isn't limited by in-memory constraints.
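To make the in-memory constraint concrete, here is a minimal sketch of the general idea: let the database hold the data, and stream rows through a one-pass algorithm so the analysis never needs the whole data set in memory. This is plain Python with SQLite standing in for a real column store. It is not LucidDB, not R, and not the plugin design under discussion; the table name, column name and values are hypothetical.

```python
import math
import sqlite3

# In-memory SQLite stands in for the database; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(float(v),) for v in (2, 4, 4, 4, 5, 5, 7, 9)],
)


def streaming_mean_std(cursor, batch_size=3):
    """Welford's online algorithm over rows fetched in small batches."""
    count, mean, m2 = 0, 0.0, 0.0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for (x,) in rows:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    # Population standard deviation; assumes at least one row was read.
    return count, mean, math.sqrt(m2 / count)


count, mean, std = streaming_mean_std(conn.execute("SELECT value FROM readings"))
print(count, mean, std)
conn.close()
```

Only `batch_size` rows are held at a time, so the same loop works whether the table has eight rows or eight billion. Richer statistics need cleverer one-pass algorithms, which is part of the appeal of pushing them into the database as plug-ins.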
Why not join us on irc.freenode.net ##luciddb and discuss it?
This post is in response to "Volunteer for the Greater Good" written by S. Kleiman. I remember that village in Pennsylvania, and the attitudes of my friend at that time. I'm not surprised that you're attracted to open source; I am surprised that you're having trouble with embracing its ideals. We've had an email exchange on this subject, and, as you know, I'm fairly attracted to open source solutions myself.
I hadn't seen your blog prior to answering your email, so let me go into a bit more detail here.
"The model contributor is a real geek – a guy in his 20-30’s, single, lives in his parent’s basement, no mortgage, no responsibility other than to pick up his dirty socks (some even have mothers who will do that)." -- "Volunteer for the Greater Good" by S. Kleiman
Wow. What a stereotype, and one that couldn't be further from the truth. Admittedly, during economic downturns, when software developers are forced to take whatever job they can find to put food on the table, many contribute to open source projects, ones that don't have commercial support and ones that do. This helps that open source project and its community. But, it also helps the developers to keep their skills sharp and maintain credibility. Most open source developers get paid. Some are students. Some are entrepreneurs. But most get paid; it's their job. And even if it's not their job, projects have learned to give back to communities.
While there are hundreds of thousands of open source projects on Sourceforge.net and other forges, many have never gone beyond the proposal stage, and have nothing to download. The number of active open source projects does number in the tens of thousands, and that is still pretty amazing. The idea that the great unwashed contribute to these projects whilst Mom does laundry... Well, that just doesn't wash.
The vast majority of open source communities are started by 1 - 5 developers, who have a common goal that can be obtained through that specific open source project. They have strict governance in place to assure that the source code in the main project tree can be submitted only by those that have founded the project, or those that have gained a place of respect and trust in the community (a meritocracy) through the value of the code that they have contributed for plugins, through forums, and the like.
Most active open source projects fall into two categories, and many have slipped back and forth between these two.
While there are thousands of examples of both types, let me give just a few examples of some developers that I know personally, or companies with which I'm familiar.
Mondrian was founded by Julian Hyde, primarily as a labour of love. I know Julian, and he's an incredibly bright fellow. [And public congratulations to you, Julian and to your wife, on the recent birth of Sebastian]. In addition to being the father of Sebastian and Mondrian, Julian is also the Chief Architect of SQLstream, and a contributor to the Eigenbase project. Not exactly sitting around in the basement, coding away and waiting for Mom to clean up after him.
You can read Julian's blog on Open Source OLAP and Stuff, and follow Julian's Twitter stream too. By the way, while Mondrian can still be found on Sourceforge.net under its original license, it is also sponsored by Pentaho, and can be found as Pentaho Analysis, and as the analytical heart of the Pentaho BI Suite, JasperSoft BI Suite and SpagoBI.
Two other fellows had somewhat similar problems to solve and felt that the commercial solutions designed to move data around were simply too bloated, too complex, and prone to failure to boot. I don't believe that these two knew each other, and their problems were different enough to take different forms in the open source solutions that they created. I'm talking about Matt Casters, founder of the KETTLE ETL tool for data warehousing, and Ross Mason, founder of the Mule ESB. Both of them had an itch to scratch, and felt that the best way to scratch it was to create their own software, and leverage the power of the open source communities to refine their back scratchers. KETTLE, too, can now be found in Pentaho, as Pentaho Data Integration. Ross co-founded both Ricston and MuleSource to monetize his brain child, and has done an excellent job with the annual MuleCons. Matt still lives in Belgium, and has been known to share the fine beers produced by a local monastery [Thanks Matt]. You should follow Matt's blog too. Ross lives on the Island of Malta, and Ross blogs about Mule and the Maltese lifestyle.
Let's look at two other projects: Talend and WSO2. Both of these are newer entrants into the ETL and SOA space respectively, and both were started as commercial efforts by companies of the same name. I haven't had the opportunity to sit down with the Talend folk. I have spoken with the founders of WSO2, and they have an incredible passion that simply couldn't be fulfilled with their prior employer. So they founded their company, and their open source product, and haven't looked back. You can follow Sanjiva's Blog to learn more about WSO2 and their approach to open source.
And just one more, and somewhat different example: projects started by multiple educational institutions to meet their unique needs: Kuali for ERP and Sakai for learning management. For another take on commercialization, The rSmart Group contributes to these projects, but is commercializing them as appliances sold to educational institutions. You can read more about this rather different approach to monetizing open source at Chris Coppola's blog.
There are many, many more such examples. Just in the area of data management & analysis, we cover over 60 related open source projects [take a look at the blogroll in the sidebar to the right].
..."they organize themselves into groups of developers and maintainers on an adhoc basis, and on a world-wide basis. And the end products are robust, well developed, and well tested." -- "Volunteer for the Greater Good" by S. Kleiman
I think we've covered my rebuttal to your posting between the first quote and this one. I very much agree with this statement. I'm surprised by your surprise. The organizational dynamics that result in the excellent code that makes up open source projects are the subject of much thought, admiration and research. Here are a few places that you can go for more information.
And just for completeness sake, here's our email exchange:
From S. Kleiman: "OS is the current bug in my head. I'm trying to understand why my intellectual property should be "open" to the world (according to Richard Stallman).
Yes, I've read the copious amounts of literature on open software and the economics thereof - but I still don't get it. If I apply for a patent on a gadget, and then license companies to make that gadget - isn't that intellectual property? To copy my design, while it doesn't destroy my design, does limit any profit I might gain.
Anyway - how are you? Are you one of the original hackers?
I realized that all this time I thought I had a great practical engineering degree. Instead I realize they made us into hackers - in the best sense of the word.
What is your experience with OS? What are you talking about (besides the title)?
How is the "snow" in CA?"
And my response:
Discussions around open source often get very passionate, so we should be having this conversation on a warm beach cooled by ocean breezes, fueled with lots of espresso ristretto followed by rounds of grappa to lower inhibitions and destroy preconceptions ;-)
But email is all we have.
Most open source projects are software, though there are a few examples of hardware projects such as Bug Labs, TrollTech (bought by Nokia, I think), OpenMojo and one for UAVs.
I should start by pointing out that I'm not presenting at the Open Source Business Conference, but am moderating a panel.
http://www.infoworld.com/event/osbc/09/osbc_agenda.html
Session Title: Moving Open Source Up the Stack
Session Abstract: Open Source Solutions for IT infrastructure have shown great success in organizations of all types
and sizes. OSS for business applications have seen greater difficulties in penetrating the glass ceiling
of the enterprise stack. We have put together a panel representing the EU and the US, system
integrators, vendors and buyers, and corporate focus vs. education focus. We'll explore how the OSS
application strategy has changed over the past four years. We will also look at successes and failures,
the trade-offs and the opportunities in solving business/end-user needs with OSS enterprise
applications.
Learning Objective 1: Most buyers know the 80% capability for 20% cost mantra of most OSS vendors, but we'll focus on
what that lower cost actually buys.
Learning Objective 2: Where does OSS fit in the higher levels of the application stack? Learn how flexibility & mashups
can improve the end user experience.
Learning Objective 3: Learn how to come out ahead on the trade-offs of up-front cost vs. operational cost, experience and
learning curves, maintenance and replacement, stagnation and growth.
Here are the confirmed panelists:
(1) Tim Golden, Vice President - Unix Engineering, Security & Provisioning, Bank of America
(2) Gabriele Ruffatti, Architectures & Consulting Director, Research & Innovation Division, Engineering Group, Engineering Ingegneria Informatica S.p.A.
(3) Aaron Fulkerson, CEO/Founder, mindtouch
(4) Lance Walter, Vice President - Marketing, Pentaho
(5) Christopher D. Coppola, President, The rSmart Group
(Moderator) Joseph A. di Paolantonio, Principal Consultant/Blogger/Analyst, InterActive Systems & Consulting, Inc.
So, back to the "Why open source" discussion.
You might want to listen to a couple of our podcasts:
http://press.teleinteractive.net/tialife/2005/06/30/what_is_open_source
http://press.teleinteractive.net/tialife/2005/07/01/why_open_source
or not :-D
Historically, there were analog computers programmed by moving around jumper cables and circuits. Then there were general purpose computers programmed in machine language. Companies like IBM got the idea of adding operating systems, compilers and even full applications to their new mainframes to make them more useful and "user friendly" with languages like COBOL for the average business person and FORTRAN for those crazy engineers. Later Sun, Apple, HP and others designed RISC based CPUs with tightly integrated operating systems for great performance. Throughout all this, academicians and data processing folk would send each other paper or magnetic tapes and enhance the general body of knowledge concerning running and programming computers. There eventually grew close to 100 flavours of Unix, either the freely available BSD version or the more tightly licensed AT&T version.
Then a little company called Microsoft changed the game, showing that hardware was a commodity and the money was in patenting, copyrighting and using restrictive licenses to make the money in computers come from software sales.
Fast forward ~15 years and the principals in Netscape decided to take a page from the Free Software Foundation & their GNU (GNU's Not Unix) General Public License and the more permissive Berkeley License for BSD and, as a final recourse in their lost battle to the Microsoft monopoly, coined the term "open source" and released the Gecko web rendering engine under the Mozilla Public License. And the philosophical wars were on.
When I was the General Manager of CapTech IT Services, I had a couple of SunOS Sys Admins who spent their spare time writing code to improve FreeBSD & NetBSD. I let them use their beach time to further contribute to these projects. Then a young'un came along who wanted to do the same for this upstart variant of minix called Linux. :-D. All of this piqued my interest in F/LOSS.
Today, I feel that F/LOSS is a development method and not a distribution method nor a business model. If you look at IBM, HP, Oracle and others, you'll find that >50% of their money comes from services. Just as M$ commodified hardware and caused the Intel CISC architecture to win over proprietary RISC chips, software has become a commodity. Services is how one makes money in the computer market. With an open source development methodology, a company can create and leverage a community, not just for core development but also for plugins and extensions. More importantly, that community can be leveraged as thousands of QA testers at all levels: modules, regression & UAT, for thousands of use cases, and for forum level customer support (People, people helping people, are the happiest people in the world ;-)
Can the functions in your application be replicated by someone else without duplicating a single line of your code? Are the margins on your software sales being forced below 10%? Does most of your profit come from support, system integration, customizations or SaaS? Then why not leverage your community?
So, this is a really short answer to a really complex issue.
To answer some of your other questions...
I'm not a hacker nor a programmer of any type. I have started to play around with the open source R statistical language to recreate my Objective Bayes assessment technique and grow beyond the (Fortran on OS/360 or VAX/VMS) applications that I caused to be created from it.
I haven't gotten to the snow in a couple of years, but we're in a drought cycle. Though it is storming as I write this.
I hope this helps you with your open source struggle, my friend. And thank you for putting up with me being a wordy bastard for the past /cough /harumph years.
Oh, and note the Creative Commons license for this post. This must really cause you great consternation as a writer. Oh, and I'm not going to touch your post on Stallman.
Today, SQLStream announced version 2.0 of their Real Time BI solution. SQLStream comes from the fertile creativity of Julian Hyde, who is also the founder of the open source Mondrian OLAP engine. While SQLStream is not open source, it does stem from the open source Eigenbase community, leveraging the user-defined transforms that were originally developed for LucidDB to operate on traditional stored relational data, with SQL:2003-compliant syntax. SQLStream extends this to handle streaming relational data.
In addition to capturing standard, structured data while "on the wire", SQLStream also includes adapters for feeds, such as Atom and RSS, and for Twitter.
Methinks Julian and I need to schedule another lunch soon, so that I can learn more about how this unstructured data, especially from Twitter, can fit into real time analytics provided by SQLStream v2.0.
BTW, you can follow me on Twitter as @JAdP.
We'll be leading a session at the SOA Consortium Meeting being held at the Santa Clara Hyatt Regency on 2008 December 10 & 11. I'm saying "leading a session" because, as opposed to the normal slide deck in MS PowerPoint, OpenOffice.org/NeoOffice Impress or Apple Keynote, we'll be using a mindmap to, as the agenda says:
"An interactive session based upon a mindmap for developing a system architecture using Master Data Management (MDM), Service-Oriented Architecture (SOA) and Software as a Service (SaaS) principles. The goal is not to talk about having an enterprise mashup with salesforce.com, but how to apply these principles to internal enterprise initiatives. We'll discuss the success, lessons learned and future of integrating MDM & SOA, and how this approach allows IT to provision business needs quickly through a SaaS approach to the users. Bring your own experiences and ideas, as we'll be expanding the mindmap in the direction you want. A PDF of the basic mindmap will be emailed to all members of the SOA-C and be included in the meeting handout. Changes to the mindmap made during the session will be posted after the meeting."
As consultants, we like to listen, and as believers in the power of collaboration, we like to leverage the wisdom of the group rather than pontificate from a podium. Indeed, this opportunity to speak came through interactions on Twitter, the so-called micro-blogging 24x7 TeleInterActive conversation. Thanks to Brenda Michelson, or @bmichelson by Twitter handle, for arranging this opportunity. The result of all this is that we prefer to have a fully interactive session with the participants. We want the conversation to go in new and interesting directions. The way we do this is much like the job of a community manager, but at the micro level. We're hoping that all will join in, and it's our job to assure that we maximize the value of the conversation to the group without abusing anyone's comfort level.
The point of this discussion is to explore how the concepts and principles of Master Data Management (MDM), Service-Oriented Architecture (SOA) and Software as a Service (SaaS) can help Information Technology (IT) departments better serve their customers. We'll be exploring how MDM and SOA work in a SaaS environment from our work with several SaaS firms, how SaaS companies leverage these principles to quickly provision and respond to their customers, and how this differs from a traditional IT department responding to business users and bringing them into production.
One of the most important aspects of this area is the idea of data services, and how Master Data Management works within a Service-Oriented Architecture to give the users what they really need: access to legacy, historical and transient data.
We'll be starting with our MDM, SOA & SaaS mindmap, collapsed to the first level of branches, and following the branches that are of most interest to the participants. We'll be extending and modifying the mindmap as we go along, and posting the revised mindmap on this blog after the session.
I've been "hearing" all day on Twitter that Microsoft would be announcing something big at OSCON2008. Perhaps this is it:
Microsoft today announced that it intends to acquire DATAllegro, provider of breakthrough data warehouse appliances. The acquisition will extend the capabilities of Microsoft’s mission-critical data platform, making it easier and more cost effective for customers of all sizes to manage and glean insight from the ever expanding amount of data generated by and for businesses, employees and consumers.
-- Press Release "Microsoft to Acquire DATAllegro"
This is very interesting given the progress that Microsoft has made with its analytic services binding MS Office and SQL Server. Further quoting from the press release:
“Microsoft SQL Server 2008 delivers enterprise-class capabilities in business intelligence and data warehousing and the addition of the DATAllegro team and their technology will take our data platform to the highest scale of data warehousing.”
-- Ted Kummert, corporate vice president of the Data and Storage Platform Division at Microsoft
The direction for DATAllegro's data warehouse appliance is also made clear in the press release:
“DATAllegro's integration with SQL Server is the optimal next generation solution and the acquisition by Microsoft is a great conclusion for the company.”
-- Lisa Lambert, Intel Capital managing director, Software and Solutions Group.
For those who don't know, DATAllegro is a data warehousing appliance company that utilizes "EMC® storage, Dell™ servers, Cisco® InfiniBand switches, Intel® multi-core CPUs and the Ingres® open source database".
So, whither Ingres in this acquisition? As we've written before here, Ingres is one of the earliest and strongest RDBMS products, which was absorbed by CA and then spun off again with an open source play in 2005. MS SQL Server, of course, started out as a rebranding of Sybase SQL*Server, until the partnership dissolved in the mid-1990's. Since then, MS SQL Server has been geared mostly as a workgroup and data mart server. It seems that a switch from Ingres to MS SQL Server could heavily undermine DATAllegro's business. In addition, the switchover in code to T-SQL will be a nightmare for developers. Add to that the challenges of moving from Linux to MS Windows, and from C/C++ to C# and it will take quite some time in production environments to iron out all the wrinkles.
In addition, while most seem to think that this puts Microsoft in a good position to challenge Oracle for the Enterprise Data Warehouse lead, it actually puts Microsoft directly into competition with other DW appliance vendors, such as Teradata. I truly doubt that this move will position Microsoft strongly into competition with either Oracle or Teradata, but merely marks another tactical error in Microsoft's increasingly desperate acquisition strategy to move deeper into the Enterprise on one hand, while striving to move further into the online space on the other.
More can be read at: