Modeling and Predictives

Here's a personal perspective and a bit of a personal history regarding mathematical modeling and predictives.

The 1980s were an exciting time for mathematical modeling of complex systems. At the time, there were two basic types of modeling: deterministic and stochastic (probability or statistics models). Within stochastic modeling, traditional statistics vs. Bayesian statistics was a burgeoning battleground. Physical simulations (often based upon deterministic models) were giving way to computer simulations (often based upon stochastic models, especially Monte Carlo Simulations). Two theories were popularized during this time: catastrophe theory and chaos theory; ultimately though, both of these theories proved incapable of prediction - the hallmark of a good mathematical model. A different type of modeling technique, based upon relational algebra, was also moving from the theoretical work of Ted Codd, to the practical implementations at (the company now known as) Oracle: data modeling.

Mathematical models are attempts to understand the complex by making simplifying assumptions. They are always a balance between complexity and accuracy. One nice example of the evolution of a deterministic mathematical model can be found in the Ideal Gas Laws, starting with Boyle's Law to Charles' Law to Gay-Lussac's Law to Avogadro's Law, culminating in the Ideal Gas Law, which all of saw in high school chemistry: PV=nRT.

Mathematical models are used in pretty much all fields of endeavor: physical sciences, all types of engineering, behavioral studies, and business. In the 1970's, I used deterministic electrochemical models to understand and predict the behaviour of various chemical stoichiometry for fuel cells and photovoltaic cells. In the 1980's, I used Bayesian statistics, sometimes combined with Monte Carlo Simulations to predict the reliability and risk associated with complex aerospace, utility and other systems.

The most popular use of Bayesian statistics was to expand the a priori knowledge of a complex system with subjective opinions. Likely the most famous application of Bayesian Statistics, at the time I became involved with the branch, was the Rand Corporation's Delphi Method. There was actually a joke in the Aerospace Industry about the Delphi Method:

A team of Rand consultants went to Werner von Braun to seek the expert opinion of the engineers working on a new rocket motor. The consultants explained their Delphi Method thusly. Prior to the first static test of the new rocket motor, they would ask, separately, each of the five engineers working on the new design their opinion of the rocket's reliability. Their opinions would form the Bayesian a priori distribution. After the test, they would reveal the results of the first survey and the test results, and ask the five engineers, collectively, their new opinion of the rocket's reliability. This would form the Bayesian a posteriori, from which the rocket's reliability would be predicted. Doctor von Braun said that he could save them some time. He gathered his team of rocket engineers, and asked them if they thought that the new rocket motor would fail. Each answered, as did Doctor Von Braun, "no" in German. "There, you see, five nines reliability, as specified." declared the good Doctor to the Rand consultants, "No need for any further study on your part."

Yep, it's a side splitter. :))

I didn't like this method, and did things a bit differently. My method involved gathering all the data for similar test and production models, weighting each relevant engineering variable, creating the a priori, fitting with Weibull Analysis, designing the Bayesian mathematical conjugate, using a detailed post-mortem of the first and subsequent tests of the system being analyzed, updating and learning as we went, to finally predict the reliability and risk for the system. I first used this on the Star48 perigee kick motor, and went on to refine and use this method for:

  • a variety of apogee and perigee kick motors
  • several components of the Space Transportation System
  • the Extreme Ultraviolet Explorer
  • Gravity Probe-B
  • a halogen lamp pizza oven
  • a methodology for failure mode, effects and criticality analysis of the electrical grid
  • and many more systems

I started to call this method "objective Bayes", but that name was already taken by a branch of Bayesian statistics that uses a non-informative a priori. Several of my projects resulted in software programs, all in FORTRAN. The first was used as a justification for a 1 MB [no, not a mistake] "box" [memory] for the corporate mainframe. NASA had sent us detailed data on over 4,000 solid propellant rocket motors. Talk about "big data". &#59;) I had a lot of fun doing this into the 1990's.

The next paradigm shift, for me personally, was learning data modeling, and focusing on business processes rather than engineering systems. Spending time at Oracle, including Richard Barker and his computer aided system engineering methods, I felt right at home. Rather than Bayesian Statistics, I would be using relational algebra and calculus for deterministic mathematical models of the data for the business processes being stored in a relational database management system. I very quickly got involved in very large databases, decision support systems, data warehousing and business intelligence.

I was surprised, and, after 17 years, continue to be surprised, how few data modelers agree with the statement in the preceding paragraph. I'm surprised how few data modelers go beyond entity-relationship diagrams; how few know or care about relational algebra and relational calculus. I'm amazed how few people realize that the arithmetic average computed in most "analytic" systems is a fairly useless measure of the underlying data, for most systems. I'm amazed that BI and analytic systems are still deterministic, and always go with simplicity over accuracy.

But computer power continues to expand. Moore's Law still rules. We can do better now. Things that used to take powerful main frames or even supercomputers can be done on laptops now. We no longer need to settle for simplicity over accuracy.

More importantly, the R Statistical Language has matured. Literally thousands and thousands of mathematical, graphical and statistical packages have been added to the CRAN, Omegahat and BioConductor repositories. Even the New York Times has published pieces about R.

It's once again time to move from deterministic to stochastic models.

Over the next few weeks, I hope to post a series of "study guides" that will focus on setting up a web-based environment consolidating SQL and MDX based analytics, as expressed in Pentaho and LucidDB open source projects, with R, and possibly SQLStream. Updated 20100314 to correct links (typos). Thanks to Doug Moran of Pentaho for catching this.

There have been many articles as well on "Big Data". As I commented on Merv Adrian's blog post request for "Ideas for SF Big Data Summit":

One area of discussion, which may appear to be for the “newbies” but is actually a matter of some debate, would be the definition of “big data”.

It really isn’t about the amount of data (TB & PB & more) so much as it is about the volumetric flow and timeliness of the data streams.

It’s about how data management systems handle the various sources of data as well as the interweaving of those sources.

It means treating data management systems in the same way that we treat the Space Transportation System, as a very large, complex system.

-- Comment by Joseph A. di Paolantonio, February 1, 2010 at 4:09 pm

I believe this because there is a huge amount of data about to come down the pipe. I'm not talking about the Semantic Web or the pidly little petabytes of web log and click-through data. I'm talking about the instrumented world. Something that's been in the making for ten years, and more: RFID, SmartDust, ZigBee, and more wired and wireless sensors, monitors and devices that will become a part of everything, everywhere.

Let me just cite two examples from something that is coming, is hyped, but not yet standardized, even if solid attempts at definition are being made: the SmartGrid. First, consider the fact that utility companies are distributing and using smart meters to replace manually read mechanical meters at homes and businesses; this will result in thousands of data points per day as opposed to one per month PER METER. The second is EPRI's copper-riding robot, as explained in a recent Popular Science. Think of the petabytes of data that these two examples will generate monthly. [Order the Smart Grid Dictionary: First Edition on Amazon]

The desire, the need, to analyze and make inferences from this data will be great. The need to actually predict from this data will be even greater, and will be a necessary element of the coming SmartGrid, and in making the instrumented world a better world for all of humanity.

Now if only we can avoid the likes of Skynet and Archangel.

An Open Source Childrens Story

On Twitter today, Lance Walter asked me to go into the Ark Business with him, and Gareth Greenaway asked for entertainment. It must be a rainy Friday afternoon &#59;)

I'm not sure about Lance's offer, but I did tell Gareth the following story, from tweet-start to tweet-end. This isn't word for word as I tweeted. 'Tis a bit expanded, but the tale is the same.

Once upon a time there was a young penguin named Tux. Tux decided to set off on a journey through IT Land. Now IT Land is a dangerous place, full of hackers fighting crackers, and ruled by those in the Ivory Tower and the acolytes of the Megaliths.

Along the way, the adventurous Tux met the Dolphin, the Elephant and the Beekeeper. They made a pact on the Lucid glyph to become a Dynamo of IT, bringing power to the datasmiths of the Land.

They met many Titans from the Megaliths on their Quest. The Beekeeper used the open source bees to open the scrum along the way, blocking the hookers with their sharp claws.

Some of the Titans were helpful, some, not so much.

The Dolphin was empowered by the Sun. But the Sun was consumed by a powerful Oracle. The Elephant, too, gained a powerful ally, and they do Enterprise against the Oracle. The band of the Quest was broken, and Tux was sad.

The Era of Lucid thought ended, but the Dynamo yet powers the Lucid Glyph, and Tux can rely on the Dynamo and the Beekeeper to predict a future clear of the Oracle.

And thus this quest ends, but another soon begins, where Tux will meet new friends and new foes. Will Beastie and the dæmons be allies? Will the Paladin in the Red Hat be stalwart?

Perhaps we'll find out at OSCON, for Gareth suggested that an assemblage of geeks would enjoy this story, and we'll see if OSCON thinks our tales worthy of a keynote slot in 2010.

Do you recognize all the characters in this tale? Maybe the links will help.

What say you, OSCON? Would these tales make a worthy Keynote?

Pentaho Reporting Review

As promised in my post, "Pentaho Reporting 3.5 for Java Developers First Look", I've taken the time to thoroughly grok Pentaho Reporting 3.5 for Java Developers by Will Gorman [direct link to Packt Publishing][Buy the book from Amazon]. I've read the book, cover-to-cover, and gone through the [non-Java] exercises. As I said in my first look at this book, it contains nuggets of wisdom and practicalities drawn from deep insider knowledge. This book does best serve its target audience, Java developers with a need to incorporate reporting into their applications. But it is also useful for report developers who wish to know more about Pentaho, and Pentaho users who wish to make their use of Pentaho easier and the resulting reporting experience richer.

The first three chapters provide a very good introduction to Pentaho Reporting and its relationship to the Pentaho BI Suite and the company Pentaho, historical, technical and practical. These three chapters are also the ones that have clearly marked sections for Java specific information and exercises. By the end of Chapter Three, you'll have installed Pentaho Report Designer, and built several rich reports. If you're a Java developer, you'll have had the opportunity to incorporate these reports into both Tomcat J2EE or Swing web applications. You'll have been introduced to the rich reporting capabilities of Pentaho, accessing data sources, the underlying Java libraries, and the various output options that include PDF, Excel, CSV, RTF, XML and plain text.

Chapters 4 through 8 is all about the WYSIWYG Pentaho Report Designer, the pixel-level control that it gives you over the layout of your reports, and the many wonderful capabilities provided by Pentaho Reporting from a wide range of chart types to embedding numeric and text functions, to cross-tabs and sub-reports. Other than Chapter 5, these chapters are as useful for a business user creating their own reports, as it is for a report developer. Chapter 5 is a very deep dive, very technical look at incorporating various data sources. The two areas that really stand out are the charts (Chapter 6) and functions (Chapter 7).

There are a baker's dozen types of charts covered, with an example for each type. Some of the more exotic are Waterfall, Bar-Line, Radar and Extended XY Series charts.

There are hundreds of parameters, functions and expressions that can be used in Pentaho Reports, and Will covers them all. The formula capability of Pentaho Reporting follows the OpenFormula standard, similar to the support for formulæ in Microsoft Excel, and the same as that followed by OpenOffice.org. One can provide computed text or numeric values within Pentaho reports to a fairly complex extent. Chapter 7 provides a great introduction to using this feature.

Chapters 9 through 11 are very much for the software developer, covering the development of Interactive Reports in Swing and HTML, the use of Pentaho's APIs and extension of Pentaho Reporting capabilities. It's all interesting stuff, that really explains the technology of Pentaho Reporting, but there's little here that is of use to the business user or non-Java report developer.

The first part of Chapter 12, on the other hand, is of little use to the Java developer, as it shows how to take reports created in Pentaho Report Designer and publish them through the Pentaho BI-Server, including formats suitable to mobile devices, such as the iPhone. The latter part of Chapter 12 goes into the use of metadata, and is useful both for the report developer and the Java developer.

So, as I said in my first look, the majority of the book is useful even if you're not a Java developer who needs to incorporate sophisticated reports into your application. That being said, Will Gorman does an excellent job in explaining Pentaho Reporting, and making it very useful for business users, report designers, report developers and, his target audience, Java developers. I heartily recommend that you buy this book. [Amazon link]

Information Architecture and DynamoBI

Anyone who follows either Nicholas Goodman or myself on Twitter (links are to our Twitter handles) or follow either this blog or Nick's Goodman on BI blog, know that I've been helping Nick out here and there with his new business, Dynamo Business Intelligence Corporation, offering support and commercial (and still open source) packages of the "best column-store database you never heard of", LucidDB.

One of the things that I'll be doing over the next few weeks is some website and community development. For all that I've been an executive type for decades, I love to keep hands-on with various technologies, and one of those technologies is "THE WEB". While I've never made a living as a web developer, I started with the web very early on, developing internal sites for the Lynx browser, as one of the internal web chiefs, learning from Comet, the Oracle web master. The first commercial site that I did, in 1994, for the local Eagle Express Flowers, is still up, with a few modernizations. :)

So, while waiting for the style guide from CORHOUSE, who designed the new Dynamo Business Intelligence Corporation logo [what do you think of it?]…

I've decided to go through an old friend. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites

This exercise has reminded me that Information Architecture isn't just important for websites, but also for all the ways that individuals and businesses organize their data, concepts, information and knowledge. I'm happy to be helping out DynamoBI, and glad that doing so led me to this reminder of something I've been taking for granted. Time to revisit those [Ever]notes, [Zotero] researches, files and what not.

Pentaho Reporting 3.5 for Java Developers First Look

I was approached by Richard Dias of Packt Publishing to review "Pentaho Reporting 3.5 for Java Developers" written by Will Gorman. (Link is to Amazon.com)

LinkedIn
Richard Dias has indicated you are a Friend:

Hi Joseph,

My name is Richard Dias and I work for Packt Publishing which specializes in publishing focused IT related books.

I was wondering if you would be interesteed in reviewing the book "Pentaho Reporting for Java Developers" written by Will Gorman.

- Richard Dias

After some back and forth, I decided to accept the book in exchange for my review.

Hi Joseph,

Thanks for the reply and interest in reviewing the book. I have just placed an order for a copy of the book and it should arrive at your place within 10 days. Please do let me know when you receive it.

I have also created a unique link for you. It is http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers?utm_source=press.teleinteractive.net&utm_medium=bookrev&utm_content=blog&utm_campaign=mdb_001537. Please feel free to use this link in your book review.

In the meanwhile, if you could mention about the book on your blog and tweet about the book, it would be highly appreciated. Please do let me know if it is fine with you.

I’m also sending you the link of an extracted chapter from the book (Chapter 6 Including Charts and Graphics in Reports). It would be great if you could put up the link on your blog. This would act as first hand information for your readers and they will also be able to download the file.

Any queries or suggestions are always welcome.

I look forward to your reply.

Best Regards,

Richard

Richard Dias
Marketing Research Executive | Packt Publishing | www.PacktPub.com

Shortly thereafter, I received notification that the book had shipped. It arrived within two weeks.

Of course, I've been too busy to do more than skim through the book. Anyone who follows me as JAdP on Twitter knows that in the past few weeks, I've been:

  • helping customers with algorithm development and implementing Pentaho on LucidDB,
  • working with Nicholas Goodman with his planning for commercial support of LucidDB through Dynamo Business Intelligence, and roadmaps for DynamoDB packages built on LucidDB's plugin architecture, and
  • migrating our RHEL host at ServerBeach from our old machine to a new one, while dealing with issues brought about by ServerBeach migrating to Peer1's tools.

None of which has left any time for a thorough review of "Pentaho Reporting for Java Developers".

I hope to have a full review up shortly after the holidays, which for me runs from Solstice to Epiphany, and maybe into the following weekend.

First, a little background. Will Gorman, the author, works for Pentaho, in software engineering, as a team lead, and works primarily on Pentaho Reporting products, a combination of server-side (Pentaho BI-Server), Desktop (MacOSX, Linux and Windows platforms) and Web-based software (Reporting Engine, Report Designer, Report Design Wizard and Pentaho Ad Hoc Reporting), which stems from the open source JFreeReport and JFreeChart. While I don't know Will personally, I do know quite a few individuals at Pentaho, and in the Pentaho community. I very much endorse their philosophy towards open source, and the way they've treated the open source projects and communities that they've integrated into their Pentaho Business Intelligence Suite. I do follow Will on Twitter, and on the IRC Freednode Channel, ##pentaho.

I myself am not a Java Developer, so at first I was not attracted to a book with a title that seemed geared to Pentaho Developers. Having skimmed through the book, I think that the title was poorly chosen. (Sorry Richard). I find that I can read through the book without stumbling, and that there is plenty of good intelligence that will help me better server and instruct my customers through the use of Pentaho Report Designer.

My initial impressions are good. The content seems full of golden nuggets of "how-tos" and background information not commonly known among the Pentaho community. Will's knowledge of Pentaho Reporting and how it fits into the rest of the Pentaho tools, such as KETTLE (Pentaho Data Integration) and Mondrian (Pentaho Analysis), along with a clear writing style makes all aspects of Pentaho more accessible to the BI practitioner, as well as those that wish to embed Pentaho Reporting into their own application.

This book is not just for Java developers, but for anyone who wishes to extend their abilities in BI, Reporting and Analysis, with Pentaho as an excellent example.

I'll be following up with the really exciting finds as I wend my way through Will's gold mine of knowledge, and, will do my best to fulfill my promise of a full review by mid-January.

You can also click through the Chapter 6 (a PDF) as mentioned in Richard's email.

Thank you, Richard. And most especially, thank you, Will.

October 2018
Mon Tue Wed Thu Fri Sat Sun
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31        
 << <   > >>
The TeleInterActive Press is a collection of blogs by Clarise Z. Doval Santos and Joseph A. di Paolantonio, covering the Internet of Things, Data Management and Analytics, and other topics for business and pleasure. 37.540686772871 -122.516149406889

Search

Categories

The TeleInterActive Lifestyle

Yackity Blog Blog

The Cynosural Blog

Open Source Solutions

DataArchon

The TeleInterActive Press

  XML Feeds