<?xml version="1.0" encoding="iso-8859-1"?><!-- generator="b2evolution/3.3.3" -->
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>Open Source Solutions</title>
		<link>http://press.teleinteractive.net/oss/</link>
		<atom:link rel="self" type="application/rss+xml" href="http://press.teleinteractive.net/oss/?tempskin=_rss2" />
		<description>The Open Source Solutions Blog is a companion to the Open Source Business Intelligence Wiki and Lens</description>
		<language>en-US</language>
		<docs>http://blogs.law.harvard.edu/tech/rss</docs>
		<admin:generatorAgent rdf:resource="http://b2evolution.net/?v=3.3.3"/>
		<ttl>60</ttl>
				<item>
			<title>Data Artisan Smith or Scientist</title>
			<link>http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist</link>
			<pubDate>Wed, 30 Jun 2010 22:57:26 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="main">Data Science</category>
<category domain="alt">Big Data</category>			<guid isPermaLink="false">947@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;Over the past few months, a debate has been proceeding on whether or not a new discipline, a new career path, is emerging from the tsunami of data bearing down on us.  The need for a new type of Renaissance [Wo]Man to deal with the Big Data onslaught.  To wit, Data Science.&lt;/p&gt;

&lt;p&gt;I'm writing about this now, because last night, at an every-three-week get together devoted to cask beer and data analysis, the topic came up.  [Yes, every-THREE-weeks - a month is too long to go without cask beer fueled discussions of Rstats, BigData, Streaming SQL, BI and more.]  The statisticians in the group, including myself, strongly disagreed with the way the term is being used; the software/database types were either in favor or ambivalent.  We all agreed that a new, interdisciplinary approach to Big Data is needed.  Oh, and I'll stay on topic here, and not get into another debate as to the definition of &quot;Big Data&quot;.  &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_wink.gif&quot; alt=&quot;&amp;#59;&amp;#41;&quot; class=&quot;middle&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This lively conversation reinforced my desire to write about Data Science that swelled up in me after reading &quot;&lt;a title=&quot;Mike Loukides article from O'Reilly Radar&quot; href=&quot;http://radar.oreilly.com/2010/06/what-is-data-science.html&quot; target=&quot;_blank&quot;&gt;What is Data Science?&lt;/a&gt;&quot; by Mike Loukides published on O'Reilly Radar, and a subsequent discussion on &lt;a title=&quot;link to Twitter home page&quot; href=&quot;http://twitter.com/&quot; target=&quot;_blank&quot;&gt;Twitter&lt;/a&gt; held the following weekend, concerning data analytics.&lt;/p&gt;

&lt;p&gt;The term &quot;Data Science&quot; isn't new, but it is taking on new meanings.  The &lt;strong&gt;Journal of Data Science&lt;/strong&gt; published &lt;a title=&quot;Journal of Data Science Online v1#1&quot; href=&quot;http://www.jds-online.com/v1-1&quot; target=&quot;_blank&quot;&gt;JDS volume 1, issue 1&lt;/a&gt; in January of 2003.  The Scope of the JDS is very clearly related to applied statistics:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;By &quot;Data Science&quot;, we mean almost everything that has something to do with data: Collecting, analyzing, modeling...... yet the most important part is its applications --- all sorts of applications. This journal is devoted to applications of statistical methods at large.&lt;br /&gt;
-- &lt;a title=&quot;About JDS Scope&quot; href=&quot;http://www.jds-online.com/about&quot; target=&quot;_blank&quot;&gt;About JDS, Scope, First Paragraph&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There is also the &lt;a title=&quot;Apparently Defunct Data Science Journal&quot; href=&quot;http://www.datasciencejournal.org/&quot; target=&quot;_blank&quot;&gt;CODATA Data Science Journal&lt;/a&gt;, which appears to have last been updated in August of 2007, and currently has no content, other than its self-description as&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The Data Science Journal is a peer-reviewed electronic journal publishing papers on the management of data and databases in Science and Technology.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I think that two definitions can be derived from these two journals.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Data Science is systematic study, through observation and experiment, of the collection, modeling, analysis, visualization, dissemination, and application of data.&lt;/li&gt;
  &lt;li&gt;Data Science is the use of data and database technology within physical and natural sciences and engineering.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I can agree with the first, especially with the JDS Scope clearly stating that Data Science is applied statistics.&lt;/p&gt;

&lt;p&gt;The New Oxford American Dictionary, on which the Apple Dictionary program is based, defines science as a noun&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observations and experiments.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;And a similar &lt;a href=&quot;http://dictionary.reference.com/browse/science&quot;&gt;definition of science&lt;/a&gt; can be found on Dictionary.com.&lt;/p&gt;

&lt;p&gt;In many ways, I like Mike Loukides' article &quot;&lt;a title=&quot;Mike Loukides article from O'Reilly Radar&quot; href=&quot;http://radar.oreilly.com/2010/06/what-is-data-science.html&quot; target=&quot;_blank&quot;&gt;What is Data Science?&lt;/a&gt;&quot; in how it highlights the need for this new discipline.  I just don't like what he describes to be the new definition of &quot;data science&quot;.  Indeed, I very much disagree with this statement from the article.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;A statistician is not an actuary.  They're very different roles.  I know this because I worked for over a decade applying statistics to determining the reliability and risk associated with very large, complex systems such as rockets and space-borne astrophysics observatories.  I once hired a &lt;a title=&quot;University of California at Berkeley&quot; href=&quot;http://berkeley.edu/&quot; target=&quot;_blank&quot;&gt;Cal&lt;/a&gt; student as an intern because she feared that the only career open to her as a math major, was to be an actuary.  I showed her a different path.  So, yes, I know, from experience, that a statistician is not an actuary.  Actually, the definition of a data scientist given, that is &quot;gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others&quot; is exactly what a statistician does.&lt;/p&gt;

&lt;p&gt;I do however see the need for a new discipline, separate from applied statistics, or data science.  The massive amount of data to come from an instrumented world with strongly interconnected people and machines, and real-time analysis, inference and prediction from those data, will require inter-disciplinary skills.  But I see those skills coming together in a person who is more of a smith, or, as &lt;a title=&quot;Julian Hyde on Twitter&quot; href=&quot;http://twitter.com/julianhyde&quot; target=&quot;_blank&quot;&gt;Julian&lt;/a&gt; &lt;a title=&quot;Julian Hyde blog on Open Source, OLAP and stuff&quot; href=&quot;http://julianhyde.blogspot.com/&quot; target=&quot;_blank&quot;&gt;Hyde&lt;/a&gt; put it last night, an artisan.  Falling back on the old dictionary again, a smith is someone who is skilled in creating something with a specific material; an artisan is someone who is skilled in a craft, making things by hand.&lt;/p&gt;

&lt;p&gt;Another reason that I don't like the term &quot;data science&quot; for this interdisciplinary role, stems from what Mike Loukides describes in his article &quot;&lt;a title=&quot;Mike Loukides article from O'Reilly Radar&quot; href=&quot;http://radar.oreilly.com/2010/06/what-is-data-science.html&quot; target=&quot;_blank&quot;&gt;What is Data Science?&lt;/a&gt;&quot; as the definition for this new discipline &quot;Data science requires skills ranging from traditional computer science to mathematics to art&quot;.  I agree that the new discipline requires these three things, and more, even softer skills.  I disagree that these add up to data science.&lt;/p&gt;

&lt;p&gt;I even prefer &quot;data geek&quot;, as defined by &lt;a title=&quot;Michael E Driscoll on Twitter as dataspora&quot; href=&quot;http://www.twitter.com/dataspora&quot; target=&quot;_blank&quot;&gt;Michael E. Driscoll&lt;/a&gt; in &quot;&lt;a href=&quot;http://dataspora.com/blog/sexy-data-geeks/&quot;&gt;The Three Sexy Skills of Data Geeks&lt;/a&gt;&quot;.  Michael Driscoll's post of 2009 May 27 certainly agrees skill-wise with Mike Loukides' post of 2010 June 02.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Skill #1: Statistics (Studying)&lt;/li&gt;
  &lt;li&gt;Skill #2: Data Munging (Suffering)&lt;/li&gt;
  &lt;li&gt;Skill #3: Visualization (Storytelling)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And I very much prefer &quot;Data Munging&quot; to &quot;Computer Science&quot; as one of the three skills.&lt;/p&gt;

&lt;p&gt;I'll stick to the definition that I gave above for data science as &quot;systematic study, through observation and experiment, of the collection, modeling, analysis, visualization, dissemination, and application of data&quot;.  This is also applied statistics.  So, what else is needed for this new discipline?  Well, Mike and Michael are correct: computer skills, especially data munging, and art.  Well, any statistician today has computer skills, generally in one or more of SAS, SPSS, R, S-PLUS, Python, SQL, Stata, MATLAB and other software packages, as well as familiarity with various data storage &amp;amp; management methods.  Some statisticians are even artists, perhaps as storytellers, as evidenced by that rare great teacher or convincing expert witness, perhaps as visualizers, creating statistically accurate animations to clearly describe the analysis, as evidenced by the career of that intern I hired so many years ago.&lt;/p&gt;

&lt;p&gt;The data smith, the data artisan, must be comfortable with all forms of data:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;structured,&lt;/li&gt;
  &lt;li&gt;unstructured and&lt;/li&gt;
  &lt;li&gt;semi-structured&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just as any other smith, someone following this new discipline might serve an apprenticeship creating new things from these forms of data such as a data warehouse or an OLAP cube, a sentiment analysis or a streaming SQL sensor web, or a recommendation engine or complex system predictives.  The data smith must become very comfortable with putting all forms of data together in new ways, to come to new conclusions.&lt;/p&gt;

&lt;p&gt;Just as a goldsmith will never make a piece of jewelry identical to the one finished days before, just as art can be forged but not duplicated, the data smith, the data artisan will glean new inferences every time they look at the data, will make new predictions with every new datum, and the story they tell, the picture they paint, will be different each time.&lt;/p&gt;

&lt;p&gt;And perhaps then, the data smith becomes a master, an artisan.&lt;/p&gt;


&lt;!-- Creative Commons License --&gt; &lt;div class=&quot;cc_license&quot;&gt;&lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;&lt;img src=&quot;http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png&quot; alt=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot; /&gt;&lt;/a&gt;Except where otherwise noted, this content is&lt;br /&gt;licensed under a &lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;Creative Commons License&lt;/a&gt;.&lt;/div&gt; &lt;!-- /Creative Commons License --&gt;
&lt;!--
&lt;rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot;
    xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;Work rdf:about=&quot;http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist&quot;&gt;
  &lt;dc:title&gt;Data Artisan Smith or Scientist&lt;/dc:title&gt;
  &lt;dc:creator&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:creator&gt;
  &lt;dc:rights&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:rights&gt;
  &lt;dc:date&gt;2010-06-30 18:39:11&lt;/dc:date&gt;
  &lt;license rdf:resource=&quot;by-nc-sa/3.0/&quot; /&gt;
&lt;/Work&gt;
&lt;License rdf:about=&quot;by-nc-sa/3.0/&quot;&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/DerivativeWorks&quot; /&gt;
  &lt;prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Attribution&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/ShareAlike&quot; /&gt;
&lt;/License&gt;
&lt;/rdf:RDF&gt;
--&gt;

&lt;p&gt;
&lt;ins&gt;PS: Here's a list of links to that Twitter conversation among some of the most respected people in the biz, on Data Analytics&lt;/ins&gt;&lt;br /&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15512935981&quot;&gt;https://twitter.com/NeilRaden/status/15512935981&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513225191&quot;&gt;https://twitter.com/NeilRaden/status/15513225191&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513275261&quot;&gt;https://twitter.com/NeilRaden/status/15513275261&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513453916&quot;&gt;https://twitter.com/NeilRaden/status/15513453916&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/datachick/status/15513460384&quot;&gt;https://twitter.com/datachick/status/15513460384&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513488053&quot;&gt;https://twitter.com/NeilRaden/status/15513488053&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/datachick/status/15513677836&quot;&gt;https://twitter.com/datachick/status/15513677836&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/CMastication/status/15513772446&quot;&gt;https://twitter.com/CMastication/status/15513772446&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513821393&quot;&gt;https://twitter.com/NeilRaden/status/15513821393&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513854916&quot;&gt;https://twitter.com/NeilRaden/status/15513854916&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15513915694&quot;&gt;https://twitter.com/NeilRaden/status/15513915694&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/alecsharp/status/15513980301&quot;&gt;https://twitter.com/alecsharp/status/15513980301&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15514104372&quot;&gt;https://twitter.com/NeilRaden/status/15514104372&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/alecsharp/status/15514097194&quot;&gt;https://twitter.com/alecsharp/status/15514097194&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/CMastication/status/15514374095&quot;&gt;https://twitter.com/CMastication/status/15514374095&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/estrenuo/status/15514634644&quot;&gt;https://twitter.com/estrenuo/status/15514634644&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15515243453&quot;&gt;https://twitter.com/NeilRaden/status/15515243453&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/CMastication/status/15516185085&quot;&gt;https://twitter.com/CMastication/status/15516185085&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/annmariastat/status/15516321715&quot;&gt;https://twitter.com/annmariastat/status/15516321715&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15519544709&quot;&gt;https://twitter.com/NeilRaden/status/15519544709&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15519597061&quot;&gt;https://twitter.com/NeilRaden/status/15519597061&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15519621974&quot;&gt;https://twitter.com/NeilRaden/status/15519621974&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/skemsley/status/15519932631&quot;&gt;https://twitter.com/skemsley/status/15519932631&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/aristippus303/status/15520146540&quot;&gt;https://twitter.com/aristippus303/status/15520146540&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15520478566&quot;&gt;https://twitter.com/NeilRaden/status/15520478566&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/SethGrimes/status/15520765766&quot;&gt;https://twitter.com/SethGrimes/status/15520765766&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/SethGrimes/status/15520851678&quot;&gt;https://twitter.com/SethGrimes/status/15520851678&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15521050387&quot;&gt;https://twitter.com/NeilRaden/status/15521050387&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15521106901&quot;&gt;https://twitter.com/NeilRaden/status/15521106901&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15521133647&quot;&gt;https://twitter.com/NeilRaden/status/15521133647&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/NeilRaden/status/15521192977&quot;&gt;https://twitter.com/NeilRaden/status/15521192977&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/SethGrimes/status/15521579977&quot;&gt;https://twitter.com/SethGrimes/status/15521579977&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://twitter.com/ryanprociuk/status/15521637974&quot;&gt;https://twitter.com/ryanprociuk/status/15521637974&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>Over the past few months, a debate has been proceeding on whether or not a new discipline, a new career path, is emerging from the tsunami of data bearing down on us.  The need for a new type of Renaissance [Wo]Man to deal with the Big Data onslaught.  To wit, Data Science.</p>

<p>I'm writing about this now, because last night, at an every-three-week get together devoted to cask beer and data analysis, the topic came up.  [Yes, every-THREE-weeks - a month is too long to go without cask beer fueled discussions of Rstats, BigData, Streaming SQL, BI and more.]  The statisticians in the group, including myself, strongly disagreed with the way the term is being used; the software/database types were either in favor or ambivalent.  We all agreed that a new, interdisciplinary approach to Big Data is needed.  Oh, and I'll stay on topic here, and not get into another debate as to the definition of "Big Data".  <img src="http://press.teleinteractive.net/rsc/smilies/icon_wink.gif" alt="&#59;&#41;" class="middle" /></p>

<p>This lively conversation reinforced my desire to write about Data Science that swelled up in me after reading "<a title="Mike Loukides article from O'Reilly Radar" href="http://radar.oreilly.com/2010/06/what-is-data-science.html" target="_blank">What is Data Science?</a>" by Mike Loukides published on O'Reilly Radar, and a subsequent discussion on <a title="link to Twitter home page" href="http://twitter.com/" target="_blank">Twitter</a> held the following weekend, concerning data analytics.</p>

<p>The term "Data Science" isn't new, but it is taking on new meanings.  The <strong>Journal of Data Science</strong> published <a title="Journal of Data Science Online v1#1" href="http://www.jds-online.com/v1-1" target="_blank">JDS volume 1, issue 1</a> in January of 2003.  The Scope of the JDS is very clearly related to applied statistics:</p>
<blockquote><p>By "Data Science", we mean almost everything that has something to do with data: Collecting, analyzing, modeling...... yet the most important part is its applications --- all sorts of applications. This journal is devoted to applications of statistical methods at large.<br />
-- <a title="About JDS Scope" href="http://www.jds-online.com/about" target="_blank">About JDS, Scope, First Paragraph</a></p></blockquote>

<p>There is also the <a title="Apparently Defunct Data Science Journal" href="http://www.datasciencejournal.org/" target="_blank">CODATA Data Science Journal</a>, which appears to have last been updated in August of 2007, and currently has no content, other than its self-description as</p>
<blockquote><p>The Data Science Journal is a peer-reviewed electronic journal publishing papers on the management of data and databases in Science and Technology.</p></blockquote>

<p>I think that two definitions can be derived from these two journals.</p>
<ol>
  <li>Data Science is systematic study, through observation and experiment, of the collection, modeling, analysis, visualization, dissemination, and application of data.</li>
  <li>Data Science is the use of data and database technology within physical and natural sciences and engineering.</li>
</ol>

<p>I can agree with the first, especially with the JDS Scope clearly stating that Data Science is applied statistics.</p>

<p>The New Oxford American Dictionary, on which the Apple Dictionary program is based, defines science as a noun</p>
<blockquote><p>the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observations and experiments.</p></blockquote>
<p>And a similar <a href="http://dictionary.reference.com/browse/science">definition of science</a> can be found on Dictionary.com.</p>

<p>In many ways, I like Mike Loukides' article "<a title="Mike Loukides article from O'Reilly Radar" href="http://radar.oreilly.com/2010/06/what-is-data-science.html" target="_blank">What is Data Science?</a>" in how it highlights the need for this new discipline.  I just don't like what he describes to be the new definition of "data science".  Indeed, I very much disagree with this statement from the article.</p>
<blockquote><p>Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.</p></blockquote>
<p>A statistician is not an actuary.  They're very different roles.  I know this because I worked for over a decade applying statistics to determining the reliability and risk associated with very large, complex systems such as rockets and space-borne astrophysics observatories.  I once hired a <a title="University of California at Berkeley" href="http://berkeley.edu/" target="_blank">Cal</a> student as an intern because she feared that the only career open to her as a math major, was to be an actuary.  I showed her a different path.  So, yes, I know, from experience, that a statistician is not an actuary.  Actually, the definition of a data scientist given, that is "gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others" is exactly what a statistician does.</p>

<p>I do however see the need for a new discipline, separate from applied statistics, or data science.  The massive amount of data to come from an instrumented world with strongly interconnected people and machines, and real-time analysis, inference and prediction from those data, will require inter-disciplinary skills.  But I see those skills coming together in a person who is more of a smith, or, as <a title="Julian Hyde on Twitter" href="http://twitter.com/julianhyde" target="_blank">Julian</a> <a title="Julian Hyde blog on Open Source, OLAP and stuff" href="http://julianhyde.blogspot.com/" target="_blank">Hyde</a> put it last night, an artisan.  Falling back on the old dictionary again, a smith is someone who is skilled in creating something with a specific material; an artisan is someone who is skilled in a craft, making things by hand.</p>

<p>Another reason that I don't like the term "data science" for this interdisciplinary role, stems from what Mike Loukides describes in his article "<a title="Mike Loukides article from O'Reilly Radar" href="http://radar.oreilly.com/2010/06/what-is-data-science.html" target="_blank">What is Data Science?</a>" as the definition for this new discipline "Data science requires skills ranging from traditional computer science to mathematics to art".  I agree that the new discipline requires these three things, and more, even softer skills.  I disagree that these add up to data science.</p>

<p>I even prefer "data geek", as defined by <a title="Michael E Driscoll on Twitter as dataspora" href="http://www.twitter.com/dataspora" target="_blank">Michael E. Driscoll</a> in "<a href="http://dataspora.com/blog/sexy-data-geeks/">The Three Sexy Skills of Data Geeks</a>".  Michael Driscoll's post of 2009 May 27 certainly agrees skill-wise with Mike Loukides' post of 2010 June 02.</p>

<ol>
  <li>Skill #1: Statistics (Studying)</li>
  <li>Skill #2: Data Munging (Suffering)</li>
  <li>Skill #3: Visualization (Storytelling)</li>
</ol>

<p>And I very much prefer "Data Munging" to "Computer Science" as one of the three skills.</p>

<p>I'll stick to the definition that I gave above for data science as "systematic study, through observation and experiment, of the collection, modeling, analysis, visualization, dissemination, and application of data".  This is also applied statistics.  So, what else is needed for this new discipline?  Well, Mike and Michael are correct: computer skills, especially data munging, and art.  Well, any statistician today has computer skills, generally in one or more of SAS, SPSS, R, S-PLUS, Python, SQL, Stata, MATLAB and other software packages, as well as familiarity with various data storage &amp; management methods.  Some statisticians are even artists, perhaps as storytellers, as evidenced by that rare great teacher or convincing expert witness, perhaps as visualizers, creating statistically accurate animations to clearly describe the analysis, as evidenced by the career of that intern I hired so many years ago.</p>

<p>The data smith, the data artisan, must be comfortable with all forms of data:</p>
<ul>
  <li>structured,</li>
  <li>unstructured and</li>
  <li>semi-structured</li>
</ul>

<p>Just as any other smith, someone following this new discipline might serve an apprenticeship creating new things from these forms of data such as a data warehouse or an OLAP cube, a sentiment analysis or a streaming SQL sensor web, or a recommendation engine or complex system predictives.  The data smith must become very comfortable with putting all forms of data together in new ways, to come to new conclusions.</p>

<p>Just as a goldsmith will never make a piece of jewelry identical to the one finished days before, just as art can be forged but not duplicated, the data smith, the data artisan will glean new inferences every time they look at the data, will make new predictions with every new datum, and the story they tell, the picture they paint, will be different each time.</p>

<p>And perhaps then, the data smith becomes a master, an artisan.</p>


<!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist">
  <dc:title>Data Artisan Smith or Scientist</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-06-30 18:39:11</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->

<p>
<ins>PS: Here's a list of links to that Twitter conversation among some of the most respected people in the biz, on Data Analytics</ins><br />
<ol>
  <li><a href="https://twitter.com/NeilRaden/status/15512935981">https://twitter.com/NeilRaden/status/15512935981</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513225191">https://twitter.com/NeilRaden/status/15513225191</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513275261">https://twitter.com/NeilRaden/status/15513275261</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513453916">https://twitter.com/NeilRaden/status/15513453916</a></li>
  <li><a href="https://twitter.com/datachick/status/15513460384">https://twitter.com/datachick/status/15513460384</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513488053">https://twitter.com/NeilRaden/status/15513488053</a></li>
  <li><a href="https://twitter.com/datachick/status/15513677836">https://twitter.com/datachick/status/15513677836</a></li>
  <li><a href="https://twitter.com/CMastication/status/15513772446">https://twitter.com/CMastication/status/15513772446</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513821393">https://twitter.com/NeilRaden/status/15513821393</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513854916">https://twitter.com/NeilRaden/status/15513854916</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15513915694">https://twitter.com/NeilRaden/status/15513915694</a></li>
  <li><a href="https://twitter.com/alecsharp/status/15513980301">https://twitter.com/alecsharp/status/15513980301</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15514104372">https://twitter.com/NeilRaden/status/15514104372</a></li>
  <li><a href="https://twitter.com/alecsharp/status/15514097194">https://twitter.com/alecsharp/status/15514097194</a></li>
  <li><a href="https://twitter.com/CMastication/status/15514374095">https://twitter.com/CMastication/status/15514374095</a></li>
  <li><a href="https://twitter.com/estrenuo/status/15514634644">https://twitter.com/estrenuo/status/15514634644</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15515243453">https://twitter.com/NeilRaden/status/15515243453</a></li>
  <li><a href="https://twitter.com/CMastication/status/15516185085">https://twitter.com/CMastication/status/15516185085</a></li>
  <li><a href="https://twitter.com/annmariastat/status/15516321715">https://twitter.com/annmariastat/status/15516321715</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15519544709">https://twitter.com/NeilRaden/status/15519544709</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15519597061">https://twitter.com/NeilRaden/status/15519597061</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15519621974">https://twitter.com/NeilRaden/status/15519621974</a></li>
  <li><a href="https://twitter.com/skemsley/status/15519932631">https://twitter.com/skemsley/status/15519932631</a></li>
  <li><a href="https://twitter.com/aristippus303/status/15520146540">https://twitter.com/aristippus303/status/15520146540</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15520478566">https://twitter.com/NeilRaden/status/15520478566</a></li>
  <li><a href="https://twitter.com/SethGrimes/status/15520765766">https://twitter.com/SethGrimes/status/15520765766</a></li>
  <li><a href="https://twitter.com/SethGrimes/status/15520851678">https://twitter.com/SethGrimes/status/15520851678</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15521050387">https://twitter.com/NeilRaden/status/15521050387</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15521106901">https://twitter.com/NeilRaden/status/15521106901</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15521133647">https://twitter.com/NeilRaden/status/15521133647</a></li>
  <li><a href="https://twitter.com/NeilRaden/status/15521192977">https://twitter.com/NeilRaden/status/15521192977</a></li>
  <li><a href="https://twitter.com/SethGrimes/status/15521579977">https://twitter.com/SethGrimes/status/15521579977</a></li>
  <li><a href="https://twitter.com/ryanprociuk/status/15521637974">https://twitter.com/ryanprociuk/status/15521637974</a></li>
</ol>
</p><div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/06/30/data-artisan-smith-or-scientist#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=947</wfw:commentRss>
		</item>
				<item>
			<title>Technology for the OSS DSS Study Guide</title>
			<link>http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide</link>
			<pubDate>Tue, 15 Jun 2010 23:56:38 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">Business</category>
<category domain="alt">Computers and Internet</category>
<category domain="alt">Open Source</category>
<category domain="alt">software</category>
<category domain="alt">Business Intelligence</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">937@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;'Tis been longer than intended, but we finally have the technology, time and resources to continue with our Open Source Solutions Decision Support System Study Guide (OSS DSS SG).&lt;/p&gt;
&lt;p&gt;First, I want to thank &lt;a title=&quot;SQLstream Website&quot; href=&quot;http://sqlstream.com/&quot; target=&quot;_blank&quot;&gt;SQLstream&lt;/a&gt; for allowing us to use SQLstream as a part of our solution.  As mentioned in our &quot;&lt;a title=&quot;InterASC First Post in DSS Study Guide&quot; href=&quot;/oss/2010/03/15/first-dss-study-guide&quot; target=&quot;_self&quot;&gt;First DSS Study Guide&lt;/a&gt;&quot; post, we were hoping to add a real-time component to our DSS.  SQLstream is not open source, and not readily available for download.  &lt;a title=&quot;SQLstream and Eigenbase&quot; href=&quot;http://sqlstream.com/Products/eigenbase.htm&quot; target=&quot;_blank&quot;&gt;It is, however, a co-founder and core contributor to the open source Eigenbase Project, and has incorporated Eigenbase technology into its product&lt;/a&gt;.  So, what is SQLstream?  To quote their web site, &quot;&lt;a title=&quot;SQLstream and Decision Support&quot; href=&quot;http://sqlstream.com/Products/products.htm&quot; target=&quot;_blank&quot;&gt;SQLstream enables executives to make strategic decisions based on current data, in flight, from multiple, diverse sources&lt;/a&gt;&quot;.  And that is why we are so interested in having SQLstream as a part of our DSS technology stack: to have the capability to capture and manipulate data as it is being generated.&lt;/p&gt;
&lt;p&gt;Today, there are two very important classes of technologies that should belong to any DSS: data warehousing (DW) and business intelligence (BI).  What actually comprises these technologies is still a matter of debate.  To me, they are quite interrelated and provide the following capabilities.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The means of getting data from one or more sources to one or more target storage &amp;amp; analysis systems.  Regardless of the details for the source(s) and the target(s), the traditional means in data warehousing is Extract from the source(s), Transform for consistency &amp;amp; correctness, and Load into the target(s), that is, ETL.  Other means, such as using data services within a service-oriented architecture (SOA), using either provider-consumer contracts &amp;amp; Web Services Description Language (WSDL) or representational state transfer (ReST), are also possible.&lt;/li&gt;
&lt;li&gt;Active storage over the long term of historic and near-current data.  That is, active storage as opposed to static storage, such as a tape archive.  This storage should be optimized for reporting and analysis through both its logical and physical data models, and through the database architecture and technologies implemented.  Today we're seeing an amazing surge of data storage and management innovation, with column-store relational database management systems (RDBMS), map-reduce (M-R), key-value stores (KVS) and more, especially hybrids of old and new technologies.  The innovation is coming so thick and fast that the terminology is even more confused than in the rest of the BI world.  NoSQL has become a popular term for all non-RDBMS, and even some RDBMS like column-store.  But even here, what once meant No Structured Query Language now is often defined as Not only Structured Query Language, as if SQL were the only way to create an RDBMS (can someone say &lt;a title=&quot;Progress Software Website&quot; href=&quot;http://web.progress.com/en/index.html&quot; target=&quot;_blank&quot;&gt;Progress&lt;/a&gt; and its proprietary 4GL?).&lt;/li&gt;
&lt;li&gt;Tools for reporting, including gathering the data, performing calculations, graphing (or, perhaps more accurately, charting), formatting and disseminating.&lt;/li&gt;
&lt;li&gt;Online Analytical Processing (OLAP), also known as &quot;slice and dice&quot;, generally allowing forms of multi-dimensional or pivot analysis.  Simply put, there are three underlying concepts for OLAP: the cube (a.k.a. hypercube, multi-dimensional database [MDDB] or OLAP engine), the measures (facts) &amp;amp; dimensions, and aggregation.  OLAP provides much more flexibility than reporting, though the two often work hand-in-hand, especially for ad-hoc reporting and analysis.&lt;/li&gt;
&lt;li&gt;Data Mining, including machine learning and the ability to discover correlations among disparate data sets.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For our purposes, an important question is whether or not there are open source, or at least open source based, solutions for all of these capabilities.  The answer is yes.  As a matter of fact, there are three complete open source BI Suites [there were four, but the first, the Bee Project from the Czech Republic, written in Perl, is no longer being updated].  Here's a brief overview of SpagoBI, JasperSoft, and Pentaho.&lt;/p&gt;
&lt;p&gt;
&lt;table id=&quot;OSBI Suites&quot; style=&quot;border: 10px solid #003399;&quot; border=&quot;10&quot; cellspacing=&quot;2&quot; cellpadding=&quot;2&quot; align=&quot;center&quot; summary=&quot;Open Source Projects Providing Capabilities for SpagoBI, JasperSoft and Pentaho&quot;&gt;
&lt;caption&gt;&lt;/caption&gt; 
&lt;tbody&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;Capability&lt;/th&gt; &lt;th align=&quot;center&quot;&gt;SpagoBI&lt;/th&gt; &lt;th align=&quot;center&quot;&gt;JasperSoft&lt;/th&gt; &lt;th align=&quot;center&quot;&gt;Pentaho&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ETL&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Talend&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Talend&lt;br /&gt;JasperETL&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;KETTLE&lt;br /&gt;PDI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Included&lt;br /&gt;DBMS&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;HSQLDB&lt;br /&gt;MySQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Reporting&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;BIRT&lt;br /&gt;JasperReports&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;JasperReports&lt;br /&gt;iReport&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;JFreeReport&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Analyzer&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;jPivot&lt;br /&gt;&lt;ins&gt;PaloPivot&lt;/ins&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;JasperServer&lt;br /&gt;JasperAnalysis&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;jPivot&lt;br /&gt;PAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;OLAP&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Mondrian&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Mondrian&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Mondrian&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Data Mining&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Weka&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;None&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Weka&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/p&gt;
&lt;p&gt;We'll be using &lt;a title=&quot;Pentaho Community BI Suite&quot; href=&quot;http://community.pentaho.com/projects/bi_platform/&quot; target=&quot;_blank&quot;&gt;Pentaho&lt;/a&gt;, but you can use any of these, or any combination of the OSS projects that are used by these BI Suites, or pick and choose from the more than 60 projects in our OSS Linkblog, as shown in the sidebar to this blog.  All of the OSS BI Suites have many more features than shown in the simple table above.  For example, &lt;a title=&quot;SpagoBI Listing of Tools&quot; href=&quot;http://www.spagoworld.org/xwiki/bin/view/SpagoBI/TheSuite&quot; target=&quot;_blank&quot;&gt;SpagoBI&lt;/a&gt; has good tools for geographic &amp;amp; location services.  Also, &lt;a title=&quot;JasperSoft Feature Comparison&quot; href=&quot;http://www.jaspersoft.com/jaspersoft-business-intelligence-software-trial&quot; target=&quot;_blank&quot;&gt;JasperSoft Professional and Enterprise Editions have many more features than their Community Edition&lt;/a&gt;, such as Ad Hoc Reporting and Dashboards.  Pentaho has a different Analyzer in their Enterprise Edition than either jPivot or PAT, &lt;a title=&quot;Pentaho Announces Strategic Technology Acquisition&quot; href=&quot;http://www.pentaho.com/news/releases/20091005_pentaho_announces_strategic_technology_acquisition.php&quot; target=&quot;_blank&quot;&gt;Pentaho Analyzer, based upon the SaaS ClearView&lt;/a&gt; from the now-defunct LucidEra, as well as ease-of-use tools such as an OLAP sch&amp;#230;ma designer, and enterprise-class security and administration tools.&lt;/p&gt;
&lt;p&gt;Data warehousing using general purpose RDBMS such as Oracle, EnterpriseDB, PostgreSQL or MySQL is gradually giving way to analytic database management systems (ADBMS), or, as we mentioned above, the catch-all NoSQL data storage systems, or even hybrid systems.  For example, Oracle recently introduced hybrid column-row store features, and Aster Data has a &lt;del&gt;column-store&lt;/del&gt; Massive Parallel Processing (MPP) DBMS|map-reduce hybrid &lt;ins&gt;[updated 20100616 per comment from &lt;a title=&quot;Seth Grimes OnLine Business Card&quot; href=&quot;http://sethgrimes.com/&quot; target=&quot;_blank&quot;&gt;Seth Grimes&lt;/a&gt;]&lt;/ins&gt;.  Pentaho supports Hadoop, as well as traditional general purpose RDBMS and column-store ADBMS.  In the open source world, there are two columnar storage engines for MySQL, Infobright and Calpont InfiniDB, as well as one column-store ADBMS purpose built for BI, LucidDB.  We'll be using LucidDB, and just for fun, may throw some data into Hadoop.&lt;/p&gt;
&lt;p&gt;In addition, a modern DSS needs two more primary capabilities: predictives, sometimes called predictive intelligence or predictive analytics (PA), which is the ability to go beyond inference and trend analysis by assigning a probability, with associated confidence, to an event occurring in the future; and full statistical analysis, which includes determining the probability density or distribution function that best describes the data.  Of course, there are OSS projects for these as well, such as The R Project, the Apache Commons Math library, and various GNU projects that can be found in our Linkblog.&lt;/p&gt;
&lt;p&gt;For statistical analysis and predictives, we'll be using the open source R statistical language and the open standard Predictive Model Markup Language (PMML), both of which are also supported by Pentaho.&lt;/p&gt;
&lt;p&gt;We have all of these OSS projects installed on a Red Hat Enterprise Linux machine.  The trick will be to get them all working together.  The magic will be in modeling and analyzing the data to support good decisions.  There are several areas of decision making that we're considering as examples.  One is fairly prosaic, one is very interesting and far-reaching, and the others are somewhat in between.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A fairly simple example would be to take our blog statistics, a real-time stream using SQLstream's Twitter API, and run experiments to determine whether or not, and possibly how, Twitter affects traffic to and interaction with our blogs.  Possibly, we could get to the point where we can predict how our use of Twitter will affect our blog.&lt;/li&gt;
&lt;li&gt;A much more &lt;a title=&quot;BP Gulf Oil Spill Analytics&quot; href=&quot;http://twitter.com/KenWinnick/status/15905499436&quot; target=&quot;_blank&quot;&gt;far-reaching idea was presented by Ken Winnick to me, via Twitter&lt;/a&gt;, and has created an on-going Twitter conversation and hashtag, #BPgulfDB.  Let's take crowdsourced, government, and other publicly available data about the recent oil spill in the Gulf of Mexico, and analyze it.&lt;/li&gt;
&lt;li&gt;Another idea is to take historical home utility usage plus current smart meter usage data, and create a real-time dashboard, and even predictives, for reducing and managing energy usage.&lt;/li&gt;
&lt;li&gt;We also have the opportunity of using public data to enhance reporting and analytics for small, rural and research hospitals.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;!-- Creative Commons License --&gt; &lt;div class=&quot;cc_license&quot;&gt;&lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;&lt;img src=&quot;http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png&quot; alt=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot; /&gt;&lt;/a&gt;Except where otherwise noted, this content is&lt;br /&gt;licensed under a &lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;Creative Commons License&lt;/a&gt;.&lt;/div&gt; &lt;!-- /Creative Commons License --&gt;
&lt;!--
&lt;rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot;
    xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;Work rdf:about=&quot;http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide&quot;&gt;
  &lt;dc:title&gt;Technology for the OSS DSS Study Guide&lt;/dc:title&gt;
  &lt;dc:creator&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:creator&gt;
  &lt;dc:rights&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:rights&gt;
  &lt;dc:date&gt;2010-06-11 16:56:48&lt;/dc:date&gt;
  &lt;license rdf:resource=&quot;by-nc-sa/3.0/&quot; /&gt;
&lt;/Work&gt;
&lt;License rdf:about=&quot;by-nc-sa/3.0/&quot;&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/DerivativeWorks&quot; /&gt;
  &lt;prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Attribution&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/ShareAlike&quot; /&gt;
&lt;/License&gt;
&lt;/rdf:RDF&gt;
--&gt;
&lt;/p&gt;&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>'Tis been longer than intended, but we finally have the technology, time and resources to continue with our Open Source Solutions Decision Support System Study Guide (OSS DSS SG).</p>
<p>First, I want to thank <a title="SQLstream Website" href="http://sqlstream.com/" target="_blank">SQLstream</a> for allowing us to use SQLstream as a part of our solution.  As mentioned in our "<a title="InterASC First Post in DSS Study Guide" href="http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide" target="_self">First DSS Study Guide</a>" post, we were hoping to add a real-time component to our DSS.  SQLstream is not open source, and not readily available for download.  <a title="SQLstream and Eigenbase" href="http://sqlstream.com/Products/eigenbase.htm" target="_blank">It is, however, a co-founder and core contributor to the open source Eigenbase Project, and has incorporated Eigenbase technology into its product</a>.  So, what is SQLstream?  To quote their web site, "<a title="SQLstream and Decision Support" href="http://sqlstream.com/Products/products.htm" target="_blank">SQLstream enables executives to make strategic decisions based on current data, in flight, from multiple, diverse sources</a>".  And that is why we are so interested in having SQLstream as a part of our DSS technology stack: to have the capability to capture and manipulate data as it is being generated.</p>
<p>Today, there are two very important classes of technologies that should belong to any DSS: data warehousing (DW) and business intelligence (BI).  What actually comprises these technologies is still a matter of debate.  To me, they are quite interrelated and provide the following capabilities.</p>
<ul>
<li>The means of getting data from one or more sources to one or more target storage &amp; analysis systems.  Regardless of the details for the source(s) and the target(s), the traditional means in data warehousing is Extract from the source(s), Transform for consistency &amp; correctness, and Load into the target(s), that is, ETL.  Other means, such as using data services within a service-oriented architecture (SOA), using either provider-consumer contracts &amp; Web Services Description Language (WSDL) or representational state transfer (ReST), are also possible.</li>
<li>Active storage over the long term of historic and near-current data.  That is, active storage as opposed to static storage, such as a tape archive.  This storage should be optimized for reporting and analysis through both its logical and physical data models, and through the database architecture and technologies implemented.  Today we're seeing an amazing surge of data storage and management innovation, with column-store relational database management systems (RDBMS), map-reduce (M-R), key-value stores (KVS) and more, especially hybrids of old and new technologies.  The innovation is coming so thick and fast that the terminology is even more confused than in the rest of the BI world.  NoSQL has become a popular term for all non-RDBMS, and even some RDBMS like column-store.  But even here, what once meant No Structured Query Language now is often defined as Not only Structured Query Language, as if SQL were the only way to create an RDBMS (can someone say <a title="Progress Software Website" href="http://web.progress.com/en/index.html" target="_blank">Progress</a> and its proprietary 4GL?).</li>
<li>Tools for reporting, including gathering the data, performing calculations, graphing (or, perhaps more accurately, charting), formatting and disseminating.</li>
<li>Online Analytical Processing (OLAP), also known as "slice and dice", generally allowing forms of multi-dimensional or pivot analysis.  Simply put, there are three underlying concepts for OLAP: the cube (a.k.a. hypercube, multi-dimensional database [MDDB] or OLAP engine), the measures (facts) &amp; dimensions, and aggregation.  OLAP provides much more flexibility than reporting, though the two often work hand-in-hand, especially for ad-hoc reporting and analysis.</li>
<li>Data Mining, including machine learning and the ability to discover correlations among disparate data sets.</li>
</ul>
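<p>To make the ETL and OLAP ideas above a little more concrete, here is a minimal sketch in Python, using only the standard library.  It is not how Talend, KETTLE or Mondrian work internally; the sample data, the table name fact_views and the function names are all hypothetical, invented purely for illustration.  It extracts rows from a small CSV source, transforms them for consistency, loads them into an in-memory SQLite fact table, and then aggregates one measure along one dimension at a time.</p>

```python
import csv
import io
import sqlite3

# Hypothetical source data, invented for illustration (not from this blog).
RAW = """day,referrer,views
2010-06-01,Twitter,12
2010-06-01, search ,30
2010-06-02,twitter,8
2010-06-02,Search,25
"""


def extract(text):
    """Extract: read the rows of the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and lower-case the referrer, cast the measure to int."""
    return [(r["day"], r["referrer"].strip().lower(), int(r["views"])) for r in rows]


def load(conn, rows):
    """Load: write the cleaned rows into a (hypothetical) fact table."""
    conn.execute("CREATE TABLE fact_views (day TEXT, referrer TEXT, views INTEGER)")
    conn.executemany("INSERT INTO fact_views VALUES (?, ?, ?)", rows)


def views_by(conn, dimension):
    """Aggregate the 'views' measure along a single dimension."""
    if dimension not in ("day", "referrer"):
        raise ValueError("unknown dimension")
    sql = (
        f"SELECT {dimension}, SUM(views) FROM fact_views "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
by_referrer = views_by(conn, "referrer")
by_day = views_by(conn, "day")
print(by_referrer)
print(by_day)
```

<p>A real OLAP engine pre-computes or caches such aggregates across many dimensions at once; the GROUP BY above only hints at the measure, dimension and aggregation concepts.</p>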
<p>For our purposes, an important question is whether or not there are open source, or at least open source based, solutions for all of these capabilities.  The answer is yes.  As a matter of fact, there are three complete open source BI Suites [there were four, but the first, the Bee Project from the Czech Republic, written in Perl, is no longer being updated].  Here's a brief overview of SpagoBI, JasperSoft, and Pentaho.</p>
<p>
<table id="OSBI Suites" style="border: 10px solid #003399;" border="10" cellspacing="2" cellpadding="2" align="center" summary="Open Source Projects Providing Capabilities for SpagoBI, JasperSoft and Pentaho">
<caption></caption> 
<tbody>
<tr>
<th align="center">Capability</th> <th align="center">SpagoBI</th> <th align="center">JasperSoft</th> <th align="center">Pentaho</th>
</tr>
<tr>
<td align="center">ETL</td>
<td align="center">Talend</td>
<td align="center">Talend<br />JasperETL</td>
<td align="center">KETTLE<br />PDI</td>
</tr>
<tr>
<td align="center">Included<br />DBMS</td>
<td align="center"></td>
<td align="center"></td>
<td align="center">HSQLDB<br />MySQL</td>
</tr>
<tr>
<td align="center">Reporting</td>
<td align="center">BIRT<br />JasperReports</td>
<td align="center">JasperReports<br />iReport</td>
<td align="center">JFreeReport</td>
</tr>
<tr>
<td align="center">Analyzer</td>
<td align="center">jPivot<br /><ins>PaloPivot</ins></td>
<td align="center">JasperServer<br />JasperAnalysis</td>
<td align="center">jPivot<br />PAT</td>
</tr>
<tr>
<td align="center">OLAP</td>
<td align="center">Mondrian</td>
<td align="center">Mondrian</td>
<td align="center">Mondrian</td>
</tr>
<tr>
<td align="center">Data Mining</td>
<td align="center">Weka</td>
<td align="center">None</td>
<td align="center">Weka</td>
</tr>
</tbody>
</table>
</p>
<p>We'll be using <a title="Pentaho Community BI Suite" href="http://community.pentaho.com/projects/bi_platform/" target="_blank">Pentaho</a>, but you can use any of these, or any combination of the OSS projects that are used by these BI Suites, or pick and choose from the more than 60 projects in our OSS Linkblog, as shown in the sidebar to this blog.  All of the OSS BI Suites have many more features than shown in the simple table above.  For example, <a title="SpagoBI Listing of Tools" href="http://www.spagoworld.org/xwiki/bin/view/SpagoBI/TheSuite" target="_blank">SpagoBI</a> has good tools for geographic &amp; location services.  Also, <a title="JasperSoft Feature Comparison" href="http://www.jaspersoft.com/jaspersoft-business-intelligence-software-trial" target="_blank">JasperSoft Professional and Enterprise Editions have many more features than their Community Edition</a>, such as Ad Hoc Reporting and Dashboards.  Pentaho has a different Analyzer in their Enterprise Edition than either jPivot or PAT, <a title="Pentaho Announces Strategic Technology Acquisition" href="http://www.pentaho.com/news/releases/20091005_pentaho_announces_strategic_technology_acquisition.php" target="_blank">Pentaho Analyzer, based upon the SaaS ClearView</a> from the now-defunct LucidEra, as well as ease-of-use tools such as an OLAP sch&#230;ma designer, and enterprise-class security and administration tools.</p>
<p>Data warehousing using general purpose RDBMS such as Oracle, EnterpriseDB, PostgreSQL or MySQL is gradually giving way to analytic database management systems (ADBMS), or, as we mentioned above, the catch-all NoSQL data storage systems, or even hybrid systems.  For example, Oracle recently introduced hybrid column-row store features, and Aster Data has a <del>column-store</del> Massive Parallel Processing (MPP) DBMS|map-reduce hybrid <ins>[updated 20100616 per comment from <a title="Seth Grimes OnLine Business Card" href="http://sethgrimes.com/" target="_blank">Seth Grimes</a>]</ins>.  Pentaho supports Hadoop, as well as traditional general purpose RDBMS and column-store ADBMS.  In the open source world, there are two columnar storage engines for MySQL, Infobright and Calpont InfiniDB, as well as one column-store ADBMS purpose built for BI, LucidDB.  We'll be using LucidDB, and just for fun, may throw some data into Hadoop.</p>
<p>In addition, a modern DSS needs two more primary capabilities: predictives, sometimes called predictive intelligence or predictive analytics (PA), which is the ability to go beyond inference and trend analysis by assigning a probability, with associated confidence, to an event occurring in the future; and full statistical analysis, which includes determining the probability density or distribution function that best describes the data.  Of course, there are OSS projects for these as well, such as The R Project, the Apache Commons Math library, and various GNU projects that can be found in our Linkblog.</p>
<p>For statistical analysis and predictives, we'll be using the open source R statistical language and the open standard Predictive Model Markup Language (PMML), both of which are also supported by Pentaho.</p>
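<p>As a rough illustration of "determining the distribution function that best describes the data", here is a small sketch in Python rather than R, using only the standard library.  It is an assumption-laden toy, not our actual method: the sample values are made up, only two candidate distributions (normal and exponential) are compared, and the comparison is by maximum log-likelihood alone, with no goodness-of-fit test or confidence estimate.</p>

```python
import math
import statistics

# Hypothetical measurements, invented for illustration (all values positive).
sample = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]


def normal_loglik(data):
    """Log-likelihood of the data under a normal distribution fitted by MLE."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # the MLE uses the population std deviation
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )


def exponential_loglik(data):
    """Log-likelihood under an exponential distribution fitted by MLE."""
    rate = 1.0 / statistics.fmean(data)  # MLE of the rate is 1 / mean
    return sum(math.log(rate) - rate * x for x in data)


def best_fit(data):
    """Return the name of the higher-likelihood candidate and all scores."""
    scores = {
        "normal": normal_loglik(data),
        "exponential": exponential_loglik(data),
    }
    return max(scores, key=scores.get), scores


winner, scores = best_fit(sample)
print(winner, scores)
```

<p>With real data one would compare more candidate families and penalize extra parameters (for example with AIC), which is exactly the kind of work R packages make convenient.</p>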
<p>We have all of these OSS projects installed on a Red Hat Enterprise Linux machine.  The trick will be to get them all working together.  The magic will be in modeling and analyzing the data to support good decisions.  There are several areas of decision making that we're considering as examples.  One is fairly prosaic, one is very interesting and far-reaching, and the others are somewhat in between.</p>
<ol>
<li>A fairly simple example would be to take our blog statistics, a real-time stream using SQLstream's Twitter API, and run experiments to determine whether or not, and possibly how, Twitter affects traffic to and interaction with our blogs.  Possibly, we could get to the point where we can predict how our use of Twitter will affect our blog.</li>
<li>A much more <a title="BP Gulf Oil Spill Analytics" href="http://twitter.com/KenWinnick/status/15905499436" target="_blank">far-reaching idea was presented by Ken Winnick to me, via Twitter</a>, and has created an on-going Twitter conversation and hashtag, #BPgulfDB.  Let's take crowdsourced, government, and other publicly available data about the recent oil spill in the Gulf of Mexico, and analyze it.</li>
<li>Another idea is to take historical home utility usage plus current smart meter usage data, and create a real-time dashboard, and even predictives, for reducing and managing energy usage.</li>
<li>We also have the opportunity of using public data to enhance reporting and analytics for small, rural and research hospitals.</li>
</ol>
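<p>For the first idea in the list above, a first-pass look at whether Twitter activity and blog traffic move together might be sketched as follows.  This is only a sketch under stated assumptions: the daily figures are hypothetical, the variable and function names are ours for illustration, and a Pearson correlation says nothing about cause and effect, which is why experiments would still be needed.</p>

```python
import math

# Hypothetical daily figures, invented for illustration (not real blog data).
tweets_per_day = [0, 2, 5, 3, 8, 1]
visits_per_day = [110, 130, 170, 150, 210, 120]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(tweets_per_day, visits_per_day)
print(f"correlation between tweets and visits: {r:.3f}")
```

<p>A value of r near 1 on real data would justify the more careful experiments and predictive modeling described above; a value near 0 would suggest looking elsewhere.</p>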
<p><!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide">
  <dc:title>Technology for the OSS DSS Study Guide</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-06-11 16:56:48</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->
</p><div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/06/15/technology-for-the-oss-dss-study-guide#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=937</wfw:commentRss>
		</item>
				<item>
			<title>OSS DSS Formalization</title>
			<link>http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization</link>
			<pubDate>Wed, 28 Apr 2010 19:26:17 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">Business</category>
<category domain="alt">Computers and Internet</category>
<category domain="alt">Open Source</category>
<category domain="alt">software</category>
<category domain="alt">technology</category>
<category domain="alt">Business Intelligence</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">936@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;The next step in our open source solutions (OSS) for decision support systems (DSS) study guide (SG), according to the syllabus, is to make our first decision: a formal definition of &quot;Decision Support System&quot;.  Next, and soon, will be a post listing the technologies that will contribute to our studies.&lt;/p&gt;

&lt;p&gt;The first stop in looking for a definition of anything today is &lt;a href=&quot;http://en.wikipedia.org/wiki/Main_Page&quot;&gt;Wikipedia&lt;/a&gt;.  And indeed, Wikipedia does have a nice &lt;a href=&quot;http://en.wikipedia.org/wiki/Decision_support_system&quot;&gt;article on DSS&lt;/a&gt;.  One of the things that I find most informative about Wikipedia articles is the &quot;Talk&quot; page for an article.  The &lt;a href=&quot;http://en.wikipedia.org/wiki/Talk:Decision_support_system&quot;&gt;DSS discussion&lt;/a&gt; is rather mild, though, with none of the ongoing debate found on some other talk pages, such as the &lt;a href=&quot;http://en.wikipedia.org/wiki/Talk:Business_intelligence&quot;&gt;discussion about Business Intelligence&lt;/a&gt;.  The talk pages also change more often than the articles themselves, and provide insight into the thoughts that go into the main article.&lt;/p&gt;

&lt;p&gt;And of course, the second stop is a &lt;a href=&quot;http://www.google.com/search?q=decision+support+system&quot;&gt;Google search for Decision Support System&lt;/a&gt;; a &lt;a href=&quot;http://www.google.com/search?q=DSS&quot;&gt;search on DSS&lt;/a&gt; is not nearly as fruitful for our purposes.  &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_smile.gif&quot; alt=&quot;&amp;#58;&amp;#41;&quot; class=&quot;middle&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Once upon a time, we might have gone to a &lt;a href=&quot;http://www.librarytechnology.org/libwebcats/&quot;&gt;library&lt;/a&gt; and thumbed through the card catalog to find some books on Decision Support Systems.  A more popular approach today would be to &lt;a href=&quot;http://www.amazon.com/gp/redirect.html?ie=UTF8&amp;amp;location=http%3A%2F%2Fwww.amazon.com%2Fs%3Fie%3DUTF8%26x%3D0%26ref_%3Dnb%5Fsb%5Fss%5Fi%5F3%5F11%26y%3D0%26field-keywords%3Ddecision%2520support%26url%3Dsearch-alias%253Dstripbooks%26sprefix%3DDecision%2520su&amp;amp;tag=intersystecon-20&amp;amp;linkCode=ur2&amp;amp;camp=1789&amp;amp;creative=390957&quot;&gt;search Amazon for Decision Support&lt;/a&gt; books.  There are several books in my library that you might find interesting for different reasons:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0470484322?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470484322&quot;&gt;Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL&lt;/a&gt; by Roland Bouman &amp;amp; Jos van Dongen provides a very good overview of data warehousing, business intelligence and data mining, all key components to a DSS, and does so within the context of the open source Pentaho suite&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0132347962?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0132347962&quot;&gt;Smart Enough Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions&lt;/a&gt; by James Taylor &amp;amp; Neil Raden introduces business concepts for truly managing information and using decision support systems, as well as being a primer on data warehousing and business intelligence, but goes beyond this by automating the data flow and decision making processes&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/0201784203?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0201784203&quot;&gt;Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications&lt;/a&gt; by Larissa T. Moss &amp;amp; Shaku Atre takes a business, program and project management approach to implementing DSS within a company, introducing fundamental concepts at a clear, though simplistic, level&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/1422103323?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1422103323&quot;&gt;Competing on Analytics: The New Science of Winning&lt;/a&gt; by Thomas H. Davenport &amp;amp; Jeanne G. Harris in many ways goes into the next generation of decision support by showing how data, statistical and quantitative analysis, applied within context-specific processes, give businesses a strong lead over their competition, although it does so at a very simplistic, formulaic level&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These books range from being technology focused to being general business books, but they all provide insight into how various components of DSS fit into a business, and different approaches to implementing them.  None of them actually provides a complete DSS, and only the first focuses on OSS.  If you followed the Amazon search link given previously, you might also have noticed that there are books that show Excel as a DSS, and there is a preponderance of books that focus on the biomedical/pharmaceutical/healthcare industry.  Another focus area is the use of geographic information systems (actually one of the first uses for multi-dimensional databases) for decision support.  There are several books in this search that look good, but haven't made it into my library yet.  I would love to hear your recommendations (perhaps in the comments).&lt;/p&gt;

&lt;p&gt;From all of this, and our experiences in implementing various DW, BI and DSS programs, I'm going to give a definition of DSS.  From a previous post in this DSS SG, we have the following:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.&lt;br /&gt;
-- &lt;a href=&quot;http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality&quot;&gt;Questions and Commonality&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;As we stated, this is vague and generic.  Now that we've done some reading, let's see if we can do better.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;A DSS assists an individual in reaching the best possible conclusion, resolution or course of action in stand-alone, iterative or interdependent situations, by using historical and current structured and unstructured data, collaboration with colleagues, and personal knowledge to predict the outcome or infer the consequences.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I like that definition, but your comments will help to refine it.&lt;/p&gt;

&lt;p&gt;Note that we make no mention of specific processes, nor any technology whatsoever.  It reflects my bias that decisions are made by individuals, not groups (electoral systems notwithstanding).  To be true to our &quot;TeleInterActive Lifestyle&quot; &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_wink.gif&quot; alt=&quot;&amp;#59;&amp;#41;&quot; class=&quot;middle&quot; /&gt; I should point out that the DSS must be available when and where the individual needs to make the decision.&lt;/p&gt;

&lt;p&gt;Any comments?&lt;/p&gt;

&lt;!-- Creative Commons License --&gt; &lt;div class=&quot;cc_license&quot;&gt;&lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;&lt;img src=&quot;http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png&quot; alt=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot; /&gt;&lt;/a&gt;Except where otherwise noted, this content is&lt;br /&gt;licensed under a &lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;Creative Commons License&lt;/a&gt;.&lt;/div&gt; &lt;!-- /Creative Commons License --&gt;
&lt;!--
&lt;rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot;
    xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;Work rdf:about=&quot;http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization&quot;&gt;
  &lt;dc:title&gt;OSS DSS Formalization&lt;/dc:title&gt;
  &lt;dc:creator&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:creator&gt;
  &lt;dc:rights&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:rights&gt;
  &lt;dc:date&gt;2010-04-28 15:16:55&lt;/dc:date&gt;
  &lt;license rdf:resource=&quot;by-nc-sa/3.0/&quot; /&gt;
&lt;/Work&gt;
&lt;License rdf:about=&quot;by-nc-sa/3.0/&quot;&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/DerivativeWorks&quot; /&gt;
  &lt;prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Attribution&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/ShareAlike&quot; /&gt;
&lt;/License&gt;
&lt;/rdf:RDF&gt;
--&gt;
&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>The next step in our open source solutions (OSS) for decision support systems (DSS) study guide (SG), according to the syllabus, is to make our first decision: a formal definition of "Decision Support System".  Next, and soon, will be a post listing the technologies that will contribute to our studies.</p>

<p>The first stop in looking for a definition of anything today is <a href="http://en.wikipedia.org/wiki/Main_Page">Wikipedia</a>.  And indeed, Wikipedia does have a nice <a href="http://en.wikipedia.org/wiki/Decision_support_system">article on DSS</a>.  One of the things that I find most informative about Wikipedia articles is the "Talk" page for an article.  The <a href="http://en.wikipedia.org/wiki/Talk:Decision_support_system">DSS discussion</a> is rather mild, though, with none of the ongoing debate found on some other talk pages, such as the <a href="http://en.wikipedia.org/wiki/Talk:Business_intelligence">discussion about Business Intelligence</a>.  The talk pages also change more often than the articles themselves, and provide insight into the thoughts that go into the main article.</p>

<p>And of course, the second stop is a <a href="http://www.google.com/search?q=decision+support+system">Google search for Decision Support System</a>; a <a href="http://www.google.com/search?q=DSS">search on DSS</a> is not nearly as fruitful for our purposes.  <img src="http://press.teleinteractive.net/rsc/smilies/icon_smile.gif" alt="&#58;&#41;" class="middle" /></p>

<p>Once upon a time, we might have gone to a <a href="http://www.librarytechnology.org/libwebcats/">library</a> and thumbed through the card catalog to find some books on Decision Support Systems.  A more popular approach today would be to <a href="http://www.amazon.com/gp/redirect.html?ie=UTF8&amp;location=http%3A%2F%2Fwww.amazon.com%2Fs%3Fie%3DUTF8%26x%3D0%26ref_%3Dnb%5Fsb%5Fss%5Fi%5F3%5F11%26y%3D0%26field-keywords%3Ddecision%2520support%26url%3Dsearch-alias%253Dstripbooks%26sprefix%3DDecision%2520su&amp;tag=intersystecon-20&amp;linkCode=ur2&amp;camp=1789&amp;creative=390957">search Amazon for Decision Support</a> books.  There are several books in my library that you might find interesting for different reasons:</p>
<ol>
  <li><a href="http://www.amazon.com/gp/product/0470484322?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0470484322">Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL</a> by Roland Bouman &amp; Jos van Dongen provides a very good overview of data warehousing, business intelligence and data mining, all key components to a DSS, and does so within the context of the open source Pentaho suite</li>
  <li><a href="http://www.amazon.com/gp/product/0132347962?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0132347962">Smart Enough Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions</a> by James Taylor &amp; Neil Raden introduces business concepts for truly managing information and using decision support systems, as well as being a primer on data warehousing and business intelligence, but goes beyond this by automating the data flow and decision making processes</li>
<li><a href="http://www.amazon.com/gp/product/0201784203?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0201784203">Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications</a> by Larissa T. Moss &amp; Shaku Atre takes a business, program and project management approach to implementing DSS within a company, introducing fundamental concepts at a clear, though simplistic, level</li>
  <li><a href="http://www.amazon.com/gp/product/1422103323?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1422103323">Competing on Analytics: The New Science of Winning</a> by Thomas H. Davenport &amp; Jeanne G. Harris in many ways goes into the next generation of decision support by showing how data, statistical and quantitative analysis, applied within context-specific processes, give businesses a strong lead over their competition, although it does so at a very simplistic, formulaic level</li>
</ol>

<p>These books range from being technology focused to being general business books, but they all provide insight into how various components of DSS fit into a business, and different approaches to implementing them.  None of them actually provides a complete DSS, and only the first focuses on OSS.  If you followed the Amazon search link given previously, you might also have noticed that there are books that show Excel as a DSS, and there is a preponderance of books that focus on the biomedical/pharmaceutical/healthcare industry.  Another focus area is the use of geographic information systems (actually one of the first uses for multi-dimensional databases) for decision support.  There are several books in this search that look good, but haven't made it into my library yet.  I would love to hear your recommendations (perhaps in the comments).</p>

<p>From all of this, and our experiences in implementing various DW, BI and DSS programs, I'm going to give a definition of DSS.  From a previous post in this DSS SG, we have the following:</p>
<blockquote><p>A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.<br />
-- <a href="http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality">Questions and Commonality</a></p></blockquote>

<p>As we stated, this is vague and generic.  Now that we've done some reading, let's see if we can do better.</p>
<blockquote><p>A DSS assists an individual in reaching the best possible conclusion, resolution or course of action in stand-alone, iterative or interdependent situations, by using historical and current structured and unstructured data, collaboration with colleagues, and personal knowledge to predict the outcome or infer the consequences.</p></blockquote>

<p>I like that definition, but your comments will help to refine it.</p>

<p>Note that we make no mention of specific processes, nor any technology whatsoever.  It reflects my bias that decisions are made by individuals, not groups (electoral systems notwithstanding).  To be true to our "TeleInterActive Lifestyle" <img src="http://press.teleinteractive.net/rsc/smilies/icon_wink.gif" alt="&#59;&#41;" class="middle" /> I should point out that the DSS must be available when and where the individual needs to make the decision.</p>

<p>Any comments?</p>

<!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization">
  <dc:title>OSS DSS Formalization</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-04-28 15:16:55</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->
<div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/04/28/oss-dss-formalization#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=936</wfw:commentRss>
		</item>
				<item>
			<title>R the next Big Thing or Not</title>
			<link>http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not</link>
			<pubDate>Tue, 20 Apr 2010 05:16:28 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">Computers and Internet</category>
<category domain="alt">Predictions</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">932@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;Recently, AnnMaria De Mars, PhD (multiple) and Dr. Peter Flom, PhD have stirred up a bit of a tempest in a tweet-pot, as well as in the statistical blogosphere, with comparisons of R and SAS, IBM/SPSS and the like.  I've commented on both of their blogs, but decided to expand a bit here, as the choice of R is something that we planned to cover in a later post to our &lt;a href=&quot;/oss/DSS-SG/&quot;&gt;Open Source Solutions Decision Support Systems Study Guide&lt;/a&gt;.  First, let me say that Dr. De Mars and Dr. Flom appear to have posted completely independently of each other, and further, that their posts have different goals.&lt;/p&gt;
&lt;p&gt;In The Next Big Thing, Dr. De Mars is looking for the &lt;strong&gt;next big thing&lt;/strong&gt;, both to keep her own career on track, and to guide students into areas of study that will survive in the job market in the coming decades.  This is always difficult for mentors, as we can't always anticipate the &quot;black swan&quot; events that might change things drastically.  The tempestuous nature of her post came from one little sentence:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Contrary to what  some people seem to think, R is definitely not the next big thing, either. -- &lt;a title=&quot;About AnnMaria De Mars, PhD&quot; href=&quot;http://www.thejuliagroup.com/blog/?page_id=2&quot; target=&quot;_blank&quot;&gt;AnnMaria De Mars&lt;/a&gt;, &lt;a title=&quot;The Next Big Thing from AnnMaria De Mars Blog&quot; href=&quot;http://www.thejuliagroup.com/blog/?p=433&quot; target=&quot;_blank&quot;&gt;The Next Big Thing&lt;/a&gt;, &lt;a title=&quot;AnnMaria De Mars' Blog The Julia Group&quot; href=&quot;http://www.thejuliagroup.com/blog/&quot; target=&quot;_blank&quot;&gt;AnnMaria's Blog&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;In &lt;a title=&quot;SAS vs R Introduction and Request in Dr. Peter Flom's Blog&quot; href=&quot;http://www.statisticalanalysisconsulting.com/sas-vs-r-introduction-and-request/&quot; target=&quot;_blank&quot;&gt;SAS vs. R, Introduction and Request&lt;/a&gt;, &lt;a title=&quot;About Peter Flom, PhD&quot; href=&quot;http://www.statisticalanalysisconsulting.com/about/&quot; target=&quot;_blank&quot;&gt;Dr. Flom&lt;/a&gt; starts a series comparing R and SAS from the standpoint of a statistician deciding upon tools to use.&lt;/p&gt;
&lt;p&gt;There are several threads in Dr. De Mars' post.  I agree with Dr. De Mars that two of the &quot;next big things&quot; in data management &amp;amp; analysis are data visualization and dealing with unstructured data.  I'm of the opinion that there is a third area, related to the &quot;Internet of Things&quot; and the tsunami of data that will be generated by it.  These are conceptual areas, however.  Dr. De Mars quickly moves on to discussing the tools that might be a part of the solutions to these next big things.  The concepts cited are neither software packages nor computing languages.  Neither the software packages SAS, IBM/SPSS, Stata, Pentaho and the like, nor the computing language S, with its open source distribution R and its proprietary distribution S+, are likely to be the next big things, as they are already useful tools to know.&lt;/p&gt;
&lt;p&gt;I find it interesting that both Dr. De Mars and Dr. Flom, as well as the various commenters, tweeters, and other posters, are comparing software suites and applications with a computing language.  I think that a bit more historical perspective might be needed in bringing these threads together.&lt;/p&gt;
&lt;p&gt;In 1979, when I first sat down with a FORTRAN programmer to turn my Bayesian methodologies into practical applications to determine the reliability and risk associated with the STAR48 kick motor and associated Payload Assist Module (PAM), the statistical libraries for FORTRAN seemed amazing.  The ease with which we were able to create the program and churn through decades of NASA data (after buying a 1MB memory box for the mainframe) was wondrous  &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_wink.gif&quot; alt=&quot;&amp;#59;&amp;#41;&quot; class=&quot;middle&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Today, there's not so much wonder from such a feat.  The evolution of computing has drastically affected the way in which we apply mathematics and statistics today.  Several of the comments to these posts argue both sides of the question of whether anyone doing statistics today should be a programmer.  It's an interesting argument, one that I've also seen reflected in chemistry, as fewer technicians are used in the lab, and the Ph.D.s work directly with the robots to prepare the samples and interpret the results.&lt;/p&gt;
&lt;p&gt;Approximately 15 years ago, I moved from solving scientific and engineering problems directly with statistics, to solving business problems through vendors' software suites.  The marketing names for this endeavor have gone through several changes: Decision Support Systems, Very Large Databases, Data Warehousing, Data Marts, Corporate Information Factory, Business Intelligence, and the like.  Today, Data Mining, Data Visualization, Sentiment Analysis, &quot;Big Data&quot;, SQL Streaming, and similar buzzwords reflect the new &quot;big thing&quot;.  Software applications, from new as well as established vendors, both open source and proprietary, are coming to the fore to handle these new areas that represent real problems.&lt;/p&gt;
&lt;p&gt;So, one question to answer for students, is which, if any, of these software packages will best survive with and aid the growth of, their maturing careers.  Will &lt;a title=&quot;Data Visualization and Business Analytics for All from Tableau&quot; href=&quot;http://www.tableausoftware.com/&quot; target=&quot;_blank&quot;&gt;Tableau&lt;/a&gt;, &lt;a title=&quot;Easy tools, cool charts, real collaboration from Lyza&quot; href=&quot;http://www.lyzasoft.com/&quot; target=&quot;_blank&quot;&gt;LyzaSoft&lt;/a&gt;, &lt;a title=&quot;QlikView&quot; href=&quot;http://www.qlikview.com/&quot; target=&quot;_blank&quot;&gt;QlikView&lt;/a&gt; or &lt;a title=&quot;Straycat Software Vineyard&quot; href=&quot;http://www.straysoft.com/&quot; target=&quot;_blank&quot;&gt;Viney@rd&lt;/a&gt; be in a better spot in 20 years, through growth or acquisition, than SAS or IBM/SPSS? Will the open source movement take down the proprietary vendors or be subsumed by them?  Is Pentaho/Weka the BI &amp;amp; data mining solution for their career?  Maybe, maybe not.  But what about that other beast of which everyone speaks?  Namely, &lt;a title=&quot;The main web site for the R project&quot; href=&quot;http://r-project.org/&quot; target=&quot;_blank&quot;&gt;R, the r-project, the R Statistical Language&lt;/a&gt;.  What is it?  Is it a worthy alternative to &lt;a title=&quot;The SAS Institute corporate web site&quot; href=&quot;http://www.sas.com/&quot; target=&quot;_blank&quot;&gt;SAS&lt;/a&gt; or &lt;a title=&quot;SPSS, an IBM company&quot; href=&quot;http://www.spss.com/&quot; target=&quot;_blank&quot;&gt;IBM/SPSS&lt;/a&gt; or &lt;a title=&quot;Pentaho corporate web site&quot; href=&quot;http://www.pentaho.com/&quot; target=&quot;_blank&quot;&gt;Pentaho&lt;/a&gt;/&lt;a title=&quot;Pentaho Data Mining enterprise edition based on Weka&quot; href=&quot;http://weka.pentaho.org/&quot; target=&quot;_blank&quot;&gt;Weka&lt;/a&gt;?  Or is it a different genus altogether?  
That's a question I've been seeking to answer for myself, in my own career evolution.  After 15 years, software such as &lt;a title=&quot;Business Objects an SAP company web site&quot; href=&quot;http://www.sap.com/solutions/sapbusinessobjects/index.epx&quot; target=&quot;_blank&quot;&gt;SAP/Business Objects&lt;/a&gt; and &lt;a title=&quot;Cognos an IBM company web site&quot; href=&quot;http://www-01.ibm.com/software/data/cognos/&quot; target=&quot;_blank&quot;&gt;IBM/Cognos&lt;/a&gt;, haven't evolved into anything that I like, with their pinnacle of statistical computation being the &quot;average&quot;, the arithmetic mean.  SAS and IBM/SPSS are certainly better, and with data mining, machine learning and predictives becoming important to business, certainly likely to be a good choice for the future.  But are they really powerful enough?  Are they flexible enough?  Can they be used to solve the next generation of data problems?&amp;#160; They're very likely to evolve into software that can do so.&amp;#160; But how quickly?&amp;#160; And like all vendor software, they have limitations based upon the market studies and business decisions of the corporation.&lt;/p&gt;
&lt;p&gt;How is R different?&lt;/p&gt;
&lt;p&gt;Well, first, R is a computing language.  Unlike SAP/Business Objects, IBM/Cognos, IBM/SPSS, SAS, Pentaho, &lt;a title=&quot;JasperSoft corporate web site&quot; href=&quot;http://www.jaspersoft.com/&quot; target=&quot;_blank&quot;&gt;JasperSoft&lt;/a&gt;, &lt;a title=&quot;SpagoBI Community web site&quot; href=&quot;http://www.spagoworld.org/xwiki/bin/view/SpagoBI/&quot; target=&quot;_blank&quot;&gt;SpagoBI&lt;/a&gt;, or &lt;a title=&quot;Oracle corporate web site&quot; href=&quot;http://www.oracle.com/&quot; target=&quot;_blank&quot;&gt;Oracle&lt;/a&gt;, it's not a company, nor a BI Suite, nor even a collection of software applications.&amp;#160; Second, R is an open source project.  It's an open source implementation of S.  Like C, and the other single-letter-named languages, S came out of Bell Labs; in the case of S, in 1976.  The open source implementation, R, comes from &lt;a title=&quot;The faculty page for Ross Ihaka, co-creator of R&quot; href=&quot;http://www.stat.auckland.ac.nz/showperson.php?firstname=Ross&amp;amp;surname=Ihaka&quot; target=&quot;_blank&quot;&gt;R. Ihaka&lt;/a&gt; and &lt;a title=&quot;The home page of Robert Gentleman co-creator of R&quot; href=&quot;http://gentleman.fhcrc.org/&quot; target=&quot;_blank&quot;&gt;R. Gentleman&lt;/a&gt;, first revealed in 1996 through the article, R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5:299&amp;#8211;314, and is often associated with the &lt;a title=&quot;Department of Statistics web site at University of Auckland&quot; href=&quot;http://www.stat.auckland.ac.nz/&quot; target=&quot;_blank&quot;&gt;Department of Statistics&lt;/a&gt;, University of Auckland.&lt;/p&gt;
&lt;p&gt;While I'm not a software engineer, R is a very compelling statistical tool.  As a language, it's very intuitive&amp;#8230; for a statistician.  It's an interactive, interpreted, functional, object oriented, statistical programming language.  R itself is written in R, C, C++ and FORTRAN; it's powerful.  As an open source project, it has attracted thousands upon thousands of users who have formed a strong community.  There are thousands upon thousands of community contributed packages for R.  It's flexible, and growing.  One of the main goals of R was data visualization, and it has a wonderful new package for data visualization in &lt;a title=&quot;The ggplot2 for R web site&quot; href=&quot;http://had.co.nz/ggplot2/&quot; target=&quot;_blank&quot;&gt;ggplot2&lt;/a&gt;.  It's ahead of the curve.  There are packages for&amp;#160; &lt;a title=&quot;Parallel processing in R for multicore CPUs&quot; href=&quot;http://cran.r-project.org/web/packages/multicore/index.html&quot; target=&quot;_blank&quot;&gt;parallel&lt;/a&gt; &lt;a title=&quot;parallel virtual machine for R processing&quot; href=&quot;http://cran.r-project.org/web/packages/rpvm/index.html&quot; target=&quot;_blank&quot;&gt;processing&lt;/a&gt; (some quite &lt;a title=&quot;CUDA implementation of a Bayesian multilevel model for fMRI&quot; href=&quot;http://cran.r-project.org/web/packages/cudaBayesreg/index.html&quot; target=&quot;_blank&quot;&gt;specific&lt;/a&gt;), for &lt;a title=&quot;bounded memory linear and generalized linear models&quot; href=&quot;http://cran.r-project.org/web/packages/biglm/index.html&quot; target=&quot;_blank&quot;&gt;big data&lt;/a&gt; beyond &lt;a title=&quot;Manage massive matrices in R using C++, with support for shared memory and memory-mapped files&quot; href=&quot;http://cran.r-project.org/web/packages/bigmemory/index.html&quot; target=&quot;_blank&quot;&gt;in-memory capacity&lt;/a&gt;, for &lt;a title=&quot;Binary R server&quot; href=&quot;http://cran.r-project.org/web/packages/Rserve/index.html&quot; target=&quot;_blank&quot;&gt;servers&lt;/a&gt;, and for embedding in a &lt;a title=&quot;RApache&quot; href=&quot;http://biostat.mc.vanderbilt.edu/rapache/&quot; target=&quot;_blank&quot;&gt;web&lt;/a&gt; site.&amp;#160; Get the idea?&amp;#160; If you think you need something in R, search &lt;a title=&quot;The Comprehensive R Archive Network&quot; href=&quot;http://cran.r-project.org/web/packages/&quot; target=&quot;_blank&quot;&gt;CRAN&lt;/a&gt;, &lt;a title=&quot;R Forge for R Developers&quot; href=&quot;http://www.rforge.net/&quot; target=&quot;_blank&quot;&gt;RForge&lt;/a&gt;, &lt;a title=&quot;analysis and comprehension of genomic data&quot; href=&quot;http://www.bioconductor.org/&quot; target=&quot;_blank&quot;&gt;BioConductor&lt;/a&gt; or &lt;a title=&quot;More Packages for R&quot; href=&quot;http://www.omegahat.org/download/R/packages/&quot; target=&quot;_blank&quot;&gt;Omegahat&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you can tell, I like R. &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_smile.gif&quot; alt=&quot;&amp;#58;&amp;#41;&quot; class=&quot;middle&quot; /&gt; However, in all honesty, I don't think that the SAS vs. R controversy is an either/or situation. SAS, IBM/SPSS and Pentaho complement R and vice-versa. &lt;a title=&quot;Tweet from Doug Moran on the R plugin&quot; href=&quot;https://twitter.com/doug_moran/status/10856741067&quot; target=&quot;_blank&quot;&gt;Pentaho&lt;/a&gt;, IBM/SPSS and some SAS products support R. R can read data from SAS, IBM/SPSS, &lt;a title=&quot;ODBC Database Access&quot; href=&quot;http://cran.r-project.org/web/packages/RODBC/index.html&quot; target=&quot;_blank&quot;&gt;relational&lt;/a&gt; &lt;a title=&quot;R database interface&quot; href=&quot;http://cran.r-project.org/web/packages/DBI/index.html&quot; target=&quot;_blank&quot;&gt;databases&lt;/a&gt;, &lt;a title=&quot;transfer data between R and Excel, writing VBA macros using R&quot; href=&quot;http://cran.r-project.org/web/packages/RExcelInstaller/index.html&quot; target=&quot;_blank&quot;&gt;Excel&lt;/a&gt;, &lt;a title=&quot;flexible mapReduce algorithm for parallel computation&quot; href=&quot;http://cran.r-project.org/web/packages/mapReduce/index.html&quot; target=&quot;_blank&quot;&gt;mapReduce&lt;/a&gt; and more. The real question isn't whether one tool is better than another, but rather which tool is best for answering a particular question.&amp;#160; That being said, I'm looking forward to Dr. Flom's &lt;a title=&quot;SAS vs R&quot; href=&quot;http://www.statisticalanalysisconsulting.com/sas-vs-r-introduction-and-request/&quot; target=&quot;_blank&quot;&gt;comparison&lt;/a&gt;, as well as the continuing discussion on Dr. De Mars' &lt;a title=&quot;The Next Big Thing&quot; href=&quot;http://www.thejuliagroup.com/blog/?p=433&quot; target=&quot;_blank&quot;&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For us, the question is building a decision support system or stack from open source components. It looks like we'll have a good time doing so.&lt;/p&gt;

&lt;!-- Creative Commons License --&gt; &lt;div class=&quot;cc_license&quot;&gt;&lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;&lt;img src=&quot;http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png&quot; alt=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot; /&gt;&lt;/a&gt;Except where otherwise noted, this content is&lt;br /&gt;licensed under a &lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;Creative Commons License&lt;/a&gt;.&lt;/div&gt; &lt;!-- /Creative Commons License --&gt;
&lt;!--
&lt;rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot;
    xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;Work rdf:about=&quot;http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not&quot;&gt;
  &lt;dc:title&gt;R the next Big Thing or Not&lt;/dc:title&gt;
  &lt;dc:creator&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:creator&gt;
  &lt;dc:rights&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:rights&gt;
  &lt;dc:date&gt;2010-04-19 18:40:16&lt;/dc:date&gt;
  &lt;license rdf:resource=&quot;by-nc-sa/3.0/&quot; /&gt;
&lt;/Work&gt;
&lt;License rdf:about=&quot;by-nc-sa/3.0/&quot;&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/DerivativeWorks&quot; /&gt;
  &lt;prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Attribution&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/ShareAlike&quot; /&gt;
&lt;/License&gt;
&lt;/rdf:RDF&gt;
--&gt;
&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>Recently, AnnMaria De Mars, PhD (multiple) and Dr. Peter Flom, PhD have stirred up a bit of a tempest in a tweet-pot, as well as in the statistical blogosphere, with comparisons of R and SAS, IBM/SPSS and the like.  I've commented on both of their blogs, but decided to expand a bit here, as the choice of R is something that we planned to cover in a later post to our <a href="http://press.teleinteractive.net/oss/DSS-SG/">Open Source Solutions Decision Support Systems Study Guide</a>.  First, let me say that Dr. De Mars and Dr. Flom appear to have posted completely independently of each other, and further, that their posts have different goals.</p>
<p>In The Next Big Thing, Dr. De Mars is looking for the <strong>next big thing</strong>, both to keep her own career on-track, and to guide students into areas of study that will survive in the job market in the coming decades.  This is always difficult for mentors, as we can't always anticipate the "black swan" events that might change things drastically.  The tempestuous nature of her post came from one little sentence:</p>
<blockquote><p>Contrary to what  some people seem to think, R is definitely not the next big thing, either. -- <a title="About AnnMaria De Mars, PhD" href="http://www.thejuliagroup.com/blog/?page_id=2" target="_blank">AnnMaria De Mars</a>, <a title="The Next Big Thing from AnnMaria De Mars Blog" href="http://www.thejuliagroup.com/blog/?p=433" target="_blank">The Next Big Thing</a>, <a title="AnnMaria De Mars' Blog The Julia Group" href="http://www.thejuliagroup.com/blog/" target="_blank">AnnMaria's Blog</a></p></blockquote>
<p>In <a title="SAS vs R Introduction and Request in Dr. Peter Flom's Blog" href="http://www.statisticalanalysisconsulting.com/sas-vs-r-introduction-and-request/" target="_blank">SAS vs. R, Introduction and Request</a>, <a title="About Peter Flom, PhD" href="http://www.statisticalanalysisconsulting.com/about/" target="_blank">Dr. Flom</a> starts a series comparing R and SAS from the standpoint of a statistician deciding upon tools to use.</p>
<p>There are several threads in Dr. De Mars' post.  I agree with Dr. De Mars that two of the "next big things" in data management &amp; analysis are data visualization and dealing with unstructured data.  I'm of the opinion that there is a third area, related to the "Internet of Things" and the tsunami of data that will be generated by it.  These are conceptual areas, however.  Dr. De Mars quickly moves on to discussing the tools that might be a part of the solutions of these next big things.  The concepts cited are neither software packages nor computing languages.  The software packages SAS, IBM/SPSS, Stata, Pentaho and the like, and the computing language S, with its open source distribution R and its proprietary distribution S+, are none of them likely to be the next big things, because they are already useful tools to know today.</p>
<p>I find it interesting that both Dr. De Mars and Dr. Flom, as well as the various commenters, tweeters, and other posters, are comparing software suites and applications with a computing language.  I think that a bit more historical perspective might be needed in bringing these threads together.</p>
<p>In 1979, when I first sat down with a FORTRAN programmer to turn my Bayesian methodologies into practical applications to determine the reliability and risk associated with the STAR48 kick motor and associated Payload Assist Module (PAM), the statistical libraries for FORTRAN seemed amazing.  The ease with which we were able to create the program and churn through decades of NASA data (after buying a 1MB memory box for the mainframe) was wondrous  <img src="http://press.teleinteractive.net/rsc/smilies/icon_wink.gif" alt="&#59;&#41;" class="middle" /></p>
<p>Today, there's not so much wonder in such a feat.  The evolution of computing has drastically affected the way in which we apply mathematics and statistics.  Several of the comments to these posts argue one side or the other of the statement that anyone doing statistics today should be a programmer.  It's an interesting argument that I've also seen reflected in chemistry, as fewer technicians are used in the lab, and the Ph.D.s work directly with the robots to prepare the samples and interpret the results.</p>
<p>Approximately 15 years ago, I moved from solving scientific and engineering problems directly with statistics, to solving business problems through vendors' software suites.  The marketing names for this endeavor have gone through several changes: Decision Support Systems, Very Large Databases, Data Warehousing, Data Marts, Corporate Information Factory, Business Intelligence, and the like.  Today, Data Mining, Data Visualization, Sentiment Analysis, "Big Data", SQL Streaming, and similar buzzwords reflect the new "big thing".  Software applications, from new as well as established vendors, both open source and proprietary, are coming to the fore to handle these new areas that represent real problems.</p>
<p>So, one question to answer for students is which, if any, of these software packages will best survive alongside, and aid the growth of, their maturing careers.  Will <a title="Data Visualization and Business Analytics for All from Tableau" href="http://www.tableausoftware.com/" target="_blank">Tableau</a>, <a title="Easy tools, cool charts, real collaboration from Lyza" href="http://www.lyzasoft.com/" target="_blank">LyzaSoft</a>, <a title="QlikView" href="http://www.qlikview.com/" target="_blank">QlikView</a> or <a title="Straycat Software Vineyard" href="http://www.straysoft.com/" target="_blank">Viney@rd</a> be in a better spot in 20 years, through growth or acquisition, than SAS or IBM/SPSS? Will the open source movement take down the proprietary vendors or be subsumed by them?  Is Pentaho/Weka the BI &amp; data mining solution for their career?  Maybe, maybe not.  But what about that other beast of which everyone speaks?  Namely, <a title="The main web site for the R project" href="http://r-project.org/" target="_blank">R, the r-project, the R Statistical Language</a>.  What is it?  Is it a worthy alternative to <a title="The SAS Institute corporate web site" href="http://www.sas.com/" target="_blank">SAS</a> or <a title="SPSS, an IBM company" href="http://www.spss.com/" target="_blank">IBM/SPSS</a> or <a title="Pentaho corporate web site" href="http://www.pentaho.com/" target="_blank">Pentaho</a>/<a title="Pentaho Data Mining enterprise edition based on Weka" href="http://weka.pentaho.org/" target="_blank">Weka</a>?  Or is it a different genus altogether?  That's a question I've been seeking to answer for myself, in my own career evolution.  
After 15 years, software such as <a title="Business Objects an SAP company web site" href="http://www.sap.com/solutions/sapbusinessobjects/index.epx" target="_blank">SAP/Business Objects</a> and <a title="Cognos an IBM company web site" href="http://www-01.ibm.com/software/data/cognos/" target="_blank">IBM/Cognos</a> hasn't evolved into anything that I like, with its pinnacle of statistical computation being the "average", the arithmetic mean.  SAS and IBM/SPSS are certainly better and, with data mining, machine learning and predictives becoming important to business, are likely to be good choices for the future.  But are they really powerful enough?  Are they flexible enough?  Can they be used to solve the next generation of data problems?&#160; They're very likely to evolve into software that can do so.&#160; But how quickly?&#160; And like all vendor software, they have limitations based upon the market studies and business decisions of the corporation.</p>
<p>How is R different?</p>
<p>Well, first, R is a computing language.  Unlike SAP/Business Objects, IBM/Cognos, IBM/SPSS, SAS, Pentaho, <a title="JasperSoft corporate web site" href="http://www.jaspersoft.com/" target="_blank">JasperSoft</a>, <a title="SpagoBI Community web site" href="http://www.spagoworld.org/xwiki/bin/view/SpagoBI/" target="_blank">SpagoBI</a>, or <a title="Oracle corporate web site" href="http://www.oracle.com/" target="_blank">Oracle</a>, it's not a company, nor a BI Suite, nor even a collection of software applications.&#160; Second, R is an open source project.  It's an open source implementation of S.  Like C and the other single-letter languages, S came out of Bell Labs; work on S began there in 1976.  The open source implementation, R, comes from <a title="The faculty page for Ross Ihaka, co-creator of R" href="http://www.stat.auckland.ac.nz/showperson.php?firstname=Ross&amp;surname=Ihaka" target="_blank">R. Ihaka</a> and <a title="The home page of Robert Gentleman co-creator of R" href="http://gentleman.fhcrc.org/" target="_blank">R. Gentleman</a>, first revealed in 1996 through the article, R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5:299&#8211;314, and is often associated with the <a title="Department of Statistics web site at University of Auckland" href="http://www.stat.auckland.ac.nz/" target="_blank">Department of Statistics</a>, University of Auckland.</p>
<p>While I'm not a software engineer, R is a very compelling statistical tool.  As a language, it's very intuitive&#8230; for a statistician.  It's an interactive, interpreted, functional, object oriented, statistical programming language.  R itself is written in R, C, C++ and FORTRAN; it's powerful.  As an open source project, it has attracted thousands upon thousands of users who have formed a strong community.  There are thousands upon thousands of community contributed packages for R.  It's flexible, and growing.  One of the main goals of R was data visualization, and it has a wonderful new package for data visualization in <a title="The ggplot2 for R web site" href="http://had.co.nz/ggplot2/" target="_blank">ggplot2</a>.  It's ahead of the curve.  There are packages for&#160; <a title="Parallel processing in R for multicore CPUs" href="http://cran.r-project.org/web/packages/multicore/index.html" target="_blank">parallel</a> <a title="parallel virtual machine for R processing" href="http://cran.r-project.org/web/packages/rpvm/index.html" target="_blank">processing</a> (some quite <a title="CUDA implementation of a Bayesian multilevel model for fMRI" href="http://cran.r-project.org/web/packages/cudaBayesreg/index.html" target="_blank">specific</a>), for <a title="bounded memory linear and generalized linear models" href="http://cran.r-project.org/web/packages/biglm/index.html" target="_blank">big data</a> beyond <a title="Manage massive matrices in R using C++, with support for shared memory and memory-mapped files" href="http://cran.r-project.org/web/packages/bigmemory/index.html" target="_blank">in-memory capacity</a>, for <a title="Binary R server" href="http://cran.r-project.org/web/packages/Rserve/index.html" target="_blank">servers</a>, and for embedding in a <a title="RApache" href="http://biostat.mc.vanderbilt.edu/rapache/" target="_blank">web</a> site.&#160; Get the idea?&#160; If you think you need something in R, search <a title="The Comprehensive R Archive Network" href="http://cran.r-project.org/web/packages/" target="_blank">CRAN</a>, <a title="R Forge for R Developers" href="http://www.rforge.net/" target="_blank">RForge</a>, <a title="analysis and comprehension of genomic data" href="http://www.bioconductor.org/" target="_blank">BioConductor</a> or <a title="More Packages for R" href="http://www.omegahat.org/download/R/packages/" target="_blank">Omegahat</a>.</p>
<p>As you can tell, I like R. <img src="http://press.teleinteractive.net/rsc/smilies/icon_smile.gif" alt="&#58;&#41;" class="middle" /> However, in all honesty, I don't think that the SAS vs. R controversy is an either/or situation. SAS, IBM/SPSS and Pentaho complement R and vice-versa. <a title="Tweet from Doug Moran on the R plugin" href="https://twitter.com/doug_moran/status/10856741067" target="_blank">Pentaho</a>, IBM/SPSS and some SAS products support R. R can read data from SAS, IBM/SPSS, <a title="ODBC Database Access" href="http://cran.r-project.org/web/packages/RODBC/index.html" target="_blank">relational</a> <a title="R database interface" href="http://cran.r-project.org/web/packages/DBI/index.html" target="_blank">databases</a>, <a title="transfer data between R and Excel, writing VBA macros using R" href="http://cran.r-project.org/web/packages/RExcelInstaller/index.html" target="_blank">Excel</a>, <a title="flexible mapReduce algorithm for parallel computation" href="http://cran.r-project.org/web/packages/mapReduce/index.html" target="_blank">mapReduce</a> and more. The real question isn't whether one tool is better than another, but rather which tool is best for answering a particular question.&#160; That being said, I'm looking forward to Dr. Flom's <a title="SAS vs R" href="http://www.statisticalanalysisconsulting.com/sas-vs-r-introduction-and-request/" target="_blank">comparison</a>, as well as the continuing discussion on Dr. De Mars' <a title="The Next Big Thing" href="http://www.thejuliagroup.com/blog/?p=433" target="_blank">blog</a>.</p>
<p>For us, the question is building a decision support system or stack from open source components. It looks like we'll have a good time doing so.</p>

<!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not">
  <dc:title>R the next Big Thing or Not</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-04-19 18:40:16</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->
<div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/04/20/r-the-next-big-thing-or-not#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=932</wfw:commentRss>
		</item>
				<item>
			<title>OSS DSS Studies Introduction</title>
			<link>http://press.teleinteractive.net/oss/2010/04/19/oss-dss-studies-introduction</link>
			<pubDate>Mon, 19 Apr 2010 23:11:22 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">929@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;First, let me say that we're talking about systems supporting the decisions that are made by human beings, not &quot;expert systems&quot; that automate decisions.&amp;#160; As an example, let's look at inventory management.&amp;#160; A human might use various components of a DSS to determine the amount of an item in stock, the demand for that item as a trend to determine when it might be out of stock, and predictives as to various factors (internal, external, environmental, political, etc.) that might affect supply, to come to a decision as to how much and when to order more of that item.&amp;#160; An expert system might be created that could also determine when and how much of an item to order, using neural networks, Bayesian nets or other algorithms.&amp;#160; The expert system might even draw from the same DSS components (or directly from their underlying data) as the human might.&amp;#160; One could even run the expert system in parallel with humans making the decisions, scoring or otherwise evaluating the two, until the expert system is comparable to or better than the humans.&amp;#160; But, we're not really interested in expert systems in this study guide.&amp;#160; We'll be focusing on systems that help humans to make better decisions, not on automated feedback and control loops.&lt;/span&gt;&lt;/p&gt;
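&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;To make the inventory example a little more concrete, here is a minimal Python sketch of the kind of calculation a DSS component might put in front of the human decision maker.&amp;#160; It is only an illustration under stated assumptions: the function names, the straight-line demand trend, the lead time and the safety stock are all hypothetical and are not taken from any particular tool.&amp;#160; The human still decides when and how much to order.&lt;/span&gt;&lt;/p&gt;

```python
from bisect import bisect_right
from itertools import accumulate
from statistics import mean


def fit_linear_trend(daily_demand):
    """Least-squares line through (day index, units sold); returns (slope, intercept).

    Hypothetical helper: needs at least two days of history.
    """
    days = range(len(daily_demand))
    day_mean = mean(days)
    demand_mean = mean(daily_demand)
    covariance = sum((d - day_mean) * (u - demand_mean) for d, u in zip(days, daily_demand))
    variance = sum((d - day_mean) ** 2 for d in days)
    slope = covariance / variance
    return slope, demand_mean - slope * day_mean


def days_of_cover(on_hand, daily_demand, horizon=365):
    """Whole future days the stock on hand covers if the fitted trend continues."""
    slope, intercept = fit_linear_trend(daily_demand)
    start = len(daily_demand)
    # Projected demand for each future day, never allowed to go negative.
    projected = [max(0.0, slope * day + intercept) for day in range(start, start + horizon)]
    cumulative = list(accumulate(projected))
    # bisect_right counts the days whose cumulative demand still fits within on_hand.
    return bisect_right(cumulative, on_hand)


def suggested_order(daily_demand, lead_time_days, safety_stock):
    """Projected demand over the supplier lead time plus an assumed safety buffer."""
    slope, intercept = fit_linear_trend(daily_demand)
    start = len(daily_demand)
    lead_time_demand = sum(
        max(0.0, slope * day + intercept) for day in range(start, start + lead_time_days)
    )
    return lead_time_demand + safety_stock
```

&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;With five days of sales of 10, 12, 14, 16 and 18 units and 70 units on hand, &lt;code&gt;days_of_cover(70, [10, 12, 14, 16, 18])&lt;/code&gt; returns 3, since the projected demand for the next three days (20, 22 and 24 units) fits within the stock and the fourth day does not.&amp;#160; The predictives about supply mentioned above would feed the lead time and safety stock, which are simply passed in here.&lt;/span&gt;&lt;/p&gt;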
&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;To me, a technology doesn't matter very much if it's not supporting some process, or a step within a process.&amp;#160; That process may be for personal reasons or supporting work activities. For this study guide, let's begin by continuing the discussion that we began in the previous posts, about the process by which one makes a decision, the steps, the events, the triggers and the consequences of making a decision.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;I have my own process in making decisions.&amp;#160; I've played in executive and management roles for many years, and have been responsible for 5 P/L centers.&amp;#160; But this is a study guide, and while I intend to offer my own opinions and interpretations, we need some objective sources to study.&amp;#160; Let's start with a &lt;a title=&quot;Google search for decision making process&quot; href=&quot;http://www.google.com/search?q=decision+making+process&quot; target=&quot;_blank&quot;&gt;Google search&lt;/a&gt;.&amp;#160; Of course, Wikipedia has an &lt;a title=&quot;Wikipedia article on the decision making process&quot; href=&quot;http://en.wikipedia.org/wiki/Decision_making&quot; target=&quot;_blank&quot;&gt;article&lt;/a&gt;.&amp;#160; A site of which I've not heard before has the first hit with their &lt;a title=&quot;Businessballs.com article on problem solving and decision making&quot; href=&quot;http://www.businessballs.com/problemsolving.htm&quot; target=&quot;_blank&quot;&gt;article on problem-solving and decision-making&lt;/a&gt;.&amp;#160; Science Daily has a timely article from &lt;a title=&quot;Science Daily article New Insight Into Brain's Decision-Making Process&quot; href=&quot;http://www.sciencedaily.com/releases/2010/03/100311123620.htm&quot; target=&quot;_blank&quot;&gt;2010 March 13 on how we really make decisions&lt;/a&gt;, our brain activity during decision making.&amp;#160; I also like the &lt;a title=&quot;The Institute for Strategic Clarity Decision Making Process Map&quot; href=&quot;http://www.instituteforstrategicclarity.org/dmp.htm&quot; target=&quot;_blank&quot;&gt;map&lt;/a&gt; from The Institute for Strategic Clarity.&amp;#160; Mindtools sets out a &lt;a title=&quot;Mindtools.com Techniques for Decision Making&quot; href=&quot;http://www.mindtools.com/pages/main/newMN_TED.htm&quot; target=&quot;_blank&quot;&gt;list of techniques and tools&lt;/a&gt; for aiding 
in the decision making process, and provides an important caveat &quot;&lt;em&gt;Do remember, though, that the tools in this chapter exist only to assist your intelligence and common sense. These are your most important assets in good Decision Making&lt;/em&gt;&quot;.&amp;#160; Reading through various reviews, the one book on decision making that I want to add to my library is &lt;a title=&quot;Amazon Link to The Managerial Decision-Making Process&quot; href=&quot;http://www.amazon.com/gp/product/0395908213?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0395908213&quot; target=&quot;_blank&quot;&gt;&lt;strong&gt;The Managerial Decision-Making Process&lt;/strong&gt;&lt;/a&gt;, 5th ed. by E. Frank Harrison.&amp;#160; From the Glossary of Political Economy Terms, we have:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;Where formal organizations are the setting in which decisions are made, the particular decisions or policies chosen by decision-makers can often be explained through reference to the organization's particular structure and procedural rules. Such explanations typically involve looking at the distribution of responsibilities among organizational sub-units, the activities of committees and ad hoc coordinating groups, meeting schedules, rules of order etc. The notion of fixed-in-advance standard operating procedures (SOPs) typically plays an important role in such explanations of individual decisions made. -- &lt;/span&gt;&lt;a title=&quot;Auburn University Paul M. Johnson Definition of the Organizational Process Model of Decision Making&quot; href=&quot;http://www.auburn.edu/~johnspm/gloss/organizational_process&quot; target=&quot;_blank&quot;&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;Organizational process models of decision-making&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;Let's revisit and expand upon&lt;/span&gt; &lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;the summary that we gave in the &lt;a title=&quot;Questions and Commonality, Third Post in the OSS DSS Study Guide&quot; href=&quot;/oss/2010/03/19/questions-and-commonality&quot; target=&quot;_self&quot;&gt;third post&lt;/a&gt; in this series.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;As an individual faced with making a decision, I may want input from others, I may want consensus, but in the end, it is an individual decision, and I will bear the fruits of having made that decision.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;I need to put the problem, and my decision making, into context.&amp;#160; I have a variety of resources at my disposal to do so:&lt;/span&gt; &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;historical data&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;current information&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;structured data from transactional systems, master data, metadata, data warehouse, and other possible sources&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;unstructured data from blogs, wikis, Zotero libraries, Evernote, searches, bookmarks and similar sources&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;email&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;non-electronic correspondence, notes and conversations&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;personal experience&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;the experience of others garnered through water cooler and hallway conversations, formal meetings, twitter, phone calls and the like&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;Now I need to understand all of the facts, opinions and conjecture at my disposal.&amp;#160; Part of this is sifting all of it through my internal filters, using my &quot;gut&quot;.&amp;#160; Part is using the various reporting and analytical tools at my disposal, and then filtering their output through my gut.&amp;#160; And really, this and the next point will constitute the majority of this OSS DSS Study Guide - the tools we use.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;As I contemplate the various decisions that I might make from all of this, I want to understand the consequences of each potential decision: might this decision lead to a better product, more profit, less profit, broader market penetration, higher reliability, or even an alternate universe?&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;As I make this decision, I'll want to collaborate with others.&amp;#160; Ideally, I'll want to collaborate within the context of my decision support system. Once upon a time we would do this by embedding the tools within a portal system; now we take a more master data management approach, and use a service-oriented architecture with either web services description language (WSDL) or representational state transfer (ReST) application programming interfaces (APIs) to the collaborative environment, usually a wiki.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style=&quot;font-family: verdana,geneva;&quot;&gt;In summary, this introduction has set up a framework for a decision-making process for an individual to use a decision support system.&amp;#160; The majority of this study guide will explore the actual decision support system, and the open source tools from which we can build such a system.&lt;/span&gt;&lt;/p&gt;

&lt;!-- Creative Commons License NEWSTYLE license:by-nc-sa nation:3.0 display:new --&gt;&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/04/19/oss-dss-studies-introduction&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p><span style="font-family: verdana,geneva;">First, let me say that we're talking about systems supporting the decisions that are made by human beings, not "expert systems" that automate decisions.&#160; As an example, let's look at inventory management.&#160; A human might use various components of a DSS to determine the amount of an item in stock, the demand for that item as a trend to determine when it might be out of stock, and predictives as to various factors (internal, external, environmental, political, etc.) that might affect supply, to come to a decision as to how much and when to order more of that item.&#160; An expert system might be created that could also determine when and how much of an item to order, using neural networks, Bayesian nets or other algorithms.&#160; The expert system might even draw on the same DSS components (or directly on their underlying data) as the human might.&#160; One could even run the expert system in parallel with humans making the decisions, scoring or otherwise evaluating the two, until the expert system is comparable to or better than the humans.&#160; But, we're not really interested in expert systems in this study guide.&#160; We'll be focusing on systems that help humans to make better decisions, not on automated feedback and control loops.</span></p>
<p><span style="font-family: verdana,geneva;">To me, a technology doesn't matter very much if it's not supporting some process, or a step within a process.&#160; That process may be for personal reasons or supporting work activities. For this study guide, let's begin by continuing the discussion that we began in the previous posts, about the process by which one makes a decision, the steps, the events, the triggers and the consequences of making a decision.<br /></span></p>
<p><span style="font-family: verdana,geneva;">I have my own process in making decisions.&#160; I've played in executive and management roles for many years, and have been responsible for 5 P/L centers.&#160; But this is a study guide, and while I intend to offer my own opinions and interpretations, we need some objective sources to study.&#160; Let's start with a <a title="Google search for decision making process" href="http://www.google.com/search?q=decision+making+process" target="_blank">Google search</a>.&#160; Of course, Wikipedia has an <a title="Wikipedia article on the decision making process" href="http://en.wikipedia.org/wiki/Decision_making" target="_blank">article</a>.&#160; A site of which I've not heard before has the first hit with their <a title="Businessballs.com article on problem solving and decision making" href="http://www.businessballs.com/problemsolving.htm" target="_blank">article on problem-solving and decision-making</a>.&#160; Science Daily has a timely article from <a title="Science Daily article New Insight Into Brain's Decision-Making Process" href="http://www.sciencedaily.com/releases/2010/03/100311123620.htm" target="_blank">2010 March 13 on how we really make decisions</a>, our brain activity during decision making.&#160; I also like the <a title="The Institute for Strategic Clarity Decision Making Process Map" href="http://www.instituteforstrategicclarity.org/dmp.htm" target="_blank">map</a> from The Institute for Strategic Clarity.&#160; Mindtools sets out a <a title="Mindtools.com Techniques for Decision Making" href="http://www.mindtools.com/pages/main/newMN_TED.htm" target="_blank">list of techniques and tools</a> for aiding in the decision making process, and provides an important caveat "<em>Do remember, though, that the tools in this chapter exist only to assist your intelligence and common sense. 
These are your most important assets in good Decision Making</em>".&#160; Reading through various reviews, the one book on decision making that I want to add to my library is <a title="Amazon Link to The Managerial Decision-Making Process" href="http://www.amazon.com/gp/product/0395908213?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0395908213" target="_blank"><strong>The Managerial Decision-Making Process</strong></a>, 5th ed. by E. Frank Harrison.&#160; From the Glossary of Political Economy Terms, we have:</span></p>
<p><br /></p>
<blockquote>
<p><span style="font-family: verdana,geneva;">Where formal organizations are the setting in which decisions are made, the particular decisions or policies chosen by decision-makers can often be explained through reference to the organization's particular structure and procedural rules. Such explanations typically involve looking at the distribution of responsibilities among organizational sub-units, the activities of committees and ad hoc coordinating groups, meeting schedules, rules of order etc. The notion of fixed-in-advance standard operating procedures (SOPs) typically plays an important role in such explanations of individual decisions made. -- </span><a title="Auburn University Paul M. Johnson Definition of the Organizational Process Model of Decision Making" href="http://www.auburn.edu/~johnspm/gloss/organizational_process" target="_blank"><span style="font-family: verdana,geneva;">Organizational process models of decision-making</span></a></p>
</blockquote>
<p><br /></p>
<p><span style="font-family: verdana,geneva;">Let's revisit and expand upon</span> <span style="font-family: verdana,geneva;">the summary that we gave in the <a title="Questions and Commonality, Third Post in the OSS DSS Study Guide" href="http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality" target="_self">third post</a> in this series.</span></p>
<ol>
<li><span style="font-family: verdana,geneva;">As an individual faced with making a decision, I may want input from others, and I may want consensus, but in the end it is an individual decision, and I will bear the consequences of having made it.</span></li>
<li><p><span style="font-family: verdana,geneva;">I need to put the problem, and my decision making, into context.&#160; I have a variety of resources at my disposal to do so:</span> </p>
<ul>
<li><span style="font-family: verdana,geneva;">historical data</span></li>
<li><span style="font-family: verdana,geneva;">current information</span></li>
<li><span style="font-family: verdana,geneva;">structured data from transactional systems, master data, metadata, data warehouse, and other possible sources</span></li>
<li><span style="font-family: verdana,geneva;">unstructured data from blogs, wikis, Zotero libraries, Evernote, searches, bookmarks and similar sources</span></li>
<li><span style="font-family: verdana,geneva;">email</span></li>
<li><span style="font-family: verdana,geneva;">non-electronic correspondence, notes and conversations</span></li>
<li><span style="font-family: verdana,geneva;">personal experience</span></li>
<li><span style="font-family: verdana,geneva;">the experience of others garnered through water cooler and hallway conversations, formal meetings, twitter, phone calls and the like<br /></span></li>
</ul>
</li>
<li><span style="font-family: verdana,geneva;">Now I need to understand all of the facts, opinions and conjecture at my disposal.&#160; Part of this is sifting all of it through my internal filters, using my "gut".&#160; Part is using the various reporting and analytical tools at my disposal, and then filtering their output through my gut.&#160; And really, this and the next point will constitute the majority of this OSS DSS Study Guide - the tools we use.</span></li>
<li><span style="font-family: verdana,geneva;">As I contemplate the various decisions that I might make from all of this, I want to understand the consequences of each potential decision: might this decision lead to a better product, more profit, less profit, broader market penetration, higher reliability, or even an alternate universe?</span></li>
<li><span style="font-family: verdana,geneva;">As I make this decision, I'll want to collaborate with others.&#160; Ideally, I'll want to collaborate within the context of my decision support system. Once upon a time we would do this by embedding the tools within a portal system; now we take a more master data management approach, and use a service-oriented architecture with either web services description language (WSDL) or representational state transfer (ReST) application programming interfaces (APIs) to the collaborative environment, usually a wiki.<br /></span></li>
</ol>
<p><span style="font-family: verdana,geneva;">In summary, this introduction has set up a framework for a decision-making process for an individual to use a decision support system.&#160; The majority of this study guide will explore the actual decision support system, and the open source tools from which we can build such a system.</span></p>

<!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/04/19/oss-dss-studies-introduction">
  <dc:title>OSS DSS Studies Introduction</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-04-12 15:18:35</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->
<div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/04/19/oss-dss-studies-introduction">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/04/19/oss-dss-studies-introduction#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=929</wfw:commentRss>
		</item>
				<item>
			<title>Syllabus for OSS DSS Studies</title>
			<link>http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies</link>
			<pubDate>Tue, 23 Mar 2010 20:23:01 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">Open Source</category>
<category domain="alt">Business Intelligence</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">927@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;As promised, here's the syllabus for our study guide to decision support systems using open source solutions.  We'll start with a first draft on 2010-03-23, and update and change it based on ideas, comments and lessons learned.  So, please comment.  &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_smile.gif&quot; alt=&quot;&amp;#58;&amp;#41;&quot; class=&quot;middle&quot; /&gt;  The updates will be marked.  Deletions will be marked with a &lt;del&gt;strike-through&lt;/del&gt; and not removed.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Introduction&lt;ol&gt;&lt;li&gt;Continuing the discussion of the processes and technologies that constitute a decision support system&lt;/li&gt;&lt;li&gt;Formalizing a definition of DSS as well as the components, such as business intelligence (BI) that contribute to a DSS&lt;/li&gt;&lt;li&gt;Providing [and updating] the list of references for this study guide&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Preparation&lt;ol&gt;&lt;li&gt;Discussing the technology for use in this study guide including the client(s) and server (Red Hat Enterprise Linux 5)&lt;/li&gt;&lt;li&gt;Checking for prerequisites for the open source solutions that will be used&lt;/li&gt;&lt;li&gt;Hands-on exercises for preparing the system&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Installation&lt;ol&gt;&lt;li&gt;Pointers and examples for installing the open source server-side packages including but not limited to:&lt;ol&gt;&lt;li&gt;LucidDB&lt;/li&gt;&lt;li&gt;Pentaho BI-Server, including PAT, and Administrative Console&lt;/li&gt;&lt;li&gt;RServe and/or RApache&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;Pointers for installation of client-side software and some examples on MacOSX&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Modeling&lt;ol&gt;&lt;li&gt;Generally, we would determine the models, the architecture and then one (or more competing) design(s) to satisfy that architecture, including selecting the right technical solutions for the job at hand.  Here, we're creating a learning environment for certain tools, so we're introducing the architecture and design studies after the technology installs.&lt;/li&gt;&lt;li&gt;In general, this section will explore the various means of modeling processes, systems and data, specifically as these relate to making decisions.&lt;/li&gt;&lt;li&gt;Decision Making Processes&lt;ol&gt;&lt;li&gt;Decision Theory&lt;/li&gt;&lt;li&gt;Game Theory&lt;/li&gt;&lt;li&gt;Machine Learning &amp;amp; Data Mining&lt;/li&gt;&lt;li&gt;Bayes and Iterations&lt;/li&gt;&lt;li&gt;Predictives&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;Information Flow&lt;/li&gt;&lt;li&gt;Mathematical Modeling&lt;/li&gt;&lt;li&gt;Data Modeling&lt;/li&gt;&lt;li&gt;UML&lt;/li&gt;&lt;li&gt;Dimensional Modeling&lt;/li&gt;&lt;li&gt;PMML&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Architecture and Design&lt;ol&gt;&lt;li&gt;In this section, we'll examine the differences between enterprise and system architecture, and between architecture and design.  We'll look at various architectural and design elements that might influence both policy and technology directions.&lt;/li&gt;&lt;li&gt;Discussing Enterprise Architecture, especially the translation between the user needs and technology/operational realities&lt;/li&gt;&lt;li&gt;System Architecture&lt;/li&gt;&lt;li&gt;SOA, ReST, WSDL, and Master Data Management&lt;/li&gt;&lt;li&gt;Technology selection and vendor bake-offs&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Implementation Considerations&lt;ol&gt;&lt;li&gt;Discussing the various philosophies and considerations for implementing any DSS, or really, any system integration project.  We'll look at our own three-track implementation methodology, as well as how the new Pentaho Agile BI tools support our method.  In addition, we'll consider how we'll get all these OSS tools working together, on the same data sets, as well as the importance of managing data about the data.&lt;/li&gt;&lt;li&gt;Pentaho Agile BI and our own 8D&amp;#8482; Method&lt;/li&gt;&lt;li&gt;System and Data Integration&lt;/li&gt;&lt;li&gt;Metadata&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
  &lt;li&gt;Using the Tools&lt;ol&gt;&lt;li&gt;This is the vaguest part of our syllabus.  We'll be using the examples from our various references, but with the system we've set up here, rather than the exact systems that the references use.  For example, we'll be using LucidDB and not MySQL for the examples from Pentaho Solutions.  Remember too, that this is a study guide, and not &lt;del&gt;a oops&lt;/del&gt; meant to be a book written as a series of blog posts, so while we might vary from the reference materials, we'll always refer to them.&lt;/li&gt;&lt;li&gt;ETL&lt;/li&gt;&lt;li&gt;Reporting&lt;/li&gt;&lt;li&gt;OLAP&lt;/li&gt;&lt;li&gt;Data Mining &amp;amp; Machine Learning&lt;/li&gt;&lt;li&gt;Statistical Analysis&lt;/li&gt;&lt;li&gt;Predictives&lt;/li&gt;&lt;li&gt;Workflow&lt;/li&gt;&lt;li&gt;Collaboration&lt;/li&gt;&lt;li&gt;Hmm, this should take years &lt;img src=&quot;http://press.teleinteractive.net/rsc/smilies/icon_biggrin.gif&quot; alt=&quot;&amp;#58;&amp;#68;&quot; class=&quot;middle&quot; /&gt; &lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;!-- Creative Commons License --&gt; &lt;div class=&quot;cc_license&quot;&gt;&lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;&lt;img src=&quot;http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png&quot; alt=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot; /&gt;&lt;/a&gt;Except where otherwise noted, this content is&lt;br /&gt;licensed under a &lt;a rel=&quot;cc:license&quot; href=&quot;http://creativecommons.org/licenses/by-nc-sa/3.0/&quot; title=&quot;Creative Commons License: Attribution, Non-Commercial, Share-Alike&quot;&gt;Creative Commons License&lt;/a&gt;.&lt;/div&gt; &lt;!-- /Creative Commons License --&gt;
&lt;!--
&lt;rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot;
    xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;
    xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;Work rdf:about=&quot;http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies&quot;&gt;
  &lt;dc:title&gt;Syllabus for OSS DSS Studies&lt;/dc:title&gt;
  &lt;dc:creator&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:creator&gt;
  &lt;dc:rights&gt;&lt;Agent&gt;
    &lt;dc:title&gt;Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)&lt;/dc:title&gt;
  &lt;/Agent&gt;&lt;/dc:rights&gt;
  &lt;dc:date&gt;2010-03-23 15:23:01&lt;/dc:date&gt;
  &lt;license rdf:resource=&quot;by-nc-sa/3.0/&quot; /&gt;
&lt;/Work&gt;
&lt;License rdf:about=&quot;by-nc-sa/3.0/&quot;&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /&gt;
  &lt;permits rdf:resource=&quot;http://web.resource.org/cc/DerivativeWorks&quot; /&gt;
  &lt;prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/Attribution&quot; /&gt;
  &lt;requires rdf:resource=&quot;http://web.resource.org/cc/ShareAlike&quot; /&gt;
&lt;/License&gt;
&lt;/rdf:RDF&gt;
--&gt;
&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>As promised, here's the syllabus for our study guide to decision support systems using open source solutions.  We'll start with a first draft on 2010-03-23, and update and change it based on ideas, comments and lessons learned.  So, please comment.  <img src="http://press.teleinteractive.net/rsc/smilies/icon_smile.gif" alt="&#58;&#41;" class="middle" />  The updates will be marked.  Deletions will be marked with a <del>strike-through</del> and not removed.</p>

<ol>
  <li>Introduction<ol><li>Continuing the discussion of the processes and technologies that constitute a decision support system</li><li>Formalizing a definition of DSS as well as the components, such as business intelligence (BI) that contribute to a DSS</li><li>Providing [and updating] the list of references for this study guide</li></ol></li>
  <li>Preparation<ol><li>Discussing the technology for use in this study guide including the client(s) and server (Red Hat Enterprise Linux 5)</li><li>Checking for prerequisites for the open source solutions that will be used</li><li>Hands-on exercises for preparing the system</li></ol></li>
  <li>Installation<ol><li>Pointers and examples for installing the open source server-side packages including but not limited to:<ol><li>LucidDB</li><li>Pentaho BI-Server, including PAT, and Administrative Console</li><li>RServe and/or RApache</li></ol></li><li>Pointers for installation of client-side software and some examples on MacOSX</li></ol></li>
  <li>Modeling<ol><li>Generally, we would determine the models, the architecture and then one (or more competing) design(s) to satisfy that architecture, including selecting the right technical solutions for the job at hand.  Here, we're creating a learning environment for certain tools, so we're introducing the architecture and design studies after the technology installs.</li><li>In general, this section will explore the various means of modeling processes, systems and data, specifically as these relate to making decisions.</li><li>Decision Making Processes<ol><li>Decision Theory</li><li>Game Theory</li><li>Machine Learning &amp; Data Mining</li><li>Bayes and Iterations</li><li>Predictives</li></ol></li><li>Information Flow</li><li>Mathematical Modeling</li><li>Data Modeling</li><li>UML</li><li>Dimensional Modeling</li><li>PMML</li></ol></li>
  <li>Architecture and Design<ol><li>In this section, we'll examine the differences between enterprise and system architecture, and between architecture and design.  We'll look at various architectural and design elements that might influence both policy and technology directions.</li><li>Discussing Enterprise Architecture, especially the translation between the user needs and technology/operational realities</li><li>System Architecture</li><li>SOA, ReST, WSDL, and Master Data Management</li><li>Technology selection and vendor bake-offs</li></ol></li>
  <li>Implementation Considerations<ol><li>Discussing the various philosophies and considerations for implementing any DSS, or really, any system integration project.  We'll look at our own three-track implementation methodology, as well as how the new Pentaho Agile BI tools support our method.  In addition, we'll consider how we'll get all these OSS tools working together, on the same data sets, as well as the importance of managing data about the data.</li><li>Pentaho Agile BI and our own 8D&#8482; Method</li><li>System and Data Integration</li><li>Metadata</li></ol></li>
  <li>Using the Tools<ol><li>This is the vaguest part of our syllabus.  We'll be using the examples from our various references, but with the system we've set up here, rather than the exact systems that the references use.  For example, we'll be using LucidDB and not MySQL for the examples from Pentaho Solutions.  Remember too, that this is a study guide, and not <del>a oops</del> meant to be a book written as a series of blog posts, so while we might vary from the reference materials, we'll always refer to them.</li><li>ETL</li><li>Reporting</li><li>OLAP</li><li>Data Mining &amp; Machine Learning</li><li>Statistical Analysis</li><li>Predictives</li><li>Workflow</li><li>Collaboration</li><li>Hmm, this should take years <img src="http://press.teleinteractive.net/rsc/smilies/icon_biggrin.gif" alt="&#58;&#68;" class="middle" /> </li></ol></li>
</ol>

<!-- Creative Commons License --> <div class="cc_license"><a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License: Attribution, Non-Commercial, Share-Alike" /></a>Except where otherwise noted, this content is<br />licensed under a <a rel="cc:license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" title="Creative Commons License: Attribution, Non-Commercial, Share-Alike">Creative Commons License</a>.</div> <!-- /Creative Commons License -->
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<Work rdf:about="http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies">
  <dc:title>Syllabus for OSS DSS Studies</dc:title>
  <dc:creator><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:creator>
  <dc:rights><Agent>
    <dc:title>Joseph A. di Paolantonio (User ID#4 at http://press.teleinteractive.net/)</dc:title>
  </Agent></dc:rights>
  <dc:date>2010-03-23 15:23:01</dc:date>
  <license rdf:resource="by-nc-sa/3.0/" />
</Work>
<License rdf:about="by-nc-sa/3.0/">
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <requires rdf:resource="http://web.resource.org/cc/Attribution" />
  <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
</License>
</rdf:RDF>
-->
<div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/03/23/syllabus-for-oss-dss-studies#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=927</wfw:commentRss>
		</item>
				<item>
			<title>Questions and Commonality</title>
			<link>http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality</link>
			<pubDate>Fri, 19 Mar 2010 19:39:23 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">books</category>
<category domain="alt">Open Source</category>
<category domain="alt">Business Intelligence</category>
<category domain="alt">Predictions</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">925@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;In the &lt;a href=&quot;http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide&quot;&gt;introduction&lt;/a&gt; to our &lt;a href=&quot;http://press.teleinteractive.net/oss/DSS-SG/&quot;&gt;open source solutions (OSS) for decision support systems (DSS) study guide (SG)&lt;/a&gt;, I gave a variety of examples of activities that might be considered using a DSS.  I asked some questions as to what common elements exist among these activities that might help us to define a modern platform for DSS, and whether or not we could build such a system using open source solutions.&lt;/p&gt;

&lt;p&gt;In this post, let's examine the first of those questions, and see if we can start answering it.  In the next post, we will lay out a syllabus of sorts for this OSS DSS SG.&lt;/p&gt;

&lt;p&gt;The first common element is that in all cases, we have an individual doing the activity, not a machine nor a committee.&lt;/p&gt;

&lt;p&gt;Secondly, the individual has some resources at their disposal.  Those resources include current and historical information, structured and unstructured data, communiqu&amp;#233;s and opinions, and some amount of personal experience, augmented by the experience of others.&lt;/p&gt;

&lt;p&gt;Thirdly, though not explicit, there's the idea of digesting these resources and performing formal or informal analyses.&lt;/p&gt;

&lt;p&gt;Fourthly, though again, not explicit, the concept of trying to predict what might happen next, or as a result of the decision is inherent to all of the examples.&lt;/p&gt;

&lt;p&gt;Finally, there's collaboration involved.  Few of us can make good decisions in a vacuum. &lt;/p&gt;

&lt;p&gt;Of course, since the examples are fictional, and created by us, they represent our biases.  If you had fingered our domain server back in 1993, or read our .project and .plan files from that time, you would have seen that we were interested in sharing information and analyses, while providing a framework for making decisions using such tools as email, gopher and electronic bulletin boards.  So, if you identify any other commonalities, or think anything is missing, please join the discussion in the comments.&lt;/p&gt;

&lt;p&gt;From these commonalities, can we begin to answer the first question we had asked: &quot;What does this term [DSS] really mean?&quot;.  Let's try.&lt;/p&gt;

&lt;p&gt;A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.&lt;/p&gt;

&lt;p&gt;That's nice and vague; generic enough to be almost meaningless, but it provides some key points that will help us to bound the specifics as we go along.  For example, if a process or technology doesn't help us to make a better decision, then it doesn't fit.  If something allows us to make a better decision, but we can't define the process or identify the technology involved, it doesn't belong (e.g. &quot;my gut tells me so&quot;).&lt;/p&gt;

&lt;p&gt;Let's create a list from all of the above.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Individual Decision Maker&lt;/li&gt;
  &lt;li&gt;Process&lt;/li&gt;
  &lt;li&gt;Technology&lt;/li&gt;
  &lt;li&gt;Structured Data&lt;/li&gt;
  &lt;li&gt;Unstructured Data&lt;/li&gt;
  &lt;li&gt;Historical Information&lt;/li&gt;
  &lt;li&gt;Current Information&lt;/li&gt;
  &lt;li&gt;Communication&lt;/li&gt;
  &lt;li&gt;Opinion&lt;/li&gt;
  &lt;li&gt;Collaboration&lt;/li&gt;
  &lt;li&gt;Analysis&lt;/li&gt;
  &lt;li&gt;Prediction&lt;/li&gt;
  &lt;li&gt;Personal Experience&lt;/li&gt;
  &lt;li&gt;Others' Experience&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What do you think?  Does a modern system to support decisions need to cover all of these elements and no others?  Is this list complete and sufficient?  The comments are open.&lt;/p&gt;

&lt;!-- Creative Commons License NEWSTYLE license:by-nc-sa nation:3.0 display:new --&gt;&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>In the <a href="http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide">introduction</a> to our <a href="http://press.teleinteractive.net/oss/DSS-SG/">open source solutions (OSS) for decision support systems (DSS) study guide (SG)</a>, I gave a variety of examples of activities that might be considered using a DSS.  I asked some questions as to what common elements exist among these activities that might help us to define a modern platform for DSS, and whether or not we could build such a system using open source solutions.</p>

<p>In this post, let's examine the first of those questions and see if we can start answering it.  In the next post, we will lay out a syllabus of sorts for this OSS DSS SG.</p>

<p>The first common element is that in all cases, we have an individual doing the activity, not a machine nor a committee.</p>

<p>Secondly, the individual has some resources at their disposal.  Those resources include current and historical information, structured and unstructured data, communiqu&#233;s and opinions, and some amount of personal experience, augmented by the experience of others.</p>

<p>Thirdly, though not explicit, there's the idea of digesting these resources and performing formal or informal analyses.</p>

<p>Fourthly, though again not explicit, the concept of trying to predict what might happen next, or as a result of the decision, is inherent to all of the examples.</p>

<p>Finally, there's collaboration involved.  Few of us can make good decisions in a vacuum.</p>

<p>Of course, since the examples are fictional, and created by us, they represent our biases.  If you had fingered our domain server back in 1993, or read our .project and .plan files from that time, you would have seen that we were interested in sharing information and analyses, while providing a framework for making decisions using such tools as email, gopher and electronic bulletin boards.  So, if you identify any other commonalities, or think anything is missing, please join the discussion in the comments.</p>

<p>From these commonalities, can we begin to answer the first question we had asked: "What does this term [DSS] really mean?".  Let's try.</p>

<p>A DSS is a set of processes and technology that help an individual to make a better decision than they could without the DSS.</p>

<p>That's nice and vague; generic enough to be almost meaningless, but it provides some key points that will help us to bound the specifics as we go along.  For example, if a process or technology doesn't help us to make a better decision, then it doesn't fit.  If something allows us to make a better decision, but we can't define the process or identify the technology involved, it doesn't belong (e.g. "my gut tells me so").</p>

<p>Let's create a list from all of the above.</p>

<ol>
  <li>Individual Decision Maker</li>
  <li>Process</li>
  <li>Technology</li>
  <li>Structured Data</li>
  <li>Unstructured Data</li>
  <li>Historical Information</li>
  <li>Current Information</li>
  <li>Communication</li>
  <li>Opinion</li>
  <li>Collaboration</li>
  <li>Analysis</li>
  <li>Prediction</li>
  <li>Personal Experience</li>
  <li>Others' Experience</li>
</ol>

<p>What do you think?  Does a modern system to support decisions need to cover all of these elements and no others?  Is this list complete and sufficient?  The comments are open.</p>

<!-- Creative Commons License NEWSTYLE license:by-nc-sa nation:3.0 display:new --><div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/03/19/questions-and-commonality#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=925</wfw:commentRss>
		</item>
				<item>
			<title>First DSS Study Guide</title>
			<link>http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide</link>
			<pubDate>Mon, 15 Mar 2010 20:52:02 +0000</pubDate>			<dc:creator>Joseph A. di Paolantonio</dc:creator>
			<category domain="alt">Business Intelligence</category>
<category domain="alt">Predictions</category>
<category domain="main">OSS DSS Study Guide</category>			<guid isPermaLink="false">904@http://press.teleinteractive.net/</guid>
						<description>&lt;p&gt;Someone sitting in their study, looking at their books, journals, piles of scholarly periodicals and files of correspondence with learned colleagues probably didn't think that they were looking at their decision support system, but they were.&lt;/p&gt;

&lt;p&gt;Someone sitting on the plains, looking at the conditions around them, smoke signals from distant tribe members, records knotted into a string, probably didn't think that they were looking at their decision support system, but they were.&lt;/p&gt;

&lt;p&gt;Someone at the nexus of a modern military command, control, communications, computing and intelligence system, probably didn't think that they were looking at their decision support system, but they were.&lt;/p&gt;

&lt;p&gt;Someone pulling data from transactional systems, and dumping the results of reports &amp;amp; analyses from a BI tool into a spreadsheet to feed a dashboard for the executives of a huge corporation probably didn't think that they were looking at their decision support system, but they were.&lt;/p&gt;

&lt;p&gt;The term &quot;decision support system&quot; has been in use for over 50 years, perhaps longer.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;But what does this term really mean?&lt;/li&gt;
  &lt;li&gt;What do all of my examples have in common?&lt;/li&gt;
  &lt;li&gt;How can we build a reasonable decision support system from open source solutions?&lt;/li&gt;
  &lt;li&gt;What resources exist to help us learn?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm starting a series of posts, essentially a &quot;study guide&quot;, to help answer these questions.&lt;/p&gt;

&lt;p&gt;I'll be drawing from and pointing to the following books and online resources as we install, configure and use open source systems to create a technical platform for a decision support system.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-92297-3&quot;&gt;Bayesian Computation in R&lt;/a&gt; by &lt;a href=&quot;http://bayes.bgsu.edu/&quot;&gt;Jim Albert&lt;/a&gt;, Springer Series in UseR!, ISBN: 0-38-792297-0, &lt;a href=&quot;http://www.amazon.com/gp/product/0387922970?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0387922970&quot;&gt;Purchase from Amazon&lt;/a&gt;, you can also purchase the Kindle ebook from Amazon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://oreilly.com/catalog/9780596801717&quot;&gt;R in a Nutshell&lt;/a&gt; by &lt;a href=&quot;http://twitter.com/jadler&quot;&gt;Joseph Adler&lt;/a&gt;, ISBN: 0-59-680170-X, &lt;a href=&quot;http://www.amazon.com/gp/product/059680170X?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=059680170X&quot;&gt;Purchase from Amazon&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html&quot;&gt;Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL&lt;/a&gt; by &lt;a href=&quot;http://twitter.com/rolandbouman&quot;&gt;Roland Bouman&lt;/a&gt; and &lt;a href=&quot;http://twitter.com/JosvanDongen&quot;&gt;Jos van Dongen&lt;/a&gt;, ISBN: 0-47-048432-2, &lt;a href=&quot;http://www.amazon.com/gp/product/0470484322?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470484322&quot;&gt;Purchase from Amazon&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers?utm_source=press.teleinteractive.net&amp;amp;utm_medium=bookrev&amp;amp;utm_content=blog&amp;amp;utm_campaign=mdb_001537&quot;&gt;Pentaho Reporting 3.5 for Java Developers&lt;/a&gt; by &lt;a href=&quot;http://twitter.com/wpgorman&quot;&gt;Will Gorman&lt;/a&gt;, ISBN: 1-84-719319-6, &lt;a href=&quot;http://www.amazon.com/gp/product/1847193196?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1847193196&quot;&gt;Purchase from Amazon&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html&quot;&gt;Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration&lt;/a&gt; by &lt;a href=&quot;http://twitter.com/mattcasters&quot;&gt;Matt Casters&lt;/a&gt;, &lt;a href=&quot;http://twitter.com/rolandbouman&quot;&gt;Roland Bouman&lt;/a&gt; &amp;amp; &lt;a href=&quot;http://twitter.com/JosvanDongen&quot;&gt;Jos van Dongen&lt;/a&gt;, ISBN: 0-47-063517-7 due 2010 September, &lt;a href=&quot;http://www.amazon.com/gp/product/0470635177?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0470635177&quot;&gt;Pre-Order from Amazon&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cs.waikato.ac.nz/~ml/weka/book.html&quot;&gt;Data Mining: Practical Machine Learning Tools and Techniques&lt;/a&gt; by &lt;a href=&quot;http://www.cs.waikato.ac.nz/~ihw/&quot;&gt;Ian H. Witten&lt;/a&gt; and &lt;a href=&quot;http://www.cs.waikato.ac.nz/~eibe/&quot;&gt;Eibe Frank&lt;/a&gt;, Second Edition, Morgan-Kaufmann Series in Data Management Systems, ISBN: 0-12-088407-0 a.k.a. &quot;The Weka Book&quot;, &lt;a href=&quot;http://www.amazon.com/gp/product/0120884070?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0120884070&quot;&gt;Purchase from Amazon&lt;/a&gt;, &lt;a href=&quot;http://www.amazon.com/gp/product/0123748569?ie=UTF8&amp;amp;tag=intersystecon-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0123748569&quot;&gt;Pre-Order the Third Edition&lt;/a&gt;, you can also purchase the Kindle ebook from Amazon&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.luciddb.org/&quot;&gt;LucidDB&lt;/a&gt; online &lt;a href=&quot;http://pub.eigenbase.org/wiki/LucidDbDocs&quot;&gt;documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Pertinent information from &lt;a href=&quot;http://www.eigenbase.org/&quot;&gt;Eigenbase&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://n2.nabble.com/luciddb-users-f1374590.html&quot;&gt;LucidDB mailing list archive&lt;/a&gt; on Nabble&lt;/li&gt;
  &lt;li&gt;Anything I can find on &lt;a href=&quot;http://code.google.com/p/pentahoanalysistool/&quot;&gt;PAT&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.pentaho.com/index.php&quot;&gt;Pentaho&lt;/a&gt; Community &lt;a href=&quot;http://forums.pentaho.org/&quot;&gt;Forums&lt;/a&gt;, &lt;a href=&quot;http://wiki.pentaho.com/display/COM/Community+Wiki+Home&quot;&gt;Wiki&lt;/a&gt;, &lt;a href=&quot;http://wiki.pentaho.com/display/COM/Community+Technical+WebEx&quot;&gt;WebEx Events&lt;/a&gt;, and other community sources&lt;/li&gt;
  &lt;li&gt;R &lt;a href=&quot;http://www.r-project.org/mail.html&quot;&gt;Mailing Lists&lt;/a&gt; and &lt;a href=&quot;http://groups.google.com/group/rapache&quot;&gt;Forums&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Various &lt;a href=&quot;http://www.r-project.org/doc/bib/R-books.html&quot;&gt;Books in PDF from The R&lt;/a&gt; Project&lt;/li&gt;
  &lt;li&gt;Information Management and Open Source Solution Blogs from our side-column linkblogs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this study guide series of posts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;I'll show how data warehousing (DW) and business intelligence (BI) can be extended to include all the elements held in common from my DSS examples.&lt;/li&gt;
  &lt;li&gt;We'll examine the open source solutions Pentaho, R, Rserve, Rapache, LucidDB and possibly Map-Reduce &amp;amp; key-value stores, and the related open source projects, communities and companies, in terms of how they can be used to create a DSS.&lt;/li&gt;
  &lt;li&gt;I would like to add a collaboration tool to the mix, as we do in our implementation projects, possibly &lt;a href=&quot;http://www.mindtouch.com/&quot;&gt;Mindtouch&lt;/a&gt;, a ReSTful Wiki Platform.&lt;/li&gt;
  &lt;li&gt;We may add one non-open source package, &lt;a href=&quot;http://SQLstream.com/&quot;&gt;SQLstream&lt;/a&gt;, that's built upon open source elements from Eigenbase.  This will allow us to add a real-time component to our DSS.&lt;/li&gt;
  &lt;li&gt;I'll give my own experience in installing these packages and getting them to work together, with pointers to the resources listed above.&lt;/li&gt;
  &lt;li&gt;We'll explore sample and public data sets with the DSS environment we created, again with pointers to and help from the resources listed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This series of posts is meant to be a study guide, not an online book written as a blog.  The goal is to help us to define a modern DSS and build it out of open source solutions, while using existing resources.&lt;/p&gt;

&lt;p&gt;Please feel free to comment, especially if there is anything that you feel should be included beyond what I've outlined here.&lt;/p&gt;

&lt;!-- Creative Commons License NEWSTYLE license:by-nc-sa nation:3.0 display:new --&gt;&lt;div class=&quot;item_footer&quot;&gt;&lt;p&gt;&lt;small&gt;&lt;a href=&quot;http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide&quot;&gt;Original post&lt;/a&gt; blogged on &lt;a href=&quot;http://b2evolution.net/&quot;&gt;b2evolution&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;&lt;/div&gt;</description>
			<content:encoded><![CDATA[<p>Someone sitting in their study, looking at their books, journals, piles of scholarly periodicals and files of correspondence with learned colleagues probably didn't think that they were looking at their decision support system, but they were.</p>

<p>Someone sitting on the plains, looking at the conditions around them, smoke signals from distant tribe members, records knotted into a string, probably didn't think that they were looking at their decision support system, but they were.</p>

<p>Someone at the nexus of a modern military command, control, communications, computing and intelligence system, probably didn't think that they were looking at their decision support system, but they were.</p>

<p>Someone pulling data from transactional systems, and dumping the results of reports &amp; analyses from a BI tool into a spreadsheet to feed a dashboard for the executives of a huge corporation probably didn't think that they were looking at their decision support system, but they were.</p>

<p>The term "decision support system" has been in use for over 50 years, perhaps longer.</p>

<ul>
  <li>But what does this term really mean?</li>
  <li>What do all of my examples have in common?</li>
  <li>How can we build a reasonable decision support system from open source solutions?</li>
  <li>What resources exist to help us learn?</li>
</ul>

<p>I'm starting a series of posts, essentially a "study guide", to help answer these questions.</p>

<p>I'll be drawing from and pointing to the following books and online resources as we install, configure and use open source systems to create a technical platform for a decision support system.</p>

<ol>
  <li><a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-92297-3">Bayesian Computation in R</a> by <a href="http://bayes.bgsu.edu/">Jim Albert</a>, Springer Series in UseR!, ISBN: 0-38-792297-0, <a href="http://www.amazon.com/gp/product/0387922970?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0387922970">Purchase from Amazon</a>, you can also purchase the Kindle ebook from Amazon</li>
  <li><a href="http://oreilly.com/catalog/9780596801717">R in a Nutshell</a> by <a href="http://twitter.com/jadler">Joseph Adler</a>, ISBN: 0-59-680170-X, <a href="http://www.amazon.com/gp/product/059680170X?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=059680170X">Purchase from Amazon</a></li>
  <li><a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html">Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL</a> by <a href="http://twitter.com/rolandbouman">Roland Bouman</a> and <a href="http://twitter.com/JosvanDongen">Jos van Dongen</a>, ISBN: 0-47-048432-2, <a href="http://www.amazon.com/gp/product/0470484322?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0470484322">Purchase from Amazon</a></li>
  <li><a href="http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers?utm_source=press.teleinteractive.net&amp;utm_medium=bookrev&amp;utm_content=blog&amp;utm_campaign=mdb_001537">Pentaho Reporting 3.5 for Java Developers</a> by <a href="http://twitter.com/wpgorman">Will Gorman</a>, ISBN: 1-84-719319-6, <a href="http://www.amazon.com/gp/product/1847193196?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1847193196">Purchase from Amazon</a></li>
  <li><a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html">Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration</a> by <a href="http://twitter.com/mattcasters">Matt Casters</a>, <a href="http://twitter.com/rolandbouman">Roland Bouman</a> &amp; <a href="http://twitter.com/JosvanDongen">Jos van Dongen</a>, ISBN: 0-47-063517-7 due 2010 September, <a href="http://www.amazon.com/gp/product/0470635177?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0470635177">Pre-Order from Amazon</a></li>
  <li><a href="http://www.cs.waikato.ac.nz/~ml/weka/book.html">Data Mining: Practical Machine Learning Tools and Techniques</a> by <a href="http://www.cs.waikato.ac.nz/~ihw/">Ian H. Witten</a> and <a href="http://www.cs.waikato.ac.nz/~eibe/">Eibe Frank</a>, Second Edition, Morgan-Kaufmann Series in Data Management Systems, ISBN: 0-12-088407-0 a.k.a. "The Weka Book", <a href="http://www.amazon.com/gp/product/0120884070?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0120884070">Purchase from Amazon</a>, <a href="http://www.amazon.com/gp/product/0123748569?ie=UTF8&amp;tag=intersystecon-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0123748569">Pre-Order the Third Edition</a>, you can also purchase the Kindle ebook from Amazon</li>
  <li><a href="http://www.luciddb.org/">LucidDB</a> online <a href="http://pub.eigenbase.org/wiki/LucidDbDocs">documentation</a></li>
  <li>Pertinent information from <a href="http://www.eigenbase.org/">Eigenbase</a></li>
  <li><a href="http://n2.nabble.com/luciddb-users-f1374590.html">LucidDB mailing list archive</a> on Nabble</li>
  <li>Anything I can find on <a href="http://code.google.com/p/pentahoanalysistool/">PAT</a></li>
  <li><a href="http://www.pentaho.com/index.php">Pentaho</a> Community <a href="http://forums.pentaho.org/">Forums</a>, <a href="http://wiki.pentaho.com/display/COM/Community+Wiki+Home">Wiki</a>, <a href="http://wiki.pentaho.com/display/COM/Community+Technical+WebEx">WebEx Events</a>, and other community sources</li>
  <li>R <a href="http://www.r-project.org/mail.html">Mailing Lists</a> and <a href="http://groups.google.com/group/rapache">Forums</a></li>
  <li>Various <a href="http://www.r-project.org/doc/bib/R-books.html">Books in PDF from The R</a> Project</li>
  <li>Information Management and Open Source Solution Blogs from our side-column linkblogs</li>
</ol>

<p>In this study guide series of posts:</p>
<ul>
  <li>I'll show how data warehousing (DW) and business intelligence (BI) can be extended to include all the elements held in common from my DSS examples.</li>
  <li>We'll examine the open source solutions Pentaho, R, Rserve, Rapache, LucidDB and possibly Map-Reduce &amp; key-value stores, and the related open source projects, communities and companies, in terms of how they can be used to create a DSS.</li>
  <li>I would like to add a collaboration tool to the mix, as we do in our implementation projects, possibly <a href="http://www.mindtouch.com/">Mindtouch</a>, a ReSTful Wiki Platform.</li>
  <li>We may add one non-open source package, <a href="http://SQLstream.com/">SQLstream</a>, that's built upon open source elements from Eigenbase.  This will allow us to add a real-time component to our DSS.</li>
  <li>I'll give my own experience in installing these packages and getting them to work together, with pointers to the resources listed above.</li>
  <li>We'll explore sample and public data sets with the DSS environment we created, again with pointers to and help from the resources listed.</li>
</ul>

<p>This series of posts is meant to be a study guide, not an online book written as a blog.  The goal is to help us to define a modern DSS and build it out of open source solutions, while using existing resources.</p>

<p>Please feel free to comment, especially if there is anything that you feel should be included beyond what I've outlined here.</p>

<!-- Creative Commons License NEWSTYLE license:by-nc-sa nation:3.0 display:new --><div class="item_footer"><p><small><a href="http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide">Original post</a> blogged on <a href="http://b2evolution.net/">b2evolution</a>.</small></p></div>]]></content:encoded>
								<comments>http://press.teleinteractive.net/oss/2010/03/15/first-dss-study-guide#comments</comments>
			<wfw:commentRss>http://press.teleinteractive.net/oss/?tempskin=_rss2&#38;disp=comments&#38;p=904</wfw:commentRss>
		</item>
			</channel>
</rss>
