OpenSQLcamp Play with Data

On November 14th and 15th, I attended openSQLcamp 2009 in Portland, OR. It was a great event, and I was honored to be accepted for a five-minute lightning talk: "I Play with Data". I would like to thank Sheeri (aka tcation) for providing YouTube videos of the lightning talks. Here's mine:

And here's a transcript, with links to things that were mentioned.

Hi, my name is Joseph, and I play with data.

It's good that I followed David [David J. Lutz, Director of Technical Sales Support, Infobright] because part of what I'm looking for, in the solution of how to do statistics with SQL, is column-store databases.

Way back in the 70's & 80's, I was doing pair programming with FORTRAN programmers [laughter in background] :D turning algorithms into software. I was, pair programming, we sat down together, I would write math, they would write software, we did things [mostly in Bayes], through the 80's [mostly with Wendy, who still works with me occasionally].

Then I started playing with data through other people's algorithms using SQL, and relational database management systems, and then later, Business Intelligence systems, and most recently playing a lot with Pentaho, using that.

And I'm going to make a lot of statements, but I really have a question. I know of three ways that I can start doing real statistics with SQL databases. And I want to do real statistics because the most you can get just with AVERAGE, is, assuming that I have a uniform distribution or a normal distribution, and even in many cases, an average isn't necessarily the mean, and the mean is certainly not the best descriptor of the underlying distribution of the data. Right?
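To make the point about AVERAGE concrete, here is a small sketch in Python. It is not from the talk, and the sample values are made up for illustration. On a skewed sample, the arithmetic mean lands far from where most of the data sits, while the median and mode do not.

```python
import statistics

# Hypothetical, right-skewed sample: most values are small, one is huge.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]

mean = statistics.mean(sample)      # what SQL's AVG() would report
median = statistics.median(sample)  # middle of the sorted values
mode = statistics.mode(sample)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

Nine of the ten values are 5 or less, yet the mean is pulled well above all of them by the single outlier.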

So, I can start doing fancier algorithms in SQL, but they're painful. And you know the big-O number, and they're nasty big-O numbers, to do, even if I have a frequency function, to try to arrive at the mean or the mode, simple things.
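As a rough illustration of why even simple descriptors get awkward in plain SQL, here is a sketch using Python's built-in sqlite3 module. The table name `obs`, the column `x`, and the data are assumptions made up for this example, and SQLite merely stands in for whatever database you use. The mean is a one-liner, while the mode needs a grouping pass and the median needs a full sort plus row counting.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table of measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x INTEGER)")
conn.executemany(
    "INSERT INTO obs (x) VALUES (?)",
    [(v,) for v in [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]],
)

# The mean is built in.
(avg_x,) = conn.execute("SELECT AVG(x) FROM obs").fetchone()

# The mode needs a frequency table, a sort on the counts, and a LIMIT.
(mode_x,) = conn.execute(
    "SELECT x FROM obs GROUP BY x ORDER BY COUNT(*) DESC, x LIMIT 1"
).fetchone()

# The median needs the whole column sorted; this averages the one or two
# middle rows, depending on whether the row count is odd or even.
(median_x,) = conn.execute(
    """
    SELECT AVG(x) FROM (
        SELECT x FROM obs ORDER BY x
        LIMIT 2 - (SELECT COUNT(*) FROM obs) % 2
        OFFSET (SELECT (COUNT(*) - 1) / 2 FROM obs)
    )
    """
).fetchone()

print(avg_x, mode_x, median_x)
```

Some databases offer dedicated median or percentile functions, but the portable versions look like the queries above, and each extra statistic adds another sort or grouping pass over the data.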

And if I want to do Bayesian statistics, and a Markov Chain Monte Carlo simulation to get at inferences on mathematical conjugates [snickering in the background] ;) … I'm not going to do this in SQL.

So, I have two other choices that I've been exploring.

Anyone here familiar with the R Project? [Several affirmative responses] Ya! Yeah! All right! I love the R Project, and I'm having a lot of fun with the R Project. The R Project is written in R and C and FORTRAN and there are thousands of packages written in FORTRAN and C and R and I'm doing a lot of nice math with it now, and that's a lot of fun. But everything in R is actually in data sets, and data sets are column-store databases, in memory. And even though you can get 8GB of memory on a laptop now, I run out of memory, frequently, with the type of stuff I do. So, what do I do? I use SQL, because relational database management systems manage data really, really well, and R analyzes the data really, really well, and R speaks SQL through either RODBC, or DBI… Off you go.
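The division of labor described here, where the database holds the data and the analysis side pulls only what it needs, can be sketched as follows. This is Python standing in for R with DBI or RODBC, and SQLite standing in for the real database; the table `obs`, the batch size, and the function name are all hypothetical. The sketch streams rows in batches and keeps running statistics with Welford's online algorithm, so the full column never has to fit in memory.

```python
import sqlite3

# Hypothetical table with 1,000 values; a real one would be far larger.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL)")
conn.executemany(
    "INSERT INTO obs (x) VALUES (?)",
    [(float(v),) for v in range(1, 1001)],
)


def streaming_mean_variance(cursor, batch_size=100):
    """Welford's online algorithm over batches of rows from a DB cursor."""
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        rows = cursor.fetchmany(batch_size)  # only one batch in memory
        if not rows:
            break
        for (x,) in rows:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    # Sample variance; undefined for fewer than two rows.
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


n, mean, variance = streaming_mean_variance(conn.execute("SELECT x FROM obs"))
print(n, mean, variance)
```

This only works for statistics that can be updated one row at a time. Anything that needs the whole data set at once runs into the same memory limit described above.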

So, I would like to use column-store databases, and one of my questions is that I'm looking for a way of speeding this up, so that I can match a column-store data set in R in memory with a column-store database such as Infobright or MonetDB or LucidDB. And do this one-to-one mapping much more efficiently than I can going through ODBC.

Does anyone have any thoughts on this?

[Discussion with someone in the audience - if you read this, please identify yourself in the comments, and thank you for talking to me] Have you heard of TL/R [my error in listening]?

I have not. I've never heard of TL/R.

It's R embedded in PostgreSQL.

OK, yes, I have. Did you say TL or PL?

PL. [PL/R by Joe Conway is back in development and becoming interesting again].

Yeah, PL/R I know. And there's a lot of things like that, but they're basically interfaces.

SQLDF?

SQLDF?

Yeah, which isn't all that mature. It tries to map the name of the dataframe in R, where you're doing your stuff in R, to a table in MySQL [in the weeds]. Which is really what you want, is to prodSQL, is that relationship of the sets, where basically you overloaded the dataframe… so you can access… overloaded the access operator… to go out to a SQL area, however it does it.

OK, so SQLDF.

A third solution that I've been looking at is LucidDB, which is a column-store database with a plug-in architecture, written in Java. And there is the math commons on apache.com [oops] packages which have real statistic packages, probability distribution packages, all sorts of really neat packages, which are essentially Java libraries and I would like to see real statistics written into LucidDB as plug-ins for LucidDB [horn sounds] If anyone is interested. Thank you so much.

The notes taken during the lightning rounds were written by Ben Hengst, and can be found at openSQLcamp Lightening Talks.

That last part is really the most important to me. I'm working with Nick Goodman, who recently started Dynamo Business Intelligence, and with advice from Julian Hyde and others in the Eigenbase community, to develop plugins for LucidDB which might be bundled into ADBMS versions of DynamoDB to do real statistics, making real inferences and real predictions, using the math packages from the Apache Commons, and having a transparent interface to R, so that R isn't limited by in-memory constraints.

Why not join us on irc.freenode.net ##luciddb and discuss it?

Why Open Source for a Friend

This post is in response to "Volunteer for the Greater Good" written by S. Kleiman. I remember that village in Pennsylvania, and the attitudes of my friend at that time. I'm not surprised that you're attracted to open source; I am surprised that you're having trouble with embracing its ideals. We've had an email exchange on this subject, and, as you know, I'm fairly attracted to open source solutions myself. ;) I hadn't seen your blog prior to answering your email, so let me go into a bit more detail here.

"The model contributor is a real geek – a guy in his 20-30’s, single, lives in his parent’s basement, no mortgage, no responsibility other than to pick up his dirty socks (some even have mothers who will do that)." -- "Volunteer for the Greater Good" by S. Kleiman

Wow. What a stereotype, and one that couldn't be further from the truth. Admittedly, during economic downturns, when software developers are forced to take whatever job they can find to put food on the table, many contribute to open source projects, ones that don't have commercial support and ones that do. This helps that open source project and its community. But, it also helps the developers to keep their skills sharp and maintain credibility. Most open source developers get paid. Some are students. Some are entrepreneurs. But most get paid, it's their job. And even if it's not their job, projects have learned to give back to communities.

While there are hundreds of thousands of open source projects on Sourceforge.net and other forges, many have never gone beyond the proposal stage, and have nothing to download. The number of active open source projects does number in the tens of thousands, and that is still pretty amazing. The idea that the great unwashed contribute to these projects whilst Mom does laundry... Well, that just doesn't wash. :p The vast majority of open source communities are started by 1 - 5 developers, who have a common goal that can be obtained through that specific open source project. They have strict governance in place to assure that the source code in the main project tree can be submitted only by those that have founded the project, or those that have gained a place of respect and trust in the community (a meritocracy) through the value of the code that they have contributed for plugins, through forums, and the like.

Most active open source projects fall into two categories, and many have slipped back and forth between these two.

  1. A labour of love, creating something that no one else has created for the sheer joy of it, or to solve a specific pain point for the lead developer
  2. A commercial endeavor, backed by an organization or organizations to solve their own enterprise needs or those of a specific market

While there are thousands of examples of both types, let me give just a few examples of some developers that I know personally, or companies with which I'm familiar.

Mondrian was founded by Julian Hyde, primarily as a labour of love. I know Julian, and he's an incredibly bright fellow. [And public congratulations to you, Julian and to your wife, on the recent birth of Sebastian]. In addition to being the father of Sebastian and Mondrian, Julian is also the Chief Architect of SQLstream, and a contributor to the Eigenbase project. Not exactly sitting around in the basement, coding away and waiting for Mom to clean up after him. :>> You can read Julian's blog on Open Source OLAP and Stuff, and follow Julian's Twitter stream too. By the way, while Mondrian can still be found on Sourceforge.net under its original license, it is also sponsored by Pentaho, and can be found as Pentaho Analysis, and as the analytical heart of the Pentaho BI Suite, JasperSoft BI Suite and SpagoBI.

Two other fellows had somewhat similar problems to solve and felt that the commercial solutions designed to move data around were simply too bloated, too complex, and prone to failure to boot. I don't believe that these two knew each other, and their problems were different enough to take different forms in the open source solutions that they created. I'm talking about Matt Casters, founder of the KETTLE ETL tool for data warehousing, and Ross Mason, founder of the Mule ESB. Both of them had an itch to scratch, and felt that the best way to scratch it was to create their own software, and leverage the power of the open source communities to refine their back scratchers. KETTLE, too, can now be found in Pentaho, as Pentaho Data Integration. Ross co-founded both Ricston and MuleSource to monetize his brain child, and has done an excellent job with the annual MuleCons. Matt still lives in Belgium, and has been known to share the fine beers produced by a local monastery [Thanks Matt]. You should follow Matt's blog too. Ross lives on the Island of Malta, and Ross blogs about Mule and the Maltese lifestyle.

Let's look at two other projects: Talend and WSO2. Both of these are newer entrants into the ETL and SOA space respectively, and both were started as commercial efforts by companies of the same name. I haven't had the opportunity to sit down with the Talend folk. I have spoken with the founders of WSO2, and they have an incredible passion that simply couldn't be fulfilled with their prior employer. So they founded their company, and their open source product, and haven't looked back. You can follow Sanjiva's Blog to learn more about WSO2 and their approach to open source.

And just one more, and somewhat different example: projects started by multiple educational institutions to meet their unique needs: Kuali for ERP and Sakai for learning management. For another take on commercialization, The rSmart Group contributes to these projects, but is commercializing them as appliances sold to educational institutions. You can read more about this rather different approach to monetizing open source at Chris Coppola's blog.

There are many, many more such examples. Just in the area of data management & analysis, we cover over 60 related open source projects [take a look at the blogroll in the sidebar to the right].

..."they organize themselves into groups of developers and maintainers on an adhoc basis, and on a world-wide basis. And the end products are robust, well developed, and well tested." -- "Volunteer for the Greater Good" by S. Kleiman

I think we've covered my rebuttal to your posting between the first quote and this one. I very much agree with this statement. I'm surprised by your surprise. The organizational dynamics that result in the excellent code that makes up open source projects are the subject of much thought, admiration and research. Here are a few places that you can go for more information.

And just for completeness' sake, here's our email exchange:

From S. Kleiman: "OS is the current bug in my head. I'm trying to understand why my intellectual property should be "open" to the world (according to Richard Stallman).

Yes, I've read the copious amounts of literature on open software and the economics thereof - but I still don't get it. If I apply for a patent on a gadget, and then license companies to make that gadget - isn't that intellectual property? To copy my design, while it doesn't destroy my design, does limit any profit I might gain.

Anyway - how are you? Are you one of the original hackers?
I realized that all this time I thought I had a great practical engineering degree. Instead I realize they made us into hackers - in the best sense of the word.

What is your experience with OS? What are you talking about (besides the title)?
How is the "snow" in CA? "

And my response:

Discussions around open source often get very passionate, so we should be having this conversation on a warm beach cooled by ocean breezes, fueled with lots of espresso ristretto followed by rounds of grappa to lower inhibitions and destroy preconceptions ;-)

But email is all we have.

Most open source projects are software, though there are a few examples of hardware projects such as Bug Labs, TrollTech (bought by Nokia, I think), Openmoko and one for UAVs.

I should start by pointing out that I'm not presenting at the Open Source Business Conference, but am moderating a panel.

http://www.infoworld.com/event/osbc/09/osbc_agenda.html

Session Title: Moving Open Source Up the Stack

Session Abstract: Open Source Solutions for IT infrastructure have shown great success in organizations of all types
and sizes. OSS for business applications have seen greater difficulties in penetrating the glass ceiling
of the enterprise stack. We have put together a panel representing the EU and the US, system
integrators, vendors and buyers, and corporate focus vs. education focus. We'll explore how the OSS
application strategy has changed over the past four years. We will also look at success and failures,
the trade-offs and the opportunities in solving business/end-user needs with OSS enterprise
applications.

Learning Objective 1: Most buyers know the 80% capability for 20% cost mantra of most OSS vendors, but we'll focus on
what that lower cost actually buys.

Learning Objective 2: Where does OSS fit in the higher levels of the application stack? Learn how flexibility & mashups
can improve the end user experience.

Learning Objective 3: Learn how to come out ahead on the trade-offs of up-front cost vs. operational cost, experience and
learning curves, maintenance and replacement, stagnation and growth.

Here are the confirmed panelists:

(1) Tim Golden, Vice President - Unix Engineering, Security & Provisioning, Bank of America
(2) Gabriele Ruffatti, Architectures & Consulting Director, Research & Innovation Division, Engineering Group, Engineering Ingegneria Informatica S.p.A.
(3) Aaron Fulkerson, CEO/Founder, mindtouch
(4) Lance Walter, Vice President - Marketing, Pentaho
(5) Christopher D. Coppola, President, The rSmart Group
(Moderator) Joseph A. di Paolantonio, Principal Consultant/Blogger/Analyst, InterActive Systems & Consulting, Inc.

So, back to the "Why open source" discussion.

You might want to listen to a couple of our podcasts:

http://press.teleinteractive.net/tialife/2005/06/30/what_is_open_source

http://press.teleinteractive.net/tialife/2005/07/01/why_open_source

or not :-D

Historically, there were analog computers programmed by moving around jumper cables and circuits. Then there were general purpose computers programmed in machine language. Companies like IBM got the idea of adding operating systems, compilers and even full applications to their new mainframes to make them more useful and "user friendly" with languages like COBOL for the average business person and FORTRAN for those crazy engineers. Later Sun, Apple, HP and others designed RISC based CPU's with tightly integrated operating systems for great performance. Throughout all this, academicians and data processing folk would send each other paper or magnetic tapes and enhance the general body of knowledge concerning running and programming computers. There eventually grew close to 100 flavours of Unix, either the freely available BSD version or the more tightly licensed AT&T version.

Then a little company called Microsoft changed the game, showing that hardware was a commodity and the money was in patenting, copyrighting and using restrictive licenses to make the money in computers come from software sales.

Fast forward ~15 years and the principals in Netscape decided to take a page from the Free Software Foundation & their GNU (GNU's Not Unix) General Public License and the more permissive Berkeley License for BSD and as a final recourse in their lost battle to the Microsoft monopoly, coined the term "open source" and released the Gecko web rendering engine under the Mozilla Public License. And the philosophical wars were on.

When I was the General Manager of CapTech IT Services, I had a couple of SunOS Sys Admins who spent their spare time writing code to improve FreeBSD & NetBSD. I let them use their beach time to further contribute to these projects. Then a young'un came along who wanted to do the same for this upstart variant of minix called Linux. :-D. All of this piqued my interest in F/LOSS.

Today, I feel that F/LOSS is a development method and not a distribution method nor a business model. If you look at IBM, HP, Oracle and others, you'll find that >50% of their money comes from services. Just as M$ commodified hardware and caused the Intel CISC architecture to win over proprietary RISC chips, software has become a commodity. Services is how one makes money in the computer market. With an open source development methodology, a company can create and leverage a community, not just for core development but for plugins and extensions. More importantly, that community can be leveraged as thousands of QA testers at all levels: modules, regression & UAT, for thousands of use cases, and for forum level customer support (People, people helping people, are the happiest people in the world ;-)

Can the functions in your application be replicated by someone else without duplicating a single line of your code? Are the margins on your software sales being forced below 10%? Does most of your profit come from support, system integration, customizations or SaaS? Then why not leverage your community?

So, this is a really short answer to a really complex issue.

To answer some of your other questions...

I'm not a hacker nor a programmer of any type. I have started to play around with the open source R statistical language to recreate my Objective Bayes assessment technique and grow beyond the (Fortran on OS/360 or VAX/VMS) applications that I caused to be created from it.

I haven't gotten to the snow in a couple of years, but we're in a drought cycle. Though it is storming as I write this.

I hope this helps you with your open source struggle, my friend. And thank you for putting up with me being a wordy bastard for the past /cough /harumph years. :D Oh, and note the Creative Commons license for this post. This must really cause you great consternation as a writer. Oh, and I'm not going to touch your post on Stallman. B)

SQLStreamv2 Real Time BI

Today, SQLStream announced version 2.0 of their Real Time BI solution. SQLStream comes from the fertile creativity of Julian Hyde, who is also the founder of the open source Mondrian OLAP engine. While SQLStream is not open source, it does stem from the open source Eigenbase community, leveraging the user-defined transforms that were originally developed for LucidDB to operate on traditional stored relational data, with SQL:2003-compliant syntax. SQLStream extends this to handle streaming relational data.

In addition to capturing standard, structured data while "on the wire", SQLStream also includes adapters for feeds, such as Atom and RSS, and for Twitter.

Methinks Julian and I need to schedule another lunch soon, so that I can learn more about how this unstructured data, especially from Twitter, can fit into real time analytics provided by SQLStream v2.0.

BTW, you can follow me on Twitter as @JAdP.

Panettone French Toast on Boxing Day

Panettone French Toast with Bacon for Boxing Day Winter Holidays 2008
This is a picture of Panettone made into French Toast served with very crisp baked bacon

Il Panettone is a traditional Christmas bread. The best that I've found imported to the USA is La Loggia. Bauli is also good, but not as moist. Naturally leavened, it somehow survives months on a cargo ship and weeks in a refrigerator after opening, without preservatives. A tradition is to make French Toast out of the Panettone on the day after Christmas, or Boxing Day. Here's my recipe. Preheat an oven, preferably with a baking stone in it, to 400ºF.

  • One egg per person, separated
  • One teaspoon of Grand Marnier, two drops of vanilla extract, a grind of sea salt and a grate or two of nutmeg
  • One tablespoon of heavy cream per person
  • Two Slices of Panettone per person

Note that there isn't any additional sugar. Whip the egg whites to soft peaks, and beat the egg yolks with the remaining ingredients, excepting the Panettone slices. Fold the seasoned egg yolk mixture into the egg whites. Butter a glass baking dish that is large enough to hold all of the Panettone slices in one layer. Pour half the batter into the bottom of the baking dish, arrange the Panettone slices in one layer in the baking dish, and cover with the remaining batter. If possible, let it sit overnight, or at least for two hours, in the refrigerator.

Place the baking dish in the pre-heated oven for 15 minutes, then move it to another oven at 225ºF, or reduce the heat to 225ºF, for another 15 to 30 minutes.

I also like to serve this dish with crisp bacon or pancetta. Cook thickly sliced bacon in a 225ºF oven for two hours. Drain the fat after the first half-hour, then arrange the bacon on paper towels, cover with more paper towels, and cook for the remaining time. You might like powdered sugar over it or maple syrup. I like it plain.

Festivus Hogswatch Solstice Christmachanukwansa 2008

The winter holidays are upon us, and it's time to cook and cook and cook. Of course, the holidays are all about people, but for me, only from the standpoint of them eating what I cook. :p

Solstice

For Solstice, I made one of my favorite dishes, Maccheroni alla Chitarra con Abruzzo Polpettine, though I made it more of a ragù with the meatballs as the recipe says, veal shanks and pork baby back ribs. I made over three pounds of meatballs, as I'll be using them for Christmas supper as well. I served the veal shanks on Solstice, as that's what I like, and since it's also the day I turned 53, I figured what I like mattered. ;) Dad likes the pork ribs, so, that's what I'll serve on Christmas Day.

I hunt the solstice shrub on this day, traditionally, but this year I went the day before, as it rained on the Solstice. I brought it up to the living room on the Solstice and set it up to be decorated later.

Christmas Eve

Friends and relatives from around the Bay Area to Carmel decided not to brave the wet weather that we're having this year. Bunkey is still in Iraq, though this is his last year. Without the big appetites that I was expecting, we're not doing the traditional seven fishes this year, just four. :>> For four people. This year, we'll be having a soup of anchovies and white beans, Dad's making his tuna in marinara over spaghetti and I'm making a puttanesca sauce to go with it. We're also having Chilean Sea Bass, brushed with olive oil and lemon, roasted in the oven, and Shrimp Scampi. With those last two, I'll be serving a latke type of side made of four potatoes and two zucchinis, shredded on a mandoline (or the big holes in a cheese grater), squeezed dry, and mixed with two leeks, sliced thin and sautéed, and two eggs, patted into cakes and fried, then served with sour cream.

Part of the fun of Christmas Eve is decorating the solstice shrub and watching Hogswatch (the movie based upon the Terry Pratchett book, and my favorite winter holiday movie).

Christmas Day

Four people again will be eating on Christmas Day, so nothing too elaborate. Dad is making Italian Wedding or Holiday soup (chicken stock, spinach, teeny-tiny meatballs and cubes of parsley frittata), and I'll be making spinach & cheese ravioli with the meatballs and pork rib ragù from the Solstice and a roast chicken basted with a rosemary twig dipped in olive oil & garlic, served with Brussels Sprouts & Chestnuts, as I make for Thanksgiving.

Boxing Day

This year we're going to friends for a ham dinner on Boxing Day. I'm looking forward to eating and not cooking.

New Year's and Epiphany

Three more winter holidays are coming, and don't forget that the 12 days start too. New Year's Eve is often crab cioppino, New Year's Day is often baby back ribs in sauerkraut, ham hocks and hopping john, and other fine stuff. I'll blog about these holidays later.

That's all of that. Enjoy your holidays, whatever your beliefs, and may Peace be upon the land.

The TeleInterActive Press is a collection of blogs by Clarise Z. Doval Santos and Joseph A. di Paolantonio, covering the Internet of Things, Data Management and Analytics, and other topics for business and pleasure.
