Pentaho offers one of the most complete data management and analytics suites available, both as an open source solution, its Community Edition (CE), and as an Enterprise Edition (EE).
Webdetails is a 20-person consultancy based in Portugal, founded by Pedro Alves, that focuses on building Pentaho solutions for its customers and on data visualization. In addition to its consulting work, Webdetails has become the major committer to the open source Community Dashboard Framework (CDF) project, originally developed by Ingo Klose. In the course of their work, inspired by the muse of customer needs, Webdetails has grown the original CDF project into CTools, a full suite of OSS data visualization and dashboard projects. Over the past year, the talented Webdetails user experience team seems to have put out a new CTool almost monthly.
Pedro Alves is an extremely well-respected member of the Pentaho community: he leads community events and training, appears often in the forums and on IRC, and stays connected through Twitter and Skype. Recently, Pedro was highly active in helping to create the Pentaho Marketplace, which gives users direct access, from the BI Server web interface, to a series of plug-ins for the BI Server, including CTools and other community and third-party projects.
I have the pleasure of knowing Pedro and several other members of the Webdetails and Pentaho teams. This week I was able to speak with Pedro, as well as with Davy Nys, Vice President, EMEA & APAC at Pentaho, and Doug Moran, one of Pentaho's founders.
Pedro doesn't feel that the acquisition will change Webdetails: both the UX and consulting teams will continue as before. However, both community and enterprise users of Pentaho will feel the impact of both teams, as the lessons learned from Webdetails consulting projects are implemented by the UX team, not only in the dashboard and data visualization tools but also, per Davy, in the overall UX throughout all the Pentaho products. Having worked with Pentaho tools as a practitioner, I know that business users will appreciate this as Pentaho becomes both easier and more pleasant to use. Data scientists will also appreciate more and better tools for drawing the story out of the data and presenting it to subject matter experts and business leaders in an Agile fashion.
As Pedro mentioned, most things won't change, such as the fact that CDF underpins all Pentaho Dashboards, or the pace of development of new CTools; several are currently underway. One that I can mention grew out of a request by the Mozilla Foundation for a file and data browser for the Hadoop Distributed File System [HDFS] that would be as easy to use as the file browser in any modern operating system. The result is CVB, the Community VFS Browser. One thing that will change is that more of the CTools will make their way into the main branch of the EE product as they reach the appropriate level of maturity and stability.
Pedro has many plans for CTools and for facilitating data visualization through Pentaho. In addition to continuing as general manager of Webdetails and Chief Architect of CTools, Pedro will also assume the role of Senior Vice President of Community for Pentaho. As a longtime friend of the Pentaho community myself, I have to say that there couldn't be a better choice.
One of Pentaho's founders, Doug Moran, was the "Community Guy", following the original community guy, Gretchen Moran, and stayed in that role until the start of 2011. Doug's philosophy is that any open source community needs to stand on its own to be organic and strong. The Pentaho community is one of the strongest in the OSS DMA space, and as a result Doug felt comfortable focusing elsewhere, taking over management of all of Pentaho's "big data" products and Instaview initiatives. As SVP of Community, Pedro will focus mostly within the company, integrating the community internally and helping to drive the corporate strategy for community. He'll continue to participate in the community but, following the Pentaho BeeKeeper model developed by Pentaho CTO and Chief Geek James Dixon, his main concern will be to ensure that there is a rich environment for community innovation. As part of that, Pedro will also actively pursue ways to grow and leverage the Pentaho Marketplace. Doug also pointed out that the Pentaho community is hugely valuable for QA and as a training ground for the best Pentaho developers, which is sure to continue with Pedro in his new role. Doug and Pedro have worked together since the early days of Pentaho, when Pedro decided to quit his job and, with his wife, create a company devoted to professional services for Pentaho projects and products. This strong relationship between the original Community Guy and the new SVP of Community can only help make an already strong community even better and more creative.
Davy pointed out to me that there has been an increase in customer demand for dashboards that are, in essence, apps within Pentaho. This demand might be met through Pedro's plan to make it very easy to create such dashboard-based apps without any programming ability, and then publish them to the Marketplace. This planned community plugin kick-starter [CPK] will use CDE (the Community Dashboard Editor) to create the front end, and the Pentaho Data Integration software, Kettle and Spoon, for the back-end logic. I believe that both internal and external consultants integrating Pentaho into an organization's decision-making process will find this ability exciting, as many of these system integrators are not Java developers. The ability to push such apps to the Marketplace will also be embraced by both CE and EE users, as most customers are excited by the idea of openly sharing their solutions and enjoy the resulting community recognition.
Webdetails fits very well into creating a finer exploratory analytics experience for customers, and will make Pentaho a superior choice for big data. Combined with Instaview, and with the proper roadmap, it may even push Pentaho into the new Data Grok market, not only helping users answer the questions they have, but actually pointing out the questions that the data set can answer, even ones the user hadn't thought of.
Both CE and EE users and customers of Pentaho should welcome this acquisition and look forward to better UX and data visualization. Most importantly, they should plan how they can contribute to, and benefit from, the Pentaho Marketplace as it becomes an important part of the Pentaho ecosystem.
Cara is coming to a brick-and-mortar store near you. But don't be insulted when she doesn't recognize who you are.
Recently, I met with Jason Sosa, the CEO and Founder of IMRSV, Inc… twice. What came through to me was his passion for understanding the societal and human impacts of the technologies he creates and brings to market. This passion makes the company's mantra of, and adherence to, "privacy by design" very real and central to its approach.
Cara is the core software product from IMRSV, Inc. Cara analyses your face, and determines demographic, attention and emotive statistics about you, without attempting to identify you. As IMRSV states, Cara turns any connected camera into an intelligent sensor, but does so anonymously. Move from one Cara camera to another, or move away and back again to the same Cara camera, and the temporary ID number associated with you changes.
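IMRSV doesn't describe Cara's internals here, but the anonymity property just described, a temporary ID that changes when you move to another camera or leave and come back, can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not IMRSV's implementation: the `AnonymousSensor` class, the `track_no` handle that an upstream face tracker would supply, and the age-band buckets are all assumptions made for the example.

```python
import secrets
from collections import Counter


class AnonymousSensor:
    """Hypothetical camera sensor that reports aggregates keyed by throwaway IDs."""

    def __init__(self, camera_id):
        self.camera_id = camera_id
        self._active = {}         # tracker handle -> temporary ID, kept only while the person is in view
        self._counts = Counter()  # aggregate tallies by (hypothetical) age band

    def appear(self, track_no, age_band):
        # The ID is random, not derived from the face, so it cannot be linked back to a person
        temp_id = secrets.token_hex(8)
        self._active[track_no] = temp_id
        self._counts[age_band] += 1
        return temp_id

    def leave(self, track_no):
        # Forget the mapping: a later appearance, even of the same person, gets a new ID
        self._active.pop(track_no, None)

    def report(self):
        # Only counts leave the sensor; no images, face data, or IDs are included
        return {"camera": self.camera_id, "counts": dict(self._counts)}


door = AnonymousSensor("front-door")
first = door.appear(track_no=1, age_band="25-34")
door.leave(track_no=1)
second = door.appear(track_no=1, age_band="25-34")  # same person walks back in
print(first != second, door.report())
```

Because each ID comes from a random generator rather than from anything measured about the person, two cameras, or one camera at two different times, have nothing in common to match on.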
While Cara is still pre-launch, I'm excited both by the technology and by the IMRSV, Inc. business model. The business model is very simple: whether you are a small shop owner or a developer interested in using Cara as part of a sensor analytics ecosystem, you pay $39.95 per camera, which includes the stand-alone Cara software and the cloud-based data-as-a-service. The possibilities presented by Cara are what really got me going, fueling both an exciting initial briefing and a follow-up four-hour "lunch" and demo.
What does any of this mean? Here are a few examples.
In addition to starting companies, Jason is very interested in the Singularity, and in the impending impact of technology upon human employment and self-identity. This has led to both the "Privacy by Design" approach and the "Principles of Good Use" for developers and partners. If you don't believe me, maybe you'll believe Jules Polonetsky.
"Privacy by design solutions are critical to implementing new technologies in a world where data collection has become ubiquitous. Steps that Cara takes, such as not collecting any personal information, and not storing, transferring or recording any images, are key to ensuring privacy concerns are addressed as these technologies are rolled out."
- Jules Polonetsky
Various pieces of research project that the Internet of Things will be a 15 trillion dollar market. By 2020, I strongly believe that there will be over a trillion sensors deployed, and that if your "thing" isn't connected, it won't be a viable product. Companies like IMRSV, Inc. are providing the ecosystem for sensor analytics from everyday objects at very affordable prices, which will push this market further and faster than the pundits anticipate. So, let me put on my tinfoil hat and stand on my soapbox:
Big Data is a catchy phrase. Unfortunately, it is often misused and misunderstood. Often, Hadoop and Big Data are used interchangeably, as if the Apache Hadoop family of projects were the only solution for Big Data, or as if Big Data were the only use for these projects. Neither is true.
As an EDW/BI practitioner, I watched Hadoop, or really the Map/Reduce framework, be embraced and forced into being by software developers who were frustrated by Structured Query Language (SQL) and by the need to create Entity-Relationship Diagrams (ERDs) as data models or schemas. They were equally unhappy with the various workarounds for accessing Relational Database Management Systems from within their programs, such as Object-Relational Mapping (ORM) and Data Access Objects (DAOs). At first, I felt that these developers were simply lazy.
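For readers who haven't seen the Map/Reduce model, the classic word-count example shows why it appealed to these developers: there is no schema and no SQL, only a function that emits key/value pairs, a grouping (shuffle) step, and a function that combines each group. This single-process Python sketch only mimics the three phases; Hadoop's actual API is Java-based and runs the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit (key, value) pairs straight from raw text; no schema is declared anywhere
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value emitted for one key
    return key, sum(values)


def map_reduce(documents):
    pairs = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(map_reduce(["big data big ideas", "Big data"]))
```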
However, as I worked more with these so-called NoSQL technologies, they helped me clarify the dissatisfaction I had felt during the years I was leading EDW and BI projects. Thirty years ago, I worked in Aerospace Systems Engineering, developing methods and algorithms for risk assessment using Bayesian statistics. But by 1996, I had become involved in my first EDW project. From then on, the actual structure and functions of the data, as defined by the data itself, became less important than fitting the data into an artificial structure imposed by business process models.
Don't get me wrong. Relational algebra, relational calculus and the DBMS technologies that came out of this mathematics are all very useful. And, in the right hands, SQL is a very powerful language. ERDs provide a wonderful way to map data to business processes and to both transactional and analytic systems.
But… There is so much more that can be done with data coming not only from traditional human-to-machine (H2M) interactions but, increasingly, from human-to-human (H2H), machine-to-machine (M2M) and machine-to-human (M2H) exchanges. The interweaving of the flows of data from such disparate sources is what drives my research today.
These, and over 70 other use cases that I'm cataloguing, come from the innovation surrounding the hype of Big Data and the Data Science movement. In a recent Quark, I've classified this innovation into 11 areas. A complete mindmap is linked from the initial mindmap shown below, and in the report.
The Quark covers the trends coming from these innovations, and develops the four keys required to bring valuable decision-making processes into your organization from them. It's entitled "Big Data: It's Not the Size, It's How You Use It". For such a simple report, it took over 8 months to develop, mostly because of the fast-paced evolution of the innovations themselves. The executive summary from the Quark is linked from the title.
I hope that you find this information, as well as the mindmap, useful in combining inference, prediction, insight and performance with intuition to make better decisions.
For all the silliness surrounding Big Data and Data Science, all the hype and all the controversy, there are actually very innovative and disruptive technologies coming from this area, this new approach to data management and analytics [DMA]. How do we categorize vendors and technologies that have never existed before?
One new area is Predictive Analytics, also called Predictive Intelligence. Since predictions are not analytics, as the term is used in BI, and certainly not the Intelligence of BI, I don't like either term and prefer the simpler "Predictives". Four companies with which I've had briefings fall into the Predictives category, but each of these companies has very different approaches and technologies for performing predictives. These companies are Opera Solutions, Alpine Data Labs, INRIX and Zementis. There are other companies, such as KXEN, Soft10 and Numenta, that I'll include in a full report after receiving briefings. By the way, Numenta's product is named "Grok". Given their differences, do they really all belong in the same category?
Opera Solutions: Acting on petabytes of data, Opera Solutions provides a signal hub stack that starts with data management and moves through pattern matching in the signal layer, enhanced by its own Data Science teams, to predictions and inferences for better decisions and enterprise advantage. For Opera, understanding the "signal" is more important than the underlying technology: the aim is to create front-line productivity, with signals informing and adjusting "gut feel", and with machines doing the heavy lifting rather than directing humans.
Alpine Data Labs: Alpine Data Labs brings mathematical, statistical and machine learning predictive methods to the data in situ, no matter how small or how big the data sets, within a variety of RDBMS technologies and Hadoop distributions. Alpine Data Labs helps data science teams address the data where it lies, across data types and functional areas, working with all the data to bring insight to bear on better decisions.
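Bringing the methods to the data "in situ" means pushing the computation into the database so that only small results travel back, rather than extracting every row. As a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for an RDBMS and an ordinary least-squares line as the model (neither is claimed to be how Alpine Data Labs works), the database computes five aggregates and the client derives the fit from them. The function name `in_database_linear_fit` and the `readings` table are hypothetical.

```python
import sqlite3


def in_database_linear_fit(conn, table, x_col, y_col):
    """Fit y = intercept + slope * x from aggregates computed inside the database."""
    # Table and column names are interpolated, so they must come from trusted code, not user input
    n, sx, sy, sxy, sxx = conn.execute(
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {y_col}), SUM({x_col} * {x_col}) FROM {table}"
    ).fetchone()
    # Closed-form least squares: only these five numbers ever leave the database
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (x REAL, y REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 3), (2, 5), (3, 7), (4, 9)])
print(in_database_linear_fit(conn, "readings", "x", "y"))  # (1.0, 2.0)
```

The same five aggregates can be computed in parallel on partitions of a much larger table and then summed, which is why this style of computation scales from small to big data sets.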
INRIX: INRIX's data science teams and technology provide unique predictives using data from connected cars, connected devices and connected people.
Zementis: Zementis brings predictive modeling into decision management through its data science teams, its Adapa product and a strong commitment to the Predictive Model Markup Language [PMML]. Through partners and customers, Zementis works with traditional and innovative data sources to provide decision management from predictives, data mining and machine learning for marketing solutions, financial services, predictive maintenance and energy/water sustainability.
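To make PMML concrete: a trained model is exported as an XML document that any compliant scoring engine can evaluate, independent of the tool that built it. The toy document below is hand-written for illustration (a one-variable linear regression with made-up coefficients), and `score` is a minimal Python sketch that handles only the `NumericPredictor` terms of a `RegressionModel`. It is not Zementis's engine; production scorers implement far more of the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: y = 1.5 + 2.0 * x
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Toy linear model for illustration"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" exponent="1" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text, record):
    """Evaluate the first RegressionTable: intercept + sum(coefficient * value ** exponent)."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        exponent = int(predictor.get("exponent", "1"))
        result += float(predictor.get("coefficient")) * record[predictor.get("name")] ** exponent
    return result


print(score(PMML_DOC, {"x": 3.0}))  # 7.5
```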
One of the more interesting questions to come out of data science is how to really understand the data that is being gathered and presented. Two of the companies with which I've recently had briefings challenge the categories of Data Discovery and Data Exploration. Each of these companies has different technologies and different approaches to fully, deeply understanding your data, and to drawing conclusions from the data before doing other, more formal analytics. Over the past month, I've had the good fortune of very in-depth, in-person briefings from both of these companies. Both are helping those who need it most to truly, fully, deeply and easily understand their data. These approaches, while very, very different, both constitute an entirely new category. Beyond data discovery, beyond data exploration, I call this new category Data Grokking.
"Grok", as I wrote in 2007, means
"to fully and deeply understand"; [but you need some background on the word's origins]. It's Martian and not from any Terran language at all. It comes from the fertile mind of Robert A. Heinlein, and was brought to Earth by Valentine Michael Smith in Heinlein's wonderful 1961 novel Stranger in a Strange Land.
One of these companies is still in stealth mode, and I won't mention their name here. The other is Ayasdi, and Ayasdi takes a very, very interesting approach to grokking your data.
These two very different technologies, based upon very different science and mathematics, do indeed allow us to fully and deeply understand our data. Much like the Martian ceremony, the DataGrok allows us to mentally ingest our data, to realize creative insights from our data sets, and to recognize the fundamental interweaving among the data, that, prior to these two innovative firms, could only come about through a long, arduous struggle with the data sets.
As I mentioned, one of the companies is still in stealth mode, so I'll write about Ayasdi here.
Ayasdi comes out of the intersection of Topology and Computer Science, as brought together by Stanford Professor Gunnar Carlsson and Gurjeet Singh. The work started as CompTop, a DARPA contract spanning more than four years, with nodes at Duke, Rutgers and Stanford. Topological methods discover the structure of the data; this is somewhat analogous to, but not the same as, describing data through probability density or cumulative distribution functions [PDF or CDF].
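Ayasdi's own algorithms aren't described here, but one published technique from this research line, the Mapper construction of Singh, Mémoli and Carlsson, gives a feel for how topology can summarize the shape of a data set as a small graph: project the data through a "lens" function, cover the lens range with overlapping intervals, cluster the points in each interval, and connect clusters that share points. The sketch below is a minimal, stdlib-only Python illustration, not Ayasdi's implementation; the lens, the interval cover parameters and the single-linkage distance threshold `eps` are choices made for the example.

```python
import math
from itertools import combinations


def overlapping_intervals(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n equal-length intervals; neighbours share `overlap` of their length."""
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    intervals = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    intervals[-1] = (intervals[-1][0], hi)  # guard against rounding at the top end
    return intervals


def single_linkage_clusters(points, indices, eps):
    """Split the given point indices into groups linked by chains of steps no longer than eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        frontier = [remaining.pop()]
        component = set(frontier)
        while frontier:
            i = frontier.pop()
            near = [j for j in remaining if math.dist(points[i], points[j]) <= eps]
            for j in near:
                remaining.discard(j)
                component.add(j)
                frontier.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Mapper graph: nodes are local clusters, edges join clusters that share a point."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in overlapping_intervals(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: 12 points evenly spaced on the unit circle, with the x-coordinate as the lens.
circle = [(math.cos(math.radians(30 * k)), math.sin(math.radians(30 * k))) for k in range(12)]
nodes, edges = mapper(circle, lens=lambda p: p[0], n_intervals=4, overlap=0.3, eps=0.6)
print(len(nodes), "nodes,", len(edges), "edges")  # 6 nodes, 6 edges: a single loop
```

Even though no single coordinate reveals it, the resulting graph is a loop: the summary recovers the "hole" in the circle, which is the kind of structural fact these topological methods are designed to expose.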
Ayasdi is focused on four markets:
From this, you can see that Ayasdi customers go after expensive data, i.e., data that is expensive to collect and expensive to use. Iris is the front end to the Ayasdi Platform, and while it is available as a private cloud, the offering is primarily SaaS.
The analyst community is trying to figure out where to put Ayasdi, hence my category of DataGrok. Another area of confusion is "What is the right tool for each step of the process, from DataGrok to inferences and predictions?" Some of this stems from mistrust of machines, but we need machines that do more than count and sort; we need machines that help us find insight and improve performance.
A sensor is anything that can create data about its environs. A more formal definition is
a device that detects or measures a physical property and records, indicates, or otherwise responds to it
- New Oxford American Dictionary
A very simple example is a thermocouple.
Essentially, wires of two dissimilar metals are joined at one end, such that when the environment around that junction becomes hotter or colder than the other end of the wires, the metals produce a small voltage. Through this thermoelectric effect, the temperature difference translates into a voltage differential across the wires, producing an electrical signal. A simple voltmeter can read this signal, and one could calibrate that electrical signal to be read as degrees of temperature change.
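The calibration step mentioned above can be sketched as a straight line through two known readings. This is a minimal illustration under a loudly stated assumption: the thermocouple is treated as perfectly linear between the calibration points, and the readings are hypothetical. Real instruments add cold-junction compensation and use published reference polynomials, because thermocouple output is only approximately linear.

```python
def two_point_calibration(v1, t1, v2, t2):
    """Build a volts-to-degrees converter from two known (voltage, temperature) readings.

    Assumes the thermocouple responds linearly between the two calibration points.
    """
    degrees_per_volt = (t2 - t1) / (v2 - v1)

    def to_celsius(volts):
        # Interpolate (or extrapolate) along the straight line through the two points
        return t1 + degrees_per_volt * (volts - v1)

    return to_celsius


# Hypothetical calibration: 0 V with both ends at 25 °C, and 4.1 mV with the
# measuring junction at 125 °C (roughly 41 µV per °C, typical of a type K couple).
to_celsius = two_point_calibration(0.0, 25.0, 0.0041, 125.0)
print(round(to_celsius(0.00205), 2))  # 75.0
```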
You likely have one of these in your home thermostat. Perhaps you have a very simple thermostat that turns your home heater on and off.
Perhaps you have a more complex, programmable thermostat that can control the temperature and humidity of your home through a furnace, air conditioner, humidifier/dehumidifier and fans, with different settings for different times of the day and days of the week.
Perhaps you have something that looks very simple, but is now part of a complex system that includes not only your home HVAC system, but your computer and smartphone, and computers and analytic software at your utility company.
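The first two stages of that progression, and the step that turns a thermostat into a source of Connected Data, can be sketched in Python. Everything here is hypothetical: the hysteresis band, the schedule format, the device name and the JSON field names are illustrative choices, not any vendor's API, and no network transport is shown.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Thermostat:
    """Simple on/off heater control with a hysteresis band around the setpoint."""
    setpoint: float
    band: float = 0.5
    heating: bool = False

    def update(self, temperature):
        if temperature <= self.setpoint - self.band:
            self.heating = True
        elif temperature >= self.setpoint + self.band:
            self.heating = False
        # Inside the band the heater keeps its current state, avoiding rapid cycling
        return self.heating


def scheduled_setpoint(when, schedule):
    """Programmable thermostat: setpoint from the latest schedule entry at or before this hour."""
    entries = schedule["weekday" if when.weekday() < 5 else "weekend"]
    setpoint = entries[-1][1]  # before the day's first entry, reuse the schedule's overnight setting
    for hour, temperature in entries:  # entries are sorted by hour
        if when.hour >= hour:
            setpoint = temperature
    return setpoint


def reading_record(device_id, when, temperature, heating):
    """Connected thermostat: serialize one reading for some analytics back end."""
    return json.dumps({
        "device": device_id,
        "time": when.isoformat(),
        "temperature_c": temperature,
        "heating": heating,
    })


SCHEDULE = {
    "weekday": [(6, 21.0), (9, 17.0), (17, 21.0), (22, 16.0)],
    "weekend": [(8, 21.0), (23, 16.0)],
}

now = datetime(2012, 9, 13, 7, 30)  # a Thursday morning
thermostat = Thermostat(setpoint=scheduled_setpoint(now, SCHEDULE))
heating = thermostat.update(20.2)   # 20.2 °C is below 21.0 - 0.5, so the heater turns on
print(reading_record("living-room", now, 20.2, heating))
```

The control logic barely changes from one stage to the next; what changes is that each decision also becomes a data point that other systems can analyze.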
And this progression is why the Internet of Things is about to explode with Connected Data, with sensors being the new nerve endings of an increasingly intelligent world.
Imagine sensors streaming Connected Data from your home entertainment system, refrigerator & most of its contents, toaster, coffee maker, alarm clock, garden, irrigation, home security, parking on the street in front of your home, traffic flowing by your home to your destination, air quality, and so much more.
We will interact with the world around us in ways that will change our decision making processes in our personal lives, in business, and in the regulatory processes of governments.
If you want to learn more, join IBM and my fellow panelists on Thursday, Sept. 13, from 4 to 5 p.m. ET to chat about cloud and the connected home using hashtag #cloudchat.