Bring distributed resources up when you need them, shut them down when you don’t.
Focus IT and business resources on the important opportunities and fun challenges.
Watch development and operations skills merge into devops.
Interweave open source and proprietary technologies, and data from open, government, third-party and internal sources.
What’s not to like?
The technologies and business processes for elastic, cloud computing, a.k.a. “The Cloud”, are coming into the mainstream, building upon the work of companies such as Salesforce, Amazon and Enomaly, and the vision of avant-garde CIOs such as Ben Haines. Growing out of grid computing in the 1990s, evolving through application service providers to everything-as-a-Service [*aaS], the Cloud is now a major component of IT strategies. One area has resisted this type of IT outsourcing: data management and analytics [DMA] in its various forms. Early attempts to bring BI into the Cloud ended in bankruptcy for pioneering companies such as LucidEra, which ended operations in 2009. Recently, other entrepreneurial ventures have targeted the BI market with Cloud services and Business Intelligence as a Service [BIaaS] and Data as a Service [DaaS] platforms. Self-service BI, data visualization and data preparation companies also operate in the Cloud. There is little doubt [though still some lingering doubt] that data integration and governance, data warehousing, business intelligence, advanced analytics and data science can thrive as Cloud platforms and services.
Who will dominate in the Cloud DMA market? Will one or two of the current players not only survive as independents but emerge with undeniable victory? Or will established enterprise software and Cloud companies acquire those startups, much as the traditional BI vendors became absorbed into the larger IT vendors?
At the end of 2014, Prakash Nanduri, the CEO of the groundbreaking firm Paxata, turned a forecaster’s gaze upon 2015, to help guide his strategy for Paxata and his advice to their PaxPro customers. The results were six predictions, published as an article in Forbes. If you’ve been following along, you know that the Paxata Data Divas, Lilia Gutnik, Dr. Julie Mayhew, Tricia Lee McNabb and Cari Jaquet, invited us to do a series of short webinars and tweetchats to have some fun with these predictions. The first prediction, “The lines will blur between data scientists and data analysts.”, was the subject of the first #PaxChat on March 11. The next will be held on Wednesday, March 25 at 11:00 a.m. PDT, covering Prakash’s second prediction, “Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market”. What a bold and interesting prediction!!! One is a large enterprise and consumer player that has built a reputation for proprietary hardware and software, backed by an enthusiastic developer and partner community, with often derisive customers. The other is, arguably, the dominant Cloud Software as a Service [SaaS] company, which made its motto “No More Software” and has grown from customer relationship management through salesforce automation into customer service and marketing, building amazing platforms that allow early adoption of mobile and the Internet of Things [IoT], which it calls the Internet of Customers and the Internet of Connected Products [or Internet of Carrier Pigeons]. Despite its best efforts over two decades, Microsoft is not accepted as a BI company, let alone a CloudBI company. Salesforce left analytics to its partner community until very recently, with the acquisition of RelateIQ and EdgeSpring and the introduction of Wave. Read Prakash’s predictions for his well-reasoned arguments on why these two might just wind up being the dominant players in CloudBI this year!
Listen to the webinar to find out why the Data Divas and the Data Archons are skeptical ;-)
Join the tweetchat, by following #PaxChat and give your own take on the following questions.
Prediction Two: Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market.
Q1 #PaxChat Are you ready to move your #BI to the #Cloud?
Q2 #PaxChat Which parts of data management and analytics #DMA are enhanced by Cloud?
Q3 #PaxChat Are @Microsoft @Azure #PowerBI @Office365 and @Salesforce #Wave on your roadmap?
Q4 #PaxChat If 80% of the #DMA work is #DataPrep do @Microsoft and @Salesforce offerings address self-service governance?
Bonus Q #PaxChat Will the current #CloudBI players fall behind in 2015?
The action comes alive March 25:
As we started to celebrate the end of 2014 and anticipate all that 2015 will bring, Prakash Nanduri published an article in Forbes with six predictions. This wasn’t to show his prowess as a prognosticator or futurist, but to bring focus to his strategy as CEO of Paxata, and to his advice to Paxata customers. Prakash’s strategic thought can be found through the Paxata newsroom and blogs.
Soon after the Forbes article was published, the Paxata Data Divas invited me to do a series of short [less than ten minutes each] webinars, where we discussed each prediction.
If you don't know the Paxata Data Divas, you should:
Diva-in-Charge: Cari Jaquet, VP of Marketing
Diva Control: Tricia Lee McNabb, Director Marketing Programs
Diva-at-Large: Lilia Gutnik, Product Person and Cruise Director
Diva-of-Mayhem: Dr. Julie Mayhew, a.k.a. Doctor Mayhem, Pre-sales Engineer
Following up on these webinars, the Data Divas asked us to host a series of tweetchats based upon Prakash’s predictions, extending our discussion to everyone following on Twitter. The first in this series of tweetchats will begin on March 11 at 11:00 a.m. Pacific Time, using the hashtag #PaxChat. At the same time, Julie will be at The Drive Conference, while Clarise and I will be at IoT Day at EclipseCon. As with most tweetchats, we will have five questions, based on the first prediction, “The lines will blur between data scientists and data analysts.” As has become the custom, we will use the format Qn as we tweet each question; please respond with An and the hashtag #PaxChat.
Q1: How do you define data science?
Q2: Is data science a solo or team sport?
Q3: What is the difference between a data scientist and a data analyst?
Q4: How has your company introduced data science?
Q5: How do you bring data science into production?
If there is time, we will have a bonus question. You can find a list of all 11 questions that were considered, and more from Cari, in her blog post "PaxChat - The Predictions Come Alive", where you can also comment with any questions or recommendations.
You can view the webinar on YouTube first.
Thank you Divas and Paxata for sponsoring these #PaxChats on the business impact of an increasingly complex and interesting data landscape.
Now that the first #PaxChat is over, you can see the results below. The next #PaxChat is scheduled for Wednesday, March 25 at 11:00 a.m. PDT. It will cover the second of Prakash's predictions for 2015: "Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market".
The week of 2013 October 28 was a big one for Paxata, Inc. Founded in January 2012, Paxata worked through advisories, beta customers known as "Pax Pros", and 12 sprints before quietly releasing its first GA product in May 2013. With panels and debuts at the Strata + Hadoop conference in New York and other events, leading up to announcements and demonstrations at the Constellation Connected Enterprise at the Ritz Carlton in Half Moon Bay, California, Paxata officially left stealth mode, publicly discussing:
The most wondrous feature of the Paxata Adaptive Data Preparation Platform is how it adds semantic richness to one's data sets by automatically recommending and linking to third-party and freely available data. This allows one to bring in firmographic, demographic, social and machine data within the context of the user's goals. This is what truly allows the Paxata Adaptive Data Preparation Platform to go beyond data exploration and discovery.
Paxata has received a fair amount of press as well, some of which I've referenced below. However, all this press misses one of the most important additions Paxata makes to the toolboxes of Data Management & Analytics [DMA] professionals… the ability to present questions to the user that they may not have thought of on their own. Paxata was one of the companies that inspired my DataGrok blog post. Paxata was in stealth at the time, and couldn't be named then. Now, I'm happy to be able to write that Paxata is one of the few companies or projects building tools that allow the creator and user of data to go beyond data discovery, beyond data exploration, to being able to fully, deeply understand their data. Data discovery and data exploration tools allow one to determine if various data sets can answer the questions posed by business, engineering or scientific challenges. These tools go further by exposing data integrity issues among data sets or data quality problems within a data set. Some such tools might help the user find new data sets or see how various data sources within an organization might fit together in a data warehouse. Some hark back to grep, sed and awk to parse textual data. Others provide probabilistic and statistical tools to determine the appropriate shape, distribution or density functions of a data set. But Paxata is one tool that does all these and more, and does it through your web browser in a collaborative fashion, maintaining the history of each collaborator's operations on the data sets.
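To make the idea of statistical profiling a little more concrete, here is a minimal Python sketch of the kind of per-column summary such tools compute: row and null counts, an inferred type, and a coarse histogram of numeric values. It is not Paxata's implementation; the function name `profile_column` and the default of four histogram bins are assumptions chosen purely for illustration.

```python
import statistics
from collections import Counter


def profile_column(values, bins=4):
    """Summarize one column: row and null counts, inferred type, coarse shape."""
    non_null = [v for v in values if v is not None and v != ""]
    profile = {"rows": len(values), "nulls": len(values) - len(non_null)}
    if not non_null:
        profile["type"] = "empty"
        return profile
    try:
        nums = [float(v) for v in non_null]
    except (TypeError, ValueError):
        # Some values do not parse as numbers: treat the column as text.
        profile["type"] = "text"
        profile["top_values"] = Counter(non_null).most_common(3)
        return profile
    profile["type"] = "numeric"
    profile["mean"] = statistics.mean(nums)
    profile["stdev"] = statistics.stdev(nums) if len(nums) > 1 else 0.0
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for constant columns
    counts = [0] * bins
    for x in nums:
        # Place each value in an equal-width bin; the maximum goes in the last bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    profile["histogram"] = counts
    return profile
```

On `["1", "2", "3", "4", None]` this reports one null, a numeric type, a mean of 2.5 and a histogram of `[1, 1, 1, 1]`; a column such as `["W", "E", "W"]` is reported as text along with its most common values.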
When my partner, Clarise, and I were first briefed by Paxata in November of 2012, we were so excited that we stayed over three hours. The demonstration of what was then a much rougher product than the one you see today prompted both of us to exclaim how much we wished that we had had this tool back in our DMA practitioner days. We were treated to a demonstration using the data from another Constellation Research customer with which we were familiar. Over a year later, we were treated to a pre-launch briefing using current data sets from that same customer. The ease of use, the pleasantness of the user experience, the simplicity with which one could complete complex tasks, from histograms to column-splitting, showed the maturity that Paxata had gained since our first exposure. What was most important to us was that Paxata could show a solution for every need that we would like to see in the Adaptive Data Preparation Platform, from our experiences in implementing data warehousing and business intelligence programs since 1996, as well as our decades of experience in computational statistics and operations research.
The platform allows data warehousing and BI extract, transform and load [ETL] professionals, business analysts, data scientists, chemists, physicists, engineers, researchers, and professionals of all skills who work with data to completely understand and resonate with their data sets. The Paxata Adaptive Data Preparation Platform does what few other tools can do: it provides clues to what you didn't know to ask. It poses questions that the data can answer, but that you didn't think to ask. And it does all of this in a familiar looking interface, in HTML 5, in your favorite web browser, wherever you are, whenever you need it. In Paxata's words:
Paxata pricing is published and open. There are three subscriptions available:
Each Paxata subscription builds upon the previous one, from an individual subscription, to the ability for those with individual subscriptions to share a single environment, to a full organization-wide subscription. Of course, what makes this possible is that the Paxata Adaptive Data Preparation Platform is available as a Cloud service, accessible through any modern HTML 5 web browser, whether that's on a sophisticated, high-end workstation, a tablet or a smart phone.
The main value comes not from a nice-looking, fairly intuitive interface, but from the underlying technologies that make Paxata so useful: powerful Mathematics, Semantics and Graph Theory algorithms. Their results are easily accessible through this Cloud-based web experience, while the complexities stay under the covers, out of the way. This is what makes the Adaptive Data Preparation Platform so accessible to business analysts and other creators and users of data who are not PhD statisticians. Paxata uses proprietary algorithms that detect relationships among data sets, using probabilistic techniques to select the best joins and semantically typing the data so that it can intelligently enrich, clean and merge the data based upon context, not just metadata. All of this is done in an ad hoc fashion, with no predefined models or schemas needed. These proprietary algorithms make use of
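Paxata's join-selection algorithms are proprietary, so what follows is only a hedged illustration of the general idea of scoring candidate joins from the data itself rather than from metadata. It swaps in a simple heuristic, here called `candidate_join_keys` (a hypothetical name), that estimates for each pair of columns the fraction of distinct values on one side that find a match on the other, weighted by how unique the other column's values are. The tables are plain Python dicts of column name to values, another assumption made to keep the sketch self-contained.

```python
from itertools import product


def candidate_join_keys(left, right):
    """Rank (left_col, right_col) pairs as possible join keys.

    `left` and `right` are dicts mapping column name -> list of values.
    Returns (score, left_col, right_col) tuples, best first.
    """
    ranked = []
    for lcol, rcol in product(left, right):
        lvals = {v for v in left[lcol] if v is not None}
        rlist = [v for v in right[rcol] if v is not None]
        rvals = set(rlist)
        if not lvals or not rvals:
            continue  # an all-null column cannot serve as a key
        # Estimated probability that a distinct left value finds a match on the right.
        containment = len(lvals & rvals) / len(lvals)
        # Share of right-side values that are distinct; a good key rarely repeats.
        uniqueness = len(rvals) / len(rlist)
        ranked.append((containment * uniqueness, lcol, rcol))
    ranked.sort(reverse=True)
    return ranked


if __name__ == "__main__":
    orders = {"order_id": [10, 11, 12], "customer": [1, 2, 2], "amount": [5, 7, 9]}
    customers = {"cust_id": [1, 2, 3, 4], "region": ["W", "E", "W", "N"]}
    print(candidate_join_keys(orders, customers)[0])
```

In the example, `orders["customer"]` paired with `customers["cust_id"]` ranks first with a score of 1.0, since every customer referenced in the orders appears exactly once among the customer IDs. A production system would also weigh semantic types, value formats and sampling error, none of which this sketch attempts.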
Distributed computing and in-memory technologies allow these computational statistics algorithms to be executed cost-effectively in parallel, across massive data sets. Coupled with advances in visualization technologies, this allows Paxata to address a 13.5 to 16 billion dollar market over the next three years, with extremely attractive pricing. The true return on investment from Paxata comes from flipping the DMA equation around. Currently, a common truism is that 80% of the time on a DMA, Data Science, DW or BI project is spent preparing data and 20% analyzing it. Paxata reduces the data preparation share, such that 70% is analytics and 30% is preparation. This not only reduces the labor directly involved in preparing the data, but also allows an Agile framework to address significant business needs at the right time, in a sustainable fashion.
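The shift from 80/20 to 30/70 can be made concrete with a little arithmetic. Assume, purely for illustration, that the hours spent on analysis, A, stay the same before and after; P is preparation time and T is total project time.

```latex
\begin{aligned}
\text{Before (80/20):}\quad & P_0 = \frac{0.8}{0.2}\,A = 4A, & T_0 &= P_0 + A = 5A \\
\text{After (30/70):}\quad  & P_1 = \frac{0.3}{0.7}\,A = \frac{3}{7}A, & T_1 &= P_1 + A = \frac{10}{7}A \\
\text{Ratios:}\quad & \frac{P_1}{P_0} = \frac{3}{28} \approx 0.11, & \frac{T_1}{T_0} &= \frac{2}{7} \approx 0.29
\end{aligned}
```

Under that assumption, preparation hours fall by roughly 89% and total project hours by roughly 71%. If instead the total project time is held fixed, the same shift gives analysts 3.5 times as many hours for analysis (70% versus 20%).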
Paxata's strategy is to attach to the QlikView and Tableau markets, whose enterprise adoption is being hampered by these very data preparation challenges. Alongside these partnerships is a partnership with Cloudera, providing enterprise class access to modern, distributed data storage systems. Add connectors to common enterprise and external data sources and the third-party Paxata Enrichment Libraries, and it is obvious to the most casual observer that the Paxata Adaptive Data Preparation Platform addresses the most frustrating complaint of Data Scientists and Business Analysts alike: that too much of their time is spent on plumbing, whether directly or waiting for IT. We have long spoken about the need for IT to give up control of data, and realize that their most effective role is to provide a framework of success for end-users to fully, deeply understand and use their data to solve real problems. Paxata creates this framework for success.
Other Sources to learn about the Paxata launch:
The number of articles about the Internet of Things [IoT], Machine-to-Machine communication [M2M], the Industrial Internet, the Internet of Everything [IoE] and the like has been increasing since I wrote my post introducing my IoT mindmap almost a year ago. I learn from some of them, nod sagely in agreement with others, and scratch my head in confusion at still others. One in particular this past week fell into that last category: it claimed that all the terms listed here mean the same thing.
From my reading, briefings and research over the past year, I've come to a different conclusion. The following definitions are my opinion. I can't say that any authority has certified these definitions. I believe them to be accurate, and if any vendor with an interest in any of these definitions strongly agrees or disagrees, I would be very much interested in talking with you.
The first thing to be considered is Machine-to-Machine communication. M2M is really just one of four types of interchanges that occur over the Internet, intranets and any command, control, communication, computing or intelligence network. The other types are Human-to-Machine [H2M], Human-to-Human [H2H] and Machine-to-Human [M2H]. H2M and H2H interchanges have been around since the beginning of ARPAnet, which evolved to become the Internet. From the many different protocols at the beginning, such as FTP and Gopher [among many more], two have come to dominate Internet traffic:
Every transaction made using a computer, whether online transaction processing [OLTP], electronic data interchange [EDI] or eCommerce, and every purchase you make at your favorite web store, is an example of H2M.
Of course, starting with email [still the dominant form of communication over the Internet for businesses and individuals] and expanding to Twitter, Facebook, Waze, Yelp, Foursquare, Yammer, all the various instant messaging networks, voice over Internet protocol [VoIP] and your favorite public or private social network, we have many examples of Internet enabled H2H communication.
These two, H2M and H2H, have become so prevalent, and so important to business, governments and our personal lives, that the over-hyped phenomenon "Big Data" was born. But the importance and pervasiveness of M2M, and soon M2H, data will swamp the so-called data tsunami of the past decade. Predictive maintenance, building automation, elastic provisioning, machine logs, software "phoning home" and automated decision support systems are all good examples of direct M2M interchanges, where one sensor, device, embedded computer or system has a productive exchange with another such machine, without concurrent human intervention. Self-quantification, gamification, personalized medicine and augmented reality [AR] are all early examples of M2H interchanges, where sensors, devices, embedded computers or systems directly provide relevant information to an individual, allowing for better informed decisions.
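The four interchange types amount to a two-by-two classification by who originates an exchange and who consumes it. The minimal Python sketch below encodes that reading; the `Party` enum and the `interchange_type` function are hypothetical names for illustration, and the example pairings follow the examples given in the text above.

```python
from enum import Enum


class Party(Enum):
    """Who sits at one end of an interchange."""
    HUMAN = "H"
    MACHINE = "M"


def interchange_type(sender: Party, receiver: Party) -> str:
    """Label an exchange by originator and consumer, e.g. 'M2H'."""
    return f"{sender.value}2{receiver.value}"


# Example exchanges drawn from the categories described above.
EXAMPLES = {
    "purchase at a web store": (Party.HUMAN, Party.MACHINE),
    "email between colleagues": (Party.HUMAN, Party.HUMAN),
    "sensor feeding a predictive-maintenance system": (Party.MACHINE, Party.MACHINE),
    "fitness tracker reporting to its wearer": (Party.MACHINE, Party.HUMAN),
}

if __name__ == "__main__":
    for description, (sender, receiver) in EXAMPLES.items():
        print(f"{interchange_type(sender, receiver)}: {description}")
```

Treating the categories as the product of two small sets makes it plain why there are exactly four, and why an initiative claiming to span "everything" has to cover all four cells.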
The term Internet of Things was coined in 1999 by Kevin Ashton. Since then, it has come to mean any device that is connected to the Internet. Most people don't consider computers, routers, edge equipment and other Internet infrastructure hardware to be "devices", and usually exclude such hardware from consideration as things that use that infrastructure. For many, the devices are only smart phones, feature phones and tablets. This has led Cisco and GSMA to predict that there will be 30 to 50 billion devices connected to the Internet by 2020. However, even these organizations, and most people with whom I speak who have skin in the IoT game, feel that my own prediction of one trillion devices connected to the Internet by 2020 is more likely. These devices range from individual connected sensors to heavy machinery. However, as companies come out with Tweeting diapers, glowing clothing and other such silliness, the Internet of Things is in danger of becoming a fad. So, what is the Internet of Things? To my mind, the Internet of Things comprises any sensor, embedded sensor, embedded computer, component, package, sub-system, systems, or System that is connected to the Internet and intended to have meaningful interchanges with other such items and with humans. The Internet of Things primarily uses M2M and, increasingly, M2H interchanges.
The first treatment of the IoT as a large, complex system to which I was exposed was at a networking event in 2008, one of those events where IBM was introducing its new initiative for a Smarter Planet. The Smarter Planet brings complex systems such as the Smart Grid, building automation across facilities, water management, traffic management, Smarter Cities and Smarter Farms under one System: one approach and one initiative that raises the IoT to a new level of importance for world governments, global businesses and individuals from the poorest village to the most cosmopolitan city. The Smarter Planet initiatives go beyond IoT, beyond the individual things, to treating all such things, the Internet, the protocols, processes and policies as one very large, complex, possibly cognate system.
The Industrial Internet is a term coined by General Electric [GE] in 2011. At a very simple level, the Industrial Internet can be thought of as connected industrial control systems. But the impact is much more complex, and much more significant. The first thing to realize is that connected sensors and computing power will be embedded in everything, from robots and conveyor belts on the factory floor, to tractors and irrigation on the farm, from heavy equipment to hand drills, from jet engines to bus fleets; every piece of equipment, everywhere. The Industrial Internet also primarily uses M2M and M2H. While this sounds much like the Internet of Things, the purpose is much different. The Industrial Internet is about changing business processes and making data the new coin of the realm. GE is very serious about the Industrial Internet and, though they don't use the term yet, about Sensor Analytics Ecosystems. Data Marketplaces are rapidly becoming core to GE's businesses, as evidenced by their recent 140 million dollar investment in Pivotal, the new Big Data Platform as a Service [PaaS] from EMC. Another excellent example of the importance of the Industrial Internet comes from Salesforce.com's use of The Social Machine by Digi International and its Etherios business unit to bring sensor data into customer relationship management [CRM], allowing sensors embedded in industrial refrigerators, hot tubs, and heavy and light equipment of all types to open SFDC Chatter sessions and to file cases.
Cisco has recently started two initiatives related to the IoT: the Internet of Everything [IoE] and Fog Computing. IoE seeks to bring together H2H, H2M, M2M and M2H interchanges. On June 19th of this year, Cisco introduced their IoE Value Index. By bringing together people, processes, data and things, and with some impressive research to back it up, Cisco feels that the IoE could bring 1.2 trillion dollars in added value in 2013, and 14.4 trillion dollars in added market value to businesses around the world by 2022. Fog Computing tends more to the infrastructure of the IoE, bringing the concepts of Cloud Computing, such as distributed computing and elastic provisioning, to the edge of the network, with an emphasis on wireless connectivity, streaming data and heterogeneity.
While some of the above are corporate initiatives, they each represent important and distinct concepts. In addition to these from IBM, Cisco, GE, EMC and Salesforce.com, there are other initiatives and products in this sphere coming from HP, Oracle, SAP, MuleSoft, SnapLogic, Nuance, Splunk, Mocana, Evrythng, Electric Imp, Quirky, reelyActive, Ayla, SmartThings, Withings, Fitbit, Jawbone including BodyMedia, Nike, Basis, Cohda Wireless, AT&T, Verizon, Huawei, Orange, Belkin, DropCam, Gravity Jack, Alcatel-Lucent, and Siemens. Platforms, software, sensor packages and services are being developed by a wide variety of innovative companies:
These innovative companies, and others, are implementing one or more of these concepts in a variety of ways. As I stated at the beginning, I don't think that these concepts are the same. While the IoT was first named 14 years ago, it is still early days in its implementation. There are many ways that the Internet of Things might evolve, and many missteps that could lead the IoT to be a passing fancy, leaving some important changes in its wake, but never reaching its full potential. I think there is one way, and one way only, that all of the concepts and initiatives will come together and change everything that we do, how we make decisions, how we think about ourselves, how governments make policy, how businesses make money: The Sensor Analytics Ecosystem [SAE]. Here's a tease of a mindmap giving a hint of what I mean by the SAE. Look for my upcoming report "Sensor Analytics as an Ecosystem" and a series of research reports delving into each area introduced therein. The companies listed above are building out parts of the SAE, and will feature heavily in these reports.
Pentaho offers one of the most complete data management and analytics suites available both as an open source solution, its Community Edition, and as an Enterprise Edition:
Webdetails is a 20-person consultancy based in Portugal, founded by Pedro Alves, focused on building Pentaho solutions for its customers and on data visualization. In addition to its consulting work, Webdetails has become the major committer for the open source Community Dashboard Framework [CDF] project, originally developed by Ingo Klose. In the course of their work, as inspired by the muse of customer needs, Webdetails has grown the original CDF project into a full suite of OSS data visualization and dashboard projects, CTools. Over the past year, the talented Webdetails user experience team seems to have put out a new CTool almost monthly.
Pedro Alves is an extremely well-respected member of the Pentaho community, leading community events and training, appearing often in the forums and IRC, and staying connected through Twitter and Skype. Recently, Pedro was highly active in helping to create the Pentaho Marketplace, which gives users direct access, from the BI Server web interface, to a series of plug-ins for the BI Server, including CTools and other community and third-party projects.
I have the pleasure of knowing Pedro, and several others on the Webdetails and Pentaho teams. This week I was able to speak with Pedro, as well as Davy Nys, Vice President, EMEA & APAC at Pentaho, and Doug Moran, one of Pentaho's Founders.
Pedro doesn't feel that the acquisition will change Webdetails, in that both the UX and consulting teams will continue as before. However, both community and enterprise users of Pentaho will feel the impact of both teams, as the lessons learned from Webdetails consulting projects are implemented by the UX team, not only in the Dashboards and data visualizations tools, but also, per Davy, in the overall UX throughout all the Pentaho products. Having worked with Pentaho tools as a practitioner in the past, I know that business users will appreciate this as Pentaho becomes both easier and more pleasant to use. The data scientists will also appreciate more and better tools to draw the story out of the data, and present it to the subject matter experts and business leaders in an Agile fashion.
As Pedro mentioned, most things won't change, such as the fact that CDF is the underpinning of all Pentaho Dashboards, or the pace of development of new CTools, several of which are currently underway. One that I can mention grew out of a request by the Mozilla Foundation for a file and data browser for the Hadoop distributed file system [HDFS] that would be as easy to use as the file browser in any modern operating system. The result is CVB - community VFS browser. One thing that will change is that more of the CTools will make their way into the main branch of the EE product as they reach the appropriate state of maturity and stability.
Pedro has many plans for CTools, and for facilitating data visualization through Pentaho. But in addition to continuing his role as the general manager of Webdetails, and Chief Architect of CTools, Pedro will also be assuming the role of Senior Vice President of Community for Pentaho. As a long time friend of the Pentaho community myself, I have to say that there couldn't be a better choice.
Doug Moran, one of Pentaho's Founders, succeeded the original community guy, Gretchen Moran, as the "Community Guy", and stayed in this role until the start of 2011. Doug's philosophy is that any open source community needs to stand on its own to be organic and strong. The Pentaho community is one of the strongest in the OSS DMA space, and as a result, Doug felt comfortable focusing elsewhere, and assumed management of all of Pentaho's "big data" products and Instaview initiatives. As SVP of Community, Pedro will be mostly focused within the company, to integrate the community internally and help drive the corporate strategy for community. He'll continue to participate in the community but, following the Pentaho BeeKeeper model developed by Pentaho CTO & Chief Geek James Dixon, his main concern will be to assure that there is a rich environment for community innovation. As part of that, Pedro will also be actively pursuing ways to grow and leverage the Pentaho Marketplace. Doug also pointed out that the Pentaho community is hugely valuable for QA and as a training ground for the best Pentaho developers. This is sure to continue with Pedro in his new role. Doug and Pedro have worked together since the early days of Pentaho, when Pedro decided to quit his job and, with his wife, create a company devoted to professional services for Pentaho projects and products. This strong relationship between the original Community Guy and the new SVP of Community can only help to make an already strong community even better and more creative.
Davy pointed out to me that there has been an increase in customer demand for Dashboards that are, in essence, apps within Pentaho. This demand might be met through Pedro's plan to make it very easy to create such dashboard-based apps without any programming ability, and then publish them to the Marketplace. This planned community plugin kick-starter [CPK] will use CDE to create the front-end, and the Pentaho Data Integration software, KETTLE and Spoon, for the backend logic. I believe that both internal and external consultants integrating Pentaho into an organization's decision making process will find this ability exciting, as many of these system integrators are not Java developers. The ability to push such apps to the Marketplace will also be embraced by both CE and EE users, as most customers are excited by the idea of openly sharing their solutions, and enjoy the resulting community recognition.
Webdetails fits very well into creating a finer exploratory analytics experience for customers, and will make Pentaho a superior choice for big data. Combined with Instaview, and with the proper roadmap, it may even push Pentaho into the new Data Grok market, not only helping users answer the questions they have, but actually pointing out the questions that the data set can answer, even if the user didn't think of them.
Both CE and EE users and customers of Pentaho should welcome this acquisition, and look forward to the better UX and data visualization. Most importantly, they should plan on how they can contribute to, and benefit from the Pentaho Marketplace, as it becomes an important part of the Pentaho ecosystem.