Hot Topics in Enterprise Analytics PaxChat Three

Once upon a time, as 2014 drew to a close, the Paxata Data Divas asked us to review the predictions made by Prakash Nanduri, their CEO, as published in Forbes. This led to a series of short, 10-minute webinars. We have also decided to do a series of tweetchats, using the hashtag #PaxChat. As with the webinars, there is one chat for each of the six predictions. Cari Jaquet, VP of Marketing, blogged about this recently.

Prakash Predicts

  1. The lines will blur between data scientists and data analysts.
  2. Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market.
  3. Data Preparation replaces Big Data as hottest topic in Enterprise Analytics.
  4. Hadoop faces a make-or-break year in the larger enterprise market.
  5. Marketing becomes the biggest driver of BI decisions.
  6. The IoT becomes real for B2B.

These predictions were more about Prakash understanding the trends impacting Paxata and their customers. As Prakash wrote:

I see it as an exercise in running our business. In fact, I always try to anticipate what the market will look like, where vendors are going, and what I believe the future holds. It’s less like writing horoscopes and more like playing chess: contemplating the big moves that will matter, not necessarily trying to predict who wins the game.

These webinars and tweetchats are not so much about whether or not we agree with Prakash's prognostications, his analysis of the markets and trends, though we do have fun with that aspect. We are really exploring the landscape around us: how do these trends affect us, our customers, and the industry that has become so important of late, data management and analytics [DMA]?

Data Preparation replaces Big Data

Prakash's third prediction is "Data Preparation replaces Big Data as hottest topic in Enterprise Analytics". What is a hot topic? Popular search terms? Twitter trends? Number of meetups in your area devoted to one subject over another? Inquiries industry analysts receive? Perhaps.

More important: where is the budget going, and where are the skills lacking? Where is the pain in enterprise analytics? Our third tweetchat, the PaxChat on prediction three, held at 11:00 a.m. PDT on April 8, explored these areas with the following questions.

Q1: Do you back the opinion that 80% of #DataScience is #DataPrep?
Q2: What is the role of #DataPrep in #DataScience?
Q3: What would help you the most in #DataPrep?
Q4: Are #BigData and #DataPrep competing terms, or is one dependent on the other?
Bonus Q: How will new and evolving available data sources impact data quality and governance?

You should also catch the 10-minute webinar on this prediction.

As you can see from these, we had a lively discussion. The big takeaway for me is that data science, data analytics, big data, business intelligence… all forms of data analysis, even good ol' manual statistics, require data preparation. The excitement around data preparation today is how Paxata leverages Cloud and distributed computing to bring advanced machine learning algorithms to all types of users through current browser technology. This comes together into a marvelous user experience that allows users to fully, deeply understand their various data sets and how those data sets may be brought together, not only to answer known questions, but to suggest new questions to solve business, operational and technical challenges for all types of organizations. Regardless of how you view it, data preparation will be a hot topic: in searches, in budget, and as a necessary step in dealing with big data.

The fourth #PaxChat is coming on Wednesday, 2015 April 22, at 11:00 a.m. PDT and will cover Prakash's fourth prediction "Hadoop faces a make-or-break year in the larger enterprise market". We hope to chat with you then.

PaxChat Two Cloud BI Dominance

The Cloud…

Bring distributed resources up when you need them, shut them down when you don’t.

Focus IT and business resources on the important opportunities and fun challenges.

Watch development and operations skills merge into devops.

Interweave open source and proprietary technologies, and data from open, government, third-party and internal sources.

What’s not to like?

The technologies and business processes for elastic, cloud computing, a.k.a. “The Cloud”, are coming into the mainstream, building upon the work of companies such as Salesforce and Amazon and Enomaly, and the vision of avant-garde CIOs such as Ben Haines. Growing out of grid computing in the 1990s, evolving through application service providers to everything-as-a-Service [*aaS], the Cloud is now a major component of IT strategies. There has been one area resistant to this type of IT outsourcing: data management and analytics [DMA] in its various forms. Early attempts to bring BI into the Cloud ended in bankruptcy for pioneering companies such as LucidEra, which ended operations in 2009. Recently, other entrepreneurial ventures have targeted the BI market with Cloud services and Business Intelligence as a Service [BIaaS] and Data as a Service [DaaS] platforms. Self-service BI, data visualization and data preparation companies also operate in the Cloud. There is little doubt [though still some lingering doubt] that data integration and governance, data warehousing, business intelligence, advanced analytics and data science can thrive as Cloud platforms and services.

Cloud BI Market Dominance

Who will dominate in the Cloud DMA market? Will one or two of the current players not only survive as independents but emerge with undeniable victory? Or will established enterprise software and Cloud companies acquire those startups, much as the traditional BI vendors became absorbed into the larger IT vendors?

At the end of 2014, Prakash Nanduri, the CEO of the groundbreaking firm Paxata, turned a forecaster’s gaze upon 2015, to help with his strategy in guiding Paxata and in advising their PaxPro customers. The results were six predictions published as an article in Forbes. If you’ve been following along, you know that the Paxata Data Divas, Lilia Gutnik, Dr. Julie Mayhew, Tricia Lee McNabb and Cari Jaquet, invited us to do a series of short webinars and tweetchats to have some fun with these predictions. The first prediction, "The lines will blur between data scientists and data analysts", was the subject of the first #PaxChat on March 11. The next will be held on Wednesday, March 25 at 11:00 a.m. PDT, covering Prakash’s second prediction, “Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market”. What a bold and interesting prediction! One is a large enterprise and consumer player that has built a reputation for proprietary hardware and software, backed by an enthusiastic developer and partner community, with often derisive customers. The other is, arguably, the dominant Cloud Software as a Service [SaaS] company, which made its motto “No Software” and has grown from customer relationship management through salesforce automation into customer service and marketing, growing amazing platforms allowing early adoption of mobile and the Internet of Things [IoT], naming this the Internet of Customers and Internet of Connected Products [or Internet of Carrier Pigeons]. Despite its best efforts over two decades, Microsoft is not accepted as a BI company, let alone a CloudBI company. Salesforce has left analytics up to its partner community until very recently, with the acquisition of RelateIQ and EdgeSpring, and the introduction of Wave. Read Prakash’s predictions for his well-reasoned arguments on why these two might just wind up being the dominant players in CloudBI this year!

The Tweetchat #PaxChat Two

Listen to the webinar to find out why the Data Divas and the Data Archons are skeptical ;-)

Join the tweetchat, by following #PaxChat and give your own take on the following questions.

Prediction Two: Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market.
Q1 #PaxChat Are you ready to move your #BI to the #Cloud?
Q2 #PaxChat Which parts of data management and analytics #DMA are enhanced by Cloud?
Q3 #PaxChat Are @Microsoft @Azure #PowerBI @Office365 and @Salesforce #Wave on your roadmap?
Q4 #PaxChat If 80% of the #DMA work is #DataPrep do @Microsoft and @Salesforce offerings address self-service governance?
Bonus Q #PaxChat Will the current #CloudBI players fall behind in 2015?

The action comes alive March 25:

Paxata's CEO and co-founder Prakash Nanduri's 2015 Predictions Tweet Chats

As we started to celebrate the end of 2014 and anticipate all that 2015 will bring, Prakash Nanduri published an article in Forbes with six predictions. This wasn’t to show his prowess as a prognosticator or futurist, but to bring focus to his strategy as CEO of Paxata, and to his advice to Paxata customers. Prakash’s strategic thought can be found through the Paxata newsroom and blogs.

Predictions

  1. The lines will blur between data scientists and data analysts.
  2. Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market.
  3. Data Preparation replaces Big Data as hottest topic in Enterprise Analytics.
  4. Hadoop faces a make-or-break year in the larger enterprise market.
  5. Marketing becomes the biggest driver of BI decisions.
  6. The IoT becomes real for B2B.

Soon after the Forbes article was published, the Paxata Data Divas invited me to do a series of short [less than ten minutes each] webinars, where we discussed each prediction.

If you don't know the Paxata Data Divas, you should:

Diva-in-Charge: Cari Jaquet, VP of Marketing
Diva Control: Tricia Lee McNabb, Director Marketing Programs
Diva-at-Large: Lilia Gutnik, Product Person and Cruise Director
Diva-of-Mayhem: Dr. Julie Mayhew, a.k.a. Doctor Mayhem, Pre-sales Engineer

Following up on these webinars, the Data Divas asked us to host a series of tweetchats based upon Prakash’s predictions, extending our discussion to everyone following on Twitter. The first in this series of TweetChats will begin at 11:00 a.m. Pacific Time, using the hashtag #PaxChat. Coincident with this, Julie will be at The Drive Conference, while Clarise and I will be at IoT day at EclipseCon. As with most tweetchats, we will have five questions based on the first prediction, “The lines will blur between data scientists and data analysts.” As has become the custom, we will use the format Qn as we tweet each question; please respond with An and the hashtag #PaxChat.

Q1: How do you define data science?
Q2: Is data science a solo or team sport?
Q3: What is the difference between a data scientist and a data analyst?
Q4: How has your company introduced data science?
Q5: How do you bring data science into production?

If there is time, we will have a bonus question. You can find a list of all 11 questions that were considered, and more from Cari, at her blog post "PaxChat - The Predictions Come Alive" as well as comment there with any questions or recommendations.

You can view the webinar on YouTube first.

Thank you Divas and Paxata for sponsoring these #PaxChats on the business impact of an increasingly complex and interesting data landscape.

Now that the #PaxChat One is over, you can see the results below. The next PaxChat is scheduled for Wednesday, March 25 at 11:00 a.m. PDT. It will cover the second of Prakash's predictions for 2015: "Microsoft and Salesforce.com will take a dominant share of the Cloud based BI market".

Springbok Leaps into Data Harmonization

Springbok by Informatica is the latest entry in the nascent self-service data preparation market. Springbok is impressive on several fronts.

Data Harmonization.

Rather than data preparation, Informatica uses the term data harmonization, to emphasize the capabilities within Springbok to bridge the divide between the business and information technology. This feature truly differentiates Springbok. Other products for self-service data preparation focus 100% on the business side.

Self-service.

Springbok is truly a self-service tool. Though it is possible to integrate with other Informatica products, Springbok is a Cloud offering, available today, wherein you can upload your data all by yourself. For free. Try it yourself at

http://springbok.is/

Social Collaboration and Permutation Management.

These are Informatica’s terms that represent two sides of the same coin: identification of the most valued data players and the most trusted data sources, allowing collaboration among business users of those data sets, and visibility to IT into the business use of data. Springbok ranks data users, as well as their Springbok recipes, data sources and data permutations, to allow other users of that data to have confidence in unfamiliar data sources. Additionally, IT gains understanding of what data internal and third-party business users are actually using and how they are actually using that data, all before a business user makes a request. This prevents IT being blind-sided by Shadow IT.

Connectivity.

While Springbok is fully a Cloud product, it easily connects to both on-premises and Cloud data sources. The family traits from Informatica’s long history in data integration show up here.

Why has a self-service data preparation market come to be, fast on the heels of adoption of self-service BI tools, such as Tableau and Qlik? To solve a problem. With the advent of next generation BI tools and the trend towards self-service Data Management and Analytics (DMA), business users manipulate the data themselves. They always have. The first question that we are always asked in a data warehouse or business intelligence project is “Can I export that to Excel?” As Big Data and Data Science have moved from buzzwords to business practices, it has become widely known that 80% of the data analytics process involves preparing the data for use by locating, cleansing and standardizing the data. Whether this is done by a Data Scientist using Unix shell tools like sed and awk, or as an iterative process between IT and business, it is time consuming. It is also boring. IT gets caught in the dilemma of handling the increasing data preparation requests and is playing catch up.

Springbok is a self-service data harmonization tool that empowers business users to find the data and guides them through the process of enriching and shaping the data without the need for deep technical skills or dependence on outside help. Let’s take a closer look at the capabilities Springbok brings to all users along the gradient from non-technical to quant to IT specialist.

Automatic data suggestion.

Project Springbok provides a quick and easy way to take data from more than one source and accurately combine them. It provides the ability to automatically suggest data for data enrichment. Using any single file, any combination of sources, or all available data sources, Springbok suggests completion of spotty records or of an entire data set required for analysis through semantic analysis of the data. For example, if a column of data contains city names, but some records are blank, Springbok can use a Zip Code column within that file, or a Golden Record from a Master Data Management system, or a third-party source, such as Dun & Bradstreet, to complete the data set.
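To make the city and Zip Code example concrete, here is a minimal sketch in Python. It is not Springbok's implementation, which is not public; the lookup table, the field names, and the fill_missing_city helper are all hypothetical, and the reference data stands in for whatever golden record or third-party source would be used in practice.

```python
# Hypothetical reference table; in practice this could come from a master
# data system or a third-party source. The values here are illustrative only.
ZIP_TO_CITY = {
    "94105": "San Francisco",
    "10001": "New York",
    "60601": "Chicago",
}


def fill_missing_city(records, zip_to_city):
    """Return copies of records with a blank 'city' filled from 'zip' when known."""
    completed = []
    for record in records:
        row = dict(record)  # copy so the caller's data is left untouched
        city = (row.get("city") or "").strip()
        if not city:
            suggestion = zip_to_city.get((row.get("zip") or "").strip())
            if suggestion is not None:
                row["city"] = suggestion
                row["city_suggested"] = True  # flag the value so a user can review it
        completed.append(row)
    return completed


records = [
    {"name": "Acme", "city": "Chicago", "zip": "60601"},
    {"name": "Globex", "city": "", "zip": "94105"},
    {"name": "Initech", "city": "", "zip": "99999"},  # unknown zip stays blank
]
completed = fill_missing_city(records, ZIP_TO_CITY)
```

Flagging each filled value, rather than silently overwriting the blank, keeps the suggestion reviewable, which fits the idea of the tool suggesting data while the business user stays in charge.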

Business user social Collaboration.

Springbok promotes business user collaboration by allowing business users to access correct data, to know who is the person responsible for the evolution of that data, and to understand the lineage of that data. This promotes collaboration in the enterprise through reputation-based trust. Within Springbok, a user can find other users and other data sources that their peers trust and use. This is invaluable to both new employees and to old-timers being confronted by new sources of data and changing business processes.

Again, Self-Service.

The basic tenet of the design philosophy of Springbok is self-service, from a user uploading a file to the Springbok Cloud and immediately being able to play with their data, to that same user being able to export to the Self-Service BI tool of their choice.

Permutation Management.

This feature is a major differentiator of Springbok. IT is able to have visibility into and understand the evolution of a data set as well as the identity of key data influencers. This promotes collaboration between IT and their respective business partners. Permutation Management also aids in finding key external sources, shining a light into Shadow IT. Further, Springbok is a stand-alone product; however, for Informatica customers, data is easily centralized with one click to bring the business users’ recipes into Informatica PowerCenter for production use in the analytic environment. This last function does raise questions about configuration management, change control, and regulatory compliance. We were assured by Informatica that this is a consideration for Springbok, and again, those enterprise roots show. How to use this capability will be the customer's choice. Regulatory compliance and traceability will be handled by exposing the Springbok logs for audits. Notice the “will be”. The one-click instantiation of a Springbok recipe as a PowerCenter transformation is on the roadmap, but not available in the current version of Springbok, which is freely available.

Springbok Permutation Management

One of the most impressive things about Springbok is the rapid adoption among Informatica customers and non-customers. One hundred users representing approximately 30 Informatica customers participated in the development of Springbok. In the three months since the announcement of the public Springbok beta program, over 1700 (now 2300 since our last briefing) users from more than 350 (now 500) organizations have been uploading data into the Springbok cloud, and happily manipulating that data. One other area where Informatica has recently delighted us is the growth of the Informatica Marketplace. We are looking forward to the day when users can contribute non-proprietary Springbok recipes to the Marketplace. In today’s connected world, data management and analytics is the competitive edge. Participation in such a wide-ranging community provides the cross-fertilization necessary to fully leverage the changes coming about through evolving technologies from social media to the Internet of Things.

Paxata Revealed

The week of 2013 October 28 was a big one for Paxata, Inc. Founded in January of 2012, followed by advisories, beta customers also known as "Pax Pros", and 12 sprints, Paxata quietly released their first GA product in May of 2013. With panels and debuts at the Strata + Hadoop conference in New York and other events, leading up to announcements and demonstrations at the Constellation Connected Enterprise at the Ritz Carlton in Half Moon Bay, California, Paxata officially left stealth mode, publicly discussing:

  • Five Blue Chip Customers: including UBS, Dannon, Box, Pabst, and a $49 B High Tech Networking Manufacturer
  • Partnerships with Tableau, QlikTech and Cloudera
  • Adaptive Data Preparation Platform
  • Eight Million US Dollars in the latest round of funding led by Accel
  • Filling out the Management Team with Enterprise Software executives having backgrounds from SAP, Tableau and Hyperion

The most wondrous feature of the Paxata Adaptive Data Preparation Platform is how it adds semantic richness to one's data sets by automatically recommending and linking to third-party and freely available data. This allows one to bring in firmographic, demographic, social and machine data within the context of the user's goals. This is what truly allows the Paxata Adaptive Data Preparation Platform to go beyond data exploration and discovery.

Paxata has received a fair amount of press as well, some of which I've referenced below. However, all this press misses what is one of the most important additions Paxata makes to the toolboxes of Data Management & Analytics [DMA] professionals… the ability to present questions to the user that they may not have thought of on their own. Paxata was one of the companies that inspired my DataGrok blog post. Paxata was in stealth at the time, and couldn't be named then. Now, I'm happy to be able to write that Paxata is one of the few companies or projects building tools that allow the creator and user of data to go beyond data discovery, beyond data exploration, to being able to fully, deeply understand their data. Data discovery and data exploration tools allow one to determine if various data sets can answer the questions posed by business, engineering or scientific challenges. These tools go further by exposing data integrity issues among data sets or data quality problems within a data set. Some such tools might help the user find new data sets or how various data sources within an organization might fit together in a data warehouse. Some hark back to grep, sed and awk to parse textual data. Others provide probabilistic and statistical tools to determine the appropriate shape, distribution or density functions of a data set. But Paxata is one tool that does all these and more, and does it through your web browser in a collaborative fashion, maintaining the history of each collaborator's operations on the data sets.

When my partner, Clarise, and I were first briefed by Paxata in November of 2012, we were so excited that we stayed over three hours. The demonstration, of what was then a much rougher product than what you see today, incited both of us to exclaim how much we wished that we had this tool back in our DMA practitioner days. We were treated to a demonstration using the data from another Constellation Research customer with which we were familiar. Over a year later, we were treated to a pre-launch briefing using current data sets from that same customer. The ease of use, the pleasantness of the user experience, the simplicity with which one could complete complex tasks, from histograms to column-splitting, showed the maturity that Paxata had gained since our first exposure. What was most important to us was that Paxata could show a solution for every need that we would like to see in the Adaptive Data Preparation Platform, from our experiences in implementing data warehousing and business intelligence programs since 1996, as well as our decades of experience in computational statistics and operations research.

  • Collect and parse data of disparate types and sources, including XML, JSON, Excel, flat files and relational databases
  • Pre-analyze and visualize the data sets
  • Combine different data sets
  • Separate data into patterns
  • Verify each datum for integrity, quality, mastering and governance
  • Allow multiple IT and end-users to prepare and operate upon the data
  • Maintain the history of what each user [a.k.a. Pax Pro] does, and show that history to all other users

It allows data warehousing and BI extract, transform and load professionals, business analysts, data scientists, chemists, physicists, engineers, researchers, and professionals of all skills who work with data to completely understand and resonate with their data sets. The Paxata Adaptive Data Preparation Platform does what few other tools can do: it provides clues to what you didn't know to ask. It poses questions that the data can answer, but that you didn't think to ask. And it does all of this in a familiar looking interface, in HTML 5, in your favorite web browser, wherever you are, whenever you need it. In Paxata's words:

  1. Connect
  2. Explore
  3. Transform
  4. Combine
  5. Publish

Paxata pricing is published and open. There are three subscriptions available:

  • Pax Personal
  • Pax Share
  • Pax Enterprise

Each of the Paxata subscriptions builds upon the previous one, from an individual subscription, to the ability for those with individual subscriptions to share in a single environment, to a full organization-wide subscription. Of course, what makes this possible is that the Paxata Adaptive Data Preparation platform is available as a Cloud service, accessible through any modern HTML 5 web browser, whether that's from a sophisticated, high-end workstation, a tablet or smart phone.

The main value comes not from a nice-looking, fairly intuitive interface, but from the underlying technologies that make Paxata so useful: powerful Mathematics, Semantics and Graph Theory algorithms. The results of these algorithms are easily accessible through this Cloud-based, web experience, while the complexities are under the covers, not getting in the way. This fact is what makes the Adaptive Data Preparation Platform so accessible to business analysts, and other creators and users of data who are not PhD statisticians. Paxata uses proprietary algorithms that detect relationships among data sets, using probabilistic techniques to select the best joins, and semantically typing the data so that it can intelligently enrich, clean and merge the data based upon context, not just metadata. All of this is done in an ad hoc fashion, with no predefined models or schæmas needed. These proprietary algorithms make use of:

  • Latent Semantic Indexing
  • Statistical Cluster Graphing
  • Pattern Recognition
  • Text Analytics
  • Machine Learning
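Paxata's algorithms are proprietary, so the following is only a toy illustration of the general idea of detecting a likely join between two data sets without a predefined schema. It swaps in a simple, well-known measure, the Jaccard similarity of column values, in place of the probabilistic techniques mentioned above; the column names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of values: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_join_candidates(left, right):
    """Score every (left column, right column) pair by value overlap, best first."""
    scores = [
        (jaccard(left_values, right_values), left_name, right_name)
        for left_name, left_values in left.items()
        for right_name, right_values in right.items()
    ]
    return sorted(scores, reverse=True)


# Two small, made-up data sets stored as column name -> list of values.
customers = {
    "cust_id": ["C1", "C2", "C3", "C4"],
    "region": ["West", "East", "West", "South"],
}
orders = {
    "order_id": ["O1", "O2", "O3"],
    "customer": ["C1", "C3", "C4"],
}

ranked = rank_join_candidates(customers, orders)
best_score, best_left, best_right = ranked[0]
```

Here the pair cust_id and customer ranks first because the columns share three of four distinct values, even though their names differ. A real system would also need to weigh data types, semantics and scale, which this sketch ignores.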

Distributed computing and in-memory technologies allow these computational statistics algorithms to be cost-effectively executed in parallel, across massive data sets. Coupled with the advancements in visualization technologies, Paxata is able to address a 13.5-16 billion dollar market over the next three years, with extremely attractive pricing. The true return on investment from Paxata comes from flipping the DMA equation around. Currently, a common truism is that 80% of the time on a DMA, Data Science, DW or BI project is spent in preparing data, and 20% in analyzing the data. Paxata reduces that data preparation percentage, such that 70% is analytics and 30% is preparation. This not only reduces the labor directly involved in preparing the data, but also allows an Agile framework to address significant business needs at the right time, in a sustainable fashion.
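To see what the shift from 80/20 to 30/70 could mean in hours, here is a small worked example in Python. It rests on one assumption that is ours and not Paxata's: that the analysis effort stays the same in both scenarios and only the preparation effort changes. The 20-hour figure is illustrative.

```python
def project_hours(analysis_hours, prep_share):
    """Return (total hours, preparation hours) when analysis hours are fixed
    and preparation takes `prep_share` (between 0 and 1) of the whole project."""
    total = analysis_hours / (1.0 - prep_share)
    return total, total - analysis_hours


analysis = 20.0  # assumed: the same analysis effort in both scenarios

before_total, before_prep = project_hours(analysis, 0.80)  # 80% prep, 20% analysis
after_total, after_prep = project_hours(analysis, 0.30)    # 30% prep, 70% analysis

print(f"before: {before_total:.1f} h total, {before_prep:.1f} h preparation")
print(f"after:  {after_total:.1f} h total, {after_prep:.1f} h preparation")
```

Under that assumption, a project with 20 hours of analysis goes from about 100 hours in total (80 of them preparation) to about 28.6 hours in total (about 8.6 of them preparation). If the freed time is instead reinvested in more analysis, the totals will differ, but the ratio flips the same way.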

Paxata's strategy is to attach to the QlikView and Tableau markets that are being hampered from enterprise adoption because of these very data preparation challenges. Alongside these is the partnership with Cloudera, providing enterprise class access to modern, distributed data storage systems. Add connectors to common enterprise and external data sources and the third-party Paxata Enrichment Libraries, and it is obvious to the most casual observer that the Paxata Adaptive Data Preparation Platform addresses the most frustrating complaint of Data Scientists and Business Analysts alike: that too much of their time is spent on plumbing, whether directly or waiting for IT. We have long spoken about the need for IT to give up control of data, and realize that their most effective role is to provide a framework of success for end-users to fully, deeply understand and use their data to solve real problems. Paxata creates this framework for success.

Other Sources to learn about the Paxata launch:

  1. The Paxata Web Site
  2. Diginomica: Can Business Users control their data destiny? Paxata says yes
  3. GigaOM: With $10M from Accel, Paxata wants to make data prep a breeze
  4. VentureBeat: Paxata grabs $8M to help data scientists skip the dirty work
  5. YouTube: Paxata Customers and Partners Help Launch the Company
  6. YouTube: The Cube: Prakash Nanduri - Big Data NYC

We take a system and ecosystem approach to data management and analytics, with a focus on developing Sensor Analytics Ecosystems for the Internet of Things. As Independent Researchers we work with data management and analytics vendors to understand the aspects of IoT data and metadata such as time-series, location, sensor specifications & degradation; we work with IoT vendors to understand their data management and sensor analytics needs; we work with both for adaptation to Sensor-Actuator Feedback Loops interacting through the Fog, Edge, Intermediate Aggregation Points, Cloud and Core, with augmenting decisions at every point, and making autonomous decisions as IoT matures through the 5Cs: Connection, Communication, Collaboration, Contextualization and Cognition. We work with Academics, Technology-for-Good, Government and Business Organizations to understand advances in Science, Technology, Engineering, Arts and Mathematics. We filter this information through a framework of Cultural, Regulatory, Economic, Political and Environmental factors to imagine future scenarios that allow our customers to gauge adoption without the hype. We work individually and with partners to develop strategies, define system and enterprise architectures, manage programs and projects, and achieve success with IoT.
