Guavus Plus SQLstream means Broad and Deep for IoT Data Science


From the time that Damian Black, founder of SQLstream, and Dr. Anukool Lakhina, founder of Guavus, first met almost a decade ago, the synergies and complementary nature of their visions were apparent to both of them. At the time, though, each chose his own path, with Guavus using open source solutions to become a leader in big data, real-time analytics, firmly focused on operational efficiency for telecommunications CSPs (Communications Service Providers). Meanwhile, SQLstream built off of Eigenbase components to create one of the first true streaming analytics engines, with strict compliance to SQL standards; on the business side, it found a niche in the burgeoning IoT market, especially in Transportation, all while remaining a horizontal solution.

Guavus was acquired by Thales in 2017. The Thales Group, a large, international player in aerospace and defense with a significant presence in transportation, expressed interest in SQLstream about four years ago. It was at this point that Damian and Anukool realized that the solutions Guavus and SQLstream had developed since their earlier discussions had become even more strongly complementary, with Guavus' deep domain expertise in telecommunications, machine learning and data science, and SQLstream a pioneer and leader in streaming analytics with a horizontal platform. In addition, Guavus is following Thales' lead in broadening its domain expertise into the Industrial Internet of Things. SQLstream has had great success in the Transportation area, as well as in other sensor analytics ecosystems (SensAE). Guavus also recognizes the need to process the vast amount of telecoms and IoT data closer to the source. In January of 2019, Guavus acquired SQLstream.


Although the merger is only a month old, the two companies are already working as one to bring the strengths of each together for greater customer success. Over the next six to 12 months, the two will be integrated into a single platform with the ability to scale up to mind-numbingly large data flows, and to scale down to very finely-tuned small aggregates where and as needed throughout the ecosystem. This will allow greater operational efficiency, as separating signal from noise close to the source allows the data to be processed immediately, providing value in a timely and cost-effective manner. Data rates are growing, per Damian, by 50% as edge sources increase in importance, but data storage and management costs are only decreasing by 12-14%. Only by pushing the algorithms – the machine learning models – into the streaming pipeline will organizations be able to actually draw value from this data. Guavus has some of the best data science expertise in the industry for their customers in Telecom. As this domain experience grows to include Transportation, and IIoT in general, companies growing in IoT maturity will be able to perform streaming analytics and machine learning augmented analytics on appropriately aggregated data throughout their ecosystems.
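To make the idea of separating signal from noise close to the source concrete, here is a minimal, hypothetical sketch (plain Python, no streaming framework, and not either vendor's actual product) of an edge aggregator that forwards only windowed summaries plus out-of-band readings, rather than every raw sample; the window size and alarm threshold are illustrative assumptions:

```python
from statistics import mean

def edge_aggregate(readings, window=10, alarm_threshold=100.0):
    """Reduce a raw sensor stream to per-window summaries plus alarms.

    Only the aggregate (min/mean/max per window) and any reading that
    crosses the alarm threshold are forwarded upstream, so the core
    receives a small fraction of the raw data volume.
    """
    forwarded = []
    buffer = []
    for value in readings:
        if value > alarm_threshold:
            # Signal: forward immediately, unaggregated.
            forwarded.append(("alarm", value))
        buffer.append(value)
        if len(buffer) == window:
            # Noise reduction: forward one summary per window.
            forwarded.append(("summary", min(buffer), mean(buffer), max(buffer)))
            buffer = []
    return forwarded

# 20 in-range readings plus one spike collapse to two summaries and one alarm.
stream = [20.0] * 20
stream[5] = 120.0
print(edge_aggregate(stream))
```

Twenty-one raw values reduce to three forwarded records here; at real sensor data rates, that ratio is what makes the "process immediately, close to the source" economics work.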

"With our integrated solutions, CSPs to IIoT customers will be able to take advantage of something that’s radically different as we deliver AI-powered analytics from the network edge to the network core. With this solution, our customers can now analyze their operational, customer, and business data anywhere in the network in real time, without manual intervention, so they can make better decisions, provide smarter new services, and reduce their costs." — Guavus Press Release

This matches well with what we have seen, and what we present for SensAE architecture: the ebb and flow of data throughout the ecosystem must allow for appropriate aggregation and analytics at each point within the ecosystem.


At MWC19, there has been a lot of interest in these specific solutions, and also in building trust throughout the ecosystem – with security and, as our research has shown, with the ability to select the desirable levels of privacy and transparency. Responding to these industry concerns is already on the Thales/Guavus/SQLstream roadmap.

"The SQLstream products have the ability to analyze, filter, and aggregate data at the network edge in real-time and forward the information to the network core where the Guavus’ Reflex® platform can apply AI-powered analytics, giving customers a widely distributed and scalable architecture with better price/performance and total cost of ownership." — Guavus Press Release

The next few months are going to be exciting with SQLstream, Guavus and Thales bringing together their expertise in streaming analytics, data management, telecommunications, transportation, machine learning, data science, industrial needs and system engineering.

A New Age for Data Quality

Once, most data quality issues were from human errors and inadequate business processes. While these still exist, new data sources, such as sensor data and third-party data from social media, openData and "wisdom of the crowd" introduce new sources of potential error. And yet, the old ways of storing "data" in log books, engineering journals, paper notes and filing cabinets are still widely practiced. At the same time, data quality is more important than ever as organizations rely more on predictive algorithms, machine learning, deep learning, artificial intelligence and cognitive computing. The basics of data quality have remained the same, but the means by which we can assure data quality are changing.

Data Quality Basics

Fundamentally, data quality is about trust; that the decisions made from the data are good decisions, based upon trustworthy data. To achieve this trust, data must be:

  1. correct
  2. valid
  3. accurate
  4. timely
  5. complete
  6. consistent
  7. singular (no duplications that affect counts, aggregates, etc.)
  8. unique
  9. [have] referential integrity
  10. [apply] domain integrity (data rules)
  11. [enforce] business rules

Now, these principles must be applied to all the new sources and uses of data, often as part of streaming or real-time decision support, automated decisions, or autonomous systems.
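As one illustration of applying a few of these dimensions (completeness, singularity, and domain integrity) programmatically in a pipeline, here is a minimal Python sketch; the field names and rules are hypothetical:

```python
def check_quality(records, domain_rules):
    """Apply simple data-quality checks to a list of dict records.

    Returns issues keyed by dimension: incomplete/invalid records,
    duplicate records (singularity), and domain-rule violations.
    """
    issues = {"invalid": [], "duplicates": [], "domain": []}
    seen = set()
    for rec in records:
        # Completeness/validity: every expected field present and non-null.
        if any(rec.get(field) is None for field in domain_rules):
            issues["invalid"].append(rec)
            continue
        # Singularity: exact duplicates distort counts and aggregates.
        key = tuple(sorted(rec.items()))
        if key in seen:
            issues["duplicates"].append(rec)
        seen.add(key)
        # Domain integrity: each field must satisfy its data rule.
        for field, rule in domain_rules.items():
            if not rule(rec[field]):
                issues["domain"].append((rec, field))
    return issues

# Hypothetical rules: sensor ids are positive, temperatures within range.
rules = {"sensor_id": lambda v: v > 0, "temp_c": lambda v: -40.0 <= v <= 85.0}
data = [
    {"sensor_id": 1, "temp_c": 21.5},
    {"sensor_id": 1, "temp_c": 21.5},   # duplicate
    {"sensor_id": -3, "temp_c": 19.0},  # domain violation
    {"sensor_id": 2, "temp_c": None},   # incomplete
]
print(check_quality(data, rules))
```

In a streaming or real-time setting, the same checks would run per-record or per-window rather than over a batch, but the dimensions being tested are unchanged.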

Moreover, the data rules and the business rules must reflect reality, including evolving cultural norms and regulatory requirements. For example, in many areas of the world, gender is no longer based simply on biology at birth, but includes gender identification that may be more than just male or female, and may change over time as an individual's self-awareness changes. As another example, regulations in some areas of the world are imposing stricter restrictions around individual privacy, such as the General Data Protection Regulation (GDPR) in the EU with full application coming in May of 2018.

Data Verification

Third-party data verification tools have been around for decades, and are often purchased and installed on-premises, complete with their own databases of information. Today, data verification may be done through such tools, or through openData and openGov databases; modern data preparation tools may even recommend freely available data sources, such as demographic data, to enhance and verify the data that your organization has collected or generated. Other data, such as social media data, is also available to enhance your understanding of customers, markets, culture, regulations and politics that might influence your decisions. Current third-party data is most often accessed through Application Programming Interfaces (APIs) that may be HTTP or ReSTful, or might be proprietary. Use, or rather, misuse of these APIs has the potential to degrade, rather than enhance, your decision support process. Another issue is that you may not know how third-party data is governed according to the basics of data quality. Again, modern data preparation and API management tools can help with these issues, as can open architectures and specifications.

Data from sensors and from sensor-actuator feedback loops aren't new. Data from connected sensors, actuators, feedback loops, and all kinds of things – from pills to diagnostic machines, from wearables to cars, from parking sensors to a city's complete transportation system, some of which may be available through openGov initiatives – are new. Many of the organizations using such IoT data have never used such data before.

Now that we have taken a very brief look into data quality and these new data opportunities, let's look at the new tools available for taking advantage of them.

Data Stewardship through AI

In the spirit of drinking one’s own champagne, many of the new uses of data – the output of data science – are being applied to data management itself. As software has consumed the world, machine learning is eating software; deep learning and artificial intelligence are rapidly becoming the top of this food chain. Once, a dozen or so source systems made for a good-sized data warehouse, with nightly ETL updates. Now, organizations are streaming hundreds of sources into data lakes. The people, processes and technologies for data quality can only keep up when augmented by advanced analytic algorithms. Machine learning uses metadata to continuously update business catalogues as artificial intelligence augments the data stewards. Metadata is changing as well, to provide semantic layers within data management tools, and to better understand the data sets coming from the IoT, social media, or open data initiatives.

The first players to apply these techniques to data management and analytics became our first "Data Grok" companies – providing data that helps humans grok data and how that data can be used. Since then, the first companies to earn the DataGrok designation, Paxata and Ayasdi, have been joined by many others adding machine learning, deep learning and even artificial narrow intelligence (ANI) to provide recommendations and guardrails to data scientists, data stewards, business analysts, and any individual using organizational data to make decisions.

Data Quality Relations

Data Management development through the execution of enterprise architecture, policies, practices and procedures encompasses the interaction among data quality, data governance, and data integrity. Regulatory and process compliance are dependent upon all three. Ownership of each data set, data element and even datum, is critical to assuring data quality and data integrity, and is the first step to providing data governance. Business metadata, technical metadata and object metadata come together through business, technical and operational ownership of the data to build data stewardship and data custodian policies. The architectural frameworks used for Enterprise, IoT and Data architectures result in specifications for each critical data element that provide an overarching view across all business, technical and operational functions.

Data governance interacts with architectural activities in an agile and continuous improvement process that allows standards and specifications to reflect changing organizational needs. The processes and people can assure that data specifications are applicable to the needs of each organizational unit while assuring that data standards are uniformly applied across the organization. The size and culture of an organization determines the formality and structure of data governance and may include a governing council, sponsorship at various organizational levels, executive sponsorship (at a minimum), data ownership, data stewardship, data custodianship, change control and monitoring. But even with all this, the goal of data governance must be to provide appropriate access to data, and not restrict the use of data…from any source.

IT Must Adapt

Information Technology has often been seen as a bottleneck. Many times in our consulting work, we have found ourselves in the position of arbiter between IT and the business. Self-service BI, Analytics and Data Preparation mean IT must become an enabler of data usage, providing trustworthy data without restricting the users. The productionalizing of data science again means that IT must be an enabler of data usage, including the machine learning and other advanced analytics models that data science teams produce. As data science and data management & analytics tools come together, the need for IT to guide the use of data and tools without limiting that use becomes paramount. At the same time, privacy and security must be retained within data governance. Patient data must only be available to the patient and those healthcare professionals and caregivers who require access to that data. Personally Identifiable Information (PII) must be controlled. Regulatory compliance, such as GDPR and PCI, must be adhered to.

There is also a need for two-way traceability: from the datum to its end-use in reports and analytics, training sets or scoring; and from the end-use back to the source system, including the lineage of all transformations along the way. This lineage of source and use enables both regulatory compliance and collaboration. Such transparent history also helps build trust in the data, and in what other users and IT data management professionals have done to the data.
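A minimal sketch of such two-way lineage might record, for each derived dataset, both its upstream sources and its downstream uses, so one can trace from datum to end-use and back. The structure below is an illustrative assumption, not any particular tool's lineage model:

```python
from dataclasses import dataclass, field

@dataclass
class LineageNode:
    """One datum or dataset, linked to what produced it and what uses it."""
    name: str
    transformation: str = "source"                 # how this node was produced
    sources: list = field(default_factory=list)    # upstream nodes
    uses: list = field(default_factory=list)       # downstream nodes

def derive(name, transformation, *sources):
    node = LineageNode(name, transformation, list(sources))
    for src in sources:
        src.uses.append(node)                      # maintain the reverse link
    return node

def trace_to_sources(node):
    """Walk upstream to the original source systems (the regulatory view)."""
    if not node.sources:
        return [node.name]
    return [leaf for src in node.sources for leaf in trace_to_sources(src)]

crm = LineageNode("crm.customers")
erp = LineageNode("erp.orders")
joined = derive("customer_orders", "join on customer_id", crm, erp)
report = derive("q3_revenue_report", "aggregate by quarter", joined)

print(trace_to_sources(report))   # -> ['crm.customers', 'erp.orders']
print(crm.uses[0].name)           # forward trace -> 'customer_orders'
```

The forward links support impact analysis ("what breaks if this source changes?"), while the backward walk answers the compliance question ("where did this number come from?").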

IT and OT must Work Together

As connected products mature through the 5Cs of our IoT maturity model (connection, communication, collaboration, contextualization and cognition), information technology and operations technology, business systems and engineering systems, must share data under a unified architecture. Much of the promise of the IoT can only be achieved through IT and OT working together. Consumer and marketing information being merged with supply chain and production quality information to build predictive models that allow just-in-time inventory control and agile, custom product delivery is only one example of changes to consumer expectation, whether that consumer is another business, a government or an individual. Industries from every market, such as the energy sector, consumer packaged goods and pharmaceutical manufacturing have reaped the benefits of IT and OT working together, of SCADA/Historians data being integrated with Cloud marketing and sales data or ERP data. But for this partnership between IT and OT to work, they each must trust the data of the other, and that only happens through data governance and data quality efforts.

Metadata and Master Data Management in DQ

Metadata and Master Data Management (MDM) are fundamental in ensuring data quality, and key to using trustworthy data throughout a modern data ecosystem from the most modern data sources and analytic requirements at the Edge to the most enduring legacy systems at the Core; from the droplets in the Fog to the globally distributed multi-Cloud and hybrid architectures. Metadata and MDM have been part of the solution all along, but now must be applied in new ways, both at the core and at the Edge, and distributed through multiple Cloud, hybrid architectures, on-premises, and out into the furthest reaches of the Fog, as all these resources elastically scale up and down at need.

Sensor Data Makes for Interesting DQ

Some of us have been dealing with sensors, sensor-actuator feedback loops and the concepts of the large, complex system for all of our careers, but for many, the fundamentals of connected hardware will be new. Sensor data can be messy. Two sensors from the same manufacturer will be slightly different in the data sets produced, even though they both meet specification; two sensors from different manufacturers will certainly be different in center point, range, precision and accuracy, and how the data are packaged. Sensors drift over time, and will need calibration against public standards. Sensors age, and may be replaced, and both of these conditions affect all the previous points.
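A hedged sketch of the kind of normalization this messiness forces, with invented calibration parameters: two sensors reporting the same physical quantity, but with different center points and ranges, are mapped onto a common scale, and a simple linear drift correction is applied by sensor age.

```python
def normalize(raw, center, span, drift_per_day=0.0, age_days=0):
    """Map a raw sensor reading onto a common 0-1 scale.

    center/span describe the sensor's own calibration (these differ even
    between same-model sensors); drift_per_day models slow calibration
    drift that accumulates until the sensor is recalibrated or replaced.
    """
    corrected = raw - drift_per_day * age_days
    return (corrected - (center - span / 2)) / span

# Two sensors measuring the same quantity with different calibrations;
# sensor B has also drifted +0.01/day over 100 days in service.
a = normalize(52.0, center=50.0, span=20.0)
b = normalize(42.6, center=40.0, span=16.0, drift_per_day=0.01, age_days=100)
print(round(a, 3), round(b, 3))   # both normalize to 0.6
```

Real calibration curves are rarely this linear, but the point stands: without per-sensor metadata (center, range, drift, age), readings from different devices cannot be meaningfully compared or aggregated.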

Data architecture and DQ

Having worked in System Engineering for aerospace, I return to Deming's definition of quality – conformance to specifications well suited to the customer – and, for data, specifications come from the architecture.

Architecture abstracts out the organizational needs as a series of views representing the perspectives of the people, processes and technologies affected by and effected through that solution, system or ecosystem. A standalone quality solutions architecture is not a good idea, as quality must be pervasive throughout an architecture. However, adding quality as a view within an architecture assures that data quality, data governance and compliance are properly represented within the architecture. {Though outside the scope of this post, I would also consider adding security as a separate view.} There are many architectural frameworks, and even controversy about their effectiveness; TOGAF, MIKE2, 4+1 and BOST are the main frameworks. Architectural frameworks focus on enterprise, data and solutions (application) architectures, with a recent interest in Internet of Things (IoT) architecture. Adherence to a framework or method is not as important as ensuring that the process by which an architecture is created suits the culture and needs of the organization.


For reference purposes, here is a list of data quality standards and methods that you might find useful:

  • ISO9001 Quality Management Family of Standards
  • ISO 8000 Data Quality Family of Standards
  • EFQM Quality Management Framework and Excellence Model
  • TOGAF The Open Group Architecture Framework for Data Architecture
  • BOST [PDF] An Introduction to the BOST Framework and Reference Models by Informatica
  • MIKE2 The Open Source Standard for Information Management
  • 4+1 Views [PDF] Architectural Blueprint by Philippe Kruchten
  • TDWI Data Improvement Documents

Informatica is First in Customer Loyalty, Again, AND Continues to Innovate

We began using Informatica in its very early days. By 1998, we were using it for an ambitious enterprise data warehouse project spanning three divisions of a Fortune 100 company, taking in transactional and operational data from over 40 operating companies. The days are long gone when we would have implemented complex data architectures and data flows using Informatica Power Center and Power Mart in hub-and-spoke arrangements. But the need to provide powerful data management for analytics around business processes has only grown, as sales, services and customer touch-points have grown. We now generate data every minute of the day, awake or asleep. We tweet, email, and post to social media, personal blogs, and photography and video sharing sites. The things that make the things we use, and all the things around us, have embedded computers, are sensor enabled, and generate even more data. Because of this, we have changed the focus of data management from simply extracting data from common source systems, transforming it so that it all conformed to internal standards, and loading it into that mystical single source of truth [the ETL of old]. Today, our focus is on discovering and exploring data relevant to our organizational and individual needs, no matter the source. And yet, all this data must be vetted; data quality and data governance are more important than ever. While the idea of a single source of truth is passé, trust in our data is not. Whether we are trying to improve our personal fitness or determine the impact of the latest marketing campaign, or bring the perpetrators of genocide to justice, we expect consistency in the answers to the questions we ask of all these sources of data.

Informatica has been amazingly innovative in expanding its capabilities for data management. Informatica solutions and products keep up with where industry is going. Informatica was one of the first data management companies to realize the importance of the Internet of Things (IoT). Their development of the Intelligent Data Platform is seen as a hallmark in handling all these new sources of data. Their attention to metadata and master data management has also improved upon, and even outpaced, the industry. Informatica can still be deployed on-premises, in one’s own data center, in private or hybrid clouds, or on public Cloud platforms. Real-time data management and continuous event processing are also part of Informatica’s suite of products. All of this innovation has been rewarded again today, as Informatica has been named #1 in Customer Loyalty for data integration for the 11th year in a row. Informatica earned top marks in the annual Data Integration Customer Satisfaction Survey, conducted by the independent research firm Kantar TNS.

To show that Informatica is not resting on its laurels, they have also announced today new and enhanced products and services:

  • Cloud Support Offerings
  • Business Critical Success Plan for On-Premises Deployments
  • New Big Data Support Accelerator

You can read more about the Customer Loyalty award and the Informatica announcements in their press release.

The Evolution of Data Management for IoT

In the upcoming webinar for SnapLogic, we will be looking at the Internet of Things from the perspective of data.

  • What data can be expected
  • How IoT data builds upon the evolution of data management and analytics for big data
  • Why IoT data differs from data from other sources
  • Who can make the most use of IoT data, and who can be impacted most by it
  • Where IoT data needs to be processed
  • When IoT data has an impact

Specifically, we will look at how the recent evolution of data management in response to big data is in some ways ideally suited for IoT data, and is still evolving to handle some unique characteristics of IoT data and metadata.

The business drivers range from new sources of data that can help organizations better understand, service and retain customers, to consolidation in many industries bringing about the need to bring together data from disparate and duplicate information and operation systems after merger and acquisition. One of the more pervasive developments has been the movement of data acquisition, storage, processing, management and analytics, to the Cloud.

Beyond these corporate motives, governments and non-government organizations (NGOs) are using data for good to bring about better quality of life for millions or billions of individuals. Clean water, prosecuting genocide, fighting human trafficking, reducing hunger, and opening up new means of commerce are only a few examples. Some look at the future and see a utopian paradise, others a dystopian wasteland. The IoT with evolving data management and analytics are unlikely to bring about either extreme, but I do think that the future will be better for billions as a result.

The basic question that we’ll ask in this webinar is “What is the Internet of Things?” From simple connectivity to the resulting cognitive patterns that will be exhibited by these connected things, we will explore what it means to be a thing on the Internet of Things, how the IoT is currently evolving, and how to bring value from the IoT. It is also important to recognize that the IoT is already here; many organizations are reaping the benefits of IoT data management and sensor analytics. The webinar will show ways in which your organization can join the IoT or mature its IoT capabilities.

Big data was often described by three parameters overwhelming the old ways of integrating and storing data: volume, velocity and variety. Really, we are looking at deftly interweaving the volumetric flow of data in timely ways that flexibly provide for privacy, security, convenience, transparency, governance and compliance. Nowhere is this evolution better expressed than in data management for the Internet of Things (IoT).

We will cover some of the more interesting and useful aspects of preparing for IoT data and sensor analytics. Though coined by Kevin Ashton in 1999, the IoT is still considered in the early stages of adoption and relevance. While the latest trends in data management and analytics apply to IoT data and sensor analytics, there are specific needs for properly addressing IoT data, which legacy ETL (extract, transform and load) and DBMS (database management systems) simply don’t handle well, such as time-series data and location data, as well as metadata specific to IoT. In addition to these characteristics of IoT data, we will explore other aspects that make IoT data so interesting.

The IoT isn’t yet living up to its hype, which would require many solution spaces coming together as ecosystems. Instead, the IoT is growing within each vertical separately, creating new data silos. This is exemplified by the 30-plus standards bodies addressing IoT data communication, transport and packaging. Metadata and API management can help. Metadata also addresses the nuances of IoT data, such as the factors arising from replacing a sensor, which allow continuity of the data set and understanding of the differences before and after the change.
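One hypothetical sketch of metadata that preserves continuity across a sensor replacement: the logical stream keeps its identity while each physical sensor's installation window and calibration are recorded, so before/after differences remain explainable. The stream name, serials, and offsets below are invented for illustration.

```python
from datetime import date

# The logical stream identity survives physical sensor replacement; each
# physical sensor gets its own installation window and calibration record.
stream_metadata = {
    "stream_id": "plant7.line2.bearing_temp",
    "unit": "degC",
    "sensors": [
        {"serial": "A-1041", "installed": date(2015, 3, 1),
         "removed": date(2016, 8, 15), "calibration_offset": +0.4},
        {"serial": "B-2210", "installed": date(2016, 8, 15),
         "removed": None, "calibration_offset": -0.1},
    ],
}

def sensor_for(meta, when):
    """Find which physical sensor produced a reading at a given date."""
    for s in meta["sensors"]:
        if s["installed"] <= when and (s["removed"] is None or when < s["removed"]):
            return s
    return None

def corrected(meta, when, raw):
    """Apply per-sensor calibration so readings are comparable across the replacement."""
    return raw - sensor_for(meta, when)["calibration_offset"]

print(sensor_for(stream_metadata, date(2016, 1, 1))["serial"])   # -> A-1041
print(corrected(stream_metadata, date(2017, 1, 1), 71.9))        # -> 72.0
```

Without this kind of record, a step change at the replacement date is indistinguishable from a real change in the process being measured.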

Information Technology (IT) and Operational Technology (OT) are coming together in the IoT. This means interfacing legacy systems on both sides of the house, such as enterprise resource planning (ERP) and customer relationship management (CRM) systems with supervisory control and data acquisition (SCADA) systems, and relational database management systems (RDBMS) with Historian DBMSs. This also means deriving context from the Edge of the IoT for use in central IT and OT systems, and bringing context from those central systems for use in streaming analytics at the Edge. Further, this means that machine learning (ML) is not just for deep analysis at the end of the DMA process; ML is now necessary for properly managing data at each step, from the sensor or actuator generating the data stream, to intermediate gateways, to central, massively scalable analytic platforms, on-premises and in the Cloud.

As we discuss all of this, our participants in today’s webinar will come away with five specific recommendations on gaining advantage through the latest IoT data management technologies and business processes. For more on what we will be discussing, visit my post on the SnapLogic Blog. I hope that you’ll register and join the conversation on 2016 October 27 at 10:00 am PDT.

Contest Kognitio on Hadoop Best Use


During the week of 2016 September 26, at the O'Reilly Strata-Hadoop conference in New York City, Kognitio announced the start of their contest looking for the best use case or application of Kognitio-on-Hadoop. Kognitio are looking for innovative solutions that include Kognitio-on-Hadoop. Innovation is defined by Kognitio as

Innovation could be a novel or interesting application, or it could be something that is commonplace but is now being done at scale.

This covers a wide range of potential big data analytics use cases that might include data-for-good, government, academic or business applications. Contestants must write up their use case in a short paper, to be submitted to Kognitio no later than 2017 March 31. Applications will be judged by a named panel headed by a leading industry analyst. The winner will be notified on 2017 June 01. Applicants can be individuals, groups or organizations. The winner may choose among the following three prizes:

  1. US$5,000.00
  2. A one year standard support contract
  3. A one year internship at Kognitio’s R&D facility in the UK – subject to the intern being eligible to work in the United Kingdom

Kognitio on Hadoop is free to download; registered entrants will receive notifications of patches and updates to the free software, as well as preferential support on the Kognitio forums.


As one of the first in-memory, massively parallel processing (MPP) analytics platforms, Kognitio has over 25 years of experience to bring to big data processing…always in-memory, MPP and on clusters. Today, the Kognitio Analytical Platform is delivered via appliances, software, and cloud. Kognitio on Hadoop was announced at the 2016 Strata-Hadoop conference in London. This free-to-use version of the Kognitio Analytical Platform includes full YARN integration, allowing Hadoop users to pull vast amounts of data into memory for data management and analytics (DMA). As an in-memory MPP analytical platform, Kognitio is very scalable and can provide MPP execution of any computational statistics or data science applications. MPP of SQL, MDX, R, Python and other languages, for advanced analytics, is handled through a bulk synchronous parallel (BSP) API. This provides extremely fast, high-concurrency access to the data. In addition to these languages, Kognitio has strong partnerships with business intelligence vendors, such as Tableau, MicroStrategy and others. For Tableau, Kognitio has a first-class connector; and, for example, a joint customer in the financial services market, whose 10,000 users access nine petabytes (9PB) of data in Hadoop [five terabytes (5TB) in Kognitio]. As an example of the high concurrency available through Kognitio, this financial services customer routinely sees 1,500-2,000 queries per second from ~500 concurrent sessions. Note that this is one analytical subsystem; there are another 15 such uses of Kognitio, each for a specific purpose, accessing that 9PB data lake.

Kognitio on Hadoop

Kognitio on Hadoop can be downloaded free of charge and with no data size limits or functional restrictions. This download is available without registration. There is a range of paid support options available as well. Kognitio on Hadoop is integrated with YARN, and works on any existing Hadoop infrastructure. Thus, no additional hardware is required solely for Kognitio. Kognitio on Hadoop accesses files, such as CSV files, stored on Hadoop, in HDFS, as one would normally store data in Hadoop. Intelligent parallelism in Kognitio 8.2 allows queries to be assigned to as few as one core, or to use all cores, allowing for extraordinarily high levels of concurrency. This apportionment is performed dynamically by Kognitio. In addition to the obvious advantages of such a mature product, being free to use means Kognitio on Hadoop can be much more easily deployed, tested, and brought into production, while many open source solutions are still trying to run in a lab. Kognitio on Hadoop was developed internally using Apache Hadoop. Kognitio on Hadoop is in production at customers on Apache Hadoop, and on the distributions from Cloudera, Hortonworks and MapR.

Why is this important to SAE?

As the Internet of Things matures beyond simple connectivity and communication, in-memory MPP analytical platforms, such as Kognitio on Hadoop, will be required to allow context to be derived from intelligent sensor packages and Edge gateways, to the Cloud, and provide context to the Edge, Fog and sensors, in real-time. Kognitio on Hadoop conceivably allows true collaboration and contextualization among things and humans in sensor analytics ecosystems.

We take a system and ecosystem approach to data management and analytics, with a focus on developing Sensor Analytics Ecosystems for the Internet of Things. As Independent Researchers we work with data management and analytics vendors to understand the aspects of IoT data and metadata such as time-series, location, sensor specifications & degradation; we work with IoT vendors to understand their data management and sensor analytics needs; we work with both for adaptation to Sensor-Actuator Feedback Loops interacting through the Fog, Edge, Intermediate Aggregation Points, Cloud and Core, with decisions augmented at every point, and made autonomously as the IoT matures through the 5Cs: Connection, Communication, Collaboration, Contextualization and Cognition. We work with Academics, Technology-for-Good, Government and Business Organizations to understand advances in Science, Technology, Engineering, Arts and Mathematics. We filter this information through a framework of Cultural, Regulatory, Economic, Political and Environmental factors to imagine future scenarios that allow our customers to gauge adoption without the hype. We work individually and with partners to develop strategies, define system and enterprise architectures, manage programs and projects, and achieve success with IoT.
