Neo4j, ML, AI and Causal Inference

There has been incredible growth with Neo4j (@Neo4j) since the first GraphConnect conference in 2013, in business adoption, in data science, in partnerships, in approach, in staff and in customers. During the month of August, we had the privilege of speaking with Amy Hodler (@amyhodler) and, separately, with Lance Walter (@lancewalter), both with Neo4j. Amy is the Analytics & AI Program Manager at Neo4j. Lance is CMO of Neo4j. There are many amazing things happening at Neo4j today, but this blog post will focus on the impact of connected data on machine learning (ML) and artificial intelligence (AI).

Let’s start with some basics.

Background on Graphs and Neo4j

Maps, charts, graphs…what’s the difference among these? At the most basic level, the underlying mathematics which describe maps, charts and graphs are different, and this difference allows for different technology to be created from these concepts. Colloquially though, these three words are often used interchangeably. I might say that the one thing statisticians and data scientists have in common is that we both start by graphing out a representative sample of a new data set. Of course, we really mean plotting the data…often on graph paper. The result might be a chart, a histogram perhaps. And charts are often used in data visualization after analysis. If you’ve read this blog over the years, you know that I love mindmaps, but geographical maps are great for certain types of data visualization, and concept maps are a great gateway to graph technologies.

Maps, charts, graphs…As you can see, there is a lot of overlap in these concepts. Both the words and the mathematics are important because how you might use these concepts determines how you can use data to better understand the world around you, and make better decisions. For our purposes, we are talking about graphs as used in graph databases, graph analytics and graph algorithms to model and elucidate the relationships among entities that are meaningful to a process or ecosystem.

Advances in technology and science have ushered in a digital revolution of a hyper-connected environment. This is true in social media and in the socialization of machines. These hyper-connected flows of information, ideas and energy appear in all industries, in government, in technology-for-good NGOs, and even in our personal lives.

In such ecosystems, organizations are presented with opportunities from this connected data. More data and more types of data are being collected at ever higher but varying rates. These continually evolving datasets are both related and dynamic. There is a need to be able to quickly and efficiently identify relationships and patterns in this vast array of data. This need to efficiently store and process highly-connected data has resulted in an increased interest in graph databases.

Most people are familiar with data stored in and accessed from relational models and databases. These systems utilize fairly fixed schemas and are not designed to handle the complex relationships present in today’s connected data sets. Relationships are not easily traversed. Primary and foreign keys are not enough—they are not designed to represent extensive, connected data. These systems require costly programming to find relationships in the data sets, and performance suffers because of the underlying design and storage implementations.

In a graph database, relationships have as much value as the data itself. A graph database transforms a complex interconnected dataset into meaningful and understandable relationships. The key concept here is the edge or relationship. The graph relates the data items in the database to a collection of nodes or vertices, with the edges representing the relationships between the nodes. The relationships allow data in the database to be linked together directly and retrieved easily.
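
To make the node-and-relationship idea concrete, here is a minimal sketch of a property graph held in memory. It is plain Python, not Neo4j's API; the class `PropertyGraph`, its methods, and the relationship types `WROTE` and `COAUTHOR_OF` are all names we made up for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """A toy in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties dict
        self.outgoing = defaultdict(list)   # node id -> [(rel_type, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_relationship(self, source, rel_type, target, **props):
        self.outgoing[source].append((rel_type, target, props))

    def neighbors(self, node_id, rel_type=None):
        """Follow relationships directly from a node, optionally filtered by type."""
        return [
            target
            for (rel, target, _props) in self.outgoing[node_id]
            if rel_type is None or rel == rel_type
        ]


# Hypothetical data, loosely based on the book mentioned in this post.
g = PropertyGraph()
g.add_node("amy", label="Person", name="Amy")
g.add_node("mark", label="Person", name="Mark")
g.add_node("book", label="Book", title="Graph Algorithms")
g.add_relationship("amy", "WROTE", "book", year=2019)
g.add_relationship("mark", "WROTE", "book", year=2019)
g.add_relationship("amy", "COAUTHOR_OF", "mark")

print(g.neighbors("amy"))           # every node reachable in one hop from amy
print(g.neighbors("amy", "WROTE"))  # only targets of WROTE relationships
```

The point of the sketch is that following a relationship is a direct lookup from the node, with no join against a separate key table.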

Data Scientists Like Graphs

At the same time as we were catching up with Neo4j, our friend, Julian Hyde (@julianhyde), recommended that we read Judea Pearl’s Book of Why, The New Science of Cause and Effect. Having now read the Book of Why as well as Graph Algorithms by Amy Hodler and Mark Needham, I was struck by the similarity of concepts and language between causality and graphs. Learning about causal inference and graph algorithms concurrently in this way truly makes it impossible not to think about one in terms of the other, and makes obvious the usefulness of one to the other. This is especially true for complex systems, machine learning (ML) and artificial intelligence (AI).

AI is used in a variety of contexts with many meanings. AI can be described as artificial narrow intelligence (AnI), general artificial intelligence (GAI), artificial super intelligence (AsI) and Augmented Intelligence (AugI) – though in that last one, it isn’t always clear what or who is being augmented. Of course, many call simple machine learning, even computational statistics at the level of linear regression, “artificial intelligence”. The combination of graph algorithms and causal inference seems to be key in bringing clarity to these various versions of AI, and in helping move towards true GAI or AsI.

Tradition is the passing on of knowledge. If you think of this in terms of graph databases and AI, perhaps we can see the true importance of graphs to AI…providing tradition, a history of knowledge upon which to build, to improve, to provide context, as machines augment humans and humans augment machines. This is where causal inference comes into play. Consider Judea Pearl’s ladder of causation (Observation, Intervention and Counterfactuals).

Figure: The Ladder of Causation, with representative organisms at each level (Book of Why, Chapter 1, Figure 2, by Judea Pearl).

From the perspective of graphs, this leads to many possibilities, three of which we plan to pursue.

The first is relevant to general data science workflows, and results from a Twitter conversation with JD Long (@cmastication). Can graphs and causal inference be used to validate and improve the data science workflow, leading to optimizing production use of data science outputs?

The second is useful to projects that we are doing with Rebaie Analytics Group in providing ML prototypes for IoT pilots. Can causal inference and graphs be used to select and validate training sets, especially of third-party data, for ML and AnI?

The third is important to our research and synthesis of ethics frameworks. As this two-way augmented decision making happens, these decisions must be based upon an ethical framework of cultural, regulatory, economic, political and environmental factors that, perhaps, can be built into the graph through the causal inference concept of counterfactuals and through graph algorithms that will find and test counterfactuals.

Directed acyclic graphs (DAGs) built on causal assumptions, as opposed to merely capturing associations in the data, make more sense to the human minds doing the modeling. The topology of such graphs is also more amenable to remodeling to capture new information or changes in the causal structure. As such, causal Bayesian Networks expressed as DAGs naturally attract data scientists, especially when one of the goals is transparency and explainability in how the model arrived at its predictions or inferences. The concepts of graph local querying and graph global processing also lend themselves well to iteratively updating information across a network, allowing transactions and analytics to interact through causal inference, Bayesian predictions, machine learning scoring and all levels of artificial intelligence.
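
As a rough illustration of what a DAG of causal assumptions looks like in practice, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to confirm that a set of assumed cause-and-effect links is acyclic, and to list everything upstream of a given variable. The variables and links are hypothetical, chosen only for the example; this is not a causal inference engine.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical causal assumptions: each key maps to its direct causes (parents).
causes = {
    "rain": [],
    "sprinkler": ["rain"],
    "wet_grass": ["rain", "sprinkler"],
    "slippery": ["wet_grass"],
}


def causal_order(parents):
    """Order variables so every cause precedes its effects.

    Raises graphlib.CycleError if the assumptions contain a cycle,
    meaning they do not form a DAG.
    """
    return list(TopologicalSorter(parents).static_order())


def ancestors(parents, node):
    """Return all direct and indirect causes of a node."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


order = causal_order(causes)
print(order)
print(ancestors(causes, "slippery"))

# A cyclic set of assumptions is rejected.
try:
    causal_order({"a": ["b"], "b": ["a"]})
except CycleError:
    print("not a DAG")
```

Because the structure is explicit, adding or removing a causal link is a one-line change, which is the remodeling advantage described above.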

Contextualization is of paramount importance to every aspect of data management and analytics. Without context, causal inference can’t occur. Context is integral to any organization’s IoT maturity and AI Readiness. Graph technologies inherently bring context to data. Metadata and master data are also a primary means of bringing context to data; and graph technologies are increasingly used in metadata and master data management. Therefore, graph technologies are vital to all IoT and AI initiatives. Context through graphs brings even more value to data scientists.

Graphs Make Big Data Look Small

Another reason that data scientists like Neo4j is that purpose-built graph databases make big data look small. Neo4j is based on property graphs, and, like all graphs, the data model and the database treat entities (nodes, vertices) and relationships (edges, paths) as equal elements. Properties assigned to nodes and paths handle much of the separate overhead that other data management systems require. Also, while there are classes of graph algorithms that need a global look at the graph, traversing the entire topology, there are classes of graph solutions that can be considered local graph questions and can be handled in subgraphs. All of this combines to make big data very manageable…making big data look small.
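
The difference between a local graph question and a global one can be sketched in a few lines. Below, `local_neighborhood` answers a local question by touching only the nodes within a few hops of a starting point, while `degree_centrality` is a simple global computation that has to visit every node. The graph and both function names are assumptions for illustration, written in plain Python, not calls into any Neo4j library.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency mapping.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}


def local_neighborhood(adj, start, max_hops):
    """Local question: which nodes lie within max_hops of start?

    Breadth-first search that stops expanding at max_hops, so only a
    subgraph is ever visited. Returns {node: hop distance}.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbor in adj[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def degree_centrality(adj):
    """Global question: score every node by its share of possible connections."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


print(local_neighborhood(graph, "a", 1))
print(degree_centrality(graph))
```

On a graph with millions of nodes, the local query still only does work proportional to the neighborhood it explores, which is one reading of "making big data look small".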

So, unlike other database management systems that require connections between entities using special properties such as foreign keys or out-of-band processing, graph databases use simple abstractions of nodes and relationships when connecting structures. This has the added benefit of enabling users to build sophisticated models in less time.

More Interesting Neo4j Tidbits

As we said at the beginning, there have been many interesting happenings and amazing growth at Neo4j. Here is a list, and references.

  1. The best place to start your graph journey is with Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem, ©2013, O’Reilly; download the 2nd edition as an ebook, free with signup
  2. The Neo4j community is incredibly rich in members, activity and projects
  3. There are over 450 Neo4j events including large conferences, online meetings, graph tours, meetups and training
  4. Neo4j has been adding new hires at every level to strengthen their product, services, business and market position.

    • Mike Asher – former Pivotal CFO, now CFO Neo4j
    • Ivan Zoratti – CTO MariaDB, Head of Field MySQL, now Neo4j DB PM
    • Jake Graham – Intel Saffron AI Product Manager, now AI PM
    • Matt Casters – Author & Architect of Kettle, data integration product
    • Denise Persson – CMO Snowflake joined Neo4j BoD
    • Lisa Hatheway - VP Demand Gen, formerly Vectra AI & Couchbase
    • Alicia Frame - Senior PM for Data Science, formerly of BenevolentAI, EPA and USDA
  5. The Innovation Lab at Neo4j is very popular at the many Graph Tour events, and possibly the best way to be convinced that graph technologies can solve your organization’s challenges; perhaps the best way to learn about this program is with this five minute interview with Alessandro Svensson, the Director of Neo4j Innovation Lab
  6. More on the Book of Why can be found on Dr. Judea Pearl’s publication page at UCLA.
  7. Graph Algorithms by Amy Hodler and Mark Needham, ©2019 published by O’Reilly is available as a free ebook with signup
  8. There has been much discussion over the past decade that data management and analytics must move beyond online transaction processing (OLTP) and online analytical processing (OLAP) to hybrid transaction-analytical processing (HTAP as explained by Donald Feinberg and Merv Adrian at Gartner). Indeed, many of the M&A transactions in this space have been to embed analytics into ERP, eCommerce and financial systems. RDBMS vendors have even put forth the idea that their product can handle both transactions and analytics in the same instance of the database. Modeling such capabilities in entity-relationship diagrams and the corresponding physical models is extremely difficult and the result often performs poorly. Graph databases, on the other hand, are very good at HTAP. In addition, it is difficult to envision a better platform to manage the ebb and flow of data, metadata and master data among the core, intermediate aggregation points and edge within and among sensor analytics ecosystems.
  9. Google Cloud Partnerships for open source data management and analytics. Neo4j is among these partners, with the goal of bringing native property graphs into the Cloud.
  10. For the first time in 30 years, a new ISO query language is firmly on its way to becoming a standard. The proposal to develop and maintain the graph query language (GQL) has been approved by the same international working group that also maintains the SQL standard. This is a significant step in advancing graph technologies for full and extensible platforms.

Graphs Matter to Sensor Analytics Ecosystems

Our research focuses on synthesizing Sensor Analytics Ecosystems (SensAE), bringing value to the Internet of Things (IoT) through data management and analytics…data engineering and data science…within the realm of complex systems…without creating new silos…using an ethics philosophy to guide adoption assuring privacy, transparency, security and convenience by design. Graphs are essential to building Sensor Analytics Ecosystems platforms. Causal inference appears to be essential to IoT maturity, which depends upon AI Readiness.

  • The IoT, even within a siloed solution, is a complex system. By the end of that first GraphConnect in 2013, we were convinced that graphs are the best way to model data in a complex system.
  • As data, metadata, master data and information ebb and flow in the enterprise, we realized that graphs are the only technology that can break away from source-target data flows, and have included graph technologies as part of SensAE system architecture guidelines.
  • Every IoT pilot starts with connection, and adds communication. To mature, collaboration and contextualization must then be added. With advanced analytics, ML and AnI, the IoT project moves towards cognition. We will now extend our research to determine exactly how causal inference, as expressed within a property graph database using graph algorithms to select and direct ML and AnI, enhances this maturation process within an ethics framework. Graphs will be a necessary part of this research.

Guavus Plus SQLstream means Broad and Deep for IoT Data Science


From the time that Damian Black, founder of SQLstream, and Dr. Anukool Lakhina, founder of Guavus, first met almost a decade ago, the synergies and complementary nature of their visions were apparent to both of them. At the time though, each chose their own path, with Guavus using open source solutions to become a leader in big data, real-time analytics, firmly focused on the Telecommunications CSP (Communications Service Provider) and operational efficiency market. Meanwhile, SQLstream built off of Eigenbase components to create one of the first true streaming analytics engines, while having strict compliance to SQL standards; on the business side, finding a niche in the burgeoning IoT market, especially in Transportation, all while remaining a horizontal solution.

Guavus was acquired by Thales in 2017. The Thales Group, a large, international player in aerospace and defense, with a significant presence in transportation, expressed interest in SQLstream about four years ago. It was at this point that Damian and Anukool realized that the solutions Guavus and SQLstream had developed since their earlier discussions had become even more strongly complementary, with Guavus' deep domain expertise in telecommunications, machine learning and data science, and SQLstream as a pioneer and leader in streaming analytics with a horizontal platform. In addition, Guavus is following Thales' lead in broadening its domain expertise into the Industrial Internet of Things. SQLstream has had great success in the Transportation area, as well as in other sensor analytics ecosystems (SensAE). In addition, Guavus recognizes the need to process the vast amount of telecoms and IoT data closer to the source. In January of 2019, Guavus acquired SQLstream.


Although the merger is only a month old, the two companies are already working as one to bring the strengths of each together for greater customer success. Over the next six to 12 months, the two will be integrated into a single platform with the ability to scale up to mind-numbingly large data flows, and to scale down to very finely tuned small aggregates where and as needed throughout the ecosystem. This will allow greater operational efficiency, as separating signal from noise close to the source allows the data to be processed immediately, providing value in a timely and cost-effective way. Data rates are growing, per Damian, by 50% as edge sources increase in importance, but data storage and management costs are only decreasing by 12-14%. Only by pushing the algorithms, the machine learning models, into the streaming pipeline will organizations be able to actually draw value from this data. Guavus has some of the best data science expertise in the industry for their customers in Telecom. As this domain experience grows to include Transportation, and IIoT in general, companies growing in IoT maturity will be able to perform streaming analytics and machine learning augmented analytics on appropriately aggregated data throughout their ecosystems.

"With our integrated solutions, CSPs to IIoT customers will be able to take advantage of something that’s radically different as we deliver AI-powered analytics from the network edge to the network core. With this solution, our customers can now analyze their operational, customer, and business data anywhere in the network in real time, without manual intervention, so they can make better decisions, provide smarter new services, and reduce their costs." - Guavus Press Release

This matches well with what we have seen, and what we present for SensAE architecture, that the ebb and flow of data throughout the ecosystem must allow for appropriate aggregation and analytics at each point within the ecosystem.


At MWC19, there has been a lot of interest in these specific solutions, and also in building trust throughout the ecosystem, with security, and, as our research has shown, with the ability to select the desirable levels of privacy and transparency. Responding to these industry concerns is already in the Thales/Guavus/SQLstream roadmap.

"The SQLstream products have the ability to analyze, filter, and aggregate data at the network edge in real-time and forward the information to the network core where the Guavus’ Reflex® platform can apply AI-powered analytics, giving customers a widely distributed and scalable architecture with better price/performance and total cost of ownership." - Guavus Press Release
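
To give a feel for what "filter and aggregate at the edge, forward to the core" means, here is a small sketch in plain Python. It is not SQLstream or Guavus code and uses none of their APIs; the function name `tumbling_window_averages`, the tuple layout of a reading, and the range limits are all our own assumptions. It processes a finished list for simplicity, where a real streaming engine would do the same work incrementally.

```python
from collections import defaultdict
from statistics import mean


def tumbling_window_averages(readings, window_seconds, low, high):
    """Filter out-of-range readings, then average per sensor per fixed time window.

    readings: iterable of (timestamp_seconds, sensor_id, value) tuples.
    Returns {(window_start, sensor_id): average_value}.
    """
    buckets = defaultdict(list)
    for timestamp, sensor_id, value in readings:
        if not (low <= value <= high):
            continue  # drop noise at the edge instead of forwarding it
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[(window_start, sensor_id)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings from two sensors.
readings = [
    (0, "s1", 20.0),
    (5, "s1", 22.0),
    (7, "s1", 999.0),   # out of range, filtered before aggregation
    (12, "s1", 24.0),
    (3, "s2", 10.0),
]

summary = tumbling_window_averages(readings, window_seconds=10, low=-50.0, high=150.0)
print(summary)
```

Five raw readings become three small aggregates, and only those aggregates would need to travel to the core.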

The next few months are going to be exciting with SQLstream, Guavus and Thales bringing together their expertise in streaming analytics, data management, telecommunications, transportation, machine learning, data science, industrial needs and system engineering.

A New Age for Data Quality

Once, most data quality issues were from human errors and inadequate business processes. While these still exist, new data sources, such as sensor data and third-party data from social media, openData and "wisdom of the crowd" introduce new sources of potential error. And yet, the old ways of storing "data" in log books, engineering journals, paper notes and filing cabinets are still widely practiced. At the same time, data quality is more important than ever as organizations rely more on predictive algorithms, machine learning, deep learning, artificial intelligence and cognitive computing. The basics of data quality have remained the same, but the means by which we can assure data quality are changing.

Data Quality Basics

Fundamentally, data quality is about trust; that the decisions made from the data are good decisions, based upon trustworthy data. To achieve this trust, data must be:

  1. correct
  2. valid
  3. accurate
  4. timely
  5. complete
  6. consistent
  7. singular (no duplications that affect count, aggregates, etc)
  8. unique
  9. [have] referential integrity
  10. [apply] domain integrity (data rules)
  11. [enforce] business rules
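
A few of the principles above translate directly into automated checks. The sketch below tests completeness, uniqueness, referential integrity and domain integrity on a handful of made-up records; the field names, the rule that a quantity must be positive, and the function `check_orders` are assumptions for illustration only.

```python
# Hypothetical records; field names and rules are assumptions for illustration.
customers = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "quantity": 3},
    {"order_id": 11, "customer_id": 2, "quantity": None},  # incomplete
    {"order_id": 11, "customer_id": 2, "quantity": 5},     # duplicate key
    {"order_id": 12, "customer_id": 9, "quantity": -4},    # orphan, out of domain
]


def check_orders(orders, customers):
    """Return (order_id, problem) pairs for a few basic data quality rules."""
    known_customers = {c["id"] for c in customers}
    seen_ids = set()
    problems = []
    for row in orders:
        oid = row["order_id"]
        if row["quantity"] is None:
            problems.append((oid, "complete: quantity missing"))
        elif row["quantity"] <= 0:
            problems.append((oid, "domain integrity: quantity must be positive"))
        if oid in seen_ids:
            problems.append((oid, "unique: duplicate order_id"))
        seen_ids.add(oid)
        if row["customer_id"] not in known_customers:
            problems.append((oid, "referential integrity: unknown customer"))
    return problems


for order_id, problem in check_orders(orders, customers):
    print(order_id, problem)
```

Checks like these are cheap enough to run inside a streaming pipeline, so a bad record can be flagged before it feeds an automated decision.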

Now, these principles must be applied to all the new sources and uses of data, often as part of streaming or real-time decision support, automated decisions, or autonomous systems.

Moreover, the data rules and the business rules must reflect reality, including evolving cultural norms and regulatory requirements. For example, in many areas of the world, gender is no longer based simply on biology at birth, but includes gender identification that may be more than just male or female, and may change over time as an individual's self-awareness changes. As another example, regulations in some areas of the world are imposing stricter restrictions around individual privacy, such as the General Data Protection Regulation (GDPR) in the EU with full application coming in May of 2018.

Data Verification

Third-party data verification tools have been around for decades, and are often purchased and installed on-premises, including their own databases of information. Today, data verification may be done through such tools, or through openData and openGov databases; modern data preparation tools may even recommend freely available data sources, such as demographic data, to enhance and verify the data that your organization has collected or generated. Other data, such as social media data, is also available to enhance your understanding of customers, markets, culture, regulations and politics that might influence your decisions. Current third-party data is most often accessed through Application Programming Interfaces (APIs) that may be HTTP or RESTful, or might be proprietary. Use, or rather, misuse of these APIs has the potential to degrade, rather than enhance, your decision support process. Another issue is that you may not know how third-party data is governed according to the basics of data quality. Again, modern data preparation and API management tools can help with these issues, as can open architectures and specifications.

Data from sensors and from sensor-actuator feedback loops aren't new. Data from connected sensors, actuators, feedback loops, and all kinds of things, from pills to diagnostic machines, from wearables to cars, from parking sensors to a city's complete transportation system, some of which may be available through openGov initiatives, are new. Many of the organizations using such IoT data have never used such data before.

Now that we have taken a very brief look into data quality and new opportunities, let's go into the new tools we have to use these new data opportunities.

Data Stewardship through AI

In the spirit of drinking one’s own champagne, many of the new uses of data – the output of data science – are being applied to data management. As software has consumed the world, machine learning is eating software; deep learning and artificial intelligence are rapidly becoming the top of this food chain. Once, a dozen or so source systems made for a good size data warehouse, with nightly ETL updates. Now, organizations are streaming hundreds of sources into data lakes. The people, processes and technologies for data quality can only keep up through augmentation through the use of advanced analytic algorithms. Machine Learning uses metadata to continuously update business catalogues as artificial intelligence augments the data stewards. Metadata is changing as well, to provide semantic layers within data management tools, and to better understand the data sets coming from the IoT, social media, or open data initiatives.

The first players to apply these techniques to data management and analytics became our first "Data Grok" companies, data that helps humans grok data and how that data can be used. Since then, the first companies to earn the DataGrok designation, Paxata and Ayasdi, have been joined by many others adding machine learning, deep learning and even artificial narrow intelligence (ANI) to provide recommendations and guardrails to data scientists, data stewards, business analysts, and any individual using organizational data to make decisions.

Data Quality Relations

Data Management development through the execution of enterprise architecture, policies, practices and procedures encompasses the interaction among data quality, data governance, and data integrity. Regulatory and process compliance are dependent upon all three. Ownership of each data set, data element and even datum, is critical to assuring data quality and data integrity, and is the first step to providing data governance. Business metadata, technical metadata and object metadata come together through business, technical and operational ownership of the data to build data stewardship and data custodian policies. The architectural frameworks used for Enterprise, IoT and Data architectures result in specifications for each critical data element that provide an overarching view across all business, technical and operational functions.

Data governance interacts with architectural activities in an agile and continuous improvement process that allows standards and specifications to reflect changing organizational needs. The processes and people can assure that data specifications are applicable to the needs of each organizational unit while assuring that data standards are uniformly applied across the organization. The size and culture of an organization determine the formality and structure of data governance, which may include a governing council, sponsorship at various organizational levels, executive sponsorship (at a minimum), data ownership, data stewardship, data custodianship, change control and monitoring. But even with all this, the goal of data governance must be to provide appropriate access to data, and not restrict the use of data…from any source.

IT Must Adapt

Information Technology has often been seen as a bottleneck. Many times in our consulting work, we have found ourselves in the position of arbiter between IT and the business. Self-service BI, Analytics and Data Preparation mean IT must become an enabler of data usage, providing trustworthy data without restricting the users. The productionalizing of data science again means that IT must be an enabler of data usage, including the machine learning and other advanced analytics models that data science teams produce. As data science and data management & analytics tools come together, the need for IT to guide the use of data and tools without limiting that use becomes paramount. At the same time, privacy and security must be retained within data governance. Patient data must only be available to the patient and those healthcare professionals and caregivers who require access to that data. Personally Identifiable Information (PII) must be controlled. Regulatory compliance, such as GDPR and PCI, must be adhered to.

There is also a need for two-way traceability from the datum to the end-use in reports and analytics, training sets or scoring, and from the end-use to the source system, including lineage of all transformations along the way. This lineage of source and use enables both regulatory compliance and collaboration. Such transparent history also helps build trust in the data, and in what other users and IT data management professionals have done to the data.
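
Two-way traceability is itself a graph problem: each transformation is an edge, and tracing is a traversal in one direction or the other. The sketch below shows the idea with a hypothetical `Lineage` class; the data set names and transformation labels are invented for the example and do not refer to any particular tool.

```python
from collections import defaultdict


class Lineage:
    """Records 'source -> derived' steps and answers trace questions in both directions."""

    def __init__(self):
        self.downstream = defaultdict(set)  # source -> {(derived, transformation)}
        self.upstream = defaultdict(set)    # derived -> {(source, transformation)}

    def record(self, source, derived, transformation):
        self.downstream[source].add((derived, transformation))
        self.upstream[derived].add((source, transformation))

    def _walk(self, index, start):
        """Collect everything reachable from start by following the given index."""
        seen, stack = set(), [start]
        while stack:
            for item, _transformation in index.get(stack.pop(), ()):
                if item not in seen:
                    seen.add(item)
                    stack.append(item)
        return seen

    def uses_of(self, item):
        """Forward trace: every report, model or data set derived from item."""
        return self._walk(self.downstream, item)

    def sources_of(self, item):
        """Backward trace: every upstream data set that item depends on."""
        return self._walk(self.upstream, item)


# Hypothetical pipeline.
lineage = Lineage()
lineage.record("crm.customers", "staging.customers", "deduplicate")
lineage.record("staging.customers", "report.churn", "aggregate")
lineage.record("staging.customers", "ml.training_set", "sample")

print(lineage.uses_of("crm.customers"))
print(lineage.sources_of("report.churn"))
```

The forward trace answers "who is affected if this source changes", and the backward trace answers "where did this number come from", which are the two questions auditors and collaborators tend to ask.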

IT and OT must Work Together

As connected products mature through the 5Cs of our IoT maturity model (connection, communication, collaboration, contextualization and cognition), information technology and operations technology, business systems and engineering systems, must share data under a unified architecture. Much of the promise of the IoT can only be achieved through IT and OT working together. Consumer and marketing information being merged with supply chain and production quality information to build predictive models that allow just-in-time inventory control and agile, custom product delivery is only one example of changes to consumer expectation, whether that consumer is another business, a government or an individual. Industries from every market, such as the energy sector, consumer packaged goods and pharmaceutical manufacturing have reaped the benefits of IT and OT working together, of SCADA/Historians data being integrated with Cloud marketing and sales data or ERP data. But for this partnership between IT and OT to work, they each must trust the data of the other, and that only happens through data governance and data quality efforts.

Metadata and Master Data Management in DQ

Metadata and Master Data Management (MDM) are fundamental in ensuring data quality, and key to using trustworthy data throughout a modern data ecosystem from the most modern data sources and analytic requirements at the Edge to the most enduring legacy systems at the Core; from the droplets in the Fog to the globally distributed multi-Cloud and hybrid architectures. Metadata and MDM have been part of the solution all along, but now must be applied in new ways, both at the core and at the Edge, and distributed through multiple Cloud, hybrid architectures, on-premises, and out into the furthest reaches of the Fog, as all these resources elastically scale up and down at need.

Sensor Data Makes for Interesting DQ

Some of us have been dealing with sensors, sensor-actuator feedback loops and the concepts of the large, complex system for all of our careers, but for many, the fundamentals of connected hardware will be new. Sensor data can be messy. Two sensors from the same manufacturer will be slightly different in the data sets produced, even though they both meet specification; two sensors from different manufacturers will certainly be different in center point, range, precision and accuracy, and how the data are packaged. Sensors drift over time, and will need calibration against public standards. Sensors age, and may be replaced, and both of these conditions affect all the previous points.
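
One common way to bring two slightly different sensors onto the same scale is a two-point linear calibration against known reference values. The sketch below assumes the sensor response is linear between the two reference points, which will not hold for every sensor; the function name and the example readings are hypothetical.

```python
def two_point_calibration(raw_low, raw_high, ref_low, ref_high):
    """Build a function mapping raw readings to calibrated values.

    Fits a straight line through two (raw reading, reference value) points.
    Assumes the sensor is linear over this range.
    """
    if raw_high == raw_low:
        raise ValueError("reference readings must differ")
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return lambda raw: gain * raw + offset


# Hypothetical: two temperature sensors read against 0 and 100 degree references.
calibrate_a = two_point_calibration(raw_low=1.0, raw_high=99.0, ref_low=0.0, ref_high=100.0)
calibrate_b = two_point_calibration(raw_low=-2.0, raw_high=103.0, ref_low=0.0, ref_high=100.0)

# The same true temperature produces different raw values on each sensor,
# but close to the same calibrated value.
print(calibrate_a(50.0))
print(calibrate_b(50.5))
```

Because sensors drift and get replaced, the gain and offset are themselves data that should be versioned and kept with the readings as metadata.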

Data architecture and DQ

Having worked in System Engineering for aerospace, I go to Deming's definition of Quality as conformance to specifications well suited to the customer, and, for data, specifications come from the architecture.

Architecture abstracts out the organizational needs as a series of views representing the perspectives of the people, processes and technologies affected by and effected through that solution, system or ecosystem. A standalone quality solutions architecture is not a good idea, as quality must be pervasive through an architecture. However, adding quality as a view within an architecture assures that data quality, data governance and compliance are properly represented within the architecture. {Though outside the scope of this post, I would also consider adding security as a separate view.} There are many architectural frameworks, and even controversy about their effectiveness; TOGAF, MIKE2, 4+1 and BOST are the main frameworks. Architectural frameworks focus on enterprise, data and solutions (application) architectures, with a recent interest in Internet of Things (IoT) architecture. Adherence to a framework or method is not as important as that the process by which an architecture is created meets the culture and needs of the organization.


For reference purposes, here is a list of data quality standards and methods that you might find useful:

  • ISO9001 Quality Management Family of Standards
  • ISO 8000 Data Quality Family of Standards
  • EFQM Quality Management Framework and Excellence Model
  • TOGAF The Open Group Architecture Framework for Data Architecture
  • BOST [PDF] An Introduction to the BOST Framework and Reference Models by Informatica
  • MIKE2 The Open Source Standard for Information Management
  • 4+1 Views [PDF] Architectural Blueprint by Philippe Kruchten
  • TDWI Data Improvement Documents

Informatica is First in Customer Loyalty, Again, AND Continues to Innovate

We began using Informatica in its very early days. By 1998, we were using it for an ambitious enterprise data warehouse project spanning three divisions of a Fortune 100 company, taking in transactional and operational data from over 40 operating companies. The days are long gone when we would have implemented complex data architectures and data flows using Informatica PowerCenter and PowerMart in hub-and-spoke arrangements. But the need to provide powerful data management for analytics around business processes has only grown, as sales, services and customer touch-points have grown. We now generate data every minute of the day, awake or asleep. We tweet, email, and post to social media, personal blogs, and photography and video sharing sites. The things that make the things we use, and all the things around us, have embedded computers and are sensor enabled, and generate even more data. Because of this, we have changed the focus of data management away from simply extracting from common source systems, transforming so all the data conformed to internal standards, and loading into that mystical single source of truth [the ETL of old]. Today, our focus is on discovering and exploring data relevant to our organizational and individual needs, no matter the source. And yet, all this data must be vetted; data quality and data governance are more important than ever. While the idea of a single source of truth is passé, trust in our data is not. Whether we are trying to improve our personal fitness or determine the impact of the latest marketing campaign, or bring the perpetrators of genocide to justice, we expect consistency in the answers to the questions we ask of all these sources of data.

Informatica has been amazingly innovative in expanding its capabilities for data management. Informatica solutions and products keep up with where the industry is going. Informatica was one of the first data management companies to realize the importance of the Internet of Things (IoT). Their development of the Intelligent Data Platform is seen as a hallmark in handling all these new sources of data. Their attention to metadata and master data management has also improved, and even outpaced, the industry. Informatica can still be deployed on-premises in one’s own data center, in private or hybrid clouds, or on public Cloud platforms. Real-time data management and continuous event processing are also part of Informatica’s suite of products. All of this innovation has been rewarded again today: for the 11th year in a row, Informatica has been named #1 in Customer Loyalty for data integration. Informatica earned top marks in customer loyalty in the annual Data Integration Customer Satisfaction Survey, conducted by the independent research firm Kantar TNS.

To show that it is not resting on its laurels, Informatica also announced new and enhanced products and services today:

  • Cloud Support Offerings
  • Business Critical Success Plan for On-Premises Deployments
  • New Big Data Support Accelerator

You can read more about the Customer Loyalty award and the Informatica announcements in their press release.

The Evolution of Data Management for IoT

In the upcoming webinar for SnapLogic, we will be looking at the Internet of Things from the perspective of data.

  • What data can be expected
  • How IoT data builds upon the evolution of data management and analytics for big data
  • Why IoT data differs from data from other sources
  • Who can make the most use of, or be most impacted by, IoT data
  • Where IoT data needs to be processed
  • When IoT data has an impact

Specifically, we will look at how the recent evolution of data management in response to big data is ideally suited in some ways for IoT data, yet is still evolving to handle some unique characteristics of IoT data and metadata.

The business drivers range from new sources of data that can help organizations better understand, service and retain customers, to consolidation in many industries, which creates the need to bring together data from disparate and duplicate information and operation systems after mergers and acquisitions. One of the more pervasive developments has been the movement of data acquisition, storage, processing, management and analytics to the Cloud.

Beyond these corporate motives, governments and non-government organizations (NGOs) are using data for good to bring about better quality of life for millions or billions of individuals. Providing clean water, prosecuting genocide, fighting human trafficking, reducing hunger, and opening up new means of commerce are only a few examples. Some look at the future and see a utopian paradise, others a dystopian wasteland. The IoT, with evolving data management and analytics, is unlikely to bring about either extreme, but I do think that the future will be better for billions as a result.

The basic question that we’ll ask in this webinar is “What is the Internet of Things?” From simple connectivity to the resulting cognitive patterns that will be exhibited by these connected things, we will explore what it means to be a thing on the Internet of Things, how the IoT is currently evolving, and how to bring value from the IoT. It is also important to recognize that the IoT is already here; many organizations are reaping the benefits from IoT data management and sensor analytics. The webinar will show ways in which your organization can join the IoT or mature your IoT capabilities.

Big data was often described by three parameters overwhelming the old ways of integrating and storing data: volume, velocity and variety. Really, we are looking at deftly interweaving the volumetric flow of data in timely ways that flexibly provide for privacy, security, convenience, transparency, governance and compliance. Nowhere is this evolution better expressed than in data management for the Internet of Things (IoT).

We will cover some of the more interesting and useful aspects of preparing for IoT data and sensor analytics. Though coined by Kevin Ashton in 1999, the IoT is still considered in the early stages of adoption and relevance. While the latest trends in data management and analytics apply to IoT data and sensor analytics, there are specific needs for properly addressing IoT data, which legacy ETL (extract, transform and load) and DBMS (database management systems) simply don’t handle well, such as time-series data and location data, as well as metadata specific to IoT. In addition to these characteristics of IoT data, we will explore other aspects that make IoT data so interesting.

The IoT isn’t yet living up to its hype; doing so requires many solution spaces to come together as ecosystems. Instead, the IoT is growing within each vertical separately, creating new data silos. This is exemplified by the 30-plus standards bodies addressing IoT data communication, transport and packaging. Metadata and API management can help. Metadata also addresses the nuances of IoT data, such as the factors arising from replacing a sensor, which allow continuity of the data set and an understanding of the difference before and after the change.
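As a rough illustration of the sensor-replacement point, here is a minimal Python sketch of how install metadata might keep a data set continuous across a sensor swap. Everything in it is an assumption made for illustration: the `SensorInstall` and `Reading` records, the `offset` calibration field, the sensor names and the sample values do not come from any particular IoT platform or product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SensorInstall:
    """Hypothetical metadata record for one physical sensor at a measurement point."""
    sensor_id: str
    installed_at: datetime
    offset: float  # assumed calibration offset, in the same units as the readings


@dataclass(frozen=True)
class Reading:
    timestamp: datetime
    value: float


def sensor_for(timestamp, installs):
    """Return the install record in effect at `timestamp`, or None if there is none."""
    active = [i for i in installs if i.installed_at <= timestamp]
    return max(active, key=lambda i: i.installed_at) if active else None


def harmonize(readings, installs):
    """Subtract each sensor's offset and tag every reading with its source sensor."""
    rows = []
    for r in sorted(readings, key=lambda r: r.timestamp):
        install = sensor_for(r.timestamp, installs)
        if install is None:
            continue  # no metadata for this period: skip rather than guess
        rows.append((r.timestamp, install.sensor_id, round(r.value - install.offset, 3)))
    return rows


# Illustrative data: sensor temp-B replaces temp-A and is assumed to read 0.5 high.
installs = [
    SensorInstall("temp-A", datetime(2016, 10, 1), offset=0.0),
    SensorInstall("temp-B", datetime(2016, 10, 15), offset=0.5),
]
readings = [
    Reading(datetime(2016, 10, 14, 12), 20.0),
    Reading(datetime(2016, 10, 15, 12), 20.5),
]
harmonized = harmonize(readings, installs)
```

Because each harmonized row keeps the sensor identifier alongside the adjusted value, an analyst can treat the series as continuous while still seeing exactly where the hardware changed.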

Information Technology (IT) and Operational Technology (OT) are coming together in IoT. This means interfacing legacy systems on both sides of the house, such as enterprise resource planning (ERP) and customer relationship management (CRM) systems with supervisory control and data acquisition (SCADA) systems, and relational database management systems (RDBMS) with historian DBMS. This also means deriving context from the Edge of the IoT for use in central IT and OT systems, and bringing context from those central systems for use in streaming analytics at the Edge. Further, this means that machine learning (ML) is not just for deep analysis at the end of the data management and analytics (DMA) process; ML is now necessary for properly managing data at each step, from the sensor or actuator generating the data stream, to intermediate gateways, to central, massively scalable analytic platforms, on-premises and in the Cloud.

As we discuss all of this, participants in the webinar will come away with five specific recommendations on gaining advantage through the latest IoT data management technologies and business processes. For more on what we will be discussing, visit my post on the SnapLogic Blog. I hope that you’ll register and join the conversation on 2016 October 27 at 10:00 am PDT.

We take a system and ecosystem approach to data management and analytics, with a focus on developing Sensor Analytics Ecosystems for the Internet of Things. As Independent Researchers we work with data management and analytics vendors to understand the aspects of IoT data and metadata such as time-series, location, sensor specifications & degradation; we work with IoT vendors to understand their data management and sensor analytics needs; we work with both on adaptation to Sensor-Actuator Feedback Loops interacting through the Fog, Edge, Intermediate Aggregation Points, Cloud and Core, augmenting decisions at every point and making autonomous decisions as the IoT matures through the 5Cs: Connection, Communication, Collaboration, Contextualization and Cognition. We work with Academics, Technology-for-Good, Government and Business Organizations to understand advances in Science, Technology, Engineering, Art and Mathematics. We filter this information through a framework of Cultural, Regulatory, Economic, Political and Environmental factors to imagine future scenarios that allow our customers to gauge adoption without the hype. We work individually and with partners to develop strategies, define system and enterprise architectures, manage programs and projects, and achieve success with IoT.

