Improving data quality with Event Sourcing

In this article we'll take a look at a few different characteristics of data quality and how Event Sourcing can help companies improve the data quality in their software.

Mattias Holmqvist

1/20/2022

Data has become a critical business asset. In an age of information overload and ever higher demands on compliance, auditing and the handling of sensitive data fed into our systems, we need dependable tools that let us store and manage data without making mistakes or breaking the law.

Avoiding accidental data loss

Most information systems and databases support updating information. What is easy to overlook is that an update also deletes information we previously had. At the very least, the record of what caused the previous data to be stored is gone. This makes tracking, auditing and error detection expensive and difficult to implement.

With Event Sourcing we never update our data; we continuously append more events to our event log, which means we never accidentally delete potentially critical data through an update.
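To make the difference concrete, here is a minimal sketch in Kotlin. The event name and fields are purely illustrative and not tied to any particular platform API: instead of overwriting a customer's address, every change is appended to the log, so the previous values and the time of each change remain available.

    import java.time.Instant

    // Illustrative event type - a business fact that is appended, never overwritten.
    data class CustomerAddressChanged(
        val customerId: String,
        val newAddress: String,
        val changedAt: Instant = Instant.now()
    )

    fun main() {
        // An append-only log: changes are added, never applied as destructive updates.
        val eventLog = mutableListOf<CustomerAddressChanged>()
        eventLog += CustomerAddressChanged("customer-1", "Storgatan 1")
        eventLog += CustomerAddressChanged("customer-1", "Kungsgatan 5")

        // The current address is derived from the log...
        val currentAddress = eventLog.last().newAddress
        // ...but the previous address and the time of each change are still there.
        println("Current address: $currentAddress, changes recorded: ${eventLog.size}")
    }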

Not all data is created equal

In an organization, data is needed for different purposes, and it is important to qualify the nature of your data in order to manage it properly. A common mistake is to use the same data storage solution for all kinds of data, typically one large relational database. This creates coupling and blurs the boundary between business-critical data and less important, transient data.

Core business data

Your core business data is the data that drives your business and supports your systems' logic and functionality. With Event Sourcing this information is stored in an immutable log of events, where data is never updated, only added to. Every change is represented by an event described in terms used by the business (not just by developers), which gives your developers, business experts and data analysts the great benefit of a shared understanding of what happens in your systems.
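As a rough illustration, events can be modelled as plain types named in the language of the business. The order events below are hypothetical examples, not a prescribed schema:

    // Illustrative domain events, named as business facts rather than database operations.
    sealed interface OrderEvent

    data class OrderPlaced(val orderId: String, val customerId: String, val amountCents: Long) : OrderEvent
    data class OrderShipped(val orderId: String, val trackingNumber: String) : OrderEvent
    data class OrderCancelled(val orderId: String, val reason: String) : OrderEvent

    fun main() {
        // The log reads as a narrative that business experts can follow as well as developers.
        val log: List<OrderEvent> = listOf(
            OrderPlaced("order-1", "customer-1", 49_00),
            OrderShipped("order-1", "TRACK-123")
        )
        log.forEach { println(it) }
    }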

Read-oriented data

Your core business data describes what happens in your system and in your business, but more data is needed to support the development of modern applications and services. Typically you need to provide fast access to readable data and to tailor the core information to different use cases.

Instead of organizing your systems to read directly from your core data, it is more effective to read from dedicated data stores that are specialized in providing readable data models derived from your core data. These can be separate databases or in-memory caches that give rapid access to the data.

With Event Sourcing we can produce tailored read models (projections) from our event log in a structured way. Requirements on read models tend to change quite often: applications want to show more data or format it in different ways. Using the event log as the core data store makes it possible to replay the history and recreate read models with a new structure, without affecting the core systems at all.
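Here is a minimal sketch of a projection, again with illustrative event and read-model names: the read model is derived by replaying the log, so changing its shape only requires a new replay.

    // Illustrative core event.
    data class OrderPlaced(val orderId: String, val amountCents: Long)

    // Read model tailored for one use case, e.g. a sales dashboard.
    data class OrderSummary(val orderCount: Int, val totalAmountCents: Long)

    // Replay the event log to (re)build the read model.
    fun project(events: List<OrderPlaced>): OrderSummary =
        events.fold(OrderSummary(0, 0L)) { summary, event ->
            OrderSummary(summary.orderCount + 1, summary.totalAmountCents + event.amountCents)
        }

    fun main() {
        val log = listOf(OrderPlaced("order-1", 100_00), OrderPlaced("order-2", 250_00))
        // If the read model needs a new structure, replay the same log into a new
        // projection - the core event log is untouched.
        println(project(log)) // OrderSummary(orderCount=2, totalAmountCents=35000)
    }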

Temporal data

A crucial piece of information that is missing from most data system implementations is the notion of time. When we model our data, we tend to forget to capture when things actually happen. This leads to business experts having to lean on external logs and tracking systems and piece that information together with the core data to make the right decisions.

With Event Sourcing, the notion of time cannot be forgotten, since every event is appended to a log together with the time at which it occurred. The event log also makes it clearly visible that time is a crucial part of our data.
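Because every event carries the time at which it occurred, temporal questions can be answered directly from the log. The sketch below uses hypothetical price events to answer a point-in-time question by replaying only the events up to a given instant:

    import java.time.Instant

    // Illustrative event carrying its own timestamp.
    data class PriceChanged(val productId: String, val priceCents: Long, val occurredAt: Instant)

    // "What was the price at this point in time?" - answered from the log alone.
    fun priceAt(log: List<PriceChanged>, pointInTime: Instant): Long? =
        log.filter { !it.occurredAt.isAfter(pointInTime) }
            .maxByOrNull { it.occurredAt }
            ?.priceCents

    fun main() {
        val log = listOf(
            PriceChanged("product-1", 100_00, Instant.parse("2021-01-01T00:00:00Z")),
            PriceChanged("product-1", 120_00, Instant.parse("2021-06-01T00:00:00Z"))
        )
        println(priceAt(log, Instant.parse("2021-03-01T00:00:00Z"))) // 10000
    }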

Sensitive data

With new laws and regulations such as the GDPR and the CLOUD Act, the demands on business owners, product owners, developers and data experts to understand how sensitive data is stored and used are higher than ever.

Some information must now be retained over longer periods of time, while other information may have to be removed within well-defined timeframes. Keeping an event log as the source of truth, instead of a highly coupled relational database, makes complex regulatory requirements easier to manage: the individual pieces of information stored in different events do not depend on each other and can therefore be treated and investigated separately, and removed when required.
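One way to sketch the idea, with hypothetical event types: events that carry personal data can be marked as such, so they can be located and handled on their own, for example when responding to an erasure request. This is only an illustration, not a specific platform feature.

    import java.time.Instant

    // Marker for events that contain regulated personal data.
    interface ContainsPersonalData

    data class CustomerEmailRegistered(
        val customerId: String,
        val email: String,
        val occurredAt: Instant
    ) : ContainsPersonalData

    data class CustomerLoggedIn(val customerId: String, val occurredAt: Instant)

    fun main() {
        val log: List<Any> = listOf(
            CustomerEmailRegistered("customer-1", "person@example.com", Instant.now()),
            CustomerLoggedIn("customer-1", Instant.now())
        )
        // Events subject to personal-data rules can be found and handled separately,
        // without touching the rest of the log.
        val regulated = log.filterIsInstance<ContainsPersonalData>()
        println("Events subject to personal-data handling: ${regulated.size}")
    }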

When mistakes are made

High-quality data should be correct. If, however, a mistake is made, it is better to acknowledge it than to sweep it under the rug. If downstream systems have already started using the incorrect data, you need a way to let them know that a mistake was made and that a correction is needed.

With Event Sourcing we use compensating events to record the fact that a mistake was made and to explicitly correct it in our system. This also has the benefit that we can deliver the compensating event to downstream systems and services so that they can revert their own logic, where possible.
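Here is a minimal sketch of a compensating event, with illustrative account events: the original mistake stays in the log, and a new event records the correction, which downstream consumers can apply as well.

    sealed interface AccountEvent

    data class AmountDeposited(val accountId: String, val amountCents: Long) : AccountEvent

    // Compensating event: explicitly corrects an earlier, erroneous deposit.
    data class DepositCorrected(val accountId: String, val adjustmentCents: Long, val reason: String) : AccountEvent

    // The current balance is derived from the full history, corrections included.
    fun balance(events: List<AccountEvent>): Long =
        events.fold(0L) { acc, event ->
            when (event) {
                is AmountDeposited -> acc + event.amountCents
                is DepositCorrected -> acc + event.adjustmentCents
            }
        }

    fun main() {
        val log: List<AccountEvent> = listOf(
            AmountDeposited("account-1", 1000_00), // mistake: should have been 100.00
            DepositCorrected("account-1", -900_00, "Deposit entered with wrong amount")
        )
        // Both the mistake and its correction remain visible - nothing is swept under the rug.
        println(balance(log)) // 10000
    }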

High-quality data

High-quality data answers more questions about its nature than low-quality data. You can look at a piece of data and understand why it is there, what it represents in the context of your business, and when it happened. This is how Event Sourcing helps create richer data of higher quality.

There are more ways Event Sourcing can help your organization improve the data quality of your systems.

To learn more about data quality improvements and Event Sourcing, try out our platform or reach out to us at https://serialized.io, or directly to me at mattias@serialized.io.