This article was written by Jay Militscher and originally appeared on the Collibra Data Mesh Blog here: https://www.collibra.com/us/en/blog/data-mesh-101-a-straightforward-overview-of-the-hottest-topic-in-enterprise-data
It’s perhaps a shocking truth: We live in an era of stunning digital transformation that is only going to become more data-driven each day. The raw numbers are staggering. The datasphere is forecast to reach 97 zettabytes in 2022 and to nearly double, to 181 zettabytes, by 2025. By 2025, it’s estimated that 463 exabytes of data will be created every day.
Clearly, our world has gone digital, and the pandemic is only accelerating this global trend. Thriving enterprises in every sector of the economy — from banking to life sciences to retail — are seeking better ways to harness an abundance of data. However, enterprise IT leaders face serious challenges.
- Only 32% realize tangible value from data
- 77% integrate up to 5 different types of data in their data pipelines
- Only 3% of data meet basic quality standards
- 65% of organizations are using at least 10 different data engineering tools
(Sources: Accenture, Closing the Data-Value Gap and IDC, 2021.)
Today’s enterprise data management inflection point
While many organizations are using traditional data warehouses and BI platforms, the centralized monolithic model often creates real friction for organizations that need to discover, understand, and leverage data to its fullest potential. As ever-greater volumes of data are managed in a centralized repository, it’s often the Data Office or IT that is tasked with the overwhelming responsibility of managing, curating, and delivering massively complex data sets that are only getting more complicated by the day.
“Data mesh is a decentralized sociotechnical approach in managing and accessing analytical data at scale.”
– Zhamak Dehghani
Conceived by Zhamak Dehghani in 2019, the concept of the data mesh is one of the most-discussed topics in data management, and it aligns strategically with our mission at Collibra to help organizations find, understand, trust, and access their data. With a new, robust framework, data mesh is an approach to data management that provides a path to evolve from the shortcomings of legacy, centralized architecture toward a decentralized, domain-driven design at scale.
This approach maximizes the value of data by reducing friction for data creators and consumers through both organizational and technological design. By decentralizing ownership, it empowers business domains to control their own data destiny by creating high-value, trustworthy data products that are easily consumed by the organization.
Flexibility is inherent to data mesh, encouraging an evolutionary approach, but it requires strategic commitment and investment.
A summary of data mesh principles
The rich framework of data mesh is centered on four guiding principles to get the most value from your data:
- Domain-driven ownership
- Data as a product
- Self-service data infrastructure
- Federated computational governance
In future blog posts, we’ll dive deeper into each of these principles. For now, let’s look at the key reasons why these principles are relevant.
Principle 1: Domain-driven ownership
One of the primary challenges of managing analytical workloads with legacy architecture is this: often when organizations pump everything into a central data lake, they separate the data from the subject matter experts. These subject matter experts have the business knowledge and stewardship of the originating operation, but they cannot do their job easily. In the central data lake model, they must wait for a centralized data team to fulfill analytics requests for them.
In the data mesh model, the experts — the domain — control the data ecosystem, and they are responsible for cleansing, enriching, and making data readily available to data consumers throughout the organization. These domain owners establish and maintain the quality of the data and provide necessary facts and documentation. Centralized data offices no longer need to take this on. This removes friction simply by virtue of commingling data with the business talent.
Principle 2: Data as a product
A data mesh organization puts domain experts in charge of the data — and then applies product thinking to ensure the data roadmap meets the accessibility, governance, and usability needs of the organization.
“For a distributed data platform to be successful, domain data teams must apply product thinking with similar rigor to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.”
– Zhamak Dehghani
(Source: ‘How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.’ May 2019)
Data mesh organizations treat data as a product. Data products get a vision and strategy, and a product roadmap that spans from idea to R&D, release, maintenance, and retirement. That means domain owners apply lifecycle planning to data. Above all, the ‘data as a product’ principle ensures that data is always measured by the value it brings to the people who use it.
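To make the lifecycle idea concrete, here is a minimal Python sketch of a data product that moves through the stages named above. The `DataProduct` class, its fields, and the example names are hypothetical illustrations, not part of any Collibra or data mesh specification.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Lifecycle stages from the article: idea through retirement."""
    IDEA = 1
    R_AND_D = 2
    RELEASE = 3
    MAINTENANCE = 4
    RETIREMENT = 5


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str   # owning business domain
    owner: str    # accountable team or person
    stage: Stage = Stage.IDEA

    def advance(self) -> Stage:
        """Move to the next lifecycle stage; retired products stay retired."""
        if self.stage is not Stage.RETIREMENT:
            self.stage = Stage(self.stage.value + 1)
        return self.stage

    def is_consumable(self) -> bool:
        """Assumption: only released or maintained products are offered to consumers."""
        return self.stage in (Stage.RELEASE, Stage.MAINTENANCE)


product = DataProduct(name="customer_orders", domain="sales", owner="sales-data-team")
product.advance()  # IDEA -> R_AND_D
product.advance()  # R_AND_D -> RELEASE
```

Treating the stage as explicit metadata is one possible design choice: it lets a catalog or platform decide automatically which products to show to consumers.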
Principle 3: Self-service data infrastructure
To scale principles 1 and 2, the data mesh model leverages a self-service data infrastructure so business domains aren’t burdened with managing the underlying complexity of compute, networking, security, and storage requirements. Cloud technology has made this very achievable.
The ultimate expression of this infrastructure is provided through APIs that facilitate highly automated data production and consumption. In this way, the data mesh organization abstracts complexity for the domain owners and reduces friction for data consumers. When data products can be seamlessly developed, shared, and consumed, an organization puts itself in a position to truly foster innovation. Reaching that position takes time, however. As the old saying goes, Rome wasn’t built in a day.
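As a rough illustration of API-driven production and consumption, the Python sketch below uses an in-memory stand-in for a self-service platform. The `SelfServePlatform` class and its `publish` and `search` methods are hypothetical names chosen for this example. A real platform would sit on top of cloud storage, compute, and security services that this sketch leaves out.

```python
from typing import Dict, List


class SelfServePlatform:
    """Hypothetical in-memory stand-in for a self-service data platform."""

    def __init__(self) -> None:
        self._products: Dict[str, Dict[str, str]] = {}

    def publish(self, name: str, domain: str, description: str) -> None:
        """A domain team registers a data product without touching infrastructure."""
        self._products[name] = {"domain": domain, "description": description}

    def search(self, keyword: str) -> List[str]:
        """A consumer finds products by keyword in the name or description."""
        kw = keyword.lower()
        return sorted(
            name
            for name, info in self._products.items()
            if kw in name.lower() or kw in info["description"].lower()
        )


platform = SelfServePlatform()
platform.publish("customer_orders", "sales", "Daily order totals per customer")
platform.publish("open_claims", "insurance", "Claims awaiting review")
matches = platform.search("order")
```

The point of the sketch is the division of labor: domain teams call `publish`, consumers call `search`, and neither has to manage the underlying infrastructure.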
Principle 4: Federated computational governance
The datasphere will continue to evolve, as will each organization’s data ecosystem. So too must their data intelligence and governance strategy. While decentralization is key to the data mesh model, good governance is essential for secure, successful enterprises. Data mesh organizations use a federated approach that balances enterprise-wide authority with domain-specific needs and requirements. Automation and integration throughout all layers of the data infrastructure are key to achieving this for policy, classification, definition, security, and quality at scale.
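One way to picture the "federated" and "computational" parts together is a set of enterprise-wide policies that every domain inherits, plus domain-specific policies layered on top, all checked automatically. The Python sketch below assumes this reading. The policy names, metadata fields, and the 365-day retention threshold are hypothetical examples, not rules from any standard or product.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Policy = Callable[[Metadata], bool]


def has_owner(meta: Metadata) -> bool:
    return bool(meta.get("owner"))


def has_classification(meta: Metadata) -> bool:
    return meta.get("classification") in {"public", "internal", "restricted"}


def retention_at_least_one_year(meta: Metadata) -> bool:
    days = meta.get("retention_days")
    return isinstance(days, int) and days >= 365


# Enterprise-wide policies apply to every domain (hypothetical examples).
GLOBAL_POLICIES: Dict[str, Policy] = {
    "has_owner": has_owner,
    "has_classification": has_classification,
}

# Each domain may add its own policies on top (hypothetical example).
DOMAIN_POLICIES: Dict[str, Dict[str, Policy]] = {
    "finance": {"retention_at_least_one_year": retention_at_least_one_year},
}


def violations(domain: str, meta: Metadata) -> List[str]:
    """Return the names of all global and domain policies the metadata fails."""
    policies = {**GLOBAL_POLICIES, **DOMAIN_POLICIES.get(domain, {})}
    return sorted(name for name, check in policies.items() if not check(meta))


meta = {"owner": "finance-data-team", "classification": "internal"}
finance_issues = violations("finance", meta)  # fails only the finance-specific rule
sales_issues = violations("sales", meta)      # passes all global rules
```

Because the checks are plain functions, they can run automatically whenever a data product is published or changed, rather than relying on manual review.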
Data mesh maximizes the value of data in a data-driven world
Time-consuming. Error-prone. Unsustainable. Unable to scale. If data management at your organization sounds like this, then a data mesh model might be just what you need.
Designed to decentralize much of the data platform and IT teams’ heavy lifting, the data mesh model transfers the onus of data management to individual business domains. With domain data ownership, true stewards with deep expertise and knowledge of the data actually control the data. Rather than boiling an ocean of data, business domain teams can focus on ensuring data is clean, trustworthy, and always available to support business agility. Enterprises that build a data management architecture with self-service as a priority can then give data consumers rapid access to the right data when they need it.
“We are a big believer in data mesh,” our CEO Felix Van de Maele told Datanami in a recent interview. “We have to embrace the fact that to do data well at scale, it needs to be distributed. … Data mesh maximizes the value of your data by reducing the friction for data creators and consumers.”
A game-changer for organizations, data mesh offers a framework for removing bottlenecks and empowering business domains to produce and curate data at enterprise scale and speed. In the coming weeks, we’ll be sharing more detail on each of the four data mesh principles on the Collibra blog.