Data mesh: two tickets to data paradise

This article was written by Collibra and originally appeared on the Collibra Data Mesh Blog.

Air travel has come a long way since the Wright brothers first took to the sky in 1903. Today you can book a flight in a few minutes and travel almost anywhere in the world with little concern about all the logistics happening behind the scenes. Organizations' ability to tap into the intelligence within their data to drive business outcomes, however, has not evolved at the same pace.

If we go back in time just a bit, the Wright brothers were responsible for the entire design of their Wright Flyer, from building their own engine and developing their own propeller to test-flying their unproven machine, and everything in between. Today, passengers need not worry about the drag coefficient of the wing, what altitude to fly at, or even how their meal was prepared hours (maybe days) before their flight. All that matters is the ease of booking their flight and getting to their destination on time.

Many organizations have taken a centralized approach, consolidating their data assets into a data warehouse or data lake. The initial perception was that a data lake would provide a single repository where data could be easily accessed, cleansed, and delivered so that business users could get the insights they needed, as easily as booking a flight to Hawaii.

The problem is that today's modern enterprise data is dynamic, diverse, and distributed. As massive volumes of this data continue to be pumped into a centralized repository, someone has to take on the overwhelming responsibility to manage, curate, and deliver it. Often this monolithic data lake is put under the control of IT or the data platform team, which is typically disconnected both from the business that created the data and from the users who need access to it. Without knowledge of what's in the data, where it came from, or what it will be used for, how can they be expected to provide accurate, valuable data?

This approach has created significant friction in the ability to discover, understand, and take full advantage of data. Often business analysts and users must build "the plane" that will deliver them the data insights they require. With little confidence about whether the data is relevant and of high quality, these data consumers are forced to track down and engage the departments responsible for the data to understand where it came from, and to work with technical teams to manually transform and cleanse the data for their needs. These types of efforts simply can't scale at today's speed of business. According to Accenture, only 32% of companies are able to realize tangible and measurable value from their data.

Just as aviation and air travel have progressed, data mesh provides a path for organizations to evolve their data strategy and make smarter decisions by extracting the rich intelligence locked within their data. Data mesh isn't a technology but an architectural approach that streamlines access to high-quality data and addresses the use cases of a complex data environment.

One of the core issues with a centralized data lake approach is that the expertise and stewardship behind the data are lost when the data is funneled into the lake, out of the control of the teams that know it best. The idea behind data mesh is to give control of the data back to the business domains that created and own it. As the domain experts, who better to cleanse, enrich, and make the data readily available to its consumers throughout the organization? No longer should data consumers have to constantly knock on the cockpit door to determine data quality and manually obtain the trusted facts about the data.

With business domains now fully in control, they can deliver data as a product, which simply means that data owners provide the requested data in a state that is ready to use and requires no additional rework. As such, data should be easily discoverable in a catalog, addressable with standard naming conventions, and deemed trustworthy by the data owners. The syntax of the data should be easily understood, interoperable with other data sets, and governed in accordance with compliance requirements.
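The data-product properties above can be made concrete as a small descriptor that a domain team might fill in before publishing to a catalog. This is a minimal, hypothetical sketch: the field names, the `DataProduct` class, and the `is_publishable` check are illustrative assumptions, not a standard data mesh schema or any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # Hypothetical data product descriptor; field names are assumptions.
    name: str          # addressable: follows a standard naming convention
    domain: str        # the business domain that owns the data
    description: str   # discoverable: what the data set contains
    owner: str         # accountable data owner for trust questions
    schema: dict       # understood/interoperable: column name -> type
    quality_checks: list = field(default_factory=list)  # trustworthiness
    compliance_tags: list = field(default_factory=list)  # governance

    def is_publishable(self) -> bool:
        """Ready for the catalog only when documented, owned, and
        covered by at least one quality check."""
        return bool(self.description and self.owner and self.quality_checks)

# Example: a domain team describing a flight-bookings data product
bookings = DataProduct(
    name="sales.flight_bookings.v1",
    domain="sales",
    description="Confirmed flight bookings, one row per ticket",
    owner="sales-data-team@example.com",
    schema={"booking_id": "string", "route": "string", "fare_usd": "float"},
    quality_checks=["fare_usd >= 0", "booking_id is unique"],
    compliance_tags=["GDPR"],
)
print(bookings.is_publishable())  # True
```

In practice such a descriptor would live in a catalog, but the point of the sketch is simply that "data as a product" implies explicit, checkable metadata owned by the domain rather than tribal knowledge held by a central platform team.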

Back in the day, flight reservations were made by calling each airline to get flight times, prices, layovers, and so on. Imagine the countless hours and frustration of that user experience. Unfortunately, users suffer a similar experience with traditional data strategies: they struggle to discover the data they need, to understand what's in it and where it came from, and to judge whether it can be trusted. Data mesh helps democratize data by enabling business users to access the data they need in a self-service manner, all while hiding and automating the complexity. Protected by a set of security and governance guardrails, data users can get the trusted data they need when they need it.

The Wright brothers paved the way for passengers to travel from London to New York in just three hours. The data mesh concept holds similarly enormous potential to streamline how quickly and efficiently organizations extract intelligence from their data and impact business outcomes (check out this article from Google on why you should care). Organizations embarking on their data mesh journey need to think about the right tools and processes to get the most out of this approach.