Benefits of using Collibra with Databricks on Google Cloud

This article was written by Collibra and originally appeared on the Collibra Cloud Blog.


Today, Databricks and Google Cloud announced the general availability of Databricks on Google Cloud. This jointly developed service brings together data engineering, data science, analytics and machine learning through an open lakehouse platform. With an open data and cloud platform, users need data governance to continue to reap the benefits of the open architecture. That’s why we are excited about this partnership: it lets us support customers who rely on both Databricks and Google Cloud with their governance needs. Collibra provides a unified and governed view of data for both the technical data expert and the non-technical business user. We strive to connect all users around data and foster collaboration and innovation across the entire enterprise.

Benefits of using Collibra with Databricks on Google Cloud

The launch of Databricks on Google Cloud offers enterprise flexibility for AI-driven analytics. This partnership helps customers easily deploy Databricks globally and at scale on Google Cloud. It gives customers who deploy only on Google Cloud the opportunity to realize the full capabilities of the Databricks Lakehouse, and it provides the flexibility and choice to use best-of-breed tools for their multi-cloud strategy so they can distribute analytics & AI workloads across clouds.

Collibra’s integrated platform enables users to easily find, understand and trust their data through a single view. Our governance and cataloging capabilities ensure trust in the data and provide the foundation to run with Databricks for AI/ML, while our data quality solution helps increase analytics usage. More specifically, Collibra offers four crucial capabilities that help users get the full value from Databricks and Google Cloud:

  1. Tagging, governance and classification to ensure the data housed in Databricks on Google Cloud is trustworthy and easy to find
  2. Policies, standards and data quality rules to ensure data is consistent and accurate
  3. Policy management for creating, reviewing and updating data policies to ensure adoption and maintain compliance
  4. Data sets and data lineage to show how data transforms and flows as it is transported from source to destination, across its entire lifecycle

Our integration with Google Cloud makes Collibra the only solution where a user can see all their data in one portal and have it governed in one place.

Collibra’s platform sits on top of Databricks Lakehouse, providing a complete view through data lineage of where the data is stored and how it is being used. Collibra offers a full catalog view from Databricks to Looker, and includes Collibra Data Quality to ensure accuracy and trust in the data. For example, if a healthcare enterprise has patient data in BigQuery and Databricks, it can classify this data and bring it under a common language in Looker, and then use Collibra to view all the places this data has been used to ensure consistency and accuracy. Ultimately, we are excited about Databricks on Google Cloud because it allows us to seamlessly catalog your source of truth across all your data sources through our multi-cloud offering and close integration with both solutions.