Collibra expands integration touchpoints with Google Cloud

This article was written by Collibra and originally appeared on the Collibra Blog.


Collibra and Google Cloud provide solutions that naturally complement each other. Together, the two companies help customers maximize value from their enterprise data by promoting agile data operations and helping to derive innovative business insights, while ensuring that those insights can be trusted and that relevant policies are complied with.

This established partnership is being further strengthened by a new set of integrations that bring their cloud data platforms closer together and drive greater value for mutual customers. These integrations focus on three key areas – driving data quality, enhancing data governance through policy enforcement, and ensuring trust in business reporting and analytics.

Driving data quality 

OwlDQ, a recently acquired Collibra subsidiary, is a leading provider of predictive data quality software, which can be deployed as a cloud-native solution on Google Cloud. The solution, now known as Collibra Data Quality, empowers organizations to build and run scalable data quality applications and pipelines in modern, dynamic environments – including public, private, and hybrid clouds – while maintaining a consistent experience, performance and management runbook.

Collibra Data Quality leverages Spark parallel processing and can be applied across large and diverse data sources, including files and streaming data. Unique capabilities such as autonomous rule management, continuous data-drift detection and data profiling are enhanced by Collibra metadata management, data stewardship, data auto-classification, and data lineage to bring full business context to data quality. By combining a fully managed and highly scalable service for running Apache Spark, like Google Cloud Dataproc, with Collibra Data Intelligence Cloud, organizations can create end-to-end high-quality data pipelines to deliver scalable and trusted analytics and AI.
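To make the idea of profiling and data-drift detection concrete, here is a minimal sketch in plain Python. It is not Collibra Data Quality's algorithm: the function names, the profile fields and both thresholds are assumptions chosen for illustration, and a production pipeline would compute the same kind of statistics in parallel on Spark rather than on in-memory lists.

```python
from statistics import mean, pstdev


def profile(values):
    """Profile one column: null rate, plus mean and std dev of non-null values."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "mean": mean(non_null) if non_null else None,
        "stdev": pstdev(non_null) if non_null else None,
    }


def drifted(baseline, current, z_threshold=3.0, null_delta=0.1):
    """Return True if the current profile departs from the baseline profile."""
    # Illustrative thresholds only; both defaults are assumptions.
    if abs(current["null_rate"] - baseline["null_rate"]) > null_delta:
        return True
    if baseline["mean"] is None or current["mean"] is None:
        # One batch has no usable values; drift only if the other one does.
        return baseline["mean"] != current["mean"]
    shift = abs(current["mean"] - baseline["mean"])
    if baseline["stdev"] == 0:
        # A constant baseline column: any change in the mean counts as drift.
        return shift > 0
    return shift / baseline["stdev"] > z_threshold
```

A baseline profile would be taken from a trusted batch and compared against each new batch, for example `drifted(profile(yesterday), profile(today))`, where `yesterday` and `today` are hypothetical lists of column values.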

Collibra Data Quality supports a range of Google Cloud integrations, including BigQuery, Google Cloud Storage, Dataproc, Google Kubernetes Engine (leveraging Google Container Registry), Cloud SQL, Streaming, Spanner and Looker. Collibra’s Data Lineage offering can then be used to follow data quality as data moves, using data lineage scanners for Dataflow, BigQuery and Spanner.

Enhancing data governance through policy enforcement

Google Cloud offers a data and analytics platform that encompasses a wide range of technologies to support data collection, preparation, storage, analysis and visualization. The platform enables agile data operations and drives powerful business insights, all of which require proper governance of data and analytical processes (for more information see our paper).

An important aspect of governance is policy management and enforcement. Rules determining data usage can vary significantly by industry or jurisdiction. It is vital to know which rules apply to which data sets and under which circumstances. Collibra helps organizations take a data-centric approach to compliance by accurately classifying data and maintaining a central record of all applicable policies.

The next step in the integration is to combine Collibra’s policy management capabilities with Google Cloud’s access controls, so that data usage remains in compliance with applicable rules and regulations. Data classified and tagged with relevant policies in Collibra can be assigned Google Cloud policy tags, which can then be used to enforce access controls. Any policy changes recorded in Collibra can then be published back to Google Cloud via Pub/Sub to update records in Google Cloud Data Catalog, IAM and BigQuery.
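The flow described here, from a classification in Collibra to a Google Cloud policy tag to a published change message, can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual integration: the classification names, project, taxonomy and tag IDs, the message fields and the `build_policy_event` helper are all hypothetical. Only the general shape of a policy tag resource name follows Google Cloud's convention.

```python
import json

# Hypothetical mapping from Collibra classifications to Google Cloud
# policy tag resource names (project, taxonomy and tag IDs are made up).
POLICY_TAGS = {
    "PII": "projects/example-project/locations/us/taxonomies/123/policyTags/456",
    "Financial": "projects/example-project/locations/us/taxonomies/123/policyTags/789",
}


def build_policy_event(table, column, classification):
    """Build the message body announcing a policy change for one column."""
    tag = POLICY_TAGS.get(classification)
    if tag is None:
        raise ValueError(f"no policy tag mapped for classification {classification!r}")
    payload = {
        "table": table,
        "column": column,
        "classification": classification,
        "policy_tag": tag,
    }
    # Messaging systems such as Pub/Sub carry message data as bytes,
    # so the payload is serialized to UTF-8 encoded JSON.
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```

In a real deployment the returned bytes would be handed to a Pub/Sub publisher, and a subscriber on the Google Cloud side would apply the referenced policy tag to the column so that access controls take effect.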

Ensuring trusted business reporting and analytics 

Collibra recently completed integrating with Looker, a Google Cloud company that offers data visualization and business intelligence solutions (more details here). The integration enables business analysts to automatically register Looker assets (including reports, dashboards, looks, tiles and queries) in Collibra Data Catalog. Collibra Data Catalog then serves as a collaborative platform to govern both underlying data and the tools used to derive insights from that data.

The combined solution enables organizations to drive operational efficiencies in their business intelligence processes. Analysts find it easier to locate the right data sources and are armed with contextual information to accurately interpret that data. They can then share the results of their analysis via a common platform that promotes collaboration, capturing feedback from report viewers and using certification processes to highlight the most trusted reports. By encouraging re-use of components from certified reports, analysts can also help promote consistency and avoid duplicating effort, so they are not constantly reinventing the wheel and can focus on innovation. Finally, data lineage helps to ensure reports are pointed at the right data sources, while mitigating operational risk from system changes – for example, by notifying report owners of deprecated data elements that could corrupt their reports.

Delivering digital transformation 

Google Cloud provides a powerful platform for modern organizations to aggregate, store and analyze enterprise data assets and derive new business insights, all of which need to be underpinned by trusted data and analytics. That means having the right data foundations in place, which is where Collibra excels. The integration points between Collibra and Google Cloud will enable mutual customers to truly reap the benefits of both platforms – deriving data-driven business insights on a foundation of trusted, well-governed data.