10 Tips to Build a Successful Data Catalog

April 16th, 2018

Last August we released Alteryx Connect, a social data exploration and data cataloging platform for the enterprise. What is a data catalog and how do organizations successfully build one? Data cataloging is an organized service that enables users across an organization to find and explore information assets in a secure and governed environment. It enables organizations to get maximum value from their existing information assets by making data sources easy to discover and understand for the users who require them. This blog will highlight 10 Tips to Build a Successful Data Catalog.


1. Create the Culture

Alteryx Connect offers many great features to enable your teams to build a descriptive and social data catalog that will benefit the whole organization.


When a user starts their analytic journey, your data catalog is the starting point — the first mile of analytics — searching and finding content, understanding context and gaining trust in the results through community feedback and interaction.


In the real world, the perfect data catalog won't arrive overnight, and the history of data governance technology is littered with solutions that have simply failed to achieve that critical velocity and adoption in an organization. To truly deliver on a data catalog we must also focus on the people and the process, not just the technology — we must build a culture that enables your users to succeed.


A social data catalog needs to be socially engaging. In building a culture that empowers users to impart and share their knowledge, the technology must support all the different ways that users bring their experience together to solve problems: creating and annotating definitions, discussing quality and purpose in conversation threads, even simple social gestures like sharing a link or giving a 'thumbs up' reinforce the value of the underlying asset and make it richer and easier to find for future users.


As part of the process, ensure that the success of the catalog is tied to the success of the organization: track and reward the most active contributors, understand which assets are creating the biggest 'buzz', and promote the users who keep your company's information assets well curated and maintained.


2. Stay Focused

From the moment you embark on a data cataloging project, you stand at a base camp with the peak of expectations staring at you from across the chasm of corporate knowledge! Building a social repository of all your organization's data sources, reports, workflows, terminology, and more (potentially thousands of lifetimes of accumulated knowledge) is as daunting as climbing Mount Everest!


So, don't.


Like a general about to go into battle, remember that selection and maintenance of the aim is regarded as the master principle of war and strategy. Start small, but think big: win those initial battles and get your army on the move!


How does this work in practice? Pick a single department, or a single project. Maybe start with a handful of popular (or critical) datasets. Even simply documenting the reports that users consistently fail to understand gives vital organizational memory greater visibility. Document this expertise while reports and data sources are being created, before the skills and the knowledge leave the project (or the company!). In 12 months' time, will the resources still exist to explain the function of a dashboard, a report or a database table?


Another route is to follow your business strategy closely, because this is where the money and the power will flow. Document and socialize the assets associated with key strategic projects, and use the catalog as a means to drive the culture change towards open, social collaboration.


3. Connect to Sources

A fully-manual data catalog relies too heavily on corporate benevolence to ever reach that critical velocity and truly succeed in an organization.


Instead, Alteryx Connect gives your initiative a much-needed burst of acceleration by allowing the catalog to be populated automatically with metadata from your source systems. In this first instance, the metadata acquired in this manner will typically be rather technical and 'raw', missing the color provided by experts with real business experience.


This is perfectly fine. Alteryx Connect loaders aim to get this valuable information into the organization's spotlight as efficiently as possible. Once the spotlight is shining, the details can be added to the social catalog directly from a browser. Assign owners to reports, describe your data sources, or simply certify the high-quality assets so that they're immediately findable to your users.


4. Prototype

Once you start to see all the benefits that a social data catalog can bring to your organization, you will start to ask how this can be achieved: both delivering a software solution and providing the visibility, governance and community behavior towards your information assets.


As with all components in the Alteryx platform, we strongly recommend the Guided Trial as a means to demonstrate success. Just as in Tip #2, stay focused: choose a functional area, key project or key data sources and implement Alteryx Connect to prototype how a social data catalog would deliver the features you need, in an environment that can be showcased to other users and business stakeholders.


5. Timeliness

Who would visit a news website that used yesterday's stories? Who would Instagram last season's fashions? Who would trust a data catalog that stored out-of-date assets and information?


To ensure adoption, it is absolutely vital that users always find the information in your social data catalog up to date. Without timeliness, the catalog immediately loses trust and credibility with its users, and the project will sink without trace. After all, IT probably has several of these 'ghost ship' applications (Wikis, SharePoint, even the dreaded Excel data dictionary) still floating in the backwaters of the organization somewhere...


With regular schedules, the Connect 'loaders' will retrieve information and linkages from data platforms, analytics tools, applications and more. The refreshed catalog is immediately available to users, with a powerful version control feature to ensure visibility on any updates or changes that have occurred.


Let's demystify the term 'loader': it's simply a visual Alteryx drag-and-drop workflow that is published to your Alteryx Server and set up to point at your data sources. In other words, it is an analytic app designed to democratize the organizational memory held in your most valuable systems and data stores.


In addition to ensuring that loaders are run periodically, be proactive: use Connect's social and collaboration features to inform users about upcoming changes. Stay on top of change management in your data landscape, and your users will continue to trust the data catalog.
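
As a rough illustration of keeping an eye on timeliness, the sketch below flags assets whose metadata has not been refreshed within a chosen window. It is not part of Alteryx Connect: the `find_stale_assets` function, the `refresh_log` dictionary and the asset names are all hypothetical, and a real check would read refresh times from wherever your loader schedules are recorded.

```python
from datetime import datetime, timedelta


def find_stale_assets(last_refreshed, now, max_age_days=1):
    """Return the names of assets whose last refresh is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, ts in last_refreshed.items() if ts < cutoff)


# Hypothetical refresh log: asset name -> time its metadata was last loaded.
refresh_log = {
    "sales_dashboard": datetime(2018, 4, 15, 6, 0),
    "customer_table": datetime(2018, 4, 10, 6, 0),
    "churn_workflow": datetime(2018, 4, 16, 6, 0),
}

# Anything not refreshed in the last two days is reported as stale.
stale = find_stale_assets(
    refresh_log, now=datetime(2018, 4, 16, 12, 0), max_age_days=2
)
print(stale)
```

A list like this could feed the proactive communication described above, for example by notifying the owners of stale assets before users notice.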


6. Glossary

A business glossary is a critical component of your social data catalog strategy, and is available out-of-the-box with Alteryx Connect.


A glossary can take many forms: definitions, concepts, subject areas and many more. It captures the unique language of your organization in a single centralized location and then connects that meaning with the diverse contents of the catalog itself.


Not only do we get to define the terms, we can now see how those terms are applied across reports, databases and other definitions. Gone are abstract conversations about 'customer churn', 'net revenue', or 'ROE'. Instead, view a certified definition for 'Return on Equity' and interactively explore where this term is applied in your information assets, all from a single browser window.


As with our other advice in this blog: start small and think big. Take a little time from your business analysts (who often work at the intersection of business understanding and business communication) and get their thoughts on how to describe the business. Certify definitions through your Chief Data Officer or steering groups as you start to build that critical momentum and capture the language and culture of your organization.


To give you a head start, Alteryx Connect allows you to import business glossaries from formats like Excel very easily.


Don't start from scratch.

Don't reinvent the wheel.

Import what you already have, and get it visible!


7. Annotate

A social data catalog lives or dies on whether users find value in the information within.


Cold, formal definitions and automated technical metadata can make for a very dry user experience. No one would visit social sites like Facebook (1.37 Billion Daily Active Users, Users visiting the site 13.8 times per day) or LinkedIn (106 Million Monthly Active Users, 40% of users access the site monthly) if they only contained machine-scraped/interpreted data and didn't allow a user's personality and expertise to shine through.


Likewise, a centralized, top-down approach to building a data catalog is doomed to failure. Just like monolithic IT projects, if you impose a catalog on users, they will only work with it under duress.


Flip this approach to become a bottom-up, decentralized crowd-swell and you have a completely different picture!


No one central to the organization, not even the BI and IT teams, has a 100% understanding of all those data sources, data sets, reports and other types of assets. Yet this expertise, this 'know-how', is most definitely in the heads of your staff: business teams, analysts, knowledge workers, analytics groups, and more. It's pervasive and waiting to be harnessed!


To be successful with your social catalog, you need to give users the responsibility and the empowerment to annotate any and all data loaded in Alteryx Connect, democratizing the curation, maintenance and lifespan of your organization's critical information assets.


8. Let the Users In

Building a successful data catalog is about being a shopkeeper, not a gatekeeper. We need users to visit the catalog regularly, find value and then enrich the things they find most useful, bringing all that tribal knowledge from their heads to their colleagues.


Users can be created manually by the Alteryx Connect administrator on an individual basis, imported from a spreadsheet, or specific email domains (e.g. @alteryx.com) can be 'white-listed' to remove any complexity in the sign-up process.
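
To make the domain 'white-listing' idea concrete, here is a minimal sketch of an allow-list check on sign-up email addresses. It only illustrates the concept and is not how Alteryx Connect implements it: the `is_allowed_signup` function and the `ALLOWED_DOMAINS` set are hypothetical names, and the domain is the example given above.

```python
# Hypothetical allow-list; "alteryx.com" is the example domain from the text.
ALLOWED_DOMAINS = {"alteryx.com"}


def is_allowed_signup(email, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the email's domain is on the allow-list (case-insensitive)."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    # Exact match only: a lookalike such as "alteryx.com.example.org" is rejected.
    return domain.lower() in allowed_domains


print(is_allowed_signup("analyst@alteryx.com"))
print(is_allowed_signup("analyst@example.org"))
```

Matching the whole domain exactly, rather than checking whether the address merely ends with the allowed text, keeps lookalike domains out while sign-up stays frictionless for genuine colleagues.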


Many organizations already use an enterprise-strength authentication system, such as SAML or Active Directory, and Alteryx Connect can work directly with those systems to identify and allow access based on a user's existing roles and credentials.


A best practice for increasing engagement in a data catalog is that all assets are open and transparent by default. This makes it easier for a user to find what they're looking for, and then start asking more valuable questions. Without this visibility, users will question the accuracy of the catalog and will lose engagement.


Remember: Alteryx Connect makes metadata generally accessible, that is, the who, what and where of data, not the actual data itself. When users understand the full picture of how their assets are connected, linked and used, they'll be able to make much better decisions with that information.


Of course, it is possible to set up permissions and visibility rules within the catalog so that assets can be restricted to certain groups of users, but this approach should be used sparingly for only an organization's most sensitive assets — let your Chief Data Officer guide the strategy for these edge cases.


9. Extend the Reach - "Bring your own metadata"

Connect's loaders provide out-of-the-box connectivity to the leading data and analytics platforms so that information assets from all these sources can be made available in the catalog and refreshed on a regular basis. The breadth and depth of connectivity will grow as more loaders are added to the system.


It's also possible to 'bring your own metadata' from any data source using the Connect Software Development Kit (SDK). The SDK tools can be accessed from within Alteryx Designer, using familiar drag-and-drop, code-free visual workflows.


Using the SDK, Alteryx Connect can ingest metadata from any third-party software of choice, allowing your teams to discover the full breadth of data sources used within your organization and giving your data catalog an unparalleled completeness of vision.


10. Ownership & WHAT-WHO-WHERE rule

A good data catalog will tell you who, what and where about every information asset that's important to your organization.


Who: Understand the owner or the trusted steward for an asset. This is the individual you need to be able to reach out to for answers.


Maybe this is the creator of the asset, such as a report author or workflow owner. Maybe this is the acknowledged expert on a particular business field or technology. It's vital for your data catalog to hold this point-of-contact information, so that your growing data community understands who can help when they have questions, or who to engage when new requirements arise or business change is needed.


A best practice to build engagement is to collaborate directly within the Connect platform: use the social commentary threads, annotations and subscriptions to track activity and tribal knowledge in a single place, rather than dispersed over emails and instant messages. Be open. Be transparent. Be connected.


What: At a minimum, aim to provide a basic description of an asset: business terminology, report functionality, the basic purpose of a dataset. More information can always be added later, so capture what you can but don't aim for 100% of the detail in the first edit. Building a catalog is an iterative, ongoing and collaborative process.


Where: The data catalog will tell your users a great deal about the purpose, meaning and flow of information throughout your organization, but knowing where to locate the underlying assets is vital to putting them to work effectively.


When a user wants to launch an Alteryx Analytic App, or a BI dashboard, they will find links or file locations provided automatically by Connect loaders, or through user annotations edited directly within the catalog.
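
The who, what and where rule can be pictured as three fields that every catalog entry should carry. The sketch below is only an illustration under that assumption: the `CatalogEntry` class, its field names and the sample entries are hypothetical and do not reflect Alteryx Connect's data model.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record carrying the who, what and where of an asset."""
    name: str
    who: str = ""    # owner or trusted steward
    what: str = ""   # basic description of the asset
    where: str = ""  # link or file location of the underlying asset

    def missing_fields(self):
        """List which of who/what/where are still blank."""
        return [f for f in ("who", "what", "where") if not getattr(self, f).strip()]


# Made-up sample entries for illustration.
entries = [
    CatalogEntry(
        "Quarterly Sales Dashboard",
        who="Report author",
        what="Net revenue by region",
        where="BI server, Sales folder",
    ),
    CatalogEntry("Customer Table", what="One row per customer"),
]

# Map each incomplete entry to the fields that still need attention.
gaps = {e.name: e.missing_fields() for e in entries if e.missing_fields()}
print(gaps)
```

A report like this fits the iterative approach described above: entries do not need 100% of the detail on the first edit, but the gaps stay visible until someone fills them.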


When an asset has been identified, contextualized and is trusted, the data catalog aims to take the user to the content directly, completing the journey of the 'first mile' of analytics!