Data Discoverability

Recommendation
Updated
Moved
ASSESS
2022-04-29

What is it

Practices, architectural principles and tools for finding, sharing and understanding data for analytics and machine learning. DataHub and Data Catalog are examples of tools that can be used to improve data discoverability. Practices such as naming conventions and agreed ways of structuring projects and datasets are also an important part of making data accessible.
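As an illustration of how a naming convention can support discoverability, the sketch below checks dataset names against a made-up pattern. The convention itself (domain, then layer, then entity, in lowercase snake_case), the layer names and the function name are assumptions for the example only; they are not prescribed by DataHub or Data Catalog.

```python
import re
from typing import Optional

# Hypothetical convention: <domain>_<layer>_<entity>, all lowercase,
# where layer is one of raw, staging or mart.
NAME_PATTERN = re.compile(
    r"(?P<domain>[a-z][a-z0-9]*)_"
    r"(?P<layer>raw|staging|mart)_"
    r"(?P<entity>[a-z][a-z0-9_]*)"
)


def parse_dataset_name(name: str) -> Optional[dict]:
    """Return the name's parts if it follows the convention, else None."""
    match = NAME_PATTERN.fullmatch(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for candidate in ["sales_mart_orders", "SalesOrders", "sales_raw_order_items"]:
        print(candidate, "->", parse_dataset_name(candidate))
```

A check like this could run in code review or CI, so that names stay predictable enough to search and browse.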

Why we use it

To reduce the friction data users face when locating the data they need, making sense of it and evaluating whether it is trustworthy to use. Accessible data lineage also makes development and maintenance less error-prone.
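To make the lineage point concrete, here is a minimal sketch of how recorded lineage lets a developer list everything affected by a change to one dataset. The dataset names and the in-memory dictionary are hypothetical; in practice the edges would come from a catalog tool rather than being hard-coded.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets built directly from it.
LINEAGE = {
    "raw_orders": ["staging_orders"],
    "staging_orders": ["mart_revenue", "mart_order_counts"],
    "mart_revenue": ["dashboard_finance"],
}


def downstream(lineage: dict, start: str) -> list:
    """Return every dataset derived from `start`, in breadth-first order."""
    seen = {start}  # guards against cycles and excludes the start itself
    queue = deque([start])
    affected = []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    print(downstream(LINEAGE, "raw_orders"))
```

Without lineage, the same impact analysis means reading pipeline code by hand, which is where the errors tend to creep in.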

When to use it

When it becomes difficult to discover what data exists or what it represents, or when confidence in the accuracy of the data is lacking. Also when development and maintenance are tedious and error-prone because data lineage is missing.

How to learn it

The documentation for DataHub and Data Catalog, respectively: https://datahubproject.io/docs/ and https://cloud.google.com/data-catalog