dbt-score: continuous integration for dbt metadata
- Track: Data Engineering and MLOps
- Type: Talk
- Level: intermediate
- Duration: 30 minutes
Abstract
dbt (Data Build Tool) is a framework written in Python for creating, building, organizing, testing and documenting data models, i.e. data sets living in a database or a data warehouse.
dbt builds on clean software development practices so that data engineers and analysts can focus on the core logic of their data transformations while managing datasets at scale.
Following a long-standing tradition in data engineering, dbt makes extensive use of YAML to store metadata. Such metadata includes, for example, column types, table documentation, data correctness tests, third-party integrations, and any custom metadata that may be useful. In large projects, the metadata can easily amount to millions of values.
This is where things become complex, as there's no mechanism to ensure the correctness and consistency of this metadata. In particular, answering the following questions is hard:
- Is this table sufficiently tested?
- Does this table expose sensitive data?
- Does this column name follow naming conventions?
Answering those questions is essential to maintaining high data quality, and to securing the data by safeguarding it against accidental exposure and other privacy and compliance concerns.
This is why we created and open-sourced dbt-score, a linter for dbt metadata. It is designed to be flexible enough to enforce and encourage whatever good practices a data team sets up. Through its CLI, data practitioners can easily obtain a maturity score for their data models, keep the metadata chaos manageable, improve consistency, and eventually deliver high-quality data.
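To make the idea of linting metadata concrete, here is a minimal, self-contained sketch in plain Python. It does not use dbt-score's API: the metadata layout, the rule names, the `pii` and `contains_pii` meta keys, and the scoring formula are all assumptions made for illustration. Each rule corresponds to one of the three questions above.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical metadata for one model, loosely shaped like what a dbt YAML
# file might describe. The keys used here are assumptions for illustration.
model = {
    "name": "customer_orders",
    "meta": {"contains_pii": True},
    "columns": [
        {"name": "order_id", "tests": ["unique", "not_null"], "meta": {}},
        {"name": "CustomerEmail", "tests": [], "meta": {"pii": True}},
    ],
}

# A rule takes model metadata and returns a violation message, or None.
Rule = Callable[[dict], Optional[str]]


def has_tested_columns(m: dict) -> Optional[str]:
    """Is this table sufficiently tested? Here: every column has a test."""
    untested = [c["name"] for c in m["columns"] if not c["tests"]]
    return f"untested columns: {untested}" if untested else None


def declares_sensitive_data(m: dict) -> Optional[str]:
    """Does this table expose sensitive data without declaring it?"""
    has_pii = any(c["meta"].get("pii") for c in m["columns"])
    if has_pii and not m["meta"].get("contains_pii"):
        return "model has PII columns but is not marked contains_pii"
    return None


def columns_are_snake_case(m: dict) -> Optional[str]:
    """Does each column name follow the (assumed) snake_case convention?"""
    bad = [
        c["name"]
        for c in m["columns"]
        if not re.fullmatch(r"[a-z][a-z0-9_]*", c["name"])
    ]
    return f"non snake_case columns: {bad}" if bad else None


RULES = [has_tested_columns, declares_sensitive_data, columns_are_snake_case]


def lint(m: dict) -> Tuple[float, Dict[str, str]]:
    """Run every rule and return a score from 0 to 10 plus the violations.

    The score is simply the share of passing rules scaled to 10; this is a
    simplification, not the formula dbt-score uses.
    """
    violations = {}
    for rule in RULES:
        message = rule(m)
        if message is not None:
            violations[rule.__name__] = message
    score = 10 * (len(RULES) - len(violations)) / len(RULES)
    return score, violations


score, violations = lint(model)
print(f"{model['name']}: {score:.1f}/10")
for rule_name, message in violations.items():
    print(f"  {rule_name}: {message}")
```

Running such checks in continuous integration means a change that drops a test or renames a column against convention is reported before it is merged, rather than discovered later in the warehouse.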
This presentation will cover:
- An introduction to dbt
- The importance and advantages of declarative, git-tracked metadata
- The need for linting this metadata
- How dbt-score tackles this challenge, and its applicability to diverse dbt projects
More information about dbt-score can be found online:
- Documentation: https://dbt-score.picnic.tech/
- Source code: https://github.com/PicnicSupermarket/dbt-score