Dataform

Recommendation
Updated
Moved
HOLD
2022-05-13

What is it

Dataform is a tool for managing data pipelines solely through SQL. It brings good software engineering practices such as unit tests and CI/CD flows into the data engineering domain.

Dataform has been acquired by Google and is being integrated more closely into GCP and BigQuery. This is the main reason that we invest in using Dataform over dbt, which has a larger community.

Exactly how Dataform will be integrated into GCP and BigQuery is still uncertain.

When to use it

When you want to provide curated datasets in BigQuery through a batch data pipeline defined in SQL, with support for schedules, unit tests and assertions.
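
As a rough illustration of what such a pipeline step can look like, here is a minimal sketch of a Dataform SQLX file that builds a curated table and attaches assertions to it. The table name `orders` and the columns `customer_id` and `order_id` are hypothetical and not taken from any of our datasets; the file is assumed to live in a Dataform project where `orders` is already declared.

```sqlx
config {
  type: "table",
  description: "One row per customer with their order count (illustrative only)",
  assertions: {
    uniqueKey: ["customer_id"],
    nonNull: ["customer_id"]
  }
}

-- ref() resolves the hypothetical upstream dataset "orders" and
-- registers it as a dependency, so it is built before this table.
SELECT
  customer_id,
  COUNT(order_id) AS order_count
FROM ${ref("orders")}
GROUP BY customer_id
```

The `assertions` block asks Dataform to check that `customer_id` is unique and never null after the table is built, which is the kind of data quality check referred to above. Schedules and unit tests are configured separately and are not shown here.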

How to learn it

Why it's on hold

After a lengthy evaluation against our use cases, we currently prefer dbt over Dataform.