Apache Beam


What is it

Apache Beam is an open source, unified programming model that lets you implement batch or streaming pipelines and run them on several different execution engines. SDKs are available for Java, Python and Go; Scala is supported through Spotify's Scio library, which builds on the Java SDK.

Available runners are:

  • Direct. For local development.
  • Google Cloud Dataflow. The managed runner on GCP.
  • Others. Apache Flink, Apache Spark, Apache Samza and Apache Nemo are also available.

When to use it

We are currently evaluating whether and when we should use Apache Beam. For batch processing, our default tool is Dataform. Our hypothesis is that Apache Beam will be useful for real-time processing use cases, e.g. real-time aggregations of vehicle telematics events.

How to learn it

Official documentation