Dataproc

Recommendation: HOLD

Updated: 2022-05-13

What is it

Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them. With less time and money spent on administration, you can focus on your jobs and your data.

When to use it

We do not currently use Apache Spark, so we do not use Dataproc. If we adopt Apache Spark in the future, we would run those jobs on Dataproc infrastructure in GCP.

How to learn it

Official documentation

Why it's on hold

For use cases that would otherwise call for Dataproc, prefer BigQuery, Dataflow, or both.
