
PySpark

Recommendation
Updated
Moved
HOLD
2022-05-13

What is it

From the documentation:

PySpark is an interface for Apache Spark in Python. It not only allows you to write Spark applications using Python APIs, but also provides the PySpark shell for interactively analyzing your data in a distributed environment. PySpark supports most of Spark’s features such as Spark SQL, DataFrame, Streaming, MLlib (Machine Learning) and Spark Core.

When to use it

We do not currently use Apache Spark, and therefore do not use PySpark either.

Why it's on hold

For use cases that would otherwise call for PySpark, we currently prefer BigQuery, Dataflow, or both.

How to learn it