redapt / pyspark-s3-parquet-example

This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against that temporary table.
19 · Updated 8 years ago

Alternatives and similar repositories for pyspark-s3-parquet-example:

Users who are interested in pyspark-s3-parquet-example are comparing it to the libraries listed below.