redapt / pyspark-s3-parquet-example

This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, where it uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
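
The flow described above might look roughly like the following sketch. It is not the repository's actual code: the bucket, key, view name, and helper function names are hypothetical, and it uses `SparkSession` (the entry point in Spark 2.0 and later) in place of the older `SQLContext` named in the description. The `s3://` scheme is what EMR typically uses; outside EMR, `s3a://` is the usual choice.

```python
def s3_parquet_uri(bucket, key, scheme="s3"):
    """Build the URI of a Parquet object in S3 (bucket and key are placeholders)."""
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def query_parquet(spark, uri, view_name, sql):
    """Load a Parquet file, expose it as a temporary view, and run a SQL query on it."""
    df = spark.read.parquet(uri)            # read the Parquet data into a DataFrame
    df.createOrReplaceTempView(view_name)   # register the DataFrame as a temporary table
    return spark.sql(sql)                   # the result is itself a DataFrame


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-parquet-example").getOrCreate()
    # Hypothetical bucket and key; replace with a real Parquet object.
    uri = s3_parquet_uri("example-bucket", "data/sample.parquet")
    result = query_parquet(spark, uri, "sample", "SELECT * FROM sample LIMIT 10")
    result.show()
    spark.stop()

# When submitting this file as a Spark job, call main() at the end of the script.
```

A script like this would normally be handed to the cluster with `spark-submit`; the temporary view only lives for the duration of the Spark session that created it.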

☆19

Alternatives and similar repositories for pyspark-s3-parquet-example

Users who are interested in pyspark-s3-parquet-example are comparing it to the repositories listed below

ContentSquare / ml-in-prod
Tutorial repo for the article "ML in Production"
☆30Updated 2 years ago
bahchis / airflow-cookbook
Airflow workflow management platform chef cookbook.
☆71Updated 6 years ago
ecloudvalley / Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3
☆17Updated 6 years ago
thomhopmans / pythom
Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl
☆71Updated 2 years ago
kadnan / Airflow-Tutorial
Basic tutorial of using Apache Airflow
☆36Updated 6 years ago
paiml / awsbigdata
AWS Big Data Certification
☆25Updated 6 months ago
kaantas / spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
☆55Updated 6 years ago
tomaszdudek7 / airflow_project
scaffold of Apache Airflow executing Docker containers
☆86Updated 2 years ago
MrPowers / gill
An example PySpark project with pytest
☆16Updated 7 years ago
holdenk / spark-intro-ml-pipeline-workshop
A simple introduction to using spark ml pipelines
☆26Updated 7 years ago
Yannael / kafka-sparkstreaming-cassandra
Docker container for Kafka - Spark Streaming - Cassandra
☆98Updated 6 years ago
aws-samples / serverless-ai-workshop
This workshop demonstrates two methods of machine learning inference for global production using AWS Lambda and Amazon SageMaker
☆58Updated 4 years ago
aws-samples / amazon-kinesis-analytics-streaming-etl
Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics
☆65Updated last year
karthikmswamy / TFTutorials
Contains code for understanding TensorFlow workflow and basics
☆51Updated 7 years ago
JasonSanchez / spark-streaming-twitter-kafka
Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag
☆29Updated 9 years ago
awslabs / deeplearning-emr
Scripts and instructions to facilitate running Deep Learning Tasks on Amazon EMR
☆63Updated last year
BogdanCojocar / medium-articles
Repo for all my code on the articles I post on medium
☆107Updated 2 years ago
awslabs / aws-iot-analytics-notebook-containers
An extension for Jupyter notebooks that allows running notebooks inside a Docker container and converting them to runnable Docker images.
☆28Updated last year
tfayyaz / awesome-airflow
A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow.
☆96Updated 4 years ago
MrPowers / ceja
PySpark phonetic and string matching algorithms
☆39Updated last year
deliveryhero / pyconde2019-airflow-ml-workshop
PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines.
☆47Updated last year
awslabs / predictive-segmentation-using-amazon-pinpoint-and-amazon-sagemaker
This solution combines Amazon Pinpoint with Amazon SageMaker to help automate the process of collecting customer data, predicting custom…
☆17Updated 4 years ago
FINRAOS / herd-ui
Herd-UI is a search and discovery tool for business and technical users. Everyone in your organization can use Herd-UI to browse and unde…
☆16Updated 2 years ago
datastacktv / kubeflow-introduction
Code examples for the Introduction to Kubeflow course
☆14Updated 4 years ago
cs109 / cs109_data
Datasets for CS109
☆28Updated 11 years ago
neo4j-contrib / training-v2
☆16Updated 2 weeks ago
BasPH / airflow-rocket
Airflow code accompanying blog post.
☆21Updated 6 years ago
jrderuiter / airflow-fs
Composable filesystem hooks and operators for Apache Airflow.
☆17Updated 4 years ago
florimondmanca / kafka-fraud-detector
🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python
☆88Updated 6 years ago
vincentclaes / datajob
Build and deploy a serverless data pipeline on AWS with no effort.
☆111Updated 2 years ago