idealo/terraform-emr-pyspark

idealo / terraform-emr-pyspark

Quickstart PySpark with Anaconda on AWS/EMR using Terraform

☆48

Alternatives and similar repositories for terraform-emr-pyspark

Users interested in terraform-emr-pyspark are comparing it to the repositories listed below.

dimajix / terraform-emr-training
Terraform script for launching multiple EMR clusters for training purposes.
☆16 · Oct 30, 2025 · Updated 8 months ago
antimoz-om / Antimoz
A data engineering pipeline for digital marketers.
☆11 · Dec 21, 2018 · Updated 7 years ago
garystafford / emr-demo
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
☆38 · Sep 1, 2022 · Updated 3 years ago
dlt-hub / dlt-dagster-demo
dlt-dagster-demo
☆14 · Nov 6, 2023 · Updated 2 years ago
spotify-iacs / capstone
☆14 · May 15, 2017 · Updated 9 years ago
yodasco / pyspark-emr
A toolset to streamline running spark python on EMR
☆20 · Nov 16, 2016 · Updated 9 years ago
richardanaya / spark_delta_lake
☆16 · Jun 27, 2020 · Updated 6 years ago
NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆29 · Apr 4, 2019 · Updated 7 years ago
vuthanhhai2302 / Applied-Pyspark
My applied big data analytics project with PySpark.
☆10 · Sep 21, 2022 · Updated 3 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 · Nov 29, 2021 · Updated 4 years ago
jareddlc / jenkins-with-docker-socket
Official Jenkins image with Docker socket
☆10 · Nov 13, 2019 · Updated 6 years ago
speedment / avro-mocker
Generate mock data based on an Apache Avro schema and specific cardinality settings
☆10 · Apr 16, 2018 · Updated 8 years ago
DmitriySh / infra
DevOps course, Google Cloud Platform
☆13 · Oct 15, 2017 · Updated 8 years ago
newfront / spark-intro-to-ml
A Gentle Introduction to Machine Learning with Apache Spark
☆11 · Mar 2, 2026 · Updated 4 months ago
elastic / blog-langchain-elasticsearch
Code examples accompanying blog "Privacy-first AI search using LangChain and Elasticsearch"
☆30 · Aug 9, 2024 · Updated last year
aws-samples / amazon-emr-optimize-data-processing
Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
☆14 · Apr 14, 2023 · Updated 3 years ago
joomcode / spark-platform
Basic Spark utilities
☆13 · Updated this week
fscm / terraform-module-aws-spark
Terraform Module to create an Apache Spark cluster on AWS
☆16 · Jan 3, 2022 · Updated 4 years ago
itversity / retail_db_json
☆14 · Sep 14, 2021 · Updated 4 years ago
sanjeevai / disaster-response-pipeline
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
☆16 · Feb 24, 2019 · Updated 7 years ago
jamartinh / Orange3-Spark
A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML
☆15 · Dec 24, 2016 · Updated 9 years ago
FavioVazquez / ODSC_India_2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
☆26 · Sep 1, 2018 · Updated 7 years ago
RishiSankineni / Machine-Learning-Pipeline-LR-Pyspark
Power Plant ML Pipeline Application - Apache Spark
☆12 · Dec 12, 2016 · Updated 9 years ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark Performance and Cost Optimizer
☆25 · Dec 31, 2024 · Updated last year
curto2 / mckernel
McKernel: A Library for Approximate Kernel Expansions in Log-linear Time.
☆14 · Sep 3, 2022 · Updated 3 years ago
sbdzdz / python-katas
Short exercises in Python.
☆11 · Jan 13, 2023 · Updated 3 years ago
adityajain10 / pyspark-mlib-based-stock-predictor
PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …
☆12 · Sep 5, 2023 · Updated 2 years ago
tarzain / crosstalk
A simple system for 2-way interruptible voice interactions between human and LLM
☆30 · Feb 18, 2024 · Updated 2 years ago
mark-hoffmann / fastteradata
Tools for faster and optimized interaction with Teradata and large datasets.
☆17 · Jul 11, 2018 · Updated 8 years ago
deepyaman / jaffle-shop
Example project for building scalable data pipelines with Kedro and Ibis.
☆14 · Dec 10, 2025 · Updated 7 months ago
minzhang-1 / PointHop-PointHop2_Spark
A fast, low-memory version of PointHop and PointHop++, built upon Apache Spark.
☆10 · Jul 14, 2020 · Updated 6 years ago
openstack-archive / ansible-role-jenkins
MOVED: now at https://opendev.org/x/ansible-role-jenkins
☆15 · Sep 26, 2019 · Updated 6 years ago
MLWhiz / Spark_Projects
Spark Projects for the Berkeley Data Science Course
☆13 · Aug 12, 2015 · Updated 10 years ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
ronald-smith-angel / owl-data-sanitizer
A PySpark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago
theburningmonk / passwordless-cognito-ui-demo
Frontend app to go with the backend Cognito demos
☆14 · Mar 19, 2023 · Updated 3 years ago
javieraviles / spring-boot-redis-rest
REST API boilerplate using Spring Boot and Redis as database
☆13 · Dec 26, 2018 · Updated 7 years ago
7dir / json-fonts
TextGeometry json fonts (THREE.js, AFRAME, etc.) Arabic Bengali Cyrillic Devanagari Greek Gujarati Gurmukhi Hebrew Kannada Khmer Latin Ma…
☆10 · Jun 28, 2017 · Updated 9 years ago
divyam-rai / simple-kafka-sasl-docker-python
Due to lack of resources on how to deploy kafka with simple SASL authentication (just username and password) and how to write producer an…
☆12 · Dec 29, 2021 · Updated 4 years ago