Wittline/pyspark-on-aws-emr

The goal of this project is to offer a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.

☆29

Alternatives and similar repositories for pyspark-on-aws-emr

Users who are interested in pyspark-on-aws-emr are comparing it to the repositories listed below.

Wittline / Dropout-Students-Prediction
The goal of this project is to identify students at risk of dropping out of school
☆23 · May 7, 2021 · Updated 5 years ago
Wittline / data-engineer-challenge
Challenge Data Engineer
☆25 · Jun 13, 2022 · Updated 4 years ago
Wittline / apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
☆42 · Jun 29, 2022 · Updated 4 years ago
Wittline / uber-expenses-tracking
The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such …
☆125 · Jun 29, 2022 · Updated 4 years ago
Wittline / recommendation-system
Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)
☆15 · Jun 13, 2022 · Updated 4 years ago
ayushdixit487 / Uber-Data-Analysis-Project-in-Pyspark
This data project can be used as a take-home assignment to learn PySpark and Data Engineering.
☆20 · Feb 19, 2023 · Updated 3 years ago
kb1907 / PySpark_Projects
PySpark Projects
☆27 · Updated this week
zacharyt-cs / reddit-data-engineering
An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit
☆20 · Aug 5, 2022 · Updated 3 years ago
Wittline / data-engineering-challenge-th
Dockerizing a Python script for web scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)
☆15 · Dec 16, 2021 · Updated 4 years ago
adaltas / spark-streaming-pyspark
Build and run Spark Structured Streaming pipelines in Hadoop - project using PySpark.
☆13 · Jun 6, 2019 · Updated 7 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 · Nov 29, 2021 · Updated 4 years ago
pancr9 / Netflix-Recommender-System
ITCS 6190 : Cloud Computing for Data Analysis project. Movie Recommendation Engine for Netflix Data with custom functions implementation …
☆30 · Dec 8, 2017 · Updated 8 years ago
antimoz-om / Antimoz
A data engineering pipeline for digital marketers.
☆11 · Dec 21, 2018 · Updated 7 years ago
DIYBigData / spark-data-analysis-projects
A collection of data analysis projects done using PySpark via Jupyter notebooks.
☆10 · Oct 8, 2022 · Updated 3 years ago
Alexmhack / Django-Rasa-Sockets
Rasa Chatbot using Django backend and Sockets for communication
☆12 · Dec 8, 2022 · Updated 3 years ago
LeonardoEmili / stock-price-forecasting
Distributed stock price forecasting system to predict S&P 500 stock prices.
☆11 · Nov 12, 2021 · Updated 4 years ago
RishiSankineni / Machine-Learning-Pipeline-LR-Pyspark
Power Plant ML Pipeline Application - Apache Spark
☆12 · Dec 12, 2016 · Updated 9 years ago
amalphonse / Udacity_DEND
This repo contains my projects from the Udacity Data Engineering Nano degree
☆14 · Apr 26, 2023 · Updated 3 years ago
rnditdev / PyData_London_2018_Computer_Vision
These are the slides and code for my tutorial "Computer Vision: an (Un?)Expected Journey" at PyData London 2018
☆30 · May 30, 2018 · Updated 8 years ago
DataStax-Academy / cassandra-day-2019
☆16 · Oct 23, 2019 · Updated 6 years ago
adityajain10 / pyspark-mlib-based-stock-predictor
PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …
☆12 · Sep 5, 2023 · Updated 2 years ago
open-data-toronto / framework-data-quality
☆10 · Jun 29, 2023 · Updated 3 years ago
prakashdontaraju / google-cloud-ecommerce
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…
☆11 · Mar 9, 2022 · Updated 4 years ago
minzhang-1 / PointHop-PointHop2_Spark
A fast, low-memory version of PointHop and PointHop++, built upon Apache Spark.
☆10 · Jul 14, 2020 · Updated 6 years ago
mateuspicanco / project-atlas-sao-paulo
A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.
☆12 · Jul 4, 2021 · Updated 5 years ago
AdamSpannbauer / app_rasa_chat_bot
a stateless chat bot to perform natural language queries against the App Store top charts
☆29 · Mar 28, 2018 · Updated 8 years ago
vsouza / spark-kinesis-redshift
Example project for consuming AWS Kinesis streaming data and saving it to Amazon Redshift using Apache Spark
☆11 · May 22, 2018 · Updated 8 years ago
camposvinicius / gcp-etl
This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…
☆15 · Nov 15, 2021 · Updated 4 years ago
ketgo / marshmallow-pyspark
Marshmallow serializer integration with pyspark
☆12 · Dec 29, 2023 · Updated 2 years ago
francescotescari / noiseprint2
noiseprint2 is a port of noiseprint to tensorflow 2 and keras
☆12 · Feb 20, 2021 · Updated 5 years ago
Dragonsson / Pseudo_efficientNet
Pytorch: training an efficientNet model with pseudo-labels
☆11 · Dec 28, 2019 · Updated 6 years ago
dicarlosystems / pointofsale
Point of Sale module for Invoice Ninja
☆17 · Jan 24, 2020 · Updated 6 years ago
PacktPublishing / DP-203-Azure-Data-Engineer-Associate-Certification-Guide-Second-Edition
☆21 · Jun 7, 2024 · Updated 2 years ago
akanz1 / air_quality_app
Dockerized python app to measure air quality, temperature and more using a raspberry pi + sensor
☆15 · Jul 7, 2026 · Updated last week
ibaiGorordo / Tensorflow-Mobile-Generic-Object-Localizer
Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.
☆16 · Sep 18, 2021 · Updated 4 years ago
AWS-Big-Data-Projects / Analysing-Census-Data-using-aws
Use aws-emr and aws-redshift to analyse the USA adult census dataset
☆13 · Sep 11, 2020 · Updated 5 years ago
Zacchaeus00 / nbme
https://www.kaggle.com/c/nbme-score-clinical-patient-notes
☆10 · Sep 1, 2022 · Updated 3 years ago
mpfishe2 / az-databricks-realtime-alert-system
Building a real-time alert monitoring pipeline that sends email notifications off of Azure Event Hubs, Azure Databricks, and an Azure Logi…
☆13 · Mar 8, 2020 · Updated 6 years ago
big-data-lab-team / accident-prediction-montreal
☆12 · Dec 8, 2022 · Updated 3 years ago