yodasco/pyspark-emr


yodasco / pyspark-emr

A toolset to streamline running Spark Python (PySpark) jobs on EMR

☆20

Alternatives and similar repositories for pyspark-emr

Users who are interested in pyspark-emr are comparing it to the repositories listed below.

trek10inc / lambda-local-cache
☆10 · Jul 5, 2016 · Updated 10 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
jwplayer / sparksteps
CLI tool to launch Spark jobs on AWS EMR
☆67 · Oct 18, 2023 · Updated 2 years ago
newfront / spark-intro-to-ml
A Gentle introduction to Machine Learning with Apache Spark
☆11 · Mar 2, 2026 · Updated 4 months ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 · Nov 11, 2022 · Updated 3 years ago
aws-samples / amazon-emr-optimize-data-processing
Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
☆14 · Apr 14, 2023 · Updated 3 years ago
joomcode / spark-platform
Basic Spark utilities
☆13 · Updated this week
saabeilin / kafkian
An opinionated Kafka producer/consumer built on top of confluent-kafka-python/librdkafka
☆28 · Apr 23, 2026 · Updated 3 months ago
abajwa-hw / ntpd-stack
Ambari stack service for easily installing and managing NTPD on HDP cluster
☆14 · Apr 3, 2018 · Updated 8 years ago
jamartinh / Orange3-Spark
A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML
☆15 · Dec 24, 2016 · Updated 9 years ago
LinkedInAttic / apache-incubator-gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems.…
☆11 · Jul 29, 2017 · Updated 8 years ago
ZuInnoTe / spark-hadoopoffice-ds
A Spark datasource for the HadoopOffice library
☆36 · Sep 29, 2025 · Updated 10 months ago
mark-hoffmann / fastteradata
Tools for faster and optimized interaction with Teradata and large datasets.
☆17 · Jul 11, 2018 · Updated 8 years ago
hortonworks-gallery / ambari-freeipa-service
Ambari service for RedHat FreeIPA
☆11 · Sep 30, 2016 · Updated 9 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 · Nov 29, 2021 · Updated 4 years ago
amazon-archives / amazon-cognito-streams-sample
Sample demonstrating consuming Amazon Cognito Streams
☆10 · Jun 15, 2020 · Updated 6 years ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
oleewere / ansible-ambari-manager
List of playbooks to manage Ambari
☆13 · Oct 3, 2018 · Updated 7 years ago
frodenas / grafana_exporter
Grafana Prometheus exporter
☆10 · Oct 17, 2017 · Updated 8 years ago
ronald-smith-angel / owl-data-sanitizer
A pyspark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago
javieraviles / spring-boot-redis-rest
REST API boilerplate using Spring Boot and Redis as the database
☆13 · Dec 26, 2018 · Updated 7 years ago
vvaks0 / AvroSchemaShredder
Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A…
☆13 · Jan 11, 2017 · Updated 9 years ago
aljoscha / blog
Thoughts on things I find interesting.
☆17 · Dec 19, 2024 · Updated last year
ankkur13 / Big-Data-Systems-and-Intelligence-Analytics
☆12 · Apr 27, 2018 · Updated 8 years ago
shwethags / atlas-lineage
Example of creating lineage in Atlas with Sqoop and Spark
☆14 · Apr 5, 2017 · Updated 9 years ago
adoroszlai / ambari-runtime-compose
Sample Docker Compose files for running Apache Ambari
☆11 · Oct 29, 2018 · Updated 7 years ago
aardelean / nonblocking-microservice
☆10 · Jan 31, 2016 · Updated 10 years ago
sangwin / Angular-5-Sample-Demo
Angular 5 Sample Demo Application
☆10 · Jan 12, 2023 · Updated 3 years ago
idealo / terraform-emr-pyspark
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
☆48 · Jan 7, 2025 · Updated last year
elisska / cloudera-cassandra
Cloudera Manager parcel and CSD to manage Cassandra NoSQL database
☆14 · Nov 16, 2016 · Updated 9 years ago
jasebell / RecommenderDemo
An Apache Mahout based recommendation engine demo for 5000 users, 45000 items and 1,000,000 transactions.
☆16 · Jan 5, 2013 · Updated 13 years ago
ibm-cloud-solutions / hubot-ibmcloud-nlc
Adds a framework to enable Natural Language interactions in your Hubot scripts
☆11 · Dec 6, 2016 · Updated 9 years ago
rochacon / jiraya
Jiraya - Simple Jira CLI
☆17 · Dec 13, 2019 · Updated 6 years ago
qubole / spark-state-store
Rocksdb state storage implementation for Structured Streaming.
☆17 · Oct 21, 2020 · Updated 5 years ago
spring-tips / lazy-and-fast
Hi Spring fans! Welcome to a quick, mid-interregnum installment of Spring Tips in which we look at a few features that let you be both la…
☆13 · Mar 14, 2019 · Updated 7 years ago
monolive / ambari-custom-alerts
Custom Alerts for Ambari server
☆12 · Jul 27, 2015 · Updated 11 years ago
jupyterhub / yarnspawner
Spawn JupyterHub single user notebook servers in Hadoop/YARN containers.
☆19 · Apr 23, 2025 · Updated last year
awslabs / amazon-kinesis-connector-flink
This is a fork of the Apache Flink Kinesis connector adding Enhanced Fanout support for Flink 1.8/1.11 on KDA.
☆24 · Mar 1, 2026 · Updated 4 months ago
anupamachandra / WWC
Women Who Code stuff
☆12 · Dec 10, 2019 · Updated 6 years ago