CLI tool to launch Spark jobs on AWS EMR
☆67, updated Oct 18, 2023
Alternatives and similar repositories for sparksteps
Users who are interested in sparksteps are comparing it to the libraries listed below.
- Cython implementation of DeepWalk (☆53, updated Jul 6, 2023)
- Docker compose files for various kafka stacks (☆32, updated Feb 24, 2018)
- Pythonic interfaces using decorators (☆36, updated Nov 4, 2023)
- A toolset to streamline running spark python on EMR (☆20, updated Nov 16, 2016)
- Dump mysql tables to s3, and parse them (☆31, updated Nov 7, 2014)
- Unit and integration testing with PySpark can be tough to figure out; let's make that easier. (☆23, updated Nov 3, 2015)
- Build the numpy/scipy/scikitlearn packages and strip them down to run in Lambda (☆208, updated Jul 12, 2018)
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… (☆15, updated Apr 29, 2021)
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. (☆47, updated Feb 13, 2020)
- (☆25, updated Jun 25, 2018)
- Dynamic weighted sampling with replacement (☆14, updated Mar 19, 2016)
- WebSocket-enabled PDF viewer (☆15, updated Jun 6, 2022)
- Source-LDA: Enhancing probabilistic topic models using prior knowledge sources (ICDE 2017) (☆21, updated May 18, 2017)
- Chef cookbook for Druid (http://druid.io/) (☆10, updated Apr 25, 2016)
- Materials for my talk at PyData Chicago 2016 (☆20, updated May 25, 2017)
- Logistic Regression in Spark Streaming with Online Updating (☆20, updated Oct 27, 2016)
- A pandas.DataFrame-based ORM. (☆85, updated Mar 15, 2022)
- Helm plugin to destroy all releases (☆19, updated Feb 27, 2018)
- Interactive computing for complex data processing, modeling and analysis in Python 3 (☆79, updated May 3, 2024)
- Building blocks of tensorflow architectures (☆11, updated Oct 14, 2019)
- S3-backed notebook manager for IPython (☆29, updated May 1, 2017)
- A collection of airflow sample workflows for data processing on aws (☆12, updated Dec 1, 2017)
- Puppet module to provision Airbnb's Airflow (☆19, updated Jun 8, 2022)
- Apache (Py)Spark type annotations (stub files). (☆118, updated Aug 17, 2022)
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions (☆614, updated Jun 5, 2023)
- A simple elasticsearch frontend for serving astrophysical simulation catalog data (☆11, updated Mar 14, 2026)
- A software engineering framework to jump start your machine learning projects (☆37, updated Jan 24, 2026)
- Paper: A Zero-rename committer for object stores (☆20, updated Nov 7, 2025)
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… (☆24, updated Aug 11, 2023)
- 📤 In-memory implementation of SQS ideal for unit testing. (☆14, updated Jun 8, 2024)
- R package for accessing the StatisticsNZ API (☆10, updated Feb 20, 2023)
- (☆16, updated May 31, 2017)
- A demo Piccolo app - a movie database! (☆17, updated Oct 30, 2021)
- A helper that integrates Pydantic with the requests library for seamless access to defined Models (☆11, updated Mar 9, 2022)
- An OpenCalais API Interface for Python. (☆21, updated Mar 13, 2012)
- Ansible role to deploy and configure Airflow (☆41, updated Jun 4, 2026)
- Machine Learning Versioning made Simple (☆39, updated Jun 21, 2022)
- A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support (☆260, updated Nov 3, 2017)
- Create hadoop cluster in aws ec2 for development (☆11, updated Sep 8, 2017)