CLI tool to launch Spark jobs on AWS EMR
☆67Oct 18, 2023Updated 2 years ago
Alternatives and similar repositories for sparksteps
Users who are interested in sparksteps are comparing it to the libraries listed below.
- Docker compose files for various kafka stacks☆32Feb 24, 2018Updated 8 years ago
- Dynamically generate Buildkite pipelines based on project changes☆96Dec 5, 2025Updated 6 months ago
- Streaming left joins in Kafka for change data capture☆53Apr 20, 2026Updated 2 months ago
- A toolset to streamline running spark python on EMR☆20Nov 16, 2016Updated 9 years ago
- Dump mysql tables to s3, and parse them☆31Nov 7, 2014Updated 11 years ago
- Spark Streaming ETL jobs for Mozilla Telemetry☆18Dec 5, 2019Updated 6 years ago
- Common post-estimation tasks for scikit-learn☆17Nov 30, 2016Updated 9 years ago
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier.☆23Nov 3, 2015Updated 10 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup.☆47Feb 13, 2020Updated 6 years ago
- Python 3 compatible library for DynamoDB☆13Dec 4, 2021Updated 4 years ago
- ☆25Jun 25, 2018Updated 8 years ago
- Dynamic weighted sampling with replacement☆14Mar 19, 2016Updated 10 years ago
- Tail for AWS CloudFormation stack events☆24Apr 17, 2023Updated 3 years ago
- Chef cookbook for the http://druid.io/☆10Apr 25, 2016Updated 10 years ago
- Materials for my talk at PyData Chicago 2016☆20May 25, 2017Updated 9 years ago
- A pandas.DataFrame-based ORM.☆85Mar 15, 2022Updated 4 years ago
- Helm plugin to destroy all releases☆19Feb 27, 2018Updated 8 years ago
- Sample data conversion pipeline for importing data into Amazon Personalize.☆19Feb 13, 2019Updated 7 years ago
- Interactive computing for complex data processing, modeling and analysis in Python 3☆79May 3, 2024Updated 2 years ago
- Building blocks of tensorflow architectures☆11Oct 14, 2019Updated 6 years ago
- A collection of airflow sample workflows for data processing on aws☆12Dec 1, 2017Updated 8 years ago
- Apache (Py)Spark type annotations (stub files).☆118Aug 17, 2022Updated 3 years ago
- Hacktoberfest Seoul☆13Oct 26, 2020Updated 5 years ago
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions☆613Jun 5, 2023Updated 3 years ago
- A simple elasticsearch frontend for serving astrophysical simulation catalog data☆11Mar 14, 2026Updated 3 months ago
- Paper: A Zero-rename committer for object stores☆20Nov 7, 2025Updated 7 months ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…☆24Aug 11, 2023Updated 2 years ago
- cli AWS Cloudwatch Logs Downloader☆26Jun 6, 2018Updated 8 years ago
- Generate PNG images of syntax highlighted Python.☆10Jul 7, 2021Updated 4 years ago
- All the code related to building my own data lake☆21May 22, 2023Updated 3 years ago
- R package for accessing the StatisticsNZ API☆10Feb 20, 2023Updated 3 years ago
- A helper that integrates Pydantic with requests library for seamless access to defined Models☆11Mar 9, 2022Updated 4 years ago
- Tool to visualize data quickly with no brain usage for plot creation☆48Oct 29, 2025Updated 8 months ago
- An OpenCalais API Interface for Python.☆21Mar 13, 2012Updated 14 years ago
- How to deploy a Machine Learning model for sentiment analysis in the Cloud with AWS Lambda.☆104Oct 22, 2020Updated 5 years ago
- Ansible role to deploy and configure Airflow☆41Jun 25, 2026Updated last week
- Machine Learning Versioning made Simple☆39Jun 21, 2022Updated 4 years ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support☆260Nov 3, 2017Updated 8 years ago
- DBT Cloud Plugin for Airflow☆38May 14, 2024Updated 2 years ago