CLI tool to launch Spark jobs on AWS EMR
☆67 Oct 18, 2023 Updated 2 years ago
Alternatives and similar repositories for sparksteps
Users who are interested in sparksteps are comparing it to the libraries listed below.
- Docker Compose files for various Kafka stacks ☆32 Feb 24, 2018 Updated 8 years ago
- A toolset to streamline running Spark Python on EMR ☆20 Nov 16, 2016 Updated 9 years ago
- Create a monorepo by merging multiple GitHub repositories ☆30 Jun 2, 2024 Updated last year
- Pythonic interfaces using decorators ☆33 Nov 4, 2023 Updated 2 years ago
- Flexible tool to autogenerate a model from an existing database ☆18 Apr 9, 2017 Updated 8 years ago
- Unit and integration testing with PySpark can be tough to figure out; let's make that easier. ☆23 Nov 3, 2015 Updated 10 years ago
- Dynamic weighted sampling with replacement ☆14 Mar 19, 2016 Updated 9 years ago
- ☆32 Mar 20, 2024 Updated last year
- WebSocket-enabled PDF viewer ☆15 Jun 6, 2022 Updated 3 years ago
- (Weighted) Finite State Transducers for Scala NLP ☆21 Nov 15, 2014 Updated 11 years ago
- Spark Streaming ETL jobs for Mozilla Telemetry ☆18 Dec 5, 2019 Updated 6 years ago
- [UNMAINTAINED] A starter pack for creating a lightweight responsive web app for Fast.AI PyTorch models. ☆16 Dec 5, 2018 Updated 7 years ago
- ☆11 Aug 16, 2016 Updated 9 years ago
- Common post-estimation tasks for scikit-learn ☆17 Nov 30, 2016 Updated 9 years ago
- GitHub API wrapper for Scala ☆15 Mar 22, 2023 Updated 2 years ago
- Sample data conversion pipeline for importing data into Amazon Personalize. ☆19 Feb 13, 2019 Updated 7 years ago
- ☆41 Aug 17, 2016 Updated 9 years ago
- Language support for Scala in Atom. ☆51 Jul 21, 2021 Updated 4 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Feb 13, 2020 Updated 6 years ago
- Scala macros for making debugging easier ☆83 Nov 22, 2019 Updated 6 years ago
- S3-backed notebook manager for IPython ☆29 May 1, 2017 Updated 8 years ago
- Example projects for using Spark and Cassandra with DSE Analytics ☆58 Oct 10, 2025 Updated 4 months ago
- Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses. ☆117 Dec 26, 2022 Updated 3 years ago
- A Scala feature transformation library for data science and machine learning ☆474 Feb 7, 2025 Updated last year
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions ☆613 Jun 5, 2023 Updated 2 years ago
- Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl ☆71 Mar 17, 2023 Updated 2 years ago
- Women Who Code stuff ☆12 Dec 10, 2019 Updated 6 years ago
- ☆25 Oct 29, 2019 Updated 6 years ago
- Apache (Py)Spark type annotations (stub files). ☆118 Aug 17, 2022 Updated 3 years ago
- A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support ☆261 Nov 3, 2017 Updated 8 years ago
- A software engineering framework to jump start your machine learning projects ☆37 Jan 24, 2026 Updated last month
- Some sample notebooks using R commands within Python through the rpy2 module ☆31 Feb 13, 2017 Updated 9 years ago
- Python implementation of the Parquet columnar file format. ☆21 Dec 18, 2025 Updated 2 months ago
- RequireJS optimizer plugin for sbt-web ☆37 Feb 14, 2026 Updated 2 weeks ago
- Code and notes from using scikit-learn on the MNIST digits dataset. For more of a narrative on this project, see the article: ☆29 Jan 28, 2016 Updated 10 years ago
- Prototype Pandemic Unemployment Assistance (PUA) claim service ☆12 Dec 2, 2021 Updated 4 years ago
- ☆10 May 28, 2025 Updated 9 months ago
- Detecting Radiological Threats in Urban Areas (9th place solution) ☆10 May 4, 2019 Updated 6 years ago
- ☆11 Aug 11, 2015 Updated 10 years ago