CLI tool to launch Spark jobs on AWS EMR
☆67Oct 18, 2023Updated 2 years ago
Alternatives and similar repositories for sparksteps
Users who are interested in sparksteps are comparing it to the libraries listed below.
- Cython implementation of DeepWalk☆53Jul 6, 2023Updated 2 years ago
- Docker compose files for various kafka stacks☆32Feb 24, 2018Updated 8 years ago
- Flexible tool to autogenerate a model from an existing database☆18Apr 9, 2017Updated 9 years ago
- Automatic model code generator for SQLAlchemy with Flask support☆337Jul 9, 2024Updated last year
- Spark Streaming ETL jobs for Mozilla Telemetry☆18Dec 5, 2019Updated 6 years ago
- Unit and integration testing with PySpark can be tough to figure out; let's make that easier.☆23Nov 3, 2015Updated 10 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup.☆47Feb 13, 2020Updated 6 years ago
- ☆25Jun 25, 2018Updated 7 years ago
- ☆33Mar 20, 2024Updated 2 years ago
- Dynamic weighted sampling with replacement☆14Mar 19, 2016Updated 10 years ago
- Tail for AWS CloudFormation stack events☆24Apr 17, 2023Updated 3 years ago
- Source-LDA: Enhancing probabilistic topic models using prior knowledge sources (ICDE 2017)☆21May 18, 2017Updated 8 years ago
- Chef cookbook for the http://druid.io/☆10Apr 25, 2016Updated 10 years ago
- Materials for my talk at PyData Chicago 2016☆20May 25, 2017Updated 8 years ago
- [UNMAINTAINED] A starter pack for creating a lightweight responsive web app for Fast.AI PyTorch models.☆16Dec 5, 2018Updated 7 years ago
- A pandas.DataFrame-based ORM.☆85Mar 15, 2022Updated 4 years ago
- Sample data conversion pipeline for importing data into Amazon Personalize.☆19Feb 13, 2019Updated 7 years ago
- The tensorflow prototype of "Local Low-rank Matrix Approximation" (LLORMA)☆10Jan 11, 2019Updated 7 years ago
- A collection of airflow sample workflows for data processing on aws☆12Dec 1, 2017Updated 8 years ago
- Apache (Py)Spark type annotations (stub files).☆118Aug 17, 2022Updated 3 years ago
- Library for AWS SWF.☆39Apr 13, 2026Updated 2 weeks ago
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions☆614Jun 5, 2023Updated 2 years ago
- A simple elasticsearch frontend for serving astrophysical simulation catalog data☆10Mar 14, 2026Updated last month
- Paper: A Zero-rename committer for object stores☆20Nov 7, 2025Updated 5 months ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin…☆25Aug 11, 2023Updated 2 years ago
- Generate PNG images of syntax highlighted Python.☆10Jul 7, 2021Updated 4 years ago
- All the code related to building my own data lake☆21May 22, 2023Updated 2 years ago
- ELK tutorial☆11Mar 15, 2023Updated 3 years ago
- ☆16May 31, 2017Updated 8 years ago
- Code for PyData Talk on "Classifying Products Based on Images and Text using Keras"☆30Apr 3, 2017Updated 9 years ago
- Apache Spark based ETL Engine☆71Oct 18, 2016Updated 9 years ago
- Python code to seasonally adjust data using the census X12-ARIMA program: http://www.census.gov/srd/www/x12a/☆11Mar 22, 2012Updated 14 years ago
- Legoo: A collection of automation modules to build analytics infrastructure☆20Jul 24, 2020Updated 5 years ago
- An OpenCalais API Interface for Python.☆21Mar 13, 2012Updated 14 years ago
- Place ASGs on the right Spot Market☆39Dec 27, 2016Updated 9 years ago
- Machine Learning Versioning made Simple☆38Jun 21, 2022Updated 3 years ago
- We are a group of volunteers who are trying to organize Python meetups and workshops in Amsterdam.☆11Apr 4, 2020Updated 6 years ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support☆261Nov 3, 2017Updated 8 years ago
- A Pytest plugin to make console output more manageable when there are multiple failed tests☆13Nov 4, 2023Updated 2 years ago