CLI tool to launch Spark jobs on AWS EMR
☆67 · Oct 18, 2023 · Updated 2 years ago
Alternatives and similar repositories for sparksteps
Users that are interested in sparksteps are comparing it to the libraries listed below.
- Pythonic interfaces using decorators ☆36 · Nov 4, 2023 · Updated 2 years ago
- Dump mysql tables to s3, and parse them ☆31 · Nov 7, 2014 · Updated 11 years ago
- Spark Streaming ETL jobs for Mozilla Telemetry ☆18 · Dec 5, 2019 · Updated 6 years ago
- Common post-estimation tasks for scikit-learn ☆17 · Nov 30, 2016 · Updated 9 years ago
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier. ☆23 · Nov 3, 2015 · Updated 10 years ago
- Build the numpy/scipy/scikitlearn packages and strip them down to run in Lambda ☆207 · Jul 12, 2018 · Updated 7 years ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 5 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 · Feb 13, 2020 · Updated 6 years ago
- A Terraform module to create an Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster. ☆39 · Oct 21, 2019 · Updated 6 years ago
- ☆25 · Jun 25, 2018 · Updated 7 years ago
- ☆34 · Mar 20, 2024 · Updated 2 years ago
- WebSocket-enabled PDF viewer ☆15 · Jun 6, 2022 · Updated 3 years ago
- Chef cookbook for the http://druid.io/ ☆10 · Apr 25, 2016 · Updated 10 years ago
- (Weighted) Finite State Transducers for Scala NLP ☆21 · Nov 15, 2014 · Updated 11 years ago
- Spring-boot aws lambda Api Gateway terraform ☆15 · Jul 22, 2021 · Updated 4 years ago
- Interactive computing for complex data processing, modeling and analysis in Python 3 ☆79 · May 3, 2024 · Updated 2 years ago
- Building blocks of tensorflow architectures ☆11 · Oct 14, 2019 · Updated 6 years ago
- S3-backed notebook manager for IPython ☆29 · May 1, 2017 · Updated 9 years ago
- A collection of airflow sample workflows for data processing on aws ☆12 · Dec 1, 2017 · Updated 8 years ago
- Apache (Py)Spark type annotations (stub files). ☆118 · Aug 17, 2022 · Updated 3 years ago
- This repository hold the Amazon Elastic MapReduce sample bootstrap actions ☆614 · Jun 5, 2023 · Updated 2 years ago
- A simple elasticsearch frontend for serving astrophysical simulation catalog data ☆10 · Mar 14, 2026 · Updated 2 months ago
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 · Aug 11, 2023 · Updated 2 years ago
- cli AWS Cloudwatch Logs Downloader ☆26 · Jun 6, 2018 · Updated 7 years ago
- Serverless costs calculator for AWS Lambda ☆12 · Oct 21, 2020 · Updated 5 years ago
- All the code related to building my own data lake ☆21 · May 22, 2023 · Updated 3 years ago
- R package for accessing the StatisticsNZ API ☆10 · Feb 20, 2023 · Updated 3 years ago
- ☆16 · May 31, 2017 · Updated 8 years ago
- Python code to seasonally adjust data using the census X12-ARIMA program: http://www.census.gov/srd/www/x12a/ ☆11 · Mar 22, 2012 · Updated 14 years ago
- Legoo: A collection of automation modules to build analytics infrastructure ☆20 · Jul 24, 2020 · Updated 5 years ago
- Language support for Scala in Atom. ☆51 · Jul 21, 2021 · Updated 4 years ago
- How to deploy a Machine Learning model for sentiment analysis in the Cloud with AWS Lambda. ☆105 · Oct 22, 2020 · Updated 5 years ago
- Ansible role to deploy and configure Airflow ☆41 · May 12, 2026 · Updated last week
- Machine Learning Versioning made Simple ☆38 · Jun 21, 2022 · Updated 3 years ago
- A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support ☆261 · Nov 3, 2017 · Updated 8 years ago
- Create hadoop cluster in aws ec2 for development ☆11 · Sep 8, 2017 · Updated 8 years ago
- IPython Notebook + D3 ☆128 · Jan 30, 2015 · Updated 11 years ago
- Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl ☆71 · Mar 17, 2023 · Updated 3 years ago
- Example projects for using Spark and Cassandra With DSE Analytics ☆59 · Oct 10, 2025 · Updated 7 months ago