Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,618 Updated 2 years ago
Alternatives and similar repositories for mrjob:
Users who are interested in mrjob are comparing it to the libraries listed below:
- Python module that allows one to easily write and run Hadoop programs. ☆1,033 Updated 7 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 Updated 4 years ago
- Pinball is a scalable workflow manager ☆1,045 Updated 5 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,495 Updated 7 months ago
- A pure Python HDFS client ☆857 Updated 2 years ago
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,117 Updated 4 years ago
- Python interface to Hive and Presto. 🐝 ☆1,678 Updated 7 months ago
- Data Migration for the Blaze Project ☆1,004 Updated 2 years ago
- A plotting library for Python, based on D3 ☆1,419 Updated 4 years ago
- NumPy and Pandas interface to Big Data ☆3,196 Updated last year
- SFrame: Scalable tabular and graph data-structures built for out-of-core data analysis and machine learning. ☆894 Updated 6 years ago
- A library for reading text files over multiple cores. ☆1,056 Updated last year
- Python Extract Transform and Load Tables of Data ☆1,263 Updated 10 months ago
- Extract Transform Load for Python 3.5+ ☆1,590 Updated last year
- Python helpers for building dashboards using Flask and React ☆2,271 Updated 7 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆243 Updated 9 years ago
- Python implementation of the parquet columnar file format. ☆349 Updated 3 years ago
- A Python MapReduce and HDFS API for Hadoop ☆238 Updated last month
- Pyleus is a Python framework for developing and launching Storm topologies. ☆401 Updated 6 years ago
- Scripts used to set up a Spark cluster on EC2 ☆393 Updated 7 years ago
- A Scala API for Cascading ☆3,515 Updated last year
- A developer-friendly Python library to interact with Apache HBase ☆607 Updated 8 months ago
- Python clone of Spark, a MapReduce-like framework in Python ☆2,684 Updated 4 years ago
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆349 Updated 4 years ago
- MongoDB Connector for Hadoop ☆1,520 Updated 3 years ago
- A Python stream processing engine modeled after Yahoo! Pipes ☆1,604 Updated 3 years ago
- A Python to Vega translator ☆2,032 Updated 8 years ago
- Scalable analysis of images and time series ☆821 Updated 8 years ago
- Bringing the Python data stack to the shell prompt ☆789 Updated 4 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,346 Updated 3 weeks ago