Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,618 Updated 2 years ago
Alternatives and similar repositories for mrjob
Users who are interested in mrjob compare it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,031 Updated 7 years ago
- A pure python HDFS client ☆857 Updated 3 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,497 Updated 9 months ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 Updated 4 years ago
- a Map/Reduce framework for distributed computing ☆1,633 Updated 7 years ago
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,115 Updated 4 years ago
- Data Migration for the Blaze Project ☆1,002 Updated 2 years ago
- NumPy and Pandas interface to Big Data ☆3,199 Updated last year
- A Scala API for Cascading ☆3,515 Updated 2 years ago
- Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world… ☆1,181 Updated 4 years ago
- Pinball is a scalable workflow manager ☆1,043 Updated 5 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,137 Updated 2 years ago
- Python interface to Hive and Presto. 🐝 ☆1,682 Updated 9 months ago
- A library for time series analysis on Apache Spark ☆1,194 Updated 4 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ☆1,007 Updated 2 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,355 Updated last week
- Simple DAG-based job scheduler in Python ☆766 Updated 5 years ago
- A developer-friendly Python library to interact with Apache HBase ☆609 Updated 10 months ago
- A scalable machine learning library on Apache Spark ☆795 Updated 3 years ago
- ☆522 Updated 3 years ago
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions ☆614 Updated last year
- Real-time Query for Hadoop; mirror of Apache Impala ☆34 Updated 2 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆243 Updated 9 years ago
- Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper. ☆1,310 Updated last month
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ☆737 Updated 2 months ago
- python implementation of the parquet columnar file format. ☆352 Updated 3 years ago
- Scripts used to setup a Spark cluster on EC2 ☆392 Updated 7 years ago
- A Python MapReduce and HDFS API for Hadoop ☆238 Updated 3 months ago
- MongoDB Connector for Hadoop ☆1,518 Updated 3 years ago
- Streaming MapReduce with Scalding and Storm ☆2,133 Updated 3 years ago