Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,619 · Updated 2 years ago
Alternatives and similar repositories for mrjob
Users who are interested in mrjob compare it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,031 · Updated 8 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,504 · Updated 6 months ago
- A pure Python HDFS client ☆859 · Updated 3 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,153 · Updated 5 years ago
- Python interface to Hive and Presto. 🐝 ☆1,693 · Updated last year
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,117 · Updated 5 years ago
- Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world… ☆1,175 · Updated 5 years ago
- Pinball is a scalable workflow manager ☆1,043 · Updated 6 years ago
- [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis ☆1,481 · Updated 3 years ago
- SFrame: Scalable tabular and graph data-structures built for out-of-core data analysis and machine learning. ☆901 · Updated 7 years ago
- Hadoop (Utilities, Patches and Examples) ☆244 · Updated 9 years ago
- Pyleus is a Python framework for developing and launching Storm topologies. ☆400 · Updated 6 years ago
- [UNMAINTAINED] A developer-friendly Python library to interact with Apache HBase ☆614 · Updated 4 months ago
- For the latest version of boto, see https://github.com/boto/boto3 -- Python interface to Amazon Web Services ☆6,439 · Updated 2 years ago
- NumPy and Pandas interface to Big Data ☆3,198 · Updated 2 years ago
- Python implementation of the Parquet columnar file format. ☆358 · Updated 4 years ago
- Official native Python client for the Vertica Analytics Database. ☆386 · Updated this week
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆241 · Updated 10 years ago
- Python helpers for building dashboards using Flask and React ☆2,268 · Updated 7 months ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,132 · Updated 2 years ago
- Train NLTK objects with zero code ☆745 · Updated 5 years ago
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆346 · Updated 4 years ago
- A library for time series analysis on Apache Spark ☆1,195 · Updated 5 years ago
- Python Driver for Apache Cassandra® ☆1,423 · Updated last month
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ☆741 · Updated 5 months ago
- Scripts used to set up a Spark cluster on EC2 ☆389 · Updated 8 years ago
- This page is a summary to keep the track of Hadoop related projects, and relevant projects around Big Data scene focused on the open sour… ☆690 · Updated 4 years ago
- A Python MapReduce and HDFS API for Hadoop ☆241 · Updated last week
- This repository holds the Amazon Elastic MapReduce sample bootstrap actions ☆613 · Updated 2 years ago
- Scalable Bloom Filter implemented in Python ☆1,624 · Updated 4 years ago