Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,619 Updated 2 years ago
Alternatives and similar repositories for mrjob:
Users who are interested in mrjob are comparing it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,031 Updated 7 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,497 Updated 9 months ago
- A pure python HDFS client ☆857 Updated 3 years ago
- NumPy and Pandas interface to Big Data ☆3,199 Updated last year
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 Updated 4 years ago
- Data Migration for the Blaze Project ☆1,003 Updated 2 years ago
- A Scala API for Cascading ☆3,515 Updated last year
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,116 Updated 4 years ago
- Pinball is a scalable workflow manager ☆1,043 Updated 5 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,137 Updated 2 years ago
- Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world… ☆1,182 Updated 4 years ago
- Python Extract Transform and Load Tables of Data ☆1,266 Updated last week
- Python interface to Hive and Presto. 🐝 ☆1,682 Updated 9 months ago
- Python helpers for building dashboards using Flask and React ☆2,270 Updated 7 years ago
- A scalable machine learning library on Apache Spark ☆793 Updated 3 years ago
- ☆522 Updated 3 years ago
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ☆738 Updated last month
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆243 Updated 9 years ago
- A plotting library for Python, based on D3 ☆1,419 Updated 4 years ago
- Pyleus is a Python framework for developing and launching Storm topologies. ☆401 Updated 6 years ago
- [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis ☆1,484 Updated 3 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,351 Updated 2 months ago
- A Python to Vega translator ☆2,031 Updated 8 years ago
- A Python MapReduce and HDFS API for Hadoop ☆238 Updated 3 months ago
- A developer-friendly Python library to interact with Apache HBase ☆608 Updated 9 months ago
- A library for time series analysis on Apache Spark ☆1,193 Updated 4 years ago
- API and command line interface for HDFS ☆272 Updated 7 months ago
- Data Visualization Server ☆962 Updated 8 years ago
- ggplot port for python ☆3,700 Updated 2 years ago
- For the latest version of boto, see https://github.com/boto/boto3 -- Python interface to Amazon Web Services ☆6,459 Updated last year