Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,619 Updated 2 years ago
Alternatives and similar repositories for mrjob:
Users who are interested in mrjob are comparing it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,032 Updated 7 years ago
- NumPy and Pandas interface to Big Data ☆3,199 Updated last year
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,495 Updated 8 months ago
- A pure python HDFS client ☆857 Updated 3 years ago
- Pinball is a scalable workflow manager ☆1,044 Updated 5 years ago
- Data Migration for the Blaze Project ☆1,004 Updated 2 years ago
- Python helpers for building dashboards using Flask and React ☆2,269 Updated 7 years ago
- [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis ☆1,484 Updated 2 years ago
- A Python to Vega translator ☆2,032 Updated 8 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 Updated 4 years ago
- Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world… ☆1,181 Updated 4 years ago
- A developer-friendly Python library to interact with Apache HBase ☆608 Updated 8 months ago
- A scalable machine learning library on Apache Spark ☆793 Updated 3 years ago
- Command line utilities for data analysis ☆1,939 Updated last year
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,116 Updated 4 years ago
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆349 Updated 4 years ago
- A Python package to manage extremely large amounts of data ☆1,328 Updated 2 weeks ago
- python implementation of the parquet columnar file format. ☆350 Updated 3 years ago
- MongoDB Connector for Hadoop ☆1,518 Updated 3 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆243 Updated 9 years ago
- Python clone of Spark, a MapReduce-like framework in Python ☆2,682 Updated 4 years ago
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,782 Updated 3 years ago
- Python interface to Hive and Presto. 🐝 ☆1,679 Updated 8 months ago
- a resque clone in python ☆954 Updated 4 years ago
- a plotting library for python, based on D3 ☆1,419 Updated 4 years ago
- [NOT MAINTAINED] Bubbles – Python ETL framework ☆452 Updated 7 years ago
- Web UI for PrestoDB. ☆2,750 Updated 3 years ago
- Evaluation of Deep Learning Frameworks ☆2,046 Updated 8 years ago
- Hadoop (Utilities, Patches and Examples) ☆242 Updated 8 years ago
- Streaming MapReduce with Scalding and Storm ☆2,136 Updated 3 years ago