Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,621 · Updated 2 years ago
Alternatives and similar repositories for mrjob
Users who are interested in mrjob are comparing it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,031 · Updated 7 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,500 · Updated 3 months ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 · Updated 4 years ago
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,117 · Updated 4 years ago
- A pure python HDFS client ☆857 · Updated 3 years ago
- Pinball is a scalable workflow manager ☆1,044 · Updated 5 years ago
- Python interface to Hive and Presto. 🐝 ☆1,686 · Updated last year
- a plotting library for python, based on D3 ☆1,419 · Updated 4 years ago
- Hadoop (Utilities, Patches and Examples) ☆244 · Updated 9 years ago
- SFrame: Scalable tabular and graph data-structures built for out-of-core data analysis and machine learning. ☆903 · Updated 7 years ago
- NumPy and Pandas interface to Big Data ☆3,200 · Updated 2 years ago
- [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis ☆1,480 · Updated 3 years ago
- Pyleus is a Python framework for developing and launching Storm topologies. ☆400 · Updated 6 years ago
- Python helpers for building dashboards using Flask and React ☆2,271 · Updated 4 months ago
- [UNMAINTAINED] A developer-friendly Python library to interact with Apache HBase ☆609 · Updated last month
- A Python to Vega translator ☆2,031 · Updated 9 years ago
- Data Migration for the Blaze Project ☆1,003 · Updated 3 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆241 · Updated 9 years ago
- VM based deployment for prototyping Big Data tools on Amazon Web Services ☆129 · Updated 5 years ago
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆347 · Updated 4 years ago
- Scalable Bloom Filter implemented in Python ☆1,622 · Updated 4 years ago
- Data and example code for Programming Pig, by Alan F. Gates ☆187 · Updated 9 years ago
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,784 · Updated 4 years ago
- python implementation of the parquet columnar file format. ☆354 · Updated 4 years ago
- Scripts used to setup a Spark cluster on EC2 ☆390 · Updated 7 years ago
- The official online compendium for Mining the Social Web (O'Reilly, 2011) ☆1,205 · Updated 12 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,135 · Updated 2 years ago
- MILK: Machine Learning Toolkit ☆603 · Updated 10 years ago
- DataStax Python Driver for Apache Cassandra ☆1,418 · Updated this week
- Please visit https://github.com/h2oai/h2o-3 for latest H2O ☆2,247 · Updated last year