Yelp/mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

☆2,610

Alternatives and similar repositories for mrjob

Users who are interested in mrjob are comparing it to the libraries listed below.

klbostee / dumbo
Python module that allows one to easily write and run Hadoop programs.
☆1,030 · Jan 9, 2018 · Updated 8 years ago
bwhite / hadoopy
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
☆243 · Jan 8, 2016 · Updated 10 years ago
boto / boto
For the latest version of boto, see https://github.com/boto/boto3 -- Python interface to Amazon Web Services
☆6,425 · Jan 12, 2024 · Updated 2 years ago
discoproject / disco
a Map/Reduce framework for distributed computing
☆1,631 · Jan 30, 2018 · Updated 8 years ago
nathanmarz / storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
☆8,772 · Aug 16, 2017 · Updated 8 years ago
wrobstory / vincent
A Python to Vega translator
☆2,022 · Oct 25, 2016 · Updated 9 years ago
twitter / elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
☆1,134 · Apr 10, 2023 · Updated 3 years ago
Yelp / Tron
Next generation batch process scheduling and management
☆355 · Updated this week
YelpArchive / mr3po
protocols for use with mrjob
☆16 · Mar 24, 2023 · Updated 3 years ago
Yelp / Testify
A more pythonic testing framework.
☆307 · Apr 2, 2026 · Updated 3 months ago
pystorm / streamparse
Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
☆1,505 · Apr 22, 2026 · Updated 2 months ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
crs4 / pydoop
A Python MapReduce and HDFS API for Hadoop
☆241 · Jan 19, 2026 · Updated 6 months ago
mesos / spark
Lightning-fast cluster computing in Java, Scala and Python.
☆1,419 · Apr 8, 2014 · Updated 12 years ago
facebookarchive / scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers.
☆3,912 · Aug 27, 2020 · Updated 5 years ago
yhat / ggpy
ggplot port for Python
☆3,689 · Jan 21, 2023 · Updated 3 years ago
twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago
spotify / luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…
☆18,746 · Updated this week
statsd / statsd
Daemon for easy but powerful stats aggregation
☆18,061 · May 20, 2025 · Updated last year
apache / predictionio
PredictionIO, a machine learning server for developers and ML engineers.
☆12,521 · Jan 9, 2021 · Updated 5 years ago
YelpArchive / EMRio
Elastic MapReduce instance optimizer
☆30 · Mar 24, 2023 · Updated 3 years ago
heynemann / r3
r³ is a map-reduce engine written in python using redis as a backend
☆346 · Sep 7, 2012 · Updated 13 years ago
mikedewar / d3py
a plotting library for Python, based on D3
☆1,414 · Dec 28, 2020 · Updated 5 years ago
jblomo / oddjob
useful JVM classes for the mrjob hadoop streaming framework
☆31 · Jun 20, 2013 · Updated 13 years ago
kvh / ramp
Rapid Machine Learning Prototyping in Python
☆656 · Nov 11, 2015 · Updated 10 years ago
bdarnell / plop
Python Low-Overhead Profiler
☆921 · Updated this week
mesos / chronos
Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
☆4,376 · Jun 29, 2022 · Updated 4 years ago
twitter-archive / flockdb
A distributed, fault-tolerant graph database
☆3,316 · Mar 16, 2017 · Updated 9 years ago
Cue / scales
scales - Metrics for Python
☆919 · May 25, 2023 · Updated 3 years ago
amplab / shark
Development in Shark has ended.
☆992 · Aug 11, 2015 · Updated 10 years ago
twitter / summingbird
Streaming MapReduce with Scalding and Storm
☆2,123 · Jan 19, 2022 · Updated 4 years ago
douban / dpark
Python clone of Spark, a MapReduce-like framework in Python
☆2,663 · Dec 25, 2020 · Updated 5 years ago
cloudera / python-ngrams
☆74 · Jun 18, 2013 · Updated 13 years ago
lintool / MapReduceAlgorithms
Data-Intensive Text Processing with MapReduce
☆628 · Mar 3, 2021 · Updated 5 years ago
not-kennethreitz / envoy
Python Subprocesses for Humans™.
☆2,264 · Jan 15, 2017 · Updated 9 years ago
nathanmarz / elephantdb
Distributed database specialized in exporting key/value data from Hadoop
☆558 · Jun 27, 2014 · Updated 12 years ago
pyston / pyston_v1
The previous version of Pyston, a faster implementation of the Python programming language. Please use this link for the new repository:
☆4,845 · May 7, 2021 · Updated 5 years ago
bitly / data_hacks
Command line utilities for data analysis
☆1,979 · Jan 16, 2024 · Updated 2 years ago
cayleygraph / cayley
An open-source graph database
☆15,050 · May 5, 2026 · Updated 2 months ago