Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,618 · Updated last year
Alternatives and similar repositories for mrjob:
Users who are interested in mrjob are comparing it to the libraries listed below.
- Python module that allows one to easily write and run Hadoop programs. ☆1,032 · Updated 7 years ago
- A pure Python HDFS client ☆855 · Updated 2 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,494 · Updated 6 months ago
- Pinball is a scalable workflow manager ☆1,044 · Updated 5 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 · Updated 4 years ago
- A Scala API for Cascading ☆3,513 · Updated last year
- Data Migration for the Blaze Project ☆1,004 · Updated 2 years ago
- Interactive and Reactive Data Science using Scala and Spark. ☆3,151 · Updated last year
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆350 · Updated 3 years ago
- Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world… ☆1,177 · Updated 4 years ago
- A plotting library for Python, based on D3 ☆1,417 · Updated 4 years ago
- A library for reading text files over multiple cores. ☆1,055 · Updated last year
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,118 · Updated 4 years ago
- SFrame: Scalable tabular and graph data structures built for out-of-core data analysis and machine learning. ☆892 · Updated 6 years ago
- Python helpers for building dashboards using Flask and React ☆2,268 · Updated 6 years ago
- A library for time series analysis on Apache Spark ☆1,192 · Updated 4 years ago
- [NOT MAINTAINED] Lightweight Python OLAP framework for multi-dimensional data analysis ☆1,485 · Updated 2 years ago
- NumPy and Pandas interface to Big Data ☆3,193 · Updated last year
- [NOT MAINTAINED] Bubbles – Python ETL framework ☆454 · Updated 7 years ago
- ☆517 · Updated 3 years ago
- Hadoop (Utilities, Patches and Examples) ☆242 · Updated 8 years ago
- Scripts used to set up a Spark cluster on EC2 ☆394 · Updated 7 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,139 · Updated last year
- Python clone of Spark, a MapReduce-like framework in Python ☆2,684 · Updated 4 years ago
- A scalable machine learning library on Apache Spark ☆791 · Updated 3 years ago
- This page is a summary to keep track of Hadoop-related projects, and relevant projects around the Big Data scene focused on the open sour… ☆692 · Updated 3 years ago
- Machine Learning Platform and Recommendation Engine built on Kubernetes ☆1,472 · Updated 4 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆470 · Updated 7 years ago
- Code to accompany Advanced Analytics with Spark from O'Reilly Media ☆1,526 · Updated 4 months ago
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,781 · Updated 3 years ago