Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,614 · Updated last year
Related projects
Alternatives and complementary repositories for mrjob
- Python module that allows one to easily write and run Hadoop programs. ☆1,035 · Updated 6 years ago
- Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL. ☆1,495 · Updated 3 months ago
- Python interface to Hive and Presto. 🐝 ☆1,671 · Updated 3 months ago
- A pure Python HDFS client ☆855 · Updated 2 years ago
- Apache Kafka client for Python; high-level & low-level consumer/producer, with great performance. ☆1,119 · Updated 3 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,154 · Updated 3 years ago
- A developer-friendly Python library to interact with Apache HBase ☆612 · Updated 3 months ago
- NumPy and Pandas interface to Big Data ☆3,187 · Updated last year
- Data Migration for the Blaze Project ☆1,004 · Updated 2 years ago
- Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning ☆1,788 · Updated 3 years ago
- A Map/Reduce framework for distributed computing ☆1,631 · Updated 6 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,138 · Updated last year
- A scalable machine learning library on Apache Spark ☆792 · Updated 3 years ago
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ☆731 · Updated 2 weeks ago
- Distributed deep learning on Hadoop and Spark clusters. ☆1,266 · Updated 5 years ago
- [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis ☆1,491 · Updated 2 years ago
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ☆2,230 · Updated last week
- Python helpers for building dashboards using Flask and React ☆2,270 · Updated 6 years ago
- Python clone of Spark, a MapReduce-like framework in Python ☆2,688 · Updated 3 years ago
- Hadoop (Utilities, Patches and Examples) ☆242 · Updated 8 years ago
- A plotting library for Python, based on D3 ☆1,415 · Updated 3 years ago
- DataStax Connector for Apache Spark to Apache Cassandra ☆1,943 · Updated 2 months ago
- Scripts used to set up a Spark cluster on EC2 ☆393 · Updated 6 years ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ☆1,010 · Updated 2 years ago
- DataStax Python Driver for Apache Cassandra ☆1,393 · Updated last week
- A Python MapReduce and HDFS API for Hadoop ☆237 · Updated 10 months ago
- SFrame: Scalable tabular and graph data-structures built for out-of-core data analysis and machine learning. ☆890 · Updated 6 years ago
- Distributed Neural Networks for Spark ☆604 · Updated 4 years ago