Python module that allows one to easily write and run Hadoop programs.
☆1,032 · Jan 9, 2018 · Updated 8 years ago
Alternatives and similar repositories for dumbo
Users that are interested in dumbo are comparing it to the libraries listed below
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ☆242 · Jan 8, 2016 · Updated 10 years ago
- Run MapReduce jobs on Hadoop or Amazon Web Services ☆2,617 · Mar 24, 2023 · Updated 2 years ago
- Utilities to use Avro files from Hadoop Map/Reduce jobs and Streaming ☆26 · Sep 10, 2013 · Updated 12 years ago
- Twitter on Tornado ☆47 · Sep 28, 2009 · Updated 16 years ago
- Fast binary [de]serialization of native python types ☆33 · Jun 15, 2010 · Updated 15 years ago
- A Python MapReduce and HDFS API for Hadoop ☆242 · Jan 19, 2026 · Updated last month
- An asynchronous client for Amazon SES ☆41 · Oct 16, 2012 · Updated 13 years ago
- Tornado Hub for Eventlet ☆38 · Mar 5, 2013 · Updated 13 years ago
- a pastebin clone written in python, using bottle and mongodb ☆19 · Jun 3, 2010 · Updated 15 years ago
- Fork of flaxcode htmltotext module ☆13 · Jul 30, 2011 · Updated 14 years ago
- John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm ☆57 · Aug 1, 2024 · Updated last year
- RHadoop ☆762 · Nov 24, 2015 · Updated 10 years ago
- Lightning-fast cluster computing in Java, Scala and Python. ☆1,426 · Apr 8, 2014 · Updated 11 years ago
- Python Thrift driver for Apache Cassandra ☆500 · May 29, 2019 · Updated 6 years ago
- Redis Sharding on Haskell ☆21 · Apr 10, 2017 · Updated 8 years ago
- A Python web crawler using Tornado and ZeroMQ ☆139 · May 9, 2012 · Updated 13 years ago
- WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for effici… ☆944 · May 26, 2021 · Updated 4 years ago
- Distributed database specialized in exporting key/value data from Hadoop ☆558 · Jun 27, 2014 · Updated 11 years ago
- Yahoo!'s topic modelling framework using Latent Dirichlet Allocation ☆337 · Sep 21, 2011 · Updated 14 years ago
- url shortener using bottle, redis and gevent ☆80 · Jun 30, 2012 · Updated 13 years ago
- Oozie - workflow engine for Hadoop ☆374 · Jun 8, 2017 · Updated 8 years ago
- A small collection of useful utilities for the Tornado Webserver ☆38 · Jan 8, 2013 · Updated 13 years ago
- Additional commands to augment the python virtualenv package. ☆36 · Mar 1, 2010 · Updated 16 years ago
- Asynchronous Redis client that works within Tornado IO loop. ☆77 · May 20, 2011 · Updated 14 years ago
- Mirror of Apache MRUnit ☆38 · Dec 10, 2018 · Updated 7 years ago
- A client for the Sendgrid API ☆32 · Oct 16, 2012 · Updated 13 years ago
- Python implementation of sessions with Tornado web server and memcached ☆46 · Aug 3, 2011 · Updated 14 years ago
- Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase. ☆173 · Oct 16, 2012 · Updated 13 years ago
- A light-weight queue server in python tornado, it uses memcache protocol and store queues persistently. ☆46 · Jun 22, 2017 · Updated 8 years ago
- example code for "Large-scale social media analysis with Hadoop" tutorial presented at ICWSM 2010 ☆42 · Jul 16, 2010 · Updated 15 years ago
- A small bit of code to make the Boto library for Amazon's AWS services work in an asynchronous (and extremely hacky) manner with Tornado.… ☆28 · Jun 20, 2011 · Updated 14 years ago
- Scribe logging module for nginx ☆27 · Apr 7, 2011 · Updated 14 years ago
- Scribe is a server for aggregating log data streamed in real time from a large number of servers. ☆3,915 · Aug 27, 2020 · Updated 5 years ago
- Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code. ☆1,133 · Apr 10, 2023 · Updated 2 years ago
- Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data ☆494 · Jun 19, 2014 · Updated 11 years ago
- Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi ☆41 · Aug 30, 2010 · Updated 15 years ago
- API and command line interface for HDFS ☆276 · Sep 24, 2024 · Updated last year
- Redis Sharding is a multiplexed proxy-server, designed to work with the database divided to several servers. It's a temporary substitutio… ☆110 · Dec 1, 2016 · Updated 9 years ago
- Multiminer server is a clustering (and soon to be pooling) management system for efficiently distributing Bitcoin mining work ☆26 · May 3, 2011 · Updated 14 years ago