douban / dpark
Python clone of Spark, a MapReduce-like framework in Python
☆2,684 · Updated 4 years ago
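A "MapReduce-like framework" splits work into a map stage, a shuffle that groups intermediate pairs by key, and a reduce stage that combines each group. The following is a minimal single-machine sketch of that model in plain Python, using word count as the example. It is not dpark's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. A distributed framework such as Spark would run these stages in parallel across workers instead of in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(lines):
    # Map (hypothetical helper): emit a (word, 1) pair for every word in every line.
    return chain.from_iterable(((word, 1) for word in line.split()) for line in lines)


def shuffle_phase(pairs):
    # Shuffle (hypothetical helper): group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce (hypothetical helper): combine the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # Chain the three stages; in a real cluster each stage would be distributed.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))  # prints {'a': 2, 'b': 2, 'c': 1}
```

Keeping the shuffle as a separate step mirrors how MapReduce-style systems expose it as the boundary where data moves between machines.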
Alternatives and similar repositories for dpark:
Users who are interested in dpark are comparing it to the libraries listed below.
- A high-level distributed crawling framework. ☆1,505 · Updated 2 years ago
- Thriftpy has been deprecated; please migrate to https://github.com/Thriftpy/thriftpy2 ☆1,154 · Updated 6 years ago
- Kazoo is a high-level Python library that makes it easier to use Apache ZooKeeper. ☆1,301 · Updated 3 months ago
- Kids Is Data Stream ☆1,224 · Updated 4 years ago
- [DEPRECATED] Douban CODE ☆1,812 · Updated 4 years ago
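- CUP, common useful python-lib (currently the most popular Python lib at Baidu). A foundational Python development library covering util, service (threadpool/generator/executo… ☆948 · Updated 3 months ago
- A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: storage is backed by a MongoDB cluster, distribution is implemented with Redis, and crawler status is displayed with Graphite. ☆3,253 · Updated 7 years ago
- Python SDK for the WeChat Official Accounts Platform [DEPRECATED] ☆1,359 · Updated 4 years ago
- Scalable Bloom Filter implemented in Python ☆1,620 · Updated 3 years ago
- Write Python articles together, read Python articles together: a collection and translation of Python technical articles, hosted on Read the Docs. ☆1,421 · Updated 6 years ago
- Python interface to Hive and Presto. 🐝 ☆1,679 · Updated 7 months ago
- Chinese translation of the Scrapy documentation ☆1,109 · Updated 5 years ago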
- Event driven concurrent framework for Python ☆1,857 · Updated 5 years ago
- Redis-based components for Scrapy. ☆5,582 · Updated 8 months ago
- A developer-friendly Python library to interact with Apache HBase ☆607 · Updated 7 months ago
- A lightweight wrapper around MySQLdb. Originally part of the Tornado framework. ☆582 · Updated 7 years ago
- An Internet-Scale Database. ☆1,900 · Updated 9 months ago
- Simple DAG-based job scheduler in Python ☆762 · Updated 5 years ago
- SSDB - A fast NoSQL database, an alternative to Redis ☆8,205 · Updated 2 years ago
- Pyleus is a Python framework for developing and launching Storm topologies. ☆401 · Updated 6 years ago
- Real-time Query for Hadoop; mirror of Apache Impala ☆34 · Updated 2 years ago
- Build a distributed SQL database from the ground up ☆2,149 · Updated 2 years ago
- Redis sharding client library ☆360 · Updated 2 years ago
- Anthelion is a plugin for Apache Nutch to crawl semantic annotations within HTML pages. ☆2,842 · Updated 9 years ago
- A Powerful Spider (Web Crawler) System in Python. ☆16,553 · Updated 10 months ago
- NumPy and Pandas interface to Big Data ☆3,195 · Updated last year
- Coroutine-based concurrency library for Python ☆6,313 · Updated last month
- Scrapy project to scrape public web directories (educational) [DEPRECATED] ☆1,636 · Updated 7 years ago
- Non-blocking Celery client for Tornado ☆564 · Updated 7 years ago
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ☆733 · Updated this week