atupal / ccrawler
A distributed crawler using Celery.
☆17 Updated 10 years ago
Alternatives and similar repositories for ccrawler:
Users who are interested in ccrawler are comparing it to the libraries listed below.
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 Updated 8 months ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆11 Updated last year
- iCQA - Intelligent Community Question Answering Framework ☆31 Updated 8 years ago
- Crawlera tools ☆26 Updated 9 years ago
- ☆18 Updated 8 years ago
- Toy web crawler ☆21 Updated 13 years ago
- Gevent Crawling in Python, with Utilities ☆23 Updated 9 years ago
- High Level Kafka Scanner ☆19 Updated 7 years ago
- Tornado Web Crawler ☆66 Updated 12 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 8 years ago
- A middleware to use random user agent in Scrapy crawler. ☆33 Updated 12 years ago
- A Python 3 compatible fork of https://launchpad.net/pymeta ☆18 Updated 6 years ago
- MongoDB extensions for Scrapy ☆44 Updated 10 years ago
- A tiny python utility that converts data crawled from different services into a cloud of words ☆30 Updated 6 years ago
- Application Driven Stats Monitoring ☆229 Updated 9 years ago
- Fast Python Bloom Filter using Mmap ☆13 Updated 12 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- CHeSF is the Chrome Headless Scraping Framework, a very very alpha code to scrape javascript intensive web pages ☆20 Updated 7 years ago
- collection of modules to build distributed and reliable concurrent systems in Python. ☆206 Updated 11 years ago
- python library for interacting with SolrCloud ☆36 Updated 4 years ago
- A Text Comprehension Engine in Python ☆15 Updated 9 years ago
- A distributed asynchronous socket framework of Python ☆13 Updated 8 years ago
- General Architecture for Text Engineering ☆48 Updated 8 years ago
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 Updated 6 years ago
- Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi ☆41 Updated 14 years ago
- [not actively maintained] The C++ webkit-server from capybara-webkit with useful extensions and Python bindings ☆48 Updated 4 years ago
- Tool to flatten stream of JSON-like objects, configured via schema ☆33 Updated 5 years ago
- No longer maintained! See https://bitbucket.org/vangheem/pyzipcode ☆20 Updated 5 years ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Dump mysql tables to s3, and parse them ☆31 Updated 10 years ago