seomoz / g-crawl-pyLinks

Gevent Crawling in Python, with Utilities

☆22

Alternatives and similar repositories for g-crawl-py

Users that are interested in g-crawl-py are comparing it to the libraries listed below

Sorting:

scholrly / lucene-querybuilder
A DSL to build Lucene text queries in Python.
☆38Updated 8 years ago
svetlyak40wt / scrapy-useragents
A middleware to use random user agent in Scrapy crawler.
☆33Updated 12 years ago
cjdd3b / citizen-quotes
A project to demonstrate maximum entropy models for extracting quotes from news articles in Python.
☆25Updated 13 years ago
ContinuumIO / scrapy_scrapers
Scraper built with Scrapy.
☆18Updated last year
scrapinghub / crawlera-tools
Crawlera tools
☆26Updated 9 years ago
Alir3z4 / python-sanitize
Bringing sanity to world of messed-up data
☆66Updated 11 years ago
hymloth / pyredise
A simple and fast search engine
☆70Updated 3 years ago
niwinz / phantompy
Phantompy is a headless WebKit engine with powerful pythonic api build on top of Qt5 Webkit
☆612Updated 8 years ago
trec-kba / streamcorpus
common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text
☆35Updated 9 years ago
OlivierBlanvillain / crawler
Blog crawler for the blogforever project.
☆23Updated 11 years ago
balanced / balanced-python
Balanced API library in python.
☆69Updated 3 years ago
TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆11Updated 10 years ago
julien-duponchelle / scrapy-elasticsearch
A scrapy pipeline which send items to Elastic Search server
☆98Updated 7 years ago
scrapinghub / scmongo
MongoDB extensions for Scrapy
☆44Updated 11 years ago
brandicted / scrapy-webdriver
☆143Updated 9 years ago
larsbutler / celery-examples
Examples of distributed computation using Celery
☆33Updated 13 years ago
scrapinghub / scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
☆32Updated 7 years ago
hugsbrugs / scrapyd-webapp
Scrapyd web application for managing projects, spiders and visualize jobs logs and items
☆14Updated 9 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29Updated 8 years ago
benoitc / flower
collection of modules to build distributed and reliable concurrent systems in Python.
☆206Updated 12 years ago
djangsters / frequests
Requests + futures = <3 - a grequests fork, origonal code: https://github.com/kennethreitz/grequests
☆63Updated 10 years ago
rmax / scrapy-boilerplate
Small set of utilities to simplify writing Scrapy spiders.
☆49Updated 10 years ago
jehiah / urlnorm
Convert URL's to a normalized unicode format
☆67Updated 7 years ago
madirey / django-celery-admin-ext
Adds the ability to manually run a periodic celery task from the Django Admin
☆24Updated 11 years ago
kanzure / pyphantomjs
Headless WebKit with JavaScript API .. but reimplemented in python
☆138Updated 13 years ago
bbrodriges / pholcidae
Tiny python web crawler
☆169Updated 9 years ago
andrewjw / celery-crawler
A Django based search engine powered by CouchDB, celery and whoosh.
☆49Updated 9 years ago
jabbalaci / Jabba-Webkit
Jabba's headless webkit browser for scraping AJAX-powered webpages.
☆90Updated 11 years ago
koblas / spiro
Tornado Web Crawler
☆67Updated 13 years ago
SEL-Columbia / bamboo
Dynamic data analysis over the web. The logic to your data dashboards.
☆156Updated 10 years ago