killets / Distributed-Web-Crawler-with-Celery

Python: selenium, beautifulsoup4, celery, rabbitmq, Amazon AWS (EC2, S3)

☆10

Alternatives and similar repositories for Distributed-Web-Crawler-with-Celery

Users who are interested in Distributed-Web-Crawler-with-Celery are comparing it to the libraries listed below.

jxltom / scrapymon
Simple Web UI for Scrapy spider management via Scrapyd
☆51 Updated 7 years ago
anuragrana / scraping_tweets_celery_rabbitmq_docker_cluster
Scraping tweets quickly using celery, RabbitMQ and Docker cluster
☆50 Updated 3 years ago
scrapinghub / scmongo
MongoDB extensions for Scrapy
☆44 Updated 11 years ago
julien-duponchelle / scrapy-elasticsearch
A Scrapy pipeline which sends items to an Elasticsearch server
☆98 Updated 7 years ago
scrapinghub / exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
☆40 Updated last year
scrapinghub / arche
Analyze scraped data
☆46 Updated 6 years ago
Parsely / serpextract
Easy extraction of keywords and engines from search engine results pages (SERPs).
☆93 Updated 2 months ago
ponyriders / django-amazon-price-monitor
Monitors prices of Amazon products via Product Advertising API
☆156 Updated 6 years ago
Tiago-Lira / scrapyd-mongodb
Library designed to replace the SQLite backend by a MongoDB backend on Scrapy queue management
☆17 Updated 8 years ago
scrapy-plugins / scrapy-headless
☆28 Updated 4 years ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆32 Updated this week
Mimino666 / python-xextract
Extract structured data from HTML and XML documents like a boss.
☆50 Updated last year
TeamHG-Memex / arachnado
Web Crawling UI and HTTP API, based on Scrapy and Tornado
☆160 Updated last month
lethain / extraction
A Python library for extracting titles, images, descriptions and canonical urls from HTML.
☆151 Updated 5 years ago
aaldaber / Distributed-Multi-User-Scrapy-System-with-a-Web-UI
Django based application that allows creating, deploying and running Scrapy spiders in a distributed manner
☆113 Updated 7 years ago
jschnurr / scrapyscript
Run a Scrapy spider programmatically from a script or a Celery task - no project required.
☆121 Updated last year
jay-johnson / celery-connectors
Want to handle 100,000 messages in 90 seconds? Celery and Kombu are that awesome - Multiple publisher-subscriber demos for processing jso…
☆41 Updated 7 years ago
scrapinghub / scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
☆26 Updated 9 years ago
chuanconggao / html2json
Lightweight library that converts an HTML webpage to JSON data using a template defined in JSON.
☆23 Updated 6 months ago
TeamHG-Memex / autologin
A project to attempt to automatically log in to a website given a single seed
☆127 Updated 2 months ago
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆44 Updated 4 years ago
app-generator / flask-argon-design-system
Flask App - Argon Design System | AppSeed
☆11 Updated 5 years ago
roycehaynes / scrapy-rabbitmq
A RabbitMQ Scheduler for Scrapy
☆87 Updated 3 years ago
lopuhin / scrapy-pyppeteer
Use pyppeteer from a Scrapy spider
☆59 Updated 5 years ago
scrapinghub / scrapy-autoextract
Zyte Automatic Extraction integration for Scrapy
☆56 Updated 3 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27 Updated 7 years ago
dimpu47 / onlineShop
E-commerce Web Application written in Django with Payment Integration, Asynchronous task processing using Celery, Flower, etc.
☆11 Updated 6 years ago
matiasb / demiurge
PyQuery-based scraping micro-framework.
☆119 Updated 3 years ago
kadnan / ScrapeGen
A simple python tool that generates a requests/bs4 based web scraper
☆27 Updated 3 years ago
ncouture / python-search-engine
Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh.
☆11 Updated 6 months ago