lorien / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆7,454 · Updated last month
Alternatives and similar repositories for awesome-web-scraping
Users who are interested in awesome-web-scraping are comparing it to the repositories listed below.
- A collection of awesome web crawlers and spiders in different languages ☆7,019 · Updated last year
- Web Scraping Framework ☆2,440 · Updated 2 months ago
- Lightweight, scriptable browser as a service with an HTTP API ☆4,194 · Updated last year
- Visual scraping for Scrapy ☆9,475 · Updated last year
- Declarative web scraping ☆5,898 · Updated 2 months ago
- A curated list of awesome packages, articles, and other cool resources from the Scrapy community. ☆551 · Updated 2 years ago
- Scrapy+Splash for JavaScript integration ☆3,242 · Updated 9 months ago
- A service daemon to run Scrapy spiders ☆3,076 · Updated last week
- A pure-python HTML screen-scraping library ☆1,887 · Updated 3 years ago
- A list of (almost) all headless web browsers in existence ☆6,457 · Updated last month
- 🔍 A helpful checklist/collection of Search Engine Optimization (SEO) tips and techniques. ☆2,633 · Updated 9 months ago
- Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc. ☆3,258 · Updated 2 years ago
- A scalable frontier for web crawlers ☆1,322 · Updated 5 months ago
- HTTP API for Scrapy spiders ☆870 · Updated 2 months ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support. ☆2,758 · Updated 4 years ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,220 · Updated 2 years ago
- A list of scrapers from around the web. ☆699 · Updated 9 months ago
- Admin UI for Scrapy / open-source Scrapinghub ☆2,772 · Updated 2 years ago
- Scrapy middleware to handle javascript pages using selenium ☆955 · Updated last year
- A curated list of analytics frameworks, software and other tools. ☆4,209 · Updated last week
- A curated list of awesome minimalist frameworks (simple and lightweight). ☆3,632 · Updated 2 weeks ago
- Distributed crawler powered by Headless Chrome ☆5,657 · Updated 2 years ago
- The definitive list of lists (of lists) curated on GitHub and elsewhere ☆10,756 · Updated 6 months ago
- Random User-Agent middleware based on fake-useragent ☆693 · Updated 2 years ago
- Your second OS. SDK that has it all. Streaming, OS control with agents. Declarative. Synced. ☆21,184 · Updated last week
- Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.… ☆3,360 · Updated 9 months ago
- Extract embedded metadata from HTML markup ☆934 · Updated 2 months ago
- Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS ☆4,078 · Updated last year
- A Python library for automating interaction with websites. ☆4,816 · Updated last week
- A Smart, Automatic, Fast and Lightweight Web Scraper for Python ☆7,040 · Updated 5 months ago