pablohoffman / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆13 · Updated 9 years ago
Related projects:
- Utility library to turn country names into ISO two-letter codes ☆65 · Updated 10 months ago
- Scrapy schema validation pipeline and Item builder using JSON Schema ☆44 · Updated 3 years ago
- A Flake8 plugin to catch common issues in Scrapy spiders ☆19 · Updated 2 years ago
- A Scrapy extension to store request and response information in a storage service ☆26 · Updated 2 years ago
- A command-line replacement for Zapier and IFTTT ☆39 · Updated 6 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 6 years ago
- ☆20 · Updated this week
- Restrict crawl and scraping scope using matchers ☆25 · Updated 8 years ago
- Parsel command-line interface ☆9 · Updated 8 years ago
- ☆48 · Updated last month
- A py.test plugin that displays test results as OS X notifications ☆74 · Updated 8 years ago
- Commit Counter Chart is a Python Flask app to view git history using D3.js ☆38 · Updated 8 years ago
- Extends zip() and itertools.zip_longest() to generate named tuples ☆23 · Updated 5 years ago
- Python library with common functionality for writing web scrapers ☆102 · Updated 9 years ago
- Tools that make writing tests, bots and scrapers with Selenium much easier ☆141 · Updated this week
- Scrapy downloader middleware that stores response HTML to disk ☆18 · Updated 4 months ago
- Analyze scraped data ☆47 · Updated 4 years ago
- Scraper for categories and lists on e-commerce and other listing websites ☆43 · Updated 3 years ago
- Python implementation of the WHATWG MIME Sniffing Standard (https://mimesniff.spec.whatwg.org/) ☆14 · Updated 8 months ago
- Find which links on a web page are pagination links ☆29 · Updated 7 years ago
- xmldataset: XML parsing made easy 🗃️ ☆77 · Updated 4 years ago
- Small set of utilities to simplify writing Scrapy spiders ☆49 · Updated 9 years ago
- A command-line script to get all the contributors for one or more GitHub projects ☆33 · Updated 3 years ago
- Paginating the web ☆37 · Updated 10 years ago
- Things and stuff for times, dates and datetimes. Maybe they're useful ☆14 · Updated 6 years ago
- A set of libraries to allow for sane use of old Python runtimes ☆30 · Updated 8 years ago
- A Python packaging utility library ☆19 · Updated last year
- Extract the differences between two HTML pages ☆32 · Updated 6 years ago
- Detect and classify pagination links ☆14 · Updated 4 years ago
- Security audit tool for Django sites ☆14 · Updated 4 months ago