pablohoffman / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆13 Updated 9 years ago
Alternatives and similar repositories for awesome-web-scraping:
Users who are interested in awesome-web-scraping compare it to the libraries listed below:
- A Flake8 plugin to catch common issues on Scrapy spiders ☆19 Updated 2 years ago
- Restrict crawl and scraping scope using matchers. ☆25 Updated 8 years ago
- Security audit tool for Django sites ☆14 Updated 3 months ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆14 Updated last year
- A Scrapy pipeline to categorize items using MonkeyLearn ☆37 Updated 7 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 Updated 7 years ago
- A command line replacement for Zapier and IFTTT. ☆39 Updated 6 years ago
- Find which links on a web page are pagination links ☆29 Updated 8 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, … ☆21 Updated 3 years ago
- Scrapy downloader middleware that stores response HTML to disk. ☆19 Updated 8 months ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆17 Updated 2 years ago
- Definitions of Pardon jargon to help Python beginners understand Pythonista gobbledygook ☆53 Updated 4 years ago
- Versioned domain model. Python library for revisioning/versioning of databases. ☆44 Updated 4 years ago
- A brief tutorial on NLP via sentiment classification, Jupyter notebooks, feature creation, and exploratory data analysis. ☆25 Updated 6 years ago
- Python and pandas tools to perform various analyses on different types of word lists ☆16 Updated 10 years ago
- Plots various graphs for a series of plaintext files in a directory ☆19 Updated 8 years ago
- Analyze scraped data ☆47 Updated 5 years ago
- xmldataset: xml parsing made easy 🗃️ ☆78 Updated 4 years ago
- OpenSSF Scorecard for top Python packages ☆16 Updated this week
- Stream processing in Python of Twitter searches using public APIs. ☆9 Updated 9 years ago
- Command Line Application for Job Search ☆13 Updated 7 years ago
- A Scrapy extension to store request and response information in a storage service ☆26 Updated 2 years ago
- Exporters is an extensible export pipeline library that supports filters, transforms, and several sources and destinations ☆40 Updated 8 months ago
- ☆49 Updated 3 weeks ago
- Logquacious is a set of simple logging utilities to help you over-communicate. ☆33 Updated 5 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Passwordless Email Auth ☆26 Updated 2 years ago
- Pretty HTML/XML rendering with syntax highlighting for BeautifulSoup objects in IPython notebook and qtconsole. ☆69 Updated 4 years ago
- Tools that will make writing tests, bots and scrapers using Selenium much easier ☆140 Updated last month
- Utility library to turn country names into ISO two-letter codes ☆66 Updated this week