pablohoffman / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆13 · Updated 9 years ago
Related projects
Alternatives and complementary repositories for awesome-web-scraping
- Tools that will make writing tests, bots and scrapers using Selenium much easier ☆141 · Updated this week
- A Flake8 plugin to catch common issues on Scrapy spiders ☆19 · Updated 2 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆14 · Updated 10 months ago
- Restrict crawl and scraping scope using matchers. ☆25 · Updated 8 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- A brief tutorial on NLP via sentiment classification, Jupyter notebooks, feature creation, and exploratory data analysis. ☆25 · Updated 6 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 7 years ago
- Plots various graphs for a series of plaintext files in a directory ☆19 · Updated 8 years ago
- A Scrapy extension to store request and response information in a storage service ☆26 · Updated 2 years ago
- Security audit tool for Django sites ☆14 · Updated last month
- An improved shell command for the Flask CLI ☆70 · Updated 2 weeks ago
- Quickly open Python modules in your text editor ☆44 · Updated 2 weeks ago
- 🕷 Configuration-based HTML scraper ☆22 · Updated 5 months ago
- Assorted generic Flask views, blueprints, Jinja2 filters, macros, forms and more. ☆24 · Updated 5 years ago
- A poster made to remind you of Tim Peters' renowned “Zen of Python”. The guiding principles of a Pythonista. ☆81 · Updated last year
- A command-line replacement for Zapier and IFTTT. ☆39 · Updated 6 years ago
- Python dict-like interface for merging dicts with add to set property ☆14 · Updated 6 years ago
- Python and pandas tools to perform various analyses on different types of word lists ☆16 · Updated 9 years ago
- Parsel Command Line Interface ☆9 · Updated 8 years ago
- Check whether a package name is available on pip ☆27 · Updated 4 months ago
- Transform flat data structures into nested object graphs matching JSON schema definitions. ☆28 · Updated 8 years ago
- Tweet Lake is a command-line interface to the Twitter Streaming API and a big data project that extracts interesting stats from a tweet corpus. ☆20 · Updated 2 years ago
- Detect and classify pagination links ☆14 · Updated 4 years ago
- Examples for "Implementing intuitive and productive APIs" workshops (OSCON, PyCON 2016) ☆50 · Updated 8 years ago
- Simple CLI tool to inspect your Python modules ☆20 · Updated 8 years ago
- Formatter allowing for colored output in log messages ☆49 · Updated last year