List of libraries, tools and APIs for web scraping and data processing.
☆13 Sep 17, 2015 Updated 10 years ago
Alternatives and similar repositories for awesome-web-scraping
Users who are interested in awesome-web-scraping are comparing it to the libraries listed below.
- This is an API for a todo list application implemented using API Star ☆12 Dec 26, 2022 Updated 3 years ago
- Print an image of a cat to the iTerm2 terminal ☆14 Feb 7, 2017 Updated 9 years ago
- ☆14 Sep 18, 2012 Updated 13 years ago
- Vinta's ESLint and Prettier shareable configs. ☆23 Feb 19, 2024 Updated 2 years ago
- Find which links on a web page are pagination links ☆29 Jan 12, 2017 Updated 9 years ago
- scrapy-extras -- a collection of code samples and modules for the Scrapy framework. ☆14 Dec 14, 2020 Updated 5 years ago
- A Scrapy pipeline to categorize items using MonkeyLearn ☆38 Apr 28, 2017 Updated 8 years ago
- How To Be a Programmer, edited ☆12 May 21, 2012 Updated 13 years ago
- An awesome list of (large-scale) public datasets on the Internet. (Ongoing collection) ☆24 Feb 18, 2022 Updated 4 years ago
- litrl browser and detectors ☆10 Oct 5, 2023 Updated 2 years ago
- dragonscan is an information-gathering tool coded in Python; cloning into the /root/ folder is recommended ☆12 Aug 31, 2019 Updated 6 years ago
- Templates for academic documents in Pandoc Markdown ☆15 Jan 31, 2019 Updated 7 years ago
- Command-line dictionary written in Python. ☆19 Jun 20, 2015 Updated 10 years ago
- A CLI for dealing with the features of ScrapingHub ☆16 Apr 20, 2021 Updated 4 years ago
- A decorator to write coroutine-like spider callbacks. ☆109 Dec 26, 2022 Updated 3 years ago
- Presentation for the NYU Data Lab December 2015 ☆14 Dec 2, 2015 Updated 10 years ago
- A JSON API to tag a sentence with part-of-speech tags. Uses UDPipe, so it supports hundreds of languages. ☆14 Dec 2, 2024 Updated last year
- This is a phishing ready platform. Unlike other phishing methods, EvilnoVNC allows you to bypass 2FA using a real browser via noVNC conn… ☆10 Apr 7, 2023 Updated 2 years ago
- A linter for Scrapy projects. ☆21 Feb 25, 2026 Updated last month
- ☆10 Nov 2, 2016 Updated 9 years ago
- Pseudo-localization tool for .NET ☆15 Mar 23, 2026 Updated last week
- Pre-print: ☆11 Oct 17, 2023 Updated 2 years ago
- The simMixedDAG package enables simulation of "real life" datasets from DAGs ☆13 Oct 12, 2019 Updated 6 years ago
- I will post updates on my Instagram @unkn0wn_bali. tufhub - a hacking framework with all kinds of bruteforce, info gather, DoS attack,… ☆13 Nov 28, 2018 Updated 7 years ago
- This repository has been transferred to jeroendmulder.github.io/RI-CLPM for easier maintenance. The GitHub Pages automatically redirects … ☆13 Jul 20, 2022 Updated 3 years ago
- A very simple mobile-friendly game that teaches CSS selectors. ☆29 Dec 20, 2022 Updated 3 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆15 Feb 28, 2025 Updated last year
- Code for the AnecbotalNYT Twitter bot ☆16 Sep 23, 2017 Updated 8 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 Oct 26, 2017 Updated 8 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆121 Mar 19, 2026 Updated last week
- Tool to flatten stream of JSON-like objects, configured via schema ☆33 Oct 19, 2019 Updated 6 years ago
- Classify Twitter accounts as institutional or ordinary users. ☆12 Nov 16, 2018 Updated 7 years ago
- Extract rich information from any text (urls, todos, etc) ☆17 Jan 14, 2026 Updated 2 months ago
- A GitHub Action that lints Python code with Flake8 then automatically creates pull request reviews if there are any violations. ☆27 Apr 20, 2022 Updated 3 years ago
- Extract messages from an iMessage database from iOS 8 ☆13 Apr 10, 2017 Updated 8 years ago
- Generate lavaan syntax for RI-CLPM ☆10 Dec 11, 2020 Updated 5 years ago
- Small semi-manual annotated web news corpus in Swedish for CoreNLP NER. 4 categories, PER, ORG, LOC and MISC. ☆12 Jun 27, 2020 Updated 5 years ago
- Synchronizes macOS local files and directories with a remote ☆10 Oct 15, 2024 Updated last year
- Do things with words. Scale them, mostly. ☆18 May 9, 2021 Updated 4 years ago