List of libraries, tools and APIs for web scraping and data processing.
☆13 · Sep 17, 2015 · Updated 10 years ago
Alternatives and similar repositories for awesome-web-scraping
Users that are interested in awesome-web-scraping are comparing it to the libraries listed below.
- Cursos ☆10 · Feb 27, 2019 · Updated 7 years ago
- Python dict-like interface for merging dicts with add to set property ☆14 · Apr 13, 2026 · Updated 2 months ago
- Print an image of a cat to the iTerm2 terminal ☆14 · Feb 7, 2017 · Updated 9 years ago
- ☆14 · Sep 18, 2012 · Updated 13 years ago
- Vinta's ESLint and Prettier shareable configs. ☆23 · Feb 19, 2024 · Updated 2 years ago
- Find which links on a web page are pagination links ☆29 · Jan 12, 2017 · Updated 9 years ago
- Docker Image with Matlab Compiler Runtime and SSHD ☆15 · Aug 28, 2014 · Updated 11 years ago
- A Scrapy pipeline to categorize items using MonkeyLearn ☆38 · Apr 28, 2017 · Updated 9 years ago
- Shows how to encrypt data held in public space ☆11 · Aug 11, 2017 · Updated 8 years ago
- A simple and fast Oh-My-Zsh theme ☆26 · Nov 17, 2018 · Updated 7 years ago
- How To Be a Programmer, edited ☆12 · May 21, 2012 · Updated 14 years ago
- A collection of datasets from Skolverket ☆11 · Sep 1, 2020 · Updated 5 years ago
- An awesome list of (large-scale) public datasets on the Internet. (On-going collection) ☆24 · Feb 18, 2022 · Updated 4 years ago
- litrl browser and detectors ☆10 · Oct 5, 2023 · Updated 2 years ago
- Data and code used in Yarkoni (2019) -- "The Generalizability Crisis" ☆13 · Nov 22, 2019 · Updated 6 years ago
- Templates for academic documents in Pandoc Markdown ☆15 · Jan 31, 2019 · Updated 7 years ago
- Presentation for the NYU Data Lab December 2015 ☆14 · Dec 2, 2015 · Updated 10 years ago
- Test Expectations of a Data Frame ☆14 · Oct 21, 2019 · Updated 6 years ago
- A JSON API to tag a sentence with part of speech tags. Uses UDPipe, so support for hundreds of languages. ☆14 · Dec 2, 2024 · Updated last year
- ☆10 · Nov 2, 2016 · Updated 9 years ago
- Pre-print: ☆11 · Oct 17, 2023 · Updated 2 years ago
- The simMixedDAG package enables simulation of "real life" datasets from DAGs ☆13 · Oct 12, 2019 · Updated 6 years ago
- Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup ☆22 · Sep 26, 2016 · Updated 9 years ago
- 🛠 Useful R functions for various things ☆18 · Jul 4, 2019 · Updated 6 years ago
- R package to import articles from newspaper databases ☆14 · Feb 29, 2024 · Updated 2 years ago
- PubPeer Chrome browser extension ☆14 · Feb 18, 2025 · Updated last year
- A very simple mobile-friendly game that teaches CSS selectors. ☆29 · Dec 20, 2022 · Updated 3 years ago
- Code for the AnecbotalNYT Twitter bot ☆16 · Sep 23, 2017 · Updated 8 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Oct 26, 2017 · Updated 8 years ago
- Classify Twitter accounts as institutional or ordinary users. ☆12 · Nov 16, 2018 · Updated 7 years ago
- Extract rich information from any text (urls, todos, emails, jwt, etc) ☆18 · Updated this week
- A GitHub Action that lints Python code with Flake8 then automatically creates pull request reviews if there are any violations. ☆27 · Apr 20, 2022 · Updated 4 years ago
- An R package to gather, munge, and convert event datasets into temporal event-networks. ☆11 · Mar 28, 2018 · Updated 8 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- Do things with words. Scale them, mostly. ☆18 · May 9, 2021 · Updated 5 years ago
- All of my bibliographic references ☆16 · Jun 21, 2020 · Updated 5 years ago
- plot p-curves ☆14 · Dec 6, 2017 · Updated 8 years ago
- API Wrapper for the mediacloud.org API ☆16 · Aug 20, 2019 · Updated 6 years ago
- plugin to check spacing between sentences ☆10 · Sep 10, 2023 · Updated 2 years ago