List of libraries, tools and APIs for web scraping and data processing.
☆13Sep 17, 2015Updated 10 years ago
Alternatives and similar repositories for awesome-web-scraping
Users that are interested in awesome-web-scraping are comparing it to the libraries listed below.
- Cursos☆10Feb 27, 2019Updated 7 years ago
- Python dict-like interface for merging dicts with add to set property☆14Apr 13, 2026Updated 3 weeks ago
- Print an image of a cat to the iTerm2 terminal☆14Feb 7, 2017Updated 9 years ago
- Vinta's ESLint and Prettier shareable configs.☆23Feb 19, 2024Updated 2 years ago
- Find which links on a web page are pagination links☆29Jan 12, 2017Updated 9 years ago
- Docker Image with Matlab Compiler Runtime and SSHD☆15Aug 28, 2014Updated 11 years ago
- Feedbuffer buffers RSS and Atom syndication feeds, that is to say it caches new feed entries until the news aggregator requests them and …☆19Jul 2, 2016Updated 9 years ago
- A Scrapy pipeline to categorize items using MonkeyLearn☆38Apr 28, 2017Updated 9 years ago
- Shows how to encrypt data held in public space☆11Aug 11, 2017Updated 8 years ago
- An awesome list of (large-scale) public datasets on the Internet. (On-going collection)☆24Feb 18, 2022Updated 4 years ago
- Tower Sim & Entry for 10k Apart 2016☆12Dec 3, 2019Updated 6 years ago
- Templates for academic documents in Pandoc Markdown☆15Jan 31, 2019Updated 7 years ago
- Command-line dictionary written in Python.☆19Jun 20, 2015Updated 10 years ago
- A decorator to write coroutine-like spider callbacks.☆109Dec 26, 2022Updated 3 years ago
- Django module that abstracts the flow of several virtual points of sale including PayPal☆24Aug 13, 2024Updated last year
- csharp-functional provides a set of NuGet packages to drive your coding towards a functional approach as well as enabling Railway Oriente…☆11Jul 12, 2022Updated 3 years ago
- Introductory tutorial that adds a narrative to RStudio's tutorial and other documentation for newcomers to R's wonderful shiny package.☆24Jan 27, 2015Updated 11 years ago
- Presentation for the NYU Data Lab December 2015☆14Dec 2, 2015Updated 10 years ago
- Test Expectations of a Data Frame☆14Oct 21, 2019Updated 6 years ago
- This is a phishing ready platform. Unlike other phishing methods, EvilnoVNC allows you to bypass 2FA using a real browser via noVNC conn…☆10Apr 7, 2023Updated 3 years ago
- Stripe Identity Verification API demo app hosted on Codesandbox☆11Jan 6, 2023Updated 3 years ago
- ☆10Nov 2, 2016Updated 9 years ago
- Pseudo-localization tool for .NET☆16Updated this week
- Pre-print:☆11Oct 17, 2023Updated 2 years ago
- The simMixedDAG package enables simulation of "real life" datasets from DAGs☆13Oct 12, 2019Updated 6 years ago
- I will post updates on my Instagram @unkn0wn_bali. tufhub - a hacking framework with all kinds of bruteforce, info gather, dos attack,…☆13Nov 28, 2018Updated 7 years ago
- Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup☆21Sep 26, 2016Updated 9 years ago
- PubPeer Chrome browser extension☆14Feb 18, 2025Updated last year
- Python implementation of the Parsley language for extracting structured data from web pages☆92Oct 26, 2017Updated 8 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning☆121Apr 8, 2026Updated last month
- Mutation testing for R☆16Nov 11, 2024Updated last year
- Extract rich information from any text (urls, todos, etc)☆17Apr 28, 2026Updated last week
- Swedish data☆14Updated this week
- https://mimesniff.spec.whatwg.org/ implementation for Python☆13Jan 16, 2024Updated 2 years ago
- Small semi-manual annotated web news corpus in Swedish for CoreNLP NER. 4 categories, PER, ORG, LOC and MISC.☆12Jun 27, 2020Updated 5 years ago
- All of my bibliographic references☆16Jun 21, 2020Updated 5 years ago
- Complete Mechanical Turk API written in Python that uses the same names as the official documentation☆44Mar 10, 2017Updated 9 years ago
- API Wrapper for the mediacloud.org API☆16Aug 20, 2019Updated 6 years ago
- 🧰 Small (~0.2MB) set of tools to expand the PSL.☆15Oct 18, 2023Updated 2 years ago