List of libraries, tools and APIs for web scraping and data processing.
☆13 · Sep 17, 2015 · Updated 10 years ago
Alternatives and similar repositories for awesome-web-scraping
Users that are interested in awesome-web-scraping are comparing it to the libraries listed below.
- Python dict-like interface for merging dicts with add to set property · ☆14 · Nov 14, 2018 · Updated 7 years ago
- Find which links on a web page are pagination links · ☆29 · Jan 12, 2017 · Updated 9 years ago
- Feedbuffer buffers RSS and Atom syndication feeds, that is to say it caches new feed entries until the news aggregator requests them and … · ☆19 · Jul 2, 2016 · Updated 9 years ago
- scrapy-extras -- a collection of code samples and modules for the Scrapy framework. · ☆14 · Dec 14, 2020 · Updated 5 years ago
- A Scrapy pipeline to categorize items using MonkeyLearn · ☆38 · Apr 28, 2017 · Updated 8 years ago
- Shows how to encrypt data held in public space · ☆11 · Aug 11, 2017 · Updated 8 years ago
- How To Be a Programmer, edited · ☆12 · May 21, 2012 · Updated 13 years ago
- A collection of datasets from Skolverket · ☆11 · Sep 1, 2020 · Updated 5 years ago
- An awesome list of (large-scale) public datasets on the Internet. (On-going collection) · ☆24 · Feb 18, 2022 · Updated 4 years ago
- litrl browser and detectors · ☆10 · Oct 5, 2023 · Updated 2 years ago
- Templates for academic documents in Pandoc Markdown · ☆15 · Jan 31, 2019 · Updated 7 years ago
- A CLI for dealing with the features of ScrapingHub · ☆16 · Apr 20, 2021 · Updated 4 years ago
- csharp-functional provides a set of NuGet packages to drive your coding towards a functional approach as well as enabling Railway Oriente… · ☆11 · Jul 12, 2022 · Updated 3 years ago
- Introductory tutorial creating a narrative to the RStudio's tutorial and other documentation for newbies to R's wonderful package shiny. · ☆24 · Jan 27, 2015 · Updated 11 years ago
- Presentation for the NYU Data Lab December 2015 · ☆14 · Dec 2, 2015 · Updated 10 years ago
- Test Expectations of a Data Frame · ☆14 · Oct 21, 2019 · Updated 6 years ago
- A JSON API to tag a sentence with part of speech tags. Uses UDPipe, so support for hundreds of languages. · ☆14 · Dec 2, 2024 · Updated last year
- ☆10 · Nov 2, 2016 · Updated 9 years ago
- Pseudo-localization tool for .NET · ☆15 · Updated this week
- Joint estimation of sentiment and topics in textual data · ☆14 · Aug 9, 2023 · Updated 2 years ago
- Pre-print: · ☆11 · Oct 17, 2023 · Updated 2 years ago
- The simMixedDAG package enables simulation of "real life" datasets from DAGs · ☆13 · Oct 12, 2019 · Updated 6 years ago
- 🛠 Useful R functions for various things · ☆18 · Jul 4, 2019 · Updated 6 years ago
- R package to import articles from newspaper databases · ☆14 · Feb 29, 2024 · Updated 2 years ago
- PubPeer Chrome browser extension · ☆14 · Feb 18, 2025 · Updated last year
- A very simple mobile-friendly game that teaches CSS selectors. · ☆29 · Dec 20, 2022 · Updated 3 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. · ☆15 · Feb 28, 2025 · Updated last year
- Code for the AnecbotalNYT Twitter bot · ☆16 · Sep 23, 2017 · Updated 8 years ago
- Python implementation of the Parsley language for extracting structured data from web pages · ☆92 · Oct 26, 2017 · Updated 8 years ago
- Formasaurus tells you the type of an HTML form and its fields using machine learning · ☆121 · Apr 8, 2026 · Updated last week
- Classify Twitter accounts as institutional or ordinary users. · ☆12 · Nov 16, 2018 · Updated 7 years ago
- A GitHub Action that lints Python code with Flake8 then automatically creates pull request reviews if there are any violations. · ☆27 · Apr 20, 2022 · Updated 3 years ago
- An R package to gather, munge, and convert event datasets into temporal event-networks. · ☆11 · Mar 28, 2018 · Updated 8 years ago
- Swedish data · ☆14 · Dec 10, 2025 · Updated 4 months ago
- Autocomplete component for Blazor WebAssembly and Blazor Server · ☆15 · Nov 13, 2023 · Updated 2 years ago
- Generate lavaan syntax for RI-CLPM · ☆10 · Dec 11, 2020 · Updated 5 years ago
- Small semi-manual annotated web news corpus in Swedish for CoreNLP NER. 4 categories, PER, ORG, LOC and MISC. · ☆12 · Jun 27, 2020 · Updated 5 years ago
- Do things with words. Scale them, mostly. · ☆18 · May 9, 2021 · Updated 4 years ago
- All of my bibliographic references · ☆16 · Jun 21, 2020 · Updated 5 years ago