⛏ a library for scraping unreliable pages
☆212 · Apr 3, 2026 · Updated last week
Alternatives and similar repositories for scrapelib
Users that are interested in scrapelib are comparing it to the libraries listed below.
- A modern Python library for writing maintainable web scrapers. ☆250 · Nov 22, 2025 · Updated 4 months ago
- Government-wide search and notification website. ☆49 · May 5, 2016 · Updated 9 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 · May 18, 2017 · Updated 8 years ago
- Coding space for the LegisLetters project. ☆11 · Jun 10, 2015 · Updated 10 years ago
- Python library with common functionality for writing web scrapers ☆102 · Jul 6, 2015 · Updated 10 years ago
- Fuzzy Categorical Distances ☆14 · Mar 31, 2020 · Updated 6 years ago
- Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future. ☆25 · Dec 15, 2020 · Updated 5 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 12 years ago
- source for Open States scrapers ☆896 · Updated this week
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Oct 10, 2016 · Updated 9 years ago
- AI agent for enhancing datasets with information from the internet ☆21 · Nov 6, 2025 · Updated 5 months ago
- Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData. ☆33 · Dec 22, 2018 · Updated 7 years ago
- A Ruby gem that extracts press releases and statements by members of Congress. ☆70 · Dec 15, 2015 · Updated 10 years ago
- legacy backend for Open States ☆87 · Jan 31, 2020 · Updated 6 years ago
- A PHP library for reading, writing and manipulating CISAC Common Works Registration (CWR) v2.1R7 and v2.2 files ☆16 · Sep 6, 2018 · Updated 7 years ago
- A custom element for creating Leaflet maps ☆18 · Dec 3, 2021 · Updated 4 years ago
- Tools and lessons plans ☆20 · Mar 14, 2017 · Updated 9 years ago
- The Washington Post's app for creating admin foreign key autocompletion fields. ☆24 · May 22, 2013 · Updated 12 years ago
- A bash tool (script) to generate animated (gif) temporal progressions of land cover with inputs of lat, long, and start/end dates. Requir… ☆17 · Mar 25, 2015 · Updated 11 years ago
- A Python library that standardizes the names of U.S. states ☆25 · Mar 24, 2015 · Updated 11 years ago
- ☆25 · Jul 28, 2014 · Updated 11 years ago
- moxie ☆28 · Jan 6, 2016 · Updated 10 years ago
- 🎓 deprecated general purpose python data validator ☆236 · Feb 15, 2024 · Updated 2 years ago
- framework for scraping legislative/government data ☆90 · Nov 17, 2025 · Updated 4 months ago
- Tracking FOIA data across government agencies and departments ☆15 · Mar 6, 2017 · Updated 9 years ago
- webapp for unglue.it - A Free Ebook Foundation program ☆18 · Apr 9, 2026 · Updated last week
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 9 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · May 2, 2015 · Updated 10 years ago
- 🗂 A simple wrapper around the Google Sheets API for converting the contents of a Google Sheet into a tabular or key-value data structure… ☆23 · Feb 3, 2023 · Updated 3 years ago
- A MCP to connect LLMs to the archives of The Guardian ☆19 · Jun 29, 2025 · Updated 9 months ago
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 10 years ago
- A complete agency API program. ☆12 · Apr 27, 2017 · Updated 8 years ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆21 · Updated this week
- Parser for U.S. federal regulations and other regulatory information ☆43 · Mar 27, 2023 · Updated 3 years ago
- Deprecated! - See osm-tasking-manager2 ☆84 · Oct 17, 2017 · Updated 8 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platform including Wordpress, … ☆21 · Oct 21, 2021 · Updated 4 years ago
- Linked Data explorer and SPARQL endpoint ☆23 · Dec 15, 2021 · Updated 4 years ago
- Make workflow for downloading Census geodata and joining it to survey data ☆37 · Dec 6, 2021 · Updated 4 years ago