⛏ a library for scraping unreliable pages
☆212 · Feb 20, 2026 · Updated last month
Alternatives and similar repositories for scrapelib
Users that are interested in scrapelib are comparing it to the libraries listed below.
- Government-wide search and notification website. ☆50 · May 5, 2016 · Updated 9 years ago
- Coding space for the LegisLetters project. ☆11 · Jun 10, 2015 · Updated 10 years ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆30 · Feb 29, 2024 · Updated 2 years ago
- Python library with common functionality for writing web scrapers ☆102 · Jul 6, 2015 · Updated 10 years ago
- Fuzzy Categorical Distances ☆14 · Mar 31, 2020 · Updated 5 years ago
- ☆23 · Mar 7, 2015 · Updated 11 years ago
- MySQL/PostgreSQL schema migrations made easy ☆16 · May 9, 2023 · Updated 2 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 11 years ago
- source for Open States scrapers ☆895 · Updated this week
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Oct 10, 2016 · Updated 9 years ago
- AI agent for enhancing datasets with information from the internet ☆21 · Nov 6, 2025 · Updated 4 months ago
- Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData. ☆33 · Dec 22, 2018 · Updated 7 years ago
- A Ruby gem that extracts press releases and statements by members of Congress. ☆70 · Dec 15, 2015 · Updated 10 years ago
- legacy backend for Open States ☆87 · Jan 31, 2020 · Updated 6 years ago
- Organizing and publishing the web domains of the US federal government ☆19 · Sep 2, 2018 · Updated 7 years ago
- A custom element for creating Leaflet maps ☆18 · Dec 3, 2021 · Updated 4 years ago
- Tools and lessons plans ☆20 · Mar 14, 2017 · Updated 9 years ago
- A bash tool (script) to generate animated (gif) temporal progressions of land cover with inputs of lat, long, and start/end dates. Requir… ☆17 · Mar 25, 2015 · Updated 11 years ago
- A Python library that standardizes the names of U.S. states ☆25 · Mar 24, 2015 · Updated 11 years ago
- A build tool by and for the Los Angeles Times ☆29 · Oct 15, 2025 · Updated 5 months ago
- ☆25 · Jul 28, 2014 · Updated 11 years ago
- moxie ☆28 · Jan 6, 2016 · Updated 10 years ago
- framework for scraping legislative/government data ☆89 · Nov 17, 2025 · Updated 4 months ago
- Tracking FOIA data across government agencies and departments ☆15 · Mar 6, 2017 · Updated 9 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 9 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · May 2, 2015 · Updated 10 years ago
- 🗂 A simple wrapper around the Google Sheets API for converting the contents of a Google Sheet into a tabular or key-value data structure… ☆23 · Feb 3, 2023 · Updated 3 years ago
- A MCP to connect LLMs to the archives of The Guardian ☆19 · Jun 29, 2025 · Updated 8 months ago
- ☆25 · Mar 18, 2013 · Updated 13 years ago
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 10 years ago
- A complete agency API program. ☆12 · Apr 27, 2017 · Updated 8 years ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆21 · Mar 5, 2026 · Updated 3 weeks ago
- Deprecated! - See osm-tasking-manager2 ☆84 · Oct 17, 2017 · Updated 8 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platform including Wordpress, … ☆21 · Oct 21, 2021 · Updated 4 years ago
- Web scraping Page Objects core library ☆105 · Mar 10, 2026 · Updated 2 weeks ago
- Linked Data explorer and SPARQL endpoint ☆23 · Dec 15, 2021 · Updated 4 years ago
- Make workflow for downloading Census geodata and joining it to survey data ☆37 · Dec 6, 2021 · Updated 4 years ago
- A financial disclosure data extraction tool. ☆21 · Aug 2, 2023 · Updated 2 years ago