⛏ a library for scraping unreliable pages
☆212 · Apr 13, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for scrapelib
Users who are interested in scrapelib are comparing it to the libraries listed below.
- A modern Python library for writing maintainable web scrapers. ☆250 · Nov 22, 2025 · Updated 5 months ago
- Government-wide search and notification website. ☆49 · May 5, 2016 · Updated 10 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 · May 18, 2017 · Updated 8 years ago
- Coding space for the LegisLetters project. ☆11 · Jun 10, 2015 · Updated 10 years ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆30 · Feb 29, 2024 · Updated 2 years ago
- Python library with common functionality for writing web scrapers ☆102 · Jul 6, 2015 · Updated 10 years ago
- Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future. ☆25 · Dec 15, 2020 · Updated 5 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 12 years ago
- source for Open States scrapers ☆898 · Updated this week
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Oct 10, 2016 · Updated 9 years ago
- AI agent for enhancing datasets with information from the internet ☆21 · Nov 6, 2025 · Updated 6 months ago
- Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData. ☆33 · Dec 22, 2018 · Updated 7 years ago
- legacy backend for Open States ☆87 · Jan 31, 2020 · Updated 6 years ago
- Organizing and publishing the web domains of the US federal government ☆19 · Sep 2, 2018 · Updated 7 years ago
- A PHP library for reading, writing and manipulating CISAC Common Works Registration (CWR) v2.1R7 and v2.2 files ☆16 · Sep 6, 2018 · Updated 7 years ago
- A custom element for creating Leaflet maps ☆18 · Dec 3, 2021 · Updated 4 years ago
- Tools and lesson plans ☆19 · Mar 14, 2017 · Updated 9 years ago
- The Washington Post's app for creating admin foreign key autocompletion fields. ☆24 · May 22, 2013 · Updated 12 years ago
- A Python library that standardizes the names of U.S. states ☆25 · Mar 24, 2015 · Updated 11 years ago
- A build tool by and for the Los Angeles Times ☆30 · Oct 15, 2025 · Updated 6 months ago
- ☆25 · Jul 28, 2014 · Updated 11 years ago
- moxie ☆28 · Jan 6, 2016 · Updated 10 years ago
- 🎓 deprecated general-purpose Python data validator ☆236 · Feb 15, 2024 · Updated 2 years ago
- framework for scraping legislative/government data ☆90 · Nov 17, 2025 · Updated 5 months ago
- Tracking FOIA data across government agencies and departments ☆15 · Mar 6, 2017 · Updated 9 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 9 years ago
- The easiest way to run shell commands with Python. A Python command-line object mapper. ☆27 · Apr 13, 2026 · Updated 3 weeks ago
- Python script for matching a list of messy addresses against a gazetteer using dedupe. ☆64 · Mar 31, 2020 · Updated 6 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · May 2, 2015 · Updated 11 years ago
- ☆25 · Mar 18, 2013 · Updated 13 years ago
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 10 years ago
- A complete agency API program. ☆12 · Apr 27, 2017 · Updated 9 years ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆21 · Updated this week
- Parser for U.S. federal regulations and other regulatory information ☆43 · Mar 27, 2023 · Updated 3 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, … ☆21 · Oct 21, 2021 · Updated 4 years ago
- Make workflow for downloading Census geodata and joining it to survey data ☆37 · Dec 6, 2021 · Updated 4 years ago
- A financial disclosure data extraction tool. ☆21 · Aug 2, 2023 · Updated 2 years ago
- The Poor Man's Web Components ☆14 · Oct 31, 2016 · Updated 9 years ago