⛏ a library for scraping unreliable pages
☆212 · Apr 13, 2026 · Updated last month
Alternatives and similar repositories for scrapelib
Users who are interested in scrapelib are comparing it to the libraries listed below.
- A modern Python library for writing maintainable web scrapers. ☆250 · Nov 22, 2025 · Updated 6 months ago
- Government-wide search and notification website. ☆49 · May 5, 2016 · Updated 10 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 · May 18, 2017 · Updated 9 years ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆30 · Feb 29, 2024 · Updated 2 years ago
- Coding space for the LegisLetters project. ☆11 · Jun 10, 2015 · Updated 10 years ago
- Python library with common functionality for writing web scrapers ☆102 · Jul 6, 2015 · Updated 10 years ago
- Fuzzy Categorical Distances ☆14 · Mar 31, 2020 · Updated 6 years ago
- Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future. ☆25 · Dec 15, 2020 · Updated 5 years ago
- ☆23 · Mar 7, 2015 · Updated 11 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 12 years ago
- source for Open States scrapers ☆899 · May 18, 2026 · Updated last week
- AI agent for enhancing datasets with information from the internet ☆21 · Nov 6, 2025 · Updated 6 months ago
- Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData. ☆33 · Dec 22, 2018 · Updated 7 years ago
- A Ruby gem that extracts press releases and statements by members of Congress. ☆70 · Dec 15, 2015 · Updated 10 years ago
- legacy backend for Open States ☆88 · Jan 31, 2020 · Updated 6 years ago
- Organizing and publishing the web domains of the US federal government ☆19 · Sep 2, 2018 · Updated 7 years ago
- A custom element for creating Leaflet maps ☆18 · Dec 3, 2021 · Updated 4 years ago
- Tools and lessons plans ☆19 · Mar 14, 2017 · Updated 9 years ago
- The Washington Post's app for creating admin foreign key autocompletion fields. ☆24 · May 22, 2013 · Updated 13 years ago
- A bash tool (script) to generate animated (gif) temporal progressions of land cover with inputs of lat, long, and start/end dates. Requir… ☆17 · Mar 25, 2015 · Updated 11 years ago
- A Python library that standardizes the names of U.S. states ☆25 · Mar 24, 2015 · Updated 11 years ago
- A build tool by and for the Los Angeles Times ☆30 · Oct 15, 2025 · Updated 7 months ago
- ☆25 · Jul 28, 2014 · Updated 11 years ago
- moxie ☆28 · Jan 6, 2016 · Updated 10 years ago
- 🎓 deprecated general purpose python data validator ☆236 · Feb 15, 2024 · Updated 2 years ago
- framework for scraping legislative/government data ☆90 · Nov 17, 2025 · Updated 6 months ago
- Tracking FOIA data across government agencies and departments ☆15 · Mar 6, 2017 · Updated 9 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 9 years ago
- The easiest way to run shell commands with Python. A python command line object mapper. ☆27 · Apr 13, 2026 · Updated last month
- Python script for matching a list of messy addresses against a gazetteer using dedupe. ☆64 · Mar 31, 2020 · Updated 6 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · May 2, 2015 · Updated 11 years ago
- A MCP to connect LLMs to the archives of The Guardian ☆20 · Jun 29, 2025 · Updated 11 months ago
- ☆25 · Mar 18, 2013 · Updated 13 years ago
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 10 years ago
- A complete agency API program. ☆12 · Apr 27, 2017 · Updated 9 years ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆22 · May 1, 2026 · Updated 3 weeks ago
- Parser for U.S. federal regulations and other regulatory information ☆43 · Mar 27, 2023 · Updated 3 years ago
- Deprecated! - See osm-tasking-manager2 ☆84 · Oct 17, 2017 · Updated 8 years ago