jamesturk / scrapelib
⛏ a library for scraping unreliable pages
☆212 · Jan 9, 2026 · Updated last month
Alternatives and similar repositories for scrapelib
Users who are interested in scrapelib are comparing it to the libraries listed below.
- A modern Python library for writing maintainable web scrapers. ☆249 · Nov 22, 2025 · Updated 2 months ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆30 · Feb 29, 2024 · Updated last year
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Apr 10, 2014 · Updated 11 years ago
- Python library with common functionality for writing web scrapers ☆102 · Jul 6, 2015 · Updated 10 years ago
- ☆23 · Mar 7, 2015 · Updated 10 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 · May 18, 2017 · Updated 8 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- source for Open States scrapers ☆889 · Updated this week
- legacy backend for Open States ☆87 · Jan 31, 2020 · Updated 6 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 8 years ago
- Government-wide search and notification website. ☆50 · May 5, 2016 · Updated 9 years ago
- Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future. ☆24 · Dec 15, 2020 · Updated 5 years ago
- Coding space for the LegisLetters project. ☆11 · Jun 10, 2015 · Updated 10 years ago
- webapp for unglue.it - A Free Ebook Foundation program ☆18 · Jul 23, 2025 · Updated 6 months ago
- framework for scraping legislative/government data ☆89 · Nov 17, 2025 · Updated 2 months ago
- A Ruby gem that extracts press releases and statements by members of Congress. ☆70 · Dec 15, 2015 · Updated 10 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platform including Wordpress, … ☆21 · Oct 21, 2021 · Updated 4 years ago
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Oct 10, 2016 · Updated 9 years ago
- Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData. ☆33 · Dec 22, 2018 · Updated 7 years ago
- [obsolete] Moved to https://github.com/rometools/rome ☆23 · Feb 20, 2016 · Updated 9 years ago
- AI agent for enhancing datasets with information from the internet ☆20 · Nov 6, 2025 · Updated 3 months ago
- A crawler, indexer, and query interface all in Python with distributed processing via Pyro4. ☆23 · Mar 16, 2012 · Updated 13 years ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆19 · Feb 1, 2026 · Updated last week
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 9 years ago
- NPR Visual's Carebot (deprecated, now in: https://github.com/thecarebot/carebot) ☆15 · Jul 8, 2015 · Updated 10 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · May 2, 2015 · Updated 10 years ago
- Find other OpenStreetMap mappers around you ☆26 · Feb 18, 2025 · Updated 11 months ago
- Tools and lessons plans ☆20 · Mar 14, 2017 · Updated 8 years ago
- A bash tool (script) to generate animated (gif) temporal progressions of land cover with inputs of lat, long, and start/end dates. Requir… ☆17 · Mar 25, 2015 · Updated 10 years ago
- Algorithmic summarizer for RSS/Atom Feeds, Web Urls and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.… ☆53 · Sep 4, 2016 · Updated 9 years ago
- Scrapers for US municipal governments. ☆105 · Nov 21, 2025 · Updated 2 months ago
- A dashboard with various internet-y widgets ☆18 · Sep 19, 2017 · Updated 8 years ago
- etl pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆25 · Jul 15, 2025 · Updated 6 months ago
- Parser for U.S. federal regulations and other regulatory information ☆40 · Mar 27, 2023 · Updated 2 years ago
- Lightweight web scraping toolkit for documents and structured data. ☆315 · Jan 10, 2024 · Updated 2 years ago
- Digitization information system build on top of Fedora repository ☆16 · Jan 15, 2019 · Updated 7 years ago
- Make workflow for downloading Census geodata and joining it to survey data ☆37 · Dec 6, 2021 · Updated 4 years ago
- Gates of Olympus: A multi-layer tower defense game in WebGL ☆15 · Jan 30, 2011 · Updated 15 years ago
- An international meta organization to foster news nerd collaboration and knowledge sharing ☆113 · May 17, 2019 · Updated 6 years ago