jamesturk / scrapelib
⛏ a library for scraping unreliable pages
☆212 · Updated last week
Alternatives and similar repositories for scrapelib
Users who are interested in scrapelib are comparing it to the libraries listed below.
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆152 · Updated 6 months ago
- framework for scraping legislative/government data ☆86 · Updated 10 months ago
- A modern Python library for writing maintainable web scrapers. ☆249 · Updated 3 weeks ago
- Python library with common functionality for writing web scrapers ☆102 · Updated 10 years ago
- Parser and standardizer for politician, individual and organization names. ☆129 · Updated 8 years ago
- legacy backend for Open States ☆87 · Updated 5 years ago
- A Flask-based static site authoring tool. ☆164 · Updated 3 years ago
- Unified Python bindings for Sunlight APIs ☆66 · Updated 9 years ago
- ScraperWiki Python library for scraping and saving data ☆159 · Updated 2 years ago
- Opinionated template for Django projects on Python 3 and PostgreSQL ☆24 · Updated 7 years ago
- Now included in rigour ☆151 · Updated 2 months ago
- Utility library to turn country names into ISO two-letter codes ☆70 · Updated last month
- ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease. ☆100 · Updated 2 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs). ☆90 · Updated 3 years ago
- python library for extracting html microdata ☆166 · Updated 2 years ago
- A simple Python library/tool for pulling location information from unstructured text ☆186 · Updated 14 years ago
- PANDA: A Newsroom Data Appliance ☆205 · Updated 3 years ago
- Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py ☆390 · Updated 2 years ago
- Scrapes sites. Gets news. Eventually events. ☆87 · Updated 9 years ago
- Scrapy middleware which allows to crawl only new content ☆80 · Updated 2 years ago
- Next-gen web application for public finance data warehouses, formerly OpenSpending ☆57 · Updated 3 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 7 years ago
- geonamescache - a Python library for quick access to a subset of GeoNames data. ☆109 · Updated 11 months ago
- A Python module for accessing the Open States API ☆29 · Updated last year
- 🔎 Finds fuzzy matches between CSV files ☆190 · Updated 3 months ago
- Street address parser and formatter ☆91 · Updated 5 years ago
- Easily crowdsource the analysis of your documents ☆102 · Updated 7 years ago
- A deprecated Python wrapper for the DocumentCloud API ☆62 · Updated 4 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platform including Wordpress, … ☆21 · Updated 3 years ago