medialab / ural
A helper library full of URL-related heuristics.
☆70 Updated 2 months ago
Alternatives and similar repositories for ural
Users who are interested in ural are comparing it to the libraries listed below.
- Now included in rigour ☆151 Updated last week
- Web scraping Page Objects core library ☆101 Updated last month
- Extract text from HTML ☆134 Updated 5 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆136 Updated last week
- Alternative robots parser module for Python ☆18 Updated last month
- Extract networks of entities from journalistic reporting ☆48 Updated 2 years ago
- A webmining CLI tool & library for Python. ☆333 Updated last month
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆191 Updated 3 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data. ☆66 Updated last week
- Parse numbers written in natural language ☆122 Updated 9 months ago
- API client for Aleph, supports bulk entity and document upload. ☆28 Updated 9 months ago
- URL normalization for Python ☆96 Updated 3 months ago
- An automated, programming-free web scraper for interactive sites ☆111 Updated 2 years ago
- Python-based Wikidata framework for easy dataframe extraction ☆45 Updated last year
- How hard is it to get a list of all local news sites in the United States (LOL) ☆8 Updated 5 years ago
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆23 Updated 3 weeks ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated 8 months ago
- Ultimate Website Sitemap Parser ☆223 Updated last month
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆153 Updated 2 weeks ago
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as database. ☆19 Updated last year
- Lightweight web scraping toolkit for documents and structured data. ☆313 Updated last year
- A pure-Python robots.txt parser with support for modern conventions. ☆70 Updated 2 weeks ago
- Python functions for applied use of schema.org ☆38 Updated 3 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆142 Updated 7 months ago
- Effortless conversion between data formats like JSON, XML and CSV ☆120 Updated 3 years ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 Updated 3 years ago
- Page Object pattern for Scrapy ☆122 Updated last month
- Trying to generate name synonyms from Wikidata ☆32 Updated 5 years ago
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org ☆40 Updated 3 weeks ago
- A Python library for defining rule-based overrides on messy data ☆15 Updated 2 weeks ago