alephdata / memorious
Lightweight web scraping toolkit for documents and structured data.
☆310 · Updated last year
Alternatives and similar repositories for memorious:
Users who are interested in memorious compare it to the libraries listed below.
- Data model and processing tools for investigative entity data ☆223 · Updated last week
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆148 · Updated this week
- Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources ☆202 · Updated this week
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆189 · Updated 2 years ago
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec. ☆78 · Updated this week
- An open database of international sanctions data, persons of interest and politically exposed persons ☆530 · Updated this week
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data. ☆58 · Updated 2 weeks ago
- Extract networks of entities from journalistic reporting ☆47 · Updated last year
- The data journalism platform with built-in training ☆306 · Updated last month
- API client for Aleph, supports bulk entity and document upload. ☆28 · Updated 3 months ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆146 · Updated 3 weeks ago
- Website crawler with built-in exploration and control web interface ☆334 · Updated this week
- Web Content Retrieval for Humans™ ☆617 · Updated 2 years ago
- A modern Python library for writing maintainable web scrapers. ☆245 · Updated 6 months ago
- Searching large heterogeneous data dumps with Universal Sentence Encoder ☆62 · Updated 3 years ago
- A self-hosted search engine for documents. ☆608 · Updated this week
- Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online me… ☆283 · Updated last year
- A helper library full of URL-related heuristics. ☆64 · Updated 3 months ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆242 · Updated 2 months ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆186 · Updated 3 years ago
- Platform for journalists to search, analyse, categorise and share unstructured data ☆54 · Updated last week
- An automated, programming-free web scraper for interactive sites ☆108 · Updated last year
- DEPRECATED. Desktop graph visualization application ☆50 · Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 · Updated 5 years ago
- ⛏ a library for scraping unreliable pages ☆210 · Updated 5 months ago
- Easy extraction of keywords and engines from search engine results pages (SERPs). ☆90 · Updated 3 years ago
- legacy backend for Open States ☆87 · Updated 4 years ago
- Social Feed Manager user interface application. ☆155 · Updated 7 months ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated this week
- Aviation grade news article metadata extraction ☆36 · Updated last year