gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆19Updated last year
Alternatives and similar repositories for newscorpus
Users who are interested in newscorpus are comparing it to the libraries listed below.
- ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data☆23Updated last year
- Extract networks of entities from journalistic reporting☆48Updated 2 years ago
- Python based Wikidata framework for easy dataframe extraction☆45Updated last year
- Web interface for network analysis.☆21Updated 2 years ago
- Named-Entity Recognition extension for OpenRefine☆29Updated 2 years ago
- A deep learning model for extracting references from text☆29Updated last year
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.☆24Updated last year
- An interactive visual tool for exploring the ideologies of political parties from up-to-date Wikidata, using SPARQL, D3.js, and PixiJS☆16Updated 3 years ago
- A helper library full of URL-related heuristics.☆70Updated last month
- A Mashup Interface for Text Analysis Operations☆13Updated 6 months ago
- Web-based data management, network analysis & visualisation environment.☆37Updated 3 months ago
- 📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF☆39Updated 3 years ago
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec.☆86Updated this week
- Data management service that brings continuous data validation to tabular data in your repository via a GitHub Action☆41Updated last year
- WordWanderer – take your text for a walk☆12Updated 6 years ago
- A maximum-strength name parser for record linkage.☆37Updated last month
- Link Wikidata items to large catalogs☆96Updated 4 months ago
- OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export).☆18Updated last month
- A Python library for defining rule-based overrides on messy data☆15Updated 3 months ago
- Ricgraph - Research in context graph☆29Updated last week
- How can we improve name matching in screening tools?☆12Updated 5 months ago
- A deep learning architecture for reference mining from literature in the arts and humanities.☆16Updated 5 years ago
- Citation Classification using hybrid neural network model for Wikipedia References☆30Updated 2 years ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts ar…☆11Updated 2 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies☆17Updated 3 months ago
- One downloader for many scientific data and code repositories! DOI Data☆75Updated this week
- A reconciliation service for OpenRefine serving data from a given CSV file.☆79Updated 5 months ago
- Platform for journalists to search, analyse, categorise and share unstructured data☆55Updated 2 weeks ago
- Linked SDMX☆17Updated 10 years ago
- A Python tool to search for and remove duplicated files in messy datasets☆16Updated 6 months ago