gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆17 · updated 2 months ago
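The description above implies a three-step pipeline: read an RSS feed, fetch and extract the text of each linked article, and store the result in SQLite. Below is a minimal standard-library sketch of that pipeline under stated assumptions. It is not newscorpus's actual code or API. The function names (`parse_rss`, `extract_text`, `scrape_feed`), the `articles` table schema, and the naive "keep only `<p>` text" extraction rule are all hypothetical.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements (a naive, assumed heuristic)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the paragraph.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def parse_rss(feed_xml):
    """Return (title, link) pairs for every RSS 2.0 <item> that has a link."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            pairs.append((title, link))
    return pairs


def extract_text(html):
    """Join the non-empty <p> texts of an HTML page with newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


def fetch_url(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape_feed(conn, feed_xml, fetch=fetch_url):
    """Fetch and store every new article in the feed; return how many were added."""
    # Hypothetical schema: the article URL is the primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    inserted = 0
    for title, link in parse_rss(feed_xml):
        # Skip URLs stored by an earlier run so they are not downloaded again.
        if conn.execute("SELECT 1 FROM articles WHERE url = ?", (link,)).fetchone():
            continue
        body = extract_text(fetch(link))
        conn.execute(
            "INSERT INTO articles (url, title, body) VALUES (?, ?, ?)",
            (link, title, body),
        )
        inserted += 1
    conn.commit()
    return inserted
```

The `fetch` parameter is injectable, so the pipeline can be exercised offline with a dictionary of fake pages. By default it downloads with `urllib.request`. Keying the table on the article URL makes repeated runs over the same feed idempotent. Real article extraction needs far more robust boilerplate removal than this paragraph heuristic, and Atom feeds (`<entry>` elements) are not handled here.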
Related projects:
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF · ☆38 · updated 2 years ago
- Extract networks of entities from journalistic reporting · ☆46 · updated last year
- Process, enhance and evaluate multiple OCR outputs · ☆20 · updated 5 years ago
- A deep learning model for extracting references from text · ☆24 · updated 10 months ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification · ☆22 · updated 7 months ago
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data · ☆13 · updated 8 months ago
- Extract structured data online · ☆10 · updated 9 months ago
- Adding links to full text in Wikipedia references · ☆37 · updated 8 months ago
- ☆31 · updated last year
- Named-Entity Recognition extension for OpenRefine · ☆24 · updated last year
- Python-based Wikidata framework for easy dataframe extraction · ☆39 · updated 9 months ago
- OpenRefine reconciler for the Research Organization Registry · ☆12 · updated last year
- ☆11 · updated 3 years ago
- Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800) · ☆19 · updated 4 months ago
- A deep learning architecture for reference mining from literature in the arts and humanities · ☆15 · updated 5 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions · ☆19 · updated last year
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts ar… · ☆10 · updated last year
- A PDF classifier ensemble with a REST API service · ☆23 · updated 3 years ago
- Topic Modeling Workflow in Python · ☆16 · updated last year
- Python package to reconcile DataFrames · ☆21 · updated last year
- Specifications of the reconciliation API · ☆31 · updated this week
- Citation classification using a hybrid neural network model for Wikipedia references · ☆26 · updated last year
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) · ☆24 · updated 2 years ago
- Fast, permanent and flexible patterns for sharing and computing on texts with metadata using Apache Arrow · ☆14 · updated 2 years ago
- This repository makes available the Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016. Every … · ☆29 · updated last year
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 · ☆25 · updated 5 years ago
- A Python library for defining rule-based overrides on messy data · ☆11 · updated 8 months ago
- UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND) · ☆25 · updated this week
- OpenRefine Reconciliation Framework in Python and Flask · ☆17 · updated last year
- Knowledge graph construction: Fast inserts into a Wikibase instance · ☆44 · updated 2 years ago