gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆18Updated 6 months ago
Alternatives and similar repositories for newscorpus:
Users who are interested in newscorpus are comparing it to the libraries listed below:
- ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data☆15Updated last year
- ☆22Updated last year
- German Parliamentary Corpus (GerParCor)☆23Updated 2 weeks ago
- OpenRefine Reconciliation Framework in Python and Flask☆19Updated last year
- ☆43Updated 5 months ago
- DHLAB is a library of Python modules for accessing text and pictures at the National Library of Norway.☆22Updated this week
- Python-based Wikidata framework for easy dataframe extraction☆41Updated last year
- Extract networks of entities from journalistic reporting☆47Updated last year
- A deep learning architecture for reference mining from literature in the arts and humanities.☆15Updated 5 years ago
- Heritage Connector: Transforming text into data to extract meaning and make connections☆22Updated last year
- A collection of resources for navigating the Digital Humanities job market☆11Updated 3 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆38Updated 2 years ago
- ☆32Updated 2 years ago
- Citation Classification using hybrid neural network model for Wikipedia References☆28Updated 2 years ago
- A collaborative collection of datasets commonly used in "Follow the Money" investigations with a European scope☆13Updated 7 months ago
- Neo4j powered web application for multimedia collections: bring graph-based exploration and crowd-based indexation.☆39Updated 5 years ago
- Repository for the book Among Digitized Manuscripts by L.W. Cornelis van Lit (Leiden: Brill, 2020)☆21Updated 4 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions☆19Updated last year
- Python package for harvesting records from OAI-PMH provider(s).☆62Updated 2 years ago
- Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)☆22Updated 9 months ago
- Code repository for whatisdigitalhumanities.com☆32Updated 2 years ago
- Adding links to full text in Wikipedia references☆37Updated last year
- Data from offenesparlament.de☆14Updated 7 years ago
- A JavaScript viewer for geospatial linked data☆15Updated 2 years ago
- Process, enhance and evaluate multiple OCR outputs.☆22Updated 3 months ago
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019☆25Updated 5 years ago
- Python package to reconcile DataFrames☆24Updated last year
- Repository for Kompakkt, the web-based multimodal 3D viewer and 3D annotation system.☆13Updated this week
- Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!☆17Updated this week
- A deep learning model for extracting references from text☆27Updated last year