gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆19Updated 9 months ago
Alternatives and similar repositories for newscorpus:
Users who are interested in newscorpus are comparing it to the libraries listed below:
- ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data☆19Updated last year
- Extract networks of entities from journalistic reporting☆48Updated last year
- Python based Wikidata framework for easy dataframe extraction☆43Updated last year
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆38Updated 3 years ago
- Named-Entity Recognition extension for OpenRefine☆27Updated 2 years ago
- Citation Classification using hybrid neural network model for Wikipedia References☆28Updated 2 years ago
- A deep learning architecture for reference mining from literature in the arts and humanities.☆15Updated 5 years ago
- A deep learning model for extracting references from text☆28Updated last year
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.☆24Updated last year
- Adding links to full text in Wikipedia references☆37Updated last year
- Jupyter notebook showcases using the Open Legal Data API☆20Updated 2 years ago
- Linked SDMX☆18Updated 10 years ago
- OpenRefine reconciler for Research Organization Registry☆13Updated last week
- A Python database interface for eXist-db☆14Updated 3 months ago
- Example SPARQL queries, mostly for working with ZBW data sets☆16Updated 7 months ago
- Process, enhance, and evaluate multiple OCR outputs.☆22Updated 5 months ago
- A maximum-strength name parser for record linkage.☆36Updated 2 weeks ago
- Open database of scholarly journals☆10Updated 2 years ago
- Legal Reference Extraction☆29Updated 7 months ago
- Data from offenesparlament.de☆14Updated 7 years ago
- WordWanderer – take your text for a walk☆12Updated 5 years ago
- VIAF via Python☆11Updated 11 months ago
- Wikidata authority file mapping tool☆11Updated 6 years ago
- Named entity recognition for the legal domain☆42Updated 3 years ago
- Small Python library to validate persistent identifiers used in scholarly communication.☆29Updated 2 weeks ago
- Inspect a URL and estimate if it contains a news story☆39Updated 4 months ago
- Topic Modeling Workflow in Python☆16Updated 2 years ago