gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆19 · Updated 11 months ago
Alternatives and similar repositories for newscorpus
Users who are interested in newscorpus are comparing it to the libraries listed below.
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 · Updated 2 months ago
- ETL pipeline, graphical explorer and general toolbox for investigations with Follow the Money data ☆23 · Updated last year
- A Python database interface for eXist-db ☆14 · Updated 6 months ago
- A Python library for defining rule-based overrides on messy data ☆16 · Updated 2 months ago
- Named-Entity Recognition extension for OpenRefine ☆28 · Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification. ☆24 · Updated last year
- Automated listing of GitHub repos with XML files containing a teiHeader. Find a project using TEI today! ☆16 · Updated this week
- OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export). ☆17 · Updated this week
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆18 · Updated 10 months ago
- Knowledge graph construction: Fast inserts into a Wikibase instance ☆45 · Updated 3 years ago
- OpenRefine reconciler for Research Organization Registry ☆13 · Updated 2 months ago
- Heritage Connector: Transforming text into data to extract meaning and make connections ☆24 · Updated 2 years ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions has near complete information about (1) which facts ar… ☆11 · Updated 2 years ago
- A deep learning architecture for reference mining from literature in the arts and humanities. ☆16 · Updated 5 years ago
- Linked SDMX ☆17 · Updated 10 years ago
- OpenRefine Reconciliation Framework in Python and Flask ☆21 · Updated 2 years ago
- Adding links to full text in Wikipedia references ☆37 · Updated last week
- Open database of scholarly journals ☆10 · Updated 2 years ago
- A library that provides an ergonomic, DOM-like model for XML encoded text documents. ☆17 · Updated last month
- Process, enhance and evaluate multiple OCR outputs. ☆22 · Updated 7 months ago
- A platform-agnostic, configurable, and brandable SPARQL editor and visualization interface. ☆13 · Updated 2 months ago
- ☆25 · Updated 2 years ago
- Citation Classification using hybrid neural network model for Wikipedia References ☆29 · Updated 2 years ago
- Python based Wikidata framework for easy dataframe extraction ☆44 · Updated last year
- How can we improve name matching in screening tools? ☆13 · Updated 4 months ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) ☆25 · Updated 2 years ago
- Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800) ☆22 · Updated last year
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 · Updated 2 years ago
- Python package to reconcile DataFrames ☆24 · Updated 2 years ago