gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆19Updated 10 months ago
Alternatives and similar repositories for newscorpus
Users who are interested in newscorpus are comparing it to the libraries listed below.
- Extract networks of entities from journalistic reporting☆48Updated last year
- A Python library for defining rule-based overrides on messy data☆14Updated last month
- ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data☆23Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies☆17Updated last month
- WordWanderer – take your text for a walk☆12Updated 6 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆39Updated 3 years ago
- Jupyter notebook showcases using the Open Legal Data API☆20Updated 2 years ago
- ☆12Updated 2 years ago
- A Python database interface for eXist-db☆14Updated 5 months ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.☆24Updated last year
- Data from offenesparlament.de☆14Updated 7 years ago
- A maximum-strength name parser for record linkage.☆37Updated 3 weeks ago
- Named-Entity Recognition extension for OpenRefine☆28Updated 2 years ago
- How can we improve name matching in screening tools?☆12Updated 4 months ago
- Python based Wikidata framework for easy dataframe extraction☆44Updated last year
- A library that provides an ergonomic, DOM-like model for XML encoded text documents.☆17Updated 3 weeks ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near complete information about (1) which facts ar…☆11Updated 2 years ago
- Provide partial dates and retain the date precision through processing☆13Updated 2 years ago
- This repository makes available the Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016. Every …☆31Updated last year
- Text Mining and Topic Modeling Toolkit for Python with parallel processing power☆16Updated 2 years ago
- A helper library full of URL-related heuristics.☆69Updated 2 months ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.☆18Updated 9 months ago
- Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!☆16Updated this week
- Wikidata authority file mapping tool☆11Updated 6 years ago
- OpenRefine reconciler for Research Organization Registry☆13Updated last month
- 🔎 Finds fuzzy matches between datasets☆13Updated 4 months ago
- Python API for KB data-services☆19Updated 5 years ago
- An interactive visual tool for exploring the ideologies of political parties from up-to-date Wikidata, using SPARQL, D3js, and PixiJS☆16Updated 3 years ago
- Web interface for network analysis.☆21Updated 2 years ago
- OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export).☆17Updated 3 months ago