gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆18 Updated 8 months ago
Alternatives and similar repositories for newscorpus:
Users who are interested in newscorpus are comparing it to the libraries listed below.
- Python-based Wikidata framework for easy dataframe extraction ☆43 Updated last year
- 📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF ☆38 Updated 3 years ago
- ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data ☆15 Updated last year
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- A Python library for defining rule-based overrides on messy data ☆13 Updated 3 months ago
- Web interface for network analysis.