divkakwani / awesome-newspapers
A Directory of Online Newspaper Sources for 70+ Languages 
☆31 Updated 4 years ago
Alternatives and similar repositories for awesome-newspapers
Users who are interested in awesome-newspapers are comparing it to the repositories listed below.
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆165 Updated 2 years ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆32 Updated 7 months ago
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ☆15 Updated 2 months ago
- Extract dates from text ☆65 Updated 4 years ago
- 🖋 Resource and Tool for Writing System Identification -- LREC 2024 ☆20 Updated last year
- Language Tool style grammar handling with spaCy 2.0 ☆42 Updated 7 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 Updated 2 years ago
- A spaCy wrapper for DBpedia Spotlight ☆111 Updated 2 years ago
- Information extraction from English and German texts based on predicate logic ☆139 Updated 2 years ago
- ☆64 Updated 2 years ago
- Anonymization of legal cases (Fr) based on Flair embeddings ☆87 Updated 4 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 Updated 3 years ago
- Implementation of the ClausIE information extraction system for python+spacy ☆226 Updated 3 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆148 Updated 10 months ago
- Text tokenization and sentence segmentation (segtok v2) ☆206 Updated 3 years ago
- A compound word splitter for Python ☆49 Updated 4 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆152 Updated this week
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- PYthon Automated Term Extraction ☆316 Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 6 years ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆104 Updated last month
- A python module for English lemmatization and inflection. ☆272 Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆178 Updated 4 months ago
- LegalCrawler: A tool for automated scraping of English legal corpora ☆56 Updated 3 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆186 Updated this week
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali) ☆42 Updated 2 years ago
- Repository for "Towards Robust Named Entity Recognition for Historic German" ☆18 Updated 4 years ago
- Resources to go with the Indic NLP Library ☆76 Updated 3 years ago
- Meta-repository for the open-source version of the SUMMA Platform ☆16 Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆141 Updated 3 months ago