Skylion007 / OpenWebTextCorpusLinks
☆23 Updated last year
Alternatives and similar repositories for OpenWebTextCorpus
Users who are interested in OpenWebTextCorpus compare it to the libraries listed below.
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- Python SDK for the TextRazor Text Analytics API ☆20 Updated 2 years ago
- Raw Wikipedia counts for entity linking ☆19 Updated 8 years ago
- spaCy pipeline component for adding text readability meta data to Doc objects. ☆56 Updated 6 years ago
- Experiments to help discussion on Wikipedia talk pages ☆67 Updated this week
- A web application for tagging and retrieval of arguments in text ☆29 Updated 2 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 Updated 13 years ago
- Wikidata embedding ☆51 Updated last year
- A Named-Entity Recogniser based on Grobid. ☆54 Updated 8 months ago
- A collection of simple tutorials for using Fonduer ☆100 Updated 5 years ago
- Language detection extension for spaCy 2.0+ ☆114 Updated 6 years ago
- A temporal ordering system for events and time expressions in written text. ☆42 Updated 3 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆33 Updated 6 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆118 Updated 6 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆171 Updated 4 years ago
- Natural Language Data Augmentation Tool for Conversational Systems ☆115 Updated 3 years ago
- A Dependency Parser for Tweets ☆78 Updated 6 years ago
- Temporal Expression Recognition and Normalisation in Python ☆77 Updated 10 years ago
- Disambiguation of Semantic Resources - Full framework ☆30 Updated 9 years ago
- An unsupervised compound splitter ☆42 Updated 6 years ago
- ☆13 Updated 8 years ago
- ☆70 Updated 3 years ago
- AmbiverseNLU: A Natural Language Understanding suite by Max Planck Institute for Informatics ☆213 Updated 2 years ago
- Running Prodigy for a team of annotators ☆53 Updated 5 years ago
- ADEL is a robust and efficient entity linking framework that is adaptive to text genres and language, entity types for the classification… ☆19 Updated 6 years ago
- 💙 Emoji handling and meta data for spaCy with custom extension attributes ☆183 Updated 2 years ago
- Socially-Equitable Language Identification ☆78 Updated 2 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. ☆155 Updated last year
- A clean and easy interface for performing nearest-neighbor lookups ☆50 Updated 6 years ago
- Implementation of a simple frame identification approach (SimpleFrameId) described in the paper "Out-of-domain FrameNet Semantic Role Lab… ☆15 Updated 8 years ago