davidsbatista / lexicons
Dictionaries of names, surnames, acronyms and their extensions, stop-words, etc., which I gathered for different experiments.
☆28 Updated 8 years ago
Alternatives and similar repositories for lexicons
Users who are interested in lexicons compare it to the repositories listed below.
- A simple neural truecaser written in PyTorch and AllenNLP. ☆33 Updated last year
- BERT models for many languages created from Wikipedia texts ☆33 Updated 5 years ago
- Text processing library for sentiment analysis and related tasks ☆27 Updated 7 years ago
- Build a dialog dataset from online books in many languages ☆76 Updated 3 years ago
- A curated list of Natural Language Generation papers, tutorials, and blogs. ☆12 Updated 7 years ago
- Code and data used in named entity transliteration experiments ☆57 Updated 7 years ago
- ☆31 Updated 8 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- 📄 Neural Sentential Paraphrase Generation to Augment Chatbot Training Dataset ☆21 Updated 3 years ago
- Keras implementation of ontology aware token embeddings ☆49 Updated 7 years ago
- Corpus preprocessing ☆99 Updated last year
- Code to compute topic coherence for several topic cardinalities and aggregate scores across them ☆22 Updated 3 months ago
- COMBO is a jointly trained tagger, lemmatizer, and dependency parser. ☆35 Updated 2 years ago
- Hierarchical word clustering, following "Brown clustering" (Brown et al., 1992) ☆70 Updated 10 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state of … ☆62 Updated 5 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆81 Updated last year
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ☆103 Updated 2 years ago
- Open-source tools for morphological tagging, segmentation and stemming. ☆40 Updated 6 years ago
- ☆32 Updated 4 years ago
- Deep-learning based sentence auto-segmentation from unstructured text w/o punctuation ☆36 Updated 8 years ago
- C++ mosestokenizer ☆18 Updated last year
- A collection of English tweets annotated in Universal Dependencies. ☆39 Updated 4 years ago
- Brown clustering in Python ☆22 Updated 8 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ☆118 Updated 5 months ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 3 years ago
- High-coverage and high-precision lexica of terms annotated with emotion scores for English and Italian. ☆155 Updated last year
- A Benchmark Dataset for Understanding Disfluencies in Question Answering ☆64 Updated 4 years ago
- Numeric fused-head identification and resolution ☆33 Updated 6 years ago
- A Baseline for Multilingual Sentiment Analysis ☆36 Updated last year
- ☆48 Updated 7 years ago