davidsbatista / lexicons
Dictionaries of names, surnames, acronyms and their extensions, stop-words, etc., which I gathered for different experiments.
★28 · Updated 8 years ago
Alternatives and similar repositories for lexicons
Users who are interested in lexicons compare it to the repositories listed below.
- A simple neural truecaser written in PyTorch and AllenNLP. ★33 · Updated last year
- Neural Sentential Paraphrase Generation to Augment Chatbot Training Dataset ★21 · Updated 2 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection ★73 · Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ★33 · Updated 5 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ★36 · Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ★86 · Updated 4 years ago
- Keras implementation of ontology-aware token embeddings ★49 · Updated 6 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state of … ★61 · Updated 4 years ago
- Hierarchical word clustering, following "Brown clustering" (Brown et al., 1992) ★70 · Updated 10 years ago
- A way to do annotations for NER. TALEN: Tool for Annotation of Low-resource ENtities ★117 · Updated last month
- ★48 · Updated 2 years ago
- Visualize word embeddings of a vocabulary in TensorBoard, including the neighbors ★46 · Updated 8 years ago
- Expletives vomiting library... ★13 · Updated 8 years ago
- ★31 · Updated 8 years ago
- Code and data used in named entity transliteration experiments ★57 · Updated 7 years ago
- Build a dialog dataset from online books in many languages ★76 · Updated 2 years ago
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ★103 · Updated last year
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ★48 · Updated 4 years ago
- Clinical spelling correction with word and character n-gram embeddings. ★74 · Updated 3 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ★31 · Updated 5 years ago
- This repo contains code and dataset for the Opinosis Summarization Framework ★51 · Updated 5 years ago
- ★34 · Updated 4 years ago
- Code for pre-training CharacterBERT models (as well as BERT models). ★34 · Updated 3 years ago
- Efficient-Sentence-Embedding-using-Discrete-Cosine-Transform ★17 · Updated 5 years ago
- OpenNeuroSpell contains parts of NeuroSpell (http://neurospell.com/en.php) released as open-source. More code will be published as soon a… ★20 · Updated 9 months ago
- A collection of English tweets annotated in Universal Dependencies. ★39 · Updated 3 years ago
- Simple rule-based named entity recognition ★42 · Updated 3 years ago
- Language modeling scripts based on TensorFlow ★58 · Updated 6 years ago
- COMBO is a jointly trained tagger, lemmatizer and dependency parser. ★35 · Updated 2 years ago
- Multi-lingual Text Processing ★96 · Updated 6 years ago