jerry2yu / ngrams
A C++ package for character or word ngram analysis. It uses a ternary search tree instead of a hash table for faster ngram frequency counting. Words are converted to unique IDs and encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible ngram package written in Perl.
☆20 Updated 9 years ago
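To illustrate the two ideas in the description, here is a minimal sketch of counting ngram frequencies in a ternary search tree and of encoding a word ID as base-256 digits. It is not taken from the ngrams package: the names `TernaryCounter`, `addCharNgrams`, and `encodeBase256` are hypothetical, and the variable-length big-endian byte layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical sketch: a ternary search tree mapping byte strings to counts.
// Each node holds one byte plus three children: smaller, equal (next byte), larger.
class TernaryCounter {
    struct Node {
        unsigned char ch;
        std::size_t count = 0;            // occurrences of the key ending at this node
        std::unique_ptr<Node> lo, eq, hi;
        explicit Node(unsigned char c) : ch(c) {}
    };
    std::unique_ptr<Node> root_;

public:
    // Increment the count for `key`; empty keys are ignored.
    void add(const std::string& key) {
        if (key.empty()) return;
        std::unique_ptr<Node>* slot = &root_;
        std::size_t i = 0;
        for (;;) {
            const unsigned char c = static_cast<unsigned char>(key[i]);
            if (!*slot) *slot = std::make_unique<Node>(c);
            Node* n = slot->get();
            if (c < n->ch)                slot = &n->lo;
            else if (c > n->ch)           slot = &n->hi;
            else if (i + 1 < key.size()) { slot = &n->eq; ++i; }
            else                         { ++n->count; return; }
        }
    }

    // Return how many times `key` was added (0 if never seen).
    std::size_t count(const std::string& key) const {
        if (key.empty()) return 0;
        const Node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            const unsigned char c = static_cast<unsigned char>(key[i]);
            if (c < n->ch)                n = n->lo.get();
            else if (c > n->ch)           n = n->hi.get();
            else if (i + 1 < key.size()) { n = n->eq.get(); ++i; }
            else                          return n->count;
        }
        return 0;
    }
};

// Add every character ngram of length n in `text` to the counter.
void addCharNgrams(TernaryCounter& t, const std::string& text, std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i + n <= text.size(); ++i) t.add(text.substr(i, n));
}

// Assumed encoding: a word ID written as big-endian base-256 digits,
// one byte per digit, using as few bytes as the value needs.
std::string encodeBase256(std::uint32_t id) {
    std::string out;
    do {
        out.insert(out.begin(), static_cast<char>(id & 0xFF));
        id >>= 8;
    } while (id != 0);
    return out;
}
```

Lookups compare one byte per node and need no hash computation. Keys that share a prefix also share nodes, which suits overlapping ngrams. For word ngrams built from several IDs, a fixed byte width per ID would keep concatenated keys unambiguous; the variable-width form above is shown only for compactness.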
Alternatives and similar repositories for ngrams:
Users that are interested in ngrams are comparing it to the libraries listed below
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding. ☆50 Updated 3 years ago
- C++ implementation of a part-of-speech (POS) tagger using the lookahead tagging algorithm. ☆12 Updated 5 years ago
- ☆16 Updated 10 years ago
- Tree-Structured, First- and Higher-Order Linear Chain, and Semi-Markov CRFs ☆45 Updated 5 years ago
- Context Encoders (ConEc) as a simple but powerful extension of the word2vec model for learning word embeddings ☆21 Updated 4 years ago
- An entity linking prototype, developed using the datasets from the TAC-KBP sub-task. ☆28 Updated 7 years ago
- Statistical discontinuous constituent parsing ☆11 Updated 7 years ago
- Entity Linking in Queries: Tasks and Evaluation ☆33 Updated last year
- Learned string similarity for entity names using optimal transport. ☆35 Updated 4 years ago
- CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ ☆21 Updated 6 years ago
- Several algorithms for converting dependency structures into constituency structures. ☆10 Updated 3 years ago
- Code and data related to "Efficient, Compositional, Order-Sensitive n-gram Embeddings" (EACL 2017) ☆14 Updated 7 years ago
- FIGMENT ☆15 Updated 5 years ago
- PyTorch port of BERT ML model ☆16 Updated 6 years ago
- Word Sense Induction with BERT MLM ☆28 Updated last year
- Fine-grained Entity Typing / Fine-grained Entity Classification ☆12 Updated 6 years ago
- Java code from the 2008 EMNLP paper "Bayesian Unsupervised Topic Segmentation" by Eisenstein and Barzilay ☆36 Updated 9 years ago
- ☆26 Updated 8 years ago
- Source code of bison ☆26 Updated 4 years ago
- Symmetrized word alignment models, based on mgizapp and GIZA++ ☆14 Updated 10 years ago
- Dynamic Entity Summarization (DynES) ☆20 Updated 5 years ago
- ☆15 Updated 4 years ago
- Risk Minimization Algorithms in Structured Prediction (JMLR 2016) ☆13 Updated 8 years ago
- Dependency-based Word Embeddings (Levy and Goldberg, 2014) with BZ2 compression support. ☆21 Updated 9 years ago
- Learning to Distinguish Hypernyms and Co-Hyponyms ☆18 Updated 10 years ago
- ☆14 Updated 5 years ago
- Entity Linking in Queries: Efficiency vs. Effectiveness ☆18 Updated 7 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- ☆9 Updated 4 years ago
- ☆21 Updated 8 years ago