jerry2yu / ngrams
A package in C++ for character or word ngram analysis. It uses a ternary search tree instead of a hash table for faster ngram frequency counting. Words are converted to unique IDs and encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible ngram package in Perl.
☆20 · Updated 10 years ago
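The description names two techniques: mapping words to integer IDs written as base-256 digits, and counting ngram keys in a ternary search tree. The C++ sketch below combines them to illustrate the idea; it is not the package's own code. Every name (`Vocabulary`, `encodeId`, `TernarySearchTree`, `countNgrams`) is hypothetical, the word-to-ID map uses `std::map` for simplicity, and the fixed width of 3 bytes per ID is an assumption chosen so that concatenated IDs form unambiguous keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch, not the ngrams package's API.
// Assumption: every word ID is written as exactly kBytesPerId base-256 digits,
// so a concatenation of IDs can be split back unambiguously.
constexpr int kBytesPerId = 3;  // up to 256^3 = 16,777,216 distinct words

// Assigns each distinct word a unique integer ID in order of first appearance.
class Vocabulary {
public:
    std::uint32_t id(const std::string& word) {
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        auto next = static_cast<std::uint32_t>(ids_.size());
        ids_.emplace(word, next);
        return next;
    }
private:
    std::map<std::string, std::uint32_t> ids_;
};

// Writes an ID as kBytesPerId big-endian base-256 digits (one byte per digit).
std::string encodeId(std::uint32_t id) {
    assert(id < (1u << (8 * kBytesPerId)) && "ID does not fit in kBytesPerId bytes");
    std::string out(kBytesPerId, '\0');
    for (int k = kBytesPerId - 1; k >= 0; --k) {
        out[k] = static_cast<char>(id % 256);
        id /= 256;
    }
    return out;
}

// Ternary search tree over byte strings; each node stores how many times
// the key ending at that node has been inserted.
class TernarySearchTree {
    struct Node {
        unsigned char ch;
        std::uint64_t count = 0;
        std::unique_ptr<Node> lo, eq, hi;
        explicit Node(unsigned char c) : ch(c) {}
    };
    std::unique_ptr<Node> root_;

public:
    // Increments the count for a non-empty key and returns the new count.
    std::uint64_t increment(const std::string& key) {
        if (key.empty()) return 0;
        std::unique_ptr<Node>* link = &root_;
        std::size_t i = 0;
        while (true) {
            auto c = static_cast<unsigned char>(key[i]);
            if (!*link) *link = std::make_unique<Node>(c);  // create missing node
            Node* n = link->get();
            if (c < n->ch) link = &n->lo;
            else if (c > n->ch) link = &n->hi;
            else if (i + 1 < key.size()) { link = &n->eq; ++i; }  // next byte
            else return ++n->count;  // last byte of the key
        }
    }

    // Returns how many times key was inserted (0 if never).
    std::uint64_t count(const std::string& key) const {
        if (key.empty()) return 0;
        const Node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            auto c = static_cast<unsigned char>(key[i]);
            if (c < n->ch) n = n->lo.get();
            else if (c > n->ch) n = n->hi.get();
            else if (i + 1 < key.size()) { n = n->eq.get(); ++i; }
            else return n->count;
        }
        return 0;
    }
};

// Counts every word ngram of length n: each ngram becomes the concatenation
// of its words' encoded IDs, and that byte string is counted in the tree.
void countNgrams(const std::vector<std::string>& words, std::size_t n,
                 Vocabulary& vocab, TernarySearchTree& tst) {
    if (n == 0 || words.size() < n) return;
    std::vector<std::string> codes;
    codes.reserve(words.size());
    for (const auto& w : words) codes.push_back(encodeId(vocab.id(w)));
    for (std::size_t i = 0; i + n <= codes.size(); ++i) {
        std::string key;
        for (std::size_t j = i; j < i + n; ++j) key += codes[j];
        tst.increment(key);
    }
}
```

For example, counting bigrams over the words `the cat sat on the cat` inserts five bigram occurrences (four distinct six-byte keys), and the key built from `the` and `cat` ends with a count of 2. Lookups walk the key one byte at a time instead of hashing it, and keys that start with the same word IDs share a path in the tree.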
Alternatives and similar repositories for ngrams
Users who are interested in ngrams are comparing it to the libraries listed below.
- Extractors whose input is a chunked sentence. Includes Relnoun, Nesty, and a Scala interface for ReVerb. ☆28 · Updated 8 years ago
- WordRank: Learning Word Embeddings via Robust Ranking ☆51 · Updated 7 years ago
- Utilities for manipulating finite state transducers with the OpenFst library. ☆32 · Updated 8 years ago
- Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) contex… ☆185 · Updated 5 years ago
- CS224S Course Project ☆14 · Updated 11 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 · Updated 2 years ago
- Tree-Structured, First- and Higher-Order Linear Chain, and Semi-Markov CRFs ☆45 · Updated 6 years ago
- Extractive and Compressive Neural Summarization Based on Summary State Representations (NAACL 2019) ☆16 · Updated 5 years ago
- Java code from the 2008 EMNLP paper "Bayesian Unsupervised Topic Segmentation" by Eisenstein and Barzilay ☆36 · Updated 10 years ago
- Hierarchical word clustering, following "Brown clustering" (Brown et al., 1992) ☆70 · Updated 10 years ago
- ☆21 · Updated 9 years ago
- SWIG Wrapper for the SRILM toolkit ☆35 · Updated 5 years ago
- Neural Reranking for Named Entity Recognition, accepted as regular paper at RANLP 2017 ☆23 · Updated 8 years ago
- Lightweight C++ translator for OpenNMT Torch models (deprecated) ☆81 · Updated 5 years ago
- Dynamic Entity Summarization (DynES) ☆20 · Updated 6 years ago
- C++ implementation of a part-of-speech (POS) tagger using the lookahead tagging algorithm. ☆12 · Updated 6 years ago
- Open-source tools for morphological tagging, segmentation and stemming. ☆40 · Updated 6 years ago
- Entity Linking in Queries: Efficiency vs. Effectiveness ☆18 · Updated 8 years ago
- Context Encoders (ConEc) as a simple but powerful extension of the word2vec model for learning word embeddings ☆20 · Updated 5 years ago
- Named Entity Recognition (NER) models (neural and sparse) implemented based on package LibN3L ☆19 · Updated 9 years ago
- Fast Word Clustering Software ☆79 · Updated 11 months ago
- Tools for working with the TREC CAR dataset. ☆36 · Updated 5 months ago
- Improving the effectiveness of Lucene's BM25 (and testing it using Yahoo! Answers and Stack Overflow collections) ☆16 · Updated 3 years ago
- Corpus preprocessing ☆99 · Updated last year
- ☆31 · Updated 8 years ago
- Neural topic modeling ☆29 · Updated 5 years ago
- Semantic embeddings of entities ☆66 · Updated 9 years ago
- ☆14 · Updated 9 years ago
- Deep learning model of machine translation using attentional and structural biases ☆13 · Updated 8 years ago
- Efficient and effective query auto-completion in C++. ☆57 · Updated 2 years ago