jerry2yu / ngrams
A C++ package for character or word ngram analysis. It uses a ternary search tree instead of a hash table for faster ngram frequency counting. Words are converted to unique IDs and encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible ngram package written in Perl.
☆20Updated 9 years ago
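The description above mentions two techniques: counting ngram frequencies in a ternary search tree, and encoding word IDs as base-256 integers. The following is a minimal sketch of both, not the package's actual code. The names `NgramCounter`, `countCharNgrams`, and `encodeId` are hypothetical, and the fixed four-byte little-endian width in `encodeId` is an assumption made here so that concatenated word IDs stay unambiguous as keys.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical minimal ternary search tree that maps byte strings to counts.
// Each node holds one character and three children: lo (smaller character),
// eq (next character of the same key), and hi (larger character).
class NgramCounter {
public:
    // Increment the count for `key`, creating nodes as needed.
    void add(std::string_view key) {
        if (key.empty()) return;
        std::unique_ptr<Node>* link = &root_;
        std::size_t i = 0;
        for (;;) {
            if (!*link) *link = std::make_unique<Node>(key[i]);
            Node* n = link->get();
            if (key[i] < n->ch) link = &n->lo;
            else if (key[i] > n->ch) link = &n->hi;
            else if (i + 1 == key.size()) { ++n->count; return; }
            else { link = &n->eq; ++i; }
        }
    }

    // Return how many times `key` was added (0 if never seen).
    std::size_t count(std::string_view key) const {
        if (key.empty()) return 0;
        const Node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            if (key[i] < n->ch) n = n->lo.get();
            else if (key[i] > n->ch) n = n->hi.get();
            else if (i + 1 == key.size()) return n->count;
            else { n = n->eq.get(); ++i; }
        }
        return 0;
    }

private:
    struct Node {
        explicit Node(char c) : ch(c) {}
        char ch;
        std::size_t count = 0;  // occurrences of the key ending at this node
        std::unique_ptr<Node> lo, eq, hi;
    };
    std::unique_ptr<Node> root_;
};

// Slide a window of n characters over `text` and count every character ngram.
void countCharNgrams(std::string_view text, std::size_t n, NgramCounter& counter) {
    if (n == 0 || text.size() < n) return;
    for (std::size_t i = 0; i + n <= text.size(); ++i)
        counter.add(text.substr(i, n));
}

// Encode a word ID as four base-256 digits, least significant digit first.
// Assumption: fixed width, so IDs can be concatenated into a word-ngram key.
std::string encodeId(std::uint32_t id) {
    std::string out(4, '\0');
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(id & 0xFF);
        id >>= 8;
    }
    return out;
}
```

For character ngrams, calling `countCharNgrams("banana", 2, counter)` and then `counter.count("an")` looks up the bigram count. For word ngrams, one possible approach is to concatenate `encodeId` results for consecutive words and pass the resulting string to `add`. Shared prefixes are stored once in the tree, which is the main space advantage over keeping every ngram as a separate hash key.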
Alternatives and similar repositories for ngrams:
Users who are interested in ngrams are comparing it to the libraries listed below:
- Code and data related to "Efficient, Compositional, Order-Sensitive n-gram Embeddings" (EACL 2017)☆14Updated 8 years ago
- Code repo for the SIGIR '17 paper "Efficient Cost-Aware Cascade Ranking for Multi-Stage Retrieval"☆10Updated 2 years ago
- Deep learning model of machine translation using attentional and structural biases☆13Updated 7 years ago
- CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++☆21Updated 6 years ago
- Dynamic Entity Summarization (DynES)☆20Updated 5 years ago
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding.☆50Updated 3 years ago
- Robust Cross-lingual Embeddings from Parallel Sentences☆22Updated 4 years ago
- Entity Linking in Queries: Efficiency vs. Effectiveness☆18Updated 7 years ago
- Final parser submitted by ParisNLP team for CoNLL 2018 Shared Task on Multilingual Parsing☆11Updated 6 years ago
- Fine-grained Entity Typing / Fine-grained Entity Classification☆12Updated 7 years ago
- An example of DyNet autobatching for the NIPS "how to code a paper" workshop☆12Updated 7 years ago
- Named Entity Recognition (NER) models (neural and sparse) implemented based on package LibN3L☆19Updated 8 years ago
- Program used to project words that have vector representations. It helps to visualize how efficiently words are represented☆7Updated 9 years ago
- C++ implementation of a part-of-speech (POS) tagger using the lookahead tagging algorithm.☆12Updated 5 years ago
- Extractive and Compressive Neural Summarization Based on Summary State Representations (NAACL 2019)☆15Updated 4 years ago
- An example of how to use spaCy for extremely large files without running into memory issues☆36Updated 2 years ago
- Code for the paper "Latent Relation Language Models" at AAAI-20.☆41Updated 4 years ago
- Efficient Sentence Embedding via Semantic Subspace Analysis☆14Updated 5 years ago
- maximum inner product tree☆26Updated 12 years ago
- Context Encoders (ConEc) as a simple but powerful extension of the word2vec model for learning word embeddings☆21Updated 4 years ago
- C++ implementation of the Hellinger PCA for computing word embeddings.☆32Updated 8 years ago
- ☆31Updated 8 years ago
- Getting interpretable dimensions in word embedding spaces.☆14Updated last year
- Zero-Shot Open Entity Typing as Type-Compatible Grounding, EMNLP'18.☆42Updated 5 years ago
- Twpipe is a pipeline toolkit that parses raw tweets into universal dependencies.☆28Updated 6 years ago
- Cross-domain word representation learning☆10Updated 9 years ago
- Entity Linking in Queries: Tasks and Evaluation☆33Updated last year
- Tree-Structured, First- and Higher-Order Linear Chain, and Semi-Markov CRFs☆45Updated 5 years ago
- Hacky implementation of ppjoin by Chuan Xia et al.☆20Updated 10 years ago
- Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval"☆24Updated 8 years ago