jerry2yu / ngrams
A C++ package for character or word n-gram analysis. It uses a ternary search tree instead of a hash table for faster n-gram frequency counting. Words are converted to unique IDs, which are then encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible n-gram package in Perl.
☆ 20 · Updated 9 years ago
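The description names three mechanisms: mapping words to unique IDs, encoding those IDs as base-256 integers, and counting n-gram frequencies in a ternary search tree. The following is a minimal C++ sketch of how they could fit together. It is not the package's actual code. The names `encodeId`, `TernarySearchTree`, and `WordNgramCounter`, the length-prefixed byte layout, whitespace tokenization, and the use of `std::unordered_map` for the word-to-ID table are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed layout: one length byte, then the ID's base-256 digits, most significant first.
std::string encodeId(std::uint32_t id) {
    std::string digits;
    do {
        digits.insert(digits.begin(), static_cast<char>(id & 0xFF));
        id >>= 8;
    } while (id != 0);
    return std::string(1, static_cast<char>(digits.size())) + digits;
}

// Hypothetical byte-level ternary search tree; count > 0 only where a key ends.
class TernarySearchTree {
    struct Node {
        unsigned char ch;
        std::size_t count = 0;
        std::unique_ptr<Node> lo, eq, hi;  // smaller byte, next byte of key, larger byte
        explicit Node(unsigned char c) : ch(c) {}
    };
    std::unique_ptr<Node> root_;
public:
    // Adds one occurrence of key and returns its new frequency.
    std::size_t increment(const std::string& key) {
        if (key.empty()) return 0;
        std::unique_ptr<Node>* link = &root_;
        std::size_t i = 0;
        while (true) {
            const auto c = static_cast<unsigned char>(key[i]);
            if (!*link) *link = std::make_unique<Node>(c);  // create the missing node
            Node* node = link->get();
            if (c < node->ch) link = &node->lo;
            else if (c > node->ch) link = &node->hi;
            else if (i + 1 < key.size()) { link = &node->eq; ++i; }
            else return ++node->count;
        }
    }
    // Returns how many times key was added (0 if never).
    std::size_t count(const std::string& key) const {
        if (key.empty()) return 0;
        const Node* node = root_.get();
        std::size_t i = 0;
        while (node) {
            const auto c = static_cast<unsigned char>(key[i]);
            if (c < node->ch) node = node->lo.get();
            else if (c > node->ch) node = node->hi.get();
            else if (i + 1 < key.size()) { node = node->eq.get(); ++i; }
            else return node->count;
        }
        return 0;
    }
};

// Hypothetical word n-gram counter built on the two pieces above.
class WordNgramCounter {
    std::unordered_map<std::string, std::uint32_t> ids_;  // assumption: any word->ID table works
    TernarySearchTree tree_;
    std::size_t n_;
    std::uint32_t idOf(const std::string& word) {
        // emplace is a no-op for known words, so each word keeps the ID from its first appearance.
        return ids_.emplace(word, static_cast<std::uint32_t>(ids_.size())).first->second;
    }
public:
    explicit WordNgramCounter(std::size_t n) : n_(n) { assert(n >= 1); }
    // Counts every run of n consecutive whitespace-separated words in text.
    void addText(const std::string& text) {
        std::istringstream in(text);
        std::vector<std::string> codes;
        std::string word;
        while (in >> word) codes.push_back(encodeId(idOf(word)));
        for (std::size_t i = 0; i + n_ <= codes.size(); ++i) {
            std::string key;  // n-gram key = concatenation of the encoded word IDs
            for (std::size_t j = i; j < i + n_; ++j) key += codes[j];
            tree_.increment(key);
        }
    }
    // Frequency of the given n-gram; 0 if its length differs from n or any word is unknown.
    std::size_t frequency(const std::vector<std::string>& words) const {
        if (words.size() != n_) return 0;
        std::string key;
        for (const auto& w : words) {
            auto it = ids_.find(w);
            if (it == ids_.end()) return 0;
            key += encodeId(it->second);
        }
        return tree_.count(key);
    }
};
```

For example, after `WordNgramCounter bigrams(2);` and `bigrams.addText("the cat sat on the mat the cat ran");`, the call `bigrams.frequency({"the", "cat"})` returns 2. The length byte in this sketch keeps concatenated IDs unambiguous: without it, the bytes of two short IDs could be misread as one longer ID. Because n-grams that share leading words share a path down the tree's `eq` links, common prefixes are stored only once.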
Related projects
Alternatives and complementary repositories for ngrams
- CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ ☆ 21 · Updated 6 years ago
- ROUGE summarization evaluation metric, enhanced with use of Word Embeddings ☆ 22 · Updated 6 years ago
- Dynamic Entity Summarization (DynES) ☆ 21 · Updated 5 years ago
- ☆ 21 · Updated 7 years ago
- Final parser submitted by the ParisNLP team for the CoNLL 2018 Shared Task on Multilingual Parsing ☆ 12 · Updated 5 years ago
- Training scripts for paper Miceli Barone et al. 2017 "Deep Architectures for Neural Machine Translation" ☆ 11 · Updated 7 years ago
- Tree-Structured, First- and Higher-Order Linear Chain, and Semi-Markov CRFs ☆ 44 · Updated 5 years ago
- Context Encoders (ConEc) as a simple but powerful extension of the word2vec model for learning word embeddings ☆ 20 · Updated 4 years ago
- Symmetrized word alignment models, based on mgizapp and GIZA++ ☆ 15 · Updated 10 years ago
- Experiment with document similarity via Matt Kusner's WMD (Word Mover's Distance) paper ☆ 25 · Updated 8 years ago
- Dependency-based Word Embeddings (Levy and Goldberg, 2014) with BZ2 compression support. ☆ 21 · Updated 8 years ago
- The dataset and statistical analysis code released with the submission of EMNLP 2017 paper "Why We Need New Evaluation Metrics for NLG" ☆ 19 · Updated 3 years ago
- Frame-Semantic and PropBank Semantic Role Labeling with Syntactic Scaffolding. ☆ 50 · Updated 3 years ago
- Extractors whose input is a chunked sentence. Includes Relnoun, Nesty, and a Scala interface for ReVerb. ☆ 28 · Updated 7 years ago
- ☆ 21 · Updated 6 years ago
- Resources for the Tutorial on "Utilizing Knowledge Bases in Text-centric Information Retrieval" ☆ 24 · Updated 8 years ago
- Visualize constituent and dependency parses as PDF or image formats, through GraphViz. ☆ 31 · Updated 3 years ago
- OxLM: Oxford Neural Language Modelling Toolkit ☆ 39 · Updated 9 years ago
- MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation o… ☆ 22 · Updated 7 years ago
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆ 41 · Updated 8 years ago
- Programme used to project words that have vector representations. It helps visualize how efficiently words are represented ☆ 8 · Updated 9 years ago
- A simple Python wrapper for the ClearNLP constituents-to-dependencies converter ☆ 10 · Updated 9 years ago
- ☆ 25 · Updated last year
- Contains the main implementation of programs for the paper: Reproducing and learning new algebraic operations on word embeddings using ge… ☆ 12 · Updated 7 years ago
- Generalized Language Modeling toolkit ☆ 51 · Updated 2 years ago
- Simple Structured Perceptron tagger in Python ☆ 10 · Updated 7 years ago
- Code and data related to "Efficient, Compositional, Order-Sensitive n-gram Embeddings" (EACL 2017) ☆ 14 · Updated 7 years ago
- A re-implementation of redpony/cdec's tokenize-anything.pl script in python ☆ 8 · Updated 8 years ago
- A fork of Ronan Collobert's SENNA deep-learning-based NLP tools ☆ 43 · Updated 11 years ago
- This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett ☆ 21 · Updated 5 years ago