jermp / tongrams_estimation
A C++ library implementing fast language model estimation using the 1-Sort algorithm.
☆17Updated 2 years ago
Alternatives and similar repositories for tongrams_estimation
Users who are interested in tongrams_estimation are comparing it to the libraries listed below.
- A C++ library providing fast language model queries in compressed space.☆132Updated 2 years ago
- Utilities for manipulating finite state transducers with the OpenFst library.☆32Updated 8 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Updated 3 years ago
- A database of number names for 186 languages, locales, and scripts☆67Updated 2 years ago
- A Translation Task using TurboTransformers☆11Updated 5 years ago
- Compute the most likely permutation of a lattice given an LM☆10Updated 13 years ago
- MozoLM: A language model (LM) serving library☆47Updated 3 weeks ago
- Efficient and effective query auto-completion in C++.☆57Updated 2 years ago
- zero-vocab or low-vocab embeddings☆18Updated 3 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models☆30Updated 5 years ago
- Segmenting a given document using recursive xy-cut algorithm.☆12Updated 7 years ago
- The zhong [|] Chinese grammars☆15Updated 7 months ago
- Deep learning model of machine translation using attentional and structural biases☆13Updated 8 years ago
- A simple semantic search engine for scientific papers.☆28Updated 2 years ago
- An Efficient Language Model Using Double-Array Structures☆17Updated 5 years ago
- CS224S Course Project☆14Updated 11 years ago
- Fast SymSpell written in C++ and exposed to Python via pybind11☆44Updated 10 months ago
- GSDMM: Short text clustering (Rust implementation)☆23Updated 2 years ago
- Read-only unofficial mirror of OpenFst☆44Updated 3 years ago
- UniParse: A universal graph-based parsing toolkit☆10Updated 6 years ago
- Open-source implementation of Boostexter (Adaboost based classifier)☆57Updated 7 years ago
- This is the home directory for the speaker diarization module being developed for Heterogeneous News data in RedHen Labs as a GSoC project☆10Updated 10 years ago
- C++ implementation of a part-of-speech (POS) tagger using the lookahead tagging algorithm.☆12Updated 6 years ago
- Unicode Standard tokenization routines and orthography profile segmentation☆38Updated 10 months ago
- A Combinatory Categorial Grammar library.☆22Updated 12 years ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.☆51Updated 8 months ago
- BERT models for many languages created from Wikipedia texts☆33Updated 5 years ago
- ☆14Updated 10 years ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and…☆40Updated 3 months ago
- ☆28Updated 4 years ago