saffsd / langid.c
Pure C natural language identifier with support for 97 languages
☆25 · Updated 7 years ago
Alternatives and similar repositories for langid.c:
Users who are interested in langid.c are comparing it to the libraries listed below.
- Lightweight C++ translator for OpenNMT Torch models (deprecated) ☆79 · Updated 4 years ago
- Simhashing in C++ ☆132 · Updated 2 years ago
- C++ wrapper library for the NLP library spaCy ☆102 · Updated 2 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 · Updated 9 years ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆161 · Updated 4 years ago
- Bitextor generates translation memories from multilingual websites ☆292 · Updated 4 months ago
- C++ implementation for Neural Network-based NLP, such as LSTM machine translation! ☆87 · Updated 7 years ago
- Language Detection based on Chromium's Compact Language Detector library ☆106 · Updated 4 years ago
- A Multilingual and Multilevel Representation Learning Toolkit for NLP ☆116 · Updated 7 years ago
- A multilingual dependency parser based on linear programming relaxations. ☆115 · Updated 6 years ago
- Python bindings for cld3 ☆27 · Updated last year
- Fast and customizable text tokenization library with BPE and SentencePiece support ☆304 · Updated 7 months ago
- A simple and fast discriminative sequence labeling toolkit (http://wapiti.limsi.fr) ☆252 · Updated 2 years ago
- An efficient character based RNN ☆91 · Updated 6 years ago
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆126 · Updated 3 months ago
- Sentence aligner ☆112 · Updated 3 years ago
- Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) contex… ☆185 · Updated 4 years ago
- Parallelizing word2vec in shared and distributed memory ☆190 · Updated 2 years ago
- A-C implementation in "C". Tight-packed (interleaved) state-transition matrix -- as fast as it gets, as small as it gets. ☆147 · Updated 4 years ago
- MIT Language Modeling Toolkit ☆116 · Updated 5 years ago
- GIZA++ is a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model. This package al… ☆265 · Updated 2 years ago
- A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resuming training, and incremental training. ☆161 · Updated 3 years ago
- Corpus preprocessing ☆95 · Updated last year
- TheanoLM is a recurrent neural network language modeling tool implemented using Theano ☆81 · Updated 9 months ago
- Fast Word Clustering Software ☆78 · Updated last month
- Extension of the original word2vec using different architectures ☆210 · Updated 8 years ago
- Train bilingual embeddings as described in our NAACL 2015 workshop paper "Bilingual Word Representations with Monolingual Quality in Mind… ☆76 · Updated 5 years ago
- Code to train and use models from "Charagram: Embedding Words and Sentences via Character n-grams". ☆124 · Updated 8 years ago
- Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia. ☆132 · Updated 6 years ago
- BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser) See http://pypi.python.org/pypi/b… ☆226 · Updated 3 years ago