Elizafox / cld3
Python bindings for cld3
☆27 Updated last year
Alternatives and similar repositories for cld3:
Users who are interested in cld3 are comparing it to the libraries listed below:
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 Updated last year
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- ☆167 Updated 8 months ago
- Language detection extension for spaCy 2.0+ ☆112 Updated 6 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 Updated 9 years ago
- Python bindings to the Compact Language Detector ☆33 Updated 4 years ago
- Python bindings for libwapiti ☆66 Updated 5 years ago
- Python search module for fast approximate string matching ☆54 Updated 2 years ago
- Fast multi-keyword search engine for text strings ☆252 Updated 5 months ago
- A Python implementation of the Metaphone and Double Metaphone algorithms ☆81 Updated 11 months ago
- Language independent truecaser in Python. ☆160 Updated 3 years ago
- Yet another Python binding for fastText ☆226 Updated 6 years ago
- Fast supervised sentence boundary detection using the averaged perceptron ☆90 Updated 6 years ago
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆82 Updated 8 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago
- Labeled examples from wiki dumps in Python ☆67 Updated 8 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆202 Updated 2 years ago
- Convert word2vec vectors between binary and plain text format ☆135 Updated 5 years ago
- Efficient Sequence Labeling ☆24 Updated 10 months ago
- A fully customisable language detection pipeline for spaCy ☆92 Updated 5 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- C++ implementation of Generalised Brown clustering and python scripts for feature generation ☆41 Updated 8 years ago
- A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection ☆60 Updated 7 years ago
- Socially-Equitable Language Identification ☆78 Updated last year
- spaCy + UDPipe ☆160 Updated 2 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- Quickly extract multi-word phrases from a corpus ☆190 Updated 4 years ago
- Hunspell extension for spaCy 2.0. ☆94 Updated 6 months ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆160 Updated 4 years ago
- Named Entity Recognition based on dictionaries ☆242 Updated 5 years ago