roy-ht / editdistance
Fast implementation of the edit distance (Levenshtein distance)
☆661 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for editdistance
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,264 · Updated 3 years ago
- Python port of Moses tokenizer, truecaser and normalizer ☆488 · Updated 5 months ago
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆809 · Updated 3 months ago
- Spellchecking library for Python ☆601 · Updated 5 months ago
- pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance. ☆243 · Updated 6 months ago
- Python module (C extension and plain Python) implementing Aho-Corasick algorithm ☆952 · Updated 8 months ago
- Find parts of long text or data, allowing for some changes/typos. ☆312 · Updated 3 months ago
- Fast, efficiently stored Trie for Python. Uses libdatrie. ☆531 · Updated 9 months ago
- A Python binding for CRFsuite ☆771 · Updated last month
- Fast BPE ☆656 · Updated 5 months ago
- scikit-learn inspired API for CRFsuite ☆426 · Updated last year
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,047 · Updated last month
- Python library implementing a trie data structure. ☆816 · Updated 3 years ago
- Python, Java implementation of TS-SS called from "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document… ☆297 · Updated 5 years ago
- Fast, DB Backed pretrained word embeddings for natural language processing. ☆223 · Updated last year
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆365 · Updated last year
- Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark ☆442 · Updated 8 years ago
- Weighted Levenshtein library ☆105 · Updated last year
- Byte Pair Encoding for Python! ☆223 · Updated 2 years ago
- Python Set subclass that supports searching by ngram similarity ☆120 · Updated 3 years ago
- A mutable, self-balancing interval tree. Queries may be by point, by range overlap, or by range containment. ☆638 · Updated 3 months ago
- A list of Neural MT implementations ☆359 · Updated 2 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆382 · Updated 2 years ago
- CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆648 · Updated 5 months ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆185 · Updated 4 years ago
- terashuf shuffles multi-terabyte text files using limited memory ☆205 · Updated last year
- 🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle) ☆435 · Updated 4 months ago
- Scikit-learn style model finetuning for NLP ☆703 · Updated this week
- Package for evaluating word embeddings ☆437 · Updated 3 years ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ☆2,197 · Updated 3 months ago