fujimotos / mbleven
An efficient algorithm for k-bounded (Damerau-)Levenshtein distance
☆17 · Updated 5 years ago
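Below is a minimal Python sketch of the k-bounded idea. It assumes the approach summarized above: because the distance only matters when it is at most a small k, you can enumerate every sequence of k edit operations (an "edit model") and test each one against the two strings in a single linear pass, which avoids filling a full dynamic-programming table. Equal characters are matched eagerly, since doing so never increases the Levenshtein distance. Each mismatch consumes the next operation of the model. The names `check_model` and `bounded_levenshtein` are hypothetical and are not mbleven's actual API. The sketch also covers plain Levenshtein distance only, without Damerau transpositions.

```python
from itertools import product
from typing import Sequence


def check_model(s1: str, s2: str, model: Sequence[str], k: int) -> int:
    """Cost of aligning s1 with s2 when mismatches are resolved in model order.

    'd' skips one char of s1, 'i' skips one char of s2, 'r' skips one of each.
    Returns k + 1 as soon as a mismatch occurs with no operations left.
    """
    i = j = cost = 0
    while i < len(s1) and j < len(s2):
        if s1[i] == s2[j]:
            # Matching equal characters eagerly never increases the distance.
            i += 1
            j += 1
            continue
        if cost == len(model):
            return k + 1  # out of operations: this model cannot stay within k
        op = model[cost]
        cost += 1
        if op == "d":
            i += 1
        elif op == "i":
            j += 1
        else:  # "r": substitute one character
            i += 1
            j += 1
    # Any leftover tail in either string must be deleted or inserted.
    return cost + (len(s1) - i) + (len(s2) - j)


def bounded_levenshtein(s1: str, s2: str, k: int) -> int:
    """Levenshtein distance between s1 and s2 if it is <= k, otherwise -1."""
    if abs(len(s1) - len(s2)) > k:
        return -1  # the length difference alone already exceeds the bound
    best = min(check_model(s1, s2, model, k) for model in product("dir", repeat=k))
    return best if best <= k else -1
```

With 3^k models and one linear pass per model, this only pays off for very small bounds, roughly k from 1 to 3. For example, `bounded_levenshtein("kitten", "sitting", 3)` returns 3, while the same call with a bound of 2 returns -1.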
Related projects:
- Successor to Annoy https://github.com/spotify/annoy ☆13 · Updated 8 years ago
- A C++ library implementing fast language model estimation using the 1-Sort algorithm. ☆17 · Updated last year
- Deep learning spelling patterns with a recurrent neural network ☆12 · Updated 7 years ago
- A tool for detecting sentence fragments. ☆7 · Updated 7 years ago
- Anytime Ranking for Impact-Ordered Indexes ☆12 · Updated 7 years ago
- ALMa (Active Learning Manager): keeps track of labeled and unlabeled data for active learning ☆42 · Updated 4 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆25 · Updated this week
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 · Updated 2 years ago
- brat rapid annotation tool (brat) - for all your textual annotation needs ☆10 · Updated 6 years ago
- Tokenize and clean strings in Python ☆13 · Updated 6 years ago
- Official repository of Quickscorer: a fast algorithm to rank documents with additive ensembles of regression trees. ☆18 · Updated 8 years ago
- A workflow system for Natural Language Processing. ☆21 · Updated 4 years ago
- Easy language identification of 380 languages ☆18 · Updated 4 years ago
- allennlp + streamlit demo ☆21 · Updated 4 years ago
- A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks. ☆15 · Updated 10 years ago
- framework for making streamcorpus data ☆11 · Updated 7 years ago
- An author identification system based on recur ☆20 · Updated 7 years ago
- Supporting example for "A Rust SentencePiece implementation" ☆18 · Updated 4 years ago
- Ranking Entity Types using the Web of Data ☆30 · Updated 7 years ago
- VW, Liblinear and StreamSVM compared on webspam ☆14 · Updated 9 years ago
- GSDMM: Short text clustering (Rust implementation) ☆23 · Updated last year
- ☆10 · Updated this week
- Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library) ☆25 · Updated 5 years ago
- A DeepWalk implementation for ontologies using NetworkX and Gensim ☆19 · Updated 7 years ago
- ☆29 · Updated 2 years ago
- Converter from UD-trees to BART representation ☆37 · Updated 6 months ago
- Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms ☆14 · Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆21 · Updated last year
- Official library of images for the SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019) ☆13 · Updated 5 years ago
- Editor of training sets for page segmentation and zone classification of scholarly PDFs ☆11 · Updated 7 years ago