currentslab / fastlangid
fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified Chinese (zh-hans), and Traditional Chinese (zh-hant)
☆41 Updated 2 years ago
Alternatives and similar repositories for fastlangid
Users interested in fastlangid are comparing it to the libraries listed below.
- Tool to fix bitexts and tag near-duplicates for removal ☆33 Updated last month
- A python true casing utility that restores case information for texts ☆89 Updated 2 years ago
- ☆174 Updated 7 months ago
- A simple neural truecaser written in pytorch and allennlp. ☆33 Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 Updated 2 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 Updated last year
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated last month
- Lightning Fast Language Prediction 🚀 ☆167 Updated 2 months ago
- A python module for word inflections designed for use with spaCy. ☆93 Updated 5 years ago
- Automatic extraction of edited sentences from text edition histories. ☆83 Updated 3 years ago
- Language detection using Spacy and Fasttext ☆57 Updated last year
- ☆30 Updated 3 years ago
- BERT models for many languages created from Wikipedia texts ☆33 Updated 5 years ago
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP ☆14 Updated 4 years ago
- Sentence transformers models for SpaCy ☆107 Updated 2 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆75 Updated 7 months ago
- zero-vocab or low-vocab embeddings ☆18 Updated 3 years ago
- ☆15 Updated 6 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆110 Updated last month
- ☆57 Updated 3 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆64 Updated 2 years ago
- List of corpora annotated for coreference for different languages ☆17 Updated last year
- Rust-based Python wrapper for duckling library in Haskell ☆25 Updated 4 years ago
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 Updated 2 years ago
- NTREX -- News Test References for MT Evaluation ☆85 Updated last year
- Language independent truecaser in Python. ☆160 Updated 4 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 Updated last year