google/cld3

google / cld3

☆886

Alternatives and similar repositories for cld3

Users that are interested in cld3 are comparing it to the libraries listed below.

bsolomon1124 / pycld3
Python3 bindings for the Compact Language Detector v3 (CLD3)
☆154 · Apr 19, 2026 · Updated 3 months ago
CLD2Owners / cld2
Compact Language Detector 2
☆900 · May 22, 2021 · Updated 5 years ago
aboSamoor / pycld2
☆179 · Mar 28, 2025 · Updated last year
saffsd / langid.py
Stand-alone language identification system
☆2,464 · Jan 1, 2020 · Updated 6 years ago
Mimino666 / langdetect
Port of Google's language-detection library to Python.
☆1,897 · Mar 3, 2025 · Updated last year
mbanon / fastspell
Targetted language identifier, based on FastText and Hunspell.
☆38 · Sep 4, 2025 · Updated 10 months ago
google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
☆11,983 · Updated this week
shuyo / language-detection
This is a language detection library implemented in plain Java. (aliases: language identification, language guessing)
☆769 · Feb 25, 2019 · Updated 7 years ago
google-research / multilingual-t5
☆1,294 · Dec 15, 2022 · Updated 3 years ago
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
dkpro / dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…
☆53 · Jun 12, 2020 · Updated 6 years ago
facebookresearch / fastText
Library for fast text representation and classification.
☆26,549 · Mar 22, 2024 · Updated 2 years ago
saffsd / langid.c
Pure C natural language identifier with support for 97 languages
☆27 · Sep 26, 2017 · Updated 8 years ago
aboSamoor / polyglot
Multilingual text (NLP) processing toolkit
☆2,364 · Nov 10, 2023 · Updated 2 years ago
facebookresearch / flores
Facebook Low Resource (FLoRes) MT Benchmark
☆771 · Nov 20, 2023 · Updated 2 years ago
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆309 · Updated this week
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
marian-nmt / marian
Fast Neural Machine Translation in C++
☆1,462 · Aug 25, 2023 · Updated 2 years ago
commoncrawl / language-detection-cld2
Natural language detection, Java bindings for CLD2
☆17 · Feb 26, 2026 · Updated 5 months ago
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,253 · Sep 30, 2025 · Updated 9 months ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,923 · Feb 14, 2023 · Updated 3 years ago
currentslab / fastlangid
fastlangid, the only language identification package that support cantonese (zh-yue), simplified (zh-hans) and traditional chinese (zh-ha…
☆43 · Dec 6, 2022 · Updated 3 years ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year
bitextor / bifixer
Tool to fix bitexts and tag near-duplicates for removal
☆35 · Sep 4, 2025 · Updated 10 months ago
mikemccand / chromium-compact-language-detector
Automatically exported from code.google.com/p/chromium-compact-language-detector
☆160 · Oct 1, 2020 · Updated 5 years ago
facebookresearch / MUSE
A library for Multilingual Unsupervised or Supervised word Embeddings
☆3,248 · Aug 31, 2022 · Updated 3 years ago
pemistahl / lingua-py
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
☆1,766 · Updated this week
rsennrich / subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
☆2,271 · Aug 7, 2024 · Updated last year
kpu / kenlm
KenLM: Faster and Smaller Language Model Queries
☆2,792 · Mar 30, 2025 · Updated last year
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
monologg / ko_lm_dataformat
A utility for storing and reading files for Korean LM training 💾
☆35 · Jul 18, 2026 · Updated last week
AU-DIS / LSTM_langid
Source code for the Apple reproduction
☆33 · Apr 23, 2021 · Updated 5 years ago
mjpost / sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
☆1,254 · Jul 17, 2026 · Updated last week
laurieburchell / open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
☆77 · Apr 1, 2025 · Updated last year
marcotcr / checklist
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
☆2,052 · Jan 9, 2024 · Updated 2 years ago
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,537 · Jul 8, 2026 · Updated 2 weeks ago
hainan-xv / zipporah
☆42 · Jul 17, 2018 · Updated 8 years ago
clab / fast_align
Simple, fast unsupervised word aligner
☆769 · Jul 19, 2022 · Updated 4 years ago
OpenNMT / Server
☆17 · Jan 2, 2017 · Updated 9 years ago