The most accurate natural language detection library for Python, suitable for short text and mixed-language text
☆1,721 · Apr 23, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for lingua-py
Users who are interested in lingua-py are comparing it to the libraries listed below.
- Port of Google's language-detection library to Python. ☆1,883 · Mar 3, 2025 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆199 · Jun 6, 2025 · Updated 11 months ago
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆315 · May 6, 2026 · Updated 2 weeks ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,975 · Apr 27, 2026 · Updated 3 weeks ago
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆5,970 · Sep 12, 2025 · Updated 8 months ago
- Leveraging BERT and c-TF-IDF to create easily interpretable topics. ☆7,610 · May 13, 2026 · Updated last week
- Targeted language identifier, based on FastText and Hunspell. ☆38 · Sep 4, 2025 · Updated 8 months ago
- Efficient few-shot learning with Sentence Transformers ☆2,735 · Apr 17, 2026 · Updated last month
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆155 · Apr 19, 2026 · Updated last month
- Minimal keyword extraction with BERT ☆4,171 · May 13, 2026 · Updated last week
- A Python library for calculating a large variety of metrics from text ☆363 · May 5, 2026 · Updated 2 weeks ago
- 80x faster and 95% accurate language identification with Fasttext ☆168 · Jan 23, 2024 · Updated 2 years ago
- State-of-the-Art Embeddings, Retrieval, and Reranking ☆18,669 · May 12, 2026 · Updated last week
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 · May 24, 2024 · Updated last year
- [EMNLP 2023] 💬 Language Identification with Support for More Than 2000 Labels ☆203 · Apr 15, 2026 · Updated last month
- Fast inference engine for Transformer models ☆4,485 · May 12, 2026 · Updated last week
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆914 · Aug 20, 2024 · Updated last year
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,953 · Apr 21, 2026 · Updated 3 weeks ago
- Faster, modernized fork of the language identification tool langid.py ☆62 · Nov 22, 2024 · Updated last year
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,577 · May 12, 2026 · Updated last week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ☆3,392 · May 7, 2026 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,376 · Oct 27, 2025 · Updated 6 months ago
- Rapid fuzzy string matching in Python using various string metrics ☆3,907 · May 11, 2026 · Updated last week
- Top2Vec learns jointly embedded topic, document and word vectors. ☆3,105 · Nov 14, 2024 · Updated last year
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,289 · Apr 11, 2026 · Updated last month
- Structured Outputs ☆13,846 · May 13, 2026 · Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) ☆3,188 · May 13, 2026 · Updated last week
- OCR, Archive, Index and Search: Implementation agnostic OCR framework. ☆225 · Nov 3, 2023 · Updated 2 years ago
- ☆883 · May 24, 2023 · Updated 2 years ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆3,058 · May 6, 2026 · Updated 2 weeks ago
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ☆25,250 · Updated this week
- DSPy: The framework for programming—not prompting—language models ☆34,496 · Updated this week
- Lightning Fast Language Prediction 🚀 ☆168 · Aug 22, 2025 · Updated 8 months ago
- Active Learning for Text Classification in Python ☆643 · Updated this week
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ☆7,790 · Updated this week
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆341 · Apr 25, 2025 · Updated last year
- SpanMarker for Named Entity Recognition ☆473 · Apr 10, 2026 · Updated last month
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing ☆795 · Jul 22, 2025 · Updated 9 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆926 · May 4, 2026 · Updated 2 weeks ago