pemistahl / lingua-py
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
☆1,270 Updated this week
Alternatives and similar repositories for lingua-py:
Users who are interested in lingua-py are comparing it to the libraries listed below.
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆821 Updated this week
- Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/ ☆736 Updated last week
- python package to calculate readability statistics of a text object - paragraphs, sentences, articles. ☆1,250 Updated this week
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆837 Updated 6 months ago
- ☆810 Updated last year
- 80x faster and 95% accurate language identification with Fasttext ☆148 Updated last year
- Port of Google's language-detection library to Python. ☆1,765 Updated last week
- Spelling corrector in python ☆477 Updated 2 months ago
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆878 Updated last week
- 🧹 Python package for text cleaning ☆971 Updated last year
- Single-document unsupervised keyword extraction ☆1,690 Updated this week
- ✔️Contextual word checker for better suggestions (not actively maintained) ☆413 Updated last month
- A Collection of BM25 Algorithms in Python ☆1,118 Updated 5 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,049 Updated this week
- NeuSpell: A Neural Spelling Correction Toolkit ☆690 Updated last year
- 📚 Process PDFs, Word documents and more with spaCy ☆466 Updated this week
- Efficient few-shot learning with Sentence Transformers ☆2,408 Updated 2 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆150 Updated last year
- Fuzzy string matching, grouping, and evaluation. ☆752 Updated 3 weeks ago
- Heuristic based boilerplate removal tool ☆758 Updated 2 weeks ago
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,210 Updated 2 months ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆828 Updated 10 months ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆923 Updated 6 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆1,852 Updated 3 weeks ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆241 Updated 2 years ago
- Python Keyphrase Extraction module ☆1,578 Updated last year
- Article extraction benchmark: dataset and evaluation scripts ☆305 Updated 10 months ago
- MTEB: Massive Text Embedding Benchmark ☆2,278 Updated this week
- Python bindings to PDFium ☆542 Updated this week
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆4,015 Updated 3 weeks ago