zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆147 Updated last year
Alternatives and similar repositories for fasttext-langdetect:
Users who are interested in fasttext-langdetect are comparing it to the libraries listed below:
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆188 Updated 4 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆118 Updated 3 months ago
- Simply, faster, sentence-transformers ☆141 Updated 6 months ago
- Efficient few-shot learning with cross-encoders. ☆49 Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 9 months ago
- Generalist and Lightweight Model for Text Classification ☆87 Updated last week
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆49 Updated last month
- A Python Search Engine for Humans 🥸 ☆204 Updated 10 months ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 Updated this week
- Python API for https://vespa.ai, the open big data serving engine ☆113 Updated this week
- Targeted language identifier, based on FastText and Hunspell. ☆34 Updated 2 weeks ago
- The pipeline for the OSCAR corpus ☆166 Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆65 Updated last week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆251 Updated 2 weeks ago
- Few-shot Named Entity Recognition ☆123 Updated 2 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆76 Updated last year
- A multilingual version of MS MARCO passage ranking dataset ☆143 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆153 Updated 3 months ago
- ☆83 Updated 2 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆70 Updated 10 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆150 Updated last year
- Triton backend for https://github.com/OpenNMT/CTranslate2 ☆34 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 11 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆241 Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 Updated 9 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆93 Updated 7 months ago
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆176 Updated 7 months ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆175 Updated last month
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆105 Updated 10 months ago
- Streamlit Named Entity Recognition (NER) annotation custom component ☆38 Updated 2 years ago