zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆151 · Updated last year
Alternatives and similar repositories for fasttext-langdetect:
Users who are interested in fasttext-langdetect are comparing it to the libraries listed below:
- Simply, faster, sentence-transformers ☆141 · Updated 7 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆191 · Updated 5 months ago
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆180 · Updated this week
- Python API for https://vespa.ai, the open big data serving engine ☆117 · Updated this week
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆124 · Updated 4 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆173 · Updated 7 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆153 · Updated 10 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆243 · Updated 2 years ago
- Python port of Boilerpipe library ☆86 · Updated 7 months ago
- Generalist and Lightweight Model for Text Classification ☆110 · Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆136 · Updated 3 months ago
- 💫 SpaCy wrapper for ConceptNet 💫 ☆90 · Updated last year
- A Python library to chunk/group your texts based on semantic similarity. ☆94 · Updated 8 months ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … ☆330 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆272 · Updated last week
- A multilingual version of MS MARCO passage ranking dataset ☆143 · Updated last year
- Pre-train Static Word Embeddings ☆51 · Updated 3 weeks ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆121 · Updated 11 months ago
- A Python Search Engine for Humans 🥸 ☆213 · Updated 11 months ago
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio… ☆44 · Updated last year
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆180 · Updated 8 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆124 · Updated 3 months ago
- Robust and fast topic models with sentence-transformers. ☆48 · Updated 2 weeks ago
- Efficient few-shot learning with cross-encoders. ☆50 · Updated last year
- Few-shot Named Entity Recognition ☆123 · Updated 3 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆107 · Updated 10 months ago
- KeyPhraseTransformer lets you quickly extract key phrases, topics, themes from your text data with T5 transformer | Keyphrase extraction… ☆104 · Updated 9 months ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆214 · Updated 2 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆151 · Updated last year
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 · Updated this week