zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆157 · Updated last year
Alternatives and similar repositories for fasttext-langdetect
Users that are interested in fasttext-langdetect are comparing it to the libraries listed below
- 🔬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆138 · Updated 3 weeks ago
- Simply, faster, sentence-transformers ☆143 · Updated 9 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆207 · Updated last month
- Python API for https://vespa.ai, the open big data serving engine ☆127 · Updated this week
- Efficient few-shot learning with cross-encoders. ☆53 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 · Updated last year
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆78 · Updated 3 weeks ago
- A multilingual version of MS MARCO passage ranking dataset ☆145 · Updated last year
- Targeted language identifier, based on FastText and Hunspell. ☆35 · Updated 4 months ago
- Generalist and Lightweight Model for Text Classification ☆133 · Updated last week
- ☆171 · Updated 2 months ago
- A Python Search Engine for Humans 🥸 ☆222 · Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆79 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆328 · Updated 2 weeks ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆184 · Updated 5 months ago
- The pipeline for the OSCAR corpus ☆169 · Updated last year
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆216 · Updated 5 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆137 · Updated last month
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆109 · Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 · Updated this week
- Completion After Prompt Probability. Make your LLM make a choice ☆79 · Updated 7 months ago
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio… ☆44 · Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Updated 2 months ago
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆188 · Updated 10 months ago
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … ☆334 · Updated last year
- German Alpaca Dataset (Cleaned + Translated) ☆25 · Updated 2 years ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆181 · Updated 9 months ago