zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆141 Updated 10 months ago
Related projects
Alternatives and complementary repositories for fasttext-langdetect
- Simply, faster, sentence-transformers ☆140 Updated 2 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆183 Updated last month
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks. ☆186 Updated 4 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆92 Updated 3 weeks ago
- Generalist and Lightweight Model for Text Classification ☆51 Updated last week
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆230 Updated 2 years ago
- Efficient few-shot learning with cross-encoders. ☆40 Updated 9 months ago
- multimodal document analysis ☆160 Updated 5 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 6 months ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆76 Updated last week
- A multilingual version of MS MARCO passage ranking dataset ☆142 Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆72 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 Updated 8 months ago
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆103 Updated 7 months ago
- ☆64 Updated 9 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆53 Updated 3 months ago
- A component orchestration engine ☆27 Updated 11 months ago
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio… ☆41 Updated 9 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆48 Updated 2 months ago
- Few-shot Named Entity Recognition ☆122 Updated 2 years ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆57 Updated 6 months ago
- Streamlit Named Entity Recognition (NER) annotation custom component ☆39 Updated 2 years ago
- A Python Search Engine for Humans 🥸 ☆186 Updated 7 months ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆118 Updated 7 months ago
- ☆147 Updated 5 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆105 Updated this week
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆209 Updated 5 months ago
- A spaCy wrapper for GliNER ☆91 Updated 4 months ago
- Python port of Boilerpipe library ☆85 Updated 3 months ago