zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆145 · Updated last year
Alternatives and similar repositories for fasttext-langdetect:
Users who are interested in fasttext-langdetect are comparing it to the libraries listed below.
- Simply, faster, sentence-transformers ☆141 · Updated 5 months ago
- Efficient few-shot learning with cross-encoders. ☆44 · Updated 11 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆113 · Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆182 · Updated 3 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated 10 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆234 · Updated last week
- Generalist and Lightweight Model for Text Classification ☆59 · Updated last week
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 · Updated 8 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆111 · Updated 2 months ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 · Updated 9 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆76 · Updated last year
- Few-shot Named Entity Recognition ☆122 · Updated 2 years ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 · Updated last week
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆120 · Updated 9 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆121 · Updated 3 weeks ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆236 · Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 · Updated 2 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · Updated 8 months ago
- A multilingual version of MS MARCO passage ranking dataset ☆143 · Updated last year
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆133 · Updated 3 weeks ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 · Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 · Updated 8 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 · Updated last year
- The pipeline for the OSCAR corpus ☆165 · Updated last year
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆173 · Updated 5 months ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ☆173 · Updated 3 weeks ago
- A component orchestration engine ☆28 · Updated last year