zafercavdar / fasttext-langdetect
80x faster and 95% accurate language identification with Fasttext
☆162 · Updated last year
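The idea of wrapping a fastText language-ID model in a one-call helper can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not fasttext-langdetect's actual source. It assumes a fastText-style `predict(text, k)` callable that returns labels such as `__label__en` together with their probabilities. The names `detect_language`, `PredictFn` and `fake_predict` are hypothetical, and the `{"lang": ..., "score": ...}` result shape is only modeled on the dictionary the library's README shows.

```python
from typing import Callable, Dict, Sequence, Tuple, Union

# Hypothetical signature of a fastText-style predictor: given a single line of
# text and k, return (labels, probabilities), e.g. (("__label__en",), [0.98]).
PredictFn = Callable[[str, int], Tuple[Sequence[str], Sequence[float]]]

# fastText prefixes class labels with this marker by default.
LABEL_PREFIX = "__label__"


def detect_language(text: str, predict: PredictFn) -> Dict[str, Union[str, float]]:
    """Return the top-1 language and its probability as {"lang", "score"}."""
    # fastText's predict() works on one line at a time, so fold newlines into spaces.
    single_line = text.replace("\n", " ")
    labels, probs = predict(single_line, 1)
    lang = labels[0]
    # Strip the "__label__" prefix so callers get a bare code such as "en".
    if lang.startswith(LABEL_PREFIX):
        lang = lang[len(LABEL_PREFIX):]
    return {"lang": lang, "score": float(probs[0])}


if __name__ == "__main__":
    # Stand-in predictor so the sketch runs without downloading a model.
    def fake_predict(line: str, k: int):
        return (("__label__tr",), [0.99])

    print(detect_language("Bugün hava\nçok güzel", fake_predict))
    # prints {'lang': 'tr', 'score': 0.99}
```

With the official fastText Python bindings, the bound method `fasttext.load_model("lid.176.bin").predict` accepts `(text, k)` positionally and could be passed in as `predict`. The model file has to be downloaded separately. Injecting the predictor keeps the parsing logic testable without a model.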
Alternatives and similar repositories for fasttext-langdetect
Users who are interested in fasttext-langdetect are comparing it to the libraries listed below.
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ☆148 · Updated 2 months ago
- Simply, faster, sentence-transformers · ☆143 · Updated last year
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language · ☆228 · Updated 4 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes · ☆211 · Updated 3 months ago
- Python API for https://vespa.ai, the open big data serving engine · ☆137 · Updated this week
- A Python Search Engine for Humans 🥸 · ☆231 · Updated last year
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder · ☆252 · Updated 2 years ago
- Efficient few-shot learning with cross-encoders · ☆56 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) · ☆154 · Updated 2 years ago
- The pipeline for the OSCAR corpus · ☆171 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models · ☆110 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency · ☆170 · Updated 2 months ago
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023) · ☆129 · Updated 11 months ago
- Targeted language identifier, based on FastText and Hunspell · ☆37 · Updated 6 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… · ☆188 · Updated 11 months ago
- Completion After Prompt Probability. Make your LLM make a choice · ☆80 · Updated 9 months ago
- A language detection software · ☆56 · Updated 7 years ago
- ☆106 · Updated 8 months ago
- A multilingual version of MS MARCO passage ranking dataset · ☆144 · Updated last year
- ☆367 · Updated last year
- Work with static vector models · ☆28 · Updated 4 months ago
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages · ☆194 · Updated last year
- A robust web archive analytics toolkit · ☆114 · Updated 5 months ago
- KeyPhraseTransformer lets you quickly extract key phrases, topics, themes from your text data with T5 transformer | Keyphrase extraction… · ☆104 · Updated last year
- ☆172 · Updated 5 months ago
- Generalist and Lightweight Model for Text Classification · ☆156 · Updated 2 months ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch · ☆78 · Updated 3 weeks ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) · ☆74 · Updated 4 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification · ☆179 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py · ☆56 · Updated 9 months ago