Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.
☆16Updated 2 years ago
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the repositories listed below.
- The robust European language model benchmark.☆142Updated this week
- The website for Danish Foundation Models, a project for training foundational Danish language models.☆76Updated last week
- A Scandinavian Benchmark for sentence embeddings☆44Updated last week
- ☆24Updated last year
- A Simple Bulk Labelling Tool☆598Updated 4 months ago
- Interpretability for sequence generation models 🐛 🔍☆449Updated last week
- just a bunch of useful embeddings for scikit-learn pipelines☆520Updated 2 months ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆175Updated 3 weeks ago
- ITALIC: An ITALian Intent Classification Dataset☆14Updated 2 years ago
- ☆318Updated last year
- ☆359Updated last year
- Active Learning for Text Classification in Python☆633Updated 2 weeks ago
- A Python package for benchmarking interpretability techniques on Transformers.☆214Updated last year
- Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.☆565Updated last year
- Efficiently find the best-suited language model (LM) for your NLP task☆132Updated 4 months ago
- 🤖 A PyTorch library of curated Transformer models and their composable components☆894Updated last year
- ☆118Updated last year
- animal2vec: A self-supervised transformer for rare-event raw audio input☆28Updated 3 weeks ago
- Late Interaction Models Training & Retrieval☆666Updated this week
- Library for Textless Spoken Language Processing☆554Updated 2 years ago
- German Alpaca Dataset (Cleaned + Translated)☆26Updated 2 years ago
- HF's ML for Audio study group☆199Updated 2 years ago
- A plotting tool that outputs Line Rider maps, so you can watch a man on a sled scoot down your loss curves. 🎿☆333Updated last year
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polis…☆36Updated 2 years ago
- Danish Data Science Community's guide to sustainable data science☆19Updated 3 years ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes☆214Updated 2 months ago
- ☆156Updated this week
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆53Updated 2 months ago
- A fast and lightweight python-based CTC beam search decoder for speech recognition.☆464Updated 2 years ago
- Bicleaner fork that uses neural networks☆40Updated 6 months ago