Brand24-AI / mms_benchmark
The most extensive open, massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature on the basis of strict quality criteria, and covers 27 languages.
☆16 · Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark compare it to the repositories listed below.
- ITALIC: An ITALian Intent Classification Dataset · ☆14 · Updated last year
- explainable Siamese sentence transformers · ☆12 · Updated last year
- A Simple Bulk Labelling Tool · ☆594 · Updated 2 weeks ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ☆147 · Updated 2 months ago
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. · ☆84 · Updated 6 months ago
- A Scandinavian Benchmark for sentence embeddings · ☆40 · Updated 2 months ago
- The robust European language model benchmark. · ☆115 · Updated this week
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish · ☆13 · Updated last year
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹 · ☆30 · Updated last year
- just a bunch of useful embeddings for scikit-learn pipelines · ☆503 · Updated 4 months ago
- Efficiently find the best-suited language model (LM) for your NLP task · ☆125 · Updated 2 weeks ago
- 🇮🇹 Italian BERT and ELECTRA models (incl. evaluation) · ☆18 · Updated 2 years ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. · ☆797 · Updated 2 weeks ago
- Interpretability for sequence generation models 🐛 🔍 · ☆432 · Updated 3 months ago
- A lightweight, local-first, and free experiment tracking Python library built on top of 🤗 Datasets and Spaces. · ☆626 · Updated this week
- Knowledge distillation of wav2vec2 (from huggingface) · ☆9 · Updated 4 years ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. · ☆335 · Updated 7 months ago
- Official implementation of QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data · ☆30 · Updated 3 weeks ago
- Library for Textless Spoken Language Processing · ☆549 · Updated last year
- A list of awesome open-source projects in the machine learning field whose developers are mainly based in Germany · ☆45 · Updated 11 months ago
- A French sequence-to-sequence pretrained model · ☆62 · Updated 2 years ago
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… · ☆282 · Updated 6 months ago
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script · ☆214 · Updated last year
- A merged version of multiple open-source German speech datasets. · ☆32 · Updated last year
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… · ☆66 · Updated last month