Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.
☆16 Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the repositories listed below.
- ITALIC: An ITALian Intent Classification Dataset ☆14 Updated last year
- ☆310 Updated last year
- Interpretability for sequence generation models 🐛 🔍 ☆441 Updated last month
- The robust European language model benchmark. ☆129 Updated this week
- A Simple Bulk Labelling Tool ☆597 Updated 2 months ago
- A Scandinavian Benchmark for sentence embeddings ☆41 Updated 4 months ago
- Various speech datasets made available to the public ☆131 Updated 10 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆127 Updated 2 months ago
- The central repo for Creole-based NLU and NLG work ☆18 Updated 5 months ago
- ☆16 Updated 3 years ago
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹 ☆30 Updated last year
- A lightweight Python package for setting up adversarial robustness experiments and to compute robustness distributions. The package imple… ☆37 Updated last week
- just a bunch of useful embeddings for scikit-learn pipelines ☆517 Updated 2 weeks ago
- explainable Siamese sentence transformers ☆13 Updated last year
- ☆358 Updated last year
- ☆23 Updated last year
- Croissant is a high-level format for machine learning datasets that brings together four rich layers. ☆732 Updated last week
- Bicleaner fork that uses neural networks ☆39 Updated 4 months ago
- A merged version of multiple open-source German speech datasets. ☆33 Updated last year
- ☆135 Updated last week
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆162 Updated 4 months ago
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script ☆226 Updated last year
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish ☆13 Updated last year
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆92 Updated 8 months ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆111 Updated last year
- A Python package for benchmarking interpretability techniques on Transformers. ☆213 Updated last year
- A French sequence-to-sequence pretrained model ☆62 Updated 3 years ago
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ☆34 Updated 7 months ago
- ☆112 Updated 10 months ago
- HF's ML for Audio study group ☆198 Updated 2 years ago