Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.
☆16Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the repositories listed below.
- ITALIC: An ITALian Intent Classification Dataset☆14Updated last year
- ☆307Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆153Updated 3 months ago
- A Simple Bulk Labelling Tool☆596Updated last month
- ☆22Updated last year
- Multi-task modelling extensions for huggingface transformers☆13Updated last month
- ☆125Updated last week
- A merged version of multiple open-source German speech datasets.☆33Updated last year
- A Scandinavian Benchmark for sentence embeddings☆40Updated 3 months ago
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish☆13Updated last year
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹☆30Updated last year
- ☆359Updated last year
- Various speech datasets made available to the public☆128Updated 8 months ago
- Efficiently find the best-suited language model (LM) for your NLP task☆127Updated last month
- Universal Romanizer that can convert any unicode script to roman (latin) script☆220Updated last year
- A list of awesome open source projects in the machine learning field, whose developers are mainly based in Germany☆46Updated 11 months ago
- just a bunch of useful embeddings for scikit-learn pipelines☆516Updated 3 weeks ago
- Library for Textless Spoken Language Processing☆550Updated 2 years ago
- The robust European language model benchmark.☆120Updated last week
- The website for Danish Foundation Models, a project for training foundational Danish language models.☆74Updated last week
- A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.☆15Updated last year
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.☆86Updated 7 months ago
- A Hackable speech recognition library.☆25Updated 10 months ago
- Bicleaner fork that uses neural networks☆40Updated 2 months ago
- Repository containing the open source code of works published at the FBK MT unit.☆48Updated last month
- Transform datasets at scale. Optimize datasets for fast AI model training.☆534Updated this week
- Adding random noise to a text dataset, and controlling very accurately the quality of the result☆19Updated 3 weeks ago
- Place where folks can contribute to 🤗 community events☆425Updated last year
- 🇮🇹 Italian BERT and ELECTRA models (incl. evaluation)☆18Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode☆111Updated 3 years ago