Brand24-AI / mms_benchmark
The most extensive open, massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected according to strict quality criteria from over 350 datasets reported in the scientific literature, and covers 27 languages.
☆16 · Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the libraries listed below.
- Explainable Siamese sentence transformers (☆13, updated last year)
- A Scandinavian Benchmark for sentence embeddings (☆41, updated 5 months ago)
- ITALIC: An ITALian Intent Classification Dataset (☆14, updated last year)
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 (☆165, updated 5 months ago)
- Interpretability for sequence generation models 🐛 🔍 (☆444, updated last week)
- Lab tutorials for the MSc NLP course at the University of Groningen 🐮 (☆29, updated 8 months ago)
- The robust European language model benchmark (☆134, updated this week)
- animal2vec: A self-supervised transformer for rare-event raw audio input (☆26, updated last month)
- A Simple Bulk Labelling Tool (☆598, updated 3 months ago)
- The website for Danish Foundation Models, a project for training foundational Danish language models (☆75, updated 2 weeks ago)
- Efficiently find the best-suited language model (LM) for your NLP task (☆127, updated 3 months ago)
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish (☆13, updated last year)
- German Alpaca Dataset (Cleaned + Translated) (☆26, updated 2 years ago)
- Various speech datasets made available to the public (☆130, updated 10 months ago)
- HF's ML for Audio study group (☆198, updated 2 years ago)
- A merged version of multiple open-source German speech datasets (☆33, updated last year)
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… (☆283, updated 3 weeks ago)
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages (☆32, updated 7 months ago)
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹 (☆30, updated last year)
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models (☆52, updated last month)
- Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … (☆60, updated last week)
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script (☆228, updated last year)
- The central repo for Creole-based NLU and NLG work (☆18, updated 6 months ago)
- Adding random noise to a text dataset, and controlling very accurately the quality of the result (☆20, updated 2 months ago)
- The Gridspace-Stanford Harper Valley speech dataset. Created in support of CS224S (☆49, updated 4 years ago)
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages (☆112, updated last year)