Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature based on strict quality criteria, and covers 27 languages.
☆16Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the repositories listed below.
- ITALIC: An ITALian Intent Classification Dataset☆14Updated last year
- A Scandinavian Benchmark for sentence embeddings☆41Updated 4 months ago
- explainable Siamese sentence transformers☆13Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆160Updated 3 months ago
- The robust European language model benchmark.☆125Updated this week
- ☆308Updated last year
- A Simple Bulk Labelling Tool☆597Updated last month
- The website for Danish Foundation Models, a project for training foundational Danish language models.☆74Updated last month
- ☆23Updated last year
- ☆110Updated 9 months ago
- Efficiently find the best-suited language model (LM) for your NLP task☆128Updated 2 months ago
- just a bunch of useful embeddings for scikit-learn pipelines☆518Updated last month
- Official implementation of "GPT or BERT: why not both?"☆59Updated last month
- A list of awesome open source projects in the machine learning field, whose developers are mainly based in Germany☆46Updated last year
- Interpretability for sequence generation models 🐛 🔍☆439Updated last week
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.☆89Updated 8 months ago
- ☆128Updated last week
- Bicleaner fork that uses neural networks☆40Updated 3 months ago
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹☆30Updated last year
- An example starter repo for Python projects☆300Updated 3 months ago
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including…☆67Updated 2 months ago
- Late Interaction Models Training & Retrieval☆593Updated this week
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆51Updated 2 months ago
- Adding random noise to a text dataset, and controlling very accurately the quality of the result☆19Updated last month
- ☆359Updated last year
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.☆335Updated 9 months ago
- A different, but useful, textcat approach.☆18Updated last year
- A python package for benchmarking interpretability techniques on Transformers.☆214Updated 11 months ago
- ☆16Updated 3 years ago
- Active Learning for Text Classification in Python☆623Updated 2 weeks ago