Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected according to strict quality criteria from over 350 datasets reported in the scientific literature, and covers 27 languages.
16 stars · Updated 2 years ago
Alternatives and similar repositories for mms_benchmark
Users interested in mms_benchmark are comparing it to the libraries listed below.
- ITALIC: An ITALian Intent Classification Dataset · 14 stars · Updated 2 years ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · 186 stars · Updated 2 months ago
- Interpretability for sequence generation models · 453 stars · Updated last week
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹 · 30 stars · Updated last year
- A Scandinavian Benchmark for sentence embeddings · 45 stars · Updated 2 months ago
- 323 stars · Updated last year
- explainable Siamese sentence transformers · 13 stars · Updated last year
- 24 stars · Updated last year
- A Simple Bulk Labelling Tool · 598 stars · Updated 6 months ago
- The robust European language model benchmark · 159 stars · Updated this week
- just a bunch of useful embeddings for scikit-learn pipelines · 520 stars · Updated 4 months ago
- The website for Danish Foundation Models, a project for training foundational Danish language models · 81 stars · Updated last month
- 357 stars · Updated last year
- A merged version of multiple open-source German speech datasets · 34 stars · Updated last year
- 132 stars · Updated 2 weeks ago
- Active Learning for Text Classification in Python · 638 stars · Updated last week
- Official implementation of QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data · 32 stars · Updated 6 months ago
- A python package for benchmarking interpretability techniques on Transformers · 215 stars · Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models · 57 stars · Updated last week
- German Alpaca Dataset (Cleaned + Translated) · 26 stars · Updated 2 years ago
- Control the quality of your labeled data with the Python tools you already know · 239 stars · Updated 2 months ago
- This is a neural spelling checker · 69 stars · Updated 3 years ago
- Curriculum training · 22 stars · Updated 7 months ago
- animal2vec: A self-supervised transformer for rare-event raw audio input · 30 stars · Updated last month
- 173 stars · Updated this week
- 10 stars · Updated 2 years ago
- Various speech datasets made available to the public · 130 stars · Updated last year
- Universal Romanizer that can convert any unicode script to roman (latin) script · 237 stars · Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers · 60 stars · Updated last year
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… · 68 stars · Updated 2 months ago