Brand24-AI / mms_benchmark
The most extensive open, massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature according to strict quality criteria, and covers 27 languages.
⭐16 · Updated last year
Alternatives and similar repositories for mms_benchmark
Users who are interested in mms_benchmark are comparing it to the repositories listed below.
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ⭐136 · Updated this week
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ⭐56 · Updated last month
- check your data, before you wreck your model ⭐16 · Updated 2 years ago
- ⭐56 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ⭐81 · Updated 8 months ago
- Survey of available speech datasets for Polish ASR development ⭐15 · Updated 5 months ago
- ITALIC: An ITALian Intent Classification Dataset ⭐13 · Updated last year
- zero shot NER fine tuning ⭐13 · Updated 2 months ago
- Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark. ⭐26 · Updated last year
- ⭐50 · Updated 2 years ago
- The robust European language model benchmark. ⭐104 · Updated this week
- ⭐103 · Updated last week
- Polish RoBERTA model trained on Polish literature, Wikipedia, and Oscar. The major assumption is that quality text will give a good mode… ⭐35 · Updated 4 years ago
- A merged version of multiple open-source German speech datasets. ⭐31 · Updated last year
- A Hackable speech recognition library. ⭐25 · Updated 7 months ago
- negate_sentence (A Python module that doesn't negate sentences.) ⭐31 · Updated 7 months ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ⭐31 · Updated 4 years ago
- Datasets collection and preprocessings framework for NLP extreme multitask learning ⭐183 · Updated 5 months ago
- Lightweight self-hosted span annotation tool ⭐33 · Updated last week
- Python Finite-State Toolkit ⭐55 · Updated this week
- ⭐95 · Updated 5 months ago
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish ⭐13 · Updated last year
- Various speech datasets made available to the public ⭐121 · Updated 5 months ago
- A python package for benchmarking interpretability techniques on Transformers. ⭐212 · Updated 8 months ago
- Pre-train Static Word Embeddings ⭐76 · Updated this week
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ⭐28 · Updated 3 months ago
- German small and large versions of GPT2. ⭐20 · Updated 3 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ⭐111 · Updated 2 years ago
- Curriculum training ⭐17 · Updated 2 months ago
- ⭐16 · Updated last year