Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 datasets reported in the scientific literature based on strict quality criteria, and covers 27 languages.
☆16 · Updated last year
Alternatives and similar repositories for mms_benchmark
Users that are interested in mms_benchmark are comparing it to the libraries listed below
- Pre-train Static Word Embeddings ☆79 · Updated 3 weeks ago
- 🔬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆138 · Updated 3 weeks ago
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish ☆13 · Updated last year
- ☆38 · Updated last month
- Efficient BM25 with DuckDB 🦆 ☆49 · Updated 6 months ago
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ☆56 · Updated 2 months ago
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polis… ☆36 · Updated last year
- The robust European language model benchmark. ☆106 · Updated this week
- NLP with Rust for Python 🦀🐍 ☆62 · Updated last month
- ITALIC: An ITALian Intent Classification Dataset ☆14 · Updated last year
- Zero-shot Audio Classification using Whisper ☆79 · Updated 2 years ago
- A python package for benchmarking interpretability techniques on Transformers. ☆213 · Updated 8 months ago
- The central repo for Creole based NLU and NLG work ☆18 · Updated last month
- Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ☆45 · Updated this week
- Generalist and Lightweight Model for Text Classification ☆134 · Updated 2 weeks ago
- just a bunch of useful embeddings for scikit-learn pipelines ☆500 · Updated 3 months ago
- ☆61 · Updated last week
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆58 · Updated last month
- ☆56 · Updated 2 years ago
- ☆124 · Updated 8 months ago
- German Alpaca Dataset (Cleaned + Translated) ☆25 · Updated 2 years ago
- A BERT-based application for reusable text classification at scale ☆38 · Updated last year
- ☆104 · Updated last month
- Polish RoBERTa model trained on Polish literature, Wikipedia, and Oscar. The major assumption is that quality text will give a good mode… ☆35 · Updated 4 years ago
- ☆99 · Updated 6 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆137 · Updated last month
- ☆296 · Updated last year
- A Scandinavian Benchmark for sentence embeddings ☆39 · Updated last month
- ☆51 · Updated 2 years ago
- Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons. ☆15 · Updated 8 months ago