Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets covering 27 languages, manually selected according to strict quality criteria from over 350 datasets reported in the scientific literature.
★16 · Updated last year
Alternatives and similar repositories for mms_benchmark:
Users who are interested in mms_benchmark are comparing it to the repositories listed below:
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ★127 · Updated 4 months ago
- ★88 · Updated 4 months ago
- A python package for benchmarking interpretability techniques on Transformers. ★212 · Updated 6 months ago
- ★54 · Updated last week
- Curriculum training ★17 · Updated last month
- The robust European language model benchmark. ★99 · Updated this week
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ★30 · Updated 6 months ago
- Pre-train Static Word Embeddings ★55 · Updated last week
- Polish RoBERTa model trained on Polish literature, Wikipedia, and Oscar. The major assumption is that quality text will give a good mode… ★34 · Updated 3 years ago
- Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, … ★17 · Updated 9 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ★57 · Updated 10 months ago
- A list of awesome open source projects in the machine learning field, whose developers are mainly based in Germany ★43 · Updated 7 months ago
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. ★14 · Updated 7 months ago
- A python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ★27 · Updated 7 months ago
- LTG-Bert ★32 · Updated last year
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polis… ★36 · Updated last year
- ★56 · Updated 2 years ago
- Repository containing the open source code of works published at the FBK MT unit. ★43 · Updated last week
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ★28 · Updated last month
- The pipeline for the OSCAR corpus ★168 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ★153 · Updated 10 months ago
- Bicleaner fork that uses neural networks ★40 · Updated 8 months ago
- ITALIC: An ITALian Intent Classification Dataset ★12 · Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ★51 · Updated 3 months ago
- ★46 · Updated last month
- ★44 · Updated 2 months ago
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish ★13 · Updated last year
- ★22 · Updated last year
- Work with static vector models ★27 · Updated 2 months ago
- Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark. ★26 · Updated last year