Brand24-AI / mms_benchmark
The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets, manually selected from over 350 reported in the scientific literature according to strict quality criteria, and covers 27 languages.
☆16Updated last year
Alternatives and similar repositories for mms_benchmark:
Users who are interested in mms_benchmark are comparing it to the libraries listed below
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish☆13Updated last year
- Survey of available speech datasets for Polish ASR development☆13Updated 2 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆120Updated 3 months ago
- ☆22Updated last year
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polis…☆36Updated last year
- ITALIC: An ITALian Intent Classification Dataset☆12Updated last year
- A Scandinavian Benchmark for sentence embeddings☆33Updated 2 weeks ago
- The robust European language model benchmark.☆81Updated this week
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹☆30Updated 8 months ago
- Evaluation of Sentence Representations in Polish☆22Updated 2 years ago
- Polish RoBERTA model trained on Polish literature, Wikipedia, and Oscar. The major assumption is that quality text will give a good mode…☆34Updated 3 years ago
- Speaker diarization and speech to text☆14Updated 4 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i…☆46Updated 10 months ago
- ☆56Updated 2 years ago
- Polish datasets for grammatical error correction☆12Updated last year
- Tool for named entity recognition for Polish based on deep learning.☆31Updated last year
- explainable Siamese sentence transformers☆12Updated 11 months ago
- A library for detecting problematic data segments in structured and unstructured data with few lines of code.☆63Updated last year
- A python package for benchmarking interpretability techniques on Transformers.☆213Updated 5 months ago
- Generalist and Lightweight Model for Text Classification☆87Updated last week
- ☆50Updated 2 years ago
- A lightweight Python library for constructing, processing, and visualizing constituent trees.☆66Updated last month
- negate_sentence(A Python module that doesn't negate sentences.)☆29Updated 4 months ago
- A french sequence to sequence pretrained model☆59Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode☆110Updated 2 years ago
- An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.☆164Updated last month
- ☆11Updated last year
- An open-source Python package for Danish speech recognition☆29Updated 2 years ago
- A HuggingFace compatible Small Language Model trainer.☆74Updated last month
- ☆15Updated last year