UMxYTL-AI-Labs / MalayMMLU
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
☆35 · Updated 6 months ago
Alternatives and similar repositories for MalayMMLU
Users who are interested in MalayMMLU are comparing it to the repositories listed below:
- Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ ☆498 · Updated this week
- We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/ ☆320 · Updated 3 weeks ago
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆84 · Updated 5 months ago
- South-East Asia Large Language Models ☆337 · Updated 2 weeks ago
- Build datasets using natural language ☆501 · Updated 2 months ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆144 · Updated last month
- Efficiently find the best-suited language model (LM) for your NLP task ☆123 · Updated last week
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆66 · Updated 5 months ago
- [ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia ☆169 · Updated 11 months ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ☆332 · Updated 7 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆364 · Updated this week
- Notebooks for training universal 0-shot classifiers on many different tasks ☆131 · Updated 6 months ago
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation ☆83 · Updated last week
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Updated 3 months ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME ☆100 · Updated 3 months ago
- Enhancing Translation with RAG-Powered Large Language Models ☆81 · Updated 3 months ago
- A compact LLM pretrained in 9 days by using high quality data ☆318 · Updated 3 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆789 · Updated 3 months ago
- Let's build better datasets, together! ☆260 · Updated 6 months ago
- awesome synthetic (text) datasets ☆289 · Updated last week
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆265 · Updated 2 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆208 · Updated 2 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆23 · Updated 3 months ago
- Recipes to prepare datasets! ☆14 · Updated this week
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆229 · Updated 8 months ago
- An Open Source Toolkit For LLM Distillation ☆678 · Updated last week
- A blueprint for creating Pretraining and Fine-Tuning datasets for Indic languages ☆107 · Updated 9 months ago
- Add Arabic diacritics (tashkeel/harakat) using Rust/Python/C++/WASM and NLP models ☆32 · Updated 4 months ago