UMxYTL-AI-Labs / MalayMMLU
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
☆31 Updated 3 months ago
Alternatives and similar repositories for MalayMMLU:
Users who are interested in MalayMMLU are comparing it to the repositories listed below:
- Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ ☆483 Updated last week
- We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/ ☆312 Updated last week
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆78 Updated 2 months ago
- [ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia ☆164 Updated 8 months ago
- South-East Asia Large Language Models ☆297 Updated last week
- Speech Toolkit for Malaysian language, https://malaya-speech.readthedocs.io/ ☆249 Updated last week
- Recipes to prepare datasets! ☆13 Updated 2 weeks ago
- Build datasets using natural language ☆440 Updated 3 weeks ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆101 Updated this week
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆71 Updated last year
- Enhancing Translation with RAG-Powered Large Language Models ☆77 Updated last week
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆104 Updated 3 months ago
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆27 Updated 3 weeks ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆139 Updated 2 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆94 Updated last year
- Translate large dataset to any language with google translation api and multithreads processing, no key required! ☆68 Updated 5 months ago
- Experimenting with small language models ☆64 Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆124 Updated 4 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆121 Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆23 Updated this week
- Official data on Malaysia's National Covid-19 Immunisation Programme (PICK). Powered by MySejahtera. ☆494 Updated last month
- Catalog of abusive language data (PLoS 2020) ☆309 Updated 9 months ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME ☆93 Updated 6 months ago
- LibreOffice Malay dictionary extension. Released under GPLv3 & LGPLv3. Covered by FDLv1.3. ☆12 Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark ☆119 Updated 7 months ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆62 Updated last month
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆224 Updated last week
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆736 Updated last month
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆29 Updated 3 years ago
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation ☆76 Updated 3 weeks ago