UMxYTL-AI-Labs / MalayMMLU
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
☆55 · Updated 5 months ago
Alternatives and similar repositories for MalayMMLU
Users who are interested in MalayMMLU are comparing it to the libraries listed below.
- Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ ☆521 · Updated last week
- We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/ ☆328 · Updated 3 weeks ago
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆94 · Updated last year
- South-East Asia Large Language Models ☆383 · Updated this week
- Efficiently find the best-suited language model (LM) for your NLP task ☆134 · Updated 6 months ago
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆425 · Updated last month
- Fine-tune ModernBERT with custom tokenizers, curriculum learning, and next-gen optimizers. ☆74 · Updated 2 weeks ago
- Build datasets using natural language ☆559 · Updated 4 months ago
- ☆127 · Updated last year
- Sarjana is an open source desktop application which is used to assist in reading information materials, be it research papers or technica… ☆24 · Updated last year
- Implement RED metrics in FastAPI, integrated with Prometheus and Grafana ☆40 · Updated 11 months ago
- Multilingual Speech Recognition for Indonesian Languages ☆70 · Updated 3 years ago
- Code-Switched translations with Large Language models ☆24 · Updated last year
- Fast Multimodal Semantic Deduplication & Filtering ☆877 · Updated last week
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆186 · Updated 2 months ago
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data. ☆654 · Updated last week
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation ☆90 · Updated 6 months ago
- [ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia ☆173 · Updated last year
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆858 · Updated 3 months ago
- Tool for generating high quality Synthetic datasets ☆1,476 · Updated 3 months ago
- Code for Arabic Nougat ☆50 · Updated last year
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ☆340 · Updated last year
- A little(lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆55 · Updated 9 months ago
- Fast State-of-the-Art Static Embeddings ☆1,990 · Updated last month
- Simple UI for debugging correlations of text embeddings ☆305 · Updated 8 months ago
- ☆56 · Updated last year
- The first large-scale summarization corpus for the Indonesian language. AACL 2020. ☆38 · Updated 4 years ago
- Enhancing Translation with RAG-Powered Large Language Models ☆89 · Updated last month
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. ☆26 · Updated last year
- Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions. ☆43 · Updated 9 months ago