UMxYTL-AI-Labs / MalayMMLU
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
☆52 Updated 2 months ago
Alternatives and similar repositories for MalayMMLU
Users who are interested in MalayMMLU are comparing it to the libraries listed below.
- Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ ☆510 Updated 3 weeks ago
- We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/ ☆325 Updated 2 months ago
- South-East Asia Large Language Models ☆365 Updated 2 weeks ago
- Build datasets using natural language ☆543 Updated last month
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆411 Updated last month
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆93 Updated 9 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆127 Updated 3 months ago
- Translate large datasets to any language with the Google Translation API and multithreaded processing, no key required! ☆72 Updated last week
- [ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia ☆173 Updated last year
- Fast State-of-the-Art Static Embeddings ☆1,882 Updated 3 weeks ago
- ☆222 Updated last month
- UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection ☆1,059 Updated this week
- Tool for generating high quality Synthetic datasets ☆1,362 Updated last week
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆274 Updated 3 months ago
- Real Time Speech Transcription with FastRTC ⚡️and Local Whisper 🤗 ☆687 Updated 4 months ago
- Sarjana is an open source desktop application which is used to assist in reading information materials, be it research papers or technica… ☆24 Updated last year
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ☆335 Updated 10 months ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training ☆73 Updated 3 weeks ago
- Fine-tuning Open-Source LLMs for Adaptive Machine Translation ☆87 Updated 3 months ago
- Implement RED metrics in FastAPI, integrated with Prometheus and Grafana ☆40 Updated 8 months ago
- Train LLM on Hugging Face infra ☆65 Updated last month
- Recipes to prepare datasets! ☆14 Updated 3 weeks ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆833 Updated 3 weeks ago
- Fast Semantic Text Deduplication & Filtering ☆827 Updated last week
- WangchanX Fine-tuning Pipeline ☆46 Updated last year
- A New Tamil Large Language Model (LLM) Based on Llama 2 ☆315 Updated last year
- A little(lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆52 Updated 6 months ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆1,349 Updated 6 months ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆165 Updated 5 months ago
- Cache-Augmented Generation: A Simple, Efficient Alternative to RAG ☆1,431 Updated 5 months ago