DAMO-NLP-SG/M3Exam

DAMO-NLP-SG / M3Exam

Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"

☆105

Alternatives and similar repositories for M3Exam

Users who are interested in M3Exam are comparing it to the repositories listed below.

DAMO-NLP-SG / IE-E2H
Easy-to-Hard Learning for Information Extraction (ACL 2023 Findings)
☆14 · Jul 11, 2023 · Updated 3 years ago
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
Felixgithub2017 / CG-Eval
Chinese Generation Evaluation
☆13 · Aug 14, 2023 · Updated 2 years ago
attapol / tltk
Thai Language Toolkit
☆33 · Dec 20, 2025 · Updated 7 months ago
DAMO-NLP-SG / MT-LLaMA
Multi-Task instruction-tuned LLaMA
☆14 · May 5, 2023 · Updated 3 years ago
DAMO-NLP-SG / multilingual-safety-for-LLMs
[ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆106 · Mar 7, 2024 · Updated 2 years ago
HumanSignal / label-studio-examples
Example Code to Supplement the Label Studio Blog
☆33 · Jan 6, 2026 · Updated 6 months ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆54 · Jul 11, 2025 · Updated last year
DAMO-NLP-SG / DAMO-SeaLLMs
[ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia
☆175 · Jul 30, 2024 · Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 · Aug 25, 2023 · Updated 2 years ago
Yale-LILY / ReasTAP
Data and Code for EMNLP 2022 paper "ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples"
☆15 · Jun 4, 2023 · Updated 3 years ago
LaVi-Lab / CLEVA
[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
☆64 · May 16, 2025 · Updated last year
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34 · Sep 16, 2023 · Updated 2 years ago
xfhelen / MMBench
An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design
☆22 · Dec 13, 2024 · Updated last year
iapp-technology / iapp-wiki-qa-dataset
Open Thai Wikipedia QA Dataset made by iApp Technology
☆14 · Feb 17, 2021 · Updated 5 years ago
DAMO-NLP-SG / Auto-Arena-LLMs
☆44 · Oct 7, 2024 · Updated last year
Ayame1006 / LLMtoGraph
☆10 · Aug 24, 2023 · Updated 2 years ago
yrf1 / LLM-MassiveMulticultureNormsKnowledge-NCLB
☆20 · Mar 12, 2025 · Updated last year
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78 · Mar 12, 2024 · Updated 2 years ago
abwilf / Social-IQ-2.0-Challenge
The Social-IQ 2.0 Challenge Release for the Artificial Social Intelligence Workshop at ICCV '23
☆38 · Oct 13, 2023 · Updated 2 years ago
khuangaf / ZeroFEC
Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction"
☆17 · Aug 14, 2023 · Updated 2 years ago
DAMO-NLP-SG / SeaLLMs-Audio
☆53 · Dec 7, 2025 · Updated 7 months ago
tiefenauer / wiki-lm
Script to train a German n-gram Language Model on articles of Wikipedia
☆14 · Oct 20, 2018 · Updated 7 years ago
declare-lab / HyperRED
This repository implements our EMNLP 2022 research paper A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach.
☆28 · Dec 13, 2022 · Updated 3 years ago
declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
☆552 · Mar 10, 2024 · Updated 2 years ago
OpenGVLab / Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…
☆564 · Apr 21, 2024 · Updated 2 years ago
MLGroupJLU / LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
☆1,608 · Apr 17, 2026 · Updated 3 months ago
AILab-CVC / SEED-Bench
(CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆365 · Jan 14, 2025 · Updated last year
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
DAMO-NLP-SG / RemeMo
[EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning
☆17 · Oct 31, 2023 · Updated 2 years ago
NorskRegnesentral / NeuralTextSanitizer
Neural models for detecting and masking personal information from texts
☆16 · Nov 25, 2022 · Updated 3 years ago
onejune2018 / Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆652 · Nov 24, 2025 · Updated 7 months ago
SeaEval / SeaEval
NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
☆26 · Mar 3, 2025 · Updated last year
NongMindHouse / MasterNongMind
🔮 Mastermind puzzle solver using Genetic Algorithm and Grid Search for optimization
☆13 · Dec 21, 2023 · Updated 2 years ago
terryyz / llm-benchmark
A list of LLM benchmark frameworks.
☆75 · Feb 17, 2024 · Updated 2 years ago
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆64 · Mar 26, 2024 · Updated 2 years ago
Q-Future / Q-Bench
[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and vi…
☆287 · Aug 12, 2024 · Updated last year
kaamanita / link-grammar
The CMU Link Grammar natural language parser
☆12 · May 31, 2024 · Updated 2 years ago
princeton-nlp / ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
☆23 · Feb 9, 2025 · Updated last year