A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆104 · Jul 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for M3KE
Users interested in M3KE also compare it with the repositories listed below.
- Chinese Generation Evaluation ☆13 · Aug 14, 2023 · Updated 2 years ago
- ☆99 · Dec 5, 2023 · Updated 2 years ago
- Measuring Massive Multitask Chinese Understanding ☆89 · Mar 24, 2024 · Updated last year
- CDQA: Chinese Dynamic Question Answering Benchmark ☆17 · Dec 13, 2024 · Updated last year
- Implementation of a tree-of-thought mode for DeepSeek ☆22 · Jul 17, 2025 · Updated 7 months ago
- ☆21 · Aug 19, 2024 · Updated last year
- A native Chinese retrieval-augmented generation (RAG) evaluation benchmark ☆125 · Apr 18, 2024 · Updated last year
- Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023] ☆1,815 · Jul 27, 2025 · Updated 7 months ago
- A general-purpose collection of simple tools ☆22 · Oct 6, 2024 · Updated last year
- Dataset for Findings of ACL 23 "VCSum: A Versatile Chinese Meeting Summarization Dataset" ☆50 · Jul 25, 2023 · Updated 2 years ago
- A list of Numerical Multimodal reasoning papers and their implementation ☆11 · May 13, 2024 · Updated last year
- Fine-tuning LLMs and embedding models ☆27 · Sep 12, 2023 · Updated 2 years ago
- Semi-supervised Domain Adaptation of Machine Translation ☆12 · Dec 8, 2022 · Updated 3 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations" ☆15 · Feb 8, 2024 · Updated 2 years ago
- Official code repository for AAAI2021 paper Finding Sparse Structures for Domain Specific Neural Machine Translation ☆11 · Apr 1, 2021 · Updated 4 years ago
- [ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Jou… ☆34 · Jun 25, 2024 · Updated last year
- GAOKAO-Bench is an evaluation framework that uses GAOKAO questions as a dataset to evaluate large language models. ☆720 · Jan 7, 2025 · Updated last year
- ☆83 · Apr 18, 2024 · Updated last year
- Code for embedding and retrieval research. ☆16 · Oct 24, 2023 · Updated 2 years ago
- A native Chinese, multi-level text-to-video evaluation benchmark ☆18 · Jul 8, 2024 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · Jul 8, 2024 · Updated last year
- ☆18 · Nov 30, 2025 · Updated 3 months ago
- Chinese large language model evaluation, phase 2 ☆71 · Oct 23, 2023 · Updated 2 years ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆195 · Oct 8, 2024 · Updated last year
- Collection of model-centric MCP servers ☆26 · May 21, 2025 · Updated 9 months ago
- ☆23 · Nov 24, 2025 · Updated 3 months ago
- FlagEval is an evaluation toolkit for AI large foundation models. ☆338 · Apr 24, 2025 · Updated 10 months ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" ☆103 · Jun 15, 2023 · Updated 2 years ago
- Source code for the NAACL 2021 paper: Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation ☆15 · Jul 19, 2021 · Updated 4 years ago
- ☆11 · Feb 25, 2026 · Updated last week
- ☆42 · Nov 7, 2023 · Updated 2 years ago
- QGEval: A Benchmark for Question Generation Evaluation ☆19 · Nov 7, 2024 · Updated last year
- ☆21 · Feb 15, 2024 · Updated 2 years ago
- ☆20 · Nov 3, 2024 · Updated last year
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆67 · Mar 27, 2023 · Updated 2 years ago
- MTEB: Massive Text Embedding Benchmark, extended for French ☆19 · Jun 14, 2024 · Updated last year
- This repository is a collection of legal instruction datasets ☆26 · Jul 12, 2024 · Updated last year
- MCP DeepResearch Server: a deep-research server built on LangGraph + Ollama + Tavily, supporting asynchronous execution, timeout control, and progress push notifications ☆31 · Jun 16, 2025 · Updated 8 months ago
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆52 · Sep 28, 2023 · Updated 2 years ago