A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆106Jul 20, 2023Updated 2 years ago
Alternatives and similar repositories for M3KE
Users that are interested in M3KE are comparing it to the libraries listed below.
- Chinese Generation Evaluation☆13Aug 14, 2023Updated 2 years ago
- ☆98Dec 5, 2023Updated 2 years ago
- MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING☆90Mar 24, 2024Updated 2 years ago
- CDQA: Chinese Dynamic Question Answering Benchmark☆18Dec 13, 2024Updated last year
- Semi-supervised Domain Adaptation of Machine Translation☆12Dec 8, 2022Updated 3 years ago
- Official code repository for AAAI2021 paper Finding Sparse Structures for Domain Specific Neural Machine Translation☆11Apr 1, 2021Updated 5 years ago
- Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]☆1,854Jul 27, 2025Updated 10 months ago
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"☆64May 16, 2025Updated last year
- ☆21Aug 19, 2024Updated last year
- Chinese-native multi-level text-to-video generation evaluation benchmark☆18Jul 8, 2024Updated last year
- FlagEval is an evaluation toolkit for AI large foundation models.☆337Apr 24, 2025Updated last year
- [AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines☆71May 21, 2026Updated 3 weeks ago
- A list of Numerical Multimodal reasoning papers and their implementation☆11May 13, 2024Updated 2 years ago
- GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.☆759Jan 7, 2025Updated last year
- A list of paper & code on machine learning techniques for NLP research, including RL/Self-supervised Learning/VAE/GAN/Meta learning☆35Mar 13, 2020Updated 6 years ago
- Visual Storytelling post-edit dataset☆18Sep 27, 2019Updated 6 years ago
- Source codes of ACL 2022-Efficient Cluster-based k-Nearest-Neighbor Machine Translation☆26Sep 30, 2022Updated 3 years ago
- Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"☆23May 26, 2021Updated 5 years ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.☆68Mar 27, 2023Updated 3 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- The repository for paper <Evaluating Open-QA Evaluation>☆25Apr 9, 2024Updated 2 years ago
- Chinese-native retrieval-augmented generation evaluation benchmark☆131Apr 18, 2024Updated 2 years ago
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese dialogue large language model)☆8,275Oct 16, 2024Updated last year
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding"☆29Jun 2, 2021Updated 5 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"☆105Jun 15, 2023Updated 2 years ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆196Oct 8, 2024Updated last year
- Dataset for Findings of ACL 23 "VCSum: A Versatile Chinese Meeting Summarization Dataset"☆51Jul 25, 2023Updated 2 years ago
- The official implementation of InterBERT☆11Oct 18, 2022Updated 3 years ago
- General-purpose simple utilities project☆22Oct 6, 2024Updated last year
- ☆23Apr 29, 2025Updated last year
- ☆21Feb 15, 2024Updated 2 years ago
- ☆924May 22, 2024Updated 2 years ago
- This is the repository of the Ape210K dataset and baseline models.☆200Dec 10, 2019Updated 6 years ago
- MD5 links for a Chinese book corpus☆217Jan 31, 2024Updated 2 years ago
- The Math23k dataset for downloading☆22Apr 16, 2022Updated 4 years ago
- A framework for evaluating Machine Translation models.☆12Apr 21, 2026Updated last month
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆107Mar 14, 2024Updated 2 years ago
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark.☆57Sep 28, 2023Updated 2 years ago
- MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts☆11Nov 23, 2022Updated 3 years ago