[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
☆64 · May 16, 2025 · Updated 10 months ago
Alternatives and similar repositories for CLEVA
Users that are interested in CLEVA are comparing it to the libraries listed below.
- Chinese Generation Evaluation ☆13 · Aug 14, 2023 · Updated 2 years ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 3 months ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark ☆11 · Mar 27, 2025 · Updated last year
- LaTeX template for CUHK PhD Thesis ☆11 · Jun 29, 2025 · Updated 9 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆57 · Oct 9, 2025 · Updated 6 months ago
- [ACL 2024] Making Long-Context Language Models Better Multi-Hop Reasoners ☆19 · May 28, 2024 · Updated last year
- A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark ☆105 · Jul 20, 2023 · Updated 2 years ago
- Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" ☆103 · Jun 15, 2023 · Updated 2 years ago
- GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset for evaluating large language models. ☆41 · Jan 7, 2025 · Updated last year
- Implementation of Variational Hierarchical User-based Conversation Model ☆10 · Jul 2, 2021 · Updated 4 years ago
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models … ☆2,735 · Apr 3, 2026 · Updated last week
- Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.