[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
☆63 · May 16, 2025 · Updated 9 months ago
Alternatives and similar repositories for CLEVA
Users who are interested in CLEVA are comparing it to the repositories listed below.
- Chinese Generation Evaluation · ☆13 · Aug 14, 2023 · Updated 2 years ago
- Chinese Large Language Model Evaluation, Phase 3 · ☆35 · Dec 30, 2025 · Updated 2 months ago
- Repository for an initial proof-of-concept (POC) NLP-based SQL adapter using an LLM · ☆10 · May 6, 2025 · Updated 9 months ago
- Math24o: a high school Olympiad mathematics competition evaluation set (High School Olympiad Mathematics Chinese Benchmark) · ☆11 · Mar 27, 2025 · Updated 11 months ago
- [ACL 2025] Official code for "Learning to Reason from Feedback at Test-Time" · ☆13 · May 16, 2025 · Updated 9 months ago
- Code for the paper "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" · ☆11 · Oct 11, 2024 · Updated last year
- LaTeX template for CUHK PhD theses · ☆11 · Jun 29, 2025 · Updated 8 months ago
- ☆20 · Nov 20, 2024 · Updated last year
- Resources for the paper "DialSummEval: Revisiting summarization evaluation for dialogues" · ☆15 · Jul 22, 2025 · Updated 7 months ago
- GAOKAO-Bench-Updates is a supplement to GAOKAO-Bench, a dataset for evaluating large language models · ☆38 · Jan 7, 2025 · Updated last year
- Flames: a highly adversarial Chinese benchmark for evaluating LLM harmlessness, developed by Shanghai AI Lab and Fudan NLP Group · ☆63 · May 21, 2024 · Updated last year
- [NeurIPS 2024] PyTorch code for the paper "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning… · ☆23 · Oct 24, 2025 · Updated 4 months ago
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing · ☆21 · Mar 18, 2025 · Updated 11 months ago
- A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark · ☆104 · Jul 20, 2023 · Updated 2 years ago
- Data and code for the paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models" · ☆103 · Jun 15, 2023 · Updated 2 years ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers · ☆23 · Feb 9, 2025 · Updated last year
- Official repository for Decentralized Arena via Collective LLM Intelligence · ☆17 · May 19, 2025 · Updated 9 months ago
- Code and data for "Controllable Unsupervised Event-based Video Generation" (accepted as an ICIP oral and invited to a WACV workshop) · ☆19 · Nov 5, 2024 · Updated last year
- ☆17 · Oct 15, 2023 · Updated 2 years ago
- LLM evaluation · ☆16 · Nov 7, 2023 · Updated 2 years ago
- DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains · ☆20 · Feb 7, 2024 · Updated 2 years ago
- [CVPR 2025] Official implementation of "One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models" · ☆28 · Jul 28, 2025 · Updated 7 months ago
- ☆21 · Aug 19, 2024 · Updated last year
- ☆21 · Feb 26, 2024 · Updated 2 years ago
- Measuring Massive Multitask Chinese Understanding · ☆89 · Mar 24, 2024 · Updated last year
- ☆24 · Apr 2, 2024 · Updated last year
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" · ☆53 · Sep 21, 2023 · Updated 2 years ago
- ☆25 · Aug 23, 2024 · Updated last year
- ☆22 · Oct 21, 2024 · Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models · ☆57 · May 28, 2025 · Updated 9 months ago
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models … · ☆2,684 · Updated this week
- ☆30 · May 22, 2024 · Updated last year
- Unleashing the Power of Cognitive Dynamics on Large Language Models · ☆63 · Sep 24, 2024 · Updated last year
- ☆83 · Sep 5, 2024 · Updated last year
- Repository for the paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents" · ☆61 · Feb 20, 2024 · Updated 2 years ago
- Evaluation and alignment research on the values of Chinese large language models · ☆553 · Jul 20, 2023 · Updated 2 years ago
- [NAACL 2024] SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning · ☆26 · Mar 3, 2025 · Updated 11 months ago
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning · ☆24 · Feb 1, 2024 · Updated 2 years ago
- ZhiLu (智鹿): a conversational large language model for the Chinese consumer finance domain · ☆30 · Nov 12, 2023 · Updated 2 years ago