llmeval / llmeval-3
Chinese Large Language Model Evaluation, Phase 3
☆26 · Updated last year
Alternatives and similar repositories for llmeval-3
Users who are interested in llmeval-3 are comparing it to the repositories listed below.
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- ☆82 · Updated last year
- ☆144 · Updated last year
- Counting-Stars (★) ☆83 · Updated last month
- ☆50 · Updated last year
- ☆96 · Updated last year
- ☆48 · Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆81 · Updated last year
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" [ACL 2025 Findings] "C2LEVA: Toward Comprehensive and Contaminatio… ☆63 · Updated 2 months ago
- ☆56 · Updated 8 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆68 · Updated 2 months ago
- ☆84 · Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆130 · Updated last year
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆62 · Updated 9 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆38 · Updated 11 months ago
- A Toolkit for Table-based Question Answering ☆112 · Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆124 · Updated 8 months ago
- ☆142 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆244 · Updated 8 months ago
- Chinese Generation Evaluation ☆12 · Updated last year
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning ☆81 · Updated last year
- ☆124 · Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆33 · Updated last month
- Chinese Large Language Model Evaluation, Phase 2 ☆70 · Updated last year
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆42 · Updated last year
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis' ☆27 · Updated last month
- Reformatted Alignment ☆113 · Updated 9 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆50 · Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆140 · Updated 2 months ago
- A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark ☆102 · Updated last year