SuperCLUE-Agent: A benchmark for evaluating the core capabilities of AI agents on native Chinese tasks
☆94Nov 9, 2023Updated 2 years ago
Alternatives and similar repositories for SuperCLUE-Agent
Users who are interested in SuperCLUE-Agent are comparing it to the libraries listed below.
- TensorRT☆11Sep 22, 2020Updated 5 years ago
- Counting-Stars (★)☆83Nov 24, 2025Updated 3 months ago
- Using FasterTransformer to accelerate inference for BERT and RoBERTa☆14Sep 20, 2019Updated 6 years ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)☆3,187Feb 8, 2026Updated 3 weeks ago
- A multi-task learning approach for conditioned response generation (NAACL 2021)☆12Nov 18, 2022Updated 3 years ago
- Study notes on top-conference papers relevant to NLP algorithm engineers [Text Matching]☆13Jul 9, 2022Updated 3 years ago
- A native Chinese, multi-level benchmark for evaluating text-to-video generation☆18Jul 8, 2024Updated last year
- Code for Salesforce Research paper, CASPI: Causal-aware Safe Policy Improvement for Task-oriented dialogue - https://arxiv.org/abs/2103.0…☆14Jul 24, 2023Updated 2 years ago
- Berkeley Function Calling Leaderboard (BFCL) with Chinese-Language Evaluation☆23Apr 6, 2025Updated 10 months ago
- end-to-end dialog system dataset☆13Sep 15, 2019Updated 6 years ago
- GAOKAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models.☆39Jan 7, 2025Updated last year
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs☆48Aug 26, 2024Updated last year
- ☆363Jun 13, 2024Updated last year
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset☆25May 29, 2025Updated 9 months ago
- Some code for tutorials following https://gym.openai.com/docs/rl☆14Jul 3, 2016Updated 9 years ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- LongBench v2 and LongBench (ACL 25'&24')☆1,101Jan 15, 2025Updated last year
- ☆21Aug 19, 2024Updated last year
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆23Jul 26, 2023Updated 2 years ago
- CMMLU: Measuring massive multitask language understanding in Chinese☆804Dec 6, 2024Updated last year
- Data and code related to the report "Truth, Lies, and Automation: How Language Models Could Change Disinformation"☆28May 18, 2021Updated 4 years ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Feb 29, 2024Updated 2 years ago
- Code for the MTEB leaderboard☆30Feb 4, 2025Updated last year
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"☆63May 16, 2025Updated 9 months ago
- ☆104Dec 6, 2024Updated last year
- benchmark of KgCLUE, with different models and methods☆28Dec 13, 2021Updated 4 years ago
- deep training task☆30Apr 28, 2023Updated 2 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆32Nov 22, 2025Updated 3 months ago
- Data for the MTEB leaderboard☆46Feb 23, 2026Updated last week
- leetcode Study☆29Nov 20, 2022Updated 3 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆69Jul 20, 2023Updated 2 years ago
- Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]☆1,815Jul 27, 2025Updated 7 months ago
- MS-Agent: a lightweight framework to empower agentic execution of complex tasks☆4,011Feb 13, 2026Updated 2 weeks ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.☆1,953Aug 9, 2025Updated 6 months ago
- Research on evaluating and aligning the values of Chinese large language models☆554Jul 20, 2023Updated 2 years ago
- Official github repo for ACLUE, an evaluation benchmark focused on ancient Chinese language comprehension☆33Mar 20, 2024Updated last year
- Chinese-language versions of APIs used in machine learning, plus machine learning theory☆13Jun 8, 2025Updated 8 months ago
- AI Alignment: A Comprehensive Survey☆136Nov 2, 2023Updated 2 years ago
- ☆37Aug 30, 2023Updated 2 years ago