aliyun / cflue
☆35Updated 5 months ago
Related projects
Alternatives and complementary repositories for cflue
- ☆157Updated last year
- The repository of jec-qa.☆50Updated 4 years ago
- "WuDao" data☆39Updated 3 years ago
- FinEval is a collection of high-quality multiple-choice and text question-answering problems for the Chinese financial domain.☆160Updated 3 weeks ago
- ☆91Updated 11 months ago
- Measuring Massive Multitask Chinese Understanding☆87Updated 7 months ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆109Updated 5 months ago
- Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)"☆87Updated last week
- ☆129Updated 4 months ago
- Chinese large language model evaluation, phase one☆107Updated last year
- ☆124Updated 8 months ago
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models☆236Updated last week
- ☆93Updated 7 months ago
- [EMNLP 2023] C-STS: Conditional Semantic Textual Similarity☆66Updated 5 months ago
- An open-source and powerful Information Extraction toolkit based on GPT (GPT for Information Extraction; GPT4IE for short). Note: we set a…☆170Updated last year
- T2Ranking: A large-scale Chinese benchmark for passage ranking.☆150Updated last year
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models☆16Updated 3 weeks ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey"☆61Updated last year
- ☆30Updated 6 months ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus☆168Updated this week
- SuperCLUE-Agent: a benchmark for evaluating core agent capabilities on native Chinese tasks☆78Updated last year
- ☆94Updated last year
- ☆241Updated last year
- A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark☆99Updated last year
- ☆51Updated 3 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆26Updated last month
- Dataset and Code for ACL 2023 paper: "IM-TQA: A Chinese Table Question Answering Dataset with Implicit and Multi-type Table Structures". …☆15Updated 3 months ago
- Chinese Financial Assistant Benchmark for Large Language Model☆33Updated 2 months ago
- LAiW: A Chinese Legal Large Language Models Benchmark☆72Updated 4 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆38Updated 8 months ago