☆76Jan 24, 2025Updated last year
Alternatives and similar repositories for ChineseSimpleQA
Users that are interested in ChineseSimpleQA are comparing it to the libraries listed below
- ☆17Mar 13, 2025Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”.☆39Dec 31, 2024Updated last year
- ☆146May 14, 2025Updated 10 months ago
- Debug DeepSpeed-Chat step by step in an IDE☆10Apr 17, 2023Updated 2 years ago
- A native Chinese, graded benchmark for testing coding ability☆15Apr 11, 2024Updated last year
- ☆31Nov 9, 2024Updated last year
- [ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL☆13Oct 9, 2025Updated 5 months ago
- ☆27Nov 20, 2023Updated 2 years ago
- ☆45Dec 12, 2024Updated last year
- ☆19Nov 5, 2024Updated last year
- ☆52Aug 14, 2024Updated last year
- PyTorch implementation for NAACL 2022 paper: "Document-Level Relation Extraction with Sentences Importance Estimation and Focusing"☆17Apr 29, 2022Updated 3 years ago
- A speech-to-text tool based on FasterWhisper and Pyannote, plus a similar tool based on a purely multimodal LLM☆16Jan 31, 2026Updated last month
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"☆136Jun 5, 2024Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆29Jul 9, 2025Updated 8 months ago
- MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning☆114Feb 2, 2026Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators☆43Feb 15, 2024Updated 2 years ago
- [AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models☆20Aug 1, 2025Updated 7 months ago
- ☆31Mar 6, 2026Updated 2 weeks ago
- ☆15Apr 11, 2024Updated last year
- ☆11Nov 9, 2022Updated 3 years ago
- ☆22Jan 3, 2026Updated 2 months ago
- ☆62Oct 29, 2024Updated last year
- ☆28Feb 28, 2026Updated 2 weeks ago
- Evaluation for AI apps and agent☆44Jan 18, 2024Updated 2 years ago
- Jam of papers that interest or bore me and my friends :P☆24Jan 4, 2026Updated 2 months ago
- ☆33Jan 26, 2026Updated last month
- Automated Safety Testing of Large Language Models☆18Jan 31, 2025Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆39Sep 12, 2024Updated last year
- Accelerating the development of large multimodal models (LMMs) with lmms-eval☆14Oct 14, 2024Updated last year
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆24Oct 22, 2025Updated 4 months ago
- Towards Systematic Measurement for Long Text Quality☆37Sep 5, 2024Updated last year
- A flexible & scalable MLLM-based AIGC detection pipeline☆31Oct 27, 2025Updated 4 months ago
- This repository contains the dataset and implementation details of the paper "An In-depth Analysis of Implicit and Subtle Hate Speech Mes…☆10May 9, 2024Updated last year
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…☆13Jan 16, 2024Updated 2 years ago
- Math24o: High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- ☆12Mar 28, 2025Updated 11 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆54Oct 20, 2024Updated last year
- Generative Judge for Evaluating Alignment☆248Jan 18, 2024Updated 2 years ago